Writing TensorRT plugins using Triton and Python. Developing custom TensorRT plugins via the Python API. Jul 22, 2024
A Friendly Introduction to TensorRT: Building Engines. Learn to export models to an efficient model format. May 6, 2024
Sparsify, Quantize and Serve LLMs with NeuralMagic, AutoGPTQ and vLLM. A guide to exploring sparsity techniques to compress LLMs. Apr 9, 2024
Add Non-Maximum Suppression (NMS) to an object detection model using ONNX. Integrate an NMS node into your ONNX model. Sep 12, 2023
Build an image preprocessing model using PyTorch and integrate it into your model using ONNX. Reduce your project's dependencies with ONNX. Sep 6, 2023
Run Llama 2 models in a Colab instance using GGML and CTransformers. Try the new Meta AI models in free environments. Jul 18, 2023
Serving Falcon models with 🤗 Text Generation Inference (TGI). Run your LLM efficiently with TGI and its LangChain integration. Jun 11, 2023
Run your private LLM: Falcon-7B-Instruct with less than 6GB of GPU memory using 4-bit quantization. Building with BitsAndBytes, Hugging Face and LangChain. Jun 9, 2023
🤖ChatTube🎥: Chat with YouTube Videos. Building a retrieval question-answering system for YouTube videos with LangChain, OpenAI and FAISS. Jun 6, 2023
An analysis of tweets with the hashtags #TheBatman and #Batman. Project for the Network Analysis course taught by Professor Ivanovitch Silva. Done as a group with Pedro… Feb 18, 2022