Model Compression¶
Techniques for making Large Language Models smaller, faster, and more efficient without significant loss in performance.
Contents¶
- Compression Techniques - Quantization (post-training quantization, PTQ; quantization-aware training, QAT), pruning, and knowledge distillation
Discover methods to deploy LLMs more efficiently through compression and optimization.
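To give a concrete flavor of the simplest of these techniques, here is a minimal sketch of symmetric per-tensor post-training quantization (PTQ) using NumPy. The function names (`quantize_int8`, `dequantize`) and the toy weight matrix are illustrative, not part of any particular library's API; real PTQ pipelines also handle activations, calibration data, and per-channel scales.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    # Symmetric per-tensor PTQ: map float weights onto the int8 range
    # [-127, 127] using a single scale derived from the max magnitude.
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights from the int8 codes.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Rounding error per element is bounded by half the quantization step.
max_err = float(np.abs(w - w_hat).max())
print(q.dtype, max_err <= scale / 2 + 1e-6)
```

Storing `q` instead of `w` cuts weight memory by 4x relative to float32; QAT differs in that the model is trained with this rounding simulated in the forward pass so it can adapt to the error.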