Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training
