LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

Similar Tracks
Attention is all you need (Transformer) - Model explanation (including math), Inference and Training
Umar Jamil
Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm
Umar Jamil
Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training
Umar Jamil
LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch
Umar Jamil
How to make LLMs fast: KV Caching, Speculative Decoding, and Multi-Query Attention | Cursor Team
Lex Clips
Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.
Umar Jamil
Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math
Umar Jamil