Similar Tracks
Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math
Umar Jamil
Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer
Umar Jamil
Kolmogorov Arnold Networks (KAN) Paper Explained - An exciting new paradigm for Deep Learning?
Neural Breakdown with AVB
LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU
Umar Jamil
Abstracting Failures Away From Stateful Dataflow Systems | KTH MSc Thesis Defense 2024
Aleksey Veresov
Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.
Umar Jamil