A Visual Guide to Mixture of Experts (MoE) in LLMs