Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | Jared Casper