GRPO's new variants and implementation secrets

