Similar Tracks
Reinforcement Learning with Human Feedback - How to train and fine-tune Transformer Models
Serrano.Academy
Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning
Serrano.Academy
A friendly introduction to deep reinforcement learning, Q-networks and policy gradients
Serrano.Academy