Similar Tracks
Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.
Umar Jamil
Stanford CS224N | 2023 | Lecture 10 - Prompting, Reinforcement Learning from Human Feedback
Stanford Online