Similar Tracks
Feedback Transformers: Addressing Some Limitations of Transformers with Feedback Memory (Explained)
Yannic Kilcher
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)
Yannic Kilcher