Which transformer architecture is best? Encoder-only vs Encoder-decoder vs Decoder-only models
