ALiBi - Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation