The "three variances" of transformers often refer to the major types or variants of transformer architectures that serve different purposes, especially in Natural Language Processing (NLP). Each variance is a modification of the original transformer architecture introduced in "Attention is All You Need" by Vaswani et al. (2017), but tailored for specific tasks or types of data. Here’s an overview