White-Box Transformers via Sparse Rate Reduction

Publication
arXiv preprint arXiv:2306.01129