[2024-04-12] Dr. Alberto Bietti, Flatiron Institute of the Simons Foundation, "Understanding Transformers through Associative Memories"

  • 2024-01-02
Title:  Understanding Transformers through Associative Memories
Date: 2024-04-12  14:20-15:30
Location:  CSIE  R103
Speaker:  Dr. Alberto Bietti, Flatiron Institute of the Simons Foundation
 Prof.  Chih-Jen Lin

Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. We study how transformers balance these two types of knowledge by considering a synthetic setup where tokens are generated from either global or context-specific bigram distributions. By a careful empirical analysis of the training process on a simplified two-layer transformer, we illustrate the fast learning of global bigrams and the slower development of an "induction head" mechanism for the in-context bigrams. We highlight the role of weight matrices as associative memories, provide theoretical insights on how gradients enable their learning during training, and study the role of data-distributional properties.
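
To make the phrase "weight matrices as associative memories" concrete, here is a minimal sketch and not the speaker's code. It assumes one common formalization: a matrix built as a sum of outer products of random, nearly orthogonal embeddings, queried by picking the output embedding that best matches the matrix-vector product. All names (random_embedding, build_memory, recall) and the sizes (dim, vocab) are hypothetical choices for illustration.

```python
import math
import random

def random_embedding(dim, rng):
    """Gaussian vector with variance 1/dim, so its norm is close to 1."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]

def build_memory(pairs, in_emb, out_emb, dim):
    """Store pairs (i, j) as W = sum of outer(out_emb[j], in_emb[i])."""
    W = [[0.0] * dim for _ in range(dim)]
    for i, j in pairs:
        u, v = in_emb[i], out_emb[j]
        for r in range(dim):
            row, vr = W[r], v[r]
            for c in range(dim):
                row[c] += vr * u[c]
    return W

def recall(W, in_emb, out_emb, i):
    """Return the output token whose embedding best matches W @ in_emb[i]."""
    u = in_emb[i]
    h = [sum(w * x for w, x in zip(row, u)) for row in W]
    scores = [sum(a * b for a, b in zip(v, h)) for v in out_emb]
    return max(range(len(out_emb)), key=scores.__getitem__)

# Assumed toy sizes: high dimension relative to the number of stored pairs,
# so that random embeddings are close to orthogonal.
rng = random.Random(0)
dim, vocab = 256, 20
in_emb = [random_embedding(dim, rng) for _ in range(vocab)]
out_emb = [random_embedding(dim, rng) for _ in range(vocab)]

# An arbitrary token-to-token mapping standing in for learned associations.
targets = [(i * 7 + 3) % vocab for i in range(vocab)]
W = build_memory(list(enumerate(targets)), in_emb, out_emb, dim)

recalled = [recall(W, in_emb, out_emb, i) for i in range(vocab)]
print(recalled == targets)
```

The sketch relies on the dimension being large compared with the number of stored pairs; as more pairs are packed into the same matrix, the cross terms between embeddings grow and recall degrades.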


Alberto Bietti is a research scientist in the Center for Computational Mathematics at the Flatiron Institute of the Simons Foundation in New York. He received his PhD in applied mathematics from Inria and Université Grenoble Alpes in 2019, and was a Faculty Fellow at the NYU Center for Data Science from 2020 to 2022. He also spent time at Inria Paris, Microsoft Research, and Meta AI. Prior to his PhD, he obtained degrees from Mines ParisTech and Ecole Normale Supérieure, and worked as a software engineer at Quora. His research focuses on the theoretical foundations of deep learning.