Solvable High-Dimensional Attention Models: A Theory of Generalization for Token Sequences – 29.01.2026

The lecture is dedicated to the theoretical analysis of attention layers, which today form the heart of modern machine learning architectures. The focus is on the question of how such systems generalize from data and which principles determine their learning behavior on sequences of tokens. Using analytically tractable high-dimensional models, learning and generalization performance is characterized in closed form for the first time.

Particular emphasis is placed on supervised learning scenarios in which the models admit precise theoretical predictions and provide mechanistic insights into the representation learning of attention-based architectures. Finally, an outlook is given on how these results pave the way toward tractable theoretical models of self-supervised and generative training with attention.
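
For orientation, below is a minimal sketch of the standard single-head dot-product attention layer that such analyses take as their object of study. This is illustrative NumPy code; all dimensions, weight matrices, and names are placeholders, not the specific high-dimensional models treated in the lecture.

    import numpy as np

    def attention(X, W_q, W_k, W_v):
        """Single-head dot-product attention over a sequence of token embeddings.

        X: (n_tokens, d) matrix of token embeddings.
        W_q, W_k, W_v: (d, d_head) query, key, and value projections.
        """
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        scores = Q @ K.T / np.sqrt(K.shape[1])        # (n_tokens, n_tokens) similarities
        scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1
        return weights @ V                            # tokens mixed by attention weights

    # Illustrative dimensions only: 8 tokens, embedding width 16.
    rng = np.random.default_rng(0)
    n_tokens, d = 8, 16
    X = rng.standard_normal((n_tokens, d))
    W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    print(attention(X, W_q, W_k, W_v).shape)          # (8, 16)

Each output token is a data-dependent convex combination of the value vectors; it is the learning and generalization behavior of this kind of mechanism that the lecture characterizes in the high-dimensional limit.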

You can find more information on the homepage of the Munich AI Lectures:
https://baiosphere.org/science/munich-ai-lectures

About the Speaker: Lenka Zdeborová is a professor of physics and computer science at the École Polytechnique Fédérale de Lausanne (EPFL) and leads the Statistical Physics of Computation Laboratory there. Her research combines methods of statistical physics with questions from machine learning, inference, and optimization. She has received numerous awards, including the CNRS Bronze Medal, the Irène Joliot-Curie Prize, and ERC Grants, and works on theoretical models that explain how modern AI systems learn, generalize, and scale.

  • Organizer: Technical University of Munich (TUM)

    Venue: https://nav.tum.de/room/0101.Z1.090

  • Language: English

  • Target group: Researchers, students, and scientists from machine learning, AI, mathematics, physics, and theoretical computer science