Toussaint introduced the concept of “embodied language models”, which bridge words and perception by integrating continuous sensor modalities from the real world directly into language models. These models process multimodal sentences that fuse visual input, continuous state estimates and text. They are trained end-to-end, in combination with pre-trained large language models, for various embodied tasks, including sequential robotic manipulation planning, visual question answering and description.
The renowned robotics expert spoke about his research on Task and Motion Planning (TAMP), a fundamental topic in robotics that aims to define complex robot behaviors. He presented a nuanced perspective on planning, emphasizing that it goes beyond conventional methods and encompasses optimization and sampling as well as learning approaches. He also described how large language models (LLMs) can narrow the gap between objects in the real world and the language humans use to describe them, enabling natural-language interaction with robots. He concluded with his thoughts on the broader goals of AI development, arguing for a deeper understanding of the systems we create, beyond their operational functionality, and questioning sole reliance on data accumulation.
Marc Toussaint emphasized the importance of the interplay between learning and thinking in AI. His research combines planning, optimization, inference and machine learning to solve fundamental problems in robotics and physical reasoning.
We look forward to the next Munich AI Lecture in June. Details on the exact date and speaker will follow.