
MIT Researchers Pioneer New Diagram-Based Method to Optimize Complex AI Systems
Researchers at MIT have unveiled a groundbreaking approach to optimizing complex coordinated systems, such as city transportation networks and robots. The new method uses simple diagrams to analyze and improve the software at the heart of deep-learning models, potentially changing how these systems are designed and managed.
According to the researchers, this approach simplifies complex tasks to the point where solutions can be sketched on the back of a napkin. The method is described in the journal Transactions on Machine Learning Research, in a paper co-authored by Vincent Abbott and Professor Gioele Zardini of MIT’s Laboratory for Information and Decision Systems (LIDS).
“We designed a new language to talk about these new systems,” explains Zardini. The diagram-based language is rooted in category theory, and it aids in designing the architecture of the computer algorithms that control and sense the different parts of an optimized system. Such systems are hard to optimize because a change to one part ripples through and affects all the others.
The team focused on deep-learning algorithms, which are central to AI models like ChatGPT and Midjourney. These models rely on a deep series of matrix multiplications and other operations, with parameters updated during extensive training runs to identify complex patterns. Resource efficiency and optimization are crucial, given the billions of parameters involved.
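To give a concrete sense of what these models compute, here is a minimal sketch of such a chain of matrix multiplications in Python with NumPy (an illustration of the general pattern only, not code from the paper; all names are invented for the example):

```python
import numpy as np

def forward(x, weights):
    """A minimal deep-network forward pass: a chain of matrix
    multiplications, each followed by a ReLU nonlinearity."""
    for W in weights:
        x = np.maximum(x @ W, 0.0)  # matrix multiply, then ReLU
    return x

# Three toy layers of randomly initialized parameters; in a real
# model these would number in the billions and be tuned by training.
rng = np.random.default_rng(0)
weights = [0.1 * rng.standard_normal((64, 64)) for _ in range(3)]
out = forward(rng.standard_normal((8, 64)), weights)
print(out.shape)  # (8, 64)
```

Each weight matrix here is a set of trainable parameters, and the cost of storing and multiplying such matrices is exactly the kind of resource usage the diagrams are meant to expose.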
Diagrams are used to represent the parallelized operations in deep-learning models, revealing the relationships between the algorithms and the parallelized GPU hardware they run on, made by companies such as NVIDIA. Zardini expresses enthusiasm about this, saying the team has found a language that effectively describes deep-learning algorithms by explicitly representing operators, energy consumption, memory allocation, and the other parameters to be optimized.
Resource efficiency optimizations have been a major driver of progress in deep learning. DeepSeek’s model demonstrated that a small team can compete with top models by focusing on resource efficiency and the relationship between software and hardware. Traditionally, finding these optimizations has required substantial trial and error: FlashAttention, a widely used optimization, took over four years to develop. The new framework promises a more formal approach, visually represented in a precisely defined graphical language.
Current methods for finding such improvements are limited: there has been no formal way of relating an algorithm to its optimal execution and resource requirements. The new diagram-based method aims to fill that gap.
Category theory, the foundation of this approach, is a way of mathematically describing a system’s components and how they interact, in a generalized, abstract manner. Different perspectives can be related to one another: mathematical formulas, for example, can be related to the resource-using algorithms that compute them, and a system’s description can be rendered as a robust “monoidal string diagram.” These diagrams come with well-defined graphical conventions and algebraic properties, which makes it possible to experiment with how the different parts connect and interact.
Abbott describes category theory as the mathematics of abstraction and composition, applicable to any compositional system. Algebraic rules associated with functions can be represented as diagrams, creating a correspondence between different systems.
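As a toy illustration of what “composition” means here (purely illustrative Python, not the paper’s formalism), sequential composition chains functions one after another, while the monoidal structure lets independent functions run side by side:

```python
def compose(f, g):
    """Sequential composition: apply f, then g (read g ∘ f)."""
    return lambda x: g(f(x))

def parallel(f, g):
    """Monoidal (side-by-side) composition: apply f and g to the
    two components of a pair, independently of each other."""
    return lambda pair: (f(pair[0]), g(pair[1]))

double = lambda x: 2 * x
inc = lambda x: x + 1

print(compose(double, inc)(3))        # (3 * 2) + 1 = 7
print(parallel(double, inc)((3, 3)))  # (6, 4)
```

In a string diagram, `compose` corresponds to connecting boxes end to end, and `parallel` to drawing them side by side, which is part of what makes the notation natural for describing parallel hardware.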
This approach tackles a basic problem: deep-learning algorithms have not been clearly understood as mathematical models. Representing them as diagrams makes it possible to approach them formally and systematically.
It also provides a clear visual understanding of how the parallel processing of real-world tasks maps onto the parallel processing performed by a GPU’s many cores. According to Abbott, a diagram can both represent a function and reveal how to optimally execute it on a GPU.
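A simplified sketch of that mapping (illustrative NumPy, not the authors’ method): a matrix multiplication can be split into independent output tiles, each of which a GPU could assign to its own block of threads.

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Compute A @ B one output tile at a time. On a GPU, each
    (i, j) tile could be handled by an independent thread block."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for i in range(0, m, tile):
        for j in range(0, n, tile):  # tiles are independent: parallelizable
            C[i:i + tile, j:j + tile] = A[i:i + tile, :] @ B[:, j:j + tile]
    return C

A = np.arange(64 * 64, dtype=float).reshape(64, 64)
B = np.eye(64)
assert np.allclose(tiled_matmul(A, B), A @ B)
```

The loop body touches only one tile of the output at a time, so all the tiles can in principle be computed simultaneously; that independence is the parallelism the diagrams make explicit.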
The “attention” mechanism, which lets deep-learning models draw on contextual information, is a key phase of large language models such as ChatGPT. FlashAttention, the optimization that took years to develop, dramatically improved the speed of attention computations.
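For reference, the standard attention computation that FlashAttention reorganizes can be written in a few lines (a textbook sketch, not the authors’ code):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)               # pairwise query-key scores
    S = S - S.max(axis=-1, keepdims=True)  # subtract max for stability
    P = np.exp(S)
    P = P / P.sum(axis=-1, keepdims=True)  # softmax over the keys
    return P @ V                           # weighted sum of the values
```

The naive version materializes the full score matrix S, whose size grows with the square of the sequence length; avoiding that memory traffic is precisely what FlashAttention’s reorganization achieves.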
Applying their method to FlashAttention, Zardini notes that they could derive it, “literally, on a napkin.” This simplification underscores the power of their approach in managing complex algorithms, as reflected in their paper titled “FlashAttention on a Napkin.”
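The core trick can itself be sketched compactly (a simplified, single-head NumPy illustration of the tiling-plus-online-softmax idea, not the authors’ derivation or the production GPU kernel): process the keys and values in blocks, keeping running softmax statistics so the full score matrix is never stored.

```python
import numpy as np

def flash_attention(Q, K, V, block=64):
    """Attention computed one key/value block at a time, using an
    online softmax so the full n x n score matrix never exists."""
    d = Q.shape[-1]
    m = np.full((Q.shape[0], 1), -np.inf)  # running row-wise max
    l = np.zeros((Q.shape[0], 1))          # running softmax denominator
    o = np.zeros(Q.shape)                  # running output accumulator
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T / np.sqrt(d)  # scores, this block only
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        P = np.exp(S - m_new)                  # block's softmax numerator
        scale = np.exp(m - m_new)              # rescale old statistics
        l = l * scale + P.sum(axis=-1, keepdims=True)
        o = o * scale + P @ V[j:j + block]
        m = m_new
    return o / l
```

Each pass through the loop updates a running maximum, denominator, and output, rescaling earlier blocks on the fly; this is the kind of resource-aware rearrangement the diagrammatic language is designed to derive systematically rather than by trial and error.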
Abbott emphasizes that the method lets such optimizations be derived very quickly, in contrast with the prevailing trial-and-error approaches. The team applied it first to the existing FlashAttention algorithm to verify that it works; next, they hope to automate the detection of improvements. Zardini envisions software that lets researchers upload their code and have improvements detected and applied automatically.
Beyond automation, a rigorous analysis of how deep-learning algorithms relate to hardware resource usage enables the systematic co-design of hardware and software. This connects with Zardini’s research focus on categorical co-design, which uses category theory to simultaneously optimize the various components of an engineered system.
Abbott believes the optimization of deep-learning models has gone critically unaddressed, which makes these diagrams a significant step toward a systematic approach.
Jeremy Howard, founder and CEO of Answers.ai, who was not involved in the research, called the diagramming approach a potentially very significant step for deep-learning algorithms. Petar Velickovic, a senior research scientist at Google DeepMind and a lecturer at Cambridge University, also praised the work, noting how accessibly the research is communicated.
The new diagram-based language has already garnered attention from software developers, with one reviewer noting its artistic appeal. Zardini remarked, “It’s technical research, but it’s also flashy!”