
MIT Researchers Devise Novel Diagram-Based Method to Optimize Complex AI Systems
Software designers are increasingly tasked with coordinating complex interactive systems, from city transportation networks to robots that must operate efficiently. Researchers at MIT have introduced a groundbreaking approach to these challenges, using simple diagrams to optimize software in deep-learning models.
According to the researchers, this method simplifies complex tasks to the point where solutions can be sketched on a napkin. This innovative technique is detailed in the journal Transactions of Machine Learning Research, in a paper co-authored by incoming doctoral student Vincent Abbott and Professor Gioele Zardini from MIT’s Laboratory for Information and Decision Systems (LIDS).
“We designed a new language to talk about these new systems,” explains Zardini. This diagram-based language is rooted in category theory, enabling the design of computer algorithm architectures that control various system components. These algorithms manage information exchange, energy consumption, and memory usage. Optimizing such systems is difficult because a change in one part cascades through and affects all the others.
The researchers focused on deep-learning algorithms, which underpin AI models like ChatGPT and Midjourney. These models process data through a series of matrix multiplications, refining parameters during extensive training runs to identify complex patterns. With models containing billions of parameters, resource efficiency and optimization are crucial.
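As a rough illustration (not the authors' notation, and with made-up dimensions), the core of a deep-learning layer reduces to a matrix multiplication between inputs and learned parameters, followed by a simple nonlinearity:

```python
import numpy as np

# Minimal sketch of one deep-learning layer: the bulk of the work is a
# single matrix multiplication between inputs and learned weights.
# Dimensions are illustrative, not taken from any real model.
rng = np.random.default_rng(0)

batch, d_in, d_out = 32, 512, 512
x = rng.standard_normal((batch, d_in))   # a batch of input activations
W = rng.standard_normal((d_in, d_out))   # learned parameters
b = np.zeros(d_out)                      # learned bias

h = np.maximum(x @ W + b, 0.0)           # matrix multiply + ReLU nonlinearity
print(h.shape)                           # (32, 512)
```

Training a model with billions of parameters amounts to repeating operations like this at enormous scale, which is why how efficiently they run on hardware matters so much.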
Diagrams can illustrate the parallelized operations in deep-learning models, revealing relationships between algorithms and GPU hardware from companies like NVIDIA. Zardini expressed excitement about this, stating, “We seem to have found a language that very nicely describes deep learning algorithms, explicitly representing all the important things, which is the operators you use,” including energy consumption, memory allocation, and other optimization parameters.
Resource-efficiency optimizations have driven much of the progress in deep learning. The DeepSeek model demonstrated that a small team can compete with major labs by prioritizing resource efficiency and the interplay between software and hardware. Traditionally, such optimizations have required extensive trial and error: the widely used optimization known as FlashAttention, for instance, took more than four years to develop. The new framework makes a more formal approach possible, expressed in a precisely defined graphical language.
Existing methods for finding such improvements are limited; there has been no systematic way to relate an algorithm to its optimal execution and its resource requirements. The new diagram-based method aims to close that gap.
Category theory, which underpins this approach, is a mathematical framework for describing system components and their interactions in an abstract manner. It connects different perspectives, such as mathematical formulas to resource-using algorithms or system descriptions to robust “monoidal string diagrams.” These diagrams facilitate experimentation with component connections and interactions. The developed method essentially enhances “string diagrams” with additional graphical conventions and properties.
Abbott explains, “Category theory can be thought of as the mathematics of abstraction and composition. Any compositional system can be described using category theory, and the relationship between compositional systems can then also be studied.” This approach translates algebraic rules into diagrams, creating a correspondence between different systems.
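As a loose analogy for that compositional view (our illustration, not the paper's formalism), sequential composition corresponds to chaining operations one after another, while the "monoidal" structure corresponds to running independent operations side by side:

```python
# A loose analogy for categorical composition (illustrative only):
# sequential composition chains operations; monoidal (parallel)
# composition runs independent operations side by side.

def compose(f, g):
    """Sequential composition: first apply f, then g."""
    return lambda x: g(f(x))

def tensor(f, g):
    """Parallel ("monoidal") composition: f and g act on separate inputs."""
    return lambda pair: (f(pair[0]), g(pair[1]))

double = lambda x: 2 * x
inc = lambda x: x + 1

pipeline = compose(double, inc)       # x -> 2x + 1
side_by_side = tensor(double, inc)    # (x, y) -> (2x, y + 1)

print(pipeline(3))           # 7
print(side_by_side((3, 3)))  # (6, 4)
```

String diagrams make exactly these two kinds of composition visible: boxes wired in sequence, or drawn in parallel.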
This addresses a critical problem: deep-learning algorithms have generally lacked clear mathematical models. Representing them as diagrams enables a formal and systematic treatment.
This method also provides a clear visual understanding of how parallel real-world processes can be represented by parallel processing in multicore computer GPUs. Abbott notes that diagrams can both represent a function and reveal how to optimally execute it on a GPU.
The “attention” mechanism, which lets deep-learning models weigh contextual information, is a key phase of the large language models behind systems like ChatGPT. FlashAttention, the optimization that took years to develop, significantly improved the speed of attention algorithms.
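For reference, the standard scaled dot-product attention that FlashAttention accelerates fits in a few lines. The naive version below (with illustrative sizes) materializes the full attention matrix in memory, which is precisely the cost FlashAttention avoids by computing the softmax in tiles that fit in fast on-chip GPU memory:

```python
import numpy as np

def attention(Q, K, V):
    """Naive scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    This materializes the full (n x n) attention matrix -- the memory
    traffic that FlashAttention eliminates via tiling.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n, n) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values

# Illustrative dimensions only.
rng = np.random.default_rng(0)
n, d = 128, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(attention(Q, K, V).shape)  # (128, 64)
```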
Applying their method to FlashAttention, Zardini states, “Here we are able to derive it, literally, on a napkin,” though he jokes it might be a large one. Their research paper on the work is titled “FlashAttention on a Napkin” to emphasize the simplification achieved.
Abbott says, “This method allows for optimization to be really quickly derived, in contrast to prevailing methods.” While the method was first applied to FlashAttention to verify its effectiveness, Zardini hopes to automate the detection of improvements. Ultimately, the team plans to develop software that lets researchers upload their code and automatically receive an optimized version, says Zardini, who, in addition to being a principal investigator in LIDS, is the Rudge and Nancy Allen Assistant Professor of Civil and Environmental Engineering and an affiliate faculty member of the Institute for Data, Systems, and Society.
In addition to automating algorithm optimization, Zardini notes that a robust analysis of how deep-learning algorithms relate to hardware resource usage allows for systematic co-design of hardware and software. This line of work integrates with Zardini’s focus on categorical co-design, which uses the tools of category theory to simultaneously optimize various components of engineered systems.
Abbott sees the systematic optimization of deep-learning models as a critically underexplored field, which makes these diagrams especially exciting. “They open the doors to a systematic approach to this problem.”
Jeremy Howard, founder and CEO of Answers.ai, who was not associated with this work, said, “I’m very impressed by the quality of this research. … The new approach to diagramming deep-learning algorithms used by this paper could be a very significant step.” He added, “This paper is the first time I’ve seen such a notation used to deeply analyze the performance of a deep-learning algorithm on real-world hardware. … The next step will be to see whether real-world performance gains can be achieved.”
Petar Veličković, a senior research scientist at Google DeepMind and a lecturer at Cambridge University, who was also not associated with this work, praised the research as “a beautifully executed piece of theoretical research, which also aims for high accessibility to uninitiated readers — a trait rarely seen in papers of this kind.” He added, “These researchers are clearly excellent communicators, and I cannot wait to see what they come up with next!”
The new diagram-based language has already garnered significant attention from software developers. A reviewer of Abbott’s prior paper noted that “The proposed neural circuit diagrams look great from an artistic standpoint (as far as I am able to judge this).” Zardini quips, “It’s technical research, but it’s also flashy!”



