MIT Researchers Develop AI to Generate More Accurate Code in Any Language

In a significant advancement for artificial intelligence in programming, researchers at MIT and collaborating institutions have developed a new approach that enables large language models (LLMs) to generate more accurate and error-free computer code. This innovation addresses a critical challenge: ensuring that AI-generated code adheres to the rules of programming languages and avoids causing system crashes.

Existing methods often struggle to maintain both the structural validity and the intended meaning of generated code. Some techniques distort the model’s output distribution, while others are too computationally intensive for complex tasks. The MIT team’s solution guides the LLM to produce code that is both syntactically correct and semantically accurate, while significantly improving computational efficiency.

Their method allows the LLM to prioritize outputs that are most likely to be valid and accurate, discarding less promising outputs early in the process. This probabilistic approach leads to substantial efficiency gains, allowing smaller LLMs to outperform much larger models in generating accurate outputs across diverse real-world applications, including molecular biology and robotics.
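The "discard less promising outputs early" idea can be illustrated with a toy sketch. The snippet below is not the authors' code; it simply shows one cheap structural check (whether a candidate Python snippet still parses) being used to prune a pool of hypothetical LLM completions before any more expensive semantic checking is spent on them.

```python
import ast

def is_syntactically_valid(code: str) -> bool:
    """Return True if the snippet parses as Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

# Toy candidate pool: an LLM might propose several completions.
candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b)\n    return a + b",   # missing colon: invalid
    "x = [1, 2, 3]",
]

# Discard structurally invalid outputs early, keeping only
# candidates that could still become correct programs.
valid = [c for c in candidates if is_syntactically_valid(c)]
print(len(valid))  # 2 of the 3 candidates parse
```

In the researchers' setting the scoring is probabilistic rather than a hard pass/fail filter, but the payoff is the same: compute is not wasted on continuations that can no longer lead to a valid output.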

According to João Loula, an MIT graduate student and co-lead author of the paper, this architecture could empower non-experts to control AI-generated content. For example, business professionals could use natural language prompts to write complex queries in SQL, a language used for database manipulation.

“This work has implications beyond research. It could improve programming assistants, AI-powered data analysis, and scientific discovery tools by ensuring that AI-generated outputs remain both useful and correct,” says Loula.

The research, which will be presented at the International Conference on Learning Representations, involves engineering knowledge into the LLM to steer it toward the most promising outputs. This approach combines expert knowledge with the LLM’s existing knowledge to generate code that aligns with the user’s intended structure and meaning.

Vikash Mansinghka, a principal research scientist at MIT, explains, “We are not trying to train an LLM to do this. Instead, we are engineering some knowledge that an expert would have and combining it with the LLM’s knowledge, which offers a very different approach to scaling than you see in deep learning.”

The researchers employ a technique called sequential Monte Carlo, which enables parallel generation from an LLM. The model dynamically allocates resources to different threads of parallel computation based on how promising their output appears. Each output is assigned a weight reflecting its likelihood of being structurally valid and semantically accurate. The model then focuses on outputs with higher weights, discarding the rest.
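The weighting-and-resampling step described above can be sketched in a few lines. This is a simplified illustration of the sequential Monte Carlo idea, not the paper's implementation: each "particle" is a partial generation, its weight is a stand-in for how likely it is to be valid and accurate, and resampling in proportion to those weights duplicates promising threads while dropping weak ones. The particle strings and weight values are invented for illustration.

```python
import random

def resample(particles, weights, seed=0):
    """Resample candidate generations in proportion to their weights,
    so computation concentrates on the most promising threads."""
    rng = random.Random(seed)
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(particles, weights=probs, k=len(particles))

# Toy particles: partial generations, each weighted by how promising
# its output appears (structurally valid and semantically accurate).
particles = ["SELECT name", "SELEC name", "SELECT id", "DROP TABL"]
weights   = [0.60,          0.01,         0.35,        0.04]

survivors = resample(particles, weights)
# Low-weight candidates are unlikely to survive resampling; high-weight
# ones may appear multiple times, focusing compute where it pays off.
print(survivors)
```

Because the population size stays fixed while effort shifts toward high-weight particles, this is one way a small model with good steering can beat a much larger unguided one.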

To test their approach, the researchers tasked LLMs with generating Python code, SQL database queries, molecular structures, and robot plans. The new method demonstrated superior accuracy and reduced computational requirements compared to existing techniques. For instance, in Python code generation, their architecture enabled a small, open-source model to outperform a larger, specialized commercial model.

Looking ahead, the researchers plan to extend their technique to control larger segments of generated text and integrate it with learning processes to enhance the model’s accuracy over time. This project holds potential for broad applications, including automated data modeling and querying generative models of databases.

Mansinghka adds that the approach could also enable machine-assisted data analysis systems, where users can interact with software that accurately models the meaning of the data and the questions being asked.
