
Themis AI: Teaching AI Models What They Don’t Know
AI models like ChatGPT often give plausible-sounding answers without revealing gaps in their knowledge or areas of uncertainty. That becomes a serious problem as AI is increasingly used in high-stakes applications such as drug development, information synthesis, and autonomous driving.
Themis AI, an MIT spinout, is addressing this problem with its Capsa platform, which quantifies model uncertainty and corrects outputs before they cause larger problems. Capsa can be integrated with any machine-learning model to detect and correct unreliable outputs in seconds. It works by modifying a model so that it can recognize patterns in its own data processing that indicate ambiguity, incompleteness, or bias.
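To make the idea of wrapping a model to expose its uncertainty concrete, here is a minimal sketch. It is not Capsa's API or method, which the article does not describe. It assumes a simple stand-in technique, ensemble disagreement, and the class name `UncertaintyWrapper`, the toy models, and the threshold are all hypothetical.

```python
import statistics
from typing import Callable, Dict, Sequence

Model = Callable[[float], float]


class UncertaintyWrapper:
    """Hypothetical wrapper (not Capsa): scores uncertainty as ensemble disagreement."""

    def __init__(self, members: Sequence[Model], threshold: float) -> None:
        if len(members) < 2:
            raise ValueError("need at least two ensemble members")
        self.members = list(members)
        self.threshold = threshold  # assumed cutoff for flagging an output

    def predict(self, x: float) -> Dict[str, object]:
        outputs = [member(x) for member in self.members]
        spread = statistics.pstdev(outputs)  # disagreement between members
        return {
            "prediction": statistics.fmean(outputs),
            "uncertainty": spread,
            "reliable": spread <= self.threshold,
        }


# Three toy models that agree near x = 0 and diverge far from it.
members = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 0.1 * x * x,
    lambda x: 2.0 * x - 0.1 * x * x,
]
wrapped = UncertaintyWrapper(members, threshold=0.5)

near = wrapped.predict(1.0)   # members roughly agree
far = wrapped.predict(10.0)   # members disagree strongly
print(near["reliable"], far["reliable"])
```

In this toy setup the wrapper accepts the first input and flags the second, because the members' outputs spread apart as the input moves away from the region where they agree.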
“The idea is to take a model, wrap it in Capsa, identify the uncertainties and failure modes of the model, and then enhance the model,” explains Daniela Rus, Themis AI co-founder and MIT Professor, who also directs the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). “We’re excited about offering a solution that can improve models and offer guarantees that the model is working correctly.”
Founded in 2021 by Rus, Alexander Amini, and Elaheh Ahmadi, Themis AI has since worked with companies in several industries. It has helped telecom companies with network planning and automation, helped oil and gas companies use AI to analyze seismic imagery, and published research on developing more trustworthy chatbots.
“We want to enable AI in the highest-stakes applications of every industry,” says Amini. “We’ve all seen examples of AI hallucinating or making mistakes. As AI is deployed more broadly, those mistakes could lead to devastating consequences. Themis makes it possible that any AI can forecast and predict its own failures, before they happen.”
Rus’ lab has spent years researching model uncertainty. In 2018, with funding from Toyota, her team studied the reliability of machine learning-based autonomous driving solutions, a safety-critical setting where understanding when a model can be trusted is essential.
In separate research, Rus, Amini, and their collaborators developed an algorithm capable of detecting and mitigating racial and gender bias in facial recognition systems. This algorithm automatically reweighted the model’s training data, effectively eliminating bias by identifying unrepresentative data segments and generating new, balanced data samples.
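The reweighting idea can be illustrated with a much simpler stand-in than the algorithm described above: inverse-frequency weighting over explicit group labels. This sketch assumes group labels are available, which the original research may not have relied on, and the function name and toy data are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def inverse_frequency_weights(groups: Sequence[Hashable]) -> List[float]:
    """Return per-sample weights so every group carries equal total weight.

    Illustrative only: a simple assumed technique, not the published algorithm.
    """
    if not groups:
        return []
    counts = Counter(groups)
    raw = [1.0 / counts[g] for g in groups]  # rarer group -> larger weight
    total = sum(raw)
    return [w / total for w in raw]          # normalize to sum to 1


# Toy dataset: group "a" is heavily overrepresented relative to group "b".
groups = ["a"] * 8 + ["b"] * 2
weights = inverse_frequency_weights(groups)

mass_a = sum(w for g, w in zip(groups, weights) if g == "a")
mass_b = sum(w for g, w in zip(groups, weights) if g == "b")
print(mass_a, mass_b)
```

After reweighting, the two groups contribute equally to training even though one has four times as many samples, which is the balancing effect the paragraph above describes.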
In 2021, the co-founders demonstrated that a similar approach could help pharmaceutical companies use AI models to predict the properties of drug candidates. That work led to the founding of Themis AI later that year.
“Guiding drug discovery could potentially save a lot of money,” Rus notes. “That was the use case that made us realize how powerful this tool could be.”
Today, Themis AI collaborates with enterprises across diverse sectors, many of which are developing large language models (LLMs). By using Capsa, these models can quantify their own uncertainty for each output, enhancing reliability and trustworthiness.
“Many companies are interested in using LLMs that are based on their data, but they’re concerned about reliability,” observes Stewart Jamieson, Themis AI’s head of technology. “We help LLMs self-report their confidence and uncertainty, which enables more reliable question answering and flagging unreliable outputs.”
Themis AI is also in discussions with semiconductor companies about running its AI solutions on their chips, outside of cloud environments. Because a device can forward the tasks it is uncertain about to a central server, this would allow low-latency, efficient edge computing without sacrificing quality.
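The edge-to-server handoff can be sketched as a simple uncertainty threshold. This is an illustrative assumption about how such routing might look, not a description of Themis AI's implementation. The `route` function, both toy models, and the cutoff value are hypothetical.

```python
from typing import Callable, Tuple

EdgeModel = Callable[[float], Tuple[str, float]]  # returns (label, uncertainty)
ServerModel = Callable[[float], str]


def route(x: float, edge_model: EdgeModel, server_model: ServerModel,
          max_uncertainty: float) -> Tuple[str, str]:
    """Answer on the device when confident, otherwise forward to the server."""
    label, uncertainty = edge_model(x)
    if uncertainty <= max_uncertainty:
        return label, "edge"
    return server_model(x), "server"


def toy_edge_model(x: float) -> Tuple[str, float]:
    # Small on-device model: less certain for inputs near the decision boundary.
    label = "positive" if x >= 0 else "negative"
    return label, 1.0 / (1.0 + abs(x))


def toy_server_model(x: float) -> str:
    # Stand-in for a larger, slower model running centrally.
    return "positive" if x >= 0 else "negative"


on_edge = route(5.0, toy_edge_model, toy_server_model, max_uncertainty=0.3)
escalated = route(0.1, toy_edge_model, toy_server_model, max_uncertainty=0.3)
print(on_edge, escalated)
```

The threshold sets the tradeoff: a lower value sends more requests to the server and raises latency, while a higher value keeps more work on the device at the cost of accepting less certain answers.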
Pharmaceutical companies can also use Capsa to improve AI models that identify drug candidates and predict their performance in clinical trials, which could help them find the most promising candidates sooner.
The team at Themis AI is currently exploring Capsa’s ability to improve accuracy in chain-of-thought reasoning, a technique in which LLMs explain the steps they take to reach an answer.
“We’ve seen signs Capsa could help guide those reasoning processes to identify the highest-confidence chains of reasoning,” Jamieson says. “We think that has huge implications in terms of improving the LLM experience, reducing latencies, and reducing computation requirements. It’s an extremely high-impact opportunity for us.”
For Rus, Themis AI is an opportunity to ensure her MIT research has a real-world impact. “My students and I have become increasingly passionate about going the extra step to make our work relevant for the world,” she concludes. “AI has tremendous potential to transform industries, but AI also raises concerns. What excites me is the opportunity to help develop technical solutions that address these challenges and also build trust and understanding between people and the technologies that are becoming part of their daily lives.”



