
Teaching AI models what they don’t know
Artificial intelligence systems, from generative AI tools like ChatGPT to autonomous driving software, are rapidly being integrated into critical sectors. Yet a significant challenge persists: these systems often cannot articulate the gaps in their knowledge or express uncertainty, a shortcoming with potentially profound consequences in high-stakes applications such as drug discovery, information synthesis, and autonomous vehicle operation.
MIT spinout Themis AI addresses this reliability problem with its Capsa platform. Designed to work with any machine-learning model, Capsa quantifies model uncertainty and corrects unreliable outputs in seconds. It does so by modifying AI models so they can detect patterns in their data processing that signal ambiguity, incompleteness, or bias, making their outputs more trustworthy and precise.
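In rough terms, the "wrapping" idea can be pictured with a small sketch. The example below is a generic uncertainty wrapper built on a model ensemble; the class and method names are invented for illustration and are not Capsa's actual API.

```python
# Illustrative sketch only, not Capsa's API: a generic "uncertainty wrapper"
# that reports disagreement across an ensemble of models as uncertainty.
import numpy as np

class UncertaintyWrapper:
    def __init__(self, models):
        # Any objects exposing .predict(x) -> np.ndarray will do.
        self.models = models

    def predict(self, x):
        preds = np.stack([m.predict(x) for m in self.models])
        mean = preds.mean(axis=0)        # point estimate
        uncertainty = preds.std(axis=0)  # ensemble disagreement as an uncertainty proxy
        return mean, uncertainty

# Outputs whose uncertainty exceeds a chosen threshold can then be flagged
# or corrected before they reach downstream systems.
```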
“The idea is to take a model, wrap it in Capsa, identify the uncertainties and failure modes of the model, and then enhance the model,” explains Themis AI co-founder and MIT Professor Daniela Rus, who also serves as the director of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). “We’re excited about offering a solution that can improve models and offer guarantees that the model is working correctly.”
Founded in 2021 by Professor Rus alongside Alexander Amini ’17, SM ’18, PhD ’22, and Elaheh Ahmadi ’20, MEng ’21, both former research affiliates in her lab, Themis AI is already making an impact. The company has helped telecom providers with network planning and automation, supported oil and gas firms in using AI to interpret complex seismic imagery, and contributed to published research on developing more reliable and trustworthy chatbots.
“We want to enable AI in the highest-stakes applications of every industry,” states Alexander Amini. “We’ve all seen examples of AI hallucinating or making mistakes. As AI is deployed more broadly, those mistakes could lead to devastating consequences. Themis makes it possible that any AI can forecast and predict its own failures, before they happen.”
Empowering models to understand their limitations
The foundation of Themis AI’s work stems from years of dedicated research by Professor Rus’s lab into model uncertainty. A key milestone was a 2018 funding initiative from Toyota to scrutinize the reliability of machine learning-based autonomous driving solutions. “That is a safety-critical context where understanding model reliability is very important,” Rus underscores.
Further pioneering work by Rus, Amini, and their collaborators led to an algorithm capable of detecting and neutralizing racial and gender bias in facial recognition systems by reweighting training data. This approach of identifying unrepresentative data and generating balancing samples was also successfully applied to assist pharmaceutical companies in predicting drug candidate properties, a use case that significantly shaped Themis AI’s trajectory. “Guiding drug discovery could potentially save a lot of money,” Rus notes, highlighting the profound potential of their tool.
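The reweighting idea can be illustrated with a deliberately simplified sketch: estimate how densely each training sample's region of feature space is populated, then upweight samples from sparse regions. The published work learned latent representations to do this; the one-dimensional histogram below is only a stand-in for that step, not the actual algorithm.

```python
# Simplified illustration of density-based reweighting; the research estimated
# density in a learned latent space rather than a 1-D histogram.
import numpy as np

def balancing_weights(latent_feature, n_bins=20, smoothing=1e-3):
    """Per-sample weights inversely proportional to estimated density."""
    counts, edges = np.histogram(latent_feature, bins=n_bins)
    bin_idx = np.clip(np.digitize(latent_feature, edges[1:-1]), 0, n_bins - 1)
    density = counts[bin_idx] / len(latent_feature) + smoothing
    weights = 1.0 / density
    return weights / weights.mean()  # rare samples get weights > 1
```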
Today, Themis AI collaborates with enterprises across diverse industries, with a particular focus on large language models (LLMs). By integrating Capsa, these LLMs gain the ability to quantify their own uncertainty for each output. “Many companies are interested in using LLMs that are based on their data, but they’re concerned about reliability,” observes Stewart Jamieson SM ’20, PhD ’24, Themis AI’s head of technology. “We help LLMs self-report their confidence and uncertainty, which enables more reliable question answering and flagging unreliable outputs.”
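One simple way to picture this kind of self-reported confidence is to score an answer by its average token probability and flag low-scoring answers, as in the sketch below. This is a generic proxy offered for illustration, not Themis AI's method; the generation function is passed in as a placeholder.

```python
# Hedged illustration: treat the average per-token log-probability as a crude
# confidence score and flag answers that fall below a threshold.
import math

def answer_with_confidence(prompt, generate_with_logprobs, threshold=0.5):
    # generate_with_logprobs is any callable returning (text, list of token log-probs).
    text, token_logprobs = generate_with_logprobs(prompt)
    confidence = math.exp(sum(token_logprobs) / max(len(token_logprobs), 1))
    flag = "ok" if confidence >= threshold else "low confidence: verify before use"
    return text, confidence, flag
```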
Beyond LLMs, Themis AI is in discussions with semiconductor companies aiming to build robust AI solutions directly on their chips, minimizing reliance on cloud environments. Jamieson elaborates, “Normally these smaller models that work on phones or embedded systems aren’t very accurate compared to what you could run on a server, but we can get the best of both worlds: low latency, efficient edge computing without sacrificing quality. We see a future where edge devices do most of the work, but whenever they’re unsure of their output, they can forward those tasks to a central server.”
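The edge-to-server handoff Jamieson describes amounts to a deferral rule: answer locally when uncertainty is low, escalate otherwise. A minimal sketch, with the edge model and server call passed in as assumed placeholders:

```python
# Minimal sketch of uncertainty-based deferral between an edge device and a server.
def route_request(x, edge_model, call_server, max_uncertainty=0.2):
    prediction, uncertainty = edge_model.predict(x)  # e.g., from an uncertainty wrapper
    if uncertainty <= max_uncertainty:
        return prediction, "edge"    # handled on-device with low latency
    return call_server(x), "server"  # uncertain cases escalate to the cloud
```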
In the pharmaceutical sector, Capsa continues to refine AI models used for identifying drug candidates and forecasting their performance in clinical trials. Amini explains, “The predictions and outputs of these models are very complex and hard to interpret — experts spend a lot of time and effort trying to make sense of them. Capsa can give insights right out of the gate to understand if the predictions are backed by evidence in the training set or are just speculation without a lot of grounding. That can accelerate the identification of the strongest predictions, and we think that has a huge potential for societal good.”
Research driving real-world impact
The Themis AI team is confident in its position at the forefront of the perpetually evolving AI landscape. The company is actively exploring Capsa’s capacity to enhance accuracy in advanced AI techniques, such as chain-of-thought reasoning in LLMs, where models explain their step-by-step path to an answer. “We’ve seen signs Capsa could help guide those reasoning processes to identify the highest-confidence chains of reasoning,” says Jamieson, emphasizing its potential for improving LLM experiences, reducing latency, and cutting computational demands.
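In spirit, guiding a reasoning process by confidence could look like the sketch below: sample several candidate chains and keep the one that scores highest. Both helper callables are hypothetical placeholders, and this is not a description of Capsa's internals.

```python
# Hypothetical sketch: keep the reasoning chain with the highest confidence score.
def best_reasoning_chain(prompt, sample_chain, score_confidence, n_chains=5):
    chains = [sample_chain(prompt) for _ in range(n_chains)]
    return max(chains, key=score_confidence)
```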
For Professor Rus, a serial entrepreneur from MIT, Themis AI represents a vital avenue to translate groundbreaking research into tangible real-world solutions. “My students and I have become increasingly passionate about going the extra step to make our work relevant for the world,” Rus concludes. “AI has tremendous potential to transform industries, but AI also raises concerns. What excites me is the opportunity to help develop technical solutions that address these challenges and also build trust and understanding between people and the technologies that are becoming part of their daily lives.”



