
3 Questions: Helping Students Spot Bias in AI Datasets
As artificial intelligence models are increasingly deployed in high-stakes fields such as medical diagnosis, a significant gap persists in how students are trained to develop and deploy them. Many courses do not adequately teach students to identify and address biases in the datasets used to train these systems, an oversight that can produce models that perform poorly or unfairly when applied to diverse populations.
Leo Anthony Celi, a senior research scientist at MIT’s Institute for Medical Engineering and Science, a physician at Beth Israel Deaconess Medical Center, and an associate professor at Harvard Medical School, is advocating for a shift in AI education. Celi’s new paper highlights the critical need for students to thoroughly evaluate data for biases before integrating it into their models. He points out that models primarily trained on data from specific demographics, such as white males, often fail to generalize effectively to other groups.
In a recent interview, Celi addressed key questions about the sources of bias in datasets and how educators can better prepare students to recognize and mitigate these issues:
Q: How does bias get into these datasets, and how can these shortcomings be addressed?
Celi explained that biases originate in the data itself and are then carried into the AI models trained on it. He cited the example of pulse oximeters, which were found to overestimate oxygen levels in people of color because those groups were insufficiently represented in the clinical trials behind the devices. Medical devices, he emphasized, are often optimized on healthy young males, neglecting the diversity of the actual patient population. He also cautioned against relying too heavily on electronic health record systems, which were not designed for AI to learn from and can carry biases of their own.
One promising direction is transformer models that can analyze numeric electronic health record data, such as laboratory test results. By learning the underlying relationships between different health indicators, these models can help mitigate the effects of missing data and of provider bias.
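To make that idea concrete, the sketch below shows one way a small transformer encoder could operate on a panel of numeric lab values, masking out missing results so they can be estimated from the labs that were observed. It is a minimal illustration under our own assumptions (PyTorch, the lab count, the toy inputs), not a description of any specific model from Celi's work.

```python
# Minimal sketch: a transformer encoder over numeric lab values, where missing
# results are masked and then estimated from the observed ones. All dimensions
# and data here are illustrative assumptions.
import torch
import torch.nn as nn

N_LABS = 12   # hypothetical number of lab tests per encounter
D_MODEL = 32  # embedding width

class LabTransformer(nn.Module):
    def __init__(self, n_labs=N_LABS, d_model=D_MODEL):
        super().__init__()
        # Each lab test gets a learned identity embedding; its numeric value is
        # projected into the same space so the encoder can relate labs to one another.
        self.lab_id_embed = nn.Embedding(n_labs, d_model)
        self.value_proj = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.readout = nn.Linear(d_model, 1)  # predict each lab's numeric value

    def forward(self, values, missing_mask):
        # values: (batch, n_labs) lab results; missing_mask: True where a lab is absent
        ids = torch.arange(values.shape[1], device=values.device)
        tokens = self.lab_id_embed(ids) + self.value_proj(values.unsqueeze(-1))
        # Hide missing labs from attention so estimates rely only on observed labs.
        hidden = self.encoder(tokens, src_key_padding_mask=missing_mask)
        return self.readout(hidden).squeeze(-1)

# Toy usage: one patient with two missing labs, estimated from the rest.
model = LabTransformer()
values = torch.randn(1, N_LABS)
missing = torch.zeros(1, N_LABS, dtype=torch.bool)
missing[0, [3, 7]] = True
values = values.masked_fill(missing, 0.0)
estimates = model(values, missing)  # (1, N_LABS), including the two missing labs
```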
Q: Why is it important for courses in AI to cover the sources of potential bias? What did you find when you analyzed such courses’ content?
Celi noted that he has addressed data biases in his MIT course since it began in 2016. By contrast, he found that many online AI courses focus primarily on building models without adequately emphasizing the importance of scrutinizing the data. His analysis of 11 such courses revealed that only five included sections on bias, and only two contained significant discussions of the topic.
While these courses have value, Celi argues, it is crucial to equip students with the skills to critically evaluate the data that feeds AI. He hopes his paper will draw attention to this gap in current AI education and encourage course developers to prioritize teaching students how to identify and address bias.
Q: What kind of content should course developers be incorporating?
Celi suggests providing students with a checklist of questions to guide their data evaluation. These questions should focus on the data’s origin, the observers involved in its collection, and the characteristics of the institutions where the data was gathered. Understanding the landscape of these institutions, such as the criteria for ICU admission, is essential for identifying potential sampling selection biases. Celi believes that at least 50 percent of course content should be dedicated to understanding the data.
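As a rough illustration of what such a checklist might look like in practice, the snippet below encodes a handful of provenance questions that students could be required to answer before any modeling. The specific questions and field names are assumptions for illustration, not Celi's published checklist.

```python
# Illustrative sketch only: a dataset provenance checklist of the kind Celi describes.
# The questions and keys are assumptions, not an official or published checklist.
DATASET_CHECKLIST = {
    "origin": "Where, when, and why was the data collected?",
    "observers": "Who recorded the measurements, and with what devices or workflows?",
    "institution": "What kind of institution produced it (e.g., what are its ICU admission criteria)?",
    "population": "Which patients are included, and who is systematically missing?",
    "sampling": "What selection steps could bias which records end up in the dataset?",
}

def unanswered(answers: dict) -> list:
    """Return the checklist items a student has not yet answered."""
    return [key for key in DATASET_CHECKLIST if not answers.get(key)]

# Example: a student who has only documented the data's origin so far.
print(unanswered({"origin": "Single-center ICU EHR extract"}))
# -> ['observers', 'institution', 'population', 'sampling']
```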
He also highlighted the importance of critical thinking skills, which can be fostered by bringing together individuals from diverse backgrounds. The MIT Critical Data consortium organizes datathons worldwide, where doctors, nurses, data scientists, and other healthcare workers collaborate to analyze health and disease in local contexts.
Celi advises students to avoid building models until they thoroughly understand the data’s origins, the patients included, and the devices used for measurement. He encourages the use of local datasets to ensure relevance and emphasizes that acknowledging data imperfections is a necessary step toward improvement. The development of the MIMIC database at Beth Israel Deaconess Medical Center, for example, took a decade and relied on feedback to refine its schema.
Ultimately, Celi hopes to inspire students to recognize both the immense potential and the risks associated with AI, emphasizing the importance of responsible and ethical data practices.



