3 Questions: How to help students recognize potential bias in their AI datasets

Cambridge, MA – As artificial intelligence models are increasingly deployed in critical fields like healthcare, where they help doctors diagnose diseases and recommend treatments, a significant gap in current educational practice has come to light. Thousands of students learn each year how to deploy these powerful AI systems, yet many courses overlook a crucial component: training students to detect flaws and biases in the datasets used to build the models.

Leading this conversation is Leo Anthony Celi, a senior research scientist at MIT’s Institute for Medical Engineering and Science, a physician at Beth Israel Deaconess Medical Center, and an associate professor at Harvard Medical School. Celi documents these educational shortcomings in a new paper, “Race against the machine learning courses,” and argues that course developers must equip students with the skills to evaluate data thoroughly before incorporating it into AI models. The point is especially pertinent given that numerous previous studies have shown that models trained primarily on clinical data from white males often perform poorly when applied to people from other demographic groups.

Celi explains that bias pervades clinical datasets, and that any problems embedded in the data are inevitably baked into the models built from it. He cites pulse oximeters, which have been found to overestimate oxygen levels in people of color because dark-skinned patients were underrepresented in the devices’ clinical trials. The problem extends to a broad range of medical devices that are optimized on healthy young male subjects yet used on very different patients, such as an 80-year-old woman with heart failure, without the FDA requiring proof that they work equally well across those populations.

The electronic health record (EHR) poses further challenges: it was never designed as a learning system. A complete overhaul is not imminent, so Celi advocates smarter, more creative ways of leveraging the existing, imperfect data to build algorithms. One promising direction his group is exploring is a transformer model for numeric EHR data, intended to mitigate the effects of data that is missing because of social determinants of health and the biases of providers.
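
As a purely illustrative sketch, here is one way such a transformer might handle missingness: instead of imputing unobserved values, it substitutes a learned embedding for them, letting the model use the pattern of missingness itself as signal. The architecture, names, and dimensions below are assumptions for illustration, not the model Celi’s group is building.

```python
# Hypothetical sketch: a transformer encoder over numeric EHR features that
# represents unmeasured values with a learned embedding instead of imputing.
import torch
import torch.nn as nn

class NumericEHRTransformer(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # Each scalar feature becomes one token: projected value + feature ID.
        self.value_proj = nn.Linear(1, d_model)
        self.feature_embed = nn.Embedding(n_features, d_model)
        # Learned stand-in for values that were never measured.
        self.missing_embed = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)  # e.g., a risk score

    def forward(self, values: torch.Tensor, observed: torch.Tensor) -> torch.Tensor:
        # values: (batch, n_features), contents arbitrary where unobserved
        # observed: (batch, n_features) bool mask, True where measured
        b, f = values.shape
        tokens = self.value_proj(values.unsqueeze(-1))            # (b, f, d)
        tokens = torch.where(observed.unsqueeze(-1), tokens,
                             self.missing_embed.expand(b, f, -1))
        tokens = tokens + self.feature_embed(
            torch.arange(f, device=values.device))                # feature IDs
        encoded = self.encoder(tokens)                            # (b, f, d)
        return self.head(encoded.mean(dim=1)).squeeze(-1)         # pooled score

# Toy usage: 8 lab values per patient, roughly half unmeasured.
model = NumericEHRTransformer(n_features=8)
vals, mask = torch.randn(4, 8), torch.rand(4, 8) > 0.5
print(model(vals, mask).shape)  # torch.Size([4])
```

Treating missingness as input rather than noise matters here because, as Celi notes, which values get measured is itself shaped by social determinants and provider behavior.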

The imperative for AI courses to cover potential sources of bias is stark. Celi recounts that the instructors of his MIT course, established in 2016, realized they were inadvertently encouraging a race to build models optimized for statistical performance while the underlying data remained riddled with unaddressed problems. An analysis of 11 online AI courses revealed a concerning pattern: only five included any section on dataset bias, and a mere two discussed it in significant depth. While these courses are undeniably valuable for self-learners, Celi argues that their immense reach demands a stronger commitment to teaching these essential skills, especially as more people enter the “AI multiverse.” His paper aims to spotlight this educational deficit.

To address this, Celi recommends that course developers incorporate specific content, starting with a checklist of foundational questions for students: Where did this data come from? Who observed and collected it (doctors, nurses)? What is the landscape of the institutions involved? In an ICU database, for instance, understanding who makes it into the ICU, and who does not, reveals sampling selection bias: if minority patients disproportionately fail to reach the ICU, models trained on that data will inherently fail them. Celi firmly believes that at least 50 percent of course content, if not more, should be devoted to understanding the data, because the modeling itself becomes relatively straightforward once the data is understood.
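
To make that first checklist question concrete, the short pandas sketch below compares each group’s share of hospital admissions with its share of ICU stays; the column names and numbers are hypothetical, but a large gap between the two is exactly the sampling selection bias Celi describes.

```python
# Illustration of a provenance check: does the ICU sample resemble the
# hospital population it was drawn from? Data and names are hypothetical.
import pandas as pd

def representation_gap(hospital: pd.DataFrame, icu: pd.DataFrame,
                       group_col: str) -> pd.DataFrame:
    """Share of each group among hospital admissions vs. ICU stays."""
    source = hospital[group_col].value_counts(normalize=True).rename("hospital_share")
    sample = icu[group_col].value_counts(normalize=True).rename("icu_share")
    gap = pd.concat([source, sample], axis=1).fillna(0.0)
    gap["gap"] = gap["icu_share"] - gap["hospital_share"]
    return gap.sort_values("gap")

# Toy numbers: group A is 30% of admissions but only 12% of ICU stays, so a
# model trained on ICU records alone will under-represent group A.
hospital = pd.DataFrame({"group": ["A"] * 30 + ["B"] * 70})
icu = pd.DataFrame({"group": ["A"] * 12 + ["B"] * 88})
print(representation_gap(hospital, icu, "group"))
```

A real audit would, of course, condition on illness severity and other admission criteria; the point of the sketch is only that the question of who is in the dataset can be asked, and answered, before any model is trained.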

The MIT Critical Data consortium has organized datathons around the world since 2014, bringing together healthcare professionals and data scientists to scrutinize databases in their local context. This interdisciplinary mix is key: critical thinking, Celi stresses, cannot be taught effectively in homogeneous groups, and the datathons cultivate it naturally by assembling people from diverse backgrounds and generations. Participants are consistently advised not to build any model before they deeply understand the data’s provenance, the criteria for which patients are included, the devices used for measurement, and whether those devices are consistently accurate across individuals (a check illustrated in the sketch below).

While he encourages the use of local datasets at these global events, Celi acknowledges initial resistance, since close examination inevitably uncovers a dataset’s imperfections. He maintains, however, that acknowledging those flaws is the first step toward fixing them, citing the decade of continuous feedback that refined the schema of the MIMIC (Medical Information Mart for Intensive Care) database. The ultimate goal is for students to leave recognizing both the vast potential of AI and the immense risk of harm if the foundational data is not handled correctly, transforming how they approach the field.
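
The device-accuracy check mentioned above can be as simple as stratifying measurement error by group. The simulation below, with numbers invented to loosely mimic the pulse oximeter finding discussed earlier, shows how an aggregate error metric can look acceptable while a per-group breakdown exposes systematic bias.

```python
# Simulated illustration: aggregate error hides what per-group error reveals.
# All values and column names are invented for this example.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    # The affected group is deliberately underrepresented (15% of the sample).
    "skin_tone": rng.choice(["light", "dark"], size=n, p=[0.85, 0.15]),
    "spo2_true": rng.normal(94, 2, size=n),
})
# A device that systematically overestimates oxygen saturation for one group.
bias = np.where(df["skin_tone"] == "dark", 2.5, 0.0)
df["spo2_measured"] = df["spo2_true"] + bias + rng.normal(0, 0.5, size=n)

df["error"] = df["spo2_measured"] - df["spo2_true"]
# Overall error looks modest precisely because the harmed group is a minority
# of the sample; stratifying makes the 2.5-point overestimate visible.
print("overall mean error:", round(float(df["error"].mean()), 2))
print(df.groupby("skin_tone")["error"].agg(["mean", "std"]).round(2))
```

The same stratified view applies to model outputs: reporting a single accuracy number over a skewed sample is how selection bias stays invisible.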
