Unpacking the bias of large language models

Large Language Models (LLMs), the powerful AI systems behind many modern virtual assistants and content generators, are revolutionizing how we interact with information. However, recent research from MIT has unveiled a significant limitation: a pervasive “position bias” that causes these models to overemphasize content at the beginning and end of a document or conversation, often neglecting crucial information in the middle.

This phenomenon, aptly dubbed “lost-in-the-middle,” means that a lawyer using an LLM-powered assistant to find a specific phrase in a 30-page affidavit is far more likely to succeed if that phrase is on the initial or final pages. This inherent bias raises critical questions about the reliability and fairness of LLMs in high-stakes applications, from legal document review to medical diagnostics.

A team of MIT researchers, including Xinyi Wu, a graduate student in the MIT Institute for Data, Systems, and Society (IDSS) and the Laboratory for Information and Decision Systems (LIDS), has delved deep into the black-box nature of these models to uncover the underlying mechanism. They developed a theoretical framework to trace the flow of information through the machine-learning architecture—specifically, the transformer networks—that form the backbone of LLMs. Their findings pinpoint how certain design choices in model construction, alongside the training data itself, contribute to this position bias.

The research, which will be presented at the International Conference on Machine Learning, reveals that attention masking techniques, which manage computational load by limiting the words a token can “attend” to, play a significant role. For instance, a common “causal mask,” which only allows words to focus on those that came before them, inherently biases the model toward the beginning of an input, even if the earlier words are not semantically more important. This bias intensifies as more attention layers are added to the model.
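To make the mechanism concrete, the sketch below implements scaled dot-product attention with a causal mask in plain NumPy. It is an illustrative toy rather than the researchers' code: the token count, dimensions, and random inputs are arbitrary assumptions, and real LLMs apply this masking across many heads and layers.

```python
# Minimal sketch of causal (autoregressive) attention masking with NumPy.
# Illustrative only; sizes and random inputs are assumptions, not values
# from the paper.
import numpy as np

def causal_attention(Q, K, V):
    """Scaled dot-product attention where each token may only attend
    to itself and to earlier positions (a causal mask)."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (n, n) similarity scores
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)   # True above the diagonal
    scores[mask] = -np.inf                             # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (n, d) contextualized outputs

rng = np.random.default_rng(0)
n_tokens, d_model = 6, 8
Q = rng.standard_normal((n_tokens, d_model))
K = rng.standard_normal((n_tokens, d_model))
V = rng.standard_normal((n_tokens, d_model))
out = causal_attention(Q, K, V)
# The first token can only attend to itself, while every later token can
# attend back to it -- one intuition for why stacking many such layers can
# skew attention mass toward the start of the input.
```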

“These models are black boxes, so as an LLM user, you probably don’t know that position bias can cause your model to be inconsistent. You just feed it your documents in whatever order you want and expect it to work,” explains Xinyi Wu, the paper’s first author. “But by understanding the underlying mechanism of these black-box models better, we can improve them by addressing these limitations.”

The team also investigated the role of positional encodings, mechanisms that help models understand the location of each word. While positional encodings can mitigate position bias by reinforcing connections between nearby words, their effectiveness can be diluted in models with many attention layers. Crucially, the researchers emphasize that while design choices are a significant factor, the training data used to teach these models how to prioritize words also contributes to the problem, suggesting a need for fine-tuning based on data characteristics.
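As one concrete example of a positional encoding (the paper analyzes positional encodings in general rather than any single scheme), the sketch below builds the classic sinusoidal encoding from the original transformer architecture, in which nearby positions receive similar vectors.

```python
# Sketch of the classic sinusoidal positional encoding, shown here only as a
# familiar example of how position information is injected into a transformer.
import numpy as np

def sinusoidal_positional_encoding(n_positions: int, d_model: int) -> np.ndarray:
    """Return an (n_positions, d_model) matrix of position vectors."""
    positions = np.arange(n_positions)[:, None]               # (n, 1)
    dims = np.arange(0, d_model, 2)[None, :]                  # (1, d/2)
    angles = positions / np.power(10000.0, dims / d_model)    # (n, d/2)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)                              # even dimensions
    pe[:, 1::2] = np.cos(angles)                              # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(n_positions=128, d_model=16)
# Nearby positions receive similar vectors, so adding these encodings to
# token embeddings gives attention layers a signal that favors connections
# between neighboring words.
```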

Their experimental validation of the “lost-in-the-middle” phenomenon demonstrated a clear U-shaped accuracy pattern for information retrieval: models performed best when the correct answer appeared at the beginning of the input, accuracy declined toward the middle, and it rebounded slightly at the end. This practical confirmation underscores the real-world implications of their theoretical findings.
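A hedged sketch of how such a position sweep can be run is shown below: one key fact is inserted at varying depths in a long filler context and the model is queried for it. The ask_model function is a hypothetical stand-in for whatever LLM interface is being tested, and the filler text, question, and scoring rule are assumptions rather than details from the paper.

```python
# Hedged sketch of a position-sweep retrieval probe. `ask_model` is a
# hypothetical stand-in for an LLM interface; filler text, question, and
# scoring rule are assumptions, not details taken from the paper.
from typing import Callable, List

def position_sweep(ask_model: Callable[[str, str], str],
                   fact: str,
                   question: str,
                   answer: str,
                   filler_sentences: List[str],
                   n_positions: int = 10) -> List[float]:
    """Return a hit (1.0) or miss (0.0) for each insertion depth of `fact`."""
    hits = []
    for i in range(n_positions):
        # Spread insertion points evenly from the start to the end of the filler.
        insert_at = int(len(filler_sentences) * i / max(n_positions - 1, 1))
        context = " ".join(
            filler_sentences[:insert_at] + [fact] + filler_sentences[insert_at:]
        )
        reply = ask_model(context, question)
        hits.append(1.0 if answer.lower() in reply.lower() else 0.0)
    return hits

# Averaging these hits over many facts and contexts, then plotting them
# against insertion depth, would trace out the U-shape described above:
# strong at the start, weaker in the middle, partially recovering at the end.
```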

“By doing a combination of theory and experiments, we were able to look at the consequences of model design choices that weren’t clear at the time. If you want to use a model in high-stakes applications, you must know when it will work, when it won’t, and why,” says Ali Jadbabaie, professor and head of the Department of Civil and Environmental Engineering and a senior author on the paper.

The theoretical framework developed by the MIT team offers a pathway not only to diagnose but also to correct position bias in future LLM designs. Potential solutions include adopting different masking techniques, optimizing the number of attention layers, or strategically applying positional encodings. This groundbreaking work paves the way for more reliable chatbots that maintain context in long conversations, fairer medical AI systems that process patient data without overlooking critical details, and more attentive code assistants. As Amin Saberi, professor and director of the Stanford University Center for Computational Market Design, commented, this research offers “mathematical clarity paired with insights that reach into the guts of real-world systems.”

The research was supported in part by the U.S. Office of Naval Research, the National Science Foundation, and an Alexander von Humboldt Professorship. The co-authors on the paper also include Yifei Wang, an MIT postdoc, and Stefanie Jegelka, an associate professor of electrical engineering and computer science (EECS) and a member of IDSS and the Computer Science and Artificial Intelligence Laboratory (CSAIL).
