MIT Researchers Uncover How Large Language Models Can Be Biased

Large language models (LLMs) have shown a tendency to overemphasize information presented at the beginning and end of documents or conversations, often overlooking details in the middle. This phenomenon, known as “position bias,” can impact the reliability of LLMs in various applications.

Researchers at MIT have delved into the underlying mechanisms causing this bias. Their findings reveal that certain design choices within LLM architectures, specifically those governing how the model processes input data, can contribute to position bias. The training data used to develop these models plays a role as well.

Through a theoretical analysis backed by experiments, the MIT team pinpointed that architectural choices, particularly the ones governing how information spreads across input words, can create or intensify position bias. Their work not only identifies the origins of the bias but also proposes a framework for diagnosing and correcting it in future model designs.

The implications of this research are significant. By addressing position bias, chatbots could become more consistent during extended conversations, medical AI systems could reason more equitably when analyzing patient data, and code assistants could pay closer attention to all parts of a program.

“These models are black boxes, so as an LLM user, you probably don’t know that position bias can cause your model to be inconsistent. You just feed it your documents in whatever order you want and expect it to work,” explains Xinyi Wu, a graduate student at MIT. “But by understanding the underlying mechanism of these black-box models better, we can improve them by addressing these limitations.”

Wu and her colleagues, including Yifei Wang, Stefanie Jegelka, and Ali Jadbabaie, presented their research at the International Conference on Machine Learning. Their work utilizes a graph-based theoretical framework to analyze how modeling choices, such as attention masks and positional encodings, can affect position bias. The study reveals that causal masking can inherently bias the model towards the beginning of an input, even if the data itself is unbiased.
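To make that effect concrete, here is a small numerical sketch, not the researchers' graph-based framework: with a causal mask and perfectly uniform attention, composing a few layers already concentrates influence on the earliest tokens. The token count, layer count, and uniform-attention assumption are illustrative choices, not values from the study.

```python
import numpy as np

def uniform_causal_attention(n):
    # Row i attends uniformly to positions 0..i: a causal mask with no
    # content preference at all.
    mask = np.tril(np.ones((n, n)))
    return mask / mask.sum(axis=1, keepdims=True)

n_tokens, n_layers = 12, 4
attn = uniform_causal_attention(n_tokens)

# Composing the same map across layers approximates how influence propagates
# through a stack of causally masked attention layers.
influence = np.linalg.matrix_power(attn, n_layers)

# How much each input position contributes to the final token's representation.
print(np.round(influence[-1], 3))
# The mass skews heavily toward position 0: the causal mask alone pulls
# influence toward the beginning of the sequence, even though no position
# was favored by the data.
```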

The researchers also discovered that positional encodings, which link words more strongly to nearby words, can help mitigate position bias. However, this effect can be diluted in models with multiple attention layers. They emphasize that addressing bias requires considering both model design and the characteristics of the training data.
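A similarly hedged sketch of the mitigating effect, continuing the toy setup above: adding a distance-based penalty to the attention logits, loosely in the spirit of relative positional schemes such as ALiBi, shifts influence back toward nearby tokens. The decay slope and the reduction of positional encodings to a single penalty term are simplifying assumptions for illustration only.

```python
import numpy as np

def causal_attention_with_distance_decay(n, slope=0.5):
    # Causal attention whose logits are penalized by token distance, a
    # simplified stand-in for distance-aware positional encodings.
    logits = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            logits[i, j] = -slope * (i - j)   # nearby tokens penalized less
        logits[i, i + 1:] = -np.inf           # causal mask
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    return weights / weights.sum(axis=1, keepdims=True)

n_tokens, n_layers = 12, 4
attn = causal_attention_with_distance_decay(n_tokens)
influence = np.linalg.matrix_power(attn, n_layers)
print(np.round(influence[-1], 3))
# The distance penalty shifts influence toward recent positions, offsetting
# some of the beginning-of-sequence skew seen in the previous sketch.
```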

Experiments conducted by the team demonstrated a “lost-in-the-middle” phenomenon, where retrieval accuracy followed a U-shaped pattern. Models performed best when the correct answer was at the beginning or end of a text sequence, with performance declining as the answer moved towards the middle.
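For readers who want to probe their own models, the sketch below shows one common way such a retrieval experiment is set up. It is a generic harness, not the MIT team's protocol; `query_model`, `build_prompt`, and the "secret code" task are hypothetical placeholders.

```python
# Hypothetical "lost-in-the-middle" sweep: insert the relevant fact at varying
# depths in a long distractor context and record retrieval accuracy per position.
def build_prompt(fact, distractors, position):
    docs = distractors[:position] + [fact] + distractors[position:]
    return "\n".join(docs) + "\nQuestion: What is the secret code?"

def run_position_sweep(query_model, fact, distractors, answer, n_positions=10):
    results = []
    step = max(1, len(distractors) // n_positions)
    for pos in range(0, len(distractors) + 1, step):
        prompt = build_prompt(fact, distractors, pos)
        correct = answer in query_model(prompt)  # query_model: any LLM call
        results.append((pos, int(correct)))
    # In practice each position is averaged over many trials; a U-shaped
    # accuracy curve over `pos` indicates position bias.
    return results
```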

Ultimately, the research suggests that using different masking techniques, reducing the number of attention layers, or strategically employing positional encodings could reduce position bias and improve model accuracy. This work provides valuable insights for developing more reliable and equitable AI systems.
