Literature Review: Unable To Forget: Proactive Interference Reveals Working Memory Limits In LLMs Beyond Context Length

The authors of this paper investigate how Large Language Models handle conflicting information within their context windows. By adapting the concept of Proactive Interference from cognitive science, where previously learned information disrupts the recall of newer information, they evaluate LLM retrieval capabilities on a sequence of continuously updated key-value pairs. The study demonstrates that as the amount of earlier interfering information increases, LLM retrieval accuracy degrades in a log-linear fashion, highlighting a fundamental capacity limit that operates independently of the model’s maximum context length.
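To make the setup concrete, here is a minimal sketch of a proactive-interference style task in the spirit of the paper: a shuffled stream of key-value updates where only the last value per key is correct. The function name, prompt wording, and value ranges are my own illustrative assumptions, not the paper's exact protocol.

```python
import random

def build_pi_prompt(n_keys: int, updates_per_key: int, seed: int = 0) -> tuple[str, dict]:
    """Build a proactive-interference style prompt: a shuffled stream of
    key-value updates, where only the *last* value per key is correct.
    (Illustrative sketch; the paper's exact format may differ.)"""
    rng = random.Random(seed)
    keys = [f"key_{i}" for i in range(n_keys)]
    # Each key is updated several times; earlier values act as distractors.
    stream = [(k, rng.randint(0, 999)) for k in keys for _ in range(updates_per_key)]
    rng.shuffle(stream)
    latest = {k: v for k, v in stream}  # dict keeps the final value per key
    lines = [f"{k} = {v}" for k, v in stream]
    query_key = rng.choice(keys)
    prompt = "\n".join(lines) + f"\nWhat is the current value of {query_key}?"
    return prompt, {"answer": latest[query_key], "key": query_key}
```

Scaling `updates_per_key` increases interference for a fixed retrieval target, which is the axis along which the paper reports log-linear accuracy decay.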

Key Insights

  1. Interference overrides recency and instructions. Model accuracy drops log-linearly toward zero as the number of interfering updates increases. This degradation occurs even when the target information sits at the very end of the input, minimizing search difficulty. The models are fundamentally unable to ignore irrelevant prior updates, and attempts to mitigate this with explicit natural language instructions, i.e., telling the model to "forget" previous updates, yield marginal or no improvement.

Figure: Model retrieval accuracy declines log-linearly as the number of interfering updates per key increases, demonstrating a universal limitation across architectures.

  2. Interference limits are independent of context length. The authors created a control condition in which total input length was held constant while the number of tracked keys varied. The models exhibited the same log-linear decay in accuracy, indicating that the difficulty in distinguishing similar information is driven by an independent interference-capacity constraint rather than simply by having too many tokens in the context window.

  3. Natural language prompts fail but session resets succeed. Directly instructing the model to forget specific prior keys causes retrieval errors to cluster around the position of the instruction in the sequence, actively reshaping the interference rather than mitigating it. Injecting a mock user-assistant QA dialogue to create an artificial task boundary, however, effectively signals a hard session reset. This "hack" bypasses the interference by framing prior updates as a closed batch, significantly outperforming natural language focusing techniques.
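The session-reset hack can be sketched as a simple prompt transformation. The dialogue wording below is my own guess at what such a boundary looks like; the paper's exact mock exchange may differ.

```python
def with_session_reset(update_lines: list[str], query: str) -> str:
    """Wrap prior updates behind a mock user-assistant QA exchange so the
    model treats them as a closed batch (the paper's 'session reset' idea).
    The dialogue text here is an illustrative assumption."""
    mock_dialogue = (
        "User: Thanks, that batch of updates is complete.\n"
        "Assistant: Understood. The previous updates are closed; "
        "I will only use information that follows.\n"
    )
    return "\n".join(update_lines) + "\n" + mock_dialogue + query
```

The point is structural rather than semantic: the QA turn creates a boundary the model treats as a hard reset, whereas an inline "please forget the above" instruction merely becomes another token in the interfering stream.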

Example

Consider a sequence of blood pressure readings in a clinical logging system, where the objective is to extract the most recent value. The prompt contains the sequence: "BP: 120, BP: 128 (10 min later), BP: 125". The desired output is the final value, 125. As the sequence extends to include hundreds of earlier blood pressure readings, the LLM becomes overwhelmed by the semantically similar distractors. Rather than pulling the correct final reading, the model's output distribution spreads, and it begins retrieving outdated values from much earlier in the prompt, or hallucinating values entirely, despite easily being able to locate the end of the text if prompted to do so.
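A generator for this kind of single-key distractor log might look like the sketch below; the function name and the distractor values are illustrative, not taken from the paper.

```python
def bp_log(n_earlier: int) -> tuple[str, int]:
    """Clinical-log style example: many earlier BP readings act as
    semantically similar distractors; only the final value is wanted.
    (Illustrative sketch with made-up reading values.)"""
    earlier = [f"BP: {115 + (i % 20)}" for i in range(n_earlier)]
    final = 125
    prompt = ", ".join(earlier + [f"BP: {final}"])
    prompt += "\nWhat is the most recent BP reading?"
    return prompt, final
```

Sweeping `n_earlier` over, say, 1 to 1000 reproduces the paper's independent variable: interference load for a single key, with the answer always at the end of the input.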

Ratings

Novelty: 2.5/5. The work is primarily an empirical extension of the already established "Lost in the Middle" phenomenon, and the mapping of human cognitive functions to stateless transformer architectures is questionable without further reasoning.

Clarity: 3.5/5. The experimental setups are well isolated and the data is presented clearly, though the theoretical framing relies on potentially unnecessary psychological analogies.

Personal Perspective

While the core idea of this research has merit, I am personally a bit wary of the connection to cognitive science and the anthropomorphizing of LLMs. Similar concepts have been explored in work on catastrophic forgetting during training and in numerous long-context studies, but there the analysis was framed mostly in terms of signal distillation rather than something more "human". Applying proactive interference as the central methodology assumes we should expect LLMs to exhibit human behavior, even though they process information in fundamentally different ways. LLMs do not think the way humans do; they are (currently) stateless machines that simply predict the next token based on massive training distributions. Given the original "Lost in the Middle" findings on length and position, it is a natural extension that the content itself, i.e., conflicting updates within the prompt, would also degrade retrieval. Transformers have no built-in mechanism that would make them inherently privilege recent updates over older ones, making this an expected limitation rather than some profound cognitive bottleneck.



