Literature Review: A Practical Memory Injection Attack against LLM Agents
This paper introduces MINJA, a novel Memory INJection Attack against LLM-based agents. Unlike prior attacks that target end users directly, MINJA enables an attacker without privileged access to inject malicious records into an agent’s memory bank via crafted queries and output observations. Once the memory is poisoned, subsequent legitimate queries retrieve these malicious records, leading the agent to perform harmful actions for all future end users :contentReference{index=0}.
Key Insights
In “A Practical Memory Injection Attack against LLM Agents,” the authors introduce MINJA, a novel adversarial technique that targets the memory banks relied upon by agentic systems. Instead of making prompts to coerce an agent into performing malicious actions for the attacker or other end users, MINJA stealthily injects poisoned records into the agent’s long‐term memory via carefully constructed query–response interactions. Once embedded, these malicious memories are retrieved during subsequent legitimate queries, causing the agent to carry out harmful behavior for all future users who trigger that memory key. This work highlights a systemic vulnerability in trusting unverified memory components within LLM‐based agents.
The core technical insight behind MINJA lies in its multi‐step “bridging” method. By designing intermediate reasoning steps that logically connect an innocuous user query to a hidden malicious objective, the attacker overcomes the semantic gap between benign input and harmful output. An attached “indication prompt” then instructs the agent to record this engineered reasoning chain in its memory store. To evade detection, the authors apply a “progressive shortening” strategy, iteratively removing overt cues from the prompt so that the final memory entry appears indistinguishable from normal user‐agent exchanges. Their experiments that are conducted across three distinct agent tasks and multiple victim target term pairs demonstrate that MINJA achieves high injection success rates without any privileged access, thereby illustrating a practical threat scenario for deployed agentic AI.
Example
In a medical-assistant agent, a benign query containing PatientID “12345” is transformed via bridging steps into a reasoning chain designed for PatientID “67890.” By carefully crafting the indication prompt and leveraging progressive shortening, the malicious record is injected when the legitimate user later queries PatientID “12345,” causing the agent to provide medication instructions intended for PatientID “67890,” which could be fatal in practice :contentReference.
Figure: The MINJA attack pipeline, illustrating how bridging steps and indication prompts lead to malicious record injection in an LLM agent's memory bank.
Ratings
Novelty: 4/5
The attack highlights a previously underexplored vulnerability in agentic AI, poisoning the agent’s own memory to target all users, which is a significant extension beyond user-specific adversarial prompts.
Clarity: 3/5
The methodology is well-structured, but the detailed mechanics of bridging-step design and progressive shortening may require careful reading to reproduce.
Personal Comments
Personally, I find this paper unsettling. It’s going back to the early days of adversarial training research, where loopholes in learning systems were exposed one by one, only now the loophole resides within the agent’s own knowledge store. The progressive shortening technique is particularly elegant, yet I worry about its resilience under memory‐sanitization defenses or encrypted storage schemes. Future work should explore formal methods for certifying memory integrity and automated detection of anomalous record patterns, perhaps drawing on quantitative certification frameworks to offer statistical guarantees of safety. This study raises profound questions about how we establish trust in autonomous systems and highlights the urgent need for robust safeguards around internal memory management.
Enjoy Reading This Article?
Here are some more articles you might like to read next: