| Mar 16, 2026 | Literature Review: Distinguishing Ignorance From Error In LLM Hallucinations |
| Mar 16, 2026 | Literature Review: Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning |
| Mar 16, 2026 | Literature Review: Language Model Circuits Are Sparse In The Neuron Basis |
| Mar 09, 2026 | Literature Review: Prompt Infection: LLM-To-LLM Prompt Injection Within Multi-Agent Systems |
| Mar 09, 2026 | Literature Review: Bound By Semanticity: Universal Laws Governing The Generalization-Identification Tradeoff |
| Feb 10, 2026 | Literature Review: Liars' Bench: Evaluating Lie Detectors for Language Models |
| Dec 31, 2025 | Literature Review: From Aleatoric to Epistemic: Exploring Uncertainty Quantification Techniques in Artificial Intelligence |
| Oct 18, 2025 | Literature Review: Shutdown Resistance in Large Language Models |
| Oct 13, 2025 | Literature Review: Searching for Privacy Risks in LLM Agents via Simulation |
| Oct 03, 2025 | Literature Review: Automating Steering for Safe Multimodal Large Language Models |
| Sep 30, 2025 | Literature Review: Knowledge Awareness and Hallucinations in Language Models |
| Sep 30, 2025 | Literature Review: Agentic Misalignment: How LLMs Could Be Insider Threats |
| Sep 25, 2025 | Literature Review: One Token to Fool LLM-as-a-Judge |
| Sep 25, 2025 | Literature Review: Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet |
| Sep 05, 2025 | Literature Review: Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base |
| Sep 05, 2025 | Literature Review: Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models |
| Aug 16, 2025 | Literature Review: The Hidden Dimensions of LLM Alignment |
| Aug 16, 2025 | Literature Review: Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment |
| Aug 16, 2025 | Literature Review: Refusal Behavior in Large Language Models: A Nonlinear Perspective |
| Aug 03, 2025 | Literature Review: Manifold Regularization for Locally Stable Deep Neural Networks |
| Jul 19, 2025 | Literature Review: Universal Jailbreak Suffixes Are Strong Attention Hijackers |
| Jul 05, 2025 | Literature Review: Teaching Language Models to Self-Improve by Learning from Language Feedback |
| Jul 05, 2025 | Literature Review: LLMs Unlock New Paths to Monetizing Exploits |
| Jun 25, 2025 | Literature Review: RedCode: Risky Code Execution and Generation Benchmark for Code Agents |
| Jun 21, 2025 | Literature Review: A Practical Memory Injection Attack against LLM Agents |
| Jun 14, 2025 | Literature Review: COSMIC: Generalized Refusal Direction Identification in LLM Activations |
| Jun 14, 2025 | Literature Review: Layer-Gated Sparse Steering for Large Language Models |
| May 19, 2025 | Literature Review: Attack and Defense Techniques in Large Language Models: A Survey and New Perspectives |
| May 19, 2025 | Literature Review: Large Language Models are Autonomous Cyber Defenders |
| May 12, 2025 | Literature Review: Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents |