Sep 30, 2025 Literature Review: Knowledge Awareness and Hallucinations in Language Models
Sep 25, 2025 Literature Review: Scaling Monosemanticity – Extracting Interpretable Features from Claude 3 Sonnet
Sep 05, 2025 Literature Review: Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
Aug 16, 2025 Literature Review: The Hidden Dimensions of LLM Alignment
Aug 16, 2025 Literature Review: Refusal Behavior in Large Language Models: A Nonlinear Perspective
Aug 09, 2025 Literature Review: Cross-Modal Safety Mechanism Transfer in LVLMs (TGA)
Aug 03, 2025 Literature Review: Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts
Jul 19, 2025 Literature Review: Universal Jailbreak Suffixes Are Strong Attention Hijackers
Jul 13, 2025 Literature Review: SelfElicit - Your Language Model Secretly Knows Where is the Relevant Evidence
Jun 14, 2025 Literature Review: COSMIC: Generalized Refusal Direction Identification in LLM Activations
Jun 14, 2025 Literature Review: Layer-Gated Sparse Steering for Large Language Models
Jun 14, 2025 Literature Review: Auto-Patching: Enhancing Multi-Hop Reasoning in Language Models