Sep 30, 2025 | Literature Review: Knowledge Awareness and Hallucinations in Language Models |
Sep 30, 2025 | Literature Review: Sound and Complete Neurosymbolic Reasoning with LLM-Grounded Interpretations |
Sep 30, 2025 | Literature Review: Agentic Misalignment – How LLMs Could Be Insider Threats |
Sep 30, 2025 | Literature Review: Effective Red-Teaming of Policy-Adherent Agents |
Sep 25, 2025 | Literature Review: One Token to Fool LLM-as-a-Judge |
Sep 25, 2025 | Literature Review: Scaling Monosemanticity – Extracting Interpretable Features from Claude 3 Sonnet |
Sep 25, 2025 | Literature Review: Agent A/B — Automated and Scalable Web A/B Testing with Interactive LLM Agents |
Sep 05, 2025 | Literature Review: DreamDiffusion – Generating High-Quality Images from EEG Signals |
Sep 05, 2025 | Literature Review: Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base |
Sep 05, 2025 | Literature Review: Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models |
Aug 16, 2025 | Literature Review: The Hidden Dimensions of LLM Alignment |
Aug 16, 2025 | Literature Review: Jailbreak Antidote – Runtime Safety-Utility Balance via Sparse Representation Adjustment |
Aug 16, 2025 | Literature Review: Refusal Behavior in Large Language Models: A Nonlinear Perspective |
Aug 10, 2025 | Literature Review: Context Rot — How Increasing Input Tokens Impacts LLM Performance |
Aug 10, 2025 | Literature Review: Dissecting Recall of Factual Associations in Auto-Regressive Language Models |
Aug 09, 2025 | Literature Review: Hierarchical Reasoning Model |
Aug 09, 2025 | Literature Review: Cross-Modal Safety Mechanism Transfer in LVLMs (TGA) |
Aug 03, 2025 | Literature Review: Learning without training: The implicit dynamics of in-context learning |
Aug 03, 2025 | Literature Review: Manifold Regularization for Locally Stable Deep Neural Networks |
Aug 03, 2025 | Literature Review: Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts |
Jul 19, 2025 | Literature Review: AI Agent Behavioral Science - A New Paradigm for Understanding Autonomous Systems |
Jul 19, 2025 | Literature Review: Universal Jailbreak Suffixes Are Strong Attention Hijackers |
Jul 19, 2025 | Literature Review: AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench |
Jul 13, 2025 | Literature Review: SelfElicit - Your Language Model Secretly Knows Where is the Relevant Evidence |
Jul 13, 2025 | Literature Review: A Survey on Latent Reasoning |
Jul 05, 2025 | Literature Review: Teaching Language Models to Self-Improve by Learning from Language Feedback |
Jul 05, 2025 | Literature Review: On-Policy RL with Optimal Reward Baseline |
Jul 05, 2025 | Literature Review: LLMs Unlock New Paths to Monetizing Exploits |
Jun 28, 2025 | Literature Review: Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning |
Jun 25, 2025 | Literature Review: RedCode: Risky Code Execution and Generation Benchmark for Code Agents |
Jun 21, 2025 | Literature Review: Prompt Injection Attack to Tool Selection in LLM Agents |
Jun 21, 2025 | Literature Review: A Practical Memory Injection Attack against LLM Agents |
Jun 14, 2025 | Literature Review: COSMIC: Generalized Refusal Direction Identification in LLM Activations |
Jun 14, 2025 | Literature Review: Layer-Gated Sparse Steering for Large Language Models |
Jun 14, 2025 | Literature Review: Beyond the 80/20 Rule – High-Entropy Minority Tokens Drive Effective RL for LLM Reasoning |
Jun 14, 2025 | Literature Review: Auto-Patching: Enhancing Multi-Hop Reasoning in Language Models |
Jun 09, 2025 | Literature Review: Thinkless: LLM Learns When to Think |
Jun 09, 2025 | Literature Review: DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies |
Jun 09, 2025 | Literature Review: Gaming Tool Preferences in Agentic LLMs |
Jun 09, 2025 | Literature Review: Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models |
May 28, 2025 | Literature Review: Programming Refusal with Conditional Activation Steering |
May 28, 2025 | Literature Review: Adversarial Search Engine Optimization for Large Language Models |
May 28, 2025 | Literature Review: Group Think - Collaborating at Token Level Granularity |
May 28, 2025 | Literature Review: Enhancing Latent Computation in Transformers with Latent Tokens |
May 19, 2025 | Literature Review: Attack and Defense Techniques in Large Language Models: A Survey and New Perspectives |
May 19, 2025 | Literature Review: Don't Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models |
May 19, 2025 | Literature Review: Large Language Models are Autonomous Cyber Defenders |
May 19, 2025 | Literature Review: REVEAL – Multi-turn Evaluation of Image-Input Harms for Vision LLMs |
May 12, 2025 | Literature Review: Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents |
May 12, 2025 | Literature Review: Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey |
May 12, 2025 | Literature Review: PolicyEvol-Agent: Evolving Policy via Environment Perception and Self-Awareness with Theory of Mind |
May 11, 2025 | Agentic AI: The New 'Groundbreaking Technology' of 2025 |
Apr 29, 2025 | Literature Review: Bypassing Safety Guardrails in LLMs Using Humor |
Apr 29, 2025 | Literature Review: Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents |
Apr 29, 2025 | Literature Review: API Agents vs. GUI Agents: Divergence and Convergence |
Apr 29, 2025 | Literature Review: Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search |
Apr 29, 2025 | Literature Review: Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking |
Apr 29, 2025 | Literature Review: Agent Guide: A Simple Agent Behavioral Watermarking Framework |