2025

an archive of posts from this year

Sep 30, 2025 Literature Review: Knowledge Awareness and Hallucinations in Language Models
Sep 30, 2025 Literature Review: Sound and Complete Neurosymbolic Reasoning with LLM-Grounded Interpretations
Sep 30, 2025 Literature Review: Agentic Misalignment – How LLMs Could Be Insider Threats
Sep 30, 2025 Literature Review: Effective Red-Teaming of Policy-Adherent Agents
Sep 25, 2025 Literature Review: One Token to Fool LLM-as-a-Judge
Sep 25, 2025 Literature Review: Scaling Monosemanticity – Extracting Interpretable Features from Claude 3 Sonnet
Sep 25, 2025 Literature Review: Agent A/B — Automated and Scalable Web A/B Testing with Interactive LLM Agents
Sep 05, 2025 Literature Review: DreamDiffusion – Generating High-Quality Images from EEG Signals
Sep 05, 2025 Literature Review: Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Sep 05, 2025 Literature Review: Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
Aug 16, 2025 Literature Review: The Hidden Dimensions of LLM Alignment
Aug 16, 2025 Literature Review: Jailbreak Antidote – Runtime Safety-Utility Balance via Sparse Representation Adjustment
Aug 16, 2025 Literature Review: Refusal Behavior in Large Language Models: A Nonlinear Perspective
Aug 10, 2025 Literature Review: Context Rot — How Increasing Input Tokens Impacts LLM Performance
Aug 10, 2025 Literature Review: Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Aug 09, 2025 Literature Review: Hierarchical Reasoning Model
Aug 09, 2025 Literature Review: Cross-Modal Safety Mechanism Transfer in LVLMs (TGA)
Aug 03, 2025 Literature Review: Learning without training: The implicit dynamics of in-context learning
Aug 03, 2025 Literature Review: Manifold Regularization for Locally Stable Deep Neural Networks
Aug 03, 2025 Literature Review: Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts
Jul 19, 2025 Literature Review: AI Agent Behavioral Science - A New Paradigm for Understanding Autonomous Systems
Jul 19, 2025 Literature Review: Universal Jailbreak Suffixes Are Strong Attention Hijackers
Jul 19, 2025 Literature Review: AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
Jul 13, 2025 Literature Review: SelfElicit - Your Language Model Secretly Knows Where is the Relevant Evidence
Jul 13, 2025 Literature Review: A Survey on Latent Reasoning
Jul 05, 2025 Literature Review: Teaching Language Models to Self-Improve by Learning from Language Feedback
Jul 05, 2025 Literature Review: On-Policy RL with Optimal Reward Baseline
Jul 05, 2025 Literature Review: LLMs Unlock New Paths to Monetizing Exploits
Jun 28, 2025 Literature Review: Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning
Jun 25, 2025 Literature Review: RedCode: Risky Code Execution and Generation Benchmark for Code Agents
Jun 21, 2025 Literature Review: Prompt Injection Attack to Tool Selection in LLM Agents
Jun 21, 2025 Literature Review: A Practical Memory Injection Attack against LLM Agents
Jun 21, 2025 Creative: Blink and You'll Miss It
Jun 14, 2025 Literature Review: COSMIC: Generalized Refusal Direction Identification in LLM Activations
Jun 14, 2025 Literature Review: Layer-Gated Sparse Steering for Large Language Models
Jun 14, 2025 Literature Review: Beyond the 80/20 Rule – High-Entropy Minority Tokens Drive Effective RL for LLM Reasoning
Jun 14, 2025 Literature Review: Auto-Patching: Enhancing Multi-Hop Reasoning in Language Models
Jun 09, 2025 Literature Review: Thinkless: LLM Learns When to Think
Jun 09, 2025 Literature Review: DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies
Jun 09, 2025 Literature Review: Gaming Tool Preferences in Agentic LLMs
Jun 09, 2025 Literature Review: Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models
May 28, 2025 Literature Review: Programming Refusal with Conditional Activation Steering
May 28, 2025 Literature Review: Adversarial Search Engine Optimization for Large Language Models
May 28, 2025 Literature Review: Group Think - Collaborating at Token Level Granularity
May 28, 2025 Literature Review: Enhancing Latent Computation in Transformers with Latent Tokens
May 19, 2025 Literature Review: Attack and Defense Techniques in Large Language Models: A Survey and New Perspectives
May 19, 2025 Literature Review: Don't Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models
May 19, 2025 Literature Review: Large Language Models are Autonomous Cyber Defenders
May 19, 2025 Literature Review: REVEAL – Multi-turn Evaluation of Image-Input Harms for Vision LLMs
May 12, 2025 Literature Review: Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents
May 12, 2025 Literature Review: Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey
May 12, 2025 Literature Review: PolicyEvol-Agent: Evolving Policy via Environment Perception and Self-Awareness with Theory of Mind
May 11, 2025 Agentic AI: The New 'Groundbreaking Technology' of 2025
Apr 29, 2025 Literature Review: Bypassing Safety Guardrails in LLMs Using Humor
Apr 29, 2025 Literature Review: Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents
Apr 29, 2025 Literature Review: API Agents vs. GUI Agents: Divergence and Convergence
Apr 29, 2025 Literature Review: Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search
Apr 29, 2025 Literature Review: Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking
Apr 29, 2025 Literature Review: Agent Guide: A Simple Agent Behavioral Watermarking Framework
Feb 10, 2025 Reflection: It Builds Character but
Jan 21, 2025 Opinion: Escapism, Complacency, and the Inner Gigachad