research | Jehyeok Yeon

Mar 16, 2026	Literature Review: Distinguishing Ignorance From Error In LLM Hallucinations
Mar 16, 2026	Literature Review: Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning
Mar 16, 2026	Literature Review: Language Model Circuits Are Sparse In The Neuron Basis
Mar 10, 2026	Literature Review: Who's in Charge? Disempowerment Patterns in Real-World LLM Usage
Mar 09, 2026	Literature Review: Gradual Disempowerment: Systemic Existential Risks From Incremental AI Development
Mar 09, 2026	Literature Review: Automatic Prompt Optimization With "Gradient Descent" And Beam Search
Mar 09, 2026	Literature Review: Prompt Infection: LLM-To-LLM Prompt Injection Within Multi-Agent Systems
Mar 09, 2026	Literature Review: Unable To Forget: Proactive Interference Reveals Working Memory Limits In LLMs Beyond Context Length
Mar 09, 2026	Literature Review: Bound By Semanticity: Universal Laws Governing The Generalization-Identification Tradeoff
Feb 10, 2026	Literature Review: Don't Think of the White Bear: Ironic Negation in Transformer Models Under Cognitive Load
Feb 10, 2026	Literature Review: Liars' Bench: Evaluating Lie Detectors for Language Models
Feb 10, 2026	Literature Review: A Theory of Unsupervised Translation Motivated by Understanding Animal Communication
Dec 31, 2025	Literature Review: Token Embeddings Violate the Manifold Hypothesis
Dec 31, 2025	Literature Review: From Aleatoric to Epistemic: Exploring Uncertainty Quantification Techniques in Artificial Intelligence
Dec 31, 2025	Literature Review: Research Robots: When AIs Experiment on Us
Dec 31, 2025	Literature Review: Language Models are Injective and Hence Invertible
Nov 29, 2025	Literature Review: Words That Make Language Models Perceive
Nov 23, 2025	Literature Review: Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals
Nov 23, 2025	Literature Review: PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding
Nov 22, 2025	Literature Review: CO3: Contrasting Concepts Compose Better
Nov 22, 2025	Literature Review: MAGIC: Near-Optimal Data Attribution for Deep Learning
Nov 22, 2025	Literature Review: DAUNCE: Data Attribution through Uncertainty Estimation
Nov 22, 2025	Literature Review: Reasoning Models Don't Always Say What They Think
Oct 26, 2025	Literature Review: Anthropic's Project V.E.N.D. 1
Oct 26, 2025	Literature Review: In-Context Learning and Induction Heads
Oct 21, 2025	Literature Review: Persona Vectors: Monitoring and Controlling Character Traits in Language Models
Oct 21, 2025	Literature Review: Fresh in Memory: Training-Order Recency is Linearly Encoded in Language Model Activations
Oct 21, 2025	Literature Review: REFRAG — Rethinking RAG-Based Decoding
Oct 18, 2025	Literature Review: Shutdown Resistance in Large Language Models
Oct 18, 2025	Literature Review: Uni-LoRA — One Vector is All You Need
Oct 13, 2025	Literature Review: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Oct 13, 2025	Literature Review: Searching for Privacy Risks in LLM Agents via Simulation
Oct 13, 2025	Literature Review: R-Zero: Self-Evolving Reasoning LLM from Zero Data
Oct 03, 2025	Literature Review: VLMBias — Counterfactual Probing of Vision-Language Model Bias
Oct 03, 2025	Literature Review: Automating Steering for Safe Multimodal Large Language Models
Sep 30, 2025	Literature Review: Knowledge Awareness and Hallucinations in Language Models
Sep 30, 2025	Literature Review: Sound and Complete Neurosymbolic Reasoning with LLM-Grounded Interpretations
Sep 30, 2025	Literature Review: Agentic Misalignment – How LLMs Could Be Insider Threats
Sep 30, 2025	Literature Review: Effective Red-Teaming of Policy-Adherent Agents
Sep 25, 2025	Literature Review: One Token to Fool LLM-as-a-Judge
Sep 25, 2025	Literature Review: Scaling Monosemanticity – Extracting Interpretable Features from Claude 3 Sonnet
Sep 25, 2025	Literature Review: Agent A/B — Automated and Scalable Web A/B Testing with Interactive LLM Agents
Sep 05, 2025	Literature Review: DreamDiffusion – Generating High-Quality Images from EEG Signals
Sep 05, 2025	Literature Review: Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Sep 05, 2025	Literature Review: Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
Aug 16, 2025	Literature Review: The Hidden Dimensions of LLM Alignment
Aug 16, 2025	Literature Review: Jailbreak Antidote – Runtime Safety-Utility Balance via Sparse Representation Adjustment
Aug 16, 2025	Literature Review: Refusal Behavior in Large Language Models: A Nonlinear Perspective
Aug 10, 2025	Literature Review: Context Rot — How Increasing Input Tokens Impacts LLM Performance
Aug 10, 2025	Literature Review: Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Aug 09, 2025	Literature Review: Hierarchical Reasoning Model
Aug 09, 2025	Literature Review: Cross-Modal Safety Mechanism Transfer in LVLMs (TGA)
Aug 03, 2025	Literature Review: Learning without training: The implicit dynamics of in-context learning
Aug 03, 2025	Literature Review: Manifold Regularization for Locally Stable Deep Neural Networks
Aug 03, 2025	Literature Review: Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts
Jul 19, 2025	Literature Review: AI Agent Behavioral Science - A New Paradigm for Understanding Autonomous Systems
Jul 19, 2025	Literature Review: Universal Jailbreak Suffixes Are Strong Attention Hijackers
Jul 19, 2025	Literature Review: AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
Jul 13, 2025	Literature Review: SelfElicit - Your Language Model Secretly Knows Where is the Relevant Evidence
Jul 13, 2025	Literature Review: A Survey on Latent Reasoning
Jul 05, 2025	Literature Review: Teaching Language Models to Self-Improve by Learning from Language Feedback
Jul 05, 2025	Literature Review: On-Policy RL with Optimal Reward Baseline
Jul 05, 2025	Literature Review: LLMs Unlock New Paths to Monetizing Exploits
Jun 28, 2025	Literature Review: Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning
Jun 25, 2025	Literature Review: RedCode: Risky Code Execution and Generation Benchmark for Code Agents
Jun 21, 2025	Literature Review: Prompt Injection Attack to Tool Selection in LLM Agents
Jun 21, 2025	Literature Review: A Practical Memory Injection Attack against LLM Agents
Jun 14, 2025	Literature Review: COSMIC: Generalized Refusal Direction Identification in LLM Activations
Jun 14, 2025	Literature Review: Layer-Gated Sparse Steering for Large Language Models
Jun 14, 2025	Literature Review: Beyond the 80/20 Rule – High-Entropy Minority Tokens Drive Effective RL for LLM Reasoning
Jun 14, 2025	Literature Review: Auto-Patching: Enhancing Multi-Hop Reasoning in Language Models
Jun 09, 2025	Literature Review: Thinkless: LLM Learns When to Think
Jun 09, 2025	Literature Review: DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies
Jun 09, 2025	Literature Review: Gaming Tool Preferences in Agentic LLMs
Jun 09, 2025	Literature Review: Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models
May 28, 2025	Literature Review: Programming Refusal with Conditional Activation Steering
May 28, 2025	Literature Review: Adversarial Search Engine Optimization for Large Language Models
May 28, 2025	Literature Review: Group Think - Collaborating at Token Level Granularity
May 28, 2025	Literature Review: Enhancing Latent Computation in Transformers with Latent Tokens
May 19, 2025	Literature Review: Attack and Defense Techniques in Large Language Models: A Survey and New Perspectives
May 19, 2025	Literature Review: Don't Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models
May 19, 2025	Literature Review: Large Language Models are Autonomous Cyber Defenders
May 19, 2025	Literature Review: REVEAL – Multi-turn Evaluation of Image-Input Harms for Vision LLMs
May 12, 2025	Literature Review: Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents
May 12, 2025	Literature Review: Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey
May 12, 2025	Literature Review: PolicyEvol-Agent: Evolving Policy via Environment Perception and Self-Awareness with Theory of Mind
May 11, 2025	Agentic AI: The New 'Groundbreaking Technology' of 2025
Apr 29, 2025	Literature Review: Bypassing Safety Guardrails in LLMs Using Humor
Apr 29, 2025	Literature Review: Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents
Apr 29, 2025	Literature Review: API Agents vs. GUI Agents: Divergence and Convergence
Apr 29, 2025	Literature Review: Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search
Apr 29, 2025	Literature Review: Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking
Apr 29, 2025	Literature Review: Agent Guide: A Simple Agent Behavioral Watermarking Framework