Adversarial AI

An archive of posts with this tag.

Sep 30, 2025 Literature Review: Effective Red-Teaming of Policy-Adherent Agents
Sep 05, 2025 Literature Review: Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
Aug 16, 2025 Literature Review: The Hidden Dimensions of LLM Alignment
Aug 16, 2025 Literature Review: Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment
Aug 09, 2025 Literature Review: Cross-Modal Safety Mechanism Transfer in LVLMs (TGA)
Jul 19, 2025 Literature Review: Universal Jailbreak Suffixes Are Strong Attention Hijackers
Jul 05, 2025 Literature Review: LLMs Unlock New Paths to Monetizing Exploits
Jun 28, 2025 Literature Review: Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning
Jun 25, 2025 Literature Review: RedCode: Risky Code Execution and Generation Benchmark for Code Agents
Jun 21, 2025 Literature Review: Prompt Injection Attack to Tool Selection in LLM Agents
Jun 21, 2025 Literature Review: A Practical Memory Injection Attack against LLM Agents
Jun 14, 2025 Literature Review: COSMIC: Generalized Refusal Direction Identification in LLM Activations
Jun 09, 2025 Literature Review: Gaming Tool Preferences in Agentic LLMs
Jun 09, 2025 Literature Review: Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models
May 19, 2025 Literature Review: Attack and Defense Techniques in Large Language Models: A Survey and New Perspectives
May 19, 2025 Literature Review: REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLMs