AI Safety

An archive of posts with this tag

Feb 10, 2026 Literature Review: Don't Think of the White Bear: Ironic Negation in Transformer Models Under Cognitive Load
Sep 30, 2025 Literature Review: Agentic Misalignment – How LLMs Could Be Insider Threats
Sep 05, 2025 Literature Review: Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
Jun 28, 2025 Literature Review: Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning
Jun 25, 2025 Literature Review: RedCode: Risky Code Execution and Generation Benchmark for Code Agents
Jun 14, 2025 Literature Review: COSMIC: Generalized Refusal Direction Identification in LLM Activations
Jun 09, 2025 Literature Review: Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models
May 28, 2025 Literature Review: Programming Refusal with Conditional Activation Steering
May 19, 2025 Literature Review: REVEAL – Multi-turn Evaluation of Image-Input Harms for Vision LLMs
Apr 29, 2025 Literature Review: Bypassing Safety Guardrails in LLMs Using Humor
Apr 29, 2025 Literature Review: Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search
Apr 29, 2025 Literature Review: Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking