Sep 30, 2025  Literature Review: Agentic Misalignment – How LLMs Could Be Insider Threats
Sep 05, 2025  Literature Review: Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
Jun 28, 2025  Literature Review: Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning
Jun 25, 2025  Literature Review: RedCode: Risky Code Execution and Generation Benchmark for Code Agents
Jun 14, 2025  Literature Review: COSMIC: Generalized Refusal Direction Identification in LLM Activations
Jun 09, 2025  Literature Review: Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models
May 28, 2025  Literature Review: Programming Refusal with Conditional Activation Steering
May 19, 2025  Literature Review: REVEAL – Multi-turn Evaluation of Image-Input Harms for Vision LLMs
Apr 29, 2025  Literature Review: Bypassing Safety Guardrails in LLMs Using Humor
Apr 29, 2025  Literature Review: Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search
Apr 29, 2025  Literature Review: Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking