publications

2025

GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering

Jehyeok Yeon, Federico Cinus, Yifan Wu, and 1 more author

2025

Preprint. Under review.

Certifying Robustness of Agent Tool-Selection Under Adversarial Attacks

Jehyeok Yeon, Isha Chaudhary, and Gagandeep Singh

2025

The Fourteenth International Conference on Learning Representations (ICLR 2026 Agentic AI in the Wild Workshop).

TRAP: Targeted Redirecting of Agentic Preferences

Jehyeok Yeon*, Hangoo Kang*, and Gagandeep Singh

2025

Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

The Power of Friendship: Analyzing Leadership and Adversarial Attacks in Multi-Agent Collaboration

Jehyeok Yeon

2025

Poster accepted to ACM Collective Intelligence 2025; non-archival.