publications

2025

  1. GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering
    Jehyeok Yeon, Federico Cinus, Yifan Wu, and 1 more author
    2025
    Preprint. Under review.
  2. Certifying Robustness of Agent Tool-Selection Under Adversarial Attacks
    Jehyeok Yeon, Isha Chaudhary, and Gagandeep Singh
    2025
    Preprint. Under review.
  3. TRAP: Targeted Redirecting of Agentic Preferences
    Jehyeok Yeon*, Hangoo Kang*, and Gagandeep Singh
    2025
    Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
  4. The Power of Friendship: Analyzing Leadership and Adversarial Attacks in Multi-Agent Collaboration
    Jehyeok Yeon
    2025
    Poster accepted to ACM Collective Intelligence 2025; Non-archival