publications

2026

  1. Securing Multimodal AI through Internal Information Decomposition
    Jehyeok Yeon, Hyeonjeong Ha, Qiusi Zhan, and 1 more author
    Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), 2026
    Spotlight
  2. InferenceBench: A Benchmark for Open-Ended LLM Inference Optimization by AI Agents
    Jehyeok Yeon, Ben Rank, and Maksym Andriushchenko
    ICML 2026 Agents in the Wild (AIWILD) Workshop, 2026
    Spotlight
  3. Certifying Robustness of Agent Tool-Selection Under Adversarial Attacks
    Jehyeok Yeon, Isha Chaudhary, and Gagandeep Singh
    ICLR 2026 Agentic AI in the Wild (AIWILD) Workshop, 2026
  4. GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering
    Jehyeok Yeon, Federico Cinus, Yifan Wu, and 1 more author
    ICML 2026 AI4GOOD Workshop, 2026
  5. ResearchArena: Evaluating Sabotage and Monitoring in Automated AI R&D
    Ben Rank*, Lena Libon*, Jehyeok Yeon, and 5 more authors
    2026
    Preprint. Under review. * Equal contribution.

2025

  1. TRAP: Targeted Redirecting of Agentic Preferences
    Jehyeok Yeon*, Hangoo Kang*, and Gagandeep Singh
    Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), 2025
  2. The Power of Friendship: Analyzing Leadership and Adversarial Attacks in Multi-Agent Collaboration
    Jehyeok Yeon and Lawrence Angrave
    2025
    Poster accepted at ACM Collective Intelligence 2025; non-archival