Jehyeok Yeon
Oh, hey there! I’m Jehyeok Yeon, an incoming PhD candidate at the Max Planck Institute for Intelligent Systems, working with Maksym Andriushchenko, with a focus on building intelligent systems that are both technically robust and socially responsible. I was previously at the University of Illinois Urbana-Champaign, where I was advised by Professor Gagandeep Singh at the FOCAL Lab.
My current research focuses on scalable oversight and “superalignment”. As AI models become more capable, I believe it’s more important than ever that we can keep evaluating and analyzing them, even if they become smarter than us. This could mean creating more difficult tasks with no ground-truth answer, finding scalable ways to control and interpret models, or reverse-engineering their internal representations to verify that their hidden objectives match our instructions.
Outside of research, I care a lot about writing, both creative and academic, and how it shapes the way we think. I spend a lot of time walking and thinking through ideas, and I try to catch theatre performances whenever I can. There’s something powerful about live storytelling that reminds me why I study intelligence in the first place. When I’m not writing or watching something live, I’m usually reading something, most recently Moby Dick by Herman Melville.
The fastest way to reach me is through tommy8289@gmail.com. Feel free to reach out if you have any interesting ideas to throw around!
news
| Date | News |
|---|---|
| Jun 17, 2026 | InferenceBench accepted to the Agents in the Wild (AIWILD) workshop at ICML 2026 as a Spotlight! |
| May 28, 2026 | Excited to be joining London AI Safety Research Labs as a Research Scholar this summer, working on the Science of Evaluations with the UK AI Security Institute! |
| May 28, 2026 | GSAE accepted to the AI4GOOD workshop at ICML 2026! |
| May 5, 2026 | FlowGuard accepted to ICML 2026 as a Spotlight paper! |
| Apr 13, 2026 | Offered a fellowship through the NSF Graduate Research Fellowship Program! |
selected publications
2026
- Securing Multimodal AI through Internal Information Decomposition. Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), 2026. Spotlight.
- InferenceBench: A Benchmark for Open-Ended LLM Inference Optimization by AI Agents. ICML 2026 Agents in the Wild (AIWILD) Workshop, 2026. Spotlight.
- Certifying Robustness of Agent Tool-Selection Under Adversarial Attacks. ICLR 2026 Agentic AI in the Wild (AIWILD) Workshop, 2026.
2025
- TRAP: Targeted Redirecting of Agentic Preferences. Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), 2025.