publications
2026
- Securing Multimodal AI through Internal Information Decomposition. Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), 2026. Spotlight.
2025
- GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering. 2025. Preprint. Under review.
- Certifying Robustness of Agent Tool-Selection Under Adversarial Attacks. ICLR 2026 Agentic AI in the Wild Workshop, 2025.
- TRAP: Targeted Redirecting of Agentic Preferences. Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), 2025.
- The Power of Friendship: Analyzing Leadership and Adversarial Attacks in Multi-Agent Collaboration. 2025. Poster accepted to ACM Collective Intelligence 2025; non-archival.