Sep 25, 2025 Literature Review: One Token to Fool LLM-as-a-Judge
Jul 05, 2025 Literature Review: On-Policy RL with Optimal Reward Baseline