Feb 10, 2026 Literature Review: Liars' Bench: Evaluating Lie Detectors for Language Models
Jun 25, 2025 Literature Review: RedCode: Risky Code Execution and Generation Benchmark for Code Agents