Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
arXiv cs.AI · Hao Wang, Hanchen Li, Qiuyang Mang, Alvin Cheung, Koushik Sen, Dawn Song · 3d ago · ~2 min · 5/14/2026 · en
BenchJack audits AI agent benchmarks, revealing vulnerabilities to reward hacking and patching them to restore benchmark integrity.
Key Points
- Identifies eight flaw patterns in agent benchmarks.
- Synthesizes exploits that achieve high scores without completing the underlying task.
- Reduces the hackable-task ratio significantly through iterative patching.
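The key points imply a detect-exploit-patch loop. A minimal sketch of that shape follows; every function name and field below is a hypothetical stand-in, since the digest does not show BenchJack's actual interfaces.

```python
# Hypothetical audit loop in the shape the key points describe; none of
# these names come from the BenchJack paper itself.
def audit_benchmark(tasks, synthesize_exploit, run_checker, patch_task,
                    max_rounds=3):
    """Flag tasks whose checkers award a passing score without real task
    completion, then iteratively patch the flawed checkers."""
    hackable = []
    for round_idx in range(max_rounds):
        hackable = []
        for task in tasks:
            exploit = synthesize_exploit(task)        # adversarial trajectory
            score, completed = run_checker(task, exploit)
            if score >= task["pass_threshold"] and not completed:
                hackable.append(task)                 # checker was fooled
        print(f"round {round_idx}: {len(hackable)}/{len(tasks)} hackable")
        if not hackable:
            break
        for task in hackable:
            patch_task(task)                          # harden the flawed checker
    return hackable
```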
Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems
AI Summary
Invisible orchestrators in multi-agent LLM systems pose significant safety risks and affect behavior dynamics.
Signal Score
67 (≥75 high · 50–74 medium · <50 low)
Moderate signal: interesting but narrower impact.

Factor            Weight   Score
Source authority  20%      80
Community heat    20%      0
Technical impact  30%
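For orientation, a score like this is typically a weighted average of the factor scores. The sketch below combines only the rows visible above; their weights sum to 70% and the technical-impact score did not survive extraction, so the published total of 67 presumably includes factors missing here.

```python
# Weighted-average combination over the factor rows shown in the table.
# The missing rows and the technical-impact score are gaps in the
# extraction, not DeepSignal's real values.
factors = {
    # name: (weight, score 0-100)
    "source_authority": (0.20, 80),    # from the table above
    "community_heat":   (0.20, 0),     # from the table above
    "technical_impact": (0.30, None),  # score not captured
}

def signal_score(factors):
    known = {k: (w, s) for k, (w, s) in factors.items() if s is not None}
    total_weight = sum(w for w, _ in known.values())
    # Renormalize over the factors we actually have a score for.
    return sum(w * s for w, s in known.values()) / total_weight

print(round(signal_score(factors)))  # 40 over the two known rows alone
```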
arXiv cs.AI · Saharsh Koganti, Priyadarsi Mishra, Pierfrancesco Beneventano, Tomer Galanti · 2d ago
Distribution-Aware Algorithm Design with LLM Agents
AI Summary
The study presents a distribution-aware approach in which LLM agents generate solver code optimized for a target instance distribution.
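The one-line summary does not say how the distribution enters the design. One plausible reading, with `instance_sampler`, `llm_generate_solver`, and `evaluate` all illustrative stand-ins rather than the paper's API, is to select among LLM-generated solver candidates by their average cost on instances sampled from the target distribution.

```python
# Hypothetical sketch of distribution-aware solver selection: sample
# instances from the target distribution, have an LLM propose candidate
# solver code, and keep the candidate with the best empirical cost.
import random

def select_solver(instance_sampler, llm_generate_solver, evaluate,
                  n_candidates=5, n_instances=50):
    instances = [instance_sampler() for _ in range(n_instances)]
    best_code, best_cost = None, float("inf")
    for _ in range(n_candidates):
        # Condition generation on a few representative instances.
        code = llm_generate_solver(examples=random.sample(instances, 5))
        cost = sum(evaluate(code, inst) for inst in instances) / len(instances)
        if cost < best_cost:
            best_code, best_cost = code, cost
    return best_code
```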
Enhanced and Efficient Reasoning in Large Language Models
AI Summary
The paper proposes an efficient reasoning method for large language models, enhancing trust in generated content.
arXiv cs.CL · Mokshit Surana, Archit Rathod, Akshaj Satishkumar · 2d ago
Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study
AI Summary
This study evaluates DExperts for mitigating toxicity in LLMs, revealing strengths and weaknesses in safety and latency.
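DExperts (Liu et al., 2021) steers a base model at decoding time with an "expert" fine-tuned on non-toxic text and an "anti-expert" fine-tuned on toxic text; the extra forward passes per token are a natural source of the latency the study weighs against safety. A minimal sketch of the logit combination:

```python
import numpy as np

def dexperts_logits(base_logits, expert_logits, antiexpert_logits, alpha=2.0):
    """DExperts combination: z = z_base + alpha * (z_expert - z_antiexpert).

    Each argument is the next-token logit vector from one forward pass,
    so every decoding step costs three model calls instead of one."""
    return base_logits + alpha * (expert_logits - antiexpert_logits)

# Toy usage over a 4-token vocabulary.
rng = np.random.default_rng(0)
z = dexperts_logits(rng.normal(size=4), rng.normal(size=4), rng.normal(size=4))
probs = np.exp(z - z.max())
probs /= probs.sum()
next_token = rng.choice(4, p=probs)  # sample from the steered distribution
```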
arXiv cs.CL · Chengzhi Liu, Yichen Guo, Yepeng Liu, Yuzhe Yang, Qianqi Yan, Xuandong Zhao, Wenyue Hua, Sheng Liu, Sharon Li, Yuheng Bu, Xin Eric Wang · 2d ago
Auditing Agent Harness Safety
AI Summary
The HarnessAudit framework evaluates the safety of LLM agent execution harnesses, revealing risks in multi-agent systems.
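The summary names no mechanism, so the following is only a guess at the general shape of harness auditing, not HarnessAudit's actual design: intercept the harness's tool-execution path, log every call, and block calls that violate a policy. All names are hypothetical.

```python
# Purely illustrative: wrap an agent harness's tool-execution function
# so every call is logged and policy violations are blocked.
DENYLIST = {"rm -rf", "curl | sh"}  # toy policy for the sketch

def audited_execute(execute_tool, call_log):
    def wrapper(tool_name, arguments):
        violation = any(bad in str(arguments) for bad in DENYLIST)
        call_log.append({"tool": tool_name, "args": arguments,
                         "violation": violation})
        if violation:
            return "blocked by audit policy"
        return execute_tool(tool_name, arguments)
    return wrapper
```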
Why Featured
BenchJack's audit of AI agent benchmarks highlights critical vulnerabilities: it signals developers and PMs to strengthen security measures and prompts investors to weigh the implications for AI reliability and integrity.