Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis
arXiv:2601.20103v1 Announce Type: new
Abstract: Recent advances in reinforcement learning for code generation have made robust environments essential to prevent reward hacking. As LLMs increasingly...