How does information access affect LLM monitors' ability to detect sabotage?
arXiv:2601.21112v1
Abstract: Frontier language model agents can exhibit misaligned behaviors, including deception, exploiting reward hacks, and pursuing hidden objectives. To control potentially...