CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents
arXiv:2602.19547v1

Abstract: LLM-based code interpreter agents are increasingly deployed in critical workflows, yet their robustness against risks introduced by their code execution...