IDE-Bench: Evaluating Large Language Models as IDE Agents on Real-World Software Engineering Tasks
arXiv:2601.20886v1 (Announce Type: new)

Abstract: IDE-Bench is a comprehensive framework for evaluating AI IDE agents on real-world software engineering tasks through an IDE-native tool interface....