VisualScratchpad: Inference-time Visual Concepts Analysis in Vision Language Models
arXiv:2603.07335v1
Announce Type: new
Abstract: High-performing vision language models still produce incorrect answers, yet their failure modes are often difficult to explain. To make model...