Visual Attention Reasoning via Hierarchical Search and Self-Verification
arXiv:2510.18619v4 (Announce Type: replace)

Abstract: Multimodal Large Language Models (MLLMs) frequently hallucinate due to their reliance on fragile, linear reasoning and weak visual grounding. We...