Enhancing Multi-Image Understanding through Delimiter Token Scaling
arXiv:2602.01984v1 Announce Type: new
Abstract: Large Vision-Language Models (LVLMs) achieve strong performance on single-image tasks, but their performance declines when multiple images are provided as...