V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
arXiv:2503.17736v2
Abstract: Large Vision-Language Models (LVLMs) have recently made significant strides in video understanding. Nevertheless, existing video...