ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models
arXiv:2510.10606v3
Abstract: Post-training Large Vision-and-Language Models (LVLMs) typically involves Supervised Fine-Tuning (SFT) for knowledge injection or Reinforcement Learning with Verifiable Rewards (RLVR)...