OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention
arXiv:2602.05847v1
Abstract: While humans perceive the world through diverse modalities that operate synergistically to support a holistic understanding of their surroundings, existing...