LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision
arXiv:2505.18051v3
Abstract: Vision transformers are ever larger, more accurate, and more expensive to compute. The expense is even more extreme at high...