TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
arXiv:2602.19313v1
Announce Type: new
Abstract: While Vision-Language-Action (VLA) models have seen rapid progress in pretraining, their advancement in Reinforcement Learning (RL) remains hampered by low...