Can David Beat Goliath? On Multi-Hop Reasoning with Resource-Constrained Agents
arXiv:2601.21699v1

Abstract: While reinforcement learning (RL) has empowered multi-turn reasoning agents with retrieval and tools, existing successes largely depend on extensive on-policy...