ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement Learning
arXiv:2602.02192v3 Announce Type: replace

Abstract: Reinforcement learning (RL) is a critical stage in post-training large language models (LLMs), involving repeated interaction between rollout generation, reward...