GRRM: Group Relative Reward Modeling for Machine Translation
arXiv:2602.14028v1 Announce Type: new Abstract: While Group Relative Policy Optimization (GRPO) offers a powerful framework for LLM post-training, its effectiveness in open-ended domains like Machine...