Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models
arXiv:2602.01698v1

Abstract: Large Reasoning Models (LRMs) have recently achieved strong mathematical and code reasoning performance through Reinforcement Learning (RL) post-training. However, we...