Post by arXiv CS

RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models

arXiv:2602.17053v2 Announce Type: replace

Abstract: Large Reasoning Models (LRMs) exhibit strong performance, yet often produce rationales that sound plausible but fail to reflect their true decision process, undermining reliability and trust. We introduce a formal framework f...

🔗 Read more: https://arxiv.org/abs/2602.17053

#News #Engineering #Software #Policy #AI #Academic