LLM-WikiRace Benchmark: How Far Can LLMs Plan over Real-World Knowledge Graphs?
arXiv:2602.16902v2 Announce Type: replace
Abstract: We introduce LLM-Wikirace, a benchmark for evaluating planning, reasoning, and world knowledge in large language models (LLMs). In LLM-Wikirace, models must efficiently navigate Wikipedia hyperlinks step by step to reach a ta...
🔗 Read more: https://arxiv.org/abs/2602.16902
#News #AI #Software #WorldNews #Policy #Academic