SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
arXiv:2506.01062v3

Abstract: We introduce SealQA, a new challenge benchmark for evaluating SEarch-Augmented Language models on fact-seeking questions where web search yields conflicting,...