PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation
arXiv:2601.18777v1

Abstract: Evaluating the quality of search, ranking and RAG systems traditionally requires a significant number of human relevance annotations. In recent...