CUEBES

PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation

arXiv:2601.18777v1 Announce Type: new Abstract: Evaluating the quality of search, ranking and RAG systems traditionally requires a significant number of human relevance annotations. In recent...

Software Energy

arXiv CS Jan 28

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

arXiv:2601.18778v1 Announce Type: new Abstract: Can a model learn to escape its own learning plateau? Reinforcement learning methods for finetuning large reasoning models stall on...

Biology Engineering

arXiv CS Jan 28

POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration

arXiv:2601.18779v1 Announce Type: new Abstract: Reinforcement learning (RL) has improved the reasoning abilities of large language models (LLMs), yet state-of-the-art methods still fail to learn...

Software Artificial Intelligence

arXiv CS Jan 28

Multi-Objective Reinforcement Learning for Efficient Tactical Decision Making for Trucks in Highway Traffic

arXiv:2601.18783v1 Announce Type: new Abstract: Balancing safety, efficiency, and operational costs in highway driving poses a challenging decision-making problem for heavy-duty vehicles. A central difficulty...

Software Robotics

arXiv CS Jan 28

Design Techniques for LLM-Powered Interactive Storytelling: A Case Study of the Dramamancer System

arXiv:2601.18785v1 Announce Type: new Abstract: The rise of Large Language Models (LLMs) has enabled a new paradigm for bridging authorial intent and player agency in...

Technology Energy

arXiv CS Jan 28

Unsupervised Text Segmentation via Kernel Change-Point Detection on Sentence Embeddings

arXiv:2601.18788v1 Announce Type: new Abstract: Unsupervised text segmentation is crucial because boundary labels are expensive, subjective, and often fail to transfer across domains and granularity...

Artificial Intelligence Energy

arXiv CS Jan 28

MortalMATH: Evaluating the Conflict Between Reasoning Objectives and Emergency Contexts

arXiv:2601.18790v1 Announce Type: new Abstract: Large Language Models are increasingly optimized for deep reasoning, prioritizing the correct execution of complex tasks over general conversation. We...

Artificial Intelligence Biology

arXiv CS Jan 28

Subword-Based Comparative Linguistics across 242 Languages Using Wikipedia Glottosets

arXiv:2601.18791v1 Announce Type: new Abstract: We present a large-scale comparative study of 242 Latin and Cyrillic-script languages using subword-based methodologies. By constructing 'glottosets' from Wikipedia...

Genetics Software

arXiv CS Jan 28

MEGnifying Emotion: Sentiment Analysis from Annotated Brain Data

arXiv:2601.18792v1 Announce Type: new Abstract: Decoding emotion from brain activity could unlock a deeper understanding of the human experience. While a number of existing datasets...

Software Neuroscience

arXiv CS Jan 28

Handling Scope Checks (Extended Version)

arXiv:2601.18793v1 Announce Type: new Abstract: Metaprogramming and effect handlers interact in unexpected, and sometimes undesirable, ways. One example is scope extrusion: the generation of ill-scoped...

Software Biology

arXiv CS Jan 28

Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes

arXiv:2601.18795v1 Announce Type: new Abstract: Typical reinforcement learning (RL) methods for LLM reasoning waste compute on hard problems, where correct on-policy traces are rare, policy...

Artificial Intelligence Biology

arXiv CS Jan 28

ctELM: Decoding and Manipulating Embeddings of Clinical Trials with Embedding Language Models

arXiv:2601.18796v1 Announce Type: new Abstract: Text embeddings have become an essential part of a variety of language applications. However, methods for interpreting, exploring and reversing...

Software Medicine & Health

arXiv CS Jan 28

Weighted Birkhoff averages: Deterministic and probabilistic perspectives

arXiv:2505.03210v3 Announce Type: cross Abstract: In this paper, we survey physically related applications of a class of weighted quasi-Monte Carlo methods from a theoretical, deterministic...

Software Biology

arXiv CS Jan 28

Attributing and Exploiting Safety Vectors through Global Optimization in Large Language Models

arXiv:2601.15801v1 Announce Type: cross Abstract: While Large Language Models (LLMs) are aligned to mitigate risks, their safety guardrails remain fragile against jailbreak attacks. This reveals...

Software Artificial Intelligence

arXiv CS Jan 28

The Voice of Equity: A Systematic Evaluation of Bias Mitigation Techniques for Speech-Based Cognitive Impairment Detection Across Architectures and Demographics

arXiv:2601.16989v1 Announce Type: cross Abstract: Speech-based detection of cognitive impairment offers a scalable, non-invasive screening, yet algorithmic bias across demographic and linguistic subgroups remains critically...

Artificial Intelligence Technology

arXiv CS Jan 28

BickGraphing: Web-Based Application for Visual Inspection of Audio Recordings

arXiv:2601.17014v1 Announce Type: cross Abstract: BickGraphing is a browser-based research tool that enables visual inspection of acoustic recordings. The tool was built in support...

Software Technology

arXiv CS Jan 28

Regret-Driven Portfolios: LLM-Guided Smart Clustering for Optimal Allocation

arXiv:2601.17021v1 Announce Type: cross Abstract: We attempt to mitigate the persistent tradeoff between risk and return in medium- to long-term portfolio management. This paper proposes...

Software Artificial Intelligence

arXiv CS Jan 28

How Information Evolves: Stability-Driven Assembly and the Emergence of a Natural Genetic Algorithm

arXiv:2601.17061v1 Announce Type: cross Abstract: Information can evolve as a physical consequence of non-equilibrium dynamics, even in the absence of genes, replication, or predefined fitness...

Genetics Artificial Intelligence

arXiv CS Jan 28

PC-MCL: Patient-Consistent Multi-Cycle Learning with multi-label bias correction for respiratory sound classification

arXiv:2601.17080v1 Announce Type: cross Abstract: Automated respiratory sound classification supports the diagnosis of pulmonary diseases. However, many deep models still rely on cycle-level analysis and...

Software Biology

arXiv CS Jan 28

ChemNavigator: Agentic AI Discovery of Design Rules for Organic Photocatalysts

arXiv:2601.17084v1 Announce Type: cross Abstract: The discovery of high-performance organic photocatalysts for hydrogen evolution remains limited by the vastness of chemical space and the reliance...

Chemistry Software

arXiv CS Jan 28

Recovering Performance in Speech Emotion Recognition from Discrete Tokens via Multi-Layer Fusion and Paralinguistic Feature Integration

arXiv:2601.17085v1 Announce Type: cross Abstract: Discrete speech tokens offer significant advantages for storage and language model integration, but their application in speech emotion recognition (SER)...

Software Neuroscience

arXiv CS Jan 28

EveNet: A Foundation Model for Particle Collision Data Analysis

arXiv:2601.17126v1 Announce Type: cross Abstract: While deep learning is transforming data analysis in high-energy physics, computational challenges limit its potential. We address these challenges in...

Physics Quantum Computing

arXiv CS Jan 28

Logarithmic Density of Rank $\geq 1$ and Rank $\geq 2$ Genus-2 Jacobians and Applications to Hyperelliptic Curve Cryptography

arXiv:2601.17142v1 Announce Type: cross Abstract: In this work we study quantitative existence results for genus-$2$ curves over $\mathbb{Q}$ whose Jacobians have Mordell-Weil rank at least...

Software Quantum Computing

arXiv CS Jan 28

Falsifying Predictive Algorithm

arXiv:2601.17146v1 Announce Type: cross Abstract: Empirical investigations into unintended model behavior often show that the algorithm is predicting another outcome than what was intended. These...

Biology Software