CUEBES

Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis

arXiv:2601.20103v1 Announce Type: new Abstract: Recent advances in reinforcement learning for code generation have made robust environments essential to prevent reward hacking. As LLMs increasingly...

Artificial Intelligence Environment

arXiv CS Jan 29

NucFuseRank: Dataset Fusion and Performance Ranking for Nuclei Instance Segmentation

arXiv:2601.20104v1 Announce Type: new Abstract: Nuclei instance segmentation in hematoxylin and eosin (H&E)-stained images plays an important role in automated histological image analysis, with various...

Artificial Intelligence Software

arXiv CS Jan 29

FFE-Hallu: Hallucinations in Fixed Figurative Expressions: Benchmark of Idioms and Proverbs in the Persian Language

arXiv:2601.20105v1 Announce Type: new Abstract: Figurative language, particularly fixed figurative expressions (FFEs) such as idioms and proverbs, poses persistent challenges for large language models (LLMs)....

Artificial Intelligence Policy

arXiv CS Jan 29

Are We All Using Agents the Same Way? An Empirical Study of Core and Peripheral Developers' Use of Coding Agents

arXiv:2601.20106v1 Announce Type: new Abstract: Autonomous AI agents are transforming software development and redefining how developers collaborate with AI. Prior research shows that the adoption...

Software Robotics

arXiv CS Jan 29

Look in the Middle: Structural Anchor Pruning for Scalable Visual RAG Indexing

arXiv:2601.20107v1 Announce Type: new Abstract: Recent Vision-Language Models (e.g., ColPali) enable fine-grained Visual Document Retrieval (VDR) but incur prohibitive index vector size overheads. Training-free pruning...

Engineering Software

arXiv CS Jan 29

Beyond Bug Fixes: An Empirical Investigation of Post-Merge Code Quality Issues in Agent-Generated Pull Requests

arXiv:2601.20109v1 Announce Type: new Abstract: The increasing adoption of AI coding agents has increased the number of agent-generated pull requests (PRs) merged with little or...

Software Policy

arXiv CS Jan 29

Usage, Effects and Requirements for AI Coding Assistants in the Enterprise: An Empirical Study

arXiv:2601.20112v1 Announce Type: new Abstract: The rise of large language models (LLMs) has accelerated the development of automated techniques and tools for supporting various software...

Software Technology

arXiv CS Jan 29

A Data-Informed Local Subspaces Method for Error-Bounded Lossy Compression of Large-Scale Scientific Datasets

arXiv:2601.20113v1 Announce Type: new Abstract: The growing volume of scientific simulation data presents a significant challenge for storage and transfer. Error-bounded lossy compression has emerged...

Technology Environment

arXiv CS Jan 29

How Much Progress Has There Been in NVIDIA Datacenter GPUs?

arXiv:2601.20115v1 Announce Type: new Abstract: Graphics Processing Units (GPUs) are the state-of-the-art architecture for essential tasks, ranging from rendering 2D/3D graphics to accelerating workloads in...

World News Policy

arXiv CS Jan 29

In-Context Reinforcement Learning From Suboptimal Historical Data

arXiv:2601.20116v1 Announce Type: new Abstract: Transformer models have achieved remarkable empirical successes, largely due to their in-context learning capabilities. Inspired by this, we explore training...

Policy Psychology

arXiv CS Jan 29

A Reinforcement Learning Based Universal Sequence Design for Polar Codes

arXiv:2601.20118v1 Announce Type: new Abstract: To advance Polar code design for 6G applications, we develop a reinforcement learning-based universal sequence design framework that is extensible...

Software Policy

arXiv CS Jan 29

Improving Smoothed Aggregation AMG Robustness on Stretched Mesh Applications

arXiv:2601.20119v1 Announce Type: new Abstract: Strength-of-connection algorithms play a key role in algebraic multigrid (AMG). Specifically, they determine which matrix nonzeros are classified as weak...

Software Biology

arXiv CS Jan 29

Going NUTS with ADVI: Exploring various Bayesian Inference techniques with Facebook Prophet

arXiv:2601.20120v1 Announce Type: new Abstract: Since its introduction, Facebook Prophet has attracted positive attention from both classical statisticians and the Bayesian statistics community. The model...

Technology Software

arXiv CS Jan 29

Membership Inference Attacks Against Fine-tuned Diffusion Language Models

arXiv:2601.20125v1 Announce Type: new Abstract: Diffusion Language Models (DLMs) represent a promising alternative to autoregressive language models, using bidirectional masked token prediction. Yet their susceptibility...

Software Cybersecurity

arXiv CS Jan 29

Rewarding Intellectual Humility: Learning When Not To Answer In Large Language Models

arXiv:2601.20126v1 Announce Type: new Abstract: Large Language Models (LLMs) often produce hallucinated or unverifiable content, undermining their reliability in factual domains. This work investigates Reinforcement...

Software Policy

arXiv CS Jan 29

BengaliSent140: A Large-Scale Bengali Binary Sentiment Dataset for Hate and Non-Hate Speech Classification

arXiv:2601.20129v1 Announce Type: new Abstract: Sentiment analysis for the Bengali language has attracted increasing research interest in recent years. However, progress remains constrained by the...

Psychology Policy

arXiv CS Jan 29

Real-Time Robot Execution with Masked Action Chunking

arXiv:2601.20130v1 Announce Type: new Abstract: Real-time execution is essential for cyber-physical systems such as robots. These systems operate in dynamic real-world environments where even small...

Policy Robotics

arXiv CS Jan 29

Taxonomy of the Retrieval System Framework: Pitfalls and Paradigms

arXiv:2601.20131v1 Announce Type: new Abstract: Designing an embedding retrieval system requires navigating a complex design space of conflicting trade-offs between efficiency and effectiveness. This work...

Neuroscience Software

arXiv CS Jan 29

Control systems for synthetic biology and a case-study in cell fate reprogramming

arXiv:2601.20135v1 Announce Type: new Abstract: This paper gives an overview of the use of control systems engineering in synthetic biology, motivated by applications such as...

Software Biology

arXiv CS Jan 29

Dynamic framework for edge-connectivity maintenance of simple graphs

arXiv:2601.20137v1 Announce Type: new Abstract: We present a dynamic framework for maintaining $k$-edge-connectivity of undirected, simple graphs subject to structural updates, specifically single edge additions...

Engineering Psychology

arXiv CS Jan 29

Scaling Next-Brain-Token Prediction for MEG

arXiv:2601.20138v1 Announce Type: new Abstract: We present a large autoregressive model for source-space MEG that scales next-token prediction to long context across datasets and scanners:...

Software Neuroscience

arXiv CS Jan 29

Large language models accurately predict public perceptions of support for climate action worldwide

arXiv:2601.20141v1 Announce Type: new Abstract: Although most people support climate action, widespread underestimation of others' support stalls individual and systemic changes. In this preregistered experiment,...

Climate & Environment Technology

arXiv CS Jan 29

Mind the Shift: Using Delta SSL Embeddings to Enhance Child ASR

arXiv:2601.20142v1 Announce Type: new Abstract: Self-supervised learning (SSL) models have achieved impressive results across many speech tasks, yet child automatic speech recognition (ASR) remains challenging...

Psychology Software

arXiv CS Jan 29

Trajectory2Task: Training Robust Tool-Calling Agents with Synthesized Yet Verifiable Data for Complex User Intents

arXiv:2601.20144v1 Announce Type: new Abstract: Tool-calling agents are increasingly deployed in real-world customer-facing workflows. Yet most studies on tool-calling agents focus on idealized settings with...

Software Policy