CUEBES

How Reliable is Language Model Micro-Benchmarking?

arXiv:2510.08730v2 Announce Type: replace Abstract: Micro-benchmarking offers a solution to the often prohibitive time and cost of language model development: evaluate on a very small...

Software Policy

arXiv CS Mar 9

Do LLMs Really Know What They Don't Know? Internal States Mainly Reflect Knowledge Recall Rather Than Truthfulness

arXiv:2510.09033v2 Announce Type: replace Abstract: Recent work suggests that LLMs "know what they don't know", positing that hallucinated and factually correct outputs arise from distinct...

Software Policy

arXiv CS Mar 9

Beyond Flat Unknown Labels in Open-World Object Detection

arXiv:2510.09173v2 Announce Type: replace Abstract: Most object detectors operate under a closed-world assumption, recognizing only the classes annotated in the training dataset and failing when...

Psychology Robotics

arXiv CS Mar 9

Data-Driven Estimation of Quadrotor Motor Efficiency via Residual Minimization

arXiv:2510.11388v2 Announce Type: replace Abstract: A data-driven framework is proposed for online estimation of quadrotor motor efficiency via residual minimization. The problem is formulated as...

Software Robotics

arXiv CS Mar 9

LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

arXiv:2510.11512v3 Announce Type: replace Abstract: Intuitive physics understanding in video diffusion models plays an essential role in building general-purpose physically plausible world simulators, yet accurately...

Policy Software

arXiv CS Mar 9

Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation

arXiv:2510.11689v2 Announce Type: replace Abstract: Learning robotic manipulation policies directly in the real world can be expensive and time-consuming. While reinforcement learning (RL) policies trained...

Robotics Software

arXiv CS Mar 9

CanvasMAR: Improving Masked Autoregressive Video Prediction With Canvas

arXiv:2510.13669v2 Announce Type: replace Abstract: Masked autoregressive models (MAR) have emerged as a powerful paradigm for image and video generation, combining the flexibility of masked...

Energy Chemistry

arXiv CS Mar 9

Just-In-Time Objectives: A General Approach for Specialized AI Interactions

arXiv:2510.14591v2 Announce Type: replace Abstract: Large language models promise a broad set of functions, but when not given a specific objective, they default to generic...

Artificial Intelligence Psychology

arXiv CS Mar 9

Admittance Matrix Concentration Inequalities for Understanding Uncertain Power Networks

arXiv:2510.17798v2 Announce Type: replace Abstract: This paper presents conservative probabilistic bounds for the spectrum of the admittance matrix and classical linear power flow models under...

Mathematics Psychology

arXiv CS Mar 9

Mixed Monotonicity Reachability Analysis of Neural ODE: A Trade-Off Between Tightness and Efficiency

arXiv:2510.17859v2 Announce Type: replace Abstract: Neural ordinary differential equations (neural ODE) are powerful continuous-time machine learning models for depicting the behavior of complex dynamical systems,...

Psychology Software

arXiv CS Mar 9

Chain-of-Thought Reasoning Improves Context-Aware Translation with Large Language Models

arXiv:2510.18077v2 Announce Type: replace Abstract: This paper assesses the ability of large language models (LLMs) to translate texts that include inter-sentential dependencies. We use the...

Artificial Intelligence Policy

arXiv CS Mar 9

Formally Verified Linear-Time Invertible Lexing

arXiv:2510.18479v2 Announce Type: replace Abstract: We present ZipLex, a verified framework for invertible linear-time lexical analysis following the longest match semantics. Unlike past verified lexers...

Software Policy

arXiv CS Mar 9

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

arXiv:2510.18632v2 Announce Type: replace Abstract: Though recent advances in vision-language models (VLMs) have achieved remarkable progress across a wide range of multimodal tasks, understanding 3D...

Embedded Systems Neuroscience

arXiv CS Mar 9

Sample-Based Hybrid Mode Control: Asymptotically Optimal Switching of Algorithmic and Non-Differentiable Control Modes

arXiv:2510.19074v2 Announce Type: replace Abstract: This paper investigates a sample-based solution to the hybrid mode control problem across non-differentiable and algorithmic hybrid modes. Our approach...

Robotics Psychology

arXiv CS Mar 9

Multi-UAV Flood Monitoring via CVT with Gaussian Mixture of Density Functions for Coverage Control

arXiv:2510.19548v2 Announce Type: replace Abstract: This study presents a control strategy for coordinating multiple unmanned aerial vehicles (UAVs) to monitor unknown flood regions and estimate...

Environment Software

arXiv CS Mar 9

Push Anything: Single- and Multi-Object Pushing From First Sight with Contact-Implicit MPC

arXiv:2510.19974v2 Announce Type: replace Abstract: Non-prehensile manipulation of diverse objects remains a core challenge in robotics, driven by unknown physical properties and the complexity of...

Hardware Robotics

arXiv CS Mar 9

Automated Coding of Communication Data Using ChatGPT: Consistency Across Subgroups

arXiv:2510.20584v2 Announce Type: replace Abstract: Assessing communication and collaboration at scale depends on a labor-intensive task of coding communication data into categories according to...

Artificial Intelligence Technology

arXiv CS Mar 9

Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People

arXiv:2510.20886v2 Announce Type: replace Abstract: Many emerging applications of AI--from scientific discovery to medical diagnosis--require agents to seek information strategically: forming hypotheses, asking targeted questions,...

Psychology Software

arXiv CS Mar 9

AURASeg: Attention-guided Upsampling with Residual-Assistive Boundary Refinement for Onboard Robot Drivable-Area Segmentation

arXiv:2510.21536v4 Announce Type: replace Abstract: Free space ground segmentation is essential to navigate autonomous robots, recognize drivable zones, and traverse efficiently. Fine-grained features remain challenging...

Robotics Energy

arXiv CS Mar 9

DETECT: Determining Ease and Textual Clarity of German Text Simplifications

arXiv:2510.22212v2 Announce Type: replace Abstract: Current evaluation of German automatic text simplification (ATS) relies on general-purpose metrics such as SARI, BLEU, and BERTScore, which insufficiently...

Psychology Software

arXiv CS Mar 9

Multistep Methods for Floquet Multipliers and Subspaces

arXiv:2510.23082v3 Announce Type: replace Abstract: Accurate and efficient computation of Floquet multipliers and subspaces is essential for analyzing limit cycles in dynamical systems and periodic...

Software Policy

arXiv CS Mar 9

AfriMTEB and AfriE5: Benchmarking and Adapting Text Embedding Models for African Languages

arXiv:2510.23896v2 Announce Type: replace Abstract: Text embeddings are an essential building component of several NLP tasks such as retrieval-augmented generation, which is crucial for preventing...

Psychology Policy

arXiv CS Mar 9

Real-Time Learning of Predictive Dynamic Obstacle Models for Robotic Motion Planning

arXiv:2511.00814v2 Announce Type: replace Abstract: Autonomous systems often must predict the motions of nearby agents from partial and noisy data. This paper asks and answers...

Robotics Embedded Systems

arXiv CS Mar 9

Indicating Robot Vision Capabilities with Augmented Reality

arXiv:2511.03550v3 Announce Type: replace Abstract: Research indicates that humans can mistakenly assume that robots and humans have the same field of view, possessing an inaccurate...

Robotics Neuroscience