How Reliable is Language Model Micro-Benchmarking?
arXiv:2510.08730v2 Announce Type: replace Abstract: Micro-benchmarking offers a solution to the often prohibitive time and cost of language model development: evaluate on a very small...
arXiv:2510.09033v2 Announce Type: replace Abstract: Recent work suggests that LLMs "know what they don't know", positing that hallucinated and factually correct outputs arise from distinct...
arXiv:2510.09173v2 Announce Type: replace Abstract: Most object detectors operate under a closed-world assumption, recognizing only the classes annotated in the training dataset and failing when...
arXiv:2510.11388v2 Announce Type: replace Abstract: A data-driven framework is proposed for online estimation of quadrotor motor efficiency via residual minimization. The problem is formulated as...
arXiv:2510.11512v3 Announce Type: replace Abstract: Intuitive physics understanding in video diffusion models plays an essential role in building general-purpose physically plausible world simulators, yet accurately...
arXiv:2510.11689v2 Announce Type: replace Abstract: Learning robotic manipulation policies directly in the real world can be expensive and time-consuming. While reinforcement learning (RL) policies trained...
arXiv:2510.13669v2 Announce Type: replace Abstract: Masked autoregressive models (MAR) have emerged as a powerful paradigm for image and video generation, combining the flexibility of masked...
arXiv:2510.14591v2 Announce Type: replace Abstract: Large language models promise a broad set of functions, but when not given a specific objective, they default to generic...
arXiv:2510.17798v2 Announce Type: replace Abstract: This paper presents conservative probabilistic bounds for the spectrum of the admittance matrix and classical linear power flow models under...
arXiv:2510.17859v2 Announce Type: replace Abstract: Neural ordinary differential equations (neural ODE) are powerful continuous-time machine learning models for depicting the behavior of complex dynamical systems,...
arXiv:2510.18077v2 Announce Type: replace Abstract: This paper assesses the ability of large language models (LLMs) to translate texts that include inter-sentential dependencies. We use the...
arXiv:2510.18479v2 Announce Type: replace Abstract: We present ZipLex, a verified framework for invertible linear-time lexical analysis following the longest match semantics. Unlike past verified lexers...
arXiv:2510.18632v2 Announce Type: replace Abstract: Though recent advances in vision-language models (VLMs) have achieved remarkable progress across a wide range of multimodal tasks, understanding 3D...
arXiv:2510.19074v2 Announce Type: replace Abstract: This paper investigates a sample-based solution to the hybrid mode control problem across non-differentiable and algorithmic hybrid modes. Our approach...
arXiv:2510.19548v2 Announce Type: replace Abstract: This study presents a control strategy for coordinating multiple unmanned aerial vehicles (UAVs) to monitor unknown flood regions and estimate...
arXiv:2510.19974v2 Announce Type: replace Abstract: Non-prehensile manipulation of diverse objects remains a core challenge in robotics, driven by unknown physical properties and the complexity of...
arXiv:2510.20584v2 Announce Type: replace Abstract: Assessing communication and collaboration at scale depends on a labor intensive task of coding communication data into categories according to...
arXiv:2510.20886v2 Announce Type: replace Abstract: Many emerging applications of AI--from scientific discovery to medical diagnosis--require agents to seek information strategically: forming hypotheses, asking targeted questions,...
arXiv:2510.21536v4 Announce Type: replace Abstract: Free space ground segmentation is essential to navigate autonomous robots, recognize drivable zones, and traverse efficiently. Fine-grained features remain challenging...
arXiv:2510.22212v2 Announce Type: replace Abstract: Current evaluation of German automatic text simplification (ATS) relies on general-purpose metrics such as SARI, BLEU, and BERTScore, which insufficiently...
arXiv:2510.23082v3 Announce Type: replace Abstract: Accurate and efficient computation of Floquet multipliers and subspaces is essential for analyzing limit cycles in dynamical systems and periodic...
arXiv:2510.23896v2 Announce Type: replace Abstract: Text embeddings are an essential building block of several NLP tasks such as retrieval-augmented generation, which is crucial for preventing...
arXiv:2511.00814v2 Announce Type: replace Abstract: Autonomous systems often must predict the motions of nearby agents from partial and noisy data. This paper asks and answers...
arXiv:2511.03550v3 Announce Type: replace Abstract: Research indicates that humans can mistakenly assume that robots and humans have the same field of view, possessing an inaccurate...