CUEBES

PeruMedQA: Benchmarking Large Language Models (LLMs) on Peruvian Medical Exams -- Dataset Construction and Evaluation

arXiv:2509.11517v2 Announce Type: replace Abstract: BACKGROUND: Medical large language models (LLMs) have demonstrated remarkable performance in answering medical examinations. However, the extent to which this...

Software Policy

arXiv CS 5d ago

A Uniqueness Theorem for Distributed Computation under Physical Constraint

arXiv:2509.11754v2 Announce Type: replace Abstract: Foundational models of computation often abstract away physical hardware limitations. However, in extreme environments like In-Network Computing (INC), these limitations...

Hardware Engineering

arXiv CS 5d ago

CodeCureAgent: Automatic Classification and Repair of Static Analysis Warnings

arXiv:2509.11787v3 Announce Type: replace Abstract: Static analysis tools are widely used to detect bugs, vulnerabilities, and code smells. Traditionally, developers must resolve these warnings manually....

Software Artificial Intelligence

arXiv CS 5d ago

Synthetic vs. Real Training Data for Visual Navigation

arXiv:2509.11791v2 Announce Type: replace Abstract: This paper investigates how the performance of visual navigation policies trained in simulation compares to policies trained with real-world data....

Hardware Robotics

arXiv CS 5d ago

ClearFairy: Capturing Creative Workflows through Decision Structuring, In-Situ Questioning, and Rationale Inference

arXiv:2509.14537v2 Announce Type: replace Abstract: Capturing professionals' decision-making in creative workflows (e.g., UI/UX) is essential for reflection, collaboration, and knowledge sharing, yet existing methods often...

Software Neuroscience

arXiv CS 5d ago

Diversity Boosts AI-Generated Text Detection

arXiv:2509.18880v3 Announce Type: replace Abstract: Detecting AI-generated text is an increasing necessity to combat misuse of LLMs in education, business compliance, journalism, and social media,...

Biology Engineering

arXiv CS 5d ago

Normalizing Flows are Capable Models for Bi-manual Visuomotor Policy

arXiv:2509.21073v2 Announce Type: replace Abstract: The field of general-purpose robotics has recently embraced powerful probabilistic diffusion-based models to learn the complex embodiment behaviours. However, existing...

Robotics Energy

arXiv CS 5d ago

Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training

arXiv:2509.21500v2 Announce Type: replace Abstract: Reinforcement fine-tuning (RFT) often suffers from reward over-optimization, where a policy model hacks the reward signals to achieve high scores...

Policy Cybersecurity

arXiv CS 5d ago

Beyond RAG vs. Long-Context: Learning Distraction-Aware Retrieval for Efficient Knowledge Grounding

arXiv:2509.21865v2 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) is a framework for grounding Large Language Models (LLMs) in external, up-to-date information. However, recent advancements in...

Biology Software

arXiv CS 5d ago

JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation

arXiv:2509.22548v2 Announce Type: replace Abstract: Vision-and-Language Navigation requires an embodied agent to navigate through unseen environments, guided by natural language instructions and a continuous video...

Neuroscience Psychology

arXiv CS 5d ago

Training Deep Normalization-Free Spiking Neural Networks with Lateral Inhibition

arXiv:2509.23253v3 Announce Type: replace Abstract: Spiking Neural Networks (SNNs) have garnered significant attention as a central paradigm in neuromorphic computing, owing to their energy efficiency...

Neuroscience Software

arXiv CS 5d ago

Characteristic Root Analysis and Regularization for Linear Time Series Forecasting

arXiv:2509.23597v3 Announce Type: replace Abstract: Time series forecasting remains a critical challenge across numerous domains, yet the effectiveness of complex models often varies unpredictably across...

Technology Engineering

arXiv CS 5d ago

Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning

arXiv:2509.23744v2 Announce Type: replace Abstract: Multimodal large language models (MLLMs) promise enhanced reasoning by integrating diverse inputs such as text, vision, and audio. Yet cross-modal...

Psychology Software

arXiv CS 5d ago

Uncovering Grounding IDs: How External Cues Shape Multimodal Binding

arXiv:2509.24072v4 Announce Type: replace Abstract: Large vision-language models (LVLMs) show strong performance across multimodal benchmarks but remain limited in structured reasoning and precise grounding. Recent...

Policy Biology

arXiv CS 5d ago

Incentive-Aligned Multi-Source LLM Summaries

arXiv:2509.25184v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used in modern search and answer systems to synthesize multiple, sometimes conflicting, texts into...

Chemistry Policy

arXiv CS 5d ago

Characterization and Learning of Causal Graphs with Latent Confounders and Post-treatment Selection from Interventional Data

arXiv:2509.25800v2 Announce Type: replace Abstract: Interventional causal discovery seeks to identify causal relations by leveraging distributional changes introduced by interventions, even in the presence of...

Biology Psychology

arXiv CS 5d ago

EpidemIQs: Prompt-to-Paper LLM Agents for Epidemic Modeling and Analysis

arXiv:2510.00024v2 Announce Type: replace Abstract: Large Language Models (LLMs) offer new opportunities to accelerate complex interdisciplinary research domains. Epidemic modeling, characterized by its complexity and...

Robotics Artificial Intelligence

arXiv CS 5d ago

FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates

arXiv:2510.00981v3 Announce Type: replace Abstract: Neural audio codecs are foundational to speech language models. They are expected to have a low frame rate and decoupled...

Software Neuroscience

arXiv CS 5d ago

PepCompass: Navigating peptide embedding spaces using Riemannian Geometry

arXiv:2510.01988v5 Announce Type: replace Abstract: Antimicrobial peptide discovery is challenged by the astronomical size of peptide space and the relative scarcity of active peptides. Generative...

Software Genetics

arXiv CS 5d ago

The Curious Case of In-Training Compression of State Space Models

arXiv:2510.02823v4 Announce Type: replace Abstract: State Space Models (SSMs), developed to tackle long sequence modeling tasks efficiently, offer both parallelizable training and fast inference. At...

Software Energy

arXiv CS 5d ago

SciTS: Scientific Time Series Understanding and Generation with LLMs

arXiv:2510.03255v2 Announce Type: replace Abstract: The scientific reasoning ability of large language models (LLMs) has recently attracted significant attention. Time series, as a fundamental modality...

Software Psychology

arXiv CS 5d ago

Rethinking Consistent Multi-Label Classification Under Inexact Supervision

arXiv:2510.04091v2 Announce Type: replace Abstract: Partial multi-label learning and complementary multi-label learning are two popular weakly supervised multi-label classification paradigms that aim to alleviate the...

Psychology Software

arXiv CS 5d ago

SLM-MUX: Orchestrating Small Language Models for Reasoning

arXiv:2510.05077v2 Announce Type: replace Abstract: With the rapid development of language models, the number of small language models (SLMs) has grown significantly. Although they do...

Artificial Intelligence Software

arXiv CS 5d ago

Hierarchical Semantic RL: Tackling the Problem of Dynamic Action Space for RL-based Recommendations

arXiv:2510.09167v2 Announce Type: replace Abstract: Recommender Systems (RS) are fundamental to modern online services. While most existing approaches optimize for short-term engagement, recent work has...

Software Policy