CUEBES

Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability

arXiv:2602.17544v1 Announce Type: new Abstract: In multi-agent IR pipelines for tasks such as search and ranking, LLM-based agents exchange intermediate reasoning in terms of Chain-of-Thought...

Policy Artificial Intelligence

arXiv CS 4d ago

Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning

arXiv:2602.17546v1 Announce Type: new Abstract: Instruction-following language models are trained to be helpful and safe, yet their safety behavior can deteriorate under benign fine-tuning and...

Policy Psychology

arXiv CS 4d ago

KLong: Training LLM Agent for Extremely Long-horizon Tasks

arXiv:2602.17547v1 Announce Type: new Abstract: This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start...

Policy Artificial Intelligence

arXiv CS 4d ago

MASPO: Unifying Gradient Utilization, Probability Mass, and Signal Reliability for Robust and Sample-Efficient LLM Reasoning

arXiv:2602.17550v1 Announce Type: new Abstract: Existing Reinforcement Learning with Verifiable Rewards (RLVR) algorithms, such as GRPO, rely on rigid, uniform, and symmetric trust region mechanisms...

Policy Artificial Intelligence

arXiv CS 4d ago

TopoSZp: Lightweight Topology-Aware Error-controlled Compression for Scientific Data

arXiv:2602.17552v1 Announce Type: new Abstract: Error-bounded lossy compression is essential for managing the massive data volumes produced by large-scale HPC simulations. While state-of-the-art compressors such...

Psychology World News

arXiv CS 4d ago

A Theoretical Framework for Modular Learning of Robust Generative Models

arXiv:2602.17554v1 Announce Type: new Abstract: Training large-scale generative models is resource-intensive and relies heavily on heuristic dataset weighting. We address two fundamental questions: Can we...

Artificial Intelligence Engineering

arXiv CS 4d ago

GraphThinker: Reinforcing Video Reasoning with Event Graph Thinking

arXiv:2602.17555v1 Announce Type: new Abstract: Video reasoning requires understanding the causal relationships between events in a video. However, such relationships are often implicit and costly...

Engineering Business

arXiv CS 4d ago

RetouchIQ: MLLM Agents for Instruction-Based Image Retouching with Generalist Reward

arXiv:2602.17558v1 Announce Type: new Abstract: Recent advances in multimodal large language models (MLLMs) have shown great potential for extending vision-language reasoning to professional tool-based image...

Technology Software

arXiv CS 4d ago

Revisiting Weight Regularization for Low-Rank Continual Learning

arXiv:2602.17559v1 Announce Type: new Abstract: Continual Learning (CL) with large-scale pre-trained models (PTMs) has recently gained wide attention, shifting the focus from training from scratch...

Software Technology

arXiv CS 4d ago

ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment

arXiv:2602.17560v1 Announce Type: new Abstract: Activation steering, or representation engineering, offers a lightweight approach to align large language models (LLMs) by manipulating their internal activations...

Engineering Software

arXiv CS 4d ago

A Hybrid Federated Learning Based Ensemble Approach for Lung Disease Diagnosis Leveraging Fusion of SWIN Transformer and CNN

arXiv:2602.17566v1 Announce Type: new Abstract: The significant advancements in computational power create a vast opportunity for using Artificial Intelligence in different applications of...

Health Technology

arXiv CS 4d ago

Be Wary of Your Time Series Preprocessing

arXiv:2602.17568v1 Announce Type: new Abstract: Normalization and scaling are fundamental preprocessing steps in time series modeling, yet their role in Transformer-based models remains underexplored from...

Psychology World News

arXiv CS 4d ago

FR-GESTURE: An RGBD Dataset For Gesture-based Human-Robot Interaction In First Responder Operations

arXiv:2602.17573v1 Announce Type: new Abstract: The ever increasing intensity and number of disasters make even more difficult the work of First Responders (FRs). Artificial intelligence...

Robotics Policy

arXiv CS 4d ago

Hybrid System Planning using a Mixed-Integer ADMM Heuristic and Hybrid Zonotopes

arXiv:2602.17574v1 Announce Type: new Abstract: Embedded optimization-based planning for hybrid systems is challenging due to the use of mixed-integer programming, which is computationally intensive and...

Hardware Technology

arXiv CS 4d ago

Simultaneous Blackwell Approachability and Applications to Multiclass Omniprediction

arXiv:2602.17577v1 Announce Type: new Abstract: Omniprediction is a learning problem that requires suboptimality bounds for each of a family of losses $\mathcal{L}$ against a family...

Software Policy

arXiv CS 4d ago

Canonicalizing Multimodal Contrastive Representation Learning

arXiv:2602.17584v1 Announce Type: new Abstract: As models and data scale, independently trained networks often induce analogous notions of similarity. But, matching similarities is weaker than...

Software Cybersecurity

arXiv CS 4d ago

Conditional Flow Matching for Continuous Anomaly Detection in Autonomous Driving on a Manifold-Aware Spectral Space

arXiv:2602.17586v1 Announce Type: new Abstract: Safety validation for Level 4 autonomous vehicles (AVs) is currently bottlenecked by the inability to scale the detection of rare,...

Psychology Software

arXiv CS 4d ago

Modeling Distinct Human Interaction in Web Agents

arXiv:2602.17588v1 Announce Type: new Abstract: Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks...

Robotics Psychology

arXiv CS 4d ago

BMC4TimeSec: Verification Of Timed Security Protocols

arXiv:2602.17590v1 Announce Type: new Abstract: We present BMC4TimeSec, an end-to-end tool for verifying Timed Security Protocols (TSP) based on SMT-based bounded model checking and multi-agent...

Environment Software

arXiv CS 4d ago

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

arXiv:2602.17594v1 Announce Type: new Abstract: Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this...

Software Technology

arXiv CS 4d ago

Asymptotic Smoothing of the Lipschitz Loss Landscape in Overparameterized One-Hidden-Layer ReLU Networks

arXiv:2602.17596v1 Announce Type: new Abstract: We study the topology of the loss landscape of one-hidden-layer ReLU networks under overparameterization. On the theory side, we (i)...

Genetics Energy

arXiv CS 4d ago

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR$\rightarrow$LLM Pipelines?

arXiv:2602.17598v1 Announce Type: new Abstract: Current speech LLMs largely perform implicit ASR: on tasks solvable from a transcript, they are behaviorally and mechanistically equivalent to...

Psychology Software

arXiv CS 4d ago

Art2Mus: Artwork-to-Music Generation via Visual Conditioning and Large-Scale Cross-Modal Alignment

arXiv:2602.17599v1 Announce Type: new Abstract: Music generation has advanced markedly through multimodal deep learning, enabling models to synthesize audio from text and, more recently, from...

Software Psychology

arXiv CS 4d ago

Graph Neural Model Predictive Control for High-Dimensional Systems

arXiv:2602.17601v1 Announce Type: new Abstract: The control of high-dimensional systems, such as soft robots, requires models that faithfully capture complex dynamics while remaining computationally tractable....

Hardware Artificial Intelligence