CUEBES

CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models

arXiv:2603.04406v1 Announce Type: new Abstract: With the growing use of Retrieval-Augmented Generation (RAG), training large language models (LLMs) for context-sensitive reasoning and faithfulness is increasingly...

Biology Software

arXiv CS Mar 6

Semantic Containment as a Fundamental Property of Emergent Misalignment

arXiv:2603.04407v1 Announce Type: new Abstract: Fine-tuning language models on narrowly harmful data causes emergent misalignment (EM) -- behavioral failures extending far beyond training distributions. Recent...

Psychology Policy

arXiv CS Mar 6

Probing Memes in LLMs: A Paradigm for the Entangled Evaluation World

arXiv:2603.04408v1 Announce Type: new Abstract: Current evaluation paradigms for large language models (LLMs) characterize models and datasets separately, yielding coarse descriptions: items in datasets are...

Psychology Software

arXiv CS Mar 6

Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

arXiv:2603.04409v1 Announce Type: new Abstract: The evaluation of large language models faces significant challenges. Technical benchmarks often lack real-world relevance, while existing human preference evaluations...

Artificial Intelligence Technology

arXiv CS Mar 6

SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

arXiv:2603.04410v1 Announce Type: new Abstract: Safety alignment in Language Models (LMs) is fundamental for trustworthy AI. However, while different stakeholders are trying to leverage Arabic...

Psychology Software

arXiv CS Mar 6

One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache

arXiv:2603.04411v1 Announce Type: new Abstract: Despite the remarkable progress of Large Language Models (LLMs), the escalating memory footprint of the Key-Value (KV) cache remains a...

Technology Software

arXiv CS Mar 6

Additive Multi-Step Markov Chains and the Curse of Dimensionality in Large Language Models

arXiv:2603.04412v1 Announce Type: new Abstract: Large-scale language models (LLMs) operate in extremely high-dimensional state spaces, where both token embeddings and their hidden representations create complex...

Quantum Computing Software

arXiv CS Mar 6

Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Meaning in LLM Text Summaries

arXiv:2603.04413v1 Announce Type: new Abstract: Meaning in human language is relational, context dependent, and emergent, arising from dynamic systems of signs rather than fixed word-concept...

Software Policy

arXiv CS Mar 6

Multiclass Hate Speech Detection with RoBERTa-OTA: Integrating Transformer Attention and Graph Convolutional Networks

arXiv:2603.04414v1 Announce Type: new Abstract: Multiclass hate speech detection across demographic categories remains computationally challenging due to implicit targeting strategies and linguistic variability in social...

Software Neuroscience

arXiv CS Mar 6

The Thinking Boundary: Quantifying Reasoning Suitability of Multimodal Tasks via Dual Tuning

arXiv:2603.04415v1 Announce Type: new Abstract: While reasoning-enhanced Large Language Models (LLMs) have demonstrated remarkable advances in complex tasks such as mathematics and coding, their effectiveness...

Software Mathematics

arXiv CS Mar 6

Optimizing What We Trust: Reliability-Guided QUBO Selection of Multi-Agent Weak Framing Signals for Arabic Sentiment Prediction

arXiv:2603.04416v1 Announce Type: new Abstract: Framing detection in Arabic social media is difficult due to interpretive ambiguity, cultural grounding, and limited reliable supervision. Existing LLM-based...

Software Energy

arXiv CS Mar 6

Same Input, Different Scores: A Multi Model Study on the Inconsistency of LLM Judge

arXiv:2603.04417v1 Announce Type: new Abstract: Large language models are increasingly used as automated evaluators in research and enterprise settings, a practice known as LLM-as-a-judge. While...

Artificial Intelligence Environment

arXiv CS Mar 6

Decorrelating the Future: Joint Frequency Domain Learning for Spatio-temporal Forecasting

arXiv:2603.04418v1 Announce Type: new Abstract: Standard direct forecasting models typically rely on point-wise objectives such as Mean Squared Error, which fail to capture the complex...

Software World News

arXiv CS Mar 6

Context-Dependent Affordance Computation in Vision-Language Models

arXiv:2603.04419v1 Announce Type: new Abstract: We characterize the phenomenon of context-dependent affordance computation in vision-language models (VLMs). Through a large-scale computational study (n=3,213 scene-context pairs...

Robotics Biology

arXiv CS Mar 6

Machine Learning for Complex Systems Dynamics: Detecting Bifurcations in Dynamical Systems with Deep Neural Networks

arXiv:2603.04420v1 Announce Type: new Abstract: Critical transitions are the abrupt shifts between qualitatively different states of a system, and they are crucial to understanding tipping...

Artificial Intelligence Biology

arXiv CS Mar 6

Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?

arXiv:2603.04421v1 Announce Type: new Abstract: Multi-agent large language model (LLM) systems have emerged as a promising approach for clinical diagnosis, leveraging collaboration among agents to...

Medicine & Health Health

arXiv CS Mar 6

FedEMA-Distill: Exponential Moving Average Guided Knowledge Distillation for Robust Federated Learning

arXiv:2603.04422v1 Announce Type: new Abstract: Federated learning (FL) often degrades when clients hold heterogeneous non-Independent and Identically Distributed (non-IID) data and when some clients behave...

World News Technology

arXiv CS Mar 6

Generating Realistic, Protocol-Compliant Maritime Radio Dialogues using Self-Instruct and Low-Rank Adaptation

arXiv:2603.04423v1 Announce Type: new Abstract: VHF radio miscommunication remains a major safety risk in maritime operations, with human factors accounting for over 58% of recorded...

Software Psychology

arXiv CS Mar 6

When Scaling Fails: Network and Fabric Effects on Distributed GPU Training Performance

arXiv:2603.04424v1 Announce Type: new Abstract: Scaling distributed GPU training is commonly assumed to yield predictable performance gains as additional nodes are added. In practice, many...

Software Technology

arXiv CS Mar 6

Data-Driven Optimization of Multi-Generational Cellular Networks: A Performance Classification Framework for Strategic Infrastructure Management

arXiv:2603.04425v1 Announce Type: new Abstract: The exponential growth in mobile data demand necessitates intelligent management of telecommunications infrastructure to ensure Quality of Service (QoS) and...

Biology Technology

arXiv CS Mar 6

Delta-Crosscoder: Robust Crosscoder Model Diffing in Narrow Fine-Tuning Regimes

arXiv:2603.04426v1 Announce Type: new Abstract: Model diffing methods aim to identify how fine-tuning changes a model's internal representations. Crosscoders approach this by learning shared dictionaries...

Software Biology

arXiv CS Mar 6

Thin Keys, Full Values: Reducing KV Cache via Low-Dimensional Attention Selection

arXiv:2603.04427v1 Announce Type: new Abstract: Standard transformer attention uses identical dimensionality for queries, keys, and values ($d_q = d_k = d_v = d_{\text{model}}$). Our insight...

Psychology Software

arXiv CS Mar 6

Agent Memory Below the Prompt: Persistent Q4 KV Cache for Multi-Agent LLM Inference on Edge Devices

arXiv:2603.04428v1 Announce Type: new Abstract: Multi-agent LLM systems on edge devices face a memory management problem: device RAM is too small to hold every agent's...

Apple & Mac Software

arXiv CS Mar 6

What Is Missing: Interpretable Ratings for Large Language Model Outputs

arXiv:2603.04429v1 Announce Type: new Abstract: Current Large Language Model (LLM) preference learning methods such as Proximal Policy Optimization and Direct Preference Optimization learn from direct...

Policy Artificial Intelligence