Agentic Very Long Video Understanding
arXiv:2601.18157v1 Announce Type: new Abstract: The advent of always-on personal AI assistants, enabled by all-day wearable devices such as smart glasses, demands a new level...
arXiv:2601.18158v1 Announce Type: new Abstract: Graphs are central to modeling relationships in scientific computing, data analysis, and AI/ML, but their growing scale can exceed the...
arXiv:2601.18159v1 Announce Type: new Abstract: The growing demand for compute-intensive applications has made multi-chiplet architectures a promising alternative to monolithic designs, offering improved scalability and...
arXiv:2601.18162v1 Announce Type: new Abstract: Fine-grained emotion recognition is a challenging multi-label NLP task due to label overlap and class imbalance. In this work, we...
arXiv:2601.18168v1 Announce Type: new Abstract: Transarterial chemoembolization (TACE) is a preferred treatment option for hepatocellular carcinoma and other liver malignancies, yet it remains a highly...
arXiv:2601.18171v1 Announce Type: new Abstract: Unsupervised Domain Adaptation (UDA) aims to mitigate performance degradation when training and testing data are sampled from different distributions. While...
arXiv:2601.18172v1 Announce Type: new Abstract: One-stage object detection, particularly the YOLO series, strikes a favorable balance between accuracy and efficiency. However, existing YOLO detectors lack...
arXiv:2601.18175v1 Announce Type: new Abstract: A widely used technique for improving policies is success conditioning, in which one collects trajectories, identifies those that achieve a...
arXiv:2601.18177v1 Announce Type: new Abstract: Silent speech interfaces (SSIs) enable silent interaction in noise-sensitive or privacy-sensitive settings. However, existing SSIs face practical deployment trade-offs among...
arXiv:2601.18179v1 Announce Type: new Abstract: Therapeutic homework (i.e., tasks assigned by therapists for clients to complete between sessions) is essential for effective psychotherapy, yet therapists...
arXiv:2601.18186v1 Announce Type: new Abstract: We introduce a Trajectory-Based RBF Collocation (TBRBF) method for solving surface advection-diffusion equations on smooth, compact manifolds. TBRBF decouples advection...
arXiv:2601.18188v1 Announce Type: new Abstract: Vision-and-Language Navigation (VLN) requires agents to interpret natural language instructions and act coherently in visually rich environments. However, most existing...
arXiv:2601.18189v1 Announce Type: new Abstract: Continuous optimization has significantly advanced causal discovery, yet existing methods (e.g., NOTEARS) generally guarantee only asymptotic convergence to a stationary...
arXiv:2601.18190v1 Announce Type: new Abstract: Vision-Language Pre-training (VLP) models like CLIP have significantly advanced Remote Sensing Image-Text Retrieval (RSITR). However, existing methods predominantly rely on...
arXiv:2601.18192v2 Announce Type: new Abstract: Reconstructing human dynamic visual perception from electroencephalography (EEG) signals is of great research significance since EEG's non-invasiveness and high temporal...
arXiv:2601.18193v1 Announce Type: new Abstract: Visual designers often seek inspiration from Chinese paintings when tasked with creating Chinese-style illustrations, posters, etc. Our formative study (N=10)...
arXiv:2601.18195v1 Announce Type: new Abstract: Visual quality assessment (VQA) is increasingly shifting from scalar score prediction toward interpretable quality understanding -- a paradigm that demands...
arXiv:2601.18197v1 Announce Type: new Abstract: While Large Vision-Language Models (LVLMs) have significantly advanced GUI agents' capabilities in parsing textual instructions, interpreting screen content, and executing...
arXiv:2601.18198v1 Announce Type: new Abstract: Graph Neural Networks (GNNs) are eminently suitable for wireless resource management, thanks to their scalability, but they still face computational...
arXiv:2601.18200v1 Announce Type: new Abstract: Wireless foundation models promise transformative capabilities for channel state information (CSI) processing across diverse 6G network applications, yet face fundamental...
arXiv:2601.18202v1 Announce Type: new Abstract: Deep search agents, which aim to answer complex questions requiring reasoning across multiple documents, can significantly speed up the information-seeking...
arXiv:2601.18203v2 Announce Type: new Abstract: Existing multimodal document question-answering (QA) systems predominantly rely on flat semantic retrieval, representing documents as a set of disconnected text...
arXiv:2601.18204v1 Announce Type: new Abstract: Large language model-based agents operating in long-horizon interactions require memory systems that support temporal consistency, multi-hop reasoning, and evidence-grounded reuse...
arXiv:2601.18207v1 Announce Type: new Abstract: Search agents are language models (LMs) that reason and search knowledge bases (or the web) to answer questions; recent methods...