Your AI Papers Research Assistant
All Research (14188 papers)
Large Language Models (5415 papers)
Computer Vision (2514 papers)
Generative AI (1710 papers)
AI Safety & Ethics (1583 papers)
Reinforcement Learning (1088 papers)
Graph Neural Networks (892 papers)
Robotics & Embodied AI (691 papers)
Speech & Audio (258 papers)
Uncategorized (30 papers)
Efficient AI (6 papers)
AI for Science (1 paper)
Today's Top AI Research Papers
Wednesday, November 5, 2025
📊 Read Full Intelligence Reports:
Investigates subtraction accuracy across eight LLMs and finds that it lags behind addition accuracy. Errors on (a-b) are consistently related to errors on (b-a), suggesting that models struggle with the non-commutativity of subtraction. This highlights a limitation in LLMs' basic arithmetic reasoning.
Proposes a QUBO (quadratic unconstrained binary optimization) formulation that enhances privacy in federated learning by bounding the risk of membership inference attacks. The method aims to improve data protection while maintaining model utility in distributed training.
Introduces Fast, Private, and Protected (FPP), a novel approach for federated learning that safeguards data privacy and defends against model poisoning attacks. It aims to ensure secure and robust distributed model training.
Proposes IG-Pruning, a novel input-aware method for pruning transformer layers in LLMs. It dynamically removes layers based on each input, reducing computational cost for efficient inference without significant performance degradation.
Introduces LTD-Bench, a benchmark for evaluating LLMs' spatial reasoning capabilities through drawing. It addresses the limitations of opaque numerical metrics by offering an intuitive view of model abilities relevant to physical-world applications.
Proposes GRACE, a lightweight score to quantify teacher model effectiveness for student model distillation. It measures distributional properties of student gradients without a verifier, enabling principled teacher selection for efficient knowledge transfer.
Introduces SEAL, a symmetry-encouraging loss function for high-energy physics. It improves the robustness and data efficiency of machine learning models by explicitly respecting physical symmetries, even in the presence of experimental imperfections.
Addresses GPU NUMA (non-uniform memory access) effects in large-scale attention workloads by proposing Swizzle, a novel kernel scheduling strategy. It exploits NUMA locality to optimize attention performance, mitigating variations in memory latency and bandwidth.
Introduces the 'Three Taxes' framework for analyzing performance inefficiencies in distributed LLMs. It proposes moving beyond the bulk synchronous parallel (BSP) model to achieve efficient multi-GPU inference by addressing bulk-synchronization, locality, and kernel-launch overheads.
Proposes PrivGNN, a high-performance secure inference protocol for graph neural networks. It addresses the challenge of securing GNNs and graph data in privacy-critical cloud environments, enabling secure analysis of graph-structured data.
arxiv_cv
3D Point Cloud Object Detection on Edge Devices for Split Computing
Abstract: The field of autonomous driving technology is rapidly advancing, with deep learning being a key component. Particularly in the field of sensing, 3D point cloud data collected by LiDAR is utilized to run deep neural network models for 3D obj...
arxiv_cv
Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios
Abstract: The scarcity of data in various scenarios, such as medical, industry and autonomous driving, leads to model overfitting and dataset imbalance, thus hindering effective detection and segmentation performance. Existing studies employ the gene...
arxiv_cv
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Abstract: Ultra-high-resolution (UHR) remote sensing (RS) imagery offers valuable data for Earth observation but poses challenges for existing multimodal foundation models due to two key bottlenecks: (1) limited availability of UHR training data, and ...
arxiv_cv
Light Future: Multimodal Action Frame Prediction via InstructPix2Pix
Abstract: Predicting future motion trajectories is a critical capability across domains such as robotics, autonomous systems, and human activity forecasting, enabling safer and more intelligent decision-making. This paper proposes a novel, efficient,...
arxiv_cv
Robust Identity Perceptual Watermark Against Deepfake Face Swapping
Abstract: Notwithstanding offering convenience and entertainment to society, Deepfake face swapping has caused critical privacy issues with the rapid development of deep generative models. Due to imperceptible artifacts in high-quality synthetic imag...
arxiv_cv
GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field
Abstract: Plane Geometry Diagram Synthesis has been a crucial task in computer graphics, with applications ranging from educational tools to AI-driven mathematical reasoning. Traditionally, we rely on manual tools (e.g., Matplotlib and GeoGebra) to g...
arxiv_cv
SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration
Abstract: We introduce SigmaCollab, a dataset enabling research on physically situated human-AI collaboration. The dataset consists of a set of 85 sessions in which untrained participants were guided by a mixed-reality assistive AI agent in performin...
arxiv_cv
Real World Federated Learning with a Knowledge Distilled Transformer for Cardiac CT Imaging
Abstract: Federated learning is a renowned technique for utilizing decentralized data while preserving privacy. However, real-world applications often face challenges like partially labeled datasets, where only a few locations have certain expert ann...
arxiv_cv
Deep Fourier-embedded Network for RGB and Thermal Salient Object Detection
Abstract: The rapid development of deep learning has significantly improved salient object detection (SOD) combining both RGB and thermal (RGB-T) images. However, existing Transformer-based RGB-T SOD models with quadratic complexity are memory-intens...
arxiv_cv
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Abstract: Recent advances in self-supervised learning for Vision Transformers (ViTs) have fueled breakthroughs in remote sensing (RS) foundation models. However, the quadratic complexity of self-attention poses a significant barrier to scalability, p...
arxiv_cv
A Kullback-Leibler divergence method for input-system-state identification
Abstract: The capability of a novel Kullback-Leibler divergence method is examined herein within the Kalman filter framework to select the input-parameter-state estimation execution with the most plausible results. This identification suffers from th...
arxiv_cv
Mobile Robotic Multi-View Photometric Stereo
Abstract: Multi-View Photometric Stereo (MVPS) is a popular method for fine-detailed 3D acquisition of an object from images. Despite its outstanding results on diverse material objects, a typical MVPS experimental setup requires a well-calibrated li...
arxiv_cv
A Practical Investigation of Spatially-Controlled Image Generation with Transformers
Abstract: Enabling image generation models to be spatially controlled is an important area of research, empowering users to better generate images according to their own fine-grained specifications via, e.g., edge maps and poses. Although this task has se...
arxiv_cv
Training Convolutional Neural Networks with the Forward-Forward algorithm
Abstract: Recent successes in image analysis with deep neural networks are achieved almost exclusively with Convolutional Neural Networks (CNNs), typically trained using the backpropagation (BP) algorithm. In a 2022 preprint, Geoffrey Hinton proposed...
arxiv_cv
3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting
Abstract: Recent advancements in text-to-3D generation have shown remarkable results by leveraging 3D priors in combination with 2D diffusion. However, previous methods utilize 3D priors that lack detailed and complex structural information, limiting...
arxiv_cv
An unscented Kalman filter method for real time input-parameter-state estimation
Abstract: The input-parameter-state estimation capabilities of a novel unscented Kalman filter are examined herein on both linear and nonlinear systems. The unknown input is estimated in two stages within each time step. Firstly, the predicted dynamic...
arxiv_cv
Breaking Down Monocular Ambiguity: Exploiting Temporal Evolution for 3D Lane Detection
Abstract: Monocular 3D lane detection aims to estimate the 3D position of lanes from frontal-view (FV) images. However, existing methods are fundamentally constrained by the inherent ambiguity of single-frame input, which leads to inaccurate geometri...
arxiv_cv
Label tree semantic losses for rich multi-class medical image segmentation
Abstract: Rich and accurate medical image segmentation is poised to underpin the next generation of AI-defined clinical practice by delineating critical anatomy for pre-operative planning, guiding real-time intra-operative navigation, and supporting ...
arxiv_cv
Resource-efficient Automatic Refinement of Segmentations via Weak Supervision from Light Feedback
Abstract: Delineating anatomical regions is a key task in medical image analysis. Manual segmentation achieves high accuracy but is labor-intensive and prone to variability, thus prompting the development of automated approaches. Recently, a breadth ...
arxiv_cv
FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks
Abstract: Proactive Deepfake detection via robust watermarks has seen interest ever since passive Deepfake detectors encountered challenges in identifying high-quality synthetic images. However, while demonstrating reasonable detection performance, t...
arxiv_cv
Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal
Abstract: Universal adverse weather removal (UAWR) seeks to address various weather degradations within a unified framework. Recent methods are inspired by prompt learning using pre-trained vision-language models (e.g., CLIP), leveraging degradation-...
arxiv_cv
Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras
Abstract: Event cameras offer microsecond-level latency and robustness to motion blur, making them ideal for understanding dynamic environments. Yet, connecting these asynchronous streams to human language remains an open challenge. We introduce Talk...
arxiv_cv
HAGI++: Head-Assisted Gaze Imputation and Generation
Abstract: Mobile eye tracking plays a vital role in capturing human visual attention across both real-world and extended reality (XR) environments, making it an essential tool for applications ranging from behavioural research to human-computer inter...
arxiv_cl
SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning
Abstract: We introduce SAIL-RL, a reinforcement learning (RL) post-training framework that enhances the reasoning capabilities of multimodal large language models (MLLMs) by teaching them when and how to think. Existing approaches are limited by outc...
arxiv_cl
Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas
Abstract: We evaluate whether persona-based prompting improves Large Language Model (LLM) performance on macroeconomic forecasting tasks. Using 2,368 economics-related personas from the PersonaHub corpus, we prompt GPT-4o to replicate the ECB Survey ...
arxiv_cl
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
Abstract: Code has emerged as a precise and executable medium for reasoning and action in the agent era. Yet, progress has largely focused on language-centric tasks such as program synthesis and debugging, leaving visual-centric coding underexplored....
arxiv_ml
AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
Abstract: AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents' performance on MLE-bench, a ...
arxiv_cv
DenseMarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks
Abstract: We propose DenseMarks, a new learned representation for human heads, enabling high-quality dense correspondences of human head images. For a 2D image of a human head, a Vision Transformer network predicts a 3D embedding for each pixel, whi...
arxiv_cl
Mixture of Routers
Abstract: Supervised fine-tuning (SFT) is a milestone in aligning large language models with human instructions and adapting them to downstream tasks. In particular, Low-Rank Adaptation (LoRA) has gained widespread attention due to its parameter effi...
arxiv_ml
Evolutionary Machine Learning meets Self-Supervised Learning: a comprehensive survey
Abstract: The number of studies that combine Evolutionary Machine Learning and self-supervised learning has been growing steadily in recent years. Evolutionary Machine Learning has been shown to help automate the design of machine learning algorithms...