Today's AI Research Top Papers

Wednesday, November 5, 2025
Investigates subtraction accuracy in eight LLMs, finding it lags behind addition. Errors in (a-b) are consistently related to errors in (b-a), suggesting models struggle with non-commutativity. This highlights a limitation in basic arithmetic reasoning for LLMs.
Proposes a QUBO formulation to enhance privacy in federated learning by bounding the risk of membership inference attacks. This method aims to improve data protection while maintaining model utility in distributed training scenarios.
Introduces Fast, Private, and Protected (FPP), a novel approach for federated learning that safeguards data privacy and defends against model poisoning attacks. It aims to ensure secure and robust distributed model training.
Proposes IG-Pruning, a novel input-aware method for pruning transformer layers in LLMs. It dynamically removes layers based on input, reducing computational costs for efficient inference without significant performance degradation.
Introduces LTD-Bench, a benchmark for evaluating LLMs' spatial reasoning capabilities through drawing. It addresses the limitations of opaque numerical metrics by providing an intuitive understanding of model abilities for physical world applications.
Proposes GRACE, a lightweight score to quantify teacher model effectiveness for student model distillation. It measures distributional properties of student gradients without a verifier, enabling principled teacher selection for efficient knowledge transfer.
Introduces SEAL, a symmetry-encouraging loss function for high energy physics. It improves robustness and data efficiency of machine learning models by explicitly respecting physical symmetries, even with experimental imperfections.
Addresses GPU NUMA effects in large-scale attention workloads by proposing Swizzle, a novel kernel scheduling strategy. It exploits NUMA-aware locality to optimize attention performance, mitigating memory latency and bandwidth variations.
Introduces the 'Three Taxes' framework to analyze performance inefficiencies in distributed LLMs. Proposes moving beyond BSP to achieve efficient multi-GPU inference by addressing bulk synchronous, locality, and kernel launch overheads.
Proposes PrivGNN, a high-performance secure inference protocol for graph neural networks. It addresses the challenge of securing GNNs and graph data in privacy-critical cloud environments, enabling secure analysis of graph-structured data.
arxiv_cv

3D Point Cloud Object Detection on Edge Devices for Split Computing

Abstract: The field of autonomous driving technology is rapidly advancing, with deep learning being a key component. Particularly in the field of sensing, 3D point cloud data collected by LiDAR is utilized to run deep neural network models for 3D obj...
#Computer Vision#Autonomous Driving#Edge Computing#Distributed Machine Learning#Deep Learning Optimization
15 hours ago
90%
arxiv_cv

Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios

Abstract: The scarcity of data in various scenarios, such as medical, industry and autonomous driving, leads to model overfitting and dataset imbalance, thus hindering effective detection and segmentation performance. Existing studies employ the gene...
#Generative AI#Data Augmentation#Domain Adaptation#Computer Vision#Deep Learning
15 hours ago
95%
arxiv_cv

GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution

Abstract: Ultra-high-resolution (UHR) remote sensing (RS) imagery offers valuable data for Earth observation but poses challenges for existing multimodal foundation models due to two key bottlenecks: (1) limited availability of UHR training data, and ...
#Multimodal AI#Large Language Models#Remote Sensing#Computer Vision#Earth Observation
15 hours ago
96%
arxiv_cv

Light Future: Multimodal Action Frame Prediction via InstructPix2Pix

Abstract: Predicting future motion trajectories is a critical capability across domains such as robotics, autonomous systems, and human activity forecasting, enabling safer and more intelligent decision-making. This paper proposes a novel, efficient,...
#Robotics#Predictive Modeling#Multimodal AI#Computer Vision#Human-Robot Interaction
15 hours ago
91%
arxiv_cv

Robust Identity Perceptual Watermark Against Deepfake Face Swapping

Abstract: Notwithstanding offering convenience and entertainment to society, Deepfake face swapping has caused critical privacy issues with the rapid development of deep generative models. Due to imperceptible artifacts in high-quality synthetic imag...
#Digital Watermarking#Deepfake Detection and Prevention#AI Security#Privacy Preservation#Computer Vision
15 hours ago
85%
arxiv_cv

GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field

Abstract: Plane Geometry Diagram Synthesis has been a crucial task in computer graphics, with applications ranging from educational tools to AI-driven mathematical reasoning. Traditionally, we rely on manual tools (e.g., Matplotlib and GeoGebra) to g...
#Computer Graphics#Geometric Modeling#AI for Education#Procedural Content Generation#Mathematical Reasoning
15 hours ago
90%
arxiv_cv

SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration

Abstract: We introduce SigmaCollab, a dataset enabling research on physically situated human-AI collaboration. The dataset consists of a set of 85 sessions in which untrained participants were guided by a mixed-reality assistive AI agent in performin...
#Human-AI Interaction#Human-Robot Collaboration#Mixed Reality#Embodied AI#Dataset Creation
15 hours ago
95%
arxiv_cv

Real World Federated Learning with a Knowledge Distilled Transformer for Cardiac CT Imaging

Abstract: Federated learning is a renowned technique for utilizing decentralized data while preserving privacy. However, real-world applications often face challenges like partially labeled datasets, where only a few locations have certain expert ann...
#Federated Learning#Medical Image Analysis#Semi-Supervised Learning#Model Compression#Privacy in AI
15 hours ago
95%
arxiv_cv

Deep Fourier-embedded Network for RGB and Thermal Salient Object Detection

Abstract: The rapid development of deep learning has significantly improved salient object detection (SOD) combining both RGB and thermal (RGB-T) images. However, existing Transformer-based RGB-T SOD models with quadratic complexity are memory-intens...
#Salient Object Detection#Multi-modal Fusion#Computer Vision#Deep Learning#Efficient Architectures
15 hours ago
80%
arxiv_cv

RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing

Abstract: Recent advances in self-supervised learning for Vision Transformers (ViTs) have fueled breakthroughs in remote sensing (RS) foundation models. However, the quadratic complexity of self-attention poses a significant barrier to scalability, p...
#Remote Sensing Image Analysis#Foundation Models#Self-Supervised Learning#Efficient Deep Learning Architectures#Computer Vision
15 hours ago
75%
arxiv_cv

A Kullback-Leibler divergence method for input-system-state identification

Abstract: The capability of a novel Kullback-Leibler divergence method is examined herein within the Kalman filter framework to select the input-parameter-state estimation execution with the most plausible results. This identification suffers from th...
#Control Theory#State Estimation#Machine Learning#Information Theory#System Identification
15 hours ago
70%
arxiv_cv

Mobile Robotic Multi-View Photometric Stereo

Abstract: Multi-View Photometric Stereo (MVPS) is a popular method for fine-detailed 3D acquisition of an object from images. Despite its outstanding results on diverse material objects, a typical MVPS experimental setup requires a well-calibrated li...
#3D Reconstruction#Robotics Perception#Photometric Stereo#Sensor Fusion#Machine Learning for Robotics
15 hours ago
70%
arxiv_cv

A Practical Investigation of Spatially-Controlled Image Generation with Transformers

Abstract: Enabling image generation models to be spatially controlled is an important area of research, empowering users to better generate images according to their own fine-grained specifications via e.g. edge maps, poses. Although this task has se...
#Generative Models#Image Synthesis#Conditional Generation#Computer Vision#Deep Learning Architectures
15 hours ago
94%
arxiv_cv

Training Convolutional Neural Networks with the Forward-Forward algorithm

Abstract: Recent successes in image analysis with deep neural networks are achieved almost exclusively with Convolutional Neural Networks (CNNs), typically trained using the backpropagation (BP) algorithm. In a 2022 preprint, Geoffrey Hinton proposed...
#Deep Learning Training Methods#Alternative Learning Algorithms#Neural Network Architectures#Computational Neuroscience#Image Analysis
15 hours ago
70%
arxiv_cv

3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting

Abstract: Recent advancements in text-to-3D generation have shown remarkable results by leveraging 3D priors in combination with 2D diffusion. However, previous methods utilize 3D priors that lack detailed and complex structural information, limiting...
#3D Content Generation#Generative Modeling#Computer Graphics#AI for Design#Deep Learning
15 hours ago
85%
arxiv_cv

An unscented Kalman filter method for real time input-parameter-state estimation

Abstract: The input-parameter-state estimation capabilities of a novel unscented Kalman filter are examined herein on both linear and nonlinear systems. The unknown input is estimated in two stages within each time step. Firstly, the predicted dynamic...
#State Estimation#System Identification#Control Theory#Kalman Filtering#Real-time Systems
15 hours ago
65%
arxiv_cv

Breaking Down Monocular Ambiguity: Exploiting Temporal Evolution for 3D Lane Detection

Abstract: Monocular 3D lane detection aims to estimate the 3D position of lanes from frontal-view (FV) images. However, existing methods are fundamentally constrained by the inherent ambiguity of single-frame input, which leads to inaccurate geometri...
#3D Computer Vision#Autonomous Driving Perception#Deep Learning#Temporal Modeling#Scene Understanding
15 hours ago
94%
arxiv_cv

Label tree semantic losses for rich multi-class medical image segmentation

Abstract: Rich and accurate medical image segmentation is poised to underpin the next generation of AI-defined clinical practice by delineating critical anatomy for pre-operative planning, guiding real-time intra-operative navigation, and supporting ...
#Medical Image Segmentation#Deep Learning Losses#Hierarchical Classification#Weakly Supervised Learning#Computer Vision
15 hours ago
96%
arxiv_cv

Resource-efficient Automatic Refinement of Segmentations via Weak Supervision from Light Feedback

Abstract: Delineating anatomical regions is a key task in medical image analysis. Manual segmentation achieves high accuracy but is labor-intensive and prone to variability, thus prompting the development of automated approaches. Recently, a breadth ...
#Medical Image Segmentation#Weakly Supervised Learning#Foundation Models in Healthcare#Image Analysis#Machine Learning
15 hours ago
90%
arxiv_cv

FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks

Abstract: Proactive Deepfake detection via robust watermarks has seen interest ever since passive Deepfake detectors encountered challenges in identifying high-quality synthetic images. However, while demonstrating reasonable detection performance, t...
#AI Safety#Deepfake Detection#Digital Watermarking#Image Forensics#Robustness
15 hours ago
95%
arxiv_cv

Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal

Abstract: Universal adverse weather removal (UAWR) seeks to address various weather degradations within a unified framework. Recent methods are inspired by prompt learning using pre-trained vision-language models (e.g., CLIP), leveraging degradation-...
#Image Restoration#Computer Vision#Generative AI#Prompt Engineering#Adverse Weather Effects
15 hours ago
80%
arxiv_cv

Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras

Abstract: Event cameras offer microsecond-level latency and robustness to motion blur, making them ideal for understanding dynamic environments. Yet, connecting these asynchronous streams to human language remains an open challenge. We introduce Talk...
#Event-Based Vision#Multimodal Understanding#Language Grounding#Scene Understanding#Robotics Perception
15 hours ago
93%
arxiv_cv

HAGI++: Head-Assisted Gaze Imputation and Generation

Abstract: Mobile eye tracking plays a vital role in capturing human visual attention across both real-world and extended reality (XR) environments, making it an essential tool for applications ranging from behavioural research to human-computer inter...
#Human-Computer Interaction#Computer Vision#Machine Learning#Data Imputation#Sensor Fusion
15 hours ago
85%
arxiv_cl

SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning

Abstract: We introduce SAIL-RL, a reinforcement learning (RL) post-training framework that enhances the reasoning capabilities of multimodal large language models (MLLMs) by teaching them when and how to think. Existing approaches are limited by outc...
#Multimodal AI#Reinforcement Learning#LLM Reasoning#AI Alignment#Model Tuning
15 hours ago
90%
arxiv_cl

Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas

Abstract: We evaluate whether persona-based prompting improves Large Language Model (LLM) performance on macroeconomic forecasting tasks. Using 2,368 economics-related personas from the PersonaHub corpus, we prompt GPT-4o to replicate the ECB Survey ...
#Economics#Econometrics#Artificial Intelligence#Large Language Models#Forecasting Methods
15 hours ago
95%
arxiv_cl

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Abstract: Code has emerged as a precise and executable medium for reasoning and action in the agent era. Yet, progress has largely focused on language-centric tasks such as program synthesis and debugging, leaving visual-centric coding underexplored....
#Multimodal AI#Vision-Language Models#Code Generation#Benchmarking#Symbolic Reasoning
15 hours ago
88%
arxiv_ml

AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

Abstract: AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents' performance on MLE-bench, a ...
#Automated Machine Learning (AutoML)#AI Agents#Search Algorithms#Reinforcement Learning#Machine Learning Benchmarking
15 hours ago
80%
arxiv_cv

DenseMarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks

Abstract: We propose DenseMarks - a new learned representation for human heads, enabling high-quality dense correspondences of human head images. For a 2D image of a human head, a Vision Transformer network predicts a 3D embedding for each pixel, whi...
#3D Computer Vision#Human Body Modeling#Representation Learning#Geometric Deep Learning#Facial Analysis#Image Synthesis
15 hours ago
90%
arxiv_cl

Mixture of Routers

Abstract: Supervised fine-tuning (SFT) is a milestone in aligning large language models with human instructions and adapting them to downstream tasks. In particular, Low-Rank Adaptation (LoRA) has gained widespread attention due to its parameter effi...
#Large Language Models#Model Architectures#Efficient Training#Parameter-Efficient Fine-Tuning#Deep Learning Optimization
15 hours ago
92%
arxiv_ml

Evolutionary Machine Learning meets Self-Supervised Learning: a comprehensive survey

Abstract: The number of studies that combine Evolutionary Machine Learning and self-supervised learning has been growing steadily in recent years. Evolutionary Machine Learning has been shown to help automate the design of machine learning algorithms...
#Machine Learning Automation#Representation Learning#Data Efficiency#Algorithm Design#Survey of ML Techniques
15 hours ago
90%
