AIPapers.ai - AI Research Papers Daily

Today's Computer Vision Research Top Papers

Wednesday, November 5, 2025

📊 Read Full Intelligence Reports:

RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing

Introduces RoMA, a framework for scaling up Mamba-based foundation models for remote sensing. Its linear complexity addresses the scalability barrier that the quadratic self-attention of Vision Transformers poses for large models and high-resolution images in self-supervised pretraining.
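To make the linear-versus-quadratic contrast concrete: full self-attention scores every pair of tokens, while a state-space model such as Mamba updates a running state once per token. The sketch below assumes a simplified scalar recurrence with fixed, hypothetical parameters `a`, `b`, `c`. Real Mamba blocks (and RoMA) use multi-channel, input-dependent selective scans, so this illustrates only the complexity argument.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Number of query-key scores full self-attention computes: N * N."""
    return num_tokens * num_tokens


def linear_scan(xs: list[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> list[float]:
    """Toy scalar state-space recurrence, one update per token (O(N) time).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    a, b, c are hypothetical fixed constants; Mamba makes them
    input-dependent ("selective") and vector-valued.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    # A 1024x1024 image cut into 16x16 patches yields 64 * 64 = 4096 tokens.
    n = (1024 // 16) ** 2
    print(n, attention_pair_count(n))  # 4096 16777216
    print(linear_scan([1.0, 0.0, 0.0], a=0.5))  # [1.0, 0.5, 0.25]
```

Doubling the image side length quadruples the token count, which multiplies the attention score count by 16 but the scan length by only 4.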

HAGI++: Head-Assisted Gaze Imputation and Generation

Proposes HAGI++, a diffusion-based multimodal approach for imputing and generating missing gaze data in real-world and XR environments. Addresses challenges like blinks and tracking errors, enabling better behavioral research and HCI applications.
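For context on what imputing missing gaze data involves, the sketch below fills blink-induced gaps (marked `None`) by linear interpolation between the nearest valid samples. This is a deliberately simple baseline, not HAGI++'s method: HAGI++ uses a diffusion model that also draws on head-movement signals (hence "head-assisted"), which can recover non-linear eye motion inside a gap that straight-line interpolation cannot. The `Sample` type and function name are hypothetical.

```python
from typing import Optional

# An (x, y) gaze point, or None where tracking was lost (e.g. during a blink).
Sample = Optional[tuple[float, float]]


def interpolate_gaps(samples: list[Sample]) -> list[Sample]:
    """Fill interior runs of None by linear interpolation between neighbours.

    Leading and trailing gaps stay None because there is no valid
    sample on both sides to interpolate between.
    """
    out = list(samples)
    valid = [i for i, s in enumerate(samples) if s is not None]
    for left, right in zip(valid, valid[1:]):
        gap = right - left
        if gap <= 1:
            continue  # adjacent valid samples: nothing missing here
        (x0, y0), (x1, y1) = samples[left], samples[right]
        for k in range(1, gap):
            t = k / gap  # fraction of the way from left to right
            out[left + k] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return out


if __name__ == "__main__":
    print(interpolate_gaps([(0.0, 0.0), None, (2.0, 4.0)]))
    # [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
```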

Resource-efficient Automatic Refinement of Segmentations via Weak Supervision from Light Feedback

Introduces a resource-efficient method for automatic segmentation refinement using weak supervision from light feedback. Addresses limitations of foundation models in medical imaging by improving performance with less labor-intensive annotation.

Training Convolutional Neural Networks with the Forward-Forward algorithm

Extends the Forward-Forward (FF) algorithm for training Convolutional Neural Networks (CNNs). Proposes a biologically inspired alternative to backpropagation, enabling CNN training with locally defined goodness functions.
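The "locally defined goodness function" is the core idea of Forward-Forward training. A minimal sketch, assuming the goodness measure from Hinton's original FF proposal (sum of squared activations compared against a threshold) rather than the CNN-specific variant this paper introduces: each layer is trained only to push goodness above the threshold for positive (real) data and below it for negative data, so no error signal has to be backpropagated through the network. The threshold value `theta = 2.0` is hypothetical.

```python
import math


def goodness(activations: list[float]) -> float:
    """Goodness of a layer's output: the sum of squared activations."""
    return sum(a * a for a in activations)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ff_layer_loss(pos_acts: list[float], neg_acts: list[float], theta: float = 2.0) -> float:
    """Local loss for one layer, computed from that layer's activations only.

    Equals -log sigmoid(g_pos - theta) - log sigmoid(theta - g_neg):
    small when positive data has high goodness and negative data low goodness.
    """
    g_pos = goodness(pos_acts)
    g_neg = goodness(neg_acts)
    return softplus(theta - g_pos) + softplus(g_neg - theta)


if __name__ == "__main__":
    print(ff_layer_loss([2.0, 0.0], [0.0, 0.0]))  # well separated: small loss
    print(ff_layer_loss([0.0, 0.0], [2.0, 0.0]))  # reversed: large loss
```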

Deep Fourier-embedded Network for RGB and Thermal Salient Object Detection

Proposes FreqSal, a purely Fourier Transform-based model for RGB-T salient object detection. Overcomes quadratic complexity limitations of Transformer models, enabling efficient bimodal feature fusion for high-resolution images.
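The efficiency argument rests on the Fast Fourier Transform: every output frequency depends on every input sample (a global receptive field, as with attention), yet the FFT computes all of them in O(N log N) time instead of the O(N^2) of a naive transform or of full attention. The sketch below is a textbook radix-2 Cooley-Tukey FFT checked against the naive DFT; it illustrates the complexity claim only and is not FreqSal's architecture.

```python
import cmath


def dft(xs: list[complex]) -> list[complex]:
    """Naive discrete Fourier transform: N outputs x N inputs = O(N^2) work."""
    n = len(xs)
    return [
        sum(xs[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft(xs: list[complex]) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT: O(N log N) work."""
    n = len(xs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(xs[0])]
    even = fft(xs[0::2])  # transform of the even-indexed samples
    odd = fft(xs[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 0, 0, 0, 0]
    print(max(abs(a - b) for a, b in zip(fft(signal), dft(signal))))  # ~1e-15
```

Each recursion level halves the problem and does O(N) combining work, and there are log2(N) levels, which is where the O(N log N) bound comes from.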

Mobile Robotic Multi-View Photometric Stereo

Introduces a new mobile robotic system for Multi-View Photometric Stereo (MVPS) 3D acquisition. Enables MVPS benefits on movable platforms, expanding 3D acquisition capabilities for mobile robotics applications.

Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal

Proposes CyclicPrompt, a cyclic prompting approach for Universal Adverse Weather Removal (UAWR). Enhances effectiveness, adaptability, and generalizability of weather-free image restoration using prompt learning with vision-language models.

Breaking Down Monocular Ambiguity: Exploiting Temporal Evolution for 3D Lane Detection

Proposes a Geometry-aware Temporal Aggregation Network to address monocular ambiguity in 3D lane detection. Exploits temporal evolution information to improve geometric predictions and lane integrity, especially for distant lanes.

GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field

Proposes GeoSDF, a framework for synthesizing plane geometry diagrams via Signed Distance Fields (SDFs). Targets automatic, accurate diagram generation in place of manual tools such as Matplotlib and GeoGebra, with applications ranging from educational tools to AI-driven mathematical reasoning.
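A signed distance field assigns each point its distance to a shape's boundary, negative inside and positive outside. As a minimal sketch, assuming two standard primitives (a circle and a line segment) and the common `min()` union rule, and not GeoSDF's actual constraint-optimization pipeline, the code below evaluates such fields and renders a circle with its diameter as an ASCII diagram by thresholding the distance. The scene coordinates and stroke width are hypothetical.

```python
import math


def sdf_circle(px: float, py: float, cx: float, cy: float, r: float) -> float:
    """Signed distance to a circle of radius r: <0 inside, 0 on, >0 outside."""
    return math.hypot(px - cx, py - cy) - r


def sdf_segment(px: float, py: float, ax: float, ay: float, bx: float, by: float) -> float:
    """Unsigned distance from P to segment AB (A and B must differ)."""
    abx, aby = bx - ax, by - ay
    t = ((px - ax) * abx + (py - ay) * aby) / (abx * abx + aby * aby)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * abx), py - (ay + t * aby))


def render(size: int = 11, stroke: float = 0.5) -> str:
    """Draw a circle and its horizontal diameter; '#' where distance < stroke."""
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            d_circle = abs(sdf_circle(x, y, 5, 5, 4))  # distance to the outline
            d_line = sdf_segment(x, y, 1, 5, 9, 5)
            d = min(d_circle, d_line)  # union of shapes = pointwise minimum
            row.append("#" if d < stroke else ".")
        rows.append("".join(row))
    return "\n".join(rows)


if __name__ == "__main__":
    print(render())
```

Because each shape is just a distance function, geometric relations (tangency, incidence) can be written as equations on these values, which is what makes SDFs attractive for optimization-based diagram synthesis.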

Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios

Proposes Crucial-Diff, a unified diffusion model for synthesizing crucial images and annotations in data-scarce scenarios. Addresses model overfitting and dataset imbalance by generating targeted training samples to improve detection and segmentation.

arxiv_cv

Breaking Down Monocular Ambiguity: Exploiting Temporal Evolution for 3D Lane Detection

Abstract: Monocular 3D lane detection aims to estimate the 3D position of lanes from frontal-view (FV) images. However, existing methods are fundamentally constrained by the inherent ambiguity of single-frame input, which leads to inaccurate geometri...

#3D Computer Vision, #Autonomous Driving Perception, #Deep Learning, #Temporal Modeling, #Scene Understanding

17 hours ago

94%

arxiv_cv

Deep Fourier-embedded Network for RGB and Thermal Salient Object Detection

Abstract: The rapid development of deep learning has significantly improved salient object detection (SOD) combining both RGB and thermal (RGB-T) images. However, existing Transformer-based RGB-T SOD models with quadratic complexity are memory-intens...

#Salient Object Detection, #Multi-modal Fusion, #Computer Vision, #Deep Learning, #Efficient Architectures

17 hours ago

80%

arxiv_cv

HAGI++: Head-Assisted Gaze Imputation and Generation

Abstract: Mobile eye tracking plays a vital role in capturing human visual attention across both real-world and extended reality (XR) environments, making it an essential tool for applications ranging from behavioural research to human-computer inter...

#Human-Computer Interaction, #Computer Vision, #Machine Learning, #Data Imputation, #Sensor Fusion

17 hours ago

85%

arxiv_cv

Resource-efficient Automatic Refinement of Segmentations via Weak Supervision from Light Feedback

Abstract: Delineating anatomical regions is a key task in medical image analysis. Manual segmentation achieves high accuracy but is labor-intensive and prone to variability, thus prompting the development of automated approaches. Recently, a breadth ...

#Medical Image Segmentation, #Weakly Supervised Learning, #Foundation Models in Healthcare, #Image Analysis, #Machine Learning

17 hours ago

90%

arxiv_cv

3D Point Cloud Object Detection on Edge Devices for Split Computing

Abstract: The field of autonomous driving technology is rapidly advancing, with deep learning being a key component. Particularly in the field of sensing, 3D point cloud data collected by LiDAR is utilized to run deep neural network models for 3D obj...

#Computer Vision, #Autonomous Driving, #Edge Computing, #Distributed Machine Learning, #Deep Learning Optimization

17 hours ago

90%

arxiv_cv

Real World Federated Learning with a Knowledge Distilled Transformer for Cardiac CT Imaging

Abstract: Federated learning is a renowned technique for utilizing decentralized data while preserving privacy. However, real-world applications often face challenges like partially labeled datasets, where only a few locations have certain expert ann...

#Federated Learning, #Medical Image Analysis, #Semi-Supervised Learning, #Model Compression, #Privacy in AI

17 hours ago

95%
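Federated learning keeps each site's data local and shares only model parameters. A minimal sketch of the classic FedAvg aggregation step (McMahan et al.), assuming each client sends a flat parameter vector together with its local sample count; the knowledge distillation and partial-label handling described in this paper are not shown, and the function name is hypothetical.

```python
def fedavg(client_params: list[list[float]], client_sizes: list[int]) -> list[float]:
    """Average client parameter vectors, weighting each by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same model shape")
        weight = n / total  # larger sites contribute proportionally more
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params


if __name__ == "__main__":
    # Site A has 1 labeled scan, site B has 3.
    print(fedavg([[1.0, 0.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.0]
```

Only the averaged parameters leave the aggregation server; the raw scans never do, which is the privacy property the abstract refers to.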

arxiv_cv

GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field

Abstract: Plane Geometry Diagram Synthesis has been a crucial task in computer graphics, with applications ranging from educational tools to AI-driven mathematical reasoning. Traditionally, we rely on manual tools (e.g., Matplotlib and GeoGebra) to g...

#Computer Graphics, #Geometric Modeling, #AI for Education, #Procedural Content Generation, #Mathematical Reasoning

17 hours ago

90%

arxiv_cv

Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal

Abstract: Universal adverse weather removal (UAWR) seeks to address various weather degradations within a unified framework. Recent methods are inspired by prompt learning using pre-trained vision-language models (e.g., CLIP), leveraging degradation-...

#Image Restoration, #Computer Vision, #Generative AI, #Prompt Engineering, #Adverse Weather Effects

17 hours ago

80%

arxiv_cv

Label tree semantic losses for rich multi-class medical image segmentation

Abstract: Rich and accurate medical image segmentation is poised to underpin the next generation of AI-defined clinical practice by delineating critical anatomy for pre-operative planning, guiding real-time intra-operative navigation, and supporting ...

#Medical Image Segmentation, #Deep Learning Losses, #Hierarchical Classification, #Weakly Supervised Learning, #Computer Vision

17 hours ago

96%

arxiv_cv

Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras

Abstract: Event cameras offer microsecond-level latency and robustness to motion blur, making them ideal for understanding dynamic environments. Yet, connecting these asynchronous streams to human language remains an open challenge. We introduce Talk...

#Event-Based Vision, #Multimodal Understanding, #Language Grounding, #Scene Understanding, #Robotics Perception

17 hours ago

93%

arxiv_cv

RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing

Abstract: Recent advances in self-supervised learning for Vision Transformers (ViTs) have fueled breakthroughs in remote sensing (RS) foundation models. However, the quadratic complexity of self-attention poses a significant barrier to scalability, p...

#Remote Sensing Image Analysis, #Foundation Models, #Self-Supervised Learning, #Efficient Deep Learning Architectures, #Computer Vision

17 hours ago

75%

arxiv_cv

Modality-Transition Representation Learning for Visible-Infrared Person Re-Identification

Abstract: Visible-infrared person re-identification (VI-ReID) technique could associate the pedestrian images across visible and infrared modalities in the practical scenarios of background illumination changes. However, a substantial gap inherently ...

#Person Re-Identification, #Multimodal Learning, #Representation Learning, #Computer Vision, #Image Generation

17 hours ago

85%

arxiv_cv

Unsupervised Learning for Industrial Defect Detection: A Case Study on Shearographic Data

Abstract: Shearography is a non-destructive testing method for detecting subsurface defects, offering high sensitivity and full-field inspection capabilities. However, its industrial adoption remains limited due to the need for expert interpretation....

#Computer Vision, #Unsupervised Learning, #Industrial Inspection, #Non-Destructive Testing (NDT), #Anomaly Detection, #Machine Learning for Manufacturing

17 hours ago

85%

arxiv_cv

Can Foundation Models Revolutionize Mobile AR Sparse Sensing?

Abstract: Mobile sensing systems have long faced a fundamental trade-off between sensing quality and efficiency due to constraints in computation, power, and other limitations. Sparse sensing, which aims to acquire and process only a subset of sensor...

#Augmented Reality (AR), #Mobile Sensing, #Foundation Models, #3D Computer Vision, #Efficient AI

17 hours ago

90%

arxiv_cl

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Abstract: Code has emerged as a precise and executable medium for reasoning and action in the agent era. Yet, progress has largely focused on language-centric tasks such as program synthesis and debugging, leaving visual-centric coding underexplored....

#Multimodal AI, #Vision-Language Models, #Code Generation, #Benchmarking, #Symbolic Reasoning

17 hours ago

88%

arxiv_cv

RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning

Abstract: Large-scale chemical reaction datasets are crucial for AI research in chemistry. However, existing chemical reaction data often exist as images within papers, making them not machine-readable and unusable for training machine learning model...

#AI in Chemistry, #Computer Vision, #Natural Language Processing, #Multimodal Learning, #Information Extraction

17 hours ago

85%

arxiv_ml

OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data

Abstract: Existing benchmarks for multimodal learning in Earth science offer limited, siloed coverage of Earth's spheres and their cross-sphere interactions, typically restricting evaluation to the human-activity sphere of atmosphere and to at most 1...

#Earth Science, #Multimodal Learning, #Machine Learning Benchmarking, #Climate Modeling, #Environmental Science

17 hours ago

85%

arxiv_cv

Assessing the value of Geo-Foundational Models for Flood Inundation Mapping: Benchmarking models for Sentinel-1, Sentinel-2, and Planetscope for end-users

Abstract: Geo-Foundational Models (GFMs) enable fast and reliable extraction of spatiotemporal information from satellite imagery, improving flood inundation mapping by leveraging location and time embeddings. Despite their potential, it remains uncl...

#Remote Sensing, #Geospatial AI, #Environmental Monitoring, #Computer Vision, #Deep Learning

17 hours ago

90%

arxiv_cv

Collaborative Attention and Consistent-Guided Fusion of MRI and PET for Alzheimer's Disease Diagnosis

Abstract: Alzheimer's disease (AD) is the most prevalent form of dementia, and its early diagnosis is essential for slowing disease progression. Recent studies on multimodal neuroimaging fusion using MRI and PET have achieved promising results by int...

#Medical Imaging, #Neuroscience, #Machine Learning for Healthcare, #Multimodal Learning, #Disease Diagnosis

17 hours ago

95%

arxiv_ml

NMCSE: Noise-Robust Multi-Modal Coupling Signal Estimation Method via Optimal Transport for Cardiovascular Disease Detection

Abstract: The coupling signal refers to a latent physiological signal that characterizes the transformation from cardiac electrical excitation, captured by the electrocardiogram (ECG), to mechanical contraction, recorded by the phonocardiogram (PCG)....

#Medical Signal Analysis, #Multi-modal Fusion, #Robust Machine Learning, #Cardiovascular Health, #Biomedical Engineering, #Data-driven Diagnostics

17 hours ago

85%

arxiv_ml

Weakly Supervised Object Segmentation by Background Conditional Divergence

Abstract: As a computer vision task, automatic object segmentation remains challenging in specialized image domains without massive labeled data, such as synthetic aperture sonar images, remote sensing, biomedical imaging, etc. In any domain, obtaini...

#Computer Vision, #Image Segmentation, #Weakly Supervised Learning, #Generative Models, #Domain Adaptation

17 hours ago

80%

arxiv_cv

Markerless Augmented Reality Registration for Surgical Guidance: A Multi-Anatomy Clinical Accuracy Study

Abstract: Purpose: In this paper, we develop and clinically evaluate a depth-only, markerless augmented reality (AR) registration pipeline on a head-mounted display, and assess accuracy across small or low-curvature anatomies in real-life operative s...

#Augmented Reality, #Medical Imaging, #Computer Vision, #Surgical Robotics, #Medical Devices, #3D Registration

17 hours ago

95%

arxiv_cv

M3PD Dataset: Dual-view Photoplethysmography (PPG) Using Front-and-rear Cameras of Smartphones in Lab and Clinical Settings

Abstract: Portable physiological monitoring is essential for early detection and management of cardiovascular disease, but current methods often require specialized equipment that limits accessibility or impose impractical postures that patients cann...

#Physiological Monitoring, #Medical Informatics, #Computer Vision, #Signal Processing, #Dataset Creation

17 hours ago

90%

arxiv_cv

Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization

Abstract: With the rapid growth of the low-altitude economy, UAVs have become crucial for measurement and tracking in patrol systems. However, in GNSS-denied areas, satellite-based localization methods are prone to failure. This paper presents a cros...

#Robotics, #Computer Vision, #Localization, #Remote Sensing, #Geospatial Analysis

17 hours ago

85%

arxiv_cv

LiteVoxel: Low-memory Intelligent Thresholding for Efficient Voxel Rasterization

Abstract: Sparse-voxel rasterization is a fast, differentiable alternative for optimization-based scene reconstruction, but it tends to underfit low-frequency content, depends on brittle pruning heuristics, and can overgrow in ways that inflate VRAM....

#Computer Vision, #3D Reconstruction, #Differentiable Rendering, #Scene Representation, #Graphics, #Neural Rendering

17 hours ago

90%

arxiv_cv

A Novel Grouping-Based Hybrid Color Correction Algorithm for Color Point Clouds

Abstract: Color consistency correction for color point clouds is a fundamental yet important task in 3D rendering and compression applications. In the past, most previous color correction methods aimed at correcting color for color images. The purpos...

#Computer Vision, #3D Graphics, #Point Cloud Processing, #Image Processing, #Geometric Modeling

17 hours ago

90%

arxiv_cv

Locally-Supervised Global Image Restoration

Abstract: We address the problem of image reconstruction from incomplete measurements, encompassing both upsampling and inpainting, within a learning-based framework. Conventional supervised approaches require fully sampled ground truth data, while s...

#Image Reconstruction, #Low-Data Learning, #Signal Processing, #Medical Imaging, #Computer Vision

17 hours ago

88%

arxiv_cv

Differentiable Hierarchical Visual Tokenization

Abstract: Vision Transformers rely on fixed patch tokens that ignore the spatial and semantic structure of images. In this work, we introduce an end-to-end differentiable tokenizer that adapts to image content with pixel-level granularity while remai...

#Vision Transformers, #Image Representation, #Tokenization, #Deep Learning Architectures, #Computer Vision

17 hours ago

80%

arxiv_cv

PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing

Abstract: We present PercHead, a method for single-image 3D head reconstruction and semantic 3D editing - two tasks that are inherently challenging due to severe view occlusions, weak perceptual supervision, and the ambiguity of editing in 3D space. ...

#3D Computer Vision, #Generative Models, #Image Reconstruction, #Perceptual Learning, #Human Face Modeling

17 hours ago

95%

arxiv_cv

Estimation of Segmental Longitudinal Strain in Transesophageal Echocardiography by Deep Learning

Abstract: Segmental longitudinal strain (SLS) of the left ventricle (LV) is an important prognostic indicator for evaluating regional LV dysfunction, in particular for diagnosing and managing myocardial ischemia. Current techniques for strain estimat...

#Medical Imaging, #Cardiology, #Deep Learning, #Motion Analysis, #Prognostic Indicators

17 hours ago

95%
