arxiv_cv · Research Paper · Audience: computer vision researchers, AI engineers working on object detection, developers of surveillance and robotics systems

Deep Fourier-embedded Network for RGB and Thermal Salient Object Detection

Abstract

The rapid development of deep learning has significantly improved salient object detection (SOD) combining both RGB and thermal (RGB-T) images. However, existing Transformer-based RGB-T SOD models with quadratic complexity are memory-intensive, limiting their application in high-resolution bimodal feature fusion. To overcome this limitation, we propose a purely Fourier Transform-based model, namely Deep Fourier-embedded Network (FreqSal), for accurate RGB-T SOD. Specifically, we leverage the efficiency of Fast Fourier Transform with linear complexity to design three key components: (1) To fuse RGB and thermal modalities, we propose Modal-coordinated Perception Attention, which aligns and enhances bimodal Fourier representation in multiple dimensions; (2) To clarify object edges and suppress noise, we design Frequency-decomposed Edge-aware Block, which deeply decomposes and filters Fourier components of low-level features; (3) To accurately decode features, we propose Fourier Residual Channel Attention Block, which prioritizes high-frequency information while aligning channel-wise global relationships. Additionally, even when converged, existing deep learning-based SOD models' predictions still exhibit frequency gaps relative to ground-truth. To address this problem, we propose Co-focus Frequency Loss, which dynamically weights hard frequencies during edge frequency reconstruction by cross-referencing bimodal edge information in the Fourier domain. Extensive experiments on ten bimodal SOD benchmark datasets demonstrate that FreqSal outperforms twenty-nine existing state-of-the-art bimodal SOD models. Comprehensive ablation studies further validate the value and effectiveness of our newly proposed components. The code is available at https://github.com/JoshuaLPF/FreqSal.
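
The abstract describes the Co-focus Frequency Loss only at a high level. As a rough, hypothetical illustration of the underlying idea of dynamically weighting hard frequencies, the pure-Python sketch below compares the 2D spectra of a predicted saliency map and its ground truth and up-weights the frequencies with the largest error. It is not the authors' loss: it handles a single prediction/target pair, leaves out the cross-referencing of RGB and thermal edge information, and the names `fft2`, `hard_frequency_loss` and the exponent `alpha` are assumptions. In an actual training loop the spectra would come from a framework FFT and the weights would be treated as constants.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft2(img):
    """2D FFT: transform every row, then every column."""
    rows = [fft(list(row)) for row in img]
    cols = [fft(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def hard_frequency_loss(pred, target, alpha=1.0):
    """Hypothetical hard-frequency-weighted spectral loss (not the paper's Co-focus loss)."""
    p_spec, t_spec = fft2(pred), fft2(target)
    # Per-frequency spectral error |P(u, v) - T(u, v)|, flattened.
    dist = [abs(p - t) for pr, tr in zip(p_spec, t_spec) for p, t in zip(pr, tr)]
    raw = [d ** alpha for d in dist]
    peak = max(raw)
    if peak == 0.0:
        return 0.0  # spectra match exactly: no frequency gap to penalize
    # Dynamic weights in [0, 1]: the hardest frequency gets weight 1.
    weights = [r / peak for r in raw]
    return sum(w * d * d for w, d in zip(weights, dist)) / len(dist)


if __name__ == "__main__":
    target = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    soft = [[0, 0, 0, 0], [0, 0.9, 0.9, 0], [0, 0.9, 0.9, 0], [0, 0, 0, 0]]
    print(hard_frequency_loss(target, target))  # 0.0
    print(hard_frequency_loss(soft, target))
```

With `alpha` above 1 the weights concentrate more sharply on the worst-reconstructed frequencies; at `alpha = 0` every frequency gets weight 1 and the sketch reduces to a plain spectral mean-squared error.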

Key Contributions

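FreqSal is a purely Fourier Transform-based model for RGB-T salient object detection (SOD). The Fast Fourier Transform scales as O(N log N) in the number of pixels N, compared with O(N²) for self-attention, so FreqSal avoids the quadratic memory cost of Transformer-based models when fusing high-resolution bimodal features. Its components are Modal-coordinated Perception Attention, which fuses RGB and thermal features in the Fourier domain; the Frequency-decomposed Edge-aware Block, which sharpens object edges and suppresses noise; the Fourier Residual Channel Attention Block, which decodes features with an emphasis on high-frequency information; and the Co-focus Frequency Loss, which weights hard frequencies during edge reconstruction.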
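
The Frequency-decomposed Edge-aware Block and the Fourier Residual Channel Attention Block both build on separating low-frequency content (smooth regions) from high-frequency content (edges and fine detail). The pure-Python sketch below shows only that generic separation, using a hypothetical square low-pass mask of half-width `cutoff` around the zero frequency; the function names are illustrative, and the learned filtering and attention inside the actual blocks are not reproduced.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(x):
    """Inverse FFT via the conjugation trick: conj(FFT(conj(x))) / n."""
    n = len(x)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in x])]


def apply_2d(transform, grid):
    """Apply a 1D transform to every row, then to every column."""
    rows = [transform(list(row)) for row in grid]
    cols = [transform(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def split_frequencies(img, cutoff):
    """Split img into (low, high) bands with a square low-pass mask in the Fourier domain."""
    h, w = len(img), len(img[0])
    spec = apply_2d(fft, img)

    def signed(k, n):
        # Map FFT bin index k to a signed frequency in [-n/2, n/2].
        return k if k <= n // 2 else k - n

    low_spec = [
        [spec[u][v] if abs(signed(u, h)) <= cutoff and abs(signed(v, w)) <= cutoff else 0j
         for v in range(w)]
        for u in range(h)
    ]
    # The mask is conjugate-symmetric, so the inverse is real up to rounding error.
    low = [[z.real for z in row] for row in apply_2d(ifft, low_spec)]
    # Everything the low-pass band misses is the high-frequency (edge/detail) band.
    high = [[img[i][j] - low[i][j] for j in range(w)] for i in range(h)]
    return low, high


if __name__ == "__main__":
    # Hypothetical 8x8 map with a vertical step edge between columns 3 and 4.
    step = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
    low, high = split_frequencies(step, cutoff=1)
    print([round(v, 3) for v in high[0]])
```

On the example step map, the high band is about ±0.25 in the columns beside the edge and about ±0.10 in the flat interior. Because the DFT treats the map as periodic, the wrap-around between the last and first columns also registers as an edge.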

Business Value

Enables more efficient and accurate salient object detection using multi-modal data (RGB and thermal), beneficial for applications requiring robust object identification in various lighting and environmental conditions.

Paper Metadata

Innovation Type

Algorithmic

Deployment Feasibility

Potentially suitable for edge devices or other memory-constrained systems, since FFT-based feature mixing avoids the quadratic memory cost of self-attention.

Limitations Addressed

Memory-intensive nature and quadratic complexity of existing Transformer-based RGB-T SOD models, limiting their application in high-resolution bimodal feature fusion.
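
To make the quadratic-versus-log-linear contrast concrete, the sketch below counts, for a few hypothetical square feature-map sizes, the entries in one self-attention score matrix (N² for N = H·W tokens) against the radix-2 butterflies in a 2D FFT ((N/2)·log2 N), along with the memory each needs. The sizes, float32 attention scores and complex64 spectra are assumptions for illustration, not the paper's configuration, and only this single operation is counted, for one head and one image.

```python
def attention_scores(h, w):
    """Entries in one self-attention score matrix over an h*w token grid: N**2."""
    n = h * w
    return n * n


def fft2_butterflies(h, w):
    """Radix-2 butterflies in a 2D FFT of an h x w grid: (N/2) * log2(N)."""
    n = h * w
    if n & (n - 1):
        raise ValueError("radix-2 count assumes power-of-two sizes")
    return (n // 2) * (n.bit_length() - 1)


def attention_bytes(h, w, bytes_per_score=4):
    """Memory for one float32 attention map (one head, one image)."""
    return attention_scores(h, w) * bytes_per_score


def spectrum_bytes(h, w, bytes_per_value=8):
    """Memory for one complex64 spectrum of the same feature map."""
    return h * w * bytes_per_value


if __name__ == "__main__":
    for side in (32, 64, 128):
        ratio = attention_scores(side, side) / fft2_butterflies(side, side)
        print(
            f"{side}x{side}: attention {attention_bytes(side, side) / 2**20:.0f} MiB, "
            f"spectrum {spectrum_bytes(side, side) / 2**10:.0f} KiB, "
            f"scores/butterflies = {ratio:.0f}"
        )
```

Running it reports 4, 64 and 1024 MiB for the attention map versus 8, 32 and 128 KiB for the spectrum: each doubling of the side length multiplies the attention memory by 16 but the spectrum memory by only 4.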

Technical Tags

Salient Object Detection (SOD), RGB-T Fusion, Fourier Transform, Transformer Models, Linear Complexity, Modal-coordinated Perception Attention, Frequency-decomposed Edge-aware Block, Deep Fourier-embedded Network (FreqSal)

Research Topics

Salient Object Detection, Multi-modal Fusion, Computer Vision, Deep Learning, Efficient Architectures

Methods & Architectures

Fast Fourier Transform (FFT), Modal-coordinated Perception Attention, Frequency-decomposed Edge-aware Block, Fourier Residual Channel Attention Block (Fourier-based feature decoding), Co-focus Frequency Loss, Fourier Transform-based network

Applications & Tasks

Computer Vision, Image Analysis, Surveillance, Robotics Perception, Efficient fusion of RGB and thermal data for SOD, Reducing the memory footprint of Transformer-based SOD models, Improving edge clarity and noise suppression in SOD, Salient Object Detection, RGB-T image analysis, Feature fusion

Related Fields

Computer Vision, Deep Learning, Signal Processing, Machine Learning, Image Processing

Keywords

Salient object detection, RGB-T, Fourier transform, Transformer, Linear complexity, Multi-modal fusion, Edge detection, Computer vision, Deep learning, Efficient models

Commercial Potential

Potential Products

Advanced surveillance systems, Robotic vision systems, Image analysis tools for thermal and RGB data

Target Industries

Security, Automotive, Robotics, Manufacturing, Healthcare (e.g., medical imaging analysis)

Use Case Examples

Detecting people or objects in low-light conditions using thermal and RGB cameras; Improving object tracking in autonomous vehicles; Automated inspection systems that utilize thermal signatures

Competitive Edge

Offers a more memory-efficient alternative to Transformer-based RGB-T SOD models while, according to the authors, outperforming twenty-nine state-of-the-art bimodal SOD models on ten benchmark datasets.

Market Opportunity

Growing market for multi-modal sensing and intelligent vision systems.

Revenue Models

Licensing of the algorithm; integration into hardware/software solutions.

Resource Requirements

Compute Needs

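Moderate: FFT-based feature mixing scales as O(N log N) with pixel count rather than quadratically, but FFTs over high-resolution feature maps still add non-trivial compute.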

Data Requirements

Paired RGB and thermal images with salient object annotations.

Deployment Constraints

Requires availability of both RGB and thermal sensors.

Scalability

Scales to higher input resolutions than quadratic-attention models, since FFT cost grows as O(N log N) and spectrum memory as O(N) in the number of pixels.

Production Readiness

Maturity Level

Research

Time to Market

1-2 years

Patent Potential

Moderate (novel architectural components and Fourier-based approach)
