Frequency-Guided Fusion For RGB-Thermal Semantic Segmentation

arXiv cs.CV · İsmail Emre Canıtez, Özgür Erkent


~2 min read · 5/27/2026 · en

Quick Take

The paper presents an RGB-thermal fusion architecture that improves semantic segmentation in challenging conditions such as adverse lighting.

Key Points

Utilizes dual ConvNeXt V2 backbones for multi-modal fusion.
Introduces frequency-based fusion for early-stage thermal features.
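Achieves 61.73% mIoU on MFNet and 86.24% on PST900 with its lightest variant (35.43M parameters), using fewer parameters and lower computational cost than recent methods.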
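For reference, mIoU (mean intersection-over-union) averages the per-class score IoU = TP / (TP + FP + FN). The minimal NumPy sketch below computes it from a confusion matrix. The `ignore_index=255` convention and the choice to skip classes absent from both prediction and ground truth are assumptions; the official MFNet and PST900 evaluation scripts may treat unlabeled pixels differently.

```python
import numpy as np


def confusion_matrix(pred: np.ndarray, target: np.ndarray, num_classes: int,
                     ignore_index: int = 255) -> np.ndarray:
    """Rows index the ground-truth class, columns the predicted class."""
    mask = target != ignore_index  # drop pixels marked as ignored (assumed convention)
    idx = num_classes * target[mask].astype(np.int64) + pred[mask].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf: np.ndarray) -> float:
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c, but ground truth differs
    fn = conf.sum(axis=1) - tp  # ground truth is class c, but predicted otherwise
    denom = tp + fp + fn
    valid = denom > 0           # skip classes that appear in neither pred nor target
    return float((tp[valid] / denom[valid]).mean())


pred = np.array([[0, 0, 1], [1, 1, 2]])
target = np.array([[0, 1, 1], [1, 1, 2]])
print(mean_iou(confusion_matrix(pred, target, num_classes=3)))  # per-class IoU 0.5, 0.75, 1.0
```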

Article Content

From source RSS / original summary

arXiv:2605.26273v1 (Announce Type: new)

Abstract: Semantic segmentation in complex environments such as urban driving scenes remains challenging under adverse lighting conditions, where RGB images alone provide insufficient information. RGB-Thermal fusion leverages the complementary strengths of visible and infrared imagery to improve scene understanding; however, effectively integrating these heterogeneous modalities at varying levels of feature abstraction remains an open problem.

In this paper, we propose a multi-modal fusion architecture built upon dual ConvNeXt V2 backbones that employs stage-wise, modality-adaptive fusion strategies.

For early-stage features, we introduce a Frequency-Based Fusion Module that decomposes infrared features into low- and high-frequency components via Gaussian filtering, applies dual-branch spatial attention to selectively emphasize thermal patterns and fine-grained boundaries, and integrates them with RGB features through a confidence-gated residual mechanism.
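The early-stage idea can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' module: RGB and thermal features are assumed to share the same (C, H, W) shape, the Gaussian `sigma`, the CBAM-style spatial attention with fixed weights (standing in for a learned convolution), and the per-pixel confidence gate `sigmoid(mean|thermal| - mean|rgb|)` are all hypothetical choices, and every function name is invented for this example.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gaussian_blur(feat: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian low-pass over the spatial dims of a (C, H, W) map (reflect padding)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()  # normalize so a constant map is preserved
    _, h, w = feat.shape
    padded = np.pad(feat, ((0, 0), (radius, radius), (radius, radius)), mode="reflect")
    # Filter along H, then along W (the kernel is symmetric, so no flip is needed).
    tmp = sum(k[i] * padded[:, i:i + h, :] for i in range(2 * radius + 1))
    return sum(k[j] * tmp[:, :, j:j + w] for j in range(2 * radius + 1))


def spatial_attention(feat: np.ndarray, w_avg: float = 1.0, w_max: float = 1.0) -> np.ndarray:
    """CBAM-style (H, W) map from channel-wise mean and max; fixed weights replace a learned conv."""
    return sigmoid(w_avg * feat.mean(axis=0) + w_max * feat.max(axis=0))


def frequency_fusion(rgb: np.ndarray, ir: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Hypothetical early-stage fusion: split IR into low/high bands, attend to each, gate into RGB."""
    low = gaussian_blur(ir, sigma)   # smooth thermal patterns
    high = ir - low                  # edges and fine-grained boundaries
    # Dual-branch spatial attention: one attention map per frequency band.
    thermal = low * spatial_attention(low)[None] + high * spatial_attention(np.abs(high))[None]
    # Hypothetical per-pixel confidence: trust thermal more where its response exceeds RGB's.
    gate = sigmoid(np.abs(thermal).mean(axis=0) - np.abs(rgb).mean(axis=0))[None]
    return rgb + gate * thermal      # residual: RGB passes through unchanged when gate * thermal is 0


rng = np.random.default_rng(0)
fused = frequency_fusion(rng.standard_normal((4, 8, 8)), rng.standard_normal((4, 8, 8)))
print(fused.shape)
```

Because `high = ir - low`, the two bands sum back to the original thermal map, so the split itself discards nothing; only the attention maps and the gate decide how much of each band reaches the RGB stream.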

For late-stage features, we design a semantic fusion module with cross-modal attention and multi-scale depthwise convolutions to capture semantic correspondences across modalities. The fused features are decoded via a PANet-style bidirectional decoder with deep supervision. Experiments on MFNet and PST900 demonstrate that our lightest variant achieves 61.73% and 86.24% mIoU, respectively, with only 35.43M parameters, outperforming recent methods while using substantially fewer parameters and lower computational cost. Code is available at https://github.com/ismailemrecntz/VISIBLE-INFRARED-SENSOR-FUSION
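A similarly hedged NumPy sketch of the late-stage semantic fusion follows: single-head cross-modal attention in which RGB positions query thermal positions, then parallel depthwise convolutions at several kernel sizes. Learned query/key/value projections are replaced by the identity and learned depthwise weights by box filters; the kernel sizes (3, 5, 7) and all function names are assumptions for illustration, and the PANet-style decoder is not shown.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(rgb: np.ndarray, ir: np.ndarray) -> np.ndarray:
    """Single-head attention: each RGB position (query) aggregates thermal positions (keys/values)."""
    c, h, w = rgb.shape
    q = rgb.reshape(c, h * w).T                     # (N, C) queries from RGB
    kv = ir.reshape(c, h * w).T                     # (N, C) keys and values from thermal
    attn = softmax(q @ kv.T / np.sqrt(c), axis=-1)  # (N, N), each row sums to 1
    return (attn @ kv).T.reshape(c, h, w)


def depthwise_conv2d(feat: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Per-channel 'same' convolution (cross-correlation, zero padding); kernels: (C, k, k), k odd."""
    c, h, w = feat.shape
    k = kernels.shape[-1]
    r = k // 2
    padded = np.pad(feat, ((0, 0), (r, r), (r, r)))
    out = np.zeros((c, h, w))
    for i in range(k):
        for j in range(k):
            out += kernels[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out


def semantic_fusion(rgb: np.ndarray, ir: np.ndarray, kernel_sizes=(3, 5, 7)) -> np.ndarray:
    """Hypothetical late-stage fusion: cross-modal attention, then averaged multi-scale depthwise context."""
    mixed = rgb + cross_modal_attention(rgb, ir)
    c = mixed.shape[0]
    # Box filters stand in for learned depthwise kernels at each scale.
    branches = [depthwise_conv2d(mixed, np.full((c, k, k), 1.0 / (k * k))) for k in kernel_sizes]
    return mixed + sum(branches) / len(branches)


rng = np.random.default_rng(1)
print(semantic_fusion(rng.standard_normal((4, 6, 6)), rng.standard_normal((4, 6, 6))).shape)
```

The attention matrix is N x N with N = H*W, so this dense form grows quadratically with resolution; that is one plausible reason to reserve cross-modal attention for the smaller late-stage feature maps.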
