M2Retinexformer: Multi-Modal Retinexformer for Low-Light Image Enhancement

arXiv cs.CV·Youssef Aboelwafa, Hicham G. Elmongui, Marwan Torki

5/14/2026 · en

Quick Take

M2Retinexformer extends Retinexformer for low-light image enhancement by fusing depth cues, luminance priors, and semantic features within a progressive refinement pipeline.

Key Points

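Adds depth cues, luminance priors, and semantic features to the RGB-only input that Retinex-based methods typically rely on.
Uses adaptive gating to balance illumination-guided self-attention against cross-attention, depending on how reliable the auxiliary cues are.
Reports overall improvements over Retinexformer and recent state-of-the-art methods on the LOL, SID, SMID, and SDSD benchmarks.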

[Submitted on 11 May 2026]

Abstract: Low-light image enhancement is challenging due to complex degradations, including amplified noise, artifacts, and color distortion. While Retinex-based deep learning methods have achieved promising results, they primarily rely on single-modality RGB information. We propose M2Retinexformer (Multi-Modal Retinexformer), a novel framework that extends Retinexformer by incorporating depth cues, luminance priors, and semantic features within a progressive refinement pipeline. Depth provides geometric context that is invariant to lighting variations, while luminance and semantic features offer explicit guidance on brightness distribution and scene understanding. Modalities are extracted at multiple scales and fused through cross-attention, with adaptive gating dynamically balancing illumination-guided self-attention and cross-attention based on the reliability of auxiliary cues. Evaluations on the LOL, SID, SMID, and SDSD benchmarks demonstrate overall improvements over Retinexformer and recent state-of-the-art methods. Code and pretrained weights are available at this https URL
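
The gated fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (attention, gated_fusion), the toy token values, and the form of the gate (a single sigmoid over the concatenated branch outputs) are all assumptions made for illustration, and plain self-attention stands in for the paper's illumination-guided self-attention. It only shows the general idea of blending a self-attention branch over RGB tokens with a cross-attention branch over auxiliary tokens (for example depth or semantic features).

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    # Scaled dot-product attention on lists of equal-length vectors.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def gated_fusion(self_out, cross_out, gate_weights, gate_bias):
    # Hypothetical gate: one sigmoid scalar per token, computed from the
    # concatenation of both branch outputs. g near 1 trusts the self-attention
    # branch, g near 0 trusts the cross-attention (auxiliary) branch.
    fused = []
    for s, c in zip(self_out, cross_out):
        logit = sum(w * x for w, x in zip(gate_weights, s + c)) + gate_bias
        g = 1.0 / (1.0 + math.exp(-logit))
        fused.append([g * si + (1.0 - g) * ci for si, ci in zip(s, c)])
    return fused


# Toy data (assumed): two RGB tokens and three auxiliary tokens, dimension 2.
rgb_tokens = [[1.0, 0.0], [0.0, 1.0]]
aux_tokens = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]

self_out = attention(rgb_tokens, rgb_tokens, rgb_tokens)   # stand-in self-attention
cross_out = attention(rgb_tokens, aux_tokens, aux_tokens)  # RGB queries, auxiliary keys/values

# With zero gate weights and zero bias the gate is 0.5, so both branches
# contribute equally; a learned gate would shift this per token.
fused = gated_fusion(self_out, cross_out, gate_weights=[0.0] * 4, gate_bias=0.0)
```

In a trained model the gate parameters would be learned, so that tokens with unreliable auxiliary cues push the gate toward the self-attention branch. Here a large positive gate_bias has the same effect by hand.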

Comments:	Accepted at 2026 IEEE International Conference on Image Processing (ICIP)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2605.12556 [cs.CV]
	(or arXiv:2605.12556v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2605.12556 arXiv-issued DOI via DataCite

Submission history

From: Youssef Aboelwafa
[v1] Mon, 11 May 2026 12:13:13 UTC (591 KB)

— Originally published at arxiv.org
