MambaPanoptic: A Vision Mamba-based Structured State Space Framework for Panoptic Segmentation
Quick Take
MambaPanoptic is a fully Mamba-based panoptic segmentation framework that pairs a linear-complexity feature pyramid (MambaFPN) with proposal-free kernel prediction.
Key Points
- Uses Mamba blocks to build globally coherent multi-scale features at linear computational complexity.
- Adopts a PanopticFCN-style kernel generator that produces unified thing and stuff kernels for proposal-free panoptic prediction, with QuadMamba-based feature refinement at multiple stages.
- Outperforms PanopticDeepLab and PanopticFCN on Cityscapes and COCO at comparable model sizes, and matches or surpasses Mask2Former on Cityscapes in PQ and AP with fewer parameters.
Abstract
Panoptic segmentation requires the simultaneous recognition of countable thing instances and amorphous stuff regions, placing joint demands on long-range context modelling, multi-scale feature representation, and efficient dense prediction. Existing convolutional and transformer-based methods struggle to satisfy all three requirements concurrently: convolutional architectures are limited in their capacity to model long-range dependencies, while transformer-based methods incur quadratic computational cost that is prohibitive at high resolutions. In this paper, we propose MambaPanoptic, a fully Mamba-based panoptic segmentation framework that addresses these limitations through two principal contributions. First, we introduce MambaFPN, a top-down feature pyramid that leverages Mamba blocks to generate globally coherent, multi-scale feature representations with linear computational complexity. Second, we adopt a PanopticFCN-style kernel generator that produces unified thing and stuff kernels for proposal-free panoptic prediction, enhanced by a QuadMamba-based feature refinement module applied at multiple network stages. Experiments on the Cityscapes and COCO panoptic segmentation benchmarks demonstrate that MambaPanoptic consistently outperforms PanopticDeepLab and PanopticFCN under comparable model sizes, and matches or surpasses Mask2Former on Cityscapes in PQ and AP while requiring fewer parameters.
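The abstract describes two components without providing code: MambaFPN, a top-down pyramid whose merged levels are refined by linear-complexity Mamba blocks, and a PanopticFCN-style kernel generator whose predicted kernels are convolved with a shared feature map to produce masks. The PyTorch sketch below illustrates that data flow under stated assumptions: `SSMBlockStub`, the layer layout of `MambaFPN`, and `kernels_to_masks` are hypothetical names, and the stub's depthwise-convolution mixer is only a stand-in where a real selective-state-space block (e.g. `mamba_ssm.Mamba`) would go. It is a minimal sketch of the general pattern, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SSMBlockStub(nn.Module):
    """Placeholder for a Mamba block: flattens the feature map to a token
    sequence, applies a sequence mixer with a residual connection, and
    reshapes back. Swap the depthwise Conv1d for a real selective-state-space
    block to get linear-complexity global mixing as in the paper."""

    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # Stand-in sequence mixer; a real Mamba block would go here.
        self.mixer = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)       # (B, H*W, C) token sequence
        mixed = self.mixer(self.norm(seq).transpose(1, 2)).transpose(1, 2)
        seq = seq + mixed                        # residual over the mixer
        return seq.transpose(1, 2).view(b, c, h, w)


class MambaFPN(nn.Module):
    """Hypothetical top-down pyramid: lateral 1x1 convs, upsample-and-add
    merging, with a Mamba-style block refining each merged level."""

    def __init__(self, in_dims, dim=256):
        super().__init__()
        self.laterals = nn.ModuleList([nn.Conv2d(d, dim, 1) for d in in_dims])
        self.refine = nn.ModuleList([SSMBlockStub(dim) for _ in in_dims])

    def forward(self, feats):                    # feats: high-res -> low-res
        lats = [lat(f) for lat, f in zip(self.laterals, feats)]
        out = [self.refine[-1](lats[-1])]        # start at the coarsest level
        for i in range(len(lats) - 2, -1, -1):
            top = F.interpolate(out[0], size=lats[i].shape[-2:], mode="nearest")
            out.insert(0, self.refine[i](lats[i] + top))
        return out                               # refined high-res -> low-res


def kernels_to_masks(kernels, feat):
    """PanopticFCN-style proposal-free prediction: each predicted kernel is
    dotted with a shared feature map to yield one thing/stuff mask.
    kernels: (N, C); feat: (1, C, H, W) -> masks: (N, H, W)."""
    return torch.einsum("nc,chw->nhw", kernels, feat[0]).sigmoid()


if __name__ == "__main__":
    # Toy backbone features at three scales (channels and sizes are made up).
    feats = [torch.randn(1, c, s, s) for c, s in [(96, 64), (192, 32), (384, 16)]]
    fpn = MambaFPN([96, 192, 384], dim=256)
    pyramid = fpn(feats)                         # three (1, 256, H, W) maps
    kernels = torch.randn(10, 256)               # e.g. from a kernel-generator head
    masks = kernels_to_masks(kernels, pyramid[0])
    print([p.shape for p in pyramid], masks.shape)
```

Because each kernel yields one mask directly via a dot product with the shared features, no box proposals or quadratic-cost attention queries are needed; this is the proposal-free property the abstract attributes to the PanopticFCN-style head.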
| Comments: | ISPRS Congress 2026 |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.12640 [cs.CV] (or arXiv:2605.12640v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.12640 (arXiv-issued via DataCite, pending registration) |
Submission history
From: Qing Cheng
[v1] Tue, 12 May 2026 18:30:49 UTC (15,080 KB)
— Originally published at arxiv.org