PT-WNO: Point Transformer with Wavelet Neural Operator for 3D Point Cloud Semantic Segmentation

arXiv cs.CV·Nhut Le, Maryam Rahnemoonfar

6/11/2026

Quick Answer

PT-WNO (Point Transformer with Wavelet Neural Operator) improves 3D point cloud semantic segmentation by adding a learnable global feature extraction module alongside the backbone's skip connections. It reaches 71.59% mIoU on S3DIS (Area 5) and 81.05% mIoU on DALES, beating the Point Transformer v3 (PTv3) baseline by 1.03 and 1.47 points respectively.

Key Points

PT-WNO adds a shared Wavelet Neural Operator (WNO) branch alongside the skip connections to supply global context.
On S3DIS (Area 5), PT-WNO achieves 71.59% mIoU, 1.03 points above Point Transformer v3 (PTv3).
On DALES, PT-WNO achieves 81.05% mIoU, 1.47 points above the baseline.
On ScanNet v2, PT-WNO obtains 76.19% mIoU, close to the baseline's 76.36%.
Global features are fused back through lightweight adapters that complement, rather than replace, the existing skip connections.
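
For readers unfamiliar with the metric, here is a minimal sketch of mean intersection-over-union (mIoU), the score quoted in the points above. The function name `mean_iou` and the toy labels are illustrative assumptions, not taken from the paper. Benchmark toolkits usually accumulate intersections and unions over the whole test set and may treat absent classes differently.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:  # skip classes absent from both (an assumption of this sketch)
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy example: five points, three classes.
pred = np.array([0, 0, 1, 1, 2])
target = np.array([0, 1, 1, 1, 2])
score = mean_iou(pred, target, num_classes=3)
print(round(score, 4))  # 0.7222, from per-class IoUs 1/2, 2/3 and 1
```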

Article Content

From source RSS / original summary

arXiv:2606.11466v1. Abstract: Point cloud semantic segmentation requires architectures that capture both fine-grained local geometry and broad global scene structure. Transformer-based networks have demonstrated strong performance by focusing on detailed local feature aggregation; however, global context is conveyed primarily through skip connections across encoder-decoder stages, which we argue is insufficient for full scene understanding.

We hypothesize that augmenting skip connections with a learnable global feature extraction module allows the network to acquire scene-level knowledge before descending into local detail, leading to richer and more contextually grounded representations. To this end, we propose Point Transformer with Wavelet Neural Operator (PT-WNO), which integrates a shared Wavelet Neural Operator (WNO) branch alongside the skip connections of a point cloud transformer backbone.

At each encoder-decoder transition, point features are projected onto a dense 3D volumetric grid where the WNO captures multi-scale global spectral context through learnable wavelet decomposition and reconstruction. These global features are fused back into the network via lightweight adapters, complementing rather than replacing the existing skip connections. Experiments on four large-scale 3D point cloud benchmarks demonstrate the effectiveness of PT-WNO. On S3DIS (Area 5), PT-WNO achieves 71.59% mIoU, outperforming the Point Transformer v3 (PTv3) baseline by +1.03 points. On DALES it achieves 81.05% mIoU (+1.47 over the baseline). On ScanNet v2, PT-WNO obtains 76.19% mIoU, remaining competitive with the baseline (76.36%).
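
The grid projection, wavelet step, and adapter fusion described in the abstract can be illustrated with a rough NumPy sketch. It is not the authors' implementation. Every name here (`voxelize_mean`, `haar3d`, `inverse_haar3d`, `wno_skip_branch`) is hypothetical, a fixed single-level Haar transform stands in for the learnable multi-scale wavelet decomposition, and plain matrices stand in for the learnable spectral weights and the lightweight adapter.

```python
import numpy as np

def voxelize_mean(coords, feats, grid_size):
    """Average the features of all points falling into each cell of a dense grid."""
    mins = coords.min(axis=0)
    extent = np.maximum(coords.max(axis=0) - mins, 1e-9)
    # Integer cell index per point, clipped so the max coordinate stays in range.
    idx = np.minimum(((coords - mins) / extent * grid_size).astype(int), grid_size - 1)
    flat = np.ravel_multi_index(tuple(idx.T), (grid_size,) * 3)
    grid = np.zeros((grid_size ** 3, feats.shape[1]))
    counts = np.zeros(grid_size ** 3)
    np.add.at(grid, flat, feats)   # unbuffered scatter-add handles repeated cells
    np.add.at(counts, flat, 1.0)
    grid /= np.maximum(counts, 1.0)[:, None]
    return grid.reshape(grid_size, grid_size, grid_size, -1), flat

def haar3d(x):
    """One-level orthonormal Haar transform along the three spatial axes."""
    for axis in range(3):
        even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
        odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
        # Low-pass half first, high-pass half second, along the same axis.
        x = np.concatenate([even + odd, even - odd], axis=axis) / np.sqrt(2)
    return x

def inverse_haar3d(c):
    """Invert haar3d by rebuilding and interleaving even/odd samples per axis."""
    for axis in range(3):
        low, high = np.split(c, 2, axis=axis)
        pair = np.stack([low + high, low - high], axis=axis + 1) / np.sqrt(2)
        shape = list(low.shape)
        shape[axis] *= 2
        c = pair.reshape(shape)
    return c

def wno_skip_branch(coords, skip_feats, spectral_w, adapter_w, grid_size=8):
    """Add grid-level wavelet context to skip features (grid_size must be even)."""
    grid, flat = voxelize_mean(coords, skip_feats, grid_size)
    coeffs = haar3d(grid) @ spectral_w           # channel mixing in the wavelet domain
    global_grid = inverse_haar3d(coeffs)
    # Gather each point's cell feature back from the grid.
    global_feats = global_grid.reshape(-1, global_grid.shape[-1])[flat]
    return skip_feats + global_feats @ adapter_w  # complements the skip path

rng = np.random.default_rng(0)
coords = rng.uniform(size=(500, 3))
feats = rng.normal(size=(500, 4))
fused = wno_skip_branch(coords, feats, np.eye(4), 0.1 * np.eye(4))
print(fused.shape)  # (500, 4)
```

Because the branch output is added to the skip features, a zero adapter leaves the original skip path unchanged, which mirrors the abstract's point that the global features complement rather than replace the skip connections.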
