PT-WNO: Point Transformer with Wavelet Neural Operator for 3D Point Cloud Semantic Segmentation
Quick Answer
This paper shows that the Point Transformer with Wavelet Neural Operator (PT-WNO) improves 3D point cloud semantic segmentation by adding a learnable global feature extraction module alongside the skip connections. It reaches 71.59% mIoU on S3DIS Area 5 (+1.03 over the Point Transformer v3 baseline) and 81.05% mIoU on DALES (+1.47 over the baseline).
Key Points
- PT-WNO integrates a Wavelet Neural Operator for enhanced global context.
- Achieved 71.59% mIoU on S3DIS (Area 5), surpassing the Point Transformer v3 (PTv3) baseline by 1.03 points.
- On DALES, PT-WNO reached 81.05% mIoU, outperforming the baseline by 1.47 points.
- Stayed competitive on ScanNet v2 with 76.19% mIoU, slightly below the baseline's 76.36%.
- Demonstrates improved scene understanding through augmented skip connections.
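The scores above are reported as mIoU (mean intersection-over-union). As a reminder of what that metric measures, here is a minimal NumPy sketch. The function name `mean_iou` and the choice to skip classes that appear in neither prediction nor ground truth are assumptions for illustration; the official benchmark evaluation scripts may handle ignored labels and absent classes differently.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0  # skip classes that never occur (assumption, see lead-in)
    return float((intersection[valid] / union[valid]).mean())

# Four points, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
print(round(score, 4))
```

Because every class contributes equally to the mean, a gain of about one point can come from a few rare classes improving substantially rather than from a uniform shift.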
Article Content
From source RSS / original summary: arXiv:2606.11466v1, Announce Type: new. Abstract: Point cloud semantic segmentation requires architectures that capture both fine-grained local geometry and broad global scene structure. Transformer-based networks have demonstrated strong performance by focusing on detailed local feature aggregation; however, global context is conveyed primarily through skip connections across encoder-decoder stages, which we argue is insufficient for full scene understanding.
We hypothesize that augmenting skip connections with a learnable global feature extraction module allows the network to acquire scene-level knowledge before descending into local detail, leading to richer and more contextually grounded representations. To this end, we propose Point Transformer with Wavelet Neural Operator (PT-WNO), which integrates a shared Wavelet Neural Operator (WNO) branch alongside the skip connections of a point cloud transformer backbone.
At each encoder-decoder transition, point features are projected onto a dense 3D volumetric grid where the WNO captures multi-scale global spectral context through learnable wavelet decomposition and reconstruction. These global features are fused back into the network via lightweight adapters, complementing rather than replacing the existing skip connections. Experiments on four large-scale 3D point cloud benchmarks demonstrate the effectiveness of PT-WNO. On S3DIS (Area 5), PT-WNO achieves 71.59% mIoU, outperforming the Point Transformer v3 (PTv3) baseline by +1.03 points. On DALES it achieves 81.05% mIoU (+1.47 over the baseline). On ScanNet v2, PT-WNO obtains 76.19% mIoU, remaining competitive with the baseline (76.36%).
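To make the grid projection and wavelet step described in the abstract more concrete, the following is a rough NumPy sketch, not the authors' implementation. It assumes a one-level Haar wavelet, a scalar weight per subband standing in for the learnable parameters, and a fixed residual scale standing in for the lightweight adapter. All function names (`voxelize_mean`, `haar_split`, `haar_merge`, `wavelet_filter`) and the grid size are hypothetical; the abstract does not specify these details.

```python
import numpy as np

def voxelize_mean(coords, feats, grid_size):
    """Average point features into a dense (G, G, G, C) grid; also return voxel ids."""
    lo = coords.min(axis=0)
    span = np.maximum(coords.max(axis=0) - lo, 1e-9)
    idx = np.minimum(((coords - lo) / span * grid_size).astype(int), grid_size - 1)
    flat = (idx[:, 0] * grid_size + idx[:, 1]) * grid_size + idx[:, 2]
    grid = np.zeros((grid_size ** 3, feats.shape[1]))
    count = np.zeros(grid_size ** 3)
    np.add.at(grid, flat, feats)   # unbuffered scatter-add of features per voxel
    np.add.at(count, flat, 1.0)    # number of points per voxel
    grid /= np.maximum(count, 1.0)[:, None]
    return grid.reshape(grid_size, grid_size, grid_size, -1), flat

def haar_split(x, axis):
    """One Haar level along `axis`: returns (low-pass, high-pass) halves."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_merge(low, high, axis):
    """Inverse of haar_split along a non-negative `axis`."""
    even = (low + high) / np.sqrt(2)
    odd = (low - high) / np.sqrt(2)
    shape = list(low.shape)
    shape[axis] *= 2
    return np.stack([even, odd], axis=axis + 1).reshape(shape)

def wavelet_filter(grid, weights):
    """Decompose along the 3 spatial axes, scale the 8 subbands, reconstruct."""
    bands = [grid]
    for axis in range(3):
        bands = [half for b in bands for half in haar_split(b, axis)]
    bands = [w * b for w, b in zip(weights, bands)]
    for axis in reversed(range(3)):
        bands = [haar_merge(bands[i], bands[i + 1], axis)
                 for i in range(0, len(bands), 2)]
    return bands[0]

rng = np.random.default_rng(0)
coords = rng.random((1000, 3))             # synthetic point positions
feats = rng.standard_normal((1000, 4))     # synthetic per-point features
grid, flat = voxelize_mean(coords, feats, grid_size=8)  # grid size must be even

weights = np.ones(8)                       # stand-in for learnable subband weights
global_grid = wavelet_filter(grid, weights)
global_feats = global_grid.reshape(-1, feats.shape[1])[flat]  # gather per point
fused = feats + 0.1 * global_feats         # stand-in for a lightweight adapter
```

With all weights set to one the transform reconstructs the input grid exactly, so any change in the output comes only from how the subbands are reweighted. Keeping just the first (all low-pass) subband replaces each 2x2x2 block with its mean, which is the coarse, scene-level signal the abstract contrasts with local feature aggregation.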