Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction
Quick Take
The Geometry-Aware Representation Denoising (GARD) framework targets the robustness of multi-view 3D reconstruction under degraded imaging conditions. It performs diffusion-based restoration in the feature space of a feed-forward 3D reconstruction model and uses the refined features to recover scene geometry and, through an additional RGB decoder, high-quality RGB images. The authors evaluate GARD on the Depth Anything 3 (DA3) benchmark and report that it is effective, which matters because real-world images rarely match the clean conditions these models are trained and tested on.
Key Points
- GARD performs diffusion-based restoration directly in the feature space of a feed-forward 3D reconstruction model.
- The refined features recover accurate scene geometry and, via an additional RGB image decoder, high-quality RGB images, so both are restored together.
- The framework addresses robustness challenges in multi-view 3D reconstruction under real-world degradations.
- Comprehensive experiments validate GARD's effectiveness on the Depth Anything 3 benchmark.
Article Excerpt
From source RSS / original summary: arXiv:2605.26230v1 Announce Type: new
Abstract: Multi-view 3D reconstruction has achieved remarkable progress with the advent of feed-forward 3D reconstruction models. However, these models are typically trained and evaluated under ideal, degradation-free imaging conditions, whereas real-world observations often contain degradations that differ significantly from such settings. Improving robustness for multi-view 3D reconstruction under degraded conditions therefore remains an important challenge.
We present Geometry-Aware Representation Denoising (GARD), a novel framework that performs diffusion-based multi-view restoration directly in the feature space of a feed-forward 3D reconstruction model. This design exploits the geometry-aware feature representations of the 3D reconstructor to effectively recover accurate scene geometry.
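The abstract does not specify GARD's sampler, noise schedule, or denoiser architecture. As a minimal sketch under stated assumptions, the loop below shows what diffusion-based restoration in feature space can look like: a deterministic DDIM-style sampler (eta = 0) that starts from Gaussian noise and, at each step, asks a denoiser conditioned on the degraded features for an estimate of the clean features. The function name `ddim_restore`, the linear schedule over the cumulative signal level, the clean-feature (x0) prediction signature, starting from pure noise rather than from noised degraded features, and flattening all views' features into one vector are all illustrative assumptions, not the paper's design.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
# Hypothetical denoiser signature (assumption): (noisy features x_t, timestep t, degraded features) -> clean-feature estimate.
Denoiser = Callable[[Vector, int, Vector], Vector]


def ddim_restore(degraded: Vector, denoiser: Denoiser, num_steps: int = 10, seed: int = 0) -> Vector:
    """Deterministic DDIM-style sampling (eta = 0) in feature space, conditioned on degraded features."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    rng = random.Random(seed)
    # Cumulative signal level alpha_bar_t for t = 0..num_steps (assumed linear schedule).
    # alpha_bar_0 = 1.0, so the final update returns the denoiser's clean estimate exactly.
    alpha_bars = [1.0 - 0.98 * t / num_steps for t in range(num_steps + 1)]
    # Start from pure Gaussian noise with the same dimensionality as the features.
    x = [rng.gauss(0.0, 1.0) for _ in degraded]
    for t in range(num_steps, 0, -1):
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
        x0_hat = denoiser(x, t, degraded)
        # Noise implied by the current sample and the clean-feature estimate.
        eps_hat = [(xi - math.sqrt(ab_t) * x0i) / math.sqrt(1.0 - ab_t) for xi, x0i in zip(x, x0_hat)]
        # Move to timestep t - 1 along the deterministic DDIM trajectory.
        x = [math.sqrt(ab_prev) * x0i + math.sqrt(1.0 - ab_prev) * ei for x0i, ei in zip(x0_hat, eps_hat)]
    return x


if __name__ == "__main__":
    clean = [0.2, -1.0, 0.7]
    degraded = [c + 0.5 for c in clean]  # toy "degradation": a constant feature bias
    # Toy stand-in for a learned denoiser: it simply removes the known bias.
    toy_denoiser = lambda x_t, t, cond: [c - 0.5 for c in cond]
    print(ddim_restore(degraded, toy_denoiser))
```

In a system like GARD the denoiser would be a learned network operating on the reconstructor's multi-view features; the toy lambda only undoes a known bias so the sampler's mechanics are easy to check.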
Furthermore, by employing an additional RGB image decoder, the refined representations can also be used to restore high-quality RGB images, thereby enabling the simultaneous recovery of 3D scene geometry and high-quality imagery. Comprehensive experiments on the Depth Anything 3 (DA3) benchmark demonstrate the effectiveness of the proposed GARD framework.
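The abstract states that one refined representation serves two outputs: scene geometry from the 3D reconstructor and RGB images from an additional decoder. A minimal sketch of that data flow follows, with every component passed in as a hypothetical callable; the class name `MultiViewRestorationPipeline` and its field names are illustrative, not taken from the paper. It encodes all views, restores their features jointly, and reads both outputs from the same refined features.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Features = List[float]


@dataclass
class MultiViewRestorationPipeline:
    """Hypothetical composition of the components named in the abstract; not the paper's API."""

    encode: Callable[[List[float]], Features]            # feature extractor of the feed-forward 3D reconstructor (assumed)
    restore: Callable[[List[Features]], List[Features]]  # joint multi-view feature-space restorer (e.g. a diffusion sampler)
    geometry_head: Callable[[Features], List[float]]     # geometry readout, e.g. per-view depth (assumed)
    rgb_decoder: Callable[[Features], List[float]]       # additional RGB image decoder

    def __call__(self, degraded_views: List[List[float]]) -> Tuple[List[List[float]], List[List[float]]]:
        # Encode every degraded view with the reconstructor's own feature extractor.
        features = [self.encode(view) for view in degraded_views]
        # Restore all views in one call so the restorer can use cross-view context.
        refined = self.restore(features)
        # Read geometry and RGB from the same refined features.
        geometry = [self.geometry_head(f) for f in refined]
        images = [self.rgb_decoder(f) for f in refined]
        return geometry, images
```

In this sketch a single restoration pass feeds both outputs; a pixel-space alternative would instead restore the images first and then run the reconstructor on them.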
More from arXiv cs.CV
Evi-Steer: Learning to Steer Biomedical Vision-Language Models through Efficient and Generalizable Evidential Tuning
Evi-Steer introduces an evidential tuning framework for BiomedCLIP that updates only 0.11% of the model's parameters while enabling uncertainty-aware fine-tuning. It outperforms state-of-the-art methods across 15 biomedical imaging datasets and is reported to be effective for few-shot learning and under domain shifts in clinical applications.
