COSY: Compositional 3DGS Synthesis for Disentangled Human Head Editing
Quick Take
COSY introduces a generator architecture for 3D Gaussian Splatting (3DGS) GANs that synthesizes human head components such as hair, skin, glasses, and torso independently. This improves editing control without requiring segmentation masks or geometric priors. The authors report better disentanglement, more precise editing control, and competitive visual quality compared to existing GAN-based methods.
Key Points
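- New generator architecture synthesizes hair, skin, glasses, and torso independently, so one region can be edited while the others stay fixed.
- Separates components using only sparse information such as hair or skin color, with no segmentation masks or geometric priors.
- Reports better disentanglement and more precise control over specific attributes than existing GAN-based methods.
- Shares minimal information between the generators via context tokens, which keep shape and lighting consistent and allow controlling both without prior annotation.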
- Demonstrates competitive visual quality against existing GAN-based methods.
Article Content
Original abstract (arXiv:2605.24114v1): Recent 3D Gaussian Splatting (3DGS) GANs for human heads synthesize and render photorealistic 3D models in real-time and offer a vast variety in identity and appearance. However, controlling specific semantic attributes such as hair color or glasses remains challenging, as edits in the entangled latent space often induce unintended changes in identity or appearance.
Although there are several methods that aim to disentangle the latent space post training by estimating directions that only modify certain features, these methods cannot guarantee complete disentanglement and often require pre-trained classifiers. In our approach, we propose a new generator architecture that synthesizes components, such as hair, skin, glasses, and torso, completely independently. This allows for changing the latent vector for one region while keeping the remaining parts fixed.
Further, we achieve this separation using only sparse information such as the hair or skin color, eliminating the requirement of segmentation masks or geometric priors, often seen in prior work. To ensure matching shape and lighting conditions during editing, we allow minimal shared information via context tokens between the independent generators. These tokens even allow us to control the shape and light, without any prior annotation.
Compared to existing works on GAN-based generation and editing, our method shows better disentanglement, more precise editing control, and competitive visual quality.
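The compositional idea described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture, which the abstract does not specify: each generator here is a fixed random linear map standing in for a neural network, and every size, class name, and the 14-value Gaussian layout (position, scale, rotation quaternion, opacity, RGB) is a hypothetical choice made for illustration. The sketch only shows the structural property the abstract claims: resampling one component's latent leaves the other components untouched, while a shared context token feeds every generator.

```python
import numpy as np

COMPONENTS = ("hair", "skin", "glasses", "torso")  # regions named in the abstract
LATENT_DIM = 8     # hypothetical size of each per-component latent
CONTEXT_DIM = 4    # hypothetical size of the shared context token
N_GAUSSIANS = 16   # hypothetical number of Gaussians per component
PARAMS = 14        # assumed layout: position 3 + scale 3 + quaternion 4 + opacity 1 + RGB 3


class ToyComponentGenerator:
    """Stand-in for one independent generator: a fixed random linear map."""

    def __init__(self, seed):
        rng = np.random.default_rng(seed)
        self.weight = rng.normal(size=(LATENT_DIM + CONTEXT_DIM, N_GAUSSIANS * PARAMS))

    def __call__(self, latent, context):
        # Each generator sees only its own latent plus the shared context token.
        features = np.concatenate([latent, context])
        return (features @ self.weight).reshape(N_GAUSSIANS, PARAMS)


def synthesize(generators, latents, context):
    """Run every component generator separately, then concatenate the Gaussians."""
    parts = {name: generators[name](latents[name], context) for name in COMPONENTS}
    scene = np.concatenate([parts[name] for name in COMPONENTS], axis=0)
    return parts, scene


generators = {name: ToyComponentGenerator(seed) for seed, name in enumerate(COMPONENTS)}
rng = np.random.default_rng(0)
latents = {name: rng.normal(size=LATENT_DIM) for name in COMPONENTS}
context = rng.normal(size=CONTEXT_DIM)

original_parts, original_scene = synthesize(generators, latents, context)

# Edit: resample only the hair latent and keep every other input fixed.
edited_latents = dict(latents, hair=rng.normal(size=LATENT_DIM))
edited_parts, edited_scene = synthesize(generators, edited_latents, context)

# Changing the shared context token instead affects every component at once.
relit_parts, relit_scene = synthesize(generators, latents, rng.normal(size=CONTEXT_DIM))
```

In this toy setup, `edited_parts["skin"]`, `edited_parts["glasses"]`, and `edited_parts["torso"]` are identical to their originals by construction, because no information flows between generators except through `context`. How the real method limits what the context tokens carry is not described in the abstract.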
