Pixel Cube: Diffusion-based Portrait Video Relighting Through Realistic Lighting Reproduction

arXiv cs.CV·Yufan Zhang, Yu Ji, Ayo Ajiboye, Rundi Wu, Yu Guo, Changxi Zheng, Jinwei Ye


·~2 min·6/3/2026·en·0

Quick Take

Pixel Cube relights dynamic portrait videos with photorealism and temporal consistency. It is trained on a hybrid dataset of real-captured and rendered portrait videos, using a purpose-built LED-based lighting system for realistic lighting emulation and data acquisition, and it builds on pre-trained video diffusion models. The authors report state-of-the-art results in photorealism, lighting harmony, and temporal consistency.

Key Points

Utilizes a hybrid dataset of real and rendered dynamic portrait videos.
Builds an LED-based lighting system for realistic lighting emulation and high-speed data acquisition.
Reports state-of-the-art performance in photorealism, lighting harmony, and temporal consistency.
Generates identity-preserving relit videos under new lighting conditions.
Generalizes well to unseen subject appearance, motion, and lighting conditions.

Article Content

From source RSS / original summary

arXiv:2606.02919v1 Announce Type: new Abstract: We present a diffusion-based method for relighting dynamic portrait videos with photorealism and temporal consistency. Our method is fueled by a hybrid training dataset that consists of real-captured and rendered dynamic portrait videos with diverse subject appearances, facial motions, head poses, and known lighting conditions. Specifically, we construct an LED-based lighting system for realistic lighting emulation and high-speed video relighting data acquisition.

By leveraging the image priors embedded in pre-trained video diffusion models and using a per-frame high dynamic range (HDR) environment map as lighting control, we train a high-performance generative model for realistic and identity-preserving dynamic portrait video relighting. In addition to the environment map control, our model uses a synthesized background image to enable control over the camera's exposure level and color tone.
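As a rough illustration of what "per-frame environment map" and "exposure level" can mean in practice, the sketch below builds one equirectangular lighting map per video frame by rotating a base map about the vertical axis, and applies an exposure scale before encoding a pixel for display. This is not the paper's pipeline: the function names, the column-shift rotation, and the clip-and-gamma encoding are all assumptions chosen to keep the example small.

```python
def rotate_env_map(env_map, shift_cols):
    """Rotate an equirectangular map about the vertical axis.

    env_map is a list of rows; each row is a list of linear RGB tuples.
    A horizontal rotation is a wrap-around shift of the columns.
    """
    width = len(env_map[0])
    k = shift_cols % width
    if k == 0:
        return [list(row) for row in env_map]
    return [row[-k:] + row[:-k] for row in env_map]


def per_frame_env_maps(env_map, num_frames, degrees_per_frame):
    """Return one rotated copy of env_map per frame (hypothetical helper)."""
    width = len(env_map[0])
    maps = []
    for t in range(num_frames):
        # Convert the rotation angle for frame t into a whole number of columns.
        shift = round(t * degrees_per_frame / 360.0 * width)
        maps.append(rotate_env_map(env_map, shift))
    return maps


def expose_and_encode(pixel, exposure_stops=0.0, gamma=2.2):
    """Scale linear HDR RGB by 2**stops, clip to [0, 1], then gamma-encode.

    This is an assumed, very simple stand-in for camera exposure control.
    """
    scale = 2.0 ** exposure_stops
    return tuple(min(max(c * scale, 0.0), 1.0) ** (1.0 / gamma) for c in pixel)


if __name__ == "__main__":
    # A tiny 1x4 "environment map" with one bright HDR texel.
    base = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (4.0, 4.0, 4.0)]]
    frames = per_frame_env_maps(base, num_frames=3, degrees_per_frame=90.0)
    for t, frame in enumerate(frames):
        print(t, frame[0])
    print(expose_and_encode((0.25, 0.25, 0.25), exposure_stops=1.0))
```

In a setup like the one the abstract describes, maps of this kind would serve as conditioning input to the generative model, while an exposure-adjusted background image would carry the intended exposure and color tone; how the paper encodes either input is not stated in this summary.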

Our model can produce a temporally consistent relit portrait video that looks realistic and harmonious under a provided new environment and faithfully preserves the subject's expression and fine facial features, including skin tone, wrinkles, and facial hair. Our model generalizes well to unseen data in terms of subject appearance, motion, and lighting condition.

We perform extensive experiments on relighting in-the-wild videos with various environment maps and demonstrate practical applications on portrait photography. Results show that our method achieves state-of-the-art performance in photorealism, lighting harmony, and temporal consistency.

