CFRNet: Cycle-Consistent Fixed-Point Training for Real-Time Blind Face Restoration on Consumer Embedded NPUs
Quick Take
CFRNet introduces Cycle-Consistent Fixed-Point Training (CCFP) for real-time blind face restoration on consumer NPUs. On a 300-image test set it reaches the best perceptual score among the compared methods (LPIPS 0.250 at three cycles), with all baselines retrained from scratch at 256x256. It runs in about 23 ms per cycle in INT8 on a HiSilicon Hi3402 NPU, and the number of cycles can be changed at inference time without retraining, which makes it suitable for on-device applications.
Key Points
- CFRNet is a 2.0M-parameter ResNet-style model for 256x256 face restoration.
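- Achieves the best perceptual score (LPIPS 0.250) at three cycles and the best PSNR and SSIM at two cycles, on a 300-image test set.
- Runs in approximately 23 ms per cycle in INT8 on a HiSilicon Hi3402 NPU.
- Cycle count acts as a quality knob that needs no retraining: PSNR peaks at two cycles, while LPIPS keeps improving up to three.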
- The same approach is effective with a plain CNN for easier deployment.
Article Content
From source RSS / original summary: arXiv:2606.06850v1. Abstract: Blind face restoration on consumer devices has to balance image quality against speed and memory. Strong methods such as GFPGAN and CodeFormer give good perceptual quality, but they rely on large pretrained generative priors and on operators such as attention, codebook lookup, and style modulation that are hard to compile and quantize on the small neural processing units (NPUs) used in consumer hardware.
Small convolutional restorers run fast enough, but they tend to over-smooth and to leave artifacts around the eyes, nose, and mouth. We present CFRNet, a 2.0M-parameter ResNet-style restorer for on-device use at $256\times256$, the common face-crop size on consumer NPUs. The main idea is Cycle-Consistent Fixed-Point Training (CCFP).
Instead of training the network for one pass and then running it several times by hand, we train it to act as a fixed-point operator, so that applying it again to a restored face does not change the face. CCFP uses three training losses, namely progressive multi-cycle supervision, an idempotence loss, and a re-degradation cycle loss, and it adds no cost at inference. To compare fairly under our deployment limits, we retrain all baselines from scratch at the same $256\times256$ resolution.
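The three CCFP loss terms named above can be illustrated with a toy sketch. The abstract does not give their formulas, so everything below is an assumption made for illustration: the L1 distance, the equal loss weights, the halving "degradation", the 1-D lists standing in for face images, and all function names are hypothetical, and each term is only one plausible reading of its name.

```python
# Toy sketch of one plausible reading of the three CCFP loss terms.
# Assumptions (not from the paper): L1 distance, equal weights, 1-D lists
# standing in for images, and a known degradation function.

def l1(a, b):
    """Mean absolute error between two equal-length signals."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def apply_cycles(f, x, k):
    """Return the list [f(x), f(f(x)), ...] for k applications of f."""
    outputs = []
    for _ in range(k):
        x = f(x)
        outputs.append(x)
    return outputs

def ccfp_loss(f, degrade, lq, hq, k=3, w_idem=1.0, w_cycle=1.0):
    outs = apply_cycles(f, lq, k)
    # Progressive multi-cycle supervision: every cycle's output is
    # compared against the ground-truth image.
    progressive = sum(l1(o, hq) for o in outs) / k
    # Idempotence: applying f once more should not change the result.
    idempotence = l1(f(outs[-1]), outs[-1])
    # Re-degradation cycle: degrading the restored image should give
    # back the low-quality input.
    redegradation = l1(degrade(outs[-1]), lq)
    total = progressive + w_idem * idempotence + w_cycle * redegradation
    return {"progressive": progressive, "idempotence": idempotence,
            "redegradation": redegradation, "total": total}

# Hypothetical toy data: the degradation halves every value.
def degrade(x):
    return [0.5 * v for v in x]

def ideal_restorer(x):
    # Doubles and clamps to [0, 1]; a fixed point on this toy data.
    return [min(1.0, max(0.0, 2.0 * v)) for v in x]

def overshooting_restorer(x):
    # Keeps amplifying on every pass, so it is not idempotent.
    return [1.5 * v for v in x]

hq = [0.0, 1.0, 0.0, 1.0]
lq = degrade(hq)

good = ccfp_loss(ideal_restorer, degrade, lq, hq)
bad = ccfp_loss(overshooting_restorer, degrade, lq, hq)
print(good["total"], bad["idempotence"] > 0)
```

In this toy setting the clamped restorer drives all three terms to zero, while the restorer that keeps amplifying is penalized by the idempotence term. All of these terms are computed only during training, which is consistent with the abstract's statement that CCFP adds no cost at inference.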
On a 300-image test set, CFRNet reaches the best perceptual score (LPIPS 0.250 at three cycles, which is 31% lower than one cycle) and also the best PSNR and SSIM at two cycles. It runs in about 23 ms per cycle in INT8 on a HiSilicon Hi3402 NPU, while the same baselines cannot be compiled to that chip. The cycle count $k$ acts as a simple quality knob that needs no retraining: PSNR is best at $k=2$ and LPIPS keeps improving up to $k=3$.
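As a rough illustration of the cycle count as a quality knob, the sketch below applies a restorer $k$ times and picks $k$ from a latency budget. It assumes that cycles run sequentially with no extra overhead, so that total latency is about $k$ times the reported 23 ms; that assumption, the function names, and the cap of three cycles (the largest $k$ mentioned in the abstract) are illustrative choices, not the authors' code.

```python
# Minimal sketch: choosing the cycle count k at inference time.
# Assumption: total latency is roughly k * per-cycle latency.

MS_PER_CYCLE = 23.0  # approximate INT8 per-cycle latency from the abstract

def restore(f, x, k):
    """Apply the restorer f to x a total of k times."""
    for _ in range(k):
        x = f(x)
    return x

def latency_ms(k, ms_per_cycle=MS_PER_CYCLE):
    """Estimated latency of k sequential cycles."""
    return k * ms_per_cycle

def max_cycles_for_budget(budget_ms, ms_per_cycle=MS_PER_CYCLE, k_max=3):
    """Largest k (capped at k_max) whose estimated latency fits the budget."""
    return max(0, min(k_max, int(budget_ms // ms_per_cycle)))

print(latency_ms(2))              # 46.0
print(latency_ms(3))              # 69.0
print(max_cycles_for_budget(50))  # 2
```

Under this estimate, a 50 ms frame budget leaves room for two cycles, the PSNR-best setting, while three cycles, the LPIPS-best setting, need about 69 ms.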
We further show that the same idea works with a plain CNN that is even easier to deploy, and we run the model in real time on an in-car driver-monitoring board.
