Contrastive-SDXL: Annotation-Preserving Night-Time Augmentation for Pedestrian Detection

arXiv cs.CV · Franky George, Muhammad Khalid, Adil Khan

5/19/2026

Quick Take

Contrastive-SDXL is a day-to-night augmentation framework that improves night-time pedestrian detection by preserving pedestrians and local semantic structure when daytime images are translated into night-time ones.

Key Points

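Builds on SDXL-Turbo, a latent diffusion model, fine-tuned with Low-Rank Adaptation (LoRA) for day-to-night image translation.
Introduces a patch-wise semantic contrastive loss guided by a pretrained DINOv2 encoder, plus an object consistency loss for pedestrian preservation.
Detectors trained with the synthetic images obtain a 6-7% reduction in miss rate compared with a daytime-only baseline.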

[Submitted on 13 May 2026]

Abstract: Night-time pedestrian detection remains challenging because labelled night-time data are limited and large illumination differences make daytime-only trained detectors unreliable. Latent diffusion models (LDMs) provide a powerful basis for image-to-image translation and cross-domain augmentation, but their effectiveness in safety-critical perception depends on whether detector-relevant objects and local semantic structure are preserved when translating between source and target domains. In this work, we present Contrastive-SDXL, a day-to-night augmentation framework for night-time pedestrian detection built on SDXL-Turbo and fine-tuned using Low-Rank Adaptation (LoRA). To preserve semantic correspondence between daytime inputs and translated night-time images, we introduce a patch-wise semantic contrastive loss guided by a pretrained DINOv2 encoder rather than generator encoder features. Multi-level DINOv2 self-attention maps enforce both local and global semantic consistency, while an object consistency loss explicitly encourages pedestrian preservation. Contrastive-SDXL produces realistic night-time images, achieving a Fréchet Inception Distance (FID) of 22.5. Detectors trained with our synthetic images obtain a 6-7% reduction in miss rate compared with a daytime-only baseline, approaching the performance of detectors trained on real night-time data. These results demonstrate that consistency-driven diffusion augmentation can effectively support safety-critical night-time pedestrian detection.
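The patch-wise contrastive idea in the abstract can be illustrated with a generic InfoNCE loss over patch features. This is only a minimal sketch under stated assumptions, not the paper's formulation: the function name patch_info_nce and the temperature of 0.07 are hypothetical, random vectors stand in for DINOv2 patch embeddings, and the attention-map guidance and object consistency loss are not reproduced. Each translated patch is treated as a positive match for the daytime patch at the same location and as a negative for every other patch.

```python
import numpy as np


def patch_info_nce(day_feats, night_feats, temperature=0.07):
    """Generic InfoNCE over patches (illustrative, not the paper's exact loss).

    day_feats, night_feats: arrays of shape (num_patches, dim), where row i
    of each array describes the same spatial location.
    """
    # L2-normalise so that dot products are cosine similarities.
    day = day_feats / np.linalg.norm(day_feats, axis=1, keepdims=True)
    night = night_feats / np.linalg.norm(night_feats, axis=1, keepdims=True)

    # logits[i, j] = similarity between night patch i and day patch j.
    logits = night @ day.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive pair is the same location, i.e. the diagonal.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
day = rng.normal(size=(16, 32))                     # stand-in patch embeddings
aligned = day + 0.05 * rng.normal(size=day.shape)   # structure preserved
shifted = np.roll(aligned, 1, axis=0)               # patches moved out of place

loss_aligned = patch_info_nce(day, aligned)
loss_shifted = patch_info_nce(day, shifted)
print(loss_aligned < loss_shifted)
```

In this toy setup the loss is lower when translated patches stay aligned with their daytime counterparts than when they are displaced, which is the property such a loss is meant to reward.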

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2605.16406 [cs.CV]
	(or arXiv:2605.16406v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2605.16406 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Muhammad Khalid Dr
[v1] Wed, 13 May 2026 10:41:55 UTC (6,273 KB)

— Originally published at arxiv.org
