FreqKD: Frequency-Decoupled Cross-Modal Knowledge Distillation for Infrared Object Detection
Quick Answer
FreqKD is a frequency-decoupled knowledge distillation framework that transfers RGB foundation-model features to infrared object detection. It reaches 64.1 mAP50 on KAIST, 2.4 points above the DINOv2 baseline.
Quick Take
Instead of matching RGB teacher features uniformly, FreqKD applies strict supervision to the low-frequency band and relaxed supervision to the high-frequency band. Beyond KAIST, the learned representation transfers across datasets (FLIR ADAS), tasks (MFNet segmentation), and architectures (ResNet-50), with gains in each setting.
Key Points
- FreqKD applies asymmetric supervision to low- and high-frequency feature components.
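- Low-frequency components (shape, layout) are supervised with a strict MSE loss to preserve shared structural information.
- High-frequency components (texture, fine edges) use a relaxed log-MSE loss, weighted at 0.1, that provides edge guidance while tolerating texture differences.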
- Achieves 64.1 mAP50 on KAIST, 2.4 points above the DINOv2 baseline.
- Transfers across datasets (FLIR ADAS, +2.1 mAP50), tasks (MFNet segmentation, +1.85 mIoU), and architectures (ResNet-50, +1.0 mAP50).
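The asymmetric supervision listed above can be sketched as a two-term loss. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the student and teacher features have already been split into low- and high-frequency bands and flattened to equal-length lists, and it assumes "log-MSE" means log(1 + MSE), which the summary does not define. The names `mse`, `log_mse`, and `freqkd_loss` are hypothetical; only the 0.1 high-band weight comes from the source.

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def log_mse(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMED relaxed loss: log(1 + MSE).

    It grows logarithmically, so large student/teacher texture gaps are
    penalized far less than under plain MSE.
    """
    return math.log1p(mse(a, b))


def freqkd_loss(low_s, low_t, high_s, high_t, high_weight: float = 0.1) -> float:
    """Hypothetical FreqKD-style objective: strict MSE on the low band plus a
    down-weighted relaxed term on the high band (0.1 per the paper summary)."""
    return mse(low_s, low_t) + high_weight * log_mse(high_s, high_t)


if __name__ == "__main__":
    # Matching low band, mismatched high band: only the relaxed term contributes.
    print(freqkd_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], [1.0, 1.0]))
```

With the assumed log(1 + MSE) form, raising the high-band error from MSE 1 to MSE 100 increases that term only from about 0.69 to about 4.6, which is one way to tolerate modality-specific texture differences while still passing some edge signal.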
Article Content
arXiv:2606.11572v1 (Announce Type: new)
Abstract: Transfer learning from large-scale RGB foundation models to infrared (IR) imagery through knowledge distillation (KD) remains challenging due to fundamental differences in image formation physics.
We investigate the spectral structure of the RGB-IR modality gap and observe that feature divergence is not uniform across spatial frequencies: low-frequency components (shape, layout) show greater cross-modal alignment than high-frequency components (texture, fine edges), which reflect modality-specific characteristics. Based on this analysis, we propose FreqKD, a frequency-decoupled distillation framework that applies asymmetric supervision adapted to each band's cross-modal consistency.
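The low/high split described above can be illustrated with an ideal radial low-pass mask in the 2D Fourier domain of a single feature-map channel. The abstract does not say which filter FreqKD uses, so the Fourier mask, the `cutoff` radius, and the helper names (`dft_1d`, `dft_2d`, `signed_freq`, `split_bands`) are assumptions. A naive O(N^2) DFT keeps the sketch standard-library only; a real pipeline would use an FFT on batched tensors.

```python
import cmath
import math
from typing import List

Grid = List[List[float]]


def dft_1d(x, inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform; the inverse includes the 1/n scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def dft_2d(grid, inverse: bool = False) -> List[List[complex]]:
    """Separable 2D DFT: transform every row, then every column."""
    rows = [dft_1d(r, inverse) for r in grid]
    cols = [dft_1d(c, inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def signed_freq(k: int, n: int) -> int:
    """Map DFT index k in [0, n) to a signed frequency centred on 0."""
    return k if k <= n // 2 else k - n


def split_bands(feature: Grid, cutoff: float):
    """Hypothetical band split of one channel into (low, high), low + high == feature."""
    h, w = len(feature), len(feature[0])
    spec = dft_2d(feature)
    # ASSUMED ideal low-pass: keep coefficients inside the radial cutoff only.
    masked = [
        [
            spec[u][v]
            if signed_freq(u, h) ** 2 + signed_freq(v, w) ** 2 <= cutoff ** 2
            else 0j
            for v in range(w)
        ]
        for u in range(h)
    ]
    # The mask is symmetric in +/- frequency, so the inverse is real up to rounding.
    low = [[z.real for z in row] for row in dft_2d(masked, inverse=True)]
    # The high band is the residual, so the two bands sum back to the input.
    high = [[feature[i][j] - low[i][j] for j in range(w)] for i in range(h)]
    return low, high
```

A constant map (pure layout) lands entirely in the low band, while a checkerboard (the finest possible texture) lands entirely in the high band, matching the shape-versus-texture reading of the two bands.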
The method employs strict mean squared error (MSE) on the low-frequency band to preserve shared structural information and a relaxed log-MSE loss (weighted at 0.1) on the high-frequency band to provide edge guidance while tolerating texture differences. Spectral divergence analysis on 500 paired samples shows that high-frequency divergence exceeds low-frequency divergence by a factor of 2.4x on average across all analysed transformer layers.
On KAIST multispectral pedestrian detection, FreqKD achieves 64.1 mAP50, improving 2.4 points over the DINOv2 baseline. The learned representation transfers across datasets (FLIR ADAS, +2.1 mAP50), tasks (MFNet segmentation, +1.85 mean intersection-over-union), and architectures (ResNet-50, +1.0 mAP50). Code is available at: https://anonymous.4open.science/r/freq_decoupled_kd-5E5A
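The 2.4x figure is a ratio of per-band divergences averaged over transformer layers. The summary does not define the divergence metric, so the sketch below assumes cosine divergence (1 minus cosine similarity) between flattened RGB and IR band features and averages the per-layer high/low ratio. `cosine_divergence`, `mean_divergence_ratio`, and the per-layer dictionary keys are hypothetical, and the input is assumed to be features already split into bands.

```python
import math
from typing import Dict, List, Sequence


def cosine_divergence(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMED divergence: 1 - cosine similarity (0 = aligned, 2 = opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("zero-norm feature vector")
    return 1.0 - dot / (na * nb)


def mean_divergence_ratio(layers: List[Dict[str, Sequence[float]]]) -> float:
    """Average over layers of (high-band divergence / low-band divergence).

    Each layer dict holds flattened band features under the hypothetical keys
    rgb_low, ir_low, rgb_high, ir_high.
    """
    ratios = []
    for layer in layers:
        low = cosine_divergence(layer["rgb_low"], layer["ir_low"])
        high = cosine_divergence(layer["rgb_high"], layer["ir_high"])
        if low == 0:
            raise ValueError("low-band divergence is zero; ratio undefined")
        ratios.append(high / low)
    return sum(ratios) / len(ratios)
```

Under these assumptions, a mean ratio above 1 means the high-frequency band diverges more between modalities than the low band, which is the pattern the authors report.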