FreqKD: Frequency-Decoupled Cross-Modal Knowledge Distillation for Infrared Object Detection

arXiv cs.CV·Keval Thaker, Venkatraman Narayanan, Abdalmalek Aburaddaha, Samir A. Rawashdeh

6/11/2026

Quick Answer

FreqKD introduces a frequency-decoupled knowledge distillation framework for infrared object detection, achieving 64.1 mAP50 on KAIST, 2.4 points above the DINOv2 baseline.

Quick Take

FreqKD distills features from RGB foundation models into infrared models by supervising low- and high-frequency feature bands differently. Beyond KAIST, the learned representation also improves results on another dataset (FLIR ADAS), another task (MFNet semantic segmentation), and another architecture (ResNet-50).

Key Points

FreqKD applies asymmetric supervision to low- and high-frequency feature components.
Low-frequency components (shape, layout) are supervised with a strict MSE loss to preserve shared structural information.
High-frequency components (texture, fine edges) use a relaxed log-MSE loss, weighted at 0.1, to tolerate cross-modal texture differences.
Achieves 64.1 mAP50 on KAIST, 2.4 points above the DINOv2 baseline.
Transfers across datasets (FLIR ADAS, +2.1 mAP50), tasks (MFNet segmentation, +1.85 mIoU), and architectures (ResNet-50, +1.0 mAP50).
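
The asymmetric objective in the points above can be sketched as a weighted sum of two band-wise terms. This is a minimal illustration, not the authors' code: the summary does not define "log-MSE", so this sketch assumes log(1 + MSE), and the helper names `mse`, `log_mse`, and `freq_kd_loss` are hypothetical. Only the 0.1 weight on the high-frequency term comes from the source.

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def log_mse(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMED relaxed form: log(1 + MSE), which grows slowly for large errors."""
    return math.log1p(mse(a, b))


def freq_kd_loss(student_low: Sequence[float], teacher_low: Sequence[float],
                 student_high: Sequence[float], teacher_high: Sequence[float],
                 high_weight: float = 0.1) -> float:
    """Strict MSE on the low band plus a down-weighted log-MSE on the high band."""
    return mse(student_low, teacher_low) + high_weight * log_mse(student_high, teacher_high)


# Identical low bands and a high-band MSE of 9 give 0.1 * log(10).
print(round(freq_kd_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], [3.0, 3.0]), 4))  # 0.2303
```

The log compresses large high-frequency errors, so texture mismatches that the IR student cannot reproduce do not dominate the gradient the way a plain MSE would.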

Article Content

From source RSS / original summary

arXiv:2606.11572v1 (Announce Type: new)

Abstract: Transfer learning from large-scale RGB foundation models to infrared (IR) imagery through knowledge distillation (KD) remains challenging due to fundamental differences in image formation physics.

We investigate the spectral structure of the RGB-IR modality gap and observe that feature divergence is not uniform across spatial frequencies: low-frequency components (shape, layout) show greater cross-modal alignment than high-frequency components (texture, fine edges), which reflect modality-specific characteristics. Based on this analysis, we propose FreqKD, a frequency-decoupled distillation framework that applies asymmetric supervision adapted to each band's cross-modal consistency.
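
To make the band decomposition concrete, here is a minimal sketch of splitting a 2D feature map into low- and high-frequency parts. The abstract does not say which filter FreqKD uses; this sketch assumes a simple box-blur low-pass filter with the residual treated as the high-frequency band, and `box_blur` and `split_bands` are hypothetical helpers.

```python
from typing import List, Tuple

Grid = List[List[float]]


def box_blur(x: Grid, radius: int = 1) -> Grid:
    """ASSUMED low-pass filter: mean over a (2r+1)x(2r+1) window, replicating borders."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    # Clamp indices so border pixels reuse their nearest neighbours.
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    total += x[ii][jj]
                    count += 1
            out[i][j] = total / count
    return out


def split_bands(x: Grid, radius: int = 1) -> Tuple[Grid, Grid]:
    """Return (low, high) where low is the blurred map and high = x - low."""
    low = box_blur(x, radius)
    high = [[x[i][j] - low[i][j] for j in range(len(x[0]))] for i in range(len(x))]
    return low, high
```

Because the high band is defined as the residual, `low + high` reconstructs the input up to floating-point rounding, and a constant map yields an all-zero high band.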

The method employs strict mean squared error (MSE) on the low-frequency band to preserve shared structural information and a relaxed log-MSE loss (weighted at 0.1) on the high-frequency band to provide edge guidance while tolerating texture differences. Spectral divergence analysis on 500 paired samples shows that high-frequency divergence exceeds low-frequency divergence by a factor of 2.4 on average across all analysed transformer layers.

On KAIST multispectral pedestrian detection, FreqKD achieves 64.1 mAP50, improving 2.4 points over the DINOv2 baseline. The learned representation transfers across datasets (FLIR ADAS, +2.1 mAP50), tasks (MFNet segmentation, +1.85 mean intersection-over-union), and architectures (ResNet-50, +1.0 mAP50). Code is available at: https://anonymous.4open.science/r/freq_decoupled_kd-5E5A
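
The layer-averaged divergence ratio reported in the abstract can be illustrated as follows. The abstract does not specify the divergence metric or how layers are averaged; this sketch assumes the mean squared difference between paired RGB and IR features within each band, and a plain mean of the per-layer high/low ratios. The dictionary keys and the toy inputs are hypothetical and are not the paper's data.

```python
from statistics import mean
from typing import Dict, List, Sequence


def band_divergence(rgb: Sequence[float], ir: Sequence[float]) -> float:
    """ASSUMED metric: mean squared difference between paired RGB and IR features."""
    if len(rgb) != len(ir):
        raise ValueError("paired features must have the same length")
    return mean((a - b) ** 2 for a, b in zip(rgb, ir))


def divergence_ratio(per_layer: List[Dict[str, Sequence[float]]], eps: float = 1e-12) -> float:
    """Average over layers of (high-band divergence / low-band divergence).

    Each layer uses hypothetical keys 'rgb_low', 'ir_low', 'rgb_high', 'ir_high'.
    """
    ratios = []
    for layer in per_layer:
        low = band_divergence(layer["rgb_low"], layer["ir_low"])
        high = band_divergence(layer["rgb_high"], layer["ir_high"])
        ratios.append(high / max(low, eps))  # eps guards against a zero low-band divergence
    return mean(ratios)


# Toy example with made-up numbers: per-layer ratios 4.0 and 1.0 average to 2.5.
toy_layers = [
    {"rgb_low": [1.0, 1.0], "ir_low": [0.0, 0.0], "rgb_high": [2.0, 2.0], "ir_high": [0.0, 0.0]},
    {"rgb_low": [1.0, 1.0], "ir_low": [0.0, 0.0], "rgb_high": [1.0, 1.0], "ir_high": [0.0, 0.0]},
]
print(divergence_ratio(toy_layers))  # 2.5
```

Averaging per-layer ratios weights every layer equally; a ratio of summed divergences would instead let layers with large feature magnitudes dominate, so which convention the paper uses matters when comparing against its reported 2.4.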
