Thermo-VL: Extending Vision-Language Models to Thermal Infrared Perception
Quick Take
Thermo-VL adds thermal infrared perception to vision-language models so they can operate in low-light conditions.
Key Points
- Introduces a thermal encoder for improved low-light performance.
- Uses dual-attention fusion to combine RGB and thermal data.
- Provides a publicly available dataset and benchmark for thermal reasoning.
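The summary does not say how the dual-attention fusion is built. As an illustration only, the sketch below takes one plausible reading: bidirectional cross-attention, where RGB tokens attend over thermal tokens and thermal tokens attend over RGB tokens, each with a residual add. The function names (`cross_attend`, `dual_attention_fuse`), the absence of learned projections, and the concatenation at the end are all assumptions made for this example, not details taken from Thermo-VL.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention with `context` used as both keys and values.

    Hypothetical simplification: no learned query/key/value projections.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)]
        )
    return attended


def dual_attention_fuse(rgb_tokens, thermal_tokens):
    """Assumed fusion scheme: cross-attention in both directions plus residuals."""
    rgb_ctx = cross_attend(rgb_tokens, thermal_tokens)      # RGB looks at thermal
    thermal_ctx = cross_attend(thermal_tokens, rgb_tokens)  # thermal looks at RGB
    fused_rgb = [
        [a + b for a, b in zip(tok, ctx)] for tok, ctx in zip(rgb_tokens, rgb_ctx)
    ]
    fused_thermal = [
        [a + b for a, b in zip(tok, ctx)]
        for tok, ctx in zip(thermal_tokens, thermal_ctx)
    ]
    # Concatenate along the token axis so a downstream model sees both streams.
    return fused_rgb + fused_thermal


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]  # two toy RGB patch embeddings
    thermal = [[2.0, 2.0]]          # one toy thermal patch embedding
    for token in dual_attention_fuse(rgb, thermal):
        print(token)
```

In a real model the two attention directions would typically use learned projections and multiple heads; this sketch keeps only the data flow so the two-way exchange between modalities is easy to follow.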