SLIP-RS: Structured-Attribute Language-Image Pre-Training for Remote Sensing Object Detection

arXiv cs.CV·Chenxu Wang, Yuxuan Li, Yunheng Li, Xiang Li, Jingyuan Xia, Qibin Hou

5d ago

·~1 min·5/25/2026·en·0

Quick Take

SLIP-RS introduces a Structured-Attribute Decoupling Paradigm for remote sensing object detection, intended to overcome the limitations of Monolithic Label Learning. It combines Structured-Attribute Contrastive Learning with a Conformal Attribute Reliability Engine; the latter yields RS-Attribute-15M, which the authors describe as the largest dataset, with over 15 million attribute annotations. The authors report unprecedented performance in fine-grained detection and cross-domain generalization.

Key Points

SLIP-RS maps the open-ended category space to a finite, physically meaningful attribute space.
Introduces RS-Attribute-15M, described as the largest dataset, with over 15 million attribute annotations.
Reports unprecedented performance in fine-grained detection and cross-domain generalization.
Uses combinatorial attribute augmentation to learn decoupled intrinsic visual logic.
Applies conformal prediction theory to distill high-fidelity supervision from noisy sources.
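To make "combinatorial attribute augmentation" concrete, here is a minimal sketch of one plausible reading: take the attribute description of an object and generate contrastive negatives by swapping exactly one attribute value at a time. The attribute names, values, prompt template, and function names are all hypothetical; the summary does not specify how SLIP-RS actually builds its augmentations.

```python
# Hypothetical attribute vocabulary; not taken from the paper.
ATTRIBUTE_VALUES = {
    "color": ["white", "gray", "red"],
    "shape": ["elongated", "circular"],
    "context": ["water", "land"],
}


def single_swap_negatives(positive):
    """Return every attribute combination that differs from `positive`
    in exactly one attribute (an assumed form of hard negative)."""
    negatives = []
    for name, values in ATTRIBUTE_VALUES.items():
        for value in values:
            if value != positive[name]:
                negative = dict(positive)  # copy, then change one attribute
                negative[name] = value
                negatives.append(negative)
    return negatives


def to_prompt(attrs):
    """Render an attribute combination as a text prompt (assumed template)."""
    return "an aerial image of a {color}, {shape} object on {context}".format(**attrs)


positive = {"color": "white", "shape": "elongated", "context": "water"}
negatives = single_swap_negatives(positive)
prompts = [to_prompt(n) for n in negatives]
```

Because each negative differs from the positive in a single attribute, a contrastive objective trained on such pairs has to attend to that attribute rather than to the category name as a whole.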

Article Content

From source RSS / original summary

arXiv:2605.23144v1. Abstract: Existing language-image pre-training for remote sensing object detection is constrained by Monolithic Label Learning, which relies on exhaustively enumerating open-set categories via black-box data to acquire fine-grained representations, creating a dependency incompatible with the domain's inherent data scarcity.

To transcend this bottleneck, we propose SLIP-RS, establishing a Structured-Attribute Decoupling Paradigm that maps the open-ended category space into a finite, physically meaningful attribute space, unlocking fine-grained discriminability via explicit structural logic.
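The idea of mapping an open-ended category space onto a finite attribute space can be illustrated with a small sketch. Everything below (the attribute schema, the example categories, and the function names) is a hypothetical illustration, not the attribute space used in the paper.

```python
from itertools import product

# Hypothetical finite attribute space: each attribute has a fixed value set.
ATTRIBUTE_SPACE = {
    "shape": ["elongated", "circular", "rectangular"],
    "material": ["metal", "concrete", "vegetation"],
    "context": ["water", "land", "airfield"],
}

# Hypothetical mapping from category names to attribute assignments.
CATEGORY_TO_ATTRIBUTES = {
    "cargo ship": {"shape": "elongated", "material": "metal", "context": "water"},
    "storage tank": {"shape": "circular", "material": "metal", "context": "land"},
    "runway": {"shape": "elongated", "material": "concrete", "context": "airfield"},
}


def describe(category):
    """Describe a category using only values from the finite attribute space."""
    attrs = CATEGORY_TO_ATTRIBUTES[category]
    for name, value in attrs.items():
        if value not in ATTRIBUTE_SPACE[name]:
            raise ValueError(f"{value!r} is not a valid value for {name!r}")
    return ", ".join(f"{name}: {attrs[name]}" for name in ATTRIBUTE_SPACE)


def attribute_space_size():
    """Count the distinct attribute combinations the space can express."""
    return len(list(product(*ATTRIBUTE_SPACE.values())))
```

In this toy schema, nine attribute values express 27 combinations, so a category unseen at training time can still be described without adding a new label, provided its attributes fall inside the space.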

This paradigm is realized via two technical pillars: (1) Structured-Attribute Contrastive Learning, which enforces the learning of decoupled intrinsic visual logic via combinatorial attribute augmentation; and (2) Conformal Attribute Reliability Engine, which leverages conformal prediction theory to rigorously distill high-fidelity supervision from noisy sources, yielding RS-Attribute-15M, the largest dataset with over 15 million attribute annotations.
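As a rough illustration of how conformal prediction can filter noisy annotations, the sketch below uses standard split conformal prediction: a threshold is calibrated on trusted examples, and a noisy attribute label is kept only if it falls inside the resulting prediction set. This is a generic textbook procedure under assumed inputs, not the Conformal Attribute Reliability Engine itself, whose details are not given in this summary; all names and numbers are hypothetical.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split-conformal quantile of nonconformity scores.

    Each score is assumed to be 1 - p(true attribute) on a trusted
    calibration example; alpha is the tolerated miscoverage rate.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank of the quantile
    if rank > n:
        return float("inf")  # too few calibration points: admit every label
    return sorted(calibration_scores)[rank - 1]


def prediction_set(probs, threshold):
    """Labels whose nonconformity score (1 - probability) is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}


def keep_annotation(noisy_label, probs, threshold):
    """Keep a noisy annotation only if it lies in the conformal prediction set."""
    return noisy_label in prediction_set(probs, threshold)


# Hypothetical calibration scores and model probabilities.
scores = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.35, 0.40, 0.60]
threshold = conformal_threshold(scores, alpha=0.2)
probs = {"metal": 0.70, "concrete": 0.25, "vegetation": 0.05}
kept = keep_annotation("metal", probs, threshold)
dropped = not keep_annotation("concrete", probs, threshold)
```

If calibration and test examples are exchangeable, the prediction set contains the true label with probability at least 1 - alpha, which is what makes the filter's error rate controllable rather than heuristic.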

Extensive experiments demonstrate that SLIP-RS establishes unprecedented performance in fine-grained detection and cross-domain generalization, validating structured attributes as a vital foundation for remote sensing. Code: https://github.com/facias914/SLIP-RS.


