iSAGE: A Human-in-the-Loop Framework for Remote Sensing Semantic Segmentation via Sparse Point Supervision
Quick Answer
iSAGE is a human-in-the-loop framework for remote sensing semantic segmentation. On BsB Aerial it recovers 97.2% of dense-supervision performance (74.79% mIoU) while labeling only 0.040% of pixels.
Quick Take
iSAGE trains a segmentation model from sparse expert clicks that target confident model errors, with no auxiliary machinery to expand those clicks into dense labels. On BsB Aerial it reaches 74.79% mIoU (97.2% of dense supervision) using 0.040% of pixels. On ISPRS Vaihingen it reaches 76.78% mIoU with 0.011% of pixels, matching the dense baseline (76.65%) and exceeding all published methods.
Key Points
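- iSAGE uses expert clicks that target confident model errors, with no auxiliary expansion machinery (pseudo-labels, propagation, CRFs, foundation-model prompts, auxiliary heads).
- On BsB Aerial, it reaches 74.79% mIoU with only 0.040% of pixels labeled, recovering 97.2% of dense supervision.
- On ISPRS Vaihingen, it reaches 76.78% mIoU using just 0.011% of pixels, matching the dense baseline (76.65%).
- Under the same pipeline, four output-reading mechanisms (oracle entropy, pseudo-labels, CRF-based propagation, uniform random) plateau 7.4 to 14.5 percentage points below iSAGE.
- Across 31 surveyed methods, iSAGE is the only iterative human-in-the-loop framework operating without auxiliary machinery.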
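The recovery figures above can be checked with a few lines of arithmetic. This is only a sketch, and it assumes that "97.2% of dense supervision" means the ratio of sparse mIoU to dense mIoU; the function names are hypothetical. The BsB Aerial dense mIoU it derives is an implied value under that assumption, not a number reported in the summary.

```python
def recovery_ratio(sparse_miou: float, dense_miou: float) -> float:
    """Fraction of the dense-supervision mIoU reached by the sparse run."""
    return sparse_miou / dense_miou


def implied_dense_miou(sparse_miou: float, recovery: float) -> float:
    """Dense mIoU implied by a sparse mIoU and a stated recovery ratio."""
    return sparse_miou / recovery


# ISPRS Vaihingen: both numbers are stated in the abstract.
vaihingen_recovery = recovery_ratio(76.78, 76.65)

# BsB Aerial: only the sparse mIoU and the 97.2% figure are stated,
# so the dense baseline below is derived, not reported.
bsb_dense = implied_dense_miou(74.79, 0.972)

print(f"Vaihingen recovery: {vaihingen_recovery:.2%}")
print(f"Implied BsB Aerial dense mIoU: {bsb_dense:.2f}")
```

Under this reading, the Vaihingen run slightly exceeds its dense baseline, which is consistent with the abstract's wording that it "matches" it.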
Article Content
From source RSS / original summary (arXiv:2606.10136v1)
Abstract: Semantic segmentation in remote sensing requires costly pixel-level annotations, and nearly every problem demands a new dataset since models rarely transfer across sensors, platforms, or geographies. Existing human-in-the-loop frameworks expand sparse clicks into dense supervision via auxiliary machinery (pseudo-labels, propagation, CRFs, foundation-model prompts, auxiliary heads), all operating on the model's predictive distribution.
A confidently wrong pixel is indistinguishable from a confidently correct one in that distribution by construction, so no rule reading it can separate the two; the distinguishing signal is external to the model. This paper hypothesizes that expert clicks targeting confident model errors, not arbitrary pixels, suffice to match dense supervision, with no expansion machinery.
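The indistinguishability argument can be illustrated with a toy example. The sketch below assumes a three-class softmax output and two hypothetical pixels; the probabilities and labels are made up for illustration. Any score computed from the predictive distribution alone (here entropy and maximum probability) is identical for both pixels, and only the expert's label, which is external to the model, tells them apart.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def max_confidence(probs):
    """Highest class probability, a common confidence score."""
    return max(probs)


# Two hypothetical pixels that receive the same softmax output.
pred_a = [0.96, 0.02, 0.02]
pred_b = [0.96, 0.02, 0.02]

# Ground truth known only to the expert (hypothetical values).
label_a = 0  # the model is confidently correct here
label_b = 2  # the model is confidently wrong here

# Scores that read only the model output cannot separate the pixels.
same_entropy = entropy(pred_a) == entropy(pred_b)
same_confidence = max_confidence(pred_a) == max_confidence(pred_b)

# The external label can.
error_a = pred_a.index(max(pred_a)) != label_a
error_b = pred_b.index(max(pred_b)) != label_b

print(same_entropy, same_confidence, error_a, error_b)
```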
iSAGE (Iterative Sparse Annotation Guided by Expert) realizes this hypothesis on an integrated open-source platform, where an error-weighted loss amplifies the gradient at each click and the annotation record itself is the dataset: extensible, correctable, and auditable. Experiments use a minimum-effort regime: at most one labeled pixel per class per frame. On BsB Aerial, iSAGE recovers 97.2% of dense supervision (74.79% mIoU on 0.040% of pixels) with contrasting class dynamics: amorphous classes (permeable areas) saturate from the seed, while small classes (cars) require late-iteration effort.
On ISPRS Vaihingen (external benchmark), iSAGE reaches 76.78% mIoU with 0.011% of pixels, matching the dense baseline (76.65%) and exceeding all published methods. Under the same pipeline, four output-reading mechanisms (oracle entropy across budgets 1–100x, pseudo-labels across thresholds 0.90–0.99, CRF-based propagation, uniform random) plateau 7.4 to 14.5 pp below iSAGE. Across 31 surveyed methods, iSAGE is the only iterative human-in-the-loop framework operating without auxiliary machinery.
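The abstract does not give the form of the error-weighted loss, so the following is only a minimal sketch of one plausible reading: cross-entropy evaluated at clicked pixels only, with a larger weight where the model's current prediction disagrees with the click. The function name, the `error_weight` value, and the data layout are all assumptions for illustration, not the paper's implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def error_weighted_point_loss(logits_by_pixel, clicks, error_weight=5.0):
    """Weighted cross-entropy over clicked pixels only (hypothetical form).

    logits_by_pixel: dict mapping pixel coordinates to a list of class logits.
    clicks: dict mapping pixel coordinates to the expert's class label.
    Returns (mean loss, per-pixel gradient of the mean loss w.r.t. logits).
    """
    if not clicks:
        return 0.0, {}
    n = len(clicks)
    total = 0.0
    grads = {}
    for pixel, label in clicks.items():
        probs = softmax(logits_by_pixel[pixel])
        predicted = probs.index(max(probs))
        # Assumed weighting: clicks that correct a model error count more.
        w = error_weight if predicted != label else 1.0
        total += -w * math.log(probs[label])
        # Gradient of weighted cross-entropy w.r.t. logits: w * (p - onehot).
        grads[pixel] = [
            w * (p - (1.0 if k == label else 0.0)) / n
            for k, p in enumerate(probs)
        ]
    return total / n, grads


# Three pixels with the same confident prediction for class 0.
logits = {(0, 0): [4.0, 0.0, 0.0], (3, 5): [4.0, 0.0, 0.0], (7, 7): [4.0, 0.0, 0.0]}
# The expert clicks two of them; (3, 5) is a confident error (true class 2).
clicks = {(0, 0): 0, (3, 5): 2}

loss, grads = error_weighted_point_loss(logits, clicks)
print(loss, grads[(0, 0)], grads[(3, 5)])
```

In this sketch the unclicked pixel `(7, 7)` contributes nothing to the loss, and the gradient at the corrected click is much larger than at the confirming click, which is the qualitative behavior the abstract describes.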