Aligning Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization
Quick Take
The study combines cellular sheaves with attention-based multiple instance learning (ABMIL) for weakly-supervised tumor localization in whole-slide images; the resulting sheaf disagreement field reaches a patch-level AUC of 0.940 on Camelyon16. Joint training also raises the patch-level AUC of the classifier's attention from 0.717 to 0.953, and the trained model transfers to annotated Camelyon17 slides without retraining, giving clinicians a reliable localization signal.
Key Points
- Cellular sheaves provide a principled way to detect local disagreement in graph-structured data.
- Attention-conditional consistency uses the classifier's attention to decide which neighbouring patches should agree, so the disagreement field tracks diagnostic content rather than tissue texture.
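- Joint training of classifier and sheaf gives the disagreement field a patch-level AUC of 0.940 and lifts attention AUC from 0.717 to 0.953; with the classifier frozen, the disagreement field reaches only 0.727.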
- The model transfers to annotated Camelyon17 slides without retraining (disagreement-field AUC 0.932 ± 0.083, attention AUC 0.955 ± 0.099).
- Attention maps and sheaf-disagreement maps align on diagnostic regions, aiding clinical interpretation.
Article Content
arXiv:2606.00092v1 Announce Type: new
Abstract: Weakly-supervised classification of whole-slide images with attention-based multiple instance learning (ABMIL) on top of foundation-model features now reaches near-saturated slide-level performance on Camelyon16, but the corresponding attention maps are an imperfect localization signal: in clinical interpretation, a model that classifies correctly without firing on the actual lesion is hard to trust.
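For readers unfamiliar with ABMIL, the attention pooling it refers to fits in a few lines. This is a minimal NumPy sketch of the standard non-gated attention pooling of Ilse et al. (2018), not the paper's code: random vectors stand in for foundation-model patch features, the weights `V`, `w` and `c` are untrained, and all names are illustrative. The per-patch weights `a` are what gets rendered as the attention map.

```python
import numpy as np

def abmil_attention(H, V, w):
    """Attention weights for a bag of patch features H (n_patches x d).

    Non-gated attention pooling: score_k = w^T tanh(V h_k), softmax over patches.
    """
    scores = np.tanh(H @ V.T) @ w          # (n_patches,)
    scores = scores - scores.max()         # shift for a numerically stable softmax
    a = np.exp(scores)
    return a / a.sum()

def abmil_forward(H, V, w, c, b=0.0):
    """Return (slide-level probability, per-patch attention) for one slide."""
    a = abmil_attention(H, V, w)
    z = a @ H                              # attention-weighted slide embedding, (d,)
    logit = z @ c + b
    return 1.0 / (1.0 + np.exp(-logit)), a

# Hypothetical toy bag: 6 patches with 8-dim features, 4 hidden attention units.
rng = np.random.default_rng(0)
n, d, h = 6, 8, 4
H = rng.normal(size=(n, d))
V = rng.normal(size=(h, d))
w = rng.normal(size=h)
c = rng.normal(size=d)
prob, attn = abmil_forward(H, V, w, c)
```

Only the slide label supervises this model, which is why nothing forces `attn` to peak on the lesion itself.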
We address this gap with cellular sheaves, which equip each vertex and edge of a graph with a finite-dimensional vector space and linear restriction maps from each vertex space to the spaces of its incident edges, providing a principled way to detect local disagreement on graph-structured data. We apply cellular sheaves to weakly-supervised tumour localization on whole-slide images, combining a sheaf disagreement field with ABMIL.
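The disagreement field can be made concrete with a small example. The sketch below, assuming NumPy and randomly drawn (untrained) restriction maps, computes the coboundary on each edge e = (u, v), namely F_{v◁e} x_v − F_{u◁e} x_u, takes its squared norm as the edge disagreement, and forms a per-vertex field by summing over incident edges. That aggregation is an illustrative choice: the abstract does not say how the paper turns edge disagreement into a patch-level field, and in the paper the maps are learned. The total disagreement equals the quadratic form of the sheaf Laplacian, which the code builds explicitly.

```python
import numpy as np

def edge_disagreement(x, edges, F_src, F_dst):
    """Squared coboundary norm ||F_dst[e] x_v - F_src[e] x_u||^2 for each edge e = (u, v).

    x:      (n_vertices, d_v) array, one vector per vertex stalk (a 0-cochain)
    edges:  list of (u, v) vertex-index pairs
    F_src:  (n_edges, d_e, d_v) restriction maps from u's stalk into the edge stalk
    F_dst:  (n_edges, d_e, d_v) restriction maps from v's stalk into the edge stalk
    """
    out = np.empty(len(edges))
    for i, (u, v) in enumerate(edges):
        delta = F_dst[i] @ x[v] - F_src[i] @ x[u]
        out[i] = delta @ delta
    return out

def vertex_disagreement(x, edges, F_src, F_dst):
    """Per-vertex field (illustrative choice): sum of disagreement on incident edges."""
    e_dis = edge_disagreement(x, edges, F_src, F_dst)
    field = np.zeros(len(x))
    for i, (u, v) in enumerate(edges):
        field[u] += e_dis[i]
        field[v] += e_dis[i]
    return field

def coboundary_matrix(n_vertices, edges, F_src, F_dst):
    """Dense coboundary matrix delta; delta.T @ delta is the sheaf Laplacian."""
    d_e, d_v = F_src.shape[1:]
    delta = np.zeros((len(edges) * d_e, n_vertices * d_v))
    for i, (u, v) in enumerate(edges):
        rows = slice(i * d_e, (i + 1) * d_e)
        delta[rows, u * d_v:(u + 1) * d_v] = -F_src[i]
        delta[rows, v * d_v:(v + 1) * d_v] = F_dst[i]
    return delta

# Toy example: 4 neighbouring patches on a cycle, random (untrained) restriction maps.
rng = np.random.default_rng(1)
n, d_v, d_e = 4, 3, 2
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
F_src = rng.normal(size=(len(edges), d_e, d_v))
F_dst = rng.normal(size=(len(edges), d_e, d_v))
x = rng.normal(size=(n, d_v))

field = vertex_disagreement(x, edges, F_src, F_dst)
delta = coboundary_matrix(n, edges, F_src, F_dst)
laplacian = delta.T @ delta
energy = x.reshape(-1) @ laplacian @ x.reshape(-1)   # equals total edge disagreement
```

On a slide, vertices would be patches and edges would join spatially adjacent patches. With identity restriction maps the field reduces to plain squared differences between neighbouring feature vectors; learned maps let the sheaf decide which feature directions neighbours must agree on.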
The natural training objective, encouraging consistency between similar features, produces a disagreement field that tracks tissue-level texture rather than diagnostic content. We propose attention-conditional consistency, which uses the classifier's attention to define which neighbouring patches should agree. Joint training of the classifier and the sheaf under this objective produces a disagreement field with patch-level AUC 0.940 on Camelyon16 and raises the patch-level AUC of the attention head from its ABMIL-alone level of 0.717 to 0.953.
A two-stage ablation with the classifier frozen at its ABMIL values reaches only 0.727 on the disagreement field and leaves attention at 0.717, confirming that the gain comes from the projector co-adapting under both objectives, not from the loss change in isolation. The trained model transfers without retraining to annotated slides from Camelyon17, maintaining a disagreement-field (Δ) AUC of 0.932 ± 0.083 and an attention AUC of 0.955 ± 0.099.
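The abstract does not give the exact form of attention-conditional consistency, so the following is a hypothetical sketch of the idea only: each edge's contribution to the consistency loss is weighted either by feature similarity (the "natural" objective that ended up tracking texture) or by how similarly the classifier attends to the two patches. The weighting functions, the cosine-similarity choice and the toy numbers are assumptions, and the sketch leaves out the joint training of classifier and sheaf that the ablation shows is essential.

```python
import numpy as np

def feature_similarity_weights(X, edges):
    """'Natural' objective: neighbours with similar features should agree.
    Assumed form: cosine similarity of patch features, clipped at 0."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return np.array([max(0.0, float(Xn[u] @ Xn[v])) for u, v in edges])

def attention_conditional_weights(attn, edges):
    """Hypothetical attention-conditional weights: neighbours should agree when the
    classifier attends to them similarly; edges crossing an attention boundary get low weight."""
    a = attn / attn.max()                      # rescale softmax attention to [0, 1]
    return np.array([1.0 - abs(a[u] - a[v]) for u, v in edges])

def weighted_consistency_loss(edge_dis, weights):
    """Weighted mean of per-edge sheaf disagreement."""
    return float((weights * edge_dis).sum() / (weights.sum() + 1e-12))

# Toy strip of 4 patches; the classifier attends to patches 2 and 3 (putative tumour).
edges = [(0, 1), (1, 2), (2, 3)]
attn = np.array([0.05, 0.05, 0.45, 0.45])
X = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # patches 0-2 share texture
edge_dis = np.array([0.1, 2.0, 0.1])          # disagreement peaks across the boundary

w_feat = feature_similarity_weights(X, edges)       # follows texture
w_att = attention_conditional_weights(attn, edges)  # boundary edge (1, 2) down-weighted
loss_att = weighted_consistency_loss(edge_dis, w_att)
loss_uniform = weighted_consistency_loss(edge_dis, np.ones(len(edges)))
```

In the toy strip, the feature-based weights demand full agreement across the attention boundary between patches 1 and 2 (they look alike) and none inside the attended region; the attention-conditional weights do the opposite, so disagreement is left free to peak where the classifier's attention changes.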
The result is an attention map and a sheaf-disagreement map that fire on the same diagnostic regions, giving clinicians two complementary explanations for each slide-level prediction.
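Both maps are scored the same way, as a patch-level ROC AUC against lesion annotations. A minimal NumPy sketch, assuming binary per-patch tumour labels; the ± figures reported for Camelyon17 are read here as the mean and population standard deviation of per-slide AUCs, which is an assumption about the protocol. The toy scores are invented for illustration and are not results from the paper.

```python
import numpy as np

def patch_auc(scores, labels):
    """Patch-level ROC AUC: probability that a random tumour patch outscores a
    random normal patch, with ties counted as 1/2 (Mann-Whitney form)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs both tumour and normal patches")
    diff = pos[:, None] - neg[None, :]           # all tumour-vs-normal pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def per_slide_summary(score_maps, label_maps):
    """Mean and population standard deviation of per-slide patch AUCs."""
    aucs = np.array([patch_auc(s, y) for s, y in zip(score_maps, label_maps)])
    return float(aucs.mean()), float(aucs.std())

# Toy data: two slides, patch labels from hypothetical lesion annotations.
labels = [[0, 0, 1, 1], [0, 1, 0, 1]]
attention = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.6, 0.1, 0.7]]
disagreement = [[0.5, 0.1, 0.4, 0.9], [0.3, 0.8, 0.2, 0.3]]

att_mean, att_std = per_slide_summary(attention, labels)
dis_mean, dis_std = per_slide_summary(disagreement, labels)
```

`patch_auc` should agree with scikit-learn's `roc_auc_score` on the same inputs; the pairwise form is used here only to keep the sketch dependency-free and to make the tie handling explicit.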