Aligning Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization

arXiv cs.CV·Devansh Lalwani, Swapnil Bhat, Maulik Shah

6/2/2026

Quick Take

The study applies cellular sheaves to weakly-supervised tumor localization in whole-slide images. Jointly training the sheaf with an ABMIL classifier yields a disagreement field with a patch-level AUC of 0.940 on Camelyon16 and raises the attention AUC from 0.717 to 0.953. The trained model also transfers to annotated Camelyon17 slides without retraining, giving two aligned localization signals for clinical interpretation.

Key Points

Cellular sheaves provide a principled approach for local disagreement detection in graph-structured data.
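Attention-conditional consistency uses the classifier's attention to decide which neighbouring patches should agree, so the disagreement field tracks diagnostic content rather than tissue texture.
Joint training yields a disagreement field with patch-level AUC 0.940 on Camelyon16 and raises the attention AUC from 0.717 to 0.953.
The model transfers to annotated Camelyon17 slides without retraining (Delta AUC 0.932 ± 0.083, attention AUC 0.955 ± 0.099).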
Attention maps and sheaf-disagreement maps align on diagnostic regions, aiding clinical interpretation.

Article Content

From source RSS / original summary

arXiv:2606.00092v1. Abstract: Weakly-supervised classification of whole-slide images with attention-based multiple instance learning (ABMIL) on top of foundation features now reaches near-saturation on Camelyon16 slide-level performance, but the corresponding attention maps are an imperfect localization signal: in clinical interpretation, a model that classifies correctly without firing on the actual lesion is hard to trust.
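For context, the attention pooling at the heart of ABMIL can be sketched as below. This is a minimal illustration of the standard (ungated) formulation, in which each patch receives a softmax-normalised score and the slide embedding is the weighted sum of patch features. The function name `abmil_pool`, the parameter shapes, and the random inputs are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def abmil_pool(H, V, w):
    """Attention pooling over one slide's patch features.

    H: (n_patches, d) patch features; V: (d, k) and w: (k,) attention parameters.
    Returns per-patch attention weights and the pooled slide embedding.
    """
    scores = np.tanh(H @ V) @ w      # one scalar score per patch
    scores = scores - scores.max()   # stabilise the softmax numerically
    a = np.exp(scores)
    a = a / a.sum()                  # attention weights sum to 1 over the slide
    z = a @ H                        # attention-weighted slide embedding
    return a, z

# Hypothetical toy slide: 6 patches with 4-dimensional features.
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))
V = rng.normal(size=(4, 3))
w = rng.normal(size=3)
attention, slide_embedding = abmil_pool(H, V, w)
```

The `attention` vector is what gets rendered as an attention map. Only the slide-level label supervises it, which is why it can be a weak localization signal.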

We address this gap with cellular sheaves, which equip each vertex and edge of a graph with a finite-dimensional vector space and consistent linear maps between them, providing a principled way to detect local disagreement on graph-structured data. We apply cellular sheaves to weakly-supervised tumour localization on whole-slide images, combining a sheaf disagreement field with ABMIL.
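The idea of measuring local disagreement with a cellular sheaf can be sketched as follows. Each edge carries two linear restriction maps, one per endpoint, and the edge disagrees when the two restricted vertex vectors differ. Accumulating the squared difference onto the incident vertices gives one possible per-patch disagreement field. The function name, the array layout, and this particular way of aggregating edge energies are assumptions; the abstract does not specify the authors' exact construction.

```python
import numpy as np

def sheaf_disagreement(x, edges, R_src, R_dst):
    """Per-vertex disagreement from a cellular sheaf on a graph.

    x: (n, d) vertex vectors; edges: list of (u, v) pairs;
    R_src, R_dst: (m, d_e, d) restriction maps into each edge space.
    """
    field = np.zeros(x.shape[0])
    for e, (u, v) in enumerate(edges):
        # Coboundary on edge e: compare both endpoints inside the edge space.
        diff = R_src[e] @ x[u] - R_dst[e] @ x[v]
        energy = float(diff @ diff)
        # Assumed aggregation: share the edge energy with both endpoints.
        field[u] += energy
        field[v] += energy
    return field

# Toy path graph 0 - 1 - 2 with identity restriction maps.
x = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
edges = [(0, 1), (1, 2)]
identity_maps = np.stack([np.eye(2), np.eye(2)])
field = sheaf_disagreement(x, edges, identity_maps, identity_maps)
```

In this toy case vertices 0 and 1 agree, so all of the disagreement concentrates on the edge between vertices 1 and 2. In a trained model the restriction maps would be learned rather than fixed to the identity.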

The natural training objective, encouraging consistency between similar features, produces a disagreement field that tracks tissue-level texture rather than diagnostic content. We propose attention-conditional consistency, which uses the classifier's attention to define which neighbouring patches should agree. Joint training of the classifier and the sheaf under this objective produces a disagreement field with patch-level AUC 0.940 on Camelyon16 and raises the attention head from its ABMIL-alone level of 0.717 to 0.953. Two-stage ablation with the classifier frozen at its ABMIL values reaches only 0.727 on the disagreement field and leaves attention at 0.717, confirming that the gain comes from the projector co-adapting under both objectives, not from the loss change in isolation. The trained model transfers without retraining to annotated slides from Camelyon17, maintaining Delta AUC 0.932 ± 0.083 and attention AUC 0.955 ± 0.099.
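One way to picture attention-conditional consistency is as a gate on each edge's disagreement energy: neighbouring patches with similar attention are asked to agree, while edges that cross an attention boundary are left largely unpenalised. The sketch below is a hypothetical illustration only. The Gaussian gate, the temperature `tau`, and the function name are assumptions, and the abstract does not give the authors' actual loss.

```python
import numpy as np

def attention_conditional_loss(x, edges, R_src, R_dst, attention, tau=0.1):
    """Mean edge disagreement, gated by similarity of endpoint attention.

    The gate exp(-(a_u - a_v)^2 / tau) is an assumed, illustrative choice.
    """
    total = 0.0
    for e, (u, v) in enumerate(edges):
        diff = R_src[e] @ x[u] - R_dst[e] @ x[v]
        # Gate is near 1 when attention matches, near 0 across a boundary.
        gate = np.exp(-((attention[u] - attention[v]) ** 2) / tau)
        total += gate * float(diff @ diff)
    return total / max(len(edges), 1)

# Toy path graph 0 - 1 - 2 with identity restriction maps.
x = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
edges = [(0, 1), (1, 2)]
maps = np.stack([np.eye(2), np.eye(2)])

# Uniform attention: the disagreeing edge (1, 2) is fully penalised.
loss_uniform = attention_conditional_loss(x, edges, maps, maps, np.array([0.5, 0.5, 0.5]))
# Attention boundary between patches 1 and 2: that edge is allowed to disagree.
loss_boundary = attention_conditional_loss(x, edges, maps, maps, np.array([0.1, 0.1, 0.9]))
```

Under this assumed gate, disagreement that survives training concentrates where attention changes, which matches the abstract's stated goal of tying the disagreement field to diagnostic content rather than texture.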

The result is an attention map and a sheaf-disagreement map that fire on the same diagnostic regions, giving clinicians two complementary explanations for each slide-level prediction.
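The patch-level AUC figures quoted in the abstract measure how well a per-patch score, either attention or sheaf disagreement, ranks tumour patches above normal ones. A minimal sketch of that metric, assuming binary patch labels derived from lesion annotations, is shown below. The function name and toy values are illustrative and not taken from the paper.

```python
import numpy as np

def patch_auc(scores, labels):
    """Probability that a random tumour patch outscores a random normal patch.

    scores: per-patch scores; labels: 1 for tumour, 0 for normal.
    Ties count as half, following the Mann-Whitney formulation of AUC.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one tumour and one normal patch")
    # Pairwise comparison is O(n_pos * n_neg), which is fine for a sketch.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Toy slide: the two tumour patches receive the two highest scores.
auc_perfect = patch_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
# One of four tumour/normal pairs is ranked the wrong way round.
auc_partial = patch_auc([0.9, 0.3, 0.5, 0.1], [1, 1, 0, 0])
```

Here `auc_perfect` is 1.0 and `auc_partial` is 0.75. A score that ranks patches no better than chance would sit near 0.5.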

