SlideCheck: Guiding Self-Supervised Pretraining of Pathology Foundation Models via Dataset Distributions

arXiv cs.CV·Mingyi He, Xinyi Guo, Xitong Ling, Weiming Chen, Jiawen Li, Lianghui Zhu, Minxi Ouyang, Mingxi Fu, Yizhi Wang, Tian Guan

6/9/2026


Quick Answer

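SlideCheck is a lightweight tool for guiding the pretraining of pathology foundation models. Built on frozen patch features from an existing pathology foundation model, it assigns each patch explicit abnormality and malignancy scores that can be used to select, filter, and audit pretraining data.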

Quick Take

It uses a dual-head MLP to score abnormal morphology and malignancy evidence, giving explicit control over the composition and quality of pretraining datasets. Subsets curated with these scores come close to full-data performance, which suggests that self-supervised ViT pretraining can be made more data-efficient.
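The summary does not describe the MLP's layers, activations, or training labels. The sketch below is only a minimal illustration of what a dual-head scorer over frozen patch features could look like, assuming a shared hidden ReLU layer followed by two sigmoid heads. The dimensions and the names `init_dual_head_mlp` and `score_patches` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def stable_sigmoid(x):
    """Numerically stable logistic function: 1 / (1 + exp(-x))."""
    return np.exp(-np.logaddexp(0.0, -x))


def init_dual_head_mlp(feat_dim, hidden_dim, seed=0):
    """Randomly initialise a shared hidden layer plus two scalar heads (untrained)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(feat_dim), size=(feat_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        # Head 1 scores abnormal morphology, head 2 scores malignancy evidence.
        "w_abn": rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), size=hidden_dim),
        "w_mal": rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), size=hidden_dim),
        "b_abn": 0.0,
        "b_mal": 0.0,
    }


def score_patches(params, feats):
    """Map frozen patch features of shape (N, D) to two score vectors in (0, 1)."""
    hidden = np.maximum(feats @ params["W1"] + params["b1"], 0.0)  # shared ReLU layer
    abnormality = stable_sigmoid(hidden @ params["w_abn"] + params["b_abn"])
    malignancy = stable_sigmoid(hidden @ params["w_mal"] + params["b_mal"])
    return abnormality, malignancy


if __name__ == "__main__":
    # Random vectors stand in for features from a frozen pathology encoder.
    feats = np.random.default_rng(1).normal(size=(8, 1024))
    params = init_dual_head_mlp(feat_dim=1024, hidden_dim=256)
    abn, mal = score_patches(params, feats)
    print(abn.shape, mal.shape)
```

Keeping the encoder frozen means only this small head would need training, which is consistent with the "lightweight" framing; how the real heads are supervised is not stated in the summary.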

Key Points

SlideCheck uses a dual-head MLP to model abnormal morphology and malignancy evidence.
Its scores support organizing, filtering, and auditing pathology pretraining data.
Subsets curated with SlideCheck scores achieve performance close to that of the full dataset.
Data selection guided by SlideCheck shapes downstream behavior in self-supervised ViT pretraining.
It turns large patch pools into controllable, reusable pretraining datasets.
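The summary does not say how curated subsets are defined. As a hedged illustration of how two per-patch scores could turn a large pool into a controllable subset, the sketch below buckets patches by a threshold and samples each bucket toward a target mix under a fixed budget. The bucketing rule, the threshold of 0.5, the default mix, and the name `curate_subset` are all hypothetical choices, not the paper's method.

```python
import numpy as np


def curate_subset(abn, mal, budget, mix=(0.4, 0.3, 0.3), thr=0.5, seed=0):
    """Pick at most `budget` patch indices with a target mix of score buckets.

    Hypothetical rule (not from the paper): a patch is "malignant" if mal >= thr,
    "abnormal" if abn >= thr but it is not malignant, and "normal" otherwise.
    Each bucket contributes floor(fraction * budget) patches, or all of its
    patches if it has fewer, so the total stays within budget as long as the
    fractions in `mix` sum to at most 1.
    """
    abn, mal = np.asarray(abn), np.asarray(mal)
    rng = np.random.default_rng(seed)
    malignant = mal >= thr
    abnormal = (abn >= thr) & ~malignant
    normal = ~(malignant | abnormal)
    picked = []
    for mask, frac in zip((normal, abnormal, malignant), mix):
        candidates = np.flatnonzero(mask)
        k = min(len(candidates), int(frac * budget))
        # Sample without replacement so no patch is selected twice.
        picked.append(rng.choice(candidates, size=k, replace=False))
    return np.sort(np.concatenate(picked))
```

Because the bucket fractions are explicit parameters, the same score arrays can be reused to build several subsets with different compositions, which is one way to read "controllable and reusable".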

Paper Resources

Read Paper (arxiv.org) · View PDF (arxiv.org)

Source Excerpt

arXiv:2606.07590v1 · Announce Type: new

Abstract: Pathology foundation models are pretrained on large streams of WSI-derived patches, while supervision during data construction is often slide-level, sparse, or heterogeneous. This mismatch makes it difficult to understand and control which biological patterns enter the pretraining data. We propose SlideCheck, a lightweight pretraining data guidance tool built on frozen pathology foundation model patch features.

Rather than serving as a standalone patch diagnostic model, SlideCheck provides explicit abnormality and malignancy scores for organizing, filtering, and auditing pathology pretraining data. …
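The title refers to dataset distributions and the abstract to auditing, but the excerpt does not say how an audit is carried out. One simple possibility, sketched here under the assumption that both scores lie in [0, 1], is to summarize a patch pool as a 2-D histogram over (abnormality, malignancy) and compare pools with total variation distance. The functions `audit_distribution` and `total_variation` are hypothetical names, and the histogram-plus-TV approach is an illustrative choice rather than the paper's procedure.

```python
import numpy as np


def audit_distribution(abn, mal, bins=4):
    """Fraction of patches in each (abnormality, malignancy) cell over [0, 1] x [0, 1].

    Rows index abnormality-score bins, columns index malignancy-score bins.
    """
    counts, _, _ = np.histogram2d(abn, mal, bins=bins, range=[[0.0, 1.0], [0.0, 1.0]])
    total = counts.sum()
    return counts / total if total > 0 else counts


def total_variation(p, q):
    """Half the L1 distance between two normalised tables: 0 = identical, 1 = disjoint."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())
```

Computing `audit_distribution` once for the full patch pool and once for a curated subset, then passing both tables to `total_variation`, gives a single number for how far curation has shifted the score distribution.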
