Virchow, a pathology foundation model named in honour of Dr. Rudolf Virchow and trained on 1.5 million whole-slide images from 100,000 patients at Memorial Sloan Kettering Cancer Center (MSKCC), has been shown to enable pan-cancer detection and to improve the performance of specialised models in detecting rare cancers.
The results provide evidence that large-scale foundation models can be the basis for robust results in a new frontier of computational pathology, according to Siqi Liu and colleagues from Paige in New York, NY, US, who published the findings on 22 July 2024 in Nature Medicine.
The authors wrote in the background that traditional histological preparations used for light microscopy examination are being replaced by their digital counterparts, also known as whole-slide images. Computational pathology applies artificial intelligence to digitised whole-slide images to support the diagnosis, characterisation and understanding of disease. Given the incredible gains in performance of computer vision, a subfield of artificial intelligence focused on images, more recent studies attempt to unlock new insights from routine whole-slide images and reveal undiscovered outcomes such as prognosis and therapeutic response.
If successful, such efforts would enhance the utility of hematoxylin and eosin-stained whole-slide images and reduce reliance on specialised and often expensive immunohistochemistry or genomic testing. A major factor in the performance gains of computer vision models has been the creation of large-scale deep neural networks, termed foundation models. Foundation models generate data representations, called embeddings. The value of generalisation from large datasets is even greater for applications with inadequate quantities of data to develop bespoke models, as is the case for the detection of uncommon or rare tumour types, as well as for less common diagnostic tasks.
Virchow, a 632-million-parameter vision transformer model, is trained using the DINO v.2 algorithm, a multiview student–teacher self-supervised algorithm. DINO v.2 leverages global and local regions of tissue tiles to learn to produce embeddings of whole-slide image tiles, which can be aggregated across slides and used to train models for a variety of downstream predictive tasks.
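As a rough illustration of what aggregating tile embeddings can look like, the sketch below averages per-tile vectors into a single slide-level vector. It is a minimal sketch under stated assumptions: mean pooling is a simple stand-in chosen for illustration and is not described here as the aggregation method used in the study, and the function name `mean_pool` and the toy 4-dimensional values are hypothetical.

```python
from statistics import fmean


def mean_pool(tile_embeddings):
    """Average equal-length tile embeddings into one slide-level vector.

    Assumption: mean pooling is an illustrative stand-in, not the
    aggregation method reported for Virchow.
    """
    if not tile_embeddings:
        raise ValueError("need at least one tile embedding")
    dim = len(tile_embeddings[0])
    if any(len(e) != dim for e in tile_embeddings):
        raise ValueError("all tile embeddings must have the same length")
    # Average each coordinate across all tiles of the slide.
    return [fmean(e[i] for e in tile_embeddings) for i in range(dim)]


# Toy example: three tiles with made-up 4-dimensional embeddings.
tiles = [
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 1.0, 0.0, 3.0],
    [4.0, 1.0, 1.0, 3.0],
]
slide_embedding = mean_pool(tiles)
print(slide_embedding)  # [2.0, 1.0, 1.0, 3.0]
```

A slide-level vector of this kind could then serve as the input to a small classifier trained for a downstream task, while the foundation model itself stays unchanged.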
In addition to the evaluation of biomarker prediction and cell identification, the study team demonstrated that this large foundation model enables pan-cancer detection, achieving a specimen-level area under the receiver operating characteristic curve (AUC) of 0.95 across nine common and seven rare cancers. Furthermore, they showed that, with less training data, the pan-cancer detector built on Virchow can achieve similar performance to tissue-specific clinical-grade models in production and outperform them on some rare variants of cancer.
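For readers unfamiliar with the metric, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen cancer specimen receives a higher score than a randomly chosen benign one. The sketch below computes it by direct pairwise comparison. It is illustrative only: the function name `roc_auc`, the labels and the scores are made up, and nothing here reproduces the study's evaluation pipeline.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical specimen-level labels (1 = cancer, 0 = benign) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889 (8 of 9 pairs ranked correctly)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cancer from benign specimens.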
To provide evidence for potential focus areas for future advances in computational pathology, qualitative analysis was also performed, characterising the error patterns where the artificial intelligence model fails to identify or falsely identifies cancerous cells. Motivated by the goal of simplifying clinical workflows, the study team also evaluated the use of Virchow embeddings to train biomarker prediction models, which generally outperformed other models.
There are a few areas in which the study investigators anticipate particularly high-value impact. In clinical practice, where most biopsy samples are benign, a pan-cancer detection system can prioritise cases to help reduce diagnostic turnaround. With decreasing training data requirements, clinical-grade products for less common cancers could be developed. Biomarker prediction using routine hematoxylin and eosin whole-slide images would increase screening rates, reduce intrusive, tissue-destructive testing, and rapidly provide the data needed to make more informed treatment decisions.
Virchow unlocks the ability to accurately and precisely detect unusual histological variants of cancer as well as biomarker status, something that is difficult to achieve with cancer- or biomarker-specific training due to the limited amount of associated training data.
Virchow marks a major increase in training data scale to 1.5 million whole-slide images, a volume of data that is over 3,000 times the size of ImageNet as measured by the total number of pixels. This large scale of data in turn motivates large models that can capture the diversity of image features in whole-slide images.
In this work, the Paige team has demonstrated that this approach can form the foundation for clinical-grade models in cancer pathology. In particular, Virchow’s performance gains highlight the value of a foundation model and open possibilities for many high-impact applications with limited amounts of labelled training data.
Reference
Vorontsov E, Bozkurt A, Casson A, et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nature Medicine; Published online 22 July 2024. DOI: https://doi.org/10.1038/s41591-024-03141-0