Blog
September 5, 2024
10 mins

Blog 3: Building AI Models from Whole Slide Images

While data is an essential component in developing an AI diagnostic [see blog 2], its power is only realized with an algorithm that analyzes and learns from the data.

An AI diagnostic model consists of a set of instructions that process a whole slide image (WSI) and make a prediction, such as screening for microsatellite instability (MSI). To create this model, it must learn from examples, use different layers of understanding (like a pathologist zooming in and out), and make mistakes before learning to improve – not unlike a human studying a new task.

But enough of this abstract talk of models. This blog will illustrate how multiple models come together in building an AI diagnostic like Owkin’s MSIntuit CRC.

Figure 1: Tiling. Machine learning divides the WSI into thousands of small tiles for analysis

Learning tile features

The first step in building a model is to learn how to convert image tiles into a numerical format that captures important characteristics of the tissue. A pathologist looks at the overall architecture of the tissue and may then zoom in on a few areas at high power to assess the size and shape of cells and whether tumor or inflammatory cells are present. A machine learning model identifies more abstract features: some may be similar to what a pathologist identifies, and others will be quite different.

Computational features are a set of numbers that describe the appearance of an image tile, e.g., color shade or contrast. These features are learned by comparing image tiles to identify their similarities and differences. At each step of learning, the algorithm takes a set of tiles and randomly transforms each in two different ways. These transformations can be things like rotating the tile, blurring it slightly, cropping it in a different way, or shifting the colors. Using these two transformed versions of each tile, the model must learn which pairs come from the same original tile and which come from different tiles. No manual labels are needed for this task, so this is a form of self-supervised learning.
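
To make this concrete, here is a minimal sketch of such a contrastive setup in PyTorch, in the spirit of methods like SimCLR. The specific augmentations, encoder, and loss that Owkin used are not described in this blog, so everything below is illustrative rather than MSIntuit CRC's actual training code.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms

# Two random transformations ("views") of each tile, as described above.
# These particular augmentations are illustrative choices.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
    transforms.RandomRotation(degrees=90),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, hue=0.1),
    transforms.GaussianBlur(kernel_size=5),
    transforms.ToTensor(),
])

encoder = models.resnet50(weights=None)
encoder.fc = torch.nn.Identity()  # output a 2048-dim feature vector per tile

def contrastive_loss(z1, z2, temperature=0.1):
    """NT-Xent loss: views of the same tile should be similar, others dissimilar."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, d) unit vectors
    sim = z @ z.t() / temperature                 # pairwise similarities
    sim.fill_diagonal_(float("-inf"))             # ignore self-similarity
    n = z1.size(0)
    # For tile i, the "correct answer" is its other view at index i + n (or i - n).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# One learning step over a batch of PIL image tiles:
# v1 = torch.stack([augment(t) for t in tiles])
# v2 = torch.stack([augment(t) for t in tiles])
# loss = contrastive_loss(encoder(v1), encoder(v2))
```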

Figure 2: The model learns which images are transformations of the same image and which are transformations of different images

With this contrastive approach, the model learns how to compute a vector of features for each image tile. Any further operations can be done using the feature vectors instead of the original image tiles. The tile feature model for MSIntuit CRC was trained on The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) data collection and computes a set of 2048 features for each image tile.
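
In practice, the trained encoder is simply run over every tile of a slide to build one feature matrix per slide. A sketch, assuming the `encoder` from the previous snippet and a standard PyTorch DataLoader of preprocessed tiles:

```python
import torch

@torch.no_grad()
def slide_feature_matrix(encoder, tile_loader, device="cpu"):
    """Run the trained encoder over every tile of one slide.

    Returns an (n_tiles, 2048) matrix; downstream models operate on this
    matrix instead of the original pixels.
    """
    encoder.eval().to(device)
    feats = [encoder(batch.to(device)).cpu() for batch in tile_loader]
    return torch.cat(feats)  # (n_tiles, 2048)
```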

Learning from slides

Once a model is trained to compute feature vectors for image tiles, these representations can be used to process slides. Predictions from slides can be made in various ways, but the fundamental idea is that the tile features must be combined to make a prediction for the entire slide. This means we now need slide-level labels to train a supervised model.

Continuing with the example of MSIntuit CRC, each image tile is first assigned its slide label (MSS or MSI). The machine learning model must learn to distinguish MSS from MSI based on the tile features. The model starts by making random predictions: a score between zero and one, with zero being most likely MSS and one being most likely MSI. Then it makes small improvements based on the predictions it gets wrong. Over millions of these small improvements, the model learns to make much better predictions. Once it is no longer improving, the learning phase is complete.
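
A sketch of this tile-level training loop, assuming the (n_tiles, 2048) feature matrices from earlier. The classifier architecture and optimizer settings are hypothetical, not Owkin's actual configuration:

```python
import torch
from torch import nn

# A small classifier over the 2048 tile features (illustrative architecture).
tile_classifier = nn.Sequential(
    nn.Linear(2048, 128),
    nn.ReLU(),
    nn.Linear(128, 1),
    nn.Sigmoid(),  # score between 0 (most likely MSS) and 1 (most likely MSI)
)
optimizer = torch.optim.Adam(tile_classifier.parameters(), lr=1e-4)
loss_fn = nn.BCELoss()

def training_step(features, slide_label):
    """One small improvement based on the predictions the model gets wrong.

    `features` is an (n_tiles, 2048) matrix from one slide; every tile
    inherits the slide's label (0 = MSS, 1 = MSI).
    """
    scores = tile_classifier(features).squeeze(1)          # (n_tiles,)
    targets = torch.full_like(scores, float(slide_label))  # tile-level labels
    loss = loss_fn(scores, targets)
    optimizer.zero_grad()
    loss.backward()   # how wrong was each prediction, and in which direction?
    optimizer.step()  # the "small improvement"
    return loss.item()
```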

These MSI scores are still at the tile level. Next, the top ten and the bottom ten scores are used in a final model to predict MSS versus MSI for the slide. This score is once again a value between zero and one.
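
This "extreme scores" aggregation can be sketched in a few lines. Using a single linear layer as the final model is an assumption for illustration; the actual final model may differ:

```python
import torch
from torch import nn

def extreme_tile_scores(tile_scores, k=10):
    """Keep only the k highest and k lowest tile scores for one slide."""
    sorted_scores, _ = torch.sort(tile_scores, descending=True)
    return torch.cat([sorted_scores[:k], sorted_scores[-k:]])  # (2k,)

# Hypothetical final model: maps the 20 extreme tile scores to one slide score.
slide_model = nn.Sequential(nn.Linear(20, 1), nn.Sigmoid())

tile_scores = torch.rand(3000)                    # e.g. 3,000 tiles on a slide
slide_score = slide_model(extreme_tile_scores(tile_scores))
print(float(slide_score))                         # between 0 (MSS) and 1 (MSI)
```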

Figure 3: Tile scoring from whole slide images

Other diagnostics or even future versions of MSIntuit CRC may combine tile features in other ways. Some even incorporate other information in addition to WSIs. For example, Owkin’s RlapsRisk BC diagnostic adds clinical features like age, tumor size, and number of invaded lymph nodes.

Whatever the combination of tile features or other data, two things are essential: the model must predict correctly most of the time, and it must be interpretable, so that engineers and pathologists can understand how it arrived at its final prediction.

Model interpretability: What types of features does it learn?

For models built from WSIs, there are two main avenues for interpretability: highlighting which tiles contributed to a prediction and revealing common characteristics that the model associates with each class.

The first is accomplished with a heatmap that highlights which tiles are associated with each class. In the example below, tiles associated with MSI are colored red, and tiles associated with MSS are colored blue. Going a step further, the tiles most strongly associated with MSI are shown underneath the heatmap. After reviewing MSI-associated tiles across numerous slides, Owkin pathologists were able to confirm known features of MSI like tumor lymphocyte infiltration and mucinous differentiation, but also lesser-known features like the presence of dirty necrosis and neutrophilic infiltration (Saillard, 2021). These are just a few of the patterns that the model has learned to associate with MSI.
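
Rendering such a heatmap is conceptually simple once each tile's score and grid position are known. A sketch using matplotlib, with hypothetical inputs `tile_scores` and `tile_xy`:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_msi_heatmap(tile_scores, tile_xy, grid_shape):
    """Paint each tile's MSI score back onto the slide's tiling grid.

    `tile_xy` holds each tile's (row, col) position in the grid; a blue-red
    colormap shows MSS-associated tiles in blue and MSI-associated in red.
    """
    heatmap = np.full(grid_shape, np.nan)  # NaN = background / no tissue
    for score, (r, c) in zip(tile_scores, tile_xy):
        heatmap[r, c] = score
    plt.imshow(heatmap, cmap="coolwarm", vmin=0.0, vmax=1.0)
    plt.colorbar(label="tile score (0 = MSS, 1 = MSI)")
    plt.show()
```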

Figure 4: Owkin models are interpretable, allowing pathologists to understand and validate the results

Measuring performance

You may have noticed that all the scores mentioned so far are on a scale from zero to one, with one being most strongly associated with MSI. To make a final prediction of MSS or MSI, a threshold must be selected within this range. All WSIs scoring higher than the threshold are predicted to be MSI, and all scoring lower are considered MSS. Selecting this threshold will be discussed in the next blog.

From these binary predictions, we can calculate the model's sensitivity and specificity. This is always done on a separate set of patient samples from those used to train the model, to assess how well it performs on previously unseen data.
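
Both metrics follow directly from the binarized predictions. A minimal sketch, with 1 denoting MSI and 0 denoting MSS, computed on held-out slides only:

```python
import numpy as np

def sensitivity_specificity(scores, labels, threshold):
    """Binarize slide scores at `threshold`, then compare with ground truth."""
    preds = np.asarray(scores) >= threshold  # True = predicted MSI
    labels = np.asarray(labels).astype(bool)
    sensitivity = (preds & labels).sum() / labels.sum()       # MSI cases found
    specificity = (~preds & ~labels).sum() / (~labels).sum()  # MSS cases cleared
    return sensitivity, specificity

# Illustrative numbers on a tiny held-out test set:
sens, spec = sensitivity_specificity([0.9, 0.2, 0.7, 0.4], [1, 0, 1, 0], 0.5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```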

The iterative development process

Developing a machine learning model is an iterative process. There are many different knobs to tweak, both with the data and the model, that can influence performance. Perhaps not all tissue processing artifacts, like tissue folds, were excluded in the beginning, and the quality control process needs to be refined. Or a subsequent version of the model performed well on the training set but not nearly as well on the test set. By identifying and understanding each challenge that arises, engineers can tweak the algorithm and run another experiment. Only after many iterations and many experiments does a model start to perform well. But is it good enough? That’s the topic of our next blog: how to validate that a model is ready for use in a diagnostic tool.

Authors
Heather Couture

"Our collaboration at Owkin has been instrumental in this process. This collaborative approach allows us to advance the field of pathology through AI, and I am excited to see the continued progress we make together."
Katharina Von Loga
Head of Pathology at Owkin