Breakthrough in medical research published in Nature Medicine: Federated learning used to train deep learning models on multiple hospitals' histopathology data

Duration: 10 mins

Tags: AI / ML / FL

Date: January 19th, 2023



Source: Images of histology slides, a breast cancer patient, hospitals and software created by Sarah Ulstrup with the assistance of generative AI (DALL·E 2)

Today, Nature Medicine published groundbreaking Owkin research demonstrating the first-ever use of federated learning to train deep learning models on multiple hospitals’ histopathology data. At Owkin, we are on a journey to revolutionize medical research. Our mission is to find the right treatment for every patient by harnessing the power of artificial intelligence. But access to large amounts of medical data is a major challenge. That's why we've spent the last six years developing and testing Substra, our open source federated learning software, to safely break down data silos and unlock the possibilities of collaborative research.

But what is federated learning?

It's a machine learning technique that allows models to be trained across multiple distributed servers without sharing sensitive data. The data stays local and only the algorithms travel: each site trains the model on its own data and shares only model updates, which are then combined into a single shared model. This makes it possible to use larger, multicentric datasets in AI-powered research and to reduce the biases of single-center studies. The result could be more accurate and reliable conclusions and, ultimately, major breakthroughs in precision medicine.
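To make the idea of keeping data local more concrete, here is a minimal sketch of federated averaging (FedAvg), one common federated learning scheme, applied to a toy linear model. It is an illustration under stated assumptions only: it is not Substra's API and not necessarily the aggregation used in the study. The `Hospital` class, `simulate_site`, the synthetic data (drawn from y = 3x + 1 plus noise) and all hyperparameters are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Hospital:
    """Hypothetical data holder: its (x, y) records never leave this object."""
    name: str
    data: list  # list of (x, y) pairs, kept local to this site

    def local_update(self, w, b, lr=0.1, epochs=5):
        """Start from the global weights, run a few epochs of gradient descent
        on local data (mean squared error), and return only the updated
        weights plus the local sample count. No raw records are returned."""
        n = len(self.data)
        for _ in range(epochs):
            grad_w = sum(2 * (w * x + b - y) * x for x, y in self.data) / n
            grad_b = sum(2 * (w * x + b - y) for x, y in self.data) / n
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b, n


def federated_averaging(hospitals, rounds=200):
    """FedAvg sketch: the server sends the current weights to every site,
    each site trains locally, and the server averages the returned weights,
    weighted by each site's number of samples."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [h.local_update(w, b) for h in hospitals]
        total = sum(n for _, _, n in updates)
        w = sum(wi * n for wi, _, n in updates) / total
        b = sum(bi * n for _, bi, n in updates) / total
    return w, b


def simulate_site(name, x_low, x_high, n, rng):
    """Hypothetical local records drawn from y = 3x + 1 plus small noise.
    Different x ranges per site mimic data heterogeneity across centers."""
    data = []
    for _ in range(n):
        x = rng.uniform(x_low, x_high)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.05)))
    return Hospital(name, data)


if __name__ == "__main__":
    rng = random.Random(0)
    sites = [
        simulate_site("hospital_a", 0.0, 1.0, 50, rng),
        simulate_site("hospital_b", 1.0, 2.0, 80, rng),
    ]
    w, b = federated_averaging(sites)
    # The learned weights should land close to the true values 3 and 1,
    # even though the two sites' records were never pooled.
    print(f"w={w:.2f}, b={b:.2f}")
```

Each site sees only its own slice of the input range, yet the averaged model recovers a relationship that spans both. This is a small-scale analogue of why multicentric training can generalize better than a single-center model. Real deployments add secure communication, access control and far larger models.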

Why does it matter?

Until now, there have been very few federated learning projects using real-world medical data, due to technical challenges and privacy concerns. Most studies have only simulated federated learning by artificially splitting one dataset into parts, which ignores the inherent heterogeneity of medical data collected at different centers. In contrast, we conducted a proof-of-concept study in which federated learning was applied to real-world datasets from two French hospitals. Using federated AI models, we were able to predict the response of triple-negative breast cancer (TNBC) patients to neoadjuvant chemotherapy. These predictions, made from histology slides, match the performance of methods based on structured clinical data on this distributed cohort.

This research prioritizes patient privacy by keeping sensitive medical data within each hospital's secure network. We hope it will inspire other medical institutions to join federated learning networks that advance research while protecting patient data.

Watch the video below to learn more.

HealthChain explainer

Jean du Terrail, lead author and Senior Machine Learning Scientist at Owkin, said:

Thanks to our partners, we are proud to have performed an original federated analysis on medical data in real-life conditions, and the first of its kind on histopathology data. By connecting institutions in a federated manner we were able to reach the critical mass of triple-negative breast cancer data necessary for the AI to discover, on its own, histological patterns predictive of the response to treatment.

We hope that this proof of concept will inspire medical institutions to collaborate in federated learning networks in order to move research forward while keeping patient data private.

So how did we get here?

Getting to this point wasn't easy. We had to build a consortium of medical centers (called HealthChain) with curated datasets, navigate complex project management challenges, and overcome technical hurdles. The consortium included seven public partners and two private partners, and the project required the expertise of more than 25 people at Owkin, as well as teams at Centre Léon Bérard (CLB) and Institut Curie. We learned valuable lessons and made numerous discoveries along the way.

Through HealthChain, a public-private consortium with a €10M budget funded by Banque Publique d'Investissement, we set out to develop the federated learning framework (Substra) and train predictive models in breast cancer.

Why did we choose breast cancer?

To prove that federated learning could provide new insights into unanswered problems, we deliberately chose a challenging task for which there was little evidence that deep learning methods would be effective at all.

TNBC is relatively rare, making up only 15% of all breast cancers, or around 9,000 new cases per year in France. This low prevalence makes it difficult to access large amounts of TNBC data. In fact, many published studies on TNBC have small sample sizes, with a median of just 119 patients. Because of this lack of data, only one study had looked at using machine learning to predict the response to neoadjuvant chemotherapy (NACT) in TNBC from whole slide images (WSIs). That study, by P. Naylor et al., found that WSIs contain useful information for predicting NACT response in TNBC, but it did not include external validation and was based on a small dataset of 122 WSIs from a single center. The potential of machine learning to improve prognosis and identify new predictive biomarkers in early TNBC had not yet been fully realized.

How did we realize the potential of federated learning?

Through federated learning, the HealthChain consortium assembled one of the largest TNBC cohorts of its kind. The cohort consisted of digital pathology data and clinical information from more than 650 patients from Institut Curie in Paris, Centre Léon Bérard in Lyon, Institut Gustave Roussy in Villejuif and IUCT Oncopole in Toulouse.

From this dataset, we built federated AI models that predict how TNBC patients will respond to neoadjuvant chemotherapy, directly from their diagnostic biopsies. These predictions match the performance of methods based on structured clinical data on this distributed cohort. By using interpretable AI to extract information from digital pathology slides, we were also able to generate new hypotheses about potential biomarkers of response.

So what next?

From a clinical perspective, the hope is that this research will eventually help steer patients towards either less toxic treatments or new experimental treatments, making medical care more personalized.

This study is a landmark proof point for federated learning in medical research and a breakthrough in the practical application of AI. Federated learning has the potential to usher in a new era of AI-powered collaborative research that tackles the biggest medical challenges by breaking down silos and securely training models on larger, more diverse datasets.