What is Federated Learning?
Federated learning is a decentralized machine learning approach for training models on data held by multiple data providers. Instead of gathering data on a single server, each provider's data stays locked on its own servers, and only the algorithms and predictive models travel between servers, never the data. The goal is for each participant to benefit from a larger pool of data than its own, improving model performance while respecting data ownership and privacy.
Training machine learning models at scale across multiple medical institutions without pooling their data is key to addressing patient privacy and data protection concerns. A successful implementation of federated learning could enable precision medicine at a large scale, helping match the right treatment to the right patient at the right time.
Why Federated Learning solves the main challenges of Machine Learning in Healthcare
As technology advances, machine learning algorithms are becoming integrated into every aspect of our lives. Machine learning is transforming many industries, and in healthcare it shows promise for accelerating medical research. Predictive modeling and other machine learning techniques are now used across the research spectrum to generate medical insights quickly and accurately, from cancer biomarker identification to patient screening and genetic prediction from imaging. These applications not only expand researchers’ ability to make discoveries but also help address time and cost obstacles across the healthcare industry.
However, a major hurdle prevents the consistent deployment of AI in healthcare at an impactful scale: machine learning approaches are “data-hungry.” Algorithms need access to large and diverse datasets to train, improve their accuracy, and reduce bias. Yet the traditional infrastructure of modern healthcare systems makes it difficult to structure the vast quantities of medical data into a form that machine learning can make the most of.
Today’s standard approach of centralizing data from multiple centers must be balanced against critical concerns about patient privacy and data protection. Software that handles personal data is bound by strict privacy laws; healthcare systems must protect personal data at all times, and current standard practices such as anonymization can require removing data that may be critical for medical discoveries. The data requirements of machine learning in healthcare therefore leave us with a central challenge: how can we access the volume of data needed to transform healthcare with AI at scale while respecting patient privacy and the confidentiality of sensitive health data?
To access more real-world data and larger, more diverse datasets for training AI algorithms, healthcare stakeholders (hospitals, research centers, life science companies) need to collaborate with each other. But how can they do so while preserving privacy?
Federated Learning powers the next generation of AI in healthcare
Federated learning also opens up many new possibilities for data scientists and researchers to tackle emerging research questions and to improve their models by training them across many diverse and representative datasets. More accurate predictive models can also reduce healthcare costs for providers and insurers, which are under increasing pressure to deliver value-based care with better outcomes.
To find out more about federated learning in healthcare, we recommend the npj Digital Medicine paper (September 2020) titled “The Future of Digital Health with Federated Learning”, in which the authors explore how federated learning may provide a solution for the future of digital health and highlight the challenges and considerations that still need to be addressed.
Image Source: Nature Article: “The Future of Digital Health with Federated Learning”
How does it work?
Federated learning trains machine learning models collaboratively in a distributed manner, without exchanging the underlying data. The algorithm is dispatched to the different data centers, where it trains locally. Only what the algorithm learns at these centers (for example, updated model parameters) returns to a central location, where the contributions are aggregated into a new, improved model. That model is then sent back to the local datasets to be re-trained and improved further, and the cycle repeats.
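The round-based loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Owkin Connect's implementation: the model is a toy linear regression, each site runs plain gradient descent, and the central server combines the site models with sample-weighted averaging (the well-known FedAvg scheme). The names `local_train`, `federated_average`, `federated_training` and `make_site`, and the synthetic data, are hypothetical.

```python
import random

# Toy setting (assumption): each site, e.g. a hospital, holds private (x, y)
# pairs that never leave it. The shared model is a line y ≈ w * x + b,
# stored as the tuple (w, b).

def local_train(model, data, lr=0.1, epochs=5):
    """Runs at a site: a few epochs of full-batch gradient descent on the
    mean squared error over local data. Only (w, b) is returned."""
    w, b = model
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Runs centrally: average the site models, weighted by sample count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def federated_training(sites, rounds=50):
    model = (0.0, 0.0)  # initial global model
    sizes = [len(data) for data in sites]
    for _ in range(rounds):
        # 1. Dispatch the current model; each site trains it on its own data.
        updates = [local_train(model, data) for data in sites]
        # 2. Only the learned parameters come back; the server aggregates them
        #    into an improved model that is sent out again in the next round.
        model = federated_average(updates, sizes)
    return model

def make_site(rng, n):
    """Synthetic stand-in for one site's private data: y = 3x + 1 plus noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs]

if __name__ == "__main__":
    rng = random.Random(0)
    sites = [make_site(rng, n) for n in (40, 25, 60)]
    w, b = federated_training(sites)
    print(f"w = {w:.2f}, b = {b:.2f}")  # should land close to w = 3, b = 1
```

Only the pair `(w, b)` crosses a site boundary in each round; the raw `(x, y)` records stay inside `local_train`. Weighting by sample count keeps a small site from pulling the shared model as hard as a large one.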
Federated Learning opens an unprecedented breadth of collaboration in healthcare
Let’s take an example from healthcare. A data scientist at a large pharmaceutical company needs to test whether the machine learning model trained on her in-house data works in the real world. After contract negotiations, whose terms are enforced technically through permissions, she can use federated learning to send the model to data distributed across 5 different hospitals and receive results showing how the model performed. She can then use the combined power of all 5 datasets plus her in-house data to train a more accurate and robust model. The model remains secure even though it travels to the hospitals, because they cannot access it. In addition, privacy-preserving techniques can be used to prevent leakage between the hospital and pharma datasets in both directions. For healthcare applications, this allows predictive models to learn from an unprecedented amount of highly curated data, resulting in better identification of high-value patient subgroups for clinical trials, better prediction of response to treatment, and better biomarker selection, all while respecting data ownership and privacy.
We can also apply this technique across different private companies: this is what we call “collaborative federated learning” or “coopetition”. An example is the MELLODDY project, in which ten pharmaceutical companies collaborate to train machine learning models for drug discovery on private and highly sensitive high-content screening datasets. Owkin Connect’s capabilities help build trust, as privacy and security are at the core of the consortium. Our platform is audited yearly by all pharma partners and by external security companies:
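The first step of this scenario, checking how an existing model performs at each hospital, can be illustrated with a small federated evaluation sketch. It assumes a simple setup invented for illustration: each hospital scores the received model on its own records and returns only counts, and the sponsor combines those counts into per-site and pooled accuracy. `SiteReport`, `evaluate_locally`, `pooled_accuracy`, `threshold_model` and the toy records are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SiteReport:
    """Aggregate result a hospital sends back: counts only, no patient records."""
    n: int
    correct: int

def evaluate_locally(model, records):
    """Runs inside the hospital: score the received model on local (features, label) records."""
    correct = sum(1 for features, label in records if model(features) == label)
    return SiteReport(n=len(records), correct=correct)

def pooled_accuracy(reports):
    """Runs at the sponsor: combine per-site counts into one accuracy figure."""
    total = sum(r.n for r in reports)
    return sum(r.correct for r in reports) / total

def threshold_model(features):
    """Hypothetical toy classifier standing in for the pharma model."""
    return 1 if features[0] > 0.5 else 0

# Hypothetical local datasets; in practice these never leave each hospital.
hospitals = {
    "hospital_A": [((0.9,), 1), ((0.2,), 0), ((0.7,), 0), ((0.1,), 0)],
    "hospital_B": [((0.6,), 1), ((0.4,), 1)],
    "hospital_C": [((0.8,), 1), ((0.95,), 1), ((0.3,), 0)],
}

reports = {name: evaluate_locally(threshold_model, recs) for name, recs in hospitals.items()}
for name, r in reports.items():
    print(f"{name}: {r.correct}/{r.n} correct")
print(f"pooled accuracy: {pooled_accuracy(list(reports.values())):.3f}")  # 7/9 ≈ 0.778
```

Per-site figures show whether the model generalizes across hospitals, which a single pooled number can hide: here the toy model gets every record right at `hospital_C` but only half at `hospital_B`.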
- Sensitive data and assay-specific models remain securely locked on each pharma’s server
- Lower-level model components are securely exchanged and trained over the network with secure aggregation
- Complex but transparent pre-agreed access arrangements are strictly enforced through distributed ledger technology
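Secure aggregation lets a server learn the sum of the participants' updates without seeing any individual update. The sketch below shows one common idea, pairwise additive masking: every pair of parties shares a random mask that one party adds and the other subtracts, so all masks cancel in the sum. It is a generic illustration, not the protocol used in MELLODDY or Owkin Connect, and it makes simplifying assumptions: the pairwise seeds are handed out directly (a real protocol derives them from a key agreement), updates are already integers (real model weights would first be encoded as fixed-point numbers), and no party drops out mid-round. The function names are hypothetical.

```python
import random

MODULUS = 2**32  # arithmetic modulo 2**32 so that masked values look uniformly random

def pairwise_mask(seed, length):
    """Mask shared by one pair of parties. Assumption: the seed is given
    directly; a real protocol would derive it from a key agreement."""
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(length)]

def masked_update(party, update, pair_seeds):
    """What one party sends to the server: its update plus or minus each pairwise mask."""
    masked = list(update)
    for (i, j), seed in pair_seeds.items():
        if party not in (i, j):
            continue
        mask = pairwise_mask(seed, len(update))
        sign = 1 if party == i else -1  # lower-indexed party adds, the other subtracts
        masked = [(m + sign * r) % MODULUS for m, r in zip(masked, mask)]
    return masked

def aggregate(masked_updates):
    """Server side: summing the masked vectors cancels every pairwise mask."""
    length = len(masked_updates[0])
    return [sum(u[k] for u in masked_updates) % MODULUS for k in range(length)]

# Three parties with small integer updates and hypothetical pairwise seeds.
updates = {0: [1, 2, 3], 1: [10, 20, 30], 2: [100, 200, 300]}
pair_seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}
masked = [masked_update(p, u, pair_seeds) for p, u in updates.items()]
print(aggregate(masked))  # [111, 222, 333]
```

Each masked vector on its own looks like random numbers to the server, yet the sum is exact, which is what allows model components to be aggregated without exposing any single participant's contribution.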
According to recent studies, federated learning models can achieve performance comparable to that of models trained on centrally hosted datasets, and even superior to that of models that only see isolated single-institution data.
Federated Learning to Accelerate & Transform Medical Research
Owkin Connect is our federated learning software. It powers collaborations between hospitals, research centers, technology partners and life science companies by connecting datasets without compromising privacy or security, allowing companies to extract insights from decentralized data and so address the data-sharing challenge in healthcare. Additional privacy and encryption layers protect the confidentiality of the data, and the framework complies with GDPR and other data privacy regulations.
Real-world applications of Federated Learning in Healthcare
Our federated learning software, Owkin Connect, is the ideal tool for learning from datasets distributed among consortium partners. It is already used in many collaborative medical research projects.
Working with Owkin and adapting to your consortium needs
Owkin can support your consortium in one of two capacities, based on your research topic and the needs of your project.