What is Federated Learning?
Federated learning is a decentralized machine learning technique for training models across multiple data providers. Instead of gathering data on a single server, the data remains locked on each provider's own servers; only the algorithms and the predictive models travel between servers, never the data. The goal of this approach is for each participant to benefit from a larger pool of data than their own, resulting in increased ML performance, while respecting data ownership and privacy.
The ability to train machine learning models at scale across multiple medical institutions without pooling data is a critical technology for addressing patient privacy and data protection. A successful implementation of federated learning in healthcare could hold significant potential for enabling precision medicine at scale, helping match the right treatment to the right patient at the right time.
Why Federated Learning solves the main challenges of Machine Learning in Healthcare
As technology advances, machine learning algorithms have become further integrated into every aspect of our lives. Machine learning is reshaping many industries, including healthcare, where it shows promise in accelerating medical research. Predictive modelling and other machine learning techniques are now used regularly across the research spectrum to generate medical insights quickly and accurately. Examples range from cancer biomarker identification to patient screening and genetic prediction from imaging. These applications not only expand researchers’ abilities to make discoveries but also help address time and cost obstacles across the healthcare industry.
There is a major hurdle preventing the consistent deployment of AI in healthcare at an impactful scale. Machine learning approaches are “data-hungry”. Algorithms need access to large and diverse datasets to train, improve their accuracy, and eliminate bias. The traditional infrastructure of modern healthcare systems makes it difficult to organize vast quantities of medical data in a way that machine learning can make the most of.
Our primary mission is to balance today’s standard approach of centralizing data from multiple centres with critical concerns regarding patient privacy and data protection. Software that handles personal data is bound by strict privacy laws. Healthcare systems must protect personal data at all times, and current standard practices, such as anonymization, can require removing data that may be critical for medical discoveries. The data requirements for machine learning in healthcare leave us with a central challenge: How can we access the volume of data needed to transform healthcare with AI at scale while respecting patient privacy and the confidentiality of sensitive health data?
To access more real-world data and larger, more diverse datasets for training AI algorithms, healthcare stakeholders (hospitals, research centres, life science companies) need to start collaborating. But how can they do so in a privacy-preserving way?
Federated Learning powers the next generation of AI in healthcare
Federated learning technology also creates endless possibilities for data scientists and researchers to work on emerging research questions and improve their models, trained across many diverse and representative datasets. Models that are more accurate in their predictions also reduce healthcare costs for providers and insurers, which are under increasing pressure to provide value-based care with better outcomes.
To find out more about federated learning in healthcare, we recommend reading the npj Digital Medicine (September 2020) paper titled “The Future of Digital Health with Federated Learning”, in which the authors explore how federated learning may provide a solution for the future of digital health, and highlight the challenges and considerations that need to be addressed.
Image Source: Nature Article: “The Future of Digital Health with Federated Learning”
How does it work?
Federated learning software collaboratively trains machine learning models in a distributed manner without exchanging the underlying data. We dispatch algorithms to different data centres, where they train locally. Only what the algorithm learns at these centres (the model updates) returns to a central location, where the updates are combined into an improved model. The improved model is then sent back to the local datasets for further training, and the cycle repeats.
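To make the training loop above concrete, here is a minimal sketch using federated averaging, one common way to combine local updates. It is an illustration under simplifying assumptions, not a description of how any particular product works: the model is a one-variable linear regression, the three "sites" are synthetic, and every function name (`local_train`, `federated_average`, `run_federated_training`, `make_site`) is hypothetical.

```python
import random

def local_train(weights, data, lr=0.1, epochs=20):
    """Runs at one site: a few gradient-descent steps for y ~ w*x + b on local data only."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)  # only the learned parameters leave the site

def federated_average(site_weights, site_sizes):
    """Runs centrally: average the site models, weighted by local dataset size."""
    total = sum(site_sizes)
    w = sum(sw[0] * n for sw, n in zip(site_weights, site_sizes)) / total
    b = sum(sw[1] * n for sw, n in zip(site_weights, site_sizes)) / total
    return (w, b)

def run_federated_training(sites, rounds=30):
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each site starts from the current global model and trains locally.
        updates = [local_train(global_weights, data) for data in sites]
        # The central location sees parameters and dataset sizes, never raw records.
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights

def make_site(n, seed):
    """Synthetic stand-in for one site's private data: y = 3x + 1 plus small noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.05)) for x in xs]

sites = [make_site(40, 1), make_site(60, 2), make_site(100, 3)]
w, b = run_federated_training(sites)
print(f"learned w={w:.2f}, b={b:.2f}")
```

In this toy setting the shared model recovers a slope and intercept close to the values used to generate the data, even though no site ever shares its records.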
Federated Learning opens an unprecedented breadth of collaboration in healthcare
Let’s take an example applied to healthcare. A data scientist from a large pharmaceutical company needs to test whether the machine learning model trained on her in-house data works in the real world. After some negotiations on the contract, she can use federated learning to send the model to data distributed across 5 different hospitals and receive the model’s performance results. She can then use the combined power of all 5 datasets plus her in-house data to train a more accurate and robust model. The model remains secure even though it travels to the hospitals, as they cannot access it. In addition, one can use privacy-preserving techniques to prevent leakages between the hospitals and the pharma datasets, in both directions. For healthcare applications, this allows predictive models to learn from an unprecedented amount of highly curated data, resulting in better identification of high-value subgroups of patients for clinical trials, treatment-response prediction, or biomarker selection, while respecting data ownership and privacy.
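The first step of this example, sending a model to several hospitals and receiving only its performance results, can be sketched as follows. This is a simplified illustration with made-up data: the threshold model, the `biomarker` field, and the function names are all hypothetical, and a real deployment would also protect the model itself, which this sketch does not attempt.

```python
def evaluate_locally(model, records):
    """Runs inside a hospital: returns aggregate counts only, never the records."""
    correct = sum(1 for features, label in records if model(features) == label)
    return {"n": len(records), "correct": correct}

def pooled_accuracy(reports):
    """Runs at the data scientist's side, using only the per-hospital summaries."""
    total = sum(r["n"] for r in reports)
    return sum(r["correct"] for r in reports) / total

def threshold_model(features):
    # Hypothetical in-house model: predict 1 when the biomarker exceeds 0.5.
    return 1 if features["biomarker"] > 0.5 else 0

# Synthetic stand-ins for the private datasets held by 5 hospitals.
hospitals = [
    [({"biomarker": 0.9}, 1), ({"biomarker": 0.2}, 0)],
    [({"biomarker": 0.7}, 1), ({"biomarker": 0.6}, 0), ({"biomarker": 0.1}, 0)],
    [({"biomarker": 0.4}, 1), ({"biomarker": 0.8}, 1)],
    [({"biomarker": 0.3}, 0), ({"biomarker": 0.95}, 1), ({"biomarker": 0.55}, 1)],
    [({"biomarker": 0.45}, 0), ({"biomarker": 0.65}, 1)],
]

reports = [evaluate_locally(threshold_model, records) for records in hospitals]
accuracy = pooled_accuracy(reports)
print(f"accuracy across {len(reports)} hospitals: {accuracy:.3f}")
```

The data scientist learns how the model performs at each site and overall, while the patient-level records stay where they are.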
Collaborative Federated Learning applications
We can also apply this technique with different private companies: this is what we call “collaborative federated learning” or “coopetition”. An example of this is the MELLODDY project, where ten pharmaceutical companies collaborate to train machine learning models for drug discovery based on private and highly sensitive high-content screening datasets. Owkin Connect capabilities help build trust, as privacy and security are at the core of the consortium. All of our pharma partners and external security companies audit our platform yearly, and the following safeguards apply:
- Sensitive data and assay-specific models remain securely locked on each pharma’s server
- Lower-level model components are securely exchanged and trained over the network with secure aggregation
- Complex but transparent pre-agreed access arrangements are strictly enforced through distributed ledger technology
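To give an intuition for the secure aggregation mentioned in the list above, here is a minimal sketch of one well-known idea, pairwise additive masking. It is not a description of the protocol used in MELLODDY or Owkin Connect. It assumes updates are encoded as integers, uses a seeded random generator as a stand-in for real pairwise key agreement, and ignores participants dropping out; all names are hypothetical.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value

def pairwise_masks(num_parties, length, seed=0):
    """Build masks so that masks[i][j] + masks[j][i] == 0 (mod MODULUS)."""
    rng = random.Random(seed)  # stand-in for secrets shared between each pair
    masks = [[[0] * length for _ in range(num_parties)] for _ in range(num_parties)]
    for i in range(num_parties):
        for j in range(i + 1, num_parties):
            shared = [rng.randrange(MODULUS) for _ in range(length)]
            masks[i][j] = shared
            masks[j][i] = [(-s) % MODULUS for s in shared]
    return masks

def mask_update(update, party, masks):
    """Each party adds its pairwise masks before sending anything out."""
    masked = list(update)
    for other, mask in enumerate(masks[party]):
        if other != party:
            masked = [(m + s) % MODULUS for m, s in zip(masked, mask)]
    return masked

def aggregate(masked_updates):
    """The aggregator sums masked vectors; the pairwise masks cancel out."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]

updates = [[3, 10, 7], [5, 1, 2], [4, 4, 4]]  # integer-encoded model updates
masks = pairwise_masks(len(updates), len(updates[0]))
masked = [mask_update(u, i, masks) for i, u in enumerate(updates)]
total = aggregate(masked)
print(total)
```

Each masked vector looks random on its own, yet the sum equals the sum of the original updates, so the aggregator can combine contributions without seeing any single participant's update.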
According to recent studies, federated learning models can achieve performance levels comparable to those of models trained on centrally hosted datasets, and even superior to those of models that only see isolated single-institution data.
Federated Learning to Accelerate & Transform Medical Research
Owkin Connect is our federated learning software that powers collaborations between hospitals, research centers, technology partners and life science companies by connecting datasets without compromising privacy or security. Owkin Connect allows companies to extract insights on decentralized data to solve the data sharing challenge in healthcare. With additional layers of privacy & encryption in place, the data’s confidentiality cannot be breached. This framework complies with GDPR and other data privacy regulations.
Real World applications of Federated Learning in Healthcare
Our federated learning software, Owkin Connect, is designed for learning from distributed datasets among consortium partners. It has already been used in a number of collaborative medical research projects.
Working with Owkin and adapting to your consortium needs
Owkin can support your consortium in one of two capacities, based on your research topic and the needs of your project.