In part 1 of this three-part series on federated learning in healthcare, we introduced this new machine learning paradigm and showed how it can ensure patient privacy and protect sensitive health data. Traditional machine learning requires training data to be centralised. In federated learning, only models and insights are aggregated from the participating partners, and the raw data stay with their owners. In this article, we explore how organisations can maximise the potential of federated learning, fuelling open innovation and discoveries in biomedical research.
(i) Federated learning is a machine learning setting in which multiple partners (hospitals, pharma companies or individual researchers) can collaborate on complex research questions without centralising or sharing data. This ‘collaborative machine learning’ approach enables teams to train their models on larger, previously inaccessible datasets, boosting the predictive power of machine learning algorithms and enhancing AI capabilities.
(ii) Because federated learning overcomes privacy concerns (see article 1), pharma companies can form partnerships and consortia while retaining their competitive edge (‘coopetition’). Federated learning in healthcare also facilitates knowledge transfer between medical researchers and data scientists and helps bridge the gap between AI and clinical care.
(iii) Owkin is spearheading real-world use cases of collaborative machine learning, powered by its platform and technology, Owkin Connect. The pharma consortium MELLODDY is pioneering federated learning-based drug discovery across 10 pharma companies. Another example of a successful machine learning collaboration (not powered by federated learning) is the ‘COVID-19 AI severity score’ tool, the result of a recent partnership between hospitals, academia and data scientists. These collaborative networks build trust between key stakeholders and provide a foundation for breakthroughs in medical AI.
Federated learning in healthcare: The shift from competition to data-private, collaborative machine learning
Developing a new drug is a lengthy and costly process, and its cost keeps rising despite continuous advances in technology. This trend is known as ‘Eroom’s law’: the number of new drugs approved per billion US dollars of (inflation-adjusted) R&D spending has halved roughly every nine years since 1950 (1). Today, more than ever, pharma companies recognise that collaborating with competitors and hospitals may hold the key to reversing this trend, from identifying drug leads to improving clinical trial design to monitoring patient outcomes (2).
However, concerns about privacy and the protection of intellectual property (IP) remain major barriers in the industry. Data generated in pharma-sponsored clinical trials (about two-thirds of all trials) (3) are generally available only to the sponsor until the trial is completed, owing to the sensitivity and value of these clinical datasets.
Similarly, pharma-owned ‘chemical libraries’, precious collections of chemicals used for in-house drug development, are not set up for sharing. What’s more, many pharma companies do not yet deploy machine learning at scale, because building in-house expertise takes time, and they rely on exclusive partnerships with tech companies to protect their innovations (2). Thus, despite the soaring appetite for AI, the potential of sharing data and insights between pharma companies remains unrealised.
How might we use federated learning to allow pharma companies to build trust and collaborate without risking their competitive advantage?
In a competitive environment, those who collaborate will excel.
The term ‘coopetition’ describes a new paradigm of collaborative research in a low-trust environment (4). In federated learning, competing pharma companies combine insights from multiple datasets in the form of a machine learning model. They do this without sharing their raw data, without revealing the distribution of their data to coopetition partners, and without granting access to their servers. Each partner retains ownership of its data and remains responsible for complying with regulatory standards such as GDPR or HIPAA (5). Additional privacy-preserving technologies, such as secure aggregation and differentially private model training, are put in place to guard against data leakage (6).
An example of ‘coopetition in action’ is MELLODDY, a public-private consortium of 10 pharma companies in Europe, facilitated by Owkin. Together, their data make up the largest existing chemical compound library (more than 10 million molecules and 1 billion assays), empowering each partner to better identify the most promising drug candidates for development. Until recently, such a scenario was unthinkable.
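The aggregation idea can be made concrete with a toy example. The sketch below is a minimal illustration under stated assumptions. It uses federated averaging (FedAvg) on a small linear model. Each site trains on its own records and sends the server only a sample-weighted copy of its weights, hidden behind a random mask. The masking is a simplified stand-in for secure aggregation. For every pair of sites, one site adds a random vector and the other subtracts it, so the masks cancel when the server sums the uploads. The server therefore learns the weighted average, but not any single site's weights. Real protocols derive these masks through pairwise key agreement and handle sites that drop out. This sketch uses one shared random generator instead, and differential privacy is not shown. The function names (`local_update`, `pairwise_masks`, `federated_round`, `make_site`) and the synthetic data are hypothetical. This is not how Owkin Connect, Substra or MELLODDY implement aggregation.

```python
import random
from typing import List, Tuple

Vector = List[float]
Dataset = List[Tuple[Vector, float]]  # (features, target) pairs that never leave a site


def local_update(weights: Vector, data: Dataset, lr: float = 0.1, epochs: int = 50) -> Vector:
    """Full-batch gradient descent on a linear model (mean squared error), local data only."""
    w = list(weights)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def pairwise_masks(num_sites: int, dim: int, rng: random.Random) -> List[Vector]:
    """Hypothetical toy secure-aggregation masks: for each pair (i, j), site i adds r and
    site j subtracts it, so all masks cancel in the server's sum.
    Assumption: one shared RNG stands in for per-pair keys agreed between sites."""
    masks = [[0.0] * dim for _ in range(num_sites)]
    for i in range(num_sites):
        for j in range(i + 1, num_sites):
            r = [rng.uniform(-100.0, 100.0) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += r[k]
                masks[j][k] -= r[k]
    return masks


def federated_round(global_w: Vector, sites: List[Dataset], rng: random.Random) -> Vector:
    """One FedAvg-style round: local training, masked upload, sample-weighted average."""
    dim = len(global_w)
    masks = pairwise_masks(len(sites), dim, rng)
    total = sum(len(data) for data in sites)  # the server only needs the total sample count
    summed = [0.0] * dim
    for data, mask in zip(sites, masks):
        local_w = local_update(global_w, data)
        # This masked, sample-weighted vector is all the server receives from the site.
        upload = [len(data) * wk + mk for wk, mk in zip(local_w, mask)]
        summed = [s + u for s, u in zip(summed, upload)]
    # The masks cancel, leaving sum_i n_i * w_i; dividing by N gives the weighted mean.
    return [s / total for s in summed]


def make_site(xs: List[float]) -> Dataset:
    """Hypothetical synthetic site data from y = 2*x + 1 (second feature is a bias term)."""
    return [([x, 1.0], 2.0 * x + 1.0) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    hospitals = [
        make_site([-1.0, 0.0, 1.0]),
        make_site([-2.0, 2.0]),
        make_site([0.0, 1.0, 2.0, 3.0]),
    ]
    w = [0.0, 0.0]
    for _ in range(5):
        w = federated_round(w, hospitals, rng)
    print([round(v, 4) for v in w])  # prints [2.0, 1.0]
```

After five rounds, the global model recovers the coefficients used to generate the synthetic data. No site's records or unmasked weights were ever sent to the server. Weighting by sample count follows the usual FedAvg choice, so larger sites have more influence on the global model.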
“Pharma can work together if they trust that data won’t be shared. The sky’s the limit for building collaborations to address unmet medical needs.” Thomas Clozel, CEO, Owkin
Perhaps counterintuitively, MELLODDY aims to maximise transparency in two ways. First, it runs on Substra, an open-source framework available to the scientific community. Second, it is built on distributed ledger technology that makes all actions and exchanges fully traceable (more on this in article 3).
After an independent firm successfully audited the platform’s security, the first ‘federated run’ was completed in 2020. It demonstrated that a multi-task (i.e. target-agnostic) machine learning model can be trained at scale across institutions. Going forward, the platform will be used to address a variety of research questions aimed at making drug development more efficient.
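Traceability can be illustrated with a toy hash-chained audit log. The sketch below assumes that each action is recorded as an entry containing the SHA-256 hash of the previous entry. Examples of actions are registering a dataset, launching a training task and aggregating a model. Editing or reordering past entries then breaks the chain, and `verify` detects this. This is a minimal sketch, not Substra's ledger. A real distributed ledger also replicates the log across partners and uses a consensus protocol, so that no single party can quietly rewrite or truncate the history. The class, method and action names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys) of an entry's content."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Hypothetical append-only log in which each entry commits to the previous entry's hash."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, actor: str, action: str, payload: Dict[str, Any]) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "payload": dict(payload), "prev": prev}
        entry_hash = _digest(body)
        self.entries.append({**body, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash. Editing or reordering entries, or deleting any entry
        except the last one, makes verification fail."""
        prev = GENESIS
        for entry in self.entries:
            body = {key: entry[key] for key in ("actor", "action", "payload", "prev")}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.append("partner_A", "register_dataset", {"dataset": "assays_v1"})
    log.append("partner_A", "train", {"round": 1})
    log.append("aggregator", "aggregate", {"round": 1})
    print(log.verify())  # True
    log.entries[0]["payload"]["dataset"] = "assays_v2"  # tamper with history
    print(log.verify())  # False
```

The chain alone only proves that the history has not changed since each hash was recorded. Tamper-evidence across organisations comes from every partner holding a copy of the hashes, which is the role the distributed ledger plays.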
Medical researchers and data scientists don’t speak the same language and lack collaborative machine learning tools.
Another limitation in medical research is the imbalance of AI skills across disciplines. Medical researchers often bring many years of clinical experience and deep knowledge of diagnostics and treatment. However, developing and testing machine learning algorithms was not part of their training.
How might we bring machine learning approaches to medical researchers without turning every researcher into a data scientist and vice versa?
Federated learning and machine learning platforms can help bring different disciplines together. Owkin’s response to this challenge is Owkin Connect (previously known as Owkin Studio), a federated learning platform. It also supports machine learning collaborations between hospitals, research centres and data scientists with complementary datasets and expertise.
At the height of the COVID-19 pandemic, for instance, Owkin Connect facilitated a partnership between two French hospitals, Institut Gustave Roussy and Kremlin-Bicêtre AP-HP. The goal was to predict disease severity in hospitalised patients. In only two months, the team built a model that analyses multimodal data: CT images of the lungs, radiology reports and a variety of clinical and biological data points. The model outputs a ‘COVID-19 AI severity score’ (7) as soon as a patient is diagnosed.
The tool makes it easier and faster for radiologists and other medical staff to categorise patients according to their prognosis and guide treatment decisions, and it helps hospitals allocate resources. The code developed for the project is open source and available to other interested researchers (7).
Conclusions and outlook
These examples show how federated learning and collaborative machine learning can transform medical research in healthcare by connecting individuals and institutions. The technology also offers many other opportunities. For example, it can advance research on rare diseases by training a model on small datasets scattered around the world (8). Federated learning frameworks are flexible, so such collaborations can span diverse research areas and inspire novel solutions. They can often do so much faster than approaches that first require data to be centralised.
Many of the outstanding challenges in federated learning are interdisciplinary and can only be solved collaboratively. These include adapting to country-specific privacy regulations, providing sufficient IT infrastructure, standardising code-sharing practices and designing fair compensation schemes for contributing partners (9). Continuous development of both the technology and its ecosystem is key to making federated learning collaborations successful in tomorrow’s complex healthcare environment.
The first two articles of this series explored the impact of federated learning in healthcare on data privacy and confidentiality, as well as the new opportunities it creates for collaboration in medical research. In the final piece, we will dive into the traceability of machine learning operations.
1. Scannell, J., Blanckley, A., Boldon, H. et al. Diagnosing the decline in pharmaceutical R&D efficiency. Nat Rev Drug Discov 11, 191–200 (2012).
2. ‘AI for Drug Discovery 2020’ (report), Deep Pharma Intelligence [Accessed on 01/03/2021].
3. ‘Oncology dominates clinical trials activity in 2020, says GlobalData’, GlobalData [Accessed on 01/03/2021].
4. Wilson, C. J. ‘Games and Events: A New Era of Coopetition in Pharma R&D’, Elsevier Pharma R&D (2018) [Accessed on 01/03/2021].
5. https://www.gdprexplained.eu/ [Accessed on 24/02/2021].
6. Lake, J. ‘Federated learning: Is it really better for your privacy and security?’, Comparitech (2019) [Accessed on 24/02/2021].
7. Lassau, N., Ammari, S., Chouzenoux, E. et al. Integrating deep learning CT-scan model, biological and clinical variables to predict severity of COVID-19 patients. Nat Commun 12, 634 (2021).
8. Schaefer, J., Lehne, M., Schepers, J. et al. The use of machine learning in rare diseases: a scoping review. Orphanet J Rare Dis 15, 145 (2020).
9. Kairouz, P., McMahan, H. B. et al. Advances and Open Problems in Federated Learning. arXiv:1912.04977 [cs.LG] (2019).