Federated Learning in healthcare – Building trust through traceability
In parts 1 and 2 of our article series, we looked at how Federated Learning protects data privacy and enables a new form of collaboration between competitors, called ‘coopetition’. It allows partners to train machine learning (ML) models on proprietary datasets and benefit from collective insights, without actually revealing or moving data from their servers. This article covers the third essential driving force behind Federated Learning: traceability of all interactions through distributed ledger technology, and how it can redefine trust among healthcare stakeholders.
Making data available to competitors for model training is no small endeavor for pharma. Healthcare data fuels pharma companies’ clinical and business success and is fiercely protected. However, through ground-breaking partnerships like MELLODDY, pharma companies are now able to collaborate in a Federated Learning setting intended to improve their R&D processes. They do so by improving the performance of their ML models while protecting highly confidential and valuable data from attacks and leakage (1).
One way to enable ‘distrusting’ or competing partners to do business together is by making every action traceable, like a non-erasable trail of footprints left behind. In Federated Learning, this is accomplished through distributed ledger technology (DLT) – the same technology that underpins the widely known blockchain (it may help to think of blockchain as being to DLT what cola is to soft drinks – a popular type).
How does distributed ledger technology work?
DLT stores ‘digitally signed’ transactions on ledgers (activity logs) that are distributed across all participants in the system. In a Federated Learning partnership based on DLT, those ‘transactions’ include access history, permissions for individual partners to perform certain computations, and all operational steps. Only anonymised, non-sensitive metadata are logged on the ledger, and they cannot be traced back to the original data owner (3,4,5).
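To make the idea of a signed, tamper-evident activity log more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular DLT framework or Federated Learning platform is implemented: the function names (`append_entry`, `verify`) and the entry fields are hypothetical, a single in-memory list stands in for a ledger that would really be replicated across participants, and a shared-secret HMAC stands in for the asymmetric digital signatures a production system would use.

```python
import hashlib
import hmac
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Deterministic SHA-256 hash of an entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def _sign(entry_hash: str, key: bytes) -> str:
    """Stand-in for a digital signature (assumption: HMAC with a per-member key)."""
    return hmac.new(key, entry_hash.encode(), hashlib.sha256).hexdigest()


def append_entry(ledger: list, actor: str, action: str, key: bytes) -> dict:
    """Append a signed entry that is chained to the hash of the previous entry."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    body = {"actor": actor, "action": action, "prev_hash": prev_hash}
    entry_hash = _digest(body)
    entry = {**body, "hash": entry_hash, "signature": _sign(entry_hash, key)}
    ledger.append(entry)
    return entry


def verify(ledger: list, keys: dict) -> bool:
    """Return True only if every link, hash, and signature is intact."""
    prev_hash = GENESIS_HASH
    for entry in ledger:
        body = {
            "actor": entry["actor"],
            "action": entry["action"],
            "prev_hash": entry["prev_hash"],
        }
        key = keys.get(entry["actor"])
        if key is None or entry["prev_hash"] != prev_hash:
            return False
        if _digest(body) != entry["hash"]:
            return False
        if not hmac.compare_digest(_sign(entry["hash"], key), entry["signature"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical members and actions; only non-sensitive metadata is logged.
keys = {"pharma_a": b"key-a", "pharma_b": b"key-b"}
ledger = []
append_entry(ledger, "pharma_a", "grant_permission:train", keys["pharma_a"])
append_entry(ledger, "pharma_b", "run_training_step:1", keys["pharma_b"])
print(verify(ledger, keys))   # True

ledger[0]["action"] = "grant_permission:export"  # attempt to rewrite history
print(verify(ledger, keys))   # False
```

Because each entry embeds the hash of its predecessor, editing any past entry invalidates the chain from that point onward, which is what makes the trail of footprints non-erasable in practice.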
Unlike well-publicized public blockchains such as Bitcoin, Federated Learning uses a private DLT restricted to authorized members (such as pharma companies or hospitals). There is no central authority managing the data. Instead, each step of the model training must be approved according to a predefined consensus policy and, once recorded, cannot be edited or deleted. The resulting activity log, like a bank statement, can be requested by any member at any time.
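The approval step can be sketched in a few lines of Python. The article does not specify which consensus policy is used, so the quorum rule below (a minimum number of approvals from authorized members) is purely an assumption for illustration, and `ConsensusPolicy`, `ActivityLog`, and their methods are hypothetical names rather than part of any real platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsensusPolicy:
    """Assumed policy: a step needs `min_approvals` votes from authorized members."""
    members: frozenset
    min_approvals: int

    def valid_approvals(self, approvals) -> set:
        # Votes from parties outside the private network are ignored.
        return set(approvals) & self.members

    def is_approved(self, approvals) -> bool:
        return len(self.valid_approvals(approvals)) >= self.min_approvals


class ActivityLog:
    """Append-only log: entries can be added and read, never edited or deleted."""

    def __init__(self, policy: ConsensusPolicy):
        self._policy = policy
        self._entries = []

    def submit(self, step: str, approvals) -> bool:
        """Record a training step only if the consensus policy is satisfied."""
        if not self._policy.is_approved(approvals):
            return False
        approvers = tuple(sorted(self._policy.valid_approvals(approvals)))
        self._entries.append((step, approvers))
        return True

    def statement(self) -> tuple:
        """Read-only copy of the log, available to any member on request."""
        return tuple(self._entries)


policy = ConsensusPolicy(
    members=frozenset({"pharma_a", "pharma_b", "pharma_c"}),
    min_approvals=2,
)
log = ActivityLog(policy)
print(log.submit("aggregate_round_1", ["pharma_a", "pharma_b"]))  # True
print(log.submit("aggregate_round_2", ["pharma_a", "outsider"]))  # False
print(log.statement())
```

The log exposes no update or delete method, and `statement()` returns an immutable tuple, mirroring the bank-statement analogy: members can read the full history but cannot alter it.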
For these reasons, we refer to DLT as a ‘trustless technology’. Put differently, collaboration within a Federated Learning setting requires trust only in the technology, with its established mathematical and cryptographic rules and its transparency. It does not require trust in one’s competitors. This ultimately makes collaborations in highly competitive environments possible (2).
What is in it for pharma and hospitals?
Having a complete track record of all interactions means that privacy and data protection protocols can be audited by data protection officers or external security auditors to identify potential vulnerabilities. In addition, it leads to the reproducibility of the model training steps – they can be validated on other datasets to assess whether the research findings are generalizable to a broader patient population (1).
One promising technical advantage of DLT is the opportunity to evaluate contributing datasets. The quality of a model is defined by its source data, and tracing the incremental performance increase of a model can reveal potential biases that may have been introduced. This can ultimately lead to calculating contribution scores for participants of a federated network (3).
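As a rough illustration of how a traceable log could feed such scores, the Python sketch below attributes each logged change in a validation metric to the partner whose update preceded it. This is a deliberately simple assumption, not a method described in the cited work: the function name, the log format, and the numbers are hypothetical, and the attribution depends on the order of updates, which more principled schemes would correct for.

```python
from collections import defaultdict


def contribution_scores(baseline: float, log: list) -> dict:
    """Sum the incremental metric change recorded after each partner's update.

    `log` is an ordered list of (partner, validation_metric) pairs, as might be
    reconstructed from the ledger. All names and values are illustrative.
    """
    gains = defaultdict(float)
    previous = baseline
    for partner, metric in log:
        gains[partner] += metric - previous
        previous = metric
    return dict(gains)


# Made-up metric values, for illustration only.
training_log = [
    ("hospital_a", 0.74),
    ("hospital_b", 0.73),
    ("hospital_a", 0.78),
]
scores = contribution_scores(baseline=0.70, log=training_log)
for partner, gain in sorted(scores.items()):
    print(f"{partner}: {gain:+.2f}")
```

In this toy example a negative score would not prove that a dataset is harmful, but it would flag an update worth inspecting for quality issues or bias.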
Compared to the pharma industry, hospitals have a much less competitive stance and are generally regarded as ‘trusted’ partners (1). Nonetheless, DLT-based systems bring key advantages for hospitals. From a data privacy point of view, keeping a record of data access history is essential for compliance and for minimising the risk of data breaches. Medical researchers, for their part, benefit from reproducible research protocols that can be validated on different datasets, which improves the robustness of findings.
Traceable ML processes may also impact future business models, driving the ‘industrialization’ of AI (6). Today, AI is often a ‘tailor-made’ process – challenging to scale and maintain. To use an analogy, car engineering only became scalable once supply chains, terminology, safety guidelines, and so on were standardized.
What does Federated Learning’s traceability look like in practice?
One of the leading Federated Learning collaborations in healthcare based on DLT is a consortium of 10 pharma companies called MELLODDY. The project is powered by Owkin Connect, Owkin’s privacy-preserving, traceable, secure technology platform (see article 2). The MELLODDY team has successfully completed the first ‘federated run’ and is now working on advancing drug discovery and development. HealthChain is another real-world deployment of traceable ML, this time between hospitals co-developing ML models for improved prediction of breast cancer outcome (see article 1).
In both use cases, Owkin uses Substra, an open-source software platform developed in-house and built on Hyperledger Fabric (a common DLT framework), to orchestrate all distributed ML steps. DLT (as well as additional security strategies, see article 1) protects the partners from potential attacks by one another as well as from attacks through Owkin. Full transparency creates trust in a trustless environment and unlocks opportunities for collaborative research.
(i) Federated Learning collaborations are often realized in ‘trustless’ environments. Competing pharma companies need assurance that their highly valuable data are protected from hacking attempts and data leakage. Distributed ledger technology (DLT) ensures traceability of all interactions, stored on an immutable activity log across authorized partners.
(ii) DLT-based collaborations benefit both pharma companies and hospitals through reproducibility – all steps of the model training can be (a) audited and (b) repeated on validation datasets. In the future, DLT can also help assess the value of contributing datasets to establish transparent compensation schemes.
(iii) Traceability of all interactions during model training is a major strength of Federated Learning. Data-private yet transparent ML platforms such as Owkin Connect build trust and enable collaborative ML, driving precision medicine, new business models for pharma, and overall healthcare discoveries.
This article series provided a glimpse into the key advantages of Federated Learning and its significant impact on healthcare stakeholders: (i) preserving privacy, (ii) enabling collaborative ML, and (iii) building trust by ensuring traceability. The technology holds great promise for precision medicine – tailor-made treatments for patient subgroups – which can only progress through access to large and multi-modal datasets (7). The first successful real-world applications, including MELLODDY to enhance AI drug discovery and development, point to an exciting future for Federated Learning in healthcare.