Data does not move, only algorithms travel
Federated learning is a recently proposed decentralized approach to training machine learning models with multiple data providers. Instead of gathering data on a single server, the data remain locked on each provider's server, and only the algorithms and predictive models travel between them. The goal of this approach is to benefit from a large pool of data, resulting in increased machine learning performance, while respecting data ownership and privacy.
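The idea of algorithms travelling while data stays put can be pictured with federated averaging: each center trains a copy of the model on its own data, and only the model parameters are exchanged and averaged. The following is a minimal illustrative sketch (a toy linear model and invented hospital datasets, not Owkin's actual implementation):

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """One round of local training at a center: plain SGD on a
    linear model y = w*x + b. Only the weights leave the center;
    the (x, y) samples never do."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def fed_avg(updates, sizes):
    """Server-side aggregation: average the local weights,
    weighted by each center's dataset size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

# Two hypothetical hospitals hold disjoint samples of the same
# underlying relationship y = 2x + 1.
center_a = [(x / 10, 2 * x / 10 + 1) for x in range(10)]
center_b = [(x / 10, 2 * x / 10 + 1) for x in range(10, 20)]

weights = (0.0, 0.0)
for _ in range(50):
    updates = [local_update(weights, c) for c in (center_a, center_b)]
    weights = fed_avg(updates, [len(center_a), len(center_b)])
# weights converges toward (2.0, 1.0) without either center
# ever revealing its raw samples.
```

Real federated strategies differ in how often weights are exchanged and how updates are aggregated, but the division of labor is the same: local computation on private data, global aggregation of parameters only.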
The value of real-world data and AI is unquestionable. However, the real and difficult challenge lies in identifying high-value subgroups of patients for clinical trials, predicting response to treatment, selecting biomarkers, and extracting relevant insights.
In order to be robust and performant, predictive models need to be trained on heterogeneous data to generalize well to different patient populations, treatment plans, and different data modalities such as histology slides and genomic data.
Today’s standard approach of centralizing data from multiple centers comes at the cost of critical concerns regarding patient privacy and data protection.
For all these reasons, the ability to train machine learning models at scale across multiple medical institutions without moving the data is the key to solving this problem.
Federated Learning, a private and secure machine learning framework
Patient data remains at the hospital, safe within the hospital’s local security infrastructure.
Owkin Connect is the framework for federated learning over the Owkin Loop. It addresses:
- Increasing regulations, addressing risks of patient data breaches
- The need for stricter control and governance by data owners
- The need to meet higher transparency requirements on how the algorithms are trained and how the data is used
And can match the predictive performance of pooled data
Furthermore, in some cases, such as accessing sensitive data from the US and Europe, building a centralized data hub to pool the data would be extremely difficult and would pose many international challenges. In such cases, federating the data is the only option. Owkin has developed Federated Learning strategies to optimize training while meeting the security, privacy, and traceability challenges.
And ensures computation traceability while incentivizing data quality
Owkin’s unique Federated Learning stack of distributed training strategies, Federated Learning services, and the distributed learning network unlocks a sustainable Federated Learning business model that promotes data quality.
Hospitals and research institutions retain control and governance over patient data and can access a full and unforgeable record of which data has been used for what purpose.
Owkin data scientists train Federated Learning models across distributed data. The aim is to identify how each dataset contributes to the trained algorithm. This framework promotes data quality, reinforcing Owkin Loop’s sustainable access to well-curated multimodal data.
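The "full and unforgeable record" can be pictured as an append-only hash chain: each training-task entry commits to the hash of the previous entry, so editing any past record invalidates every hash after it. This is a deliberately simplified sketch of the principle (Owkin's platform uses Hyperledger Fabric, not this toy chain; the dataset names are invented):

```python
import hashlib
import json

def append_entry(chain, entry):
    """Append a training-task record whose hash commits to the
    previous record, making retroactive edits detectable."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {"entry": entry, "prev_hash": prev_hash}
    # json.dumps runs before the "hash" key is added, so the hash
    # covers exactly the entry and the previous hash.
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    chain.append(record)
    return chain

def verify(chain):
    """Recompute every hash from scratch; any tampering with an
    earlier entry breaks the chain."""
    prev_hash = "0" * 64
    for record in chain:
        body = {"entry": record["entry"], "prev_hash": record["prev_hash"]}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if record["prev_hash"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True

ledger = []
append_entry(ledger, {"dataset": "hospital_A/histology", "task": "train"})
append_entry(ledger, {"dataset": "hospital_B/genomics", "task": "train"})
```

A permissioned blockchain adds distributed consensus on top of this chaining, so no single party, including the orchestrator, can rewrite the record of which data was used for what purpose.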
With state-of-the-art technology
- In each research center node, Owkin installs servers that can run large-scale machine learning computations.
- The platform is based on a private blockchain and uses Substra, a software framework in the Owkin Federated Learning stack for orchestrating distributed machine learning tasks in a secure way. Substra is built on Hyperledger Fabric and forms the heart of Owkin’s fully transparent and unforgeable traceability platform.
- Owkin orchestrates the training of machine learning models between each node, specifying the series of distributed computations required to enact a given federated training session. Owkin also employs techniques like differentially private model training to further protect patient data.
- We provide a Python library and toolkit to help data scientists train their models over the Owkin Connect federated network of nodes.
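One of the protections mentioned above, differentially private model training, is commonly implemented by clipping each update to bound any single patient's influence and then adding calibrated Gaussian noise (the DP-SGD recipe). The sketch below illustrates that mechanism only; the function name and parameters are hypothetical, not Owkin's API:

```python
import math
import random

def privatize_update(gradients, clip_norm=1.0, noise_multiplier=1.1, seed=0):
    """Clip the update vector so its L2 norm is at most clip_norm
    (bounding any one contribution), then add Gaussian noise whose
    scale is calibrated to that bound."""
    rng = random.Random(seed)
    norm = math.sqrt(sum(g * g for g in gradients))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [g * scale for g in gradients]
    sigma = noise_multiplier * clip_norm
    return [g + rng.gauss(0.0, sigma) for g in clipped]

grad = [3.0, 4.0]                 # L2 norm 5.0, will be clipped to 1.0
noisy = privatize_update(grad)    # what leaves the center
```

Because only the clipped, noised update leaves the node, an observer of the exchanged parameters learns far less about any individual record, at a quantifiable privacy cost.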
Owkin is launching a federated research movement
Owkin is creating a movement in medicine by establishing Federated Learning as the only scalable and sustainable way to leverage heterogeneous datasets, and we want data scientists and research centers to join us. For this reason, we have open-sourced the lower layers of our Federated Learning stack under the name of Substra.