What is Federated Learning?
Federated Learning is a machine learning procedure whose goal is to train a high-quality model on data distributed across several independent providers. Instead of being gathered on a single central server, each provider's data remains on its own server, and the algorithms and predictive models travel between the servers.
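To make the idea of models travelling between servers concrete, here is a minimal sketch of one well-known federated strategy, federated averaging. It is an illustration only and is not presented as Owkin's method: the toy linear model, the three synthetic "sites", and every function name and hyperparameter below are assumptions made for the example. Each site trains on its own data, and only the resulting model weights are sent back and averaged.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient-descent steps for y ~ w*x + b on one site's data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_round(global_weights, sites):
    """One round: every site trains locally, then the weights are averaged.

    Only model weights leave a site; the raw (x, y) pairs never do.
    The average is weighted by the number of samples at each site.
    """
    updates = [(local_update(global_weights, data), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    w = sum(u[0] * n for u, n in updates) / total
    b = sum(u[1] * n for u, n in updates) / total
    return (w, b)

def make_site(rng, n, lo, hi):
    """Hypothetical site data drawn from y = 3x + 1 plus small noise."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.01)) for x in xs]

rng = random.Random(0)
# Three sites with different sizes and different input ranges.
sites = [
    make_site(rng, 40, -1, 0),
    make_site(rng, 60, 0, 1),
    make_site(rng, 50, -1, 1),
]

weights = (0.0, 0.0)
for _ in range(200):
    weights = federated_round(weights, sites)

print("learned slope and intercept:", weights)
```

In this toy setting the averaged model should recover a slope and intercept close to the values used to generate the data, even though no site ever shares its samples. Real deployments add secure transport, authentication, and far larger models.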
Today’s standard approach of centralizing data from multiple centers raises critical concerns regarding patient privacy and data protection. The ability to train machine learning models at scale across multiple medical institutions without moving the data is a critical technology for solving this problem.
At Owkin, we see Federated Learning as a platform that connects pharmaceutical companies and hospitals. It must be designed to match all their requirements, from machine learning performance to privacy guarantees, IT services, and legal frameworks. We also see Federated Learning as a technology that data scientists, medical experts, and data managers across different organizations use to train predictive models collaboratively.
Part of our vision is to build the largest collaborative research network in health, powered by Federated Learning. We have many live projects developing this technology, and we are committed to ensuring proper data privacy by open-sourcing certain parts of the code.
Federated Learning, a private and secure machine learning framework
Patient data remains at the hospital, safe within the hospital’s local security infrastructure.
Owkin Connect is the framework for federated learning over the Owkin Loop. It responds to:
- Increasing regulations, addressing risks of patient data breaches
- The need for stricter control and governance by data owners
- The need to meet higher transparency requirements on how the algorithms are trained and how the data is used
Predictive performance that can be comparable to that of pooled data
The value of real-world data and AI is unquestionable. However, the real and difficult challenge lies in identifying high-value subgroups of patients for clinical trials, predicting response to treatment, selecting biomarkers, and extracting relevant insights. In order to be robust and performant, predictive models need to be trained on heterogeneous data to generalize well to different patient populations, treatment plans, and different data modalities such as histology slides and genomic data.
Furthermore, in some cases, such as accessing sensitive data from the US and Europe, building a centralized data hub to pool the data would be extremely hard and would pose many international challenges. In such cases, federating the data is the only option. Owkin has developed Federated Learning strategies to optimize training while respecting the security, privacy, and traceability challenges.
Federated Learning ensures computation traceability while incentivizing data quality
Owkin’s unique Federated Learning stack of distributed training strategies, Federated Learning services, and the distributed learning network unlocks the potential for a sustainable Federated Learning business model promoting data quality.
Hospitals and research institutions retain control and governance over patient data and can access a full and unforgeable record of which data has been used for what purpose.
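One common way to make a usage record tamper-evident is to chain entries together with cryptographic hashes, so that altering any past entry invalidates the chain. The sketch below shows that idea in isolation. It is not the platform's implementation: the record fields, dataset names, and function names are hypothetical, and a single-process hash chain lacks the distributed consensus that a blockchain provides.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(record, prev_hash):
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(log, record):
    """Append a usage record, linking it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "record": record,
        "prev": prev_hash,
        "hash": entry_hash(record, prev_hash),
    })

def verify(log):
    """Recompute every hash; any edit to a past record breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(entry["record"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True

log = []
# Hypothetical records of which data was used for which purpose.
append(log, {"dataset": "hospital_a/histology", "task": "train", "model": "m1"})
append(log, {"dataset": "hospital_b/genomics", "task": "evaluate", "model": "m1"})

ok_before = verify(log)

# Simulate someone rewriting history after the fact.
log[0]["record"]["task"] = "export"
ok_after = verify(log)

print("valid before tampering:", ok_before)
print("valid after tampering:", ok_after)
```

A data owner who keeps a copy of the latest hash can detect any later rewrite of the log, which is the property an audit trail of data usage relies on.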
Owkin data scientists train Federated Learning models across distributed data. The aim is to trace which data contributed to which parts of the trained model. This framework promotes data quality, reinforcing Owkin Loop’s sustainable access to well-curated multimodal data.
With state-of-the-art technology
- In each research center node, Owkin installs servers that can run large-scale machine learning computations.
- The platform is based on a private blockchain and uses Substra, a software framework in the Owkin Federated Learning stack for orchestrating distributed machine learning tasks in a secure way. Substra is based on Hyperledger Fabric. This forms the heart of Owkin’s fully transparent and non-forgeable traceability platform.
- Owkin orchestrates the training of machine learning models across the nodes, specifying the series of distributed computations required to carry out a given federated training session. Owkin also employs techniques such as differentially private model training to further protect patient data.
- We provide a Python library and toolkit to help data scientists train their models over the Owkin Connect federated network of nodes.
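The text above mentions differentially private model training without naming a specific mechanism. As an illustration, the sketch below shows the two steps most such schemes share: bounding the size of an update by clipping its L2 norm, then adding Gaussian noise scaled to that bound. This is an assumption-laden simplification, not Owkin's implementation: the function names, the clipping bound, and the noise multiplier are all hypothetical, clipping is applied to a whole update rather than per example, and the privacy accounting that turns these parameters into a formal guarantee is not shown.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]

def privatize(update, max_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise proportional to the bound.

    The noise standard deviation is noise_multiplier * max_norm, so the
    noise is calibrated to the largest influence one update can have.
    """
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]

rng = random.Random(42)
raw_update = [3.0, 4.0]  # hypothetical model update with L2 norm 5

clipped = clip_update(raw_update, max_norm=1.0)
noisy = privatize(raw_update, max_norm=1.0, noise_multiplier=1.1, rng=rng)

print("clipped update:", clipped)
print("noisy update:", noisy)
```

Clipping limits how much any single contribution can move the model, and the added noise masks what remains. Larger noise multipliers give stronger protection at the cost of model accuracy.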
Owkin is launching a federated research movement
Owkin is creating a movement in medicine by establishing Federated Learning as a uniquely scalable and sustainable way to leverage heterogeneous datasets, and we want data scientists and research centers to join us. For this reason, we have open-sourced the lower layers of our Federated Learning stack under the name Substra.