Open source federated learning software
Aggregate models and insights - not data
Substra is ready-to-use, open source federated learning (FL) software developed by Owkin and now hosted by the Linux Foundation for AI and Data.
Substra enables the training and validation of machine learning models on distributed datasets. It includes a flexible Python interface and a web application to run federated learning training at scale.
Academic research centers and biopharma companies deploy Substra (formerly Owkin Connect) in a wide variety of federated learning settings for clinical research, drug discovery and development.
FAQs
Substra is open source federated learning (FL) software. It enables the training and validation of machine learning models on distributed datasets, and provides a flexible Python interface and a web app to run federated learning training at scale. Substra is the most proven software for federated learning on healthcare data in real production environments: it has already been deployed and used by hospitals and biotech companies (see, for instance, the MELLODDY project). Substra can also be used on a single machine, on a virtually split dataset, to run FL simulations and debug code before launching experiments on a real network.
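To make the idea of a single-machine simulation on a virtually split dataset concrete, here is a minimal sketch in plain Python. It does not use the Substra API: the function names (`split_dataset`, `local_update`, `federated_average`, `simulate`), the linear model and the synthetic data are all assumptions chosen for illustration. It shows one common FL strategy, federated averaging, where each shard trains locally and only model weights are combined.

```python
import random

def split_dataset(samples, n_parts):
    """Virtually split one dataset into n_parts shards (round-robin)."""
    return [samples[i::n_parts] for i in range(n_parts)]

def local_update(weights, shard, lr=0.1, epochs=5):
    """Run a few gradient-descent epochs for y ~ w*x + b on one shard."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in shard) / len(shard)
        grad_b = sum(2 * (w * x + b - y) for x, y in shard) / len(shard)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

def federated_average(local_weights, shard_sizes):
    """Average local weights, weighting each shard by its sample count."""
    total = sum(shard_sizes)
    w = sum(lw[0] * n for lw, n in zip(local_weights, shard_sizes)) / total
    b = sum(lw[1] * n for lw, n in zip(local_weights, shard_sizes)) / total
    return (w, b)

def simulate(samples, n_parts=3, rounds=50):
    """Simulate federated rounds: only weights leave a shard, never samples."""
    shards = split_dataset(samples, n_parts)
    sizes = [len(s) for s in shards]
    weights = (0.0, 0.0)
    for _ in range(rounds):
        local = [local_update(weights, shard) for shard in shards]
        weights = federated_average(local, sizes)
    return weights

# Synthetic data (an assumption for this sketch): y = 3x + 1 plus small noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(90)]
data = [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]

w, b = simulate(data)
print(f"w = {w:.3f}, b = {b:.3f}")  # should land close to w = 3, b = 1
```

In a real deployment each shard would live in a different organization; running the same logic locally first is a cheap way to catch bugs in the training and aggregation code.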
Substra was originally developed by Owkin and is now hosted by the Linux Foundation for AI and Data. Today Owkin is the main contributor to Substra.
Substra can run tasks on any type of data: tabular data, images, videos, audio, time series, etc.
Substra is fully compatible with machine learning models written in Python with any library (PyTorch, TensorFlow, scikit-learn, etc.). In addition, Substra provides a dedicated PyTorch interface, which makes PyTorch code simpler to write than code for other frameworks.
Substra has been designed to work especially well on healthcare use cases. However, because it can work on any kind of data with any Python library, Substra can be used for any computation on distributed data.
Substra is genuinely production ready: it is the only FL framework to be used on real healthcare data, proven in real production environments (federating pharmaceutical companies or hospitals). It is also highly flexible compared to other frameworks: Substra is fully data agnostic, ML model agnostic and ML framework agnostic.
The level of protection depends on the level of trust. Substra provides a high degree of traceability: data providers know with full transparency which algorithms were used. This is “a posteriori” validation, which is very powerful when combined with contractual obligations. Additional protections such as secure aggregation or differential privacy can be added to the software, and Owkin has experience implementing these methods in real-world projects.
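As an illustration of what secure aggregation means, here is a minimal sketch of pairwise additive masking in plain Python. It is not Substra's or Owkin's implementation: the function names, the modulus and the seeded random generator are assumptions for illustration. Each party adds a mask to its update before sending it, and the masks are built so that they cancel out in the sum, which means the aggregator learns the total but not any individual update.

```python
import random

MODULUS = 2**32  # all arithmetic wraps around modulo this value

def pairwise_masks(n_parties, length, seed=0):
    """Build one mask per party such that all masks sum to zero mod MODULUS.

    For each pair (i, j) with i < j, a shared random vector is added to
    party i's mask and subtracted from party j's mask.
    """
    # Assumption: a seeded generator stands in for the secrets that each
    # pair of parties would agree on privately in a real protocol.
    rng = random.Random(seed)
    masks = [[0] * length for _ in range(n_parties)]
    for i in range(n_parties):
        for j in range(i + 1, n_parties):
            shared = [rng.randrange(MODULUS) for _ in range(length)]
            for k in range(length):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks

def mask_update(update, mask):
    """Hide one party's integer-encoded update behind its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]

def aggregate(masked_updates):
    """Sum the masked updates coordinate-wise; the masks cancel out."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]

# Hypothetical integer-encoded model updates from three parties.
updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(len(updates), length=3)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
total = aggregate(masked)
print(total)  # [111, 222, 333]
```

A production protocol would also need real key agreement between parties, handling of participants that drop out, and a fixed-point encoding for floating-point weights. This sketch omits all three.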
SubstraFL provides built-in support for PyTorch. However, the lower levels of Substra are framework agnostic. In particular, it is perfectly possible to run JAX with Substra.
Yes, but only for specific projects. Please contact us to discuss.
LFAI is the Linux Foundation for AI and Data. Its mission is to build and support an open AI and data community, and to drive open source innovation in the AI and data domains. LFAI is an umbrella foundation of the Linux Foundation. The Linux Foundation hosts several open source projects from major tech companies, such as Kubernetes (Google), Amundsen (Lyft), Pyro (Uber) and, more recently, PyTorch (Facebook).