How Owkin trains AI models without transferring data with Federated Learning


Tags: FL


Date: October 16th, 2021



October 13th, 2021, Paris

Training machine learning algorithms locally, then aggregating them into a more powerful model, all without transferring the data: this is federated learning, a way to reconcile data confidentiality with performance. Owkin, a start-up based in France and New York, specializes in this method, which has only recently emerged from the research world.

Make the artificial intelligence models travel rather than the data. This is the principle of federated learning, which protects the confidentiality of data. Mostly explored in the world of research, this type of machine learning is gradually making its way into the industry. Owkin, a start-up specializing in artificial intelligence, has made it its core business.

Owkin Connect, the federated learning software developed by the start-up, links different partners on the same platform. From their own interfaces, data scientists can train their models on data to which they do not have access. “In concrete terms, a model is initiated and then shared with all participants. Each one receives the model and carries out local training,” explains Sebastian Schwarz, product strategy manager at Owkin. “Once this step is completed within each participant’s infrastructure, behind its firewalls, the parameters of the local models are sent to a central aggregator – which can be one of the centres or Owkin – to perform an averaging. We then obtain a model that is an amalgamation of the local results.”
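The averaging step described above can be sketched as a weighted mean of the locally trained parameters, in the spirit of the well-known FedAvg scheme. All names below are illustrative; this is not Owkin Connect's actual API.

```python
# Minimal sketch of federated averaging: each participant trains
# locally and sends back only its model parameters, which the central
# aggregator combines, weighted by local sample count.
# Illustrative only; not Owkin Connect's implementation.

def federated_average(local_params, sample_counts):
    """Average parameter vectors, weighting each participant
    by the number of samples it trained on."""
    total = sum(sample_counts)
    aggregated = [0.0] * len(local_params[0])
    for params, count in zip(local_params, sample_counts):
        weight = count / total
        for i, p in enumerate(params):
            aggregated[i] += weight * p
    return aggregated

# Three hospitals return locally trained parameters.
hospital_params = [
    [0.2, 1.0],  # hospital A
    [0.4, 0.8],  # hospital B
    [0.6, 1.2],  # hospital C
]
samples = [100, 300, 100]  # patients seen at each site

global_params = federated_average(hospital_params, samples)
print(global_params)
```

Weighting by sample count means a site with more patients pulls the global model further toward its local result, which is the standard choice when local dataset sizes differ.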

Data that remains confidential

This process is repeated about 20 times, returning an increasingly refined model to the participants until the final model is produced. “Participants get a model that would have been impossible to obtain on their own: one that is more predictive and more efficient,” says Romain Goussault, product manager. “For example, if Nantes University Hospital wants to train a model to differentiate moles from skin tumours but only trains it on patients from Nantes, it will miss many lighter or more tanned skin types, and it won’t be generalizable.”
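The repeated rounds can be illustrated with a toy simulation. Here each site's “local training” is a single gradient step toward its own data mean, and the server averages the results each round; the setup is entirely hypothetical and chosen only to show why iterating converges to a model no single site could reach alone.

```python
# Toy simulation of repeated federated rounds. Each client takes one
# gradient step toward its local data mean; the server averages the
# updated models. Illustrative only; not Owkin's implementation.

def local_train(model, local_mean, lr=0.5):
    # One gradient-descent step on the local loss (model - mean)^2 / 2.
    return model - lr * (model - local_mean)

def run_rounds(local_means, n_rounds=20):
    model = 0.0  # initial global model shared with all participants
    for _ in range(n_rounds):
        updates = [local_train(model, m) for m in local_means]
        model = sum(updates) / len(updates)  # server-side averaging
    return model

# Three sites with different local data distributions.
final = run_rounds([1.0, 2.0, 6.0], n_rounds=20)
print(round(final, 4))  # after ~20 rounds, close to the global mean 3.0
```

No site ever shares its data, yet the aggregated model ends up near the mean of all three distributions, mirroring the article's point that the final model reflects information no single participant holds.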

The fact that the data is never transmitted is an advantage, particularly for heavy data, which can remain on site, but above all for preserving confidentiality. “One of our teams is trying to find out whether information can be reconstructed from the model, but for the moment it is impossible,” says Sebastian Schwarz.

Partnerships between hospitals

The system is therefore ideal for partnerships between hospitals that cannot disclose their patients’ information, or for companies that compete with one another. Two consortia have so far subscribed to it: HealthChain, a network of hospitals, and MELLODDY, a group of ten pharmaceutical companies (including AstraZeneca, Bayer, GSK and the Servier research institute).

Owkin therefore focuses on the medical field and the development of associated technologies. For example, HealthChain is working on a model capable of predicting a breast cancer patient’s response to chemotherapy based on sample slides. Other federated learning platforms are emerging, such as FATE (Federated AI Technology Enabler), designed by China’s WeBank and focused on finance, but examples of applications are still rare.

“Federated learning is an active research topic, which remains very experimental and, for the time being, fairly disconnected from industry,” confirms Sebastian Schwarz. “But the same could be said of machine learning until just a few years ago.”