Patient data is full of potential, but don’t “data grab” it too tightly

Nearly a year and a half into the global COVID-19 pandemic, all eyes are riveted to medical news, trained to grasp the finest details of clinical trials and vaccine efficacy. It is a wonder, then, that NHS Digital’s recent rollout of a Data Provision Notice (DPN), which collects primary care data into a comprehensive dataset covering the GP records of everyone living in England, was not more conscientious of the current medical information context. NHS Digital does emphasize that these data centralization efforts are meant to support “vital research and improvements in health services,” a truly laudable goal given the global wreckage on display these past pandemic months. And yet this somewhat obscured intention is lost in the media coverage of the NHS’s “data grab” and the many troubling implications it has for patient privacy and data governance.

Rightly so. As we’ve seen time and again, as the potential of leveraging vast swaths of human-generated data for new knowledge, insights, breakthroughs (and monetary gain) increases, so too does the cognitive dissonance between abstracted data and the fact that it represents individual humans (with complex lives and inalienable rights). However, it is true both that patient data privacy and institutional governance are imperative, and that there is immense potential to change the treatment landscape of severe and cure-avoidant disease with expertly leveraged information from patient data. The challenge and opportunity lie not in pitting these two elements against each other as trade-offs, but in considering how, by prioritizing privacy and keeping patients central to research, we may increase the chances of developing new therapies and treatment pathways that can support better patient care.

It is no secret that the further we’ve come to understand complex diseases, such as cancer, the more challenging it has become to identify the right treatment for the right patient. Precision medicine’s goal is to do just that by sifting through the variables inherent in each patient’s biological and clinical characteristics and identifying the unique treatment plan that will work for them. However, the variance in disease characteristics from patient to patient is vast and appears to keep growing as our knowledge of specific diseases deepens. This diversity of details is the main reason that Artificial Intelligence (AI) and Machine Learning (ML) have been touted as well-suited antidotes to the precision medicine paradox. These data science technologies can help us unlock key medical insights from numbers of data points so large that a human could not review them in any reasonable amount of time. Yet ML algorithms and models are what we call “data-hungry”: they require a lot of data to train and validate their ability to recognize crucial information.

Furthermore, not all data are created equal. The most robust, predictive, and accurate ML models are created with the best-curated datasets, those that incorporate as many details of the biological, clinical, molecular, pathological and longitudinal manifestations of disease as is feasible. In short, better treatments for patients can stem from insights from better ML models, developed with and analyzing the most comprehensive datasets. And we find these datasets closest to the patient, curated by leading clinical and research experts in the field.

We believe that medical researchers, clinicians and their institutions are the right gatekeepers for this patient data. This is not only because they operate close to the source, but also because, by their nature, these roles and institutions have the best interests of their patients at heart. From a research perspective, abstracted data is only so useful. Expert clinicians provide critical medical context to many different forms of research, AI-facilitated or not. And yet, there remains a gap between clinical research breakthroughs and their application through the treatment discovery and development pipelines.

We assert that precision medicine cannot happen in isolation, and research silos are a current and relevant obstacle worth addressing. This theme has its echo in the desire of governing bodies to centralize or pool datasets so that we may better optimize insights from the data with less bureaucracy and regulatory red tape. Ultimately, though, the problem is not just that privacy and governance concerns are heightened when sensitive patient data is transferred, but also that we lose the impactful connections drawn between experts from different fields and forged through quality collaboration.

Protect patient privacy with Federated Learning

At Owkin, we foster unique collaborations with top medical research centers and life science companies to unlock research silos, make impactful discoveries, and ultimately improve patient outcomes. Though we are certain collaboration is key to advancing medicine, as is also thrown into stark relief by the COVID-19 pandemic, we aren’t blind to the challenges presented by the twofold need for data and the data’s need to stay behind institutional firewalls. We must contend not only with disease heterogeneity between patients, but also with data heterogeneity across hospitals and research centers, across different standards of digitization, storage, privacy compliance, and geographical distribution. But a little-known fact is that this heterogeneity, once the data can be leveraged, can actually make AI models more robust and less biased, meaning they can be more useful in deriving actionable and impactful medical insights.

Connecting to data in its context is worth overcoming these challenges, and as such, we’ve developed novel Federated Learning (FL) technologies and ML approaches that enable us to collaborate with multiple data centers on the training of machine learning models without ever moving the data from their respective locations. Our FL technologies are based on the core principles of collaboration, confidentiality, and compliance. They can be used in collaborative and competitive settings where it is essential to protect confidential, proprietary or sensitive data, as well as the insights the models extract from that data. We believe these approaches can have a significant impact on healthcare stakeholders through preserving privacy, enabling collaborative ML, and building trust by ensuring traceability.
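
For readers who want a concrete picture of training "without ever moving the data," here is a minimal sketch of the general idea. It assumes plain federated averaging on a toy linear model, which is one common textbook approach; it is not a description of Owkin's technology. The function names, the synthetic "sites," and all parameter values are hypothetical. Each site fits the shared model on its own records and returns only the updated model weights, which a coordinator then averages.

```python
import random


def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient-descent steps on one site's own records.

    Only the updated weights are returned; `data` stays at the site.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(site_weights, site_sizes):
    """Average the site models, weighting each by its number of records."""
    total = sum(site_sizes)
    w = sum(sw[0] * n for sw, n in zip(site_weights, site_sizes)) / total
    b = sum(sw[1] * n for sw, n in zip(site_weights, site_sizes)) / total
    return (w, b)


def train_federated(sites, rounds=300):
    """Coordinator loop: send the model out, collect updates, average them."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        sizes = [len(data) for data in sites]
        global_weights = federated_average(updates, sizes)
    return global_weights


def make_site(n, x_low, x_high, rng):
    """Hypothetical synthetic records following y = 2x + 1 plus small noise."""
    xs = [rng.uniform(x_low, x_high) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs]


rng = random.Random(0)
# Three sites of different sizes covering different ranges of x,
# standing in for heterogeneous hospital datasets.
sites = [
    make_site(40, 0.0, 1.0, rng),
    make_site(25, 1.0, 2.0, rng),
    make_site(60, 0.0, 2.0, rng),
]
w, b = train_federated(sites)
print(f"learned slope={w:.3f}, intercept={b:.3f}")
```

In this toy setting the shared model should end up close to the underlying relationship even though no site ever sees another site's records. Real deployments add much more, such as secure communication, audit trails, and protections against information leaking through the weights themselves, none of which this sketch attempts.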

What’s more, finding opportunities for mutual gain, rather than taking actions that appear to harvest private and personal data, is crucial for engendering trust across all the stakeholders committed to improving patients’ lives. We need to continue to measure the success of these endeavors in improved patient care rather than in the size of collected datasets, and to understand that there are alternative methods of connecting patient data to the health care sector actors in pursuit of this ultimate goal, without sacrificing the privacy and data security of those we aim to help.

Perhaps we should recall that, in another truth laid bare by COVID-19, “the patient” represents us all; as such, the decisions we make about patient data also affect us all.

Reproduced from Dechert LLP’s June 2021 UK Life Sciences and Healthcare Newsletter.