The SARS-COV-2 pandemic has put pressure on intensive care units, so that identifying predictors of disease severity is a priority. We collect 58 clinical and biological variables, and chest CT scan data, from 1003 coronavirus-infected patients from two French hospitals. We train a deep learning model based on CT scans to predict severity. We then construct the multimodal AI-severity score that includes 5 clinical and biological variables (age, sex, oxygenation, urea, platelets) in addition to the deep learning model. We show that neural network analysis of CT scans brings unique prognosis information, although it is correlated with other markers of severity (oxygenation, LDH, and CRP), which explains the measurable but limited 0.03 increase in AUC obtained when adding CT scan information to clinical variables. When comparing AI-severity with 11 existing severity scores, we find significantly improved prognosis performance; AI-severity can therefore rapidly become a reference scoring approach.
As AI-based medical devices are becoming more common in imaging fields like
radiology and histology, interpretability of the underlying predictive models is crucial to expand their use in clinical practice. Existing heatmap-based interpretability
methods such as GradCAM only highlight the location of predictive features but
do not explain how they contribute to the prediction. In this paper, we propose a
new interpretability method that can be used to understand the predictions of any
black-box model on images, by showing how the input image would be modified in
order to produce different predictions. A StyleGAN is trained on medical images
to provide a mapping between latent vectors and images. Our method identifies
the optimal direction in the latent space to create a change in the model prediction.
By shifting the latent representation of an input image along this direction, we can
produce a series of new synthetic images with changed predictions. We validate our
approach on histology and radiology images, and demonstrate its ability to provide
meaningful explanations that are more informative than GradCAM heatmaps. Our
method reveals the patterns learned by the model, which allows clinicians to build
trust in the model’s predictions, discover new biomarkers and eventually reveal
One of the biggest challenges for applying machine learning to histopathology is weak supervision: whole-slide images have billions of pixels yet often only one global label. The state of the art therefore relies on strongly-supervised model training using additional local annotations from domain experts. However, in the absence of detailed annotations, most weakly-supervised approaches depend on a frozen feature extractor pre-trained on ImageNet. We identify this as a key weakness and propose to train an in-domain feature extractor on histology images using MoCo v2, a recent self-supervised learning algorithm. Experimental results on Camelyon16 and TCGA show that the proposed extractor greatly outperforms its ImageNet counterpart. In particular, our results improve the weakly-supervised state of the art on Camelyon16 from 91.4% to 98.7% AUC, thereby closing the gap with strongly-supervised models that reach 99.3% AUC. Through these experiments, we demonstrate that feature extractors trained via self-supervised learning can act as drop-in replacements to significantly improve existing machine learning techniques in histology. Lastly, we show that the learned embedding space exhibits biologically meaningful separation of tissue structures.
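The contrastive objective behind MoCo-style self-supervised training can be illustrated with a minimal sketch of the InfoNCE loss for a single query. This is not the training code behind the results above: the function name `info_nce`, the plain-list vectors, and the temperature value are assumptions made for illustration, and the embeddings are assumed to be L2-normalized beforehand.

```python
import math


def info_nce(query, positive, negatives, temperature=0.2):
    """InfoNCE loss for one query, one positive key and a list of negative keys.

    All vectors are plain lists of floats, assumed L2-normalized by the caller.
    The temperature default is an illustrative value, not a tuned setting.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Logit 0 is the positive pair; the remaining logits are the negatives.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, key) / temperature for key in negatives]

    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))

    # Cross-entropy with the positive pair as the target class.
    return log_sum - logits[0]
```

The loss is small when the query is closer to its positive key than to every negative, which is what pushes two augmented views of the same tile together in the embedding space.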
Purpose Lymphoma lesion detection and segmentation on whole-body FDG-PET/CT is a challenging task because of the diversity of involved nodes, organs or physiological uptakes. We sought to investigate the performance of a three-dimensional (3D) convolutional neural network (CNN) to automatically segment total metabolic tumour volume (TMTV) in large datasets of patients with diffuse large B cell lymphoma (DLBCL). Methods The dataset contained pre-therapy FDG-PET/CT from 733 DLBCL patients of 2 prospective LYmphoma Study Association (LYSA) trials. The first cohort (n = 639) was used for training using a 5-fold cross validation scheme. The second cohort (n = 94) was used for external validation of TMTV predictions. Ground truth masks were manually obtained after a 41% SUVmax adaptive thresholding of lymphoma lesions. A 3D U-net architecture with 2 input channels for PET and CT was trained on patches randomly sampled within PET/CTs with a summed cross entropy and Dice similarity coefficient (DSC) loss. Segmentation performance was assessed by the DSC and Jaccard coefficients. Finally, TMTV predictions were validated on the second independent cohort. Results Mean DSC and Jaccard coefficients (± standard deviation) in the validation sets were 0.73 ± 0.20 and 0.68 ± 0.21, respectively. An underestimation of mean TMTV by −12 mL (2.8%) ± 263 was found in the validation sets of the first cohort (P = 0.27). In the second cohort, an underestimation of mean TMTV by −116 mL (20.8%) ± 425 was statistically significant (P = 0.01). Conclusion Our CNN is a promising tool for automatic detection and segmentation of lymphoma lesions, despite slight underestimation of TMTV. The fully automatic and open-source nature of this CNN should help increase both dissemination in routine practice and reproducibility of TMTV assessment in lymphoma patients.
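The two overlap metrics used to assess segmentation above can be sketched as follows for a single pair of binary masks. The function name `dice_and_jaccard`, the flattened-list mask representation, and the convention for two empty masks are assumptions made for this illustration.

```python
def dice_and_jaccard(pred, truth):
    """Return (Dice, Jaccard) for two flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    n_pred = sum(1 for p in pred if p)
    n_truth = sum(1 for t in truth if t)
    union = n_pred + n_truth - intersection

    if union == 0:
        # Both masks are empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0

    dice = 2.0 * intersection / (n_pred + n_truth)
    jaccard = intersection / union
    return dice, jaccard
```

A 3D volume can be passed by flattening it first; the metrics do not depend on voxel order.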
Data-driven machine learning (ML) has emerged as a promising approach for building accurate and robust statistical models from medical data, which is collected in huge volumes by modern healthcare systems. Existing medical data is not fully exploited by ML primarily because it sits in data silos and privacy concerns restrict access to this data. However, without access to sufficient data, ML will be prevented from reaching its full potential and, ultimately, from making the transition from research to clinical practice. This paper considers key factors contributing to this issue, explores how federated learning (FL) may provide a solution for the future of digital health and highlights the challenges and considerations that need to be addressed.
Federated learning enables one to train a common machine learning model across separate, privately-held datasets via distributed model training. During federated training, only intermediate model parameters are transmitted to a central server which aggregates these parameters to create a new common model, thus exposing only intermediate parameters rather than the training data itself. However, some attacks (e.g. membership inference) are able to infer properties of local data from these intermediate model parameters. Hence, performing the aggregation of these client-specific model parameters in a secure way is required. Additionally, the communication cost is often the bottleneck of the federated systems, especially for large neural networks. So, limiting the number and the size of communications is necessary to efficiently train large neural architectures. In this article, we present an efficient and secure protocol for performing secure aggregation over compressed model updates in the context of collaborative, few-party federated learning, a context common in the medical, healthcare, and biotechnical use-cases of federated systems. By making compression-based federated techniques amenable to secure computation, we develop a secure aggregation protocol between multiple servers with very low communication and computation costs and without preprocessing overhead. Our experiments demonstrate the efficiency of this new approach for secure federated training of deep convolutional neural networks.
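The general idea of secure aggregation between several servers can be illustrated with additive secret sharing. The sketch below is not the protocol presented in the article, which works on compressed updates. It only shows why no single server sees an individual client update. The modulus, the function names, and the assumption that updates are already quantized to non-negative integers are all illustrative choices.

```python
import random

MODULUS = 2**31 - 1  # illustrative modulus; all arithmetic is done modulo this value


def share(update, n_servers, rng):
    """Split an integer vector into n_servers additive shares modulo MODULUS."""
    shares = [[rng.randrange(MODULUS) for _ in update] for _ in range(n_servers - 1)]
    # The last share is chosen so that all shares sum to the original update.
    last = [(v - sum(s[i] for s in shares)) % MODULUS for i, v in enumerate(update)]
    return shares + [last]


def aggregate(all_client_shares):
    """all_client_shares[c][s] is the share that client c sends to server s."""
    n_servers = len(all_client_shares[0])
    dim = len(all_client_shares[0][0])
    # Each server only sums the shares it received; each share is uniformly random.
    server_sums = [
        [sum(client[s][i] for client in all_client_shares) % MODULUS for i in range(dim)]
        for s in range(n_servers)
    ]
    # Combining the per-server sums reveals only the total over all clients.
    return [sum(srv[i] for srv in server_sums) % MODULUS for i in range(dim)]


# Usage: three clients, two servers. random.Random is NOT cryptographically
# secure; a real deployment would need a secure randomness source.
rng = random.Random(0)
updates = [[1, 2, 3], [10, 20, 30], [5, 5, 5]]
total = aggregate([share(u, 2, rng) for u in updates])
```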
While federated learning is a promising approach for training deep learning models over distributed sensitive datasets, it presents new challenges for machine learning, especially when applied in the medical domain where multi-centric data heterogeneity is common. Building on previous domain adaptation works, this paper proposes a novel federated learning approach for deep learning architectures via the introduction of local-statistic batch normalization (BN) layers, resulting in collaboratively-trained, yet center-specific models. This strategy improves robustness to data heterogeneity while also reducing the potential for information leaks by not sharing the center-specific layer activation statistics. We benchmark the proposed method on the classification of tumorous histopathology image patches extracted from the Camelyon16 and Camelyon17 datasets. We show that our approach compares favorably to previous state-of-the-art methods, especially for transfer learning across datasets.
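One way to picture the aggregation step described above is a federated average that skips the batch normalization statistics, so that each center keeps its own. The sketch below makes several assumptions that are not taken from the paper: parameters are stored as dictionaries of float lists, local statistics are recognized by the hypothetical name suffixes `running_mean` and `running_var`, and all centers are weighted equally.

```python
LOCAL_SUFFIXES = ("running_mean", "running_var")  # assumed naming of BN statistics


def aggregate_with_local_bn_stats(center_states):
    """Average shared parameters across centers; keep BN statistics local.

    center_states is a list with one dict per center, mapping a parameter
    name to a list of floats. Returns the updated list of per-center dicts.
    """
    n_centers = len(center_states)
    shared = {}
    for name in center_states[0]:
        if name.endswith(LOCAL_SUFFIXES):
            continue  # never leaves the center, never averaged
        columns = zip(*(state[name] for state in center_states))
        shared[name] = [sum(col) / n_centers for col in columns]

    # Every center receives the averaged shared parameters and keeps its
    # own activation statistics untouched.
    return [{**state, **shared} for state in center_states]
```

After aggregation, the models are identical except for their normalization statistics, which gives collaboratively trained yet center-specific models.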
Deep learning methods for digital pathology analysis have proved an effective way to address multiple clinical questions, from diagnosis to prognosis and even to prediction of treatment outcomes. They have also recently been used to predict gene mutations from pathology images, but no comprehensive evaluation of their potential for extracting molecular features from histology slides has yet been performed.
We propose a novel approach based on the integration of multiple data modes, and show that our deep learning model, HE2RNA, can be trained to systematically predict RNA-Seq profiles from whole-slide images alone, without the need for expert annotation. HE2RNA is interpretable by design, opening up new opportunities for virtual staining. In fact, it provides virtual spatialization of gene expression, as validated by double-staining on an independent dataset. Moreover, the transcriptomic representation learned by HE2RNA can be transferred to improve predictive performance for other tasks, particularly for small datasets.
As an example of a task with direct clinical impact, we studied the prediction of microsatellite instability from hematoxylin & eosin stained images and our results show that better performance can be achieved in this setting.
Building machine learning models from decentralized datasets located in different centers with federated learning (FL) is a promising approach to circumvent local data scarcity while preserving privacy. However, the prominent Cox proportional hazards (PH) model, used for survival analysis, does not fit the FL framework, as its loss function is non-separable with respect to the samples. The naïve method to bypass this non-separability consists in calculating the losses per center, and minimizing their sum as an approximation of the true loss. We show that the resulting model may suffer from substantial performance loss in some adverse settings. Instead, we leverage the discrete-time extension of the Cox PH model to formulate survival analysis as a classification problem with a separable loss function. Using this approach, we train survival models using standard FL techniques on synthetic data, as well as real-world datasets from The Cancer Genome Atlas (TCGA), showing similar performance to a Cox PH model trained on aggregated data. Compared to previous works, the proposed method is more communication-efficient, more generic, and more amenable to using privacy-preserving techniques.
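The reformulation of survival analysis as classification can be sketched by turning each (time, event) pair into one binary label per time interval. The loss of a sample then depends on that sample alone, which is what makes it separable. The function names, the interval boundaries, and the convention of dropping the interval in which censoring occurs are assumptions made for this illustration rather than details taken from the paper.

```python
import math


def discretize(time, event, boundaries):
    """Return (labels, mask) over the intervals [b_j, b_{j+1}).

    labels[j] is 1 if the event happened in interval j; mask[j] is 1 if the
    interval contributes to the loss for this sample.
    """
    labels, mask = [], []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        if time >= end:
            labels.append(0)  # survived the whole interval
            mask.append(1)
        elif time >= start:
            labels.append(1 if event else 0)
            # Censored inside the interval: outcome unknown, so it is skipped.
            mask.append(1 if event else 0)
        else:
            labels.append(0)  # after the event or censoring: not at risk
            mask.append(0)
    return labels, mask


def sample_loss(hazards, labels, mask):
    """Binary cross-entropy over at-risk intervals for ONE sample."""
    eps = 1e-12
    return -sum(
        m * (y * math.log(h + eps) + (1 - y) * math.log(1 - h + eps))
        for h, y, m in zip(hazards, labels, mask)
    )
```

Because the total loss is a plain sum of `sample_loss` over samples, each center can compute its own contribution and standard federated averaging applies.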
The purpose of this study was to build and train a deep convolutional neural network (CNN) algorithm to segment muscular body mass (MBM) to predict muscular surface from a two-dimensional axial computed tomography (CT) slice through the L3 vertebra.
An ensemble of 15 deep learning models with a two-dimensional U-net architecture with a 4-level depth and 18 initial filters was trained to segment MBM. The muscular surface values were computed from the predicted masks and corrected with the algorithm’s estimated bias. The resulting mask prediction and surface prediction were assessed using Dice similarity coefficient (DSC) and root mean squared error (RMSE) scores, respectively, using ground truth masks as standards of reference.
A total of 1025 individual CT slices were used for training and validation and 500 additional axial CT slices were used for testing. The obtained mean DSC and RMSE on the test set were 0.97 and 3.7 cm², respectively.
Deep learning methods using convolutional neural networks enable a robust and automated extraction of CT-derived MBM for sarcopenia assessment, which could be implemented in a clinical workflow.
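The step from a predicted mask to a surface value, and the RMSE used to assess it, can be sketched as below. The function names and the nested-list mask with a `(row_mm, col_mm)` pixel spacing are assumptions for illustration; the bias correction mentioned above is not reproduced here.

```python
import math


def mask_area_cm2(mask, pixel_spacing_mm):
    """Area in cm² of a binary 2D mask given (row_mm, col_mm) pixel spacing."""
    n_pixels = sum(1 for row in mask for value in row if value)
    area_mm2 = n_pixels * pixel_spacing_mm[0] * pixel_spacing_mm[1]
    return area_mm2 / 100.0  # 1 cm² = 100 mm²


def rmse(predicted, reference):
    """Root mean squared error between two equally long lists of values."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))
```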
With 15% of severe cases among hospitalized patients, the SARS-COV-2 pandemic has put tremendous pressure on Intensive Care Units, and made the identification of early predictors of future severity a public health priority. We collected clinical and biological data, as well as CT scan images and radiology reports from 1,003 coronavirus-infected patients from two French hospitals. Radiologists’ manual CT annotations were also available. We first identified 11 clinical variables and 3 types of radiologist-reported features significantly associated with prognosis. Next, focusing on the CT images, we trained deep learning models to automatically segment the scans and reproduce radiologists’ annotations. We also built CT image-based deep learning models that predicted future severity better than models based on the radiologists’ scan reports. Finally, we showed that including CT scan features alongside the clinical and biological data yielded more accurate predictions than using clinical and biological data alone. These findings show that CT scans provide insightful early predictors of future severity.
Histological subtypes of malignant pleural mesothelioma are a major prognostic indicator and decision denominator for all therapeutic strategies. In ambiguous cases, a rare transitional (TM) pattern may be diagnosed by pathologists as either epithelioid (EM), biphasic (BM) or sarcomatoid (SM) mesothelioma. The aims of this study were to better characterize the TM subtype from a morphological, immunohistochemical, and molecular standpoint; deep learning of pathological slides was applied to this cohort.
Methods A random selection of 49 representative digitalized sections from surgical biopsies of TM was reviewed by 16 panelists. We evaluated BAP1 expression and p16 homozygous deletion (HD). We conducted a comprehensive integrated transcriptomic analysis. An unsupervised deep learning algorithm was trained to classify tumors.
Results The 16 panelists recorded 784 diagnoses on the 49 cases. Although the Kappa value of 0.42 was moderate, the presence of a TM component was diagnosed in 51%. In the remaining 49%, the reviewers classified the lesion as EM in 53%, SM in 33%, or BM in 14%. Median survival was 6.7 months. Loss of BAP1, observed in 44%, was less frequent in TM than in EM and BM. p16 HD was more frequent in TM (73%), followed by BM (63%) and SM (46%). RNA sequencing unsupervised clustering analysis showed that TM grouped together and were closer to SM than to EM. Deep learning analysis achieved a 94% accuracy for TM identification.
Conclusion These results demonstrated that the TM pattern should be classified as non-epithelioid mesothelioma, at minimum as a subgroup of the SM type.
Standardized and robust risk stratification systems for patients with hepatocellular carcinoma (HCC) are required to improve therapeutic strategies and investigate the benefits of adjuvant systemic therapies after curative resection/ablation. In this study, we used two deep‐learning algorithms based on whole‐slide digitized histological slides (WSI) to build models for predicting the survival of patients with HCC treated by surgical resection. Two independent series were investigated: a discovery set (Henri Mondor Hospital, n=194) used to develop our algorithms and an independent validation set (TCGA, n=328). WSIs were first divided into small squares (“tiles”) and features were extracted with a pretrained convolutional neural network (preprocessing step). The first deep‐learning based algorithm (“SCHMOWDER”) uses an attention mechanism on tumoral areas annotated by a pathologist while the second (“CHOWDER”) does not require human expertise. In the discovery set, c‐indexes for survival prediction of SCHMOWDER and CHOWDER reached 0.78 and 0.75, respectively. Both models outperformed a composite score incorporating all baseline variables associated with survival. The prognostic value of the models was further validated in the TCGA dataset, and, as observed in the discovery series, both models had a higher discriminatory power than a score combining all baseline variables associated with survival. Pathological review showed that the tumoral areas most predictive of poor survival were characterized by vascular spaces, the macrotrabecular architectural pattern and a lack of immune infiltration.
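The c-index reported above measures how often a model ranks two patients in the correct order of survival. A minimal sketch of Harrell's concordance index follows; the function name and the simplification of ignoring pairs with tied survival times are choices made for this illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored data.

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risks:  predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier one in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count for half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the 0.78 and 0.75 figures.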
Introduction The aim of the study was to extract anthropometric measures from CT by deep learning and to evaluate their prognostic value in patients with non-small-cell lung cancer (NSCLC). Methods A convolutional neural network was trained to perform automatic segmentation of subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT), and muscular body mass (MBM) from low-dose CT images in 189 patients with NSCLC who underwent pretherapy PET/CT. After a fivefold cross-validation in a subset of 35 patients, anthropometric measures extracted by deep learning were normalized to the body surface area (BSA) to control for varying patient morphologies. VAT/SAT ratio and clinical parameters were included in a Cox proportional-hazards model for progression-free survival (PFS) and overall survival (OS). Results Inference time for a whole volume was about 3 s. Mean Dice similarity coefficients in the validation set were 0.95, 0.93, and 0.91 for SAT, VAT, and MBM, respectively. For PFS prediction, T-stage, N-stage, chemotherapy, radiation therapy, and VAT/SAT ratio were associated with disease progression on univariate analysis. On multivariate analysis, only N-stage (HR = 1.7 [1.2–2.4]; p = 0.006), radiation therapy (HR = 2.4 [1.0–5.4]; p = 0.04), and VAT/SAT ratio (HR = 10.0 [2.7–37.9]; p < 0.001) remained significant prognosticators. For OS, male gender, smoking status, N-stage, a lower SAT/BSA ratio, and a higher VAT/SAT ratio were associated with mortality on univariate analysis. On multivariate analysis, male gender (HR = 2.8 [1.2–6.7]; p = 0.02), N-stage (HR = 2.1 [1.5–2.9]; p < 0.001), and the VAT/SAT ratio (HR = 7.9 [1.7–37.1]; p < 0.001) remained significant prognosticators. Conclusion The BSA-normalized VAT/SAT ratio is an independent predictor of both PFS and OS in NSCLC patients.
Key Points
• Deep learning will make CT-derived anthropometric measures clinically usable as they are currently too time-consuming to calculate in routine practice.
• Whole-body CT-derived anthropometrics in non-small-cell lung cancer are associated with progression-free survival and overall survival.
• A priori medical knowledge can be implemented in the neural network loss function calculation.
Deep learning frameworks leverage GPUs to perform massively-parallel computations over batches of many training examples efficiently. However, for certain tasks, one may be interested in performing per-example computations, for instance using per-example gradients to evaluate a quantity of interest unique to each example. One notable application comes from the field of differential privacy, where per-example gradients must be norm-bounded in order to limit the impact of each example on the aggregated batch gradient. In this work, we discuss how per-example gradients can be efficiently computed in convolutional neural networks (CNNs). We compare existing strategies by performing a few steps of differentially-private training on CNNs of varying sizes. We also introduce a new strategy for per-example gradient calculation, which is shown to be advantageous depending on the model architecture and how the model is trained. This is a first step in making differentially-private training of CNNs practical.
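The role of per-example gradients in differentially-private training can be illustrated on a much simpler model than a CNN. The sketch below uses logistic regression, for which each per-example gradient has a closed form, and applies the clip-sum-noise recipe of DP-SGD. It does not implement any of the CNN strategies compared in this work, and all function names and parameters are assumptions for illustration.

```python
import math
import random


def per_example_grads(weights, xs, ys):
    """Gradient of the logistic loss computed separately for each example."""
    grads = []
    for x, y in zip(xs, ys):
        z = sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        grads.append([(p - y) * xi for xi in x])
    return grads


def clip(grad, max_norm):
    """Rescale one gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_batch_gradient(grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in grads]
    dim = len(grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(grads) for v in noisy]
```

The clipping step is why the gradients are needed per example: the norm bound has to be enforced before the batch is summed, so a single batch-level gradient is not enough.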
Malignant mesothelioma (MM) is an aggressive cancer primarily diagnosed on the basis of histological criteria. The World Health Organization classification subdivides mesothelioma tumors into three histological types: epithelioid MM (EMM), biphasic MM (BMM), and sarcomatoid MM (SMM). MM is a highly complex and heterogeneous disease, which makes its diagnosis and histological typing difficult and leads to suboptimal patient care and treatment decisions.
Here, we developed a new approach based on deep convolutional neural networks (CNNs) called MesoNet to accurately predict the overall survival (OS) of mesothelioma patients from whole slide digitized images (WSIs) without any pathologist-provided locally annotated regions. We validated MesoNet on both an internal validation cohort from the French MESOBANK and an independent cohort from The Cancer Genome Atlas (TCGA). We demonstrated that the model was more accurate in predicting patient survival than using current pathology practices. Furthermore, unlike classical black-box deep learning methods, MesoNet identified regions contributing to patient outcome prediction.
Strikingly, we found that these regions are mainly located in the stroma and are histological features associated with inflammation, cellular diversity and vacuolization. These findings suggest that deep learning models can identify new features predictive of patient survival and potentially lead to new biomarker discoveries.
Crohn's disease (CD) is a complex genetic disorder for which more than 140 genes have been identified using genome wide association studies (GWAS). However, the genetic architecture of the trait remains largely unknown. The recent development of machine learning (ML) approaches prompted us to apply them to classify healthy and diseased people according to their genomic information.
The Immunochip dataset containing 18,227 CD patients and 34,050 healthy controls enrolled and genotyped by the international Inflammatory Bowel Disease genetic consortium (IIBDGC) has been re-analyzed using a set of ML methods: penalized logistic regression (LR), gradient boosted trees (GBT) and artificial neural networks (NN). The main score used to compare the methods was the Area Under the ROC Curve (AUC) statistic. The impact of quality control (QC), imputing and coding methods on LR results showed that QC methods and imputation of missing genotypes may artificially increase the scores. Conversely, neither the patient/control ratio nor marker preselection or coding strategies significantly affected the results. LR methods, including Lasso, Ridge and ElasticNet, provided similar results with a maximum AUC of 0.80. GBT methods like XGBoost, LightGBM and CatBoost, together with dense NN with one or more hidden layers, provided similar AUC values, suggesting limited epistatic effects in the genetic architecture of the trait.
ML methods detected nearly all the genetic variants previously identified by GWAS among the best predictors, plus additional predictors with lower effects. The robustness and complementarity of the different methods are also studied. Compared to LR, non-linear models such as GBT or NN may provide robust complementary approaches to identify and classify genetic markers.
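The AUC used to compare the methods above has a simple pairwise interpretation: it is the probability that a randomly chosen patient receives a higher score than a randomly chosen control. The sketch below computes it directly from that definition. The function name is an assumption, and the quadratic pair loop is for clarity only; it would be slow on a dataset of the size described here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from its pairwise (Mann-Whitney) definition.

    labels: 1 for cases, 0 for controls
    scores: predicted scores, higher meaning more likely to be a case
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count for half
    return wins / (len(positives) * len(negatives))
```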
Timely assessment of compound toxicity is one of the biggest challenges facing the pharmaceutical industry today. A significant proportion of compounds identified as potential leads are ultimately discarded due to the toxicity they induce.
In this paper, we propose a novel machine learning approach for the prediction of molecular activity on ToxCast targets. We combine extreme gradient boosting with fully-connected and graph-convolutional neural network architectures trained on QSAR physical molecular property descriptors, PubChem molecular fingerprints, and SMILES sequences. Our ensemble predictor leverages the strengths of each individual technique, significantly outperforming existing state-of-the-art models on the ToxCast and Tox21 toxicity-related bioactivity-prediction datasets.
We provide free access to molecule bioactivity prediction using our model at http://toxicblend.owkin.com.
The purpose of this study was to assess the potential of a deep learning model to discriminate between benign and malignant breast lesions using magnetic resonance imaging (MRI) and characterize different histological subtypes of breast lesions.
We developed a deep learning model that simultaneously learns to detect lesions and characterize them. We created a lesion-characterization model based on a single two-dimensional T1-weighted fat suppressed MR image obtained after intravenous administration of a gadolinium chelate selected by radiologists. The data included 335 MR images from 335 patients, representing 17 different histological subtypes of breast lesions grouped into four categories (mammary gland, benign lesions, invasive ductal carcinoma and other malignant lesions). Algorithm performance was evaluated on an independent test set of 168 MR images using weighted sums of the area under the curve (AUC) scores. We obtained a cross-validation score of 0.817 weighted average receiver operating characteristic (ROC)-AUC on the training set computed as the mean of three-shuffle three-fold cross-validation. Our model reached a weighted mean AUC of 0.816 on the independent challenge test set.
This study shows good performance of a supervised-attention model with deep learning for breast MRI. This method should be validated on a larger and independent cohort.
The purpose of this study was to create an algorithm that simultaneously detects and characterizes (benign vs. malignant) focal liver lesion (FLL) using deep learning.
We trained our algorithm on a dataset proposed during a data challenge organized at the 2018 Journées Francophones de Radiologie. The dataset was composed of 367 two-dimensional ultrasound images from 367 individual livers, captured at various institutions. The algorithm was guided using an attention mechanism with annotations made by a radiologist. The algorithm was then tested on a new dataset from 177 patients. The models reached mean ROC-AUC scores of 0.935 for FLL detection and 0.916 for FLL characterization over three shuffled three-fold cross-validations performed with the training data. On the new dataset of 177 patients, our models reached a weighted mean ROC-AUC score of 0.891 for seven different tasks.
This study, which uses a supervised-attention mechanism, focused on FLL detection and characterization from liver ultrasound images. This method could prove to be highly relevant for medical imaging once validated on a larger independent cohort.
Healthcare is an industry that raises the highest hopes regarding the potential benefits of Artificial Intelligence (AI). Physicians and medical researchers will not become programmers or data scientists overnight, nor will they be replaced by them, but they will need an understanding of what AI actually is and how it works. Similarly, data scientists will need to collaborate closely with doctors to focus on relevant medical questions and understand the patients behind the data.
This case study aims to connect both audiences (physicians/medical personnel and data scientists) by providing insights into how to apply machine learning to a specific medical use case. We will walk you through the reasoning of our approach and will enable you to accompany us on a practical journey (via our Colab notebook) focused on understanding the underlying mechanics of an applied machine learning model.
Our experiment focuses on creating and comparing algorithms of increasing complexity in a successful attempt to estimate the physiological age of a brain based on Magnetic Resonance Imaging (MRI) data. Based on this experiment we propose how this imaging biomarker could have an impact on the understanding of neurodegenerative diseases such as Alzheimer’s.
Restricted Boltzmann machines (RBMs) are energy-based neural networks which are commonly used as the building blocks for deep neural architectures.
In this work, we derive a deterministic framework for the training, evaluation, and use of RBMs based upon the Thouless-Anderson-Palmer (TAP) mean-field approximation of widely-connected systems with weak interactions coming from spin-glass theory. While the TAP approach has been extensively studied for fully-visible binary spin systems, our construction is generalized to latent-variable models, as well as to arbitrarily distributed real-valued spin systems with bounded support.
In our numerical experiments, we demonstrate the effective deterministic training of our proposed models and are able to show interesting features of unsupervised learning which could not be directly observed with sampling. Additionally, we demonstrate how to utilize our TAP-based framework for leveraging trained RBMs as joint priors in denoising problems.
Analysis of histopathology slides is a critical step for many diagnoses, and in particular in oncology where it defines the gold standard. In the case of digital histopathological analysis, highly trained pathologists must review vast whole-slide images of extreme digital resolution (100,000² pixels) across multiple zoom levels in order to locate abnormal regions of cells, or in some cases single cells, out of millions. The application of deep learning to this problem is hampered not only by small sample sizes, as typical datasets contain only a few hundred samples, but also by the generation of ground-truth localized annotations for training interpretable classification and segmentation models.
We propose a method for disease localization in the context of weakly supervised learning, where only image-level labels are available during training. Even without pixel-level annotations, we are able to demonstrate performance comparable with models trained with strong annotations on the Camelyon-16 lymph node metastases detection challenge.
We accomplish this through the use of pre-trained deep convolutional networks, feature embedding, as well as learning via top instances and negative evidence, a multiple instance learning technique from the field of semantic segmentation and object detection.
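The idea of learning from top instances and negative evidence can be sketched as a pooling step over per-tile scores: only the highest and lowest scoring tiles of a slide are kept and passed to the slide-level classifier. The code below is an illustration under stated assumptions, not the published model. The function names are hypothetical, and a single linear layer stands in for whatever classifier consumes the pooled scores.

```python
def min_max_pooling(tile_scores, r):
    """Keep the r highest and the r lowest tile scores of one slide."""
    if r < 1 or 2 * r > len(tile_scores):
        raise ValueError("need at least 2 * r tiles and r >= 1")
    ordered = sorted(tile_scores, reverse=True)
    # Top instances support the positive class; the lowest scores act as
    # negative evidence.
    return ordered[:r] + ordered[-r:]


def slide_score(tile_scores, r, weights, bias=0.0):
    """Slide-level score from the pooled tile scores (linear stand-in)."""
    pooled = min_max_pooling(tile_scores, r)
    if len(weights) != len(pooled):
        raise ValueError("expected one weight per pooled score")
    return bias + sum(w * s for w, s in zip(weights, pooled))
```

Because the selected tiles are known, their positions can be mapped back onto the slide, which is how a model trained only with image-level labels can still localize the regions driving its prediction.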
Detection of interactions between treatment effects and patient descriptors in clinical trials is critical for optimizing the drug development process. The increasing volume of data accumulated in clinical trials provides a unique opportunity to discover new biomarkers and further the goal of personalized medicine, but it also requires innovative robust biomarker detection methods capable of detecting non-linear, and sometimes weak, signals.
We propose a set of novel univariate statistical tests, based on the theory of random walks, which are able to capture non-linear and non-monotonic covariate-treatment interactions. We also propose a novel combined test, which leverages the power of all of our proposed univariate tests into a single general-case tool.
We present results for both synthetic trials as well as real-world clinical trials, where we compare our method with state-of-the-art techniques and demonstrate the utility and robustness of our approach.