July 16, 2019
Scientific Reports

Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data

Biology
Abstract

Timely assessment of compound toxicity is one of the biggest challenges facing the pharmaceutical industry today. A significant proportion of compounds identified as potential leads are ultimately discarded due to the toxicity they induce. In this paper, we propose a novel machine learning approach for the prediction of molecular activity on ToxCast targets.

We combine extreme gradient boosting with fully-connected and graph-convolutional neural network architectures trained on QSAR physical molecular property descriptors, PubChem molecular fingerprints, and SMILES sequences. Our ensemble predictor leverages the strengths of each individual technique, significantly outperforming existing state-of-the-art models on the ToxCast and Tox21 toxicity-related bioactivity-prediction datasets. We provide free access to molecule bioactivity prediction using our model.
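
The abstract does not state how the outputs of the individual models are combined. As an illustration only, the sketch below shows one common way to build an ensemble: a weighted average of per-model activity probabilities (soft voting). The function name `ensemble_average`, the equal default weights, the 0.5 decision threshold, and the example probabilities are all assumptions made for this sketch, not details taken from the paper.

```python
def ensemble_average(per_model_probs, weights=None):
    """Combine per-model activity probabilities by weighted averaging.

    per_model_probs: list of lists, one inner list per model, each holding
                     the predicted probability of activity for every molecule.
    weights:         optional per-model weights (assumed equal if omitted).
    """
    n_models = len(per_model_probs)
    if n_models == 0:
        raise ValueError("need at least one model")
    n_samples = len(per_model_probs[0])
    if any(len(p) != n_samples for p in per_model_probs):
        raise ValueError("all models must score the same molecules")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = sum(weights)
    # Weighted mean of the model probabilities, molecule by molecule.
    return [
        sum(w * probs[i] for w, probs in zip(weights, per_model_probs)) / total
        for i in range(n_samples)
    ]


# Made-up probabilities for three molecules from three hypothetical models
# (gradient boosting, fully-connected network, graph-convolutional network).
xgb_probs = [0.9, 0.2, 0.6]
fc_probs = [0.8, 0.1, 0.4]
gcn_probs = [0.7, 0.3, 0.8]

combined = ensemble_average([xgb_probs, fc_probs, gcn_probs])
# Assumed decision threshold of 0.5 for calling a molecule "active".
labels = [p >= 0.5 for p in combined]
```

Averaging probabilities rather than hard labels lets a confident model outweigh two uncertain ones. Other combination schemes, such as stacking with a learned meta-model, are equally plausible readings of "ensemble" here.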

Authors
Alberto Romagnoni
Simon Jegou
Kristel Van Steen
Gilles Wainrib
Jean-Pierre Hugot
International Inflammatory Bowel Disease Genetics Consortium (IIBDGC)