Outer product-based convolutional neural network to predict the outcome of clinical trials

Researchers at Dankook University in Korea have put forward an outer product-based convolutional neural network (OPCNN) model to accurately predict the outcome of drug clinical trials. The development of reliable models to predict drug performance would greatly increase the speed of drug discovery and development by helping to remove drugs that are unlikely to be successful from the pipeline. 

Current hindrances in drug development

Successful drug development has been increasingly hindered by unexpected adverse effects occurring at various stages of clinical trials. One way to address this problem is to screen accurately for drugs with a high probability of failure so that they can be removed at the early stages of development. This would elevate the success rate of clinical trials and make efficacious drugs available to patients more rapidly.

For this purpose, PrOCTOR was previously developed by a separate group at Weill Cornell Medicine. In summary, this data-driven approach uses random forests that combine chemical features of drugs and target-based properties to predict the outcome of clinical trials.

Building upon PrOCTOR, the researchers behind this study developed a two-dimensional OPCNN model that accurately predicts clinical trial success and failure. The OPCNN model integrates biological network features, genotype-tissue expression (GTEx) features and target loss frequency.

Dataset used in the evaluation of their convolutional neural network

To assess the proposed OPCNN, the researchers employed a dataset containing 757 approved drugs and 71 failed drugs. They used 47 input features to describe each drug: 10 molecular properties, 34 target-based properties and 3 drug-likeness rule outcomes. Of the target-based features, 30 capture the median expression of known gene targets in 30 tissue types, while the remaining 4 encapsulate network connectivity and the loss-of-function mutation frequency of target genes. The drug-likeness rules used were Lipinski’s Rule of Five, Veber’s rule and Ghose’s rule, each of which proposes a set of physicochemical parameters for evaluating the druggability of candidate biomolecules.
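As an illustration of what a drug-likeness rule outcome can look like, the sketch below checks Lipinski’s Rule of Five using its commonly cited thresholds (molecular weight of at most 500 Da, logP of at most 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors). The `Molecule` container, the function names and the choice to tolerate one violation are assumptions made for this example; the article does not say how the researchers encoded the rule outcomes.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Hypothetical container for precomputed descriptors (not from the paper)."""
    molecular_weight: float  # in daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(m: Molecule) -> int:
    """Count how many of the four Rule of Five thresholds are exceeded."""
    checks = [
        m.molecular_weight > 500,
        m.log_p > 5,
        m.h_bond_donors > 5,
        m.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_lipinski(m: Molecule, max_violations: int = 1) -> bool:
    """Assumed convention: a compound passes with at most one violation."""
    return lipinski_violations(m) <= max_violations


# Illustrative descriptor values only, not taken from the study's dataset.
small = Molecule(molecular_weight=180.2, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Molecule(molecular_weight=1202.6, log_p=3.0, h_bond_donors=5, h_bond_acceptors=12)

print(passes_lipinski(small), passes_lipinski(large))
```

A pass or fail flag of this kind would yield one feature per rule, which is consistent with the 3 rule outcomes listed among the 47 features.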

The authors noted that the dataset has an imbalance ratio of roughly 10.66 passed drugs to each failed drug (757 to 71). To address the resulting class-imbalance problem, they combined the synthetic minority oversampling technique (SMOTE) with cost-sensitive learning methods. This approach balances the class distribution by oversampling the minority class (the failed drugs) and undersampling the majority class (the passed drugs). A heavier cost is then placed on minority-class misclassifications, which helps to mitigate the overfitting and loss of useful information that resampling alone can cause.
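The two ideas in this paragraph can be sketched in a few lines. The example below computes the ratio of passed to failed drugs from the counts given above, derives inverse-frequency class weights as one common form of cost-sensitive learning, and shows the interpolation step at the heart of SMOTE. The function names and the weighting formula are assumptions for illustration; the article does not state which cost scheme or SMOTE settings the researchers used.

```python
import random

n_passed, n_failed = 757, 71
imbalance_ratio = n_passed / n_failed  # majority count divided by minority count


def balanced_class_weights(counts):
    """Weight each class inversely to its frequency:
    n_total / (n_classes * n_samples_in_class)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


def smote_like_sample(x, neighbor, rng):
    """Create one synthetic minority sample on the line segment between
    a minority sample x and one of its minority-class neighbours."""
    gap = rng.random()  # random position along the segment, in [0, 1)
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]


weights = balanced_class_weights({"passed": n_passed, "failed": n_failed})
rng = random.Random(0)
synthetic = smote_like_sample([0.0, 0.0], [1.0, 2.0], rng)

print(round(imbalance_ratio, 3), weights, synthetic)
```

With these weights, an error on a failed drug costs as many times more than an error on a passed drug as the imbalance ratio itself, so both classes contribute equally to the total loss.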

Outer product-based convolutional neural network model development

In the OPCNN model, each drug, i, is represented by a pair consisting of a feature vector, x_i, and its corresponding clinical outcome, y_i. The feature vector itself has two parts: a chemical feature vector and a target-based feature vector. Because the data is therefore bimodal, the researchers adopted an augmented outer product to join the two modalities.
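The article does not define what makes the outer product "augmented". One common construction, assumed here, is to prepend a constant 1 to each vector before taking the outer product, so that the resulting matrix retains the original features of each modality alongside every pairwise cross-modal product. The function name and the example vectors below are hypothetical.

```python
def augmented_outer_product(u, v):
    """Outer product of [1, u] and [1, v].

    The leading 1s keep the original single-modality features in the first
    row and first column; the remaining entries are the pairwise products
    of one feature from each modality.
    """
    u_aug = [1.0] + list(u)
    v_aug = [1.0] + list(v)
    return [[a * b for b in v_aug] for a in u_aug]


chem = [0.5, 2.0]         # hypothetical chemical representation
target = [3.0, 4.0, 5.0]  # hypothetical target-based representation

grid = augmented_outer_product(chem, target)
for row in grid:
    print(row)
# [1.0, 3.0, 4.0, 5.0]
# [0.5, 1.5, 2.0, 2.5]
# [2.0, 6.0, 8.0, 10.0]
```

The result is a two-dimensional grid, which is the kind of input a 2D convolutional network can scan for local interaction patterns.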

Taken directly from the paper by Seo et al., Figure 1 below outlines how OPCNN predicts successes and failures of clinical trials. The model comprises 3 residual blocks and 5 fully connected layers. Representative feature vectors are first calculated from the chemical and target-based feature vectors. The outer product of the two representative feature vectors is then calculated. Finally, a 2D convolutional neural network is applied to extract features from the outer product and predict the outcome of clinical trials.

Figure 1. (A) The OPCNN workflow in predicting clinical trial outcomes. The fully connected (FC) layers contain different numbers of nodes, as represented in the parentheses. (B) Each residual block contains 3 convolution layers.

OPCNN accurately predicts clinical trial outcomes

The researchers evaluated the performance of the OPCNN model with 10-fold cross-validation using several statistical metrics. The cross-validation was repeated 20 times to generate reliable results, and the means were reported. The model achieved up to 0.9758 for accuracy and up to 0.9868 for F1-score. The F1-score is more useful than accuracy for imbalanced classes because it is the harmonic mean of precision and recall and therefore accounts for both false positives and false negatives. The model reached up to 0.9889 for precision, the proportion of predicted positives that are truly positive, and up to 0.9893 for recall, the proportion of actual positives that the model correctly identifies.
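For reference, all four metrics in this paragraph can be computed from the counts in a confusion matrix. The sketch below uses the standard definitions; the counts are invented for a single hypothetical test fold and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts.

    tp/fp: true/false positives, fn/tn: false/true negatives.
    Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical fold of 83 drugs, with "passed" treated as the positive class.
fold = classification_metrics(tp=75, fp=1, fn=1, tn=6)
print({name: round(value, 4) for name, value in fold.items()})
```

In a repeated cross-validation setup such as the one described, a function like this would be called once per fold and the results averaged across all folds and repetitions.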

Two ranking-based metrics were also used to assess the model: the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPRC). The model showed up to 0.9824 for AUC and up to 0.9979 for AUPRC. The latter is generally preferred over AUC for evaluating imbalanced classes. The Matthews correlation coefficient (MCC), a reliable measure of classification performance for imbalanced biomedical datasets, was also reported; the model achieved up to 0.8451.
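A short calculation shows why MCC is informative here. The sketch below implements the standard MCC formula and applies it to a trivial baseline that predicts every drug will pass, using the dataset counts given earlier (757 passed, 71 failed). Treating "passed" as the positive class and returning 0.0 when the denominator is zero are assumptions of this example.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Baseline that labels all 828 drugs as "passed".
always_pass_accuracy = 757 / (757 + 71)
always_pass_mcc = matthews_corrcoef(tp=757, fp=71, fn=0, tn=0)

# A perfect classifier on the same data, for comparison.
perfect_mcc = matthews_corrcoef(tp=757, fp=0, fn=0, tn=71)

print(round(always_pass_accuracy, 4), always_pass_mcc, perfect_mcc)
```

The baseline scores above 0.91 on accuracy while its MCC is 0, which is why a high MCC is stronger evidence than high accuracy that a model has learned to recognise the minority class.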

Notably, the model demonstrated high means in F1-score, AUPRC and MCC, which are all reliable metrics for imbalanced classification. These results suggest that OPCNN is an effective model for predicting the successes and failures of drug clinical trials. However, one limitation is that the dataset contains relatively few samples for a deep learning model. Moving forward, the researchers hope to apply OPCNN to a larger dataset to verify its efficacy.


This study proposes an OPCNN model that predicts drug performance based on chemical features of drugs and target-based features. The model exhibited good predictive accuracy following 10-fold cross-validation. With further validation, OPCNN may represent an effective means to determine the outcome of clinical trials. This will allow for the elimination of drug candidates with a low probability of success early in the clinical trial pipeline, thereby accelerating drug discovery and development.

Image credit: topntp26 – Freepik 
