A recent study compared network-based machine learning algorithms for link prediction, with applications to drug-target and drug-drug interactions.
Network-based machine learning approaches
Traditional machine learning approaches for predicting drug-target interactions (DTIs) face several constraints: the underlying data are high-dimensional, incomplete, sparse and heterogeneous. Additionally, the hierarchical nature of biological connections cannot be easily modelled by traditional machine learning approaches. Applying these approaches to large datasets also requires extensive pre-processing of the data, which can make the process impractical. Therefore, there is a need for methods and models capable of addressing these issues.
More recently, network-based machine learning approaches have been gaining attention because of their simplicity. These approaches can handle high dimensionality and heterogeneity, and can capture implicit relationships. This study aimed to evaluate the performance of a variety of network-based machine learning algorithms using publicly available pharmacological datasets, and reported the performance of each model according to several evaluation metrics.
Methodology behind network link prediction
To carry out their study, the researchers used network-based link prediction models to address the following drug discovery problems:
- Drug-target interaction prediction – predicting which drug will affect which protein for drug repurposing.
- Drug-drug side effect prediction – from existing side effect data, the researchers created a network in which a link between two drugs indicates that the combination has shown some side effect. This allows the researchers to predict whether new drug combinations will result in side effects.
- Disease-gene association prediction – predicting which diseases are associated with which genes.
- Disease-drug association predictions – predicting which drug is associated with which disease.
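All four problems share the same framing: known associations become the edges of a network, and every pair that is not yet connected is a candidate link to be scored. As a minimal sketch of that framing, the Python below scores unconnected drug pairs with the Jaccard index, a simple neighbourhood-based similarity. The drug names and links are hypothetical placeholders, and the Jaccard index is used here only for illustration; the article does not say whether it was among the models evaluated.

```python
from itertools import combinations

# Hypothetical drug-drug side-effect pairs (illustrative only, not from the study).
known_links = [
    ("drug_a", "drug_b"),
    ("drug_a", "drug_c"),
    ("drug_b", "drug_c"),
    ("drug_b", "drug_d"),
    ("drug_c", "drug_d"),
    ("drug_d", "drug_e"),
]


def build_adjacency(links):
    """Turn a list of undirected links into a node -> set-of-neighbours map."""
    adjacency = {}
    for u, v in links:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def jaccard(adjacency, u, v):
    """Shared neighbours divided by all neighbours of the two nodes."""
    union = adjacency[u] | adjacency[v]
    if not union:
        return 0.0
    return len(adjacency[u] & adjacency[v]) / len(union)


def rank_candidate_links(links):
    """Score every unconnected pair and sort from most to least likely."""
    adjacency = build_adjacency(links)
    candidates = [
        (u, v)
        for u, v in combinations(sorted(adjacency), 2)
        if v not in adjacency[u]
    ]
    scored = [(jaccard(adjacency, u, v), u, v) for u, v in candidates]
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scored, key=lambda item: (-item[0], item[1], item[2]))


ranking = rank_candidate_links(known_links)
for score, u, v in ranking:
    print(f"{u} - {v}: {score:.3f}")
```

In this toy network the pair drug_a and drug_d ranks first because the two drugs share two neighbours. A real evaluation would hide a portion of the known links and check how highly the model ranks them.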
The researchers applied 32 network-based machine learning models to five publicly available biomedical datasets, and evaluated their performance using three metrics: AUROC, AUPR and F1-score. AUROC (the area under the receiver operating characteristic curve) is used to evaluate classification models, as it measures a model's ability to discriminate between cases and non-cases.
AUPR (the area under the precision-recall curve) shows the trade-off between precision and recall across different thresholds. A high area under the curve represents both high recall and high precision, where high precision corresponds to a low false positive rate and high recall corresponds to a low false negative rate. Finally, the F1-score combines a model's precision and recall into a single value (their harmonic mean), so a high F1-score requires both precision and recall to be high.
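To make the three metrics concrete, here is a minimal sketch in plain Python, assuming binary labels (1 for a true link, 0 for a non-link) and one predicted score per candidate pair. The function names, the example data and the 0.5 threshold are illustrative assumptions. AUPR is approximated by average precision, which is one common estimate of the area under the precision-recall curve and not necessarily the estimator used in the study.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at each true positive.

    Assumes at least one positive; tied scores keep their input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


def f1_score(labels, scores, threshold=0.5):
    """Harmonic mean of precision and recall at a fixed score threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical predictions for six candidate links.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

print(f"AUROC: {auroc(labels, scores):.3f}")
print(f"AUPR (average precision): {average_precision(labels, scores):.3f}")
print(f"F1 at threshold 0.5: {f1_score(labels, scores):.3f}")
```

AUROC and AUPR are computed from the full ranking of scores, whereas the F1-score depends on the chosen threshold. This is one reason a single model need not be the best on all three metrics.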
On the Disease-Gene Association (DGA) dataset, the Average Commute Time (ACT) model achieved the best AUROC score, the Local Random Walk (LRW) model with 3 steps (LRW3) achieved the best AUPR score, and the LHN2 model with a parameter of 0.95 achieved the best F1-score. On the Drug-Disease Association (DDA) dataset, the ACT model again achieved the best AUROC score, the LRW model with 5 steps (LRW5) achieved the best AUPR score, and the LHN2 model with a parameter of 0.95 had the best F1-score. On the Drug-Target Interaction (DTI) dataset, NetMF was the best performer on all three metrics. Similarly, on the MATADOR dataset, NetMF performed best on all three metrics. On the Drug-Drug Interaction (DDI) dataset, by contrast, the Prone model was the best performer across all three metrics.
Overall, out of all the models tested, the Prone, ACT and LRW5 models performed best across the five benchmark datasets.
This study presents a comparative evaluation of network-based machine learning algorithms for network link prediction, with applications in the prediction of drug-target and drug-drug interactions. Across all of the benchmark datasets used in the study, the Prone, ACT and LRW5 models performed the best on average. This work can be used to guide researchers in selecting appropriate machine learning methods for drug discovery.
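The step counts attached to the LRW results refer to how long a random walker is allowed to wander through the network. The sketch below follows one common formulation of the local random walk index, in which the score of a pair combines the probability of walking from each node to the other in a fixed number of steps, weighted by node degree. The formulation, the function names and the toy network are assumptions for illustration; the article does not give the exact definition used in the study.

```python
def build_adjacency(links):
    """Turn a list of undirected links into a node -> set-of-neighbours map."""
    adjacency = {}
    for u, v in links:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def walk_distribution(adjacency, start, steps):
    """Probability of a walker being at each node after `steps` random moves."""
    distribution = {start: 1.0}
    for _ in range(steps):
        updated = {}
        for node, mass in distribution.items():
            # The walker moves to each neighbour with equal probability.
            share = mass / len(adjacency[node])
            for neighbour in adjacency[node]:
                updated[neighbour] = updated.get(neighbour, 0.0) + share
        distribution = updated
    return distribution


def lrw_score(adjacency, x, y, steps):
    """Degree-weighted probability of reaching y from x plus x from y."""
    total_degree = sum(len(neighbours) for neighbours in adjacency.values())
    q_x = len(adjacency[x]) / total_degree
    q_y = len(adjacency[y]) / total_degree
    pi_xy = walk_distribution(adjacency, x, steps).get(y, 0.0)
    pi_yx = walk_distribution(adjacency, y, steps).get(x, 0.0)
    return q_x * pi_xy + q_y * pi_yx


# Hypothetical toy network (node names are placeholders).
adjacency = build_adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")])

for steps in (1, 2, 3):
    print(steps, round(lrw_score(adjacency, "a", "d", steps), 4))
```

With a single step, the unconnected pair a and d scores zero because neither node can reach the other. Allowing more steps lets the walk pick up indirect paths, which is why the number of steps is treated as a model parameter.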