Researchers at Queen Mary University of London have developed a machine learning (ML) algorithm – DRUML – that ranks drugs based on their efficacy in reducing the growth of cancer cells. This approach could facilitate personalised therapies by enabling oncologists to select the best drug for each individual cancer patient.
Predicting efficacy of cancer drugs
Cancers derived from the same tissue of origin and pathological classification exhibit high levels of genetic and phenotypic variability between individuals. In theory, this heterogeneity can translate into patients responding differently to the same therapy. To address this problem, personalised medicine aims to identify measurable biomarkers that correlate with the efficacy of therapies in individuals, allowing clinicians to predict patient responses to these therapies.
For decades, researchers have used protein biomarkers to direct several targeted cancer therapies. Examples include expression of HER2 and oestrogen receptor, which predict the responses of breast cancer patients to trastuzumab and tamoxifen, respectively. Affordable next-generation sequencing has enabled the identification of genetic markers that predict responses to several targeted drugs. As a result, the majority of precision medicine approaches use DNA sequencing methods and other genetic analyses. However, despite success in some therapeutic contexts, predicting responses remains difficult for multiple drugs. This is due to the complex biological background of cancer, where multiple biochemical pathways compensate for one another to produce the oncogenic phenotype. Consequently, mutations and other genetic aberrations often stratify patients inaccurately.
Existing ML methods to predict efficacy of drugs
Projects such as the Cancer Target Discovery and Development and Genomics of Drug Sensitivity in Cancer have evaluated ML as a means of predicting drug responses by associating gene expression patterns, genomic features, and copy number alterations with drug sensitivity. However, previous research has not applied these methods to large-scale proteomics and phosphoproteomics data. This is despite research suggesting that proteomic-derived features may be able to predict drug responses more accurately than genomic alternatives.
A limitation has been the low sample throughput of proteomics by liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS), compared with other omics techniques. Proteomic methods often involve comparing proteins after chemical or metabolic labelling. This therefore restricts the number of samples that can be compared and used as an input for ML model generation.
Recent improvements in LC-MS/MS throughput and label-free analysis, as well as recent availability of systematic drug response profiles for a large number of cell lines and drugs, now make it feasible to use proteomics and phosphoproteomics data as the input for predictive models of drug responses. It is therefore timely to assess the performance of ML models constructed using proteomics data, and essential to evaluate the accuracy and potential of proteomics to advance the field of precision medicine.
The researchers behind this study developed an ensemble of predictive models trained for 412 drugs with different modes of action and developmental stages, which they called DRUML. As input for developing DRUML, the researchers used large-scale proteomics and phosphoproteomics data from 48 leukaemia, liver, and oesophageal cancer cell lines. The investigators obtained drug response data by measuring the area above the curve (AAC) for each cell line. The AAC values were then scaled to range from 0 (no effect of drug) to 1 (maximum killing) within a given cell line.
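The scaling step can be illustrated with a minimal sketch. It assumes simple min-max scaling within one cell line, which is one plausible reading of the description above rather than the study's confirmed procedure. The function name `scale_aac`, the drug names, and the raw AAC values are all hypothetical.

```python
def scale_aac(raw_aac):
    """Min-max scale the raw AAC values of one cell line into [0, 1].

    Assumption: the least effective drug maps to 0 and the most
    effective drug maps to 1. The study may have scaled differently.
    """
    lowest = min(raw_aac.values())
    highest = max(raw_aac.values())
    if highest == lowest:
        # All drugs had the same effect, so there is no spread to scale.
        return {drug: 0.0 for drug in raw_aac}
    return {
        drug: (value - lowest) / (highest - lowest)
        for drug, value in raw_aac.items()
    }


# Hypothetical raw AAC values for three drugs in a single cell line.
raw = {"drug_a": 10.0, "drug_b": 55.0, "drug_c": 100.0}
scaled = scale_aac(raw)
print(scaled)
```

Scaling within each cell line puts the responses of different cell lines on a common 0 to 1 scale, so that drugs can be compared and ranked per cell line.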
Since researchers have not yet systematically investigated the use of large-scale LC-MS/MS proteomics data for ML model generation, the DRUML team also assessed the suitability of large-scale datasets as the input for predictive drug response models. Small-scale proteomic studies based on protein array methods measure only a few proteins and phosphorylation sites. By contrast, this study was based on the analysis of over 20,000 phosphorylation sites and around 7,000 proteins, allowing for systematic and unbiased discovery of drug response markers. The evaluation showed that phosphoproteomics data consistently produced the lowest training and validation errors.
Evaluation of DRUML
Assessment of DRUML using external verification datasets from 53 cell lines from independent laboratories revealed that the model could rank around 85% of the drugs with absolute errors of <0.15. The drug rankings were statistically significant (by Spearman testing) within all cancer models tested. This finding is all the more promising given that DRUML was trained using leukaemia, oesophageal and liver cancer cell lines, whereas the verification datasets contained data from cell lines derived from bone, brain, breast, cervix, ovary, colorectal and prostate cancers.
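The two evaluation measures mentioned here, absolute error against a 0.15 threshold and Spearman rank correlation, can be sketched as follows. This is an illustration using only the Python standard library, not the study's code. The helper names and the observed and predicted values are hypothetical, and the significance test (p-value) that would accompany the correlation is left out.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fraction_within(observed, predicted, tolerance=0.15):
    """Share of drugs whose absolute prediction error is below tolerance."""
    hits = sum(abs(o - p) < tolerance for o, p in zip(observed, predicted))
    return hits / len(observed)


# Hypothetical scaled responses for five drugs in one cell line.
observed = [0.10, 0.35, 0.60, 0.80, 0.95]
predicted = [0.12, 0.30, 0.70, 0.60, 0.90]

rho = spearman(observed, predicted)
share = fraction_within(observed, predicted)
print(f"Spearman rho = {rho:.2f}, within 0.15 = {share:.0%}")
```

The two measures answer different questions: the absolute error checks how close each predicted response is to the measured one, while the rank correlation checks whether the drugs come out in the right order, which is what matters when prioritising treatments.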
This study assessed how accurately DRUML produces lists of drugs ranked by their predicted efficacy in reducing the proliferation of a given cancer cell population. The results indicated that DRUML ranks drugs with different modes of action by predicted efficacy across different cancer types with reasonably low error. Moving forward, DRUML could assist drug prioritisation by complementing information obtained from clinicopathological parameters and mutational analysis.