A recent study, published in Scientific Reports, designed a test system using machine learning to systematically examine the structural features that may characterise multi-target compounds, which are important for the treatment of multifactorial diseases such as cancer.
Polypharmacology and multi-target compounds
Polypharmacology results from the in vivo modulation of multiple targets, which is often required for effective management of multifactorial diseases such as cancer. Doctors can achieve polypharmacology through drug combination therapy, but administration of multi-target drugs is generally preferred. The multi-target activity of a small molecule depends on its ability to form ‘pseudo-specific’ interactions with different targets. At present, researchers know very little about these ‘selectively nonselective’ interactions at the atomic level. Understanding such interactions and their molecular targets is nevertheless critical for learning how to design new multi-target drugs, a highly topical issue in drug discovery.
Systematic analysis of X-ray structures of complexes formed by multi-target compounds across target families has revealed that around half of the ligands bound to multiple targets in similar conformations but formed different, target-dependent interaction patterns in their binding sites. By contrast, multi-target compounds interacting with functionally distinct targets often displayed different binding modes. Furthermore, a multi-target ligand can adopt similar binding modes with some targets and very different ones with others. The binding characteristics of multi-target compounds therefore vary greatly and cannot be generalised.
Structure-activity relationship analysis of multi-target compounds using machine learning
Researchers do not yet know whether multi-target compounds share particular structural features that could be responsible for their ability to interact with different targets. Structure-activity relationship (SAR) analyses have so far not identified common structural signatures of multi-target compounds. However, preliminary results from machine learning (ML) studies have provided evidence that such signatures exist, so further research is warranted. Researchers have trained different ML models to distinguish, on the basis of chemical structure, between multi-target and single-target compounds from medicinal chemistry with activity against related or unrelated targets. These models have reached more than 70% accuracy in predicting compounds with multi-target activity.
Researchers observed similar results when distinguishing between multi- and single-target compounds from biological screens that had been tested in large numbers of assays. Because negative assay results were available for the screening compounds, groups of multi- and single-target compounds could be assembled, which ensured data completeness for multi-target predictions. The accuracy of the predictions depended strongly on nearest neighbour (NN) relationships among multi-target or single-target compounds. Removing compounds that formed NN relationships from the training sets significantly reduced prediction accuracy. The researchers found that many single-target and multi-target compounds formed separate analogue series, and only a few series combined single- and multi-target compounds. Hence, many multi-target compounds were more similar to each other than to single-target compounds, and vice versa.
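To make the nearest neighbour idea concrete, here is a minimal sketch of how NN relationships could be detected and removed from a training set. The article does not say how similarity was measured, so the use of Tanimoto similarity on fingerprints stored as sets of feature ids is an assumption, and all function names are hypothetical.

```python
# Hypothetical sketch: fingerprints are plain Python sets of feature ids,
# and Tanimoto similarity is an assumed (not source-confirmed) metric.

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def nearest_neighbour(query_fp, training):
    """Return (compound_id, similarity) of the most similar training compound.

    `training` maps compound_id -> fingerprint set.
    """
    best_id = max(training, key=lambda cid: tanimoto(query_fp, training[cid]))
    return best_id, tanimoto(query_fp, training[best_id])


def drop_nearest_neighbours(training, test):
    """Remove every training compound that is the NN of some test compound."""
    nn_ids = {nearest_neighbour(fp, training)[0] for fp in test.values()}
    return {cid: fp for cid, fp in training.items() if cid not in nn_ids}
```

Retraining a classifier on the output of `drop_nearest_neighbours` and comparing its accuracy with the original model would show how much the predictions rely on close structural analogues.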
Taken together, these findings raised a key question: what is the structural basis of multi-target activity? To address it, the researchers behind this study developed a diagnostic test system using ML to systematically examine structural features that might characterise compounds with multi-target activities.
Diagnostic machine learning test to identify characteristics of multi-target drugs
From compounds with known activity against current pharmaceutical target proteins, the researchers assembled data sets that each contained at least 50 single-target compounds active against one target (A), 50 single-target compounds active against another target (B), and 100 compounds active against both A and B (dual-target, or double-target, compounds). Each data set represented a unique target combination, and its dual-target compounds served as prototypic, data set-specific multi-target compounds. For each data set, the researchers generated different ML models on the basis of chemical structure to distinguish between double-target and corresponding single-target compounds (native predictions). They then used each target pair-specific classification model to predict the test sets of all other target pairs (cross-pair predictions).
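The data set assembly described above can be sketched as follows. The size thresholds (50, 50, 100) come from the article; the input format, the function name, and the rule that a single-target compound has exactly one known target and a dual-target compound exactly two are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Thresholds taken from the article; everything else is a hypothetical sketch.
MIN_SINGLE = 50
MIN_DUAL = 100


def assemble_pair_sets(activities, min_single=MIN_SINGLE, min_dual=MIN_DUAL):
    """Group compounds into single-target and dual-target sets per target pair.

    `activities` is an iterable of (compound_id, target_id) records.
    Returns {(A, B): {"A_only": set, "B_only": set, "dual": set}} for every
    target pair that meets the size thresholds.
    """
    targets_of = defaultdict(set)
    for compound, target in activities:
        targets_of[compound].add(target)

    # Index compounds by their exact target profile (assumed definition).
    by_profile = defaultdict(set)
    for compound, targets in targets_of.items():
        by_profile[frozenset(targets)].add(compound)

    all_targets = sorted({t for ts in targets_of.values() for t in ts})
    qualifying = {}
    for a, b in combinations(all_targets, 2):
        a_only = by_profile.get(frozenset([a]), set())
        b_only = by_profile.get(frozenset([b]), set())
        dual = by_profile.get(frozenset([a, b]), set())
        enough_singles = len(a_only) >= min_single and len(b_only) >= min_single
        if enough_singles and len(dual) >= min_dual:
            qualifying[(a, b)] = {"A_only": a_only, "B_only": b_only, "dual": dual}
    return qualifying
```

Each entry of the returned dictionary corresponds to one data set: a model trained on it yields the native prediction, and applying that model to the other entries yields the cross-pair predictions.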
The rationale behind the researchers’ method was that if structural features characteristic of multi-target compounds exist, native ML predictions should succeed. Moreover, if these features are common to many multi-target compounds, cross-pair predictions should also succeed, at least in principle. In contrast, if the features are confined to individual target combinations, cross-pair predictions should mostly fail. The test is therefore diagnostic: the success or failure of predictions indicates the presence or absence of structural multi-target relationships.
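The diagnostic logic can be expressed as a small decision rule over a matrix of prediction accuracies. This is only an illustrative sketch: the function name, the 0.8 accuracy threshold, and the "at least half of the predictions" cutoff are assumptions, not criteria stated in the study.

```python
def diagnose(accuracy, threshold=0.8, majority=0.5):
    """Interpret a square matrix of prediction accuracies.

    accuracy[i][j] is the accuracy of the model trained on target pair i
    when applied to the test set of target pair j, so the diagonal holds
    the native predictions and all other cells are cross-pair predictions.
    `threshold` and `majority` are hypothetical cutoffs.
    """
    n = len(accuracy)
    native = [accuracy[i][i] for i in range(n)]
    cross = [accuracy[i][j] for i in range(n) for j in range(n) if i != j]

    # Fraction of predictions that count as successful.
    native_ok = sum(a >= threshold for a in native) / len(native)
    cross_ok = sum(a >= threshold for a in cross) / len(cross) if cross else 0.0

    if native_ok < majority:
        return "no detectable multi-target features"
    if cross_ok >= majority:
        return "features shared across target pairs"
    return "features specific to individual target pairs"
```

Successful native predictions combined with failing cross-pair predictions fall into the third branch, which is the case where characteristic features exist but depend on the target combination.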
Overall, the test system enabled the use of ML to distinguish between double-target and corresponding single-target compounds for over 170 qualifying target combinations with high-confidence activity data. While the majority of qualifying target pairs originated from the same protein families, most cross-pair predictions involved target pairs from different families. Native predictions consistently distinguished between double-target and single-target compounds with high accuracy (greater than 80%). However, systematic cross-pair predictions essentially failed, apart from a few exceptions attributed to target correlations. These findings provide compelling evidence that features setting double-target compounds apart from single-target compounds exist and depend on the target combination the compounds are active against. By contrast, there were no detectable global features that generally characterised compounds with multi-target activity.
Weighting and mapping features from target pair-dependent support-vector machine (SVM) classifiers highlighted substructures in double-target compounds that determined predictions. Identified substructures could be potential signatures in multi-target ligand design.
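As a rough illustration of feature weighting and mapping, the sketch below ranks the features present in a compound by their weight in a linear model and maps them back to atoms. The article does not state which SVM kernel was used; a linear model, in which each feature has a single weight, is assumed here, and the function names and the feature-to-atom lookup are hypothetical.

```python
def top_contributing_features(weights, fingerprint, k=3):
    """Rank features present in a compound by their linear-model weight.

    `weights` maps feature id -> weight (positive weights are assumed to
    favour the double-target class); `fingerprint` is the set of feature
    ids present in the compound. Only positively weighted features are kept.
    """
    present = [(fid, weights.get(fid, 0.0)) for fid in fingerprint]
    positive = [(fid, w) for fid, w in present if w > 0]
    # Highest weight first; ties broken by feature id for a stable order.
    return sorted(positive, key=lambda item: (-item[1], item[0]))[:k]


def atoms_to_highlight(top_features, feature_atoms):
    """Map ranked features back to atom indices via a feature -> atoms lookup."""
    atoms = set()
    for fid, _ in top_features:
        atoms.update(feature_atoms.get(fid, ()))
    return sorted(atoms)
```

The atoms returned by `atoms_to_highlight` outline the substructure that drives a double-target prediction for that compound. With a non-linear kernel, per-feature weights are not directly available and a different attribution method would be needed.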
This study developed a diagnostic test using machine learning algorithms to characterise multi-target compounds. The researchers stated that as long as a meaningful diagnostic model can be generated for a target combination of interest, features characterising the double-target compound can be identified and explored further.