In a recent article published in Drug Discovery Today: Technologies, Pedro J. Ballester analyses existing machine-learning scoring functions for structure-based virtual screening (SBVS).
Structure-based virtual screening
Virtual screening (VS) is a computational technique used in drug discovery. It searches libraries of small molecules to identify the structures most likely to bind to a drug target. SBVS refers specifically to the case where a 3D structure of the protein target is available and the binding site is known. Once two molecules have been docked, scoring functions are used to predict their binding affinity approximately. In SBVS, a scoring function is used to rank the molecules docked to a therapeutic target, so that those predicted to bind most strongly appear at the top.
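To make the ranking step concrete, here is a minimal sketch in Python of how a machine-learning scoring function could rank a docked library. The random-forest model, the 16-dimensional pose descriptors and the affinity values are illustrative stand-ins, not the specific methods discussed in the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical training data: 16 pose descriptors per complex -> measured affinity (pK units).
X_train = rng.random((200, 16))
y_train = rng.uniform(4.0, 10.0, size=200)

scoring_function = RandomForestRegressor(n_estimators=100, random_state=0)
scoring_function.fit(X_train, y_train)

# Hypothetical docked library: one descriptor vector per docked molecule.
library_ids = [f"mol_{i}" for i in range(50)]
X_library = rng.random((50, 16))

predicted_affinity = scoring_function.predict(X_library)

# Rank the library so the strongest predicted binders come first.
ranking = sorted(zip(library_ids, predicted_affinity), key=lambda pair: -pair[1])
for mol_id, score in ranking[:5]:
    print(f"{mol_id}: predicted pK = {score:.2f}")
```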
Scoring functions developed using machine learning have demonstrated remarkable accuracy across a range of drug design applications, and ML-based scoring functions have been found to deliver accurate SBVS performance on many targets. However, it is often unclear which scoring function is appropriate for a given target.
In this article, Ballester provides insight into this crucial question. He analyses two approaches for selecting an existing scoring function for the target, and a third approach that generates a scoring function tailored to it.
Selecting scoring functions based on published evaluations
This option is available when the target of interest appears in a published study and a given scoring function performs well on the corresponding test set. It is the fastest option, but also the least reliable.
Property-matched (PM) decoys are a popular choice for benchmarking SBVS methods. They are molecules selected to be likely inactives while remaining hard to distinguish from actives on simple molecular properties. The test set for a target generally consists of a set of known actives and a larger set of decoys. Ballester highlights that the performance of scoring functions trained and tested on data built with the same PM decoy selection criteria is likely to be overestimated. However, if these sets employ decoys selected in different ways, then the retrospective performance of the corresponding scoring functions should in theory be trustworthy.
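As an illustration of how such a retrospective evaluation is typically scored, the following sketch computes a ROC AUC over a benchmark of actives and decoys. The score distributions are simulated for demonstration; in practice the scores would come from rescoring docked poses with the scoring function under test.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# A typical benchmark: a few known actives and many more decoys.
n_actives, n_decoys = 50, 1500
labels = np.concatenate([np.ones(n_actives), np.zeros(n_decoys)])

# Simulated scoring-function output (higher score = predicted stronger binder).
scores = np.concatenate([
    rng.normal(loc=6.5, scale=1.0, size=n_actives),  # actives score higher on average
    rng.normal(loc=5.0, scale=1.0, size=n_decoys),
])

print(f"ROC AUC: {roc_auc_score(labels, scores):.3f}")
# Per the article's caveat: if the scoring function was trained on data built
# with the same PM decoy selection criteria, this figure is likely optimistic.
```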
Selecting scoring functions based on your own evaluation
If the selected scoring functions have not been evaluated on the target of interest, or not evaluated properly, carrying out your own evaluation may identify a predictive scoring function. This involves retrieving all known actives for the target from relevant databases and including them in a test set, in which decoys will make up a much larger proportion.
Ballester emphasises that researchers must strive to generate synthetic benchmarks that represent the diversity and distribution of the inactives in the intended test set. However, if a high-throughput screening dataset exists for the target, using it as the test set represents the most realistic benchmark. A minimal sketch of scoring such a test set follows.
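The sketch below assembles a label vector with a realistic active-to-decoy ratio and computes the enrichment factor in the top 1% (EF1%), a common SBVS metric. All counts and score distributions are hypothetical; the actives would in practice be retrieved from public bioactivity databases.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical test set: known actives plus a much larger pool of decoys.
n_actives, n_decoys = 40, 4000
labels = np.concatenate([np.ones(n_actives), np.zeros(n_decoys)])
scores = np.concatenate([
    rng.normal(6.0, 1.0, n_actives),
    rng.normal(5.0, 1.0, n_decoys),
])

def enrichment_factor(labels, scores, fraction=0.01):
    """Hit rate in the top-ranked fraction divided by the overall hit rate."""
    order = np.argsort(scores)[::-1]               # best-scored molecules first
    n_top = max(1, int(round(fraction * len(labels))))
    hits_top = labels[order][:n_top].sum()
    return (hits_top / n_top) / (labels.sum() / len(labels))

print(f"EF at 1%: {enrichment_factor(labels, scores):.1f}")
```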
Building and evaluating a tailored machine-learning scoring function
When no sufficiently predictive scoring function is identified on the test set, building a scoring function tailored to the target is generally more predictive than selecting the scoring function with the best average performance across targets. Several machine-learning algorithms exist that can be tuned to a target once data instances and features are in place.
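A minimal sketch of this tailoring step is shown below, assuming a hypothetical set of target-specific actives and inactives with precomputed features. The feature generation itself (for example, protein-ligand interaction descriptors or ligand fingerprints) is outside the scope of the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

# Hypothetical target-specific dataset: 120 actives and 600 inactives,
# each described by 32 precomputed features.
X = rng.random((720, 32))
y = np.concatenate([np.ones(120), np.zeros(600)])

model = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)

# Cross-validated ROC AUC gives a first estimate of target-specific performance.
auc_scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"Mean cross-validated ROC AUC: {auc_scores.mean():.3f}")

# Refit on all available data to obtain the final tailored scoring function.
model.fit(X, y)
```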
Ballester believes that this approach is the most promising, because it exploits target-specific data and/or features, which have been found to be more predictive than scoring functions built on data pooled from any target with generic features. However, he notes that it is also the most difficult to implement, although integrated, well-documented software packages have now simplified the process.
Image credit: By Harryarts – www.freepik.com