One of the most prominent topics in drug discovery is the efficient exploration of the vast number of drug-like chemicals to find synthesizable and novel chemical structures with desired biological properties. To address this, the researchers behind this study created DrugSpaceX, a database based on expert-defined transformations of approved molecules.
Researchers have spent years attempting to explore and generate drug-like chemicals accurately and effectively in large virtual chemical spaces. Despite its recognised pitfalls, virtual screening remains a practical route to novel bioactive compounds in pharmaceutical research. Several databases exist to assist in the discovery of drug-like compounds. However, these databases are created by rule-based transformations and usually lack structural diversity because their reaction rules are limited.
To explore the space of drug-like compounds more efficiently, the researchers constructed a virtual compound library called DrugSpaceX, based on transformation rules with approved drug molecules as the starting point.
Creation of the product database of drug-like chemicals
The researchers created a drug set from a list of 2,215 approved small molecule drugs contained in the DrugBank Small Molecule database. Using the drug set as a starting point, DrugSpaceX was built by applying transformation rules to explore the chemical space of drug-like molecules. In the first round, the 2,215 approved drugs were transformed through the transformation rules on the StarDrop software platform. This produced the sample set, which contains 937,230 products after duplicates were removed. One generation of transformations therefore yielded an average of about 423 child compounds per parent drug (937,230 / 2,215), which suggests that exhaustive enumeration through more than two generations would be intractable. Consequently, the researchers performed only two rounds of transformations, which gave rise to a final database of 100,946,534 products.
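The growth argument above can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the text; the helper names are illustrative, and the projection assumes a constant branching factor with no duplicate products, so it is a rough upper bound rather than a description of how the database was actually built.

```python
# Counts quoted in the article.
N_DRUGS = 2_215            # approved drugs used as seeds
N_FIRST_ROUND = 937_230    # unique products after one round

def average_children(parents: int, products: int) -> float:
    """Average number of unique products per parent compound."""
    return products / parents

def projected_size(seeds: int, branching: float, generations: int) -> float:
    """Naive projection: every compound yields `branching` new compounds
    per generation and no duplicates occur (an assumption, so this
    overestimates the deduplicated library size)."""
    return seeds * branching ** generations

branching = average_children(N_DRUGS, N_FIRST_ROUND)

if __name__ == "__main__":
    print(f"average children per drug: {branching:.1f}")
    for g in (1, 2, 3):
        print(f"generation {g}: ~{projected_size(N_DRUGS, branching, g):.3g} compounds")
```

Under these assumptions a third generation would run to more than a hundred billion structures, which is consistent with the decision to stop after two rounds.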
To assess the proportion of novel compounds contained within DrugSpaceX, the researchers removed existing chemicals that are collected in various databases. The proportion of novel structures following the first round of transformations was 95.31% and the proportion of novel compounds following the second transformation was 99.58%.
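As a minimal sketch of how such a novelty proportion could be computed, the snippet below treats each structure as a canonical identifier string (for example a canonical SMILES or an InChIKey) and takes a set difference against the known compounds. The function name and the toy identifiers are hypothetical, and the article does not state which identifiers or reference databases the researchers used.

```python
from typing import Iterable

def novelty_fraction(generated: Iterable[str], known: Iterable[str]) -> float:
    """Fraction of unique generated structures that are absent from `known`.

    Assumes both inputs already use the same canonical identifier scheme,
    so that string equality means structural identity.
    """
    generated_set = set(generated)
    if not generated_set:
        return 0.0
    novel = generated_set - set(known)
    return len(novel) / len(generated_set)

if __name__ == "__main__":
    # Toy identifiers standing in for canonical structure strings.
    products = ["cmpd-A", "cmpd-B", "cmpd-C", "cmpd-D"]
    existing = ["cmpd-B", "cmpd-Z"]
    print(f"novel: {novelty_fraction(products, existing):.2%}")
```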
Creation of representative samples
For the assembly procedure, the roughly 100 million products were divided into three representative sets of different sizes: the Drug Set, the Sample Set and the All Data Set. Additionally, five different collections of DrugSpaceX compounds were prepared based on the following criteria:
- An extended drug-like subset (DSX-EL)
- A drug-like subset (DSX-DL)
- A lead-like subset (DSX-LL)
- A fragment-like subset (DSX-FL)
Finally, to make the database more accessible, the researchers reduced the number of DrugSpaceX compounds by randomly selecting 10% of them.
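A reproducible random selection of this kind can be sketched with Python's standard library. The function name, the fixed seed and the in-memory list are assumptions made for illustration; the article does not describe how the researchers drew their sample, and a library of this size would more likely be sampled from disk in a streaming fashion.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def random_subset(items: Sequence[T], fraction: float = 0.10, seed: int = 0) -> List[T]:
    """Return a random sample containing `fraction` of `items`, without replacement.

    A fixed seed makes the selection reproducible between runs.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    rng = random.Random(seed)
    k = round(len(items) * fraction)
    return rng.sample(items, k)

if __name__ == "__main__":
    compound_ids = [f"cmpd-{i}" for i in range(1000)]  # hypothetical IDs
    subset = random_subset(compound_ids)
    print(len(subset), "of", len(compound_ids), "compounds selected")
```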
Chemoinformatic analyses have shown that the DrugSpaceX database possesses excellent structural novelty and diversity, as well as large three-dimensional space coverage. Moreover, when the database was tested on the identification of discoidin domain receptor 1 (DDR1) kinase inhibitors, it was evident that DrugSpaceX offers a viable alternative for rapid lead identification. DrugSpaceX also provides annotations and display functions, which will be useful for guiding lead compound optimisation.
The researchers behind this study created DrugSpaceX, a free-to-access database of drug-like compounds, which investigators can use to find synthesizable and novel chemical structures with desired biological properties. In the future, it is hoped that the scientific community will leave feedback on the compounds included in the database, which the researchers will gather to update the products in later versions.