Marine organisms are expected to be an important source of inspiration for drug discovery. At present, very few databases are dedicated to marine natural products research. To meet the demand for mining and sharing marine natural product data, the researchers behind this study developed a comprehensive marine natural products database.
Marine natural products
Natural products and their molecular frameworks play a significant role in the drug discovery and development pipeline. More specifically, around two-thirds of all approved small-molecule drugs produced between January 1981 and September 2019 originated from natural products. Advances in sample collection, compound separation and structure determination techniques have increased interest in marine natural products as a resource for drug discovery. The ocean's extreme variations in pressure, salinity, temperature, pH, nutrient availability and light make the secondary metabolites of marine organisms incredibly diverse. In total, over 30,000 marine natural products have been discovered since the first report of biologically active spongothymidine in 1950. At the time of writing, several marine-derived drugs have been approved by the FDA, including Ziconotide and Eribulin, and many more candidates are in clinical trials. Consequently, marine drug discovery has become a hotspot for global drug discovery and development.
Existing databases of marine natural products
Access to suitable databases is crucial for the comprehensive research of marine natural products. Despite this, there are still only a small number of databases dedicated to marine natural product research. The commercial database MarinLit and the Dictionary of Marine Natural Products are currently the most exhaustive marine natural product databases, but they have subscription fees that may prevent their broader access and use in research. More recently the MarinChem3D database was produced, which contains 3D structures, but its biological activity data is limited.
Other smaller databases also exist, but most of these have not been updated for a long time. Additionally, generic chemical databases such as Reaxys, PubChem and ChEMBL include a certain number of marine natural products, but they lack specific annotations, which makes it difficult to retrieve these products from among the millions of compounds these databases contain. Consequently, there is still a need for a free and complete marine natural product database.
Development of a complete marine natural products database
The researchers behind this study developed CMNPD, a comprehensive marine natural products database, which includes information on chemical entities with various physicochemical and pharmacokinetic properties, standardized biological activity data, systematic taxonomy and geographical distribution of source organisms, and detailed literature citations. CMNPD aims to provide an open access knowledge base to facilitate the research and development of marine drugs.
The data included in CMNPD was manually curated from the annual marine natural product (MNP) reviews in Nat. Prod. Rep., which cover more than 20,000 articles. The structural information extracted from these articles was then used as a query to search through generic chemical databases and patents of hit compounds, which were then integrated into a general document library. To improve the efficiency of structure extraction, the optical chemical structure recognition tool CLiDE was used to convert the graphical representations of chemical structures into a machine-readable format and transfer them into the chemical editor ChemDraw for manual inspection and correction.
To estimate the drug-likeness of each compound, several physicochemical and pharmacokinetic properties were calculated using widely accepted algorithms. Physicochemical properties, such as molecular weight, molecular mass and polar surface area, were calculated using RDKit. Pharmacokinetic properties were predicted using Pipeline Pilot ADMET models and include blood-brain barrier penetration, human intestinal absorption, aqueous solubility and plasma protein binding.
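To make one of these descriptors concrete, the sketch below computes a molecular weight from a flat molecular formula by summing rounded standard atomic weights. It is only an illustration and not the study's method: the study used RDKit, which works from full chemical structures, whereas the function name molecular_weight, the small ATOMIC_WEIGHTS table and the formula C10H14N2O6 (used here for spongothymidine) are assumptions introduced for this example.

```python
import re

# Rounded standard atomic weights for a few common elements.
# Assumption: this short table is illustrative, not exhaustive.
ATOMIC_WEIGHTS = {
    "C": 12.011,
    "H": 1.008,
    "N": 14.007,
    "O": 15.999,
    "S": 32.06,
    "Cl": 35.45,
    "Br": 79.904,
}


def molecular_weight(formula: str) -> float:
    """Sum atomic weights for a flat formula such as 'C10H14N2O6'.

    Brackets, charges and isotopes are not supported in this sketch.
    """
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject anything the simple pattern could not fully account for.
    if not tokens or "".join(sym + cnt for sym, cnt in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in tokens:
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"unknown element: {symbol!r}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    # Hypothetical usage with the formula assumed above.
    print(round(molecular_weight("C10H14N2O6"), 2))
```

A structure-based toolkit is needed for descriptors such as polar surface area, which depend on how atoms are connected rather than on the formula alone.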
Current content of the CMNPD marine natural products database
The first release of the CMNPD database contained 31,561 distinct chemical entities of marine natural products from over 13,000 different sampled organisms. The organisms are distributed across 7 kingdoms, 38 phyla, 93 classes, 289 orders, 682 families, 1,480 genera and 3,354 species. There are over 15,774 active compounds included, which are mapped to 2,652 targets with 72,343 bioactivities. These targets include 1,122 single proteins, 923 cell lines and 459 organisms, among others. The accompanying document library included 128,488 items of scientific literature and patents, of which around 11,000 articles describe the discovery of new compounds and structure revisions.
To improve the quantity and quality of the data, CMNPD provides a deposit system that allows users of the database to submit new compounds, new data on existing compounds and corrections to existing data. However, only published data is accepted, and references must be attached to ensure the reliability of the data.
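The deposit rules described above can be sketched as a simple validation step. This is a minimal illustration under stated assumptions, not CMNPD's actual implementation: the Deposit class, its field names, the three kind labels and the validate_deposit function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical labels for the three submission types named in the text.
ALLOWED_KINDS = {"new_compound", "new_data", "correction"}


@dataclass
class Deposit:
    """A hypothetical user submission to the database."""
    kind: str
    compound_name: str
    references: List[str] = field(default_factory=list)


def validate_deposit(deposit: Deposit) -> List[str]:
    """Return a list of problems; an empty list means the deposit passes."""
    problems = []
    if deposit.kind not in ALLOWED_KINDS:
        problems.append(f"unknown deposit kind: {deposit.kind!r}")
    # Only published data is accepted, so at least one non-blank
    # reference must be attached.
    if not any(ref.strip() for ref in deposit.references):
        problems.append("at least one published reference is required")
    return problems


if __name__ == "__main__":
    ok = Deposit("new_compound", "example compound", ["placeholder citation"])
    bad = Deposit("correction", "example compound")
    print(validate_deposit(ok))
    print(validate_deposit(bad))
```

Returning a list of problems rather than raising on the first one lets a submitter see every issue at once; a real system would also need to check that each reference points to a genuine publication.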
To make the most of the chemical diversity of secondary metabolites from marine organisms for drug discovery, the researchers developed CMNPD as an open access knowledge base with comprehensive marine natural product data. CMNPD supplies accurate chemical structures and various calculated physicochemical and pharmacokinetic properties for computer-aided drug design. In the future, the researchers expect CMNPD to grow continuously through extensive data deposition and integration, enabling the database to become an even more comprehensive marine natural product repository that could lead to an exciting wave of marine drug development.