For the first time, a review explored in depth how big data and artificial intelligence (AI) techniques are currently being implemented in drug discovery pipelines.
Big data is a term used to describe volumes of data too large for commonly used software to handle. It is characterised by the 5 V’s (volume, velocity, variety, veracity and value): insights derived from high-volume, high-velocity and validated data collected from varied sources can add the most value. Faster and cheaper technology, combined with growing computing power, has led to the generation of big data in the pharmaceutical world.
Artificial intelligence (AI) is defined as intelligence demonstrated by machines, particularly computer systems. AI approaches have become a necessity at various drug discovery stages, such as the design of novel drug molecules, drug repurposing and the development of personalised medicine. R&D sectors of renowned pharmaceutical companies, such as Pfizer and GlaxoSmithKline, are adopting AI-based methods to manage big data and deliver cost-effective solutions. It is estimated that the AI-based drug discovery market will reach $1.43 billion in 2024, growing by more than 40% each year.
Big data resources in drug design
The first step of the drug discovery process involves screening huge, ever-expanding data libraries that cover a multitude of properties, such as chemical structure, chemical assay, target structure and clinical data. AI plays a significant role in handling the complexity of this big data. In fact, data-driven approaches have proven so beneficial that the high demand for advanced computational algorithms has driven a shift from personal computers to high-performance computers, cloud computing and graphics processing units (GPUs).
The big data resources used for drug discovery can be grouped by the type of database:
- Collection of chemical compounds – PubChem or ChEMBL.
- Drug-like compounds – DrugBank or e-Drug3D.
- Collection of drug targets – BindingDB or SuperTarget.
- Collection of assay screening and metabolism studies – HMDB or TTD.
AI methods in drug design
AI can sort through big data to recognise and learn patterns using several methods:
- Machine learning – Uses a set of algorithms that learn without human intervention or explicit instructions. Big data has opened huge opportunities for machine learning methods in the pharmaceutical industry, as they can be developed to mine data for interesting patterns and, for example, predict drug side effects.
- Deep learning – The successful training of deep learning neural networks (DLNN) typically requires vast amounts of data. Nevertheless, the approach has proven highly effective in the analysis of big data and useful in de novo study designs.
- Deep learning variants – Generative adversarial networks (GANs) combine a generator network, which produces synthetic data, with a discriminator network, which learns to distinguish real data from fake data. GANs have proved useful in the design of novel molecules and in optimising them for desired properties. Convolutional neural networks (CNNs) are mainly used for computer vision or image classification and have been implemented for the diagnosis of diseases, such as cancer.
- Autoencoders – Unsupervised neural networks that are mainly used for predicting features for drug-target interactions and assessing drug similarities.
- Deep belief networks – Generative graphical models that can be trained in a supervised or unsupervised manner. They have applications in virtual screening, classification of multi-target drugs and classification of small molecules into drugs or non-drugs.
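To give a concrete sense of what "assessing drug similarities" can mean in practice, here is a minimal sketch in Python. It does not use an autoencoder or any of the networks listed above. Instead it swaps in the Tanimoto coefficient, a simple and widely used similarity measure for molecular fingerprints, as a baseline illustration. The compound names and fingerprint bits are hypothetical toy values, not data from the review.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy fingerprints: each integer stands for a structural
# feature that is present in the molecule.
fingerprints = {
    "compound_A": frozenset({1, 4, 7, 9}),
    "compound_B": frozenset({1, 4, 7, 12}),
    "compound_C": frozenset({2, 3, 15}),
}


def most_similar(query: str, library: dict) -> tuple:
    """Return the name of the library compound closest to `query`, plus all scores."""
    scores = {
        name: tanimoto(library[query], fp)
        for name, fp in library.items()
        if name != query
    }
    best = max(scores, key=scores.get)
    return best, scores


best, scores = most_similar("compound_A", fingerprints)
print(best, scores)
```

In this toy library, compound_A shares three of five distinct features with compound_B and none with compound_C, so compound_B is returned as the closest match. A learned model such as an autoencoder would typically compare compact learned representations rather than raw feature sets, but the idea of ranking compounds by a similarity score is the same.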
The role of AI technology in different phases of drug discovery. Image credit: P. Kaur et al, 2021
Future of AI and big data in drug design
It is clear that recent breakthroughs in data-driven technology have benefitted various phases of the drug discovery cycle, such as the rapid screening of virtual compound libraries, the prediction of the physical properties of molecules and the surveillance of patients. It is therefore unsurprising that the impact of AI algorithms applied to big data is growing both in the academic sector and within pharmaceutical companies.
However, data-driven methods do have some limitations. These include problems with data accessibility and a shortage of skilled personnel to operate AI applications in drug discovery. In addition, although big data and AI have sped up the drug design pipeline, clinical experiments still need to be conducted before drugs can be approved.
Nonetheless, AI has driven innovation in drug discovery and will continue to do so. These data-driven methods are set to be fine-tuned and to become an essential tool in the search for novel drugs, embedding deeper within the pharmaceutical sector.