Powering drug discovery with 10 simple rules

A recent Perspective, published in PLOS Computational Biology, has highlighted 10 simple rules to power drug discovery with data science.

The development of novel high-throughput technologies has generated vast amounts of complex and diverse data, which in turn has supported the drug discovery and development process. In addition, clinical data created by companies or compiled by biobanks are growing at an exponential pace. To keep up with this growing volume of data, methodological advances have driven developments in statistical modelling, machine learning and artificial intelligence. This has positioned data science as a critical component of pharmaceutical research. To realise the full potential of data science, both the structure and culture of organisations must adapt. This transformation is already under way across the pharmaceutical industry, with the expansion of large data science teams. In this Perspective, the authors provide strategic recommendations to help those aiming to propel a digital culture shift and data science transformation within their organisation. A summary of these rules is below:

Rule 1: Establish data science as a core drug discovery discipline

The pharmaceutical industry defines data science as the discipline at the interface of statistics, computer science and drug discovery. It therefore includes a range of roles, from statisticians to, more recently, machine learning engineers. To establish data science as a core drug discovery discipline, the authors suggest that team composition needs to evolve at all levels, from leadership to project teams. Leadership teams need a greater understanding of the potential applications, limitations and pitfalls of data science. The inclusion of data science leaders in decision-making bodies is also important to connect data scientists with critical business questions. The authors argue that data scientists must be integral members of project teams, which they suggest will fuel projects with computational insights and predictions.

Rule 2: Engage data scientists ahead of data generation

Some believe that data analysis is easily and quickly done at the end of an experiment. However, poorly designed experiments often limit biological insight and make it more difficult for data scientists to correct for confounding or other issues. Experiments that take into account best practices and data analysis needs lead to more accurate, interpretable and actionable results. The authors highlight that data science, data generation and analysis need to be given equal importance. Data scientists and experimental scientists must work together to understand each other’s requirements. Scientists can achieve this through regular dialogue and information exchange.
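To make the point about confounding concrete, here is a minimal sketch of one design step that data scientists often ask for before data generation: spreading treatment groups evenly across processing batches, so that batch effects are not confounded with the treatment. This example is not taken from the Perspective; the function name `assign_batches`, the sample labels and the batch count are all hypothetical.

```python
import random
from collections import Counter


def assign_batches(samples, n_batches, seed=0):
    """Assign samples to batches, balancing each group across batches.

    samples: list of (sample_id, group) tuples (hypothetical layout).
    Returns a dict mapping sample_id -> batch index.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible

    # Collect sample IDs per treatment group.
    by_group = {}
    for sample_id, group in samples:
        by_group.setdefault(group, []).append(sample_id)

    assignment = {}
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # randomise order within the group
        for i, sample_id in enumerate(ids):
            # Deal samples out round-robin so each batch gets a similar share.
            assignment[sample_id] = i % n_batches
    return assignment


if __name__ == "__main__":
    samples = [(f"T{i}", "treated") for i in range(4)]
    samples += [(f"C{i}", "control") for i in range(4)]
    layout = assign_batches(samples, n_batches=2)
    groups = dict(samples)
    print(Counter((groups[s], b) for s, b in layout.items()))
```

With four treated and four control samples and two batches, each batch receives two samples from each group, so a batch effect cannot masquerade as a treatment effect.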

Rule 3: Enforce FAIR play

Retrospective efforts to FAIRify existing data (that is, to make it Findable, Accessible, Interoperable and Reusable) are time-consuming and cost-intensive. It is therefore critical to have FAIR play processes in place from the point of data generation. The authors advise organisations to set expectations and provide incentives for scientists generating data to include rich and harmonised metadata. They suggest that user-friendly study design tools should be integrated within both data production and analysis. Importantly, they highlight that clear data access rules need to be established as part of this FAIRification process. They state that there needs to be a shift away from a “my data” mindset, towards an “our data” mindset.
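As an illustration of what checking for rich and harmonised metadata at the point of data generation could look like, the sketch below normalises field names and reports required fields that are missing or empty. It is not from the Perspective: the required field list and the helper names `harmonise` and `missing_metadata` are assumptions, and a real organisation would define its own schema.

```python
# Hypothetical required metadata fields; a real schema would be agreed in-house.
REQUIRED_FIELDS = ("sample_id", "assay", "organism", "units", "owner")


def harmonise(record: dict) -> dict:
    """Normalise keys to lower-case snake_case and strip stray whitespace."""
    clean = {}
    for key, value in record.items():
        new_key = key.strip().lower().replace(" ", "_")
        clean[new_key] = value.strip() if isinstance(value, str) else value
    return clean


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent or empty in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


if __name__ == "__main__":
    raw = {"Sample ID": " S-001 ", "Assay": "RNA-seq", "organism": "", "Units": "TPM"}
    print(missing_metadata(harmonise(raw)))
```

Running a check like this when data are submitted, rather than years later, is one way to avoid the retrospective clean-up described above.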

Rule 4: Build analytics and visualization on top of an integrated data store

The challenge in data analysis is how to find the data of interest, connect it with the linked metadata and access and analyse the data in order to gain new insights. While following the FAIR principles is a cornerstone of this challenge, it is not enough by itself. Realising the full potential of the data requires the development of resources and tools to enable experts to explore, visualise and analyse data. The authors highlight that the ability to merge data and query across datasets is key. They suggest that strategic investments need to be made in data management, data repositories and FAIR play processes.
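The idea of merging data and querying across datasets can be sketched with a small in-memory store. The example below uses Python's built-in `sqlite3` module to join measurements to their sample metadata and summarise across two studies. It is only an illustration, not the authors' architecture: the table names, columns, gene label and values are all made up.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create a toy integrated store: sample metadata plus measurements."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE samples (sample_id TEXT PRIMARY KEY, tissue TEXT, study TEXT);
        CREATE TABLE measurements (sample_id TEXT, gene TEXT, expression REAL);
        """
    )
    # Hypothetical metadata from two separate studies.
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?)",
        [("S1", "liver", "study_A"), ("S2", "liver", "study_B"), ("S3", "kidney", "study_A")],
    )
    # Hypothetical measurements keyed by the shared sample identifier.
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?)",
        [("S1", "GENE_X", 2.0), ("S2", "GENE_X", 4.0), ("S3", "GENE_X", 1.0), ("S1", "GENE_Y", 5.0)],
    )
    conn.commit()
    return conn


def mean_expression_by_tissue(conn: sqlite3.Connection, gene: str) -> dict:
    """Join measurements to metadata and average one gene per tissue."""
    rows = conn.execute(
        """
        SELECT s.tissue, AVG(m.expression)
        FROM measurements AS m
        JOIN samples AS s ON s.sample_id = m.sample_id
        WHERE m.gene = ?
        GROUP BY s.tissue
        ORDER BY s.tissue
        """,
        (gene,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    store = build_store()
    print(mean_expression_by_tissue(store, "GENE_X"))
```

The query only works because both tables share a consistent `sample_id`, which is the practical payoff of harmonised metadata.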

Another key challenge is maintaining nimble systems for exposing the data while supporting a variety of needs and users through multiple graphical and programming interfaces. The authors suggest that both R and Python offer efficient ecosystems for data wrangling, modelling and visualisation. Integrating these software solutions with other domain-specific tools is a common challenge. However, the authors emphasise that the main principle of such a system should be to pursue and enforce best practices for reproducible research.

Rule 5: Connect distributed data science teams through a strong community

The team propose a distributed model, whereby computational teams are embedded within each department to ensure continued exposure of biomedical data scientists to projects in well-defined areas of research. This model provides data scientists with wider opportunities to develop both their technical skills and their understanding of different aspects of drug discovery within their department. The authors highlight that connecting distributed data science teams is fundamental in effectively supporting the needs of individual departments, while enabling an enterprise-wide stewardship of data science assets and talents.

Rule 6: Promote a culture of digital savvy across the organization

Data scientists alone cannot achieve digital transformation. Experimental scientists and clinicians must also be engaged in the digital transformation by developing some level of digital proficiency. The authors suggest rolling out company-wide education to allow scientists to make better use of novel data science technologies. They emphasise that they are not suggesting that experimental scientists need advanced computational skills, nor that computational scientists need to know how to run experiments. However, they explain that mutual exposure and understanding will help create a more effective collaborative environment. Overall, the aim is to remove communication barriers and develop hybrid scientists who can effectively use data to bridge the two disciplines.

Rule 7: Embrace and deploy AI without hyping it

The widespread availability of advanced machine learning methods is helping to drive the current transformation in data science in the industry. While these methods have been used for decades, the impressive performance of deep learning methods has garnered attention from the media. This has fuelled interest in AI among experts and nonexperts alike. The authors reinforce that the impact machine learning is having across the drug discovery pipeline and within healthcare should be highlighted. However, they argue that we must equally communicate the caveats, biases and limitations of these approaches.

Rule 8: Complement internal capabilities with strategic partnerships

The authors suggest that it is critical to develop an organisational model that systematically complements internal capabilities with external opportunities. They highlight that reproducible data solutions aligned with external best practices are key to contextualising internal data with the wealth of public data. In addition, free and open-source software will help uphold FAIR data principles. They note that public-private partnerships provide excellent platforms for data scientists to collaborate not just between industry and academia but also across the spectrum of biotechnology and pharmaceutical companies.

Rule 9: Allocate sufficient and appropriate resources to data science teams

For data science teams to achieve their goals, they must be appropriately resourced. The authors suggest that the ratio of computational to experimental scientists should progressively increase to supplement and empower data generation with proper analytics. The optimal balance is likely to vary by department. They recommend that data scientists should make up at least 10% of staff in research departments to enable biomedical data science at a scale that truly maximises impact.

The team emphasise that organisations must fully embrace the diversity of roles required in data science. To ensure that data scientists contribute in an optimal capacity, specialisation and complementarity of roles are needed. While most data scientists will be able to cover many roles, they are all likely to have specialities.

The authors state that organisations should consider appropriate financial investment in digital resources. Without additional investment, the authors suggest that data scientists will not be able to deliver maximal benefit. The appropriate resourcing and differentiation of data science functions is critical in catalysing a digital transformation in the pharmaceutical industry.

Rule 10: Invest in attracting and retaining talent

In recent years, a growing number of sectors have been seeking data scientists. For pharmaceutical companies to be at the forefront of biomedical innovation, they must focus on attracting and retaining top talent. The authors suggest that pharmaceutical organisations must continue to refine and align to benchmarks from the global data science market. They must provide data scientists with opportunities to develop and specialise. While extrinsic motivations such as salary, benefits and promotions are important, intrinsic motivations play an equally important role. Leaders must encourage teams to develop their own ideas and inspire a sense of purpose that goes beyond data analysis.

