
Data-driven approach to identify PROTAC targets

Proteolysis-targeting chimeras (PROTACs) are an emerging class of drugs that may offer new opportunities to overcome limitations associated with small-molecule therapeutics. A recent study has used data from publicly available databases to develop a series of criteria to systematically identify PROTAC targets. This approach could help to support decision-making on whether a particular target may be amenable to modulation using PROTACs.

Proteolysis-targeting chimeras (PROTACs)

PROTACs were first described in 2001 and are designed to degrade target proteins by redirecting the ubiquitin-proteasome system. Ubiquitin-directed degradation involves two broad steps: (1) tagging the target protein through covalent attachment of several ubiquitin molecules (polyubiquitylation), and (2) recognition and degradation of the tagged protein by the proteasome. During tagging, an E3 ubiquitin ligase transfers ubiquitin from a recruited E2 ubiquitin-conjugating enzyme to a lysine residue on the target protein, where it is attached via an isopeptide bond.

The key role of a PROTAC molecule is to drive the formation of a ternary complex by bringing a specific E3 ligase into proximity with the target protein, thereby promoting ubiquitylation of the target. PROTACs represent a powerful tool for extending the druggable space to target types that were previously considered undruggable, such as transcription factors and RNA-binding proteins. In contrast to traditional small-molecule drugs, PROTACs act as catalysts, accelerating ubiquitylation and degradation of the target protein. This may result in different pharmacodynamic consequences compared with traditional inhibition. Furthermore, although PROTACs are usually larger than drug-like small molecules, they can show good tissue penetration. For these reasons, interest in PROTACs is growing, both as potential therapeutics and as chemical biology tools.

The PROTACtable genome – identifying PROTAC targets

Inspired by the concept of the ‘druggable genome’, the researchers behind this study refer to the set of potential PROTAC targets as the ‘PROTACtable genome’. The team worked to identify which potential drug targets may be most amenable to PROTACs by integrating information from various publicly available data sources to deliver a genome-wide analysis.

Several groups have worked to define the druggable genome since the concept was first introduced around 20 years ago. In this study, the team built on work previously carried out by GlaxoSmithKline (GSK), which explored the question of target tractability. In the GSK tractability approach, proteins of interest were assessed against a series of criteria and assigned to a range of tractability buckets, based on data from the literature or on derived knowledge, for the two most common drug modalities: small molecules and antibodies.

Importantly, a target may appear in none, one, or several of the buckets, depending on the available data. Together, the buckets define an approximate hierarchy: targets with existing marketed drugs or clinical precedence are assigned the highest level of tractability, followed by targets for which there may only be preclinical data, and finally targets for which there is no or only very limited existing data.

Using existing data to identify PROTAC targets

In this study, the researchers took an analogous approach. As the starting point of their PROTAC tractability (PROTACtability) workflow, the team used existing data and information about a target that help to determine whether a PROTAC approach is viable. The researchers defined targets as human proteins with a corresponding coding gene (the one-target-per-gene approach), each represented by a primary UniProt ID.

For each protein, the team focused on the cellular location of the protein, evidence that the protein has ubiquitylation sites, information about the protein’s half-life, and the availability of one or more small-molecule ligands as evidence of binders to the target. The team’s PROTACtability workflow also incorporates data on targets that have been reported in the PROTAC literature and on PROTAC molecules in clinical development. The study used the same bucket terminology as GSK to define the distinct categories of information that may or may not be present for a given target.

The different defined buckets in this study are as follows:

  1. Clinical
  2. Literature
  3. Ubiquitylation
  4. Turnover
  5. Small-molecule binder
  6. Location
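
The bucket idea can be illustrated with a short Python sketch. This is not the study's actual workflow: the `TargetEvidence` fields, the simple presence-based rules in `assign_buckets`, and the example identifier are all hypothetical, chosen only to show how a single target can land in none, one, or several of the six buckets depending on the data available for it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TargetEvidence:
    """Hypothetical per-target evidence record (field names are assumptions)."""
    uniprot_id: str
    has_clinical_protac: bool = False      # a PROTAC for this target is in clinical development
    in_protac_literature: bool = False     # target reported in the PROTAC literature
    ubiquitylation_sites: int = 0          # number of reported ubiquitylation sites
    half_life_hours: Optional[float] = None    # None means no turnover data
    small_molecule_ligands: int = 0        # number of known small-molecule binders
    cellular_location: Optional[str] = None    # None means no location annotation


def assign_buckets(target: TargetEvidence) -> List[str]:
    """Return every bucket for which the target has evidence.

    The rules are illustrative assumptions: each bucket is assigned purely
    on whether the corresponding data are present.
    """
    buckets = []
    if target.has_clinical_protac:
        buckets.append("Clinical")
    if target.in_protac_literature:
        buckets.append("Literature")
    if target.ubiquitylation_sites > 0:
        buckets.append("Ubiquitylation")
    if target.half_life_hours is not None:
        buckets.append("Turnover")
    if target.small_molecule_ligands > 0:
        buckets.append("Small-molecule binder")
    if target.cellular_location is not None:
        buckets.append("Location")
    return buckets


# Usage with a made-up target (the identifier is a placeholder, not a real UniProt ID).
example = TargetEvidence(
    uniprot_id="EXAMPLE1",
    ubiquitylation_sites=3,
    small_molecule_ligands=2,
    cellular_location="cytoplasm",
)
print(example.uniprot_id, assign_buckets(example))
```

Returning a list rather than a single label reflects the point that the buckets are not mutually exclusive; a target with no supporting data simply gets an empty list.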

How does this study go beyond previous attempts to identify PROTAC targets?

One important distinction between the bucket scheme used in this study and the previous work done by GSK is that the relationship between the different buckets is less pronounced for PROTACtability. Assignment to some buckets is clearly and strongly linked to the PROTACtability of a target, such as the buckets for targets with PROTACs in clinical trials or reported in the literature.

The approach developed in this study identified 1,067 proteins in the human proteome that have not yet been described in the literature as PROTAC targets and that offer potential opportunities for future PROTAC-based efforts.


This study developed an approach that can be used in combination with other data to help drug discovery researchers identify and prioritise PROTAC targets. For the first time, this research has enabled assessment and quantification of the potential of the PROTACtable genome. Although the researchers focused on PROTACs as the most advanced of the emerging degrader technologies, the team hope that their approach will provide a tool for studying other types of degrader molecules as more data become available.
