Knowledge Graph Approach to Connecting SARS-CoV-2 Viral Proteins to Host Function

A recent study used previously identified interactions between viral and human host proteins as the starting point for a machine learning-based approach that connects SARS-CoV-2 viral proteins to relevant host biological functions, diseases and pathways in a large-scale knowledge graph derived from biomedical literature.

SARS-CoV-2 proteins

Like other positive-stranded RNA viruses, SARS-CoV-2’s encoded proteins interact with proteins of the infected host cell at various stages of its replicative cycle. Such proteins therefore represent possible targets for the development of antiviral drugs. Researchers have previously identified human host proteins that bind to overexpressed SARS-CoV-2 viral proteins in immortalised human cells using an affinity-purification mass spectrometry screen, which provides a starting point to study virus-host interactions via network-based approaches. Subsequent work has expanded on this by further integrating SARS-CoV-2 viral proteins into the human interactome and applying network biology approaches to identify existing drugs for repurposing.

Integrating SARS-CoV-2 viral proteins into a knowledge graph

In this study, the researchers integrated SARS-CoV-2 viral proteins into a large-scale knowledge graph, which included the human interactome as well as various cause-effect relationships curated from existing biomedical literature. In contrast to previous approaches, these relationships specifically distinguish between activating and inhibiting effects, enabling predictions to be made about the direction of drug effects on host functions that are important in a clinical or disease context. For example, clinical observations have indicated that SARS-CoV-2 has an activating effect on the coagulation of blood, which has led researchers to look for drugs that have an inhibiting effect on coagulation in order to counteract viral effects.

Therefore, the advantage of the method used in this study, compared to the purely interactome-driven network biology approach, is that by integrating other experimental evidence from the literature in the form of cause-effect relationships, the researchers are able to better elucidate relevant biological mechanisms, and propose repurposing candidates specifically targeted to block or counteract observed clinical endpoints in infection.
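To make the idea of signed cause-effect relationships concrete, the following is a minimal sketch and not the study's actual method. It assumes a toy graph in which every edge carries a sign (+1 for activation, -1 for inhibition) and takes the net effect of one node on another to be the product of the signs along the shortest path. All node names (viral_protein_X, host_gene_A, drug_1, drug_2) and the helper functions are hypothetical and used for illustration only.

```python
from collections import deque

# Hypothetical signed edges: (source, target, sign).
# +1 means "activates", -1 means "inhibits". None of these are real findings.
EDGES = [
    ("viral_protein_X", "host_gene_A", +1),
    ("host_gene_A", "coagulation", +1),
    ("drug_1", "host_gene_A", -1),
    ("drug_2", "host_gene_A", +1),
]


def build_graph(edges):
    """Turn an edge list into an adjacency map: node -> [(neighbour, sign)]."""
    graph = {}
    for source, target, sign in edges:
        graph.setdefault(source, []).append((target, sign))
    return graph


def net_effect(graph, start, goal):
    """Sign of the net effect of start on goal along a shortest path.

    Uses breadth-first search and multiplies edge signs along the way.
    Returns +1, -1, or None if goal cannot be reached from start.
    """
    queue = deque([(start, +1)])
    seen = {start}
    while queue:
        node, sign = queue.popleft()
        if node == goal:
            return sign
        for neighbour, edge_sign in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, sign * edge_sign))
    return None


def counteracting_drugs(graph, virus_node, endpoint, drugs):
    """Drugs whose net effect on the endpoint is opposite to the virus's."""
    viral_sign = net_effect(graph, virus_node, endpoint)
    if viral_sign is None:
        return []
    return [d for d in drugs if net_effect(graph, d, endpoint) == -viral_sign]


graph = build_graph(EDGES)
candidates = counteracting_drugs(
    graph, "viral_protein_X", "coagulation", ["drug_1", "drug_2"]
)
print(candidates)
```

In this toy graph the viral protein activates the endpoint, so only the drug with a net inhibiting effect is returned. An unsigned interactome would treat both drugs alike, since both simply connect to the same host gene.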

A machine learning approach to prioritize genes

The researchers’ algorithm uses machine learning to prioritize genes that are known or predicted to causally affect a given host function through either inhibition or activation. Their approach is based on the distributed representation of genes as vectors embedded in a high-dimensional vector space. Gene embeddings have previously been obtained from protein-protein interaction and co-expression networks, and used for function prediction. However, in this study they constructed embeddings from known causal effects on the expression of other genes, thereby explicitly distinguishing between up- and down-regulations. Their method has the advantage that the direction of effects is already encoded in the embedding vectors.
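The sketch below illustrates why encoding direction in the vectors is useful. It is a simplified stand-in rather than the study's embedding procedure: each gene is represented directly by a vector of signed effects on a handful of downstream targets (+1 for up-regulation, -1 for down-regulation, 0 for no known effect), and genes are compared by cosine similarity. The gene names, target names and values are all hypothetical.

```python
import math

# Hypothetical downstream targets; each vector position refers to one of them.
TARGETS = ["t1", "t2", "t3", "t4"]

# Hypothetical signed effects of each gene on the expression of TARGETS.
# +1 = up-regulates, -1 = down-regulates, 0 = no known effect.
SIGNED_EFFECTS = {
    "known_activator": [+1, +1, -1, 0],
    "gene_B": [+1, +1, 0, 0],
    "gene_C": [-1, -1, +1, 0],
    "gene_D": [0, 0, 0, +1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_against(reference, vectors):
    """Rank all other genes by similarity to the reference gene's vector."""
    ref = vectors[reference]
    scores = {
        gene: cosine(ref, vec)
        for gene, vec in vectors.items()
        if gene != reference
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_against("known_activator", SIGNED_EFFECTS)
for gene, score in ranked:
    print(f"{gene}: {score:+.2f}")
```

Under these assumptions, a gene with a strongly positive score shares the reference gene's pattern of up- and down-regulation and could be read as a candidate activator of the same function. A strongly negative score points to the opposite pattern and hence a candidate inhibitor, and a score near zero indicates no clear relationship. An embedding built from unsigned interactions could not separate the first two cases.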

Building a knowledge graph approach

Overall, 70 networks involving viral proteins and a number of relevant ‘endpoint’ functions were computed and made available through a web interface called the Coronavirus Network Explorer (CNE). These networks represent the broad spectrum of host biology that is affected by SARS-CoV-2 infection. Immunological signalling pathways, such as IL-1 and IL-6, were included as they describe the impact of the inflammatory response in COVID-19 patients. The researchers also included networks that display biological endpoints observed in severely or critically ill COVID-19 patients, such as pneumonia and respiratory failure. A set of networks represents the complex viral life cycle and its counterpart host response, and the researchers also included networks for functions that are possibly hijacked by the virus itself for its replication/multiplication or transmission.


The knowledge graph approach outlined in this study is able to identify biologically plausible hypotheses for COVID-19 pathogenesis, explicitly connected to the immunological, virological and pathological observations seen in infected patients. The discovery of repurposing candidates is driven by knowledge of relevant functional endpoints that reflect known viral biology or clinical observations, thereby suggesting potential mechanisms of action. As a result, the researchers behind this study believe that their CNE offers relevant insights that go beyond more conventional network approaches and will be a valuable tool for drug repurposing.

Image credit: kjpargeter – FreePik
