Knowledge graphs are critical to many industries today, with tech giants such as Google and Facebook driving their recent spike in popularity. Applied to pharmaceutical research, knowledge graphs provide a framework for searching huge libraries of prospective drugs and isolating the compounds that could be functionally relevant. These tools allow us to model biological complexity in minute detail and thereby make drug discovery more efficient and economical.
What are they?
Surprisingly, there is no consensus on what constitutes a knowledge graph. However, there are some generally agreed-upon features, including:
- They are, in fact, graphs: entities are nodes and the relationships between them are edges. This structure makes relationships in the data explicit and also makes the tools very flexible.
- Knowledge graphs are semantic, meaning that the data has ontologies applied to it. Being able to express data in terms of entities makes querying easier.
- Knowledge graphs support inference because they are based on ontologies. This allows implicit information to be derived from what is stated explicitly.
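As a minimal sketch of the features above, the following Python stores a graph as (subject, predicate, object) triples, queries it by pattern, and derives implicit facts from a single ontology-style rule. The entity names, the `is_a` and `inhibits` predicates, and the `query` and `infer_types` helpers are all illustrative assumptions; a production knowledge graph would use a dedicated graph store and reasoner.

```python
# Toy knowledge graph as a set of (subject, predicate, object) triples.
# All names and facts here are hypothetical, for illustration only.
triples = {
    ("aspirin", "is_a", "nsaid"),
    ("nsaid", "is_a", "anti_inflammatory"),
    ("anti_inflammatory", "is_a", "drug"),
    ("aspirin", "inhibits", "cox1"),
}


def query(graph, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return {
        (s, p, o)
        for (s, p, o) in graph
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    }


def infer_types(graph):
    """Apply the rule 'is_a is transitive' until no new triples appear."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        is_a = [(s, o) for (s, p, o) in inferred if p == "is_a"]
        for a, b in is_a:
            for c, d in is_a:
                if b == c and (a, "is_a", d) not in inferred:
                    inferred.add((a, "is_a", d))
                    changed = True
    return inferred


closed = infer_types(triples)
explicit = query(triples, subject="aspirin", predicate="is_a")
# Facts about aspirin that were never stated directly.
implicit = query(closed, subject="aspirin", predicate="is_a") - explicit
```

Here `implicit` holds the facts that aspirin is an anti-inflammatory and a drug, neither of which appears in the original triples.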
But wait, what are ontologies?
In artificial intelligence, an ontology is defined as a specification of meanings; that is, it describes the types, properties, and interrelationships of the entities in a system.
Knowledge graphs are one way to represent ontologies, particularly where there are values assigned to the entities and the relationships between them are mapped.
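To make the distinction concrete, here is a small sketch in which an ontology declares the entity types and the relationships allowed between them, and a graph holds the actual entities and edges that conform to it. The `Ontology` and `KnowledgeGraph` classes, the type names, and the `targets` and `associated_with` relationships are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical mini-ontology: entity types plus allowed relationships."""
    entity_types: set = field(default_factory=set)
    # Maps a relationship name to its (subject type, object type) signature.
    relations: dict = field(default_factory=dict)

    def allows(self, subject_type, relation, object_type):
        """Check whether a relationship is permitted between two types."""
        return self.relations.get(relation) == (subject_type, object_type)


@dataclass
class KnowledgeGraph:
    """Entities and edges that must conform to the ontology."""
    ontology: Ontology
    entities: dict = field(default_factory=dict)  # entity name -> type
    edges: list = field(default_factory=list)     # (subject, relation, object)

    def add_entity(self, name, entity_type):
        if entity_type not in self.ontology.entity_types:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.entities[name] = entity_type

    def add_edge(self, subject, relation, obj):
        s_type, o_type = self.entities[subject], self.entities[obj]
        if not self.ontology.allows(s_type, relation, o_type):
            raise ValueError(
                f"{relation} not allowed between {s_type} and {o_type}"
            )
        self.edges.append((subject, relation, obj))


ontology = Ontology(
    entity_types={"Drug", "Protein", "Disease"},
    relations={
        "targets": ("Drug", "Protein"),
        "associated_with": ("Protein", "Disease"),
    },
)
kg = KnowledgeGraph(ontology)
kg.add_entity("drug_a", "Drug")
kg.add_entity("protein_x", "Protein")
kg.add_edge("drug_a", "targets", "protein_x")
```

An edge that violates the ontology, such as a protein that `targets` a drug, raises a `ValueError` instead of being stored.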
Why use knowledge graphs?
- They provide an encompassing framework through which to leverage data across an enterprise
- They enable AI and machine learning across data at scale
- They allow interpretability by both humans and computers
- Rather than returning a list of search results, they produce direct answers
Challenges with knowledge graphs:
- Knowledge extraction and integration: Despite advances in ML and NLP, extracting named entities and defining relationships across heterogeneous sources remains a challenge. Techniques such as entity recognition, classification, and text and entity embeddings can help to link and integrate unstructured data.
- Entity disambiguation and resolution: Different sources may refer to the same entity by different names (synonyms) and, conversely, entities that share a name may have different properties. Unique and normalised identifiers need to be applied across entities in the graph. A variety of tools have strengths here, including embedding approaches, probabilistic methods, rule-based approaches, and supervised and unsupervised learning.
- Managing changing data and detecting errors: To be an effective entity-linking system, a knowledge graph must grow naturally with ever-changing data input. Managing highly dynamic data poses greater challenges than managing point-in-time knowledge. Automated approaches to knowledge verification include probabilistic graphical models and natural language inference.
- Operating at scale: Scale is a double-edged sword. Knowledge graphs let us model, represent, and query great quantities of biological data, but each of the challenges above is compounded as the graph grows.
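The entity resolution challenge above can be illustrated with a deliberately simple rule-based sketch: a synonym table maps the names used by different sources onto one normalised identifier. The identifiers (`DRUG:0001` and so on), the source names, and the helper functions are made up for illustration; a real pipeline would draw on curated vocabularies and the probabilistic or embedding methods mentioned above.

```python
# Hypothetical synonym table mapping surface names to normalised identifiers.
# The identifiers below are made up for illustration, not real database IDs.
SYNONYMS = {
    "acetylsalicylic acid": "DRUG:0001",
    "aspirin": "DRUG:0001",
    "asa": "DRUG:0001",
    "paracetamol": "DRUG:0002",
    "acetaminophen": "DRUG:0002",
}


def normalise(name):
    """Lower-case and collapse whitespace so lookups tolerate trivial variants."""
    return " ".join(name.lower().split())


def resolve(name):
    """Return the canonical identifier for a name, or None if unknown."""
    return SYNONYMS.get(normalise(name))


def merge_records(records):
    """Group (source, name) mentions by resolved identifier.

    Unresolved names are kept separately so they can be reviewed
    rather than silently dropped.
    """
    merged, unresolved = {}, []
    for source, name in records:
        entity_id = resolve(name)
        if entity_id is None:
            unresolved.append((source, name))
        else:
            merged.setdefault(entity_id, []).append((source, name))
    return merged, unresolved


records = [
    ("source_a", "Aspirin"),
    ("source_b", "acetylsalicylic  acid"),
    ("source_c", "Acetaminophen"),
    ("source_c", "compound 17"),
]
merged, unresolved = merge_records(records)
```

The two spellings of aspirin end up under the same identifier, while the unknown "compound 17" lands in `unresolved` for manual review. Exact-match lookup like this does not handle the reverse problem of shared names with different meanings, which needs context-aware methods.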
Vendor example:
The Benevolent AI Platform™ is an experimental discovery platform that focuses on three key areas: target identification, molecular design, and precision medicine.
At the core of the platform sits the Benevolent Knowledge Graph, which integrates many data types from many sources. Concepts derived from this data form the base representations for entities. This enables researchers to extract and harmonize the relationships between them, for example interactions between proteins, genes and mechanisms, or relationships between diseases and drugs.
Further resources:
For further reading on knowledge graphs, Peter Henstock, Machine Learning & AI Technical Lead at Pfizer, recommends the following textbooks:
- Social and Economic Networks by Matthew O. Jackson
- Networks, Crowds, and Markets: Reasoning about a Highly Connected World by David Easley and Jon Kleinberg
- Linked: The New Science of Networks by Albert-László Barabási
- Network Science by Albert-László Barabási
- Understanding Social Networks: Theories, Concepts, and Findings by Charles Kadushin
- Graph Analysis and Visualization: Discovering Business Opportunity in Linked Data by Richard Brath and David Jonker
- Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences) by Stanley Wasserman and Katherine Faust
Title image credit: OpenDataScience