Knowledge Graphs
A knowledge graph is a way of organising data from different sources so that the links between different ‘entities’ (objects, people, or things) are clear. One way of understanding this is to think of a family tree that shows the relationships between four generations. The information in the family tree might be recorded like this:
- Queen Elizabeth II married to Prince Philip
- Queen Elizabeth II mother of Prince Charles
- Prince Charles divorced from Princess Diana
- Princess Diana mother of Prince William
- Prince William married to Princess Kate
- Princess Kate mother of Prince George
In this example, the people (e.g., Queen Elizabeth II, Prince Philip) are the ‘entities’ (known as ‘nodes’ in knowledge graphs), the metaphorical branches are the links between entities (known as ‘edges’ in knowledge graphs), and the relationships (e.g., married to) represent the ‘encoded knowledge’ held within the knowledge graph.
This way of structuring data is very useful because it can enable researchers to infer (or make an informed guess about) new, previously unknown information. For example, if the statement ‘Prince Harry is brother of Prince William’ were added to the above family tree (or knowledge graph), it would be possible to infer that Princess Diana was also the mother of Prince Harry, even if this piece of information was not previously known.
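The inference described above can be sketched in a few lines of Python. This is only an illustration: the function name `infer_mothers` and the rule it applies (brothers share a mother) are assumptions made for this example, and the rule is a simplification that would be wrong for half-siblings, which is why the result is an informed guess rather than a certainty.

```python
# Each fact is a (subject, relationship, object) tuple taken from the family tree.
facts = {
    ("Queen Elizabeth II", "married to", "Prince Philip"),
    ("Queen Elizabeth II", "mother of", "Prince Charles"),
    ("Prince Charles", "divorced from", "Princess Diana"),
    ("Princess Diana", "mother of", "Prince William"),
    ("Prince William", "married to", "Princess Kate"),
    ("Princess Kate", "mother of", "Prince George"),
    # The newly added piece of information.
    ("Prince Harry", "brother of", "Prince William"),
}


def infer_mothers(known_facts):
    """Assumed rule: if X is brother of Y and M is mother of Y, then M is mother of X."""
    inferred = set()
    for sibling, relation, person in known_facts:
        if relation != "brother of":
            continue
        for mother, other_relation, child in known_facts:
            if other_relation == "mother of" and child == person:
                new_fact = (mother, "mother of", sibling)
                # Only report facts that were not already recorded.
                if new_fact not in known_facts:
                    inferred.add(new_fact)
    return inferred


new_facts = infer_mothers(facts)
print(new_facts)  # {('Princess Diana', 'mother of', 'Prince Harry')}
```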
This capability of knowledge graphs is what makes them very popular in commonly used technologies. For example:
- Google uses a knowledge graph to enable its search functionality.
- Facebook uses a knowledge graph to identify connections between people.
- Netflix uses a knowledge graph to recommend films or TV shows to viewers.
Knowledge graphs are also used extensively in medicine. For example:
- Diagnostics: A knowledge graph can be used to represent the links between symptoms and diseases, helping technologies like chatbots provide ‘decision support’ to clinicians.
- Drug repurposing: A knowledge graph can be used to show similarities between diseases so that researchers can see whether it might be possible to use a drug designed to treat Disease A to also treat Disease B.
- Personalised treatment planning: A knowledge graph can be used to show which drugs are most likely to be effective at treating a specific patient, based on how similar that patient is to other patients who have been treated with the same drug.
Knowledge graphs used to be created manually by human experts such as doctors, which was a very time-consuming process. Increasingly, AI is used to automate the creation of knowledge graphs by independently ‘learning’ (or figuring out) connections between different pieces of information contained in the datasets it is given. Alternatively, existing knowledge graphs may serve as the ‘input’ (the training material) for AI models, reducing the need to create large labelled datasets.
A knowledge graph is a directed labelled graph that represents the relationships between different entities (people, objects, or things). There are three key components in a knowledge graph:
- nodes (the entities)
- edges (the relationship between entities)
- labels (the meaning of the relationships).
Combined, these three components form ‘triples’, which are often represented like this:
A (the original entity, the first node) is known as the subject.
B (the link, or edge) is known as the predicate.
C (the target entity, the second node) is known as the object.
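As a minimal sketch of how a triple might be held in code, the example below stores each subject, predicate, and object in a small Python record and looks triples up by pattern. The `Triple` class, the `match` helper, and the medical facts are all hypothetical and invented for illustration; production systems typically use a dedicated graph database or triple store instead.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # A: the original entity (first node)
    predicate: str  # B: the labelled edge
    object: str     # C: the target entity (second node)


# Hypothetical medical triples, invented purely for illustration.
graph = [
    Triple("Disease A", "has symptom", "Fever"),
    Triple("Disease A", "treated by", "Drug X"),
    Triple("Disease B", "has symptom", "Fever"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


# Which diseases are linked to the symptom "Fever"?
diseases_with_fever = [
    t.subject for t in match(graph, predicate="has symptom", obj="Fever")
]
print(diseases_with_fever)  # ['Disease A', 'Disease B']
```

Leaving a field as `None` acts as a wildcard, so the same helper can answer questions such as "everything known about Disease A" or "everything treated by Drug X".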
This relatively simple data structure can be used to link together vast amounts of heterogeneous information such as test results, genomic information, air quality information, and previous prescriptions.
These links might simply make known information more discoverable (for example, the connections between a set of symptoms and a specific disease), or they might enable inferences about completely new information (for example, whether a drug used to treat Disease A might also be used to treat Disease B). In this way, knowledge graphs enable deeper understanding of particular domains.
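To make the second kind of use more concrete, here is a hedged sketch of one simple way a repurposing candidate could be surfaced: diseases that share many gene associations are treated as similar, and a drug that treats one is proposed for the other. The triples, the "associated with" relationship, the Jaccard similarity measure, and the 0.5 threshold are all assumptions chosen for this example; real systems generally rely on far richer data and learned models.

```python
# Hypothetical triples; disease, gene, and drug names are placeholders.
triples = [
    ("Disease A", "associated with", "Gene 1"),
    ("Disease A", "associated with", "Gene 2"),
    ("Disease A", "associated with", "Gene 3"),
    ("Disease B", "associated with", "Gene 2"),
    ("Disease B", "associated with", "Gene 3"),
    ("Disease C", "associated with", "Gene 4"),
    ("Drug X", "treats", "Disease A"),
]


def genes_of(disease):
    """Collect the genes linked to a disease."""
    return {o for s, p, o in triples if s == disease and p == "associated with"}


def jaccard(a, b):
    """Share of elements the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def repurposing_candidates(threshold=0.5):
    """Propose (drug, disease, score) where the disease resembles one the drug treats."""
    diseases = {s for s, p, _ in triples if p == "associated with"}
    treats = {(s, o) for s, p, o in triples if p == "treats"}
    candidates = []
    for drug, treated in treats:
        for other in diseases - {treated}:
            score = jaccard(genes_of(treated), genes_of(other))
            if score >= threshold and (drug, other) not in treats:
                candidates.append((drug, other, round(score, 2)))
    return sorted(candidates)


print(repurposing_candidates())  # [('Drug X', 'Disease B', 0.67)]
```

The output is a hypothesis for researchers to investigate, not evidence that the drug works for the second disease.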
Knowledge graphs can be created manually by following a series of steps including:
1. Identifying the scope and objectives of the knowledge graph and developing the data schema.
2. Gathering the data from different sources, including medical literature, clinical trial results, and real world data such as that contained in electronic health records (EHRs).
3. Extracting and transforming the data into a structured format by identifying the entities and defining the relationships between them.
4. Mapping the entities and relationships to the chosen data schema.
5. Using prediction models to infer ‘missing’ information, such as missing relationships.
6. Validating the knowledge graph to ensure accuracy.
7. Regularly updating the knowledge graph to ensure relevance by incorporating new data and knowledge, refining the schema and evaluating the quality of the knowledge graph.
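A small part of this process can be sketched in code. The example below covers only the schema (step 1), the mapping of candidate triples to that schema (steps 3 and 4), and an automatic conformance check (one narrow piece of step 6). The schema, entity types, and candidate triples are hypothetical, and validation in practice would also involve checks against source data and expert review.

```python
# Step 1 (assumed schema): which entity types each relationship may connect.
SCHEMA = {
    "treats": ("Drug", "Disease"),
    "has symptom": ("Disease", "Symptom"),
}

# Step 3 (hypothetical extracted entities and their types).
ENTITY_TYPES = {
    "Drug X": "Drug",
    "Disease A": "Disease",
    "Fever": "Symptom",
}

# Steps 3-4: candidate triples expressed with the schema's relationship names.
candidates = [
    ("Drug X", "treats", "Disease A"),
    ("Disease A", "has symptom", "Fever"),
    ("Fever", "treats", "Drug X"),     # wrong direction
    ("Drug X", "cures", "Disease A"),  # relationship not in the schema
]


def validate(triple):
    """Step 6 (partial): list the problems; an empty list means the triple conforms."""
    subject, predicate, obj = triple
    if predicate not in SCHEMA:
        return [f"unknown relationship: {predicate}"]
    expected_subject, expected_object = SCHEMA[predicate]
    problems = []
    if ENTITY_TYPES.get(subject) != expected_subject:
        problems.append(f"{subject} is not a {expected_subject}")
    if ENTITY_TYPES.get(obj) != expected_object:
        problems.append(f"{obj} is not a {expected_object}")
    return problems


accepted = [t for t in candidates if not validate(t)]
rejected = {t: validate(t) for t in candidates if validate(t)}

print(accepted)
for triple, problems in rejected.items():
    print(triple, "->", problems)
```

Rejected triples are kept alongside the reasons for rejection so that a human reviewer can decide whether the data or the schema needs to change.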
This is a time-consuming process that is not well suited to the exponential growth of medical information that can be witnessed today. This is why, increasingly, AI techniques including natural language processing (NLP) are being used to automate the creation of knowledge graphs.
Alternatively, knowledge graphs can be used as inputs in machine learning models, helping them to better ‘understand’ specific domain knowledge. Finally, knowledge graphs might be used to make machine learning models more ‘explainable’ by representing the knowledge (and therefore the decision-making process) in a structured and relatively transparent format.
Knowledge graphs have numerous uses in healthcare. Three of the most common are:
- Treatment recommendation decision support: A knowledge graph where the nodes are medicines, diseases, and patients, and the edges represent the interactions between these different entities, can be used to recommend to clinicians the best treatment for specific patients.
- Identifying misinformation: By comparing information presented online (for example, the effectiveness of a particular drug), with the authoritative information contained within the knowledge graph, knowledge graphs can be used to identify and suppress misinformation.
- Drug discovery and drug repurposing: A knowledge graph comprised of entities (e.g., the ingredients of a drug) and relationships between these kinds of entities (e.g., the interaction of Drug A and Drug B) can be used to predict dangerous drug-to-drug interactions, the potential repurposing of an existing drug as a treatment for a newly discovered disease, or the identification of a completely new potential treatment.
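The misinformation use case can be illustrated with a deliberately simple sketch: a claim is written as a triple and compared with the facts held in the graph. The trusted facts, the ‘does not treat’ relationship, and the three verdict labels are assumptions made for this example. Turning free text found online into triples is a separate and much harder step that is not shown here.

```python
# Hypothetical authoritative facts held in the knowledge graph.
TRUSTED = {
    ("Drug A", "treats", "Disease A"),
    ("Drug A", "does not treat", "Disease B"),
    ("Drug A", "interacts with", "Drug B"),
}

# Assumed pairs of relationships that contradict each other.
OPPOSITE = {
    "treats": "does not treat",
    "does not treat": "treats",
}


def check_claim(claim):
    """Classify a (subject, predicate, object) claim against the trusted facts."""
    subject, predicate, obj = claim
    if claim in TRUSTED:
        return "supported"
    opposite = OPPOSITE.get(predicate)
    if opposite is not None and (subject, opposite, obj) in TRUSTED:
        return "contradicted"
    # The graph holds nothing either way, so no verdict is given.
    return "unverified"


print(check_claim(("Drug A", "treats", "Disease A")))  # supported
print(check_claim(("Drug A", "treats", "Disease B")))  # contradicted
print(check_claim(("Drug A", "treats", "Disease C")))  # unverified
```

Keeping ‘unverified’ separate from ‘contradicted’ matters: a claim that is missing from the graph is not necessarily false.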
An Owkin example
Owkin’s work on drug discovery and drug repurposing makes use of a ‘knowledge engine’ that identifies new target diseases and target subpopulations for existing ‘on-the-market’ drugs. The basis of this knowledge engine is a knowledge graph that helps match the right patient to the right drug.
Further reading
- Abu-Salih, Bilal et al. 2022. ‘Healthcare Knowledge Graph Construction: State-of-the-Art, Open Issues, and Opportunities’. https://arxiv.org/abs/2207.03771
- Chaudhri, Vinay K, Naren Chittar, and Michael Genesereth. 2021. ‘An Introduction to Knowledge Graphs’. SAIL Blog. http://ai.stanford.edu/blog/introduction-to-knowledge-graphs/ (July 18, 2023).
- Cui, Hejie et al. 2023. ‘A Survey on Knowledge Graphs for Healthcare: Resources, Applications, and Promises’. http://arxiv.org/abs/2306.04802
- Ehrlinger, Lisa, and Wolfram Wöß. 2016. ‘Towards a Definition of Knowledge Graphs’. In Leipzig, Germany. https://ceur-ws.org/Vol-1695/paper4.pdf (July 18, 2023).
- Hänsel, Katrin et al. 2023. ‘From Data to Wisdom: Biomedical Knowledge Graphs for Real-World Data Insights’. Journal of Medical Systems 47(1): 65.
- Peng, Ciyuan, Feng Xia, Mehdi Naseriparsa, and Francesco Osborne. 2023. ‘Knowledge Graphs: Opportunities and Challenges’. https://arxiv.org/abs/2303.13948 (July 18, 2023).
- Rajabi, Enayat, and Somayeh Kafaie. 2022. ‘Knowledge Graphs and Explainable AI in Healthcare’. Information 13(10): 459.
- Wilcke, Xander, Peter Bloem, and Victor De Boer. 2017. ‘The Knowledge Graph as the Default Data Model for Learning on Heterogeneous Knowledge’ ed. Michel Dumontier. Data Science 1(1–2): 39–57.