
Knowledge Graph

Data structure for AI and search · February 16, 2026

Summary

A knowledge graph is a network of real-world entities (people, places, concepts) connected by their relationships. It's how Google understands "things, not strings" — and increasingly, how AI systems ground their reasoning in structured facts.

1
Elementary School
Ages 8-10

Imagine you have a giant web of everything you know. In the middle is "you," and lines connect you to your friends, your school, your favorite foods, and your pets. Each thing connects to other things too — your friend connects to their school, their pets, their favorite foods.

A knowledge graph is like this web, but for a computer. It helps computers understand that "Paris" isn't just a word — it's a city, which is in France, which has the Eiffel Tower, which was built in 1889.

When you ask Google "How tall is the Eiffel Tower?", it doesn't just search for pages with those words. It knows the Eiffel Tower is a thing, and it knows facts about that thing — including that it's 330 meters tall!

2
High School
Ages 14-18

Historical context: The term "knowledge graph" was coined in 1972 by linguist Edgar W. Schneider, but the concept truly took off in 2012 when Google announced its Knowledge Graph with the tagline "things, not strings."

Before knowledge graphs, search engines treated everything as text. If you searched "mercury," you'd get results mixing the planet, the element, the Roman god, and Freddie Mercury. Knowledge graphs solved this by understanding that these are different entities that happen to share a name.
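A minimal sketch of the idea, assuming a toy in-memory entity store: each "Mercury" gets its own ID and type, so one search string maps to several distinct entities, and a type hint from context (here passed in by hand) picks the right one. The IDs, types, and the `candidates`/`resolve` helpers are hypothetical.

```python
# Toy entity store: every entity has a unique ID, a type, and a display label.
# IDs and types are hypothetical, chosen for readability.
ENTITIES = {
    "planet_mercury": {"label": "Mercury", "type": "Planet"},
    "element_mercury": {"label": "Mercury", "type": "ChemicalElement"},
    "god_mercury": {"label": "Mercury", "type": "RomanDeity"},
    "freddie_mercury": {"label": "Freddie Mercury", "type": "Musician"},
}

def candidates(query):
    """Return IDs of every entity whose label contains the query word."""
    q = query.lower()
    return [eid for eid, e in ENTITIES.items() if q in e["label"].lower().split()]

def resolve(query, expected_type):
    """Narrow the candidates with a type hint taken from the surrounding context."""
    return [eid for eid in candidates(query) if ENTITIES[eid]["type"] == expected_type]

print(candidates("mercury"))         # four distinct entities share one string
print(resolve("mercury", "Planet"))  # ['planet_mercury']
```

A real system infers the type hint from the rest of the query or page rather than receiving it as an argument.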

Entity → Relationship → Entity
(Einstein) → [born in] → (Ulm, Germany)

The three building blocks:

  • Nodes: Entities like people, places, concepts, events
  • Edges: Relationships connecting entities (works at, located in, invented)
  • Properties: Attributes of entities (birth date, population, height)
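To make the three building blocks concrete, here is a small sketch using the Einstein example: nodes carry properties, and edges are labeled links between node IDs. The class names, node IDs, and the `neighbors` helper are illustrative assumptions, not any particular product's data model.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    type: str
    properties: dict = field(default_factory=dict)  # attributes such as birth date

@dataclass
class Edge:
    source: str    # node ID the relationship starts from
    relation: str  # e.g. "born_in", "located_in"
    target: str    # node ID the relationship points to

nodes = {
    "einstein": Node("einstein", "Person", {"birth_date": "1879-03-14"}),
    "ulm": Node("ulm", "City"),
    "germany": Node("germany", "Country"),
}
edges = [
    Edge("einstein", "born_in", "ulm"),
    Edge("ulm", "located_in", "germany"),
]

def neighbors(node_id, relation):
    """Follow every edge of the given relation type out of node_id."""
    return [e.target for e in edges if e.source == node_id and e.relation == relation]

print(neighbors("einstein", "born_in"))  # ['ulm']
print(neighbors("ulm", "located_in"))    # ['germany']
```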
Example

When you search "Obama age," Google doesn't scan web pages. It looks up the entity "Barack Obama" in its knowledge graph, finds the birth date property (August 4, 1961), calculates the age, and displays it directly.
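The lookup-then-calculate step can be sketched as follows. This assumes a hypothetical in-memory dictionary standing in for the graph; Google's actual implementation is not public. The current date is passed in explicitly so the result is reproducible.

```python
from datetime import date

# Hypothetical slice of a knowledge graph: entity -> properties.
KG = {"Barack Obama": {"type": "Person", "birth_date": date(1961, 8, 4)}}

def age_on(entity, today):
    """Look up the birth_date property and compute age in whole years."""
    born = KG[entity]["birth_date"]
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)

print(age_on("Barack Obama", date(2026, 2, 16)))  # 64
```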

3
College Undergraduate
Ages 18-22

Knowledge graphs are fundamentally different from traditional databases. In a relational database, you define rigid schemas upfront (customers table, orders table). In a knowledge graph, you model the world as a flexible network that can grow organically.

Key technical concepts:

  • Triples: The atomic unit of knowledge: (Subject, Predicate, Object). Example: (Marie Curie, won, Nobel Prize in Physics)
  • Ontologies: Formal schemas that define entity types and valid relationships. Think of them as the "rules" of your knowledge domain
  • Inference: Deriving new facts from existing ones. If A is a parent of B, and B is a parent of C, then A is a grandparent of C
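Triples and rule-based inference fit in a few lines. This sketch stores facts as (subject, predicate, object) tuples and applies the grandparent rule from the list above by joining `parent_of` facts on their shared middle entity. The people are hypothetical.

```python
# Facts stored as (subject, predicate, object) triples; names are hypothetical.
triples = {
    ("Alice", "parent_of", "Bob"),
    ("Bob", "parent_of", "Carol"),
    ("Bob", "parent_of", "Dave"),
}

def infer_grandparents(facts):
    """Rule: parent_of(A, B) and parent_of(B, C) implies grandparent_of(A, C)."""
    parents = [(s, o) for s, p, o in facts if p == "parent_of"]
    derived = set()
    for a, b in parents:
        for b2, c in parents:
            if b == b2:  # join on the shared middle entity
                derived.add((a, "grandparent_of", c))
    return derived

print(sorted(infer_grandparents(triples)))
# [('Alice', 'grandparent_of', 'Carol'), ('Alice', 'grandparent_of', 'Dave')]
```

Production reasoners evaluate many such rules (often declared in the ontology) and repeat until no new facts appear; this shows a single pass of one rule.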

Major public knowledge graphs:

  • Wikidata: Open, community-maintained, supplies data to many Wikipedia infoboxes
  • DBpedia: Structured extraction from Wikipedia articles
  • Google Knowledge Graph: Powers Google Search, ~500 billion facts
  • YAGO: Academic project combining Wikipedia, WordNet, GeoNames
The "Knowledge Panel"

When you search Google for a celebrity, company, or landmark, the information box shown alongside the results (on the right on desktop) is the Knowledge Panel: a direct visualization of knowledge graph data about that entity.

4
Graduate Student
Advanced degree level

Graph databases and query languages:

Knowledge graphs are typically stored in specialized graph databases optimized for traversing relationships. In SQL, which is designed around rows and tables, each additional hop usually means another join or a recursive query; graph query languages express multi-hop traversals directly.

  • SPARQL: Query language for RDF triplestores (W3C standard)
  • Cypher: Neo4j's declarative query language
  • Gremlin: Apache TinkerPop's traversal language
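What these languages make cheap is the multi-hop traversal itself. A minimal sketch, assuming a plain adjacency list rather than any real graph database, is a breadth-first search that returns every node reachable within a given number of hops.

```python
from collections import deque

# Adjacency list: node -> list of (relation, neighbor). Data is illustrative.
GRAPH = {
    "Marie Curie": [("won", "Nobel Prize in Physics"), ("born_in", "Warsaw")],
    "Warsaw": [("located_in", "Poland")],
    "Poland": [("located_in", "Europe")],
    "Nobel Prize in Physics": [],
    "Europe": [],
}

def within_hops(start, max_hops):
    """Breadth-first traversal returning {node: hop distance} up to max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for _relation, nxt in GRAPH.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

print(within_hops("Marie Curie", 2))
# {'Marie Curie': 0, 'Nobel Prize in Physics': 1, 'Warsaw': 1, 'Poland': 2}
```

In a relational schema the same two-hop question would need one join per hop; a graph engine follows stored adjacency directly.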

Knowledge graph embeddings:

For machine learning applications, entities and relationships need to be represented as vectors. Models like TransE, RotatE, and ComplEx learn embeddings that preserve the relational structure, enabling:

  • Link prediction (inferring missing relationships)
  • Entity classification
  • Similarity search
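As an illustration of the first use, link prediction, here is the TransE scoring idea: a true triple (h, r, t) should satisfy h + r ≈ t, so candidates are ranked by the distance between h + r and t. The 2-D vectors are hand-picked assumptions; a real model learns them (usually in far higher dimensions) from the graph.

```python
import math

# Hand-picked 2-D embeddings for illustration only; not learned.
entity = {
    "Paris": (1.0, 0.0),
    "France": (1.0, 1.0),
    "Berlin": (3.0, 0.0),
    "Germany": (3.0, 1.0),
}
relation = {"capital_of": (0.0, 1.0)}

def transe_distance(h, r, t):
    """TransE: a plausible triple has h + r close to t (smaller is better)."""
    translated = [a + b for a, b in zip(entity[h], relation[r])]
    return math.dist(translated, entity[t])

def predict_tail(h, r, candidates):
    """Link prediction: rank candidate objects by TransE distance."""
    return sorted(candidates, key=lambda t: transe_distance(h, r, t))

print(predict_tail("Berlin", "capital_of", ["France", "Germany"]))  # ['Germany', 'France']
```

RotatE and ComplEx replace the translation with a rotation or a complex-valued product, but the ranking step works the same way.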

Virtual vs. materialized knowledge graphs:

Materialized graphs store data directly in a graph database. Virtual knowledge graphs query underlying relational databases through a mapping layer (ontology-based data access) — useful for enterprises that can't migrate existing data.
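A minimal sketch of the virtual approach, assuming an in-memory SQLite table and a hypothetical column-to-predicate mapping: a triple pattern is rewritten into SQL at query time, so the relational data is never copied. Real ontology-based data access systems express such mappings in a language like R2RML and rewrite full SPARQL queries; this handles only a single predicate.

```python
import sqlite3

# The relational table stays where it is; nothing is copied into a graph store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, "Ada", "Research"), (2, "Grace", "Engineering")])

# Hypothetical mapping from table columns to graph predicates.
MAPPING = {
    "table": "employees",
    "key": "id",
    "prefix": "emp:",
    "predicates": {"name": "hasName", "dept": "worksIn"},
}

def virtual_triples(conn, mapping, predicate):
    """Rewrite the triple pattern (?s, predicate, ?o) into SQL at query time."""
    column = next(c for c, p in mapping["predicates"].items() if p == predicate)
    # Identifiers come from the trusted mapping, never from user input.
    sql = (f"SELECT {mapping['key']}, {column} FROM {mapping['table']} "
           f"ORDER BY {mapping['key']}")
    return [(f"{mapping['prefix']}{key}", predicate, value)
            for key, value in conn.execute(sql)]

print(virtual_triples(conn, MAPPING, "worksIn"))
# [('emp:1', 'worksIn', 'Research'), ('emp:2', 'worksIn', 'Engineering')]
```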

Entity Alignment Problem

When merging knowledge graphs, the same entity often appears with different identifiers. "NYC," "New York City," and "City of New York" must be recognized as the same entity. This disambiguation is an active research area combining string matching, structural similarity, and learned embeddings.
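A toy version of combining string matching with structural similarity might look like this. It assumes the two graphs already share identifiers for the neighboring entities (real aligners bootstrap this iteratively), and the 0.3/0.7 weights are arbitrary assumptions, not tuned values. Note that string similarity alone would struggle with "NYC"; the shared neighborhood carries the match.

```python
from difflib import SequenceMatcher

# Two hypothetical source graphs: entity -> set of neighboring entities.
kg_a = {"NYC": {"Manhattan", "Brooklyn", "United States"}}
kg_b = {
    "City of New York": {"Manhattan", "Brooklyn", "United States"},
    "York": {"North Yorkshire", "England"},
}

def string_sim(a, b):
    """Character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def structural_sim(na, nb):
    """Jaccard overlap of neighbor sets in [0, 1]."""
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

def align(kg_a, kg_b, w_string=0.3, w_struct=0.7):
    """For each entity in kg_a, pick the best-scoring entity in kg_b."""
    matches = {}
    for a, na in kg_a.items():
        scores = {b: w_string * string_sim(a, b) + w_struct * structural_sim(na, nb)
                  for b, nb in kg_b.items()}
        matches[a] = max(scores, key=scores.get)
    return matches

print(align(kg_a, kg_b))  # {'NYC': 'City of New York'}
```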

5
Expert
Researchers & practitioners

Knowledge graphs for LLM grounding (GraphRAG):

GraphRAG, introduced by Microsoft Research in 2024, marks a significant shift. Traditional RAG retrieves text chunks; GraphRAG uses an LLM to construct a knowledge graph from source documents, groups related entities into communities with precomputed summaries, and then uses that graph structure to retrieve contextually relevant information. Benefits:

  • Handles multi-hop reasoning ("Who are the investors of companies founded by Stanford AI lab alumni?")
  • Provides citation chains and explainability
  • Can reduce hallucination by grounding generation in structured facts
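The multi-hop and citation-chain benefits can be illustrated with a simplified graph retrieval step. This is not Microsoft's GraphRAG pipeline (which relies on community summaries); it is a plain path query over hypothetical triples that keeps every triple it used as evidence, ready to be placed in an LLM prompt.

```python
# Hypothetical triples; every person, company, and fund is invented.
TRIPLES = [
    ("Alice", "alumnus_of", "Stanford AI Lab"),
    ("Bob", "alumnus_of", "MIT CSAIL"),
    ("Alice", "founded", "Acme Robotics"),
    ("Bob", "founded", "Beta Labs"),
    ("Fund One", "invested_in", "Acme Robotics"),
    ("Fund Two", "invested_in", "Beta Labs"),
]

def subjects(pred, obj):
    return [s for s, p, o in TRIPLES if p == pred and o == obj]

def objects(subj, pred):
    return [o for s, p, o in TRIPLES if s == subj and p == pred]

def investors_of_alumni_startups(lab):
    """Answer the 3-hop question and record the triples used as evidence."""
    answers, evidence = set(), []
    for person in subjects("alumnus_of", lab):
        for company in objects(person, "founded"):
            for investor in subjects("invested_in", company):
                answers.add(investor)
                evidence += [(person, "alumnus_of", lab),
                             (person, "founded", company),
                             (investor, "invested_in", company)]
    return answers, evidence

answers, evidence = investors_of_alumni_startups("Stanford AI Lab")
# The evidence becomes grounded, citable context for the LLM prompt.
context = "\n".join(f"{s} {p} {o}." for s, p, o in evidence)
print(answers)  # {'Fund One'}
```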

Automated knowledge graph construction:

KG construction has shifted from manual curation toward semi-automated extraction, and LLMs have sharply accelerated that shift. Modern pipelines use:

  • Named Entity Recognition (NER) for entity extraction
  • Relation extraction models for edge prediction
  • Entity linking to existing KGs (Wikidata, domain ontologies)
  • Iterative refinement with human-in-the-loop validation
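The four stages can be wired together as in the sketch below. Each stage is a deliberately crude rule-based stand-in (a gazetteer for NER, a regex for relation extraction, an exact-match lookup for linking) where a real pipeline would use trained models or an LLM; the gazetteer entries and `kb:` IDs are hypothetical. Anything not fully recognized and linked is routed to a review queue instead of being written to the graph.

```python
import re

# Rule-based stand-ins for trained models; entries and KB IDs are hypothetical.
GAZETTEER = {"Marie Curie": "Person", "Nobel Prize in Physics": "Award"}
KB_IDS = {"Marie Curie": "kb:marie_curie", "Nobel Prize in Physics": "kb:nobel_physics"}
RELATION_PATTERNS = [(re.compile(r"(?P<s>.+?) won (?:the )?(?P<o>.+)"), "won")]

def extract_entities(sentence):
    """NER stand-in: gazetteer names that occur in the sentence."""
    return {name for name in GAZETTEER if name in sentence}

def extract_relations(sentence):
    """Relation-extraction stand-in: one regex per predicate."""
    found = []
    for pattern, predicate in RELATION_PATTERNS:
        m = pattern.fullmatch(sentence.rstrip("."))
        if m:
            found.append((m["s"], predicate, m["o"]))
    return found

def link(name):
    """Entity-linking stand-in: exact lookup against an existing KB."""
    return KB_IDS.get(name)

def build(sentences):
    accepted, review_queue = [], []
    for sentence in sentences:
        entities = extract_entities(sentence)
        for s, p, o in extract_relations(sentence):
            s_id, o_id = link(s), link(o)
            if {s, o} <= entities and s_id and o_id:
                accepted.append((s_id, p, o_id))
            else:
                review_queue.append((s, p, o))  # human-in-the-loop validation
    return accepted, review_queue

accepted, review_queue = build([
    "Marie Curie won the Nobel Prize in Physics.",
    "Marie Curie won the Nobel Prize in Chemistry.",
])
print(accepted)      # [('kb:marie_curie', 'won', 'kb:nobel_physics')]
print(review_queue)  # [('Marie Curie', 'won', 'Nobel Prize in Chemistry')]
```

The second fact is true but unknown to this tiny gazetteer, which is exactly the case a human reviewer would resolve before the graph grows.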

Enterprise knowledge graph architecture:

Production systems typically layer: (1) raw data sources, (2) ETL/mapping layer, (3) graph storage, (4) inference engine, (5) API/query layer, (6) application tier. Challenges include versioning, provenance tracking, and handling temporal dynamics (facts that change over time).
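Temporal dynamics and provenance are often handled by attaching a validity interval and a source to each fact. The following sketch, with a hypothetical company and hypothetical source names, answers "as of" queries over such facts; real systems may instead use RDF-star, named graphs, or property-graph edge properties for the same purpose.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date]  # None means the fact is still true
    source: str               # provenance: where the fact came from

# Hypothetical facts about a hypothetical company.
FACTS = [
    Fact("Acme Corp", "ceo", "Dana", date(2015, 1, 1), date(2021, 6, 30), "hr_system"),
    Fact("Acme Corp", "ceo", "Eli", date(2021, 7, 1), None, "press_release"),
]

def as_of(subject, predicate, when):
    """Return (object, source) pairs for facts valid on the given date."""
    return [(f.obj, f.source) for f in FACTS
            if f.subject == subject and f.predicate == predicate
            and f.valid_from <= when
            and (f.valid_to is None or when <= f.valid_to)]

print(as_of("Acme Corp", "ceo", date(2020, 1, 1)))  # [('Dana', 'hr_system')]
print(as_of("Acme Corp", "ceo", date(2024, 1, 1)))  # [('Eli', 'press_release')]
```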

Neurosymbolic integration:

The frontier is combining neural networks (learning from data) with symbolic reasoning (logical inference over KGs). This addresses LLM weaknesses in factual accuracy and multi-step reasoning while preserving flexibility and language understanding.
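One simple form of this combination: a neural model proposes candidate facts with confidence scores, and a symbolic layer rejects any that violate the ontology's domain and range constraints. In the sketch below the proposals and scores are hard-coded stand-ins for model output, and the types, schema, and 0.5 threshold are assumptions.

```python
# Hypothetical ontology: entity types plus (domain, range) for each predicate.
TYPES = {"Marie Curie": "Person", "Warsaw": "City", "Polonium": "ChemicalElement"}
SCHEMA = {"born_in": ("Person", "City"), "discovered": ("Person", "ChemicalElement")}

# Stand-in for a neural model's output: (subject, predicate, object, confidence).
proposals = [
    ("Marie Curie", "born_in", "Warsaw", 0.94),
    ("Marie Curie", "born_in", "Polonium", 0.71),  # violates range: must be a City
    ("Marie Curie", "discovered", "Polonium", 0.88),
]

def symbolically_valid(s, p, o):
    """Check the proposed triple against the ontology's domain/range rules."""
    if p not in SCHEMA:
        return False
    domain, rng = SCHEMA[p]
    return TYPES.get(s) == domain and TYPES.get(o) == rng

accepted = [(s, p, o) for s, p, o, score in proposals
            if score >= 0.5 and symbolically_valid(s, p, o)]
print(accepted)
# [('Marie Curie', 'born_in', 'Warsaw'), ('Marie Curie', 'discovered', 'Polonium')]
```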

Biomedical Knowledge Graphs

Drug discovery increasingly relies on KGs like Hetionet (about 47,000 nodes of 11 types, including genes, diseases, compounds, and pathways, joined by about 2.25 million edges). Graph-based models, including GNNs, traverse these graphs to predict novel drug-target interactions and repurposing candidates, with the aim of shortening early-stage discovery.
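A GNN is too large to sketch here, but the underlying intuition can be shown with a simpler technique: counting typed paths (a "metapath") from a compound through the genes it binds to the diseases those genes are associated with. Path-count features of this kind were used in the original Hetionet drug-repurposing work. All compounds, genes, and diseases below are placeholders, not real Hetionet identifiers.

```python
from collections import defaultdict

# Placeholder edges: compound -> genes it binds; gene -> associated diseases.
binds = {"CompoundA": {"GENE1", "GENE2"}, "CompoundB": {"GENE3"}}
associated_with = {
    "GENE1": {"DiseaseX"},
    "GENE2": {"DiseaseX", "DiseaseY"},
    "GENE3": {"DiseaseY"},
}

def metapath_counts(compound):
    """Count Compound -binds-> Gene -associated_with-> Disease paths per disease."""
    counts = defaultdict(int)
    for gene in binds.get(compound, ()):
        for disease in associated_with.get(gene, ()):
            counts[disease] += 1
    return dict(counts)

# More connecting paths suggests a stronger (hypothetical) repurposing signal.
print(metapath_counts("CompoundA"))  # DiseaseX via 2 paths, DiseaseY via 1
```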
