A knowledge graph is a network of real-world entities (people, places, concepts) connected by their relationships. It's how Google understands "things, not strings" — and increasingly, how AI systems ground their reasoning in structured facts.
Imagine you have a giant web of everything you know. In the middle is "you," and lines connect you to your friends, your school, your favorite foods, and your pets. Each thing connects to other things too — your friend connects to their school, their pets, their favorite foods.
A knowledge graph is like this web, but for a computer. It helps computers understand that "Paris" isn't just a word — it's a city, which is in France, which has the Eiffel Tower, which was built in 1889.
When you ask Google "How tall is the Eiffel Tower?", it doesn't just search for pages with those words. It knows the Eiffel Tower is a thing, and it knows facts about that thing — including that it's 330 meters tall!
Historical context: The term "knowledge graph" was coined in 1972 by linguist Edgar W. Schneider, but the concept truly took off in 2012 when Google announced their Knowledge Graph with the tagline "things, not strings."
Before knowledge graphs, search engines treated everything as text. If you searched "mercury," you'd get results mixing the planet, the element, the Roman god, and Freddie Mercury. Knowledge graphs solved this by understanding that these are different entities that happen to share a name.
The three building blocks:
- Entities (nodes): the things themselves, such as people, places, organizations, and concepts.
- Relationships (edges): how entities connect, such as "born in," "located in," or "founded by."
- Attributes (properties): facts attached to an entity, such as a birth date or a height.
When you search "Obama age," Google doesn't scan web pages. It looks up the entity "Barack Obama" in its knowledge graph, finds the birth date property (August 4, 1961), calculates the age, and displays it directly.
Knowledge graphs are fundamentally different from traditional databases. In a relational database, you define rigid schemas upfront (customers table, orders table). In a knowledge graph, you model the world as a flexible network that can grow organically.
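To make the contrast concrete, here is a minimal sketch (plain Python, no real graph database) of the triple model: the "Obama age" lookup above becomes a property lookup plus a small calculation, and adding a brand-new relationship type requires no schema migration. The entity and predicate names are illustrative, not taken from any particular knowledge graph.

```python
from datetime import date

# A knowledge graph as a set of (subject, predicate, object) triples.
# No upfront schema: any new predicate can be added at any time.
triples = {
    ("Barack Obama", "instance_of", "human"),
    ("Barack Obama", "date_of_birth", date(1961, 8, 4)),
    ("Eiffel Tower", "instance_of", "tower"),
    ("Eiffel Tower", "located_in", "Paris"),
    ("Paris", "capital_of", "France"),
}

def objects(subject, predicate):
    """Return all objects for a given (subject, predicate) pair."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]

# "Obama age": look up the birth-date property, then compute.
born = objects("Barack Obama", "date_of_birth")[0]
today = date.today()
age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
print(f"Barack Obama is {age} years old")

# Growing organically: a new relationship type needs no migration.
triples.add(("Eiffel Tower", "height_m", 330))
```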
Key technical concepts:
- Triple: the atomic unit of a knowledge graph, a (subject, predicate, object) statement such as (Eiffel Tower, located in, Paris).
- Ontology: the schema layer that defines entity types and which relationships can hold between them.
- RDF vs. property graphs: the two dominant data models; RDF standardizes triples and global identifiers (URIs), while property graphs attach key-value attributes directly to nodes and edges.
Major public knowledge graphs:
- Wikidata: the collaboratively edited graph behind Wikipedia's structured data, freely downloadable and queryable via SPARQL.
- DBpedia: structured facts extracted automatically from Wikipedia infoboxes.
- YAGO: an academic knowledge graph combining Wikipedia with WordNet.
- Google Knowledge Graph: proprietary, but surfaced publicly through search features.
When you search for a celebrity, company, or landmark on Google, the box on the right side is the Knowledge Panel — a direct visualization of knowledge graph data about that entity.
Graph databases and query languages:
Knowledge graphs are typically stored in specialized graph databases optimized for traversing relationships. Unlike SQL, which answers questions by joining tables, graph query languages such as Cypher, SPARQL, and Gremlin are built for multi-hop traversals (friend-of-a-friend-style queries).
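As a small sketch of what "multi-hop" means, the following reuses the in-memory triple representation from the earlier example; the Cypher in the comment is an approximate equivalent, assuming hypothetical labels and property names rather than any specific deployed schema.

```python
# "Which country is the Eiffel Tower in?" takes two hops:
# Eiffel Tower -located_in-> Paris -capital_of-> France.
#
# Roughly the equivalent Cypher (Neo4j's query language) would be:
#   MATCH ({name: "Eiffel Tower"})-[:LOCATED_IN]->(city)-[:CAPITAL_OF]->(country)
#   RETURN country.name

triples = {
    ("Eiffel Tower", "located_in", "Paris"),
    ("Paris", "capital_of", "France"),
    ("Statue of Liberty", "located_in", "New York City"),
}

def hop(subjects, predicate):
    """Follow one relationship type outward from a set of entities."""
    return {o for (s, p, o) in triples if s in subjects and p == predicate}

print(hop(hop({"Eiffel Tower"}, "located_in"), "capital_of"))  # {'France'}
```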
Knowledge graph embeddings:
For machine learning applications, entities and relationships need to be represented as vectors. Models like TransE, RotatE, and ComplEx learn embeddings that preserve the relational structure, enabling link prediction (inferring facts the graph is missing), entity similarity search and clustering, and downstream tasks such as recommendation and question answering.
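To make the idea concrete, here is a toy numpy sketch of the TransE scoring function: a fact (h, r, t) is considered plausible when the head vector plus the relation vector lands near the tail vector. The embeddings below are random placeholders, so the printed ranking is meaningless; a real model learns the vectors from known triples.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy embedding tables (in practice these are learned, not random).
entities = {name: rng.normal(size=dim) for name in ["Paris", "France", "Tokyo", "Japan"]}
relations = {name: rng.normal(size=dim) for name in ["capital_of"]}

def transe_score(head, relation, tail):
    """TransE plausibility: score = -||h + r - t||, so a perfect fit scores 0
    and more negative means less plausible."""
    h, r, t = entities[head], relations[relation], entities[tail]
    return -np.linalg.norm(h + r - t)

# Link prediction: rank candidate tails for (Paris, capital_of, ?).
# With random vectors the order is arbitrary; trained embeddings would rank France first.
candidates = ["France", "Japan", "Tokyo"]
ranked = sorted(candidates, key=lambda t: transe_score("Paris", "capital_of", t), reverse=True)
print(ranked)
```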
Virtual vs. materialized knowledge graphs:
Materialized graphs store data directly in a graph database. Virtual knowledge graphs query underlying relational databases through a mapping layer (ontology-based data access) — useful for enterprises that can't migrate existing data.
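Below is a minimal sketch of the virtual approach, assuming a toy SQLite table as the "existing data": each graph predicate is mapped to a SQL template, so triple-pattern lookups are answered by the relational database without copying anything into a graph store. The table, column, and predicate names are invented for illustration, and the mapping dictionary is a drastically simplified stand-in for real ontology-based data access mappings such as R2RML.

```python
import sqlite3

# A relational source the enterprise already has (here: an in-memory toy).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cities (name TEXT, country TEXT, population INTEGER)")
db.executemany("INSERT INTO cities VALUES (?, ?, ?)",
               [("Paris", "France", 2_100_000), ("Lyon", "France", 520_000)])

# The mapping layer: each graph predicate is backed by a SQL template.
PREDICATE_TO_SQL = {
    "located_in": "SELECT country FROM cities WHERE name = ?",
    "population": "SELECT population FROM cities WHERE name = ?",
}

def virtual_lookup(subject, predicate):
    """Answer a triple pattern (subject, predicate, ?) against the relational source."""
    sql = PREDICATE_TO_SQL[predicate]
    return [row[0] for row in db.execute(sql, (subject,))]

print(virtual_lookup("Paris", "located_in"))   # ['France']
print(virtual_lookup("Paris", "population"))   # [2100000]
```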
When merging knowledge graphs, the same entity often appears with different identifiers. "NYC," "New York City," and "City of New York" must be recognized as the same entity. This task, known as entity resolution (or disambiguation), is an active research area combining string matching, structural similarity, and learned embeddings.
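Here is a deliberately simple sketch of the string-matching layer, using only the standard library; the alias table, threshold, and normalization rules are assumptions for illustration, and a production resolver would combine this signal with graph-structural similarity and learned embeddings before merging two nodes.

```python
from difflib import SequenceMatcher

# Known abbreviation expansions; abbreviations rarely yield to string similarity alone.
ABBREVIATIONS = {"nyc": "new york"}

def normalize(name):
    """Lowercase, expand known abbreviations, drop filler words."""
    lowered = ABBREVIATIONS.get(name.lower(), name.lower())
    tokens = [t for t in lowered.split() if t not in {"city", "of", "the"}]
    return " ".join(tokens) or lowered

def same_entity(a, b, threshold=0.8):
    """String-similarity heuristic only; real resolvers add structural and embedding signals."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

pairs = [("NYC", "New York City"),
         ("New York City", "City of New York"),
         ("New York City", "Newark")]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {same_entity(a, b)}")
```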
Knowledge graphs for LLM grounding (GraphRAG):
The 2024 emergence of GraphRAG (Microsoft Research) represents a paradigm shift. Traditional RAG retrieves isolated text chunks; GraphRAG constructs a knowledge graph from source documents, then uses the graph structure to retrieve contextually relevant information. Benefits include multi-hop reasoning across documents, better answers to broad questions that span the whole corpus, and retrieved context that carries an explicit, explainable chain of entities and relationships.
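The sketch below shows the retrieval idea in miniature (it is not Microsoft's implementation): entities mentioned in the question are located in a small document-derived graph, the surrounding neighborhood is expanded a fixed number of hops, and the passages attached to the visited entities are returned as context for the LLM. All entities and passages here are made up.

```python
import collections

# Toy document-derived graph: relation edges plus the source passage per entity.
edges = [
    ("Acme Corp", "acquired", "Widget Inc"),
    ("Widget Inc", "headquartered_in", "Berlin"),
    ("Acme Corp", "founded_by", "Jane Doe"),
]
passages = {
    "Acme Corp": "Acme Corp announced record revenue ...",
    "Widget Inc": "Widget Inc, acquired by Acme Corp, ...",
    "Berlin": "The Berlin office houses the hardware team ...",
    "Jane Doe": "Jane Doe founded Acme Corp after leaving ...",
}

neighbors = collections.defaultdict(set)
for s, _, o in edges:
    neighbors[s].add(o)
    neighbors[o].add(s)

def graph_retrieve(question, hops=2):
    """Expand outward from entities mentioned in the question and
    collect the passages attached to the visited subgraph."""
    frontier = {e for e in passages if e.lower() in question.lower()}
    visited = set(frontier)
    for _ in range(hops):
        frontier = {n for e in frontier for n in neighbors[e]} - visited
        visited |= frontier
    return [passages[e] for e in visited]

print(graph_retrieve("Who founded Acme Corp and where is its subsidiary based?"))
```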
Automated knowledge graph construction:
LLMs have transformed KG construction from manual curation to semi-automated extraction. Modern pipelines typically combine LLM-driven entity and relation extraction, entity resolution against the existing graph, and alignment with a target schema or ontology, with human review reserved for low-confidence outputs.
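A minimal sketch of the extraction step follows; `call_llm` is a placeholder for whatever model endpoint you actually use (its name and canned output are assumptions, not a real API), and the point is the prompt-then-parse-then-validate loop that runs before anything touches the graph.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (hosted API or local model).
    Here it returns a canned response so the sketch runs end to end."""
    return '[{"subject": "Marie Curie", "relation": "born_in", "object": "Warsaw"}]'

EXTRACTION_PROMPT = """Extract factual (subject, relation, object) triples from the text below.
Respond with a JSON list of objects with keys "subject", "relation", "object".

Text: {text}"""

def extract_triples(text):
    raw = call_llm(EXTRACTION_PROMPT.format(text=text))
    try:
        candidates = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed model output: skip or retry in a real pipeline
    # Validate shape before anything reaches the graph.
    return [(c["subject"], c["relation"], c["object"])
            for c in candidates
            if isinstance(c, dict) and {"subject", "relation", "object"} <= c.keys()]

print(extract_triples("Marie Curie was born in Warsaw in 1867."))
```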
Enterprise knowledge graph architecture:
Production systems typically layer: (1) raw data sources, (2) ETL/mapping layer, (3) graph storage, (4) inference engine, (5) API/query layer, (6) application tier. Challenges include versioning, provenance tracking, and handling temporal dynamics (facts that change over time).
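One simple way (of many possible designs) to make provenance and temporal validity first-class is to store each fact with its source and a validity interval, as in this toy dataclass; the dates and sources below are real-world examples chosen only to show a time-scoped query.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str                      # provenance: where this fact came from
    valid_from: date                 # temporal scope: when it became true
    valid_to: Optional[date] = None  # None = still true

facts = [
    Fact("United Kingdom", "member_of", "European Union",
         source="europa.eu", valid_from=date(1973, 1, 1), valid_to=date(2020, 1, 31)),
    Fact("France", "member_of", "European Union",
         source="europa.eu", valid_from=date(1958, 1, 1)),
]

def facts_as_of(when):
    """Return only the facts valid on a given date (time-travel queries)."""
    return [f for f in facts
            if f.valid_from <= when and (f.valid_to is None or when <= f.valid_to)]

print([f.subject for f in facts_as_of(date(2024, 1, 1))])  # ['France']
```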
Neurosymbolic integration:
The frontier is combining neural networks (learning from data) with symbolic reasoning (logical inference over KGs). This addresses LLM weaknesses in factual accuracy and multi-step reasoning while preserving flexibility and language understanding.
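As a toy illustration of the pattern, the sketch below has a "neural" side that merely proposes claims and a symbolic side that checks them against the graph with a transitive rule before they are accepted; the rule and the claims are invented for illustration.

```python
# Symbolic side: explicit facts plus a transitive "located_in" rule.
facts = {
    ("Eiffel Tower", "located_in", "Paris"),
    ("Paris", "located_in", "France"),
}

def located_in(entity, place):
    """Logical inference: located_in is treated as transitive over the stored facts."""
    if (entity, "located_in", place) in facts:
        return True
    return any(located_in(mid, place)
               for (e, p, mid) in facts if e == entity and p == "located_in")

def check_claim(claim):
    """Accept a model's proposed (entity, located_in, place) claim only if
    the symbolic layer can derive it from the graph."""
    entity, place = claim
    return "supported" if located_in(entity, place) else "unverified: needs review"

# Neural side (stand-in): claims an LLM might produce.
for claim in [("Eiffel Tower", "France"), ("Eiffel Tower", "Germany")]:
    print(claim, "->", check_claim(claim))
```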
Drug discovery increasingly relies on KGs like Hetionet (about 47,000 nodes and over 2 million edges linking genes, diseases, compounds, and pathways). GNN-based models traverse these graphs to predict novel drug-target interactions, compressing early stages of the discovery pipeline from years to months.
Neo4j: The leading graph database platform. Powers knowledge graphs at eBay, NASA, and many Fortune 500 companies. Raised $582M, valued at $2B+.
Diffbot: Builds the world's largest commercial knowledge graph by crawling and understanding the entire web using AI. Raised $50M+ total.
Stardog: Enterprise knowledge graph platform focused on data fabric and virtual graph capabilities. Series C, $45M raised.
Ontotext: GraphDB creator and semantic technology company. Powers BBC, Financial Times, and AstraZeneca knowledge systems.
RelationalAI: Knowledge graph compute platform focused on enterprise AI reasoning. Founded by a database pioneer. Series C, $122M raised.
Startup automating knowledge graph construction from unstructured text for RAG applications. Seed stage.
AI-native search engine using knowledge graph understanding. Pivoted from consumer to API. Raised $17M Series A.
Amazon Neptune: AWS's managed graph database service supporting both property graphs and RDF, integrated with the AWS ecosystem.