We’re Entering the “Blue Links” Era of RAG

We are on the verge of realizing that in order to do anything significantly useful with GenAI, you can’t depend only on autoregressive LLMs to make your decisions. I know what you’re thinking: “RAG is the answer.” Or fine-tuning, or GPT-5.
Yes. Techniques like vector-based RAG and fine-tuning can help. And they are good enough for some use cases. But there’s another whole class of use cases where these techniques all bump into a ceiling. Vector-based RAG – in the same way as fine-tuning – increases the probability of a correct answer for many kinds of questions. However, neither technique provides the certainty of a correct answer. Oftentimes they also lack context, color, and a connection to what you know to be true. Further, these tools don’t leave you with many clues about why they made a particular decision.
Back in 2012, Google introduced their second-generation search engine with an iconic blog post titled “Introducing the Knowledge Graph: things, not strings.”[1] They discovered that a huge leap in capability is possible if you use a knowledge graph to organize the things represented by the strings in all these web pages, in addition to doing all of the string processing. We are seeing this same pattern unfold in GenAI today. Many GenAI projects are bumping up against a ceiling, where the quality of results is gated by the fact that the solutions in use are dealing in strings, not things.
Fast forward to today: AI engineers and academic researchers at the leading edge are discovering the same thing that Google did, that the secret to breaking through this ceiling is knowledge graphs. In other words, bring knowledge about things into the mix of statistically-based text techniques. This works just like any other type of RAG, except with a call to a knowledge graph in addition to a vector index. That combination is GraphRAG!
This post is intended to be a comprehensive and easy-to-read treatment of GraphRAG. It turns out that building a knowledge graph of your data and using it in RAG gives you several powerful advantages. There’s a robust body of research proving that it gives you better answers to most if not ALL questions you might ask an LLM using normal vector-only RAG.
That alone will be a huge driver of GraphRAG adoption. In addition to that, you get easier development thanks to data being visible when building your app. A third major advantage is that graphs can be readily understood and reasoned upon by humans as well as machines. Building with GraphRAG is therefore easier, gives you better results, and – this is a killer in many industries – is explainable and auditable! I believe GraphRAG will subsume vector-only RAG and emerge as the default RAG architecture for most use cases. This post explains why.
Wait, Graph?
Let’s be clear that when we say graph, we mean something like this:

While this image has been widely used to exemplify knowledge graphs, the original source and author remain unidentified. The earliest known usage appears to be this Medium post from Farahnaz Akrami. If you are the creator of this image, please contact us so we may provide proper attribution.

The Graph of Thrones visualization by William Lyon.

London Underground Map (Credit: Transport for London.) Fun fact, Transport for London recently deployed a graph-powered digital twin to improve incident response and reduce congestion.
In other words, not a chart.
If you want to delve more into graphs and knowledge graphs, I’d recommend a detour to Neo4j’s GraphAcademy or Andrew Ng’s DeepLearning.AI course on Knowledge Graphs for RAG. We won’t linger on definitions here and will continue forward assuming basic working knowledge of graphs.
If you understand the pictures above, you can see how you might query the underlying knowledge graph data (stored in a graph database) as part of your RAG pipeline. This is what GraphRAG is about.
Two Types of Knowledge Representation: Vectors & Graphs
The core of typical RAG – vector search – takes in a chunk of text and returns conceptually similar text from a candidate body of written material. This is pleasantly automagical and is very useful for basic searches.
What you might not think about every time you do this is what a vector looks like, or what the similarity calculation is doing. Let’s look at an apple in human terms, vector terms, and graph terms:

The human representation is complex and multidimensional and not something we can fully capture on paper. Let’s grant some poetic license and imagine that this beautifully tempting picture represents an apple in all its perceptual & conceptual glory.
The vector representation of the apple[2] is an array of numbers – a construct of the statistical realm. The magic of vectors is that they each capture the essence of their corresponding text in encoded form. In a RAG context, however, they are only valuable when you need to identify how similar one handful of words is to another. Doing this is as simple as running a similarity calculation (aka vector math) and getting a match. However, if you want to make sense of what’s inside of a vector, understand what’s around it, get a handle on the things represented in your text, or understand how any of these fit into a larger context, then vectors as a representation just aren’t able to do that.
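To make the "similarity calculation" concrete, here is a minimal sketch of cosine similarity, one common choice for comparing embeddings. The three tiny vectors are made-up values for illustration only; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings (assumed values, not from a real model).
embeddings = {
    "apple": [0.9, 0.1, 0.3, 0.0],
    "pear": [0.8, 0.2, 0.4, 0.1],
    "submarine": [0.0, 0.9, 0.1, 0.8],
}

# Rank every other item by its similarity to "apple".
query = embeddings["apple"]
ranked = sorted(
    (name for name in embeddings if name != "apple"),
    key=lambda name: cosine_similarity(query, embeddings[name]),
    reverse=True,
)
print(ranked)
```

Note what the numbers do not tell you: the ranking says "pear" is closer to "apple" than "submarine" is, but nothing in the arrays says why.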
Knowledge graphs, by contrast, are declarative – or in AI terms, symbolic – representations of the world. As a result, both humans and machines can understand and reason upon knowledge graphs. This is a BIG DEAL, which we’ll revisit later. Additionally, you can query, visualize, annotate, fix, and grow knowledge graphs. A knowledge graph represents your world model[3] – the part of the world that represents the domain you are working with.
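As a rough contrast with the array of numbers, the sketch below stores a few facts about an apple as (subject, relationship, object) triples, which is one simple way to hold a graph in memory. The entity and relationship names are hypothetical, and a real knowledge graph would live in a graph database rather than a Python set.

```python
# A tiny knowledge graph as a set of (subject, relationship, object) triples.
# All names here are illustrative assumptions.
triples = {
    ("Apple", "IS_A", "Fruit"),
    ("Apple", "GROWS_ON", "Apple Tree"),
    ("Apple", "HAS_COLOR", "Red"),
    ("Pear", "IS_A", "Fruit"),
    ("Apple Tree", "IS_A", "Tree"),
}

def facts_about(entity):
    """Return every outgoing (relationship, object) pair for an entity, sorted."""
    return sorted((rel, obj) for subj, rel, obj in triples if subj == entity)

def subjects_with(rel, obj):
    """Return every subject that has the given relationship to the given object."""
    return sorted(subj for subj, r, o in triples if r == rel and o == obj)

print(facts_about("Apple"))
print(subjects_with("IS_A", "Fruit"))

# "Growing" the graph is just adding another readable fact.
triples.add(("Pear", "HAS_COLOR", "Green"))
```

Every fact is readable on its own, so a person can inspect, correct, or extend it without decoding anything.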
GraphRAG “vs.” RAG
It’s not a competition 🙂 Vector and graph queries each add value in RAG. As LlamaIndex founder Jerry Liu has pointed out, it’s helpful to think about GraphRAG as inclusive of vectors. This is distinct from “vector-only RAG,” which relies strictly on similarity between embeddings of the words in text.
Fundamentally, GraphRAG is RAG, where the Retrieval path includes a knowledge graph. As you can see below, the core GraphRAG pattern is straightforward. It’s basically the same architecture as RAG with vectors[4] but with a knowledge graph layered into the picture.
GraphRAG Pattern

Here, you see a graph query being triggered. It can optionally include a vector similarity component. You can choose to store your graphs and vectors either separately in two distinct databases, or use a graph database like Neo4j which also supports vector search.
One of the common patterns for using GraphRAG is as follows:
- Do a vector or keyword search to find an initial set of nodes.
- Traverse the graph to bring back information about related nodes.
- Optionally, re-rank documents using a graph-based ranking algorithm such as PageRank.
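The three steps above can be sketched end to end on a toy in-memory graph. Everything here is an assumption made for illustration: the node names, the texts, the naive keyword match standing in for vector search, and the hand-rolled PageRank. A production pipeline would delegate these steps to a graph database and an embedding model.

```python
# Hypothetical mini knowledge graph: node id -> text, plus directed edges.
nodes = {
    "acme": "Acme Corp builds electric vehicles",
    "lithium": "Lithium is a key input for batteries",
    "battery_co": "Battery Co supplies battery packs",
    "bakery": "Main Street Bakery sells bread",
}
edges = [("acme", "battery_co"), ("battery_co", "lithium"), ("acme", "lithium")]

def keyword_search(query):
    """Step 1: find seed nodes whose text shares a word with the query."""
    terms = set(query.lower().split())
    return [n for n, text in nodes.items() if terms & set(text.lower().split())]

def neighbors(node):
    """Nodes one hop away, ignoring edge direction."""
    outgoing = {dst for src, dst in edges if src == node}
    incoming = {src for src, dst in edges if dst == node}
    return outgoing | incoming

def expand(seeds):
    """Step 2: add the one-hop neighborhood of every seed node."""
    found = set(seeds)
    for seed in seeds:
        found |= neighbors(seed)
    return found

def pagerank(damping=0.85, iterations=50):
    """Step 3 helper: power-iteration PageRank over the whole graph."""
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    out_links = {node: [d for s, d in edges if s == node] for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing edges spreads its rank evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

def retrieve(query):
    """Run search, traversal, and re-ranking; return node ids, best first."""
    candidates = expand(keyword_search(query))
    scores = pagerank()
    return sorted(candidates, key=lambda node: scores[node], reverse=True)

print(retrieve("lithium shortage"))
```

The query only matches the lithium node directly, but the traversal step pulls in the two companies connected to it, which is context that text similarity alone would not surface.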
Patterns vary by use case, and like everything else in AI today, GraphRAG is proving to be a rich space, with new discoveries emerging every week. We will dedicate a future blog post to the most common GraphRAG patterns we see today.
GraphRAG Lifecycle
A GenAI application that uses GraphRAG follows the same pattern as any RAG application, with an added “create graph” step at the start:

Creating a graph is analogous to chunking documents and loading them into a vector database. Advances in tooling have made graph creation literally that easy. The good news is threefold:
- Graphs are highly iterative – you can start with a “minimum viable graph” and expand from there.
- Once your data is in a knowledge graph, it becomes very easy to evolve. You can add more kinds of data, to reap the benefits of data network effects. You can also improve the quality of the data to up the value of your application results.
- This part of the stack is rapidly improving, which means graph creation will only get easier as tooling gets more sophisticated.
Adding the graph creation step to the earlier picture gives you a pipeline that looks like this:

I will dive deeper into graph creation later. For now, let’s set that aside and talk about the benefits of GraphRAG.
Why GraphRAG?
The benefits we are seeing from GraphRAG relative to vector-only RAG fall into three main buckets:
- Higher accuracy and more complete answers (runtime / production benefit)
- Once you’ve created your knowledge graph, it’s easier to both build[5] and subsequently maintain your RAG application (development time benefit)
- Better explainability, traceability[6], and access controls (governance benefit)
Let’s drill into these:
#1: Higher Accuracy & More Useful Answers
The first (and most immediately tangible) benefit we see with GraphRAG is higher-quality responses. In addition to a growing number of examples we see from our customers, an increasing number of academic studies also support this. One such example is by data catalog company Data.world. At the end of 2023, they published a study that showed that GraphRAG, on average, improved accuracy of LLM responses by 3x across 43 business questions. The benchmark found evidence of a significant improvement in the accuracy of responses when backed by a knowledge graph.

More recently, and perhaps better known, is a series of posts by Microsoft starting in February 2024 with a research blog titled “GraphRAG: Unlocking LLM discovery on narrative private data,” along with an associated research paper and software release. Here they observed that baseline RAG (i.e., with vectors) has the following two problems:
- Baseline RAG struggles to connect the dots. This happens when answering a question requires traversing disparate pieces of information through their shared attributes in order to provide new synthesized insights.
- Baseline RAG performs poorly when being asked to holistically understand summarized semantic concepts over large data collections or even singular large documents.
Microsoft found that “By using the LLM-generated knowledge graph, GraphRAG vastly improves the ‘retrieval’ portion of RAG, populating the context window with higher relevance content, resulting in better answers and capturing evidence provenance.” They also discovered that GraphRAG required between 26% and 97% fewer tokens than alternative approaches, making it not just better at providing answers, but also cheaper and more scalable.[7]
Digging deeper into the topic of accuracy, it’s not just whether an answer is correct that’s important; it’s also how useful the answers are. What people have been finding with GraphRAG is that not only are the answers more accurate, but they are also richer, more complete, and more useful. LinkedIn’s recent paper describing the impact of GraphRAG on their customer service application provides an excellent example of this. GraphRAG improves both correctness and richness (and therefore usefulness) for answering customer service questions, reducing median per-issue resolution time by 28.6% for their customer service team.[8]
A similar example comes from a GenAI workshop taught by Neo4j together with our partners at GCP, AWS, and Microsoft. The sample query below, which targets a collection of SEC filings, provides a good illustration of the kinds of answers that are possible when using vector + GraphRAG vs. those that one obtains when using vector-only RAG:
Note the difference between describing the characteristics of companies likely to be impacted by a lithium shortage, and listing specific companies that are likely to be. If you are an investor looking to rebalance your portfolio in the face of a change in the market or a company looking to rebalance its supply chain in the face of a natural disaster, having access to the latter and not just the former can be game changing. Here, both answers are accurate. The second one is clearly more useful.
Episode 23 of Going Meta by Jesus Barrasa provides another great example using a legal documents use case, starting with the lexical graph.
Those observing the X-sphere and who are active on LinkedIn will spot new examples coming out regularly from not just the lab but the field. Here, Charles Borderie at Lettria gives an example of vector-only RAG contrasted with GraphRAG, against an LLM-based text-to-graph pipeline that ingests 10,000 financial articles into a knowledge graph:

As you can see, not only did the quality of the answer improve markedly with GraphRAG vs. plain RAG, but the answer took one-third fewer tokens.
One last notable example I will include comes from Writer. They recently announced a RAG Benchmarking Report based on the RobustQA framework, comparing their GraphRAG-based approach[9] to competitive best-in-class tools. GraphRAG resulted in a score of 86%, which is a significant improvement from the competition, whose scores ranged between 33% and 76%, with equivalent or better latency.

Every week I meet with customers across many industries who are experiencing similar positive effects with a wide variety of GenAI applications. Knowledge graphs are unblocking the path for GenAI by making the results more accurate and more useful.
#2: Improved Data Understanding, Faster Iteration
Knowledge graphs are intuitive both conceptually and visually. Being able to explore them often reveals new insights. A side benefit that many users are reporting is that once they’ve invested in creating their knowledge graph, they find that it helps them build and debug their GenAI applications in unexpected ways. This has partly to do with how seeing one’s data as a graph paints a living picture of the data underlying the application. The graph also gives you hooks for tracing answers back to data, and tracing that data up the causal chain.
Let’s look at an example using the lithium exposure question above. If you visualize the vectors, you will get something like this, except with far more rows and columns:

When you work with your data as a graph, you can apprehend it in a way that’s just not possible with a vector representation.
Here is an example from a recent webinar from LlamaIndex[10], showing off their ability to extract the graph of vectorized chunks (the lexical graph) and LLM-extracted entities (the domain graph) and tie the two together with “MENTIONS” relationships:

(You can find similar examples with LangChain, Haystack, Spring AI, and more.)
Looking at this diagram, you can probably start to see how having a rich structure where your data resides opens up a wide range of new development and debugging possibilities. The individual pieces of data retain their value, and the structure itself stores and conveys additional meaning, which you can use to add more intelligence to your application.
It’s not just the visualization. It’s also the effect of having your data structured in a way that conveys and stores meaning. Here is the reaction of a developer from a well-known fintech a week into introducing knowledge graphs into their RAG workflow:

This developer’s reaction aligns well with the test-driven development assumption of verifying – not trusting – that answers are correct. Speaking for myself, I get the heebie-jeebies handing 100% of my autonomy over to SkyNet to make decisions that are entirely opaque! More concretely though, even AI non-doomers can appreciate the value of being able to see that a chunk or a document tied to “Apple, Inc.” should really not be mapped to “Apple Corps”. Since the data is ultimately what’s driving GenAI decisions, having facilities at hand to assess and assure correctness is paramount.
#3: Governance: Explainability, Security, and More
The higher the impact[11] of a GenAI decision, the more you need to be able to convince the person who will ultimately be accountable if it goes wrong to trust the decision. This typically involves being able to audit each decision. It also requires a solid and repeatable track record of good decisions. But that isn’t enough. You also need to be able to explain the underlying reasoning to that person when they call a decision to the mat.
LLMs don’t offer a good way of doing this on their own. Yes, you can get references to the documents used to make the decision. But those don’t explain the decision itself – not to mention the fact that LLMs are known to make up those references! Knowledge graphs operate at an entirely different level, making the reasoning logic inside of GenAI pipelines much clearer, and the inputs a lot more explainable.
Let’s continue with one of the examples above, where Charles from Lettria loads up a knowledge graph with extracted entities from 10,000 financial articles and uses this with an LLM to carry out GraphRAG. We saw how this provides better answers. Let’s take a look at the data:

Seeing the data as a graph is the first part. The data is also navigable and queryable and can be corrected and updated as time goes on. The governance advantage is that it becomes far easier to view and audit the “world model” of the data. Using a graph makes it more likely that the human who is ultimately accountable for the decision will understand it, relative to being served up the vector version of the same data. On the quality assurance side, having the data in a knowledge graph makes it a lot easier to pick out errors and surprises in the data (pleasant or otherwise), and trace them back to their source. You can also capture provenance and confidence information in the graph and use this not just in your calculation but in your explanation. This just isn’t possible when you’re looking at the vector-only version of the same data, which as we discussed earlier is pretty inscrutable to the average – and even above-average! – human.
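Here is a small sketch of what carrying provenance and confidence on each relationship might look like. The `Fact` structure, the company names, the source IDs, and the confidence values are all invented for illustration; they are not taken from the Lettria dataset.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    """One relationship in the graph, annotated with where it came from."""
    subject: str
    relationship: str
    obj: str
    source: str        # hypothetical ID of the document the fact was extracted from
    confidence: float  # hypothetical extractor confidence between 0.0 and 1.0

facts = [
    Fact("Acme Corp", "DEPENDS_ON", "Lithium", "article-0042", 0.92),
    Fact("Acme Corp", "SUPPLIED_BY", "Battery Co", "article-0107", 0.55),
    Fact("Battery Co", "DEPENDS_ON", "Lithium", "article-0107", 0.88),
]

def explain(subject, min_confidence=0.0):
    """Describe each sufficiently confident fact about a subject, with its source."""
    return [
        f"{f.subject} {f.relationship} {f.obj} "
        f"(source: {f.source}, confidence: {f.confidence:.2f})"
        for f in facts
        if f.subject == subject and f.confidence >= min_confidence
    ]

def facts_from(source):
    """Trace back: list every fact that was extracted from one source document."""
    return [f for f in facts if f.source == source]

for line in explain("Acme Corp", min_confidence=0.8):
    print(line)
```

The same annotations serve two purposes: a threshold such as `min_confidence` can feed the calculation, while the source ID can be shown to whoever has to sign off on the answer.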
Knowledge graphs can also significantly enhance security and privacy. This tends to be less top of mind when building a prototype, but it’s a critical part of the path to production. If you’re in a regulated business such as banking or healthcare, the access any given employee has to information probably depends on that person’s role. Neither LLMs nor vector databases have a good way of limiting the scope of information to match up with the role. You can readily handle this with permissions inside a knowledge graph, where any given actor’s ability to access data is governed by the database, and exclude results that they aren’t allowed to see. Here is a mock-up of a simple security policy that you can implement in a knowledge graph with fine-grained access controls:
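The idea of role-scoped results can also be sketched in a few lines of Python. This is only an illustration of the filtering logic: the roles, labels, and node data are hypothetical, and in a real deployment the database itself would enforce the policy rather than application code.

```python
# Hypothetical policy: which node labels each role is allowed to read.
role_can_read = {
    "teller": {"Customer", "Account"},
    "compliance": {"Customer", "Account", "SuspiciousActivityReport"},
}

# Hypothetical graph nodes, each tagged with a label.
graph_nodes = [
    {"id": 1, "label": "Customer", "name": "J. Doe"},
    {"id": 2, "label": "Account", "number": "001"},
    {"id": 3, "label": "SuspiciousActivityReport", "summary": "flagged transfer"},
]

def visible_nodes(role):
    """Return only the nodes whose label the role may read; unknown roles see nothing."""
    allowed = role_can_read.get(role, set())
    return [node for node in graph_nodes if node["label"] in allowed]

print([node["id"] for node in visible_nodes("teller")])
print([node["id"] for node in visible_nodes("compliance")])
```

Because the filter runs before retrieval results reach the LLM, restricted nodes never enter the context window for a role that may not see them.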

Knowledge Graph Creation
People often ask me what it takes to build a knowledge graph. The first step in understanding the answer is to know the two kinds of graphs most relevant to GenAI applications:
- The Domain graph is a graph representation of the world model relevant to your application. Here is a simple example:
- The Lexical graph[12] is a graph of document structure. The most basic lexical graph has a node for each chunk of text:
People often expand this to include relationships between chunks and document objects (such as tables), chapters, sections, page numbers, document name/ID, collections, sources, and so on. You can also combine domain and lexical graphs like so:
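As a rough illustration in code, the sketch below builds a minimal lexical graph (one node per chunk, linked to its document and to the next chunk) and ties it to domain entities with MENTIONS relationships. The relationship names `PART_OF` and `NEXT`, the word-count chunking, and the naive substring match standing in for LLM entity extraction are all assumptions made for this example.

```python
def chunk_text(text, size):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_lexical_graph(doc_id, text, entities, size=8):
    """Return (nodes, relationships) for one document and a list of known entities."""
    nodes = [{"id": doc_id, "label": "Document"}]
    rels = []
    previous = None
    for i, chunk in enumerate(chunk_text(text, size)):
        chunk_id = f"{doc_id}-chunk-{i}"
        nodes.append({"id": chunk_id, "label": "Chunk", "text": chunk})
        rels.append((chunk_id, "PART_OF", doc_id))
        if previous is not None:
            rels.append((previous, "NEXT", chunk_id))
        previous = chunk_id
        # Naive stand-in for entity extraction: case-insensitive substring match.
        for entity in entities:
            if entity.lower() in chunk.lower():
                rels.append((chunk_id, "MENTIONS", entity))
    return nodes, rels

text = ("Acme Corp builds electric vehicles and buys battery packs "
        "from Battery Co which depends on lithium supply")
nodes, rels = build_lexical_graph(
    "filing-1", text, ["Acme Corp", "Battery Co", "Lithium"]
)
for rel in rels:
    print(rel)
```

With the MENTIONS relationships in place, a retrieval step can hop from a matching chunk to the entities it mentions and on into the domain graph.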

Creating a lexical graph is easy and largely a matter of simple parsing and chunking strategies.[13] As for the domain graph, there are a few different paths depending on whether the data you’re bringing in comes from a structured source, from unstructured text, or both. Luckily, tooling for creating knowledge graphs from unstructured data sources is rapidly improving. For example, the new Neo4j Knowledge Graph Builder takes PDF documents, web pages, YouTube clips, or Wikipedia articles, and automatically creates a knowledge graph from them. It’s as easy as clicking a few buttons, and lets you visualize (and of course query) both domain and lexical graphs of your input text. It’s powerful and fun, and significantly reduces the barrier to creating a knowledge graph.
Data about customers, products, geographies, etc. probably lives somewhere in your enterprise in a structured form, and can be sourced directly from wherever it lives. Taking the most common case where it’s in a relational database, you can use standard tools[14] that follow tried-and-true rules for relational-to-graph mapping.
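Two of the usual relational-to-graph rules are that each row becomes a node and each foreign key becomes a relationship. The sketch below applies those two rules to made-up `customers` and `orders` tables; the table contents, the helper names, and the `PLACED_BY` relationship type are assumptions for illustration, not the behavior of any particular import tool.

```python
# Hypothetical relational tables, represented as lists of rows.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
orders = [
    {"id": 10, "customer_id": 1, "total": 250},
    {"id": 11, "customer_id": 1, "total": 40},
    {"id": 12, "customer_id": 2, "total": 99},
]

def rows_to_nodes(label, rows, foreign_keys=()):
    """Rule 1: each row becomes a node; foreign-key columns are not kept as properties."""
    return {
        (label, row["id"]): {k: v for k, v in row.items() if k not in foreign_keys}
        for row in rows
    }

def foreign_key_to_rels(rows, from_label, fk_column, to_label, rel_type):
    """Rule 2: each foreign-key value becomes a relationship between two nodes."""
    return [
        ((from_label, row["id"]), rel_type, (to_label, row[fk_column]))
        for row in rows
    ]

graph_nodes = {}
graph_nodes.update(rows_to_nodes("Customer", customers))
graph_nodes.update(rows_to_nodes("Order", orders, foreign_keys=("customer_id",)))
graph_rels = foreign_key_to_rels(orders, "Order", "customer_id", "Customer", "PLACED_BY")

for rel in graph_rels:
    print(rel)
```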
Working with Knowledge Graphs
Once you have a knowledge graph, there is a growing abundance of frameworks for doing GraphRAG, including LlamaIndex Property Graph Index, LangChain’s Neo4j integration, Haystack’s, and others. This space is moving fast, but we’re now at the point where programmatic methods are becoming straightforward.
The same is true on the graph construction front, with tools such as the Neo4j Importer, which has a graphical UI for mapping & importing tabular data into a graph, and Neo4j’s new v1 LLM Knowledge Graph Builder mentioned above. The picture below summarizes the steps for building a knowledge graph.

The other thing you’ll find yourself doing with knowledge graphs is mapping human-language questions to graph database queries. A new open source tool from Neo4j, NeoConverse, is designed to help with natural language querying of graphs. It’s a first solid step forward toward generalizing this.[15]
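To show the shape of the problem, here is a deliberately simple, hand-written sketch that maps one question pattern to a parameterized Cypher query. It is not how NeoConverse works; the pattern, the `Company` and `Material` labels, and the `DEPENDS_ON` relationship are hypothetical, and real tools typically use an LLM plus the graph schema instead of fixed templates.

```python
import re

# Hypothetical question patterns mapped to parameterized Cypher templates.
TEMPLATES = [
    (
        re.compile(r"which companies depend on (?P<material>[\w\s]+?)\??$", re.IGNORECASE),
        "MATCH (c:Company)-[:DEPENDS_ON]->(m:Material {name: $material}) RETURN c.name",
    ),
]

def question_to_query(question):
    """Return (cypher, params) for the first matching template, or None if none match."""
    for pattern, cypher in TEMPLATES:
        match = pattern.search(question.strip())
        if match:
            params = {key: value.strip().lower() for key, value in match.groupdict().items()}
            return cypher, params
    return None

print(question_to_query("Which companies depend on lithium?"))
```

Keeping the user's words in query parameters rather than splicing them into the query text is the design choice worth carrying over to any real implementation.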
While it’s certainly the case that graphs require some work and learning to get started with, there is also good news in that it’s getting easier & easier as the tools improve.
Conclusion: GraphRAG is the Next Natural Step for RAG
The word-based computations and language skills inherent in LLMs and vector-based RAG offer good results. To get a consistently great result, one needs to go beyond strings and capture the world model in addition to the word model. In the same way that Google discovered that to master search, they needed to go beyond mere textual analysis and map out the things underneath the strings, we are beginning to see the same pattern emerge in the world of AI. This pattern is GraphRAG.
Progress happens in S-curves: as one technology tops out, another spurs progress and leapfrogs the prior one. As GenAI progresses, wherever answer quality is essential, or an internal, external, or regulatory stakeholder requires explainability, or fine-grained control over access to data is needed for privacy and security, there’s a good chance your next GenAI application will be using a knowledge graph.

You Can Experience GraphRAG Firsthand!
If you’re ready to take the next step with GraphRAG, I invite you to try the Neo4j LLM Knowledge Graph Builder. This simple web app lets you create a knowledge graph in just a few clicks, from unstructured text sources like PDFs, web pages, and YouTube videos. It’s the perfect playground for experiencing the power of GraphRAG firsthand.
With the LLM Knowledge Graph Builder, you can:
- Connect to your free cloud-based Neo4j instance and build a graph from your favorite text sources.
- Explore your newly created knowledge graph with interactive visualizations.
- Chat with your data and put GraphRAG to the test.
- Integrate your knowledge graph into applications and unlock new insights.
To get started, spin up a free AuraDB instance and build your knowledge graph. You can learn more about the Neo4j LLM Knowledge Graph Builder and get a guided tour here!
Acknowledgments
A great many people contributed to this post. I’d like to acknowledge all of you who share your learnings, writings, and code—many examples of which are cited here—and encourage you to keep doing so. It is by sharing as a community that we all learn.
I would also like to thank the many people who see the importance of GraphRAG and who generously offered their time to review and comment on the post itself. In many cases, this was informed by examples showing up in their world.
Rather than attempting to name everyone, I’d like to call out some of the people outside of what you would normally think about as the “graph world.” We are together seeing GraphRAG as not only an important trend but as a convergence between two worlds.
Having said all of this, my deepest thanks to all of you, including (alphabetically by last name):
- Harrison Chase, CEO of LangChain
- Ali Ghodsi, CEO of Databricks
- Rod Johnson, Investor and Founder of SpringSource
- Douwe Kiela, CEO of ContextualAI and Co-inventor of RAG
- Christina Li, FPV Ventures
- Jerry Liu, CEO of LlamaIndex
- Owen Robertson, Principal, DTS
- Milos Rusic, CEO of deepset / Haystack
Supplement: Further Reading
There’s been a lot written about this topic, with new insights and examples appearing every day. While I can’t hope to provide a comprehensive list, here are a few particularly good pieces you can check out if you’re interested in learning more:
- The DeepLearning.AI short course on Knowledge Graphs for RAG is a great 60-minute way to get started.
- The GraphRAG Ecosystem Tools. Start by spending a few minutes creating a knowledge graph of the data and concepts in a video from YouTube or your favorite PDF or Wikipedia page using the LLM Knowledge Graph Builder. If you don’t already have an Aura Free instance, you can create one here for use with the Knowledge Graph Builder.
- Join the GraphRAG Discord.
- Tomaz Bratanic’s post called Implementing ‘From Local to Global’ GraphRAG with Neo4j and LangChain: Constructing the Graph, which integrates Microsoft’s GraphRAG work into a Neo4j + LangChain pipeline.
- Any of Tomaz Bratanic’s many other blog posts. Seriously, they’re all awesome.
- Ben Lorica’s two posts: Charting the Graphical Roadmap to Smarter AI and GraphRAG: Design Patterns, Challenges, Recommendations.
- A couple of audio references:
  - The Data Exchange podcast episode, Supercharging AI with Graphs (June 27, 2024) where Ben and I both discuss the material in this post, and more.
  - The July 4, 2024 ThursdAI podcast 1-year anniversary episode, which includes a dedicated segment on GraphRAG, led by Emil Eifrem.
- Deloitte’s paper titled Responsible Enterprise Decisions with Knowledge-Enriched Generative AI, with the subtitle Why is it essential for enterprise-level generative AI to incorporate knowledge graphs?
- Jesus Barrasa’s Going Meta series. It’s 27 videos and counting, each covering a different aspect or example of GraphRAG.
- Any of Leann Chen’s learning videos, including You Need Better Knowledge Graphs for Your RAG and Build an Advanced RAG Chatbot with Neo4j Knowledge Graph.
- LlamaIndex’s six-part lightning Introduction to Property Graphs.
- The GraphStuff.fm podcast, hosted by Jennifer Reif, Andreas Kollegger, Alison Cossette, and Jason Koo.
- Last but not least, if you find yourself needing to justify GraphRAG to your boss and want to throw around some extra weight, look no further than Gartner’s 2024 Impact Radar for Generative AI, which puts knowledge graphs at the center of the bullseye for GenAI technologies most relevant right now!
1 Read this blog post to see just how great an analogy Google’s journey in web search is for what’s happening now in GenAI.
2 NB: These particular numbers may or may not actually represent an apple. It’s hard to know, which illustrates one of the key differences between vectors and graphs.
3 As is discussed later in the “Knowledge Graph Creation” section, another kind of knowledge graph distinct from the “domain graph” is emerging and proving to be useful. This is the “lexical graph”, which instead of a world model is a graph of the vector chunks and how they relate to one another and to the document structures around them: tables, figures, pages, documents, collections, authors, and so on.
4 Naturally, this often shows up in the real world not just as a single all-encompassing step but, increasingly, as part of an agentic pipeline that follows its own set of steps and logic. This, by the way, is also a graph. As these pipelines get more complex, one could imagine capturing their workflows and rules in a graph database rather than in code. But we’re not there yet, and it’s a different topic from the one at hand.
5 This kicks in once you already have a knowledge graph in place. This doesn’t happen for free, but you may be surprised at how accessible this is becoming with the latest advances. Because this is such a foundational topic, we’ve dedicated a section after this one on the science and art of building a knowledge graph.
6 Knowledge graphs can also help with other forms of traceability, such as capturing how data flows between systems with systems-of-systems / provenance / data lineage graphs. They can also offer other AI benefits, such as keeping track of resolved entities. Since the focus here is GraphRAG, we’ll leave all of that aside.
7 If you’re looking to dive more deeply into this and get your hands into some working code, I highly recommend my colleague Tomaz Bratanic’s post: Implementing ‘From Local to Global’ GraphRAG with Neo4j and LangChain: Constructing the Graph. This takes Microsoft’s work a step further, integrating it into a Neo4j + LangChain pipeline.
8 The paper itself includes a more detailed comparison of the GraphRAG and vector-only RAG approaches, finding that GraphRAG improved answers by 77.6% in MRR and by 0.32 in BLEU over the baseline.
9 Powered by Neo4j, as it happens.
10 Which is a great webinar showing off their new (circa May ’24) Property Graph Index, which includes built-in methods for converting text into a graph.
11 I think we all know what “impact” means, but just to break it down: this includes any decision where a wrong answer can have health & human safety impacts, social & fairness impacts, reputational impacts, or high dollar impacts. It obviously also includes any decision that might fall under government regulation or where there is otherwise a compliance impact.
12 Note that the term “lexical” here refers not just to individual words, but more broadly (as the following dictionary definition suggests) “of or relating to words or the vocabulary of a language”. This encompasses everything that lies in the domain of a body of words and their relationships.
13 A few libraries that do this are, in no particular order: Docs2KG, Diffbot, GLiNER, spaCy, NuMind, NetOwl®, and (particularly for its strength in entity resolution) Senzing.
14 Stay tuned for a new version of this tool in H2 2024 that will support direct connectivity to your relational database of choice.
15 NeoConverse and the LLM Graph Builder are both part of a growing body of GraphRAG Ecosystem Tools built by Neo4j.