When to Use GraphRAG

Louis-François Bouchard

Making AI accessible. 🎥 What's AI on YouTube. Co-founder at Towards AI. ex-PhD Student.

Published Aug 12, 2024

Good morning everyone! In this iteration, we focus on the new hype in LLMs: GraphRAG.

GraphRAG is a powerful extension to the Retrieval-Augmented Generation (RAG) stack making a lot of noise thanks to Microsoft and LlamaIndex’s contributions.

But the question remains: Should YOU be using it?

To answer when we need it, we first need to understand what it is...

This issue is brought to you thanks to Yandex.

1️⃣ From QuIP to AQLM with PV-Tuning: LLM compression at the extreme

The trade-off between large model size and computational efficiency has long been a challenge in deploying language models. The research community has been looking to reduce model size by 8 times, down to 2 bits. This year, they found a way to do it without sacrificing model performance.

Here's the story behind the evolution of extreme LLM compression methods.

2️⃣ GraphRAG

Before we start, this is a piece I made along with two friends working at Towards AI for our weekly High Learning Rate newsletter (which you should follow), where we share real-world solutions for real-world problems, and do our best to teach to leverage AI's potential with insider tips from specialists in the field, every week.

What is GraphRAG?

GraphRAG enhances traditional RAG by incorporating knowledge graphs into the retrieval process. Instead of relying solely on vector similarity (comparing numbers to find the most relevant ‘similar’ matches), GraphRAG extracts entities and relationships from your data, creating a structured representation that captures semantic connections. Semantic means understanding the meaning behind words or data, in a specific context, not just their literal definitions. This approach allows for more nuanced and context-aware retrieval, potentially leading to more accurate and comprehensive responses from your LLM.

A knowledge graph is simply a structured representation of data that captures entities and their relationships, allowing for better understanding and retrieval of information.

When to Use GraphRAG: It's All About Your Data

The decision to implement GraphRAG heavily depends on your dataset's nature. If your data is rich in interconnected entities and relationships - think academic papers (many cite each other and progress in time), corporate knowledge bases, or complex historical records - GraphRAG might outperform regular RAG. It’s perfect for capturing and leveraging these connections, enabling more informed and contextually relevant retrievals that standard RAG might miss.

User Queries: Complexity is Key

GraphRAG is most useful when dealing with complex, multi-faceted queries that require traversing multiple pieces of information (or asking meta-questions about the data itself, such as “How many papers have been published between 2010 and 2020 about RAG” (Spoiler: 0)). If your users frequently ask questions like "How does the theory proposed in Paper A relate to the findings in Paper B, and what are the implications for field C?", GraphRAG's ability to navigate and synthesize information across your knowledge graph becomes essential, whereas regular RAG might just bring out the most relevant chunks to some of these topics, and the LLM might hallucinate the rest.

Data Storage Considerations

While GraphRAG can work with various data storage systems, it's particularly powerful when your data is already structured in a graph-like format or can be easily transformed into one. Graph databases like Neo4j or Amazon Neptune are natural fits, but even relational databases can be leveraged if you have a clear understanding of the relationships between your data entities.

Recommended by LinkedIn

How Good Are the Latest Open LLMs? And Is DPO Better…

Sebastian Raschka, PhD 6 months ago

How Knowledge Graphs Enhance LLM Application…

Data Science Dojo 1 month ago

To Data & Beyond Week 21 Summary

Youssef Hosni 6 months ago

p.s. ideally, you want a dataset built for that with relationship information (such as who is citing who), but you do not necessarily need that. Fortunately for us, libraries like Microsoft’s GraphRAG do that automatically, using the best LLM to find our entities and relationships.

When to Skip GraphRAG

Despite its power, GraphRAG isn't always the best choice. For simpler datasets (and single-faceted queries) with straightforward relationships or when dealing primarily with structured text documents, traditional RAG or advanced search methods might be more efficient. Advanced methods include hybrid search, which combines vector similarity and keyword search, or techniques that use metadata filtering to narrow down search possibilities.

It’s important to note that GraphRAG introduces additional complexity and computational overhead in index creation and query processing, which may not be justified for straightforward information lookup tasks. This is an example from Microsoft’s paper comparing traditional RAG and GraphRAG for the same query:

Even though the results are more interesting, GraphRAG requires almost 10x more time and 10x more tokens to produce. Make sure you need it!

Combining Approaches: The Router Strategy

In real-world applications, a one-size-fits-all approach rarely works. Consider implementing a router system that can dynamically choose between GraphRAG, Advanced RAG, text-to-SQL retrieval, or any other search method based on the query type and available data. This flexible approach ensures you're using the most appropriate retrieval method for each specific query, optimizing both performance and accuracy. You will need a good base LLM and prompt to re-orient your queries to the right retrieval system.

TL;DR: GraphRAG - Powerful but Not Universal

GraphRAG offers a significant improvement in information retrieval capabilities for complex, interconnected datasets and queries requiring deep relational understanding. However, it comes with increased complexity and resource requirements. Evaluate your specific use case, data structure, and query patterns carefully. For many applications, a combination of retrieval methods, orchestrated by a smart router, will provide the best balance of performance and flexibility.

And that's it for this iteration! I'm incredibly grateful that the What's AI newsletter is now read by over 20,000 incredible human beings. Click here to share this iteration with a friend if you learned something new!

Looking for more cool AI stuff? 👇

Looking for AI news, code, learning resources, papers, memes, and more? Follow our weekly newsletter at Towards AI!
Looking to connect with other AI enthusiasts? Join the Discord community: Learn AI Together!

Want to share a product, event or course with my AI community? Reply directly to this email, or visit my Passionfroot profile to see my offers.

Thank you for reading, and I wish you a fantastic week! Be sure to have enough sleep and physical activities next week!

Louis-François Bouchard

The What's AI Newsletter

12,539 followers

+ Subscribe

Ahmed Moubtahij

ing. | NLP Scientist | ML Engineer | LLMs

3mo

;)

1 Reaction

Brendon Ribeiro

3mo

Thanks for sharing this valuable information. I will make more use of GraphRAG.

1 Reaction

See more comments

To view or add a comment, sign in

See all

When to Use GraphRAG

Louis-François Bouchard

Making AI accessible. 🎥 What's AI on YouTube. Co-founder at Towards AI. ex-PhD Student.

1️⃣ From QuIP to AQLM with PV-Tuning: LLM compression at the extreme

2️⃣ GraphRAG

What is GraphRAG?

When to Use GraphRAG: It's All About Your Data

User Queries: Complexity is Key

Data Storage Considerations

Recommended by LinkedIn

When to Skip GraphRAG

Combining Approaches: The Router Strategy

TL;DR: GraphRAG - Powerful but Not Universal

The What's AI Newsletter

12,539 followers

More articles by this author

Insights from the community

Others also viewed

🗃️ GraphRAG Evolves into StructRAG

AI Newsletter

To Data & Beyond Week 17 Summary

🧑🔬 The Next Impact Factor

The LLaMA Takeover

To Data & Beyond Week 3 Summary

Top LLM Papers of the Week (November Week 2, 2024)

⚡️ Infinite Text Input? This changes everything.

Introducing Mixtral-8x22B: The new open model from Mistral outperforms all existing open LLMs 🔥

Explore topics

1️⃣ From QuIP to AQLM with PV-Tuning: LLM compression at the extreme

2️⃣ GraphRAG

What is GraphRAG?

When to Use GraphRAG: It's All About Your Data

User Queries: Complexity is Key

Data Storage Considerations

Recommended by LinkedIn

When to Skip GraphRAG

Combining Approaches: The Router Strategy

TL;DR: GraphRAG - Powerful but Not Universal

The What's AI Newsletter

12,539 followers

Build Smarter RAG Systems

Dec 4, 2024

How to Improve Your RAG Pipeline with Fine-Tuned Embedding Models (and black Friday deal!)

Nov 30, 2024

Is RAG Dead? RAG vs. Long Context Models

Nov 27, 2024

When is AI Integration REALLY Worth the Hype?

Nov 24, 2024

Advanced RAG Evaluation Techniques for Optimal LLM Performance

Nov 22, 2024

Indexing Methods for Vector Retrieval

Nov 19, 2024

Releasing our 90+ lesson practical LLM Developer course!

Nov 16, 2024

LLM Evaluations: Find the Best AI Model for Your Specific Task (no code)

Nov 15, 2024

Master Multi-Agent Systems Like a PRO with AGENTIC AI

Nov 12, 2024

Running YOLOv7 on Your Phone

Nov 9, 2024

Insights from the community

Others also viewed

🗃️ GraphRAG Evolves into StructRAG

AI Newsletter

To Data & Beyond Week 17 Summary

🧑🔬 The Next Impact Factor

The LLaMA Takeover

To Data & Beyond Week 3 Summary

Top LLM Papers of the Week (November Week 2, 2024)

⚡️ Infinite Text Input? This changes everything.

Introducing Mixtral-8x22B: The new open model from Mistral outperforms all existing open LLMs 🔥

Explore topics