Comparison of Document Summary Index & Sentence Window Methods in RAG (Coding with LlamaIndex Walkthrough)
This is the coding walk-through of the RAG (Retrieval Augmented Generation) for Company Centric Documents: Summary Index vs Node Sentence Window Method Comparison article. It largely follows the demos mentioned in LlamaIndex’s Advanced RAG site.
Related Links:
There are 5 application files in the repository:
Because indexing takes 10 to 20 minutes and incurs token costs, the indexes need to be persisted (saved) to disk first, and the retrieval & query are done separately by different apps. I didn’t create a main.py file to execute everything, as I plan to modify the app files for another, larger project later on.
The two contextual retrieval methods for comparison are Document Summary Index and Node Sentence Window, which I will discuss more in detail later.
Here is a simple analogy:
Essentially, both methods are supposed to make retrieval smarter by providing “context,” but for document summary, context is like a park ranger’s map, and for sentence window, it is more like beacons guiding the coastguard. Now let’s go through the code.
Code Example: Document Summary Index Method
The code here is based on LlamaIndex’s Document Summary Index tutorial.
This is split into two files so that we only index once:
Build Knowledgebase (Reading Documents & Indexing)
Dependencies & Settings
First, I imported the libraries from Llama-Index and OpenAI.
Read Documents
[insert: create_idx_summ_idx_md_scr_shot_2_read_docs]
I used LlamaIndex’s generic reader, SimpleDirectoryReader, to read all the PDF files in the data folder (there is a dedicated PDF reader built on PyMuPDF, but I used the generic one.)
If you just want to read certain files, you can use the input_files parameter (a list of file paths) instead of input_dir.
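A minimal sketch of the reader call (the import path depends on your LlamaIndex version, and the file name in the input_files example is hypothetical):

from llama_index.core import SimpleDirectoryReader  # "from llama_index import ..." on older versions

# read every PDF in the data folder; filename_as_id=True makes the
# document ids the original file paths (used later by the summary index)
documents = SimpleDirectoryReader(input_dir="data", filename_as_id=True).load_data()

# to read only specific files, pass a list of paths instead of a directory
# documents = SimpleDirectoryReader(input_files=["data/uk_subsidiary_report.pdf"]).load_data()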
Index
[insert: create_idx_summ_idx_md_scr_shot_3_build_index]
I instantiated the OpenAI API with my OpenAI API key (I saved it directly in my VS Code launch.json file, but I recommend saving it as an environment variable).
ServiceContext is LlamaIndex’s settings object:
Note: you do not necessarily have to use the same LLM for embedding and query. Embedding requires you to send your entire dataset to the LLM - a different level of data exposure than the query process alone. If you are using a cloud LLM and working with big document files, dedicated embedding models make more sense because their token costs are much cheaper. Also, if you have to keep data in-house, you will have to use an on-premises LLM instead.
“splitter = SentenceSplitter …” specifies which “splitter” to use to cut text into “chunks” - small snippets of text that are pulled out and combined with your query to form the “augmented” prompt sent to an LLM (the core concept of RAG.) Chunk size is how big each snippet should be; 1,024 tokens is the common default. Vanilla RAG models usually chunk at exactly the size limit, which cuts off sentences and words. Here, because we are using a sentence splitter (SentenceSplitter), if the chunk limit is reached but the sentence is not done yet, the splitter finishes the sentence and then chunks (i.e., it will finish “related to the UK only.”)
Each chunk is also commonly called a “node” by LlamaIndex and some others, to reflect the relationships between the chunks and higher-level units such as sections, paragraphs, chapters, documents, and so on.
It’s also important to note that the documents are broken down into sub-documents (each PDF file is segmented into 20 to 100+ sub-documents depending on the document size.) Because I set filename_as_id = True when reading the documents, the document ids are the full file paths of the original PDF documents with “_part_0”, “_part_1”, etc. appended after the “.pdf”.
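A minimal sketch of the settings and splitter described above, assuming the older ServiceContext-style configuration (the chat model name and chunk values are illustrative, not necessarily the ones in my app):

from llama_index.core import ServiceContext
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

# LLM used for summarization and querying
chatgpt = OpenAI(model="gpt-3.5-turbo", temperature=0)
service_context = ServiceContext.from_defaults(llm=chatgpt)

# sentence-aware chunking: finish the sentence before cutting at ~1,024 tokens
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)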
The next line of code is the crux of the customization:
response_synthesizer = get_response_synthesizer(
response_mode="tree_summarize", use_async=False
)
The get_response_synthesizer function is part of the llama_index.response_synthesizers module. It sends text chunks into a Large Language Model (LLM) to get responses - summaries of the texts - back. But why is it here in the indexing stage?
The synthesizer here is to enhance RAG with LLM - we are using ChatGPT as a text summarizer. It shrinks the text size and builds a simple hierarchical structure in the retrieval knowledgebase. The response_mode determines how the user query and the retrieved text chunks are combined to form the prompt that is sent to the Language Model (LLM). Here it is set to “tree_summarize”, meaning hierarchical summarization.
When the synthesizer is called by DocumentSummaryIndex() in the next line, the LLM summarizes the text at the sub-document level.
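Putting the pieces together, the build step looks roughly like this (whether you pass the splitter via transformations or via the service context depends on your LlamaIndex version):

from llama_index.core import DocumentSummaryIndex

# the LLM writes one summary per sub-document while the index is built
doc_summary_index = DocumentSummaryIndex.from_documents(
    documents,
    llm=chatgpt,
    transformations=[splitter],
    response_synthesizer=response_synthesizer,
    show_progress=True,
)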
Note: other response_mode options include refine, compact, simple_summarize, no_text, accumulate, and compact_accumulate (similar to accumulate, but it “compacts” each LLM prompt the way compact does and runs the same query against each text chunk).
Save/Persist to Disk
After indexing, which takes some time, I saved the index to disk using the persist method.
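It is a one-liner (the directory name here is just an example):

# write the docstore, index store, and vector store to a local folder
doc_summary_index.storage_context.persist(persist_dir="index_doc_summary")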
A Note on LlamaIndex’s Data Structure
If you are new to LlamaIndex like me, you may find LlamaIndex’s data structure here a little confusing. The chatbot actually won’t help much. I had to dig into the source code.
So let me explain it further:
After indexing, there are two “layers of nodes”: leaf nodes with original text chunks (bottom layer), and root nodes with summary texts (top layer.) See below:
Personally, I find the creation of the “document” class to be redundant and confusing:
LlamaIndex’s NodeRelationship class has “previous”, “next”, “parent”, and “child” relationships, so the options are there in the BaseNode class. These simple front-back-up-and-down attributes are enough to cover any hierarchy you can think of. But in DocumentSummaryIndex, the parent and child attributes are not used at all. It may be because the document summary index tool was built a while back. Next time I will just use LlamaIndex's tree classes directly (and build my own structure if needed.)
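If you want to check this yourself, here is a small sketch that inspects a few stored nodes and their relationship keys (assuming the doc_summary_index object from above):

# each stored node records its relationships (e.g. SOURCE, PREVIOUS, NEXT);
# PARENT and CHILD are left unused by DocumentSummaryIndex
for node_id, node in list(doc_summary_index.docstore.docs.items())[:5]:
    print(node_id, [rel.name for rel in node.relationships])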
Query (Retriever, Synthesizer, & Query Engine)
Dependencies & Setting
Imported libraries from Llama-Index and OpenAI. I used Settings to specify which LLM to use this time.
Load Index
Load the index files from disk into an index object (if you used a vector database, like Pinecone or Chromadb, load accordingly.)
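Loading from disk is the mirror image of the persist call (same example directory name as before):

from llama_index.core import StorageContext, load_index_from_storage

# rebuild the index object from the persisted files
storage_context = StorageContext.from_defaults(persist_dir="index_doc_summary")
doc_summary_index = load_index_from_storage(storage_context)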
Set up Questions
I set up 9 questions to test the model. The first 3 questions are direct and specific. The rest are more nuanced.
Retrieval, Synthesize, and Query
Simple Method (with Default Settings)
The high-level querying uses default parameters to retrieve, synthesize, and query, so you don’t have to worry about configuration, and you can call it directly from the index class. But it’s very limiting, as you will see later in the results section. For example, it only retrieves the top match (it searches the sub-document summaries and then retrieves the nodes underneath the best-matching sub-document), while most RAG models retrieve at least the top 3.
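The high-level call is just a couple of lines (the question here is illustrative, not one of my nine):

# default retrieval + synthesis straight from the index
query_engine = doc_summary_index.as_query_engine(
    response_mode="tree_summarize", use_async=False
)
response = query_engine.query("What products does the UK subsidiary sell?")
print(response)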
Embedding Retrieval Method (Custom Settings)
For more custom retrieval, LlamaIndex has built two tools specific to this tree summarization approach: the DocumentSummaryIndexLLMRetriever and the DocumentSummaryIndexEmbeddingRetriever (both live in the same file and share the same config). The former uses an LLM to perform the retrieval. The latter plugs the embedded query vector directly into the index for retrieval. I picked the latter because it should be faster and save more API tokens.
How it works (see the code sketch after these steps):
I instantiated the retriever object and set similarity_top_k to 3 instead of the default 1 (top 3 results, not just 1.)
I used the retriever’s retrieve method to take the query, retrieve the top 3 results, and iterated over them to display the retrieved nodes. This shows us what’s under the hood.
I then instantiated get_response_synthesizer with response_mode set to “tree_summarize.” (The method lets you customize a few other things, such as the prompt template that combines the retrieved nodes and the query, but I left those at their defaults.)
Finally, I used the RetrieverQueryEngine to activate the whole thing and performed the query.
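Here is a sketch of those four steps (class and parameter names follow LlamaIndex’s Document Summary Index tutorial; the question is illustrative):

from llama_index.core import get_response_synthesizer
from llama_index.core.indices.document_summary import (
    DocumentSummaryIndexEmbeddingRetriever,
)
from llama_index.core.query_engine import RetrieverQueryEngine

# 1. retriever over the summary embeddings, top 3 instead of the default 1
retriever = DocumentSummaryIndexEmbeddingRetriever(
    doc_summary_index, similarity_top_k=3
)

# 2. peek under the hood: which nodes come back for one question
retrieved_nodes = retriever.retrieve("What products does the UK subsidiary sell?")
for n in retrieved_nodes:
    print(n.score, n.node.get_text()[:200])

# 3. synthesizer that combines the retrieved nodes with the query
response_synthesizer = get_response_synthesizer(response_mode="tree_summarize")

# 4. wire retriever + synthesizer into a query engine and run the query
query_engine = RetrieverQueryEngine(
    retriever=retriever, response_synthesizer=response_synthesizer
)
response = query_engine.query("What products does the UK subsidiary sell?")
print(response)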
Coding Example: Sentence Window Method
The code largely follows the “Metadata Replacement + Node Sentence Window” tutorial.
Build Knowledgebase
Dependencies & Settings
Import similar libraries as in the document summary index method.
[insert create_idx_sent_window_md_scr_shot_2_settings]
Instantiate the OpenAI API with your API key. Here, I used Settings to specify which LLM to use. The window method is very token intensive, so LlamaIndex recommends using a local embedding engine - HuggingFace’s - instead. I opted for OpenAI’s “text-embedding-ada-002” because my dataset is not that big.
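A sketch of the Settings block, assuming the API key is already in your environment (the chat model name is illustrative; the embedding model is the one mentioned above):

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# global defaults picked up by the parser, index, and query engine
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")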
Also, a different node parser was imported.
Read Documents
Load the documents using SimpleDirectoryReader, exactly like in the document summary index method.
Extract Nodes
[insert create_idx_sent_window_md_scr_shot_4_extract_nodes]
Indexing for sentence window is fairly straightforward. The only nuance is that “chunking” and indexing/embedding form a 3-step process:
The SentenceWindowNodeParser is a subclass of NodeParser. It’s similar to a vanilla sentence parser, except that it stores extra sentences in the metadata dictionary. Here we set window_size to 3 and the metadata keys to “window” and “original_text”. This means that each complete sentence is a node, but the metadata dictionary also stores 7 sentences (3 before + the sentence itself + 3 after) under the key “window”, and the single sentence under “original_text.”
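The parser setup, following the Metadata Replacement + Node Sentence Window tutorial:

from llama_index.core.node_parser import SentenceWindowNodeParser

# one node per sentence; 3 sentences on each side stored under "window"
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
nodes = node_parser.get_nodes_from_documents(documents)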
Note that the documents are also automatically broken into 409 small sub-documents, but here there is no summarization or hierarchy, so the sub-documents (what LlamaIndex calls “documents” or “docs”) do not really matter.
Index and Save
Use the standard VectorStoreIndex to index/embed.
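Roughly two lines (again, the persist directory name is just an example):

from llama_index.core import VectorStoreIndex

# embed the sentence nodes and save everything to disk
sentence_index = VectorStoreIndex(nodes)
sentence_index.storage_context.persist(persist_dir="index_sentence_window")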
Query
Dependencies & Settings
Dependencies and settings are mostly the same as in the build-index app.
Load Index
Load from index files to create an index instance.
Retrieve & Query
Set up query questions exactly the same as the other code example.
The MetadataReplacementPostProcessor class in LlamaIndex is used to replace the sentence in each node with its surrounding context.
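The query engine with the postprocessor plugged in (the top-k value and the question are illustrative):

from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# swap each retrieved sentence for its 7-sentence window before synthesis
query_engine = sentence_index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
response = query_engine.query("What products does the UK subsidiary sell?")
print(response)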
Output
Here are selected response samples:
Document Summary Index
Method: Sentence Windows
See GitHub Repository for the detailed output excel file.
See the main article, Comparative Analysis of Summary Index and Node Sentence Window Methods in RAG for Local Subsidiary Documents, for more in-depth analysis.