
RAG


Introduction

RAG stands for Retrieval-Augmented Generation. A RAG system works in two steps:

  1. Retrieve: it retrieves relevant information from a large corpus of text.
  2. Generate: it generates a response based on the retrieved information.

Common use cases: question answering, document summarization, and content generation.

Why do we need RAG?

  1. To avoid hallucination.
  2. Timeliness: retrieval can surface information newer than the model's training data.
  3. LLMs cannot access private data; feeding in internal or user-specific data yields customized results.
  4. Answer constraint: keeping answers grounded in a controlled set of sources.

A naive RAG pipeline mainly consists of the following steps:

  1. Indexing: clean and extract the raw text into standardized plain text -> chunking -> transform chunks into vectors via an embedding model -> create (key, value) pairs, i.e., (index, vector) pairs.
  2. Retrieval: the user query is processed by an encoding model -> query embedding -> similarity search in a vector database -> the top-k results are retrieved.
  3. Generation: the user query and the retrieved documents are fed into a prompt template -> generate the response.
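The three steps above can be sketched end to end in a few lines. This is a toy illustration, not a real implementation: the word-overlap "embedding" and Jaccard similarity stand in for a neural embedding model and cosine search, and the corpus contents and function names are made up for the example.

```python
import re

def embed(text: str) -> set[str]:
    # Stand-in for an embedding model: a bag of lowercase words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a: set[str], b: set[str]) -> float:
    # Jaccard overlap as a stand-in for cosine similarity.
    return len(a & b) / len(a | b) if a | b else 0.0

# 1. Indexing: chunk the corpus and store (index, vector) pairs.
corpus = [
    "RAG retrieves documents before generating an answer.",
    "Vector databases support approximate nearest neighbor search.",
    "LLMs can hallucinate facts without grounding.",
]
index = [(i, embed(chunk)) for i, chunk in enumerate(corpus)]

# 2. Retrieval: embed the query and take the top-k most similar chunks.
def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda p: similarity(q, p[1]), reverse=True)
    return [corpus[i] for i, _ in ranked[:k]]

# 3. Generation: fill a prompt template with the query and retrieved context,
# then hand the prompt to an LLM (omitted here).
def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("Why do LLMs hallucinate?"))
```

A production system would swap `embed` for a sentence-embedding model and `index` for a vector database, but the control flow stays the same.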

Best practices of RAG

In Paper [2]:

A typical RAG workflow usually contains multiple intervening processing steps: query classification (determining whether retrieval is necessary for a given input query), retrieval (efficiently obtaining relevant documents for the query), reranking (refining the order of retrieved documents based on their relevance to the query), repacking (organizing the retrieved documents into a structured form for better generation), and summarization (extracting key information for response generation from the repacked documents and eliminating redundancies). Implementing RAG also requires decisions on how to properly split documents into chunks, which embeddings to use for semantically representing these chunks, which vector database to use for efficiently storing feature representations, and how to effectively fine-tune LLMs.

RAG Workflow

  1. Query Classification. Tasks entirely based on user-given information are labeled "sufficient" and do not require retrieval; otherwise the query is labeled "insufficient" and retrieval may be necessary.
  2. Chunking. Three levels of chunking: token, sentence, and semantic.
     - Token-level chunking: split the text into tokens, usually with a fixed length.
     - Sentence-level chunking: split the text into sentences.
     - Semantic-level chunking: embed every sentence in the document, compare the similarity of all sentences with each other, and group the sentences with the most similar embeddings together.
  3. Vector databases. Store embedding vectors with their metadata, enabling efficient retrieval of documents relevant to queries through various indexing and approximate-nearest-neighbor search methods.
  4. Retrieval Method. The recommended techniques:
     - Query rewriting.
     - Query decomposition.
     - Pseudo-document generation: generate a hypothetical document based on the user query and use the embedding of the hypothetical answer to retrieve similar documents. One notable implementation is HyDE.
     - Hybrid search: combine sparse retrieval (BM25) and dense retrieval (embedding similarity). The weights between the two retrieval methods can be adjusted as appropriate.
  5. Reranking. Enhance the relevance of the retrieved documents.
  6. Document repacking. The performance of subsequent processes, such as LLM response generation, may be affected by the order in which documents are provided.
  7. Summarization. Extractive or abstractive.
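Semantic-level chunking from the list above can be sketched as follows. One common variant, shown here, walks the document in order and starts a new chunk wherever the similarity between adjacent sentences drops below a threshold; the word-overlap similarity is a toy stand-in for real sentence embeddings, and the threshold value is illustrative.

```python
import re

def embed(sentence: str) -> set[str]:
    # Stand-in for a sentence-embedding model: a bag of lowercase words.
    return set(re.findall(r"[a-z]+", sentence.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    # Start a new chunk whenever adjacent sentences stop being similar,
    # so each chunk stays topically coherent.
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if jaccard(embed(prev), embed(cur)) >= threshold:
            chunks[-1].append(cur)
        else:
            chunks.append([cur])
    return chunks

doc = [
    "Dense retrieval encodes queries as vectors.",
    "Dense retrieval then compares vectors by cosine similarity.",
    "Cats are popular pets.",
]
print(semantic_chunks(doc))  # the two retrieval sentences group; the cat sentence splits off
```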
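The hybrid-search step above combines a sparse (BM25) score and a dense (embedding) score per candidate. A minimal sketch of the weighted combination, assuming the two scores are already normalized to the same scale (the document names and score values are made up for illustration):

```python
def hybrid_score(sparse: float, dense: float, alpha: float = 0.5) -> float:
    # alpha weights sparse (BM25) retrieval; (1 - alpha) weights dense retrieval.
    return alpha * sparse + (1 - alpha) * dense

# Toy candidates with pre-computed, normalized scores from each retriever.
candidates = {
    "doc_a": {"sparse": 0.9, "dense": 0.2},  # strong keyword match
    "doc_b": {"sparse": 0.1, "dense": 0.8},  # strong semantic match
}

def rank(alpha: float) -> list[str]:
    # Rank candidates by the combined score, best first.
    return sorted(candidates,
                  key=lambda d: hybrid_score(candidates[d]["sparse"],
                                             candidates[d]["dense"], alpha),
                  reverse=True)

print(rank(alpha=0.8))  # keyword-heavy weighting favors doc_a
print(rank(alpha=0.2))  # semantic-heavy weighting favors doc_b
```

Tuning `alpha` is exactly the weight adjustment the list item refers to; some systems instead merge the two rankings with reciprocal rank fusion, which avoids score normalization entirely.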


Graph RAG

Why is RAG not enough?

Traditional RAG systems, while powerful, have several limitations:

  1. Loss of Structural Information: Standard RAG treats documents as independent chunks, losing important relationships and connections between pieces of information. Real-world knowledge often has inherent graph-like structures (e.g., relationships between entities, hierarchical information, or causal chains).

  2. Limited Context Understanding: When retrieving information, traditional RAG looks at chunks in isolation. This can miss broader context that might be spread across multiple related documents or sections.

  3. Inability to Handle Complex Queries: Questions that require connecting multiple pieces of information or understanding relationships between entities are difficult for traditional RAG to handle effectively.

  4. Static Document View: Traditional RAG typically treats documents as static pieces of text, without capturing how information evolves or relates to other pieces of knowledge over time.

Graph RAG addresses these limitations by:

  - Representing knowledge as a graph structure where nodes contain information and edges represent relationships
  - Preserving structural information during retrieval
  - Enabling multi-hop reasoning across connected pieces of information
  - Supporting more complex query patterns that require traversing relationships
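The multi-hop retrieval idea can be sketched with a breadth-first traversal over a toy knowledge graph. The entities and relations below are invented for illustration, and real Graph RAG systems use richer graph stores and scoring; the point is only that facts reachable through intermediate entities get retrieved.

```python
from collections import deque

# Toy knowledge graph: adjacency lists of (relation, neighbor) pairs.
graph = {
    "Marie Curie": [("discovered", "Polonium"), ("married_to", "Pierre Curie")],
    "Pierre Curie": [("worked_at", "Sorbonne")],
    "Polonium": [],
    "Sorbonne": [],
}

def multi_hop_retrieve(seed: str, max_hops: int = 2) -> set[str]:
    # Breadth-first search: gather every entity within max_hops of the seed,
    # so a query about Marie Curie can surface the Sorbonne via Pierre Curie.
    seen, frontier = {seed}, deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for _relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

print(multi_hop_retrieve("Marie Curie"))
```

A chunk-based retriever would only find the Sorbonne if it happened to co-occur with "Marie Curie" in one chunk; the traversal finds it through the relationship chain.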

Useful videos: GraphRAG: The Marriage of Knowledge Graphs and RAG (Emil Eifrem)

Papers

My literature review of RAG:

  1. Lewis, Patrick, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." arXiv, April 12, 2021. http://arxiv.org/abs/2005.11401.

     The first paper on RAG: models that combine pre-trained parametric and non-parametric memory for language generation. In RAG models, the parametric memory is a pre-trained seq2seq transformer and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever.

     Other resources:
     - https://www.youtube.com/watch?v=JGpmQvlYRdU (by the author of the paper)
     - https://www.youtube.com/watch?v=dzChvuZI6D4 (explanation of the paper)

  2. Wang, Xiaohua, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, et al. "Searching for Best Practices in Retrieval-Augmented Generation." arXiv, July 1, 2024. http://arxiv.org/abs/2407.01219.

     This paper gives an overview of current RAG practice. A good tech blog explains the paper.

  3. Shi, Yunxiao, Xing Zi, Zijing Shi, Haimin Zhang, Qiang Wu, and Min Xu. "Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems." arXiv, July 15, 2024. http://arxiv.org/abs/2407.10670.

     This paper introduces four modules to solve several key challenges in RAG systems.

  4. Gao, Luyu, Xueguang Ma, Jimmy Lin, and Jamie Callan. "Precise Zero-Shot Dense Retrieval without Relevance Labels." arXiv preprint arXiv:2212.10496, 2022.

     This is the paper that introduces the HyDE method.
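HyDE's core idea can be sketched in a few lines: embed a hypothetical *answer* rather than the query, then search with that embedding. Everything below is a toy stand-in — `fake_llm` replaces the real instruction-tuned generator, and the word-overlap similarity replaces the paper's dense encoder (Contriever in the original work).

```python
import re

def embed(text: str) -> set[str]:
    # Stand-in for a dense encoder: a bag of lowercase words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def fake_llm(query: str) -> str:
    # Stub for the LLM that writes a hypothetical document answering the query.
    return "BM25 ranks documents using term frequency and inverse document frequency."

corpus = [
    "BM25 is a sparse retrieval function based on term frequency.",
    "Graph databases store nodes and edges.",
]

def hyde_retrieve(query: str) -> str:
    # Key step: embed the hypothetical answer, not the query itself,
    # and return the corpus document closest to that embedding.
    hypo = embed(fake_llm(query))
    return max(corpus, key=lambda doc: jaccard(hypo, embed(doc)))

print(hyde_retrieve("How does BM25 score documents?"))
```

The hypothetical document may contain factual errors; that is acceptable because only its embedding, which captures the relevant vocabulary and topic, is used for retrieval.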

Graph RAG - A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases

Online Articles

Introduction

Chunking - Semantic Chunking for RAG

Retrieval - HYDE: Revolutionising Search with Hypothetical Document Embeddings

Implementation

Chunking - Semantic Chunking for RAG

Retrieval - Power of Hypothetical Document Embeddings: An In-Depth Exploration of HyDE - Exploring Query Rewriting. This blog uses LlamaIndex and LangChain to demonstrate several techniques for query rewriting: Hypothetical Document Embeddings (HyDE), Rewrite-Retrieve-Read, Step-Back Prompting, etc.

Graph RAG - Enhancing RAG with Graph