
Building a RAG (Retrieval-Augmented Generation) application is about more than plugging in a few tools; it is about choosing the right tech stack so that retrieval and generation are not just possible but efficient and scalable.
Say you are building an AI application like "chat with your PDF documents," where users interact with PDFs conversationally. It is not as simple as loading a file and asking questions. You need to:
- Extract the relevant content from the PDF
- Split the text into meaningful chunks
- Store those chunks in a vector database
- Then, when a user asks a question, run a similarity search, fetch the most relevant chunks, and pass them to a language model to generate a coherent, accurate response.
Sounds complicated? It is. Working across multiple tools, frameworks, and databases can quickly become overwhelming.
That is exactly why this RAG development stack was put together: a curated set of tools and frameworks designed to streamline the whole process. From smart data extractors to efficient vector databases to cost-effective generation models, it covers everything you need to build robust, production-ready RAG applications without reinventing the wheel each time.
Why Do You Need a RAG Development Stack?

Source: Hugging Face
First, a quick primer on Retrieval-Augmented Generation (RAG): it enhances the capabilities of large language models (LLMs) by integrating an external information-retrieval mechanism. By supplementing static training data with up-to-date or domain-specific information, this approach lets LLMs generate responses that are more accurate, contextually relevant, and factually grounded.
How Does RAG Work?
RAG operates in four key stages (a minimal sketch follows this list):
- Indexing: data from external sources (e.g., documents, databases) is converted into vector representations (embeddings) and stored in a vector database, enabling efficient retrieval of relevant information.
- Retrieval: when a user submits a query, the system retrieves the most relevant data from the indexed sources using similarity-based search.
- Augmentation: the retrieved information is combined with the user's query through prompt engineering, effectively "augmenting" the LLM's input.
- Generation: the LLM uses its internal knowledge together with the augmented prompt to produce a response, ensuring the output draws on both pre-trained data and real-time, authoritative sources.
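To make the four stages concrete, here is a minimal, dependency-free Python sketch; the toy embed function and in-memory "vector store" are stand-ins for a real embedding model and vector database, and the final generation step is only indicated by the assembled prompt.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector; a real system would call an embedding model
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Indexing: embed documents and keep them in an in-memory "vector store"
docs = ["RAG combines retrieval with generation.",
        "Vector databases store embeddings for similarity search."]
store = [(doc, embed(doc)) for doc in docs]

# 2. Retrieval: embed the query and rank documents by similarity
query = "Where are embeddings stored?"
ranked = sorted(store, key=lambda item: cosine(embed(query), item[1]), reverse=True)
top_context = ranked[0][0]

# 3. Augmentation: combine the retrieved context with the user's question
prompt = f"Answer using only this context:\n{top_context}\n\nQuestion: {query}"

# 4. Generation: this prompt would now be sent to an LLM (omitted in this sketch)
print(prompt)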
So, why do you need a RAG development stack?
- Faster development: move from prototype to production more quickly with pre-built, ready-to-integrate components.
- Higher accuracy: retrieve real-time, contextually relevant data to ground responses and reduce hallucinations.
- Smoother deployment: built-in tools improve security, observability, and scalability, easing the path to production readiness.
- Maximum flexibility: a modular design lets you mix and match tools to fit the unique needs of different industries and use cases.
- Customizable by design: developers can hand-pick the components that suit their workflow, architecture, and performance goals.
A RAG Development Stack for Your Next Project
Here are nine things to know when building a RAG project:
1. Large Language Models (LLMs)

Source: Author
LLMs are the brain of a RAG system, using transformer-based architectures to generate coherent, contextually relevant text. These models fall into two categories:
- Open-source LLMs: e.g., LLaMA, Falcon, and Cohere, which allow customization and local deployment.
- Closed-source LLMs: proprietary models such as GPT-4 and Bard that offer advanced capabilities but are typically accessible only through APIs.

Source: Author
Example of using an LLM in RAG
I have already imported JSON documents with a JSON loader; the pipeline below shows how the LLM is used in RAG.
Prompt template
from langchain_core.prompts import ChatPromptTemplate
rag_prompt = """You are an assistant who is an expert in question-answering tasks.
Answer the following question using only the following pieces of retrieved context.
If the answer is not in the context, do not make up answers, just say that you don't know.
Keep the answer detailed and well formatted based on the information from the context.
Question:
{question}
Context:
{context}
Answer:
"""
rag_prompt_template = ChatPromptTemplate.from_template(rag_prompt)
Building the pipeline
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Initialize ChatGPT model
chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

# Format documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Construct the RAG pipeline
# (similarity_retriever is the retriever built from the vector database, shown later in this article)
qa_rag_chain = (
    {
        "context": (similarity_retriever | format_docs),
        "question": RunnablePassthrough()
    }
    | rag_prompt_template
    | chatgpt
)
Usage example
query = "What is the difference between AI, ML, and DL?"
result = qa_rag_chain.invoke(query)
# Display the generated answer
from IPython.display import display, Markdown
display(Markdown(result.content))
query = "What is the difference between AI, ML, and DL?"
result = qa_rag_chain.invoke(query)
# Display the generated answer
from IPython.display import display, Markdown
display(Markdown(result.content))
query = "What is the difference between AI, ML, and DL?"
result = qa_rag_chain.invoke(query)
# Display the generated answer
from IPython.display import display, Markdown
display(Markdown(result.content))
Output

2. LLMs for RAG Response Generation
In a Retrieval-Augmented Generation (RAG) system, the response-generation LLM plays the role of final decision-maker: it synthesizes the retrieved documents, the user's query, and the context into a coherent, relevant, and often conversational answer. While the retrieval model surfaces potentially useful information, the LLM reasons over it, summarizes it, and ties it to the context, ensuring the output feels intelligent and human. This is especially important in applications such as enterprise search, customer support, legal/medical assistants, and technical Q&A, where users expect precise, grounded, and trustworthy answers.
In short, without an effective generation model even the best retrieval stack falls flat, which makes this component the core brain of any RAG pipeline.
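As a rough sketch of how the generation model plugs in as a swappable component, the snippet below reuses the init_chat_model helper shown later in this article; the model names are illustrative placeholders, and the commented-out chain assumes the retriever and prompt template from the previous section.
from langchain.chat_models import init_chat_model

# The generator is just another pluggable component: swap the model without touching
# the rest of the pipeline (model names here are illustrative placeholders)
generator = init_chat_model("gpt-4o-mini", model_provider="openai")
# generator = init_chat_model("claude-3-7-sonnet-latest", model_provider="anthropic")

# Reusing the retriever/prompt pattern from the previous section (assumed to exist):
# qa_rag_chain = (
#     {"context": similarity_retriever | format_docs, "question": RunnablePassthrough()}
#     | rag_prompt_template
#     | generator
# )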
Commercial LLMs

| Model | Developer | Key Strengths | Common Use Cases |
| --- | --- | --- | --- |
| GPT-4.5 | OpenAI | Advanced text generation, summarization, conversational fluency | Chatbots, customer support, content creation |
| Claude 3.7 Sonnet | Anthropic | Real-time dialogue, strong reasoning, "extended thinking mode" | Business automation, customer service |
| Gemini 2.0 Pro | Google DeepMind | Multimodal (text + image), high performance | Data analysis, enterprise automation, content generation |
| Cohere Command R+ | Cohere | Retrieval-augmented generation (RAG), enterprise-grade design | Knowledge management, support automation, moderation |
| DeepSeek | DeepSeek | On-premises deployment, secure data handling, high customizability | Finance, healthcare, privacy-sensitive industries |
Open-Source LLMs

| Model | Developer | Key Strengths | Common Use Cases |
| --- | --- | --- | --- |
| LLaMA 3 | Meta | Scalable (up to 405B parameters), multimodal capabilities | Conversational AI, research, content generation |
| Mistral 7B | Mistral AI | Lightweight yet powerful, optimized for code and chat | Code generation, chatbots, content automation |
| Falcon 180B | Technology Innovation Institute | Efficient, high performance, open access | Real-time applications, scientific/research bots |
| DeepSeek R1 | DeepSeek | Strong logic/reasoning capabilities, 128K context window | Math tasks, summarization, complex reasoning |
| Qwen2.5-72B-Instruct | Alibaba Cloud | 72.7B parameters, long context up to 128K tokens, coding, mathematical reasoning, multilingual support | Structured outputs such as JSON, making it versatile for technical uses in RAG workflows |
3. Frameworks

Source: Author
Frameworks simplify the development of RAG applications by providing pre-built components:
- LangChain: a framework for LLM application development with a modular architecture for prompt management, chains, memory handling, and agent creation. It excels at building RAG pipelines, with built-in support for document loaders, retrievers, and vector stores.
- LlamaIndex: a specialized framework for data indexing and retrieval that connects unstructured data to language models through custom indexes. It is optimized for ingesting, transforming, and querying large datasets for chatbots and knowledge management.
- LangGraph: integrates LLMs with graph-based structures, letting developers define application logic with nodes and edges. It is ideal for complex workflows with multiple branches and feedback loops, especially multi-agent systems.
- RAGFlow: a framework dedicated to retrieval-augmented generation systems that orchestrates retrievers, rankers, and generators into a coherent pipeline. It improves relevance when pulling from external data sources for search-driven interfaces and question-answering systems.

Source: Author
Frameworks such as LangChain, LangGraph, and LlamaIndex greatly simplify RAG development by providing modular tools that tie the retrieval and generation flow together. LangChain streamlines chaining LLM calls, managing prompts, and connecting to vector stores. LangGraph adds graph-based flow control for dynamic, multi-step RAG workflows. LlamaIndex focuses on data ingestion, indexing, and retrieval, making large datasets queryable by LLMs. Together they abstract away complex infrastructure so developers can focus on logic and data quality, enabling rapid prototyping and robust deployment of RAG applications for tasks such as question answering, document search, and knowledge assistance.
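For comparison with the LangChain walkthrough below, here is a hedged LlamaIndex sketch; it assumes the llama-index package is installed, an OpenAI key is configured (LlamaIndex uses OpenAI models by default), and your documents sit in a local ./data folder.
# pip install llama-index   (assumed setup; OPENAI_API_KEY must be set for the defaults)
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Ingest and index local documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query the index: retrieval and generation happen behind one call
query_engine = index.as_query_engine()
response = query_engine.query("What are the types of memory?")
print(response)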
Framework example for building RAG
Let's build a simple RAG with LangChain:
%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph
!pip install -qU "langchain[openai]"
Chat model
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")
Select an embedding model
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
Select a vector store
from langchain_core.vectorstores import InMemoryVectorStore
vector_store = InMemoryVectorStore(embeddings)
Build the indexing pipeline
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")

# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}

def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
response = graph.invoke({"question": "What are Types of Memory?"})
print(response["answer"])
Output
The types of memory include Sensory Memory, Short-Term Memory (STM), and Long-Term Memory (LTM). Sensory Memory retains impressions of sensory information for a few seconds, while Short-Term Memory holds currently relevant information for 20-30 seconds. Long-Term Memory can store information for days to decades and includes explicit (declarative) and implicit (procedural) memory.
4. Data Extraction

Source: Author
Data extraction tools come into play when you need to pull content from other sources. RAG applications need robust tools to extract structured and unstructured data from a wide variety of sources.
Data extraction example for building RAG
pip install -U langchain-community
%pip install langchain pypdf
Let's extract the content from a PDF file
# %pip install langchain pypdf
from langchain.document_loaders import PyPDFLoader

# Define the path to your PDF file
pdf_path = "/content/Multimodal Agent Using Agno Framework.pdf"

# Initialize the PyPDFLoader
loader = PyPDFLoader(pdf_path)

# Load the PDF and split it into pages
documents = loader.load()

# Print the content of each page
for i, doc in enumerate(documents):
    print(f"Page {i + 1} Content:")
    print(doc.page_content)
    print("\n")
Output

5. Embedders

Source: Author
Text embeddings convert text data into numeric vectors used for similarity-based retrieval. Beyond text embeddings:
- Image embeddings: used in multimodal RAG applications.
- Multimodal embeddings: combine text, images, and other data types for complex tasks.
Here are embedding models from various providers (a short usage sketch follows this list):
OpenAI embedding models
- Latest models: text-embedding-3-small (lower cost) and text-embedding-3-large (higher accuracy)
- Features: dynamic dimension adjustment (e.g., 256-3072 dims), multilingual support, optimized for search/RAG
Cohere Embed v3
- Focused on document-quality ranking and handling noisy data
- Models: English/multilingual variants (1024/384 dims), compression-aware training for cost efficiency
Nomic Embed v2
- Open-source MoE architecture (305M active parameters) with Matryoshka embeddings
- Multilingual (100+ languages); outperforms models twice its size on MTEB/BEIR benchmarks
Gemini Embedding
- Experimental model (gemini-embedding-exp-03-07) with 8K-token input and 3K dimensions
- Tops the MTEB leaderboard (mean score 68.32) and supports 100+ languages
Ollama embeddings
- Supports models such as mxbai-embed-large and custom variants (e.g., suntray-embedding)
- Designed for RAG workflows, with local inference and ChromaDB integration
BGE (BAAI)
- BERT-based models (large/base/small-en-v1.5) for retrieval/RAG
- Open source, supports instruction tuning (e.g., "Represent this sentence...")
Mixedbread
- Mixedbread AI's mxbai-embed-large-v1 is a state-of-the-art sentence-embedding model designed for multilingual and multimodal retrieval tasks.
- It supports advanced techniques such as Matryoshka Representation Learning (MRL) and binary quantization for efficient memory use and large cost savings at scale, and it performs competitively with large proprietary models while remaining open source and accessible.
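Whichever provider you choose, the usage pattern is similar: text goes in, a fixed-length vector comes out. A short sketch using the LangChain OpenAI wrapper (the same model used later in this article):
from langchain_openai import OpenAIEmbeddings

embedder = OpenAIEmbeddings(model="text-embedding-3-small")

# Embed a single query and a small batch of documents
query_vector = embedder.embed_query("What is machine learning?")
doc_vectors = embedder.embed_documents([
    "Machine learning builds models from data.",
    "Deep learning uses neural networks.",
])

print(len(query_vector))   # dimensionality of the embedding vector
print(len(doc_vectors))    # one vector per input document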

Splitting PDF content into chunks
from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from glob import glob

def create_simple_chunks(file_path, chunk_size=3500, chunk_overlap=200):
    loader = PyMuPDFLoader(file_path)
    doc_pages = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return splitter.split_documents(doc_pages)

pdf_files = glob('./rag_docs/*.pdf')

# Process PDF files
paper_docs = []
for fp in pdf_files:
    paper_docs.extend(create_simple_chunks(file_path=fp))
Output
Loading pages: ./rag_docs/cnn_paper.pdf
Chunking pages: ./rag_docs/cnn_paper.pdf
Finished processing: ./rag_docs/cnn_paper.pdf
Loading pages: ./rag_docs/attention_paper.pdf
Chunking pages: ./rag_docs/attention_paper.pdf
Finished processing: ./rag_docs/attention_paper.pdf
Loading pages: ./rag_docs/vision_transformer.pdf
Chunking pages: ./rag_docs/vision_transformer.pdf
Finished processing: ./rag_docs/vision_transformer.pdf
Loading pages: ./rag_docs/resnet_paper.pdf
Chunking pages: ./rag_docs/resnet_paper.pdf
Finished processing: ./rag_docs/resnet_paper.pdf
Creating embeddings
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Initialize embedding model
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

# Combine documents
# (wiki_docs_processed is assumed to come from an earlier Wikipedia-processing step not shown here)
total_docs = wiki_docs_processed + paper_docs

# Create and save vector database
chroma_db = Chroma.from_documents(documents=total_docs,
                                  collection_name='my_db',
                                  embedding=openai_embed_model,
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./my_db")
6. Vector Databases

Vector databases store embeddings (numeric representations of text or other data) so that semantically similar chunks can be retrieved efficiently. Examples include:
- Pinecone: a managed vector database platform designed for high-performance, scalable applications, enabling efficient storage and retrieval of high-dimensional vector embeddings.
- Chroma DB: an open-source, AI-native embedding database with vector search, document storage, full-text search, and metadata filtering for seamless retrieval in AI applications.
- Qdrant: an open-source vector database and search engine written in Rust, providing fast, scalable vector similarity search with extended filtering, suited to neural or semantic matching.
- Milvus DB: an open-source vector database built for scalable similarity search, able to handle large-scale, dynamic vector data and supporting a variety of index types for efficient retrieval.
- Weaviate: an open-source vector database that stores both objects and vectors, combining vector search with structured filtering; it is modular, cloud-native, and real-time.

Vector database example for building RAG
Note: we already created the embeddings above; now we will store them in a vector database.
Storing embeddings with Chroma DB
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Initialize embedding model
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

# Combine documents
total_docs = wiki_docs_processed + paper_docs

# Create and save vector database
chroma_db = Chroma.from_documents(documents=total_docs,
                                  collection_name='my_db',
                                  embedding=openai_embed_model,
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./my_db")
Loading the vector database
chroma_db = Chroma(persist_directory="./my_db",
                   collection_name='my_db',
                   embedding_function=openai_embed_model)
Retrieving information and getting the output
similarity_retriever = chroma_db.as_retriever(search_type="similarity", search_kwargs={"k": 5})

# Query for semantic similarity
query = "What is machine learning?"
top_docs = similarity_retriever.invoke(query)

# Display results
from IPython.display import display, Markdown

def display_docs(docs):
    for doc in docs:
        print('Metadata:', doc.metadata)
        print('Content Brief:')
        display(Markdown(doc.page_content[:1000]))
        print()

display_docs(top_docs)
Output

7. Rerankers

Source: Link
Rerankers refine the retrieval process by improving the relevance of the retrieved documents.
They operate within a two-stage retrieval pipeline:
- Initial retrieval pulls a large set of candidate documents from the vector database.
- The reranker then prioritizes the most relevant documents using additional scoring mechanisms such as semantic similarity or contextual relevance.
By integrating a reranker into the stack, developers can ensure higher-quality responses tailored to the user's query while keeping retrieval efficient.

Reranker example for building RAG
%pip install --upgrade --quiet cohere
Setting up Cohere and the ContextualCompressionRetriever
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain_community.llms import Cohere
from langchain.chains import RetrievalQA

llm = Cohere(temperature=0)
compressor = CohereRerank(model="rerank-english-v3.0")

# `retriever` is assumed to be a first-stage retriever built earlier (e.g., from a vector store)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

chain = RetrievalQA.from_chain_type(
    llm=Cohere(temperature=0), retriever=compression_retriever
)
Output

8. Evaluation

Evaluation ensures the accuracy and relevance of a RAG system:
- Giskard: a library for testing machine learning pipelines.
- Ragas: purpose-built to evaluate RAG pipelines by analyzing retrieval quality and generated outputs.
- Arize Phoenix: an open-source observability library for evaluating, troubleshooting, and improving LLM outputs, with features such as model drift detection and cohort analysis.
- Comet Opik: a fully open-source platform for evaluating, testing, and monitoring LLM applications, offering observability, automated scoring, and unit-testing tools across the development lifecycle.
- DeepEval: deepeval provides three LLM evaluation metrics for assessing retrieval: contextual precision, contextual recall, and contextual relevancy.
Evaluation example for building RAG
from tqdm import tqdm
from datasets import load_dataset
from qdrant_client import QdrantClient
from langchain.docstore.document import Document as LangchainDocument
from langchain_text_splitters import RecursiveCharacterTextSplitter
from openai import OpenAI
import deepeval

# Get your key from https://platform.openai.com/api-keys
OPENAI_API_KEY = "<OPENAI_API_KEY>"
# Get your Confident AI API key from https://app.confident-ai.com
CONFIDENT_AI_API_KEY = "<CONFIDENT_AI_API_KEY>"
# Get a FREE forever cluster at https://cloud.qdrant.io/
# More info: https://qdrant.tech/documentation/cloud/create-cluster/
QDRANT_URL = "<QDRANT_URL>"
QDRANT_API_KEY = "<QDRANT_API_KEY>"
COLLECTION_NAME = "qdrant-deepeval"
EVAL_SIZE = 10
RETRIEVAL_SIZE = 3

dataset = load_dataset("atitaarora/qdrant_doc", split="train")
langchain_docs = [
    LangchainDocument(
        page_content=doc["text"], metadata={"source": doc["source"]}
    )
    for doc in tqdm(dataset)
]

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=50,
    add_start_index=True,
    separators=["\n\n", "\n", ".", " ", ""],
)
docs_processed = []
for doc in langchain_docs:
    docs_processed += text_splitter.split_documents([doc])

client = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)

docs_contents, docs_metadatas = [], []
for doc in docs_processed:
    if hasattr(doc, "page_content") and hasattr(doc, "metadata"):
        docs_contents.append(doc.page_content)
        docs_metadatas.append(doc.metadata)
    else:
        print(
            "Warning: Some documents do not have 'page_content' or 'metadata' attributes."
        )

# Uses FastEmbed - https://qdrant.tech/documentation/fastembed/
# To generate embeddings for the documents
# The default model is `BAAI/bge-small-en-v1.5`
client.add(
    collection_name=COLLECTION_NAME,
    metadata=docs_metadatas,
    documents=docs_contents,
)

openai_client = OpenAI(api_key=OPENAI_API_KEY)

def query_with_context(query, limit):
    search_result = client.query(
        collection_name=COLLECTION_NAME, query_text=query, limit=limit
    )
    contexts = [
        "document: " + r.document + ",source: " + r.metadata["source"]
        for r in search_result
    ]
    prompt_start = """ You're assisting a user who has a question based on the documentation.
Your goal is to provide a clear and concise response that addresses their query while referencing relevant information
from the documentation.
Remember to:
Understand the user's question thoroughly.
If the user's query is general (e.g., "hi," "good morning"),
greet them normally and avoid using the context from the documentation.
If the user's query is specific and related to the documentation, locate and extract the pertinent information.
Craft a response that directly addresses the user's query and provides accurate information
referring the relevant source and page from the 'source' field of fetched context from the documentation to support your answer.
Use a friendly and professional tone in your response.
If you cannot find the answer in the provided context, do not pretend to know it.
Instead, respond with "I don't know".
Context:\n"""
    prompt_end = f"\n\nQuestion: {query}\nAnswer:"
    prompt = prompt_start + "\n\n---\n\n".join(contexts) + prompt_end
    res = openai_client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        temperature=0,
        max_tokens=636,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
    )
    return (contexts, res.choices[0].text)

qdrant_qna_dataset = load_dataset("atitaarora/qdrant_doc_qna", split="train")

def create_deepeval_dataset(dataset, eval_size, retrieval_window_size):
    test_cases = []
    for i in range(eval_size):
        entry = dataset[i]
        question = entry["question"]
        answer = entry["answer"]
        context, rag_response = query_with_context(
            question, retrieval_window_size
        )
        test_case = deepeval.test_case.LLMTestCase(
            input=question,
            actual_output=rag_response,
            expected_output=answer,
            retrieval_context=context,
        )
        test_cases.append(test_case)
    return test_cases

test_cases = create_deepeval_dataset(
    qdrant_qna_dataset, EVAL_SIZE, RETRIEVAL_SIZE
)

deepeval.login_with_confident_api_key(CONFIDENT_AI_API_KEY)

deepeval.evaluate(
    test_cases=test_cases,
    metrics=[
        deepeval.metrics.AnswerRelevancyMetric(),
        deepeval.metrics.FaithfulnessMetric(),
        deepeval.metrics.ContextualPrecisionMetric(),
        deepeval.metrics.ContextualRecallMetric(),
        deepeval.metrics.ContextualRelevancyMetric(),
    ],
)
9. Open LLM Access

Platforms that support local or API-based access to open LLMs include:
- Ollama: lets you run open LLMs locally.
- Groq, Hugging Face, Together AI: provide API integrations for open LLMs.
Open LLM access example for building RAG
Download Ollama (the install script below fetches it from ollama.com):
curl -fsSL https://ollama.com/install.sh | sh
Then pull DeepSeek-R1 1.5B with the command below:
ollama pull deepseek-r1:1.5b
Install the required libraries
!pip install langchain==0.3.11
!pip install langchain-openai==0.2.12
!pip install langchain-community==0.3.11
!pip install langchain-chroma==0.1.4
OpenAI embedding model
from langchain_openai import OpenAIEmbeddings
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')
Create the vector database and persist it to disk
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('AgenticAI.pdf')
pages = loader.load_and_split()
texts = [doc.page_content for doc in pages]

from langchain_chroma import Chroma

chroma_db = Chroma.from_texts(
    texts=texts,
    collection_name='db_docs',
    collection_metadata={"hnsw:space": "cosine"},  # Set distance function to cosine
    embedding=openai_embed_model
)
Define the prompt for the RAG chain
from langchain_core.prompts import ChatPromptTemplate
prompt = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If no context is present or if you don't know the answer, just say that you don't know.
Do not make up the answer unless it is there in the provided context.
Keep the answer concise and to the point with regard to the question.
Question:
{question}
Context:
{context}
Answer:
"""
prompt_template = ChatPromptTemplate.from_template(prompt)
Load the connection to the LLM
from langchain_community.llms import Ollama
deepseek = Ollama(model="deepseek-r1:1.5b")
LangChain syntax for the RAG chain
from langchain.chains import RetrievalQA

# similarity_threshold_retriever is assumed to be a retriever built from chroma_db
# (e.g., via chroma_db.as_retriever with a similarity-score threshold)
rag_chain = RetrievalQA.from_chain_type(llm=deepseek,
                                        chain_type="stuff",
                                        retriever=similarity_threshold_retriever,
                                        chain_type_kwargs={"prompt": prompt_template})

query = "Tell the Leaders’ Perspectives on Agentic AI"
rag_chain.invoke(query)
Output

Summary
Building an effective RAG application is about more than plugging in a language model; it is about choosing the right RAG developer stack, from frameworks and embeddings to vector databases and retrieval tools. When these components are integrated thoughtfully, they enable intelligent, scalable systems that can chat with PDFs, pull relevant facts in real time, and generate context-aware responses. As the ecosystem keeps evolving, staying flexible with your tooling and grounded in a solid architecture will be key to building reliable, future-proof AI solutions.