Best RAG Reranking Models of 2025: Cohere, bge-reranker, Voyage and More Compared


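Retrieval-augmented generation (RAG) marks an important step forward for natural language processing. It improves a large language model (LLM) by letting it consult information outside its training data before it writes a response. This means an LLM can handle company-specific knowledge or new information well without expensive retraining. Rerankers play a crucial role in refining the retrieved information so that the model receives the most relevant context. By combining information retrieval with text generation, RAG produces answers that are accurate, relevant, and natural-sounding.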

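Why Initial Retrieval Is Not Enough

The first step in RAG is to find documents related to the user's query. Systems usually rely on methods such as keyword search or vector similarity. These are good starting points, but the documents they return are not all equally useful. The embedding model in use may not capture the nuance needed to pick out the most relevant information.

Vector search looks for similar meaning, but it can struggle with short queries or specialized terminology. LLMs also have a limited context capacity. Feeding in too many documents, even marginally relevant ones, can confuse the model and lower the quality of the final answer. This noisy initial retrieval dilutes the LLM's focus, so we need a way to refine the first batch of results.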

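RAG System Architecture

The figure shows the retrieval and generation steps of RAG: the user asks a question, and the system retrieves results for it by searching a vector store. The retrieved content is passed to the LLM together with the question, and the LLM returns a structured output.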

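Rerankers: Refining the Search

This is where rerankers come in. Reranking improves the precision of search results. A reranker uses a more capable model to analyze the initially retrieved documents and reorders them by how well they match the user's specific question and intent.

In RAG, rerankers act as a quality filter. They examine the first set of results and prioritize the documents that offer the best information for the query, so the most relevant passages rise to the top. Think of a reranker as an expert who double-checks the initial search, using a deeper understanding of language to find the best match between documents and the question.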


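The figure shows a two-stage search process. In the second stage, reranking, the initial result set from semantic or keyword matching is refined to improve the relevance and ordering of the final results, giving the user more accurate and useful answers to the query.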

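How Reranking Improves RAG

Rerankers raise the accuracy of the context given to the LLM. They analyze the meaning of, and relationship between, the user's question and each retrieved document rather than relying on simple keyword matching. This deeper understanding helps identify the most useful information.

By focusing the LLM on a smaller, higher-quality set of documents, rerankers lead to more precise answers. With high-quality context, the LLM can give better-informed, more direct responses. A reranker computes a score that indicates how semantically close each document is to the query, which produces a better final ordering. It can find relevant information even when no keywords match exactly.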
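
The score-then-sort step can be sketched in a few lines of Python. This is only an illustration of the control flow, not how any production reranker computes relevance: `overlap_score` is a hypothetical lexical stand-in for a cross-encoder, and `rerank` is an illustrative helper name. In a real system, `score_fn` would call a reranking model, which is what allows matches without shared keywords.

```python
def overlap_score(query: str, document: str) -> float:
    """Toy relevance score (assumption): Jaccard overlap of lowercase word sets."""
    q = set(query.lower().split())
    d = set(document.lower().split())
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def rerank(query, documents, score_fn=overlap_score, top_n=3):
    """Score every (query, document) pair and return the top_n (score, document) pairs."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


docs = [
    "the cat sat on the mat",
    "reranking improves retrieval precision",
    "retrieval precision matters for rag",
]
top = rerank("retrieval precision", docs, top_n=2)
for score, doc in top:
    print(f"{score:.2f}  {doc}")
```

Passing the scorer in as a parameter keeps the sorting logic independent of the model, so a lexical scorer, a cross-encoder, or an API call can be swapped in without touching `rerank`.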

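This focus on high-quality context also helps reduce LLM hallucinations, cases where the model generates incorrect but plausible-sounding information. Grounding the LLM in documents that a reranker has vetted makes the final output more trustworthy.

A standard RAG pipeline consists of retrieval and generation. An enhanced RAG pipeline adds a reranking step in between:

Retrieval: Fetch an initial set of candidate documents.
Reranking: Use a reranking model to reorder those documents by their relevance to the query.
Generation: Give the LLM only the top-ranked, most relevant documents to build the answer.

This two-stage approach lets the initial retrieval cast a wide net (recall) while the reranker concentrates on picking the best items from that net (precision). The division of labor improves the overall pipeline and gives the LLM the best possible input.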
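
The three stages can be wired together as in the sketch below. Everything here is an assumption made for illustration: `cheap_score` stands in for a fast first-stage retriever, `careful_score` stands in for a slower and more precise reranker, and `build_prompt` only assembles the text an LLM would receive rather than calling a model.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def top_k(query: str, docs: List[str], scorer: Scorer, k: int) -> List[str]:
    """Return the k documents with the highest scorer(query, doc) value."""
    return sorted(docs, key=lambda d: scorer(query, d), reverse=True)[:k]


def cheap_score(query: str, doc: str) -> float:
    """Stage 1 stand-in (assumption): number of shared words, coarse but fast."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def careful_score(query: str, doc: str) -> float:
    """Stage 2 stand-in (assumption): shared words normalised by document length."""
    words = doc.lower().split()
    return cheap_score(query, doc) / len(words) if words else 0.0


def build_prompt(query: str, context: List[str]) -> str:
    """Generation placeholder: assemble the prompt an LLM would receive."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


corpus = [
    "rerankers reorder retrieved documents",
    "documents about cooking pasta at home with fresh tomatoes",
    "rerankers are models that reorder retrieved documents by relevance to the query",
    "weather report for tomorrow",
]
query = "how do rerankers reorder documents"

candidates = top_k(query, corpus, cheap_score, k=3)   # retrieval: cast a wide net
best = top_k(query, candidates, careful_score, k=1)   # reranking: keep the best
prompt = build_prompt(query, best)                    # generation: LLM input
print(prompt)
```

The first stage uses a larger k than the second on purpose: the cheap scorer only needs to avoid missing good candidates, while the more expensive scorer runs on a short list.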


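The query is used to search the vector database and retrieve the 25 most relevant documents. These documents are then passed to the reranker module, which refines the results and selects the 3 most relevant documents as the final output.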

Best Reranking Models of 2025

Several reranking models are popular choices for RAG pipelines in 2025:

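Reranker	Model type	Source	Key strengths	Key weaknesses	Best use cases
Cohere	Cross-encoder (API)	Proprietary	High accuracy; multilingual support; easy to use (hosted API); fast Nimble version	API fees; closed source	General-purpose RAG; enterprise applications; multilingual scenarios; teams that prioritize ease of use
bge-reranker	Cross-encoder	Open source	High accuracy; fully open source; runs on moderate hardware	Requires self-hosting and maintenance	General-purpose RAG; open-source preference; cost-sensitive projects
Voyage	Cross-encoder (API)	Proprietary	Industry-leading relevance and accuracy	API fees; top-tier model may have slightly higher latency	Finance, legal, and other domains that demand very high accuracy; relevance-critical applications
Jina	Cross-encoder / ColBERT variant	Mixed	Balanced performance; cost-effective; Jina-ColBERT handles long documents well	Peak accuracy slightly below the top models	General-purpose RAG; long documents; cost/performance balance
FlashRank	Lightweight cross-encoder	Open source	Very fast; low resource usage; easy to integrate	Lower accuracy than large models	Speed-sensitive applications; resource-constrained environments
ColBERT	Multi-vector (late interaction)	Open source	Efficient on very large document collections; fast queries	Index construction has high compute and storage cost	Very large document stores; retrieval efficiency and scalability
MixedBread (mxbai-rerank-v2)	Cross-encoder	Open source	Claimed SOTA performance; fast inference; multilingual; long-context support; works with varied data (code, JSON, etc.)	Requires self-hosting and maintenance; relatively new, ecosystem still maturing	High-performance RAG; multilingual; long documents or code/JSON; open-source preference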

Cohere Rerank

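Cohere Rerank uses a sophisticated neural network, likely based on the Transformer architecture, that acts as a cross-encoder. It processes the query and the document together to judge relevance precisely. It is a proprietary model available through an API.

Key features: Support for more than 100 languages, which makes it suitable for global applications. It integrates easily as a managed service. Cohere also offers Rerank 3 Nimble, a version designed to run significantly faster in production while maintaining high accuracy.
Performance: Cohere Rerank delivers consistently high accuracy across the various embedding models used in the initial retrieval step. The Nimble version significantly reduces response time. Cost depends on API usage.
Strengths: Easy integration through the API, strong and reliable performance, excellent multilingual capability, and a speed-optimized option (Nimble).
Weaknesses: It is a closed-source commercial service, so you pay per use and cannot modify the model.
Ideal use cases: General-purpose RAG applications, enterprise search platforms, customer-support chatbots, and situations that need broad language support without managing model infrastructure.

Example Code

First, install the Cohere library.

%pip install --upgrade --quiet cohere

Set up Cohere and ContextualCompressionRetriever.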


from langchain.chains import RetrievalQA
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain_community.llms import Cohere

# `retriever` is assumed to be an existing base retriever (for example, a vector store retriever).
llm = Cohere(temperature=0)
compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
# Run the chain; the output below corresponds to this query.
chain.invoke({"query": "What did the president say about Ketanji Brown Jackson"})

Output:

{'query': 'What did the president say about Ketanji Brown Jackson',
'result': " The president speaks highly of Ketanji Brown Jackson, stating that she
is one of the nation's top legal minds, and will continue the legacy of excellence
of Justice Breyer. The president also mentions that he worked with her family and
that she comes from a family of public school educators and police officers. Since
her nomination, she has received support from various groups, including the
Fraternal Order of Police and judges from both major political parties. \n\nWould
you like me to extract another sentence from the provided text? "}

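bge-reranker (Base/Large)

These models come from the Beijing Academy of Artificial Intelligence (BAAI) and are open source (Apache 2.0 license). They are Transformer-based cross-encoders designed specifically for reranking and are available in different sizes, such as base and large.

Key features: Being open source, they can be freely deployed and modified. For example, bge-reranker-v2-m3 has fewer than 600 million parameters and runs efficiently on common hardware, including consumer GPUs.
Performance: These models perform very well, especially the large version, and often achieve results close to top commercial models. They show strong mean reciprocal rank (MRR) scores. The cost is mainly the compute needed for self-hosting.
Strengths: No license fees (open source), high accuracy, self-hosting flexibility, and good performance even on moderate hardware.
Weaknesses: Users must manage deployment, infrastructure, and updates. Performance depends on the hosting hardware.
Ideal use cases: General RAG tasks, research projects, teams that prefer open-source tools, budget-conscious applications, and users comfortable with self-hosting.

Example Code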


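from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder


def pretty_print_docs(docs):
    # Helper used by the examples; the original does not show its definition, so this is an assumed version.
    print(f"\n{'-' * 100}\n".join(f"Document {i + 1}:\n{d.page_content}" for i, d in enumerate(docs)))


# `retriever` is assumed to be an existing base retriever (for example, a vector store retriever).
model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
# Keep only the 3 highest-scoring documents after reranking.
compressor = CrossEncoderReranker(model=model, top_n=3)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
compressed_docs = compression_retriever.invoke("What is the plan for the economy?")
pretty_print_docs(compressed_docs)

Output: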

Document 1:
More infrastructure and innovation in America. 
More goods moving faster and cheaper in America. 
More jobs where you can earn a good living in America. 
And instead of relying on foreign supply chains, let’s make it in America. 
Economists call it “increasing the productive capacity of our economy.” 
I call it building a better America. 
My plan to fight inflation will lower your costs and lower the deficit.
----------------------------------------------------------------------------------------------------
Document 2:
Second – cut energy costs for families an average of $500 a year by combatting
climate change.  
Let’s provide investments and tax credits to weatherize your homes and businesses to
be energy efficient and you get a tax credit; double America’s clean energy
production in solar, wind, and so much more;  lower the price of electric vehicles,
saving you another $80 a month because you’ll never have to pay at the gas pump
again.
----------------------------------------------------------------------------------------------------
Document 3:
Look at cars. 
Last year, there weren’t enough semiconductors to make all the cars that people
wanted to buy. 
And guess what, prices of automobiles went up. 
So—we have a choice. 
One way to fight inflation is to drive down wages and make Americans poorer.  
I have a better plan to fight inflation. 
Lower your costs, not your wages. 
Make more cars and semiconductors in America. 
More infrastructure and innovation in America. 
More goods moving faster and cheaper in America.

Voyage Rerank

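Voyage AI offers proprietary neural reranking models (rerank-2 and rerank-2-lite) through an API. They are likely advanced cross-encoders that have been carefully tuned for the highest possible relevance scoring.

Key features: Top relevance scores on benchmarks. Voyage provides a simple Python client library for easy integration. The lite version balances quality against speed and cost.
Performance: rerank-2 often leads benchmarks on pure relevance accuracy. The lite model performs on par with other strong competitors. The high-accuracy rerank-2 model may have slightly higher latency than some competitors. Cost is tied to API usage.
Strengths: State-of-the-art relevance, possibly the most accurate option currently available. Easy to use through the Python client.
Weaknesses: A proprietary API-based service with associated fees. The most accurate model may be slightly slower than alternatives.
Ideal use cases: Applications where maximizing relevance is critical, such as financial analysis, legal document review, or high-stakes question answering, where accuracy matters more than small differences in speed.

Example Code

First, install the Voyage libraries.

%pip install --upgrade --quiet voyageai
%pip install --upgrade --quiet langchain-voyageai

Set up Voyage and ContextualCompressionRetriever.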


import os

from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_voyageai import VoyageAIEmbeddings, VoyageAIRerank

# Load and chunk the source text, then build a FAISS retriever that returns 20 candidates.
documents = TextLoader("../../how_to/state_of_the_union.txt").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
retriever = FAISS.from_documents(
    texts, VoyageAIEmbeddings(model="voyage-law-2")
).as_retriever(search_kwargs={"k": 20})

# Rerank the 20 candidates and keep the top 3. VOYAGE_API_KEY must be set in the environment.
compressor = VoyageAIRerank(
    model="rerank-lite-1", voyageai_api_key=os.environ["VOYAGE_API_KEY"], top_k=3
)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
# `pretty_print_docs` is assumed to be a small helper that prints each document's page_content.
pretty_print_docs(compressed_docs)

Output:

Document 1:
One of the most serious constitutional responsibilities a President has is
nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji
Brown Jackson. One of our nation’s top legal minds, who will continue Justice
Breyer’s legacy of excellence.
----------------------------------------------------------------------------------------------------
Document 2:
So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law
enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and
restricted no-knock warrants for its officers.
----------------------------------------------------------------------------------------------------
Document 3:
I spoke with their families and told them that we are forever in debt for their
sacrifice, and we will carry on their mission to restore the trust and safety every
community deserves.
I’ve worked on these issues a long time.
I know what works: Investing in crime prevention and community police officers
who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and
safety.
So let’s not abandon our streets. Or choose between safety and equal justice.

Jina Reranker

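Jina AI provides reranking solutions that include neural models such as Jina Reranker v2 and Jina-ColBERT. Jina Reranker v2 is likely a cross-encoder-style model. Jina-ColBERT implements the ColBERT architecture (explained below) on top of Jina's base model.

Key features: Cost-effective options with strong performance. A standout feature of Jina-ColBERT is its ability to handle very long documents, with support for context lengths of up to 8,000 tokens, which reduces the need to chunk long texts. Open-source components are also part of the Jina ecosystem.
Performance: Jina Reranker v2 strikes a good balance of speed, cost, and relevance. Jina-ColBERT excels with long source documents, and pricing is generally competitive.
Strengths: Balanced performance, cost-effectiveness, excellent long-document handling through Jina-ColBERT, and the flexibility of the available open-source components.
Weaknesses: The standard Jina reranker may not reach the absolute peak accuracy of specialized models such as Voyage's top tier.
Ideal use cases: General-purpose RAG systems, applications that process long documents (technical manuals, research papers, books), and projects that need a good balance between cost and performance.

Example Code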


from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import JinaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load and chunk the source text.
documents = TextLoader(
    "../../how_to/state_of_the_union.txt",
).load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)

# Build the base retriever (returns 20 candidates). A Jina API key is assumed to be configured.
embedding = JinaEmbeddings(model_name="jina-embeddings-v2-base-en")
retriever = FAISS.from_documents(texts, embedding).as_retriever(search_kwargs={"k": 20})
query = "What did the president say about Ketanji Brown Jackson"
docs = retriever.invoke(query)

Reranking with Jina


from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.document_compressors import JinaRerank

# Wrap the base retriever so that its results are reranked by Jina before being returned.
compressor = JinaRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
# `pretty_print_docs` is assumed to be a small helper that prints each document's page_content.
pretty_print_docs(compressed_docs)

Output:

Document 1:
So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law
enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and
restricted no-knock warrants for its officers.
----------------------------------------------------------------------------------------------------
Document 2:
I spoke with their families and told them that we are forever in debt for their
sacrifice, and we will carry on their mission to restore the trust and safety every
community deserves.
I’ve worked on these issues a long time.
I know what works: Investing in crime prevention and community police officers 
who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and
safety.
So let’s not abandon our streets. Or choose between safety and equal justice.

ColBERT

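ColBERT (Contextualized Late Interaction over BERT) is a multi-vector model. Instead of representing a document with a single vector, it creates multiple contextualized vectors, typically one per token. It uses a late interaction mechanism that compares the query vectors with the encoded document vectors, which allows the document vectors to be precomputed and indexed.

Key features: Once documents are indexed, the architecture allows efficient retrieval from large collections. The multi-vector approach supports fine-grained comparison between query terms and document content. It is an open-source method.
Performance: ColBERT offers a good balance between retrieval effectiveness and efficiency, especially at scale. After the initial indexing step, retrieval latency is low. The main cost is the compute for indexing and self-hosting.
Strengths: Efficient on large document collections, scalable retrieval, and open-source flexibility.
Weaknesses: The initial indexing process can be compute-intensive and take up a lot of storage.
Ideal use cases: Large-scale RAG applications, systems that need fast retrieval over millions or billions of documents, and scenarios where precomputation time is acceptable.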
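
To make late interaction concrete, here is a minimal sketch of the MaxSim scoring commonly associated with ColBERT: each query token vector is matched to its most similar document token vector, and those maxima are summed. The two-dimensional vectors are made up for illustration. A real ColBERT model produces normalized, higher-dimensional token embeddings, and the function name `maxsim_score` is an assumption rather than a library API.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """For each query token vector, take its best match among the document
    token vectors, then sum those best-match similarities."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy token embeddings (assumption): two query tokens, two candidate documents.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # contains a close match for both query tokens
doc_b = [[0.5, 0.5], [0.6, 0.0]]               # only partial matches

score_a = maxsim_score(query_vecs, doc_a)
score_b = maxsim_score(query_vecs, doc_b)
print(score_a, score_b)
```

Because the document vectors do not depend on the query, they can be computed once and stored, which is why indexing is expensive but query-time scoring stays cheap.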

Example Code

Install the RAGatouille library to use the ColBERT reranker.

pip install -U ragatouille

Now set up the ColBERT reranker.


from langchain.retrievers import ContextualCompressionRetriever
from ragatouille import RAGPretrainedModel

# Load the pretrained ColBERTv2 checkpoint.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# `retriever` is assumed to be an existing base retriever over the document collection.
compression_retriever = ContextualCompressionRetriever(
    base_compressor=RAG.as_langchain_document_compressor(), base_retriever=retriever
)
compressed_docs = compression_retriever.invoke(
    "What animation studio did Miyazaki found"
)
# Print the top-ranked document together with its relevance score.
print(compressed_docs[0])

Output:

Document(page_content='In June 1985, Miyazaki, Takahata, Tokuma and Suzuki founded
the animation production company Studio Ghibli, with funding from Tokuma Shoten.
Studio Ghibli\'s first film, Laputa: Castle in the Sky (1986), employed the same
production crew of Nausicaä. Miyazaki\'s designs for the film\'s setting were 
inspired by Greek architecture and "European urbanistic templates". Some of the
architecture in the film was also inspired by a Welsh mining town; Miyazaki
witnessed the mining strike upon his first', metadata={'relevance_score':
26.5194149017334})

FlashRank

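FlashRank is designed as a very lightweight, fast reranking library that typically relies on small, optimized Transformer models, often distilled or pruned versions of larger ones. It aims to deliver a meaningful relevance improvement over plain similarity search with minimal compute overhead. It works like a cross-encoder but uses techniques that speed up processing. It is available as an open-source Python library.

Key features: Speed and efficiency. It is easy to integrate and has low resource consumption (CPU or moderate GPU usage). It usually takes very little code to set up.
Performance: FlashRank does not reach the peak accuracy of large cross-encoders such as Cohere or Voyage, but it aims for a clear improvement over no reranking or basic bi-encoder ranking. Its speed makes it suitable for real-time or high-throughput scenarios. Cost is minimal (self-hosted compute).
Strengths: Very fast inference, low compute requirements, easy integration, and open source.
Weaknesses: Accuracy may be lower than that of larger, more complex reranking models. Model choice may be more limited than in broader frameworks.
Ideal use cases: Applications that need fast reranking on resource-constrained hardware (such as CPUs or edge devices), high-volume search systems where latency is critical, and projects that want a simple, better-than-nothing reranking step with minimal complexity.

Example Code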


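from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank

# `retriever` is assumed to be an existing base retriever whose documents carry an "id" in their metadata.
compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
# Show which original chunks survived reranking, then print their contents.
print([doc.metadata["id"] for doc in compressed_docs])
# `pretty_print_docs` is assumed to be a small helper that prints each document's page_content.
pretty_print_docs(compressed_docs)

This snippet uses FlashrankRerank inside a ContextualCompressionRetriever to improve the relevance of the retrieved documents. It reorders the documents fetched by the base retriever (retriever) according to their relevance to the query "What did the president say about Ketanji Jackson Brown". Finally, it prints the document IDs followed by the compressed, reranked documents.

Output: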

[0, 5, 3]
Document 1:
One of the most serious constitutional responsibilities a President has is
nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji
Brown Jackson. One of our nation’s top legal minds, who will continue Justice
Breyer’s legacy of excellence.
----------------------------------------------------------------------------------------------------
Document 2:
He met the Ukrainian people.
From President Zelenskyy to every Ukrainian, their fearlessness, their courage,
their determination, inspires the world.
Groups of citizens blocking tanks with their bodies. Everyone from students to
retirees teachers turned soldiers defending their homeland.
In this struggle as President Zelenskyy said in his speech to the European
Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United
States is here tonight.
----------------------------------------------------------------------------------------------------
Document 3:
And tonight, I’m announcing that the Justice Department will name a chief prosecutor
for pandemic fraud.
By the end of this year, the deficit will be down to less than half what it was
before I took office.
The only president ever to cut the deficit by more than one trillion dollars in a
single year.
Lowering your costs also means demanding more competition.
I’m a capitalist, but capitalism without competition isn’t capitalism
It’s exploitation—and it drives up prices.

The output shows that the retrieved chunks are reordered by relevance.

MixedBread

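This family from Mixedbread AI includes mxbai-rerank-base-v2 (0.5 billion parameters) and mxbai-rerank-large-v2 (1.5 billion parameters). They are open-source (Apache 2.0 license) cross-encoders based on the Qwen-2.5 architecture. Their key differentiator is the training process, which adds a three-stage reinforcement learning (RL) approach (GRPO, contrastive learning, and preference learning) on top of the initial training.

Key features: Claims leading performance across benchmarks such as BEIR. Supports more than 100 languages. Handles long contexts of up to 8k tokens (and is compatible with 32k tokens). Designed for a variety of data types, including text, code, and JSON, and supports LLM tool selection. Available through Hugging Face and a Python library.
Performance: Benchmarks published by Mixedbread indicate that these models outperform other top open-source and closed-source competitors such as Cohere and Voyage on BEIR (Large reaches 57.49 and Base reaches 55.57). They also show a notable speed advantage: in latency tests the 1.5-billion-parameter model is clearly faster than other large open-source rerankers. The cost is the compute used for self-hosting.
Strengths: High benchmark performance (claimed SOTA), an open-source license, fast inference relative to its accuracy, broad language support, a very long context window, and versatility across data types (code, JSON).
Weaknesses: Requires self-hosting and infrastructure management. Because the models are relatively new, long-term performance data and community scrutiny are still accumulating.
Ideal use cases: General-purpose RAG that needs top-tier performance, multilingual applications, systems that process code, JSON, or long documents, LLM tool/function-call selection, and teams that prefer high-performance open-source models.

Example Code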


!pip install mxbai_rerank

from mxbai_rerank import MxbaiRerankV2

# Load the model; here we use the base-sized model.
model = MxbaiRerankV2("mixedbread-ai/mxbai-rerank-base-v2")

# Example query and documents
query = "Who wrote To Kill a Mockingbird?"
documents = [
    "To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan.",
]

# Score every document against the query and return them in ranked order.
results = model.rank(query, documents)
print(results)

Output:

[RankResult(index=0, score=9.847987174987793, document='To Kill a Mockingbird is a
novel by Harper Lee published in 1960. It was immediately successful, winning the
Pulitzer Prize, and has become a classic of modern American literature.'), 
RankResult(index=2, score=8.258672714233398, document='Harper Lee, an American
novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in
Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.'),
RankResult(index=3, score=3.579845428466797, document='Jane Austen was an English
novelist known primarily for her six major novels, which interpret, critique and
comment upon the British landed gentry at the end of the 18th century.'), 
RankResult(index=4, score=2.716982841491699, document='The Harry Potter series,
which consists of seven fantasy novels written by British author J.K. Rowling, is
among the most popular and critically acclaimed books of the modern era.'), 
RankResult(index=1, score=2.233165740966797, document='The novel Moby-Dick was
written by Herman Melville and first published in 1851. It is considered a 
masterpiece of American literature and deals with complex themes of obsession,
revenge, and the conflict between good and evil.'), 
RankResult(index=5, score=1.8150043487548828, document='The Great Gatsby, a novel
written by American author F. Scott Fitzgerald, was published in 1925. The story is
set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit
of Daisy Buchanan.')]

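How to Tell Whether Your Reranker Is Working

Evaluating a reranker is important. Common metrics help measure its effectiveness:

Accuracy@k: How often a relevant document appears in the top k results.
Precision@k: The proportion of the top k results that are relevant.
Recall@k: The proportion of all relevant documents that appear in the top k results.
Normalized discounted cumulative gain (NDCG): Measures ranking quality by taking both relevance and position into account. Relevant documents ranked higher contribute more to the score. It is normalized (0 to 1), which allows comparison across queries.
Mean reciprocal rank (MRR): Focuses on the rank of the first relevant document found. It is the average of 1/rank over multiple queries, and it is useful when finding one good result quickly matters.
F1-score: The harmonic mean of precision and recall, which gives a balanced view.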
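
Several of these metrics are short enough to write out directly. The sketch below assumes binary relevance (a document is either relevant or not) and uses made-up document IDs. The function names are illustrative and do not come from any particular evaluation library. MRR is then the mean of `reciprocal_rank` over all evaluation queries.

```python
import math
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant document, or 0.0 if none is found."""
    for position, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, d in enumerate(ranked[:k], start=1)
        if d in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


# Hypothetical reranker output for one query, and its ground-truth relevant set.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

p2 = precision_at_k(ranked, relevant, k=2)
r2 = recall_at_k(ranked, relevant, k=2)
rr = reciprocal_rank(ranked, relevant)
ndcg4 = ndcg_at_k(ranked, relevant, k=4)
print(p2, r2, rr, round(ndcg4, 4), f1(p2, r2))
```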
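Choosing the Right Reranker for Your Needs

Choosing the best reranker means balancing several factors:

Relevance needs: How accurate does your application need to be?
Latency: How quickly must the reranker return results? Speed is critical for real-time applications.
Scalability: Can the model handle your current and future data volume and user load?
Integration: How well does the reranker fit your existing RAG pipeline (embedding model, vector database, LLM framework)?
Domain specificity: Do you need a model trained on data from a specific domain?
Cost: Consider API fees for proprietary models or compute costs for self-hosted models.

There are trade-offs to weigh:

Cross-encoders are highly accurate but slower.
Bi-encoders are faster and more scalable, but may be slightly less accurate.
LLM-based rerankers are very accurate but expensive and slow.
Multi-vector models aim for a balance.
Score-based methods are the fastest but may lack semantic depth.

To choose wisely:

Define your accuracy and speed targets.
Analyze your data characteristics (size, domain).
Evaluate different models on your own data using metrics such as NDCG and MRR.
Consider ease of integration and budget.

The best reranker is the one that meets your specific performance, efficiency, and cost requirements.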
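
One way to run such a comparison on your own data is a small harness that feeds the same labeled queries to each candidate reranker and reports MRR. The sketch below is a hypothetical setup: the two rerankers are toy functions, the dataset is invented, and in practice each callable would wrap one of the models discussed above.

```python
from typing import Callable, List, Set, Tuple

Reranker = Callable[[str, List[str]], List[str]]
Example = Tuple[str, List[str], Set[str]]  # (query, candidates, relevant documents)


def mean_reciprocal_rank(reranker: Reranker, dataset: List[Example]) -> float:
    """Average of 1/rank of the first relevant document across all queries."""
    total = 0.0
    for query, candidates, relevant in dataset:
        ranked = reranker(query, candidates)
        for position, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / position
                break
    return total / len(dataset)


def keep_order(query: str, docs: List[str]) -> List[str]:
    """Baseline (assumption): no reranking, keep the retriever's order."""
    return list(docs)


def by_word_overlap(query: str, docs: List[str]) -> List[str]:
    """Toy reranker (assumption): sort by the number of words shared with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)


# Invented evaluation set: each query has candidates and a labeled relevant set.
dataset: List[Example] = [
    ("python list sort",
     ["java streams guide", "python list sort tutorial"],
     {"python list sort tutorial"}),
    ("reset router password",
     ["how to reset a router password", "router buying guide"],
     {"how to reset a router password"}),
]

scores = {
    name: mean_reciprocal_rank(fn, dataset)
    for name, fn in [("keep_order", keep_order), ("by_word_overlap", by_word_overlap)]
}
print(scores)
```

Including a no-reranking baseline makes it clear how much each candidate actually adds. Latency can be measured in the same loop if speed is one of your targets.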

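Conclusion

Rerankers are essential for getting the most out of a RAG system. They refine the information given to the LLM, which leads to better, more reliable answers. From high-accuracy cross-encoders to efficient bi-encoders and specialized models such as ColBERT, developers have a wide range of options. Choosing the right model requires understanding the trade-offs among accuracy, speed, scalability, and cost. As RAG evolves, particularly in handling diverse data types, rerankers will continue to play a key role in building smarter, more reliable AI applications. Careful evaluation and selection remain the key to success.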
