Best RAG Reranking Models of 2025: Cohere, bge-reranker, Voyage, and More Compared

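Retrieval-augmented generation (RAG) is a significant step forward for natural language processing. It lets a large language model (LLM) consult information beyond its training data before producing a response, which means the LLM can handle company-specific knowledge or new information without expensive retraining. By combining information retrieval with text generation, RAG produces answers that are accurate, relevant, and natural-sounding. Rerankers play a key role in this pipeline: they refine the retrieved information so that the LLM receives the most relevant context.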

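Why initial retrieval is not enough

The first step in RAG is finding documents related to the user's query. Systems typically use keyword search or vector similarity for this. These methods are good starting points, but the documents they return are not all equally useful, and the embedding model may miss the nuances needed to pick out the most relevant information.

Vector search finds text with similar meaning, but it can struggle with short queries or specialized terminology. LLMs also have a limited capacity for context. Feeding in too many documents, even marginally relevant ones, can confuse the model and lower the quality of the final answer. This noisy initial retrieval dilutes the LLM's focus, so we need a way to refine that first batch of results.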

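RAG system architecture

The diagram depicts the retrieval and generation steps of RAG: the user asks a question, the system searches the vector store for results that match it, and the retrieved content is passed to the LLM along with the question. The LLM then returns a structured output.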

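Rerankers: refining the search

This is where rerankers come in. Reranking improves the precision of search results: a reranker analyzes the initially retrieved documents and reorders them by how well they match the user's specific question and intent.

In RAG, a reranker acts as a quality filter. It examines the first set of results and prioritizes the documents that offer the best information for the query, with the goal of moving the most relevant passages to the top. Think of a reranker as an expert who double-checks the initial search, using a deeper understanding of language to find the best match between documents and the question.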

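The figure shows a two-stage search process. Reranking is the second stage: the initial result set, produced by semantic or keyword matching, is refined to improve the relevance and ordering of the final results, giving the user more accurate and useful answers.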

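How reranking improves RAG

Rerankers improve the accuracy of the context given to the LLM. They analyze the meaning of the user's question and its relationship to each retrieved document rather than relying on simple keyword matching. This deeper understanding helps identify the most useful information.

By focusing the LLM on a smaller, higher-quality set of documents, rerankers lead to more precise answers. With high-quality context, the LLM can form better-informed, more direct responses. A reranker computes a score for each document that reflects how close it is semantically to the query, and the final ordering follows those scores. As a result, it can surface relevant information even when no keywords match exactly.

This focus on high-quality context also helps reduce LLM "hallucinations", cases where the model generates information that is incorrect but sounds plausible. Grounding the LLM in documents that the reranker has vetted makes the final output more trustworthy.

A standard RAG pipeline consists of retrieval and generation. An enhanced RAG pipeline adds a reranking step in between.

Retrieve: Fetch an initial set of candidate documents.
Rerank: Use a reranking model to reorder those documents by their relevance to the query.
Generate: Give the LLM only the top-ranked, most relevant documents to create the answer.

This two-stage approach lets the initial retrieval cast a wide net (recall) while the reranker concentrates on picking the best items from that net (precision). The division of labor improves the overall pipeline and gives the LLM the best possible input.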
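
The retrieve-then-rerank split can be sketched in a few lines of plain Python. This is only a toy illustration: `cosine_bow` and `coverage_score` are hypothetical stand-ins for an embedding-based retriever and a neural reranker, and the corpus is invented. The point is the shape of the pipeline, where a cheap score narrows the corpus and a second score reorders only the survivors.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def cosine_bow(query, doc):
    """Stage-1 stand-in: cosine similarity between bag-of-words count vectors."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    d_norm = math.sqrt(sum(v * v for v in d.values()))
    return dot / (q_norm * d_norm) if q_norm and d_norm else 0.0


def coverage_score(query, doc):
    """Stage-2 stand-in: fraction of distinct query terms present in the doc."""
    q_terms, d_terms = set(tokenize(query)), set(tokenize(doc))
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def retrieve(query, corpus, k):
    """Cast a wide net: keep the k best candidates by the cheap score."""
    return sorted(corpus, key=lambda doc: cosine_bow(query, doc), reverse=True)[:k]


def rerank(query, candidates, top_n, scorer=coverage_score):
    """Re-score only the candidates and keep the top_n for the LLM."""
    return sorted(candidates, key=lambda doc: scorer(query, doc), reverse=True)[:top_n]


corpus = [
    "vector search finds documents with similar embeddings",
    "documents documents score score",  # keyword-stuffed distractor
    "the weather in paris is mild in spring",
    "a reranker assigns a relevance score to each of the retrieved documents before generation",
    "keyword search matches exact terms",
]
query = "how does a reranker score documents"

candidates = retrieve(query, corpus, k=3)      # recall-oriented first stage
top_docs = rerank(query, candidates, top_n=2)  # precision-oriented second stage
print(top_docs)
```

In this toy corpus the keyword-stuffed document wins the first stage, and the second stage moves the passage that actually covers the query to the top. A real pipeline would swap in a vector store for `retrieve` and a cross-encoder or hosted reranker for `scorer`.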

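For example, a query is used to search the vector database, which returns the 25 most relevant documents. These documents are passed to the reranker module, which refines the results and selects the 3 most relevant documents as the final output.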

Best reranking models of 2025

Let's look at the most popular reranking models of 2025.

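Several reranking models are popular choices for RAG pipelines:

Reranker	Model type	Source	Key strengths	Key weaknesses	Best use cases
Cohere	Cross-encoder (API)	Proprietary	• High accuracy • Multilingual support • Easy to use (hosted API) • Fast Nimble variant	• API fees • Closed source	• General-purpose RAG systems • Enterprise applications • Multilingual scenarios • Teams that prioritize ease of use
bge-reranker	Cross-encoder	Open source	• High accuracy • Fully open source • Runs on mid-range hardware	• Must be self-hosted and maintained	• General-purpose RAG • Teams that prefer open source • Cost-sensitive projects
Voyage	Cross-encoder (API)	Proprietary	• Industry-leading relevance and accuracy	• API fees • Top-tier model may have slightly higher latency	• Finance, legal, and other domains that demand very high accuracy • Relevance-critical applications
Jina	Cross-encoder / ColBERT variant	Mixed	• Balanced performance • Cost-effective • Jina-ColBERT handles long documents well	• Peak accuracy slightly below the top models	• General-purpose RAG • Long documents • Balancing cost and performance
FlashRank	Lightweight cross-encoder	Open source	• Extremely fast • Low resource usage • Easy to integrate	• Lower accuracy than larger models	• Speed-sensitive applications • Resource-constrained environments
ColBERT	Multi-vector (late interaction)	Open source	• Efficient on very large document collections • Fast queries	• Index building has high compute and storage costs	• Very large document collections • Retrieval efficiency and scalability
MixedBread (mxbai-rerank-v2)	Cross-encoder	Open source	• Claims SOTA performance • Fast inference • Multilingual • Long-context support • Handles varied data (code, JSON, and so on)	• Must be self-hosted and maintained • Relatively new, ecosystem still maturing	• High-performance RAG • Multilingual • Long documents or code/JSON • Teams that prefer open source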

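Cohere Rerank

Cohere Rerank uses a sophisticated neural network, likely based on the Transformer architecture, that acts as a cross-encoder. It processes the query and the document together to judge relevance precisely. It is a proprietary model available through an API.

Key features: It supports more than 100 languages, which makes it suitable for global applications, and it integrates easily as a hosted service. Cohere also offers "Rerank 3 Nimble", a variant designed to run significantly faster in production while maintaining high accuracy.
Performance: Cohere Rerank stays consistently accurate across the different embedding models used in the initial retrieval step. The Nimble variant significantly reduces response time. Cost depends on API usage.
Strengths: Easy integration through the API, strong and reliable performance, excellent multilingual capabilities, and a speed-optimized option (Nimble).
Weaknesses: It is a closed-source commercial service, so you pay by usage and cannot modify the model.
Ideal use cases: General-purpose RAG applications, enterprise search platforms, customer support chatbots, and situations that call for broad language support without managing model infrastructure.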

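Example code

Start by installing the Cohere library along with the LangChain packages that the example imports.

%pip install --upgrade --quiet cohere langchain langchain-cohere langchain-community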

Set up Cohere and the ContextualCompressionRetriever.

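from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain_community.llms import Cohere
from langchain.chains import RetrievalQA

# `retriever` is assumed to be an existing base retriever (for example, a
# vector store retriever built over your documents).
# The Cohere classes read the API key from the COHERE_API_KEY environment variable.
llm = Cohere(temperature=0)

# Wrap the base retriever so that every result set is reranked by Cohere.
compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

# Answer questions using only the reranked documents as context.
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
chain.invoke({"query": "What did the president say about Ketanji Brown Jackson"})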

Output:

{'query': 'What did the president say about Ketanji Brown Jackson',
'result': " The president speaks highly of Ketanji Brown Jackson, stating that she
is one of the nation's top legal minds, and will continue the legacy of excellence
of Justice Breyer. The president also mentions that he worked with her family and
that she comes from a family of public school educators and police officers. Since
her nomination, she has received support from various groups, including the
Fraternal Order of Police and judges from both major political parties. \n\nWould
you like me to extract another sentence from the provided text? "}

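bge-reranker (base/large)

These models come from the Beijing Academy of Artificial Intelligence (BAAI) and are open source (Apache 2.0 license). They are Transformer-based cross-encoders designed for reranking, and they come in several sizes, such as base and large.

Key features: Because the models are open source, users are free to deploy and modify them. The bge-reranker-v2-m3 model, for example, has fewer than 600 million parameters and runs efficiently on common hardware, including consumer GPUs.
Performance: These models perform very well, and the large version in particular often comes close to top commercial models. They show strong mean reciprocal rank (MRR) scores. The main cost is the compute needed for self-hosting.
Strengths: No licensing fees (open source), high accuracy, the flexibility of self-hosting, and good performance even on mid-range hardware.
Weaknesses: Users must manage deployment, infrastructure, and updates themselves. Performance depends on the hosting hardware.
Ideal use cases: General RAG tasks, research projects, teams that prefer open-source tooling, budget-conscious applications, and users who are comfortable with self-hosting.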

Example code

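from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder


def pretty_print_docs(docs):
    # Helper for these examples: print each document, separated by a dashed line.
    print(
        f"\n{'-' * 100}\n".join(
            f"Document {i + 1}:\n{d.page_content}" for i, d in enumerate(docs)
        )
    )


# `retriever` is assumed to be an existing base retriever.
# Load the open-source cross-encoder locally and keep the 3 best documents.
model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
compressor = CrossEncoderReranker(model=model, top_n=3)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke("What is the plan for the economy?")
pretty_print_docs(compressed_docs)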

Output:

Document 1:
More infrastructure and innovation in America. 
More goods moving faster and cheaper in America. 
More jobs where you can earn a good living in America. 
And instead of relying on foreign supply chains, let’s make it in America. 
Economists call it “increasing the productive capacity of our economy.” 
I call it building a better America. 
My plan to fight inflation will lower your costs and lower the deficit.
----------------------------------------------------------------------------------------------------
Document 2:
Second – cut energy costs for families an average of $500 a year by combatting
climate change.  
Let’s provide investments and tax credits to weatherize your homes and businesses to
be energy efficient and you get a tax credit; double America’s clean energy
production in solar, wind, and so much more;  lower the price of electric vehicles,
saving you another $80 a month because you’ll never have to pay at the gas pump
again.
----------------------------------------------------------------------------------------------------
Document 3:
Look at cars. 
Last year, there weren’t enough semiconductors to make all the cars that people
wanted to buy. 
And guess what, prices of automobiles went up. 
So—we have a choice. 
One way to fight inflation is to drive down wages and make Americans poorer.  
I have a better plan to fight inflation. 
Lower your costs, not your wages. 
Make more cars and semiconductors in America. 
More infrastructure and innovation in America. 
More goods moving faster and cheaper in America.

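Voyage Rerank

Voyage AI offers proprietary neural models (rerank-2 and rerank-2-lite) through an API. They are most likely heavily tuned cross-encoders built to maximize the quality of relevance scoring.

Key features: The main selling point is top-tier relevance scores on benchmarks. Voyage provides a simple Python client library for easy integration. The lite version balances quality against speed and cost.
Performance: rerank-2 often leads benchmarks on pure relevance accuracy, and the lite model performs on par with other strong competitors. The high-accuracy rerank-2 model may have slightly higher latency than some competitors. Cost scales with API usage.
Strengths: State-of-the-art relevance, arguably the most accurate option available today. Easy to use through the Python client.
Weaknesses: It is a proprietary, API-based service with usage fees. The most accurate model may be slightly slower than alternatives.
Ideal use cases: Applications where maximizing relevance is critical, such as financial analysis, legal document review, or high-stakes question answering, where accuracy matters more than small differences in speed.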

Example code

Start by installing the Voyage libraries.

%pip install --upgrade --quiet voyageai
%pip install --upgrade --quiet langchain-voyageai

Set up Voyage and the ContextualCompressionRetriever.

import os

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain.retrievers import ContextualCompressionRetriever
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_voyageai import VoyageAIEmbeddings, VoyageAIRerank

# Load the source text and split it into overlapping chunks.
documents = TextLoader("../../how_to/state_of_the_union.txt").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)

# First stage: embed the chunks and retrieve the 20 nearest candidates.
retriever = FAISS.from_documents(
    texts, VoyageAIEmbeddings(model="voyage-law-2")
).as_retriever(search_kwargs={"k": 20})

# Second stage: rerank the candidates with Voyage and keep the top 3.
compressor = VoyageAIRerank(
    model="rerank-lite-1", voyageai_api_key=os.environ["VOYAGE_API_KEY"], top_k=3
)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
# pretty_print_docs is a small helper that prints each document with a separator.
pretty_print_docs(compressed_docs)

Output:

Document 1:
One of the most serious constitutional responsibilities a President has is
nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji
Brown Jackson. One of our nation’s top legal minds, who will continue Justice
Breyer’s legacy of excellence.
----------------------------------------------------------------------------------------------------
Document 2:
So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law
enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and
restricted no-knock warrants for its officers.
----------------------------------------------------------------------------------------------------
Document 3:
I spoke with their families and told them that we are forever in debt for their
sacrifice, and we will carry on their mission to restore the trust and safety every
community deserves.
I’ve worked on these issues a long time.
I know what works: Investing in crime prevention and community police officers
who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and
safety.
So let’s not abandon our streets. Or choose between safety and equal justice.

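Jina Reranker

Jina AI offers reranking solutions that include neural models such as Jina Reranker v2 and Jina-ColBERT. Jina Reranker v2 is most likely a cross-encoder-style model, while Jina-ColBERT implements the ColBERT architecture (explained below) on top of Jina's base models.

Key features: Jina offers cost-effective options with strong performance. A standout feature of Jina-ColBERT is its ability to handle very long documents, with support for context lengths of up to 8,000 tokens, which reduces the need to chunk long text. Open-source components are also part of the Jina ecosystem.
Performance: Jina Reranker v2 strikes a good balance among speed, cost, and relevance. Jina-ColBERT excels at long source documents, and pricing is generally competitive.
Strengths: Balanced performance, cost-effectiveness, strong long-document handling through Jina-ColBERT, and the flexibility of the available open-source components.
Weaknesses: The standard Jina reranker may not reach the absolute peak accuracy of specialized models such as Voyage's top tier.
Ideal use cases: General-purpose RAG systems, applications that process long documents (technical manuals, research papers, books), and projects that need a good balance between cost and performance.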

Example code

from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import JinaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the source text and split it into overlapping chunks.
documents = TextLoader(
    "../../how_to/state_of_the_union.txt",
).load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)

# First stage: embed the chunks with Jina and retrieve 20 candidates.
embedding = JinaEmbeddings(model_name="jina-embeddings-v2-base-en")
retriever = FAISS.from_documents(texts, embedding).as_retriever(search_kwargs={"k": 20})

query = "What did the president say about Ketanji Brown Jackson"
# invoke() replaces the deprecated get_relevant_documents().
docs = retriever.invoke(query)

Reranking with Jina

from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.document_compressors import JinaRerank

# Second stage: rerank the candidates returned by the base `retriever`.
compressor = JinaRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

# invoke() replaces the deprecated get_relevant_documents().
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
# pretty_print_docs is a small helper that prints each document with a separator.
pretty_print_docs(compressed_docs)

Output:

Document 1:
So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law
enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and
restricted no-knock warrants for its officers.
----------------------------------------------------------------------------------------------------
Document 2:
I spoke with their families and told them that we are forever in debt for their
sacrifice, and we will carry on their mission to restore the trust and safety every
community deserves.
I’ve worked on these issues a long time.
I know what works: Investing in crime prevention and community police officers 
who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and
safety.
So let’s not abandon our streets. Or choose between safety and equal justice.

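ColBERT

ColBERT (Contextualized Late Interaction over BERT) is a multi-vector model. Instead of representing a document with a single vector, it creates multiple contextualized vectors, typically one per token. It uses a "late interaction" mechanism that compares the query's token vectors with the document's encoded token vectors only at scoring time, which allows the document vectors to be precomputed and indexed.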
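
The late-interaction step is usually described as a "MaxSim" sum: each query token vector is matched against its most similar document token vector, and those maxima are added up. The sketch below illustrates that scoring rule in plain Python. The 2-dimensional vectors are made-up values rather than real ColBERT embeddings, and `maxsim_score` is a name chosen here for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: for each query token vector, take its best match
    among the document token vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy unit-length "token embeddings" (hypothetical values).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # has a close match for both query tokens
doc_b = [[1.0, 0.0], [0.8, 0.6]]              # matches only the first query token well

# Document vectors can be computed and indexed ahead of time;
# only this cheap comparison has to run at query time.
scores = {
    "doc_a": maxsim_score(query_vecs, doc_a),
    "doc_b": maxsim_score(query_vecs, doc_b),
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Because every query token is scored independently against every document token, the comparison stays fine-grained while the document side remains precomputable, which is the trade-off the paragraph above describes.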

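Key features: Once documents are indexed, the architecture allows efficient retrieval from large collections. The multi-vector approach supports fine-grained comparison between query terms and document content. It is an open-source method.
Performance: ColBERT strikes a good balance between retrieval effectiveness and efficiency, especially at scale. Retrieval latency is low after the initial indexing step. The main costs are the compute for indexing and for self-hosting.
Strengths: Efficient on large document collections, scalable retrieval, and open-source flexibility.
Weaknesses: The initial indexing process can be compute-intensive and can take up a lot of storage.
Ideal use cases: Large-scale RAG applications, systems that need fast retrieval over millions or billions of documents, and scenarios where precomputation time is acceptable.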

Example code

Install the RAGatouille library to use the ColBERT reranker.

pip install -U ragatouille

Now set up the ColBERT reranker.

from ragatouille import RAGPretrainedModel
from langchain.retrievers import ContextualCompressionRetriever

# Load the pretrained ColBERTv2 checkpoint through RAGatouille.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# `retriever` is assumed to be an existing base retriever over your documents.
compression_retriever = ContextualCompressionRetriever(
    base_compressor=RAG.as_langchain_document_compressor(), base_retriever=retriever
)

compressed_docs = compression_retriever.invoke(
    "What animation studio did Miyazaki found"
)
print(compressed_docs[0])

Output:

Document(page_content='In June 1985, Miyazaki, Takahata, Tokuma and Suzuki founded
the animation production company Studio Ghibli, with funding from Tokuma Shoten.
Studio Ghibli\'s first film, Laputa: Castle in the Sky (1986), employed the same
production crew of Nausicaä. Miyazaki\'s designs for the film\'s setting were 
inspired by Greek architecture and "European urbanistic templates". Some of the
architecture in the film was also inspired by a Welsh mining town; Miyazaki
witnessed the mining strike upon his first', metadata={'relevance_score':
26.5194149017334})

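FlashRank

FlashRank is designed as a very lightweight, fast reranking library. It typically relies on small, optimized Transformer models, often distilled or pruned versions of larger ones. Its aim is to improve relevance noticeably over plain similarity search with minimal compute overhead. It works like a cross-encoder but uses several techniques to speed up processing, and it is generally distributed as an open-source Python library.

Key features: Speed and efficiency are its defining traits. It is easy to integrate and light on resources (CPU or modest GPU usage), and it usually takes very little code to set up.
Performance: FlashRank does not reach the peak accuracy of large cross-encoders such as Cohere or Voyage, but it aims for a clear improvement over no reranking or basic bi-encoder ranking. Its speed makes it suitable for real-time or high-throughput scenarios. Cost is very low (self-hosted compute).
Strengths: Very fast inference, low compute requirements, easy integration, and open source.
Weaknesses: Accuracy may be lower than that of larger, more complex reranking models. Model choice may be more limited than in broader frameworks.
Ideal use cases: Applications that need fast reranking on resource-constrained hardware (such as CPUs or edge devices), high-volume search systems where latency is critical, and projects that want a simple, better-than-nothing reranking step with minimal complexity.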

Example code

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank

# `retriever` is assumed to be an existing base retriever whose documents
# carry an "id" entry in their metadata.
compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
print([doc.metadata["id"] for doc in compressed_docs])
# pretty_print_docs is a small helper that prints each document with a separator.
pretty_print_docs(compressed_docs)

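This snippet uses FlashrankRerank inside a ContextualCompressionRetriever to improve the relevance of the retrieved documents. It reorders the documents fetched by the base retriever (the retriever variable) by their relevance to the query "What did the president say about Ketanji Jackson Brown". Finally, it prints the document IDs followed by the compressed, reranked documents.

Output: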

[0, 5, 3]
Document 1:
One of the most serious constitutional responsibilities a President has is
nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji
Brown Jackson. One of our nation’s top legal minds, who will continue Justice
Breyer’s legacy of excellence.
----------------------------------------------------------------------------------------------------
Document 2:
He met the Ukrainian people.
From President Zelenskyy to every Ukrainian, their fearlessness, their courage,
their determination, inspires the world.
Groups of citizens blocking tanks with their bodies. Everyone from students to
retirees teachers turned soldiers defending their homeland.
In this struggle as President Zelenskyy said in his speech to the European
Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United
States is here tonight.
----------------------------------------------------------------------------------------------------
Document 3:
And tonight, I’m announcing that the Justice Department will name a chief prosecutor
for pandemic fraud.
By the end of this year, the deficit will be down to less than half what it was
before I took office.
The only president ever to cut the deficit by more than one trillion dollars in a
single year.
Lowering your costs also means demanding more competition.
I’m a capitalist, but capitalism without competition isn’t capitalism
It’s exploitation—and it drives up prices.

The output shows that FlashRank reorders the retrieved chunks by relevance.

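MixedBread

This family from Mixedbread AI includes mxbai-rerank-base-v2 (0.5 billion parameters) and mxbai-rerank-large-v2 (1.5 billion parameters). Both are open-source (Apache 2.0 license) cross-encoders built on the Qwen-2.5 architecture. Their main differentiator is the training process, which adds a three-stage reinforcement learning (RL) approach (GRPO, contrastive learning, and preference learning) on top of the initial training.

Key features: Claims leading performance across benchmarks such as BEIR. Supports more than 100 languages. Handles long contexts of up to 8k tokens (and is compatible with 32k tokens). Designed for a range of data types, including text, code, and JSON, and supports LLM tool selection. Available through Hugging Face and a Python library.
Performance: Benchmarks published by Mixedbread show these models outperforming other top open-source and closed-source competitors, such as Cohere and Voyage, on BEIR (57.49 for Large and 55.57 for Base). They also show a notable speed advantage: the 1.5-billion-parameter model is clearly faster than other large open-source rerankers in latency tests. The cost is the compute used for self-hosting.
Strengths: High benchmark performance (claimed SOTA), an open-source license, fast inference relative to accuracy, broad language support, a very long context window, and versatility across data types (code, JSON).
Weaknesses: Requires self-hosting and infrastructure management. Because the models are relatively new, long-term performance and community scrutiny are still in progress.
Ideal use cases: General-purpose RAG that needs top-tier performance, multilingual applications, systems that process code, JSON, or long documents, LLM tool and function-call selection, and teams that prefer high-performance open-source models.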

Example code

!pip install mxbai_rerank

from mxbai_rerank import MxbaiRerankV2

# Load the model, here we use our base sized model
model = MxbaiRerankV2("mixedbread-ai/mxbai-rerank-base-v2")

# Example query and documents
query = "Who wrote To Kill a Mockingbird?"
documents = [
    "To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan.",
]

# Calculate the scores
results = model.rank(query, documents)
print(results)

Output:

[RankResult(index=0, score=9.847987174987793, document='To Kill a Mockingbird is a
novel by Harper Lee published in 1960. It was immediately successful, winning the
Pulitzer Prize, and has become a classic of modern American literature.'), 
RankResult(index=2, score=8.258672714233398, document='Harper Lee, an American
novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in
Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.'),
RankResult(index=3, score=3.579845428466797, document='Jane Austen was an English
novelist known primarily for her six major novels, which interpret, critique and
comment upon the British landed gentry at the end of the 18th century.'), 
RankResult(index=4, score=2.716982841491699, document='The Harry Potter series,
which consists of seven fantasy novels written by British author J.K. Rowling, is
among the most popular and critically acclaimed books of the modern era.'), 
RankResult(index=1, score=2.233165740966797, document='The novel Moby-Dick was
written by Herman Melville and first published in 1851. It is considered a 
masterpiece of American literature and deals with complex themes of obsession,
revenge, and the conflict between good and evil.'), 
RankResult(index=5, score=1.8150043487548828, document='The Great Gatsby, a novel
written by American author F. Scott Fitzgerald, was published in 1925. The story is
set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit
of Daisy Buchanan.')]

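How to tell whether your reranker is working

Evaluating a reranker matters. A few common metrics help measure its effectiveness:

Accuracy@k: How often a relevant document appears among the top k results.
Precision@k: The proportion of the top k results that are relevant.
Recall@k: The proportion of all relevant documents that appear among the top k results.
Normalized Discounted Cumulative Gain (NDCG): Measures ranking quality by taking both relevance and position into account. Relevant documents ranked higher contribute more to the score. It is normalized to the range 0 to 1, which makes scores comparable across queries.
Mean Reciprocal Rank (MRR): Focuses on the rank of the first relevant document found. It is the average of 1/rank across multiple queries, and it is useful when finding one good result quickly is what matters.
F1-score: The harmonic mean of precision and recall, which gives a balanced view.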
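
The metrics above are simple enough to compute by hand. The sketch below does so for a single query, assuming binary relevance labels and invented document IDs. The function names are illustrative and do not come from any particular evaluation library.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Share of the top-k results that are relevant."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Share of all relevant documents that appear in the top-k."""
    if not relevant_ids:
        return 0.0
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / len(relevant_ids)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0 if none is found.
    MRR is the mean of this value over many queries."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: DCG of this ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# One query: the reranker's output order and the labeled relevant documents.
ranked = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}

p3 = precision_at_k(ranked, relevant, 3)
r3 = recall_at_k(ranked, relevant, 3)
f1 = f1_score(p3, r3)
rr = reciprocal_rank(ranked, relevant)
ndcg5 = ndcg_at_k(ranked, relevant, 5)
print(p3, r3, f1, rr, ndcg5)
```

Running the same functions on the ranking before and after reranking, over a set of labeled queries, gives a direct measure of whether the reranker is helping.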

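Choosing the right reranker for your needs

Choosing the best reranker means balancing several factors:

Relevance requirements: How accurate do the results need to be for your application?
Latency: How quickly must the reranker return results? Speed is critical for real-time applications.
Scalability: Can the model handle your current and future data volume and user load?
Integration: How well does the reranker fit into your existing RAG pipeline (embedding model, vector database, LLM framework)?
Domain specificity: Do you need a model trained on data from a specific domain?
Cost: Consider API fees for proprietary models or compute costs for self-hosted models.

There are trade-offs to weigh:

Cross-encoders are highly accurate but slower.
Bi-encoders are faster and more scalable, but may be slightly less accurate.
LLM-based rerankers are very accurate, but expensive and slow.
Multi-vector models aim for a balance between accuracy and speed.
Score-based methods are the fastest but may lack semantic depth.

To choose wisely:

Define your accuracy and speed targets.
Analyze the characteristics of your data (size, domain).
Evaluate different models on your own data using metrics such as NDCG and MRR.
Consider ease of integration and budget.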
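
One way to act on these steps is a small harness that runs each candidate reranker over the same labeled queries and reports both quality and speed. The sketch below is a minimal version of that idea. `keep_order` and `term_overlap` are hypothetical placeholders for real rerankers, and the labeled queries are invented, so treat the numbers as illustrative only.

```python
import time


def reciprocal_rank(ranked_docs, relevant_docs):
    """1 / rank of the first relevant document, or 0 if none is found."""
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant_docs:
            return 1.0 / rank
    return 0.0


def evaluate(reranker, labeled_queries):
    """Return (MRR, mean seconds per query) for one reranker.

    reranker: callable (query, candidates) -> candidates in a new order.
    labeled_queries: list of (query, candidates, set_of_relevant_docs).
    """
    total_rr, total_seconds = 0.0, 0.0
    for query, candidates, relevant in labeled_queries:
        start = time.perf_counter()
        ranked = reranker(query, candidates)
        total_seconds += time.perf_counter() - start
        total_rr += reciprocal_rank(ranked, relevant)
    n = len(labeled_queries)
    return total_rr / n, total_seconds / n


def keep_order(query, candidates):
    """Baseline: leave the first-stage order untouched."""
    return list(candidates)


def term_overlap(query, candidates):
    """Toy reranker: order candidates by the number of shared query terms."""
    q_terms = set(query.lower().split())
    return sorted(
        candidates,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )


labeled_queries = [
    ("reranker latency", ["vector index size", "reranker latency budget"], {"reranker latency budget"}),
    ("open source license", ["open source license terms", "api pricing"], {"open source license terms"}),
]

report = {
    name: evaluate(fn, labeled_queries)
    for name, fn in [("keep_order", keep_order), ("term_overlap", term_overlap)]
}
for name, (mrr, seconds) in report.items():
    print(f"{name}: MRR={mrr:.2f}, mean latency={seconds * 1000:.3f} ms")
```

Wrapping each real model (a hosted API client or a local cross-encoder) in a function with the same `(query, candidates)` signature lets the same harness compare them on your own data.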

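The best reranker is the one that fits your specific performance, efficiency, and cost requirements.

Summary

Rerankers are essential to getting the most out of a RAG system. They refine the information given to the LLM, which leads to better and more reliable answers. With options ranging from high-accuracy cross-encoders to efficient bi-encoders and specialized models such as ColBERT, developers have plenty to choose from. Picking the right one requires understanding the trade-offs among accuracy, speed, scalability, and cost. As RAG evolves, particularly in handling diverse data types, rerankers will continue to play a key role in building smarter, more reliable AI applications. Careful evaluation and selection remain the key to success.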
