
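AI agents are all the rage right now! At their core, they are LLMs connected to specific prompts and tools that can complete tasks for you autonomously. You can also build dependable step-by-step workflows that guide an LLM to solve problems more reliably. In February 2025, OpenAI launched Deep Research, an agent that takes a user's topic, automatically runs a large number of searches, and compiles the results into a polished report. However, it is only available on the $200 Pro plan. Here, I'll walk you through building your own deep research and report generation agent with LangGraph for less than a dollar!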
An Introduction to OpenAI Deep Research
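OpenAI launched Deep Research on February 2, 2025, as an add-on feature of ChatGPT. They describe it as a new agentic capability that performs multi-step research on the internet for complex tasks or queries posed by the user. They claim it can accomplish in tens of minutes what would take a human many hours.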

Deep Research executing a task (source: OpenAI)
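Deep Research is OpenAI's current agentic AI product that can do work for you autonomously. You simply give it a task or topic via a prompt, and ChatGPT will find, analyze, and synthesize hundreds of online sources to create a comprehensive report at the level of a research analyst. It is powered by a version of the upcoming OpenAI o3 model optimized for web browsing and data analysis, which uses reasoning to search, interpret, and analyze massive amounts of text, images, and PDFs on the internet, and finally compiles a well-structured report.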
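It does come with a limitation, though: only subscribers to the $200 ChatGPT Pro plan can use it. This is where my agentic AI system comes in: it can perform in-depth research and write a polished report for less than a dollar. Let's get started!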
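Architecture of the Planning Agentic AI System for Deep Research and Structured Report Generation
The figure below shows the overall architecture of our system, which we will implement with LangGraph, LangChain's open-source framework for easily building stateful agent systems.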

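Deep research and report generation AI agent
The key components powering the above system include:
- A powerful large language model (LLM) with strong reasoning capabilities. We use GPT-4o, which is inexpensive and fast, but you could also use an LLM such as Llama 3.2 or another open-source alternative.
- LangGraph, to build our agentic system. It is an excellent framework for building cyclical graph-based systems that maintain state variables throughout the workflow and make agentic feedback loops easy to build.
- Tavily AI, an excellent AI search engine that is well suited to web research and fetching data from websites, to power our deep research system.
The focus of this project is to build a planning agent for deep research and structured report generation as an alternative to OpenAI Deep Research. The agent follows the popular Planning Agent Design Pattern: it automatically analyzes a user-defined topic, performs deep web research, and generates a well-structured report. This workflow is actually inspired by LangChain's own Report mAIstro, so full credit to them for the workflow they came up with: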
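1. Report planning:
- The agent analyzes the user-provided topic and the default report template to create a custom plan for the report.
- Sections such as the introduction, key sections, and conclusion are defined based on the topic.
- A web search tool is used to gather the required information before the main sections are finalized.
2. Parallel execution of research and writing:
- The agent uses parallel execution to work efficiently:
  - Web research: queries are generated for each section and run through the web search tool to retrieve up-to-date information.
  - Section writing: the retrieved data is used to write the content of each section, as follows:
    - The researcher gathers relevant data from the web.
    - The section writer uses this data to generate structured content for its assigned section.
3. Formatting completed sections:
- Once all sections have been written, they are formatted to keep the report structure consistent and coherent.
4. Writing the introduction and conclusion:
- After the main sections have been written and formatted:
  - The introduction and conclusion are written (in parallel) based on the content of the remaining sections.
  - This ensures that these sections align with the overall flow and insights of the report.
5. Final compilation:
- All completed sections are compiled together to form the final report.
- The final output is a comprehensive, well-organized, wiki-style report.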
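To make the control flow of these five steps concrete before we get to LangGraph, here is a minimal stdlib-only sketch in plain asyncio. It is a hypothetical illustration, not the LangGraph implementation built below: plan_sections, research_and_write and write_from_context are made-up stubs standing in for the LLM and Tavily calls.

```python
import asyncio

# Hypothetical stubs standing in for the LLM- and Tavily-backed steps.
async def plan_sections(topic: str) -> list[dict]:
    # Step 1: decide which sections exist and which need web research.
    return [
        {"name": "Introduction", "research": False},
        {"name": f"What is {topic}", "research": True},
        {"name": f"{topic} in practice", "research": True},
        {"name": "Conclusion", "research": False},
    ]

async def research_and_write(section: dict) -> str:
    # Step 2: search the web and write one section (simulated here).
    await asyncio.sleep(0)  # yield to the event loop like a real network call would
    return f"## {section['name']}\n(body written from web research)"

async def write_from_context(section: dict, context: str) -> str:
    # Step 4: the intro/conclusion only summarize the finished sections.
    return f"## {section['name']}\n(summary of {context.count('## ')} researched sections)"

async def build_report(topic: str) -> str:
    sections = await plan_sections(topic)
    researched = [s for s in sections if s["research"]]
    framing = [s for s in sections if not s["research"]]
    # Step 2: every researched section runs concurrently.
    bodies = await asyncio.gather(*(research_and_write(s) for s in researched))
    # Step 3: format the completed sections into one context string.
    context = "\n\n".join(bodies)
    # Step 4: the intro and conclusion are also written concurrently.
    wrappers = await asyncio.gather(*(write_from_context(s, context) for s in framing))
    # Step 5: compile everything in the order the planner chose.
    written = dict(zip([s["name"] for s in researched + framing], [*bodies, *wrappers]))
    return "\n\n".join(written[s["name"]] for s in sections)
```

In a script, run it with asyncio.run(build_report("LangGraph")); in a Jupyter notebook, where an event loop is already running, use await build_report("LangGraph") instead. The two separate gather calls encode the dependency order: the introduction and conclusion can only be written once every researched section exists.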
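Now, let's build these components step by step using LangGraph and Tavily.
Hands-On Implementation of the Planning AI Agent System for Deep Research and Structured Report Generation
We will now implement the end-to-end workflow of our deep research report generator agentic AI system step by step, following the architecture discussed in the previous section, with detailed explanations, code, and outputs.
Install Dependencies
We start by installing the dependency libraries we will use to build our system. These include langchain (along with langchain-openai and langchain-community), LangGraph, and rich for rendering nicely formatted Markdown reports.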
!pip install langchain==0.3.14
!pip install langchain-openai==0.3.0
!pip install langchain-community==0.3.14
!pip install langgraph==0.2.64
!pip install rich
Enter the OpenAI API Key
We use the getpass() function to enter the OpenAI key so that it isn't accidentally exposed in the code.
from getpass import getpass
OPENAI_KEY = getpass('Enter Open AI API Key: ')
Enter the Tavily Search API Key
We use the getpass() function to enter the Tavily Search key so that it isn't accidentally exposed in the code. You can get a key here; they also offer a free tier.
TAVILY_API_KEY = getpass('Enter Tavily Search API Key: ')
Set Environment Variables
Next, we set some system environment variables that will be used later to authenticate the LLM and Tavily Search.
import os
os.environ['OPENAI_API_KEY'] = OPENAI_KEY
os.environ['TAVILY_API_KEY'] = TAVILY_API_KEY
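If you rerun the notebook often, retyping both keys gets tedious. A small optional variation is sketched below; get_secret is our own hypothetical helper, not part of the original code. It reuses a key that is already set in the environment and only falls back to getpass() when the key is missing.

```python
import os
from getpass import getpass

def get_secret(env_var: str, prompt: str) -> str:
    """Hypothetical helper: reuse an existing environment variable, else prompt once."""
    value = os.environ.get(env_var)
    if not value:
        # Only ask interactively when the key is not configured yet
        value = getpass(prompt)
        os.environ[env_var] = value
    return value
```

For example, OPENAI_KEY = get_secret('OPENAI_API_KEY', 'Enter Open AI API Key: ') both returns the key and stores it in os.environ, which covers the manual assignment above.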
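Define the Agent State Schema
We use LangGraph to build our agentic system as a graph of nodes, where each node represents a specific execution step in the overall workflow. Each specific set of operations (node) has its own schema, defined below. You can customize these further to suit your own report-generation style.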
from typing_extensions import TypedDict
from pydantic import BaseModel, Field
import operator
from typing import Annotated, List, Optional, Literal

# defines structure for each section in the report
class Section(BaseModel):
    name: str = Field(
        description="Name for a particular section of the report.",
    )
    description: str = Field(
        description="Brief overview of the main topics and concepts to be covered in this section.",
    )
    research: bool = Field(
        description="Whether to perform web search for this section of the report."
    )
    content: str = Field(
        description="The content for this section."
    )

class Sections(BaseModel):
    sections: List[Section] = Field(
        description="All the Sections of the overall report.",
    )

# defines structure for queries generated for deep research
class SearchQuery(BaseModel):
    search_query: str = Field(None, description="Query for web search.")

class Queries(BaseModel):
    queries: List[SearchQuery] = Field(
        description="List of web search queries.",
    )

# consists of input topic and output report generated
class ReportStateInput(TypedDict):
    topic: str # Report topic

class ReportStateOutput(TypedDict):
    final_report: str # Final report

# overall agent state which will be passed and updated in nodes in the graph
class ReportState(TypedDict):
    topic: str # Report topic
    sections: list[Section] # List of report sections
    completed_sections: Annotated[list, operator.add] # Send() API
    report_sections_from_research: str # completed sections to write final sections
    final_report: str # Final report

# defines the key structure for sections written using the agent
class SectionState(TypedDict):
    section: Section # Report section
    search_queries: list[SearchQuery] # List of search queries
    source_str: str # String of formatted source content from web search
    report_sections_from_research: str # completed sections to write final sections
    completed_sections: list[Section] # Final key in outer state for Send() API

class SectionOutputState(TypedDict):
    completed_sections: list[Section] # Final key in outer state for Send() API
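Why does completed_sections in ReportState carry Annotated[list, operator.add]? In LangGraph, annotating a state key with a function registers that function as the key's reducer: when several parallel branches (launched with the Send() API) each return a completed_sections update, the updates are combined with operator.add (list concatenation) instead of one overwriting another. The sketch below is a simplified, hypothetical stand-in for that merge step using only the standard library; DemoState and apply_updates are illustrative names, not LangGraph APIs.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints

class DemoState(TypedDict):
    topic: str                                         # no reducer: last write wins
    completed_sections: Annotated[list, operator.add]  # reducer: updates are concatenated

def apply_updates(state: dict, updates: list[dict], schema: type = DemoState) -> dict:
    """Hypothetical, simplified merge step; not LangGraph's actual implementation."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)  # shallow copy so the input state is left untouched
    for update in updates:
        for key, value in update.items():
            # Annotated[...] exposes its extra arguments via __metadata__
            reducer = next(iter(getattr(hints[key], "__metadata__", ())), None)
            if reducer is not None and key in merged:
                merged[key] = reducer(merged[key], value)
            else:
                merged[key] = value
    return merged
```

Two parallel section writers each returning {"completed_sections": [...]} therefore end up with both sections in the list, while a plain key like topic is simply replaced.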
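Utility Functions
We define a couple of utility functions that will help us run parallel web search queries and format the results fetched from the web.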
1. run_search_queries(…)
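This function asynchronously runs Tavily searches for a given list of queries and returns the search results. Because it is asynchronous, it doesn't block, and the searches run concurrently.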
from langchain_community.utilities.tavily_search import TavilySearchAPIWrapper
import asyncio
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Union

# just to handle objects created from LLM responses
@dataclass
class SearchQuery:
    search_query: str

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

tavily_search = TavilySearchAPIWrapper()

async def run_search_queries(
    search_queries: List[Union[str, SearchQuery]],
    num_results: int = 5,
    include_raw_content: bool = False
) -> List[Dict]:
    search_tasks = []
    for query in search_queries:
        # Handle both string and SearchQuery objects
        # Just in case LLM fails to generate queries as:
        # class SearchQuery(BaseModel):
        #     search_query: str
        # hasattr() also matches the pydantic SearchQuery model defined earlier,
        # which this dataclass shadows, so isinstance() would miss it
        query_str = query.search_query if hasattr(query, "search_query") else str(query)
        try:
            # get results from tavily async (in parallel) for each search query
            search_tasks.append(
                tavily_search.raw_results_async(
                    query=query_str,
                    max_results=num_results,
                    search_depth='advanced',
                    include_answer=False,
                    include_raw_content=include_raw_content
                )
            )
        except Exception as e:
            print(f"Error creating search task for query '{query_str}': {e}")
            continue

    # Execute all searches concurrently and await results
    try:
        if not search_tasks:
            return []
        search_docs = await asyncio.gather(*search_tasks, return_exceptions=True)
        # Filter out any exceptions from the results
        valid_results = [
            doc for doc in search_docs
            if not isinstance(doc, Exception)
        ]
        return valid_results
    except Exception as e:
        print(f"Error during search queries: {e}")
        return []
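The key idea in run_search_queries is to create all the Tavily coroutines first and then await them together with asyncio.gather(..., return_exceptions=True). Here is a hypothetical, network-free sketch of that pattern: fake_search stands in for tavily_search.raw_results_async and simply raises for any query containing "fail", so you can see one failure being filtered out while the other results survive.

```python
import asyncio

async def fake_search(query: str) -> dict:
    """Hypothetical stand-in for tavily_search.raw_results_async."""
    await asyncio.sleep(0.01)  # simulate network latency
    if "fail" in query:
        raise RuntimeError(f"search failed for {query!r}")
    return {"query": query, "results": [{"url": f"https://example.com/{query}"}]}

async def run_all(queries: list[str]) -> list[dict]:
    # All coroutines run concurrently, so total time is roughly that of the slowest call.
    results = await asyncio.gather(*(fake_search(q) for q in queries), return_exceptions=True)
    # return_exceptions=True hands a failure back as an exception object
    # instead of raising out of gather, so the successful results are kept.
    return [r for r in results if not isinstance(r, Exception)]
```

asyncio.run(run_all(["langgraph", "please fail", "tavily"])) returns the two successful results in their original order (use await run_all(...) inside a notebook).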
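2. format_search_query_results(…)
This function extracts context from the Tavily search results, makes sure content from the same URL isn't duplicated, and formats each source to show its title, URL, and most relevant content (plus, optionally, the raw page content, truncated to a maximum number of tokens).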
import tiktoken
from typing import List, Dict, Union, Any

def format_search_query_results(
    search_response: Union[Dict[str, Any], List[Any]],
    max_tokens: int = 2000,
    include_raw_content: bool = False
) -> str:
    encoding = tiktoken.encoding_for_model("gpt-4")
    sources_list = []

    # Handle different response formats if search results is a dict
    if isinstance(search_response, dict):
        if 'results' in search_response:
            sources_list.extend(search_response['results'])
        else:
            sources_list.append(search_response)
    # if search results is a list
    elif isinstance(search_response, list):
        for response in search_response:
            if isinstance(response, dict):
                if 'results' in response:
                    sources_list.extend(response['results'])
                else:
                    sources_list.append(response)
            elif isinstance(response, list):
                sources_list.extend(response)

    if not sources_list:
        return "No search results found."

    # Deduplicate by URL and keep unique sources (website urls)
    unique_sources = {}
    for source in sources_list:
        if isinstance(source, dict) and 'url' in source:
            if source['url'] not in unique_sources:
                unique_sources[source['url']] = source

    # Format output
    formatted_text = "Content from web search:\n\n"
    for i, source in enumerate(unique_sources.values(), 1):
        formatted_text += f"Source {source.get('title', 'Untitled')}:\n===\n"
        formatted_text += f"URL: {source['url']}\n===\n"
        formatted_text += f"Most relevant content from source: {source.get('content', 'No content available')}\n===\n"
        if include_raw_content:
            # truncate raw webpage content to a certain number of tokens to prevent exceeding LLM max token window
            raw_content = source.get("raw_content", "")
            if raw_content:
                tokens = encoding.encode(raw_content)
                truncated_tokens = tokens[:max_tokens]
                truncated_content = encoding.decode(truncated_tokens)
                formatted_text += f"Raw Content: {truncated_content}\n\n"

    return formatted_text.strip()
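To see what the deduplication and truncation in format_search_query_results do in isolation, here is a hypothetical stdlib-only sketch. dedupe_by_url mirrors the keep-the-first-URL logic, and truncate_words approximates the token cap by counting words. That approximation is an assumption for illustration only: the real function counts tokens with tiktoken's encode/decode, and word and token counts differ.

```python
def dedupe_by_url(responses: list[dict]) -> list[dict]:
    """Flatten Tavily-style {"results": [...]} responses, keeping the first source per URL."""
    unique: dict[str, dict] = {}
    for response in responses:
        for source in response.get("results", []):
            # setdefault stores a source only the first time its URL is seen
            unique.setdefault(source["url"], source)
    return list(unique.values())  # dicts preserve insertion order

def truncate_words(text: str, max_tokens: int) -> str:
    """Rough stand-in for the tiktoken truncation: counts whitespace-separated words, not tokens."""
    return " ".join(text.split()[:max_tokens])
```

Because two different queries often surface the same page, keeping only the first copy per URL avoids feeding the LLM duplicate context.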
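We can test whether these functions work as expected, as shown below (top-level await works in a Jupyter notebook; in a plain Python script, wrap these calls in an async function and run it with asyncio.run()):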
docs = await run_search_queries(['langgraph'], include_raw_content=True)
output = format_search_query_results(docs, max_tokens=500,
include_raw_content=True)
print(output)
Output
Content from web search:

Source Introduction - GitHub Pages:
===
URL: https://langchain-ai.github.io/langgraphjs/
===
Most relevant content from source: Overview¶. LangGraph is a library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows......
===
Raw Content: 🦜🕸️LangGraph.js¶⚡ Building language agents as graphs ⚡Looking for the Python version? Clickhere ( docs).Overview......

Source LangGraph - GitHub Pages:
===
URL: https://langchain-ai.github.io/langgraph/
===
Most relevant content from source: Overview¶. LangGraph is a library for building stateful, multi-actor applications with LLMs, ......
===
Raw Content: 🦜🕸️LangGraph¶⚡ Building language agents as graphs ⚡NoteLooking for the JS version? See the JS repo and the JS docs.Overview¶LangGraph is a library for buildingstateful, multi-actor applications with LLMs, ......
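Create a Default Report Template
This is the starting point that shows the LLM how to build a general report; the LLM uses it as a guide to create a custom report structure based on the topic. Keep in mind that this is not the final report structure, but rather a prompt that guides the agent.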
# Structure Guideline
DEFAULT_REPORT_STRUCTURE = """The report structure should focus on breaking-down the user-provided topic
and building a comprehensive report in markdown using the following format:
1. Introduction (no web search needed)
- Brief overview of the topic area
2. Main Body Sections:
- Each section should focus on a sub-topic of the user-provided topic
- Include any key concepts and definitions
- Provide real-world examples or case studies where applicable
3. Conclusion (no web search needed)
- Aim for 1 structural element (either a list or table) that distills the main body sections
- Provide a concise summary of the report
When generating the final response in markdown, if there are special characters in the text,
such as the dollar symbol, ensure they are escaped properly for correct rendering e.g. $25.5 should become \\$25.5
"""
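Instruction Prompts for the Report Planner
There are two main instruction prompts:
1. REPORT_PLAN_QUERY_GENERATOR_PROMPT
This prompt helps the LLM generate an initial list of search queries based on the topic, so that it can fetch more information about the topic from the web and plan the overall sections and structure of the report.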
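The query generator prompt expects three placeholders: {topic}, {report_organization} and {number_of_queries}. Plain str.format already raises a KeyError, but only for the first missing name. If you customize these prompts, a small helper like the hypothetical one sketched here (required_fields and safe_format are our own illustrative names) can list the placeholders a template needs via the standard library's string.Formatter().parse and report all missing ones at once.

```python
from string import Formatter

def required_fields(template: str) -> set[str]:
    """Named placeholders a str.format template expects (escaped {{ }} are skipped)."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}

def safe_format(template: str, **kwargs) -> str:
    """Hypothetical helper: report every missing placeholder before formatting."""
    missing = required_fields(template) - set(kwargs)
    if missing:
        raise KeyError(f"missing prompt variables: {sorted(missing)}")
    return template.format(**kwargs)
```

For example, required_fields(REPORT_PLAN_QUERY_GENERATOR_PROMPT) should yield exactly the three names above, which makes a quick sanity check after editing the prompt text.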
REPORT_PLAN_QUERY_GENERATOR_PROMPT = """You are an expert technical report writer, helping to plan a report.
The report will be focused on the following topic:
{topic}
The report structure will follow these guidelines:
{report_organization}
Your goal is to generate {number_of_queries} search queries that will help gather comprehensive information for planning the report sections.
The query should:
1. Be related to the topic
2. Help satisfy the requirements specified in the report organization
Make the query specific enough to find high-quality, relevant sources while covering the depth and breadth needed for the report structure.
"""
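2. REPORT_PLAN_SECTION_GENERATOR_PROMPT
Here, we give the LLM the default report template, the topic name, and the search results from the initial queries so that it can create a detailed report structure. The LLM will generate a structured response with the following fields for each main section of the report (this is only the report structure; no content is created at this step):
- Name: the name of this section of the report.
- Description: a brief overview of the main topics and concepts this section will cover.
- Research: whether to perform a web search for this section of the report.
- Content: the content of this section, left blank for now.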
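These four fields mirror the Section model defined earlier, and with_structured_output(Sections) parses the model's answer into that schema. To make the expected shape concrete without calling the API, here is a hypothetical sketch: the JSON payload is hand-written sample data (not real model output), and SectionSketch and parse_sections are illustrative stdlib stand-ins for the pydantic models.

```python
import json
from dataclasses import dataclass

@dataclass
class SectionSketch:
    """Stdlib mirror of the pydantic Section model, for illustration only."""
    name: str
    description: str
    research: bool
    content: str = ""  # the planner leaves content blank at this stage

def parse_sections(raw_json: str) -> list[SectionSketch]:
    payload = json.loads(raw_json)
    # Sections wraps the list under a single "sections" key
    return [SectionSketch(**item) for item in payload["sections"]]

sample = json.dumps({"sections": [
    {"name": "Introduction", "description": "Overview of the topic", "research": False, "content": ""},
    {"name": "Core Concepts", "description": "Graphs, nodes and state", "research": True, "content": ""},
    {"name": "Conclusion", "description": "Summary table", "research": False, "content": ""},
]})
plan = parse_sections(sample)
print([s.name for s in plan if s.research])  # ['Core Concepts']
```

The research flag is what later decides which sections get their own web searches; the introduction and conclusion are expected to come back with research set to False.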
REPORT_PLAN_SECTION_GENERATOR_PROMPT = """You are an expert technical report writer, helping to plan a report.
Your goal is to generate the outline of the sections of the report.
The overall topic of the report is:
{topic}
The report should follow this organizational structure:
{report_organization}
You should reflect on this additional context information from web searches to plan the main sections of the report:
{search_context}
Now, generate the sections of the report. Each section should have the following fields:
- Name - Name for this section of the report.
- Description - Brief overview of the main topics and concepts to be covered in this section.
- Research - Whether to perform web search for this section of the report or not.
- Content - The content of the section, which you will leave blank for now.
Consider which sections require web search.
For example, introduction and conclusion will not require research because they will distill information from other parts of the report.
"""
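Report Planner Node Function
We will now build the logic for the report planner node. Its goal is to create a structured, custom report template, including the names and descriptions of the main sections, based on the user's input topic and the default report template guidelines.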

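Report planner node function
This function uses the two prompts created earlier to:
- First, generate a few search queries based on the user's topic
- Search the web to get some information about these queries
- Use this information to generate the overall structure of the report, along with the key sections that need to be created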
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

async def generate_report_plan(state: ReportState):
    """Generate the overall plan for building the report"""
    topic = state["topic"]
    print('--- Generating Report Plan ---')

    report_structure = DEFAULT_REPORT_STRUCTURE
    number_of_queries = 8

    structured_llm = llm.with_structured_output(Queries)

    system_instructions_query = REPORT_PLAN_QUERY_GENERATOR_PROMPT.format(
        topic=topic,
        report_organization=report_structure,
        number_of_queries=number_of_queries
    )

    try:
        # Generate queries
        results = structured_llm.invoke([
            SystemMessage(content=system_instructions_query),
            HumanMessage(content='Generate search queries that will help with planning the sections of the report.')
        ])

        # Convert SearchQuery objects to strings
        query_list = [
            query.search_query if isinstance(query, SearchQuery) else str(query)
            for query in results.queries
        ]

        # Search web and ensure we wait for results
        search_docs = await run_search_queries(
            query_list,
            num_results=5,
            include_raw_content=False
        )

        if not search_docs:
            print("Warning: No search results returned")
            search_context = "No search results available."
        else:
            search_context = format_search_query_results(
                search_docs,
                include_raw_content=False
            )

        # Generate sections
        system_instructions_sections = REPORT_PLAN_SECTION_GENERATOR_PROMPT.format(
            topic=topic,
            report_organization=report_structure,
            search_context=search_context
        )

        structured_llm = llm.with_structured_output(Sections)
        report_sections = structured_llm.invoke([
            SystemMessage(content=system_instructions_sections),
            HumanMessage(content="Generate the sections of the report. Your response must include a 'sections' field containing a list of sections. Each section must have: name, description, plan, research, and content fields.")
        ])

        print('--- Generating Report Plan Completed ---')
        return {"sections": report_sections.sections}

    except Exception as e:
        print(f"Error in generate_report_plan: {e}")
        return {"sections": []}
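The list comprehension that converts `results.queries` into plain strings protects the rest of the node from an LLM that returns bare strings instead of `SearchQuery` objects. The same goes for the fallback string used when the search comes back empty. Below is a minimal standard-library sketch of both guards. It assumes `SearchQuery` is a model with a single `search_query` field; the dataclass here is a stand-in, not the Pydantic model defined earlier, and `normalize_queries` and `build_search_context` are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class SearchQuery:
    # Stand-in for the article's Pydantic SearchQuery model (assumed: one field).
    search_query: str


def normalize_queries(queries: List[Union[SearchQuery, str]]) -> List[str]:
    """Turn a mixed list of SearchQuery objects and strings into plain strings."""
    return [
        q.search_query if isinstance(q, SearchQuery) else str(q)
        for q in queries
    ]


def build_search_context(search_docs: list, formatter: Callable[[list], str]) -> str:
    """Mirror the fallback in generate_report_plan: empty results get a placeholder."""
    if not search_docs:
        return "No search results available."
    return formatter(search_docs)


mixed = [SearchQuery("LangGraph Send API"), "Tavily search API 2025"]
print(normalize_queries(mixed))
print(build_search_context([], formatter=str))
```

Normalizing at this point means `run_search_queries` only ever sees strings, whatever shape the structured output happened to take.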
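Section Builder – Instruction Prompt for the Query Generator
There is one main instruction prompt:
1. REPORT_SECTION_QUERY_GENERATOR_PROMPT
Helps the LLM generate a comprehensive list of search queries for the topic of the specific section being built.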
REPORT_SECTION_QUERY_GENERATOR_PROMPT = """Your goal is to generate targeted web search queries that will gather comprehensive information for writing a technical report section.
Topic for this section:
{section_topic}
When generating {number_of_queries} search queries, ensure that they:
1. Cover different aspects of the topic (e.g., core features, real-world applications, technical architecture)
2. Include specific technical terms related to the topic
3. Target recent information by including year markers where relevant (e.g., "2024")
4. Look for comparisons or differentiators from similar technologies/approaches
5. Search for both official documentation and practical implementation examples
Your queries should be:
- Specific enough to avoid generic results
- Technical enough to capture detailed implementation information
- Diverse enough to cover all aspects of the section plan
- Focused on authoritative sources (documentation, technical blogs, academic papers)"""
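Every prompt in this workflow is filled with `str.format`. A template that is missing a placeholder, or that contains a stray literal brace, therefore only fails once the node actually runs. The hypothetical `template_fields` helper below lists a template's named fields up front using the standard library's `string.Formatter().parse`. The template is a shortened stand-in for `REPORT_SECTION_QUERY_GENERATOR_PROMPT`, not the full prompt.

```python
from string import Formatter


def template_fields(template: str) -> set:
    """Return the named replacement fields used in a str.format template."""
    return {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # None for plain literal text, "" for positional {}
    }


# Shortened stand-in for REPORT_SECTION_QUERY_GENERATOR_PROMPT
prompt = """Topic for this section:
{section_topic}
When generating {number_of_queries} search queries, ensure that they:
1. Cover different aspects of the topic"""

required = {"section_topic", "number_of_queries"}
missing = required - template_fields(prompt)
assert not missing, f"Prompt is missing placeholders: {missing}"

print(prompt.format(section_topic="LangGraph Send API", number_of_queries=5))
```

Doubled braces such as `{{literal}}` are parsed as literal text, so they are correctly left out of the returned set.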
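Section Builder Node Function – Generate Queries (Query Generator)
This function uses the section topic and the instruction prompt above to generate a set of search queries for finding useful information about that topic on the web.

Query Generator node function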
def generate_queries(state: SectionState):
    """ Generate search queries for a specific report section """

    # Get state
    section = state["section"]
    print('--- Generating Search Queries for Section: '+ section.name +' ---')

    # Get configuration
    number_of_queries = 5

    # Generate queries
    structured_llm = llm.with_structured_output(Queries)

    # Format system instructions
    system_instructions = REPORT_SECTION_QUERY_GENERATOR_PROMPT.format(section_topic=section.description, number_of_queries=number_of_queries)

    # Generate queries
    user_instruction = "Generate search queries on the provided topic."
    search_queries = structured_llm.invoke([SystemMessage(content=system_instructions),
                                            HumanMessage(content=user_instruction)])

    print('--- Generating Search Queries for Section: '+ section.name +' Completed ---')

    return {"search_queries": search_queries.queries}
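Each node receives the current state and returns only the keys it wants to update. The sketch below shows that contract without calling OpenAI by swapping the structured LLM for a hypothetical `FakeStructuredLLM`. `Section`, `SearchQuery` and `Queries` are simplified dataclass stand-ins for the Pydantic models defined earlier, and the messages are plain strings rather than LangChain message objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchQuery:
    search_query: str


@dataclass
class Queries:
    queries: List[SearchQuery] = field(default_factory=list)


@dataclass
class Section:
    name: str
    description: str
    research: bool = True
    content: str = ""


class FakeStructuredLLM:
    """Hypothetical stand-in for llm.with_structured_output(Queries)."""

    def invoke(self, messages):
        # Ignore the prompt and return a fixed Queries object.
        return Queries([SearchQuery("LangGraph Send API example"),
                        SearchQuery("LangGraph map-reduce subgraph 2025")])


def generate_queries_sketch(state: dict, structured_llm) -> dict:
    section = state["section"]
    result = structured_llm.invoke([f"Write queries for: {section.description}"])
    # A node returns only the keys it updates; the graph merges them into the state.
    return {"search_queries": result.queries}


update = generate_queries_sketch(
    {"section": Section("Send API", "How LangGraph's Send API fans out work")},
    FakeStructuredLLM(),
)
print([q.search_query for q in update["search_queries"]])
```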
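Section Builder Node Function – Search the Web
Takes the queries that generate_queries(…) produced for a specific section, searches the web with the utility functions we defined earlier, and formats the search results.

Web Researcher node function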
async def search_web(state: SectionState):
    """ Search the web for each query, then return a formatted string of the deduplicated sources."""

    # Get state
    search_queries = state["search_queries"]

    print('--- Searching Web for Queries ---')

    # Web search: pass plain query strings, as generate_report_plan does
    query_list = [query.search_query for query in search_queries]
    search_docs = await run_search_queries(query_list, num_results=6, include_raw_content=True)

    # Deduplicate and format sources
    search_context = format_search_query_results(search_docs, max_tokens=4000, include_raw_content=True)

    print('--- Searching Web for Queries Completed ---')

    return {"source_str": search_context}
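Deduplication and trimming to `max_tokens` happen inside `format_search_query_results`, which was defined earlier. The hypothetical `dedupe_and_format` below is not that function. It is a simplified sketch of the same idea, assuming each search response is a dict with a `results` list of `{url, title, content}` items (a Tavily-style shape) and approximating one token as four characters.

```python
def dedupe_and_format(search_responses: list, max_tokens: int = 4000) -> str:
    """Keep the first occurrence of each URL and cap each source's text length."""
    char_limit = max_tokens * 4  # rough heuristic: about 4 characters per token
    seen = {}
    for response in search_responses:
        for result in response.get("results", []):
            url = result["url"]
            if url not in seen:  # later duplicates of the same URL are dropped
                seen[url] = result
    blocks = []
    for url, result in seen.items():
        text = result.get("content", "")[:char_limit]
        blocks.append(f"Source: {result.get('title', url)}\nURL: {url}\n{text}")
    return "\n\n".join(blocks)


responses = [
    {"results": [{"url": "https://a.example", "title": "A", "content": "alpha"}]},
    {"results": [{"url": "https://a.example", "title": "A (dup)", "content": "alpha again"},
                 {"url": "https://b.example", "title": "B", "content": "beta"}]},
]
print(dedupe_and_format(responses, max_tokens=1))
```

Deduplicating by URL matters here because five or six overlapping queries per section often surface the same pages. Repeating them would waste the context window passed to the section writer.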
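Section Builder – Instruction Prompt for Section Writing
There is one main instruction prompt:
1. SECTION_WRITER_PROMPT (section writing prompt)
Constrains the LLM to generate the content of a specific section according to guidelines on style, structure, length, and approach, and passes in the documents retrieved from the web by the search_web(…) function.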
SECTION_WRITER_PROMPT = """You are an expert technical writer crafting one specific section of a technical report.
Title for the section:
{section_title}
Topic for this section:
{section_topic}
Guidelines for writing:
1. Technical Accuracy:
- Include specific version numbers
- Reference concrete metrics/benchmarks
- Cite official documentation
- Use technical terminology precisely
2. Length and Style:
- Strict 150-200 word limit
- No marketing language
- Technical focus
- Write in simple, clear language; do not use complex words unnecessarily
- Start with your most important insight in **bold**
- Use short paragraphs (2-3 sentences max)
3. Structure:
- Use ## for section title (Markdown format)
- Only use ONE structural element IF it helps clarify your point:
* Either a focused table comparing 2-3 key items (using Markdown table syntax)
* Or a short list (3-5 items) using proper Markdown list syntax:
- Use `*` or `-` for unordered lists
- Use `1.` for ordered lists
- Ensure proper indentation and spacing
- End with ### Sources that references the below source material formatted as:
* List each source with title, date, and URL
* Format: `- Title : URL`
4. Writing Approach:
- Include at least one specific example or case study if available
- Use concrete details over general statements
- Make every word count
- No preamble prior to creating the section content
- Focus on your single most important point
5. Use this source material obtained from web searches to help write the section:
{context}
6. Quality Checks:
- Format should be Markdown
- Exactly 150-200 words (excluding title and sources)
- Careful use of only ONE structural element (table or bullet list) and only if it helps clarify your point
- One specific example / case study if available
- Starts with bold insight
- No preamble prior to creating the section content
- Sources cited at end
- If there are special characters in the text, such as the dollar symbol,
ensure they are escaped properly for correct rendering, e.g. $25.5 should become \\$25.5
"""
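Section Builder Node Function – Write Section (Section Writer)
Fills the SECTION_WRITER_PROMPT above with the section name, description, and web search documents, then passes it to the LLM, which writes the content of that section.

Section Writer node function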
def write_section(state: SectionState):
    """ Write a section of the report """

    # Get state
    section = state["section"]
    source_str = state["source_str"]

    print('--- Writing Section : '+ section.name +' ---')

    # Format system instructions
    system_instructions = SECTION_WRITER_PROMPT.format(section_title=section.name, section_topic=section.description, context=source_str)

    # Generate section
    user_instruction = "Generate a report section based on the provided sources."
    section_content = llm.invoke([SystemMessage(content=system_instructions),
                                  HumanMessage(content=user_instruction)])

    # Write content to the section object
    section.content = section_content.content

    print('--- Writing Section : '+ section.name +' Completed ---')

    # Write the updated section to completed sections
    return {"completed_sections": [section]}
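The word limits and formatting rules that write_section passes in through SECTION_WRITER_PROMPT are requests to the LLM, not guarantees. It can therefore be worth checking the generated section afterwards. The hypothetical `check_section` helper below assumes the layout the prompt asks for: a `##` title, then body text, then a `### Sources` heading. It counts words with a simple whitespace split, which only approximates how a person would count.

```python
import re


def check_section(markdown: str, min_words: int = 150, max_words: int = 200) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    lines = markdown.strip().splitlines()
    if not lines or not lines[0].startswith("## "):
        problems.append("missing '## ' section title")
    body = "\n".join(lines[1:])
    # Everything after '### Sources' is excluded from the word count.
    body, has_sources, _ = body.partition("### Sources")
    if not has_sources:
        problems.append("missing '### Sources' heading")
    words = len(re.findall(r"\S+", body))
    if not (min_words <= words <= max_words):
        problems.append(f"body has {words} words, expected {min_words}-{max_words}")
    if not body.lstrip().startswith("**"):
        problems.append("body does not open with a bold insight")
    return problems


sample = "## Send API\n**Send fans out work.** " + "word " * 160 + "\n### Sources\n- Docs : https://example.com"
print(check_section(sample))
```

A non-empty result could be logged, or used to trigger a second write_section attempt; this article's graph does neither.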
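Build the Section Builder Sub-Agent
This agent (or, more precisely, sub-agent) will be invoked multiple times in parallel, once per section, to search the web, retrieve content, and then write that specific section. We use LangGraph's Send construct to make this possible.

Section Builder sub-agent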
from langgraph.graph import StateGraph, START, END
# Add nodes and edges
section_builder = StateGraph(SectionState, output=SectionOutputState)
section_builder.add_node("generate_queries", generate_queries)
section_builder.add_node("search_web", search_web)
section_builder.add_node("write_section", write_section)
section_builder.add_edge(START, "generate_queries")
section_builder.add_edge("generate_queries", "search_web")
section_builder.add_edge("search_web", "write_section")
section_builder.add_edge("write_section", END)
section_builder_subagent = section_builder.compile()
# Display the graph
from IPython.display import display, Image
Image(section_builder_subagent.get_graph().draw_mermaid_png())
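Conceptually, the compiled sub-agent runs its three nodes in order. It merges each node's returned dict into the state before calling the next node, and awaits the nodes that are async, such as search_web. The toy `run_linear_graph` below illustrates that flow with stand-in node functions. It ignores reducers, checkpointing and everything else LangGraph adds. It also approximates the `SectionOutputState` filtering by keeping only `completed_sections`, which is assumed to be that schema's key.

```python
import asyncio
import inspect


async def run_linear_graph(nodes, state: dict, output_keys) -> dict:
    """Run nodes in order, merging each returned update into the state."""
    state = dict(state)
    for node in nodes:
        update = node(state)
        if inspect.isawaitable(update):  # async nodes such as search_web
            update = await update
        state.update(update)
    # Return only the keys exposed by the (assumed) output schema.
    return {k: state[k] for k in output_keys}


# Stand-in nodes with the same input/output keys as the real ones.
def fake_generate_queries(state):
    return {"search_queries": [f"{state['section']} overview"]}


async def fake_search_web(state):
    await asyncio.sleep(0)  # pretend to do network I/O
    return {"source_str": " | ".join(state["search_queries"])}


def fake_write_section(state):
    return {"completed_sections": [f"{state['section']}: based on {state['source_str']}"]}


result = asyncio.run(run_linear_graph(
    [fake_generate_queries, fake_search_web, fake_write_section],
    {"section": "Send API"},
    output_keys=["completed_sections"],
))
print(result)
```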
Output

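Create the Dynamic Parallelization Node Function – Parallelize Section Writing
Send(…) is used to parallelize the work, invoking section_builder_subagent once per section so that the section contents are written in parallel.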
from langgraph.constants import Send

def parallelize_section_writing(state: ReportState):
    """ This is the "map" step when we kick off web research for some sections of the report in parallel and then write the section"""

    # Kick off section writing in parallel via Send() API for any sections that require research
    return [
        Send("section_builder_with_web_search", # name of the subagent node
             {"section": s})
        for s in state["sections"]
        if s.research
    ]
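parallelize_section_writing is the "map" half of a map-reduce: it returns one Send per section that needs research, and LangGraph runs the sub-agent once per payload. Below is a plain-asyncio analogue of that fan-out, not LangGraph's scheduler. A hypothetical `write_one` coroutine stands in for section_builder_subagent, and dicts stand in for Section objects.

```python
import asyncio


async def write_one(payload: dict) -> dict:
    """Hypothetical stand-in for one section_builder_subagent run."""
    await asyncio.sleep(0.01)  # simulate search + LLM latency
    return {"completed_sections": [payload["section"]["name"] + " (written)"]}


async def fan_out(sections: list) -> list:
    # One task per section that needs research, mirroring the `if s.research` filter.
    payloads = [{"section": s} for s in sections if s["research"]]
    updates = await asyncio.gather(*(write_one(p) for p in payloads))
    # Concatenate the per-branch lists, as an operator.add reducer would.
    return [item for u in updates for item in u["completed_sections"]]


sections = [
    {"name": "Introduction", "research": False},
    {"name": "Send API", "research": True},
    {"name": "Subgraphs", "research": True},
    {"name": "Conclusion", "research": False},
]
print(asyncio.run(fan_out(sections)))
```

The branches run concurrently, so total latency is roughly that of the slowest section rather than the sum of all of them.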
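Create the Format Sections Node Function
This is the part that formats all the completed sections and combines them into one large document.

Format Sections node function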
def format_sections(sections: list[Section]) -> str:
    """ Format a list of report sections into a single text string """
    formatted_str = ""
    for idx, section in enumerate(sections, 1):
        formatted_str += f"""
{'='*60}
Section {idx}: {section.name}
{'='*60}
Description:
{section.description}
Requires Research:
{section.research}
Content:
{section.content if section.content else '[Not yet written]'}
"""
    return formatted_str

def format_completed_sections(state: ReportState):
    """ Gather completed sections from research and format them as context for writing the final sections """

    print('--- Formatting Completed Sections ---')

    # List of completed sections
    completed_sections = state["completed_sections"]

    # Format completed section to str to use as context for final sections
    completed_report_sections = format_sections(completed_sections)

    print('--- Formatting Completed Sections is Done ---')

    return {"report_sections_from_research": completed_report_sections}
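format_completed_sections reads `state["completed_sections"]`, which at this point holds the outputs of every parallel write_section run. Each run returns a one-element list. Assuming `completed_sections` in ReportState is declared as `Annotated[list, operator.add]`, as in LangChain's Report mAIstro that inspired this workflow, LangGraph concatenates those lists. The plain-Python sketch below only imitates that merge and is not LangGraph's own code; strings stand in for Section objects.

```python
import operator
from functools import reduce

# Updates returned by three parallel write_section runs.
branch_updates = [
    {"completed_sections": ["Section A"]},
    {"completed_sections": ["Section B"]},
    {"completed_sections": ["Section C"]},
]

# With an operator.add reducer, the per-branch lists are concatenated.
merged = reduce(operator.add, (u["completed_sections"] for u in branch_updates), [])
print(merged)

# For comparison, a plain dict update keeps only the last branch's list.
last_write_wins = {}
for update in branch_updates:
    last_write_wins.update(update)
print(last_write_wins["completed_sections"])
```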
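Instruction Prompt for the Final Sections
There is one main instruction prompt:
1. FINAL_SECTION_WRITER_PROMPT (final section writing prompt)
Asks the LLM to write the content of the introduction or conclusion according to guidelines on style, structure, length, and approach, and passes in the content of the sections that have already been written.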
FINAL_SECTION_WRITER_PROMPT = """You are an expert technical writer crafting a section that synthesizes information from the rest of the report.
Title for the section:
{section_title}
Topic for this section:
{section_topic}
Available report content of already completed sections:
{context}
1. Section-Specific Approach:
For Introduction:
- Use # for report title (Markdown format)
- 50-100 word limit
- Write in simple and clear language
- Focus on the core motivation for the report in 1-2 paragraphs
- Use a clear narrative arc to introduce the report
- Include NO structural elements (no lists or tables)
- No sources section needed
For Conclusion/Summary:
- Use ## for section title (Markdown format)
- 100-150 word limit
- For comparative reports:
* Must include a focused comparison table using Markdown table syntax
* Table should distill insights from the report
* Keep table entries clear and concise
- For non-comparative reports:
* Only use ONE structural element IF it helps distill the points made in the report:
* Either a focused table comparing items present in the report (using Markdown table syntax)
* Or a short list using proper Markdown list syntax:
- Use `*` or `-` for unordered lists
- Use `1.` for ordered lists
- Ensure proper indentation and spacing
- End with specific next steps or implications
- No sources section needed
2. Writing Approach:
- Use concrete details over general statements
- Make every word count
- Focus on your single most important point
3. Quality Checks:
- For introduction: 50-100 word limit, # for report title, no structural elements, no sources section
- For conclusion: 100-150 word limit, ## for section title, only ONE structural element at most, no sources section
- Markdown format
- Do not include word count or any preamble in your response
- If there are special characters in the text, such as the dollar symbol,
ensure they are escaped properly for correct rendering, e.g. $25.5 should become \\$25.5"""
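FINAL_SECTION_WRITER_PROMPT gives the introduction and the conclusion different structural rules. The introduction may have no structural elements; the conclusion may have at most one (a table or a list). The hypothetical post-check below enforces those rules roughly by counting contiguous runs of Markdown table rows or list items. It is a heuristic and will not catch every Markdown variant.

```python
import re

# A line that starts a table row ("|"), a bullet ("- " or "* "), or a numbered item ("1. ").
LIST_OR_TABLE = re.compile(r"^\s*(\||[-*] |\d+\. )")


def count_structural_blocks(markdown: str) -> int:
    """Count contiguous runs of Markdown table rows or list items."""
    blocks, in_block = 0, False
    for line in markdown.splitlines():
        is_struct = bool(LIST_OR_TABLE.match(line))
        if is_struct and not in_block:
            blocks += 1
        in_block = is_struct
    return blocks


def check_final_section(markdown: str, kind: str) -> bool:
    """kind is 'introduction' or 'conclusion'; returns True if the structure rules hold."""
    first = markdown.lstrip().splitlines()[0] if markdown.strip() else ""
    if kind == "introduction":
        return first.startswith("# ") and count_structural_blocks(markdown) == 0
    return first.startswith("## ") and count_structural_blocks(markdown) <= 1


intro = "# LangGraph Deep Research\nThis report explains how the agent works."
conclusion = "## Conclusion\n| Tool | Role |\n|---|---|\n| Tavily | Search |\nNext, add caching."
print(check_final_section(intro, "introduction"), check_final_section(conclusion, "conclusion"))
```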
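Create the Write Final Sections Node Function
This function uses the FINAL_SECTION_WRITER_PROMPT instruction prompt above to write the introduction and conclusion. It will be run in parallel using Send(…), as shown below.

Write Final Sections node function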
def write_final_sections(state: SectionState):
    """ Write the final sections of the report, which do not require web search and use the completed sections as context"""

    # Get state
    section = state["section"]
    completed_report_sections = state["report_sections_from_research"]

    print('--- Writing Final Section: '+ section.name + ' ---')

    # Format system instructions
    system_instructions = FINAL_SECTION_WRITER_PROMPT.format(section_title=section.name,
                                                             section_topic=section.description,
                                                             context=completed_report_sections)

    # Generate section
    user_instruction = "Craft a report section based on the provided sources."
    section_content = llm.invoke([SystemMessage(content=system_instructions),
                                  HumanMessage(content=user_instruction)])

    # Write content to section
    section.content = section_content.content

    print('--- Writing Final Section: '+ section.name + ' Completed ---')

    # Write the updated section to completed sections
    return {"completed_sections": [section]}
def write_final_sections(state: SectionState):
""" Write the final sections of the report, which do not require web search and use the completed sections as context"""
# Get state
section = state["section"]
completed_report_sections = state["report_sections_from_research"]
print('--- Writing Final Section: '+ section.name + ' ---')
# Format system instructions
system_instructions = FINAL_SECTION_WRITER_PROMPT.format(section_title=section.name,
section_topic=section.description,
context=completed_report_sections)
# Generate section
user_instruction = "Craft a report section based on the provided sources."
section_content = llm.invoke([SystemMessage(content=system_instructions),
HumanMessage(content=user_instruction)])
# Write content to section
section.content = section_content.content
print('--- Writing Final Section: '+ section.name + ' Completed ---')
# Write the updated section to completed sections
return {"completed_sections": [section]}
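Create the Dynamic Parallelization Node Function: Parallelize Final Section Writing
Send(…) is used for parallelization: it calls write_final_sections once for the introduction and once for the conclusion, so that both are written in parallel.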
from langgraph.constants import Send
def parallelize_final_section_writing(state: ReportState):
    """ Write any final sections using the Send API to parallelize the process """
    # Kick off section writing in parallel via Send() API for any sections that do not require research
    return [
        Send("write_final_sections",
             {"section": s, "report_sections_from_research": state["report_sections_from_research"]})
        for s in state["sections"]
        if not s.research
    ]
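Conceptually, Send implements a map-reduce step:
- **Map:** each qualifying section gets its own payload and its own write_final_sections run.
- **Reduce:** the returned lists are merged into completed_sections.

The plain-Python analogue below uses a thread pool instead of LangGraph, so it only illustrates the pattern. It assumes, as is typical for this design, that completed_sections is declared with a list-concatenating reducer such as operator.add. All names are illustrative.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce

@dataclass
class Section:
    """Hypothetical minimal stand-in for the article's Section model."""
    name: str
    research: bool
    content: str = ""

def write_section(payload: dict) -> dict:
    """Hypothetical worker playing the role of the write_final_sections node."""
    section = payload["section"]
    context = payload["report_sections_from_research"]
    section.content = f"{section.name} written from {len(context)} chars of context"
    return {"completed_sections": [section]}

def fan_out_final_sections(state: dict) -> list:
    # Map: one payload per section that needs no research (same filter as above).
    payloads = [
        {"section": s, "report_sections_from_research": state["report_sections_from_research"]}
        for s in state["sections"]
        if not s.research
    ]
    with ThreadPoolExecutor() as pool:
        updates = list(pool.map(write_section, payloads))
    # Reduce: concatenate the one-element lists, as an operator.add reducer would.
    return reduce(operator.add, (u["completed_sections"] for u in updates), [])

state = {
    "sections": [Section("Introduction", False), Section("GPU Market", True), Section("Conclusion", False)],
    "report_sections_from_research": "## GPU Market\n...",
}
done = fan_out_final_sections(state)
print([s.name for s in done])  # ['Introduction', 'Conclusion']
```

The order of completed_sections does not matter to the rest of the pipeline. compile_final_report looks each section up by name in the planned order.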
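Create the Final Report Compilation Node Function
This function merges all the sections of the report, in their planned order, and compiles them into the final report document.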

Final report compilation node function
def compile_final_report(state: ReportState):
    """ Compile the final report """
    # Get sections
    sections = state["sections"]
    completed_sections = {s.name: s.content for s in state["completed_sections"]}
    print('--- Compiling Final Report ---')
    # Update sections with completed content while maintaining original order
    for section in sections:
        section.content = completed_sections[section.name]
    # Compile final report
    all_sections = "\n\n".join([s.content for s in sections])
    # Escape unescaped $ symbols to display properly in Markdown
    formatted_sections = all_sections.replace("\\$", "TEMP_PLACEHOLDER")  # Temporarily mark already escaped $
    formatted_sections = formatted_sections.replace("$", "\\$")  # Escape all $
    formatted_sections = formatted_sections.replace("TEMP_PLACEHOLDER", "\\$")  # Restore originally escaped $
    # Now formatted_sections contains the properly escaped Markdown text
    print('--- Compiling Final Report Done ---')
    return {"final_report": formatted_sections}
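The three replace calls escape every `$` that is not already escaped, using a placeholder to protect existing `\$` sequences. The sketch below does three things:
- reproduces that placeholder logic;
- offers an alternative that is not part of the article's code: a single regular expression with a negative lookbehind that avoids a placeholder string that could collide with real text;
- shows the reorder-by-name step in isolation.

The function names are hypothetical.

```python
import re

def escape_dollars_placeholder(text: str) -> str:
    # Same three steps as compile_final_report.
    text = text.replace("\\$", "TEMP_PLACEHOLDER")  # protect already-escaped \$
    text = text.replace("$", "\\$")                 # escape every remaining $
    return text.replace("TEMP_PLACEHOLDER", "\\$")  # restore the protected ones

def escape_dollars_regex(text: str) -> str:
    # Alternative: escape a $ only when no backslash directly precedes it.
    return re.sub(r"(?<!\\)\$", r"\\$", text)

def join_in_planned_order(planned_names: list, completed: dict) -> str:
    # Sections may finish in any order; the output follows the plan.
    return "\n\n".join(completed[name] for name in planned_names)

sample = "Revenue hit $60.9B, up from \\$27B."
print(escape_dollars_regex(sample))  # Revenue hit \$60.9B, up from \$27B.
```

The two escaping functions agree on ordinary text. They differ only if the report happens to contain the literal string TEMP_PLACEHOLDER, which the placeholder version turns into `\$`.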
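Build Our Deep Research and Report Writing Agent
Now we bring together all the components and sub-agents defined so far to create our main planning agent.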

Deep research and report writing agent workflow
builder = StateGraph(ReportState, input=ReportStateInput, output=ReportStateOutput)
builder.add_node("generate_report_plan", generate_report_plan)
builder.add_node("section_builder_with_web_search", section_builder_subagent)
builder.add_node("format_completed_sections", format_completed_sections)
builder.add_node("write_final_sections", write_final_sections)
builder.add_node("compile_final_report", compile_final_report)
builder.add_edge(START, "generate_report_plan")
builder.add_conditional_edges("generate_report_plan",
                              parallelize_section_writing,
                              ["section_builder_with_web_search"])
builder.add_edge("section_builder_with_web_search", "format_completed_sections")
builder.add_conditional_edges("format_completed_sections",
                              parallelize_final_section_writing,
                              ["write_final_sections"])
builder.add_edge("write_final_sections", "compile_final_report")
builder.add_edge("compile_final_report", END)
reporter_agent = builder.compile()
# view agent structure
display(Image(reporter_agent.get_graph(xray=True).draw_mermaid_png()))
Output

Now we can run and test our agentic system!
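Ignoring the fan-out, the agent graph wired up earlier is a linear pipeline. The sketch below writes the same edges as plain data and uses Python's standard-library graphlib to recover the execution order.

A few conventions in the sketch:
- The "START"/"END" labels stand in for LangGraph's START and END constants.
- The two nodes reached through conditional edges are marked, because at run time Send turns each of them into several parallel invocations.

This is an illustrative model of the topology, not LangGraph's scheduler.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes that must finish before it runs.
DEPENDENCIES = {
    "generate_report_plan": {"START"},
    "section_builder_with_web_search": {"generate_report_plan"},
    "format_completed_sections": {"section_builder_with_web_search"},
    "write_final_sections": {"format_completed_sections"},
    "compile_final_report": {"write_final_sections"},
    "END": {"compile_final_report"},
}
# Nodes reached through add_conditional_edges + Send, i.e. fanned out at run time.
FAN_OUT_NODES = {"section_builder_with_web_search", "write_final_sections"}

order = list(TopologicalSorter(DEPENDENCIES).static_order())
for step, node in enumerate(order):
    note = "  (parallel runs via Send)" if node in FAN_OUT_NODES else ""
    print(f"{step}: {node}{note}")
```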
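Run and Test Our Deep Research Report Writing Agent
Finally, let's test our deep research report writing agent! We'll create a simple function that streams progress in real time and then displays the final report. Once the agent runs correctly, I recommend turning off all the intermediate print statements!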
from IPython.display import display
from rich.console import Console
from rich.markdown import Markdown as RichMarkdown
async def call_planner_agent(agent, prompt, config={"recursion_limit": 50}, verbose=False):
    events = agent.astream(
        {'topic' : prompt},
        config,
        stream_mode="values",
    )
    async for event in events:
        for k, v in event.items():
            if verbose:
                if k != "__end__":
                    display(RichMarkdown(repr(k) + ' -> ' + repr(v)))
            if k == 'final_report':
                print('='*50)
                print('Final Report:')
                md = RichMarkdown(v)
                display(md)
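call_planner_agent simply iterates over the async stream and reacts to the final_report key. The sketch below exercises the same loop against a hypothetical FakeAgent whose astream() yields state snapshots, roughly what stream_mode="values" produces, so the pattern can be tested offline.

Two further points about the sketch:
- It swaps the mutable default config for None, a common Python idiom.
- In a Jupyter notebook you would await the coroutine directly, as the article does. asyncio.run() cannot be called from an already running event loop.

```python
import asyncio

class FakeAgent:
    """Hypothetical stand-in for reporter_agent: astream() yields full-state
    snapshots, roughly like stream_mode="values"."""
    def __init__(self, snapshots):
        self.snapshots = snapshots

    async def astream(self, inputs, config=None, stream_mode="values"):
        for snapshot in self.snapshots:
            await asyncio.sleep(0)  # hand control back to the event loop between steps
            yield snapshot

async def collect_final_report(agent, prompt, config=None):
    # A None default avoids sharing one mutable dict across calls.
    config = config or {"recursion_limit": 50}
    final_report = None
    async for event in agent.astream({"topic": prompt}, config, stream_mode="values"):
        for key, value in event.items():
            if key == "final_report":
                final_report = value
    return final_report

agent = FakeAgent([
    {"topic": "NVIDIA"},
    {"topic": "NVIDIA", "final_report": "# NVIDIA Report\n\nBody text"},
])
report = asyncio.run(collect_final_report(agent, "NVIDIA"))
print(report)
```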
Test Run
topic = "Detailed report on how is NVIDIA winning the game against its competitors"
await call_planner_agent(agent=reporter_agent,
                         prompt=topic)
Output
--- Generating Report Plan ---
--- Generating Report Plan Completed ---
--- Generating Search Queries for Section: NVIDIA's Market Dominance in GPUs ---
--- Generating Search Queries for Section: Strategic Acquisitions and Partnerships ---
--- Generating Search Queries for Section: Technological Innovations and AI Leadership ---
--- Generating Search Queries for Section: Financial Performance and Growth Strategy ---
--- Generating Search Queries for Section: NVIDIA's Market Dominance in GPUs Completed ---
--- Searching Web for Queries ---
--- Generating Search Queries for Section: Financial Performance and Growth Strategy Completed ---
--- Searching Web for Queries ---
--- Generating Search Queries for Section: Technological Innovations and AI Leadership Completed ---
--- Searching Web for Queries ---
--- Generating Search Queries for Section: Strategic Acquisitions and Partnerships Completed ---
--- Searching Web for Queries ---
--- Searching Web for Queries Completed ---
--- Writing Section : Strategic Acquisitions and Partnerships ---
--- Searching Web for Queries Completed ---
--- Writing Section : Financial Performance and Growth Strategy ---
--- Searching Web for Queries Completed ---
--- Writing Section : NVIDIA's Market Dominance in GPUs ---
--- Searching Web for Queries Completed ---
--- Writing Section : Technological Innovations and AI Leadership ---
--- Writing Section : Strategic Acquisitions and Partnerships Completed ---
--- Writing Section : Financial Performance and Growth Strategy Completed ---
--- Writing Section : NVIDIA's Market Dominance in GPUs Completed ---
--- Writing Section : Technological Innovations and AI Leadership Completed ---
--- Formatting Completed Sections ---
--- Formatting Completed Sections is Done ---
--- Writing Final Section: Introduction ---
--- Writing Final Section: Conclusion ---
--- Writing Final Section: Introduction Completed ---
--- Writing Final Section: Conclusion Completed ---
--- Compiling Final Report ---
--- Compiling Final Report Done ---
==================================================
Final Report:

As the image above shows, the agent produced a fairly comprehensive, well-researched, and well-structured report!
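Wrapping Up
If you're reading this, kudos for sticking with this massive guide to the end! We've seen that building something similar to a polished commercial product isn't too difficult. The product in question comes from OpenAI (and it isn't cheap!), a company that certainly knows how to ship quality Generative AI products and is now doing the same with Agentic AI.
We walked through the detailed architecture and workflow for building our own deep research and report generation agentic AI system. Overall, running it cost less than the promised one dollar, and if you use open-source components throughout, it can be completely free! The system is also fully customizable: you control how the searches are done, as well as the structure, length, and style of the report. Note that with Tavily, running this agent for deep research can easily trigger a large number of searches, so keep an eye on your usage. This gives you a foundation; feel free to take the code and the system and customize them to make them even better!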