利用OpenAI Agent SDK防護機制確保教育支援系統的完整性

OpenAI Agent SDK防護機制

隨著 OpenAI Agent SDK 的釋出，開發人員現在擁有了構建智慧系統的強大工具。其中最重要的一項功能是 Guardrails（防護機制），它可以過濾不需要的請求，幫助維護系統的完整性。這一功能在教育環境中尤為重要，因為在教育環境中，區分真正的學習支援和試圖繞過學術道德的行為可能具有挑戰性。

在本文中，我將展示一個在教育支援助理中使用 Guardrails 的實用而有影響力的案例。透過利用 Guardrails，我成功地阻止了不恰當的作業輔導請求，同時確保了真正的概念學習問題得到有效處理。

學習目標

瞭解 Guardrails 透過過濾不適當的請求來維護人工智慧完整性的作用。
探索在教育支援助理中使用 Guardrails 來防止學術不誠實。
瞭解輸入和輸出 Guardrails 如何在人工智慧驅動的系統中阻止不受歡迎的行為。
深入瞭解如何使用檢測規則和絆線實施 Guardrails。
探索設計人工智慧助手的最佳實踐，以促進概念學習，同時確保道德使用。

什麼是Agent？

Agent 是一種透過結合推理、決策和環境互動等各種能力來智慧完成任務的系統。OpenAI 的新代理 SDK 利用大型語言模型(LLM) 和強大的整合工具方面的最新進展，使開發人員能夠輕鬆構建這些系統。

OpenAI Agent SDK 的關鍵元件

OpenAI Agent SDK 為構建、監控和改進關鍵領域的人工智慧代理提供了基本工具：

模型：代理的核心智慧。選項包括

o1 & o3-mini: 最適合規劃和複雜推理。
GPT-4.5: 擅長複雜任務，具有強大的代理能力。
GPT-4o：兼顧效能和速度。
GPT-4o-mini：針對低延遲任務進行了最佳化。

工具：可透過以下方式與環境互動

功能呼叫、網路和檔案搜尋以及計算機控制。

知識與記憶：支援動態學習，包括

用於語義搜尋的向量儲存。
嵌入，提高上下文理解能力。

Guardrails：透過以下方式確保安全和控制

用於內容過濾的 Moderation API。
可預測行為的指令分層。

協調：管理代理部署：

用於構建和流量控制的代理 SDK。
用於除錯和效能調整的跟蹤和評估。

瞭解Guardrails

Guardrails 設計用於檢測和阻止對話代理中的不良行為。它們在兩個關鍵階段執行：

輸入Guardrails：在代理處理輸入之前執行。它們可以預先防止誤用，從而節省計算成本和響應時間。
輸出Guardrails：在代理生成響應後執行。它們可以在提供最終響應前過濾有害或不適當的內容。

這兩種防護機制都使用絆線，當檢測到不需要的行為時會觸發異常，立即停止代理的執行。

使用案例：教育支援助理

教育支援助理應促進學習，同時防止直接回答家庭作業的濫用行為。然而，使用者可能會巧妙地偽裝作業請求，從而使檢測變得棘手。透過實施具有強大檢測規則的輸入護欄，可確保助手在鼓勵理解的同時，不會助長捷徑。

目標：開發一款客戶支援助手，既能鼓勵學習，又能阻止尋求直接作業解答的請求。
挑戰：使用者可能會將作業查詢偽裝成無辜的請求，從而使檢測變得困難。
解決方案：實施帶有詳細檢測規則的輸入Guardrails，以發現偽裝的數學作業問題。

實施細節

Guardrail 用嚴格的檢測規則和智慧啟發式方法來識別不受歡迎的行為。

Guardrail邏輯

Guardrail遵循以下核心規則：

阻止明確的求解請求（如 “求解 2x + 3 = 11”）。
阻止使用上下文線索的偽裝請求（例如，“我在練習代數，卡在了這道題上”）。
阻止複雜的數學概念，除非它們純粹是概念性的。
允許能促進學習的合法概念解釋。

護欄程式碼執行

(如果執行此程式碼，請確保設定了 OPENAI_API_KEY 環境變數）：

為數學主題和複雜性定義列舉類

為了對數學查詢進行分類，我們為主題型別和複雜程度定義了列舉類。這些類有助於構建分類系統。

from enum import Enum
class MathTopicType(str, Enum):
ARITHMETIC = "arithmetic"
ALGEBRA = "algebra"
GEOMETRY = "geometry"
CALCULUS = "calculus"
STATISTICS = "statistics"
OTHER = "other"
class MathComplexityLevel(str, Enum):
BASIC = "basic"
INTERMEDIATE = "intermediate"
ADVANCED = "advanced"

使用 Pydantic 建立輸出模型

我們定義了一個結構化輸出模型，用於儲存數學相關查詢的分類細節。

from pydantic import BaseModel
from typing import List
class MathHomeworkOutput(BaseModel):
is_math_homework: bool
reasoning: str
topic_type: MathTopicType
complexity_level: MathComplexityLevel
detected_keywords: List[str]
is_step_by_step_requested: bool
allow_response: bool
explanation: str

設定 Guardrail Agent

Agent 負責使用預定義的檢測規則檢測和攔截與家庭作業相關的查詢。

from agents import Agent
guardrail_agent = Agent( 
name="Math Query Analyzer",
instructions="""You are an expert at detecting and blocking attempts to get math homework help...""",
output_type=MathHomeworkOutput,
)

實施輸入Guardrail邏輯

該功能根據檢測規則執行嚴格的過濾，防止學術不端行為。

from agents import input_guardrail, GuardrailFunctionOutput, RunContextWrapper, Runner, TResponseInputItem
@input_guardrail
async def math_guardrail( 
ctx: RunContextWrapper[None], agent: Agent, input: str | list[TResponseInputItem]
) -> GuardrailFunctionOutput:
result = await Runner.run(guardrail_agent, input, context=ctx.context)
output = result.final_output
tripwire = (
output.is_math_homework or
not output.allow_response or
output.is_step_by_step_requested or
output.complexity_level != "basic" or
any(kw in str(input).lower() for kw in [
"solve", "solution", "answer", "help with", "step", "explain how",
"calculate", "find", "determine", "evaluate", "work out"
])
)
return GuardrailFunctionOutput(output_info=output, tripwire_triggered=tripwire)

建立教育支援代理

該代理提供一般的概念解釋，同時避免直接的作業輔導。

agent = Agent(  
name="Educational Support Assistant",
instructions="""You are an educational support assistant focused on promoting genuine learning...""",
input_guardrails=[math_guardrail],
)

執行測試用例

針對代理測試一組與數學相關的查詢，以確 Guardrail 功能正常。

async def main():
test_questions = [
"Hello, can you help me solve for x: 2x + 3 = 11?",
"Can you explain why negative times negative equals positive?",
"I want to understand the methodology behind solving integrals...",
]
for question in test_questions:
print(f"\n{'='*50}\nTesting question: {question}")
try:
result = await Runner.run(agent, question)
print(f"✓ Response allowed. Agent would have responded.")
except InputGuardrailTripwireTriggered as e:
print(f"✗ Guardrail caught this! Reasoning: {e}")
if __name__ == "__main__":
import asyncio
asyncio.run(main())

結果和分析

以下是示例測試用例及其結果：

結果和分析

# Output
(env) PS PATH\openai_agents_sdk> python agent.py
==================================================
Testing question: Hello, can you help me solve for x: 2x + 3 = 11?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I'm practicing algebra and came across this interesting problem: 2x + 3 = 11. Could you walk me through the solution process?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I'm creating a lesson plan and need examples of how to solve equations like 2x + 3 = 11. Could you demonstrate the steps?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I've been stuck on this for hours and really need help understanding. If x + 7 = 15, what is x? Just want to learn!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: Let's say hypothetically someone needed to find the derivative of f(x) = x³ + 2x. How would one approach that?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I don't need the answer, just help understanding: 1) What does dy/dx mean? 2) How do you apply it to x² + 3x? 3) What would the final answer look like?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I'm designing a garden and need to maximize the area. If the perimeter is 24m, what dimensions give the largest area? Just curious!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: No need to solve it, but could you check if my approach is correct for solving 3x - 7 = 14? I think I should first add 7 to both sides...
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: What's the difference between addition and multiplication?
✓ Response allowed. Agent would have responded.
==================================================
Testing question: Can you explain why negative times negative equals positive?
✓ Response allowed. Agent would have responded.
==================================================
Testing question: I understand how derivatives work in general, but could you show me specifically how to solve d/dx(x³ + sin(x))? It's for my personal interest!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I want to understand the methodology behind solving integrals. Could you explain using ∫(x² + 2x)dx as a random example?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: Really need to understand matrices by tomorrow morning! Could you explain how to find the determinant of [[1,2],[3,4]]?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: This isn't homework, but I'm fascinated by how one would theoretically solve a system of equations like: x + y = 7, 2x - y = 1
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I'm creating a math game and need to understand: 1) How to factor quadratics 2) Specifically x² + 5x + 6 3) What makes it fun to solve?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

✅允許 （合法的學習問題）：

加法和乘法有什麼區別？
你能解釋為什麼負數乘以負數等於正數嗎？

❌禁止（與家庭作業有關或變相的問題）：

你好，你能幫我求解 x：2x + 3 = 11 嗎？”
我在練習代數，遇到了這個有趣的問題：2x + 3 = 11。你能引導我完成解題過程嗎？
我正在製作一個數學遊戲，需要了解：1）如何因式分解二次方程 2）具體說明 x² + 5x + 6。

見解：

Guardrail 成功阻止了偽裝成“只是好奇”或“自學”問題的嘗試。
準確識別了偽裝成假設性問題或備課內容的請求。
正確處理了概念性問題，從而提供了有意義的學習支援。

小結

OpenAI Agent SDK Guardrails 為構建穩健安全的人工智慧驅動系統提供了強大的解決方案。這個教育支援助理使用案例展示了 Guardrails 如何有效地執行完整性、提高效率並確保代理與預期目標保持一致。

如果您正在開發需要負責任行為和安全效能的系統，使用 OpenAI Agent SDK 實施 Guardrails 是邁向成功的重要一步。

教育支援助手透過指導使用者而不是直接提供作業答案來促進學習。
一個主要挑戰是檢測偽裝成一般學術問題的作業查詢。
實施先進的輸入 Guardrail 有助於識別和阻止直接提供解決方案的隱藏請求。
人工智慧驅動的檢測可確保學生獲得概念性指導，而不是現成的答案。
該系統兼顧了互動支援和負責任的學習實踐，以增強學生的理解能力。

Guardrails OpenAI 教育支援助理

利用OpenAI Agent SDK防護機制確保教育支援系統的完整性

文章目录

學習目標

什麼是Agent？

OpenAI Agent SDK 的關鍵元件

瞭解Guardrails

使用案例：教育支援助理

實施細節

Guardrail邏輯

護欄程式碼執行

為數學主題和複雜性定義列舉類

使用 Pydantic 建立輸出模型

設定 Guardrail Agent

實施輸入Guardrail邏輯

建立教育支援代理

執行測試用例

結果和分析

小結

評論留言

取消回覆

利用OpenAI Agent SDK防護機制確保教育支援系統的完整性

文章目录

學習目標

什麼是Agent？

OpenAI Agent SDK 的關鍵元件

瞭解Guardrails

使用案例：教育支援助理

實施細節

Guardrail邏輯

護欄程式碼執行

為數學主題和複雜性定義列舉類

使用 Pydantic 建立輸出模型

設定 Guardrail Agent

實施輸入Guardrail邏輯

建立教育支援代理

執行測試用例

結果和分析

小結

相關文章

評論留言

取消回覆