How to Build an Emergency Operator Voice Chatbot

Language models are evolving rapidly, and with multimodal LLMs now at the front of the race, it is worth understanding how to put their capabilities to use. We are moving from traditional text-based AI chatbots to voice-based ones that behave like personal assistants, ready whenever we need them. In this article, we will build an emergency operator voice chatbot. The idea is simple:

  • We speak to the chatbot
  • The chatbot listens and understands what we said
  • It responds with a voice note


Our Use Case

Let's imagine a real-world scenario. We live in a country of more than 1.4 billion people, and with a population that large, emergencies are inevitable: medical problems, fire outbreaks, police interventions, and even mental health support such as suicide prevention.

In those moments, every second counts. Add to that the shortage of emergency operators and the sheer volume of incoming calls. This is where a voice chatbot can shine, offering fast, spoken assistance when people need it most.

  • Emergency assistance: immediate help for health, fire, crime, or disaster-related issues without waiting for a human operator (when one cannot be reached).
  • Mental health helplines: a voice-based emotional support assistant that guides users with compassion.
  • Rural accessibility: regions with limited access to mobile apps can benefit from a simple voice interface, since people there typically communicate by speaking.

That is exactly what we are going to build. We will play the role of a person seeking help, and the chatbot, backed by a large language model, will play the emergency responder.

Tools We Will Use

To implement our voice chatbot, we will use the AI models and tools listed below:

  • Whisper (Large) – OpenAI's speech-to-text model, run via GroqCloud, converts our speech into text.
  • GPT-4.1-mini – served by CometAPI (a free LLM provider), acts as the brain of our chatbot, understanding our queries and generating meaningful replies.
  • Google Text-to-Speech (gTTS) – converts the chatbot's replies into speech so it can talk back to us.
  • FFmpeg – a handy library that helps us record and manage audio with ease.

Requirements

Before we start coding, we need to set up a few things:

  1. GroqCloud API key
    Get it here: https://console.groq.com/keys
  2. CometAPI key
    Sign up and save your API key at: https://api.cometapi.com/
  3. ElevenLabs API key
    Sign up and save your API key at: https://elevenlabs.io/app/home
  4. Install FFmpeg
    If you don't have it yet, follow this guide to install FFmpeg on your system: https://itsfoss.com/ffmpeg/

Confirm the installation by typing "ffmpeg -version" in the terminal.

With these in place, you are ready to start building your own voice chatbot!

Project Structure

The project structure is quite simple; most of the work happens in the app.py and utils.py Python scripts.

VOICE-CHATBOT/
├── venv/                  # Virtual environment for dependencies
├── .env                   # Environment variables (API keys, etc.)
├── app.py                 # Main application script
├── emergency.png          # Emergency-related image asset
├── README.md              # Project documentation (optional)
├── requirements.txt       # Python dependencies
├── utils.py               # Utility/helper functions

We need to fill in a few files to make sure all dependencies are in place:

In the .env file

GROQ_API_KEY = "<your-groq-api-key>"
COMET_API_KEY = "<your-comet-api-key>"
ELEVENLABS_API_KEY = "<your-elevenlabs-api-key>"

In requirements.txt

ffmpeg-python
pydub
pyttsx3
langchain
langchain-community
langchain-core
langchain-groq
langchain_openai
python-dotenv
streamlit==1.37.0
audio-recorder-streamlit
elevenlabs
gtts

Setting Up a Virtual Environment

We should also set up a virtual environment (a good practice). We will do this in the terminal.

  1. Create the virtual environment

~/Desktop/Emergency-Voice-Chatbot$ conda create -p venv python==3.12 -y

Creating the virtual environment

  2. Activate the virtual environment

~/Desktop/Emergency-Voice-Chatbot$ conda activate venv/

Activating the virtual environment

  3. Once you are done running the app, you can deactivate the virtual environment

~/Desktop/Emergency-Voice-Chatbot$ conda deactivate

Deactivating the virtual environment

Main Python Scripts

Let's first walk through the utils.py script.

1. Key Imports

  • time, tempfile, os, re, BytesIO – handle timing, temporary files, environment variables, regex, and in-memory data.
  • requests – makes HTTP requests (e.g., calling APIs).
  • gTTS, elevenlabs, pydub – convert text to speech and play/manipulate audio.
  • groq, langchain_* – use Groq/OpenAI LLMs with LangChain to process and generate text.
  • streamlit – builds the interactive web app.
  • dotenv – loads environment variables (such as API keys) from the .env file.
import time
import requests
import tempfile
import re
from io import BytesIO
from gtts import gTTS
from elevenlabs.client import ElevenLabs
from elevenlabs import play
from pydub import AudioSegment
from groq import Groq
from langchain_groq import ChatGroq
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
import streamlit as st
import os
from dotenv import load_dotenv

load_dotenv()

2. Loading API Keys and Initializing the Models

# Initialize the Groq client (used for Whisper speech-to-text)
client = Groq(api_key=os.getenv('GROQ_API_KEY'))

# Initialize the ElevenLabs client for text-to-speech
# (used by text_to_speech and create_welcome_message below)
tts_client = ElevenLabs(api_key=os.getenv('ELEVENLABS_API_KEY'))

# Initialize the GPT-4.1-mini model (served by CometAPI) for LLM responses
llm = ChatOpenAI(
    model_name="gpt-4.1-mini",
    openai_api_key=os.getenv("COMET_API_KEY"),
    openai_api_base="https://api.cometapi.com/v1"
)

# Set the path to the ffmpeg executable
# (adjust to your system, e.g., the output of `which ffmpeg`)
AudioSegment.converter = "/bin/ffmpeg"
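Before building the UI, it can be worth checking that the keys load and the CometAPI endpoint answers. Here is a quick throwaway test of our own (not part of the original script):

# Optional smoke test (our addition): run `python utils.py` once
# to verify the CometAPI key and endpoint respond
if __name__ == "__main__":
    reply = llm.invoke("Reply with the single word OK.")
    print(reply.content)  # expected output: "OK" (or similar)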

3. Converting the Audio File (Our Voice Recording) to .wav Format

Here we convert the raw audio bytes with AudioSegment and BytesIO, and export them to wav format:

def audio_bytes_to_wav(audio_bytes):
    try:
        with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as temp_wav:
            audio = AudioSegment.from_file(BytesIO(audio_bytes))
            # Downsample to reduce file size if needed
            audio = audio.set_frame_rate(16000).set_channels(1)
            audio.export(temp_wav.name, format="wav")
            return temp_wav.name
    except Exception as e:
        st.error(f"Error during WAV file conversion: {e}")
        return None

4. Splitting the Audio

We create a function that splits the audio into chunks based on an input parameter (chunk_length_ms). We also use a regex helper to strip out punctuation. A short usage example follows the code.

def split_audio(file_path, chunk_length_ms):
    audio = AudioSegment.from_wav(file_path)
    return [audio[i:i + chunk_length_ms] for i in range(0, len(audio), chunk_length_ms)]


def remove_punctuation(text):
    return re.sub(r'[^\w\s]', '', text)
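For instance, a long recording could be cut into 60-second pieces before transcription, which helps stay under per-request size limits. A hypothetical usage, with made-up file names:

# Split a 5-minute recording into 60-second chunks and save each one
chunks = split_audio("recording.wav", chunk_length_ms=60_000)
for i, chunk in enumerate(chunks):
    chunk.export(f"chunk_{i}.wav", format="wav")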

5. Generating the LLM Response

Now we come to the main responder function, where the LLM generates an appropriate reply to our query. In the prompt template, we give the LLM instructions on how it should respond. We use the LangChain Expression Language (LCEL) to chain everything together.

def get_llm_response(query, chat_history):
    try:
        template = """
You are an experienced Emergency Response Phone Operator trained to handle critical situations in India.
Your role is to guide users calmly and clearly during emergencies involving:
- Medical crises (injuries, heart attacks, etc.)
- Fire incidents
- Police/law enforcement assistance
- Suicide prevention or mental health crises
You must:
1. **Remain calm and assertive**, as if speaking on a phone call.
2. **Ask for and confirm key details** like location, condition of the person, number of people involved, etc.
3. **Provide immediate and practical steps** the user can take before help arrives.
4. **Share accurate, India-based emergency helpline numbers** (e.g., 112, 102, 108, 1091, 1098, 9152987821, etc.).
5. **Prioritize user safety**, and clearly instruct them what *not* to do as well.
6. If the situation involves **suicidal thoughts or mental distress**, respond with compassion and direct them to appropriate mental health helplines and safety actions.
If the user's query is not related to an emergency, respond with:
"I can only assist with urgent emergency-related issues. Please contact a general support line for non-emergency questions."
Use an authoritative, supportive tone, short and direct sentences, and tailor your guidance to **urban and rural Indian contexts**.
**Chat History:** {chat_history}
**User:** {user_query}
"""
        prompt = ChatPromptTemplate.from_template(template)
        chain = prompt | llm | StrOutputParser()
        response_gen = chain.stream({
            "chat_history": chat_history,
            "user_query": query
        })
        response_text = ''.join(list(response_gen))
        response_text = remove_punctuation(response_text)
        # Remove repeated text
        response_lines = response_text.split('\n')
        unique_lines = list(dict.fromkeys(response_lines))  # Removing duplicates
        cleaned_response = '\n'.join(unique_lines)
        return cleaned_response
    except Exception as e:
        st.error(f"Error during LLM response generation: {e}")
        return "Error"

6. Text-to-Speech

We create a function that converts text to speech with the help of the ElevenLabs TTS client, returning the audio as an AudioSegment. We could also use other TTS models, such as Nari Labs' Dia or Google's gTTS. ElevenLabs starts you off with some free credits and charges after that, while gTTS is completely free.

def text_to_speech(text: str, retries: int = 3, delay: int = 5):
    attempt = 0
    while attempt < retries:
        try:
            # Request speech synthesis (streaming generator)
            response_stream = tts_client.text_to_speech.convert(
                text=text,
                voice_id="JBFqnCBsd6RMkjVDRZzb",
                model_id="eleven_multilingual_v2",
                output_format="mp3_44100_128",
            )
            # Write streamed bytes to a temporary file
            with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as f:
                for chunk in response_stream:
                    f.write(chunk)
                temp_path = f.name
            # Load and return the audio
            audio = AudioSegment.from_mp3(temp_path)
            return audio
        except requests.ConnectionError:
            # Network hiccup: wait, then retry
            attempt += 1
            if attempt < retries:
                time.sleep(delay)
        except Exception as e:
            st.error(f"Error during text-to-speech conversion: {e}")
            return AudioSegment.silent(duration=1000)
    st.error(f"Failed to connect after {retries} attempts. Please check your internet connection.")
    return AudioSegment.silent(duration=1000)

7. Creating the Introductory Message

We also create an introductory message and pass it to our TTS model, since a responder typically introduces themselves and asks how they can help. Here we return the path to the generated mp3 file.

If you opt for gTTS instead of ElevenLabs, two of its parameters are worth knowing (see the sketch below):

lang="en" -> English

tld="co.in" -> produces a localized "accent" for the given language; the default is "com"
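For comparison, a minimal gTTS version of this function might look like the following. This is a sketch of our own, not part of the original project, and the function name is ours:

# Hypothetical gTTS alternative: free, but with a more synthetic voice
from gtts import gTTS
import tempfile

def create_welcome_message_gtts(welcome_text: str) -> str:
    tts = gTTS(text=welcome_text, lang="en", tld="co.in")  # Indian-English accent
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as f:
        tts.save(f.name)
        return f.name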

def create_welcome_message():
    welcome_text = (
        "Hello, you’ve reached the Emergency Help Desk. "
        "Please let me know if it's a medical, fire, police, or mental health emergency—"
        "I'm here to guide you right away."
    )
    try:
        # Request speech synthesis (streaming generator)
        response_stream = tts_client.text_to_speech.convert(
            text=welcome_text,
            voice_id="JBFqnCBsd6RMkjVDRZzb",
            model_id="eleven_multilingual_v2",
            output_format="mp3_44100_128",
        )
        # Save streamed bytes to a temp file and return its path
        with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as f:
            for chunk in response_stream:
                f.write(chunk)
            return f.name
    except requests.ConnectionError:
        st.error("Failed to generate welcome message due to connection error.")
    except Exception as e:
        st.error(f"Error creating welcome message: {e}")
    return None

The Streamlit App

Now let's jump to the app.py script, where we use Streamlit to visualize our chatbot.

Importing Libraries and Functions

Import the libraries and the functions we created in utils.py:

import tempfile
import re  # This can be removed if not used
from io import BytesIO
from pydub import AudioSegment
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
import streamlit as st
from audio_recorder_streamlit import audio_recorder
from utils import *

Streamlit Setup

Now we set the title and a fitting "Emergency" visual:

st.title(":blue[Emergency Help Bot] 🚨🚑🆘")
st.sidebar.image('./emergency.png', use_column_width=True)

We set up session state to keep track of the chat and audio content:

if "chat_history" not in st.session_state:
    st.session_state.chat_history = []
if "chat_histories" not in st.session_state:
    st.session_state.chat_histories = []
if "played_audios" not in st.session_state:
    st.session_state.played_audios = {}

Calling the Utility Functions

We create the responder's welcome message, which opens the conversation:

if len(st.session_state.chat_history) == 0:
    welcome_audio_path = create_welcome_message()
    st.session_state.chat_history = [AIMessage(
        content="Hello, you’ve reached the Emergency Help Desk. Please let me know if it's a medical, fire, police, or mental health emergency—I'm here to guide you right away.",
        audio_file=welcome_audio_path
    )]
    st.session_state.played_audios[welcome_audio_path] = False

Sidebar Setup

Now we wire up the audio recorder and the speech-to-text, LLM-response, and text-to-speech logic in the sidebar. This is the core of the project. (The last few lines, which persist the spoken reply, are a reconstruction, and the speech_to_text helper is sketched after the code.)

with st.sidebar:
    audio_bytes = audio_recorder(
        energy_threshold=0.01,
        pause_threshold=0.8,
        text="Speak on clicking the ICON (Max 5 min) \n",
        recording_color="#e9b61d",   # yellow
        neutral_color="#2abf37",     # green
        icon_name="microphone",
        icon_size="2x"
    )
    if audio_bytes:
        temp_audio_path = audio_bytes_to_wav(audio_bytes)
        if temp_audio_path:
            try:
                user_input = speech_to_text(audio_bytes)
                if user_input:
                    st.session_state.chat_history.append(HumanMessage(content=user_input, audio_file=temp_audio_path))
                    response = get_llm_response(user_input, st.session_state.chat_history)
                    audio_response = text_to_speech(response)
                    # Reconstruction sketch: save the spoken reply to a temp mp3
                    # and log it in the chat history
                    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as f:
                        audio_response.export(f.name, format="mp3")
                        st.session_state.chat_history.append(AIMessage(content=response, audio_file=f.name))
            except Exception as e:
                st.error(f"Error processing the recording: {e}")
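The speech_to_text helper is referenced above but never shown in the article. A minimal sketch for utils.py, assuming Groq's hosted Whisper model is named "whisper-large-v3", could look like this:

# Hypothetical helper: transcribe recorded audio with Groq-hosted Whisper
def speech_to_text(audio_bytes):
    wav_path = audio_bytes_to_wav(audio_bytes)
    if wav_path is None:
        return None
    try:
        with open(wav_path, "rb") as audio_file:
            transcription = client.audio.transcriptions.create(
                file=(wav_path, audio_file.read()),
                model="whisper-large-v3",
            )
        return transcription.text.strip()
    except Exception as e:
        st.error(f"Error during speech-to-text: {e}")
        return None

For recordings above the API's size limit, the split_audio helper from earlier could be used to transcribe the audio chunk by chunk.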

We also add a sidebar button to restart the session when needed, again opening with the responder's introductory voice note:

if st.button("Start New Chat"):
    st.session_state.chat_histories.append(st.session_state.chat_history)
    welcome_audio_path = create_welcome_message()
    st.session_state.chat_history = [AIMessage(
        content="Hello, you’ve reached the Emergency Help Desk. Please let me know if it's a medical, fire, police, or mental health emergency—I'm here to guide you right away.",
        audio_file=welcome_audio_path
    )]

On the app's main page, we display the chat history as click-to-play audio files:

for msg in st.session_state.chat_history:
    if isinstance(msg, AIMessage):
        with st.chat_message("AI"):
            st.audio(msg.audio_file, format="audio/mp3")
    else:  # HumanMessage
        with st.chat_message("user"):
            st.audio(msg.audio_file, format="audio/wav")

With that, all the Python scripts needed to run the application are complete. We launch the Streamlit app with:

streamlit run app.py

This is our project workflow:

[User speaks] → audio_recorder → audio_bytes_to_wav → speech_to_text → get_llm_response → text_to_speech → st.audio

For the complete code, visit this GitHub repository.

Final Output

The Streamlit app


The Streamlit app looks clean and works as expected!

Let's look at some of its responses.

1. User: Hi, someone is having a heart attack right now, what should I do?

We then had a short exchange about the person's location and condition, after which the chatbot gave the following guidance.

2. User: Hello, there has been a huge fire breakout in Delhi. Please send help quick

The responder asked about the situation and my current location, then provided the appropriate precautions.

3. User: Hey there, there is a person standing alone across the edge of the bridge, how should I proceed?

The responder asked for my location and the apparent mental state of the person I mentioned.

In short, our chatbot responds to our queries according to the situation and asks relevant follow-up questions so it can suggest precautions.

What Improvements Can Be Made?

  • Multilingual support: an LLM with strong multilingual capabilities could be integrated so the chatbot interacts seamlessly with users across regions and dialects (see the sketch after this list).
  • Real-time transcription and translation: adding live speech-to-text and on-the-fly translation would help break down communication barriers.
  • Location-based services: by integrating GPS or other real-time location APIs, the system could detect the user's position and direct them to the nearest emergency facility.
  • Speech-to-speech interaction: we could also use speech-to-speech models, which are designed for exactly this kind of workload and would make the conversation feel more natural.
  • Fine-tuning the LLM: custom fine-tuning on emergency-specific data would improve the model's understanding and yield more accurate responses.
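As a taste of the first idea, here is a minimal sketch of a multilingual TTS path built on gTTS. It is an illustration only; the function name and language map are ours, and a production system would take the language from Whisper's detected language rather than hard-coding it:

# Hypothetical sketch: speak the reply in the caller's language via gTTS
from gtts import gTTS
import tempfile

SUPPORTED_LANGS = {"english": "en", "hindi": "hi", "tamil": "ta"}  # illustrative subset

def speak_reply(text: str, language: str = "english") -> str:
    lang_code = SUPPORTED_LANGS.get(language.lower(), "en")  # fall back to English
    tts = gTTS(text=text, lang=lang_code)
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as f:
        tts.save(f.name)
        return f.name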

Conclusion

In this article, we combined several AI models and related tools to build a working voice-based emergency response chatbot. It replicates the role of a trained emergency operator, handling high-pressure situations from medical crises and fire incidents to mental health support in a calm, assertive manner. Because we can reshape the LLM's behavior for different real-world emergencies, the experience stays realistic across both urban and rural scenarios.
