How to Build an Emergency Operator Voice Chatbot

Language models are evolving rapidly. Now that multimodal LLMs sit at the front of this race, it is worth understanding how to put their capabilities to use. We are moving from traditional text-based AI chatbots to voice-based ones that act like personal assistants, ready whenever we need them. In this article, we will build an emergency operator voice chatbot. The idea is simple:

  • We speak to the chatbot
  • The chatbot understands what we said
  • It responds with a voice note


Our Use Case

Let's imagine a real-world scenario. We live in a country with a population of more than 1.4 billion, and at that scale emergencies are bound to happen, whether they involve medical problems, fire outbreaks, police intervention, or mental health support such as suicide prevention.

In such moments, every second counts. Add to that the shortage of emergency operators and the sheer volume of incoming calls. This is where a voice chatbot can shine, providing quick, spoken assistance when people need it most.

  • Emergency assistance: immediate help for health, fire, crime, or disaster-related issues without waiting for a human operator (when one cannot be reached).
  • Mental health helpline: a voice-based emotional support assistant that guides users with compassion.
  • Rural accessibility: regions with limited access to mobile apps can benefit from a simple voice interface, since people there typically communicate by speaking.

That is exactly what we are going to build. We will play the role of a person seeking help, while the chatbot, powered by a large language model, plays the role of an emergency responder.

Tools We Will Use

To implement our voice chatbot, we will use the AI models listed below:

  • Whisper (Large) – OpenAI's speech-to-text model, run via GroqCloud, converts our speech into text.
  • GPT-4.1-mini – served by CometAPI (a free LLM provider), acts as the brain of our chatbot, understanding our queries and generating meaningful replies.
  • Google Text-to-Speech (gTTS) – converts the chatbot's replies into speech so it can talk back to us.
  • FFmpeg – a handy library that helps us record and manage audio easily.

Requirements

Before we start coding, we need to set up a few things:

  1. GroqCloud API key
    Get it here: https://console.groq.com/keys
  2. CometAPI key
    Sign up and store your API key from: https://api.cometapi.com/
  3. ElevenLabs API key
    Sign up and store your API key from: https://elevenlabs.io/app/home
  4. Install FFmpeg
    If it is not installed yet, follow this guide to install FFmpeg on your system: https://itsfoss.com/ffmpeg/

Confirm the installation by running "ffmpeg -version" in your terminal.

With these in place, you are ready to build your own voice chatbot!

Project Structure

The project structure is straightforward; most of the work happens in the app.py and utils.py Python scripts.

VOICE-CHATBOT/

├── venv/                  # Virtual environment for dependencies
├── .env                   # Environment variables (API keys, etc.)
├── app.py                 # Main application script
├── emergency.png          # Emergency-related image asset
├── README.md              # Project documentation (optional)
├── requirements.txt       # Python dependencies
├── utils.py               # Utility/helper functions

We need to edit a couple of files to make sure all dependencies are in place:

In the .env file:

GROQ_API_KEY = "<your-groq-api-key>"
COMET_API_KEY = "<your-comet-api-key>"
ELEVENLABS_API_KEY = "<your-elevenlabs-api-key>"

In requirements.txt:

ffmpeg-python
pydub
pyttsx3
langchain
langchain-community
langchain-core
langchain-groq
langchain_openai
python-dotenv
streamlit==1.37.0
audio-recorder-streamlit
elevenlabs
gtts

Setting Up a Virtual Environment

We should also set up a virtual environment (a good practice). We will do this in the terminal.

  1. Create the virtual environment
~/Desktop/Emergency-Voice-Chatbot$ conda create -p venv python==3.12 -y


  2. Activate the virtual environment
~/Desktop/Emergency-Voice-Chatbot$ conda activate venv/

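  3. Install the dependencies (a step the article implies but does not show; assuming pip inside the activated environment)

~/Desktop/Emergency-Voice-Chatbot$ pip install -r requirements.txt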

  4. After running the application, you can also deactivate the virtual environment
~/Desktop/Emergency-Voice-Chatbot$ conda deactivate


The Main Python Scripts

Let's start by understanding the utils.py script.

1. Key Imports

  • time, tempfile, os, re, BytesIO – handle timing, temporary files, environment variables, regex, and in-memory data.
  • requests – performs HTTP requests (e.g., calling APIs).
  • gTTS, elevenlabs, pydub – convert text to speech and play/manipulate audio.
  • groq, langchain_* – use Groq/OpenAI LLMs with LangChain to process and generate text.
  • streamlit – builds the interactive web app.
  • dotenv – loads environment variables (like API keys) from the .env file.
import time
import requests
import tempfile
import re
from io import BytesIO
from gtts import gTTS
from elevenlabs.client import ElevenLabs
from elevenlabs import play
from pydub import AudioSegment
from groq import Groq
from langchain_groq import ChatGroq
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
import streamlit as st
import os
from dotenv import load_dotenv
load_dotenv()

2. Loading the API Keys and Initializing the Models

# Initialize the Groq client (used for Whisper speech-to-text)
client = Groq(api_key=os.getenv('GROQ_API_KEY'))

# Initialize the ElevenLabs client for text-to-speech
# (not shown in the original snippet, but the TTS helpers below rely on it)
tts_client = ElevenLabs(api_key=os.getenv('ELEVENLABS_API_KEY'))

# Initialize the LLM (GPT-4.1-mini served through CometAPI)
llm = ChatOpenAI(
    model_name="gpt-4.1-mini",
    openai_api_key=os.getenv("COMET_API_KEY"),
    openai_api_base="https://api.cometapi.com/v1"
)

# Set the path to the ffmpeg executable (adjust to your system, e.g. /usr/bin/ffmpeg)
AudioSegment.converter = "/bin/ffmpeg"

3. Converting the Audio File (Our Voice Recording) to .wav Format

Here we take the raw audio bytes via BytesIO, load them with AudioSegment, and export them to wav format:

def audio_bytes_to_wav(audio_bytes):
    try:
        with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as temp_wav:
            audio = AudioSegment.from_file(BytesIO(audio_bytes))
            # Downsample to reduce file size if needed
            audio = audio.set_frame_rate(16000).set_channels(1)
            audio.export(temp_wav.name, format="wav")
            return temp_wav.name
    except Exception as e:
        st.error(f"Error during WAV file conversion: {e}")
        return None

4. Splitting the Audio

We will create a function that splits the audio according to an input parameter (chunk_length_ms); a short usage sketch follows the code. We will also add a small regex helper to strip out punctuation.

def split_audio(file_path, chunk_length_ms):
    audio = AudioSegment.from_wav(file_path)
    return [audio[i:i + chunk_length_ms] for i in range(0, len(audio), chunk_length_ms)]


def remove_punctuation(text):
    return re.sub(r'[^\w\s]', '', text)
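
For example, to cut a long recording into one-minute pieces (a hypothetical usage; note that the helper takes its chunk length in milliseconds):

# Cut recording.wav (hypothetical file) into 60-second chunks
chunks = split_audio("recording.wav", chunk_length_ms=60_000)
for idx, chunk in enumerate(chunks):
    chunk.export(f"chunk_{idx}.wav", format="wav")  # save each piece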

5. Generating the LLM Response

Now we implement the main responder function, where the LLM generates an appropriate reply to our query. In the prompt template, we give the LLM instructions on how it should respond. We will wire this up with the LangChain Expression Language (LCEL).

def get_llm_response(query, chat_history):
    try:
        template = """
You are an experienced Emergency Response Phone Operator trained to handle critical situations in India.
Your role is to guide users calmly and clearly during emergencies involving:
- Medical crises (injuries, heart attacks, etc.)
- Fire incidents
- Police/law enforcement assistance
- Suicide prevention or mental health crises
You must:
1. **Remain calm and assertive**, as if speaking on a phone call.
2. **Ask for and confirm key details** like location, condition of the person, number of people involved, etc.
3. **Provide immediate and practical steps** the user can take before help arrives.
4. **Share accurate, India-based emergency helpline numbers** (e.g., 112, 102, 108, 1091, 1098, 9152987821, etc.).
5. **Prioritize user safety**, and clearly instruct them what *not* to do as well.
6. If the situation involves **suicidal thoughts or mental distress**, respond with compassion and direct them to appropriate mental health helplines and safety actions.
If the user's query is not related to an emergency, respond with:
"I can only assist with urgent emergency-related issues. Please contact a general support line for non-emergency questions."
Use an authoritative, supportive tone, short and direct sentences, and tailor your guidance to **urban and rural Indian contexts**.
**Chat History:** {chat_history}
**User:** {user_query}
"""
        prompt = ChatPromptTemplate.from_template(template)
        chain = prompt | llm | StrOutputParser()
        response_gen = chain.stream({
            "chat_history": chat_history,
            "user_query": query
        })
        response_text = ''.join(list(response_gen))
        response_text = remove_punctuation(response_text)
        # Remove repeated lines from the response
        response_lines = response_text.split('\n')
        unique_lines = list(dict.fromkeys(response_lines))  # Removing duplicates
        cleaned_response = '\n'.join(unique_lines)
        return cleaned_response
    except Exception as e:
        st.error(f"Error during LLM response generation: {e}")
        return "Error"

6. Text-to-Speech

With the help of the ElevenLabs TTS client, we build a function that converts text to speech and returns the audio as an AudioSegment. We could also use other TTS models, such as Nari Labs' Dia or Google's gTTS. ElevenLabs gives you some free credits to start with and charges once those run out, while gTTS is completely free.

def text_to_speech(text: str, retries: int = 3, delay: int = 5):
    attempt = 0
    while attempt < retries:
        try:
            # Request speech synthesis (streaming generator)
            response_stream = tts_client.text_to_speech.convert(
                text=text,
                voice_id="JBFqnCBsd6RMkjVDRZzb",
                model_id="eleven_multilingual_v2",
                output_format="mp3_44100_128",
            )
            # Write streamed bytes to a temporary file
            with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as f:
                for chunk in response_stream:
                    f.write(chunk)
                temp_path = f.name
            # Load and return the audio
            audio = AudioSegment.from_mp3(temp_path)
            return audio
        except requests.ConnectionError:
            attempt += 1
            if attempt >= retries:
                st.error(f"Failed to connect after {retries} attempts. Please check your internet connection.")
                return AudioSegment.silent(duration=1000)
            time.sleep(delay)  # Wait before retrying
        except Exception as e:
            st.error(f"Error during text-to-speech conversion: {e}")
            return AudioSegment.silent(duration=1000)
    return AudioSegment.silent(duration=1000)

7. Creating the Welcome Message

We also create an introductory message and pass it to our TTS model, since an operator usually introduces themselves and asks how they can help. Here we return the path to the mp3 file.

If you use gTTS for this instead, two parameters matter (see the short sketch below):

lang="en" -> English

tld="co.in" -> generates a localized "accent" for a given language; the default is "com"
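A minimal gTTS sketch using these parameters (the output filename is hypothetical; the project code below uses ElevenLabs instead):

from gtts import gTTS

# Generate the welcome line with an Indian-English accent (tld="co.in")
tts = gTTS(
    text="Hello, you've reached the Emergency Help Desk.",
    lang="en",
    tld="co.in",
)
tts.save("welcome.mp3")  # hypothetical output path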

def create_welcome_message():
    welcome_text = (
        "Hello, you’ve reached the Emergency Help Desk. "
        "Please let me know if it's a medical, fire, police, or mental health emergency—"
        "I'm here to guide you right away."
    )
    try:
        # Request speech synthesis (streaming generator)
        response_stream = tts_client.text_to_speech.convert(
            text=welcome_text,
            voice_id="JBFqnCBsd6RMkjVDRZzb",
            model_id="eleven_multilingual_v2",
            output_format="mp3_44100_128",
        )
        # Save streamed bytes to a temp file
        with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as f:
            for chunk in response_stream:
                f.write(chunk)
            return f.name
    except requests.ConnectionError:
        st.error("Failed to generate welcome message due to connection error.")
    except Exception as e:
        st.error(f"Error creating welcome message: {e}")
    return None
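
One more helper is needed: app.py later calls a speech_to_text function to transcribe the recording with Whisper (Large) on GroqCloud, but the article does not list it in this section. A minimal sketch, assuming Groq's audio transcription endpoint and the whisper-large-v3 model name, could look like this:

def speech_to_text(audio_bytes):
    try:
        # Reuse the WAV conversion helper defined above
        wav_path = audio_bytes_to_wav(audio_bytes)
        if wav_path is None:
            return None
        # Transcribe with Whisper running on GroqCloud
        with open(wav_path, "rb") as f:
            transcription = client.audio.transcriptions.create(
                file=(os.path.basename(wav_path), f.read()),
                model="whisper-large-v3",
            )
        return transcription.text.strip()
    except Exception as e:
        st.error(f"Error during speech-to-text conversion: {e}")
        return None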

The Streamlit Application

Now let's jump to the app.py script, where we use Streamlit to give our chatbot an interface.

Importing Libraries and Functions

Import the libraries and the functions we created in utils.py:

import tempfile
import re  # This can be removed if not used
from io import BytesIO
from pydub import AudioSegment
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
import streamlit as st
from audio_recorder_streamlit import audio_recorder
from utils import *

Streamlit Setup

Now we set the title and a nice "Emergency" visual:

st.title(":blue[Emergency Help Bot] 🚨🚑🆘")
st.sidebar.image('./emergency.png', use_column_width=True)

We set up session state to keep track of the chat and the audio files:

if "chat_history" not in st.session_state:
st.session_state.chat_history = []
if "chat_histories" not in st.session_state:
st.session_state.chat_histories = []
if "played_audios" not in st.session_state:
st.session_state.played_audios = {}

Calling the Utility Functions

We create the operator's welcome message, which opens the conversation:

if len(st.session_state.chat_history) == 0:
    welcome_audio_path = create_welcome_message()
    st.session_state.chat_history = [AIMessage(content="Hello, you’ve reached the Emergency Help Desk. Please let me know if it's a medical, fire, police, or mental health emergency—I'm here to guide you right away.", audio_file=welcome_audio_path)]
    st.session_state.played_audios[welcome_audio_path] = False

Sidebar Setup

Now we set up the audio recorder and the speech_to_text → get_llm_response → text_to_speech logic in the sidebar; this is the core of the project:

with st.sidebar:
    audio_bytes = audio_recorder(
        energy_threshold=0.01,
        pause_threshold=0.8,
        text="Speak on clicking the ICON (Max 5 min) \n",
        recording_color="#e9b61d",   # yellow
        neutral_color="#2abf37",     # green
        icon_name="microphone",
        icon_size="2x"
    )
    if audio_bytes:
        temp_audio_path = audio_bytes_to_wav(audio_bytes)
        if temp_audio_path:
            try:
                user_input = speech_to_text(audio_bytes)
                if user_input:
                    st.session_state.chat_history.append(HumanMessage(content=user_input, audio_file=temp_audio_path))
                    response = get_llm_response(user_input, st.session_state.chat_history)
                    audio_response = text_to_speech(response)
                    # Assumed continuation (the article truncates the snippet here):
                    # save the reply audio and append it to the chat history
                    with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as f:
                        audio_response.export(f.name, format="mp3")
                        st.session_state.chat_history.append(AIMessage(content=response, audio_file=f.name))
            except Exception as e:
                st.error(f"Error processing the recording: {e}")

We also add a sidebar button to restart the session when necessary, again opening with the operator's introductory voice note:

if st.button("Start New Chat"):
    st.session_state.chat_histories.append(st.session_state.chat_history)
    welcome_audio_path = create_welcome_message()
    st.session_state.chat_history = [AIMessage(content="Hello, you’ve reached the Emergency Help Desk. Please let me know if it's a medical, fire, police, or mental health emergency—I'm here to guide you right away.", audio_file=welcome_audio_path)]

On the app's main page, we display the chat history as click-to-play audio files:

for msg in st.session_state.chat_history:
    if isinstance(msg, AIMessage):
        with st.chat_message("AI"):
            st.audio(msg.audio_file, format="audio/mp3")
    else:  # HumanMessage
        with st.chat_message("user"):
            st.audio(msg.audio_file, format="audio/wav")

We have now completed all the Python scripts needed to run the application. We launch the Streamlit app with the following command:

streamlit run app.py

This is our project's workflow:

[User speaks] → audio_recorder → audio_bytes_to_wav → speech_to_text → get_llm_response → text_to_speech → st.audio

For the complete code, visit this GitHub repository.

Final Output

The Streamlit App


The Streamlit app looks clean and works as expected!

Let's look at some of its responses.

1. User: Hi, someone is having a heart attack right now, what should I do?

 

We then talked about the person's location and condition, after which the chatbot provided the following guidance.

2. User: Hello, there has been a huge fire breakout in Delhi. Please send help quick

The operator asked about the situation and my current location, then provided the appropriate precautions.

3. User: Hey there, there is a person standing alone across the edge of the bridge, how should I proceed?

The operator asked for my location and about the mental state of the person I mentioned.

In short, our chatbot can respond to our queries according to the situation and ask relevant follow-up questions before suggesting precautions.

What Improvements Can Be Made?

  • Multilingual support: integrating an LLM with strong multilingual capabilities would let the chatbot interact seamlessly with users across regions and dialects.
  • Real-time transcription and translation: adding live speech-to-text and translation would help remove communication barriers.
  • Location-based services: by integrating GPS or other real-time location APIs, the system could detect the user's location and direct them to the nearest emergency facility.
  • Speech-to-speech interaction: we could also use speech-to-speech models, which are purpose-built for this kind of interaction and would make conversations feel more natural.
  • Fine-tuning the LLM: custom fine-tuning on emergency-specific data could improve the model's understanding and produce more accurate responses.

Conclusion

In this article, we combined AI models with a few related tools to build a voice-based emergency response chatbot. The chatbot replicates the role of a trained emergency operator, handling high-pressure situations ranging from medical crises and fire incidents to mental health support in a calm and assertive way. We were also able to shape the LLM's behavior to fit real-world emergencies, making the experience feel realistic for both urban and rural scenarios.
