
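Have you ever had a huge dataset on your hands and wanted to extract insights from it? Sounds daunting, right? Extracting useful insights, especially from a large dataset, is hard work. Now imagine turning that dataset into an interactive web application without any front-end expertise in data visualization. Gradio, used together with Python, makes this possible with very little code. Data visualization is a powerful way to present insights effectively. In this guide, we explore how to build a modern, interactive data dashboard, focusing on data visualization with Gradio and showing how to build a GUI in Python.
Understanding Gradio
Gradio is an open-source Python library for building web-based interfaces. It is designed to simplify building user interfaces for machine learning models and data applications. You don't need an extensive background in web technologies such as HTML, JavaScript, and CSS: Gradio handles that complexity internally, so you can focus on your Python code.

Gradio vs. Streamlit
Both Streamlit and Gradio let you build web applications with minimal code, but they differ in focus and in how they respond to user input. Understanding these differences helps you choose the right framework for your application.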
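| Aspect | Gradio | Streamlit |
| --- | --- | --- |
| Ease of use | Gradio has a low barrier to entry; its interface and API are simple and intuitive, so beginners can get started quickly. | Streamlit is feature-rich and highly customizable, but its learning curve is comparatively steeper. |
| Primary focus | Focused on quickly building interactive interfaces for machine learning / AI models. | More of a general-purpose application framework, suited to a broader range of data applications. |
| Response model | Components usually update after a specific action (such as a button click), and can also be configured to refresh live. | Uses a whole-script response model: any input change triggers a rerun of the entire script. |
| Strengths | Well suited to quick model demos and lightweight visualization tools. | Strong for data-driven applications and complex interactive data dashboards. |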

Both tools can be used to build interactive dashboards. Which one to choose depends on the specific needs of your project.
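To make Gradio's event-driven response model more concrete, here is a minimal sketch, assuming Gradio is installed. The function `greet` and all component names are hypothetical and are not part of the dashboard built in this guide. One handler fires only on a button click, the other on every change of the textbox.

```python
def greet(name: str) -> str:
    """Pure function called by the UI; it has no Gradio dependency."""
    cleaned = name.strip()
    return f"Hello, {cleaned}!" if cleaned else "Hello!"


def build_demo():
    """Build a small Blocks app (illustrative only, not the dashboard)."""
    import gradio as gr  # imported lazily so greet() works without Gradio

    with gr.Blocks() as demo:
        name_box = gr.Textbox(label="Name")
        on_click_out = gr.Textbox(label="Updated on button click")
        live_out = gr.Textbox(label="Updated on every change")
        run_btn = gr.Button("Greet")

        # Explicit action: runs only when the button is clicked.
        run_btn.click(fn=greet, inputs=name_box, outputs=on_click_out)
        # Live refresh: runs whenever the textbox value changes.
        name_box.change(fn=greet, inputs=name_box, outputs=live_out)
    return demo
```

Calling `build_demo().launch()` would start a local server. Only the components listed in `outputs` are updated, which is the main contrast with rerunning a whole script.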
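Steps to Build an Interactive Dashboard
Let's walk through the key steps for building this interactive dashboard.
1. Get the Data
A key step before building the dashboard is obtaining the underlying data to visualize. We will prepare a synthetic CSV file for our Python Gradio dashboard. It contains 100,000 records simulating website user engagement. Each record represents one user session or one significant interaction.
Here is an illustrative sample of the CSV file's layout: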
| timestamp | user_id | page_visited | session_duration_seconds | country | device_type | browser |
| --- | --- | --- | --- | --- | --- | --- |
| 2023-01-15 10:30:00 | U1001 | /home | 120 | USA | Desktop | Chrome |
| 2023-01-15 10:32:00 | U1002 | /products | 180 | Canada | Mobile | Safari |
| 2023-01-15 10:35:00 | U1001 | /contact | 90 | USA | Desktop | Chrome |
| … | … | … | … | … | … | … |
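You can generate this kind of data with the following Python code. Here we generate one file for demonstration purposes. Make sure numpy and pandas are installed. Note that the generated user IDs take the form User_1234 rather than the U1001 style shown in the sample.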
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

def generate_website_data(nrows: int, filename: str):
    # Possible values for categorical fields
    pages = ["/home", "/products", "/services", "/about", "/contact", "/blog"]
    countries = ["USA", "Canada", "UK", "Germany", "France", "India", "Australia"]
    device_types = ["Desktop", "Mobile", "Tablet"]
    browsers = ["Chrome", "Firefox", "Safari", "Edge", "Opera"]

    # Generate random data
    user_ids = [f"User_{i}" for i in np.random.randint(1000, 2000, size=nrows)]
    page_visited_data = np.random.choice(pages, size=nrows)
    session_durations = np.random.randint(30, 1800, size=nrows)  # Session duration between 30s and 30min
    country_data = np.random.choice(countries, size=nrows)
    device_type_data = np.random.choice(device_types, size=nrows)
    browser_data = np.random.choice(browsers, size=nrows)

    # Generate random timestamps over the last two years
    end_t = datetime.now()
    start_t = end_t - timedelta(days=730)
    time_range_seconds = int((end_t - start_t).total_seconds())
    timestamps_data = []
    for _ in range(nrows):
        random_seconds = np.random.randint(0, time_range_seconds)
        # int() guards against NumPy integer types, which timedelta may reject
        timestamp = start_t + timedelta(seconds=int(random_seconds))
        timestamps_data.append(timestamp.strftime('%Y-%m-%d %H:%M:%S'))

    # Define columns for the DataFrame
    columns = {
        "timestamp": timestamps_data,
        "user_id": user_ids,
        "page_visited": page_visited_data,
        "session_duration_seconds": session_durations,
        "country": country_data,
        "device_type": device_type_data,
        "browser": browser_data,
    }

    # Create Pandas DataFrame
    df = pd.DataFrame(columns)

    # Sort by timestamp
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df = df.sort_values(by="timestamp").reset_index(drop=True)

    # Write to CSV
    df.to_csv(filename, index=False)
    print(f"{nrows} rows of data generated and saved to {filename}")

# Generate 100,000 rows of data
generate_website_data(100_000, "website_engagement_data.csv")
Output:
100000 rows of data generated and saved to website_engagement_data.csv
After running this code, you will see the output above, and a CSV file containing the data will be generated.
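The generator builds timestamps in a Python loop, one random draw per row. As an optional alternative, the sketch below draws all offsets in a single vectorized call. The function name `random_timestamps` and its `seed` parameter are assumptions made for illustration; they are not part of the script above.

```python
import numpy as np
import pandas as pd


def random_timestamps(nrows: int, days: int = 730, seed: int = 0) -> pd.Series:
    """Draw nrows random timestamps from the last `days` days, sorted ascending."""
    rng = np.random.default_rng(seed)
    end_t = pd.Timestamp.now().floor("s")
    start_t = end_t - pd.Timedelta(days=days)
    span_seconds = int((end_t - start_t).total_seconds())
    # One vectorized draw instead of nrows separate randint calls.
    offsets = rng.integers(0, span_seconds, size=nrows)
    stamps = start_t + pd.to_timedelta(offsets, unit="s")
    return pd.Series(stamps).sort_values().reset_index(drop=True)


timestamps = random_timestamps(1_000)
print(timestamps.head())
```

Because the values are already datetimes, they can go straight into the DataFrame without the `strftime` and `pd.to_datetime` round trip.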
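2. Install Gradio
Installing Gradio with pip is straightforward. A dedicated Python environment is recommended; you can create an isolated one with tools such as venv or conda. Gradio 4.x requires Python 3.8 or later, while newer releases require a more recent Python (3.10 or later for Gradio 5), so check the requirements of the version you install.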
python -m venv gradio_env
source gradio_env/bin/activate # On Linux/macOS
.\gradio_env\Scripts\activate # On Windows
Install the required libraries
pip install gradio pandas plotly cachetools
Now that all dependencies are installed, let's build the dashboard step by step.
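Before moving on, it can help to confirm that the packages landed in the active environment. The following is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


required = ["gradio", "pandas", "plotly", "cachetools"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If any entry prints as NOT INSTALLED, the virtual environment is probably not activated in the current shell.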
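3. Import the Required Libraries
First, create an app.py file and import the libraries needed to build the interactive dashboard. We will use Plotly for the Gradio data visualizations, and cachetools to cache expensive function calls and improve performance.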
import gradio as gr
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, date
from cachetools import cached, TTLCache
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="plotly")
warnings.filterwarnings("ignore", category=UserWarning, module="plotly")
4. Load the CSV Data
Let's load the generated CSV file. Make sure the CSV file is in the same directory as your app.py file.
# --- Load CSV data ---
import os

DATA_FILE = "website_engagement_data.csv"  # Make sure this file is generated and in the same directory or provide full path
raw_data = None

def load_engagement_data():
    global raw_data
    try:
        # The app does not generate data itself; run the generator script first
        if not os.path.exists(DATA_FILE):
            print(f"{DATA_FILE} not found.")
            print(f"Please generate '{DATA_FILE}' using the provided script first if it's missing.")
            raw_data = pd.DataFrame()  # Keep the global consistent with the return value
            return raw_data
        dtype_spec = {
            'user_id': 'string',
            'page_visited': 'category',
            'session_duration_seconds': 'int32',
            'country': 'category',
            'device_type': 'category',
            'browser': 'category'
        }
        raw_data = pd.read_csv(
            DATA_FILE,
            parse_dates=["timestamp"],
            dtype=dtype_spec,
            low_memory=False
        )
        # Ensure timestamp is datetime
        raw_data['timestamp'] = pd.to_datetime(raw_data['timestamp'])
        print(f"Data loaded successfully: {len(raw_data)} rows.")
    except FileNotFoundError:
        print(f"Error: The file {DATA_FILE} was not found.")
        raw_data = pd.DataFrame()  # Return empty dataframe if file not found
    except Exception as e:
        print(f"An error occurred while loading data: {e}")
        raw_data = pd.DataFrame()
    return raw_data

# Load data at script startup
load_engagement_data()
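The loader reads low-cardinality columns such as country as `category`. The sketch below illustrates why that helps on a column of 100,000 values; the data is randomly generated for the comparison and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
countries = ["USA", "Canada", "UK", "Germany", "France", "India", "Australia"]

# The same 100,000 values stored as Python strings and as categories.
as_object = pd.Series(rng.choice(countries, size=100_000), dtype="object")
as_category = as_object.astype("category")

object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(f"object:   {object_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
```

A categorical column stores each distinct label once plus a small integer code per row, so the saving grows with the number of rows.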
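5. Caching and Utility Functions
These functions set up a cache for expensive calls, so repeated requests are served quickly, and provide helpers that return the available filter values and the date range of the data.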
# --- Caching and Utility Functions ---
# Cache for expensive function calls to improve performance
ttl_cache = TTLCache(maxsize=100, ttl=300)  # Cache up to 100 items, expire after 5 minutes

@cached(ttl_cache)
def get_unique_filter_values():
    if raw_data is None or raw_data.empty:
        return [], [], []
    pages = sorted(raw_data['page_visited'].dropna().unique().tolist())
    devices = sorted(raw_data['device_type'].dropna().unique().tolist())
    countries = sorted(raw_data['country'].dropna().unique().tolist())
    return pages, devices, countries

def get_date_range_from_data():
    if raw_data is None or raw_data.empty:
        return date.today(), date.today()
    min_dt = raw_data['timestamp'].min().date()
    max_dt = raw_data['timestamp'].max().date()
    return min_dt, max_dt
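To illustrate the idea behind a time-to-live cache, here is a simplified standard-library sketch. It is not how cachetools is implemented (it has no `maxsize` eviction, for example), and the names `ttl_cached` and `slow_square` are made up for this example.

```python
import time
from functools import wraps


def ttl_cached(ttl_seconds: float, clock=time.monotonic):
    """Cache results per positional-argument tuple for ttl_seconds."""
    def decorator(func):
        store = {}  # args -> (expiry_time, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args)
            store[args] = (now + ttl_seconds, value)
            return value

        return wrapper
    return decorator


calls = []


@ttl_cached(ttl_seconds=300)
def slow_square(x):
    calls.append(x)  # records every real computation
    return x * x


slow_square(4)
slow_square(4)  # served from the cache, so calls stays [4]
```

As with `cachetools.cached`, the arguments form the cache key, so they must be hashable. Dates and strings qualify; DataFrames do not.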
6. Data Filtering and Key Metric Functions
The following function filters the data based on the user's inputs or actions on the dashboard.
# --- Data Filtering Function ---
def filter_engagement_data(start_date_dt, end_date_dt, selected_page, selected_device, selected_country):
    if raw_data is None or raw_data.empty:
        return pd.DataFrame()

    # Ensure dates are datetime.date objects if they are strings
    if isinstance(start_date_dt, str):
        start_date_dt = datetime.strptime(start_date_dt, '%Y-%m-%d').date()
    if isinstance(end_date_dt, str):
        end_date_dt = datetime.strptime(end_date_dt, '%Y-%m-%d').date()

    # Convert dates to datetime for comparison with timestamp column
    start_datetime = datetime.combine(start_date_dt, datetime.min.time())
    end_datetime = datetime.combine(end_date_dt, datetime.max.time())
    filtered_df = raw_data[
        (raw_data['timestamp'] >= start_datetime) &
        (raw_data['timestamp'] <= end_datetime)
    ].copy()

    if selected_page != "All Pages" and selected_page is not None:
        filtered_df = filtered_df[filtered_df['page_visited'] == selected_page]
    if selected_device != "All Devices" and selected_device is not None:
        filtered_df = filtered_df[filtered_df['device_type'] == selected_device]
    if selected_country != "All Countries" and selected_country is not None:
        filtered_df = filtered_df[filtered_df['country'] == selected_country]
    return filtered_df
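The filter widens the selected dates to full days with `datetime.combine`, so the end date is inclusive. The short sketch below shows that boundary behavior on four hand-picked timestamps; the `events` frame is made up for illustration.

```python
from datetime import date, datetime

import pandas as pd

events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-14 23:59:59",
        "2023-01-15 00:00:00",
        "2023-01-15 23:59:59",
        "2023-01-16 00:00:00",
    ]),
})

day = date(2023, 1, 15)
start_datetime = datetime.combine(day, datetime.min.time())  # 00:00:00
end_datetime = datetime.combine(day, datetime.max.time())    # 23:59:59.999999

in_range = events[
    (events["timestamp"] >= start_datetime) & (events["timestamp"] <= end_datetime)
]
print(in_range)
```

Only the two rows dated 2023-01-15 survive, including the one at 23:59:59 that a comparison against midnight of the end date would have dropped.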
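The next function calculates the key metrics: total sessions, unique users, average session duration, and the top page by number of visits.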
# --- Function to Calculate Key Metrics ---
@cached(ttl_cache)
def calculate_key_metrics(start_date_dt, end_date_dt, page, device, country):
    df = filter_engagement_data(start_date_dt, end_date_dt, page, device, country)
    if df.empty:
        return 0, 0, 0, "N/A"

    total_sessions = df['user_id'].count()  # Assuming each row is a session/interaction
    unique_users = df['user_id'].nunique()
    avg_session_duration = df['session_duration_seconds'].mean()
    if pd.isna(avg_session_duration):  # Handle case where mean is NaN (e.g., no sessions)
        avg_session_duration = 0

    # Top page by number of visits
    if not df['page_visited'].mode().empty:
        top_page_visited = df['page_visited'].mode()[0]
    else:
        top_page_visited = "N/A"
    return total_sessions, unique_users, round(avg_session_duration, 2), top_page_visited
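As a worked example, the same four metrics can be computed by hand on a tiny frame. The `sample` data below is invented for illustration and is not taken from the generated CSV.

```python
import pandas as pd

sample = pd.DataFrame({
    "user_id": ["U1001", "U1002", "U1001", "U1003"],
    "page_visited": ["/home", "/products", "/home", "/contact"],
    "session_duration_seconds": [120, 180, 90, 30],
})

total_sessions = int(sample["user_id"].count())    # one row per session -> 4
unique_users = int(sample["user_id"].nunique())    # U1001 counted once -> 3
avg_duration = round(float(sample["session_duration_seconds"].mean()), 2)  # 420 / 4
top_page = sample["page_visited"].mode()[0]        # most frequent value

print(total_sessions, unique_users, avg_duration, top_page)
```

When several pages tie for the most visits, `mode()` returns all of them in sorted order, so taking element `[0]` picks the first one alphabetically.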
7. Plotting Functions
Now we create a few plotting functions with Plotly. These charts make the dashboard more detailed and engaging.
# --- Functions for Plotting with Plotly ---
def create_sessions_over_time_plot(start_date_dt, end_date_dt, page, device, country):
    df = filter_engagement_data(start_date_dt, end_date_dt, page, device, country)
    if df.empty:
        fig = go.Figure().update_layout(title_text="No data for selected filters", xaxis_showgrid=False, yaxis_showgrid=False)
        return fig
    # Count rows (sessions) per calendar day
    sessions_by_date = df.groupby(df['timestamp'].dt.date)['user_id'].count().reset_index()
    sessions_by_date.rename(columns={'timestamp': 'date', 'user_id': 'sessions'}, inplace=True)
    fig = px.line(sessions_by_date, x='date', y='sessions', title='User Sessions Over Time')
    fig.update_layout(margin=dict(l=20, r=20, t=40, b=20))
    return fig

def create_engagement_by_device_plot(start_date_dt, end_date_dt, page, device, country):
    df = filter_engagement_data(start_date_dt, end_date_dt, page, device, country)
    if df.empty:
        fig = go.Figure().update_layout(title_text="No data for selected filters", xaxis_showgrid=False, yaxis_showgrid=False)
        return fig
    # observed=True keeps only device categories present after filtering
    device_engagement = df.groupby('device_type', observed=True)['session_duration_seconds'].sum().reset_index()
    device_engagement.rename(columns={'session_duration_seconds': 'total_duration'}, inplace=True)
    fig = px.bar(device_engagement, x='device_type', y='total_duration',
                 title='Total Session Duration by Device Type', color='device_type')
    fig.update_layout(margin=dict(l=20, r=20, t=40, b=20))
    return fig

def create_page_visits_distribution_plot(start_date_dt, end_date_dt, page, device, country):
    df = filter_engagement_data(start_date_dt, end_date_dt, page, device, country)
    if df.empty:
        fig = go.Figure().update_layout(title_text="No data for selected filters", xaxis_showgrid=False, yaxis_showgrid=False)
        return fig
    page_visits = df['page_visited'].value_counts().reset_index()
    page_visits.columns = ['page_visited', 'visits']
    fig = px.pie(page_visits, names='page_visited', values='visits',
                 title='Distribution of Page Visits', hole=0.3)
    fig.update_layout(margin=dict(l=20, r=20, t=40, b=20))
    return fig
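The line chart is built from a per-day count. The sketch below reproduces that aggregation on three invented rows and contrasts it with `resample`, an alternative that is not used in this guide but also emits days without any sessions.

```python
import pandas as pd

sessions = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-15 10:30:00",
        "2023-01-15 10:32:00",
        "2023-01-17 09:00:00",
    ]),
    "user_id": ["U1001", "U1002", "U1001"],
})

# Same aggregation as the line chart: count rows per calendar day.
by_date = (
    sessions.groupby(sessions["timestamp"].dt.date)["user_id"]
    .count()
    .reset_index()
    .rename(columns={"timestamp": "date", "user_id": "sessions"})
)

# Alternative: resample also emits the empty day 2023-01-16 with a count of 0.
daily = sessions.set_index("timestamp")["user_id"].resample("D").count()

print(by_date)
print(daily)
```

With the groupby version, a line chart connects 15 January directly to 17 January. With the resampled series, the gap shows up as a dip to zero.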
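8. Table Display and Data Update Functions
The following functions prepare the data for the table display and update the dashboard values whenever the user changes an input or triggers an action.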
# --- Function to Prepare Data for Table Display ---
def get_data_for_table_display(start_date_dt, end_date_dt, page, device, country):
    df = filter_engagement_data(start_date_dt, end_date_dt, page, device, country)
    if df.empty:
        return pd.DataFrame(columns=['timestamp', 'user_id', 'page_visited', 'session_duration_seconds', 'country', 'device_type', 'browser'])
    # Select and order columns for display
    display_columns = ['timestamp', 'user_id', 'page_visited', 'session_duration_seconds', 'country', 'device_type', 'browser']
    df_display = df[display_columns].copy()
    df_display['timestamp'] = df_display['timestamp'].dt.strftime('%Y-%m-%d %H:%M:%S')  # Format date for display
    return df_display.head(100)  # Display top 100 rows for performance

# --- Main Update Function for the Dashboard ---
def update_full_dashboard(start_date_str, end_date_str, selected_page, selected_device, selected_country):
    if raw_data is None or raw_data.empty:  # Handle case where data loading failed
        empty_fig = go.Figure().update_layout(title_text="Data not loaded", xaxis_showgrid=False, yaxis_showgrid=False)
        empty_df = pd.DataFrame()
        return empty_fig, empty_fig, empty_fig, empty_df, 0, 0, 0.0, "N/A"

    # Convert date strings from Gradio input to datetime.date objects
    start_date_obj = datetime.strptime(start_date_str, '%Y-%m-%d').date() if isinstance(start_date_str, str) else start_date_str
    end_date_obj = datetime.strptime(end_date_str, '%Y-%m-%d').date() if isinstance(end_date_str, str) else end_date_str

    # Get key metrics
    sessions, users, avg_duration, top_page = calculate_key_metrics(
        start_date_obj, end_date_obj, selected_page, selected_device, selected_country
    )

    # Generate plots
    plot_sessions_time = create_sessions_over_time_plot(
        start_date_obj, end_date_obj, selected_page, selected_device, selected_country
    )
    plot_engagement_device = create_engagement_by_device_plot(
        start_date_obj, end_date_obj, selected_page, selected_device, selected_country
    )
    plot_page_visits = create_page_visits_distribution_plot(
        start_date_obj, end_date_obj, selected_page, selected_device, selected_country
    )

    # Get data for table
    table_df = get_data_for_table_display(
        start_date_obj, end_date_obj, selected_page, selected_device, selected_country
    )

    return (
        plot_sessions_time,
        plot_engagement_device,
        plot_page_visits,
        table_df,
        sessions,
        users,
        avg_duration,
        top_page
    )
9. Create the Gradio interface
Finally, we build the Gradio interface using all the utility functions created above.
# --- Create Gradio Dashboard Interface ---
def build_engagement_dashboard():
    unique_pages, unique_devices, unique_countries = get_unique_filter_values()
    min_data_date, max_data_date = get_date_range_from_data()
    # Set initial dates as strings for Gradio components
    initial_start_date_str = min_data_date.strftime('%Y-%m-%d')
    initial_end_date_str = max_data_date.strftime('%Y-%m-%d')
    with gr.Blocks(theme=gr.themes.Soft(), title="Website Engagement Dashboard") as dashboard_interface:
        gr.Markdown("# Website User Engagement Dashboard")
        gr.Markdown("Explore user activity trends and engagement metrics for your website. This **Python Gradio dashboard** helps with **Gradio data visualization**.")
        # --- Filters Row ---
        with gr.Row():
            start_date_picker = gr.Textbox(label="Start Date (YYYY-MM-DD)", value=initial_start_date_str, type="text")
            end_date_picker = gr.Textbox(label="End Date (YYYY-MM-DD)", value=initial_end_date_str, type="text")
        with gr.Row():
            page_dropdown = gr.Dropdown(choices=["All Pages"] + unique_pages, label="Page Visited", value="All Pages")
            device_dropdown = gr.Dropdown(choices=["All Devices"] + unique_devices, label="Device Type", value="All Devices")
            country_dropdown = gr.Dropdown(choices=["All Countries"] + unique_countries, label="Country", value="All Countries")
        # --- Key Metrics Display ---
        gr.Markdown("## Key Metrics")
        with gr.Row():
            total_sessions_num = gr.Number(label="Total Sessions", value=0, precision=0)
            unique_users_num = gr.Number(label="Unique Users", value=0, precision=0)
            avg_duration_num = gr.Number(label="Avg. Session Duration (s)", value=0, precision=2)
            top_page_text = gr.Textbox(label="Most Visited Page", value="N/A", interactive=False)
        # --- Visualizations Tabs ---
        gr.Markdown("## Visualizations")
        with gr.Tabs():
            with gr.TabItem("Sessions Over Time"):
                sessions_plot_output = gr.Plot()
            with gr.TabItem("Engagement by Device"):
                device_plot_output = gr.Plot()
            with gr.TabItem("Page Visit Distribution"):
                page_visits_plot_output = gr.Plot()
        # --- Raw Data Table ---
        gr.Markdown("## Raw Engagement Data (Sample)")
        # No max_rows argument is passed. The number of rows displayed is controlled
        # by the DataFrame returned by get_data_for_table_display (which returns head(100)).
        # Gradio will then paginate or scroll this.
        data_table_output = gr.DataFrame(
            label="User Sessions Data",
            interactive=False,
            headers=['Timestamp', 'User ID', 'Page Visited', 'Duration (s)', 'Country', 'Device', 'Browser']
            # For display height, you can use the `height` parameter, e.g., height=400
        )
        # --- Define Inputs & Outputs for Update Function ---
        inputs_list = [start_date_picker, end_date_picker, page_dropdown, device_dropdown, country_dropdown]
        outputs_list = [
            sessions_plot_output, device_plot_output, page_visits_plot_output,
            data_table_output,
            total_sessions_num, unique_users_num, avg_duration_num, top_page_text
        ]
        # --- Event Handling: Update dashboard when filters change ---
        for filter_component in inputs_list:
            if isinstance(filter_component, gr.Textbox):
                # Text boxes refresh on Enter; dropdowns refresh on every change
                filter_component.submit(fn=update_full_dashboard, inputs=inputs_list, outputs=outputs_list)
            else:
                filter_component.change(fn=update_full_dashboard, inputs=inputs_list, outputs=outputs_list)
        # --- Initial load of the dashboard ---
        dashboard_interface.load(
            fn=update_full_dashboard,
            inputs=inputs_list,
            outputs=outputs_list
        )
    return dashboard_interface
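Gradio assigns the values returned by an event handler to the components in its outputs list by position, so the tuple returned by update_full_dashboard has to stay in the same order as outputs_list. One way to make that contract harder to break is sketched below. It is an illustration only: OUTPUT_ORDER and order_outputs are hypothetical names, and the keys are labels invented for this example rather than anything Gradio defines.

```python
from typing import Any, Mapping, Sequence, Tuple

# Hypothetical labels, listed in the same order as the components in outputs_list
OUTPUT_ORDER = (
    "sessions_plot", "device_plot", "page_visits_plot", "table",
    "total_sessions", "unique_users", "avg_duration", "top_page",
)


def order_outputs(results: Mapping[str, Any],
                  order: Sequence[str] = OUTPUT_ORDER) -> Tuple[Any, ...]:
    """Turn a name -> value mapping into a tuple that follows `order`."""
    missing = [name for name in order if name not in results]
    extra = [name for name in results if name not in order]
    if missing or extra:
        # Fail loudly instead of silently sending a value to the wrong component
        raise KeyError(f"missing outputs: {missing}, unexpected outputs: {extra}")
    return tuple(results[name] for name in order)
```

With a helper like this, the update function could collect its results in a dictionary and end with return order_outputs(results), so adding or reordering an output only requires changing one list.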
10. Main execution block for running Gradio
Here we call the main function build_engagement_dashboard, which prepares the interface for launching the web application.
# --- Main execution block ---
if __name__ == "__main__":
    if raw_data is None or raw_data.empty:
        print("Halting: Data could not be loaded. Please ensure 'website_engagement_data.csv' exists or can be generated.")
    else:
        print("Building and launching the Gradio dashboard...")
        engagement_dashboard = build_engagement_dashboard()
        # launch() blocks until the server stops, so print the hint before calling it
        print("Open your browser to the URL that Gradio prints below.")
        engagement_dashboard.launch(server_name="0.0.0.0")  # Binds to all interfaces, so it is reachable on the local network
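The launch call hardcodes server_name="0.0.0.0", which exposes the dashboard to every machine on the network. As a hedged sketch, the host and port could instead be read from the environment so that local runs stay on 127.0.0.1 by default. DASHBOARD_HOST and DASHBOARD_PORT are hypothetical variable names chosen for this example, and launch_settings is a hypothetical helper. The resulting keys, server_name and server_port, are existing parameters of launch(), and 7860 is the port Gradio uses by default.

```python
import os
from typing import Mapping, Optional


def launch_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for launch() from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    host = env.get("DASHBOARD_HOST", "127.0.0.1")  # local-only unless overridden
    raw_port = env.get("DASHBOARD_PORT", "7860")
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(f"DASHBOARD_PORT must be an integer, got {raw_port!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"DASHBOARD_PORT out of range: {port}")
    return {"server_name": host, "server_port": port}
```

The main block could then call engagement_dashboard.launch(**launch_settings()), and setting DASHBOARD_HOST=0.0.0.0 would restore the network-wide behavior shown above.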
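Now run python app.py in a terminal to start the web application.
Output:
[Screenshot: terminal output]
Click the local URL link to open the Gradio interface.
Output:
[Screenshot: Gradio interface]
An interactive dashboard has been created. We can use this interface to analyze the dataset interactively and easily gain insights.

We can see how the visualizations change as different filters are applied.
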
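Summary
Gradio can effectively surface insights from large datasets, and an interactive visual dashboard makes the analysis process more engaging. If you have worked through this guide, you should be able to build interactive dashboards with Gradio efficiently. We covered data generation, loading, caching, defining the filter logic, computing metrics, and creating charts with Plotly. Building this dashboard required no front-end programming knowledge. Although this guide used a CSV file, you can use any other data source as needed. Gradio proves to be a valuable tool for creating dynamic, user-friendly dashboards.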