Building a Celebrity Selfie App with Nano Banana: Selfie with a Celeb


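Ever since Google released Nano Banana, the internet has been flooded with AI-generated image edits. From the viral 3D figurine trend to retro Bollywood saree looks, new styles keep appearing and people can't get enough of them. When I first dug into the model's configuration, its precision amazed me: it blends images, matches lighting, and produces natural-looking results. That's when the idea hit me: why not use this capability to build a fun Nano Banana app? In this post, I'll walk you through how I built Selfie with a Celeb, an AI app that lets you generate a realistic photo of yourself alongside your favorite actor, musician, or public figure. I'll use Google's Nano Banana (Gemini 2.5 Flash Image) model as the engine, pair it with Gradio for a simple web interface, and deploy it to Hugging Face Spaces so anyone can try it.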

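Project Overview

The goal of this project is to build a simple, fun AI image app called "Selfie with a Celeb". The app lets users:

Upload a photo of themselves.
Upload a photo of their favorite celebrity.
Generate a realistic composite image in which both people appear together, as if photographed in the same scene.

To achieve this, the project uses:

Google's Nano Banana (Gemini 2.5 Flash Image) as the AI model that performs the image blending, lighting adjustment, and background generation.
Gradio as the framework for a user-friendly web interface where users upload photos and view results.
Hugging Face Spaces as the free hosting platform for public deployment, so anyone can access and test the app.

The end result is a shareable AI-generated photo that looks natural, consistent, and engaging. Because it is driven by personalization (you and the celebrity you choose), it taps into what makes content spread: people love seeing themselves in creative new settings.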

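Building Our Nano Banana App

Let's start building our Nano Banana app:

Step 1: Set Up Your Project

First, create a new folder for your project. Inside that folder, create a file named requirements.txt and add the libraries we need.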

requirements.txt

gradio
google-generativeai
pillow

Install these dependencies by running the following command in your terminal:

pip install -r requirements.txt
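
To confirm the install worked before writing any app code, a small check like the one below can help. It is only a sketch: `installed_versions` is a hypothetical helper (not part of the original project), and it assumes Python 3.8+ so that `importlib.metadata` is available.

```python
from importlib import metadata


def installed_versions(requirements):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in requirements:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Same names as in requirements.txt above.
    wanted = ["gradio", "google-generativeai", "pillow"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any line reporting NOT INSTALLED suggests the pip command ran in a different environment than the one you are using now.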

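Our app.py script may look long, but it is organized into logical sections that handle everything from talking to the AI to building the user interface. Let's go through it in detail.

Step 2: Imports and Initial Setup

At the very top of the file, we import all the necessary libraries and define a few constants: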

import os
import io
import time
import base64
import mimetypes
from datetime import datetime
from pathlib import Path
from PIL import Image
import gradio as gr
# --- Gemini SDKs (new + legacy) -------------------------------------------------
# Prefer the NEW Google Gen AI SDK (google.genai). Fallback to legacy
# google.generativeai if the new one is not installed yet.
try:
    from google import genai as genai_new  # new SDK (preferred)
    HAVE_NEW_SDK = True
except Exception:
    genai_new = None
    HAVE_NEW_SDK = False
try:
    import google.generativeai as genai_legacy  # legacy SDK
    from google.api_core.exceptions import ResourceExhausted, InvalidArgument, GoogleAPICallError
    HAVE_LEGACY_SDK = True
except Exception:
    genai_legacy = None
    ResourceExhausted = InvalidArgument = GoogleAPICallError = Exception
    HAVE_LEGACY_SDK = False
APP_TITLE = "Take a Picture with Your Favourite Celeb!"
# New SDK model id (no "models/" prefix)
MODEL_NAME_NEW = "gemini-2.5-flash-image-preview"
# Legacy SDK model id (prefixed)
MODEL_NAME_LEGACY = "models/gemini-2.5-flash-image-preview"

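Standard library: we import os, io, time, base64, and a few others for basic operations such as handling files, data streams, and delays. PIL (Pillow), a third-party package, is essential for image processing.
Core components: gradio builds the web UI. To talk to Gemini models, the script prefers Google's newer google.genai SDK and falls back to the legacy google.generativeai library when the new one is not installed. With the requirements.txt shown earlier, only the legacy library is installed, so the fallback path runs; adding google-genai to requirements.txt enables the new SDK path.
Constants: we define APP_TITLE and the two model ids (MODEL_NAME_NEW and MODEL_NAME_LEGACY) at the top. This makes it easy to change the app title or update the model version later without searching through the code.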
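
The two model constants differ only by the "models/" prefix that the legacy SDK id carries in the code above. One way to avoid keeping them in sync by hand is to derive the legacy id from a single constant. This is a minimal sketch; `legacy_model_id` and `MODEL_ID` are hypothetical names introduced here, not part of the original script.

```python
MODEL_ID = "gemini-2.5-flash-image-preview"  # same id the script uses


def legacy_model_id(model_id: str) -> str:
    """Return the id with the "models/" prefix used for the legacy SDK."""
    prefix = "models/"
    # Leave ids that already carry the prefix untouched.
    return model_id if model_id.startswith(prefix) else prefix + model_id


MODEL_NAME_NEW = MODEL_ID
MODEL_NAME_LEGACY = legacy_model_id(MODEL_ID)
```

With this arrangement, switching to a newer model version means editing one string instead of two.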

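Step 3: Helper Functions for Robust API Interaction

This set of functions makes our app more reliable. They gracefully handle complex API responses and potential network issues: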

# Helper functions
def _iter_parts(resp):
    """Yield all part-like objects across candidates for both SDKs."""
    # New SDK: response.candidates[0].content.parts
    if hasattr(resp, "candidates") and resp.candidates:
        for c in resp.candidates:
            content = getattr(c, "content", None)
            if content and getattr(content, "parts", None):
                for p in content.parts:
                    yield p
    # Legacy SDK also exposes resp.parts sometimes (tool responses, etc.)
    if hasattr(resp, "parts") and resp.parts:
        for p in resp.parts:
            yield p

def _find_first_image_part(resp):
    for p in _iter_parts(resp):
        inline = getattr(p, "inline_data", None) or getattr(p, "inlineData", None)
        if inline and getattr(inline, "data", None):
            mime = getattr(inline, "mime_type", None) or getattr(inline, "mimeType", "") or ""
            if str(mime).lower().startswith("image/"):
                return p
    return None

def _collect_text(resp, limit=2):
    msgs = []
    for p in _iter_parts(resp):
        txt = getattr(p, "text", None)
        if isinstance(txt, str) and txt.strip():
            msgs.append(txt.strip())
            if len(msgs) >= limit:
                break
    return msgs

def _format_candidate_reasons(resp):
    infos = []
    cands = getattr(resp, "candidates", None)
    if not cands:
        return ""
    for i, c in enumerate(cands):
        fr = getattr(c, "finish_reason", None)
        if fr is None:  # new SDK uses enum-ish, legacy uses ints; stringify either way
            fr = getattr(c, "finishReason", None)
        if fr is not None:
            infos.append(f"candidate[{i}].finish_reason={fr}")
        safety = getattr(c, "safety_ratings", None) or getattr(c, "safetyRatings", None)
        if safety:
            infos.append(f"candidate[{i}].safety_ratings={safety}")
    return "\n".join(infos)

def _preprocess_image(pil_img, max_side=512):
    # Smaller input to reduce token/count cost
    pil_img = pil_img.convert("RGB")
    w, h = pil_img.size
    m = max(w, h)
    if m <= max_side:
        return pil_img
    scale = max_side / float(m)
    nw, nh = int(w * scale), int(h * scale)
    return pil_img.resize((nw, nh), Image.LANCZOS)

def _call_with_backoff_legacy(model, contents, max_retries=3):
    """Retries for the legacy SDK on 429/quota errors with exponential backoff."""
    last_exc = None
    for attempt in range(max_retries + 1):
        try:
            return model.generate_content(contents)
        except ResourceExhausted as e:
            last_exc = e
            if attempt == max_retries:
                raise
            time.sleep(2 ** attempt)
        except GoogleAPICallError as e:
            err_str = str(e)
            if "429" in err_str or "quota" in err_str.lower():
                last_exc = e
                if attempt == max_retries:
                    raise
                time.sleep(2 ** attempt)
            else:
                raise
    if last_exc:
        raise last_exc
    raise RuntimeError("Unreachable in backoff logic")

def _extract_image_bytes_and_mime(resp):
    """Return (image_bytes, mime) from response's first image part.
    Handles both SDKs:
    - NEW SDK returns bytes in part.inline_data.data
    - LEGACY SDK returns base64-encoded str in part.inline_data.data
    """
    part = _find_first_image_part(resp)
    if not part:
        return None, None
    inline = getattr(part, "inline_data", None) or getattr(part, "inlineData", None)
    mime = getattr(inline, "mime_type", None) or getattr(inline, "mimeType", None) or "image/png"
    raw = getattr(inline, "data", None)
    if raw is None:
        return None, None
    if isinstance(raw, (bytes, bytearray)):
        # NEW SDK already returns raw bytes
        return bytes(raw), mime
    elif isinstance(raw, str):
        # LEGACY SDK returns base64 string
        try:
            return base64.b64decode(raw), mime
        except Exception:
            # It *might* already be raw bytes in string-like form; last resort
            try:
                return raw.encode("latin1", "ignore"), mime
            except Exception:
                return None, None
    else:
        return None, None

def _guess_ext_from_mime(mime: str) -> str:
    ext = mimetypes.guess_extension(mime or "")
    if not ext:
        # Fallbacks for common cases not covered by mimetypes on some OSes
        if mime == "image/webp":
            return ".webp"
        if mime == "image/jpeg":
            return ".jpg"
        return ".png"
    # Normalize jpeg
    if ext == ".jpe":
        ext = ".jpg"
    return ext

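Response parsers (_iter_parts, _find_first_image_part, _collect_text, _format_candidate_reasons): the Gemini API can return complex responses made up of multiple parts. These functions safely walk the response to find what matters: the generated image data, any text messages, and any error or safety information.
_preprocess_image: to reduce API cost and speed up generation, this function converts the uploaded image to RGB and shrinks it if it is too large. It preserves the aspect ratio while ensuring the longest side does not exceed 512 pixels.
_call_with_backoff_legacy: this is the key function for reliability on the legacy SDK path. If the Google API is busy and tells us to slow down (a 429 or "quota exceeded" error), the function waits and then retries. Each failed attempt doubles the wait (1 s, 2 s, 4 s, ...), and once the retries are used up the error is re-raised so the caller can report it.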
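
The resize arithmetic and the retry schedule described above can be tried without Pillow or an API key. The sketch below is a simplified stand-in, not the script's own code: `scaled_size` and `call_with_backoff` are hypothetical names, and the `sleep` parameter is an assumption added so the waits can be observed without actually pausing.

```python
import time


def scaled_size(width, height, max_side=512):
    """Return (w, h) shrunk so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, keep as-is
    scale = max_side / float(longest)
    return int(width * scale), int(height * scale)


def call_with_backoff(fn, is_retryable, max_retries=3, sleep=time.sleep):
    """Call fn(); on a retryable error wait 1, 2, 4, ... seconds and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors we should not retry.
            if attempt == max_retries or not is_retryable(exc):
                raise
            sleep(2 ** attempt)


if __name__ == "__main__":
    print(scaled_size(2048, 1024))  # (512, 256)
```

Passing `sleep=waits.append` with a plain list `waits` records the delays instead of sleeping, which is a convenient way to check the schedule in a test.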

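Step 4: The Main Generation Logic

This is the heart of our app. The generate_image_with_celeb function orchestrates the whole process, from validating user input to returning the final image.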

# Main function
def generate_image_with_celeb(api_key, user_image, celeb_image, auto_download=False, progress=gr.Progress()):
    if not api_key:
        return None, "Authentication Error: Please provide your Google AI API key.", None, ""
    if user_image is None or celeb_image is None:
        return None, "Please upload both your photo and the celebrity photo.", None, ""
    progress(0.05, desc="Configuring API...")
    client_new = None
    model_legacy = None
    # Prefer NEW SDK
    if HAVE_NEW_SDK:
        try:
            client_new = genai_new.Client(api_key=api_key)
        except Exception as e:
            return None, f"API key configuration failed (new SDK): {e}", None, ""
    elif HAVE_LEGACY_SDK:
        try:
            genai_legacy.configure(api_key=api_key)
        except Exception as e:
            return None, f"API key configuration failed (legacy SDK): {e}", None, ""
    else:
        return None, "Neither google.genai (new) nor google.generativeai (legacy) SDK is installed.", None, ""
    progress(0.15, desc="Preparing images...")
    try:
        user_pil = Image.fromarray(user_image)
        celeb_pil = Image.fromarray(celeb_image)
        user_pil = _preprocess_image(user_pil, max_side=512)
        celeb_pil = _preprocess_image(celeb_pil, max_side=512)
    except Exception as e:
        return None, f"Failed to process images: {e}", None, ""
    prompt = (
        "Analyze these two images. Create a single, new, photorealistic image where the person from the first image is standing next to the celebrity from the second image. "
        "Key requirements: 1) Seamless integration in the same physical space. 2) Generate a natural background (e.g., red carpet, casual street, studio). "
        "3) Consistent lighting/shadows/color tones matching the generated background. 4) Natural poses/interactions. 5) High-resolution, artifact-free output."
    )
    contents = [user_pil, celeb_pil, prompt]
    progress(0.35, desc="Sending request...")
    response = None
    try:
        if client_new is not None:
            # New SDK call
            response = client_new.models.generate_content(
                model=MODEL_NAME_NEW, contents=contents
            )
        else:
            # Legacy SDK call
            model_legacy = genai_legacy.GenerativeModel(MODEL_NAME_LEGACY)
            response = _call_with_backoff_legacy(model_legacy, contents, max_retries=4)
    except Exception as e:
        err = str(e)
        if "429" in err or "quota" in err.lower():
            return None, (
                "You’ve exceeded your quota for image preview model. "
                "Wait for quota reset or upgrade billing / permissions."
            ), None, ""
        return None, f"API call failed: {err}", None, ""
    progress(0.65, desc="Parsing response...")
    if not response:
        return None, "No response from model.", None, ""
    # Robust decode that works for both SDKs and avoids double base64-decoding
    image_bytes, mime = _extract_image_bytes_and_mime(response)
    if image_bytes:
        try:
            # Persist to disk first (works even if PIL can't decode e.g., some WEBP builds)
            outputs_dir = Path("outputs")
            outputs_dir.mkdir(parents=True, exist_ok=True)
            ts = datetime.now().strftime("%Y%m%d_%H%M%S")
            ext = _guess_ext_from_mime(mime or "image/png")
            file_path = outputs_dir / f"celeb_fusion_{ts}{ext}"
            with open(file_path, "wb") as f:
                f.write(image_bytes)
            # Try to open with PIL for preview; if it fails, return the file path (Gradio can render by path)
            img_obj = None
            try:
                img_obj = Image.open(io.BytesIO(image_bytes))
            except Exception:
                img_obj = None
            progress(0.95, desc="Rendering...")
            # Optional auto-download via data-URI (no reliance on Gradio file routes)
            auto_dl_html = ""
            if auto_download:
                b64 = base64.b64encode(image_bytes).decode("ascii")
                fname = file_path.name
                auto_dl_html = (
                    f"<a id='autodl' href='data:{mime};base64,{b64}' download='{fname}'></a>"
                    f"<script>(function(){{var a=document.getElementById('autodl');if(a) a.click();}})();</script>"
                )
            # Return PIL image if available, else the file path (both supported by gr.Image)
            display_obj = img_obj if img_obj is not None else str(file_path)
            return display_obj, f"Image generated ({mime}).", str(file_path), auto_dl_html
        except Exception as e:
            details = _format_candidate_reasons(response)
            return None, f"Failed to write or load image: {e}\n\n{details}", None, ""
    # If no image part → get text
    texts = _collect_text(response, limit=2)
    reasons = _format_candidate_reasons(response)
    guidance = (
        "\nTo get image output you need access to the preview image model "
        "and sufficient free-tier quota or a billed project."
    )
    txt_msg = texts[0] if texts else "No text message."
    debug = f"\n[Debug info] {reasons}" if reasons else ""
    return None, f"Model returned text: {txt_msg}{guidance}{debug}", None, ""

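Input validation: the function first checks that the user has provided an API key and both images. If not, it returns an error message immediately.
API configuration: the user's personal API key is used to set up access to Google's servers. With the new SDK this means creating a genai_new.Client(api_key=...); with the legacy SDK it means calling genai_legacy.configure(api_key=...).
Image preparation: the images uploaded through Gradio (NumPy arrays) are converted into a format the API understands (PIL images) and resized with our _preprocess_image helper.
Prompt and API call: the final request combines the two images with our text instructions. On the new SDK path the model is called directly through client_new.models.generate_content; on the legacy path the call goes through our _call_with_backoff_legacy helper, which retries on quota errors.
Response handling: once a response arrives, the helper functions look for image data. If an image is found, it is decoded, saved under the outputs folder, and returned to the UI. If not, the function looks for a text message and returns it, so the user understands what happened.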
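
The image-or-text decision in the response handling step can be exercised offline with stand-in objects. The sketch below assumes a response shaped like `candidates[i].content.parts[j]` with an optional `inline_data` on each part, as the helper functions above expect; `first_image` and `fake_response` are hypothetical names, and the fake objects are not real SDK types.

```python
import base64
from types import SimpleNamespace


def first_image(resp):
    """Return (bytes, mime) for the first image part, or (None, None)."""
    for cand in getattr(resp, "candidates", None) or []:
        content = getattr(cand, "content", None)
        for part in getattr(content, "parts", None) or []:
            inline = getattr(part, "inline_data", None)
            data = getattr(inline, "data", None)
            mime = getattr(inline, "mime_type", "") or ""
            if data and mime.startswith("image/"):
                # Bytes pass through; a str is assumed to be base64 text.
                raw = data if isinstance(data, bytes) else base64.b64decode(data)
                return raw, mime
    return None, None


def fake_response(*parts):
    """Build a stand-in object shaped like a one-candidate response."""
    content = SimpleNamespace(parts=list(parts))
    return SimpleNamespace(candidates=[SimpleNamespace(content=content)])


# One text-only part and one part carrying base64-encoded image data.
text_part = SimpleNamespace(text="Here you go", inline_data=None)
image_part = SimpleNamespace(
    text=None,
    inline_data=SimpleNamespace(
        mime_type="image/png",
        data=base64.b64encode(b"\x89PNG").decode("ascii"),
    ),
)
```

Calling `first_image(fake_response(text_part))` yields `(None, None)`, which corresponds to the branch where the app falls back to showing the model's text message.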

Step 5: Building the User Interface with Gradio

The last part of the code uses Gradio to build the interactive web page:

# Gradio UI
custom_css = """
.gradio-container { border-radius: 20px !important; box-shadow: 0 4px 20px rgba(0,0,0,0.05); }
#title { text-align: center; font-family: 'Helvetica Neue', sans-serif; font-weight: 700; font-size: 2.0rem; }
#subtitle { text-align: center; font-size: 1.0rem; margin-bottom: 1.0rem; }
.gr-button { font-weight: 600 !important; border-radius: 8px !important; padding: 12px 10px !important; }
#output_header { text-align: center; font-weight: bold; font-size: 1.2rem; }
footer { display: none !important; }
"""
with gr.Blocks(theme=gr.themes.Soft(), css=custom_css) as demo:
    gr.Markdown(f"# {APP_TITLE}", elem_id="title")
    gr.Markdown(
        "Uses Gemini 2.5 Flash Image Preview model. Provide your API key (new SDK: google.genai) or legacy SDK key.",
        elem_id="subtitle"
    )
    with gr.Accordion("Step 1: Enter Your Google AI API Key", open=True):
        api_key_box = gr.Textbox(
            label="API Key",
            placeholder="Paste your Google AI API key here...",
            type="password",
            info="Ensure the key is from a project with billing and image preview access.",
            interactive=True
        )
    with gr.Row(variant="panel"):
        user_image = gr.Image(type="numpy", label="Upload your picture", height=350)
        celeb_image = gr.Image(type="numpy", label="Upload celeb picture", height=350)
    with gr.Row():
        auto_dl = gr.Checkbox(label="Auto-download after generation", value=False)
    generate_btn = gr.Button("Generate Image", variant="primary")
    gr.Markdown("### Output", elem_id="output_header")
    output_image = gr.Image(label="Generated Image", height=500, interactive=False, show_download_button=True)
    output_text = gr.Markdown(label="Status / Message")
    # Download button fed by the function (returns a filepath)
    download_btn = gr.DownloadButton("Download Image", value=None, size="md")
    # Hidden HTML slot to run the client-side auto-download script (data URI)
    auto_dl_html = gr.HTML(visible=False)
    generate_btn.click(
        fn=generate_image_with_celeb,
        inputs=[api_key_box, user_image, celeb_image, auto_dl],
        outputs=[output_image, output_text, download_btn, auto_dl_html]
    )

if __name__ == "__main__":
    # Ensure example/output dirs exist
    Path("./examples").mkdir(parents=True, exist_ok=True)
    Path("./outputs").mkdir(parents=True, exist_ok=True)
    demo.launch(debug=True)

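Layout (gr.Blocks): we use gr.Blocks to create a custom layout. We also pass in the custom_css string to style the components.
Components: every element on the page, such as the title (gr.Markdown), the API key input field (gr.Textbox), and the image upload boxes (gr.Image), is created as a Gradio component.
Arrangement: components such as the two image upload boxes are placed in a gr.Row so they sit side by side. The API key field lives inside a collapsible gr.Accordion.
Buttons and outputs: we define the "Generate Image" button (gr.Button) along with the components that display the result (output_image and output_text), a download button (download_btn), and a hidden HTML slot (auto_dl_html) for the optional auto-download.
Event handling (.click()): this is what connects the UI to the Python logic. The call tells Gradio: "When generate_btn is clicked, run generate_image_with_celeb. Pass the values of api_key_box, user_image, celeb_image, and auto_dl as inputs, and put the four returned values into output_image, output_text, download_btn, and auto_dl_html."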
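
The key contract in the event wiring is that the handler returns one value per output component, in order. The sketch below shows that contract with a stub in place of the real generation function. `stub_handler` and `build_demo` are hypothetical names; the Gradio import is guarded only so the stub can be tried on a machine without Gradio installed.

```python
try:
    import gradio as gr  # optional here so the handler can run without Gradio
except ImportError:
    gr = None


def stub_handler(api_key, user_image, celeb_image, auto_download):
    """Stand-in for generate_image_with_celeb: one return value per output."""
    if not api_key:
        return None, "Please provide an API key.", None, ""
    # Echo the first image back instead of calling a model.
    return user_image, "Stub: echoed the first image.", None, ""


def build_demo():
    """Wire four inputs and four outputs to the handler; requires Gradio."""
    if gr is None:
        raise RuntimeError("Gradio is not installed")
    with gr.Blocks() as demo:
        key = gr.Textbox(label="API Key", type="password")
        with gr.Row():
            me = gr.Image(type="numpy", label="You")
            celeb = gr.Image(type="numpy", label="Celeb")
        auto = gr.Checkbox(label="Auto-download")
        btn = gr.Button("Generate")
        out_img = gr.Image(label="Result")
        out_msg = gr.Markdown()
        out_file = gr.DownloadButton("Download")
        out_html = gr.HTML()
        # The order of outputs must match the order of the returned tuple.
        btn.click(fn=stub_handler, inputs=[key, me, celeb, auto],
                  outputs=[out_img, out_msg, out_file, out_html])
    return demo
```

With Gradio installed, `build_demo().launch()` should start a local page where the stub can be clicked through before swapping in the real function.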

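Going Live: Deploying the Gradio App on Hugging Face

One of the best things about this project is how simple it is to deploy a Gradio app. We'll use Hugging Face Spaces, a free platform for hosting machine learning demos.

Create a Hugging Face account: if you don't have one, sign up at huggingface.co.
Create a new Space: from your profile, click "New Space".
- Space name: give it a unique name (for example, celebrity-selfie-generator).
- License: choose a license (for example, mit).
- Select the Space SDK: choose Gradio.
- Hardware: the free "CPU basic" option is sufficient.
- Click "Create Space".
Upload your files:
- In your new Space, navigate to the "Files and versions" tab.
- Click "Add file", then "Upload files".
- Select your app.py and requirements.txt files and upload them.

That's it! Hugging Face Spaces will automatically install the required libraries and start your app. After a few moments, your app will be live for anyone in the world to use. Because the app asks each user to enter their own API key, you don't need to worry about managing keys on the server side.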

Click here to check out the "Selfie with a Celeb" app!

Input:

[Image: input images in the Nano Banana app]

Output:

[Image: Nano Banana app output]

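You'll need to provide your Gemini API key, upload your photo, and add a celebrity image. After you click "Generate Image", the app processes the request and delivers your output within a few minutes. The results look natural and realistic, with strong consistency between the two images.

Try it with your own API key, your own photo, and a picture of your favorite celebrity!

Conclusion

You now have a complete blueprint for building your own viral AI image app. We explored how Google's Nano Banana model (Gemini 2.5 Flash Image) generates highly realistic, consistent output, and how easily it integrates with tools like Gradio and Hugging Face Spaces. Best of all, you can customize the prompt, tweak the interface, or extend the idea into an entirely new app. In just a few steps, you can take this project from concept to reality and create something genuinely worth sharing.

Which Nano Banana app are you building? Let me know in the comments below!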
