Building a Celebrity Selfie App with Nano Banana: Selfie with a Celeb

Celebrity selfie

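Ever since Google launched Nano Banana, the internet has been flooded with AI-edited images. From the viral 3D figurines to retro Bollywood saree looks, new styles keep appearing, and they are hard to resist. When I first dug into what the model can do, its precision amazed me: it blends images, matches lighting, and produces natural, realistic results. That's when I had an idea: why not use this capability to build a fun Nano Banana app? In this blog post, I'll walk you through how I built Selfie with a Celeb, an AI app that lets you generate realistic photos of yourself with your favorite actors, musicians, or public figures. I'll use Google's Nano Banana (Gemini 2.5 Flash Image) model as the engine, pair it with Gradio for a simple web interface, and deploy it on Hugging Face Spaces so anyone can try it.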

Project Overview

The goal of this project is to build a simple, fun AI image app called "Selfie with a Celeb". The app lets users:

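Upload a photo of themselves.
Upload a photo of their favorite celebrity.
Generate a realistic composite image in which both people appear together, as if photographed in the same scene.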

To achieve this, the project uses:

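Google's Nano Banana (Gemini 2.5 Flash Image) as the AI model, which handles image blending, lighting adjustment, and background generation.
Gradio as the framework for building a user-friendly web interface where users upload photos and view the result.
Hugging Face Spaces as a free, public hosting platform where anyone can access and test the app.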

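The end result is a shareable AI-generated photo that looks natural, consistent, and engaging. Because it is driven by personalization (you and the celebrity of your choice), it taps into what makes content spread: people love seeing themselves in creative new settings.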

Building Our Nano Banana App

Let's start building our Nano Banana app:

Step 1: Set Up Your Project

First, create a new folder for your project. Inside it, create a file named requirements.txt and add the libraries we need.

requirements.txt

gradio
google-genai
google-generativeai
pillow

Install these dependencies by running the following command in your terminal:

pip install -r requirements.txt

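Our app.py script may look long, but it is organized into logical sections that handle everything from talking to the AI to building the user interface. Let's go through it piece by piece.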

Step 2: Imports and Initial Setup

At the very top of the file, we import all the necessary libraries and define a few constants:

import os
import io
import time
import base64
import mimetypes
from datetime import datetime
from pathlib import Path

from PIL import Image
import gradio as gr

# --- Gemini SDKs (new + legacy) -------------------------------------------------
# Prefer the NEW Google Gen AI SDK (google.genai). Fallback to legacy
# google.generativeai if the new one is not installed yet.
try:
    from google import genai as genai_new  # new SDK (preferred)
    HAVE_NEW_SDK = True
except Exception:
    genai_new = None
    HAVE_NEW_SDK = False

try:
    import google.generativeai as genai_legacy  # legacy SDK
    from google.api_core.exceptions import ResourceExhausted, InvalidArgument, GoogleAPICallError
    HAVE_LEGACY_SDK = True
except Exception:
    genai_legacy = None
    ResourceExhausted = InvalidArgument = GoogleAPICallError = Exception
    HAVE_LEGACY_SDK = False

APP_TITLE = "Take a Picture with Your Favourite Celeb!"
# New SDK model id (no "models/" prefix)
MODEL_NAME_NEW = "gemini-2.5-flash-image-preview"
# Legacy SDK model id (prefixed)
MODEL_NAME_LEGACY = "models/gemini-2.5-flash-image-preview"

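Standard library: we import os, io, time, base64, and a few other modules for basic operations such as handling files, byte streams, delays, and timestamps. PIL (Pillow) is essential for image processing.
Core components: gradio builds the web UI. To talk to Gemini, the script prefers Google's new Gen AI SDK (google.genai) and falls back to the legacy google.generativeai library if the new one is not installed; the HAVE_NEW_SDK and HAVE_LEGACY_SDK flags record which one is available.
Constants: we define APP_TITLE and the model IDs at the top (MODEL_NAME_NEW for the new SDK, and MODEL_NAME_LEGACY with its "models/" prefix for the legacy SDK). This makes it easy to change the app title or update the model version later without hunting through the code.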
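The two model constants differ only in the "models/" prefix used for the legacy path. If you would rather maintain a single constant, a small normalizing helper can derive both forms. This is a sketch under that assumption; `BASE_MODEL` and `model_id_for` are hypothetical names, and the prefix convention is simply copied from the constants above:

```python
BASE_MODEL = "gemini-2.5-flash-image-preview"


def model_id_for(base: str, legacy: bool) -> str:
    """Return the model id in the form each SDK path in app.py uses.

    Mirrors app.py's constants: the new google.genai path uses the bare id,
    while the legacy google.generativeai path uses a "models/" prefix.
    """
    bare = base.removeprefix("models/")  # str.removeprefix needs Python 3.9+
    return f"models/{bare}" if legacy else bare


MODEL_NAME_NEW = model_id_for(BASE_MODEL, legacy=False)
MODEL_NAME_LEGACY = model_id_for(BASE_MODEL, legacy=True)
```

With this in place, switching to a newer model version means editing only BASE_MODEL.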

Step 3: Helper Functions for Robust API Interaction

This group of functions makes our app more reliable. They gracefully handle complex API responses and potential network issues:

# Helper functions
def _iter_parts(resp):
    """Yield all part-like objects across candidates for both SDKs."""
    # Both SDKs: response.candidates[i].content.parts
    found = False
    for c in getattr(resp, "candidates", None) or []:
        content = getattr(c, "content", None)
        for p in getattr(content, "parts", None) or []:
            found = True
            yield p
    if found:
        return  # don't yield the same parts a second time via resp.parts
    # Fallback: the resp.parts quick accessor. The legacy SDK raises
    # ValueError here when there are no candidates (e.g. a blocked prompt).
    try:
        parts = getattr(resp, "parts", None) or []
    except ValueError:
        parts = []
    for p in parts:
        yield p


def _find_first_image_part(resp):
    for p in _iter_parts(resp):
        inline = getattr(p, "inline_data", None) or getattr(p, "inlineData", None)
        if inline and getattr(inline, "data", None):
            mime = getattr(inline, "mime_type", None) or getattr(inline, "mimeType", "") or ""
            if str(mime).lower().startswith("image/"):
                return p
    return None


def _collect_text(resp, limit=2):
    msgs = []
    for p in _iter_parts(resp):
        txt = getattr(p, "text", None)
        if isinstance(txt, str) and txt.strip():
            msgs.append(txt.strip())
        if len(msgs) >= limit:
            break
    return msgs


def _format_candidate_reasons(resp):
    infos = []
    cands = getattr(resp, "candidates", None)
    if not cands:
        return ""
    for i, c in enumerate(cands):
        fr = getattr(c, "finish_reason", None)
        if fr is None:  # new SDK uses enum-ish, legacy uses ints; stringify either way
            fr = getattr(c, "finishReason", None)
        if fr is not None:
            infos.append(f"candidate[{i}].finish_reason={fr}")
        safety = getattr(c, "safety_ratings", None) or getattr(c, "safetyRatings", None)
        if safety:
            infos.append(f"candidate[{i}].safety_ratings={safety}")
    return "\n".join(infos)


def _preprocess_image(pil_img, max_side=512):
    # Smaller input to reduce token/count cost
    pil_img = pil_img.convert("RGB")
    w, h = pil_img.size
    m = max(w, h)
    if m <= max_side:
        return pil_img
    scale = max_side / float(m)
    nw, nh = int(w * scale), int(h * scale)
    return pil_img.resize((nw, nh), Image.LANCZOS)


def _call_with_backoff_legacy(model, contents, max_retries=3):
    """Retries for the legacy SDK on 429/quota errors with exponential backoff."""
    last_exc = None
    for attempt in range(max_retries + 1):
        try:
            return model.generate_content(contents)
        except ResourceExhausted as e:
            last_exc = e
            if attempt == max_retries:
                raise
            time.sleep(2 ** attempt)
        except GoogleAPICallError as e:
            err_str = str(e)
            if "429" in err_str or "quota" in err_str.lower():
                last_exc = e
                if attempt == max_retries:
                    raise
                time.sleep(2 ** attempt)
            else:
                raise
    if last_exc:
        raise last_exc
    raise RuntimeError("Unreachable in backoff logic")


def _extract_image_bytes_and_mime(resp):
    """Return (image_bytes, mime) from response's first image part.
    Handles both SDKs:
    - NEW SDK returns bytes in part.inline_data.data
    - LEGACY SDK returns base64-encoded str in part.inline_data.data
    """
    part = _find_first_image_part(resp)
    if not part:
        return None, None
    inline = getattr(part, "inline_data", None) or getattr(part, "inlineData", None)
    mime = getattr(inline, "mime_type", None) or getattr(inline, "mimeType", None) or "image/png"
    raw = getattr(inline, "data", None)
    if raw is None:
        return None, None
    if isinstance(raw, (bytes, bytearray)):
        # NEW SDK already returns raw bytes
        return bytes(raw), mime
    elif isinstance(raw, str):
        # LEGACY SDK returns base64 string
        try:
            return base64.b64decode(raw), mime
        except Exception:
            # It *might* already be raw bytes in string-like form; last resort
            try:
                return raw.encode("latin1", "ignore"), mime
            except Exception:
                return None, None
    else:
        return None, None


def _guess_ext_from_mime(mime: str) -> str:
    ext = mimetypes.guess_extension(mime or "")
    if not ext:
        # Fallbacks for common cases not covered by mimetypes on some OSes
        if mime == "image/webp":
            return ".webp"
        if mime == "image/jpeg":
            return ".jpg"
        return ".png"
    # Normalize jpeg
    if ext == ".jpe":
        ext = ".jpg"
    return ext

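Response parsers (_iter_parts, _find_first_image_part, _collect_text, _format_candidate_reasons): the Gemini API can return complex responses made up of multiple parts. These functions search the response safely for the information that matters: generated image data, any text messages, and error or safety information.
_preprocess_image: to save API cost and speed up generation, this function takes an uploaded image and resizes it if it is too large. It preserves the aspect ratio while ensuring the longest side is at most 512 pixels.
_call_with_backoff_legacy: this is the key reliability function on the legacy SDK path. If the Google API is busy and tells us to slow down (a 429 "quota exceeded" error), the function waits and then retries, doubling the wait after each failed attempt (1, 2, 4, ... seconds) instead of letting the app crash. Calls made through the new SDK are not wrapped in this retry logic.
_extract_image_bytes_and_mime and _guess_ext_from_mime: these pull the raw image bytes and MIME type out of the first image part (handling raw bytes from one SDK and base64 strings from the other) and pick a matching file extension for saving the result.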
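The retry schedule in _call_with_backoff_legacy waits 2 ** attempt seconds between attempts. Here is the same idea in generic form, with the sleep function injected so the schedule can be checked without actually waiting. This is a sketch, not the app's code: `retry_with_backoff` is a hypothetical name, and `QuotaError` stands in for the SDK's ResourceExhausted / 429 errors:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaError(Exception):
    """Hypothetical stand-in for a 429 / quota-exceeded API error."""


def retry_with_backoff(
    call: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying on QuotaError after waits of 1, 2, 4, ... seconds."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except QuotaError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error to the caller
            sleep(2 ** attempt)
    raise RuntimeError("unreachable")
```

In a multi-user Space, adding a small random jitter to each wait is a common refinement so that many clients do not retry in lockstep; the original app does not do this.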

Step 4: The Main Generation Logic

This is the heart of our app. The generate_image_with_celeb function orchestrates the whole process, from validating user input to returning the final image.

# Main function
def generate_image_with_celeb(api_key, user_image, celeb_image, auto_download=False, progress=gr.Progress()):
    if not api_key:
        return None, "Authentication Error: Please provide your Google AI API key.", None, ""
    if user_image is None or celeb_image is None:
        return None, "Please upload both your photo and the celebrity photo.", None, ""

    progress(0.05, desc="Configuring API...")
    client_new = None
    model_legacy = None
    # Prefer NEW SDK
    if HAVE_NEW_SDK:
        try:
            client_new = genai_new.Client(api_key=api_key)
        except Exception as e:
            return None, f"API key configuration failed (new SDK): {e}", None, ""
    elif HAVE_LEGACY_SDK:
        try:
            genai_legacy.configure(api_key=api_key)
        except Exception as e:
            return None, f"API key configuration failed (legacy SDK): {e}", None, ""
    else:
        return None, "Neither google.genai (new) nor google.generativeai (legacy) SDK is installed.", None, ""

    progress(0.15, desc="Preparing images...")
    try:
        user_pil = Image.fromarray(user_image)
        celeb_pil = Image.fromarray(celeb_image)
        user_pil = _preprocess_image(user_pil, max_side=512)
        celeb_pil = _preprocess_image(celeb_pil, max_side=512)
    except Exception as e:
        return None, f"Failed to process images: {e}", None, ""

    prompt = (
        "Analyze these two images. Create a single, new, photorealistic image where the person from the first image is standing next to the celebrity from the second image. "
        "Key requirements: 1) Seamless integration in the same physical space. 2) Generate a natural background (e.g., red carpet, casual street, studio). "
        "3) Consistent lighting/shadows/color tones matching the generated background. 4) Natural poses/interactions. 5) High-resolution, artifact-free output."
    )
    contents = [user_pil, celeb_pil, prompt]

    progress(0.35, desc="Sending request...")
    response = None
    try:
        if client_new is not None:
            # New SDK call
            response = client_new.models.generate_content(
                model=MODEL_NAME_NEW, contents=contents
            )
        else:
            # Legacy SDK call
            model_legacy = genai_legacy.GenerativeModel(MODEL_NAME_LEGACY)
            response = _call_with_backoff_legacy(model_legacy, contents, max_retries=4)
    except Exception as e:
        err = str(e)
        if "429" in err or "quota" in err.lower():
            return None, (
                "You've exceeded your quota for image preview model. "
                "Wait for quota reset or upgrade billing / permissions."
            ), None, ""
        return None, f"API call failed: {err}", None, ""

    progress(0.65, desc="Parsing response...")
    if not response:
        return None, "No response from model.", None, ""

    # Robust decode that works for both SDKs and avoids double base64-decoding
    image_bytes, mime = _extract_image_bytes_and_mime(response)
    if image_bytes:
        try:
            # Persist to disk first (works even if PIL can't decode e.g., some WEBP builds)
            outputs_dir = Path("outputs")
            outputs_dir.mkdir(parents=True, exist_ok=True)
            ts = datetime.now().strftime("%Y%m%d_%H%M%S")
            ext = _guess_ext_from_mime(mime or "image/png")
            file_path = outputs_dir / f"celeb_fusion_{ts}{ext}"
            with open(file_path, "wb") as f:
                f.write(image_bytes)
            # Try to open with PIL for preview; if it fails, return the file path (Gradio can render by path)
            img_obj = None
            try:
                img_obj = Image.open(io.BytesIO(image_bytes))
            except Exception:
                img_obj = None
            progress(0.95, desc="Rendering...")
            # Optional auto-download via data-URI (no reliance on Gradio file routes)
            auto_dl_html = ""
            if auto_download:
                b64 = base64.b64encode(image_bytes).decode("ascii")
                fname = file_path.name
                auto_dl_html = (
                    f"<a id='autodl' href='data:{mime};base64,{b64}' download='{fname}'></a>"
                    f"<script>(function(){{var a=document.getElementById('autodl');if(a) a.click();}})();</script>"
                )
            # Return PIL image if available, else the file path (both supported by gr.Image)
            display_obj = img_obj if img_obj is not None else str(file_path)
            return display_obj, f"Image generated ({mime}).", str(file_path), auto_dl_html
        except Exception as e:
            details = _format_candidate_reasons(response)
            return None, f"Failed to write or load image: {e}\n\n{details}", None, ""

    # If no image part → get text
    texts = _collect_text(response, limit=2)
    reasons = _format_candidate_reasons(response)
    guidance = (
        "\nTo get image output you need access to the preview image model "
        "and sufficient free-tier quota or a billed project."
    )
    txt_msg = texts[0] if texts else "No text message."
    debug = f"\n[Debug info] {reasons}" if reasons else ""
    return None, f"Model returned text: {txt_msg}{guidance}{debug}", None, ""

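Input validation: first check whether the user provided an API key and both images. If not, return an error message immediately.
API configuration: with the new SDK, genai_new.Client(api_key=api_key) creates a client from the user's own API key; if only the legacy SDK is installed, genai_legacy.configure(api_key=api_key) is called instead.
Image preparation: convert the images uploaded through Gradio (NumPy arrays) into PIL images that the API accepts, and resize them with our _preprocess_image helper.
Prompt and API call: build the request by combining both images with our text instructions, then call the Gemini model, through client_new.models.generate_content() on the new SDK or through the _call_with_backoff_legacy helper on the legacy SDK.
Response handling: once the response arrives, our helpers look for image data. If an image is found, it is decoded, saved to the outputs folder, and returned to the UI. If not, the function returns any text message and finish reasons so the user can understand what happened.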
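Response handling hinges on one detail: depending on the SDK, inline_data.data may arrive as raw bytes or as a base64 string, and base64-decoding bytes that are already raw would corrupt the image. The following standard-library sketch shows that normalization and exercises it on a fake response built with SimpleNamespace. The fake objects only imitate the candidates[0].content.parts[i].inline_data shape used by the helpers above, and `image_bytes_from` and `fake_response` are hypothetical names:

```python
import base64
import binascii
from types import SimpleNamespace
from typing import Optional, Tuple


def image_bytes_from(resp) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (bytes, mime) of the first image part, or (None, None)."""
    for cand in getattr(resp, "candidates", None) or []:
        content = getattr(cand, "content", None)
        for part in getattr(content, "parts", None) or []:
            inline = getattr(part, "inline_data", None)
            if inline is None or not getattr(inline, "data", None):
                continue  # text part, or an empty payload
            mime = getattr(inline, "mime_type", "") or ""
            if not mime.startswith("image/"):
                continue
            raw = inline.data
            if isinstance(raw, (bytes, bytearray)):
                return bytes(raw), mime  # already raw: don't decode again
            try:
                return base64.b64decode(raw, validate=True), mime
            except (binascii.Error, ValueError):
                return None, None
    return None, None


def fake_response(data, mime="image/png"):
    """Build an object shaped like a Gemini response (illustration only)."""
    text = SimpleNamespace(inline_data=None, text="Here is your photo")
    image = SimpleNamespace(inline_data=SimpleNamespace(data=data, mime_type=mime))
    return SimpleNamespace(
        candidates=[SimpleNamespace(content=SimpleNamespace(parts=[text, image]))]
    )
```

For example, image_bytes_from(fake_response(b"\x89PNG\r\n\x1a\n")) returns the same eight bytes together with "image/png", and so does passing the base64-encoded string of those bytes.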

Step 5: Building the User Interface with Gradio

The last part of the code uses Gradio to build the interactive web page:

# Gradio UI
custom_css = """
.gradio-container { border-radius: 20px !important; box-shadow: 0 4px 20px rgba(0,0,0,0.05); }
#title { text-align: center; font-family: 'Helvetica Neue', sans-serif; font-weight: 700; font-size: 2.0rem; }
#subtitle { text-align: center; font-size: 1.0rem; margin-bottom: 1.0rem; }
.gr-button { font-weight: 600 !important; border-radius: 8px !important; padding: 12px 10px !important; }
#output_header { text-align: center; font-weight: bold; font-size: 1.2rem; }
footer { display: none !important; }
"""

with gr.Blocks(theme=gr.themes.Soft(), css=custom_css) as demo:
    gr.Markdown(f"# {APP_TITLE}", elem_id="title")
    gr.Markdown(
        "Uses Gemini 2.5 Flash Image Preview model. Provide your API key (new SDK: google.genai) or legacy SDK key.",
        elem_id="subtitle"
    )
    with gr.Accordion("Step 1: Enter Your Google AI API Key", open=True):
        api_key_box = gr.Textbox(
            label="API Key",
            placeholder="Paste your Google AI API key here...",
            type="password",
            info="Ensure the key is from a project with billing and image preview access.",
            interactive=True
        )
    with gr.Row(variant="panel"):
        user_image = gr.Image(type="numpy", label="Upload your picture", height=350)
        celeb_image = gr.Image(type="numpy", label="Upload celeb picture", height=350)
    with gr.Row():
        auto_dl = gr.Checkbox(label="Auto-download after generation", value=False)
        generate_btn = gr.Button("Generate Image", variant="primary")
    gr.Markdown("### Output", elem_id="output_header")
    output_image = gr.Image(label="Generated Image", height=500, interactive=False, show_download_button=True)
    output_text = gr.Markdown(label="Status / Message")
    # Download button fed by the function (returns a filepath)
    download_btn = gr.DownloadButton("Download Image", value=None, size="md")
    # Hidden HTML slot to run the client-side auto-download script (data URI)
    auto_dl_html = gr.HTML(visible=False)
    generate_btn.click(
        fn=generate_image_with_celeb,
        inputs=[api_key_box, user_image, celeb_image, auto_dl],
        outputs=[output_image, output_text, download_btn, auto_dl_html]
    )

if __name__ == "__main__":
    # Ensure example/output dirs exist
    Path("./examples").mkdir(parents=True, exist_ok=True)
    Path("./outputs").mkdir(parents=True, exist_ok=True)
    demo.launch(debug=True)

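Layout (gr.Blocks): we use gr.Blocks to create a custom layout, and pass in the custom_css string to style the components.
Components: every element on the page, such as the title (gr.Markdown), the API key field (gr.Textbox), and the image upload boxes (gr.Image), is created as a Gradio component.
Arrangement: components such as the two image upload boxes are placed in a gr.Row so they appear side by side. The API key field sits inside a collapsible gr.Accordion.
Buttons and outputs: we define the "Generate Image" button (gr.Button), the auto-download checkbox, and the components that display results: output_image, output_text, the download_btn download button, and a hidden auto_dl_html slot.
Event handling (.click()): this is what connects the UI to the Python logic. It tells Gradio: "When generate_btn is clicked, run generate_image_with_celeb with the values of api_key_box, user_image, celeb_image, and auto_dl as inputs, and put the four returned values into output_image, output_text, download_btn, and auto_dl_html."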
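Of the four outputs wired up in .click(), the last one receives an HTML snippet that embeds the generated image as a base64 data: URI inside an <a download> link. The following sketch builds such a link and escapes the filename and MIME type with html.escape so they cannot break out of the attribute. `build_download_link` is a hypothetical name. Browsers generally do not execute <script> tags inserted via innerHTML, so whether the app's auto-click script actually runs depends on how the front end renders the HTML; this sketch deliberately builds only the link itself:

```python
import base64
import html


def build_download_link(image_bytes: bytes, mime: str, filename: str) -> str:
    """Return an <a download> tag whose href is a base64 data URI."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    # Escape attribute values so quotes in a filename can't inject markup.
    href = f"data:{html.escape(mime, quote=True)};base64,{b64}"
    name = html.escape(filename, quote=True)
    return f'<a id="autodl" href="{href}" download="{name}">Download image</a>'
```

For example, build_download_link(b"abc", "image/png", "celeb.png") produces a link whose href is data:image/png;base64,YWJj.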

Going Live: Deploying the Gradio App on Hugging Face

One highlight of this project is how simple it is to deploy a Gradio app. We'll use Hugging Face Spaces, a free hosting platform for machine learning demos.

Create a Hugging Face account: if you don't have one, sign up at huggingface.co.
Create a new Space: from your profile, click "New Space".
- Space name: give it a unique name (e.g., celebrity-selfie-generator).
- License: choose a license (e.g., mit).
- Select the Space SDK: choose Gradio.
- Hardware: the free "CPU basic" option is enough.
- Click "Create Space".
Upload your files:
- In your new Space, go to the "Files and versions" tab.
- Click "Add file", then "Upload files".
- Select your app.py and requirements.txt files and upload them.

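That's it! Hugging Face Spaces will automatically install the required libraries and start your app. After a few moments, your app will be live for anyone in the world to use. Because the app asks each user to enter their own API key, you don't need to worry about managing a server-side key.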

Click here to check out the "Selfie with a Celeb" app!

Input:

Input images | Nano Banana App

Output:

Nano-Banana-App output

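You provide your Gemini API key, upload your photo, and add a celebrity image. After you click "Generate Image", the app processes the request and delivers the output within a few minutes. The result looks natural and realistic, and the two people blend together consistently.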

Try it with your own API key, your own photo, and a picture of your favorite celebrity!

Conclusion

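You now have a complete blueprint for building your own viral AI image app. We looked at how Google's Nano Banana model (Gemini 2.5 Flash Image) produces highly realistic, consistent output, and how easily it integrates with tools such as Gradio and Hugging Face Spaces. Best of all, you can customize the prompt, tweak the interface, or even extend the idea into an entirely new app. In just a few steps, you can take this project from concept to reality and create something genuinely worth sharing.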

What Nano Banana app are you building? Let me know in the comments below!
