Midjourney Flips the Script With a New Image-to-Text Generator

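Midjourney has announced a new "/describe" command that lets users turn images into text on the powerful artificial intelligence (AI) platform, reversing Midjourney's usual process of converting text into images.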

Paul DelSignore described the feature on Medium, writing that "describe" has many significant benefits across a wide range of use cases.

Today we're releasing a /describe command that lets you transform images-into-words. Give it a shot! We think this tool will transform your linguistic-visual process both in terms of creative power and discovery.

-Midjourney

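One of the best aspects of the describe feature is that it should improve accessibility. For people with visual impairments, browsing the web can be a challenge. Alt text that describes an image makes the image more accessible. Writing alt text by hand is time-consuming, and Midjourney's describe feature may help overcome that obstacle.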
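
As a rough illustration of the alt-text use case, here is a minimal Python sketch that turns a generated description into an HTML `img` tag. The function name `img_tag_with_alt`, the file name, and the rule of keeping only the text before ", in the style of" are assumptions made for this example, not anything Midjourney provides. A generated description should still be reviewed by a person before it is published as alt text.

```python
from html import escape

# Assumption: style modifiers in a generated description follow this phrase.
STYLE_MARKER = ", in the style of"


def img_tag_with_alt(src: str, description: str) -> str:
    """Build an <img> tag whose alt text is the subject part of a description."""
    # Keep only the text before the style modifiers (hypothetical rule).
    subject = description.split(STYLE_MARKER, 1)[0].strip()
    # Escape both values so quotes or angle brackets cannot break the attributes.
    return f'<img src="{escape(src, quote=True)}" alt="{escape(subject, quote=True)}">'


print(img_tag_with_alt(
    "portrait.jpg",  # hypothetical file name
    "a young woman wearing a black and white polka dot dress standing, "
    "in the style of hazy landscapes, pensive portraiture",
))
```

Dropping the style modifiers is a design choice for this sketch: phrases such as lens names or art movements help an image generator but add little for a screen-reader user.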

Improved search benefits nearly every internet user. When images carry better, richer descriptions, search engines can index them more effectively.

DelSignore also highlights the importance of captions, since detailed captions help explain an image and give viewers clearer information.

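Image-to-text generation creates an interesting feedback loop with Midjourney's text-to-image system. Midjourney users can already generate similar images from a selection, but the image-to-text tool may make it easier to develop alternative, and possibly more productive, descriptions for the text-to-image generator.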

Gonna remix one of my images I created with Element 3D on AE

Using the /describe function to see what it says on #midjourney v5 is really interesting for prompt generation so will now see what they make.

–GooRee

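In its current iteration, just like its text-to-image generator, Midjourney creates four different text descriptions for an uploaded image. Users can also generate new variations based on a selected description. To upload a photo, a user types "/describe" in the text field, and a drag-and-drop upload field appears.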

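The user can then pick one of the generated descriptions and "remix" the uploaded image using the new text prompt. Users can also edit the text prompt, which adds another element of control to the creative process.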

Testing Midjourney's New /describe Feature

PetaPixel tested the feature, starting with a portrait shot by editor-in-chief Jaron Schneider.

Portrait shot by editor-in-chief Jaron Schneider. Image credit: Jaron Schneider

The four descriptions Midjourney generated vary in quality.

young woman in polka dot dress standing on top of hill, in the style of pensive portraiture, lens flares, voigtlander heliar 15mm f/4.5, troubadour style, close up, calarts, simple
a young woman wearing a black and white polka dot dress standing, in the style of hazy landscapes, pensive portraiture, sun-soaked colors, candid portraiture, tumblewave, troubadour style, close up
a young woman in black and white checkered dress standing on slope of hill, in the style of anamorphic lens flare, polka dots, california plein air, pensive stillness, sun-soaked colors, candid portraits, delicate
leigh wyatt | san francisco weddings|daughtersmaidsoflake benson | san francisco, in the style of neo-pop sensibility, gentle lyricism

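The first two descriptions are quite good, especially the second. Interestingly, Midjourney named a specific Voigtlander 15mm prime lens, although for the record the image was shot with a Tamron 35mm f/1.8 prime. Using the second description to generate a remix led to fairly impressive results.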

Midjourney-generated images based on the portrait shot by editor-in-chief Jaron Schneider

Four AI images generated by Midjourney v5 from the original portrait's description: "a young woman wearing a black and white polka dot dress standing, in the style of hazy landscapes, pensive portraiture, sun-soaked colors, candid portraiture, tumblewave, troubadour style, close up."

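Using another of Schneider's images, this time a landscape from Mono Lake, California, Midjourney again produced mostly useful text descriptions, albeit with incorrect location information for Mono Lake.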

Landscape photo of Mono Lake, California. Image credit: Jaron Schneider

monolake, las vegas, utah, united states of america near crystal, in the style of shot on 70mm, mikalojus konstantinas ciurlionis, post processing, 32k uhd, antoni gaudí, hazy landscapes, fenghua zhong
mono lake at sunset after a rain, in the style of focus stacking, light sky-blue and bronze, 32k uhd, national geographic photo, stock photo, dansaekhwa
mono lake, utah, in the style of 32k uhd, balanced symmetry, american tonalist, hazy, dreamlike quality, nikon d850, fenghua zhong
mono lake, california, sunrise photograph 1, in the style of 32k uhd, isolated landscapes, low depth of field

Using the third description as the remix prompt, Midjourney delivered four very realistic new images.

Midjourney-generated images based on the landscape photo of Mono Lake, California

Four AI-generated images based on "mono lake, utah, in the style of 32k uhd, balanced symmetry, american tonalist, hazy, dreamlike quality, nikon d850, fenghua zhong"

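Midjourney's /describe tool is fascinating, even in its early state. It should help creators produce more detailed alt text, captions, and even different AI-generated artwork. Some parts of the descriptions are puzzling, but they show promise, to say the least.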