/motoso/Generating images from a rough sketch with Stable Diffusion

generated at 2/20/2025, 5:49:27 PM
Generating images from a rough sketch with Stable Diffusion
https://github.com/CompVis/stable-diffusion#image-modification-with-stable-diffusion
img2img(stable diffusion)
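This is the kind of thing I want to do:
>@8co28: I tried the #Img2Img feature of #stablediffusion (generating an image from a given image).
>I gave it a rough guide sketch drawn in 3 minutes (2nd image) plus a prompt describing the picture's elements, and it generated the 1st image.
>Both images are exactly as generated and as supplied, unedited.
>Honestly, amazing...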
>

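>@8co28: #Img2Img notes on failed guide sketches
>A picture with many lines and colors that includes hands (1st image) unsurprisingly doesn't work (2nd image)
>A picture without hands, where the parts are easy to indicate (3rd image), gives fairly good results (4th image)
>The 1st and 3rd images are my own past drawings
>Using an already finished picture as the guide sketch doesn't seem practical yet (you need to make a simplified guide sketch)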
>


Environment setup
Using Stable Diffusion img2img on a GTX 1070 (8 GB VRAM)

 

python scripts/img2img.py --prompt "long haired girl is standing on the moon. Her arms are crossed and she is looking at us with a smile. Behind her is the earth in space. makoto shinkai style." --init-img img/mito3.jpg --strength 0.9 --n_samples 2 --n_iter 2
With img2img, passing --n_iter 2 makes the output grid two rows tall.

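strength 0.9
Bottom right: as a background, this is what I intended.
strength 0.8 / 0.7
0.8
Bottom right: the rim light is nice.
0.7
Top left: the composition, including the eye level, is as intended.
strength 0.6 / 0.5
0.6
The top row is good.
strength 0.4 / 0.3
Getting quite close to the original composition. The person is drawn in detail, but the backgrounds the model is usually good at don't come through.
strength 0.2 / 0.1
Generation is fast.
The result barely differs from the original picture, so it isn't worth doing.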
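The sweep above (strength 0.9 down to 0.1 with the same prompt and init image) could be scripted roughly as follows. This is a minimal sketch, not what was actually run for these notes: the helper name `build_sweep_commands` is hypothetical, the prompt is shortened, and only flags that appear in the commands on this page are used.

```python
import shlex


def build_sweep_commands(prompt, init_img, strengths, n_samples=2, n_iter=2):
    """Build one img2img command line (as an argv list) per strength value.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    commands = []
    for strength in strengths:
        commands.append([
            "python", "scripts/img2img.py",
            "--prompt", prompt,
            "--init-img", init_img,
            "--strength", str(strength),
            "--n_samples", str(n_samples),
            "--n_iter", str(n_iter),
        ])
    return commands


# 0.9, 0.8, ..., 0.1 (rounded so the flag values stay short)
strengths = [round(0.1 * i, 1) for i in range(9, 0, -1)]
commands = build_sweep_commands(
    "long haired girl is standing on the moon.",  # shortened prompt
    "img/mito3.jpg",
    strengths,
)

for argv in commands:
    # Print shell-quoted commands for review before running anything.
    print(shlex.join(argv))
```

To actually generate the images, each `argv` could be passed to `subprocess.run(argv, check=True)` from the root of the stable-diffusion checkout.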


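Observations
I would never use 0.2 or below.
Even at around 0.4 the amount of detail increases, so it might be useful when I draw a rough and want it touched up a little.
From around 0.5, results that clearly differ from the sketch start to appear.
I like this one at 0.6.
From 0.8 up, the AI starts drawing as it pleases and surprises appear.
This one at 0.8 is completely different in character.
The composition differs, but this zoomed-in feel could be nice too. Getting composition variations might be useful for brainstorming when drawing a single illustration.
For brainstorming, a large value such as 0.9 works well, since it lets the AI draw more on its own (the prompt carries more weight).
Top right: the setting has changed, but it looks like a 2000s light novel cover, which works in its own way.
Bottom right: this also looks like the cover of a novel with a title like "Farewell, Earth (Terra)".
The frumpy everyday clothes and the hand held to the head suggest a story.
Bottom left: looks like something out of an Osamu Tezuka manga (somehow). I would never draw a picture like this myself, which makes it interesting.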

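An opinion that around 0.6 is the easiest to use:
>@Pretty_Mundane: A small addition
>I felt that a rougher initial image fed to the AI tends to give better output
>If you draw in the silhouette, rough lighting, and color scheme, then generate at Strength 0.6, it picks up the nuance nicely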



python scripts/img2img.py --prompt "VTuber Tsukino Mito is standing on the moon. Her arms are crossed and she is looking at us with a smile full of herself. Behind her is the bright blue earth. Eye level is her knee. This is an animation so it is in 3D. The background is the earth with the moon in the sky" --init-img img/mito2.jpg --strength 0.8
https://github.com/CompVis/stable-diffusion
> strength is a value between 0.0 and 1.0, that controls the amount of noise that is added to the input image. Values that approach 1.0 allow for lots of variations but will also produce images that are not semantically consistent with the input. See the following example.
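A rough way to picture what strength does: the higher the value, the further the init image is pushed toward noise before being denoised again. The sketch below assumes the script derives the number of noising steps as `int(strength * ddim_steps)` with 50 DDIM steps by default. Both the formula and the default are my reading of img2img.py, not something stated in the README quote above, and `encode_steps` is a hypothetical name.

```python
def encode_steps(strength, ddim_steps=50):
    """Number of sampler steps the init image is noised for (assumed formula).

    Assumption: the script computes int(strength * ddim_steps); ddim_steps=50
    is likewise an assumed default.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return int(strength * ddim_steps)


for s in (0.2, 0.6, 0.9):
    print(f"strength {s}: noise for {encode_steps(s)} of 50 steps")
```

If this assumption holds, it would also explain why the low-strength runs finish quickly: fewer steps have to be denoised.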

 
Left: sketch. Right: generated output. Not bad, but not quite there! The eye level is wrong.

It also ran with n_samples 2, but
setting n_samples = 2 does not give a 2x2 grid, and specifying rows does not give a 2x2 grid either.
python scripts/img2img.py --prompt "VTuber Tsukino Mito is standing on the moon. Her arms are crossed and she is looking at us with a smile full of herself. Behind her is the bright blue earth. Eye level is her knee. This is an animation so it is in 3D. The background is the earth with the moon in the sky" --init-img img/mito3.jpg --strength 0.8 --n_samples 2
Reading the code, it loops over n_iter, so I specified that and the number of rows increased.
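The grid behavior can be sketched as below. It assumes the script produces `n_samples` images per iteration, repeats that `n_iter` times, and lays all images out with a fixed number of images per row, where the rows option (if given) sets that per-row count and otherwise `n_samples` is used. That layout rule is an assumption from reading the code, and `grid_shape` is a hypothetical helper, not part of the script.

```python
import math


def grid_shape(n_samples, n_iter, n_rows=0):
    """Return (rows, columns) of the output grid under the assumed layout rule.

    Assumption: n_rows > 0 sets the number of images per row; otherwise
    n_samples images are placed per row.
    """
    total = n_samples * n_iter
    per_row = n_rows if n_rows > 0 else n_samples
    columns = min(per_row, total)
    rows = math.ceil(total / columns)
    return rows, columns


print(grid_shape(n_samples=2, n_iter=1))            # n_samples alone
print(grid_shape(n_samples=2, n_iter=1, n_rows=2))  # rows option alone
print(grid_shape(n_samples=2, n_iter=2))            # with n_iter 2
```

Under this rule only the third call yields a 2x2 grid, which matches what happened here: n_samples and rows alone leave just two images in total, and n_iter is what adds the second row.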


Apparently it doesn't work on Mac
>@s_ryuuki: >resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
>warnings.warn('resource_tracker: There appear to be %d '
>It didn't work on Mac.
>👀 Generating images from a rough sketch with Stable Diffusion - 基素基 https://t.co/UGhLSe7Mzg
Officially, M1/M2 support is to come later, and NVIDIA GPUs are recommended.
/villagepump/@kidooom#63045086774b170000e03874