/motoso/negative prompt - Scrapbox Reader

generated at 2/20/2025, 5:21:42 PM
negative prompt
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Negative-prompt
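A technique first adopted by AUTOMATIC1111 in the AUTOMATIC1111 version of Stable Diffusion web UI
During sampling, the unconditional branch uses user-specified text instead of an empty prompt
The sampler repeats the following
Denoise the image guided by the prompt (conditional on the prompt)
Example: a man with a beard
Denoise the image guided toward looking like the negative prompt
Example: a beard
Then move from the latter toward the former
I think it builds a tensor pointing in that direction
In other words, the negative prompt determines the starting point of the guiding tensor
In practice this should be the tensor produced by running the negative prompt through the text encoder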
https://stable-diffusion-art.com/how-negative-prompt-work/

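Let's check whether Automatic's implementation actually works this way
My reading of the code is very rough, so I am not sure it is correct
The codebase is already fairly large, which makes it hard to read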

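In the end, this is what it does
Convert the input negative prompt into an unconditional condition
Give the condition and unconditional condition to the Sampler and run sampling
As the goal, these conditions are blended into a single tensor for the computation
The image is updated based on this tensor
Keep running the loop
Hmm, I can't find the denoising pass for each individual prompt
Probably because that lives in the sampler implementation
So what I read was the part that tries to move from the latter toward the former
Steps 1 and 2 can't be understood without reading `get_learned_conditioning()`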
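The steps above can be sketched as a toy loop. This is only an illustration of the control flow, not the web UI's code: `encode`, `predict_noise`, `sample`, and `step_size` are hypothetical stand-ins, the "image" is a single number, and the blending rule is the one the notes later find in p_sample_ddim.

```python
def encode(prompt):
    """Hypothetical text encoder: maps a prompt to a single number."""
    return float(len(prompt))


def predict_noise(x, cond):
    """Hypothetical denoiser: the 'noise' is the offset of x from the cond."""
    return x - cond


def sample(prompt, negative_prompt, steps=200, cfg_scale=7.0, step_size=0.1):
    cond = encode(prompt)               # condition
    uncond = encode(negative_prompt)    # unconditional condition
    x = 0.0                             # starting "image"
    for _ in range(steps):
        e_cond = predict_noise(x, cond)
        e_uncond = predict_noise(x, uncond)
        # Blend both predictions into one value, starting from the uncond side.
        e = e_uncond + cfg_scale * (e_cond - e_uncond)
        x = x - step_size * e           # update the image with the blended value
    return x


result = sample("bearded man", "beard")
print(result)
```

In this toy, x settles near uncond + cfg_scale * (cond - uncond), that is, past the prompt's value on the side away from the negative prompt's value.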

The negative prompt is converted into an unconditional condition
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/433b3ab7017556a19173a86d1215ed0a0b5b1396/modules/processing.py#L642
The negative prompt is converted with `get_learned_conditioning()`
code:py
def get_learned_conditioning(model, prompts, steps):
    """converts a list of prompts into a list of prompt schedules - each schedule is a list of ScheduledPromptConditioning, specifying the condition (cond),
    and the sampling step at which this condition is to be replaced by the next one.

    Input:
    (model, ['a red crown', 'a [blue:green:5] jeweled crown'], 20)

    Output:
    [
        [
            ScheduledPromptConditioning(end_at_step=20, cond=tensor([[-0.3886,  0.0229, -0.0523,  ..., -0.4901, -0.3066,  0.0674], ..., [ 0.3317, -0.5102, -0.4066,  ...,  0.4119, -0.7647, -1.0160]], device='cuda:0'))
        ],
        [
            ScheduledPromptConditioning(end_at_step=5, cond=tensor([[-0.3886,  0.0229, -0.0522,  ..., -0.4901, -0.3067,  0.0673], ..., [-0.0192,  0.3867, -0.4644,  ...,  0.1135, -0.3696, -0.4625]], device='cuda:0')),
            ScheduledPromptConditioning(end_at_step=20, cond=tensor([[-0.3886,  0.0229, -0.0522,  ..., -0.4901, -0.3067,  0.0673], ..., [-0.7352, -0.4356, -0.7888,  ...,  0.6994, -0.4312, -1.2593]], device='cuda:0'))
        ]
    ]
    """
This tensor is then given the same shape as the condition (the tensor built from the prompt)
code:py
# for DDIM, shapes must match, we can't just process cond and uncond independently;
# filling unconditional_conditioning with repeats of the last vector to match length is
# not 100% correct but should work well enough
if unconditional_conditioning.shape[1] < cond.shape[1]:
    last_vector = unconditional_conditioning[:, -1:]
    last_vector_repeated = last_vector.repeat([1, cond.shape[1] - unconditional_conditioning.shape[1], 1])
    unconditional_conditioning = torch.hstack([unconditional_conditioning, last_vector_repeated])
elif unconditional_conditioning.shape[1] > cond.shape[1]:
    unconditional_conditioning = unconditional_conditioning[:, :cond.shape[1]]
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/8a34671fe91e142bce9e5556cca2258b3be9dd6e/modules/sd_samplers_compvis.py#L88-L97
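The schedule lookup and the length matching described in this section can be sketched without torch. This is a simplified illustration under assumptions: `cond_at_step` and `match_length` are hypothetical helper names, conds are strings or nested lists instead of tensors, the batch dimension is dropped, and the boundary rule (`step <= end_at_step`) is my reading of the docstring rather than something confirmed here.

```python
from collections import namedtuple

# Stand-in with the two fields shown in the docstring; `cond` holds plain
# Python values here instead of torch tensors (an assumption for illustration).
ScheduledPromptConditioning = namedtuple(
    "ScheduledPromptConditioning", ["end_at_step", "cond"]
)


def cond_at_step(schedule, step):
    """Return the cond of the first entry whose end_at_step has not passed."""
    for entry in schedule:
        if step <= entry.end_at_step:
            return entry.cond
    return schedule[-1].cond  # past the last boundary: keep the final cond


def match_length(uncond, cond):
    """Pad uncond with copies of its last vector, or truncate it, to len(cond)."""
    if len(uncond) < len(cond):
        padding = [list(uncond[-1]) for _ in range(len(cond) - len(uncond))]
        return uncond + padding
    return uncond[:len(cond)]


# 'a [blue:green:5] jeweled crown' over 20 steps, with strings as fake conds.
schedule = [
    ScheduledPromptConditioning(end_at_step=5, cond="blue"),
    ScheduledPromptConditioning(end_at_step=20, cond="green"),
]
print(cond_at_step(schedule, 3), cond_at_step(schedule, 12))

cond = [[1.0], [2.0], [3.0]]   # 3 token vectors
uncond = [[9.0], [8.0]]        # 2 token vectors
print(match_length(uncond, cond))
```

As the original comment says, repeating the last vector is an approximation; the sketch only shows how the lengths are made to agree.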

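Give the condition and unconditional condition to the Sampler
In practice this condition takes the form of a ScheduledPromptConditioning, which holds a tensor
There seem to be several ways to specify the Sampler, but when the CompVis implementation is called, it is selected here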
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/8a34671fe91e142bce9e5556cca2258b3be9dd6e/modules/sd_samplers_compvis.py#L14
The sample method calls launch_sampling
code:py
samples_ddim = self.launch_sampling(steps, lambda: self.sampler.sample(S=steps, conditioning=conditioning, batch_size=int(x.shape[0]), shape=x[0].shape, verbose=False, unconditional_guidance_scale=p.cfg_scale, unconditional_conditioning=unconditional_conditioning, x_T=x, eta=self.eta)[0])
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/8a34671fe91e142bce9e5556cca2258b3be9dd6e/modules/sd_samplers_compvis.py#L218
From here I read on assuming the conditions are passed to the DDIM implementation
In practice, this seems to be roughly what gets called
https://github.com/CompVis/latent-diffusion/blob/main/ldm/models/diffusion/ddim.py
When you use the import statement in Python, the interpreter searches for the specified module in a list of directories specified by the `sys.path` variable. This variable is initialized from the `PYTHONPATH` environment variable and some default locations such as the current working directory and the standard library directories.
If the module is found, it is loaded and made available for use in the current script. If it is not found, an `ImportError` is raised.
For example, when you run `import ldm.models.diffusion.ddim`, Python will search for a file named ddim.py in a directory named ldm/models/diffusion within one of the directories specified in sys.path. If it is found, the code within that file will be executed and any objects defined within it will be made available for use in your script.
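One way to check where a module would actually be loaded from is to ask the import machinery directly. A minimal sketch, assuming nothing about the local environment: `locate_module` is a hypothetical helper name, and whether `ldm` resolves depends on what is installed.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a module would be loaded from, or None if not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package (for example `ldm`) is itself missing.
        return None
    return spec.origin if spec is not None else None


print(len(sys.path), "directories on sys.path")
print(locate_module("json"))                        # a standard library package
print(locate_module("ldm.models.diffusion.ddim"))   # None unless `ldm` is installed
```

If the second call prints a path, that file is the ddim.py the sampler ends up importing.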
Even after cloning the repository locally and installing it, I couldn't find ddim.py
The arguments of the sample method match
https://github.com/CompVis/latent-diffusion/blob/66df437e52826a5149a1c20dcc9f0be0abd0f685/ldm/models/diffusion/ddim.py#L56
ddim_sampling() repeatedly calls p_sample_ddim(). Each call applies the model for one step

Inside p_sample_ddim

When an unconditional condition is present, the sampler (DDIM) applies the model with image, uc + cond, and ts (time step?)
The resulting `e_t_uncond` and `e_t` are used to update e_t
I don't understand what e_t means
A tensor that takes both the prompt and the negative prompt into account?
`e_t = e_t_uncond + unconditional_guidance_scale * (e_t - e_t_uncond)`
Reading e_t as ∇cond, e_t_uncond as ∇uncond, and unconditional_guidance_scale as CFGnorm, this expression
becomes (1 - CFGnorm)∇uncond + CFGnorm∇cond, which matches the formula in classifier-free guidance#6427feac774b170000f2ad53
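The rearrangement above can be checked numerically. A minimal sketch with made-up numbers: `cfg_combine` and `cfg_weighted` are hypothetical names, and plain lists stand in for the tensors.

```python
def cfg_combine(e_uncond, e_cond, scale):
    """Form used in p_sample_ddim: start at e_uncond, step toward e_cond."""
    return [u + scale * (c - u) for u, c in zip(e_uncond, e_cond)]


def cfg_weighted(e_uncond, e_cond, scale):
    """Rearranged form: (1 - scale) * e_uncond + scale * e_cond."""
    return [(1 - scale) * u + scale * c for u, c in zip(e_uncond, e_cond)]


e_uncond = [0.5, -1.0, 2.0]   # made-up model output for the negative prompt
e_cond = [1.0, 0.0, 1.5]      # made-up model output for the prompt

for scale in (0.0, 1.0, 7.5):
    a = cfg_combine(e_uncond, e_cond, scale)
    b = cfg_weighted(e_uncond, e_cond, scale)
    assert all(abs(x - y) < 1e-9 for x, y in zip(a, b))
    print(scale, a)
```

With a scale of 0 the result is e_uncond, with 1 it is e_cond, and larger scales move past e_cond in the direction away from e_uncond, which is the "from the latter toward the former" behavior described at the top of these notes.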

In the end it returns two values
https://github.com/CompVis/latent-diffusion/blob/66df437e52826a5149a1c20dcc9f0be0abd0f685/ldm/models/diffusion/ddim.py#L203
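code:py
return x_prev, pred_x0
1. `pred_x0 = (x - sqrt_one_minus_at * e_t) / a_t.sqrt()`
x is the input image, and sqrt_one_minus_at is a 4-dimensional tensor
at appears to be alpha_t
Ignoring the coefficients, this is a tensor from e_t to the input image; I think of e_t as something like the final goal, so is it a tensor pointing from the goal toward the input image?
Wouldn't it be odd unless it pointed from the input image toward the goal?
Probably it is the estimate in the x_0 direction shown here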
Denoising Diffusion Probabilistic Models
DDIM does not use the DDPM equations as they are, so apparently it does not follow this figure exactly
An implicit method is apparently used
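2. x_prev, the image generated at this step
`x_prev = a_prev.sqrt() * pred_x0 + dir_xt + noise`
Read literally, it looks like the prediction of the image one step before the current x
a is alpha. dir_xt (the direction toward x_t)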
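The two return values can be tried out on scalars. This is a sketch under assumptions, not the CompVis code: `ddim_step` is a hypothetical name, e_t is treated as the model's noise prediction, the `dir_xt` expression follows the DDIM update rule as I understand it (the notes only name the variable), and the alpha values are made up.

```python
import math


def ddim_step(x, e_t, a_t, a_prev, sigma_t=0.0, noise=0.0):
    """One DDIM-style update on scalars; returns (x_prev, pred_x0)."""
    # Estimate of the clean sample x_0 implied by the current noise prediction.
    pred_x0 = (x - math.sqrt(1.0 - a_t) * e_t) / math.sqrt(a_t)
    # Component pointing toward x_t, rescaled for the previous noise level.
    dir_xt = math.sqrt(1.0 - a_prev - sigma_t ** 2) * e_t
    x_prev = math.sqrt(a_prev) * pred_x0 + dir_xt + sigma_t * noise
    return x_prev, pred_x0


# Build a noisy sample from a known x_0 and noise, then pretend the model
# predicted that noise perfectly (e_t == eps).
x0, eps = 2.0, -0.7
a_t, a_prev = 0.5, 0.8   # made-up alphas; a_prev > a_t means less noise
x_t = math.sqrt(a_t) * x0 + math.sqrt(1.0 - a_t) * eps

x_prev, pred_x0 = ddim_step(x_t, eps, a_t, a_prev)
print(pred_x0)   # close to x0
print(x_prev)    # close to sqrt(a_prev) * x0 + sqrt(1 - a_prev) * eps
```

Under this reading, subtracting the scaled e_t from x removes the predicted noise, which is why pred_x0 lands on the clean sample rather than pointing away from it.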
The step update after sampling uses the second value (pred_x0)
code:py
def after_sample(self, x, ts, cond, uncond, res):
    if not self.is_unipc:
        self.update_step(res[1])  # res[1] = pred_x0

    return x, ts, cond, uncond, res
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/8a34671fe91e142bce9e5556cca2258b3be9dd6e/modules/sd_samplers_compvis.py#L127-L131

The loop runs using the result of the previous iteration
https://github.com/CompVis/latent-diffusion/blob/66df437e52826a5149a1c20dcc9f0be0abd0f685/ldm/models/diffusion/ddim.py#L148
pred_x0 is captured in a variable but not used there (it is used during generation)