Using Stable Diffusion img2img on a GTX 1070 (8 GB VRAM)

I wanted to run img2img (Stable Diffusion).

It ran after changing the code as follows:
```diff
diff --git a/scripts/img2img.py b/scripts/img2img.py
index 421e215..1a4f3ba 100644
--- a/scripts/img2img.py
+++ b/scripts/img2img.py
@@ -49,9 +49,17 @@ def load_img(path):
     image = Image.open(path).convert("RGB")
     w, h = image.size
     print(f"loaded input image of size ({w}, {h}) from {path}")
-    w, h = map(lambda x: x - x % 32, (w, h))  # resize to integer multiple of 32
+    ar = w/h
+    if(w > h):
+        h = 512
+        w = int(ar*h)
+    else:
+        w = 512
+        h = int(w/ar)
+    w, h = map(lambda x: x - x % 16, (w, h))  # resize to integer multiple of 16
+    print(f"resized image of size ({w}, {h})")
     image = image.resize((w, h), resample=PIL.Image.LANCZOS)
-    image = np.array(image).astype(np.float32) / 255.0
+    image = np.array(image).astype(np.float16) / 255.0
     image = image[None].transpose(0, 3, 1, 2)
     image = torch.from_numpy(image)
     return 2.*image - 1.
@@ -198,6 +206,7 @@ def main():
     config = OmegaConf.load(f"{opt.config}")
     model = load_model_from_config(config, f"{opt.ckpt}")
+    # https://github.com/CompVis/stable-diffusion/issues/71#issuecomment-1224699343
+    model.half()
     device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
     model = model.to(device)
```

I confirmed it goes through with a 512x512 input image.
Note: the resolution-change logic (lines 10-18 of the diff) is written, but I have not verified that it actually works.
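The resize logic in the first hunk can be sanity-checked in isolation, without loading any image. Below is a sketch of that logic pulled out into a standalone function (`target_size` is a hypothetical name, not something in the script):

```python
def target_size(w, h, base=512, multiple=16):
    """Scale so the shorter side becomes `base` (keeping aspect ratio),
    then round each side down to a multiple of `multiple`,
    mirroring the logic added to load_img in the diff."""
    ar = w / h
    if w > h:
        h = base
        w = int(ar * h)
    else:
        w = base
        h = int(w / ar)
    return w - w % multiple, h - h % multiple

print(target_size(2715, 1642))  # the image from the log below -> (832, 512)
print(target_size(512, 512))    # already square -> (512, 512)
```

This makes it easy to check what dimensions a given input would be resized to before spending VRAM on it.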

Explanation
Calling model.half() converts the weights to float16, which halves the amount of memory it tries to allocate (2.64 GiB -> 1.32 GiB):
```zsh
RuntimeError: CUDA out of memory. Tried to allocate 1.32 GiB (GPU 0; 8.00 GiB total capacity; 6.05 GiB already allocated; 0 bytes free; 6.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
Even with this, an input of roughly 512x800 was still not enough.
Resizing the input source to 512x512 made it go through.
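The halving can be demonstrated without a GPU: a float16 array of the same shape occupies exactly half the bytes of its float32 counterpart. A minimal NumPy sketch, with an arbitrary stand-in tensor rather than real model weights:

```python
import numpy as np

# Stand-in for a 1x3x512x512 input tensor (not actual model weights)
x32 = np.zeros((1, 3, 512, 512), dtype=np.float32)
x16 = x32.astype(np.float16)

print(x32.nbytes)  # 3145728 bytes: 786432 elements x 4 bytes
print(x16.nbytes)  # 1572864 bytes: 786432 elements x 2 bytes
```

The same 2x ratio applies to every tensor the model allocates, which is why the failed allocation dropped from 2.64 GiB to 1.32 GiB.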

--
Trial-and-error log
Loading a 2715x1642 image ran out of VRAM:
```zsh
RuntimeError: CUDA out of memory. Tried to allocate 4.18 GiB (GPU 0; 8.00 GiB total capacity; 4.09 GiB already allocated; 2.40 GiB free; 4.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
Shrinking it down to a 512 base was still not enough:
```zsh
RuntimeError: CUDA out of memory. Tried to allocate 2.64 GiB (GPU 0; 8.00 GiB total capacity; 5.39 GiB already allocated; 406.09 MiB free; 5.61 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
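Both error messages suggest setting max_split_size_mb when reserved memory far exceeds allocated memory. One further thing that could be tried (untested here) is setting PYTORCH_CUDA_ALLOC_CONF before launching the script; 128 is an arbitrary example value:

```shell
# Cap the allocator's block split size (in MiB) to reduce fragmentation
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```

This only changes how PyTorch's caching allocator carves up memory; it does not create more VRAM, so the float16 conversion and the resize remain the main fixes.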