Textual Inversion
>Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new "words" in the embedding space of a frozen text-to-image model. These "words" can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts.
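The mechanics are simple: a pseudo-word is added to the tokenizer, one new row of the text encoder's embedding table is optimized with the usual diffusion loss, and everything else stays frozen. Below is a minimal sketch in PyTorch/transformers terms; the model id, the pseudo-word `<my-concept>`, and the initializer word `"sculpture"` are illustrative assumptions, not anything prescribed by the paper.

```python
# Sketch of the core mechanism: learn a single new token embedding while
# the text encoder (and, in full training, the U-Net and VAE) stays frozen.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"  # assumed checkpoint
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

# Register the pseudo-word and grow the embedding table by one row.
tokenizer.add_tokens("<my-concept>")
text_encoder.resize_token_embeddings(len(tokenizer))
new_id = tokenizer.convert_tokens_to_ids("<my-concept>")

# Initialize the new row from a coarse descriptor of the concept.
embeddings = text_encoder.get_input_embeddings().weight
with torch.no_grad():
    init_id = tokenizer.encode("sculpture", add_special_tokens=False)[0]
    embeddings[new_id] = embeddings[init_id].clone()

# Freeze everything, then re-enable gradients only for the embedding table;
# during training the gradient is masked so only the new "word" moves.
text_encoder.requires_grad_(False)
embeddings.requires_grad_(True)
optimizer = torch.optim.AdamW([embeddings], lr=5e-4)

# ... inside the training loop, after loss.backward():
#     mask = torch.zeros_like(embeddings.grad)
#     mask[new_id] = 1.0
#     embeddings.grad *= mask          # keep only the new token's gradient
#     optimizer.step(); optimizer.zero_grad()
```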
Since CLIP L is used in both SDXL and SD3.5, it seems that, unlike LoRA, embeddings trained on Stable Diffusion 1.5 can more or less be carried over as assets.

Though since SDXL and later use multiple text encoders in combination, the effect is probably diluted.
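To make the carry-over concrete: an SD 1.5 textual inversion embedding lives in CLIP L's token embedding space, so it can in principle be attached to just the CLIP L encoder of an SDXL pipeline. A rough sketch using the diffusers `load_textual_inversion` API follows; the file name, state-dict key handling, and token name are placeholders, and the second (OpenCLIP-G) encoder receives nothing, which is exactly why the effect should be weaker.

```python
# Attach an SD 1.5-trained embedding to SDXL's CLIP L encoder only.
import torch
from diffusers import StableDiffusionXLPipeline
from safetensors.torch import load_file

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# SD 1.5 embedding files typically hold a single CLIP L tensor; the exact
# key varies by trainer, so grabbing the first tensor is an assumption.
state = load_file("my-concept-sd15.safetensors")  # placeholder path
clip_l_embed = next(iter(state.values()))

# Target only the CLIP L tokenizer/encoder pair that SDXL shares with SD 1.5;
# pipe.tokenizer_2 / pipe.text_encoder_2 (OpenCLIP-G) never see the token.
pipe.load_textual_inversion(
    clip_l_embed,
    token="<my-concept>",
    text_encoder=pipe.text_encoder,
    tokenizer=pipe.tokenizer,
)

image = pipe("a photo of <my-concept> on a beach").images[0]
```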