Running BakLLaVA with PyLLMCore
Reference

I used the PyLLMCore library, but LLaVA (VLM) support was only just added, so it will probably change quite a bit from here. (nomadoor)

Installing PyLLMCore
$ python -m venv venv
$ venv/Scripts/activate
$ pip install py-llm-core

Downloading the quantized BakLLaVA model and the CLIP model
Put the BakLLaVA model in the following directory:
C:\Users\<username>\.cache\py-llm-core\models
The CLIP model can go anywhere you like.
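Before writing any model code, it can help to confirm that the download actually landed where the library will look. The following is a minimal sketch, assuming the models directory is `~/.cache/py-llm-core/models` as in the path above; the helper name `find_gguf_models` is made up for this note and is not part of PyLLMCore.

```python
from pathlib import Path

# Assumption: models live under ~/.cache/py-llm-core/models,
# matching the directory mentioned in this note.
MODELS_DIR = Path.home() / ".cache" / "py-llm-core" / "models"


def find_gguf_models(models_dir: Path = MODELS_DIR) -> list[str]:
    """Return the sorted file names of all .gguf files in models_dir."""
    if not models_dir.is_dir():
        # Directory missing: nothing has been downloaded there yet.
        return []
    return sorted(p.name for p in models_dir.glob("*.gguf"))


if __name__ == "__main__":
    # Lists whatever .gguf files are currently in the models directory.
    print(find_gguf_models())
```

If the BakLLaVA file name does not show up in the printed list, the `name=` argument used later will most likely not resolve.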

Writing the code
main.py
from llm_core.llm import LLaVACPPModel

model = "BakLLaVA-1-Q4_K_M.gguf"

llm = LLaVACPPModel(
    name=model,
    llama_cpp_kwargs={
        "logits_all": True,
        "n_ctx": 8000,
        "verbose": False,
        "n_gpu_layers": 100,  #: Set to 0 if you don't have a GPU
        "n_threads": 1,  #: Set to the number of available CPU cores
        "clip_model_path": r"F:\BakLLaVA\model\BakLLaVA-1-clip-model.gguf",
    }
)

llm.load_model()

history = [
    {
        'role': 'user',
        'content': [
            {'type': 'image_url', 'image_url': "http://localhost:8000/pexels-photo-4946625.jpg"}
        ]
    }
]

response = llm.ask('Please describe this image as accurately as possible', history=history)
print(response.choices[0].message.content)
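For some reason the image can only be specified by URL, so I work around it by running python -m http.server in the folder that contains the image (it serves that folder on port 8000 by default) and pointing the URL at the local server.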
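The local-server workaround can also be done from inside the script, so no second terminal is needed. This is only a sketch using the Python standard library; `serve_directory` and `image_url` are names invented here, and nothing in it is PyLLMCore API.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import quote


def serve_directory(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Serve `directory` over HTTP on 127.0.0.1 from a background thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread: it will not keep the process alive after the script ends.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def image_url(server: ThreadingHTTPServer, filename: str) -> str:
    """Build the URL under which `filename` is reachable on `server`."""
    host, port = server.server_address[:2]
    # quote() escapes spaces and other characters that are unsafe in a URL path.
    return f"http://{host}:{port}/{quote(filename)}"
```

In main.py this would be used roughly as `server = serve_directory(".")`, then `image_url(server, "pexels-photo-4946625.jpg")` in place of the hard-coded URL, and `server.shutdown()` once the answer has been printed.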

Running it
$ python main.py

Results
BakLLaVA
> The image features a kitchen with a counter that has various items on it, including several coffee cups and bottles. There is a sink located in the middle of the counter. A large, colorful world map is displayed on the wall behind the counter, adding a unique touch to the space.
> In addition to the cups and bottles, there are also some bowls placed on the counter. The kitchen appears to be well-equipped for coffee making and serving, with multiple cups and bottles available for use.
> There is a brown wooden cabinet and a kitchen countertop in the picture. On the left side of the cabinet, there is a black sink and two silver water pipes. There are also two thermos and several bottles of soap on the countertop. On the right side of the cabinet, there is a coffee maker and a black electric stove. There are two cups and a black electronic thermometer on the countertop. Behind the cabinet is a blue world map, with English, Chinese, and Latin letters on it.
>english_text, no_humans, window, table, scenery, paper

BakLLaVA
>The image features a young woman with black hair, wearing a red bow and sporting a large, red bubble in her mouth. She is also wearing an Asian-style school uniform with a tie. Her expression appears to be a mix of innocence and rebellion, as she chews on the gum with attitude.
CogVLM
>A cartoon girl wearing a gray sailor suit is blowing bubble gum. She has a black hair tie and a red tie. Her eyes are red, and she is wearing a black collar. The background is white.
WD14-tagger
>1girl, solo, breasts, looking_at_viewer, bangs, simple_background, shirt, black_hair, red_eyes, jewelry, school_uniform, monochrome, upper_body, earrings, serafuku, choker, sailor_collar, hair_bun, neckerchief, eyelashes, piercing, single_hair_bun, ear_piercing, red_neckerchief, spot_color, bubble_blowing, chewing_gum

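I was thinking of having LLaVA write the captions for a LoRA training dataset, but I'm not sure how well that would work. (nomadoor)
WD14-tagger is remarkably strong on anime-style characters.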
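If one did want to try the captioning idea, the file-handling side might look like the sketch below. It assumes the training tool reads a sidecar text file next to each image (the `.txt` extension is an assumption, check what your trainer expects), and `caption_fn` is a hypothetical callable that would wrap whichever model produces the caption, for example the `llm.ask` call in main.py.

```python
from pathlib import Path
from typing import Callable

# Assumption: these are the image types present in the dataset folder.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def write_captions(image_dir: Path, caption_fn: Callable[[Path], str]) -> list[Path]:
    """Write one sidecar .txt caption per image and return the written paths."""
    written = []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files, including existing captions
        caption_path = image.with_suffix(".txt")
        # caption_fn is hypothetical: it maps an image path to a caption string.
        caption_path.write_text(caption_fn(image).strip() + "\n", encoding="utf-8")
        written.append(caption_path)
    return written
```

Keeping the model behind `caption_fn` makes it easy to swap BakLLaVA, CogVLM, or a tagger and compare the resulting captions on the same folder.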