/work4ai/Vision Transformer

generated at 2/12/2025, 6:48:10 PM

Vision Transformer
CLIP ViT-H/14
CLIP ViT-L/14
>We release a new CLIP ViT-G/14 CLIP model with OpenCLIP which achieves 80.1% zero-shot accuracy on ImageNet and 74.9% zero-shot image retrieval (Recall@5) on MS COCO. As of January 2023, this is the best open source CLIP model.
>https://t.co/TmVTUP3tBx
>https://t.co/PMnpUUTNpc LAION
>
https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k

LAION
https://arxiv.org/abs/2302.05442


When people just say "ViT", I think they mean /motoso/Vision Transformer.