Vision Transformer
>We release a new CLIP ViT-G/14 CLIP model with OpenCLIP which achieves 80.1% zero-shot accuracy on ImageNet and 74.9% zero-shot image retrieval (Recall@5) on MS COCO. As of January 2023, this is the best open source CLIP model.

When someone just says "ViT", I think they mean /motoso/Vision Transformer. (基素)