generated at
clip-ViT-L-14
ViT stands for Vision Transformer