Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

F2-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories

GLIGEN: Open-Set Grounded Text-to-Image Generation

ImageBind: One Embedding Space To Bind Them All

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation