> Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks
> DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
> F2-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories
> GLIGEN: Open-Set Grounded Text-to-Image Generation
> ImageBind: One Embedding Space To Bind Them All
> Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
> MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
> Planning-oriented Autonomous Driving
> SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
> VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation