VASA-1 is an AI framework developed by Microsoft Research that generates lifelike talking faces from a single static image and a speech audio clip. Its main innovations are a model that holistically generates facial dynamics and head movements in a face latent space, and an expressive, disentangled face latent space learned from videos. VASA-1 produces lip movements that are precisely synchronized with the audio, along with a wide range of facial nuances and natural head movements that add to the sense of authenticity and liveliness. It supports real-time generation of 512×512 videos at up to 40 FPS, paving the way for lifelike avatars that can emulate human conversational behavior. For more information, visit the official project page.
AI Framework: Microsoft VASA-1