This paper introduces InsTaG, a 3D talking head synthesis framework that enables fast learning of a realistic, personalized 3D talking head from only a few seconds of training video. Built upon a lightweight 3DGS person-specific synthesizer with universal motion priors, InsTaG achieves high-quality, fast adaptation while preserving a high level of personalization and efficiency.
As preparation, we first propose an Identity-Free Pre-training strategy that enables pre-training of the person-specific model and encourages the collection of universal motion priors from a long-video data corpus. To fully exploit these universal motion priors when learning an unseen new identity, we then present a Motion-Aligned Adaptation strategy that adaptively aligns the target head to the pre-trained field and constrains a robust dynamic head structure under limited training data.
Extensive experiments demonstrate the outstanding performance and efficiency of InsTaG in rendering high-quality personalized talking head videos across various data scenarios.
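As a rough illustration of how a universal motion prior could be shared across identities, the sketch below models the motion field as an audio-conditioned deformation MLP over canonical 3D Gaussians. It is a minimal PyTorch mock-up under that assumption; every name here (UniversalMotionField, audio_dim, the 3-D offset output) is hypothetical and not the paper's actual implementation.

# Minimal sketch (hypothetical, not the authors' code) of a universal motion
# field: an MLP that maps a canonical Gaussian position plus an audio feature
# to a per-Gaussian deformation, shared across identities so it can be
# pre-trained on a multi-speaker corpus and reused for a new person.
import torch
import torch.nn as nn

class UniversalMotionField(nn.Module):
    def __init__(self, audio_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + audio_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 3),  # per-Gaussian position offset
        )

    def forward(self, xyz: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) canonical Gaussian centers; audio: (audio_dim,) frame feature.
        cond = audio.expand(xyz.shape[0], -1)            # broadcast audio to each Gaussian
        return self.mlp(torch.cat([xyz, cond], dim=-1))  # (N, 3) offsets

# Example: deform 10k canonical Gaussians for one audio frame.
field = UniversalMotionField()
xyz = torch.randn(10_000, 3)
audio_feat = torch.randn(64)
deformed_xyz = xyz + field(xyz, audio_feat)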
As preparation, InsTaG collects common knowledge of talking motion from a long-video corpus via Identity-Free Pre-training, storing it as a motion field. Given a short video of a new identity, the Motion-Aligned Adaptation strategy builds a robust and fast person-specific synthesizer on top of the pre-trained motion field to learn a high-quality personalized 3D talking head.
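To make the two-stage recipe concrete, below is a hedged, self-contained toy in PyTorch: stage 1 pre-trains a shared audio-to-motion network on synthetic multi-speaker data, and stage 2 freezes it and optimizes only a small per-person alignment on a handful of samples, mimicking how a pre-trained motion prior can be aligned to a new identity. All names, losses, and the alignment parameterization are illustrative assumptions, not the released InsTaG code.

# Hedged toy of the two-stage workflow (illustrative, not the released code).
# Stage 1: identity-free pre-training on a "long-video corpus" (synthetic here).
# Stage 2: motion-aligned adaptation to a new identity from a few samples.
import torch
import torch.nn as nn

torch.manual_seed(0)

motion_field = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 3))

# --- Stage 1: pre-train the shared motion prior on many speakers ---
opt = torch.optim.Adam(motion_field.parameters(), lr=1e-3)
for step in range(200):
    audio = torch.randn(32, 64)              # batch of audio features
    target_motion = audio[:, :3] * 0.1       # synthetic "ground-truth" motion
    loss = nn.functional.mse_loss(motion_field(audio), target_motion)
    opt.zero_grad(); loss.backward(); opt.step()

# --- Stage 2: adapt to a new identity with the prior frozen ---
for p in motion_field.parameters():
    p.requires_grad_(False)                  # keep the universal prior fixed
align_scale = torch.ones(3, requires_grad=True)   # per-person alignment params
align_shift = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([align_scale, align_shift], lr=1e-2)
few_audio = torch.randn(8, 64)               # a "few-second video" worth of frames
few_motion = few_audio[:, :3] * 0.12 + 0.05  # the new identity's shifted motion
for step in range(300):
    pred = motion_field(few_audio) * align_scale + align_shift
    loss = nn.functional.mse_loss(pred, few_motion)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"adapted loss: {loss.item():.4f}")

Because only the small alignment is optimized in stage 2, adaptation converges far faster than training a synthesizer from scratch, which is the intuition behind the few-second setting.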
Comparison with current SOTA baselines. Zoom in for better visualization.
@inproceedings{li2025instag,
title={InsTaG: Learning Personalized 3D Talking Head from Few-Second Video},
author={Li, Jiahe and Zhang, Jiawei and Bai, Xiao and Zheng, Jin and Zhou, Jun and Gu, Lin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}