Motion Transfer with Dedicated-Purpose Models: High-Quality Generation with Enhanced Inference Efficiency

Thank you for your valuable comments! Our framework comprises two components: pose-guided appearance generation and local enhancement. Both components are built on generative adversarial networks (GANs), which lengthens training but improves generation quality.

Motion transfer models fall into two categories: dedicated-purpose models and general-purpose models. Dedicated-purpose models are trained to generate fake videos of one specific person and offer high video quality at the cost of longer training time. General-purpose models, by contrast, can generate fake videos of any person and require less training time, but their results are less satisfactory than those of dedicated-purpose models. In this paper, we concentrate on dedicated-purpose models.

It is difficult to compare time and resource consumption fairly across different models. However, we would like to emphasize that although our dedicated-purpose model requires a longer training time than a general-purpose model, its inference time is shorter. Specifically, during inference our approach achieves an average of 15 frames per second (FPS). For training, we use a server equipped with NVIDIA GeForce RTX 2080 Ti GPUs.
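
For reference, an average FPS figure of this kind can be obtained by timing frame generation over a sequence of inputs. The snippet below is a minimal sketch under stated assumptions, not our actual benchmarking code: `average_fps`, `generate_frame`, and the `warmup` parameter are hypothetical names introduced only for illustration, and `generate_frame` stands in for a single forward pass of a trained generator.

```python
import time


def average_fps(generate_frame, inputs, warmup=2):
    """Return frames per second averaged over the timed inputs.

    generate_frame: hypothetical callable that produces one output frame
                    from one input (e.g. a pose map).
    inputs:         iterable of inputs, one per frame.
    warmup:         number of initial calls excluded from timing, so that
                    one-time setup cost does not skew the average.
    """
    inputs = list(inputs)
    for x in inputs[:warmup]:
        generate_frame(x)  # warm-up calls, not timed

    timed = inputs[warmup:]
    if not timed:
        raise ValueError("need more inputs than warm-up calls")

    start = time.perf_counter()
    for x in timed:
        generate_frame(x)
    elapsed = time.perf_counter() - start

    # Average FPS = number of timed frames / total wall-clock time.
    return len(timed) / elapsed


if __name__ == "__main__":
    # Stand-in generator that takes about 10 ms per frame (assumption).
    fps = average_fps(lambda pose: time.sleep(0.01), range(12))
    print(f"average FPS: {fps:.1f}")
```

When the generator runs on a GPU, kernel launches are typically asynchronous, so the device should be synchronized before each clock reading (for example with `torch.cuda.synchronize()` in PyTorch); otherwise the measured FPS may be overstated.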
