About the future works #141

Open
huyduong7101 opened this issue Jun 14, 2024 · 0 comments

Comments


huyduong7101 commented Jun 14, 2024

In the scope of human-related video generation, there are two main and emerging problems: Talking Face Generation (TFG) and Human Animation Generation (HAG). The difference between the two lies in the inputs fed to the models (I assume the models here are diffusion-based):

  • For TFG, the input is audio + a reference image/video
  • For HAG, the input is pose + a reference image/video.

Hence, I wonder whether any studies have adopted an approach that merges the two problems into one. If not, what are the current obstacles (data, modeling, ...)?
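To make the question concrete: one way a unified model could work is a single conditioning interface that accepts either (or both) driving signals alongside the shared reference image/video. Below is a minimal sketch of that idea; all names (`Condition`, `build_condition_tokens`) are hypothetical and not taken from any existing codebase, and the "embeddings" are plain lists standing in for real encoder outputs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Condition:
    """Unified conditioning for human-related video generation.

    reference:      identity frames, shared by both TFG and HAG.
    audio_features: per-frame audio embeddings (TFG driving signal).
    pose_sequence:  per-frame pose keypoints (HAG driving signal).
    """
    reference: List[List[float]]
    audio_features: Optional[List[List[float]]] = None
    pose_sequence: Optional[List[List[float]]] = None


def build_condition_tokens(cond: Condition) -> List[List[float]]:
    """Concatenate whichever driving signals are present into one token
    sequence (e.g. for cross-attention in a diffusion backbone).

    Missing modalities are simply dropped, so the same model could in
    principle be trained on TFG-only, HAG-only, or joint data.
    """
    tokens = list(cond.reference)
    if cond.audio_features is not None:
        tokens += cond.audio_features
    if cond.pose_sequence is not None:
        tokens += cond.pose_sequence
    return tokens
```

Even this toy interface hints at the data obstacle in the question: training the joint model well would seem to require clips annotated with both audio and pose, which existing TFG and HAG datasets rarely provide together.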
