About the future works #141

Open
huyduong7101 opened this issue Jun 14, 2024 · 0 comments

Comments


huyduong7101 commented Jun 14, 2024

In the scope of human-related video generation, there are two main and emerging problems: Talking Face Generation (TFG) and Human Animation Generation (HAG). The difference between the two lies in the inputs fed to the models (I assume the models here are diffusion-based):

  • For TFG, the input is audio + a reference image/video
  • For HAG, the input is pose + a reference image/video.

Hence, I wonder whether any studies have adopted an approach that merges the two problems into one. If not, what are the current obstacles (data, modeling, ...)?
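To make the question concrete: one way a unified model could work is a single conditioning interface that accepts either (or both) driving signals alongside the shared reference image/video. Below is a minimal sketch of that idea; all names (`Condition`, `build_condition_tokens`) are hypothetical and not taken from any existing codebase, and the "embeddings" are plain lists standing in for real encoder outputs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Condition:
    """Unified conditioning for human-related video generation.

    reference:      identity frames, shared by both TFG and HAG.
    audio_features: per-frame audio embeddings (TFG driving signal).
    pose_sequence:  per-frame pose keypoints (HAG driving signal).
    """
    reference: List[List[float]]
    audio_features: Optional[List[List[float]]] = None
    pose_sequence: Optional[List[List[float]]] = None


def build_condition_tokens(cond: Condition) -> List[List[float]]:
    """Concatenate whichever driving signals are present into one token
    sequence (e.g. for cross-attention in a diffusion backbone).

    Missing modalities are simply dropped, so the same model could in
    principle be trained on TFG-only, HAG-only, or joint data.
    """
    tokens = list(cond.reference)
    if cond.audio_features is not None:
        tokens += cond.audio_features
    if cond.pose_sequence is not None:
        tokens += cond.pose_sequence
    return tokens
```

Even this toy interface hints at the data obstacle in the question: training the joint model well would seem to require clips annotated with both audio and pose, which existing TFG and HAG datasets rarely provide together.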
