Training Results Not Matching Demo Quality – Possible Overclaim? #38

Open
caliber1313 opened this issue Sep 4, 2024 · 6 comments

@caliber1313

Hello,
Unfortunately, my results are nowhere close to the demo clip shown at https://www.youtube.com/watch?v=c5VG7HkDs8I.

  • The lips do not move, and the output is blurry and worse than the reference papers.
  • The movements of the synthesized 3D talking head appear less accurate.
  • Despite using the same configurations mentioned in the paper, the model doesn’t seem to achieve the promised quality.

Could you clarify the following:

  1. The exact hyperparameters (including seeds) and the dataset you used to train the model in the demo?
  2. Any additional specific pre-processing steps or adjustments that might help improve the quality?
  3. Whether the demo clip was further enhanced or fine-tuned in ways not covered in the training script?

Here is my result (DeepSpeech). Is there anything I did wrong? I'm sure that I followed all the instructions: https://drive.google.com/file/d/1MC9O9c5Rtk5_GyTKUL0ak6wHt6qVZbb-/view

@Fictionarry
Owner

output2_aud.mp4
output_aud.mp4

Hi, here are two models I just trained entirely with the code in this repo using DeepSpeech. Both give more reasonable results than the ones you provided, so I think there must be a problem in your reproduction process. Please double-check it.

The experiments in the paper are based on the code in this repo, and all processes and hyperparameters are given. The few adjustments made since then can be seen in the commit history. They improve robustness across a wider range of data and would not lower the performance.

@jarun-title

@Fictionarry Can you provide the trained .pth files and all the preprocessed data so that I can reproduce the above result? I adjusted the environment to work with CUDA 12.1, and I'm not sure whether that is why I don't get the expected result.

@Fictionarry
Owner

Here are the checkpoint and the estimated camera poses for May. The entire preprocessed dataset is rather large, so uploading it would be troublesome. You can first check whether the performance reproduces well with the provided checkpoint, to find out whether the problem lies in the data preprocessing or in the training stage.

I have tried the code with CUDA 11.7. In that setup there seems to be no problem as long as the correct PyTorch version is installed (I used 1.13.1 with CUDA 11.7).

https://drive.google.com/drive/folders/14oKQz113I0jCGfbyq0SIJ4eVRddoO02D?usp=drive_link
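
For anyone comparing their setup against the one described above, here is a minimal sketch of an environment check. It only compares the installed PyTorch version and the CUDA version PyTorch was built with against the 1.13.1 / 11.7 combination mentioned in this comment. The function name `check_env` and its parameters are hypothetical and not part of the repo, and a mismatch does not prove that the environment is the cause of the poor results.

```python
def check_env(torch_version, cuda_version,
              expected_torch="1.13.1", expected_cuda="11.7"):
    """Return a list of mismatch messages (empty if everything matches).

    `check_env` is a hypothetical helper, not part of the TalkingGaussian repo.
    The expected values are the ones reported in this thread.
    """
    problems = []
    # torch.__version__ may carry a local suffix such as "+cu117"; compare the base only.
    base_version = torch_version.split("+")[0]
    if base_version != expected_torch:
        problems.append(f"PyTorch is {base_version}, expected {expected_torch}")
    # torch.version.cuda is None for CPU-only builds, which also counts as a mismatch.
    if cuda_version != expected_cuda:
        problems.append(f"CUDA build is {cuda_version}, expected {expected_cuda}")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        issues = check_env(torch.__version__, torch.version.cuda)
        for line in issues or ["Environment matches the reported setup"]:
            print(line)
```

Running the script in the training environment prints any differences. With CUDA 12.1, as in the question above, both lines would likely be reported, which is a hint to retry in a matching environment before looking at the data.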

@jarun-title

Thanks! I'll try it.

@sstzal

sstzal commented Oct 29, 2024

@caliber1313 @jarun-title
Hi,
I have encountered the same problem: the mouth of the generated face barely moves. Have you solved it? Could you share your experience with this?

Thank you very much!

@Fictionarry
Owner

Hi, sorry for the late reply. Does the problem still exist? If it can be reproduced on all of our provided video samples, I guess it is an environment problem. Otherwise, it may be caused by the initialization. You can try this version of the code, https://github.com/Fictionarry/TalkingGaussian/tree/98aa6f729ec4e4dd0551fa8b389b375cafddd13f, and decrease select_interval in train_face.py if necessary. However, I have not encountered this problem before and could not reproduce it on two servers, so I'm not sure whether this tip will work.
