Missing part and confusing place #47

Open
massyzs opened this issue Dec 2, 2024 · 7 comments

Comments

@massyzs

massyzs commented Dec 2, 2024

Hi,
In the training script you build InterGen (train.py, def build_models: ...),
but in the evaluation script you build InterCLIP (evaluator.py, def build_models: .... ), which contains a "motion_encoder". This causes mismatched parameters. Could you help explain this?

@bring-nirachornkul

After working on this project since June 2024, I feel like they have manipulated the InterCLIP checkpoints. I cannot achieve the same evaluation results as those reported in the paper.

One thing you can check in your dataset: you will find a lot of near-duplicate annotations, about which the authors claim:

"Although the textual annotations are similar (since the semantic category of these motions is 'dance'), each motion captured is unique. This does not limit but rather enhances diversity. For example, the diffusion model inherently has the capability to model such diversity effectively. Hence, similar annotations in this context are not a problem but an opportunity to refine the model's ability to generate nuanced variations of similar actions."
[Source](#45)

Dancing : 50 sequences

5088 - two people are dancing together.
5320 - two people practice dancing together.
5326 - two individuals are dancing together.
5382 - two individuals are dancing together.
5416 - two people are dancing together.
5504 - two people are dancing together.
5559 - two people are dancing together.
5587 - two people are dancing together.
5716 - the two persons are dancing together.
5782 - two people are dancing together.
5825 - two individuals are dancing together.
5917 - two people are breakdancing.
5926 - two people are dancing together.
5941 - two individuals are dancing together gracefully.
5997 - two individuals are dancing together.
6011 - two persons are dancing.
6035 - the two individuals are dancing together.
6043 - two people are dancing in pairs.
6077 - two individuals are dancing separately.
6096 - two people are dancing together.
6145 - two individuals are dancing together.
6159 - two individuals are dancing together.
6232 - two people are dancing together.
6237 - two individuals are dancing together.
6247 - two people are dancing together.
6286 - the two individuals are dancing together.
6299 - two persons are dancing together.
6311 - the two people are dancing together.
6401 - two people are dancing together.
6409 - they are dancing together.
6420 - the two persons are dancing together.
6436 - two people are dancing together.
6466 - two people are dancing together.
6478 - two individuals are dancing together.
6495 - two people are dancing together.
6506 - two persons are dancing together.
6533 - the two individuals are dancing together.
6544 - two people are dancing together.
6568 - the two individuals are dancing together.
6587 - the two individuals are dancing together.
6596 - two individuals are dancing together.
6619 - two individuals are dancing together.
6629 - the two persons are dancing.
6671 - two people are dancing gracefully.
6739 - two persons are dancing together.
6744 - two persons are dancing together.
6867 - the two people are dancing together.
6870 - two people are dancing together.
6877 - two people are dancing.
6939 - two people are dancing a ballroom dance together.
6944 - the two individuals are dancing together.

taichi : 17 sequences

2851 - two individuals are practicing tai chi together.
2855 - two individuals are practicing tai chi.
2863 - two people are practicing tai chi.
2867 - two individuals are practicing tai chi.
2913 - two individuals are practicing tai chi.
2918 - two people practicing tai chi.
2922 - two people are practicing tai chi.
2929 - two people are practicing tai chi.
2956 - two persons are practicing tai chi.
2963 - two people are practicing tai chi.
2967 - two individuals are practicing tai chi.
2986 - two individuals are practicing tai chi.
3683 - two people are practicing tai chi together.
3771 - two people are practicing tai chi.
4479 - two people are practicing tai chi.
4952 - two individuals are practicing tai chi.
7059 - two people practicing tai chi together.

sparring : 28 sequences

562 - two people are sparring in taekwondo, exchanging kicks with one another.
635 - the two are sparring in taekwondo.
1399 - the two are sparring in taekwondo, exchanging kicks and strikes.
1716 - two performers are sparring in the ring, throwing punches at one another.
3017 - two persons are sparring using fists.
3030 - two individuals are sparring with each other.
3055 - two persons are sparring with each other.
3057 - two individuals are sparring with each other.
3059 - two individuals are sparring with each other.
3137 - the two people are sparring with martial arts techniques.
3246 - two individuals are sparring with each other.
3249 - two individuals are sparring against each other.
3253 - two individuals are sparring with each other.
3256 - two individuals are sparring with each other.
3258 - two individuals are sparring with each other.
3260 - two individuals are sparring with each other.
3591 - two individuals are sparring with each other.
3593 - two people are sparring against each other.
3595 - two persons are sparring with each other.
3597 - the two people are sparring in martial arts.
3673 - two people are sparring with each other.
3675 - two individuals are sparring with each other.
3677 - two individuals are sparring each other.
3679 - two people are sparring against each other.
3681 - two people are sparring against each other.
3855 - two individuals are sparring with each other.
3857 - the two people are sparring in martial arts.
3859 - two individuals are sparring with each other.

rock-paper-scissors : 4 sequences

2753 - two individuals are playing a game of rock-paper-scissors.
2756 - two individuals are playing a game of rock-paper-scissors.
2759 - two people are playing a game of rock-paper-scissors.
3381 - the two people are playing rock-paper-scissors.
  • Some sequences are entirely blank (7 sequences), such as the following examples:
    2258 - no modification made.
    4193 - transition
    4385 - transition
    4434 - transition
    6028 - transition
    6940 - transition
    7220 - pass
    7221 - pass
    

I trained for the same number of epochs they mention, but my results are totally different.

Dear author, could you clarify these duplications again? How can you have 10 different actions described by one nearly identical sentence? If that is really the case, it means the model can generate any of those 10 dancing actions from the same text, right? Then how can we verify accuracy the way you did?
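
For anyone who wants to reproduce the counts above, here is a minimal sketch of how near-duplicate annotations could be grouped. It assumes the annotations are already loaded into a dict that maps sequence ID to text; reading them from disk is not shown because it depends on your dataset layout. The names `normalize`, `group_annotations`, `find_placeholders` and the list of subject words are my own hypothetical choices, not part of the InterGen code. The placeholder strings are the ones listed above.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical list of subject phrasings that are treated as equivalent.
_SUBJECT = re.compile(r"^(the )?(two )?(people|persons|individuals|performers|they)\b")

# Placeholder annotations reported in this thread.
PLACEHOLDERS = {"transition", "pass", "no modification made"}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse spaces, unify the subject."""
    text = re.sub(r"[^a-z\s-]", "", text.lower())
    text = re.sub(r"\s+", " ", text).strip()
    return _SUBJECT.sub("SUBJ", text)


def group_annotations(annotations: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each normalized sentence to the sorted sequence IDs that use it."""
    groups = defaultdict(list)
    for seq_id, text in sorted(annotations.items()):
        groups[normalize(text)].append(seq_id)
    return dict(groups)


def find_placeholders(annotations: Dict[int, str]) -> List[int]:
    """Return the IDs whose annotation is only a placeholder string."""
    return [i for i, t in sorted(annotations.items()) if normalize(t) in PLACEHOLDERS]


if __name__ == "__main__":
    # A few entries copied from the lists above.
    sample = {
        5088: "two people are dancing together.",
        5326: "two individuals are dancing together.",
        5716: "the two persons are dancing together.",
        5917: "two people are breakdancing.",
        4193: "transition ",
    }
    for sentence, ids in group_annotations(sample).items():
        print(len(ids), sentence, ids)
    print("placeholders:", find_placeholders(sample))
```

Groups with many IDs are the near-duplicates. Note that this only compares the text; it says nothing about whether the captured motions themselves differ.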

@massyzs
Author

massyzs commented Dec 2, 2024

The rendered video on my side is completely white. Do you know how to fix this?

@tr3e
Owner

tr3e commented Dec 2, 2024

Hi, thanks for your interest in our work!
InterCLIP is the evaluation model; it contains a motion encoder and is used to evaluate the InterGen model.
InterGen is the main generative model for motion generation.
They are two different models, so their parameters differ :)
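
A minimal sketch of how such a mismatch could be inspected: compare the parameter names a model expects with the names stored in a checkpoint. In PyTorch these lists would typically come from `model.state_dict().keys()` and from the state dict inside the loaded checkpoint; the sketch takes plain lists of strings so it runs without any framework. The function name `diff_keys` and the example key names are hypothetical (only the `motion_encoder` prefix is taken from this thread), so the real names in the repository may differ.

```python
from typing import Dict, Iterable, List


def diff_keys(model_keys: Iterable[str], checkpoint_keys: Iterable[str]) -> Dict[str, List[str]]:
    """Report which parameter names are on only one side."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    return {
        # Expected by the model but absent from the checkpoint.
        "missing": sorted(model_set - ckpt_set),
        # Present in the checkpoint but unknown to the model.
        "unexpected": sorted(ckpt_set - model_set),
    }


if __name__ == "__main__":
    # Hypothetical key names, for illustration only.
    evaluator_keys = ["motion_encoder.proj.weight", "text_encoder.proj.weight"]
    generator_checkpoint_keys = ["decoder.proj.weight", "text_encoder.proj.weight"]

    report = diff_keys(evaluator_keys, generator_checkpoint_keys)
    print("missing:", report["missing"])
    print("unexpected:", report["unexpected"])
```

If an evaluation-model key such as one under `motion_encoder` shows up as missing, the checkpoint being loaded was most likely saved from the other model, which matches the explanation above.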

@massyzs
Author

massyzs commented Dec 2, 2024

Hi,
Thanks for your reply. I looked through GitHub and found that there is already an InterCLIP; may I ask why you trained it again yourselves?

In addition, the rendered videos are all white. Do you have any idea why?

@tr3e
Owner

tr3e commented Dec 2, 2024

  1. We train InterCLIP to extract interaction features, which include not only the single-person motion features but also the spatial relations between the two people. We trained it ourselves because there was no existing evaluation model for two-person motions.
  2. Please follow the README step by step. The white videos are probably caused by your InterGen checkpoint not being loaded correctly.

@massyzs
Author

massyzs commented Dec 3, 2024

Hi, thank you very much, it works now.

One more question:

Could you share the dataset visualization code? Or would you mind sharing a link to another project that can directly visualize your dataset? The dataset is really impressive and useful.

@tr3e
Owner

tr3e commented Dec 4, 2024

You can use this visualization code :)
https://github.com/davrempe/humor/tree/main/humor/viz
