[duplicate prompts found] Clarification on Prompt Diversity and Action Generation #45

Open
bring-nirachornkul opened this issue Nov 25, 2024 · 3 comments

Comments

@bring-nirachornkul

bring-nirachornkul commented Nov 25, 2024

Dear InterGen team,

I have been reviewing the dataset for the InterGen project and noticed that many prompts for specific actions, such as "dancing," are highly similar, with minimal variations in wording. Below are 50 examples related to dancing from the dataset:

Dancing : 50 sequences

5088 - two people are dancing together.
5320 - two people practice dancing together.
5326 - two individuals are dancing together.
5382 - two individuals are dancing together.
5416 - two people are dancing together.
5504 - two people are dancing together.
5559 - two people are dancing together.
5587 - two people are dancing together.
5716 - the two persons are dancing together.
5782 - two people are dancing together.
5825 - two individuals are dancing together.
5917 - two people are breakdancing.
5926 - two people are dancing together.
5941 - two individuals are dancing together gracefully.
5997 - two individuals are dancing together.
6011 - two persons are dancing.
6035 - the two individuals are dancing together.
6043 - two people are dancing in pairs.
6077 - two individuals are dancing separately.
6096 - two people are dancing together.
6145 - two individuals are dancing together.
6159 - two individuals are dancing together.
6232 - two people are dancing together.
6237 - two individuals are dancing together.
6247 - two people are dancing together.
6286 - the two individuals are dancing together.
6299 - two persons are dancing together.
6311 - the two people are dancing together.
6401 - two people are dancing together.
6409 - they are dancing together.
6420 - the two persons are dancing together.
6436 - two people are dancing together.
6466 - two people are dancing together.
6478 - two individuals are dancing together.
6495 - two people are dancing together.
6506 - two persons are dancing together.
6533 - the two individuals are dancing together.
6544 - two people are dancing together.
6568 - the two individuals are dancing together.
6587 - the two individuals are dancing together.
6596 - two individuals are dancing together.
6619 - two individuals are dancing together.
6629 - the two persons are dancing.
6671 - two people are dancing gracefully.
6739 - two persons are dancing together.
6744 - two persons are dancing together.
6867 - the two people are dancing together.
6870 - two people are dancing together.
6877 - two people are dancing.
6939 - two people are dancing a ballroom dance together.
6944 - the two individuals are dancing together.

taichi : 17 sequences

2851 - two individuals are practicing tai chi together.
2855 - two individuals are practicing tai chi.
2863 - two people are practicing tai chi.
2867 - two individuals are practicing tai chi.
2913 - two individuals are practicing tai chi.
2918 - two people practicing tai chi.
2922 - two people are practicing tai chi.
2929 - two people are practicing tai chi.
2956 - two persons are practicing tai chi.
2963 - two people are practicing tai chi.
2967 - two individuals are practicing tai chi.
2986 - two individuals are practicing tai chi.
3683 - two people are practicing tai chi together.
3771 - two people are practicing tai chi.
4479 - two people are practicing tai chi.
4952 - two individuals are practicing tai chi.
7059 - two people practicing tai chi together.

sparring : 28 sequences

562 - two people are sparring in taekwondo, exchanging kicks with one another.
635 - the two are sparring in taekwondo.
1399 - the two are sparring in taekwondo, exchanging kicks and strikes.
1716 - two performers are sparring in the ring, throwing punches at one another.
3017 - two persons are sparring using fists.
3030 - two individuals are sparring with each other.
3055 - two persons are sparring with each other.
3057 - two individuals are sparring with each other.
3059 - two individuals are sparring with each other.
3137 - the two people are sparring with martial arts techniques.
3246 - two individuals are sparring with each other.
3249 - two individuals are sparring against each other.
3253 - two individuals are sparring with each other.
3256 - two individuals are sparring with each other.
3258 - two individuals are sparring with each other.
3260 - two individuals are sparring with each other.
3591 - two individuals are sparring with each other.
3593 - two people are sparring against each other.
3595 - two persons are sparring with each other.
3597 - the two people are sparring in martial arts.
3673 - two people are sparring with each other.
3675 - two individuals are sparring with each other.
3677 - two individuals are sparring each other.
3679 - two people are sparring against each other.
3681 - two people are sparring against each other.
3855 - two individuals are sparring with each other.
3857 - the two people are sparring in martial arts.
3859 - two individuals are sparring with each other.

rock-paper-scissors : 4 sequences

2753 - two individuals are playing a game of rock-paper-scissors.
2756 - two individuals are playing a game of rock-paper-scissors.
2759 - two people are playing a game of rock-paper-scissors.
3381 - the two people are playing rock-paper-scissors.

Given this level of similarity, could you clarify how the model is expected to generate distinct and meaningful actions based on such closely related prompts? Additionally, do these similar tokenized inputs limit the diversity of generated actions, and if so, how does the system address this?
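To put a number on the similarity described above, here is a minimal sketch that normalizes captions and counts how many collapse to the same string. The `captions` dictionary is a hypothetical in-memory layout (sequence ID mapped to caption text) filled with a few entries from the list above; how the dataset actually stores its annotations is not covered in this thread, so the loading step is left out.

```python
import re
from collections import Counter


def normalize(caption: str) -> str:
    """Lowercase, drop punctuation (keeping hyphens), collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s-]", "", caption.lower())
    return " ".join(text.split())


def duplicate_report(captions: dict) -> Counter:
    """Count how many sequences share each normalized caption."""
    return Counter(normalize(text) for text in captions.values())


# Hypothetical layout: sequence ID -> caption, copied from the list above.
captions = {
    5088: "two people are dancing together.",
    5326: "two individuals are dancing together.",
    5382: "two individuals are dancing together.",
    5416: "two people are dancing together.",
    5917: "two people are breakdancing.",
    6671: "two people are dancing gracefully.",
}

counts = duplicate_report(captions)
unique_ratio = len(counts) / len(captions)  # share of distinct captions

for text, n in counts.most_common():
    print(f"{n:3d}  {text}")
print(f"unique ratio: {unique_ratio:.2f}")
```

Only exact matches after normalization are counted here, so "two people" and "two individuals" remain separate entries; a fuzzier comparison would group even more captions together.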

I appreciate your time in clarifying this matter.

Best regards,

Phongsiri

@tr3e
Owner

tr3e commented Nov 26, 2024

Dear Phongsiri,

Thank you for your thoughtful review and for bringing up the concerns regarding the similarity in the dataset prompts, specifically related to the action "dancing". Your observations are insightful.

To address your concerns:

The dataset intentionally includes prompts with minimal variations to test the model's sensitivity to subtle linguistic cues that might influence the generated motions. For instance, terms like "gracefully" or "breakdancing" indicate different styles and energies of dancing, which are meant to prompt slight variations in the generated motions. This is part of our effort to refine the model's ability to discern and react to nuanced differences in human interaction descriptions.

Although the textual annotations are similar (since the semantic category of these motions is "dance"), each captured motion is unique, so the similar text does not limit the diversity of the data but rather enhances it. A diffusion model is inherently capable of modeling such diversity. Hence, similar annotations are not a problem in this context but an opportunity to refine the model's ability to generate nuanced variations of similar actions.
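As a rough illustration of the point about diversity, the toy sketch below shows how a score-based sampler can return different outputs for one and the same caption, depending only on the random seed. Everything in it is an assumption made for illustration: the "motion" is a single number, the caption maps to two hand-picked modes, the score is the exact gradient of a Gaussian mixture rather than a learned network, and the sampler is annealed Langevin dynamics. It is not the InterGen model or its sampler.

```python
import math
import random

# Toy stand-in: one caption corresponds to two distinct 1-D "motions".
MODES = {"two people are dancing together.": (-2.0, 2.0)}
BASE_VAR = 0.04  # variance of each mode before any noise is added


def score(x, modes, var):
    """Gradient of the log-density of an equal-weight Gaussian mixture."""
    logits = [-((x - mu) ** 2) / (2 * var) for mu in modes]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return sum(w / total * (mu - x) / var for w, mu in zip(weights, modes))


def sample_motion(prompt, seed):
    """Annealed Langevin sampling: same prompt, seed-dependent result."""
    rng = random.Random(seed)
    modes = MODES[prompt]
    x = rng.gauss(0.0, 3.0)  # start from broad noise
    for sigma in (3.0, 1.5, 0.7, 0.3, 0.1):  # decreasing noise levels
        var = BASE_VAR + sigma ** 2
        step = 0.2 * var
        for _ in range(30):
            noise = rng.gauss(0.0, 1.0)
            x += 0.5 * step * score(x, modes, var) + math.sqrt(step) * noise
    return x


prompt = "two people are dancing together."
samples = [sample_motion(prompt, seed) for seed in range(40)]
print([round(s, 2) for s in samples[:8]])
```

Each seed ends up close to one of the two modes, so an identical caption still yields more than one kind of output; a fixed seed reproduces the same sample.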

Thank you once again for your interest in our work and the detailed review.

Best regards,
Han

@bring-nirachornkul
Author

bring-nirachornkul commented Nov 26, 2024

Dear Han,

Thank you for your earlier response. While I appreciate the augmentation methods mentioned in the paper, they appear to be tied primarily to the evaluation process rather than addressing redundancy in the raw dataset.

While training for over 20,000 epochs, I noticed some concerning patterns:

  • Certain sequences appear to be duplicates with only slight variations in descriptions.
  • Some sequences are entirely blank (7 sequences), such as the following examples:
    2258 - no modification made.
    4193 - transition
    4385 - transition
    4434 - transition
    6028 - transition
    6940 - transition
    7220 - pass
    7221 - pass

This raises two key questions:

  1. Do the reported 7,779 sequences include repeated motions with slightly different descriptions?
  2. Were blank sequences, like those above, considered in the dataset statistics, and if so, how were they addressed during training?

These issues impact the diversity and usability of the dataset during real-world training. I’d greatly appreciate clarification on how these potential redundancies and anomalies are handled in the dataset preparation and evaluation stages.
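For anyone who wants to inspect these entries in their own copy of the data, a minimal sketch for flagging placeholder-like captions follows. The placeholder set is taken from the examples above, and the `captions` dictionary (sequence ID mapped to caption text) is a hypothetical layout, not the dataset's actual file format.

```python
# Placeholder captions observed in the examples above (assumed exhaustive
# only for this illustration).
PLACEHOLDERS = {"transition", "pass", "no modification made"}


def is_placeholder(caption: str) -> bool:
    """True if the caption is one of the known placeholder strings."""
    return caption.strip().rstrip(".").lower() in PLACEHOLDERS


def split_placeholders(captions: dict):
    """Return (kept, flagged) dictionaries keyed by sequence ID."""
    flagged = {i: c for i, c in captions.items() if is_placeholder(c)}
    kept = {i: c for i, c in captions.items() if i not in flagged}
    return kept, flagged


# Hypothetical layout: sequence ID -> caption, copied from this thread.
captions = {
    2258: "no modification made.",
    4193: "transition ",
    7220: "pass  ",
    5088: "two people are dancing together.",
}

kept, flagged = split_placeholders(captions)
print("flagged IDs:", sorted(flagged))
print("kept IDs:", sorted(kept))
```

Whether flagged sequences should be dropped, relabeled, or kept as they are is a separate decision; the sketch only separates them so they can be reviewed.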

Best regards,

Phongsiri

@tr3e
Owner

tr3e commented Nov 26, 2024

Hi, to address your questions:

  1. No, each sequence represents a unique motion. The descriptions may appear similar because all sequences fall under a broad category and are sub-segments of recorded long-term motions within this category. Consequently, the annotators may use similar language to describe them due to their semantic similarities.

  2. The sequences you mentioned are transition motions, not blank ones. In our experiments, we retained these samples.
