
Training data for reference_video #45

Open
jamie212 opened this issue Oct 25, 2023 · 13 comments

@jamie212

Hello, your work is excellent! I want to train the 'reference_video' model, but the repo only describes how the data should be placed. May I ask what dataset you used, and where I can download it?

@SerialLain3170
Owner

Sorry for the late response. I prepared one episode each from 18 animation titles (about 540 minutes = 30 min × 18) taken from official channels on YouTube.

@jamie212
Author

jamie212 commented Nov 7, 2023

> Sorry for the late response. I prepared one episode each from 18 animation titles (about 540 minutes = 30 min × 18) taken from official channels on YouTube.

Thank you for your response. Do you mean that you downloaded 18 animation episodes, each 30 minutes long, from YouTube, and then converted them into frames to extract sketches? May I ask which official channels these were from? Additionally, you mentioned storing the data as distance field images; could you explain how these are obtained?

@SerialLain3170
Owner

> Thank you for your response. Do you mean that you downloaded 18 animation episodes, each 30 minutes long, from YouTube, and then converted them into frames to extract sketches?

Yes. They were originally from an official channel, but that channel seems to have stopped providing them. I think any 18 episodes are fine as long as they come from different animations, to ensure the robustness of the model.

> Additionally, you mentioned storing the data as distance field images; could you explain how these are obtained?

Please refer to #14
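
A distance field is typically obtained by applying a distance transform to the extracted line art. Below is a minimal sketch using SciPy that illustrates the general technique only; it is not necessarily the exact preprocessing code used in this repo (see #14 for the details there):

```python
import numpy as np
from PIL import Image
from scipy.ndimage import distance_transform_edt

def sketch_to_distance_field(sketch_path: str, threshold: int = 200) -> np.ndarray:
    """Turn a black-on-white line sketch into a normalized distance field.

    Every pixel stores its Euclidean distance to the nearest line pixel,
    normalized to [0, 1]. (Assumed preprocessing; see #14 for details.)
    """
    gray = np.array(Image.open(sketch_path).convert("L"))
    lines = gray < threshold                   # True where the sketch lines are
    dist = distance_transform_edt(~lines)      # distance to the nearest line pixel
    dist = dist / (dist.max() + 1e-8)          # normalize to [0, 1]
    return dist.astype(np.float32)
```

Each pixel then encodes how far it is from the nearest sketch line, which gives the network a smoother signal than the raw binary sketch.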

@jamie212
Author

jamie212 commented Nov 13, 2023

>> Thank you for your response. Do you mean that you downloaded 18 animation episodes, each 30 minutes long, from YouTube, and then converted them into frames to extract sketches?
>
> Yes. They were originally from an official channel, but that channel seems to have stopped providing them. I think any 18 episodes are fine as long as they come from different animations, to ensure the robustness of the model.
>
>> Additionally, you mentioned storing the data as distance field images; could you explain how these are obtained?
>
> Please refer to #14

I would like to confirm: did you use 18 videos, each 30 minutes long? Also, what fps did you set when converting them into frames? At 30 frames per second, a 30-minute video turns into 54,000 frames. Do you then put all 54,000 frames into anime_dir? Wouldn't that be too many? From the paper, it seems they only use very short videos.

@SerialLain3170
Owner

> I would like to confirm: did you use 18 videos, each 30 minutes long? Also, what fps did you set when converting them into frames? At 30 frames per second, a 30-minute video turns into 54,000 frames. Do you then put all 54,000 frames into anime_dir?

Yes to all of those questions. I used 30 fps.
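
For reference, extracting frames at a fixed fps can be done with ffmpeg; a minimal sketch follows, in which the paths and the frame-naming pattern are placeholders rather than anything this repo mandates:

```python
import subprocess
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, fps: int = 30) -> None:
    """Dump a video to numbered PNG frames at a fixed fps using ffmpeg."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}",
         str(Path(out_dir) / "frame_%06d.png")],
        check=True,
    )
```

At 30 fps, a 30-minute episode does indeed come out to roughly 54,000 frames, as you estimated.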

> Wouldn't that be too many? From the paper, it seems they only use very short videos.

Good question. I wanted to use as much data as possible to improve generalizability when I ran the experiments, and I did not train the model with varying dataset sizes, so I do not have a solid answer to that question. But as you said, it may well be too many frames to train the model on.

@jamie212
Author

> Good question. I wanted to use as much data as possible to improve generalizability when I ran the experiments, and I did not train the model with varying dataset sizes, so I do not have a solid answer to that question. But as you said, it may well be too many frames to train the model on.

OK! Thank you for your response, I will give it a try.

@jamie212
Author

jamie212 commented Dec 6, 2023

>> I would like to confirm: did you use 18 videos, each 30 minutes long? Also, what fps did you set when converting them into frames? At 30 frames per second, a 30-minute video turns into 54,000 frames. Do you then put all 54,000 frames into anime_dir?
>
> Yes to all of those questions. I used 30 fps.
>
>> Wouldn't that be too many? From the paper, it seems they only use very short videos.
>
> Good question. I wanted to use as much data as possible to improve generalizability when I ran the experiments, and I did not train the model with varying dataset sizes, so I do not have a solid answer to that question. But as you said, it may well be too many frames to train the model on.

Hello, I have a few questions regarding training:

  1. Did you put folders from 18 different anime into the DATA_PATH? I'm asking because it seems from the paper that only data from one anime was used for training, and data from other anime were used for testing. I just want to confirm.
  2. Could you please explain what 'validsize' and 'anime_dir' in the param.yaml file are used for, and what should they be set to?
  3. In your code, is there a testing process included, or do I need to write my own code for inference?

@SerialLain3170
Owner

I am really sorry for the late response. First, I need to mention that I am not following the original paper rigorously; I only borrowed the ideas from the Method (Section 3) and the shot selection (the first part of Section 4.1). So I have not been careful about dataset selection.

  1. Yes. The anime_dir parameter in param.yaml sets which animations (directory names under DATA_PATH) are used for training. If you have 10 animations for training and 3 datasets for testing, you should list the 10 training animations in anime_dir (see the sketch below); for the 3 test datasets, you need to write your own inference code.
  2. validsize is the batch size used during validation. Sorry for the confusion; the name is very close to valid_size.
  3. As mentioned in 1, you need to write your own code for inference.
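
To make the relationship between DATA_PATH and anime_dir concrete, here is a hedged sketch; the directory names are hypothetical, and the exact keys should be checked against the repo's actual param.yaml:

```python
# Assumed layout (hypothetical names):
#
#   DATA_PATH/
#     anime_a/ ... anime_j/     # 10 titles listed in anime_dir -> used for training
#     test_x/ test_y/ test_z/   # held out -> only touched by your own inference code
#
# and the corresponding excerpt of param.yaml:
#
#   anime_dir: [anime_a, anime_b, anime_c]   # list every training title here
#   validsize: 4                             # batch size for the validation loop

import yaml  # PyYAML

with open("param.yaml") as f:
    param = yaml.safe_load(f)

train_titles = param["anime_dir"]  # directory names under DATA_PATH used for training
valid_batch = param["validsize"]   # batch size used during validation
print(train_titles, valid_batch)
```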

@jamie212
Author

> I am really sorry for the late response. First, I need to mention that I am not following the original paper rigorously; I only borrowed the ideas from the Method (Section 3) and the shot selection (the first part of Section 4.1). So I have not been careful about dataset selection.
>
> 1. Yes. The anime_dir parameter in param.yaml sets which animations (directory names under DATA_PATH) are used for training. If you have 10 animations for training and 3 datasets for testing, you should list the 10 training animations in anime_dir; for the 3 test datasets, you need to write your own inference code.
> 2. validsize is the batch size used during validation. Sorry for the confusion; the name is very close to valid_size.
> 3. As mentioned in 1, you need to write your own code for inference.

Thank you very much for your response. I have trained the model myself, and the CTN part seems fine (picture 1). However, the visualization images produced during TCN training come out all gray, and the testing code I wrote produces a similar result (picture 2). Do you have any idea what the problem might be?
[Screenshot 2023-12-19 4:05 PM]
[Screenshot 2023-12-19 4:03 PM]

@jamie212
Author

As I mentioned before, there seems to be an issue with the TCN. The visualized images saved during training, when processed through the TCN, come out all gray, whereas the CTN behaves normally. I would like to know whether I made a mistake in my setup or whether this behavior is expected.

@SerialLain3170
Owner

I am really sorry for the late response. I do not have a solid answer to your question. I did find that training the TCN was unstable, and that changing the hyperparameters (batch size and learning rate) led to stable behavior. Could you try increasing the batch size or decreasing the learning rate? If that does not work, I do not have any other ideas.

@jamie212
Author

jamie212 commented Jan 3, 2024

> I am really sorry for the late response. I do not have a solid answer to your question. I did find that training the TCN was unstable, and that changing the hyperparameters (batch size and learning rate) led to stable behavior. Could you try increasing the batch size or decreasing the learning rate? If that does not work, I do not have any other ideas.

Increasing my batch size causes CUDA out of memory :( I will try reducing the learning rate to see if it helps. Thank you for your suggestion. However, I want to confirm: when you say 'unstable', are you referring to the gray-image situation?
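
As a side note, if a larger batch does not fit in GPU memory, gradient accumulation can approximate one. The minimal PyTorch sketch below uses dummy stand-ins for the repo's actual TCN model, dataloader, and loss:

```python
import torch
from torch import nn

# Dummy stand-ins; replace with the actual TCN model, dataloader, and loss.
model = nn.Linear(16, 16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.L1Loss()
loader = [(torch.randn(2, 16), torch.randn(2, 16)) for _ in range(8)]

accum_steps = 4  # effective batch = per-step batch (2) * accum_steps (4)

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y) / accum_steps  # scale so the accumulated gradient averages
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```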

@SerialLain3170
Owner

> However, I want to confirm: when you say 'unstable', are you referring to the gray-image situation?

Yes. The gray images are likely a result of mode collapse: the trained generator has found an easy solution. You might need to add a regularization loss term to avoid the mode collapse; if decreasing the learning rate does not work, adding one might help.
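
One common form of such a regularization term in image-to-image GANs is an L1 reconstruction loss between the generated and ground-truth frames. Below is a minimal PyTorch-style sketch; the weight and tensor names are assumptions, not this repo's actual settings:

```python
import torch
from torch import nn
import torch.nn.functional as F

l1 = nn.L1Loss()
lambda_rec = 10.0  # assumed weight; tune together with learning rate / batch size

def generator_loss(fake_logits: torch.Tensor,
                   fake_frame: torch.Tensor,
                   real_frame: torch.Tensor) -> torch.Tensor:
    """Non-saturating adversarial loss plus an L1 reconstruction term.

    The reconstruction term pins the generator output to the target frame,
    which works against the all-gray mode-collapse output described above.
    """
    adv = F.softplus(-fake_logits).mean()   # -log sigmoid(D(G(x)))
    rec = l1(fake_frame, real_frame)
    return adv + lambda_rec * rec
```

Pinning the output to the target frame this way makes the all-gray "easy solution" much more costly for the generator.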
