Incomplete training #30

fjxmlzn · 2022-07-25T16:32:26Z

Recently, I ran into another problem.
I tried to run main.py in the example_training folder and main_generate_data.py in the example_generating_data folder. However, the only output was a folder named results, and its subfolders contained nothing but a worker_*.log.txt file.
Q1: Why were no synthetic datasets for [web/google/FCC_MBA] generated?

I looked for a place in the code to specify the dataset path, but found nothing.

Q2: Given that I know the attributes and features of my datasets, how do I generate the four files data_attribute_output.pkl, data_feature_output.pkl, data_test.npz, and data_train.npz? Does additional code need to be written for this?

Finally, thank you for your continued patient answers.

Originally posted by @chameleonzz in #3 (comment)

chameleonzz · 2022-07-27T13:41:44Z

Based on the previous worker.log, I suspected that something was wrong with my TF setup (there were multiple TF installations). I therefore reinstalled TF and ran example_training/main.py again. Now, the output is as follows.

In the aux_disc-False,dataset-FCC_MBA,epoch-17000,epoch_checkpoint_freq-70,extra_checkpoint_freq-850,run-0,sample_len-1,self_norm-False, folder, the content of worker.log is as follows.

The last line shows: FileNotFoundError: [Errno 2] No such file or directory: '../results/aux_disc-False,dataset-FCC_MBA,epoch-17000,epoch_checkpoint_freq-70,extra_checkpoint_freq-850,run-0,sample_len-1,self_norm-False,\sample\epoch_id-69,batch_id--1,global_id-419,type-free,feature,output-0,dim-0.png'
I think it is close to running successfully. I am currently debugging based on the hints in worker.log.
Thank you very much for your continued help.

chameleonzz · 2022-08-01T07:58:05Z

I think I can run DG correctly now.
To solve the above problem, I debugged main.py in example_training(without_GPUTaskScheduler).
Then I amended doppelganger.py in gan: I deleted checkpoint_dir in the last line, and the code then ran properly. It took about 22 hours, as shown below. (My computer has an i7-10750H CPU, an NVIDIA GeForce RTX 2060 GPU, and 32 GB of memory.)

In example_training(without_GPUTaskScheduler)/test, there are three items: checkpoint, sample, and time.txt.
The checkpoint folder contains many files, as follows.

In addition, the sample folder contains a large number of image files, as follows.
There are around 19,000 images and several npz files.

Is this the correct result of running example_training(without_GPUTaskScheduler)/main.py? If so, how do I generate synthetic data for web/google/FCC_MBA?

fjxmlzn · 2022-08-01T16:32:43Z

Yes, it is the right result with this code.

Regarding the FileNotFoundError you posted in #30 (comment), it should have already been fixed in c2f4bfb in June 2022. Please re-clone the repo and rerun and check if that works.

Regarding data generation for web, you can use https://github.com/fjxmlzn/DoppelGANger/tree/master/example_generating_data(without_GPUTaskScheduler) (before re-running the above training code).

The above "without_GPUTaskScheduler" versions of the training and generation code are only for the web dataset. For other datasets (google, FCC_MBA), you can either modify the hyper-parameters according to the config file https://github.com/fjxmlzn/DoppelGANger/blob/master/example_training/config.py, or directly use the version with GPUTaskScheduler (https://github.com/fjxmlzn/DoppelGANger/tree/master/example_training and https://github.com/fjxmlzn/DoppelGANger/tree/master/example_generating_data).

Let me know if you run into any issues with the code.

chameleonzz · 2022-08-15T12:31:17Z

After modifying example_training/config.py and the other config*.py files according to c2f4bfb, I still got the same error after re-running the code, as shown in #30 (comment).

In 'aux_disc-False,dataset-FCC_MBA,epoch-17000,epoch_checkpoint_freq-70,extra_checkpoint_freq-850,run-,sample_len-,self_norm-False,\sample', there was only one npz file, named 'epoch_id-69,batch_id--1,global_id-419,type-free,samples.npz'.
The same was true of 'aux_disc-False,dataset-google,epoch-400,epoch_checkpoint_freq-1,extra_checkpoint_freq-5,run-0,sample_len-1,self_norm-False,\sample'.
However, in 'aux_disc-True,dataset-web,epoch-400,epoch_checkpoint_freq-1,extra_checkpoint_freq-5,run-0,sample_len-1,self_norm-True,\sample', there were many files, including lots of images and two npz files. But its worker.log also contained a similar error: "FileNotFoundError: [Errno 2] No such file or directory: '..\results\aux_disc-True,dataset-web,epoch-400,epoch_checkpoint_freq-1,extra_checkpoint_freq-5,run-0,sample_len-1,self_norm-True,\sample\epoch_id-0,batch_id-199,global_id-199,type-teacher,attribute,output-3,dim-0.png'".

fjxmlzn · 2022-08-15T14:43:02Z

This looks weird. Could you please attach worker.log in these three folders here? Thank you!

chameleonzz · 2022-08-16T12:01:36Z

OK, I sent you an email.

fjxmlzn · 2022-08-16T15:54:40Z

Thank you. Since I believe we found the root cause of this issue, I am closing this issue now.

For future readers of this thread: the root cause is that Windows enforces a maximum path length, and a FileNotFoundError is raised when writing to a path that exceeds it.
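To check whether a given output path is likely to hit this limit, a minimal sketch follows. It assumes the classic Windows MAX_PATH value of 260 characters (which counts the terminating NUL, so 259 usable characters); newer Windows versions can be configured to allow longer paths, in which case this check is too strict. The function name and the base directory are hypothetical; the folder and file names are copied from the error message earlier in this thread.

```python
# Classic Windows limit: 260 characters including the terminating NUL,
# so an absolute path may be at most 259 characters long (assumption:
# long-path support is not enabled on the machine).
WINDOWS_MAX_PATH = 260


def exceeds_max_path(abs_path: str, limit: int = WINDOWS_MAX_PATH) -> bool:
    """Return True if an absolute path is too long for the classic limit."""
    return len(abs_path) >= limit


# Folder and file names copied from the error message in this thread.
run_folder = (
    "aux_disc-False,dataset-FCC_MBA,epoch-17000,epoch_checkpoint_freq-70,"
    "extra_checkpoint_freq-850,run-0,sample_len-1,self_norm-False,"
)
png_name = (
    "epoch_id-69,batch_id--1,global_id-419,type-free,feature,"
    "output-0,dim-0.png"
)

# Hypothetical location of the results folder; substitute your own.
base = r"C:\Users\someone\Documents\DoppelGANger\results"
full_path = "\\".join([base, run_folder, "sample", png_name])

print(len(full_path), exceeds_max_path(full_path))
```

Because the run folder and the image file name together already take up a large share of the budget, even a moderately deep checkout location can push the full path over the limit.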

To reduce the length of paths, we can add some keys to ignored_keys_for_folder_name in the config file so that they do not appear in the folder name. For example, we can change the top part of https://github.com/fjxmlzn/DoppelGANger/blob/master/example_training/config.py to

config = {
    "scheduler_config": {
        "gpu": ["0"],
        "config_string_value_maxlen": 1000,
        "result_root_folder": os.path.join("..", "results"),
        "ignored_keys_for_folder_name": ['extra_checkpoint_freq', 'epoch_checkpoint_freq', 'aux_disc', 'self_norm']
    },

See https://github.com/fjxmlzn/GPUTaskScheduler for more details on the config options of GPUTaskScheduler. Alternatively, we can try moving the entire DoppelGANger folder to a shorter path.
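To illustrate how much ignoring those keys can shorten the folder name, here is a rough sketch. It is not GPUTaskScheduler's actual code: the helper `folder_name` is hypothetical and only mimics the naming pattern visible in the folder names quoted in this thread (sorted `key-value` pairs, each followed by a comma).

```python
def folder_name(run_config, ignored_keys=()):
    """Build a folder name from sorted key-value pairs, skipping ignored keys.

    Hypothetical helper that imitates the pattern seen in this thread;
    the real scheduler may build names differently.
    """
    parts = [
        f"{key}-{value}"
        for key, value in sorted(run_config.items())
        if key not in ignored_keys
    ]
    return "".join(part + "," for part in parts)


# Values taken from the FCC_MBA folder name quoted in this thread.
run_config = {
    "aux_disc": False,
    "dataset": "FCC_MBA",
    "epoch": 17000,
    "epoch_checkpoint_freq": 70,
    "extra_checkpoint_freq": 850,
    "run": 0,
    "sample_len": 1,
    "self_norm": False,
}
ignored = ["extra_checkpoint_freq", "epoch_checkpoint_freq", "aux_disc", "self_norm"]

full = folder_name(run_config)
short = folder_name(run_config, ignored)
print(len(full), full)
print(len(short), short)
```

One thing to watch for under this naming scheme: if two runs differ only in an ignored key, they would presumably map to the same folder name, so it seems safest to ignore only keys whose values are fixed for a given dataset.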

fjxmlzn mentioned this issue Jul 25, 2022

TF2 #3

Open

fjxmlzn closed this as completed Aug 16, 2022

dgtriantis mentioned this issue Feb 26, 2023

Training does not run although the input is of the required form #37

Closed
