Fix colab pretraining notebook #22

sokffa · 2022-07-01T12:00:01Z

The colab pretraining notebook is not updated. There's a lot of bugs on the code (related to paths and inexistent files)
https://colab.research.google.com/drive/1IUGYn-fMRbjH2IyYoAn5VKSzEkaXyP2s

youngt913 · 2022-07-06T16:36:22Z

CHECK THIS OUT https://www.youtube.com/watch?v=Kwhqj93wyXU

Twenkid · 2022-07-16T14:58:08Z

@sokffa Yes. I just tried to fix it, I managed until the last cell which had some incompatibility, outdated/mismatched versions or so which I couldn't fix yet. I tried many things, installing a proper version of basicsr, but there is a strange mismatch, one source file is different in the cloned repository (basicsr/utils/options.py) and in the installation path, with different calls to parse, one is parse(...) the other is parse_options. I couldn't fit it so far, either by installing basicsr with !python -m pip install basicsr

Or with cloning it from the repository and "setup.py install", or even a nasty copying to the python installation location:
cp /content/wav2lip-hq/basicsr/utils/options.py /usr/local/lib/python3.7/dist-packages/basicsr-1.4.0-py3.7.egg/basicsr/utils/options.py (but if the versions have big difference it would probably fail in another place).

Maybe importing from the local folder (not the system's installation), that is nasty too, I guess it may work if the paths and imports are adjusted properly.

The other fixes:

!mkdir data

for:

!mkdir data/gt
!mkdir data/lq
!mkdir data/hq

Import cv2, ignore tqdm:

import os, cv2
paths = os.listdir("data/gt")

#for img_path in tqdm(paths): 
for img_path in paths:
    img = cv2.imread("data/gt/" + img_path)
    img = cv2.resize(img, (384, 384))
    cv2.imwrite("data/hq/" + img_path, img)

davidchateau · 2022-08-17T19:35:52Z

Hello,
I tried to fix the notebook, here is what I did so far:

duplicate the notebook ("file" -> "save a copy in drive")
"runtime" -> "change runtime type" -> "GPU" or else I have an error about no GPU available
add "!mkdir data" before the other "mkdir"s
downgrade torchvision to avoid deprecation warnings:
!pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
install basicsr. I downloaded the code of every previous version of the library in order to find the ones where "parse" exists in "basicsr.utils.options" -> it is versions <= 1.3.3.10. I had errors installing versions < 1.3.3.4 so I went with 1.3.3.4.
!pip3 install https://files.pythonhosted.org/packages/8c/ac/74f4e34fdbc7d3d9233a6f02a740ddb446d75551fbb6ed0c4243c4511a86/basicsr-1.3.3.4.tar.gz#sha256=b448cf9efa4ff2ca75109d3aac36ef50d6e08b0bcb310ebef57ed88c09a2d2ba
create log files directory structure because I had errors about it:
!mkdir /content/wav2lip-hq/experiments/
!mkdir /content/wav2lip-hq/experiments/001_ESRGAN_x4_f64b23_custom16k_500k_B16G1_wandb/
stop pretraining mode as mentioned in Training without pretrained state #17
!sed -i '/resume_state/d' /content/wav2lip-hq/train_basicsr.yml

Now I have error "Input spatial size must be 128x128, but received torch.Size([4, 3, 384, 384])" and I know that I could change images resize to 128*128 but then I have another error.

Maybe @Markfryazino has an old working environment and can give us the details of libraries versions and/or proper pth files?

Note: I'm trying both to train the model for proper lipsync AND use deepfacelab as mentioned above.

Thank you!

AIMads · 2022-08-20T13:07:58Z

Hello, I tried to fix the notebook, here is what I did so far:

duplicate the notebook ("file" -> "save a copy in drive")

"runtime" -> "change runtime type" -> "GPU" or else I have an error about no GPU available

add "!mkdir data" before the other "mkdir"s

downgrade torchvision to avoid deprecation warnings:
!pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html

install basicsr. I downloaded the code of every previous version of the library in order to find the ones where "parse" exists in "basicsr.utils.options" -> it is versions <= 1.3.3.10. I had errors installing versions < 1.3.3.4 so I went with 1.3.3.4.
!pip3 install https://files.pythonhosted.org/packages/8c/ac/74f4e34fdbc7d3d9233a6f02a740ddb446d75551fbb6ed0c4243c4511a86/basicsr-1.3.3.4.tar.gz#sha256=b448cf9efa4ff2ca75109d3aac36ef50d6e08b0bcb310ebef57ed88c09a2d2ba

create log files directory structure because I had errors about it:
!mkdir /content/wav2lip-hq/experiments/
!mkdir /content/wav2lip-hq/experiments/001_ESRGAN_x4_f64b23_custom16k_500k_B16G1_wandb/

stop pretraining mode as mentioned in Training without pretrained state #17
!sed -i '/resume_state/d' /content/wav2lip-hq/train_basicsr.yml

Now I have error "Input spatial size must be 128x128, but received torch.Size([4, 3, 384, 384])" and I know that I could change images resize to 128*128 but then I have another error.

Maybe @Markfryazino has an old working environment and can give us the details of libraries versions and/or proper pth files?

Note: I'm trying both to train the model for proper lipsync AND use deepfacelab as mentioned above.

Thank you!

Great work! I have identified the same issues. But also stuck at the 128x128 error.

Twenkid · 2022-08-20T19:38:09Z

Good job! What is the other error you get after resizing? I think 4,3,384,384 means batch size 4, 3 channels etc. Shouldn't you resize your input to 384x384 rather than to 128, what is the size of yours? Because 128x128 doesn't sound as much HQ to me, the normal wav2lip is 96x96. На ср, 17.08.2022 г., 22:36 ч. davidchateau ***@***.***> написа:

…

Hello, I tried to fix the notebook, here is what I did so far: - duplicate the notebook ("file" -> "save a copy in drive") - "runtime" -> "change runtime type" -> "GPU" or else I have an error about no GPU available - add "!mkdir data" before the other "mkdir"s - downgrade torchvision to avoid deprecation warnings: !pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html - install basicsr. I downloaded the code of every previous version of the library in order to find the ones where "parse" exists in "basicsr.utils.options" -> it is versions <= 1.3.3.10. I had errors installing versions < 1.3.3.4 so I went with 1.3.3.4. !pip3 install https://files.pythonhosted.org/packages/8c/ac/74f4e34fdbc7d3d9233a6f02a740ddb446d75551fbb6ed0c4243c4511a86/basicsr-1.3.3.4.tar.gz#sha256=b448cf9efa4ff2ca75109d3aac36ef50d6e08b0bcb310ebef57ed88c09a2d2ba - create log files directory structure because I had errors about it: !mkdir /content/wav2lip-hq/experiments/ !mkdir /content/wav2lip-hq/experiments/001_ESRGAN_x4_f64b23_custom16k_500k_B16G1_wandb/ - stop pretraining mode as mentioned in #17 <#17> !sed -i '/resume_state/d' /content/wav2lip-hq/train_basicsr.yml Now I have error "Input spatial size must be 128x128, but received torch.Size([4, 3, 384, 384])" and I know that I could change images resize to 128*128 but then I have another error. Maybe @Markfryazino <https://github.com/Markfryazino> has an old working environment and can give us the details of libraries versions and/or proper pth files? Note: I'm trying both to train the model for proper lipsync AND use deepfacelab as mentioned above. Thank you! — Reply to this email directly, view it on GitHub <#22 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AFSI7WFVKI2JGHJTD32LYATVZU5KHANCNFSM52MQARHQ> . You are receiving this because you commented.Message ID: ***@***.***>

AIMads · 2022-08-21T08:12:15Z

Good job! What is the other error you get after resizing? I think 4,3,384,384 means batch size 4, 3 channels etc. Shouldn't you resize your input to 384x384 rather than to 128, what is the size of yours? Because 128x128 doesn't sound as much HQ to me, the normal wav2lip is 96x96. На ср, 17.08.2022 г., 22:36 ч. davidchateau @.> написа:
…
Hello, I tried to fix the notebook, here is what I did so far: - duplicate the notebook ("file" -> "save a copy in drive") - "runtime" -> "change runtime type" -> "GPU" or else I have an error about no GPU available - add "!mkdir data" before the other "mkdir"s - downgrade torchvision to avoid deprecation warnings: !pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html - install basicsr. I downloaded the code of every previous version of the library in order to find the ones where "parse" exists in "basicsr.utils.options" -> it is versions <= 1.3.3.10. I had errors installing versions < 1.3.3.4 so I went with 1.3.3.4. !pip3 install https://files.pythonhosted.org/packages/8c/ac/74f4e34fdbc7d3d9233a6f02a740ddb446d75551fbb6ed0c4243c4511a86/basicsr-1.3.3.4.tar.gz#sha256=b448cf9efa4ff2ca75109d3aac36ef50d6e08b0bcb310ebef57ed88c09a2d2ba - create log files directory structure because I had errors about it: !mkdir /content/wav2lip-hq/experiments/ !mkdir /content/wav2lip-hq/experiments/001_ESRGAN_x4_f64b23_custom16k_500k_B16G1_wandb/ - stop pretraining mode as mentioned in #17 <#17> !sed -i '/resume_state/d' /content/wav2lip-hq/train_basicsr.yml Now I have error "Input spatial size must be 128x128, but received torch.Size([4, 3, 384, 384])" and I know that I could change images resize to 128128 but then I have another error. Maybe @Markfryazino https://github.com/Markfryazino has an old working environment and can give us the details of libraries versions and/or proper pth files? Note: I'm trying both to train the model for proper lipsync AND use deepfacelab as mentioned above. Thank you! — Reply to this email directly, view it on GitHub <#22 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFSI7WFVKI2JGHJTD32LYATVZU5KHANCNFSM52MQARHQ . You are receiving this because you commented.Message ID: @.*>

Yes so my idea was to resize the LQ set too 128x128 and feed them into the model instead of the GT as it made more sense and then also upsamle the HQ images to 512x512 as that is still keep the 4x difference required. My problem is the LQ images are still detected as having a size of 96x96 🙃

Twenkid · 2022-08-21T18:33:06Z

@AIMads I'll check it out when I could, but that sounds as the neural model has fixed input size (architecture) of 96x96, so I guess your desired solution wouldn't fit so simply, it may require to refactor the NN arch.

BTW, one alternative of this library is to use the basic wav2lip and then Deepfacelab with a self-generating model. I used this method for my deepfakes with my custom DFL modification for grayscale training, and that way I repair and upscale the bad and broken mouths from wav2lip to smooth 192x192 faces. My videos "Lena Schwarzenegger announces Arnold',s return in Red Heat 2 ..." https://youtu.be/4F7PB7wBEXk
and this more advanced one: "Arnold reacts to ..."
https://youtu.be/X56QkNzkkVM
..are produced with this method. See my comment to the latter which explains the technique.
It can be applied for higher upscaling too, with respective DFL model if you have good GPU. For 192x192 (and grayscale in my use case) it looks good.

xjw00654 · 2022-09-29T02:28:39Z

The original ESRGAN is trained with GT with 128x128 or 256x256. So the Discriminative will ask the input size to be the same size.
So, just do some modification as followed:

    # in /wav2lip-hq/basicsr/archs/discriminator_arch.py
    # .......
        self.linear1 = nn.Linear(num_feat * 8 * 12 * 12, 100)  # 4 * 4 ->12 * 12, where 12 = 384 // 2 **4, 4 is the downsample rate
        self.linear2 = nn.Linear(100, 1)
    def forward(self, x): 
        # assert x.size(2) == 128 and x.size(3) == 128, (f'Input spatial size must be 128x128, '
        #                                                f'but received {x.size()}.')

It will takes more memory in GPU, since we increase the input size. In this case, it consumes around 12GB memory with batchsize = 4 and Input size =384x384

rookiexyz · 2022-10-14T14:59:27Z

Sorry for the dumb question but what am I doing wrong? I did all the steps carefully and at the end, getting this error

Traceback (most recent call last):
  File "basicsr/train.py", line 221, in <module>
    train_pipeline(root_path)
  File "basicsr/train.py", line 132, in train_pipeline
    result = create_train_val_dataloader(opt, logger)
  File "basicsr/train.py", line 74, in create_train_val_dataloader
    train_set = build_dataset(dataset_opt)
  File "/usr/local/lib/python3.7/dist-packages/basicsr/data/__init__.py", line 34, in build_dataset
    dataset = DATASET_REGISTRY.get(dataset_opt['type'])(dataset_opt)
  File "/usr/local/lib/python3.7/dist-packages/basicsr/data/paired_image_dataset.py", line 65, in __init__
    self.paths = paired_paths_from_folder([self.lq_folder, self.gt_folder], ['lq', 'gt'], self.filename_tmpl)
  File "/usr/local/lib/python3.7/dist-packages/basicsr/data/data_util.py", line 213, in paired_paths_from_folder
    assert len(input_paths) == len(gt_paths), (f'{input_key} and {gt_key} datasets have different number of images: '
AssertionError: lq and gt datasets have different number of images: 3750, 4243.

davidchateau · 2022-12-21T17:17:22Z

Hello, I fixed the training notebook.
I was able to train, and also resume training (because colab notebooks don't like background processes, even if you buy compute units, so you have to train, download the models, and resume if you get disconnected)
I trained a model up to iteration 100.000 but results are not good when running inference. I'm not talking about video quality, but about the quality of lip syncing. Maybe I'm doing something wrong? I'll investigate further.
Here is the colab notebook, I suggest you duplicate it in your own google drive first
https://colab.research.google.com/drive/1fWCy4Vri2FKrVV7q50ybL_ftzE4bz0Od?usp=sharing
Please share your results (video before / after lipsync, along with the audio file, maybe the file used for training?)
Regards

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix colab pretraining notebook #22

Fix colab pretraining notebook #22

sokffa commented Jul 1, 2022

youngt913 commented Jul 6, 2022

Twenkid commented Jul 16, 2022 •

edited

Loading

davidchateau commented Aug 17, 2022

AIMads commented Aug 20, 2022

Twenkid commented Aug 20, 2022 via email

AIMads commented Aug 21, 2022

Twenkid commented Aug 21, 2022

xjw00654 commented Sep 29, 2022 •

edited

Loading

rookiexyz commented Oct 14, 2022

davidchateau commented Dec 21, 2022 •

edited

Loading

Fix colab pretraining notebook #22

Fix colab pretraining notebook #22

Comments

sokffa commented Jul 1, 2022

youngt913 commented Jul 6, 2022

Twenkid commented Jul 16, 2022 • edited Loading

davidchateau commented Aug 17, 2022

AIMads commented Aug 20, 2022

Twenkid commented Aug 20, 2022 via email

AIMads commented Aug 21, 2022

Twenkid commented Aug 21, 2022

xjw00654 commented Sep 29, 2022 • edited Loading

rookiexyz commented Oct 14, 2022

davidchateau commented Dec 21, 2022 • edited Loading

Twenkid commented Jul 16, 2022 •

edited

Loading

xjw00654 commented Sep 29, 2022 •

edited

Loading

davidchateau commented Dec 21, 2022 •

edited

Loading