Pretrained model #2

Open
KimWu1994 opened this issue Jul 28, 2022 · 12 comments

Comments

@KimWu1994

Could you provide trained models, especially for the GQA dataset?

@jeasinema
Contributor

Sorry for the late reply. We're working towards an initial release of GQA models and expect it within the next ~1-2 weeks.

@jeasinema
Contributor

Hi,

The pre-trained models on GQA have been released as of ae34a63. Sorry for the delay, and please let us know if you have any questions.

@gulu999

gulu999 commented Oct 17, 2022

Sorry to bother you. When I use the pre-trained model (swin_base), the program fails with the following error:
RuntimeError: Error(s) in loading state_dict for DataParallel:
Unexpected key(s) in state_dict: "module.encoder.encoder.proj.last_layer.weight_g", "module.encoder.encoder.proj.last_layer.weight_v", "module.encoder.encoder.proj2.last_layer.weight_g", "module.encoder.encoder.proj2.last_layer.weight_v".

@jeasinema
Contributor

Hi @gulu999, please try adding strict=False to all the load_state_dict function calls. Let me know if you still have any questions; it would be very helpful if you could provide the complete error log (with line numbers, etc.).
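
For readers hitting the same error, here is a minimal sketch of what strict=False changes. It is not the repository's code: TinyNet and the extra key are hypothetical stand-ins. The only real API used is torch.nn.Module.load_state_dict, which with strict=False skips non-matching keys and returns the missing_keys and unexpected_keys it found.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model: a tiny network with one layer.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 2)

model = TinyNet()

# Build a checkpoint with one extra entry the model does not define,
# mimicking the "Unexpected key(s)" situation from the error above.
state_dict = {k: v.clone() for k, v in model.state_dict().items()}
state_dict["proj2.last_layer.weight_g"] = torch.zeros(1)

# strict=True (the default) raises RuntimeError on the extra key.
try:
    model.load_state_dict(state_dict)
    strict_failed = False
except RuntimeError:
    strict_failed = True

# strict=False loads every matching key and reports the rest.
result = model.load_state_dict(state_dict, strict=False)
print("strict load failed:", strict_failed)
print("unexpected:", result.unexpected_keys)
print("missing:", result.missing_keys)
```

Checking the returned unexpected_keys and missing_keys is a cheap way to confirm that only the keys you expected to be skipped were skipped.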

@gulu999

gulu999 commented Oct 17, 2022

First of all, thanks for such a quick reply.
After I added strict=False, the program failed with the following error:
File "C:/Users/li/Desktop/xgqapaper/relvit/relvitli/main.py", line 281, in main_worker
optimizer.load_state_dict(ckpt['optimizer'],strict=False)
TypeError: load_state_dict() got an unexpected keyword argument 'strict'
And here is the complete error log without strict=False:
C:\Users\li\anaconda3\python.exe C:/Users/li/Desktop/xgqapaper/relvit/relvitli/main.py
config:
{'train_dataset': 'gqa', 'train_dataset_args': {'root_dir': 'D:\datasets\GQA\relvit\gqa_annotations', 'split': 'train'}, 'test_dataset': 'gqa', 'test_dataset_args': {'root_dir': 'D:\datasets\GQA\relvit\gqa_annotations', 'split': 'val'}, 'model': 'mcan-customized', 'model_args': {'word_emb_path': './cache/gqa_word_embed.npy', 'encoder': 'transparent_superpixel_encoder', 'encoder_args': {'encoder': 'swin_base', 'use_boxes_dim': False}}, 'load_encoder': './cache/swin_base-{}.pth', 'encoder_pretrain': 'imagenet', 'train_batches': 1000000, 'ep_per_batch': 1, 'max_epoch': 12, 'eval_mode': 1, 'relvit': True, 'relvit_weight': 1.0, 'relvit_loss_tau': 0.04, 'relvit_local_only': 2, 'relvit_mode': 1, 'relvit_sample_uniform': True, 'relvit_num_concepts': 1615, 'relvit_moco_m': 0.999, 'relvit_moco_use_queue': False, 'relvit_moco_K': 10, 'relvit_num_tokens': 49, 'optimizer': 'adamw', 'optimizer_args': {'lr': 0.0001, 'weight_decay': 0, 'milestones': [8, 10], 'eps': '1e-8'}, 'print_freq': 10, 'save_epoch': 1, 'eval_epoch': 1, 'grad_norm': 0.5}
set gpu: 0
train dataset: 711945 samples
test dataset: 32509 samples
==> Successfully loaded ./cache/swin_base-imagenet.pth for the enocder.
MCANCustomized(
(encoder): TransparentSuperpixelEncoder(
(encoder): SwinTransformer(
....
(proj_norm): LayerNorm()
(proj): Linear(in_features=1024, out_features=1843, bias=True)
)
Traceback (most recent call last):
  File "C:/Users/li/Desktop/xgqapaper/relvit/relvitli/main.py", line 632, in <module>
    main(config)
  File "C:/Users/li/Desktop/xgqapaper/relvit/relvitli/main.py", line 102, in main
    main_worker(args.train_gpu, args.ngpus_per_node, args)
  File "C:/Users/li/Desktop/xgqapaper/relvit/relvitli/main.py", line 278, in main_worker
    model.load_state_dict(ckpt['state_dict'])
  File "C:\Users\li\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DataParallel:
    Unexpected key(s) in state_dict: "module.encoder.encoder.proj.last_layer.weight_g", "module.encoder.encoder.proj.last_layer.weight_v", "module.encoder.encoder.proj2.last_layer.weight_g", "module.encoder.encoder.proj2.last_layer.weight_v".

@jeasinema
Contributor

Hi @gulu999, I found this in the error log

optimizer.load_state_dict(ckpt['optimizer'],strict=False)
TypeError: load_state_dict() got an unexpected keyword argument 'strict'

Sorry for not making this clear, but there is no need to add strict=False to optimizer.load_state_dict; as the TypeError shows, it does not accept that argument.

Please let me know if this helps with the issue.
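
As a small illustration of the point above, the sketch below reproduces the TypeError with a hypothetical stand-in model and optimizer (nn.Linear and AdamW, chosen only for brevity). It assumes a PyTorch version where Optimizer.load_state_dict takes only the state dict, which matches the error in the log.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
saved = optimizer.state_dict()

# Optimizer.load_state_dict takes only the state dict;
# passing strict raises TypeError, as in the log above.
try:
    optimizer.load_state_dict(saved, strict=False)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# The plain call is the supported form.
optimizer.load_state_dict(saved)
print("strict rejected by optimizer:", raised_type_error)
```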

@gulu999

gulu999 commented Oct 18, 2022

Thank you again!
After making the change you suggested, the program fails with the following error:

Traceback (most recent call last):
  File "C:/Users/li/Desktop/xgqapaper/relvit/relvitli/main.py", line 632, in <module>
    main(config)
  File "C:/Users/li/Desktop/xgqapaper/relvit/relvitli/main.py", line 102, in main
    main_worker(args.train_gpu, args.ngpus_per_node, args)
  File "C:/Users/li/Desktop/xgqapaper/relvit/relvitli/main.py", line 281, in main_worker
    optimizer.load_state_dict(ckpt['optimizer'])
  File "C:\Users\li\anaconda3\lib\site-packages\torch\optim\optimizer.py", line 145, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

@jeasinema
Contributor

Hi @gulu999, since we don't have relvit/relvitli/main.py in the original repo, it is a bit hard to reproduce this error on our end. Quick question: do you want to continue training, or just evaluate/fine-tune the weights? If you are not continuing training, you can simply skip optimizer.load_state_dict(ckpt['optimizer']). Otherwise, could you share more details on how to reproduce it? Thank you.

@gulu999

gulu999 commented Oct 18, 2022

I want to evaluate the weights, and I use PyCharm to run the code on my laptop (only one GPU). My main.py is almost the same as your code (train_gqa.py). The difference is that I changed some parameters:
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--config-file', default='train_gqa_mcan.yaml')
    parser.add_argument('--svname', default=None)
    parser.add_argument('--save_dir', default='./save_dist')
    parser.add_argument('--tag', default=None)
    # parser.add_argument('--gpu', default='0')
    parser.add_argument('--seed', type=int, default=123)
    parser.add_argument('--workers', type=int, default=8)
    # parser.add_argument('--test_only', action='store_true')
    parser.add_argument('--test_only', default=True)
    # Raw string so the backslashes in the Windows path are kept literally.
    parser.add_argument('--test_model', default=r'D:\datasets\GQA\relvit\swin_base_original_gqa.pth')

And in train_gqa_mcan.yaml, I only changed the encoder to swin_base.
Sorry to bother you; my English is not very good. Thank you very much.
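
A side note on the modified --test_only flag, offered as a sketch and not as a diagnosis of the error above. With default=True and no type, argparse stores whatever string follows the flag, so passing `--test_only False` yields the non-empty string 'False', which is still truthy. The commented-out action='store_true' form behaves as a real on/off switch. The two parsers below are hypothetical reductions of the script's argument list.

```python
import argparse

# Variant from the modified script: a plain option whose default is True.
p1 = argparse.ArgumentParser()
p1.add_argument('--test_only', default=True)

# Variant commented out in the modified script: a boolean switch.
p2 = argparse.ArgumentParser()
p2.add_argument('--test_only', action='store_true')

no_flag = p1.parse_args([]).test_only                        # the default, True
as_text = p1.parse_args(['--test_only', 'False']).test_only  # the string 'False'
switch_off = p2.parse_args([]).test_only                     # False when omitted
switch_on = p2.parse_args(['--test_only']).test_only         # True when present

print(no_flag, repr(as_text), bool(as_text), switch_off, switch_on)
```

With default=True the script always runs in test-only mode unless the source is edited, which is fine for a one-off evaluation but easy to forget later.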

@jeasinema
Contributor

Hi @gulu999, thank you so much for the information! Since you're evaluating with the weights, you may just skip optimizer.load_state_dict(ckpt['optimizer']) and add strict=False to the remaining load_state_dict calls.
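
The evaluation-only path described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's loading code: load_for_eval is a hypothetical helper, the nn.Linear model and the checkpoint contents are stand-ins, and only the checkpoint keys 'state_dict' and 'optimizer' come from the tracebacks in this thread. The real checkpoint was saved from a DataParallel model, so its keys carry a "module." prefix that this sketch does not handle.

```python
import torch
import torch.nn as nn

def load_for_eval(model, ckpt):
    """Restore only the model weights from a checkpoint dict and switch to eval mode."""
    # strict=False tolerates extra or missing keys and reports them in the result.
    result = model.load_state_dict(ckpt["state_dict"], strict=False)
    # ckpt["optimizer"] is deliberately not loaded: evaluation performs no updates.
    model.eval()
    return result

# Hypothetical stand-ins for the real model and checkpoint.
model = nn.Linear(4, 2)
ckpt = {
    "state_dict": {k: torch.ones_like(v) for k, v in model.state_dict().items()},
    "optimizer": {"state": {}, "param_groups": []},  # ignored by load_for_eval
}

result = load_for_eval(model, ckpt)
print(result)
```

Skipping the optimizer state sidesteps the parameter-group size check entirely, which is why it is only appropriate when training will not be resumed from the checkpoint.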

@gulu999

gulu999 commented Oct 20, 2022

Thank you very much. I have successfully run the code and got the correct result.

@dal-code

> Hi,
>
> The pre-trained models on GQA have been released as of ae34a63. Sorry for the delay, and please let us know if you have any questions.

Hi, can you provide a model trained on the HICO dataset?
