This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

Finetune on custom dataset locally using TF2 #211

Open
vsk-phi opened this issue Jan 21, 2023 · 1 comment

vsk-phi commented Jan 21, 2023

Hello,
Thanks for sharing the code. I wanted to try the pretrained model on a fine-tuning task with a custom dataset. After some trial and error, I was able to write the dataset as a TFDS builder. The TensorFlow version changes also took some trial and error, but I am now able to make some progress on getting it to run. The command I use to finetune is taken from the README:
```shell
python ./code/simclr/tf2/run.py --mode=train_then_eval --train_mode=finetune \
  --fine_tune_after_block=4 --zero_init_logits_layer=True --global_bn=False \
  --optimizer=momentum --learning_rate=0.1 --weight_decay=0.0 --train_epochs=10 \
  --train_batch_size=64 --warmup_epochs=0 --dataset=tf_guidance --image_size=128 \
  --eval_split=val --resnet_depth=50 --checkpoint=simclr_v1/1x/saved_model/ \
  --model_dir=/tmp/simclr_test_ft --use_tpu=False
```
The number of classes for my problem is 6. I have used the checkpoint from SimCLRv1 with the 1x ResNet, as can be seen above.
On running the above, the warning I get during checkpoint loading is:

```
I0121 17:28:25.435804 139671615018752 api.py:447] head_supervised/linear_layer/dense_3/bias:0
WARNING:tensorflow:Gradients do not exist for variables ['projection_head/nl_0/batch_norm_relu_53/batch_normalization_53/gamma:0', 'projection_head/nl_0/batch_norm_relu_53/batch_normalization_53/beta:0', 'projection_head/nl_1/batch_norm_relu_54/batch_normalization_54/gamma:0', 'projection_head/nl_1/batch_norm_relu_54/batch_normalization_54/beta:0', 'projection_head/nl_2/batch_norm_relu_55/batch_normalization_55/gamma:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
W0121 17:28:25.458693 139671615018752 utils.py:76] Gradients do not exist for variables ['projection_head/nl_0/batch_norm_relu_53/batch_normalization_53/gamma:0', 'projection_head/nl_0/batch_norm_relu_53/batch_normalization_53/beta:0', 'projection_head/nl_1/batch_norm_relu_54/batch_normalization_54/gamma:0', 'projection_head/nl_1/batch_norm_relu_54/batch_normalization_54/beta:0', 'projection_head/nl_2/batch_norm_relu_55/batch_normalization_55/gamma:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
```
After a bit, the run fails with:

```
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) INVALID_ARGUMENT: logits and labels must be broadcastable: logits_size=[64,1000] labels_size=[64,6]
	 [[node categorical_crossentropy/softmax_cross_entropy_with_logits (defined at /home/krishnan/pyenv/tf2/lib/python3.8/site-packages/keras/backend.py:5009) ]]
	 [[Func/while/body/_1/image/write_summary/summary_cond/then/_1242/input/_1253/_30]]
  (1) INVALID_ARGUMENT: logits and labels must be broadcastable: logits_size=[64,1000] labels_size=[64,6]
	 [[node categorical_crossentropy/softmax_cross_entropy_with_logits (defined at /home/krishnan/pyenv/tf2/lib/python3.8/site-packages/keras/backend.py:5009) ]]
0 successful operations. 0 derived errors ignored. [Op:__inference_train_multiple_steps_11689]
```
I can guess that it is due to the mismatch in the number of classes. As far as I can see, the number of classes is set in run.py from the dataset, on line 475:

```python
num_classes = builder.info.features['label'].num_classes
```
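The mismatch the traceback reports can be reproduced in isolation: the restored SimCLRv1 checkpoint carries a 1000-class ImageNet supervised head, while the labels are one-hot over 6 classes. A minimal NumPy sketch (not the repo's code; `softmax_cross_entropy` here is a hypothetical stand-in for TF's op, which imposes the same shape requirement):

```python
import numpy as np

# Softmax cross-entropy requires logits and labels to share the class dimension.
def softmax_cross_entropy(logits, labels):
    # logits: [batch, num_classes], labels: one-hot [batch, num_classes]
    if logits.shape != labels.shape:
        raise ValueError(
            f"logits and labels must be broadcastable: "
            f"logits_size={list(logits.shape)} labels_size={list(labels.shape)}")
    # numerically stable log-softmax, then pick out the true-class log-prob
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(labels * log_probs).sum(axis=-1)

logits = np.zeros((64, 1000))                 # head restored from the 1000-class checkpoint
labels = np.eye(6)[np.zeros(64, dtype=int)]   # one-hot over the 6 custom classes

try:
    softmax_cross_entropy(logits, labels)
except ValueError as e:
    print(e)  # mirrors the InvalidArgumentError: [64, 1000] vs [64, 6]
```

So if `num_classes` from the TFDS builder is 6 but the loss still sees 1000-wide logits, the restored supervised head rather than the dataset is the likely source of the 1000.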
Unfortunately, I am unable to understand or debug this TensorFlow error any further; any suggestions are welcome.
The TensorFlow version is 2.7.4; I have checked with 2.5.3 as well. Unfortunately, the code does not work in 2.4.1, citing a CUDA version mismatch (though it works with later TF versions).
Any help/hints welcome! Sorry if this is a basic question.
Thank you.

@deepankarvarma

Can you share the directory structure? I am running the same code, but I am unclear about what paths to pass in --model_dir and --checkpoint.
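Not the author, but judging from the flags in the command above, a plausible layout would be something like the following (hypothetical paths inferred from the original command, not verified against the repo):

```shell
# --checkpoint=simclr_v1/1x/saved_model/  -> the downloaded pretrained SavedModel
#                                            (should contain saved_model.pb and variables/)
# --model_dir=/tmp/simclr_test_ft         -> any writable directory where run.py
#                                            writes new checkpoints and summaries
mkdir -p simclr_v1/1x/saved_model
mkdir -p /tmp/simclr_test_ft
```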
