Describe the bug
There is a conversion inconsistency in both `scripts/checkpoint_converters/convert_bert_hf_to_nemo.py` and `scripts/checkpoint_converters/convert_bert_nemo_to_hf.py`.
Slack thread for reference: https://nvidia.enterprise.slack.com/archives/C0271E234TB/p1737581333923069
Steps/Code to reproduce bug
Low performance of converted .nemo checkpoint:
Reference numbers on test dataset - numbers attained by HF model:
{'NDCG@1': 0.54878, 'NDCG@3': 0.67963, 'NDCG@5': 0.71479, 'NDCG@10': 0.73648, 'NDCG@100': 0.75289, 'NDCG@1000': 0.75289}, {'MAP@1': 0.54878, 'MAP@3': 0.64634, 'MAP@5': 0.66585, 'MAP@10': 0.67442, 'MAP@100': 0.67852, 'MAP@1000': 0.67852}, {'Recall@1': 0.54878, 'Recall@3': 0.77642, 'Recall@5': 0.86179, 'Recall@10': 0.93089, 'Recall@100': 1.0, 'Recall@1000': 1.0}, {'P@1': 0.54878, 'P@3': 0.25881, 'P@5': 0.17236, 'P@10': 0.09309, 'P@100': 0.01, 'P@1000': 0.001}
Observed numbers on test dataset - numbers by converted NeMo model:
{'NDCG@1': 0.17073, 'NDCG@3': 0.25561, 'NDCG@5': 0.27344, 'NDCG@10': 0.30846, 'NDCG@100': 0.37972, 'NDCG@1000': 0.40227}, {'MAP@1': 0.17073, 'MAP@3': 0.23442, 'MAP@5': 0.24397, 'MAP@10': 0.25815, 'MAP@100': 0.27021, 'MAP@1000': 0.27124}, {'Recall@1': 0.17073, 'Recall@3': 0.31707, 'Recall@5': 0.36179, 'Recall@10': 0.47154, 'Recall@100': 0.83333, 'Recall@1000': 1.0}, {'P@1': 0.17073, 'P@3': 0.10569, 'P@5': 0.07236, 'P@10': 0.04715, 'P@100': 0.00833, 'P@1000': 0.001}
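To make the size of the regression explicit, the two result sets above can be compared metric by metric. The following is only a minimal sketch: the helper name `metric_drop` is hypothetical, and only a subset of the reported metrics is copied in for brevity.

```python
# Subset of the metrics reported above, copied verbatim from the two runs.
hf_metrics = {"NDCG@1": 0.54878, "NDCG@10": 0.73648, "MAP@10": 0.67442, "Recall@10": 0.93089}
nemo_metrics = {"NDCG@1": 0.17073, "NDCG@10": 0.30846, "MAP@10": 0.25815, "Recall@10": 0.47154}


def metric_drop(reference, observed):
    """Return {metric: (absolute drop, relative drop)} for metrics present in both dicts."""
    drops = {}
    for name, ref in reference.items():
        # Skip metrics missing from the observed run and avoid division by zero.
        if name in observed and ref != 0:
            diff = ref - observed[name]
            drops[name] = (diff, diff / ref)
    return drops


if __name__ == "__main__":
    for name, (abs_drop, rel_drop) in metric_drop(hf_metrics, nemo_metrics).items():
        print(f"{name}: drop {abs_drop:.5f} ({rel_drop:.1%} relative)")
```

For the subset shown, every metric drops by roughly half or more relative to the HF reference, which is far beyond what numerical noise from a checkpoint conversion would explain.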
Error/difference between the HF embedding output and the NeMo embedding output, as reported by the sanity test at the end of the script run (screenshot is OCRed below):
Expected behavior
For HF to NeMo conversion, the performance of the NeMo model should be close to that of the HF model.
For NeMo to HF conversion, the conversion error should not be high and should ideally be close to zero.
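One way to state "close to zero" concretely is to compare the two models' embeddings for the same input. The sketch below is illustrative and is not the check used by the conversion scripts: the function names and the `atol=1e-3` tolerance are assumptions, and the embeddings are taken to be plain lists of floats already extracted from each model.

```python
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embeddings_match(a, b, atol=1e-3):
    """True if every component agrees within atol (an assumed tolerance, not a NeMo default)."""
    return max_abs_diff(a, b) <= atol


if __name__ == "__main__":
    # Placeholder vectors for illustration only; real values would come from the two models.
    hf_embedding = [0.12, -0.40, 0.33]
    nemo_embedding = [0.12, -0.40, 0.33]
    print(max_abs_diff(hf_embedding, nemo_embedding))
    print(cosine_similarity(hf_embedding, nemo_embedding))
    print(embeddings_match(hf_embedding, nemo_embedding))
```

A correct conversion would be expected to pass such a check for a small tolerance, with any remaining difference attributable to floating-point precision.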
Environment overview (please complete the following information)
Ran inside NeMo FW container 24.07, with `pip install beir`.