
Possible Bug #1

Open
Hannibal046 opened this issue May 17, 2023 · 2 comments

Comments

@Hannibal046

Hi, thanks for the great work on ICL evaluation in NMT.

I encountered some problems when executing python test/test_flores101.py. Could you please double-check this? The code seems not fully prepared; for example, a Python linter flags some obvious problems:

https://github.com/OwenNJU/MMT-LLM/blob/36e275dcede8ac0ab4501d7173753e007fec4e3d/test/test_flores101.py#L26-L27

https://github.com/OwenNJU/MMT-LLM/blob/36e275dcede8ac0ab4501d7173753e007fec4e3d/test/test_flores101.py#L77

There are also some problems with the Accelerate version, as discussed in Shark-NLP/OpenICL#15.
Thanks again for this work!

@Lhtie
Collaborator

Lhtie commented Jun 7, 2023

Sorry for these mistakes. The evaluation code has been updated, along with fixes for other typos. Thanks for the reminder, and feel free to reach out whenever anything confusing is spotted. As for the Accelerate version, it was indeed outdated; requirements.txt has been updated to the newest version.

@Hannibal046
Author

Hi, thanks for the response! Could you please check this? I am using the updated code, and it still fails:

python test/test_flores101.py \
>   --lang_pair deu-eng \
>   --retriever random \
>   --ice_num 8 \
>   --prompt_template "</E></X>=</Y>" \
>   --model_name facebook/xglm-7.5B \
>   --tokenizer_name facebook/xglm-7.5B \
>   --output_dir output \
>   --output_file test \
>   --seed 43
Namespace(cross_lang=False, direction_order=None, disorder=False, ex_lang=None, ice_num=8, lang_order=None, lang_pair='deu-eng', model_name='facebook/xglm-7.5B', oracle=False, output_dir='output', output_file='test', prompt_template='</E></X>=</Y>', repeat=False, retriever='random', reverse_direction=False, seed=43, tokenizer_name='facebook/xglm-7.5B')
retrieve started
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 41282.52it/s]
retrieve finished
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 146.80it/s]
Average ice num:  8.0
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
  0%|                                                                                                                                                | 0/1 [00:00<?, ?it/s]You're using a XGLMTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/anaconda/envs/mmt/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:2382: UserWarning: `max_length` is ignored when `padding`=`True` and there is no truncation strategy. To pad to max length, use `padding='max_length'`.
  warnings.warn(
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 576.93it/s]
Traceback (most recent call last):
  File "test/test_flores101.py", line 122, in <module>
    print(f"BLEU score = {test_flores(args)}")
  File "test/test_flores101.py", line 91, in test_flores
    score = infr.score(src_lang=src, tgt_lang=tgt)
  File "MMT-LLM/openicl/icl_inferencer/icl_base_inferencer.py", line 74, in score
    return self.metric.score(predictions, src_lang=src_lang, tgt_lang=tgt_lang)
  File "MMT-LLM/openicl/icl_evaluator/icl_bleu_evaluator.py", line 25, in score
    pred_dict[idx] = predictions[idx].split()
AttributeError: 'int' object has no attribute 'split'
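The AttributeError suggests that `predictions` contains integers (possibly raw token ids) rather than decoded strings when it reaches the BLEU evaluator's loop. A minimal defensive sketch of what the failing loop in icl_bleu_evaluator.py might look like with a guard; the helper name `to_pred_dict` and the `tokenizer.decode` fallback are assumptions for illustration, not the repository's actual fix:

```python
def to_pred_dict(predictions, tokenizer=None):
    """Build {index: token list}, decoding int / int-list entries first.

    Mirrors the failing loop (pred_dict[idx] = predictions[idx].split()),
    which assumed every entry is already a string. The tokenizer fallback
    is hypothetical; without a tokenizer, ids are stringified instead.
    """
    pred_dict = {}
    for idx, pred in enumerate(predictions):
        if isinstance(pred, int):  # a bare token id slipped through
            pred = tokenizer.decode([pred]) if tokenizer else str(pred)
        elif isinstance(pred, (list, tuple)):  # a sequence of token ids
            pred = tokenizer.decode(pred) if tokenizer else " ".join(map(str, pred))
        pred_dict[idx] = pred.split()  # original code assumed str here
    return pred_dict
```

The real fix likely belongs earlier in the pipeline (decoding model outputs before scoring), but a guard like this at least surfaces where non-string predictions enter.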
