Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/NeMo-Run/src/nemo_run/core/runners/fdl_runner.py", line 66, in <module>
    fdl_runner_app()
  File "/usr/local/lib/python3.12/dist-packages/typer/main.py", line 340, in __call__
    raise e
  File "/usr/local/lib/python3.12/dist-packages/typer/main.py", line 323, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/typer/core.py", line 680, in main
    return _main(
           ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/typer/core.py", line 198, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/typer/main.py", line 698, in wrapper
    return callback(**use_params)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/NeMo-Run/src/nemo_run/core/runners/fdl_runner.py", line 62, in fdl_direct_run
    fdl_fn()
  File "/nemo_run/code/nemo/collections/llm/api.py", line 150, in pretrain
    return train(
           ^^^^^^
  File "/nemo_run/code/nemo/collections/llm/api.py", line 96, in train
    app_state = _setup(
                ^^^^^^^
  File "/nemo_run/code/nemo/collections/llm/api.py", line 838, in _setup
    _use_tokenizer(model, data, tokenizer)
  File "/nemo_run/code/nemo/collections/llm/api.py", line 795, in _use_tokenizer
    _set_with_io(model, "tokenizer", data.tokenizer)
                                     ^^^^^^^^^^^^^^
AttributeError: 'HFDatasetDataModule' object has no attribute 'tokenizer'
In addition, HFDatasetDataModule does not accept 'tokenizer' as an argument, so a tokenizer passed to it is forwarded all the way to the load_dataset function, which causes other problems.
# this will not work
from nemo.collections.common.tokenizers.huggingface import AutoTokenizer
pretrain.data.tokenizer = AutoTokenizer("meta-llama/Llama-3.1-8B")
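To illustrate why an extra argument can surface as an error far from where it was passed, here is a minimal sketch. It does not use the NeMo or Hugging Face code: ForwardingDataModule and load_dataset_stub are hypothetical stand-ins, and the stub is assumed to reject unknown keyword arguments. The real load_dataset may fail in a different way.

```python
# Hypothetical stand-ins; not the NeMo or Hugging Face implementations.
def load_dataset_stub(path, split=None):
    """Stands in for a loader with a fixed set of keyword arguments."""
    return {"path": path, "split": split}


class ForwardingDataModule:
    """Toy data module that forwards every extra kwarg to the loader."""

    def __init__(self, path, **kwargs):
        self.path = path
        self.kwargs = kwargs  # anything the module does not recognise ends up here

    def setup(self):
        # The unrecognised 'tokenizer' kwarg only fails here, inside the loader call.
        return load_dataset_stub(self.path, **self.kwargs)


def try_setup(**kwargs):
    """Return the error message raised by setup(), or None on success."""
    try:
        ForwardingDataModule("my/dataset", **kwargs).setup()
    except TypeError as exc:
        return str(exc)
    return None


print(try_setup(split="train"))
print(try_setup(tokenizer=object()))
```

In this sketch the constructor accepts the tokenizer silently and the failure only appears once setup() calls the loader, which matches the kind of delayed error described above.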
Steps/Code to reproduce bug
Replace MockDataModule with HFDatasetDataModule in scripts/llm/pretraining.py (line #160).
Run the script.
Expected behavior
HFDatasetDataModule should either pick up the tokenizer from the model or accept a tokenizer explicitly as an init argument.
If I am doing this wrong, please let me know the right way to build my own datamodule, because I cannot find any documentation or examples for it.
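A rough sketch of the behavior I have in mind, using made-up names: TokenizerAwareDataModule and resolve_tokenizer are hypothetical and are not part of NeMo. The tokenizer is taken as a named init argument so it never reaches the dataset loader, and the model's tokenizer is used as a fallback.

```python
# Hypothetical sketch; class and function names are illustrative only.
from types import SimpleNamespace


class TokenizerAwareDataModule:
    """Toy data module that accepts a tokenizer explicitly."""

    def __init__(self, path, tokenizer=None, **loader_kwargs):
        self.path = path
        # Stored on the module, never forwarded to the dataset loader.
        self.tokenizer = tokenizer
        self.loader_kwargs = loader_kwargs


def resolve_tokenizer(model, data):
    """Prefer the data module's tokenizer, else fall back to the model's."""
    if data.tokenizer is None:
        data.tokenizer = model.tokenizer
    return data.tokenizer


# Plain strings stand in for real tokenizer objects.
model = SimpleNamespace(tokenizer="model-tokenizer")
explicit = TokenizerAwareDataModule("my/dataset", tokenizer="my-tokenizer", split="train")
inherited = TokenizerAwareDataModule("my/dataset")

print(resolve_tokenizer(model, explicit))
print(resolve_tokenizer(model, inherited))
```

Because the tokenizer is an ordinary init argument here, it would be part of the module's constructor arguments rather than something assigned afterwards.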
Environment overview (please complete the following information)
I am using docker with the nvcr.io/nvidia/nemo:dev image updated on 02/03/2025 5:48 PM.
Not all datamodules have a tokenizer attribute.
It is possible to assign a tokenizer directly to the datamodule, but this fails when run with nemo_run because the attribute cannot be serialized.
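Since a datamodule may or may not carry a tokenizer, one defensive option is to check for the attribute before copying it to the model. The sketch below is only an illustration with stand-in objects; use_data_tokenizer is a hypothetical helper and is not the actual _use_tokenizer in nemo/collections/llm/api.py.

```python
# Hypothetical guard; stand-in objects replace the real model and datamodule.
from types import SimpleNamespace


def use_data_tokenizer(model, data):
    """Copy data.tokenizer onto the model only if the datamodule has one."""
    tokenizer = getattr(data, "tokenizer", None)
    if tokenizer is None:
        return False  # nothing to copy; leave the model untouched
    model.tokenizer = tokenizer
    return True


model = SimpleNamespace(tokenizer="model-tokenizer")
data_without = SimpleNamespace()
data_with = SimpleNamespace(tokenizer="data-tokenizer")

copied_without = use_data_tokenizer(model, data_without)
tokenizer_after_without = model.tokenizer
copied_with = use_data_tokenizer(model, data_with)
tokenizer_after_with = model.tokenizer
```

With a guard like this, a datamodule without a tokenizer would leave the model's own tokenizer in place instead of raising an AttributeError.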
Describe the bug
I am running the script here: https://github.com/NVIDIA/NeMo/blob/main/scripts/llm/pretraining.py
I modified line #160 to replace MockDataModule with HFDatasetDataModule so that I can use my own data.
The run then fails with the AttributeError shown in the traceback above.