
the generations not right #254

Open · sankexin opened this issue Nov 28, 2024 · 2 comments

@sankexin commented Nov 28, 2024

torchrun --nproc_per_node=1 run_generate.py --ckpt-path checkpoints/10/ --tp 1 --pp 1

result:

_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, ID=94603600370528
11/28/2024 12:31:29 [INFO|DP=0|PP=0|TP=0]: model_config: LlamaConfig(bos_token_id=1, eos_token_id=2, hidden_act='silu', hidden_size=16, initializer_range=0.02, intermediate_size=64, is_llama_config=True, max_position_embeddings=256, num_attention_heads=4, num_hidden_layers=2, num_key_value_heads=4, pad_token_id=None, pretraining_tp=1, rms_norm_eps=1e-05, rope_scaling=None, rope_theta=10000.0, rope_interleaved=False, tie_word_embeddings=True, use_cache=True, vocab_size=256)
11/28/2024 12:31:29 [INFO|DP=0|PP=0|TP=0]: tokenizer_path: robot-test/dummy-tokenizer-wordlevel
I1128 12:31:29.845922  3442 ProcessGroupNCCL.cpp:1991] [PG 2 Rank 0] ProcessGroupNCCL created ncclComm_ 0x560a9dfa4f90 on CUDA device:
I1128 12:31:29.845943  3442 ProcessGroupNCCL.cpp:1996] [PG 2 Rank 0] NCCL_DEBUG: N/A
11/28/2024 12:31:29 [INFO|DP=0|PP=0|TP=0]: Building model..
11/28/2024 12:31:29 [INFO|DP=0|PP=0|TP=0]: Initialize RoPE Theta = 10000.0
11/28/2024 12:31:29 [INFO|DP=0|PP=0|TP=0]: Setting PP block ranks...
I1128 12:31:30.700415  3442 ProcessGroupNCCL.cpp:1991] [PG 1 Rank 0] ProcessGroupNCCL created ncclComm_ 0x560a9dd4afd0 on CUDA device:
I1128 12:31:30.700441  3442 ProcessGroupNCCL.cpp:1996] [PG 1 Rank 0] NCCL_DEBUG: N/A
11/28/2024 12:31:30 [INFO|DP=0|PP=0|TP=0]: Loading checkpoint from checkpoints/10:
Loading weights: 100%|██████████| 15/15 [00:00<00:00, 407.27it/s]
11/28/2024 12:31:39 [INFO|DP=0|PP=0|TP=0]: input: [CLS] the [UNK] [UNK] [UNK] is [SEP]
11/28/2024 12:31:39 [INFO|DP=0|PP=0|TP=0]: generation: [SEP] [SEP] [SEP] [SEP] … ([SEP] repeated for the entire generated sequence)
11/28/2024 12:31:39 [INFO|DP=0|PP=0|TP=0]: --------------------------------------------------
11/28/2024 12:31:39 [INFO|DP=0|PP=0|TP=0]: input: [CLS] [UNK] [UNK] ( [UNK] [UNK] [SEP]
11/28/2024 12:31:39 [INFO|DP=0|PP=0|TP=0]: generation: [SEP] [SEP] [SEP] [SEP] … ([SEP] repeated for the entire generated sequence)
11/28/2024 12:31:39 [INFO|DP=0|PP=0|TP=0]: --------------------------------------------------
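Worth noting from the printed model_config: eos_token_id=2, and with this dummy tokenizer [SEP] is plausibly that very token, so the barely trained checkpoint is most likely just emitting its single most probable token at every step. A minimal, self-contained sketch (not nanotron's actual decode loop; the dominant token id is an assumption for illustration) of how greedy decoding degenerates when one token dominates an undertrained model's distribution:

```python
import torch

# Hypothetical illustration: after very little training, the logits barely
# depend on the context, and one token (here id 2, the assumed [SEP]/eos,
# matching eos_token_id=2 in the config above) dominates at every step,
# so argmax decoding repeats it forever.
torch.manual_seed(0)
vocab_size = 256
dominant_id = 2  # assumed [SEP]/eos token id

logits = torch.randn(vocab_size)
logits[dominant_id] += 10.0  # the undertrained model's near-constant bias

generated = [torch.argmax(logits).item() for _ in range(8)]
print(generated)  # [2, 2, 2, 2, 2, 2, 2, 2] -> "[SEP] [SEP] [SEP] ..."
```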
@sankexin changed the title from "the generations not conrrect" to "the generations not right" on Nov 28, 2024
@hz-nm commented Nov 29, 2024

I think that has more to do with the tokenizer used for the example training. The tokenizer contains only a few tokens, such as [CLS], [UNK], [SEP], etc.

You can look at it here: [Dummy Tokenizer](https://huggingface.co/robot-test/dummy-tokenizer-wordlevel/blob/main/tokenizer.json).
Use a different tokenizer, such as BERT or GPT-2, to get actual words, although the generation will still be very weird because of the poor training. Also don't forget to update the vocab_size parameter in the config file when you use a different tokenizer.
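For reference, a minimal sketch of the relevant fields in examples/config_tiny_llama.yaml (field names as they appear in nanotron's example config; the GPT-2 tokenizer and its vocabulary size are used purely as an illustration):

```yaml
model:
  model_config:
    # ...
    vocab_size: 50257            # must match the new tokenizer's vocabulary
tokenizer:
  tokenizer_name_or_path: gpt2   # was: robot-test/dummy-tokenizer-wordlevel
```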

@sankexin (Author) commented:

> I think that has more to do with the tokenizer used for the example training. […]

nice! great!

when I use:

tokenizer: https://huggingface.co/HuggingFaceTB/cosmo2-tokenizer
datasets: https://huggingface.co/datasets/argilla-warehouse/fineweb-edu-dedup-filtered

and modify "examples/config_tiny_llama.yaml"

vocab_size: 49152

then result:

11/29/2024 08:45:36 [INFO|DP=0|PP=0|TP=0]: input: The future of AI is
11/29/2024 08:45:36 [INFO|DP=0|PP=0|TP=0]: generation:  poetry poetry poetry poetry … ("poetry" repeated for the entire generated sequence)
11/29/2024 08:45:36 [INFO|DP=0|PP=0|TP=0]: --------------------------------------------------
11/29/2024 08:45:36 [INFO|DP=0|PP=0|TP=0]: input: def fib(n)
11/29/2024 08:45:36 [INFO|DP=0|PP=0|TP=0]: generation:  spl spl spl spl … ("spl" repeated for the entire generated sequence)
11/29/2024 08:45:36 [INFO|DP=0|PP=0|TP=0]: --------------------------------------------------
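The repeated single token here looks like the same failure mode as before: a barely trained tiny model collapsing to one high-probability token under greedy decoding, rather than a tokenizer problem per se. One quick sanity check is to confirm the config's vocab_size actually covers the new tokenizer, e.g. with the standard transformers API:

```python
from transformers import AutoTokenizer

# Load the tokenizer that training/generation now uses.
tok = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo2-tokenizer")

# len(tok) also counts added special tokens; the config's vocab_size
# (49152 here) must be at least this large, or embedding lookups can
# index out of range.
print(tok.vocab_size, len(tok))
```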
