FileNotFoundError: [Errno 2] No such file or directory: 'data/openwebtext/train.bin' #532

Open
HarikrishnanK9 opened this issue Jun 28, 2024 · 1 comment

HarikrishnanK9 commented Jun 28, 2024

tokens per iteration will be: 491,520
Initializing a new model from scratch
defaulting to vocab_size of GPT-2 to 50304 (50257 rounded up for efficiency)
number of parameters: 123.59M
num decayed parameter tensors: 50, with 124,354,560 parameters
num non-decayed parameter tensors: 25, with 19,200 parameters
using fused AdamW: True
compiling the model... (takes a ~minute)
Traceback (most recent call last):
  File "/home/paperspace/clinsight/backend/ner_re/test/Finetuning/Trash/nanoGPT/train.py", line 250, in <module>
    X, Y = get_batch('train') # fetch the very first batch
  File "/home/paperspace/clinsight/backend/ner_re/test/Finetuning/Trash/nanoGPT/train.py", line 120, in get_batch
    data = np.memmap(os.path.join(data_dir, 'train.bin'), dtype=np.uint16, mode='r')
  File "/home/paperspace/anaconda3/envs/finetune_env/lib/python3.10/site-packages/numpy/core/memmap.py", line 229, in __new__
    f_ctx = open(os_fspath(filename), ('r' if mode == 'c' else mode)+'b')
FileNotFoundError: [Errno 2] No such file or directory: 'data/openwebtext/train.bin'

@kalgoritmi

Did you run this before training to download the data?

python data/openwebtext/prepare.py
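
If you want to confirm the files are there before launching training, here is a rough sketch of a check. The path in the traceback is relative, so it is resolved against the directory you start `train.py` from; run this from the same directory. The helper name `missing_data_files` is made up for illustration, and `val.bin` is an assumption on my part (the traceback only shows `train.bin`).

```python
import os


def missing_data_files(data_dir, names=("train.bin", "val.bin")):
    """Return the expected dataset files that are absent from data_dir.

    Hypothetical helper for illustration; 'val.bin' is an assumed file name,
    only 'train.bin' appears in the traceback.
    """
    return [n for n in names if not os.path.isfile(os.path.join(data_dir, n))]


if __name__ == "__main__":
    # Same relative path as in the error message, resolved against the
    # current working directory.
    data_dir = os.path.join("data", "openwebtext")
    missing = missing_data_files(data_dir)
    if missing:
        print(f"Missing {missing} under {os.path.abspath(data_dir)}")
        print("Try: python data/openwebtext/prepare.py")
    else:
        print("Dataset files found.")
```

If it reports missing files even after `prepare.py` has finished, the most likely cause is that the script is being started from a directory other than the repository root.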
