New installation will not run: "Failed to build object with prefix `dataset` using builder `NpzDataset`" 🐛 [BUG]
#440
Comments
Hi @tft225, thank you for your interest in NequIP. On my WSL, the following works.
Could you maybe share more about the exact steps you took? That might help us figure it out. Also, it could be useful to delete the training directory when you want to start a new training run (this could help your debugging by separating the different factors at play). Looking at the stack trace,
it's related to
The last time I tried I was on Python 3.9.19. It would have been whatever was automatically installed when I installed nequip; I've since removed the environment, sorry. I can try again and check the version if that's necessary.

My steps were as follows the last time I tried:

1. Install the cmd.exe prompt in that environment from Anaconda Navigator
2. pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
3. Installed CUDA Toolkit 11.3 and set this version in the system environment variables
4. git clone https://github.com/mir-group/nequip.git
5. pip install wandb
6. conda install numpy 1.20.3
7. Restarted my computer
8. nequip-train configs/minimal.yaml

I also tried different Python and torch versions and kept running into the same issue; this method just fixed a separate problem where my GPU wasn't recognized. After following the steps specified above I think I ran into this error:

c:\users\thoma\nequip\nequip\utils\_global_options.py:59: UserWarning: !! Upstream issues in PyTorch versions >1.11 have been seen to cause unusual performance degredations on some CUDA systems that become worse over time; see #311. At present we strongly recommend the use of PyTorch 1.11 if using CUDA devices; while using other versions if you observe this problem, an unexpected lack of this problem, or other strange behavior, please post in the linked GitHub issue.

when running example.yaml, and the same error as before for minimal.yaml. It looks like the first one has something to do with how wandb was installed? Thanks for the quick reply, I appreciate your help.
I'm not sure of this exact error, but in general we do not support Windows systems except inside of Windows Subsystem for Linux (WSL).
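Since the thread asks for exact versions and notes that only WSL is supported on Windows, a short script can collect that information for a bug report. The following is a minimal sketch, not part of NequIP: the function names `package_version` and `environment_report` are hypothetical, the package list is an assumption based on the packages mentioned in this thread, and the WSL check is only a heuristic (WSL kernels usually report "microsoft" in their release string).

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("nequip", "torch", "numpy", "wandb")) -> dict:
    """Collect interpreter, OS, and package versions (hypothetical helper)."""
    release = platform.release()
    report = {
        "python": platform.python_version(),
        "os": platform.system(),  # e.g. "Windows" or "Linux"
        "os_release": release,
        # Heuristic only: WSL kernels usually carry "microsoft" in the release.
        "wsl": platform.system() == "Linux" and "microsoft" in release.lower(),
    }
    for name in packages:
        report[name] = package_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it inside the activated environment and pasting the output into the Environment section of the issue gives maintainers the Python, OS, and package versions in one place.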
Describe the bug
When trying to run the minimal.yaml test set, I get this out:
Processing dataset...
Loaded data: Batch(atomic_numbers=[21000, 1], batch=[21000], cell=[1000, 3, 3], edge_cell_shift=[220186, 3], edge_index=[2, 220186], forces=[21000, 3], pbc=[1000, 3], pos=[21000, 3], ptr=[1001], total_energy=[1000, 1])
processed data size: 9.77 MB
Traceback (most recent call last):
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\shutil.py", line 791, in move
os.rename(src, real_dst)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Thoma\AppData\Local\Temp\tmpq1_2yq_q' -> 'results\aspirin\processed_dataset_afe51556e8a832da62377bded6857e80a9523c1b\.tmp-data.pth'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\savenload.py", line 39, in _process_moves
shutil.move(from_name, tmp_path)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\shutil.py", line 812, in move
os.unlink(src)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Thoma\AppData\Local\Temp\tmpq1_2yq_q'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\auto_init.py", line 243, in instantiate
instance = builder(**positional_args, **final_optional_args)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\data\_dataset\_npz_dataset.py", line 81, in __init__
super().__init__(
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\data\_dataset\_base_datasets.py", line 152, in __init__
super().__init__(root=root, type_mapper=type_mapper)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\data\_dataset\_base_datasets.py", line 43, in __init__
super().__init__(root=root, transform=type_mapper)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\torch_geometric\dataset.py", line 91, in __init__
self._process()
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\torch_geometric\dataset.py", line 176, in _process
self.process()
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\data\_dataset\_base_datasets.py", line 280, in process
torch.save((data, self.include_frames), f)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\contextlib.py", line 120, in __exit__
next(self.gen)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\savenload.py", line 182, in atomic_write
_submit_move(Path(tp.name), Path(fname), blocking=blocking)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\savenload.py", line 128, in _submit_move
_process_moves([obj])
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\savenload.py", line 43, in _process_moves
_delete_files_if_exist([m[1] for m in moves])
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\savenload.py", line 25, in _delete_files_if_exist
f.unlink(missing_ok=True)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\pathlib.py", line 1325, in unlink
self._accessor.unlink(self)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Thoma\AppData\Local\Temp\tmpq1_2yq_q'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\Thoma\anaconda3\envs\NequIP\Scripts\nequip-train.exe\__main__.py", line 7, in <module>
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\scripts\train.py", line 83, in main
trainer = fresh_start(config)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\scripts\train.py", line 196, in fresh_start
dataset = dataset_from_config(config, prefix="dataset")
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\data\_build.py", line 78, in dataset_from_config
instance, _ = instantiate(
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\auto_init.py", line 245, in instantiate
raise RuntimeError(
RuntimeError: Failed to build object with prefix `dataset` using builder `NpzDataset`
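WinError 32 is the Windows sharing-violation error, which typically appears when a file is renamed or deleted while some process still holds an open handle to it. One plausible reading of the trace above, not confirmed in this thread, is that the temporary file is still open when `atomic_write` tries to move it, which Linux tolerates and Windows does not. The sketch below illustrates the general pattern of closing the temporary file before moving it; `atomic_write_closed` is a hypothetical helper written for illustration and is not NequIP's implementation.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def atomic_write_closed(filename, binary=False):
    """Hypothetical helper: write to a temp file, close it, then move it."""
    target = Path(filename)
    # Create the temp file next to the target so the final rename stays on
    # one filesystem (and one drive on Windows).
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp-")
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb" if binary else "w") as f:
            yield f
        # The handle is closed at this point, so no open handle of ours
        # blocks the rename on Windows.
        os.replace(tmp_path, target)
    except BaseException:
        # On any failure, remove the partial temp file and re-raise.
        tmp_path.unlink(missing_ok=True)
        raise


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp()) / "data.txt"
    with atomic_write_closed(out) as f:
        f.write("hello")
    print(out.read_text())
```

The design choice here is that the move happens only after the `with os.fdopen(...)` block has exited. A rename can still fail on Windows if another program, such as an antivirus scanner or a file indexer, holds the file open, so this pattern narrows the problem without guaranteeing it away.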
To Reproduce
Try to install NequIP in a new environment created through anaconda navigator and run the given test:
$ nequip-train configs/example.yaml
Expected behavior
The minimal.yaml training runs to completion.
Environment:
Additional context
I tried multiple new environments created through Anaconda with different Python, torch, and CUDA versions; all showed the same issue.