
New installation will not run: "Failed to build object with prefix dataset using builder NpzDataset" 🐛 [BUG] #440

Open · tft225 opened this issue Jul 3, 2024 · 4 comments
Labels: bug (Something isn't working)


tft225 commented Jul 3, 2024

Describe the bug
When I try to run the minimal.yaml test config, I get this output:

Processing dataset...
Loaded data: Batch(atomic_numbers=[21000, 1], batch=[21000], cell=[1000, 3, 3], edge_cell_shift=[220186, 3], edge_index=[2, 220186], forces=[21000, 3], pbc=[1000, 3], pos=[21000, 3], ptr=[1001], total_energy=[1000, 1])
processed data size: 9.77 MB
Traceback (most recent call last):
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\shutil.py", line 791, in move
os.rename(src, real_dst)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Thoma\AppData\Local\Temp\tmpq1_2yq_q' -> 'results\aspirin\processed_dataset_afe51556e8a832da62377bded6857e80a9523c1b\.tmp-data.pth'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\savenload.py", line 39, in _process_moves
shutil.move(from_name, tmp_path)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\shutil.py", line 812, in move
os.unlink(src)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Thoma\AppData\Local\Temp\tmpq1_2yq_q'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\auto_init.py", line 243, in instantiate
instance = builder(**positional_args, **final_optional_args)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\data\_dataset\_npz_dataset.py", line 81, in __init__
super().__init__(
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\data\_dataset\_base_datasets.py", line 152, in __init__
super().__init__(root=root, type_mapper=type_mapper)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\data\_dataset\_base_datasets.py", line 43, in __init__
super().__init__(root=root, transform=type_mapper)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\torch_geometric\dataset.py", line 91, in __init__
self._process()
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\torch_geometric\dataset.py", line 176, in _process
self.process()
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\data\_dataset\_base_datasets.py", line 280, in process
torch.save((data, self.include_frames), f)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\contextlib.py", line 120, in __exit__
next(self.gen)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\savenload.py", line 182, in atomic_write
_submit_move(Path(tp.name), Path(fname), blocking=blocking)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\savenload.py", line 128, in _submit_move
_process_moves([obj])
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\savenload.py", line 43, in _process_moves
_delete_files_if_exist([m[1] for m in moves])
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\savenload.py", line 25, in _delete_files_if_exist
f.unlink(missing_ok=True)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\pathlib.py", line 1325, in unlink
self._accessor.unlink(self)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Thoma\AppData\Local\Temp\tmpq1_2yq_q'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\Thoma\anaconda3\envs\NequIP\Scripts\nequip-train.exe\__main__.py", line 7, in <module>
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\scripts\train.py", line 83, in main
trainer = fresh_start(config)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\scripts\train.py", line 196, in fresh_start
dataset = dataset_from_config(config, prefix="dataset")
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\data\_build.py", line 78, in dataset_from_config
instance, _ = instantiate(
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\auto_init.py", line 245, in instantiate
raise RuntimeError(
RuntimeError: Failed to build object with prefix dataset using builder NpzDataset

To Reproduce
Install NequIP in a new environment created through Anaconda Navigator and run the example config:
$ nequip-train configs/example.yaml

Expected behavior
The minimal.yaml training runs.

Environment:

  • OS: Windows 11
  • Python versions: 3.8.19, 3.9.19, 3.12.3
  • PyTorch versions: 1.11, 1.13, 2.3
  • CUDA versions: 11.6, 12.5
  • Laptop: Ryzen 5800H / RTX 3070

Additional context
I tried several new environments with different Python/PyTorch/CUDA versions, all created through Anaconda, and all showed the same issue.

tft225 added the bug label Jul 3, 2024
cw-tan (Collaborator) commented Jul 4, 2024

Hi @tft225

Thank you for your interest in NequIP. On my WSL setup, the following works.

git clone https://github.com/mir-group/nequip.git
cd nequip
conda create -n nequip python=3.11
conda activate nequip
pip install torch
pip install -e .
pip install wandb
nequip-train configs/example.yaml

Running nequip-train configs/minimal.yaml also works.

Could you share more about the exact steps you took? That might help us figure it out. Also, it could be useful to delete the training directory before starting a new training run; that could help your debugging by separating the different factors at play. Looking at the stack trace:

File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\site-packages\nequip\utils\savenload.py", line 25, in _delete_files_if_exist
f.unlink(missing_ok=True)
File "C:\Users\Thoma\anaconda3\envs\NequIP\lib\pathlib.py", line 1325, in unlink
self._accessor.unlink(self)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Thoma\AppData\Local\Temp\tmpq1_2yq_q'

it's related to pathlib. What version of pathlib (and Python) were you using in one of the cases that failed with this error?
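For context on the WinError 32 itself: the frames above show NequIP's atomic_write helper (nequip/utils/savenload.py) creating a temporary file in the user's Temp folder and then moving it into the results directory, apparently while that temporary file is still held open. Windows does not allow an open file to be renamed or deleted, which would explain both the failed os.rename and the failed unlink. Below is a minimal sketch of a Windows-friendly atomic write using only the standard library, assuming the goal is just to write bytes to one destination file. The name atomic_write_bytes is hypothetical; this is not NequIP's API or an official fix.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(dest, data: bytes) -> None:
    """Hypothetical helper: write `data` to `dest` atomically.

    The temporary file is created next to `dest` (same filesystem) and is
    closed *before* it is renamed, because Windows refuses to rename or
    delete a file that still has an open handle (WinError 32).
    """
    dest = Path(dest)
    # mkstemp returns an OS-level file descriptor plus the temp file's path.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=".tmp-", suffix=dest.suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        # The handle is closed at this point, so the rename is allowed on
        # Windows as well; os.replace overwrites `dest` if it already exists.
        os.replace(tmp_name, dest)
    except BaseException:
        # Remove the temporary file if anything failed, then re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "data.pth"
        atomic_write_bytes(target, b"payload")
        print(target.read_bytes())
```

Keeping the temporary file in the destination directory matters because os.replace is only atomic within a single filesystem.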

tft225 (Author) commented Jul 4, 2024

The last time I tried, I was on Python 3.9.19. The pathlib version would have been whatever was installed automatically along with NequIP; I've since removed that environment, sorry. I can try again and check the version if that's necessary.
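pathlib is part of the Python standard library, so its version is simply the Python version. To collect the relevant details in one go, a small sketch like the following prints the Python version, the platform, and the installed versions of a few packages via importlib.metadata (Python 3.8+). The package list and the environment_report name are assumptions for illustration; the CUDA fields are filled in only if torch can actually be imported.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("nequip", "torch", "numpy", "wandb")):
    """Hypothetical helper: collect version info for a bug report."""
    report = {
        "python": sys.version.split()[0],  # pathlib's version is this one
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    # CUDA details are only available if torch itself imports cleanly.
    try:
        import torch

        report["torch_cuda"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    except (ImportError, OSError):
        report["torch_cuda"] = None
        report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```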

These were my steps the last time I tried:
Created a new environment in Anaconda Navigator (Python 3.9.19)

Installed the CMD.exe Prompt in that environment from Anaconda Navigator

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
(following the instructions on the PyTorch website)

Installed CUDA Toolkit 11.3 and set this version in the system environment variables
(for compatibility with this PyTorch version)

git clone https://github.com/mir-group/nequip.git
cd nequip
pip install .

pip install wandb

conda install numpy=1.20.3
(I got a different warning with the latest version of NumPy)

Restarted my computer
(I think this is needed for the cuda driver change to take effect but I'm not sure)

nequip-train configs/minimal.yaml

I also tried different Python and PyTorch versions and kept running into the same issue. This particular setup only fixed a separate problem where my GPU wasn't recognized.

After following the steps above, I think I ran into this error:

c:\users\thoma\nequip\nequip\utils\_global_options.py:59: UserWarning: !! Upstream issues in PyTorch versions >1.11 have been seen to cause unusual performance degredations on some CUDA systems that become worse over time; see #311. At present we strongly recommend the use of PyTorch 1.11 if using CUDA devices; while using other versions if you observe this problem, an unexpected lack of this problem, or other strange behavior, please post in the linked GitHub issue.
warnings.warn(
Traceback (most recent call last):
File "\\?\C:\Users\Thoma\anaconda3\envs\nequip\Scripts\nequip-train-script.py", line 33, in <module>
sys.exit(load_entry_point('nequip', 'console_scripts', 'nequip-train')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\users\thoma\nequip\nequip\scripts\train.py", line 83, in main
trainer = fresh_start(config)
^^^^^^^^^^^^^^^^^^^
File "c:\users\thoma\nequip\nequip\scripts\train.py", line 182, in fresh_start
config = init_n_update(config)
^^^^^^^^^^^^^^^^^^^^^
File "c:\users\thoma\nequip\nequip\utils\wandb.py", line 23, in init_n_update
wandb.init(
File "C:\Users\Thoma\anaconda3\envs\nequip\Lib\site-packages\wandb\sdk\wandb_init.py", line 1195, in init
wandb._sentry.reraise(e)
File "C:\Users\Thoma\anaconda3\envs\nequip\Lib\site-packages\wandb\analytics\sentry.py", line 155, in reraise
raise exc.with_traceback(sys.exc_info()[2])
File "C:\Users\Thoma\anaconda3\envs\nequip\Lib\site-packages\wandb\sdk\wandb_init.py", line 1180, in init
wi.setup(kwargs)
File "C:\Users\Thoma\anaconda3\envs\nequip\Lib\site-packages\wandb\sdk\wandb_init.py", line 189, in setup
self._wl = wandb_setup.setup(settings=setup_settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Thoma\anaconda3\envs\nequip\Lib\site-packages\wandb\sdk\wandb_setup.py", line 325, in setup
ret = _setup(settings=settings)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Thoma\anaconda3\envs\nequip\Lib\site-packages\wandb\sdk\wandb_setup.py", line 318, in _setup
wl = _WandbSetup(settings=settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Thoma\anaconda3\envs\nequip\Lib\site-packages\wandb\sdk\wandb_setup.py", line 303, in __init__
_WandbSetup._instance = _WandbSetup__WandbSetup(settings=settings, pid=pid)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Thoma\anaconda3\envs\nequip\Lib\site-packages\wandb\sdk\wandb_setup.py", line 108, in __init__
self._settings = self._settings_setup(settings, self._early_logger)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Thoma\anaconda3\envs\nequip\Lib\site-packages\wandb\sdk\wandb_setup.py", line 138, in _settings_setup
s._infer_run_settings_from_environment(_logger=early_logger)
File "C:\Users\Thoma\anaconda3\envs\nequip\Lib\site-packages\wandb\sdk\wandb_settings.py", line 1788, in _infer_run_settings_from_environment
program_relpath = self.program_relpath or _get_program_relpath(
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Thoma\anaconda3\envs\nequip\Lib\site-packages\wandb\sdk\wandb_settings.py", line 188, in _get_program_relpath
relative_path = os.path.relpath(full_path_to_program, start=root)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 766, in relpath
ValueError: path is on mount '\\?\C:', start on mount 'C:'

when running example.yaml, and the same error as before for minimal.yaml. Could the first one be related to how wandb was installed?
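The wandb traceback ends in os.path.relpath raising ValueError because the script path carries the Windows extended-length prefix \\?\ while the start path is a plain C: path, so the two are treated as being on different drives ("mounts"). The sketch below reproduces that with the standard-library ntpath module, which implements Windows path rules and can be imported on any OS. The script path is modeled on the traceback; ROOT = "C:\\" is an assumption, since the actual start value wandb used is not shown. strip_extended_prefix is a hypothetical helper for illustration, not a wandb or NequIP API or fix.

```python
import ntpath  # Windows path semantics, importable on any OS

# Paths modeled on the traceback; ROOT is an assumed plain C: start path.
SCRIPT = r"\\?\C:\Users\Thoma\anaconda3\envs\nequip\Scripts\nequip-train-script.py"
ROOT = "C:\\"


def try_relpath(path: str, start: str):
    """Return ntpath.relpath(path, start), or None if the drives differ."""
    try:
        return ntpath.relpath(path, start=start)
    except ValueError as exc:
        print(f"ValueError: {exc}")
        return None


def strip_extended_prefix(path: str) -> str:
    r"""Hypothetical helper: turn '\\?\C:\...' into 'C:\...' for local drive paths."""
    prefix = "\\\\?\\"  # the four characters \\?\
    if path.startswith(prefix) and path[len(prefix) + 1 : len(prefix) + 2] == ":":
        return path[len(prefix):]
    return path  # leave UNC (\\?\UNC\...) and other paths untouched


if __name__ == "__main__":
    # Drive '\\?\C:' vs 'C:': prints the ValueError message, then None.
    print(try_relpath(SCRIPT, ROOT))
    # With the prefix removed both paths share drive 'C:' and relpath succeeds.
    print(try_relpath(strip_extended_prefix(SCRIPT), ROOT))
```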

Thanks for the quick reply, I appreciate your help.

Linux-cpp-lisp (Collaborator)

I'm not sure about this exact error, but in general we do not support Windows systems except inside the Windows Subsystem for Linux (WSL).

cw-tan (Collaborator) commented Jul 5, 2024

Yeah, you could try setting up WSL (instructions here) and conda (instructions here).
