BUG: Encountered Operation not supported OSError when running MNIST Torch example #233

Open
hwpang opened this issue Jul 31, 2024 · 4 comments

@hwpang

hwpang commented Jul 31, 2024

What are you trying to do?

I am new to SubstraFL and am currently working through the example at https://docs.substra.org/en/stable/examples/substrafl/get_started/run_mnist_torch.html.

Issue Description (what is happening?)

The notebook failed at the following cell with an OSError.

from substrafl.experiment import execute_experiment
import logging
import substrafl

substrafl.set_logging_level(loglevel=logging.ERROR)
# A round is defined by a local training step followed by an aggregation operation
NUM_ROUNDS = 3

compute_plan = execute_experiment(
    client=clients[ALGO_ORG_ID],
    strategy=strategy,
    train_data_nodes=train_data_nodes,
    evaluation_strategy=my_eval_strategy,
    aggregation_node=aggregation_node,
    num_rounds=NUM_ROUNDS,
    experiment_folder=str(pathlib.Path.cwd() / "tmp" / "experiment_summaries"),
    dependencies=dependencies,
    clean_models=False,
    name="MNIST documentation example",
)

Expected Behavior (what should happen?)

The tutorial should run to completion without raising this error.

Reproducible Example

No response

Operating system

Ubuntu 20.04

Python version

3.11.9

Installed Substra versions

substra==0.53.0
substrafl==0.46.0
substratools==0.21.4

Installed versions of dependencies

# packages in environment at /mnt/batch/tasks/shared/LS_root/mounts/clusters/hpang8/code/Users/hpang/conda_envs/substrafl_env:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
annotated-types           0.7.0                    pypi_0    pypi
anyio                     4.2.0           py311h06a4308_0  
argon2-cffi               21.3.0             pyhd3eb1b0_0  
argon2-cffi-bindings      21.2.0          py311h5eee18b_0  
asttokens                 2.0.5              pyhd3eb1b0_0  
async-lru                 2.0.4           py311h06a4308_0  
attrs                     23.1.0          py311h06a4308_0  
babel                     2.11.0          py311h06a4308_0  
beautifulsoup4            4.12.3          py311h06a4308_0  
bleach                    4.1.0              pyhd3eb1b0_0  
brotli-python             1.0.9           py311h6a678d5_8  
build                     1.2.1                    pypi_0    pypi
bzip2                     1.0.8                h5eee18b_6  
ca-certificates           2024.7.2             h06a4308_0  
certifi                   2024.7.4        py311h06a4308_0  
cffi                      1.16.0          py311h5eee18b_1  
charset-normalizer        3.3.2              pyhd3eb1b0_0  
click                     8.1.7                    pypi_0    pypi
cloudpickle               3.0.0                    pypi_0    pypi
cmake                     3.30.1                   pypi_0    pypi
comm                      0.2.1           py311h06a4308_0  
contourpy                 1.2.1                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
debugpy                   1.6.7           py311h6a678d5_0  
decorator                 5.1.1              pyhd3eb1b0_0  
defusedxml                0.7.1              pyhd3eb1b0_0  
docker                    7.1.0                    pypi_0    pypi
executing                 0.8.3              pyhd3eb1b0_0  
expat                     2.6.2                h6a678d5_0  
filelock                  3.15.4                   pypi_0    pypi
fonttools                 4.53.1                   pypi_0    pypi
idna                      3.7             py311h06a4308_0  
ipykernel                 6.28.0          py311h06a4308_0  
ipython                   8.25.0          py311h06a4308_0  
jedi                      0.19.1          py311h06a4308_0  
jinja2                    3.1.4           py311h06a4308_0  
joblib                    1.4.2                    pypi_0    pypi
json5                     0.9.6              pyhd3eb1b0_0  
jsonschema                4.19.2          py311h06a4308_0  
jsonschema-specifications 2023.7.1        py311h06a4308_0  
jupyter-lsp               2.2.0           py311h06a4308_0  
jupyter_client            8.6.0           py311h06a4308_0  
jupyter_core              5.7.2           py311h06a4308_0  
jupyter_events            0.10.0          py311h06a4308_0  
jupyter_server            2.14.1          py311h06a4308_0  
jupyter_server_terminals  0.4.4           py311h06a4308_1  
jupyterlab                4.0.11          py311h06a4308_0  
jupyterlab_pygments       0.1.2                      py_0  
jupyterlab_server         2.25.1          py311h06a4308_0  
kiwisolver                1.4.5                    pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.4.4                h6a678d5_1  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libsodium                 1.0.18               h7b6447c_0  
libstdcxx-ng              11.2.0               h1234567_1  
libuuid                   1.41.5               h5eee18b_0  
lit                       18.1.8                   pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
matplotlib                3.6.3                    pypi_0    pypi
matplotlib-inline         0.1.6           py311h06a4308_0  
mistune                   2.0.4           py311h06a4308_0  
mpmath                    1.3.0                    pypi_0    pypi
nbclient                  0.8.0           py311h06a4308_0  
nbconvert                 7.10.0          py311h06a4308_0  
nbformat                  5.9.2           py311h06a4308_0  
ncurses                   6.4                  h6a678d5_0  
nest-asyncio              1.6.0           py311h06a4308_0  
networkx                  3.3                      pypi_0    pypi
notebook                  7.0.8           py311h06a4308_2  
notebook-shim             0.2.3           py311h06a4308_0  
numpy                     1.24.3                   pypi_0    pypi
nvidia-cublas-cu11        11.10.3.66               pypi_0    pypi
nvidia-cuda-cupti-cu11    11.7.101                 pypi_0    pypi
nvidia-cuda-nvrtc-cu11    11.7.99                  pypi_0    pypi
nvidia-cuda-runtime-cu11  11.7.99                  pypi_0    pypi
nvidia-cudnn-cu11         8.5.0.96                 pypi_0    pypi
nvidia-cufft-cu11         10.9.0.58                pypi_0    pypi
nvidia-curand-cu11        10.2.10.91               pypi_0    pypi
nvidia-cusolver-cu11      11.4.0.1                 pypi_0    pypi
nvidia-cusparse-cu11      11.7.4.91                pypi_0    pypi
nvidia-nccl-cu11          2.14.3                   pypi_0    pypi
nvidia-nvtx-cu11          11.7.91                  pypi_0    pypi
openssl                   3.0.14               h5eee18b_0  
overrides                 7.4.0           py311h06a4308_0  
packaging                 24.1            py311h06a4308_0  
pandas                    1.5.3                    pypi_0    pypi
pandocfilters             1.5.0              pyhd3eb1b0_0  
parso                     0.8.3              pyhd3eb1b0_0  
pexpect                   4.8.0              pyhd3eb1b0_3  
pillow                    10.4.0                   pypi_0    pypi
pip                       24.0            py311h06a4308_0  
pip-tools                 7.4.1                    pypi_0    pypi
platformdirs              3.10.0          py311h06a4308_0  
prometheus_client         0.14.1          py311h06a4308_0  
prompt-toolkit            3.0.43          py311h06a4308_0  
prompt_toolkit            3.0.43               hd3eb1b0_0  
psutil                    5.9.0           py311h5eee18b_0  
ptyprocess                0.7.0              pyhd3eb1b0_2  
pure_eval                 0.2.2              pyhd3eb1b0_0  
pycparser                 2.21               pyhd3eb1b0_0  
pydantic                  2.8.2                    pypi_0    pypi
pydantic-core             2.20.1                   pypi_0    pypi
pygments                  2.15.1          py311h06a4308_1  
pyparsing                 3.1.2                    pypi_0    pypi
pyproject-hooks           1.1.0                    pypi_0    pypi
pysocks                   1.7.1           py311h06a4308_0  
python                    3.11.9               h955ad1f_0  
python-dateutil           2.9.0post0      py311h06a4308_2  
python-fastjsonschema     2.16.2          py311h06a4308_0  
python-json-logger        2.0.7           py311h06a4308_0  
python-slugify            8.0.4                    pypi_0    pypi
pytz                      2024.1          py311h06a4308_0  
pyyaml                    6.0.1           py311h5eee18b_0  
pyzmq                     25.1.2          py311h6a678d5_0  
readline                  8.2                  h5eee18b_0  
referencing               0.30.2          py311h06a4308_0  
requests                  2.31.0                   pypi_0    pypi
rfc3339-validator         0.1.4           py311h06a4308_0  
rfc3986-validator         0.1.1           py311h06a4308_0  
rpds-py                   0.10.6          py311hb02cf49_0  
scikit-learn              1.3.1                    pypi_0    pypi
scipy                     1.14.0                   pypi_0    pypi
send2trash                1.8.2           py311h06a4308_0  
setuptools                69.5.1          py311h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
sniffio                   1.3.0           py311h06a4308_0  
soupsieve                 2.5             py311h06a4308_0  
sqlite                    3.45.3               h5eee18b_0  
stack_data                0.2.0              pyhd3eb1b0_0  
substra                   0.53.0                   pypi_0    pypi
substrafl                 0.46.0                   pypi_0    pypi
substratools              0.21.4                   pypi_0    pypi
sympy                     1.13.1                   pypi_0    pypi
terminado                 0.17.1          py311h06a4308_0  
text-unidecode            1.3                      pypi_0    pypi
threadpoolctl             3.5.0                    pypi_0    pypi
tinycss2                  1.2.1           py311h06a4308_0  
tk                        8.6.14               h39e8969_0  
torch                     2.0.1                    pypi_0    pypi
torchvision               0.15.2                   pypi_0    pypi
tornado                   6.4.1           py311h5eee18b_0  
tqdm                      4.66.4                   pypi_0    pypi
traitlets                 5.14.3          py311h06a4308_0  
triton                    2.0.0                    pypi_0    pypi
typing-extensions         4.12.2                   pypi_0    pypi
typing_extensions         4.11.0          py311h06a4308_0  
tzdata                    2024a                h04d1e81_0  
urllib3                   2.2.2           py311h06a4308_0  
wcwidth                   0.2.5              pyhd3eb1b0_0  
webencodings              0.5.1           py311h06a4308_1  
websocket-client          1.8.0           py311h06a4308_0  
wheel                     0.43.0          py311h06a4308_0  
xz                        5.4.6                h5eee18b_1  
yaml                      0.2.5                h7b6447c_0  
zeromq                    4.3.5                h6a678d5_0  
zlib                      1.2.13               h5eee18b_1  

Logs / Stacktrace

Rounds progress: 100%|██████████| 3/3 [00:00<00:00, 1050.24it/s]
Compute plan progress:  10%|▉         | 2/21 [02:35<24:34, 77.61s/it]
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[14], line 9
      6 # A round is defined by a local training step followed by an aggregation operation
      7 NUM_ROUNDS = 3
----> 9 compute_plan = execute_experiment(
     10     client=clients[ALGO_ORG_ID],
     11     strategy=strategy,
     12     train_data_nodes=train_data_nodes,
     13     evaluation_strategy=my_eval_strategy,
     14     aggregation_node=aggregation_node,
     15     num_rounds=NUM_ROUNDS,
     16     experiment_folder=str(pathlib.Path.cwd() / "tmp" / "experiment_summaries"),
     17     dependencies=dependencies,
     18     clean_models=False,
     19     name="MNIST documentation example",
     20 )

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substrafl/experiment.py:498, in execute_experiment(client, strategy, train_data_nodes, experiment_folder, num_rounds, aggregation_node, evaluation_strategy, dependencies, clean_models, name, additional_metadata, task_submission_batch_size)
    485 # save the experiment summary in experiment_folder
    486 _save_experiment_summary(
    487     experiment_folder=experiment_folder,
    488     compute_plan_key=compute_plan_key,
   (...)
    496     additional_metadata=additional_metadata,
    497 )
--> 498 compute_plan = client.add_compute_plan(
    499     substra.sdk.schemas.ComputePlanSpec(
    500         key=compute_plan_key,
    501         tasks=tasks,
    502         name=name or timestamp,
    503         metadata=cp_metadata,
    504     ),
    505     auto_batching=True,
    506     batch_size=task_submission_batch_size,
    507 )
    508 logger.info(("The compute plan has been registered to Substra, its key is {0}.").format(compute_plan.key))
    509 return compute_plan

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/client.py:48, in logit.<locals>.wrapper(*args, **kwargs)
     46 error = None
     47 try:
---> 48     return f(*args, **kwargs)
     49 except Exception as e:
     50     error = e.__class__.__name__

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/client.py:548, in Client.add_compute_plan(self, data, auto_batching, batch_size)
    542 if not is_valid_uuid(spec.key):
    543     raise exceptions.ComputePlanKeyFormatError(
    544         "The compute plan key has to respect the UUID format. You can use the uuid library to generate it. \
    545     Example: compute_plan_key=str(uuid.uuid4())"
    546     )
--> 548 return self._backend.add(spec, spec_options=spec_options)

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/backend.py:487, in Local.add(self, spec, spec_options, key)
    485 else:
    486     if spec.__class__.type_ == schemas.Type.ComputePlan:
--> 487         compute_plan = add_asset(spec, spec_options)
    488         return compute_plan
    489     else:

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/backend.py:406, in Local._add_compute_plan(self, spec, spec_options)
    403 compute_plan = self._db.add(compute_plan)
    405 # go through the tasks sorted by rank
--> 406 compute_plan = self.__execute_compute_plan(spec, compute_plan, visited, tasks, spec_options)
    407 return compute_plan

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/backend.py:269, in Local.__execute_compute_plan(self, spec, compute_plan, visited, tasks, spec_options)
    266         if not task_spec:
    267             continue
--> 269         self.add(
    270             key=task_spec.key,
    271             spec=task_spec,
    272             spec_options=spec_options,
    273         )
    275         progress_bar.update()
    277 return compute_plan

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/backend.py:491, in Local.add(self, spec, spec_options, key)
    489 else:
    490     key = key or spec.compute_key()
--> 491     add_asset(key, spec, spec_options)
    492     return key

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/backend.py:437, in Local._add_task(self, key, spec, spec_options)
    420 task = models.Task(
    421     key=key,
    422     creation_date=self.__now(),
   (...)
    433     metadata=spec.metadata if spec.metadata else dict(),
    434 )
    436 task = self._db.add(task)
--> 437 self._worker.schedule_task(task)
    438 return task

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/compute/worker.py:313, in Worker.schedule_task(self, task)
    310 elif asset_type == schemas.Type.Dataset:
    311     dataset = self._db.get_with_files(schemas.Type.Dataset, task_input.asset_key)
    312     cmd_line_inputs.append(
--> 313         self._prepare_dataset_input(
    314             dataset=dataset,
    315             task_input=task_input,
    316             input_volume=volumes[VOLUME_INPUTS],
    317             multiple=multiple,
    318         )
    319     )
    320     addable_asset = dataset
    322 if addable_asset:

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/compute/worker.py:161, in Worker._prepare_dataset_input(self, dataset, task_input, input_volume, multiple)
    157 def _prepare_dataset_input(
    158     self, dataset: models.Dataset, task_input: models.InputRef, input_volume: str, multiple: bool
    159 ):
    160     path_to_opener = input_volume / Filenames.OPENER.value
--> 161     Path(dataset.opener.storage_address).link_to(path_to_opener)
    162     return TaskResource(
    163         id=task_input.identifier,
    164         value=f"{TPL_VOLUME_INPUTS}/{Filenames.OPENER.value}",
    165         multiple=multiple,
    166     )

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/pathlib.py:1226, in Path.link_to(self, target)
   1211 """
   1212 Make the target path a hard link pointing to this path.
   1213 
   (...)
   1220 Use `hardlink_to()` instead.
   1221 """
   1222 warnings.warn("pathlib.Path.link_to() is deprecated and is scheduled "
   1223               "for removal in Python 3.12. "
   1224               "Use pathlib.Path.hardlink_to() instead.",
   1225               DeprecationWarning, stacklevel=2)
-> 1226 self.__class__(target).hardlink_to(self)

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/pathlib.py:1208, in Path.hardlink_to(self, target)
   1206 if not hasattr(os, "link"):
   1207     raise NotImplementedError("os.link() not available on this system")
-> 1208 os.link(target, self)

OSError: [Errno 95] Operation not supported: '/mnt/batch/tasks/shared/LS_root/mounts/clusters/hpang8/code/Users/hpang/Projects/Federated_learning/substrafl/local-worker/yumnknd_/61c0f7fa-5228-4804-9d24-8beac24bfbc2/mnist_opener.py' -> '/mnt/batch/tasks/shared/LS_root/mounts/clusters/hpang8/code/Users/hpang/Projects/Federated_learning/substrafl/local-worker/d18aa0b7-4aaf-4a4d-9e87-ebead4d168f9/inputs/opener.py'
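
The last frame of the traceback shows `os.link` itself raising Errno 95, and both paths sit under a `/mnt/...` mount point, so the filesystem may simply not support hard links. The sketch below is one way to check this for a given directory. The helper name `supports_hardlinks` is hypothetical and not part of Substra; it only uses the standard library.

```python
import os
import tempfile
from pathlib import Path


def supports_hardlinks(directory: Path) -> bool:
    """Return True if a hard link can be created inside `directory`.

    Hypothetical helper for diagnosis only; it is not part of Substra.
    """
    # Work inside a throwaway subdirectory so nothing is left behind.
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        source = Path(tmp) / "probe_source"
        target = Path(tmp) / "probe_link"
        source.write_text("probe")
        try:
            # Same OS-level call that fails at the bottom of the traceback.
            os.link(source, target)
        except OSError:
            return False
        return True
```

Calling `supports_hardlinks(Path.cwd())` from the notebook's working directory would be expected to return `False` on a filesystem that rejects hard links, which would point to the storage rather than the Python version as the cause.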
@SdgJlbl
Contributor

SdgJlbl commented Aug 6, 2024

Thanks a lot for raising this issue.
We were aware that path handling had changed in Python 3.12, but I didn't know that it could affect earlier versions.
We will look into it.
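
For readers hitting the same error: the traceback ends in `os.link`, which `Path.link_to` and `Path.hardlink_to` both call, so the failure does not seem tied to the 3.12 removal of `link_to`. A common general-purpose mitigation is to fall back to copying when hard links are unavailable. The sketch below illustrates that idea only; `link_or_copy` is a hypothetical name, and this is not the fix adopted by the Substra maintainers.

```python
import errno
import os
import shutil
from pathlib import Path

# Error codes that typically mean "hard links are not possible here".
_LINK_UNSUPPORTED = (errno.EOPNOTSUPP, errno.EPERM, errno.EXDEV)


def link_or_copy(source: Path, destination: Path) -> str:
    """Hard-link `destination` to `source`, falling back to a copy.

    Returns "link" or "copy" so the caller can tell which path was taken.
    Hypothetical helper, not part of Substra.
    """
    try:
        os.link(source, destination)
        return "link"
    except OSError as exc:
        # Re-raise anything unrelated (missing source, existing destination...).
        if exc.errno not in _LINK_UNSUPPORTED:
            raise
        # copy2 also preserves file metadata where the filesystem allows it.
        shutil.copy2(source, destination)
        return "copy"
```

The trade-off is that a copy uses extra disk space and no longer tracks later changes to the source file, which is usually acceptable for a small file such as an opener script.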

@KindEmily

Hey @hwpang

I'm also currently facing an issue with this tutorial.

I would appreciate any help if you've managed to finish it.

Please contact me 👋

P.S. I'm also active on the Substra Slack channel; you're very welcome to come say hi and share your progress.
I'd be happy to get in touch with anyone who wants to discuss potential problems and solutions.

You can find the Slack channel invite on the Substra community page:
URL: https://docs.substra.org/en/stable/additional/community.html

Please help 🆘

If you would like to check on my issue, please take a look at the Run-experiment-console-error-help-request branch:
URL: https://github.com/KindEmily/Using-Torch-FedAvg-on-MNIST-dataset/tree/Run-experiment-console-error-help-request

@KindEmily

@SdgJlbl Kindly asking whether you've had a chance to look into this? 🥺

@KindEmily

@hwpang I was able to finish the tutorial by using a flat structure instead of modules (putting all the code in a single file, e.g. main.py).
