Local execution of the .py version of the ttbar analysis is broken #143

Open
eguiraud opened this issue May 18, 2023 · 5 comments
Labels
bug (Something isn't working), implementation (concerns analysis implementation)

Comments

@eguiraud
Contributor

I switched to local execution by setting AF: local in config.yaml. With that change, python ttbar_analysis_pipeline.py fails to run. The problem seems to be linked to running the backend "at global scope": this patch, which puts all of the data processing under if __name__ == "__main__", fixes the problem.

The actual error:

    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

with this traceback:

Traceback (most recent call last):
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/site-packages/distributed/nanny.py", line 442, in instantiate
    result = await self.process.start()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/site-packages/distributed/nanny.py", line 711, in start
    await self.process.start()
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/site-packages/distributed/process.py", line 55, in _call_and_set_future
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/site-packages/distributed/process.py", line 215, in _start
    process.start()
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/spawn.py", line 158, in get_preparation_data
    _check_not_importing_main()
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/spawn.py", line 138, in _check_not_importing_main

@eguiraud
Contributor Author

(the patch is probably too intrusive and it does not really make sense for the notebook version of the script, so I'm not sure how to proceed -- there might be a better fix)

@alexander-held
Member

Thanks for raising this, we definitely need to find a better version that works with both notebook and script.

Does that patch work in the notebook as-is? I didn't think that the if __name__ == "__main__": block could continue across cells. I was hoping that wrapping the functionality in run_processor() would already be enough by itself. Does that fix the problem without the __main__ guard? If so, that would be easiest. If not, then I'm honestly not sure what best to do here.

@eguiraud
Contributor Author

eguiraud commented May 18, 2023

Unfortunately just wrapping the processing in run_processor() did not seem to help (but as I don't actually understand the underlying issue it could be that I missed something simple).
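
A possible explanation, sketched under the assumption that the workers are started with the spawn method (the popen_spawn_posix frames in the traceback point that way): a spawned child re-runs the main script with `__name__` set to `"__mp_main__"`, so a module-level call to a wrapper function executes again in the child, whereas a guarded block is skipped. The script below imitates that re-import with `runpy.run_path`. The three source strings and the `started` list are illustrative only and are not taken from the analysis code.

```python
import runpy
import tempfile
import textwrap
from pathlib import Path

# Three hypothetical versions of a script. Appending to `started` stands in
# for "start the cluster / worker processes".
UNGUARDED = textwrap.dedent('''
    started = []
    started.append("cluster")
''')

WRAPPED_ONLY = textwrap.dedent('''
    started = []

    def run_processor():
        started.append("cluster")

    run_processor()  # still called at module level
''')

GUARDED = textwrap.dedent('''
    started = []

    def run_processor():
        started.append("cluster")

    if __name__ == "__main__":
        run_processor()
''')


def side_effects_on_reimport(source):
    """Execute `source` under the module name a spawned child uses for it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "script.py"
        path.write_text(source)
        namespace = runpy.run_path(str(path), run_name="__mp_main__")
    return namespace["started"]


print(side_effects_on_reimport(UNGUARDED))     # ['cluster']
print(side_effects_on_reimport(WRAPPED_ONLY))  # ['cluster']
print(side_effects_on_reimport(GUARDED))       # []
```

If this picture is right, wrapping alone changes nothing because the call itself still sits at global scope. Only the guard keeps the child from trying to start processes while it is still bootstrapping.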

I updated the branch with the patch to a simpler version that just uses if __name__ == "__main__" to avoid the confusion.

EDIT:

here's the simpler patch

@alexander-held
Member

Thinking about this some more: I imagine it is fine to only guard the cell (or even just the commands) where the coffea execution happens and then go back, like this:

...  # all the other code

if __name__ == "__main__":
    run.preprocess(fileset, ...)
    run(fileset, ...)

...  # all the rest

It is perhaps a bit unusual but this might be a minimally invasive solution, assuming it works as I imagine it does.
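
A rough, self-contained sketch of that layout, using only the standard library instead of coffea: `fileset`, `count_files`, `worker` and `exit_code` are hypothetical stand-ins, and the spawn start method is assumed. One caveat it tries to make visible: the code after the guard also runs when a spawned child re-imports the script, so it should not rely on names that are only defined inside the guarded block.

```python
import multiprocessing as mp

# ... all the other code: configuration and function definitions.
# A spawned child re-imports this file, so everything here runs again there.
fileset = {"ttbar": ["a.root", "b.root"], "wjets": ["c.root"]}  # hypothetical


def count_files(fileset):
    """Stand-in for the real processing step."""
    return {name: len(files) for name, files in fileset.items()}


def worker(fileset):
    """Runs in the child process."""
    print("child:", count_files(fileset))


exit_code = None  # defined up front so later code works when the guard is skipped

if __name__ == "__main__":
    # Only the step that starts new processes is guarded.
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=worker, args=(fileset,))
    proc.start()
    proc.join()
    exit_code = proc.exitcode

# ... all the rest. This also runs during the child's re-import, where the
# guarded block was skipped, so it has to cope with exit_code being None.
if exit_code is not None:
    print("parent: child exited with code", exit_code)
```

In a notebook kernel `__name__` is also `"__main__"`, so the condition is true there and the guarded block would run like any other code in the cell. Whether the rest of the pipeline behaves the same way in both settings would still need to be checked.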

@alexander-held added the bug and implementation labels on Aug 30, 2023