Local execution of the `.py` version of the ttbar analysis is broken #143

eguiraud · 2023-05-18T16:25:14Z

I switched to local execution by setting `AF: local` in `config.yaml`. With that change, `python ttbar_analysis_pipeline.py` fails to run. The problem seems to be linked to starting the backend at global scope: this patch, which puts all of the data processing under `if __name__ == "__main__"`, fixes the problem.

The actual error:

    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

with this traceback:

Traceback (most recent call last):
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/site-packages/distributed/nanny.py", line 442, in instantiate
    result = await self.process.start()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/site-packages/distributed/nanny.py", line 711, in start
    await self.process.start()
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/site-packages/distributed/process.py", line 55, in _call_and_set_future
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/site-packages/distributed/process.py", line 215, in _start
    process.start()
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/spawn.py", line 158, in get_preparation_data
    _check_not_importing_main()
  File "/home/blue/Tools/miniconda3/envs/agc-py311/lib/python3.11/multiprocessing/spawn.py", line 138, in _check_not_importing_main


eguiraud · 2023-05-18T16:26:03Z

(The patch is probably too intrusive, and it does not really make sense for the notebook version of the script, so I'm not sure how to proceed; there might be a better fix.)

alexander-held · 2023-05-18T17:03:23Z

Thanks for raising this, we definitely need to find a better version that works with both notebook and script.

Does that patch work in the notebook as-is? I didn't think that the `if __name__ == "__main__":` block could continue across cells. I was hoping that wrapping the functionality in `run_processor()` would already be enough by itself: does that fix the problem without the `__main__` guard? If so, that would be the easiest option. If not, then I'm honestly not sure what the best approach is here.

eguiraud · 2023-05-18T17:19:30Z

Unfortunately, just wrapping the processing in `run_processor()` did not seem to help (but since I don't actually understand the underlying issue, I may have missed something simple).
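The reason wrapping alone may not help can be sketched without Dask. With the `spawn` start method that appears in the traceback (`popen_spawn_posix.py`), a child process re-runs the parent's main script through `runpy.run_path(..., run_name="__mp_main__")` before doing any work. Every module-level statement therefore runs again in the child, including a bare `run_processor()` call. The sketch below is a minimal illustration, not the actual pipeline code. It imitates that re-run with `runpy`, and `run_processor` here is a hypothetical stand-in that only records that it was called instead of starting workers.

```python
import os
import runpy
import tempfile

# Processing wrapped in a function, but the call itself is still at module scope.
UNGUARDED = """
calls = []

def run_processor():
    # Hypothetical stand-in: the real function would start Dask workers here.
    calls.append("run_processor")

run_processor()
"""

# Same function, but it is only called when the file runs as the main script.
GUARDED = """
calls = []

def run_processor():
    calls.append("run_processor")

if __name__ == "__main__":
    run_processor()
"""


def execute_as(source: str, run_name: str) -> list:
    """Run `source` from a temporary file under `run_name`, the way a spawned
    child re-runs the parent's script, and return the recorded calls."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pipeline_sketch.py")
        with open(path, "w") as f:
            f.write(source)
        module_globals = runpy.run_path(path, run_name=run_name)
    return module_globals["calls"]


if __name__ == "__main__":
    for label, source in [("unguarded", UNGUARDED), ("guarded", GUARDED)]:
        print(label,
              "| parent:", execute_as(source, "__main__"),
              "| spawned child:", execute_as(source, "__mp_main__"))
```

In the unguarded variant, the "child" calls `run_processor()` again. In the real script, that repeated call would try to start new worker processes during bootstrapping, which matches the `RuntimeError` above. With the guard, the child only defines the function.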

I updated the branch with a simpler version of the patch that just uses `if __name__ == "__main__"`, to avoid confusion.

EDIT: Here's the simpler patch.

eguiraud · 2023-05-20T22:34:54Z

More info at https://docs.dask.org/en/stable/scheduling.html#standalone-python-scripts

alexander-held · 2023-05-23T12:20:04Z

Thinking about this some more: I imagine it is fine to guard only the cell (or even just the commands) where the coffea execution happens, and then continue with unguarded code afterwards, like this:

    ...  # all the other code

    if __name__ == "__main__":
        run.preprocess(fileset, ...)
        run(fileset, ...)

    ...  # all the rest

It is perhaps a bit unusual, but this might be a minimally invasive solution, assuming it works the way I imagine it does.
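One way to reason about this layout is to trace which parts of the script run in the parent and which run again in a spawned child. The sketch below is a stand-in built on assumptions: the three stages only append labels to a list, whereas in the real script the guarded stage would be `run.preprocess(fileset, ...)` and `run(fileset, ...)`. It executes the same source twice. The first run uses `__name__` set to `"__main__"`, which is what the parent script sees. It is also what every cell sees in a Jupyter kernel, which is why a guarded cell still runs in the notebook. The second run uses `"__mp_main__"`, the name a spawned child uses when it re-runs the script.

```python
# Stand-in for the script layout proposed above; each stage only records
# that it ran (hypothetical labels, not the real analysis steps).
SCRIPT = """
stages = []
stages.append("setup")      # ... all the other code

if __name__ == "__main__":
    stages.append("run")    # run.preprocess(fileset, ...); run(fileset, ...)

stages.append("rest")       # ... all the rest
"""


def stages_executed(module_name: str) -> list:
    """Execute SCRIPT with the given __name__ and return the stages that ran."""
    namespace = {"__name__": module_name}
    exec(compile(SCRIPT, "pipeline_sketch.py", "exec"), namespace)
    return namespace["stages"]


print("parent / notebook:", stages_executed("__main__"))
print("spawned child:    ", stages_executed("__mp_main__"))
```

In this sketch, the code after the guard still runs in each child. Under this layout, that trailing code would need to avoid depending on results produced inside the guarded block. Otherwise, the child's re-run of the script would fail (for example with a `NameError`).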

alexander-held added the `bug` (Something isn't working) and `implementation` (concerns analysis implementation) labels on Aug 30, 2023

alexander-held mentioned this issue on Sep 7, 2023 in #199, "Core dependencies for running AGC implementation" (closed)
