bench:ecal_gaps:allow_failure: true #115

Open
wants to merge 1 commit into
base: master

wdconinc
Contributor

Briefly, what does this PR introduce?

bench:ecal_gaps is failing more often than is acceptable, for reasons unrelated to the job itself. This PR sets allow_failure: true on that job, so a failure there no longer fails the pipeline.

@wdconinc wdconinc requested a review from veprbl December 18, 2024 04:31
@veprbl veprbl enabled auto-merge (squash) December 18, 2024 05:41
@veprbl veprbl disabled auto-merge December 18, 2024 05:42
@veprbl
Member

veprbl commented Dec 18, 2024

Is this meant for the specific upstream issue from today (addressed in #114)? Or is it just a vote of no confidence in the reliability of this specific benchmark?

@wdconinc
Contributor Author

It seems to be this specific benchmark that most often has issues downstream of container builds:

Or just the whole list: https://eicweb.phy.anl.gov/EIC/benchmarks/detector_benchmarks/-/pipelines?page=1&scope=all&status=failed

Usually it's

distributed.comm.core.CommClosedError: in <TCP (closed) Worker->Scheduler local=tcp://[::1]:60324 remote=tcp://localhost:6805>: Stream is closed

@veprbl
Member

veprbl commented Dec 19, 2024

> Usually it's
>
> distributed.comm.core.CommClosedError: in <TCP (closed) Worker->Scheduler local=tcp://[::1]:60324 remote=tcp://localhost:6805>: Stream is closed

That's actually not the issue. The actual messages from the client are lost in the noise from the scheduler process. I need to fix that.

The recent ones are the result of regressions in dask_awkward/dask_histogram. We probably need to pin versions in requirements.txt.
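To make the pinning idea concrete, here is a minimal sketch of a check that lists entries in a requirements file that lack an exact `==` pin. The function name `unpinned_requirements` and the simplified line parsing are assumptions for illustration, not part of this repository, and the version in the sample is a placeholder rather than a recommended release.

```python
import re

# Leading package name of a requirement line (simplified, hypothetical parser).
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def unpinned_requirements(text):
    """Return the names of requirements that lack an exact '==' pin."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        # Skip blanks, pip options such as "-r other.txt", and URL requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME.match(line)
        if match is not None and "==" not in line:
            names.append(match.group(0))
    return names


sample = """
dask_awkward
dask_histogram==0.0.0  # placeholder version, not a recommendation
"""
print(unpinned_requirements(sample))  # ['dask_awkward']
```

A check like this could run in CI before the benchmarks, so that an unpinned dependency shows up as an explicit message instead of a failure deep inside a job.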

There are some spurious ones like

Unable to obtain modification time of file sim_output/backwards_ecal/epic/pi-/100MeV/130to177deg/pi-_100MeV_130to177deg.0000.eicrecon.tree.edm4eic.root although it existed before. It could be that a concurrent process has deleted it while Snakemake was running.

That's inherent to caching being broken by scratch cleanup. We need to either patch Snakemake to touch its files or, maybe, switch the cleanup to use atime.
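Both options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `stale_files` and `touch` are hypothetical helpers, not the actual scratch cleanup script or a Snakemake API. An atime-based policy also only works if the scratch filesystem records access times, since mount options such as `noatime` or `relatime` limit atime updates.

```python
import os
import time


def stale_files(root, max_age_seconds, now=None):
    """Yield paths under root whose last access time (atime) is older than max_age_seconds."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime
            except FileNotFoundError:
                continue  # removed concurrently, nothing left to clean up
            if now - atime > max_age_seconds:
                yield path


def touch(path):
    """Set atime and mtime to the current time, marking the file as still in use."""
    os.utime(path, None)
```

With the first helper, the cleanup would delete only files that nothing has read recently, so cached outputs that Snakemake still consumes would survive. The second helper corresponds to the alternative of refreshing the timestamps of cached files whenever a workflow reuses them.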
