Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify execute API of JobQueue, refactor websocket connection #6558

Merged
merged 1 commit into from
Nov 15, 2023

Conversation

berland
Copy link
Contributor

@berland berland commented Nov 13, 2023

  • Put websocket connection handling into only one (async) function, that will run until it is finished.
  • Reduce the execute functions to only one, letting both legacy ensemble and simulation context use it.

The simulation_context now has to call it in an async loop, and it is also doing slightly more work (logging changes) than before.

Pre review checklist

  • Read through the code changes carefully after finishing work
  • Make sure tests pass locally (after every commit!)
  • Prepare changes in small commits for more convenient review (optional)
  • PR title captures the intent of the changes, and is fitting for release notes.
  • Updated documentation
  • Ensured that unit tests are added for all new behavior (See
    Ground Rules),
    and changes to existing code have good test coverage.

Pre merge checklist

  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.

@berland berland force-pushed the jobqueue_unify_execute_api branch 4 times, most recently from d805142 to 41868ba Compare November 14, 2023 11:00
@codecov-commenter
Copy link

codecov-commenter commented Nov 14, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (cd4bc85) 83.51% compared to head (ceb0b47) 83.51%.

Files Patch % Lines
src/ert/job_queue/queue.py 93.42% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6558      +/-   ##
==========================================
- Coverage   83.51%   83.51%   -0.01%     
==========================================
  Files         346      346              
  Lines       20764    20778      +14     
  Branches      948      948              
==========================================
+ Hits        17341    17352      +11     
- Misses       3129     3132       +3     
  Partials      294      294              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@berland berland force-pushed the jobqueue_unify_execute_api branch from 0c76269 to ceb0b47 Compare November 14, 2023 12:09
@berland berland marked this pull request as ready for review November 14, 2023 12:18
@berland berland changed the title Jobqueue unify execute api Unify execute API of JobQueue, refactor websocket connection Nov 14, 2023
@berland berland self-assigned this Nov 14, 2023
@berland berland added the release-notes:maintenance Automatically categorise as maintenance change in release notes label Nov 14, 2023
@berland
Copy link
Contributor Author

berland commented Nov 14, 2023

Tested manually with GUI for poly_example, bigpoly and also with everest's egg model.

@berland
Copy link
Contributor Author

berland commented Nov 14, 2023

Up for discussion: Should this be squashed?
(there are some minor details to be fixed if not squashed, some print-statements and maybe one mypy issue)

@jonathan-eq jonathan-eq self-requested a review November 14, 2023 14:52
Copy link
Contributor

@jonathan-eq jonathan-eq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Job!

) -> None:
assert self.ens_id is not None # mypy
events = deque(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use deque here instead of a normal queue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably a local and minor optimization thing to have a double-ended queue.

if self._ee_uri is None:
# If no ensemble evaluator present, we will publish to the log
while (
change := await self._changes_to_publish.get()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know about this (walrus operator :=). Thanks🎉

experiment_id: Optional[str] = None,
) -> None:
for q_index, q_node in enumerate(self.job_list):
cert_path = f"{q_node.run_path}/{CERT_FILE}"
if cert is not None:
if self._ee_cert is not None:
Copy link
Contributor

@xjules xjules Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it not clearer when conninfo was passed directly into the function as a parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was more explicit, but that explicitness would f.ex make it possible to mixup the ee connection for the execute function with what is given in this jobs file. I think it is cleaner to set the connection info just one place in the Queue object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing is that Queue does not need to know about connection setup between ee and job_runner; hence my comment on passing it rather as a function params 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, I am removing that "feature" to have different connection info.

@xjules
Copy link
Contributor

xjules commented Nov 15, 2023

Up for discussion: Should this be squashed? (there are some minor details to be fixed if not squashed, some print-statements and maybe one mypy issue)

I would squash the fixup commits at least :)

Copy link
Contributor

@xjules xjules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good! If there are troubles with mypy I'd squash it as it sort of addresses the same thing.

The websocket connection from the JobQueue is refactored
to use a single asyncio task for fetching changes to be published,
publish them, and also keep the connection open.

The queue execute function had two variants, one that
included websocket communication (for "legacy" ert, and one
for Everest, the "simulation context"). Now there is only one,
which is async.

The changes that are published to the websocket channel is in the
simulation context (Everest) logged to the debug log. Previously that
information was not propagated.

The connection to the ensemble evaluator is now configured once,
and from now on it is not possible to have the job_runner use a
different connection than the JobQueue uses for publishing its
realization state changes.
@berland berland force-pushed the jobqueue_unify_execute_api branch from c327ea1 to 1ddc0c8 Compare November 15, 2023 15:04
@berland berland enabled auto-merge (rebase) November 15, 2023 15:05
@berland berland merged commit 0569a3d into equinor:main Nov 15, 2023
41 checks passed
@berland berland deleted the jobqueue_unify_execute_api branch November 16, 2023 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes:maintenance Automatically categorise as maintenance change in release notes
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants