Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test ensemble_evaluator with new scheduler #6803

Merged
merged 1 commit into from
Dec 22, 2023

Conversation

berland
Copy link
Contributor

@berland berland commented Dec 14, 2023

Adds testing and fixes some problems.

Currently it allows for a change in events emitted by the local driver from legacy to new. This is up for discussion.

Should be merged after #6787 as code is stolen therefrom

Approach
Test unit_tests/ensemble_evaluator with scheduler using new fixture.

Pre review checklist

  • Read through the code changes carefully after finishing work
  • Make sure tests pass locally (after every commit!)
  • Prepare changes in small commits for more convenient review (optional)
  • PR title captures the intent of the changes, and is fitting for release notes.
  • Updated documentation
  • Ensured that unit tests are added for all new behavior (See
    Ground Rules),
    and changes to existing code have good test coverage.

Pre merge checklist

  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.

@berland berland self-assigned this Dec 14, 2023
@berland berland added scheduler release-notes:skip If there should be no mention of this in release notes labels Dec 14, 2023
@codecov-commenter
Copy link

codecov-commenter commented Dec 14, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (0140ccc) 83.90% compared to head (f00c196) 83.90%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6803   +/-   ##
=======================================
  Coverage   83.90%   83.90%           
=======================================
  Files         365      365           
  Lines       21410    21410           
  Branches      948      948           
=======================================
  Hits        17964    17964           
  Misses       3152     3152           
  Partials      294      294           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@berland berland marked this pull request as ready for review December 14, 2023 09:08
end_event = from_json(mock_ws_task.result()[end_event_index])
assert end_event["type"] == "com.equinor.ert.realization.success"
assert end_event.data == {"queue_event_type": "SUCCESS"}
if FeatureToggling.is_enabled("scheduler"):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look here for the change of events emitted by the new local driver. Also note that "waiting" is translated to "SUBMITTED", but not for the legacy driver/job_queue_node(?).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is because the initial state is assumed to be WAITING in Scheduler, so the first new event is SUBMITTED, while JobQueue starts with NOT_ACTIVE, so the first event is WAITING.

I don't see the need for two different states that both say "we're not doing anything yet". Maybe we should consider removing NOT_ACTIVE from JobQueue to avoid this if? Alternatively, Scheduler might send WAITING explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with differing behaviour for now, so we can reduce the number of states while transitioning to Scheduler. Lets reconsider if a case pops up where we need to distuingish between NOT_ACTIVE and WAITING in Scheduler.

@@ -104,7 +104,7 @@ async def _send(self, state: State) -> None:
event = CloudEvent(
{
"type": _queue_state_event_type(status),
"source": f"/etc/ensemble/{self._scheduler._ens_id}/real/{self.iens}",
"source": f"/ert/ensemble/{self._scheduler._ens_id}/real/{self.iens}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No way did I really write etc and not notice??? 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your finger muscle memory is forgiven

@berland berland force-pushed the test_ens_ev_scheduler branch from 3b5aad0 to 87eae99 Compare December 14, 2023 13:43
@berland berland force-pushed the test_ens_ev_scheduler branch 3 times, most recently from 50318a2 to 326edeb Compare December 20, 2023 12:21
@berland berland enabled auto-merge (rebase) December 20, 2023 13:25
@@ -73,7 +73,7 @@ async def _submit_and_run_once(self, sem: asyncio.BoundedSemaphore) -> None:
self.real.iens, self.real.job_script, cwd=self.real.run_arg.runpath
)

await self._send(State.STARTING)
await self._send(State.STARTING) # aka PENDING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is PENDING a better name? Should we change it back so that we don't need this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps not better by itself, but more familiar for legacy ert devs..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change back to PENDING. I will claim it means exactly the same, and also makes as much sense as STARTING for the LocalDriver.

end_event = from_json(mock_ws_task.result()[end_event_index])
assert end_event["type"] == "com.equinor.ert.realization.success"
assert end_event.data == {"queue_event_type": "SUCCESS"}
if FeatureToggling.is_enabled("scheduler"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is because the initial state is assumed to be WAITING in Scheduler, so the first new event is SUBMITTED, while JobQueue starts with NOT_ACTIVE, so the first event is WAITING.

I don't see the need for two different states that both say "we're not doing anything yet". Maybe we should consider removing NOT_ACTIVE from JobQueue to avoid this if? Alternatively, Scheduler might send WAITING explicitly.

@berland berland force-pushed the test_ens_ev_scheduler branch 2 times, most recently from 26a222f to cd3d4d2 Compare December 22, 2023 07:35
This highlights a behavioural change in the new LocalDriver, it will
not send the same events as the legacy local driver, see
test_async_queue_execution.py::test_happy_path

The new scheduler will not catch bare exceptions for now, and
thus the test for that situation is only applied for the legacy
JobQueue.
@berland berland force-pushed the test_ens_ev_scheduler branch from cd3d4d2 to f00c196 Compare December 22, 2023 13:23
tmpdir, make_ensemble_builder, monkeypatch
):
"""This test function is not ported to Scheduler, as it will not
catch general exceptions."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logs have been searched through, and the general exception that the legacy queue can catch has not happened the last 3 months.

Copy link
Contributor

@pinkwah pinkwah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@berland berland merged commit e9ff239 into equinor:main Dec 22, 2023
42 of 43 checks passed
@berland berland deleted the test_ens_ev_scheduler branch June 6, 2024 11:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes:skip If there should be no mention of this in release notes
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants