Test ensemble_evaluator with new scheduler #6803
Codecov Report: All modified and coverable lines are covered by tests ✅
@@ Coverage Diff @@
## main #6803 +/- ##
=======================================
Coverage 83.90% 83.90%
=======================================
Files 365 365
Lines 21410 21410
Branches 948 948
=======================================
Hits 17964 17964
Misses 3152 3152
Partials 294 294
end_event = from_json(mock_ws_task.result()[end_event_index])
assert end_event["type"] == "com.equinor.ert.realization.success"
assert end_event.data == {"queue_event_type": "SUCCESS"}
if FeatureToggling.is_enabled("scheduler"):
Look here for the change in events emitted by the new local driver. Also note that "waiting" is translated to "SUBMITTED", but not for the legacy driver/job_queue_node(?).
I think this is because the initial state is assumed to be `WAITING` in Scheduler, so the first new event is `SUBMITTED`, while JobQueue starts with `NOT_ACTIVE`, so the first event is `WAITING`.
I don't see the need for two different states that both say "we're not doing anything yet". Maybe we should consider removing `NOT_ACTIVE` from JobQueue to avoid this `if`? Alternatively, Scheduler might send `WAITING` explicitly.
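A minimal sketch of why the first event differs, assuming that only state *transitions* are published and that both backends walk through the same illustrative lifecycle. `QueueState`, `emitted_events` and `LIFECYCLE` are hypothetical names for illustration, not ert's actual enum or API.

```python
from enum import Enum


class QueueState(str, Enum):
    # Hypothetical subset of the states discussed in this thread.
    NOT_ACTIVE = "NOT_ACTIVE"
    WAITING = "WAITING"
    SUBMITTED = "SUBMITTED"
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"


def emitted_events(initial, lifecycle):
    """Return the states that would be sent as events.

    Only changes of state are published, so a state equal to the
    current one (e.g. the assumed initial state) is never sent.
    """
    events = []
    current = initial
    for state in lifecycle:
        if state != current:
            events.append(state)
            current = state
    return events


# Assumed lifecycle, for illustration only.
LIFECYCLE = [
    QueueState.WAITING,
    QueueState.SUBMITTED,
    QueueState.PENDING,
    QueueState.RUNNING,
    QueueState.SUCCESS,
]

# JobQueue starts in NOT_ACTIVE, so WAITING is a transition and gets sent.
legacy = emitted_events(QueueState.NOT_ACTIVE, LIFECYCLE)
# Scheduler assumes WAITING from the start, so the first event is SUBMITTED.
scheduler = emitted_events(QueueState.WAITING, LIFECYCLE)
```

Under these assumptions, the "Scheduler sends `WAITING` explicitly" option corresponds to starting the Scheduler from a state other than `WAITING`, which makes both sequences identical and removes the need for the `FeatureToggling` branch in the test.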
I am fine with differing behaviour for now, so we can reduce the number of states while transitioning to Scheduler. Let's reconsider if a case pops up where we need to distinguish between NOT_ACTIVE and WAITING in Scheduler.
@@ -104,7 +104,7 @@ async def _send(self, state: State) -> None:
         event = CloudEvent(
             {
                 "type": _queue_state_event_type(status),
-                "source": f"/etc/ensemble/{self._scheduler._ens_id}/real/{self.iens}",
+                "source": f"/ert/ensemble/{self._scheduler._ens_id}/real/{self.iens}",
No way did I really write `etc` and not notice??? 😅
Your finger muscle memory is forgiven
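One way to keep a typo like `/etc/` from slipping through is to build the CloudEvent `source` in a single place and assert on its shape in a test. The sketch below assumes the `/ert/ensemble/<ens_id>/real/<iens>` layout from the diff above; `realization_source` and `parse_realization_source` are hypothetical helpers, not functions that exist in ert.

```python
def realization_source(ens_id: str, iens: int) -> str:
    """Build the CloudEvent source for one realization (hypothetical helper)."""
    return f"/ert/ensemble/{ens_id}/real/{iens}"


def parse_realization_source(source: str) -> tuple[str, int]:
    """Split a source path back into (ens_id, iens).

    Raises ValueError for any path not matching the assumed
    /ert/ensemble/<ens_id>/real/<iens> layout, e.g. an /etc/ prefix.
    """
    parts = source.split("/")
    # Expected parts: ["", "ert", "ensemble", ens_id, "real", iens]
    if len(parts) != 6 or parts[:3] != ["", "ert", "ensemble"] or parts[4] != "real":
        raise ValueError(f"unexpected source: {source!r}")
    return parts[3], int(parts[5])
```

A round-trip assertion such as `parse_realization_source(realization_source("ens0", 3)) == ("ens0", 3)` would have failed loudly on the `/etc/` prefix.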
Force-pushed from 3b5aad0 to 87eae99
Force-pushed from 50318a2 to 326edeb
src/ert/scheduler/job.py (outdated)
@@ -73,7 +73,7 @@ async def _submit_and_run_once(self, sem: asyncio.BoundedSemaphore) -> None:
             self.real.iens, self.real.job_script, cwd=self.real.run_arg.runpath
         )

-        await self._send(State.STARTING)
+        await self._send(State.STARTING)  # aka PENDING
Is `PENDING` a better name? Should we change it back so that we don't need this comment?
Perhaps not better by itself, but more familiar to legacy ert devs.
Will change back to PENDING. I will claim it means exactly the same, and also makes as much sense as STARTING for the LocalDriver.
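The thread settles on a plain rename back to `PENDING`. If a transition period were ever needed where old call sites still say `STARTING`, Python's `Enum` treats a member with a duplicate value as an alias of the first member with that value. The `State` enum below is a hypothetical sketch, not ert's actual scheduler enum.

```python
from enum import Enum


class State(Enum):
    # Hypothetical sketch of a scheduler state enum.
    WAITING = "WAITING"
    SUBMITTED = "SUBMITTED"
    PENDING = "PENDING"
    # Same value as PENDING, so STARTING becomes an alias:
    # State.STARTING is State.PENDING, and iteration skips it.
    STARTING = "PENDING"
    RUNNING = "RUNNING"
```

Because the alias resolves to the canonical member, any event built from `State.STARTING.name` would already carry `"PENDING"`, so the `# aka PENDING` comment would no longer be needed.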
Force-pushed from 26a222f to cd3d4d2
This highlights a behavioural change in the new LocalDriver: it will not send the same events as the legacy local driver (see test_async_queue_execution.py::test_happy_path). The new scheduler will not catch bare exceptions for now, and thus the test for that situation is only applied to the legacy JobQueue.
Force-pushed from cd3d4d2 to f00c196
    tmpdir, make_ensemble_builder, monkeypatch
):
    """This test function is not ported to Scheduler, as it will not
    catch general exceptions."""
The logs have been searched, and the general exception that the legacy queue can catch has not occurred in the last 3 months.
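The behavioural difference described in the commit message can be sketched with two hypothetical asyncio wrappers: the legacy style catches any `Exception` and reports the realization as failed, while the Scheduler style lets unexpected exceptions propagate to the caller. `legacy_run`, `scheduler_run` and `broken_job` are illustrative names only, not ert code.

```python
import asyncio


async def legacy_run(job):
    """Legacy-style wrapper: any exception is caught and reported as a failure."""
    try:
        await job()
    except Exception as err:  # broad catch, as attributed to the legacy JobQueue
        return f"FAILED: {err}"
    return "SUCCESS"


async def scheduler_run(job):
    """Scheduler-style wrapper: unexpected exceptions propagate to the caller."""
    await job()
    return "SUCCESS"


async def broken_job():
    # Stands in for a job that hits an unexpected, general exception.
    raise RuntimeError("boom")
```

With the Scheduler style, a test for the "general exception" path has nothing to assert on the queue side, which is why that test is kept only for the legacy JobQueue.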
💯
Adds testing and fixes some problems.
Currently it allows for a change in events emitted by the local driver from legacy to new. This is up for discussion.
Should be merged after #6787 as code is stolen therefrom
Approach
Test unit_tests/ensemble_evaluator with scheduler using new fixture.
Pre review checklist
Ground Rules),
and changes to existing code have good test coverage.
Pre merge checklist