-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve server-client communication error handling #6578
Conversation
5dd62b7
to
ee3102d
Compare
ee3102d
to
ff80511
Compare
client = WorkflowRuntimeClient(one.workflow) | ||
with monkeypatch.context() as mp: | ||
mp.setattr(client, 'socket', Mock(recv=mock_recv)) | ||
mp.setattr(client, 'poller', Mock()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future/historical reference, I originally tried mocking the poll
method of client.poller
, i.e.
mp.setattr(client.poller, 'poll', Mock())
but this resulted in pytest hanging near the end of running all integration tests; some strange interaction due to the ZMQ poller using threading or something, dunno.
Anyway, the reason for this monkeypatching is to get client.poller.poll()
to return a truthy value so the socket can receive the mock response
cylc-flow/cylc/flow/network/client.py
Lines 298 to 300 in dd468d6
# receive response | |
if self.poller.poll(timeout): | |
res = await self.socket.recv() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made two suggestions, but neither should block this PR. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand what bug this fixes.
The changes I can see:
- messages/traceback field removed
- cylc_version field added
- Errors now force exceptions in multi mode (was this the bug?).
- Refactoring.
- Nicer error for the trigger incompatibility issue.
Before# Workflow running at Cylc 8.3.6, CLI at Cylc 8.4.0
$ cylc trigger wflow//1/foo
Error processing command:
AttributeError: 'list' object has no attribute 'values' After# Workflow running at Cylc 8.3.6, CLI at this Cylc <this PR>
$ cylc trigger wflow//1/foo
Error processing command:
Exception: This command is no longer compatible with the version of Cylc running the workflow. Please stop & restart the workflow with Cylc 8.4.1.dev or higher.
[{'error': {'message': 'Unknown argument "onResume" on field "trigger" of type "Mutations".', 'traceback': ['graphql.error.base.GraphQLError: Unknown argument "onResume" on field "trigger" of type "Mutations".\n']}}] (Added to OP) |
0fac2c7
to
f751949
Compare
b81932c
to
9505212
Compare
9505212
to
43cab17
Compare
I think this PR is stripping away exception context? I tried creating an internal error with this diff: diff --git a/cylc/flow/network/resolvers.py b/cylc/flow/network/resolvers.py
index 79b91f97e..702b3f573 100644
--- a/cylc/flow/network/resolvers.py
+++ b/cylc/flow/network/resolvers.py
@@ -787,6 +787,7 @@ class Resolvers(BaseResolvers):
cutoff: Any = None
):
"""Put or clear broadcasts."""
+ raise ValueError('Unsupported broadcast mode')
if settings is not None:
# Convert schema field names to workflow config setting names if
# applicable: Results:
|
aa12785
to
1415531
Compare
Should be sorted in 1415531 |
} | ||
but this is not 100% consistent unfortunately | ||
""" | ||
error: Union[Exception, str, dict] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is error
an Exception
or a str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I tried to change it to Exception | str
to begin with, to simplify things, but realised it breaks inter-version compatibility. So presently I don't think it ever is an Exception
or str
. However, with these changes we should be able to change it to be Exception | str
in future without breaking inter-version compatibility with this version onwards.
Using a diff along the lines of this: diff --git a/cylc/flow/network/resolvers.py b/cylc/flow/network/resolvers.py
index 79b91f97e..702b3f573 100644
--- a/cylc/flow/network/resolvers.py
+++ b/cylc/flow/network/resolvers.py
@@ -787,6 +787,7 @@ class Resolvers(BaseResolvers):
cutoff: Any = None
):
"""Put or clear broadcasts."""
+ raise ValueError('Unsupported broadcast mode')
if settings is not None:
# Convert schema field names to workflow config setting names if
# applicable: We will now get an error something like this:
This patch changes the string diff --git a/cylc/flow/exceptions.py b/cylc/flow/exceptions.py
index 802cfaaa9..a3eff7388 100644
--- a/cylc/flow/exceptions.py
+++ b/cylc/flow/exceptions.py
@@ -285,6 +285,10 @@ class ClientError(CylcError):
return ret
+class RequestError(ClientError):
+ """Represents an error returned by the server."""
+
+
class WorkflowStopped(ClientError):
"""The Cylc scheduler you attempted to connect to is stopped.""" diff --git a/cylc/flow/network/client.py b/cylc/flow/network/client.py
index 656aefd5c..89dac4cbb 100644
--- a/cylc/flow/network/client.py
+++ b/cylc/flow/network/client.py
@@ -40,10 +40,10 @@ from cylc.flow import (
__version__ as CYLC_VERSION,
)
from cylc.flow.exceptions import (
- ClientError,
ClientTimeout,
ContactFileExists,
CylcError,
+ RequestError,
WorkflowStopped,
)
from cylc.flow.hostuserutil import get_fqdn_by_host
@@ -332,17 +332,23 @@ class WorkflowRuntimeClient( # type: ignore[misc]
return response['data']
except KeyError:
error = response.get('error')
- if not error:
- error = (
+ error_mesage: str
+ if isinstance(error, (str, Exception)):
+ error_mesage = str(error)
+ elif error is None:
+ error_mesage = (
f"Received invalid response for Cylc {CYLC_VERSION}: "
f"{response}"
)
wflow_cylc_ver = response.get('cylc_version')
if wflow_cylc_ver and wflow_cylc_ver != CYLC_VERSION:
- error += (
+ error_mesage += (
f"\n(Workflow is running in Cylc {wflow_cylc_ver})"
)
- raise ClientError(str(error)) from None
+ else:
+ error_mesage = error.get('message', 'ERROR')
+
+ raise RequestError(error_mesage) from None
def get_header(self) -> dict:
"""Return "header" data to attach to each request for traceability. |
In #6499 we added a new field to the
trigger
GraphQL mutation, and this unfortunately broke the ability forcylc trigger
at 8.4 to work on workflows running in 8.3 (I shall term this "inter-version server-client comms").This PR does not fix that (we don't think a fix is feasible at this stage).
An additional problem is that the error handling for server-client comms is broken - that is what this PR fixes. And it attempts to ensure we have defined format for server-client comms which should reduce the chance of breaking inter-version server-client comms in future.
Before
After
Check List
CONTRIBUTING.md
and added my name as a Code Contributor.?.?.x
branch.