-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add API endpoint to report on datapusher errors #28
base: master
Are you sure you want to change the base?
Conversation
# IBlueprint | ||
|
||
def get_blueprint(self): | ||
return views.api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just registers a "plugin" that registers the Flask blueprint so that CKAN's routing knows about it.
model.TaskStatus.state == 'error', | ||
model.TaskStatus.task_type == 'datapusher', | ||
model.TaskStatus.last_updated > a_day_ago | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returns any datapusher tasks that have errored in the last day
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we don't have to worry about timezone issues here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so... server runs in UTC I believe, and all the last_updated values appear to be set in that timezone also.
if query.count() > 0: | ||
data = [model_dictize.task_status_dictize(ts, {'task_status': ts}) for ts in query.all()] | ||
response_msg = json.dumps(data) | ||
return make_response((response_msg, 500, headers)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example response:
[{"id": "12d7b0bd-2992-40ac-9037-51ac0d35b8ac", "entity_id": "e91d0448-86ef-4462-a862-77b46a3ca6da", "entity_type": "resource", "task_type": "datapusher", "key": "datapusher", "value": "{}", "state": "error", "error": "{\"message\": \"Could not connect to DataPusher.\", \"details\": \"HTTPConnectionPool(host='datapusher', port=8800): Max retries exceeded with url: /job (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8bc3946890>: Failed to establish a new connection: [Errno -2] Name or service not known'))\"}", "last_updated": "2024-07-05T04:58:32.846214"}]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking if 200 is a better option even when returning datapusher errors 🤔 Would it be confusing if the there is an actual 500 error thrown somewhere in here? I can also see how 500 can be useful as the caller won't have to check the payload before raising alarm or something. 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess a monitoring platform should be able to check for the existence of error
in the response payload, and potentially then we'd be able to have different alerts based on 500, 200 with error, 200 ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ✅. Just a couple of questions inline.
@@ -88,7 +94,7 @@ def edit_uri(uri_id): | |||
if not current_uri: | |||
base.abort(404, 'URI not found') | |||
if current_uri.superseded_by != None: | |||
base.aport(422, 'URI is archived, editing not supported') | |||
base.abort(422, 'URI is archived, editing not supported') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model.TaskStatus.state == 'error', | ||
model.TaskStatus.task_type == 'datapusher', | ||
model.TaskStatus.last_updated > a_day_ago | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we don't have to worry about timezone issues here
if query.count() > 0: | ||
data = [model_dictize.task_status_dictize(ts, {'task_status': ts}) for ts in query.all()] | ||
response_msg = json.dumps(data) | ||
return make_response((response_msg, 500, headers)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking if 200 is a better option even when returning datapusher errors 🤔 Would it be confusing if the there is an actual 500 error thrown somewhere in here? I can also see how 500 can be useful as the caller won't have to check the payload before raising alarm or something. 🤷♂️
API endpoint returns 500 on datapusher errors, with info about the affected resource and the error that occurred.
Intent is to hit this from pingdom or similar and alert on it.