Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API endpoint to report on datapusher errors #28

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

markstuart
Copy link
Contributor

API endpoint returns 500 on datapusher errors, with info about the affected resource and the error that occurred.

Intent is to hit this from pingdom or similar and alert on it.

@markstuart markstuart requested a review from Suvra19 July 5, 2024 05:11
# IBlueprint

def get_blueprint(self):
return views.api
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just registers a "plugin" that registers the Flask blueprint so that CKAN's routing knows about it.

model.TaskStatus.state == 'error',
model.TaskStatus.task_type == 'datapusher',
model.TaskStatus.last_updated > a_day_ago
))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returns any datapusher tasks that have errored in the last day

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we don't have to worry about timezone issues here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so... server runs in UTC I believe, and all the last_updated values appear to be set in that timezone also.

if query.count() > 0:
data = [model_dictize.task_status_dictize(ts, {'task_status': ts}) for ts in query.all()]
response_msg = json.dumps(data)
return make_response((response_msg, 500, headers))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example response:

[{"id": "12d7b0bd-2992-40ac-9037-51ac0d35b8ac", "entity_id": "e91d0448-86ef-4462-a862-77b46a3ca6da", "entity_type": "resource", "task_type": "datapusher", "key": "datapusher", "value": "{}", "state": "error", "error": "{\"message\": \"Could not connect to DataPusher.\", \"details\": \"HTTPConnectionPool(host='datapusher', port=8800): Max retries exceeded with url: /job (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8bc3946890>: Failed to establish a new connection: [Errno -2] Name or service not known'))\"}", "last_updated": "2024-07-05T04:58:32.846214"}]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking if 200 is a better option even when returning datapusher errors 🤔 Would it be confusing if the there is an actual 500 error thrown somewhere in here? I can also see how 500 can be useful as the caller won't have to check the payload before raising alarm or something. 🤷‍♂️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess a monitoring platform should be able to check for the existence of error in the response payload, and potentially then we'd be able to have different alerts based on 500, 200 with error, 200 ok

Copy link

@Suvra19 Suvra19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✅. Just a couple of questions inline.

@@ -88,7 +94,7 @@ def edit_uri(uri_id):
if not current_uri:
base.abort(404, 'URI not found')
if current_uri.superseded_by != None:
base.aport(422, 'URI is archived, editing not supported')
base.abort(422, 'URI is archived, editing not supported')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2024-07-08 at 8 28 34 AM

model.TaskStatus.state == 'error',
model.TaskStatus.task_type == 'datapusher',
model.TaskStatus.last_updated > a_day_ago
))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we don't have to worry about timezone issues here

if query.count() > 0:
data = [model_dictize.task_status_dictize(ts, {'task_status': ts}) for ts in query.all()]
response_msg = json.dumps(data)
return make_response((response_msg, 500, headers))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking if 200 is a better option even when returning datapusher errors 🤔 Would it be confusing if the there is an actual 500 error thrown somewhere in here? I can also see how 500 can be useful as the caller won't have to check the payload before raising alarm or something. 🤷‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants