catalog harvest error not reported #3532
Comments
The direct cause for Did a SOLR reindex on that dataset, |
@FuhuXia will do some digging to figure out how often this is happening so we have a better idea how to prioritize it. |
A survey of the last 60 days found four types of errors, 83 occurrences in total, that halt the gather process without any error being reported. 38 occurrences of ValidationError
Detailed error message and DOL sample given above. 21 occurrences of ValidationError
Sample harvest report: 22 occurrences of TypeError
Sample harvest report: 2 occurrences of TypeError
Sample harvest report: |
Another scenario caught. When a record in data.json harvest source contains non-ascii char in the identifier (such as Error message in gather log.
|
Another scenario caught when harvesting an ArcGIS source on the sandbox. https://soa-dnr.maps.arcgis.com/sharing/search?f=pjson&q=test&num=1&start=0 Error message in gather log.
|
@FuhuXia A few questions:
My understanding of the harvesting error catching:
Overall, what is the desired solution? |
The errors with the |
Errors to be fixed:
|
|
@FuhuXia will try to find some harvest sources to verify that it's fixed |
The error
|
The error
|
Documenting the steps to locate unhandled harvesting errors in New Relic. For the gather process, use this query
For the fetch process, use the same query to get the count within a given timeframe; the count should equal the sum of all known captured errors.
Or, in the UI/DB, look for a harvest job that lasted 24 hours and was then force-finished by the CKAN timeout setting. |
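The two checks described above can be sketched roughly as follows. This is only an illustration, not the New Relic query itself: the function names, the record layout (`id`, `created`, `finished`), and all sample numbers are hypothetical. The idea is that any gap between the total error count and the sum of known captured errors points at unhandled errors, and that a job running for the full 24 hours was probably force-finished by the timeout.

```python
from datetime import datetime, timedelta


def unhandled_count(total_errors, captured_by_type):
    """Errors seen in the logs that no known, captured error type accounts for."""
    return total_errors - sum(captured_by_type.values())


def timed_out_jobs(jobs, limit=timedelta(hours=24)):
    """Ids of jobs whose run time reached the limit (likely force-finished)."""
    return [job["id"] for job in jobs if job["finished"] - job["created"] >= limit]


# Made-up sample data, for illustration only.
captured = {"known error A": 5, "known error B": 3}
gap = unhandled_count(10, captured)  # 10 logged errors, 8 of them explained

jobs = [
    {"id": "job-1", "created": datetime(2021, 11, 1, 0, 0), "finished": datetime(2021, 11, 1, 0, 20)},
    {"id": "job-2", "created": datetime(2021, 11, 1, 0, 0), "finished": datetime(2021, 11, 2, 0, 0)},
]
suspects = timed_out_jobs(jobs)
```

A non-zero `gap`, or a non-empty `suspects` list, would be the cue to look through the gather or fetch log for the timeframe in question.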
Some harvesting errors in the gather stage are silently ignored. The harvest report shows 0 changes and 0 errors.
How to reproduce
Run a harvest job on DOL https://admin-catalog-next.data.gov/harvest/about/dol-json
There are some changes in the harvest source data.json file http://www.dol.gov/data.json
Expected behavior
Harvest report should report some updates
Actual behavior
Harvest reports
0 added 0 updated 0 deleted 0 not modified
Saw error in /var/log/gather-consumer.log:
[u'Duplicate key "harvest_object_id"']
Sketch
Fix error in [Task] Re-sync bad data causing harvest failure #3546
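The general shape of the problem (an exception in the gather stage that never reaches the harvest report) can be illustrated with a minimal, self-contained sketch. It does not use the CKAN harvester API; `GatherReport`, `safe_gather`, and `failing_gather` are hypothetical names, and `ValueError` is only a stand-in for whatever exception the real gather code raises.

```python
import logging
import traceback

log = logging.getLogger("gather")


class GatherReport:
    """Hypothetical stand-in for a harvest job report."""

    def __init__(self):
        self.errors = []

    def add_error(self, message):
        self.errors.append(message)


def safe_gather(gather_fn, report):
    """Run gather_fn; record any exception on the report instead of dropping it."""
    try:
        return gather_fn()
    except Exception as exc:  # deliberately broad: nothing should vanish silently
        log.error("gather failed: %s\n%s", exc, traceback.format_exc())
        report.add_error("{}: {}".format(type(exc).__name__, exc))
        return []


def failing_gather():
    # Simulates a gather stage that hits the error seen in the log.
    raise ValueError('Duplicate key "harvest_object_id"')


report = GatherReport()
object_ids = safe_gather(failing_gather, report)
```

With a wrapper of this kind the job still yields no harvest objects, but the report carries one error rather than showing 0 errors.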