Dataset loses its harvest object when an XML file's timestamp changes without any change to its content #4505
Local testing identified the code responsible for the root cause of the issue: https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/base.py#L709C1-L710C70. In testing, the rebuild-index call at those lines did not refresh Solr from the DB, and it was discovered that calling […]. Alternative methods for rebuilding the index will be investigated to address this problem.
Tested with the most recent version of CKAN, using only two extensions.
Additional tests showed that adding […]. Furthermore, an examination of the package dictionary sent to Solr showed the same contents regardless of whether a commit was performed. Further investigation is required to determine why package_index.index_package needs a database commit to refresh Solr.
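Needing a database commit before the change shows up is consistent with ordinary transaction isolation, *if* some part of the indexing or display path reads through a different connection than the harvester's open transaction. That is an assumption; the tests above do not confirm it. The sketch uses only Python's built-in `sqlite3` module (not CKAN, PostgreSQL or Solr) and a hypothetical `harvest_object` table. It shows that the writing connection sees its own uncommitted update, while a second connection keeps seeing the old committed row until `commit()`.

```python
import os
import sqlite3
import tempfile


def read_current_flags(conn):
    """Return {harvest_object_id: current} as seen through `conn`."""
    cur = conn.execute("SELECT id, current FROM harvest_object ORDER BY id")
    rows = cur.fetchall()
    cur.close()  # finish the read so the writer can take its commit lock
    return {obj_id: bool(current) for obj_id, current in rows}


def visibility_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = sqlite3.connect(path)  # plays the harvester's transaction
        writer.execute(
            "CREATE TABLE harvest_object (id TEXT PRIMARY KEY, current INTEGER)"
        )
        writer.execute("INSERT INTO harvest_object VALUES ('old', 1)")
        writer.commit()

        # Hypothetical: a separate connection standing in for whatever reads
        # the data on the indexing/display side.
        reader = sqlite3.connect(path)

        # Switch the "current" object without committing yet.
        writer.execute("UPDATE harvest_object SET current = 0 WHERE id = 'old'")
        writer.execute("INSERT INTO harvest_object VALUES ('new', 1)")

        writer_view = read_current_flags(writer)    # sees its own changes
        reader_before = read_current_flags(reader)  # still sees committed state
        writer.commit()
        reader_after = read_current_flags(reader)   # now sees the switch

        writer.close()
        reader.close()
        return writer_view, reader_before, reader_after


if __name__ == "__main__":
    labels = ("writer", "reader before commit", "reader after commit")
    for label, view in zip(labels, visibility_demo()):
        print(label, view)
```

PostgreSQL's default READ COMMITTED level gives the same cross-connection behavior, so the sketch only illustrates the general effect, not CKAN's actual code path.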
Created upstream issue ckan/ckanext-spatial#324.
The root cause of the issue is illustrated in the PR/commit above: a query with current=True returns results with current=False. This could be a CKAN core bug in which querying updated records inside an uncommitted transaction returns wrong results.
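One possible mechanism for a `current=True` query returning objects whose `current` is `False` is an ORM identity map combined with unflushed changes. The SQL filter runs against the rows the database still holds, and the ORM returns the already-loaded Python object carrying its modified, not-yet-flushed attribute. SQLAlchemy behaves this way when autoflush is disabled; whether CKAN's session is configured like that in this code path is an assumption to verify. The sketch does not use SQLAlchemy. `NoAutoflushSession` and `HarvestObject` are hypothetical, heavily simplified stand-ins built on `sqlite3` that reproduce only this mechanism.

```python
import sqlite3


class HarvestObject:
    """Hypothetical in-memory stand-in for a harvest object row."""

    def __init__(self, obj_id, current):
        self.id = obj_id
        self.current = current


class NoAutoflushSession:
    """Hypothetical ORM-like session with autoflush disabled: attribute
    changes stay in memory until flush(), while queries are answered from
    whatever the database currently holds."""

    def __init__(self, conn):
        self.conn = conn
        self.identity_map = {}  # id -> already-loaded HarvestObject

    def add(self, obj):
        self.identity_map[obj.id] = obj

    def query_current(self):
        rows = self.conn.execute(
            "SELECT id, current FROM harvest_object WHERE current = 1 ORDER BY id"
        ).fetchall()
        result = []
        for obj_id, current in rows:
            # Reuse an already-loaded instance (keeping its unflushed state)
            # instead of building a fresh object from the row's values.
            if obj_id not in self.identity_map:
                self.identity_map[obj_id] = HarvestObject(obj_id, bool(current))
            result.append(self.identity_map[obj_id])
        return result

    def flush(self):
        # Write every in-memory object back to the database.
        for obj in self.identity_map.values():
            self.conn.execute(
                "INSERT OR REPLACE INTO harvest_object (id, current) VALUES (?, ?)",
                (obj.id, int(obj.current)),
            )


def identity_map_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE harvest_object (id TEXT PRIMARY KEY, current INTEGER)")
    conn.execute("INSERT INTO harvest_object VALUES ('old', 1)")
    session = NoAutoflushSession(conn)

    previous = session.query_current()[0]
    previous.current = False                 # previous object no longer current
    session.add(HarvestObject("new", True))  # new object becomes current

    stale = [(o.id, o.current) for o in session.query_current()]
    session.flush()
    fresh = [(o.id, o.current) for o in session.query_current()]
    conn.close()
    return stale, fresh


if __name__ == "__main__":
    stale, fresh = identity_map_demo()
    print("before flush:", stale)  # [('old', False)]
    print("after flush:", fresh)   # [('new', True)]
```

Flushing (or committing) before the query makes the SQL filter and the returned attribute agree again, which lines up with the observation that a commit changes the outcome.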
Changes to a WAF source file's timestamp without any real content change already cause other issues, such as #4425. They also make the dataset lose its harvest object in the UI, and they are potentially the biggest contributor to the db-solr-sync workload.
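A harvester could avoid treating a timestamp-only change as an update by comparing the document body itself. The sketch below is a hypothetical gate, not existing ckanext-spatial or ckanext-harvest code. It fingerprints the XML text with SHA-256 and re-imports only when the fingerprint differs. `content_fingerprint` and `needs_update` are illustrative names.

```python
import hashlib
from typing import Optional


def content_fingerprint(xml_text: str) -> str:
    """SHA-256 of the document body; the file timestamp plays no part."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


def needs_update(previous_content: Optional[str], new_content: str) -> bool:
    """Hypothetical check: re-import only when the XML body actually changed."""
    if previous_content is None:
        return True  # first harvest of this record
    return content_fingerprint(previous_content) != content_fingerprint(new_content)
```

Storing the fingerprint rather than the full previous content keeps the comparison cheap. Note that this compares bytes exactly, so whitespace-only or attribute-order differences would still count as changes.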
How to reproduce
Modify an XML file's timestamp on a WAF source, then reharvest.
Expected behavior
No change to the dataset. The UI stays the same, and there is no additional workload for db-solr-sync.
Actual behavior
See the error in the fetch log.
In the UI, the dataset loses its harvest source metadata info.
Sketch
[Notes or a checklist reflecting our understanding of the selected approach]