Harvest error on some old spatial POLYGON format #3597

Closed
FuhuXia opened this issue Dec 15, 2021 · 4 comments
Labels
bug Software defect or bug

Comments

@FuhuXia
Member

FuhuXia commented Dec 15, 2021

There are some errors when harvesting the USDA source at http://www.usda.gov/data.json. They seem to be related to the transformation of spatial values from the old format to the new format.

How to reproduce

Harvest the USDA data.json source. A snapshot of the data.json file is archived for debugging purposes.

Expected behavior

No harvest errors related to spatial values.

Actual behavior

Errors occur on some datasets but not others.

Sketch

[Notes or a checklist reflecting our understanding of the selected approach]

@FuhuXia FuhuXia added the bug Software defect or bug label Dec 15, 2021
@FuhuXia
Member Author

FuhuXia commented Dec 15, 2021

fetch-consumer.log shows that some datasets with old spatial values were harvested, but others failed with an error.

2021-12-13 16:30:14,686 INFO  [ckanext.geodatagov.logic] Old Spatial found POLYGON ((-87.47314453125 42.442714522269, -87.20947265625 43.758200767076, -86.41845703125 45.355040402316, -88.22021484375 46.243501312528, -90.32958984375 47.028014348561, -92.21923828125 46.968070899556, -92.87841796875 46.273885251899, -93.31787109375 45.570793708127, -93.09814453125 44.702825993117, -92.04345703125 43.56745677057, -91.03271484375 42.280356984586))
2021-12-13 16:30:14,820 ERROR [ckanext.spatial.plugin] Geometry not valid GeoJSON, not indexing
2021-12-13 16:30:14,869 DEBUG [ckanext.archiver.plugin] Notified of package event: agricultural-land-use-by-field-wisconsin-2010-2019 new
2021-12-13 16:30:14,869 DEBUG [ckanext.archiver.plugin] New package - will archive
2021-12-13 16:30:14,869 DEBUG [ckanext.archiver.plugin] Creating archiver task: agricultural-land-use-by-field-wisconsin-2010-2019
2021-12-13 16:30:14,871 DEBUG [ckanext.archiver.lib] Archival of package put into celery queue priority: agricultural-land-use-by-field-wisconsin-2010-2019
2021-12-13 16:30:14,900 WARNI [ckanext.datajson.datajson_ckan_28] created package agricultural-land-use-by-field-wisconsin-2010-2019 (b04132e6-7be2-4260-8d29-c3ecc98826f0) from http://rpm.tigbox.com/test/JSON-3532-SPATIAL/data.json
2021-12-13 16:30:14,969 ERROR [ckanext.spatial.plugin] Geometry not valid GeoJSON, not indexing
2021-12-13 16:30:15,031 INFO  [ckanext.harvest.queue] Received harvest object id: e60e7e4b-7201-4df1-a3c8-082f72633c5b
2021-12-13 16:30:15,058 DEBUG [ckanext.datajson.datajson_ckan_28] In <Plugin DataJsonHarvester 'datajson_harvest'> import_stage
2021-12-13 16:30:15,062 DEBUG [ckanext.datajson.datajson_ckan_28] SOURCE CONFIG from DB {u'private_datasets': u'False'}
2021-12-13 16:30:15,069 DEBUG [ckanext.geodatagov.logic] Search backend solr
2021-12-13 16:30:15,198 DEBUG [ckanext.archiver.plugin] Notified of package event: agroecosystem-performance-assessment-tool new
2021-12-13 16:30:15,198 DEBUG [ckanext.archiver.plugin] New package - will archive
2021-12-13 16:30:15,198 DEBUG [ckanext.archiver.plugin] Creating archiver task: agroecosystem-performance-assessment-tool
2021-12-13 16:30:15,200 DEBUG [ckanext.archiver.lib] Archival of package put into celery queue priority: agroecosystem-performance-assessment-tool
2021-12-13 16:30:15,231 WARNI [ckanext.datajson.datajson_ckan_28] created package agroecosystem-performance-assessment-tool (23202d7b-ab19-4d6c-a712-5aa46b38b882) from http://rpm.tigbox.com/test/JSON-3532-SPATIAL/data.json
2021-12-13 16:30:15,404 INFO  [ckanext.harvest.queue] Received harvest object id: 54e9d5d2-b09b-4c47-856d-e35148a28cee
2021-12-13 16:30:15,423 DEBUG [ckanext.datajson.datajson_ckan_28] In <Plugin DataJsonHarvester 'datajson_harvest'> import_stage
2021-12-13 16:30:15,426 DEBUG [ckanext.datajson.datajson_ckan_28] SOURCE CONFIG from DB {u'private_datasets': u'False'}
2021-12-13 16:30:15,433 DEBUG [ckanext.geodatagov.logic] Search backend solr
2021-12-13 16:30:15,433 INFO  [ckanext.geodatagov.logic] Old Spatial found POLYGON ((-121.93496287 48.97444556, -114.72793162 48.97444556, -114.72793162 44.24519746, -121.93496287 44.24519746))
2021-12-13 16:30:15,433 INFO  [ckanext.geodatagov.logic] New Spatial transformed {"type": "Polygon", "coordinates": [[[POLYGON ((-121.93496287 48.97444556,  -114.72793162 48.97444556], [POLYGON ((-121.93496287 48.97444556,  -121.93496287 44.24519746))], [ -114.72793162 44.24519746,  -121.93496287 44.24519746))], [ -114.72793162 44.24519746,  -114.72793162 48.97444556], [POLYGON ((-121.93496287 48.97444556,  -114.72793162 48.97444556]]]}
2021-12-13 16:30:15,478 DEBUG [ckanext.spatial.plugin] Received: '{"type": "Polygon", "coordinates": [[[POLYGON ((-121.93496287 48.97444556,  -114.72793162 48.97444556], [POLYGON ((-121.93496287 48.97444556,  -121.93496287 44.24519746))], [ -114.72793162 44.24519746,  -121.93496287 44.24519746))], [ -114.72793162 44.24519746,  -114.72793162 48.97444556], [POLYGON ((-121.93496287 48.97444556,  -114.72793162 48.97444556]]]}'
2021-12-13 16:30:15,478 ERROR [ckanext.datajson.datajson_ckan_28] Failed to create package agroecological-classes-2018 from http://rpm.tigbox.com/test/JSON-3532-SPATIAL/data.json
        {'maintainer': u'Huggins, David', 'name': 'agroecological-classes-2018', 'tags': [{'name': 'environment'}, {'name': 'farming'}, {'name': 'iso-metadata'}], 'notes': u'<p><a href="https://www.reacchpna.org/sites/default/files/AR3_1.2.pdf">https://www.reacchpna.org/sites/default/files/AR3_1.2.pdf</a></p>\n<p>Pixel classification:<br />\nClassification, Stable, Dynamic, Unstable<br />\nUrban, 1, 101, 202<br />\nRangeland, 3, 103, 203<br />\nForest, 4, 104, 204<br />\nWater, 5, 105, 205<br />\nWetlands, 6, 106, 206<br />\nBarren, 7, 107, 207<br />\nWilderness, 9, 109, 209<br />\nAnnual, 11, 111, 211<br />\nTransition, 12, 112, 212<br />\nGrain-fallow, 13, 113, 213<br />\nIrrigated, 14, 114, 214<br />\nOrchard, 15, 115, 215<br />\nAgriculture, 50, 150, 250<br />\nWater and Other, 51, 151, 251</p>\n', 'owner_org': u'bd3b4484-c23d-49ab-942b-7e1121c7b2b3', 'maintainer_email': u'[email protected]', 'state': 'active', 'extras': [{'value': u'19447c58-2850-4919-ad12-ca2a772e4c05', 'key': 'harvest_source_id'}, {'value': u'54e9d5d2-b09b-4c47-856d-e35148a28cee', 'key': 'harvest_object_id'}, {'value': u'json spatial', 'key': 'harvest_source_title'}, {'value': '{"type": "Polygon", "coordinates": [[[POLYGON ((-121.93496287 48.97444556,  -114.72793162 48.97444556], [POLYGON ((-121.93496287 48.97444556,  -121.93496287 44.24519746))], [ -114.72793162 44.24519746,  -121.93496287 44.24519746))], [ -114.72793162 44.24519746,  -114.72793162 48.97444556], [POLYGON ((-121.93496287 48.97444556,  -114.72793162 48.97444556]]]}', 'key': 'spatial'}, {'value': '{"publisher": "Agricultural Research Service", "identifier": "83415cb1-f993-4f61-ac85-e0efcd360bb1", "catalog_describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json", "catalog_@context": "https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld", "resource-type": "Dataset", "modified": "2020-08-25", "old-spatial": "POLYGON ((-121.93496287 48.97444556, -114.72793162 48.97444556, -114.72793162 44.24519746, 
-121.93496287 44.24519746))", "source_schema_version": "1.1", "source_datajson_identifier": true, "programCode": ["005:040"], "bureauCode": ["005:18"], "catalog_conformsTo": "https://project-open-data.cio.gov/v1.1/schema", "accessLevel": "non-public", "source_hash": "4b133a1a68e3e6a56ddd04cb4075a5fe4983ad56"}', 'key': 'extras_rollup'}], 'groups': [{'name': ''}], 'license_id': 'notspecified', 'title': u'Agroecological classes | 2018', 'type': 'dataset', 'resources': []}
        {'spatial': [u'Error decoding JSON object: Expecting value: line 1 column 39 (char 38)']}

Traceback (most recent call last):
  File "/usr/bin/ckan", line 45, in <module>
    load_entry_point('PasteScript', 'console_scripts', 'paster')()
  File "/usr/lib/ckan/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
    result = self.command()
  File "/usr/lib/ckan-new/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 237, in command
    utils.fetch_consumer()
  File "/usr/lib/ckan-new/src/ckanext-harvest/ckanext/harvest/utils.py", line 351, in fetch_consumer
    fetch_callback(consumer, method, header, body)
  File "/usr/lib/ckan-new/src/ckanext-harvest/ckanext/harvest/queue.py", line 491, in fetch_callback
    fetch_and_import_stages(harvester, obj)
  File "/usr/lib/ckan-new/src/ckanext-harvest/ckanext/harvest/queue.py", line 509, in fetch_and_import_stages
    success_import = harvester.import_stage(obj)
  File "/usr/lib/ckan-new/src/ckanext-datajson/ckanext/datajson/datajson_ckan_28.py", line 741, in import_stage
    pkg = get_action('package_create')(self.context(), pkg)
  File "/usr/lib/ckan-new/src/ckan/ckan/logic/__init__.py", line 498, in wrapped
    result = _action(context, data_dict, **kw)
  File "/usr/lib/ckan-new/src/ckanext-geodatagov/ckanext/geodatagov/logic.py", line 510, in package_create
    return up_func(context, data_dict)
  File "/usr/lib/ckan-new/src/ckan/ckan/logic/action/create.py", line 209, in package_create
    item.create(pkg)
  File "/usr/lib/ckan-new/src/ckanext-spatial/ckanext/spatial/plugin.py", line 90, in create
    self.check_spatial_extra(package)
  File "/usr/lib/ckan-new/src/ckanext-spatial/ckanext/spatial/plugin.py", line 115, in check_spatial_extra
    raise p.toolkit.ValidationError(error_dict, error_summary=package_error_summary(error_dict))
ckan.logic.ValidationError: {'spatial': [u'Error decoding JSON object: Expecting value: line 1 column 39 (char 38)']}
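
In the log above, the 4-point POLYGON appears to have been split on commas and slotted into a "minx, miny, maxx, maxy" bounding-box template, which leaves the literal text `POLYGON ((` inside the JSON and explains the decode error at char 38. The 11-point polygon does not have four comma-separated parts, which would explain why it was left untouched and only failed indexing. This is an inference from the log output, not from reading the transform code. A minimal sketch of a converter that handles both old formats might look like the following; `old_spatial_to_geojson` is a hypothetical name, it is not the ckanext-geodatagov implementation, and it only supports single-ring polygons without holes.

```python
import json
import re

# Matches a single-ring WKT polygon such as "POLYGON ((x1 y1, x2 y2, ...))".
# Polygons with holes (multiple rings) are deliberately not matched.
_WKT_POLYGON = re.compile(r"^\s*POLYGON\s*\(\(([^()]*)\)\)\s*$", re.IGNORECASE)


def old_spatial_to_geojson(value):
    """Hypothetical helper: convert an old-style spatial string to a
    GeoJSON Polygon string, or return None if the format is not recognized.

    Supported inputs (an assumption based on the log output):
      - WKT "POLYGON ((x y, x y, ...))" with a single ring
      - bounding box "minx, miny, maxx, maxy"
    """
    match = _WKT_POLYGON.match(value)
    if match:
        try:
            ring = [[float(n) for n in pair.split()]
                    for pair in match.group(1).split(",")]
        except ValueError:
            return None
        if len(ring) < 3 or any(len(point) != 2 for point in ring):
            return None
        # GeoJSON linear rings must be closed; the WKT in the log is not.
        if ring[0] != ring[-1]:
            ring.append(list(ring[0]))
        return json.dumps({"type": "Polygon", "coordinates": [ring]})

    parts = value.split(",")
    if len(parts) == 4:
        try:
            # Only treat the value as a bounding box if all parts are numbers.
            minx, miny, maxx, maxy = (float(p) for p in parts)
        except ValueError:
            return None
        ring = [[minx, miny], [minx, maxy], [maxx, maxy],
                [maxx, miny], [minx, miny]]
        return json.dumps({"type": "Polygon", "coordinates": [ring]})

    return None
```

Checking the WKT pattern before the comma split, and requiring all four bounding-box parts to parse as numbers, keeps a 4-point polygon from being mistaken for a bounding box. Returning None for unrecognized input lets the caller decide whether to skip the spatial extra or reject the dataset.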

@FuhuXia
Member Author

FuhuXia commented Dec 15, 2021

This is one of the errors seen in #3532.

@jbrown-xentity
Contributor

This was already logged in #3549. Looks like this may not be a code change, but a breaking data change on the USDA side (if this happened in FCS).

@mogul
Contributor

mogul commented Dec 16, 2021

Closing in favor of dupe #3549
