RDF Serialization - issue with the dct:format property value #231
Comments
The format_mapping object is used by this extension only during serialization, not during harvesting. So if the source you are harvesting is not a CKAN instance with this extension properly installed, the problem is not related to the format_mapping. The format_mapping is only a fallback: normally the resource's distribution_format is used if it is set. Of course, the format_mapping could be improved by making it configurable in some way and by adding more formats to the mapping.
As mentioned above, the harvester does not use the format_mapping; it uses what the source catalog defines in the RDF element, so it is probably the harvester's code that needs to be investigated for this particular behavior.
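To make the fallback order described here concrete, a minimal sketch follows. It is not the extension's actual code: the function name `pick_format_code`, the contents of `FORMAT_MAPPING`, and the use of OP_DATPRO as the default are assumptions for illustration, the last one based only on the behavior reported in this thread.

```python
# Hypothetical fallback table; the real mapping lives in profiles.py
# and may contain different entries.
FORMAT_MAPPING = {
    "csv": "CSV",
    "json": "JSON",
    "xml": "XML",
}

# Assumed default, based on the OP_DATPRO values reported in this issue.
DEFAULT_FORMAT = "OP_DATPRO"


def pick_format_code(resource: dict) -> str:
    """Return the format code to serialize for a resource dict."""
    # 1. Prefer the explicit distribution_format when it is set.
    explicit = (resource.get("distribution_format") or "").strip()
    if explicit:
        return explicit
    # 2. Otherwise fall back to the mapping keyed on CKAN's free-text format.
    raw = (resource.get("format") or "").strip().lower()
    return FORMAT_MAPPING.get(raw, DEFAULT_FORMAT)
```

Under these assumptions, a resource whose free-text format is missing from the table (for example "n-triples") ends up as OP_DATPRO unless distribution_format is set.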
@tdipisa @etj @cezio with the help of a developer we were able to identify the issue which seems to be more complex than expected. The issue involves the
In general we have many OP_DATPRO values in the central registry because we harvest from RDF serializations of PAs that introduce this error.
Possible solution to fix the issue
In case 3, we need to take the last part of the URI of the EU controlled vocabulary and set
Case 2 is more complex. If the data source is not compliant, we do not have any EU controlled vocabulary reference, we just have CKAN's
BTW: in CKAN's filters it would be better to display the DCAT-AP_IT formats rather than the current mess in CKAN, which allows anyone to enter a free-text format if it is not included in the JSON I pointed out above.
In the RDF serialization of a dataset, the dct:format property may take the value OP_DATPRO even if the source catalog correctly indicates the format using the EU controlled vocabulary, as required by the DCAT-AP_IT specs.
This does not happen if the format of the distribution is CSV, for instance. It seems to happen during the harvesting phase and only for specific formats (e.g., all those related to RDF serializations, such as RDF_XML, RDF_TURTLE, RDF_N_TRIPLES, etc.)
Example:
Source Catalogue: Linked Data Platform with metadata compliant with DCAT-AP_IT
In this case the serialized format is OP_DATPRO, while in the source catalogue it is correctly set to the following URI: http://publications.europa.eu/resource/authority/file-type/RDF_N_TRIPLES
Could this be caused by the limited set of format_mapping values (https://github.com/geosolutions-it/ckanext-dcatapit/blob/master/ckanext/dcatapit/dcat/profiles.py#L76)?
In any case, if the source correctly includes the format using the requested controlled vocabulary, no format mapping should be applied. We should simply use what is included in the source catalogue.
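A rough sketch of what "use what is included in the source catalogue" could look like on the harvesting side: if the dct:format value is a URI from the EU file-type authority, take its last segment as the format code and skip any mapping. The function name `format_code_from_source` and the OP_DATPRO fallback are assumptions for illustration, not the harvester's actual code.

```python
# Namespace of the EU file-type controlled vocabulary, as in the URI above.
FILE_TYPE_NS = "http://publications.europa.eu/resource/authority/file-type/"


def format_code_from_source(value, fallback="OP_DATPRO"):
    """Return the vocabulary code carried by a dct:format value.

    Hypothetical helper: the fallback value is an assumption based on
    the behavior described in this issue.
    """
    value = (value or "").strip()
    if value.startswith(FILE_TYPE_NS):
        # Keep only the last part of the URI, e.g. "RDF_N_TRIPLES".
        code = value[len(FILE_TYPE_NS):].strip("/")
        if code:
            return code
    # Not a controlled-vocabulary URI: leave the decision to the fallback.
    return fallback
```

With the example above, the URI ending in RDF_N_TRIPLES would yield the code RDF_N_TRIPLES, while a free-text value would still go through the fallback.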