[BNF importer] invalid issue JSON output #104

Closed
mromanello opened this issue Aug 20, 2020 · 1 comment · Fixed by #127
Labels
bug Something isn't working


@mromanello
Member

There are two problems here:

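  1. iiif_link should contain the link to the JSON manifest (the IIIF info.json endpoint), not to the JPG image itself;
  2. the coordinates field c is missing for content items of type image (e.g. excelsior-1910-11-16-a-i0161); the coordinates currently exist only as the region segment of the JPG URL.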

Both problems will break the impresso-pyimages script that downloads images via IIIF, cf. https://github.com/impresso/impresso-images/blob/89bef72675f4bd9ea9a7557cbbadcf337c61a0bf/impresso-pyimages/impresso_images/data_utils.py#L62
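To illustrate why both problems matter downstream, here is a minimal sketch of how a consumer could build a IIIF Image API request for a content item from iiif_link and c. It assumes iiif_link ends in /info.json and c is [x, y, w, h] in pixels, as in the expected JSON below. The function image_url is hypothetical; this is not the actual impresso-pyimages code.

```python
from typing import Optional, Sequence


def image_url(iiif_link: str, coords: Optional[Sequence[int]], size: str = "full") -> str:
    """Build a IIIF Image API request URL for the region coords = [x, y, w, h]."""
    suffix = "/info.json"
    if not iiif_link.endswith(suffix):
        # A link that already points to a JPG (problem 1) cannot take a new region.
        raise ValueError(f"expected an info.json link, got {iiif_link!r}")
    if coords is None or len(coords) != 4:
        # Without c (problem 2) there is no region to request.
        raise ValueError(f"expected [x, y, w, h], got {coords!r}")
    base = iiif_link[: -len(suffix)]
    region = ",".join(str(int(v)) for v in coords)
    # Hypothetical request pattern: {base}/{region}/{size}/{rotation}/{quality}.{format}
    return f"{base}/{region}/{size}/0/default.jpg"


link = "https://gallica.bnf.fr/iiif/ark:/12148/bpt6k46000007/f12/info.json"
print(image_url(link, [2753, 4963, 3010, 3273]))
# https://gallica.bnf.fr/iiif/ark:/12148/bpt6k46000007/f12/2753,4963,3010,3273/full/0/default.jpg
```

With the expected fields, the consumer reconstructs exactly the JPG URL the importer currently stores, while remaining free to request another size or the full page.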

Original JSON (taken from s3://original-canonical-staging/excelsior/issues/excelsior-1910-issues.jsonl.bz2):

    {
      "m": {
        "id": "excelsior-1910-11-16-a-i0161",
        "tp": "image",
        "pp": [
          12
        ],
        "t": "Publicité",
        "iiif_link": "https://gallica.bnf.fr/iiif/ark:/12148/bpt6k46000007/f12/2753,4963,3010,3273/full/0/default.jpg"
      }
    }

Expected JSON:

    {
      "m": {
        "id": "excelsior-1910-11-16-a-i0161",
        "tp": "image",
        "pp": [
          12
        ],
        "t": "Publicité",
        "iiif_link": "https://gallica.bnf.fr/iiif/ark:/12148/bpt6k46000007/f12/info.json",
        "c": [
          2753,
          4963,
          3010,
          3273
        ]
      }
    }
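The fix amounts to splitting the stored JPG URL into its IIIF base and its region. A minimal sketch, assuming the link follows the IIIF Image API request pattern {base}/{region}/{size}/{rotation}/{quality}.{format}, as the Gallica URL above does. Since the Gallica identifier itself contains slashes, the URL is split from the right. The names split_iiif_link and fix_content_item are hypothetical.

```python
from typing import List, Optional, Tuple


def split_iiif_link(url: str) -> Tuple[str, Optional[List[int]]]:
    """Return (info.json link, [x, y, w, h] or None) for a IIIF image request URL."""
    # The last four path segments are region, size, rotation and quality.format;
    # everything before them is the image base URI, identifier included.
    parts = url.rsplit("/", 4)
    if len(parts) != 5:
        raise ValueError(f"not a IIIF image request URL: {url!r}")
    base, region = parts[0], parts[1]
    coords = None
    # Only a pixel region "x,y,w,h" yields coordinates.
    if region not in ("full", "square") and not region.startswith("pct:"):
        coords = [int(v) for v in region.split(",")]
        if len(coords) != 4:
            raise ValueError(f"unexpected region {region!r}")
    return f"{base}/info.json", coords


def fix_content_item(item: dict) -> dict:
    """Rewrite an image content item in place to match the expected JSON above."""
    meta = item["m"]
    if meta.get("tp") == "image" and not meta["iiif_link"].endswith("/info.json"):
        link, coords = split_iiif_link(meta["iiif_link"])
        meta["iiif_link"] = link
        if coords is not None:
            meta["c"] = coords
    return item
```

This places c inside m, as in the expected JSON of this comment. Where c should live is discussed further below.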
@mromanello added the bug label (Something isn't working) on Aug 20, 2020
@piconti self-assigned this on Oct 18, 2023
@piconti
Member

piconti commented Oct 18, 2023

This issue is linked to issue #105; the BNF-EN importer is also affected.

  1. For all three importers, iiif_link contains the link to the JPG image itself.
  2. The placement of iiif_link and c among the content item fields is inconsistent across importers.
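To find the affected content items in an importer's output, one could scan its issue files for image items whose iiif_link does not point to info.json or that lack coordinates. A minimal sketch. It assumes that each line of an issues .jsonl.bz2 file is one issue listing its content items under the key "i" (an assumption about the canonical format, not stated in this issue). Because placement currently differs across importers, it looks for both fields inside and outside m. The function names are hypothetical.

```python
import bz2
import json
from typing import Iterable, Iterator, List


def item_problems(item: dict) -> List[str]:
    """List the problems described in this issue for a single content item."""
    meta = item.get("m", {})
    if meta.get("tp") != "image":
        return []
    problems = []
    # Placement differs across importers, so check both locations.
    link = meta.get("iiif_link") or item.get("iiif_link")
    if link is None or not link.endswith("/info.json"):
        problems.append("iiif_link does not point to info.json")
    if "c" not in meta and "c" not in item:
        problems.append("missing coordinates c")
    return problems


def iter_items(path: str) -> Iterator[dict]:
    """Yield content items from a bz2-compressed JSON Lines issues file.

    Assumes (hypothetically) that each issue stores its items under "i".
    """
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield from json.loads(line).get("i", [])


def report(items: Iterable[dict]) -> None:
    """Print one line per problem found, prefixed with the content item id."""
    for item in items:
        for problem in item_problems(item):
            print(item["m"]["id"], problem)
```

For example, after downloading the Excelsior file from the S3 path above: report(iter_items("excelsior-1910-issues.jsonl.bz2")).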

The rebuilder expects iiif_link inside the subfield m [see here], but it expects the coordinates c outside of that subfield.

As a result, the placement of these fields has been made uniform across importers to fit the rebuilder's requirements for now. It will be corrected together with the rebuilder later, so that each module (and importer) can first be tested independently.
This correction should take place BEFORE any re-ingestion of the data from the concerned importers (BNF, BNF-EN, RERO).
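The interim uniformization could look like the following sketch, which puts iiif_link inside m and c at the top level of the content item, as the rebuilder currently expects. The function name is hypothetical, and the actual change in the importers may differ.

```python
import copy


def place_fields_for_rebuilder(item: dict) -> dict:
    """Return a copy of a content item with iiif_link in "m" and "c" at top level."""
    fixed = copy.deepcopy(item)  # leave the input item untouched
    meta = fixed.setdefault("m", {})
    # The rebuilder reads iiif_link from the "m" subfield.
    # If both locations hold a value, the one already in "m" wins.
    if "iiif_link" in fixed:
        meta.setdefault("iiif_link", fixed.pop("iiif_link"))
    # The coordinates, however, are expected outside of "m".
    # If both locations hold a value, the top-level one wins.
    if "c" in meta:
        fixed.setdefault("c", meta.pop("c"))
    return fixed
```

Returning a copy keeps the function safe to apply to items that are also used elsewhere, at the cost of one deep copy per item.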
