feat(jpeg): Support encoding/decoding arbitrary metadata as comments #4430

Merged (4 commits) into AcademySoftwareFoundation:main on Nov 7, 2024

Conversation

lukasstockner
Contributor

This is needed to port Blender's current JPEG IO code to using OIIO, but is also a useful feature to have in general.

For reading, the code tries to parse comments as colon-separated key-value pairs and sets metadata accordingly. For writing, this needs to be explicitly enabled by setting jpeg:com_attributes to 1 in order to avoid accidentally bloating files for existing applications.
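As an illustration of the reading side, here is a minimal sketch of splitting a COM payload into a key and a value. The helper `split_com_comment` and the sample strings are hypothetical and are not taken from the OIIO source; in particular, splitting at the first colon is an assumption, since the description above does not say how a namespace-style prefix is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not the OIIO implementation: split a JPEG COM
// payload of the form "key:value" at the first colon.
// Returns false (leaving key/value untouched) when there is no colon
// or when the key would be empty.
bool split_com_comment(const std::string& comment,
                       std::string& key, std::string& value)
{
    const std::size_t colon = comment.find(':');
    if (colon == std::string::npos || colon == 0)
        return false;
    key   = comment.substr(0, colon);
    value = comment.substr(colon + 1);
    return true;
}
```

A payload without a colon is simply not treated as metadata by this sketch, which is why a plain free-text comment falls through unchanged.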

Tests

I've added a small (~10KB) JPEG file containing Blender metadata and a basic test that parses it, checks that the metadata was read correctly, writes it twice (once with and once without jpeg:com_attributes), and then checks that those files are also parsed as expected.
In case you're wondering why the info for "no-attribs.jpg" still contains one Blender attribute - that's because the first COM field is still put into ImageDescription just like before, so even without jpeg:com_attributes it ends up being written to the output file and recognized during parsing.
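The round-trip behaviour described here can be modelled with a small sketch of the writing side. Everything in it is an assumption made for illustration: `build_com_payloads` is a hypothetical function, the `com_attributes` parameter stands in for the `jpeg:com_attributes` setting, and the real writer may filter or order attributes differently.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the writer, not OIIO code. The description always
// becomes the first COM payload (the pre-existing behaviour); other string
// attributes are emitted as "key:value" payloads only when com_attributes
// is enabled.
std::vector<std::string>
build_com_payloads(const std::string& image_description,
                   const std::map<std::string, std::string>& string_attribs,
                   bool com_attributes)
{
    std::vector<std::string> payloads;
    if (!image_description.empty())
        payloads.push_back(image_description);
    if (com_attributes) {
        for (const auto& kv : string_attribs)
            payloads.push_back(kv.first + ":" + kv.second);
    }
    return payloads;
}
```

In this model, a description that itself looks like "key:value" is still written when the flag is off, so a reader that parses COM blocks will recover that one attribute from the output file.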

Checklist:

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite
    (adding new test cases if necessary).
  • If I added or modified a C++ API call, I have also amended the
    corresponding Python bindings (and if altering ImageBufAlgo functions, also
    exposed the new functionality as oiiotool options).
  • My code follows the prevailing code style of this project. If I haven't
    already run clang-format before submitting, I definitely will look at the CI
    test that runs clang-format and fix anything that it highlights as being
    nonconforming.

linux-foundation-easycla bot commented Sep 17, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

This is needed to port Blender's current JPEG IO code to using OIIO,
but is also a useful feature to have in general.

For reading, the code tries to parse comments as colon-separated key-value
pairs and sets metadata accordingly. For writing, this needs to be explicitly
enabled by setting jpeg:com_attributes to 1 in order to avoid accidentally
bloating files for existing applications.

Signed-off-by: Lukas Stockner <[email protected]>
Collaborator

lgritz commented Sep 18, 2024

The Mac failures are unrelated and fixed by a different PR that has already been merged.

std::string((const char*)m->data,
m->data_length));
m_spec.attribute("ImageDescription", data);
// Additional string metadata can be stored in JPEG files as
Collaborator

Maybe it's worth explicitly starting by spelling out:

// The first COM block encountered will be interpreted as the image description.
// Subsequent COM blocks, if in the form "key:value", ... blah blah

By the way, is this exactly what we want? What if the first COM looks like "key:value", should that always be slotted into ImageDescription?

Contributor Author

I would prefer only setting ImageDescription if the parsing fails, but I figured that's a potentially breaking change so I kept it safe for now.

Collaborator

I bet that most traditional JPEG COM blocks that are true "image comments" are unlikely to have the specific form of "[ident:]string1:string2", where the optional ident (namespace prefix) follows C identifier rules and string1 doesn't start or end with whitespace. We could interpret only that pattern as metadata, and treat the first COM that doesn't follow the pattern as "ImageDescription".

I'm willing to risk that an occasional "comment" with a quirky format might be incorrectly interpreted as metadata. Especially if there is some kind of OIIO global option that lets you revert to the old behavior (first COM is always ImageDescription), so somebody can get out of a bind if they have a pile of images with the ambiguous formatting of their COM blocks.
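One possible reading of the "[ident:]string1:string2" pattern is sketched below. The function names are hypothetical and this is not the heuristic that was eventually merged; the handling of the optional prefix (try the prefixed form first, then fall back to the plain form) is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if s is a valid C identifier: [A-Za-z_][A-Za-z0-9_]*
static bool is_c_identifier(const std::string& s)
{
    if (s.empty())
        return false;
    const unsigned char first = static_cast<unsigned char>(s[0]);
    if (!(std::isalpha(first) || first == '_'))
        return false;
    for (std::size_t i = 1; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (!(std::isalnum(c) || c == '_'))
            return false;
    }
    return true;
}

// True if s is non-empty and has no leading or trailing whitespace.
static bool is_trimmed_nonempty(const std::string& s)
{
    return !s.empty()
           && !std::isspace(static_cast<unsigned char>(s.front()))
           && !std::isspace(static_cast<unsigned char>(s.back()));
}

// Hypothetical heuristic: accept "ident:string1:string2" (key becomes
// "ident:string1") or "string1:string2" (key becomes "string1").
bool looks_like_key_value(const std::string& comment,
                          std::string& key, std::string& value)
{
    const std::size_t first = comment.find(':');
    if (first == std::string::npos)
        return false;
    const std::string head = comment.substr(0, first);
    const std::size_t second = comment.find(':', first + 1);
    if (second != std::string::npos && is_c_identifier(head)) {
        const std::string name = comment.substr(first + 1, second - first - 1);
        if (is_trimmed_nonempty(name)) {
            key   = head + ":" + name;
            value = comment.substr(second + 1);
            return true;
        }
    }
    if (is_trimmed_nonempty(head)) {
        key   = head;
        value = comment.substr(first + 1);
        return true;
    }
    return false;
}
```

Under this sketch a free-text comment without a colon is rejected, while a short label followed by a colon is accepted, which is the kind of occasional misclassification the comment above is willing to tolerate.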

Collaborator

What's the verdict here, @lukasstockner? Do you want to make any more changes, or do you want to keep the logic as-is and we can always revise later if it causes trouble?

Contributor Author

If it's fine with you, I'll go ahead and add some more logic to only set the ImageDescription if a global option is set and/or the matched key fails a heuristic.

Collaborator

Sounds good to me.

Collaborator

lgritz commented Oct 3, 2024

@lukasstockner Is this ready to merge, or is there more you wanted to do at this stage?

Contributor Author

> @lukasstockner Is this ready to merge, or is there more you wanted to do at this stage?

I still want to implement the ImageDescription logic discussed above. I haven't found the time yet, but hope to do so tomorrow.

Collaborator

lgritz commented Oct 18, 2024

@lukasstockner Just a reminder, you were going to make a minor adjustment here, and then I think we are ready to merge. You may also need to rebase on the current main and resolve some very minor conflicts that have occurred since you started this PR.

Collaborator

lgritz commented Oct 29, 2024

Ping @lukasstockner

Contributor Author

So sorry for the delay here, I've finally implemented the improvement we discussed.
There's still some ambiguity (e.g., a comment such as "Example: A tree" will be parsed as a key-value pair), but now there is a setting to disable that.
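To make the effect of such a setting concrete, here is a sketch of how COM blocks might be sorted into a description and attributes. All names are hypothetical: `first_com_is_description` stands in for the opt-out setting, whose real name is not given in this thread, and `parse_key_value` is a deliberately simplified stand-in for the merged heuristic. What happens to later blocks that fail to parse is also an assumption (they are ignored once a description exists).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct ComMetadata {
    std::string description;                        // stands in for ImageDescription
    std::map<std::string, std::string> attributes;  // parsed "key:value" blocks
};

// Simplified stand-in for the heuristic: split at the first colon and
// require a non-empty key with no leading or trailing space.
static bool parse_key_value(const std::string& s,
                            std::string& key, std::string& value)
{
    const std::size_t colon = s.find(':');
    if (colon == std::string::npos || colon == 0)
        return false;
    if (s[0] == ' ' || s[colon - 1] == ' ')
        return false;
    key   = s.substr(0, colon);
    value = s.substr(colon + 1);
    return true;
}

// first_com_is_description is a hypothetical name for the opt-out setting.
// When true, the first block is always the description; otherwise the first
// block that fails the heuristic becomes the description.
ComMetadata classify_com_blocks(const std::vector<std::string>& blocks,
                                bool first_com_is_description)
{
    ComMetadata out;
    bool have_description = false;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        std::string key, value;
        const bool legacy_first = first_com_is_description && i == 0;
        if (!legacy_first && parse_key_value(blocks[i], key, value)) {
            out.attributes[key] = value;
        } else if (!have_description) {
            out.description  = blocks[i];
            have_description = true;
        }
    }
    return out;
}
```

With the flag off, "Example: A tree" ends up as an attribute named "Example"; with the flag on, the same block is kept verbatim as the description.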

Signed-off-by: Lukas Stockner <[email protected]>
Collaborator

lgritz left a comment

LGTM, thanks for getting this one across the finish line.

lgritz merged commit 24dcdf9 into AcademySoftwareFoundation:main on Nov 7, 2024
29 checks passed
lgritz pushed a commit to lgritz/OpenImageIO that referenced this pull request Nov 10, 2024
…cademySoftwareFoundation#4430)

This is needed to port Blender's current JPEG IO code to using OIIO, but
is also a useful feature to have in general.

For reading, the code tries to parse comments as colon-separated
key-value pairs and sets metadata accordingly. For writing, this needs
to be explicitly enabled by setting jpeg:com_attributes to 1 in order to
avoid accidentally bloating files for existing applications.

Tests:

I've added a small (~10KB) JPEG file containing Blender metadata and a
basic test that parses it, checks that the metadata was read correctly,
writes it twice (once with and once without `jpeg:com_attributes`), and
then checks that those files are also parsed as expected.
In case you're wondering why the info for "no-attribs.jpg" still
contains one Blender attribute - that's because the first COM field is
still put into `ImageDescription` just like before, so even without
`jpeg:com_attributes` it ends up being written to the output file and
recognized during parsing.


---------

Signed-off-by: Lukas Stockner <[email protected]>
lgritz pushed a commit to lgritz/OpenImageIO that referenced this pull request Nov 10, 2024
…cademySoftwareFoundation#4430)

lgritz pushed a commit to lgritz/OpenImageIO that referenced this pull request Nov 13, 2024
…cademySoftwareFoundation#4430)

lgritz pushed a commit to lgritz/OpenImageIO that referenced this pull request Nov 21, 2024
…cademySoftwareFoundation#4430)
