Support for Java string Unicode supplementary plane codepoints incomplete #17

Open
Dani-Hub opened this issue Apr 30, 2024 · 2 comments

Dani-Hub commented Apr 30, 2024

In our application we are using the jibx 1.4.2 marshalling API in the following manner (error handling omitted for brevity):

final IBindingFactory factory = BindingDirectory.getFactory(SomeClass.class);
final IMarshallingContext mctx = factory.createMarshallingContext();
mctx.setIndent(4); // pretty print
mctx.startDocument("UTF-8", null, new FileOutputStream(settingsFile));
mctx.marshalDocument(settings);

When serializing a SomeClass instance whose String field is mapped with value-style="attribute" and contains a character from a Unicode supplementary plane, such as "\ud83d\udc27", jibx throws an exception:

Caused by: java.io.IOException: Illegal character code 0xd83e in attribute value text
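
For readers less familiar with how Java stores such characters, here is a small illustration (not part of jibx; the class and method names are made up for this example). It shows that the string from the report consists of two `char` values, a high and a low surrogate, that together encode a single code point. A writer that validates each `char` in isolation will therefore see a lone surrogate and may reject it.

```java
// Illustration only: how Java represents the supplementary character from the report.
final class SurrogatePairDemo {

    // Summarizes UTF-16 length, code point count, and the first code point of a string.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        sb.append("chars=").append(s.length());
        sb.append(" codePoints=").append(s.codePointCount(0, s.length()));
        sb.append(" first=U+").append(Integer.toHexString(s.codePointAt(0)).toUpperCase());
        return sb.toString();
    }

    public static void main(String[] args) {
        String penguin = "\ud83d\udc27";
        // prints chars=2 codePoints=1 first=U+1F427
        System.out.println(describe(penguin));
        // prints true twice: the first char is a high surrogate, the second a low surrogate
        System.out.println(Character.isHighSurrogate(penguin.charAt(0)));
        System.out.println(Character.isLowSurrogate(penguin.charAt(1)));
    }
}
```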

A similar issue was fixed through this commit shortly before jibx 1.4.2 was released, but the fix was only applied to the ICharacterEscaper implementations. The problem is that the example above effectively calls org.jibx.runtime.impl.MarshallingContext.startDocument(String enc, Boolean alone, OutputStream outs), which delegates to org.jibx.runtime.impl.MarshallingContext.setOutput(OutputStream outs, String enc). This method begins with special handling for the encodings "UTF-8" and "ISO-8859-1"; only for other encodings does it call setOutput(outs, enc, createEscaper(enc)), which is the path covered by the fix. The special handling for "UTF-8" and "ISO-8859-1" uses org.jibx.runtime.impl.UTF8StreamWriter and org.jibx.runtime.impl.ISO88591StreamWriter, respectively. In this case both stream writers use their own writeAttributeText, to which no equivalent fix has been applied. Their writeTextContent and writeCData methods seem to be affected by the issue as well.
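
To make the kind of handling being asked for concrete, here is a minimal sketch of surrogate-aware attribute escaping. It is not the jibx implementation and not the content of the commit mentioned above; the class `AttributeEscapeSketch`, the method `escapeAttribute`, and the `maxDirect` parameter are hypothetical names chosen for illustration. The idea is to combine a high and low surrogate into one code point before deciding how to write it, instead of checking each `char` on its own. The sketch ignores control characters that are illegal in XML.

```java
import java.io.IOException;

// Sketch only: surrogate-aware escaping of attribute text (hypothetical helper, not jibx code).
final class AttributeEscapeSketch {

    // maxDirect is the highest code point the target encoding can represent directly,
    // e.g. 0x10FFFF for UTF-8 or 0xFF for ISO-8859-1; anything above it becomes a
    // numeric character reference.
    static String escapeAttribute(String text, int maxDirect) throws IOException {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int cp = c;
            if (Character.isHighSurrogate(c)) {
                // A high surrogate is only valid when followed by a low surrogate.
                if (i + 1 >= text.length() || !Character.isLowSurrogate(text.charAt(i + 1))) {
                    throw new IOException("Unpaired high surrogate 0x" + Integer.toHexString(c));
                }
                cp = Character.toCodePoint(c, text.charAt(++i));
            } else if (Character.isLowSurrogate(c)) {
                throw new IOException("Unpaired low surrogate 0x" + Integer.toHexString(c));
            }
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '"') {
                out.append("&quot;");
            } else if (cp > maxDirect) {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // prints &#x1f427;
        System.out.println(escapeAttribute("\ud83d\udc27", 0xFF));
    }
}
```

With `maxDirect` set to 0x10FFFF the pair is passed through unchanged and left for the encoder to turn into a four-byte UTF-8 sequence; with 0xFF it is written as a single character reference. An unpaired surrogate is still rejected, which matches the spirit of the existing error.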

Dani-Hub commented May 3, 2024

A pull request has been provided. I would appreciate a review of #18.

Dani-Hub commented May 5, 2024

A pull request has been provided. I would appreciate a review of #18.
