This code reproduces a bug in the JDK's internal XSLT transformer.

dkBrazz/reproduce-jdk-xslt-bug

Bug report: https://bugs.openjdk.org/browse/JDK-8207760 (already resolved).

Reproduce bug in JDK XSLT Transformer

The transformer fails with the exception "SAXException: Invalid UTF-16 surrogate detected: d83c ?"
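
A minimal reproduction sketch follows. It is not the code from this repository: it assumes an identity transform (no stylesheet) obtained from `TransformerFactory.newInstance()`, and the class and method names are hypothetical. Because the exact offset of the internal chunk boundary depends on the parser, the sketch pads the text in front of the emoji with 0 to 2100 `a` characters so that the surrogate pair crosses every candidate boundary, and records the padding lengths for which the transform fails. On an affected JDK it should report failures, assuming the identity path reaches the same serializer code as the repository's transformation; on a JDK with the fix it should report none.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical reproduction harness (assumption: an identity transform is enough to hit the bug).
public class ReproduceSurrogateSplit {

    // U+1F342 FALLEN LEAF, stored in a Java String as the surrogate pair D83C DF42.
    static final String FALLEN_LEAF = new String(Character.toChars(0x1F342));

    // Runs an identity transform (no stylesheet) over the given XML text.
    static String transform(TransformerFactory factory, String xml) throws TransformerException {
        Transformer transformer = factory.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Returns every padding length for which the transform threw an exception.
    static List<Integer> failingPaddings(int maxPadding) {
        TransformerFactory factory = TransformerFactory.newInstance();
        List<Integer> failures = new ArrayList<>();
        for (int padding = 0; padding <= maxPadding; padding++) {
            // Shift the emoji one position further each iteration.
            String xml = "<root>" + "a".repeat(padding) + FALLEN_LEAF + "</root>";
            try {
                transform(factory, xml);
            } catch (TransformerException | RuntimeException e) {
                failures.add(padding);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Integer> failures = failingPaddings(2100);
        System.out.println(failures.isEmpty()
                ? "No failures: this JDK handles split surrogate pairs"
                : "Transform failed for padding lengths " + failures);
    }
}
```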

During processing, the XML stream is split into chunks of 1024 bytes.
If a character that takes two UTF-16 chars (a surrogate pair, e.g. an emoji) falls on the boundary between two chunks, the first chunk ends with the first char of the pair and the second chunk starts with the second char.
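
The sketch below illustrates the general technique a chunk consumer needs: if a chunk ends with a high surrogate, hold that char back and prepend it to the next chunk. `ChunkedSurrogateDemo` and its methods are hypothetical; the `characters(char[], int, int)` shape only mirrors SAX's `ContentHandler.characters`, and this is not the actual JDK fix. With `carryOver` set to `false` it behaves like the buggy path and rejects the lone high surrogate.

```java
// Hypothetical chunk consumer; not JDK code.
public class ChunkedSurrogateDemo {
    private final boolean carryOver;       // false = naive behaviour that reproduces the failure
    private final StringBuilder out = new StringBuilder();
    private char pendingHigh = 0;          // 0 = nothing held back (a held char is always a surrogate)

    ChunkedSurrogateDemo(boolean carryOver) {
        this.carryOver = carryOver;
    }

    // Receives one chunk of characters, shaped like SAX ContentHandler.characters().
    public void characters(char[] ch, int start, int length) {
        StringBuilder chunk = new StringBuilder(length + 1);
        if (pendingHigh != 0) {
            chunk.append(pendingHigh);     // re-attach the high surrogate held back last time
            pendingHigh = 0;
        }
        chunk.append(ch, start, length);
        int last = chunk.length() - 1;
        if (carryOver && last >= 0 && Character.isHighSurrogate(chunk.charAt(last))) {
            // The matching low surrogate will arrive with the next chunk.
            pendingHigh = chunk.charAt(last);
            chunk.setLength(last);
        }
        write(chunk);
    }

    // Emits text, rejecting any surrogate that is not part of a complete pair.
    private void write(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c) && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                out.appendCodePoint(Character.toCodePoint(c, text.charAt(i + 1)));
                i++;
            } else if (Character.isSurrogate(c)) {
                throw new IllegalStateException(
                        "Invalid UTF-16 surrogate detected: " + Integer.toHexString(c));
            } else {
                out.append(c);
            }
        }
    }

    // Returns the accumulated text; a high surrogate still pending here is a real error.
    public String finish() {
        if (pendingHigh != 0) {
            throw new IllegalStateException(
                    "Dangling high surrogate: " + Integer.toHexString(pendingHigh));
        }
        return out.toString();
    }

    // Splits text into fixed-size chunks and feeds them to a new consumer.
    static String feedInChunks(String text, int chunkSize, boolean carryOver) {
        ChunkedSurrogateDemo consumer = new ChunkedSurrogateDemo(carryOver);
        char[] chars = text.toCharArray();
        for (int start = 0; start < chars.length; start += chunkSize) {
            consumer.characters(chars, start, Math.min(chunkSize, chars.length - start));
        }
        return consumer.finish();
    }
}
```

For example, `feedInChunks("a".repeat(1023) + "\uD83C\uDF42", 1024, false)` throws `IllegalStateException: Invalid UTF-16 surrogate detected: d83c`, while the same call with `true` returns the input unchanged.
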
In the given example the text contains the "fallen leaf" Unicode symbol (https://www.compart.com/en/unicode/U+1F342). In UTF-16 it is represented by two chars, 0xD83C and 0xDF42. When the second char is carried over to the next chunk, the first char 0xD83C is left on its own and is rejected as an invalid (unpaired) surrogate.
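
The two chars quoted above follow from the standard UTF-16 encoding of a supplementary code point: subtract 0x10000, then add the top 10 bits of the 20-bit result to 0xD800 and the bottom 10 bits to 0xDC00. The hypothetical `FallenLeafSurrogates` class below does this by hand for U+1F342 and cross-checks the result against `Character.toChars`.

```java
import java.util.Arrays;

// Hypothetical helper showing how U+1F342 becomes the pair D83C DF42.
public class FallenLeafSurrogates {

    // Splits a supplementary code point (U+10000..U+10FFFF) into high and low surrogates.
    static char[] toSurrogates(int codePoint) {
        int v = codePoint - 0x10000;                // 20-bit value
        char high = (char) (0xD800 + (v >> 10));    // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));   // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1F342);
        System.out.printf("U+1F342 -> %04X %04X%n", (int) pair[0], (int) pair[1]);
        // The JDK's own conversion yields the same two chars.
        System.out.println(Arrays.equals(pair, Character.toChars(0x1F342)));
        // pair[0] is only valid when immediately followed by pair[1].
        System.out.println(Character.isHighSurrogate(pair[0]) && Character.isLowSurrogate(pair[1]));
    }
}
```

Running it prints `U+1F342 -> D83C DF42` followed by `true` twice.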
