Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure parser error snippets are valid UTF-8 #756

Merged
merged 1 commit into from
Feb 26, 2025

Conversation

byroot
Copy link
Member

@byroot byroot commented Feb 25, 2025

Fix: #755

Error messages now include a snippet of the document that doesn't parse to help locate the issue, however the way it was done wasn't UTF-8 aware, and it could result in exception messages with truncated characters.

It would be nice to go a bit farther and actually support codepoints, but it's a lot of complexity to do it in C, perhaps if we move that logic to Ruby given it's not a performance sensitive codepath.

Fix: ruby#755

Error messages now include a snippet of the document
that doesn't parse to help locate the issue, however
the way it was done wasn't UTF-8 aware, and it could
result in exception messages with truncated characters.

It would be nice to go a bit farther and actually support
codepoints, but it's a lot of complexity to do it in C,
perhaps if we move that logic to Ruby given it's not a
performance sensitive codepath.
@byroot byroot merged commit f3e1136 into ruby:master Feb 26, 2025
33 checks passed
@byroot byroot deleted the utf8-snippets branch February 26, 2025 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for multibyte characters in ParserError exception messages
1 participant