Don't allow `[` or `]` in XML names. #187
Merged
This is an example of a DOCTYPE that was not being parsed correctly before:

```
<!DOCTYPE language[
<!ENTITY nmtoken "[\-\w\d\.:_]+">
<!ENTITY entref "(#[0-9]+|#[xX][0-9A-Fa-f]+|&nmtoken;);">
]>
```

xml-conduit was parsing `language[` as the root element name.

I have kept to the most minimal possible change in this PR, because I don't want to break anything inadvertently. However, the current parser is still far from correct. As I understand it, only a few symbols (`_`, `-`, `.`) are allowed in element names (in addition, `:` can be used for a namespace, but that is supported separately in this parser). The current parser would accept things like `<foo~bar>`.
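To make the name rule above concrete, here is a minimal sketch in Python of a stricter name reader. It is not xml-conduit's code and not the change made in this PR. It assumes the ASCII subset of the XML 1.0 `Name` production and ignores the non-ASCII ranges the specification also allows. The names `take_name`, `NAME_START` and `NAME_CHAR` are hypothetical. Unlike the parser discussed here, the sketch treats `:` as an ordinary name character instead of handling namespaces separately.

```python
import string
from typing import Tuple

# Assumption: ASCII-only subset of the XML 1.0 Name production.
# A name may start with a letter, '_' or ':'.
NAME_START = set(string.ascii_letters) | {"_", ":"}
# Later characters may also be digits, '-' or '.'.
NAME_CHAR = NAME_START | set(string.digits) | {"-", "."}


def take_name(text: str) -> Tuple[str, str]:
    """Split text into (leading XML name, remainder).

    The name is the empty string when text does not start with a
    valid name-start character.
    """
    if not text or text[0] not in NAME_START:
        return "", text
    end = 1
    # Consume characters until the first one that cannot be part of a name.
    while end < len(text) and text[end] in NAME_CHAR:
        end += 1
    return text[:end], text[end:]


if __name__ == "__main__":
    # '[' is not a name character, so the name stops before it.
    print(take_name("language[ <!ENTITY nmtoken ...>"))
    # '~' is not a name character either.
    print(take_name("foo~bar>"))
```

With a rule like this, `language[` is read as the name `language` followed by `[`, and `<foo~bar>` would not be accepted as a single element name, because the name ends at `~`.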
The CI failures seem unrelated to my change.
jgm added a commit to jgm/skylighting that referenced this pull request (Jun 20, 2023):
This is to work around a bug in xml-conduit: snoyberg/xml#187
Looks good to me, thank you. I've opened #188 to keep track of the remaining gap with the XML names specification. I'll have a look at the CI issue.
Thanks. It would be great to have a release with this fix!
Released as |
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request (Oct 30, 2023):
0.14
* Add rWeakDeliminators field to Rule. [API change]
* Make WordDetect sensitive to weakDeliminator. This fixes parsing of floats beginning with '0.' in C (#174).
* Add debiancontrol syntax (#173).

0.13.4.1
* Update syntax definitions: ada, bash, cmake, css, html, isocpp, java, javascript, kotlin, latex, makefile, markdown, php, python, qml, r, sass, scss, typescript, zsh.
* Don't require word boundary at end of Int, Float, HlCHex, HlCOct (#170). KDE does not. This fixes things like 7L in R.

0.13.4
* Add dosbat syntax (MS DOS batch file) (#169).
* Derive Bounded Instance for TokenType (#168, Pavan Pikhi). Add Bounded to the derived instances for the TokenType type. This allows consumers to use [minBound .. maxBound] to generate a list of all token types when writing a Style.
* Require xml-conduit >= 1.9.1.3. This fixes a bug that prevents parsing certain DOCTYPE declarations, e.g. in agda.xml.
* Updated cmake syntax definition.

0.13.3
* Add gap language (#167).
* Update syntax definitions.
* Add patches for agda.xml and dtd.xml, to work around a bug in xml-conduit: snoyberg/xml#187
* Store compiled regexes in RE (#166, Jonathan Coates). This changes the RE type to (lazily) compile the regex when constructed, rather than in the tokenizer. This allows us to avoid re-compiling regexes for each separate tokenize call, instead sharing them globally. We try to hide the internals of this, exposing the previous interface (RE { reString, reCaseSensitive }) with pattern synonyms.
* ConTeXt: fix handling of spaces in non-normal tokens (Albert Krewinkel). This ensures that multiple spaces won't be collapsed into a single space.

0.13.2.1
* Update tango style for new token types (#164). The original tango style didn't have colors defined for many token types that have been added since it was added. This commit updates the style to support them. Thanks to @danbraswell for providing the values needed.