Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape HTML characters in list of suggestions #17189

Open
wants to merge 2 commits into
base: gh-pages
Choose a base branch
from

Conversation

dontcallmedom
Copy link

fix #16932

Generate manual DOM subtree for each marked instance of the queried string instead of generating an unsafe HTML string

>
> fix LeaVerou#16932
>
> Generate manual DOM subtree for each marked instance of the queried string instead of generating an unsafe HTML string
Copy link
Owner

@LeaVerou LeaVerou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Dom!

Thanks for contributing!

It looks like this is something that could be done much more succinctly, by using a function in the second argument of replace() that creates a <mark> element, sets its textContent, then returns its innerHTML. But it's entirely possible I may be missing something here!

Also, please follow the conventions & code style of the project, namely:

  • We use smart tabs for indentation (tabs for indentation, spaces for alignment)
  • Closing brace on its own line
  • We have a $.create() helper for creating elements and setting attributes on them, it works similarly to the one in Bliss. Its implementation is here.

Use tabs instead of spaces for indentation
Use $.create to create DOM elements instead of document.createElement
@dontcallmedom
Copy link
Author

I have hopefully adjusted the code to the coding style.

I may be missing something for the replace function - while it would work fine to generate the mark elements, I don't see how to use it succinctly to escape the HTML characters that would be outside of the matched parts of the string.

@LeaVerou
Copy link
Owner

I have hopefully adjusted the code to the coding style.

Thanks!

I may be missing something for the replace function - while it would work fine to generate the mark elements, I don't see how to use it succinctly to escape the HTML characters that would be outside of the matched parts of the string.

That made me realize that as currently written, this is backwards incompatible: HTML in options works right now, whereas if this is merged, it would stop working and possibly break numerous use cases. Ideally we want something that keeps HTML that was there from the beginning, but doesn't convert special characters to HTML when highlighting them. I.e. something that's XSS-safe, but doesn't restrict what you can do in the initial options. Does that make sense?

@dontcallmedom
Copy link
Author

re backwards-compatibility, I don't think it can be preserved:

  • right now, the API is ambiguous about the type of content in label - when rendered in the list options, it is rendered as HTML, but when rendered in the <input> field, it is rendered as plain text (since it goes through the value attribute).
  • it's not really a matter of highlighting the content of the label with <mark> - the XSS risks exists even when no highlighting happens - if the list of label options contains a malicious <script> (e.g. coming from a third party ajax service, or from an http query parameters), that script will be loaded as soon as the list of labels is displayed

I guess we could make the new behavior opt-in - but given the XSS risks and the existing ambiguity, it seems to me making it the default would be better. That being said, I could imagine making it possible to opt-out of the new behavior (e.g. through a new option) - this would imply adapting the way the labels gets displayed in the <input> to get only the textContent equivalent.

@LeaVerou
Copy link
Owner

right now, the API is ambiguous about the type of content in label - when rendered in the list options, it is rendered as HTML, but when rendered in the <input> field, it is rendered as plain text (since it goes through the value attribute).

I believe that's fine, e.g. many use cases are about displaying graphics next to the suggestions.

it's not really a matter of highlighting the content of the label with <mark> - the XSS risks exist even when no highlighting happens - if the list of label options contains a malicious <script> (e.g. coming from a third party ajax service, or from an http query parameters), that script will be loaded as soon as the list of labels is displayed

Good point, that should be prevented (though I'm still not sure if it should be the default at this point). However, if there's HTML in the initial options, included in the page's markup, that should be preserved, no?

I noticed that the current behavior is definitely buggy: If you include <b> in data-list, it will be rendered (correct). If you include &lt;b&gt;, it will still be rendered! That's definitely a bug which needs to be fixed in both cases (regardless of someone opts in to allowing HTML or not).

I guess we could make the new behavior opt-in - but given the XSS risks and the existing ambiguity, it seems to me making it the default would be better. That being said, I could imagine making it possible to opt-out of the new behavior (e.g. through a new option) - this would imply adapting the way the labels gets displayed in the <input> to get only the textContent equivalent.

I wonder if it would be an easier transition to make it opt-in in this version, and announce that it's going to be default behavior in the next one, and people should use the option now if they need it.

@dontcallmedom
Copy link
Author

I have looked a bit more into this, and I think there are the following bugs in the way HTML values or labels are handled:

  • if you import your list of values/labels from in-markup page, only the text is imported, not the markup; but as you pointed out, that text itself is then interpreted as HTML in the list of suggestions (i.e. &lt;b&gt; gets rendered as a <b> tag)
  • if you use markup in the list of suggestions, the characters of the markup are included in the matching (e.g. <strong> will be matched when searching str); when matched, the markup gets messed up. Conversely, if the list of suggestions includes in-word markup, the word won't be matched (e.g. <strong>J</strong>avascript, then typing java won't list Javascript as a suggestion)
  • as I described above, markup used in the list of suggestions doesn't get displayed properly when rendered in the input box
  • as described above, the combination of the ambiguous API and parsing bugs create XSS risks (e.g. when the suggestions come from unfiltered external input)

Here is how I think HTML in the list of suggestions could be properly supported with reduced XSS risks:

  • suggestions brought in by text should always be assumed to be text only (e.g. when declared in data-list)
  • suggestions that need markup rendering should be provided as DOM fragments - which could be done either when they're imported via the CSS selector syntax, or could be done when passed as arrays in JS (e.g. list: [ node1, node2]) - the assumption being that it would then be the responsibility of the developer to produce these fragments from strings if that's the behavior they want (e.g. with DOMParser or innerHTML) - that way, developers that are aware of the risk of innerHTML will have greater awareness of possible XSS risks if the strings are unsafe

This would imply adopting the code to handle differently text and DOM fragments, and converting DOM fragments to text via textContent when setting input.value and when matching user input. I think (but am not sure) the highlight of matching characters in DOM fragments could still be achieved with the Range API.

What do you think? (I'm leaving aside the backwards-compatibility question for now - I feel it will be easier to consider it once the end goal is clearer)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Not processing text containing html tags properly
2 participants