Escape HTML characters in list of suggestions #17189

dontcallmedom · 2019-06-26T20:11:28Z

Generate manual DOM subtree for each marked instance of the queried string instead of generating an unsafe HTML string

> > fix LeaVerou#16932 > > Generate manual DOM subtree for each marked instance of the queried string instead of generating an unsafe HTML string

LeaVerou

Hi Dom!

Thanks for contributing!

It looks like this is something that could be done much more succinctly, by using a function in the second argument of replace() that creates a <mark> element, sets its textContent, then returns its innerHTML. But it's entirely possible I may be missing something here!

Also, please follow the conventions & code style of the project, namely:

We use smart tabs for indentation (tabs for indentation, spaces for alignment)
Closing brace on its own line
We have a $.create() helper for creating elements and setting attributes on them, it works similarly to the one in Bliss. Its implementation is here.

Use tabs instead of spaces for indentation Use $.create to create DOM elements instead of document.createElement

dontcallmedom · 2019-06-27T13:20:14Z

I have hopefully adjusted the code to the coding style.

I may be missing something for the replace function - while it would work fine to generate the mark elements, I don't see how to use it succinctly to escape the HTML characters that would be outside of the matched parts of the string.

LeaVerou · 2019-06-27T18:43:55Z

I have hopefully adjusted the code to the coding style.

Thanks!

I may be missing something for the replace function - while it would work fine to generate the mark elements, I don't see how to use it succinctly to escape the HTML characters that would be outside of the matched parts of the string.

That made me realize that as currently written, this is backwards incompatible: HTML in options works right now, whereas if this is merged, it would stop working and possibly break numerous use cases. Ideally we want something that keeps HTML that was there from the beginning, but doesn't convert special characters to HTML when highlighting them. I.e. something that's XSS-safe, but doesn't restrict what you can do in the initial options. Does that make sense?

dontcallmedom · 2019-06-27T19:31:38Z

re backwards-compatibility, I don't think it can be preserved:

right now, the API is ambiguous about the type of content in label - when rendered in the list options, it is rendered as HTML, but when rendered in the <input> field, it is rendered as plain text (since it goes through the value attribute).
it's not really a matter of highlighting the content of the label with <mark> - the XSS risks exists even when no highlighting happens - if the list of label options contains a malicious <script> (e.g. coming from a third party ajax service, or from an http query parameters), that script will be loaded as soon as the list of labels is displayed

I guess we could make the new behavior opt-in - but given the XSS risks and the existing ambiguity, it seems to me making it the default would be better. That being said, I could imagine making it possible to opt-out of the new behavior (e.g. through a new option) - this would imply adapting the way the labels gets displayed in the <input> to get only the textContent equivalent.

LeaVerou · 2019-06-29T17:53:27Z

right now, the API is ambiguous about the type of content in label - when rendered in the list options, it is rendered as HTML, but when rendered in the <input> field, it is rendered as plain text (since it goes through the value attribute).

I believe that's fine, e.g. many use cases are about displaying graphics next to the suggestions.

it's not really a matter of highlighting the content of the label with <mark> - the XSS risks exist even when no highlighting happens - if the list of label options contains a malicious <script> (e.g. coming from a third party ajax service, or from an http query parameters), that script will be loaded as soon as the list of labels is displayed

Good point, that should be prevented (though I'm still not sure if it should be the default at this point). However, if there's HTML in the initial options, included in the page's markup, that should be preserved, no?

I noticed that the current behavior is definitely buggy: If you include <b> in data-list, it will be rendered (correct). If you include <b>, it will still be rendered! That's definitely a bug which needs to be fixed in both cases (regardless of someone opts in to allowing HTML or not).

I guess we could make the new behavior opt-in - but given the XSS risks and the existing ambiguity, it seems to me making it the default would be better. That being said, I could imagine making it possible to opt-out of the new behavior (e.g. through a new option) - this would imply adapting the way the labels gets displayed in the <input> to get only the textContent equivalent.

I wonder if it would be an easier transition to make it opt-in in this version, and announce that it's going to be default behavior in the next one, and people should use the option now if they need it.

dontcallmedom · 2019-07-01T11:07:57Z

I have looked a bit more into this, and I think there are the following bugs in the way HTML values or labels are handled:

if you import your list of values/labels from in-markup page, only the text is imported, not the markup; but as you pointed out, that text itself is then interpreted as HTML in the list of suggestions (i.e. <b> gets rendered as a <b> tag)
if you use markup in the list of suggestions, the characters of the markup are included in the matching (e.g. <strong> will be matched when searching str); when matched, the markup gets messed up. Conversely, if the list of suggestions includes in-word markup, the word won't be matched (e.g. <strong>J</strong>avascript, then typing java won't list Javascript as a suggestion)
as I described above, markup used in the list of suggestions doesn't get displayed properly when rendered in the input box
as described above, the combination of the ambiguous API and parsing bugs create XSS risks (e.g. when the suggestions come from unfiltered external input)

Here is how I think HTML in the list of suggestions could be properly supported with reduced XSS risks:

suggestions brought in by text should always be assumed to be text only (e.g. when declared in data-list)
suggestions that need markup rendering should be provided as DOM fragments - which could be done either when they're imported via the CSS selector syntax, or could be done when passed as arrays in JS (e.g. list: [ node1, node2]) - the assumption being that it would then be the responsibility of the developer to produce these fragments from strings if that's the behavior they want (e.g. with DOMParser or innerHTML) - that way, developers that are aware of the risk of innerHTML will have greater awareness of possible XSS risks if the strings are unsafe

This would imply adopting the code to handle differently text and DOM fragments, and converting DOM fragments to text via textContent when setting input.value and when matching user input. I think (but am not sure) the highlight of matching characters in DOM fragments could still be achieved with the Range API.

What do you think? (I'm leaving aside the backwards-compatibility question for now - I feel it will be easier to consider it once the end goal is clearer)

Escape HTML characters in list of suggestions

56a8afa

> > fix LeaVerou#16932 > > Generate manual DOM subtree for each marked instance of the queried string instead of generating an unsafe HTML string

LeaVerou requested changes Jun 27, 2019

View reviewed changes

Align with coding style

f1a7b6c

Use tabs instead of spaces for indentation Use $.create to create DOM elements instead of document.createElement

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Escape HTML characters in list of suggestions #17189

Escape HTML characters in list of suggestions #17189

dontcallmedom commented Jun 26, 2019

LeaVerou left a comment

dontcallmedom commented Jun 27, 2019

LeaVerou commented Jun 27, 2019

dontcallmedom commented Jun 27, 2019

LeaVerou commented Jun 29, 2019

dontcallmedom commented Jul 1, 2019

Escape HTML characters in list of suggestions #17189

Are you sure you want to change the base?

Escape HTML characters in list of suggestions #17189

Conversation

dontcallmedom commented Jun 26, 2019

LeaVerou left a comment

Choose a reason for hiding this comment

dontcallmedom commented Jun 27, 2019

LeaVerou commented Jun 27, 2019

dontcallmedom commented Jun 27, 2019

LeaVerou commented Jun 29, 2019

dontcallmedom commented Jul 1, 2019