Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow bool $doubleEncode optional param to escapeHtml() #54

Closed
wants to merge 2 commits into from

Conversation

objecttothis
Copy link

@objecttothis objecttothis commented Mar 25, 2024

Allows $doubleEncode to be sent to htmlspecialchars() in html context.

Q A
Documentation no
Bugfix no
BC Break no
New Feature yes
RFC yes
QA no

Description

Tell us about why this change is necessary:

  • Are you adding something the library currently does not support?

Yes

  • Why should it be added?

current behavior in html context causes & & to be encoded as & &. htmlspecialchars() has a fourth optional parameter bool $double_encode = true, so by default it's double encoding.

  • What will it enable?

This change will allow users to call escapeHtml() with a 2nd (optional) parameter which tells it not to double encode parts of the string which are already encoded.

  • How will the code be used?

escapeHtml('& &', false); would yield & &. escapeHtml('& &'); would still yield & &

  • TARGET THE NEXT MINOR BRANCH OR THE NEXT MAJOR IF BC WILL BE BROKEN.

BC will not be broken because the 2nd parameter is an optional parameter equal to current behavior of double encoding. 2.14.x targeted

Allows $double_encode to be sent to htmlspecialchars(). This is a non-breaking change.

Signed-off-by: objecttothis <[email protected]>
@objecttothis objecttothis changed the title Allow bool $double_encode optional param to escapeHtml() Allow bool $doubleEncode optional param to escapeHtml() Mar 25, 2024
Copy link
Member

@Ocramius Ocramius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current behavior in html context causes &amp; & to be encoded as &amp;amp; &amp;. htmlspecialchars() has a fourth optional parameter bool $double_encode = true, so by default it's double encoding.

I don't think we should do this at all: &amp; & being encoded as &amp;amp; &amp; is expected behavior.

I wouldn't put my fire on laminas-escaper being fully bi-directional as it is, but we should strive for it.

$string === decode(encode($string)) is what we should kinda have, at logical constraint.

@Ocramius Ocramius self-assigned this Mar 25, 2024
@Ocramius Ocramius added the Invalid This doesn't seem right label Mar 25, 2024
@Ocramius Ocramius closed this Mar 25, 2024
@Ocramius
Copy link
Member

To be clear, encoding is a bi-directional process: you encode and de-code.

The fact that you skip re-encoding means that decoding will lead to invalid conversions, breaking this bidirectionality.

@objecttothis
Copy link
Author

To be clear, encoding is a bi-directional process: you encode and de-code.

The fact that you skip re-encoding means that decoding will lead to invalid conversions, breaking this bidirectionality.

Why does PHP offer this exact functionality in htmlspecialchars() and htmlentities() if not for this exact situation where double encoding breaks decoding?

@Ocramius
Copy link
Member

PHP is full of quirks and bad ideas: it's not a good design reference, but it cannot easily remove arguments, once they are introduced, mostly for backwards compatibility.

I traced back the last time this argument handling was refactored (not introduced: it's likely much older) to ~14y ago in php/php-src@91727cb

Be aware that these php-src arguments were already present when this Laminas component was designed.

@objecttothis
Copy link
Author

objecttothis commented Mar 25, 2024

$string === decode(encode($string)) is what we should kinda have, at logical constraint.

This is what I have in my workaround code. It's problem is that it either html encodes the entire string or none at all. I think I understand your argument for bi-directionality of laminas-escaper in all aspects to mean that with my addition

$foo = '&amp; &';
$sameThing = htmlspecialchars_decode(escapeHtml($foo, false)')) === $foo;

$sameThing would be false because the result of the decode would be & & not &amp; & Is this what you mean?

Please humor me for a moment. Imagine I'm you before you learned that it was bad practice and explain why it's bad design in this context:

I have <div>Description: <?= $htmlspecialchars($foo, ENT_QUOTES, 'utf-8') ?></div> where $foo is &amp; &. That's an odd description, but it's an example of partially encoded text that wouldn't be uncommon to contain those characters if users are going to use the software. In my code here I end up with Description: &amp;amp; &amp; and now we have malformed html entities. If instead I write $htmlspecialchars($foo, ENT_QUOTES, 'utf-8, false) I'm left with Description: &amp; &amp; and no more malformed html entities.

Now if I don't have $double_encode (as is the case with current escapteHtml() I have to write a wrapper function with a callback and regex to test if each found string is an encoded entity and encode it if it's not already encoded in order to avoid double_encoding the whole string.

@Ocramius
Copy link
Member

where $foo is &amp; &.
In my code here I end up with Description: &amp;amp; &amp;
and now we have malformed html entities

The user sees the text &amp; & rendered on their screen / printed label / form field / etc.: this is intentional/expected/correct.

Your content is either text (and should be escaped), or it is HTML for which you'd heavily sanitize before keeping anything as-is.

Anything in-between is not part of this component: it's a wasteland that nobody should traverse :-)
If you want to interpret &amp; & as half-HTML, half-text (where you want to keep the special characters), that's the job for a (very heavily restricted) HTML sanitizer, not for an escaper.

@objecttothis objecttothis deleted the patch-1 branch March 26, 2024 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Invalid This doesn't seem right Unit Test Needed Won't Fix This will not be worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants