Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement legacy semantics for string escaping behavior #85

Merged
merged 1 commit into from
Dec 23, 2024

Conversation

dsnet
Copy link
Collaborator

@dsnet dsnet commented Dec 23, 2024

For doubly escaped strings, the v1 behavior was to apply escaping for EscapeForHTML and EscapeForJS on the first pass, rather than on the second pass. This does lead to unnecessarily longer outputs since escaping on the first pass means that the newly included '\' characters have to be escaped again in the second pass. Doubly escaped strings are already an esoteric feature of v1, so we should preserve the semantic.

Also, when encoding the raw output of MarshalJSON, the v1 behavior was to preserve any pre-existing escape sequences, while still escaping any characters that might need escaping per EscapeForHTML and EscapeForJS. Replicate this behavior.

}
q, _ := jsontext.AppendQuote(nil, b) // cannot fail since b is valid UTF-8
return enc.WriteValue(q)
Copy link
Collaborator Author

@dsnet dsnet Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, the escaping for EscapeForHTML and EscapeForJS was deferred to the natural behavior of enc.WriteValue.

However, to replicate existing v1 behavior, we escape earlier in L210 by passing in mo.Flags, which will have the escape flags specified.

b, _ := AppendUnquote(nil, src[:n])
dst, _ = AppendQuote(dst, string(b), flags)
dst, _ = AppendQuote(dst, b, flags)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary casting of []byte to string removed now that AppendQuote is a generic function operating on either type.

@@ -162,6 +162,8 @@ func (v *Value) reformat(canonical, multiline bool, prefix, indent string) error
eo.Flags.Set(jsonflags.AllowInvalidUTF8 | 1)
eo.Flags.Set(jsonflags.AllowDuplicateNames | 1)
eo.Flags.Set(jsonflags.PreserveRawStrings | 1)
eo.Flags.Set(jsonflags.EscapeForHTML | 0) // ensure strings are preserved
eo.Flags.Set(jsonflags.EscapeForJS | 0) // ensure strings are preserved
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are 0 by default anyways, but explicitly set to 0 for clarity.

Copy link
Collaborator

@mvdan mvdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we are noticeably slowing down the v2 benchmarks that don't trigger all the v1 logic. Even if the CPU can predict that the v1 branches will never be hit, we still have to refactor the code a bit to make space for them.

arshal_default.go Outdated Show resolved Hide resolved
@dsnet
Copy link
Collaborator Author

dsnet commented Dec 23, 2024

I wonder if we are noticeably slowing down the v2 benchmarks that don't trigger all the v1 logic. Even if the CPU can predict that the v1 branches will never be hit, we still have to refactor the code a bit to make space for them.

Yep... I haven't been running any of the benchmarks lately and I'm somewhat afraid to find out the result...

For doubly escaped strings, the v1 behavior was to apply
escaping for EscapeForHTML and EscapeForJS on the first pass,
rather than on the second pass. This does lead to unnecessarily
longer outputs since escaping on the first pass means that
the newly included '\\' characters have to be escaped again
in the second pass. Doubly escaped strings are already an
esoteric feature of v1, so we should preserve the semantic.

Also, when encoding the raw output of MarshalJSON,
the v1 behavior was to preserve any pre-existing escape sequences,
while still escaping any characters that might need escaping
per EscapeForHTML and EscapeForJS. Replicate this behavior.
@dsnet dsnet merged commit d2142c8 into master Dec 23, 2024
8 checks passed
@dsnet dsnet deleted the legacy-escape branch December 23, 2024 23:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants