Implement legacy semantics for string escaping behavior #85

dsnet · 2024-12-23T21:02:44Z

For doubly escaped strings, the v1 behavior was to apply escaping for EscapeForHTML and EscapeForJS on the first pass, rather than on the second pass. This leads to unnecessarily long output, since escaping on the first pass means that the newly introduced '\' characters have to be escaped again in the second pass. Doubly escaped strings are already an esoteric feature of v1, so we should preserve the semantics.
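To make the first-pass behavior concrete, here is a minimal sketch that uses the standard library's v1 `encoding/json` and its `,string` struct tag option to produce a doubly escaped string. The `quoted` type and `marshalQuoted` helper are hypothetical names used only for illustration; they are not part of this change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// quoted is a hypothetical type for illustration. The v1 `,string` option
// encodes the field as JSON and then encodes that JSON text again as a
// JSON string, which is what produces a doubly escaped string.
type quoted struct {
	S string `json:",string"`
}

// marshalQuoted is a hypothetical helper that marshals s with v1 semantics.
func marshalQuoted(s string) string {
	b, err := json.Marshal(quoted{S: s})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// json.Marshal escapes HTML characters by default, so '<' is escaped
	// on the first pass and the resulting backslash is escaped again on
	// the second pass.
	fmt.Println(marshalQuoted("<"))
}
```

With v1, this should print `{"S":"\"\\u003c\""}`: the inner string already contains `\u003c` before it is quoted a second time, which is why the output is longer than escaping only on the second pass would give.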

Also, when encoding the raw output of MarshalJSON, the v1 behavior was to preserve any pre-existing escape sequences, while still escaping any characters that might need escaping per EscapeForHTML and EscapeForJS. Replicate this behavior.
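The MarshalJSON case can be observed with v1 `encoding/json` as well. The sketch below uses a hypothetical `raw` type whose MarshalJSON returns its contents verbatim, so the existing `\u0041` escape sequence and an unescaped `<` both reach the encoder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// raw is a hypothetical type for illustration. Its MarshalJSON returns the
// underlying text unchanged, so the text must already be valid JSON.
type raw string

func (r raw) MarshalJSON() ([]byte, error) {
	return []byte(r), nil
}

// marshalRaw is a hypothetical helper that marshals JSON text through v1.
func marshalRaw(jsonText string) string {
	b, err := json.Marshal(raw(jsonText))
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// The input already contains the escape sequence \u0041 and a raw '<'.
	fmt.Println(marshalRaw(`"\u0041<"`))
}
```

With v1, this should print `"\u0041\u003c"`: the pre-existing escape sequence is kept as written rather than being normalized to `A`, while the `<` is still escaped because HTML escaping is enabled by default in `json.Marshal`.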

dsnet · 2024-12-23T21:04:52Z

arshal_default.go

 			}
+			q, _ := jsontext.AppendQuote(nil, b) // cannot fail since b is valid UTF-8
 			return enc.WriteValue(q)


Previously, the escaping for EscapeForHTML and EscapeForJS was deferred to the natural behavior of enc.WriteValue.

However, to replicate existing v1 behavior, we escape earlier in L210 by passing in mo.Flags, which will have the escape flags specified.

dsnet · 2024-12-23T21:05:19Z

internal/jsonwire/encode.go

 	b, _ := AppendUnquote(nil, src[:n])
-	dst, _ = AppendQuote(dst, string(b), flags)
+	dst, _ = AppendQuote(dst, b, flags)


Unnecessary casting of []byte to string removed now that AppendQuote is a generic function operating on either type.
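As an illustration of why the conversion becomes unnecessary, here is a minimal sketch of a quoting function that is generic over strings and byte slices. `appendQuoteSketch` is a hypothetical, heavily simplified stand-in: it takes no flags, does not validate UTF-8, and is not the signature or implementation of the real AppendQuote.

```go
package main

import "fmt"

const hexDigits = "0123456789abcdef"

// appendQuoteSketch is a hypothetical, simplified quoting function.
// The type parameter lets callers pass either a string or a []byte
// without converting between the two.
// Assumption: src is valid UTF-8; bytes >= 0x80 are copied through as-is.
func appendQuoteSketch[Bytes ~[]byte | ~string](dst []byte, src Bytes) []byte {
	dst = append(dst, '"')
	for i := 0; i < len(src); i++ {
		switch c := src[i]; {
		case c == '"' || c == '\\':
			// Characters that must always be escaped inside a JSON string.
			dst = append(dst, '\\', c)
		case c < 0x20:
			// Control characters use the \u00XX form.
			dst = append(dst, '\\', 'u', '0', '0', hexDigits[c>>4], hexDigits[c&0xf])
		default:
			dst = append(dst, c)
		}
	}
	return append(dst, '"')
}

func main() {
	// Both calls compile without any conversion at the call site.
	fmt.Println(string(appendQuoteSketch(nil, `a"b`)))
	fmt.Println(string(appendQuoteSketch(nil, []byte(`a\b`))))
}
```

Because the constraint admits both kinds of type, a caller that already holds a `[]byte` (such as the result of AppendUnquote in the diff above) can pass it directly and avoid the copy that a `string(b)` conversion may incur.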

dsnet · 2024-12-23T21:05:46Z

jsontext/value.go

@@ -162,6 +162,8 @@ func (v *Value) reformat(canonical, multiline bool, prefix, indent string) error
 		eo.Flags.Set(jsonflags.AllowInvalidUTF8 | 1)
 		eo.Flags.Set(jsonflags.AllowDuplicateNames | 1)
 		eo.Flags.Set(jsonflags.PreserveRawStrings | 1)
+		eo.Flags.Set(jsonflags.EscapeForHTML | 0) // ensure strings are preserved
+		eo.Flags.Set(jsonflags.EscapeForJS | 0)   // ensure strings are preserved


These are 0 by default anyway, but they are explicitly set to 0 for clarity.

mvdan

I wonder if we are noticeably slowing down the v2 benchmarks that don't trigger all the v1 logic. Even if the CPU can predict that the v1 branches will never be hit, we still have to refactor the code a bit to make space for them.

dsnet · 2024-12-23T23:04:34Z

> I wonder if we are noticeably slowing down the v2 benchmarks that don't trigger all the v1 logic. Even if the CPU can predict that the v1 branches will never be hit, we still have to refactor the code a bit to make space for them.

Yep... I haven't been running any of the benchmarks lately and I'm somewhat afraid to find out the result...

dsnet requested review from johanbrandhorst and mvdan December 23, 2024 21:02

dsnet force-pushed the legacy-escape branch from 44daefa to 78c0b90 Compare December 23, 2024 21:06

mvdan approved these changes Dec 23, 2024

dsnet force-pushed the legacy-escape branch from 78c0b90 to bf38a24 Compare December 23, 2024 23:34

dsnet merged commit d2142c8 into master Dec 23, 2024
8 checks passed

dsnet deleted the legacy-escape branch December 23, 2024 23:37
