Merge pull request #45 from TD5/string-interpolation-eep

Add EEP 62: String interpolation syntax
erlang · Dec 20, 2023 · 201adff
2 parents de0bfbc + a360e53
commit 201adff
Showing 1 changed file with 168 additions and 0 deletions.
diff --git a/eep-0062.md b/eep-0062.md
@@ -0,0 +1,168 @@
+    Author: Tom Davies <todavies5(at)gmail(dot)com>
+    Status: Draft
+    Type: Standards Track
+    Created: 1-Jun-2023
+    Post-History:
+****
+EEP 62: String interpolation syntax
+----
+
+Abstract
+========
+
+This EEP proposes new syntax for string interpolation, allowing expressions to be embedded
+into string constants to make constructing compound strings more readable.
+
+For example, the new syntax:
+
+```
+bf"A utf-8 binary string: ~2 + 2~"
+```
+
+would evaluate to:
+
+```
+<<"A utf-8 binary string: 4"/utf8>>
+```
+
+Feature outline
+========
+
+This proposal adds four kinds of string interpolation, split over two axes: the result
+type (UTF-8 binary or Unicode codepoint list) and the intended audience (user-facing or
+developer-facing formatting).
+
+The result is four general classes of syntax with interpolated values:
+
+```
+% binary format
+<<"A utf-8 binary string: 4"/utf8>> =
+  bf"A utf-8 binary string: ~2 + 2~"
+```
+
+```
+% list format
+"A unicode codepoint list string: 4" =
+  lf"A unicode codepoint list string: ~2 + 2~"
+```
+
+```
+% binary debug
+<<"A utf-8 binary string: {4, foo, [x, y, z]}"/utf8>> =
+  bd"A utf-8 binary string: ~{2 + 2, foo, [x, y, z]}~"
+```
+
+```
+% list debug
+"A unicode codepoint list string: {4, foo, [x, y, z]}" =
+  ld"A unicode codepoint list string: ~{2 + 2, foo, [x, y, z]}~"
+```
+
+Arbitrary expressions can be nested inside string interpolation
+substitutions, including variables, function calls, macros and
+even further string interpolation expressions.
+
+Design
+======
+
+Why both list- and binary-strings?
+-----------------------------
+
+In the `string` module from the stdlib, a string is represented by
+`unicode:chardata()`, that is, a list of codepoints, binaries with
+UTF-8-encoded codepoints (UTF-8 binaries), or a mix of the two.
+
+With this in mind, the list- and binary-oriented string interpolation
+syntaxes accept either type of interpolated value, but the user
+of the interpolation determines whether they want to generate a
+`unicode:char_list()` or `unicode:unicode_binary()` based on which
+kind of interpolation they use (`bf"..."` and `bd"..."` to create
+binaries, or `lf"..."` and `ld"..."` to create lists).
+
+List-strings are most useful for backwards compatibility and convenience.
+Binary-strings are most useful for memory-compactness and IO.
+
+Why user- and developer-oriented strings?
+-----------------------------------------
+
+There are two similar, but distinct cases where developers typically
+want to format strings: when logging/debugging, and when displaying
+data to users.
+
+When logging or debugging, the most important features are typically
+that any kind of term can be printed, and that the output round-trips
+losslessly and can be read unambiguously by developers. This means
+retaining runtime type information, e.g. keeping strings quoted when
+formatting them, and printing floats with full range and resolution.
+
+When displaying to users, the most important features are typically
+that the output is always human-readable and cleanly formatted.
+This means formatting strings verbatim, without quotation marks, and
+not leaking any Erlang-isms. For example, we don't want to print
+Erlang tuples, because they won't make much sense to the average
+application consumer, so we'd rather get a `badarg` error that pushes
+the developer to make an explicit formatting decision.
+
+Why no formatting options?
+--------------------------
+
+Let's consider the two use-cases introduced earlier:
+
+- Logging/debugging: Typically you want to fire-and-forget, giving
+  whatever value you care about to the formatter, and just let it
+  print that value unambiguously, meaning there's no need to tweak
+  formatting options: `bd"~Timestamp~: ~Query~ returned ~Result~"`
+- Displaying to users: Typically you want to tightly control formatting,
+  and you probably want to do so in a modular and reusable way. In that
+  case, factoring out your formatting decision to a function, and
+  interpolating the result of that function is probably the best way to
+  go: `bf"Your account balance is now ~my_app:format_balance(Currency, Balance)~"`.
+
+Notably, nothing in the design and implementation here precludes the
+future introduction of formatting options such as `bf"float: ~.2f(MyFloat)~"` as one might do
+with `io_lib:format` etc. But existing stdlib functions can offer
+similar functionality, e.g. `bf"float: ~float_to_binary(MyFloat, [{decimals, 2}, compact])~"`,
+and can be factored out into their own reusable functions.
+
+Why not use Elixir's syntax?
+----------------
+
+Elixir uses `#{...}` to introduce an interpolated expression within a string, and it would
+be convenient to reuse that syntax. Unfortunately, it conflicts with Erlang's syntax for
+maps. Elixir writes maps as `%{...}`, so it does not have that conflict.
+
+Implementation outline
+==============
+
+To parse interpolated strings, the scanner tracks additional state
+recording whether it is currently inside an interpolated string. While
+it is, the scanner recognizes `~` as the delimiter for interpolated
+expressions and generates new tokens which represent the various
+components of an interpolated string.
+
+Early during compilation and shell evaluation, interpolated strings are
+desugared into calls to functions from the `io_lib` module, and
+therefore don't impact later stages of compilation or evaluation.
+
+Reference Implementation
+========
+
+PR [#7343](https://github.com/erlang/otp/pull/7343)
+
+Backward compatibility
+========
+
+The new string interpolation syntax was not previously valid, so
+tooling that supports it should remain entirely backwards compatible
+with existing source code.
+
+The new syntax will generate calls to new binary-constructing functions
+in the standard library, so BEAM files compiled with this new feature
+will not be compatible with earlier releases.
+
+Copyright
+=========
+
+This document is placed in the public domain or under the CC0-1.0-Universal
+license, whichever is more permissive.
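
Not part of the diff: the two axes in the "Feature outline" and the desugaring mentioned in the "Implementation outline" can be pictured with a small sketch that runs on an existing OTP release. Everything here is an assumption made for illustration: the module name `interp_sketch`, the helper names, and the representation of a scanned string as a list of literal binaries and `{value, Term}` tuples are hypothetical, and the reference implementation may desugar quite differently. The sketch also rejects floats and atoms in the user-facing form, which the EEP text does not specify.

```erlang
-module(interp_sketch).
-export([format_value/1, debug_value/1, bf/1, lf/1, bd/1, ld/1]).

%% Hypothetical user-facing formatter: integers and chardata are accepted,
%% anything else (tuples, for instance) raises badarg so that the developer
%% has to make an explicit formatting decision.
format_value(V) when is_integer(V) ->
    integer_to_binary(V);
format_value(V) when is_binary(V); is_list(V) ->
    case unicode:characters_to_binary(V) of
        B when is_binary(B) -> B;
        _ -> error(badarg)
    end;
format_value(_) ->
    error(badarg).

%% Hypothetical developer-facing formatter: any term is printed with ~tp,
%% so strings stay quoted and tuples are shown as Erlang terms.
debug_value(V) ->
    unicode:characters_to_binary(io_lib:format("~tp", [V])).

%% Parts is an assumed intermediate form: literal UTF-8 binaries
%% interleaved with {value, Term} for each interpolated expression.
bf(Parts) -> iolist_to_binary([part(P, fun format_value/1) || P <- Parts]).
lf(Parts) -> unicode:characters_to_list(bf(Parts)).
bd(Parts) -> iolist_to_binary([part(P, fun debug_value/1) || P <- Parts]).
ld(Parts) -> unicode:characters_to_list(bd(Parts)).

%% Format one part: interpolated values go through Fmt, literals pass through.
part({value, V}, Fmt) -> Fmt(V);
part(Lit, _Fmt) when is_binary(Lit) -> Lit.
```

Under these assumptions, `interp_sketch:bf([<<"A utf-8 binary string: ">>, {value, 2 + 2}])` returns `<<"A utf-8 binary string: 4">>`, which mirrors the `bf"..."` example from the abstract, while passing a tuple as the value raises `badarg`. The debug variants accept any term, but `~tp` prints tuples and lists without spaces after commas and may wrap long terms, so their output will not match the spacing shown in the EEP's debug examples.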