From 49095fe24e6648ce5c942f4f1413a1e388b7f13d Mon Sep 17 00:00:00 2001
From: Tom
Date: Thu, 1 Jun 2023 15:53:48 +0100
Subject: [PATCH 1/2] Create eep-0062.md: String interpolation syntax

---
 eep-0062.md | 168 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 168 insertions(+)
 create mode 100644 eep-0062.md

diff --git a/eep-0062.md b/eep-0062.md
new file mode 100644
index 0000000..6051905
--- /dev/null
+++ b/eep-0062.md
@@ -0,0 +1,168 @@
+    Author: Tom Davies
+    Status: Draft
+    Type: Standards Track
+    Created: 1-Jun-2023
+    Post-History:
+****
+EEP 61: String interpolation syntax
+----
+
+Abstract
+========
+
+This EEP proposes new syntax for string interpolation, allowing expressions
+to be embedded into string constants to make constructing compound strings
+more readable.
+
+For example, the new syntax:
+
+```
+bf"A utf-8 binary string: ~2 + 2~"
+```
+
+would evaluate to:
+
+```
+<<"A utf-8 binary string: 4"/utf8>>
+```
+
+Feature outline
+========
+
+This proposal adds four kinds of string interpolation split over two axes
+(UTF-8 binary or Unicode codepoint list, and user-facing or
+developer-facing formatting).
+
+The result is four general classes of syntax with interpolated values:
+
+```
+% binary format
+<<"A utf-8 binary string: 4"/utf8>> =
+    bf"A utf-8 binary string: ~2 + 2~"
+```
+
+```
+% list format
+"A unicode codepoint list string: 4" =
+    lf"A unicode codepoint list string: ~2 + 2~"
+```
+
+```
+% binary debug
+<<"A utf-8 binary string: {4, foo, [x, y, z]}"/utf8>> =
+    bd"A utf-8 binary string: ~{2 + 2, foo, [x, y, z]}~"
+```
+
+```
+% list debug
+"A unicode codepoint list string: {4, foo, [x, y, z]}" =
+    ld"A unicode codepoint list string: ~{2 + 2, foo, [x, y, z]}~"
+```
+
+Arbitrary expressions can be nested inside string interpolation
+substitutions, including variables, function calls, macros and
+even further string interpolation expressions.
+
+Design
+======
+
+Why both list- and binary-strings?
+-----------------------------
+
+In the `string` module from the stdlib, a string is represented by
+`unicode:chardata()`, that is, a list of codepoints, binaries with
+UTF-8-encoded codepoints (UTF-8 binaries), or a mix of the two.
+
+With this in mind, the list- and binary-oriented string interpolation
+syntaxes accept either type of interpolated value, while the prefix
+determines whether the result is a `unicode:char_list()` or a
+`unicode:unicode_binary()`: `bf"..."` and `bd"..."` create binaries,
+and `lf"..."` and `ld"..."` create lists.
+
+List-strings are most useful for backwards compatibility and convenience.
+Binary-strings are most useful for memory compactness and IO.
+
+Why user- and developer-oriented strings?
+-----------------------------------------
+
+There are two similar but distinct cases where developers typically
+want to format strings: when logging or debugging, and when displaying
+data to users.
+
+When logging or debugging, the most important properties are typically
+that any term can be printed, that the output round-trips losslessly,
+and that developers can read it unambiguously. This means retaining
+runtime type information, e.g. keeping strings quoted when formatting
+them and printing floats with full range and resolution.
+
+When displaying data to users, the most important properties are
+typically that the output is always human-readable and cleanly
+formatted. This means formatting strings verbatim, without quotation
+marks, and not exposing any Erlang-isms. For example, a printed Erlang
+tuple won't make much sense to the average application user, so
+interpolating a tuple into a user-facing string should raise a `badarg`
+error instead, pushing the developer to make an explicit formatting
+decision.
+
+Why no formatting options?
+--------------------------
+
+Let's consider the two use cases introduced earlier:
+
+- Logging/debugging: Typically you want to fire and forget, handing
+  whatever value you care about to the formatter and letting it print
+  that value unambiguously, so there's no need to tweak formatting
+  options: `bd"~Timestamp~: ~Query~ returned ~Result~"`
+- Displaying to users: Typically you want to tightly control formatting,
+  and you probably want to do so in a modular and reusable way. In that
+  case, factoring out your formatting decision into a function and
+  interpolating the result of that function is probably the best way to
+  go: `bf"Your account balance is now ~my_app:format_balance(Currency, Balance)~"`.
+
+Notably, nothing in the design and implementation here precludes the
+future introduction of formatting options such as
+`bf"float: ~.2f(MyFloat)~"`, as one might use with `io_lib:format/2`.
+In the meantime, existing stdlib functions offer similar functionality,
+e.g. `bf"float: ~float_to_binary(MyFloat, [{decimals, 2}, compact])~"`,
+and such calls can be factored out into their own reusable functions.
+
+Why not use Elixir's syntax?
+----------------
+
+Elixir uses `#{...}` to introduce an interpolated expression within a
+string, and it would be convenient to reuse that syntax. Unfortunately,
+it conflicts with Erlang's syntax for maps, which also uses `#{...}`.
+Elixir's maps use `%{...}`, so Elixir doesn't have that conflict.
+
+Implementation outline
+==============
+
+To parse interpolated strings, the scanner tracks some additional state
+recording whether it is currently inside an interpolated string. While
+it is, the scanner recognizes `~` as the delimiter for interpolated
+expressions and generates new tokens representing the various
+components of the interpolated string.
+
+Early in compilation and shell evaluation, interpolated strings are
+desugared into calls to functions from the `io_lib` module, so they
+don't affect later stages of compilation or evaluation.
+
+Reference Implementation
+========
+
+PR [#7343](https://github.com/erlang/otp/pull/7343)
+
+Backward compatibility
+========
+
+The new string interpolation syntax was not previously valid syntax, so
+tooling supporting the new syntax should be entirely backwards compatible
+with existing source code.
+
+The new syntax will generate calls to new binary-constructing functions
+in the standard library, so BEAM files compiled with this new feature
+will not be compatible with earlier releases.
+
+Copyright
+=========
+
+This document is placed in the public domain or under the CC0-1.0-Universal
+license, whichever is more permissive.

From a360e5314f83d12950da8f4d6bfa3aec65698638 Mon Sep 17 00:00:00 2001
From: Tom
Date: Thu, 1 Jun 2023 16:15:40 +0100
Subject: [PATCH 2/2] Fix EEP number

---
 eep-0062.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/eep-0062.md b/eep-0062.md
index 6051905..b099005 100644
--- a/eep-0062.md
+++ b/eep-0062.md
@@ -4,7 +4,7 @@
     Created: 1-Jun-2023
     Post-History:
 ****
-EEP 61: String interpolation syntax
+EEP 62: String interpolation syntax
 ----
 
 Abstract
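The Implementation outline in the patch says interpolated strings are desugared early into ordinary library calls, but it does not name the new functions. The Erlang module below is a minimal, hypothetical sketch of what such calls could look like, built only on existing stdlib functions (`unicode:characters_to_binary/1`, `unicode:characters_to_list/1`, `io_lib:format/2`, `integer_to_binary/1`, `atom_to_binary/2`). The module name `interp_sketch`, the `{v, Value}` segment encoding and the helpers `fmt/1` and `dbg/1` are assumptions for illustration only. Under these assumptions, `bf"Sum: ~2 + 2~"` would correspond to a call such as `interp_sketch:bf([<<"Sum: "/utf8>>, {v, 2 + 2}])`.

```erlang
%% Hypothetical sketch of desugared string interpolation.
%% Not the EEP's actual API: all names here are illustrative assumptions.
-module(interp_sketch).
-export([bf/1, lf/1, bd/1, ld/1]).

%% Assumed desugared form: a list of segments, each either literal text
%% from the string source (a UTF-8 binary) or {v, Value} holding the
%% result of an interpolated expression.
-type segment() :: unicode:unicode_binary() | {v, term()}.

%% bf"..." and lf"...": user-facing formatting.
-spec bf([segment()]) -> unicode:unicode_binary().
bf(Segs) -> unicode:characters_to_binary(render(fun fmt/1, Segs)).

-spec lf([segment()]) -> string().
lf(Segs) -> unicode:characters_to_list(render(fun fmt/1, Segs)).

%% bd"..." and ld"...": developer-facing (debug) formatting.
-spec bd([segment()]) -> unicode:unicode_binary().
bd(Segs) -> unicode:characters_to_binary(render(fun dbg/1, Segs)).

-spec ld([segment()]) -> string().
ld(Segs) -> unicode:characters_to_list(render(fun dbg/1, Segs)).

%% Keep literal segments unchanged and format each interpolated value.
render(Format, Segs) ->
    [case Seg of
         {v, Value} -> Format(Value);
         Literal when is_binary(Literal) -> Literal
     end
     || Seg <- Segs].

%% Hypothetical user-facing formatter: string-like chardata is emitted
%% verbatim (no quotes); integers, floats and atoms are printed plainly.
%% Anything else (tuples, maps, pids, ...) raises badarg so the caller
%% has to make an explicit formatting decision.
fmt(V) when is_binary(V); is_list(V) ->
    case unicode:characters_to_binary(V) of
        Bin when is_binary(Bin) -> Bin;
        _Invalid -> erlang:error(badarg, [V])
    end;
fmt(V) when is_integer(V) -> integer_to_binary(V);
fmt(V) when is_float(V) -> unicode:characters_to_binary(io_lib:format("~w", [V]));
fmt(V) when is_atom(V) -> atom_to_binary(V, utf8);
fmt(V) -> erlang:error(badarg, [V]).

%% Hypothetical developer-facing formatter: print any term unambiguously,
%% keeping quotes and type information (~tp also handles Unicode text).
dbg(V) -> unicode:characters_to_binary(io_lib:format("~tp", [V])).
```

For example, `interp_sketch:bd([<<"T: ">>, {v, {2 + 2, foo, [x, y, z]}}])` returns `<<"T: {4,foo,[x,y,z]}">>`, while passing the same tuple to `bf/1` raises `badarg`. Note that `~tp` prints tuples without the spaces shown in the EEP's debug examples and may wrap long terms across lines, so the exact debug layout is left to the real implementation.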