-
Notifications
You must be signed in to change notification settings - Fork 67
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #45 from TD5/string-interpolation-eep
Add EEP 62:: String interpolation syntax
- Loading branch information
Showing
1 changed file
with
168 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,168 @@ | ||
Author: Tom Davies <todavies5(at)gmail(dot)com> | ||
Status: Draft | ||
Type: Standards Track | ||
Created: 1-Jun-2023 | ||
Post-History: | ||
**** | ||
EEP 62: String interpolation syntax | ||
---- | ||
|
||
Abstract | ||
======== | ||
|
||
This EEP proposes new syntax for string interpolation, allowing expressions to be embedded | ||
into string constants to make constructing compound strings more readable. | ||
|
||
For example, the new syntax: | ||
|
||
``` | ||
bf"A utf-8 binary string: ~2 + 2~" | ||
``` | ||
|
||
would evaluate to: | ||
|
||
``` | ||
<<"A utf-8 binary string: 4"/utf8>> | ||
``` | ||
|
||
Feature outline | ||
======== | ||
|
||
This proposal adds four kinds of string interpolation split over two axes (utf-8 binary or | ||
unicode codepoint list, and user-facing or developer-facing formatting). | ||
|
||
The result are four general classes of syntax with interpolated values: | ||
|
||
``` | ||
% binary format | ||
<<"A utf-8 binary string: 4"/utf8>> = | ||
bf"A utf-8 binary string: ~2 + 2~" | ||
``` | ||
|
||
``` | ||
% list format | ||
"A unicode codepoint list string: 4" = | ||
lf"A unicode codepoint list string: ~2 + 2~" | ||
``` | ||
|
||
``` | ||
% binary debug | ||
<<"A utf-8 binary string: {4, foo, [x, y, z]}"/utf8>> = | ||
bd"A utf-8 binary string: ~{2 + 2, foo, [x, y, z]}~" | ||
``` | ||
|
||
``` | ||
% list debug | ||
"A unicode codepoint list string: {4, foo, [x, y, z]}" = | ||
ld"A unicode codepoint list string: ~{2 + 2, foo, [x, y, z]}~" | ||
``` | ||
|
||
Arbitrary expressions can be nested inside string interpolation | ||
substitutions, including variables, function calls, macros and | ||
even further string interpolation expressions. | ||
|
||
Design | ||
====== | ||
|
||
Why both list- and binary-strings? | ||
----------------------------- | ||
|
||
In the `string` module from the stdlib, a string is represented by | ||
`unicode:chardata()`, that is, a list of codepoints, binaries with | ||
UTF-8-encoded codepoints (UTF-8 binaries), or a mix of the two. | ||
|
||
With this in mind, the list- and binary-oriented string interpolation | ||
syntaxes accept either type of interpolated value, but the user | ||
of the interpolation determines whether they want to generate a | ||
`unicode:char_list()` or `unicode:unicode_binary()` based on which | ||
kind of interpolation they use (`bf"..."` and `bd"..."` to create | ||
binaries, or `lf"..."` and `ld"..."` to create lists). | ||
|
||
List-strings are most useful for backwards compatibility and convenience. | ||
Binary-strings are most useful for memory-compactness and IO. | ||
|
||
Why user- and developer-oriented strings? | ||
----------------------------------------- | ||
|
||
There are two similar, but distinct cases where developers typically | ||
want to format strings: when logging/debugging, and when displaying | ||
data to users. | ||
|
||
When logging or debugging, the most important features are typically | ||
that any kind of term can be printed, and it should round-trip | ||
losslessly and be read by developers unambiguously. Examples of these | ||
properties are, for example, retaining runtime type information, e.g. | ||
keeping strings quoted when formatting them and printing floats | ||
with full range and resolution. | ||
|
||
When displaying to users, the most important features are typically | ||
that they are always going to be human-readable and cleanly formatted. | ||
Examples of these properties are, for example, formatting strings | ||
verbatim, without quotation marks, and not retaining any Erlang-isms | ||
(e.g. we don't want to be printing Erlang tuples, because they won't | ||
make much sense to the average application consumer), so we'd rather | ||
get a `badarg` error to push the developer to make an explicit | ||
formatting decision. | ||
|
||
Why no formatting options? | ||
-------------------------- | ||
|
||
Let's consider the two use-cases introduced earlier: | ||
|
||
- Logging/debugging: Typically you want to fire-and-forget, giving | ||
whatever value you care about to the formatter, and just let it | ||
print that value unambiguously, meaning there's no need to tweak | ||
formatting options: `bd"~Timestamp~: ~Query~ returned ~Result~"` | ||
- Displaying to users: Typically you want to tightly control formatting, | ||
and you probably want to do so in a modular and reusable way. In that | ||
case, factoring out your formatting decision to a function, and | ||
interpolating the result of that function is probably the best way to | ||
go: `bf"You account balance is now ~my_app:format_balance(Currency, Balance)~"`. | ||
|
||
Notably, nothing in the design and implementation here precludes the | ||
future introduction of formatting options such as `bf"float: ~.2f(MyFloat)~"` as one might do | ||
with `io_lib:format` etc. But existing stdlib functions can offer | ||
similar functionality, e.g. `bf"float: ~float_to_binary(MyFloat, [{decimals, 2}, compact])~"`, | ||
and can be factored out into their own reusable functions. | ||
|
||
Why not use Elixir's syntax? | ||
---------------- | ||
|
||
Elixir uses `#{...}` to introduce an interpolated expression within a string, and it might | ||
perhaps be convenient to reuse that syntax. Unfortunately, this conflicts with Erlang's | ||
syntax for maps. Elixir's maps use `%{...}`, so it doesn't have that conflict. | ||
|
||
Implementation outline | ||
============== | ||
|
||
To parse interpolated strings, the scanner tracks some additional state | ||
regarding whether we are currently in an interpolated string, at which | ||
point it enables the recognition of `~` as the delimiter for | ||
interpolated expressions, and generates new tokens which represent the | ||
various components of an interpolated string. | ||
|
||
Early during compilation and shell evaluation, interpolated strings are | ||
desugared into calls to functions from the `io_lib` module, and | ||
therefore don't impact later stages of compilation or evalution. | ||
|
||
Reference Implementation | ||
======== | ||
|
||
PR [#7343](https://github.com/erlang/otp/pull/7343) | ||
|
||
Backward compatibility | ||
======== | ||
|
||
The new string interpolation syntax was not previously valid syntax, so | ||
tooling supporting the new syntax should be entirely backwards compatible | ||
with existing source code. | ||
|
||
The new syntax will generate calls to new binary-constructing functions | ||
in the standard library, so BEAM files compiled with this new feature | ||
will not be compatible with earlier releases. | ||
|
||
Copyright | ||
========= | ||
|
||
This document is placed in the public domain or under the CC0-1.0-Universal | ||
license, whichever is more permissive. |