Make verbatim sigils truly verbatim
RaimoNiskanen committed Nov 17, 2023
1 parent 963404b commit 025f666
Showing 1 changed file with 45 additions and 30 deletions.
75 changes: 45 additions & 30 deletions eeps/eep-0066.md
@@ -201,9 +201,9 @@ shall be interpreted. The suggested Sigil Types are:

* «`B`»: `unicode:unicode_binary()`, verbatim.

Creates a UTF-8 encoded binary, with verbatim string content
in that only the [end delimiter][] character can be escaped
with a «`\`» character.
Creates a UTF-8 encoded binary, with verbatim string content.
The content ends when the end delimiter is found.
There is no way to escape the end delimiter.

In Elixir this corresponds to the «`~S`» sigil, a [string][4].

@@ -217,15 +217,15 @@ shall be interpreted. The suggested Sigil Types are:

* «`S`»: `string()`, verbatim.

Creates a Unicode codepoint list, with verbatim string content
in that only the [end delimiter][] character can be escaped
with a «`\`» character.
Creates a Unicode codepoint list, with verbatim string content.
The content ends when the end delimiter is found.
There is no way to escape the end delimiter.

In Elixir this corresponds to the «`~C`» sigil, a [charlist][5].

* «`r`»: regular expression.
* «`R`»: regular expression.

This EEP proposes to not implement regulare expressions yet.
This EEP proposes to not implement regular expressions yet.
It is still unclear how integration with the `re` module
should be done, and if it is worth the effort compared
to just using the «`S`» or the «`B`» Sigil Type.
@@ -240,11 +240,9 @@ shall be interpreted. The suggested Sigil Types are:
See the [Regular Expressions][] section about the reasoning
behind this proposed term type.

Within the [String Content][], character escape sequences are handled
according to the regular expression rules, and in addition to them
the [end delimiter][] character can be escaped with a «`\`» character.
Between triple-quote delimiters according to [EEP 64][]
there is no end delimiter character to escape.
First the [end delimiter][] is found and within the [String Content][],
character escape sequences are handled according to
the regular expression rules.

The main advantage of a regular expression [Sigil][] is to avoid
the additional escaping of «`\`» that regular erlang strings require.
@@ -281,17 +279,11 @@ Immediately following the [Sigil Prefix][] is the string start delimiter.
A specific start delimiter character has a corresponding
end delimiter character.

Elixir has got the following start-end delimiter character pairs:
«`()`», «`[]`», «`{}`», and «`<>`», and the following characters
are start delimiters that have themselves as end delimiters:
«`/`», «`|`», «`'`», and «`"`».
The allowed start-end delimiter character pairs are:
«`()`», «`[]`», «`{}`», «`<>`» and «`«»`».

This EEP proposes to so far only implement the regular string
start and end delimiter «`"`» as single character demiliter.
It is the established string delimiter in Erlang and will
create no confusion. The other can be added later
and by not allowing them yet it will still be possible for them
to have different semantics, if we find some good use for that.
The following characters are start delimiters that have themselves
as end delimiters: «`/`», «`|`», «`'`», «`"`» and «`#`».

Triple-quote delimiters are also allowed, that is, a sequence of
3 or more double quote «`"`» characters as described in [EEP 64][].
@@ -304,6 +296,22 @@ For a triple-quoted string, though, conceptually the end delimiter
doesn't occur in the string's content, so interpreting the string content
does not interfere with finding the end delimiter.

The proposed set of delimiters is the same as in [Elixir][1],
plus «`«»`» and «`#`». They are the characters in [Latin-1][]
that are normally used for bracketing or text quoting,
and those that feel like full height vertical lines.
Except: «`\`» is too often used for character escaping,
«`` ` ``» and «`´`» look too much like «`'`»,
«`¦`» looks too much like «`|`», and «`#`» is too useful
to *not* include since in many contexts (shell scripts,
Perl regular expressions) it is a comment character that
is easy to avoid in the [String Content][].

It may not be obvious how to type the «`«`» and «`»`» characters
on some keyboards (US), but there *are* ways that should not
hinder a determined programmer. When using X Compose sequences
it is simply [`Compose`] [`<`] [`<`] and [`Compose`] [`>`] [`>`].
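
The delimiter rules above, combined with the verbatim rule for «`B`» and «`S`», can be modelled in a few lines. The following is a minimal Python sketch, not the Erlang tokenizer: the names `PAIRED`, `SELF_CLOSING`, `end_delimiter` and `scan_verbatim` are hypothetical, and triple-quote delimiters from [EEP 64][] are deliberately left out.

```python
# Hypothetical model of the delimiter set proposed in this commit.
# It is an illustration only, not code from the Erlang/OTP scanner.
PAIRED = {"(": ")", "[": "]", "{": "}", "<": ">", "«": "»"}
SELF_CLOSING = set("/|'\"#")


def end_delimiter(start):
    """Return the end delimiter that corresponds to a start delimiter."""
    if start in PAIRED:
        return PAIRED[start]
    if start in SELF_CLOSING:
        return start
    raise ValueError(f"not a string delimiter: {start!r}")


def scan_verbatim(text):
    """Scan verbatim string content.

    `text` must begin at the start delimiter. The content ends at the
    first occurrence of the end delimiter; a preceding backslash does
    not escape it. Returns (content, remaining_text).
    """
    end = end_delimiter(text[0])
    stop = text.find(end, 1)
    if stop == -1:
        raise ValueError("unterminated string")
    return text[1:stop], text[stop + 1:]
```

Under this model `scan_verbatim("[a\\]b]")` stops at the first «`]`», so the backslash is ordinary content and `b]` is left for the tokenizer to deal with. To include the end delimiter in the content, a different delimiter has to be chosen.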

### String Content

Between the start and end [String Delimiters][], all characters
@@ -330,10 +338,10 @@ of name characters.

The Sigil Suffix may indicate how to interpret the String Content,
for a specific [Sigil Type][].
For example; for the «`~r`» [Sigil Prefix][] (regular expression),
For example; for the «`~R`» [Sigil Prefix][] (regular expression),
the Sigil Suffix is interpreted as short form compile options
such as «`i`» that makes the regular expression character
case insensitive.
case insensitive. For example `~R/^from: /i`.
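
How a sigil such as `~R/^from: /i` falls apart into Sigil Type, String Content and Sigil Suffix can be sketched as follows. This is a rough Python illustration under stated assumptions: `split_sigil` is a hypothetical helper, the character class used for name characters is an approximation, and the end delimiter is located first without any escape handling, as this commit describes.

```python
import re

# Assumption: name characters approximated as ASCII letters, digits,
# "_" and "@". The real tokenizer may accept a different set.
NAME = re.compile(r"[A-Za-z0-9_@]*")
PAIRED = {"(": ")", "[": "]", "{": "}", "<": ">", "«": "»"}


def split_sigil(src):
    """Split a sigil into (sigil_type, content, suffix).

    The start delimiter is not validated against the allowed set here;
    any character that is not a paired start delimiter is treated as
    its own end delimiter.
    """
    if not src.startswith("~"):
        raise ValueError("a sigil starts with '~'")
    type_match = NAME.match(src, 1)      # may be empty: the Vanilla Sigil
    pos = type_match.end()
    if pos >= len(src):
        raise ValueError("missing start delimiter")
    start = src[pos]
    end = PAIRED.get(start, start)
    stop = src.find(end, pos + 1)        # find the end delimiter first
    if stop == -1:
        raise ValueError("unterminated string")
    suffix = NAME.match(src, stop + 1).group()
    return type_match.group(), src[pos + 1:stop], suffix
```

With this sketch `split_sigil("~R/^from: /i")` separates the type `R`, the content `^from: ` and the suffix `i`. What the suffix means is left to whatever handles that Sigil Type, in line with the text above.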

Things that may have to be performed by the tokenizer, such as
how to handle escape character rules, should not be affected
@@ -346,7 +354,7 @@ or the parser.

### Regular Expressions

A regular expression sigil «`~r"expression"flags`» should
A regular expression sigil «`~R"expression"flags`» should
be translated to something useful for tools/libraries.
There are at least two ways; [uncompiled regular expressions][],
or [compiled regular expressions][].
@@ -413,8 +421,8 @@ should represent an *uncompiled* regular expression with compile flags.

The [Vanilla Sigil][] (empty [Sigil Type][]) is not allowed in Elixir.

This EEP proposes to only implement the «`"`» [String Delimiters][],
for starters. Elixir has got a much wider set.
This EEP proposes to add the following [String Delimiters][]
to the set that Elixir has: «`«»`» and «`#`».

The string and binary [Sigil Type][]s are named differently
between the languages, to keep the names consistent within
@@ -426,6 +434,12 @@ When Elixir allows escape sequences in the [String Content][]
it also allows string interpolation. This EEP proposes to *not*
implement string interpolation in the suggested [Sigil Type][]s.


When Elixir doesn't allow escape sequences in the [String Content][],
it still allows escaping the end delimiter. This EEP proposes
that such strings should be truly verbatim with no possibility
to escape the end delimiter.

There are small differences in which escape sequences are implemented
in the languages; Elixir allows escaping of newlines, and has
an escape sequence «`\a`», that Erlang does not have.
@@ -434,11 +448,12 @@ There are also small differences in how newlines are handled
between «`~S`» heredocs in Elixir and triple-quoted strings in Erlang.
See [EEP 64][].

Details about regular expression sigils, «`~r`», in particular
Details about regular expression sigils, «`~R`», in particular
their [Sigil Suffix][]es remain to be decided in Erlang.
Also, there is a question about escaping the end delimiter or not.

It has not been decided how or even *if* string interpolation
in will be implemented in Erlang, but a [Sigil Suffix][] or
will be implemented in Erlang, but a [Sigil Suffix][] or
new [Sigil Type][]s would most probably be used.

Reference Implementation