Commit

Clarify when Sigils are literal terms allowed in patterns
RaimoNiskanen committed Nov 27, 2023
1 parent b10faec commit faaa9a3
Showing 1 changed file with 37 additions and 14 deletions.
51 changes: 37 additions & 14 deletions eeps/eep-0066.md
@@ -142,23 +142,46 @@ In a general sense, a [Sigil][3], is a prefix to a variable
that indicates its *type*, such as `$I` in Basic or Perl,
where `$` is the sigil and `I` is the variable.

Here we define a Sigil as a prefix (and a suffix) to a string literal
that indicates how it should be *interpreted*. The Sigil is
a *syntactic sugar* that creates some Erlang term.
Here we define a Sigil as a prefix (and maybe a suffix) to
a string literal that indicates how it should be *interpreted*.
The Sigil is a *syntactic sugar* that is transformed into
some Erlang term, or expression.

A Sigil string literal consists of:

1. The [Sigil Prefix][], `~` followed by a name that may be empty.
2. The [String Content][] within [String Delimiters][].
3. The [Sigil Suffix][], a name character sequence that may be empty.

### Sigil Transformation

The sigil is transformed early by the tokenizer and the parser
into some other term or expression. Later steps in the
parsing and compilation find out if the transformation
result is valid.

#### Patterns and Expressions

Where the transformed term is valid depends on what it was
transformed into. For example, if the sigil is transformed
into some other literal term, it would be valid in a pattern.

Should the sigil have become something containing
a function call, then it is only valid in a general
expression, not in a pattern.
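
As an illustration of the literal case, here is a minimal sketch that assumes an Erlang/OTP release in which sigils are implemented (OTP 27 or later). The module and function names are hypothetical. It shows the vanilla sigil used in a function head, in a match pattern, and as an ordinary expression.

```erlang
-module(sigil_pattern_demo).
-export([greet/1, same_as_binary/0]).

%% Assumption: the vanilla sigil ~"..." is transformed into a literal
%% UTF-8 binary, so it is accepted in a function head (a pattern).
greet(~"hello") -> ok;
greet(_Other)   -> unknown.

%% The same kind of literal on the pattern side of a match
%% and as an ordinary expression.
same_as_binary() ->
    ~"abc" = <<"abc"/utf8>>,   % sigil used as a pattern
    Bin = ~"abc",              % sigil used as an expression
    Bin =:= <<"abc">>.
```

A sigil that was transformed into something containing a function call could not be used in the `greet/1` head this way; it would only be usable on the right-hand side, like `Bin = ...` above.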

#### String Concatenation

Adjacent strings are concatenated by the parser so for example
«`"abc" "def"`» is concatenated to `"abcdef"`.

A Sigil looks like a string with a prefix (and maybe a suffix),
but expands to some term (or expression), so it cannot be subject
to the string concatenation the parser does.
but may be transformed into something other than a string,
so it cannot be subject to string concatenation.

Therefore «`"abc" "def"`» is `"abcdef"` but «`~s"abc" "def"`»
should be illegal, and also all other sequences consisting
of a Sigil of any type, and any other term, in any order.
Therefore «`~s"abc" "def"`» should be illegal, and also all other
sequences consisting of a Sigil of any type, and any other term,
in any order.
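
The difference can be sketched as follows, again assuming an Erlang/OTP release with sigil support (OTP 27 or later) and using hypothetical module and function names. The rejected form is left in a comment so that the module compiles.

```erlang
-module(sigil_concat_demo).
-export([concat/0]).

concat() ->
    %% Adjacent plain string literals are joined by the parser.
    "abcdef" = "abc" "def",
    %% With a sigil, an explicit operator is needed instead.
    "abcdef" = ~s"abc" ++ "def",
    ok.

%% According to the text above, this should be illegal:
%%   bad() -> ~s"abc" "def".
```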

### Sigil Prefix

@@ -173,7 +196,7 @@ shall be interpreted. The suggested Sigil Types are:

* «»: the vanilla (default (empty name)) [Sigil][].

Creates an Erlang `unicode:unicode_binary()`.
Creates a literal Erlang `unicode:unicode_binary()`.
It is a string represented as a UTF-8 encoded binary,
equivalent to applying `unicode:characters_to_binary/1`
on the [String Content][]. The [String Delimiters][]
@@ -193,31 +216,31 @@ shall be interpreted. The suggested Sigil Types are:

* `b`: `unicode:unicode_binary()`

Creates a UTF-8 encoded binary, handling escape characters
Creates a literal UTF-8 encoded binary, handling escape characters
in the string content. Other features such as string interpolation
will require another Sigil Type or using the [Sigil Suffix][].

In Elixir this corresponds to the `~s` sigil, a [string][4].

* `B`: `unicode:unicode_binary()`, verbatim.

Creates a UTF-8 encoded binary, with verbatim string content.
Creates a literal UTF-8 encoded binary, with verbatim string content.
The content ends when the end delimiter is found.
There is no way to escape the end delimiter.

In Elixir this corresponds to the `~S` sigil, a [string][4].

* `s`: `string()`.

Creates a Unicode codepoint list, handling escape characters
Creates a literal Unicode codepoint list, handling escape characters
in the string content. Other features such as string interpolation
will require another Sigil Type or using the [Sigil Suffix][].

In Elixir this corresponds to the `~c` sigil, a [charlist][5].

* `S`: `string()`, verbatim.

Creates a Unicode codepoint list, with verbatim string content.
Creates a literal Unicode codepoint list, with verbatim string content.
The content ends when the end delimiter is found.
There is no way to escape the end delimiter.

@@ -230,7 +253,7 @@ shall be interpreted. The suggested Sigil Types are:
should be done, and if it is worth the effort compared
to just using the `S` or the `B` Sigil Type.

The best idea so far was that this sigil creates a term
The best idea so far was that this sigil creates a literal term
`{re,RE::unicode:charlist(),Flags::[unicode:latin1_char()]}`
that is an uncompiled regular expression with compile flags,
suitable for (yet to be implemented) functions in the `re` module.
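
To make the binary and string Sigil Types listed above concrete, here is a small sketch. It assumes an Erlang/OTP release that implements these sigils (OTP 27 or later); the module and function names are hypothetical. Each line matches a conventional literal against the corresponding sigil. The regular expression sigil is left out, since the text describes it as not yet settled.

```erlang
-module(sigil_types_demo).
-export([types/0]).

types() ->
    <<"abc"/utf8>>   = ~"abc",     % vanilla: UTF-8 encoded binary
    <<"a\nb"/utf8>>  = ~b"a\nb",   % b: binary, escape sequences handled
    <<"a\\nb"/utf8>> = ~B"a\nb",   % B: binary, verbatim content
    "a\nb"           = ~s"a\nb",   % s: codepoint list, escapes handled
    "a\\nb"          = ~S"a\nb",   % S: codepoint list, verbatim content
    %% Verbatim content keeps the backslash and the n as two characters.
    4 = byte_size(~B"a\nb"),
    ok.
```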
