From 97f88538bdfad5e25d37579a6e5cbb3181125a98 Mon Sep 17 00:00:00 2001 From: Jesse Gumm Date: Tue, 22 Oct 2024 08:22:03 -0500 Subject: [PATCH] Update record names as reserved words/variables and new record definition syntax (#64) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Record names as reserved_words/Variables and new record definition syntax Co-authored-by: José Valim Co-authored-by: Raimo Niskanen --- eeps/eep-0072.md | 251 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 251 insertions(+) create mode 100644 eeps/eep-0072.md diff --git a/eeps/eep-0072.md b/eeps/eep-0072.md new file mode 100644 index 0000000..8378485 --- /dev/null +++ b/eeps/eep-0072.md @@ -0,0 +1,251 @@ + Author: Jesse Gumm + Status: Draft + Type: Standards Track + Created: 08-Oct-2024 + Post-History: +**** +EEP 72: Reserved words and Variables as record names, and enhancement to definition syntax +---- + +Abstract +======== + +This EEP loosens some of the restrictions around record names to make it no +longer necessary to quote them when they are named with reserved words (`#if` +vs `#'if'`) or words with capitalized first characters (terms that currently +would be treated as variables, for example `#Hello` vs `#'Hello'`). + +This EEP also proposes to add a new record-like syntax to the record +definitions (also adopting the above syntactical changes), so that the +following record definitions would be valid and identical: + +```erlang +-record('div', {a :: integer(), b :: integer()}). +-record #div{a :: integer(), b :: integer()}. +``` + +The latter one is proposed new syntax. The following would also be valid and +identical since parentheses are optional in attributes, and since atoms may be +quoted even when not mandatory: + +```erlang +-record 'div', {a :: integer(), b :: integer()}. +-record #'div'{a :: integer(), b :: integer()}. +-record(#'div'{a :: integer(), b :: integer()}). +-record(#div{a :: integer(), b :: integer()}). +``` + +Usage Syntax Motivation +======================= + +Record names are atoms. As such, the current Erlang syntax requires the record +names to be consistent with the rest of the language's use of atoms. + +All atoms in Erlang can be denoted with single quotes. Some examples: + +For example: + +```erlang +'foo'. +'FOO'. +'foo-bar'. +``` + +But, conveniently, simple atoms (all alphanumeric, underscores (`_`) or +at symbols (`@`) with the first character being a lowercase letter and not one +of the 20+ reserved words), in all contexts can be invoked without the +necessary wrapping quotes. Some examples: + +```erlang +foo. +foo_Bar. +'foo-bar'. % still quoted since the term has a non-atomic character in it. +``` + +Conveniently, this also means that records named with simple atoms can be +invoked and used without having to quote the atoms. For example: + +```erlang +-record(foo, {a, b}). +-record(bar, {c}). + +go() -> + X = #foo{a = 1, b = 2}, + Y = X#foo{a = something_else}, + Z = #bar{c = Y#foo.a}, + ... +``` + +Unfortunately, that also means that records named with anything that doesn't +fit the "simple atom" pattern must be wrapped in quotes in definition and +usage. For example: + +```erlang +-record('div', {a, b}). +-record('SET', {c}). + +go() -> + X = #'div'{a = 1, b = 2}, + Y = X#'div'{a = something_else}, + Z = #'SET'{c = Y#'div'.a}, + ... +``` + +While this approach is consistent with atom usage in the language, for reserved +words and capitalized atoms, this makes the record syntax *feel* inconsistent if you have a need for +naming a record with a reserved word (or term with a capital first letter). In +this case, it almost guarantees a user won't use a record named 'if', +'receive', 'fun', etc even though there may very well be a valid use case for +such a name. The most common use case that comes to mind from the Nitrogen Web +Framework. Since HTML has a `div` tag, Nitrogen (which represents HTML tags +using Erlang records) should naturally have a `#div` record, however, due to +'div' being a reserved word (the integer division operator), the record `#panel` +is used instead to save the programmer from having to invoke `#'div'`, +which feels unnatural and awkward. + +Further, applications such as ASN.1 and Corba both have naming conventions that +rely heavily on uppercase record names and as such, they currently must be +quoted as well. You can see this in modules in Erlang's +[`asn1`](https://github.com/erlang/otp/blob/OTP-27.1.1/lib/asn1/src/asn1_records.hrl#L35-L39) +application. (The previous link points to some record definitions in `asn1`, +but you can see the usage scattered across a number of modules in the `asn1` +application). + +Usage Syntax Specification +========================== + +This EEP simplifies the above example by + +1. Allowing reserved words and variables to be used without quotes for record + names, and +2. Simplifying the definition such that the syntax between record definition + and record usage becomes more consistent. + +With the changes from this EEP, the above code becomes: + +```erlang +-record('div', {a, b}). +-record('SET', {c}). + +go() -> + X = #div{a = 1, b = 2}, + Y = X#div{a = something_else}, + Z = #SET{c = Y#div.a}, + ... +``` + +Definition Syntax Motivation +============================ + +While the updated example in the usage syntax specification makes the *using* +of records cleaner, there remains one more inconsistency that can also be +relatively easily solved. That is the record definition still also needing to +quote record name, as the example above demonstrates (repeated here for +convenience): + +```erlang +-record('div', {a, b}). + +go() -> + X = #div{a = 1, b = 2}, + Y = X#div{a = something_else}, + Z = Y#div.a, + ... +``` + +So whereas the record definition needs to be thought of as `'div'`, the record +usage no longer requires the quoted term 'div', which could certainly lead an +Erlang beginner to wonder why 'div' needs to be quoted in the definition while +other atom-looking terms don't. + +Definition Syntax Specification +=============================== + +Conveniently, there is a rather easy solution, and that's to +allow the record usage syntax to also be used as the record definition. + +This EEP also then also adds a new record definition syntax, improving the +symmetry between general record usage and record definition. + +The above example can fully then look like the following: + +```erlang +-record #div{a, b}. + +go() -> + X = #div{a = 1, b = 2}, + Y = X#div{a = something_else}, + Z = Y#div.a, + ... +``` + +Implementation +============== + +To update the syntax for using records, we can safely augment the parser to +change its already existing record handling of `'#' atom '{' ... '}'` and +`'#'atom '.' atom` into `'#' record_name '{' ... '}'` and +`'#' record_name '.' atom`, and define `record_name` to be `atom`, `var`, or +`reserved_word`. + +To update the record definition syntax, we can simply add a few new +modifications to the `attribute` Nonterminal to allow `'#' record_name` as name +for the `record` attribute, instead of `atom` as for generic attributes. + +Backwards Compatibility +======================= + +As this EEP only adds new syntax, the vast majority existing codebases will +still work, with the possible exception of AST/code analysis tools that are +analyzing code using the new syntax. + +Syntax highlighting and code completion tools may need to be updated to support +the new syntax if your code uses the new syntax rules. + +Broader Concerns and Points of Discussion +========================================= + +While the new definition syntax creates some degree of symmetry around record +usage, perfect symmetry is impossible to achieve, since a record can always +be handled as the atom tagged tuple it actually is. The question is where +to draw the line where the record's true nature shows, and how hard we +should try to hide it. These are remaining concerns and inconsistencies: + +Auxiliary Record Functions +-------------------------- + +Other functions that work with records like `is_record/2` or `record_info/1` +are not currently covered by any of the syntactical changes in this EEP, and as +such, it remains necessary to quote record names if they are not simple atoms. +For example: `is_record(X, div)` would still be a syntax error. So there is +still not true 100% symmetry. Note that instead of using the +`is_record(X, 'div')` guard, matching on `#div{}` is probably more frequently +used, since it is terser and mostly regarded as more readable. + +Two Definition Syntaxes? +------------------------ + +This EEP introducing a new syntax for record definition could potentially lead +to some to wonder why the language has two rather different syntaxes for +defining records. Since usage of the syntax for getting, setting, matching, etc +(e.g. `#rec{a=x,y=b}`) occurs far more commonly than defining, it only feels +natural that the definition syntax would mirror usage. + +For more symmetry, the syntax in Erlang's type system to define records also +matches the newly proposed define syntax. + +Thus, I feel that sharing the existing usage and type syntax with the +definition system would likely become the default/preferred way, and that the +original syntax remain for backwards compatibility. + +Reference Implementation +======================== + +The reference implementation is provided in a form of pull request on GitHub + + https://github.com/erlang/otp/pull/7873 + +Copyright +========= + +This document has been placed in the public domain.