Add tuple projection labels #819

br4sco · 2024-01-11T17:04:20Z

After discussions with @elegios, this PR also changes parsing of record projections so that .<uint> is now treated as a token. This means that it is now possible to write projections from nested tuples as:

(1, (2, 3)).1.0

Before we had to project from nested tuples using ((1, (2, 3)).1).0 or (1, (2, 3)).1 .0, as 1.0 would parse as a float. However, this change means that we cannot write lambdas as lam x.1, instead we need to write it as lam x. 1.

johnwikman · 2024-01-19T09:51:54Z

The thing that I am unsure about is how the new tokenization corresponds to the mental picture of the syntax. In my mind, the "dot" represents a separator which means that I expect that lam x.0 and lam x. 0 would parse as the same AST. This becomes a wierd quirk of the language that I am not really comfortable with, but it's highly subjective and most people were fine with this quirk as discussed on the last meeting.

The way I see it, the main culprits are tuples and floats. I tested some variations on the dot as a separator in lambdas and could not find a good alternative. So I am throwing in some alternate suggestions of how this issue could be fixed:

Change label naming in tuples to _<idx>. So the tuple (4, ("foo", True)) would instead be syntactic sugar for {_0 = 4, _1 = {_0 = "foo", _1 = True}}. Then we can get the foo by t._1._0, which is less nice than t.1.0 but not terrible IMO.
Use tup:label syntax for record projection. E.g. t:1:0 to get the foo from the previous example. This would also work for records, e.g. env.freevars becomes env:freevars. An issue I see with this syntax is that the colon might hint at that this is a sequence or slice as is common in other languages, rather than projection.
(Very unsure of this) Change syntax of floats to always use a postfix f. Similar to single-precision floats in C, they have the end with the f suffix. So a float value for 1 would be specified as 1.0f. This also opens up that we don't need to have the prefix 0 anymore, so .3f would be a valid token for the float value 0.3.

elegios · 2024-01-19T10:28:01Z

I think your point 1 is a very neat change that also makes it so that labels fully overlap with identifiers, rather than identifiers and integers.

Point 3 is also interesting in that it fixes the issue and enables some float syntax we wouldn’t have otherwise (also, e.g., 1f, which for some reason feels nicer to me than 1.). It’s a bit heavier syntax for floats, which is the thing that makes me unsure.

After some thinking, I think I don’t support my previous suggestion any more, and that I’d rather do your 1, or 3, or drop tuple-projection. Dropping is a nice thing for a surface language intended to be read, since it encourages names, but of course, mcore isn’t a surface language. Then again, it’s a decently common feature in languages, and it would be nice to have a clean way of pprinting the corresponding mcore.

br4sco · 2024-01-24T14:35:02Z

I like point 1 because It removes one instance of syntactic sugar. I also like point 3. Not only for not overlapping with tuple projections, but I also think the code becomes more readable with an f suffix rather than, a possible non-existing, decimal.

david-broman · 2024-02-13T03:38:34Z

But alternative 1 would conflict with ordinary field names. I.e., "_0" is a legal identifier. That is, the syntactic sugar of tuples pollutes the ordinary namespace. This might be OK, but we need to think of the consequences of that.

johnwikman · 2024-02-13T10:11:23Z

In terms of confusion when directly creating records? Or in general with labels starting with an underscore?

-- these are equivalent

let r = {_0 = 5, _1 = "foo"} in r._0

let r = (5, "foo") in r._0

I think that this feels fine, the only potential issue I see in the short term is that we probably need to have some more metadata when pretty printing records. Such that a record explicitly created as {_0 = 5, _1 = "foo"} doesn't get pretty printed as a tuple.

There should be no syntax issue specific to tuple projection since that should equally apply to records.

elegios · 2024-02-13T11:08:23Z

Technically the current way things work also has tuples pollute the ordinary namespace, it's just that it's a bit more inconvenient to write those labels since you have to say, e.g., #label"1". All we're doing is changing what precise form of record we're giving syntactic sugar, in the form of tuples (patterns, expressions, and types).

The description of the syntax sugar for tuples would change roughly as follows:

(a, b, c) is syntactic sugar for {#label"0" = a, #label"1" = b, #label"2" = c} and foo.0 is syntactic sugar for foo.#label"0".

becomes

(a, b, c) is syntactic sugar for {_0 = a, _1 = b, _2 = c}.

along with the corresponding change for tuple types.

The only change is that it's easier to write a record that looks like a mixed record and tuple; {_0 = a, foo = b} is probably more likely to be written than {#label"0" = a, foo = b}. Regardless, the only thing weird about such a record is that if we have code that projects a tuple-like field from a record then we might expect to be able to project all other fields in a tuple-like fashion, which wouldn't be true for this mix. On the other hand, that's also true today, so this change wouldn't affect the status quo.

I also don't think we should special case and remember how things were printed originally, unless we want to do that for everything, e.g.,

() vs {}
"" vs []
"abc" vs ['a', 'b', 'c']
foo.bar vs match foo with {bar = x} in x
match a with b in ... vs match a with b then ... else never
switch x case A _ then true case B _ then false end vs match x with A _ then true else match x with B _ then false else never (this one's not implemented in the pretty-printer, but at least I feel like it should be).

johnwikman · 2024-02-13T11:33:12Z

I also don't think we should special case and remember how things were printed originally, unless we want to do that for everything, e.g.,

* `()` vs `{}`

* `""` vs `[]`

* `"abc"` vs `['a', 'b', 'c']`

* `foo.bar` vs `match foo with {bar = x} in x`

* `match a with b in ...` vs `match a with b then ... else never`

* `switch x case A _ then true case B _ then false end` vs `match x with A _ then true else match x with B _ then false else never` (this one's not implemented in the pretty-printer, but at least I feel like it should be).

Perhaps not in the case of MCore, but for a language intended for end users I think that this is quite important, especially when auto formatting the code. I would be quite annoyed if I write something as ['\n', '\t', ' '] and it auto formats as "\n\t ", when I clearly meant it as a list of characters and not a string of text.

Having high-level hints of AST nodes being generated as part of syntactic sugar does not sound too unreasonable. These would likely get lost in certain transformations when compiling, but this syntactic information is not likely needed at that point.

Add tuple projection tokens

8fd8d8e

br4sco force-pushed the tuple-proj-tokens branch from 5017d6d to 8fd8d8e Compare January 11, 2024 17:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add tuple projection labels #819

Add tuple projection labels #819

br4sco commented Jan 11, 2024

johnwikman commented Jan 19, 2024 •

edited

Loading

elegios commented Jan 19, 2024

br4sco commented Jan 24, 2024

david-broman commented Feb 13, 2024

johnwikman commented Feb 13, 2024

elegios commented Feb 13, 2024 •

edited

Loading

johnwikman commented Feb 13, 2024

Add tuple projection labels #819

Are you sure you want to change the base?

Add tuple projection labels #819

Conversation

br4sco commented Jan 11, 2024

johnwikman commented Jan 19, 2024 • edited Loading

elegios commented Jan 19, 2024

br4sco commented Jan 24, 2024

david-broman commented Feb 13, 2024

johnwikman commented Feb 13, 2024

elegios commented Feb 13, 2024 • edited Loading

johnwikman commented Feb 13, 2024

johnwikman commented Jan 19, 2024 •

edited

Loading

elegios commented Feb 13, 2024 •

edited

Loading