Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tuple projection labels #819

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

br4sco
Copy link
Contributor

@br4sco br4sco commented Jan 11, 2024

After discussions with @elegios, this PR also changes parsing of record projections so that .<uint> is now treated as a token. This means that it is now possible to write projections from nested tuples as:

(1, (2, 3)).1.0

Before we had to project from nested tuples using ((1, (2, 3)).1).0 or (1, (2, 3)).1 .0, as 1.0 would parse as a float. However, this change means that we cannot write lambdas as lam x.1, instead we need to write it as lam x. 1.

@br4sco br4sco force-pushed the tuple-proj-tokens branch from 5017d6d to 8fd8d8e Compare January 11, 2024 17:44
@johnwikman
Copy link
Contributor

johnwikman commented Jan 19, 2024

The thing that I am unsure about is how the new tokenization corresponds to the mental picture of the syntax. In my mind, the "dot" represents a separator which means that I expect that lam x.0 and lam x. 0 would parse as the same AST. This becomes a wierd quirk of the language that I am not really comfortable with, but it's highly subjective and most people were fine with this quirk as discussed on the last meeting.

The way I see it, the main culprits are tuples and floats. I tested some variations on the dot as a separator in lambdas and could not find a good alternative. So I am throwing in some alternate suggestions of how this issue could be fixed:

  1. Change label naming in tuples to _<idx>. So the tuple (4, ("foo", True)) would instead be syntactic sugar for {_0 = 4, _1 = {_0 = "foo", _1 = True}}. Then we can get the foo by t._1._0, which is less nice than t.1.0 but not terrible IMO.
  2. Use tup:label syntax for record projection. E.g. t:1:0 to get the foo from the previous example. This would also work for records, e.g. env.freevars becomes env:freevars. An issue I see with this syntax is that the colon might hint at that this is a sequence or slice as is common in other languages, rather than projection.
  3. (Very unsure of this) Change syntax of floats to always use a postfix f. Similar to single-precision floats in C, they have the end with the f suffix. So a float value for 1 would be specified as 1.0f. This also opens up that we don't need to have the prefix 0 anymore, so .3f would be a valid token for the float value 0.3.

@elegios
Copy link
Contributor

elegios commented Jan 19, 2024

I think your point 1 is a very neat change that also makes it so that labels fully overlap with identifiers, rather than identifiers and integers.

Point 3 is also interesting in that it fixes the issue and enables some float syntax we wouldn’t have otherwise (also, e.g., 1f, which for some reason feels nicer to me than 1.). It’s a bit heavier syntax for floats, which is the thing that makes me unsure.

After some thinking, I think I don’t support my previous suggestion any more, and that I’d rather do your 1, or 3, or drop tuple-projection. Dropping is a nice thing for a surface language intended to be read, since it encourages names, but of course, mcore isn’t a surface language. Then again, it’s a decently common feature in languages, and it would be nice to have a clean way of pprinting the corresponding mcore.

@br4sco
Copy link
Contributor Author

br4sco commented Jan 24, 2024

I like point 1 because It removes one instance of syntactic sugar. I also like point 3. Not only for not overlapping with tuple projections, but I also think the code becomes more readable with an f suffix rather than, a possible non-existing, decimal.

@david-broman
Copy link
Contributor

But alternative 1 would conflict with ordinary field names. I.e., "_0" is a legal identifier. That is, the syntactic sugar of tuples pollutes the ordinary namespace. This might be OK, but we need to think of the consequences of that.

@johnwikman
Copy link
Contributor

In terms of confusion when directly creating records? Or in general with labels starting with an underscore?

-- these are equivalent

let r = {_0 = 5, _1 = "foo"} in r._0

let r = (5, "foo") in r._0

I think that this feels fine, the only potential issue I see in the short term is that we probably need to have some more metadata when pretty printing records. Such that a record explicitly created as {_0 = 5, _1 = "foo"} doesn't get pretty printed as a tuple.

There should be no syntax issue specific to tuple projection since that should equally apply to records.

@elegios
Copy link
Contributor

elegios commented Feb 13, 2024

Technically the current way things work also has tuples pollute the ordinary namespace, it's just that it's a bit more inconvenient to write those labels since you have to say, e.g., #label"1". All we're doing is changing what precise form of record we're giving syntactic sugar, in the form of tuples (patterns, expressions, and types).

The description of the syntax sugar for tuples would change roughly as follows:

(a, b, c) is syntactic sugar for {#label"0" = a, #label"1" = b, #label"2" = c} and foo.0 is syntactic sugar for foo.#label"0".

becomes

(a, b, c) is syntactic sugar for {_0 = a, _1 = b, _2 = c}.

along with the corresponding change for tuple types.

The only change is that it's easier to write a record that looks like a mixed record and tuple; {_0 = a, foo = b} is probably more likely to be written than {#label"0" = a, foo = b}. Regardless, the only thing weird about such a record is that if we have code that projects a tuple-like field from a record then we might expect to be able to project all other fields in a tuple-like fashion, which wouldn't be true for this mix. On the other hand, that's also true today, so this change wouldn't affect the status quo.


I also don't think we should special case and remember how things were printed originally, unless we want to do that for everything, e.g.,

  • () vs {}
  • "" vs []
  • "abc" vs ['a', 'b', 'c']
  • foo.bar vs match foo with {bar = x} in x
  • match a with b in ... vs match a with b then ... else never
  • switch x case A _ then true case B _ then false end vs match x with A _ then true else match x with B _ then false else never (this one's not implemented in the pretty-printer, but at least I feel like it should be).

@johnwikman
Copy link
Contributor

I also don't think we should special case and remember how things were printed originally, unless we want to do that for everything, e.g.,

* `()` vs `{}`

* `""` vs `[]`

* `"abc"` vs `['a', 'b', 'c']`

* `foo.bar` vs `match foo with {bar = x} in x`

* `match a with b in ...` vs `match a with b then ... else never`

* `switch x case A _ then true case B _ then false end` vs `match x with A _ then true else match x with B _ then false else never` (this one's not implemented in the pretty-printer, but at least I feel like it should be).

Perhaps not in the case of MCore, but for a language intended for end users I think that this is quite important, especially when auto formatting the code. I would be quite annoyed if I write something as ['\n', '\t', ' '] and it auto formats as "\n\t ", when I clearly meant it as a list of characters and not a string of text.

Having high-level hints of AST nodes being generated as part of syntactic sugar does not sound too unreasonable. These would likely get lost in certain transformations when compiling, but this syntactic information is not likely needed at that point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants