Demo: adding m[x] notation for indexing #9174

Open
wants to merge 1 commit into
base: master

richcarl
Contributor

This is just a proof of concept as a discussion point. We could easily have m[x] notation for looking up a value in a map (and possibly other data types), if we wanted to. It's no different grammatically from how it works in e.g. Python or Javascript.

f(X, Y) ->
    A=#{a => #{1=>a, 2=>b, 3=>c},
        b => #{1=>p, 2=>q, 3=>r},
        c => #{1=>x, 2=>y, 3=>z}},
    A[X][Y].

As people are using maps more and more, I see that there could be a case for adding this syntax now, as an alias for erlang:map_get/2, especially for accessing nested maps. Works in guards as well.
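For comparison, here is a minimal sketch of how the same lookup is spelled today, assuming that A[X][Y] would simply expand to nested map_get/2 calls as described above. The module name nested_lookup_demo and the helper is_q/3 are made up for illustration and are not part of the PR.

```erlang
-module(nested_lookup_demo).
-export([f/2, is_q/3]).

%% Today's spelling of the proposed A[X][Y]: nested map_get/2 calls.
%% Note the argument order: map_get(Key, Map).
f(X, Y) ->
    A = #{a => #{1 => a, 2 => b, 3 => c},
          b => #{1 => p, 2 => q, 3 => r},
          c => #{1 => x, 2 => y, 3 => z}},
    map_get(Y, map_get(X, A)).

%% map_get/2 is allowed in guards. If a key is missing, the guard
%% fails and the next clause is tried.
is_q(M, X, Y) when map_get(Y, map_get(X, M)) =:= q ->
    true;
is_q(_M, _X, _Y) ->
    false.
```

With the proposed notation, the guard in is_q/3 would presumably read `when M[X][Y] =:= q`.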

Contributor

github-actions bot commented Dec 10, 2024

CT Test Results

    2 files     96 suites   1h 9m 8s ⏱️
2 173 tests 2 125 ✅ 48 💤 0 ❌
2 536 runs  2 486 ✅ 50 💤 0 ❌

Results for commit f47e1ec.

♻️ This comment has been updated with latest results.

// Erlang/OTP Github Action Bot

@bjorng added the team:LG (Assigned to OTP language group) label on Dec 11, 2024
@Maria-12648430
Contributor

This is something I have wished for for a long time, thanks ❤️

I have some issues with the terminology and syntax, though 😅


The term "index" seems wrong to me in the context of maps, and should be "key" or something. The grammar token should be something like map_lookup_expr, in order to remove the "index" part and make it obvious that it concerns maps.


The proposed syntax of M[X][...] using square brackets puts this uncomfortably close to lists territory. It may be just me, but when I see square brackets, I think "lists". What I mean becomes more obvious when you put in some more formatting. Using the last line from your example, imagine it written like this:

    ...
    A
        [X]
        [Y].

It's not really obvious at a glance what this means. Each line on its own is valid syntax. Which also means, if a stray comma sneaks in, like this...

    ...
    A
        [X],
        [Y].

... it won't be detected by the compiler, since it is all totally valid. But the result won't be what you want at all: [Y] instead of maps:get(Y, maps:get(X, A)), all because of a misplaced, hard-to-spot comma. Debug this 👊

My suggestion is, at least put in a #. Me, when I see #, I always think "maps". Maybe also abolish square brackets here, another option would be something like #{X}, that is, a map-like construct without the assignment. But most important, make it so that the lookup construct is not valid code on its own.
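To make the stray-comma concern concrete, here is a rough sketch in today's syntax of what the two variants would amount to, assuming the proposed A[X] is an alias for map_get(X, A). The module and function names are hypothetical. Whether the compiler would warn about the discarded lookup is not something this thread settles.

```erlang
-module(stray_comma_demo).
-export([lookup/3, oops/3]).

%% Intended meaning of A[X][Y], spelled with today's map_get/2.
lookup(A, X, Y) ->
    map_get(Y, map_get(X, A)).

%% What "A[X], [Y]" would amount to: the lookup result is thrown
%% away and the one-element list [Y] becomes the return value.
oops(A, X, Y) ->
    _ = map_get(X, A),
    [Y].
```

Both functions compile and run without error for valid keys, which is the point being made: nothing fails, the result is just a different term.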

@williamthome
Contributor

I don't dislike the proposed notation, but I agree with Maria.
However, I'll give another suggestion.

This is a record notation:

-record(foo, {bar}).

rec() ->
    Rec = #foo{bar = bar},
    Rec#foo.bar.

I suggest using a notation like the one on record:

map() ->
    Map = #{foo => bar},
    #Map.foo.

To me, this notation feels more obvious that I'm getting a map value instead of Map[foo].


Note that the following are invalid record notations:

invalid_rec_1() ->
    Foo = foo, 
    Rec = #foo{bar = bar},
    Rec#Foo.bar.

invalid_rec_2() ->
    Foo = foo,
    #Foo{bar = bar}.

@richcarl
Contributor Author

The term "index" seems wrong to me in the context of maps, and should be "key" or something.

The act of looking up something is commonly referred to as indexing, and the terminology should also not be too specific to the particular application of looking up from a dictionary, since it could be made to apply to things like tuples or binaries as well.

The proposed syntax of M[X][...] using square brackets puts this uncomfortably close to lists territory. It may be just me, but when I see square brackets, I think "lists".

This is true, but the same goes for Python, Javascript, Ruby, etc. They all have the same possibility of accidentally leaving out a comma, but I don't see many people pointing this out as a big problem. The advantages of using a universally recognized notation compared to adding yet another Erlang specific quirk should outweigh the risk of occasional typos, I think.

@Maria-12648430
Contributor

map() ->
    Map = #{foo => bar},
    #Map.foo.

This is kinda ok (though I can't say I like it much, sorry to say) as long as the key is an atom. It gets pretty weird when the key is something else: #Map."foo", #Map.{foo}, #Map.[foo], #M.#{foo => bar}. And it gets even weirder when the map to access is not in a variable but given as a literal: ##{foo => bar}.foo.

@Maria-12648430
Contributor

Maria-12648430 commented Dec 12, 2024

This is true, but the same goes for Python, Javascript, Ruby, etc.

This doesn't mean it is good, just saying 😜

The advantages of using a universally recognized notation compared to adding yet another Erlang specific quirk should outweigh the risk of occasional typos, I think.

Let's just say I really disagree on that point 😅

Curious to see what others think...

@richcarl
Contributor Author

I suggest using a notation like the one on record:

Keep in mind that both map and key can be computed values. You probably don't want this:

    #(build_a_map(Of, Stuff)).(get_the_key(From, Somewhere))

@richcarl
Contributor Author

Oh, and don't forget that you probably want to be able to chain them in a good way. Consider:

    #(#(get_map_of_maps(Of, Stuff)).(get_the_key(From, Somewhere))).(get_other_key(Elsewhere))

compared to

    get_map_of_maps(Of, Stuff)[get_the_key(From, Somewhere)][get_other_key(Elsewhere)]
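For reference, chained lookups with computed maps and keys can already be written without new syntax by folding over a list of keys. This is only a sketch: the helper name get_in/2 and the module name are invented here, and it relies on lists:foldl/3 calling its fun as Fun(Elem, Acc), which happens to match map_get(Key, Map).

```erlang
-module(chain_demo).
-export([get_in/2]).

%% Walk down nested maps, one key per level.
%% lists:foldl/3 calls the fun as Fun(Key, MapSoFar), which is
%% exactly the argument order of erlang:map_get/2.
get_in(Map, Keys) when is_map(Map), is_list(Keys) ->
    lists:foldl(fun erlang:map_get/2, Map, Keys).
```

The second example above would then read something like `chain_demo:get_in(get_map_of_maps(Of, Stuff), [get_the_key(From, Somewhere), get_other_key(Elsewhere)])`, which is arguably less direct than the bracket form.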

@juhlig
Contributor

juhlig commented Dec 12, 2024

The act of looking up something is commonly referred to as indexing

Really? I would say that the common expression is "looking up (by index, by key, by ...)". English is not my native language, though, so I may be mistaken.

the terminology should also not be too specific to the particular application of looking up from a dictionary, since it could be made to apply to things like tuples or binaries as well.

I think this should be restricted to maps. Because aside from maps, what this could be applied to are:

  • Lists. However, accessing list elements by their index is discouraged, for performance reasons among others. But if you introduce a shortcut for doing it, you encourage it instead, because if the language provides a shortcut for it, it can't be bad, right?
  • Tuples. Yes, but how often do you want to access tuple elements by their index? They are usually small and pattern-matched against. Cases where you need access by index are rare enough that using element/2 is not too much of a pain.
  • Binaries. No. What should Bin[N] give you? The Nth byte? The Nth bit? What about other unit sizes? And what should the type of the returned value be? An integer? In what endianness, and signed or unsigned? Or a (sub-) binary?
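The three cases in the list above map onto existing functions roughly as follows. This is an illustrative sketch with a made-up module name; it mainly shows that "Bin[N]" has several defensible readings, each with a different result type and a different index base than lists and tuples.

```erlang
-module(index_demo).
-export([examples/0]).

examples() ->
    L = [a, b, c],
    T = {a, b, c},
    Bin = <<16#A5, 16#0F>>,
    N = 1,
    %% Lists: 1-based, walks N cells.
    a = lists:nth(N, L),
    %% Tuples: 1-based, constant time.
    a = element(N, T),
    %% Binaries: at least three readings of "Bin[1]".
    16#0F = binary:at(Bin, N),           % 0-based byte, as an integer
    <<16#0F>> = binary:part(Bin, N, 1),  % 0-based byte, as a sub-binary
    <<_:N/bits, Bit:1, _/bits>> = Bin,   % bit at 0-based offset 1
    0 = Bit,
    ok.
```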

@richcarl
Contributor Author

I think this should be restricted to maps.

That's perfectly fine, but the (suggested) notation is neutral and shouldn't be hard coded to maps only, because we don't know what we might want to add in the future. Maybe built-in arrays or other kinds of vectors as an example.

@garazdawi
Contributor

Maybe built-in arrays or other kinds of vectors as an example.

Or some kind of Access protocol similar to what Elixir uses :) So that we could use it for custom datatypes.

@richcarl
Contributor Author

Or some kind of Access protocol similar to what Elixir uses :) So that we could use it for custom datatypes.

Yes, it's an interesting question if that can be added to Erlang in a nice way (with little overhead and no complications at compile time and supporting dynamic code updates and with good debugging support).

@juhlig
Contributor

juhlig commented Dec 13, 2024

Maybe built-in arrays or other kinds of vectors as an example.

Or some kind of Access protocol similar to what Elixir uses :) So that we could use it for custom datatypes.

(Disclaimer: This is just my personal opinion)


Please don't introduce multiple mostly-alike-but-oh-so-slightly-different implementations of the same thing. Also, please don't introduce "magic wands" that work on anything remotely alike except when it doesn't.


This is one thing I hate most about how Clojure (what I do for a living) handles things, and what I cherish most about how Erlang handles things.

Like, lists and vectors in Clojure. Basically the same thing from an outside (user) perspective; except when you e.g. conj: with lists it prepends, with vectors it appends.
In Erlang in contrast, there is one thing for one purpose, and so there is one way of doing things, with absolutely predictable (and debuggable) results.

map, filter and friends work on lists/vectors, sets, maps (and whatnot, basically anything that can be traversed), even though those are really different things, and you may at least have to design the mapping/filtering/etc. function according to what you think the input is. They are all handled as if they were vectors/lists (and nil is taken as "empty", which doesn't make it better), and the result is always a list, no matter what the input was (see lists, vectors, conj above), which you tend to forget.
In Erlang, if I want to do something with maps, I have to use the maps module or maps-specific syntax, which means I can be sure that what goes in and/or comes out is a map, not a list, not a tuple; and if I want to do something with lists, I have to use the lists module or lists-specific syntax, which means I can be sure that what goes in and/or comes out is a list.

The Clojure way of "pick your best fit of a variety of similar data structures" and "one function to work on every data structure" may seem nice and shiny at first. Until you get to debugging things, and all your logic is fine, but different things happen because the data running through it is of one type and not another, or got silently transformed from one type into another, one that can be (and is) processed, just with slightly different results.

@juhlig
Contributor

juhlig commented Dec 13, 2024

The advantages of using a universally recognized notation compared to adding yet another Erlang specific quirk should outweigh the risk of occasional typos, I think.

Let's just say I really disagree on that point 😅

So do I, wholeheartedly! I think that fitting a new feature/syntax into a language should, above all, be guided by the specifics (or quirks, if you want) of the language it is introduced into, not by what it looks like in other languages that already have that feature.

What you will get by adopting the latter approach is a very messy mish-mash conglomerate of syntaxes borrowed from all over the place. "$THIS has to be done the rubbish Ruby-ish way; $THAT has to be done the Elixir-ish way; $OTHER_THING has to be done the Haskell-ish way; and there still is the Erlang heritage way for older things".

tl;dr, my 2ct: We are doing Erlang, so let's just keep doing things the Erlang way. There are enough things for newcomers to get used to, and syntax is IMO only a minor part of that. Learning a different but not too far off access syntax is just another small thing for the learner to overcome. Using a construct that does not fit into the overall language is a permanent sore once you're into the language.

@Maria-12648430
Contributor

tl;dr, I totally agree with what @juhlig said.

To illustrate the point of "works on everything, except when it doesn't" in the current access syntax context, imagine the following:

M = #{a => b, 1 => c}.
L = [b, a].
  • M[1] and L[1] will both work, but access different things (the map key 1 in the former, the element at position 1 in the latter case), and yield different results
  • M[a] will work, but L[a] won't
  • M[2] won't work, but L[2] will

By using the same access syntax on different things, you never know what it is that you will get. To be sure, you have to use type checks (is_map, is_list, ...) beforehand and act differently, but then again you can usually just pattern match out the interesting thing while you're doing that.
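The three bullets can be checked with today's type-specific functions. The sketch below assumes that a polymorphic X[K] would dispatch to map_get/2 for maps and to lists:nth/2 for lists, which is only one possible design; the module and function names are hypothetical.

```erlang
-module(same_syntax_demo).
-export([compare/2]).

%% Call as compare(#{a => b, 1 => c}, [b, a]).
compare(M, L) when is_map(M), is_list(L) ->
    %% "X[1]": key 1 in the map, position 1 in the list.
    c = map_get(1, M),
    b = lists:nth(1, L),
    %% "X[a]": fine for the map, an error for the list.
    b = map_get(a, M),
    {'EXIT', _} = (catch lists:nth(a, L)),
    %% "X[2]": no such key in the map, but a valid list position.
    {'EXIT', {{badkey, 2}, _}} = (catch map_get(2, M)),
    a = lists:nth(2, L),
    ok.
```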

@richcarl
Contributor Author

The Clojure way of "pick your best fit of a variety of similar data structures" and "one function to work on every data structure" may seem nice and shiny at first. Until you get to debugging things, and all your logic is fine, but different things happen because the data running through it is of one type and not another, or got silently transformed from one type into another, one that can be (and is) processed, just with slightly different results.

I agree that this kind of situation is not what you want, and thanks for sharing your Clojure experiences. Like I said, debugging must work well. On the other hand there are situations where you do want to be able to run the same nontrivial piece of code on different implementations of a data type, and you do not want to maintain several copies of that code each tailored to use a separate implementation. Today, you can solve this using preprocessor macros, or passing around module names for dynamic calls, or by parse transforms or similar code generation approaches, and all those are mostly worse for debugging. It would be nice to have a general built-in approach with good support for tracing and debugging.
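As an illustration of the "passing around module names for dynamic calls" approach mentioned above, here is a small sketch. It leans on the fact that dict and orddict both export fetch/2 with the same argument order; the module name dispatch_demo and the function total/3 are invented for this example.

```erlang
-module(dispatch_demo).
-export([total/3]).

%% Mod must export fetch(Key, Container); dict and orddict both do.
%% The same code then runs unchanged on either implementation.
total(Mod, Keys, Container) when is_atom(Mod) ->
    lists:sum([Mod:fetch(K, Container) || K <- Keys]).
```

A call such as `dispatch_demo:total(orddict, [a, b], orddict:from_list([{a, 1}, {b, 2}]))` works, but the call target is only known at run time, which is part of why this style is harder to trace and analyse than a built-in mechanism might be.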

@garazdawi
Contributor

Protocols can be a source of great confusion, but they can also reduce the amount of code written and make things more performant. For example, being able to let the json module iterate over a gb_tree or an ets table, instead of having to provide a custom encoder function. Being able to look up an element in a list by an index, not so much, which I imagine is why you get this in Elixir if you try:

** (ArgumentError) the Access module does not support accessing lists by index, got: 1

Accessing a list by index is typically discouraged in Elixir, instead we prefer to use the Enum module to manipulate lists as a whole. If you really must access a list element by index, you can Enum.at/1 or the functions in the List module
    (elixir 1.17.3) lib/access.ex:347: Access.get/3
    iex:2: (file)

I doubt that we will ever get protocols in Erlang, but at times I think we really are missing out. The combination of protocols and structs in Elixir is very neat IMO.

@michalmuskala
Contributor

michalmuskala commented Dec 16, 2024

I agree that a map-key access syntax would be a good addition, and it would allow simplifying quite a bit of code. I'm not convinced about the M[key] syntax.

The original map EEP proposed M#{key} as the access syntax, but it was never implemented.
I think one of the issues with that syntax is that it's easy to confuse with the update syntax.

I agree with the premise of this discussion about using [] for access as a clear "indexing" marker. I would, however, argue for keeping # as the existing "syntax marker" for maps. We'd end up with an M#[key] syntax in this case.

Polymorphism is a tool with advantages and disadvantages - traditionally Erlang had fairly little polymorphism, and in places where it had it - notably size/1 BIF, it was later effectively deprecated and is a performance pitfall today. I'm not sure changing the situation just in this one place makes sense - if indeed Erlang should be extended with some form of polymorphism, it should be more generic than just access syntax.
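The size/1 case can be seen in a few lines. This is just a sketch with a hypothetical module name: size/1 accepts both tuples and binaries, whereas tuple_size/1 and byte_size/1 each accept exactly one type.

```erlang
-module(size_demo).
-export([sizes/2]).

sizes(T, B) when is_tuple(T), is_binary(B) ->
    %% Polymorphic: the same BIF for two unrelated types.
    Poly = {size(T), size(B)},
    %% Type-specific: each call states which type is expected.
    Specific = {tuple_size(T), byte_size(B)},
    {Poly, Specific}.
```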

Labels
team:LG Assigned to OTP language group
7 participants