headless anonymous function (->) syntax #38713

Open
rapus95 opened this issue Dec 5, 2020 · 163 comments
Labels
parser Language parsing and surface syntax

Comments

@rapus95
Contributor

rapus95 commented Dec 5, 2020

Since #24990 stalls on the question of what the right amount of tight capturing is

Idea

I want to propose a headless -> variant which has the same scoping mechanics as (args...)-> but automatically collects all not-yet-captured underscores into an argument list. EDIT: Nesting will follow the same rules as variable shadowing, that is, the underscore binds to the tightest headless -> it can find.

Before                            After
lfold((x,y)->x+2y, A)             lfold(->_+2_,A)
lfold((x,y)->sin(x)-cos(y), A)    lfold(->sin(_)-cos(_), A)
map(x->5x+2, A)                   map(->5_+2,A)
map(x->f(x.a), A)                 map(->f(_.a),A)
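
For reference, this is roughly what the rows above correspond to in current Julia. It is only a sketch: lfold is not a Base function, so foldl is assumed as a stand-in, and A, f and records are made-up illustration data.

```julia
# Illustrative data; `A`, `f` and `records` are assumptions, not part of the proposal.
A = [1, 2, 3, 4]
f(v) = 10v
records = [(a = 1,), (a = 2,)]

# `foldl` stands in for `lfold`, which is not defined in Base.
weighted = foldl((x, y) -> x + 2y, A)           # proposed: lfold(->_+2_, A)
trig     = foldl((x, y) -> sin(x) - cos(y), A)  # proposed: lfold(->sin(_)-cos(_), A)
affine   = map(x -> 5x + 2, A)                  # proposed: map(->5_+2, A)
fields   = map(x -> f(x.a), records)            # proposed: map(->f(_.a), A)
```

In each proposed form the underscores are filled from left to right, so the first `_` plays the role of x and the second the role of y.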

Advantage(s)

In small anonymous functions, underscores as variables can increase readability since they stand out a lot more than ordinary letters. For multi-argument cases, like anonymous functions for reduce/lfold, it can even save a decent number of characters.
Overall it reads very intuitively as "start here, and whatever arguments you get, just drop them into the slots from left to right":

      -> ---| -----|
            V      V
lfold(->sin(_)-cos(_), A)

Sure, some more complex options like reordering ((x,y)->(y,x)), ellipsing ((x...)->x) and probably some other cases won't be possible, but if everything were possible in the headless variant, we wouldn't have introduced the head in the first place.

Feasibility

  1. Both a leading -> and an _ on the right-hand side (value side) error on 1.5, so that shouldn't be breaking.
  2. Since it uses the well-defined scoping of the ordinary anonymous functions it should be easy to
    2a) switch between both variants mentally
    2b) reuse most of the current parser code and just extend it to collect/replace underscores

Compatibility with #24990

It shouldn't clash with the result of #24990 because that focuses more on very tight single-argument cases. And even if you are in a situation where the headless -> unintentionally consumes an underscore from #24990, it's enough to put 2 more characters (->) in the right place to make that underscore standalone once again.

Further Explorations

This proposal can optionally be combined with #53946.
Additionally, the following links to comments further down explore different ideas to stretch into, all adding their own value to different parts of the ecosystem.
Alternative explorations: #38713 (comment) #38713 (comment)

@StefanKarpinski
Member

After that long, inconclusive debate, I think I've also come to the conclusion that having an explicit marker is better. Headless -> is the "natural" marker for this, so there you have it — it's simple and unambiguous. It even leaves room for letting Array{_, 3} be a shorthand for Array{<:Any, 3}. We would have to decide what to do in cases like -> (_, Array{_, 3}), but I would argue that it would probably be best to just make that an error and not allow _ in type parameter position inside of a headless lambda, which would need to be decided when implementing this, since otherwise making it an error after this feature is implemented would be a breaking change.

@StefanKarpinski
Member

StefanKarpinski commented Dec 5, 2020

To clarify, the question is which of these -> (_, Array{_, 3}) would mean:

  • x -> (x, Array{<:Any, 3})
  • (x, y) -> (x, Array{y, 3})

Both could potentially make sense. My suggestion is to make it an error and force the user to either use a normal lambda or not use _ as a type parameter. Alternatively, we could say that _ always "binds" to the tightest thing it could. That's also a consideration in the presence of nested headless lambdas. For example, it seems only sensible to interpret -> (_, -> _ + _) as meaning x -> (x, (y, z) -> y + z), so you could make the same case for -> (_, Array{_, 3}) that the _ as an argument to Array should mean Array{<:Any} since it's innermost.
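
The "binds to the tightest thing" rule can be prototyped today with a macro. The following is only a sketch under stated assumptions: the macro name @headless and the helper collect_slots! are hypothetical, and this is not how the parser change would be implemented. Each `_` becomes a fresh argument from left to right, and nested @headless calls are skipped so their underscores stay with the inner lambda.

```julia
# Hypothetical helper: replace every `_` in `ex` by a fresh symbol, left to right,
# recording the new names in `args`. Nested `@headless` calls are returned untouched
# so that their underscores bind to the innermost (tightest) lambda.
function collect_slots!(args::Vector{Symbol}, ex)
    if ex === :_
        slot = gensym("slot")
        push!(args, slot)
        return slot
    elseif ex isa Expr
        if ex.head === :macrocall && ex.args[1] === Symbol("@headless")
            return ex
        end
        return Expr(ex.head, map(a -> collect_slots!(args, a), ex.args)...)
    else
        return ex
    end
end

# Hypothetical macro standing in for the proposed headless `->`.
macro headless(body)
    args = Symbol[]
    new_body = collect_slots!(args, body)
    return esc(Expr(:->, Expr(:tuple, args...), new_body))
end

add_weighted = @headless(_ + 2 * _)           # like (x, y) -> x + 2 * y
nested = @headless((_, @headless(_ + _)))     # like x -> (x, (y, z) -> y + z)
```

With these definitions, nested(10) returns a tuple whose second element is itself a two-argument function, matching the x -> (x, (y, z) -> y + z) reading. The sketch does not address `_` in type-parameter position.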

@yurivish
Contributor

yurivish commented Dec 5, 2020

Sure, some more complex options like reordering ((x,y)->(y,x))

One solution is to say that inside of a headless anonymous function _n refers to the nth argument. So (x, y) -> (y, x) would be written as

-> (_2, _1)

There is precedent for this in Clojure:

The function literal supports multiple arguments via %, %n, and %&.

#(println %1 %2 %3)

and Mathematica:

#n represents the nth argument.

In[1] := f[#2, #1] &[x, y]
Out[1] = f[y, x]
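
The numbered variant can also be sketched as a macro in current Julia, relying on the fact that _1, _2, ... are ordinary identifiers today. The names @numbered and max_slot are hypothetical and only serve as an illustration of the idea; the actual proposal is about parser support.

```julia
# Hypothetical helper: largest k such that the symbol `_k` occurs in `ex` (0 if none).
function max_slot(ex)
    if ex isa Symbol
        m = match(r"^_(\d+)$", String(ex))
        return m === nothing ? 0 : parse(Int, m.captures[1])
    elseif ex isa Expr
        return mapreduce(max_slot, max, ex.args; init = 0)
    else
        return 0
    end
end

# Hypothetical macro: builds (_1, ..., _k) -> body.
macro numbered(body)
    n = max_slot(body)
    params = [Symbol("_", i) for i in 1:n]
    return esc(Expr(:->, Expr(:tuple, params...), body))
end

swap = @numbered((_2, _1))               # like (x, y) -> (y, x)
re_less = @numbered(_1.re < _2.re)       # like (x, y) -> x.re < y.re
```

Because the argument names are spelled out in the body, reordering and re-use come for free, at the cost of one extra character per slot.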

@JeffBezanson JeffBezanson added the parser Language parsing and surface syntax label Dec 5, 2020
@rapus95
Contributor Author

rapus95 commented Dec 5, 2020

Alternatively, we could say that _ always "binds" to the tightest thing it could.

that's exactly what I meant when saying

automatically collects all not-yet-captured underscores into an argument list.

so yes, I'd clearly be in favor of making them bind the tightest.
Regarding the case of parametric underscore in headless lambda, I'd propose to already special case that in the parser (or wherever that belongs to 🙈) but make it error for now. That way we're free to add a proper rule once we've found a good solution without being breaking.
Though, right now I'm thinking about the following idea:

sugar                   expanded
->(_, Array{_, 1})      x->(x, Array{<:Any, 1})
->(_, Array{<:_, 1})    (x,y)->(x, Array{<:y, 1})

For types, the 2nd approach should work in most cases, since <:LeafType == LeafType for leaf types, while one rarely needs actual abstract types as fixed parameters (and even then we have AbstractType <: AbstractType).
The drawback is that this approach wouldn't work for bits values like integers, since <:1 is undefined iirc.
BUT! Once we have something like Array{String,::Int} to denote that the 2nd parameter is an integer, we might be able to allow ->Array{String, _::T} as (x::T)->Array{String, x}

@rapus95
Contributor Author

rapus95 commented Dec 5, 2020

Using _1, _2 to denote the argument order still gives me mixed feelings. On the one hand, it reduces readability since it's no longer simply "put in from left to right"; on the other hand, it doesn't save much anymore since you now need 2 characters to denote a single variable. A normal lambda would need 3 characters (definition, comma in argument list and usage).

@mcabbott
Contributor

mcabbott commented Dec 5, 2020

The other advantage of numbering is that it lets you re-use the same argument:

sort(xs, by= -> _.re + _.im/100)  # x -> x.re + x.im/100
sort(xs, lt= -> _1.re < _2.re)    # (x,y) -> x.re < y.re

Edit -- You could argue that the first line only saves one character, and the second 3 not 5. But what it does still save is the need to invent a name for the variable.

Re-using the same symbol with different meanings in an expression seems confusing to me. Is it ever actually ambiguous? (The order of symbols in +(a,b) and a+b differ, with the same Expr, but _ will never be infix. So perhaps that cannot happen?)

Edit' -- Not the most convincing example, but notice that 1,2,3 occur in order in the lowered version of this, as Iterators.filter puts the condition before the argument, but the comprehension puts it afterwards:

dump(:(  [f(x,1) for x in g(z,3) if h(x,2)]  ), maxdepth=10)

@yurivish
Contributor

yurivish commented Dec 5, 2020

Using _1, _2 to denote the argument order still gives me mixed feelings. On the one hand, it reduces readability since it's no longer simply "put in from left to right"; on the other hand, it doesn't save much anymore since you now need 2 characters to denote a single variable. A normal lambda would need 3 characters (definition, comma in argument list and usage).

You can continue to use _ for that; the numbered underscores are just a syntax for referring to arguments by their position.

In Mathematica and Clojure, the bare placeholder (# and % respectively) always refers to the first argument, and the numbered forms are usually used for the second/third/... arguments.

So @mcabbott's first example works as-is and the second example can also be written as

sort(xs, lt= -> _.re < _2.re)    # (x,y) -> x.re < y.re

@rapus95
Contributor Author

rapus95 commented Dec 5, 2020

to be honest I'm absolutely against making the single underscore refer to the same argument. Because we'd lose the entire convenience for the multi argument cases only to save 1(!!) character in rare situations...
->_.re + _.im/100 vs x->x.re + x.im/100 that's just not worth it. So IMO each underscore should denote its own argument, from left to right.
By that approach compared to @mcabbott 's suggestion we'd save 2 characters for the different argument case and only lose a single character for the same argument case. (which btw would currently be handled by the tight binding approach of #24990)
I.e. whenever it feels like multiple underscores should denote the same element, just use an ordinary lambda (it'll only cost you a single extra character)
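
The character counts argued over here can be checked mechanically. This is just a sketch that compares the spellings as strings, with spaces stripped for a fair comparison; the variable names and the savings helper are made up for illustration.

```julia
# Candidate spellings, written without spaces so that only the syntax is compared.
same_arg_headless = "->_.re+_.im/100"    # every `_` is the same argument
same_arg_lambda   = "x->x.re+x.im/100"
two_args_headless = "->_.re<_.re"        # every `_` is a new argument
two_args_numbered = "->_1.re<_2.re"
two_args_lambda   = "(x,y)->x.re<y.re"

# Number of characters the shorter spelling saves over the longer one.
savings(short, long) = length(long) - length(short)

println(savings(same_arg_headless, same_arg_lambda))   # same-argument case
println(savings(two_args_numbered, two_args_lambda))   # numbered slots vs. lambda
println(savings(two_args_headless, two_args_lambda))   # positional slots vs. lambda
```

For these particular strings the differences come out as 1, 3 and 5 characters, which matches the numbers quoted in the discussion.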

@rapus95
Contributor Author

rapus95 commented Dec 5, 2020

the numbering approach won't be possible until 2.0 anyway since _2 etc are currently valid identifiers and thus that would be breaking AFAIU

@yurivish
Contributor

yurivish commented Dec 5, 2020

the numbering approach won't be possible until 2.0 anyway since _2 etc are currently valid identifiers and thus that would be breaking AFAIU

I think it wouldn't be breaking if _2 only means the second argument inside a headless anonymous function.

Good points re: just using an ordinary lambda. I'm curious what fraction of anonymous functions would be made shorter by the "headless" type. It'd be hard to measure accurately, since anonymous functions are used a lot in interactive non-package code.

@rapus95
Contributor Author

rapus95 commented Dec 5, 2020

Re-using the same symbol with different meanings in an expression seems confusing to me

while certainly true for most characters, the underscore already has the meaning of
"fill in whatever you get, I won't refer to it anywhere else"
See underscore as left hand side where it's used to tell that we don't need the result; and as a parametric argument to denote "I don't care about the actual type"
So I'd try to use that freedom not to be bound to ordinary variable behavior. If I wanted ordinary behaviour for it, I could just use an ordinary variable 👀
if we were crazy we could just bind any variable which would otherwise result in an undefvarerror 😂 but I'm strongly against that.

@StefanKarpinski
Member

I'm in agreement with @rapus95 here: let's stick to the simple win with _ for positional arguments. Anything more complicated seems to me better expressed with named arguments.

@StefanKarpinski
Member

StefanKarpinski commented Dec 6, 2020

Oh, one more thing to consider: interaction with |>. With just the basic proposal here, one would often need to write something like this:

collection |> -> map(lowercase, _) |> -> filter(in(words), _)

We could either introduce a new pipe syntax as a shorthand for this, or say that |> also acts as a headless lambda delimiter in the absence of ->: if there are unbound _ in the expression following the |>, then the -> is implicitly inserted after the |> for you. So you'd write the above like this instead:

collection |> map(lowercase, _) |> filter(in(words), _)

This does allow using another headless lambda for the filter/map operation, like this for example:

collection |> map(-> _ ^ 7, _) |> filter(-> _ % 3 == 0, _)
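
For comparison, here is a sketch of what the two pipelines would desugar to in current Julia, with every lambda written out. The data (words, collection, nums) is made up for illustration.

```julia
# Illustrative inputs; these values are assumptions, not from the discussion.
words = Set(["apple", "kiwi"])
collection = ["Apple", "Banana", "KIWI"]

# Proposed: collection |> map(lowercase, _) |> filter(in(words), _)
known = collection |> (c -> map(lowercase, c)) |> (c -> filter(in(words), c))

nums = [1, 2, 3]
# Proposed: nums |> map(-> _ ^ 7, _) |> filter(-> _ % 3 == 0, _)
multiples = nums |> (c -> map(x -> x^7, c)) |> (c -> filter(x -> x % 3 == 0, c))
```

In the second pipeline the outer `_` of each stage is bound by |>, while the `_` inside the explicit headless -> belongs to the inner lambda, which is the tightest-binding rule again.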

Might want to do something similar with <| for symmetry. It feels a little weird to single out these two, but I can't see anything more general that makes much sense to me.

@rapus95
Contributor Author

rapus95 commented Dec 6, 2020

I love that idea tbh because it would give us some part of #24990 for "free". The only thing that holds me back is that, in that case, applying the same rule to \circ probably makes sense as well, and then it starts to feel like arbitrary special casing again...

Btw, since the headless approach is currently primarily developed around multiple-argument cases, it'd be very nice if we had a syntactic solution for splatting. Otherwise cases like
(1,2) |> (x,y)->x+y won't work anyway, since we'd have to wrap the anonymous function in Base.splat

EDIT: could we use |>> as a head that automatically splats Tuples? It already has that extra > which somewhat hints at the ->, and if it's made specifically for that, we could even include splatting since it would be a special operator that revolves around chaining anonymous functions.
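
The splatting problem can be reproduced in current Julia. The sketch below shows that piping a tuple into a two-argument function fails, and two existing workarounds; the name add is made up for illustration.

```julia
add = (x, y) -> x + y

# Piping a tuple passes it as ONE argument, so the two-argument lambda does not match.
pipe_failed = try
    (1, 2) |> add
    false
catch err
    err isa MethodError
end

# Workaround 1: wrap the function so that it splats its single argument.
via_splat = (1, 2) |> Base.splat(add)

# Workaround 2: splat by hand inside a one-argument lambda.
via_manual = (1, 2) |> (t -> add(t...))
```

A dedicated operator such as the suggested |>> would essentially bake the first workaround into the syntax.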

@yurivish
Contributor

yurivish commented Dec 6, 2020

Could syntax like map(... -> sin(_) - cos(_), xy_tuples) work?

@bramtayl
Contributor

bramtayl commented Dec 6, 2020

I think argument 1 could be 1 underscore (_), argument 2 could be 2 underscores (__), etc. This is how things are currently done in the queryverse. Ref https://www.queryverse.org/Query.jl/stable/experimental/#The-_-and-__-syntax-1 and #24990 (comment) and #24990 (comment)

@rapus95
Contributor Author

rapus95 commented Dec 6, 2020

@bramtayl having every single underscore reference the same single argument doesn't scale well for the headless syntax. Read #38713 (comment) for why.
and needing the same argument in multiple places is also a comparably rare case.

@bramtayl
Contributor

bramtayl commented Dec 6, 2020

Hmm, needing to use the same argument in multiple places happens all the time in querying though, I think. Consider processing a row of a table: you might need to reference several fields of the row.

@rapus95
Contributor Author

rapus95 commented Dec 6, 2020

how many characters more would you need if you switch to using an ordinary lambda with a single letter variable name instead? (remember that you need the -> in any case)

@bramtayl
Contributor

bramtayl commented Dec 6, 2020

The extra _ for the second argument seems to me to be a very small price to pay for the flexibility of using an argument as many times as you want, is all

@rapus95
Contributor Author

rapus95 commented Dec 6, 2020

a) but it doesn't scale into more different arguments
b) it's a huge price! we waste an entire syntax to save a single character compared to an ordinary lambda. You will never be able to save more than a single character if multiple underscores denote the same argument.

@bramtayl
Contributor

bramtayl commented Dec 6, 2020

It seems to me likely that wanting to make a two argument anonymous function will be much less common than wanting to reference an argument more than once

@rapus95
Contributor Author

rapus95 commented Dec 6, 2020

@bramtayl do the maths yourself. It scales very badly (i.e. negatively) if you intend to use any argument other than the first more than once, or if you intend to use more than 3 arguments.
So multiple chained underscores just don't benefit us for the general-purpose case.

_i for i being a single digit number would still be a better proposal for that case, both for scaling in number of uses per argument and number of arguments. But that can live in its own issue since it is orthogonal to the current proposal

@bramtayl
Contributor

bramtayl commented Dec 6, 2020

_i would definitely work too. I suppose I could go through some code and count the number of times you have different numbers of arguments in anonymous functions and the number of times you reuse an argument. Even though I suspect that reusing an argument will be far more common, it doesn't matter too much, because I think it would be nice to have a syntax flexible enough to do both. Do you know of any use-cases for three-argument anonymous functions?

@rapus95
Contributor Author

rapus95 commented Dec 6, 2020

Just include a lot of code samples which use reduce and similar functions. If you only include code that maps data, it would be an unfair comparison. But this already shows what I'm talking about. We don't want a domain-specific syntax feature in the general-purpose language. And the queryverse definitely is domain specific. And it already has a macro for that exact case, which doesn't seem to have made it outside of that domain.

@goretkin
Contributor

goretkin commented Dec 7, 2020

the underscore already has the meaning of
"fill in whatever you get, I won't refer to it anywhere else"
See underscore as left hand side where it's used to tell that we don't need the result; and as a parametric argument to denote "I don't care about the actual type"

I think more fundamentally, the principle behind _ is something like "I want to avoid giving an arbitrary name to a value". As of now, you can avoid giving an arbitrary name if

  • you don't need to access the value that would be named arbitrarily
(_, r) = divrem(1,2)
foo(_) = 3
foo(nothing)
  • (less famously) there is only one field in the struct, which is perfect for "wrapper" types
struct Foo
    _::Int64
end

Foo(3)._

Note this usage is a counterexample to _ meaning "I won't refer to it". (xref: #37381)

It seems (at least) roughly consistent with this principle that Array{_, 3} mean Array{<:Any, 3}, which is Array{var"#s27", 3} where var"#s27", where the need for an arbitrary name is fulfilled by something like gensym.
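
The existing uses of _ and the Array{<:Any, 3} equivalence mentioned above can be checked in current Julia. This is a small sketch; the names ignore_arg and Cube are made up for illustration.

```julia
# `_` on the left-hand side discards a value that is never read again.
(_, r) = divrem(7, 2)

# `_` as an argument name: the argument is accepted but cannot be referred to.
ignore_arg(_) = 3

# `Array{<:Any, 3}` is shorthand for a `where` type with an automatically named parameter.
Cube = Array{<:Any, 3}
same_type = Cube == (Array{T, 3} where T)
int_cube_matches = Array{Int, 3} <: Cube
matrix_matches = Array{Int, 2} <: Cube
```

Under the shorthand discussed in this thread, Array{_, 3} would simply be another spelling of Cube.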

The use of _ as an arbitrary name for a type parameter, combined with the proposal here, leads to a possible ambiguity already mentioned. To be sure, consider the interaction with _ as a field name:

x->x._ could be written as ->_._. I do not think there is any chance for ambiguity here, the same way that there is no chance for ambiguity with x->x.x.

I agree that it would be more useful if each _ in the anonymous body referred to a subsequent argument, as opposed to the alternative that _ always refers to the first argument. I'm not sure if this choice can be justified as a natural consequence of _ meaning "avoid arbitrary name", and that the alternative cannot be, but it kind of feels that way. Certainly if _ is used as a type parameter, each _ would be a unique parameter.

@bramtayl
Contributor

bramtayl commented Dec 7, 2020

Hmm, well, a quick audit of non-single-argument-0-or-1-mention uses of -> in julia/base, excluding splats, is below. A couple of notes:

  • 3 argument anonymous functions seem extremely rare; I didn't see any 4 argument ones.
  • At least from this sample, reusing an argument in an anonymous function is about as common as having multiple arguments.
  • I did this relatively quickly, so I probably missed some

Multiple arguments

sum(map((i, s, o)->s*(i-o), J, strides(x), Tuple(first(CartesianIndices(x)))))*elsize(x)
foldr((v, a) -> prepend!(a, v), iter, init=a)
(r,args) -> (r.x = f(args...))
(i,args) -> (itr.results[i]=itr.f(args...))
((p, q) -> p | ~q))
((p, q) -> ~p | q))
((p, q) -> ~xor(p, q)))
((p, q) -> ~p & q))
((p, q) -> p & ~q)))
map((rng, offset)->rng .+ offset, I.indices, Tuple(j))
dict_with_eltype((K, V) -> Dict{K, V}, kv, eltype(kv))
foldl((x1,x2)->:($x1 || ($expr == $x2)), values[2:end]; init=:($expr == $(values[1])))
retry(http_get, check=(s,e)->e.status == "503")(url)
retry(read, check=(s,e)->isa(e, IOError))(io, 128; all=false)
dict_with_eltype((K, V) -> IdDict{K, V}, kv, eltype(kv))
CartesianIndices(map((i,j) -> i:j, Tuple(I), Tuple(J)))
CartesianIndices(map((i,s,j) -> i:s:j, Tuple(I), Tuple(S), Tuple(J)))
map((isrc, idest)->first(isrc)-first(idest), indssrc, indsdest)
(x,y)->isless(x[2],y[2])
(x, y) -> lt(by(x), by(y))
(io, linestart, idx) -> (print(io, idx > 0 ? lpad(cst[idx], nd+1)
(mod, t) -> (print(rpad(string(mod) * "  ", $maxlen + 3, "─"));
(f, x) -> f(x)
(f, x) -> wait(Threads.@spawn f(x))
afoldl((ys, x) -> f(x) ? (ys..., x) : ys, (), xs...)
Base.dict_with_eltype((K, V) -> WeakKeyDict{K, V}, kv, eltype(kv))
simple_walk(compact, lifted_val, (pi, idx)->true)
(io::IO, indent::String, idx::Int) -> printer(io, indent, idx > 0 ? code.codelocs[idx] : typemin(Int32))

Reuses an argument

dst::typeof(I) = ntuple(i-> _findin(I[i], i < n ? (1:sz[i]) : (1:s)), n)::typeof(I)
src::typeof(I) = ntuple(i-> I[i][_findin(I[i], i < n ? (1:sz[i]) : (1:s))], n)::typeof(I)
CartesianIndices(ntuple(k -> firstindex(A,k):firstindex(A,k)-1+@inbounds(halfsz[k]), Val{N}()))
CartesianIndices(ntuple(k -> k == dims[1] ? (mid:mid) : (firstindex(A,k):lastindex(A,k)), Val{N}()))
all(d->idxs[d]==first(tailinds[d]),1:i-1)
map(x->string("args_tuple: ", x, ", element_val: ", x[1], ", task: ", tskoid()), input)
foreach(x -> (batch_refs[x[1]].x = x[2]), enumerate(results))
map(v -> Symbol(v[1]) => v[2], split.(tag_fields, "+"))
findlast(frame -> !frame.from_c && frame.func === :eval, bt)
ntuple(n -> convert(fieldtype(T, n), x[n]), Val(N))
map(chi -> (chi.filename, chi.mtime), includes)
filter(x -> !(x === empty_sym || '#' in string(x)), slotnames[(kwli.nargs + 1):end])
ntuple(i -> i == dims ? UnitRange(1, last(r[i]) - 1) : UnitRange(r[i]), N)
ntuple(i -> i == dims ? UnitRange(2, last(r[i])) : UnitRange(r[i]), N)
map(n->getfield(sym_in(n, bn) ? b : a, n), names)
filter!(x->!isempty(x) && x!=".", parts)
all(map(d->iperm[perm[d]]==d, 1:N))
ntuple(i -> i == k ? 1 : size(A, i), nd)
ntuple(i -> i == k ? Colon() : idx[i], nd)
map(x->x isa Integer ? UInt64(x) : String(x), pre)
map(x->x isa Integer ? UInt64(x) : String(x), bld))
_any(t -> !isa(t, DataType) || !(t <: Tuple) || !isknownlength(t), utis)
 _all(i->at.val[i] isa fieldtype(t, i), 1:n)
filter(ssa->!isa(ssa, SSAValue) || !(ssa.id in intermediaries), useexpr.args[(6+nccallargs):end])
findfirst(i->last_stack[i] != stack[i], 1:x)
 x -> (x = new_nodes_info[x]; (x.pos, x.attach_after))
filter(p->p != 0 && !(p in bb_defs), cfg.blocks[bb].preds)
filter(ex -> !(isa(ex, LineNumberNode) || isexpr(ex, :line)), ex.args)

@rapus95
Contributor Author

rapus95 commented Dec 7, 2020

there is only one field in the struct, which is perfect for "wrapper" types

@goretkin that case is perfectly handled by interpreting it as 0-d and using x[] (as Ref does). The "whatever comes, I won't refer to it" meaning, otoh, was the only reason why the underscore became reserved. That's the reason why it must not be used as a right-hand side.

@bramtayl would you be willing to translate these cases into both (or even better, all 3) variants, i.e. "_ _", "_ __" and "_1 _2", and measure the number of characters saved compared to the ordinary lambda?

@mcabbott
Contributor

mcabbott commented Dec 7, 2020

Thanks for gathering these, @bramtayl.

In the "multiple arguments" list, it looks like 12/28 don't follow the simple pattern of using every argument, exactly once, in order, and not as a type parameter.

2 of those simply drop trailing arguments, (x, _...) -> stuff, which raises the question (not so-far discussed?) of whether these headless lambdas should in general accept more arguments than they use, or not. Should map(->nothing, xs) work?

Details
Drop last:
simple_walk(compact, lifted_val, (pi, idx)->true)
(mod, t) -> (print(rpad(string(mod) * "  ", $maxlen + 3, "─"));
Drop first or middle:
retry(http_get, check=(s,e)->e.status == "503")(url)
retry(read, check=(s,e)->isa(e, IOError))(io, 128; all=false)
(io, linestart, idx) -> (print(io, idx > 0 ? lpad(cst[idx], nd+1)
Shuffle:
sum(map((i, s, o)->s*(i-o), J, strides(x), Tuple(first(CartesianIndices(x)))))*elsize(x)
Re-use:
afoldl((ys, x) -> f(x) ? (ys..., x) : ys, (), xs...)
foldr((v, a) -> prepend!(a, v), iter, init=a)
(io::IO, indent::String, idx::Int) -> printer(io, indent, idx > 0 ? code.codelocs[idx] : typemin(Int32))
Type parameters:
dict_with_eltype((K, V) -> Dict{K, V}, kv, eltype(kv))
dict_with_eltype((K, V) -> IdDict{K, V}, kv, eltype(kv))
Base.dict_with_eltype((K, V) -> WeakKeyDict{K, V}, kv, eltype(kv))

Simple cases (every parameter used exactly once, in order) [Edit -- now with some brackets removed]:

(r,args) -> (r.x = f(args...))
(i,args) -> (itr.results[i]=itr.f(args...))
((p, q) -> p | ~q )
((p, q) -> ~p | q )
((p, q) -> ~xor(p, q))
((p, q) -> ~p & q)
((p, q) -> p & ~q)
map((rng, offset)->rng .+ offset, I.indices, Tuple(j))
foldl((x1,x2)->:($x1 || ($expr == $x2)), values[2:end]; init=:($expr == $(values[1])))
CartesianIndices(map((i,j) -> i:j, Tuple(I), Tuple(J)))
CartesianIndices(map((i,s,j) -> i:s:j, Tuple(I), Tuple(S), Tuple(J)))
map((isrc, idest)->first(isrc)-first(idest), indssrc, indsdest)
(x,y)->isless(x[2],y[2])
(x, y) -> lt(by(x), by(y))
(f, x) -> f(x)
(f, x) -> wait(Threads.@spawn f(x))

... which could become (with each _ a new argument)

-> (_.x = f(_...))
-> (itr.results[_]=itr.f(_...))
(-> _ | ~_ )
(-> ~_ | _ )
(-> ~xor(_, _))
(-> ~_ & _ )
(-> _ & ~_ )
map(->_ .+ _, I.indices, Tuple(j))
foldl(->:($_ || ($expr == $_)), values[2:end]; init=:($expr == $(values[1])))
CartesianIndices(map(-> _:_, Tuple(I), Tuple(J)))
CartesianIndices(map(-> _:_:_, Tuple(I), Tuple(S), Tuple(J)))
map(->first(_)-first(_), indssrc, indsdest)
->isless(_[2],_[2])
-> lt(by(_), by(_))
-> _(_)
-> wait(Threads.@spawn _(_))

Interesting how few from either list above would be clearer (IMO) without naming variables -- most are quite long & complicated. So another possibility to consider is that this headless -> syntax could be restricted to zero or one arguments [Edit -- I think I meant to say, "used at most once"], at least initially. To emphasise that it's for writing short things, where clarity may be improved by not having to name the variable.

I'm not sure that counting characters saved is a great measure, as the cases where you could save the most letters also seem like the ones complicated enough that you ought to be explicit. [Nor is counting how many cases in Base, really.] But using |> as a fence seems neat (it's visually -> with the minus rotated, right?) and means that some one-argument cases could become quite a bit shorter & less cluttered. For example you don't have to think about whether it's confusing to re-use the same name for vs & xs here:

[rand(Int8,5) for _ in 1:7] |> vs -> reduce(vcat, vs) |> xs -> filter(x -> x%3 != 0, xs)

collection |> reduce(vcat, _) |> filter(-> _%3 != 0, _)

@rapus95
Contributor Author

rapus95 commented Feb 20, 2023

As it seems I'm missing core aspects on what makes the proposal difficult to understand for the user. Maybe some can help me get it. I'm assuming an unbiased user that didn't follow all the different ideas of how these lambdas could work and thus won't get lost in all the different possible meanings that were suggested in the past, but instead only gets to know this through the following description:

Added a new syntax for lambdas for which the argument list is skipped. It is tailored to different situations:

  1. Accessing/extracting data from a single (first and only) argument
    Within a headless lambda that gets exactly one argument (onlyarg), interpolation syntax $prop refers to onlyarg.prop and $[i] refers to onlyarg[i]. Both access the first and only argument of the lambda. (solves 22710 and part of 24990)
    Examples:
  • filter(->$colA+$colB>5, data) as a short form of filter(x->x.colA+x.colB>5, data)
  • accessor = ->$a (hence accessor(x) == x.a)
  • fifth = ->$[5] (hence fifth(x) = x[5])
  2. Combining/transforming multiple arguments (as needed for higher order functions)
    Within a headless lambda, underscores refer to those different arguments in order, based on position. (solves part of 24990)
    Example:
  • reduce(->2*_+_, data) as a short form of reduce((a,b)->2*a+b, data)
  • map(->2*sin(_), data) as a short form of map(x->2*sin(x), data)

further examples:

  • mapreduce(->$a, ->2*_+_, data) as a shorthand for mapreduce(x->x.a, (s,n)->2*s+n, data)
  • subset(df, [colA, colB] => ByRow(->_+_>5)) as a shorthand for subset(df, [colA, colB] => ByRow((a,b)->a+b>5))
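
To make the two situations concrete, here is a sketch of what the proposed short forms would expand to in current Julia. The data is made up, and foldl is used in place of reduce so that the left-to-right evaluation order of the non-associative 2*a+b is guaranteed.

```julia
# Illustrative rows; the field names follow the examples above, the values are assumptions.
data = [(colA = 1, colB = 2), (colA = 4, colB = 3)]

# Situation 1: accessing data of the single argument.
big_rows = filter(x -> x.colA + x.colB > 5, data)   # proposed: filter(->$colA+$colB>5, data)
accessor = x -> x.a                                 # proposed: ->$a
fifth    = x -> x[5]                                # proposed: ->$[5]

# Situation 2: combining several positional arguments.
weighted = foldl((a, b) -> 2 * a + b, [1, 2, 3])    # proposed: reduce(->2*_+_, data)
doubled  = map(x -> 2 * sin(x), [0.0])              # proposed: map(->2*sin(_), data)
```

The `$` forms always refer to the one and only argument, whereas each `_` introduces a new positional argument.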

@tpapp
Contributor

tpapp commented Feb 20, 2023

I'm missing core aspects on what makes the proposal difficult to understand for the user.

From my perspective it is not that the proposal is difficult to understand per se. Julia is a powerful language, which comes with a certain amount of complexity, and users manage that just fine.

I think the key issue is the gain in function vs the added complexity, and tradeoffs between various alternatives (the multislot and the single argument versions are of course mutually exclusive).

Also, I think that a _ stands out more visually than a $.

(Incidentally, I find it confusing to switch syntaxes in the middle of a proposal like this.)

@MasonProtter
Contributor

MasonProtter commented Feb 20, 2023

->$a
#to me feels even more straight to the point and better serving the intention than
->_.a

This basically proposes $ instead of _ as the single argument placeholder, right?

@aplavin no, if you look at the code you quoted they are suggesting that -> $a means x -> x.a, not that $ is used as an alternative for _.


As it seems I'm missing core aspects on what makes the proposal difficult to understand for the user. Maybe some can help me get it. I'm assuming an unbiased user that didn't follow all the different ideas of how these lambdas could work and thus won't get lost in all the different possible meanings that were suggested in the past, but instead only gets to know this through the following description:

@rapus95 I have no problem personally, with adding more handy syntax because I've already learned our current syntax, so this is just a small bite sized addition for me to learn. However, that's not the case for everyone, specifically new users.

I think it's important to not consider new syntax in isolation, but to consider the entire pile of special syntax we already have in addition to the proposed new syntax. We should think about this from the perspective of beginners learning the language, not from the perspective of experienced users. Julia's syntax is already quite complicated, and adding new syntax rules will make learning our syntax even harder for new users.

The more special syntax we have, the less willing we should be to add more special syntax on top of it.

@MasonProtter
Contributor

MasonProtter commented Feb 20, 2023

Actually, I just realized that we don't really need to solve #36547, and we can actually just replace this -> syntax with a macro pretty trivially. The key is just to slurp up and then spit out extra arguments that might end up in the macro.

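using MacroTools

# Defines a macro literally named `_`, so that it is invoked as `@_`.
@eval macro $(:_)(ex)
    @gensym x
    # `@_ body, rest...` arrives as one tuple expression: the first element is the
    # lambda body, the remaining elements are extra arguments that were slurped up.
    if ex isa Expr && ex.head == :tuple
        pre_body, rest... = ex.args
    else
        pre_body = ex
        rest = ()
    end
    # Every `_` in the body is replaced by the same single argument `x`.
    body = MacroTools.postwalk(pre_body) do ex
        ex == :_ ? x : ex
    end
    λ = :($x -> $body)
    if length(rest) == 0
        esc(λ)
    else
        # Spit the lambda and the slurped arguments back out into the enclosing call.
        esc(:(($λ, $(rest...))...))
    end
end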

Behold:

julia> map(@_ _[1], [[1,2,3], [4,5,6]])
2-element Vector{Int64}:
 1
 4

No parser changes required.

@bramtayl
Contributor

bramtayl commented Feb 20, 2023

Macros kind of work "outside in" but parsing kind of works "inside out". In this case, if this was done by the parser, _ would "find" the innermost -> to "attach" to. And it would be good for this syntax to be nestable, so people could do: df |> filter(-> _.a .> 1, _). There are messy ways to work around this with a macro, but especially since this is such a commonly demanded feature, I think parser support is the way to go.

@MasonProtter
Contributor

Parsing also definitely works "outside in", and has to take the same care that a macro would have to take to attach _ to the right fence.

@bramtayl
Contributor

Hmm, maybe I meant symbol resolution works inside out?

@MasonProtter
Contributor

This macro would simply work by macroexpanding any macros it finds inside itself. It'd be the same as the proposed syntax here unless I'm missing something.

@JeffBezanson
Member

I think it's important to not consider new syntax in isolation, but to consider the entire pile of special syntax we already have in addition to the proposed new syntax. We should think about this from the perspective of beginners learning the language, not from the perspective of experienced users. Julia's syntax is already quite complicated, and adding new syntax rules will make learning our syntax even harder for new users.

💯 I would go farther though. Less noise is better for everybody, not just somebody in their first week of learning Julia.

@bramtayl
Contributor

This macro would simply work by macroexpanding any macros it finds inside itself. It'd be the same as the proposed syntax here unless I'm missing something.

Ok, but what if someone else writes a new macro that uses _ and they don't play well together?

@MasonProtter
Contributor

MasonProtter commented Feb 20, 2023

Someone could also quite easily write a macro that doesn't play well with this PR in the same way.

@MasonProtter
Contributor

Okay, I've made https://github.com/MasonProtter/SimpleUnderscores.jl, @bramtayl or anyone else interested in this syntax please feel free to poke around with it and see if it fails in any obvious ways.

@aplavin
Contributor

aplavin commented Feb 21, 2023

This basically proposes $ instead of _ as the single argument placeholder, right?

@aplavin no, if you look at the code you quoted they are suggesting that -> $a means x -> x.a, not that $ is used as an alternative for _.

Whew, I missed that there's no dot between $ and a indeed! Why?..
Well, $a is even worse IMO: it's totally ad-hoc special syntax, and doesn't work with function-based data accessors like
-> max(real(_), 0) would.

@rapus95
Contributor Author

rapus95 commented Feb 21, 2023

I personally don't like that the idea is to dedicate a syntax to replace x with an underscore. Because that's all, that your proposal could do. IMO there should be more benefit in this. And I'd like to have something that's compatible with DataFrames.jl and the higher order functions (especially able to create binary functions)

@pablosanjose
Contributor

pablosanjose commented Feb 21, 2023

I'm not sure I understand how the proposed -> delimiter disambiguates the boundaries of the lambda. Unless I'm very confused (which could well be) I should parse f(-> 2_, 2) as f(x->2x, 2), f(->_,_) as f((x,y) -> (x,y)) (or f(x-> (x,x)) depending on who you ask), g |> f(->_, _) as g |> z-> f(x->x, z) and g |> f(->_, 2) as g |> f(x->x, 2)? Or do commas also act as rightmost boundaries?

Although I really would like to have concise lambdas, I find that anything that is not 100% obvious and transparent (why should two visually identical _ represent different arguments?), would only make Julia syntax worse for most people (who not only write, but read code). My 2 cents.

EDIT: Mmm, perhaps I indeed misunderstood, and f(->_,_) should be a syntax error, just like x->x,x.

@tpapp
Contributor

tpapp commented Feb 21, 2023

how the proposed -> delimiter disambiguates the boundaries of the lambda

My understanding is that it is the same as (args...) -> body... now, just without the arguments.

and f(->_,_) should be a syntax error, just like x->x,x.

But that isn't:

julia> let x = 1
       x -> x,x
       end
(var"#5#6"(), 1)
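
To make the parse itself visible, here is a small sketch using the standard `Meta.parse`; `f` is only a placeholder name. It shows that a comma ends the body of an ordinary lambda, which is the boundary rule the headless form would presumably inherit if it behaves like `(args...) -> body`.

```julia
# Top level: the comma ends the lambda, so this is a 2-tuple (lambda, x).
ex = Meta.parse("x -> x, x")
println(ex.head)            # head of the whole expression
println(ex.args[1].head)    # head of the first element

# Inside a call, the same rule makes the lambda the first of two arguments.
call = Meta.parse("f(x -> x, x)")
println(length(call.args))  # f, the lambda, and x
```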

@rapus95
Contributor Author

rapus95 commented Feb 21, 2023

I'll keep producing ideas hoping there'll once be one that serves most of us.
what about:

->_+_ == x->x+x
1->_+_ == error
2->_+_ == (x, y)->x+y
3->_+_ == (x, y, z)->x+y

Then I'd have my binaries by prepending a 2 and also something that works with DataFrames.jl while you still have your syntax available

@aplavin
Contributor

aplavin commented Feb 21, 2023

I personally don't like that the idea is to dedicate a syntax to replace x with an underscore.

But you proposed to replace x. with $ :)

I don't think this kind of syntax is necessary at all in Julia, agree with @tpapp and others above in that.
Even more so with the recent @MasonProtter's finding about macros. @-> _.a + _.b is basically as concise and readable as -> _.a + _.b, especially if a single-character macro name is chosen instead.
Everyone reading Julia code is already used to macros preceded with @ and affecting following code in some way.

If this macro behavior is indeed officially supported, it may become popularized and see more usage in packages.

Because that's all, that your proposal could do.

I don't propose to add -> _.a + _.b syntax!
It's just when comparing -> syntax variants that use the underscore to mean a single thing vs multiple different things, I definitely prefer the former.
Empirical evidence shows the same: many packages use _ to mean the same thing in the same scope, vs none where _ means different things each time.

And I'd like to have something that's compatible with DataFrames.jl and the higher order functions

DataFrames often choose unique special syntax that has no equivalent elsewhere in Julia. Design of that and other packages isn't set in stone, and can be influenced by Julia changes. For example, at some point they may decide to replace transform(df, [:a, :b] => ByRow((a,b) -> a+b)) with transform(df, [:a, :b] => ByRow(->_.a+_.b)) to be more consistent with other tables/collections.

In higher-order data processing functions with regular Julian interface, "underscore = the only argument" is most often convenient.

@tpapp
Contributor

tpapp commented Feb 21, 2023

I'll keep producing ideas in the hope there'll once be one that serves most of us.

My problem with this family of proposals is the meandering brainstorming that they degenerate to. Discussion is fine, and people should of course comment if they want to make a point, but if that leads to major changes a new issue should be opened IMO.

Reading a comment stream which has various proposals floating around without a resolution is very confusing, especially in a discussion that goes on for years.

@rapus95
Contributor Author

rapus95 commented Feb 21, 2023

I personally don't like that the idea is to dedicate a syntax to replace x with an underscore.

But you proposed to replace x. with $ :)

yes, as a complementary proposal to extend the original one for better synergy. Not as the only feature

@MasonProtter
Contributor

MasonProtter commented Feb 21, 2023

I think this syntax should be a macro for the following reasons:

  1. It can be a macro.
  2. It being a macro would let it do more controversial things without needing buy-in from everyone, and without overcomplicating the language base syntax. For example,
    • We can use _1, _2, _14, etc. to refer to the 1st, 2nd, and 14th arguments of the function respectively
    • Things like the proposal to use $a to mean _.a could be considered whereas it is pretty much out of the question for surface syntax.
  3. This being a macro would make it less conflicting and less problematic to have A{_} mean A{T} where T.
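
As a rough illustration of the numbered-placeholder idea in point 2, here is a minimal sketch. The macro name `@numbered` and both helper functions are hypothetical, written only for this example and not taken from any existing package; it assumes placeholders are numbered from `_1` upward.

```julia
# Hypothetical helper: largest k among placeholders `_k` in an expression (0 if none).
function max_placeholder(ex)
    if ex isa Symbol
        m = match(r"^_(\d+)$", string(ex))
        return m === nothing ? 0 : parse(Int, m.captures[1])
    elseif ex isa Expr
        return mapreduce(max_placeholder, max, ex.args; init = 0)
    else
        return 0
    end
end

# Hypothetical helper: rewrite each `_k` to the k-th generated argument name.
function substitute_placeholders(ex, names)
    if ex isa Symbol
        m = match(r"^_(\d+)$", string(ex))
        return m === nothing ? ex : names[parse(Int, m.captures[1])]
    elseif ex isa Expr
        return Expr(ex.head, (substitute_placeholders(a, names) for a in ex.args)...)
    else
        return ex
    end
end

# Hypothetical macro: builds an n-argument lambda, n being the largest index used.
macro numbered(ex)
    n = max_placeholder(ex)
    names = [gensym("arg") for _ in 1:n]
    esc(:(($(names...),) -> $(substitute_placeholders(ex, names))))
end

swapped_diff = @numbered(_2 - _1)
weighted = reduce(@numbered(_1 + 2 * _2), [1, 2, 3])
```

Because the arity comes from the largest index, `@numbered(_2)` would still produce a two-argument function, which is one way to express the reordering and argument-dropping cases that the headless `->` proposal leaves out.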

@bramtayl
Contributor

bramtayl commented Feb 21, 2023

I've had a somewhat similar macro in LightQuery for a few years and it didn't seem to catch on (I haven't really been maintaining it). It uses double underscore instead of _2, but I kind of like the numbers better. Having an officially supported macro in base (or ideally, two, @-> and @|>) would be pretty nice.

@MasonProtter
Contributor

Ah yes, I had forgotten about the one in LightQuery.jl. As far as I recall, people didn't like that you had to write

map(@_(1 + _), v)

instead of

map(@_ 1 + _, v)

since @_(1+_) is actually longer than x->1+x, so maybe that was a barrier to adoption?

@bramtayl
Contributor

map(@_ 1 + _, v) is nicer for sure!

@tpapp
Contributor

tpapp commented Feb 22, 2023

@MasonProtter:

being a macro would let it do more controversial things without needing buy-in from everyone

Indeed, not having to put this in the core language at this point would allow fuller exploration of this syntax without the usual constraints of having stuff in Julia. So this would be a great advantage.

Thanks for making a package!

@LilithHafner removed the "triage" (This should be discussed on a triage call) label Feb 1, 2024
Keno added a commit that referenced this issue Apr 4, 2024
This PR adds support for parsing `.a` as `x->x.a`. This kind of thing has come
up multiple times in the past, but I'm currently finding myself doing a lot
of work on nested structs where this operation is very common.

In general, we've had the position that this kind of thing should be a special
case of the short-currying syntax (e.g. #38713), but I actually think that might
be a false constraint. In particular, `.a` is a bit of a worst case for the curry
syntax. If there is no requirement for `.a` to be excessively short in an eventual
underscore curry syntax, I think that could open more options.

That said, any syntax proposal of course needs to stand on its own, so let me
motivate the cases where I think this plays:

A. Curried getfield

I think this is probably the most obvious and often requested. The syntax
here is very useful for situations where higher order functions operate
on collections of records:

1. `map(.a, vec)` and reductions for getting the fields of an object - also includes things like `sum(.price, items)`
2. Predicates like `sort(vecs, by=.x)` or `filter(!.deleted, entries)`
3. In pipelines `vecs |> .x |> sqrt |> sum`

I think that's mostly what people are thinking of, but the use case
for this syntax is more general.
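
For comparison, the curried-getfield cases in A can be written in current Julia roughly as follows. This is a sketch on made-up data; `items` and the helper `getter` are hypothetical names, and the comments show the proposed spelling each line would replace.

```julia
# Made-up records for illustration.
items = [(name = "pen", price = 2.5, deleted = false),
         (name = "ink", price = 4.0, deleted = true)]

prices = map(x -> x.price, items)                     # proposed: map(.price, items)
total  = sum(x -> x.price, items)                     # proposed: sum(.price, items)
sorted = sort(items, by = x -> x.price, rev = true)   # proposed: by=.price
kept   = filter(!(x -> x.deleted), items)             # proposed: filter(!.deleted, items)

# A reusable stand-in: `getter(:price)` plays the role that `.price` would play.
getter(name::Symbol) = x -> getproperty(x, name)
names = map(getter(:name), items)
```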

B. A syntax for lenses

Packages like Accessors.jl provide lens-like abstractions. Currently these are written as
`lens = @optic _.a`. An example use of Accessors.jl is (from their documentation)
```
julia> modify(lowercase, (;a="AA", b="BB"), @optic _.a)
(a = "aa", b = "BB")
```

This PR can be thought of as providing lenses first class syntax, as in:
```
julia> modify(lowercase, (;a="AA", b="BB"), .a)
(a = "aa", b = "BB")
```

C. Symbol index generalization to hierarchical structures

We have a lot of packages in the ecosystem that support named axes
of various forms (Canonical examples might be DataFrames and NamedArrays,
but there's probably two dozen of these). Generally the way that this
syntax works is that people use quoted symbols for indexing:

```
df[5, :col]
```

However, this breaks down when there is hierarchical composition involved.
For example, for simulation models, you often build parameter sets and
solutions out of hierarchies of simpler models.

There's a couple of solutions that people have come up with for this problem:
1. Some packages parse out hierarchy from symbol names: `sol[:var"my.nested.hierarchy.state"]`
2. Other packages have a global root object: `sol[○.my.nested.hierarchy.state]`
2a. A variant of this is using the object as its own root `sol[sol.my.nested.hierarchy.state]`
2b. Yet another variant is having the root object be context specific `sol[sys.my.nested.hierarchy.state]`
3. Yet other packages put symbolic names into the global namespaces `sol[my.nested.hierarchy.state]`

These solutions are all lacking. Solution 1 requires string manipulation for composition. The variants of 2
are ok, but there is no agreement among packages on what the root object looks like or how it is spelled, and
even so, it's an extra export. Solution 3 pollutes the global namespace.

By using the same mechanism here, we essentially standardize solution `2`,
but make the root object implicit.