Allow group to take in an AbstractVector of groups? #30

Open
pdeffebach opened this issue Oct 30, 2020 · 8 comments

Comments

@pdeffebach

Something like

g = [1, 1, 2, 2]
x = [5, 6, 7, 8]
group(g, x)
@andyferris
Member

Yes I think this is a good idea, though we need to be careful that dispatch works out.

I also thought we might already have something like this (perhaps as an internal function).

@pdeffebach
Author

I don't feel that strongly about it. It was just a surprising omission, because without it there is no exact equivalent to a tapply call from R.

@andyferris
Member

Hi @pdeffebach,

I finally got some time at the computer and see we already have this behavior:

julia> g = [1, 1, 2, 2]
4-element Array{Int64,1}:
 1
 1
 2
 2

julia> x = [5, 6, 7, 8]
4-element Array{Int64,1}:
 5
 6
 7
 8

julia> group(g, x)
2-element Dictionaries.Dictionary{Int64,Array{Int64,1}}
 1 │ [5, 6]
 2 │ [7, 8]

Is this what you were expecting?
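For readers coming from R, the semantics of group(g, x) can be approximated with a plain Base Dict. The following is a minimal sketch under that assumption: naive_group is a hypothetical name, not part of SplitApplyCombine, and unlike the package's Dictionary output, a Base Dict makes no guarantee about key order.

```julia
# Hypothetical helper, NOT SplitApplyCombine's implementation:
# collect the elements of `x` into vectors keyed by the matching entry of `g`.
function naive_group(g::AbstractVector, x::AbstractVector)
    length(g) == length(x) || throw(DimensionMismatch("g and x must have the same length"))
    out = Dict{eltype(g), Vector{eltype(x)}}()
    for (k, v) in zip(g, x)
        # get! creates an empty vector the first time a key is seen
        push!(get!(Vector{eltype(x)}, out, k), v)
    end
    return out
end

groups = naive_group([1, 1, 2, 2], [5, 6, 7, 8])
```

Each key from g ends up paired with the values of x at the same positions, which matches the Dictionary printed above.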

@andyferris
Member

Regarding R's tapply: if you want to apply fun to each group, you can do fun.(group(g, x)) (or sometimes fun.(groupview(g, x)), which might be faster and less memory hungry). There is also groupreduce, e.g. groupreduce(+, g, x).
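The difference between mapping over groups and reducing them can be sketched in plain Julia. Mapping over group(g, x) first builds one vector per group, whereas a groupreduce-style fold only keeps one running value per key. naive_groupreduce below is a hypothetical stand-in, not the package's implementation, and it assumes the result of op has the same type as the elements of x.

```julia
# Hypothetical helper sketching what groupreduce(op, g, x) computes:
# fold each group's values with `op` as they are encountered,
# without materializing a vector per group.
# Assumption: op(a, b) returns the same type as the elements of x.
function naive_groupreduce(op, g::AbstractVector, x::AbstractVector)
    length(g) == length(x) || throw(DimensionMismatch("g and x must have the same length"))
    out = Dict{eltype(g), eltype(x)}()
    for (k, v) in zip(g, x)
        # first value of a group seeds the accumulator; later values are folded in
        out[k] = haskey(out, k) ? op(out[k], v) : v
    end
    return out
end

sums = naive_groupreduce(+, [1, 1, 2, 2], [5, 6, 7, 8])
```

Here sums maps 1 to 5 + 6 and 2 to 7 + 8, the same totals you would get from sum.(group(g, x)).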

@pdeffebach
Author

Thanks for this.

One final question: is there a version of this for transform? I.e. "spread"-ing the result across a vector the same length as the inputs?

I've been doing data cleaning at the REPL, and not having to write out a full groupby ... transform call in DataFrames would be nice.

@andyferris
Member

I'm not sure what you are seeking. Is it this?

julia> g = [1, 1, 2, 2]
4-element Array{Int64,1}:
 1
 1
 2
 2

julia> x = [5, 6, 7, 8]
4-element Array{Int64,1}:
 5
 6
 7
 8

julia> groups = group(g, x)
2-element Dictionaries.Dictionary{Int64,Array{Int64,1}}
 1 │ [5, 6]
 2 │ [7, 8]

julia> map(x -> groups[x], g)
4-element Array{Array{Int64,1},1}:
 [5, 6]
 [5, 6]
 [7, 8]
 [7, 8]
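The map(x -> groups[x], g) step above spreads whole groups back to each position; a transform-style operation would instead apply a function per group first and then spread the (usually scalar) result. A minimal sketch in plain Julia, using only the Statistics standard library; naive_transformby is a hypothetical name, not a SplitApplyCombine function.

```julia
using Statistics  # for mean

# Hypothetical helper: compute f on each group of x (keyed by g), then
# "spread" each group's result back to every position belonging to that group,
# giving a vector the same length as the inputs.
function naive_transformby(f, g::AbstractVector, x::AbstractVector)
    length(g) == length(x) || throw(DimensionMismatch("g and x must have the same length"))
    groups = Dict{eltype(g), Vector{eltype(x)}}()
    for (k, v) in zip(g, x)
        push!(get!(Vector{eltype(x)}, groups, k), v)
    end
    results = Dict(k => f(vs) for (k, vs) in groups)  # one call to f per group
    return [results[k] for k in g]                     # look each result up by key
end

spread = naive_transformby(mean, [1, 1, 2, 2], [5, 6, 7, 8])
```

With the package itself, the same idea should be expressible by combining two steps already used in this thread: results = map(mean, group(g, x)) followed by map(k -> results[k], g).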

@pdeffebach
Author

Sorry for forgetting about this thread. I think the infrastructure has almost what I want, but I would like this to be in one function (the package is called SplitApplyCombine, after all).

julia> using Statistics, SplitApplyCombine;

julia> function applyby(f, g::AbstractVector, x::AbstractVector)
           groups = group(g, x)
           map(f, groups)
       end
applyby (generic function with 1 method)

julia> applyby(mean, [1, 1, 2, 2], [5, 6, 7, 8])
2-element Dictionaries.Dictionary{Int64, Float64}
 1 │ 5.5
 2 │ 7.5

This would be nice to have. For reference, my motivation is supporting grouped operations inside DataFramesMeta's @with, where columns are just plain vectors, so we can't take advantage of any DataFrames machinery.

An added bonus would be to allow multiple arguments, i.e. applyby(f, g, args...). I'm not sure how that would work, but it seems feasible.
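One possible shape for the multi-argument version is to group the indices of g once and then pass f a view of every argument restricted to each group, so no per-group copies are made. This is only a sketch under that assumption: applyby_multi is a hypothetical name, it returns a Base Dict rather than a Dictionaries.Dictionary, and it is not an API offered by SplitApplyCombine.

```julia
using Statistics  # for mean, as in the applyby example above

# Hypothetical multi-argument variant of applyby: group the *indices* of g,
# then call f on views of every argument restricted to each group.
# Views avoid copying each group's data out of the input vectors.
function applyby_multi(f, g::AbstractVector, args::AbstractVector...)
    all(a -> length(a) == length(g), args) ||
        throw(DimensionMismatch("all arguments must have the same length as g"))
    idx = Dict{eltype(g), Vector{Int}}()
    for (i, k) in pairs(g)
        push!(get!(Vector{Int}, idx, k), i)
    end
    # splat one view per argument into f
    return Dict(k => f((view(a, is) for a in args)...) for (k, is) in idx)
end

single = applyby_multi(mean, [1, 1, 2, 2], [5, 6, 7, 8])
# a two-argument function: per-group sum of elementwise products
paired = applyby_multi((v, w) -> sum(v .* w), [1, 1, 2, 2], [5, 6, 7, 8], [1, 2, 3, 4])
```

The single-argument call reproduces the applyby(mean, ...) result shown above (5.5 and 7.5), while the two-argument call gives 5*1 + 6*2 = 17 for group 1 and 7*3 + 8*4 = 53 for group 2.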

@aplavin
Collaborator

aplavin commented May 26, 2021

As a general principle, it seems better to have fewer general functions that compose easily (group + map in your example) than a larger number of specialized functions (applyby). I think this case would have (almost) zero overhead if you use groupview instead of group.
Maybe I'm missing something, but

map(mean, group([1, 1, 2, 2], [5, 6, 7, 8]))

already looks very short, intuitive and clear, once one knows what map and group do.


3 participants