Not sure if this is the right place, but I've been thinking about the table operations that I typically use.
It seems the Tables.jl interface makes these straightforward to implement.
I was going to just implement them and release a package, but thought I'd run the idea past people here first to coordinate efforts.
I've boiled them down to the operations in the code snippet below, aiming for:
No ambiguity. It should be obvious from the name what an operation does. E.g., select selects columns by convention, but if I've been away from my code for a while I have to relearn this. I'd prefer selectcols (and selectrows instead of filter).
Safety. Mutating operations should be clearly marked, and unsafe operations made explicit (as per the previous point).
Minimality. There shouldn't be two functions that do the same thing. E.g., some systems have both mutate and transform, which I think clutters the API.
So here's what I have in mind for tables, views and split-apply-combine operations.
Suggestions most welcome.
Cheers
Tables:
newtable = SomeTableType(table)  # Convert table to SomeTableType
val = table[i, colname]  # Get
table[i, colname] = val  # Set
newtable = appendrows(table, rows)
newtable = appendcols(table, newcolname => somevector...)
newtable = appendcols(table, newcolname => func(row)...)
newtable = deleterows(table, rows)
newtable = deletecols(table, cols)
table = mutatecol!(table, colname::Symbol => func)
table = sortrows!(table, by)
table = sortrows!(table, colnames, rev)
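To check that these table operations really are easy to express, here is a minimal sketch in plain Julia. It assumes a table is a NamedTuple of equal-length column vectors (a layout Tables.jl already accepts as a column table). The helpers `nrows` and `rowat` are hypothetical, only the column-name form of `sortrows!` is covered, and `rev` is an optional positional argument to mirror the proposed signature.

```julia
# Hypothetical sketch: a table is a NamedTuple of equal-length column vectors.
# These helpers are illustrative only; they are not part of Base or Tables.jl.

nrows(table::NamedTuple) = length(first(table))
rowat(table::NamedTuple, i::Integer) = map(col -> col[i], table)  # row i as a NamedTuple

# Non-mutating: each argument is `name => vector` or `name => func(row)`.
function appendcols(table::NamedTuple, newcols::Pair{Symbol}...)
    out = table
    for (name, v) in newcols
        # Functions are evaluated on the rows of the *input* table.
        col = v isa Function ? [v(rowat(table, i)) for i in 1:nrows(table)] : collect(v)
        length(col) == nrows(table) || throw(DimensionMismatch("column $name has the wrong length"))
        out = merge(out, NamedTuple{(name,)}((col,)))
    end
    return out
end

# Non-mutating: keep every row whose index is not listed in `rows`.
function deleterows(table::NamedTuple, rows)
    keep = setdiff(1:nrows(table), rows)
    return map(col -> col[keep], table)
end

# Non-mutating: drop the named columns (the remaining vectors are shared, not copied).
deletecols(table::NamedTuple, cols) = NamedTuple{Tuple(n for n in keys(table) if n ∉ cols)}(table)

# Mutating: overwrite one column in place with `func` applied elementwise.
function mutatecol!(table::NamedTuple, p::Pair{Symbol})
    colname, func = p
    col = table[colname]
    col .= func.(col)
    return table
end

# Mutating: reorder every column in place by the key columns (tuples compare lexicographically).
function sortrows!(table::NamedTuple, colnames, rev::Bool = false)
    sortkeys = collect(zip((table[c] for c in colnames)...))
    p = sortperm(sortkeys; rev = rev)
    foreach(col -> (col .= col[p]), table)
    return table
end
```

One design consequence worth noting: `appendcols` and `deletecols` return new NamedTuples that share the existing column vectors with the input, so a later `sortrows!` on the input permutes only the shared columns and leaves the new table's rows misaligned. Copying the columns would fit the Safety goal better, at the cost of allocation.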
Views:
view = selectcols(table, colnames)
view = selectrows(table, rowindices)
view = selectrows(table, func(row))
val = view[i, colname]  # Get
view[i, colname] = val  # Set. Raise an error if the view's row function returns false on the resulting row.
unsafe_set!(view, i, colname, val)  # Changes the value and does not raise an error.
newtable = SomeTableType(view)  # Convert view to SomeTableType
view = mutatecol!(view, colname::Symbol => func)  # Raise an error if the view's row function returns false on any of the resulting rows.
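A minimal sketch of the view semantics above, again assuming a table is a NamedTuple of column vectors. `RowView` and `rowat` are hypothetical names. `selectcols` just selects fields, which shares the column vectors rather than copying them; the checked `setindex!` re-runs the row predicate on the would-be row before writing, while `unsafe_set!` writes unconditionally.

```julia
# Hypothetical sketch: views over a NamedTuple column table (not Tables.jl API).

# Column view: selecting fields of a NamedTuple shares the underlying vectors.
selectcols(table::NamedTuple, colnames) = NamedTuple{Tuple(colnames)}(table)

struct RowView{T<:NamedTuple, F}
    parent::T
    pred::F            # row predicate, or `nothing` for an index-based view
    rows::Vector{Int}  # parent row numbers visible through the view
end

rowat(table::NamedTuple, i::Integer) = map(col -> col[i], table)

selectrows(table::NamedTuple, rowindices::AbstractVector{<:Integer}) =
    RowView(table, nothing, collect(Int, rowindices))

function selectrows(table::NamedTuple, pred::Function)
    rows = [i for i in 1:length(first(table)) if pred(rowat(table, i))]
    return RowView(table, pred, rows)
end

Base.getindex(v::RowView, i::Integer, colname::Symbol) = v.parent[colname][v.rows[i]]

# Unchecked write: changes the parent table without re-evaluating the predicate.
function unsafe_set!(v::RowView, i::Integer, colname::Symbol, val)
    v.parent[colname][v.rows[i]] = val
    return v
end

# Checked write: refuse the change if the row would no longer satisfy the predicate.
function Base.setindex!(v::RowView, val, i::Integer, colname::Symbol)
    if v.pred !== nothing
        newrow = merge(rowat(v.parent, v.rows[i]), NamedTuple{(colname,)}((val,)))
        v.pred(newrow) || throw(ArgumentError("row $i would no longer satisfy the view's predicate"))
    end
    return unsafe_set!(v, i, colname, val)
end
```

In this sketch, index-based views (`selectrows(table, rowindices)`) carry no predicate, so every write through them is effectively unchecked.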
split-apply-combine:
grptbl = groupby(data, colnames...)
grptbl = groupby(data, rowfunc)

for grp in grptbl  # grp is a view
    for r in rows(grp)
        # do something here
    end
end

reducedtbl = some_empty_table
for grp in grptbl
    push!(reducedtbl, (col1=sum(grp[:col3]), col2=mean(grp[:col4])))
end

val = groupdefinition(grp)  # (colname1=val1, colname2=val2, ...) if grp was defined by colnames; or func(grp[1, :]) if grp was defined by a row function
grp = group(grptbl, groupdef)  # Useful for groups accessed via definition.
grp = group(grptbl, groupidx)  # Useful for accessing groups by index and for iterating over groups
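A minimal sketch of the grouping functions, again over a NamedTuple column table. `Group`, `GroupedTable` and the standalone `rows` are hypothetical stand-ins (in particular, this `rows` is not `Tables.rows`). Groups are kept in order of first appearance, and the definition stored for each group follows the proposal: a NamedTuple of key values when grouping by column names, or `rowfunc(row)` when grouping by a row function.

```julia
# Hypothetical sketch of the proposed split-apply-combine API over a
# NamedTuple column table; none of these names come from an existing package.

struct Group{T<:NamedTuple, K}
    parent::T
    def::K             # the group definition
    rows::Vector{Int}  # parent row numbers in this group
end

Base.getindex(g::Group, colname::Symbol) = view(g.parent[colname], g.rows)
Base.getindex(g::Group, i::Integer, colname::Symbol) = g.parent[colname][g.rows[i]]
rows(g::Group) = (map(col -> col[r], g.parent) for r in g.rows)  # rows as NamedTuples
groupdefinition(g::Group) = g.def

struct GroupedTable{G<:Group, K}
    groups::Vector{G}
    positions::Dict{K, Int}  # group definition => group index
end

Base.length(gt::GroupedTable) = length(gt.groups)
Base.iterate(gt::GroupedTable, state...) = iterate(gt.groups, state...)

# Group by a row function; the group definition is rowfunc(row).
function groupby(table::NamedTuple, rowfunc::Function)
    defs = [rowfunc(map(col -> col[i], table)) for i in 1:length(first(table))]
    T, K = typeof(table), eltype(defs)
    positions = Dict{K, Int}()
    groups = Group{T, K}[]
    for (i, d) in enumerate(defs)
        g = get!(positions, d) do
            push!(groups, Group{T, K}(table, d, Int[]))
            length(groups)
        end
        push!(groups[g].rows, i)
    end
    return GroupedTable(groups, positions)
end

# Group by column names; the definition is a NamedTuple such as (g = "x",).
groupby(table::NamedTuple, colnames::Symbol...) = groupby(table, row -> NamedTuple{colnames}(row))

# Caveat: an Integer argument (including Bool) is always treated as an index,
# so groups whose definitions are integers can only be looked up by position.
group(gt::GroupedTable, groupidx::Integer) = gt.groups[groupidx]
group(gt::GroupedTable, groupdef) = gt.groups[gt.positions[groupdef]]
```

With this sketch, the explicit reduction loop above runs unchanged if `reducedtbl` starts as `NamedTuple[]` (a vector of NamedTuples, which Tables.jl treats as a row table) and `mean` comes from the Statistics standard library.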
For constructing reduced tables, DataFrames has an interface similar to:
reducedtbl = reduceby(table, colnames, :col1 => (sum, :col3), :col2 => (sum, :col4))  # Short version of the above, though less flexible (cannot operate on multiple columns at once)
But I prefer the version that explicitly iterates over the groups because it adheres to minimality and is more flexible (construction of the new columns can use arbitrary functions of the input view).
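For comparison with the explicit loop, here is a minimal sketch of the `reduceby` shorthand under the same NamedTuple-of-columns assumption. It is hypothetical and not the DataFrames.jl API. Each spec is `newname => (reducer, sourcecolumn)`, and the result is a vector of NamedTuples with one row per distinct key. It also makes the limitation concrete: each reducer only ever sees a single source column.

```julia
# Hypothetical sketch of the `reduceby` shorthand; not the DataFrames.jl API.
# `table` is a NamedTuple of equal-length column vectors, `colnames` a vector
# or tuple of key column names, and each spec is `newname => (reducer, sourcecol)`.
function reduceby(table::NamedTuple, colnames, specs::Pair{Symbol}...)
    keycols = Tuple(colnames)
    positions = Dict{Any, Int}()  # key tuple => group number
    keyvals = Any[]               # group number => key tuple
    members = Vector{Int}[]       # group number => row numbers in that group
    for i in 1:length(first(table))
        k = map(c -> table[c][i], keycols)
        g = get!(positions, k) do
            push!(keyvals, k)
            push!(members, Int[])
            length(members)
        end
        push!(members[g], i)
    end
    outnames = map(first, specs)  # names of the reduced columns
    # One output row per group: key columns first, then one value per spec.
    return map(eachindex(members)) do g
        aggs = map(s -> s.second[1](view(table[s.second[2]], members[g])), specs)
        merge(NamedTuple{keycols}(keyvals[g]), NamedTuple{outnames}(aggs))
    end
end
```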
Thanks. That's a long thread that didn't reach a conclusion. Since the set of useful operations will differ between people, use cases and style preferences, it's hard to see agreement on a single unified API. Nor does it seem necessary.
Base and Tables.jl between them seem to provide the elements required to construct a set of operations, so perhaps it's best to let a number of query-like packages built on Base and Tables.jl emerge organically; the community will undoubtedly pare them down by voting with their feet.
I'm inclined to just put something out there with a view to having it improved or replaced by popular vote.