Krangl Roadmap

For completed items see change-log.

Next release

https://github.com/holgerbrandl/krangl/issues

provide a tidyr::fill equivalent
consider shorter columns cast as in pandas
Add excel import/export
Better documentation & cheatsheet
Date (column?) support
Factor (column?) support + Add factor attribute utilities similar to methods in R package forcats
better spec out NA
- consider using DoubleArray for double/int columns, with NaN as the missing-value marker, see https://pandas.pydata.org/pandas-docs/stable/missing_data.html#working-with-missing-data
add a pretty_column helper
improve iter.toDataFrame() to include reference, getters, kotlin property getters
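
To make the tidyr::fill item above concrete, here is a minimal sketch of "fill down" semantics on a plain nullable list. It uses only the Kotlin standard library. `fillDown` is a hypothetical helper name, not an existing krangl function, and null stands in for NA.

```kotlin
// Hypothetical helper, not part of krangl: replaces each null with the
// most recent non-null value above it (like tidyr::fill with .direction = "down").
fun <T : Any> List<T?>.fillDown(): List<T?> {
    var last: T? = null
    return map { value ->
        if (value != null) last = value
        last
    }
}

fun main() {
    val year = listOf(null, 2017, null, null, 2018, null)
    // Leading nulls stay null because there is nothing to carry forward.
    println(year.fillDown()) // [null, 2017, 2017, 2017, 2018, 2018]
}
```

A column-level version would presumably apply the same rule per group for grouped data frames, which is one of the design questions this item leaves open.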

Meta

inconsistently named reader methods
krangl.ColumnsKt#map should have better return type
use/support compressed columns (https://github.com/lemire/JavaFastPFOR)
Better lambda receiver contexts
Performance (indices, avoid list and array copies, compressed columns)
Use dedicated return type for table formula helpers (like mean, rank) to reduce runtime errors
More bindings to other jvm data-science libraries
Sequence vs Iterable?
Pluggable backends like native or SQL
should unfold be better called flatten?
write chapter about timeseries support
- https://github.com/signaflo/java-timeseries/wiki/ARIMA-models
- learn from https://pandas.pydata.org/pandas-docs/stable/timeseries.html

IO

Add parquet support https://stackoverflow.com/questions/39728854/create-parquet-files-in-java

Core

more defined behavior/tests needed for grouped dfs that become empty after filtering

require(dplyr)
iris %>% group_by(Species) %>% filter(Sepal.Length>100)
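
The R snippet above filters every row out of every group. The two behaviors krangl could define for that case can be sketched with plain Kotlin collections. This is an illustration under stated assumptions, not krangl code: the `Grouped` alias and both function names are hypothetical, and the numbers are just a few sample values.

```kotlin
// Grouped data modeled as group key -> rows; a stand-in for a grouped data frame.
typealias Grouped = Map<String, List<Double>>

// Option A: groups left without rows disappear from the result.
fun Grouped.filterDroppingEmpty(predicate: (Double) -> Boolean): Grouped =
    mapValues { (_, rows) -> rows.filter(predicate) }.filterValues { it.isNotEmpty() }

// Option B: every group key survives, possibly with zero rows.
fun Grouped.filterKeepingEmpty(predicate: (Double) -> Boolean): Grouped =
    mapValues { (_, rows) -> rows.filter(predicate) }

fun main() {
    val sepalLength: Grouped = mapOf(
        "setosa" to listOf(5.1, 4.9),
        "versicolor" to listOf(7.0, 6.4),
        "virginica" to listOf(6.3, 5.8)
    )
    val dropped = sepalLength.filterDroppingEmpty { it > 100.0 }
    val kept = sepalLength.filterKeepingEmpty { it > 100.0 }
    println(dropped.size) // 0
    println(kept.keys)    // [setosa, versicolor, virginica]
}
```

Whichever option is chosen, tests should pin down what later verbs (summarize, count, ungroup) return for the resulting frame.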

API improvements

Performance

misc consider to use kotlin.collections.ArrayAsCollection
Set up a benchmarking suite

List copy optimization

use iterable where possible
misc consider using kotlin.collections.ArrayAsCollection --> get rid of toList, which always does a full copy internally.
30% flights HOTSPOT: krangl/Extensions.kt:275 can we get rid of the array creation?
krangl.SimpleDataFrame.addColumn should avoid toMutableList
More consistent use of List vs using arrays as column datastore (see array vs list). This would avoid the array conversions which are omnipresent in the API at the moment.
get rid of other toMutableList calls and use views instead
Analyze benchmark results with kravis/krangl :-)
use column indices to speed up access
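
The copy-versus-view distinction behind the items above can be shown with the standard library alone: `toList()` on an array copies every element, while `asList()` returns a list that wraps the array. The two function names below are hypothetical and only serve to label the two strategies for a column backed by a DoubleArray.

```kotlin
// Hypothetical accessors for a column backed by a DoubleArray.
fun copiedValues(store: DoubleArray): List<Double> = store.toList() // new list, full copy
fun viewedValues(store: DoubleArray): List<Double> = store.asList() // wraps the array

fun main() {
    val store = doubleArrayOf(1.0, 2.0, 3.0)

    val copy = copiedValues(store)
    val view = viewedValues(store)

    store[0] = 42.0

    println(copy[0]) // 1.0, the copy is detached from the array
    println(view[0]) // 42.0, the view reads through to the array
}
```

The flip side of a view is aliasing: later writes to the backing array show through, so a view-based column store has to treat its arrays as immutable once they are shared.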

fast column storage https://github.com/lemire/JavaFastPFOR http://fastutil.di.unimi.it/

http://nd4j.org/

benchmarking

https://github.com/mm-mansour/Fast-Pandas

Backlog

remove regrouping in core verbs where possible
consider using invoke for row access (potentially decouple the more debatable extensions into a different namespace?)
provide equivalent for dplyr::summarize_each and dplyr::mutate_each #4
krangl.head should use view instead of copy; also consider to use views for grouped data (see https://softwarecave.org/2014/03/19/views-in-java-collections-framework/)
koma bindings --> http://koma.kyonifer.com/
Add a DataFrame.transpose() method, cf. R: as_tibble(cbind(nms = names(df), t(df)))
Integrate idioms to do enrichment testing with Fisher's exact test from commons-math
see tablesaw changelog https://jtablesaw.github.io/tablesaw/changes_in_v_0.2
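
As a rough illustration of the transpose item above, the following sketch mirrors the shape of the R expression on a plain column map: the original column names become a leading `nms` column and every original row becomes a new column. The `Table` alias, the `transposed` function, and the `V1`, `V2`, ... column names are assumptions made for this example and not krangl API.

```kotlin
// A tiny column-oriented table: column name -> values, all columns equally long.
typealias Table = Map<String, List<Any?>>

// Hypothetical transpose: the first output column holds the old column names,
// then one output column per original row.
fun Table.transposed(): Table {
    val rowCount = values.firstOrNull()?.size ?: 0
    require(values.all { it.size == rowCount }) { "columns differ in length" }

    val result = LinkedHashMap<String, List<Any?>>()
    result["nms"] = keys.toList()
    for (row in 0 until rowCount) {
        // Collect the row-th value of every original column.
        result["V${row + 1}"] = values.map { it[row] }
    }
    return result
}

fun main() {
    val df: Table = mapOf(
        "name" to listOf("a", "b"),
        "score" to listOf(1, 2)
    )
    println(df.transposed())
    // {nms=[name, score], V1=[a, 1], V2=[b, 2]}
}
```

Note that transposed columns mix value types as soon as the source columns differ in type, so a real implementation would need to decide on a common column type for the result.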

Rejected Ideas

directly access values with it["foo"] and not just column object. For the latter DataFrame.cols can be used
- Not a good idea because all extension functions would then be defined for common lists like List etc. It's more important to keep the namespace clear.

Provide adhoc/data class conversion for column model adhoc/data class objects

val dataFrame = object : DataFrame() {
    val x = Factor("sdf", "sdf", "sdfd")
    val y = DblCol(Double.MAX_VALUE, Double.MIN_VALUE)
    val z = y + y
}


val newTable = df.map{ data class Foo(val name:String)}

newTable.newCol
newTable.src.x

--> Cannot work because a data class declaration is not an expression

improve benchmarking by avoiding JVM warmup with -XX:CompileThreshold=1 src

--> rather continue with the JMH-driven benchmarking subproject

--

Make use of kotlin.Number to simplify API --> Done by adding NumberCol but unclear how to actually benefit from it
