Krangl Roadmap

For completed items see change-log.

Next release

https://github.com/holgerbrandl/krangl/issues

Meta

IO

Core

  • Define the behavior of (and add tests for) grouped data frames that become empty after filtering. For reference, the dplyr equivalent:
require(dplyr)
iris %>% group_by(Species) %>% filter(Sepal.Length>100)
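
The open question is what should happen to the groups once every row has been filtered out. The sketch below uses only the Kotlin standard library, not the krangl API, to contrast the two obvious options: keeping the (now empty) groups or dropping them. The `Flower` type and both function names are hypothetical and exist only for illustration.

```kotlin
// Hypothetical row type standing in for one record of the iris data set.
data class Flower(val species: String, val sepalLength: Double)

// Option A: filter within each group but keep every group key,
// even when no rows of that group survive.
fun filterKeepingGroups(
    rows: List<Flower>,
    predicate: (Flower) -> Boolean
): Map<String, List<Flower>> =
    rows.groupBy { it.species }.mapValues { (_, group) -> group.filter(predicate) }

// Option B: same filter, but groups left without rows are dropped.
fun filterDroppingEmptyGroups(
    rows: List<Flower>,
    predicate: (Flower) -> Boolean
): Map<String, List<Flower>> =
    filterKeepingGroups(rows, predicate).filterValues { it.isNotEmpty() }

fun main() {
    val rows = listOf(
        Flower("setosa", 5.1),
        Flower("versicolor", 7.0),
        Flower("virginica", 6.3)
    )
    val kept = filterKeepingGroups(rows) { it.sepalLength > 100 }
    val dropped = filterDroppingEmptyGroups(rows) { it.sepalLength > 100 }
    println(kept.keys)     // all three group keys remain, each with an empty row list
    println(dropped.size)  // 0
}
```

Whichever option is chosen, a test should pin it down so that downstream operations on the filtered result see a predictable structure.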

API improvements


Performance

  • Consider using kotlin.collections.ArrayAsCollection
  • Set up a benchmarking suite

List copy optimization

  • use iterable where possible

  • Consider using kotlin.collections.ArrayAsCollection to get rid of toList, which always does a full copy internally.

  • 30% flights HOTSPOT: krangl/Extensions.kt:275. Can we get rid of the array creation?

  • krangl.SimpleDataFrame.addColumn should avoid toMutableList

  • More consistent use of List instead of arrays as the column data store (see array vs list). This would avoid the array conversions that are omnipresent in the API at the moment.

  • Get rid of the other toMutableList calls and use views instead

  • Analyze benchmark results with kravis/krangl :-)

  • use for column indices to speed up access
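
To illustrate the copy-versus-view distinction behind several of the items above: in the Kotlin standard library, `Array.toList()` allocates a new list and copies the elements, whereas `Array.asList()` returns a list that wraps the original array. The following is a minimal stdlib-only sketch; the function `viewVersusCopy` is a hypothetical helper and not part of krangl.

```kotlin
// Hypothetical helper: writes to the backing array and reports what a view
// and a copy (both created before the write) see at index 0.
fun viewVersusCopy(column: Array<Double>, newFirst: Double): Pair<Double, Double> {
    val view = column.asList()  // wraps the array, no element copy
    val copy = column.toList()  // new list, every element copied
    column[0] = newFirst
    return view[0] to copy[0]
}

fun main() {
    val column = arrayOf(1.0, 2.0, 3.0, 4.0)
    val (seenByView, seenByCopy) = viewVersusCopy(column, 99.0)
    println(seenByView)  // 99.0, the view reflects the write
    println(seenByCopy)  // 1.0, the copy is detached

    // subList is likewise a view onto its parent list rather than a copy.
    val head = column.asList().subList(0, 2)
    println(head)        // [99.0, 2.0]
}
```

The trade-off is that a view stays coupled to its backing array, so it is only safe where the column data is treated as immutable after construction.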

Fast column storage: https://github.com/lemire/JavaFastPFOR and http://fastutil.di.unimi.it/

http://nd4j.org/

benchmarking

https://github.com/mm-mansour/Fast-Pandas

Backlog

Rejected Ideas

  • Directly access values with it["foo"] rather than just the column object. For the latter, DataFrame.cols can be used.
    • Not a good idea, because all extension functions would then have to be defined for common list types such as List. It is more important to keep the namespace clean.

Provide ad hoc/data class conversion for column model objects

val dataFrame = object : DataFrame() {
    val x = Factor("sdf", "sdf", "sdfd")
    val y = DblCol(Double.MAX_VALUE, Double.MIN_VALUE)
    val z = y + y
}


val newTable = df.map{ data class Foo(val name:String)}

newTable.newCol
newTable.src.x

--> Cannot work because a data class declaration is not an expression


  • Improve benchmarking by avoiding JVM warmup with -XX:CompileThreshold=1 (src)

--> Rather continue with the JMH-driven benchmarking subproject

--

Make use of kotlin.Number to simplify API --> Done by adding NumberCol but unclear how to actually benefit from it
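
As one possible reading of what "simplifying the API" via kotlin.Number could look like, the sketch below defines a single aggregate that accepts any numeric element type instead of one overload per type. The `mean` function is hypothetical and not part of krangl; it relies only on `Number.toDouble()` from the standard library and skips null (missing) values.

```kotlin
// Hypothetical helper, not part of krangl: one mean() for any numeric element type.
// Null entries are treated as missing values and ignored.
fun mean(values: List<Number?>): Double? {
    val present = values.filterNotNull().map { it.toDouble() }
    return if (present.isEmpty()) null else present.average()
}

fun main() {
    println(mean(listOf(1, 2, 3)))         // 2.0
    println(mean(listOf(1.5, null, 2.5)))  // 2.0
    println(mean(emptyList()))             // null
}
```

The cost of this approach is that every value is widened to Double, so integer-specific results (for example an exact Long sum) would still need dedicated handling.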