New Features
- #97 Added Excel read/write support (by LeandroC89)
// read
df = DataFrame.readExcel("data.xlsx", sheetName = "sales")
df = DataFrame.readExcel("data.xlsx", cellRange = CellRangeAddress.valueOf("A1:D10"))
// write
df.writeExcel("results.xlsx")
- #95 Improved column type casts
dataFrameOf("foo")(1, 2, 3).addColumn("stringified_foo") { it["foo"].toStrings() }.schema()
> DataFrame with 3 observations
> foo [Int] 1, 2, 3
> stringified_foo [Str] 1, 2, 3
dataFrameOf("foo")("1", "2", "3").addColumn("parsed_foo") { it["foo"].toInts() }.schema()
> DataFrame with 3 observations
> foo [Str] 1, 2, 3
> parsed_foo [Int] 1, 2, 3
- #99 Added filtering by list (similar to R's %in% operator)
irisData.filter { it["Species"].inList("setosa", "versicolor") }
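To illustrate the semantics of inList outside of krangl, here is a minimal plain-Kotlin sketch. It models a column as a List<Any?> and assumes the result is one Boolean per row that a filter can consume; the helper name inListSketch is hypothetical and is not krangl's implementation.

```kotlin
// Hypothetical stand-in for inList: a column is modelled as a plain list.
// Each cell maps to true if it equals one of the candidates (like R's %in%).
fun List<Any?>.inListSketch(vararg candidates: Any?): List<Boolean> {
    val lookup = candidates.toHashSet() // hash set for constant-time membership checks
    return map { cell -> cell in lookup }
}
```

For example, listOf("setosa", "virginica", "versicolor").inListSketch("setosa", "versicolor") yields [true, false, true]; keeping only the rows whose entry is true is what the filter call above does.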
Bug Fixes
- #84 Builder now supports mixed numeric types in a column
- #96 & #94 Fixed bugs in join
- #100 Improved SQL bindings
- #99 Fixed median
- Fixed missing by values for rows that exist only on the right-hand side of an outer join (fixes #94)
- Added addRow (via PR #92 by LeandroC89)
- Added column type text to the SQL interface (fixes #72)
Released: 2020-06-02
- Added column transformation cumSum to calculate the cumulative sum
sales
.sortedBy("quarter")
.addColumn("cum_sales" to { it["sold_units"].cumSum()})
- Added column transformation pctChange to calculate the percentage change between the current and a prior element, similar to pct_change in pandas (contributed by @amorphous1 in PR #85)
sales
.groupBy("product")
.addColumn("sales_pct_change" to { it["sold_units"].pctChange() })
- Added lead and lag (contributed by @amorphous1 in PR #85)
sales
.groupBy("product")
.sortedBy("quarter")
.addColumn("prev_quarter_sales" to { it["sold_units"].lag() })
- Significantly improved join performance (contributed by @amorphous1 in PR #85)
- New: Extended bindRows API to combine data row-wise (see PR #77 by @CrystalLord)
val person1 = mapOf("person" to "James", "year" to 1996)
val person2 = mapOf("person" to "Anne", "year" to 1998)
emptyDataFrame().bindRows(person1, person2).print()
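The window helpers added in this release (cumSum, pctChange, lag and lead) can be pictured with plain-Kotlin sketches that operate on a single, already sorted group. These are not krangl's implementations: the function names are hypothetical, null stands in for NA, and the null in the first pctChange slot is an assumption based on pandas' pct_change leaving the first value empty.

```kotlin
// Hypothetical sketches for one group; null plays the role of NA.

// Running total: element i is the sum of elements 0..i.
fun cumSumSketch(xs: List<Double>): List<Double> {
    var total = 0.0
    return xs.map { total += it; total }
}

// Relative change versus the previous element; the first element has no
// predecessor, so its result is null.
fun pctChangeSketch(xs: List<Double>): List<Double?> =
    xs.indices.map { i -> if (i == 0) null else (xs[i] - xs[i - 1]) / xs[i - 1] }

// Value n positions earlier, padded with null at the start.
fun <T> lagSketch(xs: List<T>, n: Int = 1): List<T?> =
    xs.indices.map { i -> if (i - n >= 0) xs[i - n] else null }

// Value n positions later, padded with null at the end.
fun <T> leadSketch(xs: List<T>, n: Int = 1): List<T?> =
    xs.indices.map { i -> if (i + n < xs.size) xs[i + n] else null }
```

For sold_units of 10.0, 15.0, 12.0, 18.0 in quarter order, cumSumSketch yields [10.0, 25.0, 37.0, 55.0], pctChangeSketch yields [null, 0.5, -0.2, 0.5] and lagSketch yields [null, 10.0, 15.0, 12.0]. In the examples above, groupBy("product") makes the library apply such a computation per product.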
Internal release
- New: Added built-in support for Long columns (PR #69 by @davidpedrosa)
Major:
- New: summarizeAt for simplified column aggregations
- New: setNames to replace the column headers of a data-frame
- New: Deparse iterables more conveniently using lambdas in deparseRecords
Minor:
- Fixed: Cannot read CSV tables without a header
- Added option to skip lines in csv reader.
- Fixed: schema() should not throw a memory exception (#53)
- Fixed DataFrame.readTSV default format (#56)
- Added where() for conditional column creation (relates to #54)
- Added writeTSV
- Fixed grouping by Any columns
- Added toDoubleMatrix() helper extension method
Major Enhancements
DataFrame.fromJson will now flatten nested JSON data
Minor
- Added sum() extension for column summaries/transformations
- Added dataFrameOf() that accepts an Iterable of names
- Added bindRows() alias that accepts data frames as varargs
- Added bindCols() extension for a list of DataCol
- Fill missing cells with NA in bindRows and bindCols
- Resolve duplicated column names in bindCols()
- Added new builder to create a data-frame from a DataFrameRow iterator
- Added addRowNumber to add the row number as a column to a data-frame
- Fixed: Incorrect types in gathered columns
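The NA-filling behaviour of bindRows can be pictured with a plain-Kotlin sketch over rows stored as maps. This is an illustration under stated assumptions, not krangl's code: the helper bindRowsSketch is hypothetical, the result columns are assumed to be the union of all keys in first-seen order, and null plays the role of NA.

```kotlin
// Hypothetical row binding over maps (one map per row).
// Columns are the union of all keys in first-seen order; a row that lacks a
// column contributes null (NA) to it.
fun bindRowsSketch(vararg rows: Map<String, Any?>): Map<String, List<Any?>> {
    val columns = LinkedHashSet<String>() // keeps first-seen column order
    rows.forEach { columns.addAll(it.keys) }
    return columns.associateWith { col -> rows.map { row -> row[col] } }
}
```

For example, bindRowsSketch(mapOf("person" to "James", "year" to 1996), mapOf("person" to "Anne")) returns {person=[James, Anne], year=[1996, null]}.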
Released 2018-04-11
Major Enhancements
- Allow index access for column model (fixes #46): irisData[1][2]
- Improved DataFrame.count to respect existing groupings and to simply count rows if no grouping is defined
- Added moveLeft and moveRight to rearrange column order
- Added nest and unnest to wrap columns into sub-tables and back
- Added expand and complete to expand column value-sets into data-frames
- Added function literal support for count and groupBy (fixes #48): irisData.groupByExpr{ it["Sepal.Width"] > 3 }
- Added receiver context for sortBy lambdas with sorting specific API (fixes #44)
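Assuming expand follows tidyr's semantics, its core is the cartesian product of the distinct values of the chosen columns. The plain-Kotlin sketch below shows only that step, with a hypothetical helper expandSketch that takes each column as a list; it does not model krangl's data-frame types, nor the additional join that tidyr's complete performs to bring the original rows back onto the grid.

```kotlin
// Hypothetical sketch: cartesian product of the distinct values of each column.
// Each result row holds one value per input column, in column order.
fun expandSketch(columns: List<List<Any?>>): List<List<Any?>> =
    columns.fold(listOf(emptyList<Any?>())) { acc, col ->
        val levels = col.distinct() // value-set of this column, first-seen order
        acc.flatMap { prefix -> levels.map { value -> prefix + value } }
    }
```

For example, expandSketch(listOf(listOf(2019, 2020, 2019), listOf("A", "B", "B"))) returns [[2019, A], [2019, B], [2020, A], [2020, B]], including the combination (2020, A) that never occurs in the input.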
Improved data-frame rendering
- Improved the output of print() and schema() to have better alignment and more formatting options
- Print row numbers by default when using print (fixes #49)
Minor Enhancements
- Renamed select2/remove2 to selectIf and removeIf
- Fixed #39: Cannot add a scalar object as a column
- Started submodule for documentation
- Hide columns in print after exceeding maximum line length (fixes #50)
- Fixed #45: sleepData.sortedBy{ "order" } should fail with an informative exception
Released 2018-03-21
Major Enhancements
- Added property unfolding: df.unfold<Person>("user", properties = listOf("address"))
- Added text matching helper: irisData.filter{ it["Species"].isMatching{ startsWith("se") }} (fixes #21)
- Added sortedByDescending and desc, and added more sorting tests
- Added more elegant object bindings via reflection. Example: val objPersons : Iterable<User> = users.rowsAs<User>() (fixes #22)
- Added compressed CSV write support, configurable or by filename guessing
Minor Enhancements
- More robust row to object conversion
- Made List<Boolean?>.not() public
- Use a regex instead of a string as the separator in separate
- Replaced fixed temporary column names with UUIDs
- Fixed incorrect coercion of incomplete inplace data to df
- Added concat operator for string column arithmetics
- Fixed arithmetic comparison operators
- Added beakerx display adapter
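Using a regex as the separator lets one pattern cover several delimiter styles. Below is a minimal plain-Kotlin sketch of that splitting step, with a hypothetical helper separateSketch that takes a string column and the names of the new columns. Missing pieces become null and surplus pieces are dropped; both are assumptions for illustration rather than documented krangl behaviour.

```kotlin
// Hypothetical sketch: split each cell on a regex and spread the pieces
// into one new column per name in `into` (null when a piece is missing,
// extra pieces beyond into.size are dropped).
fun separateSketch(column: List<String>, into: List<String>, sep: Regex): Map<String, List<String?>> {
    val parts = column.map { it.split(sep) }
    return into.withIndex().associate { (i, name) -> name to parts.map { it.getOrNull(i) } }
}
```

For example, separateSketch(listOf("2018-03", "2019_11"), listOf("year", "month"), Regex("[-_]")) returns {year=[2018, 2019], month=[03, 11]}, which a plain string separator could not handle in one call.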
Released 2018-03-14
Major Enhancements
- Allow specifying column types when reading csv data (Thanks to LeanderG for providing the PR)
- Added groupedBy to provide the distinct set of grouping tuples as a data-frame
- Read support for URLs (Example: DataFrame.readCSV("https://git.io/vxks7").glimpse())
- Added basic read/write support for JSON data
- Added generic collection conversion Iterable<Any>.asDataFrame() via reflection (fixes #24)
Incompatible API changes
- Renamed structure to columnTypes
- Renamed all table read functions from .from* to .read*
- Fixed #29: mapNonNull should use parameter and not receiver
Minor Enhancements
- Namespace cleanup to hide internal helpers
- Bundled irisData
- Enhanced: DataCol.toDouble() should work for int columns as well (and vice versa)
- Added MIT License
- Use iterable instead of list for object conversions
Released: 2017-11-11
- More idiomatic API mimicking kotlin stdlib where possible
- Added DataFrame.remove to drop columns from data-frames
- Added DataFrame.addColumn to add columns to data-frames
- Added DataFrame.sortBy(TableFormula)
- Added DataFrame.filterByRow
- Reworked column selector API
- Changed column expression API from Any to a constrained set of supported types
- Fixed issues when combining columns of different types (e.g. DoubleCol + IntCol)
- Dropped most unary operators
Skipped.
Released 2017-04-12
New Features
- spread()/gather() support for elegant data reshaping (fixes #2)
- Improved reshaping functionality by adding unite and separate (fixes #9)
- Added sampleFrac() and sampleN() for random sub-sampling of data-frames (either with or without replacement)
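To make the reshaping idea concrete, here is a plain-Kotlin sketch of what gather does: each non-id column of a wide row becomes its own key/value row, and spread is the inverse. Rows are modelled as maps, and gatherSketch with its idCol, key and value parameters is hypothetical; krangl's own gather signature may differ.

```kotlin
// Hypothetical sketch of gather (wide to long). Every column other than idCol
// turns into one output row holding the id, the column name (under `key`)
// and the cell value (under `value`).
fun gatherSketch(
    rows: List<Map<String, Any?>>,
    idCol: String,
    key: String,
    value: String,
): List<Map<String, Any?>> =
    rows.flatMap { row ->
        row.filterKeys { it != idCol }.map { (col, cell) ->
            mapOf(idCol to row[idCol], key to col, value to cell)
        }
    }
```

For example, gatherSketch(listOf(mapOf("id" to 1, "x" to 10, "y" to 20)), "id", "key", "value") returns [{id=1, key=x, value=10}, {id=1, key=y, value=20}].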
Important Bug Fixes
- mutate() can now change existing columns without altering column positions
Other
- New property accessor DataFrame.cols to access all columns of a data-frame
- Incremented Kotlin version to 1.1
Initial Release
- Implement all dplyr core verbs
- Implement all join types
- Table write support using csv-commons wrapper
- Extensive unit test coverage
- TravisCI integration
- Support for count() and distinct()
- Basic benchmarking framework (without jvm usage)