diff --git a/README.md b/README.md
index 3ba59a05..a88bafe0 100644
--- a/README.md
+++ b/README.md
@@ -10,14 +10,18 @@ The following folders can be found in this repository:
 - **[`iris`](https://archive.ics.uci.edu/ml/datasets/iris)** - The iris flower data set from Ronald Fisher's 1936 paper
 - **[`metriccentury`](https://github.com/mikeblas/samples-junk/tree/main/metriccentury)** - Data recorded from a 100 km bike ride
 - **[`DeNiro`](https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html)** - Data on Robert De Niro's movies up to 2016
+- **`parquet`** - Sample parquet data in many formats
 
 ## Description
 
-Each folder in this repository has two items within:
+Each folder in this repository except parquet has two items within:
 
 - `README` - An explanation of everything about the data
 - `csv` - A folder with all relevant data in either CSV or TSV format
 
+parquet instead contains a number of standalone .parquet files with different data formats
+
+
 ## Installation Instructions
 
 1. Follow the README instructions on [Deephaven Community Core](https://github.com/deephaven/deephaven-core) for installing the OSS client and all required dependencies.
diff --git a/parquet/.DS_Store b/parquet/.DS_Store
new file mode 100644
index 00000000..5008ddfc
Binary files /dev/null and b/parquet/.DS_Store differ
diff --git a/parquet/README.md b/parquet/README.md
new file mode 100644
index 00000000..8db49011
--- /dev/null
+++ b/parquet/README.md
@@ -0,0 +1,33 @@
+# Parquet Sample data
+
+This folder contains sample files in various format
+
+## Table of contents
+
+- `alltypes_dictionary.parquet`: Parquet file using int32, int 64, int 96 timestamps
+- `alltypes_plain.parquet`: Parquet file using int32, int 64, int 96 timestamps
+- `alltypes_plain.snappy.parquet`: Parquet file using int32, int 64, int 96 timestamps with snappy compression
+- `customer.impala.GZIP.parquet`: Parquet file with GZIP compression
+- `customer.impala.NONE.parquet`: Parquet file with no compression
+- `customer.impala.SNAPPY.parquet`: Parquet file with snappy compression
+- `flow.snappy.parquet`: Parquet file doubles and with snappy compression
+- `monthlyProductSales.parquet`: Parquet file with whitespace
+- `nation.dict-malformed.parquet`: Non-Standard Parquet file
+- `nation.impala.SNAPPY.parquet`: Parquet file with int32 and binary, snappy compression
+- `nation.impala.GZIP.parquet`: Parquet file with int32 and binary, GZIP compression
+- `nation.impala.NONE.parquet`: Parquet file with int32 and binary, no compression
+- `nested_lists.snappy.parquet`: Nested Parquet file with snappy compression
+- `nested_maps.snappy.parquet`: Nested Parquet map file with snappy compression
+- `nonnullable.impala.parquet`: Nested Parquet file with maps and no nulls
+- `nullable.impala.parquet`: Nested Parquet file with maps and nulls
+- `nulls.snappy.parquet`: Nested Parquet file with maps and nulls, snappy compression
+- `repeated_no_annotation.parquet`: Nested Parquet file with repeat values
+- `stock_simulated.parquet`: Sample stock data in Parquet format
+- `taxi.parquet`: Parquet data with millis timestamps
+- `test_datapage_v2.snappy.parquet`: Nested Parquet list file with snappy compression
+
+
+
+# Source and License
+
+This data was built from data sets publicly available. It is provided here for demonstrative use without any warranty as to the accuracy, reliability, or completeness of the data.
diff --git a/parquet/alltypes_dictionary.parquet b/parquet/alltypes_dictionary.parquet
new file mode 100644
index 00000000..e6da6ab7
Binary files /dev/null and b/parquet/alltypes_dictionary.parquet differ
diff --git a/parquet/alltypes_plain.parquet b/parquet/alltypes_plain.parquet
new file mode 100644
index 00000000..a63f5dca
Binary files /dev/null and b/parquet/alltypes_plain.parquet differ
diff --git a/parquet/alltypes_plain.snappy.parquet b/parquet/alltypes_plain.snappy.parquet
new file mode 100644
index 00000000..9809d676
Binary files /dev/null and b/parquet/alltypes_plain.snappy.parquet differ
diff --git a/parquet/customer.impala.GZIP.parquet b/parquet/customer.impala.GZIP.parquet
new file mode 100644
index 00000000..361cb34a
Binary files /dev/null and b/parquet/customer.impala.GZIP.parquet differ
diff --git a/parquet/customer.impala.NONE.parquet b/parquet/customer.impala.NONE.parquet
new file mode 100644
index 00000000..3ce4ade7
Binary files /dev/null and b/parquet/customer.impala.NONE.parquet differ
diff --git a/parquet/customer.impala.SNAPPY.parquet b/parquet/customer.impala.SNAPPY.parquet
new file mode 100644
index 00000000..acb6787c
Binary files /dev/null and b/parquet/customer.impala.SNAPPY.parquet differ
diff --git a/parquet/flow.snappy.parquet b/parquet/flow.snappy.parquet
new file mode 100644
index 00000000..42310782
Binary files /dev/null and b/parquet/flow.snappy.parquet differ
diff --git a/parquet/monthlyProductSales.parquet b/parquet/monthlyProductSales.parquet
new file mode 100755
index 00000000..8242055c
Binary files /dev/null and b/parquet/monthlyProductSales.parquet differ
diff --git a/parquet/nation.dict-malformed.parquet b/parquet/nation.dict-malformed.parquet
new file mode 100644
index 00000000..5008ac0b
Binary files /dev/null and b/parquet/nation.dict-malformed.parquet differ
diff --git a/parquet/nation.impala.GZIP.parquet b/parquet/nation.impala.GZIP.parquet
new file mode 100644
index 00000000..5bbf0d50
Binary files /dev/null and b/parquet/nation.impala.GZIP.parquet differ
diff --git a/parquet/nation.impala.NONE.parquet b/parquet/nation.impala.NONE.parquet
new file mode 100644
index 00000000..bc61f97c
Binary files /dev/null and b/parquet/nation.impala.NONE.parquet differ
diff --git a/parquet/nation.impala.SNAPPY.parquet b/parquet/nation.impala.SNAPPY.parquet
new file mode 100644
index 00000000..67144031
Binary files /dev/null and b/parquet/nation.impala.SNAPPY.parquet differ
diff --git a/parquet/nested_lists.snappy.parquet b/parquet/nested_lists.snappy.parquet
new file mode 100644
index 00000000..f66ba04b
Binary files /dev/null and b/parquet/nested_lists.snappy.parquet differ
diff --git a/parquet/nested_maps.snappy.parquet b/parquet/nested_maps.snappy.parquet
new file mode 100644
index 00000000..6645527d
Binary files /dev/null and b/parquet/nested_maps.snappy.parquet differ
diff --git a/parquet/nonnullable.impala.parquet b/parquet/nonnullable.impala.parquet
new file mode 100644
index 00000000..f4be0828
Binary files /dev/null and b/parquet/nonnullable.impala.parquet differ
diff --git a/parquet/nullable.impala.parquet b/parquet/nullable.impala.parquet
new file mode 100644
index 00000000..2c72f52f
Binary files /dev/null and b/parquet/nullable.impala.parquet differ
diff --git a/parquet/nulls.snappy.parquet b/parquet/nulls.snappy.parquet
new file mode 100644
index 00000000..4046d79b
Binary files /dev/null and b/parquet/nulls.snappy.parquet differ
diff --git a/parquet/repeated_no_annotation.parquet b/parquet/repeated_no_annotation.parquet
new file mode 100644
index 00000000..02f20a64
Binary files /dev/null and b/parquet/repeated_no_annotation.parquet differ
diff --git a/parquet/stock_simulated.parquet b/parquet/stock_simulated.parquet
new file mode 100644
index 00000000..9d2c5fb2
Binary files /dev/null and b/parquet/stock_simulated.parquet differ
diff --git a/parquet/taxi.parquet b/parquet/taxi.parquet
new file mode 100755
index 00000000..b236ff2a
Binary files /dev/null and b/parquet/taxi.parquet differ
diff --git a/parquet/test_datapage_v2.snappy.parquet b/parquet/test_datapage_v2.snappy.parquet
new file mode 100644
index 00000000..2b77bb1e
Binary files /dev/null and b/parquet/test_datapage_v2.snappy.parquet differ