Parquet files (#13)
adds parquet sample data
hythloda authored Jun 16, 2021
1 parent 51e308f commit 5897c62
Showing 24 changed files with 38 additions and 1 deletion.
6 changes: 5 additions & 1 deletion README.md
@@ -10,14 +10,18 @@ The following folders can be found in this repository:
- **[`iris`](https://archive.ics.uci.edu/ml/datasets/iris)** - The iris flower data set from Ronald Fisher's 1936 paper
- **[`metriccentury`](https://github.com/mikeblas/samples-junk/tree/main/metriccentury)** - Data recorded from a 100 km bike ride
- **[`DeNiro`](https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html)** - Data on Robert De Niro's movies up to 2016
- **`parquet`** - Sample parquet data in many formats

## Description

Each folder in this repository has two items within:
Each folder in this repository except parquet has two items within:

- `README` - An explanation of everything about the data
- `csv` - A folder with all relevant data in either CSV or TSV format

The `parquet` folder instead contains a number of standalone `.parquet` files with different data formats.


## Installation Instructions

1. Follow the README instructions on [Deephaven Community Core](https://github.com/deephaven/deephaven-core) for installing the OSS client and all required dependencies.
Binary file added parquet/.DS_Store
Binary file not shown.
33 changes: 33 additions & 0 deletions parquet/README.md
@@ -0,0 +1,33 @@
# Parquet Sample data

This folder contains sample Parquet files in various formats.

## Table of contents

- `alltypes_dictionary.parquet`: Parquet file using int32, int64, int96 timestamps
- `alltypes_plain.parquet`: Parquet file using int32, int64, int96 timestamps
- `alltypes_plain.snappy.parquet`: Parquet file using int32, int64, int96 timestamps with snappy compression
- `customer.impala.GZIP.parquet`: Parquet file with GZIP compression
- `customer.impala.NONE.parquet`: Parquet file with no compression
- `customer.impala.SNAPPY.parquet`: Parquet file with snappy compression
- `flow.snappy.parquet`: Parquet file with doubles and snappy compression
- `monthlyProductSales.parquet`: Parquet file with whitespace
- `nation.dict-malformed.parquet`: Non-Standard Parquet file
- `nation.impala.SNAPPY.parquet`: Parquet file with int32 and binary, snappy compression
- `nation.impala.GZIP.parquet`: Parquet file with int32 and binary, GZIP compression
- `nation.impala.NONE.parquet`: Parquet file with int32 and binary, no compression
- `nested_lists.snappy.parquet`: Nested Parquet file with snappy compression
- `nested_maps.snappy.parquet`: Nested Parquet map file with snappy compression
- `nonnullable.impala.parquet`: Nested Parquet file with maps and no nulls
- `nullable.impala.parquet`: Nested Parquet file with maps and nulls
- `nulls.snappy.parquet`: Nested Parquet file with maps and nulls, snappy compression
- `repeated_no_annotation.parquet`: Nested Parquet file with repeat values
- `stock_simulated.parquet`: Sample stock data in Parquet format
- `taxi.parquet`: Parquet data with millis timestamps
- `test_datapage_v2.snappy.parquet`: Nested Parquet list file with snappy compression
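
Before loading one of the files above, it can help to confirm that it has the standard Parquet framing. The sketch below is an illustration only and is not part of this repository: it relies on the Parquet format's 4-byte `PAR1` marker at both ends of an unencrypted file and the 4-byte little-endian footer length stored just before the trailing marker. The function name `parquet_footer_length` is hypothetical, and the check says nothing about whether the data pages themselves are readable.

```python
import struct
from pathlib import Path

# 4-byte marker found at both ends of an unencrypted Parquet file.
PARQUET_MAGIC = b"PAR1"


def parquet_footer_length(path):
    """Return the footer metadata length in bytes, or None if the file
    does not have the standard Parquet framing.

    Hypothetical helper for illustration; it only inspects the framing
    bytes and does not decode the footer or any data pages.
    """
    file_size = Path(path).stat().st_size
    # Smallest possible framing: magic + 4-byte footer length + magic.
    if file_size < 12:
        return None
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, 2)  # last 8 bytes: footer length, then magic
        tail = f.read(8)
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        return None
    (footer_len,) = struct.unpack("<I", tail[:4])
    # The footer must fit between the leading magic and the trailing 8 bytes.
    if footer_len > file_size - 12:
        return None
    return footer_len
```

Assuming the files sit in the current directory, `parquet_footer_length("taxi.parquet")` would return an integer for a normally framed file and `None` otherwise. A `None` result is only a hint that a reader may reject the file, not a diagnosis of what is wrong with it.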



# Source and License

This data was built from publicly available data sets. It is provided here for demonstrative use without any warranty as to the accuracy, reliability, or completeness of the data.
Binary file added parquet/alltypes_dictionary.parquet
Binary file not shown.
Binary file added parquet/alltypes_plain.parquet
Binary file not shown.
Binary file added parquet/alltypes_plain.snappy.parquet
Binary file not shown.
Binary file added parquet/customer.impala.GZIP.parquet
Binary file not shown.
Binary file added parquet/customer.impala.NONE.parquet
Binary file not shown.
Binary file added parquet/customer.impala.SNAPPY.parquet
Binary file not shown.
Binary file added parquet/flow.snappy.parquet
Binary file not shown.
Binary file added parquet/monthlyProductSales.parquet
Binary file not shown.
Binary file added parquet/nation.dict-malformed.parquet
Binary file not shown.
Binary file added parquet/nation.impala.GZIP.parquet
Binary file not shown.
Binary file added parquet/nation.impala.NONE.parquet
Binary file not shown.
Binary file added parquet/nation.impala.SNAPPY.parquet
Binary file not shown.
Binary file added parquet/nested_lists.snappy.parquet
Binary file not shown.
Binary file added parquet/nested_maps.snappy.parquet
Binary file not shown.
Binary file added parquet/nonnullable.impala.parquet
Binary file not shown.
Binary file added parquet/nullable.impala.parquet
Binary file not shown.
Binary file added parquet/nulls.snappy.parquet
Binary file not shown.
Binary file added parquet/repeated_no_annotation.parquet
Binary file not shown.
Binary file added parquet/stock_simulated.parquet
Binary file not shown.
Binary file added parquet/taxi.parquet
Binary file not shown.
Binary file added parquet/test_datapage_v2.snappy.parquet
Binary file not shown.
