FlatQuack

FlatQuack is an open source tool to convert healthcare data in FHIR format into flat CSV, Parquet, JSON, or database tables that are ready for analysis using off-the-shelf tools. Not sure what this means? Check out the overview below :).

Overview

As healthcare data in FHIR format becomes more widely available, there is growing interest in using it for analytics. Most analytic and machine learning use cases require transforming FHIR data from its original, deeply nested form into tabular projections, and authoring and maintaining these transformations is not trivial. The SQL-on-FHIR specification defines a compact language for describing these transformations, called a FHIR ViewDefinition.
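To make this concrete, the JavaScript sketch below pairs a minimal ViewDefinition with a deliberately naive evaluator that computes the flat row it describes for a single Patient resource. It only illustrates what a ViewDefinition means: FlatQuack itself does not evaluate resources in JavaScript but compiles the ViewDefinition into DuckDB SQL. The view name, column names, and sample patient are made up, and `evaluatePath` and `flatten` are hypothetical helpers that understand only dotted paths and `first()`, not full FHIRPath, `forEach`, or `where`.

```javascript
// Illustrative only: a naive evaluator showing the *result* a ViewDefinition
// describes. FlatQuack itself compiles ViewDefinitions to DuckDB SQL.

// A minimal ViewDefinition; the view and column names are hypothetical.
const viewDefinition = {
  resourceType: "ViewDefinition",
  name: "patient_demographics",
  resource: "Patient",
  status: "draft",
  select: [
    {
      column: [
        { name: "id", path: "id" },
        { name: "gender", path: "gender" },
        { name: "birth_date", path: "birthDate" },
        { name: "family_name", path: "name.first().family" },
      ],
    },
  ],
};

// A nested FHIR Patient resource, e.g. one line of an NDJSON Bulk Data file.
const patient = {
  resourceType: "Patient",
  id: "pt-1",
  gender: "female",
  birthDate: "1970-01-01",
  name: [{ family: "Doe", given: ["Jane"] }],
};

// Hypothetical helper: supports only dotted property access and first().
function evaluatePath(resource, path) {
  let value = resource;
  for (const part of path.split(".")) {
    if (value == null) return null;
    value = part === "first()"
      ? (Array.isArray(value) ? value[0] : value)
      : value[part];
  }
  return value ?? null;
}

// Hypothetical helper: one flat row per resource of the view's type.
function flatten(view, resource) {
  if (resource.resourceType !== view.resource) return null;
  const row = {};
  for (const select of view.select) {
    for (const column of select.column) {
      row[column.name] = evaluatePath(resource, column.path);
    }
  }
  return row;
}

console.log(flatten(viewDefinition, patient));
```

Running it prints a single row with the columns `id`, `gender`, `birth_date`, and `family_name` (here `"Doe"`): the deeply nested `name` array has been projected into one flat column.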

FlatQuack is an open source tool that compiles these ViewDefinitions into SQL that can be executed with DuckDB. DuckDB is a lightweight, scalable, open source database engine built for fast analytics: its columnar engine supports parallel execution and can process larger-than-memory workloads. DuckDB natively reads and writes common formats such as JSON, Parquet, and CSV, and can integrate directly with remote storage such as AWS S3 buckets and with databases such as Postgres, MySQL, and SQLite. FlatQuack takes advantage of this flexibility through templates that shape the SQL it produces for different use cases, making it easy to adapt to the naming conventions, input file formats, and output file formats in your workflow. Because FlatQuack just outputs SQL, it can be run directly on a FHIR dataset for a specific project or integrated into an existing data pipeline that uses orchestration and transformation tools such as Apache Airflow and dbt.

Alpha Version

FlatQuack is alpha software: it is not complete, and you may run into bugs when using it. Please see the project roadmap, contribution guidelines, and support sections below for more information.

Setup

1. Install Bun, an open source JavaScript runtime similar to Node.js.
2. Optionally, install the latest version of DuckDB if you plan to execute the generated SQL queries outside the FlatQuack command line interface (recommended for large datasets).

Additional steps if you would like to run scripts or unit tests, or edit the project's source code:

3. Clone this repository and switch to its directory.
4. Run `bun install` to install FlatQuack's dependencies.

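You can also run FlatQuack in a Docker container. Build the image and start an interactive container with the current directory mounted at `/app`:

```
docker build -t flatquack .
docker run --rm -v "$(pwd):/app" -it flatquack
```

Then run FlatQuack inside the container:

```
root@<container_hash>:/app# bunx flatquack
# *** compiling example/example.vd.json ***
# ... rest of output ...
```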

Running FlatQuack

`bunx flatquack`

(or `bun run ./src/cli.js` if you cloned the repository and installed its dependencies as described in the Setup section above)

Command line arguments

| Name | Short | Default value | Description |
|---|---|---|---|
| `--mode` | `-m` | `preview` | Action FlatQuack should take with the generated SQL. See the modes section below for details. |
| `--view-path` | `-v` | current directory | Absolute or relative path to your ViewDefinition JSON files. The `--view-pattern` parameter determines which files within this path are used. |
| `--view-pattern` | `-p` | `**/*.vd.json` | Glob pattern that identifies which files are ViewDefinitions. |
| `--template` | `-t` | `@csv` | Template to use when generating SQL. May be the name of a sample template or the path to a custom template. |
| `--schema-file` | `-s` | FHIR R4 schema | Path to a FHIR schema generated with the script at `./scripts/build-fhir-schema.js`. Use this to execute ViewDefinitions against FHIR data from versions other than R4. See the Generating a FHIR Schema section below for details. |
| `--param` | | | `name=value` pair defining a user-defined variable to use when generating SQL with a custom template. This argument may be repeated. |
| `--verbose` | | `false` | Print debugging information to the console when running FlatQuack. |
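Because `--param` may be repeated and can override built-in template variables, it can help to picture how repeated `name=value` pairs combine into one set of variables. The following is a hypothetical sketch, not FlatQuack's actual argument parser: `collectParams` splits each pair on the first `=` only (so values may themselves contain `=`), a repeated name keeps its last value (an assumption), and `--param` values take precedence over the built-in defaults for `fq_input_dir` and `fq_output_dir`. The `source_filter` variable is an invented example.

```javascript
// Hypothetical helper (not FlatQuack's parser): collect repeated
// `--param name=value` arguments into a single object.
function collectParams(argv) {
  const params = {};
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] !== "--param") continue;
    const pair = argv[i + 1] ?? "";
    const eq = pair.indexOf("="); // split on the first "=" only
    if (eq <= 0) {
      throw new Error(`Expected name=value after --param, got "${pair}"`);
    }
    params[pair.slice(0, eq)] = pair.slice(eq + 1); // later values win
    i++; // skip the name=value token we just consumed
  }
  return params;
}

// Built-in defaults described in the custom templates section.
const builtIns = { fq_input_dir: process.cwd(), fq_output_dir: process.cwd() };

const argv = [
  "--mode", "build",
  "--param", "fq_input_dir=/data/fhir",          // overrides a built-in
  "--param", "source_filter=status='final'",     // hypothetical user variable
];

// User-supplied params are spread last, so they override built-ins.
const variables = { ...builtIns, ...collectParams(argv) };
console.log(variables);
```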

Modes (--mode parameter)

| Name | Description |
|---|---|
| `preview` (default) | Generate SQL and display it in the console. |
| `build` | Generate SQL and save it in the same directory as the source ViewDefinition. |
| `run` | Execute the SQL and print how long it took to run. |
| `explore` | Execute the SQL and print the query output to the console as JSON. For large queries, use the `build` mode instead and run the resulting SQL files directly with DuckDB. |
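If you drive FlatQuack from a JavaScript build step rather than the shell, a small wrapper can assemble the documented flags. This is a sketch under stated assumptions, not part of FlatQuack: `buildFlatQuackArgs` and `runFlatQuack` are hypothetical helpers, `./views` and `./out` are placeholder paths, and it assumes `--verbose` is a boolean flag that takes no value. It uses `build` mode so the generated SQL files can then be executed directly with DuckDB.

```javascript
// Hypothetical wrapper: turns an options object into FlatQuack CLI flags,
// using only the flags listed in the command line arguments table.
function buildFlatQuackArgs({
  mode, viewPath, viewPattern, template, schemaFile, params = {}, verbose = false,
}) {
  const args = [];
  if (mode) args.push("--mode", mode);
  if (viewPath) args.push("--view-path", viewPath);
  if (viewPattern) args.push("--view-pattern", viewPattern);
  if (template) args.push("--template", template);
  if (schemaFile) args.push("--schema-file", schemaFile);
  for (const [name, value] of Object.entries(params)) {
    args.push("--param", `${name}=${value}`); // --param may be repeated
  }
  if (verbose) args.push("--verbose"); // assumed to be a value-less flag
  return args;
}

// Runs `bunx flatquack ...` and resolves to its exit status (needs Bun on PATH).
async function runFlatQuack(options) {
  const { spawnSync } = await import("node:child_process");
  const result = spawnSync("bunx", ["flatquack", ...buildFlatQuackArgs(options)], {
    stdio: "inherit",
  });
  if (result.error) throw result.error;
  return result.status;
}

// Print the command instead of running it; call runFlatQuack(options) to execute.
const options = {
  mode: "build",
  viewPath: "./views",
  template: "@parquet",
  params: { fq_output_dir: "./out" },
};
console.log(["bunx", "flatquack", ...buildFlatQuackArgs(options)].join(" "));
```

The script prints `bunx flatquack --mode build --view-path ./views --template @parquet --param fq_output_dir=./out`. Passing arguments as an array to `spawnSync` avoids shell quoting problems with paths or parameter values that contain spaces.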

Templates (--template parameter)

Sample Templates

| Name | Input | Output |
|---|---|---|
| `@csv` (default) | NDJSON FHIR Bulk Data files with a `.ndjson` extension and the resource type in the file name | Flat CSV files with a header row |
| `@parquet` | NDJSON FHIR Bulk Data files with a `.ndjson` extension and the resource type in the file name | Parquet files ready for additional processing |
| `@ndjson` | NDJSON FHIR Bulk Data files with a `.ndjson` extension | NDJSON file with one line per output row, containing only the extracted data |
| `@dbt_model` | dbt source named `fhir_db` with tables named after FHIR resource types | SQL `SELECT` statement that returns a flat table |
| `@dbt_prehook` | N/A | DuckDB SQL macros to load before executing a query generated with FlatQuack |
| `@explore` (default for the `explore` mode) | NDJSON FHIR Bulk Data files with a `.ndjson` extension and the resource type in the file name | Flattened table with up to 10 results |
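Most sample templates expect NDJSON FHIR Bulk Data files with a `.ndjson` extension and the resource type in the file name, where each non-empty line holds one complete FHIR resource. The sketch below illustrates that convention with two hypothetical helpers. The substring check in `filesForResource` is an assumption, not necessarily the rule FlatQuack's templates apply; note that it would also match `MedicationRequest.ndjson` when looking for `Medication`. The file names are made up.

```javascript
// Hypothetical helper: pick NDJSON files whose names contain the resource type.
// Naive substring match: "Medication" would also match "MedicationRequest.ndjson".
function filesForResource(fileNames, resourceType) {
  return fileNames.filter(
    (name) => name.endsWith(".ndjson") && name.includes(resourceType)
  );
}

// Hypothetical helper: each non-empty line of an NDJSON file is one resource.
function parseNdjson(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

const files = ["Patient.ndjson", "Patient.001.ndjson", "Observation.ndjson", "Patient.csv"];
console.log(filesForResource(files, "Patient"));

const ndjson =
  '{"resourceType":"Patient","id":"a"}\n{"resourceType":"Patient","id":"b"}\n';
console.log(parseNdjson(ndjson).map((resource) => resource.id));
```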

Custom Templates

FlatQuack uses a very simple template language that replaces variables placed between double curly braces (e.g., `{{ fq_input_dir }}`) with values from the current execution. Additional variables not in the list below can be passed to the template processor with the `--param` command line argument and will be replaced wherever they appear in the template. The same argument can also override the values of the built-in variables. Variables that are neither in the list below nor passed in as arguments are left untouched by the template engine, so they can be used in other processing steps such as dbt pipelines.

| Name | Description |
|---|---|
| `fq_input_dir` | Input directory. Defaults to the current working directory. |
| `fq_output_dir` | Output directory. Defaults to the current working directory. |
| `fq_sql_transform_expression` | SQL transformation generated from the `select` element of the ViewDefinition |
| `fq_where_filter` | SQL generated from the `where` element of the ViewDefinition |
| `fq_sql_input_schema` | SQL schema generated from the FHIR elements used in the `select` and `where` elements of the ViewDefinition |
| `fq_sql_flattening_cols` | SQL columns that create the output columns based on the `column` elements in the ViewDefinition |
| `fq_sql_flattening_tables` | SQL joins that create the output columns based on the `column` elements in the ViewDefinition |
| `fq_vd_name` | The value of the `name` element of the ViewDefinition |
| `fq_vd_resource` | The value of the `resource` element of the ViewDefinition |
| `fq_sql_macros` | DuckDB SQL macros to load before executing a query generated with FlatQuack |
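The substitution rules described above can be sketched in a few lines of JavaScript. This is an illustrative re-implementation under stated assumptions, not FlatQuack's source: `renderTemplate` is a hypothetical helper, the template text is invented (loosely modeled on a CSV export using DuckDB's `COPY ... TO` and `read_json_auto`), and `project_name` stands in for a variable left for a later step such as a dbt pipeline.

```javascript
// Hypothetical re-implementation of the rules above: replace known
// {{ variable }} placeholders (spaces optional) and leave unknown ones intact.
function renderTemplate(template, variables) {
  return template.replace(/\{\{\s*([A-Za-z0-9_]+)\s*\}\}/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(variables, name)
      ? String(variables[name])
      : placeholder // unknown variable: keep it for later processing steps
  );
}

// Invented template text; the real sample templates ship with FlatQuack.
const template = [
  "-- {{ fq_vd_name }} ({{ fq_vd_resource }}) for {{ project_name }}",
  "COPY (SELECT * FROM read_json_auto('{{ fq_input_dir }}/*{{ fq_vd_resource }}*.ndjson'))",
  "TO '{{ fq_output_dir }}/{{ fq_vd_name }}.csv' (HEADER);",
].join("\n");

// Example values; built-ins like these could also come from --param overrides.
const variables = {
  fq_vd_name: "patient_demographics",
  fq_vd_resource: "Patient",
  fq_input_dir: "/data/fhir",
  fq_output_dir: "/data/out",
};

console.log(renderTemplate(template, variables));
```

Known variables are replaced whether or not the braces contain spaces, while `{{ project_name }}` passes through unchanged. Using a replacer function rather than a replacement string also keeps `$` characters in values from being interpreted as special replacement patterns.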

Generating a FHIR Schema

The schema for FHIR R4 is included with FlatQuack, but you may want to execute ViewDefinitions against other FHIR versions as well. To do this, generate a schema file for each of those versions and pass it in with the `--schema-file` command line argument.

To generate a schema:

Download and decompress the FHIR definitions in JSON format from https://hl7.org/fhir/downloads.html (or the corresponding URL for the FHIR version you want to use).

Run the script:

```
bun ./scripts/build-fhir-schema.js {path to FHIR definitions} {output file path}
```

The script accepts two positional parameters:

1. The path to the directory containing the FHIR definition files in JSON format (e.g., `../fhir/R4`)
2. The output path for the generated schema file (e.g., `../schemas/fhir-schema-r4.json`)

Roadmap

- Value Set support
- Unions with nested select elements
- Constants in ViewDefinitions
- Boundary functions in FHIRPath expressions
- A watch mode in the command line interface for exploratory queries

Contributing

Pull Requests

Pull requests are very welcome! Apart from minor bug fixes, please start by opening an issue to discuss your plans and make sure they're aligned with other work being done on the project.

Running the tests

```
bun test
```

Generating specification test output file

1. Ensure that the test files in the `./tests/spec-tests` directory are up to date.
2. Run `bun ./scripts/build-test-output.js`.
3. The output is written to `flatquack_test_output.json`.

Support

As an open source project, FlatQuack is supported on a best-effort basis. Interested in commercial support, training, or sponsoring the addition of a feature? Please get in touch: Dan (at) CentralSquareSolutions.com.
