This repository contains the code for the Data Engineering Pipelines with Snowpark Scala Snowflake Quickstart.
➡️ NOTE: This quickstart has not been published yet to https://quickstarts.snowflake.com.
Here is an overview of what we'll build in this lab:
The following is required:
- A Snowflake account
- Packages on local machine:
  - Java 11
  - Scala 2.12.18
  - SBT 1.9.7
  - Maven
Packages setup
Use your package manager of choice to install the packages listed above. Alternatively, if Nix and direnv are installed, run direnv allow, which pulls in all the needed packages and sets up the environment.
Note: The Nix/direnv combination is entirely optional, but it greatly simplifies package installation.
If using Nix, make sure to enable the "flakes" experimental feature.
Authentication setup
This repository works with either a snowflake.properties file (see example) or environment variables, which the .envrc file can retrieve from the snowsql configuration (see .envrc).
The properties file takes precedence.
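The precedence rule can be illustrated with a minimal sketch. It uses only the Scala and Java standard libraries and is not the repository's actual loading code. The SNOWFLAKE_ environment variable prefix and the ConnectionConfig object are hypothetical names chosen for illustration, and the sketch assumes "precedence" means the properties file is used whenever it exists.

```scala
import java.io.FileInputStream
import java.nio.file.{Files, Path}
import java.util.Properties
import scala.collection.JavaConverters._

object ConnectionConfig {
  // Hypothetical prefix; the variable names exported by .envrc may differ.
  val EnvPrefix = "SNOWFLAKE_"

  // Reads every key/value pair from a Java properties file.
  def fromProperties(path: Path): Map[String, String] = {
    val props = new Properties()
    val in = new FileInputStream(path.toFile)
    try props.load(in) finally in.close()
    props.stringPropertyNames().asScala.map(k => k -> props.getProperty(k)).toMap
  }

  // Keeps only prefixed environment variables and strips the prefix.
  def fromEnv(env: Map[String, String]): Map[String, String] =
    env.collect { case (k, v) if k.startsWith(EnvPrefix) => k.stripPrefix(EnvPrefix) -> v }

  // Assumption: the properties file wins whenever it exists;
  // environment variables are only a fallback.
  def resolve(path: Path, env: Map[String, String]): Map[String, String] =
    if (Files.exists(path)) fromProperties(path) else fromEnv(env)
}
```

A caller might use ConnectionConfig.resolve(java.nio.file.Paths.get("snowflake.properties"), sys.env) and pass the resulting map to whatever builds the session.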
One important environment variable is QUICKSTART_RUN_LOCALLY. Setting it with export QUICKSTART_RUN_LOCALLY=TRUE allows you to run the procedures locally.
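As a rough illustration of how such a flag could be read, here is a small sketch. The RunMode object is a hypothetical name, and the case-insensitive comparison is an assumption; the repository's own check may be stricter.

```scala
object RunMode {
  // True only when QUICKSTART_RUN_LOCALLY is set to "true" (any letter case).
  // Assumption: an unset variable or any other value means "do not run locally".
  def runLocally(env: Map[String, String]): Boolean =
    env.get("QUICKSTART_RUN_LOCALLY").exists(_.trim.equalsIgnoreCase("true"))
}
```

In a program this would typically be called as RunMode.runLocally(sys.env). Taking the environment as a parameter keeps the check easy to test.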
- Follow the SQL instructions in sql/01_setup_snowflake.sql to prepare your Snowflake account for the lab.
- Run the Step02_LoadRawData Scala program (from your local machine) to ingest raw data from S3 into Snowflake:
  sbt "runMain com.snowflake.examples.de.Step02_LoadRawData"
- Follow the SQL instructions in sql/03_load_weather.sql to configure the Frostbyte weather data share in your Snowflake account.
- Run the Step04_CreatePOSView Scala program (from your local machine) to create a flattened view (and corresponding stream object) for POS data:
  sbt "runMain com.snowflake.examples.de.Step04_CreatePOSView"
- Review the Scala code in Step05_FarenheitToCelsiusUDF. This contains a UDF that will be used in Step 7.
- Review the Scala code in Step06_UpdateOrdersProcedure. This file contains a Scala stored procedure that will merge data from the flattened POS view into the orders table.
- Review the Scala code in Step07_UpdateDailyCityMetricsProcedure. This file contains a Scala stored procedure that will perform various aggregations and transformations to prepare the daily_city_metrics table in the analytics layer.
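The conversion behind the UDF reviewed in Step 5 can be sketched in plain Scala. This is a minimal illustration, not the code in Step05_FarenheitToCelsiusUDF: the TemperatureConversion object is a hypothetical name, registration as a Snowpark UDF is not shown, and modelling a SQL NULL as None is an assumption.

```scala
object TemperatureConversion {
  // Converts a Fahrenheit reading to Celsius with C = (F - 32) * 5 / 9.
  // A missing input (None, standing in for SQL NULL) yields a missing output.
  def fahrenheitToCelsius(fahrenheit: Option[Double]): Option[Double] =
    fahrenheit.map(f => (f - 32.0) * 5.0 / 9.0)
}
```

Keeping the arithmetic in a pure function like this makes it easy to check locally before it is wrapped in a UDF.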
mvn clean package snowflake:deploy
- Follow the SQL instructions in sql/08_orchestrate_jobs.sql to configure and execute tasks that will trigger the Scala stored procedures. These procedures will populate the orders and daily_city_metrics tables.
- Follow the SQL instructions in sql/09_process_incrementally.sql to load additional data for incremental processing.
- Follow the SQL instructions in sql/10_teardown.sql to remove assets related to this lab from your Snowflake account.
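The incremental processing in the steps above relies on merging newly arrived rows into the orders table. The sketch below shows merge (upsert) semantics on an in-memory map in plain Scala. It is only an analogy for what the stored procedure does inside Snowflake: the Order fields and the OrdersMerge object are hypothetical, and no Snowpark API is used.

```scala
// Hypothetical row shape; the real orders table has its own columns.
final case class Order(orderId: Long, city: String, total: BigDecimal)

object OrdersMerge {
  // Upsert keyed on orderId: a change whose key already exists replaces the
  // stored row, and a change with a new key is inserted.
  def merge(target: Map[Long, Order], changes: Seq[Order]): Map[Long, Order] =
    changes.foldLeft(target)((acc, row) => acc.updated(row.orderId, row))
}
```

Because only the changed rows are passed in, each run touches just the new data rather than rebuilding the whole table, which is the point of processing incrementally.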
mvn dependency:list -DincludeScope=compile