Week 2 #42

Open
wants to merge 3 commits into
base: main
4 changes: 4 additions & 0 deletions greenery/.gitignore
@@ -0,0 +1,4 @@

target/
dbt_packages/
logs/
36 changes: 36 additions & 0 deletions greenery/README.md
@@ -0,0 +1,36 @@
Week 3 Answers:

Overall Conversion Rate: 62.5%

Product level conversion:

| product_id | conversion rate |
| --- | --- |
| fb0e8be7-5ac4-4a76-a1fa-2cc4bf0b2d80 | 0.676923077 |
| 5b50b820-1d0a-4231-9422-75e7f6b0cecf | 0.533333333 |
| 80eda933-749d-4fc6-91d5-613d29eb126f | 0.5 |
| be49171b-9f72-4fc9-bf7a-9a52e259836b | 0.530612245 |
| 64d39754-03e4-4fa0-b1ea-5f4293315f67 | 0.508474576 |
| 74aeb414-e3dd-4e8a-beef-0fa45225214d | 0.609375 |
| 55c6a062-5f4a-4a8b-a8e5-05ea5e6715a3 | 0.507936508 |
| 58b575f2-2192-4a53-9d21-df9a0c14fc25 | 0.516129032 |
| d3e228db-8ca5-42ad-bb0a-2148e876cc59 | 0.464285714 |
| e18f33a6-b89a-4fbc-82ad-ccba5bb261cc | 0.422535211 |
| c7050c3b-a898-424d-8d98-ab0aaad7bef4 | 0.493333333 |
| bb19d194-e1bd-4358-819e-cd1f1b401c0c | 0.5 |
| 37e0062f-bd15-4c3e-b272-558a86d90598 | 0.548387097 |
| 05df0866-1a66-41d8-9ed7-e2bbcddd6a3d | 0.55 |
| 6f3a3072-a24d-4d11-9cef-25b0b5f8a4af | 0.5 |
| 615695d3-8ffd-4850-bcf7-944cf6d3685b | 0.553846154 |
| 579f4cd0-1f45-49d2-af55-9ab2b72c3b35 | 0.571428571 |
| e8b6528e-a830-4d03-a027-473b411c7f02 | 0.465753425 |
| 843b6553-dc6a-4fc4-bceb-02cd39af0168 | 0.514705882 |
| 5ceddd13-cf00-481f-9285-8340ab95d06d | 0.550724638 |
| 35550082-a52d-4301-8f06-05b30f6f3616 | 0.533333333 |
| 4cda01b9-62e2-46c5-830f-b7f262a58fb1 | 0.375 |
| 689fb64e-a4a2-45c5-b9f2-480c2155624d | 0.608695652 |
| b66a7143-c18a-43bb-b5dc-06bb5d1d3160 | 0.538461538 |
| e706ab70-b396-4d30-a6b2-a1ccf3625b52 | 0.508474576 |
| e5ee99b6-519f-4218-8b41-62f48f59f700 | 0.52238806 |
| b86ae24b-6f59-47e8-8adc-b17d88cbd367 | 0.603773585 |
| c17e63f7-0d28-4a95-8248-b01ea354840e | 0.581818182 |
| a88a23ef-679c-4743-b151-dc7722040d8c | 0.52173913 |
| e2e78dfc-f25c-4fec-a002-8e280d61a2f2 | 0.53968254 |
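
The README does not say how the conversion rates were derived. One plausible reading, sketched below, is "distinct sessions containing a checkout event, divided by all distinct sessions". The `checkout` event name, the sample rows, and the use of SQLite (instead of the project's warehouse) are all assumptions made for illustration; only the `session_id` and `event_type` column names come from the models in this PR.

```python
import sqlite3

# Hypothetical sample rows: (session_id, event_type). Not real greenery data.
SAMPLE_EVENTS = [
    ("s1", "page_view"),
    ("s1", "checkout"),
    ("s2", "page_view"),
    ("s3", "page_view"),
    ("s3", "checkout"),
    ("s4", "page_view"),
]

# Assumed definition: sessions with at least one checkout / all sessions.
QUERY = """
SELECT
    COUNT(DISTINCT CASE WHEN event_type = 'checkout' THEN session_id END) * 1.0
        / COUNT(DISTINCT session_id) AS conversion_rate
FROM events
"""


def overall_conversion_rate(rows):
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (session_id TEXT, event_type TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    (rate,) = conn.execute(QUERY).fetchone()
    conn.close()
    return rate


rate = overall_conversion_rate(SAMPLE_EVENTS)
print(f"conversion rate: {rate:.3f}")
```

If the reported 62.5% was computed with a different definition (for example per user rather than per session), the query would need to change accordingly.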
Empty file added greenery/analyses/.gitkeep
Empty file.
46 changes: 46 additions & 0 deletions greenery/dbt_project.yml
@@ -0,0 +1,46 @@

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'greenery'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'greenery'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
- "target"
- "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this config, we tell dbt to build all models in the staging/ directory
# as views and all models in the marts/ directory as tables. These settings
# can be overridden in the individual model files using the `{{ config(...) }}` macro.
models:
  greenery:
    # Config indicated by + and applies to all files under the named directory
    staging:
      +materialized: view
    marts:
      +materialized: table

    post-hook:
      - "GRANT SELECT ON {{this}} TO reporting"

on-run-end:
  - "GRANT USAGE ON SCHEMA {{schema}} TO reporting"
Empty file added greenery/macros/.gitkeep
Empty file.
3 changes: 3 additions & 0 deletions greenery/macros/event_type_agg.sql
@@ -0,0 +1,3 @@
{%- macro events(event_type) -%}
SUM(CASE WHEN event_type = '{{event_type}}' THEN 1 ELSE 0 END)
{%- endmacro -%}
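
For reviewers, a minimal sketch of what the `events` macro expands to when it is called in a loop, as the product and session models in this PR do. The Python helper `render_events` only imitates the Jinja rendering, the sample rows are hypothetical, and SQLite stands in for the project's warehouse; none of this is part of the PR itself.

```python
import sqlite3


def render_events(event_type: str) -> str:
    """Return the SQL text the `events` Jinja macro renders for one event type."""
    return f"SUM(CASE WHEN event_type = '{event_type}' THEN 1 ELSE 0 END)"


# Hypothetical sample rows: (product_id, event_type). Not real greenery data.
SAMPLE_EVENTS = [
    ("p1", "page_view"),
    ("p1", "page_view"),
    ("p1", "add_to_cart"),
    ("p2", "page_view"),
]


def count_events_by_product(event_types):
    """Build one <event>_counts column per event type, like the Jinja for-loop."""
    columns = "".join(f", {render_events(e)} AS {e}_counts" for e in event_types)
    query = (
        f"SELECT product_id{columns} "
        "FROM events GROUP BY product_id ORDER BY product_id"
    )
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (product_id TEXT, event_type TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", SAMPLE_EVENTS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


product_counts = count_events_by_product(["page_view", "add_to_cart"])
print(product_counts)
```

Each event type becomes its own pivoted count column, so a new event type in the source data adds a column without any change to the model SQL.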
@@ -0,0 +1,12 @@
{{ config(materialized='table') }}

SELECT
    order_id,
    order_items_surrogate_key,
    o.product_id,
    product_name,
    price,
    inventory as available_inventory,
    quantity as quantity_purchased
FROM {{ ref('stg_greenery_order_items') }} o
JOIN {{ ref('stg_greenery_products') }} p ON p.product_id = o.product_id
10 changes: 10 additions & 0 deletions greenery/models/models/marts/core/schema.yml
@@ -0,0 +1,10 @@
version: 2

models:
  - name: int_order_products
    description: Detailed product descriptions for order lines
    columns:
      - name: order_items_surrogate_key
        tests:
          - unique
          - not_null
@@ -0,0 +1,37 @@
{{ config(materialized='table') }}

WITH user_order_count as (
    SELECT
        user_id,
        count(order_id) as order_count
    FROM {{ ref('stg_greenery_orders') }}
    WHERE order_status = 'delivered'
    GROUP BY 1
)

SELECT
    -- order_id is selected so the unique/not_null tests in schema.yml have a column to run against
    o.order_id,
    u.user_id,
    u.first_name,
    u.last_name,
    u.email,
    u.phone_number,
    u.user_address_id,
    ua.address,
    ua.zipcode,
    ua.state,
    ua.country,
    o.address_id,
    o.order_created_at_utc,
    o.order_cost,
    shipping_cost,
    order_total,
    order_tracking_id,
    shipping_service,
    order_estimated_delivery_at_utc,
    order_delivered_at_utc,
    order_status,
    COALESCE(order_count, 0) as user_order_count
FROM {{ ref('stg_greenery_users') }} u
JOIN {{ ref('stg_greenery_orders') }} o on o.user_id = u.user_id
JOIN {{ ref('stg_greenery_addresses') }} ua on ua.address_id = u.user_address_id
LEFT JOIN user_order_count uc on uc.user_id = u.user_id
10 changes: 10 additions & 0 deletions greenery/models/models/marts/marketing/schema.yml
@@ -0,0 +1,10 @@
version: 2

models:
  - name: int_user_orders
    description: Order data structured around the users placing them
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
15 changes: 15 additions & 0 deletions greenery/models/models/marts/product/fct_product_conversion.sql
@@ -0,0 +1,15 @@

{%- set event_types = dbt_utils.get_column_values(
    table=ref('stg_greenery_events'),
    column='event_type'
) -%}

SELECT
    p.product_id
    {%- for event in event_types %}
    , {{ events(event) }} AS {{ event }}_counts
    {%- endfor %}

from {{ ref('stg_greenery_products') }} p
left join {{ ref('stg_greenery_events') }} e on e.product_id = p.product_id
group by 1
13 changes: 13 additions & 0 deletions greenery/models/models/marts/product/fct_sessions.sql
@@ -0,0 +1,13 @@
{{ config(materialized='table') }}

SELECT
    session_id,
    u.user_id,
    u.first_name,
    u.last_name,
    session_first_event,
    session_last_event,
    session_last_event - session_first_event as session_length,
    case when session_conversion_events > 0 then TRUE else FALSE end as session_converted
FROM {{ ref('int_session_events') }} s
LEFT JOIN {{ ref('stg_greenery_users') }} u on u.user_id = s.user_id
@@ -0,0 +1,19 @@
{{ config(materialized='table') }}

{%- set event_types = dbt_utils.get_column_values(
    table=ref('stg_greenery_events'),
    column='event_type'
) -%}

SELECT
    session_id,
    user_id,
    min(created_at_utc) as session_first_event,
    max(created_at_utc) as session_last_event,
    sum(case when order_id is not null then 1 else 0 end) AS session_conversion_events
    {%- for event in event_types %}
    , {{ events(event) }} AS {{ event }}_counts
    {%- endfor %}

FROM {{ ref('stg_greenery_events') }} e
GROUP BY 1, 2
17 changes: 17 additions & 0 deletions greenery/models/models/marts/product/schema.yml
@@ -0,0 +1,17 @@
version: 2

models:
  - name: int_session_events
    description: Basic details about the sessions associated with events
    columns:
      - name: session_id
        tests:
          - unique
          - not_null
  - name: fct_sessions
    description: Aggregated session performance data
    columns:
      - name: session_id
        tests:
          - unique
          - not_null
21 changes: 21 additions & 0 deletions greenery/models/models/schema.yml
@@ -0,0 +1,21 @@

version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null

  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null
68 changes: 68 additions & 0 deletions greenery/models/models/staging/README.md
@@ -0,0 +1,68 @@
How many users do we have?
A: 130
<!-- SQL: select count(distinct user_id) from dbt_charlie_m.tbl_users -->

On average, how many orders do we receive per hour?
A: 7.5
SQL:
<!--
with hour_orders as (
select
date_trunc('hour', created_at) as order_hour,
count(order_id) as order_count
from dbt_charlie_m.tbl_orders
group by 1
)
select avg(order_count)
from hour_orders
-->

On average, how long does an order take from being placed to being delivered?
A: 3 days, 21 hours
SQL:
<!-- with x as (
select
order_id,
date_trunc('hour', delivered_at_utc) - date_trunc('hour', created_at) as diff,
created_at

from dbt_charlie_m.tbl_orders
where status = 'delivered'
)
select avg(diff) from x -->

How many users have only made one purchase? Two purchases? Three+ purchases?
A: 1 order: 25; 2 orders: 28; 3+ orders: 71
SQL:
<!--
with user_order_count as (
select
user_id,
count(order_id) as order_count

from dbt_charlie_m.tbl_orders
group by 1
)

select
count(case when order_count = 1 then user_id else null end) as one_order,
count(case when order_count = 2 then user_id else null end) as two_orders,
count(case when order_count >= 3 then user_id else null end) as three_plus_orders
from user_order_count -->
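
The bucketing query above can be checked on a small hand-made dataset. The sketch below runs the same CTE and conditional counts against hypothetical rows in an in-memory SQLite table; the table name `orders` and the sample data are assumptions for illustration and do not reproduce the `dbt_charlie_m.tbl_orders` results.

```python
import sqlite3

# Hypothetical sample rows: (user_id, order_id). Not real greenery data.
SAMPLE_ORDERS = [
    ("u1", "o1"),
    ("u2", "o2"), ("u2", "o3"),
    ("u3", "o4"), ("u3", "o5"), ("u3", "o6"),
    ("u4", "o7"),
]

# Same shape as the README query: count orders per user, then bucket the users.
QUERY = """
WITH user_order_count AS (
    SELECT user_id, COUNT(order_id) AS order_count
    FROM orders
    GROUP BY user_id
)
SELECT
    COUNT(CASE WHEN order_count = 1 THEN user_id END) AS one_order,
    COUNT(CASE WHEN order_count = 2 THEN user_id END) AS two_orders,
    COUNT(CASE WHEN order_count >= 3 THEN user_id END) AS three_plus_orders
FROM user_order_count
"""


def purchase_buckets(rows):
    """Return (one_order, two_orders, three_plus_orders) for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id TEXT, order_id TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result


buckets = purchase_buckets(SAMPLE_ORDERS)
print(buckets)
```

Users with no orders never appear in the CTE, which is consistent with the three buckets in the answer summing to fewer than the total user count.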

On average, how many unique sessions do we have per hour?
A: 16.3
SQL:
<!--
with hour_sessions as (
select
date_trunc('hour', created_at_utc) as hour,
count(distinct session_id) as unique_sessions
from dbt_charlie_m.tbl_events
group by 1
)

select
avg(unique_sessions) as avg_session_count
from hour_sessions
-->
37 changes: 37 additions & 0 deletions greenery/models/models/staging/greenery_sources.yml
@@ -0,0 +1,37 @@
version: 2
sources:
  - name: greenery_src
    schema: public
    database: dbt
    quoting:
      database: false
      schema: false
      identifier: false
    freshness:
      warn_after: {count: 24, period: hour}
      error_after: {count: 48, period: hour}
    tables:
      - name: addresses
        description: >
          Contains delivery location information associated with greenery's users (GDPR)
      - name: events
        loaded_at_field: created_at
        description: >
          Contains information regarding greenery's users site events
      - name: orders
        loaded_at_field: created_at
        description: >
          Contains information regarding orders, deliveries and shipping
      - name: order_items
        description: >
          Contains list of items per order
      - name: products
        description: >
          Contains information regarding greenery's products
      - name: promos
        description: >
          Contains information regarding greenery's promotions
      - name: users
        loaded_at_field: created_at
        description: >
          Contains demographic information and contact details for greenery's users (GDPR)
20 changes: 20 additions & 0 deletions greenery/models/models/staging/stg_greenery_addresses.sql
@@ -0,0 +1,20 @@
{{ config(
    materialized='table'
) }}

with sources as (
    select * from {{ source('greenery_src', 'addresses') }} as addresses
)

, rename_recast as (
    SELECT
        address_id,
        address,
        zipcode,
        state,
        country

    FROM sources
)

select * from rename_recast