docs: added psql exmaple in getting started
MrunmayS committed Feb 8, 2024
1 parent 05237e6 commit e7327cd
Showing 3 changed files with 43 additions and 18 deletions.
2 changes: 1 addition & 1 deletion docs/configuration.md
@@ -12,7 +12,7 @@ Dozer relies on a YAML configuration structure delineated in `dozer-config.yaml`
|-------------|-----------------------------------------------------------------------------------------------------------------|
| [Connections and Sources](configuration/data-sources) | Details the various databases, data warehouses, and other connection types, along with their tables. |
| [Transformations](configuration/transformations) | Describes the transformations applied to the sourced data. |
| [Endpoints](configuration/endpoints) | Establishes endpoints, determining how sinks are configured |
| [Sinks](configuration/endpoints) | Establishes endpoints, determining how sinks are configured |
| [Global Parameters and Flags](configuration/flags) | Enables or disables specific options or features |


57 changes: 41 additions & 16 deletions docs/getting_started/core/adding-transformations.mdx
@@ -6,30 +6,55 @@ import TabItem from '@theme/TabItem';
We now want to join the two datasets and perform an aggregation to determine the average fare of trips between zones. To do this, we will add a `sql` section to `dozer-config.yaml` that looks like this:

```yaml
sql: |
SELECT
PULocationID, DOLocationID,
pu_zones.Zone as PULocationName,
do_zones.Zone as DOLocationName,
AVG(fare_amount) as avg_amount
INTO avg_fares
FROM trips
INNER JOIN zones pu_zones ON trips.PULocationID = pu_zones.LocationID
INNER JOIN zones do_zones ON trips.DOLocationID = do_zones.LocationID
GROUP BY PULocationID, DOLocationID;
sources:
- name: actors
table_name: actor
connection: pagila_conn
columns:
- actor_id
- first_name
- last_name
- name: films
table_name: film
connection: pagila_conn
columns:
- film_id
- title
- rental_rate
- name: film_actors
table_name: film_actor
connection: pagila_conn
columns:
- actor_id
- film_id

sql: |
SELECT a.first_name AS actor_first_name,
a.last_name AS actor_last_name,
f.title AS film_title,
f.rental_rate
into actor_films
FROM actors a
JOIN film_actors fa ON a.actor_id = fa.actor_id
JOIN films f ON fa.film_id = f.film_id
WHERE f.rental_rate > 3;
```
::::note
The SQL you specify in the .yaml file does not run in the source database. Data is processed in real-time as it comes into Dozer by Dozer's internal streaming SQL engine.
::::
Here we used `actors`, `films` and `film_actors` as sources and joined them to create a new table `actor_films`. We also filtered the data to only include films with a rental rate greater than 3.
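To see what the join and filter produce, the same query can be tried against a few rows in an in-memory SQLite database. This is only a sketch for checking the query logic: the sample rows are made up for illustration, SQLite stands in for Dozer's streaming SQL engine, and the `INTO actor_films` clause is left out because SQLite does not support it in a `SELECT`.

```python
import sqlite3

# In-memory database standing in for the three sources (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE actors (actor_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE films (film_id INTEGER, title TEXT, rental_rate REAL);
    CREATE TABLE film_actors (actor_id INTEGER, film_id INTEGER);

    INSERT INTO actors VALUES (1, 'PENELOPE', 'GUINESS'), (2, 'NICK', 'WAHLBERG');
    INSERT INTO films VALUES (10, 'ACADEMY DINOSAUR', 0.99), (20, 'ACE GOLDFINGER', 4.99);
    INSERT INTO film_actors VALUES (1, 10), (1, 20), (2, 20);
    """
)

# Same SELECT as in the config, minus the INTO clause; ORDER BY is added
# only to make the output order deterministic for this demo.
rows = conn.execute(
    """
    SELECT a.first_name AS actor_first_name,
           a.last_name AS actor_last_name,
           f.title AS film_title,
           f.rental_rate
    FROM actors a
    JOIN film_actors fa ON a.actor_id = fa.actor_id
    JOIN films f ON fa.film_id = f.film_id
    WHERE f.rental_rate > 3
    ORDER BY a.actor_id
    """
).fetchall()

for row in rows:
    print(row)
```

Only the film with a rental rate above 3 survives the filter, so each actor linked to it appears once in the result; the cheaper film is dropped even though it has a matching actor.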

To expose the result of this query as an API we will also need to add an additional endpoint:
This new table is then replicated into the sink database.
```yaml
endpoints:
- table_name: avg_fares
sinks:
- table_name: actor_films
config: !Dummy
```


Now restart `dozer`, to re-trigger the ingestion from the CSV file and Supabase concurrently, and execute the above query in real-time.
Now restart `dozer` to re-trigger the snapshotting and replication process.

```bash
dozer run
```

2 changes: 1 addition & 1 deletion docs/getting_started/core/connecting-to-destinations.mdx
@@ -1,7 +1,7 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Connecting to the destination databases
# Connecting to the Sink Database

The following sections describe how to connect to the destination databases. Previously, we were transferring data to a demo sink, but now we will transfer data to a real sink.
