Make clean_stale_db_objects configurable
Also update local development instructions in README
hancush committed Sep 11, 2024
1 parent 45dfc9b commit c39df14
Showing 3 changed files with 36 additions and 33 deletions.
37 changes: 12 additions & 25 deletions README.md
@@ -12,38 +12,25 @@ An Airflow-based dashboard for the LA Metro ETL pipeline!

 Perform the following steps from your terminal.

-1. Clone this repository and its submodule, then `cd` into the superproject.
+1. Clone [the LA Metro Councilmatic repository](https://github.com/Metro-Records/la-metro-councilmatic) and follow the instructions in its
+README to build and run the application.

-```bash
-git clone --recursive https://github.com/Metro-Records/la-metro-dashboard.git
-cd la-metro-dashboard
-```
-2. Build `la-metro-dashboard` application, and create a local `.env` file. Fill
-in the absolute location of your GPG keyring, usually the absolute path for `~/.gnupg`.
+2. Clone this repository and create a local `.env` file.

-```bash
-docker-compose build
-cp .env.example .env
-# Fill in the correct value for GPG_KEYRING_PATH
-```
-
-3. Once the command exits, follow the instructions to build the [LA Metro Councilmatic application](https://github.com/Metro-Records/la-metro-councilmatic#setup)
+```bash
+cp .env.example .env
+```

-4. In order to run the `la-metro-dashboard` application, the `la-metro-councilmatic`
-app must already be running. Open a new shell, move into the `la-metro-councilmatic`
-application, and run it.
+Fill in the absolute location of your GPG keyring, usually the absolute path for `~/.gnupg`.

-```bash
-cd la-metro-councilmatic && docker-compose up app
-```

-Once la-metro-councilmatic is running, in your first shell, run the la-metro-dashboard application.
+3. Build and run the dashboard:

-```bash
-docker-compose up
-```
+```bash
+docker-compose up
+```

-5. Finally, to visit the dashboard app, go to http://localhost:8080/admin/. The
+4. Finally, to visit the dashboard app, go to http://localhost:8080/admin/. The
 Councilmatic app runs on http://localhost:8001/.

 See the Airflow documentation for more on [navigating the UI](https://airflow.apache.org/docs/stable/ui.html)
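
Step 2 of the updated README asks for the absolute location of the GPG keyring. As a minimal sketch (not part of this commit), the snippet below expands `~/.gnupg` to an absolute path and formats it as a `.env` assignment. The variable name `GPG_KEYRING_PATH` is taken from the comment in the previous README; the helper name `env_line` and the default directory are assumptions for illustration, so check `.env.example` for the name the project actually expects.

```python
from pathlib import Path


def env_line(keyring_dir: str = "~/.gnupg") -> str:
    """Return a .env assignment with the keyring directory as an absolute path.

    `env_line` is a hypothetical helper; GPG_KEYRING_PATH is the variable
    name mentioned in the earlier README.
    """
    # expanduser() replaces a leading "~"; resolve() makes the path absolute.
    absolute = Path(keyring_dir).expanduser().resolve()
    return f"GPG_KEYRING_PATH={absolute}"


if __name__ == "__main__":
    # Paste the printed line into your local .env file.
    print(env_line())
```

The printed line can be copied into `.env` in place of the placeholder value.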
31 changes: 23 additions & 8 deletions dags/clean_stale_db_objects.py
@@ -1,14 +1,14 @@
 from datetime import timedelta

-from airflow import DAG
+from airflow.decorators import dag, task

 from constants import (
     LA_METRO_DATABASE_URL,
     LA_METRO_SEARCH_URL,
     LA_METRO_DOCKER_IMAGE_TAG,
     LA_METRO_STAGING_DATABASE_URL,
     START_DATE,
-    LA_SCRAPERS_IMAGE_URL
+    LA_SCRAPERS_IMAGE_URL,
 )
 from operators.blackbox_docker_operator import BlackboxDockerOperator

@@ -41,16 +41,31 @@
     },
 }

-with DAG(
-    "clean_stale_db_objects",
-    default_args=default_args,
+
+@dag(
     schedule_interval="0 0 * * 0",
     description="Deletes objects from the database that have not"
     "been seen in a recent scrape",
-) as dag:
+    default_args=default_args,
+    params={"window": 7, "max": 25, "report": False},
+)
+def clean_stale_db_objects(window=7, max=25, report=False):
+    @task
+    def get_flags(**kwargs):
+        if kwargs["params"]["report"]:
+            return "--report"
+        else:
+            return f"--window={kwargs['params']['window']} --max={kwargs['params']['max']} --yes"

-    BlackboxDockerOperator(
+    flags = get_flags()
+
+    pupa_clean = BlackboxDockerOperator(
         task_id="clean_stale_db_objects",
         environment=docker_base_environment,
-        command="pupa clean --noinput",
+        command=f"pupa clean {flags}",
     )
+
+    flags >> pupa_clean
+
+
+clean_stale_db_objects()
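
The branch inside `get_flags` can be exercised without Airflow. The sketch below reproduces only the string-building logic from the diff as a plain function, leaving out the task and XCom machinery; `build_clean_flags` and `DEFAULT_PARAMS` are hypothetical names introduced here, and the flags are passed through to `pupa clean` unchanged.

```python
def build_clean_flags(params: dict) -> str:
    """Mirror the branch in get_flags: report only, or a bounded clean.

    `build_clean_flags` is an illustrative stand-in, not a function in the
    repository.
    """
    if params["report"]:
        return "--report"
    return f"--window={params['window']} --max={params['max']} --yes"


# Defaults declared via `params=` on the DAG in this commit.
DEFAULT_PARAMS = {"window": 7, "max": 25, "report": False}

if __name__ == "__main__":
    # Scheduled run with the defaults.
    print("pupa clean " + build_clean_flags(DEFAULT_PARAMS))
    # A run whose params set report=True ignores window and max.
    print("pupa clean " + build_clean_flags({**DEFAULT_PARAMS, "report": True}))
```

With the defaults the command becomes `pupa clean --window=7 --max=25 --yes`; setting `report` to `True` yields `pupa clean --report`. Separately, the two adjacent string literals in `description` are concatenated by Python without a separating space, so the rendered description reads "have notbeen seen"; a trailing space after "not" would fix that.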
1 change: 1 addition & 0 deletions docker-compose.yml
@@ -82,3 +82,4 @@ volumes:
 networks:
   app_net:
     name: la-metro-councilmatic_default
+    external: true
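
With `external: true`, Compose does not create `la-metro-councilmatic_default` itself; it expects the network to exist already, which matches the README's new first step of running the Councilmatic application. A hedged sketch of a pre-flight check, assuming the `docker` CLI is on the `PATH`. The helper `network_exists` and its injectable `run` argument are hypothetical and not part of the repository.

```python
import subprocess


def network_exists(name: str, run=subprocess.run) -> bool:
    """Return True if `docker network inspect <name>` exits with status 0.

    `run` defaults to subprocess.run and can be swapped for a stub in tests.
    """
    result = run(
        ["docker", "network", "inspect", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a missing network is reported via the return code
    )
    return result.returncode == 0
```

Calling `network_exists("la-metro-councilmatic_default")` before `docker-compose up` gives a clearer signal than the Compose error when the Councilmatic stack has not been started yet.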
