From dddc3753ff2ee7b4a4a05b73ccb8013472596e62 Mon Sep 17 00:00:00 2001
From: Andra Blaj
Date: Fri, 8 Mar 2024 16:08:54 +0000
Subject: [PATCH] Apply suggestions

---
 .../data/analytics/environment-variables.md   | 39 ++++++++++++++++++
 .../guides/data/analytics/introduction.md     |  9 ++---
 .../en/apps/guides/data/analytics/setup.md    | 40 ++-----------------
 3 files changed, 46 insertions(+), 42 deletions(-)
 create mode 100644 content/en/apps/guides/data/analytics/environment-variables.md

diff --git a/content/en/apps/guides/data/analytics/environment-variables.md b/content/en/apps/guides/data/analytics/environment-variables.md
new file mode 100644
index 000000000..678e48f0c
--- /dev/null
+++ b/content/en/apps/guides/data/analytics/environment-variables.md
@@ -0,0 +1,39 @@
+---
+title: "Environment Variables"
+weight: 3
+linkTitle: "Environment Variables"
+description: >
+  Environment variables for running CHT Sync
+---
+
+There are three environment variable groups in the `.env` file. To successfully set up CHT Sync, it is important to understand the difference between them.
+1. `POSTGRES_`: Used by PostgREST and PostgreSQL to establish the PostgreSQL database to synchronize CouchDB data to. They also define the schema and table names to store the CouchDB data. The main objective is to define the environment where the raw CouchDB data will be copied.
+2. `DBT_`: Exclusive to the DBT configuration. The main objective is to define the environment where the tables and views for the models defined in `CHT_PIPELINE_BRANCH_URL` will be created. It is important to separate this environment from the previous group. `DBT_POSTGRES_SCHEMA` must be different from `POSTGRES_SCHEMA`. `DBT_POSTGRES_HOST` has to be the Postgres instance created with the environment variables set in the first group.
+3. `COUCHDB_`: Used by CouchDB and Logstash to define the CouchDB instance to sync with. With `COUCHDB_DBS`, we can specify a list of databases to sync.
+
+All the variables in the `.env` file:
+
+| Name | Default | Description |
+|---------------------------|----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
+| `COMPOSE_PROJECT_NAME` | `pipeline` | (Optional) Docker Compose name |
+| `POSTGRES_USER` | `postgres` | Username of the PostgreSQL database to copy CouchDB data to |
+| `POSTGRES_PASSWORD` | `postgres` | Password of the PostgreSQL database to copy CouchDB data to |
+| `POSTGRES_DB` | `data` | PostgreSQL database where the CouchDB data is copied |
+| `POSTGRES_SCHEMA` | `v1` | PostgreSQL schema where the CouchDB data is copied |
+| `POSTGRES_TABLE` | `medic` | PostgreSQL table where the CouchDB data is copied. For `DBT` use only. |
+| `POSTGRES_HOST` | `localhost` | PostgreSQL instance to copy CouchDB data to. To be set only if the PostgreSQL instance is different than the container provided with CHT Sync. |
+| `DBT_POSTGRES_USER` | `postgres` | Username of the PostgreSQL database where `DBT` creates tables and views from the models in `CHT_PIPELINE_BRANCH_URL` |
+| `DBT_POSTGRES_PASSWORD` | `postgres` | Password of the PostgreSQL database where `DBT` creates tables and views from the models in `CHT_PIPELINE_BRANCH_URL` |
+| `DBT_POSTGRES_SCHEMA` | `dbt` | PostgreSQL schema where `DBT` creates tables and views from the models in `CHT_PIPELINE_BRANCH_URL` |
+| `DBT_POSTGRES_HOST` | `postgres` | PostgreSQL instance IP or endpoint |
+| `CHT_PIPELINE_BRANCH_URL` | `"https://github.com/medic/cht-pipeline.git#main"` | CHT Pipeline branch containing the `DBT` models |
+| `COUCHDB_USER` | `medic` | Username of the CouchDB instance to sync with |
+| `COUCHDB_PASSWORD` | `password` | Password of the CouchDB instance to sync with |
+| `COUCHDB_DBS` | `"medic"` | Space separated list of databases to sync e.g `"medic medic_sentinel"` |
+| `COUCHDB_HOST` | `couchdb` | Host of the CouchDB instance to sync with |
+| `COUCHDB_PORT` | `5984` | Port of the CouchDB instance to sync with |
+| `COUCHDB_SECURE` | `false` | Is connection to CouchDB instance secure? |
+
+{{% alert title="Note" %}}
+If `CHT_PIPELINE_BRANCH_URL` is pointing to a private GitHub repository, you'll need an access token in the URL. Assuming your repository is `medic/cht-pipeline`, you would replace `` with an access token: `https://@github.com/medic/cht-pipeline.git#main`. Please see [GitHub's instructions](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) on how to generate a token.
+{{% /alert %}}
diff --git a/content/en/apps/guides/data/analytics/introduction.md b/content/en/apps/guides/data/analytics/introduction.md
index 855513a90..b7b6b4a01 100644
--- a/content/en/apps/guides/data/analytics/introduction.md
+++ b/content/en/apps/guides/data/analytics/introduction.md
@@ -1,6 +1,6 @@
 ---
 title: "Introduction & Prerequisites to data synchronization and analytics"
-weight: 100
+weight: 1
 linkTitle: "Introduction & Prerequisites"
 description: >
   High level approach to data synchronization and analytics with CHT applications
@@ -10,7 +10,9 @@ relatedContent: >
 ---
 
 {{% pageinfo %}}
-The pages in this section apply to both CHT 3.x (beyond 3.12) and CHT 4.x.
+The pages in this section apply to both CHT 3.x (beyond 3.12) and CHT 4.x.
+
+[CHT Sync schema](https://github.com/medic/cht-sync/blob/main/postgres/init-dbt-resources.sh) differs from [CHT Couch2pg](https://github.com/medic/cht-couch2pg).
 {{% /pageinfo %}}
 
 Most CHT deployments require some sort of analytics so that stakeholders can make data driven decisions. CouchDB, which is the database used by the CHT, is not designed for analytics. It is a document database, which means that it is optimized for storing and retrieving documents, and not for aggregating data. For example, if you wanted to know how many patients were registered in a particular area, you would have to query the database for all the patients in that area, and then count them. This is not a very efficient process. It is much more efficient to store the number of patients in a particular area in a separate database, and update that number whenever a patient is registered or unregistered. This is what CHT Sync paired with CHT Pipeline is designed to do.
@@ -29,6 +31,3 @@ CHT Sync has been designed to work in both local development environments for te
 - [Node and npm](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm) (Node 18 LTS or newer)
 - [CHT Sync](https://github.com/medic/cht-sync) GitHub repository (can be cloned via `git clone https://github.com/medic/cht-sync`).
 
-{{% alert title="Note" %}}
-In order for CHT Sync to transform CouchDB data to PostgreSQL format, it needs to be linked to [CHT Pipeline](https://github.com/medic/cht-pipeline), which contains transformation models using `DBT`. [The schema](https://github.com/medic/cht-sync/blob/main/postgres/init-dbt-resources.sh) differs from [`couch2pg`](https://github.com/medic/couch2pg).
-{{% /alert %}}
diff --git a/content/en/apps/guides/data/analytics/setup.md b/content/en/apps/guides/data/analytics/setup.md
index 2bf1fbc62..85fc76b7f 100644
--- a/content/en/apps/guides/data/analytics/setup.md
+++ b/content/en/apps/guides/data/analytics/setup.md
@@ -1,6 +1,6 @@
 ---
 title: "Local CHT Sync Setup"
-weight: 100
+weight: 2
 linkTitle: "Local CHT Sync Setup"
 description: >
   Setting up a local deployment of CHT Sync with the CHT
@@ -13,43 +13,9 @@ Before setting up CHT Sync in production, it's very handy to be able to run it l
 
 These instructions assume you're running CHT Sync, CHT Core and PostgreSQL either locally on your workstation or on a local server. They are not meant to be used to deploy a secure, always on production instance.
 
-#### Environment variables
-
-There are three environment variable groups in the `.env` file. To successfully set up CHT Sync, it is important to understand the difference between them.
-1. `POSTGRES_`: Used by PostgREST and PostgreSQL to establish the PostgreSQL database to synchronize CouchDB data to. They also define the schema and table names to store the CouchDB data. The main objective is to define the environment where the raw CouchDB data will be copied.
-2. `DBT_`: Exclusive to the DBT configuration. The main objective is to define the environment where the tables and views for the models defined in `CHT_PIPELINE_BRANCH_URL` will be created. It is important to separate this environment from the previous group. `DBT_POSTGRES_SCHEMA` must be different from `POSTGRES_SCHEMA`. `DBT_POSTGRES_HOST` has to be the Postgres instance created with the environment variables set in the first group.
-3. `COUCHDB_`: Used by CouchDB and Logstash to define the CouchDB instance to sync with. With `COUCHDB_DBS`, we can specify a list of databases to sync.
-
-All the variables in the `.env` file:
-
-| Name | Default | Description |
-|---------------------------|----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
-| `COMPOSE_PROJECT_NAME` | `pipeline` | (Optional) Docker Compose name |
-| `POSTGRES_USER` | `postgres` | Username of the PostgreSQL database to copy CouchDB data to |
-| `POSTGRES_PASSWORD` | `postgres` | Password of the PostgreSQL database to copy CouchDB data to |
-| `POSTGRES_DB` | `data` | PostgreSQL database where the CouchDB data is copied |
-| `POSTGRES_SCHEMA` | `v1` | PostgreSQL schema where the CouchDB data is copied |
-| `POSTGRES_TABLE` | `medic` | PostgreSQL table where the CouchDB data is copied. For `DBT` use only. |
-| `POSTGRES_HOST` | `localhost` | PostgreSQL instance to copy CouchDB data to. To be set only if the PostgreSQL instance is different than the container provided with CHT Sync. |
-| `DBT_POSTGRES_USER` | `postgres` | Username of the PostgreSQL database where `DBT` creates tables and views from the models in `CHT_PIPELINE_BRANCH_URL` |
-| `DBT_POSTGRES_PASSWORD` | `postgres` | Password of the PostgreSQL database where `DBT` creates tables and views from the models in `CHT_PIPELINE_BRANCH_URL` |
-| `DBT_POSTGRES_SCHEMA` | `dbt` | PostgreSQL schema where `DBT` creates tables and views from the models in `CHT_PIPELINE_BRANCH_URL` |
-| `DBT_POSTGRES_HOST` | `postgres` | PostgreSQL instance IP or endpoint |
-| `CHT_PIPELINE_BRANCH_URL` | `"https://github.com/medic/cht-pipeline.git#main"` | CHT Pipeline branch containing the `DBT` models |
-| `COUCHDB_USER` | `medic` | Username of the CouchDB instance to sync with |
-| `COUCHDB_PASSWORD` | `password` | Password of the CouchDB instance to sync with |
-| `COUCHDB_DBS` | `"medic"` | Space separated list of databases to sync e.g `"medic medic_sentinel"` |
-| `COUCHDB_HOST` | `couchdb` | Host of the CouchDB instance to sync with |
-| `COUCHDB_PORT` | `5984` | Port of the CouchDB instance to sync with |
-| `COUCHDB_SECURE` | `false` | Is connection to CouchDB instance secure? |
-
-{{% alert title="Note" %}}
-If `CHT_PIPELINE_BRANCH_URL` is pointing to a private GitHub repository, you'll need an access token in the URL. Assuming your repository is `medic/cht-pipeline`, you would replace `` with an access token: `https://@github.com/medic/cht-pipeline.git#main`. Please see [GitHub's instructions](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) on how to generate a token.
-{{% /alert %}}
-
 ## Setup
 
-Copy the values in `env.template` file to the `.env` file and update them accordingly to the local configuration for the different scenarios below.
+Copy the values in `env.template` file to the `.env` file. For more information, see the references on the [Environment variables page]({{< relref "apps/guides/data/analytics/environment-variables" >}}).
 
 Install the dependencies:
 ```sh
@@ -70,7 +36,7 @@ docker-compose -f docker-compose.couchdb.yml -f docker-compose.postgres.yml -f d
 
 You can verify this command worked by running `docker ps`. It should show 6 containers running including Logstash, DBT, data generator, PostgreSQL, CouchDB and PostgREST (note the `t` at the end!).
 
-Now that all services are running, use a PostgreSQL client like [pgAdmin](https://www.pgadmin.org/) to connect to server `localhost:5432` with user `postgres` and password `postgres`. You should be able to see sample data being inserted into the `v1.medic` table.
+Now that all services are running, use a PostgreSQL client like [pgAdmin](https://www.pgadmin.org/) or [Beekeeper](https://www.beekeeperstudio.io/) to connect to server `localhost:5432` with user `postgres` and password `postgres`. You should be able to see sample data being inserted into the `v1.medic` table.
 
 ### Separate CouchDB instance
 This setup involves starting Logstash, PostgreSQL, PostgREST, and DBT. It assumes you have a CouchDB instance running, and you updated the `.env` CouchDB variables accordingly.
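
The new environment variables page states that `DBT_POSTGRES_SCHEMA` must be different from `POSTGRES_SCHEMA`, and that `COUCHDB_DBS` is a space separated list. One way to catch a misconfigured `.env` before starting the containers might look like the sketch below. It is not part of CHT Sync: `parse_env` and `check_schemas` are hypothetical helper names, the parser assumes plain `KEY=VALUE` lines with optional double quotes, and the sample values are simply the defaults from the table in this patch.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Assumption: values are unquoted or wrapped in double quotes, as in the
    defaults listed in the documentation table.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def check_schemas(env):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    raw_schema = env.get("POSTGRES_SCHEMA")
    dbt_schema = env.get("DBT_POSTGRES_SCHEMA")
    if raw_schema is None or dbt_schema is None:
        problems.append("POSTGRES_SCHEMA and DBT_POSTGRES_SCHEMA must both be set")
    elif raw_schema == dbt_schema:
        problems.append("DBT_POSTGRES_SCHEMA must be different from POSTGRES_SCHEMA")
    return problems


# Sample content using the documented defaults (illustrative only).
SAMPLE = """\
POSTGRES_SCHEMA=v1
DBT_POSTGRES_SCHEMA=dbt
COUCHDB_DBS="medic medic_sentinel"
"""

if __name__ == "__main__":
    env = parse_env(SAMPLE)
    print(check_schemas(env))
    # COUCHDB_DBS is space separated, so split() yields one name per database.
    print(env["COUCHDB_DBS"].split())
```

To check a real file, the same functions could be fed the contents of `.env` read with `open(".env").read()` instead of `SAMPLE`.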