Merge pull request #36 from Health-Informatics-UoN/fix/35/bunny-cli-poetry

Refine Bunny quickstart
Health-Informatics-UoN · Jan 22, 2025 · 47057d7
2 parents 9585e4f + 1a745e2
commit 47057d7
Showing 10 changed files with 283 additions and 192 deletions.
diff --git a/website/pages/bunny.mdx b/website/pages/bunny.mdx
@@ -6,12 +6,13 @@ Bunny is an application for fetching cohort discovery queries and resolving them
 - Bunny can request queries from any compatible upstream Task API, such as the HDR Cohort Discovery tool, or as part of a federated network through [Hutch Relay](/relay).
 - Bunny enables [obfuscation](/bunny/config#obfuscation) of query results, to simplify data governance issues. 
 - Bunny container images are available for ease of [deployment](/bunny/deployment) in your environment.
+- Bunny currently supports PostgreSQL and SQL Server as OMOP CDM databases.
 - Bunny can also be run solely as a query executor, through its [command line interface](/bunny/quickstart#running-bunny-cli).
 
 The code for Bunny is open source, licensed under MIT, and can be found on [GitHub](https://github.com/Health-Informatics-UoN/hutch-bunny).
 
 ---
 
-![A federated deployment of Relay and Bunny](/images/relay.png)
-_A federated deployment of Relay and multiple Bunnies_.
+![A deployment of one Bunny directly to the Gateway](/images/bunny.png)
+_A deployment of one Bunny directly to the Gateway_.
 
diff --git a/website/pages/bunny/_meta.js b/website/pages/bunny/_meta.js
@@ -2,6 +2,6 @@ export default {
   quickstart: "Quickstart",
   config: "Configuration",
   deployment: "Deployment",
-  dev_setup: "Setting up a development environment",
-  core_api_ref: "core API reference",
+  core_api_ref: "Core API reference",
+  developers: "Developers"
 };
diff --git a/website/pages/bunny/config.mdx b/website/pages/bunny/config.mdx
@@ -78,6 +78,18 @@ USE_TRINO=
 If using the supplied `compose.yml` and running the database in the same stack, unless you modify the database configuration, the defaults will connect Bunny to your database.
 If using a remote database, change accordingly.
 
+### DATASOURCE_DB_DRIVERNAME
+
+Defines the database driver to use. Currently supports PostgreSQL and SQL Server.
+
+Valid values:
+- `postgresql` -> PostgreSQL
+- `mssql` -> SQL Server
+
+```yaml
+DATASOURCE_DB_DRIVERNAME=postgresql
+```
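
The two valid values listed above can be checked before Bunny starts. The following is a minimal sketch only: `SUPPORTED_DRIVERS` and `check_drivername` are hypothetical names for illustration and are not taken from Bunny's code, and it assumes the value is matched exactly as written.

```python
# Hypothetical helper for illustration; not part of Bunny's code base.
# The mapping mirrors the valid values documented above.
SUPPORTED_DRIVERS = {
    "postgresql": "PostgreSQL",
    "mssql": "SQL Server",
}


def check_drivername(value: str) -> str:
    """Return the database product for a supported driver name, or raise ValueError."""
    if value not in SUPPORTED_DRIVERS:
        allowed = ", ".join(sorted(SUPPORTED_DRIVERS))
        raise ValueError(
            f"Unsupported DATASOURCE_DB_DRIVERNAME {value!r}; expected one of: {allowed}"
        )
    return SUPPORTED_DRIVERS[value]
```

With this check, a shortened value such as `postgres` is rejected rather than silently accepted.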
+
 ## Obfuscation
 
 Two methods of result obfuscation are provided as below. Before building your Hutch tools stack, these can be configured in `compose.yml`.

diff --git a/website/pages/bunny/deployment.mdx b/website/pages/bunny/deployment.mdx
@@ -3,7 +3,7 @@ import { Steps } from "nextra/components";
 
 # Bunny Deployment
 
-This page will guide you through getting Bunny deployed in a Virtual Machine (VM).
+This page will guide you through getting Bunny deployed in a Virtual Machine (VM) or locally on your machine.
 
 ## Prerequisites
 

diff --git a/website/pages/bunny/developers/_meta.js b/website/pages/bunny/developers/_meta.js
@@ -0,0 +1,4 @@
+export default {
+  quickstart: "Quickstart",
+  dev_setup: "Setting up a development environment",
+};
diff --git a/website/pages/bunny/dev_setup.mdx → website/pages/bunny/developers/dev_setup.mdx b/website/pages/bunny/dev_setup.mdx → website/pages/bunny/developers/dev_setup.mdx
@@ -6,15 +6,10 @@ If the bunny-daemon logs messages saying that it's setting up a database connect
 To double check, you can also use the bunny-cli to generate an output JSON.
 
 ### Using the environment
-Bunny uses poetry to manage its dependencies.
+Bunny uses uv to manage its dependencies.
 To test any changes made to Bunny in development, run:
 ```bash
-poetry run bunny-[daemon/cli]
-```
-or start a poetry shell
-```bash
-poetry shell
-bunny-[daemon/cli]
+uv run bunny-[daemon]
 ```
 
 ## Architecture

diff --git a/website/pages/bunny/developers/quickstart.mdx b/website/pages/bunny/developers/quickstart.mdx
@@ -0,0 +1,215 @@
+import { Callout, Tabs, Table, Td, Th, Tr, Code } from "nextra/components";
+import { Steps } from "nextra/components";
+
+# Getting started with Bunny
+
+This page will guide you through getting Bunny running locally.
+
+Start by cloning [the Bunny repository](https://github.com/Health-Informatics-UoN/hutch-bunny).
+
+## Prerequisites
+
+- Bunny runs on Python version 3.13
+- Dependencies are managed with [uv](https://github.com/astral-sh/uv), which needs to be installed if you run Bunny outside a container.
+- Bunny needs a database to query, either:
+  - A running remote OMOP-CDM database
+  - Or a tarball containing a `pg_dump` of an OMOP-CDM database
+
+## OMOP-CDM setup
+
+Before Bunny can get up and running, it needs a database to query.
+If you don't have a real OMOP-CDM, you can run a mock database.
+
+### Mock database setup
+
+<Steps>
+#### Start a containerized database
+
+The `compose.yml` in the root of the Hutch tools repository can start a database container with:
+
+```bash copy
+docker compose up db
+```
+
+This will initialise a Postgres instance in the container.
+
+<Callout type="info">
+  These instructions assume you have a `pg_dump` archive of an existing OMOP CDM
+  PostgreSQL database. The steps below call it `hutch_db.tar`.
+</Callout>
+
+#### Copy the data
+
+Navigate to the folder containing `hutch_db.tar` and copy it into the container with:
+
+```bash copy
+docker cp hutch_db.tar hutch-bunny-dev-db-1:/
+```
+
+#### Open a bash shell in the container
+
+```bash copy
+docker exec -it hutch-bunny-dev-db-1 bash
+```
+
+#### Use pg_restore to load the data into the database
+
+```bash copy
+pg_restore --dbname=postgres --host=localhost --port=5432 --username=postgres --password hutch_db.tar
+```
+
+If prompted, provide "postgres" as the password.
+
+You can then exit the container with `ctrl+d` or `exit`.
+
+</Steps>
+
+## Environment configuration
+
+To run Bunny locally, your environment needs to have some variables configured.
+
+<Steps>
+### Create a file called .env in app/bunny
+The file should contain the variables required to connect to a database and Relay.
+
+For example, if using the containerized Relay and mock database:
+
+```sh copy
+DATASOURCE_DB_USERNAME=postgres
+DATASOURCE_DB_PASSWORD=postgres
+DATASOURCE_DB_DATABASE=postgres
+DATASOURCE_DB_DRIVERNAME=postgresql
+DATASOURCE_DB_SCHEMA=public
+DATASOURCE_DB_PORT=5432
+DATASOURCE_DB_HOST=localhost
+TASK_API_BASE_URL=http://localhost:8080
+TASK_API_USERNAME=username
+TASK_API_PASSWORD=password
+LOW_NUMBER_SUPPRESSION_THRESHOLD=
+ROUNDING_TARGET=
+POLLING_INTERVAL=5
+```
+
+If you are querying a remote database, set the variables prefixed with `DATASOURCE` to match it.
+If you use another method to set your environment variables, set the same variables shown in the example above.
+
+</Steps>
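
To illustrate how the `DATASOURCE_DB_*` variables fit together, here is a minimal sketch that parses `.env`-style text and assembles a connection URL in the common `driver://user:password@host:port/database` form. Both `parse_env` and `database_url` are hypothetical helpers, and the URL layout is an assumption: this page does not show how Bunny itself builds its connection.

```python
from urllib.parse import quote


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def database_url(env: dict[str, str]) -> str:
    """Assemble a connection URL from DATASOURCE_DB_* values (assumed layout)."""
    fields = ["DRIVERNAME", "USERNAME", "PASSWORD", "HOST", "PORT", "DATABASE"]
    missing = [f"DATASOURCE_DB_{f}" for f in fields if not env.get(f"DATASOURCE_DB_{f}")]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")

    def get(field: str) -> str:
        return env[f"DATASOURCE_DB_{field}"]

    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    user = quote(get("USERNAME"), safe="")
    password = quote(get("PASSWORD"), safe="")
    return f"{get('DRIVERNAME')}://{user}:{password}@{get('HOST')}:{get('PORT')}/{get('DATABASE')}"
```

Given the example values above, `database_url` returns `postgresql://postgres:postgres@localhost:5432/postgres`. A missing or empty setting raises `KeyError` naming the variable, which makes a misconfigured `.env` easier to spot.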
+## Installing dependencies
+The first time you run Bunny outside a container, you will need to install its dependencies by running:
+```bash copy
+uv pip install .
+```
+
+## Running the Bunny daemon
+
+Bunny has a daemon which polls [Relay](/relay) for jobs, so it needs a running Relay instance.
+
+<Steps>
+### Start the Relay container
+
+The `compose.yml` used to start the database also contains a Relay service.
+
+```bash copy
+docker compose up relay -d
+```
+
+### Run the Bunny daemon
+
+The Bunny daemon can then be run using uv. This will ensure the dependencies and environment variables are loaded.
+
+```bash copy
+uv run bunny-daemon
+```
+
+You should then see a message in your console like this:
+
+```bash
+INFO - 12-Nov-24 12:36:24 - Setting up database connection...
+INFO - 12-Nov-24 12:36:24 - Looking for job...
+INFO - 12-Nov-24 12:36:29 - Job received. Resolving...
+INFO - 12-Nov-24 12:36:29 - Processing query...
+INFO - 12-Nov-24 12:36:30 - Solved availability query
+INFO - 12-Nov-24 12:36:30 - Job resolved.
+INFO - 12-Nov-24 12:36:35 - Looking for job...
+INFO - 12-Nov-24 12:36:40 - Looking for job...
+INFO - 12-Nov-24 12:36:45 - Looking for job...
+```
+
+Bunny establishes a connection to your OMOP-CDM database, then polls Relay for a job.
+When it receives a job, it processes the query, queries the database, and sends the result back to Relay.
+The results are not printed here, but these messages confirm that Bunny is reaching both the database and Relay.
+
+</Steps>
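
The poll, resolve, and reply cycle described above can be sketched as a simple loop. This is an illustration under stated assumptions, not Bunny's implementation: `poll` and its callable parameters (`fetch_job`, `resolve`, `send_result`) are hypothetical stand-ins for the Relay and database calls, and `interval` plays the role of `POLLING_INTERVAL`.

```python
import time
from typing import Callable, Optional


def poll(
    fetch_job: Callable[[], Optional[dict]],
    resolve: Callable[[dict], dict],
    send_result: Callable[[dict], None],
    interval: float = 5.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for jobs at a fixed interval; return the number of jobs resolved.

    All callables are hypothetical placeholders for the upstream API and the
    database query. max_polls=None loops forever, like a daemon.
    """
    resolved = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        job = fetch_job()  # "Looking for job..."
        if job is not None:
            send_result(resolve(job))  # "Job received. Resolving..."
            resolved += 1
        sleep(interval)
    return resolved
```

Passing the network and database steps in as callables keeps the loop testable without a running Relay, and `max_polls` gives a way to stop it in tests.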
+
+<Callout emoji="🎉">Congratulations on your first Bunny query!</Callout>
+
+## Running bunny-cli
+
+To run Bunny without Relay, the command-line interface can be used.
+The CLI takes a JSON input file that follows the query schema (example below).
+
+### How to run
+
+<Tabs items={['Docker', 'From Source']}>
+  <Tabs.Tab>
+    You can use the CLI from the same image as the daemon, by overriding the `entrypoint`.
+
+    You will need to have made the input file available to the docker container, for example by mounting a volume.
+
+    Pass Docker arguments to `docker run` **before** the image name, and Bunny CLI arguments **after** it.
+
+    Here's an example with `docker run`:
+
+    ```bash copy
+    docker run \
+    -v <path/to/rquest-query.json>:/rquest-query.json \
+    --entrypoint uv \
+    ghcr.io/health-informatics-uon/hutch/bunny:<TAG> \
+    run bunny --body /rquest-query.json
+    ```
+
+  </Tabs.Tab>
+  <Tabs.Tab>
+    To run the CLI, navigate to `hutch-cohort-discovery/app/bunny`.
+
+    Then run:
+
+    ```bash copy
+    uv run bunny --body <path/to/rquest-query.json>
+    ```
+
+    This should write a result for your query to `app/bunny/output.json`.
+
+  </Tabs.Tab>
+</Tabs>
+
+### Sample input files
+
+```json copy filename="availability.json"
+{
+  "task_id": "job-2023-01-13-14: 20: 38-project",
+  "project": "project_id",
+  "owner": "user1",
+  "cohort": {
+    "groups": [
+      {
+        "rules": [
+          {
+            "varname": "OMOP",
+            "varcat": "Person",
+            "type": "TEXT",
+            "oper": "=",
+            "value": "8507"
+          }
+        ],
+        "rules_oper": "AND"
+      }
+    ],
+    "groups_oper": "OR"
+  },
+  "collection": "collection_id",
+  "protocol_version": "v2",
+  "char_salt": "salt",
+  "uuid": "unique_id"
+}
+```
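
Before passing a file to the CLI, it can help to confirm that it parses and has the nesting shown above. The sketch below is a minimal, hypothetical helper (`describe_query` is not part of Bunny); it assumes only the structure visible in the sample, with `cohort.groups[].rules[]` holding `varcat`, `varname`, `oper`, and `value`.

```python
import json


def describe_query(text: str) -> list[str]:
    """Return one summary line per rule in an availability query.

    Raises json.JSONDecodeError for invalid JSON and KeyError when an
    expected field from the sample layout is missing.
    """
    query = json.loads(text)
    lines = []
    for group in query["cohort"]["groups"]:
        for rule in group["rules"]:
            lines.append(
                f"{rule['varcat']} {rule['varname']} {rule['oper']} {rule['value']}"
            )
    return lines
```

For the sample file above, this returns `["Person OMOP = 8507"]`.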