Merge pull request #36 from Health-Informatics-UoN/fix/35/bunny-cli-poetry

Refine Bunny quickstart
AndyRae authored Jan 22, 2025
2 parents 9585e4f + 1a745e2 commit 47057d7
Showing 10 changed files with 283 additions and 192 deletions.
5 changes: 3 additions & 2 deletions website/pages/bunny.mdx
@@ -6,12 +6,13 @@ Bunny is an application for fetching cohort discovery queries and resolving them
- Bunny can request queries from any compatible upstream Task API, such as the HDR Cohort Discovery tool, or as part of a federated network through [Hutch Relay](/relay).
- Bunny enables [obfuscation](/bunny/config#obfuscation) of query results, to simplify data governance issues.
- Bunny container images are available for ease of [deployment](/bunny/deployment) in your environment.
- Bunny currently supports PostgreSQL and SQL Server as OMOP CDM databases.
- Bunny can also be run as a standalone query executor, through its [command line interface](/bunny/quickstart#running-bunny-cli).

The code for Bunny is open source and licensed under MIT, and can be found on [Github](https://github.com/Health-Informatics-UoN/hutch-bunny).

---

![A federated deployment of Relay and Bunny](/images/relay.png)
_A federated deployment of Relay and multiple Bunnies_.
![A deployment of one Bunny directly to the Gateway](/images/bunny.png)
_A deployment of one Bunny directly to the Gateway_.

4 changes: 2 additions & 2 deletions website/pages/bunny/_meta.js
@@ -2,6 +2,6 @@ export default {
quickstart: "Quickstart",
config: "Configuration",
deployment: "Deployment",
dev_setup: "Setting up a development environment",
core_api_ref: "core API reference",
core_api_ref: "Core API reference",
developers: "Developers"
};
12 changes: 12 additions & 0 deletions website/pages/bunny/config.mdx
@@ -78,6 +78,18 @@ USE_TRINO=
If using the supplied `compose.yml` and running the database in the same stack, unless you modify the database configuration, the defaults will connect Bunny to your database.
If using a remote database, change accordingly.
### DATASOURCE_DB_DRIVERNAME
Defines the database driver to use. Currently supports PostgreSQL and SQL Server.
Valid values:
- `postgresql` -> PostgreSQL
- `mssql` -> SQL Server
```yaml
DATASOURCE_DB_DRIVERNAME=postgresql
```
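As a rough illustration of why the exact value matters, the Python sketch below rejects anything other than the two documented driver names before assembling a connection URL. It is not Bunny's code: `build_db_url` is a hypothetical helper, and the `driver://user:password@host:port/database` layout is the common SQLAlchemy-style convention, assumed here rather than taken from Bunny's source.

```python
from urllib.parse import quote_plus

# The two values documented for DATASOURCE_DB_DRIVERNAME.
VALID_DRIVERNAMES = {
    "postgresql": "PostgreSQL",
    "mssql": "SQL Server",
}


def build_db_url(drivername: str, username: str, password: str,
                 host: str, port: int, database: str) -> str:
    """Hypothetical helper: validate the driver name, then build a URL.

    The URL layout is an assumption (SQLAlchemy-style), not Bunny's code.
    """
    if drivername not in VALID_DRIVERNAMES:
        raise ValueError(
            f"Unsupported DATASOURCE_DB_DRIVERNAME {drivername!r}; "
            f"expected one of {sorted(VALID_DRIVERNAMES)}"
        )
    # Escape credentials so characters such as '@' or ':' don't break the URL.
    user = quote_plus(username)
    secret = quote_plus(password)
    return f"{drivername}://{user}:{secret}@{host}:{port}/{database}"


if __name__ == "__main__":
    print(build_db_url("postgresql", "postgres", "postgres", "localhost", 5432, "postgres"))
    # "postgres" is not one of the documented values, so it is rejected.
    try:
        build_db_url("postgres", "postgres", "postgres", "localhost", 5432, "postgres")
    except ValueError as err:
        print(err)
```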

## Obfuscation

Bunny provides two methods of result obfuscation, described below. Both can be configured in `compose.yml` before you build your Hutch tools stack.
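The exact rules for each method are defined by Bunny's configuration. As a hedged sketch only, assuming low-number suppression reports counts below `LOW_NUMBER_SUPPRESSION_THRESHOLD` as 0 and rounding reports counts to the nearest multiple of `ROUNDING_TARGET` (with an empty or zero value disabling a step), the two steps could look like this. The function names and the order in which the steps are applied are assumptions, not Bunny's implementation.

```python
from typing import Optional


def suppress_low_numbers(count: int, threshold: int) -> int:
    """Assumed rule: counts strictly below the threshold are reported as 0."""
    return 0 if count < threshold else count


def round_to_target(count: int, target: int) -> int:
    """Assumed rule: round a non-negative count to the nearest multiple of target.

    Integer arithmetic is used so exact halves round up (Python's built-in
    round() would round 2.5 down to 2).
    """
    return (count + target // 2) // target * target


def obfuscate(count: int, threshold: Optional[int], target: Optional[int]) -> int:
    """Apply suppression, then rounding; None or 0 disables a step (an assumption)."""
    if threshold:
        count = suppress_low_numbers(count, threshold)
    if target:
        count = round_to_target(count, target)
    return count


if __name__ == "__main__":
    for raw in (7, 24, 25, 123):
        print(raw, "->", obfuscate(raw, threshold=10, target=10))
```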
2 changes: 1 addition & 1 deletion website/pages/bunny/deployment.mdx
@@ -3,7 +3,7 @@ import { Steps } from "nextra/components";

# Bunny Deployment

This page will guide you through getting Bunny deployed in a Virtual Machine (VM).
This page will guide you through getting Bunny deployed in a Virtual Machine (VM) or locally on your machine.

## Prerequisites

4 changes: 4 additions & 0 deletions website/pages/bunny/developers/_meta.js
@@ -0,0 +1,4 @@
export default {
quickstart: "Quickstart",
dev_setup: "Setting up a development environment",
};
@@ -6,15 +6,10 @@ If the bunny-daemon logs messages saying that it's setting up a database connect
To double check, you can also use the bunny-cli to generate an output JSON.

### Using the environment
Bunny uses poetry to manage its dependencies.
Bunny uses uv to manage its dependencies.
To test any changes made to Bunny in development you can use either
```bash
poetry run bunny-[daemon/cli]
```
or start a poetry shell
```bash
poetry shell
bunny-[daemon/cli]
uv run bunny-[daemon]
```

## Architecture
215 changes: 215 additions & 0 deletions website/pages/bunny/developers/quickstart.mdx
@@ -0,0 +1,215 @@
import { Callout, Tabs, Table, Td, Th, Tr, Code } from "nextra/components";
import { Steps } from "nextra/components";

# Getting started with Bunny

This page will guide you through getting Bunny running locally.

Start by cloning [the Bunny repository](https://github.com/Health-Informatics-UoN/hutch-bunny).

## Prerequisites

- Bunny runs on Python 3.13.
- Dependencies are managed with [uv](https://github.com/astral-sh/uv), which needs to be installed if you run Bunny outside a container.
- Bunny needs an OMOP-CDM database to query, either:
  - a running remote OMOP-CDM database, or
  - a tarball containing a `pg_dump` of an OMOP-CDM database.
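Before running Bunny outside a container, a quick check of the first two prerequisites can save some confusion. This is a hypothetical helper, not part of Bunny; it assumes "Python 3.13" means 3.13 or newer and that uv should be found on your `PATH`.

```python
import shutil
import sys
from typing import Callable, Optional, Sequence


def check_prerequisites(
    version: Sequence[int] = tuple(sys.version_info[:2]),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of problems; an empty list means both checks passed.

    `version` and `which` are injectable so the check can be tested.
    """
    problems = []
    # Assumption: any Python from 3.13 upwards is acceptable.
    if tuple(version[:2]) < (3, 13):
        problems.append(f"Python 3.13 required, found {version[0]}.{version[1]}")
    # uv is only needed when running Bunny outside a container.
    if which("uv") is None:
        problems.append("uv not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```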

## OMOP-CDM setup

Before Bunny can get up and running, it needs a database to query.
If you don't have a real OMOP-CDM, you can run a mock database.

### Mock database setup

<Steps>
#### Start a containerized database

The `compose.yml` in the root of the Hutch tools repository can start a database container with

```bash copy
docker compose up db
```

This will initialise a Postgres instance in the container.

<Callout type="info">
These instructions assume you have a `pg_dump` archive of an existing OMOP-CDM PostgreSQL database,
named `hutch_db.tar` in the examples below.
</Callout>

#### Copy the data

Navigate to the folder containing `hutch_db.tar` and copy it into the container with:

```bash copy
docker cp hutch_db.tar hutch-bunny-dev-db-1:/
```

#### Start running bash in your container

```bash copy
docker exec -it hutch-bunny-dev-db-1 bash
```

#### Use pg_restore to load the data into the database

```bash copy
pg_restore --dbname=postgres --host=localhost --port=5432 --username=postgres --password hutch_db.tar
```

If prompted, enter `postgres` as the password.

You can then exit the container with `Ctrl+D` or `exit`.

</Steps>
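Once the container is up, you may want to confirm the database is reachable before moving on. The sketch below is a hypothetical helper, not part of Bunny: it only checks that something is accepting TCP connections on the given host and port (`localhost:5432` in this setup), and does not verify that the restore succeeded.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Retry a TCP connection until it succeeds or `timeout` seconds pass.

    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # The connection is closed immediately; we only care that it opened.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5432)` returns `True` once Postgres is listening, or `False` after 30 seconds.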

## Environment configuration

To run Bunny locally, your environment needs to have some variables configured.

<Steps>
### Create a `.env` file in `app/bunny`
The file should contain the variables required to connect to a database and Relay.

For example, if using the containerized Relay and mock database:

```sh copy
DATASOURCE_DB_USERNAME=postgres
DATASOURCE_DB_PASSWORD=postgres
DATASOURCE_DB_DATABASE=postgres
DATASOURCE_DB_DRIVERNAME=postgresql
DATASOURCE_DB_SCHEMA=public
DATASOURCE_DB_PORT=5432
DATASOURCE_DB_HOST=localhost
TASK_API_BASE_URL=http://localhost:8080
TASK_API_USERNAME=username
TASK_API_PASSWORD=password
LOW_NUMBER_SUPPRESSION_THRESHOLD=
ROUNDING_TARGET=
POLLING_INTERVAL=5
```

If you are querying a remote database, the variables prefixed with `DATASOURCE` must be configured accordingly.
If you set environment variables another way, set the same variables as in the example above.

</Steps>
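To catch typos in `.env` early, a small checker can list variables from the example that are missing or empty. This is a hypothetical helper, not part of Bunny: it assumes plain `KEY=VALUE` lines with `#` comments and no quoting, and it treats `LOW_NUMBER_SUPPRESSION_THRESHOLD` and `ROUNDING_TARGET` as optional because the example leaves them blank.

```python
# Variables from the example .env that this sketch treats as required.
REQUIRED = (
    "DATASOURCE_DB_USERNAME", "DATASOURCE_DB_PASSWORD", "DATASOURCE_DB_DATABASE",
    "DATASOURCE_DB_DRIVERNAME", "DATASOURCE_DB_SCHEMA", "DATASOURCE_DB_PORT",
    "DATASOURCE_DB_HOST", "TASK_API_BASE_URL", "TASK_API_USERNAME",
    "TASK_API_PASSWORD", "POLLING_INTERVAL",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_variables(text: str, required=REQUIRED) -> list[str]:
    """Return required names that are absent or have an empty value."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]
```

For example, `missing_variables(open("app/bunny/.env").read())` returns an empty list when every required variable has a value.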
## Installing dependencies
The first time you run Bunny outside a container, you will need to install its dependencies by running
```bash copy
uv pip install .
```

## Running the Bunny daemon

Bunny's daemon polls [Relay](/relay) for jobs, so a Relay instance must be running.

<Steps>
### Start the Relay container

The compose used to start up the database also contains an implementation of Relay.

```bash copy
docker compose up relay -d
```

### Run the Bunny daemon

The Bunny daemon can then be run using uv. This will ensure the dependencies and environment variables are loaded.

```bash copy
uv run bunny-daemon
```

You should then see messages in your console like these:

```text
INFO - 12-Nov-24 12:36:24 - Setting up database connection...
INFO - 12-Nov-24 12:36:24 - Looking for job...
INFO - 12-Nov-24 12:36:29 - Job received. Resolving...
INFO - 12-Nov-24 12:36:29 - Processing query...
INFO - 12-Nov-24 12:36:30 - Solved availability query
INFO - 12-Nov-24 12:36:30 - Job resolved.
INFO - 12-Nov-24 12:36:35 - Looking for job...
INFO - 12-Nov-24 12:36:40 - Looking for job...
INFO - 12-Nov-24 12:36:45 - Looking for job...
```

Bunny establishes a connection to your OMOP-CDM database, then polls Relay for a job.
When it receives a job, it processes the query, queries the database, and sends the result back to Relay.
The results are not shown here, but if you see these messages, Bunny is successfully contacting both the database and Relay.

</Steps>
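The cycle described above (poll for a job, resolve it against the database, report the result, then wait `POLLING_INTERVAL` seconds) can be sketched as follows. This is a simplified model, not Bunny's implementation: `fetch_job`, `resolve_query` and `send_result` are hypothetical callables standing in for the Task API calls and the OMOP-CDM query, and the log messages simply mirror the sample output.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("bunny-sketch")


def poll_forever(
    fetch_job: Callable[[], Optional[dict]],
    resolve_query: Callable[[dict], dict],
    send_result: Callable[[dict], None],
    polling_interval: float,
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for jobs until max_iterations (forever if None); return jobs resolved."""
    resolved = 0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        logger.info("Looking for job...")
        job = fetch_job()  # hypothetical: None means no job was waiting
        if job is not None:
            logger.info("Job received. Resolving...")
            send_result(resolve_query(job))  # hypothetical: query DB, report back
            resolved += 1
            logger.info("Job resolved.")
        sleep(polling_interval)
    return resolved
```

Passing the callables in keeps the loop testable without a running Relay or database; `max_iterations` exists only so the sketch can stop.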

<Callout emoji="🎉">Congratulations on your first Bunny query!</Callout>

## Running bunny-cli

To run Bunny without Relay, use the command-line interface.
It needs an input file containing a query in the expected JSON format (see the sample below).

### How to run

<Tabs items={['Docker', 'From Source']}>
<Tabs.Tab>
You can use the CLI from the same image as the daemon, by overriding the `entrypoint`.

You will need to make the input file available to the Docker container, for example by mounting it as a volume.

It is possible to pass Docker arguments to `docker run` **before** the image, and arguments to the Bunny CLI **after** the image.

Here's an example with `docker run`:

```bash copy
docker run \
-v <absolute/path/to/rquest-query.json>:/rquest-query.json \
--entrypoint uv \
ghcr.io/health-informatics-uon/hutch/bunny:<TAG> \
run bunny --body /rquest-query.json
```

</Tabs.Tab>
<Tabs.Tab>
To run the CLI, navigate to `hutch-cohort-discovery/app/bunny`.

Then run:

```bash copy
uv run bunny --body <path/to/rquest-query.json>
```

This should write a result for your query to `app/bunny/output.json`.

</Tabs.Tab>
</Tabs>

### Sample input files

```json copy filename="availability.json"
{
"task_id": "job-2023-01-13-14:20:38-project",
"project": "project_id",
"owner": "user1",
"cohort": {
"groups": [
{
"rules": [
{
"varname": "OMOP",
"varcat": "Person",
"type": "TEXT",
"oper": "=",
"value": "8507"
}
],
"rules_oper": "AND"
}
],
"groups_oper": "OR"
},
"collection": "collection_id",
"protocol_version": "v2",
"char_salt": "salt",
"uuid": "unique_id"
}
```
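If you want to generate query files rather than edit JSON by hand, the sketch below builds a single-rule availability query with the same shape as the sample and writes it to disk for `--body`. It is a hypothetical helper: the field names and placeholder values are copied from the sample above, and the full schema Bunny accepts may require more or different fields.

```python
import json
from pathlib import Path


def availability_query(concept_id: str, task_id: str, collection: str) -> dict:
    """Build a one-rule query shaped like the sample (fields copied from it)."""
    return {
        "task_id": task_id,
        "project": "project_id",
        "owner": "user1",
        "cohort": {
            "groups": [
                {
                    "rules": [
                        {
                            "varname": "OMOP",
                            "varcat": "Person",
                            "type": "TEXT",
                            "oper": "=",
                            "value": concept_id,
                        }
                    ],
                    "rules_oper": "AND",
                }
            ],
            "groups_oper": "OR",
        },
        "collection": collection,
        "protocol_version": "v2",
        "char_salt": "salt",
        "uuid": "unique_id",
    }


def write_query(query: dict, path: Path) -> Path:
    """Serialise the query as indented JSON and return the path written."""
    path.write_text(json.dumps(query, indent=2))
    return path
```

For example, `write_query(availability_query("8507", "job-1", "collection_id"), Path("availability.json"))` writes a query with the sample's single rule.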