Metabase: Add test harness for validating Metabase against CrateDB
A basic test case that reads CrateDB's `sys.summit` table through
Metabase, after connecting CrateDB as a PostgreSQL database.
amotl committed Nov 23, 2024
1 parent 2a17d1e commit f1374d9
Showing 10 changed files with 432 additions and 0 deletions.
10 changes: 10 additions & 0 deletions .github/dependabot.yml
@@ -90,6 +90,16 @@ updates:
    schedule:
      interval: "daily"

  - directory: "/application/metabase"
    package-ecosystem: "pip"
    schedule:
      interval: "daily"

  - directory: "/application/metabase"
    package-ecosystem: "docker"
    schedule:
      interval: "daily"

  # Frameworks.

  - directory: "/framework/dbt/basic"
72 changes: 72 additions & 0 deletions .github/workflows/application-metabase.yml
@@ -0,0 +1,72 @@
name: Metabase

on:
  pull_request:
    branches: ~
    paths:
    - '.github/workflows/application-metabase.yml'
    - 'application/metabase/**'
    - '/requirements.txt'
  push:
    branches: [ main ]
    paths:
    - '.github/workflows/application-metabase.yml'
    - 'application/metabase/**'
    - '/requirements.txt'

  # Allow job to be triggered manually.
  workflow_dispatch:

  # Run job each night after CrateDB nightly has been published.
  schedule:
    - cron: '0 3 * * *'

# Cancel in-progress jobs when pushing to the same branch.
concurrency:
  cancel-in-progress: true
  group: ${{ github.workflow }}-${{ github.ref }}

jobs:

  test:
    name: "
     Python: ${{ matrix.python-version }}
     CrateDB: ${{ matrix.cratedb-version }}
     on ${{ matrix.os }}"
    runs-on: ${{ matrix.os }}

    strategy:
      fail-fast: false
      matrix:
        os: [ "ubuntu-22.04" ]
        python-version: [ "3.12" ]
        cratedb-version: [ "nightly" ]

    steps:

      - name: Acquire sources
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
          cache: "pip"
          cache-dependency-path: |
            pyproject.toml
            requirements.txt
            requirements-test.txt
      - name: Install uv
        uses: astral-sh/setup-uv@v3
        with:
          version: "latest"

      - name: Install utilities
        run: |
          uv pip install --system -r requirements.txt
      - name: Validate application/metabase
        run: |
          ngr test --accept-no-venv application/metabase
38 changes: 38 additions & 0 deletions application/metabase/README.md
@@ -0,0 +1,38 @@
# Verify Metabase with CrateDB

## About

This folder includes software integration tests for verifying
that Metabase works well together with CrateDB.
The test harness is based on Docker Compose.

## What's Inside

A basic test case that reads CrateDB's `sys.summit` table through
Metabase, after connecting CrateDB as a PostgreSQL database.

## Setup

Set up a sandbox and install the packages.
```bash
pip install uv
uv venv .venv
source .venv/bin/activate
uv pip install -r requirements.txt -r requirements-test.txt
```

## Usage

Run integration tests.
```bash
pytest
```

Watch service logs.
```shell
docker compose logs -f
```

Note that the setup is configured to keep the containers alive after starting
them. If you want to actively recycle them, invoke `docker compose down` before
running `pytest`.
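
The test module itself is not part of the files shown here. As a rough sketch only, a test along these lines might drive the `MetabaseRig` helper from `metabase_rig.py`. The function name `read_summit` is hypothetical, the database name `cratedb-testdrive` is borrowed from the log excerpts in `backlog.md`, and the `data` / `rows` layout of the query response is an assumption about the Metabase dataset API rather than something confirmed by this commit.

```python
def read_summit(rig, database_name="cratedb-testdrive"):
    """
    Provision Metabase, register CrateDB, and read the `sys.summit` table.

    `rig` is expected to behave like `MetabaseRig` from `metabase_rig.py`.
    """
    # Create the admin user on a fresh Metabase instance, then sign in.
    rig.setup()
    rig.login()

    # Register CrateDB as a PostgreSQL database and wait until Metabase
    # has synchronized the table of interest.
    db = rig.database(database_name)
    db.create()
    db.wait_database()
    db.wait_table("sys", "summit")

    # Assumption: result rows are found at `data.rows` in the response.
    result = db.query("summit")
    return result["data"]["rows"]
```

With the containers from `docker-compose.yml` running, calling `read_summit(MetabaseRig("http://localhost:3000"))` would be the intended use. Because the setup keeps containers alive, running the setup step against an already provisioned Metabase instance may fail, which is one more reason to invoke `docker compose down` first.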
46 changes: 46 additions & 0 deletions application/metabase/backlog.md
@@ -0,0 +1,46 @@
# CrateDB <-> Metabase backlog


## metabase/metabase:v0.45.4.3

```
2024-11-22 23:22:07,139 ERROR driver.util :: Failed to connect to Database
org.postgresql.util.PSQLException: The server does not support SSL.
```

```
2024-11-22 23:22:07,290 WARN metabase.email :: Failed to send email
clojure.lang.ExceptionInfo: SMTP host is not set. {:cause :smtp-host-not-set}
```

```
2024-11-22 23:22:08,189 WARN sync.util :: Error running step 'sync-timezone' for postgres Database 2 'cratedb-testdrive'
java.lang.Exception: Unable to parse date string '2024-11-22 23:22:08.175 ' for database engine 'postgres'
```

```
2024-11-22 23:22:08,724 WARN sync.describe-table :: Don't know how to map column type '_int4' to a Field base_type, falling back to :type/*.
2024-11-22 23:22:08,724 WARN sync.describe-table :: Don't know how to map column type '_int4' to a Field base_type, falling back to :type/*.
2024-11-22 23:22:08,725 WARN sync.describe-table :: Don't know how to map column type 'regclass' to a Field base_type, falling back to :type/*.
2024-11-22 23:22:08,725 WARN sync.describe-table :: Don't know how to map column type '_int4' to a Field base_type, falling back to :type/*.
2024-11-22 23:22:08,726 WARN sync.describe-table :: Don't know how to map column type '_int2' to a Field base_type, falling back to :type/*.
...
```

```
2024-11-22 23:22:13,900 WARN sync.util :: Error fingerprinting Table 12 'sys.jobs'
clojure.lang.ExceptionInfo: Error executing query: ERROR: line 2:359: no viable alternative at input 'SELECT "source"."substring531" AS "substring531", "source"."substring532" AS "substring532", "source"."substring533" AS "substring533", "source"."started" AS "started", "source"."substring534" AS "substring534", "source"."substring535" AS "substring535", "source"."substring536" AS "substring536" FROM (SELECT "sys"."jobs"."id" AS "id", ("sys"."jobs"."node"#>'
```

```
2024-11-22 23:22:14,390 WARN sync.util :: Error fingerprinting Table 13 'sys.nodes'
clojure.lang.ExceptionInfo: Error executing query: ERROR: line 2:97: no viable alternative at input 'SELECT "source"."load['probe_timestamp']" AS "load['probe_timestamp']", ("source"."fs['total']"#>'
```

```
2024-11-22 23:22:23,588 ERROR models.field-values :: Error fetching field values
clojure.lang.ExceptionInfo: Error executing query: ERROR: Cannot ORDER BY 'conffeqop': invalid data type 'integer_array'.
2024-11-22 23:22:23,599 ERROR models.field-values :: Error fetching field values
clojure.lang.ExceptionInfo: Error executing query: ERROR: Cannot ORDER BY 'conkey': invalid data type 'smallint_array'.
```
51 changes: 51 additions & 0 deletions application/metabase/docker-compose.yml
@@ -0,0 +1,51 @@
networks:
  metanet-demo:
    driver: bridge

services:

  # Metabase
  # https://www.metabase.com/docs/latest/installation-and-operation/running-metabase-on-docker#example-docker-compose-yaml-file
  metabase:
    image: metabase/metabase:v0.45.4.3
    container_name: metabase
    hostname: metabase
    volumes:
      - /dev/urandom:/dev/random:ro
    ports:
      - 3000:3000
    networks:
      - metanet-demo
    healthcheck:
      test: curl --fail -I http://localhost:3000/api/health || exit 1
      interval: 15s
      timeout: 5s
      retries: 5

  # CrateDB
  # https://github.com/crate/crate
  cratedb:
    image: crate/crate:nightly
    container_name: cratedb
    hostname: cratedb
    ports:
      - 4200:4200
      - 5432:5432
    networks:
      - metanet-demo
    healthcheck:
      # https://github.com/crate/docker-crate/pull/151/files
      test: curl --max-time 25 http://localhost:4200 || exit 1
      interval: 30s
      timeout: 30s

  # Wait for all defined services to be fully available by probing their health
  # status, even when using `docker compose up --detach`.
  # https://marcopeg.com/2019/docker-compose-healthcheck/
  wait:
    image: dadarek/wait-for-dependencies
    depends_on:
      metabase:
        condition: service_healthy
      cratedb:
        condition: service_healthy
163 changes: 163 additions & 0 deletions application/metabase/metabase_rig.py
@@ -0,0 +1,163 @@
import time
from functools import lru_cache

import requests
from metabase_api import Metabase_API


class MetabaseRig:
    """
    Support end-to-end testing of CrateDB and Metabase.
    https://www.metabase.com/docs/latest/api-documentation
    Authenticate your requests with a session token
    https://www.metabase.com/learn/metabase-basics/administration/administration-and-operation/metabase-api#authenticate-your-requests-with-a-session-token
    """
    def __init__(self, url: str):
        self.username = "[email protected]"
        self.password = "123456metabase"
        self.mb = None

        self.url = url
        self.api_url = f"{url.rstrip('/')}/api"
        self.session = requests.Session()
        self.session_token = None

    def get_setup_token(self) -> str:
        response = self.session.get(f"{self.api_url}/session/properties")
        return response.json()["setup-token"]

    def setup(self):
        """
        Run Metabase setup, create admin user, and return a session ID.
        https://www.metabase.com/docs/latest/api/setup#post-apisetup
        https://discourse.metabase.com/t/rest-api-for-initial-setup-process/3419
        """
        response = self.session.post(f"{self.api_url}/setup", json={
            "prefs": {
                "allow_tracking": "false",
                "site_locale": "en",
                "site_name": "Hotzenplotz",
            },
            "user": {
                "password": self.password,
                "password_confirm": self.password,
                "email": self.username,
            },
            "token": self.get_setup_token(),
        })
        self.session_token = response.json()["id"]

    def login(self):
        self.session.post(f"{self.api_url}/session", json={
            "username": self.username,
            "password": self.password,
        })
        self.mb = Metabase_API(self.url, self.username, self.password)

    def get_databases(self):
        return self.session.get(f"{self.api_url}/database").json()

    def database(self, name: str) -> "MetabaseDatabase":
        return MetabaseDatabase(rig=self, name=name)


class MetabaseDatabase:
    def __init__(self, rig: MetabaseRig, name: str):
        self.rig = rig
        self.name = name
        self.timeout = 15

    @property
    @lru_cache(maxsize=None)
    def id(self):
        return self.rig.mb.get_item_id("database", self.name)

    def create(self):
        """
        https://www.metabase.com/docs/latest/api/database#post-apidatabase
        """
        self.rig.session.post(
            f"{self.rig.api_url}/database",
            json={
                "engine": "postgres",
                "name": self.name,
                "details": {
                    "host": "cratedb",
                    "port": 5432,
                    "user": "crate",
                },
            },
        )

    def exists(self):
        try:
            response = self.rig.session.get(f"{self.rig.api_url}/database/{self.id}")
            return response.status_code == 200
        except ValueError as ex:
            if "There is no DB with the name" not in str(ex):
                raise
        return False

    def schema(self, name: str):
        response = self.rig.session.get(f"{self.rig.api_url}/database/{self.id}/schema/{name}")
        response.raise_for_status()
        return response.json()

    def table_names(self, schema_name: str):
        names = []
        for item in self.schema(name=schema_name):
            names.append(f"{item['schema']}.{item['name']}")
        return names

    def table_id_by_name(self, name: str):
        return self.rig.mb.get_item_id("table", name)

    def query(self, table: str):
        response = self.rig.session.post(
            f"{self.rig.api_url}/dataset",
            json={
                "database": self.id,
                "query": {
                    "source-table": self.table_id_by_name(table),
                },
                "type": "query",
                "parameters": [],
            }
        )
        return response.json()

    def wait_database(self):
        def condition():
            return self.exists()
        return self._wait(condition, f"Database not found: {self.name}")

    def wait_schema(self, name: str):
        def condition():
            try:
                if schema := self.schema(name):
                    return schema
            except requests.RequestException:
                pass
            return False
        return self._wait(condition, f"Database schema '{name}' not found in database '{self.name}'")

    def wait_table(self, schema: str, name: str):
        def condition():
            if schema_info := self.wait_schema(schema):
                for item in schema_info:
                    if item["name"] == name and item["initial_sync_status"] == "complete":
                        return True
        return self._wait(condition, f"Table not found: {schema}.{name}")

    def _wait(self, condition, timeout_message):
        timeout = self.timeout
        while True:
            if result := condition():
                return result
            if timeout == 0:
                raise TimeoutError(timeout_message)
            timeout -= 1
            time.sleep(1)
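
The `_wait` helper polls a condition once per second until it returns a truthy value or the retry budget is used up. The same pattern can be sketched as a standalone function, which makes it easier to exercise without a running Metabase instance. The name `wait_until` and the `retries`, `interval`, and `sleep` parameters are hypothetical and not part of this commit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], T],
    timeout_message: str,
    retries: int = 15,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """
    Call `condition` until it returns a truthy value, then return that value.

    The condition is checked once up front and again after each of up to
    `retries` pauses of `interval` seconds. A `TimeoutError` carrying
    `timeout_message` is raised when every check comes back falsy.
    """
    remaining = retries
    while True:
        if result := condition():
            return result
        if remaining == 0:
            raise TimeoutError(timeout_message)
        remaining -= 1
        # `sleep` is injectable so that tests can skip the real delay.
        sleep(interval)
```

As in `_wait`, a budget of 15 means the condition is evaluated up to 16 times with 15 pauses in between. Note that `wait_table` calls `wait_schema` inside its own condition, so in the worst case the two waiting loops nest and the total waiting time can exceed the nominal 15 seconds.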