Integrate SQLAlchemy for db conn management and introduce new SqlStorage abstraction #93

ehclark · 2024-04-23T19:34:34Z

Resolves #87

This started as a simple change to add in a database connection management library so that when AnyVar is running as a server for long periods of time, it does not fail due to stale database connections. It evolved into a larger refactoring of the Snowflake and Postgres storage implementations.

New SqlStorage base class
Given the number of changes required by the slight differences between the Snowflake/Postgres DB APIs and SQLAlchemy, it made sense to create a new SqlStorage base class that both the Snowflake and Postgres storage implementations descend from. The SqlStorage implementation incorporates the background write feature from the Snowflake implementation, but by default it still behaves as before and flushes all writes before exiting a batch manager context.
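To make the shape of this design concrete, here is a minimal sketch of a base class that queues batches for a background writer thread and, by default, flushes before the batch context exits. It is not the AnyVar implementation: the names SqlStorageSketch, BatchManager, write_rows, queue_rows and flush_on_batch_exit are hypothetical, and sqlite3 stands in for Snowflake/Postgres so that the example runs on its own.

```python
import queue
import sqlite3
import threading
from abc import ABC, abstractmethod


class SqlStorageSketch(ABC):
    """Hypothetical base class: shared batching logic, dialect SQL in subclasses."""

    def __init__(self, flush_on_batch_exit=True):
        self.flush_on_batch_exit = flush_on_batch_exit
        self._pending = queue.Queue()
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    @abstractmethod
    def write_rows(self, rows):
        """Insert (vrs_id, vrs_object) rows using dialect-specific SQL."""

    def _drain(self):
        # Background thread: write queued batches until the None sentinel arrives.
        while True:
            rows = self._pending.get()
            try:
                if rows is None:
                    return
                self.write_rows(rows)
            finally:
                self._pending.task_done()

    def queue_rows(self, rows):
        self._pending.put(rows)

    def wait_for_writes(self):
        # Blocks until every queued batch has been processed.
        self._pending.join()

    def batch_manager(self):
        return BatchManager(self)

    def close(self):
        self._pending.put(None)
        self._writer.join()


class BatchManager:
    """Collects rows and hands them to the storage when the context exits."""

    def __init__(self, storage):
        self._storage = storage
        self._rows = []

    def __enter__(self):
        return self

    def add(self, vrs_id, vrs_object):
        self._rows.append((vrs_id, vrs_object))

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None and self._rows:
            self._storage.queue_rows(self._rows)
        if self._storage.flush_on_batch_exit:
            self._storage.wait_for_writes()
        return False


class SqliteStorage(SqlStorageSketch):
    """Stand-in subclass; a real one would hold Snowflake or Postgres SQL."""

    def __init__(self, **kwargs):
        # One shared connection guarded by a lock, since two threads use it.
        self._conn = sqlite3.connect(":memory:", check_same_thread=False)
        self._lock = threading.Lock()
        self._conn.execute(
            "CREATE TABLE vrs_objects (vrs_id TEXT PRIMARY KEY, vrs_object TEXT)"
        )
        super().__init__(**kwargs)

    def write_rows(self, rows):
        with self._lock:
            self._conn.executemany(
                "INSERT OR IGNORE INTO vrs_objects VALUES (?, ?)", rows
            )
            self._conn.commit()

    def count(self):
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM vrs_objects").fetchone()[0]


storage = SqliteStorage()
with storage.batch_manager() as batch:
    batch.add("ga4gh:VA.example1", "{}")
    batch.add("ga4gh:VA.example2", "{}")
row_count = storage.count()  # rows are visible because the exit flushed
storage.close()
```

With flush_on_batch_exit set to False in this sketch, the context would return immediately and a caller would need to call wait_for_writes() before relying on the rows being present.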

The SnowflakeObjectStore and PostgresObjectStore storage classes implement low-level database operations where differences in SQL semantics exist. The SnowflakeObjectStore implementation also now supports a setting that controls what type of SQL statement is used to write new VRS objects. This setting allows for trade-offs between uniqueness of records in the VRS objects table and overall database throughput.
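The trade-off behind that setting can be illustrated with a small sketch. The mode names "insert" and "insert_notin" and the add_many helper are hypothetical labels chosen for this example, and sqlite3 is used only so that it runs locally; the statements AnyVar actually issues against Snowflake may differ.

```python
import sqlite3


def add_many(conn, rows, mode):
    """Write (vrs_id, vrs_object) rows using the statement type named by mode."""
    if mode == "insert":
        # Cheapest statement: no existence check, so duplicates can accumulate.
        conn.executemany(
            "INSERT INTO vrs_objects (vrs_id, vrs_object) VALUES (?, ?)", rows
        )
    elif mode == "insert_notin":
        # Extra lookup per row keeps vrs_id unique at the cost of throughput.
        conn.executemany(
            "INSERT INTO vrs_objects (vrs_id, vrs_object) "
            "SELECT ?, ? WHERE NOT EXISTS "
            "(SELECT 1 FROM vrs_objects WHERE vrs_id = ?)",
            [(vrs_id, obj, vrs_id) for vrs_id, obj in rows],
        )
    else:
        raise ValueError(f"unknown batch add mode: {mode}")
    conn.commit()


def count_after_duplicate_batch(mode):
    conn = sqlite3.connect(":memory:")
    # No primary key here, so the table itself does not reject duplicates.
    conn.execute("CREATE TABLE vrs_objects (vrs_id TEXT, vrs_object TEXT)")
    rows = [("ga4gh:VA.example", "{}"), ("ga4gh:VA.example", "{}")]
    add_many(conn, rows, mode)
    return conn.execute("SELECT COUNT(*) FROM vrs_objects").fetchone()[0]


plain_count = count_after_duplicate_batch("insert")
dedup_count = count_after_duplicate_batch("insert_notin")
```

In this sketch the plain insert stores the duplicate row while the guarded insert does not, which is the uniqueness-versus-throughput choice described above.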

SQLAlchemy integration
SQLAlchemy is a database abstraction layer and ORM. It is also the only database connection management/pool library that is supported by Snowflake. Thus it was chosen for database connection management. The only SQLAlchemy feature used by AnyVar is its database connection recycling (via a connection pool) capability.
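As an illustration of what connection recycling means, the following stdlib-only sketch replaces a connection once it exceeds a maximum age. It is a conceptual model, not SQLAlchemy's pool: the RecyclingConnection class and its parameters are hypothetical. In SQLAlchemy itself, comparable behavior is configured through create_engine() arguments such as pool_recycle and pool_pre_ping; which settings AnyVar uses is not stated in this description.

```python
import sqlite3
import time


class RecyclingConnection:
    """Hands out one connection and replaces it once it is older than max_age."""

    def __init__(self, connect, max_age_seconds, clock=time.monotonic):
        self._connect = connect
        self._max_age = max_age_seconds
        self._clock = clock
        self._conn = None
        self._created_at = 0.0
        self.connections_opened = 0

    def get(self):
        now = self._clock()
        expired = self._conn is not None and now - self._created_at > self._max_age
        if self._conn is None or expired:
            if self._conn is not None:
                self._conn.close()  # discard the stale connection
            self._conn = self._connect()
            self._created_at = now
            self.connections_opened += 1
        return self._conn


# A fake clock lets the example "wait" two hours without sleeping.
fake_now = [0.0]
holder = RecyclingConnection(
    lambda: sqlite3.connect(":memory:"),
    max_age_seconds=3600,
    clock=lambda: fake_now[0],
)
first = holder.get()
same = holder.get()          # still fresh, so the same object is reused
fake_now[0] = 7200.0
fresh = holder.get()         # older than max_age, so a new connection is opened
probe = fresh.execute("SELECT 1").fetchone()
```

A long-running server benefits from this because a connection that the database or a network device has silently dropped is never handed back to application code after the age limit.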

The existing Snowflake SQL mocks for testing have been replaced by SQLAlchemy mocks for unit testing.

Throw KeyError from __getitem__
The existing implementations of __getitem__ in the storage classes returned None if a VRS object was not found in the store. This does not comply with the behavior prescribed for Python mapping types, where looking up a missing key should raise KeyError. The SqlStorage.__getitem__() function now raises a KeyError if the VRS ID is not found in the store.
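A minimal sketch of the mapping contract follows, assuming a hypothetical SqliteObjectStore class backed by sqlite3 rather than the real Snowflake or Postgres stores. It shows why raising KeyError matters: the mixin methods that collections.abc.MutableMapping provides, such as get() and the in operator, rely on it.

```python
import json
import sqlite3
from collections.abc import MutableMapping


class SqliteObjectStore(MutableMapping):
    """Hypothetical store used only to demonstrate the mapping contract."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE vrs_objects (vrs_id TEXT PRIMARY KEY, vrs_object TEXT)"
        )

    def __getitem__(self, vrs_id):
        row = self._conn.execute(
            "SELECT vrs_object FROM vrs_objects WHERE vrs_id = ?", (vrs_id,)
        ).fetchone()
        if row is None:
            raise KeyError(vrs_id)  # missing key: raise instead of returning None
        return json.loads(row[0])

    def __setitem__(self, vrs_id, obj):
        self._conn.execute(
            "INSERT OR REPLACE INTO vrs_objects VALUES (?, ?)",
            (vrs_id, json.dumps(obj)),
        )

    def __delitem__(self, vrs_id):
        cursor = self._conn.execute(
            "DELETE FROM vrs_objects WHERE vrs_id = ?", (vrs_id,)
        )
        if cursor.rowcount == 0:
            raise KeyError(vrs_id)

    def __iter__(self):
        rows = self._conn.execute("SELECT vrs_id FROM vrs_objects").fetchall()
        return iter([row[0] for row in rows])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM vrs_objects").fetchone()[0]


store = SqliteObjectStore()
store["ga4gh:VA.example"] = {"type": "Allele"}
found = store["ga4gh:VA.example"]
missing_default = store.get("ga4gh:VA.missing")  # mixin get() catches KeyError
try:
    store["ga4gh:VA.missing"]
    raised = False
except KeyError:
    raised = True
```

If __getitem__ returned None for a missing key, the inherited membership test would report every key as present, because it only treats KeyError as "not found".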

Consistent 404 behavior from /variation and /locations
The /variation/{id} and /locations/{id} endpoints return either a variation or sequence location from the store. They were inconsistent with each other and also relied on a KeyError to detect a missing object, which was not being thrown. The implementations for these methods were updated so that they behave consistently and always return a 404 if the specified object is not found.
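One way to get consistent behavior is to route both endpoints through a single lookup that translates KeyError into a 404. The sketch below is framework-agnostic and uses hypothetical helper names (lookup_or_404, get_variation, get_location) and a plain dict as the store; in a real web framework the tuple would be replaced by that framework's own not-found response.

```python
from http import HTTPStatus


def lookup_or_404(store, object_id):
    """Return (status, body); a missing ID always maps to 404."""
    try:
        obj = store[object_id]
    except KeyError:
        return HTTPStatus.NOT_FOUND, {"detail": f"{object_id} not found"}
    return HTTPStatus.OK, obj


def get_variation(store, variation_id):
    # Handler sketch for /variation/{id}
    return lookup_or_404(store, variation_id)


def get_location(store, location_id):
    # Handler sketch for /locations/{id}
    return lookup_or_404(store, location_id)


# A dict raises KeyError on a missing key, like the storage classes now do.
store = {"ga4gh:VA.example": {"type": "Allele"}}
ok_status, ok_body = get_variation(store, "ga4gh:VA.example")
missing_variation_status, _ = get_variation(store, "ga4gh:VA.missing")
missing_location_status, _ = get_location(store, "ga4gh:SL.missing")
```

Because both handlers share the same helper, they cannot drift apart in how they report a missing object.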

Includes the changes in #92 and should be merged after #92 is merged.

… RDBMS storage implementations The base class utilizes SqlAlchemy for connection management and SQL statement execution because it is the only connection pooling library that works with the Snowflake connector. The base class includes the background db write capabilities from the Snowflake implementation and actual SQL statement execution where standard SQL is used. Abstract methods are defined for queries where the SQL or database APIs are not standard.

…w SqlStorage base class Primarily removed code that was included in the base class and reorganized remaining code into the base class API shape. Because the Snowflake connector only supports SqlAlchemy 1.4 which in turn only supports psycopg2, had to modify the batch insert logic to use a different API.


ehclark added 30 commits March 1, 2024 12:49

Update expected VRS IDs for VCF tests

f61bd01

Update VRS IDs for variation tests

3b8df69

Switch to snowflake-sqlalchemy package

b3b7535

Update Snowflake storage implementation to be a subclass of new SqlSt…

b4c002d

…orage base class Removed code that is now included in base class and reorganized remaining code into base class API shape.

Updated unit tests to cover the use of background writes in Postgres …

c279668

…storage implementation Refactored mocks for SqlAlchemy based testing into separate module

Rename test file and replace unused var names with underscore

92195de

Add storage option to always fully flush on batch context exit

c58b144

When storage construction does not complete, the batch_thread and con…

7d15584

…n_pool are sometimes not created leading to spurious errors on close(). Check for these attributes before attempting to clean them up.

Depending on the underlying database, the returned column value can b…

1653906

…e a string or a dict

Add batch add mode settings to control what type of SQL statement to …

daefc3c

…use when adding new VRS objects to the database

Update variation test data to match VRS 2.0 changes

4300ad6

Comment out response model to return full VRS objects instead of seri…

8971570

…alized version

Make get location/variation behave consistently even when the object …

15514da

…store does not throw a KeyError on missing key

Remove code added to make debugging easier

cf63bf2

Merge branch 'issue-85' into issue-87-db-conn-pool

4c0878e

Merge issue-85 into issue-87-db-conn-pool Unit tests for storage implementations rely on upstream unit tests to provide needed data

Uupdate queries to use specified table name

e0a8d71

Fix bug in detecting column value type on fetch

1a625b3

Batch add mode only makes sense for Snowflake because in Postgres the…

5a62da4

… vrs_objects table has a primary key and uses "ON CONFLICT" on inserts

Switch to using question mark bind variables for Snowflake because na…

bf09a47

…med parameters were not working Pick up table name from environment in unit tests

Update to batch insert to play nicely with Snowflake quirks

ab96430

Update example URL to be SQLAlchemy friendly

da15cb6

Use super() to invoke __init__()

22295be

Add support for Snowflake private key auth

cb998fa

Add monkey patch workaround for bug in Snowflake SQLAlchemy

7a3e99e

Update collation in temp loading table

e64455b

Storage implementations should be consistent with MutableMapping API …

5f9a483

…and throw KeyError when an item is not found

Remove VRS model classes from response objects because the serializat…

a6c7484

…ion used internally is not correct for API responses

Corrected path used for missing allele id test

bec6d2f

ehclark added 13 commits April 18, 2024 16:51

Get location and get variation should be consistent in behavior when …

2e28e95

…id is not found

Revert unecessary change

003d9d1

Throw KeyError when id is not found

5dee8f1

Merge branch 'issue-85' into issue-87-db-conn-pool

b10af9a

Add missing argument to _get_connect_args

d818d5c

Code formatting

157a46a

Merge branch 'main' into issue-85

277e886

Merge branch 'main' into issue-87-db-conn-pool

977e052

Suppress SQL injection warning as elsewhere

202e752

Code formatting

770b8d6

Adding missing SQL injection warning suppressions

54d4102

Update README to reflect changes

5f45287

Merge branch 'issue-85' into issue-87-db-conn-pool

68701b8

ehclark linked an issue Apr 23, 2024 that may be closed by this pull request

Add database connection lifecycle management #87

Closed

github-advanced-security bot found potential problems Apr 23, 2024

src/anyvar/storage/snowflake.py Fixed

Address "Incomplete URL substring sanitization" warning

c46a54b

ehclark requested review from korikuzma and jsstevenson April 23, 2024 19:48

Merge branch 'main' into issue-87-db-conn-pool

66cd446

jsstevenson approved these changes Apr 26, 2024

korikuzma approved these changes Apr 29, 2024

ehclark merged commit 454504f into main Apr 29, 2024
5 checks passed

ehclark deleted the issue-87-db-conn-pool branch April 29, 2024 20:03
