Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate SQLAlchemy for db conn management and introduce new SqlStorage abstraction #93

Merged
merged 45 commits into from
Apr 29, 2024

Conversation

ehclark
Copy link
Contributor

@ehclark ehclark commented Apr 23, 2024

Resolves #87

This started as a simple change to add in a database connection management library so that when AnyVar is running as a server for long periods of time, it does not fail due to stale database connections. It evolved into a larger refactoring of the Snowflake and Postgres storage implementations.

New SqlStorage base class
Given the amount of changes required due to the slight differences between DB APIs for Snowflake/Postgres and SQLAlchemy, it made sense to create a new SqlStorage base class that both the Snowflake and Postgres storage implementations descend from. The SqlStorage implementation incorporates the background write feature in the Snowflake implementation, but by default will still behave as previously by flushing writes before exiting a batch manager context.

The SnowflakeObjectStore and PostgresObjectStore storage classes implement low level database operations where differences in SQL semantics exist. TheSnowflakeObjectStore implementation also now supports a setting that controls what type of SQL statement is used to write new VRS objects. This setting allows for trade offs between uniqueness of records in the VRS objects tables and overall database throughput.

SQLAlchemy integration
SQLAlchemy is a database abstraction layer and ORM. It is also the only database connection management/pool library that is supported by Snowflake. Thus it was chosen for database connection management. The only SQLAlchemy feature used by AnyVar is its database connection recycling (via a connection pool) capability.

The existing Snowflake SQL mocks for testing have been replaced by SQLAlchemy mocks for unit testing.

Throw KeyError from __getitem__
The existing implementations of __getitem__ in the storage implementations would return None if a VRS object was not found in the store. This is not compliant with the prescribed behavior described here. The SqlStorage.__getitem__() function now throws a KeyError if the VRS ID is not found in the store.

Consistent 404 behavior from /variation and /locations
The /variation/{id} and /locations/{id} endpoints return either a variation or sequence location from the store. They were inconsistent with each other and also relied on a KeyError to detect a missing object, which was not being thrown. The implementations for these methods were updated so that they behave consistently and always return a 404 if the specified object is not found.

Includes the changes in #92 and should be merged after #92 is merged.

ehclark added 30 commits March 1, 2024 12:49
… RDBMS storage implementations

The base class utilizes SqlAlchemy for connection management and SQL statement execution because it is the only connection pooling library that works with the Snowflake connector.

The base class includes the background db write capabilities from the Snowflake implementation and actual SQL statement execution where standard SQL is used.  Abstract methods are defined for queries where the SQL or database APIs are not standard.
…w SqlStorage base class

Primarily removed code that was included in the base class and reorganized remaining code into the base class API shape.

Because the Snowflake connector only supports SqlAlchemy 1.4 which in turn only supports psycopg2, had to modify the batch insert logic to use a different API.
…orage base class

Removed code that is now included in base class and reorganized remaining code into base class API shape.
…storage implementation

Refactored mocks for SqlAlchemy based testing into separate module
…n_pool are sometimes not created leading to spurious errors on close(). Check for these attributes before attempting to clean them up.
…use when adding new VRS objects to the database
…store does not throw a KeyError on missing key
Merge issue-85 into issue-87-db-conn-pool

Unit tests for storage implementations rely on upstream unit tests
to provide needed data
… vrs_objects table has a primary key and uses "ON CONFLICT" on inserts
…med parameters were not working

Pick up table name from environment in unit tests
…and throw KeyError when an item is not found
…ion used internally is not correct for API responses
@ehclark ehclark linked an issue Apr 23, 2024 that may be closed by this pull request
@ehclark ehclark merged commit 454504f into main Apr 29, 2024
5 checks passed
@ehclark ehclark deleted the issue-87-db-conn-pool branch April 29, 2024 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add database connection lifecycle management
3 participants