Integrate SQLAlchemy for db conn management and introduce new SqlStorage abstraction #93
Merged
… RDBMS storage implementations. The base class utilizes SQLAlchemy for connection management and SQL statement execution, because it is the only connection pooling library that works with the Snowflake connector. The base class includes the background db write capabilities from the Snowflake implementation and the actual SQL statement execution where standard SQL is used. Abstract methods are defined for queries where the SQL or database APIs are not standard.
…w SqlStorage base class. Primarily removed code that is now included in the base class and reorganized the remaining code into the base class API shape. Because the Snowflake connector only supports SQLAlchemy 1.4, which in turn only supports psycopg2, the batch insert logic had to be modified to use a different API.
…orage base class. Removed code that is now included in the base class and reorganized the remaining code into the base class API shape.
…storage implementation. Refactored mocks for SQLAlchemy-based testing into a separate module.
…n_pool are sometimes not created, leading to spurious errors on close(). Check for these attributes before attempting to clean them up.
…e a string or a dict
…use when adding new VRS objects to the database
…store does not throw a KeyError on missing key
Merge issue-85 into issue-87-db-conn-pool. Unit tests for storage implementations rely on upstream unit tests to provide the needed data.
… vrs_objects table has a primary key and uses "ON CONFLICT" on inserts (see the sketch after this list)
…med parameters were not working. Pick up the table name from the environment in unit tests.
…and throw KeyError when an item is not found
…ion used internally is not correct for API responses
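The commit items above mention writing to the vrs_objects table with named bind parameters, an "ON CONFLICT" clause, and a table name picked up from the environment. A minimal sketch of that pattern, assuming SQLAlchemy 1.4 with psycopg2; the environment variable, table name, and column names here are illustrative, not the project's actual identifiers:

```python
import json
import os

from sqlalchemy import create_engine, text

# Hypothetical names for illustration; the real env var, table, and columns may differ.
TABLE_NAME = os.environ.get("ANYVAR_SQL_STORE_TABLE_NAME", "vrs_objects")

engine = create_engine("postgresql+psycopg2://anyvar:anyvar@localhost/anyvar")

# Named bind parameters (:vrs_id, :vrs_object) with an "ON CONFLICT DO NOTHING"
# clause so re-inserting an existing VRS ID is a no-op instead of a primary key
# violation.
insert_stmt = text(
    f"INSERT INTO {TABLE_NAME} (vrs_id, vrs_object) "
    "VALUES (:vrs_id, :vrs_object) "
    "ON CONFLICT (vrs_id) DO NOTHING"
)

with engine.begin() as conn:
    conn.execute(
        insert_stmt,
        {"vrs_id": "ga4gh:VA.example", "vrs_object": json.dumps({"type": "Allele"})},
    )
```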
jsstevenson approved these changes Apr 26, 2024
korikuzma approved these changes Apr 29, 2024
Resolves #87
This started as a simple change to add a database connection management library so that AnyVar, when running as a server for long periods of time, does not fail due to stale database connections. It evolved into a larger refactoring of the Snowflake and Postgres storage implementations.
New SqlStorage base class
Given the number of changes required by the slight differences between the DB APIs for Snowflake/Postgres and SQLAlchemy, it made sense to create a new SqlStorage base class that both the Snowflake and Postgres storage implementations descend from. The SqlStorage implementation incorporates the background write feature from the Snowflake implementation, but by default it still behaves as before, flushing writes before exiting a batch manager context.

The SnowflakeObjectStore and PostgresObjectStore storage classes implement the low-level database operations where differences in SQL semantics exist. The SnowflakeObjectStore implementation also now supports a setting that controls what type of SQL statement is used to write new VRS objects; this setting allows trade-offs between uniqueness of records in the VRS objects tables and overall database throughput.
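A minimal sketch of the class hierarchy described above. The method names (add_one_item, fetch_vrs_object) are stand-ins, not the actual SqlStorage API: the base class owns shared behavior such as flushing background writes, while the subclasses supply the database-specific SQL.

```python
from abc import ABC, abstractmethod


class SqlStorage(ABC):
    """Shared RDBMS storage logic: connection handling, background writes,
    and standard-SQL statement execution (illustrative shape only)."""

    def wait_for_writes(self) -> None:
        # Hypothetical hook: flush pending background writes so a batch
        # manager context completes its writes before exiting.
        ...

    @abstractmethod
    def add_one_item(self, db_conn, name: str, value) -> None:
        """Insert a single VRS object; SQL differs between databases."""

    @abstractmethod
    def fetch_vrs_object(self, db_conn, vrs_id: str):
        """Fetch a VRS object by ID."""


class PostgresObjectStore(SqlStorage):
    def add_one_item(self, db_conn, name, value):
        ...  # e.g. "INSERT ... ON CONFLICT DO NOTHING"

    def fetch_vrs_object(self, db_conn, vrs_id):
        ...


class SnowflakeObjectStore(SqlStorage):
    def add_one_item(self, db_conn, name, value):
        ...  # MERGE or plain INSERT, depending on the configured write mode

    def fetch_vrs_object(self, db_conn, vrs_id):
        ...
```

Keeping the dialect-specific SQL behind abstract methods lets the Snowflake write-mode setting vary independently of the shared connection and batching logic.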
SQLAlchemy integration
SQLAlchemy is a database abstraction layer and ORM. It is also the only database connection management/pooling library supported by Snowflake, so it was chosen for connection management. The only SQLAlchemy feature used by AnyVar is its connection recycling (via a connection pool) capability.
The existing Snowflake SQL mocks for testing have been replaced by SQLAlchemy mocks for unit testing.
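A sketch of the connection recycling described above; the database URL and recycle interval are placeholders. Connections older than pool_recycle seconds are transparently replaced, which is what keeps a long-running server from failing on stale connections.

```python
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://anyvar:anyvar@localhost/anyvar",  # assumed URL
    pool_size=5,
    pool_recycle=3600,   # replace connections older than an hour
    pool_pre_ping=True,  # optionally validate a connection before handing it out
)

# Each connect() checks a pooled connection out; stale ones are recycled first.
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
```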
Throw KeyError from __getitem__
The existing implementations of __getitem__ in the storage classes returned None when a VRS object was not found in the store, which is not compliant with the prescribed __getitem__ behavior. SqlStorage.__getitem__() now throws a KeyError if the VRS ID is not found in the store.
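A minimal sketch of the new behavior; fetch_vrs_object is a stand-in for whatever query the concrete store performs, not the actual method name.

```python
class SqlStorage:
    def fetch_vrs_object(self, vrs_id: str):
        # Stand-in for the subclass-specific SELECT; returns None when absent.
        return None

    def __getitem__(self, vrs_id: str):
        result = self.fetch_vrs_object(vrs_id)
        if result is None:
            # Mapping semantics: a missing key raises KeyError instead of
            # silently returning None.
            raise KeyError(vrs_id)
        return result
```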
Consistent 404 behavior from /variation and /locations
The /variation/{id} and /locations/{id} endpoints return either a variation or a sequence location from the store. They were inconsistent with each other and relied on a KeyError to detect a missing object, which was not being thrown. The implementations of these endpoints were updated so that they behave consistently and always return a 404 if the specified object is not found.

Includes the changes in #92 and should be merged after #92 is merged.
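A sketch of the consistent 404 handling, assuming a FastAPI-style endpoint; the storage dict stands in for the configured object store, and /locations/{id} follows the same pattern.

```python
from fastapi import FastAPI, HTTPException

app = FastAPI()
storage: dict = {}  # stand-in for the configured SqlStorage instance


@app.get("/variation/{vrs_id}")
def get_variation(vrs_id: str):
    try:
        return {"data": storage[vrs_id]}
    except KeyError:
        # A missing object now surfaces as KeyError and is mapped to 404,
        # so both endpoints respond the same way when the object is absent.
        raise HTTPException(status_code=404, detail=f"{vrs_id} not found")
```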