Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database batch update API #81

Merged
merged 5 commits into from
Jan 30, 2025
Merged

Database batch update API #81

merged 5 commits into from
Jan 30, 2025

Conversation

disinvite
Copy link
Collaborator

Updating the sqlite db using executemany and generator functions is a necessary evil for performance. This change abstracts away the implementation details and provides a much friendlier interface:

# Before:

self._db.bulk_recomp_insert(
    ((addr, {"name": values["name"]}) for addr, values in dataset.items()),
    upsert=True,
)
self._db.bulk_match(
    (
        (orig_addr, recomp_addr)
        for recomp_addr, orig_addr in orig_by_recomp.items()
    )
)


# After:

with self._db.batch() as batch:
    # Assume we are in a loop
        batch.set_recomp(recomp_addr, name="Test")
        batch.match(orig_addr, recomp_addr)

You can use the batch object with the context manager as shown, or manually call batch.commit().

The batch object provides five methods. When you commit, the changes are applied in this order (regardless of what order the methods were called):

  • insert_orig and insert_recomp will set an entity in the database, but only if there is no existing data. (i.e. INSERT.)
  • set_orig and set_recomp will insert or update data for the entity. (i.e. INSERT or REPLACE.)
  • match will combine an orig address with a recomp address if neither is already matched.

The other big change is that matching will now combine two entities if both the orig and recomp addresses already exist but neither are matched. Until now, it was only possible to match by setting an unused orig address on an existing recomp entity. This will finally allow us to have inserting data and matching entities as two distinct operations. I have new modularized match functions mostly done in the stash. We can add these incrementally and now finally have some good tests on the matching behavior.

I only used the batch in a few places in core.py for now. These new matching functions will use it extensively and we can refactor the existing code that works with thunks, vtordisps, and strings.

@disinvite disinvite requested review from jonschz and madebr January 27, 2025 16:19
return self

def __exit__(self, exc_type, exc_value, exc_traceback):
self.commit()
Copy link
Collaborator

@madebr madebr Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.python.org/3/reference/datamodel.html#object.exit

This will unconditionally commit on exit. Shouldn't it roll back, or cancel the operations when the exception argument is non-null?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also remember that there is some helper function and syntax trick involving yield to efficiently write these scopes. Can't remember from the top of my head, though

Copy link
Collaborator

@madebr madebr Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also remember that there is some helper function and syntax trick involving yield to efficiently write these scopes. Can't remember from the top of my head, though

You're probably thinking about contextlib.contextmanager (or similar functions in that module)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you. I added some basic handling to __exit__.

commit might be a poor name because the batch operation itself is only loosely atomic. Only the bulk_match function uses a SQL transaction because it requires two statements.

@disinvite
Copy link
Collaborator Author

I just added a SQL transaction to batch.commit() and some tests to demonstrate. I checked on my project branch (with upcoming changes that use this API more) and the performance difference of doing it all in a transaction seems negligible.

@jonschz
Copy link
Collaborator

jonschz commented Jan 29, 2025

I am rather busy at the moment. Not sure if I can review before Sunday. If everything works and @madebr hasn't found anything else, feel free to merge.

@disinvite disinvite merged commit 0fbd24c into isledecomp:master Jan 30, 2025
11 checks passed
@disinvite disinvite deleted the batch2 branch January 30, 2025 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants