Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database updates should be performed thread safe #195

Open
devsnd opened this issue Dec 26, 2012 · 7 comments
Open

database updates should be performed thread safe #195

devsnd opened this issue Dec 26, 2012 · 7 comments
Labels

Comments

@devsnd
Copy link
Owner

devsnd commented Dec 26, 2012

The database on the testserver was corrupted, just like @6arms1leg reported once. I guess this is because of us not respecting the thread-safety for the database: As the process can be killed at any time, it seems that the database can become corrupted, if we don't pay attention to the changes we make in different threads, even when the tables don't seem to overlap or anything.

@tilboerner
Copy link
Collaborator

Yeah, this concerns the other databases, too. I'm still secretly toying around with ideas for a database module, and as it happens, I read up on sqlite thread safety only yesterday. I became a little concerned, but I think I found just the thing to deal with it: thread-local database connections.

I don't have much time right now, but I'm almost done with what might be a decent half of a solution that I'll commit, and which one of us can pick up and run with. Might be me, but probably not this week.

@tilboerner
Copy link
Collaborator

see also 5855234 (moved Table* classes to new database module)

@tilboerner
Copy link
Collaborator

SQLDatabase is only sitting there right now, unused, but if everyone accesses databases through it, threads won't share a connection anymore. CherryPy reuses threads via pooling, right? Otherwise it'd probably make more sense to have a connection pool instead...

I haven't done much systematic testing, especially not if there are any situations where connections remain open. I'm relying on sqlite3 to clean up after itself. The C sources of sqlite3.Connection explicitly close open connections on de-allocation, so I think it should work.

The question remains what really led to the corruption of the db files. I'd feel better if we could reliably reproduce the scenario that does it.

@tilboerner
Copy link
Collaborator

Solution: create a global lockfile during updates that prevents further updates from being run concurrently. In this case, an uncaught exception will be raised by SQLiteCache.

Also prevents starting multiple updates from the web interface by clicking the button more than once.

@tilboerner
Copy link
Collaborator

Not a good solution, after all. It works for the file db, but we do concurrent writes to the other databases, too. To be safe, there should be a solution that works for all cases. I see the following options:

  • Threadlocal connections. Since cherrypy pools its threads (it does, right?), this should be feasible, provided cleanup works reliably when the threads die, even in pypy.
  • Open a new connection for each bunch of associated database ops. Might be doable because the overhead of creating an sqlite connection is comparably light; the performance penalty might increase if we decide to switch to another DBMS at some point.
  • A singleton accessor for database ops in a dedicated thread. Interested threads could send messages containing tasks, with the option to wait for an answer.

With the exception of "fire and forget" writes in the last option, all operations are potentially blocking for the caller, since they can encounter locks. Therefore, there need to be reasonable timeouts which are handled gracefully when they occur.

EDIT: Found this write-up about Multi-threaded Access to an SQLite3 Database. Adds connection pooling as an option, pretty much rules out option 2, and seems to go with option 3 in the end.

@devsnd
Copy link
Owner Author

devsnd commented Apr 5, 2013

A singleton accessor for database ops in a dedicated thread. Interested threads could send messages containing tasks, with the option to wait for an answer.

👍 I like that option best: Not too hard to implement and is reasonable clear.
What I like most: If we'd change the database backend someday, it leaves the option to reenable concurrent access to the database for specific backends and so forth.

I don't know why, but I for me message passing is something like optocouplers; Almost impossible to short circuit stuff; never a bad idea. 🔌

@tilboerner
Copy link
Collaborator

Not too hard to implement

... and not too easy, either. For file db updates, we want batch commits of changes that rely on intermediate results, while taking care not to starve anybody else; other databases need transactions around well-defined sets of statements.

Interesting! 🍕 🍴

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants