Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Access to secondary databases #570

Open
ederag opened this issue Feb 21, 2020 · 6 comments
Open

Access to secondary databases #570

ederag opened this issue Feb 21, 2020 · 6 comments

Comments

@ederag
Copy link
Collaborator

ederag commented Feb 21, 2020

There is a need to access other hamster databases as well, not only the main hamster.db,
e.g. for merging a foreign database into the main one (#340).
As recently discussed, this is not possible using the org.gnome.Hamster D-Bus service,
which is tied to the main database.

One possibility:

  1. add an option (--no-dbus ?) to hamster-cli to bypass D-Bus and use db.Storage directly.
  2. add an option (--db-file=<filename>) to hamster-cli.
  3. Edit: Still not fully sorted out, but the following one was probably being over-cautious,
    as correctly indicated by @GeraldJansen below.

    Of course this access is risky
    (there is no central server to avoid simultaneous write access in this case),
    so if the server is running, it should be forbidden to access the server database.
    (that one will remain hard-coded)

Still thinking, open to discussion.

@GeraldJansen
Copy link
Contributor

It seems to me that you underestimate sqlite3's capabilities for handling concurrent requests from multiple processes and overestimate the risk associated with accessing hamster.db directly, by-passing the hamster-service D-Bus interface (eg. here). From a bit of reading, for example

in my assessment, considering the small size our hamster.db files, the processing speed of even oldish laptops/desktops and the low likelihood of concurrent write requests, the "risk" associated with accessing hamster.db without using the D-Bus interface is extremely low.

@ederag
Copy link
Collaborator Author

ederag commented Feb 23, 2020

Thanks for the links ! Lots of food for thought.

@ederag ederag mentioned this issue Mar 1, 2020
@ederag
Copy link
Collaborator Author

ederag commented Mar 1, 2020

The links were spot-on,
and reading carefully is fully convincing that sqlite is robust and perfect for hamster,
sql writes from different processes just queue up.
(timeout is possible, but the database integrity remains)
It's not a matter of probabilities, but of how sqlite works.
What a relief, thanks Gerald !

Next step is to check that the start_transaction / end_transaction blocs are enough
to protect from database changes during python code execution,
in particular this call in the middle of __add_fact__ should be checked:

self.__solve_overlaps(start_time, end_time)

Not saying there is an issue, just that it deserves to be checked twice.

The first point is almost done (PR #573).
It will be used to test the second one, which should be simpler.

@matthijskooijman
Copy link
Member

I actually wonder if a --no-dbus option is needed at all for this? I can imagine two flavours of "access to secondary databases":

  1. Running hamster normally, with multiple databases active (e.g. loading data from two databases, writing data to one, or maybe other ways to decide where to write data).
  2. Running a one-off "merge data" process.

I think the point of this issue is option 2. As for option 1, this would probably require invasive changes, which would be better suited to include in the rewrite rather than the current codebase, I believe. Also, if this would be supported usecase, I think it should actually access both databases through dbus (maybe even present them as a single db somehow), rather than bypassing dbus for this usecase.

As for option 2., I wonder if this is not something that can be implemented already? I would think you can create a script that creates two instances of hamster.storage.db.Storage and merges them into either one (or a new db) without accessing dbus at all? Or maybe even more elegant, a script that creates one extra storage and merges that into the main storage accessed through dbus. I can imagine that this needs some refactoring for code that currently assumes there is a dbus connection available and/or assumes there is only one storage available, but I expect that this would need a lot less than what #573 implements.

@GeraldJansen
Copy link
Contributor

Yes, the point of this issue is option 2, with the proviso that "one-off" could include frequent (say daily) synchronization of databases (say work PC and home PC).

An external script (something like htool) is okay for the one-off case, but seriously risks obscurity and abandonment. Integrating that functionality into hamster-cli.py would be better. All that is needed is something like

hamster --no-dbus --db-file=<some/hamster.db> export tsv [date1 [time1]] [date2 [time2]] > some.tsv
hamster --no-dbus --db-file=<other/hamster.db> load tsv some.tsv

A lot of flexibility is gained if the whole CLI can use hamster.storage.db.Storage directly instead of the dbus interface, thereby avoiding having to be careful about stopping dbus services tied to a specific database file. This would also very convenient for mock databases for development and testing, so you don't have to continuously be worried about corrupting your precious "real" hamster.db.

@matthijskooijman
Copy link
Member

Yes, the point of this issue is option 2, with the proviso that "one-off" could include frequent (say daily) synchronization of databases (say work PC and home PC).

Ah, good point. I was thinking about an "importing data from an old machine" kind of setup, but tracking on multiple machines indeed makes sense.

And indeed, just using separate export and load commands seems like an elegant way to support this usecase, while also allowing the user to do some preprocessing before import if needed.

Of course, you can still do a slightly less flexible version of this without --no-dbus, e.g.:

hamster importdb <some/hamster.db>

which would import the contents of the given db into the main db (accessed through dbus). This could be implemented by creating a secondary db.Storage next to the main client.Storage, as I previously suggested.

However, I like the elegancy and flexibility of importing using a load tsv (or similar) command.

Thanks for adding some examples :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants