
Using semidbm in a shelve object - a code snippet #21

Open
ianozsvald opened this issue Dec 10, 2019 · 0 comments

@ianozsvald

Using Python 3.7's shelve with the default dbm on a Mac, I run into the same size limitation noted at http://jamesls.com/semidbm-a-pure-python-dbm.html (notably "HASH: Out of overflow pages. Increase page size"). I have installed gdbm, but it doesn't show up in my Conda Pythons.
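To check which dbm backends a particular Python can actually import (and therefore which one shelve will end up using), a quick probe like the one below may help. This is a sketch: available_dbm_backends is a hypothetical helper name, and it only tests the three standard-library submodules. When creating a new database, dbm.open tries them in the order dbm.gnu, dbm.ndbm, dbm.dumb.

```python
import importlib


def available_dbm_backends():
    """Hypothetical helper: list the stdlib dbm submodules importable here.

    A missing C extension (e.g. '_gdbm') raises ModuleNotFoundError, which is
    a subclass of ImportError, so it is skipped rather than crashing.
    """
    found = []
    # Same preference order dbm.open uses for new databases.
    for name in ("dbm.gnu", "dbm.ndbm", "dbm.dumb"):
        try:
            importlib.import_module(name)
        except ImportError:
            continue
        found.append(name)
    return found


print(available_dbm_backends())
```

On a Conda Python without _gdbm, "dbm.gnu" would be absent from the list, matching the ModuleNotFoundError described below.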

semidbm came to the rescue, using the code snippet below. The class and function are lifted from Python's shelve.py, with dbm.open swapped for semidbm.open. I see no speed difference, but I can scale to more objects than dbm allowed. gdbm should have provided a similar solution, but I can't get it to work on my Anaconda distribution (for reference, import dbm.gnu raises ModuleNotFoundError: No module named '_gdbm').

Thank you for this package! I hope the snippet below helps others who use shelve on a large dataset.

from shelve import Shelf

import semidbm


class DbfilenameShelfSemidbm(Shelf):
    """Shelf implementation using semidbm as the underlying database.

    This is initialized with the filename for the semidbm database.
    See the shelve module's __doc__ string for an overview of the interface.
    """

    def __init__(self, filename, flag='c', protocol=None, writeback=False):
        # semidbm.open() returns a dict-like object (bytes keys and values)
        # that Shelf wraps, pickling values on the way in and out.
        Shelf.__init__(self, semidbm.open(filename, flag), protocol, writeback)


def open_semidbm(filename, flag='c', protocol=None, writeback=False):
    """Open a persistent dictionary for reading and writing.

    The filename parameter names the underlying semidbm database
    (semidbm stores its data in a directory of that name rather than
    adding an extension to a single file). The optional flag parameter
    has the same interpretation as the flag parameter of semidbm.open().
    The optional protocol parameter specifies the version of the pickle
    protocol.

    See the module's __doc__ string for an overview of the interface.
    """

    return DbfilenameShelfSemidbm(filename, flag, protocol, writeback)
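The subclass works because shelve.Shelf only needs a mapping that stores bytes keys and bytes values (plus close() and, optionally, sync()). A minimal sketch of the same idea, generalised to any dbm-style opener, is below. open_shelf_with is a hypothetical helper, and the demo uses the standard library's dbm.dumb.open as a stand-in so it runs without extra installs; semidbm.open is the opener the snippet above plugs in.

```python
import dbm.dumb
import os
import tempfile
from shelve import Shelf


def open_shelf_with(opener, filename, flag="c", protocol=None, writeback=False):
    """Hypothetical helper: build a Shelf on top of any dbm-style opener.

    `opener` must accept (filename, flag) and return a mapping of bytes keys
    to bytes values, e.g. dbm.dumb.open or semidbm.open.
    """
    return Shelf(opener(filename, flag), protocol, writeback)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example")
    # Shelf is a context manager: leaving the block closes the underlying db.
    with open_shelf_with(dbm.dumb.open, path) as db:
        db["numbers"] = list(range(5))
    with open_shelf_with(dbm.dumb.open, path, flag="r") as db:
        print(db["numbers"])  # prints [0, 1, 2, 3, 4]
```

Passing the opener in keeps the backend choice in one place, so switching between dbm and semidbm for comparison is a one-argument change.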

Timing on a smaller client task (small enough to avoid the HASH error above):

  • dbm inside a default shelve: 1m20s
  • semidbm inside the derived shelve: 1m20s (the same as dbm)
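A comparison like the one above can be reproduced with a small harness that writes and reads back the same objects through each shelf factory. This is a sketch, not the benchmark used for the numbers above: time_shelf_roundtrip is a hypothetical name, and the payload shape and item count are arbitrary assumptions.

```python
import os
import shelve
import tempfile
import time


def time_shelf_roundtrip(open_shelf, n_items=1_000):
    """Hypothetical benchmark: seconds to write then read back n_items.

    `open_shelf` is any callable taking a path and returning a Shelf, such as
    shelve.open or the open_semidbm function from the snippet above.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench")
        start = time.perf_counter()
        with open_shelf(path) as db:
            for i in range(n_items):
                # Arbitrary payload; real workloads will differ in size.
                db[str(i)] = {"id": i, "payload": "x" * 100}
        with open_shelf(path) as db:
            for i in range(n_items):
                assert db[str(i)]["id"] == i
        return time.perf_counter() - start


print(f"default shelve: {time_shelf_roundtrip(shelve.open):.3f}s")
```

Running it once with shelve.open and once with open_semidbm, on a dataset large enough to matter, gives a like-for-like comparison of the two backends.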