Using `semidbm` in a `shelve` object - a code snippet #21

ianozsvald · 2019-12-10T12:15:29Z

Using Python 3.7's `shelve` with the default dbm on a Mac, I run into the same size limitation noted here http://jamesls.com/semidbm-a-pure-python-dbm.html (notably `HASH: Out of overflow pages. Increase page size`). I have installed gdbm, but it does not show up in my Conda Pythons.
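
For anyone who wants to check which dbm backends their own interpreter can actually import, a minimal sketch follows. The function name `available_dbm_backends` is made up for illustration, and the list of candidate modules is an assumption limited to the standard-library backends that existed in Python 3.7.

```python
import importlib


def available_dbm_backends():
    """Return the names of the stdlib dbm backends that import cleanly."""
    # Candidate list is an assumption: the stdlib backends as of Python 3.7.
    candidates = ["dbm.gnu", "dbm.ndbm", "dbm.dumb"]
    found = []
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError, so a missing
            # C extension such as _gdbm lands here.
            continue
        found.append(name)
    return found


if __name__ == "__main__":
    print(available_dbm_backends())
```

If `dbm.gnu` is missing from the result, the interpreter was built without the `_gdbm` extension, whether or not gdbm is installed on the system.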

semidbm came to the rescue with the following code snippet. The class and function are lifted directly from Python's shelve.py. I see no speed difference, but I do see an ability to scale to more objects that dbm lacked. gdbm should have provided a similar solution, but on my Anaconda distribution I can't get it to work (for reference, `import dbm.gnu` generates `ModuleNotFoundError: No module named '_gdbm'`).

Thank you for this package! I hope that the snippet below helps others who use shelve on a large dataset.

from shelve import Shelf

import semidbm  # third-party package, not part of the standard library


class DbfilenameShelfSemidbm(Shelf):
    """Shelf implementation using semidbm instead of the generic dbm interface.

    This is initialized with the filename for the semidbm database.
    See the shelve module's __doc__ string for an overview of the interface.
    """

    def __init__(self, filename, flag='c', protocol=None, writeback=False):
        # shelve.py calls dbm.open() here; semidbm.open() takes its place.
        Shelf.__init__(self, semidbm.open(filename, flag), protocol, writeback)


def open_semidbm(filename, flag='c', protocol=None, writeback=False):
    """Open a persistent dictionary for reading and writing.

    The filename parameter is the base filename for the underlying
    database.  As a side-effect, an extension may be added to the
    filename and more than one file may be created.  The optional flag
    parameter has the same interpretation as the flag parameter of
    dbm.open(). The optional protocol parameter specifies the
    version of the pickle protocol.

    See the module's __doc__ string for an overview of the interface.
    """

    return DbfilenameShelfSemidbm(filename, flag, protocol, writeback)
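
A short usage sketch of the same idea, for anyone who wants to try it quickly. It passes the opened database straight to `Shelf` rather than going through the class above, so that it stands alone. The `roundtrip` helper and the sample record are made up for illustration, and the fallback to `dbm.dumb` is my own addition so the example still runs where semidbm is not installed.

```python
import os
import tempfile
from shelve import Shelf

try:
    import semidbm  # third-party package
    open_db = semidbm.open
except ImportError:
    # Assumption for this sketch only: fall back to the pure-Python stdlib
    # backend so the example runs without semidbm.
    import dbm.dumb
    open_db = dbm.dumb.open


def roundtrip(path, key, value):
    """Store value under key, close the shelf, reopen it and read it back."""
    with Shelf(open_db(path, 'c')) as shelf:
        shelf[key] = value  # Shelf pickles the value and encodes the key
    with Shelf(open_db(path, 'c')) as shelf:
        return shelf[key]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        record = {"id": 1, "tags": ["a", "b"]}  # hypothetical sample data
        print(roundtrip(os.path.join(tmp, "demo_shelf"), "first", record))
```

`Shelf` only needs a dict-like object with bytes keys and values, which is why swapping the backend takes so little code.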

Timing on a smaller client task (prior to the HASH error above):

dbm inside a default shelve 1m20
semidbm inside the derived shelve 1m20 (the same as dbm)
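
For anyone who wants to repeat a comparison like this on their own data, here is a rough timing sketch. It is not the client task measured above: the helper name `time_shelf_writes`, the record shape and the item count are all invented for illustration.

```python
import os
import shelve
import tempfile
import time


def time_shelf_writes(open_shelf, path, n_items=1000):
    """Return the seconds taken to write n_items small records and close."""
    start = time.perf_counter()
    shelf = open_shelf(path)
    try:
        for i in range(n_items):
            # Hypothetical record; substitute objects from the real workload.
            shelf[str(i)] = {"id": i, "payload": "x" * 100}
    finally:
        shelf.close()
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        elapsed = time_shelf_writes(shelve.open, os.path.join(tmp, "bench_dbm"))
        print(f"default shelve: {elapsed:.3f}s")
        # Pass open_semidbm from the snippet above in place of shelve.open
        # to time the semidbm-backed shelf on the same workload.
```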
