Skip to content

A performant, parallel, probabilistic, random acyclic-graph, low-latency, perfect hash generation library.

License

Notifications You must be signed in to change notification settings

tpn/perfecthash

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Perfect Hash

Appveyor Badge

A library for creating perfect hash tables.

This README is a work-in-progress.

Usage

PerfectHashBulkCreate.exe

Usage: PerfectHashBulkCreate.exe
    <KeysDirectory> <OutputDirectory>
    <Algorithm> <HashFunction> <MaskFunction>
    <MaximumConcurrency>
    [BulkCreateFlags] [KeysLoadFlags] [TableCreateFlags]
    [TableCompileFlags] [TableCreateParameters]

Bulk Create Flags:

    --SkipTestAfterCreate

        Normally, after a table has been successfully created, it is tested.
        Setting this flag disables this behavior.

        N.B. This will also disable benchmarking, so no performance information
             will be present in the .csv output file.

    --Compile

        Compiles the table after creation.

        N.B. msbuild.exe must be on the PATH environment variable for this
             to work.  (The current error message if this isn't the case is
             quite cryptic.)

Keys Load Flags:

    --TryLargePagesForKeysData

        Tries to allocate the keys buffer using large pages.

    --SkipKeysVerification

        Skips the logic that enumerates all keys after loading and a) ensures
        they are sorted, and b) constructs a keys bitmap.  If you can be certain
        the keys are sorted, specifying this flag can provide a speedup when
        loading large key sets.

    --DisableImplicitKeyDownsizing

        When loading keys that are 64-bit (8 bytes), a bitmap is kept that
        tracks whether or not a given bit was seen across the entire key set.
        After enumerating the set, the number of zeros in the bitmap are
        counted; if this number is less than or equal to 32, it means that the
        entire key set can be compressed into 32-bit values with some parallel
        bit extraction logic (i.e. _pext_u64()).  As this has beneficial size
        and performance implications, when detected, the key load operation will
        implicitly heap-allocate another array and convert all the 64-bit keys
        into their unique 32-bit equivalent.  Specifying this flag will disable
        this behavior.

    --TryInferKeySizeFromKeysFilename

        The default key size is 32-bit (4 bytes).  When this flag is present,
        if the keys file name ends with "64.keys" (e.g. "foo64.keys"), the key
        size will be interpreted as 64-bit (8 bytes).  This flag takes
        precedence over the table create parameter --KeySizeInBytes.

Table Create Flags:

    --Silent

        Disables console printing of the dots, dashes and other characters used
        to (crudely) visualize the result of individual table create operations.

    --NoFileIo

        Disables writing of all files when a perfect hash solution has been
        found.  The only time you would use this flag from the console
        application is to observe the performance of table creation without
        performing any file I/O operations.

    --Paranoid

        Enables redundant checks in the routine that determines whether or not
        a generated graph is acyclic.

    --FirstGraphWins | --FindBestGraph

        --FirstGraphWins [default]

            This is the default behavior.  When searching for solutions in
            parallel, the first graph to be found, "wins".  i.e. it's the
            solution that is subsequently written to disk.

        --FindBestGraph

            Requires the following two table create parameters to be present:

                --BestCoverageAttempts=N
                --BestCoverageType=<CoverageType>

            The table create routine will then run until it finds the number of
            best coverage attempts specified.  At that point, the graph that was
            found to be the "best" based on the coverage type predicate "wins",
            and is subsequently saved to disk.

            N.B. This option is significantly more CPU intensive than the
                 --FirstGraphWins mode, however, it is highly probable that it
                 will find a graph that is better (based on the predicate) than
                 when in first graph wins mode.

    --SkipMemoryCoverageInFirstGraphWinsMode

        Skips calculating memory coverage information when in "first graph wins"
        mode.  This will result in the corresponding fields in the .csv output
        indicating 0.

    --SkipGraphVerification

        When set, skips the internal graph verification check that ensures a
        valid perfect hash solution has been found (i.e. with no collisions
        across the entire key set).

    --OmitCsvRowIfTableCreateFailed

        When present, omits writing a row in the .csv output file if table
        creation fails for a given keys file.

    --OmitCsvRowIfTableCreateSucceeded

        When present, omits writing a row in the .csv output file if table
        creation succeeded for a given keys file.

    --IndexOnly

        When set, affects the generated C files by defining the C preprocessor
        macro CPH_INDEX_ONLY, which results in omitting the compiled perfect
        hash routines that deal with the underlying table values array (i.e.
        any routine other than Index(); e.g. Insert(), Lookup(), Delete() etc),
        as well as the array itself.  This results in a size reduction of the
        final compiled perfect hash binary.  Additionally, only the .dll and
        BenchmarkIndex projects will be built, as the BenchmarkFull and Test
        projects require access to a table values array.  This flag is intended
        to be used if you only need the Index() routine and will be managing the
        table values array independently.

    --UseRwsSectionForTableValues

        When set, tells the linker to use a shared read-write section for the
        table values array, e.g.: #pragma comment(linker,"/section:.cphval,rws")
        This will result in the table values array being accessible across
        multiple processes.  Thus, the array will persist as long as one process
        maintains an open section (mapping); i.e. keeps the .dll loaded.

    --UseNonTemporalAvx2Routines

        When set, uses implementations of RtlCopyPages and RtlFillPages that
        use non-temporal hints.

    --IgnorePreviousTableSize

        When set, ignores any previously-recorded table sizes associated with
        the keys file for the given algorithm, hash function and masking type.

        N.B. To forcibly delete all previously-recorded table sizes from all
             keys in a directory, the following PowerShell snippet can be used:

             PS C:\Temp\keys> Get-Item -Path *.keys -Stream *.TableSize | Remove-Item

    --IncludeNumberOfTableResizeEventsInOutputPath

        When set, incorporates the number of table resize events encountered
        whilst searching for a perfect hash solution into the final output
        names, e.g.:

            C:\Temp\output\KernelBase_2485_1_Chm01_Crc32Rotate_And\...
                                           ^
                                           Number of resize events.

    --IncludeNumberOfTableElementsInOutputPath

        When set, incorporates the number of table elements (i.e. the final
        table size) into the output path, e.g.:

            C:\Temp\output\KernelBase_2485_16384_Chm01_Crc32Rotate_And\...
                                           ^
                                           Number of table elements.

        N.B. These two flags can be combined, yielding a path as follows:

            C:\Temp\output\KernelBase_2485_1_16384_Chm01_Crc32Rotate_And\...

        N.B. It is important to understand how table resize events impact the
             behavior of this program if one or both of these flags are present.
             Using the example above, the initial path that will be used for
             the solution will be:

                C:\Temp\output\KernelBase_2485_0_8192_Chm01_Crc32Rotate_And\...

             After the maximum number of attempts are reached, a table resize
             event occurs; the new path component will be:

                C:\Temp\output\KernelBase_2485_1_16384_Chm01_Crc32Rotate_And\...

             However, the actual renaming of the directory is not done until
             a solution has been found and all files have been written.  If
             this program is being run repeatedly, then the target directory
             will already exist.  This complicates things, as, unlike files,
             we can't just replace an existing directory with a new one.

             There are two ways this could be handled: a) delete all the
             existing files under the target path, then delete the directory,
             then perform the rename, or b) move the target directory somewhere
             else first, preserving the existing contents, then proceed with
             the rename.

             This program takes the latter approach.  The existing directory
             will be moved to:

                C:\Temp\output\old\KernelBase_1_16384_Chm01_Crc32Rotate_And_2018-11-19-011023-512\...

             The timestamp appended to the directory name is derived from the
             existing directory's creation time, which should ensure uniqueness.
             (In the unlikely situation the target directory already exists in
             the old subdirectory, the whole operation is aborted and the table
             create routine returns a failure.)

             The point of mentioning all of this is the following: when one or
             both of these flags are routinely specified, the number of output
             files rooted in the output directory's 'old' subdirectory will grow
             rapidly, consuming a lot of disk space.  Thus, if the old files are
             not required, it is recommended to regularly delete them manually.

Table Compile Flags:

    N/A

Table Create Parameters:

    --ValueSizeInBytes=4|8

        Sets the size, in bytes, of the value element that will be stored in the
        compiled perfect hash table via Insert().  Defaults to 4 bytes (ULONG).

    --MainWorkThreadpoolPriority=<High|Normal|Low> [default: Normal]
    --FileWorkThreadpoolPriority=<High|Normal|Low> [default: Normal]

        Sets the main work (i.e. the CPU-intensive graph solving) threadpool
        priority, or the file work threadpool priority, to the given value.

    --AttemptsBeforeTableResize=N [default = 18]

        Specifies the number of attempts at solving the graph that will be made
        before a table resize event will occur (assuming that resize events are
        permitted, as per the following flag).

    --MaxNumberOfTableResizes=N [default = 5]

        Maximum number of table resizes that will be permitted before giving up.

    --BestCoverageAttempts=N

        Where N is a positive integer, and represents the number of attempts
        that will be made at finding a "best" graph (based on the best coverage
        type requested below) before the create table routine returns.

    --BestCoverageType=<CoverageType>

        Indicates the predicate to determine what constitutes the best graph.

        Valid coverage types:

            HighestNumberOfEmptyCacheLines

                This predicate is based on the notion that a high number of
                empty cache lines implies a lower number of cache lines are
                required for the table data, which means better clustering of
                table data, which could result in fewer cache misses, which
                would yield greater performance.

            HighestNumberOfEmptyPages
            HighestNumberOfEmptyLargePages

                As above, but for pages and large pages, respectively.

Console Output Character Legend

 Char | Meaning

    .   Table created successfully.

    -   Table resize event occured.

    x   Failed to create a table.  The maximum number of attempts at trying to
        solve the table at a given size was reached, and no more resize attempts
        were possible (due to the maximum resize limit also being hit).

    F   Failed to create a table due to a target not being reached by a specific
        number of attempts or time duration.  Not yet implemented.

    *   None of the worker threads were able to allocate sufficient memory to
        attempt solving the graph.

    !   The system is out of memory.

    L   The system is running low on memory (a low memory event is triggered
        at about 90% RAM usage).  In certain situations we can detect this
        situation prior to actually running out of memory; in these cases,
        we abort the current table creation attempt (which will instantly
        relieve system memory pressure).

    V   The graph was created successfully, however, we weren't able to allocate
        enough memory for the table values array in order for the array to be
        used after creation.  This can be avoided by omitting --TestAfterCreate.

    T   The requested number of table elements was too large (exceeded 32 bits).

    S   A shutdown event was received.  This shouldn't be seen unless externally
        signaling the named shutdown event associated with a context.

Algorithms:

   ID | Name
    1   Chm01

Hash Functions:

   ID | Name (Number of Seeds)
    1   Crc32Rotate15 (2)
    2   Jenkins (2)
    3   JenkinsMod (2)
    4   RotateXor (4)
    5   AddSubXor (4)
    6   Xor (2)
    7   Scratch (4)
    8   Crc32RotateXor (3)
    9   Crc32 (2)
   10   Djb (2)
   11   DjbXor (2)
   12   Fnv (2)
   13   Crc32Not (2)
   14   Crc32RotateX (3)
   15   Crc32RotateXY (3)
   16   Crc32RotateWXYZ (3)

N.B. The lowest latency hash functions with good solving ability, in order of
     ascending latency, are: Crc32RotateX, Crc32RotateXY, Crc32RotateWXYZ.
     You should try these hash functions first and see if a solution can be
     found without a table resize occurring.  Failing that, the Jenkins routine
     has been observed to be the least likely to require a table resize on a
     given key set -- however, it does have the highest latency of all the
     hash functions above (anywhere from 7x-10x the latency of Crc32RotateX).

     (The difference in latency between the X, XY and WXYZ functions is minimal;
      only a few cycles.)

Mask Functions:

  ID | Name
   1   Modulus (does not work!)
   2   And

PerfectHashCreate.exe

Creates a single perfect hash table. Provides the ability to specify a subset of keys as part of the "find best graph" predicates.

Usage: PerfectHashCreate.exe
    <KeysPath> <OutputDirectory>
    <Algorithm> <HashFunction> <MaskFunction>
    <MaximumConcurrency>
    [KeysLoadFlags] [TableCreateFlags]
    [TableCompileFlags] [TableCreateParameters]

Keys Load Flags:

    --TryLargePagesForKeysData

        Tries to allocate the keys buffer using large pages.

    --SkipKeysVerification

        Skips the logic that enumerates all keys after loading and a) ensures
        they are sorted, and b) constructs a keys bitmap.  If you can be certain
        the keys are sorted, specifying this flag can provide a speedup when
        loading large key sets.

    --DisableImplicitKeyDownsizing

        When loading keys that are 64-bit (8 bytes), a bitmap is kept that
        tracks whether or not a given bit was seen across the entire key set.
        After enumerating the set, the number of zeros in the bitmap are
        counted; if this number is less than or equal to 32, it means that the
        entire key set can be compressed into 32-bit values with some parallel
        bit extraction logic (i.e. _pext_u64()).  As this has beneficial size
        and performance implications, when detected, the key load operation will
        implicitly heap-allocate another array and convert all the 64-bit keys
        into their unique 32-bit equivalent.  Specifying this flag will disable
        this behavior.

    --TryInferKeySizeFromKeysFilename

        The default key size is 32-bit (4 bytes).  When this flag is present,
        if the keys file name ends with "64.keys" (e.g. "foo64.keys"), the key
        size will be interpreted as 64-bit (8 bytes).  This flag takes
        precedence over the table create parameter --KeySizeInBytes.

Table Create Flags:

    --SkipTestAfterCreate

        Normally, after a table has been successfully created, it is tested.
        Setting this flag disables this behavior.

        N.B. This will also disable benchmarking, so no performance information
             will be present in the .csv output file.

    --Silent

        Disables console printing of the dots, dashes and other characters used
        to (crudely) visualize the result of individual table create operations.

    --NoFileIo

        Disables writing of all files when a perfect hash solution has been
        found.  The only time you would use this flag from the console
        application is to observe the performance of table creation without
        performing any file I/O operations.

    --Paranoid

        Enables redundant checks in the routine that determines whether or not
        a generated graph is acyclic.

    --FirstGraphWins | --FindBestGraph

        --FirstGraphWins [default]

            This is the default behavior.  When searching for solutions in
            parallel, the first graph to be found, "wins".  i.e. it's the
            solution that is subsequently written to disk.

        --FindBestGraph

            Requires the following two table create parameters to be present:

                --BestCoverageAttempts=N
                --BestCoverageType=<CoverageType>

            The table create routine will then run until it finds the number of
            best coverage attempts specified.  At that point, the graph that was
            found to be the "best" based on the coverage type predicate "wins",
            and is subsequently saved to disk.

            N.B. This option is significantly more CPU intensive than the
                 --FirstGraphWins mode, however, it is highly probable that it
                 will find a graph that is better (based on the predicate) than
                 when in first graph wins mode.

    --SkipMemoryCoverageInFirstGraphWinsMode

        Skips calculating memory coverage information when in "first graph wins"
        mode.  This will result in the corresponding fields in the .csv output
        indicating 0.

    --SkipGraphVerification

        When set, skips the internal graph verification check that ensures a
        valid perfect hash solution has been found (i.e. with no collisions
        across the entire key set).

    --OmitCsvRowIfTableCreateFailed

        When present, omits writing a row in the .csv output file if table
        creation fails for a given keys file.

    --OmitCsvRowIfTableCreateSucceeded

        When present, omits writing a row in the .csv output file if table
        creation succeeded for a given keys file.

    --IndexOnly

        When set, affects the generated C files by defining the C preprocessor
        macro CPH_INDEX_ONLY, which results in omitting the compiled perfect
        hash routines that deal with the underlying table values array (i.e.
        any routine other than Index(); e.g. Insert(), Lookup(), Delete() etc),
        as well as the array itself.  This results in a size reduction of the
        final compiled perfect hash binary.  Additionally, only the .dll and
        BenchmarkIndex projects will be built, as the BenchmarkFull and Test
        projects require access to a table values array.  This flag is intended
        to be used if you only need the Index() routine and will be managing the
        table values array independently.

    --UseRwsSectionForTableValues

        When set, tells the linker to use a shared read-write section for the
        table values array, e.g.: #pragma comment(linker,"/section:.cphval,rws")
        This will result in the table values array being accessible across
        multiple processes.  Thus, the array will persist as long as one process
        maintains an open section (mapping); i.e. keeps the .dll loaded.

    --UseNonTemporalAvx2Routines

        When set, uses implementations of RtlCopyPages and RtlFillPages that
        use non-temporal hints.

    --IgnorePreviousTableSize

        When set, ignores any previously-recorded table sizes associated with
        the keys file for the given algorithm, hash function and masking type.

        N.B. To forcibly delete all previously-recorded table sizes from all
             keys in a directory, the following PowerShell snippet can be used:

             PS C:\Temp\keys> Get-Item -Path *.keys -Stream *.TableSize | Remove-Item

    --IncludeNumberOfTableResizeEventsInOutputPath

        When set, incorporates the number of table resize events encountered
        whilst searching for a perfect hash solution into the final output
        names, e.g.:

            C:\Temp\output\KernelBase_2485_1_Chm01_Crc32Rotate_And\...
                                           ^
                                           Number of resize events.

    --IncludeNumberOfTableElementsInOutputPath

        When set, incorporates the number of table elements (i.e. the final
        table size) into the output path, e.g.:

            C:\Temp\output\KernelBase_2485_16384_Chm01_Crc32Rotate_And\...
                                           ^
                                           Number of table elements.

        N.B. These two flags can be combined, yielding a path as follows:

            C:\Temp\output\KernelBase_2485_1_16384_Chm01_Crc32Rotate_And\...

        N.B. It is important to understand how table resize events impact the
             behavior of this program if one or both of these flags are present.
             Using the example above, the initial path that will be used for
             the solution will be:

                C:\Temp\output\KernelBase_2485_0_8192_Chm01_Crc32Rotate_And\...

             After the maximum number of attempts are reached, a table resize
             event occurs; the new path component will be:

                C:\Temp\output\KernelBase_2485_1_16384_Chm01_Crc32Rotate_And\...

             However, the actual renaming of the directory is not done until
             a solution has been found and all files have been written.  If
             this program is being run repeatedly, then the target directory
             will already exist.  This complicates things, as, unlike files,
             we can't just replace an existing directory with a new one.

             There are two ways this could be handled: a) delete all the
             existing files under the target path, then delete the directory,
             then perform the rename, or b) move the target directory somewhere
             else first, preserving the existing contents, then proceed with
             the rename.

             This program takes the latter approach.  The existing directory
             will be moved to:

                C:\Temp\output\old\KernelBase_1_16384_Chm01_Crc32Rotate_And_2018-11-19-011023-512\...

             The timestamp appended to the directory name is derived from the
             existing directory's creation time, which should ensure uniqueness.
             (In the unlikely situation the target directory already exists in
             the old subdirectory, the whole operation is aborted and the table
             create routine returns a failure.)

             The point of mentioning all of this is the following: when one or
             both of these flags are routinely specified, the number of output
             files rooted in the output directory's 'old' subdirectory will grow
             rapidly, consuming a lot of disk space.  Thus, if the old files are
             not required, it is recommended to regularly delete them manually.

Table Compile Flags:

    N/A

Table Create Parameters:

    --ValueSizeInBytes=4|8

        Sets the size, in bytes, of the value element that will be stored in the
        compiled perfect hash table via Insert().  Defaults to 4 bytes (ULONG).

    --MainWorkThreadpoolPriority=<High|Normal|Low> [default: Normal]
    --FileWorkThreadpoolPriority=<High|Normal|Low> [default: Normal]

        Sets the main work (i.e. the CPU-intensive graph solving) threadpool
        priority, or the file work threadpool priority, to the given value.

    --AttemptsBeforeTableResize=N [default = 18]

        Specifies the number of attempts at solving the graph that will be made
        before a table resize event will occur (assuming that resize events are
        permitted, as per the following flag).

    --MaxNumberOfTableResizes=N [default = 5]

        Maximum number of table resizes that will be permitted before giving up.

    --BestCoverageAttempts=N

        Where N is a positive integer, and represents the number of attempts
        that will be made at finding a "best" graph (based on the best coverage
        type requested below) before the create table routine returns.

    --BestCoverageType=<CoverageType>

        Indicates the predicate to determine what constitutes the best graph.

        Valid coverage types:

            HighestNumberOfEmptyCacheLines

                This predicate is based on the notion that a high number of
                empty cache lines implies a lower number of cache lines are
                required for the table data, which means better clustering of
                table data, which could result in fewer cache misses, which
                would yield greater performance.

            HighestNumberOfEmptyPages
            HighestNumberOfEmptyLargePages

                As above, but for pages and large pages, respectively.

            HighestMaxAssignedPerCacheLineCount

                A histogram is maintained of the number of assigned values per
                cache line; this predicate selects the graph with the highest
                histogram count (cache line occupancy) for a given graph.

            HighestMaxGraphTraversalDepth

                This predicate selects the graph with the highest recursive
                traversal depth encountered during the graph assignment stage.
                A high value for this metric is indicative of clustering of
                vertices for one half of an assigned table lookup (and thus,
                may result in a solution with better cache behavior).

        N.B. The following predicates must be used in conjunction with
             --KeysSubset.

            LowestNumberOfCacheLinesUsedByKeysSubset

                This predicate is used to to search for solutions where the
                most frequent keys consume the lowest number of cache lines.
                It is useful in scenarios where the frequency of individual
                keys being looked up is heavily skewed toward a small subset.
                For example, if 90% of the lookups occur for 10 of the keys,
                the fewer cache lines occupied by those keys, the better.

            LowestNumberOfPagesUsedByKeysSubset
            LowestNumberOfLargePagesUsedByKeysSubset

                As above, but for pages and large pages, respectively.

            HighestMaxAssignedPerCacheLineCountForKeysSubset

                Like HighestMaxAssignedPerCacheLineCount, but for a subset of
                keys.

    --KeysSubset=N,N+1[,N+2,N+3,...] (e.g. --KeysSubset=10,50,123,600,670)

        Supplies a comma-separated list of keys in ascending key-value order.
        Must contain two or more elements.

Algorithms:

   ID | Name
    1   Chm01

Hash Functions:

   ID | Name (Number of Seeds)
    1   Crc32Rotate15 (2)
    2   Jenkins (2)
    3   JenkinsMod (2)
    4   RotateXor (4)
    5   AddSubXor (4)
    6   Xor (2)
    7   Scratch (4)
    8   Crc32RotateXor (3)
    9   Crc32 (2)
   10   Djb (2)
   11   DjbXor (2)
   12   Fnv (2)
   13   Crc32Not (2)
   14   Crc32RotateX (3)
   15   Crc32RotateXY (3)
   16   Crc32RotateWXYZ (3)

N.B. The lowest latency hash functions with good solving ability, in order of
     ascending latency, are: Crc32RotateX, Crc32RotateXY, Crc32RotateWXYZ.
     You should try these hash functions first and see if a solution can be
     found without a table resize occurring.  Failing that, the Jenkins routine
     has been observed to be the least likely to require a table resize on a
     given key set -- however, it does have the highest latency of all the
     hash functions above (anywhere from 7x-10x the latency of Crc32RotateX).

     (The difference in latency between the X, XY and WXYZ functions is minimal;
      only a few cycles.)

Mask Functions:

  ID | Name
   1   Modulus (does not work!)
   2   And