Commit

Improve usage messages.
tpn committed Feb 19, 2020
1 parent 96c7ac2 commit b285c0b
Showing 4 changed files with 74 additions and 155 deletions.
17 changes: 0 additions & 17 deletions include/PerfectHash.h
@@ -2713,23 +2713,6 @@ IsValidPerfectHashTableCreateParameterId(
// registering the best graph, which should give more clarity to the
// role of the X-macro.
//
// N.B. The following coverage types are intended to generate worst-case hash
// tables compared to their best-case counterparts. They can be useful
// during development and performance testing to assess the performance
// benefit, if any, of things like reduced cache lines, etc. The types
// follow:
//
// LowestMaxGraphTraversalDepth
// LowestTotalGraphTraversals
// LowestNumberOfEmptyPages
// LowestNumberOfEmptyLargePages
// LowestNumberOfEmptyCacheLines
// LowestNumberOfEmptyPagesUsedByKeysSubset
// LowestNumberOfEmptyLargePagesUsedByKeysSubset
// LowestNumberOfEmptyCacheLinesUsedByKeysSubset
// LowestMaxAssignedPerCacheLineCount
// LowestMaxAssignedPerCacheLineCountForKeysSubset
//

#define BEST_COVERAGE_TYPE_TABLE(FIRST_ENTRY, ENTRY, LAST_ENTRY) \
FIRST_ENTRY(NumberOfEmptyPages, Highest, >) \
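The hunk above only shows the first entry of `BEST_COVERAGE_TYPE_TABLE`, whose entries have the shape `(Name, Comparison, Comparator)`. As a rough illustration of how such an X-macro can drive both an enum and a comparison routine, here is a minimal sketch. It is not the project's code: `COVERAGE_TABLE`, `Coverage`, `CoverageType`, `EXPAND_AS_ENUM`, `EXPAND_AS_CASE` and `is_better` are hypothetical names, the table is shortened to four entries, and it takes a single `ENTRY` macro rather than the `FIRST_ENTRY`/`ENTRY`/`LAST_ENTRY` triple used in the header.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, shortened coverage table. Each entry mirrors the
 * (Name, Comparison, Comparator) shape seen in the diff.
 */
#define COVERAGE_TABLE(ENTRY)                  \
    ENTRY(NumberOfEmptyPages, Highest, >)      \
    ENTRY(NumberOfEmptyPages, Lowest, <)       \
    ENTRY(NumberOfEmptyCacheLines, Highest, >) \
    ENTRY(NumberOfEmptyCacheLines, Lowest, <)

/* Hypothetical per-graph coverage record; field names match the table. */
typedef struct {
    uint32_t NumberOfEmptyPages;
    uint32_t NumberOfEmptyCacheLines;
} Coverage;

/* Expansion 1: one enum value per entry, e.g. HighestNumberOfEmptyPages. */
#define EXPAND_AS_ENUM(Name, Comparison, Comparator) Comparison##Name,
typedef enum {
    COVERAGE_TABLE(EXPAND_AS_ENUM)
    CoverageTypeCount
} CoverageType;

/* Expansion 2: one switch case per entry, applying that entry's comparator. */
#define EXPAND_AS_CASE(Name, Comparison, Comparator) \
    case Comparison##Name:                           \
        return candidate->Name Comparator best->Name;

/* Returns true if `candidate` beats `best` under the given coverage type. */
static bool is_better(CoverageType type,
                      const Coverage *candidate,
                      const Coverage *best)
{
    switch (type) {
        COVERAGE_TABLE(EXPAND_AS_CASE)
        default:
            return false;
    }
}
```

With this arrangement, adding a `Lowest` counterpart to a `Highest` predicate is one extra table line with the comparator flipped. This may be why the separate list of `Lowest*` names in the removed comment was considered redundant.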
106 changes: 37 additions & 69 deletions include/PerfectHashErrors.h
@@ -255,28 +255,6 @@ Module Name:
// 25 Multiply (2)
// 26 MultiplyXor (4)
//
// N.B. The lowest latency hash functions with good solving ability, in order of
// ascending latency, are: Crc32RotateX, Crc32RotateXY, Crc32RotateWXYZ.
// You should try these hash functions first and see if a solution can be
// found without a table resize occurring. Failing that, the Jenkins routine
// has been observed to be the least likely to require a table resize on a
// given key set -- however, it does have the highest latency of all the
// hash functions above (anywhere from 7x-10x the latency of Crc32RotateX).
//
// (The difference in latency between the X, XY and WXYZ functions is minimal;
// only a few cycles.)
//
// N.B. The three most recent hash functions are now exhibiting latency on-par with
// the Crc32Rotate functions, but with the added benefit of requiring no table
// resize events on the https://github.com/tpn/perfecthash-keys/sys32 input
// set of keys. That is, these routines should be tried in this order and
// compared against the Crc32Rotate rouintes:
//
// ShiftMultiplyXorShift
// RotateMultiplyXorRotate
// ShiftMultiplyXorShift2
// RotateMultiplyXorRotate2
//
// Mask Functions:
//
// ID | Name
@@ -562,18 +540,31 @@ Module Name:
//
// Valid coverage types:
//
// HighestNumberOfEmptyPages
// HighestNumberOfEmptyLargePages
// HighestNumberOfEmptyCacheLines
// HighestMaxGraphTraversalDepth
// HighestTotalGraphTraversals
// HighestMaxAssignedPerCacheLineCount
//
// This predicate is based on the notion that a high number of
// empty cache lines implies a lower number of cache lines are
// required for the table data, which means better clustering of
// table data, which could result in fewer cache misses, which
// would yield greater performance.
// LowestNumberOfEmptyPages
// LowestNumberOfEmptyLargePages
// LowestNumberOfEmptyCacheLines
// LowestMaxGraphTraversalDepth
// LowestTotalGraphTraversals
// LowestMaxAssignedPerCacheLineCount
//
// HighestNumberOfEmptyPages
// HighestNumberOfEmptyLargePages
// The following predicates must be used in conjunction with --KeysSubset:
//
// As above, but for pages and large pages, respectively.
// HighestMaxAssignedPerCacheLineCountForKeysSubset
// HighestNumberOfPagesUsedByKeysSubset
// HighestNumberOfLargePagesUsedByKeysSubset
// HighestNumberOfCacheLinesUsedByKeysSubset
//
// LowestMaxAssignedPerCacheLineCountForKeysSubset
// LowestNumberOfPagesUsedByKeysSubset
// LowestNumberOfLargePagesUsedByKeysSubset
// LowestNumberOfCacheLinesUsedByKeysSubset
//
// Console Output Character Legend
//
@@ -894,54 +885,31 @@ Module Name:
//
// Valid coverage types:
//
// HighestNumberOfEmptyCacheLines
//
// This predicate is based on the notion that a high number of
// empty cache lines implies a lower number of cache lines are
// required for the table data, which means better clustering of
// table data, which could result in fewer cache misses, which
// would yield greater performance.
//
// HighestNumberOfEmptyPages
// HighestNumberOfEmptyLargePages
//
// As above, but for pages and large pages, respectively.
//
// HighestMaxAssignedPerCacheLineCount
//
// A histogram is maintained of the number of assigned values per
// cache line; this predicate selects the graph with the highest
// histogram count (cache line occupancy) for a given graph.
//
// HighestNumberOfEmptyCacheLines
// HighestMaxGraphTraversalDepth
// HighestTotalGraphTraversals
// HighestMaxAssignedPerCacheLineCount
//
// This predicate selects the graph with the highest recursive
// traversal depth encountered during the graph assignment stage.
// A high value for this metric is indicative of clustering of
// vertices for one half of an assigned table lookup (and thus,
// may result in a solution with better cache behavior).
// LowestNumberOfEmptyPages
// LowestNumberOfEmptyLargePages
// LowestNumberOfEmptyCacheLines
// LowestMaxGraphTraversalDepth
// LowestTotalGraphTraversals
// LowestMaxAssignedPerCacheLineCount
//
// N.B. The following predicates must be used in conjunction with
// --KeysSubset.
// The following predicates must be used in conjunction with --KeysSubset:
//
// LowestNumberOfCacheLinesUsedByKeysSubset
//
// This predicate is used to to search for solutions where the
// most frequent keys consume the lowest number of cache lines.
// It is useful in scenarios where the frequency of individual
// keys being looked up is heavily skewed toward a small subset.
// For example, if 90%% of the lookups occur for 10% of the keys,
// the fewer cache lines occupied by those keys, the better.
// HighestMaxAssignedPerCacheLineCountForKeysSubset
// HighestNumberOfPagesUsedByKeysSubset
// HighestNumberOfLargePagesUsedByKeysSubset
// HighestNumberOfCacheLinesUsedByKeysSubset
//
// LowestMaxAssignedPerCacheLineCountForKeysSubset
// LowestNumberOfPagesUsedByKeysSubset
// LowestNumberOfLargePagesUsedByKeysSubset
//
// As above, but for pages and large pages, respectively.
//
// HighestMaxAssignedPerCacheLineCountForKeysSubset
//
// Like HighestMaxAssignedPerCacheLineCount, but for a subset of
// keys.
// LowestNumberOfCacheLinesUsedByKeysSubset
//
// --KeysSubset=N,N+1[,N+2,N+3,...] (e.g. --KeysSubset=10,50,123,600,670)
//
106 changes: 37 additions & 69 deletions src/PerfectHash/PerfectHashErrors.mc
@@ -201,28 +201,6 @@ Hash Functions:
25 Multiply (2)
26 MultiplyXor (4)
N.B. The lowest latency hash functions with good solving ability, in order of
ascending latency, are: Crc32RotateX, Crc32RotateXY, Crc32RotateWXYZ.
You should try these hash functions first and see if a solution can be
found without a table resize occurring. Failing that, the Jenkins routine
has been observed to be the least likely to require a table resize on a
given key set -- however, it does have the highest latency of all the
hash functions above (anywhere from 7x-10x the latency of Crc32RotateX).
(The difference in latency between the X, XY and WXYZ functions is minimal;
only a few cycles.)
N.B. The three most recent hash functions are now exhibiting latency on-par with
the Crc32Rotate functions, but with the added benefit of requiring no table
resize events on the https://github.com/tpn/perfecthash-keys/sys32 input
set of keys. That is, these routines should be tried in this order and
compared against the Crc32Rotate rouintes:
ShiftMultiplyXorShift
RotateMultiplyXorRotate
ShiftMultiplyXorShift2
RotateMultiplyXorRotate2
Mask Functions:
ID | Name
@@ -507,18 +485,31 @@ Table Create Parameters:
Valid coverage types:
HighestNumberOfEmptyPages
HighestNumberOfEmptyLargePages
HighestNumberOfEmptyCacheLines
HighestMaxGraphTraversalDepth
HighestTotalGraphTraversals
HighestMaxAssignedPerCacheLineCount
This predicate is based on the notion that a high number of
empty cache lines implies a lower number of cache lines are
required for the table data, which means better clustering of
table data, which could result in fewer cache misses, which
would yield greater performance.
LowestNumberOfEmptyPages
LowestNumberOfEmptyLargePages
LowestNumberOfEmptyCacheLines
LowestMaxGraphTraversalDepth
LowestTotalGraphTraversals
LowestMaxAssignedPerCacheLineCount
HighestNumberOfEmptyPages
HighestNumberOfEmptyLargePages
The following predicates must be used in conjunction with --KeysSubset:
As above, but for pages and large pages, respectively.
HighestMaxAssignedPerCacheLineCountForKeysSubset
HighestNumberOfPagesUsedByKeysSubset
HighestNumberOfLargePagesUsedByKeysSubset
HighestNumberOfCacheLinesUsedByKeysSubset
LowestMaxAssignedPerCacheLineCountForKeysSubset
LowestNumberOfPagesUsedByKeysSubset
LowestNumberOfLargePagesUsedByKeysSubset
LowestNumberOfCacheLinesUsedByKeysSubset
Console Output Character Legend
@@ -837,54 +828,31 @@ Table Create Parameters:
Valid coverage types:
HighestNumberOfEmptyCacheLines
This predicate is based on the notion that a high number of
empty cache lines implies a lower number of cache lines are
required for the table data, which means better clustering of
table data, which could result in fewer cache misses, which
would yield greater performance.
HighestNumberOfEmptyPages
HighestNumberOfEmptyLargePages
As above, but for pages and large pages, respectively.
HighestMaxAssignedPerCacheLineCount
A histogram is maintained of the number of assigned values per
cache line; this predicate selects the graph with the highest
histogram count (cache line occupancy) for a given graph.
HighestNumberOfEmptyCacheLines
HighestMaxGraphTraversalDepth
HighestTotalGraphTraversals
HighestMaxAssignedPerCacheLineCount
This predicate selects the graph with the highest recursive
traversal depth encountered during the graph assignment stage.
A high value for this metric is indicative of clustering of
vertices for one half of an assigned table lookup (and thus,
may result in a solution with better cache behavior).
LowestNumberOfEmptyPages
LowestNumberOfEmptyLargePages
LowestNumberOfEmptyCacheLines
LowestMaxGraphTraversalDepth
LowestTotalGraphTraversals
LowestMaxAssignedPerCacheLineCount
N.B. The following predicates must be used in conjunction with
--KeysSubset.
The following predicates must be used in conjunction with --KeysSubset:
LowestNumberOfCacheLinesUsedByKeysSubset
This predicate is used to to search for solutions where the
most frequent keys consume the lowest number of cache lines.
It is useful in scenarios where the frequency of individual
keys being looked up is heavily skewed toward a small subset.
For example, if 90%% of the lookups occur for 10% of the keys,
the fewer cache lines occupied by those keys, the better.
HighestMaxAssignedPerCacheLineCountForKeysSubset
HighestNumberOfPagesUsedByKeysSubset
HighestNumberOfLargePagesUsedByKeysSubset
HighestNumberOfCacheLinesUsedByKeysSubset
LowestMaxAssignedPerCacheLineCountForKeysSubset
LowestNumberOfPagesUsedByKeysSubset
LowestNumberOfLargePagesUsedByKeysSubset
As above, but for pages and large pages, respectively.
HighestMaxAssignedPerCacheLineCountForKeysSubset
Like HighestMaxAssignedPerCacheLineCount, but for a subset of
keys.
LowestNumberOfCacheLinesUsedByKeysSubset
--KeysSubset=N,N+1[,N+2,N+3,...] (e.g. --KeysSubset=10,50,123,600,670)
Binary file modified src/PerfectHash/PerfectHashErrors_English.bin
Binary file not shown.
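
The usage text removed in this commit described `LowestNumberOfCacheLinesUsedByKeysSubset` as favouring solutions where the most frequently looked-up keys occupy the fewest cache lines. The metric behind that idea can be sketched as follows. This is an illustration only, not the project's implementation: it assumes 64-byte cache lines, a table of 32-bit values, and that the table indices touched by the subset keys have already been collected into an array. `count_cache_lines_used`, `CACHE_LINE_SIZE` and `VALUES_PER_CACHE_LINE` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines holding 32-bit table values. */
#define CACHE_LINE_SIZE 64
#define VALUES_PER_CACHE_LINE (CACHE_LINE_SIZE / sizeof(uint32_t))

/*
 * Counts the distinct cache lines touched by the given table indices.
 * Indices outside the table are ignored. Returns (size_t)-1 if the
 * scratch buffer cannot be allocated.
 */
static size_t count_cache_lines_used(const uint32_t *indices,
                                     size_t count,
                                     size_t table_size)
{
    size_t number_of_lines =
        (table_size + VALUES_PER_CACHE_LINE - 1) / VALUES_PER_CACHE_LINE;
    unsigned char *seen = calloc(number_of_lines ? number_of_lines : 1, 1);
    size_t used = 0;
    size_t i;

    if (seen == NULL) {
        return (size_t)-1;
    }

    for (i = 0; i < count; i++) {
        size_t line;

        if (indices[i] >= table_size) {
            continue;
        }

        /* Map the table index to the cache line that contains it. */
        line = indices[i] / VALUES_PER_CACHE_LINE;
        if (!seen[line]) {
            seen[line] = 1;
            used++;
        }
    }

    free(seen);
    return used;
}
```

Under these assumptions, a `Lowest...` predicate would prefer the candidate graph with the smaller count and a `Highest...` predicate the larger one. The same counting applies to pages and large pages with a different line size.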
