Skip to content

Notes on hash consing

Hayley Patton edited this page Jul 1, 2021 · 8 revisions

The old hash consing implementation sucked. It was slow. Here are the notes I took on the day that I made it go fast.

ONE-MORE-RE-NIGHTMARE> (the-cost-of-nothing:bench
                        (make-dfa-from-expression
                         (make-search-machine
                          (parse-regular-expression "Bombers fly at zero feet."))))
138.52 milliseconds

Looking through type.lisp, which miraculously is about a year old, and profiling information, we note:

  • We look up weak tables for each RE type in another table, which is unnecessary.
  • Weak tables on SBCL require locking which is also slow.

My first idea is to remove the first indirection, putting each table in a variable rather than this two-layer mess.

ONE-MORE-RE-NIGHTMARE> (the-cost-of-nothing:bench
                        (make-dfa-from-expression
                         (make-search-machine
                          (parse-regular-expression "Bombers fly at zero feet."))))
130.11 milliseconds

Okay, an improvement, sure, but not amazing.

Then it might be a good idea to replace weak tables with non-weak tables with dynamic extent. Such tables would not leak too much memory as intermediate data from a compilation is un-rooted after compilation. Locks are also unnecessary as each compilation can be done with separate tables.

ONE-MORE-RE-NIGHTMARE> (the-cost-of-nothing:bench
                        (with-hash-consing-tables ()
                          (make-dfa-from-expression
                           (make-search-machine
                            (parse-regular-expression "Bombers fly at zero feet.")))))
107.32 milliseconds

Better I suppose. After hash consing derivative and nullable and friends, I'd call it bearable:

ONE-MORE-RE-NIGHTMARE> (the-cost-of-nothing:bench
                        (with-hash-consing-tables ()
                          (make-dfa-from-expression
                           (make-search-machine
                            (parse-regular-expression "Bombers fly at zero feet.")))))
39.75 milliseconds

After consulting with clim.flamegraph, the tag functions are quite hot, and they can be cached in the object to avoid a hash table lookup too:

ONE-MORE-RE-NIGHTMARE> (the-cost-of-nothing:bench
                        (with-hash-consing-tables ()
                          (make-dfa-from-expression
                           (make-search-machine
                            (parse-regular-expression "Bombers fly at zero feet.")))))
18.89 milliseconds

Then, after noticing with the encapsulating sb-profile profiler that we call derivative and nullable a lot of times, I wondered if the derivative classes were being computed right. Checking how we handle joins....

       (if (nullable r)
           (merge-sets (derivative-classes r)
                       (derivative-classes s))
           (derivative-classes r)))

No, nullable returns a RE and not a Boolean now. Dammit. Fix that...

ONE-MORE-RE-NIGHTMARE> (the-cost-of-nothing:bench
                        (with-hash-consing-tables ()
                          (make-dfa-from-expression
                           (make-search-machine
                            (parse-regular-expression "Bombers fly at zero feet.")))))
3.93 milliseconds

The moral of the story: algorithms are everything usually. At least everything else is really fast now!

But at the end of the day, we intend to use Boyer-Moore-Horspool or a SIMD searching algorithm so a DFA for such a long string would never be constructed. It is still a simple example of the lousiness of how we hash cons.

Clone this wiki locally