Add support for AhoCorasick #169

Merged
merged 13 commits into from
May 29, 2024
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -62,6 +62,7 @@ See the installation instructions:
    :hidden:

    main-algorithms/action/index.rst
    main-algorithms/aho-corasick/index.rst
    main-algorithms/congruences/index
    main-algorithms/froidure-pin/index
    main-algorithms/kambites/index
219 changes: 121 additions & 98 deletions docs/source/libsemigroups.bib
@@ -6,121 +6,144 @@

%% Saved with string encoding Unicode (UTF-8)

@article{Aho1975aa,
author = {Aho, Alfred V. and Corasick, Margaret J.},
journal = {Communications of the {ACM}},
month = { June },
year = {1975},
title = {Efficient string matching: an aid to bibliographic search},
volume = {18},
issn = {0001-0782, 1557-7317},
url = {https://dl.acm.org/doi/10.1145/360825.360855},
doi = {10.1145/360825.360855},
shorttitle = {Efficient string matching},
abstract = {This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.},
pages = {333--340},
number = {6},
urldate = {2024-03-26},
date = {1975-06},
langid = {english}
}

@article{Gilman1979,
author = {Robert H Gilman},
journal = {Journal of Algebra},
month = { April },
number = {2},
pages = {544--554},
title = {Presentations of groups and monoids},
volume = {57},
year = {1979}
}

@misc{Holt2018aa,
author = {Holt, Derek},
title = {kbmag -- {GAP} package, {V}ersion 1.5.9},
month = { July },
year = { 2019 },
url = {https://gap-packages.github.io/kbmag/}
}

@book{Jantzen2012aa,
author = {Jantzen, Matthias},
date-added = {2019-12-20 14:27:56 +0000},
date-modified = {2019-12-20 14:28:00 +0000},
publisher = {Springer Science \& Business Media},
title = {Confluent string rewriting},
volume = {14},
year = {2012}
}

@book{Sims1994aa,
address = {Cambridge, England, New York},
author = {Sims, Charles C.},
date-added = {2019-10-23 12:55:38 +0100},
date-modified = {2019-10-23 12:55:38 +0100},
isbn = {0-521-43213-8},
publisher = {Cambridge University Press},
series = {Encyclopedia of mathematics and its applications},
title = {Computation with finitely presented groups},
url = {http://opac.inria.fr/record=b1082972},
year = 1994,
bdsk-file-1 = {YnBsaXN0MDDSAQIDBFxyZWxhdGl2ZVBhdGhZYWxpYXNEYXRhXxDGLi4vLi4vLi4vTGlicmFyeS9Nb2JpbGUgRG9jdW1lbnRzL2NvbX5hcHBsZX5DbG91ZERvY3MvTWF0aHMvQmlidGV4L0ZpbGVkLyhFbmN5Y2xvcGVkaWEgb2YgTWF0aGVtYXRpY3MgYW5kIGl0cyBBcHBsaWNhdGlvbnMpIENoYXJsZXMgQy4gU2ltcy1Db21wdXRhdGlvbiB3aXRoIGZpbml0ZWx5IHByZXNlbnRlZCBncm91cHMtQ1VQICgxOTk0KS5kanZ1TxEDaAAAAAADaAACAAAMTWFjaW50b3NoIEhEAAAAAAAAAAAAAAAAAAAAAAAAAEJEAAH/////HyhFbmN5Y2xvcGVkaWEgb2YgI0ZGRkZGRkZGLmRqdnUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP////8AAAAAAAAAAAAAAAAAAwAHAAAKIGN1AAAAAAAAAAAAAAAAAAVGaWxlZAAAAgDJLzpVc2VyczpqZG06TGlicmFyeTpNb2JpbGUgRG9jdW1lbnRzOmNvbX5hcHBsZX5DbG91ZERvY3M6TWF0aHM6QmlidGV4OkZpbGVkOihFbmN5Y2xvcGVkaWEgb2YgTWF0aGVtYXRpY3MgYW5kIGl0cyBBcHBsaWNhdGlvbnMpIENoYXJsZXMgQy4gU2ltcy1Db21wdXRhdGlvbiB3aXRoIGZpbml0ZWx5IHByZXNlbnRlZCBncm91cHMtQ1VQICgxOTk0KS5kanZ1AAAOAPwAfQAoAEUAbgBjAHkAYwBsAG8AcABlAGQAaQBhACAAbwBmACAATQBhAHQAaABlAG0AYQB0AGkAYwBzACAAYQBuAGQAIABpAHQAcwAgAEEAcABwAGwAaQBjAGEAdABpAG8AbgBzACkAIABDAGgAYQByAGwAZQBzACAAQwAuACAAUwBpAG0AcwAtAEMAbwBtAHAAdQB0AGEAdABpAG8AbgAgAHcAaQB0AGgAIABmAGkAbgBpAHQAZQBsAHkAIABwAHIAZQBzAGUAbgB0AGUAZAAgAGcAcgBvAHUAcABzAC0AQwBVAFAAIAAoADEAOQA5ADQAKQAuAGQAagB2AHUADwAaAAwATQBhAGMAaQBuAHQAbwBzAGgAIABIAEQAEgDHVXNlcnMvamRtL0xpYnJhcnkvTW9iaWxlIERvY3VtZW50cy9jb21+YXBwbGV+Q2xvdWREb2NzL01hdGhzL0JpYnRleC9GaWxlZC8oRW5jeWNsb3BlZGlhIG9mIE1hdGhlbWF0aWNzIGFuZCBpdHMgQXBwbGljYXRpb25zKSBDaGFybGVzIEMuIFNpbXMtQ29tcHV0YXRpb24gd2l0aCBmaW5pdGVseSBwcmVzZW50ZWQgZ3JvdXBzLUNVUCAoMTk5NCkuZGp2dQAAEwABLwAAFQACAAr//wAAAAgADQAaACQA7QAAAAAAAAIBAAAAAAAAAAUAAAAAAAAAAAAAAAAAAARZ},
bdsk-url-1 = {http://opac.inria.fr/record=b1082972}
}

@book{Knuth2009aa,
author = {Knuth, Donald E.},
date-added = {2019-10-22 15:37:10 +0100},
date-modified = {2019-10-22 15:37:13 +0100},
edition = {12th},
isbn = {0321580508, 9780321580504},
publisher = {Addison-Wesley Professional},
title = {The Art of Computer Programming, Volume 4, Fascicle 1: Bitwise Tricks \& Techniques; Binary Decision Diagrams},
year = {2009}
}

@article{Jonusas2017aa,
author = {Jonusas, Julius and Mitchell, James D. and Pfeiffer, Markus},
date-added = {2019-10-18 13:47:37 +0100},
date-modified = {2019-10-18 13:52:37 +0100},
doi = {10.4171/PM/2001},
fjournal = {Portugaliae Mathematica. A Journal of the Portuguese Mathematical Society},
issn = {0032-5155},
journal = {Port. Math.},
mrclass = {20M10 (20B40 20M20 68Q42)},
mrnumber = {3763897},
mrreviewer = {Karim Ahmadidelir},
number = {3},
pages = {173--200},
title = {Two variants of the {F}roidure-{P}in algorithm for finite semigroups},
url = {https://doi.org/10.4171/PM/2001},
volume = {74},
year = {2017},
bdsk-url-1 = {https://doi.org/10.4171/PM/2001}
}

@incollection{Froidure1997aa,
address = {Berlin},
author = {Froidure, V{\'e}ronique and Pin, Jean-Eric},
booktitle = {Foundations of computational mathematics ({R}io de {J}aneiro, 1997)},
date-added = {2019-10-18 13:44:44 +0100},
date-modified = {2019-10-18 13:44:44 +0100},
mrclass = {20M10},
mrnumber = {MR1661975 (99k:20111)},
mrreviewer = {Jorge Almeida},
pages = {112--126},
publisher = {Springer},
title = {Algorithms for computing finite semigroups},
year = {1997}
}

@article{Konieczny1994aa,
author = {Konieczny, Janusz},
title = {Green's equivalences in finite semigroups of binary relations},
journal = {Semigroup Forum},
fjournal = {Semigroup Forum},
volume = {48},
year = {1994},
number = {2},
pages = {235--252},
issn = {0037-1912},
mrclass = {20M20},
mrnumber = {1256691},
mrreviewer = {G. J. Lallement},
doi = {10.1007/BF02573672}
}

@article{Lallement1990aa,
author = {Lallement, Gerard and McFadden, Robert},
title = {On the determination of {G}reen's relations in finite
transformation semigroups},
journal = {J. Symbolic Comput.},
fjournal = {Journal of Symbolic Computation},
volume = {10},
year = {1990},
number = {5},
pages = {481--498},
issn = {0747-7171},
mrclass = {20M20 (68Q45)},
mrnumber = {1087717},
mrreviewer = {Dominique Perrin},
doi = {10.1016/S0747-7171(08)80057-0}
}
69 changes: 69 additions & 0 deletions docs/source/main-algorithms/aho-corasick/ac-helpers.rst
@@ -0,0 +1,69 @@
.. Copyright (c) 2024 Joseph Edwards

    Distributed under the terms of the GPL license version 3.

    The full license is in the file LICENSE, distributed with this software.


Aho-Corasick helper functions
=============================

.. automodule:: libsemigroups_pybind11.aho_corasick
    :no-index:

.. doctest::

    >>> from libsemigroups_pybind11 import AhoCorasick, aho_corasick
    >>> # Construct an empty AhoCorasick
    >>> ac = AhoCorasick()

    >>> # Add words
    >>> aho_corasick.add_word(ac, [0, 1, 0, 1])
    4
    >>> aho_corasick.add_word(ac, [0, 1, 1, 0])
    6
    >>> aho_corasick.add_word(ac, [0, 1, 1, 0, 1])
    7
    >>> aho_corasick.add_word(ac, [0, 1, 1, 0, 0])
    8

    >>> # Can't add a word that already exists
    >>> aho_corasick.add_word(ac, [0, 1, 1, 0, 0])
    Traceback (most recent call last):
    ...
    LibsemigroupsError: the word [0, 1, 1, 0, 0] given by the arguments [first, last) already belongs to the trie

    >>> # Remove words
    >>> aho_corasick.rm_word(ac, [0, 1, 0, 1])
    4

    >>> # Can't remove a word that is not a terminal node in the trie
    >>> aho_corasick.rm_word(ac, [0, 1, 0, 1])
    Traceback (most recent call last):
    ...
    LibsemigroupsError: cannot remove the word [0, 1, 0, 1] given by the arguments [first, last), as it does not correspond to a node in the trie

    >>> # Traverse
    >>> aho_corasick.traverse_word(ac, 5, [0, 1])
    7
    >>> aho_corasick.traverse_word(ac, [0, 1, 0, 1, 1, 0])
    6
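
The node indices printed above appear to follow from how the trie grows: nodes are numbered in the order they are created, removing a word prunes nodes that no longer lie on any remaining word, and traversal falls back along suffix links when a child is missing. The pure-Python sketch below reproduces those numbers under exactly these assumptions. ``ToyAhoCorasick`` and its methods are hypothetical illustrations, not part of ``libsemigroups_pybind11``; the real implementation may differ, for example in whether it reuses removed nodes or how it words its errors.

```python
class ToyAhoCorasick:
    """Hypothetical pure-Python trie with suffix links; node 0 is the root."""

    def __init__(self):
        self.children = [{}]  # children[n][letter] -> index of the child node
        self.parent = [0]  # parent[n]; the root is treated as its own parent
        self.letter = [None]  # letter labelling the edge parent[n] -> n
        self.terminal = [False]  # True if node n is the end of an added word

    def _new_node(self, parent, letter):
        node = len(self.children)  # nodes are numbered in creation order
        self.children.append({})
        self.parent.append(parent)
        self.letter.append(letter)
        self.terminal.append(False)
        self.children[parent][letter] = node
        return node

    def _node_of(self, word):
        """Return the node reached from the root by reading word, or None."""
        node = 0
        for a in word:
            node = self.children[node].get(a)
            if node is None:
                return None
        return node

    def add_word(self, word):
        node = 0
        for a in word:
            child = self.children[node].get(a)
            node = child if child is not None else self._new_node(node, a)
        if self.terminal[node]:
            raise ValueError(f"the word {list(word)} already belongs to the trie")
        self.terminal[node] = True
        return node

    def rm_word(self, word):
        node = self._node_of(word)
        if node is None or not self.terminal[node]:
            raise ValueError(f"the word {list(word)} is not a terminal node")
        removed = node
        self.terminal[node] = False
        # Prune nodes that are no longer a prefix of any remaining word.
        # Pruned indices are simply abandoned here, never reused.
        while node != 0 and not self.terminal[node] and not self.children[node]:
            parent = self.parent[node]
            del self.children[parent][self.letter[node]]
            node = parent
        return removed

    def suffix_link(self, node):
        """Node of the longest proper suffix of node's word that is in the trie."""
        if node == 0 or self.parent[node] == 0:
            return 0
        return self.traverse(self.suffix_link(self.parent[node]), self.letter[node])

    def traverse(self, node, a):
        """Read letter a from node, falling back along suffix links if needed."""
        while a not in self.children[node]:
            if node == 0:
                return 0
            node = self.suffix_link(node)
        return self.children[node][a]

    def traverse_word(self, word, start=0):
        node = start
        for a in word:
            node = self.traverse(node, a)
        return node


ac = ToyAhoCorasick()
words = [[0, 1, 0, 1], [0, 1, 1, 0], [0, 1, 1, 0, 1], [0, 1, 1, 0, 0]]
print([ac.add_word(w) for w in words])  # [4, 6, 7, 8]
print(ac.rm_word([0, 1, 0, 1]))  # 4
print(ac.traverse_word([0, 1], start=5))  # 7
print(ac.traverse_word([0, 1, 0, 1, 1, 0]))  # 6
```

Suffix links are recomputed on demand rather than cached, which keeps the sketch correct after removals at the cost of speed; a production implementation would store them and invalidate them when the trie changes.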


Contents
--------
.. currentmodule:: libsemigroups_pybind11.aho_corasick

.. autosummary::
    :nosignatures:

    add_word
    rm_word
    traverse_word

Full API
--------

.. automodule:: libsemigroups_pybind11.aho_corasick
    :members:
    :imported-members:
37 changes: 37 additions & 0 deletions docs/source/main-algorithms/aho-corasick/aho-corasick.rst
@@ -0,0 +1,37 @@
.. Copyright (c) 2024 Joseph Edwards

    Distributed under the terms of the GPL license version 3.

    The full license is in the file LICENSE, distributed with this software.

.. currentmodule:: _libsemigroups_pybind11

Aho-Corasick
============
.. autoclass:: AhoCorasick
    :doc-only:
    :class-doc-from: class

Contents
--------

.. autosummary::
    :nosignatures:

    AhoCorasick.child
    AhoCorasick.height
    AhoCorasick.init
    AhoCorasick.number_of_nodes
    AhoCorasick.signature
    AhoCorasick.suffix_link
    AhoCorasick.traverse
    AhoCorasick.validate_active_node_index
    AhoCorasick.validate_node_index
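
The Contents list above includes ``height``, ``signature`` and ``suffix_link``. Assuming the usual Aho-Corasick meanings (the signature of a node is the word spelled on the path from the root to it, its height is the length of that word, and its suffix link points to the node for the longest proper suffix of that word that is also in the trie), the naive sketch below computes all three by brute force. Nodes are represented by their signatures as tuples rather than by integer indices as in ``AhoCorasick``, and ``trie_nodes``, ``suffix_link`` and ``describe`` are hypothetical helpers, not library functions.

```python
def trie_nodes(words):
    """All prefixes of all words: the nodes of their trie, with () as the root."""
    return {tuple(w[:i]) for w in words for i in range(len(w) + 1)}


def suffix_link(node, nodes):
    """Longest proper suffix of node that is also a node (brute force)."""
    for start in range(1, len(node) + 1):
        if node[start:] in nodes:
            return node[start:]
    return ()  # the root has no proper suffix; link it to itself here


def describe(words):
    """Map each node (its signature) to (height, signature of its suffix link)."""
    nodes = trie_nodes(words)
    return {
        node: (len(node), suffix_link(node, nodes))
        for node in sorted(nodes, key=lambda n: (len(n), n))
    }


for node, (height, link) in describe([[0, 1, 0, 1], [0, 1, 1, 0]]).items():
    print(node, height, link)
```

Checking every suffix like this is only meant to pin down the definitions; the point of the Aho-Corasick construction is to derive each suffix link from the suffix link of the node's parent instead.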

Full API
--------

.. autoclass:: AhoCorasick
    :no-doc:
    :special-members: __init__
    :members:
16 changes: 16 additions & 0 deletions docs/source/main-algorithms/aho-corasick/index.rst
@@ -0,0 +1,16 @@
.. Copyright (c) 2024 Joseph Edwards

    Distributed under the terms of the GPL license version 3.

    The full license is in the file LICENSE, distributed with this software.

Aho-Corasick
============

This page describes the functionality related to the Aho-Corasick algorithm.
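
For orientation, the classic application from Aho and Corasick's paper is to find every occurrence of a finite set of keywords in a text in a single pass. The following is a generic, self-contained Python sketch of the textbook construction (goto, failure and output functions, with failure links filled in breadth-first), not code taken from ``libsemigroups_pybind11``; ``build_automaton`` and ``find_all`` are hypothetical names. Letters may be any hashable values, such as characters or the integer letters used by the helper functions.

```python
from collections import deque


def build_automaton(keywords):
    """Build the goto, failure and output functions for the given keywords."""
    goto = [{}]  # goto[state][letter] -> next state; state 0 is the root
    fail = [0]  # failure (suffix) link of each state
    out = [[]]  # indices of the keywords recognised on reaching each state
    for k, word in enumerate(keywords):
        state = 0
        for letter in word:
            if letter not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][letter] = len(goto) - 1
            state = goto[state][letter]
        out[state].append(k)
    # Breadth-first order guarantees that a failure link always points to a
    # shallower state whose own links and outputs are already final.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for letter, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and letter not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(letter, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, keywords):
    """Return (end_index, keyword) for every occurrence, scanning text once."""
    goto, fail, out = build_automaton(keywords)
    state, matches = 0, []
    for i, letter in enumerate(text):
        while state and letter not in goto[state]:
            state = fail[state]
        state = goto[state].get(letter, 0)
        matches.extend((i, keywords[k]) for k in out[state])
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
# [(3, 'she'), (3, 'he'), (5, 'hers')]
```

Building the automaton takes time proportional to the total length of the keywords, and the scan follows at most a bounded number of transitions per letter of text, which matches the complexity claims in the original paper.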

.. toctree::
    :maxdepth: 1

    aho-corasick
    ac-helpers
1 change: 1 addition & 0 deletions libsemigroups_pybind11/__init__.py
@@ -71,6 +71,7 @@
        one,
        domain,
        image,
        AhoCorasick,
    )
except ModuleNotFoundError as e:
    raise ModuleNotFoundError(