Possible to walk the trie? #20
Comments
It is a marisa-trie C++ library limitation - see https://code.google.com/p/marisa-trie/issues/detail?id=9.
What is your use case? It is possible to implement |
Use case is anything like fuzzy predictive search, where at each edge forward, a fuzzy comparison is made using some fuzzy distance metric coupled with a score to see if it's worth moving forward past that edge. With the current implementation, I'd need to instead proactively check to see if edges exist (so instead of
By the way, I hope this doesn't come off as criticism. I think all your trie implementations are awesome, and I looked through DAWG and dat-trie for similar functionality last night. I'll try to add this functionality myself if necessary, but wanted to check whether I'm missing something first.
Thanks!
datrie has State and Iterator objects which allow any iteration to be implemented in user code. It has some issues though: memory savings are much smaller than in DAWG/DAFSA or marisa-trie, and building a datrie can be slow for large tries (see pytries/datrie#12). I don't use datrie myself, and development stalled because, for my use cases, one of DAWG, marisa-trie or hat-trie usually works better than datrie. There are also some open issues in the tracker.
DAWG doesn't have a public API for custom iteration, but this API can be implemented. Check https://github.com/kmike/DAWG-Python/blob/master/dawg_python/wrapper.py and how the classes from there are used in https://github.com/kmike/DAWG-Python/blob/master/dawg_python/dawgs.py (the DAWG package has very similar code, but in Cython). Pull requests are welcome :)
Unfortunately we can't create a similar API for marisa-trie because of C++ library limitations.
I haven't added a public API for iteration to DAWG because of speed issues. Lookups are going to be much slower if iteration is done in Python instead of Cython; that's why there are many optimized methods for common use cases instead of a couple of generic iterator helpers.
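To make the idea of a state-style walking API concrete, here is a minimal sketch over a plain nested-dict trie. Everything in it (`build_trie`, `TrieCursor`, `edges`, `step`, `is_terminal`, the `END` marker) is hypothetical and written only for illustration; it is not datrie's or DAWG's actual API, and being pure Python it would have exactly the speed drawback described above.

```python
END = "$end"  # hypothetical marker key: a stored word ends at this node


def build_trie(words):
    """Build a nested-dict trie; each node maps one character to a child node."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


class TrieCursor:
    """Hypothetical cursor that points at one node and moves one edge at a time."""

    def __init__(self, node):
        self._node = node

    def edges(self):
        """Return the characters of all outgoing edges, sorted."""
        return sorted(ch for ch in self._node if ch != END)

    def step(self, ch):
        """Follow the edge labelled `ch`; return a new cursor, or None if absent."""
        if ch == END or ch not in self._node:
            return None
        return TrieCursor(self._node[ch])

    def is_terminal(self):
        """True if a stored word ends exactly at this node."""
        return END in self._node
```

With such a cursor, user code can decide at every edge whether to continue, for example `TrieCursor(build_trie(["car", "cat"])).step("c").step("a").edges()` lists the characters that may follow "ca".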
If you have suggestions for new optimized methods for DAWG (e.g. for fuzzy match) then I'm open to PRs; it could be a good addition to the library.
Alright, I'll investigate a bit more, and if it still looks useful, I'll send over a PR. Closing the issue out here, since it'd be much easier to do for DAWG than for marisa-trie.
Actually, this is the second time that this feature would have been very useful to me, so I thought I'd mention it here. I also looked through other Python trie libraries (PyTrie, patricia-trie, python-trie) and none of them supports this. My use case (which I would expect to be not that rare) is walking down the tree while computing Levenshtein distance; I use it for fuzzy matches, similar to @EliFinkelshteyn.
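For readers unfamiliar with the technique, walking a trie while computing Levenshtein distance can be sketched as follows. This is an illustrative pure-Python version over a nested-dict trie, not code from this thread or from any of the libraries mentioned; `build_trie`, `fuzzy_search` and the `END` marker are hypothetical names. Each edge extends the standard edit-distance table by one row, and a branch is abandoned as soon as every entry in its row exceeds the allowed distance.

```python
END = "$end"  # hypothetical marker key: a stored word ends at this node


def build_trie(words):
    """Build a nested-dict trie; each node maps one character to a child node."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def fuzzy_search(root, query, max_dist):
    """Return sorted (word, distance) pairs within max_dist edits of query."""
    results = []

    def walk(node, prefix, prev_row):
        # prev_row[i] is the edit distance between `prefix` and query[:i].
        if END in node and prev_row[-1] <= max_dist:
            results.append((prefix, prev_row[-1]))
        for ch, child in node.items():
            if ch == END:
                continue
            # Compute the next row of the edit-distance table for prefix + ch.
            row = [prev_row[0] + 1]
            for i in range(1, len(query) + 1):
                cost = 0 if query[i - 1] == ch else 1
                row.append(min(row[i - 1] + 1,          # insertion
                               prev_row[i] + 1,         # deletion
                               prev_row[i - 1] + cost))  # match or substitution
            # Prune: if every entry is too large, no descendant can match.
            if min(row) <= max_dist:
                walk(child, prefix + ch, row)

    walk(root, "", list(range(len(query) + 1)))
    return sorted(results)
```

The pruning step is the part that needs per-edge access: without a way to move one edge at a time, the whole subtree has to be enumerated before any branch can be rejected.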
I don't see any methods that would allow for just proceeding one edge in the trie, for example:
Using the prefixes method for something like this would be very expensive if the trie is big and the prefix is short. Is there some technical detail I'm missing for why implementing a function for this would be costly, or some other reason this isn't implemented? It seems like it's a necessary step in the traversal with the prefixes() method anyway, and quite useful for predictive lookup operations.
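To illustrate the cost concern, here is a sketch of the workaround: deriving the possible next characters from a prefix query that returns every key under that prefix. It assumes only an object with a `keys(prefix)` method; to the best of my knowledge `marisa_trie.Trie` exposes such a method, but the `ListBackedTrie` stand-in and the `next_chars` helper below are hypothetical and exist only so the example runs without any trie library.

```python
def next_chars(trie, prefix):
    """Return the sorted characters that can directly follow `prefix`.

    `trie` is any object with a keys(prefix) method returning all stored keys
    that start with `prefix`. Note that this enumerates the entire subtree
    under `prefix` just to learn its outgoing edges.
    """
    n = len(prefix)
    return sorted({key[n] for key in trie.keys(prefix) if len(key) > n})


class ListBackedTrie:
    """Hypothetical stand-in that mimics a keys(prefix) query with a plain list."""

    def __init__(self, words):
        self._words = sorted(words)

    def keys(self, prefix=""):
        return [w for w in self._words if w.startswith(prefix)]
```

For a short prefix on a large trie, `keys(prefix)` returns most of the stored keys, so a single one-edge step costs time proportional to the size of the subtree rather than to the number of outgoing edges.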