Use `curies` package for default implementation #23
Codecov Report
@@ Coverage Diff @@
## master #23 +/- ##
==========================================
- Coverage 82.20% 81.81% -0.40%
==========================================
Files 5 5
Lines 163 176 +13
==========================================
+ Hits 134 144 +10
- Misses 29 32 +3
@sierra-moxon @kshefchek this PR now replaces the default functionality with a fast implementation based on tries. It doesn't change any functionality when custom dictionaries are passed; ideally, those should be pre-processed into the trie structure to take advantage of its speed-up.
prefixcommons/curie_util.py
    def get_prefixes(cmaps: Optional[List[PREFIX_MAP]] = None) -> List[str]:
        if cmaps is None:
            cmaps = default_curie_maps
            return sorted(default_converter.get_prefixes())
Hi @cthoyt - does this change the order of the prefix maps from what they were before?
It appeared that the existing order was random (or was based on implicit assumptions about Python data structures), so I don't think that there's a satisfying answer to your question.
Additionally, prefixes could have been duplicated, since the previous implementation's logic was to just extend a list. That doesn't make a lot of sense to me either.
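To illustrate the duplication concern, here is a minimal sketch. It assumes the earlier logic simply concatenated the keys of each prefix map; the two maps and the helper name `collect_prefixes_by_extending` are hypothetical and not taken from this codebase.

```python
from typing import Dict, List

# Hypothetical prefix maps; both happen to define the "GO" prefix.
map_a: Dict[str, str] = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
}
map_b: Dict[str, str] = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}


def collect_prefixes_by_extending(cmaps: List[Dict[str, str]]) -> List[str]:
    """Concatenate the keys of every map without de-duplicating (assumed old behavior)."""
    prefixes: List[str] = []
    for cmap in cmaps:
        prefixes.extend(cmap.keys())
    return prefixes


extended = collect_prefixes_by_extending([map_a, map_b])
unique = set(extended)  # a set drops the repeated "GO" entry
```

Here `extended` contains "GO" twice, while `unique` holds each prefix once.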
Agree; we can definitely make the code better. Is there a problem you are trying to solve with the `sorted` addition?
The `curies` package returns a set, which seems more meaningful since prefixes don't have an inherent ordering. However, this package expects a list, so sorting the set to deterministically make it a list seemed reasonable.
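As a small illustration of the trade-off being discussed, the sketch below converts a hypothetical set of prefixes to a list in two ways. The set contents are made up for the example.

```python
# Hypothetical set of prefixes, standing in for what a set-returning API might give back.
prefixes = {"GO", "CHEBI", "MONDO"}

# sorted() always yields the same alphabetical order.
as_sorted = sorted(prefixes)

# list() keeps whatever iteration order the set happens to have; for strings in
# CPython this can differ between interpreter runs because of hash randomization.
as_list = list(prefixes)
```

Both results contain the same elements; only `as_sorted` has an order that is guaranteed to be reproducible.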
Makes good sense; however, if we leave off the `sorted`, we might have a better chance of preserving the serendipitous effects here? (The worry for me is downstream dependencies that depend on a specific ordering; I've had cases where I've tried to reorder the many different contexts to take advantage of one or the other's more complete or updated maps, only to find it breaks project assumptions about prefixes, etc.) Tangential to this PR perhaps, but it would be good to know our strategy for aligning this codebase with bioregistry. #24
So rather than `sorted`, just do `list`? That's fine for me if you think it will work.
I made this update in 4987c55
@@ -136,7 +143,15 @@ def contract_uri(

    """
    if cmaps is None:
        cmaps = default_curie_maps
        # TODO warn if not shortest?
I.e., this implementation is not compatible with the argument `shortest=False`, since tries only return the longest match.
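The limitation can be sketched as follows, under the assumption that `shortest=False` is meant to return every possible contraction rather than a single one. The prefix map and both helper functions are hypothetical, and the longest-match helper uses a plain linear scan as a stand-in for a trie lookup; only the shape of the result matters here.

```python
from typing import Dict, List, Optional

# Hypothetical prefix map with two overlapping URI prefixes.
PREFIX_MAP: Dict[str, str] = {
    "OBO": "http://purl.obolibrary.org/obo/",
    "GO": "http://purl.obolibrary.org/obo/GO_",
}


def all_contractions(uri: str, prefix_map: Dict[str, str]) -> List[str]:
    """Return one CURIE per URI prefix that matches the start of the URI."""
    return [
        f"{prefix}:{uri[len(base):]}"
        for prefix, base in prefix_map.items()
        if uri.startswith(base)
    ]


def longest_match_contraction(uri: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Return only the CURIE for the longest matching URI prefix, or None."""
    best_prefix: Optional[str] = None
    best_base = ""
    for prefix, base in prefix_map.items():
        if uri.startswith(base) and len(base) > len(best_base):
            best_prefix, best_base = prefix, base
    if best_prefix is None:
        return None
    return f"{best_prefix}:{uri[len(best_base):]}"


uri = "http://purl.obolibrary.org/obo/GO_0008150"
every = all_contractions(uri, PREFIX_MAP)
single = longest_match_contraction(uri, PREFIX_MAP)
```

With overlapping prefixes, `every` holds two candidates while `single` holds just one, which is the information a longest-match lookup cannot recover.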
Closes #5
This PR uses the `curies` package to provide a much faster default implementation for expansion and contraction that uses the trie data structure. This doesn't address the case where users bring their own custom prefix maps, as this data structure needs to be pre-built.
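For readers unfamiliar with the approach, here is a minimal character-level trie sketch for URI contraction. It is not the `curies` implementation; the class names `_Node` and `UriPrefixTrie` and the example prefix map are assumptions made for illustration. It shows why the structure has to be built up front: all the work of indexing the prefix map happens once in the constructor, and each lookup then walks the URI a single time.

```python
from typing import Dict, Optional, Tuple


class _Node:
    """One trie node; `prefix` is set when a URI prefix ends at this node."""

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.prefix: Optional[str] = None


class UriPrefixTrie:
    """Hypothetical trie that indexes URI prefixes by character."""

    def __init__(self, prefix_map: Dict[str, str]) -> None:
        # Pre-build step: insert every URI prefix once.
        self._root = _Node()
        for prefix, uri_prefix in prefix_map.items():
            node = self._root
            for char in uri_prefix:
                node = node.children.setdefault(char, _Node())
            node.prefix = prefix

    def contract(self, uri: str) -> Optional[str]:
        """Return the CURIE for the longest matching URI prefix, or None."""
        node: Optional[_Node] = self._root
        best: Optional[Tuple[str, int]] = None
        for index, char in enumerate(uri):
            node = node.children.get(char) if node else None
            if node is None:
                break
            if node.prefix is not None:
                # Remember the deepest (longest) match seen so far.
                best = (node.prefix, index + 1)
        if best is None:
            return None
        prefix, length = best
        return f"{prefix}:{uri[length:]}"


trie = UriPrefixTrie(
    {
        "OBO": "http://purl.obolibrary.org/obo/",
        "GO": "http://purl.obolibrary.org/obo/GO_",
    }
)
go_curie = trie.contract("http://purl.obolibrary.org/obo/GO_0008150")
obo_curie = trie.contract("http://purl.obolibrary.org/obo/CHEBI_1")
```

A custom prefix map passed in on every call would need this construction step repeated each time, which is where the speed-up would be lost.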