List of improvements for big datasets #53
Comments
In GitLab by @grst on Apr 1, 2020, 14:43 |
In GitLab by @grst on Apr 6, 2020, 08:26 changed title from List of improvements for {-many cell-}s to List of improvements for {+big dataset+}s |
In GitLab by @grst on Apr 6, 2020, 08:26 changed the description |
Maybe this can be abused to compute Levenshtein distance in linear time: and/or to prefilter which alignments to compute |
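One simple way to prefilter which pairs need an alignment at all, sketched here as an illustration and not as the method referred to above: the Levenshtein distance between two strings is never smaller than the difference of their lengths, so any pair whose lengths differ by more than the cutoff can be skipped without computing anything. The function name `candidate_pairs` and the parameter `max_dist` are hypothetical.

```python
def candidate_pairs(seqs, max_dist):
    """Yield index pairs (i, j), i < j, whose length difference is <= max_dist.

    Relies on the lower bound levenshtein(a, b) >= abs(len(a) - len(b)):
    pairs that fail this check cannot be within max_dist of each other.
    """
    # Visit sequences in order of increasing length so the inner loop can stop early.
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]))
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if len(seqs[j]) - len(seqs[i]) > max_dist:
                break  # every later sequence is at least as long, so stop here
            yield (min(i, j), max(i, j))


seqs = ["CASS", "CASSLG", "CAS", "CASSLGQ"]
pairs = sorted(candidate_pairs(seqs, max_dist=1))
```

Only the surviving pairs would then be passed to the actual distance or alignment routine. The filter never discards a true hit, but it can let through pairs that turn out to be farther apart than the cutoff.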
Consider storing clonotype x clonotype matrices rather than cell x cell matrices. |
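A minimal sketch of that idea, assuming for illustration that a "clonotype" is simply a group of cells with an identical sequence: distances are computed once per pair of unique sequences, and each cell only keeps an index into the smaller matrix. All names here (`clonotype_distances`, `cell_distance`) are hypothetical, and the pure-Python `levenshtein` is only a stand-in for the optimized C library.

```python
from itertools import combinations


def levenshtein(a, b):
    """Plain dynamic-programming edit distance (stand-in for a C implementation)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def clonotype_distances(cell_sequences):
    """Return (unique_seqs, cell_to_clonotype, dist) for a list of per-cell sequences."""
    unique_seqs = sorted(set(cell_sequences))
    index = {seq: i for i, seq in enumerate(unique_seqs)}
    # One integer per cell instead of one matrix row per cell.
    cell_to_clonotype = [index[seq] for seq in cell_sequences]
    n = len(unique_seqs)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = levenshtein(unique_seqs[i], unique_seqs[j])
    return unique_seqs, cell_to_clonotype, dist


def cell_distance(cell_a, cell_b, cell_to_clonotype, dist):
    """Look up the distance between two cells via their clonotype indices."""
    return dist[cell_to_clonotype[cell_a]][cell_to_clonotype[cell_b]]


cells = ["CASSL", "CASSF", "CASSL", "CASSL"]
unique_seqs, cell_to_clonotype, dist = clonotype_distances(cells)
```

With many cells per expanded clonotype, the stored matrix grows with the number of unique sequences rather than the number of cells, and cell-level distances remain available through the index lookup.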
I tried to compute edit distance with symspellpy, but it is actually 3x slower than the current Levenshtein implementation. The reason is probably that the Levenshtein package is a highly optimized C library, while symspellpy is a pure Python port of the SymSpell library. |
Maybe WFA provides even better alignment performance than parasail? |
mmseqs2 should do a great job for this. |
- anndata -> to be able to submit this easily to a cluster
- optimize/parallelize construction and reduction of coord dictionary. (superseded by Refactor CDR3-network construction. #191)
- further optimize matrix housekeeping in tcr_neighbors (superseded by Refactor CDR3-network construction. #191)