Pangenome graph creation optimizations #64

Open
kevinkle opened this issue Jun 19, 2019 · 9 comments

@kevinkle
Member

Using objects instead of strings for graph creation slowed performance, probably because of the additional allocations.

2019-06-19 14:33:34 panther prairiedog[21158] DEBUG Done graphing SRR3295769.fasta, covering 4964075 kmers in 521.0180611610413 s

For comparison, this took ~460 s in #53 (comment).

Performance was much worse (1000+ s) before we changed LGGraph.add_edge() so that it only returns an Edge object when the optional echo arg is set.
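
A minimal sketch of the echo idea, assuming a simplified stand-in for LGGraph. The class name, the Edge fields, and the tuple storage below are hypothetical and not prairiedog's actual code; the point is only that the Edge wrapper is allocated when the caller asks for it and skipped otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Edge:
    # Hypothetical lightweight edge record; the real Edge class may differ.
    src: str
    tgt: str
    value: int


class SketchGraph:
    """Hypothetical stand-in for LGGraph that stores edges as plain tuples."""

    def __init__(self) -> None:
        self.edges: List[Tuple[str, str, int]] = []

    def add_edge(self, src: str, tgt: str, value: int,
                 echo: bool = False) -> Optional[Edge]:
        # Always store the edge in the cheap representation.
        self.edges.append((src, tgt, value))
        # Only build the Edge object when the caller wants it back.
        if echo:
            return Edge(src, tgt, value)
        return None


g = SketchGraph()
g.add_edge("ATG", "TGC", 0)                      # hot path: no Edge allocated
edge = g.add_edge("TGC", "GCA", 1, echo=True)    # caller needs the object
```

In a loop over millions of kmers, the default echo=False path avoids one object allocation per edge.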

@kevinkle
Member Author

Back to ~426 s after #69:

2019-06-20 09:12:06 panther prairiedog[16565] DEBUG Done graphing SRR3295769.fasta, covering 4964075 kmers in 426.83602833747864 s
Current graph size is 1.1517066955566406 GB
2019-06-20 09:19:05 panther prairiedog[16565] DEBUG Done graphing SRR3665189.fasta, covering 4975087 kmers in 415.99421286582947 s
Current graph size is 2.240581512451172 GB
rule 'pangenome' on Kmer 3 / 10
2019-06-20 09:19:08 panther prair

kevinkle self-assigned this Jun 21, 2019
@kevinkle
Member Author

@kevinkle
Member Author

Note that nosync just sets MDB_NOSYNC, per https://sourcegraph.com/github.com/NationalSecurityAgency/lemongraph/-/blob/lib/db.c#L48, which is probably suitable for our use case. See https://news.ycombinator.com/item?id=18411474
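
For context, MDB_NOSYNC tells LMDB not to flush system buffers to disk when committing a transaction, trading durability after a system crash for write speed. The sketch below is only an illustration of that trade-off using plain files and os.fsync. It is not LMDB or LemonGraph code, and the function name and commit counts are made up for the example.

```python
import os
import tempfile
import time


def timed_writes(n_commits: int, sync: bool) -> float:
    """Append n_commits small records, optionally fsync-ing after each one."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for i in range(n_commits):
            os.write(fd, b"%d\n" % i)
            if sync:
                # Force the data to disk, like a synchronous commit would.
                os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print("with fsync:   ", timed_writes(200, sync=True))
    print("without fsync:", timed_writes(200, sync=False))
```

How large the gap is depends heavily on the disk and filesystem, so this says nothing about how much nosync should help graph creation specifically.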

@kevinkle
Member Author

This is with the new mapsize, nosync=True, noreadahead=True. It seems slower:

2019-06-24 10:25:49 panther prairiedog[15246] DEBUG 4700000/4899264, 95%
2019-06-24 10:26:02 panther prairiedog[15246] DEBUG 4800000/4899264, 97%
2019-06-24 10:26:15 panther prairiedog[15246] DEBUG Done graphing SRR2407793.fasta, covering 4899264 kmers in 615.9522776603699 s
Current graph size is 20.06531524658203 GB
rule 'pangenome' on Kmer 12 / 100

@kevinkle
Member Author

With the old mapsize and the other options unchanged:

2019-06-24 10:51:35 panther prairiedog[20555] DEBUG 4900000/4975087, 98%
2019-06-24 10:51:44 panther prairiedog[20555] DEBUG Done graphing SRR3665189.fasta, covering 4975087 kmers in 565.5920946598053 s
Current graph size is 3.167652130126953 GB
rule 'pangenome' on Kmer 3 / 100
2019-06-24 10:42:18 panther prairiedog[20555] DEBUG Done graphing SRR3295769.fasta, covering 4964075 kmers in 574.8278570175171 s
Current graph size is 1.5720138549804688 GB
rule 'pangenome' on Kmer 2 / 100

@kevinkle
Member Author

old mapsize, readahead=False, nosync=True

2019-06-24 11:15:45 panther prairiedog[20806] DEBUG Done graphing SRR3665189.fasta, covering 4975087 kmers in 554.5166437625885 s
Current graph size is 3.1686248779296875 GB
rule 'pangenome' on Kmer 3 / 100

@kevinkle
Member Author

old mapsize, readahead=False, nosync=False. I wonder if something else changed between here and #69.

2019-06-24 11:36:14 panther prairiedog[20943] DEBUG Done graphing SRR3665189.fasta, covering 4975087 kmers in 567.0542962551117 s
Current graph size is 3.1681747436523438 GB
rule 'pangenome' on Kmer 3 / 100
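
To compare runs like the ones above without eyeballing the logs, the "Done graphing" lines can be pulled out with a regular expression. This is a rough sketch that assumes the log format stays exactly as pasted in this thread; the function name and the kmers/s summary are my own additions.

```python
import re

# Matches lines such as:
# "... DEBUG Done graphing SRR3665189.fasta, covering 4975087 kmers in 554.51 s"
DONE_RE = re.compile(
    r"Done graphing (?P<fasta>\S+), covering (?P<kmers>\d+) kmers "
    r"in (?P<seconds>[\d.]+) s"
)


def parse_done_lines(log_text: str):
    """Return (fasta, kmers, seconds) for each 'Done graphing' line."""
    return [
        (m["fasta"], int(m["kmers"]), float(m["seconds"]))
        for m in DONE_RE.finditer(log_text)
    ]


# Two log lines copied from the comments above.
sample = (
    "2019-06-24 11:15:45 panther prairiedog[20806] DEBUG Done graphing "
    "SRR3665189.fasta, covering 4975087 kmers in 554.5166437625885 s\n"
    "2019-06-24 11:36:14 panther prairiedog[20943] DEBUG Done graphing "
    "SRR3665189.fasta, covering 4975087 kmers in 567.0542962551117 s\n"
)
for fasta, kmers, seconds in parse_done_lines(sample):
    print(f"{fasta}: {kmers / seconds:.0f} kmers/s")
```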

@kevinkle
Member Author

Since the earlier runs above, we added filename and contig header props to each edge and increased mapsize. I will leave it running under the profiler and see what shows up.
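
A minimal sketch of profiling one graphing call with the standard library's cProfile. graph_one_genome below is a hypothetical placeholder, not the real prairiedog entry point, and the prop names in it are made up; swap in the actual call to see where the time goes.

```python
import cProfile
import io
import pstats


def graph_one_genome(n_kmers: int) -> int:
    # Hypothetical placeholder for the real graphing step: one edge per
    # consecutive kmer pair, each carrying genome and contig props.
    edges = {}
    for i in range(n_kmers - 1):
        edges[(i, i + 1)] = {"genome": "example.fasta", "contig": "contig_1"}
    return len(edges)


def profile_call(func, *args, top: int = 10) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


print(profile_call(graph_one_genome, 100_000))
```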

@kevinkle
Member Author

kevinkle commented Jul 3, 2019

PyPy with prop metadata for additional genome and contig:

2019-07-03 11:12:40 panther prairiedog[22137] DEBUG 4800000/4800480, 99%
2019-07-03 11:12:40 panther prairiedog[22137] DEBUG Done graphing SRR1060582.fasta, covering 4800480 kmers in 231.11508917808533 s
Current graph size is 1.553009033203125 GB
rule 'pangenome' on Kmer 2 / 2
2019-07-03 11:16:37 panther prairiedog[22137] DEBUG 4800000/4871878, 98%
2019-07-03 11:16:40 panther prairiedog[22137] DEBUG Done graphing SRR3295722.fasta, covering 4871878 kmers in 239.99927473068237 s

Without edge props:

2019-07-03 11:20:41 panther prairiedog[22894] DEBUG 4800000/4800480, 99%
2019-07-03 11:20:41 panther prairiedog[22894] DEBUG Done graphing SRR1060582.fasta, covering 4800480 kmers in 151.0755934715271 s
Current graph size is 1.1490974426269531 GB
2019-07-03 11:23:19 panther prairiedog[22894] DEBUG 4800000/4871878, 98%
2019-07-03 11:23:21 panther prairiedog[22894] DEBUG Done graphing SRR3295722.fasta, covering 4871878 kmers in 159.93281745910645 s
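
Working the numbers from the two PyPy runs above: the edge props add roughly 50% to graphing time per genome, and the graph after the first genome is about 35% larger. The arithmetic, with values copied from the logs (the helper name is just for illustration):

```python
# Timings (s) and graph sizes (GB) copied from the PyPy logs above.
with_props = {"SRR1060582": 231.11508917808533, "SRR3295722": 239.99927473068237}
without_props = {"SRR1060582": 151.0755934715271, "SRR3295722": 159.93281745910645}
size_with, size_without = 1.553009033203125, 1.1490974426269531


def overhead_pct(with_value: float, without_value: float) -> float:
    """Relative increase of with_value over without_value, in percent."""
    return (with_value / without_value - 1.0) * 100.0


for genome in with_props:
    pct = overhead_pct(with_props[genome], without_props[genome])
    print(f"{genome}: edge props add {pct:.0f}% to graphing time")
print(f"graph size after first genome: +{overhead_pct(size_with, size_without):.0f}%")
```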
