Parallelize relevant portions of IR2Vec with OMP #101
Comments
Hi, this seems like an interesting enhancement that I'd like to help out on. I think it's important to have a baseline to compare against for any potential improvements; is the TimeTaken experiment suitable for that? Further, is there a script I could use to generate the time-taken results? I'd be happy to add an additional benchmark as well; the SQLite Amalgamation might be an interesting option.
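One way to get a repeatable baseline number is to time the same workload several times and report the median. The helper below is a minimal sketch of that idea; `medianMillis` is a hypothetical name, not something from the IR2Vec repository, and the workload passed in is whatever the benchmark needs to measure.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper (not part of IR2Vec): runs `work` `runs` times and
// returns the median wall-clock time in milliseconds.
template <typename F>
double medianMillis(F &&work, int runs) {
  assert(runs > 0);
  std::vector<double> samples;
  samples.reserve(static_cast<std::size_t>(runs));
  for (int i = 0; i < runs; ++i) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::milli>(stop - start).count());
  }
  // The median is less sensitive to a single slow run than the mean.
  std::sort(samples.begin(), samples.end());
  return samples[samples.size() / 2];
}
```

Using `steady_clock` avoids jumps from system clock adjustments, which matters when comparing a serial run against a parallel one.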
Hi @m-atalla, apologies for the delayed response. We do not have a script for this yet; it would be great if you could help with this. The SQLite Amalgamation is also very interesting and would be a valuable addition. We have started integrating OMP with IR2Vec (see #105, which is a work in progress). Please feel free to reach out if you need any input or have further questions. We will be happy to help :)
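For readers unfamiliar with the kind of change being discussed, the sketch below shows the general shape of an OpenMP parallel loop over independent per-function work. It is not the code from #105: `embedFunction` and `embedProgram` are hypothetical stand-ins, and summing per-function vectors into a program vector is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical stand-in for the per-function embedding work.
Vec embedFunction(int functionId, std::size_t dim) {
  Vec v(dim);
  for (std::size_t i = 0; i < dim; ++i)
    v[i] = static_cast<double>(functionId) * 0.5 + static_cast<double>(i);
  return v;
}

// Computes one vector per function in parallel, then combines them serially.
Vec embedProgram(const std::vector<int> &functionIds, std::size_t dim) {
  assert(dim > 0);
  std::vector<Vec> perFunction(functionIds.size());
  // Signed index keeps the loop valid for older OpenMP versions.
  const long n = static_cast<long>(functionIds.size());
  // Each iteration writes only its own slot, so no locking is needed.
#pragma omp parallel for schedule(dynamic)
  for (long i = 0; i < n; ++i)
    perFunction[i] = embedFunction(functionIds[i], dim);

  // Serial accumulation keeps the floating-point result deterministic.
  Vec program(dim, 0.0);
  for (const Vec &v : perFunction)
    for (std::size_t d = 0; d < dim; ++d)
      program[d] += v[d];
  return program;
}
```

When built without `-fopenmp`, the pragma is ignored and the loop runs serially, which gives a convenient baseline for comparison.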
Hi, I wanted to follow up with profiling info on the SQLite benchmark now that it's added! I used:
$ perf record -g --call-graph dwarf build/bin/ir2vec --sym -level p ./src/test-suite/PE-benchmarks-llfiles-llvm17/sqlite3.ll -o sqlite.txt
$ perf script > /tmp/sym-perf.out
I then used the Firefox Profiler to analyze and upload the profile data, which can be found here. From the call tree, it seems that about 53% of the time is spent on parsing (not much can be done about that) and 44% is spent in
Similarly, I generated a profile for the flow-aware (FA) mode, which can be found here. The call tree shows the following functions
I'd be happy to assist further as needed. Thank you.
Hi @m-atalla, thanks for the perf report :) It exposes more opportunities for optimization in addition to parallelization. Off the top of my head, I have two things:
Perhaps I will create separate issues to track these, as the objective of these points is a bit different from that of the current issue. Please give me some time. I will take a more detailed look at the perf report and get back with more possible improvements.
Hi @svkeerthy, sorry, I lost track of this issue as I'm currently in the midst of working on my master's thesis. I think I can send a PR for the SmallVector copy part in