Bulk insert aggregated results #225
Conversation
I've run some tests and memory usage is actually better, probably because we no longer fetch multiple copies of the same entities repeatedly. Gist
Add level order, which should be faster when fetching from the database, by making one query per level of the tree, instead of possibly one per node. Signed-off-by: Rafael Reggiani Manzo <[email protected]>
The coverage was too loose for `level_order`, which relied on `descendants_by_level`, and too tight for the latter, which mocked the protected method `fetch_all_children` and left it uncovered by tests.
This is not necessary for the given `ModuleResult#find_by_module_and_processing` unit test.
Places all the tree-walking related methods within the same `describe` block, enabling setup sharing. This is based on the work of @danielkza in commit 805c7c1.
Previously, aggregation worked by traversing the module tree in pre-order. But to ensure that children are aggregated before their parents, we can relax that order a bit: just process all the results on one level before all of those on the level above it (the topmost level consisting of the root alone). This allows fetching many more results at once and significantly reduces the number of trips to the database, from a number proportional to the number of nodes to no more than the maximum depth of the tree. It also makes it much easier to accumulate the created tree metric results and insert them all at once, which saves another huge number of trips to the database. Using the aggregation performance tests on my development machine, the average time, combined with the indexing changes that were previously made, went from around 200s to <20s.

Regarding tests: a complete refactor was necessary, and made possible by the module results tree factory. The tests ended up much cleaner and arguably better, as they can verify the actual values being aggregated while mocking only the necessary data accesses.
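The level-order idea described above can be sketched in plain Ruby. This is a minimal, self-contained illustration, not the PR's code: the `CHILDREN`/`VALUES` hashes stand in for the database, and the `levels`/`aggregate` names are invented for the example. The point is that each level is collected in one batched fetch, and levels are aggregated bottom-up so children's results always exist before their parents'.

```ruby
# Tree shape: each node id maps to its children's ids. Stand-in for the DB.
CHILDREN = {
  1 => [2, 3],
  2 => [4, 5],
  3 => [], 4 => [], 5 => []
}
# Leaves (and node 3) already have metric results.
VALUES = { 3 => 6.0, 4 => 1.0, 5 => 3.0 }

# Collect the tree one level at a time: one batched "query" per level,
# instead of one fetch per node as pre-order traversal would do.
def levels(root)
  result = []
  current = [root]
  until current.empty?
    result << current
    current = current.flat_map { |id| CHILDREN.fetch(id, []) }
  end
  result
end

# Aggregate bottom-up: a level is processed only after the one below it,
# so every child's result is available when its parent is computed.
def aggregate(root)
  results = VALUES.dup
  levels(root).reverse_each do |level|
    level.each do |id|
      kids = CHILDREN.fetch(id, [])
      next if kids.empty? # leaves keep their own results
      results[id] = kids.sum { |k| results[k] } / kids.size.to_f
    end
  end
  results
end
```

With this shape, the number of fetches equals the tree depth (3 here) regardless of how many nodes each level holds.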
It is no longer used since the new aggregation processing collects the tree metric results itself, making the auxiliary logic to find out whether a node already has a result for a metric unnecessary.
Using `import!` with no batch limit can theoretically offer the best performance, but generates obscenely large queries that are very hard to deal with in logs. A batch size of 100 wrapped in a transaction seems to have virtually no performance penalty, but makes things much easier to manage.
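The batching trade-off described above can be shown with a small plain-Ruby sketch (no ActiveRecord; the `batched_insert` name and the query-size bookkeeping are illustrative only). In the real code this would be a single `import!` call per batch, wrapped in one transaction, so each batch becomes one bounded multi-row INSERT instead of one enormous statement.

```ruby
BATCH_SIZE = 100

# Split the records into fixed-size batches; each batch would become one
# multi-row INSERT. Returns the row count of each simulated query so the
# effect of the batch size is visible.
def batched_insert(records, batch_size = BATCH_SIZE)
  queries = []
  records.each_slice(batch_size) do |batch|
    # Real code: MetricResult.import!(batch) inside a transaction.
    queries << batch.size
  end
  queries
end
```

For 250 records this yields three queries of 100, 100 and 50 rows: the same total work as one giant statement, but each logged query stays a manageable size.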
@@ -4,6 +4,9 @@ KalibroProcessor is the processing web service for Mezuro.

== Unreleased

* Insert in one query all aggregated MetricResults
I don't mean to be too meticulous here, but now we are batching the queries, right?
@diegoamc Right. I personally prefer changelog entries that describe changes as they would affect a user reading it, and not exactly what changed. In this case it doesn't really matter whether we batch, insert at once, or whatever else, only that the aggregation got faster.
@@ -30,8 +30,27 @@ def pre_order
  @pre_order ||= pre_order_traverse(root).to_a
end

def level_order
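The diff above only shows the `level_order` signature. A plausible shape for it, reconstructed from the discussion (the body and the `ModuleResult` scaffolding here are hypothetical; only the `level_order` and `fetch_all_children` names come from the PR), is a breadth-first walk where each level's children are loaded with one batched query:

```ruby
# Simplified in-memory stand-in for the ActiveRecord model.
class ModuleResult
  attr_reader :id, :children

  def initialize(id, children = [])
    @id = id
    @children = children
  end

  # Walk the tree one level at a time; one batched fetch per level.
  def level_order
    levels = []
    current = [self]
    until current.empty?
      levels << current
      current = fetch_all_children(current)
    end
    levels
  end

  protected

  # Stand-in for the real batched query, which would load every child of
  # the given nodes at once (e.g. WHERE parent_id IN (...)).
  def fetch_all_children(nodes)
    nodes.flat_map(&:children)
  end
end
```

Mocking only `fetch_all_children` in tests, as the PR discussion suggests, keeps the traversal logic itself under test while stubbing out the data access.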
Is this method actually called somewhere?
Nice catch, it isn't used. It was part of an earlier version of the PR. It might be useful for other processing steps, but we can add it whenever we actually need it.
Congratulations, this is a very fine piece of work! 👏 I think we should remove
Good to me!
Done.
Sweet 🍬
This is mostly @danielkza's great work, for which I've split some commits into atomic ones.
To further improve the effect of the bulk insertions, bottom-up level-order tree walking has been added in this same PR.
As this stores a lot of data in memory, I think adding memory statistics to the profiling is sensible before accepting this.
Aggregation performance results:
Almost a 10x speedup! Nicely done!
This is part of #207.
After accepting this, please create an issue so we do not forget to rewrite the `MetricResult#pre_order` unit tests to take full advantage of the objects that already exist in the shared context.