Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect size parameter in aggregations (1/3 - refactor) #35

Merged
merged 6 commits into from
May 21, 2024

Conversation

trzysiek
Copy link
Member

@trzysiek trzysiek commented May 5, 2024

Basically moves SimpleQuery from queryparser package to model. Also starts a small refactor to aggregation_parser, which is the motivation for this change. Before we had 1 function parse_some_aggregation_type. Now I split it into 2: parse and process. I think it simplifies a lot of things + enables making plugins for new aggregations quite easy.

edit2: comments below are obsolete. Changed this PR into refactor, next will follow.

Edit: I think it works 100% fine now, although this issue is generally pretty tricky, and hard to get right, and code is so far pretty ugly, because of a lot of trial and error approaches.
1 test was failing after some small fixes, and I finally managed to make it work at the end. Will check rest of tests tomorrow, but I'm 99% they'll pass.

Generally, plan is to transform (multiple) terms (simple split into buckets, or GROUP BY with limit) (e.g. with LIMITs 2 and 5) into something like this:

SELECT customer_id, toInt64(taxless_total_price / 20) FROM opensearch_dashboards_sample_data_ecommerce
    WHERE customer_id IN 
        (SELECT customer_id FROM opensearch_dashboards_sample_data_ecommerce GROUP BY customer_id ORDER BY customer_id LIMIT 2)
    GROUP BY customer_id, toInt64(taxless_total_price / 20)
	ORDER BY customer_id, toInt64(taxless_total_price / 20)
	LIMIT 5 BY customer_id

First terms uses normal LIMIT N, 2+ terms use LIMIT N BY [previous terms]
You can paste this into our Clickhouse to see that such a query is correct
Screenshot 2024-05-15 at 01 55 28

Old comments:

Before: we didn't add LIMIT x anywhere, so we were displaying all buckets.
After: well, I tried visualizing a lot of charts in OpenSearch and it seems to work everywhere, even this Other bar.
Screenshot 2024-05-05 at 18 40 56
Screenshot 2024-05-05 at 18 48 01

BUT
a) not important, fixed on Tuesday I'm pretty sure the code isn't perfect, as I said I can't break it easily in UI, but with more complex aggregations it probably will not work 100% correctly. The PR code isn't prettiest, maybe some small refactor which is needed anyway will help with that correctness issue.
b) still valid, but not very important, and not very hard Order (by) parameter in query still isn't respected.

I'd leave b) for next PR.

@trzysiek trzysiek force-pushed the respect-size-in-aggregations branch 2 times, most recently from ae2d25c to d4f9eb0 Compare May 8, 2024 12:05
@jakozaur jakozaur mentioned this pull request May 14, 2024
@trzysiek trzysiek force-pushed the respect-size-in-aggregations branch 7 times, most recently from 080517c to 5b73ca3 Compare May 20, 2024 20:49
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simply copied from queryparser package

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok it got moved, not copied.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code is the same as before, added only a few checks which were missing.
Moved from big single file to a separate one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest of those functions will also have finish... name, I think more appropriate, if you don't agree, maybe let's argue in next PR when I change all

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to model

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's only used in tests, these prints help when they fail

@trzysiek trzysiek force-pushed the respect-size-in-aggregations branch from 9532bb1 to 4b4efc3 Compare May 20, 2024 21:15
@trzysiek trzysiek changed the title Respect size parameter in aggregations 1/3 (refactor) Respect size parameter in aggregations May 20, 2024
@trzysiek trzysiek changed the title 1/3 (refactor) Respect size parameter in aggregations Respect size parameter in aggregations (1/3) refactor May 20, 2024
@trzysiek trzysiek changed the title Respect size parameter in aggregations (1/3) refactor Respect size parameter in aggregations (1/3 - refactor) May 20, 2024
@trzysiek trzysiek force-pushed the respect-size-in-aggregations branch 4 times, most recently from 1ceacc1 to 5973ca4 Compare May 20, 2024 22:00
@trzysiek trzysiek marked this pull request as ready for review May 20, 2024 22:01
@trzysiek trzysiek requested a review from a team as a code owner May 20, 2024 22:01
@trzysiek trzysiek requested a review from pdelewski May 20, 2024 22:01
@trzysiek trzysiek marked this pull request as draft May 20, 2024 22:05
@trzysiek trzysiek marked this pull request as ready for review May 20, 2024 22:56
@jakozaur
Copy link
Contributor

The general comment is that "LIMIT N BY " is an unusual SQL method. Most databases do not support this. Quesma would still need to generate a query for that database without this clause.

@jakozaur
Copy link
Contributor

Ok, I would need walkthrough. I see a lot of refactoring changes which are ok.

Some logic there I don't understand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok it got moved, not copied.

aggrBuilder.Type = aggr
aggrBuilder.whereBuilder = model.CombineWheres(cw.Ctx, aggrBuilder.whereBuilder, filter.Sql)
aggrBuilder.Aggregators = append(aggrBuilder.Aggregators, model.NewAggregatorEmpty(filter.Name))
*resultAccumulator = append(*resultAccumulator, aggrBuilder.buildBucketAggregation(nil)) // nil for now, will be changed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don;t get it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used to be like that before.
.Type keeps what type of aggregation it is, here some filters.
.whereBuilder keeps where clause as string, I combine, so do old_where AND this_filter's_where
.Aggregators = list of aggregation names to be put in response's JSON. We add filter.name where it's kept.
resultAccumulator = result list of all queries we need to send to DB, we add 1 here for each filter

@trzysiek trzysiek force-pushed the respect-size-in-aggregations branch from 5973ca4 to da2fe90 Compare May 21, 2024 16:53
@trzysiek trzysiek force-pushed the respect-size-in-aggregations branch from bd3e400 to 2e55d63 Compare May 21, 2024 17:12
@trzysiek trzysiek merged commit e0efff2 into main May 21, 2024
4 checks passed
@trzysiek trzysiek deleted the respect-size-in-aggregations branch May 21, 2024 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants