-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support database queries on arbitrary labels #117
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some inline comments, overall this seems to be going in a good direction, thanks for the contribution.
One other thing: can you please open the corresponding Steve PR as well, even in draft mode? I'd like to have a look at the whole picture, and possibly locally try it out as well. Thanks in advance! |
The steve PR is at rancher/steve#317 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this gets closer. I found two issues, one of which (adding an INDEX) should be very straightforward to solve.
The other will likely need changes in the parser in Steve as well as changes in the query builder here in listOptionsIndexer.
Thanks, keep up the good work!
912361f
to
2a2272b
Compare
This change has the new filter query expression parser, with tests. There's some cruft that came in from the k8s code that I'll pull out during the next round of reviews. |
Here are some sample filters commands I've been running:
This last one is interesting -- on the command-line, the square brackets have to be URL-hex-encoded. There are also unit tests that verify that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ericpromislow, I see this moving well towards the goal.
Most of my notes are nitpicks, there is just a couple of substantial ones.
More than anything, this needs some road testing by @richard-cox or somebody in the UI team - we want to make sure that whatever is built here will satisfy the needs of frontend code.
Keep up the good work!
clause := fmt.Sprintf(`f."%s" IS NOT NULL`, columnName) | ||
return clause, []any{}, nil | ||
case NotExists: | ||
clause := fmt.Sprintf(`f."%s" IS NULL`, columnName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear this will never happen, because addIndexedField will use empty string instead of proper NULL
:
To do this right we have to:
- really insert
NULL
- check for nullability every time we do another comparision (eg.
=
or<>
, because 3-valued logic) - make sure this does not break the UI
Before implementing that, though, do we need new operator support in the UI at all?
IIUC:
- Before all we had for fields was equality and inequality
- This PR's main intent is about adding support for filtering by labels and with more operators (in, not in, exists, doesn't exist)
- This code section basically adds the same operators for fields as well
@ericpromislow is that correct?
@richard-cox do we need them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think about it, my guess is that NULL/!NULL don't make sense for the non-labels, because either we index them (which means they can't be NULL, because they always come with a value, right?) or we don't index them, which means you can't query on those fields.
But NULL/!NULL do make sense for labels, because you're basically querying on the existence of a key in a hash. Also it took me a couple of hours to work out the syntax for NotExists(label)
so I don't want to toss it that quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about yaml an empty label value foo:
is provided as foo: ""
via api (json). Not sure that makes any difference?
I couldn't find anything specific in the label selector for 'without this label' / 'with a label but NULL', though the generic utility of filter by everything that doesn't have a property value might be useful at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Number 1, let's focus back on fields (not labels) in this thread - discussion on labels, if needed, will be in a separate thread.
Number 2, @ericpromislow:
my guess is that NULL/!NULL don't make sense for the non-labels, because either we index them (which means they can't be NULL, because they always come with a value, right?) or we don't index them, which means you can't query on those fields.
Hmm, my understanding is different. Assuming that we index a certain resource field, its value can very well be null (ie. missing or as explicit null
in the YAML) - lots of structs corresponding to Kubernetes resources come with the "omitempty" directive for that reason.
We have to keep in mind that, right now, in such cases we simply throw away the null and exchange it with an empty string. That is arguably not really awesome, but it suffices when all we care for is string search, which is probably why this shortcut was originally taken. Now...
Number 3, @richard-cox: do I understand correctly that, from a UI perspective, we do not really need anything other than exact/inexact text matching for non-label properties, at least for now?
case Eq: | ||
if filter.Partial { | ||
opString = "LIKE" | ||
escapeString = escapeBackslashDirective |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about inlining the directive in the SQL query where it's used, instead of having this as a constant copied into a variable and then into the query?
(yes, I understand that's a bit of repetition, maybe it's just me suffering from indirection motion sickness!)
PS. same about matchFmt. I had a hard time looking into 4 places before coalescing the sense of the query in my mind while reviewing 🦀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBD
We need to hear from Richard if this business with quoting strings to be exact vs doing substring matches is going to survive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There needs to be a way to differentiate between partial and exact matches. As long as that's there am open to offers on syntax (UI changes would be straight forward)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richard-cox please note that the context for this thread is labels.
My understanding is that, for labels, you need power to match label selectors.
Now label selectors do implement equality, which we call exact for normal fields. They do not implement "substring equality".
Can you confirm we will continue to need both "exact" and "substring" equality for normal fields, and label selector power on label fields, which does not include "substring" equality?
query += fmt.Sprintf(`JOIN "%s_fields" f ON o.key = f.key`, dbName) | ||
if hasLabelFilter(lo.Filters) { | ||
query += "\n " | ||
query += fmt.Sprintf(`JOIN "%s_labels" lt ON o.key = lt.key`, dbName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear this will not always work.
Main failure case will be when a single object row has multiple labels: JOIN
will create multiple resulting rows with all object/field columns repeated, and different label columns.
Secondarily, if an object has no labels at all then it will be filtered out by this JOIN
.
One way to resolve this problem could be to not JOIN the labels table at all here, and only use subqueries in filters when needed.
Another is to LEFT OUTER JOIN
here, and then deduplicate by DISTINCT on all other columns before returning results. Probably more convoluted.
Do we have any tests involving the database that cover this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main peeve here is that we're mocking the database in the unit tests, and that makes increasing less sense to me.
I've written plenty of unit tests in Rails and Flask (python), and you never mock the database. You just use a test variant of it, and the framework resets the database on every test. Yes, it takes a bit longer, but the tests are much simpler (no mocks) and you have more confidence that the DB code will run correctly in production.
So, no, we don't have any tests that cover this.
Yes, I'm proposing that we rewrite the unit tests .
In my defense: https://markphelps.me/posts/writing-tests-for-your-database-code-in-go/ :
However, I would argue that mocking the database when testing your SQL code is an anti-pattern that you should avoid, mainly because it violates requirement 3[*] since it isn’t actually testing that we can correctly interact with a real database as we are mocking the database itself!
- "We want to test that we can correctly interact with the database and also test that the business logic is correct."
I certainly worked out my statements interactively with a sqlite DB, namely the one created by running steve while creating a few events, and then shutting down steve so the DB wouldn't change.
Here's an example:
sqlite> select * from _v1_Event_labels;
default/my-custom-event4|app|app1
default/my-custom-event4|bar|chocolate
default/my-custom-event5|app|app1
default/my-custom-event5|bar|chocolate
default/my-custom-event5|cat|morris
default/my-custom-event5|dog|zipher
sqlite> SELECT o.object, o.objectnonce, o.dekid FROM "_v1_Event" o JOIN "_v1_Event_fields" f on o.key = f.key JOIN "_v1_Event_labels" lt on o.key = lt.key WHERE (lt.label = "dog" and lt.value = "zipher");
%
Unstructured??||0
This looks ok. Unfortunately the fields we're asking for our partly binary and not very useful. But let's look at a label that is used more than once:
sqlite> SELECT o.object, o.objectnonce, o.dekid FROM "_v1_Event" o JOIN "_v1_Event_fields" f on o.key = f.key JOIN "_v1_Event_labels" lt on o.key = lt.key WHERE (lt.label = "bar" and lt.value = "chocolate");
%
Unstructured??||0
%
Unstructured??||0
But it's hard to tell what we're getting, so let's get the key instead:
sqlite> SELECT o.key FROM "_v1_Event" o JOIN "_v1_Event_fields" f on o.key = f.key JOIN "_v1_Event_labels" lt on o.key = lt.key WHERE (lt.label = "bar" and lt.value = "chocolate");
default/my-custom-event4
default/my-custom-event5
Unless I'm missing something, this looks ok. In the next comment I'll report on an object where there are no labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here's a reason="Starting"
event for metadata.name=lima-rancher-desktop.17e053aa93e89c71
that has no labels. Here are two curl calls, one without labels, one with, on this event:
$ curl -ksL https://localhost:5111/v1/events'?filter=metadata.name="lima-rancher-desktop.17e053aa93e89c71"' | jq . | wc
94 165 2826
$ curl -ksL https://localhost:5111/v1/events'?filter=metadata.name="lima-rancher-desktop.17e053aa93e89c71"&filter=metadata.labels.fish=car' | jq .
{
"type": "collection",
"links": {
"self": "https://localhost:5111/v1/events"
},
"createTypes": {
"event": "https://localhost:5111/v1/events"
},
"actions": {},
"resourceType": "event",
"data": []
}
The SQL for the second call was this:
SELECT o.object, o.objectnonce, o.dekid FROM "_v1_Event" o
JOIN "_v1_Event_fields" f ON o.key = f.key
JOIN "_v1_Event_labels" lt ON o.key = lt.key
WHERE
(f."metadata.name" = "lima-rancher-desktop.17e053aa93e89c71") AND
(lt.label = "fish" AND lt.value LIKE "%car% ESCAPE '\')
ORDER BY f."metadata.namespace" ASC, f."metadata.name" ASC
LIMIT 100000
and it behaved as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main peeve here is that we're mocking the database in the unit tests, and that makes increasing less sense to me.
100% agreed. DB should be in the loop - we do care about correctness of queries, but much more importantly we care about correctness of results. And those can't be tested without some DB in the loop.
I've written plenty of unit tests in Rails and Flask (python), and you never mock the database. You just use a test variant of it, and the framework resets the database on every test. Yes, it takes a bit longer, but the tests are much simpler (no mocks) and you have more confidence that the DB code will run correctly in production.
Same experience in my background (Rails and Java/Hibernate).
So, no, we don't have any tests that cover this.
How about extending pkg/cache/sql/integration_test.go
Those already run a fake Kubernetes environment and have the DB in the loop.
(PS. I am having a look at your queries too, probably tomorrow. I am sorry I could not take the time to try them today)
9d24e20
to
2e39614
Compare
* Add labels when adding/replacing objects. * Add labels to the query language
e1c53ba
to
b032b9d
Compare
…escriptive named constants. Also: Move all the label operations from store to listoption_indexer.
…bles. - tx.Exec takes only one argument.
- Fixed rendering NOT-EXISTS queries. - Wrap query error in a 'db.QueryError' object. - Use consistent error message when failing to get a unstructured object. - Don't include ORDER-BY clauses in COUNT queries. - Don't bother pulling the various fields out of the `queryInfo` struct - Pull the count queryInfo parts out only when needed.
In particular, don't clear the count query values if no count query needs to be made -- just leave the default struct values, and the query executor won't run a count-query.
b032b9d
to
8501428
Compare
Related to #46333
This PR needs to get merged first before I can submit the PR for Steve,
which caches the labels. If you prefer, I'll submit a PR with steve that
temporarily pulls in
../lasso
in the go.mod file.Note that this is the search syntax I've implemented:
If
LABELVALUE
is quoted, an exact match is made. Otherwise partial matching is done on the value,like for other
A=B
queries.Note that the exact
LABELNAME
must be specified aftermetadata.labels
, between the escaped square brackets.This is similar to how you always have to provide the full name of a field to the left of
=
.This PR supersedes the experimental PR #110