Support database queries on arbitrary labels #117

ericpromislow · 2024-11-06T00:31:14Z

Related to #46333

This PR needs to be merged before I can submit the Steve PR that caches the labels.
If you prefer, I can submit the Steve PR now, with its go.mod temporarily pointing at ../lasso.

Note that this is the search syntax I've implemented:

curl -sk https://HOSTANDPORT/v1/configmaps?filter=metadata.labels%5bLABELNAME%5d=LABELVALUE

If LABELVALUE is quoted, an exact match is made. Otherwise, partial matching is done on the value,
as for other A=B queries.
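A minimal Go sketch of the quoted-vs-unquoted rule, assuming the filter value arrives as a raw string that may still be wrapped in double quotes. The helper name `labelValueCondition` and the `?` placeholder style are illustrative assumptions, not the code in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// labelValueCondition is a hypothetical helper that turns a raw filter value
// into a SQL condition on lt.value plus the argument to bind for it.
// A double-quoted value means an exact match; anything else is a substring
// match, as for other A=B queries. (Escaping of LIKE metacharacters such as
// % and _ is deliberately omitted here.)
func labelValueCondition(raw string) (string, string) {
	if len(raw) >= 2 && strings.HasPrefix(raw, `"`) && strings.HasSuffix(raw, `"`) {
		// Strip the surrounding quotes and compare for equality.
		return "lt.value = ?", raw[1 : len(raw)-1]
	}
	// Unquoted: wrap in % wildcards for a partial match.
	return "lt.value LIKE ?", "%" + raw + "%"
}

func main() {
	cond, arg := labelValueCondition(`"true"`)
	fmt.Println(cond, arg)
	cond, arg = labelValueCondition("app")
	fmt.Println(cond, arg)
}
```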

Note that the exact LABELNAME must be specified after metadata.labels, between the escaped square brackets.
This is similar to how you always have to provide the full name of a field to the left of =.
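To make the bracket rule concrete, here is a small Go sketch that recognizes a label field and extracts the exact label name. It assumes the `%5b`/`%5d` escapes have already been URL-decoded; `parseLabelField` is a hypothetical name, not a function from lasso or Steve.

```go
package main

import (
	"fmt"
	"strings"
)

const labelPrefix = "metadata.labels["

// parseLabelField reports whether a filter field refers to a label and, if so,
// returns the full label name found between the square brackets.
func parseLabelField(field string) (string, bool) {
	if !strings.HasPrefix(field, labelPrefix) || !strings.HasSuffix(field, "]") {
		return "", false
	}
	name := field[len(labelPrefix) : len(field)-1]
	if name == "" {
		// metadata.labels[] names no label at all.
		return "", false
	}
	return name, true
}

func main() {
	name, ok := parseLabelField("metadata.labels[authz.management.cattle.io/default-project]")
	fmt.Println(name, ok)
	_, ok = parseLabelField("metadata.name")
	fmt.Println(ok)
}
```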

This PR supersedes the experimental PR #110

moio

I left some inline comments, overall this seems to be going in a good direction, thanks for the contribution.

pkg/cache/sql/store/store.go

pkg/cache/sql/db/client.go

pkg/cache/sql/informer/listoption_indexer.go

moio · 2024-11-06T12:23:58Z

One other thing: can you please open the corresponding Steve PR as well, even in draft mode?

I'd like to have a look at the whole picture, and possibly try it out locally as well.

Thanks in advance!

ericpromislow · 2024-11-07T01:17:59Z

The steve PR is at rancher/steve#317

moio

Thanks, this gets closer. I found two issues, one of which (adding an INDEX) should be very straightforward to solve.

The other will likely need changes in the parser in Steve as well as changes in the query builder here in listOptionsIndexer.

Thanks, keep up the good work!

pkg/cache/sql/informer/listoption_indexer.go

ericpromislow · 2024-11-14T02:18:30Z

This change has the new filter query expression parser, with tests. There's some cruft that came in from the k8s code that I'll pull out during the next round of reviews.

ericpromislow · 2024-11-15T00:32:57Z

Here are some sample filter commands I've been running:

/v1/events'?filter=involvedObject.kind=Pod'
/v1/events?filter=_type=some-event-type
/v1/events'?filter=metadata.labels.app=app1'
/v1/events'?filter=message="This+event+%234."'  # have to URL-encode number signs
/v1/events'?filter=metadata.labels%5bauthz.management.cattle.io/default-project%5d="true"'

This last one is interesting: on the command line, the square brackets have to be URL-encoded (%5b and %5d).
Next, the lexer considers square brackets to be alphanumerics, and the restricted label syntax means
that all the characters inside the brackets are alphanumeric as well. The code that converts k8s/apimachinery
Requirement objects into lasso/Informer objects knows what to do with metadata.labels[...], and you get
the SQL query you'd expect.
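Rather than hand-encoding the brackets, a client can let Go's `net/url` do it. This sketch assumes the server URL-decodes the `filter` parameter in the standard way; `encodeFilter` and `decodeFilter` are illustrative helpers, not part of Steve.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeFilter percent-encodes a raw filter expression as the value of a
// filter= query parameter. url.Values.Encode escapes '[' and ']' (as %5B and
// %5D) along with '/', '=', '"' and '#', so the expression survives the shell
// and HTTP layers untouched.
func encodeFilter(expr string) string {
	return url.Values{"filter": {expr}}.Encode()
}

// decodeFilter is the server-side inverse: parse the query string and return
// the first filter= value.
func decodeFilter(rawQuery string) (string, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", err
	}
	return values.Get("filter"), nil
}

func main() {
	expr := `metadata.labels[authz.management.cattle.io/default-project]="true"`
	q := encodeFilter(expr)
	fmt.Println("/v1/events?" + q)
	back, err := decodeFilter(q)
	fmt.Println(back == expr, err)
}
```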

There are also unit tests that verify that filter=x=1,y=2 => SELECT ... WHERE x = 1 OR y = 2
and filter=x=1&filter=y=2 => SELECT ... WHERE x = 1 AND y = 2
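The combination rule those tests check (commas OR terms together, repeated `filter=` parameters AND groups together) can be sketched in Go as below. This is a simplified illustration that only handles plain `field=value` terms; `buildWhere` is a hypothetical name, and the real parser handles quoting and more operators.

```go
package main

import (
	"fmt"
	"strings"
)

// buildWhere turns a list of filter= parameter values into a WHERE body.
// Each parameter becomes one parenthesized group whose comma-separated terms
// are ORed; the groups are then ANDed. Field names are interpolated, so they
// are assumed to have been validated against the indexed columns already.
// Splitting on "," means values containing commas are not supported here.
func buildWhere(filterParams []string) (string, []any) {
	var groups []string
	var args []any
	for _, param := range filterParams {
		var terms []string
		for _, term := range strings.Split(param, ",") {
			field, value, ok := strings.Cut(term, "=")
			if !ok {
				continue // skip malformed terms in this sketch
			}
			terms = append(terms, fmt.Sprintf(`f."%s" = ?`, field))
			args = append(args, value)
		}
		if len(terms) > 0 {
			groups = append(groups, "("+strings.Join(terms, " OR ")+")")
		}
	}
	return strings.Join(groups, " AND "), args
}

func main() {
	fmt.Println(buildWhere([]string{"x=1,y=2"}))
	fmt.Println(buildWhere([]string{"x=1", "y=2"}))
}
```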

moio

Thanks @ericpromislow, I see this moving well towards the goal.

Most of my notes are nitpicks; there are just a couple of substantial ones.

More than anything, this needs some road testing by @richard-cox or somebody in the UI team - we want to make sure that whatever is built here will satisfy the needs of frontend code.

Keep up the good work!

pkg/cache/sql/informer/listoption_indexer.go

moio · 2024-11-28T15:31:31Z

pkg/cache/sql/informer/listoption_indexer.go

+		clause := fmt.Sprintf(`f."%s" IS NOT NULL`, columnName)
+		return clause, []any{}, nil
+	case NotExists:
+		clause := fmt.Sprintf(`f."%s" IS NULL`, columnName)


I fear this will never happen, because addIndexedField will use empty string instead of proper NULL:

https://github.com/rancher/lasso/pull/117/files?w=1#diff-910681530ceac7cfe911fb307754542734adee41c8d009d795aabe0d9e3803b2R200

To do this right we have to:

- really insert NULL
- check for nullability every time we do another comparison (e.g. = or <>, because of 3-valued logic)
- make sure this does not break the UI
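To illustrate the three-valued-logic point: once missing values are real NULLs, `f."x" <> ?` stops matching rows where x is NULL, because `NULL <> 'a'` evaluates to NULL and WHERE treats that as false. A minimal sketch, assuming rows without a value should count as "not equal"; the helper names are hypothetical.

```go
package main

import "fmt"

// notEqualClause builds an inequality test that keeps rows whose column is
// NULL. Without the explicit IS NULL branch those rows would silently drop out.
func notEqualClause(columnName string) string {
	return fmt.Sprintf(`(f."%s" IS NULL OR f."%s" <> ?)`, columnName, columnName)
}

// equalClause needs no extra branch: NULL = 'a' is also NULL, and excluding
// rows with no value is the intended result for equality.
func equalClause(columnName string) string {
	return fmt.Sprintf(`f."%s" = ?`, columnName)
}

func main() {
	fmt.Println(notEqualClause("metadata.namespace"))
	fmt.Println(equalClause("metadata.namespace"))
}
```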

Before implementing that, though, do we need new operator support in the UI at all?

IIUC:

- Before this PR, all we had for fields was equality and inequality
- This PR's main intent is to add support for filtering by labels, with more operators (in, not in, exists, doesn't exist)
- This code section basically adds the same operators for fields as well

@ericpromislow is that correct?
@richard-cox do we need them?

Now that I think about it, my guess is that NULL/!NULL don't make sense for the non-labels, because either we index them (which means they can't be NULL, because they always come with a value, right?) or we don't index them, which means you can't query on those fields.

But NULL/!NULL do make sense for labels, because you're basically querying on the existence of a key in a hash. Also it took me a couple of hours to work out the syntax for NotExists(label) so I don't want to toss it that quickly.
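For labels, existence maps naturally onto a correlated subquery against the labels table. The sketch below assumes a `"<dbName>_labels"` table with `key`, `label` and `value` columns, as in the sqlite session elsewhere in this thread; `labelExistsClause` is a hypothetical helper, not the syntax this PR settled on. Because it uses a subquery instead of a JOIN, the NOT EXISTS form still returns objects that have no labels at all.

```go
package main

import "fmt"

// labelExistsClause returns a condition that is true when the object row `o`
// has (or, with negate, lacks) a label with the key bound to the placeholder.
func labelExistsClause(dbName string, negate bool) string {
	op := "EXISTS"
	if negate {
		op = "NOT EXISTS"
	}
	return fmt.Sprintf(
		`%s (SELECT 1 FROM "%s_labels" lt1 WHERE lt1.key = o.key AND lt1.label = ?)`,
		op, dbName)
}

func main() {
	fmt.Println(labelExistsClause("_v1_Event", false))
	fmt.Println(labelExistsClause("_v1_Event", true))
}
```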

Thinking about YAML: an empty label value `foo:` is provided as `foo: ""` via the API (JSON). Not sure that makes any difference?

I couldn't find anything specific in label selectors for 'without this label' / 'with a label but NULL', though a generic filter for everything that doesn't have a property value might be useful at some point.

Number 1, let's focus back on fields (not labels) in this thread - discussion on labels, if needed, will be in a separate thread.

Number 2, @ericpromislow:

> my guess is that NULL/!NULL don't make sense for the non-labels, because either we index them (which means they can't be NULL, because they always come with a value, right?) or we don't index them, which means you can't query on those fields.

Hmm, my understanding is different. Assuming that we index a certain resource field, its value can very well be null (i.e. missing, or an explicit null in the YAML): lots of structs corresponding to Kubernetes resources carry the "omitempty" JSON tag option for that reason.

We have to keep in mind that, right now, in such cases we simply throw away the null and exchange it with an empty string. That is arguably not really awesome, but it suffices when all we care for is string search, which is probably why this shortcut was originally taken. Now...
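One way to stop throwing the null away, sketched under assumptions: when an indexed field is extracted, a missing value becomes a NULL `sql.NullString` instead of `""`. `toColumnValue` is a hypothetical helper; the thread does not show how `addIndexedField` actually extracts values. `sql.NullString` implements `driver.Valuer`, so it can be passed directly as a query argument.

```go
package main

import (
	"database/sql"
	"fmt"
)

// toColumnValue converts an extracted field value into a bindable argument.
// nil (field absent) becomes an invalid NullString, which is stored as NULL,
// so IS NULL / IS NOT NULL can tell "absent" apart from "empty string".
func toColumnValue(v any) sql.NullString {
	if v == nil {
		return sql.NullString{} // Valid == false: stored as NULL
	}
	return sql.NullString{String: fmt.Sprint(v), Valid: true}
}

func main() {
	fmt.Println(toColumnValue(nil).Valid) // absent field
	fmt.Println(toColumnValue("").Valid)  // present but empty
}
```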

Number 3, @richard-cox: do I understand correctly that, from a UI perspective, we do not really need anything other than exact/inexact text matching for non-label properties, at least for now?

pkg/cache/sql/informer/listoption_indexer.go

moio · 2024-11-28T15:43:07Z

pkg/cache/sql/informer/listoption_indexer.go

+	case Eq:
+		if filter.Partial {
+			opString = "LIKE"
+			escapeString = escapeBackslashDirective


How about inlining the directive in the SQL query where it's used, instead of having this as a constant copied into a variable and then into the query?

(yes, I understand that's a bit of repetition, maybe it's just me suffering from indirection motion sickness!)

PS. Same about matchFmt. I had to look in four places before the meaning of the query came together in my mind while reviewing 🦀
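For comparison, here is what an inlined version might look like: the `ESCAPE '\'` directive is written directly in the SQL, next to the LIKE it belongs to, and the user's value is escaped so `%` and `_` match literally. `partialMatch` and `likeEscaper` are hypothetical names, not identifiers from listoption_indexer.go.

```go
package main

import (
	"fmt"
	"strings"
)

// likeEscaper escapes the escape character itself plus the LIKE wildcards.
// strings.Replacer does not rescan its own output, so "\%" is not re-escaped.
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

// partialMatch builds a substring-match clause with the ESCAPE directive
// inlined, and returns the escaped, %-wrapped argument to bind.
func partialMatch(column, value string) (string, string) {
	clause := fmt.Sprintf(`%s LIKE ? ESCAPE '\'`, column)
	return clause, "%" + likeEscaper.Replace(value) + "%"
}

func main() {
	fmt.Println(partialMatch("lt.value", "50%_off"))
}
```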

TBD

We need to hear from Richard whether this business of quoting strings for exact matches vs. doing substring matches is going to survive.

There needs to be a way to differentiate between partial and exact matches. As long as that's there, I'm open to offers on syntax (UI changes would be straightforward).

@richard-cox please note that the context for this thread is labels.

My understanding is that, for labels, you need power to match label selectors.

Now label selectors do implement equality, which we call exact for normal fields. They do not implement "substring equality".

Can you confirm we will continue to need both "exact" and "substring" equality for normal fields, and label selector power on label fields, which does not include "substring" equality?

moio · 2024-11-28T15:54:40Z

pkg/cache/sql/informer/listoption_indexer.go

+	query += fmt.Sprintf(`JOIN "%s_fields" f ON o.key = f.key`, dbName)
+	if hasLabelFilter(lo.Filters) {
+		query += "\n  "
+		query += fmt.Sprintf(`JOIN "%s_labels" lt ON o.key = lt.key`, dbName)


I fear this will not always work.

The main failure case is when a single object row has multiple labels: the JOIN will create multiple result rows, with all object/field columns repeated and different label columns.

Secondarily, if an object has no labels at all then it will be filtered out by this JOIN.

One way to resolve this problem could be to not JOIN the labels table at all here, and only use subqueries in filters when needed.

Another is to LEFT OUTER JOIN here, and then deduplicate by DISTINCT on all other columns before returning results. Probably more convoluted.
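The first option (no JOIN on the labels table, subqueries only) could look roughly like this Go sketch. Each label requirement becomes its own EXISTS test, so an object with many labels still yields one row. Also, if two requirements were ANDed against a single joined `lt` alias, no single label row could satisfy both; separate subqueries avoid that. `labelFilter` and `buildLabelQuery` are hypothetical, and the table names follow the `"<dbName>"`, `"<dbName>_fields"`, `"<dbName>_labels"` pattern quoted in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// labelFilter is a simplified label requirement: label Key must equal Value.
type labelFilter struct {
	Key, Value string
}

// buildLabelQuery builds the object query with one EXISTS subquery per label
// requirement instead of a JOIN on the labels table.
func buildLabelQuery(dbName string, filters []labelFilter) (string, []any) {
	query := fmt.Sprintf(
		`SELECT o.object, o.objectnonce, o.dekid FROM "%s" o JOIN "%s_fields" f ON o.key = f.key`,
		dbName, dbName)
	var conds []string
	var args []any
	for _, lf := range filters {
		conds = append(conds, fmt.Sprintf(
			`EXISTS (SELECT 1 FROM "%s_labels" lt WHERE lt.key = o.key AND lt.label = ? AND lt.value = ?)`,
			dbName))
		args = append(args, lf.Key, lf.Value)
	}
	if len(conds) > 0 {
		query += " WHERE " + strings.Join(conds, " AND ")
	}
	return query, args
}

func main() {
	q, args := buildLabelQuery("_v1_Event", []labelFilter{{"app", "app1"}, {"bar", "chocolate"}})
	fmt.Println(q)
	fmt.Println(args)
}
```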

Do we have any tests involving the database that cover this?

My main peeve here is that we're mocking the database in the unit tests, and that makes less and less sense to me.

I've written plenty of unit tests in Rails and Flask (Python), and you never mock the database. You just use a test variant of it, and the framework resets the database on every test. Yes, it takes a bit longer, but the tests are much simpler (no mocks) and you have more confidence that the DB code will run correctly in production.

So, no, we don't have any tests that cover this.

Yes, I'm proposing that we rewrite the unit tests.

In my defense: https://markphelps.me/posts/writing-tests-for-your-database-code-in-go/ :

> However, I would argue that mocking the database when testing your SQL code is an anti-pattern that you should avoid, mainly because it violates requirement 3[*] since it isn’t actually testing that we can correctly interact with a real database as we are mocking the database itself!

> "We want to test that we can correctly interact with the database and also test that the business logic is correct."

I certainly worked out my statements interactively with a sqlite DB, namely the one created by running steve while creating a few events, and then shutting down steve so the DB wouldn't change.

Here's an example:

sqlite> select * from _v1_Event_labels;
default/my-custom-event4|app|app1
default/my-custom-event4|bar|chocolate
default/my-custom-event5|app|app1
default/my-custom-event5|bar|chocolate
default/my-custom-event5|cat|morris
default/my-custom-event5|dog|zipher
sqlite> SELECT o.object, o.objectnonce, o.dekid FROM "_v1_Event" o JOIN "_v1_Event_fields" f on o.key = f.key JOIN "_v1_Event_labels" lt on o.key = lt.key WHERE (lt.label = "dog" and lt.value = "zipher");
% Unstructured??||0

This looks OK. Unfortunately, the fields we're asking for are partly binary and not very useful. But let's look at a label that is used more than once:

sqlite> SELECT o.object, o.objectnonce, o.dekid FROM "_v1_Event" o JOIN "_v1_Event_fields" f on o.key = f.key JOIN "_v1_Event_labels" lt on o.key = lt.key WHERE (lt.label = "bar" and lt.value = "chocolate");
% Unstructured??||0
% Unstructured??||0

But it's hard to tell what we're getting, so let's get the key instead:

sqlite> SELECT o.key FROM "_v1_Event" o JOIN "_v1_Event_fields" f on o.key = f.key JOIN "_v1_Event_labels" lt on o.key = lt.key WHERE (lt.label = "bar" and lt.value = "chocolate");
default/my-custom-event4
default/my-custom-event5

Unless I'm missing something, this looks ok. In the next comment I'll report on an object where there are no labels.

So here's a reason="Starting" event for metadata.name=lima-rancher-desktop.17e053aa93e89c71 that has no labels. Here are two curl calls on this event, one without a label filter and one with:

$ curl -ksL https://localhost:5111/v1/events'?filter=metadata.name="lima-rancher-desktop.17e053aa93e89c71"' | jq . | wc
94 165 2826
$ curl -ksL https://localhost:5111/v1/events'?filter=metadata.name="lima-rancher-desktop.17e053aa93e89c71"&filter=metadata.labels.fish=car' | jq .
{
  "type": "collection",
  "links": {
    "self": "https://localhost:5111/v1/events"
  },
  "createTypes": {
    "event": "https://localhost:5111/v1/events"
  },
  "actions": {},
  "resourceType": "event",
  "data": []
}

The SQL for the second call was this:

SELECT o.object, o.objectnonce, o.dekid FROM "_v1_Event" o
  JOIN "_v1_Event_fields" f ON o.key = f.key
  JOIN "_v1_Event_labels" lt ON o.key = lt.key
  WHERE (f."metadata.name" = "lima-rancher-desktop.17e053aa93e89c71") AND (lt.label = "fish" AND lt.value LIKE "%car%" ESCAPE '\')
  ORDER BY f."metadata.namespace" ASC, f."metadata.name" ASC
  LIMIT 100000

and it behaved as expected.

> My main peeve here is that we're mocking the database in the unit tests, and that makes less and less sense to me.

100% agreed. The DB should be in the loop: we do care about correctness of queries, but much more importantly we care about correctness of results, and those can't be tested without some DB in the loop.

> I've written plenty of unit tests in Rails and Flask (Python), and you never mock the database. You just use a test variant of it, and the framework resets the database on every test. Yes, it takes a bit longer, but the tests are much simpler (no mocks) and you have more confidence that the DB code will run correctly in production.

Same experience in my background (Rails and Java/Hibernate).

> So, no, we don't have any tests that cover this.

How about extending pkg/cache/sql/integration_test.go?

Those already run a fake Kubernetes environment and have the DB in the loop.

(PS. I am having a look at your queries too, probably tomorrow. I am sorry I could not take the time to try them today)


- Fixed rendering NOT-EXISTS queries.
- Wrap query error in a 'db.QueryError' object.
- Use consistent error message when failing to get an unstructured object.
- Don't include ORDER-BY clauses in COUNT queries.
- Don't bother pulling the various fields out of the `queryInfo` struct.
- Pull the count queryInfo parts out only when needed.


ericpromislow requested a review from a team as a code owner November 6, 2024 00:31

ericpromislow requested review from moio and a team and removed request for a team November 6, 2024 00:31

moio requested changes Nov 6, 2024


pkg/cache/sql/store/store.go (outdated)

pkg/cache/sql/db/client.go (outdated)

pkg/cache/sql/informer/listoption_indexer.go (outdated)

ericpromislow marked this pull request as draft November 6, 2024 19:27

ericpromislow mentioned this pull request Nov 7, 2024

Index arbitrary labels rancher/steve#317

Draft

ericpromislow marked this pull request as ready for review November 7, 2024 01:18

moio requested changes Nov 7, 2024


ericpromislow requested a review from moio November 7, 2024 17:31

ericpromislow force-pushed the 46333-cache-labels branch from 912361f to 2a2272b November 14, 2024 02:17

ericpromislow requested a review from richard-cox November 14, 2024 02:17

moio requested changes Nov 28, 2024


ericpromislow force-pushed the 46333-cache-labels branch from 9d24e20 to 2e39614 November 30, 2024 00:41

ericpromislow requested a review from moio December 3, 2024 01:51

ericpromislow added 2 commits December 3, 2024 13:18

Support searching by metadata.label fields:

a8a924b

* Add labels when adding/replacing objects.
* Add labels to the query language

Add a unit test for label tests.

82b7994

ericpromislow force-pushed the 46333-cache-labels branch from e1c53ba to b032b9d December 3, 2024 21:21

ericpromislow changed the base branch from master to main December 3, 2024 21:23

ericpromislow added 7 commits December 3, 2024 13:26

Stick with terse in-place table alias names rather than interpolate d…

991d637

…escriptive named constants. Also: Move all the label operations from store to listoption_indexer.

Run go generate after refactoring label handling.

305b92b

Move label handling out of store.go and into the listoption indexer.

cbde1d8

Fix the number of args in the mocked calls to creating the indices ta…

2e3a890

…bles. - tx.Exec takes only one argument.

Split 'ListOptionIndexer.afterUpsert' into two funcs with more descri…

e555a6f

…ptive names.

Split the 'afterX' methods into two parts, with better names.

ef1acd2

Refactor the query-generator into its own method.

29f233f

ericpromislow added 6 commits December 3, 2024 13:27

Split 'ListByOptions' into two functions to simplify testing.

31a4b97

Add tests for converting other ops to sql stmts, fixing breakage.

95274c3

Add more tests, fix NOT-EXISTS for labels.

b8a8223

Simplify the way the final queryInfo struct is built.

e908e62

In particular, don't clear the count query values if no count query needs to be made -- just leave the default struct values, and the query executor won't run a count-query.

No need to map nil to an empty array when there's no count query.

8501428

ericpromislow force-pushed the 46333-cache-labels branch from b032b9d to 8501428 December 3, 2024 21:29

Process comparisons in the query strings.

2f0f76a


ericpromislow commented Nov 6, 2024

moio left a comment

moio commented Nov 6, 2024

ericpromislow commented Nov 7, 2024

moio left a comment

ericpromislow commented Nov 14, 2024

ericpromislow commented Nov 15, 2024

moio left a comment

moio Nov 28, 2024

ericpromislow Dec 3, 2024

richard-cox Dec 3, 2024

moio Dec 4, 2024

moio Nov 28, 2024

ericpromislow Dec 3, 2024

moio Dec 3, 2024

richard-cox Dec 3, 2024

moio Dec 4, 2024

moio Nov 28, 2024

ericpromislow Dec 3, 2024

ericpromislow Dec 3, 2024

moio Dec 3, 2024
