-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add special encoding of json array elements #3368
base: develop
Are you sure you want to change the base?
feat: Add special encoding of json array elements #3368
Conversation
when index is reading encoded keys it has how idea what's inside. It has either a prefix or an exact value. We use to encode like this (taking the example from the issue): {
"name": "Andy",
"custom": {
"numbers": 4,
}
}
{
"name": "John",
"custom": {
"numbers": [3, 4],
}
} Old way to store indexes:
So when one of these is fetched there is no info if it's array or not.
|
Excellent explanation, makes sense to me now, thanks Islam!
question: What is the significance of
|
but it's not necessary for our use case for now. Later we can employ it when we need to process more sophisticated array queries.
this is illegal json. This would be correct
And |
}, client.TraverseJSONOnlyLeaves(), client.TraverseJSONVisitArrayElements(false)) | ||
}, | ||
// we don't want to traverse intermediate nodes, because we encode only values that can be filtered on | ||
client.TraverseJSONOnlyLeaves(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why are these functions declared in the client
package? Do you think users will want to use them, and if so, for what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't have to be in client
package. It can be easily moved elsewhere? Do you have an idea where it would fit best?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are only ever called in internal/db/index.go
, so my preference would be to make them private funcs in that file. But the most important thing for me is to move them out of the users' scope and to somewhere internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turns out it's not that easy to move it out.
I created a ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your earlier answers Islam, I have now reviewed the PR - looks good, but I think you are dropping an error, and a little more documentation would be handy.
@@ -59,17 +143,17 @@ type JSON interface { | |||
Marshal(w io.Writer) error | |||
|
|||
// GetPath returns the path of the JSON value in the JSON tree. | |||
GetPath() []string | |||
GetPath() JSONPath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: Please note that this is the index path only, and does not represent the path of the base (datastore-doc) data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I said earlier, although at the moment it's used only by the indexer, it is not index-specific, hence doesn't need to mention it.
And there no point in saying anything about storage, as it's irrelevant. It says "in the JSON tree".
I thought we agreed that TODOs are going to be used only if a reviewer considers the code (or absence of it) a blocker. I don't think any request for comment or documentation can ever be considered a blocker. To me it feels like we failed the experiment at it's very start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think any request for comment or documentation can ever be considered a blocker.
I very strongly disagree with this. Anyone can write greenfield code that gets the machine to do what they want. The hard, and valuable, bit is writing maintainable code that it readable by your peers, including your future self. Code documentation is a valuable tool when it comes to improving code readability, and IMO reviewers, with their fresh, unfocused eyes are far better judges than authors when it comes to guessing where information is missing.
The fact that there is still confusion around what this function and it's types represent suggests very strongly to me that the code is not self describing and requires further documentation in order for maintainers to safely and efficiently maintain it.
This particular doc request
I think perhaps a large contribution to the continued confusion here is a name clash! The existing code/documentation uses the word index
a lot, and most of the time it is not referring to secondary indexes (the only place where the code is consumed), but to the index of elements in an array.
It is further confused by the magic 0
, that forms part of the encoded result of this path, that has the appearance of an array-location, yet is not. The existence/behaviour of this confuses the example in the documentation for JSONPathPart
for myself, although without looking into the codebase/raw-store this problem will not affect users.
Perhaps we can reduce this confusion by:
- suggestion: Avoid the use of the word
index
in the documentation forJSONPathPart
, for example perhaps talk about thelocation in the array
instead ofarray index
. - suggestion: Change the example in the documentation for
JSONPathPart
to avoid talking about the first index, so that we avoid the ambiguity of the hardcoded0
. Perhaps add a second element to the example array and talk about that instead. And/or note the existence of the magic0
in that documentation and what it represents (that might be out of place though). - suggestion: Add a line to the documentation for
GetPath
noting that it is currently only used internally for the encoding of secondary indexes (please make sure you include the wordsecondary
, to minimise the whole index vs index confusion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I think the current documentation is relevant and precise enough. This is not index specific and not related to datastore keys so I don't think its relevant to state that it does not represent a datastore doc path.
op, ok := key.(*mapper.Operator) | ||
if ok && isArrayCondition(op.Operation) { | ||
if op.Operation == compOpNone { | ||
// if the array condition is _none it doesn't make sense to use index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: Please document why it does not make sense to use an index for _none
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was obvious. It feel like I'm writing information that has no value.
I have to ask you @AndrewSisley again why you mark this as TODO. Even without the experiment I would disagree with it's TODO status.
It feels like you are not onboard with the experiment and going the opposite direction. I'm a bit frustrated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is obvious to you because you wrote it, and have spent considerably more time than most thinking about it.
It is a blocker, because I want code that is easy to read merged into to the codebase. This review took far longer than it needed to because it was not easy to read, it felt like you dumped 600 lines of undocumented code on us with a 3 line PR description. We are both frustrated. If you want quicker reviews, document your work so that other people can read it.
The experiment is not a 'merge whatever you like whether others can work with it or not' sprint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spoke with Fred, he agreed with you that this was not a blocker. He suggested I make it a suggestion, and successfully convinced me that it was suggestion worthy, and that my original comment was information-sparse :)
suggestion: This comment feels incomplete and raises questions in the reader's mind. It would speed up reading if it described why it does not make sense to use an index for _none
.
Perhaps you could could change it to something along the lines of:
// if the array condition is _none it doesn't make sense to use index as we need to do a full table scan anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in person, if we write documentation to make the code more readable, we should make an effort to describe why
the code has this behaviour. In this case, the why
is that the full table scan that would be required on the index would perform worst than the datastore full table scan so we don't use the index. It makes sense to update it in my opinion.
A todo was maybe a little much in this case but I don't think the comment goes against the experiment as we also discussed in person.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Relevant issue(s)
Resolves #3329
Description
Mark array elements in a special way so that it's possible to scope the search only to arrays.
Introduces
JSONPath
that helps differenciate parts of json path.Also changes the behavior of
_none
filter operator so that it's exlude resuls if they are not arrays.