Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

search_after param (used to paginate hits e.g. in Discover), 1/2 strategies #1104

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

trzysiek
Copy link
Member

@trzysiek trzysiek commented Dec 12, 2024

#1099 1/2 PR

After this PR instead of discarding search_after param, we simply add tuple(sort_field_1, sort_field_2, ...)<(search_after_1, search_after_2, ...) to the WHERE clause, so we'll (almost) properly return paginated hits (never wrong ones, but we can skip some valid ones)
If sort_field_i is asc, we swap sort_field_i with search_after_i in the expression above, I think it gives exactly what we want.

Timestamp fields use fromUnixTimestampMilli64 conversion - it'll work because the search_afterparam is what we returned in the previous request ourselves, for previous hits. And for dates/timestamps we always return unix_timestamp in ms.

This strategy is not fully correct, because we'll skip all hits with exactly the same sort values (timestamps in ms, usually), but it's some progress anyway. Before this PR, queries with this param basically didn't work at all, now they have a small bug.

TODO (I'll finish in next PR):

  • Finish Bulletproof strategy, which fixes the limitation above. Should be quick as I already got a lot of tests here, and there's not really a lot of code in the logic itself.
  • Fix conversion function, we always use from...Milli64. It'll always work for Clickhouse, but for Hydrolix and DateTime field (without 64), it won't. It's a known issue, quick to fix. I'd do it in another PR, as before this one, search_after didn't work anyway, so there's no regression here.

Screens of this working after all the commits:
Screenshot 2024-12-27 at 00 58 42
Screenshot 2024-12-27 at 00 58 57

@trzysiek trzysiek changed the title search_after param (used to paginate hits e.g. in Discover) search_after param #1 (used to paginate hits e.g. in Discover) Dec 13, 2024
@trzysiek trzysiek marked this pull request as ready for review December 13, 2024 15:03
@trzysiek trzysiek requested a review from a team as a code owner December 13, 2024 15:03
@trzysiek trzysiek changed the title search_after param #1 (used to paginate hits e.g. in Discover) search_after param (used to paginate hits e.g. in Discover), part 1 Dec 13, 2024
Copy link

cloudflare-workers-and-pages bot commented Dec 16, 2024

Deploying quesma with  Cloudflare Pages  Cloudflare Pages

Latest commit: 7afebeb
Status: ✅  Deploy successful!
Preview URL: https://80d7440c.quesma.pages.dev
Branch Preview URL: https://search-after-2.quesma.pages.dev

View logs

Copy link
Contributor

@jakozaur jakozaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

  1. Having more generic timestamp method accomodating for different types would be awesome.
  2. Not sure what if we don't have timestamp.
  3. Tests.

quesma/model/search_after.go Outdated Show resolved Hide resolved
if searchAfter.timestampMs == emptySearchAfterTs {
return query
}
timestampRangeClause := NewInfixExpr(s.timestampField, "<=", NewFunction("fromUnixTimestamp64Milli", NewLiteral(searchAfter.timestampMs))) // TODO fix this for Hydrolix...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not always correct, as we might have timestamps in seconds, not milliseconds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always correct, because we read up the value from this:

func FormatSortValue(i any) any {
	switch v := i.(type) {
	case time.Time:
		// When returned as part of `sort`, timestamps are always returned as millis
		return v.UnixMilli()
	default:
		return i
	}
}

As you see, if it's present, it'll always be milliseconds

Copy link
Member Author

@trzysiek trzysiek Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I definitely agree that the way it works and why it does, is way too cryptic.

quesma/model/search_after.go Outdated Show resolved Hide resolved
if searchAfterTs == emptySearchAfterTs {
return query
}
timestampRangeClause := NewInfixExpr(s.timestampField, "<", NewFunction("fromUnixTimestamp64Milli", NewLiteral(searchAfterTs)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

quesma/model/search_after.go Outdated Show resolved Hide resolved
@trzysiek trzysiek marked this pull request as draft December 16, 2024 12:42
@trzysiek
Copy link
Member Author

trzysiek commented Dec 16, 2024

  1. Not sure what if we don't have timestamp.

Tested that, and
a) we will return empty sort field for every hit
b) even if we have timestamp, we can sort by some other field, and my solution won't work anyway
edit: now it will

So unfortunately there's some TODO to do.

@trzysiek trzysiek force-pushed the search-after branch 3 times, most recently from e010e6d to 44c3016 Compare December 26, 2024 21:39
@trzysiek trzysiek changed the title search_after param (used to paginate hits e.g. in Discover), part 1 search_after param (used to paginate hits e.g. in Discover), 1/2 strategies Dec 26, 2024
@trzysiek
Copy link
Member Author

trzysiek commented Dec 27, 2024

  1. Having more generic timestamp method accomodating for different types would be awesome.

Agreed, but let me do that in a quick next PR, this one already is quite big, wouldn't like to mess in a couple more places.

@trzysiek trzysiek marked this pull request as ready for review December 27, 2024 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants