Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: skip rate limiting of snapshot events #1383

Merged
merged 4 commits into from
Aug 27, 2024

Conversation

daibhin
Copy link
Contributor

@daibhin daibhin commented Aug 23, 2024

Changes

Adaptation of #1353 based on feedback in #1353 (comment) and https://posthog.slack.com/archives/C03PB072FMJ/p1724078964640479

  • We did not want to adapt the split limit because we know there's too many events, it feels weird to have workaround code
  • Alternative approach: do the rate limiting before calling _captureSnapshot
    • Cool side effect of calling clientRateLimitContext is that it tracks one token which we can think of as representing the $snapshot event
    • That means when it comes to calling _captureSnapshot we can include skip_client_rate_limiting no matter how many individual snapshot events were created during the splitting of the buffer

Copy link

vercel bot commented Aug 23, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
posthog-js ✅ Ready (Inspect) Visit Preview Aug 27, 2024 1:27pm

Copy link

github-actions bot commented Aug 23, 2024

Size Change: +116 B (+0.01%)

Total Size: 1.17 MB

Filename Size Change
dist/array.full.js 334 kB +29 B (+0.01%)
dist/array.js 155 kB +29 B (+0.02%)
dist/main.js 156 kB +29 B (+0.02%)
dist/module.js 155 kB +29 B (+0.02%)
ℹ️ View Unchanged
Filename Size
dist/exception-autocapture.js 10.4 kB
dist/recorder-v2.js 110 kB
dist/recorder.js 110 kB
dist/surveys-preview.js 59.8 kB
dist/surveys.js 66 kB
dist/tracing-headers.js 8.26 kB
dist/web-vitals.js 5.79 kB

compressed-size-action

@daibhin daibhin requested a review from a team August 27, 2024 09:23
$snapshot_data: snapshotBuffer.data,
$session_id: snapshotBuffer.sessionId,
$window_id: snapshotBuffer.windowId,
const clientRateLimitContext = this.instance.rateLimiter.clientRateLimitContext()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this would get us the global bucket?

What I don't want (for example) is someone has an error in their code and exception autocapture rate limits because they're throwing in a loop

And then we can't capture a session recording.

I think we need to have a separate bucket for $snapshot events and a different rate limit for this - it's way chattier than other events

Ultimately I think we might want a separate bucket for errors but that can come with time.

I might be misunderstanding what's going on here ofc...

The benefit of client side rate limiting being event type aware would be it can pick the bucket and we don't need to change the recorder.

I'm picturing

public clientRateLimitContext(checkOnly = false, suffix: string): {
        isRateLimited: boolean
        remainingTokens: number
    } {
        // This is primarily to prevent runaway loops from flooding capture with millions of events for a single user.
        // It's as much for our protection as theirs.
        const now = new Date().getTime()
        const bucket = this.instance.persistence?.get_property(CAPTURE_RATE_LIMIT + '/' + suffix) ?? {
            tokens: this.captureEventsBurstLimit,
            last: now,
        }

or something equivalent to that so that we have a totally separate bucket.

equally you could make this endpoint aware so that it automatically has a different rate limit per endpoint

the other thing... this is pretty new... maybe the limits are wrong and that's the simplest fix is they should be a little higher so that replay doesn't trigger them

(Sorry brain dumping since i'll be afk for hours now :))

@@ -903,6 +909,7 @@ export class SessionRecording {
_url: this.instance.requestRouter.endpointFor('api', this._endpoint),
_noTruncate: true,
_batchKey: SESSION_RECORDING_BATCH_KEY,
skip_client_rate_limiting: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the short term i think we should just get this line in

i can't think of a problem caused by volume of snapshots that wasn't from a pre-batching version of the SDK

so i think we can skip client rate limiting for replay and then figure out a good solution with the pressure off

@daibhin
Copy link
Contributor Author

daibhin commented Aug 27, 2024

@pauldambra (replying to both your comments in one)

I think the point about looping exceptions blocking recordings is fair and could be a potential consequence.

I'd be wary of changing the rate limits because we've seen with recordings there's always been a bigger fish. If we increase the limits to a "sensible amount" there'll always be something just over that amount. In that sense I don't think the limits could be "wrong", more so the case that they can never be enough.

Going to go with the simplest solution of just calling skip_client_rate_limiting for $snapshot events for now given the batching constraint seems to have fixed a lot of issues downstream. If rate limiting becomes a problem in future we at least know what we can do here (most likely implement per bucket rate limits)

@daibhin daibhin requested a review from pauldambra August 27, 2024 14:30
@daibhin daibhin changed the title chore: check rate limit outside of capture method chore: skip rate limiting of snapshot events Aug 27, 2024
Copy link
Member

@pauldambra pauldambra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:LGTM:

yep, rrweb is so wary of missing events that we want to step carefully here 🙈

@daibhin daibhin added the bump patch Bump patch version when this PR gets merged label Aug 27, 2024
@daibhin daibhin merged commit 5b3894d into main Aug 27, 2024
14 checks passed
@daibhin daibhin deleted the dn-chore/check-rate-limit-outside-capture branch August 27, 2024 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bump patch Bump patch version when this PR gets merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants