chore: skip rate limiting of snapshot events #1383

Merged 4 commits on Aug 27, 2024
Changes from 2 commits
50 changes: 49 additions & 1 deletion src/__tests__/extensions/replay/sessionrecording.test.ts
@@ -121,6 +121,8 @@ describe('SessionRecording', () => {
const loadScriptMock = jest.fn()
let _emit: any
let posthog: PostHog
let onCapture: Mock
let clientRateLimitContext: Mock
let sessionRecording: SessionRecording
let sessionId: string
let sessionManager: SessionIdManager
@@ -172,12 +174,19 @@ describe('SessionRecording', () => {

sessionManager = new SessionIdManager(config, postHogPersistence, sessionIdGeneratorMock, windowIdGeneratorMock)

onCapture = jest.fn()

clientRateLimitContext = jest.fn().mockReturnValue({
isRateLimited: false,
remainingTokens: 10,
})

posthog = {
get_property: (property_key: string): Property | undefined => {
return postHogPersistence?.['props'][property_key]
},
config: config,
-capture: jest.fn(),
+capture: onCapture,
persistence: postHogPersistence,
onFeatureFlags: (cb: (flags: string[]) => void) => {
onFeatureFlagsCallback = cb
@@ -186,6 +195,7 @@ describe('SessionRecording', () => {
requestRouter: new RequestRouter({ config } as any),
_addCaptureHook: jest.fn(),
consent: { isOptedOut: () => false },
rateLimiter: { clientRateLimitContext: clientRateLimitContext },
} as unknown as PostHog

loadScriptMock.mockImplementation((_path, callback) => {
@@ -732,6 +742,7 @@ describe('SessionRecording', () => {
_url: 'https://test.com/s/',
_noTruncate: true,
_batchKey: 'recordings',
skip_client_rate_limiting: true,
}
)
})
@@ -767,6 +778,7 @@ describe('SessionRecording', () => {
_url: 'https://test.com/s/',
_noTruncate: true,
_batchKey: 'recordings',
skip_client_rate_limiting: true,
}
)
})
@@ -849,6 +861,7 @@ describe('SessionRecording', () => {
_url: 'https://test.com/s/',
_noTruncate: true,
_batchKey: 'recordings',
skip_client_rate_limiting: true,
}
)

@@ -1514,6 +1527,7 @@ describe('SessionRecording', () => {
_batchKey: 'recordings',
_noTruncate: true,
_url: 'https://test.com/s/',
skip_client_rate_limiting: true,
}
)

@@ -1607,6 +1621,7 @@ describe('SessionRecording', () => {
_batchKey: 'recordings',
_noTruncate: true,
_url: 'https://test.com/s/',
skip_client_rate_limiting: true,
}
)

@@ -1635,6 +1650,7 @@ describe('SessionRecording', () => {
_batchKey: 'recordings',
_noTruncate: true,
_url: 'https://test.com/s/',
skip_client_rate_limiting: true,
}
)
expect(sessionRecording['buffer']).toEqual({
@@ -1950,4 +1966,36 @@ describe('SessionRecording', () => {
expect((sessionRecording as any)['_tryAddCustomEvent']).not.toHaveBeenCalled()
})
})

describe('when client is rate limited', () => {
beforeEach(() => {
clientRateLimitContext.mockReturnValue({
isRateLimited: true,
remainingTokens: 0,
})
})

it('does not capture if rate limit is in place', () => {
jest.useFakeTimers()
jest.setSystemTime(Date.now())
sessionRecording.startIfEnabledOrStop()
sessionRecording.afterDecideResponse(makeDecideResponse({ sessionRecording: { endpoint: '/s/' } }))

for (let i = 0; i < 50; i++) {
posthog.capture('custom_event')
}
onCapture.mockClear()
console.error = jest.fn()

_emit(createFullSnapshot())
// access a private method 🤯 so we don't need to wait for the timer
sessionRecording['_flushBuffer']()

expect(posthog.capture).not.toHaveBeenCalled()
expect(console.error).toHaveBeenCalledWith(
'[PostHog.js]',
'Cannot capture $snapshot event due to client rate limiting.'
)
})
})
})
23 changes: 15 additions & 8 deletions src/extensions/replay/sessionrecording.ts
@@ -863,15 +863,21 @@ export class SessionRecording {
}

if (this.buffer.data.length > 0) {
-            const snapshotEvents = splitBuffer(this.buffer)
-            snapshotEvents.forEach((snapshotBuffer) => {
-                this._captureSnapshot({
-                    $snapshot_bytes: snapshotBuffer.size,
-                    $snapshot_data: snapshotBuffer.data,
-                    $session_id: snapshotBuffer.sessionId,
-                    $window_id: snapshotBuffer.windowId,
-                })
-            })
+            const clientRateLimitContext = this.instance.rateLimiter.clientRateLimitContext()
Member:

I thought this would get us the global bucket?

What I don't want (for example) is someone having an error in their code, and exception autocapture rate limiting because they're throwing in a loop, and then we can't capture a session recording.

I think we need a separate bucket for $snapshot events, with a different rate limit, because they are way chattier than other events. Ultimately I think we might want a separate bucket for errors too, but that can come with time.

I might be misunderstanding what's going on here, of course. The benefit of client-side rate limiting being event-type aware is that it can pick the bucket, and we don't need to change the recorder.

I'm picturing:

    public clientRateLimitContext(checkOnly = false, suffix: string): {
        isRateLimited: boolean
        remainingTokens: number
    } {
        // This is primarily to prevent runaway loops from flooding capture with millions of events for a single user.
        // It's as much for our protection as theirs.
        const now = new Date().getTime()
        const bucket = this.instance.persistence?.get_property(CAPTURE_RATE_LIMIT + '/' + suffix) ?? {
            tokens: this.captureEventsBurstLimit,
            last: now,
        }

or something equivalent, so that we have a totally separate bucket. Equally, you could make this endpoint-aware so that it automatically has a different rate limit per endpoint.

The other thing: this is pretty new. Maybe the limits are wrong, and the simplest fix is to raise them a little so that replay doesn't trigger them.

(Sorry, brain dumping, since I'll be AFK for hours now :))
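The separate-bucket idea in the comment above can be sketched as a standalone token bucket keyed by an event-type suffix. This is a hypothetical illustration, not posthog-js code: the name `BucketedRateLimiter` and its parameters are invented, and it keeps buckets in memory rather than in persistence.

```typescript
interface Bucket {
    tokens: number
    last: number // ms timestamp of the last refill
}

// Hypothetical per-suffix token-bucket limiter; names are illustrative only.
class BucketedRateLimiter {
    private buckets = new Map<string, Bucket>()

    constructor(
        private burstLimit = 10, // max tokens held per bucket
        private refillPerSecond = 1 // tokens restored per second
    ) {}

    clientRateLimitContext(
        suffix: string,
        now: number = Date.now()
    ): { isRateLimited: boolean; remainingTokens: number } {
        const bucket = this.buckets.get(suffix) ?? { tokens: this.burstLimit, last: now }
        // Refill proportionally to elapsed time, capped at the burst limit.
        const elapsedSeconds = (now - bucket.last) / 1000
        bucket.tokens = Math.min(this.burstLimit, bucket.tokens + elapsedSeconds * this.refillPerSecond)
        bucket.last = now
        const isRateLimited = bucket.tokens < 1
        if (!isRateLimited) {
            bucket.tokens -= 1
        }
        this.buckets.set(suffix, bucket)
        return { isRateLimited, remainingTokens: Math.max(0, Math.floor(bucket.tokens)) }
    }
}
```

Draining one bucket (say, for exception events thrown in a loop) leaves the `$snapshot` bucket untouched, which is the property the reviewer is after.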


+            if (!clientRateLimitContext?.isRateLimited) {
+                const snapshotEvents = splitBuffer(this.buffer)
+                snapshotEvents.forEach((snapshotBuffer) => {
+                    this._captureSnapshot({
+                        $snapshot_bytes: snapshotBuffer.size,
+                        $snapshot_data: snapshotBuffer.data,
+                        $session_id: snapshotBuffer.sessionId,
+                        $window_id: snapshotBuffer.windowId,
+                    })
+                })
+            } else {
+                logger.critical('Cannot capture $snapshot event due to client rate limiting.')
+            }
}

// buffer is empty, we clear it in case the session id has changed
@@ -903,6 +909,7 @@ export class SessionRecording {
_url: this.instance.requestRouter.endpointFor('api', this._endpoint),
_noTruncate: true,
_batchKey: SESSION_RECORDING_BATCH_KEY,
skip_client_rate_limiting: true,
Member:

In the short term I think we should just get this line in. I can't think of a problem caused by the volume of snapshots that wasn't from a pre-batching version of the SDK, so I think we can skip client rate limiting for replay and then figure out a good solution with the pressure off.
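The exemption this comment argues for reduces to a small guard in the capture path. The sketch below is illustrative only; `shouldDropEvent` and `CaptureOptions` are invented names, not the actual posthog-js internals.

```typescript
// Hypothetical capture options carrying the opt-out flag.
type CaptureOptions = { skip_client_rate_limiting?: boolean }

// Decide whether to drop an event client-side. Replay batches opt out of
// the limiter entirely; every other event still consults it.
function shouldDropEvent(isRateLimited: () => boolean, options?: CaptureOptions): boolean {
    if (options?.skip_client_rate_limiting) {
        return false
    }
    return isRateLimited()
}
```

A `$snapshot` batch sent with `skip_client_rate_limiting: true` is never dropped client-side, even while the shared bucket is exhausted.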

})
}
