Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ES integration test race conditions #6229

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

madmecodes
Copy link

Fix ES integration test race conditions

Which problem is this PR solving?

Resolves #6094 - ES integration test race condition where spans are written before templates are fully created.

Description of the changes

  1. Added template creation synchronization in factory.go:

    • Added 30s timeout context
    • Added retry mechanism with exponential backoff
    • Ensure templates are created before returning writer
  2. Updated writer.go's CreateTemplates:

    • Added context parameter for proper timeout handling
    • Using passed context in ES client calls
  3. Sequence is now:

Before: Start -> Create Writer -> Start Template Creation -> Return Writer -> Write Spans (fails since template still creating)
After: Start -> Create Writer -> Create Templates -> Retry if needed -> Templates Ready -> Return Writer -> Write Spans

- Add template creation retry with timeout in factory
- Add context to CreateTemplates in writer
- Ensure templates are created before returning writer

Fixes jaegertracing#6094
@madmecodes madmecodes requested a review from a team as a code owner November 20, 2024 16:47
@madmecodes
Copy link
Author

@yurishkuro will be great if you can review the changes, looking forward for your feedback.

@yurishkuro
Copy link
Member

please make sure all commits are signed off

case <-ctx.Done():
return fmt.Errorf("template creation timeout: %w", ctx.Err())
default:
if err := writer.CreateTemplates(ctx, spanMapping, serviceMapping, indexPrefix); err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not seeing how this addresses the original issue. The assumption was that when we call CreateTemplates it may return before the templates are actually created in the ES, and the immediately-after span writes might fail. Doing retries does not address that.

I would also like to understand if that assumption is correct in the first place and why it might happen - is it indeed that ES acks a success before the template changes are actually applied to the db? Or could it be our fault, e.g. by using an async connection? We do something funky with batch inserts that happens in the background, that could be the source of a race.

return nil, err
}
}
return writer, nil
}

func CreateTemplatesWithRetry(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func CreateTemplatesWithRetry(
func createTemplatesWithRetry(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ES 8.x / v2 integration test often fails
2 participants