
Bulk import performance reported incorrectly #4135

Closed
patchwork01 opened this issue Jan 24, 2025 · 0 comments · Fixed by #4145
Assignees
Labels
bug Something isn't working
Milestone

Comments

@patchwork01 (Collaborator) commented Jan 24, 2025

Description

Before the change, EmrBulkImportPerformanceST reliably achieved over 3.5 million records per second; it is now around 2.2 million records per second. The test has failed on every performance run since the morning of 22nd January. The last successful performance test was on the morning of 20th January.

The PR where this was changed also stopped setting the maximum connections to S3 explicitly in the system tests. It was previously set to 25 connections; it now uses the default of 100. This could have caused the problem. UPDATE: After retesting on m7i, the performance was still at 2.2 million records/s, and it was still the same with 25 max connections on m7g. It seems the move to Graviton and the max connections change were not the problem.

There was also a PR at a similar time that changed how the start and finish times are reported in the job tracker. This seems likely to have caused the problem. Instead of taking the times the job started and finished in the Spark driver, it appears to take the start time to be the time the job was received in the starter lambda.

Steps to reproduce

  1. Run EmrBulkImportPerformanceST
  2. See error

Expected behaviour

The test should pass.

For performance calculation, the reporting code should count the time a bulk import job started as the time it started in the Spark driver, rather than the time it was accepted in the job starter lambda.
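To illustrate why the choice of start time matters, here is a minimal, hypothetical sketch (the class and method names are illustrative, not Sleeper's actual reporting code, and the record count and timestamps are made up). Measuring from the lambda receive time includes cluster startup delay in the elapsed time, which deflates the computed rate:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the rate calculation. The rate should be computed
// from the Spark driver's own start and finish times, not from the time the
// job was accepted by the job starter lambda.
public class BulkImportRateSketch {

    static double recordsPerSecond(long recordsWritten, Instant start, Instant finish) {
        double seconds = Duration.between(start, finish).toMillis() / 1000.0;
        return recordsWritten / seconds;
    }

    public static void main(String[] args) {
        // Illustrative timestamps: the EMR cluster takes time to start,
        // so the driver begins well after the lambda receives the job.
        Instant lambdaReceived = Instant.parse("2025-01-22T09:00:00Z");
        Instant driverStart = Instant.parse("2025-01-22T09:05:00Z");
        Instant driverFinish = Instant.parse("2025-01-22T09:10:00Z");
        long records = 1_050_000_000L;

        // Measured from the lambda receive time, the elapsed time is inflated
        // and the reported rate drops, even though the actual import is unchanged.
        double fromLambda = recordsPerSecond(records, lambdaReceived, driverFinish);
        double fromDriver = recordsPerSecond(records, driverStart, driverFinish);
        System.out.printf("from lambda: %.0f records/s, from driver: %.0f records/s%n",
                fromLambda, fromDriver);
    }
}
```

With these made-up numbers, the driver-based measurement gives 3.5 million records/s, while the lambda-based measurement reports only 1.75 million records/s for the same job, which is the shape of the regression described above.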

Background

Possibly introduced by:

Here are all the PRs that were merged between the passing test and the failing test:

@patchwork01 patchwork01 added the bug Something isn't working label Jan 24, 2025
@patchwork01 patchwork01 added this to the 0.28.0 milestone Jan 24, 2025
@patchwork01 patchwork01 self-assigned this Jan 24, 2025
@patchwork01 patchwork01 changed the title Bulk import performance test failing after switch to Graviton Bulk import performance reported incorrectly Jan 27, 2025