Description
Before the change, EmrBulkImportPerformanceST reliably achieved over 3.5 million records per second; it is now around 2.2 million records per second. The test has failed on every performance run since the morning of 22nd January. The last successful performance run was on the morning of 20th January.
The PR that introduced this change also stopped setting the maximum number of connections to S3 explicitly in the system tests. It was previously set to 25 connections, and now uses the default of 100. This could have caused the problem. UPDATE: After retesting on m7i, performance was still at 2.2 million records/s, and it was still the same with 25 max connections on m7g. It seems the move to Graviton and the max connections change were not the problem.
There was also a PR merged at around the same time that changed how the start and finish times are reported in the job tracker. This seems likely to have caused the problem. Instead of taking the times the job started and finished in the Spark driver, it appears to take the start time as the time the job was received in the starter lambda.
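To illustrate why the timing change alone could account for the drop, here is a minimal, self-contained sketch. It assumes the reported rate is computed as records written divided by the elapsed time between the tracked start and finish times; the record count and durations below are invented for illustration and are not taken from the test.

```java
import java.time.Duration;
import java.time.Instant;

public class BulkImportRateExample {

    // Rate as records per second over the tracked start-to-finish window.
    static double recordsPerSecond(long records, Instant start, Instant finish) {
        return records * 1000.0 / Duration.between(start, finish).toMillis();
    }

    public static void main(String[] args) {
        long recordsWritten = 1_000_000_000L; // hypothetical job size
        Instant receivedInLambda = Instant.parse("2025-01-22T09:00:00Z");
        // Hypothetical delay between the starter lambda receiving the job and
        // the Spark driver starting it (cluster startup, job submission, etc.).
        Instant startedInDriver = receivedInLambda.plusSeconds(170);
        Instant finished = startedInDriver.plusSeconds(286);

        // Start time taken in the Spark driver: ~3.5 million records/s.
        System.out.printf("From driver start:   %.0f records/s%n",
                recordsPerSecond(recordsWritten, startedInDriver, finished));
        // Start time taken in the starter lambda: ~2.2 million records/s.
        System.out.printf("From lambda receipt: %.0f records/s%n",
                recordsPerSecond(recordsWritten, receivedInLambda, finished));
    }
}
```

With the same Spark run time, merely moving the tracked start point earlier shifts the reported figure from roughly 3.5 million to roughly 2.2 million records/s, matching the observed regression without any actual performance change.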
Steps to reproduce
Run EmrBulkImportPerformanceST
See error
Expected behaviour
The test should pass.
For the performance calculation, the reporting code should count the time a bulk import job started as the time it started in the Spark driver, rather than the time it was accepted in the job starter lambda.
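A sketch of the shape of the fix, with invented names (recordJobAccepted, recordRunStarted, recordRunFinished); the actual tracker interface in the codebase may differ. The point is that the lambda receipt and the driver start are separate events, and the performance report should compute its rate from the latter.

```java
import java.time.Instant;

// Hypothetical interface, named for illustration only.
interface BulkImportJobTracker {

    // Called by the job starter lambda when it receives the job.
    // Useful for tracking queueing/startup delay, but not for the rate.
    void recordJobAccepted(String jobId, Instant acceptedTime);

    // Called by the Spark driver when the run actually begins.
    // The performance report should use this as the start time.
    void recordRunStarted(String jobId, Instant startTime);

    // Called by the Spark driver when the run completes.
    void recordRunFinished(String jobId, Instant finishTime, long recordsWritten);
}
```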
Background
Possibly introduced by:
Here are all the PRs that were merged between the passing test and the failing test: