Loader speed optimization #186
Found a very weird behaviour: 100Mb loads in 15 seconds. Finding the cause and fixing it should improve the performance.
Redshift slices were not getting utilized. Batching 200kb as one batch and loading 10Mb per load increased our speed from 70kb/s to 1Mb/s. This needs to be done for every table, and if the operator can calculate and set these values automatically, the performance would be best. But before that we still need to figure out the maximum MB/s it can reach.
The reason for the performance issue was that many small files cause a lot of overhead. As described here, each S3 lookup costs ~0.5 seconds, so when many small files are downloaded, the total processing time increases. Bigger batch files (~200kb to 1mb) improve the performance. The operator needs to get intelligent enough to determine these values for every table based on the throughput and the Redshift node slices available.
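A minimal sketch of the kind of per-table calculation the operator could do, assuming hypothetical names (batchSizeBytes, minBatchBytes, maxBatchBytes) rather than the actual loader code: pick a batch size that gives roughly one file per Redshift slice per load, clamped to the 200kb–1mb range that performed well above, so the per-object S3 lookup overhead stays bounded.

```go
package main

import "fmt"

// Hypothetical sizing helper, not the operator's real code. The idea is to
// aim for roughly one batch file per Redshift slice per load, while keeping
// files inside the 200kb-1mb range that performed well, because every extra
// S3 object adds ~0.5s of lookup overhead.
const (
	s3LookupSeconds = 0.5             // observed per-object S3 overhead
	minBatchBytes   = 200 * 1024      // ~200kb floor
	maxBatchBytes   = 1 * 1024 * 1024 // ~1mb ceiling
)

// batchSizeBytes returns the number of bytes to pack into one batch file for
// a table, given the bytes expected per load and the slices in the cluster.
func batchSizeBytes(bytesPerLoad, slices int64) int64 {
	if slices <= 0 {
		slices = 1
	}
	size := bytesPerLoad / slices // roughly one file per slice per load
	if size < minBatchBytes {
		size = minBatchBytes
	}
	if size > maxBatchBytes {
		size = maxBatchBytes
	}
	return size
}

func main() {
	// Example: a 10Mb load on a 4-slice cluster.
	bytesPerLoad := int64(10 * 1024 * 1024)
	slices := int64(4)
	size := batchSizeBytes(bytesPerLoad, slices)
	files := bytesPerLoad / size
	fmt.Printf("batch size: %d bytes, files per load: %d, lookup overhead: ~%.1fs\n",
		size, files, float64(files)*s3LookupSeconds)
}
```

The floor avoids the many-small-files overhead described above; the ceiling keeps the number of files high enough that all slices stay busy during COPY.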
After #240 the copy stage time has dropped by 50%; next, the delete common step needs to be optimized.
Redshift Cluster Spec
Load Speed
The load speed reduces when multiple loads are happening, and the maximum speed seen is around 11.5GB per hour.
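As a rough back-of-the-envelope (not a measurement): at 11.5GB per hour, the 8GB example below would take about 8 / 11.5 ≈ 0.7 hours, i.e. roughly 42 minutes for a single load.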
Division of time taken in the load task
The example below is for an 8GB load.
maxSizePerBatch
Need to find the optimization areas and work on improving the speed.
Can we load at 100GB/hour?
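A rough back-of-the-envelope using the numbers above (my arithmetic, not measured): 100GB/hour is about 28.4MB/s sustained. With ~1mb batch files that is roughly 102,400 files per hour, i.e. around 28 files per second; at ~0.5s of S3 lookup overhead per file, a single sequential downloader tops out near 2 files per second, so that rate would need on the order of 15 parallel downloads per load, on top of a cluster that can ingest roughly 9x faster than the current 11.5GB/hour ceiling.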