Loader speed optimization #186

Open · alok87 opened this issue Apr 8, 2021 · 4 comments
Labels
p1 (priority 1, do it ASAP) · performance (Monitoring, Metrics, Logs, Benchmarks)

Comments

alok87 commented Apr 8, 2021

Redshift Cluster Spec

  • Cluster CPU Utilisation: ~50%
  • Cluster resources: ra3.xlplus / 2 nodes

Load Speed

| maxSizePerBatch (GB) | Load time (min) | Throughput (GB/hour) |
|---|---|---|
| 0.5 | 3 | 10 |
| 0.5 | 5 | 6 |
| 0.5 | 8 | 3.75 |
| 1 | 10 | 6 |
| 1 | 9 | 6.67 |
| 1 | 7 | 8.57 |
| 1 | 7 | 8.57 |
| 1 | 6 | 10 |
| 4 | 21 | 11.43 |
| 4 | 21 | 11.43 |
| 0.5 | 4 | 7.5 |

The load speed drops when multiple loads run at the same time, and the maximum observed speed is around 11.5 GB/hour.
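
For reference, the GB/hour column is just maxSizePerBatch divided by the load time. A tiny Go snippet (hypothetical helper, not part of the loader) makes the arithmetic explicit:

```go
package main

import "fmt"

// throughputGBPerHour converts a batch size (in GB) and its load duration
// (in minutes) into the GB/hour figure used in the table above.
func throughputGBPerHour(batchSizeGB, loadMinutes float64) float64 {
	return batchSizeGB * 60 / loadMinutes
}

func main() {
	// 0.5 GB loaded in 3 minutes => 10 GB/hour (first row of the table).
	fmt.Printf("%.2f GB/hour\n", throughputGBPerHour(0.5, 3))
	// 4 GB loaded in 21 minutes => ~11.43 GB/hour, the observed maximum.
	fmt.Printf("%.2f GB/hour\n", throughputGBPerHour(4, 21))
}
```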

Division of time taken in the load task

The example below is for an 8 GB maxSizePerBatch.

I0401 08:05:12.574673       1 load_processor.go:739] ts.inventory.customers, batchId:1, size:16389: processing...
I0401 08:05:12.574702       1 load_processor.go:646] ts.inventory.customers, batchId:1, startOffset:57150
I0401 08:05:13.119588       1 load_processor.go:701] ts.inventory.customers, load staging
I0401 08:05:21.538138       1 redshift.go:868] Running: COPY from s3 to: customers_ts_adx_reload_staged
I0401 08:34:51.824170       1 load_processor.go:212] ts.inventory.customers, copied staging
I0401 08:36:45.631030       1 load_processor.go:235] ts.inventory.customers, deduped
I0401 08:40:02.744066       1 load_processor.go:254] ts.inventory.customers, deleted common
I0401 08:40:04.206752       1 load_processor.go:273] ts.inventory.customers, deleted delete-op
I0401 08:40:04.216792       1 redshift.go:817] Running: UNLOAD from customers_ts_adx_reload_staged to s3
I0401 08:43:33.241421       1 load_processor.go:323] ts.inventory.customers, unloaded
I0401 08:43:33.241453       1 redshift.go:868] Running: COPY from s3 to: customers_ts_adx_reload
I0401 08:49:11.932393       1 load_processor.go:339] ts.inventory.customers, copied
I0401 08:49:19.985916       1 load_processor.go:151] ts.inventory.customers, offset: 73539, marking
I0401 08:49:19.985935       1 load_processor.go:158] ts.inventory.customers, offset: 73539, marked
I0401 08:49:19.985939       1 load_processor.go:161] ts.inventory.customers, committing (autoCommit=false)
I0401 08:49:19.987312       1 load_processor.go:163] ts.inventory.customers, committed (autoCommit=false)
I0401 08:49:19.987344       1 load_processor.go:768] ts.inventory.customers, batchId:1, size:16389, end:73538:, processed in 44m
| Task | Time taken | % of total |
|---|---|---|
| load staging | 29 min | 65.9% |
| merge/dedupe | 2 min | 4.4% |
| merge/deleteCommon | 4 min | 8.8% |
| merge/deleteOp | 2 s | 0.07% |
| unload | 3 min | 6.8% |
| load target | 6 min | 13.6% |

Need to find the optimization areas and work on improving the speed.
Can we load at 100 GB/hour?
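
For context, the log above is a staging-table merge: COPY into a staging table, dedupe, delete the overlapping rows from the target, drop delete-op rows, then UNLOAD the staging table and COPY it into the target. The Go sketch below mirrors that sequence; all SQL, table names, columns and S3 paths are illustrative placeholders, not the statements load_processor.go actually runs.

```go
// Rough sketch of the staged load/merge sequence visible in the log above.
// Assumes a *sql.DB already connected to Redshift; every statement, table
// name, column and S3 path is an illustrative placeholder.
package loader

import "database/sql"

func loadBatch(db *sql.DB) error {
	steps := []string{
		// 1. load staging: COPY the batch files from S3 into the staging table.
		`COPY customers_staged FROM 's3://bucket/batch.manifest'
		 IAM_ROLE 'arn:aws:iam::123456789012:role/redshift' MANIFEST`,
		// 2. dedupe: keep only the latest row per primary key in staging
		//    (the id/offset column names are assumed).
		`DELETE FROM customers_staged USING customers_staged s2
		 WHERE customers_staged.id = s2.id
		   AND customers_staged.kafkaoffset < s2.kafkaoffset`,
		// 3. delete common: remove target rows that staging will replace.
		`DELETE FROM customers USING customers_staged
		 WHERE customers.id = customers_staged.id`,
		// 4. delete delete-op: drop staged rows whose latest operation is a delete.
		`DELETE FROM customers_staged WHERE operation = 'DELETE'`,
		// 5. unload: export the merged staging rows back to S3.
		`UNLOAD ('SELECT * FROM customers_staged') TO 's3://bucket/unload/'
		 IAM_ROLE 'arn:aws:iam::123456789012:role/redshift'`,
		// 6. load target: COPY the unloaded files into the target table.
		`COPY customers FROM 's3://bucket/unload/'
		 IAM_ROLE 'arn:aws:iam::123456789012:role/redshift'`,
	}
	for _, q := range steps {
		if _, err := db.Exec(q); err != nil {
			return err
		}
	}
	return nil
}
```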

alok87 added the performance (Monitoring, Metrics, Logs, Benchmarks) label on Apr 8, 2021
alok87 changed the title from "RedshiftLoader load speed optimization" to "Loader speed optimization" on May 5, 2021
alok87 added the p1 (priority 1, do it ASAP) label on May 5, 2021
alok87 commented May 5, 2021

Found a very weird behaviour:

100 MB loads in 15 seconds, while 1 MB is seen to take minutes to load.

Finding and fixing the cause should improve the performance.

alok87 commented May 6, 2021

Redshift slices were not getting utilized. Batching ~200 KB per batch file and loading ~10 MB per load increased our speed from 70 KB/s to 1 MB/s. This needs to be done for every table, and performance would be best if the operator could calculate and set these values itself. But before that we still need to figure out the maximum MB/s it can reach.
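
Redshift's COPY parallelism comes from giving every slice at least one file, so the number of batch files referenced by a single load should be at least the slice count, ideally a multiple of it. A rough sketch of that calculation is below; the function name and the slice count are assumptions (2 slices per ra3.xlplus node, so 4 for the 2-node cluster above), not values taken from the operator.

```go
package main

import "fmt"

// filesPerLoad returns how many batch files a single COPY should reference so
// that every Redshift slice gets at least one file to work on.
func filesPerLoad(loadBytes, batchFileBytes, slices int) int {
	n := loadBytes / batchFileBytes
	if n < slices {
		return slices // at least one file per slice
	}
	return (n / slices) * slices // round down to a multiple of the slice count
}

func main() {
	const kb, mb = 1024, 1024 * 1024
	// ~200 KB batch files, ~10 MB per load, 4 slices assumed
	// (2 slices per ra3.xlplus node x 2 nodes):
	fmt.Println(filesPerLoad(10*mb, 200*kb, 4)) // => 48 files per COPY
}
```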

alok87 commented May 7, 2021

The reason for the poor performance was that many small files cause a lot of overhead. As described here, each S3 lookup costs ~0.5 seconds, so when many small files have to be downloaded, the total processing time goes up.

So bigger batch files, ~200 KB to 1 MB, increase the performance.
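
A back-of-the-envelope estimate of that overhead, using the ~0.5 s per-object figure quoted above (the file sizes below are hypothetical, just to show the scale):

```go
package main

import "fmt"

// s3OverheadSeconds estimates the fixed per-object cost of fetching one
// load's worth of data from S3, using the ~0.5 s/lookup figure quoted above.
func s3OverheadSeconds(loadBytes, fileBytes int, lookupSeconds float64) float64 {
	files := loadBytes / fileBytes
	return float64(files) * lookupSeconds
}

func main() {
	const kb, mb, gb = 1024, 1024 * 1024, 1024 * 1024 * 1024
	// 1 GB split into 10 KB files: ~105k objects => ~14.5 hours of lookups alone.
	fmt.Printf("%.0f s\n", s3OverheadSeconds(1*gb, 10*kb, 0.5))
	// The same 1 GB in 1 MB files: 1024 objects => ~8.5 minutes of lookups.
	fmt.Printf("%.0f s\n", s3OverheadSeconds(1*gb, 1*mb, 0.5))
}
```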

The operator needs to become intelligent enough to determine these values for every table, based on the table's throughput and the Redshift node slices available.

alok87 added a commit that referenced this issue May 7, 2021
alok87 mentioned this issue on May 13, 2021
alok87 commented May 15, 2021

After #240 the COPY stage time has dropped by 50%; next, the delete common step needs to be optimized.
