
Google::Cloud::Storage is about twice slower than gsutil cp for downloading files #1897

Closed
casperisfine opened this issue Dec 21, 2017 · 29 comments
Assignees
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: question Request for information or clarification. Not an issue.

Comments

@casperisfine
Contributor

Hi, I ran the following benchmark from a GKE container:

#!/usr/bin/env ruby

require 'benchmark'
require 'google/cloud/storage'

def measure(&block)
  duration = Benchmark.realtime(&block)
  puts "took: #{duration.round(1)}s"
end

puts "--- Generate 500MB file"
puts system('dd if=/dev/urandom of=/tmp/random-500M.bin bs=1048576 count=500')

url = "gs://<my-test-bucket>/test/random-500M.bin.#{rand}"
puts "--- Upload with gsutil"
measure do
  system("gsutil cp /tmp/random-500M.bin #{url}")
end


puts "--- Download with gsutil"
measure do
  system("gsutil cp #{url} /tmp/random-500M.bin.#{rand}")
end

puts "--- Upload with GCS-ruby"

bucket = Google::Cloud::Storage.new(
  project: '<my-test-project>',
  keyfile: ENV['GOOGLE_APPLICATION_CREDENTIALS'],
  timeout: 5,
  retries: 0,
).bucket('<my-test-bucket>', skip_lookup: true)

bucket = Buildkite::FSCache.send(:bucket)

file_name = "test/random-500M.bin.#{rand}"
measure do
  puts bucket.create_file('/tmp/random-500M.bin', file_name)
end

puts "--- Download with GCS-ruby"
measure do
  puts bucket.file(file_name).download("/tmp/random-500M.bin.#{rand}")
end

In short: upload and then download a random 500MB file, with the following results:

  • gsutil cp upload 7.1s
  • gsutil cp download 6.1s
  • G::C::S upload 4.8s
  • G::C::S download 13.9s

There's obviously a bit of variance between runs, but G::C::S#download is consistently about twice as slow as the rest.

Also note that gsutil doesn't use crcmod here (it's printing warnings about it).

I tried looking at gsutil's source to see what it's doing differently, but it's a bit like looking for a needle in a haystack.

Any ideas?

cc @DazWorrall @wvanbergen

@blowmage
Contributor

Are you disabling the verification check when downloading? You pass verify: :none to the download method to do this.

https://googlecloudplatform.github.io/google-cloud-ruby/#/docs/google-cloud-storage/v1.9.0/google/cloud/storage/file?method=download-instance

@casperisfine
Contributor Author

I'll try this, thanks!

@blowmage
Contributor

Ugh, the app I use for GitHub notifications didn’t show the benchmark code in the issue description. Now that I’m reading this in a browser I see that you were not disabling the verify check.

It is possible that this verification stuff is the cause for the difference. Calculating MD5 on a 500 MB file will take some time, but we think we are doing this as efficiently as possible.
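For context on what that verification costs, here's a minimal sketch (a generic technique, not the gem's actual implementation) of computing a file's MD5 incrementally with Ruby's stdlib `Digest`, which is how a large download can be checksummed without holding the whole file in memory:

```ruby
require 'digest'
require 'tempfile'

# Incremental MD5: feed the digest fixed-size chunks so the whole file
# never has to be held in memory at once. Generic sketch only; this is
# not the google-cloud-storage gem's internal code.
def streaming_md5(path, chunk_size: 1 << 20)
  md5 = Digest::MD5.new
  File.open(path, 'rb') do |io|
    while (chunk = io.read(chunk_size))
      md5.update(chunk)
    end
  end
  md5.hexdigest
end

Tempfile.create('md5demo') do |f|
  f.binmode
  f.write('a' * 3_000_000) # ~3 MB of data
  f.flush
  # Matches the one-shot stdlib helper:
  puts streaming_md5(f.path) == Digest::MD5.file(f.path).hexdigest # => true
end
```

Even done this way, hashing 500 MB is pure CPU work on top of the transfer, which is why `verify: :none` is worth testing in a benchmark.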

@casperisfine
Contributor Author

I ran the benchmark twice with verify: :none:

  • run 1: gsutil -> 8.0s, gcs-ruby -> 11.6s
  • run 2: gsutil -> 6.3s, gcs-ruby -> 13.0s

So I'm afraid disabling the check only improves things marginally :/

@blowmage
Contributor

K, I'll look into this. FWIW, google-cloud-storage uses google-api-client to access the API, which uses httpclient for the HTTP calls. So it will take a bit to dive through all the dependencies to see where we can pick up performance.

@blowmage
Copy link
Contributor

@casperisfine Oh! One more thing! Your benchmark includes creating the File object, which also makes an API call. Can you update it to something like the following?

  puts bucket.file(file_name, skip_lookup: true).download("/tmp/random-500M.bin.#{rand}", verify: :none)

@blowmage blowmage self-assigned this Dec 21, 2017
@blowmage blowmage added api: storage Issues related to the Cloud Storage API. type: question Request for information or clarification. Not an issue. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Dec 21, 2017
@casperisfine
Contributor Author

Can you update it to something like the following?

Good point, but still no noticeable change (~6s vs ~12s). It's still nice to make one fewer call though, so I'll add this to our production system.

which uses httpclient for the HTTP calls. So it will take a bit to dive through all the dependencies to see where we can pick up performance.

Yeah, my hunch is that the performance difference is more likely at that layer: some non-optimal buffering, different compression, etc.

What is also interesting is that it only impacts download performance; upload performance is fine.

I'll do some more benchmarks to see if the perf diff is linear; let me know if there is anything else I can do to help you figure this out.

Also, maybe gsutil is doing something smart and its authors would know?

@casperisfine
Contributor Author

I did 3 more runs, this time with 100M, 500M and 1G (spreadsheet)

There is a lot of variance obviously, but it does appear to be a fairly linear progression:

[Screenshot, 2017-12-22: chart of download time vs. file size, showing roughly linear growth]

@quartzmo
Member

Hi @casperisfine, what is this line for?

bucket = Buildkite::FSCache.send(:bucket)

Can you share your entire configuration somewhere so that I can run this script in my own GKE deployment? I'm just trying to reproduce your results first, then I'll try to debug.

@DazWorrall

DazWorrall commented Dec 22, 2017

[context: I work with @casperisfine]

You can ignore that line I think, @quartzmo; it looks like leftover cruft from extracting this code out of the project we use it in. The result is the same as the code immediately above it: it returns a Google::Cloud::Storage::Bucket object.

@quartzmo
Member

quartzmo commented Dec 22, 2017

@DazWorrall, OK, sounds like maybe it isn't easy to share your deployment configuration.

@casperisfine I know my home workstation isn't your deployment target, but here's what I get locally:

--- Upload with gsutil
- [1 files][500.0 MiB/500.0 MiB]    8.7 MiB/s                                   
Operation completed over 1 objects/500.0 MiB.                                    
took: 65.2s
--- Download with gsutil
Copying gs://downloadtest_978004358714/test/random-500M.bin.0.5400586331450901...
\ [1 files][500.0 MiB/500.0 MiB]    6.7 MiB/s                                   
Operation completed over 1 objects/500.0 MiB.                                    
took: 75.3s
--- Upload with GCS-ruby
#<Google::Cloud::Storage::File:0x00007fd63a70d588>
took: 57.4s
--- Download with GCS-ruby
#<File:0x00007fd63a5c7368>
took: 61.9s

@byroot

byroot commented Dec 22, 2017

[context: I am @casperisfine alias]

OK, sounds like maybe it isn't easy to share your deployment configuration.

It's entirely my bad. I used this in my initial benchmark, but that method simply contains what I copy-pasted above:

bucket = Google::Cloud::Storage.new(
  project: '<my-test-project>',
  keyfile: ENV['GOOGLE_APPLICATION_CREDENTIALS'],
  timeout: 5,
  retries: 0,
).bucket('<my-test-bucket>', skip_lookup: true)

I know my home workstation isn't your deployment target, but here's what I get locally:

If it can help, I'm benchmarking from a GKE cluster.

@blowmage
Contributor

blowmage commented Jan 8, 2018

I'm still looking into this.

@blowmage
Contributor

blowmage commented Jan 9, 2018

I've tried to reproduce this on several different systems, with several different network connections, and in the vast majority of cases I've found google-cloud-storage to be faster than gsutil.

The fastest connection has been (not surprisingly) a Google Compute Engine virtual machine, the same platform GKE runs on. After installing the gcloud CLI, Ruby, and the google-cloud-storage gem, and running several attempts, I've collected the following measurements. (The full output log and modified benchmark script can be found here: https://gist.github.com/blowmage/b9e89cd710fcc73117cbc8a2c3ac2cce)

10MB File

| Action | Try 1 | Try 2 | Try 3 |
| --- | --- | --- | --- |
| gsutil upload | 1.8s | 2.0s | 2.0s |
| gsutil download | 1.1s | 1.1s | 1.0s |
| gcs-ruby upload | 1.2s | 1.0s | 1.1s |
| gcs-ruby download | 0.2s | 0.3s | 0.3s |

100MB File

| Action | Try 1 | Try 2 | Try 3 |
| --- | --- | --- | --- |
| gsutil upload | 2.4s | 2.6s | 3.7s |
| gsutil download | 2.6s | 2.9s | 3.0s |
| gcs-ruby upload | 1.7s | 2.0s | 1.8s |
| gcs-ruby download | 1.0s | 1.0s | 1.0s |

250MB File

| Action | Try 1 | Try 2 | Try 3 |
| --- | --- | --- | --- |
| gsutil upload | 4.2s | 3.6s | 3.8s |
| gsutil download | 3.0s | 2.8s | 3.0s |
| gcs-ruby upload | 3.5s | 3.6s | 2.7s |
| gcs-ruby download | 2.2s | 2.1s | 2.3s |

500MB File

| Action | Try 1 | Try 2 | Try 3 |
| --- | --- | --- | --- |
| gsutil upload | 10.5s | 6.4s | 6.0s |
| gsutil download | 5.5s | 4.8s | 5.3s |
| gcs-ruby upload | 4.9s | 5.5s | 5.6s |
| gcs-ruby download | 4.0s | 4.0s | 4.1s |

750MB File

| Action | Try 1 | Try 2 | Try 3 |
| --- | --- | --- | --- |
| gsutil upload | 8.6s | 7.2s | 9.3s |
| gsutil download | 8.0s | 7.5s | 8.0s |
| gcs-ruby upload | 7.8s | 8.0s | 7.6s |
| gcs-ruby download | 6.5s | 6.4s | 6.2s |

1GB File

| Action | Try 1 | Try 2 | Try 3 |
| --- | --- | --- | --- |
| gsutil upload | 13.3s | 8.9s | 10.0s |
| gsutil download | 9.3s | 9.0s | 9.2s |
| gcs-ruby upload | 9.9s | 10.1s | 10.4s |
| gcs-ruby download | 9.6s | 10.6s | 10.4s |

I have not found an appreciable difference between these two libraries, and I am unaware of any low-hanging fruit in how HTTPClient is used. Is there more information you can share to help us continue looking into this, or should we close this issue?

@casperisfine
Contributor Author

Could the bucket configuration have an impact here? For reference, I'm benchmarking with the shopify-ci/shopify-ci bucket, which is configured as multi-regional, and the benchmark was run from us-east1.

I can't think of any other information that could help pinpoint the issue, so if you think that's not it, I'll close this and look for issues elsewhere in the stack.

@blowmage
Contributor

blowmage commented Jan 9, 2018

I don’t know if bucket configuration would affect this, or why it would affect the gem only and not the cli. But I’m also not a Google employee and have no insight into the Storage service beyond the public documentation. Is there a support channel you can ask that question to?

@casperisfine
Contributor Author

Well, thanks for all the time you spent on this. I'll indeed try to go through support with this.

@blowmage
Contributor

blowmage commented Jan 9, 2018

I’m sorry we couldn’t repro the download speed you are seeing. If you discover the cause we would love to know so we can change the Ruby client to avoid it if possible.

@casperisfine
Contributor Author

No need to be sorry, that's actually helpful. And I'll make sure to report any further findings here, even if there are no modifications to make to the library.

@blowmage
Contributor

blowmage commented Jan 9, 2018

Sounds good. Best of luck!

@casperisfine
Contributor Author

I received an update from Google support, they were able to reproduce the issue:

Hello Jean, My initial tests do show a very significant (~7x) difference between downloading via gsutil and the ruby library via the benchmark. I still need to run a couple other tests, and if those also show similar results I will need to bring this up with Engineering. I'll have another update for you no later than Friday. For now I would keep using the gsutil command for large downloads. Sincerely,

🤞

@quartzmo
Member

Thanks for the update!

@casperisfine
Contributor Author

Another update:

Engineering is still looking into this issue. I ran a couple more tests and from my end it seems that the ruby download is drastically tied with the amount of CPU that the instance has. I've added this to the backed issue I've filed with engineering. I'll have another update for you next week with any progress that engineering may have.

The CPU part is interesting. Maybe the HTTP layer ends up doing write(read()) in userland instead of leveraging sendfile().
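To illustrate the speculation with a minimal, hypothetical sketch (on plain local files rather than real sockets): a readpartial loop pulls every chunk into a Ruby String before writing it back out, while IO.copy_stream performs the same transfer inside the interpreter's C code and can use sendfile(2) when the source is a regular file:

```ruby
require 'tempfile'

# Copy style 1: userland read/write loop, one Ruby String per chunk.
def copy_with_readpartial(src, dst, chunk_size = 16 * 1024)
  loop do
    dst.write(src.readpartial(chunk_size))
  end
rescue EOFError
  dst.flush # readpartial signals end of input by raising EOFError
end

payload = 'data' * 250_000 # ~1 MB
Tempfile.create('src') do |src|
  src.binmode
  src.write(payload)
  src.flush

  src.rewind
  out1 = Tempfile.create('rp') do |dst|
    copy_with_readpartial(src, dst)
    File.binread(dst.path)
  end

  # Copy style 2: IO.copy_stream does the same transfer in C.
  src.rewind
  out2 = Tempfile.create('cs') do |dst|
    IO.copy_stream(src, dst)
    File.binread(dst.path)
  end

  puts out1 == payload && out2 == payload # => true
end
```

Both produce identical output; the difference is purely in allocation and syscall overhead, which is what would show up as CPU time on the download path.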

@casperisfine
Contributor Author

So I reran the download benchmark with a (very dirty) patch to httpclient that uses IO.copy_stream (which leverages sendfile() when possible) instead of a readpartial loop.

For 500M, the bench went down from ~15s to ~5s, so that is definitely the source of the difference between gsutil and gcs-ruby.

The shitty patch: https://gist.github.com/casperisfine/f7830c20718813a570927f07d6357bfe
And the bench: https://gist.github.com/casperisfine/6e25454406322be4ecde976d15392ca0

There's an open issue in httpclient about using IO.copy_stream (nahi/httpclient#66), but it's very old, and from my investigation I really don't see how it could be done without breaking the current API.

@quartzmo / @blowmage I'd be willing to put time into this if there is interest, but I'd need to know to what lengths a patch could go. Is there a specific reason for using httpclient in the first place, or would migrating to another one that supports copy_stream be acceptable?

I see google-cloud-env has faraday as a dependency; are there plans to migrate to it? (I'm actually unsure how faraday performs on this particular issue.)

@blowmage
Contributor

The httpclient gem is used by google-api-client, the library that is actually making the API calls. It used to use faraday, but there were several issues with faraday. It eventually migrated to hurley, which solved many of the issues, but hurley never gained wide adoption and was eventually abandoned. Not wanting to revert on the many improvements that hurley made possible, the maintainers of google-api-client decided to switch to httpclient.

Around the same time that google-api-client was first started, the gem signet was created to perform the OAuth calls. Like google-api-client, signet was also built on top of faraday. Because the OAuth lifecycle is so simple it has never needed to move away from it.

All authentication for the google-cloud-* gems use the googleauth gem, which is built on top of signet, so we will always have faraday as a dependency. This is why google-cloud-env chose to use faraday for its calls. It has similar HTTP usage to signet, so it made sense to use a library that was already a known dependency.

Okay, history over. I would not expect faraday to bring improved download performance. I see three main options for addressing the performance:

  1. Fix httpclient. Continue to use google-api-client, which continues to use httpclient, and all users benefit from performance gains. But, as you stated, this would likely require an additional API or breaking API compatibility in httpclient. Not a quick solution, likely tied to a future httpclient 3.0 release.
  2. Switch google-api-client to use a library that performs better than httpclient. Unfortunately, while google-api-client is being maintained, it is also not under active development. Also, the last time google-api-client switched HTTP libraries it was necessary to break backwards compatibility.
  3. Have google-cloud-storage drop the use of google-api-client for downloads, and re-implement all functionality, including error handling, retry, incremental backoff, etc.

There are other options, such as maintaining the "shitty patch" somewhere in our code, but they don't seem to be as realistic or viable as the three above. @geigerj, can you offer any guidance on how to proceed? The tl;dr is downloads are CPU-intensive for very large files compared to other implementations.

@casperisfine
Contributor Author

Thanks for the history :), much appreciated.

I didn't try very hard to make that patch maintainable. We could certainly do much better.

Another possibility would be to expose a specific API for this in httpclient, since they have an open issue for it, they might be open to the idea. This specific call is likely the only one which would benefit from it.

@blowmage
Contributor

Another possibility would be to expose a specific API for this in httpclient

Yes, I listed that as my first option. I think adding a new API is a better solution than breaking backwards compatibility, most of the time. But I also think your solution of yielding io and offset block arguments is far superior to yielding String chunks, so as a disinterested observer I would prefer the new API to use that. Either solution would be an improvement for sure! :)
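To make the two block styles concrete, here is a toy sketch; every class and method name below is invented for illustration (this is not httpclient's real API), with StringIO standing in for the network socket:

```ruby
require 'stringio'

# Toy stand-in for an HTTP client's response-body reader. All names here
# are hypothetical; StringIO plays the role of the socket.
class FakeResponseBody
  def initialize(io)
    @io = io
  end

  # Current style: yield String chunks, one allocation per chunk.
  def each_chunk(chunk_size = 16 * 1024)
    while (chunk = @io.read(chunk_size))
      yield chunk
    end
  end

  # Proposed style: hand the underlying IO to the caller, who can pass it
  # straight to IO.copy_stream (no per-chunk Ruby Strings).
  def with_io
    yield @io
  end
end

payload = 'x' * 100_000

# Chunk-yielding copy:
chunked = StringIO.new
FakeResponseBody.new(StringIO.new(payload)).each_chunk { |c| chunked.write(c) }

# IO-yielding copy:
direct = StringIO.new
FakeResponseBody.new(StringIO.new(payload)).with_io { |io| IO.copy_stream(io, direct) }

puts chunked.string == payload && direct.string == payload # => true
```

The IO-yielding form is what lets the library skip materializing each chunk as a Ruby String; an offset argument could be added alongside the io for resumable reads, as suggested above.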

@casperisfine
Contributor Author

Yeah my bad. It was late -_-.

I'll try to see what's possible in terms of API, and submit a PR to httpclient if I have something decent.

@casperisfine
Contributor Author

I submitted a PR to httpclient: nahi/httpclient#383
