-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache header normalization to reduce object allocation #789
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to use cache, we should ensure that the normalized names in the cache are immutable:
name = -name.to_s
@cache[name] ||= -normalize_header(name)
Otherwise it might cause unpleasant surprise, when user uses something like:
response.headers.to_h.keys.each(&:upcase!)
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
After spectacular failure with LRU cache implementation (thread safe variant I could think of right now is way too slow), WDYT about this patch:
diff --git a/lib/http/headers.rb b/lib/http/headers.rb
index 89ff78b..5f28561 100644
--- a/lib/http/headers.rb
+++ b/lib/http/headers.rb
@@ -13,6 +13,33 @@ module HTTP
extend Forwardable
include Enumerable
+ class << self
+ # Coerces given `object` into Headers.
+ #
+ # @raise [Error] if object can't be coerced
+ # @param [#to_hash, #to_h, #to_a] object
+ # @return [Headers]
+ def coerce(object)
+ unless object.is_a? self
+ object = case
+ when object.respond_to?(:to_hash) then object.to_hash
+ when object.respond_to?(:to_h) then object.to_h
+ when object.respond_to?(:to_a) then object.to_a
+ else raise Error, "Can't coerce #{object.inspect} to Headers"
+ end
+ end
+
+ headers = new
+ object.each { |k, v| headers.add k, v }
+ headers
+ end
+ alias [] coerce
+
+ def normalizer
+ @normalizer ||= Headers::Normalizer.new
+ end
+ end
+
# Class constructor.
def initialize
# The @pile stores each header value using a three element array:
@@ -188,40 +215,11 @@ module HTTP
dup.tap { |dupped| dupped.merge! other }
end
- class << self
- # Coerces given `object` into Headers.
- #
- # @raise [Error] if object can't be coerced
- # @param [#to_hash, #to_h, #to_a] object
- # @return [Headers]
- def coerce(object)
- unless object.is_a? self
- object = case
- when object.respond_to?(:to_hash) then object.to_hash
- when object.respond_to?(:to_h) then object.to_h
- when object.respond_to?(:to_a) then object.to_a
- else raise Error, "Can't coerce #{object.inspect} to Headers"
- end
- end
-
- headers = new
- object.each { |k, v| headers.add k, v }
- headers
- end
- alias [] coerce
- end
-
private
- class << self
- def header_normalizer
- @header_normalizer ||= Headers::Normalizer.new
- end
- end
-
# Transforms `name` to canonical HTTP header capitalization
def normalize_header(name)
- self.class.header_normalizer.normalize(name)
+ self.class.normalizer.call(name)
end
# Ensures there is no new line character in the header value
diff --git a/lib/http/headers/normalizer.rb b/lib/http/headers/normalizer.rb
index 7c260d0..e3eb3de 100644
--- a/lib/http/headers/normalizer.rb
+++ b/lib/http/headers/normalizer.rb
@@ -10,15 +10,41 @@ module HTTP
# @see http://tools.ietf.org/html/rfc7230#section-3.2
COMPLIANT_NAME_RE = /\A[A-Za-z0-9!#$%&'*+\-.^_`|~]+\z/
- MAX_CACHE_SIZE = 200
+ NAME_PARTS_SEPARATOR_RE = /[\-_]/
+
+ # @private
+ # Normalized header names cache
+ class Cache
+ MAX_SIZE = 200
+
+ def initialize
+ @store = {}
+ end
+
+ def get(key)
+ @store[key]
+ end
+ alias [] get
+
+ def set(key, value)
+ # Maintain cache size
+ @store.delete(@store.each_key.first) while MAX_SIZE <= @store.size
+
+ @store[key] = value
+ end
+ alias []= set
+ end
def initialize
- @cache = Cache.new(MAX_CACHE_SIZE)
+ @cache = Cache.new
end
# Transforms `name` to canonical HTTP header capitalization
- def normalize(name)
- @cache[name] ||= normalize_header(name)
+ def call(name)
+ name = -name.to_s
+ value = (@cache[name] ||= -normalize_header(name))
+
+ value.dup
end
private
@@ -32,51 +58,11 @@ module HTTP
def normalize_header(name)
return name if CANONICAL_NAME_RE.match?(name)
- normalized = name.split(/[\-_]/).each(&:capitalize!).join("-")
-
+ normalized = name.split(NAME_PARTS_SEPARATOR_RE).each(&:capitalize!).join("-")
return normalized if COMPLIANT_NAME_RE.match?(normalized)
raise HeaderError, "Invalid HTTP header field name: #{name.inspect}"
end
-
- class Cache
- def initialize(max_size)
- @max_size = max_size
- @cache = {}
- end
-
- def get(key)
- @cache[key]
- end
-
- def set(key, value)
- @cache[key] = value
-
- # Maintain cache size
- return unless @cache.size > @max_size
-
- oldest_key = @cache.keys.first
- @cache.delete(oldest_key)
- end
-
- def size
- @cache.size
- end
-
- def key?(key)
- @cache.key?(key)
- end
-
- def [](key)
- get(key)
- end
-
- def []=(key, value)
- set(key, value)
- end
- end
-
- private_constant :Cache
end
end
end PS: I think for specs we should add UPDATE: Updated proposed patch |
@ixti Thank you for the review and ideas! I took your patch and updated the tests, including:
Please let me know what you think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thank you!
@ixti might be good to cut a release. There are a few merged PRs that haven't been released yet |
Thanks for the help here @ixti and the really fast reviews and feedback! A release would be awesome - I'd love to try this in production. |
I'm using this gem to make a lot of requests in a multi-threaded environment. I've noticed that header normalization contributes to a large percentage of the overall objects allocated.
I expect that most services will come across some of the same headers again and again. In my case, I haven't counted the different headers that are encountered in production, but I only issue traffic to the same service so I am confident that I'm parsing the same headers many times.
This PR introduces a cache for header normalization to reduce object allocation. Here's a benchmark script to validate the improvement (you can put this in the root of the repository as bench.rb, then run with
bundle exec ruby bench.rb
for example):Results on main
Results on this branch
Analysis
I wanted to use
Benchmark.ips
to make sure what I'm adding doesn't regress performance. This PR improves performance on the benchmark because a singleHeaderNormalizer
instance is memoized in theHeader
class and reused for all the benchmarks, so once we've normalized a header once, there is no additional work the next time.However, what I really care about is the object allocations. We can see that this PR increases the retained memory, as expected, since we are now persisting header normalized values in a cache. But the win here is the reduction in objects allocated.
Follow up
I just wanted to get this out for your feedback! Please let me know what you think and whether you're open to something like this in the gem. If not, perhaps we can explore a pattern that would make it easier for me to extend the gem...