Skip to content

Commit

Permalink
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
YAML compile cache: encoding aware symbols
Browse files Browse the repository at this point in the history
Ref: msgpack/msgpack-ruby#211

The default msgpack Symbol packer/unpacker is not encoding
aware which cause all non-ASCII symbols to be unpacked with
ASCII-8BIT encoding aka BINARY.

So we define a custom packer that prefix the symbol name with the encoding
index. Note that the encoding index isn't fixed across ruby platforms
and version, but the cache versioning should protect us from that.

An alternative could be to simply assume non-ASCII symbols are UTF-8,
but it wouldn't work for people with non UTF-8 source files.
byroot committed Jan 28, 2022
1 parent e3ef615 commit b433b5f
Showing 4 changed files with 41 additions and 3 deletions.
2 changes: 1 addition & 1 deletion ext/bootsnap/bootsnap.c
Original file line number Diff line number Diff line change
@@ -75,7 +75,7 @@ struct bs_cache_key {
STATIC_ASSERT(sizeof(struct bs_cache_key) == KEY_SIZE);

/* Effectively a schema version. Bumping invalidates all previous caches */
static const uint32_t current_version = 4;
static const uint32_t current_version = 5;

/* hash of e.g. "x86_64-darwin17", invalidating when ruby is recompiled on a
* new OS ABI, etc. */
34 changes: 33 additions & 1 deletion lib/bootsnap/compile_cache/yaml.rb
Original file line number Diff line number Diff line change
@@ -29,6 +29,33 @@ def install!(cache_dir)
::YAML.singleton_class.prepend(@implementation::Patch)
end

module EncodingAwareSymbols
extend self

ENCODINGS = Encoding.list.freeze
ENCODINGS_INDEX = {}.compare_by_identity
ENCODINGS.each_with_index do |enc, index|
ENCODINGS_INDEX[enc] = -(index.chr.force_encoding(Encoding::BINARY))
end

if Symbol.method_defined?(:name)
def pack(symbol)
ENCODINGS_INDEX[symbol.encoding].dup << symbol.name
end
else
def pack(symbol)
ENCODINGS_INDEX[symbol.encoding].dup << symbol.to_s
end
end

def unpack(payload)
payload.freeze
string = payload.byteslice(1..-1)
string.force_encoding(ENCODINGS[payload.ord])
string.to_sym
end
end

def init!
require("yaml")
require("msgpack")
@@ -43,7 +70,12 @@ def init!
# We want them to roundtrip cleanly, so we use a custom factory.
# see: https://github.com/msgpack/msgpack-ruby/pull/122
factory = MessagePack::Factory.new
factory.register_type(0x00, Symbol)
factory.register_type(
0x00,
Symbol,
packer: EncodingAwareSymbols.method(:pack).to_proc,
unpacker: EncodingAwareSymbols.method(:unpack).to_proc,
)

if defined? MessagePack::Timestamp
factory.register_type(
6 changes: 6 additions & 0 deletions test/compile_cache/yaml_test.rb
Original file line number Diff line number Diff line change
@@ -49,6 +49,12 @@ def test_yaml_tags
assert_equal "YAML tags are not supported: !ruby/object", error.message
end

def test_symbols_encoding
symbols = [:ascii, :utf8_fée]
Help.set_file("a.yml", YAML.dump(symbols), 100)
assert_equal(symbols, FakeYaml.load_file("a.yml"))
end

if YAML::VERSION >= "4"
def test_load_psych_4_with_alias
Help.set_file("a.yml", "foo: &foo\n bar: 42\nplop:\n <<: *foo", 100)
2 changes: 1 addition & 1 deletion test/compile_cache_key_format_test.rb
Original file line number Diff line number Diff line change
@@ -21,7 +21,7 @@ class CompileCacheKeyFormatTest < Minitest::Test

def test_key_version
key = cache_key_for_file(FILE)
exp = [4].pack("L")
exp = [5].pack("L")
assert_equal(exp, key[R[:version]])
end

0 comments on commit b433b5f

Please sign in to comment.