From 495669031b5c78d10d076f4dc3388283d4d8d81c Mon Sep 17 00:00:00 2001
From: NAITOH Jun
Date: Sun, 8 Dec 2024 20:59:47 +0900
Subject: [PATCH] Use `StringScanner#peek_byte` to get double or single quotation mark

## Why?

`StringScanner#peek_byte` is fast, because it does not generate String object.

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.4/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     19.753      19.888        35.641       35.928 i/s -     100.000 times in 5.062402s 5.028121s 2.805792s 2.783339s
                 sax     30.349      30.978        53.485       57.885 i/s -     100.000 times in 3.295012s 3.228103s 1.869671s 1.727567s
                pull     34.170      35.436        61.713       66.534 i/s -     100.000 times in 2.926534s 2.821955s 1.620404s 1.502996s
              stream     33.121      35.268        60.751       63.276 i/s -     100.000 times in 3.019222s 2.835443s 1.646065s 1.580374s

Comparison:
                              dom
         after(YJIT):        35.9 i/s
        before(YJIT):        35.6 i/s - 1.01x  slower
               after:        19.9 i/s - 1.81x  slower
              before:        19.8 i/s - 1.82x  slower

                              sax
         after(YJIT):        57.9 i/s
        before(YJIT):        53.5 i/s - 1.08x  slower
               after:        31.0 i/s - 1.87x  slower
              before:        30.3 i/s - 1.91x  slower

                             pull
         after(YJIT):        66.5 i/s
        before(YJIT):        61.7 i/s - 1.08x  slower
               after:        35.4 i/s - 1.88x  slower
              before:        34.2 i/s - 1.95x  slower

                           stream
         after(YJIT):        63.3 i/s
        before(YJIT):        60.8 i/s - 1.04x  slower
               after:        35.3 i/s - 1.79x  slower
              before:        33.1 i/s - 1.91x  slower
```

- YJIT=ON : 1.01x - 1.08x faster
- YJIT=OFF : 1.00x - 1.06x faster

Co-authored-by: Sutou Kouhei
---
 lib/rexml/parsers/baseparser.rb | 22 ++++++++++++++++++++--
 lib/rexml/source.rb             |  8 ++++++++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/lib/rexml/parsers/baseparser.rb b/lib/rexml/parsers/baseparser.rb
index 90851bb1..13cdd821 100644
--- a/lib/rexml/parsers/baseparser.rb
+++ b/lib/rexml/parsers/baseparser.rb
@@ -766,6 +766,25 @@ def process_instruction
         [:processing_instruction, name, content]
       end
 
+      if StringScanner::Version < "3.1.1"
+        def scan_quote
+          @source.match(/(['"])/, true)&.[](1)
+        end
+      else
+        def scan_quote
+          case @source.peek_byte
+          when 34 # '"'.ord
+            @source.scan_byte
+            '"'
+          when 39 # "'".ord
+            @source.scan_byte
+            "'"
+          else
+            nil
+          end
+        end
+      end
+
       def parse_attributes(prefixes)
         attributes = {}
         expanded_names = {}
@@ -785,11 +804,10 @@ def parse_attributes(prefixes)
             message = "Missing attribute equal: <#{name}>"
             raise REXML::ParseException.new(message, @source)
           end
-          unless match = @source.match(/(['"])/, true)
+          unless quote = scan_quote
             message = "Missing attribute value start quote: <#{name}>"
             raise REXML::ParseException.new(message, @source)
           end
-          quote = match[1]
           start_position = @source.position
           value = @source.read_until(quote)
           unless value.chomp!(quote)
diff --git a/lib/rexml/source.rb b/lib/rexml/source.rb
index 2409f76e..5ba5ab12 100644
--- a/lib/rexml/source.rb
+++ b/lib/rexml/source.rb
@@ -158,6 +158,14 @@ def position=(pos)
       @scanner.pos = pos
     end
 
+    def peek_byte
+      @scanner.peek_byte
+    end
+
+    def scan_byte
+      @scanner.scan_byte
+    end
+
     # @return true if the Source is exhausted
     def empty?
       @scanner.eos?
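
## Example

A minimal standalone sketch of the `scan_quote` idea from this patch, for trying it outside REXML. It works on a bare `StringScanner` rather than `REXML::Source`, and it picks the code path with `respond_to?(:peek_byte)` instead of the patch's `StringScanner::Version` comparison; both choices are assumptions made for illustration, not part of the patch.

```ruby
require "strscan"

# Hypothetical helper (not part of REXML): returns the quote character at the
# scanner's current position and consumes it, or returns nil and consumes nothing.
def scan_quote(scanner)
  if scanner.respond_to?(:peek_byte)
    # Byte path: peek_byte returns an Integer (or nil at end of input),
    # so no String is allocated just to inspect the next character.
    case scanner.peek_byte
    when 34 # '"'.ord
      scanner.scan_byte
      '"'
    when 39 # "'".ord
      scanner.scan_byte
      "'"
    end
  else
    # Fallback for strscan versions without peek_byte/scan_byte.
    scanner.scan(/['"]/)
  end
end

double_quoted = StringScanner.new(%q("value"))
single_quoted = StringScanner.new(%q('value'))
unquoted      = StringScanner.new("value")

p scan_quote(double_quoted)
p scan_quote(single_quoted)
p scan_quote(unquoted)
```

On a match the scanner advances by one byte, so a following `read_until(quote)`-style call starts at the attribute value; on a miss the position is left unchanged, which is what lets the caller raise the "Missing attribute value start quote" error against the original position.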