Tweak generated class methods .lookup and .resolve for enums to prevent StackError when formatting #172

davebenvenuti · 2025-01-24T21:36:07Z

I ran across an issue when working with an enum definition with thousands of values. SyntaxTree ran into a "nesting too deep" error when attempting to format the code:

1) Error:
ProtoBoeuf::CodeGenTest#test_big_enum:
SyntaxTree::Parser::ParseError: nesting too deep
    /Users/dave/.gem/ruby/3.3.4/gems/syntax_tree-6.2.0/lib/syntax_tree/parser.rb:2864:in `on_parse_error'
    /Users/dave/.gem/ruby/3.3.4/gems/syntax_tree-6.2.0/lib/syntax_tree.rb:124:in `parse'
    /Users/dave/.gem/ruby/3.3.4/gems/syntax_tree-6.2.0/lib/syntax_tree.rb:124:in `parse'
    /Users/dave/.gem/ruby/3.3.4/gems/syntax_tree-6.2.0/lib/syntax_tree.rb:68:in `format'
    lib/protoboeuf/codegen.rb:1723:in `block in to_ruby'
    lib/protoboeuf/codegen.rb:1703:in `each'
    lib/protoboeuf/codegen.rb:1703:in `to_ruby'
    test/codegen_test.rb:12:in `to_ruby'
    test/codegen_test.rb:834:in `test_big_enum'

The error appeared to have something to do with the thousands of chained elsifs. I changed this to single line statements like return :foo if val == 1, and this seems to have solved the issue. I this also makes the generated code a bit more concise and, in my opinion, a little more idiomatic. First I tried inserting newlines between the elsifs to see if that would help, but it didn't.

I dumped the instruction sequences of both with and without yjit and it looks to not make a difference:

ruby --yjit --dump=insns -e 'def foo(x); if x == 1; return :first; elsif x == 2; return :second; elsif x == 3; return :third; end; end'

➜  protoboeuf git:(big-enums) ruby --yjit --dump=insns -e 'def foo(x); if x == 1; return :first; elsif x == 2; return :second; elsif x == 3; return :third; end; end'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,105)>
0000 definemethod                           :foo, foo                 (   1)[Li]
0003 putobject                              :foo
0005 leave

== disasm: #<ISeq:foo@-e:1 (1,0)-(1,105)>
local table (size: 1, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] x@0<Arg>
0000 getlocal_WC_0                          x@0                       (   1)[LiCa]
0002 putobject_INT2FIX_1_
0003 opt_eq                                 <calldata!mid:==, argc:1, ARGS_SIMPLE>[CcCr]
0005 branchunless                           10
0007 putobject                              :first
0009 leave                                  [Re]
0010 getlocal_WC_0                          x@0
0012 putobject                              2
0014 opt_eq                                 <calldata!mid:==, argc:1, ARGS_SIMPLE>[CcCr]
0016 branchunless                           21
0018 putobject                              :second
0020 leave                                  [Re]
0021 getlocal_WC_0                          x@0
0023 putobject                              3
0025 opt_eq                                 <calldata!mid:==, argc:1, ARGS_SIMPLE>[CcCr]
0027 branchunless                           32
0029 putobject                              :third
0031 leave                                  [Re]
0032 putnil
0033 leave                                  [Re]

ruby --yjit --dump=insns -e 'def foo(x); return :first if x == 1; return :second if x == 2; return :third if x == 3; end'

➜  protoboeuf git:(big-enums) ruby --yjit --dump=insns -e 'def foo(x); return :first if x == 1; return :second if x == 2; return :third if x == 3; end'           
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,91)>
0000 definemethod                           :foo, foo                 (   1)[Li]
0003 putobject                              :foo
0005 leave

== disasm: #<ISeq:foo@-e:1 (1,0)-(1,91)>
local table (size: 1, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] x@0<Arg>
0000 getlocal_WC_0                          x@0                       (   1)[LiCa]
0002 putobject_INT2FIX_1_
0003 opt_eq                                 <calldata!mid:==, argc:1, ARGS_SIMPLE>[CcCr]
0005 branchunless                           10
0007 putobject                              :first
0009 leave                                  [Re]
0010 getlocal_WC_0                          x@0
0012 putobject                              2
0014 opt_eq                                 <calldata!mid:==, argc:1, ARGS_SIMPLE>[CcCr]
0016 branchunless                           21
0018 putobject                              :second
0020 leave                                  [Re]
0021 getlocal_WC_0                          x@0
0023 putobject                              3
0025 opt_eq                                 <calldata!mid:==, argc:1, ARGS_SIMPLE>[CcCr]
0027 branchunless                           32
0029 putobject                              :third
0031 leave                                  [Re]
0032 putnil
0033 leave                                  [Re]

Interestingly, when I pulled out the self.lookup and self.resolve methods to ERB templates, I started seeing Rubocop failures related to multi-line blocks with curly braces, so I fixed those too. I think some of the Ruby that generates Ruby might have been confusing Rubocop.

… "nesting too deep" error

davebenvenuti · 2025-01-24T22:52:11Z

I also filed an issue for syntax_tree: ruby-syntax-tree/syntax_tree#491

nirvdrum · 2025-01-27T21:10:15Z

lib/protoboeuf/codegen.rb

      end

+      LOOKUP = ERB.new(<<~LOOKUP, trim_mode: "-")


I realize this is just improving upon what we already have, but I wonder if we could use Hash here or something else with constant-time access. As implemented, we're looking at linear-time lookups. I could see this potentially being faster if enum.value was a frequency list with an infrequently accessed tail, but I think in practice it's just the order in the protobuf definition.

@tenderlove Were the values enumerated as a long list of branches for YJIT?

Since we're generating the code, we could do a mixed strategy, too (assuming the current approach is to optimize on YJIT). E.g., if enum.value.size < 10 or whatever, emit the chain of conditionals, otherwise store in a Hash and do a lookup that way.

I can merge this as-is then followup with that. I created an issue: #176

@tenderlove Were the values enumerated as a long list of branches for YJIT?

Yes, they were. The idea was that the enums were probably short enough that staying in machine code would outperform calling out to a hash. I thought at one point we'd used a sparse array though? I feel you could do this with an array, but the catch is that Protobuf allows negative enums.

Makes sense. I updated the Issue title to "Investigate using a Hash or sparse Array for Enum lookup and resolve instead of a series of conditonals"

nirvdrum · 2025-01-27T21:14:18Z

Thank you for confirming the instruction sequences were equivalent in both cases. Some of the non-idiomatic code in the code generator is written that way to maximize performance, either in the interpreter or in YJIT. Comparing the instruction sequences helps avoid inadvertent regressions.

rwstauner · 2025-01-27T21:43:11Z

The --dump=insns is a CRuby feature that isn't going to be influenced by any yjit options.
If you wanted to compare the yjit compiled code you could trade --dump=disasm for --yjit-dump-disasm (you would also need to ensure the code is run enough times to be compiled by yjit).
FWIW I did this and after normalizing the memory addresses the diff seems to be similarly negligible.

rwstauner

Makes sense to me, seems like a necessary work-around

davebenvenuti force-pushed the big-enums branch 3 times, most recently from 47368a4 to 4890b39 Compare January 24, 2025 21:55

Tweak our generated lookup and resolve methods for enums to prevent a…

94e2260

… "nesting too deep" error

davebenvenuti force-pushed the big-enums branch from 4890b39 to 94e2260 Compare January 24, 2025 21:59

davebenvenuti marked this pull request as ready for review January 24, 2025 22:15

davebenvenuti requested review from nirvdrum, rwstauner and tenderworks January 24, 2025 22:15

nirvdrum approved these changes Jan 27, 2025

View reviewed changes

rwstauner approved these changes Jan 28, 2025

View reviewed changes

davebenvenuti merged commit 9e25aa6 into main Jan 28, 2025
7 checks passed

davebenvenuti deleted the big-enums branch January 28, 2025 23:11

davebenvenuti added the #gsd:43107 Ruby Language: RPC Code Generation for Shopify Internal APIs [GSD 43107] label Jan 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tweak generated class methods .lookup and .resolve for enums to prevent StackError when formatting #172

Tweak generated class methods .lookup and .resolve for enums to prevent StackError when formatting #172

davebenvenuti commented Jan 24, 2025 •

edited

Loading

davebenvenuti commented Jan 24, 2025

nirvdrum Jan 27, 2025 •

edited

Loading

nirvdrum Jan 27, 2025

davebenvenuti Jan 28, 2025

tenderworks Jan 28, 2025

davebenvenuti Jan 28, 2025

nirvdrum commented Jan 27, 2025

rwstauner commented Jan 27, 2025

rwstauner left a comment

Tweak generated class methods .lookup and .resolve for enums to prevent StackError when formatting #172

Tweak generated class methods .lookup and .resolve for enums to prevent StackError when formatting #172

Conversation

davebenvenuti commented Jan 24, 2025 • edited Loading

davebenvenuti commented Jan 24, 2025

nirvdrum Jan 27, 2025 • edited Loading

Choose a reason for hiding this comment

nirvdrum Jan 27, 2025

Choose a reason for hiding this comment

davebenvenuti Jan 28, 2025

Choose a reason for hiding this comment

tenderworks Jan 28, 2025

Choose a reason for hiding this comment

davebenvenuti Jan 28, 2025

Choose a reason for hiding this comment

nirvdrum commented Jan 27, 2025

rwstauner commented Jan 27, 2025

rwstauner left a comment

Choose a reason for hiding this comment

davebenvenuti commented Jan 24, 2025 •

edited

Loading

nirvdrum Jan 27, 2025 •

edited

Loading