Handle `throw` instruction without exiting in YJIT #504

maximecb · 2023-03-07T21:56:38Z

Thanks to the work Jimmy has been doing closing the exits in send, we are very close to hitting 90%+ ratio_in_yjit on the liquid benchmark, and send is no longer at the top of the exit ops:

ratio_in_yjit:                 88.7%
avg_len_in_yjit:                60.3
Top-16 most frequent exit ops (100.0% of exits):
                     throw:     19,107 (82.8%)
                      send:      1,498 ( 6.5%)
    opt_send_without_block:        920 ( 4.0%)
      opt_getconstant_path:        752 ( 3.3%)
               invokeblock:        432 ( 1.9%)
                    opt_eq:         90 ( 0.4%)
                      once:         87 ( 0.4%)

Kokubun has expressed interest in tackling this one.

In some ways, throw behaves like a return, and so some dynamic dispatch strategy might be needed. I don't know the exact way that throw is implemented in CRuby, but I imagine that there is a function that can tell us where to throw to (which ISEQ/pc to jump to). A little cache with side-exits could then potentially be implemented to handle that.

Does the interpreter currently use some kind of a cache for throw? How does it find out where to jump to?

k0kubun · 2023-03-09T19:57:38Z

The trace of throw exit locations:

$ chruby ruby; stackprof benchmarks/liquid-render/yjit_exit_locations.dump --method 'throw'
throw (nonexistent.def:1)
  samples:  18965 self (87.9%)  /   13019 total (60.3%)
  callers:
    5916  (   45.4%)  Liquid::Context#try_variable_find_in_environments
    3692  (   28.4%)  Liquid::VariableLookup#evaluate
    1479  (   11.4%)  Liquid::Condition#evaluate
    1015  (    7.8%)  Liquid::Parser#variable_lookups
     792  (    6.1%)  Liquid::Block#parse_body
      40  (    0.3%)  Liquid::If#render_to_output_buffer
      39  (    0.3%)  Liquid::Utils.slice_collection_using_each
      28  (    0.2%)  Liquid::Context#squash_instance_assigns_with_environments
      18  (    0.1%)  Kernel#require
  callees (-5946 total):
  code:
        SOURCE UNAVAILABLE
trace_throw (nonexistent.def:1)
  samples:     0 self (0.0%)  /      0 total (0.0%)
  callees (0 total):
  code:
        SOURCE UNAVAILABLE

Examples:
https://github.com/Shopify/liquid/blob/48cb643c026557f48e524dfd39cc9ff90aa3db95/lib/liquid/context.rb#L247
https://github.com/Shopify/liquid/blob/48cb643c026557f48e524dfd39cc9ff90aa3db95/lib/liquid/variable_lookup.rb#L69

I was primarily interested in directly jumping from JIT code to JIT code. However, it doesn't seem safe if there's a C method frame in between, which seems like the case for those examples (i.e. each and each_index).

So my current idea is to take multiple steps as follows:

1. Just eliminate the interpreter exit overhead.
2. Somehow generate a direct jump to the throw destination when all the VM call frames between them seem to be in the same native JIT frame.
3. Consider merging Array#each in Ruby.

maximecb · 2023-03-09T21:24:21Z

Even if we can just eliminate the interpreter exits, it would be a win. It's not just the entry/exit that slows us down; it can also take a while to get back into JIT code. Re-entry only happens at JIT entry points, so if, for instance, you go through a bunch of returns after you exit, those are all executed by the interpreter.

> Somehow generate a direct jump to the throw destination when all the VM call frames between them seem to be in the same native JIT frame.

If you could compare the current C stack pointer vs the stack pointer at the return, it might be possible to detect that 🤔

Might be other strategies we could use too... Like, if we could somehow know that the rescue block is in the caller directly, then we only need to check that the jit_return address is set and we know that the parent is a JIT frame. To some degree, we can also generate code for a small loop that walks the stack. It might not be that expensive to walk up to 4-5 levels deep in the stack to make sure that there is no C frame... So some amount of speculation may be possible.

You could even generate a little unrolled loop that walks the stack up to N==8 frames deep, and if it fails, it side-exits to the interpreter?
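The bounded stack walk described above can be modeled in plain Ruby. This is only a sketch of the decision logic, not YJIT code: the `Frame` struct, its fields, and `find_direct_jump_target` are hypothetical, and `jit_return` is a boolean here where the real VM would hold an address.

```ruby
# Hypothetical model of a caller frame; not YJIT's actual data structure.
# jit_return:    truthy when the frame would resume in JIT code
# handles_throw: truthy when this frame is the throw destination
Frame = Struct.new(:name, :jit_return, :handles_throw)

# Walk the caller frames, innermost first, up to max_depth levels.
# Returns the depth of the destination if every frame on the way is a
# JIT frame, or :side_exit if a non-JIT frame (such as a C method) is
# found or the depth limit is reached first.
def find_direct_jump_target(frames, max_depth: 8)
  frames.first(max_depth).each_with_index do |frame, depth|
    return :side_exit unless frame.jit_return
    return depth if frame.handles_throw
  end
  :side_exit
end

jit_only = [
  Frame.new("block", true, false),
  Frame.new("evaluate", true, true),
]
with_c_frame = [
  Frame.new("block", true, false),
  Frame.new("Array#each", false, false), # C method frame in between
  Frame.new("evaluate", true, true),
]

p find_direct_jump_target(jit_only)
p find_direct_jump_target(with_c_frame)
```

Falling back to a side exit keeps the speculation safe: the direct jump is only taken when the walk proves there is no C frame between the throw and its destination.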

maximecb · 2023-03-09T22:23:20Z

Based on the fact that throw is 11% of the exits on SFR right now (probably because of liquid use), this should produce a nice speedup there too. Exciting.

k0kubun · 2023-03-09T22:40:18Z

For implementing a general path just to remove side exits, I came up with a couple of ways to implement it: one with longjmp (ruby#7490) and one without (ruby#7491).

Both of them are working-ish but have a test failure. I'd like to discuss the approaches and pair on fixing it next week.

maximecb · 2023-03-13T19:33:26Z

Out of curiosity I created a throw instruction microbenchmark: Shopify/yjit-bench#199

While testing that, I also tested replacing the throw instruction by a regular return and uh, the results are interesting.

With throw, YJIT enabled:

...
itr #60: 157ms
itr #61: 155ms
itr #62: 155ms
itr #63: 156ms
itr #64: 155ms
itr #65: 155ms

With return, YJIT enabled:

...
itr #1503: 6ms
itr #1504: 6ms
itr #1505: 6ms
itr #1506: 6ms
itr #1507: 6ms
itr #1508: 6ms

So the overhead of throw is absolutely massive. We should do what we can to go from JIT code to JIT code if possible. Either way, good to have a microbenchmark to measure this.
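For readers who want to reproduce the comparison locally, a rough sketch of such a measurement follows. This is not the benchmark from Shopify/yjit-bench#199; the method names and iteration count are made up for illustration, and absolute timings will differ by machine and Ruby version.

```ruby
require "benchmark"

# `return` inside the block is compiled to a `throw` instruction.
def with_throw(items)
  items.each { |x| return x }
  nil
end

# Plain method-level `return`, with no block involved.
def with_return(items)
  return items[0]
end

items = [1, 2, 3]
iterations = 1_000_000 # arbitrary; adjust to taste

Benchmark.bm(8) do |bm|
  bm.report("throw:")  { iterations.times { with_throw(items) } }
  bm.report("return:") { iterations.times { with_return(items) } }
end
```

Running it once with `ruby --yjit` and once without gives a feel for how much of the gap comes from leaving JIT code.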

maximecb · 2023-03-17T17:34:38Z

It looks like in order to be able to handle throw without exiting the JIT, we would need to implement at least Array#each in Ruby. Until then, we've improved performance a little bit with ruby#7491.

See #493

k0kubun · 2023-08-09T23:05:52Z

ruby#8171 basically solved the problem of exiting JIT code on throw instruction, so I think we could close this and discuss further improvements in different issues.

maximecb added the enhancement New feature or request label Mar 7, 2023

maximecb assigned k0kubun Mar 7, 2023

k0kubun mentioned this issue Mar 13, 2023

Create a microbenchmark for the throw instruction Shopify/yjit-bench#199

Merged

k0kubun closed this as completed Aug 9, 2023
