Handle throw instruction without exiting in YJIT #504

Closed
maximecb opened this issue Mar 7, 2023 · 7 comments

@maximecb

maximecb commented Mar 7, 2023

Thanks to the work Jimmy has been doing closing the exits in send, we are very close to hitting 90%+ ratio_in_yjit on the liquid benchmark, and send is no longer at the top of the exit ops:

ratio_in_yjit:                 88.7%
avg_len_in_yjit:                60.3
Top-16 most frequent exit ops (100.0% of exits):
                     throw:     19,107 (82.8%)
                      send:      1,498 ( 6.5%)
    opt_send_without_block:        920 ( 4.0%)
      opt_getconstant_path:        752 ( 3.3%)
               invokeblock:        432 ( 1.9%)
                    opt_eq:         90 ( 0.4%)
                      once:         87 ( 0.4%)

Kokubun has expressed interest in tackling this one.

In some ways, throw behaves like a return, and so some dynamic dispatch strategy might be needed. I don't know the exact way that throw is implemented in CRuby, but I imagine that there is a function that can tell us where to throw to (which ISEQ/pc to jump to). A little cache with side-exits could then potentially be implemented to handle that.

Does the interpreter currently use some kind of a cache for throw? How does it find out where to jump to?
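
For context, here is a minimal sketch of Ruby code that makes CRuby emit a `throw` instruction: a `return` inside a block cannot simply leave the block's own frame, since it has to unwind through the method that yielded to it. The method name `first_above` is made up for illustration, `RubyVM::InstructionSequence` is CRuby-specific, and the exact disassembly text varies between Ruby versions.

```ruby
# A `return` inside a block has to unwind through Array#each back to
# first_above, so CRuby compiles it to `throw` rather than `leave`.
def first_above(ary, limit)
  ary.each { |x| return x if x > limit }
  nil
end

# Disassemble the method (the block's instructions are included in the output)
# and keep only the lines that mention the throw instruction.
disasm = RubyVM::InstructionSequence.of(method(:first_above)).disasm
throw_lines = disasm.lines.grep(/\bthrow\b/)

puts throw_lines
puts first_above([1, 5, 9], 3)
```

This only shows where the instruction comes from. It says nothing about how the interpreter locates the destination frame, which is the open question above.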

@maximecb maximecb added the enhancement New feature or request label Mar 7, 2023
@k0kubun
Member

k0kubun commented Mar 9, 2023

The trace of throw exit locations:

$ chruby ruby; stackprof benchmarks/liquid-render/yjit_exit_locations.dump --method 'throw'
throw (nonexistent.def:1)
  samples:  18965 self (87.9%)  /   13019 total (60.3%)
  callers:
    5916  (   45.4%)  Liquid::Context#try_variable_find_in_environments
    3692  (   28.4%)  Liquid::VariableLookup#evaluate
    1479  (   11.4%)  Liquid::Condition#evaluate
    1015  (    7.8%)  Liquid::Parser#variable_lookups
     792  (    6.1%)  Liquid::Block#parse_body
      40  (    0.3%)  Liquid::If#render_to_output_buffer
      39  (    0.3%)  Liquid::Utils.slice_collection_using_each
      28  (    0.2%)  Liquid::Context#squash_instance_assigns_with_environments
      18  (    0.1%)  Kernel#require
  callees (-5946 total):
  code:
        SOURCE UNAVAILABLE
trace_throw (nonexistent.def:1)
  samples:     0 self (0.0%)  /      0 total (0.0%)
  callees (0 total):
  code:
        SOURCE UNAVAILABLE

Examples:
https://github.com/Shopify/liquid/blob/48cb643c026557f48e524dfd39cc9ff90aa3db95/lib/liquid/context.rb#L247
https://github.com/Shopify/liquid/blob/48cb643c026557f48e524dfd39cc9ff90aa3db95/lib/liquid/variable_lookup.rb#L69

I was primarily interested in jumping directly from JIT code to JIT code. However, that doesn't seem safe if there's a C method frame in between, which seems to be the case for those examples (i.e. each and each_index).

So my current idea is to take multiple steps as follows:

  1. Just eliminate the interpreter exit overhead
  2. Somehow generate a direct jump to the throw destination when all the VM call frames between them seem to be in the same native JIT frame.
  3. Consider merging Array#each in Ruby
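
Step 3 can be pictured with a minimal sketch: if the iteration itself is written in Ruby, every frame between the block and the method it returns from is a Ruby frame that a JIT can compile. `each_in_ruby` and `find_first_even` are hypothetical names for illustration, not the implementation that CRuby ships.

```ruby
# Hypothetical pure-Ruby stand-in for Array#each (illustration only).
def each_in_ruby(ary)
  i = 0
  while i < ary.length
    yield ary[i]
    i += 1
  end
  ary
end

# Same shape as the Liquid examples: a `return` inside the block, but with no
# C method frame between the block and the method being returned from.
def find_first_even(ary)
  each_in_ruby(ary) { |x| return x if x.even? }
  nil
end

p find_first_even([1, 3, 4, 5])
```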

@maximecb
Author

maximecb commented Mar 9, 2023

Even if we can just eliminate the interpreter exits, it would be a win. It's not just the entry/exit that slows us down; it can also take a while to get back into JIT code. Re-entry only happens at JIT entry points, so if you go through a bunch of returns after you exit, for instance, those are all executed by the interpreter.

> Somehow generate a direct jump to the throw destination when all the VM call frames between them seem to be in the same native JIT frame.

If you could compare the current C stack pointer vs the stack pointer at the return, it might be possible to detect that 🤔

Might be other strategies we could use too... Like, if we could somehow know that the rescue block is in the caller directly, then we only need to check that the jit_return address is set and we know that the parent is a JIT frame. To some degree, we can also generate code for a small loop that walks the stack. It might not be that expensive to walk up to 4-5 levels deep in the stack to make sure that there is no C frame... So some amount of speculation may be possible.

You could even generate a little unrolled loop that walks the stack up to N==8 frames deep, and if it fails, it side-exits to the interpreter?
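
As a rough illustration of that bounded stack walk, here is a toy model in Ruby. Everything in it is an assumption made for the sketch: real VM frames are C structs rather than hashes, and `find_throw_target`, the `:kind` and `:catches` keys, and the `:side_exit` symbol are invented names.

```ruby
# Toy model: frames are hashes ordered from the throwing frame outward.
# :kind is :jit or :c; :catches is true for the frame the throw targets.
def find_throw_target(frames, max_depth: 8)
  frames.first(max_depth).each_with_index do |frame, depth|
    return :side_exit if frame[:kind] != :jit  # non-JIT frame in between: give up
    return depth if frame[:catches]            # every frame so far was a JIT frame
  end
  :side_exit  # target is deeper than max_depth, or was not found
end

all_jit = [
  { kind: :jit, catches: false },
  { kind: :jit, catches: false },
  { kind: :jit, catches: true },
]
with_c_frame = [
  { kind: :jit, catches: false },
  { kind: :c,   catches: false },  # e.g. a C-implemented each
  { kind: :jit, catches: true },
]

p find_throw_target(all_jit)
p find_throw_target(with_c_frame)
```

In generated machine code the loop would be unrolled to a fixed depth; the model only captures the decision logic, falling back to the interpreter whenever the speculation fails.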

@maximecb
Author

maximecb commented Mar 9, 2023

Based on the fact that throw is 11% of the exits on SFR right now (probably because of liquid use), this should produce a nice speedup there too. Exciting.

@k0kubun
Member

k0kubun commented Mar 9, 2023

For implementing a general path just to remove side exits, I came up with a couple of ways to implement it: one with longjmp ruby#7490 and one without ruby#7491.

Both of them are working-ish but have a test failure. I'd like to discuss the approaches and pair on fixing it next week.

@maximecb
Author

Out of curiosity I created a throw instruction microbenchmark: Shopify/yjit-bench#199

While testing that, I also tested replacing the throw instruction by a regular return and uh, the results are interesting.

With throw, YJIT enabled:

...
itr #60: 157ms
itr #61: 155ms
itr #62: 155ms
itr #63: 156ms
itr #64: 155ms
itr #65: 155ms

With return, YJIT enabled:

...
itr #1503: 6ms
itr #1504: 6ms
itr #1505: 6ms
itr #1506: 6ms
itr #1507: 6ms
itr #1508: 6ms

So the overhead of throw is absolutely massive. We should do what we can to go from JIT code to JIT code if possible. Either way, good to have a microbenchmark to measure this.
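
A comparison of this kind could be sketched roughly as follows. This is not the benchmark from Shopify/yjit-bench#199: the method names, array, and iteration count are invented for illustration, and the two methods also differ in using `each` versus a `while` loop, so it only gives a coarse picture.

```ruby
require "benchmark"

# `return` inside a block: compiled to `throw`, unwinds through Array#each.
def find_with_throw(ary, target)
  ary.each { |x| return x if x == target }
  nil
end

# Plain `return` from the method's own frame: no `throw` involved.
def find_with_return(ary, target)
  i = 0
  while i < ary.length
    return ary[i] if ary[i] == target
    i += 1
  end
  nil
end

ary = [1, 2, 3, 4]
iterations = 200_000

throw_time  = Benchmark.realtime { iterations.times { find_with_throw(ary, 3) } }
return_time = Benchmark.realtime { iterations.times { find_with_return(ary, 3) } }

puts format("throw:  %.1f ms", throw_time * 1000)
puts format("return: %.1f ms", return_time * 1000)
```

Running it once with `ruby --yjit` and once without gives a rough sense of how much of the gap comes from leaving JIT code.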

@maximecb
Author

It looks like in order to be able to handle throw without exiting the JIT, we would need to implement at least Array#each in Ruby. Until then, we've improved performance a little bit with ruby#7491.

See #493

casperisfine pushed a commit that referenced this issue Apr 5, 2023
* Test existing behavior

Typing Ctrl-D ends editing but typing <Del> does not.

Also renamed a test that is not testing ed_delete_next_char but
key_delete.

* Check if line empty first in em_delete

By distributivity of AND over OR, we can factor out this condition. This
will make the next commit simpler.

* Use em_delete in key_delete

When the editing mode is emacs, use `em_delete` in `key_delete`. We need
to add a condition though to `em_delete`, because it implements both
`delete-char` and `end-of-file`. We only want the `end-of-file` behavior
if the key is really Ctrl-D.

This matches the behavior of the <Del> key with readline, i.e. deleting
the next character if there is one, but not moving the cursor, while not
finishing the editing if there are no characters.
@k0kubun
Member

k0kubun commented Aug 9, 2023

ruby#8171 basically solved the problem of exiting JIT code on throw instruction, so I think we could close this and discuss further improvements in different issues.

@k0kubun k0kubun closed this as completed Aug 9, 2023