Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should non-nullable locals be initialized by unreachable code? #98

Open
tlively opened this issue Apr 14, 2023 · 13 comments
Open

Should non-nullable locals be initialized by unreachable code? #98

tlively opened this issue Apr 14, 2023 · 13 comments

Comments

@tlively
Copy link
Member

tlively commented Apr 14, 2023

Binaryen optimization passes that can introduce new invalid uses of non-nullable locals opt themselves into getting an automatic fixup pass applied afterward. To keep binaryen efficient, we try to have as few optimization passes as possible opt into this fixup. However, we found a problem with Binaryen's treatment of non-nullable locals that might require many more passes to opt into the fixup. I'd like to avoid that by making a fix in the spec instead.

The problem is that Binaryen performs a domination analysis that finds initializing sets that make subsequent gets valid, but that domination analysis did not consider unreachable code specially, so sets in unreachable code were considered to make subsequent gets valid. However, Binaryen does not emit unreachable code, so those sets are not actually emitted into the binary. The engine then sees a get without an initializing set and fails validation. You can find an example of this problem at WebAssembly/binaryen#5599.

A potential fix is given in WebAssembly/binaryen#5665, which makes the domination analysis ignore unreachable sets that won't appear in the binary, but this fix would require many more passes to have to opt into the local fixup mechanism and would considerably slow down Binaryen's compilation speed.

The problem would also be fixed if we changed the spec to say that instructions that leave the stack in a polymorphic state (i.e. introduce unreachability) also mark all locals as initialized. This would be sound because subsequent gets that are made valid by this change are never executed. This would let Binaryen avoid making any changes at all by making the current bug a feature instead.

What do folks think about this proposed relaxation of the validation rules?

@conrad-watt
Copy link
Contributor

conrad-watt commented Apr 14, 2023

I don't think I'm clearly understanding whether the code in your linked issue is the input to binaryen or the output. If it's the input, it's invalid, so I'm surprised that Binaryen is attempting to do any kind of optimisation on it at all instead of rejecting it. If it's output, I don't think that there's a clear relaxation of our 1a-style locals that can make it valid.

Put another way, is the issue that Binaryen is emitting code like this:

(block
    unreachable
    local.get $i
end)

(where $i is an uninitialised local)

or code like this?

(block
    unreachable
end)
local.get $i

It's definitely possible to relax validation to make the first code snippet above validate (and in fact we'd need to make this relaxation anyway if we ever wanted to move to the 1b/c cross-block approach) - although I don't understand how code in this form is getting emitted since I'd expect Binaryen to optimise the local.get away.

We can't make the latter code snippet valid without going beyond our current 1a intra-block initialisation tracking.

@tlively
Copy link
Member Author

tlively commented Apr 14, 2023

The code in the linked issue is input code. It doesn't cause any errors because binaryen ignores unnamed blocks when doing non-nullable local validation because it never emits those blocks in its output anyway. So I believe Binaryen is producing code like your first example.

(Note that there also appears to be an opportunity to skip even more unreachable code in Binaryen's binary emitter, which I suppose would also solve this problem. I don't know how complicated that solution would be, though, and regardless it seems useful to relax the spec as much as we can.)

@conrad-watt
Copy link
Contributor

It seems totally fine for us to relax the spec to make this code valid (also assuming it's not annoying for implementers), although I imagine that having Binaryen stop emitting code that's program-order-after a statically-known unreachable point (unreachable and unconditional branches) might also be generally valuable for code-size reasons.

@kripken
Copy link
Member

kripken commented Apr 14, 2023

True, yeah. We do that in the --dce pass, so this issue only affects unoptimized code (where code size doesn't matter as much).

Overall if this is a simple and otherwise beneficial relaxation in the spec that could be nice, but if not, we can figure things out in Binaryen of course (but it will add a small amount of work, slowing down builds slightly).

@conrad-watt
Copy link
Contributor

conrad-watt commented Apr 14, 2023

Ah, I hadn't properly understood that some Binaryen configurations might not want to do (syntax-driven) dce. I'm happy to help facilitate relaxing the spec as above (assuming we get a neutral/positive noise from @rossberg and engines).

@titzer
Copy link
Contributor

titzer commented Apr 14, 2023

Would it be possible that Binaryen track the point in a block after which code is unreachable and then avoid adding new instructions to that region, or is the issue that an unreachable gets introduced in the middle of a block?

@tlively
Copy link
Member Author

tlively commented Apr 14, 2023

Yes, Binaryen already does that to some extent, and if it did a better job of it, this issue would go away.

Now that we've identified that fix, I wouldn't say a spec change is strongly motivated by this one problem we ran into. I still think it would be a nice change to make based on the general principle that it is helpful to relax the validation rules to reduce the amount of effort producers have to put into generating valid code.

@titzer
Copy link
Contributor

titzer commented Apr 15, 2023

FWIW I don't think this is a particularly hard change in Wizard's validator; there is literally only one instruction affected, which is local.get, and the control stack entries are annotated with an inUnreachableCode flag.

@rossberg
Copy link
Member

From first glance, this looks like a plausible change. But I probably won't have time to look into it for a week or two. It means a few more conditions in every validators, though, so I wonder if it isn't simpler, globally speaking, to implement the fix in Binaryen, which seems desirable anyway.

@rossberg
Copy link
Member

I created #99, which would relax the semantics as suggested. It's not hard but a moderate complication, so we should be perhaps be discussing whether it's worth it.

@tlively
Copy link
Member Author

tlively commented Apr 19, 2023

FWIW this is turning out to be nontrivial to fix in Binaryen (although still desirable to fix, of course). I am unsurprisingly in favor of this spec change.

kripken added a commit to WebAssembly/binaryen that referenced this issue Apr 19, 2023
DCE at the end avoids issues with non-nullable local operations in unreachable
code, which is still being discussed. This PR avoids fuzzer errors for now, but we
should revert it when we have a proper fix.

See

* #5599
* #5665
* WebAssembly/function-references#98
@rossberg
Copy link
Member

rossberg commented Nov 8, 2023

I just noticed that this issue is still open. @tlively, is this still relevant for Binaryen?

@tlively
Copy link
Member Author

tlively commented Nov 8, 2023

Yes, I think this is still a problem in principle. We've worked around it by always running DCE in our fuzzer when GC is enabled, though, so it's not a problem that is giving us ongoing trouble. I would be ok closing this issue.

radekdoulik pushed a commit to dotnet/binaryen that referenced this issue Jul 12, 2024
DCE at the end avoids issues with non-nullable local operations in unreachable
code, which is still being discussed. This PR avoids fuzzer errors for now, but we
should revert it when we have a proper fix.

See

* WebAssembly#5599
* WebAssembly#5665
* WebAssembly/function-references#98
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants