Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flags.c: More robust flags parsing #539

Merged
merged 3 commits into from
Nov 10, 2023
Merged

flags.c: More robust flags parsing #539

merged 3 commits into from
Nov 10, 2023

Conversation

tomsmeding
Copy link
Member

@tomsmeding tomsmeding commented Nov 8, 2023

Two important changes:

  1. We now parse all +ACC ... -ACC blocks instead of only the first one.
  2. We don't segfault if we encounter a NULL in argv[].

These fixes were informed by the awkward bug of "passing +RTS or +ACC makes the program hang indefinitely". The reason this happened was that, perhaps since recently, GHC has changed (?) approaches to parsing +RTS arguments. I'm not sure what it did before, but at least now, it shuffles all the non-RTS arguments to the front of argv[] and puts a NULL value as the first non-argument. Stale copies of other, perhaps RTS, arguments can be found after that NULL.

What happened is that the process_options constructor function got called multiple times during execution (to be precise, twice); the second invocation seems to be connected to accelerate-llvm-native starting some threads. But the strange thing is that it only gets called once more, not numCapabillities more times.

In any case, we get called multiple times, and the second time the GHC RTS had put a NULL in argv[], and we happily didn't check for NULL and ran strncmp() which segfaulted. And due to some signal handler on SIGSEGV which may or may not handle that by ignoring it entirely (?), nothing happened and all froze.

The direct fix for that is (2.) above, but in order to make this robust we also have to do (1.). The reason is that if a user does this:

+ACC abcd -ACC +ACC efgh -ACC

then our first run would have parsed "abcd" as an Accelerate flag and rewritten the argument list to:

-RTS -RTS -RTS +ACC efgh -ACC

which the GHC RTS would have rewritten to:

+ACC efgh -ACC <NULL> -RTS -RTS

after which the second invocation of process_options() would parse a +ACC block that we didn't parse before, resulting in non-idempotency. We don't want that, so we make sure to parse all +ACC blocks the first time round.

How has this been tested?
Tested this manually, locally. The first commit shows the debug prints I was using to diagnose what was happening.

Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist
Go over all the following points, and put an x in all the boxes that apply. If you're unsure about any of these, don't hesitate to ask. We're here to help!

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

cbits/flags.c Show resolved Hide resolved
cbits/flags.c Outdated Show resolved Hide resolved
tomsmeding and others added 3 commits November 8, 2023 23:59
Two important changes:
1. We now parse _all_ +ACC ... -ACC blocks instead of only the first
   one.
2. We don't segfault if we encounter a NULL in argv[].

These fixes were informed by the awkward bug of "passing +RTS or +ACC
makes the program hang indefinitely". The reason this happened was that,
perhaps since recently, GHC has changed (?) approaches to parsing +RTS
arguments. I'm not sure what it did before, but at least now, it
shuffles all the non-RTS arguments to the front of argv[] and puts a
NULL value as the first non-argument. Stale copies of other, perhaps
RTS, arguments can be found after that NULL.

What happened is that the process_options constructor function got
called multiple times during execution (to be precise, twice); the
second invocation seems to be connected to accelerate-llvm-native
starting some threads. But the strange thing is that it only gets called
_once_ more, not numCapabillities more times.

In any case, we get called multiple times, and the second time the GHC
RTS had put a NULL in argv[], and we happily didn't check for NULL and
ran strncmp() which segfaulted. And due to some signal handler on
SIGSEGV which may or may not handle that by ignoring it entirely (?),
nothing happened and all froze.

The direct fix for that is (2.) above, but in order to make this robust
we also have to do (1.). The reason is that if a user does this:

  +ACC abcd -ACC +ACC efgh -ACC

then our first run would have parsed "abcd" as an Accelerate flag and
rewritten the argument list to:

  -RTS -RTS -RTS +ACC efgh -ACC

which the GHC RTS would have rewritten to:

  +ACC efgh -ACC <NULL> -RTS -RTS

after which the second invocation of process_options() would parse a
+ACC block that we didn't parse before, resulting in non-idempotency. We
don't want that, so we make sure to parse _all_ +ACC blocks the first
time round.
@tomsmeding tomsmeding merged commit 334d055 into master Nov 10, 2023
45 of 66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants