MADNESS firing tasks into PaRSEC deadlocks when MAD_NUM_THREADS=1 #139
Comments
Does that happen with any example? Do you think it's a bug in MADNESS or PaRSEC? |
@devreal this probably happens with all examples, and the issue is how PaRSEC is used in MADNESS: if MAD_NUM_THREADS=1, PaRSEC is not supposed to use any threads to execute tasks; only main is supposed to execute tasks during fences. My guess is that there is currently no way to make the main thread part of the PaRSEC thread group, nor to make a non-PaRSEC main thread execute the current task pool. This really is a MADNESS issue, but I put it here to attract the eyes of @therault and @bosilca; besides, the PaRSEC backend in MADNESS was implemented in the TESSE project anyway. |
This used to be the same with TBB: we had to force MAD_NUM_THREADS >= 2. But then they added an API so that the main thread can execute tasks while waiting.
Is there a PaRSEC API that permits this?
It is appealing to have just a single thread (i.e., main) execute the program:
* a clear performance baseline
* makes some debugging easier
* many performance analysis tools cannot handle multiple threads
--
Robert J. Harrison
tel: 865-274-8544
|
We can certainly make it work with PaRSEC as well; I'm looking into it.
The current PaRSEC/MADNESS integration does not modify the MADNESS fence(). If we enter parsec_context_wait or another PaRSEC wait operation in the MADNESS fence(), the main thread will join the computation, as we do in TTG.
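To illustrate the idea of the main thread joining the computation inside a fence, here is a minimal sketch in plain C++. It does not use the MADNESS or PaRSEC APIs; `TaskPool`, `submit`, `fence`, and `run_single_threaded_demo` are hypothetical names chosen for illustration. It assumes a pool with zero worker threads (the MAD_NUM_THREADS=1 case), where a fence that only waited would never return because nothing else executes the queued tasks.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a task pool with zero worker threads
// (the MAD_NUM_THREADS=1 case). This is NOT the MADNESS or PaRSEC API.
class TaskPool {
public:
    // Enqueue a task; nothing runs until some thread calls fence().
    void submit(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push_back(std::move(task));
    }

    // The caller joins the computation: it runs queued tasks, including
    // tasks submitted by other tasks, until the queue is empty.
    void fence() {
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (queue_.empty()) return;
                task = std::move(queue_.front());
                queue_.pop_front();
            }
            task();  // run outside the lock so the task may call submit()
        }
    }

private:
    std::mutex mutex_;
    std::deque<std::function<void()>> queue_;
};

// Runs a small task graph on the calling thread only and returns the
// number of tasks that were executed.
int run_single_threaded_demo() {
    TaskPool pool;
    int executed = 0;
    pool.submit([&] {
        ++executed;
        pool.submit([&] { ++executed; });  // a task spawning a follow-up task
    });
    pool.submit([&] { ++executed; });
    pool.fence();  // main thread executes all three tasks here
    return executed;
}
```

Calling `run_single_threaded_demo()` from `main` executes every task on the calling thread. In this sketch, a fence that merely polled for an empty queue, without popping and running tasks itself, would spin forever.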
|
@robertjharrison yes, they have |
What progress is lacking in that case? Communication? |
If the main thread is joining in the computation while fencing, then I don't understand why it is hanging.
Does TTG+PaRSEC itself execute OK with just one thread (i.e., no threads in the task pool)?
|
It does. In the PaRSEC backend in TTG, we call |
Current PaRSEC implementation in MADNESS mimics the TBB implementation:
There are two ways to provide active participation of the main thread:
@evaleev In addition to calling parsec_context_wait(), we need to update runtime_nb_tasks() before and after calling it, and do some things with the taskpool, before we can call parsec_context_wait() again. The second option is probably cleaner. Does that happen? Where should I look? |
Calling parsec_context_wait works in all DSLs where all dependency tracking happens in PaRSEC, as there is no escape for the main thread from this blocking function until all known tasks in the context are completed. If MADNESS has its communication thread outside the main thread, and the communication thread continues to guarantee communication progress (i.e., it will trigger known but not-yet-ready tasks), blocking the main thread in parsec_context_wait should work. |
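The condition described above can be modeled with a small sketch in plain C++. This is not the PaRSEC API: `Context`, `add_known`, `make_ready`, `wait`, and `run_wait_demo` are hypothetical names, and the counter of "known" tasks is an assumption made for illustration. The main thread blocks in `wait()` and executes tasks itself, while a separate thread standing in for the communication thread makes known tasks ready.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical model, NOT the PaRSEC API. `known_` counts tasks the context
// has been told about; a task is runnable only after make_ready() is called.
class Context {
public:
    // Announce n tasks that will become ready later.
    void add_known(int n) {
        std::lock_guard<std::mutex> lock(m_);
        known_ += n;
    }

    // Called by the communication thread when a known task's inputs arrive.
    void make_ready(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            ready_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    // Blocks the caller until every known task has completed, executing
    // ready tasks on the calling thread in the meantime.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        while (known_ > 0) {
            cv_.wait(lock, [this] { return !ready_.empty(); });
            std::function<void()> task = std::move(ready_.front());
            ready_.pop_front();
            lock.unlock();
            task();  // run outside the lock
            lock.lock();
            --known_;
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> ready_;
    int known_ = 0;
};

// The main thread blocks in wait() while a separate thread delivers tasks.
// Returns the number of tasks executed by the waiting thread.
int run_wait_demo() {
    Context ctx;
    int executed = 0;  // modified only by the thread inside wait()
    const int n = 4;
    ctx.add_known(n);
    std::thread comm([&] {
        for (int i = 0; i < n; ++i) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            ctx.make_ready([&executed] { ++executed; });
        }
    });
    ctx.wait();  // returns only after all n known tasks have run
    comm.join();
    return executed;
}
```

In this model, if the delivering thread stopped making progress, `wait()` would block indefinitely. That matches the caveat above that the communication thread must keep triggering known but not-yet-ready tasks.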