v2 pkpool #20
base: v2
Note that in some benchmarks this can be an issue, since the cost of putting the data on a remote pool can be significant: it always guarantees a cache miss, whereas returning the packet to its own pool is very cheap.
That's a fair point. In our use case, however, there is only ever one thread (per device, I believe) that can return packets. Is it then better to cause a cache miss now, or to let a pool drain and guarantee a cache miss later?
Agreed, there is no right or wrong here. Things like this need to be tuned and measured against application behavior, so giving some flexibility for tuning is better. I would change 1024 to a const; if you set it to 0, the compiler will get rid of it.
Btw, note also that a cache miss on the progress thread trying to return to the original pool can be worse than one on the sender stealing. Firstly, it delays progress; secondly, assuming we have a large number of threads doing sends, the cost is amortized since other threads may be doing useful work. Again, this may not matter in your case right now, since you have only one sender thread.
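The const-threshold idea above could be sketched as follows. All names here (`pkpool_t`, `should_return_remote`, `PKPOOL_RETURN_THRESHOLD`) are hypothetical illustrations, not LCI's actual API; the point is that with the threshold set to 0 the condition becomes constant-false and the compiler eliminates the remote-return branch entirely:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: set to 0 to compile the remote-return path away. */
#define PKPOOL_RETURN_THRESHOLD 1024

/* Placeholder pool type for illustration only. */
typedef struct {
    size_t count; /* packets currently held by this pool */
    int owner;    /* core that owns this pool */
} pkpool_t;

/* Decide whether a freed packet goes back to its origin pool (a remote
 * return, i.e. a guaranteed cache miss) or stays in the local pool.
 * With PKPOOL_RETURN_THRESHOLD == 0 this is constant-false, so the
 * whole remote-return branch is dead code. */
static int should_return_remote(const pkpool_t *local, const pkpool_t *origin)
{
    return PKPOOL_RETURN_THRESHOLD != 0 &&
           local != origin &&
           local->count >= PKPOOL_RETURN_THRESHOLD;
}
```

The sender only pays the remote-return cache miss once its own pool has accumulated enough packets; below the threshold it keeps packets locally, which is the cheap case discussed above.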
Right, I see your point.
The branch of PaRSEC I'm currently on is old and funnels all sends through the communication thread, but more recent versions can send from other threads as well, so this will become more relevant once I rebase.
I've remembered that, actually, the communication and progress threads are supposed to run on the same core (they're bound). This means that they should end up using the same pool. In PaRSEC/LCI, the user's main thread is actually the one to initialize LCI and thus fill the initial pool; the communication and progress threads are started later (and presumably run on a different core) and therefore start with an empty pool.
I'll do some more tests that discard the changes for medium and long sends but keep the remaining changes (most notably those in `server_psm2.h`).

Long-term, I'll probably introduce some compile-time parameters for a) the limit for medium-send packet return and b) whether to do long-send packet return.
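The two compile-time parameters mentioned above might look something like this. The macro names and defaults are purely illustrative guesses, not LCI's actual configuration:

```c
/* Hypothetical compile-time knobs sketched from this comment.
 * Names and defaults are illustrative, not LCI's real configuration. */

/* a) Cap on how many medium-send packets are returned to the origin pool. */
#ifndef LC_PK_MED_RETURN_LIMIT
#define LC_PK_MED_RETURN_LIMIT 1024
#endif

/* b) Whether long-send packets are returned to the origin pool at all. */
#ifndef LC_ENABLE_LONG_RETURN
#define LC_ENABLE_LONG_RETURN 0
#endif
```

Guarding with `#ifndef` lets each build override the defaults from the compiler command line (e.g. `-DLC_PK_MED_RETURN_LIMIT=0`), which fits the earlier point that a value of 0 lets the compiler remove the return path.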
It would be strange if they are using the same pool; please verify.
I'll verify tomorrow, but my intuition is that my statement above is correct. Both threads should be bound to the same core, so `sched_getcpu` should return the same value, leading to the same pool being used. This is PaRSEC-specific though, and in the future, if/when we enable using more than one device, we'll need better logic for thread placement (in PaRSEC).
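The core-indexed pool lookup being discussed could be sketched roughly as below. This is Linux-specific (`sched_getcpu` requires `_GNU_SOURCE`), and the `pools` array and `my_pool` helper are hypothetical names, not LCI's internals; the point is that two threads bound to the same core resolve to the same pool slot:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>   /* sched_getcpu(), glibc/Linux-specific */
#include <stddef.h>

#define MAX_CORES 256

typedef struct { size_t count; } pkpool_t; /* placeholder pool type */

static pkpool_t pools[MAX_CORES]; /* one pool per core */

/* Map the calling thread to a pool via the core it is running on.
 * Two threads bound to the same core (e.g. PaRSEC's communication and
 * progress threads) get the same sched_getcpu() value and therefore
 * the same pool. */
static pkpool_t *my_pool(void)
{
    int cpu = sched_getcpu();
    if (cpu < 0 || cpu >= MAX_CORES)
        cpu = 0; /* fall back to pool 0 on error or out-of-range CPU */
    return &pools[cpu];
}
```

Note that if a thread is *not* bound, `sched_getcpu` can return different values across calls as the scheduler migrates it, which is exactly why the binding question in the next comment matters.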
Depending on the activation mode, PaRSEC's communication thread might not be bound to a core, but instead be allowed to move around freely. You can be more restrictive and force a binding for the communication thread, in which case the resource will be assumed reserved and no computation thread will be allowed to use it.
I think the default (which I've been testing with) is to have it bound. When we have a single progress thread, it makes sense to bind it to the communication thread's core, but if we have more (multiple NICs/hardware queues/"LCI devices"), the decision is less obvious. At that point, we'll want to introduce an MCA parameter to control the binding.