-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add thread manager crate to agave #3890
Conversation
da038b3
to
309350b
Compare
@@ -48,11 +48,12 @@ fn main() -> anyhow::Result<()> { | |||
println!("Running {exp}"); | |||
let mut conffile = PathBuf::from(env!("CARGO_MANIFEST_DIR")); | |||
conffile.push(exp); | |||
let conffile = std::fs::File::open(conffile)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically, we don't specify configurations in files. Doing it in code instead for the example/test cases, or with command line flags for other cases. Sometimes use yml (for accounts information for example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got you, you want to read thread-manager/examples/core_contention_contending_set.toml
with it.
Yeah, could you maybe have a method that reads this file and used in both example and production environment in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file will never be read in prod, its just for examples which are also benchmarks/tests. I'm not even sure we will use toml in prod for this yet, though it is likely.
I think you need to explain broader audience proposed changes in the context. Maybe by setting up a meeting with involved parties. |
Yes, a meeting would be necessary eventually, but I also need the code to be reasonably good before a million people look at it=) |
2c6a9dd
to
4e1d728
Compare
c93751c
to
60a04f8
Compare
895fd15
to
e00029a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the high level idea. Left a few nits and some questions.
I need to better understand how you envision the string-->string-->runtime mapping working before I can weigh in on it
## Scheduling policy and priority | ||
If you want you can set thread scheduling policy and priority. Keep in mind that this will likely require | ||
```bash | ||
sudo setcap cap_sys_nice+ep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the past we've had tuning scripts that could handle recommended configurations like this. If it makes sense, we could add this wherever relevant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about priority stuff at this point, it is super platform specific and mostly a placeholder at the moment, as getting this past CI will be unfun (due to extra priveleges necessary)
thread-manager/src/lib.rs
Outdated
#[derive(Default, Debug)] | ||
pub struct ThreadManagerInner { | ||
pub tokio_runtimes: HashMap<ConstString, TokioRuntime>, | ||
pub tokio_runtime_mapping: HashMap<ConstString, ConstString>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the idea here that we essentially have a mapping like so:
service --> tokio name --> tokio runtime
And service --> tokio name could many to 1 but the tokio name --> tokio runtime is 1 to 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, exactly! I believe we will eventually have maybe 1-2 tokio runtimes + 2-3 rayons. and mappings to those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool! I think it makes sense. Maybe we can get rid of the intermediate mapping and go just from service to runtime (I believe you mentioned this elsewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, do you want the intermediate mapping removed now or can we keep it in?
Adding @alessandrod as I'm sure he'll have opinions on this 😄 |
From this idea we get to solGossipConsume -> RayonSigVerify -> ThreadPool sort of mapping. At least that was the idea. We can dispense with the intermediate layer but then each pool would necessarily run on its own set of threads which may not be ideal. We can, of course, axe this functionality and bring it back later as appropriate. |
4e33c06
to
bb667f3
Compare
Co-authored-by: Andrew Fitzgerald <[email protected]>
Co-authored-by: kirill lykov <[email protected]>
bb667f3
to
b569477
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I expect that when this crate will be adapted there will be some other changes/improvements. And also I would like to see better code documentation (in follow ups):
- on each module level describe what is this module for, etc
- on each trait/struct the same
- each method
Finally check that the generated documentation looks decent.
I know that most of the crates don't follow this practice but I think for the new crates we can make it.
what measurements? why do we have a userspace scheduler? |
Problem
There exist a variety of perf issues related to unorganized thread pools that spawn far more threads than are useful on a given machine, this was confirmed by measurements (see measurement examples in thread-manager/examples directory) and is relatively easy to fix in agave.
Idea is to eventually address the needs of
Summary of Changes
Added new agave-thread-manager crate, which is to be gradually hooked into the agave itself. It will be used to centralize thread pool creation such that their core allocations can be controlled, and total thread count can gradually be reduced to match core count as closely as possible. Crate comes with its own benchmarks and some tests. The crate has full functionality only on Linux as this is the only platform where it is relevant.