Minimize tracing-tracy allocations #86
Conversation
…locals: #86 Signed-off-by: Alex Saveau <[email protected]>
Signed-off-by: Alex Saveau <[email protected]>
Benchmarking code:

```
use std::cell::{Cell, RefCell, UnsafeCell};
use std::env::args_os;
use std::ffi::OsStr;

fn a() {
    thread_local! {
        static TLS: RefCell<Vec<usize>> = const { RefCell::new(Vec::new()) };
    }
    for i in 0..10000 {
        TLS.with(|s| {
            s.borrow_mut().push(i);
        });
    }
    TLS.with(|s| s.borrow_mut().clear());
}

fn b() {
    thread_local! {
        static TLS: RefCell<Vec<usize>> = const { RefCell::new(Vec::new()) };
    }
    for i in 0..10000 {
        TLS.with(|s| {
            unsafe { s.try_borrow_mut().unwrap_unchecked() }.push(i);
        });
    }
    TLS.with(|s| unsafe { s.try_borrow_mut().unwrap_unchecked() }.clear());
}

fn c() {
    thread_local! {
        static TLS: Cell<Vec<usize>> = const { Cell::new(Vec::new()) };
    }
    for i in 0..10000 {
        TLS.with(|s| {
            let mut v = s.take();
            v.push(i);
            s.set(v);
        });
    }
    TLS.with(|s| {
        let mut v = s.take();
        v.clear();
        s.set(v);
    });
}

fn d() {
    thread_local! {
        static TLS: UnsafeCell<Vec<usize>> = const { UnsafeCell::new(Vec::new()) };
    }
    for i in 0..10000 {
        TLS.with(|s| {
            unsafe { &mut *s.get() }.push(i);
        });
    }
    TLS.with(|s| unsafe { &mut *s.get() }.clear());
}

fn e() {
    thread_local! {
        static TLS: UnsafeCell<Vec<usize>> = UnsafeCell::new(Vec::new());
    }
    for i in 0..10000 {
        TLS.with(|s| {
            unsafe { &mut *s.get() }.push(i);
        });
    }
    TLS.with(|s| unsafe { &mut *s.get() }.clear());
}

fn f() {
    thread_local! {
        static TLS: RefCell<Vec<usize>> = RefCell::new(Vec::new());
    }
    for i in 0..10000 {
        TLS.with(|s| {
            s.borrow_mut().push(i);
        });
    }
    TLS.with(|s| s.borrow_mut().clear());
}

fn main() {
    let cmd = args_os().nth(1).unwrap();
    for _ in 0..1000 {
        if cmd == OsStr::new("refcell") {
            a();
        } else if cmd == OsStr::new("refcell_unsafe") {
            b();
        } else if cmd == OsStr::new("cell") {
            c();
        } else if cmd == OsStr::new("unsafecell") {
            d();
        } else if cmd == OsStr::new("unsafecell_non_const") {
            e();
        } else if cmd == OsStr::new("refcell_non_const") {
            f();
        } else {
            panic!();
        }
    }
}
```

Benchmarks:

```
> hyperfine -N "./foo refcell" "./foo refcell_unsafe" "./foo cell" "./foo unsafecell" "./foo unsafecell_non_const" "./foo refcell_non_const"
Benchmark 1: ./foo refcell
  Time (mean ± σ):      17.2 ms ±  0.6 ms    [User: 16.8 ms, System: 0.3 ms]
  Range (min … max):    15.0 ms … 21.3 ms    143 runs

Benchmark 2: ./foo refcell_unsafe
  Time (mean ± σ):      18.1 ms ±  0.3 ms    [User: 17.7 ms, System: 0.3 ms]
  Range (min … max):    16.6 ms … 21.0 ms    167 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 3: ./foo cell
  Time (mean ± σ):      36.0 ms ±  0.9 ms    [User: 35.9 ms, System: 0.1 ms]
  Range (min … max):    34.3 ms … 42.7 ms    81 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 4: ./foo unsafecell
  Time (mean ± σ):      15.2 ms ±  1.1 ms    [User: 14.7 ms, System: 0.3 ms]
  Range (min … max):    14.2 ms … 19.2 ms    210 runs

Benchmark 5: ./foo unsafecell_non_const
  Time (mean ± σ):      15.5 ms ±  0.3 ms    [User: 15.0 ms, System: 0.4 ms]
  Range (min … max):    15.0 ms … 16.3 ms    193 runs

Benchmark 6: ./foo refcell_non_const
  Time (mean ± σ):      25.9 ms ±  5.3 ms    [User: 25.5 ms, System: 0.3 ms]
  Range (min … max):    17.9 ms … 42.2 ms    146 runs

Summary
  './foo unsafecell' ran
    1.02 ± 0.08 times faster than './foo unsafecell_non_const'
    1.13 ± 0.09 times faster than './foo refcell'
    1.19 ± 0.09 times faster than './foo refcell_unsafe'
    1.71 ± 0.37 times faster than './foo refcell_non_const'
    2.37 ± 0.18 times faster than './foo cell'
```

Signed-off-by: Alex Saveau <[email protected]>
…there—go back to TLS, but use the new VecCell abstraction Signed-off-by: Alex Saveau <[email protected]>
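The `VecCell` abstraction mentioned in the commit message is not shown in this excerpt, but based on the benchmarks above (the `UnsafeCell` variants were fastest), a minimal sketch of such a single-threaded push-only buffer might look like the following. The name `VecCell` comes from the commit; the methods and safety argument here are illustrative assumptions, not the PR's actual implementation:

```rust
use std::cell::UnsafeCell;
use std::mem;

/// Hypothetical sketch of a `VecCell`: a growable buffer backed by
/// `UnsafeCell`, matching the fastest variant in the benchmarks above.
struct VecCell<T> {
    inner: UnsafeCell<Vec<T>>,
}

impl<T> VecCell<T> {
    // `const` so it can be used with `thread_local! { ... = const { ... } }`,
    // which the benchmarks show avoids the lazy-init overhead.
    const fn new() -> Self {
        Self { inner: UnsafeCell::new(Vec::new()) }
    }

    fn push(&self, value: T) {
        // SAFETY: `UnsafeCell` makes this type `!Sync`, and no reference to
        // the inner Vec escapes this method, so the exclusive borrow created
        // here cannot alias another.
        unsafe { (*self.inner.get()).push(value) }
    }

    fn take(&self) -> Vec<T> {
        // SAFETY: same reasoning as `push`; leaves an empty Vec behind.
        unsafe { mem::take(&mut *self.inner.get()) }
    }
}

fn main() {
    thread_local! {
        static TLS: VecCell<usize> = const { VecCell::new() };
    }
    for i in 0..4 {
        TLS.with(|s| s.push(i));
    }
    let drained = TLS.with(|s| s.take());
    assert_eq!(drained, vec![0, 1, 2, 3]);
}
```

Unlike `RefCell`, this skips the runtime borrow-flag bookkeeping entirely, which is where the benchmark gap comes from.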
Ok, this should be ready for review. I've updated the PR description with results.
```diff
 fn record_bool(&mut self, field: &Field, value: bool) {
     match (value, field.name()) {
-        (true, "tracy.frame_mark") => self.frame_mark = true,
+        (_, "tracy.frame_mark") => self.frame_mark = value,
         _ => self.record_debug(field, &value),
```
I changed this to set the value... was the previous behavior intentional?
In principle a field name for an event can occur only once. I don't particularly care either way, but `tracy.frame_mark = false` does not really have any reason to show up in the user code… so it seemed to make sense to not invoke any special logic on it? Unless I guess they want to make it dynamic somehow? 🤷
> Unless I guess they want to make it dynamic somehow?
Yeah that's all I could think of. It just felt wrong to me to take in a bool and ignore it. I could see it as being confusing behavior to track down if you're expecting false to work for whatever reason.
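The behavioral difference being discussed can be reduced to a small sketch. The visitor below is a simplified stand-in (field names passed as `&str` rather than tracing's `Field` type), not the crate's actual visitor:

```rust
// Simplified stand-in for the tracing-tracy field visitor.
struct Visitor {
    frame_mark: bool,
}

impl Visitor {
    // Old behavior: only a literal `true` sets the flag;
    // `tracy.frame_mark = false` is silently ignored.
    fn record_bool_old(&mut self, name: &str, value: bool) {
        if value && name == "tracy.frame_mark" {
            self.frame_mark = true;
        }
    }

    // New behavior: the flag always tracks the user-supplied value,
    // so `false` is honored rather than dropped.
    fn record_bool_new(&mut self, name: &str, value: bool) {
        if name == "tracy.frame_mark" {
            self.frame_mark = value;
        }
    }
}

fn main() {
    let mut old = Visitor { frame_mark: true };
    old.record_bool_old("tracy.frame_mark", false);
    assert!(old.frame_mark); // the explicit `false` was ignored

    let mut new = Visitor { frame_mark: true };
    new.record_bool_new("tracy.frame_mark", false);
    assert!(!new.frame_mark); // the explicit `false` is respected
}
```

In practice the two only diverge when user code emits `tracy.frame_mark = false`, which, as noted above, has no obvious use case; the change just makes the ignored-input case impossible.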
In principle LGTM, last few nits and comments inline. Sorry for not spotting them last time around.
Depends on #85. Will rebase once that's in.

Done.

Note that the two commits should NOT be squashed, as the second one contains benchmarking code.

Turns out GitHub saves the PR history even when squashing, so never mind, since I feel like these commits are messier.
The one major downside of this caching is that for long-running applications which occasionally have super long field values, we have no way of reclaiming that memory. I think that's ok until someone actually runs into this issue and can tell us more about their needs. For example, maybe we offer a "clear cache" API that they could run? Or maybe they need to configure the max cache size? Not sure, so it's probably easiest to wait unless you think this needs to be solved (in which case I'd rather do that in a follow-up PR).

TL;DR: the memory usage currently scales as O(num_active_spans × max_field_value_length).
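One possible shape for the "reclaim memory" follow-up discussed above is a capacity cap: keep reusing the cached buffer, but drop it back to empty whenever a single oversized value has inflated it. This is a hedged sketch only; `FieldBuf`, `MAX_CACHED_CAPACITY`, and the cap value are all hypothetical names and numbers, not part of the PR:

```rust
// Illustrative cap; a real implementation might make this configurable.
const MAX_CACHED_CAPACITY: usize = 4096;

// Hypothetical per-span cached buffer for formatted field values.
struct FieldBuf {
    buf: String,
}

impl FieldBuf {
    fn record(&mut self, value: &str) {
        // Reuse the allocation across events instead of reallocating.
        self.buf.clear();
        self.buf.push_str(value);
    }

    fn release(&mut self) {
        // Reclaim memory after an occasional super-long value, so capacity
        // does not stay pinned at the largest value ever seen.
        if self.buf.capacity() > MAX_CACHED_CAPACITY {
            self.buf = String::new();
        }
    }
}

fn main() {
    let mut f = FieldBuf { buf: String::new() };
    f.record(&"x".repeat(10_000));
    assert!(f.buf.capacity() >= 10_000);
    f.release();
    assert_eq!(f.buf.capacity(), 0); // oversized allocation was dropped
}
```

With a cap like this, memory usage would scale as O(num_active_spans × MAX_CACHED_CAPACITY) in the worst case rather than with the longest field value ever recorded.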
The results are in: almost double the performance!
Before: *(benchmark screenshot)*

After: *(benchmark screenshot)*