Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimize tracing-tracy allocations #86

Merged
merged 11 commits into from
Jan 1, 2024
Merged

Minimize tracing-tracy allocations #86

merged 11 commits into from
Jan 1, 2024

Conversation

SUPERCILEX
Copy link
Contributor

@SUPERCILEX SUPERCILEX commented Dec 26, 2023

Depends on #85. Will rebase once that's in. Done.


Note that the two commits should NOT be squashed as the second one contains benchmarking code.

Turns out GitHub saves the PR history even when squashing, so nevermind since I feel like these commits are messier.


The one major downside of this caching is that for long-running applications which occasionally have super long field values, we have no way of reclaiming that memory. I think that's ok until someone actually runs into this issue and can tell us more about their needs. For example, maybe we offer a "clear cache" API that they could run? Or maybe they need to configure the max cache size? Not sure, so probs easiest to wait unless you think this needs to be solved (in which case I'd rather do that in a follow up PR). TL;DR: the memory usage currently scales as O(num_active_spans*max_field_value_length).


The results are in: almost double the performance!

Note that I'm testing this with #87 as I'll deal with the format! allocation once we have a path forward there.

~> tracy-capture -o ~/Desktop/ftzz4.tracy -f
Connecting to 127.0.0.1:8086...
Queue delay: 19 ns
Timer resolution: 19 ns
 471.09 Mbps / 18.8% =2501.45 Mbps | Tx: 563 MB | 1020.05 MB | 10.21 s
Frames: 2
Time span: 10.25 s
Zones: 4,033,823
Elapsed time: 9.93 s
Saving trace... done!
Trace size 282.4 MB (37.28% ratio)
~> tracy-capture -o ~/Desktop/ftzz5.tracy -f
Connecting to 127.0.0.1:8086...
Queue delay: 21 ns
Timer resolution: 21 ns
 498.87 Mbps / 18.2% =2744.67 Mbps | Tx: 366.32 MB | 722.33 MB | 6.67 s
Frames: 2
Time span: 6.69 s
Zones: 4,033,813
Elapsed time: 6.41 s
Saving trace... done!
Trace size 193.94 MB (34.32% ratio)

Before:
image

After:
image

Benchmarking code:

```
use std::cell::{Cell, RefCell, UnsafeCell};
use std::env::args_os;
use std::ffi::OsStr;

fn a() {
    thread_local! {
        static TLS: RefCell<Vec<usize>> = const { RefCell::new(Vec::new()) };
    }

    for i in 0..10000 {
        TLS.with(|s| {
            s.borrow_mut().push(i);
        });
    }
    TLS.with(|s| s.borrow_mut().clear());
}

fn b() {
    thread_local! {
        static TLS: RefCell<Vec<usize>> = const { RefCell::new(Vec::new()) };
    }

    for i in 0..10000 {
        TLS.with(|s| {
            unsafe { s.try_borrow_mut().unwrap_unchecked() }.push(i);
        });
    }
    TLS.with(|s| unsafe { s.try_borrow_mut().unwrap_unchecked() }.clear());
}

fn c() {
    thread_local! {
        static TLS: Cell<Vec<usize>> = const { Cell::new(Vec::new()) };
    }

    for i in 0..10000 {
        TLS.with(|s| {
            let mut v = s.take();
            v.push(i);
            s.set(v);
        });
    }
    TLS.with(|s| {
        let mut v = s.take();
        v.clear();
        s.set(v);
    });
}

fn d() {
    thread_local! {
        static TLS: UnsafeCell<Vec<usize>> = const { UnsafeCell::new(Vec::new()) };
    }

    for i in 0..10000 {
        TLS.with(|s| {
            unsafe { &mut *s.get() }.push(i);
        });
    }
    TLS.with(|s| unsafe { &mut *s.get() }.clear());
}

fn e() {
    thread_local! {
        static TLS: UnsafeCell<Vec<usize>> = UnsafeCell::new(Vec::new());
    }

    for i in 0..10000 {
        TLS.with(|s| {
            unsafe { &mut *s.get() }.push(i);
        });
    }
    TLS.with(|s| unsafe { &mut *s.get() }.clear());
}

fn f() {
    thread_local! {
        static TLS: RefCell<Vec<usize>> = RefCell::new(Vec::new());
    }

    for i in 0..10000 {
        TLS.with(|s| {
            s.borrow_mut().push(i);
        });
    }
    TLS.with(|s| s.borrow_mut().clear());
}

fn main() {
    let cmd = args_os().nth(1).unwrap();
    for _ in 0..1000 {
        if cmd == OsStr::new("refcell") {
            a();
        } else if cmd == OsStr::new("refcell_unsafe") {
            b();
        } else if cmd == OsStr::new("cell") {
            c();
        } else if cmd == OsStr::new("unsafecell") {
            d();
        } else if cmd == OsStr::new("unsafecell_non_const") {
            e();
        } else if cmd == OsStr::new("refcell_non_const") {
            f();
        } else {
            panic!();
        }
    }
}
```

Benchmarks:

```
> hyperfine -N "./foo refcell" "./foo refcell_unsafe" "./foo cell" "./foo unsafecell" "./foo unsafecell_non_const" "./foo refcell_non_const"
Benchmark 1: ./foo refcell
  Time (mean ± σ):      17.2 ms ±   0.6 ms    [User: 16.8 ms, System: 0.3 ms]
  Range (min … max):    15.0 ms …  21.3 ms    143 runs

Benchmark 2: ./foo refcell_unsafe
  Time (mean ± σ):      18.1 ms ±   0.3 ms    [User: 17.7 ms, System: 0.3 ms]
  Range (min … max):    16.6 ms …  21.0 ms    167 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 3: ./foo cell
  Time (mean ± σ):      36.0 ms ±   0.9 ms    [User: 35.9 ms, System: 0.1 ms]
  Range (min … max):    34.3 ms …  42.7 ms    81 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 4: ./foo unsafecell
  Time (mean ± σ):      15.2 ms ±   1.1 ms    [User: 14.7 ms, System: 0.3 ms]
  Range (min … max):    14.2 ms …  19.2 ms    210 runs

Benchmark 5: ./foo unsafecell_non_const
  Time (mean ± σ):      15.5 ms ±   0.3 ms    [User: 15.0 ms, System: 0.4 ms]
  Range (min … max):    15.0 ms …  16.3 ms    193 runs

Benchmark 6: ./foo refcell_non_const
  Time (mean ± σ):      25.9 ms ±   5.3 ms    [User: 25.5 ms, System: 0.3 ms]
  Range (min … max):    17.9 ms …  42.2 ms    146 runs

Summary
  './foo unsafecell' ran
    1.02 ± 0.08 times faster than './foo unsafecell_non_const'
    1.13 ± 0.09 times faster than './foo refcell'
    1.19 ± 0.09 times faster than './foo refcell_unsafe'
    1.71 ± 0.37 times faster than './foo refcell_non_const'
    2.37 ± 0.18 times faster than './foo cell'
```

Signed-off-by: Alex Saveau <[email protected]>
…there—go back to TLS, but use the new VecCell abstraction

Signed-off-by: Alex Saveau <[email protected]>
Signed-off-by: Alex Saveau <[email protected]>
@SUPERCILEX SUPERCILEX changed the title Minimize TracyEventFieldVisitor allocations Minimize tracing-tracy allocations Dec 27, 2023
@SUPERCILEX
Copy link
Contributor Author

Ok, this should be ready for review. I've updated the PR description with results.

tracing-tracy/src/lib.rs Outdated Show resolved Hide resolved
tracing-tracy/src/lib.rs Outdated Show resolved Hide resolved
tracing-tracy/src/lib.rs Outdated Show resolved Hide resolved
tracing-tracy/src/lib.rs Outdated Show resolved Hide resolved
tracing-tracy/src/lib.rs Outdated Show resolved Hide resolved
fn record_bool(&mut self, field: &Field, value: bool) {
match (value, field.name()) {
(true, "tracy.frame_mark") => self.frame_mark = true,
_ => self.record_debug(field, &value),
(_, "tracy.frame_mark") => self.frame_mark = value,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to set the value... was the previous behavior intentional?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle a field name for an event can occur only once. I don’t particularly care either way, but tracy.frame_mark = false does not really have any reason to show up in the user code… so it seemed to make sense to not invoke any special logic on it? Unless I guess they want to make it dynamic somehow? 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I guess they want to make it dynamic somehow?

Yeah that's all I could think of. It just felt wrong to me to take in a bool and ignore it. I could see it as being confusing behavior to track down if you're expecting false to work for whatever reason.

tracing-tracy/src/lib.rs Outdated Show resolved Hide resolved
tracing-tracy/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Owner

@nagisa nagisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle LGTM, last few nits and comments inline. Sorry for not spotting them last time around.

Signed-off-by: Alex Saveau <[email protected]>
@nagisa nagisa merged commit 0684979 into nagisa:main Jan 1, 2024
36 checks passed
@SUPERCILEX SUPERCILEX deleted the alloc branch January 1, 2024 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants