
Store events in thread local buffer in simple format, convert to proto when flushing #3

Open
dsharlet opened this issue Nov 18, 2024 · 1 comment

Comments

@dsharlet
Owner

Instead of encoding a protobuf directly into the thread local buffer, we could just record a simple struct of events, and generate the protobuf when flushing to the file.

This would reduce overhead in the tracing functions, but would make flushes slower. There are pros and cons to this.
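A minimal sketch of the idea, assuming a per-thread `std::vector` buffer and hand-rolled protobuf wire-format encoding (no libprotobuf). The names `event`, `event_type`, `thread_buffer`, `record`, `flush`, `append_varint`, `append_key` and all field numbers are hypothetical; this is not pthread_trace's actual API or the real trace schema. The point is only where the work happens: `record()` appends a fixed-size struct, and all varint and length-delimited encoding moves into `flush()`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical event kinds; the real library's event set may differ.
enum class event_type : std::uint8_t { begin = 0, end = 1 };

// Fixed-size record written on the hot path, with no encoding work.
struct event {
  std::uint64_t timestamp_ns;  // clock reading taken by the caller
  event_type type;
  std::uint32_t name_id;       // hypothetical id for the traced operation
};

// Appends a base-128 varint (protobuf wire format).
void append_varint(std::string& out, std::uint64_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<char>((value & 0x7f) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<char>(value));
}

// Appends a field key: (field_number << 3) | wire_type.
void append_key(std::string& out, std::uint32_t field, std::uint32_t wire_type) {
  assert(field >= 1 && field < (1u << 29));  // valid protobuf field numbers
  append_varint(out, (std::uint64_t{field} << 3) | wire_type);
}

class thread_buffer {
 public:
  // Hot path: store the raw struct only.
  void record(event_type type, std::uint32_t name_id, std::uint64_t timestamp_ns) {
    events_.push_back({timestamp_ns, type, name_id});
  }

  // Cold path: encode every buffered event, then clear the buffer.
  // Field numbers here are placeholders, not a real schema.
  std::string flush() {
    std::string out;
    std::string msg;
    for (const event& e : events_) {
      msg.clear();
      append_key(msg, 1, 0);  // wire type 0: varint
      append_varint(msg, e.timestamp_ns);
      append_key(msg, 2, 0);
      append_varint(msg, static_cast<std::uint64_t>(e.type));
      append_key(msg, 3, 0);
      append_varint(msg, e.name_id);
      append_key(out, 1, 2);  // wire type 2: length-delimited submessage
      append_varint(out, msg.size());
      out += msg;
    }
    events_.clear();
    return out;
  }

 private:
  std::vector<event> events_;
};

// One buffer per thread, mirroring the thread-local design in the issue.
thread_local thread_buffer tls_buffer;
```

In this sketch each buffered event is a 16-byte struct on common 64-bit ABIs, while a begin event at timestamp 1000 encodes to 9 bytes, so the simple format can also cost more buffer space, in addition to the slower flush mentioned above.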

@dsharlet
Owner Author

I'm not sure this really makes sense. Here's a profile of a trivial loop of mutex lock/unlock on a single thread:

```
  56.53%  benchmark  [vdso]                [.] __vdso_clock_gettime
  19.92%  benchmark  libc.so.6             [.] __memmove_avx_unaligned_erms
   5.22%  benchmark  pthread_trace.so      [.] (anonymous namespace)::thread_state::write_end
   4.54%  benchmark  libstdc++.so.6.0.30   [.] std::chrono::_V2::system_clock::now
   2.80%  benchmark  pthread_trace.so      [.] (anonymous namespace)::thread_state::write_begin_with_delta<2ul, (
   2.17%  benchmark  pthread_trace.so      [.] (anonymous namespace)::thread_state::write_begin<(anonymous namesp
   1.96%  benchmark  ld-linux-x86-64.so.2  [.] __tls_get_addr
   1.41%  benchmark  libc.so.6             [.] pthread_mutex_lock@@GLIBC_2.2.5
   0.69%  benchmark  libc.so.6             [.] clock_gettime@@GLIBC_2.17
   0.67%  benchmark  libc.so.6             [.] pthread_mutex_unlock@@GLIBC_2.2.5
   0.67%  benchmark  pthread_trace.so      [.] pthread_mutex_unlock
   0.64%  benchmark  libstdc++.so.6.0.30   [.] 0x000000000009eb10
   0.58%  benchmark  pthread_trace.so      [.] pthread_mutex_lock
```
  • I don't think there's much room to improve reading the clock
  • The memcpy is mostly copying from the thread local buffer to the global circular buffer

So it seems like at most a ~30% improvement is on the table. That's probably not worth a lot of added complexity...
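One way to reproduce the ~30% figure (an assumption about how it was derived, not stated in the comment) is to treat the clock reads as a fixed cost and add up the samples in the library's own encoding functions plus the memmove into the global buffer:

$$
\underbrace{56.53\% + 4.54\% + 0.69\%}_{\text{clock reads}} = 61.76\%
\qquad
\underbrace{19.92\%}_{\texttt{memmove}} + \underbrace{5.22\% + 2.80\% + 2.17\%}_{\texttt{write\_end},\ \texttt{write\_begin*}} = 30.11\% \approx 30\%
$$

Even removing all of the encoding and copying cost would leave the clock reads, which dominate the profile.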
