Skip to content
This repository has been archived by the owner on Feb 4, 2023. It is now read-only.

Stacktrace-based tool issues #8

Open
joelagnel opened this issue Jan 10, 2018 · 8 comments
Open

Stacktrace-based tool issues #8

joelagnel opened this issue Jan 10, 2018 · 8 comments

Comments

@joelagnel
Copy link
Owner

joelagnel commented Jan 10, 2018

  • offcputime on arm64 shows “unknown” in the stack trace entries

FIXED:

  • tools like stackcount that output tonnes of stack traces are still slow – possible fix is to dump the entire map remotely and send it at once instead of individually sending each key/value as a separate command.
@joelagnel
Copy link
Owner Author

Second bullet implemented. Perf faster but not good enough.

@joelagnel
Copy link
Owner Author

joelagnel commented Jan 13, 2018

Found out that hash table dumps for StackTrace maps don't work as expected. Tried to force a dump with StackTrace maps, but turns out the bpf_get_first_key on a StackTrace map doesn't return anything even though there's stuff in the map, unlike regular maps. Kernel bug?

Ideally the following BCC patch should work if the kernel was giving us what we wanted:

diff --git a/src/python/bcc/remote/libremote.py b/src/python/bcc/remote/libremote.py
index 3e7a2e6..f565a94 100644
--- a/src/python/bcc/remote/libremote.py
+++ b/src/python/bcc/remote/libremote.py
@@ -33,6 +33,9 @@ class LibRemote(object):
         # Cache of maps, format: map_cache[map_fd] = { 'key1': 'value1', .. }
         self.nkey_cache = {}
         self.map_cache = {}
+        # Has the map been fully dumped once before the last delete/clear ?
+        # format: { map_fd: [True|False] }
+        self.map_dumped = {}
 
         # Create the remote connection
         self.remote = cls(remote_arg)
@@ -60,6 +63,11 @@ class LibRemote(object):
 
         return (int(m.group(1)), ret)
 
+    def _invalidate_map_cache(self, map_fd):
+        self.map_cache[map_fd] = {}
+        self.nkey_cache[map_fd] = {}
+        self.map_dumped[map_fd] = {}
+
     def available_filter_functions(self, tracefs):
         cmd = "GET_AVAIL_FILTER_FUNCS {}".format(tracefs)
         ret = self._remote_send_command(cmd)
@@ -131,6 +139,12 @@ class LibRemote(object):
             if kstr in self.map_cache[map_fd]:
                 return (0, [self.map_cache[map_fd][kstr]])
 
+        # Some maps like StackTrace may not trigger a get_first_key before lookup
+        # since the keys can be obtained through other maps (like counts in offcputime)
+        # Force a get_first_key so that the entire map is cached.
+        if map_fd not in self.map_dumped or self.map_dumped[map_fd] == False:
+            self.bpf_get_first_key(map_fd, klen, llen, dump_all=True)
+
         cmd = "BPF_LOOKUP_ELEM {} {} {} {}".format(map_fd, kstr, klen, llen)
         ret = self._remote_send_command(cmd)
         return ret
@@ -140,10 +154,11 @@ class LibRemote(object):
         ret = self._remote_send_command(cmd)
         return ret[0]
 
-    def bpf_get_first_key(self, map_fd, klen, vlen):
-        cmd = "BPF_GET_FIRST_KEY {} {} {}".format(map_fd, klen, vlen)
+    def bpf_get_first_key(self, map_fd, klen, vlen, dump_all=True):
+        cmd = "BPF_GET_FIRST_KEY {} {} {} {}".format(map_fd, klen, vlen, 1 if dump_all else 0)
         ret = self._remote_send_command(cmd)
-        if ret[0] < 0:
+
+        if not dump_all or ret[0] < 0:
             return ret
 
         # bpfd will dump the entire map on first get key so it can be
@@ -163,6 +178,8 @@ class LibRemote(object):
                 self.nkey_cache[map_fd][prev_key] = key
             prev_key = key
 
+        self.map_dumped[map_fd] = True
+
         return (0, [first_key])
 
     def bpf_get_next_key(self, map_fd, kstr, klen):
@@ -177,16 +194,13 @@ class LibRemote(object):
     def bpf_delete_elem(self, map_fd, kstr, klen):
         cmd = "BPF_DELETE_ELEM {} {} {}".format(map_fd, kstr, klen)
         ret = self._remote_send_command(cmd)
-        # Invalidate cache for this map on any element delete
-        self.map_cache[map_fd] = {}
-        self.nkey_cache[map_fd] = {}
+        _invalidate_map_cache(map_fd)
         return ret[0]
 
     def bpf_clear_map(self, map_fd, klen):
         cmd = "BPF_CLEAR_MAP {} {}".format(map_fd, klen)
         ret = self._remote_send_command(cmd)
-        self.map_cache[map_fd] = {}
-        self.nkey_cache[map_fd] = {}
+        _invalidate_map_cache(map_fd)
         return ret[0]
 
     def perf_reader_poll(self, fd_callbacks, timeout):

Another fix could be, if all keys could be provided to BPFd in the same invocation, however that would need modification to individual tools.

@joelagnel joelagnel changed the title stacktrace-based tool issues Stacktrace-based tool issues Jan 13, 2018
@joelagnel
Copy link
Owner Author

Kernel needs key iteration ability in stackmap for above patch to work. Will work on that soon.

@joelagnel
Copy link
Owner Author

Turns out bpf_get_next_key for stackmap needed implementation in the kernel, I did that and checked in the patch to patches/kernel/ in this project.
Also updated corresponding BCC code to use this feature. Speed up is 10x or more with this!

Next... to figure out why we see "[unknown]" in stacktraces on ARM64. Symbol look ups has my attention!

@joelagnel
Copy link
Owner Author

TODO: Update INSTALL.md pointing to mandatory kernel patch

@joelagnel
Copy link
Owner Author

Symbol look ups when walking stack traces is completely broken for "remote" usecases.
This will (unfortunately) require replicating the following on the remote side:
https://github.com/iovisor/bcc/blob/master/src/cc/bcc_syms.cc

@jcanseco
Copy link
Contributor

Update: kernel and userspace symbol lookups have been implemented (see #25 and #26).

@jcanseco
Copy link
Contributor

We may be able to close this issue now. Stacktrace-based tools are now working with the latest symbol lookup patches (in addition to your stackmap changes). If there are other issues, I feel that they should be considered as new GitHub issues instead.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants