Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the raw_lines field in stackprof for more precise lines #483

Closed
wants to merge 2 commits into from

Conversation

dalehamel
Copy link

@dalehamel dalehamel commented Oct 11, 2024

Since tmm1/stackprof#213 the actual source line within the method, not just the method start line, should be available.

From the existing simple.rb:

def e
sleep 0.05
end

On main, it just gives the method start line:

Screenshot 2024-10-10 at 10 34 19 PM

With this patch, we get the actual line number of the call:

Screenshot 2024-10-10 at 10 33 00 PM

I've added some rhudimentary tests, to cover both older stackprof files that are missing raw_lines, and newer ones that have it. I basically just walked "children[0]" manually until i found a node with a line. Since the file is checked in, i think this is fine.

EDIT: addressing frame collisions

As called out in #483 (comment), my first commit could lead to collisions due to frames that were not identical using the same key. This is addressed in 136dd99

Looking at the following code:

def a
for i in 0..100 do
b
c
e
end
end

On main, we only see the frame info showing a calling from line 4:

Screenshot 2024-10-11 at 9 53 19 AM

But now, we have two different frames for a, one with line 5, and one with line 8:

Screenshot 2024-10-11 at 9 52 27 AM

In a more pathological example, I modified simple.rb to insert a sleep on line 16 to show how previously we would in fact lose information. Here is a graph on main:

Screenshot 2024-10-11 at 10 02 23 AM

And the other tree for the code path where it doesn't sleep:

Screenshot 2024-10-11 at 10 03 25 AM

Here it is with the patch:

Screenshot 2024-10-11 at 10 00 35 AM

And the other tree for the code path where it doesn't sleep:

Screenshot 2024-10-11 at 10 03 42 AM

We can see that with the patch, it correctly shows the call site of "sleep" and is able to distinguish these code paths, one for line 16 where sleep is actually called, and one for line 13 for the other code path and indeed another on line 14 for the other instance of b. Previously these would all have had the same value. The graph has more nodes, and we are able to have different instances of the "b" frame.

Since tmm1/stackprof#213 the actual source line within
the method, not just the method start line, should be available.
@jlfwong
Copy link
Owner

jlfwong commented Oct 11, 2024

Hi @dalehamel! Thanks for the patch — happy to merge this after tests are fixed

It looks like the new test file wasn't checked in:

Summary of all failing tests
FAIL src/import/stackprof.test.ts
  ● reads correct line numbers › uses the most precise iseq line if raw_lines is available

    ENOENT: no such file or directory, open './sample/profiles/stackprof/wall_with_raw_lines.json'

      at Object.openSync (node:fs:596:3)
      at readFileSync (node:fs:464:35)
      at src/import/stackprof.test.ts:64:37
      at src/import/stackprof.test.ts:8:71
      at Object.<anonymous>.__awaiter (src/import/stackprof.test.ts:4:12)
      at Object.<anonymous> (src/import/stackprof.test.ts:63:80)

@jlfwong
Copy link
Owner

jlfwong commented Oct 11, 2024

Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:

a;b
a;c
def a
  b
  c
end

If a() calls b() on line 7, and a() calls c() on line 8, what should be displayed here? Speedscope considers frames with the same key to be identical for display, and collapses them into a single entity

@dalehamel
Copy link
Author

It looks like the new test file wasn't checked in:

Whoops, i amended the commit but forgot to push.

@dalehamel
Copy link
Author

Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:

a;b
a;c
def a
  b
  c
end

If a() calls b() on line 7, and a() calls c() on line 8, what should be displayed here? Speedscope considers frames with the same key to be identical for display, and collapses them into a single entity

I'll need to look into this further and get back to you

@dalehamel
Copy link
Author

One possible solution would be to include the line number as part of the key. Looks like it accepts a string, so a quick and dirty way to prevent the collisions would be to append the line number to the frame id, eg:

diff --git a/src/import/stackprof.ts b/src/import/stackprof.ts
index bd20c5d..e4cd7f5 100644
--- a/src/import/stackprof.ts
+++ b/src/import/stackprof.ts
@@ -39,8 +39,9 @@ export function importFromStackprof(stackprofProfile: StackprofProfile): Profile
       if (frameName == null) {
         frameName = '(unknown)'
       }
+      const key = id.toString(10) + "_" + lineNo.toString(10);
       const frame = {
-        key: id,
+        key: key,
         ...frames[id],
         line: lineNo,
         name: frameName,

This addresses some weirdness I was seeing in one of my profiles which I think was what you called out. This is one solution to ensure that they are keyed differently. I'll get into a bit more detail about this in a bit.

The stackprof frame key is updated to include the frame id in the upper
48 bits, and line number in the lower 16 bits.

To prevent truncation when casting to a number, this resulting BigInt
is converted to a string.

This ensures that frames with the same start line but different call lines
do not collide when deduping frames.
@dalehamel
Copy link
Author

@jlfwong please take another look - i updated the description to address the collision issue you pointed out. I believe I have come up with a keying strategy that should resolve this.

Unfortunately it changes the key from a number to a string, as a general rule I prefer integers for performance reasons but in this case due to number only permitting 53 bits, there is no way to stuff all the info in without the possibility of truncation. So use a similar trick as in tmm1/stackprof#213 to stuff the frame and line number together, but have to convert the result to a string.

If we were to update

readonly key: string | number
}
export class KeyedSet<T extends HasKey> implements Iterable<T> {
private map = new Map<string | number, T>()
to permit BigInt in addition to string and number, we can avoid this conversion but I'm not sure if there might be unintended consequences to this. For now, since the keyed set accepts strings, I just use convert the resulting BigInt to its base 10 string representation.

// Use a similar trick to https://github.com/tmm1/stackprof/pull/213
// generates a unique key for the name id + call line.
// Converts to a string to prevent truncation from BigInt to number
key: ((BigInt(id) << BigInt(16)) | BigInt(lineNo ? lineNo : 0)).toString(10),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm accepting Aaron's argument here about arm and x86 reserving these 16 bits as fact. Using bigint to get a 64 bit container for the bits and so that the bit shifting won't truncate it.

Instead of shifting 48 though, i'm going the other way and shoving the frame ID into the upper bits. This pushes them into the more significant bits to make room for the line number, for which 16 bits is ample.

Unfortunately, I have to convert it to a string because 64 bits is more than we can depend on for a "number", for which only 53 bits are available.

So in the end, all this fancy math leads to a hash key that is about as useful as just converting both to strings and concatenating them.

I am tempted to go with something like this, but there is a small possibility of collisions with it. Keeping it in a numerical type should be faster and use less space than strings though:

From baa30c0ba0130f4638db46585ddcaaa230864053 Mon Sep 17 00:00:00 2001
From: Dale Hamel <[email protected]>
Date: Fri, 11 Oct 2024 20:55:02 -0400
Subject: [PATCH] Build stackprof frame key with rhudimentary "hash"

Combine frame id with line number using bitwise to try and keep the key within
53 bits.

Collisions should be unlikely but this won't win any awards for best hasher.
Tries to optimize for the frame address being at most 48 bits, and the line
number being unable to exceed 16 bits. XOR with itself and prime number
before bit shifting to further reduce loss of precision

Should be much faster than string operations, but might come at the cost of
correctness in rare cases.
---
 .../__snapshots__/stackprof.test.ts.snap      | 90 +++++++++----------
 src/import/stackprof.ts                       | 23 ++++-
 2 files changed, 64 insertions(+), 49 deletions(-)

diff --git a/src/import/stackprof.ts b/src/import/stackprof.ts
index 93c4c94..db5bbf7 100644
--- a/src/import/stackprof.ts
+++ b/src/import/stackprof.ts
@@ -40,10 +40,7 @@ export function importFromStackprof(stackprofProfile: StackprofProfile): Profile
         frameName = '(unknown)'
       }
       const frame = {
-        // Use a similar trick to https://github.com/tmm1/stackprof/pull/213
-        // generates a unique key for the name id + call line.
-        // Converts to a string to prevent truncation from BigInt to number
-        key: ((BigInt(id) << BigInt(16)) | BigInt(lineNo ? lineNo : 0)).toString(10),
+        key: bitShiftXorNumbers(id, lineNo ? lineNo : 0),
         ...frames[id],
         line: lineNo,
         name: frameName,
@@ -76,3 +73,21 @@ export function importFromStackprof(stackprofProfile: StackprofProfile): Profile

   return profile.build()
 }
+
+/*
+This is not really generalizable, just meant to be a relatively fast way to
+combine the two numbers while not losing precision. We can't use the same
+representation that stackprof uses of just bit shifting, so we "hash" them
+by bit shifting and Xoring them.
+*/
+function bitShiftXorNumbers(num1: number, num2: number): number {
+  const prime1 = 31 // A small prime number to "seed" the "hash"
+  const prime2 = 37 // Another small prime number
+
+  // Use a combination of multiplication and addition to reduce collision risk
+  const hash1 = (num1 * prime1) ^ (num1 >> 16) // XOR with a right shift of num1
+  const hash2 = (num2 * prime2) ^ (num2 << 8) // XOR with a left shift of num2
+
+  // Shift hash1 left to make room for hash2 and combine them
+  return (hash1 << 16) ^ hash2
+}
--
2.45.1

Maybe start with strings and switch to a numerical approach like this if perf becomes an issue?

@dalehamel
Copy link
Author

@jlfwong - gentle nudge as it has been a week, do you have a preference on the key generation approaches? FWIW we've been using the string based approach in prod and it seems to be fine performance-wise.

@dalehamel
Copy link
Author

@jlfwong last ping I'll give, when you are able to take a look let me know but I'm moving on to other work and running a forked version for now.

@jlfwong
Copy link
Owner

jlfwong commented Oct 31, 2024

Hey, sorry I've been MIA

Only have a couple minutes right now, but I think I may not have communicated my concern here correctly.

Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:

a;b
a;c
def a
b
c
end
If a() calls b() on line 7, and a() calls c() on line 8, what should be displayed here? Speedscope considers frames with the same key to be identical for display, and collapses them into a single entity

When I was describing the problem here, my concern is that, IMO, we want these two "a" frames to be collapsed. If I understand this change correctly (which I might not be), then it would treat those two "a" frames as distinct by giving them different keys since their line numbers will be different if we use their invocation sites of b & c respectively rather than using the definition line of a in both cases.

This has minor annoyances for visual display, but creates bigger problems for aggregate statistics. For example, if this was the whole trace, then the sandwich view would read "50% a, 50% a" instead of "100% a" for total time.

Am I misunderstanding the nature of this change?

@dalehamel
Copy link
Author

IMO, we want these two "a" frames to be collapsed

FWIW I rolled this out internally a couple of weeks ago, and the response has been positive - developers that I work with seem to prefer to be able to follow the execution more precisely by having the actual call sites in the graphical views.

then it would treat those two "a" frames as distinct by giving them different keys since their line numbers will be different if we use their invocation sites of b & c respectively rather than using the definition line of a in both cases.

Yes

This has minor annoyances for visual display, but creates bigger problems for aggregate statistics. For example, if this was the whole trace, then the sandwich view would read "50% a, 50% a" instead of "100% a" for total time.

Yes, this does happen. They are displayed with the same name, but then on clicking on them, you can see there are different line numbers. I agree this messes up the "top table" part of the sandwich view, as a frame that has multiple call sites can now be showed below a frame that only has one, but actually has a lower total overall weight. Perhaps the top table code could be amended to just use the frame names as the "key" when performing the aggregation in the table, but show the distinct call sites in the actual sandwich graphical view.

In the "pprof" format, it is possible to store both the method definition line as well as the actual call line. Perhaps a compromise would be to amend the data structures to be able to store both, and have an option to toggle exact line numbers, or line numbers of the method definition only.

@jlfwong
Copy link
Owner

jlfwong commented Nov 25, 2024

developers that I work with seem to prefer to be able to follow the execution more precisely by having the actual call sites in the graphical views.

Ah I see. I hadn't understood that this was an intentional behavioral change, not just an update to which number is displayed in the interface (i.e. this actually produces structurally different flamegraphs, not just flamegraphs with different line number labels).

I'm afraid that I'm uncomfortable merging this PR as-is then. The rest of speedscope assumes that the line number is the line number of the function definition, not the line number of the invocation. It's interesting that the developers you've been working with have appreciated the change, but I would personally be quite bothered by this (especially if it was inconsistent depending on what file format was imported).

The thing I could easily get on board with is for speedscope profiles to contain both the function definition line and the call site line, as you suggested. But that's a different change than this.

In that world, the function definition line lives, as it does today, in Frame. The call site line, which would be newly introduced, would live in CallTreeNode. Frames correspond to functions, and are reused at many places in flamegraphs. CallTreeNode instances on the other hand correspond 1:1 with flame graph nodes.

Depending on the circumstance, we would use one v.s. the other. The only view that seems to me like it should unambiguously use the call site line number is the back trace view that's shown when you click on a rectangle in the time ordered view

image

@dalehamel
Copy link
Author

The thing I could easily get on board with is for speedscope profiles to contain both the function definition line and the call site line, as you suggested. But that's a different change than this.

Yes I agree

Depending on the circumstance, we would use one v.s. the other. The only view that seems to me like it should unambiguously use the call site line number is the back trace view that's shown when you click on a rectangle in the time ordered view

This is more or less what firefox-profiler does, when used with the vernier profiler.

I'm going to close this, and revert this change internally. The changes you describe above make sense, but are beyond my typescript capabilities. I think as a way forward, I'll push folks to use vernier + firefox-profiler. If we can eventually add support to speedscope, that's great.

@dalehamel dalehamel closed this Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants