-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the raw_lines field in stackprof for more precise lines #483
Conversation
Since tmm1/stackprof#213 the actual source line within the method, not just the method start line, should be available.
Hi @dalehamel! Thanks for the patch — happy to merge this after tests are fixed It looks like the new test file wasn't checked in:
|
Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:
If |
fc98015
to
5175aa1
Compare
Whoops, i amended the commit but forgot to push. |
I'll need to look into this further and get back to you |
One possible solution would be to include the line number as part of the key. Looks like it accepts a string, so a quick and dirty way to prevent the collisions would be to append the line number to the frame id, eg: diff --git a/src/import/stackprof.ts b/src/import/stackprof.ts
index bd20c5d..e4cd7f5 100644
--- a/src/import/stackprof.ts
+++ b/src/import/stackprof.ts
@@ -39,8 +39,9 @@ export function importFromStackprof(stackprofProfile: StackprofProfile): Profile
if (frameName == null) {
frameName = '(unknown)'
}
+ const key = id.toString(10) + "_" + lineNo.toString(10);
const frame = {
- key: id,
+ key: key,
...frames[id],
line: lineNo,
name: frameName, This addresses some weirdness I was seeing in one of my profiles which I think was what you called out. This is one solution to ensure that they are keyed differently. I'll get into a bit more detail about this in a bit. |
The stackprof frame key is updated to include the frame id in the upper 48 bits, and line number in the lower 16 bits. To prevent truncation when casting to a number, this resulting BigInt is converted to a string. This ensures that frames with the same start line but different call lines do not collide when deduping frames.
@jlfwong please take another look - i updated the description to address the collision issue you pointed out. I believe I have come up with a keying strategy that should resolve this. Unfortunately it changes the key from a number to a string, as a general rule I prefer integers for performance reasons but in this case due to If we were to update Lines 46 to 49 in d69f3d0
|
// Use a similar trick to https://github.com/tmm1/stackprof/pull/213 | ||
// generates a unique key for the name id + call line. | ||
// Converts to a string to prevent truncation from BigInt to number | ||
key: ((BigInt(id) << BigInt(16)) | BigInt(lineNo ? lineNo : 0)).toString(10), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm accepting Aaron's argument here about arm and x86 reserving these 16 bits as fact. Using bigint to get a 64 bit container for the bits and so that the bit shifting won't truncate it.
Instead of shifting 48 though, i'm going the other way and shoving the frame ID into the upper bits. This pushes them into the more significant bits to make room for the line number, for which 16 bits is ample.
Unfortunately, I have to convert it to a string because 64 bits is more than we can depend on for a "number", for which only 53 bits are available.
So in the end, all this fancy math leads to a hash key that is about as useful as just converting both to strings and concatenating them.
I am tempted to go with something like this, but there is a small possibility of collisions with it. Keeping it in a numerical type should be faster and use less space than strings though:
From baa30c0ba0130f4638db46585ddcaaa230864053 Mon Sep 17 00:00:00 2001
From: Dale Hamel <[email protected]>
Date: Fri, 11 Oct 2024 20:55:02 -0400
Subject: [PATCH] Build stackprof frame key with rhudimentary "hash"
Combine frame id with line number using bitwise to try and keep the key within
53 bits.
Collisions should be unlikely but this won't win any awards for best hasher.
Tries to optimize for the frame address being at most 48 bits, and the line
number being unable to exceed 16 bits. XOR with itself and prime number
before bit shifting to further reduce loss of precision
Should be much faster than string operations, but might come at the cost of
correctness in rare cases.
---
.../__snapshots__/stackprof.test.ts.snap | 90 +++++++++----------
src/import/stackprof.ts | 23 ++++-
2 files changed, 64 insertions(+), 49 deletions(-)
diff --git a/src/import/stackprof.ts b/src/import/stackprof.ts
index 93c4c94..db5bbf7 100644
--- a/src/import/stackprof.ts
+++ b/src/import/stackprof.ts
@@ -40,10 +40,7 @@ export function importFromStackprof(stackprofProfile: StackprofProfile): Profile
frameName = '(unknown)'
}
const frame = {
- // Use a similar trick to https://github.com/tmm1/stackprof/pull/213
- // generates a unique key for the name id + call line.
- // Converts to a string to prevent truncation from BigInt to number
- key: ((BigInt(id) << BigInt(16)) | BigInt(lineNo ? lineNo : 0)).toString(10),
+ key: bitShiftXorNumbers(id, lineNo ? lineNo : 0),
...frames[id],
line: lineNo,
name: frameName,
@@ -76,3 +73,21 @@ export function importFromStackprof(stackprofProfile: StackprofProfile): Profile
return profile.build()
}
+
+/*
+This is not really generalizable, just meant to be a relatively fast way to
+combine the two numbers while not losing precision. We can't use the same
+representation that stackprof uses of just bit shifting, so we "hash" them
+by bit shifting and Xoring them.
+*/
+function bitShiftXorNumbers(num1: number, num2: number): number {
+ const prime1 = 31 // A small prime number to "seed" the "hash"
+ const prime2 = 37 // Another small prime number
+
+ // Use a combination of multiplication and addition to reduce collision risk
+ const hash1 = (num1 * prime1) ^ (num1 >> 16) // XOR with a right shift of num1
+ const hash2 = (num2 * prime2) ^ (num2 << 8) // XOR with a left shift of num2
+
+ // Shift hash1 left to make room for hash2 and combine them
+ return (hash1 << 16) ^ hash2
+}
--
2.45.1
Maybe start with strings and switch to a numerical approach like this if perf becomes an issue?
@jlfwong - gentle nudge as it has been a week, do you have a preference on the key generation approaches? FWIW we've been using the string based approach in prod and it seems to be fine performance-wise. |
@jlfwong last ping I'll give, when you are able to take a look let me know but I'm moving on to other work and running a forked version for now. |
Hey, sorry I've been MIA Only have a couple minutes right now, but I think I may not have communicated my concern here correctly.
When I was describing the problem here, my concern is that, IMO, we want these two "a" frames to be collapsed. If I understand this change correctly (which I might not be), then it would treat those two "a" frames as distinct by giving them different keys since their line numbers will be different if we use their invocation sites of b & c respectively rather than using the definition line of a in both cases. This has minor annoyances for visual display, but creates bigger problems for aggregate statistics. For example, if this was the whole trace, then the sandwich view would read "50% a, 50% a" instead of "100% a" for total time. Am I misunderstanding the nature of this change? |
FWIW I rolled this out internally a couple of weeks ago, and the response has been positive - developers that I work with seem to prefer to be able to follow the execution more precisely by having the actual call sites in the graphical views.
Yes
Yes, this does happen. They are displayed with the same name, but then on clicking on them, you can see there are different line numbers. I agree this messes up the "top table" part of the sandwich view, as a frame that has multiple call sites can now be showed below a frame that only has one, but actually has a lower total overall weight. Perhaps the top table code could be amended to just use the frame names as the "key" when performing the aggregation in the table, but show the distinct call sites in the actual sandwich graphical view. In the "pprof" format, it is possible to store both the method definition line as well as the actual call line. Perhaps a compromise would be to amend the data structures to be able to store both, and have an option to toggle exact line numbers, or line numbers of the method definition only. |
Yes I agree
This is more or less what firefox-profiler does, when used with the vernier profiler. I'm going to close this, and revert this change internally. The changes you describe above make sense, but are beyond my typescript capabilities. I think as a way forward, I'll push folks to use vernier + firefox-profiler. If we can eventually add support to speedscope, that's great. |
Since tmm1/stackprof#213 the actual source line within the method, not just the method start line, should be available.
From the existing simple.rb:
speedscope/sample/programs/ruby/simple.rb
Lines 32 to 34 in d69f3d0
On main, it just gives the method start line:
With this patch, we get the actual line number of the call:
I've added some rhudimentary tests, to cover both older stackprof files that are missing
raw_lines
, and newer ones that have it. I basically just walked "children[0]" manually until i found a node with a line. Since the file is checked in, i think this is fine.EDIT: addressing frame collisions
As called out in #483 (comment), my first commit could lead to collisions due to frames that were not identical using the same key. This is addressed in 136dd99
Looking at the following code:
speedscope/sample/programs/ruby/simple.rb
Lines 3 to 10 in d69f3d0
On main, we only see the frame info showing
a
calling from line 4:But now, we have two different frames for
a
, one with line 5, and one with line 8:In a more pathological example, I modified
simple.rb
to insert a sleep on line 16 to show how previously we would in fact lose information. Here is a graph on main:And the other tree for the code path where it doesn't sleep:
Here it is with the patch:
And the other tree for the code path where it doesn't sleep:
We can see that with the patch, it correctly shows the call site of "sleep" and is able to distinguish these code paths, one for line 16 where sleep is actually called, and one for line 13 for the other code path and indeed another on line 14 for the other instance of b. Previously these would all have had the same value. The graph has more nodes, and we are able to have different instances of the "b" frame.