Document internals of invalidation and hook profiler #585
Conversation
And inline the actually used function in the companion of `IncrementalCommon`, with a better and simpler implementation.
While I was trying to implement some improvements here, I realized that the current code can hardly be read or reused. I also struggled to understand the nitty-gritty of the algorithm, as the control flow was not clear and the order of functions was confusing. This commit aims to fix that by reorganising the code to allow for more reuse (functions in `IncrementalCommon`'s companion), reorganising the outline of `IncrementalCommon` to have clearer names and structure, and documenting the most important parts of the code with implementation and contract details.
I'll fix the MiMa errors later on, but this PR is ready for review.
zprof is the name I've chosen for this small profiler (or tracker, if you will) of the invalidation logic. The profiled data is formalized in an internal format that is not supposed to be used by normal users, but rather by us (Zinc) and related tools (Bloop). The current profiled data exposes details of how the incremental compiler works internally and how it invalidates classes.

This is the realization of an idea I registered here: sbt#550. With this idea, this data will not only be useful for debugging but for providing an automatic way of reporting bugs in Zinc. The infrastructure is far from finished but it's already in a usable state for libraries that depend on Zinc directly and have direct access to `Incremental`.

By default, no profiler is used. Only people that change the profiler argument for `Incremental.compile` will be able to get the run profiles.
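To make the shape of this hook concrete, here is a minimal, self-contained sketch of the pluggable-profiler pattern described above. The names mirror the ones in this PR (`InvalidationProfiler`, `RunProfiler`, `registerEvent`, `empty`), but the parameter lists and wiring are illustrative assumptions, not the actual Zinc signatures.

```scala
// Minimal sketch of the pluggable invalidation profiler described above.
// Names mirror the PR; parameter lists are illustrative assumptions.
trait RunProfiler {
  // Record one invalidation event of the given kind (kinds are plain strings).
  def registerEvent(kind: String, inputs: Seq[String], outputs: Seq[String], reason: String): Unit
}

trait InvalidationProfiler {
  // A fresh run profiler is created for every incremental compilation run.
  def profileRun: RunProfiler
}

object InvalidationProfiler {
  // Default used when no profiling is requested: a no-op profiler, so
  // profiling costs nothing unless a caller passes its own implementation.
  val empty: InvalidationProfiler = new InvalidationProfiler {
    def profileRun: RunProfiler = new RunProfiler {
      def registerEvent(
          kind: String,
          inputs: Seq[String],
          outputs: Seq[String],
          reason: String
      ): Unit = ()
    }
  }
}
```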
These methods are not exposed in the public API of Zinc and can therefore be changed freely.
MiMa errors are fixed, CI passing.
Thanks for all the comments, making things more understandable for a zinc newcomer 👍 😍
But I want more 😂
The general implementation seems reasonable from the understanding I have so far (which can be described as almost nothing 😛). The only question I have is why the `toProfile` method is not part of the public API. It seems to me that this is the method to get the profile data in a serializable format.
syntax = "proto3";
It would be nice to add more documentation on the messages and the `zprof.proto` file 😃
Especially answering questions like
- why is this message necessary?
- who/what might actually use this message to solve which problem?
- a remark referencing this pull request, which marks the initial implementation
object InvalidationProfiler {
  final val empty: InvalidationProfiler = new InvalidationProfiler {
From the implementation I guess this is being used when no profiling should be done?
Yes
def profileRun: RunProfiler = new ZincProfilerImplementation

private final var runs: List[zprof.ZincRun] = Nil
Can you add some information on why we need to persist every run here?
Also, this list grows with every compile run if I understand this correctly?
AFAIK it's only used for serialization to disk. Should we empty the runs after serialization? Or is the memory footprint small enough that we should never care?
The `ZincProfiler` is managed outside of Zinc, by another tool. Whoever uses this profiler is responsible for managing the lifetime of the profilers used. That being said, the memory footprint of the runs is very small given that we use a string table.
* It is recommended to only perform this operation when we are
* going to persist the profiled protobuf data to disk. Do not
* call this function after every compiler iteration as the aggregation
* of the symbol tables may be expensive, it's recommended to
Why is this recommended? 😄
Because the string table for very big projects could be reasonably big, so you want to make the most out of it for as many runs as you can.
private final var lastKnownIndex: Long = -1L
private final val stringTable1: ArrayBuffer[String] = new ArrayBuffer[String](1000)
private final val stringTable2: ArrayBuffer[String] = new ArrayBuffer[String](10)
private final val stringTableIndices: mutable.HashMap[String, Long] =
Can you add some information on what these indices are for? `String`/`Long` are very generic types, and people like me with very little compiler experience would find one sentence explaining the variable quite helpful 😄
What would you write? I didn't write anything because I thought the variable names were self-descriptive, i.e. `stringTableIndices` means a map from strings to indices of the string table.
What I don't understand (yet) is what these strings are. Class names, variable names, file paths, or all of these? 😁 And the `Long` is the index of what?
If you say it's self-explanatory once one is deeper into how things work, I'm fine with not putting any docs on it 😉
No, I think you make a good point. It's likely that I'm so deep into the implementation that it seems obvious to me, but it isn't. Thanks for pointing that out, I'll add a quick comment.
abstract class RunProfiler {
Same here 😉 I find some scaladocs on a fresh API usually very helpful to get the intention and purpose of the API. The question of why this interface is necessary should be answered in the scaladocs 😃
def registerEvent(
    kind: String,
What `kind`s are allowed here? Is there any other implementation I can look at to see what is applicable here?
It's generic on purpose so that we can easily evolve it in the future, even though we only have one implementation. The kinds are defined in https://github.com/sbt/zinc/pull/585/files#diff-de86f02542b47ffddbb8687a94316daeR259
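For illustration only, here is a sketch of what string-based event kinds could look like. The real kind constants live in the linked diff; the names below and the shape of the `registerEvent` call are assumptions, not Zinc's actual definitions.

```scala
// Hypothetical kind constants; the real ones are defined in the linked diff.
object ExampleEventKinds {
  val ApiChange = "api_change"
  val DependencyInvalidation = "dependency_invalidation"
}

object ExampleUsage {
  // Hypothetical profiler shape: keeping `kind` a plain string leaves the
  // schema open-ended, so new kinds can be added without changing signatures.
  trait EventSink {
    def registerEvent(kind: String, reason: String): Unit
  }

  def report(sink: EventSink): Unit =
    sink.registerEvent(ExampleEventKinds.ApiChange, "public signature of Foo changed")
}
```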
private final var currentEvents: List[zprof.InvalidationEvent] = Nil
Same as with `runs`. This grows unbounded, right?
What do you mean by "this grows unbounded"? In every run, we create a new `RunProfiler` so that the new profile won't see events from the past.
Ah, thanks 👍 Misunderstood that.
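To make that lifecycle explicit, here is a small sketch of why `currentEvents` does not grow across runs: a fresh run profiler is created per incremental run, and its events are folded back into the long-lived profiler once the run finishes. `Event`, `Run`, `finish`, and the class names are assumed stand-ins, not the real Zinc types.

```scala
// Sketch only: these names illustrate the per-run lifecycle discussed above.
final case class Event(kind: String, reason: String)
final case class Run(events: List[Event])

final class LongLivedProfiler {
  private var runs: List[Run] = Nil

  // One instance per incremental compilation run; starts with no events.
  final class PerRunProfiler {
    private var currentEvents: List[Event] = Nil
    def registerEvent(kind: String, reason: String): Unit =
      currentEvents = Event(kind, reason) :: currentEvents
    // Called once at the end of the run; afterwards this instance is discarded.
    def finish(): Unit = runs = Run(currentEvents.reverse) :: runs
  }

  def profileRun: PerRunProfiler = new PerRunProfiler
}
```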
currentEvents = event :: currentEvents

private final var cycles: List[zprof.CycleInvalidation] = Nil
Same as with `runs`. This grows unbounded, right?
* @return An immutable zprof profile that can be persisted via protobuf.
*/
def toProfile: zprof.Profile = zprof.Profile(
This is the most important method in the whole implementation, right?
It creates a serializable format for the profile data gathered so far. Why isn't this part
of the public API?
Because this exposes internal details of the algorithm, and I only want people that really know what they're doing using it 😉
okay 😄
@muuki88 Thanks for the review, I answered most of your comments. Regarding the addition of more docs (for example in the profiling schema), I'm afraid I won't be able to do so until the beginning of October. My schedule for the next 4 weeks looks pretty busy.
@jvican thanks for the quick feedback 🤗
no worries 😄
I realized how stupid the idea of keeping two string tables instead of one was, since it's unlikely we'll ever have more than 2147483647 unique string instances and the JVM doesn't allow instantiating such a big array with normal heap sizes. So we remove this part of the design, which we inherited from pprof.
@muuki88 OK, motivated by your comments I quickly went into the GitHub interface and submitted two changes: more docs and an improvement to the internals of the profiler. Feel free to have a look at them and tell me if they are useful.
Given that Google has not moved protobuf towards JDK 9/10/11 yet, I am concerned about bringing Protobuf in as a critical piece. As far as I know, the current usage is somewhat limited to Analysis serialization only, right? What do you think about using Contraband instead here?
all ++ invalidated // need the union because all doesn't contain removed sources
} else invalidated
}
def recompileClasses(
Documentation, reorganization, and some changes are intermixed in this commit, so it's difficult for me to review, but have you made any intentional change in the observable behavior, either in the output or in the performance characteristics? For example, I noticed that the method named `recompileClasses` is no longer recompiling classes, but rather recompiling sources.
There’s no change in semantics, I double checked this several times manually and the tests confirm it.
Wrt the change you mention @eed3si9n, the reason you now see more clearly that we recompile sources instead of classes is that `recompileClasses` was doing the mapping between invalidated classes and sources to recompile, and I pulled that logic out of it to keep things clear.
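As a rough illustration of the mapping that was pulled out of `recompileClasses`, this sketch maps invalidated class names to the source files that define them and then recompiles the sources. `classToSource` and `compile` are hypothetical stand-ins for Zinc's relations and compiler hook, not the real API.

```scala
import java.io.File

object RecompileSketch {
  // Map invalidated class names to the sources that define them.
  // `classToSource` is an assumed stand-in for Zinc's relations.
  def sourcesToRecompile(
      invalidatedClasses: Set[String],
      classToSource: String => Option[File]
  ): Set[File] =
    invalidatedClasses.flatMap(cls => classToSource(cls))

  // The caller then recompiles sources, not classes, which is why the method
  // in the PR now operates on sources.
  def recompile(sources: Set[File], compile: Set[File] => Unit): Unit =
    if (sources.nonEmpty) compile(sources)
}
```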
It can be ported in the future if need be, but the elephant in the room is the analysis file implementation. Conversely, this use of protobuf can easily be disabled without affecting correctness (and in fact by default it won't even initialize the protobuf classes, so the warning won't happen), so I wouldn't worry about it. The protobuf issue will be fixed down the road (I have high confidence in that), and if it doesn't get fixed we can find solutions by then.
Awesome @jvican! That clarifies a lot of things 😍
The Analysis file implementation can be switched back to using a text file like in sbt 0.13.
If it's easy not to use protobuf, let's switch to using Contraband like the rest of the datatypes.
I think you misread me 😛 I didn't say it's easy to switch to Contraband (it would actually take me quite a lot of time that I cannot afford now), I said it's easy not to use the protobuf-based profiler in the future. By default, the profiler is empty, and in the worst-case scenario, if protobuf doesn't remove the warning in JDK 11, you can (1) not use any profiler at all or (2) implement your own (and that is if you use it in the first place).
While I was trying to implement some improvements here, I realized that the current code can hardly be read or reused. I also struggled to understand the nitty-gritty of the algorithm, as the control flow was not clear and the order of functions was confusing.

This PR aims to fix that by reorganising the code to allow for more reuse (functions in `IncrementalCommon`'s companion), reorganising the outline of `IncrementalCommon` to have clearer names and structure, and documenting the most important parts of the code with implementation and contract details. Aside from that, it introduces a protobuf format to track all the invalidation logic. This protobuf is internal to Zinc and should only be used by users that really know what they are doing, as the format can change in the future (protobuf makes it easy to evolve this profiling schema). This is related to the idea I pitched in #550. With this idea, this data will not only be useful for debugging but for providing an automatic way of reporting bugs in Zinc. The infrastructure is far from finished, but it's already in a usable state for libraries that depend on Zinc directly and have direct access to `Incremental`.

For performance reasons, a string table is kept per zinc profiler. More details are in the scaladoc.

Note that by default no profiler is used. Only people that change the profiler argument for `Incremental.compile` will be able to get the run profiles. How these profiles are persisted is out of scope of this PR.