Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document internals of invalidation and hook profiler #585

Merged
merged 6 commits into from
Aug 31, 2018

Conversation

jvican
Copy link
Member

@jvican jvican commented Aug 26, 2018

While I was trying to implement some improvements here, I realize that
the current code can hardly be read and reused. I struggled as well to
understand the nitty-gritty of the algorithm as the control flow was not
clear and the order of functions was confusing.

This PR aims to fix this by reorganising the code to allow for more
reuse (functions in IncrementalCommon's companion), reorganizes the
outline of IncrementalCommon to have clearer names and structure and
documents the most important parts of the code with implementation and
contract details. Aside from that, it introduces a protobuf data to
track all the invalidation logic. This protobuf is internal to Zinc
and should only be used by users that really know what they are doing --
as the format can change in the future (protobuf makes it easy to evolve
this profiling schema). This is related to the idea I pitched in in #550.

With this idea, this data will not only be useful for debugging but for
providing an automatic way of reporting bugs in Zinc. The infrastructure
is far from finished but it's already in a usable state for libraries
that depend on Zinc directly and have direct access to Incremental.
For performance reasons, a string table is kept per zinc profiler. More
details in the scaladoc.

Note that by default no profiler is used. Only people that change the
profiler argument for Incremental.compile will be able to get the run
profiles. How these profiles are persisted is out of scope of this PR.

jvican added 2 commits August 26, 2018 14:22
And inline the actual used function in the companion of
`IncrementalCommon`, with a better and simpler implementation.
While I was trying to implement some improvements here, I realize that
the current code can hardly be read and reused. I struggled as well to
understand the nitty-gritty of the algorithm as the control flow was not
clear and the order of functions was confusing.

This commit aims to fix this by reorganising the code to allow for more
reuse (functions in `IncrementalCommon`'s companion), reorganizes the
outline of `IncrementalCommon` to have clearer names and structure and
documents the most important parts of the code with implementation and
contract details.
@jvican
Copy link
Member Author

jvican commented Aug 26, 2018

I'll fix the MiMa errors later on, but this PR is ready for review.

jvican added 2 commits August 26, 2018 20:04
zprof is the name I've chosen for this small profiler (or tracker if you
will) of the invalidation logic. The profiled data is formalized in an
internal format that is not supposed to be used by normal users, but
rather by us (Zinc) and related tools (Bloop).

The current profiled data exposes details of how the incremental
compiler works internally and how it invalidates classes. This is the
realization of an idea I registered here: sbt#550

With this idea, this data will not only be useful for debugging but for
providing an automatic way of reporting bugs in Zinc. The infrastructure
is far from finished but it's already in a usable state for libraries
that depend on Zinc directly and have direct access to `Incremental`.

By default, no profiler is used. Only people that change the profiler
argument for `Incremental.compile` will be able to get the run profiles.
These methods are not exposed to the public API of Zinc and can
therefore be changed with certainty.
@jvican jvican force-pushed the topic/wip-incremental-common branch from 2e63f5d to 13c7d4d Compare August 26, 2018 18:06
@jvican
Copy link
Member Author

jvican commented Aug 26, 2018

MiMa errors are fixed, CI passing.

Copy link

@muuki88 muuki88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the comments, making things more understandable for a zinc newcomer 👍 😍
But I want more 😂

The general implementation seems reasonable from the understanding I have so far (which can be described as almost nothing 😛 ). The only question I have is why the toProfile method is not part of the public API. It seems to me that this is the method to get the profile data in a serializable format

@@ -0,0 +1,68 @@
syntax = "proto3";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add more documentation on the messages and the zprof.proto proto file 😃
Especially answering questions like

  • why is this message necessary?
  • who/what might actually use this message to solve which problem
  • a remake to this pull request which marks the initial implementation

}

object InvalidationProfiler {
final val empty: InvalidationProfiler = new InvalidationProfiler {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the implementation I guess this is being used when no profiling should be done?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes


def profileRun: RunProfiler = new ZincProfilerImplementation

private final var runs: List[zprof.ZincRun] = Nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some information why we need to persist every run here?

Also this list grows with every compile run if I understand this correctly?
AFAIK it's only used for serialization to disc. Should we empty the runs after
serialization? Or is the memory footprint small enough that we should never care

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ZincProfiler is managed outside of Zinc, by another tool. Whomever uses this profiler is responsible for managing the lifetime of the profilers used. That being said, the memory profile of the runs is very small given that we use a string table

* It is recommended to only perform this operation when we are
* going to persist the profiled protobuf data to disk. Do not
* call this function after every compiler iteration as the aggregation
* of the symbol tables may be expensive, it's recommended to
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this recommended? 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the string table for very big projects could be reasonable big, so you want to make the most out of it for as many runs as you can

private final var lastKnownIndex: Long = -1L
private final val stringTable1: ArrayBuffer[String] = new ArrayBuffer[String](1000)
private final val stringTable2: ArrayBuffer[String] = new ArrayBuffer[String](10)
private final val stringTableIndices: mutable.HashMap[String, Long] =
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some information what these indices are for? String / Long are very generic types and people like me with very few compiler experience would find one sentence explaining the variable quite helpful 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you write? I didn't write anything because i thought the variable names were self-descriptive, i.e. stringTableIndices means a map from string to indices of the string table.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I don't understand (yet) is what these strings are. Class names, variable names, file paths or all of this 😁 And long is the index of what.

If you say it's self explaining once one is deeper into how things work, I'm fine with not putting any docs on it 😉

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think you make a good point. It's likely that I'm too much into the implementation that I see it obvious, but it isn't. Thanks for pointing that out, I'll add a quick comment.

}
}

abstract class RunProfiler {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here 😉 I find some scaladocs on a fresh API usually very helpful to get
the intention and purpose of the API. The question why is this interface necessary
should be answered in the scaladocs 😃

): Unit

def registerEvent(
kind: String,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kinds are allowed here? Is there any other implementation I can look at to see what
is applicable here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in generic on purpose so that we can easily involve it in the future, even though we only have one implementation. The kinds are defined in https://github.com/sbt/zinc/pull/585/files#diff-de86f02542b47ffddbb8687a94316daeR259

)
}

private final var currentEvents: List[zprof.InvalidationEvent] = Nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as with runs. This grows unbounded, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this grows unbounded? In every run, we create a new RunProfiler so that new profile wont' see events from the past

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks 👍 Misunderstood that.

currentEvents = event :: currentEvents
}

private final var cycles: List[zprof.CycleInvalidation] = Nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as with runs. This grows unbounded, right?

*
* @return An immutable zprof profile that can be persisted via protobuf.
*/
def toProfile: zprof.Profile = zprof.Profile(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most important method in the whole implementation, right?
It creates a serializable format for the profile data gathered so far. Why isn't this part
of the public API?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this exposes internal details of the algorithm, and I only want people that really know what they're doing using it 😉

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay 😄

@jvican
Copy link
Member Author

jvican commented Aug 27, 2018

@muuki88 Thanks for the review, I answered most of your comments. Regarding the additions of more docs (for example in the profiling schema), I'm afraid I won't be able to do so until the beginning of October. My schedule for the next 4 weeks looks pretty busy.

@muuki88
Copy link

muuki88 commented Aug 27, 2018

@jvican thanks for the quick feedback 🤗

I'm afraid I won't be able to do so until the beginning of October. My schedule for the next 4 weeks looks pretty busy.

no worries 😄

jvican added 2 commits August 27, 2018 11:03
I realized how stupid it was the idea of keeping two string tables
intead of one, since it's unlikely we'll ever have more than
2147483647 unique string instances and the JVM doesn't allow to
instantiate such a big array in normal heap sizes. So we remove this
part of the design that we inherited from pprof.
@jvican
Copy link
Member Author

jvican commented Aug 27, 2018

@muuki88 OK, motivated by your comments I quickly went into the GitHub interface and submitted two changes: more docs and an improvement to the internals of the profiler. Feel free to have a look at them and tell me if they are useful.

@eed3si9n
Copy link
Member

Aside from that, it introduces a protobuf data to track all the invalidation logic.

Given that Google has not move protobuf towards JDK 9/10/11 yet, I am concerned about bringing Protobuf in as a critical piece. As far as I know, the current usage is somewhat limited to Analysis serialization only right? What do you think about using Contraband instead here?

all ++ invalidated // need the union because all doesn't contain removed sources
} else invalidated
}
def recompileClasses(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation, reorganization, and some changes are intermixed in this commit, so it's difficult for me to preview, but have you made any intentional change in the observable behavior, either in the difference in output or in performance characteristics? For example, I noticed that the method named recompileClasses are no longer recompiling classes, but it's recompiling sources.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s no change in semantics, I double checked this several times manually and the tests confirm it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrt the change you mention @eed3si9n, the reason why you see that we recompile sources instead of classes in a clearer way is because recompileClasses was doing the mapping between invalidated classes and sources to recompile, and I pulled that logic out of it to keep things clear.

@jvican
Copy link
Member Author

jvican commented Aug 27, 2018

Given that Google has not move protobuf towards JDK 9/10/11 yet, I am concerned about bringing Protobuf in as a critical piece. As far as I know, the current usage is somewhat limited to Analysis serialization only right? What do you think about using Contraband instead here?

It can be ported in the future if need so, but the elephant in the the room is the analysis file implementation. Conversely, this use of protobuf can be easily disabled without affecting the correctness of the language (and in fact by default it won’t even initialize the protobuf classes — the warning won’t happen), so I wouldn’t worry about it. The protobuf issue will be fixed down the road (I have high confidence in that), and if it doesn’t we can find solutions by then.

@muuki88
Copy link

muuki88 commented Aug 27, 2018

Awesome @jvican ! That clarifies a lot of things 😍

@eed3si9n
Copy link
Member

It can be ported in the future if need so, but the elephant in the the room is the analysis file implementation.

Analysis file implementation can be switched back to using text file like sbt 0.13.

Conversely, this use of protobuf can be easily disabled without affecting the correctness of the language

If it's easy to not use protobuf, let's switch to using Contraband like the rest of the datatypes.

@jvican
Copy link
Member Author

jvican commented Aug 29, 2018

If it's easy to not use protobuf, let's switch to using Contraband like the rest of the datatypes.

I think you misread me 😛 I didn't say it's easy to switch to Contraband (it would actually take me quite a lot of time that I cannot afford now), I said it's easy not to use the protobuf-based profiler in the future. By default, the profiler is empty and in the worst-case scenario if protobuf doesn't remove the warning in JDK 11 you can (1) no use any profiler at all or (2) implement your own (and that is if you use it in the first place).

@eed3si9n eed3si9n changed the base branch from 1.x to develop August 30, 2018 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants