Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate Whether QueryCompiler Can Reuse JavaFileManager for Server Duration #4814

Closed
nbauernfeind opened this issue Nov 10, 2023 · 2 comments · Fixed by #5070
Closed
Assignees
Labels
core Core development tasks feature request New feature or request query engine
Milestone

Comments

@nbauernfeind
Copy link
Member

While working on #4808, I discovered that there is potential to cut formula compile time by ~50% if the JavaFileManager is reused across all calls to QueryCompiler#maybeCreateClassHelper. See the test, the test results, and accompanying image below.

The concerns / things that need follow up validation:

  • Does this prevent us from overriding classes / jars mid-day through the filesystem?
  • Does this prevent us from redefining classes in groovy?
  • The zulu-17 implementation of JavaFileManager is certainly not thread safe. Is it sufficient to implement a delegating implementation that synchronizes all public methods?
  • We should have a unit test of some sort that attempts to stress parallel compilation (only the go client caught this via check-ci).

The Test:

  • first query shown in the table is the REPL command to construct the tree table and should be ignored.
  • then three batches of 10 batch queries of 100 operations that run serially and guarantee the need to compile a single formula.
    -- the first batch is run sharing a single, initial, JavaFileManager (JFM) that was also used in the first query.
    -- the second batch disables sharing the JFM and instead creates a new one that it closes at the end of the compilation (engine behavior as of TableServiceAsyncTest: Eliminate Compiler Overhead Variance; Close QueryCompiler's JavaFileManager #4808).
    -- the third batch goes back to sharing a single JFM but creates a new instance first.

The first batch of group 1 has an average operation time of 48ms. (shared JFM)
The first batch of group 2 has an average operation time of 113ms. (new JFM per compilation)
The first batch of group 3 has an average operation time of 58ms. (shared JFM, but validating that the JFM is not responsible for the time creep)

Note that over time the simple formula compilation time creeps from 48ms to 69ms over ~3k formulas. This is possibly due to the temp dir that we write formula results into - I couldn't put my finger on the time creep except to eliminate blame on the JFM.

Screenshot 2023-11-10 at 2 36 51 PM

@nbauernfeind
Copy link
Member Author

Future follow-up:

    public static TreeTable queryOperationPerformanceAsTreeTable(
            @NotNull final Table qpl, @NotNull final Table qopl) {
        // TODO (https://github.com/deephaven/deephaven-core/issues/4814): use NULL_INT for ParentOperationNumber and
        //  Depth once we can prevent any compilation or at least reduce multiple usages to a single formula
        Table mergeWithAggKeys = TableTools.merge(
                qpl.updateView(
                        "EvalKey = Long.toString(EvaluationNumber)",
                        "ParentEvalKey = ParentEvaluationNumber == null ? null : (long.toString(ParentEvaluationNumber))",
                        "OperationNumber = NULL_INT",
                        "ParentOperationNumber = OperationNumber",
                        "Depth = OperationNumber",
                        "CallerLine = (String) null",
                        "IsCompilation = NULL_BOOLEAN",
                        "InputSizeLong = NULL_LONG"), 

Currently if all three columns were NULL_INT we get separate compilation lines for each as the formula includes the destination name in the generated formula. Even better (and maybe this becomes a separate issue) these could probably return a null column (or constant value column) without any compilation at all.

@rcaudy
Copy link
Member

rcaudy commented Dec 22, 2023

Next steps:

First, answer these questions:

  • Does this prevent us from overriding classes / jars mid-day through the filesystem?
  • Does this prevent us from redefining classes in groovy?
    If re-use doesn't cause issues with the above, we should at a minimum begin pooling re-usable JFMs

Second, answer this question:

  • The zulu-17 implementation of JavaFileManager is certainly not thread safe. Is it sufficient to implement a delegating implementation that synchronizes all public methods?
    If that approach works, is it better than pooling?

Third, consider whether we should be combining compilation units in cases where it won't create new failure dependencies. That is, in a single select/update/view/updateView/where, we should be willing to compile all formulas or conditions in one pass, if that's a speedup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core development tasks feature request New feature or request query engine
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants