TableServiceAsyncTest: Eliminate Compiler Overhead Variance; Close QueryCompiler's JavaFileManager #4808

Merged
merged 6 commits into from
Nov 11, 2023

Conversation

nbauernfeind
Member

@nbauernfeind nbauernfeind commented Nov 10, 2023

Fixes #4798.
Follow up work #4814.

Whenever I managed to catch a timeout in the debugger, the server was not blocked: it was always making progress, but spending its time in compilation. I've eliminated the need to recompile formulas for each operation in the chain and added some validation that the data in the result tables is as expected.
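To illustrate why reusing one expression removes the compilation cost, here is a minimal sketch. It assumes compiled formulas are cached by their full expression text (destination column included); the `FormulaCacheSketch` class, its methods, and the column-varying chain are all hypothetical and are not taken from the Deephaven code base.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FormulaCacheSketch {
    /** Hypothetical chain where every step uses a new expression: I_1 = 1 + I_0, I_2 = 1 + I_1, ... */
    public static List<String> distinctChain(int n) {
        List<String> expressions = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            expressions.add("I_" + i + " = 1 + I_" + (i - 1));
        }
        return expressions;
    }

    /** Chain where every step reuses the exact same expression text. */
    public static List<String> reusedChain(int n) {
        List<String> expressions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            expressions.add("I_1 = 1 + I_0");
        }
        return expressions;
    }

    /**
     * Counts how many times a compiler would be invoked, assuming (for this
     * sketch only) that results are cached by exact expression text.
     */
    public static int compilationsFor(List<String> expressions) {
        Set<String> cache = new HashSet<>();
        int compilations = 0;
        for (String expression : expressions) {
            if (cache.add(expression)) { // cache miss: a compile would happen here
                compilations++;
            }
        }
        return compilations;
    }

    public static void main(String[] args) {
        System.out.println(compilationsFor(distinctChain(100))); // prints 100
        System.out.println(compilationsFor(reusedChain(100)));   // prints 1
    }
}
```

Under that caching assumption, a chain of 100 operations pays for one compilation instead of 100, so the test's runtime no longer depends on how slow the compiler happens to be on a given CI run.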

While testing ways to reproduce and/or fix the original issue, I discovered that the JavaFileManager we create on every compilation request opens file handles for many of the jars on my classpath. After roughly 50 queries the QueryCompiler broke, complaining of "Too many open files". Running a quick lsof | cut | sort | uniq -c showed 40+ open file handles on each of many of the jars on my classpath. I've done the bare minimum in this PR; I will write a follow-up issue and discuss via team-slack.
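For readers unfamiliar with the leak described above, the general pattern for closing a file manager after each compilation can be sketched with the standard `javax.tools` API. This is only an illustration under assumptions: `FileManagerCloseSketch`, `StringSource`, and `compileOnce` are hypothetical names, and this is not the actual QueryCompiler change. It also assumes a JDK is present, since `ToolProvider.getSystemJavaCompiler()` returns null on a bare JRE.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.StandardLocation;
import javax.tools.ToolProvider;

public class FileManagerCloseSketch {
    /** In-memory source file; the class name and source text are supplied by the caller. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                    Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles one class and closes the file manager before returning. */
    public static boolean compileOnce(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available (is this a JRE?)");
        }
        try {
            Path outputDir = Files.createTempDirectory("compile-sketch");
            // try-with-resources closes the file manager, releasing any
            // resources (such as opened jars) it acquired during the task.
            try (StandardJavaFileManager fileManager =
                    compiler.getStandardFileManager(null, null, null)) {
                fileManager.setLocation(StandardLocation.CLASS_OUTPUT, List.of(outputDir.toFile()));
                return compiler
                        .getTask(null, fileManager, null, null, null,
                                List.of(new StringSource(className, code)))
                        .call();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileOnce("Hello", "public class Hello { }"));
    }
}
```

Closing per request is the simplest option; sharing one long-lived file manager across requests is an alternative with different trade-offs, which is the kind of question the follow-up issue is meant to cover.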

Multiplexed this PR's Check CI here: https://github.com/nbauernfeind/deephaven-core/actions/runs/6830355652
Multiplexed current main's Check CI here: https://github.com/nbauernfeind/deephaven-core/actions/runs/6830357421/
(I expect to see a failure or three on current main and none on this PR; without that comparison it's hard to tell whether the issue has been fixed.)

@nbauernfeind nbauernfeind added the bug, query engine, core, NoDocumentationNeeded, and ReleaseNotesNeeded labels Nov 10, 2023
@nbauernfeind nbauernfeind added this to the November 2023 milestone Nov 10, 2023
@nbauernfeind nbauernfeind self-assigned this Nov 10, 2023
Comment on lines 98 to 100
// to avoid timeouts due to compilation variance reuse the same expression, including the destination
// column which shows up in the generated formula's FormulaEvaluationException
longChain.add(prev.updateView("I_1 = 1 + I_0"));
Member

I'm not sure how this is directly related to #4798; that is specifically around a timeout error. I'd assume in the case of too many open files in the compilation path there will be some other error that causes TableServiceAsyncTest to fail, correct?

Member Author

Every time it timed out, the server was in the middle of compiling. I did a bunch of instrumentation: there is no real issue with the test, just that the variance in compilation time causes it to time out. Repeated local runs are actually harder to time out because the classes have already been compiled.

The change you are commenting on here is the actual fix: run the long chain without repeatedly invoking the compiler. If you prefer, we can split this out from the singleton creation.

Member

Ah, makes sense; I never saw variation that high, likely because of the points you bring up.

@nbauernfeind nbauernfeind changed the title from "QueryCompiler: Make JavaFileManager a Singleton; Reduce TableServiceAsyncTest's Compilation Overhead" to "TableServiceAsyncTest: Eliminate Compiler Overhead Variance; Close QueryCompiler's JavaFileManager" Nov 10, 2023
Member

@rcaudy rcaudy left a comment

Only the comment regarding suppressing Errors is really important/actionable.

rcaudy previously approved these changes Nov 10, 2023
@nbauernfeind nbauernfeind merged commit 6694668 into deephaven:main Nov 11, 2023
10 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Nov 11, 2023
Labels
bug, core, NoDocumentationNeeded, query engine, ReleaseNotesNeeded
Development

Successfully merging this pull request may close these issues.

TableServiceAsyncTest timeout unit test failure
3 participants