Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract variable api from ScriptSession, let ScriptSession guard reads #4970

Merged
merged 27 commits into from
Jan 22, 2024

Conversation

niloc132
Copy link
Member

This patch moves from assuming that the ScriptSession owns the ExecContext for locking (with UpdateGraph) and variable access (with QueryScope). Liveness is also moved to the QueryScope, as is the responsibility of unwrapping objects (but delegated to ScriptSession as needed).

Previously, the groovy script session used a lock to interact with its query scope, but python did not under the incorrect assumption that the GIL would guard those reads. Instead, a read/write lock is used to ensure that reads block while an operation is running that could affect values, but avoiding reads blocking each other.

Partial #4040

This patch moves from assuming that the ScriptSession owns the
ExecContext for locking (with UpdateGraph) and variable access (with
QueryScope).

Previously, the groovy script session used a lock to interact with its
query scope, but python did not under the incorrect assumption that the
GIL would guard those reads. Instead, a read/write lock is used to
ensure that reads block while an operation is running that could affect
values, but avoiding reads blocking each other.

Partial deephaven#4040
@@ -141,8 +151,9 @@ public synchronized final Changes evaluateScript(final String script, @Nullable
final Changes diff;
// retain any objects which are created in the executed code, we'll release them when the script session
// closes
variableAccessLock.writeLock().lock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above comment applies to the try block. Let's add a comment why we hold the writeLock for the duration of the script evaluation (IIRC to avoid mid-script race conditions?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's take a few minutes today and consider the three locks in play?

  • UG exclusive/shared lock - exclusive lock is held for the duration of any executed script code
  • ScriptSession state read/write lock, preventing any (well, most) writes while anyone is actively reading
  • ScriptSession instance lock, appears only used to guard snapshots for scope changes. Probably can be folded into the new read/write lock.

External callers could already hold UG shared/exclusive locks when reading from QueryScope, so with the current code we could hit a lock inversion issue - take the write lock, wait to acquire the UG exclusive lock, then in another thread while holding a either UG lock try and acquire the read lock.

I think we can simply move the writelock inside apply(() -> {...}), plus as above remove the synchronized keyword usage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more lock: the GIL. Doesn't appear to impact the above, since we take the GIL inside of the other locks, and if we're in Java, calling in to one of these, we no longer hold the GIL. Only would matter if some code deliberately held the GIL in Java (which happens, but should only be possible when doing GC sorts of stuff).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more reply from me, then i'll defer to when we do a call: it isnt quite so simple to remove synchronized, since you can't upgrade a read lock to a write lock. If the whole write lock needs to be wider (encompass calls to takeSnapshot, observeScopeChanges, possibly createDiff), then the UG exclusive lock needs to also be correspondingly bigger as well.

@@ -110,7 +113,7 @@ private void handleAutocompleteRequest(AutoCompleteRequest request,
request.getRequestId() > 0 ? request.getRequestId() : request.getGetCompletionItems().getRequestId();
try {
final ScriptSession scriptSession = exportedConsole.get();
PyObject completer = (PyObject) scriptSession.getVariable("jedi_settings");
PyObject completer = (PyObject) scriptSession.getVariableProvider().getVariable("jedi_settings", null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you didn't do this -- but why are we providing the script session to the observer instead of the settings?

This is the only construction:

            try {
                final ScriptSession scriptSession = scriptSessionProvider.get();
                scriptSession.evaluateScript(
                        "from deephaven_internal.auto_completer import jedi_settings ; jedi_settings.set_scope(globals())");
                settings[0] = scriptSession.getVariableProvider().getVariable("jedi_settings", null);
            } catch (Exception err) {
                log.error().append("Error trying to enable jedi autocomplete").append(err).endl();
            }
            boolean canJedi = settings[0] != null && settings[0].call("can_jedi").getBooleanValue();
            log.info().append(canJedi ? "Using jedi for python autocomplete"
                    : "No jedi dependency available in python environment; disabling autocomplete.").endl();
            return canJedi ? new PythonAutoCompleteObserver(responseObserver, scriptSessionProvider, session)
                    : new NoopAutoCompleteObserver(session, responseObserver);

We should pass the settings in instead of the script session provider. If the variable is "allowed" to change over time, then I still would rather pass in a Supplier<PyObject> instead of the script session provider.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a major problem with scope creep in these refactors, and would rather leave this to someone else who wants to overhaul autocomplete (again).

 * Synchronize access to groovy variables
 * unmanage items removed from query scope
 * Ensure the script session's query scope is active when executing code
public Set<String> getVariableNames() {
return Collections.unmodifiableSet(scope.getKeys().collect(Collectors.toSet()));
protected Set<String> getVariableNames() {
return scope.getKeys().collect(Collectors.toUnmodifiableSet());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As is, we end up making three copies of this key-set! PyDictWrapper#keySet creates a copy of the keys (instead of a view). Then we copy it a second time in PythonDeephavenSession#getVariableNames using a collector. Finally, we copy it a third time in ScriptSessionQueryScope#getParamNames to filter out non-valid names.

We should pass a Predicate to getVariableNames so that the Groovy impl can filter under the param-map's object monitor. That makes python a two-copy impl and the groovy one a single-copy. (instead of three and two respectively)

Do we need to synchronize w/setVariable to avoid changing the map while PyDictWrapper#keySet is making the initial copy? It looks like if you some setVariable to null while iterating that the result set could include the removed variable and not the variable that was next in the dict. Also, note that PyListWrapper#iterator() is not protected from size changes and will appears to return null for the last item in the dict (if the size is decreased).

It looks like this would cause an NPE in the LinkedHashSet construction within PyDictWrapper#keySet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points, will pass a predicate, stream results into the unmodifiable set.

Racing with setVariable is somewhat fraught - after all, setVariable isn't the only way to assign values into the collection, python itself could be assigning/removing something. I think technically I can use PyLib.ensureGil() here, to guarantee that reads of python structures can't change while reading, in place of the synchronized blocks that Groovy's impl uses). Any AbstractScriptSession override that makes more than one call into python should use this (and note that it is not safe to use this to call into arbitrary python, as that python code could end up dropping the GIL despite our efforts - ensureGil only lets us be sure that we have the GIL in Java, JNI, and our own C).

@@ -21,7 +24,7 @@
/**
* Variable scope used to resolve parameter values during query execution.
*/
public abstract class QueryScope implements LogOutputAppendable {
public abstract class QueryScope extends LivenessScope implements LogOutputAppendable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ryan mentioned that exposing the release method from LivenessScope is concerning to him.

I believe we can appease his concerns by changing the signature to:

public abstract class QueryScope extends ReferenceCountedLivenessNode implements ReleasableLivenessManager, LogOutputAppendable

Then there is no release() exposed to the user. We should probably override destroy in the standalone impl to clear the map for consistency.

Don't forget to add something like this to the constructor:

if (!Liveness.REFERENCE_TRACKING_DISABLED) {
    incrementReferenceCount();
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REFERENCE_TRACKING_DISABLED is package-protected - I think the rest makes sense, I'll push a patch shortly with the rest of this for another pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks "opening" the queryscope as a liveness scope when executing code:

        // retain any objects which are created in the executed code, we'll release them when the script session
        // closes
        try (final S initialSnapshot = takeSnapshot();
                final SafeCloseable ignored = LivenessScopeStack.open(queryScope, false)) {

We could go back to passing this, but that seems to defeat the purpose of that change...


try {
// Actually evaluate the script; use the enclosing auth context, since AbstractScriptSession's
// ExecutionContext never has a non-null AuthContext
executionContext.withAuthContext(ExecutionContext.getContext().getAuthContext())
.withQueryScope(queryScope)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is necessary. Isn't this queryScope already assigned to the executionContext in the constructor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not be necessary today, that's correct, but if not needed, it will be a no-op. Where possible (and straightforward, and obviously not-incorrect), Ryan and I were trying to make this ready for multiple script sessions, so if it does nothing, I'd be inclined to keep it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This execution context is created in the constructor of this object with this exact query scope. Static analysis shows, even with multiple script sessions, that we're already protected for multiple script sessions.

What am I missing?

@@ -117,7 +117,7 @@ private void testFuzzerScriptFile(final long timeSeed, String s, boolean realtim
clock.now += DateTimeUtils.SECOND / 10 * timeRandom.nextInt(20);
}

final TimeTable timeTable = (TimeTable) session.getVariable("tt");
final TimeTable timeTable = session.getQueryScope().readParamValue("tt", null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this NPE if it doesn't exist? Probably you meant to leave this as throwing a MissingVarException?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not wrong, but the test will fail either way.

final Object instance = variables.readParamValue(o.getName(), null);
if (instance != null) {
if (instance instanceof Table) {
return Optional.of(Table.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this. Why are we restricting sub-classes of Table to the top-level interface? Does it solve a real problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inlining the old implementation that VariableProvider did, replicated without additional thought.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would want to think long and hard before pushing BaseTable or QueryTable methods into auto-complete, to be honest. Let's not change the status quo in this PR.

nbauernfeind
nbauernfeind previously approved these changes Jan 8, 2024
Copy link
Member

@nbauernfeind nbauernfeind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; I believe Ryan wants to pass through this, too (but if not, it's good to me).

@niloc132
Copy link
Member Author

niloc132 commented Jan 9, 2024

@niloc132
Copy link
Member Author

Another nightly run after merging and dealing with conflicts: https://github.com/niloc132/deephaven-core/actions/runs/7467840686

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review, I think I'm covering the most important classes, want to discuss.

@@ -119,6 +114,12 @@ private static Object applyValueConversions(final Object value) {
return value;
}

protected QueryScope() {
super(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the only things we intend to manage are things we already hold strongly, maybe we should actually use enforceStrongReachability=true. That results in a simpler/cleaner implementation of reference tracking. On the other hand, if we expect query scopes to be leaked (as in garbage), that may not be what we want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This question could now apply to ScriptSessionQueryScope, do you still want to consider it?

} else {
wrapped = PythonObjectWrapper.wrap(newValue);
}
// This isn't thread safe, we're relying on the GIL being kind to us (as we have historically done).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true, given that we've enclosing with ensureGil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically no, the GIL doesn't guarantee us practically anything even if this was one call instead of two - __setitem__ is just a function that could be redefined in python in a way that lets it drop the gil while invoking it...

Practically, we hope that it continued to be kind to us.


// Observe changes from this "setVariable" (potentially capturing previous or concurrent external changes from
// other threads)
observeScopeChanges();

// This doesn't return the same Java instance of PyObject, so we won't decref it properly, but
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to make sure I understand this, let's discuss.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call ScriptSessionQueryScope.putParam("foo", somePyObjectInstance), and then call same = readParamValue("foo") to get it back - the PyObject instances are pointing at the "same" python object, but same == somePyObjectInstance is false - they are separate java object instances, and contain separate borrowed pointers into py/c.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the solution? Is @jmao-denver aware?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a great solution in general, except to cleanly close PyObject basically everywhere - java gc is required to mop up all the unclosed PyObject. This is wrong in enough places in DHC that it isn't worth dedicating effort to it here.

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly happy. A bit concerned about atomicity between listing names and iterating values. A few other nits or questions.

Comment on lines 86 to 87
for (String name : queryScope.getParamNames()) {
Object paramValue = queryScope.unwrapObject(queryScope.readParamValue(name));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, to make the readParamValue results be consistent with getParamNames. Otherwise, you could have a MissingVariableException due to concurrent mutation of the QueryScope.
Feels like a missing method to get values, to be honest.

final Object instance = variables.readParamValue(o.getName(), null);
if (instance != null) {
if (instance instanceof Table) {
return Optional.of(Table.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would want to think long and hard before pushing BaseTable or QueryTable methods into auto-complete, to be honest. Let's not change the status quo in this PR.

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy enough, but for the missing unwraps in Sql and ScopeTickerResolver.

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please merge if nightlies pass.

@niloc132 niloc132 merged commit d3dae15 into deephaven:main Jan 22, 2024
19 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jan 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants