Don't try killing processes if we already know the command finished + reduce error logging noise on Windows #4231
// and the user is flicking through a bunch of items.
self.throttle = time.Since(startTime) < THROTTLE_TIME && timeToStart > COMMAND_START_THRESHOLD
if err := oscommands.Kill(cmd); err != nil {
	if !strings.Contains(err.Error(), "process already finished") {
When navigating through files in the Files panel, we always enter this Kill branch, no matter how slowly this is done. Even if the previous diff command completed a long time ago, we will still try to kill its (now gone) pid.
There is a string check here to suppress kill failure logs for this reason, but on Windows the string is different (the call fails in GetCreationTime -> GetWindowsHandle with the error "The parameter is incorrect.").
Fix: Don't try killing processes needlessly if we already know the command finished.
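The fix described above can be sketched roughly as follows. This is an illustration only, not lazygit's actual code: the helper name killUnlessDone, the done channel, and the use of "go version" as the child command are all assumptions made for the example. The idea is to record completion once Wait returns and to skip the kill after that, so a stale (and possibly reused) pid is never signalled.

```go
package main

import (
	"fmt"
	"os/exec"
)

// killUnlessDone is a hypothetical helper (not part of lazygit). It calls kill
// only if done has not been closed yet, and reports whether a kill was attempted.
func killUnlessDone(done <-chan struct{}, kill func() error) (bool, error) {
	select {
	case <-done:
		// The command already exited; its pid may belong to another process by now.
		return false, nil
	default:
		return true, kill()
	}
}

func main() {
	// Any short-lived command works here; "go version" is just an assumption.
	cmd := exec.Command("go", "version")
	if err := cmd.Start(); err != nil {
		fmt.Println("could not start command:", err)
		return
	}

	done := make(chan struct{})
	go func() {
		_ = cmd.Wait() // reap the process
		close(done)    // then record that it finished
	}()

	<-done // simulate the user coming back long after the command completed

	attempted, err := killUnlessDone(done, cmd.Process.Kill)
	fmt.Println("kill attempted:", attempted, "error:", err)
}
```

A small window remains in which the process can exit between the check and the kill, so this narrows the problem rather than ruling it out entirely.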
I think this may be the likely culprit behind #3008.
In that thread, the theory proposed is:
- Unrelated process X spawns process Y.
- Process X exits.
- Lazygit spawns a git process with the same pid as X.
- Lazygit kills the running git process and then process Y as collateral due to process Y having ppid = X.
However, this doesn't seem to me like it matches up with the observation in that thread that this usually only happens after lazygit has been running in the background for a while. Why would it be any less likely to happen upon lazygit starting and killing its first git command if it's due to unrelated long-running processes with missing parents?
By contrast, the above issue does match up well with the symptoms. Imagine:
- lazygit is started and runs git diff in process X which completes immediately and exits.
- lazygit is left in the background for several hours, by which time process X's pid has been reused by an unrelated process.
- lazygit is brought back into focus and runs another git diff. It first runs this stop logic, which will kill process X and its children.
Checking the children's creation time won't help here if these are legitimate child processes of unrelated process X.
@@ -269,9 +277,13 @@ func (self *ViewBufferManager) NewCmdTask(start func() (*exec.Cmd, io.Reader), p
refreshViewIfStale()

if err := cmd.Wait(); err != nil {
	// it's fine if we've killed this program ourselves
	if !strings.Contains(err.Error(), "signal: killed") {
When navigating through branches with a lot of commits in the Branches panel, we always enter this error branch. The previous process keeps running indefinitely because of the buffered git log output, so when we run the next git log, the previous process is killed.
There is a string check here to suppress the failure logs for this reason, but on Windows the string is different ("exit status 1").
Fix: Check whether we sent a kill signal and if so, suppress the error logs.
Now that we've made this change, do we still need to check on signal: killed?
I saw another usage of oscommands.Kill in RunAndProcessLines when the command decides it's time to stop reading. I think that could be another source of the kill signal here and should still be suppressed in logs on non-Windows where the string match is possible.
It took me a while to understand this. There's no further change we should make, neither here nor in RunAndProcessLines, is that right? Ready to merge? This is the last PR that blocks the release, so it would be nice to finish it.
Yeah, I'm wondering if we should be doing something like this in RunAndProcessLines:
diff --git a/pkg/commands/oscommands/cmd_obj_runner.go b/pkg/commands/oscommands/cmd_obj_runner.go
index 41fedcfbc..7f209cbc4 100644
--- a/pkg/commands/oscommands/cmd_obj_runner.go
+++ b/pkg/commands/oscommands/cmd_obj_runner.go
@@ -166,6 +166,12 @@ func (self *cmdObjRunner) RunAndProcessLines(cmdObj ICmdObj, onLine func(line st
return err
}
+ done := make(chan struct{})
+ go func() {
+ _ = cmd.Wait()
+ close(done)
+ }()
+
for scanner.Scan() {
line := scanner.Text()
stop, err := onLine(line)
@@ -173,17 +179,25 @@ func (self *cmdObjRunner) RunAndProcessLines(cmdObj ICmdObj, onLine func(line st
return err
}
if stop {
- _ = Kill(cmd)
+ select {
+ case <-done:
+ default:
+ _ = Kill(cmd)
+ }
break
}
}
if scanner.Err() != nil {
- _ = Kill(cmd)
+ select {
+ case <-done:
+ default:
+ _ = Kill(cmd)
+ }
return scanner.Err()
}
- _ = cmd.Wait()
+ <-done
self.log.Infof("%s (%s)", cmdObj.ToString(), time.Since(t))
Just to clarify my earlier comment, I was not suggesting a change to RunAndProcessLines.
- In tasks.go, I added a case <-opts.Stop: branch so Windows won't produce an error log on every arrow key press in the Branches panel.
- A question was asked about whether we still need the string check now that the case <-opts.Stop: branch is added.
- I thought the answer was possibly yes, because this stopping logic in tasks.go (used when you have another queued task) is not the only place that can cause commands to be killed. I also saw another usage in RunAndProcessLines. So the string check could be kept so that the log suppression still works on non-Windows platforms in these specific scenarios. On Windows, you're out of luck.
- Looking at the code again, I think I was just completely mistaken earlier. There should be no cases where an exec.Cmd simultaneously goes through RunAndProcessLines and the function being updated in this PR; it doesn't even make sense as a concept.
I updated the PR to remove the string check. Sorry for the confusion.
Makes sense to me. I'm not super familiar with the intricacies of the concurrency handling of this area of the code, so I'd like @jesseduffield to have a look too. One thought about the commit history: it seems that the two changes in this commit are independent, so I'd prefer them to be two separate commits. And your github comments explaining the rationale for them could then be the commit messages, so that people blaming the code in the future don't have to go to github to learn about why the changes were made.
Sounds plausible to me. Shouldn't the PR title be reworded then? It's no longer just about reducing logging noise, but about fixing an actual bug.
4c9731d to 7d50692
Nice, thanks for cleaning up the history. LGTM, I think this is ready to merge, but I'd still like @jesseduffield to have a look too.
Good catch. One question
Now that we've made this change, do we still need to check on signal: killed?
This may lead to unrelated processes being killed on Windows (jesseduffield#3008). Imagine:
1. lazygit is started and runs git diff in process X, which completes immediately and exits.
2. lazygit is left in the background for several hours, by which time process X's pid has been reused by an unrelated process.
3. lazygit is brought back into focus and runs another git diff. It first runs this stop logic, which will kill process X and its children.
There is a string check here to suppress the failure logs due to this reason but on Windows, the string is different ("exit status 1").
7d50692 to 71c07af
I was tinkering around with the code and then checking associated logs but even with LOG_LEVEL=error, I found there was a lot of noise on Windows.
This PR fixes two such sources:
More details in the comments below.