adds ssh debugger for selenium test #498

robacarp · 2024-08-23T16:28:21Z

No description provided.

krschacht · 2024-08-23T16:29:35Z

@robacarp I suspect you'll get what you need from this running in the PR, but if you need to merge into main for testing, go right ahead

robacarp · 2024-08-23T23:13:42Z

@krschacht Alright, here's what I think is happening. I don't have a solution yet, but it's a race condition somewhere between chromedriver and selenium. In #483 you specc'd for a bigger runner because there was suspicion that it was a memory issue, but that probably made the problem worse because now ruby boots faster but chromedriver still takes forever? A short term reduction in pain might be to spec-down the runner.

The Net::ReadTimeout is coming from capybara (aka selenium) failing to hit chromedriver when attempting to set up the server. This stack trace is all sorts of interesting:

The tests run (and fail) before puma is started by capybara
The test hung because the server was still running and ruby wouldn't exit
It says the TCP socket was closed -- does this means the socket was open when it started but closed during the exchange? Or that it was never open? I suspect the former because the stack trace is in the middle of a read loop.
The failure is in the area of code which causes chromedriver to build a new session (ie, start chrome up):

   ...
    vendor/bundle/ruby/3.2.0/gems/actionpack-7.1.3.2/lib/action_dispatch/system_testing/driver.rb:55:in `block in register_selenium'
    <internal:kernel>:90:in `tap'
    vendor/bundle/ruby/3.2.0/gems/actionpack-7.1.3.2/lib/action_dispatch/system_testing/driver.rb:54:in `register_selenium'
    vendor/bundle/ruby/3.2.0/gems/actionpack-7.1.3.2/lib/action_dispatch/system_testing/driver.rb:41:in `block in register'
    vendor/bundle/ruby/3.2.0/gems/capybara-3.39.2/lib/capybara/session.rb:105:in `driver'
    vendor/bundle/ruby/3.2.0/gems/capybara-3.39.2/lib/capybara/session.rb:91:in `initialize'
   ...

I added wait-for-it.sh but the problem isn't that chromedriver isn't ready, because it still fails when waitforit passes.

This is also an interesting failure: https://github.com/AllYourBot/hostedgpt/pull/498/checks#step:9:37 It seems like everything booted up in the right order, but the tests just never ran (I cancelled the job after ~3min). It makes me wonder if there's some confusion in capybara connecting to chromedriver, and it's waiting for chromedriver? Is the PORT 9515 assumption in the chromedriver github action plugin wrong? (Or is it not correctly installed?)

krschacht · 2024-08-23T23:17:16Z

@robacarp another big clue in all this is that it all used to work fine, for many months, and then this all started one day. Does that suggest anything else to you or a possible solution?

I even tried rolling back to the point in our commit history where everything was working reliably but it no longer does. I wonder if a dependency has a new version rolled out but if we could pin to an older version this might fix? But rolling back should have done that with gemfile.lock so maybe it’s a dependency that’s somehow outside of that, like part of GitHub action runner config?

robacarp · 2024-08-26T15:00:49Z

I don't think this is anything to do with the code here, unfortunately. I think the fact that it ever worked was luck. I added a DEBUG environment variable to the run here and there's a lot more output from many of the tools in the process, but nothing jumps out.

Another thing I forgot to mention that suggests an inter-process race condition, when I SSH into the job mid-run, it sometimes fails or hangs for a bit. But if I interrupt the process (^c) and then re-run it, it goes fine.

krschacht · 2024-08-26T16:00:36Z

Dang, it still seems really odd that for ~6 months these tests never hit this issue and the one day it starts hitting the issue regularly. That's what makes me think that an updated chromedriver or ubuntu-latest system dependency—something outside of gemfile.lock—changed and that introduced this issue. That's the only way I can explain rolling back the repo and still getting the error.

Also, I never get this issue locally on my dev machine. I only get it in github actions. Initially, I suspected this was because my dev machine has more resources (cores, etc) so that's what prompted me to upgrade the github actions server. And notably, that did help. When I was using the default ubuntu-latest I ran the test suite 10 times in a row and it failed every time. I don't think I ever got it to pass successfully. But when I upgraded to the larger instances then it's now passes about 50% of the time.

In terms of working around this, would it be possible to study the stack trace and find a place within the selenium or capybara code where you could re-open the class, re-write a method to add a simple "sleep 10" or something hacky like that in the right place within the boot cycle to work around the race?

robacarp · 2024-08-26T16:03:04Z

https://github.com/AllYourBot/hostedgpt/pull/498/files#diff-58be79ee432aa1a1e12529b3141879d9c9503ac6e071c47e2d53462d339362b7R13-R21

This sort of works -- but if capybara doesn't start the server until after the tests finish, it still hangs.

robacarp · 2024-08-26T16:04:57Z

Sorry, this -- I didn't realize I rolled back the retry code:

def self.test(name, &block)
  begin
    super(name, &block)
  rescue Net::ReadTimeout => e
    puts "(caught) Net::ReadTimeout: #{e}"
    retry_count ||= 0
    retry_count += 1
    retry if retry_count < 3
  end
end

krschacht · 2024-08-26T16:49:56Z

Is there some place you think you could introduce a delay to guarantee capybara gets the server started?

robacarp · 2024-08-26T17:07:52Z

Not that I've seen. I don't think it's a delay in chromedriver -- I added a wait to ensure that gets started, but it's still failing. It's so strange that it's attempting to run the tests before capybara has started the server.

matthewbennink · 2024-08-26T18:22:51Z

Might be worth looking at Cuprite to see if it may get around the issue. See this article and the cuprite project.

krschacht · 2024-08-27T22:07:43Z

Hi @robacarp, Matthew's suggestion is an interesting one but I also got another lead. I opened an issue on Selenium and someone explained how they resolved this by pinning a chromium version:
teamcapybara/capybara#2770 (comment)

krschacht · 2024-08-28T15:50:36Z

Just to keep capture this idea, on that other Issue someone claims that he solved a bunch of issues by switching to Playwright and linked to an article: teamcapybara/capybara#2770 (comment)

robacarp force-pushed the system-tests-rob branch 2 times, most recently from 3fe08f4 to 9e162a5 Compare August 23, 2024 20:22

rob added 6 commits August 23, 2024 14:36

drop: disable a bunch of workflows to save CI cost for now

518b407

adds ssh debugger for selenium test

ed566d5

disable parallelization of tests

4caedbc

drop to a debugger on net::timeout

fce2e3d

send chromedriver output to a logfile

5bf1b26

turns on selenium debugging

8ffaac2

robacarp force-pushed the system-tests-rob branch 5 times, most recently from d66f151 to fb396f0 Compare August 23, 2024 21:21

send selenium logs to gha artifacts

d0a02b5

robacarp force-pushed the system-tests-rob branch from fb396f0 to d0a02b5 Compare August 23, 2024 22:04

rob added 5 commits August 23, 2024 16:16

set timeout to 20s

5f456aa

ssh debugging again

6461ff2

attempt to wait-for-it on chromedriver

e4fb462

removes ssh debugger

47ad1cc

manually create a chromedriver session with curl

072e381

robacarp force-pushed the system-tests-rob branch from f44b3a8 to 072e381 Compare August 23, 2024 23:07

krschacht closed this Oct 17, 2024

robacarp deleted the system-tests-rob branch October 17, 2024 17:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adds ssh debugger for selenium test #498

adds ssh debugger for selenium test #498

robacarp commented Aug 23, 2024

krschacht commented Aug 23, 2024

robacarp commented Aug 23, 2024

krschacht commented Aug 23, 2024

robacarp commented Aug 26, 2024

krschacht commented Aug 26, 2024

robacarp commented Aug 26, 2024

robacarp commented Aug 26, 2024

krschacht commented Aug 26, 2024

robacarp commented Aug 26, 2024

matthewbennink commented Aug 26, 2024

krschacht commented Aug 27, 2024

krschacht commented Aug 28, 2024

adds ssh debugger for selenium test #498

adds ssh debugger for selenium test #498

Conversation

robacarp commented Aug 23, 2024

krschacht commented Aug 23, 2024

robacarp commented Aug 23, 2024

krschacht commented Aug 23, 2024

robacarp commented Aug 26, 2024

krschacht commented Aug 26, 2024

robacarp commented Aug 26, 2024

robacarp commented Aug 26, 2024

krschacht commented Aug 26, 2024

robacarp commented Aug 26, 2024

matthewbennink commented Aug 26, 2024

krschacht commented Aug 27, 2024

krschacht commented Aug 28, 2024