Try to fix reset-project Playwright tests #789

rmunn · 2024-05-07T01:23:31Z

We believe the failures in the Playwright tests are due to NFS caching the directory entry, and thus not noticing that the directory has new contents, resulting in Mercurial thinking the directory is still empty for some time after the zip file upload has completed. It is possible to force NFS to refresh its cache of a directory by doing opendir() and closedir() on that directory. I don't know if C#'s Directory.GetFiles method, which uses the openat() system call rather than opendir() and closedir(), will have the same effect — but it's worth a try.

Update: The other piece of the puzzle appears to be caching on the side of the LexBox API pod's NFS client. NFS writes are cached until one of the following events occurs (scroll down to "The sync mount option" heading):

Memory pressure forces reclamation of system memory resources.
An application flushes file data explicitly with sync(2), msync(2), or fsync(3).
An application closes a file with close(2).
The file is locked/unlocked via fcntl(2).

So doing an opendir/closedir (which actually needs to happen on the hgweb pod, not the lexbox pod) is not enough; we also need to force the lexbox pod's NFS client to send its data to the NFS server. I've implemented that by doing the equivalent of mkdir random-name; rmdir random-name inside the Mercurial repo after the LexBox API pod touches it (e.g. a project reset or FinishReset operation), followed by a call to ls on the hgweb side. So far that seems to be working; more experimentation is needed to see whether the ls is necessary, or whether the mkdir/rmdir is enough. (Or perhaps it truly does need to be the C# equivalent of cp /dev/null random-file; rm random-file so that a close() system call is issued).

Might fix #765.

github-actions · 2024-05-07T01:25:21Z

C# Unit Tests

57 tests ±0 57 ✅ +1 10s ⏱️ ±0s
11 suites ±0 0 💤 ±0
2 files ±0 0 ❌ - 1

Results for commit 6c2b514. ± Comparison against base commit 3c62210.

♻️ This comment has been updated with latest results.

backend/LexBoxApi/Services/HgService.cs

rmunn · 2024-05-07T01:35:53Z

I'm deploying the develop branch (without this PR) to the develop server now. Then I'll run the Playwright tests against it, and verify that they fail. Then I'll deploy this PR branch to the develop server and see if it makes the Playwright tests pass.

github-actions · 2024-05-07T01:42:55Z

UI unit Tests

11 tests 11 ✅ 0s ⏱️
3 suites 0 💤
1 files 0 ❌

Results for commit 6c2b514.

♻️ This comment has been updated with latest results.

rmunn · 2024-05-07T01:46:32Z

Can't deploy the develop branch to the develop server? Huh? It's very strange to me that the "Build hgweb / deploy hgweb / deploy" step failed actions/checkout@v4 can't find the default branch of the repo it was checking out. Perhaps we should change that repo's default branch name back to master instead of main and see what happens.

backend/LexBoxApi/Services/HgService.cs

rmunn · 2024-05-08T04:20:41Z

Was able to find proof that running ls from the lexbox pod after a project reset does not fix the 404s, and I think I know why. I just realized that the NFS directory cache is on the NFS client (in our case, the hgweb pod), not the NFS server (which would be the Rancher storage-manager container). Each NFS client (each pod that mounts a volume) has its own directory cache; that's kind of the whole point of NFS caching, to avoid unnecessary server round-trips. So to avoid 404s coming from hgweb, we need to do the ls on hgweb.

I'm going to remove the Process.Start("ls") and/or Directory.GetFiles() calls from the C# backend, and insert an ls line into the command runner. Then I'll continue testing, to prove whether that solves the problem for good.

hahn-kev · 2024-05-08T05:24:47Z

Was able to find proof that running ls from the lexbox pod after a project reset does not fix the 404s, and I think I know why. I just realized that the NFS directory cache is on the NFS client (in our case, the hgweb pod), not the NFS server (which would be the Rancher storage-manager container). Each NFS client (each pod that mounts a volume) has its own directory cache; that's kind of the whole point of NFS caching, to avoid unnecessary server round-trips. So to avoid 404s coming from hgweb, we need to do the ls on hgweb.

I'm going to remove the Process.Start("ls") and/or Directory.GetFiles() calls from the C# backend, and insert an ls line into the command runner. Then I'll continue testing, to prove whether that solves the problem for good.

yeah that's what I said here: #765 (comment)

rmunn · 2024-05-08T06:11:15Z

yeah that's what I said here: #765 (comment)

Yep. I thought about that last week but wanted to verify whether changes to the C# code were enough. Proved that they weren't. However, I've also managed to prove that running ls on hgweb will fix the project-reset tests. So next I'll work out whether there are any situations where hgweb is the one modifying the repo directories (in ways that would need invalidating cache, i.e. just receiving an hg push wouldn't matter).

BTW, I ran the projectReset.test.ts tests about 30 times, using ls on hgweb, before finally hitting an error -- and it was an HTTP 502 Bad Gateway caused by a network hiccup. So I'm satisfied that commit c87ec67 is fixing the issue.

rmunn · 2024-05-08T06:31:26Z

Commit 5cc036e contains the change I had made locally, but not yet pushed, when I ran the reset-project E2E test 30 times without a failure. (It only works because of commit c87ec67, which makes the ls run whenever you run anything in command-runner.sh).

I think I'm going to push one more commit, adding a new command to the command runner called "refreshdircache". It will not call any hg commands, so its entire purpose is to run that ls and then do nothing. (That's mildly faster than running the tip command and throwing away the result, because we don't have to start an hg process). Once that's done, this will be ready for review.

rmunn · 2024-05-08T09:21:54Z

Marking as draft PR for now because there are more changes I plan for tomorrow, so there's not much point in reviewing it in its current state.

Pull in changes to deleting draft projects, necessary in order to run Playwright E2E tests more than once

If FinishReset gets an upload of an *empty* Mercurial repo, then WaitForRepoEmptyState could wait forever. To catch that situation 99% of the time, we can simply check for the existence of `00changelog.i` in the Mercurial repo store: that file is guaranteed to exist if the repo has commits, so its absence means an empty repo. To ensure belt-and-suspenders, we still need to implement a timeout in the WaitForRepoEmptyState method, but this is a good start.

backend/LexBoxApi/Services/HgService.cs

We now choose a random filename and check its existence before creating it. In theory there's still a chance that another process could create the file before we get a chance to, but in practice that's never going to happen. We also create a directory rather than a file because on the very, very slim chance that another process creates a file between when we check its existence and when we create it, the Directory.Create call will fail. So it's minutely safer.

rmunn · 2024-05-28T09:50:44Z

I replaced the call to sync with creating, and then deleting, an empty directory inside the project repo. I could have used a file, but a directory is slightly safer (in filename-conflict situations that approximately never come up anyway, and by "never" I mean the chances of the server getting hit by a meteor are greater than the chance of the filename conflicting). This seems to work: I'm getting consistent test passes, and the tests aren't taking too long.

I'm cautiously optimistic, and going to take this PR off of draft status because I think I may have finally solved the problem.

rmunn · 2024-05-28T10:26:18Z

NOTE: I've marked this as ready for review, so I don't want to push experimental commits while others might be reviewing this. Therefore, I've created the tmp/playwright-reset-test-experiments branch to experiment with various permutations of the fix in this branch. I'll be pushing to tmp/playwright-reset-test-experiments and deploying it to the develop server, but I plan to leave this branch alone for a while. So you can review it, knowing that it will be stable for a few hours, and won't be constantly getting pushes as I run further tests.

hahn-kev

looks good, though there's some changes you left in that shouldn't have made it in I suspect

backend/LexBoxApi/Services/HgService.cs

backend/Testing/LexCore/Services/HgServiceTests.cs

frontend/src/routes/(authenticated)/project/[project_code]/ResetProjectModal.svelte

hgweb/command-runner.sh

This allows us to uncomment the tests we were skipping earlier, because now we can mock HttpClient and not allow HgService to hit the network during unit tests.

Especially in WaitForRepoEmptyState, we'll want to make sure the function times out after a while so we can't end up locking up the server due to some unforeseen error.

rmunn · 2024-05-29T04:26:40Z

@hahn-kev - Addressed all your review comments. Let me know if there's anything else that needs changing.

hahn-kev

still some things to fix

backend/LexBoxApi/Services/HgService.cs

I forgot that .NET code will throw OperationCanceledException if passed a token that gets canceled. We don't want WaitForRepoEmptyState to throw if its timeout is reached, we want it to return as if it was successful. So we need to catch and suppress that exception. Also passed the token into Task.Delay so that if the timeout is reached during the delay, the delay will cancel immediately instead of running to completion.

hahn-kev

nice work Robin!

Fixes merge conflict in Testing.csproj

rmunn mentioned this pull request May 7, 2024

After uploading a .zip file in project reset, LexEntryCount incorrecly remains 0 #765

Closed

github-advanced-security bot found potential problems May 7, 2024

View reviewed changes

backend/LexBoxApi/Services/HgService.cs Fixed Show fixed Hide fixed

backend/LexBoxApi/Services/HgService.cs Fixed Show fixed Hide fixed

rmunn temporarily deployed to develop May 7, 2024 01:47 — with GitHub Actions Inactive

rmunn had a problem deploying to develop May 7, 2024 01:49 — with GitHub Actions Error

rmunn temporarily deployed to develop May 7, 2024 02:58 — with GitHub Actions Inactive

rmunn had a problem deploying to develop May 7, 2024 02:59 — with GitHub Actions Error

rmunn had a problem deploying to develop May 7, 2024 03:30 — with GitHub Actions Failure

rmunn had a problem deploying to develop May 7, 2024 05:03 — with GitHub Actions Failure

rmunn had a problem deploying to staging May 7, 2024 05:04 — with GitHub Actions Failure

rmunn had a problem deploying to develop May 7, 2024 07:32 — with GitHub Actions Error

rmunn temporarily deployed to develop May 7, 2024 07:39 — with GitHub Actions Inactive

rmunn had a problem deploying to develop May 7, 2024 07:40 — with GitHub Actions Error

rmunn had a problem deploying to develop May 7, 2024 08:04 — with GitHub Actions Failure

rmunn had a problem deploying to develop May 7, 2024 08:54 — with GitHub Actions Failure

hahn-kev requested changes May 7, 2024

View reviewed changes

backend/LexBoxApi/Services/HgService.cs Outdated Show resolved Hide resolved

rmunn force-pushed the chore/try-to-fix-playwright-reset-tests branch from f6165f7 to d3e550d Compare May 8, 2024 03:11

rmunn temporarily deployed to develop May 8, 2024 03:19 — with GitHub Actions Inactive

rmunn temporarily deployed to develop May 8, 2024 04:33 — with GitHub Actions Inactive

rmunn temporarily deployed to develop May 8, 2024 07:36 — with GitHub Actions Inactive

rmunn marked this pull request as draft May 8, 2024 09:22

rmunn force-pushed the chore/try-to-fix-playwright-reset-tests branch from 9691a40 to 87582d0 Compare May 9, 2024 01:38

Merge branch 'develop' into chore/try-to-fix-playwright-reset-tests

6f86d26

Pull in changes to deleting draft projects, necessary in order to run Playwright E2E tests more than once

rmunn temporarily deployed to develop May 28, 2024 08:05 — with GitHub Actions Inactive

Fix deleteProject logic

c5feebe

rmunn temporarily deployed to develop May 28, 2024 08:14 — with GitHub Actions Inactive

rmunn temporarily deployed to develop May 28, 2024 09:31 — with GitHub Actions Inactive

github-advanced-security bot found potential problems May 28, 2024

View reviewed changes

backend/LexBoxApi/Services/HgService.cs Dismissed Show resolved Hide resolved

rmunn marked this pull request as ready for review May 28, 2024 09:50

rmunn requested review from hahn-kev and myieye May 28, 2024 09:50

Remove one final TODO comment

c13c4a7

rmunn temporarily deployed to develop May 28, 2024 10:30 — with GitHub Actions Inactive

hahn-kev requested changes May 28, 2024

View reviewed changes

rmunn added 3 commits May 29, 2024 10:51

Add proper mocking of HttpClient in HgServiceTests

8a55b61

This allows us to uncomment the tests we were skipping earlier, because now we can mock HttpClient and not allow HgService to hit the network during unit tests.

Use cancellation tokens in new HgService code

2c7f2ef

Especially in WaitForRepoEmptyState, we'll want to make sure the function times out after a while so we can't end up locking up the server due to some unforeseen error.

Remove leftover debugging code

a581c38

rmunn requested a review from hahn-kev May 29, 2024 04:26

hahn-kev requested changes May 29, 2024

View reviewed changes

backend/LexBoxApi/Services/HgService.cs Outdated Show resolved Hide resolved

backend/LexBoxApi/Services/HgService.cs Outdated Show resolved Hide resolved

backend/LexBoxApi/Services/HgService.cs Outdated Show resolved Hide resolved

rmunn added 2 commits May 30, 2024 10:51

CancellationTokenSource is IDisposable, so dispose it

8ac1ba4

rmunn requested a review from hahn-kev May 30, 2024 04:05

hahn-kev approved these changes May 30, 2024

View reviewed changes

rmunn mentioned this pull request May 30, 2024

clean up project code validation in HgService #836

Merged

Merge branch 'develop' into chore/try-to-fix-playwright-reset-tests

6c2b514

Fixes merge conflict in Testing.csproj

rmunn merged commit 20ffa08 into develop May 30, 2024
15 of 16 checks passed

myieye deleted the chore/try-to-fix-playwright-reset-tests branch January 14, 2025 11:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Try to fix reset-project Playwright tests #789

Try to fix reset-project Playwright tests #789

rmunn commented May 7, 2024 •

edited

Loading

github-actions bot commented May 7, 2024 •

edited

Loading

rmunn commented May 7, 2024

github-actions bot commented May 7, 2024 •

edited

Loading

rmunn commented May 7, 2024

rmunn commented May 8, 2024

hahn-kev commented May 8, 2024

rmunn commented May 8, 2024

rmunn commented May 8, 2024

rmunn commented May 8, 2024

rmunn commented May 28, 2024

rmunn commented May 28, 2024

hahn-kev left a comment

rmunn commented May 29, 2024

hahn-kev left a comment

hahn-kev left a comment

Try to fix reset-project Playwright tests #789

Try to fix reset-project Playwright tests #789

Conversation

rmunn commented May 7, 2024 • edited Loading

github-actions bot commented May 7, 2024 • edited Loading

C# Unit Tests

rmunn commented May 7, 2024

github-actions bot commented May 7, 2024 • edited Loading

UI unit Tests

rmunn commented May 7, 2024

rmunn commented May 8, 2024

hahn-kev commented May 8, 2024

rmunn commented May 8, 2024

rmunn commented May 8, 2024

rmunn commented May 8, 2024

rmunn commented May 28, 2024

rmunn commented May 28, 2024

hahn-kev left a comment

Choose a reason for hiding this comment

rmunn commented May 29, 2024

hahn-kev left a comment

Choose a reason for hiding this comment

hahn-kev left a comment

Choose a reason for hiding this comment

rmunn commented May 7, 2024 •

edited

Loading

github-actions bot commented May 7, 2024 •

edited

Loading

github-actions bot commented May 7, 2024 •

edited

Loading