Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDash Report random test failure tool (#600) #603

Merged
merged 53 commits into from
Feb 14, 2024

Conversation

achauphan
Copy link
Collaborator

@achauphan achauphan commented Feb 6, 2024

WIP: This PR is the implementation of a CDash Random Test Failure analysis tool related to:

@achauphan achauphan force-pushed the 600-cdash-random-failure-tool branch from d075467 to c36530a Compare February 6, 2024 17:22
@achauphan
Copy link
Collaborator Author

Force push to fix typo in d075467 commit message

Copy link
Contributor

@sebrowne sebrowne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks more or less acceptable to me.

Biggest note that I have is that this is VERY complex code that effectively mocks out the responses from the API query library, but does so in such a way that a lot of the code in the library is actually run. In general, I recommend against doing that in unit tests, because you're testing more than one module with the test. A more robust way to do that would be to use the Mock module in Python to directly replace the library functions under test (CDQAR in this specific case). Then again, using Python to run the script via the command line is also a bit uncomfortable to me, as opposed to importing the capability and then testing it directly in Python. I generally do not test command-line parsers. Same reason as above. I'd rather have good unit tests than system tests.

#############################################################################


class test_getBuildIdFromTest(unittest.TestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend standard Python class naming (TestGetBuildIdFromTest)

)


def test_no_random_failure(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a docstring similar to the above one here (describe why it is not a random failure)

Comment on lines +12 to +13
u'status': u'Passed',
u'statusclass': u'error',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is one of the (in-theory) contradictions that I was warning you about. I don't think it's a big deal, and probably would leave it alone since you don't use those values. Just a warning about mocking the API responses this way.

@bartlettroscoe
Copy link
Member

bartlettroscoe commented Feb 7, 2024

Commenting on @sebrowne review above, you need system-level tests for every major use case supported by the tool, just to make sure the command-line arguments are interpreted correctly and you get the desired behavior. But most of the detailed code coverage tests should be below system-level tests (i.e. unit tests). You need to be sparing with system-level tests because they are typically expensive and/or difficult to maintain compared to lower-level (unit-level) tests.

Updated: @sebrowne and @achauphan, do you want me to do a detailed code review of this PR? Just taking a quick look, there is really not that much production code but it is not very unit testable the way it is currently written. The entire file tribits/ci_support/cdash_analyze_and_report_random_failures.py is only 342 lines long, but it is mostly a small class and some small functions and then a large main() function, so it would be difficult unit test this. If it were me, I would break that long main() function up into small functions and/or classes and have unit tests on the pieces. Those unit tests are much easier to write and maintain and might allow fewer system-level tests. In fact, if you develop the code with TDD, the unit tests come for free and you have naturally modular code. I can suggest how to refactor this code and make it more modular and unit testable if there is interest in that. Just let me know.

@bartlettroscoe bartlettroscoe changed the title CDash Report random test failure tool CDash Report random test failure tool (#600) Feb 7, 2024
@achauphan
Copy link
Collaborator Author

@bartlettroscoe I do agree that main() needs to broken down into smaller units and I would love and appreciate a more detailed review, however, I think this is @sebrowne's call for if I should put more time into this as I've already taken too long on the story.

From a "lets get some data now on the problem" perspective, could it be worth a general review and standing it up for Trilinos to get a better picture of the frequency of random test failures and then assess how much more work should go into this?

@bartlettroscoe
Copy link
Member

@achauphan and @sebrowne,

I may be able to use some other funding related to DevOps research and metrics gathering to refactor this code some and generalize it so that it can work for projects other than Trilinos. As currently written, it will only work for Trilinos PR builds. Therefore, I think if we keep the code as is, it may be better (for now) to keep the tool and its tests on a branch or put this code into its own git repo.

With that said, I don't think it is a lot of work to do the minimal refactorings needed to make 95% of this generic and keep it in the TriBITS repo and then have a Trilinos-specific driver in a different git repo (could be the main Trilinos repo or a different git repo).

What do you think?

@achauphan
Copy link
Collaborator Author

@bartlettroscoe I can't speak for @sebrowne, but do agree that in its current state, this is not at all ready for general use so it should stay as a branch for now. If it's a good idea for me to keep working on this I can, but if you would like to and are able to take this on further, please feel free to do so. I do also agree that this should not too much to refactor and make it more generic.

@sebrowne
Copy link
Contributor

sebrowne commented Feb 8, 2024

@achauphan @bartlettroscoe

Looking over it again for the pieces that are Trilinos-specific, I do not feel they are significant enough to bother generalizing the code until there is a greater use case for doing so, primarily given the relative ease with which it should be possible to separate the general and the specific components. That being said, I think that the argument that splitting it is simple enough to just take care of now is also perfectly valid. I don't have a strong opinion on this matter, as long as this code is in Trilinos and usable very soon. I want to be able to start setting up a Jenkins job with it, and if it needs to target the branch in this repository to do so, that's fine too. Our sprint ends on Monday 02/19 and if this capability is not working and emailing by then, I do not feel that I can justify spending more of the team's time on it.

On the subject of main() splitting, I do not see the value if the cache-mock mechanism will still be used (because unit testing for the code that is currently in main() is prohibited much more by the lack of library-function mocking than it is by all being in one function).

@achauphan
Copy link
Collaborator Author

@sebrowne @bartlettroscoe

Is there a reasonable middle ground here for supporting cache-mock tests along with library-function mocking?

The argument to utilizing a cache-mock system test would be, as stated earlier by @bartlettroscoe, to test this program's command line args interpretation and check the overall desired behavior of the program at the cost of having to also run the library code as mentioned by @sebrowne.

Then utilizing library-function mocking would allowmain() to be broken down to its own smaller testable units, as mentioned by @sebrowne, which we all seem to think is a needed.

I believe this approach would promote another module file with something of the name of CDashRandomFailures.py that holds smaller units of main() and can further house the generic strategy objects mentioned here.

Coming into this, my understanding of testing has been limited, but I think this conversation has been very helpful in shaping the different perceptions around test.

What do ya'll think?

(Also @bartlettroscoe I would appreciate the refactor suggestions offer you made earlier if you have time)

@bartlettroscoe
Copy link
Member

Is there a reasonable middle ground here for supporting cache-mock tests along with library-function mocking?

@achauphan, what you outline above is exactly what I would recommend and it is what I did for the tool cdash_analyze_and_report.py. You will see there are a lot of unit tests for code called by that tool that typically never create files or touch the filesystem.

Also @bartlettroscoe I would appreciate the refactor suggestions offer you made earlier if you have time.

Yes, I can suggest what to do. But it sounds like @sebrowne wants to start with the minimal work to get this merged and used by Trilinos. For a minimal and easy refactoring (that does not require writing any new tests), what I would suggest would be:

  • Create a new Python class (called CDashAnalyzeReportRandomFailures?) that accepts two Strategy objects (one to modify build names and the other to extract the version info for a given build).
  • Move the existing main() function to a member function CDashAnalyzeReportRandomFailures::runDriver().
  • Create a test/example tribits/ci_support/example_cdash_analyze_and_report_random_failures.py that uses CDashAnalyzeReportRandomFailures and simple Strategy objects to match the existing tests (that can be the ones for Trilinos, that is fine).
  • Change the system-level tests in TriBITS to call example_cdash_analyze_and_report_random_failures.py (with the same exact data)
  • Create a new driver function in Trilinos called trilinos_cdash_analyze_and_report_random_failures.py that uses CDashAnalyzeReportRandomFailures and strategy objects specific for Trilinos (which initially would just be a direct copy of example_cdash_analyze_and_report_random_failures.py).

That is a very straightforward refactoring that can be done in stages, using the existing tests to ensure that each refactoring step goes smoothly and with no risk.

What is outlined above is exactly what was done for:

The above refactoring can be done in just a few short hours (or less). We can pair-program this together if you like. I have some availability tomorrow. (My SNL Outlook calendar is currently shown to be full but that is just blocking off time for some writing I need to get done for a paper.)

@achauphan
Copy link
Collaborator Author

@bartlettroscoe thanks for the suggestions! I've started a bit on these yesterday before your comment and planned to do a stacked PR. How about I finish going through these suggestions and we review the stacked PR together?

@bartlettroscoe
Copy link
Member

@bartlettroscoe thanks for the suggestions! I've started a bit on these yesterday before your comment and planned to do a stacked PR. How about I finish going through these suggestions and we review the stacked PR together?

@achauphan, sounds good. Let me know when you want me to review.

achauphan added a commit that referenced this pull request Feb 10, 2024
…stack

Stack PR with generic implementation for PR #603 (#600)
@achauphan
Copy link
Collaborator Author

Force push to fix codespell

@achauphan
Copy link
Collaborator Author

achauphan commented Feb 13, 2024

@sebrowne @bartlettroscoe Requesting another review.

The primary changes done since the last request was moving away from the script file cdash_analyze_and_report_random_failures.py and into using a module file CDashAnalyzeReportRandomFailures.py with a driver class CDashAnalyzeReportRandomFailuresDriver containing the same contents of the original script. Running the script will be done from a project specific script based on example_cdash_analyze_and_report_random_failures.py. The example script currently contains placeholder strategy classes that are the ones that would be used for Trilinos.

These are the changes that I hope are enough to stand up. I did not make any changes related to breaking up main(), or in this case runDriver(), into smaller testable bits as it was previously deemed good enough for now. That can perhaps be scheduled later down the road.

@bartlettroscoe
Copy link
Member

@achauphan, doing a detailed review now ...

@bartlettroscoe
Copy link
Member

@achauphan, while I am doing this review, could you please rebase this branch on top of the upstream 'master' target branch? GitHub is reporting the branch is out of date showing:

image

?

If not, I can take care of that after my reveiw.

NOTE: This is GitHub being super careful testing the exact version of the repo that will be merged

achauphan and others added 9 commits February 13, 2024 09:56
Helper module functions that construct the browser and query URLs to
cdash that can be used for downloaded the data from.
This initial script implementation takes in several cdash arguments
and filters cdash for an initial set of all failing tests for a
certain number of days. With that set of all failing tests, the script
will then get all of that test's testing history. The test's full testing
history is used to build a set of target,topic sha1 associated with failing
testing iterations.

This initial implementation current lacks the check to see if a
passing test's target,topic sha1s exist in the set of failing sha1s,
which denotes an unstable test.

Monolithic commit as this started from a lot of exploratory coding
that eventually built to this starting implementation.
Add checkIfTestUnstable() that takes in a tuple of passing sha1s and a
set of tuples containing nonpassing sha1s. This requires testing.
Should help with development.  Very rough and Just Got Stuff Working.
Add a set of unit tests for getBuildIdFromTest helper function from
cdash_analyze_and_report_random_failure.py
Moved argument parsing into a function that gets called by main and
changed getBuildIdFromTest to return the last item of the split string rather
than a constant index.
Limited the build name to only 80 characters to shorten the cache file name.
Fix regex pattern to match a string literal rather than
a raw json string output which was used prior for during testing.
Moved random failure test files to its own separate folder inside test/ci_support as to
not be confused with the test files associated with other script.
At a glance, the names of test cases such as "rft_0_ift_2" are not
understandable without knowing what the acronyms mean. The directories
of the new test case names will continue to use the acronyms as that
better depicts the contents of the test files present.
The context of the script is a cdash tool so most of the variable
names do not need that additional context in their names.
#600)

Create driver class CDashAnalyzeReportRandomFailuresDriver inside module file
CDashAnalyzeReportRandomFailures.py that will contain the main general functionality
of the random test failure tool.

The driver class accepts two strategy classes passed from the
example script. These strategy classes ExampleVersionInfoStrategy and
ExampleBuildNameStrategy contain the project specific implementation
that is generically used inside of the driver class.
#600)

This large commit is copying over the main() function and its associated
helper functions into CDashAnalyzeReportRandomFailures.py inside
the CDashAnalyzeReportRandomFailuresDriver class. This is part
of the effort to refactor cdash_analyze_and_report_random_failures.py
to be more generic.
…600)

There were mixed use cases of 'targetTopic' or 'topicTarget', this renames
all cases to use 'targetTopic' approach.
Moved example_cdash_analyze_and_report_random_failures.py to test/ci_support
Trilinos specific driver `trilinos_cdash_analyze_and_report_random_failures.py` based on
`example_cdash_analyze_and_report_random_failures.py` that contains the Trilinos specific
implementations of `VersionInfoStrategy` and `ExtractBuildNameStrategy`.
Example class did not include the 'Example' prefix.
This is for testing the CDashAnalyzeReportRandomFailures.py
runDriver().
Adjusted spacing between classes and added newline character at
the end of file.
This reverts commit dbe94f4.

Reverting this commit as this specific driver implementation shouldn't be
existing inside of TriBITS. Rather it should be added to the Trilinos
repo after snapshotting TriBITS in.
Deleted the original `cdash_analyze_and_report_random_failures.py` script after
moving its main functionality into a separate class inside `CDashAnalyzeReportRandomFailures.py`.

To run the script, one must start from `example_cdash_analyze_and_report_random_failures.py` located
in `test/ci_support` and supply an implementation of the two strategy objects used by the
`CDashAnalyzeReportRandomFailures.py` driver class.
Removed unit tests related to the old script, `cdash_analyze_and_report_random_failures.py`.
These tests will be put back as unittests for the module file `CDashAnalyzeReportRandomFailures.py`.

This change will keep `cdash_analyze_and_report_random_failures_UnitTests.py` focused on
the system tests for how the class `CDashAnalyzeReportRandomFailuresDriver` is used.
Added tests for `CDashAnalyzeReportRandomFailuresDriver` member functions in
`CDashAnalyzeReportRandomFailures.py`.
Previous filename compression technique was to always trim the buildname to
only the first 80 characters as to avoid "filename too long" errors.

Cache file or directory names are built in the format of `testName_buildName`
The above method does not protect against the case where testName may be
very long.

This implementation uses an existing function named `getCompressedFileNameIfTooLong` in
`CDashQueryAnalyzeReport.py` module file which will form a hash of the passed in string
if it is deamed too long.

This will also help mitigate the chances of a filename collision as previously it was
possible for a trimmed buildName to result in the same `testName_buildName` filename
if testName was the same test and had the correct length.
Optional usageHelp string that can be passed to `CDashAnalyzeReportRandomFailuresDriver`
that is outputted with when the main script is given the `--help` argument.
Used to specify the testing day start time unique to each CDash
project.
Copy link
Member

@bartlettroscoe bartlettroscoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@achauphan, I did not have time for a line-by-line review of the test driver test/ci_support/CDashAnalyzeReportRandomFailures_UnitTests.py but I looked over the production code in the module CDashAnalyzeReportRandomFailures.py and I did not see any big problems. I only have suggestions for improvements that can be done in future PRs during future work.

I think this is good to merge as is and get used in Trilinos.

@@ -0,0 +1,288 @@
# @HEADER
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this file should be changed to example_cdash_analyze_and_report_random_failures_UnitTests.py? The file/module cdash_analyze_and_report_random_failures.py does not exist anymore. Or, you could call this system-level driver something like CDashAnalyzeReportRandomFailures_SystemLevelTests.py? (But the name example_cdash_analyze_and_report_random_failures_UnitTests.py might be more obvious?)

IMHO, the more detailed unit tests for CDashAnalyzeReportRandomFailures.py should be put in the driver CDashAnalyzeReportRandomFailures_UnitTests.py. It is nice to have very fast unit tests that are easer to debug in their own driver separate from the more expensive system-level tests driver. So when you are making changes, you run the faster CDashAnalyzeReportRandomFailures_UnitTests.py and resolve any errors in those tests first before digging into the failures in the system-level tests. That is just my opinion and I find this separation works well from experience.

Comment on lines +30 to +35
def getTargetTopicSha1s(self, buildData):
pattern = r"Parent [12]:\n\s+(\w+)"
matchedList = regex.findall(pattern, buildData)

if len(matchedList) != 2: return None
return tuple(matchedList)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, we may need to refactor this Strategy API to allow more flexibility in how the version info is extracted from CDash. For example, for a raw CMake project, the CMake configure output will not contain the version info and you will need to get that info in some other way (either directly or indirectly through other CDash REST API calls). Therefore, for the more general use case, we would need to pass in an object containing enough info to call CDash and get the needed data. But we can wait on that.

But note that even once Trilinos moves to GitHub Actions, this current implementation will still correctly extract the topic and target branch SHA1 info since the top commit will still be a merge commit. (But in the case of GHA, the merge commit is created by GitHub and that merge commit is just checked out locally instead of created locally.)

Comment on lines +37 to +38
def checkTargetTopicRandomFailure(self, targetTopicPair, knownTargetTopicPairs):
return targetTopicPair in knownTargetTopicPairs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this function is needed in the strategy object. I think as long as the object returned from the getTargetTopicSha1s() function is comparable (which you can specify as a requirement for the strategy interface), then this is perfectly general code for all implementations that I can image.

Therefore, I would remove this function from here and put it back as a free function called by the CDashAnalyzeReportRandomFailuresDriver.runDriver() function.

Comment on lines +168 to +171
buildConfigOutput = self.downloadBuildSummaryOffCDash(
buildSummaryQueryUrl, buildSummaryCacheFile, verbose=printUrlMode =='all',
alwaysUseCacheFileIfExists=True)['configure']['output']

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needing to call ['configure']['output'] at this top-level driver code seems to be too much detail about how the CDash data-structure works? A future refactoring might consider providing the wrapper function downloadConfigureOutputOffCDash() that takes the same arguments and just calls:

  return downloadBuildSummaryOffCDash( ... args ...)['configure']['output']

And the function downloadConfigureOutputOffCDash() should be in the module CDashQueryAnalyzeReport.py as well.

Comment on lines +182 to +189
buildSummaryCacheFile = buildSummaryCacheDir+"/"+buildId
buildSummaryQueryUrl = CDQAR.getCDashBuildSummaryQueryUrl(cdashSiteUrl, buildId)
buildConfigOutput = self.downloadBuildSummaryOffCDash(
buildSummaryQueryUrl, buildSummaryCacheFile, verbose=printUrlMode =='all',
alwaysUseCacheFileIfExists=True)['configure']['output']

passingSha1Pair = \
self.versionInfoStrategy.getTargetTopicSha1s(buildConfigOutput)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in my review of the file test/ci_support/example_cdash_analyze_and_report_random_failures.py above, a future refactoring should move all of this into a refactored strategy interface that takes buildId and returns the versionInfo object. (And the versionInfo object just must be printable and it must be compatible.) So the refactored code would be:

  passingVersionInfo = self.versionInfoStrategy.getTargetTopicSha1s(buildId)

Then other aways to get the "version" for a build for different projects could be devised, as I mention in my review of the file test/ci_support/example_cdash_analyze_and_report_random_failures.py` above.

print("\nWriting HTML to file: "+writeEmailToFile+" ...")
htmlStr = CDQAR.getFullCDashHtmlReportPageStr(cdashReportData,
pageTitle=summaryLine, pageStyle=defaultPageStyle)
# print(htmlStr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented-out debug print statement?


if sendEmailTo:
htmlStr = CDQAR.getFullCDashHtmlReportPageStr(cdashReportData,
pageStyle=defaultPageStyle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this indent correct? Should this line not be indented one more level (i.e. 2 spaces)?

msg=CDQAR.createHtmlMimeEmail(
sendEmailFrom, emailAddress, summaryLine, "", htmlStr)
CDQAR.sendMineEmail(msg)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For long functions like this, I sometimes provide a comment indented at the same level as the function prototype with:

  # end runDriver()

so that you know your context.

Copy link
Member

@bartlettroscoe bartlettroscoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@achauphan, I did not have time for a line-by-line review of the test driver but I looked over the production code in the module CDashAnalyzeReportRandomFailures.py and I did not see any big problems. I only have suggestions for improvements that can be done in future PRs during future work.

I think this is good to merge as is and get used in Trilinos.

@bartlettroscoe
Copy link
Member

@achauphan, I am okay to merge as is. Let me know and I will hit the merge button.

@achauphan
Copy link
Collaborator Author

@bartlettroscoe thanks for the review and suggestions! Hopefully there is time in the future to implement them.

Please go ahead and merge :)

If you'd like, I can give snapshotting this into Trilinos a try tomorrow, assuming that this is the correct guide.

@bartlettroscoe bartlettroscoe merged commit 7b14c49 into master Feb 14, 2024
6 checks passed
@bartlettroscoe
Copy link
Member

If you'd like, I can give snapshotting this into Trilinos a try tomorrow, assuming that this is the correct guide.

Close, but we use a separate branch to snapshot into and then and then merge that into the Trilinos 'develop' branch so as to not copy over local changes to TriBITS Trilinos. I will do it this time right now and will write instructions later for next time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants