-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CDash Report random test failure tool (#600) #603
Conversation
d075467
to
c36530a
Compare
Force push to fix typo in d075467 commit message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks more or less acceptable to me.
Biggest note that I have is that this is VERY complex code that effectively mocks out the responses from the API query library, but does so in such a way that a lot of the code in the library is actually run. In general, I recommend against doing that in unit tests, because you're testing more than one module with the test. A more robust way to do that would be to use the Mock module in Python to directly replace the library functions under test (CDQAR in this specific case). Then again, using Python to run the script via the command line is also a bit uncomfortable to me, as opposed to importing the capability and then testing it directly in Python. I generally do not test command-line parsers. Same reason as above. I'd rather have good unit tests than system tests.
############################################################################# | ||
|
||
|
||
class test_getBuildIdFromTest(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend standard Python class naming (TestGetBuildIdFromTest)
test/ci_support/cdash_analyze_and_report_random_failures_UnitTests.py
Outdated
Show resolved
Hide resolved
) | ||
|
||
|
||
def test_no_random_failure(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a docstring similar to the above one here (describe why it is not a random failure)
u'status': u'Passed', | ||
u'statusclass': u'error', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is one of the (in-theory) contradictions that I was warning you about. I don't think it's a big deal, and probably would leave it alone since you don't use those values. Just a warning about mocking the API responses this way.
Commenting on @sebrowne review above, you need system-level tests for every major use case supported by the tool, just to make sure the command-line arguments are interpreted correctly and you get the desired behavior. But most of the detailed code coverage tests should be below system-level tests (i.e. unit tests). You need to be sparing with system-level tests because they are typically expensive and/or difficult to maintain compared to lower-level (unit-level) tests. Updated: @sebrowne and @achauphan, do you want me to do a detailed code review of this PR? Just taking a quick look, there is really not that much production code but it is not very unit testable the way it is currently written. The entire file |
@bartlettroscoe I do agree that From a "lets get some data now on the problem" perspective, could it be worth a general review and standing it up for Trilinos to get a better picture of the frequency of random test failures and then assess how much more work should go into this? |
@achauphan and @sebrowne, I may be able to use some other funding related to DevOps research and metrics gathering to refactor this code some and generalize it so that it can work for projects other than Trilinos. As currently written, it will only work for Trilinos PR builds. Therefore, I think if we keep the code as is, it may be better (for now) to keep the tool and its tests on a branch or put this code into its own git repo. With that said, I don't think it is a lot of work to do the minimal refactorings needed to make 95% of this generic and keep it in the TriBITS repo and then have a Trilinos-specific driver in a different git repo (could be the main Trilinos repo or a different git repo). What do you think? |
@bartlettroscoe I can't speak for @sebrowne, but do agree that in its current state, this is not at all ready for general use so it should stay as a branch for now. If it's a good idea for me to keep working on this I can, but if you would like to and are able to take this on further, please feel free to do so. I do also agree that this should not too much to refactor and make it more generic. |
Looking over it again for the pieces that are Trilinos-specific, I do not feel they are significant enough to bother generalizing the code until there is a greater use case for doing so, primarily given the relative ease with which it should be possible to separate the general and the specific components. That being said, I think that the argument that splitting it is simple enough to just take care of now is also perfectly valid. I don't have a strong opinion on this matter, as long as this code is in Trilinos and usable very soon. I want to be able to start setting up a Jenkins job with it, and if it needs to target the branch in this repository to do so, that's fine too. Our sprint ends on Monday 02/19 and if this capability is not working and emailing by then, I do not feel that I can justify spending more of the team's time on it. On the subject of |
Is there a reasonable middle ground here for supporting cache-mock tests along with library-function mocking? The argument to utilizing a cache-mock system test would be, as stated earlier by @bartlettroscoe, to test this program's command line args interpretation and check the overall desired behavior of the program at the cost of having to also run the library code as mentioned by @sebrowne. Then utilizing library-function mocking would allow I believe this approach would promote another module file with something of the name of Coming into this, my understanding of testing has been limited, but I think this conversation has been very helpful in shaping the different perceptions around test. What do ya'll think? (Also @bartlettroscoe I would appreciate the refactor suggestions offer you made earlier if you have time) |
@achauphan, what you outline above is exactly what I would recommend and it is what I did for the tool
Yes, I can suggest what to do. But it sounds like @sebrowne wants to start with the minimal work to get this merged and used by Trilinos. For a minimal and easy refactoring (that does not require writing any new tests), what I would suggest would be:
That is a very straightforward refactoring that can be done in stages, using the existing tests to ensure that each refactoring step goes smoothly and with no risk. What is outlined above is exactly what was done for:
The above refactoring can be done in just a few short hours (or less). We can pair-program this together if you like. I have some availability tomorrow. (My SNL Outlook calendar is currently shown to be full but that is just blocking off time for some writing I need to get done for a paper.) |
@bartlettroscoe thanks for the suggestions! I've started a bit on these yesterday before your comment and planned to do a stacked PR. How about I finish going through these suggestions and we review the stacked PR together? |
@achauphan, sounds good. Let me know when you want me to review. |
c8a1a06
to
cefaef4
Compare
Force push to fix codespell |
@sebrowne @bartlettroscoe Requesting another review. The primary changes done since the last request was moving away from the script file These are the changes that I hope are enough to stand up. I did not make any changes related to breaking up |
@achauphan, doing a detailed review now ... |
@achauphan, while I am doing this review, could you please rebase this branch on top of the upstream 'master' target branch? GitHub is reporting the branch is out of date showing: ? If not, I can take care of that after my reveiw. NOTE: This is GitHub being super careful testing the exact version of the repo that will be merged |
Helper module functions that construct the browser and query URLs to cdash that can be used for downloaded the data from.
This initial script implementation takes in several cdash arguments and filters cdash for an initial set of all failing tests for a certain number of days. With that set of all failing tests, the script will then get all of that test's testing history. The test's full testing history is used to build a set of target,topic sha1 associated with failing testing iterations. This initial implementation current lacks the check to see if a passing test's target,topic sha1s exist in the set of failing sha1s, which denotes an unstable test. Monolithic commit as this started from a lot of exploratory coding that eventually built to this starting implementation.
Add checkIfTestUnstable() that takes in a tuple of passing sha1s and a set of tuples containing nonpassing sha1s. This requires testing.
Should help with development. Very rough and Just Got Stuff Working.
Add a set of unit tests for getBuildIdFromTest helper function from cdash_analyze_and_report_random_failure.py
Moved argument parsing into a function that gets called by main and changed getBuildIdFromTest to return the last item of the split string rather than a constant index.
Limited the build name to only 80 characters to shorten the cache file name.
Fix regex pattern to match a string literal rather than a raw json string output which was used prior for during testing.
Moved random failure test files to its own separate folder inside test/ci_support as to not be confused with the test files associated with other script.
At a glance, the names of test cases such as "rft_0_ift_2" are not understandable without knowing what the acronyms mean. The directories of the new test case names will continue to use the acronyms as that better depicts the contents of the test files present.
The context of the script is a cdash tool so most of the variable names do not need that additional context in their names.
#600) Create driver class CDashAnalyzeReportRandomFailuresDriver inside module file CDashAnalyzeReportRandomFailures.py that will contain the main general functionality of the random test failure tool. The driver class accepts two strategy classes passed from the example script. These strategy classes ExampleVersionInfoStrategy and ExampleBuildNameStrategy contain the project specific implementation that is generically used inside of the driver class.
#600) This large commit is copying over the main() function and its associated helper functions into CDashAnalyzeReportRandomFailures.py inside the CDashAnalyzeReportRandomFailuresDriver class. This is part of the effort to refactor cdash_analyze_and_report_random_failures.py to be more generic.
…600) There were mixed use cases of 'targetTopic' or 'topicTarget', this renames all cases to use 'targetTopic' approach.
Moved example_cdash_analyze_and_report_random_failures.py to test/ci_support
Trilinos specific driver `trilinos_cdash_analyze_and_report_random_failures.py` based on `example_cdash_analyze_and_report_random_failures.py` that contains the Trilinos specific implementations of `VersionInfoStrategy` and `ExtractBuildNameStrategy`.
Example class did not include the 'Example' prefix.
This is for testing the CDashAnalyzeReportRandomFailures.py runDriver().
Adjusted spacing between classes and added newline character at the end of file.
This reverts commit dbe94f4. Reverting this commit as this specific driver implementation shouldn't be existing inside of TriBITS. Rather it should be added to the Trilinos repo after snapshotting TriBITS in.
Deleted the original `cdash_analyze_and_report_random_failures.py` script after moving its main functionality into a separate class inside `CDashAnalyzeReportRandomFailures.py`. To run the script, one must start from `example_cdash_analyze_and_report_random_failures.py` located in `test/ci_support` and supply an implementation of the two strategy objects used by the `CDashAnalyzeReportRandomFailures.py` driver class.
Removed unit tests related to the old script, `cdash_analyze_and_report_random_failures.py`. These tests will be put back as unittests for the module file `CDashAnalyzeReportRandomFailures.py`. This change will keep `cdash_analyze_and_report_random_failures_UnitTests.py` focused on the system tests for how the class `CDashAnalyzeReportRandomFailuresDriver` is used.
Added tests for `CDashAnalyzeReportRandomFailuresDriver` member functions in `CDashAnalyzeReportRandomFailures.py`.
Previous filename compression technique was to always trim the buildname to only the first 80 characters as to avoid "filename too long" errors. Cache file or directory names are built in the format of `testName_buildName` The above method does not protect against the case where testName may be very long. This implementation uses an existing function named `getCompressedFileNameIfTooLong` in `CDashQueryAnalyzeReport.py` module file which will form a hash of the passed in string if it is deamed too long. This will also help mitigate the chances of a filename collision as previously it was possible for a trimmed buildName to result in the same `testName_buildName` filename if testName was the same test and had the correct length.
Optional usageHelp string that can be passed to `CDashAnalyzeReportRandomFailuresDriver` that is outputted with when the main script is given the `--help` argument.
Used to specify the testing day start time unique to each CDash project.
9861650
to
5e5bd79
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@achauphan, I did not have time for a line-by-line review of the test driver test/ci_support/CDashAnalyzeReportRandomFailures_UnitTests.py
but I looked over the production code in the module CDashAnalyzeReportRandomFailures.py
and I did not see any big problems. I only have suggestions for improvements that can be done in future PRs during future work.
I think this is good to merge as is and get used in Trilinos.
@@ -0,0 +1,288 @@ | |||
# @HEADER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this file should be changed to example_cdash_analyze_and_report_random_failures_UnitTests.py
? The file/module cdash_analyze_and_report_random_failures.py
does not exist anymore. Or, you could call this system-level driver something like CDashAnalyzeReportRandomFailures_SystemLevelTests.py
? (But the name example_cdash_analyze_and_report_random_failures_UnitTests.py
might be more obvious?)
IMHO, the more detailed unit tests for CDashAnalyzeReportRandomFailures.py
should be put in the driver CDashAnalyzeReportRandomFailures_UnitTests.py
. It is nice to have very fast unit tests that are easer to debug in their own driver separate from the more expensive system-level tests driver. So when you are making changes, you run the faster CDashAnalyzeReportRandomFailures_UnitTests.py
and resolve any errors in those tests first before digging into the failures in the system-level tests. That is just my opinion and I find this separation works well from experience.
test/ci_support/cdash_analyze_and_report_random_failures_UnitTests.py
Outdated
Show resolved
Hide resolved
def getTargetTopicSha1s(self, buildData): | ||
pattern = r"Parent [12]:\n\s+(\w+)" | ||
matchedList = regex.findall(pattern, buildData) | ||
|
||
if len(matchedList) != 2: return None | ||
return tuple(matchedList) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, we may need to refactor this Strategy API to allow more flexibility in how the version info is extracted from CDash. For example, for a raw CMake project, the CMake configure output will not contain the version info and you will need to get that info in some other way (either directly or indirectly through other CDash REST API calls). Therefore, for the more general use case, we would need to pass in an object containing enough info to call CDash and get the needed data. But we can wait on that.
But note that even once Trilinos moves to GitHub Actions, this current implementation will still correctly extract the topic and target branch SHA1 info since the top commit will still be a merge commit. (But in the case of GHA, the merge commit is created by GitHub and that merge commit is just checked out locally instead of created locally.)
def checkTargetTopicRandomFailure(self, targetTopicPair, knownTargetTopicPairs): | ||
return targetTopicPair in knownTargetTopicPairs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this function is needed in the strategy object. I think as long as the object returned from the getTargetTopicSha1s()
function is comparable (which you can specify as a requirement for the strategy interface), then this is perfectly general code for all implementations that I can image.
Therefore, I would remove this function from here and put it back as a free function called by the CDashAnalyzeReportRandomFailuresDriver.runDriver()
function.
buildConfigOutput = self.downloadBuildSummaryOffCDash( | ||
buildSummaryQueryUrl, buildSummaryCacheFile, verbose=printUrlMode =='all', | ||
alwaysUseCacheFileIfExists=True)['configure']['output'] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needing to call ['configure']['output']
at this top-level driver code seems to be too much detail about how the CDash data-structure works? A future refactoring might consider providing the wrapper function downloadConfigureOutputOffCDash()
that takes the same arguments and just calls:
return downloadBuildSummaryOffCDash( ... args ...)['configure']['output']
And the function downloadConfigureOutputOffCDash()
should be in the module CDashQueryAnalyzeReport.py
as well.
buildSummaryCacheFile = buildSummaryCacheDir+"/"+buildId | ||
buildSummaryQueryUrl = CDQAR.getCDashBuildSummaryQueryUrl(cdashSiteUrl, buildId) | ||
buildConfigOutput = self.downloadBuildSummaryOffCDash( | ||
buildSummaryQueryUrl, buildSummaryCacheFile, verbose=printUrlMode =='all', | ||
alwaysUseCacheFileIfExists=True)['configure']['output'] | ||
|
||
passingSha1Pair = \ | ||
self.versionInfoStrategy.getTargetTopicSha1s(buildConfigOutput) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in my review of the file test/ci_support/example_cdash_analyze_and_report_random_failures.py
above, a future refactoring should move all of this into a refactored strategy interface that takes buildId
and returns the versionInfo
object. (And the versionInfo
object just must be printable and it must be compatible.) So the refactored code would be:
passingVersionInfo = self.versionInfoStrategy.getTargetTopicSha1s(buildId)
Then other aways to get the "version" for a build for different projects could be devised, as I mention in my review of the file test/ci_support/example_cdash_analyze_and_report_random_failures.py` above.
print("\nWriting HTML to file: "+writeEmailToFile+" ...") | ||
htmlStr = CDQAR.getFullCDashHtmlReportPageStr(cdashReportData, | ||
pageTitle=summaryLine, pageStyle=defaultPageStyle) | ||
# print(htmlStr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out debug print statement?
|
||
if sendEmailTo: | ||
htmlStr = CDQAR.getFullCDashHtmlReportPageStr(cdashReportData, | ||
pageStyle=defaultPageStyle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this indent correct? Should this line not be indented one more level (i.e. 2 spaces)?
msg=CDQAR.createHtmlMimeEmail( | ||
sendEmailFrom, emailAddress, summaryLine, "", htmlStr) | ||
CDQAR.sendMineEmail(msg) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For long functions like this, I sometimes provide a comment indented at the same level as the function prototype with:
# end runDriver()
so that you know your context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@achauphan, I did not have time for a line-by-line review of the test driver but I looked over the production code in the module CDashAnalyzeReportRandomFailures.py
and I did not see any big problems. I only have suggestions for improvements that can be done in future PRs during future work.
I think this is good to merge as is and get used in Trilinos.
@achauphan, I am okay to merge as is. Let me know and I will hit the merge button. |
@bartlettroscoe thanks for the review and suggestions! Hopefully there is time in the future to implement them. Please go ahead and merge :) If you'd like, I can give snapshotting this into Trilinos a try tomorrow, assuming that this is the correct guide. |
Close, but we use a separate branch to snapshot into and then and then merge that into the Trilinos 'develop' branch so as to not copy over local changes to TriBITS Trilinos. I will do it this time right now and will write instructions later for next time. |
WIP: This PR is the implementation of a CDash Random Test Failure analysis tool related to: