Add tool for analyzing and reporting random CDash test failures #600
Comments
Added an initial set of arguments for the script to accept when run.
…ub#600) Helper module functions that construct the browser and query URLs to CDash that can be used to download the data.
…#600) This initial script implementation takes in several CDash arguments and filters CDash for an initial set of all failing tests over a given number of days. For each test in that set, the script then gets the test's full testing history. The full testing history is used to build a set of (target, topic) sha1 pairs associated with failing testing iterations. This initial implementation currently lacks the check to see whether a passing test's target and topic sha1s exist in the set of failing sha1s, which denotes an unstable test. This is a monolithic commit because it started from a lot of exploratory coding that eventually built up to this starting implementation.
…#600) Add checkIfTestUnstable() that takes in a tuple of passing sha1s and a set of tuples containing nonpassing sha1s. This requires testing.
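The check described in the two commits above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `collect_nonpassing_sha1_pairs` and `is_test_unstable`, and the dictionary keys `status`, `target_sha1`, and `topic_sha1`, are assumptions made for the example.

```python
def collect_nonpassing_sha1_pairs(test_history):
    """Build the set of (target, topic) SHA1 pairs seen in non-passing runs."""
    return {
        (run["target_sha1"], run["topic_sha1"])
        for run in test_history
        if run["status"] != "Passed"
    }


def is_test_unstable(passing_sha1_pair, nonpassing_sha1_pairs):
    """A passing run whose SHA1 pair also appears among the non-passing runs
    means the result changed without any change to either branch."""
    return passing_sha1_pair in nonpassing_sha1_pairs


# Hypothetical test history; the field names are illustrative only.
history = [
    {"status": "Failed", "target_sha1": "aaa111", "topic_sha1": "bbb222"},
    {"status": "Passed", "target_sha1": "aaa111", "topic_sha1": "bbb222"},
    {"status": "Passed", "target_sha1": "ccc333", "topic_sha1": "bbb222"},
]
nonpassing = collect_nonpassing_sha1_pairs(history)
```

Storing the non-passing pairs in a set keeps each membership check cheap, which matters when a test history spans many builds.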
Add a set of unit tests for getBuildIdFromTest helper function from cdash_analyze_and_report_random_failure.py
Moved argument parsing into a function that gets called by main and changed getBuildIdFromTest to return the last item of the split string rather than a constant index.
Limited the build name to only 80 characters to shorten the cache file name.
Fix regex pattern to match a string literal rather than the raw JSON string output that was previously used during testing.
…#600) Moved random failure test files to their own separate folder inside test/ci_support so as not to be confused with the test files associated with the other script.
Test layout was copied from another script. Renamed various functions and function calls to reflect the actual script name that is being tested.
Included summary output of analysis run and found randomly failing tests.
Initial test case for 1 passing and 1 failing test in a test history with identical sha1s between the two, signifying a failing test
CC: @sebrowne @achauphan, one thing that occurred to me is that this tool will need to allow the use of a build-name modifier that takes in the build name from CDash and provides a name used to determine sequential builds for the Trilinos PR and nightly testing system. For example, all of the Trilinos build names have the prefix
are a sequence of the same build, but CDash does not actually recognize that because the build names are different. To identify a related sequence of builds, you need to at least remove the suffix
Then, if the target and topic branches are the same, and if a test goes from passing to failing, then you can classify this as a random test failure. You can provide the means for adjusting the build names using the Strategy Design Pattern. So the two areas of variability for such a tool that will be project-specific (and therefore need to be abstracted out and pulled in as Strategy objects) are:
Those can be two separate strategy objects given to the Python class(es) that are doing the data processing and analysis.
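As a rough sketch of how two strategy objects could be handed to an analysis class: the example below assumes the two project-specific pieces are build-name normalization and version (target/topic SHA1) extraction. All class names, method names, dictionary keys, and build names here are hypothetical and chosen only for illustration.

```python
import re


class RegexBuildNameStrategy:
    """Hypothetical strategy: map a CDash build name to a sequence name."""

    def __init__(self, suffix_pattern):
        self._suffix = re.compile(suffix_pattern)

    def sequence_name(self, build_name):
        # Strip the per-iteration suffix so related builds share one name.
        return self._suffix.sub("", build_name)


class DictVersionInfoStrategy:
    """Hypothetical strategy: extract the (target, topic) SHA1 pair."""

    def version_pair(self, run):
        return (run["target_sha1"], run["topic_sha1"])


class RandomFailureAnalyzer:
    """Generic analysis; project specifics come from the two strategies."""

    def __init__(self, build_name_strategy, version_info_strategy):
        self._names = build_name_strategy
        self._versions = version_info_strategy

    def find_random_failures(self, runs):
        statuses_by_key = {}
        for run in runs:
            key = (
                self._names.sequence_name(run["build_name"]),
                self._versions.version_pair(run),
            )
            statuses_by_key.setdefault(key, set()).add(run["status"])
        # Same build sequence and same SHA1s, yet both outcomes were seen.
        return sorted(
            key for key, statuses in statuses_by_key.items()
            if {"Passed", "Failed"} <= statuses
        )


# Made-up runs; "-<number>" stands in for a project-specific suffix.
runs = [
    {"build_name": "gcc-build-101", "status": "Passed", "target_sha1": "t1", "topic_sha1": "p1"},
    {"build_name": "gcc-build-102", "status": "Failed", "target_sha1": "t1", "topic_sha1": "p1"},
    {"build_name": "gcc-build-103", "status": "Failed", "target_sha1": "t2", "topic_sha1": "p1"},
]
analyzer = RandomFailureAnalyzer(RegexBuildNameStrategy(r"-\d+$"), DictVersionInfoStrategy())
random_failures = analyzer.find_random_failures(runs)
```

The analyzer never inspects the build name format or the version fields directly, so a different project would only need to supply its own two strategy objects.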
Each test in a failing test's history will have its own testname_buildname directory inside of the build_summary_cache directory. This was done to better group build summary cache files with their associated test and build names.
Added a not random failure system test case beginning from one failed test and 5 tests in its history where all tests contain merge commits with non-matching parents.
Renamed variables and cache files to be shorter in cases where the expected source or direction is related to CDash or CDash tests.
Use dummy strings that are more easily identifiable for test input files for parent commit hashes of the respective build summary output.
…TSPub#600) Normalize the groupName string before it is used in a URL request rather than requiring the user to input an already normalized string. This has the added benefit of making the groupName string available without URL-normalized characters for output in upcoming summary lines.
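One way to do this kind of normalization is with the standard library's percent-encoding, sketched below. The helper name `normalize_for_url` is hypothetical, and the actual commit may normalize the string differently.

```python
from urllib.parse import quote


def normalize_for_url(group_name):
    """Percent-encode a human-readable group name for use in a URL query."""
    return quote(group_name, safe="")


group_name = "Pull Request"                    # kept readable for summary lines
url_group_name = normalize_for_url(group_name)  # used only in the request URL
```

Keeping both forms lets the readable name appear in reports while only the encoded one is placed in the request.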
Function used 4 spaces instead of 2.
Removed individual printing of RandomFailureSummary in-line and instead use a str() function for RandomFailureSummary object
Added functionality to build an html file containing analysis results and the ability to report the results via email.
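A minimal sketch of building an HTML report and wrapping it in an email with Python's standard library is shown below. The function name `build_report_email`, the addresses, and the test names are assumptions for illustration; the tool's actual report layout and sending mechanism are not shown in this thread.

```python
from email.message import EmailMessage
from html import escape


def build_report_email(subject, sender, recipient, failing_tests):
    """Return an EmailMessage with a plain-text body and an HTML alternative."""
    items = "".join("<li>%s</li>" % escape(name) for name in failing_tests)
    html_body = "<html><body><h3>%s</h3><ul>%s</ul></body></html>" % (
        escape(subject),
        items,
    )
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(failing_tests))        # plain-text fallback
    msg.add_alternative(html_body, subtype="html")   # HTML version of the report
    return msg


# Placeholder addresses and test names.
msg = build_report_email(
    "Random test failures",
    "ci@example.com",
    "ops@example.com",
    ["test_a", "test_b<1>"],
)
```

The message could then be handed to `smtplib.SMTP.send_message()`; that step is left out here because it depends on the mail server available to the project.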
…600) Added cdash_analyze_and_report_random_failures_UnitTests ctest test using tribits_add_advanced_test().
This check was left in from an initial starting script. After adding cdash_analyze_and_report_random_failures_UnitTests.py as a ctest test, this check would cause CI ctest runs to fail as TRIBITS_DIR is set on a project basis during testing.
At a glance, the names of test cases such as "rft_0_ift_2" are not understandable without knowing what the acronyms mean. The directories of the new test case names will continue to use the acronyms as that better depicts the contents of the test files present.
The context of the script is a cdash tool so most of the variable names do not need that additional context in their names.
#600) Create driver class CDashAnalyzeReportRandomFailuresDriver inside module file CDashAnalyzeReportRandomFailures.py that will contain the main general functionality of the random test failure tool. The driver class accepts two strategy classes passed from the example script. These strategy classes ExampleVersionInfoStrategy and ExampleBuildNameStrategy contain the project specific implementation that is generically used inside of the driver class.
#600) This large commit is copying over the main() function and its associated helper functions into CDashAnalyzeReportRandomFailures.py inside the CDashAnalyzeReportRandomFailuresDriver class. This is part of the effort to refactor cdash_analyze_and_report_random_failures.py to be more generic.
…600) There was mixed usage of 'targetTopic' and 'topicTarget'; this renames all cases to use the 'targetTopic' approach.
Moved example_cdash_analyze_and_report_random_failures.py to test/ci_support
Trilinos specific driver `trilinos_cdash_analyze_and_report_random_failures.py` based on `example_cdash_analyze_and_report_random_failures.py` that contains the Trilinos specific implementations of `VersionInfoStrategy` and `ExtractBuildNameStrategy`.
Example class did not include the 'Example' prefix.
This is for testing the CDashAnalyzeReportRandomFailures.py runDriver().
Adjusted spacing between classes and added newline character at the end of file.
This reverts commit dbe94f4. Reverting this commit because this specific driver implementation shouldn't exist inside of TriBITS. Rather, it should be added to the Trilinos repo after snapshotting TriBITS in.
Deleted the original `cdash_analyze_and_report_random_failures.py` script after moving its main functionality into a separate class inside `CDashAnalyzeReportRandomFailures.py`. To run the script, one must start from `example_cdash_analyze_and_report_random_failures.py` located in `test/ci_support` and supply an implementation of the two strategy objects used by the `CDashAnalyzeReportRandomFailures.py` driver class.
Removed unit tests related to the old script, `cdash_analyze_and_report_random_failures.py`. These tests will be put back as unittests for the module file `CDashAnalyzeReportRandomFailures.py`. This change will keep `cdash_analyze_and_report_random_failures_UnitTests.py` focused on the system tests for how the class `CDashAnalyzeReportRandomFailuresDriver` is used.
Added tests for `CDashAnalyzeReportRandomFailuresDriver` member functions in `CDashAnalyzeReportRandomFailures.py`.
The previous filename compression technique was to always trim the buildname to only the first 80 characters so as to avoid "filename too long" errors. Cache file or directory names are built in the format of `testName_buildName`. The above method does not protect against the case where testName may be very long. This implementation uses an existing function named `getCompressedFileNameIfTooLong` in the `CDashQueryAnalyzeReport.py` module file, which will form a hash of the passed-in string if it is deemed too long. This will also help reduce the chance of a filename collision, as previously it was possible for a trimmed buildName to result in the same `testName_buildName` filename if testName was the same test and had the correct length.
Optional usageHelp string that can be passed to `CDashAnalyzeReportRandomFailuresDriver` and is output when the main script is given the `--help` argument.
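The general idea of hashing an over-long name, and why it avoids the collisions that trimming allows, can be sketched as follows. This is not the signature or implementation of `getCompressedFileNameIfTooLong`; the helper `shorten_name_if_too_long`, its `max_len` default, and the choice of SHA1 are assumptions for the example.

```python
import hashlib


def shorten_name_if_too_long(name, max_len=80):
    """Return name unchanged if short enough, else a SHA1 hex digest of it."""
    if len(name) <= max_len:
        return name
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


# Two made-up names that share their first 80 characters.
long_a = "test_x_" + "b" * 100 + "_variant_1"
long_b = "test_x_" + "b" * 100 + "_variant_2"

trimmed_collide = long_a[:80] == long_b[:80]
hashed_differ = shorten_name_if_too_long(long_a) != shorten_name_if_too_long(long_b)
```

Because the hash is computed over the whole string, names that differ only past the trim point still map to different file names.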
Used to specify the testing day start time unique to each CDash project.
CDash Report random test failure tool (#600)
@achauphan and @sebrowne, the Trilinos PR that brings in TriBITS PR #603 is: We can work on further refactorings and feature enhancements later. I can see where this may be useful for some metrics for other projects that submit to CDash, so I will do those refactorings as needed.
Added argument to specify a prefix string for the built html page title and the email subject. This can help with the tool's email searchability.
…-02-15 CDash Random failure tool patch 2024-02-15 (#600)
Related issues
Description
Random failures can bring down an entire CI iteration on a regular basis and waste resources whenever a retest is requested in order to pass the various checks of a pull request.
Spotting a randomly failing test requires a lot of manual CDash querying and analysis by the developer. In most cases, however, a developer may not have the time to trace, identify, and report the randomly failing test, and will instead ignore it in favor of requesting a retest, which wastes resources as described above. This lack of reporting also leads to a bigger issue: it allows the randomly failing test to linger inside the code base and affect developers in the future.
Proposed Solution
This issue proposes a new tool (which for now would live inside of TriBITS under `tribits/ci_support`) that can run automatically to query, scrape, analyze, and report tests that are deemed to be "randomly failing" to an operations team via email or an automated issue creation in the repository.
The definition for a randomly failing test will be a test that intermittently reports as passing or failing without any changes made to the topic or target branch being tested (topic and target tip SHA1 are the same) between CI testing iterations.
Fortunately, a lot of existing work inside of `tribits/ci_support` can be leveraged to build this tool in Python. Notable examples are the module `CreateIssueTrackerFromCDashQuery.py`, which can be used in the template example `example_test_failure_github_issue.py`, and the module `CDashQueryAnalyzeReport.py`, which contains most of the heavy CDash querying functionality. Thus, the core work that will need to be done after utilizing the previously written modules will be to implement the algorithm that determines a random failure and is customizable on a project basis.
The goal will be for this tool to be able to look for randomly failing tests for any project that posts its test results to CDash. The specifics of how this tool will gather the version information of the builds in CDash will be unique to each project and will require implementation on a project basis.
Ideally, this tool can be extended to analyze and report randomly failing configures, builds, and tests; however, starting with randomly failing tests should lead to a similar framework that can be used for those other cases.
Requirements
posts a GitHub issue upon identifying a randomly failing test (TRILFRAME-614 requirement for any post starting with an email first)