
CODAR Software Integration QA process

Purpose

The QA process for CODAR is designed to find and track bugs that would prevent the integration of the multiple independent software packages required when performing co-design studies. To this end, we propose the following:

  • Each CODAR software package will maintain an integration branch, which contains new features that have passed the project's own unit tests and are thought to be ready to go into the next release, but have not yet been tested together with other CODAR software.
  • For each non-CODAR-maintained package that we depend on (part of the Integration Platform), we will regularly agree on a fixed reference version (released or not) to use as the reference build target (see the spec-pinning example after this list).
  • Each week, a designated tester will build the software and run a test script on each of the target machine environments (see below). The test script will launch a Cheetah campaign that exercises the various software components using a simple science code. The tester is responsible for monitoring the progress of the campaign and posting results on a GitHub issue thread designated for the current test cycle. Results will also be announced on the Slack #infrastructure channel.
  • If issues are found, developer resources will be allocated to fix them, and the test script will be re-run by various team members to validate the fixes.
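As a hedged illustration of how a fixed reference version could be pinned for the weekly build, the build could request explicit spack specs rather than floating defaults (the package versions and compiler below are placeholders, not the agreed reference versions):

# Illustrative only -- the actual reference versions are agreed on for each test cycle.
spack install adios2@2.2.0 %gcc@7.3.0
spack install sz@1.4.12.3 %gcc@7.3.0
spack install zfp@0.5.2 %gcc@7.3.0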

CODAR software packages

We have three classes of packages for which regular build testing is to be performed.

CODAR Products

  • Cheetah
  • Savanna
  • SZ
  • SOSFlow (?)
  • Chimbuko (?)

Integration Platform Components

  • ADIOS (1.X or 2.X)
  • Tau
  • ZFP
  • spack
  • Dataspaces (?)
  • evpath (?)

Integration Targets

  • Heat_transfer
  • Brusselator

Target Machine Environments

A machine environment consists of a machine (Cori, Theta, Titan, Summitdev, Summit), an architecture (KNL/Haswell on Cori/Theta), and a compiler chain (GNU, PGI, Intel, IBM, etc.). The target machine environments are those on which we will perform regular testing; we still need to decide which those should be. What follows is an exhaustive list. We should prioritize the most commonly used combinations and focus on them to start, since it is probably not practical to test them all.

  • Titan
    • GNU
    • PGI
    • Intel
    • Cray
  • Cori
    • Haswell
      • GNU
      • Intel
      • Cray
    • KNL
      • GNU
      • Intel
      • Cray
  • Theta
    • Haswell
      • GNU
      • Intel
      • Cray
      • LLVM
    • KNL
      • GNU
      • Intel
      • Cray
      • LLVM
  • Summit (future)

Software Stack Build System

We will attempt to leverage Spack as much as possible to simplify building the software stack, and to avoid re-inventing the wheel. A top level build script will still be maintained to fill any gaps not handled by Spack. The CODAR fork of Spack will be used to maintain bleeding edge and customized package files without worrying about where they are in the upstream merge process.

Note that to maintain fully independent Spack installs, putting configuration in ~/.spack should be avoided. Instead, $SPACK_HOME/etc/spack should be used, with a separate SPACK_HOME for each task (e.g., specific codesign study, and each weekly QA run). See https://spack.readthedocs.io/en/latest/configuration.html#configuration-scopes.
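For example, a per-task install might be set up roughly as follows (the paths and the CODAR fork URL are assumptions used to illustrate the layout):

# One spack clone per task, e.g. a specific weekly QA run.
export SPACK_HOME=/path/to/qa-run/spack
git clone https://github.com/CODARcode/spack.git "$SPACK_HOME"   # CODAR fork of Spack
# Keep configuration with this install rather than in ~/.spack.
cp packages.yaml compilers.yaml "$SPACK_HOME/etc/spack/"
source "$SPACK_HOME/share/spack/setup-env.sh"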

Designated Testers

To share the fun and to get fresh eyes trying things, we will distribute the work of testing among multiple people.

Workflow Outline (WIP)

ssh {TARGET_MACHINE}
cd Software_Stack_QA
git checkout integration # or perhaps a weekly tag?
git pull
cd titan/gnu
./build.sh /path/to/top/install/dir
./test.sh /path/to/top/install/dir

The build script should load the necessary modules, check out the appropriate spack branch, copy the spack configuration to $SPACK_HOME/etc/spack, and install the required software with spack, plus anything that requires a custom install. The test script should create and submit a set of Cheetah campaigns. The test runner must monitor the campaigns and report the results. The campaigns should live in shared project space, so that if issues are encountered the whole CODAR team can examine the results. Old campaign results will need to be cleaned up periodically to avoid hitting quota on shared project space. A rough sketch of build.sh is given below.
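As a sketch only, build.sh for titan/gnu might look something like the following (the module names, package specs, and install_cheetah.sh helper are placeholders, not the agreed reference configuration):

#!/bin/bash
# Sketch of build.sh for titan/gnu; module names and specs are placeholders.
set -e
INSTALL_DIR=$1

module load gcc cmake git

# Dedicated spack clone for this QA run (CODAR fork), kept under the install dir.
export SPACK_HOME=$INSTALL_DIR/spack
[ -d "$SPACK_HOME" ] || git clone https://github.com/CODARcode/spack.git "$SPACK_HOME"

# Keep spack configuration local to this install, not in ~/.spack.
cp etc/spack/*.yaml "$SPACK_HOME/etc/spack/"
source "$SPACK_HOME/share/spack/setup-env.sh"

# Install the stack; the exact specs come from the weekly reference versions.
spack install adios2
spack install sz
spack install zfp

# Anything spack does not handle gets a custom install step here (hypothetical helper).
./install_cheetah.sh "$INSTALL_DIR"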

Reporting

Each week that we have a QA test run, we will create a GitHub issue for each target machine environment. If no problems are found, the issue can be closed immediately; otherwise, problems will be reported in detail in the issue thread. Creating the issues could be automated with a script, as sketched below. Results should also be announced on Slack, possibly via an automated integration tool.
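For example, issue creation could be scripted against the GitHub issues API (the repository name and token variable below are placeholders):

# Hypothetical repo and token variable; adjust to the actual QA tracking repository.
REPO=CODARcode/savanna-qa
TITLE="QA $(date +%Y-%m-%d): titan/gnu"
curl -s -X POST \
  -H "Authorization: token $GITHUB_TOKEN" \
  -d "{\"title\": \"$TITLE\", \"body\": \"Weekly QA run for titan/gnu.\", \"labels\": [\"qa\"]}" \
  https://api.github.com/repos/$REPO/issues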