CESM Python Post Processing Git Workflow
This document describes the Git and GitHub workflow for developers of the CESM Python Post Processing Workflow. As a basis for common understanding of this git workflow, we recommend all developers start by reading, at a minimum, chapters 1-3 of the online Pro Git book. The first two sections of chapter 6 may also help you to get acquainted with GitHub.
Once familiar with the basic concepts of git, these commands are available for quick reference.
$ git help
$ git help <COMMAND>
This workflow assumes that you already have your own personal GitHub account, and that you are using Git 2.0 or later. To check the version of Git:
$ git --version
On some systems, you may need to load a module. On yellowstone, the command is
$ module load git
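The version requirement can also be checked in a script. This is a minimal sketch: `git_major` is a hypothetical helper name, and it assumes `git --version` prints the usual `git version X.Y.Z` form.

```shell
# Hypothetical helper: print the major version number found in the
# output of `git --version` (expected form: "git version X.Y.Z").
git_major() {
    printf '%s\n' "$1" | awk '{ split($3, parts, "."); print parts[1] }'
}

major=$(git_major "$(git --version)")
if [ "$major" -ge 2 ]; then
    echo "Git major version $major: OK for this workflow"
else
    echo "Git major version $major: older than 2.0, consider upgrading"
fi
```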
Figure 1 below graphically illustrates some of the transactions that take place when interacting with local and remote repositories. NOTE - this figure refers to the CIME github repository but the concepts are the same. Terms referenced in the figure are also referred to throughout this document as follows:
- upstream - NCAR/CESM_postprocessing github or remote repository
- origin - Personal fork of the upstream repository
- local - The local repository on a particular machine
Figure 1
Configure Git (one time)
Git needs to be configured once locally on each machine for each developer. The global configuration file is stored in $HOME/.gitconfig.
Required settings:
$ git config --global user.name "Your Name"
$ git config --global user.email [your-email-address]
$ git config --global push.default simple
The “push.default simple” configuration ensures that only the currently checked out local branch will be pushed to the remote repository (i.e. your fork on GitHub). This setting is unnecessary for Git 2.0 or later, but it helps when using a slightly older version. If there is a problem with this setting, it may be because your version of Git is very out-of-date, and you need to find or acquire a new version on your system.
All NCAR/CESM_postprocessing commits are required to use the following commit template:
[ 50 character, one line summary ]
[ Description of the changes in this Pull Request. It should be enough
  information for someone not following this development to understand.
  Lines should be wrapped at about 72 characters. ]
Test suite:
Fixes [Github issue #]:
User interface changes:
Input data changes:
Code review:
You can configure git to use this template automatically:
- save the template to ~/.pp-commit-template.txt
- one-time setup for all repos you use on this machine
$ git config --global commit.template $HOME/.pp-commit-template.txt
- per sandbox setup if you have different templates for different projects
$ cd /path/to/sandbox
$ git config commit.template $HOME/.pp-commit-template.txt
- configure the default editor
$ git config --global core.editor [editor of your choice: emacs, vi, vim, etc]
Note that this template is only used when you type git commit. If you use git commit -m the template is ignored.
If you use the github web interface to issue a pull request, please remember to cut-n-paste the contents of the commit template into the comment field of the pull request web form.
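The per-sandbox setup can be tried safely in a throwaway directory before touching your real configuration. The sketch below is an illustration only: the temporary paths are assumptions made for the demo, and the real workflow keeps the template in `$HOME/.pp-commit-template.txt` as described above.

```shell
# Work in a throwaway directory so no global settings are changed.
sandbox=$(mktemp -d)
template="$sandbox/pp-commit-template.txt"   # demo location, not the real one

# Save the commit template to a file.
cat > "$template" <<'EOF'
[ 50 character, one line summary ]

[ Description of the changes in this Pull Request. It should be enough
  information for someone not following this development to understand.
  Lines should be wrapped at about 72 characters. ]

Test suite:
Fixes [Github issue #]:
User interface changes:
Input data changes:
Code review:
EOF

# Per-sandbox setup: the setting is stored in this repository only.
git init -q "$sandbox/repo"
git -C "$sandbox/repo" config commit.template "$template"

configured=$(git -C "$sandbox/repo" config --get commit.template)
echo "commit.template = $configured"
```

Running `git config --get commit.template` inside a sandbox is a quick way to confirm which template, if any, `git commit` will open there.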
Convenient (but not required) settings:
$ git config --global color.ui true # note - true is the default
$ git config --global core.editor [editor of your choice: emacs, vi, vim, etc]
$ git config --global diff.algorithm histogram
$ git config --global merge.ff false
$ git config --global pull.ff only
Note that if you use typical *nix environment variables to set an editor (e.g. using $EDITOR), Git will pick that up automatically even if you don’t add a setting to .gitconfig.
The diff.algorithm option will generate better patches than the default.
The merge.ff and pull.ff settings are mainly important if you are integrating changes back to master. Setting merge.ff=false is equivalent to specifying --no-ff when you do a merge; this is important when merging to master in order to maintain a clean history using 'git log --first-parent'. Setting pull.ff=only prevents you from using 'git pull' if your local branch has evolved. If you truly want to merge the changes from the remote with those on your local branch, then you will need to do so with git fetch + git merge (which will generate a merge commit). But, in order to maintain a clean history using 'git log --first-parent', we never want merge commits generated when updating your local copy of master. Setting pull.ff=only ensures that this will be true, as long as you only try to update your local copy of master using 'git pull'.
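The effect of `--no-ff` on `git log --first-parent` can be seen in a small experiment. This sketch builds a scratch repository; the branch name `topic`, the file name, and the commit messages are made up for the demo, and `-m` is used only so the script runs without opening an editor.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master on any Git setup
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Two commits on a topic branch.
git checkout -q -b topic
echo one >> file.txt
git commit -q -am "Topic change 1"
echo two >> file.txt
git commit -q -am "Topic change 2"

# Merge back with --no-ff so a merge commit is always recorded.
git checkout -q master
git merge -q --no-ff -m "Merge topic into master" topic

all_commits=$(git rev-list --count HEAD)
first_parent=$(git rev-list --count --first-parent HEAD)
echo "all commits: $all_commits, first-parent commits: $first_parent"
git log --oneline --first-parent
```

The first-parent view lists only the initial commit and the merge commit, which is why the merge commit message needs to summarize the whole topic branch.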
Git needs to be set up once per clone to use any NCAR/CESM_postprocessing specific commit templates and hooks.
See also: Pro Git, section 1.6
Fork a copy on GitHub from the upstream repo by going to
https://github.com/NCAR/CESM_postprocessing
and clicking on the “Fork” button in the upper right corner of the repository’s main page. Choose your personal GitHub user account when creating a new fork.
See also: Pro Git section 6.2
Clone the remote fork to your local machine. The GitHub fork is your “origin” and the local repository is referred to as “local repo-name”.
$ git clone https://github.com/username/CESM_postprocessing [local-repo-name]
$ cd [local-repo-name]
If you don’t specify a local-repo-name on the command line then the default directory created is “CESM_postprocessing”. For this document, we assume that the “local” directory is called “CESM_postprocessing”.
See also: Pro Git, section 2.1
First, query your local and remote repo for available branches.
To see what branches are available locally:
$ git branch --list
To see what tags are available:
$ git tag --list
To see all branches locally and remotely:
$ git branch --list --all
* master
remotes/origin/HEAD -> origin/master
remotes/origin/master
To see all remote connections:
$ git remote -v
origin https://github.com/NCAR/CESM_postprocessing.git (fetch)
origin https://github.com/NCAR/CESM_postprocessing.git (push)
All changes should be carried out on a branch. Changes include:
- the addition of a new feature (subcomponent/feature)
- fixing a bug (subcomponent/bug_fix)
- documentation
- new tests
$ git checkout master
$ git branch [new-branch]
$ git checkout [new-branch]
-- or --
$ git checkout -b [new-branch] master
Tools for viewing the commit logs and project history:
If you have X11 installed and X11 forwarding set up for ssh, then you can launch the built-in git GUI from the command line:
$ gitk
-- or --
$ git log --oneline --first-parent
To create a new branch starting from an existing tag or branch, and check it out:
$ git branch [new_branch] [some-old-tag-or-branch]
$ git checkout [new_branch]
-- or --
$ git checkout -b [new_branch] [some-old-tag-or-branch]
To switch to an existing branch:
$ git checkout [new-branch]
[Note: It is also possible to check out a tag this way, but this leaves you in a “detached HEAD” state, where changes you make and commit can be lost unless you make a new branch for them. Unless you are sure that you will make no changes in your working directory (e.g. because you just want to look at or archive the code without running it) you should make a branch instead of checking out the tag directly!]
To delete an existing branch:
$ git checkout master
$ git branch -d [branch_name]
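The branch commands in this section can be rehearsed in a scratch repository. This is a sketch only; `demo_tag_0.1.0` and `demo_branch` are hypothetical names. It also shows that Git refuses to delete the branch that is currently checked out, so you have to switch to another branch first.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master on any Git setup
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
git tag demo_tag_0.1.0                    # hypothetical tag name

# Create a branch starting from the tag and switch to it in one step.
git checkout -q -b demo_branch demo_tag_0.1.0
on_branch=$(git rev-parse --abbrev-ref HEAD)
echo "Now on: $on_branch"

# Switch away from the branch, then delete it.
git checkout -q master
git branch -q -d demo_branch
remaining=$(git branch --list demo_branch)
echo "Branches named demo_branch after delete: '${remaining}'"
```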
See also: Pro Git, section 2.6 and Pro Git, section 3.1
Git has a working copy (what is on the file system) and a staging area (what is actually going to be committed). They are not the same.
Here’s a workflow scenario for making changes.
First, you make changes to “filename1” and “filename2”. You check that these files are changed (and are the only changes):
$ git status
You want to stage the changes for the next commit:
$ git add filename1 filename2
$ git status
Actually, you’ve changed your mind, so you don't want the changes in “filename2” committed:
$ git reset filename2
$ git status
You decide to throw out these changes to “filename2” and restore the original version in your working copy:
$ git checkout filename2
You want to stage removal of “filename3” for this commit as well:
$ git rm filename3
To stage moving a working copy file or directory to a new location:
$ git mv path/to/source destination/path
Finally, you can commit the staging area:
$ git commit
Note: do not use git commit -m. Always allow git to open an editor and use the commit template.
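The scenario above can be replayed end to end in a scratch repository. In this sketch the file contents are made up, and `git commit -m` appears only so the script runs without opening an editor; in the real workflow you would let the editor open with the commit template.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > filename1
echo two > filename2
git add filename1 filename2
git commit -q -m "Initial commit"

# Change both files in the working copy.
echo edit >> filename1
echo edit >> filename2

git add filename1 filename2     # stage both changes
git reset -q filename2          # unstage filename2; the working copy keeps the edit
git checkout -- filename2       # discard the edit and restore the committed version

staged=$(git diff --staged --name-only)
echo "Staged for commit: $staged"

git commit -q -m "Update filename1"   # -m only keeps the demo non-interactive
changed=$(git diff --name-only HEAD~1 HEAD)
leftover=$(git status --porcelain)
echo "Files in last commit: $changed"
```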
Optional step - To tag the committed change in your local repo:
$ git tag [tagname]
Optional step - To create a release tag for CESM_postprocessing, all tags are required to be annotated:
$ git tag -a [tagname]
To delete an existing tag:
$ git tag -d [tagname]
Local tag naming conventions can be whatever is most helpful to your development workflow. Production release tagging should follow semantic versioning conventions. See Semantic Versioning 2.0.0 for details.
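If you are unsure whether an existing tag is annotated, `git cat-file -t` reports the type of object the tag name refers to. The sketch below uses hypothetical tag names in a scratch repository, and passes `-m` only to avoid opening an editor.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"

git tag demo_local                                    # lightweight tag
git tag -a demo_release_0.1.0 -m "Demo release tag"   # annotated tag

# A lightweight tag points straight at a commit; an annotated tag is
# its own object that records the tagger, date and message.
light_type=$(git cat-file -t demo_local)
annot_type=$(git cat-file -t demo_release_0.1.0)
echo "demo_local -> $light_type"
echo "demo_release_0.1.0 -> $annot_type"
```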
There are 3 different types of “snapshot” in the local repo at any given moment:
- the working copy
- the staging area for the next commit
- a commit such as the HEAD of a branch, a tag, or an arbitrary commit
To see the differences between any 2 of the 3 states, use one of the following commands. To see the difference between the working copy and staging area:
$ git diff
To see the difference between the staging area and the head of the branch:
$ git diff --staged
To see how the working copy differs from the head of the branch:
$ git diff HEAD
To see the difference between the working copy and a tag (or branch):
$ git diff cesm_postprocessing_0.1.0
To view a summary of the overall state of your local repo:
$ git status
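To make the three comparisons concrete, the sketch below puts a different value in each snapshot of a scratch repository (v1 in HEAD, v2 in the staging area, v3 in the working copy) and captures each diff. The file name and contents are made up for the demo.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo v1 > file.txt
git add file.txt
git commit -q -m "Initial commit"   # HEAD holds v1

echo v2 > file.txt
git add file.txt                    # staging area holds v2
echo v3 > file.txt                  # working copy holds v3

unstaged=$(git diff --no-color)            # staging area vs working copy
staged=$(git diff --no-color --staged)     # HEAD vs staging area
vs_head=$(git diff --no-color HEAD)        # HEAD vs working copy

printf '%s\n' "--- git diff ---" "$unstaged"
printf '%s\n' "--- git diff --staged ---" "$staged"
printf '%s\n' "--- git diff HEAD ---" "$vs_head"
```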
See also: Pro Git, section 2.2
To update your local repo with the latest changes from the CESM development repo, you should add a remote on the command line.
$ git remote add upstream https://github.com/NCAR/CESM_postprocessing
This adds “upstream” as an alias for the URL “https://github.com/NCAR/CESM_postprocessing”.
To get the latest data from the GitHub remote:
$ git fetch origin
--or--
$ git fetch upstream
If you clone a repository, the command automatically adds that remote repository under the name “origin”. So “git fetch origin” fetches any new work that has been pushed to that server since you cloned it (or last fetched from it).
It’s important to note that the git fetch command only pulls the data to your local repository – it doesn’t automatically merge it with any of your work, or modify what you’re currently working on. You have to merge it manually into your work when you’re ready.
See also: Pro Git section 2.5
Now that the local repo is updated with upstream information, you can merge those changes into a local branch.
$ git merge upstream/master
See Pro Git section 3.2 or the “git help merge” command for details regarding managing merge conflicts.
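The difference between fetching and merging can be checked with `git rev-list --count`. This sketch uses a local directory as a stand-in for the GitHub upstream repository, so all paths are assumptions made for the demo; `--no-edit` is passed only so the script never waits for an editor.

```shell
work=$(mktemp -d)

# Stand-in for the shared upstream repository.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" config user.name "Demo User"
git -C "$work/upstream" config user.email "demo@example.com"
echo base > "$work/upstream/file.txt"
git -C "$work/upstream" add file.txt
git -C "$work/upstream" commit -q -m "Initial commit"

# Local clone, with the stand-in registered under the name "upstream".
git clone -q "$work/upstream" "$work/local"
cd "$work/local"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add upstream "$work/upstream"

# New work lands upstream after the clone was made.
echo more >> "$work/upstream/file.txt"
git -C "$work/upstream" commit -q -am "Upstream change"

# Fetch downloads the new commit but leaves local master untouched.
git fetch -q upstream
behind_after_fetch=$(git rev-list --count master..upstream/master)

# Merge brings the fetched commit into the local branch.
git merge -q --no-edit upstream/master
behind_after_merge=$(git rev-list --count master..upstream/master)

echo "behind upstream after fetch: $behind_after_fetch"
echo "behind upstream after merge: $behind_after_merge"
```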
When you push a branch back to origin, it updates the branch in your GitHub repo with all the commits you’ve made since you last pushed or pulled that branch. (For a new branch, this means all the commits made locally since you created the branch.)
To push a branch, simply use a command like this:
$ git push origin [new-branch]
You will be prompted for your GitHub username and password. You can check the GitHub website for your fork to make sure that it reflects the changes.
If you want to share a tag, you can use the same command:
$ git push origin [tagname]
If you are working on or testing the same changes on multiple machines, you may want to push a branch to your GitHub fork, then pull that same branch on a different machine and refine it there. Assuming that you’ve cloned your fork on another machine, you can fetch the branch you pushed to GitHub and make a local version of the branch like this:
$ git fetch origin
$ git branch [new-branch] origin/[new-branch]
See also: Pro Git section 3.5
Submit a pull request to merge your branch back into the upstream remote
From the GitHub web site, directly after a git push origin command, there are numerous options for submitting a pull request for your branch. You can always click on the “Pull Request” links and associated icons to issue a pull request.
See also: Pro Git section 6.2
This merge can be done by either the assignee (reviewer) of the pull request or by the original developer. In the latter case, the developer should wait for an "ok" from the reviewer.
Option 1: From the github interface
You can perform the merge from the github interface simply by clicking on the "Merge pull request" button. This works as long as there aren't any merge conflicts (in which case the button will not be available).
Option 2: From the command line
You may want to perform the merge from the command line for a number of reasons:
- You need to resolve conflicts
- The reviewer wants to make some small changes before merging to master
- You want to test the merged version before pushing it
- You don't like the log message generated by github (which starts with "Merge pull request...")
There are a number of possible workflows for doing this merge. One that is straightforward and avoids creating extra branches in the history is the following; this assumes that you are merging changes from the branch NEWBRANCH on GITUSER's fork:
# Add developer's fork as a remote if you have not already done so
git remote add GITUSER https://github.com/GITUSER/CESM_postprocessing.git

# Fetch all changes from GITUSER's remote
git fetch GITUSER

# Make sure your local version of master is identical to upstream.
# Note: The 'reset --hard' command is potentially destructive.
# However, it should be okay in this case since your local version of master
# should never be ahead of upstream/master.
# If you'd like to be sure of this, you can run 'git status' after checking out
# master, and before running 'git reset' - if this says that your branch is ahead
# of upstream/master, then you have made commits to your local master that you
# never pushed upstream.
# In this case, you need to fix this problem before continuing.
git fetch upstream
git checkout master
git reset --hard upstream/master

# Merge new changes into master
#
# It is VERY important that you use --no-ff here,
# so that 'git log --first-parent' works as expected.
#
# git will open a commit message in your editor;
# you should fill this in with details as noted below
git merge --no-ff GITUSER/[new-branch]

# Push new master to NCAR/CESM_postprocessing
git push upstream master
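Before running `git reset --hard`, you can count how many commits your local master holds that upstream/master does not. The sketch below simulates a stray local commit, using a local directory as a stand-in for upstream. Saving the stray commit on a branch (here the hypothetical `rescue_branch`) is one possible way to avoid losing it, offered as an assumption rather than a project rule.

```shell
work=$(mktemp -d)

# Stand-in for the shared upstream repository.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" config user.name "Demo User"
git -C "$work/upstream" config user.email "demo@example.com"
echo base > "$work/upstream/file.txt"
git -C "$work/upstream" add file.txt
git -C "$work/upstream" commit -q -m "Initial commit"

git clone -q "$work/upstream" "$work/local"
cd "$work/local"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add upstream "$work/upstream"

# Simulate a commit made directly on local master by mistake.
echo stray >> file.txt
git commit -q -am "Accidental local commit"

git fetch -q upstream
ahead_before=$(git rev-list --count upstream/master..master)
if [ "$ahead_before" -gt 0 ]; then
    echo "master is $ahead_before commit(s) ahead of upstream/master"
    git branch rescue_branch          # keep the stray work reachable
fi

# Now local master can be made identical to upstream.
git reset -q --hard upstream/master
ahead_after=$(git rev-list --count upstream/master..master)
rescued=$(git rev-list --count master..rescue_branch)
echo "ahead after reset: $ahead_after, commits kept on rescue_branch: $rescued"
```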
Occasionally, you may see a message like this:
error: failed to push some refs to '[email protected]/NCAR/CESM_postprocessing'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first merge the remote changes (e.g.,
hint: 'git pull') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
This can happen when you "lose a race" as integrator: Changes have been made on the remote since the time when you updated your local copy of master. Do NOT follow the suggestion of doing a 'git pull' (or a 'git merge') into your local copy of master in this case! Doing a non-fast-forward merge to update your local copy of master results in a messy history that does not summarize nicely with 'git log --first-parent'. Instead, you should discard your changes and redo the merge:
git fetch upstream
git reset --hard upstream/master
git merge --no-ff GITUSER/[new-branch]
git push upstream master
Merge commit messages should follow the same template as regular commits (given above). Following this template is even more important for merge commits than for regular commits, because these will be the only commits visible with a summarized log ('git log --first-parent'). Thus, be sure to document what testing was done on this topic branch, along with a summary of changes being made with this merge. Note that, even if you have configured git so that it uses the commit template by default, it will not be used for merge commits (and of course would not be used when you do the merge via the web interface), so you will need to manually copy and paste the commit template into your log message, then edit it.
There are several ways to accomplish the main goal of creating a new CIME tag, but the technique outlined below is preferred because it records the most information in the repository itself (where it is also visible on the website):
1. Make sure your local repository is up-to-date with the latest master from NCAR/CESM_postprocessing
2. In your local repo, make a new annotated tag [probably of the form cesm_postprocessing_X.Y.Z]
$ git tag -a [tagname]
3. This will bring up whatever is set as core.editor so you can write a brief commit log
4. Push your new tag back to NCAR/CESM_postprocessing
$ git push upstream [tagname]
5. Update the plans page for the appropriate alpha tag [for consistency, copy / paste your commit log from (3) into the Note field]
There are currently a number of external repositories managed in NCAR/CESM_postprocessing as git subtrees. These include NCAR/PyAverager, NCAR/PyReshaper, and NCAR/ASAPPyTools. It will also soon include the NCAR/PyConform tool used to create CMOR-compliant output variable time series files. A subtree is part of the repository whereas a fork is a copy of the repository. When you fork off the NCAR/CESM_postprocessing repo, you automatically get copies of the subtrees managed under the NCAR/CESM_postprocessing repo. Updates to these existing subtrees should only be done by the repo admins. Additional information about subtrees can be found on-line and will not be included in this document at this time.
Many questions can be answered by using the following resources.
From the web:
- Search the Pro Git on-line reference using the Search Entire Site search box.
- Search the Github help site by clicking on the Help link in the top menu.
- For CIME specific help, see this document and the Github CIME wiki.
From the command-line:
$ git help
$ git help <COMMAND>
Beware, you may need to look in both places, as the command line help documentation may not be up-to-date with the web documentation and vice versa. For example, if you want to know how to sort the list of tags from a git command line:
$ git help tag
doesn’t include the --sort option but the web documentation does and states:
“--sort=<type> - Sort in a specific order. Supported type is "refname" (lexicographic order), "version:refname" or "v:refname" (tag names are treated as versions). The "version:refname" sort order can also be affected by the "versionsort.prereleaseSuffix" configuration variable. Prepend "-" to reverse sort order. When this option is not given, the sort order defaults to the value configured for the tag.sort variable if it exists, or lexicographic order otherwise. See git-config[1].”
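The difference between the two sort orders shows up as soon as a version component reaches two digits. The sketch below creates three hypothetical tags in a scratch repository and lists them both ways; it assumes a Git version whose `git tag` supports `--sort`.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Hypothetical release tags, created out of order on purpose.
for version in 0.2.0 0.10.0 0.9.0; do
    git tag "demo_release_$version"
done

echo "Lexicographic order:"
git tag --list --sort=refname
echo "Version order:"
git tag --list --sort=version:refname

first_by_name=$(git tag --list --sort=refname | head -n 1)
newest=$(git tag --list --sort=version:refname | tail -n 1)
echo "Newest release by version order: $newest"
```

In lexicographic order `demo_release_0.10.0` sorts first because the character `1` precedes `2`, whereas version order places it last.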