The main difficulty with describing how to contribute to a project is that there are a huge number of variations on how it’s done. Because Git is very flexible, people can and do work together in many ways, and it’s problematic to describe how you should contribute – every project is a bit different. Some of the variables involved are active contributor count, chosen workflow, your commit access, and possibly the external contribution method.
The first variable is active contributor count – how many users are actively contributing code to this project, and how often? In many instances, you’ll have two or three developers with a few commits a day, or possibly less for somewhat dormant projects. For larger companies or projects, the number of developers could be in the thousands, with hundreds or thousands of commits coming in each day. This is important because with more and more developers, you run into more issues with making sure your code applies cleanly or can be easily merged. Changes you submit may be rendered obsolete or severely broken by work that is merged in while you were working or while your changes were waiting to be approved or applied. How can you keep your code consistently up to date and your commits valid?
The next variable is the workflow in use for the project. Is it centralized, with each developer having equal write access to the main codeline? Does the project have a maintainer or integration manager who checks all the patches? Are all the patches peer-reviewed and approved? Are you involved in that process? Is a lieutenant system in place, and do you have to submit your work to them first?
The next issue is your commit access. The workflow required in order to contribute to a project is much different if you have write access to the project than if you don’t. If you don’t have write access, how does the project prefer to accept contributed work? Does it even have a policy? How much work are you contributing at a time? How often do you contribute?
All these questions can affect how you contribute effectively to a project and what workflows are preferred or available to you. We’ll cover aspects of each of these in a series of use cases, moving from simple to more complex; you should be able to construct the specific workflows you need in practice from these examples.
Before we start looking at the specific use cases, here’s a quick note about commit messages.
Having a good guideline for creating commits and sticking to it makes working with Git and collaborating with others a lot easier.
The Git project provides a document that lays out a number of good tips for creating commits from which to submit patches – you can read it in the Git source code in the Documentation/SubmittingPatches
file.
First, you don’t want to submit any whitespace errors.
Git provides an easy way to check for this – before you commit, run git diff --check
, which identifies possible whitespace errors and lists them for you.
If you run that command before committing, you can tell if you’re about to commit whitespace issues that may annoy other developers.
Next, try to make each commit a logically separate changeset.
If you can, try to make your changes digestible – don’t code for a whole weekend on five different issues and then submit them all as one massive commit on Monday.
Even if you don’t commit during the weekend, use the staging area on Monday to split your work into at least one commit per issue, with a useful message per commit.
If some of the changes modify the same file, try to use git add --patch
to partially stage files (covered in detail in [_interactive_staging]).
The project snapshot at the tip of the branch is identical whether you do one commit or five, as long as all the changes are added at some point, so try to make things easier on your fellow developers when they have to review your changes.
This approach also makes it easier to pull out or revert one of the changesets if you need to later.
[_rewriting_history] describes a number of useful Git tricks for rewriting history and interactively staging files – use these tools to help craft a clean and understandable history before sending the work to someone else.
The last thing to keep in mind is the commit message.
Getting in the habit of creating quality commit messages makes using and collaborating with Git a lot easier.
As a general rule, your messages should start with a single line that’s no more than about 50 characters and that describes the changeset concisely, followed by a blank line, followed by a more detailed explanation.
The Git project requires that the more detailed explanation include your motivation for the change and contrast its implementation with previous behavior – this is a good guideline to follow.
It’s also a good idea to use the imperative present tense in these messages.
In other words, use commands.
Instead of I added tests for'' or
Adding tests for,'' use ``Add tests for.''
Here is a template originally written by Tim Pope:
Short (50 chars or less) summary of changes
More detailed explanatory text, if necessary. Wrap it to
about 72 characters or so. In some contexts, the first
line is treated as the subject of an email and the rest of
the text as the body. The blank line separating the
summary from the body is critical (unless you omit the body
entirely); tools like rebase can get confused if you run
the two together.
Further paragraphs come after blank lines.
- Bullet points are okay, too
- Typically a hyphen or asterisk is used for the bullet,
preceded by a single space, with blank lines in
between, but conventions vary here
If all your commit messages look like this, things will be a lot easier for you and the developers you work with.
The Git project has well-formatted commit messages – try running git log --no-merges
there to see what a nicely formatted project-commit history looks like.
In the following examples, and throughout most of this book, for the sake of brevity this book doesn’t have nicely-formatted messages like this; instead, we use the -m
option to git commit
.
Do as we say, not as we do.
The simplest setup you’re likely to encounter is a private project with one or two other developers. ``Private,'' in this context, means closed-source – not accessible to the outside world. You and the other developers all have push access to the repository.
In this environment, you can follow a workflow similar to what you might do when using Subversion or another centralized system.
You still get the advantages of things like offline committing and vastly simpler branching and merging, but the workflow can be very similar; the main difference is that merges happen client-side rather than on the server at commit time.
Let’s see what it might look like when two developers start to work together with a shared repository.
The first developer, John, clones the repository, makes a change, and commits locally.
(The protocol messages have been replaced with …
in these examples to shorten them somewhat.)
# John's Machine
$ git clone john@githost:simplegit.git
Initialized empty Git repository in /home/john/simplegit/.git/
...
$ cd simplegit/
$ vim lib/simplegit.rb
$ git commit -am 'removed invalid default value'
[master 738ee87] removed invalid default value
1 files changed, 1 insertions(+), 1 deletions(-)
The second developer, Jessica, does the same thing – clones the repository and commits a change:
# Jessica's Machine
$ git clone jessica@githost:simplegit.git
Initialized empty Git repository in /home/jessica/simplegit/.git/
...
$ cd simplegit/
$ vim TODO
$ git commit -am 'add reset task'
[master fbff5bc] add reset task
1 files changed, 1 insertions(+), 0 deletions(-)
Now, Jessica pushes her work up to the server:
# Jessica's Machine
$ git push origin master
...
To jessica@githost:simplegit.git
1edee6b..fbff5bc master -> master
John tries to push his change up, too:
# John's Machine
$ git push origin master
To john@githost:simplegit.git
! [rejected] master -> master (non-fast forward)
error: failed to push some refs to 'john@githost:simplegit.git'
John isn’t allowed to push because Jessica has pushed in the meantime. This is especially important to understand if you’re used to Subversion, because you’ll notice that the two developers didn’t edit the same file. Although Subversion automatically does such a merge on the server if different files are edited, in Git you must merge the commits locally. John has to fetch Jessica’s changes and merge them in before he will be allowed to push:
$ git fetch origin
...
From john@githost:simplegit
+ 049d078...fbff5bc master -> origin/master
At this point, John’s local repository looks something like this:
John has a reference to the changes Jessica pushed up, but he has to merge them into his own work before he is allowed to push:
$ git merge origin/master
Merge made by recursive.
TODO | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
The merge goes smoothly – John’s commit history now looks like this:
Now, John can test his code to make sure it still works properly, and then he can push his new merged work up to the server:
$ git push origin master
...
To john@githost:simplegit.git
fbff5bc..72bbc59 master -> master
Finally, John’s commit history looks like this:
In the meantime, Jessica has been working on a topic branch.
She’s created a topic branch called issue54
and done three commits on that branch.
She hasn’t fetched John’s changes yet, so her commit history looks like this:
Jessica wants to sync up with John, so she fetches:
# Jessica's Machine
$ git fetch origin
...
From jessica@githost:simplegit
fbff5bc..72bbc59 master -> origin/master
That pulls down the work John has pushed up in the meantime. Jessica’s history now looks like this:
Jessica thinks her topic branch is ready, but she wants to know what she has to merge into her work so that she can push.
She runs git log
to find out:
$ git log --no-merges issue54..origin/master
commit 738ee872852dfaa9d6634e0dea7a324040193016
Author: John Smith <[email protected]>
Date: Fri May 29 16:01:27 2009 -0700
removed invalid default value
The issue54..origin/master
syntax is a log filter that asks Git to only show the list of commits that are on the latter branch (in this case origin/master
) that are not on the first branch (in this case issue54
). We’ll go over this syntax in detail in [_commit_ranges].
For now, we can see from the output that there is a single commit that John has made that Jessica has not merged in. If she merges origin/master
, that is the single commit that will modify her local work.
Now, Jessica can merge her topic work into her master branch, merge John’s work (origin/master
) into her master
branch, and then push back to the server again.
First, she switches back to her master branch to integrate all this work:
$ git checkout master
Switched to branch 'master'
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
She can merge either origin/master
or issue54
first – they’re both upstream, so the order doesn’t matter.
The end snapshot should be identical no matter which order she chooses; only the history will be slightly different.
She chooses to merge in issue54
first:
$ git merge issue54
Updating fbff5bc..4af4298
Fast forward
README | 1 +
lib/simplegit.rb | 6 +++++-
2 files changed, 6 insertions(+), 1 deletions(-)
No problems occur; as you can see it was a simple fast-forward.
Now Jessica merges in John’s work (origin/master
):
$ git merge origin/master
Auto-merging lib/simplegit.rb
Merge made by recursive.
lib/simplegit.rb | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
Everything merges cleanly, and Jessica’s history looks like this:
Now origin/master
is reachable from Jessica’s master
branch, so she should be able to successfully push (assuming John hasn’t pushed again in the meantime):
$ git push origin master
...
To jessica@githost:simplegit.git
72bbc59..8059c15 master -> master
Each developer has committed a few times and merged each other’s work successfully.
That is one of the simplest workflows.
You work for a while, generally in a topic branch, and merge into your master branch when it’s ready to be integrated.
When you want to share that work, you merge it into your own master branch, then fetch and merge origin/master
if it has changed, and finally push to the master
branch on the server.
The general sequence is something like this:
In this next scenario, you’ll look at contributor roles in a larger private group. You’ll learn how to work in an environment where small groups collaborate on features and then those team-based contributions are integrated by another party.
Let’s say that John and Jessica are working together on one feature, while Jessica and Josie are working on a second.
In this case, the company is using a type of integration-manager workflow where the work of the individual groups is integrated only by certain engineers, and the master
branch of the main repo can be updated only by those engineers.
In this scenario, all work is done in team-based branches and pulled together by the integrators later.
Let’s follow Jessica’s workflow as she works on her two features, collaborating in parallel with two different developers in this environment.
Assuming she already has her repository cloned, she decides to work on featureA
first.
She creates a new branch for the feature and does some work on it there:
# Jessica's Machine
$ git checkout -b featureA
Switched to a new branch 'featureA'
$ vim lib/simplegit.rb
$ git commit -am 'add limit to log function'
[featureA 3300904] add limit to log function
1 files changed, 1 insertions(+), 1 deletions(-)
At this point, she needs to share her work with John, so she pushes her featureA
branch commits up to the server.
Jessica doesn’t have push access to the master
branch – only the integrators do – so she has to push to another branch in order to collaborate with John:
$ git push -u origin featureA
...
To jessica@githost:simplegit.git
* [new branch] featureA -> featureA
Jessica e-mails John to tell him that she’s pushed some work into a branch named featureA
and he can look at it now.
While she waits for feedback from John, Jessica decides to start working on featureB
with Josie.
To begin, she starts a new feature branch, basing it off the server’s master
branch:
# Jessica's Machine
$ git fetch origin
$ git checkout -b featureB origin/master
Switched to a new branch 'featureB'
Now, Jessica makes a couple of commits on the featureB
branch:
$ vim lib/simplegit.rb
$ git commit -am 'made the ls-tree function recursive'
[featureB e5b0fdc] made the ls-tree function recursive
1 files changed, 1 insertions(+), 1 deletions(-)
$ vim lib/simplegit.rb
$ git commit -am 'add ls-files'
[featureB 8512791] add ls-files
1 files changed, 5 insertions(+), 0 deletions(-)
Jessica’s repository looks like this:
She’s ready to push up her work, but gets an e-mail from Josie that a branch with some initial work on it was already pushed to the server as featureBee
.
Jessica first needs to merge those changes in with her own before she can push to the server.
She can then fetch Josie’s changes down with git fetch
:
$ git fetch origin
...
From jessica@githost:simplegit
* [new branch] featureBee -> origin/featureBee
Jessica can now merge this into the work she did with git merge
:
$ git merge origin/featureBee
Auto-merging lib/simplegit.rb
Merge made by recursive.
lib/simplegit.rb | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
There is a bit of a problem – she needs to push the merged work in her featureB
branch to the featureBee
branch on the server.
She can do so by specifying the local branch followed by a colon (:) followed by the remote branch to the git push
command:
$ git push -u origin featureB:featureBee
...
To jessica@githost:simplegit.git
fba9af8..cd685d1 featureB -> featureBee
This is called a refspec.
See [_refspec] for a more detailed discussion of Git refspecs and different things you can do with them.
Also notice the -u
flag; this is short for --set-upstream
, which configures the branches for easier pushing and pulling later.
Next, John e-mails Jessica to say he’s pushed some changes to the featureA
branch and ask her to verify them.
She runs a git fetch
to pull down those changes:
$ git fetch origin
...
From jessica@githost:simplegit
3300904..aad881d featureA -> origin/featureA
Then, she can see what has been changed with git log
:
$ git log featureA..origin/featureA
commit aad881d154acdaeb2b6b18ea0e827ed8a6d671e6
Author: John Smith <[email protected]>
Date: Fri May 29 19:57:33 2009 -0700
changed log output to 30 from 25
Finally, she merges John’s work into her own featureA
branch:
$ git checkout featureA
Switched to branch 'featureA'
$ git merge origin/featureA
Updating 3300904..aad881d
Fast forward
lib/simplegit.rb | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)
Jessica wants to tweak something, so she commits again and then pushes this back up to the server:
$ git commit -am 'small tweak'
[featureA 774b3ed] small tweak
1 files changed, 1 insertions(+), 1 deletions(-)
$ git push
...
To jessica@githost:simplegit.git
3300904..774b3ed featureA -> featureA
Jessica’s commit history now looks something like this:
Jessica, Josie, and John inform the integrators that the featureA
and featureBee
branches on the server are ready for integration into the mainline.
After the integrators merge these branches into the mainline, a fetch will bring down the new merge commit, making the history look like this:
Many groups switch to Git because of this ability to have multiple teams working in parallel, merging the different lines of work late in the process. The ability of smaller subgroups of a team to collaborate via remote branches without necessarily having to involve or impede the entire team is a huge benefit of Git. The sequence for the workflow you saw here is something like this:
Contributing to public projects is a bit different. Because you don’t have the permissions to directly update branches on the project, you have to get the work to the maintainers some other way. This first example describes contributing via forking on Git hosts that support easy forking. Many hosting sites support this (including GitHub, BitBucket, Google Code, repo.or.cz, and others), and many project maintainers expect this style of contribution. The next section deals with projects that prefer to accept contributed patches via e-mail.
First, you’ll probably want to clone the main repository, create a topic branch for the patch or patch series you’re planning to contribute, and do your work there. The sequence looks basically like this:
$ git clone (url)
$ cd project
$ git checkout -b featureA
# (work)
$ git commit
# (work)
$ git commit
Note
|
You may want to use |
When your branch work is finished and you’re ready to contribute it back to the maintainers, go to the original project page and click the `Fork'' button, creating your own writable fork of the project.
You then need to add in this new repository URL as a second remote, in this case named `myfork
:
$ git remote add myfork (url)
Then you need to push your work up to it. It’s easiest to push the topic branch you’re working on up to your repository, rather than merging into your master branch and pushing that up. The reason is that if the work isn’t accepted or is cherry picked, you don’t have to rewind your master branch. If the maintainers merge, rebase, or cherry-pick your work, you’ll eventually get it back via pulling from their repository anyhow:
$ git push -u myfork featureA
When your work has been pushed up to your fork, you need to notify the maintainer.
This is often called a pull request, and you can either generate it via the website – GitHub has it’s own Pull Request mechanism that we’ll go over in [_github] – or you can run the git request-pull
command and e-mail the output to the project maintainer manually.
The request-pull
command takes the base branch into which you want your topic branch pulled and the Git repository URL you want them to pull from, and outputs a summary of all the changes you’re asking to be pulled in.
For instance, if Jessica wants to send John a pull request, and she’s done two commits on the topic branch she just pushed up, she can run this:
$ git request-pull origin/master myfork
The following changes since commit 1edee6b1d61823a2de3b09c160d7080b8d1b3a40:
John Smith (1):
added a new function
are available in the git repository at:
git://githost/simplegit.git featureA
Jessica Smith (2):
add limit to log function
change log output to 30 from 25
lib/simplegit.rb | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)
The output can be sent to the maintainer–it tells them where the work was branched from, summarizes the commits, and tells where to pull this work from.
On a project for which you’re not the maintainer, it’s generally easier to have a branch like master
always track origin/master
and to do your work in topic branches that you can easily discard if they’re rejected.
Having work themes isolated into topic branches also makes it easier for you to rebase your work if the tip of the main repository has moved in the meantime and your commits no longer apply cleanly.
For example, if you want to submit a second topic of work to the project, don’t continue working on the topic branch you just pushed up – start over from the main repository’s master
branch:
$ git checkout -b featureB origin/master
# (work)
$ git commit
$ git push myfork featureB
# (email maintainer)
$ git fetch origin
Now, each of your topics is contained within a silo – similar to a patch queue – that you can rewrite, rebase, and modify without the topics interfering or interdepending on each other, like so:
Let’s say the project maintainer has pulled in a bunch of other patches and tried your first branch, but it no longer cleanly merges.
In this case, you can try to rebase that branch on top of origin/master
, resolve the conflicts for the maintainer, and then resubmit your changes:
$ git checkout featureA
$ git rebase origin/master
$ git push -f myfork featureA
This rewrites your history to now look like Commit history after featureA
work..
Because you rebased the branch, you have to specify the -f
to your push command in order to be able to replace the featureA
branch on the server with a commit that isn’t a descendant of it.
An alternative would be to push this new work to a different branch on the server (perhaps called featureAv2
).
Let’s look at one more possible scenario: the maintainer has looked at work in your second branch and likes the concept but would like you to change an implementation detail.
You’ll also take this opportunity to move the work to be based off the project’s current master
branch.
You start a new branch based off the current origin/master
branch, squash the featureB
changes there, resolve any conflicts, make the implementation change, and then push that up as a new branch:
$ git checkout -b featureBv2 origin/master
$ git merge --no-commit --squash featureB
# (change implementation)
$ git commit
$ git push myfork featureBv2
The --squash
option takes all the work on the merged branch and squashes it into one non-merge commit on top of the branch you’re on.
The --no-commit
option tells Git not to automatically record a commit.
This allows you to introduce all the changes from another branch and then make more changes before recording the new commit.
Now you can send the maintainer a message that you’ve made the requested changes and they can find those changes in your featureBv2
branch.
Many projects have established procedures for accepting patches – you’ll need to check the specific rules for each project, because they will differ. Since there are several older, larger projects which accept patches via a developer mailing list, we’ll go over an example of that now.
The workflow is similar to the previous use case – you create topic branches for each patch series you work on. The difference is how you submit them to the project. Instead of forking the project and pushing to your own writable version, you generate e-mail versions of each commit series and e-mail them to the developer mailing list:
$ git checkout -b topicA
# (work)
$ git commit
# (work)
$ git commit
Now you have two commits that you want to send to the mailing list.
You use git format-patch
to generate the mbox-formatted files that you can e-mail to the list – it turns each commit into an e-mail message with the first line of the commit message as the subject and the rest of the message plus the patch that the commit introduces as the body.
The nice thing about this is that applying a patch from an e-mail generated with format-patch
preserves all the commit information properly.
$ git format-patch -M origin/master
0001-add-limit-to-log-function.patch
0002-changed-log-output-to-30-from-25.patch
The format-patch
command prints out the names of the patch files it creates.
The -M
switch tells Git to look for renames.
The files end up looking like this:
$ cat 0001-add-limit-to-log-function.patch
From 330090432754092d704da8e76ca5c05c198e71a8 Mon Sep 17 00:00:00 2001
From: Jessica Smith <[email protected]>
Date: Sun, 6 Apr 2008 10:17:23 -0700
Subject: [PATCH 1/2] add limit to log function
Limit log functionality to the first 20
---
lib/simplegit.rb | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/lib/simplegit.rb b/lib/simplegit.rb
index 76f47bc..f9815f1 100644
--- a/lib/simplegit.rb
+++ b/lib/simplegit.rb
@@ -14,7 +14,7 @@ class SimpleGit
end
def log(treeish = 'master')
- command("git log #{treeish}")
+ command("git log -n 20 #{treeish}")
end
def ls_tree(treeish = 'master')
--
2.1.0
You can also edit these patch files to add more information for the e-mail list that you don’t want to show up in the commit message.
If you add text between the ---
line and the beginning of the patch (the diff --git
line), then developers can read it; but applying the patch excludes it.
To e-mail this to a mailing list, you can either paste the file into your e-mail program or send it via a command-line program.
Pasting the text often causes formatting issues, especially with `smarter'' clients that don’t preserve newlines and other whitespace appropriately.
Luckily, Git provides a tool to help you send properly formatted patches via IMAP, which may be easier for you.
We’ll demonstrate how to send a patch via Gmail, which happens to be the e-mail agent we know best; you can read detailed instructions for a number of mail programs at the end of the aforementioned `Documentation/SubmittingPatches
file in the Git source code.
First, you need to set up the imap section in your ~/.gitconfig
file.
You can set each value separately with a series of git config
commands, or you can add them manually, but in the end your config file should look something like this:
[imap]
folder = "[Gmail]/Drafts"
host = imaps://imap.gmail.com
user = [email protected]
pass = p4ssw0rd
port = 993
sslverify = false
If your IMAP server doesn’t use SSL, the last two lines probably aren’t necessary, and the host value will be imap://
instead of imaps://
.
When that is set up, you can use git send-email
to place the patch series in the Drafts folder of the specified IMAP server:
$ git send-email *.patch
0001-added-limit-to-log-function.patch
0002-changed-log-output-to-30-from-25.patch
Who should the emails appear to be from? [Jessica Smith <[email protected]>]
Emails will be sent from: Jessica Smith <[email protected]>
Who should the emails be sent to? [email protected]
Message-ID to be used as In-Reply-To for the first email? y
Then, Git spits out a bunch of log information looking something like this for each patch you’re sending:
(mbox) Adding cc: Jessica Smith <[email protected]> from
\line 'From: Jessica Smith <[email protected]>'
OK. Log says:
Sendmail: /usr/sbin/sendmail -i [email protected]
From: Jessica Smith <[email protected]>
To: [email protected]
Subject: [PATCH 1/2] added limit to log function
Date: Sat, 30 May 2009 13:29:15 -0700
Message-Id: <[email protected]>
X-Mailer: git-send-email 1.6.2.rc1.20.g8c5b.dirty
In-Reply-To: <y>
References: <y>
Result: OK
At this point, you should be able to go to your Drafts folder, change the To field to the mailing list you’re sending the patch to, possibly CC the maintainer or person responsible for that section, and send it off.
This section has covered a number of common workflows for dealing with several very different types of Git projects you’re likely to encounter, and introduced a couple of new tools to help you manage this process. Next, you’ll see how to work the other side of the coin: maintaining a Git project. You’ll learn how to be a benevolent dictator or integration manager.