-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add some GitHub commit message guidance #110
base: main
Are you sure you want to change the base?
Conversation
|
||
- <https://github.com/instructlab/instructlab-bot/pull/125/commits> | ||
- <https://github.com/instructlab/instructlab/pull/994/commits> | ||
- <https://github.com/instructlab/instructlab/pull/951/commits> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure there are even better examples out there. I'm happy to change this over time to represent some of the best examples people can find in our repos. These are a few I had dug up some weeks ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies if this is a personal preference nit. Noting that many/most PRs are small in scope and (I believe) should usually only have 1 commit. I generally think of a commit as a unit of work that you might choose to un-commit. And breaking down work into revertable logic is a great practice. But stating multiple commits is the general case might lead to a lot of commits for work that doesn't need to be broken up. My ask would be to adjust the wording to not promote multiple commits quite so strongly but still indicate how to use them effectively. If there is a fundamental disagreement on approach, I'm happy to be a chameleon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're certainly not alone in this preference, and I think there's a bit of a difference between a camp of people who developed practices before github existed, and another camp who developed practices around github and maybe are less reliant on e.g. git log
Here's a concrete example of why I'm firmly in the "more is more" (within reason) on splitting up commits: https://github.com/instructlab/sdg/pull/74/commits
Totally trivial PR, but 4 distinctly logical changes, and being able to look at those changes (and their commit message) individually is actually really valuable - e.g. separating the reordering of the parameters from the deleting unused parameters makes it trivial to review. First you make sure the reordering didn't change anything, then you confirm that the 4 parameters being dropped aren't in use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a few thoughts, we might consider adding:
*Splitting commits is crucial because it provides "ease of revertability." If a specific commit causes an issue, reverting a single logical unit is much easier and less disruptive than reverting an entire feature.
-
Encouraging multiple PRs for large changes: contributors should be encouraged to split their work into multiple PRs if the total lines changed across commits exceed 1,000 lines. This rule of thumb helps keep changes manageable and reviews more focused.
-
The approach to splitting commits and PRs can depend on the review's velocity. High velocity in review and addressing feedback leads to rapid merges. Small PRs can be reviewed and merged quickly, benefiting both contributors, who see their changes integrated faster, and maintainers, who avoid reviewing massive changes. I wish the various projects could reach that state, we are not there yet.
This strategy helps maintain the project's health and efficiency by ensuring changes are easier to manage, review, and revert if necessary.
PRs should be formatted as a series of logical commits that implement the | ||
overall change proposed in the PR. Each commit should represent a logical step | ||
in the progression of the change. The commit message for each should clearly | ||
explain that step of the progression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to be explicit here about squashing fixes and cleanups into those logical commits?
I look for high-level context about "what" a change is to frame my review, as well as some reason "why" the change is needed to understand prioritization and benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I have some older text at the bottom of this doc that could be moved up here as a start. I already moved up some of it. Here's what could be moved:
It is common that a PR author may need to do a final rebase to clean up their proposed commit series before a PR can be merged. It is also fine for a project maintainer to perform this step when the changes necessary are straight forward enough to do so. This includes doing a final rebase on
main
if necessary. The PR itself should NOT include any merge commits ofmain
back into the developer's branch. We expect the proposed commit series to be a clean set of commits againstmain
without conflicts or merge commit history. We only use a merge commit to record the PR's inclusion intomain
.
into crafting high-quality commit messages. Here is a script you can use to generate | ||
a summary of the commits in your PR. It starts with a list of the commits one | ||
line at a time and then follows that with the full commit messages. The format works | ||
for pasting directly into GitHub. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GitHub CLI tools are also good for this. hub pull-request
starts an editor with a template for creating the PR message, including all of the commit messages for commits in the branch. I'm sure the more modern gh
does something similar, and I'd be surprised if there wasn't a VSCode somewhere that did something like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I actually use gh pr create
myself. When it gets to the PR description part, it launches my editor (vim) and I run :read!~/bin/ghprdesc.sh
which is the script from above.
If it's just one commit, it already does what I want by default, which is include the full commit message as the PR title and description.
That reminds me of another point to make -- for a PR that is a single commit, the commit message should be equal to the PR description in most cases. Otherwise, you're probably leaving out info that should be in the commit message, as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if the PR description contained a hand crafted description of the overall fix. This way the user pushing the PR has to think about the overall fix or feature being pushed. It provides a good overview to the reader of the PR afterwards.
It should contain breakdown of the different parts if helpful or can be read in the commits.
A common theme in PR reviews has been discussing the level of detail that should be included in commit messages, as well as how to craft a series of commits in a PR that helps guide a reviewer through the changes made to accomplish the overall goal of the PR. This adds some text we can point to on this topic that provides some helpful explanations and examples of commit message and commit series best practices. Signed-off-by: Russell Bryant <[email protected]>
188008c
to
460a06a
Compare
lgtm, one other resource (well, quote really) often referenced is https://who-t.blogspot.com/2009/12/on-commit-messages.html
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, I'm glad to see this doc improvement, thank you! One small thing we could add is to pay attention to the PR template (if it exists) as it provides guidance to contributors too.
```sh | ||
#!/bin/bash | ||
|
||
git log --reverse --oneline origin/main..HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the script is useful, it highly depends on how the remote is set up. I'd add a note about making sure the remote points to instructlab/instructlab
and not your fork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Is origin
here the main repo (as it is for me) or your fork (which I call fork
). :-)
|
||
- <https://github.com/instructlab/instructlab-bot/pull/125/commits> | ||
- <https://github.com/instructlab/instructlab/pull/994/commits> | ||
- <https://github.com/instructlab/instructlab/pull/951/commits> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a few thoughts, we might consider adding:
*Splitting commits is crucial because it provides "ease of revertability." If a specific commit causes an issue, reverting a single logical unit is much easier and less disruptive than reverting an entire feature.
-
Encouraging multiple PRs for large changes: contributors should be encouraged to split their work into multiple PRs if the total lines changed across commits exceed 1,000 lines. This rule of thumb helps keep changes manageable and reviews more focused.
-
The approach to splitting commits and PRs can depend on the review's velocity. High velocity in review and addressing feedback leads to rapid merges. Small PRs can be reviewed and merged quickly, benefiting both contributors, who see their changes integrated faster, and maintainers, who avoid reviewing massive changes. I wish the various projects could reach that state, we are not there yet.
This strategy helps maintain the project's health and efficiency by ensuring changes are easier to manage, review, and revert if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good to me, thanks @russellb for the updates.
One thing wrt PR description I would prefer a hand crafted description rather than retrieval of the commit messages. See inline comment.
into crafting high-quality commit messages. Here is a script you can use to generate | ||
a summary of the commits in your PR. It starts with a list of the commits one | ||
line at a time and then follows that with the full commit messages. The format works | ||
for pasting directly into GitHub. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if the PR description contained a hand crafted description of the overall fix. This way the user pushing the PR has to think about the overall fix or feature being pushed. It provides a good overview to the reader of the PR afterwards.
It should contain breakdown of the different parts if helpful or can be read in the commits.
- <https://docs.kernel.org/process/submitting-patches.html#separate-your-changes> | ||
- <https://github.com/kubernetes/community/blob/master/contributors/guide/pull-requests.md#smaller-is-better-small-commits-small-pull-requests> | ||
|
||
While not a requirement right now, we may want to consider following a spec like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can also mention 50/72 rule, since Conventional Commits don't seem to cover it. E.g. this: https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
@russellb what is going on with this? |
A common theme in PR reviews has been discussing the level of detail
that should be included in commit messages, as well as how to craft a
series of commits in a PR that helps guide a reviewer through the
changes made to accomplish the overall goal of the PR.
This adds some text we can point to on this topic that provides some
helpful explanations and examples of commit message and commit series
best practices.
Signed-off-by: Russell Bryant [email protected]