Tagged with Git

Ancient History

In OpenStack, we have a particular problem where much of the early development on the project was done using bzr and launchpad. All this history is in git, but it can be difficult to find the bzr merge proposal in launchpad which caused a given commit to be merged.

Here's an example of how I did it yesterday.

We're interested in commit 8aea573:

commit 8aea573bd2e44e152fb4ef1627640bab1818dede
Author: Trey Morris ...
Date:   Tue Dec 28 23:55:58 2010 -0600

    initial lock functionality commit

To trace back to the merge commit which merged this into master, I did:

$> git log --graph --topo-order --ancestry-path --merges 8aea573bd2e44e152fb4ef1627640bab1818dede..HEAD
* commit ae5dbe2b5d4871d3e26e859c03feab705c9c59ea
  Merge: 9eca4d5 76e3923
  Author: Trey Morris ...
  Date:   Fri Jan 7 00:49:30 2011 +0000

      This branch implements lock functionality. The lock is stored in the compute worker database. Decorators have been added to the opens

* commit f9c33f4ba09e02f8668bdd655b7acba15984838c
  Merge: ba245da 9eca4d5
  Author: Trey Morris ...
  Date:   Thu Jan 6 16:35:48 2011 -0600

      merged trunk

* commit f09d1ce4d38f3a8ef72566e95cde38f1dc1b8bed
  Merge: 9b9b5fe 9a84a2b
  Author: Trey Morris ...
  Date:   Wed Dec 29 15:13:24 2010 -0600

      fixed merge conflict with trunk

Double check that by looking at exactly what was merged in:

$> git diff 9eca4d5..ae5dbe2
diff --git a/nova/api/openstack/servers.py b/nova/api/openstack/servers.py
index ce64ac7..f8d5e76 100644
--- a/nova/api/openstack/servers.py
+++ b/nova/api/openstack/servers.py
@@ -170,6 +170,50 @@ class Controller(wsgi.Controller):
             return faults.Fault(exc.HTTPUnprocessableEntity())
         return exc.HTTPAccepted()

+    def lock(self, req, id):
...

That's the one!
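The same trace can be scripted. This is a minimal sketch, assuming a POSIX shell and that the branch's mainline is its first-parent history, i.e. topic branches were merged with the mainline as the first parent (as 9eca4d5 is for ae5dbe2 above). Under that assumption, the merge that brought a commit in is the oldest first-parent commit on the branch that also descends from that commit. find_merge is a hypothetical helper, not a git command.

```shell
# find_merge COMMIT [BRANCH]
# Hypothetical helper: print the mainline (first-parent) commit on BRANCH
# that first brought COMMIT in, i.e. the merge found by hand above.
find_merge() {
    commit=$1
    branch=${2:-HEAD}
    tmp=$(mktemp -d) || return 1

    # Every commit that descends from COMMIT and is an ancestor of BRANCH
    git rev-list --ancestry-path "$commit..$branch" > "$tmp/ancestry" &&
    # The mainline: commits reached from BRANCH by following first parents only
    git rev-list --first-parent "$commit..$branch" > "$tmp/mainline" &&
    # Keep the mainline commits (listed newest first) that descend from
    # COMMIT; the last one printed is the oldest, which is the merge
    grep -Fx -f "$tmp/ancestry" "$tmp/mainline" | tail -n 1
    status=$?

    rm -rf "$tmp"
    return $status
}
```

Run from the nova tree, find_merge 8aea573bd2e44e152fb4ef1627640bab1818dede should print the mainline merge, which you can then double check with git diff as above. If the commit was made directly on the mainline rather than on a side branch, the sketch prints the mainline commit that follows it, not a merge.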

Now how to find the merge proposal? Simply googling for "This branch implements lock functionality" quickly led me to the correct merge prop, but better ideas are welcome :)


Gerrit Patch Review From The Command Line

OpenStack Nova switched from bzr and launchpad to github and gerrit on Friday. While I'm delighted the project is using git now, I've always found the gerrit UI to be a bit of a pain.

On IRC, Monty Taylor mentioned the gerrit command line interface which looked fairly interesting. Sure enough, you can actually review and approve a patch using this without ever touching the web UI. Below is an example of reviewing a Glance patch, but the same thing would work for Nova.

First, you obviously need to clone the repo:

$> git clone git://github.com/openstack/glance.git
$> cd glance

To make life a little easier, you can add a host alias to your SSH config:

$> cat >> ~/.ssh/config <<EOF
Host review
  Hostname review.openstack.org
  Port 29418
  User markmc
EOF

Then add the gerrit server as a git remote:

$> git remote add -f gerrit ssh://markmc@review/openstack/glance.git

Okay, now browse the patches needing review:

$> ssh review gerrit query status:open project:openstack/glance | less

Once you've picked a patch, take its Change-Id and look at its patch sets and reviews:

$> ssh review gerrit query status:open project:openstack/glance \
        change:I27bb6b3951422ad32e5e0225765b1056c5b3ffc5 \
        --current-patch-set --all-approvals | less

Then, using the 'ref' in the output, you can fetch the patches into your repo and review them:

$> git fetch gerrit refs/changes/36/636/2
$> git checkout -b git-authors FETCH_HEAD
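The ref follows a fixed pattern. In refs/changes/36/636/2, 636 is the change number, 2 is the patch set, and 36 is the last two digits of the change number, which Gerrit uses to shard the refs. Here's a small sketch for building the ref, assuming you already know the change and patch set numbers. gerrit_change_ref is a hypothetical helper name:

```shell
# gerrit_change_ref CHANGE PATCHSET
# Hypothetical helper: build the ref Gerrit publishes a patch set under.
# The middle component is the change number modulo 100, zero-padded to two
# digits, e.g. change 636, patch set 2 -> refs/changes/36/636/2
gerrit_change_ref() {
    change=$1
    patchset=$2
    printf 'refs/changes/%02d/%d/%d\n' $((change % 100)) "$change" "$patchset"
}
```

With that defined, fetching a patch set for review doesn't need the ref copied by hand:

$> git fetch gerrit "$(gerrit_change_ref 636 2)"
$> git checkout -b review-636 FETCH_HEAD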

Once you're ready to submit your review, you can do:

$> git checkout master
$> git branch -D git-authors
$> ssh review gerrit review --code-review +1 -m "'Looks good to me.'" cd9b3a0f2fb91d0d01606ef4bbd90cf8f29267da

That's all pretty neat, but I haven't found a way to do a detailed review, with comments inline alongside quoted sections of code. Perhaps 'gerrit review' could take the review comments over stdin?


Git Rebasing (cont.)

As I said already, git's interactive rebase tool is seriously useful for preparing a nice, cleanly split up series of patches. And, despite some people's dire warnings, there's no reason not to share an in-progress patch series using git, so long as you take care to warn others that your tree may be rebased.

Why would a patch series not be complete? One reason might be that a patch introduces a regression. As they say, you often have to break some eggs to make an omelette but, if you value the power of git's bisection tool, you'll want each individual patch to be regression free.

Okay, say you're porting an application from one database framework to another. You might do a bunch of hacking to demonstrate the concept and then send that work out for comment. Only then will you set about polishing the work and, finally, cleaning the changes up into a nice patch series.

This approach means the series will only stop rebasing quite late in the day, which leaves a problem: how can you possibly collaborate with others if your tree is rebasing? How can you take patches to fix regressions? How can others help you clean up the series?

Here's one suggestion, based on an approach Stephen Tweedie came up with when we were working together on a series of patches:

  1. Say your branch is called fluffy-piglet. You've pushed it out and asked for comments. Don't rebase this branch again.
  2. Create another branch called fluffy-piglet-rebasing, basing it initially on fluffy-piglet.
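  3. Tag both branches with e.g. a -v1 suffix, and check that the trees in both tags are identical (the diff should be empty and the two tree hashes should match):

    $> git tag fluffy-piglet-v1 fluffy-piglet
    $> git tag fluffy-piglet-rebasing-v1 fluffy-piglet-rebasing
    $> git diff fluffy-piglet-v1 fluffy-piglet-rebasing-v1
    $> git show -s --format='%t' fluffy-piglet-v1 fluffy-piglet-rebasing-v1
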
  4. Push the rebasing branch and the v1 tags to your repo.

  5. If you wish to rebase and do some cleanup work on the patches, do so and tag and push the result to the fluffy-piglet-rebasing branch in your repo as in (3) and (4), but using a new suffix.
  6. If you receive some patches, pull them into your fluffy-piglet branch, tag the result, and replay the new patches onto the rebasing branch without touching fluffy-piglet itself, e.g.

    $> git tag fluffy-piglet-v3
    $> git checkout fluffy-piglet-rebasing
    $> git reset --hard fluffy-piglet-v3
    $> git rebase --onto fluffy-piglet-rebasing-v2 fluffy-piglet-v2
    $> git tag fluffy-piglet-rebasing-v3

  7. If you wish to rebase onto the latest upstream, you could first enable git's "reuse recorded resolution" (rerere) feature:

    $> git config --global rerere.enabled true
    

    Then you rebase the rebasing branch:

    $> git checkout fluffy-piglet-rebasing
    $> git rebase upstream/master
    $> git tag fluffy-piglet-rebasing-v4
    

    And then, you merge upstream into the non-rebasing branch:

    $> git checkout fluffy-piglet
    $> git merge upstream/master
    $> git tag fluffy-piglet-v4
    

    As in (3), you should be able to verify that the two resulting trees are identical.

    If you had to resolve any conflicts while rebasing, there's a good chance that, with rerere enabled, git will reuse those recorded resolutions when the same conflicts come up during the merge, leaving you to check the result and commit it.

  8. Finally, if anyone wants to help you with the series cleanup work, just 'pass the baton'. You basically say 'No more rebasing from me after v4, go ahead', and the other person can work away on the rebasing branch until they are ready to pass control back again.
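The tree check from steps 3 and 7 and the patch replay from step 6 fit easily into a couple of shell functions. This is a minimal sketch under my own naming: same_tree and replay_new_patches are hypothetical helpers. same_tree relies on git rev-parse's <rev>^{tree} syntax to compare the tree objects directly:

```shell
# same_tree A B
# Succeed if revisions A and B point at identical trees (the check in steps 3 and 7).
same_tree() {
    [ "$(git rev-parse "$1^{tree}")" = "$(git rev-parse "$2^{tree}")" ]
}

# replay_new_patches OLD NEW ONTO BRANCH
# Hypothetical helper for step 6: reset BRANCH to NEW, then replay only the
# commits in OLD..NEW on top of ONTO. The non-rebasing branch is left alone.
replay_new_patches() {
    old=$1 new=$2 onto=$3 branch=$4
    git checkout -q -B "$branch" "$new" &&
    git rebase -q --onto "$onto" "$old"
}
```

For example, after tagging fluffy-piglet-v3:

$> replay_new_patches fluffy-piglet-v2 fluffy-piglet-v3 fluffy-piglet-rebasing-v2 fluffy-piglet-rebasing
$> git tag fluffy-piglet-rebasing-v3
$> same_tree fluffy-piglet-v3 fluffy-piglet-rebasing-v3 && echo identical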

This certainly isn't a straightforward workflow, but it gives you:

  • The ability to work with others since folks have a non-rebasing branch to work against
  • The ability to clean up a series using rebase while still having confidence that nothing is being screwed up because you have the pair of tags with identical tree contents
  • The ability to allow others to help clean up the series too

The fact that this workflow is so awkward has its advantages too - it encourages you to clean up the series early and stop rebasing it. This is not a workflow you'd like to use for an extended period of time.


Git Rebasing

For me, 'git rebase -i' is perhaps git's killer feature. I'm a big fan of small, self-contained commits both for ease of patch review and for the sake of useful commit history later. I used to do this on CVS using quilt but git takes a huge amount of the pain out of it.
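As a small taste of the kind of history editing this makes cheap, here's a sketch that folds a follow-up fix into the earlier commit it belongs to. It uses git commit --fixup and git rebase -i --autosquash. Setting GIT_SEQUENCE_EDITOR to : accepts the generated todo list unedited, so nothing interactive happens. fold_fixup is a hypothetical helper name:

```shell
# fold_fixup TARGET BASE
# Hypothetical helper: commit the staged changes as a fixup of TARGET, then
# squash them into TARGET by rebasing everything after BASE.
fold_fixup() {
    target=$1 base=$2
    git commit -q --fixup="$target" &&
    # ':' as the sequence editor leaves the autosquashed todo list as-is
    GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash "$base"
}
```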

Ever since I discovered the feature a few years ago, I've also been vaguely aware of kernel developers' advice to people about rebasing, often simplified to "OMG no. Never rebase."

When pushed to elaborate, I guess most would say:

Once you share a commit with someone, never rebase it. They may base their work on your commit and by rebasing it, you're screwing everything up.

One memorable comment from Linus on the subject was "Have the f*cking back-bone to be able to stand behind what you did!".

In context, this all makes sense. If a kernel developer sends a pull request and it gets merged into one tree, then rebases and that gets merged into another tree and both get merged into Linus's tree ... then yes, you have a bit of a disaster on your hands.

However, I think the rules above are too simplistic for most git newbies. Such newbies are unlikely to see their trees pulled into the whirling vortex of kernel trees so there's no need to terrify them about using rebase.

My advice is:

  1. If you're learning git, take the time to understand the rebase command and, especially, the interactive option.
  2. If you're working on a series of patches, it's perfectly fine for you to share that series with others even if it's not finished. That means later rebasing a commit you've shared with others.
  3. If you're worried people might base their work on a commit you plan to rebase later, warn them by putting e.g. "v1", "rebases" or "rebasing" in the repository or branch name.
  4. If someone does base their work on a commit you have rebased, then point them at the Recovering From Upstream Rebase part of git-rebase(1). It's really not the end of the world, especially in the simpler cases.
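In the hard case described in that section, upstream has rewritten the branch you built on. Recovery boils down to replaying only your own commits onto the new upstream with git rebase --onto. This is a minimal sketch, assuming you know where upstream used to point. The remote-tracking branch's reflog (e.g. origin/topic@{1}) gives you that, provided only one fetch has updated it since the rewrite. recover_from_upstream_rebase is a hypothetical name:

```shell
# recover_from_upstream_rebase OLD_UPSTREAM NEW_UPSTREAM
# Hypothetical helper, run on the branch holding your work: replay only your
# own commits (OLD_UPSTREAM..HEAD) onto NEW_UPSTREAM, leaving behind the
# pre-rebase copies of upstream's commits.
recover_from_upstream_rebase() {
    old_upstream=$1
    new_upstream=$2
    git rebase -q --onto "$new_upstream" "$old_upstream"
}
```

e.g. $> recover_from_upstream_rebase 'origin/topic@{1}' origin/topic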

Git Workflow

Havoc's recent post on git was interesting because it shows how frustrating git can be if you try to treat it as "just another CVS". From that perspective, git seems like some bizarre way for kernel hackers to torture those who just want to get work done.

I turned that corner with git when I learned about "git-rebase -i" and came to the startling realisation that git's history is editable. Basically, this allows you to change your workflow such that you can hack away at will, commit often and then rewrite the history of your hacking session so that you have a coherent set of patches/commits at the end of it with a useful changelog.

e.g. you can go from:

A1---B1---A2---A3---C1---B2---C2---C3

to:

A1---A2---A3---B1---B2---C1---C2---C3

or even:

A'---B'---C'

Using git rebasing, I found that I could use a workflow similar to using quilt with CVS, or mercurial with its patch queue (mq) extension. The revision history becomes less about tracking the progress of your work, and more a malleable mechanism for preparing patches before submitting them upstream.
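The reordering in the first example above can even be done non-interactively, which makes it easy to see what rebase -i is doing. This sketch assumes each commit's subject is a single word like A1 or B2, as in the diagrams. It points GIT_SEQUENCE_EDITOR at a throwaway script that sorts the todo list's pick lines by that last word before git replays them. It also assumes a sort that supports -s (GNU, BSD and busybox do). group_commits_by_subject is a hypothetical helper; in real use you would simply move the lines around in your editor.

```shell
# group_commits_by_subject BASE
# Hypothetical helper: reorder the commits after BASE so they are grouped by
# their one-word subjects (A1 A2 A3 B1 B2 ...), by scripting rebase -i's todo list.
group_commits_by_subject() {
    base=$1
    editor=$(mktemp) || return 1

    # git runs the sequence editor with the todo file as its argument.
    # Keep the 'pick' lines, stable-sort them on their last word (the
    # one-word subject) and write them back in place.
    cat > "$editor" <<'EOF'
todo=$1
awk '/^pick / { print $NF "\t" $0 }' "$todo" | sort -s -k1,1 | cut -f2- > "$todo.new" &&
mv "$todo.new" "$todo"
EOF

    GIT_SEQUENCE_EDITOR="sh $editor" git rebase -q -i "$base"
    status=$?
    rm -f "$editor"
    return $status
}
```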

Red Hat Magazine has a nice article explaining all this, and I even picked up some new tricks to try out:

  • git-merge --squash : merge a branch/tag into the current branch, but squash all the commits together as an uncommitted change to the working tree. When you go to commit the result, the changelog of all the merged commits is available in the commit message editor so you can munge them together into a useful changelog.
  • git-cherry-pick --no-commit : apply the changes from a given commit to your working tree, but do not commit it. Could be used to achieve something similar to a squashed merge, but where you selectively merge only some of the commits.
  • git-add --patch/--interactive : add some changes from the working tree to the index, but e.g. selectively add only some of the patch hunks from a given file. Allows you to make a bunch of changes to a file, but commit the changes as individual commits.
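Here's a sketch of the first of those tricks, wrapped up as a hypothetical squash_merge helper. git merge --squash stages the combined changes from a branch without moving HEAD, and the following commit records them as one ordinary, single-parent commit. Passing -m skips the editor; a plain git commit would instead start you off with the collected changelog mentioned above.

```shell
# squash_merge BRANCH MESSAGE
# Hypothetical helper: fold every commit on BRANCH into a single new commit
# on the current branch.
squash_merge() {
    branch=$1
    message=$2
    # Stage the combined changes without creating a merge commit
    git merge -q --squash "$branch" &&
    # Record them as one normal commit with the given message
    git commit -q -m "$message"
}
```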