How We Moved from Subversion to Git / GitHub

In February and March 2012, Kiva moved from Subversion to Git. We're six years old and have used Subversion from the beginning. For a few years, there had been a growing urge to move to Git. Engineers had a mixture of reasons, but I think it was mostly wanting to use the newest, coolest thing. We didn't have a "Git wizard", but a few of us had used it before in various contexts.

GitHub is just a few blocks from Kiva in San Francisco, so I contacted them and we had a field trip. I had seen Scott Chacon speak at ZendCon 2011, and he seemed like a friendly, generous, and smart guy. Scott welcomed us into their office, showed us around, talked with us about GitHub and Git, shared our pizza, and answered our questions.

We finally decided to go for it. Our engineering leads designated me to own the project -- I had as much experience as anyone with Git. It took me two months to get us fully migrated.

Deciding on a Workflow

Our Subversion workflow was:

  1. Code for about 7 days on the "main" branch
  2. Copy that branch to a release branch, begin the QA phase
  3. QA / regression / bug fixes for about 2 days
  4. Do a production release, which meant pointing prod at the new release branch

Git enables a wide variety of workflows. We decided to keep our workflow as similar as possible while changing the revision control tool. As a model, we mostly used Vincent Driessen's "A successful Git branching model", with a sprinkle of GitHub Flow.

Choosing GitHub

For our workflow, we needed to designate a "central repository" equivalent to our self-hosted Subversion one. We decided to go with a GitHub Organization. This allowed us to have private repositories, manage our users, and access all of GitHub's awesome features including Pull Requests for code review / collaboration and its diff viewer (with comments). Also, GitHub has Issues for issue tracking, and we are considering using it instead of our current tracker, Redmine.

Migration Gameplan

At Kiva, we have a primary codebase where most engineers do their work. But we also have a number of smaller repos that only one or two people work on at a given time. We started by taking one of those repos and moving it completely over to Git, including:

  1. Clone the repo from Subversion to Git
  2. Put the repo on GitHub
  3. Have the engineer spend the iteration working with the Git repo
  4. Change all the deployment scripts to use the GitHub origin repo

In the end, we had a four-phase migration, each phase taking an iteration (two weeks):

  1. Migrate one small repo / app, end to end
  2. Migrate a second small repo, continuing to learn and iron out details
  3. Prepare documentation, timeline, rollout strategy, and communication for migrating the primary Kiva repo
  4. Migrate the primary repo, which involved all engineers (and migrate our remaining small repos)

The timing worked out well since we did the main repo migration during Innovation Iteration.

Steps to Move a Repository

I followed these instructions to use git svn clone to replay the Subversion history into a Git repo and put it on GitHub.

For our main repo, I took the additional step of creating an svn post-commit hook to do a git svn rebase / git push to keep the GitHub repo up-to-date while engineers continued committing to the Subversion repo. This allowed me to migrate over back-end pieces, like deployment scripts and CI, while engineers continued using Subversion.
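As a rough illustration of what such a hook might run, here is a minimal Python sketch. It is not the actual hook used at Kiva. The remote name "origin", the branch name "master", and the function names are assumptions made for the example; only git svn rebase and git push come from the description above.

```python
import subprocess
from typing import Callable, List

# Assumed names for illustration; the real remote and branch may differ.
REMOTE = "origin"
BRANCH = "master"


def mirror_commands(remote: str = REMOTE, branch: str = BRANCH) -> List[List[str]]:
    """Commands that bring the Git mirror up to date with Subversion, in order."""
    return [
        ["git", "svn", "rebase"],         # replay new Subversion revisions locally
        ["git", "push", remote, branch],  # publish them to the GitHub repo
    ]


def sync_mirror(repo_dir: str, run: Callable[..., object] = subprocess.run) -> None:
    """Run each command inside repo_dir.

    With the default runner, check=True raises CalledProcessError on the
    first failing command, so a failed rebase is never followed by a push.
    """
    for cmd in mirror_commands():
        run(cmd, cwd=repo_dir, check=True)
```

A post-commit hook would call sync_mirror() with the path of the git svn clone. The runner is a parameter so the sequence can be exercised without touching a real repository.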

Migrating the deployment scripts to use Git was fairly straightforward -- replacing svn co / svn up references with git clone / git pull. The only gotcha I encountered was how Subversion allows empty directories but Git does not -- but there are well-documented workarounds.
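One common workaround for the empty-directory issue is to drop a placeholder file into each empty directory so that Git has something to track. The sketch below shows that idea; it is not necessarily the workaround we used. The file name ".gitkeep" is only a convention (Git gives it no special meaning), and the function name is made up for this example.

```python
import os
from typing import List

PLACEHOLDER = ".gitkeep"  # a naming convention, not a Git feature


def add_placeholders(root: str) -> List[str]:
    """Create a placeholder file in every empty directory under root.

    Returns the sorted paths of the files that were created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's or Subversion's own metadata directories.
        dirnames[:] = [d for d in dirnames if d not in (".git", ".svn")]
        if not dirnames and not filenames:
            path = os.path.join(dirpath, PLACEHOLDER)
            open(path, "w").close()
            created.append(path)
    return sorted(created)
```

Running it a second time creates nothing, because the directories are no longer empty.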

Migrating a repository took place over one two-week iteration. In the middle of the iteration, normal work happened in Git, while production hotfixes happened in Subversion. When we created the next release branch in Git, I changed the QA deployment scripts to use it. Just before doing the production push, I changed the production deployment scripts to track the Git master branch.

Other Behind-the-Scenes Work

Our Subversion commits triggered emails to engineers with files changed and diffs. I replicated this for Git using GitHub's Post-Receive URL Service Hook in conjunction with a small script I wrote to grab the related patch file from GitHub (see here for example) and email it out. When researching how to solve this problem, I was impressed by the options available for automating tasks in GitHub using the Post-Receive URLs, static patch file URLs, and the GitHub API.
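To give a feel for the approach, here is a small Python sketch of turning a push payload into one email per commit. It is not the script I wrote. It assumes the payload carries a "commits" list whose entries have "id", "message", and "url" fields, and it relies on GitHub serving a commit as a plain-text patch when ".patch" is appended to the commit URL. The function names and the fetch parameter are hypothetical; fetching from a private repo needs authentication, so that part is left to the caller.

```python
from email.message import EmailMessage
from typing import Callable, Dict, List


def patch_url(commit_url: str) -> str:
    """Return the patch URL for a commit URL (the commit URL plus '.patch')."""
    return commit_url.rstrip("/") + ".patch"


def commit_emails(
    payload: Dict,
    fetch: Callable[[str], str],
    sender: str,
    recipient: str,
) -> List[EmailMessage]:
    """Build one email per pushed commit, with the patch text as the body.

    fetch is any callable that takes a URL and returns the patch as text.
    """
    messages = []
    for commit in payload.get("commits", []):
        lines = commit["message"].splitlines()
        summary = lines[0] if lines else ""  # headers must be a single line
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient
        msg["Subject"] = f"[{commit['id'][:7]}] {summary}"
        msg.set_content(fetch(patch_url(commit["url"])))
        messages.append(msg)
    return messages
```

The resulting messages could then be handed to smtplib or whatever mail system is already in place.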

We use Redmine for bug tracking and have a syntax in our commit messages that triggers a homemade script to update Redmine tickets. Kevin, another engineer, replicated this using built-in Redmine change support and the redmine_github_hook. He also had to clone our repos onto our Redmine server and set up a cron job to keep them up to date.

One unexpected issue was that Subversion revision numbers increase sequentially, whereas Git commit hashes do not. With Subversion, it's easy to look at a revision number, compare it to the revision number from when you deployed code to a server, and know if that code is on the server. I built a script to replicate this functionality, which essentially compares where two hashes occur in the git log.
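The core of that comparison can be sketched in a few lines of Python. This is an illustration of the idea rather than the script itself. It assumes the caller passes in the hashes from the branch's log, newest first (for example the output of git log --format=%H), and the function name is made up.

```python
from typing import Sequence


def is_deployed(log: Sequence[str], deployed: str, candidate: str) -> bool:
    """Return True if candidate is the deployed commit or comes before it.

    log lists full commit hashes newest first, so an older commit has a
    larger index than a newer one.
    """
    position = {commit: index for index, commit in enumerate(log)}
    if deployed not in position:
        raise ValueError(f"deployed commit {deployed} is not in the log")
    if candidate not in position:
        return False  # not part of this branch's history at all
    return position[candidate] >= position[deployed]
```

With merges in the history, position in the log is only an approximation of ancestry. Recent versions of Git can answer the ancestry question directly with git merge-base --is-ancestor.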

We use a URL prefix for static assets in our CDN. Before, this prefix had our Subversion revision number, so we needed to change it to the Git hash and add an entry for it in our .htaccess file.
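As an illustration only, a versioned asset URL built from a commit hash might look like the Python sketch below. The URL layout, the CDN host, the abbreviated hash length, and the function name are all assumptions made for the example, not a description of our actual configuration.

```python
def asset_url(cdn_base: str, commit_hash: str, path: str, length: int = 10) -> str:
    """Build a static asset URL whose prefix changes with every deployed commit.

    Only the first `length` characters of the hash are used, to keep URLs short.
    """
    version = commit_hash[:length]
    return f"{cdn_base.rstrip('/')}/{version}/{path.lstrip('/')}"
```

Because the prefix changes whenever the deployed commit changes, browsers and the CDN fetch fresh copies of the assets after each release.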

Finally, I trolled our back-end systems for references to svn and proactively changed the important ones to use Git and have a Git repo. This wasn't too hard -- I talked it over with one of our ops guys, grep'd for "svn", and created a spreadsheet inventory.
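The same kind of inventory could also be generated with a short script. The Python sketch below is a hypothetical alternative to the manual grep-and-spreadsheet approach, not what I actually ran. The function names, the CSV columns, and the case-insensitive match are assumptions.

```python
import csv
import os
from typing import List, Tuple

Hit = Tuple[str, int, str]  # (file path, line number, line text)


def find_svn_references(root: str) -> List[Hit]:
    """Return every line under root that mentions "svn", ignoring case."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version control metadata, which would only add noise.
        dirnames[:] = [d for d in dirnames if d not in (".git", ".svn")]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if "svn" in line.lower():
                            hits.append((path, number, line.strip()))
            except OSError:
                continue  # unreadable file; leave it out of the inventory
    return hits


def write_inventory(hits: List[Hit], csv_path: str) -> None:
    """Write the hits to a CSV file that opens directly in a spreadsheet."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "line", "text"])
        writer.writerows(hits)
```

Deciding which references are important enough to change still needs a human, which is where talking it over with ops comes in.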

Rolling Out to Engineers

Up until the fourth and final phase of the migration, my work had no impact on our 20 engineers except Kevin, who worked on the first two small repos we migrated. I did three things to make the rollout as smooth as possible.

First, I created a wiki page with links to reading about Git, our workflow with diagram, one-time setup, commands to run during normal development, etc. Basically, the "Git at Kiva" bible.

Second, I created a sandbox repo on GitHub where engineers could experiment and learn in an environment virtually identical to the real thing.

Third, I identified one person from each of our engineering sub-teams to be a "Git Guru" -- a go-to person with Git questions. I chose people who were comfortable at least with Git basics. I had a meeting with them where I went through the wiki page and our workflow in detail and had them do some test commits and pushes to origin with the sandbox. We discussed a few important issues like "Are we going to encourage rebasing by default?" (yes, but be careful about "rebase hell") and "When are long-lived feature branches dangerous and why?" which helped work out some details and get buy-in from the Gurus.

From the engineer's perspective, the migration was a few emails showing them the wiki page, encouraging them to try out Git with the sandbox, telling them the cutover timeline, and then finally announcing that we were cut over. Some people got it right away, and some people had questions that the Gurus or I answered.

Next Steps

We've had some problems with the interplay between PhpStorm (the IDE that most engineers use) and our developer VMs:

  1. Lots of files unexpectedly changing mode
  2. Not being able to Pull using PhpStorm, as it complains there are uncommitted changes (when there don't seem to be)
  3. Your working copy ending up with uncommitted changes that are from a recently pulled commit, but that you didn't do yourself

Once we cut our first QA branch, a number of engineers did not understand the subtle differences between the Git workflow (do your work on the release branch, merge the release branch to development) and our old Subversion workflow (commit your changes to both the development branch and the release branch). People had similar problems understanding how to make production hotfixes. This has led me to periodically walk over to an engineer and quiz them / answer questions about our Git workflow and branches. This one-on-one time seems to be a good way to answer questions and get everyone on the same page.

Once we have settled into using Git in a few months, I hope to write another post reflecting on its impact on Kiva engineering.

About the author

Tim Ledlie

After college, Tim founded a non-profit bicycle shop and learned to be a bike mechanic and run a small business. Before Kiva, he worked in genomics at the Broad Institute, writing code to help us better understand viruses like Dengue and HIV. He also brings experience as a freelance web designer and developer. His interests include ultimate frisbee, bicycle education and advocacy, small-ensemble singing, home-cooked food, learning and adventuring through travel, and living deliberately. Tim graduated from Harvard University with a BA in computer science.