Just Another Technology Guy: git

Showing posts with label git. Show all posts

Monday, February 22, 2016

Finding buggy code using git history and a binary search

Some bugs in software are easy to understand. Things like a null pointer exception are obvious and don't need a lot of context. But some bugs are more difficult. Some bugs just aren't obvious until you have the context of the change that was made that introduced the bug.

What I like to do when I'm hunting a bug down is to apply a binary search algorithm to my git history. This has two benefits:

You're able to find the exact commit a bug is introduced in
You've narrowed down the surface area of the code the bug could possibly be in

#1 above is important as it gives you the ability to reverse apply a patch and see if the bug goes away. If it works without breaking anything that's come to rely on that code it means that you've got a "quick fix" that can be issued to relieve the problem while you find a longer term solution.

Unless your team is a team of developers that goes dark #2 above is means you have found the handful of lines of code that contain the problem and the context of the code dependency. Simply seeing the inter-class dependencies may be enough to make an ambiguous bug make more sense. It's also a good place to look for violations of SOLID which make code less 'ible (maintainable, scalable, flexible, etc...).

So how do you accomplish this?

1. Find the starting point

Start with the git commit from your last release. Create a build from that and see if the bug reproduces. If not you've found your starting point. If it does reproduce and has been in the system a while, keeping going back release by release trying to find the commit of a build that does not contain the bug.

Hopefully you're tagging your releases or creating release branches.'

2. Use now as your ending point

3. Pick the mid point in time between your starting point and ending point

Using git log you:

Get the hash of the commit closest to the mid point in time.
Create a build using that commit
Check to see if the bug reproduces

If the bug does reproduce you've got a new starting point: Repeat step 3 with the new starting point.

If the bug does not reproduce: Repeat step 3 with the new ending point.

Using this binary search you'll find the source commit of your bug fairly quickly (limited only by your build and validation time) and you'll have the context of the commit to fully understand the change that introduced the bug.

Monday, August 25, 2014

Backing up using git

Over the years I've found that as I move from machine to machine, whether adding a new machine to my collection or replacing an old one, it's a pain to constantly copy files from one place to the next. I've also found that the backup process tends to be error prone as I either forget to back something up, don't backup often enough because of having to plug in the extra drive, needing to retrieve something that I've deleted, or I accidentally overwrite something in the backup.

There have been several projects, both open source and commercial, that help solve these problems by doing incremental backup. But I've found that they tend to be platform specific or not supported on the wide variety of machines I have in my mix. Most of them also still require you to use an external drive in order to protect against harddrive failure.

A few years ago I switched to using git as my backup mechanism. It doesn't require me to hassle with an external drive while still solving the problem of protecting against disk failure. It's supported on all major platforms. There are cloud hosting options as well as options to host locally. It provides incremental backup. I can easily exclude things from backup as well as retrieve deleted items.

Initial creation (one-time steps)

Create repository to back up to on backup machine

$ ssh your.backup.machine
$ mkdir -p /path/to/backup/location/my_backup_repository.git
$ cd /path/to/backup/location/my_backup_repository.git
$ git --bare init
$ exit

Add repository location on machine to backup

$ cd /path/to/backup
$ git init
$ git remote add origin USER@your.backup.machine:/path/to/backup/location/my_backup_repository.git
$ echo "backup using git" > README
$ git add README
$ git commit -m "Initial Commit"
$ git push origin master

Performing the backup

It's important that BEFORE you back anything up, you encrypt anything that you don't want backed up in the clear. I like to encrypt things using openssl because of it's wide range of platform availability.

Make sure that you .gitignore anything you don't want backed up. Often I backup my OS X ~/Documents folder. But this is the default location for some application configuration that I don't want backed up. It's important that you add these folders to your .gitignore so you don't unnecessarily back them up and use up unwanted space.

On backup machine

$ cd /path/to/backup

$ # encrypt any files you want encrypted

$ git status

$ git add .
$ git commit -m "backup $(date +'%Y-%m-%d_%H%M')"
$ git push

You now have a secondary backup of your important files. But even better you have a simple way to restore from a backup.

Restoring a backup

$ cd /path/to/restore/to
$ git clone USER@your.backup.machine:/path/to/backup/location/my_backup_repository.git ./

Done!

Monday, December 16, 2013

Migrating from Subversion to Git

Today I thought I would pass along a helpful version control migration tip. I've been writing software both personally and professionally for 14 years. As the years go by the way I use version control change and every couple of years I end up migrating from version control system to another. A few years ago I migrated from Subversion to Git.

I decided to move to Git for a couple of reasons, the biggest of which, is that Git is a distributed source system. What this means is that when you checkout a repo you have a full version of the repo on that machine. This allows you to work completely disconnected from any remote server. This becomes extremely useful if you work in a coffee shop, on an airplane, or somewhere that has no WiFi.

One thing that was pretty important to me was that I was able to keep my SVN history intact. I don't believe in checking in commented out code or leaving classes around that aren't being used. It clutters the system and makes the code more error prone. So I often heavily rely on my version control history to help me remember how or why I've done something that I've removed from the system.

So here are some pretty simple instructions that will allow you to migrate from SVN to Git and maintain your commit history. Make sure you have both git and git-svn installed on your machine before attempting these instructions.

This first step in the migration process is to go to where you have the SVN repo checked out that you want to migrate.

$ cd /path/to/svn/repo

The next step is to find, on the remote repository, where the SVN database with all revisions is hosted.

$ svn info

The output of that command will give you the url of the remote repository which I will later refer to as /path/to/remote/svn/repository.

The next thing we have to do is create the directory for our git repository.

$ mkdir -p /path/to/local/git/repository

$ cd /path/to/local/git/repository

Now that we have a place to put our new Git repository we need to initialize the repository from the SVN repository

$ git svn init /path/to/remote/svn/repository

After we've initialized the repository we can fetch all the revisions from the remote SVN repository.

$ git svn fetch

At this point you have now migrated your SVN repository to Git. Pretty easy huh? Checkout your commit history while you're here.

$ git log --graph

If you're using a remote server as your canonical source for your Git repositories (like Github) you should also push the local Git repository to the remote one on the master branch. In order to do that you need to first add the remote Git repository as a remote called origin.

$ git remote add origin git@your.remote.server:username/repository.git

Now that your local Git repository knows about the remote you can simply push your changes to the remotes master branch.

$ git push origin master

Monday, November 25, 2013

Splitting a Git repository into multiple repositories

Today I thought I would pass along a helpful code organization tip. Occasionally I've run across the need to split an existing git repository into multiple repositories and have wanted to keep the histories intact for each split out repository. One common scenario where this arises is when you want to refactor out a piece of code or submodule from an existing project into it's own library for re-use.

Splitting an existing git repository into multiple repositories is actually pretty straight forward if you use git's subtree command. A git subtree is simply a sub folder within the existing repository that you can commit, branch, and merge. The easiest way to explain how to do this is with an example.

Let's pretend we have a project called MyProj that is really made up of two sub-projects ProjA and ProjB that we want to split into their own repositories. The first thing we need to do is make sure we're in the directory of the git repository that we want to split up.

$ cd /path/to/MyProj

I like to remove the origin remote so I don't accidentally push something to origin. This allows me to always start over if I mess something up.

$ git remote rm origin

Now we can split ProjA and ProjB into their own subtrees. We're going to use the -b argument which tells git to create a new branch for the split subtree with it's own complete history

$ git subtree split -P relative/folder/for/ProjA -b ProjA
$ git subtree split -P relative/folder/for/ProjB -b ProjB

For me the easiest thing to do at this point is to create a new empty git repository for ProjA and ProjB where I can fetch the new branch, add my new remote repository, and then push it to the origin remote's master branch. This presupposes that you've created new empty remote repositories for ProjA and ProjB.

Here I'm going to create the new local repository for ProjA as a sibling folder to the original project. Creating the new ProjB repository is the exact same process.

$ cd ..
$ mkdir ProjARepo
$ cd ProjARepo

Before we do anything with the ProjA subtree we need to initialize our new empty git repository

$ git init

Now that we have an empty git repository we can fetch the ProjA branch from the origin MyProj repository.

$ git fetch ../MyProj ProjA
$ git checkout -b master FETCH_HEAD

The last thing we have to do is add the origin remote for the repository and push our changes to it's master branch.

$ git remote add origin git@github.com:ProjA.git
$ git push -u origin master

And there you have it. You now have separate repositories for ProjA and ProjB. At this point you can remove them from MyProj or remove MyProj alltogether if ProjA and ProjB were the only things in the original repository.