Thursday, June 27, 2013

Team Foundation Server - Branching and Merging Demystified

I have frequently been asked to explain how branching and merging work in Microsoft's Team Foundation Server (TFS). There is often a lot of uncertainty, even among developers, surrounding the concepts of branching and merging, and more often than not that uncertainty arises regardless of the repository solution being used.

It is my hope that the information I am about to detail (drawn from both my own experience and knowledge gathered from around the internet) will not only explain some of the fundamentals behind setting up a correct branching and merging strategy, but will also provide some guidelines for you to apply in your own solution.

Please note that most of the concepts below can be applied to any repository solution. However, the information I am providing relates specifically to Microsoft's TFS.

Also, this is not light reading by any means, so don't expect a TL;DR version. (That's "too long; didn't read" for you non-geek folks.)

What is a branch?

A branch is a set of files in a different part of your repository that allow two or more teams of people to work on the same part of a project in parallel. When you create a branch, it does not actually create new copies of all those files on the server. It instead creates a record pointing to them. This is the primary reason why creating a branch with a vast number of files can be done so quickly.
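To make the "record pointing to files" idea concrete, here is a toy Python model. It is not how TFS actually stores data on the server, and the class and the Dev branch name are hypothetical. Creating a branch copies only a small map from paths to stored content, so the file contents themselves are stored once until someone edits them.

```python
class Repository:
    """Toy model: shared content store plus per-branch path -> content-id maps."""

    def __init__(self):
        self.store = {}      # content id -> file text (each version stored once)
        self.branches = {}   # branch name -> {path: content id}
        self._next_id = 0

    def add_file(self, branch, path, text):
        """Store new content and point the branch's path at it."""
        content_id = self._next_id
        self._next_id += 1
        self.store[content_id] = text
        self.branches.setdefault(branch, {})[path] = content_id

    def create_branch(self, source, target):
        """Record a new branch whose paths point at the source's content."""
        # Only the small path -> id mapping is copied; file text is shared.
        self.branches[target] = dict(self.branches[source])

    def read(self, branch, path):
        return self.store[self.branches[branch][path]]


repo = Repository()
repo.add_file("Main", "app.cs", "class App {}")
repo.create_branch("Main", "Dev")
print(len(repo.store))               # 1: one stored copy shared by two branches
repo.add_file("Dev", "app.cs", "class App { int x; }")
print(repo.read("Main", "app.cs"))   # class App {}  (Main is unaffected)
```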

An example...

Imagine that two developers have modified the same line of code in the same file across two different branches. The person responsible for merging those two branches must decide which change will "take". In some cases, this will result in a hybrid merge, where combining the intent behind the two changes requires a result different from the text in either of the versions being merged.

The branch containing the changes being merged is commonly referred to as the source branch. The branch you want to merge those changes into is referred to as the target branch. The common ancestor of the two is often referred to as the base version. When you do a merge, you can select a range of changes in the source branch to merge into the target branch.
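Selecting a range of changes can be pictured as filtering the source branch's changesets by number. The sketch below is a simplified, hypothetical model: `Changeset`, `select_range`, and the `already_merged` set (standing in for the record of changesets that have already been merged into the target) are illustrative names, not TFS APIs.

```python
from dataclasses import dataclass


@dataclass
class Changeset:
    number: int
    comment: str


def select_range(source_history, first, last, already_merged):
    """Pick source changesets numbered first..last not yet merged to the target."""
    return [cs for cs in source_history
            if first <= cs.number <= last and cs.number not in already_merged]


history = [Changeset(10, "Add login"), Changeset(12, "Fix typo"),
           Changeset(15, "Add logout"), Changeset(18, "Refactor")]
candidates = select_range(history, 11, 16, already_merged={12})
print([cs.number for cs in candidates])   # [15]
```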

What happens if there is a conflict?

If the same file has been modified in both the source and the target branches, TFS will flag it as a conflict. This happens every time, even if the changes are in completely different parts of the file.
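The whole-file rule above can be sketched as a three-way comparison against the base version. This is a hypothetical helper written for illustration, assuming each version is the file's full text; it flags a conflict whenever both sides differ from the base, no matter where in the file the edits are.

```python
def merge_file(base, source, target):
    """Whole-file three-way merge decision, modelled on the rule above.

    Returns (result, conflict). Each version is the file's full text.
    """
    source_changed = source != base
    target_changed = target != base
    if source_changed and target_changed:
        # Edited on both sides: flagged even if the edits touch different lines.
        return None, True
    if source_changed:
        return source, False     # only the source changed: take it
    return target, False         # only the target changed, or neither did


print(merge_file("a\nb\n", "A\nb\n", "a\nB\n"))   # (None, True)
```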

For certain changes, TFS can make an educated guess about what should happen. This is often referred to as an automerge. When the automerge occurs, TFS will allow you to review the changes to make sure the desired merge behavior has been performed.

Another example...

Two different bug fixes are being implemented in the same file. You probably want both changes. However, if the two changes fix the same bug in two different ways, you will likely want to keep only one of them. In most cases, where your development team communicates well, conflicting edits are the result of different, independent changes being made to the file.

In the example above, automerge can do a great job of merging the changes together, making it easy for the developer to validate those changes.
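A flagged file can often still be resolved automatically when the two sides touched different lines. The toy function below shows the idea. It assumes edits only change lines in place (no lines added or removed), whereas a real automerge works from proper diffs that handle insertions and deletions; the function name is hypothetical.

```python
def automerge_lines(base, source, target):
    """Toy line-by-line automerge; assumes no lines were added or removed.

    Returns the merged lines, or None if any line was changed on both sides
    in different ways (a conflict that needs manual resolution).
    """
    if not (len(base) == len(source) == len(target)):
        raise ValueError("this sketch only handles in-place line edits")
    merged = []
    for b, s, t in zip(base, source, target):
        if s == t:        # same on both sides (changed identically or not at all)
            merged.append(s)
        elif s == b:      # only the target changed this line
            merged.append(t)
        elif t == b:      # only the source changed this line
            merged.append(s)
        else:             # both changed the same line differently
            return None
    return merged


base = ["a = 1", "b = 2", "c = 3"]
source = ["a = 10", "b = 2", "c = 3"]   # one bug fix, on the first line
target = ["a = 1", "b = 2", "c = 30"]   # another bug fix, on the last line
print(automerge_lines(base, source, target))   # ['a = 10', 'b = 2', 'c = 30']
```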

There are many scenarios that require more manual conflict resolution. In these cases, the person performing the merge is responsible for deciding the correct outcome, based on their understanding of the code and on conversations with the team members who made the conflicting changes about what they intended.

What is a branch relationship?

When you create a branch, the relationship between the branches forms a standard hierarchy: the source of the branch is the parent, and the target of the branch is the child. Children that share the same parent are often referred to as sibling branches.
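The parent/child/sibling relationships form a simple tree. Here is a minimal Python sketch using hypothetical branch names, where each branch records its parent:

```python
# Each branch maps to the branch it was created from (None for the root).
parents = {
    "Main": None,
    "Dev": "Main",
    "Releases/V1": "Main",
    "Releases/V2": "Main",
    "Dev/FeatureA": "Dev",
}


def children(branch):
    """Branches whose source (parent) is `branch`."""
    return sorted(b for b, p in parents.items() if p == branch)


def siblings(branch):
    """Other branches that share `branch`'s parent."""
    parent = parents[branch]
    if parent is None:
        return []
    return sorted(b for b, p in parents.items() if p == parent and b != branch)


print(children("Main"))         # ['Dev', 'Releases/V1', 'Releases/V2']
print(siblings("Releases/V1"))  # ['Dev', 'Releases/V2']
```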

Baseless Merging

A baseless merge is a merge between two arbitrary branches that is performed without reference to a base version. This is sometimes necessary when the source code was originally imported in a flat structure without branch relationships in place, or when you want to merge a branch with another branch that is not its direct parent or child.

Since there is not a base version to compare against, the chance that the server will detect conflicts between the two branches increases significantly.
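Conflicts increase because, without a base, a difference between the two files cannot be attributed to either side. The hypothetical helpers below contrast a two-way comparison (all a baseless merge has to work with) with a three-way comparison, again assuming in-place line edits only.

```python
def two_way_differences(source, target):
    """No base: every differing line needs review, since either side may have changed it."""
    return [i for i, (s, t) in enumerate(zip(source, target)) if s != t]


def three_way_conflicts(base, source, target):
    """With a base: only lines changed differently on both sides are conflicts."""
    return [i for i, (b, s, t) in enumerate(zip(base, source, target))
            if s != t and s != b and t != b]


base = ["a = 1", "b = 2", "c = 3"]
source = ["a = 10", "b = 2", "c = 3"]   # edited at index 0
target = ["a = 1", "b = 2", "c = 30"]   # edited at index 2
print(two_way_differences(source, target))        # [0, 2]
print(three_way_conflicts(base, source, target))  # []
```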

Yet another example...

If a file is renamed in one branch and modified in another, it will show up as a file delete conflicting with the file edit, and then a file add that gives no hint as to which file it was related to, or that there was an edit intended for this file in the other branch.

For this reason, among others, baseless merges are frowned upon in TFS, and a warning will appear whenever a baseless merge operation is selected in Microsoft's Visual Studio (VS). Standard merging through the VS or Eclipse clients, in which merges go only one level up or down the hierarchy (a parent to a child or vice versa), is the best practice and recommended method. Your branching strategy and model should attempt to constrain most merges to parent and child branches to minimize the amount of baseless merging required.
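The "one level up or down" rule amounts to a simple check against the branch hierarchy. A sketch, assuming a hypothetical hierarchy and helper name:

```python
# Hypothetical hierarchy: each branch maps to its parent.
PARENTS = {"Main": None, "Dev": "Main", "Releases/V1": "Main", "Dev/FeatureA": "Dev"}


def merge_kind(source, target, parents=PARENTS):
    """Classify a merge as 'standard' (direct parent <-> child) or 'baseless'."""
    if parents.get(source) == target or parents.get(target) == source:
        return "standard"
    return "baseless"


print(merge_kind("Dev", "Main"))           # standard (child to parent)
print(merge_kind("Dev/FeatureA", "Main"))  # baseless (grandchild to grandparent)
print(merge_kind("Dev", "Releases/V1"))    # baseless (siblings)
```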

What is Forward or Reverse Integration?

When you merge code from a parent branch to the child branch, it is considered forward integration (FI). When you merge code from a child branch to the parent branch, it is referred to as reverse integration (RI). If you are doing feature development in branches, it is common to use FI at various points during the feature development cycle, and then to use RI at the end.
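FI and RI are simply the direction of a merge relative to the hierarchy. A minimal sketch with hypothetical branch names:

```python
PARENTS = {"Main": None, "Feature": "Main"}   # hypothetical hierarchy


def integration_direction(source, target, parents=PARENTS):
    """Return 'FI' for a parent-to-child merge, 'RI' for child-to-parent."""
    if parents.get(target) == source:
        return "FI"   # forward integration: parent into child
    if parents.get(source) == target:
        return "RI"   # reverse integration: child into parent
    raise ValueError(f"{source} -> {target} is not a parent/child merge")


# A typical feature cycle: several FIs to stay current, then one RI at the end.
plan = [("Main", "Feature"), ("Main", "Feature"), ("Feature", "Main")]
print([integration_direction(s, t) for s, t in plan])   # ['FI', 'FI', 'RI']
```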

What are some common Branching Strategies?

Back in February I posted a very brief description of the Branch per Release and Code-Promotion branching strategies, in which I touched on the flaws and benefits of each. This time I will go into more detail on some common strategies, why we have them, and how they are used.

Each strategy has its pros and cons. However, just as every strategy in chess is made up of simple moves, every branching strategy uses one or more combinations of some basic techniques. When developing your own strategy, you should take into account your own needs.

When looking at any branching strategy, you should keep in mind the following:
  • Always choose simplicity over control.
  • Branch only when you really need to.
  • If you ever want to merge two branches together, keep the time between merges to a minimum.
  • Ensure that your branch hierarchy matches the path you intend your merges to follow.

No Branching

This may sound a bit counterintuitive, but often the simplest technique is to not branch at all. This should always be your default position when it comes to branching and merging scenarios. Do not branch unless you need to. Keep in mind that you are using a version control tool that tracks changes over time, so you can create a branch at any point in the future from a point in the past. This gives you the flexibility of not having to create a branch "just in case". Again, create branches only when you really need to.
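Branching "from a point in the past" means creating the branch from the state of Main at an earlier changeset rather than from its latest version. The toy model below (a hypothetical `BranchHistory` class, not a TFS API) records a snapshot per check-in and looks up the latest snapshot at or before the requested changeset.

```python
import bisect


class BranchHistory:
    """Toy model: Main's file snapshots recorded by changeset number."""

    def __init__(self):
        self.numbers = []     # changeset numbers, ascending
        self.snapshots = []   # files as they were after each changeset

    def check_in(self, number, files):
        if self.numbers and number <= self.numbers[-1]:
            raise ValueError("changeset numbers must increase")
        self.numbers.append(number)
        self.snapshots.append(dict(files))

    def branch_from(self, changeset):
        """Return Main as it was at `changeset` (latest check-in <= it)."""
        i = bisect.bisect_right(self.numbers, changeset)
        if i == 0:
            raise ValueError("no check-ins at or before that changeset")
        return dict(self.snapshots[i - 1])


main = BranchHistory()
main.check_in(100, {"app.cs": "v1"})
main.check_in(105, {"app.cs": "v2"})
print(main.branch_from(103))   # {'app.cs': 'v1'}
```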

There are measures you can take, however, to make branching easier in the future if you decide you need a branch. Here are some TFS specifics:
  1. When you first create your team project in TFS, create a folder called Main.
  2. Check it in.
  3. Right-click the folder in Source Control Explorer and select Branching and Merging | Convert to Branch.
This gives you an easy point to branch from in the future, and it also makes you think about which areas of your source code belong together in the same branch. It will also help you should you ever decide to branch.

Without branching, you effectively have only one branch of code for all teams to work in. This technique works great when you have small teams working on the same codebase, developing features for the same version of the application, and supporting only one version of the application at a time. At some point, no matter how complex your branching strategy becomes to support your business needs, you need at least one stable area that is your main (also known as the mainline or trunk) code. This is the stable version of the code used for the builds that you create, test, and deploy.

During stabilization and test periods, while you are getting ready to release, it may be necessary for the team not to check in any new code (essentially a code freeze). With smaller teams working on a single version, this does not impact productivity, because the people who would be checking in code are busy testing to ensure that the application works, as well as getting ready for deployment.

The drawback is that nobody can start work on something new until the final build of the current version has been created. The code freeze period can therefore be very disruptive, because work on the next version cannot begin until the current one has shipped. It is at these times that other strategies become useful for teams of any size, even a team of one.

Branch per Release

For development teams that employ branching, the most common branching technique is branch per release. With this technique, the branches contain the code for a particular release version.

Development starts in the Main branch. After a period of time, when the software is considered ready for testing, a branch is made to the V1.0 branch under Releases/Staging or Releases/Test, at which point the test build is pushed into the staging or test environment. Again, after a period of time, when the software is considered ready, a branch is made to the Releases/Production location, and the final production build gets a label indicating which versions of which files were in that version. Meanwhile, development of new features for version 2 (V2 in this example) continues on the Main branch.

A scenario to consider...

Say some minor changes are requested, or bugs are discovered in production, and a small modification is necessary to reflect how the business needs something to work. However, the development group does not want to include all the V2 work that has been going on in the Main branch. Therefore, these changes are made to the V1 branch, and builds are taken from it. Any bug fixes or changes that must also be included in the next version (to ensure the bug stays fixed in that release) are merged back (reverse-integrated, or RI) into the Main branch. If a bug fix was already in the Main branch but needed to go into V1, it might be merged (forward-integrated, or FI) into it. At a certain point, the build is determined to be good, and a new V1.1 build is created from the V1 branch and deployed to production.

During this time, development on the next version can continue uninterrupted, without the risk of new features accidentally making their way into the V1.X set of releases. Suppose that, at some point, V2.0 is deemed ready to go out the door. The mainline of code is then branched again to the V2 branch, and the V2.0 build is created from it. Work can continue on the next release in the Main branch, and it is now easy to support and release new builds to customers running any version you want to keep supporting.
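The flow described above can be simulated by treating each branch as the set of changes it contains. This is a deliberately simplified sketch: the branch names follow the article, while the change names and the `branch`/`merge` helpers are hypothetical.

```python
# Each branch is modelled as the set of named changes it contains.
branches = {"Main": {"login", "reports"}}


def branch(source, target):
    """Create `target` as a snapshot of `source` at this moment."""
    branches[target] = set(branches[source])


def merge(source, target, change):
    """Carry one change from `source` into `target` (an FI or RI merge)."""
    if change not in branches[source]:
        raise ValueError(f"{change!r} is not in {source}")
    branches[target].add(change)


branch("Main", "Releases/V1")               # V1.0 is built from this branch
branches["Main"].add("v2-dashboard")        # V2 work continues in Main
branches["Releases/V1"].add("bugfix-42")    # production fix made in V1
merge("Releases/V1", "Main", "bugfix-42")   # RI so V2 keeps the fix
branch("Main", "Releases/V2")               # V2.0 is ready to ship

print(sorted(branches["Releases/V1"]))  # ['bugfix-42', 'login', 'reports']
print(sorted(branches["Releases/V2"]))  # ['bugfix-42', 'login', 'reports', 'v2-dashboard']
```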

Branch per release is very easy to understand and allows many versions to be supported at a time. It can be extended to multiple supported releases very easily, and it makes it trivial to view and compare the code that was included in a particular version of the application. Branch per release is well-suited to organizations that must support multiple versions of code in parallel -- such as a typical software vendor.

However, for a particular release, there is still no more parallelism of development than in a standard "no branching" strategy. Also, if the organization must support only two or three versions at a time (the latest version, the previous version, and, perhaps, the version currently being tested by the business), this model can lead to a number of stale branches. While having lots of old, stale branches doesn't impact the performance of Team Foundation Server, or even cause any significant additional storage requirements, it can clutter the repository and make it difficult to find the versions you are interested in -- especially if the organization frequently releases new versions. If this is the case, you may want to move old branches into an Archive folder, and keep only the active branches (the versions that the development team is currently supporting) in the Releases folder.
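Deciding which release branches to move into an Archive folder can be as simple as keeping the newest N. A sketch, assuming a hypothetical `V<number>` naming scheme for release branches (note the numeric sort, so V10 comes after V9):

```python
def plan_archive(release_branches, keep_latest):
    """Split release branch names such as 'V3' into (active, archive).

    Assumes a hypothetical 'V<number>' naming scheme.
    """
    ordered = sorted(release_branches, key=lambda name: int(name.lstrip("V")))
    cut = max(0, len(ordered) - keep_latest)
    return ordered[cut:], ordered[:cut]


active, archive = plan_archive(["V10", "V2", "V9", "V1", "V8"], keep_latest=3)
print(["Releases/" + b for b in active])   # ['Releases/V8', 'Releases/V9', 'Releases/V10']
print(["Archive/" + b for b in archive])   # ['Archive/V1', 'Archive/V2']
```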

Code-Promotion Branching

An alternative to branch per release is code-promotion branching (or, as it is sometimes referred to, promotion-level branching). This technique involves splitting the branches into different promotion levels.

As before, development starts with just the Main branch. When the development team is ready to test the application with the business, it pushes the code to the Test branch (also often called the QA or Staging branch). While the code is being tested, work on the next development version is carried out in the Main branch. If any fixes are required during testing, they can be developed on the Test branch and merged back into the Main branch for inclusion in the next release. Once the code is ready to release, it is branched again from Test to Prod. When the next release cycle comes along, the same process is repeated: changes are merged from Main to Test, and then from Test to Prod.
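The promotion flow can be modelled the same way, with each level merging its contents into the next one up. A toy sketch; the change names are hypothetical:

```python
LEVELS = ["Main", "Test", "Prod"]   # promotion order described above


def promote(state, from_level):
    """Merge everything in `from_level` into the next promotion level."""
    i = LEVELS.index(from_level)
    if i == len(LEVELS) - 1:
        raise ValueError("Prod is the final promotion level")
    state[LEVELS[i + 1]] |= state[from_level]


state = {"Main": {"f1", "f2"}, "Test": set(), "Prod": set()}
promote(state, "Main")         # ready to test with the business
state["Main"].add("f3")        # next version's work continues in Main
state["Test"].add("fix1")      # fix found during testing...
state["Main"].add("fix1")      # ...and merged back into Main
promote(state, "Test")         # ready to release
print(sorted(state["Prod"]))   # ['f1', 'f2', 'fix1']
```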

Code-promotion branching works well in environments that have a single version running in production but have long test-validation cycles that do not involve all of the development team. This allows development to continue on the next version in Main while testing and stabilization of the build occur in the Test branch. It also makes it trivial for the development team to look at the code currently on each system. Finally, this branch structure makes it easy to create an automated build and deployment system using Team Foundation Build that can automatically update the QA/Test/Staging environment as code is pushed to the QA branch.

Feature Branching

The previous branching strategies all involve a single team working on the system in its entirety as it works towards a release. All features for that release are developed in parallel, and the build can be deployed only when all features in flight have been completed and tested. However, in large systems, or systems that require very frequent deployment (such as a large commercial website), feature branching (or branch per feature) can be useful.

Feature branching is used when a project requires multiple teams to work on the same codebase in parallel. For example, you could have four feature teams working in separate branches (let's call them F1, F2, F3, and F4). Note that in a real branching structure, the feature branches themselves would likely have meaningful names such as BroadbandSelling, TroubleTickets, or whatever shorthand the project uses to refer to the feature under development. The Main branch is considered "gold code", which means no active development goes on directly in this branch. However, a feature must be reverse-integrated into this branch for it to appear in the final build and for other teams to pick it up.

Initially, F1 is started with a branch from Main. While it is being developed, a second and a third team start F2 and F3, respectively. At the end of development of the feature, F1 is merged back into the Main branch, and the F1 branch is deleted. That team then starts on feature F4. The next feature to finish is F3, followed by F2. At each point, once a feature is merged into the Main branch, a new version of the software is released to the public website, but only one version is ever supported at any time.
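The feature lifecycle above can be simulated in a few lines of Python. This is a toy model with hypothetical change names; it also shows the integration debt discussed next, since F2 does not contain F1's work until it takes an integration from Main.

```python
main = {"core"}
features = {}


def start_feature(name):
    features[name] = set(main)        # branch from Main as it is right now


def commit(name, change):
    features[name].add(change)


def finish_feature(name):
    main.update(features.pop(name))   # RI into Main, then delete the branch


start_feature("F1")
start_feature("F2")
start_feature("F3")
commit("F1", "f1-work")
finish_feature("F1")
start_feature("F4")                   # the F1 team moves on to F4
commit("F3", "f3-work")
finish_feature("F3")

print(sorted(main))                   # ['core', 'f1-work', 'f3-work']
print(sorted(features))               # ['F2', 'F4']
print("f1-work" in features["F2"])    # False: F2 has not taken F1 from Main yet
```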

Feature branching allows for a large amount of parallel development. However, this comes at the cost of delaying the pain of integrating each team's changes until the feature is complete, and you are merging the feature branch back into the Main branch. For example, when merging the F2 branch, all changes and inevitable conflicts introduced by features F1, F2, F3, and F4 must be analyzed and resolved.

The longer code is separated into branches, the more independent changes occur and, therefore, the greater the likelihood of merge conflicts. To minimize conflicts and reduce the amount of integration debt building up, you should do the following:
  • Keep the life of the feature short. Features should be as short as possible and should be merged back into the Main branch as soon as possible.
  • Take integrations from the Main branch regularly. When F1 is merged back into Main, the feature teams still working on their features should merge those changes into their feature branches at the earliest convenient point.
  • Organize features into discrete areas in the codebase. Having the code related to a particular feature in one area will reduce the amount of common code being edited in multiple branches and, therefore, reduce the risk of making conflicting changes during feature development. Often, the number of teams that can be working in parallel is defined by the number of discrete areas of code in the repository.
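The "discrete areas" advice can be checked cheaply by comparing the sets of paths each feature branch has touched. A sketch, assuming you can build such a map (the function name and paths are hypothetical):

```python
from itertools import combinations


def overlapping_features(touched):
    """Return pairs of feature branches that changed the same paths.

    `touched` maps each feature branch to the set of paths it has edited.
    """
    overlaps = {}
    for a, b in combinations(sorted(touched), 2):
        shared = touched[a] & touched[b]
        if shared:
            overlaps[(a, b)] = sorted(shared)
    return overlaps


touched = {
    "F1": {"Billing/Invoice.cs"},
    "F2": {"Tickets/Ticket.cs", "Common/Util.cs"},
    "F3": {"Common/Util.cs", "Broadband/Offer.cs"},
}
print(overlapping_features(touched))   # {('F2', 'F3'): ['Common/Util.cs']}
```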
When using feature branching, the whole team does not necessarily have to be involved. For example, one or two developers might split off from the rest of the team to work on a well-isolated feature when there is a risk that the work might not pan out (they are working on a proof of concept), or when it is decided that the current release should not wait for that particular feature to be implemented.

What does this all mean?

In conclusion, you can now see the benefits of having a solid branching strategy in place before using your repository tools, whether TFS or some other solution. TFS not only allows for some complex software configuration management scenarios, but also provides the tooling to help you understand what is happening with changes in your repository.