Monday, April 28, 2014

Continuous Delivery

Continuous Delivery is an important topic in the software world, particularly in the DevOps movement. Much of the talk around continuous delivery centers on the technical side of achieving it. But there's another aspect that needs to be considered first: the cultural changes that must happen within an organization in order to truly achieve continuous delivery. These cultural changes are not a prerequisite for continuous delivery, but you should be aware that they accompany it.

The following are the five areas that I believe are required for any organization to be successful at continuous delivery/deployment. Most organizations won't be starting from scratch, but being aware of the cultural change that will come can help accelerate the move to and adoption of continuous delivery.

Cross-discipline teams

Development and operations need to work together. Ideally, they're both represented on the same team. One of the main goals of continuous delivery is to identify defects earlier in the pipeline and to provide business value as early in the pipeline as possible. The best way to achieve this is to think about delivery from the beginning. The project team should be planning the deployment even before the code they'll be deploying is written.

Raise visibility

For every change, the project team should understand how the change can be measured. Here are a few simple questions that you can ask as part of the dev cycle:
  • What metrics are associated with this change and how?
  • What are the risks?
  • Is the performance profile expected to change?
  • How do we measure success for this change?
  • What are the signs that the change is not effective?
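
One way to make these questions hard to skip is to record the answers alongside the change itself. The following is a minimal sketch, not a prescribed tool: the `ChangeChecklist` class, its field names, and the sample values are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ChangeChecklist:
    """Hypothetical record of the answers to the five questions for one change."""

    metrics: List[str] = field(default_factory=list)          # metrics tied to the change
    risks: List[str] = field(default_factory=list)            # known risks
    performance_impact: str = ""                               # expected performance change
    success_criteria: str = ""                                 # how success is measured
    failure_signals: List[str] = field(default_factory=list)  # signs the change is not effective

    def unanswered(self) -> List[str]:
        """Return the names of the questions that still have no answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative values only; they are not taken from a real project.
checklist = ChangeChecklist(
    metrics=["checkout_latency_ms"],
    risks=["cache stampede on deploy"],
    success_criteria="p95 checkout latency drops below 300 ms",
)
print(checklist.unanswered())
```

A team could review the unanswered items during standup before the change enters the pipeline.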

Build trust

Operations and development need to know that enabling the business is the #1 priority for BOTH units. No change should make its way into the deployment pipeline unless it provides business value. If both units understand this, then we have a shared understanding of the importance of each change and a shared desire to get that change out.

In order to build this trust, the operations folks need to know and believe that the development folks will include them in feature discussions. Having operational discussions early and often in the conversation will help ensure successful deployment of the change.

On the flip side, the development folks need to know that the operations folks are including them in infrastructure discussions early and often as well. Again, ideally these folks work on the same team and these discussions are part of the normal standup discussions.


Minimize risks 

We need to understand that failure happens. It's how you deal with failure that leads to success or failure for the change. It's naive to believe that failure can be prevented 100% of the time. Because we know it can't, our goal is to minimize the risk of any change should it fail.

To minimize the risk to the overall system, we should focus on reducing the surface area of any given change. The smaller the change, the less risk there is of a failure associated with that change being catastrophic. The longer the time between releases, the larger the risk of each release. The more code that's released with every change, the larger the risk of failure or bugs.
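
To get a feel for why batch size matters, here is a rough sketch under a deliberately simple assumption: every individual change has the same, independent chance of introducing a defect. The function name and the 2% rate are illustrative assumptions, not measured figures, and real changes are rarely independent.

```python
def release_failure_probability(changes: int, per_change_defect_rate: float) -> float:
    """Chance that a release containing `changes` changes has at least one defect.

    Assumes each change is independently defective with the same probability.
    """
    return 1.0 - (1.0 - per_change_defect_rate) ** changes


# Compare a single-change release with larger batches (2% is an assumed rate).
for batch_size in (1, 10, 50):
    risk = release_failure_probability(batch_size, 0.02)
    print(f"{batch_size:>2} changes per release -> {risk:.1%} chance of at least one defect")
```

Even in this toy model, the chance that a release contains a defect climbs quickly as more changes are bundled together, and a larger release also leaves more places to look when something does break.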

In order to respond to change effectively we need to build a culture that responds to failure well instead of trying to prevent it. The easiest way to respond to failure well is to practice it. Build failure tolerance into your system. Understand how your system responds to failure and identify the expected and appropriate user experience when failure happens.
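
As one small illustration of building failure tolerance in, the sketch below wraps a call to a dependency so that an outage produces a planned, degraded experience rather than an error page. The helper `with_fallback` and the recommendation functions are hypothetical names invented for this example; a real system would also log the failure and likely add timeouts.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Run `primary`; if it raises, return the result of `fallback` instead."""
    try:
        return primary()
    except Exception:
        # The degraded path is the user experience we decided on in advance.
        return fallback()


def fetch_recommendations() -> List[str]:
    # Hypothetical dependency; raises here to simulate an outage.
    raise ConnectionError("recommendation service unavailable")


def default_recommendations() -> List[str]:
    # Static content that is acceptable to show when the dependency is down.
    return ["bestsellers"]


result = with_fallback(fetch_recommendations, default_recommendations)
print(result)
```

Deliberately triggering the failing path in a test or a game day is one way to practice failure before it happens in production.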


Shared responsibility

We all succeed or fail together. It's not possible for operations to succeed and development to fail (or vice versa) and have the product succeed. Both disciplines need to succeed in order for the business to truly succeed.

Once we've grasped the concept that we succeed or fail together, we can create a culture where it's okay to talk about and address our mistakes. We try to avoid finger-pointing, and we are able to do this because people own up to their mistakes and take responsibility for them. They participate in postmortems where they create action items to address the failure and prevent failures of that kind in the future.

Having shared responsibility also means that we treat rollback as a last resort. When a problem arises with a particular change, we stop the line. We work together to figure out the problem when it happens, and we work together to fix it before we move on to making additional changes in the system.

Sharing responsibility means that both the operations folks and the development folks should be on-call for incidents. Nothing provides a better incentive to fix a problem than getting woken up in the middle of the night or interrupted on a Saturday.
