Monday, May 18, 2015

When Not To Refactor

Refactoring software is a crucial part of extending the life of software. Refactoring contributes to enhancing the maintainability of the software by incrementally improving the design, readability and modularity of the components. But not much has been said about when not to refactor software.

Don't refactor code unless you need to change the code for a business reason.


One of the common mistakes I often see with regard to refactoring is people refactoring code that doesn't need it under the guise of making it better. The argument usually goes something like "this needs to be more abstract", "I wrote this code a long time ago and it is crappy", "this code is too complex" or something along those lines.

You should only refactor code when you are already in the code to make a change to support the business. That may sound counterintuitive, but one of the worst things we can do is change code that doesn't have a reason to change, however crappy, unreadable or complex it is.

Valid business reasons to change code include (but are not limited to):

  • Adding new functionality.
  • Extending existing functionality.
  • Making measurable performance improvements.
  • Adding a layer of abstraction in order to support a new use case.
  • Modularizing a particular object so that it can be reused in another part of the system.

Adding new functionality or Extending existing functionality


This is where the Boy Scout rule comes into play. If you are already in the code for another reason then you should clean up the code even if you didn't make the mess.

Making measurable performance improvements


This one is probably self-explanatory, but it's important to note that performance improvements will usually require some level of refactoring.

Adding a layer of abstraction in order to support a new use case


This is an important one to understand. Often people will overgeneralize code at the beginning. This leads to overly complex designs and less readable code. If you follow the rule of not creating a layer of abstraction until you have at least two or three use cases for the code, then there will come a point when you need to refactor the code in order to provide a layer of abstraction that doesn't already exist.

Until that second or third use case comes about, the code should not be generalized. You don't have enough information about future uses of the code to get the abstraction correct. You may get lucky and guess the future abstraction correctly, but you don't want to run your business on guesses and luck.
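To make that concrete, here is a minimal sketch in Python. Everything in it (the function names, the record shapes, the export formats) is made up for illustration. The point is only the order of events: the general helper shows up after several real callers exist, not before.

```python
# Hypothetical example: the function names and record shapes are made up.

# With one use case, the concrete version is all we need:
#
#     def export_orders_csv(orders):
#         return "\n".join(f"{o['id']},{o['total']}" for o in orders)

# Once a second and third caller need the same behavior, the shared shape
# is visible and we can refactor toward it with real requirements in hand.
def export_rows(rows, fields, separator=","):
    """Join the chosen fields of each row into one delimited line per row."""
    return "\n".join(
        separator.join(str(row[name]) for name in fields) for row in rows
    )


def export_orders_csv(orders):
    return export_rows(orders, ["id", "total"])


def export_customers_tsv(customers):
    return export_rows(customers, ["id", "name"], separator="\t")


def export_invoices_csv(invoices):
    return export_rows(invoices, ["number", "due", "amount"])
```

Notice that the separator only became a parameter because a real caller needed tabs. Had I guessed at the abstraction up front I probably would have guessed at different parameters.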

Modularizing a particular object so that it can be reused in another part of the system


Code reuse is one of the most important tenets of object-oriented programming. When we identify code that is not specific to a particular object or package AND is needed in some other part of the system we should refactor this code into its own module. It's important to ONLY do this when the code is actually needed in another part of the system.

Don't refactor code without tests


In order to refactor code safely you should have unit and integration tests for the existing functionality. I would also argue that you should write tests for the new functionality before you refactor. This will help you understand the proper way to refactor the code, as it helps you define how the refactored code should be used from a consumer's standpoint.

If the tests don't exist for the existing functionality you should write them first, before you start refactoring. This helps ensure that you don't cause a new bug in the code or regress an old bug when refactoring.
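As a rough illustration, here is what writing the tests first might look like using Python's built-in unittest module. The shipping_cost function is a made-up stand-in for whatever legacy code you are about to touch; the tests simply pin down what it does today so that a refactor can't silently change it.

```python
import unittest


def shipping_cost(weight_kg, express=False):
    """Hypothetical legacy function that we intend to refactor."""
    cost = 5.0
    if weight_kg > 10:
        cost = cost + (weight_kg - 10) * 0.5
    if express:
        cost = cost * 2
    return cost


class ShippingCostCharacterization(unittest.TestCase):
    """Records the current behavior before any refactoring starts."""

    def test_light_package_pays_base_rate(self):
        self.assertEqual(shipping_cost(2), 5.0)

    def test_heavy_package_pays_per_extra_kg(self):
        self.assertEqual(shipping_cost(14), 7.0)

    def test_express_doubles_the_total(self):
        self.assertEqual(shipping_cost(14, express=True), 14.0)
```

Run these with python -m unittest before and after each refactoring step. If they pass before and fail after, the refactor changed behavior.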


Monday, May 11, 2015

The Engineers Cloud

In my previous post in this series I explained the aspect of The Cloud that I like to call The Consumers Cloud. I explained how The Consumers Cloud breaks down into data management services, social media, and streaming media. In this post I'll talk about the second aspect of The Cloud.

The Engineers Cloud


I call this type of Cloud use The Engineers Cloud because this aspect of The Cloud isn't something you as a consumer interact with directly. Instead, engineers are taking advantage of Cloud services to enhance how you interact with their content and services.

What The Cloud Means To Engineers


While there are almost no limits to the things you can do in The Cloud from an engineer's perspective, there are two main areas I'd like to focus on here. The first is as a means of distributed computing. The second is better reliability of their services.

Distributed Computing


The Engineers Cloud allows you to take advantage of the virtual limitlessness of server resources in The Cloud. In the days before The Cloud, server resources were finite. You only had the amount of resources you could afford to keep running all the time. These resources lived in data centers.

Companies like Facebook, Netflix, Amazon, and Google use The Cloud to do a variety of tasks that would be nearly impossible with a fixed set of resources. The ability to spin up an (almost) unlimited number of servers running your services means that you can parallelize computing to a degree that was not possible a decade ago.

Some examples of engineers using The Cloud as a means of distributed computing:

  • NASA's Jet Propulsion Laboratory (JPL) uses the cloud to capture and store images and metadata collected from the Mars Exploration Rover and the Mars Science Laboratory missions. They operate the mars.jpl.nasa.gov website out of The Cloud without building this infrastructure themselves.
  • AccuWeather is using the cloud to serve over 4 billion requests a day.
  • Evite is using the cloud to send more than 250 million party invitations each year.
  • Netflix is using the cloud to stream videos to its online streaming customers. It's able to take advantage of The Cloud's distributed computing to analyze a very large amount of data and turn it into recommendations and personalization.

Better Reliability


This is going to sound counterintuitive, but one of the reasons that The Cloud is more reliable is that when planning to put your software and services in The Cloud you have to plan for failure. The best example of this in practice that I'm aware of is Netflix's Simian Army.

The Cloud allows you to plan for failure and provide better reliability because it allows for:
  • Redundancy through geo-distributing services.
  • Redundancy through clustering your services.
  • Reduced latency through DNS services.

Redundancy through geo-distributing services


Most Cloud providers offer the ability to deploy your software and services to many different regions around the world. This allows you to keep your software and services running even if there are data center outages in a specific region (think of something like the Northeast blackout of 2003) by having your services fall back from one geographic region to another if the initial region is down.
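The fallback idea can be sketched in a few lines of Python. This is not how any particular Cloud provider implements it; the region names and the health map are assumptions I made up to show the logic of trying regions in order of preference.

```python
def pick_region(preferred_order, healthy):
    """Return the first region, in order of preference, that is healthy.

    preferred_order: region names, most preferred first (hypothetical names).
    healthy: dict of region name -> bool, e.g. filled in by health checks.
    """
    for region in preferred_order:
        # A region we have no health information for is treated as down.
        if healthy.get(region, False):
            return region
    raise RuntimeError("no healthy region available")


regions = ["us-east", "us-west", "eu-west"]
status = {"us-east": False, "us-west": True, "eu-west": True}
print(pick_region(regions, status))
```

With the first region marked as down, traffic lands on the next region in the list instead of failing outright.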

Redundancy through clustering your services


Most Cloud providers give you the ability to cluster your services behind some sort of virtual load balancer. Most of these load balancers will automatically stop sending traffic to a machine that is not responding, or that returns a particular error for a predefined URL on the machine.

While clustering your services behind a load balancer allows you to remove or replace a machine that isn't functioning properly, it is also the primary means by which you can quickly scale up your service to meet demand. If your service is experiencing a higher than expected load you can spin up a new server in your cluster and scale proportionally with your traffic.
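Here is a toy model of that behavior in Python. It is a sketch under my own assumptions, not a real load balancer: the is_healthy callback stands in for the check against a predefined URL, and the server names are placeholders.

```python
class LoadBalancer:
    """Toy round-robin load balancer that skips unhealthy servers."""

    def __init__(self, servers, is_healthy):
        self.servers = list(servers)
        # is_healthy(server) -> bool stands in for a real health check,
        # such as requesting a predefined URL on the machine.
        self.is_healthy = is_healthy
        self._next = 0

    def add_server(self, server):
        """Scale up by adding another machine to the cluster."""
        self.servers.append(server)

    def route(self):
        """Return the next healthy server in round-robin order."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if self.is_healthy(server):
                return server
        raise RuntimeError("no healthy servers in the cluster")


down = {"b"}
balancer = LoadBalancer(["a", "b", "c"], lambda server: server not in down)
print([balancer.route() for _ in range(4)])
```

Server "b" never receives traffic while it is failing its health check, and a call to add_server is all it takes for a new machine to start receiving requests.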

Reduced latency through DNS services


DNS is how the internet turns the name of a service we go to into the address that the service resides at. For example, when you type http://paul.oremland.net into your browser your computer is doing a DNS lookup of paul.oremland.net and being given an IP address. It then uses that IP address to talk directly with the service.

Many Cloud providers allow you to virtually control DNS based on characteristics of the request. Some services allow you to route traffic based on latency to or load on the receiving services. This allows you to distribute your traffic more evenly and provide a better customer experience. Instead of simply pointing your users at a specific machine, you can point them to different machines based on the current state of your system and what gives the users a great experience.
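A simplified way to picture latency-based routing is a lookup table with more than one address per name, where the answer depends on a measurement. The sketch below is only an illustration: the host name, the addresses (taken from the ranges reserved for documentation), and the latency numbers are all made up, and real DNS services are far more involved.

```python
def resolve(name, records, latency_ms):
    """Return the address for `name` with the lowest measured latency.

    records: dict of host name -> list of candidate IP addresses.
    latency_ms: dict of IP address -> latency as seen by this user.
    """
    candidates = records.get(name)
    if not candidates:
        raise LookupError(f"no records for {name}")
    # An address with no measurement is treated as the slowest option.
    return min(candidates, key=lambda ip: latency_ms.get(ip, float("inf")))


records = {"service.example.com": ["192.0.2.10", "198.51.100.20"]}
latency_from_europe = {"192.0.2.10": 120.0, "198.51.100.20": 35.0}
print(resolve("service.example.com", records, latency_from_europe))
```

Two users asking for the same name can be handed different addresses, each one pointing at the machine that is currently fastest for them.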

Monday, May 4, 2015

The Consumers Cloud

In my previous post in this series I gave a basic overview of what The Cloud is, its benefits, its high level infrastructure, and why you should care about it. In this post I will go into more detail of what I call The Consumers Cloud.

The Consumers Cloud


Often when people talk about The Cloud what they're really talking about are the applications that are built on top of, and enabled by, the infrastructure of The Cloud. At a high level those applications are what I would call The Consumers Cloud.

The main purpose of The Consumers Cloud is to provide distributed access to your data and the services that provide that data. Your data is typically comprised of images, videos, and documents but really it can be any files you need to put or get from a variety of machines in a variety of locations.

In The Consumers Cloud you don't interact with The Cloud directly. Instead you interact with services that are built in The Cloud. Those services are the means by which your data is moved around and presented to you on a variety of devices (mobile, desktop, etc.).

The Consumers Cloud breaks down into three high level areas. The first is data management services, the second is social media, and the third is streaming media.

Your Hard Drive Is Everywhere


You can think of The Consumers Cloud as your hard drive that is everywhere. Services like Dropbox, Amazon Cloud Drive, Microsoft OneDrive, and Apple iCloud all provide the ability to store your files on their servers in order for them to be accessible from anywhere on almost any machine. They take your data and, using very sophisticated algorithms, distribute that data in such a way as to make reading and writing it from anywhere in the world possible and fast.

Nowadays when you purchase a mobile phone it usually comes with some sort of Cloud backup. That means that the pictures and videos you capture on your phone are uploaded to one of these services and made accessible to you from your many different devices. You can share this media with others much more easily since it isn't stored only locally on your phone, tablet, laptop, or desktop.

Your Social Life Is Nowhere


The second high level area that The Consumers Cloud breaks into is social networking. Facebook, Twitter, Instagram, Pinterest, and others all exist in The Cloud. Sometimes these social networks need very few servers to serve the traffic of their users. Sometimes they need thousands of servers to meet their peak demand. Without The Cloud they wouldn't be able to scale down or up in a cost-effective way to handle the large volume of traffic they get. The Cloud also allows them to distribute data and load in such a way as to connect their users to servers that are closer to them or that have less load at any given time. The Cloud allows them to handle the ebbs and flows of their traffic patterns so that their services are always there.

Your Entertainment Is Just There


The last high level area that The Consumers Cloud breaks into is streaming media. Good examples of this are Amazon Prime, Netflix, and YouTube. All these services are major players in the online entertainment business and all of them rely on The Cloud as the backbone of their services. They use The Cloud to optimize the distribution of media so that it can be accessed by millions of people without having millions of people each hitting their data stores for every piece of media every time.

The Consumers Cloud is about you, your data, and your online life. In my next post in this series I'll detail The Engineers Cloud.

Monday, April 27, 2015

A brief overview of The Cloud

In my first post on The Cloud I explained how The Cloud was born. In this post I want to give a high level overview of The Cloud and explain why you should care about it.

At a very high level The Cloud can be broken up into two overlapping but distinct groups denoted by their use cases. I'll refer to the first group as The Consumers Cloud and the second as The Engineers Cloud. My next post in this series will go into more detail about The Consumers Cloud, while my last post will go over The Engineers Cloud.

The commonality between both groups is that The Cloud is a set of servers distributed in multiple regions throughout the world. Having multiple servers is what allows The Cloud to handle a lot of data. Having those servers distributed regionally around the world allows The Cloud to be fast by reducing the distance between you and the data that is stored in The Cloud.

The Infrastructure of The Cloud


Cloud providers like Amazon AWS, Microsoft Azure, Google Cloud Platform, and others provide a set of services that abstract interacting with their servers so that the software their customers build doesn't have to worry about the details of each individual machine.

Most Cloud providers take concepts like file storage, complex computation, geographic regionalization, and security and make them scalable with demand. If a service has high demand it can have lots of resources to meet that demand. If the demand is low it can use fewer resources. This is very different from the data center model, which required a fixed number of servers. If you needed to scale your service you had to purchase additional hardware. When your demand was low those extra servers were sitting around idle. One of the goals for Cloud providers is to optimize use. This allows customers to pay for only what they use when they need it and to optimize for their particular usage pattern.
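A little arithmetic shows why this matters. The sketch below compares server-hours under the two models; the hourly load numbers and the per-server capacity are invented for the example and don't come from any real service.

```python
import math


def servers_needed(requests_per_second, capacity_per_server):
    """Servers required for a given load, keeping at least one running."""
    return max(1, math.ceil(requests_per_second / capacity_per_server))


def elastic_server_hours(hourly_load, capacity_per_server):
    """Cloud model: each hour runs only the servers that hour needs."""
    return sum(servers_needed(load, capacity_per_server) for load in hourly_load)


def fixed_server_hours(hourly_load, capacity_per_server):
    """Data center model: enough servers for the peak, running every hour."""
    peak = max(servers_needed(load, capacity_per_server) for load in hourly_load)
    return peak * len(hourly_load)


# Hypothetical requests per second for six consecutive hours.
hourly_load = [200, 150, 900, 2400, 800, 300]
capacity = 500  # hypothetical requests per second one server can handle

print(fixed_server_hours(hourly_load, capacity))    # 30
print(elastic_server_hours(hourly_load, capacity))  # 12
```

In this made-up scenario the fixed model pays for 30 server-hours to survive a single busy hour, while the elastic model pays for 12.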

Why you should care about The Cloud


The Cloud is what enables so much of what we've come to rely on in our everyday lives. When you purchase music from one of the many streaming services, you're investing in The Cloud. When you watch Netflix or Amazon Prime, you're investing in The Cloud. In fact, you wouldn't have applications like Dropbox, Instagram, Facebook, iTunes, or YouTube without The Cloud.

Monday, April 20, 2015

How The Cloud Was Born

The Cloud has become a phrase with so much meaning that it has become meaningless, very much like Web 2.0 was in the early 2000s. Often two people talking about The Cloud can be talking about two completely different things.

This series is not meant to teach you how to use The Cloud or how to get your start-up going on The Cloud. The purpose of this series is to provide a brief overview of some of the typical uses and meanings of The Cloud to help make sense of how it fits into your everyday life.

I'd like to start this series on The Cloud by explaining at a high level how The Cloud was born. I'm only going to deal with the concepts of The Cloud and not with the specific milestones and companies that contributed to the birth of The Cloud.

The Data Center


The Data Center can be thought of as the predecessor of The Cloud. I say predecessor because for most companies The Cloud replaced the traditional data center. But predecessor is really a misnomer because The Cloud couldn't exist without the data center. In fact The Cloud is a series of data centers interconnected with a set of services that abstract interacting with the servers and services running in them. The Cloud is really just a commoditization of the traditional data center.

Running a data center meant that your company had to have more than one core competency. In addition to the core competency of building whatever it is that your company built you also had to have expertise in infrastructure, operations, and network engineering (at a minimum).

Infrastructure


Running a data center meant you had to have experts on your payroll that understood server hardware. Running your software and services in a data center meant you either owned the data center yourself or you were renting space in someone else's data center (more likely the case). In either case your primary goal was making sure the servers stayed up and running. Your secondary goal was to try to reduce your capital expense and increase the efficiency of your data center by purchasing less and running more on what you purchased.

Operations


IT Operations specializes in the deployment and maintenance of the software and services you run. In the data center world an IT Ops engineer will plan and execute the deployment of software either in the traditional model with a hand-off from software engineering to operations or with the now more common model of DevOps.

The Cloud doesn't totally negate the role of operations but instead augments the software engineer's role to also include deployment and maintenance of the software. This is usually done using the Continuous Delivery model of software development.

Network Engineering


Running a data center also required your company to have expertise in network engineering. Network engineers are responsible for the communications infrastructure within the data center. They manage how data gets in and out of the data center as well as between servers and services in the data center. They maintain the physical and virtual network infrastructure.

And Then Came The Cloud


As companies focused more and more on the delivery of their own software, a market emerged where companies provided the ability for you to run your software and services on their servers and offload the management of your IT infrastructure, operations, and networks to them. Thus The Cloud was born.

Monday, April 13, 2015

Is it a job or a career?

Every so often I'll be having a conversation with someone who is not in the technology industry and the conversation will turn to the fact that people in the software industry are "always on". They note that we don't work a typical 9-5, that we're often working nights and weekends either checking email, coordinating with someone about a project we're working on, or simply writing some code.

I would venture to guess, at least from the engineer's perspective, that this has a little to do with the fact that a large number of software engineers are in the industry because it was a hobby before it was a job. A lot of us decided to write software as our day job simply because we could get paid for doing something we were already doing in our free time. But that's tangential to the "always on" appearance we seem to have.

I'd like to posit that it's not something unique to the software industry but points to the difference between a job and a career. In fact I would argue that the difference between a job and a career has nothing to do with what your role is or what title is currently listed on your business card (or if you have a business card at all).

The dictionary definitions of job and career don't seem to vary very much. A job is a place you get paid for work you perform. A career is an occupation you've had for an extended period of your life. Really, you could say that the dictionary defines a career as a job that you've had for a long time. But I think those definitions miss the point. There's more to a career in my opinion, or at least a successful one.

Many people have jobs that they go to day in and day out for the majority of their lives. They work long hours, nights, and/or weekends. They stay in these jobs for many, many years, earning a paycheck and checking out when they're not in the office. This may be a very satisfying job, but to me this is not a career.

To me, a career starts with a job that you want to be at. Not just because that job pays or has the ability for you to "turn off" when you get home. Instead, it's a job that gives you an opportunity to grow both as a person and in your field. A job that stretches you both mentally and (if you're lucky) physically. A career is challenging, causing you to navigate uncharted territory. A career is something you invest in and at the end have been changed by. 

I don't think a career is about whether you're "always on" or able to "check out" at the end of the day. It's about whether or not you're growing, changing, and being challenged.

If you're in a job that you thought was a career, what's stopping you from the career you want?

Monday, April 6, 2015

Software Estimation: How to estimate software accurately part 3

In this series on software estimation I defined an accurate estimate as something that will:
  • Give you an understanding of risk and unknowns.
  • Quantify the known work.
  • Be something that you can base a big decision on.
  • Be refined as new information becomes available.
  • Be a range with a 90% confidence interval.
In part 1 I explained how to account for risk and unknowns in your software estimation. In part 2 I explained how to quantify the work that is known. In the final post of this series I explain the last pieces needed to give an accurate software estimation.

Identifying Opportunity Cost


Let's say we estimate project X and determine that it will take 2 developers 2 months to complete. After estimating project Y we know that it will take 2 developers 1 month to complete (or 1 developer 2 months). Finally, when estimating project Z we determine that it will take 1 person 1 month. We now have a starting point from which to determine, based on cost to build and return on investment, if there is an opportunity loss associated with starting project X before projects Y or Z.
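To show how those estimates become a starting point for the comparison, here is a small Python sketch. The developer and month figures are the ones from the example above. The expected returns are numbers I made up purely to illustrate the calculation; in practice they would come from the business.

```python
# Effort figures come from the estimates above. The expected returns are
# made-up values (in arbitrary units) used only to illustrate the comparison.
projects = {
    "X": {"developers": 2, "months": 2, "expected_return": 40},
    "Y": {"developers": 2, "months": 1, "expected_return": 30},
    "Z": {"developers": 1, "months": 1, "expected_return": 20},
}


def developer_months(project):
    """Cost to build, expressed as developers multiplied by months."""
    return project["developers"] * project["months"]


def return_per_developer_month(project):
    return project["expected_return"] / developer_months(project)


def rank_by_return(all_projects):
    """Project names ordered from best to worst return per developer-month."""
    return sorted(
        all_projects,
        key=lambda name: return_per_developer_month(all_projects[name]),
        reverse=True,
    )


for name in rank_by_return(projects):
    print(name, developer_months(projects[name]), "developer-months")
```

With these invented returns, starting project X first would tie up 4 developer-months while the better-returning Y and Z wait. Different return numbers could easily flip that order, which is why the estimate is only the starting point.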

Refining As New Information Becomes Available


Most software estimates will start out as a range. Typically larger projects will have a wider range. This is because there are always things we know we don't know about the project as well as things that we don't know that we don't know.

It is easier to reduce ambiguity with information we know we don't know because we know where to start. As you practice software estimation you'll start to learn the correct questions to ask to identify the information you don't know you don't know.

Identifying new information should result in refining your previous estimate. The more ambiguity you are able to remove the more accurate your software estimate can be. The goal should be to continually narrow the range of the estimate.

More Accurate When Accompanied By A Confidence Interval


A confidence interval is a range of numbers within which we expect the real value to fall. Having a good confidence interval allows us to determine the range of our estimate. This range can then be examined to validate our assumptions and to identify ambiguity.

For example, if I asked you to estimate how many marbles fit in a mason jar, you would give me a range that you are 90% sure contains the actual number of marbles that would fit. My 90% confidence interval may be 25 - 100 marbles. The actual number of marbles may be 87, in which case my interval contained the real value.

If you're interested in learning more about how to compute a confidence interval you should read How to Measure Anything by Douglas Hubbard.
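One simple way to check whether your 90% intervals really are 90% intervals is to look back at past estimates and count how often the actual value landed inside the range. The sketch below does that in Python. The first entry is the marble example from above; the rest of the history is made up for illustration.

```python
def contains(interval, actual):
    """True when the actual value falls inside the (low, high) interval."""
    low, high = interval
    return low <= actual <= high


def hit_rate(intervals, actuals):
    """Fraction of past estimates whose interval contained the real value."""
    if not intervals:
        raise ValueError("need at least one past estimate")
    hits = sum(contains(i, a) for i, a in zip(intervals, actuals))
    return hits / len(intervals)


print(contains((25, 100), 87))  # True, the marble example

# Hypothetical history of estimated ranges and what actually happened.
past_intervals = [(25, 100), (10, 20), (3, 8), (40, 60)]
past_actuals = [87, 25, 5, 44]
print(hit_rate(past_intervals, past_actuals))  # 0.75
```

If you claim 90% confidence but your hit rate over many estimates is closer to 75%, your ranges are too narrow and you should widen them until the two numbers agree.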

Identify And Clarify Ambiguity


I'll wrap up this post with some questions you can ask to identify and clarify ambiguity in your software estimates.
  • What assumptions does your estimate rely on?
  • What are the risks associated with your estimate? Identify the things that could cause your estimate to be wrong.
  • What external dependencies does your project have? For example OS updates, product launches, planned outages, SDK updates, etc.
  • What data do you need? Is it already available in a consumable format?
  • What services are required? Do they exist? Do they talk the correct protocol?
  • Are there any User Interface (UI) or User Experience (UX) dependencies that need to be resolved?