Machines, People and Processes
When we refer to IT infrastructure, we tend to forget that it is much more than hardware. In fact, the hardware you use is the least of your concerns. You buy the best hardware with the shiniest technology, plug it into power, connect it to the network, and it is up and running.
Is this enough? Definitely not. If you look into the history of your problems over, say, a year, you will usually see that the cause of downtime is not hardware. Everything is redundant, there is no single point of failure, and yet applications still have downtime. And problems get more serious as the scale grows: everything runs just fine when you have 10 servers and 100 users, but when you have 5,000 servers, thousands of applications with updates every day, and users around the clock from different parts of the world, things start to get messy.
To have complete control over infrastructure, you have to consider three separate aspects (this triad is also known as the 3 P’s: Products, People, Processes):
- Technology (Machines) – hardware (servers, storage, network), system software, middleware
- People (Operations) – the people operating the hardware, running maintenance, troubleshooting, fixing problems
- Processes (Business) – the workflows people should follow in daily work: what to check, whom to contact and inform, how to report and escalate, how to document
Cause of downtime
So, what causes planned and unplanned downtime? Industry research has shown that 20% of downtime is planned and 80% is unplanned. For unplanned downtime, most problems come from people (human errors) and processes (people not knowing what to do), and fewer from machines. I always like to show a diagram of this breakdown; it puts things in perspective.
Can downtime be prevented? Sure. This is where DevOps is supposed to bring the revolution, and this is what this blog is about. DevOps is a combined operational model which brings together people from many teams, technologies and processes. Administrators, architects and developers work together in one team to ensure that services run non-stop in complex, dynamic environments.
Has that really worked anywhere? Partially, yes. At full scale? I don’t know. It is more a destination to reach, and some are further ahead while others are just starting. We have good paradigms from Netflix, Google, Amazon and other companies, but the challenge remains for complex landscapes with many applications and services.
Is DevOps enough to reach the non-stop IT target? Possibly not. Nobody claims that DevOps is perfection; it is one step forward. When we get close to the destination, we may start realizing that there are more targets to reach, or better models.
Infrastructure under DevOps
So, how is infrastructure different under DevOps? DevOps does not refer to infrastructure directly; it is a model for running IT, so it refers mostly to people and processes. Of course, we can design and build infrastructure in a way that supports the DevOps model. To do that, we should focus on the problems DevOps tries to solve (presented above) and ask ourselves how we can build the infrastructure in a helpful way. We should also consider these problems in a “globalized”, “continuously changing” and “always available” scenario.
We will elaborate on this in further posts. In short, the infrastructure should allow deploying applications and services in a continuous and non-intrusive way. It should be capable of changing dynamically: to address failures, to satisfy new demand from users, to allow concurrent maintenance by operations, and to predict issues and involve humans proactively before those issues threaten system operation. In the end, it should guard against human errors. And all of that in a way which is transparent to the supported applications and services.
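To make the last point a bit more concrete, here is a minimal sketch in Python of the kind of check-and-triage loop such an infrastructure layer might run: it classifies services as healthy, as needing a proactive human heads-up, or as candidates for automatic remediation. The service names, thresholds and categories are hypothetical illustrations, not a real monitoring API.

```python
from dataclasses import dataclass

@dataclass
class ServiceHealth:
    """A snapshot of one service's health metrics (illustrative)."""
    name: str
    error_rate: float   # fraction of failed requests in the last window
    latency_ms: float   # 95th-percentile response time

# Hypothetical thresholds: warn humans early, act automatically late.
WARN_ERROR_RATE = 0.01
FAIL_ERROR_RATE = 0.05
MAX_LATENCY_MS = 500.0

def triage(services):
    """Classify each service as 'ok', 'warn' (notify humans proactively,
    before users are affected), or 'remediate' (automatic action, e.g.
    restart or reroute, without waiting for a human)."""
    actions = {}
    for svc in services:
        if svc.error_rate >= FAIL_ERROR_RATE:
            actions[svc.name] = "remediate"
        elif svc.error_rate >= WARN_ERROR_RATE or svc.latency_ms > MAX_LATENCY_MS:
            actions[svc.name] = "warn"
        else:
            actions[svc.name] = "ok"
    return actions

if __name__ == "__main__":
    snapshot = [
        ServiceHealth("web-frontend", error_rate=0.002, latency_ms=120.0),
        ServiceHealth("billing-api", error_rate=0.02, latency_ms=300.0),
        ServiceHealth("search", error_rate=0.08, latency_ms=900.0),
    ]
    print(triage(snapshot))
```

In a real environment this loop would run continuously against live metrics, and the "remediate" branch would trigger an automated action rather than a label; the point is that humans enter the picture at the "warn" stage, before the failure, not after it.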