OK, we know that automation brings several benefits and is a necessity for modern IT infrastructure. But, as manual operations give way to decisions and actions performed by machines, a number of dangers arise.
New Types of Errors
Maybe the most important risk is that automation introduces new types of errors, that is, errors which have never been encountered before in a specific environment.
Such errors are caused by existing technical or process inefficiencies which have been underestimated, because humans have easily compensated for them so far. Humans can quickly fix minor unexpected “glitches” without much thinking or elaboration. This is something we lose with automation. As we minimize the human factor, the consequences of seemingly unimportant issues can be multiplied by automation and become really important.
One example is IT security. Some parts of security depend on operators and employees behaving non-maliciously, and this is addressed with legal instruments such as non-disclosure agreements (NDAs). For machines, however, legal bindings simply do not exist. A machine is like an employee who cannot be bound by a contract, motivated by a salary, or deterred by the threat of being fired or prosecuted. Combine this with the fact that we usually grant full administrative access to this “machine-employee” (for example, running scripts with administrative rights, or allowing the tools to create or configure components in real time). Strengthening the security around the automation tools, and thinking through consequences not seen so far, becomes quite important with automation.
- Quick reaction multiplies minor errors – Quick decisions have many advantages, but one obvious disadvantage is that there is no time to assess the consequences and correct mistakes. An incorrect reaction to an event can trigger an avalanche of undesired results. It is important to predict in advance how an automation tool will behave in all circumstances, as there will be no chance for corrections in real time.
In our non-IT lives, we know that fast is not necessarily good (take fast food, for example). In our IT lives, however, we hold the undeniable notion that “a fast computer is better than a slow computer”. This would be absolutely true only if there were no errors. If we assume, for a moment, that we have a computer which makes errors all the time, is fast still better than slow?
Unfortunately, no technology is perfect. There are always errors. So, combining errors with quick decisions poses a risk in itself. Minor errors can quickly escalate to serious situations, without humans having the opportunity to intervene and stop a potential vicious cycle. Think of reaction time as the hysteresis in a closed-loop system: if reactions are very fast (low hysteresis), any system can quickly move out of its stability zone.
- Lack of decision transparency – System operation is obscured behind an automation layer. Automation makes decisions and can modify the infrastructure parameters dynamically, in a way which is not directly transparent to humans. With all automation tools, decisions have to be justified and actions logged, so that it is possible to trace back the chain of events. This can help to identify wrong automated decisions or mistakes made during the design, the configuration or the training of an automated tool. In Machine Learning today, the lack of transparency is considered a major disadvantage. With more transparency, automation can become more accountable and more trusted.
- Undocumented and complex IT architecture – Lack of documentation is always a catalyst for risk. As with the first point mentioned above, humans are the best compensation for the lack of documentation and mature procedures. But, with automation, having a good and documented architecture becomes more important, especially in complex and dynamic environments. Introducing automation in an environment with many undocumented or gray areas in how services and components communicate with each other will only make things worse.
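The points above about fast reactions and decision transparency can be illustrated with a small sketch. It assumes a hypothetical autoscaling tool: the class name `DampedScaler`, the thresholds and the cooldown period are all invented for illustration and do not come from any real product. The sketch uses a dead band between two thresholds plus a cooldown period to slow down reactions, and it records every decision together with its justification so the chain of events can be traced later.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Decision:
    """One logged decision, including the reason it was taken."""
    timestamp: float
    metric: float
    action: str
    reason: str


@dataclass
class DampedScaler:
    # All values below are hypothetical illustration values.
    scale_up_above: float = 0.80     # upper threshold of the dead band
    scale_down_below: float = 0.40   # lower threshold of the dead band
    cooldown_seconds: float = 300.0  # minimum time between two changes
    log: list = field(default_factory=list)
    _last_change: Optional[float] = None

    def decide(self, cpu_load: float, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        in_cooldown = (
            self._last_change is not None
            and now - self._last_change < self.cooldown_seconds
        )

        # What the raw metric alone would suggest.
        if cpu_load > self.scale_up_above:
            wanted = "scale_up"
        elif cpu_load < self.scale_down_below:
            wanted = "scale_down"
        else:
            wanted = "hold"

        # Apply damping and always keep a justification.
        if wanted == "hold":
            action, reason = "hold", "load inside dead band"
        elif in_cooldown:
            action, reason = "hold", f"{wanted} suppressed: cooldown active"
        else:
            action, reason = wanted, f"load {cpu_load:.2f} crossed threshold"
            self._last_change = now

        self.log.append(Decision(now, cpu_load, action, reason))
        return action
```

Calling `decide()` repeatedly with load samples fills `log` with one entry per decision, including the decisions where the tool deliberately did nothing. Whether a dead band and a cooldown are the right damping mechanisms depends on the environment; the point is only that slower, explained reactions leave room for humans to intervene.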
How to mitigate the risks
- Follow a holistic approach with parallel improvement in all three automation tiers: automate low-level tasks, but do not forget to automate the IT services and the IT processes as well to see benefits; otherwise you are creating silos.
- Recognize that automation is not a matter of buying and installing new plug-and-play tools. Automation requires effort and cost. It is a serious investment. You need specialized people to plan and implement it correctly in a particular environment.
- Recognize that automation is not a one-time activity. Avoiding automation errors and realizing benefits requires people who work on and improve the automation continuously.
- Review the entire infrastructure architecture with automation in mind. The more complex the infrastructure is from an operational perspective, the more difficult it is to introduce automation and see benefits. If infrastructure services are provided by third-party suppliers (such as cloud providers), the interactions with them should be made to work without human intervention.
False expectations from Automation
Below are commonly asked questions about IT automation:
- Will automation lower delivery times? According to ITIL, delivery time is a characteristic of an IT process. Automating just one part of the process, with a new tool or a script, will not improve the delivery time much. You need to automate the whole process, across all three tiers, to maximize the results. Example: creating a new virtual server in minutes does not help much if the quality acceptance tests are still manual, or if a manager who is always travelling needs to approve the delivery of the server.
- Can automation reduce operational costs? Yes, it can, but only if you look at it as an investment. It needs effort and spending in the short term, and it needs transformation at an organizational level. With good planning, preparation and proper implementation, it can bring mid- and long-term returns.
- Is automation threatening IT jobs? By definition, automation aims to reduce the human workforce in operations. As with all inventions in human history, humans will be redirected to other types of work, hopefully more advanced. Historically, during such shifts of the workforce, there are people who benefit by accumulating power and people who suffer from the transition. It is a challenge for our economy and society to prepare correctly, and to maintain public control and democratic rules, in order to avoid inequality and unemployment.
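To make the delivery-time question above more concrete, here is a small worked calculation. The stage names and durations are purely hypothetical numbers chosen for illustration, and the stages are assumed to run strictly one after another.

```python
# Hypothetical stage durations, in hours, for delivering one virtual server.
stages_manual = {"approval": 48.0, "provisioning": 8.0, "acceptance_tests": 16.0}

# Automate only the provisioning step (assumed to drop to about 6 minutes).
stages_partial = dict(stages_manual, provisioning=0.1)


def delivery_time(stages):
    """End-to-end delivery time when stages run one after another."""
    return sum(stages.values())


before = delivery_time(stages_manual)    # 72.0 hours
after = delivery_time(stages_partial)    # about 64.1 hours
improvement = (before - after) / before  # fraction of time saved
```

With these assumed numbers, making provisioning almost instantaneous shortens the end-to-end delivery time by only about 11%, because the manual approval and the manual acceptance tests still dominate the process.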