OK, we know that automation brings along several benefits and is a necessity for a modern IT infrastructure. But, as manual operations give way to logical machines and decisions and actions are taken without direct human involvement, a number of dangers arise.
Most importantly, automation introduces new types of errors.
New types of errors are errors which have not been encountered in the specific environment until now. They are typically caused by existing technical or process flaws that have been underestimated, because humans have so far compensated for them easily. This is actually a big advantage of human involvement, which we lose with automation: humans can easily fix minor unexpected “glitches”. But, as we take the human factor out, seemingly unimportant errors will most probably get multiplied by automation and become real problems. Note that making machines capable of handling unexpected situations is a subject studied under AI (Artificial Intelligence), which is outside the scope of this blog.
One example is IT security. Security depends to some degree on operators and employees behaving non-maliciously, and this is reinforced by legal bindings such as NDAs. However, such deterrents simply do not exist for a machine. A machine is like an employee you cannot control through salary, the threat of dismissal, or legal prosecution. Combine this with the fact that we usually grant this “employee” (the automation tool) full administrative access to the infrastructure (e.g. running scripts with administrative rights, or allowing the tools to reconfigure components in real time). So enhancing security around the automation layer, and thinking through consequences not seen so far, becomes quite important with automation.
Factors multiplying the risks
- Quick reaction multiplies minor errors – Quick decisions have many advantages, but one obvious disadvantage is that there is no time to reconsider the consequences, correct mistakes, or change course. An imperfect reaction can trigger a tree of secondary events which may lead to an undesired outcome.
In our non-IT lives, we know that fast is not necessarily good (take fast food for example, or, for something more noble, justice and the well-known debate about fast judicial procedures). With automation (and surely with AI) we will have to reconsider the notion, which we accept without question, that “a faster computer is better than a slower computer”. This would be true only if there were no errors.
Given that no technology or process is perfect, it becomes clear that combining errors with very fast decisions is a danger in itself. Minor errors can quickly escalate to serious unwanted situations, without humans having the opportunity to intervene and stop a potential vicious cycle of disaster. Think of reaction time as the hysteresis in a closed-loop system; if reactions are fast (low hysteresis), a system, such as an IT infrastructure, can quickly move outside its stability zone.
- Lack of decision transparency – System control and operation are obscured behind an automation layer. Automation makes decisions and changes the infrastructure dynamically in a way which is not transparent to humans. Decisions have to be justified and actions logged so that causality can be traced back. This will help identify wrong decisions and specific areas which have to be improved. There is research in progress on how to make, for example, machine learning more transparent. With transparency, automation will become more accountable than it is today. Humans will be in a position to observe and understand the chain of events and decisions in an automated system and make corrections.
- Unmapped complex IT infrastructure – This is an important multiplier for risks. Automation is usually sought for complex and critical IT systems, such as systems distributed across the world, or systems demanding continuous operation under varying demand and circumstances. As noted earlier, humans are the best compensation for missing documentation, plans and procedures, and for unmapped conditions. Introducing automation to a complicated infrastructure, with multiple suppliers, where some resources (such as cloud) are provided as black boxes, will increase all other risks. Having well-elaborated automation laws and rules is more important in such environments, so that operational thresholds are not crossed.
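The first two factors can be illustrated with a minimal Python sketch: an automated scaler that refuses to act again until a cooldown period has passed, and that records every decision together with its reason. This is only an illustration under stated assumptions. The class name `GuardedScaler`, the `cooldown_s` parameter and the load thresholds are all hypothetical and not taken from any specific tool.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GuardedScaler:
    """Hypothetical autoscaler with a cooldown and a decision log."""
    cooldown_s: float = 300.0               # minimum time between two actions
    servers: int = 2
    last_action_at: float = float("-inf")   # no action taken yet
    log: list = field(default_factory=list)

    def decide(self, load: float, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now

        # What the automation would like to do (thresholds are assumptions).
        if load > 0.8:
            wanted = "scale_up"
        elif load < 0.2 and self.servers > 1:
            wanted = "scale_down"
        else:
            wanted = "none"

        # Slow the loop down: suppress actions that come too soon after the last one.
        if wanted != "none" and now - self.last_action_at < self.cooldown_s:
            action, reason = "none", f"{wanted} suppressed: cooldown active"
        else:
            action, reason = wanted, f"load={load:.2f}"

        if action == "scale_up":
            self.servers += 1
        elif action == "scale_down":
            self.servers -= 1
        if action != "none":
            self.last_action_at = now

        # Record every decision, including the ones where nothing was done.
        self.log.append({"time": now, "load": load, "action": action,
                         "reason": reason, "servers": self.servers})
        return action
```

Calling `decide()` twice within the cooldown window performs only the first action; the second call is still written to `log` with the reason it was suppressed, so a human can later reconstruct why the system did or did not react.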
How to mitigate the risks
- Follow a holistic approach to automation, with non-siloed, parallel improvement in all three automation tiers. IT processes, services and tasks should all advance together to realize benefits.
- Recognize that automation is not a matter of new plug-and-play tools to buy and install. Automation requires effort and cost. You need specialized people to implement it correctly in your environment.
- Recognize that automation is not a one-time activity. Avoiding automation errors and realizing more benefits requires specialized people working continuously on the automation laws and rules, the hard principles which should never be violated by automated decisions.
- Review your entire IT infrastructure with automation in mind. The more complex the IT infrastructure is from an operational perspective, the more difficult it is to introduce and benefit from automation. If infrastructure components and services are provided by third-party suppliers (such as cloud providers), the interaction with the supplier should be made to work without human intervention.
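The automation laws and rules mentioned above can be kept as explicit, reviewable checks that every automated action must pass before it is executed. The following is a minimal sketch, assuming a hypothetical action format (a dict with `kind`, `target` and an optional `count`) and two made-up rules; real rules would come from your own environment.

```python
from typing import Callable, Optional

# A rule inspects a proposed action and returns a violation message, or None.
Rule = Callable[[dict], Optional[str]]


def never_delete_production(action: dict) -> Optional[str]:
    # Assumption: production resources are named with a "prod-" prefix.
    if action["kind"] == "delete" and action["target"].startswith("prod-"):
        return "production resources must not be deleted automatically"
    return None


def limit_batch_size(action: dict) -> Optional[str]:
    # Assumption: touching more than 10 resources at once needs a human.
    if action.get("count", 1) > 10:
        return "more than 10 resources in one action needs human approval"
    return None


HARD_RULES = [never_delete_production, limit_batch_size]


def check(action: dict) -> list:
    """Return all violations; an empty list means the action may proceed."""
    violations = []
    for rule in HARD_RULES:
        message = rule(action)
        if message is not None:
            violations.append(message)
    return violations
```

Keeping the rules in one short list makes them easy for the specialized people to review and extend over time, independently of the automation logic that proposes the actions.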
False expectations from automation
Here is a list of common questions and misconceptions I have heard related to automation in IT:
- Will automation lower delivery times? Delivery time is perceived by consumers of IT services, so it refers to IT services and business processes rather than technical tools alone. Implementing automation in just one of the three tiers will not reduce delivery times. Example: building a new server in minutes does not help much if the quality acceptance tests are still manual, or if a manager who is always travelling needs to give her approval before the server is delivered to whoever requested it.
- Can automation reduce operational costs? Yes it can, but only if you look at it as an investment. It needs additional effort and spending in the short term, and it needs transformation at an organizational level. With good planning, preparation and proper implementation it can bring mid- and long-term returns.
- Is automation threatening IT jobs? By definition, automation aims to reduce the human workforce in operations. As with all inventions in human history, humans will be redirected to other types of work, hopefully more advanced. During such shifts of the workforce, there have historically been people who benefit by accumulating power, and people who suffer from the transition. Economy and society should prepare accordingly, and the role of public control and democracy in avoiding inequalities and unemployment cannot be overestimated.