In Matt’s presentation, IT cloud infrastructure is controlled with natural language: Alexa (an Amazon AI implementation) translates human speech into commands understood by the VMware cloud running on AWS. There are three parts in Matt’s example:
- a front-end interface used by the operator to operate the IT infrastructure. Instead of using the keyboard and management tools, the human just talks in natural language.
- a back-end connection of Alexa to the VMware Cloud management tools.
- Alexa, translating the high-level commands into low-level VMware cloud requests in real time.
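To picture the three parts, here is a minimal sketch of the middle step, turning a sentence into a structured request. Everything in it is an assumption made for illustration: `CloudRequest`, `PATTERNS` and `interpret` are hypothetical names, and a real voice assistant relies on a trained language model rather than on regular expressions. It is not the Alexa or the VMware Cloud API.

```python
import re
from dataclasses import dataclass


# Hypothetical low-level request; not an actual VMware Cloud or Alexa type.
@dataclass(frozen=True)
class CloudRequest:
    action: str
    datacenter: str
    count: int = 0


# Illustrative patterns only: each pairs a phrase with a request builder.
PATTERNS = [
    (re.compile(r"add (\d+) hosts to (\w+)", re.IGNORECASE),
     lambda m: CloudRequest("add_hosts", m.group(2), int(m.group(1)))),
    (re.compile(r"details on (\w+)", re.IGNORECASE),
     lambda m: CloudRequest("describe", m.group(1))),
]


def interpret(utterance: str) -> CloudRequest:
    """Translate a spoken sentence into a structured request, or refuse."""
    for pattern, build in PATTERNS:
        match = pattern.search(utterance)
        if match:
            return build(match)
    # Refusing is safer than guessing when the sentence is not understood.
    raise ValueError(f"cannot interpret: {utterance!r}")


request = interpret("Tell VMware Cloud to add 5 hosts to Virginia_VMC")
print(request)
```

Note that the sketch says nothing about what `add_hosts` actually does once the request reaches the back end. The words have only been moved from speech into a data structure; their meaning is still undefined.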
On the other hand, “The Arrival” takes the example of an attempted communication with extra-terrestrials to discuss a linguistic challenge: the significant load that words carry implicitly, a load we may easily neglect when using common words. (A better word for “extra-terrestrials” would be “extra-humans”, as we don’t even communicate with other terrestrial organisms using a language.)
Here is a transcript of Matt’s presentation in the video:
- Human: Ask VMware Cloud about my cloud datacenters.
- Alexa: You have 3 cloud datacenters. Their names are … called VMC, Virginia_VMC, Oregon_VMC.
- Human: Ask VMware Cloud for details on Virginia_VMC.
- Alexa: The datacenter Virginia_VMC has 8 hosts, its operational state is “running”, and it is at 87% capacity.
- Human: Tell VMware Cloud to add 5 hosts to Virginia_VMC.
- Alexa: I just added 5 hosts to cloud datacenter Virginia_VMC.
- Human: Tell VMware Cloud to enable Elastic DRS on Virginia_VMC.
- Alexa: I just enabled Elastic DRS on cloud datacenter Virginia_VMC.
- Then Matt Dreyer continues: “That for me is the cool part about this. When you are able to provision hosts from the cloud … that’s pretty awesome.”
Hmmm… Leaving aside the audience cheers and that YouTube has categorized the video as “Comedy”, it is interesting that the words “cool” and “awesome” were selected in an example of IT operations, a technical field where words with precise meaning should be used. The two words refer to fun, and indeed Alexa is intended for fun, personal or home use. It is also true that progress is based on fun, and that what is a game today is the greatest invention tomorrow.
But we must not neglect the risks when moving from the laboratory into real life, where people’s everyday lives are affected. This is what this post is about.
The video shows only the positive outcome of a laboratory test, a scenario with a start and an end: “Do this”, “done”, “closed”, “success”. Nothing happens before or after. In real life, however, “in vivo”, what is successful today can easily be considered a failure tomorrow. A newly applied configuration may work fine and pass all the tests, yet a serious defect is discovered a few days later. What makes the difference between success and failure is the time scale over which we consider the system (where the “start” and the “end” lie in time). Fortunately, in DevOps, time is “continuous”, and IT infrastructure operates non-stop. Consequently, the difference between success and failure in non-stop IT infrastructure lies in how far we look into the future through planning, preparation, and proactive operations. All these qualities aim at avoiding the problems of tomorrow; otherwise they would be unnecessary luxuries. So an important part of IT automation, with or without Alexa, is to take a wide (practically “continuous”) view of time.
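The dependence on the time scale can be shown with a few lines of code. This is only a sketch under stated assumptions: the function `outcome` and the three-day defect are hypothetical, chosen to mirror the “defect discovered after a few days” case above.

```python
from datetime import timedelta
from typing import Optional


def outcome(defect_found_after: Optional[timedelta], horizon: timedelta) -> str:
    """Judge a change by whether a defect surfaces within the observation horizon."""
    if defect_found_after is not None and defect_found_after <= horizon:
        return "failure"
    return "success"


# Hypothetical change: it passes every test, but a defect shows up after three days.
defect = timedelta(days=3)

print(outcome(defect, horizon=timedelta(hours=1)))  # the demo's point of view
print(outcome(defect, horizon=timedelta(days=7)))   # the operator's point of view
```

The same change is reported as a success when observed for an hour and as a failure when observed for a week. Nothing about the change differs between the two calls, only how long we keep looking.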
Secondly, let’s focus on the meaning of the specific words in the video. Linguistic glitches, or tiny differences in understanding, become important when interacting with a critical system, as “The Arrival” shows. Going back to Alexa, many things could go wrong in the scenario shown in the video. Here are some concerns:
- What is a “host”? Just ask people in any IT company and you will get surprisingly vague and different replies. In the context of this blog, a “host” is different in the mind of people in Development and of people in Operations.
- What is “add”? Alexa says “I just added”. Here we touch on the so-called “Definition of Done”, a crucial challenge in DevOps. Does “add” include monitoring (APM?), replication, backup, directing load to the application, or even adding the new hosts to operational manuals and updating the capacity management reports?
- What is “capacity”? Just compute resources, or also network, disk capacity, I/O, web requests? You cannot express capacity with a single number, and you cannot make decisions based on a single number.
- What is “operational state”? How many options do we have? Just “ok” and “not-ok”? Why?
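Two of these concerns, “add” and “capacity”, can be made concrete with a small sketch. It is illustrative only: the checklist items are taken from the questions above, the names `DEFINITION_OF_DONE`, `missing_steps` and `Capacity` are hypothetical, and the utilization figures are invented. None of it reflects how VMware or AWS actually define these terms.

```python
from dataclasses import dataclass

# Hypothetical Definition of Done for "add a host"; every team must agree on its own list.
DEFINITION_OF_DONE = (
    "provisioned",
    "monitored",
    "backed_up",
    "receiving_load",
    "documented",
)


def missing_steps(completed: set) -> list:
    """Return the agreed steps that have not been completed yet, in checklist order."""
    return [step for step in DEFINITION_OF_DONE if step not in completed]


@dataclass
class Capacity:
    """Utilization per resource (0.0 to 1.0) instead of a single percentage."""
    cpu: float
    storage: float
    network: float

    def bottleneck(self) -> str:
        """Name the most heavily used resource."""
        usage = {"cpu": self.cpu, "storage": self.storage, "network": self.network}
        return max(usage, key=usage.get)


# "I just added 5 hosts" may only mean that the hosts were provisioned.
print(missing_steps({"provisioned"}))

# Invented figures: adding compute hosts does not help if storage is the constraint.
print(Capacity(cpu=0.40, storage=0.87, network=0.55).bottleneck())
```

With an explicit checklist, “I just added” becomes a claim that can be verified, and with per-resource figures, “87% capacity” becomes a statement you can actually act on.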
One could hastily reply that the above terms are used as defined by AWS (are they actually defined in a technically precise way?). Yet even a very tiny difference in how a person understands a word can become a fundamental cause of system malfunction and, finally, downtime. The exact meaning of words is important when it comes to non-stop systems. Misunderstanding (“but I thought you meant…”) is a common cause of human error and process error (see this post), and it becomes crucial in teams of people with different backgrounds, as in DevOps. The Tower of Babel is a good example of how a system can collapse because of misunderstanding. After all, the great advantage of Automation and AI is avoiding human and process errors, such as misunderstanding. We don’t want AI to inherit such human weaknesses.