Infrastructure Availability

What is Availability

Availability is the percentage of time a service was available. By “service” we mean whatever the component or system of interest is supposed to provide.

Availability = Time Service Was Available / Total Time

which is equivalent to:

Availability = Uptime / (Uptime + Downtime)

A few points deriving from this definition:

  • Availability is known “post hoc”. It always refers to the past (what has already happened). So, it is wrong to ask “what is the availability of X?”. The correct question would be “what was the availability of X?”
  • Availability is calculated over a specific period of time. Different periods give different availabilities (last month’s availability, last year’s availability etc.). So the question in the previous point should actually be “what was the availability of X over period Y?”
  • We can assess the expected availability in the future, but then we would actually be talking about reliability. There is a separate post on reliability.

Availability is about what has happened. Reliability is about what is expected to happen.

  • Good availability may be misleading. It doesn’t necessarily mean that you have done a good job with infrastructure; it may simply mean that you have been lucky over that specific period. Longer periods provide a more realistic picture of availability, because the longer the period, the smaller the impact of luck.
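
The dependence on the chosen period can be illustrated with a short sketch. It assumes outages are recorded as (start, end) pairs that do not overlap each other; the function name `availability` and the sample outage are hypothetical and only serve the illustration.

```python
from datetime import datetime


def availability(outages, period_start, period_end):
    """Return Uptime / (Uptime + Downtime) over [period_start, period_end).

    Assumes the outages do not overlap each other.
    """
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("period must have positive length")
    downtime = 0.0
    for start, end in outages:
        # Count only the part of each outage that falls inside the period.
        overlap_start = max(start, period_start)
        overlap_end = min(end, period_end)
        if overlap_end > overlap_start:
            downtime += (overlap_end - overlap_start).total_seconds()
    return (total - downtime) / total


# Hypothetical outage log: a single 6-hour outage on 10 March 2023.
outages = [(datetime(2023, 3, 10, 0, 0), datetime(2023, 3, 10, 6, 0))]

march = availability(outages, datetime(2023, 3, 1), datetime(2023, 4, 1))
year = availability(outages, datetime(2023, 1, 1), datetime(2024, 1, 1))
print(f"March 2023: {march:.4%}")
print(f"Year 2023:  {year:.4%}")
```

The same outage gives roughly 99.19% for March and roughly 99.93% for the whole year, which is why an availability figure means little without its period.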

Below is a reference table showing the yearly downtime for specific availability values. To calculate these values we use this formula:

Yearly Downtime (in seconds) = (1 - Availability) x 31,536,000
31,536,000 is the number of seconds in a 365-day year (365 x 24 x 60 x 60)

Availability %    Yearly Downtime

99.999            5m 15s
99.99             52m 34s
99.95             4h 22m 48s
99.9              8h 45m 36s
99.8              17h 31m 12s
99.5              1d 19h 48m
98                7d 7h 12m
95                18d 6h
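
The yearly downtime formula can be checked with a few lines of code. This is a minimal sketch: the function names are hypothetical, and the availability values in the loop are the ones that produce the downtimes listed above when fed into the formula.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def yearly_downtime_seconds(availability_percent):
    """Yearly Downtime (in seconds) = (1 - Availability) x 31,536,000."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


def format_duration(seconds):
    """Format a number of seconds as e.g. '4h 22m 48s', skipping zero parts."""
    seconds = round(seconds)  # also absorbs floating-point noise
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (secs, "s")]
    return " ".join(f"{value}{unit}" for value, unit in parts if value) or "0s"


for pct in (99.999, 99.99, 99.95, 99.9, 99.8, 99.5, 98, 95):
    print(f"{pct}%  {format_duration(yearly_downtime_seconds(pct))}")
```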

How to Measure Availability

Measuring availability is a monitoring activity. To capture availability we need to measure time, and whenever it comes to measurements the following precautions apply:

  • The definition of what is available and what is unavailable may not be clear. If you run into situations where somebody says “the service is unavailable when this happens, but if this also happens or that doesn’t happen, it is available, but then again if this doesn’t work then it is absolutely unavailable”, you need to express what “available” means in simple technical terms before you start working on how to measure availability.
  • Measurements depend on the point you are measuring from. Different points give different results. Availability may be measured at the server, outside the firewall, outside the datacenter, or on the user’s computer. The term “end-to-end” is often used loosely rather than technically; we should define precisely the two ends (or more than two ends in a complex system). End-to-end availability measurements, besides being trendy, have to be meaningful and actionable.
  • The monitoring mechanism introduces further considerations. One particular consideration, which can affect the result, is the sampling rate, explained here. If your monitoring solution polls the service at specific intervals, then short outages that happen between two consecutive polls will be missed. And no, increasing the sampling rate to infinity is not a good technical solution; you will have to compromise.
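
The effect of the sampling rate can be seen in a small simulation. This is only a sketch under stated assumptions: time is expressed in whole seconds from the start of the period, the outage data is made up, and the function names are hypothetical rather than taken from any monitoring product.

```python
def true_availability(outages, period):
    """Availability computed from the actual outage durations (in seconds)."""
    downtime = sum(end - start for start, end in outages)
    return 1 - downtime / period


def measured_availability(outages, period, poll_interval):
    """Availability as seen by a poller that samples every poll_interval seconds.

    Each poll counts as one sample; the service is considered down for a
    sample only if the poll instant falls inside an outage.
    """
    def is_down(t):
        return any(start <= t < end for start, end in outages)

    polls = range(0, period, poll_interval)
    failed = sum(1 for t in polls if is_down(t))
    return 1 - failed / len(polls)


# Hypothetical one-hour period with two short outages of 40 s and 25 s.
period = 3600
outages = [(610, 650), (2005, 2030)]

print(f"true: {true_availability(outages, period):.4%}")
for interval in (300, 60, 10):
    measured = measured_availability(outages, period, interval)
    print(f"polling every {interval:>3}s: {measured:.4%}")
```

With this data, polling every 300 or every 60 seconds misses both outages and reports 100%, while polling every 10 seconds catches some of the downtime and gets closer to the true value, at the cost of 360 polls per hour instead of 12 or 60.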

How to Improve Availability

You cannot improve availability, unless you can change the past. If you want to improve the future, instead, then you are referring to improving reliability, and you should read this post.
