Continuous availability

The slide deck tells me that it was in 2006 that I created a set of slides for "Kees" with an overview of the continuous availability features of an IBM mainframe setup. The deck's content was interesting enough to share here, with some enhancements.

What is availability?

First, what do we mean when we talk about availability? A highly available computing setup should provide the following:

- A highly available, fault-tolerant infrastructure that enables applications to run continuously.
- Continuous operations that allow for non-disruptive backups and for infrastructure and application maintenance.
- Disaster recovery measures that protect against unplanned outages due to disasters caused by factors that cannot be controlled.

Definitions

Availability is the state of an application service being accessible to the end user.

An outage (unavailability) is the state of a system being unavailable to an end user. An outage can be planned, for example for software or hardware maintenance, or unplanned.

What causes outages?

A research report from Standish Group from 2005 showed the various causes of outages.

[Figure: Causes of outages]

It is interesting to see that (cyber) security was not part of this picture, while more recent research published by Uptime Intelligence shows it to be a growing concern. More on this later.

[Figure: Causes of outages 2020 - 2021 - 2022]

The myth of the nines

The table below shows the availability figures for an IBM mainframe setup versus Unix and LAN availability. Things have changed since then. Unix (now: Linux) server availability has gone up. Server quality has improved, and so has software quality. Unix, however, still does not provide a capability similar to a z/OS sysplex. Such a sysplex simply beats any clustering facility by providing built-in, operating system-level availability.
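To get a feel for what a number of "nines" means in practice, here is a minimal sketch (not part of the original deck) that converts an availability percentage into the downtime it allows per year. The function name and the sample percentages are illustrative assumptions, and a year is assumed to be 365.25 days.

```python
# Assumption: an average year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the downtime per year (in seconds) implied by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    # Illustrative values only, not the figures from the table.
    for percent in (99.0, 99.9, 99.99, 99.999, 99.99999):
        seconds = downtime_seconds_per_year(percent)
        print(f"{percent}% available: about {seconds:.1f} seconds "
              f"({seconds / 3600:.2f} hours) of downtime per year")
```

Under these assumptions, five nines (99.999%) leaves a little over five minutes of downtime per year, and seven nines (99.99999%) leaves roughly three seconds.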
[Figure: Availability figures for an IBM mainframe setup versus Unix and LAN]

At the time of writing, IBM publishes updated figures for a sysplex setup as well (see https://www.ibm.com/products/zos/parallel-sysplex): 99.99999% application availability for the configuration described in the footnote:

"... IBM Z servers must be configured in a Parallel Sysplex with z/OS 2.3 or above; GDPS data management and middleware recovery across Metro distance systems and storage and DS888X with IBM HyperSwap. Other resiliency technology and configurations may be needed."

Redundant hardware

The following slides show the redundant hardware of a z9 EC (Enterprise Class), the flagship mainframe of that time.

[Figure: The redundant hardware of a z9 EC]

It is interesting to contrast this with today's flagship, the z16 (source: https://www.vm.ibm.com/library/presentations/z16hwov.pdf). Since the mainframe is now mounted in a standard rack, the interesting views have moved to the rear of the machine. (iPDUs are the power supplies in this machine.)

[Figure: The redundant hardware of a z16]

Redundant IO configuration

A nice, highly fault-tolerant server alone is not sufficient for a truly highly available setup. The IO configuration, a.k.a. the storage configuration, must be highly available as well.

[Figure: A redundant SAN setup]

The following slide in the deck highlights how this can be achieved. What is amusing or annoying, depending on your mood, and what triggers me today, are the "DASD CU" terms in…