System management

The z/OS operating system is designed to host many applications on a single platform. From the beginning, efficient management of the applications and their underlying infrastructure has been an essential part of the z/OS ecosystem.

This chapter will discuss the regular system operations, monitoring processes, and tools you find on z/OS. I will also look at monitoring tools that ensure all our automated business, application, and technical processes are running as expected.

System operations

The z/OS operating system has an extensive operator interface that gives the system operator the tools to control the z/OS platform and its applications and to intervene when issues occur. These facilities compare well with those used to run physical processes in factories or power plants. The operator is equipped with many knobs, buttons, switches, and meters to keep the z/OS factory running.

Operator interfaces and some history

By design, the mainframe performs operations on so-called consoles. Consoles originally were physical terminal devices directly connected to the mainframe server with special equipment. Everything happening on the z/OS system was displayed on the console screens. A continuous newsfeed of messages generated by the numerous components running on the mainframe streamed over the console display. Warnings and failure messages were highlighted so an operator could quickly identify issues and take necessary actions.

Nowadays, physical consoles have been replaced by software equivalents. In the chapter on z/OS, I mentioned SDSF from IBM and similar tools from other vendors that are available on z/OS for this purpose. SDSF is the primary tool system operators and administrators use to view and manage the processes running on z/OS.

Additionally, z/OS has a central facility where information, warnings, and error messages from the hardware, operating system, middleware, and applications are gathered. This facility is called the system log. The system log can be viewed from the SDSF tool.

SDSF options
Executing an operator command through SDSF
The system log viewed through SDSF

An operator can intervene in the running z/OS system and applications with operator commands. z/OS itself provides many of these operator commands for a wide variety of functions. The middleware tools installed on top of z/OS often also bring their own set of operator messages and commands.

Operator commands are comparable to the commands of Unix operating systems and to the functions of the Windows Task Manager and other Windows system administration tools. Operator commands can also be issued through application programming interfaces, which opens possibilities for building software for automated operations for the z/OS platform.

Automated operations

In the past, a crew of operators managed the daily operations of the business processes running on a central computer like the mainframe. The operators were gathered in the control room, also called a bridge, from where they monitored and operated the processes running on the mainframe.

Nowadays, daily operations have been automated. All everyday issues are handled through automated processes; special software oversees these operations. When the automation tools find issues they cannot resolve, an incident process is kicked off. A system or application administrator is then called out of bed to check out the problem.

Manual versus automated operations

Several software suppliers provide automation tools for z/OS operations. All these tools monitor the messages flowing through the system log, which reports the health of everything running on z/OS. The occurrence of messages can be turned into events for which automated actions can be programmed.

For example, if z/OS signals that a pool of disk storage is filling up, the storage management software will write a message to the system log. An automation process that increases the storage pool and sends a notification email to the storage administrator can be defined. The automation process is kicked off automatically when the message appears in the system log.
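The mechanism in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not how any of the commercial products work: the message ID `XYZ123W`, the simplified log line layout, and the two action functions are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical, simplified layout: "<message id> <free text>".
# Real system log records carry more fields (timestamp, system name, job id).
MESSAGE_PATTERN = re.compile(r"^(?P<msgid>[A-Z0-9]+)\s+(?P<text>.*)$")


@dataclass
class Rule:
    """Couples a message ID to the actions to run when that message appears."""
    msgid: str
    actions: List[Callable[[str], str]]


def extend_storage_pool(text: str) -> str:
    # Placeholder: a real action would issue an operator command or call an API.
    return f"extend pool requested for: {text}"


def notify_storage_admin(text: str) -> str:
    # Placeholder: a real action would send an email or open an incident.
    return f"mail sent to storage admin: {text}"


def handle_log_line(line: str, rules: List[Rule]) -> List[str]:
    """Turn one log line into an event and run the actions of every matching rule."""
    match = MESSAGE_PATTERN.match(line.strip())
    if match is None:
        return []
    results = []
    for rule in rules:
        if rule.msgid == match.group("msgid"):
            for action in rule.actions:
                results.append(action(match.group("text")))
    return results


# One rule: the (hypothetical) "pool is filling up" message triggers two actions.
RULES = [Rule("XYZ123W", [extend_storage_pool, notify_storage_admin])]

if __name__ == "__main__":
    for log_line in ["ABC999I JOB STARTED", "XYZ123W STORAGE POOL POOL1 IS 90% FULL"]:
        for outcome in handle_log_line(log_line, RULES):
            print(outcome)
```

The first log line matches no rule and is ignored; the second triggers both actions, in the order in which they are listed in the rule.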

All automated operations tools for z/OS are based on this mechanism. Some tools provide more help with automation tasks and advanced functions than others. Solutions in the market include System Automation from IBM, CA-OPS/MVS from Broadcom, and AUTOMON for z/OS from Macro4.

Monitoring

System management aims to ensure that all automated business processes run smoothly. For this, detailed information must be made available to assess the health of the running processes. All the z/OS and middleware components provide a wide variety of data that can be used to analyze the health of these individual components. The amount of data is so large that it is necessary to use tools that help make sense of all this data. This is where monitoring tools can help.

Monitoring tools can be viewed on different levels of the operational system. In this chapter, I differentiate between infrastructure, application, and business monitoring.

Figure 42 shows the different layers of monitoring that can be distinguished. It illustrates how application monitoring needs to be integrated to roll the information up into meaningful monitoring of the business process.

The following sections will go into the different layers of monitoring in a bit more detail.

Monitoring of different layers

Infrastructure monitoring

Infrastructure monitoring is needed to keep an eye on mainframe hardware, the z/OS operating system, and the middleware components running on top of z/OS. All these parts produce extensive data that can be used to monitor the health of the components. z/OS provides standard facilities for infrastructure components to write monitoring data. The first one we have seen is the messages written to the system console. These are all saved in the system log. Additionally, z/OS has a System Management Facilities (SMF) component, which provides a low-level interface through which infrastructure components can write information to a special event log, the SMF dataset.

There are many options for producing data, but what often does not come with these tools is the ability to make meaningful use of all that data.

To get a better grip on the health of these infrastructure components, various software vendors provide solutions to use that data to monitor and manage specific infrastructure components. Most of these tools offer particular functions for monitoring a piece of infrastructure, and integrating these tools is not always straightforward.

BMC’s Mainview suite provides tools to monitor z/OS, CICS, IMS, Db2, and other infrastructure components common in a z/OS setup.

IBM has a similar suite under the umbrella of Omegamon. The IBM suite also has tools for monitoring z/OS itself, storage, networks, and middleware such as CICS, IMS, Db2, and more.

Also, under the name ASG-TMON, ASG has an extensive suite of tools for the components mentioned above and more. This software has since been acquired by Rocket Software.

Broadcom provides tools for z/OS and network monitoring under its Intelligent Operations and Automation suite.

Application monitoring

The next level of monitoring provides a view of the functioning of the applications and their components.

Off-the-shelf tools or frameworks do not extensively support this monitoring level for applications in COBOL or PL/I. Application monitoring and logging frameworks like Java Management Extensions (JMX) and Log4j are available for Java, but such tools are not available for languages like COBOL and PL/I. Many z/OS users have developed their own frameworks for application monitoring, relying on various technologies.

Some tools can provide a certain level of application monitoring. For example, Dynatrace, AppDynamics, and IBM Application Performance Management provide capabilities to examine applications’ functioning. However, the functionality is often not easily extensible for application developers, as it is with the Log4j and JMX frameworks mentioned above. There remains a need for a framework (preferably open-source) that allows application developers to create specific monitoring and logging information and events at particular points in an application on z/OS.

Business monitoring

Ideally, the application and infrastructure monitoring tooling should feed some tools that can aggregate and enrich this information with information from other tools to create a comprehensive view of the IT components supporting business processes.

Recently, tools have become available on z/OS to gather logging and monitoring information and forward this to a central application. Syncsort’s IronStream and IBM’s Common Data Provider can collect data from different sources, such as system logs, application logs, and SMF data, and stream this to one or more destinations like Splunk or Elastic. With these tools, it is now possible to integrate available data into a cross-platform aggregated view, as shown in Figure 42. Today, aggregated views are typically implemented in tools like Splunk, the open-source ELK stack with Elastic, or other tools focused on data aggregation, analysis, and visualization.
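The collect-and-forward idea can be sketched as follows. This is not how IronStream or the Common Data Provider are implemented; it only illustrates wrapping records from different sources in one common envelope before they are sent on. The field names, the system name `SYS1`, and the sample records are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator


def to_event(source: str, system: str, record: str) -> Dict[str, str]:
    """Wrap a raw record in a common envelope so records from different sources can be correlated."""
    return {
        "platform": "z/OS",
        "system": system,      # hypothetical system name
        "source": source,      # for example "syslog", "applog", or "smf"
        "message": record.rstrip(),
    }


def stream_events(source: str, system: str, records: Iterable[str]) -> Iterator[str]:
    """Yield one JSON document per non-empty record, ready to send to an aggregation tool."""
    for record in records:
        if record.strip():
            yield json.dumps(to_event(source, system, record), sort_keys=True)


if __name__ == "__main__":
    sample = ["XYZ123W STORAGE POOL POOL1 IS 90% FULL  ", "", "ABC999I JOB STARTED"]
    for document in stream_events("syslog", "SYS1", sample):
        print(document)
```

Because every record ends up in the same envelope, the aggregation tool can filter and correlate on `system` and `source` regardless of where the record came from.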

Noise reduction

The principle of noise reduction improves software systems by removing inessential parts and options, or by hiding them from all but selected users.

Reducing the options in a software solution increases usability. This goes for user interfaces as well as technical interfaces. We decide what an interface looks like and stick to it. All-too-famous examples of noise reduction are the Apple iPod and the Google search page.

Adding features for selected users means adding features and under-the-hood complexities for all clients.

Reducing options also makes the software more robust. If we build fewer interfaces, we can improve them. We can focus on really doing well with the limited set of interfaces.

In practice, we see that hardware and software tools have many options and features. That is not because software suppliers desperately want to give their customers all the options but because we, their customers, are requesting these options. Software suppliers could view all these requests more critically. Some do.

Let’s aim to settle for less. We shouldn’t build more just because we can when we can do with less. Also, we shouldn’t ask our suppliers to create features that are merely nice to have.

There are always more options, but let’s limit the options to four or, better, to one.

Hypes

Some companies have made a business model out of technology hypes. These are the same companies that tell the market what it needs by asking the market. Of course, this comes with an invoice mentioning generous compensation. These companies write classy reports with colorful graphics in which they advise organizations to do what the organizations tell them to do.

But hypes are for techies. Techies may feast on technology, but for organizations, jumping on hypes can be a risky and costly pastime.

There are two types of hypes. Hypes can be about something new. Other hypes are just reformulations of existing things, recycled ideas.

But hypes are hypes: they will go away. The vast majority of hypes disappear into thin air. The techie may have learned from them. Some remain. It might be valuable if a technology is still around after a few years. But usually, the stuff will not be as groundbreaking and revolutionary as predicted when announced by the hype cycle company, that is, by the market itself.

Blockchain, anyone?

Some hypes are recycled ideas. We have no memory, and we don’t read textbooks. SOA, AI, microservices: technical advancements are wrapped in shiny new names and gift paper, so they appear to be a gift from your software supplier or consultancy company.

Think Globally, Act Locally applied to IT

Solution architects should consider the enterprise impact of architectural decisions.

Solution architects must ignore enterprise directions if these lead to local inefficiencies or have other predominantly negative local effects.

There is a significant difference between the clean 30,000-foot view (sometimes referred to as the air castle) and the muddy reality at sea level.

Gear Acquisition Syndrome

Photographers tend to suffer from Gear Acquisition Syndrome. They believe they will take better pictures with new gear, so they buy new lenses, cameras, and flashes.

Then they find their work does not improve.

In IT, we do the same.

We have our old relational database management system.

But now we have this great Spark, MongoDB, CouchDB, or what have you. (I’m just taking a not-so-random example.) So now everything must be converted to Spark or Mongo.

We even forget that this old technology, the relational DBMS in this example, was so good at reliably processing transactions. It worked!

The new database is massively scalable, which is great. Unfortunately, it does not improve the reliability of processing our transactions.

But it’s hot, so we want it—because Google has it. Errr, but will you also use it to process web page indexes? Ah, no. You want to store your customer records in it. So, is it reliable? No. But it is satisfying our GAS.

Aesthetics and quality

Beautiful things are easier to use.

We can also apply this to technical designs. This often surprises a non-technical audience, but techies will recognize the beauty that can be present in technical solutions.

For example, symmetrical diagrams not only give a quick insight into an orderly, robust solution but are often also very appealing to the eye.

Symmetrical and well-colored diagrams are easier to read and understand.

Old PowerPoint presentations using the standard suggested colors were horrendously ugly, and I am sure the people using these colors did not want to be understood. (Nowadays, PowerPoint comes with more pleasing color schemes.)

The success of the Python programming language is due not least to its enforced readability. It has none of the crazy abbreviations of C, which make code unreadable (but make programmers look very smart).

Beautiful code (yes, such a thing exists) is easier to read and understand.


If a
    Then b
Else If c
    Then d
Else If e
    Then f

versus


Case a
    b
Case c
    d
Case e
    f


It is pretty evident.

But do we care about the quality and beauty of code nowadays? Throw-away software is abundant. Software systems are built with the idea to throw them out and replace them within a few years.

Image by Ursus Wehrli

That is the idea. But the Lindy effect tells us differently.

Good programming is a profession that should be appreciated as such. Bad coding may be cheap, but only in the short run.

We don’t hire a moonlighter to build our house. We employ an architect and a construction professional who can make a comfortable house that can be used for generations.

Chris Verhoef debunking myths about legacy and COBOL

Last week, De Technoloog, a BNR program, had a very nice interview with Professor Chris Verhoef of VU University. The interviewers, Herbert Blankesteijn and Ben van der Burg, were surprised to find that COBOL is not bad and is very good for programming administrative automation processes. Legacy is not an issue. Not allowing time for maintenance is a management issue. He mentioned the Lindy effect, which tells us that the life expectancy of old code increases with time. The established code is anti-fragile.

The Andon Cord

Anyone in the product chain can pull the Andon Cord to stop production when they notice that the product’s quality is poor.

The andon cord

Stopping a system when a defect is suspected dates back to Toyota. The idea is that by stopping the system, you get an immediate opportunity to improve or to find a root cause, instead of letting the defect move further down the line and remain unresolved.

A crucial aspect of Toyota’s “Andon Cord” process was that when the team leader arrived at the workstation, they thanked the team member who pulled the cord.

The incident would not be a paper report or a long-tail bureaucratic process. The problem would be immediately addressed, and the team member who pulled the cord would fix it.

For software systems, this practice is beneficial as well. However, in our drive for quick results, we are more likely to see the opposite practice.

We don’t stop the process in case of issues. We apply a quick fix, and ‘we will resolve it later’.

The person noticing an issue is regarded as a whistle-blower. Issues may get covered up in this culture, leading to even more severe problems.

When serious issues occur, we start a bureaucratic process that quickly becomes political, resulting in watered-down solutions and covering up the fundamental problems.

The backward compatibility conundrum

In software systems, backward compatibility is a blessing and a curse. While backward compatibility relieves users of mandatory software updates, it is also an excuse to ignore maintenance. For software vendors, omitting backward compatibility is a means to get users to buy new stuff; “enjoy our latest innovations!”.

1980s software on 64-bit hardware

DS Backward compatibility

You cannot run Windows 95 software on Windows 11.

You cannot run Mac OS X software built for a PowerBook G4 from 2006 on a current Mac.

You cannot use Java version 5 software on a Java 11 runtime.

You can, however, run mainframe software compiled in 1980 for the 24-bit addressing of that era on the latest 64-bit z/OS operating system and the latest IBM Z hardware. This compatibility is one of the reasons for the success of the IBM mainframe.

Backward compatibility in software has significant benefits. The most significant benefit is that you do not need to change applications with technology upgrades. This saves large amounts of effort and, thus, money for changes that bring no business benefit.

The dangers of backward compatibility

Backward compatibility also has very significant drawbacks:

  • Because you do not need to fix software for technology upgrades, backward compatibility leads to laziness in maintenance. Because the software keeps running, it drops out of sight entirely. Development teams lose the knowledge of the functionality and sometimes even of the supporting business processes. Minor changes may be made haphazardly, leading to slowly increasing code complexity. Horrific additions are made to applications, using tools like screen scraping, leading to further complexity of the IT landscape. Then, significant changes are suddenly necessary, and you are in big trouble.
  • Backward compatibility hinders innovation. Not only can you not take advantage of modern hardware capabilities, but you also get stuck with the programming and interfacing paradigms of the past. You cannot exploit functionality trapped inside old programs, and it is tough to integrate them through modern technologies like REST APIs.

The problem may be even more significant. Because you do not touch your code, other issues may appear.

Over the years, you will switch source code management tools. During these transitions, code can get lost, or insight into the correct versions of programs gets lost.

Also, compilers are upgraded all the time, and the specifications of the programming languages may change. Consequently, the code you have, which belongs to the programs running in your production environment, can no longer be compiled. When changes are necessary, your code suddenly needs to catch up with all these changes. And that will make the change a lot riskier.

How to avoid backward compatibility complacency?

Establish a policy to recompile, test, and deploy programs every 2 or 3 years, even if the code needs no functional change. This prevents technical debt from piling up.

Is that a lot of work? It does not need to be. You could automate most, if not all, of the compilation and testing process. If nothing functionally changes, modern test tools can help support this process. With these tools, you can automate running tests, compare results with the expected output, and pinpoint issues.
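The comparison step is the easiest part to automate. A minimal sketch in Python, assuming the output of each recompiled program has already been captured as lines of text next to a baseline from the previous version; the program names and the in-memory dictionaries are hypothetical stand-ins for whatever your test tool stores.

```python
import difflib
from typing import Dict, List


def compare_output(expected: List[str], actual: List[str]) -> List[str]:
    """Return unified-diff lines; an empty list means the outputs are identical."""
    return list(
        difflib.unified_diff(expected, actual, fromfile="expected", tofile="actual", lineterm="")
    )


def run_regression(
    baselines: Dict[str, List[str]], results: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Compare each program's new output with its baseline and collect the failures."""
    failures = {}
    for program, expected in baselines.items():
        actual = results.get(program)
        if actual is None:
            failures[program] = ["no output produced"]
            continue
        diff = compare_output(expected, actual)
        if diff:
            failures[program] = diff
    return failures


if __name__ == "__main__":
    # Hypothetical baselines from the old load modules and results from the recompiled ones.
    baselines = {"PAYROLL": ["TOTAL 100", "END"], "BILLING": ["INVOICES 3"]}
    results = {"PAYROLL": ["TOTAL 100", "END"], "BILLING": ["INVOICES 4"]}
    for program, diff in run_regression(baselines, results).items():
        print(program)
        for line in diff:
            print("  " + line)
```

Only programs whose output differs from the baseline are reported, so a recompile without functional change should produce an empty report.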

This process also has a benefit: your recompiled code will run faster because it can use the latest hardware features. You can save money if your software bill is based on CPU consumption.

Don’t let backward compatibility make you backward.

Assembler to get name of current address space

Assembler program that gets the name of the current address space from the PSA’s current ASCB block.

Documentation: PSA description and ASCB description.

ASCBNAME CSECT                                                          
         EQUATES                                                        
         SAVE  (14,12),,TST/NDG/&SYSDATE/&SYSTIME/               
         USING ASCBNAME,R12            SET UP BASE ADDRESSABILITY       
         LR    R12,R15                 LOAD BASE REG WITH ENTRY POINT   
         LA    R14,SAVE                GET ADDRESS OF REGISTER SAVE     
         ST    R13,4(0,R14)            SAVE CALLER'S SAVE AREA ADDR     
         ST    R14,8(0,R13)            SAVE MY SAVE AREA ADDRESS        
         LR    R13,R14                 LOAD SAVE AREA ADDRESS           
*                                                                       
INIT     DS    0H                                                       
         OPEN  (OUT,(OUTPUT))
*                                                                       
DOE      DS    0H                                                       
         SR    R1,R1                   R1 = 0
         USING PSA,R1                  ADDRESS PSA
         L     R2,PSAAOLD              GET ADDRESS CURRENT ASCB
         DROP  R1                      RELEASE PSA ADDRESSING
         USING ASCB,R2                 ADDRESS CURRENT ASCB
         L     R1,ASCBJBNS             GET ADDRESS ADDRESS SPACE NAME
         DROP  R2                      RELEASE ASCB ADDRESSING
         MVC   ADDRSPC,0(R1)           GET NAME
         PUT   OUT,OUTREC              WRITE OUTPUT RECORD
RETURN   DS    0H                                                       
         CLOSE OUT                                                      
         SLR   R15,R15                                                  
         L     R13,4(R13)              LOAD CALLERS SAVE AREA ADDRESS   
         RETURN (14,12),RC=(15)        RETURN TO CALLER                 
*                                                                       
*
*
         DC     C'**********   ************* WERKGEBIED ******'
SAVE     DS    18F
OUTREC   DS    CL80
         ORG   OUTREC
ADDRSPC  DC    CL8' '
REST     DC    CL72' '
OUT      DCB   DDNAME=OUT,                                             *
               DSORG=PS,                                               *
               MACRF=(PM)
*                                                                       
         IHAASCB DSECT=YES
         IHAPSA
         END   ,