System management

The z/OS operating system is designed to host many applications on a single platform. From the beginning, efficient management of the applications and their underlying infrastructure has been an essential part of the z/OS ecosystem.

This chapter discusses routine system operations on z/OS and the processes and tools that support them. I will also look at monitoring tools that help ensure all our automated business, application, and technical processes run as expected.

System operations

The z/OS operating system has an extensive operator interface that gives the system operator the tools to control the z/OS platform and its applications and to intervene when issues occur. These operations facilities compare well with the controls used to run physical processes in factories or power plants. The operator is equipped with many knobs, buttons, switches, and meters to keep the z/OS factory running.

Operator interfaces and some history

By design, the mainframe is operated through so-called consoles. Originally, consoles were physical terminal devices connected directly to the mainframe server through special equipment. Everything happening on the z/OS system was displayed on the console screens: a continuous newsfeed of messages generated by the numerous components running on the mainframe streamed over the console display. Warnings and failure messages were highlighted so an operator could quickly identify issues and take the necessary action.

Nowadays, physical consoles have largely been replaced by software equivalents. In the chapter on z/OS, I already mentioned IBM's SDSF and the similar tools that other vendors offer on z/OS for this purpose. SDSF is the primary tool system operators and administrators use to view and manage the processes running on z/OS.

Additionally, z/OS has a central facility where information, warnings, and error messages from the hardware, operating system, middleware, and applications are gathered. This facility is called the system log. The system log can be viewed from the SDSF tool.

Figure: SDSF options
Figure: Executing an operator command through SDSF
Figure: The system log viewed through SDSF

An operator can intervene in the running z/OS system and its applications through operator commands. z/OS itself provides many of these commands for a wide variety of functions. For example, the DISPLAY command D A,L lists the work currently active on the system, such as jobs, started tasks, and logged-on TSO users. The middleware installed on top of z/OS often brings its own set of operator messages and commands as well.

Operator commands are comparable to the administrative commands on Unix operating systems and to the functions provided by the Windows Task Manager and other Windows system administration tools. Operator commands can also be issued through application programming interfaces. This makes it possible to build software that automates the operation of the z/OS platform.

Automated operations

In the past, a crew of operators managed the daily operations of the business processes running on a central computer like the mainframe. The operators were gathered in the control room, also called a bridge, from where they monitored and operated the processes running on the mainframe.

Nowadays, daily operations have been largely automated. Everyday issues are handled by automated processes, and dedicated software oversees these operations. When the automation tools find issues they cannot resolve, an incident process is kicked off. A system or application administrator is then called out of bed to investigate the problem.

Figure: Manual versus automated operations

Several software suppliers provide automation tools for z/OS operations. All these tools monitor the messages flowing through the system log, which reports the health of everything running on z/OS. The appearance of a particular message can be turned into an event for which an automated action is programmed.

For example, if a pool of disk storage is filling up, the storage management software writes a message to the system log. You can define an automation process that extends the storage pool and sends a notification email to the storage administrator. This process is kicked off automatically when the message appears in the system log.

All automated operations tools for z/OS are based on this mechanism. Some tools provide more help with automation tasks and advanced functions than others. Solutions in the market include System Automation from IBM, CA-OPS/MVS from Broadcom, and AUTOMON for z/OS from Macro4.
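
The message-driven mechanism described above can be illustrated with a minimal Python sketch. It does not run on z/OS, and it is not how any of the products named above are implemented. It assumes a feed of system log messages in which each message starts with its message ID, plus a table of rules that maps message IDs to actions. The message IDs XYZ123W and ABC001I, the POOL= keyword, and the action text are all hypothetical placeholders. Real message IDs and formats are documented by the product that issues them.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical message ID for "storage pool filling up"; a real rule would
# use the ID documented by the storage management product.
POOL_FULL_MSG_ID = "XYZ123W"


@dataclass
class Rule:
    """An automation rule: the message ID to watch for and the action to run."""
    message_id: str
    action: Callable[[str], str]


def extend_pool_and_notify(message: str) -> str:
    """Placeholder action. A real action would issue commands to extend the
    pool and send an email; here we only describe what would be done."""
    match = re.search(r"POOL=(\w+)", message)  # hypothetical POOL= keyword
    pool = match.group(1) if match else "UNKNOWN"
    return f"extend pool {pool}; notify storage administrator"


RULES: List[Rule] = [Rule(POOL_FULL_MSG_ID, extend_pool_and_notify)]


def handle(message: str) -> List[str]:
    """Run the action of every rule whose message ID matches this message.

    Assumes the message text starts with its message ID, i.e. any system log
    prefix (timestamp, system name, job name) has already been stripped."""
    tokens = message.split()
    if not tokens:
        return []
    msg_id = tokens[0]
    return [rule.action(message) for rule in RULES if rule.message_id == msg_id]


if __name__ == "__main__":
    feed = [
        "ABC001I HOUSEKEEPING COMPLETE",  # hypothetical, no rule matches
        "XYZ123W STORAGE POOL NEARLY FULL, POOL=SGPROD01",
    ]
    for msg in feed:
        for result in handle(msg):
            print(result)
```

Running it prints a single line, extend pool SGPROD01; notify storage administrator, because only the second message matches a rule. Keying rules on the message ID rather than on the full text keeps them stable if the wording of a message changes.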

Monitoring

System management aims to ensure that all automated business processes run smoothly. For this, detailed information must be made available to assess the health of the running processes. All the z/OS and middleware components provide a wide variety of data that can be used to analyze the health of these individual components. The amount of data is so large that it is necessary to use tools that help make sense of all this data. This is where monitoring tools can help.

Monitoring can be done at different levels of the operational system. In this chapter, I distinguish between infrastructure, application, and business monitoring.

Figure 42 shows the different layers of monitoring that can be distinguished. It illustrates how application monitoring needs to be integrated to roll the information up into meaningful monitoring of the business process.

The following sections will go into the different layers of monitoring in a bit more detail.

Figure: Monitoring of different layers

Infrastructure monitoring

Infrastructure monitoring is needed to keep an eye on the mainframe hardware, the z/OS operating system, and the middleware components running on top of z/OS. All these parts produce extensive data that can be used to monitor their health. z/OS provides standard facilities through which infrastructure components can write monitoring data. The first, which we have already seen, is the set of messages written to the system console, all of which are saved in the system log. Additionally, z/OS has a component called System Management Facilities (SMF). SMF provides a low-level interface through which infrastructure components can write records, each identified by a record type, to the SMF datasets, which serve as a special event log.
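
To give an idea of what SMF data looks like at the lowest level, here is a minimal Python sketch that decodes the first 18 bytes of an SMF record. For standard records, these bytes hold the following fields:

- record length
- segment descriptor
- flags
- record type
- time, in binary hundredths of a second since midnight
- date, in packed decimal of the form 0cyydddF, where c is 0 for 19xx and 1 for 20xx
- four-character EBCDIC system ID

The sketch assumes the record bytes still include the 4-byte record descriptor word, that the data is big-endian as on z/OS, and that the system ID uses EBCDIC code page 037. Check the layout against IBM's SMF documentation. Everything after the header depends on the record type and is not handled here. The sample record is synthetic.

```python
import struct
from datetime import date, datetime, timedelta

# Standard SMF header: length, segment descriptor, flags, record type,
# time, date, system ID (18 bytes, big-endian, no padding).
SMF_HEADER = struct.Struct(">HHBBI4s4s")


def decode_smf_date(packed: bytes) -> date:
    """Decode a packed-decimal date of the form 0cyydddF."""
    digits = packed.hex()  # e.g. b"\x01\x24\x10\x0f" -> "0124100f"
    century = int(digits[1])  # 0 = 19xx, 1 = 20xx
    year = 1900 + 100 * century + int(digits[2:4])
    day_of_year = int(digits[4:7])
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


def parse_smf_header(record: bytes) -> dict:
    """Return the standard header fields of one SMF record."""
    length, _segment, flags, record_type, hundredths, packed_date, sid = (
        SMF_HEADER.unpack_from(record)
    )
    day = decode_smf_date(packed_date)
    # Time is binary hundredths of a second since midnight.
    timestamp = datetime(day.year, day.month, day.day) + timedelta(
        milliseconds=hundredths * 10
    )
    return {
        "length": length,
        "flags": flags,
        "record_type": record_type,
        "timestamp": timestamp,
        "system_id": sid.decode("cp037"),  # EBCDIC, code page 037 assumed
    }


if __name__ == "__main__":
    # Synthetic record: type 30, 01:00:00 on day 100 of 2024, system SYS1.
    sample = SMF_HEADER.pack(
        18, 0, 0, 30, 360000, bytes.fromhex("0124100F"), "SYS1".encode("cp037")
    )
    print(parse_smf_header(sample))
```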

There are many options for producing data, but what often does not come with these tools is the ability to make meaningful use of all that data.

To get a better grip on the health of these infrastructure components, various software vendors provide solutions to use that data to monitor and manage specific infrastructure components. Most of these tools offer particular functions for monitoring a piece of infrastructure, and integrating these tools is not always straightforward.

BMC’s MainView suite provides tools to monitor z/OS, CICS, IMS, Db2, and other infrastructure components common in a z/OS setup.

IBM has a similar suite under the OMEGAMON umbrella. The IBM suite also includes tools for monitoring z/OS itself, storage, networks, and middleware such as CICS, IMS, Db2, and more.

ASG also offers an extensive suite of tools for the components mentioned above, and more, under the name ASG-TMON. This software is now owned by Rocket Software, which acquired ASG.

Broadcom provides tools for z/OS and network monitoring as part of its Intelligent Operations and Automation suite.

Application monitoring

The next level of monitoring provides a view of the functioning of the applications and their components.

Off-the-shelf tools and frameworks offer little support for this level of monitoring in applications written in COBOL or PL/I. Application monitoring and logging frameworks such as Java Management Extensions (JMX) and Log4j are available for Java, but comparable tools are not available for languages like COBOL and PL/I. Many z/OS users have therefore developed their own frameworks for application monitoring, relying on various technologies.

Some tools can provide a certain level of application monitoring. For example, Dynatrace, AppDynamics, and IBM Application Performance Management provide capabilities to examine how applications function. However, application developers often cannot extend their functionality as easily as they can extend Log4j and JMX. There remains a need for a framework (preferably open-source) that allows application developers to create specific monitoring and logging information and events at particular points in an application on z/OS.

Business Monitoring

Ideally, the application and infrastructure monitoring tools should feed tools that aggregate this information and enrich it with information from other sources. The result is a comprehensive view of the IT components that support business processes.

Recently, tools have become available on z/OS that gather logging and monitoring information and forward it to a central application. Syncsort’s Ironstream and IBM’s Common Data Provider can collect data from different sources, such as system logs, application logs, and SMF data. They can then stream this data to one or more destinations like Splunk or Elastic. With these tools, it is now possible to integrate the available data into a cross-platform aggregated view, as shown in Figure 42. Today, aggregated views are typically implemented in tools like Splunk, the open-source ELK stack from Elastic, or other tools focused on data aggregation, analysis, and visualization.

WTO directly from Rexx

  • Post category: Rexx

There are probably several ways to write a message to the system log (a “Write to Operator”, or WTO) from a Rexx script.

This is a very straightforward one I found some time ago somewhere on the Interweb.

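/* rexx */
/* Write a message to the system console, and thereby to the system  */
/* log, by writing to the z/OS UNIX special file /dev/console.       */
/* trace r */                  /* uncomment to trace for debugging   */
call syscalls 'ON'             /* set up the SYSCALL environment and */
                               /* predefined variables such as       */
                               /* O_WRONLY and ESC_N                 */
address syscall
path='/dev/console'
'open' path O_wronly 666       /* open the console for writing       */
if retval=-1 then
  do
    say 'file not opened, error codes' errno errnojr
    return
  end
fd=retval                      /* file descriptor returned by open   */
/* ESC_N appends a newline character to end the message              */
rec='This is my message text to appear in the system log.' || esc_n
'write' fd 'rec' length(rec)   /* pass the variable NAME, not value  */
if retval=-1 then
  say 'record not written, error codes' errno errnojr
'close' fd
call syscalls 'OFF'            /* end the SYSCALL environment        */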

Have more solutions? Or remarks? Please let me know below.