Connecting from z/OS apps to Kafka via MQ

Apache Kafka is the de facto standard open-source event streaming platform. In event-driven architectures, applications publish events when data changes, allowing other systems to react in real time rather than poll for updates.

An example is a CRM application that serves as the system of record for customer data. When a customer’s address changes, instead of having every application repeatedly query the CRM for current address data, the CRM can publish an ‘address-update’ event. Interested applications subscribe to these events and maintain their own current copy of the data.

Kafka provides native programming interfaces for Java, Python, and Scala. This article demonstrates how traditional z/OS applications can participate in Kafka-based event streaming using IBM MQ and Kafka Connect.

Native Kafka programming interfaces and Kafka Connect

Applications can interact directly with Kafka through native programming interfaces. Kafka, being Java-based, naturally supports Java applications. Other languages with native Kafka support include Python and Scala. IBM recently introduced a Kafka SDK for COBOL on z/OS, though I will not explore that approach here.

Kafka Connect bridges the gap for applications without native Kafka support. This open-source component sits between Kafka and other middleware technologies like databases and messaging systems, translating between their protocols and Kafka’s event streaming format.

Solution Architecture

Our solution enables z/OS applications to produce and consume Kafka events through IBM MQ, leveraging the well-established asynchronous messaging patterns familiar to mainframe developers.

Key Benefits:

  • Uses proven MQ messaging patterns
  • Works with both CICS online and batch applications
  • Supports any z/OS programming language that can create MQ messages (COBOL, PL/I, Java, Python, Node.js, Go)
  • No application code changes required beyond message formatting

Architecture Overview

The solution uses Kafka Connect as a bridge between MQ queues and Kafka topics.

For Event Production:

  • z/OS applications send messages to dedicated MQ queues
  • Kafka Connect reads from these queues
  • Messages are published to corresponding Kafka topics
  • Kafka broker makes events available to subscribers

For Event Consumption:

  • Kafka Connect subscribes to Kafka topics
  • Incoming events are placed on corresponding MQ queues
  • z/OS applications read from queues for business processing

Queue-to-Topic Mapping

Each Kafka topic has a dedicated MQ queue. This one-to-one mapping simplifies configuration and makes the data flow transparent for both operations and development teams.
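
As an illustration, a single queue-to-topic mapping could be expressed in the configuration of IBM's open-source kafka-connect-mq-source connector roughly as follows. This is a minimal sketch: the property names come from that connector, but the queue manager, queue, and topic names are made-up examples.

connector.class=com.ibm.eventstreams.connect.mqsource.MQSourceConnector
tasks.max=1
mq.queue.manager=MQ1A
mq.connection.mode=bindings
mq.queue=APP1.TO.KAFKA.QUEUE
topic=app1-events
mq.record.builder=com.ibm.eventstreams.connect.mqsource.builders.DefaultRecordBuilder

The consumption direction is handled by the companion kafka-connect-mq-sink connector, configured analogously with a topic-to-queue mapping.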

Software Components

Kafka Connect runs as a started task on z/OS. Multiple instances can serve the same workload by sharing startup parameters, providing scalability and high availability.
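
For instance, in Kafka Connect's distributed mode, instances that share the same group.id and internal storage topics form one logical cluster. A minimal worker configuration might look like this (a sketch; host, topic, and converter choices are illustrative):

bootstrap.servers=kafka1.example.com:9092,kafka2.example.com:9092
group.id=zos-connect-cluster
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
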
Kafka Connect includes a REST API for:

  • Configuring connectors for your applications
  • Monitoring connector status
  • Integrating with provisioning and deployment processes
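
For example, using the standard Kafka Connect REST endpoints (the host, port, and connector name below are illustrative):

# List the connectors running in this Kafka Connect cluster
curl http://connect.example.com:8083/connectors

# Create or update a connector from a JSON configuration file
curl -X PUT -H "Content-Type: application/json" \
  --data @app1-mq-source.json \
  http://connect.example.com:8083/connectors/app1-mq-source/config

# Check the status of a connector and its tasks
curl http://connect.example.com:8083/connectors/app1-mq-source/status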

Production Configuration

In a production environment, multiple Kafka Connect instances run across different LPARs for high availability. Each instance accesses application queues through MQ local binding connections. MQ queue sharing groups distribute workload across LPARs, ensuring both performance and resilience.

The infrastructure setup supports:

  • Load balancing across multiple z/OS instances
  • Fault tolerance through redundant components
  • Efficient local MQ connections

Summary

This article describes an architecture that provides a clean, straightforward path for z/OS applications to participate in event-driven systems using Apache Kafka. By leveraging existing MQ messaging patterns and Kafka Connect middleware, traditional mainframe applications can integrate with modern streaming platforms without requiring extensive code changes or new programming paradigms.
The solution maintains the reliability and performance characteristics that z/OS environments demand while opening doors to real-time data integration and event-driven architectures.

dos2unix on z/OS

On z/OS UNIX, the dos2unix utility is not included. You can achieve similar functionality using other tools available on z/OS UNIX, such as sed or tr. These tools can be used to convert DOS-style line endings (CRLF) to Unix-style line endings (LF).

For example, you can use sed to remove carriage return characters:

sed 's/\r$//' inputfile > outputfile

Or you can use tr:

tr -d '\r' < inputfile > outputfile

Continuous availability presentation in 2006, updated

Continuous availability

The slide deck tells me that it was in 2006 that I created a set of slides for “Kees” with an overview of the continuous availability features of an IBM mainframe setup.

The deck’s content was interesting enough to share here, with some enhancements.

What is availability?

First, let’s talk a little bit about availability. What do we mean when we talk about availability?

A highly available computing setup should provide for the following:

  • A highly available fault-tolerant infrastructure that enables applications to run continuously.
  • Continuous operations to allow for non-disruptive backups and infrastructure and application maintenance.
  • Disaster recovery measures that protect against unplanned outages due to disasters caused by factors that cannot be controlled.

Definitions

Availability is the state of an application service being accessible to the end user.

An outage (unavailability) is a period during which the application service is not accessible to the end user. An outage can be planned, for example, for software or hardware maintenance, or unplanned.

What causes outages?

A 2005 research report from the Standish Group showed the various causes of outages.

Causes of outages (2006)

It is interesting to see that (cyber) security was not part of this picture, while more recent research published by UpTime Intelligence shows this growing concern. More on this later.

Causes of outages 2020 – 2021 – 2022

The myth of the nines

The table below shows the availability figures for an IBM mainframe setup versus Unix and LAN availability.

Things have changed. Unix (now: Linux) server availability has gone up. Server quality has improved, and so has software quality. Unix, however, still does not provide a capability similar to a z/OS sysplex. Such a sysplex simply beats any clustering facility by providing built-in, operating system-level availability.

Availability figures for an IBM mainframe setup versus Unix and LAN

At the time of writing, IBM publishes updated figures for a sysplex setup as well (see https://www.ibm.com/products/zos/parallel-sysplex): 99.99999% application availability, for the configuration described in the footnote: “… IBM Z servers must be configured in a Parallel Sysplex with z/OS 2.3 or above; GDPS data management and middleware recovery across Metro distance systems and storage and DS888X with IBM HyperSwap. Other resiliency technology and configurations may be needed.”
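
To put such numbers in perspective, every additional nine divides the allowed downtime by ten:

99.9%     (three nines) ~ 8.8 hours of downtime per year
99.99%    (four nines)  ~ 53 minutes per year
99.999%   (five nines)  ~ 5.3 minutes per year
99.99999% (seven nines) ~ 3 seconds per year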

Redundant hardware

The following slides show the redundant hardware of a z9 EC (Enterprise Class), the flagship mainframe of that time.

The redundant hardware of a z9 EC

Contrasting this with today’s flagship, the z16 (source https://www.vm.ibm.com/library/presentations/z16hwov.pdf), is interesting. Since the mainframe is now mounted in a standard rack, the interesting views have moved to the rear of the apparatus. (iPDUs are the power supplies in this machine.)

The redundant hardware of a z16

Redundant IO configuration

A highly fault-tolerant server alone is insufficient for an ultimately highly available setup. The IO configuration, a.k.a. the storage configuration, must be highly available as well.

A redundant SAN setup

The following slide in the deck highlights how this can be achieved. What is amusing or annoying, depending on your mood, and what triggers me today, are the “DASD CU” terms in the storage boxes. These boxes are the storage systems housing the physical disks. At that time, terms like DASD (Direct Access Storage Device, goodness, what a code word for disk) and CU (Control Unit, just an abstraction anyway) were more evident than plain storage and disk. And I will ignore the valueless addition of CSS (Channel SubSystem) and CHPID (Channel Path ID) on this slide.

What a prick I must have been at that time.

At least the term Director did get the explanatory text “Switch.”

A redundant storage setup for mainframes

RAS features for storage

I went on to explain that a “Storage Subsystem” has the following RAS features (RAS, ugh…, Reliability, Availability, Serviceability):

  • Independent dual power feeds (so you could attach the storage box to two different independent power lines in the data center)
  • N+1 power supply technology/hot-swappable power supplies and fans
  • N+1 cooling
  • Battery backup
  • Non-volatile subsystem cache to protect writes that have not been hardened to DASD yet (which we jokingly referred to as non-violent storage)
  • Non-disruptive maintenance
  • Concurrent LIC activation (LIC – Licensed Internal Code, a smoke-and-mirrors term for software)
  • Concurrent repair and replacement actions
  • RAID architecture
  • Redundant microprocessors and data paths
  • Concurrent upgrade support (that is, the ability to add disks while the subsystem is online)
  • Redundant shared memory
  • Spare disk drives
  • Remote Copy to a second storage subsystem
    • Synchronous (Peer-to-Peer Remote Copy, PPRC)
    • Asynchronous (Extended Remote Copy, XRC)

Most of this is still valid today, except that we no longer have spinning disks; everything is solid-state drives nowadays.

Disk mirroring

Ensuring that data is safely stored in this redundant setup is achieved through disk mirroring at the lowest level. Every byte written to a disk in one storage system is replicated to one or more other storage systems, which can be in different locations.

There are two options for disk mirroring: Peer-to-Peer Remote Copy (PPRC) and eXtended Remote Copy (XRC). PPRC is also known as a Metro Mirror solution. Data is mirrored synchronously, meaning an application receives an “I/O complete” only after both the primary and secondary disks are updated. Because updates must be applied to both storage systems synchronously, they can only be 15 to 20 kilometers apart; otherwise, writes would take too long. The speed of light imposes this limit.
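
A back-of-the-envelope calculation shows the effect. A signal in fiber travels at roughly 200,000 km/s, or about 5 microseconds per kilometer, so:

20 km between sites    -> 40 km round trip    -> ~0.2 ms added to every mirrored write
1,000 km between sites -> 2,000 km round trip -> ~10 ms added to every mirrored write

A fraction of a millisecond is acceptable for synchronous I/O; tens of milliseconds per write are not.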

With XRC, data is mirrored asynchronously. An application receives “I/O complete” as soon as the primary disk is updated. The storage systems can be an unlimited distance apart. A component called the System Data Mover ensures the consistency of the data in the secondary storage system.

PPRC and XRC

The following slide highlights how failover and failback would work in a PPRC configuration.

PPRC failover and failback

The operating system cluster: parallel sysplex

The presentation then explains how a z/OS parallel sysplex is configured to create a cluster without any single point of failure. All servers, LPARs, operating systems, and middleware are set up redundantly in a sysplex.

Features such as Dynamic Session Balancing and Dynamic Transaction Routing ensure that workloads are spread evenly across such a cluster. Facilities in the operating system and middleware work together to ensure that all data is safely and consistently shared, locking is in place when needed, and so forth.

The slide highlights the Coupling Facility, a facility for sharing memory between the different members of a cluster. It also shows the Sysplex Timers, which keep the clocks of the different members of a sysplex in sync.

A parallel sysplex

A few more facilities are discussed. Workload balancing is achieved with the Workload Manager (WLM) component of z/OS. The ability to restart applications without interfering with other applications or z/OS itself is provided by the Automatic Restart Manager (ARM). Resource Recovery Services (RRS) assist with two-phase commits across members in a sysplex.

Automation is critical for successful rapid recovery and continuity

Every operation must be automated to prevent human errors and improve recovery speed. The following slide states some fairly obvious benefits of automation:

  • Allows business continuity processes to be built on a reliable, consistent recovery time
  • Keeps recovery times consistent as the system scales, providing a flexible solution designed to meet changing business needs
  • Reduces infrastructure management costs and the staffing skills required
  • Reduces or eliminates human error during the recovery process at the time of disaster
  • Facilitates regular testing to help ensure repeatable, reliable, scalable business continuity
  • Helps maintain recovery readiness by managing and monitoring the server, data replication, workload, and network, along with notification of events that occur within the environment

Tiers of Disaster Recovery

The following slide shows an awful picture highlighting the concept of tiers of Disaster Recovery from Zero Data Loss to the Pickup Truck method.

Tiers of Disaster Recovery

I mostly like the Pickup Truck Access Method.

GDPS

The following slide introduces GDPS (the abbreviation of the fairly meaningless term Geographically Dispersed Parallel Sysplex). GDPS is software on top of z/OS that provides the automation combining all the previously discussed components into a continuously available configuration. GDPS takes care of the actions needed when failures occur in a z/OS sysplex.

GDPS

GDPS comes in two flavors: GDPS/PPRC and GDPS/XRC.

GDPS/PPRC is designed to provide continuous availability and no data loss between z/OS members in a sysplex across two sites that are at most campus distance (15-20 km) apart.

GDPS/XRC is designed to provide automatic failover between sites that are an extended distance apart. Since GDPS/XRC is based on asynchronous data mirroring, minimal data loss can occur for data not yet committed to the remote site.

GDPS/PPRC and GDPS/XRC can be combined into a best-in-class solution: a high-performance, zero-data-loss setup for local/metro operation, plus an automatic site-switch capability for extreme situations such as natural disasters.

In summary

The summary slide presents an overview of the capabilities of the server hardware, the Parallel Sysplex, and the GDPS setup.

Redundancy of Z server, Parallel Sysplex and GDPS

But we are not there yet: ransomware recovery

When I created this presentation, ransomware was not the big issue it is today. Nowadays, the IBM solution for continuous availability has been enriched with a capability for ransomware recovery. This solution, called IBM Z Cyber Vault, combines various IBM Z capabilities. It can create immutable copies of production data, called Safeguarded Copies in IBM Z Cyber Vault terms, taken at multiple points in time, with rapid recovery capability. In addition, it can enable data validation to support testing the validity of each captured copy.

The IBM Z Cyber Vault environment is isolated from the production environment.

Whatever type of mainframe configuration you run, this IBM Z Cyber Vault capability can provide a high degree of cyber resiliency.

Source: https://www.redbooks.ibm.com/redbooks/pdfs/sg248511.pdf

IBM Z Cyber Vault

Assembler to get name of current address space

An assembler program that gets the name of the current address space from the current ASCB, which it locates through the PSA.

Documentation: PSA description and ASCB description.

ASCBNAME CSECT                                                          
         EQUATES                                                        
         SAVE  (14,12),,TST/NDG/&SYSDATE/&SYSTIME/               
         USING ASCBNAME,R12            SET UP BASE ADDRESSABILITY       
         LR    R12,R15                 LOAD BASE REG WITH ENTRY POINT   
         LA    R14,SAVE                GET ADDRESS OF REGISTER SAVE     
         ST    R13,4(0,R14)            SAVE CALLER'S SAVE AREA ADDR     
         ST    R14,8(0,R13)            SAVE MY SAVE AREA ADDRESS        
         LR    R13,R14                 LOAD SAVE AREA ADDRESS           
*                                                                       
INIT     DS    0H                                                       
         OPEN  (OUT,(OUTPUT))
*                                                                       
DOE      DS    0H                                                       
         SR    R1,R1                   R1 = 0
         USING PSA,R1                  ADDRESS PSA
         L     R2,PSAAOLD              GET ADDRESS CURRENT ASCB
         DROP  R1                      RELEASE PSA ADDRESSING
         USING ASCB,R2                 ADDRESS CURRENT ASCB
         L     R1,ASCBJBNS             GET ADDRESS ADDRESS SPACE NAME
         DROP  R2                      RELEASE ASCB ADDRESSING
         MVC   ADDRSPC,0(R1)           GET NAME
         PUT   OUT,OUTREC              WRITE RECORD
RETURN   DS    0H                                                       
         CLOSE OUT                                                      
         SLR   R15,R15                                                  
         L     R13,4(R13)              LOAD CALLERS SAVE AREA ADDRESS   
         RETURN (14,12),RC=(15)        RETURN TO CALLER                 
*                                                                       
*
*
         DC     C'**********   ************* WORK AREA  ******'
SAVE     DS    18F
OUTREC   DS    CL80
         ORG   OUTREC
ADDRSPC  DC    CL8' '
REST     DC    CL72' '
OUT      DCB   DDNAME=OUT,                                             *
               DSORG=PS,                                               *
               MACRF=(PM)
*                                                                       
         IHAASCB DSECT=YES
         IHAPSA
         END   ,                                                        


ABENDIT – simple assembler to create an ABEND

Not sure what I used it for, but here is a simple program in assembler to create an ABEND with a completion code of your choice.

Look here in the IBM manuals for more specifics on the ABEND macro.

ABENDIT  CSECT
         EQUATES
         SAVE  (14,12),,ABENDIT/OURDEPT/&SYSDATE/&SYSTIME/
         USING ABENDIT,R11             SET UP BASE ADDRESSABILITY
         LR    R11,R15                 LOAD BASE REG WITH ENTRY POINT
         LA    R14,SAVE                GET ADDRESS OF REGISTER SAVE
         ST    R13,4(0,R14)            SAVE CALLER'S SAVE AREA ADDR
         ST    R14,8(0,R13)            SAVE MY SAVE AREA ADDRESS
         LR    R13,R14                 LOAD SAVE AREA ADDRESS
*        Business Logic
         ABEND 1234                    1234 OR SOME OTHER USER CODE UP TO 4095
*        Epilogue
RETURN   EQU    *
         L      R13,4(R13)
         RETURN (14,12)                RETURN TO CALLER
         LTORG
SAVE     DS     18F
         END ABENDIT

Happy ABENDing!

Exchange an MVS dataset with Windows

I have discussed in another article how you can convert data from one codepage to another.

I have also described a method to copy UNIX files to MVS datasets.

This article summarizes a method to copy an MVS dataset to Windows, while keeping the record structure and converting from EBCDIC to the UTF-8 code page.

This job copies the MVS dataset to a UNIX file, indicating that record boundaries should be marked with the CR character (UNIX typically uses LF as the record/line separator, Windows uses CRLF). The dataset below, ‘YOUR.TEST.PS’, should be a PS – physical sequential – dataset or a PDS(E) member.

//STEP1    EXEC PGM=BPXBATCH                                         
//STDOUT   DD SYSOUT=*                                               
//STDERR   DD SYSOUT=*                                                
//STDIN    DD DUMMY    
//* Values in STDENV below are kept but have no meaning for this function                                              
//STDENV   DD *                                                      
JAVA_HOME=/usr/lpp/java/J8.0_64                                      
PATH=/usr/lpp/mqm/web/bin:/bin:/usr/sbin                      
LIBPATH=/usr/lpp/mqm/java/lib                                 
//STDPARM  DD   *                                                    
SH cp -v -F cr "//'YOUR.TEST.PS'" /your/unixdir/tst.txt 

You can now convert the EBCDIC data to UTF-8 as follows:

iconv -f 37 -t 1208 /your/unixdir/tst.txt  > /your/unixdir/utf8-tst.txt

Now you can transfer the file to Windows. Transfer the file in binary mode; otherwise another, unwanted code page conversion will happen.
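
For example, with a classic command-line FTP client on Windows (the host name and paths below are made up):

ftp zoshost.example.com
ftp> binary
ftp> cd /your/unixdir
ftp> get utf8-tst.txt
ftp> quit

Tools such as sftp and scp always transfer in binary mode, so they are safe alternatives.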

Programming languages and what’s next

A review of the programming languages I learned during my years in IT.

BASIC

On the Texas Instruments TI-99/4A.

Could do everything with it. Especially in combination with PEEK and POKE. Nice for building small games.

Impossible to maintain.

GOTO is unavoidable.

Assembler

In various variants.

Z80, 6802, PDP-11, System/390.

Fast, furious, unreadable, unmaintainable.

Algol 68

Loved this language. REF!

Have only seen it run on a DEC 10. Mainly used in academic environments (in the Netherlands, at least?).

Pascal

Well. Structured. Pretty popular in the early 90s. 

Then again, was it ever widely adopted?

COBOL

Old. Never programmed extensively in it – just for year 2000.

Totally Readable.

Funny (ridiculous) numbering scheme.

It seems to be necessary to use GOTO in some cases, which I do not believe.

Smalltalk

Beautiful language.

Should have become the de facto OO programming language but failed for unclear reasons.

Probably because it was way ahead of its time with its OO basis.

Java

Totally nitty gritty programming language.

Productivity is based on frameworks, and no one knows which ones to use.

Never understood why this language was so widely adopted – besides its openness and platform independence.

Should never have become the de facto OO programming language but did so because Sun made it open (good move).

Far too many frameworks needed. J(2)EE adds more complexity than it resolves.

Always upgrade issues. (Proud programmer: We run the application in Java! Fed up IT manager: Which Java?)

Rexx

Can do everything quickly.

But nothing structurally.

Ugly code. Readable but ugly.

Some very very strong concepts.

Php

Hodge-podge language of programming concepts and HTML.

Likely high programmer productivity, if you maintain a strict discipline of programming standards. There is a stark danger of creating an unmaintainable crap-code mix of HTML and PHP.

Python

Nice structured language.

Difficult to set up and reuse.

Can be productive if nitty gritty setup issues can be overcome.

Ruby (on Rails or off-track)

Nice, probably the most elegant OO language. Still too nitty-gritty for my taste. Like it though.

I would start with this language if I had to start today.

What is next

Visual programming? Clicking building blocks together?

In programming, we should maybe separate the construction of applications from the coding of functions (or objects, or whatever you call the lower-level blocks of code).

Programming complex algorithms (efficiently) will probably always remain a craft for specialists.

Constructing applications from the pieces should be brought to a higher level.

The industry (well – the software-selling industry) is looking at microservices, but that gives operational issues and becomes too distributed.

We need a way to build a house from software bricks and doors and windows and roof elements.

Probably we need more standards for that. 

Another bold statement.

AI systems “programming” themselves is nonsense (I have not seen a shred of evidence).

AI systems are stochastic systems.

Programming is imperative.

In summary, up to today you cannot build software without getting into the nitty-gritty very quickly.

It’s like building a house, but first having to find your own trees and rocks to cut wood and bricks from.

And then construct your own nails and screws.

A better approach to that would help.

What do you think is the programming language of the future? What need should it address?

The Internet of Everything – from toilet seats to human bodies

I walked into the restroom. A mechanic stood at the sink fixing something. I saw him holding a toilet seat. He was fooling around with the wiring of the apparatus. Then he replaced some electronic components and rewired the seat.

Toilet sensors

It never occurred to me that even toilets could be usefully equipped with electronic features. I asked the mechanic. He explained that the toilets in the building are all connected to the Internet. If there is something wrong with the antiseptic fluid produced by the toilet, it starts calling out for help. He told me that the towel dispenser was also connected to the Internet, so that when it runs out, a maintenance operator is called in. Makes sense.

Never has technology done so much to improve the Loo.

To cell sensors

So all things will be supplied with sensors. And it looks like these sensorized things are getting smaller and smaller, reaching into the nano space.

Sensors are getting so small that they can flow through our blood and mend our bodies. And maybe fix cancer cells in the future. Or detect issues with blood vessels. Or measure the chemistry in our bodies. They can be injected into plants to protect them from diseases. Or be used in constructions to measure stability at smaller scales than we had ever assumed possible. Possibilities beyond imagination.

Neb sensors surveilling the body 

Imagine what it would mean if we could instrument every cell we like. I would like a surveillance team of bots swimming through my body, like the Nebuchadnezzar in The Matrix flowing through the sewers and tunnels of the abandoned cities.

To signal when my internals run out of supplies.