Automated Testing at LzLabs: Practices, Processes and Numbers

8 April 2020

At LzLabs, we work relentlessly on improving test facilities to deploy new versions of our own product rapidly. We believe our customers can achieve significant agility benefits if they are able to adopt similar testing approaches for the development and integration of mainframe workloads.

The LzLabs approach to automated testing is embodied in our Global Test Harness (GTH). GTH is a swarm of 120+ different Docker containers that we run against each merge request issued by our developers when they want to add new code to the Software Defined Mainframe (SDM). With this tooling, we run approximately 4 million tests each week to continuously improve our product's quality. Over time, additional tests contribute to increased code coverage.

Each container in a given GTH run hosts a subset of the tens of thousands of automated non-regression tests that our QA department requires to pass before accepting the merge. At this stage, execution usually succeeds on the first attempt, because GTH is available on demand to all developers. In our standard "shift-left testing" approach, developers launch GTH several times a day to ensure that their latest code changes will "pass the bar" on the first attempt, and thus avoid surprises when the corresponding merge request is issued for the new version to go into production.

 

Purpose: To shift-left and execute the full set of automated tests against every SDM branch
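To make this concrete, here is a minimal Python sketch of the idea; the registry path and image naming scheme are hypothetical, and the real GTH is driven by our CI tooling rather than a script like this. A swarm of self-contained test images is launched in parallel for a merge request, and the run passes only if every container exits cleanly.

```python
# Minimal sketch (hypothetical names): launch a GTH-style swarm of test
# containers for one merge request and wait for all of them to finish.
import subprocess
from concurrent.futures import ThreadPoolExecutor

REGISTRY = "registry.example.com/gth"            # hypothetical registry
SUITES = [f"suite-{i:03d}" for i in range(120)]  # one image per test subset

def run_suite(suite: str, mr_id: str) -> bool:
    """Run one self-contained test container; non-zero exit means failure."""
    image = f"{REGISTRY}/{suite}:{mr_id}"
    result = subprocess.run(["docker", "run", "--rm", image])
    return result.returncode == 0

def run_gth(mr_id: str) -> bool:
    # All suites run concurrently; the run passes only if every one passes.
    with ThreadPoolExecutor(max_workers=len(SUITES)) as pool:
        outcomes = list(pool.map(lambda s: run_suite(s, mr_id), SUITES))
    return all(outcomes)

if __name__ == "__main__":
    print("PASS" if run_gth("mr-1234") else "FAIL")
```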

The 15,000+ new container images that we produce weekly to validate the very latest RPMs of our product have a naturally short life: they run once to verify non-regression and are discarded immediately once 100% of the automated tests pass with the latest RPMs.

When one or more tests fail, we can leverage the power of containers: our images are self-contained in terms of Linux libraries, SDM binaries and application artefacts (programs + data). QA pulls the failing image off the container registry and re-runs it on a local machine to confirm the identified defect. Once the defect is confirmed, the developer can pull the same Docker image from the registry and launch an instance of it to reproduce the error with 100% fidelity, thanks to the immutability of the container image. The developer can then debug it "comfortably" on their laptop rather than in the more constrained environment of our GTH cluster.
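The reproduction step can be as simple as pulling the exact image that failed and re-running it. A hedged sketch follows, with a hypothetical image reference; the key property is that the immutable image replays the failure unchanged:

```python
# Sketch (hypothetical image reference): reproduce a GTH failure locally by
# pulling the exact, immutable image that failed in the cluster.
import subprocess

def reproduce(image_ref: str) -> int:
    subprocess.run(["docker", "pull", image_ref], check=True)
    # Same image digest => same libraries, SDM binaries and test artefacts,
    # so the failure replays with full fidelity on a developer laptop.
    return subprocess.run(["docker", "run", "--rm", image_ref]).returncode

if __name__ == "__main__":
    reproduce("registry.example.com/gth/suite-042:mr-1234")
```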

So why do we use a suite of 120+ containers executing in parallel?

Firstly, we have defined a stringent 20-minute time limit for our Development and QA staff to obtain the results of a GTH run. A key point of Continuous Testing is that you want your team to answer the recurring question "Does the product with the new code still pass the bar of non-regression tests?" quickly. A parallel fleet of containers can execute in less than 20 minutes, whereas a sequential execution of GTH would take around 40 hours to answer that question: 40 hours is roughly 2,400 minutes, so spreading the work across 120+ parallel containers brings it down to about 20 minutes per run.

Clearly, a 40-hour wait would be unsustainable: our developers are interdependent and work on tight schedules, and new features are promised to our customers, so code merges must be seamless. To make the process even more efficient, all containers of a given execution report the success or failure of each test they hosted to a central results database. The submitter of the GTH run then receives all results in aggregated form via an email, allowing them to see at a glance whether all of the tests passed.
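A sketch of what such central reporting and aggregation could look like; the results endpoint is hypothetical, and our actual service database and email tooling differ:

```python
# Sketch (assumed endpoint name): each container reports its per-suite result
# to a central service; the submitter then receives one aggregated summary.
import json
import urllib.request

RESULTS_URL = "http://gth-results.internal/api/results"  # hypothetical

def report(run_id: str, container: str, passed: int, failed: int) -> None:
    """Called from inside each test container when its suite finishes."""
    payload = json.dumps({"run": run_id, "container": container,
                          "passed": passed, "failed": failed}).encode()
    req = urllib.request.Request(RESULTS_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

def summarize(results: list[dict]) -> str:
    """Collapse all container reports into the one line the submitter reads."""
    failing = [r for r in results if r["failed"]]
    if not failing:
        return "ALL TESTS PASSED"
    return (f"{len(failing)} container(s) reported failures: "
            + ", ".join(r["container"] for r in failing))
```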

The second reason we run so many containers in parallel is that each container is self-contained: it can bundle Linux libraries and SDM RPMs at different levels to validate the compatibility of our software with the various versions of the operating systems on which we run. This is far more efficient than installing standard virtual machines (KVM, VMware, Hyper-V, etc.): we don't have to dedicate hardware, and we don't have to install a full system before running the application tests.
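As an illustration, such a compatibility matrix can be expanded into one image build per combination. This is a sketch with hypothetical base-image and RPM-level names, assuming a Dockerfile that accepts BASE and SDM build arguments:

```python
# Sketch (hypothetical tags): validate the same SDM RPMs against several base
# OS levels by building one image per (os, sdm) pair instead of one VM each.
import itertools
import subprocess

OS_BASES = ["rhel8", "rhel9", "sles15"]   # hypothetical base images
SDM_LEVELS = ["sdm-3.1", "sdm-3.2"]       # hypothetical RPM levels

for os_base, sdm in itertools.product(OS_BASES, SDM_LEVELS):
    tag = f"gth-compat:{os_base}-{sdm}"
    subprocess.run(["docker", "build",
                    "--build-arg", f"BASE={os_base}",
                    "--build-arg", f"SDM={sdm}",
                    "-t", tag, "."], check=True)
```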

Thirdly, it helps with the scalability of GTH, allowing us to continually add new tests. Some we develop in-house via our QA team, and some we receive from customers (either for permanent use or for temporary use during a given project). This growing test harness requires additional hardware on a recurring basis. We don't statically allocate any hardware to a given set of functional tests: rather, we grow our cluster of x86 commodity servers and use smart orchestration mechanisms (Kubernetes, Jenkins, etc.) to dispatch the GTH containers to the available and most appropriate machines, as the sketch below illustrates. This orchestration allows very dynamic (re)definition of the execution infrastructure: orchestrated containers are probably the best currently available incarnation of "Infrastructure as Code" as Wikipedia defines it.
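For example, using the official Kubernetes Python client, dispatching one GTH suite as a run-once Job might look like the following sketch; the image and namespace names are hypothetical, and the scheduler, not the submitter, decides which node runs it:

```python
# Sketch (hypothetical image/namespace names) using the official Kubernetes
# Python client: the orchestrator, not static hardware, places each container.
from kubernetes import client, config

def dispatch_suite(run_id: str, suite: str, image: str) -> None:
    config.load_kube_config()
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=f"gth-{run_id}-{suite}"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    containers=[client.V1Container(name=suite, image=image)],
                    restart_policy="Never",  # run once, then discard
                )
            )
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="gth", body=job)
```

Because the Job carries everything it needs, adding capacity is purely a matter of adding nodes to the cluster; no test is ever tied to a specific machine.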

User-initiated build and test (PL-on-demand)

Finally, containers allow us to more easily test interoperability with the distributed systems surrounding most mainframe applications. For business continuity purposes, most customers have not one but at least two physical machines, each segregated into multiple so-called logical partitions (LPARs). A container loaded with SDM and a corresponding application naturally mirrors such a partition. We can therefore simulate multiple logical partitions by launching container images in parallel, interconnecting them over the Kubernetes-provided software-defined network, and then testing their tight interactions, via ad-hoc SDM features, to verify that they deliver the expected service level agreement (SLA) in terms of reliability and performance, even when all kinds of deliberately generated failures occur.
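A minimal sketch of the pattern, with a hypothetical sdm-app image: two "partitions" are attached to the same software-defined network, and a failure is injected by killing one of them to observe whether the other takes over:

```python
# Sketch (hypothetical image name): emulate two "logical partitions" as
# containers on a shared network, then inject a deliberate failure.
import subprocess

def sh(*args: str) -> None:
    subprocess.run(args, check=True)

sh("docker", "network", "create", "lpar-net")   # software-defined network
sh("docker", "run", "-d", "--name", "lpar-a",
   "--network", "lpar-net", "sdm-app:latest")   # hypothetical image
sh("docker", "run", "-d", "--name", "lpar-b",
   "--network", "lpar-net", "sdm-app:latest")
sh("docker", "kill", "lpar-a")  # failure injection: does lpar-b take over?
```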

Then we can embed the front-end applications in additional containers simulating the Linux servers on which they usually reside. All in all, we can simulate a wide variety of mainframe configurations and their surrounding x86 front-ends via a set of containers interconnected over the TCP/IP network provided by the orchestrator. This allows us to test, on demand, configurations that are almost impossible to implement in daily IT operations except at very high cost, and thus accessible only to companies running the highest end of the mainframe spectrum!

An interesting number to report here is "0" (which should in fact be considered a positive number in this situation!): the number of constraints brought about by these containers. This is portability at its best: an LPAR as described above is now just a single self-contained file that can be moved around easily and run anywhere unchanged. The consequence is a highly flexible architecture where the same container image can be executed on-premises or in the cloud (even via a serverless feature). It becomes purely a decision of the orchestrator where each image of a given GTH run will execute, and this decision can be driven externally to GTH, based on systems availability, cost optimization, and so on.

Architecture

As previously mentioned, we want to share these numerous benefits with our customers: they can use the same technology we developed to test our product to test their own applications. Instead of replacing the SDM on each run as we do, they replace their application with a newer version.
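In a customer pipeline this inversion is small: the SDM layer stays fixed while each new application build is baked into the test image. A sketch, assuming a hypothetical Dockerfile that accepts an APP_VERSION build argument:

```python
# Illustrative sketch (hypothetical Dockerfile and names): the SDM layer is
# fixed; each new application version is swapped into the test image instead.
import subprocess

def test_application(app_version: str) -> bool:
    image = f"myapp-tests:{app_version}"
    subprocess.run(["docker", "build",
                    "--build-arg", f"APP_VERSION={app_version}",
                    "-t", image, "."], check=True)
    # Run the same non-regression suite against the rebuilt application.
    return subprocess.run(["docker", "run", "--rm", image]).returncode == 0
```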

As we previously described in our webinar, "How Legacy Applications can Integrate Perfectly with Modern Development Pipelines", most mainframe shops don't do much automated testing because running large test harnesses on a mainframe tends to be far too expensive. The LzLabs internal approach, being x86-based and therefore incredibly cheap compared to a mainframe, can be applied to application testing by customers, allowing them to switch to a canonical continuous integration/continuous delivery (CI/CD) approach that delivers a significant increase in agility over a very short time frame. It doesn't necessarily imply rehosting the production system on SDM: the application can remain on the mainframe for operational use.

Interested in a state-of-the-art DevOps approach for your mission-critical workloads? Please get in touch with us for more details: we'll be happy to discuss how best to implement SDM-based automated testing in your own environment.


