Industry: Public Sector

Escape Batch Maintenance Hell by Reducing Complexity

Discarding the burden of decades


Image: AI-generated with Google Gemini

Please note: The English version of this success story was translated using AI to make it accessible to our international audience.

tl;dr

  • Industry: Public Administration
  • Application: Central Basic Service for Master Data Management
  • Modernization of Batch Job Technology

Before and after at a glance:

  • Code base. Before: 80 batch jobs, at least 6 different technologies, all deprecated legacy. After: 1 clear pattern for batches, cloud-native batch execution in service pods.
  • Maintenance. Before: massive testing effort, expensive integration tests. After: trouble-free releases, lean testing in service code.
  • Ownership. Before: isolated knowledge. After: any team member can contribute, easy onboarding.
  • Integrity. Before: technology breaks fragment the business logic. After: business logic consolidated in service code.

Situation: 1 problem, 6 different solutions

A federal agency operates a central basic service for master data management. The application comprises a legacy web application for end users, as well as APIs for dependent partner systems—so-called users.

Part of this technology landscape are approximately 80 batch jobs. These jobs have evolved over more than a decade and do not adhere to any common standard or technology stack. The runtime environment, scheduling, and execution of these batch jobs are handled by a separate team. Software packages and dependencies are delivered as JARs to a network drive with a fixed directory convention.

Part of the batch strategy relies on an internally developed proprietary framework. While this framework is not an established industry standard, it was formerly the internal agency standard. It is no longer supported, even within the agency itself. Technically, the process is based on J2EE clients and HTTP interfaces in the service, which are still implemented as low-level servlets—with serialized Java objects as payloads. Due to its high complexity, this batch style is by far the most problematic.

Besides this batch style, at least five others exist. Each grew historically and was chosen ad hoc by the respective developer, entirely according to personal preference and experience:

  • Shell scripts
  • Perl scripts
  • SQL*Loader commands
  • SQL*Plus scripts
  • Other SQL CLIs

New versions of the batch jobs are released together with the rest of the application, typically three times a year. Additional interim releases with bug fixes and other non-breaking changes are made more frequently.

Most quality assurance tests must be performed as integration tests rather than in isolation, partly because of technological limitations (there is no alternative) and partly because business logic is distributed across components, e.g., between Java clients and their associated servlet interfaces. The effort required for quality assurance and regression testing is considerable, false positives from unrelated error sources are a real risk, and the time required for acceptance testing of new versions precludes rapid releases.

For each delivery, a test server must be configured and the entire batch logic exercised with a locally executed test suite. Because of this setup, the process is highly error-prone and time-consuming: the complete suite must always be run, which alone takes 6 to 10 hours. Added to this are coordination costs and a significant loss of flexibility.

The customer’s main pain points:

  • High maintenance costs (due to complex quality assurance)
  • Long delivery times (due to knowledge silos)
  • High risk of failure (due to knowledge silos)
  • Increasing knowledge erosion within the team

Challenge: Errors are costly

The application as a whole is extremely critical for all users. Errors can have unpredictable consequences for these partner applications—with high follow-up costs.

The application houses complex business logic—scattered across numerous modules and without any technical enforcement of business integrity. For example, SQL queries manipulate the same entities as domain code in the service, but due to technological limitations, they cannot use the same abstractions.

The goal of this project is standardization and modernization. The logic of all previously separate batch jobs is integrated into the main application. This significantly reduces maintenance and quality assurance costs, and development teams are no longer dependent on individual experts.

Solution: How to Make Batch Simple and Cloud-Native?

To address the problem of complex testing, a new concept for batch processing is introduced. This concept includes only two implementation options for batch processing to break down knowledge silos and reduce training effort.

Option 1: Simple Imports and Exports

Imports and exports of business object lists (e.g., address data) are performed using simple SQL queries.
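To illustrate the shape of such an export, here is a minimal sketch: a flat list of business objects produced by a plain SQL query is written out as CSV. The table name, column names, and sample data are illustrative assumptions, not taken from the actual service.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of an "Option 1" export: a simple SQL query yields a flat list
// of business objects (here: addresses), which is serialized as CSV.
// All identifiers are hypothetical.
public class AddressExport {

    // Assumed query; in the service this would run via JDBC or JPA.
    static final String EXPORT_SQL =
        "SELECT id, street, city, postal_code FROM address ORDER BY id";

    // Renders one result row as a CSV line, quoting values that contain commas.
    static String toCsvLine(List<String> row) {
        return row.stream()
                  .map(v -> v.contains(",") ? "\"" + v + "\"" : v)
                  .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
            List.of("1", "Hauptstr. 5", "Berlin", "10115"),
            List.of("2", "Marktplatz 1, Hof A", "Bonn", "53111"));
        rows.forEach(r -> System.out.println(toCsvLine(r)));
    }
}
```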

Option 2: Batch Jobs with Logic

All business logic is now executed within the main application on the application servers. A requested job run is persisted in the database as a command object. All identical service pods poll using Spring @Scheduled methods, essentially acting as workers in the competing consumer pattern. A worker requests and locks pending jobs using a simple SELECT FOR UPDATE. This successfully transfers the heavy lifting to the database.
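The polling loop described above can be sketched as follows. In the real service, each pod's @Scheduled method would claim the next pending job run with a SELECT ... FOR UPDATE against the jobs table; in this self-contained simulation an in-memory store with a lock stands in for the database, and all class and field names are illustrative.

```java
import java.util.*;
import java.util.concurrent.*;

// Simulation of the competing-consumer batch pattern: identical workers
// poll a shared store of persisted job runs; a lock ensures each job is
// claimed by exactly one worker (as SELECT ... FOR UPDATE would in Oracle).
public class BatchJobDemo {

    enum Status { PENDING, RUNNING, DONE }

    // The persisted "command object" representing a requested job run.
    static final class JobRun {
        final long id;
        volatile Status status = Status.PENDING;
        volatile String claimedBy = null;
        JobRun(long id) { this.id = id; }
    }

    // Stand-in for the database table of job runs.
    static final class JobStore {
        private final Map<Long, JobRun> jobs = new LinkedHashMap<>();

        synchronized void submit(JobRun job) { jobs.put(job.id, job); }

        // Equivalent of: SELECT * FROM job_run WHERE status = 'PENDING'
        //                ORDER BY id FETCH FIRST 1 ROW ONLY FOR UPDATE
        // The synchronized lock guarantees only one worker claims a job.
        synchronized Optional<JobRun> claimNext(String workerId) {
            for (JobRun j : jobs.values()) {
                if (j.status == Status.PENDING) {
                    j.status = Status.RUNNING;
                    j.claimedBy = workerId;
                    return Optional.of(j);
                }
            }
            return Optional.empty();
        }
    }

    // One polling pass per worker, as a @Scheduled method would run per pod.
    static void poll(JobStore store, String workerId) {
        store.claimNext(workerId).ifPresent(job -> job.status = Status.DONE);
    }

    public static void main(String[] args) throws Exception {
        JobStore store = new JobStore();
        for (long i = 1; i <= 10; i++) store.submit(new JobRun(i));

        // Three "pods" polling concurrently until all jobs are processed.
        ExecutorService pods = Executors.newFixedThreadPool(3);
        for (int p = 0; p < 3; p++) {
            final String id = "pod-" + p;
            pods.submit(() -> { for (int i = 0; i < 10; i++) poll(store, id); });
        }
        pods.shutdown();
        pods.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Because each pod is identical and the lock lives in the database, no dedicated scheduler node is needed; any pod can pick up any pending job.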

The implementation is custom-built rather than based on existing libraries (e.g., JobRunr, Quartz, or db-scheduler).

A three-pronged approach is used for the migration:

  • All legacy batch jobs are recorded in a “technical debt” backlog.
  • Whenever a legacy batch job is modified, it must be migrated.
  • For the particularly challenging J2EE migration, a targeted inventory is conducted, and a migration plan is proactively implemented.

Technology

  • Spring Boot
  • Oracle WebLogic
  • Oracle SQL
  • J2EE
  • Java Servlets
  • Kubernetes
  • Rancher
  • Istio
  • UC4/Automic
  • GitHub Actions
  • Bash
  • Perl
