Kurly Pay · 2022.09 — 2023.12

OutOfMemoryError During Batch Jobs

  • After the job's owner had left the company, applied a short-term mitigation for a batch OutOfMemoryError and drove a company-wide memory optimization

Issue

  • A batch job failed with an OutOfMemoryError after its owner had already left the company
  • Analyzed the window in which memory grew, implemented a short-term mitigation, then expanded it into a company-wide optimization

Analysis

After receiving the issue report, I checked the OOM error logs and confirmed that memory had peaked during that time window.

Issue report / OOM error log

Issue report alongside the OutOfMemoryError log from the batch job.

Memory peak confirmed for that time window

Memory usage chart confirming the peak during that time window.

Looking beyond the immediate OOM, I traced the root cause to a misconfigured JVM default max heap size.

JVM default max heap size analysis

Analysis of the misconfigured JVM default max heap size.
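
As an illustration of how the effective max heap can be checked from inside a running JVM, here is a minimal sketch. It is not the tooling used in the original analysis; the class name HeapReport and its helper methods are hypothetical. When no explicit limit such as -Xmx is set, the JVM chooses a default max heap from the memory it detects (typically a fraction of it), which is the value this program reports.

```java
// Hypothetical helper, not taken from the original project.
class HeapReport {

    // Convert a byte count to whole mebibytes (rounds down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Format the max and currently used heap sizes for a log line.
    static String describe(long maxBytes, long usedBytes) {
        return "max heap = " + toMiB(maxBytes) + " MiB, used = " + toMiB(usedBytes) + " MiB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the most heap the JVM will attempt to use,
        // whether it came from -Xmx or from the JVM's own default.
        long max = rt.maxMemory();
        // Used heap = currently reserved heap minus the free part of it.
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println(describe(max, used));
    }
}
```

Running this inside the same container image as the batch job, with and without an explicit heap option, shows whether the default the JVM picked matches what the job actually needs.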

Improvement

I improved the JVM memory options in the Dockerfile and shared the same pattern company-wide so it could be applied consistently.

Follow-up plan (Xmx, autoscaling, log alerts)

Follow-up plan covering Xmx tuning, autoscaling, and log alerts.

Dockerfile JVM option change request

Change request applying the JVM memory options in the Dockerfile.

After the infrastructure work, I reprocessed the batch and confirmed that memory usage stayed stable, peaking at 553MB.

Reprocessing memory usage confirmed

Reprocessing run confirming stable memory usage up to 553MB.
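
The figure above came from monitoring during the reprocessing run. As a complementary check from inside the JVM, a sketch like the following could log an estimate of peak heap usage at the end of a job. It assumes the standard java.lang.management API; the class name PeakHeap is hypothetical, and this is not how the 553MB value was measured.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not taken from the original project.
class PeakHeap {

    // Sum the peak usage of every heap memory pool.
    // Pools can peak at different moments, so this is an upper bound
    // on the true peak of total heap usage, not an exact figure.
    static long peakHeapBytes() {
        long total = 0L;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, and other non-heap pools
            }
            MemoryUsage peak = pool.getPeakUsage();
            if (peak != null) { // null if the pool is no longer valid
                total += peak.getUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long mib = peakHeapBytes() / (1024L * 1024L);
        System.out.println("peak heap (upper bound) = " + mib + " MiB");
    }
}
```

Calling peakHeapBytes() just before the batch job exits gives a number that can be compared with the configured max heap to judge how much headroom remains.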