Kubernetes Job and CronJob Configuration

Jobs and CronJobs enable running batch workloads and scheduled tasks in Kubernetes. This guide covers Job specification with parallelism and completion semantics, CronJob scheduling, history management, suspension, and best practices for batch processing on your VPS and baremetal Kubernetes infrastructure.

Jobs Fundamentals

What is a Job?

A Job creates one or more Pods and ensures they complete successfully. It tracks completion and retries on failure.

Job vs Other Controllers

Feature    | Job            | Deployment        | StatefulSet
-----------|----------------|-------------------|------------------
Purpose    | Batch tasks    | Long-running apps | Stateful apps
Completion | Completes      | Runs indefinitely | Runs indefinitely
Restart    | On failure     | Always            | Always
Storage    | Shared volumes | Persistent        | Persistent

Job Lifecycle

Pod phases: Pending → Running → Succeeded/Failed. The Job itself gains a Complete condition once the required number of pods succeed, or a Failed condition when backoffLimit or activeDeadlineSeconds is exceeded.

Job Configuration

Basic Job

apiVersion: batch/v1
kind: Job
metadata:
  name: simple-job
  namespace: batch
spec:
  template:
    spec:
      containers:
      - name: task
        image: busybox:1.35
        command: ["echo", "Hello from Job"]
      restartPolicy: Never
  backoffLimit: 3
  ttlSecondsAfterFinished: 3600

Create the job:

kubectl apply -f job.yaml
kubectl get jobs -n batch
kubectl describe job simple-job -n batch

Job Parameters

template: Pod template used for the job's pods

backoffLimit: Number of retries before the Job is marked failed (default: 6)

parallelism: Maximum number of pods running concurrently (default: 1)

completions: Number of successful pod completions required (default: 1)

activeDeadlineSeconds: Maximum total runtime; the Job and its pods are terminated once exceeded

ttlSecondsAfterFinished: Automatically delete the Job and its pods N seconds after it finishes
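
Between retries counted against backoffLimit, the controller recreates failed pods with an exponential back-off delay (10s, 20s, 40s, ...), capped at six minutes. A rough sketch of that schedule, illustrative only:

```shell
# Illustrative only: the approximate wait before each pod re-creation.
delay=10
for attempt in 1 2 3 4 5 6; do
  echo "attempt $attempt: wait ${delay}s"
  delay=$((delay * 2))
  [ "$delay" -gt 360 ] && delay=360   # cap at six minutes
done
```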

Simple Completion

Job completes after one successful pod:

apiVersion: batch/v1
kind: Job
metadata:
  name: single-completion-job
spec:
  template:
    spec:
      containers:
      - name: task
        image: python:3.11
        command:
        - python
        - -c
        - |
          import time
          for i in range(10):
              print(f"Progress: {i+1}/10")
              time.sleep(1)
          print("Task complete!")
      restartPolicy: Never
  backoffLimit: 3
  activeDeadlineSeconds: 300

Parallel Jobs

Run up to four interchangeable workers at a time until ten pods succeed. In the default NonIndexed completion mode the pods are identical and carry no per-pod index (the job-completion-index annotation only exists in Indexed mode):

apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-job
spec:
  parallelism: 4
  completions: 10
  template:
    spec:
      containers:
      - name: worker
        image: myworker:1.0
      restartPolicy: Never
  backoffLimit: 2
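
The controller keeps min(parallelism, remaining completions) pods active, so concurrency tapers off as the Job nears completion. A small sketch of the arithmetic (plain shell, illustrative only):

```shell
# Illustrative: with completions=10 and parallelism=4, how many pods the
# controller keeps active as successes accumulate.
completions=10
parallelism=4
for succeeded in 0 4 8 10; do
  remaining=$((completions - succeeded))
  active=$(( remaining < parallelism ? remaining : parallelism ))
  echo "succeeded=$succeeded -> active pods: $active"
done
```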

Work Queue Pattern

Leave completions unset so that each worker drains a shared queue; the Job completes once at least one pod exits successfully and all pods have terminated:

apiVersion: batch/v1
kind: Job
metadata:
  name: work-queue-job
spec:
  parallelism: 3
  template:
    spec:
      containers:
      - name: worker
        image: worker:latest
        command:
        - /bin/sh
        - -c
        - |
          while true; do
            # Pop the next item; give up after 30s with an empty queue.
            # Assumes the queue is reachable via a Service named "redis".
            work=$(redis-cli -h redis BLPOP job_queue 30 | tail -1)
            if [ -n "$work" ]; then
              echo "Processing: $work"
              # Process the work item here
              sleep 5
            else
              echo "Queue drained, exiting"
              exit 0
            fi
          done
      restartPolicy: Never
  activeDeadlineSeconds: 3600

Job Patterns

Index-Based Job

With completionMode: Indexed, each pod receives a unique completion index from 0 to completions-1, exposed both as the batch.kubernetes.io/job-completion-index annotation and as the JOB_COMPLETION_INDEX environment variable:

apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-job
spec:
  parallelism: 4
  completions: 20
  completionMode: Indexed
  template:
    spec:
      containers:
      - name: task
        image: task-processor:1.0
        env:
        - name: TASK_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
      restartPolicy: Never
  backoffLimit: 2
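
A common pattern is for each worker to derive its input shard from the completion index. A hypothetical entrypoint sketch (TASK_ID, the part-NNNNN file layout, and the default value are illustrative assumptions):

```shell
# Hypothetical worker entrypoint: derive this pod's input shard from its
# completion index (here defaulted to 3 for illustration).
TASK_ID=${TASK_ID:-3}   # injected via the downward API in the Job spec
FILE="/data/input/part-$(printf '%05d' "$TASK_ID")"
echo "worker $TASK_ID will process $FILE"
```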

Batch Processing Job

Process data in batches:

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor
spec:
  parallelism: 5
  completions: 100
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      containers:
      - name: processor
        image: batch-processor:1.0
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: batch-data
      restartPolicy: Never
  backoffLimit: 3
  activeDeadlineSeconds: 86400

CronJobs

CronJob Basics

CronJobs schedule Jobs based on cron expressions.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool:1.0
            command:
            - /bin/sh
            - -c
            - /scripts/backup.sh
          restartPolicy: OnFailure
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  suspend: false

Cron Schedule Format

┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
│ │ │ │ │
* * * * *

Common schedules:

"0 0 * * *"      # Daily at midnight
"0 */4 * * *"    # Every 4 hours
"0 9 * * 1-5"    # Weekdays at 9 AM
"*/15 * * * *"   # Every 15 minutes
"0 0 1 * *"      # Monthly on 1st
"0 0 * * 0"      # Weekly on Sunday
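
A step expression like */15 in a field matches every multiple of the step within that field's range. A tiny shell sketch expanding the minute field:

```shell
# Illustrative only: expand the "*/15" minute field into the minutes it matches.
step=15
minutes=""
m=0
while [ "$m" -lt 60 ]; do
  minutes="$minutes $m"
  m=$((m + step))
done
echo "\"*/15 * * * *\" fires at minutes:$minutes"
```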

CronJob with Environment Variables

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-cleanup
spec:
  schedule: "0 3 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cleanup
            image: cleanup-tool:1.0
            env:
            - name: CLEANUP_DAYS
              value: "30"
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          restartPolicy: OnFailure
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3

Timezone Support

By default the schedule is interpreted in the time zone of the kube-controller-manager, which is typically UTC. The timeZone field (stable since Kubernetes 1.27) sets it explicitly:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: timezone-job
spec:
  schedule: "0 9 * * 1-5"
  timeZone: "America/New_York"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: task
            image: task:1.0
          restartPolicy: OnFailure

Advanced Features

Job Suspend

Pause a Job without deleting it (supported since Kubernetes 1.24). Suspending an active Job terminates its running pods; they are recreated when the Job is resumed:

kubectl patch job simple-job -p '{"spec":{"suspend":true}}'
kubectl patch job simple-job -p '{"spec":{"suspend":false}}'

Or in YAML:

spec:
  suspend: true

CronJob Suspension

Pause scheduled execution:

kubectl patch cronjob daily-backup -p '{"spec":{"suspend":true}}'

Concurrency Policy

Control simultaneous job execution:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: concurrent-job
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid  # Allow, Forbid, Replace
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: task
            image: task:1.0
          restartPolicy: OnFailure

Allow: Run jobs concurrently even if the previous run is still active (the default)

Forbid: Skip the new run if the previous job hasn't finished

Replace: Cancel the currently running job and start a new one in its place
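
A related CronJob field, startingDeadlineSeconds, bounds how late a missed run may still be started (for example after controller downtime); runs missed by more than the deadline are skipped entirely. A minimal fragment:

```yaml
# Still start a missed run if we are within 10 minutes of its scheduled time;
# otherwise count it as missed and skip it.
spec:
  schedule: "*/5 * * * *"
  startingDeadlineSeconds: 600
```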

History Management

Control job history retention:

spec:
  successfulJobsHistoryLimit: 5    # Keep 5 successful jobs
  failedJobsHistoryLimit: 3        # Keep 3 failed jobs

Clean old jobs manually:

# Delete all completed jobs
kubectl delete job -n batch --field-selector status.successful=1

# Delete failed jobs
kubectl delete job -n batch --field-selector status.failed=1

Monitoring and Troubleshooting

Viewing Job Status

# List jobs
kubectl get jobs -n batch
kubectl get jobs -n batch -o wide

# View job details
kubectl describe job simple-job -n batch

# Check pod status
kubectl get pods -n batch -l job-name=simple-job
kubectl logs -n batch -l job-name=simple-job

Common Issues

Pods stuck in Pending:

kubectl describe job stuck-job -n batch
kubectl describe pods -n batch -l job-name=stuck-job | grep -A 5 "Events:"

Pods failing:

kubectl logs -n batch <pod-name>
kubectl describe pod -n batch <pod-name>

Jobs not completing:

# Check job status
kubectl get job simple-job -n batch -o yaml | grep -A 10 "status:"

# View events
kubectl get events -n batch --sort-by='.lastTimestamp'

Practical Examples

Example: Database Backup Job

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-script
  namespace: batch
data:
  backup.sh: |
    #!/bin/bash
    set -e
    
    TIMESTAMP=$(date +%Y%m%d_%H%M%S)
    BACKUP_FILE="/backups/db_backup_${TIMESTAMP}.sql"
    
    echo "Starting database backup..."
    mysqldump -h ${DB_HOST} -u ${DB_USER} -p${DB_PASSWORD} ${DB_NAME} > ${BACKUP_FILE}
    
    echo "Compressing backup..."
    gzip ${BACKUP_FILE}
    
    echo "Backup complete: ${BACKUP_FILE}.gz"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
  namespace: batch
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: backup-operator
          containers:
          - name: backup
            image: mysql:8.0
            command: ["/scripts/backup.sh"]
            env:
            - name: DB_HOST
              value: mysql.databases.svc
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: username
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password
            - name: DB_NAME
              value: production
            volumeMounts:
            - name: backup-script
              mountPath: /scripts
            - name: backups
              mountPath: /backups
          volumes:
          - name: backup-script
            configMap:
              name: backup-script
              defaultMode: 0755
          - name: backups
            persistentVolumeClaim:
              claimName: backup-storage
          restartPolicy: OnFailure
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 3

Example: Parallel Data Processing

---
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processor
  namespace: batch
spec:
  parallelism: 8
  completions: 32
  completionMode: Indexed
  template:
    metadata:
      labels:
        app: data-processor
    spec:
      containers:
      - name: processor
        image: data-processor:1.0
        env:
        - name: TASK_INDEX
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
        - name: TOTAL_TASKS
          value: "32"
        volumeMounts:
        - name: input-data
          mountPath: /data/input
        - name: output-data
          mountPath: /data/output
      volumes:
      - name: input-data
        persistentVolumeClaim:
          claimName: input-data
      - name: output-data
        persistentVolumeClaim:
          claimName: output-data
      restartPolicy: Never
  backoffLimit: 3
  activeDeadlineSeconds: 86400

Example: Scheduled Report Generation

apiVersion: batch/v1
kind: CronJob
metadata:
  name: weekly-report
  namespace: batch
spec:
  schedule: "0 6 * * 1"
  timeZone: "America/New_York"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 604800
      template:
        spec:
          serviceAccountName: report-generator
          containers:
          - name: report
            image: report-generator:1.0
            command:
            - /bin/sh
            - -c
            - |
              # GNU date: on this Monday-morning run, 'last monday' is 7 days ago
              python /app/generate_report.py \
                --start-date $(date -d 'last monday' +%Y-%m-%d) \
                --end-date $(date -d 'yesterday' +%Y-%m-%d) \
                --output /reports/report_$(date +%Y%m%d).pdf
            volumeMounts:
            - name: reports
              mountPath: /reports
          volumes:
          - name: reports
            persistentVolumeClaim:
              claimName: reports-storage
          restartPolicy: OnFailure
  successfulJobsHistoryLimit: 12
  failedJobsHistoryLimit: 2

Conclusion

Jobs and CronJobs are essential for batch processing and scheduled tasks in Kubernetes. By properly configuring parallelism, completions, and backoff policies, you create efficient batch workflows. CronJobs provide reliable scheduled execution with history tracking and suspension capabilities. Start with simple single-completion jobs, advance to parallel jobs for performance, and implement CronJobs for production automation. Regular monitoring of job history and logs ensures reliable batch processing on your VPS and baremetal Kubernetes infrastructure.