cgroups v2 Resource Management on Linux

cgroups v2 (control groups version 2) is the Linux kernel mechanism for organizing processes into hierarchical groups and applying resource limits for CPU, memory, and I/O. This guide covers cgroups v2 configuration, CPU and memory limits, I/O throttling, delegation, systemd integration, and monitoring resource usage on Linux servers.

Prerequisites

  • Linux kernel 5.2+ (Ubuntu 20.04+, CentOS/Rocky 8+, Debian 11+)
  • Root access
  • systemd 244+ (check: systemctl --version)
  • Verify cgroups v2 is active: cat /sys/fs/cgroup/cgroup.controllers

Understanding cgroups v2 Architecture

cgroups v2 uses a unified hierarchy mounted at /sys/fs/cgroup/:

# Verify cgroups v2 is mounted
mount | grep cgroup2
# Should show: cgroup2 on /sys/fs/cgroup type cgroup2

# View the cgroup hierarchy
ls /sys/fs/cgroup/

# Check available controllers
cat /sys/fs/cgroup/cgroup.controllers
# Typical output (varies by kernel config): cpuset cpu io memory hugetlb pids rdma

Key differences from cgroups v1:

  • Single unified hierarchy (not per-subsystem mounts)
  • Controller delegation is safer
  • Pressure stall information (PSI) for resource pressure
  • Better container isolation

# View your system's cgroup tree
systemd-cgls

# View process cgroup membership
cat /proc/$$/cgroup
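
On cgroups v2, /proc/&lt;pid&gt;/cgroup contains a single `0::/path` line giving the cgroup path relative to the unified mount. A small sketch of resolving that to a filesystem directory (the helper name `cgroup_fs_path` is my own, not a standard tool):

```shell
#!/bin/sh
# Convert a /proc/<pid>/cgroup line ("0::/system.slice/myapp.service")
# into the matching directory under the cgroup2 mount point.
cgroup_fs_path() {
    rel="${1#0::}"             # strip the "0::" v2 prefix
    echo "/sys/fs/cgroup${rel}"
}

# Resolve the current shell's cgroup directory
cgroup_fs_path "$(head -n 1 /proc/$$/cgroup)"
```

From there you can cat the cgroup's stat and pressure files directly.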

CPU Resource Control

Two mechanisms control CPU in cgroups v2: weight (relative share) and quota (absolute limit).

# View CPU controller files for a cgroup
ls /sys/fs/cgroup/system.slice/myapp.service/
# Relevant: cpu.weight, cpu.max, cpu.stat, cpu.pressure

CPU Weight (relative scheduling):

# Default weight is 100. Range: 1-10000
# Higher = more CPU time relative to siblings

# Set weight for a service via systemd
sudo systemctl set-property myapp.service CPUWeight=200   # 2x default priority
sudo systemctl set-property low-prio.service CPUWeight=50  # Half default priority

CPU Quota (absolute limits):

# cpu.max format: "quota period" (microseconds)
# "200000 100000" = 200ms quota per 100ms period = 2 CPUs
# "50000 100000" = 50% of one CPU

# Limit a service to 0.5 CPU
echo "50000 100000" | sudo tee /sys/fs/cgroup/system.slice/myapp.service/cpu.max

# Or via systemd (persistent):
sudo systemctl set-property myapp.service CPUQuota=50%    # 50% of one CPU
sudo systemctl set-property myapp.service CPUQuota=150%   # 1.5 CPUs
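
The cpu.max string follows directly from the desired CPU count: quota = CPUs x period. A sketch of that arithmetic (the function name and the hundredths convention are my own, chosen to avoid shell floating point):

```shell
#!/bin/sh
# Compute a cpu.max "quota period" string from a CPU count given in
# hundredths (150 = 1.5 CPUs), using the default 100ms period.
cpu_max_line() {
    cpus_pct="$1"             # CPU count * 100, e.g. 150 for 1.5 CPUs
    period="${2:-100000}"     # period in microseconds
    quota=$(( cpus_pct * period / 100 ))
    echo "$quota $period"
}

cpu_max_line 150    # 1.5 CPUs -> "150000 100000"
cpu_max_line 50     # 0.5 CPU  -> "50000 100000"
```

The output can be piped into `sudo tee .../cpu.max` exactly as in the echo example above.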

CPUSet — pin to specific CPUs:

# Pin service to CPUs 0 and 1 only
sudo systemctl set-property myapp.service AllowedCPUs=0-1

# Verify
cat /sys/fs/cgroup/system.slice/myapp.service/cpuset.cpus

Check CPU stats:

cat /sys/fs/cgroup/system.slice/myapp.service/cpu.stat
# usage_usec 45678901    <- total CPU time consumed
# user_usec 23456789
# system_usec 22222112
# throttled_usec 1234    <- time spent throttled
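
A useful derived signal is the fraction of consumed CPU time spent throttled. A sketch that parses cpu.stat text (the field names come from the kernel; the helper name is my own):

```shell
#!/bin/sh
# Given cpu.stat contents on stdin, print throttled time as an
# integer percentage of total usage (0 = no throttling observed).
throttle_pct() {
    awk '
        $1 == "usage_usec"     { usage = $2 }
        $1 == "throttled_usec" { throttled = $2 }
        END {
            if (usage > 0) print int(throttled * 100 / usage)
            else print 0
        }
    '
}

# Example against a live cgroup:
# throttle_pct < /sys/fs/cgroup/system.slice/myapp.service/cpu.stat
```

A persistently high percentage suggests the quota is too tight for the workload.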

Memory Limits and Protection

cgroups v2 offers two memory limit levels: a soft limit (memory.high, which triggers reclaim and throttling) and a hard limit (memory.max, which invokes the OOM killer).

# Memory controller files
ls /sys/fs/cgroup/system.slice/myapp.service/ | grep memory
# memory.current, memory.high, memory.max, memory.min, memory.low

# Set memory limits via systemd (persistent)
sudo systemctl set-property myapp.service MemoryHigh=400M   # Soft limit: reclaim/throttle above this
sudo systemctl set-property myapp.service MemoryMax=512M    # Hard limit: OOM kill above this
sudo systemctl set-property myapp.service MemoryMin=64M     # Protected from reclaim

# Or directly (temporary, lost on restart)
echo $((400 * 1024 * 1024)) | sudo tee \
  /sys/fs/cgroup/system.slice/myapp.service/memory.high

echo $((512 * 1024 * 1024)) | sudo tee \
  /sys/fs/cgroup/system.slice/myapp.service/memory.max
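
The memory.* files take plain byte counts (or the literal string "max"), which is why the $((...)) arithmetic above is needed. A sketch of a size converter using binary units to match that arithmetic (the helper name is my own):

```shell
#!/bin/sh
# Convert sizes like "512M" or "1G" to bytes (binary units:
# K = 1024, M = 1024^2, G = 1024^3).
to_bytes() {
    size="$1"
    num="${size%[KMG]}"
    case "$size" in
        *K) echo $(( num * 1024 )) ;;
        *M) echo $(( num * 1024 * 1024 )) ;;
        *G) echo $(( num * 1024 * 1024 * 1024 )) ;;
        *)  echo "$num" ;;           # already bytes
    esac
}

to_bytes 400M    # 419430400, same value as $((400 * 1024 * 1024))
```

The result can be piped into `sudo tee .../memory.high` as in the examples above.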

Monitor memory usage:

# Current memory usage
cat /sys/fs/cgroup/system.slice/myapp.service/memory.current

# Detailed memory stats
cat /sys/fs/cgroup/system.slice/myapp.service/memory.stat

# Memory pressure (PSI)
cat /sys/fs/cgroup/system.slice/myapp.service/memory.pressure
# some avg10=0.00 avg60=0.00 avg300=0.00 total=0
# full avg10=0.00 avg60=0.00 avg300=0.00 total=0
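
The PSI format is easy to parse for alerting: avg10 on the "some" line is the percentage of the last 10 seconds in which at least one task stalled on this resource. A parsing sketch (the helper name is my own):

```shell
#!/bin/sh
# Read PSI output on stdin and print the "some" avg10 value.
psi_some_avg10() {
    awk '$1 == "some" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^avg10=/) { sub("avg10=", "", $i); print $i }
    }'
}

# Example:
# psi_some_avg10 < /sys/fs/cgroup/system.slice/myapp.service/memory.pressure
```

A cron job or monitoring agent can compare the value against a threshold and alert when the cgroup is under sustained pressure.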

OOM killer configuration:

# Prefer to kill this service if memory is low (higher score = more likely to be killed)
# Note: pidof can print multiple PIDs; this assumes a single-process service
echo 500 | sudo tee /proc/$(pidof myapp)/oom_score_adj

# Via systemd unit file:
# OOMScoreAdjust=500   (in [Service] section)

I/O Throttling

The io controller limits read/write bandwidth and IOPS per block device.

# Find your block device major:minor numbers
grep -E "sda|nvme0" /proc/diskstats | awk '{print $1, $2, $3}'
# Example: 8 0 sda -> major=8, minor=0

# I/O controller files
ls /sys/fs/cgroup/system.slice/myapp.service/ | grep "^io"
# io.max, io.weight, io.stat, io.pressure

# Set I/O weight (relative, 1-10000, default 100)
sudo systemctl set-property myapp.service IOWeight=50   # Half I/O priority

# Limit bandwidth for specific device (major:minor rbps=bytes wbps=bytes)
# Limit to 50MB/s read, 20MB/s write on sda (8:0)
echo "8:0 rbps=52428800 wbps=20971520" | sudo tee \
  /sys/fs/cgroup/system.slice/myapp.service/io.max

# Limit IOPS
echo "8:0 riops=1000 wiops=500" | sudo tee \
  /sys/fs/cgroup/system.slice/myapp.service/io.max

# Via systemd (persistent):
sudo systemctl set-property myapp.service IOReadBandwidthMax="/dev/sda 50M"
sudo systemctl set-property myapp.service IOWriteBandwidthMax="/dev/sda 20M"
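
The magic numbers in the io.max examples are just MiB/s converted to the bytes-per-second values the file expects. A tiny helper keeps them readable (the function name is my own):

```shell
#!/bin/sh
# Convert MiB/s to the bytes/s values io.max expects.
mib_per_sec() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_per_sec 50    # 52428800, the rbps value used above
mib_per_sec 20    # 20971520, the wbps value used above
```

For example: `echo "8:0 rbps=$(mib_per_sec 50) wbps=$(mib_per_sec 20)" | sudo tee .../io.max`.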

Check I/O stats:

cat /sys/fs/cgroup/system.slice/myapp.service/io.stat
# 8:0 rbytes=123456789 wbytes=987654321 rios=1234 wios=5678 ...

# I/O pressure
cat /sys/fs/cgroup/system.slice/myapp.service/io.pressure

systemd and cgroups v2 Integration

systemd automatically creates cgroups for every service. Use unit file properties to configure limits persistently:

sudo nano /etc/systemd/system/myapp.service

[Unit]
Description=My Application

[Service]
Type=simple
User=myapp
ExecStart=/opt/myapp/bin/app

# CPU limits (note: systemd only allows comments on their own lines,
# never after a value on the same line)
CPUWeight=100
# Up to 2 CPUs
CPUQuota=200%
# Only run on CPUs 0-3
AllowedCPUs=0-3

# Memory limits
MemoryHigh=1G
MemoryMax=1.5G
# Disable swap for this service
MemorySwapMax=0
MemoryMin=256M

# I/O limits
IOWeight=100
IOReadBandwidthMax=/dev/sda 100M
IOWriteBandwidthMax=/dev/sda 50M

# Task limits
TasksMax=256

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl restart myapp.service

# Verify cgroup properties are applied
systemctl show myapp.service | grep -E "^CPU|^Memory|^IO|^Tasks"

Delegation and User Namespaces

Delegation lets unprivileged users manage their own sub-cgroups (used by rootless containers).

# Enable cgroup delegation for a service
sudo nano /etc/systemd/system/myapp.service

[Service]
# Enable delegation so the service can create sub-cgroups
Delegate=yes
# DelegateSubgroup= requires systemd 254+; omit it on older versions
DelegateSubgroup=app

# The service user can then write to its cgroup
User=myapp

# Verify delegation
cat /sys/fs/cgroup/system.slice/myapp.service/cgroup.subtree_control

# Rootless Podman/Docker use delegation automatically
# Check with:
podman info | grep cgroupVersion

Monitoring cgroups v2

# Real-time cgroup resource monitor
sudo systemd-cgtop

# One-shot snapshot (batch mode, single iteration)
sudo systemd-cgtop -b -n 1

# Monitor with cgroup-tools
sudo apt install cgroup-tools   # Ubuntu
cgget -r memory.current system.slice/myapp.service

# Watch memory pressure
watch -n 1 cat /sys/fs/cgroup/system.slice/myapp.service/memory.pressure

# Custom monitoring script
sudo tee /usr/local/bin/cgroup-monitor.sh << 'EOF' > /dev/null
#!/bin/bash
SERVICE="${1:-myapp.service}"
CGROUP_PATH="/sys/fs/cgroup/system.slice/$SERVICE"

while true; do
    MEM=$(cat "$CGROUP_PATH/memory.current" 2>/dev/null)
    CPU=$(cat "$CGROUP_PATH/cpu.stat" 2>/dev/null | grep usage_usec | awk '{print $2}')
    echo "$(date): MEM=${MEM}B CPU=${CPU}us"
    sleep 5
done
EOF
sudo chmod +x /usr/local/bin/cgroup-monitor.sh

Container Resource Limits

Docker and Podman use cgroups v2 for container limits:

# Verify Docker uses cgroups v2
docker info | grep "Cgroup Version"

# Run container with resource limits
# (comments cannot follow a backslash continuation, so they go here:
#  --cpus=1.5 -> 1.5 CPU cores; equal --memory/--memory-swap -> no swap;
#  --blkio-weight -> relative I/O priority)
docker run -d \
  --name mycontainer \
  --cpus="1.5" \
  --memory="512m" \
  --memory-swap="512m" \
  --blkio-weight=50 \
  nginx

# Check container cgroup
docker inspect mycontainer | grep -i cgroup

# View container cgroup path
cat /proc/$(docker inspect --format '{{.State.Pid}}' mycontainer)/cgroup

# Monitor container resources
docker stats mycontainer

Podman rootless containers:

# Run rootless container with limits
podman run -d \
  --name mypod \
  --cpus="0.5" \
  --memory="256m" \
  nginx

# Check cgroup path for rootless container
podman inspect mypod | grep CgroupPath

Troubleshooting

cgroups v2 not active (system uses v1):

# Check
stat -fc %T /sys/fs/cgroup/
# "cgroup2fs" = v2, "tmpfs" = v1
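
That check is handy to wrap for scripts that need to bail out on v1 hosts. A sketch (the helper name is my own):

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup version
# verdict: cgroup2fs = v2, tmpfs = v1 (or hybrid).
cgroup_version() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

cgroup_version "$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null)"
```

A deployment script can refuse to apply v2-only settings when this prints anything other than "v2".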

# Enable v2 on Ubuntu (add to GRUB)
sudo nano /etc/default/grub
# Add to GRUB_CMDLINE_LINUX: systemd.unified_cgroup_hierarchy=1
sudo update-grub && sudo reboot

Memory limit not enforced:

# Verify the memory controller is enabled for the service's parent
grep memory /sys/fs/cgroup/system.slice/cgroup.subtree_control
# If memory is missing from /sys/fs/cgroup/cgroup.controllers entirely,
# check the kernel command line for cgroup_disable=memory.
# Otherwise, enable it for children of the root cgroup:
echo "+memory" | sudo tee /sys/fs/cgroup/cgroup.subtree_control

CPUQuota has no effect:

# Check if cpu controller is available
cat /sys/fs/cgroup/system.slice/cgroup.controllers | grep cpu
# Check throttled_usec increasing
watch cat /sys/fs/cgroup/system.slice/myapp.service/cpu.stat

Property changes not persisting:

# systemctl set-property creates override files
ls /etc/systemd/system/myapp.service.d/
# If you changed cpu.max directly, it resets on service restart
# Always use systemctl set-property or unit file [Service] directives

Conclusion

cgroups v2 provides a clean, unified interface for resource management on Linux, with tight integration into systemd for per-service limits. By configuring CPU weight and quota, memory high/max boundaries, and I/O bandwidth limits in your service unit files, you can prevent any single service from monopolizing system resources. Use systemd-cgtop and pressure stall information (PSI) files to monitor resource contention and tune limits based on actual workload behavior.