Creating Dockerfiles: Complete Guide with Best Practices

Dockerfiles are the blueprint for creating Docker images, defining everything from the base operating system to application dependencies and runtime configurations. This comprehensive guide teaches you how to write efficient, secure, and production-ready Dockerfiles for any application.

Table of Contents

  • Introduction to Dockerfiles
  • Prerequisites
  • Dockerfile Basics
  • Essential Dockerfile Instructions
  • Building Docker Images
  • Multi-Stage Builds
  • Real-World Examples
  • Optimization Techniques
  • Security Best Practices
  • Troubleshooting
  • Conclusion

Introduction to Dockerfiles

A Dockerfile is a text document containing instructions for building a Docker image. Each instruction creates a layer in the final image, and Docker caches these layers to speed up subsequent builds. Understanding how to write efficient Dockerfiles is crucial for creating optimized, secure, and maintainable container images.

Why Dockerfiles Matter

  • Reproducibility: The same Dockerfile and build context produce consistent, repeatable images
  • Version Control: Track image changes like source code
  • Automation: Integrate with CI/CD pipelines
  • Documentation: Serves as infrastructure documentation
  • Portability: Build once, run anywhere

Prerequisites

Before creating Dockerfiles, ensure you have:

  • Docker Engine installed and running
  • Basic understanding of Docker concepts
  • Knowledge of your application's dependencies
  • Text editor for writing Dockerfiles
  • Terminal access for building images

Verify Docker installation:

docker --version
docker info

Dockerfile Basics

Basic Structure

A Dockerfile consists of instructions (uppercase by convention) followed by arguments:

# Comment
INSTRUCTION arguments

Creating Your First Dockerfile

Create a file named Dockerfile (no extension):

mkdir my-docker-app
cd my-docker-app
nano Dockerfile

Simple example:

# Use official base image
FROM ubuntu:22.04

# Set working directory
WORKDIR /app

# Copy application files
COPY app.py .

# Install dependencies
RUN apt-get update && apt-get install -y python3

# Define command to run
CMD ["python3", "app.py"]

File Naming

  • Standard name: Dockerfile (recommended)
  • Custom names: Dockerfile.dev, Dockerfile.prod
  • Build with custom name: docker build -f Dockerfile.dev .

Essential Dockerfile Instructions

FROM - Base Image

Specifies the parent image for your build:

# Official image
FROM ubuntu:22.04

# Specific version (recommended)
FROM node:18.17.0-alpine

# Multiple stages
FROM node:18 AS builder
FROM nginx:alpine AS production

Best Practices:

  • Always specify version tags (avoid latest); for stricter pins, see the digest example below
  • Use official images when possible
  • Prefer Alpine-based images for smaller size
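
For even stricter reproducibility than a version tag, a base image can be pinned by its content digest. The digest below is a placeholder you would replace with the real value, which you can look up with docker buildx imagetools inspect:

# Resolve the current digest for a tag
docker buildx imagetools inspect node:18.17.0-alpine

# Pin the image by digest in the Dockerfile (placeholder shown)
FROM node:18.17.0-alpine@sha256:<digest>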

LABEL - Metadata

Add metadata to images:

LABEL maintainer="[email protected]"
LABEL version="1.0"
LABEL description="Production web application"
LABEL org.opencontainers.image.source="https://github.com/user/repo"

WORKDIR - Working Directory

Sets the working directory for subsequent instructions:

# Set working directory
WORKDIR /app

# Creates directory if it doesn't exist
WORKDIR /var/www/html

# Relative paths work too
WORKDIR /app
WORKDIR src  # Now in /app/src

Best Practice: Use absolute paths and set WORKDIR before COPY/ADD.

COPY vs ADD

Copy files from build context to image:

# COPY - Simple file copying (preferred)
COPY package.json .
COPY src/ /app/src/
COPY --chown=appuser:appuser app.py /app/

# ADD - Advanced features (use sparingly)
ADD https://example.com/file.tar.gz /tmp/  # Downloads URL
ADD archive.tar.gz /app/  # Auto-extracts archives

Best Practice: Use COPY unless you need ADD's special features.

RUN - Execute Commands

Executes commands during image build:

# Shell form (runs in /bin/sh -c)
RUN apt-get update && apt-get install -y curl

# Exec form (no shell processing unless you invoke one explicitly)
RUN ["/bin/bash", "-c", "echo hello"]

# Multiple commands (chain with &&)
RUN apt-get update && \
    apt-get install -y \
        python3 \
        python3-pip \
        curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Install Python packages
RUN pip install --no-cache-dir -r requirements.txt

Best Practices:

  • Chain commands with && to reduce layers
  • Clean up in the same RUN command
  • Use --no-cache-dir for pip installations
  • Remove package manager caches

ENV - Environment Variables

Set environment variables:

# Set single variable
ENV NODE_ENV=production

# Set multiple variables
ENV APP_HOME=/app \
    APP_USER=appuser \
    APP_PORT=3000

# Use in subsequent commands
ENV PATH="/app/bin:${PATH}"

ARG - Build Arguments

Define build-time variables:

# Define argument with default
ARG NODE_VERSION=18
ARG BUILD_DATE

# Use in FROM
FROM node:${NODE_VERSION}-alpine

# Use in RUN
RUN echo "Built on ${BUILD_DATE}"

# ARG vs ENV
ARG BUILD_ENV=dev
ENV RUNTIME_ENV=${BUILD_ENV}  # Convert ARG to ENV

Build with arguments:

docker build --build-arg NODE_VERSION=20 --build-arg BUILD_DATE=$(date -u +"%Y-%m-%d") .

EXPOSE - Document Ports

Documents which ports the container listens on:

# Single port
EXPOSE 8080

# Multiple ports
EXPOSE 80 443

# With protocol
EXPOSE 8080/tcp
EXPOSE 53/udp

Note: EXPOSE is documentation only; it does not publish the port. Publish ports with the -p or -P flag when running the container.
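
For example, to actually reach a service that EXPOSEs 8080 (the image name is illustrative):

# Map host port 8080 to container port 8080
docker run -p 8080:8080 my-app

# Or publish all EXPOSEd ports to random host ports
docker run -P my-app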

USER - Set User

Specify which user runs the container:

# Create user and switch
RUN groupadd -r appuser && useradd -r -g appuser appuser
USER appuser

# Switch to user by UID
USER 1000

# Switch back to root if needed
USER root
RUN apt-get install -y something
USER appuser

Best Practice: Always run containers as a non-root user.
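
To verify which user an image will run as (my-app is a placeholder name):

# Inspect the user configured in the image metadata
docker inspect --format '{{.Config.User}}' my-app

# Or run id in place of the default command (use --entrypoint id if an ENTRYPOINT is set)
docker run --rm my-app id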

VOLUME - Mount Points

Create mount points:

# Define volumes
VOLUME /data
VOLUME ["/var/log", "/var/db"]

Note: A host path cannot be specified in the Dockerfile; supply one with the -v flag when running the container, as shown below.
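
The host path (or a named volume) is supplied when the container starts, for example:

# Bind-mount a host directory onto the declared volume
docker run -v /host/data:/data my-app

# Or use a named volume managed by Docker
docker run -v mydata:/data my-app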

CMD vs ENTRYPOINT

Define the container's default command:

# CMD - Can be overridden
CMD ["nginx", "-g", "daemon off;"]
CMD ["python", "app.py"]
CMD node server.js  # Shell form

# ENTRYPOINT - Main executable
ENTRYPOINT ["python", "app.py"]

# ENTRYPOINT + CMD (arguments)
ENTRYPOINT ["python"]
CMD ["app.py"]
# Override: docker run image script.py

# Use both for flexibility
ENTRYPOINT ["./docker-entrypoint.sh"]
CMD ["start"]

Best Practices:

  • Use ENTRYPOINT for main executable
  • Use CMD for default arguments
  • Prefer exec form over shell form
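
The ENTRYPOINT + CMD pattern above assumes a docker-entrypoint.sh script baked into the image. A minimal sketch of such a wrapper (the start command and server file are illustrative) could look like:

#!/bin/sh
set -e

# One-time setup can go here (render config, run migrations, etc.)

if [ "$1" = "start" ]; then
    # exec replaces the shell so the server receives signals directly
    exec node server.js
fi

# Otherwise run whatever command was passed (e.g. a shell for debugging)
exec "$@"

Remember to COPY the script into the image and make it executable (chmod +x) before declaring it as the ENTRYPOINT.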

HEALTHCHECK

Define a health check for the container:

# HTTP health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

# Simple health check
HEALTHCHECK CMD pg_isready -U postgres || exit 1

# Disable inherited health check
HEALTHCHECK NONE
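
After the container starts, Docker records the probe results, which you can query (the container name is illustrative):

# Current health state: starting, healthy, or unhealthy
docker inspect --format '{{.State.Health.Status}}' my-container

# Full health log, including recent probe output
docker inspect --format '{{json .State.Health}}' my-container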

ONBUILD

Add trigger instructions:

# In base image
ONBUILD COPY package.json /app/
ONBUILD RUN npm install

# Triggers when image is used as base
FROM my-base-image  # Executes ONBUILD instructions

Building Docker Images

Basic Build Command

# Build from current directory
docker build -t my-app:latest .

# Build from different directory
docker build -t my-app:latest /path/to/context

# Build with custom Dockerfile
docker build -t my-app:latest -f Dockerfile.prod .

Build Context

The build context is the set of files at the specified PATH or URL:

# Current directory
docker build .

# Specific directory
docker build /path/to/context

# Git repository
docker build https://github.com/user/repo.git#branch

Tagging Images

# Single tag
docker build -t my-app:1.0 .

# Multiple tags
docker build -t my-app:1.0 -t my-app:latest .

# With registry
docker build -t registry.example.com/my-app:1.0 .

Build Arguments

# Pass build arguments
docker build --build-arg ENV=production --build-arg VERSION=1.0 .

# Take the value from an environment variable with the same name (no = sign)
export VERSION=1.0
docker build --build-arg VERSION .

.dockerignore File

Exclude files from build context:

# .dockerignore
.git
.gitignore
.env
node_modules
npm-debug.log
Dockerfile
.dockerignore
README.md
*.md
.vscode
.idea
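
Patterns use .gitignore-style matching, and a leading ! re-includes files excluded by an earlier rule; for example, to ignore all Markdown files except the README:

*.md
!README.md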

Build Options

# No cache
docker build --no-cache -t my-app:latest .

# Pull latest base image
docker build --pull -t my-app:latest .

# Specify target stage
docker build --target production -t my-app:latest .

# Set memory limit
docker build --memory 2g -t my-app:latest .

# Squash layers (experimental)
docker build --squash -t my-app:latest .

Multi-Stage Builds

Multi-stage builds create optimized production images by separating build and runtime environments.

Basic Multi-Stage Build

# Build stage
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package*.json ./
USER node
CMD ["node", "dist/server.js"]

Multiple Stages

# Dependencies stage
FROM node:18-alpine AS dependencies
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Test stage
FROM builder AS tester
RUN npm run test

# Production stage
FROM node:18-alpine AS production
WORKDIR /app
COPY --from=dependencies /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package*.json ./
USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]

Build specific stage:

# Build and test
docker build --target tester -t my-app:test .

# Build production
docker build --target production -t my-app:latest .

Copy From External Images

# Copy from specific image
FROM alpine:latest
COPY --from=nginx:latest /etc/nginx/nginx.conf /nginx.conf

Real-World Examples

Node.js Application

# Multi-stage Node.js app
FROM node:18-alpine AS builder

WORKDIR /app

# Copy dependency files
COPY package*.json ./

# Install all dependencies (dev dependencies are needed for the build)
RUN npm ci

# Copy source code
COPY . .

# Build application
RUN npm run build

# Drop dev dependencies so only production modules reach the final stage
RUN npm prune --omit=dev && \
    npm cache clean --force

# Production stage
FROM node:18-alpine

# Add non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001 -G nodejs

WORKDIR /app

# Copy built files, dependencies, and the health check script
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/package*.json ./
COPY --from=builder --chown=nodejs:nodejs /app/healthcheck.js ./

# Set environment
ENV NODE_ENV=production

# Switch to non-root user
USER nodejs

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
  CMD node healthcheck.js

# Start application
CMD ["node", "dist/server.js"]

Python Flask Application

FROM python:3.11-slim AS builder

WORKDIR /app

# Install system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc && \
    rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .

# Install Python dependencies
RUN pip install --user --no-cache-dir -r requirements.txt

# Production stage
FROM python:3.11-slim

WORKDIR /app

# Create the non-root user first so copied files can be owned by it
RUN useradd -m -u 1000 appuser

# Copy dependencies from the builder into the non-root user's home
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local

# Copy application
COPY --chown=appuser:appuser . .

# Put user-installed scripts on the PATH
ENV PATH=/home/appuser/.local/bin:$PATH

# Switch to non-root user
USER appuser

# Expose port
EXPOSE 5000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD python -c "import requests; requests.get('http://localhost:5000/health', timeout=2)"

# Run application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "app:app"]

Go Application

# Build stage
FROM golang:1.21-alpine AS builder

WORKDIR /app

# Copy go mod files
COPY go.mod go.sum ./

# Download dependencies
RUN go mod download

# Copy source code
COPY . .

# Build binary
RUN CGO_ENABLED=0 GOOS=linux go build -o main .

# Production stage
FROM alpine:latest

# Install ca-certificates for HTTPS
RUN apk --no-cache add ca-certificates

WORKDIR /root/

# Copy binary from builder
COPY --from=builder /app/main .

# Expose port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --quiet --tries=1 --spider http://localhost:8080/health || exit 1

# Run binary
CMD ["./main"]

Java Spring Boot Application

# Build stage
FROM maven:3.9-eclipse-temurin-17 AS builder

WORKDIR /app

# Copy pom.xml
COPY pom.xml .

# Download dependencies
RUN mvn dependency:go-offline

# Copy source code
COPY src ./src

# Build application
RUN mvn clean package -DskipTests

# Production stage
FROM eclipse-temurin:17-jre-alpine

WORKDIR /app

# Create non-root user
RUN addgroup -S spring && adduser -S spring -G spring

# Copy JAR from builder
COPY --from=builder /app/target/*.jar app.jar

# Change ownership
RUN chown spring:spring app.jar

# Switch to non-root user
USER spring

# Expose port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s --retries=3 \
  CMD wget --quiet --tries=1 --spider http://localhost:8080/actuator/health || exit 1

# Run application
ENTRYPOINT ["java", "-jar", "/app/app.jar"]

Nginx Static Site

# Build stage (optional, for building static assets)
FROM node:18-alpine AS builder

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine

# Copy custom nginx config
COPY nginx.conf /etc/nginx/nginx.conf

# Copy static files
COPY --from=builder /app/dist /usr/share/nginx/html

# The nginx user already exists; give it ownership of the paths it needs
RUN chown -R nginx:nginx /usr/share/nginx/html && \
    chown -R nginx:nginx /var/cache/nginx && \
    chown -R nginx:nginx /var/log/nginx && \
    chown -R nginx:nginx /etc/nginx/conf.d

RUN touch /var/run/nginx.pid && \
    chown -R nginx:nginx /var/run/nginx.pid

# Switch to non-root user
USER nginx

# Expose port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --quiet --tries=1 --spider http://localhost:8080 || exit 1

# Start nginx
CMD ["nginx", "-g", "daemon off;"]

Optimization Techniques

Layer Caching

# Bad - Changes to code invalidate all layers
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm install

# Good - Dependencies cached separately
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .

Minimize Layers

# Bad - Multiple layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
RUN apt-get clean

# Good - Single layer
RUN apt-get update && \
    apt-get install -y \
        curl \
        git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Use .dockerignore

node_modules
.git
.env
*.log
.DS_Store
coverage
.vscode

Choose Smaller Base Images

# Bare OS (~80MB) - you still have to install the language runtime
FROM ubuntu:22.04

# Full Debian-based Node.js image (~1GB)
FROM node:18

# Slim variant (~250MB)
FROM node:18-slim

# Alpine variant (~180MB)
FROM node:18-alpine
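
Actual sizes vary by version and platform, so it is worth checking locally before choosing a base:

# Pull the candidates and compare their local sizes
docker pull node:18-slim
docker pull node:18-alpine
docker images --format "{{.Repository}}:{{.Tag}}\t{{.Size}}" node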

Remove Unnecessary Files

RUN apt-get update && \
    apt-get install -y build-essential && \
    # ... compile something ... && \
    apt-get remove -y build-essential && \
    apt-get autoremove -y && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Security Best Practices

Run as Non-Root User

# Create and use non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser
USER appuser

# Or with Alpine
RUN addgroup -S appuser && adduser -S appuser -G appuser
USER appuser

Scan for Vulnerabilities

# Using Docker Scout
docker scout cves my-app:latest

# Using Trivy
trivy image my-app:latest

# Using Snyk
snyk container test my-app:latest

Use Specific Tags

# Bad - unpredictable
FROM node:latest

# Good - predictable and secure
FROM node:18.17.0-alpine3.18

Minimize Attack Surface

# Use distroless images for minimal attack surface
FROM gcr.io/distroless/nodejs18-debian12

# Or minimal Alpine
FROM alpine:3.18

Don't Include Secrets

# Bad - secrets in image
ENV API_KEY=secret123

# Good - pass at runtime
# docker run -e API_KEY=secret123 my-app
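
If a value is only needed while building (for example a private package token), BuildKit secret mounts expose it to a single RUN step without storing it in any layer. A sketch, assuming a local file api_key.txt and BuildKit enabled:

# syntax=docker/dockerfile:1
FROM alpine:3.18
# The secret is mounted at /run/secrets/api_key for this RUN step only
RUN --mount=type=secret,id=api_key \
    API_KEY="$(cat /run/secrets/api_key)" && \
    echo "token available during build only"

Build it with:

docker build --secret id=api_key,src=api_key.txt -t my-app .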

Use COPY Instead of ADD

# Preferred
COPY app.py .

# Avoid unless needed
ADD archive.tar.gz /app/

Verify Downloads

# Verify checksum
RUN curl -fsSL https://example.com/file -o /tmp/file && \
    echo "expected_hash /tmp/file" | sha256sum -c -

Troubleshooting

Build Fails at RUN Command

# Show build output
docker build --progress=plain .

# Debug from an intermediate image ID (shown in the output of the legacy builder)
docker run -it <layer_id> sh

Image Size Too Large

# Check layer sizes
docker history my-app:latest

# Analyze with dive
dive my-app:latest

Cache Not Working

# Force rebuild without cache
docker build --no-cache .

# Check what changed
docker build --progress=plain .

Permission Denied Errors

# Ensure proper ownership
COPY --chown=appuser:appuser app.py /app/

# Or fix after copy
RUN chown -R appuser:appuser /app

Conclusion

Writing effective Dockerfiles is fundamental to containerization success. This guide covered everything from basic syntax to advanced multi-stage builds and security practices.

Key Takeaways

  • Layer Optimization: Order instructions from least to most frequently changing
  • Multi-Stage Builds: Separate build and runtime environments
  • Security First: Always run as non-root, use specific tags, scan for vulnerabilities
  • Size Matters: Use Alpine images, minimize layers, leverage .dockerignore
  • Best Practices: Follow conventions, document with LABEL, implement health checks

Dockerfile Checklist

  • Use specific version tags for base images
  • Implement multi-stage builds for compiled languages
  • Run containers as non-root user
  • Add health check instruction
  • Create .dockerignore file
  • Minimize number of layers
  • Clean up in same RUN command
  • Use COPY instead of ADD
  • Set appropriate WORKDIR
  • Document exposed ports
  • Add metadata labels
  • Implement proper logging

Next Steps

  1. Practice: Build Dockerfiles for your applications
  2. Optimize: Use dive to analyze and reduce image size
  3. Secure: Implement vulnerability scanning in CI/CD
  4. Document: Add comprehensive LABEL metadata
  5. Test: Create test stages in multi-stage builds
  6. Automate: Integrate builds into CI/CD pipelines
  7. Monitor: Track image sizes and build times

With these Dockerfile best practices, you're equipped to create efficient, secure, and maintainable container images for any application stack.