Internal Developer Platform Design Patterns
An Internal Developer Platform (IDP) provides self-service infrastructure and standardized workflows that let developers ship software without waiting for operations teams. This guide covers the key design patterns for building an IDP, including golden paths, service templates, platform APIs, and toolchain integration for a DevOps-mature organization.
Prerequisites
- Kubernetes cluster with cluster-admin access
- Git-based source control (GitHub, GitLab, or Bitbucket)
- A CI/CD system (GitHub Actions, GitLab CI, Jenkins, or similar)
- Backstage, Port, or similar portal software (optional but recommended)
- Familiarity with GitOps principles
Core IDP Concepts
An IDP consists of several interconnected layers:
| Layer | Purpose | Example Tools |
|---|---|---|
| Portal | Developer-facing UI | Backstage, Port |
| Orchestration | Workflow automation | Argo Workflows, Temporal |
| Infrastructure | Resource provisioning | Crossplane, Terraform |
| Deployment | App delivery | Argo CD, Flux |
| Observability | Monitoring & alerting | Prometheus, Grafana |
| Security | Secrets, policies | Vault, OPA |
Platform team responsibilities:
- Maintain the golden paths (opinionated, working defaults)
- Abstract infrastructure complexity from developers
- Provide self-service primitives via APIs, not tickets
- Own the toolchain integration, not individual applications
Golden Path Templates
Golden paths are pre-configured, opinionated templates that encode your organization's best practices. They reduce decision fatigue and make the compliant choice the default:
# Directory structure for golden path templates
platform/
├── templates/
│   ├── microservice-java/
│   │   ├── skeleton/            # Project scaffold
│   │   │   ├── src/
│   │   │   ├── Dockerfile
│   │   │   ├── helm/
│   │   │   └── .github/
│   │   │       └── workflows/
│   │   │           └── ci.yml
│   │   └── template.yaml        # Backstage scaffolder template
│   ├── microservice-go/
│   ├── static-frontend/
│   └── data-pipeline/
└── compositions/                # Crossplane compositions
    ├── postgres-db.yaml
    ├── redis-cache.yaml
    └── message-queue.yaml
Backstage scaffolder template:
# platform/templates/microservice-go/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: go-microservice
  title: Go Microservice
  description: Create a production-ready Go microservice with CI/CD, observability, and Kubernetes deployment
  tags:
    - go
    - microservice
    - recommended
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service Information
      required:
        - name
        - owner
        - description
      properties:
        name:
          title: Service Name
          type: string
          pattern: '^[a-z][a-z0-9-]*$'
          description: Lowercase, alphanumeric, hyphens allowed
        owner:
          title: Owner Team
          type: string
          ui:field: OwnerPicker
        description:
          title: Description
          type: string
        enableDatabase:
          title: Include PostgreSQL database?
          type: boolean
          default: false
    - title: Infrastructure
      properties:
        environment:
          title: Initial Environment
          type: string
          enum: [development, staging]
          default: development
  steps:
    - id: fetch-template
      name: Fetch Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          description: ${{ parameters.description }}
    # publish:github creates the repository and pushes the scaffold,
    # so a separate github:repo:create step would fail on the existing repo
    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        repoUrl: github.com?owner=my-org&repo=${{ parameters.name }}
        description: ${{ parameters.description }}
        defaultBranch: main
    - id: register-catalog
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['publish'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
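To see which service names the template's `pattern` accepts before a developer hits the form, the same regular expression can be checked in isolation. This is a minimal sketch; the function name `is_valid_service_name` is an illustrative assumption, and only the pattern itself comes from the template.

```python
import re

# Same pattern as the `name` parameter in the scaffolder template.
SERVICE_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]*$")


def is_valid_service_name(name: str) -> bool:
    """Return True if the name would pass the template's pattern check."""
    return SERVICE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("order-service", "Order-Service", "1service", "svc-"):
        print(candidate, is_valid_service_name(candidate))
```

Note that the pattern allows a trailing hyphen (`svc-`); if that is undesirable, the pattern in the template would need tightening.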
Self-Service Service Catalog
The service catalog is the inventory of all services, their owners, dependencies, and operational status:
# catalog-info.yaml (in every service repo)
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: order-service
  title: Order Service
  description: Manages customer orders and fulfillment
  annotations:
    github.com/project-slug: my-org/order-service
    backstage.io/techdocs-ref: dir:.
    prometheus.io/alerts: "order-service"
    argocd/app-name: order-service-production
  tags:
    - java
    - orders
    - critical
  links:
    - url: https://grafana.example.com/d/order-service
      title: Grafana Dashboard
    - url: https://runbook.example.com/order-service
      title: Runbook
spec:
  type: service
  lifecycle: production
  owner: group:order-team
  system: ecommerce
  dependsOn:
    - component:default/inventory-service
    - component:default/payment-service
    - resource:default/orders-database
  providesApis:
    - order-api
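The `owner` and `dependsOn` values use Backstage's entity reference form, `[kind:][namespace/]name`. A short sketch of how tooling around the catalog might split those strings is shown here; `EntityRef` and `parse_entity_ref` are hypothetical names, and the defaults (`component`, `default`) are assumptions that a caller would override per field.

```python
from typing import NamedTuple


class EntityRef(NamedTuple):
    kind: str
    namespace: str
    name: str


def parse_entity_ref(
    ref: str,
    default_kind: str = "component",
    default_namespace: str = "default",
) -> EntityRef:
    """Split '[kind:][namespace/]name' into its three parts."""
    kind, sep, rest = ref.partition(":")
    if not sep:  # no explicit kind
        kind, rest = default_kind, ref
    namespace, sep, name = rest.partition("/")
    if not sep:  # no explicit namespace
        namespace, name = default_namespace, rest
    if not (kind and namespace and name):
        raise ValueError(f"malformed entity reference: {ref!r}")
    return EntityRef(kind, namespace, name)


# Group the dependencies from the catalog-info.yaml example by kind.
depends_on = [
    "component:default/inventory-service",
    "component:default/payment-service",
    "resource:default/orders-database",
]
by_kind: dict[str, list[str]] = {}
for ref in depends_on:
    parsed = parse_entity_ref(ref)
    by_kind.setdefault(parsed.kind, []).append(parsed.name)
```

Grouping by kind like this is one way to separate service-to-service dependencies from infrastructure resources when drawing a dependency map.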
Platform API Design
Expose infrastructure as a simple API that developers interact with via YAML or a portal:
# Developer requests a database via a simple claim
# No knowledge of RDS, subnets, or security groups required
apiVersion: platform.example.com/v1alpha1
kind: Database
metadata:
  name: order-service-db
  namespace: order-team
spec:
  engine: postgres
  version: "15"
  size: small   # Platform team defines: small=db.t3.micro, medium=db.t3.large
  backup: true
  highAvailability: false
  writeConnectionSecretToRef:
    name: order-service-db-credentials
# Platform team's composition maps "small" to actual cloud resources
# Excerpt: a complete base also sets required fields such as region and engine
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: database-aws-small
spec:
  compositeTypeRef:
    apiVersion: platform.example.com/v1alpha1
    kind: XDatabase
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            instanceClass: db.t3.micro
            allocatedStorage: 20
            multiAz: false   # the Upbound provider spells this field multiAz
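The core of this pattern is a lookup from a developer-facing size to concrete cloud settings. The sketch below models that lookup in plain Python to make the abstraction explicit. The instance classes and the 20 GB figure for `small` come from the examples above; the `medium` storage value and all names (`InstanceProfile`, `SIZE_PROFILES`, `resolve_size`) are assumptions for illustration, not part of Crossplane.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceProfile:
    instance_class: str
    allocated_storage_gb: int


# small/medium instance classes follow the claim's comment;
# the medium storage size is an assumed value.
SIZE_PROFILES = {
    "small": InstanceProfile("db.t3.micro", 20),
    "medium": InstanceProfile("db.t3.large", 100),
}


def resolve_size(size: str, high_availability: bool = False) -> dict:
    """Translate the developer-facing fields into provider-level settings."""
    try:
        profile = SIZE_PROFILES[size]
    except KeyError:
        allowed = ", ".join(sorted(SIZE_PROFILES))
        raise ValueError(f"unknown size {size!r}; expected one of: {allowed}") from None
    return {
        "instance_class": profile.instance_class,
        "allocated_storage_gb": profile.allocated_storage_gb,
        "multi_az": high_availability,
    }
```

Keeping the mapping in one place means the platform team can change what `small` means without touching any developer's claim.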
Developer Experience Patterns
Pattern 1: Port forwarding service map
#!/bin/bash
# scripts/dev-connect.sh - run locally to connect to all services
# Auto port-forward all services needed for local development
# Cleanup on Ctrl+C (registered before any port-forward starts)
trap "kill 0" SIGINT
kubectl port-forward svc/redis 6379:6379 -n development &
kubectl port-forward svc/postgres 5432:5432 -n development &
kubectl port-forward svc/kafka 9092:9092 -n development &
echo "Services available:"
echo "  Redis:    localhost:6379"
echo "  Postgres: localhost:5432"
echo "  Kafka:    localhost:9092"
# Block until the port-forwards exit
wait
Pattern 2: Standardized environment variables
# All services use the same env var naming convention
# Enforced via admission webhook or OPA policy
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: service-db-credentials
        key: connectionString
  - name: REDIS_URL
    valueFrom:
      secretKeyRef:
        name: service-cache-credentials
        key: url
  - name: SERVICE_NAME
    value: order-service
  - name: LOG_LEVEL
    valueFrom:
      configMapKeyRef:
        name: platform-config
        key: defaultLogLevel
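On the application side, a shared convention pays off when every service reads its configuration the same way. The following is a minimal sketch of such a loader, assuming the four variables shown above. `PlatformConfig`, `load_platform_config`, and the `INFO` fallback for `LOG_LEVEL` are illustrative assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PlatformConfig:
    database_url: str
    redis_url: str
    service_name: str
    log_level: str


REQUIRED_VARS = ("DATABASE_URL", "REDIS_URL", "SERVICE_NAME")


def load_platform_config(env: Mapping[str, str] = os.environ) -> PlatformConfig:
    """Read the standardized variables and fail fast if any are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return PlatformConfig(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        service_name=env["SERVICE_NAME"],
        # Assumed fallback if the platform ConfigMap is not mounted.
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing `env` as a parameter keeps the loader easy to unit-test without mutating the real process environment.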
Pattern 3: Pre-commit checks
# .pre-commit-config.yaml in every repo (enforced by golden path)
# pre-commit requires a pinned rev for every non-local repo;
# <release-tag> is a placeholder to replace with a real tag
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: <release-tag>
    hooks:
      - id: terraform_validate
      - id: terraform_fmt
  - repo: https://github.com/hadolint/hadolint
    rev: <release-tag>
    hooks:
      - id: hadolint-docker
  - repo: local
    hooks:
      - id: validate-catalog-info
        name: Validate catalog-info.yaml
        entry: python3 scripts/validate-catalog.py
        language: python
        files: 'catalog-info\.yaml$'   # files is a regex, so escape the dot
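The local hook calls `scripts/validate-catalog.py`, which this guide does not show. The sketch below suggests what such a script might check, assuming the organization requires the same fields that appear in the catalog example earlier. The function names and the rule set are hypothetical, and reading the file relies on PyYAML (`yaml.safe_load_all`), a third-party package the hook environment would need to provide.

```python
REQUIRED_COMPONENT_SPEC_FIELDS = ("type", "lifecycle", "owner")


def validate_entity(entity) -> list[str]:
    """Return a list of problems found in one parsed catalog entity."""
    if not isinstance(entity, dict):
        return ["document is not a YAML mapping"]
    problems = []
    for field in ("apiVersion", "kind"):
        if not entity.get(field):
            problems.append(f"missing top-level field '{field}'")
    metadata = entity.get("metadata")
    if not isinstance(metadata, dict) or not metadata.get("name"):
        problems.append("missing metadata.name")
    if entity.get("kind") == "Component":
        spec = entity.get("spec")
        if not isinstance(spec, dict):
            spec = {}
        for field in REQUIRED_COMPONENT_SPEC_FIELDS:
            if not spec.get(field):
                problems.append(f"missing spec.{field}")
    return problems


def check_file(path: str) -> list[str]:
    """Validate every YAML document in a catalog-info.yaml file."""
    import yaml  # PyYAML; imported lazily so validate_entity stays stdlib-only

    problems = []
    with open(path, encoding="utf-8") as handle:
        for index, entity in enumerate(yaml.safe_load_all(handle)):
            problems += [f"{path}[{index}]: {p}" for p in validate_entity(entity)]
    return problems
```

pre-commit passes the matched file names as arguments, so the script's entry point would call `check_file` for each argument and exit non-zero if any problems are returned.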
Toolchain Integration
Integrate Vault for secrets:
# Platform-managed Vault policy per team (KV v2 mounted at secret/)
vault policy write order-team - <<EOF
path "secret/data/order-team/*" {
  capabilities = ["read"]
}
# KV v2 lists keys through the metadata path, not the data path
path "secret/metadata/order-team/*" {
  capabilities = ["list"]
}
path "database/creds/order-service-role" {
  capabilities = ["read"]
}
EOF
# Kubernetes service account binding
vault write auth/kubernetes/role/order-service \
  bound_service_account_names=order-service \
  bound_service_account_namespaces=order-team \
  policies=order-team \
  ttl=1h
Integrate OPA for policy enforcement:
# opa/policies/require-resource-limits.rego
# Uses Rego v1 syntax (`contains` / `if`); `import rego.v1` keeps it valid on OPA 0.59+ and 1.x
package kubernetes.admission
import rego.v1
deny contains msg if {
    input.request.kind.kind == "Deployment"
    container := input.request.object.spec.template.spec.containers[_]
    not container.resources.limits.cpu
    msg := sprintf("Container '%s' must have CPU limits set", [container.name])
}
deny contains msg if {
    input.request.kind.kind == "Deployment"
    container := input.request.object.spec.template.spec.containers[_]
    not container.resources.limits.memory
    msg := sprintf("Container '%s' must have memory limits set", [container.name])
}
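For readers less familiar with Rego, the same check can be sketched in Python to make the logic explicit and to sanity-check test fixtures. This is an approximation rather than a replacement for the policy: it takes the `request` object of an AdmissionReview as a plain dict, treats a limit as missing only when the key is absent, and returns an ordered list where Rego produces a set. The function name is hypothetical.

```python
def missing_limit_messages(request: dict) -> list[str]:
    """Mirror the two deny rules for one admission request."""
    if request.get("kind", {}).get("kind") != "Deployment":
        return []
    pod_spec = (
        request.get("object", {})
        .get("spec", {})
        .get("template", {})
        .get("spec", {})
    )
    messages = []
    for container in pod_spec.get("containers", []):
        limits = (container.get("resources") or {}).get("limits") or {}
        for key, label in (("cpu", "CPU"), ("memory", "memory")):
            if key not in limits:
                messages.append(
                    f"Container '{container.get('name')}' must have {label} limits set"
                )
    return messages


# Example fixture: one container with a CPU limit but no memory limit.
example_request = {
    "kind": {"kind": "Deployment"},
    "object": {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": "app", "resources": {"limits": {"cpu": "500m"}}}
                    ]
                }
            }
        }
    },
}
```

Calling `missing_limit_messages(example_request)` reports the missing memory limit for the `app` container, which is the message the second deny rule would produce.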
Measuring Platform Success
Track platform health with DORA metrics and developer satisfaction:
# Key metrics to instrument:
# 1. Deployment frequency (per team, per service)
# 2. Lead time for changes (commit to production)
# 3. Mean time to restore (incident to resolution)
# 4. Change failure rate
# PromQL query for deployment frequency (for example, in a Grafana panel)
# Using Argo CD application sync events; Argo CD exposes the application as the `name` label:
sum by (name) (
  increase(argocd_app_sync_total{phase="Succeeded"}[7d])
)
# Track time from PR merge to deployment
# Parse timestamps from GitHub webhook events stored in your metrics system
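Lead time and change failure rate reduce to simple arithmetic once merge and deployment timestamps sit side by side. The sketch below assumes each record carries `merged_at` (as in GitHub pull request payloads), plus `deployed_at` and `failed`, which are hypothetical field names for data joined in from the deployment system.

```python
from datetime import datetime
from statistics import median


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-05-01T10:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def median_lead_time_seconds(events: list[dict]) -> float:
    """Median time from PR merge to deployment, in seconds."""
    if not events:
        raise ValueError("no deployment events to measure")
    return median(
        (parse_ts(e["deployed_at"]) - parse_ts(e["merged_at"])).total_seconds()
        for e in events
    )


def change_failure_rate(events: list[dict]) -> float:
    """Fraction of deployments marked as failed."""
    if not events:
        return 0.0
    return sum(1 for e in events if e["failed"]) / len(events)


# Illustrative records, not real data.
sample_events = [
    {"merged_at": "2024-05-01T10:00:00Z", "deployed_at": "2024-05-01T11:00:00Z", "failed": False},
    {"merged_at": "2024-05-01T12:00:00Z", "deployed_at": "2024-05-01T12:30:00Z", "failed": True},
]
```

The median is used instead of the mean so that a single slow deployment does not dominate the figure; that choice is a suggestion, not something the DORA definitions mandate.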
Common Issues
Platform too complex for developers to adopt:
- Start with one golden path for the most common use case
- Measure adoption; if fewer than 50% of the intended users adopt the golden path, simplify it
- Run "paved road" workshops in which developers help shape the path
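The 50% adoption check is easy to automate once the catalog can report which services were created from a golden path. A small sketch follows, assuming adoption is counted per service. The counting unit, the function names, and the way the counts are obtained are all assumptions.

```python
ADOPTION_THRESHOLD = 0.5  # the 50% guideline from the list above


def adoption_rate(on_golden_path: int, total: int) -> float:
    """Share of services that were created from a golden path template."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= on_golden_path <= total:
        raise ValueError("on_golden_path must be between 0 and total")
    return on_golden_path / total


def needs_simplifying(on_golden_path: int, total: int) -> bool:
    """True when adoption falls below the threshold."""
    return adoption_rate(on_golden_path, total) < ADOPTION_THRESHOLD
```

For example, 12 of 40 services on the golden path gives a rate of 0.3, which falls below the threshold and signals that the path should be simplified.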
Platform team becomes a bottleneck:
- Expose self-service capabilities; avoid requiring tickets
- Use GitOps: developers raise pull requests instead of filing Jira tickets
- Document the platform with a docs-as-code workflow (Backstage TechDocs)
Environment drift:
# Detect drift between Git and the live cluster with Argo CD
argocd app diff order-service-production
# Enable automated sync with self-heal so Argo CD reverts manual changes
argocd app set order-service-production --sync-policy automated --self-heal
Conclusion
A successful Internal Developer Platform reduces cognitive load by providing self-service golden paths, standardized tooling, and automated compliance. The key is to treat the platform itself as a product: keep improving the developer experience based on feedback while maintaining the guardrails that keep infrastructure secure and consistent. Start small, measure adoption, and expand the platform incrementally.


