Paperless-ngx Document Management Installation
Paperless-ngx is an open-source document management system that ingests scanned documents, runs OCR on them, and tags and indexes them into a searchable digital archive, reducing paper clutter and manual filing. It runs on Docker, uses Tesseract for OCR, and provides full-text search. It can consume documents automatically from email, scanners, and watched folders, which makes it well suited to digitizing home office or small business paperwork on a self-hosted Linux server.
Prerequisites
- Ubuntu 20.04+, Debian 11+, or CentOS/Rocky 8+
- Docker and Docker Compose installed
- Minimum 2 GB RAM (4+ GB recommended)
- Root or sudo access
- A domain name or static IP for access
Installing Paperless-ngx with Docker
# Create the Paperless-ngx directory and make it writable by your user
sudo mkdir -p /opt/paperless
sudo chown "$USER" /opt/paperless
cd /opt/paperless
# Download the official docker-compose with PostgreSQL and Redis
curl -LO https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/docker-compose.postgres.yml
mv docker-compose.postgres.yml docker-compose.yml
# Download the environment file template
curl -LO https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/.env
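Edit the .env file. The stock compose file loads container settings from the file named in the webserver service's env_file entry (docker-compose.env in the upstream repository), so either point that entry at .env or place the settings below in the referenced file: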
nano /opt/paperless/.env
# Key settings to configure in .env:
# Secret key — generate with: openssl rand -hex 32
PAPERLESS_SECRET_KEY=your-secret-key-here
# Admin user credentials
PAPERLESS_ADMIN_USER=admin
PAPERLESS_ADMIN_PASSWORD=secure-password-here
PAPERLESS_ADMIN_MAIL=admin@example.com
# Timezone
PAPERLESS_TIME_ZONE=America/New_York
# Language for OCR (3-letter ISO code)
PAPERLESS_OCR_LANGUAGE=eng
# URL for reverse proxy access
PAPERLESS_URL=https://paperless.example.com
# Storage paths
PAPERLESS_DATA_DIR=/usr/src/paperless/data
PAPERLESS_MEDIA_ROOT=/usr/src/paperless/media
PAPERLESS_CONSUMPTION_DIR=/usr/src/paperless/consume
PAPERLESS_EXPORT_DIR=/usr/src/paperless/export
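The secret key step above can be scripted. The following is a minimal sketch, not part of Paperless-ngx itself: `generate_secret` and `set_env_var` are hypothetical helper names, and the fallback to /dev/urandom is an assumption for hosts without openssl.

```shell
# Print a 64-character hex secret (32 random bytes).
generate_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback when openssl is not installed
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Set KEY=value in an env file, replacing any existing assignment of KEY.
set_env_var() {
  file="$1"; key="$2"; value="$3"
  tmp="$(mktemp)"
  # Copy every line except existing assignments of this key
  if [ -f "$file" ]; then
    grep -v "^${key}=" "$file" > "$tmp" || true
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Usage:
#   set_env_var /opt/paperless/.env PAPERLESS_SECRET_KEY "$(generate_secret)"
```

Because the file is rebuilt from a `mktemp` file, it ends up readable only by its owner, which is a reasonable default for a file that holds the secret key and admin password.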
# Start Paperless-ngx
sudo docker compose up -d
# Monitor startup (the first start runs database migrations and may take a few minutes)
sudo docker compose logs -f webserver
# Verify it's running
sudo docker compose ps
Access the web interface at http://your-server:8000.
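If the start-up is scripted, it helps to wait until the web interface actually answers before continuing. A small sketch under stated assumptions: `wait_until` is a hypothetical generic retry helper, and the URL assumes the default port 8000 on the local host.

```shell
# Retry a command until it succeeds.
# Arguments: <attempts> <delay-seconds> <command...>
wait_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Succeed as soon as the probe command exits with status 0
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage: poll the web UI for up to five minutes
#   wait_until 60 5 curl -fsS -o /dev/null http://localhost:8000
```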
Initial Configuration
Create the admin user (if not auto-created from .env):
sudo docker compose exec webserver \
python3 manage.py createsuperuser \
--username admin \
--email admin@example.com
Configure document storage volumes:
# Create local directories for document storage
sudo mkdir -p /opt/paperless/{consume,data,media,export}
sudo chown -R 1000:1000 /opt/paperless/{consume,data,media,export}
Update docker-compose.yml to mount local paths:
services:
  webserver:
    volumes:
      - /opt/paperless/data:/usr/src/paperless/data
      - /opt/paperless/media:/usr/src/paperless/media
      - /opt/paperless/consume:/usr/src/paperless/consume
      - /opt/paperless/export:/usr/src/paperless/export
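Before restarting the stack with bind mounts, it can be worth checking that the four host directories exist and have the expected owner. The sketch below is an illustration only: `check_paperless_dirs` is a hypothetical helper, UID 1000 mirrors the chown command above, and `stat -c` assumes GNU coreutils as found on the listed distributions.

```shell
# Report missing or wrongly owned storage directories.
# Arguments: <base-dir> [expected-uid, default 1000]
check_paperless_dirs() {
  base="$1"; want_uid="${2:-1000}"; rc=0
  for name in consume data media export; do
    dir="$base/$name"
    if [ ! -d "$dir" ]; then
      echo "missing: $dir"
      rc=1
    elif [ "$(stat -c %u "$dir")" != "$want_uid" ]; then
      echo "wrong owner: $dir"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   check_paperless_dirs /opt/paperless && sudo docker compose up -d
```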
OCR Configuration
Paperless-ngx uses Tesseract for OCR. Configure language and quality:
# In .env, set OCR options:
# Primary OCR language
PAPERLESS_OCR_LANGUAGE=eng
# Multiple languages (separate with +)
PAPERLESS_OCR_LANGUAGE=eng+deu+fra
# OCR mode:
# skip = Only OCR pages that have no text layer (default)
# redo = Re-OCR all pages, replacing any existing text layer
# force = Rasterize every page and OCR it, even if text is present
PAPERLESS_OCR_MODE=skip
# Image cleanup before OCR (uses unpaper)
PAPERLESS_OCR_CLEAN=clean
# Deskew and auto-rotate pages
PAPERLESS_OCR_DESKEW=true
PAPERLESS_OCR_ROTATE_PAGES=true
PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD=12
# Archive output format (PDF/A)
PAPERLESS_OCR_OUTPUT_TYPE=pdfa
Install additional Tesseract language packs:
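# With the Docker image, list extra language packs in .env; they are
# installed when the container starts
PAPERLESS_OCR_LANGUAGES=deu fra
# Installing inside a running container also works, but the packages are
# lost when the container is recreated
sudo docker compose exec webserver \
  sh -c "apt-get update && apt-get install -y tesseract-ocr-deu tesseract-ocr-fra"
# List installed language packs
sudo docker compose exec webserver tesseract --list-langs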
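A quick way to confirm that every language named in PAPERLESS_OCR_LANGUAGE is actually installed is to compare it against the tesseract listing. This is a rough sketch: `missing_langs` is a hypothetical helper that reads the `--list-langs` output on standard input and prints any requested language it does not find.

```shell
# Print each language from a "+"-separated list that is absent from the
# tesseract --list-langs output supplied on stdin.
missing_langs() {
  wanted="$1"
  # Strip carriage returns in case the listing came through a TTY
  installed="$(tr -d '\r')"
  for lang in $(printf '%s' "$wanted" | tr '+' ' '); do
    if ! printf '%s\n' "$installed" | grep -qxF "$lang"; then
      printf '%s\n' "$lang"
    fi
  done
}

# Usage (-T disables the pseudo-TTY so the output can be piped):
#   sudo docker compose exec -T webserver tesseract --list-langs \
#     | missing_langs "eng+deu+fra"
```

No output means every requested language pack is present.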
Consumption Workflows
Paperless-ngx watches the consume directory for new documents:
Drop files for automatic processing:
# Copy files to the consumption directory
cp invoice.pdf /opt/paperless/consume/
cp scan.jpg /opt/paperless/consume/
# Paperless processes them automatically within seconds
# Monitor processing (in the stock compose file the consumer runs inside the webserver container)
sudo docker compose logs -f webserver
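In scripts that drop a file and then act on the result, one option is to wait until the file leaves the consume directory. The sketch below assumes the consumer removes the original once the import succeeds; `wait_consumed` is a hypothetical helper, and a timeout usually means the logs deserve a look.

```shell
# Wait until a file disappears from the consume directory.
# Arguments: <file> [timeout-seconds, default 120]
wait_consumed() {
  file="$1"; timeout="${2:-120}"; waited=0
  while [ -e "$file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "still waiting on $file after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage:
#   cp invoice.pdf /opt/paperless/consume/
#   wait_consumed /opt/paperless/consume/invoice.pdf 300
```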
Configure watched folder with inotify:
# The consumer service watches the consumption directory automatically
# Check consumer-related settings, e.g. PAPERLESS_CONSUMER_POLLING for shares where inotify is unavailable
grep -i consume /opt/paperless/.env
# Manual consumption trigger
sudo docker compose exec webserver \
python3 manage.py document_consumer --oneshot
Pre-tag documents with subdirectories:
# Paperless-ngx can turn consume subdirectory names into tags
# In .env:
PAPERLESS_CONSUMER_RECURSIVE=true
PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=true
# A file dropped in consume/bills/utilities/ is tagged "bills" and "utilities"
mkdir -p /opt/paperless/consume/bills/utilities
cp electric-bill.pdf /opt/paperless/consume/bills/utilities/
Tagging and Document Organization
Paperless-ngx uses correspondents, document types, and tags:
Create tags via the web UI:
- Go to Tags > Create Tag
- Name: bills, Color: Red, Auto-match: enabled
Set up automatic tagging rules:
- Go to Correspondents > Add Correspondent
- Set Matching Algorithm: Auto (ML-based) or Regular expression
- For regex: Match Electric Company → Tag utilities
Bulk assign documents:
# Via the web UI:
# 1. Select multiple documents (checkboxes)
# 2. Click the tag icon
# 3. Apply tags in bulk
# Via CLI
sudo docker compose exec webserver \
  python3 manage.py shell -c "
from documents.models import Document, Tag
tag = Tag.objects.get(name='bills')
# tags is a many-to-many field, so add the tag to each matching document
for doc in Document.objects.filter(title__contains='invoice'):
    doc.tags.add(tag)
"
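Bulk tagging can also go through the REST API instead of the Django shell. The sketch below only builds the request body: `bulk_tag_payload` is a hypothetical helper, and the endpoint path, the `add_tag` method name, and the payload shape are assumptions to verify against the API documentation of your installed version. The tag and document IDs are placeholders.

```shell
# Build a JSON body that adds one tag to several documents.
# Arguments: <tag-id> <document-id>...
bulk_tag_payload() {
  tag_id="$1"; shift
  ids=""
  for id in "$@"; do
    # Join the document IDs with ", "
    ids="${ids:+$ids, }$id"
  done
  printf '{"documents": [%s], "method": "add_tag", "parameters": {"tag": %s}}\n' \
    "$ids" "$tag_id"
}

# Usage (PAPERLESS_TOKEN is an API token created for your user):
#   curl -fsS -X POST http://localhost:8000/api/documents/bulk_edit/ \
#     -H "Authorization: Token $PAPERLESS_TOKEN" \
#     -H "Content-Type: application/json" \
#     -d "$(bulk_tag_payload 5 12 13)"
```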
Full-Text Search
Paperless-ngx uses Whoosh for full-text search indexing:
# Rebuild the search index
sudo docker compose exec webserver \
python3 manage.py document_index reindex
# Search from CLI (useful for scripting)
sudo docker compose exec webserver \
python3 manage.py shell -c "
from documents.models import Document
results = Document.objects.filter(content__icontains='invoice 2024')
for doc in results:
    print(doc.title, doc.created)
"
Search query syntax in the web UI:
content:invoice # Search document content
title:electric # Search by title
tag:bills # Filter by tag
correspondent:amazon # Filter by correspondent
created:[2024-01-01 TO *] # Date range
Email Integration
Automatically import documents from email accounts:
# Configure mail accounts in Admin panel:
# Mail > Mail Accounts > Add Mail Account
# Settings:
# - IMAP Server: imap.gmail.com
# - Port: 993
# - Username: your-address@gmail.com
# - Password: app-password
# - IMAP Security: SSL
# Configure mail rules:
# Mail > Mail Rules > Add Mail Rule
# - Account: your Gmail account
# - Subject filter: "invoice" or "receipt"
# - Action: Consume attachments
# - Tags to assign: bills, email
Poll mail manually:
sudo docker compose exec webserver \
python3 manage.py mail_fetcher
Troubleshooting
Documents not being processed from consume folder:
# Check consumer output (in the stock compose file it runs inside the webserver container)
sudo docker compose logs --tail 50 webserver
# Verify file permissions
ls -la /opt/paperless/consume/
sudo chown 1000:1000 /opt/paperless/consume/*.pdf
# Check consume directory is mounted correctly
sudo docker compose exec webserver ls /usr/src/paperless/consume/
OCR producing garbled text:
# Check if the correct language is set
grep OCR_LANGUAGE /opt/paperless/.env
# Test OCR on a single page image (tesseract does not read PDFs directly)
sudo docker compose exec webserver \
  tesseract /path/to/test.png stdout -l eng
# Enable deskew for scanned documents
# PAPERLESS_OCR_DESKEW=true
# PAPERLESS_OCR_ROTATE_PAGES=true
High memory usage during indexing:
# Check memory
sudo docker stats
# Limit concurrent tasks in .env
PAPERLESS_TASK_WORKERS=1
PAPERLESS_THREADS_PER_WORKER=1
# Rebuild search index can be memory-intensive
sudo docker compose exec webserver \
python3 manage.py document_index reindex --no-progress-bar
Conclusion
Paperless-ngx transforms document management from manual filing into an automated, searchable digital archive with OCR, intelligent tagging, and multi-source consumption from email, scanners, and watched folders. The combination of full-text search, automatic correspondent detection, and email integration makes it practical to maintain a paperless office workflow entirely on self-hosted infrastructure without cloud document services.


