Self-Host Paperless-ngx on a VPS with Docker Compose

Deploy Paperless-ngx on a VPS with Docker Compose, PostgreSQL, and Redis. Configure OCR languages and modes, set up automatic tagging rules, email consumption, and a production backup strategy with offsite sync.

Paperless-ngx turns your VPS into a searchable document archive. You scan or upload documents, Paperless-ngx OCRs them, makes them full-text searchable, and files them with automatic tags and correspondents. It runs as a Docker Compose stack with PostgreSQL, Redis, and Gotenberg plus Apache Tika for converting office documents and emails.

This guide deploys Paperless-ngx on a VPS that already has Docker and a reverse proxy running. If you need that foundation first, start with Self-Host Apps on a VPS: Architecture, RAM Usage, and What to Deploy First.

What resources does Paperless-ngx need on a VPS?

Paperless-ngx needs 2 GB RAM minimum, 4 GB recommended. The stack runs PostgreSQL, Redis, Gotenberg, Tika, and the web server. At idle, total memory usage sits around 800 MB. During OCR ingestion of scanned PDFs, CPU usage spikes to 100% on one core and RAM climbs to 1.5-2 GB. Disk usage averages 5-10 MB per document (original + archive + thumbnail).

| Component | Idle RAM | Peak RAM (OCR) | Disk per 1,000 docs |
|---|---|---|---|
| Paperless-ngx webserver | ~300 MB | ~900 MB | 5-10 GB |
| PostgreSQL | ~50 MB | ~100 MB | ~500 MB |
| Redis | ~10 MB | ~10 MB | negligible |
| Gotenberg | ~150 MB | ~300 MB | n/a |
| Tika | ~250 MB | ~400 MB | n/a |
| Total | ~760 MB | ~1,710 MB | 5-10 GB |

Plan disk space based on your document volume. A household scanning 50 documents per month (600 per year) needs about 3-6 GB per year. A small business doing 500 per month (6,000 per year) should budget 30-60 GB per year.
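The arithmetic behind these estimates is easy to script for your own volume. A minimal bash sketch, assuming the 5-10 MB per document average quoted above and 1 GB = 1000 MB; the function names are hypothetical helpers, not part of Paperless-ngx:

```bash
#!/usr/bin/env bash
# Rough yearly disk budget for Paperless-ngx documents.
# Assumption: 5-10 MB per document (original + archive + thumbnail) and
# 1 GB = 1000 MB to keep the arithmetic simple. Results are rounded down.

# yearly_disk_gb DOCS_PER_MONTH MB_PER_DOC -> whole GB needed per year
yearly_disk_gb() {
  local docs_per_month=$1 mb_per_doc=$2
  echo $(( docs_per_month * 12 * mb_per_doc / 1000 ))
}

# disk_range DOCS_PER_MONTH -> "LOW-HIGH GB/year" for the 5 MB and 10 MB cases
disk_range() {
  echo "$(yearly_disk_gb "$1" 5)-$(yearly_disk_gb "$1" 10) GB/year"
}
```

For example, disk_range 200 prints 12-24 GB/year. Leave headroom on top for database dumps and the exporter's copy of every document.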

How do I deploy Paperless-ngx with Docker Compose on a VPS?

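Create a directory for the stack, generate secrets, and write the Compose file. Every service gets a health check, and the webserver waits (depends_on with condition: service_healthy) until the database, Redis, Gotenberg, and Tika report healthy before it starts. A health check alone does not make Docker restart an unhealthy container; the restart: unless-stopped policy only acts when a container's process exits.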

Create the project directory

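mkdir -p /opt/paperless-ngx/{consume,export,backups} && cd /opt/paperless-ngx
chown 1000:1000 consume export

The consume and export folders are bind-mounted into the webserver container. Creating them up front, owned by UID 1000 to match USERMAP_UID in the Compose file, keeps Docker from creating them as root on first start. The backups folder holds the database dumps from the backup section.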

Generate secrets

Never use default passwords. Generate a strong database password and a Django secret key:

openssl rand -base64 32 > .db_password
openssl rand -base64 48 > .secret_key
chmod 600 .db_password .secret_key

These files stay on disk with restricted permissions. The Compose file reads them at container start.
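Running the two openssl commands again after the first start would overwrite .db_password while PostgreSQL keeps the password it stored when it initialized. A minimal bash sketch of an idempotent alternative: make_secret is a hypothetical helper, and it reads /dev/urandom through coreutils base64 instead of openssl rand -base64 (both produce a base64 string of random bytes):

```bash
#!/usr/bin/env bash
# make_secret FILE BYTES: create FILE with BYTES random bytes, base64-encoded,
# readable only by its owner. An existing, non-empty FILE is left untouched, so
# re-running the setup never rotates a password PostgreSQL has already stored.
make_secret() {
  local file=$1 bytes=$2
  if [[ -s "$file" ]]; then
    echo "keeping existing $file"
    return 0
  fi
  # umask 077 inside a subshell: the file is created as mode 600 and the
  # caller's umask is not changed.
  ( umask 077 && head -c "$bytes" /dev/urandom | base64 > "$file" )
  echo "created $file"
}
```

From /opt/paperless-ngx, make_secret .db_password 32 and make_secret .secret_key 48 produce the same two files as the openssl commands, and are safe to repeat.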

Write the environment file

cat > .env << 'EOF'
COMPOSE_PROJECT_NAME=paperless
EOF

Write the Compose file

# docker-compose.yml
services:
  broker:
    image: docker.io/library/redis:8
    restart: unless-stopped
    volumes:
      - redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 30s
      timeout: 5s
      retries: 3

  db:
    image: docker.io/library/postgres:18
    restart: unless-stopped
    volumes:
      # PostgreSQL 18+ images keep data in /var/lib/postgresql/18/docker, so mount the parent
      - pgdata:/var/lib/postgresql
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U paperless"]
      interval: 30s
      timeout: 5s
      retries: 3

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3

  tika:
    image: docker.io/apache/tika:latest
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9998/"]
      interval: 30s
      timeout: 5s
      retries: 3

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      broker:
        condition: service_healthy
      gotenberg:
        condition: service_healthy
      tika:
        condition: service_healthy
    ports:
      - "127.0.0.1:8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS_FILE: /run/secrets/db_password
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_SECRET_KEY_FILE: /run/secrets/secret_key
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_OCR_MODE: skip
      PAPERLESS_OCR_OUTPUT_TYPE: pdfa
      PAPERLESS_FILENAME_FORMAT: "{created_year}/{correspondent}/{title}"
      PAPERLESS_URL: https://paperless.example.com
      USERMAP_UID: 1000
      USERMAP_GID: 1000
    secrets:
      - db_password
      - secret_key
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s

secrets:
  db_password:
    file: .db_password
  secret_key:
    file: .secret_key

volumes:
  data:
  media:
  pgdata:
  redisdata:

About this configuration:

  • Ports bind to 127.0.0.1 only. Your reverse proxy handles public access over HTTPS. Exposing port 8000 to 0.0.0.0 would bypass TLS and your firewall.
  • Secrets use Docker secrets (_FILE suffix variables). The passwords never appear in docker inspect output or process listings.
  • USERMAP_UID/USERMAP_GID map the container user to UID 1000 on the host. Files created in the consume and export directories are owned by this user.
  • PAPERLESS_URL must match your public domain. Paperless-ngx uses this for generating share links and email URLs. Replace paperless.example.com with your actual domain.
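To confirm the first point once the stack is running, you can feed the Compose port listing through a small check. A minimal bash sketch; check_localhost_only is a hypothetical helper, and the {{.Service}} and {{.Ports}} template fields are an assumption about what your Compose v2 release exposes in docker compose ps --format:

```bash
#!/usr/bin/env bash
# check_localhost_only: read "<service> <ports>" lines on stdin and report any
# published port bound to all interfaces (0.0.0.0 or ::) instead of 127.0.0.1.
# Returns 0 if everything is local-only or unpublished, 1 otherwise.
check_localhost_only() {
  local service ports status=0
  while read -r service ports; do
    [[ -z "$service" ]] && continue
    if [[ "$ports" == *"0.0.0.0:"* || "$ports" == *":::"* || "$ports" == *"[::]:"* ]]; then
      echo "$service is reachable on all interfaces: $ports"
      status=1
    fi
  done
  return "$status"
}
```

Usage, from /opt/paperless-ngx: docker compose ps --format '{{.Service}} {{.Ports}}' | check_localhost_only. Services that only expose ports internally (such as db) pass, because nothing is published on the host.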

Start the stack

docker compose up -d

Wait about 60 seconds for all services to initialize. Check that every container is healthy:

docker compose ps
NAME                   SERVICE      STATUS                  PORTS
paperless-broker-1     broker       running (healthy)
paperless-db-1         db           running (healthy)
paperless-gotenberg-1  gotenberg    running (healthy)
paperless-tika-1       tika         running (healthy)
paperless-webserver-1  webserver    running (healthy)       127.0.0.1:8000->8000/tcp

All five containers should show (healthy). If any show (health: starting), wait another 30 seconds. If one stays (unhealthy), check its logs:

docker compose logs gotenberg --tail 20
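Recent Compose releases can block until services are healthy with docker compose up -d --wait. If you would rather get a report of which service is stuck, a minimal bash sketch follows; not_healthy and wait_healthy are hypothetical helpers, and the {{.Service}} and {{.Health}} format fields are an assumption about your Compose version:

```bash
#!/usr/bin/env bash
# not_healthy: read "<service> <health>" lines on stdin and print every service whose
# health is not "healthy" (e.g. "starting", "unhealthy", or empty when a service has
# no healthcheck). Returns 0 only when all services are healthy.
not_healthy() {
  local service health status=0
  while read -r service health; do
    [[ -z "$service" ]] && continue
    if [[ "$health" != "healthy" ]]; then
      echo "$service: ${health:-no healthcheck}"
      status=1
    fi
  done
  return "$status"
}

# wait_healthy TIMEOUT_SECONDS: poll Compose every 5 s until all services are healthy.
# Run it from the project directory so Compose finds docker-compose.yml.
wait_healthy() {
  local deadline=$(( SECONDS + $1 ))
  until docker compose ps --format '{{.Service}} {{.Health}}' | not_healthy >/dev/null; do
    if (( SECONDS >= deadline )); then
      # Timed out: print the services that are still not healthy
      docker compose ps --format '{{.Service}} {{.Health}}' | not_healthy
      return 1
    fi
    sleep 5
  done
}
```

For example, wait_healthy 180 right after docker compose up -d returns as soon as all five services are healthy, or lists the stragglers after three minutes so you know whose logs to read.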

Create the superuser

docker compose exec webserver createsuperuser

Follow the prompts for username, email, and password. This is your admin account for the web UI.

Access the web UI

If your reverse proxy is configured, open https://paperless.example.com in a browser. The Paperless-ngx login page loads. If you're still setting up the reverse proxy, test locally. The root URL redirects visitors who are not logged in to the login page, so -L tells curl to follow that redirect:

curl -s -L -o /dev/null -w "%{http_code}" http://127.0.0.1:8000
200

What OCR languages and modes should I configure?

Paperless-ngx uses Tesseract for OCR. You configure the language and processing mode through environment variables. The defaults work for English documents, but most users need to adjust these.

OCR language

Set PAPERLESS_OCR_LANGUAGE to the three-letter Tesseract language code for your documents. For multiple languages, join them with +:

# Single language
PAPERLESS_OCR_LANGUAGE: deu

# Multiple languages
PAPERLESS_OCR_LANGUAGE: deu+eng+fra

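Common language codes: eng (English), deu (German), fra (French), spa (Spanish), ita (Italian), nld (Dutch), por (Portuguese). The full list is in the Tesseract documentation.

The container image ships with only five language packs: eng, deu, fra, ita, and spa. For any other language, also list it in PAPERLESS_OCR_LANGUAGES (space-separated, for example "nld por") so the container installs the Tesseract pack at startup. Each language you add to PAPERLESS_OCR_LANGUAGE increases OCR processing time per page.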
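Getting the two similarly named variables right is mechanical enough to script. A minimal bash sketch, assuming the image bundles eng, deu, fra, ita, and spa, and that the install list uses Debian package spelling (tesseract-ocr-chi-sim for the chi_sim code); ocr_env is a hypothetical helper that prints lines for the webserver's environment section:

```bash
#!/usr/bin/env bash
# ocr_env CODE...: print the OCR language settings for a list of Tesseract codes.
# Assumption: the image ships eng, deu, fra, ita and spa; every other code must also
# go into PAPERLESS_OCR_LANGUAGES so the container installs it at startup. That list
# follows Debian package names, which spell underscores as dashes (chi_sim -> chi-sim).
BUNDLED_OCR_LANGS="eng deu fra ita spa"

ocr_env() {
  local lang joined="" extra=""
  for lang in "$@"; do
    # Tesseract expects several languages joined with "+"
    if [[ -n "$joined" ]]; then joined+="+"; fi
    joined+="$lang"
    # Collect languages that are not bundled with the image
    if [[ " $BUNDLED_OCR_LANGS " != *" $lang "* ]]; then
      if [[ -n "$extra" ]]; then extra+=" "; fi
      extra+="${lang//_/-}"
    fi
  done
  echo "PAPERLESS_OCR_LANGUAGE: $joined"
  if [[ -n "$extra" ]]; then
    echo "PAPERLESS_OCR_LANGUAGES: \"$extra\""
  fi
}
```

For example, ocr_env deu eng nld prints PAPERLESS_OCR_LANGUAGE: deu+eng+nld followed by PAPERLESS_OCR_LANGUAGES: "nld".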

OCR modes

The PAPERLESS_OCR_MODE setting controls how Paperless-ngx handles documents that already contain text layers (common with digitally created PDFs).

| Mode | Behavior | Use when |
|---|---|---|
| skip | OCR only pages with no text. Always creates an archive copy. | Default. Best for mixed input: scanned + digital PDFs. |
| skip_noarchive | Same as skip, but skips archive creation for documents that already have text. | You want to save disk space on digital-origin PDFs. |
| redo | Re-OCR all pages, replacing existing text layers. | You receive documents with bad OCR from scanners or other software. |
| force | Rasterize the document and OCR from scratch. Destroys original text. | Last resort. Produces larger files with less sharp text. |

For most setups, skip is the right choice. It handles both scanned and digital documents without wasting CPU re-processing documents that already have searchable text.

If you receive documents from a scanner that does its own (poor) OCR, set the mode to redo for better results. Be aware that redo is incompatible with PAPERLESS_OCR_CLEAN and PAPERLESS_OCR_DESKEW.

OCR output type

PAPERLESS_OCR_OUTPUT_TYPE: pdfa (the default) produces PDF/A files, the archival standard. These are self-contained and include embedded fonts. Keep this unless you have a specific reason not to.

After changing any OCR settings, restart the webserver container:

docker compose restart webserver

Existing documents are not re-processed. To re-OCR a document, use the "Redo OCR" action in the web UI.

How does automatic tagging work in Paperless-ngx?

Paperless-ngx supports six matching algorithms for tags, correspondents, and document types. When a document arrives, each tag checks the document content against its match pattern. If the pattern hits, the tag is applied automatically.

| Algorithm | How it works | When to use | Example match |
|---|---|---|---|
| Any | Matches if any word in the match field appears. | Broad categories. | invoice receipt bill |
| All | Matches if all words appear (any order). | Narrower matching without position dependency. | electricity quarterly |
| Exact | Matches the exact phrase in order. | Company names, account numbers. | Acme Corp |
| Regular Expression | Full regex pattern matching. | Structured data: dates, reference numbers, amounts. | Invoice\s+#?\d{4,} |
| Fuzzy | Approximate matching with a configurable threshold. | Inconsistent OCR output, slight misspellings. | Stadtwerke (matches Stadtwenke) |
| Auto | ML classifier that learns from your manual assignments. | After you have ~50+ manually tagged documents. | (learns from your corrections) |

Setting up tags with matching rules

In the web UI, go to Manage > Tags and create a tag. Set the Matching Algorithm and the Match field. Some practical examples:

  • Tag: Invoice, Algorithm: Any, Match: invoice rechnung facture (catches invoices in multiple languages)
  • Tag: Bank, Algorithm: Regular Expression, Match: IBAN\s*[A-Z]{2}\d{2} (matches the word IBAN followed by a country code and check digits)
  • Tag: Medical, Algorithm: All, Match: patient diagnosis (requires both words)
  • Correspondent: Electric Company, Algorithm: Exact, Match: Springfield Energy Inc
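To see how the same Match field behaves under different algorithms before creating tags, you can reproduce the rough semantics with grep. This is an illustrative bash sketch, not Paperless-ngx's own matching code (which runs in Python with its own normalization); the function names are hypothetical, and whole-word, case-insensitive matching is an approximation of the default tag settings:

```bash
#!/usr/bin/env bash
# Illustration of the Any / All / Exact / Regular Expression semantics, using grep.
# Not Paperless-ngx's implementation; matching here is case-insensitive (-i) and
# whole-word (-w) to approximate the default tag behavior.

# match_any "WORDS" TEXT: succeeds if at least one word appears
match_any() {
  local -a words
  local word
  read -ra words <<<"$1"
  for word in "${words[@]}"; do
    grep -qiwF -- "$word" <<<"$2" && return 0
  done
  return 1
}

# match_all "WORDS" TEXT: succeeds only if every word appears, in any order
match_all() {
  local -a words
  local word
  read -ra words <<<"$1"
  for word in "${words[@]}"; do
    grep -qiwF -- "$word" <<<"$2" || return 1
  done
  return 0
}

# match_exact "PHRASE" TEXT: succeeds if the phrase appears with its words in order
match_exact() { grep -qiwF -- "$1" <<<"$2"; }

# match_regex "ERE" TEXT: POSIX extended regex, so write \s as [[:space:]] and \d as [0-9]
match_regex() { grep -qiE -- "$1" <<<"$2"; }
```

For example, match_any "invoice rechnung facture" "RECHNUNG Nr. 4711" succeeds, while match_exact "Energy Springfield" fails against text containing "Springfield Energy Inc" because the word order differs.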

Training the Auto classifier

The Auto algorithm uses a machine learning classifier that Paperless-ngx retrains automatically. To train it:

  1. Tag at least 50 documents manually across your categories
  2. Make sure these documents are not in your inbox (the classifier ignores inbox documents)
  3. The classifier retrains on a schedule (default: hourly via document_create_classifier)
  4. New documents start receiving automatic assignments

You can trigger a manual retrain:

docker compose exec webserver document_create_classifier

The classifier improves as you correct its mistakes. Every correction feeds back into the next training cycle.

How do I set up document consumption in Paperless-ngx?

Paperless-ngx ingests documents through three channels: manual upload via the web UI, a watched folder on disk, and email fetching via IMAP.

Web UI upload

Drag and drop files into the web interface. This works immediately after setup. Supports PDF, PNG, JPEG, TIFF, and (with Tika enabled) DOCX, XLSX, ODT, and other office formats.

Watched folder (consume directory)

The ./consume directory mapped in the Compose file is monitored by an inotify watcher. Drop a file in, and Paperless-ngx picks it up within seconds.

cp /tmp/scan-001.pdf /opt/paperless-ngx/consume/

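The file disappears from the consume directory once processing completes. To organize consumption by tag, enable subdirectory tagging by adding these variables to the webserver's environment section. Quote the values so YAML keeps them as strings, as the Compose documentation recommends for true/false values:

PAPERLESS_CONSUMER_RECURSIVE: "true"
PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS: "true"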

With this setting, a file placed in consume/invoices/scan.pdf gets tagged invoices automatically.
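Nested folders produce one tag per directory level, so consume/invoices/2026/scan.pdf gets both invoices and 2026. A minimal bash sketch for previewing the tags a folder layout will produce; subdir_tags is a hypothetical planning helper that only manipulates paths and does not talk to Paperless-ngx:

```bash
#!/usr/bin/env bash
# subdir_tags CONSUME_ROOT FILE: print, one per line, the tags that
# PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS would derive from FILE's location:
# each directory between the consume root and the file becomes a tag.
subdir_tags() {
  local root=${1%/} path=$2
  local rel=${path#"$root"/}
  if [[ "$rel" == "$path" ]]; then
    echo "$path is not under $root" >&2
    return 1
  fi
  local dir=${rel%/*}
  # No slash left means the file sits directly in the consume root: no tags
  [[ "$dir" == "$rel" ]] && return 0
  local -a parts
  IFS=/ read -ra parts <<<"$dir"
  printf '%s\n' "${parts[@]}"
}
```

For example, subdir_tags /opt/paperless-ngx/consume /opt/paperless-ngx/consume/invoices/2026/scan.pdf prints invoices and 2026 on separate lines.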

Email consumption (IMAP)

Email consumption is configured entirely in the web UI under Manage > Mail. You add mail accounts and mail rules.

Add a mail account:

  1. Go to Manage > Mail > Mail Accounts
  2. Set the IMAP server, port (993 for TLS), username, and password
  3. Set the folder to monitor (e.g. INBOX)

Add a mail rule:

  1. Go to Manage > Mail > Mail Rules
  2. Select the mail account
  3. Choose what to consume: attachments only, or the full email as PDF
  4. Set an action for processed emails: mark as read, move to a folder, or delete
  5. Optionally assign a tag, correspondent, or document type to consumed documents

A typical setup: forward documents to a dedicated email address (e.g. scan@yourdomain.com), configure Paperless-ngx to check that inbox, and consume attachments. The originals get moved to an "Archived" IMAP folder after processing.

How should I organize storage paths and filenames?

Paperless-ngx stores documents in two places: the media volume holds the files on disk, while PostgreSQL tracks their metadata. The PAPERLESS_FILENAME_FORMAT variable controls how stored files (originals and archived PDFs) are named inside the media directory.

The Compose file above uses:

PAPERLESS_FILENAME_FORMAT: "{created_year}/{correspondent}/{title}"

This creates a structure like:

media/documents/archive/
  2026/
    Springfield Energy Inc/
      Electricity Bill January 2026.pdf
    Dr. Smith/
      Lab Results March 2026.pdf

Available template variables include {created_year}, {created_month}, {created_day}, {correspondent}, {document_type}, {title}, {tag_list}, and {owner_username}. If a variable resolves to empty (no correspondent assigned), Paperless-ngx replaces it with none.
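Before committing to a format, it can help to preview a few paths. A simplified bash sketch covering only the three placeholders used in this guide; render_filename is a hypothetical helper, and it skips the character cleanup and duplicate-name handling Paperless-ngx applies:

```bash
#!/usr/bin/env bash
# render_filename FORMAT YEAR CORRESPONDENT TITLE: preview the path a legacy-style
# PAPERLESS_FILENAME_FORMAT produces for one document. Empty values become "none".
# Simplified: no unsafe-character cleanup and no collision handling.
render_filename() {
  local out=$1
  # Replacements are quoted so characters like "&" are inserted literally (bash 5.2+)
  out=${out//\{created_year\}/"${2:-none}"}
  out=${out//\{correspondent\}/"${3:-none}"}
  out=${out//\{title\}/"${4:-none}"}
  echo "$out.pdf"
}
```

For example, render_filename '{created_year}/{correspondent}/{title}' 2026 'Dr. Smith' 'Lab Results March 2026' prints 2026/Dr. Smith/Lab Results March 2026.pdf, and an empty correspondent yields a none folder.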

After changing the filename format, apply it to existing documents:

docker compose exec webserver document_renamer

What is the best backup strategy for Paperless-ngx?

A working backup covers the PostgreSQL database, the document files (originals + archives + thumbnails), and the Paperless-ngx metadata (tags, correspondents, matching rules). The document exporter captures all of this in a portable format. Pair it with a database dump for faster database-only restores.

Database dump

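mkdir -p /opt/paperless-ngx/backups
docker compose exec -T db pg_dump -U paperless paperless | gzip > /opt/paperless-ngx/backups/db-$(date +%Y%m%d).sql.gz

The -T flag disables pseudo-terminal allocation, which you want whenever the command's output is piped into another program.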

Document exporter

The exporter creates a full snapshot: documents, metadata, and manifest. This is the canonical backup method.

docker compose exec webserver document_exporter ../export -c -d

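The exporter only rewrites files that changed since the last export. By default it decides this from file size and modification time; the -c flag makes it compare checksums instead, which is slower but more reliable. The -d flag deletes files from the export that no longer exist in Paperless-ngx. Together they keep the export directory as a mirror of the current state.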

Automate with cron

Create a backup script:

cat > /opt/paperless-ngx/backup.sh << 'SCRIPT'
#!/bin/bash
set -euo pipefail

COMPOSE_DIR="/opt/paperless-ngx"
BACKUP_DIR="$COMPOSE_DIR/backups"
DUMP="$BACKUP_DIR/db-$(date +%Y%m%d).sql.gz"

# Run from the project directory so Compose picks up .env and the secret files
cd "$COMPOSE_DIR"
mkdir -p "$BACKUP_DIR"

# Database dump: write to a temporary file first, so a failed dump
# never replaces a good one under the final name
docker compose exec -T db pg_dump -U paperless paperless | gzip > "$DUMP.tmp"
mv "$DUMP.tmp" "$DUMP"

# Document exporter (writes to ./export on the host)
docker compose exec -T webserver document_exporter ../export -c -d

# Remove database dumps older than 30 days
find "$BACKUP_DIR" -name "db-*.sql.gz" -mtime +30 -delete

echo "[$(date -Is)] Backup completed" >> "$BACKUP_DIR/backup.log"
SCRIPT

chmod 700 /opt/paperless-ngx/backup.sh

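Schedule it with cron. This keeps any existing crontab entries and, if you run it a second time, replaces the backup line instead of adding a duplicate:

(crontab -l 2>/dev/null | grep -vF '/opt/paperless-ngx/backup.sh'; echo "0 3 * * * /opt/paperless-ngx/backup.sh") | crontab -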

This adds a nightly job at 03:00.
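A dump that was cut off halfway still compresses into a valid gzip file, so it is worth checking the content too. A minimal bash sketch, assuming the plain-SQL format pg_dump writes by default, which opens with a "PostgreSQL database dump" header and ends with a "PostgreSQL database dump complete" footer; verify_dump is a hypothetical helper:

```bash
#!/usr/bin/env bash
# verify_dump FILE: quick sanity check for a gzip-compressed plain-SQL pg_dump.
# 1. the gzip stream must be intact;
# 2. the first lines must contain the "PostgreSQL database dump" header;
# 3. the last lines must contain "PostgreSQL database dump complete", which
#    pg_dump writes only at the very end, so a truncated dump fails here.
verify_dump() {
  local file=$1
  if ! gzip -t "$file" 2>/dev/null; then
    echo "$file: damaged or not gzip"
    return 1
  fi
  if ! gzip -dc "$file" | head -n 5 | grep -q 'PostgreSQL database dump'; then
    echo "$file: no pg_dump header"
    return 1
  fi
  if ! gzip -dc "$file" | tail -n 20 | grep -q 'PostgreSQL database dump complete'; then
    echo "$file: dump looks truncated"
    return 1
  fi
  echo "$file: OK"
}
```

For example, verify_dump /opt/paperless-ngx/backups/db-$(date +%Y%m%d).sql.gz after the nightly run. This only checks structure; a real restore test is still the stronger proof.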

Offsite sync

The export directory and database dumps need to leave the server. Use rsync or rclone to push them to a second location. For an S3-compatible target:

rclone sync /opt/paperless-ngx/export remote:paperless-backup/export
rclone sync /opt/paperless-ngx/backups remote:paperless-backup/db-dumps

For another server via SSH:

rsync -az --delete /opt/paperless-ngx/export/ backup-server:/backups/paperless/export/
rsync -az /opt/paperless-ngx/backups/ backup-server:/backups/paperless/db-dumps/

For a deeper look at Docker volume backup strategies, see Docker Volume Backup and Restore on a VPS.

How do I restore a Paperless-ngx backup?

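Restoring starts from a fresh Paperless-ngx installation: the same Compose file, with empty volumes. The order in which you start the services depends on which backup you restore.

Database-only restore (faster, for database corruption or migration). Start only the database, so the webserver cannot run its migrations and create tables before the dump is loaded, then start the rest of the stack:

docker compose up -d --wait db
gunzip < /opt/paperless-ngx/backups/db-20260320.sql.gz | \
  docker compose exec -T db psql -U paperless paperless
docker compose up -d

A database dump holds metadata only. The document files live in the media volume, so restore that volume separately or use the exporter route below.

Full restore from exporter (includes documents, tags, correspondents, everything). Start the whole stack so the webserver creates the database schema, then import:

docker compose up -d
docker compose exec webserver document_importer ../export

Run the importer against an empty database. It recreates all metadata and copies the document files from the export directory into the media volume.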

Test your restore periodically. A backup you have never restored is a backup you do not have.
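During a restore drill, typing the newest dump's date by hand is an easy place to slip. A minimal bash sketch that picks it for you; latest_dump is a hypothetical helper that relies on the db-YYYYMMDD.sql.gz naming used by the backup script:

```bash
#!/usr/bin/env bash
# latest_dump DIR: print the newest db-YYYYMMDD.sql.gz in DIR.
# Date-stamped names sort lexically in chronological order, and bash expands
# globs in sorted order, so the last match is the newest dump.
latest_dump() {
  local dir=$1 f latest=""
  for f in "$dir"/db-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].sql.gz; do
    [[ -e "$f" ]] || continue   # no match: the glob stays literal
    latest=$f
  done
  if [[ -z "$latest" ]]; then
    echo "no dumps in $dir" >&2
    return 1
  fi
  echo "$latest"
}
```

For example: gunzip < "$(latest_dump /opt/paperless-ngx/backups)" | docker compose exec -T db psql -U paperless paperless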

How do I harden the Paperless-ngx containers?

The Compose file above already handles the basics: secrets files instead of inline passwords, port binding to localhost, and user mapping. Here are additional hardening steps.

Block privilege escalation

Add security options to the webserver service:

webserver:
  # ... existing config ...
  security_opt:
    - no-new-privileges:true

This prevents the container process from gaining additional privileges through setuid binaries.

Set resource limits

Prevent a runaway OCR process from consuming all server resources:

webserver:
  # ... existing config ...
  deploy:
    resources:
      limits:
        memory: 2g
        cpus: "2.0"
      reservations:
        memory: 512m

For more on resource limits, see Docker Compose Resource Limits, Healthchecks, and Restart Policies.

Hide version information

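Configure your reverse proxy to stop advertising its version. With Nginx, this removes the version number from the Server header and from error pages (the header still reads nginx):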

server_tokens off;

Paperless-ngx adds its own X-Version and X-Api-Version headers only to responses for logged-in users, so for anonymous visitors the reverse proxy is the main place a version number leaks.

How do I update Paperless-ngx?

Pull the latest images and recreate the containers. The Paperless-ngx image runs database migrations automatically on startup, so always run your backup script first to have a restore point if a migration goes wrong:

cd /opt/paperless-ngx
./backup.sh
docker compose pull
docker compose up -d

After the update, check the logs for migration output:

docker compose logs webserver --tail 30

Look for lines like Applying documents.XXXX_migration_name... OK. If migrations fail, the container stops. Because the Compose file uses the latest tag for the Paperless-ngx and Tika images, docker compose pull can cross a major version without warning. Read the release notes before updating, or pin a specific version tag and raise it deliberately.
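To find the images that can jump versions on a plain pull, you can scan the Compose file. A minimal bash sketch with simple line-based parsing of image: lines; unpinned_images is a hypothetical helper, and digest-pinned images (name@sha256:...) are treated as pinned:

```bash
#!/usr/bin/env bash
# unpinned_images COMPOSE_FILE: print every image reference that uses ":latest" or
# no tag at all, i.e. images that "docker compose pull" can move across major versions.
unpinned_images() {
  local line image name
  local re='^[[:space:]]*image:[[:space:]]*([^[:space:]#]+)'
  while IFS= read -r line || [[ -n "$line" ]]; do
    [[ "$line" =~ $re ]] || continue
    image=${BASH_REMATCH[1]}
    image=${image//\"/}        # drop double quotes
    image=${image//\'/}        # drop single quotes
    [[ "$image" == *@* ]] && continue
    name=${image##*/}          # last path segment: name[:tag], registry host removed
    if [[ "$name" != *:* || "$name" == *:latest ]]; then
      echo "$image"
    fi
  done <"$1"
}
```

Against the Compose file in this guide, unpinned_images /opt/paperless-ngx/docker-compose.yml lists the Tika and Paperless-ngx images; the Redis, PostgreSQL, and Gotenberg entries are pinned to a major version.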

Something went wrong?

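Container stays unhealthy: Check logs with docker compose logs <service> --tail 50. Common causes: a PostgreSQL password mismatch, or Redis connection refused (broker not started yet). PostgreSQL reads POSTGRES_PASSWORD_FILE only when it initializes an empty volume, so regenerating .db_password afterwards breaks the login. Restore the original file; only on a fresh install with no documents is it safe to delete the pgdata volume and let it re-initialize.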

OCR produces garbage text: Wrong language set. Check PAPERLESS_OCR_LANGUAGE matches your documents. For mixed-language documents, add all relevant language codes separated by +.

Documents not appearing after upload: Check the consumer log:

docker compose logs webserver --tail 50 | grep -i consumer

Common cause: file permission mismatch. The consume directory must be writable by UID 1000 (or whatever USERMAP_UID is set to):

chown -R 1000:1000 /opt/paperless-ngx/consume

Auto-tagging not working: The classifier needs at least ~50 manually tagged documents (outside the inbox) to produce results. Retrain it manually and check the output for errors:

docker compose exec webserver document_create_classifier

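Disk space growing fast: Check how the space splits between originals, archive copies, and thumbnails, then list the largest files:

docker compose exec webserver du -sh /usr/src/paperless/media/documents/originals /usr/src/paperless/media/documents/archive /usr/src/paperless/media/documents/thumbnails
docker compose exec webserver find /usr/src/paperless/media/documents -type f -size +20M

The sanity checker does not report sizes, but it flags orphaned files in the media directory that no document references:

docker compose exec webserver document_sanity_checker

Also review your OCR mode. force mode creates significantly larger archive files than skip.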

Email consumption not fetching: Check IMAP credentials in the web UI. Trigger a manual fetch to see errors:

docker compose exec webserver mail_fetcher

For related self-hosting guides, see Self-Host Immich on a VPS with Docker Compose for photo management on the same stack.


Copyright 2026 Virtua.Cloud. All rights reserved. This content is original work by the Virtua.Cloud team. Reproduction, republication, or redistribution without written permission is prohibited.
