Sizing and Requirements
This page is operating guidance based on the current architecture. It is not a matrix of hard product limits.
What drives load in this system
The biggest load multipliers are:
- number of active tenants
- polling frequency for messages, payloads, packages, and artifacts
- amount of historical backfill
- payload download depth
- archive frequency and retention
- number of concurrent users
- number of active Celery worker processes
- whether PostgreSQL and Redis are internal or external
Built-in interval defaults
Current platform metric defaults are:
- host metrics every 600 seconds
- Django metrics every 300 seconds
- storage quick snapshots every 3600 seconds
- storage deep snapshots every 21600 seconds
Current scheduled report dispatch batch limit: 50 jobs per periodic pass.
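As a rough illustration of how these defaults translate into periodic task pressure, the sketch below expresses them as a Celery beat schedule. The task names and the dispatch helper are hypothetical; only the intervals and the 50-job cap come from the defaults above.

```python
# Illustrative only: task names and the dispatch helper are hypothetical;
# the intervals and the 50-job cap are the documented defaults.
from celery import Celery

app = Celery("platform", broker="redis://localhost:6379/0")

app.conf.beat_schedule = {
    "host-metrics":           {"task": "metrics.collect_host",   "schedule": 600.0},
    "django-metrics":         {"task": "metrics.collect_django", "schedule": 300.0},
    "storage-quick-snapshot": {"task": "storage.quick_snapshot", "schedule": 3600.0},
    "storage-deep-snapshot":  {"task": "storage.deep_snapshot",  "schedule": 21600.0},
}

REPORT_DISPATCH_BATCH_LIMIT = 50  # jobs per periodic pass

def dispatch_due_reports(due_jobs):
    """Send at most the batch limit per pass; the remainder waits for
    the next periodic tick."""
    for job in due_jobs[:REPORT_DISPATCH_BATCH_LIMIT]:
        job.send()
```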
Small installation
Use this as a starting point for:
- 1 to 5 users
- low or moderate alert volume
- a handful of tenants
- limited payload and archive depth
Recommended shape:
- SQLite can be acceptable
- internal Redis can be acceptable
- worker profile: safe or balanced
Starting infrastructure guidance:
- CPU: 2 vCPU
- RAM: 4 GB
- disk: 20 to 50 GB
- storage: persistent mount for /app/data
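A short preflight check can confirm a host meets this baseline before first start. This is a minimal sketch, assuming a Linux host; the thresholds mirror the guidance above, and /app/data is the documented persistent mount.

```python
import os
import shutil

def check_small_install(data_path="/app/data"):
    cpus = os.cpu_count() or 0
    disk_gb = shutil.disk_usage(data_path).free / 1e9
    # Linux-only RAM probe: the first line of /proc/meminfo is MemTotal in kB.
    with open("/proc/meminfo") as f:
        ram_gb = int(f.readline().split()[1]) / 1e6

    assert cpus >= 2, f"need >= 2 vCPU, found {cpus}"
    assert ram_gb >= 4, f"need >= 4 GB RAM, found {ram_gb:.1f} GB"
    assert disk_gb >= 20, f"need >= 20 GB free on {data_path}, found {disk_gb:.0f} GB"
```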
Operational notes:
- keep polling conservative
- avoid heavy cold backfills during peak hours
- accept that internal Redis has no persistence
Medium installation
Use this as a starting point for:
- multiple admins
- regular background processing
- higher message volume
- meaningful archive and payload usage
Recommended shape:
- PostgreSQL recommended
- external Redis recommended
- worker profile: balanced
Starting infrastructure guidance:
- CPU: 4 vCPU
- RAM: 8 to 16 GB
- disk: 100 GB and upward, depending on archive retention
- network: stable low-latency path to Redis and PostgreSQL
Operational notes:
- queue monitoring becomes mandatory (see the queue-depth sketch after this list)
- table growth review should be part of routine operations
- use mounted or managed persistent storage for archives
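Queue monitoring can start as a simple depth probe. This sketch assumes Celery's default Redis broker layout, where pending tasks sit in a Redis list named after the queue (celery by default); the host and the alert threshold are placeholders to tune per deployment.

```python
import redis

r = redis.Redis(host="redis.internal", port=6379, db=0)

def queue_depth(queue="celery"):
    # Celery's default Redis broker keeps pending tasks in one list per queue.
    return r.llen(queue)

depth = queue_depth()
print(f"celery queue depth: {depth}")
if depth > 1000:  # illustrative threshold
    print("WARNING: workers are falling behind")
```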
Large installation
Use this as a starting point for:
- many tenants
- constant CPI ingestion
- heavier payload inspection
- broad operational reporting
Recommended shape:
- dedicated PostgreSQL or managed PostgreSQL
- dedicated Redis or managed Redis
- worker profile: fast, or custom only after evidence-based tuning
Starting infrastructure guidance:
- CPU: 8 vCPU or more
- RAM: 16 to 32 GB or more
- disk: sized primarily around DB growth and archive retention
- storage: strong IOPS matter more than raw capacity alone
Operational notes:
- monitor queue depth continuously
- monitor DB write latency and vacuum behavior (see the vacuum check after this list)
- separate infrastructure services are strongly preferred
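For the vacuum side, PostgreSQL's standard statistics views are usually enough to spot tables that autovacuum is not keeping up with. A minimal sketch, with placeholder connection settings:

```python
import psycopg2

QUERY = """
SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
"""

with psycopg2.connect("dbname=app user=app host=db.internal") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for name, dead, live, last_vacuum in cur.fetchall():
            print(f"{name}: {dead} dead / {live} live, last autovacuum {last_vacuum}")
```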
CPU considerations
CPU demand increases with:
- many Celery workers
- archive export and compression
- PDF export generation
- AI task execution
- message parsing and batch processing
The container image also includes Chromium, Java, PostgreSQL, Redis, Nginx, and the Python runtime, so it is not a minimal single-process image.
RAM considerations
Idle memory footprint is affected by:
- Gunicorn with 4 workers and 2 threads each (see the config sketch after this list)
- many Celery worker processes
- internal PostgreSQL
- internal Redis
- Python process duplication across worker groups
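For reference, the Gunicorn shape above corresponds to a config like the following. Gunicorn reads these module-level names from a Python config file; whether your deployment exposes such a file or passes equivalent flags is deployment-specific.

```python
# gunicorn.conf.py
workers = 4   # documented default
threads = 2   # documented default
# On a constrained host, halving workers is usually the first cut:
# workers = 2
```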
For constrained environments, the first safe levers are:
- reduce Celery concurrency (see the sketch after this list)
- use external PostgreSQL
- use external Redis
- keep archive and cold backfill activity moderate
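The concurrency lever, as a sketch using standard Celery settings (the app name is a placeholder):

```python
from celery import Celery

app = Celery("platform")
app.conf.worker_concurrency = 2            # fewer worker processes
app.conf.worker_prefetch_multiplier = 1    # steadier per-worker memory
app.conf.worker_max_tasks_per_child = 500  # recycle workers to cap slow growth
```

The same value can also be passed on the worker command line via --concurrency.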
Disk and IOPS considerations
Disk demand is shaped by:
- SQLite write amplification if SQLite is used
- PostgreSQL table and index growth (see the size query at the end of this section)
- archive exports
- job and service logs
- payload and attachment retention
IOPS become important when:
- many workers write at once
- archive cleanup triggers heavy DB maintenance
- storage snapshots run during active ingestion windows
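To review table and index growth concretely, PostgreSQL's size functions can be polled periodically and diffed over time. A minimal sketch, with placeholder connection settings:

```python
import psycopg2

QUERY = """
SELECT relname, pg_total_relation_size(relid) AS total_bytes
FROM pg_catalog.pg_statio_user_tables
ORDER BY total_bytes DESC
LIMIT 10;
"""

with psycopg2.connect("dbname=app user=app host=db.internal") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for name, size in cur.fetchall():
            print(f"{name}: {size / 1e9:.2f} GB (table + indexes + toast)")
```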
Network considerations
Network quality matters for:
- PostgreSQL round trips
- Redis round trips
- CPI API calls
- email delivery
- managed AI backends
If Redis or PostgreSQL are remote, low latency improves queue throughput and lock behavior.
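A quick way to quantify that path is to time round trips from the application host, so the measured latency matches what workers actually see. This sketch uses placeholder hosts and credentials:

```python
import time
import psycopg2
import redis

def timed(label, fn, rounds=20):
    start = time.perf_counter()
    for _ in range(rounds):
        fn()
    print(f"{label}: {(time.perf_counter() - start) / rounds * 1000:.2f} ms avg")

r = redis.Redis(host="redis.internal", port=6379)
conn = psycopg2.connect("dbname=app user=app host=db.internal")
cur = conn.cursor()

timed("Redis PING", r.ping)
timed("PostgreSQL SELECT 1", lambda: (cur.execute("SELECT 1"), cur.fetchone()))
```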
Practical tuning order
When throughput is not sufficient, the safest order is:
- validate queue depth and Redis health
- validate DB latency and write pressure
- review polling frequencies and cold backfill behavior
- only then raise worker concurrency
This order matches how the current architecture actually bottlenecks.