Redis and Queues

Redis is a core dependency of the current architecture, not an optional optimization. It serves several roles at once.

What Redis is used for

The codebase uses Redis for:

Django cache data through REDIS_CACHE_URL
Celery broker transport through CELERY_BROKER_URL
Celery result backend through CELERY_RESULT_BACKEND
Django Channels websocket fan-out through CHANNELS_REDIS_URL
lock and coordination keys through the Django cache and direct Redis access

Default URL layout

The current defaults are not fully consistent between boot.py and the Django fallbacks, so set every Redis URL explicitly in production.

The typical intended split is:

cache: one DB or endpoint
broker: one DB or endpoint
result backend: one DB or endpoint
channels: ideally its own DB or endpoint

Local sample JSON files use patterns such as:

broker redis://localhost:6379/0
result backend redis://localhost:6379/1
cache redis://localhost:6379/2
channels redis://127.0.0.1:6379/3
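As a rough sketch, the four roles could be wired into Django settings as shown here. The environment variable names come from this document. The helper name build_redis_settings, the fallback URLs (copied from the local sample pattern), and the choice of Django's built-in RedisCache backend and the channels_redis layer are assumptions for illustration, not a copy of the project's settings.py.

```python
import os
from typing import Mapping

# Fallbacks mirror the local sample pattern; production should set all four explicitly.
SAMPLE_DEFAULTS = {
    "CELERY_BROKER_URL": "redis://localhost:6379/0",
    "CELERY_RESULT_BACKEND": "redis://localhost:6379/1",
    "REDIS_CACHE_URL": "redis://localhost:6379/2",
    "CHANNELS_REDIS_URL": "redis://127.0.0.1:6379/3",
}


def build_redis_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return settings for the four Redis roles (hypothetical helper)."""
    url = {name: env.get(name, default) for name, default in SAMPLE_DEFAULTS.items()}
    return {
        "CELERY_BROKER_URL": url["CELERY_BROKER_URL"],
        "CELERY_RESULT_BACKEND": url["CELERY_RESULT_BACKEND"],
        "CACHES": {
            "default": {
                "BACKEND": "django.core.cache.backends.redis.RedisCache",
                "LOCATION": url["REDIS_CACHE_URL"],
            }
        },
        "CHANNEL_LAYERS": {
            "default": {
                "BACKEND": "channels_redis.core.RedisChannelLayer",
                "CONFIG": {"hosts": [url["CHANNELS_REDIS_URL"]]},
            }
        },
    }
```

Keeping each role on its own DB number or endpoint means a cache flush or a channels burst cannot touch broker queues.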

Internal Redis mode

If REDIS_LOCAL=true, Supervisor starts Redis with:

--save ""
--appendonly no
--stop-writes-on-bgsave-error no

This means the bundled Redis instance is intentionally non-persistent.

Operational consequence:

queue state is fast and simple
queued broker messages are lost on restart
cache content is lost on restart
it is suitable for development and compact deployments, but not as a durable queue system
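To confirm that a running instance really is non-persistent, one option is to inspect its CONFIG GET output. The sketch below is a minimal check under that assumption; the function name is hypothetical and it only looks at the save and appendonly settings that the Supervisor flags control.

```python
from typing import Mapping


def is_non_persistent(config: Mapping[str, str]) -> bool:
    """True if a CONFIG GET snapshot shows RDB snapshots and AOF both disabled."""
    snapshots_off = config.get("save") == ""      # --save "" clears all snapshot rules
    aof_off = config.get("appendonly") == "no"    # --appendonly no disables the AOF
    return snapshots_off and aof_off
```

With redis-py, the snapshot could come from config_get() on a client created with decode_responses=True. Managed Redis services often disable the CONFIG command, so this check is mainly useful for the bundled instance.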

What happens without Redis

If Redis is missing or unreachable, these areas are affected immediately:

Celery workers cannot consume broker queues
task results cannot be stored in the Redis result backend
Django cache-backed locks and caches fail
websocket and Channels features fail
queue metrics and worker lock cleanup become unreliable

In other words, the UI may still load in some cases, but background processing, realtime features, and large parts of operational behavior degrade quickly.
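Because each role can point at a different endpoint, it can help to probe them separately. The following is a minimal sketch, assuming redis-py is installed; the function names and the 2-second timeouts are illustrative choices, not part of the codebase.

```python
from typing import Callable, Dict, Mapping


def ping_with_redis_py(url: str) -> bool:
    """Ping one endpoint with redis-py; connection errors count as unreachable."""
    import redis  # imported lazily so check_redis_roles can run with a custom ping

    try:
        client = redis.Redis.from_url(url, socket_connect_timeout=2, socket_timeout=2)
        return bool(client.ping())
    except (redis.RedisError, OSError):
        return False


def check_redis_roles(
    urls: Mapping[str, str],
    ping: Callable[[str], bool] = ping_with_redis_py,
) -> Dict[str, bool]:
    """Map each role name (cache, broker, ...) to whether its endpoint answered."""
    return {role: ping(url) for role, url in urls.items()}
```

A result such as broker reachable but channels unreachable narrows the problem to websocket features before anyone looks at the workers.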

Queue families in use

Current queue names include:

messagelog_read
messagelog_read_cold
payload_read
payload_get_details
package_read
messagelog_get_details_hot
messagelog_get_details_cold
messagelog_get_details_alert
messagelog_process_batch
messagelog_process_batch_cold
messagelog_correlations
messagelog_customheaders
iflow_download
archive_data
trigger_alerts
periodic_run
stats_cache
ai

The controller loop also inspects queue lengths directly through Redis when building queue metrics.
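The controller's actual implementation is not shown here, but a direct length check can be sketched as follows. It assumes the default Kombu Redis transport layout, where each queue's default-priority messages sit in a Redis list named after the queue, so LLEN on the queue name gives the backlog. Messages published with a non-default priority live in separate suffixed lists and are not counted by this sketch. The function name is hypothetical.

```python
from typing import Dict, Iterable, Protocol


class SupportsLlen(Protocol):
    """Anything with an llen() method, for example a redis-py client."""

    def llen(self, name: str) -> int: ...


def queue_lengths(client: SupportsLlen, queue_names: Iterable[str]) -> Dict[str, int]:
    """Return the number of pending messages per queue name."""
    # LLEN on a missing key returns 0, so empty and unknown queues look the same.
    return {name: int(client.llen(name)) for name in queue_names}
```

The client must be connected to the broker URL and DB number, not the cache or channels one, otherwise every queue reports zero.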

Memory and persistence considerations

Redis memory usage is driven by:

queued Celery tasks
task result retention
cache entries with long TTL or no expiry
websocket channel traffic
lock keys

Cache lifetimes in the current code vary widely, and some entries never expire:

several application caches use timeout=None
docs index cache is short-lived
worker path caches can live for a year

This means Redis can accumulate far more than short-lived broker traffic.
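One way to see how much of a keyspace never expires is to count keys whose TTL is -1, which is what Redis reports for a key without an expiry. This is a minimal sketch using the scan_iter and ttl calls of redis-py; the function name, the default pattern, and the batch size are assumptions.

```python
def count_keys_without_expiry(client, pattern: str = "*", batch: int = 500) -> int:
    """Count keys matching pattern that have no expiry set."""
    total = 0
    # SCAN walks the keyspace incrementally instead of blocking like KEYS.
    for key in client.scan_iter(match=pattern, count=batch):
        # TTL returns -1 for "no expiry" and -2 if the key vanished meanwhile.
        if client.ttl(key) == -1:
            total += 1
    return total
```

The loop issues one TTL call per key, so on a large cache DB it is better run occasionally than inside a metrics loop.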

TLS support

If a URL starts with rediss://, the code automatically enables CA-based TLS verification using certifi.

This applies to:

CELERY_BROKER_URL
CELERY_RESULT_BACKEND
REDIS_CACHE_URL

CHANNELS_REDIS_URL does not currently get the same custom TLS options block in settings.py, so test managed Redis TLS carefully.
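The rediss:// rule can be illustrated with a small helper. This is a sketch of the general technique, not the project's settings.py code: the function name and the ca_bundle parameter are hypothetical, and the returned keys are the ones Celery documents for broker_use_ssl and redis_backend_use_ssl with a Redis transport.

```python
import ssl
from typing import Optional


def tls_options_for(url: str, ca_bundle: Optional[str] = None) -> dict:
    """Return CA-verification options for a rediss:// URL, or {} for plain redis://."""
    if not url.startswith("rediss://"):
        return {}
    if ca_bundle is None:
        import certifi  # lazy import: only needed when TLS is actually in use

        ca_bundle = certifi.where()
    # Require a certificate and verify it against the given CA bundle.
    return {"ssl_cert_reqs": ssl.CERT_REQUIRED, "ssl_ca_certs": ca_bundle}
```

Whether the same dictionary can be applied to CHANNELS_REDIS_URL depends on the installed channels_redis version and is not covered by this sketch.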

Recommended operating models

Small installation

internal Redis is acceptable
monitor queue depth and restart frequency
expect queue and cache loss after restart

Medium installation

external Redis is recommended
isolate cache, broker, and channels DB numbers or endpoints
watch queue growth in superadmin metrics

Large installation

dedicated Redis or managed Redis is strongly recommended
monitor memory, evictions, and rejected connections
keep queue backlogs visible and alert on them

Redis metric defaults

The platform metric defaults are:

queue warn: 1000
queue critical: 10000
memory warn percent: 90
memory critical percent: 98
evicted keys warn min: 1
evicted keys critical min: 10
rejected connections critical min: 1

These are not Redis server settings. They are application-level thresholds stored in cMetricSettings.
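How such thresholds might be evaluated can be sketched as follows. The dataclass is a hypothetical stand-in for the values stored in cMetricSettings, and treating a value equal to the threshold as already breaching it is an assumption; the application may compare differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RedisThresholds:
    """Hypothetical stand-in for the defaults stored in cMetricSettings."""

    queue_warn: int = 1000
    queue_critical: int = 10000
    memory_warn_percent: float = 90
    memory_critical_percent: float = 98
    evicted_keys_warn_min: int = 1
    evicted_keys_critical_min: int = 10
    rejected_connections_critical_min: int = 1


def classify(value: float, warn: Optional[float], critical: Optional[float]) -> str:
    """Return "ok", "warn", or "critical"; a threshold of None is skipped."""
    # Assumption: reaching a threshold exactly already counts as breaching it.
    if critical is not None and value >= critical:
        return "critical"
    if warn is not None and value >= warn:
        return "warn"
    return "ok"
```

Rejected connections have only a critical threshold, so the warn argument would be None for that metric.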

Failure patterns

Common symptoms of Redis trouble are:

stuck or growing Celery queues
websocket updates missing
delayed alerting or report dispatch
repeated worker lock warnings
API responses in the style of "No cached templates found" for cache-backed reads

Recovery guidance

If Redis was restarted and is intentionally non-persistent:

expect queue loss
expect cache loss
allow periodic jobs to rebuild caches
verify that workers reconnect and queue depth returns to normal

If Redis is external and unhealthy:

fix connectivity first
then confirm broker queue consumption
then validate channels and cache-dependent features