Redis and Queues
Redis is not a minor optional optimization in the current architecture. It is used in several roles at once.
What Redis is used for
The codebase uses Redis for:
- Django cache data through REDIS_CACHE_URL
- Celery broker transport through CELERY_BROKER_URL
- Celery result backend through CELERY_RESULT_BACKEND
- Django Channels websocket fan-out through CHANNELS_REDIS_URL
- lock and coordination keys through Django cache and direct Redis access
Default URL layout
The current defaults are not fully consistent between boot.py and the Django fallbacks, so set every Redis URL explicitly in production.
The typical intended split is:
- cache: one DB or endpoint
- broker: one DB or endpoint
- result backend: one DB or endpoint
- channels: ideally its own DB or endpoint
Local sample JSON files use patterns such as:
- broker: redis://localhost:6379/0
- result backend: redis://localhost:6379/1
- cache: redis://localhost:6379/2
- channels: redis://127.0.0.1:6379/3
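Because the defaults can differ between boot.py and the Django fallbacks, one way to be explicit is to resolve all four URLs in one place and flag roles that point at the same host, port, and DB. The following is a minimal sketch, assuming the four variable names from this page. The function names resolve_redis_urls, target_of, and shared_targets are hypothetical, and the fallback values only mirror the local sample patterns above rather than the real defaults.

```python
import os
from urllib.parse import urlsplit

# Variable names come from this page; the fallback values mirror the local
# sample patterns and are assumptions, not the codebase's real defaults.
SAMPLE_DEFAULTS = {
    "CELERY_BROKER_URL": "redis://localhost:6379/0",
    "CELERY_RESULT_BACKEND": "redis://localhost:6379/1",
    "REDIS_CACHE_URL": "redis://localhost:6379/2",
    "CHANNELS_REDIS_URL": "redis://127.0.0.1:6379/3",
}


def resolve_redis_urls(env=None):
    """Return one URL per role, preferring explicit environment values."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in SAMPLE_DEFAULTS.items()}


def target_of(url):
    """Reduce a Redis URL to (host, port, db) for comparison."""
    parts = urlsplit(url)
    db = parts.path.lstrip("/") or "0"
    return (parts.hostname, parts.port or 6379, db)


def shared_targets(urls):
    """Group role names that point at the same host, port, and DB."""
    by_target = {}
    for name, url in urls.items():
        by_target.setdefault(target_of(url), []).append(name)
    return {target: names for target, names in by_target.items() if len(names) > 1}
```

Calling shared_targets(resolve_redis_urls()) at startup and logging a warning for any non-empty result would surface accidental overlap early. The comparison is purely textual, so localhost and 127.0.0.1 count as different hosts in this sketch.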
Internal Redis mode
If REDIS_LOCAL=true, Supervisor starts Redis with:
--save "" --appendonly no --stop-writes-on-bgsave-error no
This means the bundled Redis instance is intentionally non-persistent.
Operational consequences:
- queue state is fast and simple
- broker memory is lost on restart
- cache content is lost on restart
- it is suitable for development and compact deployments, but not as a durable queue system
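To confirm at runtime that an instance really runs without persistence, the effective server configuration can be read back. This is a minimal sketch, assuming a client object that exposes redis-py's config_get(pattern) method; the helper names are hypothetical and nothing here is taken from the codebase.

```python
def _text(value):
    """Normalise bytes or str config values to str."""
    return value.decode() if isinstance(value, bytes) else str(value)


def is_non_persistent(client):
    """Return True when RDB snapshots and AOF are both disabled.

    `client` is anything exposing redis-py's `config_get(pattern)`,
    which returns a dict of config names to values.
    """
    save = client.config_get("save").get("save")
    appendonly = client.config_get("appendonly").get("appendonly")
    if save is None or appendonly is None:
        # The setting could not be read, so do not claim non-persistence.
        return False
    return _text(save) == "" and _text(appendonly) == "no"
```

With redis-py installed, a client created through redis.Redis.from_url(url) could be passed in. Managed Redis services often restrict the CONFIG command, in which case this check is likely to raise an error rather than return a result.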
What happens without Redis
If Redis is missing or unreachable, these areas are affected immediately:
- Celery workers cannot consume broker queues
- task results cannot be stored in the Redis result backend
- Django cache-backed locks and caches fail
- websocket and Channels features fail
- queue metrics and worker lock cleanup become unreliable
In other words, the UI may still load in some cases, but background processing, realtime features, and large parts of operational behavior degrade quickly.
Queue families in use
Current queue names include:
- messagelog_read
- messagelog_read_cold
- payload_read
- payload_get_details
- package_read
- messagelog_get_details_hot
- messagelog_get_details_cold
- messagelog_get_details_alert
- messagelog_process_batch
- messagelog_process_batch_cold
- messagelog_correlations
- messagelog_customheaders
- iflow_download
- archive_data
- trigger_alerts
- periodic_run
- stats_cache
- ai
The controller loop also inspects queue lengths directly through Redis when building queue metrics.
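With Celery's Redis transport, pending messages for a queue are normally held in a Redis list named after the queue, so LLEN on that key gives the backlog. The following is a minimal sketch of such an inspection, not the controller loop's actual code. It assumes default key naming without priority steps, a client exposing redis-py's llen(key), and hypothetical function names.

```python
def queue_depths(client, queue_names):
    """Return {queue_name: pending_message_count} using LLEN.

    `client` is anything exposing redis-py's `llen(key)`. A queue whose
    list does not exist yet simply reports 0.
    """
    return {name: int(client.llen(name)) for name in queue_names}


def busiest(depths, limit=3):
    """Return the `limit` deepest queues as (name, depth), deepest first."""
    ranked = sorted(depths.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

For example, queue_depths(client, ["messagelog_read"]) against the broker DB would report the backlog for that queue. Messages already reserved by a worker but not yet acknowledged are tracked elsewhere by the transport and are not counted here.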
Memory and persistence considerations
Redis memory usage is driven by:
- queued Celery tasks
- task result retention
- cache entries with long TTL or no expiry
- websocket channel traffic
- lock keys
Current code keeps some caches indefinitely:
- several application caches use timeout=None
- docs index cache is short-lived
- worker path caches can live for a year
That means Redis can accumulate far more than short-lived broker traffic.
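To see which keys never expire, the keyspace can be sampled with SCAN and TTL. This is a minimal sketch, assuming a client that exposes redis-py's scan_iter(match, count) and ttl(key); the function name, the default pattern, and the limits are illustrative choices, not values from the codebase.

```python
def keys_without_expiry(client, pattern="*", batch=500, limit=100):
    """Yield up to `limit` keys matching `pattern` that have no TTL.

    Uses SCAN (via redis-py's `scan_iter`) instead of KEYS so the server
    is not blocked. TTL returns -1 for a key that exists without expiry
    and -2 for a key that no longer exists.
    """
    found = 0
    for key in client.scan_iter(match=pattern, count=batch):
        if client.ttl(key) == -1:
            yield key
            found += 1
            if found >= limit:
                return
```

Celery queue lists also carry no TTL, so when this runs against a DB that is shared with the broker, a narrower pattern keeps the output focused on cache entries.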
TLS support
If a URL starts with rediss://, the code automatically enables CA-based TLS verification using certifi.
This applies to:
- CELERY_BROKER_URL
- CELERY_RESULT_BACKEND
- REDIS_CACHE_URL
CHANNELS_REDIS_URL does not currently get the same custom TLS options block in settings.py, so test managed Redis TLS carefully.
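The scheme-based switch described above can be sketched as a small helper. This is an illustration under stated assumptions, not the block from settings.py. The option names ssl_cert_reqs and ssl_ca_certs are the ones redis-py and Celery's Redis SSL settings accept. The function names are hypothetical. Passing extra keys alongside address in a channels_redis host entry is an assumption to verify against the installed channels_redis version.

```python
import ssl


def tls_options(url, ca_bundle):
    """Return CA-verification options for rediss:// URLs, else an empty dict.

    `ca_bundle` is a path to a CA file. This page says the code uses
    certifi, so `certifi.where()` would be the natural value to pass in.
    """
    if not url.startswith("rediss://"):
        return {}
    return {"ssl_cert_reqs": ssl.CERT_REQUIRED, "ssl_ca_certs": ca_bundle}


def channels_host(url, ca_bundle):
    """Build one host entry for Channels, adding TLS options when needed.

    Assumption: the channel layer accepts a dict with "address" plus
    connection keyword arguments. Check this before relying on it.
    """
    return {"address": url, **tls_options(url, ca_bundle)}
```

A plain redis:// URL yields an entry with only the address, so the same helper can be used for both local and managed Redis without branching in the settings module.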
Recommended operating models
Small installation
- internal Redis is acceptable
- monitor queue depth and restart frequency
- expect queue and cache loss after restart
Medium installation
- external Redis is recommended
- isolate cache, broker, and channels DB numbers or endpoints
- watch queue growth in superadmin metrics
Large installation
- dedicated Redis or managed Redis is strongly recommended
- monitor memory, evictions, and rejected connections
- keep queue backlogs visible and alert on them
Redis metric defaults
The platform metric defaults are:
- queue warn: 1000
- queue critical: 10000
- memory warn percent: 90
- memory critical percent: 98
- evicted keys warn min: 1
- evicted keys critical min: 10
- rejected connections critical min: 1
These are not Redis server settings. They are application-level thresholds stored in cMetricSettings.
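How such thresholds translate into a status can be sketched as follows. The numbers are the defaults listed above. The class, field, and function names are hypothetical and do not reflect the real cMetricSettings schema. Treating each threshold as inclusive (a reading equal to the threshold triggers the level) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedisThresholds:
    """Defaults quoted on this page; field names are hypothetical."""
    queue_warn: int = 1000
    queue_critical: int = 10000
    memory_warn_percent: int = 90
    memory_critical_percent: int = 98
    evicted_keys_warn_min: int = 1
    evicted_keys_critical_min: int = 10
    rejected_connections_critical_min: int = 1


def level(value, warn=None, critical=None):
    """Classify a reading; a threshold of None means that level is unused."""
    if critical is not None and value >= critical:
        return "critical"
    if warn is not None and value >= warn:
        return "warn"
    return "ok"


def evaluate(t, queue_depth, memory_percent, evicted_keys, rejected_connections):
    """Map raw readings to a level per metric family."""
    return {
        "queue": level(queue_depth, t.queue_warn, t.queue_critical),
        "memory": level(memory_percent, t.memory_warn_percent, t.memory_critical_percent),
        "evicted_keys": level(evicted_keys, t.evicted_keys_warn_min, t.evicted_keys_critical_min),
        "rejected_connections": level(
            rejected_connections, critical=t.rejected_connections_critical_min
        ),
    }
```

Rejected connections have only a critical threshold in the defaults, which is why that entry passes no warn value.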
Failure patterns
Common symptoms of Redis trouble are:
- stuck or growing Celery queues
- websocket updates missing
- delayed alerting or report dispatch
- repeated worker lock warnings
- "No cached templates found" style API responses for cache-backed reads
Recovery guidance
If Redis was restarted and is intentionally non-persistent:
- expect queue loss
- expect cache loss
- allow periodic jobs to rebuild caches
- verify that workers reconnect and queue depth returns to normal
If Redis is external and unhealthy:
- fix connectivity first
- then confirm broker queue consumption
- then validate channels and cache-dependent features
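The ordering above (connectivity first, then broker consumption, then channels and cache) can be captured in a small runner that stops at the first failing step, so that later checks are not attempted against a Redis that is still unreachable. This is a minimal sketch; the function name and the shape of the check list are hypothetical.

```python
def run_in_order(checks):
    """Run (name, callable) checks in order and stop at the first failure.

    Each callable returns a truthy value on success; an exception counts
    as a failure. Returns (passed_names, failed_name_or_None).
    """
    passed = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            return passed, name
        passed.append(name)
    return passed, None
```

With redis-py, the connectivity step could be as simple as a callable wrapping client.ping(). The broker and channels steps would depend on how the deployment exposes worker and websocket health, which is not described here.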