Monitoring, Metrics, and Alerts

Use the platform monitoring dashboards and alerting policies to observe RabbitMQ resource usage, broker health, queue growth, and traffic patterns. Monitoring should answer two different questions:

  • Is the instance healthy enough to accept traffic?
  • Is the workload keeping up with its message volume and retention targets?

Metrics Collection

The platform collects RabbitMQ metrics by default. The embedded monitoring panel presents the most common broker and resource signals for operational inspection and performance tuning.

By default, RabbitMQ metrics are exposed by the broker on port 15692. If standalone exporter integration is enabled (spec.exporter.enabled=true), metrics are also available from the exporter service on port 9419.

Verify that the broker listener is present:

kubectl -n <namespace> exec <instance-name>-server-0 -- \
  rabbitmq-diagnostics listeners

Verify that the Service exposes the expected ports:

kubectl -n <namespace> get svc <instance-name>
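
To look at the raw values behind the dashboards, one option is to forward the broker metrics port and read the endpoint directly. The following is a minimal sketch, assuming the broker serves Prometheus text format at the /metrics path on port 15692 and that curl is installed locally. The helper names fetch_metrics and metric_samples are hypothetical and not part of the platform.

```shell
# Hypothetical helpers for ad hoc inspection; names are illustrative only.

# Print all samples of one metric from Prometheus text-format input on stdin.
# Comment lines (# HELP / # TYPE) are skipped; labels, if any, are kept.
metric_samples() {
  name="$1"
  awk -v m="$name" '
    /^#/ { next }
    {
      # The metric name ends at the first "{" (labels) or space (no labels).
      split($0, parts, "[{ ]")
      if (parts[1] == m) print
    }'
}

# Forward the broker metrics port from one pod and fetch /metrics once.
# Arguments: namespace, pod name. Assumes local port 15692 is free.
fetch_metrics() {
  ns="$1"; pod="$2"
  kubectl -n "$ns" port-forward "pod/$pod" 15692:15692 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2   # crude wait for the tunnel to come up; adjust as needed
  curl -s http://127.0.0.1:15692/metrics
  kill "$pf_pid" 2>/dev/null
}

# Example (requires cluster access):
#   fetch_metrics <namespace> <instance-name>-server-0 | metric_samples rabbitmq_connections
```

The metric name rabbitmq_connections in the example is an assumption based on the RabbitMQ Prometheus plugin; confirm the names your broker actually exposes before building on them.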

Key Metric Categories

Category | What to Watch | Why It Matters
Instance availability | RabbitmqCluster phase, readiness, and service reachability | Shows whether the cluster is available from the platform and network perspective.
Connections and channels | Connection count, channel count, consumer count | Sudden drops or spikes usually indicate client-side failures or connection storms.
Publish and delivery rates | Message ingress, delivery, acknowledgement, and redelivery rates | Shows whether producers and consumers remain in balance.
Queue backlog | messages_ready, messages_unacknowledged, queue count | Indicates consumer lag, stuck consumers, or retry loops.
Memory | Memory usage, high watermark, memory alarms | Memory alarms can block publishers and reduce throughput.
Disk | Free disk space and disk alarms | Disk alarms can block publishers and indicate a backlog or storage sizing problem.
File descriptors and sockets | Used file descriptors and socket count | Protects the broker from exhausting connection-related resources.
Plugins and listeners | Management, Prometheus, TLS, Shovel, or Federation listeners | Confirms that operational features remain enabled after updates.

RabbitMQ exports mostly broker-level metrics by default. When you need detailed per-queue or per-exchange analysis, combine broker metrics with application metrics, queue inspection commands, and workload-specific dashboards.
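
As one example of a queue inspection command, rabbitmqctl list_queues can be run inside a broker pod and filtered for queues with a large backlog. This is a minimal sketch, assuming tab-separated output with the columns name, messages_ready, messages_unacknowledged, and consumers in that order. The helper top_backlog is hypothetical and the threshold of 1000 is arbitrary.

```shell
# Print queues whose ready + unacknowledged count exceeds a threshold.
# Reads tab-separated rows on stdin:
#   name, messages_ready, messages_unacknowledged, consumers
top_backlog() {
  threshold="$1"
  awk -F '\t' -v t="$threshold" '
    ($2 + $3) > t { printf "%s\tbacklog=%d\tconsumers=%d\n", $1, $2 + $3, $4 }'
}

# Example (requires cluster access):
#   kubectl -n <namespace> exec <instance-name>-server-0 -- \
#     rabbitmqctl list_queues --quiet --no-table-headers \
#       name messages_ready messages_unacknowledged consumers | top_backlog 1000
```

A queue that shows a large backlog together with zero consumers usually points to a different problem than one with many slow consumers, so the consumer count is kept in the output.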

Create or tune alert policies for the following conditions:

Signal | Recommended Starting Point | Operational Meaning
Instance availability | != 1 for 30 seconds | The cluster is not fully available to clients.
Node memory utilization | > 80% for 30 seconds | Memory pressure is rising and can trigger broker alarms.
Node storage utilization | > 80% for 30 seconds | The broker is approaching disk pressure and publish blocking risk.
Channel count | Threshold based on application design | Sudden growth can indicate connection churn or channel leaks.
Connection count | Threshold based on application design | Large drops or spikes often indicate client or network problems.
Message write frequency | Threshold based on expected workload | A drop can indicate producer issues; a spike can require scaling or backlog controls.
Queue backlog | Sustained growth in ready or unacknowledged messages | Consumers are not keeping up or retry patterns are unhealthy.
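
When tuning the memory threshold, it can help to see how close a node is to the broker's own high watermark. The sketch below computes that ratio from raw metrics text. The two metric names are assumptions based on the RabbitMQ Prometheus plugin, the helper memory_watermark_pct is hypothetical, and the platform's built-in node memory utilization metric may be calculated differently.

```shell
# Print broker memory use as a percentage of the high watermark.
# Reads Prometheus text-format metrics on stdin.
# Metric names are assumptions; confirm them against your broker's output.
# Assumes the sample value is the last field on the line (no timestamps).
memory_watermark_pct() {
  awk '
    /^#/ { next }
    { split($0, p, "[{ ]") }
    p[1] == "rabbitmq_process_resident_memory_bytes" { used = $NF }
    p[1] == "rabbitmq_resident_memory_limit_bytes"   { limit = $NF }
    END {
      if (limit > 0) printf "%.1f\n", used * 100 / limit
      else { print "limit metric not found" > "/dev/stderr"; exit 1 }
    }'
}

# Example: pipe in metrics text saved from the broker endpoint.
#   memory_watermark_pct < metrics.txt
```

A value that stays well below 100 while the platform alert fires suggests the alert threshold reacts before the broker would raise its own memory alarm, which is usually the intent of an 80% starting point.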

How to Interpret Common Signals

Signal | Interpretation | Recommended Follow-up
Disk alarm is active | The broker is protecting itself because free space is below the configured watermark. | Inspect backlog, storage usage, and message retention policies.
Memory alarm is active | The broker is under memory pressure and can block publishers. | Check connection churn, queue growth, and memory-heavy workloads.
messages_ready grows | Consumers are not draining the queue fast enough. | Check consumer health, scaling, backlog controls, and retry topology.
messages_unacknowledged grows | Consumers are holding deliveries longer than expected. | Check consumer latency, prefetch, downstream dependencies, and stuck consumers.
Connection count drops sharply | Producers or consumers lost connectivity. | Check service exposure, TLS, credentials, and broker alarms.
Connection count spikes sharply | Clients may be reconnecting repeatedly. | Check rolling restarts, DNS, TLS failures, and application reconnect loops.

Alert Policy Guidance

Go to the Application Service's Alerts > Alert Policies page to create alert policies for RabbitMQ. Built-in metrics are the fastest way to enable baseline coverage. When built-in metrics are not sufficient, create custom PromQL-based alerts and test them before you rely on them in production.
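
One way to test a custom expression before saving it as an alert policy is to evaluate it once through the Prometheus HTTP API. The sketch below assumes a reachable Prometheus-compatible endpoint, which the platform may or may not expose directly. PROM_URL, promql_query, the metric name, and the numbers in BACKLOG_EXPR are all illustrative assumptions.

```shell
# Hypothetical Prometheus endpoint; replace with the address your platform exposes.
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

# Evaluate a PromQL expression once through the Prometheus HTTP API.
# Argument: the PromQL expression.
promql_query() {
  curl -sS -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Candidate alert expression: ready messages across all queues grew by more
# than 1000 over the last 10 minutes. Metric name and numbers are assumptions.
BACKLOG_EXPR='sum(delta(rabbitmq_queue_messages_ready[10m])) > 1000'

# Example (requires a reachable endpoint):
#   promql_query "$BACKLOG_EXPR"
```

An empty result set means the condition is not currently true, so it is worth also running the expression without the comparison to confirm that the underlying series exists.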

For more information on configuring and using alerts, see the platform Alert Management documentation.