Monitoring, Metrics, and Alerts

Use the platform monitoring dashboards and alerting policies to observe RabbitMQ resource usage, broker health, queue growth, and traffic patterns. Monitoring should answer two different questions:

Is the instance healthy enough to accept traffic?
Is the workload keeping up with its message volume and retention targets?

Metrics Collection

The platform collects RabbitMQ metrics by default. The embedded monitoring panel presents the most common broker and resource signals for operational inspection and performance tuning.

By default, the broker exposes Prometheus-format metrics through the `rabbitmq_prometheus` plugin on port 15692, at the `/metrics` path. If standalone exporter integration is enabled (`spec.exporter.enabled=true`), metrics are also available from the exporter service on port 9419.
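Both endpoints serve the Prometheus text exposition format. The following Python sketch fetches that text and parses simple samples into a dictionary so you can inspect individual values. It assumes the endpoint is reachable locally at the hypothetical URL `http://localhost:15692/metrics`, for example through a port-forward. The parser handles only the basic `name{labels} value` line form. It does not handle escaped label values, label values containing commas or braces, or exemplars.

```python
import urllib.request


def parse_prometheus_text(text):
    """Parse simple Prometheus text-format samples into {(name, labels): value}.

    Sketch only: skips comment lines (# HELP / # TYPE), assumes label values
    contain no commas, braces, or escaped quotes, and drops optional timestamps.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            label_str, value_part = rest.split("}", 1)
            pairs = (p.split("=", 1) for p in label_str.split(",") if p.strip())
            # Labels are sorted so the key does not depend on exposition order.
            labels = tuple(sorted((k.strip(), v.strip().strip('"')) for k, v in pairs))
        else:
            name, value_part = line.split(None, 1)
            labels = ()
        # The first token after the name/labels is the value; a timestamp may follow.
        samples[(name.strip(), labels)] = float(value_part.split()[0])
    return samples


def fetch_metrics(url="http://localhost:15692/metrics", timeout=5):
    # Hypothetical URL: assumes a port-forward to the broker's Prometheus listener.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

For example, `kubectl -n <namespace> port-forward pod/<instance-name>-server-0 15692:15692` makes the broker endpoint reachable locally while `fetch_metrics()` runs. Point the URL at port 9419 to read the exporter instead.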

Verify that the broker's Prometheus listener (port 15692) is present:

kubectl -n <namespace> exec <instance-name>-server-0 -- \
  rabbitmq-diagnostics listeners

Verify that the Service exposes the expected ports:

kubectl -n <namespace> get svc <instance-name>
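To script the Service check, request the Service as JSON with `kubectl -n <namespace> get svc <instance-name> -o json` and compare `spec.ports[].port` against the ports you expect. The Python sketch below does that comparison. The function names are hypothetical, and the caller decides which ports are expected (for example 15692, plus 9419 on the exporter service when the exporter is enabled).

```python
import json
import subprocess


def service_ports(service):
    """Return the set of port numbers declared in a Service's spec.ports."""
    return {p["port"] for p in service.get("spec", {}).get("ports", [])}


def missing_ports(service, expected):
    """Return the expected ports that the Service does not expose, sorted."""
    return sorted(set(expected) - service_ports(service))


def load_service(namespace, name):
    # Equivalent to: kubectl -n <namespace> get svc <name> -o json
    out = subprocess.run(
        ["kubectl", "-n", namespace, "get", "svc", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

For example, `missing_ports(load_service("<namespace>", "<instance-name>"), [15692])` returns an empty list when the Service exposes the broker metrics port.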

Key Metric Categories

Category	What to Watch	Why It Matters
Instance availability	`RabbitmqCluster` phase, readiness, and service reachability	Shows whether the cluster is available from the platform and network perspective.
Connections and channels	Connection count, channel count, consumer count	Sudden drops or spikes usually indicate client-side failures or connection storms.
Publish and delivery rates	Message ingress, delivery, acknowledgement, and redelivery rates	Shows whether producers and consumers remain in balance.
Queue backlog	`messages_ready`, `messages_unacknowledged`, queue count	Indicates consumer lag, stuck consumers, or retry loops.
Memory	Memory usage, high watermark, memory alarms	Memory alarms can block publishers and reduce throughput.
Disk	Free disk space and disk alarms	Disk alarms can block publishers and indicate a backlog or storage sizing problem.
File descriptors and sockets	Used file descriptors and socket count	Exhausting file descriptors prevents the broker from accepting new connections.
Plugins and listeners	Enabled plugins (Management, Prometheus, Shovel, Federation) and active listeners (AMQP, TLS, management, Prometheus)	Confirms that operational features remain enabled after updates.

By default, RabbitMQ exports aggregated, broker-level metrics rather than per-queue or per-exchange series. When you need detailed per-queue or per-exchange analysis, combine broker metrics with application metrics, queue inspection commands, and workload-specific dashboards.
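Several rows in the Key Metric Categories table compare a usage gauge with a limit. The Python sketch below derives memory, disk, and file-descriptor headroom from a `{metric_name: value}` dictionary. The metric names follow the RabbitMQ Prometheus plugin's naming, but treat them as assumptions and confirm them against your broker's `/metrics` output. The sample values used with it are illustrative only.

```python
# Assumed metric names: verify them against your broker's /metrics output.
MEMORY_USED = "rabbitmq_process_resident_memory_bytes"
MEMORY_LIMIT = "rabbitmq_resident_memory_limit_bytes"  # memory high watermark in bytes
DISK_FREE = "rabbitmq_disk_space_available_bytes"
DISK_FREE_LIMIT = "rabbitmq_disk_space_available_limit_bytes"
FDS_OPEN = "rabbitmq_process_open_fds"
FDS_MAX = "rabbitmq_process_max_fds"


def resource_report(gauges):
    """Summarize resource pressure from a {metric_name: value} dict."""
    return {
        # Fraction of the memory high watermark in use; reaching 1.0 means the
        # memory alarm threshold has been reached.
        "memory_of_watermark": gauges[MEMORY_USED] / gauges[MEMORY_LIMIT],
        # Free disk space as a multiple of the disk free limit; below 1.0 means
        # the disk alarm condition is met.
        "disk_free_over_limit": gauges[DISK_FREE] / gauges[DISK_FREE_LIMIT],
        # Fraction of the process file-descriptor limit in use.
        "fd_utilization": gauges[FDS_OPEN] / gauges[FDS_MAX],
    }
```

A report like this separates "close to an alarm" from "already in alarm", which the raw byte counts on a dashboard do not make obvious.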

Recommended Alert Signals

Create or tune alert policies for the following conditions:

Signal	Recommended Starting Point	Operational Meaning
Instance availability	`!= 1` for 30 seconds	The cluster is not fully available to clients.
Node memory utilization	`> 80%` for 30 seconds	Memory pressure is rising and can trigger broker alarms.
Node storage utilization	`> 80%` for 30 seconds	Disk usage is approaching the level at which the broker raises a disk alarm and blocks publishers.
Channel count	Threshold based on application design	Sudden growth can indicate connection churn or channel leaks.
Connection count	Threshold based on application design	Large drops or spikes often indicate client or network problems.
Message write frequency	Threshold based on expected workload	A drop can indicate producer issues; a spike can require scaling or backlog controls.
Queue backlog	Sustained growth in ready or unacknowledged messages	Consumers are not keeping up or retry patterns are unhealthy.
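Each starting point pairs a threshold with a duration. The condition must hold continuously for that long before the alert fires, which filters out brief spikes. The Python sketch below evaluates that rule over timestamped samples. It is a simplified model of a Prometheus-style `for` duration, not the platform's alert engine, and it assumes samples arrive sorted by time.

```python
def first_firing_time(samples, breached, hold_seconds):
    """Return the timestamp at which the alert would fire, or None.

    samples: list of (timestamp_seconds, value), sorted by timestamp.
    breached: predicate returning True when a value violates the threshold.
    hold_seconds: how long the breach must persist, for example 30.
    """
    breach_start = None
    for ts, value in samples:
        if breached(value):
            if breach_start is None:
                breach_start = ts  # condition became active ("pending")
            if ts - breach_start >= hold_seconds:
                return ts  # condition held long enough ("firing")
        else:
            breach_start = None  # any healthy sample resets the timer
    return None
```

For example, `first_firing_time(samples, lambda v: v > 0.8, 30)` models the memory row, and `lambda v: v != 1` models the availability row. A single healthy sample resets the timer, so a short spike that recovers within the window never fires.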

How to Interpret Common Signals

Signal	Interpretation	Recommended Follow-up
Disk alarm is active	The broker is protecting itself because free disk space has fallen below the configured disk free limit. Publishers are blocked until space is freed.	Inspect backlog, storage usage, and message retention policies.
Memory alarm is active	The broker has exceeded its memory high watermark and blocks publishing connections until memory use drops.	Check connection churn, queue growth, and memory-heavy workloads.
`messages_ready` grows	Consumers are not draining the queue fast enough.	Check consumer health, scaling, backlog controls, and retry topology.
`messages_unacknowledged` grows	Consumers are holding deliveries longer than expected.	Check consumer latency, prefetch, downstream dependencies, and stuck consumers.
Connection count drops sharply	Producers or consumers lost connectivity.	Check service exposure, TLS, credentials, and broker alarms.
Connection count spikes sharply	Clients may be reconnecting repeatedly.	Check rolling restarts, DNS, TLS failures, and application reconnect loops.
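The two backlog rows call for different follow-ups, so it helps to tell them apart automatically. The Python sketch below fits a least-squares slope to recent samples of `messages_ready` and `messages_unacknowledged` and reports which one is growing. The `min_growth_per_second` threshold and the sample series are illustrative assumptions. Tune them to your workload.

```python
def slope_per_second(samples):
    """Least-squares slope of (timestamp_seconds, value) samples."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0


def diagnose_backlog(ready, unacked, min_growth_per_second=1.0):
    """Map growing backlog series to the follow-ups suggested in the table."""
    findings = []
    if slope_per_second(ready) > min_growth_per_second:
        findings.append("messages_ready growing: check consumer health, scaling, "
                        "backlog controls, and retry topology")
    if slope_per_second(unacked) > min_growth_per_second:
        findings.append("messages_unacknowledged growing: check consumer latency, "
                        "prefetch, downstream dependencies, and stuck consumers")
    return findings
```

A slope over a window is less sensitive to a single noisy sample than comparing the first and last values.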

Alert Policy Guidance

Go to the Application Service's Alerts > Alert Policies page to create alert policies for RabbitMQ. Built-in metrics are the fastest way to enable baseline coverage. When built-in metrics are not sufficient, create custom PromQL-based alerts and test them before you rely on them in production.
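One way to test a custom PromQL alert before relying on it is to run its expression as an instant query and check what it returns. The Python sketch below uses the Prometheus HTTP API endpoint `/api/v1/query`. The Prometheus base URL is an assumption. The example expression is also an assumption: it divides the assumed `rabbitmq_process_resident_memory_bytes` metric by `rabbitmq_resident_memory_limit_bytes`. Adapt both to what your platform actually collects.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical expression: memory in use above 80% of the high watermark.
EXAMPLE_EXPR = ("rabbitmq_process_resident_memory_bytes"
                " / rabbitmq_resident_memory_limit_bytes > 0.8")


def build_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def matching_series(response):
    """Return (labels, value) pairs from an instant-query response body."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    return [(r["metric"], float(r["value"][1])) for r in response["data"]["result"]]


def run_query(base_url, expr, timeout=10):
    # base_url is an assumption, for example "http://prometheus.example:9090".
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=timeout) as resp:
        return matching_series(json.load(resp))
```

An empty result means no series currently breach the threshold. A query error usually points to a syntax mistake or a metric name that is not being collected.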

For more information on configuring and using alerts, see the platform Alert Management documentation.
