---
title: Monitoring and KPIs for VMware RabbitMQ for Tanzu Application Service
owner: London Services
---
This topic explains how to monitor the health of the <%= vars.product_full %>
service using the logs, metrics, and Key Performance Indicators (KPIs) generated by
<%= vars.product_short %> component VMs.
<p class="note">
  <strong>Note:</strong> As of <%= vars.product_short %> v2.0, the <code>rabbitmq_prometheus</code>
  plug-in provides RabbitMQ server metrics.
Consequently, many metric names change after upgrading to <%= vars.product_short %> v2.0.
Both on-demand and pre-provisioned <%= vars.product_short %> are affected.
For a list of the changes made in v2.0 to metric names, see
<a href="./migrate-2-0-metrics.html">Migrating Metrics from <%= vars.product_short %> v1.x to v2.0</a>.
</p>
## <a id="metrics"></a> Metrics
Metrics are regularly collected log entries that report measured component states.
You can consume metrics either through the Loggregator subsystem or by configuring a Prometheus server
or the Healthwatch tile.
The Loggregator subsystem collects metrics automatically based on the metrics polling interval.
Prometheus servers and the Healthwatch tile directly scrape the VMs deployed by <%= vars.product_short %>.
The RabbitMQ servers expose the same information in each case.
For a full list of all metrics exposed in pre-provisioned and on-demand service
instances of <%= vars.product_short %>, see the [Component Metrics Reference](#reference)
later in this topic.
<p class="note">
<strong>Note:</strong> As of <%= vars.product_short %> v2.0, the format of the metrics has changed.
For a list of the changes to metric names in <%= vars.product_short %> v2.0, see
<a href="./migrate-2-0-metrics.html">Migrating Metrics from <%= vars.product_short %> v1.x to v2.0</a>.
</p>
### <a id="loggregator"></a> Collecting Metrics with the Loggregator System
Loggregator-collected metrics are long, single lines of text that follow the format:
```
origin:"p-rabbitmq" eventType:ValueMetric timestamp:1616427704616569016 deployment:"cf-rabbitmq" job:"rabbitmq-broker" index:"0" ip:"10.0.4.101" tags:<key:"instance_id" value:"d4b4fd51-50de-4227-a96f-8ce636960f0b" > tags:<key:"source_id" value:"rabbitmq-broker" > valueMetric:<name:"_p_rabbitmq_service_broker_heartbeat" value:1 unit:"boolean" >
```
If the <code>rabbitmq_prometheus</code> plug-in is enabled, <%= vars.product_short %> automatically
collects these metrics and forwards them to the Loggregator system.
For general information about logging and metrics in <%= vars.app_runtime_full %>
and how to consume the metrics from the Loggregator system, see
[Overview of Logging and Metrics](https://docs.pivotal.io/application-service/loggregator/data-sources.html).
#### <a id="metrics-polling-interval"></a> Configure the Metrics Polling Interval
The default metrics polling interval for Loggregator is 30 seconds.
The **metrics polling interval** is a configuration option on the <%= vars.product_short %> tile
(**Settings** > **RabbitMQ**). Setting this interval to -1 deactivates metrics.
The interval setting applies to all components deployed by the tile.
To configure the metrics polling interval:
1. From the <%= vars.ops_manager %> Installation Dashboard, click the <%= vars.product_short %> tile.
1. In the <%= vars.product_short %> tile, click the **Settings** tab.
1. Click **Metrics**.
![Screenshot of the RabbitMQ tile with header
'Metrics settings for both Pre-Provisioned and On-Demand service offerings'.
The fields shown are described in the table in the step.](images/metrics-configuration.png)
1. Configure the fields on the **Metrics** pane as follows:
<table class="nice">
<th>Field</th>
<th>Description</th>
<tr>
<td><strong>Metrics polling interval</strong></td>
<td>
The default setting is 30 seconds for all deployed components.
VMware recommends that you do not change this interval.
To avoid overwhelming components, do not set this below 10 seconds.
Set this to -1 to deactivate metrics.
Changing this setting affects all deployed instances.
</td>
</tr>
</table>
1. Click **Save**.
1. Return to the <%= vars.ops_manager %> Installation Dashboard.
1. Click **Review Pending Changes**.
For more information about this <%= vars.ops_manager %> page,
see [Reviewing Pending Product Changes](https://docs.pivotal.io/ops-manager/install/review-pending-changes.html).
1. Click **Apply Changes** to redeploy with the changes.
#### <a id="detailed-metrics"></a> Gathering Additional Metrics
As of <%= vars.product_short %> v2.0.11, in addition to the standard RabbitMQ server
metrics gathered by <%= vars.product_short %>, you can gather additional, detailed metrics for your system.
For more information about the additional metrics, see
[rabbitmq-server](https://github.com/rabbitmq/rabbitmq-server/tree/master/deps/rabbitmq_prometheus#selective-querying-of-per-object-metrics) in GitHub.
To limit the performance impact of gathering more data, you can choose to gather
additional metrics only for specific vhosts, or to generate only a subset of these metrics.
The process for configuring additional metrics differs between the two service offerings:
- **For the on-demand offering:** You configure additional metrics when creating
or updating a service instance.
For more information, see [Collect Additional RabbitMQ Metrics in Loggregator (on-demand instances)](use.html#detailed-metrics).
- **For the pre-provisioned offering:** You configure additional metrics in <%= vars.ops_manager %>.
For more information, see [Collect Additional RabbitMQ Metrics in Loggregator (pre-provisioned instances)](install-config-pp.html#detailed-metrics).
### <a id="prometheus"></a> Collecting Metrics with Prometheus
Prometheus-style metrics are available at `SERVICE-INSTANCE-ID:15692/metrics`.
To pull these metrics from the service instances, you must deploy and configure a Prometheus instance.
For more information about the plug-in and about monitoring RabbitMQ by using Prometheus and Grafana, see the
[RabbitMQ documentation](https://www.rabbitmq.com/prometheus.html).
The following Prometheus scrape config dynamically discovers RabbitMQ instances:
```
job_name: rabbitmq
metrics_path: "/metrics"
scheme: http
dns_sd_configs:
- names:
  - q-s4.rabbitmq-server.*.*.bosh.
  type: A
  port: 15692
```
<p class="note">
<strong>Note:</strong> If you are using TLS in the on-demand service offering,
your port will be <code>15691</code>.
</p>
The wildcard characters in the BOSH DNS name ensure that Prometheus also discovers any future service
instances.
If Prometheus is deployed with the Healthwatch v2 tile, then the above configuration is automatically applied.
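For reference, the following is a minimal sketch of a complete `prometheus.yml` that embeds this job under `scrape_configs:`. The global scrape interval and the TLS comment are illustrative assumptions; adjust them for your environment.
```
# Illustrative prometheus.yml sketch (assumed values, not a required configuration).
global:
  scrape_interval: 30s        # assumption: aligned with the tile's default metrics polling interval
scrape_configs:
- job_name: rabbitmq
  metrics_path: "/metrics"
  scheme: http                # for TLS-enabled on-demand instances, use "https" and port 15691
  dns_sd_configs:
  - names:
    - q-s4.rabbitmq-server.*.*.bosh.
    type: A
    port: 15692
```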
<p class="note">
<strong>Note:</strong> By default, metrics are aggregated.
This results in a lower performance overhead at the cost of lower data fidelity.
For more information, see the
<a href="https://www.rabbitmq.com/prometheus.html#metric-aggregation">RabbitMQ documentation</a>.
</p>
#### <a id="per-object"></a> Scrape Per-Object Metrics
To collect metrics on a per-object scope, such as per-queue, do one of the following:
- Enable per-object metrics by setting `prometheus.return_per_object_metrics = true`.
For instructions, see [Expert Mode: Overriding RabbitMQ Server Configuration](./expert-override-config.html).
- Scrape the dedicated per-object metrics endpoint, for example:
```
job_name: rabbitmq
metrics_path: "/metrics/per-object"
scheme: http
dns_sd_configs:
- names:
  - q-s4.rabbitmq-server.*.*.bosh.
  type: A
  port: 15692
```
<p class="note">
<strong>Note:</strong> Collecting per-object metrics on a system with many objects,
such as queues or connections, is very slow.
Ensure you understand the impact on your system and its load before enabling
this on a production cluster.
</p>
#### <a id="filter-per-object"></a> Filter the Per-Object Metrics
As of <%= vars.product_short %> v2.0.7, you can collect only the per-object metrics for
specific metric families.
This decreases the performance overhead while retaining data fidelity for the metrics that you are interested in.
For more information, see [Selective querying of per-object metrics](https://github.com/rabbitmq/rabbitmq-server/tree/master/deps/rabbitmq_prometheus#selective-querying-of-per-object-metrics).
For example, the following scrape config collects only the per-object metrics that show how
many messages sit in each queue and how many consumers each of these queues has:
```
job_name: rabbitmq
metrics_path: "/metrics/detailed?family=queue_coarse_metrics&family=queue_consumer_count"
scheme: http
dns_sd_configs:
- names:
  - q-s4.rabbitmq-server.*.*.bosh.
  type: A
  port: 15692
```
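The detailed endpoint also accepts `vhost` query parameters, so you can restrict per-object metrics to specific vhosts by changing only the `metrics_path` in the scrape config above. The vhost name in this sketch is a placeholder:
```
metrics_path: "/metrics/detailed?vhost=YOUR-VHOST&family=queue_coarse_metrics"
```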
### <a id="Grafana"></a> Grafana Dashboards
The RabbitMQ team has written dashboards that you can import into Grafana.
These dashboards include documentation for each metric.
* **[RabbitMQ-Overview](https://grafana.com/grafana/dashboards/10991):**
Dashboard for an overview of the RabbitMQ system
* **[Erlang-Distribution](https://grafana.com/grafana/dashboards/11352):**
Dashboard for the underlying Erlang distribution
For more information about these dashboards, see the
[RabbitMQ documentation](https://www.rabbitmq.com/prometheus.html).
If Grafana is deployed using the Healthwatch v2 tile, you can load these dashboards by selecting the
**Enable RabbitMQ dashboards** checkbox in the Healthwatch tile.
### <a id="heartbeats"></a> Component Heartbeats
Some components periodically emit Boolean heartbeat metrics to the Loggregator system.
<code>1</code> means the system is available, and <code>0</code> or the absence of a heartbeat metric
means the service is not responding and you must investigate the issue.
#### <a id="broker-heartbeat"></a> Service Broker Heartbeat
<table>
<tr><th colspan="2" style="text-align: center;"><br>_p_rabbitmq_service_broker_heartbeat<br><br></th></tr>
<tr>
<th width="25%">Description</th>
<td>
RabbitMQ service broker <code>is alive</code> poll that indicates if the component is
available and can respond to requests.
<br><br>
<strong>Use</strong>: If the service broker does not emit heartbeats, this indicates that it
is offline.
The service broker is required to create, update, and delete service instances, which are
critical for dependent tiles such as Spring Cloud Services and Spring Cloud Data Flow.
<br><br>
<strong>Origin</strong>: Doppler/Firehose<br>
<strong>Type</strong>: Boolean<br>
<strong>Frequency</strong>: 30 seconds (default), 10 seconds (configurable minimum)
</td>
</tr>
<tr>
<th>Recommended measurement</th>
<td>Average over last 5 minutes</td>
</tr>
<tr>
<th>Recommended alert thresholds</th>
<td>
<strong>Yellow warning</strong>: N/A<br>
<strong>Red critical</strong>: < 1
</td>
</tr>
<tr>
<th>Recommended response</th>
<td>
Search the RabbitMQ service broker logs for errors.
You can find this VM by targeting your <%= vars.product_short %> deployment
with BOSH, and running one of these commands:
<ul>
<li><strong>For on-demand:</strong> <pre class="terminal">bosh -d service-instance_GUID vms</pre></li>
<li><strong>For pre-provisioned:</strong> <pre class="terminal">bosh -d p-rabbitmq-GUID vms</pre></li>
</ul>
</td>
</tr>
</table>
#### <a id="haproxy-heartbeat"></a> HAProxy Heartbeat
<p class="note">
<strong>Note:</strong> The HAProxy is only used in the pre-provisioned service
offering, so HAProxy heartbeats are only present if this service offering is enabled.
</p>
<table>
<tr><th colspan="2" style="text-align: center;"><br> _p_rabbitmq_haproxy_heartbeat<br><br></th></tr>
<tr>
<th width="25%">Description</th>
<td>
RabbitMQ HAProxy <code>is alive</code> poll, which indicates if the
component is available and can respond to requests.
<br><br>
<strong>Use</strong>: If the HAProxy does not emit heartbeats, this indicates
that it is offline. To be functional, pre-provisioned service instances require HAProxy.
<br><br>
<strong>Origin</strong>: Doppler/Firehose<br>
<strong>Type</strong>: Boolean<br>
<strong>Frequency</strong>: 30 seconds (default), 10 seconds (configurable minimum)
</td>
</tr>
<tr>
<th>Recommended measurement</th>
<td>Average over last 5 minutes</td>
</tr>
<tr>
<th>Recommended alert thresholds</th>
<td>
<strong>Yellow warning</strong>: N/A<br>
<strong>Red critical</strong>: < 1
</td>
</tr>
<tr>
<th>Recommended response</th>
<td>
Search the RabbitMQ HAProxy logs for errors.
You can find the VM by targeting your <%= vars.product_short %> deployment
with BOSH and running the following command, which lists <code>HAProxy_GUID</code>:
<pre class="terminal">bosh -d service-instance_GUID vms</pre>
</td>
</tr>
</table>
### <a id="kpi"></a> Key Performance Indicators
The following sections describe the metrics used as Key Performance Indicators (KPIs)
and other useful metrics for monitoring the <%= vars.product_short %> service.
KPIs for <%= vars.product_short %> are metrics that operators find most
useful for monitoring their <%= vars.product_short %> service to ensure smooth operation.
KPIs are high-signal-value metrics that can indicate emerging issues.
KPIs can be raw component metrics or derived metrics generated by applying formulas to raw metrics.
VMware provides the following KPIs as general alerting and response guidance for typical
<%= vars.product_short %> installations.
VMware recommends the following to operators:
- Continue to fine-tune the alert measures to your installation by observing historical trends.
- Expand beyond the guidance and create new, installation-specific monitoring metrics,
thresholds, and alerts based on learning from your own installation.
For a list of all <%= vars.product_short %> raw component metrics, see
[Component Metrics Reference](#reference) later in this topic.
#### <a id="kpi-heartbeat"></a> Component Heartbeats
If you collect metrics by using Loggregator, several components in <%= vars.product_short %> emit heartbeat
metrics. For more information, see [Component Heartbeats](#heartbeats) earlier in this topic.
#### <a id="file-descriptors"></a> RabbitMQ Server File Descriptors
<table>
<tr><th colspan="2" style="text-align: center;"><br> rabbitmq_process_open_fds<br><br></th></tr>
<tr>
<th width="25%">Description</th>
<td>
The number of file descriptors consumed.
<br><br>
<strong>Use</strong>: If the number of file descriptors consumed becomes too large,
the VM might lose the ability to perform disk I/O, which can cause data loss.
<p class="note">
<strong>Note:</strong> Nonpersistent messages are handled by retries or some other
logic by the producers.
</p>
<strong>Origin</strong>: Doppler/Firehose<br>
<strong>Type</strong>: Count<br>
<strong>Frequency</strong>: 30 seconds (default), 10 seconds (configurable minimum)
</td>
</tr>
<tr>
<th>Recommended measurement</th>
<td>Average over last 10 minutes</td>
</tr>
<tr>
<th>Recommended alert thresholds</th>
<td><strong>Yellow warning</strong>: > 250000 <br>
<strong>Red critical</strong>: > 280000</td>
</tr>
<tr>
<th>Recommended response</th>
<td>
The default <code>ulimit</code> for <%= vars.product_short %> is 300,000.
If this metric meets or exceeds the recommended thresholds for extended
periods of time, consider reducing the load on the server.
</td>
</tr>
</table>
#### <a id="erlang-processes"></a> Erlang Processes
<table>
<tr><th colspan="2" style="text-align: center;"><br> erlang_vm_process_count<br><br></th></tr>
<tr>
<th width="25%">Description</th>
<td>
The number of Erlang processes that RabbitMQ consumes. RabbitMQ runs on an Erlang VM.
For more information, see the <a href="https://www.erlang.org/docs">Erlang Documentation</a>.
<br><br>
<strong>Use</strong>: This is the key indicator of the processing capability of a node.
<br><br>
<strong>Origin</strong>: Doppler/Firehose<br>
<strong>Type</strong>: Count<br>
<strong>Frequency</strong>: 30 seconds (default), 10 seconds (configurable minimum)
</td>
</tr>
<tr>
<th>Recommended measurement</th>
<td>Average over last 10 minutes</td>
</tr>
<tr>
<th>Recommended alert thresholds</th>
<td>
<strong>Yellow warning</strong>: > 900000<br>
<strong>Red critical</strong>: > 950000
</td>
</tr>
<tr>
<th>Recommended response</th>
<td>
The default Erlang process limit in <%= vars.product_short %> v1.6 and later is 1,048,816.
If this metric meets or exceeds the recommended thresholds for extended
periods of time, consider scaling the RabbitMQ nodes in the tile <strong>Resource Config</strong> pane.
</td>
</tr>
</table>
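If you scrape these metrics with Prometheus, you can turn the thresholds in the two tables above into alerting rules. The following rules file is only a sketch: the group name, severity labels, and durations are assumptions to adapt to your own alerting setup.
```
# Illustrative Prometheus alerting rules based on the recommended thresholds above.
groups:
- name: rabbitmq-kpis                    # assumed group name
  rules:
  - alert: RabbitMQHighFileDescriptorUsage
    expr: avg_over_time(rabbitmq_process_open_fds[10m]) > 250000
    for: 10m
    labels:
      severity: warning                  # assumed label scheme
    annotations:
      summary: "RabbitMQ node is approaching the default 300,000 file descriptor limit"
  - alert: RabbitMQHighErlangProcessCount
    expr: avg_over_time(erlang_vm_process_count[10m]) > 900000
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "RabbitMQ node is approaching the Erlang process limit"
```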
### <a id="bosh"></a> BOSH System Health Metrics
<%# The below partial is in https://github.com/pivotal-cf/docs-partials %>
<%= partial vars.path_to_partials + '/services/bosh_health_metrics_pcf2' %>
All BOSH-deployed components generate the system health metrics listed in this section.
These component metrics are from <%= vars.product_short %> components, and serve as KPIs for
the <%= vars.product_short %> service.
#### <a id="ram"></a> RAM
<table>
<tr><th colspan="2" style="text-align: center;"><br> system_mem_percent <br><br></th></tr>
<tr>
<th width="25%">Description</th>
<td>
RAM being consumed by the <code>p.rabbitmq</code> VM.
<br><br>
<strong>Use</strong>: RabbitMQ is considered to be in a good state when it has few or no messages.
In other words, "an empty rabbit is a happy rabbit."
An alert on this metric can indicate that there are too few consumers or apps
reading messages from the queue.
<br><br>
Healthmonitor reports when RabbitMQ uses more than 40% of its RAM for the past ten minutes.
<br><br>
<strong>Origin</strong>: BOSH HM<br>
<strong>Type</strong>: Percent<br>
<strong>Frequency</strong>: 30 seconds (default), 10 seconds (configurable minimum)
</td>
</tr>
<tr>
<th>Recommended measurement</th>
<td>Average over last 10 minutes</td>
</tr>
<tr>
<th>Recommended alert thresholds</th>
<td>
<strong>Yellow warning</strong>: > 40 <br>
<strong>Red critical</strong>: > 50
</td>
</tr>
<tr>
<th>Recommended response</th>
<td>Add more consumers to drain the queue as fast as possible.</td>
</tr>
</table>
#### <a id="cpu"></a> CPU
<table>
<tr><th colspan="2" style="text-align: center;"><br> system_cpu_user<br><br></th></tr>
<tr>
<th width="25%">Description</th>
<td>CPU being consumed by user processes on the <code>p.rabbitmq</code> VM.<br><br>
<strong>Use</strong>: A node that experiences excessive context switching or high CPU usage can become unresponsive.
This also affects the ability of the node to report metrics.
<br><br>
Healthmonitor reports when RabbitMQ uses more than 40% of its CPU for the past ten minutes.
<br><br>
<strong>Origin</strong>: BOSH HM<br>
<strong>Type</strong>: Percent<br>
<strong>Frequency</strong>: 30 seconds (default), 10 seconds (configurable minimum)
</td>
</tr>
<tr>
<th>Recommended measurement</th>
<td>Average over last 10 minutes</td>
</tr>
<tr>
<th>Recommended alert thresholds</th>
<td>
<strong>Yellow warning</strong>: > 60 <br>
<strong>Red critical</strong>: > 75
</td>
</tr>
<tr>
<th>Recommended response</th>
<td>
Remember that "an empty rabbit is a happy rabbit". Add more consumers to drain the queue as fast as possible.
</td>
</tr>
</table>
#### <a id="ephemeral-disk"></a> Ephemeral Disk
<table>
<tr><th colspan="2" style="text-align: center;"><br> system_disk_ephemeral_percent<br><br></th></tr>
<tr>
<th width="25%">Description</th>
<td>
Ephemeral Disk being consumed by the <code>p.rabbitmq</code> VM.
<br><br>
<strong>Use</strong>: If the system disk fills up, it usually means that there are too few consumers draining the queues.
<br><br>
Healthmonitor reports when RabbitMQ uses more than 50% of its Ephemeral Disk for the past ten minutes.
<br><br>
<strong>Origin</strong>: BOSH HM<br>
<strong>Type</strong>: Percent<br>
<strong>Frequency</strong>: 30 seconds (default), 10 seconds (configurable minimum)
</td>
</tr>
<tr>
<th>Recommended measurement</th>
<td>Average over last 10 minutes</td>
</tr>
<tr>
<th>Recommended alert thresholds</th>
<td>
<strong>Yellow warning</strong>: > 50 <br>
<strong>Red critical</strong>: > 75
</td>
</tr>
<tr>
<th>Recommended response</th>
<td>
Remember that "an empty rabbit is a happy rabbit". Add more consumers to drain the queue as
fast as possible. Insufficient disk space leads to node failures and might result in data
loss due to all disk writes failing.
</td>
</tr>
</table>
#### <a id="persistent-disk"></a> Persistent Disk
<table>
<tr><th colspan="2" style="text-align: center;"><br> system_disk_persistent_percent<br><br></th></tr>
<tr>
<th width="25%">Description</th>
<td>
Persistent Disk being consumed by the <code>p.rabbitmq</code> VM.<br><br>
<strong>Use</strong>: If the system disk fills up, it usually means that there are too few consumers draining the queues.
<br><br>
Healthmonitor reports when RabbitMQ uses more than 50% of its Persistent Disk.
<br><br>
<strong>Origin</strong>: BOSH HM<br>
<strong>Type</strong>: Percent<br>
<strong>Frequency</strong>: 30 seconds (default), 10 seconds (configurable minimum)
</td>
</tr>
<tr>
<th>Recommended measurement</th>
<td>Average over last 10 minutes</td>
</tr>
<tr>
<th>Recommended alert thresholds</th>
<td>
<strong>Yellow warning</strong>: > 50 <br>
<strong>Red critical</strong>: > 75
</td>
</tr>
<tr>
<th>Recommended response</th>
<td>
Remember that "an empty rabbit is a happy rabbit". Add more consumers to drain the queue as fast as possible. Insufficient disk space leads to node failures and might result in data loss due to all disk writes failing.
</td>
</tr>
</table>
## <a id="logging"></a> Logging
You can configure <%= vars.product_short %> to forward logs to an external syslog server and customize the format
of the log output.
### <a id="syslog-forwarding"></a> Configure Syslog Forwarding
Syslog forwarding is preconfigured and enabled by default.
VMware recommends that you keep the default setting because it is good operational practice.
However, you can opt out by selecting **No** for **Do you want to configure syslog?** in the
<%= vars.ops_manager %> **Settings** tab.
To enable monitoring for <%= vars.product_short %>, operators designate an external syslog endpoint
for <%= vars.product_short %> component log entries.
The endpoint serves as the input to a monitoring platform such as Datadog, Papertrail, or SumoLogic.
To specify the destination for <%= vars.product_short %> log entries:
1. From the <%= vars.ops_manager %> Installation Dashboard, click the <%= vars.product_short %> tile.
1. In the <%= vars.product_short %> tile, click the **Settings** tab.
1. Click **Syslog**.
![Screenshot of RabbitMQ tile settings with header called 'Syslog'.
The fields shown are described in the table in the next step.](images/syslog-config.png)
1. Configure the fields on the **Syslog** pane as follows:
<table class="nice">
<th>Field</th>
<th>Description</th>
<tr>
<td><strong>Syslog Address</strong></td>
<td>Enter the IP or DNS address of the syslog server</td>
</tr>
<tr>
<td><strong>Syslog Port</strong></td>
<td>Enter the port of the syslog server</td>
</tr>
<tr>
<td><strong>Transport Protocol</strong></td>
<td>Select the transport protocol of the syslog server. The options are <strong>TLS</strong>,
<strong>UDP</strong>, or <strong>RELP</strong>.</td>
</tr>
<tr>
<td><strong>Enable TLS</strong></td>
<td>Enable TLS to the syslog server.</td>
</tr>
<tr>
<td><strong>Permitted Peer</strong></td>
<td>If there are several peer servers that can respond to remote syslog connections,
enter a wildcard in the domain, such as <code>*.example.com</code>.</td>
</tr>
<tr>
<td><strong>SSL Certificate</strong></td>
<td>If the server certificate is not signed by a known authority, such as an internal syslog
server, enter the CA certificate of the log management service endpoint.</td>
</tr>
<tr>
<td><strong>Queue Size</strong></td>
<td>The number of log entries the buffer holds before dropping messages.
A larger buffer size might overload the system. The default is 100000.</td>
</tr>
<tr>
<td><strong>Forward Debug Logs</strong></td>
<td>Some components produce very long debug logs. This option prevents them from being
forwarded.
These logs are still written to local disk.</td>
</tr>
<tr>
<td><strong>Custom Rules</strong></td>
<td>
The custom rsyslog rules are written in
<a href="https://www.rsyslog.com/doc/v8-stable/rainerscript/index.html">RainerScript</a>
and are inserted before the rule that forwards logs.
For the list of custom rules you can add in this field, see
<a href="#rabbitmq-syslog-custom-rules">RabbitMQ Syslog Custom Rules</a> later in this topic.
For more information about the program names you can use in the custom rules, see
<a href="#program-names">RabbitMQ Program Names</a> later in this topic.
</td>
</tr>
</table>
1. Click **Save**.
1. Return to the <%= vars.ops_manager %> Installation Dashboard.
1. Click **Review Pending Changes**.
For more information about this <%= vars.ops_manager %> page,
see [Reviewing Pending Product Changes](https://docs.pivotal.io/ops-manager/install/review-pending-changes.html).
1. Click **Apply Changes** to redeploy with the changes.
### <a id="log-format"></a> Logging Format
With <%= vars.product_short %> logging configured, several types of components generate logs:
the RabbitMQ message server nodes, the service brokers, and (if present) HAProxy.
* The logs for RabbitMQ server nodes follow the format:
```
[job:"rabbitmq-server" ip:"192.0.2.0"]
```
* The logs for the pre-provisioned RabbitMQ service broker follow the format:
```
[job:"rabbitmq-broker" ip:"192.0.2.1"]
```
* The logs for the on-demand RabbitMQ service broker follow the format:
```
[job:"on-demand-broker" ip:"192.0.2.2"]
```
* The logs for HAProxy nodes follow the format:
```
[job:"rabbitmq-haproxy" ip:"192.0.2.3"]
```
RabbitMQ and HAProxy servers log at the <code>info</code> level and capture errors,
warnings, and informational messages.
<%= partial vars.path_to_partials + '/rabbitmq/log-formats' %>
## <a id="reference"></a> Component Metrics Reference
<%= vars.product_short %> component VMs emit the following raw metrics.
<p class="note">
<strong>Note:</strong> As of <%= vars.product_short %> v2.0, the format of the metrics has changed.
For a list of the changes to metric names in <%= vars.product_short %> v2.0, see
<a href="./migrate-2-0-metrics.html">Migrating Metrics from <%= vars.product_short %> v1.x to v2.0</a>.
</p>
### <a id="rabbitmq-metrics"></a> RabbitMQ Server Metrics
RabbitMQ server metrics are emitted by the `rabbitmq_prometheus` plug-in.
The list of metrics provided is extensive, and allows full observability of your messages, VM health, and more.
For the full list of metrics emitted, see the
[rabbitmq-server](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbitmq_prometheus/metrics.md)
repository in GitHub.
### <a id="haproxy-metrics"></a>HAProxy Metrics (Pre-Provisioned Only)
<%= vars.product_short %> HAProxy components emit the following metrics.
<table>
<tr>
<th>Metric</th>
<th>Unit</th>
<th>Description</th>
</tr>
<tr>
<td><code>_p_rabbitmq_haproxy_heartbeat</code></td>
<td>Boolean</td>
<td>Indicates whether the RabbitMQ HAProxy component is available and can respond to requests</td>
</tr>
<tr>
<td><code>_p_rabbitmq_haproxy_health_connections</code></td>
<td>Count</td>
<td>The total number of concurrent front-end connections to the server</td>
</tr>
<tr>
<td><code>_p_rabbitmq_haproxy_backend_qsize_amqp</code></td>
<td>Size</td>
<td>The total size of the AMQP queue on the server</td>
</tr>
<tr>
<td><code>_p_rabbitmq_haproxy_backend_retries_amqp</code></td>
<td>Count</td>
<td>The number of AMQP retries to the server</td>
</tr>
<tr>
<td><code>_p_rabbitmq_haproxy_backend_ctime_amqp</code></td>
<td>Time</td>
<td>The total time to establish the TCP AMQP connection to the server</td>
</tr>
</table>
### <a id="odb-metrics"></a>On-Demand Broker Metrics
The <%= vars.product_short %> on-demand broker emits the following metrics.
<table>
<tr>
<th>Metric</th>
<th>Unit</th>
<th>Description</th>
</tr>
<tr>
<td><code>_on_demand_broker_p_rabbitmq_quota_remaining</code></td>
<td>Count</td>
<td>The total quota for on-demand service instances set for this broker</td>
</tr>
<tr>
<td><code>_on_demand_broker_p_rabbitmq_total_instances</code></td>
<td>Count</td>
<td>The total count of on-demand service instances created by this broker</td>
</tr>
<tr>
<td><code>_on_demand_broker_p_rabbitmq_{PLAN_NAME}_quota_remaining</code></td>
<td>Count</td>
<td>The total quota for on-demand service instances set for this broker for a specific plan</td>
</tr>
<tr>
<td><code>_on_demand_broker_p_rabbitmq_{PLAN_NAME}_total_instances</code></td>
<td>Count</td>
<td>The total count of on-demand service instances created by this broker for a specific plan</td>
</tr>
</table>