---
title: Unlocking the Power of On-Demand VMware RabbitMQ for Tanzu Application Service
owner: London Services
---
This topic explains how to benefit from the on-demand service plans.
## <a id="intro"></a> Introduction
<%= vars.product_full %> responds to the demands of operators by offering a RabbitMQ on-demand cluster
for their app developer teams, in addition to the single-node on-demand plan.
The on-demand cluster plan is designed for workloads that have the same resilience requirements as the pre-provisioned offering,
but that must also be isolated. The platform operations team can configure a
<%= vars.product_short %> cluster to meet their business requirements
and empower app development teams to self-serve their own RabbitMQ cluster.
<%= vars.product_short %> also provides smoke tests for the on-demand plans so that operations
teams can validate the app developer workflow for on-demand services.
See [Smoke Tests](./smoke-tests.html).
With the on-demand cluster plan, platform operators can offer their app developers three types of <%= vars.product_short %> service plans:
* **On-Demand single node**---For app developer teams requiring greater isolation than provided by the pre-provisioned approach.
App development teams can have full access to their own message broker to adapt the runtime parameters to their requirements.
For more information about these parameters, see [Parameters and Policies](https://www.rabbitmq.com/parameters.html) in the RabbitMQ documentation.
* **On-Demand cluster**---For an increased level of message resilience and cluster availability,
as well as the benefits of workload isolation mentioned above.
* **Pre-provisioned**---For light to moderate messaging needs. This service is fully operated and managed by the platform operations team.
For information about the pre-provisioned plan, see [Deploying the <%= vars.product_short %> Pre-Provisioned Service](./deploying.html).
For information about using pre-provisioned plans to isolate workloads, see [Creating Isolation with the Tile Replicator](./replicator.html).
## <a id="plan-selection"></a> Deciding Which Service Plan to Use
VMware recommends the on-demand service, which is designed for independent isolated <%= vars.product_short %> service instances.
The existing pre-provisioned offering has many <%= vars.product_short %> service instances on a single VM, in a multi-tenancy model.
In this multi-tenancy model, a single misbehaving app can take down the entire cluster for everyone.
From research and feedback on the issues customers had when using the pre-provisioned service,
VMware is making different design decisions for the on-demand service.
To help you decide which service to use,
the table below describes the current feature differences between the pre-provisioned and on-demand services,
and the plans for addressing these differences. Give VMware feedback
on how the on-demand service can better meet your use case requirements.
<table>
<th width="150">Feature</th><th width="250">Pre-Provisioned Service</th><th width="250">On-Demand Service</th>
<tr>
<td>Configuration</td><td>Set using a base64-encoded text box</td><td>Plan to address</td>
</tr>
<tr>
<td>Plugins</td>
<td>Tier-1 plugins enabled using checkboxes in UI.</td>
<td>A selection of tier-1 plugins are enabled by default on all instances.
Other tier-1 plugins can be optionally enabled by an app developer.
See <a href="./install-config.html#server-plugins">RabbitMQ Server Plugins</a>.</td>
</tr>
<tr>
<td>RabbitMQ admin credentials to access RabbitMQ Management UI</td>
<td>Can set password using the tile UI. For more information, see
<a href="./managing.html#access-ui-pp">Access RabbitMQ Management UI for Pre-Provisioned Service Instances</a>.</td>
<td>Can access by creating a service key. See
<a href="install-config.html#admin-service-key">Create an Admin User for a Service Instance</a>.</td>
</tr>
<tr>
<td>Erlang cookie</td>
<td>Operator can change. This might cause problems.
For more information, see <a href="./install-config-pp.html#changing-issue">Changing the Erlang Cookie Value Known Issue</a>.</td>
<td>This is managed by the service. No operator intervention needed.</td>
</tr>
<tr>
<td><%= vars.product_short %> TLS versions</td>
<td>Configurable, because of security concerns about the TLS versions packaged with the pre-provisioned service</td>
<td>All instances have only TLS v1.1 and TLS v1.2 available.</td>
</tr>
<tr>
<td>Resource Sharing</td>
<td>Yes. Service instances share resources on the same VM and can affect one another.</td>
<td>No. On-Demand ensures isolation between service instances by creating a separate VM per service instance.</td>
</tr>
<tr>
<td>External load balancer DNS name</td><td>Available</td>
<td><ul><li>Plan to address</li><li>IaaS-specific load balancers can still be used.</li></ul></td>
</tr>
<tr>
<td>Disk free alarm limit</td><td>Configurable</td><td>No plans to address.
The default persistent disk size is controlled at the plan level and is set relative to memory.
This removes the ability to misconfigure the alarm limit.</td>
</tr>
<tr>
<td>Load-balancing</td>
<td>Available using HAProxy</td>
<td>Available using BOSH DNS</td>
</tr>
<tr>
<td>RabbitMQ servers static IP</td><td>Exists</td><td>Evaluating based on customer needs and feedback</td>
</tr>
<tr>
<td>Policy for new instances</td><td>Configurable</td><td>Plan to address</td>
</tr>
<tr>
<td>Almost-instant provisioning</td>
<td>Instantly provisioned</td>
<td>
<ul><li>No plans to address.</li>
<li>Instant provisioning for on-demand is limited by the IaaS.
This limitation might be addressed by containers.</li></ul></td>
</tr>
<tr>
<td>Individual instance upgrade</td><td>Possible when using tile replicator</td><td>Plan to address</td>
</tr>
<tr>
<td>Network partition behavior</td>
<td><ul><li>Defaults to <code>pause_minority</code></li><li> Configurable</li></ul></td>
<td><ul><li>Defaults to <code>pause_minority</code></li><li> Configurable for each plan</li></ul></td>
</tr>
<tr>
<td>TLS</td><td>TLS enabled between client and RabbitMQ broker; peer validation can be configured</td><td>TLS available between client and the RabbitMQ broker.</td>
</tr>
</table>
## <a id="singlenode"></a> On-Demand Single Node Plan
This plan is designed to be simple to configure, deploy, and use.
It gives app developer teams fast access to the power of the leading open source message broker backed by BOSH
to meet all but the most demanding high availability app messaging requirements.
This plan can suit high-performance workloads requiring messaging resilience and asynchronous messaging replication.
RabbitMQ copies messages to disk for resilience and allows asynchronous messaging replication through the RabbitMQ Federation plugin.
This plan offers:
* Fast access to an isolated instance of RabbitMQ scoped for the app developer teams
* Org and space admin access to the RabbitMQ Management UI so app developer teams can have full control over the node
* Updates and upgrades initiated and controlled by the operator to keep the instance up-to-date with the latest security patches and issue fixes
* Message resilience provided through exchange and queue federation and the Shovel plugin, as sketched below
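For illustration, federation is driven by a `federation-upstream` parameter plus a policy that applies it. The following is a minimal sketch that creates both through the RabbitMQ management HTTP API, assuming the federation and management plugins are enabled. The endpoint, credentials, vhost, and upstream URI are placeholder values; substitute the values from an admin service key for your instance.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class FederationSetup {
    // All values below are hypothetical; read the real ones from a service key.
    static final String MGMT = "https://rmq.example.com:15672";
    static final String VHOST = "%2F"; // URL-encoded default vhost "/"
    static final String AUTH =
            Base64.getEncoder().encodeToString("admin:secret".getBytes());

    // Helper that PUTs a JSON document to the management API.
    static int put(HttpClient client, String path, String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(MGMT + path))
                .header("Authorization", "Basic " + AUTH)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // 1. Define the upstream broker that messages are federated from.
        int upstream = put(client,
                "/api/parameters/federation-upstream/" + VHOST + "/origin",
                "{\"value\": {\"uri\": \"amqps://user:pass@old-broker.example.com\"}}");

        // 2. Apply a policy so exchanges named fed.* federate from that upstream.
        int policy = put(client, "/api/policies/" + VHOST + "/federate-fed-exchanges",
                "{\"pattern\": \"^fed\\\\.\","
                + " \"definition\": {\"federation-upstream-set\": \"all\"},"
                + " \"apply-to\": \"exchanges\"}");

        System.out.println("upstream=" + upstream + " policy=" + policy);
    }
}
```

Dynamic shovels follow the same parameter-based pattern, using the `/api/parameters/shovel/...` endpoint.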
## <a id="cluster"></a> On-Demand Cluster Plan
Like the single node plan, this plan is designed to be simple to configure, deploy, and use.
It gives app developer teams fast access to the power of the leading open source message broker backed by BOSH
to meet all but the most demanding high availability app messaging requirements.
This plan can suit high-performance workloads requiring messaging resilience (messages copied to disk)
and asynchronous messaging replication through the RabbitMQ Federation plugin.
With this plan, however, you also scale out <%= vars.product_short %> to multiple nodes.
This plan offers:
* Fast access to an isolated, clustered instance of RabbitMQ scoped to the app developer team orgs and spaces
* Admin access to the RabbitMQ Management UI to give app developer teams full control over the cluster
* Updates and upgrades initiated and controlled by the operator to keep the instance up-to-date with the latest security patches and issue fixes
* Message resilience provided by mirroring queues across RabbitMQ nodes, and the option to use the Federation and Shovel plugins
### <a id="principles"></a> General Principles of the Cluster Plan
The following are some general principles to be aware of when configuring the cluster plan:
#### Designed for Consistency
RabbitMQ clustering is not primarily a solution for increased availability.
Instead, it is designed for consistency and partition tolerance, as described in the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem).
RabbitMQ clustering provides increased message consistency through queue mirroring:
the messages in a mirrored queue are identical on every node that hosts a mirror of that queue.
For more information, see [Consistency or Availability Tradeoff](#tradeoff).
Other options can be used for availability requirements, such as the use of federation between exchanges or queues.
For a detailed description of distributed RabbitMQ brokers, see the [RabbitMQ documentation](http://www.rabbitmq.com/distributed.html).
#### Number of Nodes
Every node in the on-demand cluster maintains a complete database of all metadata, and all changes to the metadata are confirmed by every node in the cluster.
Therefore, going beyond seven nodes can have a significant negative impact on performance.
For optimum resilience and performance, VMware recommends three nodes for most workloads.
#### <a id="latency"></a> Network Latency
RabbitMQ clusters are recommended only for deployment in low-latency networks,
which normally means that it is not advisable to deploy these clusters across availability zones (AZs).
The stability and performance of a RabbitMQ cluster are heavily influenced by the workload on the nodes, replication choices, and network latency.
For this reason, VMware recommends that you deploy RabbitMQ clusters into a single <%= vars.ops_manager %> AZ.
However, where different AZs are in the same data center and are connected by reliable low-latency links, you can span AZs.
For cloud IaaS deployments, VMware does not recommend that deployments span *regions*.
For example, in Amazon Web Services (AWS) terms, deploying a RabbitMQ cluster across AZs
within a region typically provides network performance high enough to avoid affecting cluster stability.
However, deploying across AWS regions is likely to lead to cluster instability.
For more information, see the [AWS documentation](http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/RegionsAndAZs.html).
### <a id="tradeoff"></a> Consistency or Availability Tradeoff
In a distributed messaging system, a tradeoff must be made between availability and consistency when a network partition event occurs
and one or more nodes cannot communicate with each other.
The cluster plan lets operators decide how they want the RabbitMQ cluster to react in the event of a network partition.
VMware recommends keeping the default cluster partition option of `pause_minority` because this satisfies most use cases.
Choosing the `pause_minority` partition-handling strategy favors message consistency over availability.
For more information about the options for handling partitions,
see the [RabbitMQ documentation](https://www.rabbitmq.com/partitions.html).
For a detailed description of the options available in <%= vars.product_short %>, see [Clustering and Network Partitions](./partitions.html).
Here is an example of how `pause_minority` works.
If you create a RabbitMQ cluster with three nodes and one node becomes unable to communicate with the other two, this node is in the minority.
The node that is in the minority is paused, and the other two nodes continue serving traffic.
If each of the nodes loses connectivity with the other two, the entire cluster is paused to preserve data, because no majority can be established.
The cluster *heals* when two or more nodes are able to communicate with each other.
### <a id="queues"></a> RabbitMQ Queue Availability
It is important to be aware that message queue availability is different from cluster availability:
cluster availability does not mean that all of the messages within the queues are also available.
By default, queues within a RabbitMQ cluster are located on a single node---the node on which they were first declared.
However, queues can be configured to mirror across multiple nodes,
so that any message published to the queue is replicated to all mirrors.
Enabling mirroring can have a negative impact on queue performance because messages must be copied to all mirrors before being acknowledged.
Each mirrored queue consists of one leader replica and one or more mirrors, with the oldest mirror being promoted to the new leader replica if the old leader replica disappears for any reason.
Consumers are connected to the leader replica regardless of which node they connect to, and mirrors drop messages that have been acknowledged at the leader replica.
Queue mirroring enhances queue availability,
but does not distribute load across nodes because each of the participating nodes must still do all the work.
App developers must decide whether they want to use queue mirroring and determine the policy they want to apply to their queues.
These choices have a significant impact on the availability of their queues.
For more information, see the [RabbitMQ documentation](https://www.rabbitmq.com/ha.html).
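Queue mirroring is controlled by a policy rather than by per-queue settings. As a minimal sketch, and assuming admin credentials from a service key, the following sets a mirroring policy through the management HTTP API. The endpoint, credentials, vhost, queue-name pattern, and mirror count are placeholder values to adapt to your own availability needs.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class MirrorPolicy {
    public static void main(String[] args) throws Exception {
        // Hypothetical values: take the real ones from a service key for your instance.
        String mgmt = "https://rmq.example.com:15672";
        String vhost = "%2F"; // URL-encoded default vhost "/"
        String auth = Base64.getEncoder().encodeToString("admin:secret".getBytes());

        // Mirror queues named ha.* to two nodes in total,
        // and synchronize new mirrors automatically.
        String body = "{\"pattern\": \"^ha\\\\.\","
                + " \"definition\": {\"ha-mode\": \"exactly\", \"ha-params\": 2,"
                + " \"ha-sync-mode\": \"automatic\"},"
                + " \"apply-to\": \"queues\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(mgmt + "/api/policies/" + vhost + "/ha-two-replicas"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Policy create returned HTTP " + response.statusCode());
    }
}
```

Setting `ha-mode` to `exactly` with `ha-params` of 2 keeps one leader replica plus one mirror, a middle ground between no mirroring and mirroring to every node.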
Unlike the pre-provisioned plan, the cluster plan does not ship with a default load balancer.
Therefore, developers must configure their app to use the array of hosts provided in `VCAP_SERVICES`.
If developers enable queue mirroring, they must also ensure their apps have retry
and reconnection logic that iterates over the range of hosts provided.
Most common RabbitMQ clients have this logic built in.
For more information, see the [Spring Advanced Message Queuing Protocol (Spring AMQP) documentation](https://docs.spring.io/spring-amqp/reference/html/#auto-recovery).
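As a minimal sketch of that pattern with the plain RabbitMQ Java client, the following connects using a list of node addresses and enables the client's built-in automatic recovery, so a lost connection is retried against the remaining nodes. The host names and credentials are placeholders; in a real app, read them from the `VCAP_SERVICES` environment variable.

```java
import com.rabbitmq.client.Address;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class ClusterConnection {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setUsername("user");                // credentials from VCAP_SERVICES
        factory.setPassword("pass");
        factory.setAutomaticRecoveryEnabled(true);  // reconnect after a node failure
        factory.setNetworkRecoveryInterval(5000);   // retry every 5 seconds

        // Placeholder hosts; read the real list from the hosts array in VCAP_SERVICES.
        Address[] nodes = Address.parseAddresses(
                "rmq-node-0.example.com:5672,"
              + "rmq-node-1.example.com:5672,"
              + "rmq-node-2.example.com:5672");

        // The client tries each address in turn until one accepts the connection.
        try (Connection connection = factory.newConnection(nodes)) {
            System.out.println("Connected to " + connection.getAddress());
        }
    }
}
```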
Because the cluster plan is designed so that app developer teams can self-serve, not having a load balancer in front of the RabbitMQ cluster has these benefits:
* Better resource management, because fewer VMs are needed.
* Easier troubleshooting, because the client IP is the IP of the source container rather than that of the HAProxy.
* Fewer hops between apps and the broker, which reduces latency.
* Control over queue placement, which makes sense for larger scale deployments.
* App developer teams are empowered to manage their cluster in the way that best suits their app.
* Apps that need highly available access to a queue must include retry logic; any node can then route to the queue if it is available.
## <a id="resources"></a> Managing On-Demand Resources Through Plans
When configuring each plan, platform operations teams have a number of operational controls for managing the resources consumed by on-demand RabbitMQ:
* **Control Access**---Operators can choose the app development orgs and spaces for which the plans are available and visible.
Each plan can be activated or deactivated, and service access and visibility can be either global or enabled per org and space through the command line.<br><br>
For example, you might decide to enable the single node on-demand plan across all app developer teams to meet their demand to isolate their workload.
You might then choose to offer the on-demand cluster plan only to a subset of app developer teams who require the extra resources.
* **Set Quotas**---You can set a global quota for all on-demand instances that takes precedence over each plan quota.
This lets you guard against over-committing resources overall, while still allowing each plan to be over-committed, so you can meet the fluctuating demands of your app developers.
* **Control Resource Consumption**---Each plan offers more fine-grained control over individual plan resource consumption.
At the highest level, you can use the plan quota to control the number of instances that can be deployed within a foundation.
For each plan, you can also configure the number of nodes that constitute a cluster (3, 5, or 7), the instance type, and persistent disk storage size to best suit your requirements.
* **Monitor**---You can monitor the number of instances that have been deployed against the quota you have set
so that you can plan future resource requirements.
## <a id="customizing"></a> Customizing Plan Options
The <%= vars.product_short %> on-demand plans expose a number of configuration options.
The default configurations meet most app demands.
However, it is important for an operations team to consider these options to ensure that they provide the best service to their app developers.
This section explains these options.
### <a id="configoptions"></a> Configuration Options
#### Single Node and Cluster Plans
* Activate or deactivate the plan
* Determine which orgs and spaces can see and access the plan
* Set the service instance quota
* Select AZ placement (where applicable)
* Set the <%= vars.product_short %> service instance size (CPU and memory)
* Set the persistent disk size (persisted message store) for the <%= vars.product_short %> service instance.
Ensure that the persistent disk is at least twice as large as the instance memory.
#### Cluster Plan Only
* Set number of nodes to 3, 5, or 7
* Determine network partition behavior. See [Consistency or Availability Tradeoff](#tradeoff) above.
<p class="note"><strong>Note</strong>: A load balancer, such as HAProxy, is not deployed with on-demand cluster plans.</p>
### <a id="preconfigured"></a> Things That Are Preconfigured
The following are preconfigured for both the single node and the cluster plans:
* **RabbitMQ VM Type**---When installing using <%= vars.ops_manager %>, each RabbitMQ node is configured to have the following properties:
- CPUs: 2
- RAM: 8 GB
- Ephemeral disk: 16 GB
You can change these settings in the [Service Plan Configuration](./install-config.html#service-plan) page.
Changing these settings affects all nodes.
* **Persistent Disk Type**---When installing using <%= vars.ops_manager %>,
each RabbitMQ node is configured to have 30 GB of persistent disk space.
<br>You can change this setting in the [Service Plan Configuration](./install-config.html#service-plan) page.
VMware recommends you set this value to be twice the amount of RAM of the selected **RabbitMQ VM Type**.
* **Metrics**---Emitted to the Loggregator Firehose for all on-demand instances.
The polling interval is set in <%= vars.ops_manager %>, in the **Metrics polling interval** field, in the **Pre-Provisioned RabbitMQ** tab of the <%= vars.product_short %> tile.
Due to the impact of some of the cluster settings detailed below,
VMware strongly recommends that you monitor the exposed metrics
and configure alarms as recommended in [Monitoring and KPIs for On-Demand <%= vars.product_full %>](./monitor.html#kpi).
See also [Monitoring On-Demand RabbitMQ Clusters](#monitor-clusters) below.
* **Logs**---On-demand <%= vars.product_short %> service instance logs are forwarded using the same configuration
as contained in the **Syslog and Metrics** tab of the <%= vars.product_short %> tile.
* **Disk free space limit**---The disk free space limit is set to 150% of the RAM of the instance type you select.
For example, if you select an instance type with 10 GB of RAM, the disk free space limit is set to 15 GB.
If the amount of free disk space drops below this limit, a cluster-wide alarm is triggered and all publishers are blocked.
Instances must be configured to have persistent disks that are at least twice the size of instance RAM.
For more information, see the [RabbitMQ documentation](https://www.rabbitmq.com/disk-alarms.html).
* **Memory threshold for triggering flow control**---The threshold at which flow control is triggered (the memory high watermark) is set to 40% of the instance RAM.
This means that when the alarm is triggered, all connections publishing messages are blocked cluster-wide until the alarm is cleared.
<br><br>
For example, if you select an instance type with 10 GB of RAM, when more than 4 GB of memory is used, all publishing connections are blocked.
For more information, see [Memory Alarms](https://www.rabbitmq.com/memory.html) in the RabbitMQ documentation.
* **Memory paging threshold**---This is the level at which RabbitMQ tries to free up memory by instructing queues to page their contents out to disk, in an attempt to avoid reaching the high watermark and blocking publishers.
This threshold is set to 50% of the configured high watermark, which equates to 20% of the configured memory.<br><br>
For example, if you select an instance type with 10 GB of RAM, when more than 2 GB of memory is used,
all queues start writing all queue contents to disk.
For more information, see the [RabbitMQ documentation](https://www.rabbitmq.com/memory.html).
#### <a id="monitor-clusters"></a> Monitoring On-Demand RabbitMQ Clusters
* Using the metric exposed on the Firehose, monitor the number of deployed instances and compare it against the quota you set.
* Each instance is preconfigured to emit metrics to the Firehose and can be identified by the `deployment` tag,
which contains the service instance ID.
Monitor these metrics as recommended in [Monitoring and KPIs for On-Demand <%= vars.product_full %>](./monitor.html#kpi).
## <a id="migrating-from-mt"></a> About Migrating a Pre-Provisioned Instance to an On-Demand Instance
VMware recommends the on-demand service for production workloads due to its workload isolation.
For instructions for developers about migrating from a pre-provisioned to an on-demand instance, see
[Migrating a <%= vars.product_short %> Service Instance to Another Service Instance](./migration.html).
For how operators can turn off the pre-provisioned service, see
[Turning Off the Pre-Provisioned Service](./turn-pre-provisioned-off.html).
When migrating from a pre-provisioned to an on-demand offering, be aware of the following:
* Pre-provisioned service instances have very low resource consumption because each one is a
virtual host within an existing cluster.
However, every on-demand service instance consists of dedicated VMs.
Therefore, you must select an on-demand service plan that provides adequate resources
but avoids unnecessary resource consumption.
* For example, you might select a single-node plan with a small VM for development
purposes, but a three-node cluster of large VMs for a mission-critical system.
* VMware recommends that you define all required structures in your app so that
they are created when your app connects to a new instance, as shown in the sketch after this list.
These structures include:
+ Exchanges
+ Queues
+ Bindings
* If your pre-provisioned instance uses any of the following, ensure that you
apply them to the on-demand instance:
<ul>
<li>Policies</li>
<li>Virtual host-specific parameters, such as <code>max_connections</code></li>
</ul>
You can apply them by using the RabbitMQ Management UI or the APIs.
* You lose messages that have not been consumed when you delete your old service instance.
If you do not want to lose messages, do one of the following:
* Switch your producers to the new instance but keep the consumers bound to the
old instance until the queues are empty.
* Use the Shovel or Federation plugins to drain messages from the old instance.
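Here is the sketch referenced above for defining structures in your app. With the RabbitMQ Java client, declarations are idempotent, so running them at startup recreates the same topology on a fresh (migrated) instance. The URI and the exchange, queue, and binding names are placeholder assumptions for illustration.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class DeclareTopology {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        // Placeholder URI; in practice read the amqp(s) URI from VCAP_SERVICES.
        factory.setUri("amqp://user:pass@rmq-node-0.example.com:5672/%2F");

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            // Declarations are idempotent: re-running them is a no-op
            // on an existing instance and recreates the topology on a new one.
            channel.exchangeDeclare("orders", "topic", true);                   // durable exchange
            channel.queueDeclare("order-processing", true, false, false, null); // durable queue
            channel.queueBind("order-processing", "orders", "order.created");   // binding
        }
    }
}
```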
### <a id="differences"></a>Differences between Pre-Provisioned and On-Demand Services Instances
There are some differences between pre-provisioned and on-demand services instances
that you should be aware of:
* On-Demand instances are not fronted by a load balancer. Therefore, ensure the following:
* That you configure the RabbitMQ client in your app with all available nodes,
not just one.
* That your reconnection logic can handle a node failure. VMware recommends this
behavior for Spring AMQP clients. For a Spring AMQP sketch, see the example after this list.
* The instance you are migrating to might use a different version of RabbitMQ than
your old instance.
For more information, see the [<%= vars.product_full %> Release Notes](./releases.html) and the
[RabbitMQ Changelog](https://www.rabbitmq.com/changelog.html).
* Critical tier-1 plugins are enabled for on-demand. However, on-demand does not
yet have the same plugins enabled as pre-provisioned.
If you are missing a plugin, contact your VMware representative.
* You might have configured your on-demand instance differently.
For example, you might have changed:
* VM sizing---CPU, RAM, and disk
* RabbitMQ network partition behavior (both offerings default to `pause_minority`)
* If your pre-provisioned <%= vars.product_short %> service instance uses TLS, ensure that you also enable TLS for the on-demand instance.
See [Enable TLS for Your Service Instance](./use.html#tls).
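The following is the Spring AMQP sketch referenced in the load balancer item above. It configures a `CachingConnectionFactory` with several node addresses so that a connection attempt can fail over to another node. The host names, credentials, and exchange name are placeholders; in a real app, read the connection details from `VCAP_SERVICES`.

```java
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

public class SpringAmqpFailover {
    public static void main(String[] args) {
        // Placeholder hosts and credentials; in practice read them from VCAP_SERVICES.
        CachingConnectionFactory factory = new CachingConnectionFactory();
        factory.setAddresses("rmq-node-0.example.com:5672,"
                + "rmq-node-1.example.com:5672,"
                + "rmq-node-2.example.com:5672");
        factory.setUsername("user");
        factory.setPassword("pass");

        // If the current node fails, the next connection attempt
        // moves on to one of the remaining addresses.
        RabbitTemplate template = new RabbitTemplate(factory);
        template.convertAndSend("orders", "order.created", "hello");
    }
}
```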