---
title: Logs, metrics, and nozzles
owner: Services
---
You can integrate services with Cloud Foundry's logging system, _Loggregator_, by writing to and
reading from its _Firehose_ endpoint.
The Loggregator logging system collects your logs and metrics from apps and platform components
and streams them to a single endpoint, Firehose.
Your tile can integrate its service with Loggregator in the following ways:
* By sending your service component logs and metrics to the Firehose, to be streamed with core platform
  component logs and metrics.
* By installing a _nozzle_ on the Firehose that directs Firehose data to be consumed by external services or
  apps. A built-in nozzle can activate a service to:
    - Drain metrics to an external dashboard product for system operators.
    - Send HTTP request details to search or analysis tools.
    - Drain app logs to an external system.
    - Autoscale itself based on Firehose metrics, as detailed in this [YouTube video](https://youtu.be/skJKvQfpKD4?t=1021).
For a real-world production example of a nozzle,
see [Firehose-to-syslog](https://github.com/cloudfoundry-community/firehose-to-syslog).
## <a id="firehose"></a> Firehose communication
<%= vars.app_runtime_full %> components publish logs and metrics to the Firehose through
Loggregator agent processes that run locally on the component VMs. Each agent forwards the data from the
components on its VM into the Loggregator system. To see how logs and metrics travel
from <%= vars.app_runtime_abbr %> system components to the Firehose,
see the [Cloud Foundry documentation](https://docs.cloudfoundry.org/loggregator/architecture.html).
Component VMs running <%= vars.app_runtime_abbr %> services can publish logs and metrics the same way, by
including the same component, Loggregator Agent.
Historically, components used Metron for this communication.
### <a id="https"></a> HTTPS protocol
To activate a service component to supply logs and metrics to the Firehose through encrypted communications,
do the following:
1. Include a Loggregator agent in the service component's template definitions.
    <br><br>
    For example:
    ```
    name: service
    label: Service
    templates:
    - name: service
      release: service
      manifest: |
    - name: bpm
      release: bpm
      properties: {}
    - name: loggregator_agent
      release: loggregator-agent
      consumes:
        doppler:
          deployment: cf-e8e79eaed2a50130f206
      properties:
        deployment: generator
        loggregator:
          tls:
            ca_cert: (( $ops_manager.ca_certificate ))
            agent:
              cert: ((CERTIFICATE))
              key: ((KEY))
    ```
    Where `CERTIFICATE` and `KEY` are the values used for mutual TLS communication. For example, `.properties.agent_certificate.cert_pem` and `.properties.agent_certificate.private_key_pem`.
1. Have the Tanzu Operations Manager CA generate and sign the certificate that is needed for mutual TLS communication
    by declaring the following property:
    ```
    - name: agent_certificate
      type: rsa_cert_credentials
      label: Agent Security Certificate
      configurable: false
      default:
        domains:
        - agent.(( ..cf.cloud_controller.system_domain.value ))
      description: mTLS Certificate for Agent
    ```
## <a id="nozzle"></a> Nozzles
A nozzle is a component that is dedicated to reading and processing data that streams from the Firehose.
A service tile installs a nozzle as either a managed service, with package type `bosh-release`, or as an
app that is pushed to <%= vars.app_runtime_abbr %>, with the package type `app`.
### <a id="develop"></a> Develop a nozzle
To use the [NOAA library](https://github.com/cloudfoundry/noaa),
develop your nozzle in Go.
NOAA establishes an authenticated WebSocket connection to the logging system and deserializes the
protocol buffers.
Draining the logs consists of:
1. Authenticating
1. Establishing a connection to the logging system
1. Forwarding events on to their ultimate destination
Authenticate against the [API](https://github.com/cloudfoundry-community/go-cfclient)
with a user in the `doppler.firehose` group:
```go
import "github.com/cloudfoundry-community/go-cfclient"
...
// Log in to the Cloud Foundry API as a user in the doppler.firehose group.
config := &cfclient.Config{
	ApiAddress:        apiUrl,
	Username:          username,
	Password:          password,
	SkipSslValidation: sslSkipVerify,
}
client, err := cfclient.NewClient(config)
```
Use the client's token to create a consumer and connect to the Firehose with a subscription ID.
The ID is important because the Firehose looks for connections with the same ID and
sends each event to only one of those connections. A nozzle developer can run two or more
instances to prevent message loss during upgrades and other deployments.
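```go
// Fetch the OAuth token, then open a Firehose connection with it.
token, err := client.GetToken()
// Named cnsmr and errs to avoid shadowing the consumer and errors packages.
cnsmr := consumer.New(config.TrafficControllerURL, &tls.Config{
	InsecureSkipVerify: config.SkipSSL,
}, nil)
events, errs := cnsmr.Firehose(firehoseSubscriptionID, token)
```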
`Firehose` returns two channels, one for events and one for errors.
The events channel receives the following types of events:
* _ValueMetric_ represents some platform metric at a point in time, emitted by platform components. For example, a Diego cell's remaining memory capacity.
* _CounterEvent_ represents an incrementing counter, emitted by platform components. For example, how many `2xx` responses the router has sent out.
* _Error_ represents an error in the originating process.
* _HttpStartStop_ represents HTTP request details, including both app and platform requests.
* _LogMessage_ represents a log entry for an individual app.
* _ContainerMetric_ represents application container information. For example, memory used.
For the full details about events, see
[dropsonde protocol](https://github.com/cloudfoundry/dropsonde-protocol/tree/master/events).
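
The two-channel pattern described above can be sketched with the standard library alone. In this minimal sketch, `Event`, `EventType`, and `Drain` are hypothetical stand-ins introduced for illustration; a real nozzle would receive the dropsonde envelope types from NOAA instead.

```go
package main

import (
	"errors"
	"fmt"
)

// EventType and Event are simplified stand-ins for the dropsonde envelope
// types. They exist only to keep this sketch self-contained.
type EventType string

const (
	ValueMetric     EventType = "ValueMetric"
	CounterEvent    EventType = "CounterEvent"
	ErrorEvent      EventType = "Error"
	HTTPStartStop   EventType = "HttpStartStop"
	LogMessage      EventType = "LogMessage"
	ContainerMetric EventType = "ContainerMetric"
)

type Event struct {
	Type   EventType
	Origin string // component or app that emitted the event
}

// Drain reads from both channels until the events channel closes.
// Each event goes to forward; each error goes to onError, and draining
// continues afterwards.
func Drain(events <-chan Event, errs <-chan error, forward func(Event), onError func(error)) {
	for {
		select {
		case ev, ok := <-events:
			if !ok {
				return
			}
			forward(ev)
		case err, ok := <-errs:
			if !ok {
				errs = nil // a nil channel never becomes ready, so this case is disabled
				continue
			}
			onError(err)
		}
	}
}

func main() {
	events := make(chan Event, 3)
	errs := make(chan error, 1)
	events <- Event{Type: ValueMetric, Origin: "rep"}
	events <- Event{Type: LogMessage, Origin: "my-app"}
	events <- Event{Type: LogMessage, Origin: "my-app"}
	close(events)
	errs <- errors.New("connection reset")

	// Count events per type instead of forwarding them anywhere.
	counts := map[EventType]int{}
	Drain(events, errs,
		func(ev Event) { counts[ev.Type]++ },
		func(err error) { fmt.Println("firehose error:", err) },
	)
	fmt.Println("log messages:", counts[LogMessage])
}
```

Handling errors through a callback, rather than returning on the first one, lets the caller decide whether to log, back off, or reconnect.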
These events show how the data targets two different personas:
* Platform operators
* App developers

A nozzle with the `doppler.firehose` scope receives data for *every* app and the platform.
Any filtering that is based on the event payload is the nozzle implementor's responsibility.
An advanced integration can combine a
[service broker](service-brokers.html) with a nozzle to:
* Let app developers opt in to logging, with the filtering implemented in the nozzle.
* Establish an SSO exchange for authentication, as described in the [Cloud Foundry documentation](https://docs.cloudfoundry.org/services/dashboard-sso.html), so
that developers can access logs only for the apps in their space.
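
One way to implement the opt-in filtering mentioned above is an in-memory set of app GUIDs that the nozzle consults before forwarding an event. This is a minimal sketch: `OptInFilter` and its methods are hypothetical names, and it assumes the service broker's bind and unbind handlers call `OptIn` and `OptOut`.

```go
package main

import (
	"fmt"
	"sync"
)

// OptInFilter tracks which app GUIDs have opted in to log forwarding.
// It is safe for concurrent use by broker handlers and the drain loop.
type OptInFilter struct {
	mu   sync.RWMutex
	apps map[string]struct{}
}

func NewOptInFilter() *OptInFilter {
	return &OptInFilter{apps: make(map[string]struct{})}
}

// OptIn records that an app wants its events forwarded.
func (f *OptInFilter) OptIn(appGUID string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.apps[appGUID] = struct{}{}
}

// OptOut removes an app from the set.
func (f *OptInFilter) OptOut(appGUID string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.apps, appGUID)
}

// Allows reports whether events for the app should be forwarded.
func (f *OptInFilter) Allows(appGUID string) bool {
	f.mu.RLock()
	defer f.mu.RUnlock()
	_, ok := f.apps[appGUID]
	return ok
}

func main() {
	filter := NewOptInFilter()
	filter.OptIn("app-guid-1")
	fmt.Println(filter.Allows("app-guid-1")) // true
	fmt.Println(filter.Allows("app-guid-2")) // false
}
```

A production nozzle would likely persist this set, because an in-memory map is lost when the nozzle restarts.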
For a full working example (suitable as an integration starting point),
see [firehose-nozzle](https://github.com/cf-platform-eng/firehose-nozzle).
### <a id="deploy"></a> Deploy a nozzle
After you have built a nozzle, you can deploy it as a managed service or as an app.
For more information about what it means to be a managed service,
see [managed service](managed.html).
Also, see this [example nozzle BOSH release](https://github.com/cloudfoundry-incubator/example-nozzle-release).
You can deploy the nozzle as an app on <%= vars.app_runtime_abbr %>.
For more information, see the Tile Generator's
[section on pushed apps](tile-generator.html#pushed-applications).
### <a id="examples"></a> Nozzle examples
There are several open source examples you can use
as a reference for building your nozzle.
* [firehose-nozzle](https://github.com/cf-platform-eng/firehose-nozzle) writes to standard out. It's a useful
starting point as scaffolding, tests,
and more are already in place.
* [example-nozzle](https://github.com/cloudfoundry-incubator/example-nozzle) is a
single-file implementation with no tests.
* [gcp-tools-release](https://github.com/cloudfoundry-community/gcp-tools-release)
drains component syslogs and health data in addition to nozzle data. It shows how
to work with a BOSH add-on for additional data outside a nozzle. The nozzle is
managed through BOSH. Raw logs and metrics data take different paths in the source.
* [firehose-to-syslog](https://github.com/cloudfoundry-community/firehose-to-syslog)
includes implementation code that adds metadata that might be needed
for access control lists (ACLs): app name, space UUID and name, and org UUID and name.
* [logsearch-for-cloudfoundry](https://github.com/cloudfoundry-community/logsearch-for-cloudfoundry)
packages this nozzle as a BOSH release.
* [splunk-firehose-nozzle](https://github.com/cf-platform-eng/splunk-firehose-nozzle)
has source code based on `firehose-to-syslog` and is packaged to run as an app on <%= vars.app_runtime_abbr %>.
* [datadog-firehose-nozzle](https://github.com/DataDog/datadog-firehose-nozzle) is
another real-world implementation.
## <a id="syslog-format-pcf"></a> Log Format for <%= vars.app_runtime_abbr %> components
The standard log format for <%= vars.app_runtime_abbr %> adheres to the
[RFC-5424 syslog protocol](https://tools.ietf.org/html/rfc5424), with log entries formatted as follows:
`<${PRI}>${VERSION} ${TIMESTAMP} ${HOST_IP} ${APP_NAME} ${PROD_ID} ${MSG_ID} ${SD-ELEMENT-instance} ${MESSAGE}`
The [Syslog Message Elements table](#syslog-elements) describes each element of the log entry, and the [Structured Instance Data Format](#sd-element) table describes the contents of the structured data element that carries Cloud Foundry VM instance information.
### <a id="syslog-elements"></a> Syslog message elements
The following table describes each element of a standard <%= vars.app_runtime_abbr %> syslog entry:
<table id='syslog-elements-table' border="1" class="table" >
<thead>
<tr>
<th>Syslog Message Element</th>
<th>Meaning or Value</th>
</tr>
</thead>
<tr>
<td><code>${PRI}</code></td>
<td><p><a href="https://tools.ietf.org/html/rfc5424#section-6.2.1">Priority value (PRI)</a>, calculated as <code>8 × Facility Code + Severity Code</code></p>
<p><%= vars.app_runtime_abbr %> uses a Facility Code value of <code>1</code>, indicating a user-level facility. This adds <code>8</code> to the RFC-5424 Severity Codes, resulting in the
numbers listed in the following <a href="#severity">table</a>.</p>
<p>If in doubt, default to <code>13</code>, to indicate Notice-level severity.</p>
</td>
</tr><tr>
<td><code>${VERSION}</code></td>
<td><a href="https://tools.ietf.org/html/rfc5424#section-9.1"><code>1</code></a></td>
</tr><tr>
<td><code>${TIMESTAMP}</code></td>
<td><p>The <a href="https://tools.ietf.org/html/rfc5424#section-6.2.3">timestamp</a> of when the log entry is forwarded (typically slightly after it was generated). Example: <code>2017-07-24T05:14:15.000003Z</code></p></td>
</tr><tr>
<td><code>${HOST_IP}</code></td>
<td><a href="https://tools.ietf.org/html/rfc5424#section-6.2.4">Internal IP address</a> of origin server</td>
</tr><tr>
<td><code>${APP_NAME}</code></td>
<td><p><a href="https://tools.ietf.org/html/rfc5424#section-6.2.5">Process name</a> of the program that generated the message. Prefixed with <code>vcap</code>. For example:</p>
<ul><li><code>vcap.rep</code></li>
<li><code>vcap.garden</code></li>
<li><code>vcap.cloud_controller_ng</code></li></ul>
<p>
You can derive this process name from either the program name configured for the local Metron agent or the <code>:progname</code> that blackbox derives from the directory that syslog-release forwards logs into.</p></td>
</tr><tr>
<td><code>${PROD_ID}</code></td>
<td>The <a href="https://tools.ietf.org/html/rfc5424#section-6.2.6">Process ID</a> of the syslog process doing the forwarding. If this is not easily available, default to <code>-</code> (hyphen) to indicate unknown.</td>
</tr><tr>
<td><code>${MSG_ID}</code></td>
<td>The <a href="https://tools.ietf.org/html/rfc5424#section-6.2.7">type</a> of log entry. If this is not available, default to <code>-</code> (hyphen) to indicate unknown.</td>
</tr><tr>
<td><code>${SD-ELEMENT-instance}</code></td>
<td>Structured data (SD) relevant to <%= vars.app_runtime_abbr %> about the <a href="https://tools.ietf.org/html/rfc5424#section-6.3.1">source instance (VM)</a> that originates the log message. See the <a href="#sd-element">Structured Instance Data Format table</a> for content and format.</td>
</tr><tr>
<td><code>${MESSAGE}</code></td>
<td>The log entry itself, ideally in JSON</td>
</tr>
</table>
### <a id="severity"></a> RFC-5424 severity codes
<%= vars.app_runtime_abbr %> components generate log entries with the following severity levels.
Each code in the table is the RFC-5424 severity code plus `8` for the user-level facility.
The most common severity level is `13`.
<table id='severity-table' border="1" class="table" >
<thead>
<tr>
<th>Severity Code</th>
<th>Meaning</th>
</tr>
</thead>
<tr>
<td><code>8</code></td>
<td>Emergency: system is unusable</td>
</tr><tr>
<td><code>9</code></td>
<td>Alert: action must be taken immediately</td>
</tr><tr>
<td><code>10</code></td>
<td>Critical: critical conditions</td>
</tr><tr>
<td><code>11</code></td>
<td>Error: error conditions</td>
</tr><tr>
<td><code>12</code></td>
<td>Warning: warning conditions</td>
</tr><tr>
<td><code>13</code></td>
<td>Notice: normal but significant condition</td>
</tr><tr>
<td><code>14</code></td>
<td>Informational: informational messages</td>
</tr><tr>
<td><code>15</code></td>
<td>Debug: debug-level messages</td>
</tr>
</table>
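
The codes in the table above follow from the priority formula, `8 × Facility Code + Severity Code`, with a Facility Code of `1`. As a worked sketch, the following computes and reverses that calculation. The type and function names are hypothetical.

```go
package main

import "fmt"

// Severity is an RFC 5424 severity code (0 = Emergency ... 7 = Debug).
type Severity int

const (
	Emergency Severity = iota
	Alert
	Critical
	Error
	Warning
	Notice
	Informational
	Debug
)

// userFacility is the user-level Facility Code described in this topic.
const userFacility = 1

// Priority returns the PRI value: 8 * facility + severity.
func Priority(facility int, sev Severity) int {
	return 8*facility + int(sev)
}

// SplitPriority reverses Priority, recovering the facility and severity.
func SplitPriority(pri int) (facility int, sev Severity) {
	return pri / 8, Severity(pri % 8)
}

func main() {
	fmt.Println(Priority(userFacility, Notice)) // 13
	fmt.Println(Priority(userFacility, Debug))  // 15
}
```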
### <a id="sd-element"></a> Structured instance data format
The RFC-5424 syslog protocol includes a [structured data element](https://tools.ietf.org/html/rfc5424#section-6.3.1) that you can use. <%= vars.app_runtime_abbr %> uses this
element to carry VM instance information as follows:
<table id='sd-element-table' border="1" class="table" >
<thead>
<tr>
<th><code>SD-ELEMENT-instance</code> element</th>
<th>Meaning</th>
</tr>
</thead>
<tr>
<td><code>${ENTERPRISE_ID}</code></td>
<td>Your Enterprise Number, as <a href="https://www.iana.org/assignments/enterprise-numbers/enterprise-numbers">listed</a> by the Internet Assigned Numbers Authority (IANA) </td>
</tr><tr>
<td><code>${DIRECTOR}</code></td>
<td>The BOSH Director managing the deployment. </td>
</tr><tr>
<td><code>${DEPLOYMENT}</code></td>
<td>BOSH <code>spec.deployment</code> value</td>
</tr><tr>
<td><code>${INSTANCE_GROUP}</code></td>
<td>BOSH <code>instance_group</code>, <code>spec.job.name</code></td>
</tr><tr>
<td><code>${AVAILABILITY_ZONE}</code></td>
<td>BOSH <code>spec.az</code> value</td>
</tr><tr>
<td><code>${ID}</code></td>
<td>BOSH <code>spec.id</code> value. This is a UUID, not an index. It's necessary because
BOSH Availability Zone index values are not always unique or sequential.</td>
</tr>
</table>
## <a id="metrics"></a> Metrics
The [Logging and metrics in Cloud Foundry](https://docs.vmware.com/en/VMware-Tanzu-Application-Service/4.0/tas-for-vms/data-sources.html) topic has a
rundown of the various metrics and how to make them useful.
## <a id="resources"></a> Other resources
* Cloud Foundry Summit Video [Monitoring Cloud Foundry: Learning about the Firehose](https://youtu.be/skJKvQfpKD4).
* [Loggregator GitHub repository](https://github.com/cloudfoundry/loggregator/).
* [Overview of the Loggregator System](https://docs.cloudfoundry.org/loggregator/architecture.html).
* [Loggregator's Slack Channel](https://cloudfoundry.slack.com/messages/loggregator/).