Samuel Ortiz edited this page Jun 21, 2016 · 1 revision

## Problem Statement

Ciao is made of several distributed components, each of them built from different pieces of software. Each component generates its own local traces and system logs. Without dedicated tools, getting a clear, time-ordered view of the overall cluster behavior is tedious and error-prone. It is also nearly impossible to address performance and scalability issues without such tools. Ciao distributed tracing provides an infrastructure for collecting traces, logs and cluster events into a single database.

## Requirements

The Ciao tracing architecture is designed around the following requirements:

  • Low overhead: We want the tracing infrastructure to have the lowest possible overhead, for three main reasons:
    1. It must have the smallest possible CPU, memory and networking footprint as it will run alongside all Ciao components.
    2. The bigger the overhead, the less relevant the traces are. A significant tracing overhead influences timings and therefore changes the very cluster behavior being observed.
    3. By focusing on a minimal tracing overhead we can safely enable it permanently.
  • Annotation API: The tracing infrastructure provides a tracing API that any Ciao component can use to generate its own custom traces. To prevent overzealous components from generating huge trace stacks, we will rate limit the number of pending traces per component.
  • Minimal application-level impact: While Ciao components can generate their own traces, we will also instrument low-level libraries such as SSNTP and the networking ones, so that components get useful traces without being modified.
  • Security: All traces should be encrypted and sent over a TLS link.
  • Privacy: When tracing, SSNTP payloads must be discarded and must never be included in a trace, as they could contain confidential and personal tenant information.
  • External access: We should provide a secure and simple API that gives Ciao cluster administrators and developers access to the tracing data.
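The Annotation API requirement only states that pending traces are rate limited per component. One possible way to do that, sketched below under our own assumptions, is a bounded, non-blocking queue that drops (and counts) traces once a per-component limit is reached, so that a noisy component never blocks on tracing. The pendingQueue type and the drop-when-full policy are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pendingQueue bounds how many traces a component may have pending.
// Hypothetical type: the drop-when-full policy is an assumption.
type pendingQueue struct {
	dropped uint64      // traces rejected because the queue was full (first field keeps it 64-bit aligned for atomics)
	traces  chan string // buffered channel whose capacity is the pending limit
}

func newPendingQueue(limit int) *pendingQueue {
	return &pendingQueue{traces: make(chan string, limit)}
}

// enqueue never blocks the caller: when the queue is full, the trace is
// dropped and counted instead of slowing the component down.
func (q *pendingQueue) enqueue(msg string) bool {
	select {
	case q.traces <- msg:
		return true
	default:
		atomic.AddUint64(&q.dropped, 1)
		return false
	}
}

func (q *pendingQueue) droppedCount() uint64 {
	return atomic.LoadUint64(&q.dropped)
}

func main() {
	q := newPendingQueue(2)
	for i := 0; i < 5; i++ {
		q.enqueue(fmt.Sprintf("trace %d", i))
	}
	fmt.Println(len(q.traces), q.droppedCount()) // prints "2 3"
}
```

Dropping rather than blocking keeps the low-overhead requirement first; the dropped counter at least makes the loss visible.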

## Architecture

### Overview

The goal of the ciao tracing architecture is to gather the logs, events and errors of all ciao components and store them in a centralized database. As a consequence, each ciao component and library should be able to generate traces and pass them to a local agent responsible for gathering them. This agent is called a tracer. Each tracer pushes its locally queued traces to a pool of trace collectors over a secure network link. Trace collectors store the collected traces in a trace database. This design is loosely based on Google's Dapper, simplified for ciao's network and architecture.

### Traces and spans

A ciao trace can be seen as a tree where each node is a span. The edges in this tree represent causal relationships between spans. A span is the basic ciao tracing unit and has 0 or 1 parent and 0 or more children. Tracers gather, queue and push YAML payloads that represent spans.

#### Span

The basic ciao tracing unit is a span. A span represents a ciao cluster event and can be linked to other spans:

  • If span B is caused by span A, span B's parent will be span A.
  • A span can be the root of a ciao trace, and will have no parents.

There are 2 types of spans:

  • Component-specific spans: These spans contain component-specific information (the component name and a component payload, e.g. for SSNTP or libsnnet) built by the component itself. Component span payloads are typically marshaled JSON structures, but their semantics are defined by the component. Any trace caller can create component-specific spans by implementing the Spanner interface.
  • Anonymous spans: Trace callers that only need to carry a string message can create anonymous spans, which do not contain any component-specific payload. The component name for anonymous spans is set to anonymous.
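A component-specific span builder could look like the sketch below. The Spanner interface follows the one described in the APIs section; the ssntpSpanner and frameInfo types are hypothetical examples. frameInfo deliberately carries frame metadata only, never the SSNTP payload, in line with the privacy requirement.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Spanner mirrors the span builder interface from the APIs section.
type Spanner interface {
	ComponentSpan(context interface{}) []byte
}

// frameInfo is a hypothetical component-specific context: frame metadata
// only, never the SSNTP payload.
type frameInfo struct {
	Command string `json:"command"`
	Role    string `json:"role"`
}

// ssntpSpanner is a hypothetical Spanner for the SSNTP library.
type ssntpSpanner struct{}

// ComponentSpan marshals a frameInfo into a JSON component payload and
// returns nil for any context it does not understand.
func (ssntpSpanner) ComponentSpan(context interface{}) []byte {
	info, ok := context.(frameInfo)
	if !ok {
		return nil
	}
	payload, err := json.Marshal(info)
	if err != nil {
		return nil
	}
	return payload
}

func main() {
	var s Spanner = ssntpSpanner{}
	fmt.Println(string(s.ComponentSpan(frameInfo{Command: "START", Role: "scheduler"})))
	// prints {"command":"START","role":"scheduler"}
}
```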

A span contains the following information:

  • Span UUID
  • Span parent UUID
  • SSNTP UUID, i.e. the SSNTP UUID of the ciao component that created that span
  • Span creation timestamp
  • Span component name
  • Span component payload
  • Span message, i.e. the actual log message

For example, an anonymous root span would look like:

```yaml
spanUUID: "c73322e8-d5fe-4d57-874c-dcee4fd368cd"
parentUUID: "00000000-0000-0000-0000-000000000000"
ssntpUUID: "b265f62b-e957-47fd-a0a2-6dc261c7315c"
created: "2016-06-02T15:04:05.999999999Z"
component: "anonymous"
componentPayload: null
message: "StartWorkload for tenant 53cdd9ef-228f-4ce1-911d-706c2b41454a"
```

#### Trace

A ciao trace is a logical tree where each node is a span and edges represent causal relationships between spans. Only spans are collected by ciao tracers and pushed to trace collectors. Visualization tools can then rebuild actual ciao trace trees from each span's UUID and parent UUID.

A simplified trace tree example for a START command would look like:

### Tracer

The role of a ciao tracer is to collect traces from a specific ciao component or library. In practice, a ciao tracer is a goroutine listening on a dedicated Go channel, through which the component pushes new traces via the tracing API. This goroutine is implicitly created when the component first calls into the tracing API to get a new tracer.

The tracer will queue traces in memory and will eventually push them back to the collector based on the following criteria:

  • Network bandwidth: In order to have a minimal impact on the control plane bandwidth, the traces should be sent back when bandwidth is available.
  • Pending queue length: The longer the pending trace queue, the more information we may lose if the machine crashes. [TODO] Store traces temporarily.
  • Collection delay: Although some collection delay is acceptable, one should not have to wait more than a few minutes before being able to query traces.

### Trace Collector

A trace collector is a network server receiving and gathering ciao traces from all tracers. On the one hand, it handles tracers as network clients and listens for incoming traces; on the other hand, it stores them in a dedicated tracing database. [TBD] Database structure and requirements.

Since a trace collector only reads from tracers and only writes to the tracing database, several collectors can run at the same time, each of them gathering traces from a subset of the tracers running in the cluster.
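How tracers are split across collectors is not specified here. One simple option, assumed in the sketch below, is to hash each tracer's SSNTP UUID onto the list of collectors, so that every collector deterministically serves a subset of tracers without any coordination between collectors. collectorFor and the collector names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// collectorFor deterministically maps a tracer, identified by its SSNTP
// UUID, to one of the collectors. collectors must not be empty.
func collectorFor(ssntpUUID string, collectors []string) string {
	h := fnv.New32a()
	h.Write([]byte(ssntpUUID)) // hash.Hash writes never return an error
	return collectors[h.Sum32()%uint32(len(collectors))]
}

func main() {
	collectors := []string{"collector-0", "collector-1"}
	for _, id := range []string{"launcher-1-uuid", "scheduler-uuid"} {
		fmt.Println(id, "->", collectorFor(id, collectors))
	}
}
```

With plain modulo hashing, adding or removing a collector reassigns most tracers; consistent hashing would limit that churn if it turns out to matter.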

### Tracer and collectors network link

Tracers and collectors communicate through SSNTP, using a single event: TraceReport.

#### Why SSNTP and not HTTPS?

One could wonder whether it makes sense to use a custom protocol like SSNTP instead of HTTPS. Here are a few reasons why we think SSNTP is better suited for the ciao tracing infrastructure:

  1. Efficiency: The main requirement of ciao tracing is to have the smallest possible footprint. With SSNTP carrying trace reports, any ciao component can hint its tracer about the currently available SSNTP bandwidth. Tracers can then push their reports only when SSNTP bandwidth is available, minimizing the impact of tracing on the control plane.

  2. Minimal configuration: We can reuse some SSNTP certificates instead of generating new HTTPS ones. For example, the controller certificate could serve both the controller and trace collector roles.

  3. Consistency: For consistency's sake, we want to use a single protocol across our control plane.

#### Out-of-Band collection

Although we could use the existing SSNTP ciao network for reporting traces through the scheduler up to the controller, it makes much more sense to do out-of-band collection by having tracers and collectors connected together via a dedicated SSNTP network.

Having all launcher traces go through the scheduler forwarding rules might have a measurable impact on scheduler performance. The controller would also eventually have to receive and process all those traces. In an out-of-band tracing architecture, neither the controller nor the scheduler is impacted by tracing, except for pushing their own traces to their tracers.

### Context propagation

Being able to link spans through causal relationships allows visualization tools to show not only a time-based cluster log, but also how traces relate to each other. Two examples:

  1. On a launcher node, ciao-launcher creates 2 spans:
      • span #12: an SSNTP START received span
      • span #23: a Docker error span, as a container it was trying to start failed to find its image
  2. On the controller node, ciao-scheduler creates 3 spans:
      • span #16: an SSNTP START received span, as the controller sent a request for starting an instance
      • span #19: an SSNTP START queued span, as the START frame cannot be dispatched immediately
      • span #57: an SSNTP START sent span, when the same START frame is dequeued and sent to an available compute node

In both cases, those spans would be timestamped and, once received and stored by the collectors, one could look at the span timeline and see the order in which they were created. But only a manual inspection of the spans could show that they actually relate to each other: in example #1, for instance, that the Docker error span #23 is caused by the START frame received a few milliseconds earlier and traced through span #12.

In these two examples, we would like to express the following facts:

  1. span #23 is caused by span #12
  2. span #57 is caused by span #19 which itself is caused by span #16

To do so, a trace context needs to be propagated across the SSNTP network and, optionally, across API calls. This context can be passed to the tracing APIs when creating new spans, so that the new spans are linked to previous spans belonging to the same context.
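The sketch below illustrates in-process propagation for example #1: the span created when the START frame is received returns a context, and the launcher passes that context down to an internal API so that the Docker error span gets the START span as its parent. The recorder type, its sequential span IDs and startContainer are hypothetical stand-ins, not the ciao tracing API.

```go
package main

import "fmt"

const zeroUUID = "00000000-0000-0000-0000-000000000000"

// TraceContext carries the UUID new spans should use as their parent.
type TraceContext struct {
	parentUUID string
}

type span struct {
	uuid, parentUUID, message string
}

// recorder is a hypothetical in-memory tracer handing out sequential IDs.
type recorder struct {
	spans []span
}

// trace records a span whose parent comes from ctx (nil means root) and
// returns a context that links later spans to this one.
func (r *recorder) trace(ctx *TraceContext, format string, args ...interface{}) *TraceContext {
	parent := zeroUUID
	if ctx != nil {
		parent = ctx.parentUUID
	}
	id := fmt.Sprintf("span-%d", len(r.spans)+1)
	r.spans = append(r.spans, span{id, parent, fmt.Sprintf(format, args...)})
	return &TraceContext{parentUUID: id}
}

// startContainer is a hypothetical internal launcher API that takes the
// trace context as an extra parameter.
func startContainer(r *recorder, ctx *TraceContext, image string) {
	r.trace(ctx, "docker error: image %s not found", image)
}

func main() {
	r := &recorder{}
	ctx := r.trace(nil, "SSNTP START received")
	startContainer(r, ctx, "busybox")
	for _, s := range r.spans {
		fmt.Printf("%s parent=%s %s\n", s.uuid, s.parentUUID, s.message)
	}
}
```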

Below is a comparison of what the span timeline for example #2 looks like without and with trace context propagation:

[Figure: Trace context propagation]

In practice, this trace context is a trace.TraceContext pointer that component implementations can choose to propagate by adding it as an extra parameter to their internal APIs.

SSNTP internally propagates trace contexts by carrying them in its frame structure. SSNTP receivers can fetch the trace context by calling ssntp.Frame.TraceContext() on the frames they receive.

## APIs

### Ciao tracing Go library API

To be used by ciao components and libraries:

  • package trace

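  • type Tracer struct {}: This is the tracer structure. Internally it maps to a tracing goroutine listening for trace creations by the local component.

```go
type Tracer struct {
	ssntpUUID UUID      // SSNTP UUID of the component owning this tracer
	component Component // component this tracer belongs to
	span      Spanner   // builds component-specific span payloads
}
```
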
  • type Spanner interface{}: This is the span builder interface for building component-specific spans.

```go
type Spanner interface {
	// ComponentSpan turns a component-specific context into a span payload.
	ComponentSpan(context interface{}) []byte
}
```

ComponentSpan() takes a component specific structure and returns a component specific payload.

  • type TraceContext struct {}: This is the trace context to propagate at will.

```go
type TraceContext struct {
	parentUUID UUID // UUID of the span that new spans will be linked to
}
```

  • func NewComponentTracer(component Component, span Spanner, ssntpUUID UUID) (*Tracer, *TraceContext, error): This will create a component specific tracer. Calling tracer.Trace() with this tracer will add the tracer.span.ComponentSpan() payload to all created spans.

  • func NewTracer(ssntpUUID string) (*Tracer, *TraceContext, error): This creates a tracer and starts its tracing goroutine.

  • func (tracer *Tracer) Stop(): This shuts down the tracer goroutine.

  • func (tracer *Tracer) Trace(context *TraceContext, componentContext interface{}, format string, args ...interface{}) (*TraceContext): Components call this when they want to create a ciao trace. Components that want to propagate the tracing context should use the returned TraceContext pointer.

  • func (tracer *Tracer) GlogTrace(context *TraceContext, componentContext interface{}, format string, args ...interface{}) (*TraceContext): Same as Trace but combined with a glog logger to also log locally.
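To show how these calls fit together, here is a toy, self-contained implementation of NewTracer, Trace and Stop built around a goroutine and a channel. It is a sketch under several assumptions: span IDs are random hex strings rather than UUIDs, spans are kept in memory instead of being pushed to collectors, componentContext is ignored (there is no Spanner), and GlogTrace and NewComponentTracer are not covered.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"
)

// zeroUUID marks "no parent", as in the YAML span example.
const zeroUUID = "00000000-0000-0000-0000-000000000000"

// TraceContext carries the UUID new spans will use as their parent.
type TraceContext struct {
	parentUUID string
}

// span is a simplified, hypothetical span record.
type span struct {
	uuid, parentUUID, ssntpUUID, message string
	created                             time.Time
}

// Tracer is a toy tracer: one goroutine receives spans over a channel and
// only keeps them in memory instead of pushing them to trace collectors.
type Tracer struct {
	ssntpUUID string
	spans     chan span
	done      chan struct{}
	collected []span // only safe to read after Stop returns
}

// NewTracer starts the tracer goroutine and returns a root trace context.
func NewTracer(ssntpUUID string) (*Tracer, *TraceContext, error) {
	t := &Tracer{
		ssntpUUID: ssntpUUID,
		spans:     make(chan span, 64),
		done:      make(chan struct{}),
	}
	go t.run()
	return t, &TraceContext{parentUUID: zeroUUID}, nil
}

func (t *Tracer) run() {
	defer close(t.done)
	for s := range t.spans {
		t.collected = append(t.collected, s)
	}
}

// Stop closes the span channel and waits until the goroutine has drained it.
func (t *Tracer) Stop() {
	close(t.spans)
	<-t.done
}

// Trace creates a span linked to context (nil means root) and returns a
// context linking later spans to the new one. componentContext is ignored.
func (t *Tracer) Trace(context *TraceContext, componentContext interface{}, format string, args ...interface{}) *TraceContext {
	parent := zeroUUID
	if context != nil {
		parent = context.parentUUID
	}
	s := span{
		uuid:       newID(),
		parentUUID: parent,
		ssntpUUID:  t.ssntpUUID,
		message:    fmt.Sprintf(format, args...),
		created:    time.Now().UTC(),
	}
	t.spans <- s
	return &TraceContext{parentUUID: s.uuid}
}

// newID returns 16 random bytes as a hex string (stand-in for a UUID).
func newID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return fmt.Sprintf("%x", b)
}

func main() {
	tracer, root, err := NewTracer("b265f62b-e957-47fd-a0a2-6dc261c7315c")
	if err != nil {
		panic(err)
	}
	ctx := tracer.Trace(root, nil, "SSNTP START received")
	tracer.Trace(ctx, nil, "docker error: image %s not found", "busybox")
	tracer.Stop()
	for _, s := range tracer.collected {
		fmt.Println(s.parentUUID, "->", s.uuid, s.message)
	}
}
```

In this toy version, Trace blocks when the channel buffer is full and panics if called after Stop; a real tracer would rather apply the per-component rate limiting from the requirements.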

### Ciao tracing HTTPS endpoints

[TBD]