CSD Primer

Writing a CSD (Custom Service Descriptor) is a relatively simple process. This page walks through developing a toy CSD called ECHO. The ECHO service has one role type, a Python web server that starts up and listens on a configurable port.

Creating the Echo CSD

First create a directory called ECHO-1.0:

mkdir ECHO-1.0

The Service Descriptor

Add ECHO-1.0/descriptor/service.sdl:

{
  "name" : "ECHO",
  "label" : "Echo",
  "description" : "The echo service",
  "version" : "1.0",
  "runAs" : {
    "user" : "root",
    "group" : "root"
  },
  "roles" : [
    {
      "name" : "ECHO_WEBSERVER",
      "label" : "Web Server",
      "pluralLabel" : "Web Servers",
      "parameters" : [
        {
          "name" : "port_num",
          "label" : "Webserver port",
          "description" : "The web server port number",
          "required" : "true",
          "type" : "port",
          "default" : 8080
        }
      ],
      "startRunner" : {
        "program" : "scripts/control.sh",
        "args" : [ "start" ],
        "environmentVariables" : {
          "WEBSERVER_PORT" : "${port_num}"
        }
      }
    }
  ]
}

The Control Script

Add ECHO-1.0/scripts/control.sh:

#!/bin/bash
CMD=$1

case $CMD in
  (start)
    echo "Starting the web server on port [$WEBSERVER_PORT]"
    exec python -m SimpleHTTPServer $WEBSERVER_PORT
    ;;
  (*)
    echo "Don't understand [$CMD]"
    ;;
esac
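
The `SimpleHTTPServer` module exists only in Python 2; in Python 3 the same functionality lives in `http.server`. As a rough sketch of what the role process does, assuming Python 3 is installed on the host, the following builds the same kind of file-listing server from the `WEBSERVER_PORT` environment variable. The helper names `port_from_env` and `make_server` are made up for this illustration and are not part of the ECHO CSD.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


def port_from_env(env, default=8080):
    """Read WEBSERVER_PORT from an environment mapping and validate it."""
    port = int(env.get("WEBSERVER_PORT", default))
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return port


def make_server(port, host=""):
    """Create (but do not start) a server that lists the current directory."""
    return HTTPServer((host, port), SimpleHTTPRequestHandler)
```

To actually serve files, one would call `make_server(port_from_env(os.environ)).serve_forever()` after importing `os`.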

Validation

The service.sdl file can be validated with the provided validator. To run a validation, run:

java -jar validator.jar -s ECHO-1.0/descriptor/service.sdl
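
Before running the validator, a quick well-formedness check can catch plain JSON mistakes such as a trailing comma. The sketch below is not a substitute for validator.jar: it only parses the file, assuming the SDL is strict JSON, and looks for the top-level keys used in this primer. The function name `precheck_sdl` and the list of expected keys are assumptions made for this example.

```python
import json

# Top-level keys used by the ECHO example; chosen for illustration only.
EXPECTED_KEYS = ("name", "label", "description", "version", "roles")


def precheck_sdl(text):
    """Return a list of problems found in the SDL text (empty if none)."""
    try:
        sdl = json.loads(text)
    except ValueError as err:  # json.JSONDecodeError subclasses ValueError
        return ["not valid JSON: %s" % err]
    if not isinstance(sdl, dict):
        return ["top level must be a JSON object"]
    return ["missing key: %s" % key for key in EXPECTED_KEYS if key not in sdl]
```

It could be called as `precheck_sdl(open("ECHO-1.0/descriptor/service.sdl").read())`.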

Building

A CSD is packaged as a jar file. To build it, run the jar command from inside the CSD directory:

cd ECHO-1.0
jar -cvf ECHO-1.0.jar *
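
Because the command is run from inside the CSD directory, `descriptor/service.sdl` ends up at the top level of the archive rather than under `ECHO-1.0/`. A jar is a zip archive, so the layout can be inspected with Python's `zipfile` module. This is a sketch only; `has_root_descriptor` and `check_csd_layout` are hypothetical helpers, and the expectation that the descriptor sits at the archive root is inferred from the build command above.

```python
import zipfile


def has_root_descriptor(names):
    """Return True if the entry list has descriptor/service.sdl at the root."""
    return "descriptor/service.sdl" in names


def check_csd_layout(jar_path):
    """Open a built CSD jar and check where the descriptor ended up."""
    with zipfile.ZipFile(jar_path) as jar:
        return has_root_descriptor(jar.namelist())
```

A jar built from the parent directory would contain `ECHO-1.0/descriptor/service.sdl` instead, and `check_csd_layout` would return False for it.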

Testing ECHO CSD in Cloudera Manager

After the ECHO-1.0.jar CSD is built, it needs to be installed in Cloudera Manager. Perform the following steps:

Copy ECHO-1.0.jar to the CSD repository. The location is configurable, but by default it is /opt/cloudera/csd:

scp ECHO-1.0.jar myhost.com:/opt/cloudera/csd/.

Restart Cloudera Manager:

service cloudera-scm-server restart

Restart all Management Services

More detailed instructions can be found here

Once Cloudera Manager and the Management Services are restarted, use the Add Services wizard to add the Echo service. Detailed instructions can be found here. Once the service is started, go to http://<yourhost>:8080/ in your browser. It should list all the files in the agent's process directory.

If you wish, you can go to the configuration page of the Web Server role and modify the port number. When the Echo service is restarted, the Python web server should be listening on the new port.
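
One way to confirm that the web server is listening, on the default port or on a new one, is to attempt a TCP connection. The sketch below uses only the standard library; `is_listening` is a hypothetical helper, and the host and port passed to it are whatever your deployment uses.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `is_listening("myhost.com", 8080)` should return True once the role has started with the default port.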

A Closer Look into the ECHO CSD

Let's take a closer look at what exactly is going on in this simple CSD.

The Service Descriptor

The service.sdl file is Cloudera Manager's entry point to the CSD. It describes the service and its associated roles, along with configuration options, commands, and so on. In our example, ECHO uses a very basic service.sdl. For a complete reference to the SDL, look here.

{
  "name" : "ECHO",
  "label" : "Echo",
  "description" : "The echo service",
}

This is identifying information about the service. The name is the service type this CSD exposes; it must be uppercase and contain only letters, numbers, or underscores. It is also the first identifier in the jar file name (ECHO in ECHO-1.0.jar). The label is the string shown to the user in Cloudera Manager.

{
  "version" : "1.0",
}

This is the version of the CSD. It is independent of the version of the software the CSD controls. For example, the Spark CSD can be version 1.0 while managing Spark 0.9. The version is the second identifier used in the name of the CSD jar.
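
Taken together, the name and version determine the jar file name, ECHO-1.0.jar in this example. A small sketch of that naming rule, assuming the `<name>-<version>.jar` pattern used throughout this page (the helper `expected_jar_name` is hypothetical):

```python
import json


def expected_jar_name(sdl_text):
    """Build '<name>-<version>.jar' from the fields in service.sdl."""
    sdl = json.loads(sdl_text)
    return "%s-%s.jar" % (sdl["name"], sdl["version"])
```

Comparing this value with the actual file name before copying the jar can catch a mismatch early.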

  "runAs" : { 
    "user" : "root",
    "group" : "root"
   },

This is the default user and group used to run the start script and any other commands specified in the service.sdl. The Cloudera Manager administrator can change this user after the service is added.

"roles" : [
 {
   "name" : "ECHO_WEBSERVER",
   "label" : "Web Server",
   "pluralLabel" : "Web Servers",
 }

This declares the role types associated with the service. In our example there is only one role type: ECHO_WEBSERVER. A current limitation is that role types must be globally unique, so it is suggested that the service type be prepended to the role type to scope it to this service. Just like the service type, the role type must be uppercase and contain only letters, numbers, and underscores. The label and pluralLabel are the strings shown to the user in Cloudera Manager.
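
The naming rules for role types can be expressed as a small check. This is a sketch under two assumptions: that the rule means all-uppercase names such as ECHO_WEBSERVER, and that the service type is joined to the rest of the name with an underscore. `check_role_name` is a hypothetical helper, not something Cloudera Manager provides.

```python
import re

# Uppercase letters, digits, and underscores only.
TYPE_NAME = re.compile(r"[A-Z0-9_]+")


def check_role_name(service_name, role_name):
    """Return a list of convention problems for a role type name."""
    problems = []
    if not TYPE_NAME.fullmatch(role_name):
        problems.append("role name must use only A-Z, 0-9, and _")
    if not role_name.startswith(service_name + "_"):
        problems.append("role name should start with %s_" % service_name)
    return problems
```

Global uniqueness across other installed CSDs cannot be checked from a single descriptor, which is why the prefix convention matters.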

"parameters" : [
 {
  "name" : "port_num",
  "label" : "Webserver port",
  "description" : "The web server port number",
  "required" : "true",
  "type" : "port",
  "default" : 8080
 }
]

Parameters describe the configuration values of a role type. Parameters can also exist at the service level, in which case they are inherited by all the role types. A parameter has a name that should remain stable between versions of the CSD. By convention, parameter names are lowercase with words separated by underscores.

Parameters also have types. In this example, the type of the parameter is "port" and it defaults to 8080. This configuration can be modified by the user on a per instance or host level.

"startRunner" : {
  "program" : "scripts/control.sh",
  "args" : [ "start" ],
  "environmentVariables" : {
    "WEBSERVER_PORT" : "${port_num}"         
  }
},

The startRunner is a structure that tells Cloudera Manager how to start the ECHO_WEBSERVER role. The program is the path of the script to execute, relative to the root of the CSD. In addition to the program path, arguments and environment variables can be set. Both arguments and environment variables can be hardcoded, as with "start", or can contain placeholders that refer to parameters, as with "${port_num}". If the user has not changed the default of "port_num" when the ECHO_WEBSERVER role is started, the agent effectively executes:

WEBSERVER_PORT=8080 scripts/control.sh start
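
To make the placeholder mechanism concrete, here is an illustrative sketch of how `${port_num}` could be resolved from the parameter defaults and any user overrides. It is not Cloudera Manager's implementation; `resolve_environment` and its `overrides` argument are invented for this example.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def resolve_environment(role, overrides=None):
    """Fill ${param} placeholders in a role's startRunner environment."""
    # Start from each parameter's default, then apply user-set values.
    values = {p["name"]: p.get("default") for p in role.get("parameters", [])}
    values.update(overrides or {})
    env = role["startRunner"].get("environmentVariables", {})
    return {
        key: PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)
        for key, template in env.items()
    }
```

With the ECHO role definition and no overrides, this yields a `WEBSERVER_PORT` of "8080"; a placeholder that names an unknown parameter raises `KeyError` in this sketch.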

The Control Script

The control script can be written in whatever language the author wishes, as long as it is executable on the cluster. For ECHO, we use bash since it is available on most Linux installations. In ECHO, the control.sh script starts the Python web server on the port specified by the "port_num" parameter. The port number is passed into the script through an environment variable.

#!/bin/bash
CMD=$1

The command name is passed into the script as the first argument. This is a common pattern because it lets one script perform different operations.
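
Since the control script can be written in any language available on the cluster, the same dispatch pattern could also be expressed in Python. The following is a sketch only and is not part of the ECHO CSD. It assumes a Python 3 interpreter on the host and therefore uses `http.server`, the Python 3 name for `SimpleHTTPServer`. `build_command` and `main` are hypothetical names.

```python
import os
import sys


def build_command(cmd, env):
    """Map a command name to the argv to exec, or None if unknown."""
    if cmd == "start":
        return [sys.executable, "-m", "http.server", env["WEBSERVER_PORT"]]
    return None


def main(argv, env):
    cmd = argv[1] if len(argv) > 1 else ""
    command = build_command(cmd, env)
    if command is None:
        # Unlike control.sh, this sketch also returns a non-zero status.
        print("Don't understand [%s]" % cmd)
        return 1
    print("Starting the web server on port [%s]" % env["WEBSERVER_PORT"], flush=True)
    # Replace this process with the server, as exec does in a shell script.
    os.execvp(command[0], command)
```

Run as a script, it would end with `sys.exit(main(sys.argv, os.environ))`.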

case $CMD in
  (start)
    echo "Starting the web server on port [$WEBSERVER_PORT]"
    exec python -m SimpleHTTPServer $WEBSERVER_PORT
    ;;
  (*)
    echo "Don't understand [$CMD]"
    ;;
esac

If the command is "start", we exec the Python web server on the specified port. Notice that "$WEBSERVER_PORT" is the same name that is specified in the service.sdl. The most important line is: exec python -m SimpleHTTPServer $WEBSERVER_PORT. This is what starts the Python server. The script MUST exec the program as the last thing it does. This ensures that the service executable is rooted under the supervisory process and not under the control.sh script.
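
The effect of exec can be observed directly: the program that is exec'ed keeps the process ID of the shell that launched it, so a supervisor watching that ID ends up watching the service itself. A small demonstration, assuming a POSIX `sh` is available (the function name is made up for this sketch):

```python
import subprocess


def pids_before_and_after_exec():
    """Run a shell that prints its PID, then execs a second shell that prints its own."""
    script = "echo $$; exec sh -c 'echo $$'"
    out = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    before, after = out.stdout.split()
    return int(before), int(after)
```

Both numbers are the same, because exec replaces the running shell with the new program instead of starting a child process.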

Operational Diagnostics

As described in the Cloudera Manager concepts, the agent maintains a separate directory under /var/run/cloudera-scm-agent/process for each role command. An example of one of the ECHO_WEBSERVER roles looks like:

$ tree -a /var/run/cloudera-scm-agent/process/121-echo-ECHO_WEBSERVER/
/var/run/cloudera-scm-agent/process/121-echo-ECHO_WEBSERVER/
├── cloudera-monitor.properties
├── logs
│   ├── stderr.log
│   └── stdout.log
└── scripts
    └── control.sh

We can see that the scripts directory has been copied to the agent. This is how the agent can start our Python web server. If additional configuration files had been generated, they would also be in this directory.
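
When debugging, it is often useful to find the most recent process directory for a role and read its logs/stderr.log. The sketch below picks a directory by name, assuming that names follow the `<number>-<service>-<ROLE_TYPE>` shape seen above and that a higher number means a more recent command. Both assumptions come from the single example on this page, and `latest_process_dir` is a hypothetical helper.

```python
import re

# Directory names look like "121-echo-ECHO_WEBSERVER" in the example above.
PROCESS_DIR = re.compile(r"(\d+)-(.+)")


def latest_process_dir(names, role_type):
    """Pick the directory name with the highest numeric prefix for a role type."""
    best = None
    for name in names:
        m = PROCESS_DIR.fullmatch(name)
        if m and m.group(2).endswith("-" + role_type):
            number = int(m.group(1))  # compare numerically, not as text
            if best is None or number > best[0]:
                best = (number, name)
    return best[1] if best else None
```

On an agent host, `names` could come from `os.listdir("/var/run/cloudera-scm-agent/process")`.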
