CSD Primer
Writing a CSD is a relatively simple process. This page walks through developing a toy CSD called ECHO. The ECHO service has one role type, a Python web server, that starts up and listens on a specific port.
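First create a directory called ECHO-1.0, with the descriptor and scripts subdirectories that hold the two files below:
mkdir -p ECHO-1.0/descriptor ECHO-1.0/scripts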
Add ECHO-1.0/descriptor/service.sdl:
{
  "name" : "ECHO",
  "label" : "Echo",
  "description" : "The echo service",
  "version" : "1.0",
  "runAs" : {
    "user" : "root",
    "group" : "root"
  },
  "roles" : [
    {
      "name" : "ECHO_WEBSERVER",
      "label" : "Web Server",
      "pluralLabel" : "Web Servers",
      "parameters" : [
        {
          "name" : "port_num",
          "label" : "Webserver port",
          "description" : "The web server port number",
          "required" : "true",
          "type" : "port",
          "default" : 8080
        }
      ],
      "startRunner" : {
        "program" : "scripts/control.sh",
        "args" : [ "start" ],
        "environmentVariables" : {
          "WEBSERVER_PORT" : "${port_num}"
        }
      }
    }
  ]
}
Add ECHO-1.0/scripts/control.sh:
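#!/bin/bash
# The agent passes the command name (here "start") as the first argument.
CMD=$1
case $CMD in
  (start)
    echo "Starting the web server on port [$WEBSERVER_PORT]"
    # exec replaces this shell with the web server process.
    # SimpleHTTPServer is Python 2 only; on Python 3 the module is http.server.
    exec python -m SimpleHTTPServer $WEBSERVER_PORT
    ;;
  (*)
    echo "Don't understand [$CMD]"
    ;;
esac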
The service.sdl file can be validated with the provided validator. To run a validation:
java -jar validator.jar -s ECHO-1.0/descriptor/service.sdl
A CSD is packaged inside a jar file. To build it, change into the CSD directory and create the jar:
cd ECHO-1.0
jar -cvf ECHO-1.0.jar *
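Because the jar simply bundles whatever is in the directory, it can help to confirm that both files are in place before packaging. The sketch below uses a hypothetical helper, check_csd_layout, that checks only for the two files this primer creates; a larger CSD would have more to check.

```bash
# Hypothetical helper: report any of the primer's two files that are missing
# from a CSD directory. Returns 0 if both exist, 1 otherwise.
check_csd_layout() {
  local dir=$1 missing=0 f
  for f in descriptor/service.sdl scripts/control.sh; do
    if [ ! -f "$dir/$f" ]; then
      echo "Missing $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, run from the directory that contains ECHO-1.0:
#   check_csd_layout ECHO-1.0 && (cd ECHO-1.0 && jar -cvf ECHO-1.0.jar *)
```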
Next, install the ECHO-1.0.jar CSD in Cloudera Manager:
- Copy ECHO-1.0.jar to the CSD repository. The location is configurable, but by default it is /opt/cloudera/csd.
  scp ECHO-1.0.jar myhost.com:/opt/cloudera/csd/.
- Restart Cloudera Manager.
  service cloudera-scm-server restart
- Restart all Management Services.
More detailed instructions can be found here.
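The two command-line steps above could be wrapped in a small script. This is only a sketch: install_csd and run are hypothetical names, it assumes SSH access and sudo rights on the Cloudera Manager host, and by default it is a dry run that just prints the commands. Restarting the Management Services is left out.

```bash
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: copy a CSD jar to the Cloudera Manager host and
# restart the server there. Assumes SSH access and sudo rights on that host.
install_csd() {
  local jar=$1 host=$2 csd_dir=${3:-/opt/cloudera/csd}
  run scp "$jar" "$host:$csd_dir/"
  run ssh "$host" sudo service cloudera-scm-server restart
}

install_csd ECHO-1.0.jar myhost.com
```

Running it with DRY_RUN=0 executes the commands instead of printing them.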
Once Cloudera Manager and the Management Services are restarted, use the Add Service wizard to add the Echo service. Detailed instructions can be found here. Once the service is started, go to http://<yourhost>:8080/ in your browser. It should list all the files in the role's process directory on the agent host.
If you wish, you can go to the configuration page of the Web Server role and change the port number. When the Echo service is restarted, the Python web server should be listening on the new port.
Let's take a closer look at what exactly is going on in this simple CSD.
The service.sdl file is Cloudera Manager's entry point into the CSD. It describes the service and its associated roles, along with configuration options, commands, etc. In our example, ECHO uses a very basic service.sdl. For a complete reference to the SDL, look here.
{
  "name" : "ECHO",
  "label" : "Echo",
  "description" : "The echo service"
}
This is the identifying information about the service. The name is the service type this CSD exposes; it must be all uppercase and contain only letters, numbers, or underscores, and it must match the name of the jar file. The label is the string that appears to the user in the Cloudera Manager UI.
{
  "version" : "1.0"
}
The version of the CSD. This version is independent of the version of the software the CSD controls. For example, the Spark CSD can be version 1.0 while managing Spark 0.9. The version is the second identifier in the name of the CSD jar, after the service type (as in ECHO-1.0.jar).
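Since the jar name is built from the service type and the version, a small helper can derive it and reject service types that break the naming rule described above. csd_jar_name is a hypothetical name; the NAME-VERSION.jar pattern is taken from the ECHO-1.0.jar example in this primer.

```bash
# Hypothetical helper: print the expected jar name for a service type and
# CSD version, or fail if the service type breaks the naming rule.
csd_jar_name() {
  local name=$1 version=$2
  # Service type: uppercase letters, digits and underscores only.
  case $name in
    ''|*[![:upper:][:digit:]_]*)
      echo "Invalid service type [$name]" >&2
      return 1 ;;
  esac
  echo "${name}-${version}.jar"
}

csd_jar_name ECHO 1.0   # prints ECHO-1.0.jar
```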
{
  "runAs" : {
    "user" : "root",
    "group" : "root"
  }
}
This is the default user and group used to run the start script or any other commands specified in the service.sdl. The CM administrator can change this user after the service is added.
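Because the administrator can change runAs after the service is added, it can be handy for control.sh to log the identity it actually runs with. The sketch below is a hypothetical addition to control.sh (run_as_identity is our own name); we assume its output lands in the role's logs/stdout.log, which appears in the process directory listing later on this page.

```bash
# Hypothetical addition near the top of control.sh: record which user and
# group the agent used to run the script.
run_as_identity() {
  echo "$(id -un):$(id -gn)"
}

echo "Running as [$(run_as_identity)]"
```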
{
  "roles" : [
    {
      "name" : "ECHO_WEBSERVER",
      "label" : "Web Server",
      "pluralLabel" : "Web Servers"
    }
  ]
}
This declares the role types associated with this service. In our example we only have one role type: ECHO_WEBSERVER. A current limitation is that the role type needs to be globally unique, so it is suggested that the role type be prefixed with the service type to scope it to the service. Just like the service type, the role type must be all uppercase and contain only letters, numbers, and underscores. The label and pluralLabel are what appear to the user in Cloudera Manager.
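The naming rules above can be checked mechanically. The sketch below uses a hypothetical helper, check_role_type, that separates a hard error (disallowed characters) from a missing service-type prefix, which is only a recommendation.

```bash
# Hypothetical helper: returns 0 if ROLE is well formed and prefixed with
# "SERVICE_", 1 if it contains disallowed characters, and 2 if it is well
# formed but lacks the recommended service-type prefix.
check_role_type() {
  local service=$1 role=$2
  case $role in
    ''|*[![:upper:][:digit:]_]*)
      echo "Invalid role type [$role]" >&2
      return 1 ;;
  esac
  case $role in
    "${service}_"*) return 0 ;;
  esac
  echo "Role type [$role] is not prefixed with [${service}_]" >&2
  return 2
}

check_role_type ECHO ECHO_WEBSERVER && echo "ECHO_WEBSERVER looks fine"
```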
{
  "parameters" : [
    {
      "name" : "port_num",
      "label" : "Webserver port",
      "description" : "The web server port number",
      "required" : "true",
      "type" : "port",
      "default" : 8080
    }
  ]
}
Parameters describe configuration values consumed by the service binary. Parameters can also exist at the service level, where they are inherited by all the role types. A parameter has a name that should remain stable between different versions of the CSD. By convention, parameter names are lowercase, with words separated by underscores.
Parameters also have types. In this example, the type of the parameter is "port" and it defaults to 8080. This configuration can be modified by the user across all instances or on a per-instance basis.
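Declaring the parameter as type "port" suggests Cloudera Manager checks the value, but the control script can also defend itself before starting the server. This is a hypothetical check (validate_port is our own name) that accepts only a decimal number from 1 to 65535.

```bash
# Hypothetical defensive check for control.sh: accept only a decimal port
# number between 1 and 65535.
validate_port() {
  case $1 in
    ''|*[![:digit:]]*) return 1 ;;
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# In control.sh, this could run just before the exec line:
#   validate_port "$WEBSERVER_PORT" || { echo "Invalid port [$WEBSERVER_PORT]" >&2; exit 1; }
if validate_port "${WEBSERVER_PORT:-8080}"; then
  echo "Port ${WEBSERVER_PORT:-8080} is valid"
fi
```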
{
  "startRunner" : {
    "program" : "scripts/control.sh",
    "args" : [ "start" ],
    "environmentVariables" : {
      "WEBSERVER_PORT" : "${port_num}"
    }
  }
}
The startRunner is a structure that tells Cloudera Manager how to start the ECHO_WEBSERVER role. The program is the path of the script to execute, relative to the root of the CSD. In addition to the program path, arguments and environment variables can be set. Both can be hardcoded, as in the case of "start", or contain placeholders that refer to parameters, as in the case of "${port_num}". If the user has not changed the default value of "port_num", then when the ECHO_WEBSERVER role is started, the agent effectively executes:
WEBSERVER_PORT=8080 scripts/control.sh start
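To make the placeholder idea concrete, here is a toy emulation of the substitution step: every ${name} in a template is replaced with the matching name=value argument. This is not how the Cloudera Manager agent is implemented; resolve_placeholders is a hypothetical function that only illustrates the effect.

```bash
# Toy emulation only: replace every ${name} placeholder in a template with
# the value from a name=value argument. Not the agent's real implementation.
resolve_placeholders() {
  local template=$1 pair name value ph rest out
  shift
  for pair in "$@"; do
    name=${pair%%=*}
    value=${pair#*=}
    ph="\${$name}"   # the literal placeholder text, e.g. ${port_num}
    rest=$template
    out=
    # Copy text up to each placeholder, append the value, continue after it.
    while :; do
      case $rest in
        *"$ph"*)
          out=$out${rest%%"$ph"*}$value
          rest=${rest#*"$ph"}
          ;;
        *) break ;;
      esac
    done
    template=$out$rest
  done
  printf '%s\n' "$template"
}

resolve_placeholders '${port_num}' port_num=8080   # prints 8080
```

Hardcoded values such as "start" contain no placeholder and pass through unchanged.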
The control script can be written in any language, as long as it is executable on the cluster hosts. For ECHO, we use bash since it is available on virtually every Linux installation. In ECHO, the control.sh script starts the Python web server on the port specified by the "port_num" parameter, which is passed into the script through the WEBSERVER_PORT environment variable.
#!/bin/bash
CMD=$1
The command name is passed into the script as the first argument. This is a common pattern because it lets a single script perform different operations.
case $CMD in
  (start)
    echo "Starting the web server on port [$WEBSERVER_PORT]"
    exec python -m SimpleHTTPServer $WEBSERVER_PORT
    ;;
  (*)
    echo "Don't understand [$CMD]"
    ;;
esac
If the command is "start", we exec the Python web server on the specified port. Notice that WEBSERVER_PORT is the same environment variable name specified in the service.sdl. Any other command just prints an error message.
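As written, control.sh exits with status 0 even when it does not recognize the command. A variant that follows the usual Unix convention of signalling failure with a non-zero status might look like this; wrapping the dispatch in a function named control is our own choice, so the error branch can be exercised without starting the server.

```bash
# Variant of the control.sh dispatch wrapped in a function. Unknown commands
# now report to stderr and return a non-zero status instead of succeeding.
control() {
  case $1 in
    (start)
      echo "Starting the web server on port [$WEBSERVER_PORT]"
      exec python -m SimpleHTTPServer "$WEBSERVER_PORT"
      ;;
    (*)
      echo "Don't understand [$1]" >&2
      return 1
      ;;
  esac
}

# At the bottom of control.sh the function would be called as:
#   control "$1"
# A script exits with the status of its last command, so an unknown
# command makes the script exit with status 1.
```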
The most important line is what starts the python server:
exec python -m SimpleHTTPServer $WEBSERVER_PORT
The script MUST execute the program via the shell builtin exec. exec replaces the shell process with the program, which ensures that the service executable is rooted directly under the supervisord process tree rather than under the control.sh script.
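The effect of exec is easy to observe: a command started with exec keeps the PID of the shell it replaces, whereas a command run without exec becomes a child process with a new PID. For control.sh, this means the Python server presumably keeps the PID that supervisord launched the script with. pid_check is a hypothetical demo function.

```bash
# Prints "same" if the second shell keeps the first shell's PID, and
# "different" if it runs as a separate child process.
pid_check() {
  local out
  if [ "$1" = exec ]; then
    # exec: the inner shell replaces the outer one and keeps its PID.
    out=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
  else
    # No exec: the inner shell is a forked child with its own PID. The trailing
    # "true" stops the outer shell from implicitly exec-ing its last command.
    out=$(sh -c 'echo $$; sh -c "echo \$\$"; true')
  fi
  set -- $out   # split the two PIDs into $1 and $2
  if [ "$1" = "$2" ]; then echo same; else echo different; fi
}

pid_check exec   # same
pid_check fork   # different
```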
As described in the Cloudera Manager concepts, the agent maintains a separate directory under /var/run/cloudera-scm-agent/process for each role command. For example, the directory of one of the ECHO_WEBSERVER roles looks like:
$ tree -a /var/run/cloudera-scm-agent/process/121-echo-ECHO_WEBSERVER/
/var/run/cloudera-scm-agent/process/121-echo-ECHO_WEBSERVER/
├── cloudera-monitor.properties
├── logs
│ ├── stderr.log
│ └── stdout.log
└── scripts
└── control.sh
We can see that the scripts directory has been copied to the agent. This is how the agent is able to start our Python web server.
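Each role command gets a new directory, so finding the current one by hand can be tedious. The sketch below is a hypothetical helper, latest_process_dir, that assumes directory names follow the 121-echo-ECHO_WEBSERVER pattern shown above and that the leading number grows with each new role command; both are assumptions based on that one example.

```bash
# Hypothetical helper: print the newest process directory for a role.
# Assumes names like "121-echo-ECHO_WEBSERVER" with an increasing numeric prefix.
latest_process_dir() {
  local base=${1:-/var/run/cloudera-scm-agent/process} role=${2:-ECHO_WEBSERVER} name
  name=$(ls "$base" 2>/dev/null | grep -- "-${role}\$" | sort -n | tail -n 1)
  [ -n "$name" ] || return 1
  echo "$base/$name"
}

# On an agent host, for example:
#   tree -a "$(latest_process_dir)"
```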