CSD Primer
Writing a CSD is a relatively simple process. This page will walk through developing a toy CSD called ECHO. The ECHO service has one role type, a python webserver, that starts up and listens on a specific port.
First create a directory called ECHO-1.0:
mkdir ECHO-1.0
Add ECHO-1.0/descriptor/service.sdl:
{
  "name" : "ECHO",
  "label" : "Echo",
  "description" : "The echo service",
  "version" : "1.0",
  "runAs" : {
    "user" : "root",
    "group" : "root"
  },
  "roles" : [
    {
      "name" : "ECHO_WEBSERVER",
      "label" : "Web Server",
      "pluralLabel" : "Web Servers",
      "parameters" : [
        {
          "name" : "port_num",
          "label" : "Webserver port",
          "description" : "The web server port number",
          "required" : "true",
          "type" : "port",
          "default" : 8080
        }
      ],
      "startRunner" : {
        "program" : "scripts/control.sh",
        "args" : [ "start" ],
        "environmentVariables" : {
          "WEBSERVER_PORT" : "${port_num}"
        }
      }
    }
  ]
}
Add ECHO-1.0/scripts/control.sh:
#!/bin/bash
CMD=$1
case $CMD in
  (start)
    echo "Starting the web server on port [$WEBSERVER_PORT]"
    exec python -m SimpleHTTPServer $WEBSERVER_PORT
    ;;
  (*)
    echo "Don't understand [$CMD]"
    ;;
esac
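The control script above relies on the Python 2 module name SimpleHTTPServer; in Python 3 the same functionality lives in http.server. The following is a minimal sketch, not part of the ECHO CSD, of how a script could pick the module name from the interpreter's major version. The helper name http_module_for is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of the ECHO CSD): map a Python major version
# to the standard-library module that serves the current directory over HTTP.
http_module_for() {
  case "$1" in
    2) echo "SimpleHTTPServer" ;;  # Python 2 module name, as used by ECHO
    3) echo "http.server" ;;       # Python 3 renamed the module
    *) return 1 ;;                 # unknown version: let the caller decide
  esac
}

# Possible use inside the (start) branch, assuming `python` is on the PATH:
#   major=$(python -c 'import sys; print(sys.version_info[0])')
#   exec python -m "$(http_module_for "$major")" "$WEBSERVER_PORT"
```

The rest of this page keeps the original SimpleHTTPServer invocation.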
The service.sdl file can be validated with the provided validator. To run a validation, run:
java -jar validator.jar -s <ECHO-1.0/descriptor/service.sdl>
A CSD is packaged inside a jar file. To build a CSD, the following command can be run inside the CSD directory:
cd ECHO-1.0
jar -cvf ECHO-1.0.jar *
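Before packaging, it can help to confirm that the two files this primer relies on are in place. The check below is a sketch under that assumption only; the function name check_csd_layout is hypothetical and it does not replace the validator.

```shell
#!/bin/bash
# Hypothetical pre-packaging check: confirm the files used in this primer
# exist under the CSD directory before running `jar`.
check_csd_layout() {
  local csd_dir=$1
  if [ ! -f "$csd_dir/descriptor/service.sdl" ]; then
    echo "missing descriptor/service.sdl" >&2
    return 1
  fi
  if [ ! -f "$csd_dir/scripts/control.sh" ]; then
    echo "missing scripts/control.sh" >&2
    return 1
  fi
  return 0
}

# Possible use from the parent directory of ECHO-1.0:
#   check_csd_layout ECHO-1.0 && (cd ECHO-1.0 && jar -cvf ECHO-1.0.jar *)
```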
After the ECHO-1.0.jar CSD is built, it needs to be installed in Cloudera Manager. Perform the following steps:
- Copy ECHO-1.0.jar to the CSD repository. The location is configurable, but by default it is /opt/cloudera/csd:
scp ECHO-1.0.jar myhost.com:/opt/cloudera/csd/.
- Restart Cloudera Manager:
service cloudera-scm-server restart
- Restart all Management Services
More detailed instructions can be found here
Once Cloudera Manager and the Management Services are restarted, use the Add Services wizard to add the Echo Service. Detailed instructions can be found here. Once the service is started, point your browser at http://<yourhost>:8080/. It should show all the files in the agent's process directory.
If you wish, you can go to the configuration page of the WebServer role and modify the port number. When the Echo Service is restarted, the python web server should be listening on the new port.
Let's take a closer look at exactly what is going on with this simple CSD.
The service.sdl file is the entry point that Cloudera Manager has to the CSD. It describes the service and its associated roles, along with configuration options, commands, etc. In our example, ECHO uses a very basic service.sdl. For a complete reference to the SDL look here.
{
"name" : "ECHO",
"label" : "Echo",
"description" : "The echo service",
}
This is identifying information about the service. The name is the service type this CSD exposes; it must be uppercase and contain only letters, numbers, or underscores. It needs to match the name of the jar file. The label is the string shown to the user in Cloudera Manager.
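The naming rule for service types (and, later on this page, role types) can be checked mechanically. The sketch below assumes "capitalized" means uppercase ASCII letters; the function name is_valid_type_name is hypothetical and is not part of the provided validator.

```shell
#!/bin/bash
# Hypothetical check for the naming rule described above: non-empty, and
# made up only of uppercase letters, digits, and underscores.
is_valid_type_name() {
  case "$1" in
    '') return 1 ;;  # an empty name is rejected
    *[!ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]*) return 1 ;;  # any other character is rejected
  esac
  return 0
}

# Example: report on a few candidate names.
for candidate in ECHO ECHO_WEBSERVER echo ECHO-1; do
  if is_valid_type_name "$candidate"; then
    echo "$candidate: ok"
  else
    echo "$candidate: rejected"
  fi
done
```

The character class is spelled out rather than written as A-Z so the result does not depend on the shell's locale.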
{
"version" : "1.0",
}
The version of the CSD. This version is independent of the version of the software the CSD is controlling. For example, the Spark CSD can be version 1.0 but managing Spark 0.9. This is the second identifier used in the naming of the CSD jar.
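Taken together, the name and version give the jar file name used in this primer, ECHO-1.0.jar. A small sketch of composing and splitting that file name follows, assuming the NAME-VERSION.jar pattern seen here; the function names are hypothetical. Because the name cannot contain a hyphen, splitting on the first hyphen is safe.

```shell
#!/bin/bash
# Hypothetical helpers for the NAME-VERSION.jar pattern used in this primer.

# Compose the jar file name from the service name and CSD version.
csd_jar_name() {
  printf '%s-%s.jar\n' "$1" "$2"
}

# Extract the service name: everything before the first hyphen.
csd_name_from_jar() {
  local base=${1%.jar}
  printf '%s\n' "${base%%-*}"
}

# Extract the CSD version: everything after the first hyphen.
csd_version_from_jar() {
  local base=${1%.jar}
  printf '%s\n' "${base#*-}"
}
```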
"runAs" : {
"user" : "root",
"group" : "root"
},
This is the default user/group that is used to run the start script or any other commands specified in the service.sdl. The CM administrator can change this user if they like after a service is added.
"roles" : [
{
"name" : "ECHO_WEBSERVER",
"label" : "Web Server",
"pluralLabel" : "Web Servers",
}
This declares the role types that are associated with this service. In our example we only have one role type: ECHO_WEBSERVER. A current limitation is that the role type needs to be globally unique. Because of this, it is suggested that the service type be prepended to the role type to scope it to this service. Just like the service type, the role type must be uppercase and contain only letters, numbers, and underscores. The label and pluralLabel are what appear to the user in Cloudera Manager.
"parameters" : [
{
"name" : "port_num",
"label" : "Webserver port",
"description" : "The web server port number",
"required" : "true",
"type" : "port",
"default" : 8080
}
]
Parameters are used to describe the configuration values of a role type. Parameters can also exist at the service level, in which case they are inherited by all the role types. A parameter has a name that should remain stable between different versions of the CSD. By convention, parameter names are lower case with words separated by underscores.
Parameters also have types. In this example, the type of the parameter is "port" and it defaults to 8080. This configuration can be modified by the user on a per instance or host level.
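This page does not say what validation Cloudera Manager applies to a "port" parameter, so a control script may want to check the value it receives. The sketch below is one defensive check, assuming the usual TCP range of 1 to 65535; the function name is_valid_port is hypothetical.

```shell
#!/bin/bash
# Hypothetical defensive check a control script could run on the port value
# it receives: digits only, and within 1..65535.
is_valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # empty or contains a non-digit
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Possible use before starting the server:
#   is_valid_port "$WEBSERVER_PORT" || { echo "bad port" >&2; exit 1; }
```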
"startRunner" : {
"program" : "scripts/control.sh",
"args" : [ "start" ],
"environmentVariables" : {
"WEBSERVER_PORT" : "${port_num}"
}
},
The startRunner is a structure that tells Cloudera Manager how to start the ECHO_WEBSERVER role. The program is the path of the script to execute, relative to the root of the CSD. In addition to the program path, arguments and environment variables can be set. Both arguments and environment variables can be hardcoded, as is the case with "start", or contain placeholders that point to parameters, as is the case with "${port_num}". If the user has not changed the default of "port_num" when starting the ECHO_WEBSERVER role, the agent will effectively execute:
WEBSERVER_PORT=8080 scripts/control.sh start
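To make the placeholder step concrete, here is a sketch that mimics the substitution in bash. It only illustrates the idea of replacing "${port_num}" with the configured value; it is not how Cloudera Manager implements it, and the function name render_placeholder is hypothetical.

```shell
#!/bin/bash
# Illustration only: replace every "${param}" placeholder in a template
# with a value, the way the environment variable above ends up as 8080.
render_placeholder() {
  local template=$1 param=$2 value=$3
  local placeholder="\${${param}}"  # builds the literal text ${port_num}
  local rendered
  # Quoting the pattern makes bash match it literally, not as a glob.
  rendered=${template//"$placeholder"/$value}
  printf '%s\n' "$rendered"
}

# Example: render the value of WEBSERVER_PORT from the service.sdl.
WEBSERVER_PORT=$(render_placeholder '${port_num}' port_num 8080)
echo "WEBSERVER_PORT=$WEBSERVER_PORT scripts/control.sh start"
```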
The control script can be written in whatever language the author wishes, as long as it is executable on the cluster. For ECHO, we have decided to use bash since it is available on most Linux installations. In ECHO, the control.sh script is used to start the python webserver on the port specified by the "port_num" parameter. The port number is passed into the script with an environment variable.
#!/bin/bash
CMD=$1
The command name is passed into the script as the first argument. This is a common pattern since we can have one script that can perform different operations.
case $CMD in
  (start)
    echo "Starting the web server on port [$WEBSERVER_PORT]"
    exec python -m SimpleHTTPServer $WEBSERVER_PORT
    ;;
  (*)
    echo "Don't understand [$CMD]"
    ;;
esac
If the command is "start", we exec the python web server on the specified port. Notice that "$WEBSERVER_PORT" is the same name that is specified in the service.sdl. The most important line is:
exec python -m SimpleHTTPServer $WEBSERVER_PORT
This is what starts the python server. The script MUST exec the program as the last thing it does. This ensures that the service executable is rooted under the supervisory process and not the control.sh script.
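The effect of exec can be seen with a small experiment that is independent of Cloudera Manager: a shell prints its process ID and then execs a second shell that prints its own. Because exec replaces the running process instead of starting a child, both IDs are the same, which is why the supervisor ends up watching the server itself rather than a wrapper script.

```shell
#!/bin/bash
# The outer shell prints its PID, then execs a second shell that prints
# its own PID. exec replaces the process, so the two values match.
pids=$(bash -c 'echo $$; exec bash -c "echo \$\$"')

# Split the two output lines into separate variables.
before=$(printf '%s\n' "$pids" | sed -n 1p)
after=$(printf '%s\n' "$pids" | sed -n 2p)

if [ "$before" = "$after" ]; then
  echo "exec kept the same PID"
else
  echo "PIDs differ"
fi
```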
As described in the Cloudera Manager concepts, the agent maintains a separate directory under /var/run/cloudera-scm-agent/process for each role command. An example of one of the ECHO_WEBSERVER roles looks like:
$ tree -a /var/run/cloudera-scm-agent/process/121-echo-ECHO_WEBSERVER/
/var/run/cloudera-scm-agent/process/121-echo-ECHO_WEBSERVER/
├── cloudera-monitor.properties
├── logs
│   ├── stderr.log
│   └── stdout.log
└── scripts
    └── control.sh
We can see that the scripts directory and its contents have been copied to the agent. This is how the agent can start our python webserver. If we had additional configuration files that were generated, they would also be in this directory.