This document describes how to use this transformer template to develop your custom algorithm for processing plot-level RGB data.
Additional technical information can be found on our GitHub IO page for this repository.
If you are not familiar with working with an image's alpha channel, there is information and examples on how to do that on the technical page.
It is assumed that:
- you are generating a Docker image containing your algorithm, and that you have Docker installed on your computer
- you are familiar with GitHub template repositories, or know how to use `git`
The following steps can be taken to develop your algorithm for inclusion into a processing pipeline.
- Setup: Click the `Use this template` button in GitHub to make a copy of this repository (or run `git clone`)
- Definitions: Fill in and modify the definitions in the `algorithm_rgb.py` file
- Algorithm: Replace the code in the `calculate` function with your algorithm
- Generate: Run `python3 generate.py` to create a Dockerfile
- Test: Run the `python3 testing.py` script to run your algorithm and validate the results
- Docker: Create a Docker image for your algorithm and publish it
- Testing your Docker image: OPTIONAL
- Testing image production: OPTIONAL
- Finishing: Finish up your development efforts
The first thing to do is to create a copy of this repository that has a meaningful name and that you are able to modify.
In GitHub this is easy: browse to this repository and click the `Use this template` button.
You will be led through the steps necessary to create a clone in a location of your choosing.
If you are not on GitHub, you will need to set up your `git` environment and clone the repository.
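A minimal sketch of that clone step, where the URL is a placeholder for your copy of the template:

```sh
# Clone your copy of the template repository (placeholder URL)
git clone https://github.com/<your-account>/<your-copy-name>.git
cd <your-copy-name>
```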
To fill in the needed definitions, first open the `algorithm_rgb.py` file in your favorite editor.
If you are modifying existing code, you should consider updating the version number definition, `VERSION`.
It's assumed that Semantic Versioning will be used, but any methodology can be used.
Fill in the algorithm definitions with the creator(s) of the algorithm: `ALGORITHM_AUTHOR`, `ALGORITHM_AUTHOR_EMAIL`, `ALGORITHM_NAME`, and `ALGORITHM_DESCRIPTION`.
Multiple names for `ALGORITHM_AUTHOR` and multiple emails for `ALGORITHM_AUTHOR_EMAIL` are supported.
It's best if only one algorithm name is used, but call it what you want.
The safest naming convention is to convert any white-space or other special characters to periods (.), which allows different systems to more easily change the name if needed.
Next, fill in the citation information that will be used in the generated CSV file: `CITATION_AUTHOR`, `CITATION_TITLE`, and `CITATION_YEAR`.
Be sure to enter the citation information accurately since some systems may expect exact matches.
The `VARIABLE_NAMES` definition determines the number of values your algorithm is expected to return.
Enter a name for each returned value, in the order the values are returned, separated by commas.
Be sure to enter them accurately since some systems may expect exact matches.
A mismatch between the number of variable names and the number of returned values is considered an error.
A CSV file suitable for ingestion into BETYdb is generated depending upon the value of the `WRITE_BETYDB_CSV` variable; setting this value to `False` will suppress the generation of this file by default.
Likewise, a CSV file suitable for ingestion into TERRA REF Geostreams is generated depending upon the value of the `WRITE_GEOSTREAMS_CSV` variable; setting it to `False` will suppress that file by default.
Be sure to save your changes.
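For reference, a filled-in set of definitions might look like the following sketch; every value shown here is a hypothetical placeholder rather than a required setting:

```python
# Hypothetical example values for the definitions in algorithm_rgb.py
VERSION = '1.0'

ALGORITHM_AUTHOR = 'Jane Smith, John Doe'        # multiple names are supported
ALGORITHM_AUTHOR_EMAIL = 'jane@example.org, john@example.org'
ALGORITHM_NAME = 'my.greenness.index'            # white-space converted to periods
ALGORITHM_DESCRIPTION = 'Calculates a simple greenness index for each plot image'

CITATION_AUTHOR = 'Jane Smith'
CITATION_TITLE = 'A Simple Greenness Index'
CITATION_YEAR = '2020'

# One name per returned value, in return order, comma separated
VARIABLE_NAMES = 'greenness'

WRITE_BETYDB_CSV = True        # set to False to suppress the BETYdb CSV
WRITE_GEOSTREAMS_CSV = True    # set to False to suppress the Geostreams CSV
```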
Open the `algorithm_rgb.py` file in your favorite editor, if it isn't open already.
Scroll to the bottom of the file, to the function named `calculate`.
Replace the comment starting with `# ALGORITHM` and the line below it with your calculation(s).
As needed, change the name of the array used in your algorithm to the function's parameter, `pxarray`.
Once you have your algorithm in place, replace the comment starting with `# RETURN` and the line below it with your return values.
Remember to order your return values to match the declared names in the `VARIABLE_NAMES` definition.
Modify the rest of the file as necessary if there are additional import statements, functions, classes, or other code needed by your algorithm.
Be sure to save your changes.
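As an illustration, a minimal `calculate` sketch is shown below. The greenness computation is a hypothetical stand-in for your own algorithm; only the function name and its `pxarray` parameter come from the template, and the single return value assumes `VARIABLE_NAMES` declares one name (e.g. `greenness`):

```python
import numpy as np

def calculate(pxarray: np.ndarray):
    """Hypothetical example: the green channel's share of total pixel intensity."""
    # ALGORITHM: sum each color channel; ignore an alpha channel if present
    channel_sums = np.sum(pxarray[:, :, :3], axis=(0, 1), dtype=np.float64)
    greenness = float(channel_sums[1] / max(float(channel_sums.sum()), 1.0))

    # RETURN: one value per name in VARIABLE_NAMES, in declaration order
    return greenness
```

Any extra imports the sketch needs (here, `numpy`) go at the top of the file, as noted above.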
It's time to generate the Dockerfile that's used to build Docker images.
Docker images can be used as part of a workflow, and this step creates the Dockerfile used to build the image for your algorithm.
To assist in this effort, we've provided a script named `generate.py` that produces a file containing the Docker commands needed.
Running this script produces not only the Docker command file, named `Dockerfile`, but also two other files that can be used to install additional dependencies your algorithm needs.
These two other files are named `requirements.txt`, for additional Python modules, and `packages.txt`, for other dependencies.
To generate these files, run `python3 generate.py`.
If your algorithm has additional Python module dependencies, edit `requirements.txt` and add the names of the modules.
The listed modules will then be installed as part of the Docker build process.
If there are other dependencies needed by your algorithm, add them to the `packages.txt` file.
The packages listed will be installed using `apt-get` as part of the Docker build process.
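For example, if your algorithm happened to depend on `scipy` and `scikit-image` (hypothetical dependencies, not template requirements), `requirements.txt` would simply list them, one per line:

```text
scipy
scikit-image
```

Similarly, any system packages would be listed one per line in `packages.txt`.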
A testing script named `testing.py` is provided for testing your algorithm.
It checks that the configuration is correct before testing the files, making sure that the arguments in `algorithm_rgb.py` as well as the image files are in the correct format.
The testing script requires `numpy` and `PIL` to be installed on the testing system.
The following command can be used to install these libraries:

```sh
python3 -m pip install numpy Pillow
```
If your testing files reside in a subfolder named `test_images`, the following command can be used to run the tests:

```sh
./testing.py ${PWD}/test_images
```

If your files reside in `/user/myself/test_images`, the command to test could be the following:

```sh
python3 testing.py /user/myself/test_images
```
What isn't provided in the template repository are the plot-level RGB images to test against. It's expected that you will either provide the images or use a standard set that can be downloaded. The following commands can be used to retrieve and extract the test images:
```sh
# Create the subfolder to hold the images
mkdir test_images
# Download the archive containing the images
curl -X GET https://de.cyverse.org/dl/d/4108BB75-AAA3-48E1-BBD4-E10B06CADF54/sample_plot_images.zip -o test_images/sample_plot_images.zip
# Extract the images into the subfolder
unzip test_images/sample_plot_images.zip -d test_images/
# Remove the archive
rm test_images/sample_plot_images.zip
```
The testing script expects either a list of source plot image files, a folder name, or both to be specified on the command line.
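For example, individual image files and a folder can appear together on one command line; the file names below are hypothetical:

```sh
python3 testing.py plot_1.tif plot_2.tif ${PWD}/test_images
```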
Now that you have generated your `Dockerfile` as described above and specified any Python modules and other packages needed by your algorithm, you are ready to create a Docker image of your algorithm.
You can build the Docker image with the following command:

```sh
docker build -t my_algorithm:latest ./
```
Please refer to the Docker documentation for additional information on building a Docker image.
Once the image is built, you can run it locally or push it to an image repository, such as DockerHub. Please note that there may be naming requirements for pushing images to a repository.
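As a sketch of that publishing step, assuming a hypothetical DockerHub account named `myuser` (DockerHub expects images to be named `<account>/<image>`):

```sh
# Re-tag the local image under the (hypothetical) account name
docker tag my_algorithm:latest myuser/my_algorithm:latest
# Log in to DockerHub and push the image
docker login
docker push myuser/my_algorithm:latest
```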
In order to test your Docker image, you can use the command:

```sh
docker run --rm -it -v ${PWD}:/mnt --entrypoint /mnt/testing.py my_algorithm:latest /mnt/test_images
```
Breaking apart this command line, we have the following pieces:
- `docker run` tells Docker to run an instance of the image (specified later in the command); refer to the docker run documentation
- `--rm` tells Docker to remove the container (an image instance) when it's completed
- `-it` allows you to have a stdin stream and terminal driver added to the Docker container, allowing an interactive session
- `-v ${PWD}:/mnt` bind mounts a volume to the Docker container so that the current working directory (given by `${PWD}`) will be available
- `--entrypoint /mnt/testing.py` overrides the default entrypoint of the Docker image to run the testing script
- `my_algorithm:latest` is the Docker image to run (the running image is known as a container)
- `/mnt/test_images` specifies the location where the script finds the images to test with
Output should consist of one line per image in the images folder, giving the image name and the value calculated for that image. Example output from the sample image set in the `test_images` folder is shown below:
```text
/mnt/test_images/rgb_17_7_W.tif,7000
/mnt/test_images/rgb_40_11_W.tif,7000
/mnt/test_images/rgb_6_1_E.tif,7000
/mnt/test_images/rgb_1_2_E.tif,7000
/mnt/test_images/rgb_33_8_W.tif,7000
/mnt/test_images/rgb_5_11_W.tif,7000
```
Using the same image setup as used when testing your algorithm, a sample command line to run the image could be:
```sh
docker run --rm -v "${PWD}:/mnt" my_algorithm:latest --working_space "/mnt" "/mnt/test_images"
```
Breaking apart this command line, we have the following pieces:
- `docker run` tells Docker to run an instance of the image (specified later in the command); refer to the docker run documentation
- `--rm` tells Docker to remove the container (an image instance) when it's completed
- `-v "${PWD}:/mnt"` specifies that the `${PWD}` path is to be made available as `/mnt` in the container
- `my_algorithm:latest` is the image to run (the running image is known as a container)
- `--working_space "/mnt"` lets the software in the container know where its working disk space is located; files are created here
- `"/mnt/test_images"` specifies where the plot-level image files are located
The `-v` command line parameter is important since it allows the running container to access the local file system.
The container can then load the images from the file system directly, without having to perform any copies.
The parameters after the Docker image name are all relative to the target folder specified with this command line parameter.
Once the image files have been processed, the resulting CSV file(s) will be located in the current folder at `${PWD}` (in this example).
The `result.json` file should tell you what errors were found in the checks from `testing.py` (make sure to check the output in the CSV file(s) even if the `result.json` file does not report errors).
Now that you've created your algorithm, there are a few more things to take care of:
- Make sure you've checked your changes into source control; you don't want to lose all that hard work!
- Update the README.md file, filling out the sections with information on your algorithm; others will want to know so they can use it!
- Submit any requests to our ticketing system on GitHub: https://github.com/AgPipeline/computing-pipeline/issues/new/choose