This covers the steps you need to complete before starting on any of the other StreamSets Control Hub job-related tutorials in this set.
- Python 3.4+ and pip3 installed
- StreamSets SDK for Python installed and activated (see the install sketch after this list)
- Access to StreamSets Control Hub with a user account in your organization
- At least one StreamSets Data Collector instance registered with the above StreamSets Control Hub instance
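If the StreamSets SDK for Python is not installed yet, here is a minimal sketch of the install, assuming the SDK is distributed as the streamsets package on PyPI (activation is a separate step; see the SDK documentation):
$ pip3 install streamsets==3.8.0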
Note: Make sure that the user account has the permissions needed for the tasks this blog post covers. The easiest way to check is to perform those tasks in the StreamSets Control Hub web UI first and fix any access problems before embarking on the path below.
While creating this tutorial, the following versions were used:
- Python 3.6
- StreamSets SDK for Python 3.8.0
- StreamSets Data Collector 3.17.0 (all instances)
In this preparation, two jobs are created with the following names:
- Job for Kirti-HelloWorld
- Job for Kirti-DevRawDataSource
This page details how to create them using the SDK for Python. Optionally, you can create them in the browser UI instead; just make sure the jobs match all the details given below.
In a terminal, type the following command to open a Python 3 interpreter.
$ python3
Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 26 2018, 19:50:54)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Let’s assume the StreamSets Control Hub is running at http://sch.streamsets.com. Create an object called control_hub that is connected to it.
from streamsets.sdk import ControlHub
# Replace the argument values according to your setup
control_hub = ControlHub(server_url='http://sch.streamsets.com',
                         username='user@organization1',
                         password='password')
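Before moving on, you can optionally confirm the connection and that at least one Data Collector is registered. A minimal sketch, assuming the data_collectors collection exposed by SDK 3.x:
# List the Data Collectors registered with this Control Hub;
# the prerequisites above require at least one.
for data_collector in control_hub.data_collectors:
    print(data_collector.url)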
Create a job using either the UI or the SDK for Python.
Here is a sample job created using the SDK for Python. For the purposes of this tutorial, create the job with:
- tags, e.g. tags=['kirti-job-dev-tag']
- Data Collector labels, e.g. data_collector_labels=['kirti-dev']
- time series analysis enabled
# Create a pipeline
builder = control_hub.get_pipeline_builder()
dev_data_generator = builder.add_stage('Dev Data Generator')
trash = builder.add_stage('Trash')
dev_data_generator >> trash  # connect the Dev Data Generator origin to the Trash destination.
pipeline = builder.build('Kirti-HelloWorld')
control_hub.publish_pipeline(pipeline)
# Create a job for the above
job_builder = control_hub.get_job_builder()
job = job_builder.build('Job for Kirti-HelloWorld', pipeline=pipeline, tags=['kirti-job-dev-tag'])
job.data_collector_labels = ['kirti-dev']
job.enable_time_series_analysis = True
control_hub.add_job(job)
After the above code is executed, the job can be seen in the UI as follows. Note the Data Collector label here.
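If you prefer to verify from the SDK rather than the UI, here is a minimal sketch, assuming jobs.get accepts a job_name filter as in SDK 3.x:
# Fetch the job back from Control Hub and inspect its Data Collector labels
job = control_hub.jobs.get(job_name='Job for Kirti-HelloWorld')
print(job.data_collector_labels)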
Create another job using either the UI or the SDK for Python.
Here is a sample job created using the SDK for Python. For the purposes of this tutorial, create the job with:
- tags, e.g. tags=['kirti-job-dev-RawDS-tag']
- Data Collector labels, e.g. data_collector_labels=['kirti-dev']
- time series analysis enabled
# Create second pipeline
builder = control_hub.get_pipeline_builder()
dev_raw_data_source = builder.add_stage('Dev Raw Data Source')
trash = builder.add_stage('Trash')
dev_raw_data_source >> trash # connect the Dev Raw Data Source origin to the Trash destination.
pipeline = builder.build('Kirti-DevRawDataSource')
control_hub.publish_pipeline(pipeline)
# Create a job for the above
job_builder = control_hub.get_job_builder()
job = job_builder.build('Job for Kirti-DevRawDataSource', pipeline=pipeline, tags=['kirti-job-dev-RawDS-tag'])
job.data_collector_labels = ['kirti-dev']
job.enable_time_series_analysis = True
control_hub.add_job(job)
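As a final sanity check, you can confirm that both jobs now exist in Control Hub. A minimal sketch that iterates over control_hub.jobs, assuming each job exposes a job_name attribute:
# Confirm both tutorial jobs are present
job_names = [job.job_name for job in control_hub.jobs]
assert 'Job for Kirti-HelloWorld' in job_names
assert 'Job for Kirti-DevRawDataSource' in job_names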
With this preparation complete, you are ready to start on the other tutorials in this set. For more details about the SDK for Python, check the SDK documentation.
If you encounter any problems with this tutorial, please file an issue in the tutorials project.