This StreamSets Data Collector pipeline loads a pre-trained TensorFlow model to classify cancer conditions as either benign or malignant.
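As a rough illustration of what the pipeline's TF_MODEL_LOCATION parameter points to, the sketch below loads a TensorFlow SavedModel and scores a single record outside of Data Collector. It is a minimal sketch, not part of the pipeline itself: the model path, serving signature, and feature shape are all hypothetical.

```python
# Minimal sketch (assumptions): the pre-trained model is exported as a TensorFlow
# SavedModel with a default serving signature; the path and feature shape are placeholders.
import numpy as np
import tensorflow as tf

model = tf.saved_model.load("/path/to/saved_model")   # hypothetical TF_MODEL_LOCATION
infer = model.signatures["serving_default"]           # default serving signature

# Score one record of numeric features (shape and values are illustrative only).
features = tf.constant(np.random.rand(1, 9).astype(np.float32))
outputs = infer(features)

# Output tensor names depend on how the model was exported; print whatever comes back.
for name, tensor in outputs.items():
    print(name, tensor.numpy())
```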
- StreamSets Data Collector. You can deploy Data Collector on your choice of cloud provider, or download it for local development.
- Download and import the pipeline into your instance of Data Collector
- Download the sample dataset
- Download the TensorFlow model
- After importing the pipeline into your environment and before running it, update the following pipeline parameters:
[
  {
    "key": "INPUT_DATA_LOCATION",
    "value": ""
  },
  {
    "key": "INPUT_DATA_FILE",
    "value": ""
  },
  {
    "key": "KAFKA_TOPIC_BENIGN",
    "value": ""
  },
  {
    "key": "KAFKA_TOPIC_MALIGNANT",
    "value": ""
  },
  {
    "key": "KAFKA_BROKER_URI",
    "value": ""
  },
  {
    "key": "TF_MODEL_LOCATION",
    "value": ""
  }
]
These pipeline parameters refer to the locations of the source dataset and the TensorFlow model, the Kafka topics, and the Kafka broker URI.
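For example, a completed parameter set might look like the following; every path, topic name, and broker address here is a hypothetical placeholder for your own environment.

```json
[
  {
    "key": "INPUT_DATA_LOCATION",
    "value": "/data/cancer"
  },
  {
    "key": "INPUT_DATA_FILE",
    "value": "dataset.csv"
  },
  {
    "key": "KAFKA_TOPIC_BENIGN",
    "value": "benign"
  },
  {
    "key": "KAFKA_TOPIC_MALIGNANT",
    "value": "malignant"
  },
  {
    "key": "KAFKA_BROKER_URI",
    "value": "localhost:9092"
  },
  {
    "key": "TF_MODEL_LOCATION",
    "value": "/models/cancer_saved_model"
  }
]
```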
For technical information and a detailed explanation of this use case, read this blog.