Summary

This software is a simple, non-production-ready processor for Apache NiFi that uses the Greenplum Stream Server (GPSS) to load data into Greenplum.
It is written in Java and uses the following technologies: Apache NiFi, Java, gRPC and Greenplum GPSS.
At the moment it supports only the JSON format: the processor receives JSON entries from a NiFi relationship and ingests them into a Greenplum table.

The following references can help you better understand the software:

Apache NiFi:
https://nifi.apache.org/
gRPC:
https://grpc.io/
Greenplum GPSS:
https://gpdb.docs.pivotal.io/5160/greenplum-stream/overview.html
https://gpdb.docs.pivotal.io/5160/greenplum-stream/api/dev_client.html

These are the steps to run the software:

Prerequisites

  1. Enable the gpss extension in the Greenplum database you want to use (for example, test):

    test=# CREATE EXTENSION gpss;
    
  2. Create the Greenplum table to ingest into

    Create a table with a single json column (called data):

    test=# create table test(data json);
    
  3. Run a gpss server with a suitable configuration, for example:

    gpss ./gpsscfg1.json --log-dir ./gpsslogs

    where gpsscfg1.json contains:

    {
       "ListenAddress": {
          "Host": "",
          "Port": 8085,
          "SSL": false
       },
       "Gpfdist": {
          "Host": "",
          "Port": 8086
       }
    }
    
  4. Download, install and start NiFi

Screenshot

Deploy and test the nifi processor

  1. Copy the .nar file to the NiFi lib directory

The NiFi processor is written in Java, and the Maven build packages it as a .nar file that can be deployed in NiFi. Copy ./nifi-gpss-nar/target/nifi-gpss-nar-1.0-SNAPSHOT.nar into your NiFi lib directory.

Screenshot
Screenshot

  1. Restart NiFi

Once the file is copied, restart NiFi so that it loads the new processor.

  1. Add the processor in the NiFi UI

Screenshot

  1. Set the processor properties

Screenshot

The password can be left empty. All other properties must be specified.

NumberOfItemsToBatch sets how many items the processor accumulates before ingesting them. In this example it is 5, so the processor must receive at least 5 JSON entries before it ingests anything.

For pure streaming, set it to 1.
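The batching behaviour described above can be illustrated with a minimal Java sketch. This is not the processor's actual code: the class name BatchBufferSketch, its methods, and the Consumer that stands in for the GPSS write are all hypothetical. They only show how a threshold such as NumberOfItemsToBatch could delay ingestion until enough entries have arrived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration of count-based batching; not the processor's real implementation. */
class BatchBufferSketch {
    private final int batchSize;                // plays the role of NumberOfItemsToBatch
    private final Consumer<List<String>> sink;  // assumption: stands in for the write to GPSS
    private final List<String> pending = new ArrayList<>();

    BatchBufferSketch(int batchSize, Consumer<List<String>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one JSON entry and flushes as soon as batchSize entries are pending. */
    void add(String jsonEntry) {
        pending.add(jsonEntry);
        if (pending.size() >= batchSize) {
            sink.accept(new ArrayList<>(pending)); // hand over a copy, then reset the buffer
            pending.clear();
        }
    }

    /** Number of entries received but not yet handed to the sink. */
    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        BatchBufferSketch buffer = new BatchBufferSketch(5, written::add);
        for (int i = 0; i < 7; i++) {
            buffer.add("{\"id\": " + i + "}");
        }
        // prints "1 batch(es) written, 2 pending"
        System.out.println(written.size() + " batch(es) written, " + buffer.pendingCount() + " pending");
    }
}
```

With a batch size of 5 and 7 entries, one batch is handed over and 2 entries wait for more input. With a batch size of 1, every entry is handed over immediately, which corresponds to the pure streaming setting.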

Also set the processor to be a terminating one, that is, auto-terminate its relationships in the processor settings.
Screenshot

  1. Add a GetFile processor as a tester

Screenshot
Screenshot

  1. Create a relationship

Screenshot

  1. Start the two processors

Screenshot

You can stop and restart the processor whenever you want.

  1. Put a populated JSON file in the test directory you specified in the GetFile processor

You can copy several single-line files, or submit one file containing multiple JSON entries, one per line.
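A test file in the one-entry-per-line layout can be generated with a short Java sketch like the one below. The class name TestFileSketch, the file name entries.json and the default directory gpss-test-input are hypothetical. The sample entry is the one that appears in the query output later in this README. Pass the directory configured in GetFile as the first argument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical test-data generator; not part of the processor. */
class TestFileSketch {
    // Sample entry copied from the query output shown in this README.
    static final String SAMPLE = "{\"name\": \"John\", \"age\": \"31\", \"city\": \"New York\"}";

    /** Builds 'count' lines, each holding one complete JSON entry. */
    static List<String> buildLines(int count) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            lines.add(SAMPLE);
        }
        return lines;
    }

    /** Writes the entries to dir/fileName, one JSON entry per line (UTF-8). */
    static Path writeTestFile(Path dir, String fileName, int count) {
        try {
            Files.createDirectories(dir);
            return Files.write(dir.resolve(fileName), buildLines(count));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the first argument is the GetFile input directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : "gpss-test-input");
        Path file = writeTestFile(dir, "entries.json", 5);
        System.out.println("Wrote " + file.toAbsolutePath());
    }
}
```

Writing 5 entries matches the NumberOfItemsToBatch value of 5 used in the example configuration, so a single file should be enough to trigger one ingestion.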

Screenshot
Screenshot

  1. Check the NiFi application logs and verify that the Greenplum table is populated

Screenshot Screenshot

test=# select * from test;
                       data                        
---------------------------------------------------
 {"name": "John", "age": "31", "city": "New York"}
 {"name": "John", "age": "31", "city": "New York"}
 {"name": "John", "age": "31", "city": "New York"}
 {"name": "John", "age": "31", "city": "New York"}
 {"name": "John", "age": "31", "city": "New York"}
(5 rows)

Build and development

The software is built with Maven. To build the project, run:

mvn clean package

in the main directory.

This creates a .nar file in ./nifi-gpss-nar/target that you can deploy to NiFi.