This project is a collection of data transformation plugins for Hydrator. The following plugins are currently available:
- CSV Parser
- CSV Formatter
- JSON Parser
- JSON Formatter
- Clone Record
- Compressor
- Decompressor
- Encoder
- Decoder
- Hasher
- XML to JSON Converter
Follow these instructions to build and deploy Hydrator transform plugins.
To use the plugins, you need CDAP version 3.2.0 or later. CDAP Standalone, which includes Hydrator, is available for download.
To get started with Hydrator plugins, build them directly from the latest source code:

    git clone https://github.com/cdapio/hydrator-plugins.git
    cd hydrator-plugins
    mvn clean package -pl transform-plugins -am

After the build completes, you will have a JAR for each plugin under each `<plugin-name>/target/` directory.
You can deploy the transform plugins using the CDAP CLI:

    cdap > load artifact target/transform-plugins-<version>.jar \
           config-file target/transform-plugins-<version>.json

ID: | CSVParser |
---|---|
Type: | Transform |
Mode: | Batch and Realtime |
Description: | Parses an input field as a CSV Record into a Structured Record. Supports multi-line CSV Record parsing into multiple Structured Records. Different formats of CSV Record can be parsed using this plugin. Supports these CSV Record types: DEFAULT, EXCEL, MYSQL, RFC4180, and TDF. |
Configuration: |
|
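To illustrate the idea of turning one multi-line CSV field into several records, here is a minimal Python sketch. It is not the plugin's implementation: the function name `parse_csv_field`, its parameters, and the sample data are hypothetical, and Python's built-in `excel` and `excel-tab` dialects only stand in for the formats listed above.

```python
import csv
import io


def parse_csv_field(record, field, columns, dialect="excel"):
    """Parse record[field] as CSV text and return one dict per CSV row.

    `columns` plays the role of the output schema (an assumption of this sketch).
    """
    reader = csv.reader(io.StringIO(record[field], newline=""), dialect=dialect)
    # Skip empty rows so a trailing newline does not produce an empty record.
    return [dict(zip(columns, row)) for row in reader if row]


# One input record whose "body" field holds two CSV rows.
record = {"body": 'alice,30,"New York, NY"\nbob,25,Boston\n'}
rows = parse_csv_field(record, "body", ["name", "age", "city"])

# Tab-delimited input, parsed with Python's "excel-tab" dialect.
tab_rows = parse_csv_field({"body": "a\tb\n"}, "body", ["x", "y"], dialect="excel-tab")
```

All parsed values are strings in this sketch; converting them to typed schema fields is left out.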
ID: | CSVFormatter |
---|---|
Type: | Transform |
Mode: | Batch and Realtime |
Description: | Formats a Structured Record as a CSV Record. Supported CSV Record formats are DELIMITED, EXCEL, MYSQL, RFC4180, and TDF. When the format is DELIMITED, you can specify the delimiter that the CSV Record uses to separate fields. |
Configuration: |
|
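The effect of a configurable delimiter can be sketched in a few lines of Python. This is only an illustration of the concept, not the plugin's code; `format_as_csv` and its parameters are hypothetical names.

```python
import csv
import io


def format_as_csv(record, fields, delimiter=","):
    """Write the selected fields of a record as a single CSV line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow([record[name] for name in fields])
    # Drop the line terminator so the result is a single field value.
    return buf.getvalue().rstrip("\n")


record = {"name": "alice", "age": 30, "city": "New York, NY"}
comma_line = format_as_csv(record, ["name", "age", "city"])
pipe_line = format_as_csv(record, ["name", "age", "city"], delimiter="|")
```

With a comma delimiter the city value has to be quoted because it contains a comma; with a pipe delimiter it does not.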
ID: | JSONParser |
---|---|
Type: | Transform |
Mode: | Batch and Realtime |
Description: | Parses an input field value as a JSON Object. Each record in the input is parsed as a JSON Object and converted into a Structured Record. The Structured Record can specify particular fields that it's interested in, making projections possible. |
Configuration: |
|
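The projection behavior described above can be sketched as follows. This is a conceptual Python example under stated assumptions, not the plugin's implementation: `parse_json_field` and `output_fields` are hypothetical names, and missing fields are mapped to `None` purely for illustration.

```python
import json


def parse_json_field(record, field, output_fields):
    """Parse record[field] as a JSON object and keep only the requested fields."""
    obj = json.loads(record[field])
    # Fields absent from the JSON object become None in this sketch.
    return {name: obj.get(name) for name in output_fields}


record = {"body": '{"id": 7, "name": "alice", "email": "a@example.com"}'}
parsed = parse_json_field(record, "body", ["id", "name"])
```

Here `email` is dropped because it is not listed among the output fields.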
ID: | JSONFormatter |
---|---|
Type: | Transform |
Mode: | Batch and Realtime |
Description: | Formats a Structured Record as a JSON Object. The plugin converts the Structured Record to a JSON object and writes it to the output record. The output record schema is a single field, either type STRING or type BYTE array. |
Configuration: | schema: Specifies the output schema, a single field either type STRING or type BYTE array |
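As a rough illustration of the two output types, the following Python sketch serializes a record to JSON and returns it either as text or as bytes. The function name `format_as_json`, the `as_bytes` flag, and the choice of UTF-8 and sorted keys are assumptions of this example, not documented plugin behavior.

```python
import json


def format_as_json(record, as_bytes=False):
    """Serialize a record to a JSON object, as a string or as UTF-8 bytes."""
    text = json.dumps(record, sort_keys=True)
    return text.encode("utf-8") if as_bytes else text


record = {"name": "alice", "age": 30}
as_string = format_as_json(record)
as_byte_array = format_as_json(record, as_bytes=True)
```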
ID: | CloneRecord |
---|---|
Type: | Transform |
Mode: | Batch and Realtime |
Description: | Emits each input record a configured number of times on the output. This transform does not change any record fields or types. It's an identity transform. |
Configuration: | copies: Specifies the number of copies of the input record that are to be emitted |
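A minimal Python sketch of this behavior, assuming that `copies` is the total number of records emitted per input record (the function name `clone_record` is hypothetical):

```python
def clone_record(record, copies):
    """Yield `copies` unchanged copies of the input record."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    for _ in range(copies):
        # Each emitted record is a separate object with identical fields.
        yield dict(record)


record = {"id": 1, "name": "alice"}
emitted = list(clone_record(record, 3))
```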
ID: | Compressor |
---|---|
Type: | Transform |
Mode: | Batch and Realtime |
Description: | Compresses configured fields. Multiple fields can be specified, each compressed using a different compression algorithm. The plugin supports SNAPPY, ZIP, and GZIP compression. |
Configuration: |
|
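The idea of choosing a compression algorithm per field can be sketched in Python as follows. This is an illustration only: `compress_fields`, the field-to-algorithm mapping, and the ZIP entry name `data` are assumptions of this example, and SNAPPY is omitted because it is not in the Python standard library.

```python
import gzip
import io
import zipfile


def zip_compress(data: bytes) -> bytes:
    """Store the bytes as a single entry in an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("data", data)  # entry name is an arbitrary choice
    return buf.getvalue()


COMPRESSORS = {"GZIP": gzip.compress, "ZIP": zip_compress}


def compress_fields(record, field_algorithms):
    """Return a copy of the record with the listed fields compressed."""
    out = dict(record)
    for field, algorithm in field_algorithms.items():
        value = out[field]
        raw = value.encode("utf-8") if isinstance(value, str) else value
        out[field] = COMPRESSORS[algorithm](raw)
    return out


record = {"id": 1, "body": "hello " * 100, "notes": "world " * 100}
compressed = compress_fields(record, {"body": "GZIP", "notes": "ZIP"})
```

Fields that are not listed in the mapping, such as `id` here, pass through unchanged.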
ID: | Decompressor |
---|---|
Type: | Transform |
Mode: | Batch and Realtime |
Description: | Decompresses configured fields. Multiple fields can be specified, each decompressed using a different decompression algorithm. The plugin supports SNAPPY, ZIP, and GZIP decompression. |
Configuration: |
|
ID: | Encoder |
---|---|
Type: | Transform |
Mode: | Batch and Realtime |
Description: | Encodes configured fields. Multiple fields can be specified to be encoded using different encoding methods. Available encoding methods are STRING_BASE64, BASE64, BASE32, STRING_BASE32, and HEX. |
Configuration: |
|
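To make per-field encoding concrete, here is a small Python sketch. It is not the plugin's implementation: `encode_fields` and the mapping format are hypothetical, and every method returns text here, so the sketch does not model any difference between the STRING_ variants and their plain counterparts.

```python
import base64
import binascii

ENCODERS = {
    "BASE64": base64.b64encode,
    "BASE32": base64.b32encode,
    "HEX": binascii.hexlify,
}


def encode_fields(record, field_methods):
    """Return a copy of the record with the listed fields encoded as ASCII text."""
    out = dict(record)
    for field, method in field_methods.items():
        value = out[field]
        raw = value.encode("utf-8") if isinstance(value, str) else value
        out[field] = ENCODERS[method](raw).decode("ascii")
    return out


record = {"a": "hello", "b": "hello", "c": "hello", "d": "untouched"}
encoded = encode_fields(record, {"a": "BASE64", "b": "BASE32", "c": "HEX"})
```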
ID: | Decoder |
---|---|
Type: | Transform |
Mode: | Batch and Realtime |
Description: | Decodes configured fields. Multiple fields can be specified to be decoded using different decoding methods. Available decoding methods are STRING_BASE64, BASE64, BASE32, STRING_BASE32, and HEX. |
Configuration: |
|
ID: | Hasher |
---|---|
Type: | Transform |
Mode: | Batch and Realtime |
Description: | Hashes fields using a digest algorithm such as MD2, MD5, SHA1, SHA256, SHA384, or SHA512. |
Configuration: |
|
Prerequisites: | The fields to be hashed must be of type string and non-nullable. |
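The following Python sketch shows the general shape of hashing selected string fields with a digest algorithm. The function name `hash_fields`, the hex-string output, and the UTF-8 encoding are assumptions of this example. Python's `hashlib` does not guarantee MD2 support, so that algorithm is not exercised here.

```python
import hashlib


def hash_fields(record, fields, algorithm="SHA256"):
    """Return a copy of the record with the listed string fields replaced by hex digests."""
    out = dict(record)
    for field in fields:
        value = out[field]
        # Mirrors the prerequisite above: only non-null strings are accepted.
        if not isinstance(value, str):
            raise TypeError(f"field {field!r} must be a non-null string")
        digest = hashlib.new(algorithm.lower(), value.encode("utf-8"))
        out[field] = digest.hexdigest()
    return out


record = {"id": 1, "email": "hello"}
hashed = hash_fields(record, ["email"], algorithm="SHA256")
```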
ID: | XMLToJSON |
---|---|
Type: | Transform |
Mode: | Batch and Realtime |
Description: | Converts an XML string to a JSON string. |
Configuration: |
|
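One possible mapping from XML to JSON is sketched below in Python. The plugin's actual conversion rules are not described here, so the conventions in this example are assumptions: element text becomes a string, repeated child tags become a list, and attributes and mixed content are ignored. The function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def element_to_obj(elem):
    """Convert an element to a string (leaf) or a dict of its children."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    obj = {}
    for child in children:
        value = element_to_obj(child)
        if child.tag in obj:
            # A repeated tag is collected into a list.
            if not isinstance(obj[child.tag], list):
                obj[child.tag] = [obj[child.tag]]
            obj[child.tag].append(value)
        else:
            obj[child.tag] = value
    return obj


def xml_to_json(xml_text):
    """Convert an XML string to a JSON string keyed by the root tag."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_obj(root)})


json_text = xml_to_json("<order><id>7</id><item>pen</item><item>ink</item></order>")
```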
Copyright © 2016-2019 Cask Data, Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Cask is a trademark of Cask Data, Inc. All rights reserved.
Apache, Apache HBase, and HBase are trademarks of The Apache Software Foundation. Used with permission. No endorsement by The Apache Software Foundation is implied by the use of these marks.