This transform example demonstrates how to create an Apache Beam pipeline, write a new transformation, and use it together with GBIF transforms and core classes:
- Avro schema - example-record.avsc is used to generate the target data class.
- Interpretation - ExampleInterpreter.java applies interpretation logic to the source data object and sets the result on the target object.
- ExampleTransform.java is an Apache Beam ParDo transformation that uses ExampleInterpreter.java and Interpretation.java (see the sketch after this list).
- ExamplePipeline.java is an Apache Beam pipeline that uses ExampleTransform.java as a ParDo transformation. An example Darwin Core Archive (example.zip) and example pipeline options (example.properties) are provided to run the pipeline.
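For orientation, below is a minimal sketch of how such a ParDo transformation can wrap interpretation logic. The class and field names (SketchTransform, TargetRecord, InterpretFn) and the "interpretation" itself are illustrative stand-ins, not the actual classes shipped in this module:

```java
import org.apache.beam.sdk.coders.DefaultCoder;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

import java.io.Serializable;

/** Illustrative stand-in for the Avro-generated target class (example-record.avsc). */
@DefaultCoder(SerializableCoder.class)
class TargetRecord implements Serializable {
  String interpretedValue;
}

/**
 * Minimal sketch of a ParDo-based transform that wraps interpretation logic.
 * The types and the "interpretation" below are placeholders only.
 */
public class SketchTransform extends PTransform<PCollection<String>, PCollection<TargetRecord>> {

  @Override
  public PCollection<TargetRecord> expand(PCollection<String> input) {
    return input.apply("Interpret record", ParDo.of(new InterpretFn()));
  }

  /** DoFn that reads each source element, applies some logic, and sets data on the target object. */
  private static class InterpretFn extends DoFn<String, TargetRecord> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      TargetRecord target = new TargetRecord();
      target.interpretedValue = c.element().trim().toUpperCase(); // placeholder "interpretation"
      c.output(target);
    }
  }
}
```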
Please change BUILD_VERSION to the current project version:

```shell
java -jar target/examples-BUILD_VERSION-shaded.jar src/main/resources/example.properties
```
You can find the output files in the `output` directory.
The example uses the DirectRunner; if your dataset contains more than 1,000 records, please use a standalone Spark instance instead.
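If you do switch runners, one possible approach is to configure the Spark runner programmatically through Beam's SparkPipelineOptions. This is only a sketch: it assumes the beam-runners-spark dependency is on the classpath, and the Spark master URL is a placeholder, not a value from this project.

```java
import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SparkRunnerSketch {
  public static void main(String[] args) {
    // Build pipeline options that target a standalone Spark master instead of the DirectRunner.
    SparkPipelineOptions options = PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
    options.setRunner(SparkRunner.class);
    options.setSparkMaster("spark://your-spark-master:7077"); // placeholder host and port

    Pipeline pipeline = Pipeline.create(options);
    // ... apply the same transforms as in ExamplePipeline.java, then:
    pipeline.run().waitUntilFinish();
  }
}
```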