
# JdbcDataSink

## Description

The `JdbcDataSink` framework is a utility that helps with configuring and writing `DataFrame`s.

This framework provides for writing a `DataFrame` to a given JDBC connection.

This framework supports different save modes, like overwrite and append, as well as partitioning parameters, like the partitioning columns and the number of partitions.

The framework is composed of two classes:

- `JdbcDataSink`, which is created based on a `JdbcSinkConfiguration` class and provides two main functions:
  - `def writer(data: DataFrame): Try[DataFrameWriter[Row]]`
  - `def write(data: DataFrame): Try[DataFrame]`
- `JdbcSinkConfiguration`: the necessary configuration parameters
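
To make the moving parts concrete, here is a minimal sketch of what the configuration might carry; the constructor shape below is an assumption derived from the parameters documented further down, not the library's verbatim signature:

```scala
import scala.util.Try
import org.apache.spark.sql.{ DataFrame, DataFrameWriter, Row }

// Hypothetical shape of JdbcSinkConfiguration, mirroring the
// configuration parameters listed below; check the library
// sources for the exact constructor.
case class JdbcSinkConfiguration(
  url: String,                              // JDBC URL of the target database
  table: String,                            // target table
  user: Option[String] = None,              // connection user
  password: Option[String] = None,          // connection password
  driver: Option[String] = None,            // JDBC driver class
  mode: Option[String] = None,              // overwrite | append | ignore | error
  options: Map[String, String] = Map.empty  // extra DataFrameWriter options
)
```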

## Sample code

```scala
import org.tupol.spark.io._

val sinkConfiguration: JdbcSinkConfiguration = ???
JdbcDataSink(sinkConfiguration).write(dataframe)
```
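
Because `write` returns a `Try[DataFrame]`, the outcome can be handled explicitly instead of letting an exception propagate; a brief sketch, reusing the `sinkConfiguration` and `dataframe` from above:

```scala
import scala.util.{ Failure, Success }

// `write` wraps the save in a Try, so a failure can be
// inspected and handled at the call site
JdbcDataSink(sinkConfiguration).write(dataframe) match {
  case Success(written) => println(s"Wrote ${written.count()} records")
  case Failure(error)   => println(s"JDBC write failed: ${error.getMessage}")
}
```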

Optionally, one can use the implicit decorator for the `DataFrame`, available by importing `org.tupol.spark.io.implicits._`.

## Sample code

```scala
import org.tupol.spark.io._
import org.tupol.spark.io.implicits._

val sinkConfiguration: JdbcSinkConfiguration = ???
dataframe.sink(sinkConfiguration).write
```
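
The decorator form delegates to the same sink machinery; it simply reads in the direction of the data flow, which makes it convenient at the end of a longer transformation chain.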

## Configuration Parameters

- `url` *Required*
  - the JDBC friendly URL pointing to the target database
- `table` *Required*
  - the target table
- `user` *Optional*
  - the database connection user
- `password` *Optional*
  - the database connection password
- `driver` *Optional*
  - the JDBC driver class
- `schema` *Optional*
  - an optional parameter that represents the JSON Apache Spark schema that should be enforced on the input data
  - this schema can be easily obtained from a `DataFrame` by calling the `prettyJson` function
  - due to its complex structure, this parameter can not be passed as a command line argument, but only through the `application.conf` file
- `mode` *Optional*
  - the save mode; it can be `overwrite`, `append`, `ignore` or `error`
  - more details are available in the References section
- `options` *Optional*
  - additional options that can be passed to the Apache Spark `DataFrameWriter`
  - due to its complex structure, this parameter can not be passed as a command line argument, but only through the `application.conf` file
  - for more details about the available options, please check the References section
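
Since `schema` and `options` can only be provided through the `application.conf` file, a configuration fragment might look like the sketch below; the `output` path and the exact key names are illustrative assumptions, so consult the library documentation for the real paths:

```hocon
# Hypothetical application.conf fragment; key names are illustrative
output {
  url: "jdbc:postgresql://localhost:5432/warehouse"
  table: "events"
  user: "spark"
  password: "secret"
  driver: "org.postgresql.Driver"
  mode: "append"
  options {
    # standard Spark JDBC writer options (see JDBCOptions in the References)
    batchsize: 10000
    isolationLevel: "READ_COMMITTED"
  }
}
```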

## References

For more details about the optional parameters, consult the `DataFrameWriter` API and sources, especially `JDBCOptions`.