
# JdbcDataSink

## Description

The `JdbcDataSink` framework is a utility that helps with configuring and writing `DataFrame`s.

This framework provides for writing a `DataFrame` to a given JDBC connection.

This framework supports different save modes, like overwrite and append, as well as partitioning parameters, like the partitioning columns and the number of partitions.

The framework is composed of two classes:

- `JdbcDataSink`, which is created based on a `JdbcSinkConfiguration` class and provides two main functions:
  - `def writer(data: DataFrame): Try[DataFrameWriter[Row]]`
  - `def write(data: DataFrame): Try[DataFrame]`
- `JdbcSinkConfiguration`: the necessary configuration parameters
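
To make the moving parts concrete, here is a minimal sketch of what the configuration might carry; the constructor shape below is an assumption derived from the parameters documented further down, not the library's verbatim signature:

```scala
import scala.util.Try
import org.apache.spark.sql.{ DataFrame, DataFrameWriter, Row }

// Hypothetical shape of JdbcSinkConfiguration, mirroring the
// configuration parameters listed below; check the library
// sources for the exact constructor.
case class JdbcSinkConfiguration(
  url: String,                              // JDBC URL of the target database
  table: String,                            // target table
  user: Option[String] = None,              // connection user
  password: Option[String] = None,          // connection password
  driver: Option[String] = None,            // JDBC driver class
  mode: Option[String] = None,              // overwrite | append | ignore | error
  options: Map[String, String] = Map.empty  // extra DataFrameWriter options
)
```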

## Sample code

```scala
import org.tupol.spark.io._

val sinkConfiguration: JdbcSinkConfiguration = ???
JdbcDataSink(sinkConfiguration).write(dataframe)
```
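
Because `write` returns a `Try[DataFrame]`, the outcome can be handled explicitly instead of letting an exception propagate; a brief sketch, reusing the `sinkConfiguration` and `dataframe` from above:

```scala
import scala.util.{ Failure, Success }

// `write` wraps the save in a Try, so a failure can be
// inspected and handled at the call site
JdbcDataSink(sinkConfiguration).write(dataframe) match {
  case Success(written) => println(s"Wrote ${written.count()} records")
  case Failure(error)   => println(s"JDBC write failed: ${error.getMessage}")
}
```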

Optionally, one can use the implicit decorator for the `DataFrame`, available by importing `org.tupol.spark.io.implicits._`.

## Sample code

```scala
import org.tupol.spark.io._
import org.tupol.spark.io.implicits._

val sinkConfiguration: JdbcSinkConfiguration = ???
dataframe.sink(sinkConfiguration).write
```
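
The decorator form delegates to the same sink machinery; it simply reads in the direction of the data flow, which makes it convenient at the end of a longer transformation chain.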

## Configuration Parameters

- `url` *Required*
  - the JDBC friendly URL pointing to the target database
- `table` *Required*
  - the target table
- `user` *Optional*
  - the database connection user
- `password` *Optional*
  - the database connection password
- `driver` *Optional*
  - the JDBC driver class
- `schema` *Optional*
  - an optional parameter that represents the JSON Apache Spark schema that should be enforced on the input data
  - this schema can be easily obtained from a `DataFrame` by calling the `prettyJson` function
  - due to its complex structure, this parameter can not be passed as a command line argument, but only through the `application.conf` file
- `mode` *Optional*
  - the save mode; it can be `overwrite`, `append`, `ignore` or `error`
  - more details are available in the References section
- `options` *Optional*
  - additional options that can be passed to the Apache Spark `DataFrameWriter`
  - due to its complex structure, this parameter can not be passed as a command line argument, but only through the `application.conf` file
  - for more details about the available options, please check the References section
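
Since `schema` and `options` can only be provided through the `application.conf` file, a configuration fragment might look like the sketch below; the `output` path and the exact key names are illustrative assumptions, so consult the library documentation for the real paths:

```hocon
# Hypothetical application.conf fragment; key names are illustrative
output {
  url: "jdbc:postgresql://localhost:5432/warehouse"
  table: "events"
  user: "spark"
  password: "secret"
  driver: "org.postgresql.Driver"
  mode: "append"
  options {
    # standard Spark JDBC writer options (see JDBCOptions in the References)
    batchsize: 10000
    isolationLevel: "READ_COMMITTED"
  }
}
```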

## References

For more details about the optional parameters, consult the `DataFrameWriter` API and sources, especially `JDBCOptions`.