# spark-k8s-v322

Spark image for the Spark Operator on Kubernetes, version 3.2.2.

1. Build the Spark image from the binaries.
2. Push the base image to your registry. In this case, push the clean image created in the previous step; pay attention to the tag used for the build.
3. Customize the jars (Delta Lake, AWS Hadoop, etc.). This repo includes common jars for interacting with AWS and Delta Lake. Build the image using the Dockerfile located in the root folder of this repo, adjusting the base image in your Dockerfile to the tag built and pushed in the previous steps.
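The first two steps can be sketched with the `docker-image-tool.sh` script bundled with the Spark binary distribution; the registry name `myrepo` and the tag are placeholders, not values taken from this repo:

```shell
# Assumes the Spark 3.2.2 binary distribution has been downloaded and extracted.
cd spark-3.2.2-bin-hadoop3.2

# Build the base image; "myrepo" and the tag "v3.2.2" are placeholders.
./bin/docker-image-tool.sh -r myrepo -t v3.2.2 build

# Push the clean base image to your registry (use the same tag later
# as the base image of the customization Dockerfile).
./bin/docker-image-tool.sh -r myrepo -t v3.2.2 push
```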

List of jars used in this repo:

- aws-java-sdk-bundle-1.11.901.jar (download this jar from Maven; its size does not allow uploading it to GitHub)
- delta-core_2.12-2.0.0.jar
- delta-storage-2.0.0.jar
- hadoop-aws-3.3.1.jar
- hadoop-common-3.3.1.jar
- hadoop-mapreduce-client-core-3.3.1.jar
- spark-hadoop-cloud_2.12-3.3.0.jar
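These jars enable S3A and Delta Lake access, which usually requires matching Spark configuration. A sketch in `spark-defaults.conf` style, with commonly used values (an assumption, not configuration shipped in this repo):

```
spark.sql.extensions                          io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog               org.apache.spark.sql.delta.catalog.DeltaCatalog
spark.hadoop.fs.s3a.impl                      org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.aws.credentials.provider  com.amazonaws.auth.DefaultAWSCredentialsProviderChain
```

The same keys can also be passed under `sparkConf` in the SparkApplication manifest instead of baking them into the image.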

Add any other jars to the folder before building a new image. My Dockerfile uses some build parameters to deliver the AWS access and secret keys; if you don't use them, remove them. If you need different Python libs, add them to the requirements.txt file.
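A minimal sketch of what such a customization Dockerfile can look like — not the exact file in this repo; the base image, paths, and ARG names are assumptions:

```dockerfile
# Base image: the clean Spark image built and pushed in the previous steps
# ("myrepo" and the tag are placeholders).
FROM myrepo/spark:v3.2.2

# Build-time AWS credentials; remove these if you inject credentials another way.
ARG AWS_ACCESS_KEY_ID
ARG AWS_SECRET_ACCESS_KEY
ENV AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
    AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY

# Extra jars for AWS and Delta Lake support.
COPY jars/*.jar /opt/spark/jars/

# Additional Python libraries.
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install --no-cache-dir -r /tmp/requirements.txt
```

Built, for example, with `docker build --build-arg AWS_ACCESS_KEY_ID=... --build-arg AWS_SECRET_ACCESS_KEY=... -t myrepo/spark-custom:v3.2.2 .`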

4. Use the ConfigSpark.yaml to submit your application on k8s and test your image. Peace!
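For reference, a manifest like ConfigSpark.yaml for the Spark Operator typically follows the `SparkApplication` CRD. The sketch below is a hypothetical minimal example — names, image, and application path are placeholders, not the contents of this repo's file:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: delta-test          # placeholder name
  namespace: default
spec:
  type: Python
  mode: cluster
  image: myrepo/spark-custom:v3.2.2        # the customized image built above
  mainApplicationFile: local:///opt/spark/work-dir/app.py   # placeholder path
  sparkVersion: "3.2.2"
  driver:
    cores: 1
    memory: 1g
    serviceAccount: spark    # service account with permissions for the operator
  executor:
    instances: 2
    cores: 1
    memory: 1g
```

Submit it with `kubectl apply -f ConfigSpark.yaml` and check the driver pod logs to verify the image works.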