assumerolespark-s3

Use case:

You can pass credentials to assume a role for reading and writing files in one specific bucket, while the instance profile is used for all other buckets. For example, you can read logs from a master account, write them to a test account, and then analyze them there.

Steps (tested on EMR 5.x):

1. Build the jar and upload it to S3

 gradle build

2. Configure EMR to add the above jar to EMRFS
 
 https://aws.amazon.com/blogs/big-data/securely-analyze-data-from-another-aws-account-with-emrfs/
 
 [{"classification":"emrfs-site", "properties":{"fs.s3.customAWSCredentialsProvider":"software.zip.s3.RoleBasedAWSCredentialProvider"}, "configurations":[]}]

3. Configure the Spark context with the required values

 spark.sparkContext.hadoopConfiguration.set("AWS_ACCESS_KEY","")
 spark.sparkContext.hadoopConfiguration.set("AWS_SECRET_KEY_ID","")
 spark.sparkContext.hadoopConfiguration.set("AWS_SESSION_TOKEN","")
 spark.sparkContext.hadoopConfiguration.set("amz-assume-role-arn","")
 spark.sparkContext.hadoopConfiguration.set("s3_bucket_uri","")
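Every value in step 3 is left empty, so a small pre-flight check can catch a forgotten setting before the job starts. The Java sketch below is only an illustration: the class and method names are hypothetical and not part of this repository, it works on a plain Map rather than a real Hadoop Configuration, and it assumes all five keys must be non-empty, which this README implies but does not state. Only the five key names come from step 3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of this repository.
final class RequiredSettingsCheck {

    // The five keys listed in step 3 of this README.
    static final List<String> REQUIRED_KEYS = Arrays.asList(
            "AWS_ACCESS_KEY",
            "AWS_SECRET_KEY_ID",
            "AWS_SESSION_TOKEN",
            "amz-assume-role-arn",
            "s3_bucket_uri");

    /** Returns the required keys that are absent or blank in the given settings. */
    static List<String> missingKeys(Map<String, String> settings) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = settings.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Placeholder values, assumed for the example only.
        settings.put("amz-assume-role-arn", "arn:aws:iam::111122223333:role/example-role");
        settings.put("s3_bucket_uri", "s3://example-bucket");
        System.out.println("Missing settings: " + missingKeys(settings));
    }
}
```

In a real job the same check could be run over the values read back from the Hadoop configuration before any S3 path is touched.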

All set. You can now run Spark code that uses the assumed role for the configured bucket URI and the instance profile for all other buckets.
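The per-bucket behaviour can be pictured as a simple routing decision. The Java sketch below is a guess at that logic, not the code of RoleBasedAWSCredentialProvider: the class, enum, and method names are hypothetical, and it assumes that the value of s3_bucket_uri is compared by bucket name against the path being accessed.

```java
import java.net.URI;

// Hypothetical illustration of the routing idea; not the repository's provider code.
final class CredentialRoutingSketch {

    enum Source { ASSUMED_ROLE, INSTANCE_PROFILE }

    /** Returns the bucket part of an S3-style URI such as s3://bucket/key, or null if there is none. */
    static String bucketOf(String uri) {
        if (uri == null || uri.isEmpty()) {
            return null;
        }
        // getAuthority() is used instead of getHost() so bucket names with underscores still parse.
        return URI.create(uri).getAuthority();
    }

    /** Picks the assumed role only when the target path lives in the configured bucket. */
    static Source choose(String targetUri, String configuredBucketUri) {
        String target = bucketOf(targetUri);
        String configured = bucketOf(configuredBucketUri);
        if (target != null && target.equals(configured)) {
            return Source.ASSUMED_ROLE;
        }
        return Source.INSTANCE_PROFILE;
    }

    public static void main(String[] args) {
        String configured = "s3://master-logs"; // placeholder value for s3_bucket_uri
        for (String path : new String[] {"s3://master-logs/2024/app.log", "s3://test-output/results"}) {
            System.out.println(path + " -> " + choose(path, configured));
        }
    }
}
```

Under these assumptions, a read from the master-account log bucket would go through the assumed role, while the write to the test-account bucket would fall back to the instance profile, which matches the use case described at the top.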
