
🎉 Core: Support for IRSA when logging on EKS and passing role to Dest/Source Pods #177

Closed
wants to merge 9 commits into from

Conversation

sh4sh
Contributor

@sh4sh sh4sh commented Feb 28, 2023

Migrating airbytehq/airbyte#19010 to airbyte-platform, PR originally submitted by @gilandose

This PR is tied with airbytehq/airbyte#23594 which contains documentation and connector dependency updates.

What

Adds the DefaultAWSCredentialsProviderChain to the logging route for S3. This allows logs on EKS or EC2 to use standard, more secure role-assumption methods.

Specifically, this unlocks IRSA by removing the explicit passing of access keys and adding the STS dependency so that the WebIdentityTokenProvider can be used. Note this would also allow a GKE cluster to interface with AWS using IRSA.

Additionally, the airbyte-admin ServiceAccount is passed to connectors so that the IRSA role can be assumed there. Note it may be preferable to assign a different service account to the dest/source pods; eventually this could be a connector-specific SA with restrictive permissions.
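
For context, IRSA works by annotating the Kubernetes ServiceAccount that the pods run under with an IAM role ARN. A minimal sketch of such a ServiceAccount, assuming illustrative account/role values that are not taken from this PR:

```yaml
# Illustrative IRSA ServiceAccount; the account ID and role name below
# are placeholders, not values from this PR.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: airbyte-admin
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/example-airbyte-irsa
```

Pods running under this ServiceAccount get a projected web identity token that the STS-backed credential provider can exchange for temporary role credentials.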

resolves airbytehq/airbyte#10570
resolves airbytehq/airbyte#5942

extends excellent work done in: airbytehq/airbyte#10682 credit: @adamschmidt

How

For AWS, access keys can automatically be read from the DefaultAWSCredentialsProviderChain. Previously they were passed around too eagerly and also set into system properties for logging. To change this, I've removed some of the explicit passing and allowed the access keys to be picked up transparently when set.
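
As a rough sketch of what "picked up transparently" means here: the default chain tries each credential source in order and takes the first that resolves, so the IRSA web identity token is used when no keys are set. A stdlib-only conceptual sketch of that fallback pattern (not the actual SDK code):

```java
import java.util.Optional;

public class CredentialsChainSketch {

    // Conceptual model of DefaultAWSCredentialsProviderChain resolution:
    // each source is tried in order; the first non-empty value wins.
    public static Optional<String> resolveAccessKey() {
        final String fromEnv = System.getenv("AWS_ACCESS_KEY_ID");
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return Optional.of(fromEnv);
        }
        final String fromProperty = System.getProperty("aws.accessKeyId");
        if (fromProperty != null && !fromProperty.isEmpty()) {
            return Optional.of(fromProperty);
        }
        // The real chain continues with the web identity token (IRSA),
        // shared profile file, container, and instance-profile providers.
        return Optional.empty();
    }
}
```

This is why explicitly setting empty keys into system properties is harmful: it short-circuits the chain before the web identity provider is ever consulted.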

Recommended reading order

  1. deps.toml
  2. airbyte-config/config-models/src/main/java/io/airbyte/config/storage/DefaultS3ClientFactory.java
  3. airbyte-commons/src/main/resources/log4j2.xml
  4. airbyte-commons-worker/src/main/java/io/airbyte/workers/process/KubePodProcess.java
  5. build.gradle
  6. airbyte-integrations/connectors/destination-s3/build.gradle

🚨 User Impact 🚨

No user impact; using ACCESS_KEYS works as before.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here

Tests

Unit

Task :airbyte-config:config-models:test

ConfigSchemaTest > testFile() PASSED

ConfigSchemaTest > testPrepareKnownSchemas() PASSED

DataTypeEnumTest > testConversionFromJsonSchemaPrimitiveToDataType() PASSED

EnvConfigsTest > testAirbyteVersion() PASSED

EnvConfigsTest > testGetDatabaseUrl() PASSED

EnvConfigsTest > testSharedJobEnvMapRetrieval() PASSED

EnvConfigsTest > testJobKubeNodeSelectors() PASSED

EnvConfigsTest > testWorkspaceRoot() PASSED

EnvConfigsTest > ensureGetEnvBehavior() PASSED

EnvConfigsTest > testLocalRoot() PASSED

EnvConfigsTest > testDiscoverKubeNodeSelectors() PASSED

EnvConfigsTest > testPublishMetrics() PASSED

EnvConfigsTest > testConfigRoot() PASSED

EnvConfigsTest > testGetDatabasePassword() PASSED

EnvConfigsTest > testSpecKubeNodeSelectors() PASSED

EnvConfigsTest > testAirbyteRole() PASSED

EnvConfigsTest > testCheckKubeNodeSelectors() PASSED

EnvConfigsTest > testDeploymentMode() PASSED

EnvConfigsTest > testTrackingStrategy() PASSED

EnvConfigsTest > testGetWorkspaceDockerMount() PASSED

EnvConfigsTest > testDockerNetwork() PASSED

EnvConfigsTest > testGetLocalDockerMount() PASSED

EnvConfigsTest > testRemoteConnectorCatalogUrl() PASSED

EnvConfigsTest > testGetDatabaseUser() PASSED

EnvConfigsTest > Should parse constant tags PASSED

EnvConfigsTest > testSplitKVPairsFromEnvString() PASSED

EnvConfigsTest > testworkerKubeTolerations() PASSED

EnvConfigsTest > testErrorReportingStrategy() PASSED

EnvConfigsTest > testAllJobEnvMapRetrieval() PASSED

EnvConfigsTest > CheckJobResourceSettings > checkJobCpuRequest should take precedent if set PASSED

EnvConfigsTest > CheckJobResourceSettings > checkJobMemoryLimit should take precedent if set PASSED

EnvConfigsTest > CheckJobResourceSettings > checkJobMemoryRequest should take precedent if set PASSED

EnvConfigsTest > CheckJobResourceSettings > should default to JobMainMemoryRequest if not set PASSED

EnvConfigsTest > CheckJobResourceSettings > should default to JobMainCpuRequest if not set PASSED

EnvConfigsTest > CheckJobResourceSettings > checkJobCpuLimit should take precedent if set PASSED

EnvConfigsTest > CheckJobResourceSettings > should default to JobMainCpuLimit if not set PASSED

EnvConfigsTest > CheckJobResourceSettings > should default to JobMainMemoryLimit if not set PASSED

CloudLogsClientTest > createCloudLogClientTestAws() PASSED

CloudLogsClientTest > createCloudLogClientTestGcs() PASSED

CloudLogsClientTest > createCloudLogClientTestMinio() PASSED

LogClientSingletonTest > testGetJobLogFileNullPath() PASSED

LogClientSingletonTest > testGetJobLogFileEmptyPath() PASSED

LogClientSingletonTest > testGetJobLogFileK8s() PASSED

ScheduleHelpersTest > testGetSecondsInUnit() PASSED

ScheduleHelpersTest > testAllOfTimeUnitEnumValues() PASSED

StateMessageHelperTest > testEmpty() PASSED

StateMessageHelperTest > testDuplicatedGlobalState() PASSED

StateMessageHelperTest > testLegacyInNewFormat() PASSED

StateMessageHelperTest > testLegacyStateConversion() PASSED

StateMessageHelperTest > testEmptyList() PASSED

StateMessageHelperTest > testStreamForceLegacy() PASSED

StateMessageHelperTest > testInvalidMixedState() PASSED

StateMessageHelperTest > testGlobalStateConversion() PASSED

StateMessageHelperTest > testGlobal() PASSED

StateMessageHelperTest > testLegacy() PASSED

StateMessageHelperTest > testGlobalForceLegacy() PASSED

StateMessageHelperTest > testLegacyInList() PASSED

StateMessageHelperTest > testStream() PASSED

StateMessageHelperTest > testStreamStateConversion() PASSED

YamlListToStandardDefinitionsTest > verifyAndConvertToModelList > should error out on bad data PASSED

YamlListToStandardDefinitionsTest > verifyAndConvertToModelList > should error out on duplicate name PASSED

YamlListToStandardDefinitionsTest > verifyAndConvertToModelList > should correctly read yaml file PASSED

YamlListToStandardDefinitionsTest > verifyAndConvertToModelList > should error out on duplicate id PASSED

YamlListToStandardDefinitionsTest > verifyAndConvertToModelList > should error out on empty file PASSED

YamlListToStandardDefinitionsTest > vertifyAndConvertToJsonNode > should error out on bad data PASSED

YamlListToStandardDefinitionsTest > vertifyAndConvertToJsonNode > should error out on duplicate name PASSED

YamlListToStandardDefinitionsTest > vertifyAndConvertToJsonNode > should correctly read yaml file PASSED

YamlListToStandardDefinitionsTest > vertifyAndConvertToJsonNode > should error out on duplicate id PASSED

YamlListToStandardDefinitionsTest > vertifyAndConvertToJsonNode > should error out on empty file PASSED

CloudLogsClientTest > testGcs() PASSED

CloudLogsClientTest > testGcsMissingBucket() PASSED

DefaultS3ClientFactoryTest > testS3() PASSED

DefaultS3ClientFactoryTest > testS3RegionNotSet() PASSED

MinioS3ClientFactoryTest > testMinio() PASSED

MinioS3ClientFactoryTest > testMinioEndpointMissing() PASSED

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@sh4sh sh4sh changed the title Core: Support for IRSA when logging on EKS and passing role to Dest/Source Pods 🎉 Core: Support for IRSA when logging on EKS and passing role to Dest/Source Pods Feb 28, 2023
@@ -131,6 +131,8 @@ spec:
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: JOB_KUBE_NAMESPACE
Contributor Author

Comment from previous PR by @mjstel:

This is probably a typo and should be JOB_KUBE_SERVICE_ACCOUNT right?


Suggested change:
- - name: JOB_KUBE_NAMESPACE
+ - name: JOB_KUBE_SERVICE_ACCOUNT

podBuilder = podBuilder.withServiceAccount("airbyte-admin").withAutomountServiceAccountToken(true);
}
.withNewSpec()
.withServiceAccount(isOrchestrator ? "airbyte-admin" : serviceAccount)
Contributor Author

@benmoriceau added a comment to the previous PR:

Should we add a dedicated variable for the orchestrator SA as well as the one added, in order to have something like:
.withServiceAccount(isOrchestrator ? orchestratorServiceAccount : serviceAccount). If we go that way, we need to make sure that the default value is "airbyte-admin".
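
A minimal sketch of that suggestion, assuming a configurable orchestrator SA whose default preserves the currently hard-coded value (the helper class and method names are hypothetical, for illustration only):

```java
public class ServiceAccountSelector {

    // Hypothetical helper sketching the review suggestion: keep the
    // orchestrator service account configurable, but default it to the
    // currently hard-coded "airbyte-admin" so existing deployments
    // keep working.
    public static String select(final boolean isOrchestrator,
                                final String orchestratorServiceAccount,
                                final String connectorServiceAccount) {
        final String orchestratorSa = orchestratorServiceAccount != null
            ? orchestratorServiceAccount
            : "airbyte-admin";
        return isOrchestrator ? orchestratorSa : connectorServiceAccount;
    }
}
```

The result would then be passed to the pod builder's withServiceAccount(...) call quoted above.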

@@ -118,10 +122,12 @@ public ProcessFactory discoverDockerProcessFactory(
public ProcessFactory discoverKubernetesProcessFactory(
@Named("discoverWorkerConfigs") final WorkerConfigs workerConfigs,
@Value("${airbyte.worker.job.kube.namespace}") final String kubernetesNamespace,
@Value("${airbyte.worker.job.kube.service.account") final String kubernetesServiceAccount,
Contributor Author

@benmoriceau left a comment:

This value is missing from the application.yaml file in the resource folder of the airbyte-worker module. This will cause issues when the bean is created.

@@ -54,10 +54,12 @@ public ProcessFactory checkDockerProcessFactory(
public ProcessFactory checkKubernetesProcessFactory(
@Named("checkWorkerConfigs") final WorkerConfigs workerConfigs,
@Value("${airbyte.worker.job.kube.namespace}") final String kubernetesNamespace,
@Value("${airbyte.worker.job.kube.service.account") final String kubernetesServiceAccount,


If I'm not wrong, there is a missing bracket here

@czomo
Contributor

czomo commented Apr 15, 2023

@gilandose Any update on this? 🙏

@github-actions
Contributor

Your branch is not currently up-to-date with main. Please update your branch before attempting to snapshot your PR.

@qimumu9406

Hi @sh4sh, is there any new progress on this PR?

@hansh0801

track

@qimumu9406

qimumu9406 commented Jul 13, 2023

@bothra90 I just merged the PR from @msaffitz so the next helm release should include the SecretsManager fix

Looking forward to your PR contribution

@bothra90

bothra90 commented Jul 18, 2023

I updated to the latest 0.50.8 release and running into the following error:

airbyte-ddbae802-worker-f67c5cdb9-fdsh2 airbyte-worker-container Cannot end publish with com.van.logging.aws.S3PublishHelper@53f4b9a9 due to error: Cannot end publishing: Cannot publish to S3: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID:

I suspect we still need to merge the changes @msaffitz highlighted in main...msaffitz:airbyte-platform:migration/19010. In particular, I think we need to remove the s3AwsKey and s3AwsSecret config values passed to Log4j2Appender.

@pmossman: Can we rebase and land this PR?

Edit: I tried to pull in the changes and deploy from a fork. I had to make a few more changes to make it work. @pmossman / @sh4sh : Let me know if I should open a new PR.

@msaffitz
Contributor

msaffitz commented Jul 28, 2023

@bothra90 -- I'm seeing the same error re: the invalid access key and secret.

I can rebase that other PR that removes these values from the Log4J appender XML and open a PR for it, but I'm not super familiar with Log4J, how it works, and what the implications of that are.

My assumption (based on the other work that's been done) would have been that if these values are null or empty strings, the Airbyte-provided S3 client will fall back to the default credential provider.

Given the error we're seeing, what I now think may be happening is that Log4J is trying to set up an S3 client outside that path. If that's the case, removing these from the XML makes sense, but then I don't know if that breaks the use cases where the keys should be used instead of IAM.

Were you able to get this working?

Edit: I've opened a PR for this; see below.

@weichunnn

weichunnn commented Aug 29, 2023

Heya @msaffitz @bothra90 - Are you guys able to get IRSA to work for logging and the other S3 connectors with the latest release? On my side, I created the IRSA role and set the following but still faced some errors related to credentials.

I checked that from the pod itself I am able to access AWS resources using the configured role, but it seems like Airbyte can't pick up the credentials.

global:
  ##  serviceAccountName -- Service Account name override
  serviceAccountName: &service-account-name "staging-airbyte"

serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::xxx:role/stag-oidc-airbyte
  name: *service-account-name

Error

Internal Server Error: Unable to load credentials from any of the providers in the chain AwsCredentialsProviderChain(credentialsProviders=[SystemPropertyCredentialsProvider(), EnvironmentVariableCredentialsProvider(), WebIdentityTokenCredentialsProvider(), ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(profilesAndSectionsMap=[])), ContainerCredentialsProvider(), InstanceProfileCredentialsProvider()]) : [SystemPropertyCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., EnvironmentVariableCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., WebIdentityTokenCredentialsProvider(): To use web identity tokens, the 'sts' service module must be on the class path., ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(profilesAndSectionsMap=[])): Profile file contained no credentials for profile 'default': ProfileFile(profilesAndSectionsMap=[]), ContainerCredentialsProvider(): Cannot fetch credentials from container - neither AWS_CONTAINER_CREDENTIALS_FULL_URI or AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variables are set., InstanceProfileCredentialsProvider(): Failed to load credentials from IMDS.]

cc. @pmossman

@weichunnn

Heya @pmossman @sh4sh, bumping the error in the comment above in case you have any updates on it. We suspect it's related to a missing STS module in one of Airbyte's deployments.

@sidharrth2002
Contributor

Hi all, I'm really looking forward to IRSA support for logging as well! Just bumping this up 😄

@cyberjar09

Hi, we have a use case where we want to write to the Redshift destination.
Was wondering if the IRSA support and the ability to pass a role to a pod will work in a generic way, so we can use it for the Redshift destination too?

thanks!

@ilyasemenov84

ilyasemenov84 commented Nov 30, 2023

We are also eager to get this feature implemented 🙏

@weichunnn

Hey @cyberjar09 @ilyasemenov84, my colleague @sidharrth2002 and I patched the bug based on #274 (comment). The fix was implemented in PR #279, and we can confirm that IRSA works on our production workload.

For redshift, if the connector is already built with the default credentials provider in mind, it should be able to pick up the IRSA role. Hope this helps!


sonarqubecloud bot commented Dec 1, 2023

SonarCloud Quality Gate failed.

Bugs: 2 (rated E)
Vulnerabilities: 0 (rated A)
Security Hotspots: 0 (rated A)
Code Smells: 14 (rated C)

No coverage information
0.0% duplication


@cyberjar09

> Hey @cyberjar09 @ilyasemenov84 My colleague - @sidharrth2002 and I had patched the bug based on #274 (comment). The fix was implemented in this PR here #279 and we are able to confirm that IRSA works on our production workload.
>
> For redshift, if the connector is already built with the default credentials provider in mind, it should be able to pick up the IRSA role. Hope this helps!

hey thanks for the update @weichunnn

I have been testing internally using 0.50.34 (the fix you mentioned was released in 0.50.32).
I can confirm that IRSA works for S3, but not when I use Redshift+S3 (load data to S3, then use a COPY command).

@ilyasemenov84

> Hey @cyberjar09 @ilyasemenov84 My colleague - @sidharrth2002 and I had patched the bug based on #274 (comment). The fix was implemented in this PR here #279 and we are able to confirm that IRSA works on our production workload.
>
> For redshift, if the connector is already built with the default credentials provider in mind, it should be able to pick up the IRSA role. Hope this helps!

@cyberjar09, thanks a lot! I can confirm that chart 0.50.3 with some extraEnv works with IRSA.

@sergeyshevch

@ilyasemenov84 Can you share how you set it up with the IRSA role? I get

ERROR i.m.r.Micronaut(handleStartupException):338 - Error starting Micronaut server: null 

in both server and worker

@ilyasemenov84

ilyasemenov84 commented Dec 12, 2023

> @ilyasemenov84 Can you share how you set it up with the IRSA role? I get
>
> ERROR i.m.r.Micronaut(handleStartupException):338 - Error starting Micronaut server: null
>
> in both server and worker

Hey @sergeyshevch, please try adding the following envs; in the case of a K8s chart deployment, here are the values:

worker:
    extraEnv:
      - name: STATE_STORAGE_S3_BUCKET_NAME
        value: <airbyte-bucket>
      - name: STATE_STORAGE_S3_REGION
        value: <region>
