[EPIC] E2E testing tool #15152

Closed · 56 of 66 tasks · DoNotPanicUA opened this issue Aug 1, 2022 · 8 comments

@DoNotPanicUA (Contributor) commented Aug 1, 2022

Tell us about the problem you're trying to solve

Our main goal is to implement an E2E testing tool that tests Airbyte connections via CI. The tool will exercise different connector versions in a way that is close to the real user experience, which will help us detect possible integration issues before a version release.
Potential issues:

  • Performance degradation (Benchmark)
  • Critical changes (backward compatibility check)
  • Incompatible with other connectors (integration compatibility check)
  • Incompatible with Airbyte core (core compatibility check)

Note: this solution is inspired by a similar, previously designed tool (#8243).

Describe the solution you’d like

Stage 1. POC - Done ✔️

The initial stage provides the fundamental functionality.
It is also enough to start integration with potential benchmark frameworks.

  • Configure the new project in a separate repository - Repository
  • Fill in the readme
  • Implement common core
    • Scenario model
    • Scenario consistency validation
    • Scenario config parser
    • Scenario executor
    • Credential model
    • Credential config parser (local)
    • Mapper credentials and scenario
    • Make log formatting similar to Airbyte
  • Implement basic sync runner
    • Connect to existing Airbyte instance
    • Create Source
    • Create Destination
    • Create Connection
    • Run sync
    • Return the sync result (success/failure)
  • Prepare test Airbyte instance (before we automate the tool)
  • Prepare test source instances with test data (before we automate the tool)
  • Prepare test destination instances (before we automate the tool)

Main flow diagram (image attached in the original issue)

Scenario example

{
  "scenarioName" : "Poc Scenario",
  "usedInstances" : [
    {
      "instanceName" : "airbyte_1",
      "instanceType" : "AIRBYTE"
    },
    {
      "instanceName" : "source_1",
      "instanceType" : "SOURCE"
    },
    {
      "instanceName": "destination_1",
      "instanceType": "DESTINATION"
    },
    {
      "instanceName": "connection_1",
      "instanceType": "CONNECTION"
    }
  ],
  "preparationActions" : [
    {
      "action" : "CONNECT_AIRBYTE_API",
      "resultInstance" : "airbyte_1"
    },
    {
      "action" : "CREATE_SOURCE",
      "requiredInstances" : ["airbyte_1"],
      "resultInstance" : "source_1"
    },
    {
      "action": "CREATE_DESTINATION",
      "requiredInstances" : ["airbyte_1"],
      "resultInstance": "destination_1"
    },
    {
      "action" : "CREATE_CONNECTION",
      "requiredInstances" : ["airbyte_1", "source_1", "destination_1"],
      "resultInstance" : "connection_1"
    }
  ],
  "scenarioActions" : [
    {
      "action" : "SYNC_CONNECTION",
      "requiredInstances" : ["airbyte_1", "connection_1"]
    }
  ]
}

Stage 2. Credential customization - Done ✔️

This stage allows specifying the Airbyte instance, source, and destination credentials.

  • Parsing incoming args
  • Extend the readme with a section on scenarios and call examples (with args)
  • Retrieve credentials
    • Implement reading credentials from local files
    • Implement reading credentials from secret storage
    • Handle incoming Airbyte instance credentials
    • Handle source/destination credentials
  • Extend the scenario model to provide customizations for Actions
  • Implement Update version scenario action
  • Implement Scenario helper
  • Add the possibility to call the helper for a scenario
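
For illustration only, a local credentials file for this stage might look roughly like the sketch below; the field names (credentialName, credentialType, params) are assumptions made for this example, not the tool's actual schema.

{
  "credentialName" : "source_postgres_local",
  "credentialType" : "SOURCE",
  "params" : {
    "host" : "localhost",
    "port" : 5432,
    "database" : "test_db",
    "username" : "test_user",
    "password" : "********"
  }
}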

Stage 3. Run configuration - Done ✔️

  • Extend the scenario structure with a description
  • Show description and validation results in the help and list commands
  • New credential type source_with_connector_settings
  • Implement actions which can provide credentials
  • Implement new action create_custom_connector
  • New scenario for incremental sync
  • Add result parameter to the Scenario Action model
  • Implement new action get_source_version
  • Implement new action get_destination_version
  • Upgrade the version update scenarios so that they restore the original version after a run
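
As an illustration of the extended action model (actions that return a value), a scenario action from this stage might look roughly like the sketch below; the resultParameter field name is hypothetical.

{
  "action" : "GET_SOURCE_VERSION",
  "requiredInstances" : ["airbyte_1", "source_1"],
  "resultParameter" : "original_source_version"
}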

Stage 4. Docker & CI - Done ✔️

  • Configure docker
  • Configure CI commands
  • Provide summary result class
  • Store the result class in a file
  • Read the result class in the GitHub Action and post it as a comment
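
The summary result stored to a file (and later posted as a comment by the GitHub Action) might be shaped roughly like the JSON below; this structure is an assumption for illustration, not the actual result class.

{
  "scenarioName" : "Poc Scenario",
  "status" : "SUCCESS",
  "durationSeconds" : 184,
  "failedActions" : []
}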

Example command output (screenshots attached in the original issue):

  • List all scenarios command
  • Help command
  • Run sync command
  • Fail sync run command

Checkpoint - Reached 🎉

We have a fully operational E2E test tool that can interact with an existing Airbyte instance and with running sources or destinations.
The CI commands and predefined configs allow us to run integration tests for specific source-destination combinations.
In this state, we can already cover the following cases:

  • Incompatible with other connectors (integration compatibility check)
  • Incompatible with Airbyte core (core compatibility check)

Stage 6. Benchmark - In progress 🏗️

  • Integrate the benchmark framework with the testing tool

Stage 5. Autonomous run - Done ✔️

  • Extend the core to handle autonomous instances
  • Add the possibility to spin up a local Airbyte instance
  • Add the possibility to spin up source/destination instances (common logic plus implementations for a few of the most popular source/destination connectors)
  • Use normalization by default
  • Implement autonomous Postgres destination instance
  • Add a GitHub Action that pushes the project image to Docker Hub
  • Publish the project docker image
  • Pull the image in the GitHub Actions instead of building it
  • Integrate the tool with the main repository's GitHub Actions
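
For autonomous runs, an instance declaration in a scenario could plausibly distinguish locally started containers from existing deployments; the sketch below is purely illustrative (the creationType field and its values are assumptions).

{
  "instanceName" : "destination_postgres_1",
  "instanceType" : "DESTINATION",
  "creationType" : "AUTONOMOUS"
}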

Stage 5.1. Implement destination containers

  • Implement autonomous MySql destination instance
  • Implement autonomous Oracle destination instance
  • Implement autonomous MsSql destination instance
  • Implement autonomous MariaDb destination instance
    ...

Stage 7. Test data generation on the fly

  • Extend the source/destination handlers with test data population methods
  • Design test data config files
  • Implement test data generation based on the config files
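
A test data config file for this stage might look roughly like the sketch below; the schema (stream, recordCount, fields, generator) is a hypothetical illustration of the idea, not a designed format.

{
  "stream" : "users",
  "recordCount" : 1000,
  "fields" : [
    { "name" : "id", "type" : "integer", "generator" : "sequence" },
    { "name" : "email", "type" : "string", "generator" : "random_email" }
  ]
}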

Stage 8. Result comparison

To detect possible issues in a new version, we should compare the results produced by the current version with those produced by the new version. If we don't expect any changes in the result, the structure and data should be equal.
Note: some changes (like fixes) legitimately lead to different results. In this case, we will accept a flag like diff_is_expected.

  • Add the possibility to run a few different versions and collect their results
  • Implement common comparison logic
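
A comparison run could then report something shaped roughly like the JSON below, where the diff_is_expected flag is carried through from the run configuration; all field names here are assumptions for illustration.

{
  "baselineVersion" : "1.0.22",
  "candidateVersion" : "1.0.23",
  "diffIsExpected" : false,
  "schemaEqual" : true,
  "dataEqual" : false,
  "differences" : [ "table users: 3 rows differ" ]
}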

Checkpoint

Here we have an automated testing tool that can be scheduled as a CI task or run on demand with different configurations and data sets.
The main advantage of the tool is that it is true E2E: such testing guarantees that we validate the whole system before a version release.

@DoNotPanicUA (Contributor, Author) commented:

@alexandr-shegeda
Please review

@alexandr-shegeda (Contributor) commented:

@DoNotPanicUA all looks good, the only suggestion is to move Stage 8. Test data population closer to 1-2 stages

@DoNotPanicUA (Contributor, Author) commented:

> @DoNotPanicUA all looks good, the only suggestion is to move Stage 8. Test data population closer to 1-2 stages

This step means filling in test data using config files; the tool will generate data on the fly.
Before automation and for local runs, we will prepare test data manually and reuse it.
I will rephrase it a bit to make it clearer.

@grishick (Contributor) commented Aug 4, 2022

Tagging @bleonard, @sherifnada and @davinchia for review

@grishick (Contributor) commented Aug 5, 2022

I like the approach. Please file GitHub issues for the first stage and include @davinchia and me as reviewers when creating PRs.

@evantahler (Contributor) commented:

Some suggestions:

  • Use the Octavia CLI! We have a CLI tool for setting up sources, destinations, and syncs. It might be helpful. This repo (https://github.com/airbytehq/airflow-summit-airbyte-2022) has some examples of automating the Octavia CLI within GitHub Actions CI.
  • For setting up sample data, maybe source-faker can help: this source produces N "user", "purchase", and "product" records. They can be randomly seeded or given a fixed seed to always produce the same data (see the rough config sketch below).
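
As a rough sketch, a source-faker configuration with a fixed seed might look like this (exact field names depend on the connector version):

{
  "count" : 1000,
  "seed" : 42
}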

@alafanechere (Contributor) commented Oct 3, 2022

> Use the Octavia CLI!

+1, using the CLI will reduce the maintenance burden in the case of Airbyte API evolutions: the CLI is responsible for adapting to Airbyte API changes.

@DoNotPanicUA (Contributor, Author) commented:

I've looked into using the Octavia CLI as part of the solution, and I don't see a good way to integrate it with the E2E testing tool.
That said, once the original architecture is finished and the main use cases are listed, I assume we can trade some of the tool's flexibility for reuse of other modules to improve its nonfunctional aspects.
