diff --git a/README.md b/README.md index 8da749370..c84d54d98 100644 --- a/README.md +++ b/README.md @@ -4,226 +4,31 @@ Airbyte Python CDK is a framework for building Airbyte API Source Connectors. It provides classes and helpers that make it easy to build a connector against an HTTP API (REST, GraphQL, etc.), or a generic Python source connector. -## Usage +## Building Connectors with the CDK -If you're looking to build a connector, we highly recommend that you +If you're looking to build a connector, we highly recommend that you first [start with the Connector Builder](https://docs.airbyte.com/connector-development/connector-builder-ui/overview). It should be enough for 90% of connectors out there. For more flexible and complex connectors, use the [low-code CDK and `SourceDeclarativeManifest`](https://docs.airbyte.com/connector-development/config-based/low-code-cdk-overview). -If that doesn't work, then consider building on top of the -[lower-level Python CDK itself](https://docs.airbyte.com/connector-development/cdk-python/). - -### Quick Start - -To get started on a Python CDK based connector or a low-code connector, you can generate a connector -project from a template: - -```bash -# from the repo root -cd airbyte-integrations/connector-templates/generator -./generate.sh -``` - -### Example Connectors - -**HTTP Connectors**: - -- [Stripe](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/) -- [Salesforce](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-salesforce/) - -**Python connectors using the bare-bones `Source` abstraction**: - -- [Google Sheets](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-google-sheets/google_sheets_source/google_sheets_source.py) - -This will generate a project with a type and a name of your choice and put it in -`airbyte-integrations/connectors`.
Open the directory with your connector in an editor and follow -the `TODO` items. +For more information on building connectors, please see the [Connector Development](https://docs.airbyte.com/connector-development/) guide on [docs.airbyte.com](https://docs.airbyte.com). ## Python CDK Overview Airbyte CDK code is within `airbyte_cdk` directory. Here's a high level overview of what's inside: -- `connector_builder`. Internal wrapper that helps the Connector Builder platform run a declarative - manifest (low-code connector). You should not use this code directly. If you need to run a - `SourceDeclarativeManifest`, take a look at - [`source-declarative-manifest`](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-declarative-manifest) - connector implementation instead. -- `destinations`. Basic Destination connector support! If you're building a Destination connector in - Python, try that. Some of our vector DB destinations like `destination-pinecone` are using that - code. -- `models` expose `airbyte_protocol.models` as a part of `airbyte_cdk` package. -- `sources/concurrent_source` is the Concurrent CDK implementation. It supports reading data from - streams concurrently per slice / partition, useful for connectors with high throughput and high - number of records. -- `sources/declarative` is the low-code CDK. It works on top of Airbyte Python CDK, but provides a - declarative manifest language to define streams, operations, etc. This makes it easier to build - connectors without writing Python code. -- `sources/file_based` is the CDK for file-based sources. Examples include S3, Azure, GCS, etc. +- `airbyte_cdk/connector_builder`. Internal wrapper that helps the Connector Builder platform run a declarative manifest (low-code connector). You should not use this code directly. 
If you need to run a `SourceDeclarativeManifest`, take a look at the [`source-declarative-manifest`](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-declarative-manifest) connector implementation instead. +- `airbyte_cdk/cli/source_declarative_manifest`. This module defines the `source-declarative-manifest` (aka "SDM") connector execution logic and associated CLI. +- `airbyte_cdk/destinations`. Basic Destination connector support! If you're building a Destination connector in Python, try that. Some of our vector DB destinations like `destination-pinecone` are using that code. +- `airbyte_cdk/models` exposes `airbyte_protocol.models` as part of the `airbyte_cdk` package. +- `airbyte_cdk/sources/concurrent_source` is the Concurrent CDK implementation. It supports reading data from streams concurrently per slice / partition, useful for connectors with high throughput and a high number of records. +- `airbyte_cdk/sources/declarative` is the low-code CDK. It works on top of Airbyte Python CDK, but provides a declarative manifest language to define streams, operations, etc. This makes it easier to build connectors without writing Python code. +- `airbyte_cdk/sources/file_based` is the CDK for file-based sources. Examples include S3, Azure, GCS, etc. ## Contributing -Thank you for being interested in contributing to Airbyte Python CDK! Here are some guidelines to -get you started: - -- We adhere to the [code of conduct](/CODE_OF_CONDUCT.md). -- You can contribute by reporting bugs, posting github discussions, opening issues, improving - [documentation](/docs/), and submitting pull requests with bugfixes and new features alike. -- If you're changing the code, please add unit tests for your change. -- When submitting issues or PRs, please add a small reproduction project. Using the changes in your - connector and providing that connector code as an example (or a satellite PR) helps!
- -### First time setup - -Install the project dependencies and development tools: - -```bash -poetry install --all-extras -``` - -Installing all extras is required to run the full suite of unit tests. - -#### Running tests locally - -- Iterate on the CDK code locally -- Run tests via `poetry run poe unit-test-with-cov`, or `python -m pytest -s unit_tests` if you want - to pass pytest options. -- Run `poetry run poe check-local` to lint all code, type-check modified code, and run unit tests - with coverage in one command. - -To see all available scripts, run `poetry run poe`. - -#### Formatting the code - -- Iterate on the CDK code locally -- Run `poetry run ruff format` to format your changes. - -To see all available `ruff` options, run `poetry run ruff`. - -##### Autogenerated files - -Low-code CDK models are generated from `sources/declarative/declarative_component_schema.yaml`. If -the iteration you are working on includes changes to the models or the connector generator, you -might want to regenerate them. In order to do that, you can run: - -```bash -poetry run poe build -``` - -This will generate the code generator docker image and the component manifest files based on the -schemas and templates. - -#### Testing - -All tests are located in the `unit_tests` directory. Run `poetry run poe unit-test-with-cov` to run -them. This also presents a test coverage report. For faster iteration with no coverage report and -more options, `python -m pytest -s unit_tests` is a good place to start. - -#### Building and testing a connector with your local CDK - -When developing a new feature in the CDK, you may find it helpful to run a connector that uses that -new feature. 
You can test this in one of two ways: - -- Running a connector locally -- Building and running a source via Docker - -##### Installing your local CDK into a local Python connector - -Open the connector's `pyproject.toml` file and replace the line with `airbyte_cdk` with the -following: - -```toml -airbyte_cdk = { path = "../../../airbyte-cdk/python/airbyte_cdk", develop = true } -``` - -Then, running `poetry update` should reinstall `airbyte_cdk` from your local working directory. - -##### Building a Python connector in Docker with your local CDK installed - -_Pre-requisite: Install the -[`airbyte-ci` CLI](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)_ - -You can build your connector image with the local CDK using - -```bash -# from the airbytehq/airbyte base directory -airbyte-ci connectors --use-local-cdk --name= build -``` - -Note that the local CDK is injected at build time, so if you make changes, you will have to run the -build command again to see them reflected. - -##### Running Connector Acceptance Tests for a single connector in Docker with your local CDK installed - -_Pre-requisite: Install the -[`airbyte-ci` CLI](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)_ - -To run acceptance tests for a single connectors using the local CDK, from the connector directory, -run - -```bash -airbyte-ci connectors --use-local-cdk --name= test -``` - -#### When you don't have access to the API - -There may be a time when you do not have access to the API (either because you don't have the -credentials, network access, etc...) You will probably still want to do end-to-end testing at least -once. In order to do so, you can emulate the server you would be reaching using a server stubbing -tool. 
- -For example, using [mockserver](https://www.mock-server.com/), you can set up an expectation file -like this: - -```json -{ - "httpRequest": { - "method": "GET", - "path": "/data" - }, - "httpResponse": { - "body": "{\"data\": [{\"record_key\": 1}, {\"record_key\": 2}]}" - } -} -``` - -Assuming this file has been created at `secrets/mock_server_config/expectations.json`, running the -following command will allow to match any requests on path `/data` to return the response defined in -the expectation file: - -```bash -docker run -d --rm -v $(pwd)/secrets/mock_server_config:/config -p 8113:8113 --env MOCKSERVER_LOG_LEVEL=TRACE --env MOCKSERVER_SERVER_PORT=8113 --env MOCKSERVER_WATCH_INITIALIZATION_JSON=true --env MOCKSERVER_PERSISTED_EXPECTATIONS_PATH=/config/expectations.json --env MOCKSERVER_INITIALIZATION_JSON_PATH=/config/expectations.json mockserver/mockserver:5.15.0 -``` - -HTTP requests to `localhost:8113/data` should now return the body defined in the expectations file. -To test this, the implementer either has to change the code which defines the base URL for Python -source or update the `url_base` from low-code. With the Connector Builder running in docker, you -will have to use domain `host.docker.internal` instead of `localhost` as the requests are executed -within docker. - -#### Publishing a new version to PyPi - -Python CDK has a -[GitHub workflow](https://github.com/airbytehq/airbyte/actions/workflows/publish-cdk-command-manually.yml) -that manages the CDK changelog, making a new release for `airbyte_cdk`, publishing it to PyPI, and -then making a commit to update (and subsequently auto-release) -[`source-declarative-m anifest`](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-declarative-manifest) -and Connector Builder (in the platform repository). - -> [!Note]: The workflow will handle the `CHANGELOG.md` entry for you. You should not add changelog -> lines in your PRs to the CDK itself. 
+For instructions on how to contribute, please see our [Contributing Guide](docs/CONTRIBUTING.md). -> [!Warning]: The workflow bumps version on it's own, please don't change the CDK version in -> `pyproject.toml` manually. +## Release Management -1. You only trigger the release workflow once all the PRs that you want to be included are already - merged into the `master` branch. -2. The - [`Publish CDK Manually`](https://github.com/airbytehq/airbyte/actions/workflows/publish-cdk-command-manually.yml) - workflow from master using `release-type=major|manor|patch` and setting the changelog message. -3. When the workflow runs, it will commit a new version directly to master branch. -4. The workflow will bump the version of `source-declarative-manifest` according to the - `release-type` of the CDK, then commit these changes back to master. The commit to master will - kick off a publish of the new version of `source-declarative-manifest`. -5. The workflow will also add a pull request to `airbyte-platform-internal` repo to bump the - dependency in Connector Builder. +Please see the [Release Management](docs/RELEASES.md) guide for information on how to perform releases and pre-releases. diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md index 4b417ada9..1b29f22a1 100644 --- a/docs/CONTRIBUTING.md +++ b/docs/CONTRIBUTING.md @@ -1,32 +1,159 @@ -# Contributing to the Python CDK +# Airbyte Python CDK - Contributing Guide Learn how you can become a contributor to the Airbyte Python CDK. -## Development -- Make sure [Poetry is installed](https://python-poetry.org/docs/#). -- Run `poetry install` -- For examples, check out the `examples` folder. They can be run via `poetry run python examples/` -- Unit tests and type checks can be run via `poetry run pytest` +Thank you for being interested in contributing to Airbyte Python CDK! Here are some guidelines to get you started: +- We adhere to the Airbyte [code of conduct](https://docs.airbyte.com/community/code-of-conduct).
+- You can contribute by reporting bugs, posting GitHub discussions, opening issues, improving docs, and submitting pull requests with bugfixes and new features alike. +- If you're changing the code, please add unit tests for your change. +- When submitting issues or PRs, please add a small reproduction project. Using the changes in your connector and providing that connector code as an example (or a satellite PR) helps! -## Documentation +## First Time Setup + +Here are some tips to get started using the project dependencies and development tools: + +1. Clone the CDK repo. If you will be testing connectors, you should clone the CDK into the same parent directory as `airbytehq/airbyte`, which contains the connector definitions. +1. Make sure [Poetry is installed](https://python-poetry.org/docs/#). +1. Run `poetry install --all-extras`. +1. Unit tests can be run via `poetry run pytest`. +1. You can use "Poe" tasks to perform common actions such as lint checks (`poetry run poe lint`), autoformatting (`poetry run poe format-fix`), etc. For a list of tasks you can run, try `poetry run poe list`. + +Note that installing all extras is required to run the full suite of unit tests. + +## Working with Poe Tasks + +The Airbyte CDK uses [Poe the Poet](https://poethepoet.natn.io/) to define common development tasks. You can run `poetry run poe list` to see all available tasks. This will work after `poetry install --all-extras` without any additional installations. + +Optionally, you can [pre-install Poe](https://poethepoet.natn.io/installation.html) with `pipx install poethepoet`, and then you will be able to run Poe tasks with the shorter `poe TASKNAME` syntax instead of `poetry run poe TASKNAME`. + +## Running tests locally + +- Iterate on the CDK code locally. +- Run tests via `poetry run poe pytest`, or `python -m pytest -s unit_tests` if you want to pass pytest options. +- Run `poetry run poe pytest-fast` to run the subset of PyTest tests which are not flagged as `slow`.
(Should take <5 min for fast tests only.) +- Run `poetry run poe check-local` to lint all code, type-check modified code, and run unit tests with coverage in one command. + +To see all available scripts, run `poetry run poe`. + +## Formatting Code + +- Iterate on the CDK code locally. +- Run `poetry run poe format-fix` to auto-fix formatting issues. +- If you only want to format Python code (excluding markdown, yaml, etc.), you can use `poetry run ruff format` to autoformat your Python code. + +To see all available `ruff` options, run `poetry run ruff`. + +## Auto-Generating the Declarative Schema File + +Low-code CDK models are generated from `sources/declarative/declarative_component_schema.yaml`. If the iteration you are working on includes changes to the models or the connector generator, you may need to regenerate them. In order to do that, you can run: + +```bash +poetry run poe build +``` + +This will generate the code generator Docker image and the component manifest files based on the schemas and templates. + +## Generating API Reference Docs Documentation auto-gen code lives in the `/docs` folder. Based on the doc strings of public methods, we generate API documentation using [pdoc](https://pdoc.dev). -To generate the documentation, run: +To generate the documentation, run `poe docs-generate`, or run `poe docs-preview` to build and open the docs preview in one step. + +The `docs-generate` Poe task is mapped to the `run()` function of `docs/generate.py`. Documentation pages will be generated in the `docs/generated` folder (ignored by git). You can also download auto-generated API docs for any GitHub push by navigating to the "Summary" tab of the docs generation check in GitHub Actions. + +## Release Management + +Please see the [Release Management](./RELEASES.md) guide for information on how to perform releases and pre-releases. + +## FAQ + +### Q: Who are "maintainers"?
For the purpose of this documentation, "maintainers" are those who have write permissions (or higher) on the repo. Generally these are Airbyte team members. + +### Q: Where should connectors put integration tests? + +Only tests within the `unit_tests` directory will be run by `airbyte-ci`. If you have integration tests that should also run, the common convention is to place these in the `unit_tests/integration` directory. This ensures they will be run automatically in CI and before each new release. + +### Q: What GitHub slash commands are available and who can run them? -```console -poe docs-generate +Only Airbyte CDK maintainers can run slash commands. The most common slash commands are as follows: + +- `/autofix` - Corrects any linting or formatting issues and commits the change back to the repo. +- `/poetry-lock` - Re-locks dependencies and updates the `poetry.lock` file, then commits the changes back to the repo. This is helpful after merging in updates from main, or when creating a PR in the browser - such as for version bumps or dependency updates directly in the PR. + +The full list of available slash commands can be found in the [slash command dispatch file](https://github.com/airbytehq/airbyte-python-cdk/blob/main/.github/workflows/slash_command_dispatch.yml#L21-L25). + +# Appendix: Advanced Topics + +## Using MockServer in Place of Direct API Access + +There may be a time when you do not have access to the API (because you don't have the credentials, network access, etc.). You will probably still want to do end-to-end testing at least once. In order to do so, you can emulate the server you would be reaching using a server stubbing tool.
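The MockServer approach described here requires Docker. If Docker is unavailable, the same idea can be sketched using only the Python standard library. The stand-in stub below is a hypothetical helper (not part of the CDK or MockServer) that serves a canned payload on the same port and path used in the MockServer example:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubHandler(BaseHTTPRequestHandler):
    """Serve a canned JSON payload on GET /data, mimicking a stubbed API."""

    def do_GET(self):
        if self.path == "/data":
            body = json.dumps({"data": [{"record_key": 1}, {"record_key": 2}]}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging

# Serve in a background thread on the same port used in the MockServer example.
server = HTTPServer(("localhost", 8113), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
```

As with MockServer, point the connector's base URL (or `url_base` in a low-code manifest) at `http://localhost:8113` and run a read.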
+ +For example, using [MockServer](https://www.mock-server.com/), you can set up an expectation file like this: + +```json +{ + "httpRequest": { + "method": "GET", + "path": "/data" + }, + "httpResponse": { + "body": "{\"data\": [{\"record_key\": 1}, {\"record_key\": 2}]}" + } +} +``` + +Assuming this file has been created at `secrets/mock_server_config/expectations.json`, running the following command will match any request on path `/data` and return the response defined in the expectation file: + +```bash +docker run -d --rm -v $(pwd)/secrets/mock_server_config:/config -p 8113:8113 --env MOCKSERVER_LOG_LEVEL=TRACE --env MOCKSERVER_SERVER_PORT=8113 --env MOCKSERVER_WATCH_INITIALIZATION_JSON=true --env MOCKSERVER_PERSISTED_EXPECTATIONS_PATH=/config/expectations.json --env MOCKSERVER_INITIALIZATION_JSON_PATH=/config/expectations.json mockserver/mockserver:5.15.0 +``` + +HTTP requests to `localhost:8113/data` should now return the body defined in the expectations file. To test this, either change the code that defines the base URL for a Python source, or update the `url_base` in the low-code manifest. With the Connector Builder running in Docker, you will have to use the domain `host.docker.internal` instead of `localhost`, as the requests are executed within Docker. + +## Testing Connectors against local CDK Changes + +When developing a new feature in the CDK, you will sometimes find it necessary to run a connector that uses that new feature, or to use an existing connector to validate some new feature or fix in the CDK. + +### Option 1: Installing your local CDK into a local Python connector + +Open the connector's `pyproject.toml` file and replace the `airbyte_cdk` dependency line with the following: + +```toml +airbyte_cdk = { path = "../../../../airbyte-python-cdk", develop = true } +``` + +Then, running `poetry update` should reinstall `airbyte_cdk` from your local working directory.
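To confirm which `airbyte_cdk` the connector's environment actually resolves (for example, after switching to the path dependency and running `poetry update`), a quick standard-library check can help. This is a sketch, not an official CDK command:

```python
import importlib.util

# Locate the airbyte_cdk package the active environment would import.
# With the path override in place, spec.origin should point into your
# local airbyte-python-cdk checkout rather than into site-packages.
spec = importlib.util.find_spec("airbyte_cdk")
if spec is None:
    print("airbyte_cdk is not installed in this environment")
else:
    print(f"airbyte_cdk resolves to: {spec.origin}")
```

Run it inside the connector's virtual environment (e.g. via `poetry run python`).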
When testing is complete and you've published the CDK update, remember to revert your change and bump to the latest CDK version before re-publishing the connector. + +### Option 2: Build and Test Connectors Using `airbyte-ci --use-local-cdk` + +_Pre-requisite: Install the [`airbyte-ci` CLI](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)_ + +You can build your connector image with the local CDK using: -```console -poe docs-preview +```bash +# from the airbytehq/airbyte base directory +airbyte-ci connectors --use-local-cdk --name= build ``` -or `poetry run poe docs-preview` if you don't have [Poe](https://poethepoet.natn.io/index.html) installed yet. +Or use the `test` command with `--use-local-cdk` to run the full set of connector tests, including connector acceptance tests (CAT) and the connector's own unit tests: -The `docs-generate` Poe task is mapped to the `run()` function of `docs/generate.py`. +```bash +# from the airbytehq/airbyte base directory +airbyte-ci connectors --use-local-cdk --name= test +``` + +Note that the local CDK is injected at build time, so if you make changes, you will have to run the build command again to see them reflected. + +#### Running Connector Acceptance Tests for a single connector in Docker with your local CDK installed
_Pre-requisite: Install the +[`airbyte-ci` CLI](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)_ + +To run acceptance tests for a single connector using the local CDK, from the connector directory, run: + +```bash +airbyte-ci connectors --use-local-cdk --name= test +``` diff --git a/docs/RELEASES.md b/docs/RELEASES.md new file mode 100644 index 000000000..c51ebd6bf --- /dev/null +++ b/docs/RELEASES.md @@ -0,0 +1,46 @@ +# Airbyte Python CDK - Release Management Guide + +## Publishing stable releases of the CDK + +A few seconds after any PR is merged to `main`, a release draft will be created or updated on the releases page here: https://github.com/airbytehq/airbyte-python-cdk/releases. Here are the steps to publish a CDK release: + +1. Click “Edit” next to the release. +2. Optionally change the version if you want a minor or major release version. When changing the version, you should modify both the tag name and the release title so the two match. The format for release tags is `vX.Y.Z` and GitHub will prevent you from creating the tag if you forget the “v” prefix. +3. Optionally tweak the text in the release notes - for instance to call out contributors, to make a specific change more intuitive for readers to understand, or to move updates into a different category than they were assigned by default. (Note: You can also do this retroactively after publishing the release.) +4. Publish the release by pressing the “Publish release” button. + +*Note:* + +- *Only maintainers can see release drafts. Non-maintainers will only see published releases.* +- If you create a tag by accident that you need to remove, contact a maintainer to delete the tag and the release.
+- You can monitor the PyPI release process here in the GitHub Actions view: https://github.com/airbytehq/airbyte-python-cdk/actions/workflows/pypi_publish.yml + +- **_[▶️ Loom Walkthrough](https://www.loom.com/share/ceddbbfc625141e382fd41c4f609dc51?sid=78e13ef7-16c8-478a-af47-4978b3ff3fad)_** + +## Publishing Pre-Release Versions of the CDK + +Publishing a pre-release version is similar to publishing a stable version. However, instead of using the auto-generated release draft, you’ll create a new release draft. + +1. Navigate to the releases page: https://github.com/airbytehq/airbyte-python-cdk/releases +2. Click “Draft a new release”. +3. In the tag selector, type the version number of the prerelease you’d like to create and copy-paste the same into the Release name box. + - The release tag should look like `vX.Y.Zsuffix`, where `suffix` is something like `dev0`, `dev1`, `alpha0`, `beta1`, etc. + +## Publishing new versions of SDM (source-declarative-manifest) + +Prereqs: + +1. The SDM publish process assumes you have already published the CDK. If you have not already done so, you’ll want to first publish the CDK using the steps above. While this prereq is not technically *required*, it is highly recommended. + +Publish steps: + +1. Navigate to the GitHub action page here: https://github.com/airbytehq/airbyte-python-cdk/actions/workflows/publish_sdm_connector.yml +2. Click “Run workflow” to start a new manual workflow run. +3. Click the drop-down for “Run workflow from” and then select the “tags” tab to browse already-created tags. Select the tag of the published CDK version you want to use for the SDM publish process. Notes: + 1. Optionally you can type part of the version number to filter down the list. + 2. You can ignore the version prompt box (aka leave blank) when publishing from a release tag. The version will be detected from the git tag. + 3. 
You can optionally click the box for “Dry run” if you want to observe the process before running the real thing. The dry run option will perform all steps *except* for the DockerHub publish step. +4. Without changing any other options, you can click “Run workflow” to run the workflow. +5. Watch the GitHub Action run. If successful, you should see it publish to DockerHub and a URL will appear on the “Summary” view once it has completed. + +- **_[▶️ Loom Walkthrough](https://www.loom.com/share/bc8ddffba9384fcfacaf535608360ee1)_** diff --git a/pyproject.toml b/pyproject.toml index 8c81a42e7..9a0e80327 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -151,7 +151,7 @@ lint = {sequence = ["_lint-ruff", "type-check"], help = "Lint all code. Includes check-lockfile = {cmd = "poetry check", help = "Check the poetry lock file."} # Linting/Typing fix tasks -lint-fix = { cmd = "poetry run ruff check --fix ." } +lint-fix = { cmd = "poetry run ruff check --fix .", help = "Auto-fix any lint issues that Ruff can automatically resolve (excluding 'unsafe' fixes)." } lint-fix-unsafe = { cmd = "poetry run ruff check --fix --unsafe-fixes .", help = "Lint-fix modified files, including 'unsafe' fixes. It is recommended to first commit any pending changes and then always manually review any unsafe changes applied." 
} # Combined Check and Fix tasks @@ -176,8 +176,8 @@ check-ci = {sequence = ["check-lockfile", "build", "lint", "unit-test-with-cov"] pre-push = {sequence = ["build", "check-local"], help = "Run all build and check tasks."} # API Docs with PDoc -docs-generate = {env = {PDOC_ALLOW_EXEC = "1"}, cmd = "python -m docs.generate run"} -docs-preview = {shell = "poe docs-generate && open docs/generated/index.html"} +docs-generate = {env = {PDOC_ALLOW_EXEC = "1"}, cmd = "python -m docs.generate run", help="Generate API documentation with PDoc."} +docs-preview = {shell = "poe docs-generate && open docs/generated/index.html", help="Generate API documentation with PDoc and then open the docs in the default web browser."} [tool.check-wheel-contents] # Quality control for Python wheel generation. Docs here: