[source-amplitude] - Migrate to manifest-only #51601

Merged
merged 37 commits into from
Jan 23, 2025
Changes from 26 commits
629a6fa
update airbyte-cdk to latest
pnilan Jan 16, 2025
a9814e7
migration progress
pnilan Jan 16, 2025
c1c6df4
migrate events stream to low-code, create custom transformation compo…
pnilan Jan 16, 2025
ef971e5
update inner_parser to JsonLinesParser
pnilan Jan 16, 2025
5d65756
remove unused code, clean up source.py
pnilan Jan 16, 2025
5b7ed37
update airbyte-cdk to dev
pnilan Jan 17, 2025
370bc60
bump version
pnilan Jan 17, 2025
ce974f9
update transformation to raise ATE on failure, adds test cases
pnilan Jan 17, 2025
e692e84
add state migration
pnilan Jan 17, 2025
9eab2a2
remove outdated tests
pnilan Jan 21, 2025
9e1b952
bypass `test_state_with_abnormally_large_values` and update to latest…
pnilan Jan 21, 2025
cdd165e
update integration test
pnilan Jan 21, 2025
093ab6e
Merge branch 'master' into pnilan/source-amplitude/manifest-only-migr…
pnilan Jan 21, 2025
5a4b301
chore: format code
pnilan Jan 21, 2025
363e66f
update state cursor field
pnilan Jan 21, 2025
43081c0
Revert -- remove events state migration
pnilan Jan 21, 2025
19be117
remove filter condition
pnilan Jan 21, 2025
78910cd
clean up
pnilan Jan 21, 2025
99b223c
add tests
pnilan Jan 21, 2025
9b5e5c6
add lookback window, update state format in sample
pnilan Jan 21, 2025
0e31378
hard code date-time properties into custom transformation (for compat…
pnilan Jan 21, 2025
c294da4
remove python from metadata
pnilan Jan 21, 2025
f6db2b6
migrate events schema to inline
pnilan Jan 21, 2025
63c6074
migrated to manifest-only via `airbyte-ci` command
pnilan Jan 21, 2025
eccb6f7
chore: format
pnilan Jan 22, 2025
06a33bf
fix error
pnilan Jan 22, 2025
45100a9
update tests for compatibility with manifest-only
pnilan Jan 22, 2025
0489051
clean up
pnilan Jan 22, 2025
37cb3e6
removes cohort and annotations integration tests
pnilan Jan 22, 2025
b097d59
remove duplicates from manifest
pnilan Jan 22, 2025
f591a66
update integration_test.py
pnilan Jan 22, 2025
14d3bcb
Merge branch 'master' into pnilan/source-amplitude/manifest-only-migr…
pnilan Jan 22, 2025
afb747a
remove rc suffix (for prerelease publishing)
pnilan Jan 23, 2025
619a179
revert to rc
pnilan Jan 23, 2025
87273ac
deactivate progressive rollout for dev image publishing
pnilan Jan 23, 2025
48e2235
revert to rc
pnilan Jan 23, 2025
20bb3b6
remove unnecessary definitions, remove references to removed "paramet…
pnilan Jan 23, 2025
89 changes: 25 additions & 64 deletions airbyte-integrations/connectors/source-amplitude/README.md
@@ -1,49 +1,22 @@
# Amplitude source connector

This is the repository for the Amplitude source connector, written in Python.
For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.com/integrations/sources/amplitude).
This directory contains the manifest-only connector for `source-amplitude`.
This _manifest-only_ connector is not a Python package on its own, as it runs inside of the base `source-declarative-manifest` image.

## Local development

### Prerequisites

- Python (~=3.9)
- Poetry (~=1.7) - installation instructions [here](https://python-poetry.org/docs/#installation)

### Installing the connector

From this connector directory, run:

```bash
poetry install --with dev
```

### Create credentials
For information about how to configure and use this connector within Airbyte, see [the connector's full documentation](https://docs.airbyte.com/integrations/sources/amplitude).

**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.com/integrations/sources/amplitude)
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_amplitude/spec.yaml` file.
Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information.
See `sample_files/sample_config.json` for a sample config file.

### Locally running the connector

```
poetry run source-amplitude spec
poetry run source-amplitude check --config secrets/config.json
poetry run source-amplitude discover --config secrets/config.json
poetry run source-amplitude read --config secrets/config.json --catalog sample_files/configured_catalog.json
```
## Local development

### Running unit tests
We recommend using the Connector Builder to edit this connector.
Using either Airbyte Cloud or your local Airbyte OSS instance, navigate to the **Builder** tab and select **Import a YAML**.
Then select the connector's `manifest.yaml` file to load the connector into the Builder. You're now ready to make changes to the connector!

To run unit tests locally, from the connector directory run:

```
poetry run pytest unit_tests
```
If you prefer to develop locally, you can follow the instructions below.

### Building the docker image

You can build any manifest-only connector with `airbyte-ci`:

1. Install [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
2. Run the following command to build the docker image:

@@ -53,52 +26,40 @@ airbyte-ci connectors --name=source-amplitude build

An image will be available on your host with the tag `airbyte/source-amplitude:dev`.

### Creating credentials

**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.com/integrations/sources/amplitude)
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `spec` object in the connector's `manifest.yaml` file.
Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information.

### Running as a docker container

Then run any of the connector commands as follows:
Then run any of the standard source connector commands:

```
```bash
docker run --rm airbyte/source-amplitude:dev spec
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-amplitude:dev check --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-amplitude:dev discover --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-amplitude:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
```

### Running our CI test suite
### Running the CI test suite

You can run our full test suite locally using [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md):

```bash
airbyte-ci connectors --name=source-amplitude test
```

### Customizing acceptance Tests

Customize `acceptance-test-config.yml` file to configure acceptance tests. See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) for more information.
If your connector requires to create or destroy resources for use during acceptance tests create fixtures for it and place them inside integration_tests/acceptance.py.

### Dependency Management

All of your dependencies should be managed via Poetry.
To add a new dependency, run:

```bash
poetry add <package-name>
```

Please commit the changes to `pyproject.toml` and `poetry.lock` files.

## Publishing a new version of the connector

You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?

1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-amplitude test`
2. Bump the connector version (please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors)):
- bump the `dockerImageTag` value in `metadata.yaml`
- bump the `version` value in `pyproject.toml`
3. Make sure the `metadata.yaml` content is up to date.
If you want to contribute changes to `source-amplitude`, here's how you can do that:
1. Make your changes locally, or load the connector's manifest into Connector Builder and make changes there.
2. Make sure your changes are passing our test suite with `airbyte-ci connectors --name=source-amplitude test`
3. Bump the connector version (please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors)):
- bump the `dockerImageTag` value in `metadata.yaml`
4. Make sure the connector documentation and its changelog are up to date (`docs/integrations/sources/amplitude.md`).
5. Create a Pull Request: use [our PR naming conventions](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#pull-request-title-convention).
6. Pat yourself on the back for being an awesome contributor.
7. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
8. Once your PR is merged, the new version of the connector will be automatically published to Docker Hub and our connector registry.
8. Once your PR is merged, the new version of the connector will be automatically published to Docker Hub and our connector registry.
3 changes: 0 additions & 3 deletions airbyte-integrations/connectors/source-amplitude/__init__.py

This file was deleted.

Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@ test_strictness_level: high
acceptance_tests:
  spec:
    tests:
      - spec_path: "source_amplitude/spec.yaml"
      - spec_path: "manifest.yaml"
        backward_compatibility_tests_config:
          # added new `active_users_group_by_country` prop to toggle grouping by country
          disable_for_version: 0.6.10
@@ -25,9 +25,13 @@ acceptance_tests:
      - config_path: "secrets/config.json"
        empty_streams:
          - name: "cohorts"
            bypass_reason: "This stream is empty due to free subscription plan for the sandbox."
            bypass_reason:
              "This stream is empty due to free subscription plan for the
              sandbox."
          - name: "annotations"
            bypass_reason: "This stream is empty due to free subscription plan for the sandbox."
            bypass_reason:
              "This stream is empty due to free subscription plan for the
              sandbox."
        expect_records:
          path: "integration_tests/expected_records.jsonl"
          exact_order: no
@@ -36,7 +40,9 @@ acceptance_tests:
      - config_path: "secrets/config.json"
        configured_catalog_path: "integration_tests/configured_catalog.json"
        future_state:
          future_state_path: "integration_tests/abnormal_state.json"
          bypass_reason:
            "Test `test_state_with_abnormally_large_values` is bypassed
            as test does not make sense using Concurrent CDK"
        timeout_seconds: 3600
        skip_comprehensive_incremental_tests: yes
  full_refresh:
96 changes: 96 additions & 0 deletions airbyte-integrations/connectors/source-amplitude/components.py
@@ -0,0 +1,96 @@
#
# Copyright (c) 2025 Airbyte, Inc., all rights reserved.
#

import logging
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping, MutableMapping, Optional

import pendulum
import requests

from airbyte_cdk.models import FailureType
from airbyte_cdk.sources.declarative.extractors.record_extractor import RecordExtractor
from airbyte_cdk.sources.declarative.migrations.state_migration import StateMigration
from airbyte_cdk.sources.declarative.schema import JsonFileSchemaLoader
from airbyte_cdk.sources.declarative.transformations import RecordTransformation
from airbyte_cdk.sources.declarative.types import Config, Record
from airbyte_cdk.sources.types import Config, StreamSlice, StreamState
from airbyte_cdk.utils import AirbyteTracedException


logger = logging.getLogger("airbyte")


class AverageSessionLengthRecordExtractor(RecordExtractor):
    """
    Create records from complex response structure
    Issue: https://github.com/airbytehq/airbyte/issues/23145
    """

Review comment (Contributor):
Since the docs link to the same issue, I'd love to have longer, more descriptive docstrings that explain what the custom component does and why we have to use it.

I know this might not be related to the migration itself, as you're just moving them over.

    def extract_records(self, response: requests.Response) -> List[Record]:
        response_data = response.json().get("data", [])
        if response_data:
            # From the Amplitude documentation it follows that "series" is an array with one element which is itself
            # an array that contains the average session length for each day.
            # https://developers.amplitude.com/docs/dashboard-rest-api#returns-2
            series = response_data.get("series", [])
            if len(series) > 0:
                series = series[0]  # get the nested list
                return [{"date": date, "length": length} for date, length in zip(response_data["xValues"], series)]
        return []
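
To make the reshaping concrete, here is a minimal standalone sketch of the same logic applied to a plain dictionary instead of a `requests.Response`. The function name `extract_average_session_length` and the sample payload are hypothetical; only the `data`, `series`, and `xValues` keys are taken from the extractor above.

```python
from typing import Any, Dict, List


def extract_average_session_length(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each date in `xValues` with the matching value of the single inner series."""
    data = payload.get("data") or {}
    series = data.get("series", [])
    if not series:
        return []
    # "series" wraps one inner list that holds one value per day.
    return [{"date": day, "length": value} for day, value in zip(data["xValues"], series[0])]


# Hypothetical payload, shaped like the response described in the code comments above.
sample = {"data": {"series": [[120.5, 98.0]], "xValues": ["2023-01-01", "2023-01-02"]}}
records = extract_average_session_length(sample)
```

With this sample, `records` holds one dictionary per day, each with a `date` and a `length` key. An empty or missing `data` object yields an empty list.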


class ActiveUsersRecordExtractor(RecordExtractor):
    """
    Create records from complex response structure
    Issue: https://github.com/airbytehq/airbyte/issues/23145
    """

    def extract_records(self, response: requests.Response) -> List[Record]:
        response_data = response.json().get("data", [])
        if response_data:
            series = list(zip(*response_data["series"]))
            if series:
                return [
                    {"date": date, "statistics": dict(zip(response_data["seriesLabels"], users))}
                    for date, users in zip(response_data["xValues"], series)
                ]
        return []
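
The key step in this extractor is the transposition done by `zip(*series)`. The sketch below shows it on a plain dictionary. The function name `extract_active_users`, the label values, and the numbers are hypothetical; the `series`, `seriesLabels`, and `xValues` keys come from the code above.

```python
from typing import Any, Dict, List


def extract_active_users(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one list per label into one record per date."""
    data = payload.get("data") or {}
    if not data:
        return []
    # zip(*series) transposes "one list per label" into "one tuple per date".
    per_date = list(zip(*data["series"]))
    return [
        {"date": day, "statistics": dict(zip(data["seriesLabels"], values))}
        for day, values in zip(data["xValues"], per_date)
    ]


# Hypothetical payload with two labels and two dates.
sample = {
    "data": {
        "series": [[10, 12], [3, 4]],
        "seriesLabels": ["United States", "Canada"],
        "xValues": ["2023-01-01", "2023-01-02"],
    }
}
records = extract_active_users(sample)
```

Each resulting record carries a `statistics` mapping from label to that day's value, so the first record here maps `United States` to 10 and `Canada` to 3.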


class TransformDatetimesToRFC3339(RecordTransformation):

Review comment (Contributor):
Nit, potential improvement for a follow-up PR: I think this is already possible in the Builder, no? Do we not support date transforms in Jinja?

Reply (Contributor Author):
Per our conversation, I'll make this a follow-up task so we can remove the custom component.

    def __init__(self):
        self.name = "events"
        self.date_time_fields = [
            "event_time",
            "server_upload_time",
            "processed_time",
            "server_received_time",
            "user_creation_time",
            "client_upload_time",
            "client_event_time",
        ]

    def transform(
        self,
        record: Dict[str, Any],
        config: Optional[Config] = None,
        stream_state: Optional[StreamState] = None,
        stream_slice: Optional[StreamSlice] = None,
    ) -> None:
        """
        Transform 'date-time' items to RFC3339 format
        """
        for item in record:
            if item in self.date_time_fields and record[item]:
                try:
                    record[item] = pendulum.parse(record[item]).to_rfc3339_string()
                except Exception as e:
                    logger.error(f"Error converting {item} to RFC3339 format: {e}")
                    raise AirbyteTracedException(
                        message=f"Error converting {item} to RFC3339 format. See logs for more information",
                        internal_message=f"Error converting {item} to RFC3339 format: {e}",
                        failure_type=FailureType.system_error,
                    ) from e
        return record
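
As an illustration of what this transformation does to a record, here is a rough standard-library sketch. It deliberately swaps `pendulum` for `datetime.fromisoformat`, so it is a stand-in rather than the connector's code. The helper names `to_rfc3339` and `transform_in_place`, the two-field subset, and the sample event are hypothetical, and treating naive timestamps as UTC is an assumption.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable

# Hypothetical subset of the date-time fields listed in the component above.
DATE_TIME_FIELDS = ("event_time", "server_upload_time")


def to_rfc3339(value: str) -> str:
    """Parse an ISO-like timestamp and render it with an explicit UTC offset."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.isoformat()


def transform_in_place(record: Dict[str, Any], fields: Iterable[str] = DATE_TIME_FIELDS) -> None:
    """Rewrite the listed fields in place, skipping missing or empty values."""
    for name in fields:
        if record.get(name):
            record[name] = to_rfc3339(record[name])


# Hypothetical event record; only event_time carries a value to convert.
event = {"event_time": "2023-02-15 08:00:00.000000", "server_upload_time": None, "event_type": "click"}
transform_in_place(event)
```

After the call, `event_time` carries a `T` separator and a `+00:00` offset, while the empty `server_upload_time` and the unrelated `event_type` are left untouched.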
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
# Copyright (c) 2025 Airbyte, Inc., all rights reserved.
#


Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
# Copyright (c) 2025 Airbyte, Inc., all rights reserved.
#

import json
@@ -11,6 +11,7 @@

from airbyte_cdk.models import SyncMode
from airbyte_cdk.sources.declarative.types import StreamSlice
from airbyte_cdk.test.catalog_builder import CatalogBuilder


@pytest.fixture(scope="module")
@@ -21,7 +22,13 @@ def config():

@pytest.fixture(scope="module")
def streams(config):
    return SourceAmplitude().streams(config=config)
    catalog = (

Review comment (Contributor):
Out of curiosity, how does that work? Why is this needed? How are they usually configured?

Reply (Contributor Author):
Updated. I accidentally pushed before completing; integration_test.py is updated now.

        CatalogBuilder()
        .with_stream("annotations_stream", sync_mode=SyncMode.full_refresh)
        .with_stream("cohorts_stream", sync_mode=SyncMode.full_refresh)
        .build()
    )
    return SourceAmplitude(catalog=catalog, config=config, state={}).streams(config=config)


@pytest.fixture(scope="module")
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@
  {
    "type": "STREAM",
    "stream": {
      "stream_state": { "event_time": "2023-02-15 08:00:00.000000" },
      "stream_state": { "event_time": "20230215T08" },
      "stream_descriptor": { "name": "events" }
    }
  },
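
The sample state above changes the `event_time` cursor from a full timestamp to an hour-granular value. The sketch below only illustrates how the two formats relate. The helper `to_hourly_cursor` and both format constants are hypothetical, and the commit history shows that an events state migration was added and later reverted, so this should not be read as connector code.

```python
from datetime import datetime

# Assumed formats, inferred from the two sample values in the state file above.
OLD_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
NEW_FORMAT = "%Y%m%dT%H"


def to_hourly_cursor(old_value: str) -> str:
    """Convert a full timestamp cursor into the hour-granular form."""
    return datetime.strptime(old_value, OLD_FORMAT).strftime(NEW_FORMAT)


new_value = to_hourly_cursor("2023-02-15 08:00:00.000000")
```

For the old sample value, `new_value` matches the new sample value in the state file. Minutes, seconds, and microseconds are dropped in the conversion.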
9 changes: 0 additions & 9 deletions airbyte-integrations/connectors/source-amplitude/main.py

This file was deleted.
