Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[source-amplitude] - Migrate to manifest-only #51601

Open
wants to merge 32 commits into
base: master
Choose a base branch
from

Conversation

pnilan
Copy link
Contributor

@pnilan pnilan commented Jan 16, 2025

What

How

  • Leverages new ZipfileDecoder, GzipParser, JsonLineParser to decode and parse data returned by Amplitude's events export API.
  • Adds 2-hour lookback window to events stream per Amplitude documentation:

Data is available to export at a minimum within 2 hours of when the servers received it.

  • Creates new TransformDatetimesToRFC3339 component to maintain existing stream behavior of converting all date-time strings to RFC3339 format.
    • To maintain compatibility with inline schema requirement for manifest-only connectors and to prevent having to parse the entire manifest.yaml file for every transformation, event stream datetime fields are hard-coded into this class.
  • Removes outdated unit tests
  • Migrates to manifest-only via airbyte-ci

Review guide

  1. manifest.yaml
  2. components.py
  3. tests

Recommended Readings

  • streams.py for now-deleted Events stream
  • airbyte-cdk's ZipfileDecoder, GzipParser, and JsonLineParser

User Impact

  • None, events stream state format is updated with migration to low-code, but can be parsed from either format and converted to the relevant format.

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Copy link

vercel bot commented Jan 16, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 23, 2025 1:38am

class AverageSessionLengthRecordExtractor(RecordExtractor):
"""
Create records from complex response structure
Issue: https://github.com/airbytehq/airbyte/issues/23145
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the docs link to the same issue, I'd love to have longer, more descriptive docstrings that explain what the custom component does, and why do we have to use it.

I know this might not be related to the migration itself as you're just moving them over.

Copy link
Contributor

@natikgadzhi natikgadzhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh look ma, I'm useful!

I'm approving with a caveat that I'd like to see green CI before merging.

Left a couple of nit comments, non-blocking.

Here is why CI is not green:

FileNotFoundError: Could not find pyproject.toml in the unit_tests directory for source-amplitude.

Which means — you want to have a pyproject.toml file in unit_tests directory that is a dependency management thing specifically for unit tests. Here is the guide that @ChristoGrab made about setting it up.

return []


class TransformDatetimesToRFC3339(RecordTransformation):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, potential improvement for a follow-up PR: I think this is possible to do in the Builder already, no? Do we not support date transforms in jinja?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per convo I'll make this a follow-up task so we can remove the custom component.

@@ -21,7 +22,13 @@ def config():

@pytest.fixture(scope="module")
def streams(config):
return SourceAmplitude().streams(config=config)
catalog = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, how does that work? Why is this needed? How are they usually configured?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated -- accidentally pushed before completing. intergration_test.py is updated now.

@maxi297
Copy link
Contributor

maxi297 commented Jan 22, 2025

Can you also share the regression tests that you've done? Also, can we clarify that this change can't be reverted and why in the Can this PR be safely reverted and rolled back? section of the PR?

The rest seems fine to me once we have fixed the mock server/integration tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/amplitude
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants