Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(low-code cdk): add config component resolver #149

Merged
merged 15 commits into from
Dec 12, 2024

Conversation

lazebnyi
Copy link
Contributor

@lazebnyi lazebnyi commented Dec 6, 2024

What

This PR add config component resolver to dynamic stream
Fixed: https://github.com/airbytehq/airbyte-internal-issues/issues/10787?issue=airbytehq%7Cairbyte-internal-issues%7C10789

How

Described schema, models and data classes for ConfigComponentsResolver andStreamConfig. Defined behavior for components resolving using source config. Added validation to ensure dynamic stream names are unique.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced StreamConfig and ConfigComponentsResolver components for enhanced stream configuration management.
    • Updated DynamicDeclarativeStream to support new component resolution options.
    • Enhanced flexibility in component mapping with updated field path definitions.
    • Expanded component type mappings to include new configurations.
  • Bug Fixes

    • Improved handling of unique dynamic stream names to prevent duplicates.
  • Tests

    • Added unit tests for ConfigComponentsResolver to validate dynamic stream behavior.
    • Enhanced tests for HttpComponentsResolver with additional mock data and configuration usage.

@lazebnyi lazebnyi requested a review from maxi297 December 6, 2024 18:39
@github-actions github-actions bot added the enhancement New feature or request label Dec 6, 2024
Copy link
Contributor

coderabbitai bot commented Dec 6, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

The pull request introduces significant updates to the Airbyte CDK's declarative components. New components StreamConfig and ConfigComponentsResolver are added to enhance schema flexibility for stream configurations. Modifications to existing classes, such as ComponentMappingDefinition and DynamicDeclarativeStream, allow for more versatile field paths and component resolution options. Additionally, the ManifestDeclarativeSource class is updated to prevent duplicate dynamic stream entries. The changes are complemented by new tests for the ConfigComponentsResolver and adjustments to existing tests for the HttpComponentsResolver.

Changes

File Path Change Summary
airbyte_cdk/sources/declarative/declarative_component_schema.yaml Added StreamConfig and ConfigComponentsResolver; updated ComponentMappingDefinition and DynamicDeclarativeStream.
airbyte_cdk/sources/declarative/manifest_declarative_source.py Added Set import; introduced seen_dynamic_streams to prevent duplicate dynamic stream names.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py Added StreamConfig and ConfigComponentsResolver classes; updated ComponentMappingDefinition and DeclarativeStream.
airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py Updated DEFAULT_MODEL_TYPES and CUSTOM_COMPONENTS_MAPPING for new component types.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py Added create_stream_config and create_config_components_resolver methods; updated mappings for new models.
airbyte_cdk/sources/declarative/resolvers/__init__.py Added imports for StreamConfig and ConfigComponentsResolver; updated COMPONENTS_RESOLVER_TYPE_MAPPING and __all__.
airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py Introduced StreamConfig and ConfigComponentsResolver classes for managing stream configurations.
unit_tests/sources/declarative/resolvers/test_config_components_resolver.py Added unit tests for ConfigComponentsResolver.
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py Renamed test function and updated mock data and configuration handling.

Possibly related PRs

Suggested reviewers

  • maxi297
  • aaronsteers

Wydt about these suggestions?


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (10)
airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py (1)

68-91: Consider extending support for non-string component values

Currently, the __post_init__ method raises a ValueError if a component's value is neither a string nor an InterpolatedString. Would it be beneficial to support other data types, such as numbers or booleans, to enhance flexibility in component mappings? This could make the resolver more versatile when dealing with different types of configuration values.

wdyt?

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

1631-1645: Review default values for optional fields

In the DeclarativeStream class, the name field is set as Optional[str] = Field("", ...), which defaults to an empty string. This might blur the distinction between a missing name and an intentionally empty name. Would setting the default to None provide clearer semantics? Similarly, reviewing other optional fields to ensure their defaults are appropriately set could improve clarity.

wdyt?

airbyte_cdk/sources/declarative/resolvers/__init__.py (2)

7-9: Organize imports for better readability

Consider reorganizing the imports to group related modules together and sort them alphabetically. This can enhance readability and make it easier to locate specific imports.

wdyt?


16-16: Format __all__ for improved clarity

The __all__ list is getting quite long. Formatting it across multiple lines or as a tuple could improve readability and maintainability, especially as more components are added in the future.

wdyt?

unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1)

Line range hint 150-176: Add a docstring to the test function

Adding a brief docstring to test_dynamic_streams_read_with_http_components_resolver would provide clarity on the purpose of the test and outline the scenarios being validated. This can aid future contributors in understanding and maintaining the test suite.

wdyt?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

2307-2317: The StreamConfig creation looks good, but could we enhance error handling?

The implementation correctly handles the configs_pointer field, but what do you think about adding validation to ensure the pointer paths are valid? This could help catch configuration issues early. wdyt?

 @staticmethod
 def create_stream_config(
     model: StreamConfigModel, config: Config, **kwargs: Any
 ) -> StreamConfig:
     model_configs_pointer: List[Union[InterpolatedString, str]] = (
         [x for x in model.configs_pointer] if model.configs_pointer else []
     )
+    # Validate pointer paths
+    for pointer in model_configs_pointer:
+        if not isinstance(pointer, (str, InterpolatedString)):
+            raise ValueError(f"Invalid configs_pointer value: {pointer}")

     return StreamConfig(
         configs_pointer=model_configs_pointer,
         parameters=model.parameters or {},
     )
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (4)

2923-2936: Support for array indices in field paths looks good!

The addition of integer type support in field_path enables direct array element access. The examples nicely demonstrate both string and integer usage patterns.

Quick question: Should we add a note in the description about integer indices being 0-based? This might help prevent off-by-one errors, wdyt?


2981-3006: StreamConfig component looks well-structured!

The new StreamConfig component provides a clean way to locate stream configurations within the source config. The required fields and interpolation support are well thought out.

One suggestion: The description mentions "source config file" but it's actually the runtime source config object. Should we update the description to avoid confusion? Something like "...path in source config where streams configs are located", wdyt?


3007-3026: ConfigComponentsResolver implementation looks solid!

The component provides a nice alternative to HTTP-based resolution, allowing stream templates to be populated from config values. The structure mirrors HttpComponentsResolver for consistency.

A thought: Given this is marked experimental, should we add a note about potential breaking changes in the description? Maybe something like "Breaking changes may be introduced as the API evolves", wdyt?


3041-3043: DynamicDeclarativeStream update integrates well!

The addition of ConfigComponentsResolver as an alternative to HttpComponentsResolver provides flexibility in how stream templates are resolved.

Consider adding an example in the description showing both resolver types? This could help users understand when to use each one, wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 59505ff and e43343c.

📒 Files selected for processing (9)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (12 hunks)
  • airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5 hunks)
  • airbyte_cdk/sources/declarative/resolvers/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/test_config_components_resolver.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (3 hunks)
🔇 Additional comments (7)
airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py (1)

94-105: Verify the handling of stream configuration paths

In the _stream_config property, dpath.get is used with the evaluated path list to retrieve the stream configurations from self.config. Are there scenarios where the evaluated paths might not exist in self.config, potentially leading to unexpected results or exceptions? It might be helpful to include error handling or default values to ensure robustness when the expected configuration paths are missing.

wdyt?

unit_tests/sources/declarative/resolvers/test_config_components_resolver.py (1)

107-142: Great job on comprehensive testing

The test_dynamic_streams_read_with_config_components_resolver function effectively covers the main use case of the ConfigComponentsResolver. The test is well-structured and validates the functionality as expected.

unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1)

Line range hint 182-198: Ensure consistent use of configuration in tests

You've updated the source.discover method to pass _CONFIG as the configuration. Are there any potential issues if _CONFIG changes in the future? Ensuring that tests use a consistent and controlled configuration can help prevent unexpected behaviors due to configuration changes.

wdyt?

airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (1)

36-36: LGTM! The mapping changes look good.

The updated mappings correctly reflect the new ConfigComponentResolver functionality and its associated components.

Also applies to: 40-42

airbyte_cdk/sources/declarative/manifest_declarative_source.py (1)

316-316: Great addition of duplicate stream name prevention!

The implementation effectively prevents duplicate dynamic stream names by tracking them in a set. This helps avoid potential issues that could arise from having multiple streams with the same name.

Also applies to: 354-359

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

200-205: LGTM! Component registration is properly handled.

The new ConfigComponentsResolver and StreamConfig components are correctly imported and registered in the factory system.

Also applies to: 490-491


2319-2342: LGTM! The ConfigComponentsResolver creation is well implemented.

The method properly creates both the stream_config and components_mapping, maintaining consistency with the factory pattern.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

1826-1826: Consider improving method signature readability.

The create_record_selector method signature is getting quite long with many parameters. What do you think about extracting some related parameters into a dataclass or using the builder pattern to improve readability? wdyt?

@dataclass
class RecordSelectorConfig:
    name: str
    transformations: List[RecordTransformation]
    decoder: Optional[Decoder] = None
    client_side_incremental_sync: Optional[Dict[str, Any]] = None

def create_record_selector(
    self,
    model: RecordSelectorModel,
    config: Config,
    selector_config: RecordSelectorConfig,
    **kwargs: Any,
) -> RecordSelector:
    # ... rest of the implementation

Also applies to: 2305-2305


2319-2342: LGTM! Clear implementation of config components resolver.

The create_config_components_resolver method:

  • Properly creates stream_config and components_mapping
  • Follows the factory method pattern consistently
  • Handles type conversion correctly

One small suggestion: Would it make sense to add some validation to ensure components_mapping is not empty? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e43343c and 29bb2ee.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6 hunks)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

131-133: LGTM! Clean import addition.

The new import for ConfigComponentsResolver is properly grouped with related imports.


490-491: LGTM! Proper registration of new components.

The new components are correctly registered in the PYDANTIC_MODEL_TO_CONSTRUCTOR mapping.


2306-2318: LGTM! Well-structured stream config creation.

The create_stream_config method is clean and follows the established pattern:

  • Proper type hints
  • Handles optional parameters safely
  • Consistent with other factory methods

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

1194-1203: Consider adding validation for configs_pointer paths

The StreamConfig class looks good but could benefit from additional validation to ensure the paths exist in the source config. This would help catch configuration errors early.

What if we added a validator to check that the paths exist? Something like:

@validator('configs_pointer')
def validate_configs_pointer(cls, v):
    if not v:
        raise ValueError("configs_pointer cannot be empty")
    return v

wdyt?


1169-1172: Consider documenting array index support in field_path

The field_path now supports integers for array indexing, which is a great addition. However, this change could use some documentation to help users understand this capability.

What if we added a docstring example showing array indexing? Something like:

# Example showing array indexing in field_path:
# ["data", 0, "name"] would access the 'name' field of the first element in the 'data' array

wdyt?

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

2979-3004: Consider enhancing StreamConfig examples

The StreamConfig schema looks good, but the examples could be more comprehensive to show different config scenarios.

What if we added more examples to showcase different use cases? Something like:

examples:
  - ["data"]  # Simple path
  - ["data", "streams"]  # Nested path
  - ["data", "{{ parameters.name }}"]  # With interpolation
  - ["connections", "{{ config.connection_id }}", "streams"]  # Complex path with config

wdyt?


2931-2934: Consider adding a warning about array indices

The examples for field_path now show array indexing, but it might be worth adding a note about potential risks.

What if we added a warning in the description about array indices? Something like:

description: >
  A list of potentially nested fields indicating the full path where value will be added or updated.
  When using array indices, ensure the arrays exist and have sufficient length to avoid index errors.

wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 29bb2ee and a11aa49.

📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/auth/selective_authenticator.py (1 hunks)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (5 hunks)
  • airbyte_cdk/sources/declarative/resolvers/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • airbyte_cdk/sources/declarative/auth/selective_authenticator.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • airbyte_cdk/sources/declarative/resolvers/init.py
  • airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

1205-1210: LGTM! The ConfigComponentsResolver implementation looks clean and well-structured.

The class properly encapsulates the stream configuration and components mapping, with clear required fields.


1991-1991: LGTM! The components_resolver update provides good flexibility.

The DynamicDeclarativeStream now supports both HTTP and config-based component resolution, which provides good extensibility.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

3005-3024: LGTM! The ConfigComponentsResolver schema is well-defined.

The schema properly defines the required fields and includes a good description of the experimental nature of the component.


3039-3041: LGTM! The components_resolver schema update looks good.

The schema properly supports both HTTP and config-based resolvers through anyOf.

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions to make sure I have the full context but generally, it looks good to me!

@lazebnyi lazebnyi requested a review from maxi297 December 12, 2024 14:26
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

2931-2934: Would it be helpful to add documentation about wildcard behavior? wdyt?

The new examples show support for integer indices and wildcards in field paths, which is great! Consider adding documentation to explain:

  • The difference between single * and double ** wildcards
  • Any limitations on wildcard depth or combinations

3005-3024: Should we enhance the experimental warning? wdyt?

The component looks well-structured! Consider expanding the experimental warning to be more specific about:

  • What aspects might change (e.g., interface, behavior)
  • Any known limitations or edge cases
  • Migration path when changes occur
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a11aa49 and 182552b.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (5 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (12 hunks)
  • unit_tests/sources/declarative/resolvers/test_config_components_resolver.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py
  • unit_tests/sources/declarative/resolvers/test_config_components_resolver.py
🔇 Additional comments (6)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4)

1202-1211: LGTM! The StreamConfig class is well-structured and follows the established pattern.

The class provides a clean interface for locating stream configurations in the source config, with good examples and documentation. The separate entity approach allows for future extensions as discussed in the past review.


1213-1218: LGTM! The ConfigComponentsResolver class is well-defined and properly marked as experimental.

The class provides a clear interface for resolving components from source config, with all necessary fields and proper documentation.


Line range hint 3039-3041: LGTM! The components_resolver field now properly supports both resolver types.

The change enables dynamic streams to resolve components either via HTTP or from source config, aligning with the PR objectives.


1177-1180: LGTM! The examples effectively demonstrate the new field path capabilities.

The updated examples show:

  • Array access using integer indices: ["data", 1, "name"]
  • Dynamic field access using interpolation: ["data", "{{ components_values.name }}"]
  • Wildcard patterns: ["*", "**", "name"]

Let's verify if there are any existing usages that might be affected:

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

2979-3004: LGTM! The StreamConfig component looks well-designed.

The component structure allows for future extensibility while keeping it focused on its current responsibility. The naming discussion in past reviews was resolved with good rationale.


3039-3041: LGTM! The DynamicDeclarativeStream changes look good.

The addition of ConfigComponentsResolver as an alternative to HttpComponentsResolver is clean and maintains backward compatibility.

@lazebnyi lazebnyi requested a review from maxi297 December 12, 2024 15:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

1194-1203: LGTM! Consider adding path validation for configs_pointer.

The new StreamConfig class looks well-designed. To make it more robust, would you consider adding validation to ensure the configs_pointer paths exist in the source config? This could help catch configuration errors early. wdyt?

def validate_configs_pointer(cls, v):
    if not v:
        raise ValueError("configs_pointer cannot be empty")
    return v

configs_pointer: List[str] = Field(
    ...,
    description="A list of potentially nested fields indicating the full path in source config file where streams configs located.",
    examples=[["data"], ["data", "streams"], ["data", "{{ parameters.name }}"]],
    title="Configs Pointer",
    validators=[validate_configs_pointer]
)

1205-1210: LGTM! Consider enhancing the documentation.

The ConfigComponentsResolver class looks good, but would you consider adding more detailed documentation about:

  1. How it interacts with StreamConfig
  2. Examples of typical usage patterns
  3. Best practices for component mapping definitions

This would help other developers understand its purpose better. wdyt?

class ConfigComponentsResolver(BaseModel):
    type: Literal["ConfigComponentsResolver"]
+   """Resolves components using stream configuration.
+   
+   This resolver allows dynamic configuration of components based on source config values.
+   It uses StreamConfig to locate stream configurations and ComponentMappingDefinition
+   to map those configurations to component fields.
+   
+   Example:
+       resolver = ConfigComponentsResolver(
+           stream_config=StreamConfig(configs_pointer=["streams"]),
+           components_mapping=[
+               ComponentMappingDefinition(
+                   field_path=["name"],
+                   value="{{ components_values['stream_name'] }}"
+               )
+           ]
+       )
+   """
    stream_config: StreamConfig
    components_mapping: List[ComponentMappingDefinition]
    parameters: Optional[Dict[str, Any]] = Field(None, alias="$parameters")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95370b2 and 079d32a.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (5 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (5 hunks)
  • unit_tests/sources/declarative/resolvers/test_config_components_resolver.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • unit_tests/sources/declarative/resolvers/test_config_components_resolver.py
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

1991-1991: LGTM! The addition of ConfigComponentsResolver enhances flexibility.

The change allows both HTTP and config-based component resolution. Let's verify that existing HTTP-based resolvers continue to work as expected.

✅ Verification successful

The HttpComponentsResolver integration looks solid and well-maintained

The codebase shows proper integration of HttpComponentsResolver across multiple files with clear dependency chains:

  • Defined as a concrete implementation of ComponentsResolver
  • Properly exported in __init__.py
  • Integrated into the component transformation and factory layers
  • Has established mapping definitions for retrievers and components

The addition of ConfigComponentsResolver as a Union type is a safe enhancement that maintains backward compatibility with existing HTTP-based resolvers.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for existing uses of HttpComponentsResolver to ensure
# backward compatibility

# Search for HttpComponentsResolver usage
rg -U 'HttpComponentsResolver' -g '!*test*' -g '!*schema*'

Length of output: 1876


1169-1172: LGTM! The addition of integer support in field paths enhances flexibility.

The change allows accessing array elements by index, which is a useful enhancement. Let's verify that the integer indices are properly handled in the implementation.

✅ Verification successful

The previous search might have failed because the regex pattern was too specific. Let's try a broader search to understand the field path implementation and integer handling.


Let's look at the implementation to verify how integer indices are handled in field paths.


The previous patterns didn't match. Let's try to find the implementation and tests with broader patterns to understand how field paths and integer indices are handled.


We found the DpathExtractor class but need to see its implementation and tests to understand how integer indices are handled. Let's look at those specifically.


LGTM! The addition of integer support in field paths looks safe and well-designed.

Looking at the test file, the DpathExtractor already handles various path expressions including array access with wildcards (*). The addition of integer indices is a natural extension that follows the same pattern. The test cases show that the extractor correctly handles:

  • Nested paths with array access
  • Various data structures including nested arrays and objects
  • Different types of field path expressions

The change is backward compatible since existing usages with string paths and wildcards continue to work as before. What do you think about adding a test case specifically for integer indices to make the new functionality more explicit? 🤔

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if there are any existing usages of integer indices in field paths
# to ensure backward compatibility

# Search for field_path usage with integer indices
rg -U 'field_path.*\[.*\d+.*\]' -g '!*test*'

Length of output: 46


Script:

#!/bin/bash
# Let's check for field path related implementations to understand how integers are handled
ast-grep --pattern 'class $_ {
  $$$
  field_path($$$) {
    $$$
  }
  $$$
}'

# Also search for any field path related code
rg -U "field_path.*\[" -A 5 -B 5

Length of output: 103740


Script:

#!/bin/bash
# Let's check the DpathExtractor implementation to understand how field paths are handled
ast-grep --pattern 'class DpathExtractor {
  $$$
  def get_records($$$) {
    $$$
  }
  $$$
}'

# Also search for any tests that verify integer index handling
rg -U "test.*field_path.*\d+" -A 5 -B 5

Length of output: 138


Script:

#!/bin/bash
# Let's try a different pattern to find the DpathExtractor implementation
rg -U "class DpathExtractor" -A 20 -B 5

# Also look for test files related to DpathExtractor
rg -U "test.*DpathExtractor.*field_path" -A 10 -B 10

Length of output: 5355


Script:

#!/bin/bash
# Let's look at the implementation of get_records in DpathExtractor
rg -U "def get_records" -A 20 -B 5 airbyte_cdk/sources/declarative/extractors/dpath_extractor.py

# And check for any test files specifically for DpathExtractor
fd "test.*dpath.*\.py" -x cat {}

Length of output: 4549

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for addressing those concerns

Copy link
Contributor

@aldogonzalez8 aldogonzalez8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, changes make sense to me.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

1220-1229: LGTM! Consider enhancing the experimental warning with usage guidance.

The StreamConfig class looks well-designed. Since it's marked experimental, would it be helpful to add a note about potential API changes and migration paths in the description? Something like:

-    description: (This component is experimental. Use at your own risk.) Describes how to get streams config from the source config.
+    description: (This component is experimental. Use at your own risk. The API may change in future versions.) Describes how to get streams config from the source config. Consider using HttpComponentsResolver for production use cases until this component is stable.

wdyt?


1231-1236: LGTM! The integration looks clean. Consider enhancing the experimental warning.

The ConfigComponentsResolver and its integration with DynamicDeclarativeStream look good. Since this is an experimental feature, would it be helpful to add more context about when to use this vs HttpComponentsResolver? Maybe something like:

-    description: (This component is experimental. Use at your own risk.) Resolves and populates stream templates with components fetched from the source config.
+    description: (This component is experimental. Use at your own risk.) Resolves and populates stream templates with components fetched from the source config. Use this when stream configurations are defined in the source config rather than retrieved via HTTP. For HTTP-based configurations, use HttpComponentsResolver instead.

wdyt?

Also applies to: 2033-2033

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 079d32a and f118a88.

📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (5 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3 hunks)
  • airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py
  • airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

1195-1198: LGTM! The field_path examples nicely demonstrate the enhanced capabilities.

The new examples effectively show how to:

  • Use integer indices for array access: ["data", 1, "name"]
  • Use wildcards for flexible path matching: ["*", "**", "name"]
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

Line range hint 1-3500: Skipping review of generated file.

This file is auto-generated by datamodel-codegen from the Python models, so it doesn't need direct review.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (3)

27-27: Consider enriching the test config

The _CONFIG looks good for basic testing, but should we add more fields that might be commonly used in real scenarios (like api_key that's referenced in the manifest)? wdyt? 🤔

 _CONFIG = {
     "start_date": "2024-07-01T00:00:00.000Z",
+    "api_key": "test_key",
+    "base_url": "https://api.test.com"
 }

114-198: Consider reducing duplication in test manifests

I notice _MANIFEST_WITH_DUPLICATES is largely identical to _MANIFEST except for the path value. Maybe we could reduce duplication by creating a helper function to generate manifests with different paths? wdyt? 🤔

def create_test_manifest(path: str = "items") -> dict:
    """Create a test manifest with the given path."""
    return {
        "version": "6.7.0",
        # ... rest of the common manifest structure
        "components_resolver": {
            "type": "HttpComponentsResolver",
            "retriever": {
                # ... common retriever config
                "requester": {
                    # ... common requester config
                    "path": path,
                },
            },
        },
    }

# Usage
_MANIFEST = create_test_manifest()
_MANIFEST_WITH_DUPLICATES = create_test_manifest("duplicates_items")

Line range hint 237-280: Consider enhancing test assertions

The test looks good with the improved name and proper config usage! A few suggestions to make it even more robust:

  1. Should we assert the actual response data structure, not just the stream names?
  2. Maybe add assertions for the error cases too?

wdyt about something like this? 🤔

def test_dynamic_streams_read_with_http_components_resolver():
    expected_streams = [
        {"name": "item_1", "id": "1"},
        {"name": "item_2", "id": "2"}
    ]
    # ... existing setup code ...
    
    # Assert catalog structure
    assert len(actual_catalog.streams) == len(expected_streams)
    for stream, expected in zip(actual_catalog.streams, expected_streams):
        assert stream.name == expected["name"]
        assert stream.json_schema is not None  # Verify schema exists
    
    # Assert record data
    assert len(records) == len(expected_streams)
    for record, expected in zip(records, expected_streams):
        assert record.stream == expected["name"]
        assert record.data["id"] == expected["id"]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba7b95e and b867154.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (5 hunks)
🔇 Additional comments (1)
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1)

24-24: Heads up: New import suggests missing test case

I notice we're importing AirbyteTracedException, but I don't see a test case that uses it yet. Should we add a test for the duplicate streams error case? wdyt? 🤔

Would you like me to help draft a test case for handling duplicate stream errors?

@lazebnyi lazebnyi force-pushed the lazebnyi/add-config-components-resolver branch from b867154 to ba7b95e Compare December 12, 2024 17:41
@lazebnyi lazebnyi merged commit ceebfda into main Dec 12, 2024
31 of 35 checks passed
@lazebnyi lazebnyi deleted the lazebnyi/add-config-components-resolver branch December 12, 2024 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants