Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SDK config #1232

Merged
merged 3 commits into from
Nov 20, 2024
Merged

Add SDK config #1232

merged 3 commits into from
Nov 20, 2024

Conversation

skrawcz
Copy link
Collaborator

@skrawcz skrawcz commented Nov 17, 2024

This adds a few things to enable one to more
easily customize what is captured via the SDK.

  1. Should you capture any data statistics at all?
  2. Max lengths of dicts and lists -> so they don't get too big.

You can configure this via three means:

  1. python constants
  2. config file variables
  3. environment variables

There is a prescedence order. So there are default values in the module.
You can then override them via config file variables.
Which then can in turn be overriden by environment variables.
Lastly the user can always modify the constants by directly
changing the module variables.

Note: we also skip logging metadata from datasavers and loaders if
the CAPTURE_DATA_STATISTICS = False. We can fix this by special
casing it, but for now I don't think people would complain with
the current functionality.

We should still enable #921 but I think this is a simpler route for now.

Changes

  • telemetry.py
  • sdk related files

How I tested this

  • locally

Notes

This is related to:

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to fc6093a in 44 seconds

More details
  • Looked at 255 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. ui/sdk/src/hamilton_sdk/tracking/constants.py:28
  • Draft comment:
    The _load_config function is repeated in multiple files. Consider refactoring to maintain a single implementation to adhere to the DRY principle.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is about a potential DRY violation, which is a valid concern. However, without evidence from the diff that the function is repeated, the comment is speculative. The comment should be removed unless there is strong evidence within the diff itself.
    I might be missing the context from other files where the function is repeated. However, the rules state to focus only on the file in the diff.
    The rules are clear about focusing on the file in the diff, so without evidence in this file, the comment should be removed.
    Remove the comment as it is speculative and not supported by evidence in the diff.

Workflow ID: wflow_9ePQ1BbWNVunb0N6


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 76606ee in 3 seconds

More details
  • Looked at 17 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. tests/plugins/test_plotly_extensions.py:19
  • Draft comment:
    Consider adding a comment explaining why setting BROWSER_PATH to an empty string is necessary for PlotlyStaticWriter to work. This will help future developers understand the context.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The environment variable 'BROWSER_PATH' is set to an empty string to make PlotlyStaticWriter work. This should be documented or explained in the code comments for clarity.
2. tests/plugins/test_plotly_extensions.py:19
  • Draft comment:
    Avoid setting environment variables like BROWSER_PATH directly in the code without proper documentation or explanation, as it might be considered sensitive or unnecessary. Consider adding a comment explaining why this is safe or necessary.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code sets an environment variable BROWSER_PATH to an empty string, which might be considered sensitive or unnecessary if not properly documented. However, since this is a test file, it might be acceptable, but a comment explaining why this is safe or necessary would be helpful.

Workflow ID: wflow_HyUqsLgX9Vo6blrB


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 09491de in 16 seconds

More details
  • Looked at 74 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. ui/sdk/tests/tracking/test_constants.py:5
  • Draft comment:
    Consider adding a test case for an empty string to ensure _convert_to_type handles it correctly.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The test function is well-structured and covers various cases for the _convert_to_type function. However, it would be beneficial to add a test case for an empty string to ensure the function handles it correctly.
2. ui/sdk/src/hamilton_sdk/tracking/constants.py:74
  • Draft comment:
    Consider using _convert_to_type for CAPTURE_DATA_STATISTICS to ensure consistent type conversion logic.
CAPTURE_DATA_STATISTICS = _convert_to_type(os.getenv("HAMILTON_CAPTURE_DATA_STATISTICS", CAPTURE_DATA_STATISTICS))
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The current implementation of environment variable handling for CAPTURE_DATA_STATISTICS is not consistent with the _convert_to_type function. It would be better to use _convert_to_type for consistency and to handle potential future changes in type conversion logic.
3. ui/sdk/tests/tracking/test_constants.py:5
  • Draft comment:
    Function name test__convert_to_type should follow a consistent naming pattern. Consider renaming it to something more descriptive and consistent with other test function names.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The function _convert_to_type is used to convert string values to their appropriate types. This function is tested in test__convert_to_type, which is good. However, the function name test__convert_to_type does not follow a consistent naming pattern with the rest of the codebase. It should be more descriptive and consistent with other test function names.

Workflow ID: wflow_PvxHwUKjsFDIIWzr


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ui/sdk/src/hamilton_sdk/tracking/constants.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 1decd00 in 12 seconds

More details
  • Looked at 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. tests/plugins/test_plotly_extensions.py:19
  • Draft comment:
    Avoid hardcoding sensitive paths like BROWSER_PATH. Consider using a configuration file or environment variable.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_PplrWqit4WSxXY3S


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

tests/plugins/test_plotly_extensions.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on f890278 in 15 seconds

More details
  • Looked at 35 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. ui/sdk/src/hamilton_sdk/tracking/constants.py:45
  • Draft comment:
    Consider logging the exception in the _load_config function to help with debugging if the config file cannot be read.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code is mostly well-structured, but there are some areas for improvement.
2. ui/sdk/src/hamilton_sdk/tracking/constants.py:48
  • Draft comment:
    The function _convert_to_type is well-implemented, but consider adding type hints for better clarity and maintainability.
  • Reason this comment was not posted:
    Confidence changes required: 30%
    The code is mostly well-structured, but there are some areas for improvement.

Workflow ID: wflow_w0enUsxTnfM6Eiju


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ui/sdk/src/hamilton_sdk/tracking/constants.py Show resolved Hide resolved
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking reasonable, let's:

  1. SImplify the way we use globals
  2. Add docs

Will take another quick look after docs then let's merge.

Also, some features we'll want down the road:

  1. Turn on/off per-node/type/attribute name
  2. Add a max-payload size, where it gets dropped afterwards

Although (2) is a bit more complicated.

These should be in the docs IMO as next steps.

tests/plugins/test_plotly_extensions.py Outdated Show resolved Hide resolved
return config


_constant_values = globals()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like you should just hardcode this to a dict then we don't have to deal with globals. E.G.

constants_values={"CAPTURE_DATA_STATISTICS" : True, ...}

Then you just load up the file and modify that. Should simplify the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I don't like that because that would then encumber directly accessing the module values; or I could do something like this at the end to push them all into the module space... either way playing with globals.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a single global versus multiple? Don't see why we want to directly access module values.

This is for flexibility in determining where to look for the
config file.
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on ff484f5 in 3 seconds

More details
  • Looked at 100 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. docs/hamilton-ui/ui.rst:308
  • Draft comment:
    Typo: 'prescedence' should be 'precedence'.
precedence
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The documentation is clear and provides detailed instructions on how to configure the SDK. However, there is a minor typo in the word 'precedence'.
2. docs/hamilton-ui/ui.rst:239
  • Draft comment:
    This section on SDK Configuration is a valuable addition to the documentation. It provides clear instructions on how to configure the HamiltonTracker, which is beneficial for users. Ensure this is included in the Sphinx documentation.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The documentation added in this PR is comprehensive and provides clear instructions on SDK configuration. It is appropriate to include this in the Sphinx documentation as it enhances user understanding of configuration options.

Workflow ID: wflow_C3u8h0DLm7G50ZOO


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 25ace32 in 5 seconds

More details
  • Looked at 28 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. docs/hamilton-ui/ui.rst:278
  • Draft comment:
    Add a newline before the code block for better readability.

.. code-block:: bash
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The code block formatting is inconsistent in the added sections. Adding a newline before the code block can improve readability.
2. docs/hamilton-ui/ui.rst:306
  • Draft comment:
    Consider adding a section in the Sphinx documentation under docs/ to explain the new SDK configuration options and how to use them. This will help users understand the new features and how to configure them effectively.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The documentation update is appropriate for the changes made in the PR. It explains how to configure the SDK using different methods, which aligns with the changes made in the code.

Workflow ID: wflow_iEZJjJDA7dtHg5xt


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on c9a855b in 3 seconds

More details
  • Looked at 64 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. docs/hamilton-ui/ui.rst:308
  • Draft comment:
    Add a blank line before the code block for better readability.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The documentation is clear and provides detailed instructions on how to change the configuration settings. However, there is a minor formatting issue in the code block for changing defaults via environment variables and directly. Adding a blank line before the code block would improve readability.
2. docs/hamilton-ui/ui.rst:241
  • Draft comment:
    Consider adding a section here to document the new configuration options for CAPTURE_DATA_STATISTICS, MAX_LIST_LENGTH_CAPTURE, and MAX_DICT_LENGTH_CAPTURE as they are part of the new SDK configuration options.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The documentation update is appropriate for the changes made in the PR. It explains the new configuration options and how to use them.

Workflow ID: wflow_Qu8TXAszeDsvqkEo


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 498068e in 2 seconds

More details
  • Looked at 93 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. docs/hamilton-ui/ui.rst:280
  • Draft comment:
    The table alignment directive :align: left is not necessary and can be removed for cleaner formatting.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The documentation is clear and well-structured, but there is a minor formatting issue with the table alignment.
2. docs/hamilton-ui/ui.rst:241
  • Draft comment:
    Consider adding a reference to this section in the main documentation index to improve discoverability.
  • Reason this comment was not posted:
    Confidence changes required: 20%
    The documentation update is appropriate and provides clear instructions on SDK configuration. It aligns with the changes made in the code.

Workflow ID: wflow_sZbYU0azgyFHoNxt


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine but I don't see the need to change module-level variables for configuration? Just have a .set() function that takes in the values? I guess it's nice for auto-complete, but it should work with a dict anyway.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 3a6a5c1 in 34 seconds

More details
  • Looked at 27 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. ui/sdk/src/hamilton_sdk/tracking/utils.py:50
  • Draft comment:
    Consider adding a comment here to explain that the .head() method is used to truncate dataframes to their head rows, as indicated in the test file.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The comment in the test file indicates that the .head() method is being used to truncate data, but this is not documented in the main code. A comment should be added to the main code to explain this behavior.

Workflow ID: wflow_BZhQXixtVXdUlOXL


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ui/sdk/src/hamilton_sdk/tracking/utils.py Outdated Show resolved Hide resolved
ui/sdk/src/hamilton_sdk/tracking/utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on c7fe7c0 in 26 seconds

More details
  • Looked at 27 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. ui/sdk/tests/tracking/test_utils.py:139
  • Draft comment:
    The test does not account for the .head() call in make_json_safe. Update the expected result to match the truncated series output.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The test case already has a comment explaining the skipped element due to .head(), and the expected result matches this explanation. The automated comment seems redundant as the change is already reflected in the test case.
    I might be missing some context about the make_json_safe function's behavior, but based on the test case, it seems the expected result is already correct.
    The test case's comment and expected result align, indicating the change is already accounted for. The automated comment does not provide new information.
    The automated comment is unnecessary as the test case already accounts for the .head() call, and the expected result is correct.

Workflow ID: wflow_9QpLRVxNl8Dwtkft


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ui/sdk/src/hamilton_sdk/tracking/utils.py Show resolved Hide resolved
This adds a few things to enable one to more
easily customize what is captured via the SDK, and
adds docs.

1. Should you capture any data statistics at all?
2. Max lengths of dicts and lists -> so they don't get too big.

You can configure this via three means:
1. python constants
2. config file variables
3. environment variables

There is a precedence order. So there are default values in the module.
You can then override them via config file variables.
Which then can in turn be overridden by environment variables.
Lastly the user can always modify the constants by directly
changing the module variables.

Note: we also skip logging metadata from datasavers and loaders if
the CAPTURE_DATA_STATISTICS = False. We can fix this by special
casing it, but for now I don't think people would complain with
the current functionality.

We use globals and modify state of the constants to help ensure
a great developer experience / static analysis is able to find
where people are using the constants easily.

Squashed commits:

Tweaks docs on sdk constants (+4 squashed commits)
Squashed commits:
[ff484f5] Adds some docs on the SDK configurability
[80c24fa] minor refactor of constants function
[69d12f3] Adds function to convert config types

Config parser types are by default strings. They aren't converted.
This adds a function to do that for ints, floats, and booleans.
[0a4949e] Adds configurable capture for SDK

This adds a few things to enable one to more
easily customize what is captured via the SDK.

1. Should you capture any data statistics at all?
2. Max lengths of dicts and lists -> so they don't get too big.

You can configure this via three means:
1. python constants
2. config file variables
3. environment variables

There is a prescedence order. So there are default values in the module.
You can then override them via config file variables.
Which then can in turn be overriden by environment variables.
Lastly the user can always modify the constants by directly
changing the module variables.

Note: we also skip logging metadata from datasavers and loaders if
the CAPTURE_DATA_STATISTICS = False. We can fix this by special
casing it, but for now I don't think people would complain with
the current functionality.
This should ideally use the same data observability introspection
code path. It doesn't. So this is a stop gap measure to log the
head rows of a dataframe if you use one as input.
@skrawcz skrawcz merged commit 7026038 into main Nov 20, 2024
27 checks passed
@skrawcz skrawcz deleted the add_sdk_config branch November 20, 2024 01:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants