Add SDK config #1232

skrawcz · 2024-11-17T01:43:11Z

This adds a few options that make it easier to
customize what is captured via the SDK:

1. Whether to capture any data statistics at all.
2. Max lengths of captured dicts and lists, so they don't get too big.
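
For illustration, here is a minimal sketch of what capping captured lists and dicts could look like. The helper name `truncate_for_capture` and the default values are assumptions made for this example, not code from this PR; only the constant names `MAX_LIST_LENGTH_CAPTURE` and `MAX_DICT_LENGTH_CAPTURE` appear in this thread.

```python
from itertools import islice

# Constant names appear in this PR; the values here are illustrative guesses.
MAX_LIST_LENGTH_CAPTURE = 50
MAX_DICT_LENGTH_CAPTURE = 100


def truncate_for_capture(
    value,
    max_list=MAX_LIST_LENGTH_CAPTURE,
    max_dict=MAX_DICT_LENGTH_CAPTURE,
):
    """Hypothetical helper: cap lists and dicts before they are logged."""
    if isinstance(value, dict):
        # Keep only the first `max_dict` items, in insertion order.
        return dict(islice(value.items(), max_dict))
    if isinstance(value, (list, tuple)):
        # Keep only the first `max_list` elements.
        return list(value[:max_list])
    # Any other type passes through unchanged.
    return value
```

Passing the limits as arguments keeps the helper easy to test independently of however the constants end up being configured.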

You can configure this via three means:

1. Python constants
2. config file variables
3. environment variables

There is a precedence order. Default values live in the module.
You can override them via config file variables, which can in
turn be overridden by environment variables. Lastly, the user can
always modify the constants by directly changing the module variables.
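
The precedence order described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the function name `resolve`, the section name `SDK_CONSTANTS`, and the generic `HAMILTON_` prefix are hypothetical (the thread only mentions `HAMILTON_CAPTURE_DATA_STATISTICS` by name).

```python
import configparser


def resolve(defaults, config_text, environ,
            section="SDK_CONSTANTS", prefix="HAMILTON_"):
    """Layer config-file values, then environment variables, over defaults."""
    resolved = dict(defaults)  # lowest precedence: module defaults

    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)
    if parser.has_section(section):
        for key in resolved:
            if parser.has_option(section, key):
                resolved[key] = parser.get(section, key)  # file beats defaults

    for key in resolved:
        env_value = environ.get(prefix + key)
        if env_value is not None:
            resolved[key] = env_value  # environment beats the file

    return resolved
```

Values read from the file or the environment stay strings in this sketch. In practice one would pass `os.environ` as `environ`; taking it as a parameter just makes the layering easy to exercise in a test.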

Note: we also skip logging metadata from data savers and loaders when
CAPTURE_DATA_STATISTICS = False. We can fix this by special-casing
it, but for now I don't think people will complain about the
current functionality.

We should still enable #921 but I think this is a simpler route for now.

Changes

telemetry.py
sdk related files

How I tested this

locally

Notes

This is related to:

Enable configuration/tags to turn off data introspection in SDK #921: this PR still leaves that on the table, but takes a different implementation route.
Hamilton UI - Configuration menu for clean up and tracked execution infos #1228: this would give people an option to not log too much.

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

ellipsis-dev

❌ Changes requested. Reviewed everything up to fc6093a in 44 seconds

More details

Looked at 255 lines of code in 6 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. ui/sdk/src/hamilton_sdk/tracking/constants.py:28

Draft comment:
The _load_config function is repeated in multiple files. Consider refactoring to maintain a single implementation to adhere to the DRY principle.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is about a potential DRY violation, which is a valid concern. However, without evidence from the diff that the function is repeated, the comment is speculative. The comment should be removed unless there is strong evidence within the diff itself.
I might be missing the context from other files where the function is repeated. However, the rules state to focus only on the file in the diff.
The rules are clear about focusing on the file in the diff, so without evidence in this file, the comment should be removed.
Remove the comment as it is speculative and not supported by evidence in the diff.

Workflow ID: wflow_9ePQ1BbWNVunb0N6

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ui/sdk/src/hamilton_sdk/tracking/constants.py

ui/sdk/src/hamilton_sdk/tracking/data_observation.py

ellipsis-dev

👍 Looks good to me! Incremental review on 76606ee in 3 seconds

More details

Looked at 17 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. tests/plugins/test_plotly_extensions.py:19

Draft comment:
Consider adding a comment explaining why setting BROWSER_PATH to an empty string is necessary for PlotlyStaticWriter to work. This will help future developers understand the context.
Reason this comment was not posted:
Confidence changes required: 50%
The environment variable 'BROWSER_PATH' is set to an empty string to make PlotlyStaticWriter work. This should be documented or explained in the code comments for clarity.

2. tests/plugins/test_plotly_extensions.py:19

Draft comment:
Avoid setting environment variables like BROWSER_PATH directly in the code without proper documentation or explanation, as it might be considered sensitive or unnecessary. Consider adding a comment explaining why this is safe or necessary.
Reason this comment was not posted:
Confidence changes required: 50%
The code sets an environment variable BROWSER_PATH to an empty string, which might be considered sensitive or unnecessary if not properly documented. However, since this is a test file, it might be acceptable, but a comment explaining why this is safe or necessary would be helpful.

Workflow ID: wflow_HyUqsLgX9Vo6blrB


ellipsis-dev

❌ Changes requested. Incremental review on 09491de in 16 seconds

More details

Looked at 74 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. ui/sdk/tests/tracking/test_constants.py:5

Draft comment:
Consider adding a test case for an empty string to ensure _convert_to_type handles it correctly.
Reason this comment was not posted:
Confidence changes required: 50%
The test function is well-structured and covers various cases for the _convert_to_type function. However, it would be beneficial to add a test case for an empty string to ensure the function handles it correctly.

2. ui/sdk/src/hamilton_sdk/tracking/constants.py:74

Draft comment:
Consider using _convert_to_type for CAPTURE_DATA_STATISTICS to ensure consistent type conversion logic.

CAPTURE_DATA_STATISTICS = _convert_to_type(os.getenv("HAMILTON_CAPTURE_DATA_STATISTICS", CAPTURE_DATA_STATISTICS))

Reason this comment was not posted:
Confidence changes required: 50%
The current implementation of environment variable handling for CAPTURE_DATA_STATISTICS is not consistent with the _convert_to_type function. It would be better to use _convert_to_type for consistency and to handle potential future changes in type conversion logic.

3. ui/sdk/tests/tracking/test_constants.py:5

Draft comment:
Function name test__convert_to_type should follow a consistent naming pattern. Consider renaming it to something more descriptive and consistent with other test function names.
Reason this comment was not posted:
Confidence changes required: 50%
The function _convert_to_type is used to convert string values to their appropriate types. This function is tested in test__convert_to_type, which is good. However, the function name test__convert_to_type does not follow a consistent naming pattern with the rest of the codebase. It should be more descriptive and consistent with other test function names.
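
The `_convert_to_type` helper these draft comments refer to is not shown in this thread. As a rough sketch of the kind of conversion described (config strings to booleans, ints, and floats), assuming a hypothetical stand-in named `convert_to_type`:

```python
def convert_to_type(value: str):
    """Hypothetical stand-in: best-effort conversion of a config string."""
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    # Try the narrower numeric type first, then fall back to float.
    for caster in (int, float):
        try:
            return caster(value)
        except ValueError:
            continue
    # Anything else, including the empty string, is returned unchanged.
    return value
```

In this sketch an empty string falls through both numeric casts and comes back as an empty string, which is the edge case the first draft comment suggests testing.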

Workflow ID: wflow_PvxHwUKjsFDIIWzr


ui/sdk/src/hamilton_sdk/tracking/constants.py

ellipsis-dev

❌ Changes requested. Incremental review on 1decd00 in 12 seconds

More details

Looked at 13 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. tests/plugins/test_plotly_extensions.py:19

Draft comment:
Avoid hardcoding sensitive paths like BROWSER_PATH. Consider using a configuration file or environment variable.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_PplrWqit4WSxXY3S


tests/plugins/test_plotly_extensions.py

ellipsis-dev

❌ Changes requested. Incremental review on f890278 in 15 seconds

More details

Looked at 35 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. ui/sdk/src/hamilton_sdk/tracking/constants.py:45

Draft comment:
Consider logging the exception in the _load_config function to help with debugging if the config file cannot be read.
Reason this comment was not posted:
Confidence changes required: 50%
The code is mostly well-structured, but there are some areas for improvement.

2. ui/sdk/src/hamilton_sdk/tracking/constants.py:48

Draft comment:
The function _convert_to_type is well-implemented, but consider adding type hints for better clarity and maintainability.
Reason this comment was not posted:
Confidence changes required: 30%
The code is mostly well-structured, but there are some areas for improvement.

Workflow ID: wflow_w0enUsxTnfM6Eiju


ui/sdk/src/hamilton_sdk/tracking/constants.py

elijahbenizzy

Looking reasonable, let's:

1. Simplify the way we use globals
2. Add docs

Will take another quick look after docs, then let's merge.

Also, some features we'll want down the road:

1. Turn on/off per-node/type/attribute name
2. Add a max-payload size, where it gets dropped afterwards

Although (2) is a bit more complicated.

These should be in the docs IMO as next steps.

tests/plugins/test_plotly_extensions.py

elijahbenizzy · 2024-11-17T23:46:41Z

ui/sdk/src/hamilton_sdk/tracking/constants.py

+    return config
+
+
+_constant_values = globals()


I feel like you should just hardcode this to a dict then we don't have to deal with globals. E.G.

constants_values={"CAPTURE_DATA_STATISTICS" : True, ...}

Then you just load up the file and modify that. Should simplify the code.

skrawcz · Nov 18, 2024

No, I don't like that because it would then encumber directly accessing the module values; or I could do something like this at the end to push them all into the module space... either way we're playing with globals.

elijahbenizzy · Nov 18, 2024

This has a single global versus multiple? I don't see why we want to directly access module values.
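
To make the trade-off in this exchange concrete, here is a small sketch of keeping the values in one dict and then pushing them into a module's namespace so they remain reachable as attributes. Everything here is hypothetical: the module name `sdk_constants_sketch`, the `publish` helper, and the default values are assumptions for illustration only.

```python
import types

# Hypothetical stand-in for a constants module.
constants = types.ModuleType("sdk_constants_sketch")

# Single source of truth, as in the dict-based suggestion (values are guesses).
DEFAULTS = {
    "CAPTURE_DATA_STATISTICS": True,
    "MAX_LIST_LENGTH_CAPTURE": 50,
    "MAX_DICT_LENGTH_CAPTURE": 100,
}


def publish(module, values):
    """Copy every key/value pair onto the module as an attribute."""
    for name, value in values.items():
        setattr(module, name, value)


values = dict(DEFAULTS)
values["MAX_LIST_LENGTH_CAPTURE"] = 10  # e.g. an override loaded from a file
publish(constants, values)
```

Inside a real module the equivalent of `publish` would be `globals().update(values)`, which is the "playing with globals" both sides of the thread are weighing.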


ellipsis-dev

👍 Looks good to me! Incremental review on ff484f5 in 3 seconds

More details

Looked at 100 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. docs/hamilton-ui/ui.rst:308

Draft comment:
Typo: 'prescedence' should be 'precedence'.

precedence

Reason this comment was not posted:
Confidence changes required: 10%
The documentation is clear and provides detailed instructions on how to configure the SDK. However, there is a minor typo in the word 'precedence'.

2. docs/hamilton-ui/ui.rst:239

Draft comment:
This section on SDK Configuration is a valuable addition to the documentation. It provides clear instructions on how to configure the HamiltonTracker, which is beneficial for users. Ensure this is included in the Sphinx documentation.
Reason this comment was not posted:
Confidence changes required: 0%
The documentation added in this PR is comprehensive and provides clear instructions on SDK configuration. It is appropriate to include this in the Sphinx documentation as it enhances user understanding of configuration options.

Workflow ID: wflow_C3u8h0DLm7G50ZOO


ellipsis-dev

👍 Looks good to me! Incremental review on 25ace32 in 5 seconds

More details

Looked at 28 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. docs/hamilton-ui/ui.rst:278

Draft comment:
Add a newline before the code block for better readability.


.. code-block:: bash

Reason this comment was not posted:
Confidence changes required: 10%
The code block formatting is inconsistent in the added sections. Adding a newline before the code block can improve readability.

2. docs/hamilton-ui/ui.rst:306

Draft comment:
Consider adding a section in the Sphinx documentation under docs/ to explain the new SDK configuration options and how to use them. This will help users understand the new features and how to configure them effectively.
Reason this comment was not posted:
Confidence changes required: 50%
The documentation update is appropriate for the changes made in the PR. It explains how to configure the SDK using different methods, which aligns with the changes made in the code.

Workflow ID: wflow_iEZJjJDA7dtHg5xt


ellipsis-dev

👍 Looks good to me! Incremental review on c9a855b in 3 seconds

More details

Looked at 64 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. docs/hamilton-ui/ui.rst:308

Draft comment:
Add a blank line before the code block for better readability.
Reason this comment was not posted:
Confidence changes required: 10%
The documentation is clear and provides detailed instructions on how to change the configuration settings. However, there is a minor formatting issue in the code block for changing defaults via environment variables and directly. Adding a blank line before the code block would improve readability.

2. docs/hamilton-ui/ui.rst:241

Draft comment:
Consider adding a section here to document the new configuration options for CAPTURE_DATA_STATISTICS, MAX_LIST_LENGTH_CAPTURE, and MAX_DICT_LENGTH_CAPTURE as they are part of the new SDK configuration options.
Reason this comment was not posted:
Confidence changes required: 50%
The documentation update is appropriate for the changes made in the PR. It explains the new configuration options and how to use them.

Workflow ID: wflow_Qu8TXAszeDsvqkEo


ellipsis-dev

👍 Looks good to me! Incremental review on 498068e in 2 seconds

More details

Looked at 93 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. docs/hamilton-ui/ui.rst:280

Draft comment:
The table alignment directive :align: left is not necessary and can be removed for cleaner formatting.
Reason this comment was not posted:
Confidence changes required: 10%
The documentation is clear and well-structured, but there is a minor formatting issue with the table alignment.

2. docs/hamilton-ui/ui.rst:241

Draft comment:
Consider adding a reference to this section in the main documentation index to improve discoverability.
Reason this comment was not posted:
Confidence changes required: 20%
The documentation update is appropriate and provides clear instructions on SDK configuration. It aligns with the changes made in the code.

Workflow ID: wflow_sZbYU0azgyFHoNxt


elijahbenizzy

This is fine but I don't see the need to change module-level variables for configuration? Just have a .set() function that takes in the values? I guess it's nice for auto-complete, but it should work with a dict anyway.

ellipsis-dev

❌ Changes requested. Incremental review on 3a6a5c1 in 34 seconds

More details

Looked at 27 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. ui/sdk/src/hamilton_sdk/tracking/utils.py:50

Draft comment:
Consider adding a comment here to explain that the .head() method is used to truncate dataframes to their head rows, as indicated in the test file.
Reason this comment was not posted:
Confidence changes required: 50%
The comment in the test file indicates that the .head() method is being used to truncate data, but this is not documented in the main code. A comment should be added to the main code to explain this behavior.

Workflow ID: wflow_BZhQXixtVXdUlOXL


ui/sdk/src/hamilton_sdk/tracking/utils.py

ellipsis-dev

❌ Changes requested. Incremental review on c7fe7c0 in 26 seconds

More details

Looked at 27 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. ui/sdk/tests/tracking/test_utils.py:139

Draft comment:
The test does not account for the .head() call in make_json_safe. Update the expected result to match the truncated series output.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The test case already has a comment explaining the skipped element due to .head(), and the expected result matches this explanation. The automated comment seems redundant as the change is already reflected in the test case.
I might be missing some context about the make_json_safe function's behavior, but based on the test case, it seems the expected result is already correct.
The test case's comment and expected result align, indicating the change is already accounted for. The automated comment does not provide new information.
The automated comment is unnecessary as the test case already accounts for the .head() call, and the expected result is correct.

Workflow ID: wflow_9QpLRVxNl8Dwtkft


ui/sdk/src/hamilton_sdk/tracking/utils.py

This adds a few things to enable one to more easily customize what is captured via the SDK, and adds docs.

1. Should you capture any data statistics at all?
2. Max lengths of dicts and lists -> so they don't get too big.

You can configure this via three means:

1. python constants
2. config file variables
3. environment variables

There is a precedence order. So there are default values in the module. You can then override them via config file variables. Which then can in turn be overridden by environment variables. Lastly the user can always modify the constants by directly changing the module variables.

Note: we also skip logging metadata from datasavers and loaders if the CAPTURE_DATA_STATISTICS = False. We can fix this by special casing it, but for now I don't think people would complain about the current functionality.

We use globals and modify state of the constants to help ensure a great developer experience / static analysis is able to find where people are using the constants easily.

Squashed commits:

Tweaks docs on sdk constants (+4 squashed commits)

Squashed commits:

[ff484f5] Adds some docs on the SDK configurability

[80c24fa] minor refactor of constants function

[69d12f3] Adds function to convert config types

Config parser types are by default strings. They aren't converted. This adds a function to do that for ints, floats, and booleans.

[0a4949e] Adds configurable capture for SDK

This adds a few things to enable one to more easily customize what is captured via the SDK.

1. Should you capture any data statistics at all?
2. Max lengths of dicts and lists -> so they don't get too big.

You can configure this via three means:

1. python constants
2. config file variables
3. environment variables

There is a precedence order. So there are default values in the module. You can then override them via config file variables. Which then can in turn be overridden by environment variables. Lastly the user can always modify the constants by directly changing the module variables.

Note: we also skip logging metadata from datasavers and loaders if the CAPTURE_DATA_STATISTICS = False. We can fix this by special casing it, but for now I don't think people would complain about the current functionality.


ellipsis-dev bot reviewed Nov 17, 2024

ui/sdk/src/hamilton_sdk/tracking/constants.py

ui/sdk/src/hamilton_sdk/tracking/data_observation.py

ui/sdk/src/hamilton_sdk/tracking/data_observation.py

ellipsis-dev bot reviewed Nov 17, 2024


skrawcz requested a review from elijahbenizzy November 17, 2024 08:14

ellipsis-dev bot reviewed Nov 17, 2024

ui/sdk/src/hamilton_sdk/tracking/constants.py (outdated)

ellipsis-dev bot reviewed Nov 17, 2024

tests/plugins/test_plotly_extensions.py (outdated)

ellipsis-dev bot reviewed Nov 17, 2024

ui/sdk/src/hamilton_sdk/tracking/constants.py

elijahbenizzy reviewed Nov 17, 2024


Adds environment variable to change hamilton config location

177983c

This is for flexibility in determining where to look for the config file.

skrawcz force-pushed the add_sdk_config branch from f890278 to 80c24fa Compare November 18, 2024 04:29

ellipsis-dev bot reviewed Nov 18, 2024


skrawcz force-pushed the add_sdk_config branch from 25ace32 to c9a855b Compare November 18, 2024 05:24

ellipsis-dev bot reviewed Nov 18, 2024


skrawcz force-pushed the add_sdk_config branch from c9a855b to 498068e Compare November 18, 2024 05:27

ellipsis-dev bot reviewed Nov 18, 2024


elijahbenizzy approved these changes Nov 18, 2024


skrawcz mentioned this pull request Nov 18, 2024

Hamilton UI - Configuration menu for clean up and tracked execution infos #1228

Open

ellipsis-dev bot reviewed Nov 19, 2024

ui/sdk/src/hamilton_sdk/tracking/utils.py (outdated)

ui/sdk/src/hamilton_sdk/tracking/utils.py (outdated)

skrawcz force-pushed the add_sdk_config branch from 3a6a5c1 to c7fe7c0 Compare November 19, 2024 17:24

ellipsis-dev bot reviewed Nov 19, 2024

ui/sdk/src/hamilton_sdk/tracking/utils.py

skrawcz added 2 commits November 19, 2024 16:57

Fixes SDK large DF input issue

4088955

This should ideally use the same data observability introspection code path. It doesn't. So this is a stop gap measure to log the head rows of a dataframe if you use one as input.
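
The stop-gap described in this commit message could be sketched as below. This is an assumption about the general shape, not the code in `utils.py`: `preview_input` is a hypothetical name, and `FakeFrame` is a tiny stand-in so the example runs without pandas (a pandas DataFrame exposes a compatible `head(n)` method).

```python
def preview_input(value, max_rows=5):
    """Hypothetical helper: log only the head rows of frame-like inputs."""
    head = getattr(value, "head", None)
    if callable(head):
        # Frame-like object: keep just the first `max_rows` rows.
        return head(max_rows)
    # Everything else is passed through untouched.
    return value


class FakeFrame:
    """Minimal stand-in for a dataframe, used only for this illustration."""

    def __init__(self, rows):
        self.rows = rows

    def head(self, n=5):
        return FakeFrame(self.rows[:n])


small = preview_input(FakeFrame(list(range(100))))
```

Duck-typing on `head` avoids importing pandas in the tracking path, though it would also match any unrelated object that happens to define a `head` method.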

skrawcz force-pushed the add_sdk_config branch from c7fe7c0 to 4088955 Compare November 20, 2024 00:57

skrawcz merged commit 7026038 into main Nov 20, 2024
27 checks passed

skrawcz deleted the add_sdk_config branch November 20, 2024 01:11
