Issue#54 SMTP authentication (#124)
* smtp authentication

* addressing comments after sprint review

* addressing comments after sprint review

* addressing comments after sprint review

* addressing comments after sprint review

* adding url links to docs

---------

Co-authored-by: Yevheniia Nikonchuk <[email protected]>
Jennikon and Yevheniia Nikonchuk authored Jan 24, 2025
1 parent 0c265d9 commit 9e93f15
Showing 21 changed files with 1,340 additions and 626 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -92,7 +92,8 @@ se_user_conf = {
#Below two params are optional and need to be enabled to pass the custom email body
#user_config.se_notifications_enable_custom_email_body: True,
#user_config.se_notifications_email_custom_body: "Custom statistics: 'product_id': {}",

#Below parameter is optional and needs to be enabled in case authentication is required to access the SMTP server.
#user_config.se_notifications_email_smtp_auth: True,
}
```
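For context, a minimal sketch of what an SMTP-authenticated notification config could look like, using only keys that appear in this change; the host, port, and addresses are placeholder values, and the password is shown inline for brevity (see docs/examples.md below for the Cerberus/Databricks secret variants):

```python
from spark_expectations.config.user_config import Constants as user_config

# Sketch only: placeholder host, port, and addresses
se_user_conf = {
    user_config.se_notifications_enable_email: True,
    user_config.se_notifications_email_smtp_host: "mailhost.example.com",
    user_config.se_notifications_email_smtp_port: 25,
    user_config.se_notifications_enable_smtp_server_auth: True,
    user_config.se_notifications_smtp_password: "your_password",
    user_config.se_notifications_email_from: "sender@example.com",
    user_config.se_notifications_email_to_other_mail_id: "receiver@example.com",
    user_config.se_notifications_email_subject: "spark expectations - data quality - notifications",
}
```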

18 changes: 16 additions & 2 deletions docs/bigquery.md
@@ -1,8 +1,8 @@
### Example - Write to BigQuery

Set up a SparkSession for BigQuery to test in your local environment. Configure accordingly for higher environments.
Refer to Examples in [base_setup.py](../spark_expectations/examples/base_setup.py) and
[delta.py](../spark_expectations/examples/sample_dq_bigquery.py)
Refer to Examples in [base_setup.py](https://github.com/Nike-Inc/spark-expectations/blob/main/spark_expectations/examples/base_setup.py) and
[bigquery.py](https://github.com/Nike-Inc/spark-expectations/blob/main/spark_expectations/examples/sample_dq_bigquery.py)

```python title="spark_session"
from pyspark.sql import SparkSession
@@ -55,13 +55,27 @@ se: SparkExpectations = SparkExpectations(
stats_streaming_options={user_config.se_enable_streaming: False}
)

# If the SMTP server requires authentication, the password can be passed directly in the user config or retrieved securely from a Cerberus or Databricks secret
smtp_creds_dict = {
user_config.secret_type: "cerberus",
user_config.cbs_url: "htpps://cerberus.example.com",
user_config.cbs_sdb_path: "",
user_config.cbs_smtp_password: "",
# user_config.secret_type: "databricks",
# user_config.dbx_workspace_url: "https://workspace.cloud.databricks.com",
# user_config.dbx_secret_scope: "your_secret_scope",
# user_config.dbx_smtp_password: "your_password",
}

# Commented fields are optional or required when notifications are enabled
user_conf = {
user_config.se_notifications_enable_email: False,
# user_config.se_notifications_enable_smtp_server_auth: False,
# user_config.se_notifications_enable_custom_email_body: True,
# user_config.se_notifications_email_smtp_host: "mailhost.com",
# user_config.se_notifications_email_smtp_port: 25,
# user_config.se_notifications_smtp_password: "your_password",
# user_config.se_notifications_smtp_creds_dict: smtp_creds_dict,
# user_config.se_notifications_email_from: "",
# user_config.se_notifications_email_to_other_mail_id: "",
# user_config.se_notifications_email_subject: "spark expectations - data quality - notifications",
19 changes: 17 additions & 2 deletions docs/delta.md
@@ -1,8 +1,8 @@
### Example - Write to Delta

Set up a SparkSession for Delta Lake to test in your local environment. Configure accordingly for higher environments.
Refer to Examples in [base_setup.py](../spark_expectations/examples/base_setup.py) and
[delta.py](../spark_expectations/examples/sample_dq_delta.py)
Refer to Examples in [base_setup.py](https://github.com/Nike-Inc/spark-expectations/blob/main/spark_expectations/examples/base_setup.py) and
[delta.py](https://github.com/Nike-Inc/spark-expectations/blob/main/spark_expectations/examples/sample_dq_delta.py)

```python title="spark_session"
from pyspark.sql import SparkSession
@@ -46,12 +46,27 @@ se: SparkExpectations = SparkExpectations(
stats_streaming_options={user_config.se_enable_streaming: False}
)

# If the SMTP server requires authentication, the password can be passed directly in the user config or retrieved securely from a Cerberus or Databricks secret
smtp_creds_dict = {
user_config.secret_type: "cerberus",
user_config.cbs_url: "htpps://cerberus.example.com",
user_config.cbs_sdb_path: "",
user_config.cbs_smtp_password: "",
# user_config.secret_type: "databricks",
# user_config.dbx_workspace_url: "https://workspace.cloud.databricks.com",
# user_config.dbx_secret_scope: "your_secret_scope",
# user_config.dbx_smtp_password: "your_password",
}

# Commented fields are optional or required when notifications are enabled
user_conf = {
user_config.se_notifications_enable_email: False,
# user_config.se_notifications_enable_smtp_server_auth: False,
# user_config.se_notifications_enable_custom_email_body: True,
# user_config.se_notifications_email_smtp_host: "mailhost.com",
# user_config.se_notifications_email_smtp_port: 25,
# user_config.se_notifications_smtp_password: "your_password",
# user_config.se_notifications_smtp_creds_dict: smtp_creds_dict,
# user_config.se_notifications_email_from: "",
# user_config.se_notifications_email_to_other_mail_id: "",
# user_config.se_notifications_email_subject: "spark expectations - data quality - notifications",
124 changes: 84 additions & 40 deletions docs/examples.md
@@ -8,56 +8,100 @@ from spark_expectations.config.user_config import Constants as user_config

se_user_conf = {
user_config.se_notifications_enable_email: False, # (1)!
user_config.se_notifications_enable_custom_email_body: False, # (2)
user_config.se_notifications_email_smtp_host: "mailhost.com", # (3)!
user_config.se_notifications_email_smtp_port: 25, # (4)!
user_config.se_notifications_email_from: "<sender_email_id>", # (5)!
user_config.se_notifications_email_to_other_mail_id: "<receiver_email_id's>", # (6)!
user_config.se_notifications_email_subject: "spark expectations - data quality - notifications", # (7)!
user_config.se_notifications_email_custom_body: "custom stats: 'product_id': {}", # (8)!
user_config.se_notifications_enable_slack: True, # (9)!
user_config.se_notifications_slack_webhook_url: "<slack-webhook-url>", # (10)!
user_config.se_notifications_on_start: True, # (11)!
user_config.se_notifications_on_completion: True, # (12)!
user_config.se_notifications_on_fail: True, # (13)!
user_config.se_notifications_on_error_drop_exceeds_threshold_breach: True, # (14)!
user_config.se_notifications_on_rules_action_if_failed_set_ignore: True, # (15)!
user_config.se_notifications_on_error_drop_threshold: 15, # (16)!
user_config.se_enable_error_table: True, # (17)!
user_config.enable_query_dq_detailed_result: True, # (18)!
user_config.enable_agg_dq_detailed_result: True, # (19)!
user_config.querydq_output_custom_table_name: "<catalog.schema.table-name>", #20
user_config.se_notifications_enable_smtp_server_auth: False, # (2)!
user_config.se_notifications_enable_custom_email_body: False, # (3)
user_config.se_notifications_email_smtp_host: "mailhost.com", # (4)!
user_config.se_notifications_email_smtp_port: 25, # (5)!
user_config.se_notifications_smtp_password: "your_password",# (6)!
# user_config.se_notifications_smtp_creds_dict: {
# user_config.secret_type: "cerberus",
# user_config.cbs_url: "https://prod.cerberus.nikecloud.com",
# user_config.cbs_sdb_path: "your_sdb_path",
# user_config.cbs_smtp_password: "your_smtp_password",
# }, # (7)!
user_config.se_notifications_email_from: "<sender_email_id>", # (8)!
user_config.se_notifications_email_to_other_mail_id: "<receiver_email_id's>", # (9)!
user_config.se_notifications_email_subject: "spark expectations - data quality - notifications", # (10)!
user_config.se_notifications_email_custom_body: "custom stats: 'product_id': {}", # (11)!
user_config.se_notifications_enable_slack: True, # (12)!
user_config.se_notifications_slack_webhook_url: "<slack-webhook-url>", # (13)!
user_config.se_notifications_on_start: True, # (14)!
user_config.se_notifications_on_completion: True, # (15)!
user_config.se_notifications_on_fail: True, # (16)!
user_config.se_notifications_on_error_drop_exceeds_threshold_breach: True, # (17)!
user_config.se_notifications_on_rules_action_if_failed_set_ignore: True, # (18)!
user_config.se_notifications_on_error_drop_threshold: 15, # (19)!
user_config.se_enable_error_table: True, # (20)!
user_config.enable_query_dq_detailed_result: True, # (21)!
user_config.enable_agg_dq_detailed_result: True, # (22)!
user_config.querydq_output_custom_table_name: "<catalog.schema.table-name>", #23
user_config.se_dq_rules_params: {
"env": "local",
"table": "product",
}, # (21)!
}, # (24)!
}
}
```

1. The `user_config.se_notifications_enable_email` parameter, which controls whether notifications are sent via email, is set to false by default
2. The `user_config.se_notifications_enable_custom_email_body` optional parameter, which controls whether a custom email body is enabled, is set to false by default
3. The `user_config.se_notifications_email_smtp_host` parameter is set to "mailhost.com" by default and is used to specify the email SMTP domain host
4. The `user_config.se_notifications_email_smtp_port` parameter, which accepts a port number, is set to "25" by default
5. The `user_config.se_notifications_email_from` parameter is used to specify the email ID that will trigger the email notification
6. The `user_config.se_notifications_email_to_other_mail_id` parameter accepts a list of recipient email IDs
7. The `user_config.se_notifications_email_subject` parameter captures the subject line of the email
8. The `user_config.se_notifications_email_custom_body` optional parameter captures the custom email body, which must follow a specific syntax
9. The `user_config.se_notifications_enable_slack` parameter, which controls whether notifications are sent via Slack, is set to false by default
10. The `user_config.se_notifications_slack_webhook_url` parameter accepts the webhook URL of a Slack channel for sending notifications
11. When the `user_config.se_notifications_on_start` parameter is set to `True`, a notification is sent at the start of spark-expectations; it is set to `False` by default
12. When the `user_config.se_notifications_on_completion` parameter is set to `True`, a notification is sent on completion of the spark-expectations framework; it is set to `False` by default
13. When the `user_config.se_notifications_on_fail` parameter is set to `True`, a notification is sent on failure of the spark-expectations data quality framework; it is set to `True` by default
14. When the `user_config.se_notifications_on_error_drop_exceeds_threshold_breach` parameter is set to `True`, a notification is sent when the error drop exceeds the configured threshold
15. When the `user_config.se_notifications_on_rules_action_if_failed_set_ignore` parameter is set to `True`, a notification is sent when a failed rule's action_if_failed is set to ignore
16. The `user_config.se_notifications_on_error_drop_threshold` parameter captures the error drop threshold value
17. The `user_config.se_enable_error_table` parameter, which controls whether error data is loaded into the error table, is set to true by default
18. When the `user_config.enable_query_dq_detailed_result` parameter is set to `True`, detailed query_dq stats are captured in the detailed_stats table; it is set to `False` by default
19. When the `user_config.enable_agg_dq_detailed_result` parameter is set to `True`, detailed agg_dq stats are captured in the detailed_stats table; it is set to `False` by default
20. The `user_config.querydq_output_custom_table_name` parameter specifies the name of the custom query_dq output table, which captures the output of the alias queries passed in the query_dq expectation. Default is <stats_table>_custom_output
21. The `user_config.se_dq_rules_params` parameter passes the parameters required to dynamically update dq rules
2. The `user_config.se_notifications_enable_smtp_server_auth` optional parameter, which controls whether SMTP server authentication is enabled, is set to false by default
3. The `user_config.se_notifications_enable_custom_email_body` optional parameter, which controls whether a custom email body is enabled, is set to false by default
4. The `user_config.se_notifications_email_smtp_host` parameter is set to "mailhost.com" by default and is used to specify the email SMTP domain host
5. The `user_config.se_notifications_email_smtp_port` parameter, which accepts a port number, is set to "25" by default
6. The `user_config.se_notifications_smtp_password` parameter specifies the password for the SMTP server (if the SMTP server requires authentication, either this parameter or `user_config.se_notifications_smtp_creds_dict` must be set)
7. The `user_config.se_notifications_smtp_creds_dict` parameter specifies the credentials for the SMTP server (if the SMTP server requires authentication, either this parameter or `user_config.se_notifications_smtp_password` must be set)
8. The `user_config.se_notifications_email_from` parameter is used to specify the email ID that will trigger the email notification
9. The `user_config.se_notifications_email_to_other_mail_id` parameter accepts a list of recipient email IDs
10. The `user_config.se_notifications_email_subject` parameter captures the subject line of the email
11. The `user_config.se_notifications_email_custom_body` optional parameter captures the custom email body, which must follow a specific syntax
12. The `user_config.se_notifications_enable_slack` parameter, which controls whether notifications are sent via Slack, is set to false by default
13. The `user_config.se_notifications_slack_webhook_url` parameter accepts the webhook URL of a Slack channel for sending notifications
14. When the `user_config.se_notifications_on_start` parameter is set to `True`, a notification is sent at the start of spark-expectations; it is set to `False` by default
15. When the `user_config.se_notifications_on_completion` parameter is set to `True`, a notification is sent on completion of the spark-expectations framework; it is set to `False` by default
16. When the `user_config.se_notifications_on_fail` parameter is set to `True`, a notification is sent on failure of the spark-expectations data quality framework; it is set to `True` by default
17. When the `user_config.se_notifications_on_error_drop_exceeds_threshold_breach` parameter is set to `True`, a notification is sent when the error drop exceeds the configured threshold
18. When the `user_config.se_notifications_on_rules_action_if_failed_set_ignore` parameter is set to `True`, a notification is sent when a failed rule's action_if_failed is set to ignore
19. The `user_config.se_notifications_on_error_drop_threshold` parameter captures the error drop threshold value
20. The `user_config.se_enable_error_table` parameter, which controls whether error data is loaded into the error table, is set to true by default
21. When the `user_config.enable_query_dq_detailed_result` parameter is set to `True`, detailed query_dq stats are captured in the detailed_stats table; it is set to `False` by default
22. When the `user_config.enable_agg_dq_detailed_result` parameter is set to `True`, detailed agg_dq stats are captured in the detailed_stats table; it is set to `False` by default
23. The `user_config.querydq_output_custom_table_name` parameter specifies the name of the custom query_dq output table, which captures the output of the alias queries passed in the query_dq expectation. Default is <stats_table>_custom_output
24. The `user_config.se_dq_rules_params` parameter passes the parameters required to dynamically update dq rules

If SMTP server authentication is required, the password can be passed directly in the user config or stored securely, for example in Cerberus or a Databricks secret scope.
If Cerberus is preferred for secure password storage, the `user_config.se_notifications_smtp_creds_dict` parameter can be used to specify the SMTP server credentials as follows:
```python
from spark_expectations.config.user_config import Constants as user_config

smtp_creds_dict = {
user_config.secret_type: "cerberus", # (1)!
user_config.cbs_url: "https://prod.cerberus.nikecloud.com", # (2)!
user_config.cbs_sdb_path: "your_sdb_path", # (3)!
user_config.cbs_smtp_password: "your_smtp_password", # (4)!
}
```
1. The `user_config.secret_type` parameter defines the type of secret store and accepts two values (`databricks`, `cerberus`)
2. The `user_config.cbs_url` parameter passes the Cerberus URL
3. The `user_config.cbs_sdb_path` parameter captures the Cerberus secure data store (SDB) path
4. The `user_config.cbs_smtp_password` parameter captures the key for the SMTP password in the Cerberus SDB

Similarly, if Databricks secrets are preferred for secure password storage, the `user_config.se_notifications_smtp_creds_dict` parameter can be used to specify the SMTP server credentials as follows:
```python
from spark_expectations.config.user_config import Constants as user_config

smtp_creds_dict = {
user_config.secret_type: "databricks", # (1)!
user_config.dbx_workspace_url: "https://workspace.cloud.databricks.com", # (2)!
user_config.dbx_secret_scope: "your_secret_scope", # (3)!
user_config.dbx_smtp_password: "your_password", # (4)!
}
```
1. The `user_config.secret_type` parameter defines the type of secret store and accepts two values (`databricks`, `cerberus`)
2. The `user_config.dbx_workspace_url` parameter passes the Databricks workspace URL in the format `https://<workspace_name>.cloud.databricks.com`
3. The `user_config.dbx_secret_scope` parameter captures the name of the secret scope
4. The `user_config.dbx_smtp_password` parameter captures the secret key for the SMTP password in the Databricks secret scope
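With either secret store, the resulting dictionary is then referenced from the notification config instead of placing the password inline. A minimal sketch using the keys shown above:

```python
# Sketch: enable SMTP auth and hand the secret-store details to spark-expectations
user_conf = {
    user_config.se_notifications_enable_email: True,
    user_config.se_notifications_enable_smtp_server_auth: True,
    user_config.se_notifications_smtp_creds_dict: smtp_creds_dict,
    # ... remaining notification settings (host, port, from/to addresses, subject) as above
}
```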

### Spark Expectations Initialization

For all the examples below, the following import and SparkExpectations class instantiation are mandatory
19 changes: 17 additions & 2 deletions docs/iceberg.md
@@ -1,8 +1,8 @@
### Example - Write to Iceberg

Set up a SparkSession for Iceberg to test in your local environment. Configure accordingly for higher environments.
Refer to Examples in [base_setup.py](../spark_expectations/examples/base_setup.py) and
[delta.py](../spark_expectations/examples/sample_dq_iceberg.py)
Refer to Examples in [base_setup.py](https://github.com/Nike-Inc/spark-expectations/blob/main/spark_expectations/examples/base_setup.py) and
[iceberg.py](https://github.com/Nike-Inc/spark-expectations/blob/main/spark_expectations/examples/sample_dq_iceberg.py)

```python title="spark_session"
from pyspark.sql import SparkSession
@@ -52,12 +52,27 @@ se: SparkExpectations = SparkExpectations(
stats_streaming_options={user_config.se_enable_streaming: False},
)

# If the SMTP server requires authentication, the password can be passed directly in the user config or retrieved securely from a Cerberus or Databricks secret
smtp_creds_dict = {
user_config.secret_type: "cerberus",
user_config.cbs_url: "htpps://cerberus.example.com",
user_config.cbs_sdb_path: "",
user_config.cbs_smtp_password: "",
# user_config.secret_type: "databricks",
# user_config.dbx_workspace_url: "https://workspace.cloud.databricks.com",
# user_config.dbx_secret_scope: "your_secret_scope",
# user_config.dbx_smtp_password: "your_password",
}

# Commented fields are optional or required when notifications are enabled
user_conf = {
user_config.se_notifications_enable_email: False,
# user_config.se_notifications_enable_smtp_server_auth: False,
# user_config.se_notifications_enable_custom_email_body: True,
# user_config.se_notifications_email_smtp_host: "mailhost.com",
# user_config.se_notifications_email_smtp_port: 25,
# user_config.se_notifications_smtp_password: "your_password",
# user_config.se_notifications_smtp_creds_dict: smtp_creds_dict,
# user_config.se_notifications_email_from: "",
# user_config.se_notifications_email_to_other_mail_id: "",
# user_config.se_notifications_email_subject: "spark expectations - data quality - notifications",
2 changes: 1 addition & 1 deletion docs/index.md
@@ -46,7 +46,7 @@ otherwise specified.
send notifications at the start, as well as upon completion


There is a field in the rules table called [action_if_failed](getting-started/setup/#action_if_failed), which determines
There is a field in the rules table called [action_if_failed](getting-started/setup.md/#action_if_failed), which determines
what needs to be done if a rule fails
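For illustration only — the column names below are an assumed subset of the rules table and the rule itself is hypothetical — a rule row that ignores failures might look like this; `action_if_failed` and its typical values are the point of the sketch:

```python
# Hypothetical rule row (assumed column subset, not the full rules table schema)
example_rule = {
    "rule_type": "row_dq",
    "rule": "product_id_is_not_null",
    "expectation": "product_id is not null",
    "action_if_failed": "ignore",  # typical values: "ignore", "drop", "fail"
}
```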

