Fix to execute the validation when func is called and replaced the old func definition of validate_schema()
kunaljubce committed Mar 29, 2024
1 parent e642b86 commit 3188b54
Showing 1 changed file with 24 additions and 38 deletions.
62 changes: 24 additions & 38 deletions quinn/dataframe_validator.py
@@ -37,7 +37,28 @@ def validate_presence_of_columns(df: DataFrame, required_col_names: list[str]) -
if missing_col_names:
raise DataFrameMissingColumnError(error_message)

Check failure on line 39 (GitHub Actions / ruff): quinn/dataframe_validator.py:39:1: W293 Blank line contains whitespace
def validate_schema(required_schema: StructType, ignore_nullable=False, _func=None):
def validate_schema(
required_schema: StructType,

Check failure on line 41 (GitHub Actions / ruff): quinn/dataframe_validator.py:41:33: W291 Trailing whitespace
ignore_nullable: bool = False,

Check failure on line 42 (GitHub Actions / ruff): quinn/dataframe_validator.py:42:35: W291 Trailing whitespace
_df: DataFrame = None

Check failure on line 43 (GitHub Actions / ruff): quinn/dataframe_validator.py:43:26: COM812 Trailing comma missing
) -> function:

Check failure on line 44 (GitHub Actions / ruff): quinn/dataframe_validator.py:44:6: F821 Undefined name `function`
"""Function that validate if a given DataFrame has a given StructType as its schema.
Implemented as a decorator factory so can be used both as a standalone function or as

Check failure on line 46 (GitHub Actions / ruff): quinn/dataframe_validator.py:46:90: W291 Trailing whitespace
a decorator to another function.
:param required_schema: StructType required for the DataFrame
:type required_schema: StructType
:param ignore_nullable: (Optional) A flag for if nullable fields should be
ignored during validation
:type ignore_nullable: bool, optional
:param _df: DataFrame to validate, mandatory when called as a function. Not required
when called as a decorator
:type _df: DataFrame
:raises DataFrameMissingStructFieldError: if any StructFields from the required
schema are not included in the DataFrame schema
"""

def decorator(func):

Check failure on line 62 (GitHub Actions / ruff): quinn/dataframe_validator.py:62:9: ANN202 Missing return type annotation for private function `decorator`
Check failure on line 62 (GitHub Actions / ruff): quinn/dataframe_validator.py:62:19: ANN001 Missing type annotation for function argument `func`
def wrapper(*args, **kwargs):

Check failure on line 63 (GitHub Actions / ruff): quinn/dataframe_validator.py:63:13: ANN202 Missing return type annotation for private function `wrapper`
Check failure on line 63 (GitHub Actions / ruff): quinn/dataframe_validator.py:63:22: ANN002 Missing type annotation for `*args`
df = func(*args, **kwargs)
@@ -59,47 +80,12 @@ def wrapper(*args, **kwargs):
return df
return wrapper

if _func is None:
if _df is None:
# This means the function is being used as a decorator
return decorator
else:
# This means the function is being called directly with a DataFrame
return decorator(lambda: _func)()


def x_validate_schema(
df: DataFrame,
required_schema: StructType,
ignore_nullable: bool = False,
) -> None:
"""Function that validate if a given DataFrame has a given StructType as its schema.
:param df: DataFrame to validate
:type df: DataFrame
:param required_schema: StructType required for the DataFrame
:type required_schema: StructType
:param ignore_nullable: (Optional) A flag for if nullable fields should be
ignored during validation
:type ignore_nullable: bool, optional
:raises DataFrameMissingStructFieldError: if any StructFields from the required
schema are not included in the DataFrame schema
"""
_all_struct_fields = copy.deepcopy(df.schema)
_required_schema = copy.deepcopy(required_schema)

if ignore_nullable:
for x in _all_struct_fields:
x.nullable = None

for x in _required_schema:
x.nullable = None

missing_struct_fields = [x for x in _required_schema if x not in _all_struct_fields]
error_message = f"The {missing_struct_fields} StructFields are not included in the DataFrame with the following StructFields {_all_struct_fields}"

if missing_struct_fields:
raise DataFrameMissingStructFieldError(error_message)
return decorator(lambda: _df)()


def validate_absence_of_columns(df: DataFrame, prohibited_col_names: list[str]) -> None:
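
Note on the change above: the new docstring describes `validate_schema` as a decorator factory that also works as a direct call when `_df` is passed. The sketch below illustrates that dual-use pattern in plain Python. It is not the quinn implementation: `Field`, `Frame`, `MissingFieldError`, and `_check` are hypothetical stand-ins for `StructField`, `DataFrame`, `DataFrameMissingStructFieldError`, and the validation body hidden in the collapsed part of the diff. The `Callable` annotations show one possible way to address the F821 and ANN findings reported by ruff, under those assumptions.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable


@dataclass
class Field:
    """Hypothetical stand-in for pyspark.sql.types.StructField."""

    name: str
    dtype: str
    nullable: bool | None = True


@dataclass
class Frame:
    """Hypothetical stand-in for a DataFrame; it only carries a schema."""

    schema: list[Field]


class MissingFieldError(ValueError):
    """Hypothetical stand-in for DataFrameMissingStructFieldError."""


def _check(df: Frame, required: list[Field], ignore_nullable: bool) -> None:
    # Work on copies so neither the frame's schema nor the required schema is mutated.
    actual = copy.deepcopy(df.schema)
    wanted = copy.deepcopy(required)
    if ignore_nullable:
        # Blank out nullability on both sides so it cannot cause a mismatch.
        for field in [*actual, *wanted]:
            field.nullable = None
    missing = [field for field in wanted if field not in actual]
    if missing:
        raise MissingFieldError(f"Fields {missing} are not included in {actual}")


def validate_schema(
    required_schema: list[Field],
    ignore_nullable: bool = False,
    _df: Frame | None = None,
) -> Callable[..., Any] | Frame:
    def decorator(func: Callable[..., Frame]) -> Callable[..., Frame]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Frame:
            # Validation runs each time the wrapped function is called.
            df = func(*args, **kwargs)
            _check(df, required_schema, ignore_nullable)
            return df

        return wrapper

    if _df is None:
        # No frame supplied: act as a decorator factory.
        return decorator
    # Frame supplied: wrap a thunk returning it and call it right away.
    return decorator(lambda: _df)()
```

In this sketch, `validate_schema(schema, _df=frame)` validates immediately and returns `frame`, while `@validate_schema(schema)` defers validation until the decorated function is called and has produced its frame.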