Count Distinct Support For String Cols #19

Open
wants to merge 4 commits into
base: hbo-feathr-branch
7 changes: 5 additions & 2 deletions feathr_project/feathr/client.py
@@ -2,11 +2,12 @@
 import copy
 import logging
 import os
+from collections import Counter
 import tempfile
 from typing import Dict, List, Union

 from azure.identity import DefaultAzureCredential
-from feathr.definition.transformation import WindowAggTransformation
+from feathr.definition.transformation import WindowAggTransformation , ExpressionTransformation
 from jinja2 import Template
 from pyhocon import ConfigFactory
 import redis
@@ -873,4 +874,6 @@ def _reshape_config_str(self, config_str:str):
         if self.spark_runtime == 'local':
             return "'{" + config_str + "}'"
         else:
-            return config_str
+            return config_str
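
Review note: the branch in `_reshape_config_str` can be exercised in isolation. The sketch below is a standalone stand-in, not the Feathr client method. `reshape_config_str`, its `spark_runtime` parameter, and the sample runtime name `"databricks"` are hypothetical and only mirror the hunk, assuming the local runtime wants the config wrapped in braces and single quotes while any other runtime takes it unchanged.

```python
def reshape_config_str(config_str: str, spark_runtime: str) -> str:
    """Hypothetical standalone version of the branch shown in the hunk."""
    if spark_runtime == 'local':
        # Local runs get the config wrapped in braces and single quotes.
        return "'{" + config_str + "}'"
    # Any other runtime receives the string unchanged.
    return config_str


# "databricks" is only an example of a non-local runtime value.
local_arg = reshape_config_str("a: 1", "local")
remote_arg = reshape_config_str("a: 1", "databricks")
```

Since the only change to this method in the diff is trailing whitespace, the behaviour sketched here should be the same before and after the PR.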


@@ -178,7 +178,9 @@ private[offline] object SlidingWindowFeatureUtils {
         // In Feathr's use case, we want to treat the count aggregation as simple count of non-null items.
         val rewrittenDef = s"CASE WHEN ${featureDef} IS NOT NULL THEN 1 ELSE 0 END"
         new CountAggregate(rewrittenDef)
-      case AggregationType.COUNT_DISTINCT => new CountDistinctAggregate(featureDef)
+      case AggregationType.COUNT_DISTINCT =>
+        val rewrittenDef = s"CASE WHEN ${featureDef} IS NOT NULL THEN hash(${featureDef}) ELSE 0 END"
+        new CountDistinctAggregate(rewrittenDef)
       case AggregationType.AVG => new AvgAggregate(featureDef)
       case AggregationType.MAX => new MaxAggregate(featureDef)
       case AggregationType.MIN => new MinAggregate(featureDef)
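
Review note: the two `CASE WHEN` rewrites in this hunk can be modelled in a few lines of plain Python to see what each one feeds into its aggregate. This is a rough sketch under stated assumptions, not Feathr or Spark code. `toy_hash` is a hypothetical 31-multiplier string hash standing in for Spark SQL's `hash()`, and the set-based `count_distinct` is an assumed model of what `CountDistinctAggregate` computes, which the diff does not show.

```python
from typing import Iterable, Optional


def toy_hash(value: str) -> int:
    # Hypothetical stand-in for Spark SQL's hash(); NOT the same function.
    h = 0
    for ch in value:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def count_rewrite(value: Optional[str]) -> int:
    # Mirrors: CASE WHEN x IS NOT NULL THEN 1 ELSE 0 END
    return 1 if value is not None else 0


def count_distinct_rewrite(value: Optional[str]) -> int:
    # Mirrors: CASE WHEN x IS NOT NULL THEN hash(x) ELSE 0 END
    return toy_hash(value) if value is not None else 0


def count_non_null(values: Iterable[Optional[str]]) -> int:
    # COUNT becomes a sum over the 0/1 rewritten column.
    return sum(count_rewrite(v) for v in values)


def count_distinct(values: Iterable[Optional[str]]) -> int:
    # Assumed model: distinct count over the rewritten (numeric) column.
    return len({count_distinct_rewrite(v) for v in values})


rows = ["US", "IN", None, "US"]
non_null = count_non_null(rows)
distinct_with_null = count_distinct(rows)
distinct_without_null = count_distinct([r for r in rows if r is not None])
```

In this model the string column becomes numeric, which appears to be the point of the change. The null row contributes the sentinel `0` as one more distinct value, so `distinct_with_null` is one higher than `distinct_without_null`. Whether the real aggregate behaves the same way, and how it handles hash collisions between different strings, is not visible in the diff and may be worth confirming with a test.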