feat: add classification functions

Summary: Add the classification functions from Presto to Velox: https://prestodb.io/docs/current/functions/aggregate.html#classification-metrics-aggregate-functions

All classification functions use `FixedDoubleHistogram`, a data structure that represents the buckets of weights. The bucket indices of the histogram are evenly distributed between the min and max values. The classification functions differ only in the extraction phase; all other steps are the same.

At a high level:
- addRawInput adds a value to either the true or the false weight bucket. Which bucket receives the value depends on the prediction value, which is linearly mapped to a bucket (based on min, max and bucketCount) by normalizing the prediction between min and max.
- The schema of the intermediate state is [version header][bucket count][min][max][weights].

Differential Revision: D66684198
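
The linear mapping described for addRawInput can be illustrated with a minimal sketch. This is not the `FixedDoubleHistogram` code from this commit: `SketchHistogram`, `bucketIndex` and `add` are hypothetical names, and clamping a prediction equal to max into the last bucket is an assumption. The sketch also assumes the true/false choice comes from the row's boolean outcome, as in Presto's classification functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pair of fixed-bucket weight histograms.
// Not the Velox class; names and clamping behavior are assumptions.
struct SketchHistogram {
  SketchHistogram(std::size_t bucketCount, double min, double max)
      : min_(min),
        max_(max),
        trueWeights_(bucketCount, 0.0),
        falseWeights_(bucketCount, 0.0) {
    assert(bucketCount > 0 && max > min);
  }

  // Normalize the prediction to [0, 1], then scale by the bucket count.
  std::size_t bucketIndex(double prediction) const {
    assert(prediction >= min_ && prediction <= max_);
    const double normalized = (prediction - min_) / (max_ - min_);
    const std::size_t n = trueWeights_.size();
    // prediction == max would land on index n, so clamp to the last bucket.
    return std::min(static_cast<std::size_t>(normalized * n), n - 1);
  }

  // The outcome selects the histogram; the prediction selects the bucket.
  void add(bool outcome, double prediction, double weight = 1.0) {
    auto& weights = outcome ? trueWeights_ : falseWeights_;
    weights[bucketIndex(prediction)] += weight;
  }

  double min_;
  double max_;
  std::vector<double> trueWeights_;
  std::vector<double> falseWeights_;
};
```

With four buckets over [0, 1], a prediction of 0.3 falls into bucket 1 and a prediction of 1.0 into bucket 3, so each input row touches exactly one slot in one of the two weight arrays.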
facebookincubator · Dec 9, 2024 · e939248
1 parent 929affe
Showing 6 changed files with 944 additions and 0 deletions.
diff --git a/velox/functions/prestosql/aggregates/AggregateNames.h b/velox/functions/prestosql/aggregates/AggregateNames.h
@@ -32,6 +32,11 @@ const char* const kBitwiseXor = "bitwise_xor_agg";
 const char* const kBoolAnd = "bool_and";
 const char* const kBoolOr = "bool_or";
 const char* const kChecksum = "checksum";
+const char* const kClassificationFallout = "classification_fall_out";
+const char* const kClassificationPrecision = "classification_precision";
+const char* const kClassificationRecall = "classification_recall";
+const char* const kClassificationMissRate = "classification_miss_rate";
+const char* const kClassificationThreshold = "classification_thresholds";
 const char* const kCorr = "corr";
 const char* const kCount = "count";
 const char* const kCountIf = "count_if";