feat: add classification functions
Summary:
Add the classification functions from Presto to Velox: https://prestodb.io/docs/current/functions/aggregate.html#classification-metrics-aggregate-functions

The classification functions all use `FixedDoubleHistogram`, a data structure that represents buckets of weights. The bucket indices of the histogram are evenly distributed between the min and max values.

The classification functions differ only in the extraction phase. All other steps are the same.

At a high level:
- addRawInput adds a value to either the true or the false weight bucket. Which bucket receives the value depends on the prediction value: the prediction is mapped linearly to a bucket index from min, max, and bucketCount by normalizing it between min and max.

- The schema of the intermediate states is [version header][bucket count][min][max][weights]
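
To make the two points above concrete, here is a minimal sketch of the bucket mapping and of the intermediate layout. It is not the Velox implementation: `SketchHistogram`, `SketchAccumulator`, `indexFor`, and `serialize` are hypothetical names, choosing the true or false histogram by a boolean outcome is an assumption, and so are the field widths used for serialization (1-byte version, 8-byte bucket count, 8-byte doubles).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the histogram described above.
// It is NOT Velox's FixedDoubleHistogram.
struct SketchHistogram {
  int64_t bucketCount;
  double min;
  double max;
  std::vector<double> weights;

  SketchHistogram(int64_t n, double lo, double hi)
      : bucketCount(n), min(lo), max(hi) {
    if (n < 1 || !(lo < hi)) {
      throw std::invalid_argument("need bucketCount >= 1 and min < max");
    }
    weights.assign(static_cast<std::size_t>(n), 0.0);
  }

  // Normalize the prediction into [0, 1], scale by bucketCount, and clamp
  // so that pred == max falls into the last bucket.
  int64_t indexFor(double pred) const {
    if (!(pred >= min && pred <= max)) {  // also rejects NaN
      throw std::out_of_range("prediction outside [min, max]");
    }
    const double normalized = (pred - min) / (max - min);
    const auto idx = static_cast<int64_t>(normalized * bucketCount);
    return std::min(idx, bucketCount - 1);
  }

  void add(double pred, double weight) {
    weights[static_cast<std::size_t>(indexFor(pred))] += weight;
  }
};

// Assumption: the boolean outcome picks the true or false histogram,
// and the prediction picks the bucket inside it.
struct SketchAccumulator {
  SketchHistogram trueWeights;
  SketchHistogram falseWeights;

  void addRawInput(bool outcome, double pred, double weight = 1.0) {
    (outcome ? trueWeights : falseWeights).add(pred, weight);
  }
};

// Writes [version header][bucket count][min][max][weights] in that order.
// The field widths are assumptions made for this sketch only.
std::vector<char> serialize(const SketchHistogram& h, int8_t version = 1) {
  std::vector<char> out;
  auto put = [&out](const void* p, std::size_t n) {
    const char* c = static_cast<const char*>(p);
    out.insert(out.end(), c, c + n);
  };
  put(&version, sizeof(version));
  put(&h.bucketCount, sizeof(h.bucketCount));
  put(&h.min, sizeof(h.min));
  put(&h.max, sizeof(h.max));
  put(h.weights.data(), h.weights.size() * sizeof(double));
  return out;
}
```

With 4 buckets over [0, 1], a prediction of 0.3 normalizes to 0.3, scales to 1.2, and lands in bucket 1, while a prediction of exactly 1.0 is clamped into bucket 3.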

Differential Revision: D66684198
yuandagits authored and facebook-github-bot committed Dec 9, 2024
1 parent 929affe commit e939248
Showing 6 changed files with 944 additions and 0 deletions.
5 changes: 5 additions & 0 deletions velox/functions/prestosql/aggregates/AggregateNames.h
@@ -32,6 +32,11 @@ const char* const kBitwiseXor = "bitwise_xor_agg";
const char* const kBoolAnd = "bool_and";
const char* const kBoolOr = "bool_or";
const char* const kChecksum = "checksum";
const char* const kClassificationFallout = "classification_fall_out";
const char* const kClassificationPrecision = "classification_precision";
const char* const kClassificationRecall = "classification_recall";
const char* const kClassificationMissRate = "classification_miss_rate";
const char* const kClassificationThreshold = "classification_thresholds";
const char* const kCorr = "corr";
const char* const kCount = "count";
const char* const kCountIf = "count_if";