Clarify standard deviation / variance documentation with Bessel's correction #4786

Merged
merged 7 commits into from
Nov 14, 2023
Changes from 3 commits
@@ -568,8 +568,9 @@ class Aggregate {
}

/**
* Returns an aggregator that computes the standard deviation of values, within an aggregation
* group, for each input column.
* Returns an aggregator that computes the sample standard deviation of values, within an
* aggregation group, for each input column. Sample standard deviation is computed using
* Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*/
[[nodiscard]]
static Aggregate Std(std::vector<std::string> column_specs);
@@ -608,8 +609,9 @@ class Aggregate {
}

/**
* Returns an aggregator that computes the variance of values, within an aggregation group,
* for each input column.
* Returns an aggregator that computes the sample variance of values, within an aggregation group,
* for each input column. Sample variance is computed using Bessel's correction:
* https://en.wikipedia.org/wiki/Bessel%27s_correction
*/
[[nodiscard]]
static Aggregate Var(std::vector<std::string> column_specs);
@@ -801,8 +803,9 @@ Aggregate AggPct(double percentile, Args &&... args) {
}

/**
* Returns an aggregator that computes the standard deviation of values, within an aggregation
* group, for each input column.
* Returns an aggregator that computes the sample standard deviation of values, within an
* aggregation group, for each input column. Sample standard deviation is computed using Bessel's correction:
* https://en.wikipedia.org/wiki/Bessel%27s_correction
*/
template<typename ...Args>
[[nodiscard]]
@@ -821,8 +824,9 @@ Aggregate aggSum(Args &&... args) {
}

/**
* Returns an aggregator that computes the variance of values, within an aggregation group,
* for each input column.
* Returns an aggregator that computes the sample variance of values, within an aggregation group,
* for each input column. Sample variance is computed using
* Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*/
template<typename ...Args>
[[nodiscard]]
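The doc comments above distinguish sample from population statistics. As a rough illustration of what Bessel's correction changes, here is a minimal Go sketch. The names `PopulationVariance`, `SampleVariance`, and `sumSquaredDeviations` are hypothetical and are not part of the Deephaven API; the NaN result for fewer than two values mirrors the `count <= 1` check visible in the Numeric.ftl hunk of this PR.

```go
package main

import (
	"fmt"
	"math"
)

// sumSquaredDeviations returns the sum of (x - mean)^2 over xs.
// Hypothetical helper; callers must pass a non-empty slice.
func sumSquaredDeviations(xs []float64) float64 {
	mean := 0.0
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))

	ss := 0.0
	for _, x := range xs {
		d := x - mean
		ss += d * d
	}
	return ss
}

// PopulationVariance divides by n (no correction).
func PopulationVariance(xs []float64) float64 {
	if len(xs) == 0 {
		return math.NaN()
	}
	return sumSquaredDeviations(xs) / float64(len(xs))
}

// SampleVariance applies Bessel's correction and divides by n - 1.
// With fewer than two values the result is undefined, so NaN is returned.
func SampleVariance(xs []float64) float64 {
	if len(xs) <= 1 {
		return math.NaN()
	}
	return sumSquaredDeviations(xs) / float64(len(xs)-1)
}

func main() {
	xs := []float64{2, 4, 4, 4, 5, 5, 7, 9}
	fmt.Println("population variance:", PopulationVariance(xs))
	fmt.Println("sample variance:    ", SampleVariance(xs))
	fmt.Println("sample std:         ", math.Sqrt(SampleVariance(xs)))
}
```

For this input the squared deviations sum to 32, so the population variance is 32/8 and the sample variance is 32/7; the sample value is always the larger of the two when the data are not constant.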
@@ -528,11 +528,12 @@ UpdateByOperation rollingCountTime(std::string timestamp_col, std::vector<std::s
deephaven::client::utility::DurationSpecifier rev_time,
deephaven::client::utility::DurationSpecifier fwd_time = 0);
/**
* Creates a rolling standard deviation UpdateByOperation for the supplied column names, using ticks as the
* windowing unit. Ticks are row counts, and you may specify the reverse and forward window in
* number of rows to include. The current row is considered to belong to the reverse window but
* not the forward window. Also, negative values are allowed and can be used to generate completely
* forward or completely reverse windows.
* Creates a rolling sample standard deviation UpdateByOperation for the supplied column names,
* using ticks as the windowing unit. Sample standard deviation is computed using Bessel's correction,
* discussed here: https://en.wikipedia.org/wiki/Bessel%27s_correction
* Ticks are row counts, and you may specify the reverse and forward window in number of rows to include.
* The current row is considered to belong to the reverse window but not the forward window.
* Also, negative values are allowed and can be used to generate completely forward or completely reverse windows.
*
* See the documentation of rollingSumTick() for examples of window values.
*
@@ -543,8 +544,10 @@ UpdateByOperation rollingCountTime(std::string timestamp_col, std::vector<std::s
*/
UpdateByOperation rollingStdTick(std::vector<std::string> cols, int rev_ticks, int fwd_ticks = 0);
/**
* Creates a rolling standard deviation UpdateByOperation for the supplied column names, using time as the
* windowing unit. This function accepts nanoseconds or time strings as the reverse and forward
* Creates a rolling sample standard deviation UpdateByOperation for the supplied column names, using time as the
* windowing unit. Sample standard deviation is computed using Bessel's correction,
* discussed here: https://en.wikipedia.org/wiki/Bessel%27s_correction
* This function accepts nanoseconds or time strings as the reverse and forward
* window parameters. Negative values are allowed and can be used to generate completely forward or
* completely reverse windows. A row containing a null in the timestamp column belongs to no window
* and will not be considered in the windows of other rows; its output will be null.
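The tick-window wording in the rollingStdTick comment can be read as: for row `i`, the window covers rows `i - rev_ticks + 1` through `i + fwd_ticks`, inclusive. The Go sketch below works under that reading, which is an assumption drawn from the comment text and not a confirmed description of the engine. `RollingStdTick` and `sampleStd` here are hypothetical local functions, not the client API, and NaN stands in for whatever the engine emits for windows with fewer than two rows.

```go
package main

import (
	"fmt"
	"math"
)

// sampleStd returns the Bessel-corrected standard deviation of xs,
// or NaN when fewer than two values are available.
func sampleStd(xs []float64) float64 {
	if len(xs) <= 1 {
		return math.NaN()
	}
	mean := 0.0
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))

	ss := 0.0
	for _, x := range xs {
		ss += (x - mean) * (x - mean)
	}
	return math.Sqrt(ss / float64(len(xs)-1))
}

// RollingStdTick computes a sample standard deviation per row over a
// tick window. Assumed window for row i: [i-revTicks+1, i+fwdTicks],
// clipped to the slice bounds. The current row counts toward the
// reverse window, so revTicks=1, fwdTicks=0 selects only row i.
func RollingStdTick(values []float64, revTicks, fwdTicks int) []float64 {
	out := make([]float64, len(values))
	for i := range values {
		lo := i - revTicks + 1
		hi := i + fwdTicks
		if lo < 0 {
			lo = 0
		}
		if hi > len(values)-1 {
			hi = len(values) - 1
		}
		if lo > hi {
			// Empty window after clipping.
			out[i] = math.NaN()
			continue
		}
		out[i] = sampleStd(values[lo : hi+1])
	}
	return out
}

func main() {
	values := []float64{1, 2, 3, 4, 5}
	// Current row plus the two rows before it.
	fmt.Println(RollingStdTick(values, 3, 0))
	// Purely forward window: the two rows after the current row.
	fmt.Println(RollingStdTick(values, 0, 2))
}
```

Under this reading, a negative `fwdTicks` shifts the upper bound behind the current row, which yields a window that lies entirely in the past, and a negative `revTicks` does the opposite.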
76 changes: 45 additions & 31 deletions engine/function/src/templates/Numeric.ftl
@@ -427,20 +427,22 @@ public class Numeric {
}

/**
* Returns the variance. Null values are excluded.
* Returns the sample variance. Null values are excluded.
* Sample variance is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @return variance of non-null values.
* @return sample variance of non-null values.
*/
public static double var(${pt.boxed}[] values) {
return var(unbox(values));
}

/**
* Returns the variance. Null values are excluded.
* Returns the sample variance. Null values are excluded.
* Sample variance is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @return variance of non-null values.
* @return sample variance of non-null values.
*/
public static double var(${pt.primitive}... values) {
if (values == null) {
@@ -451,10 +453,11 @@ public class Numeric {
}

/**
* Returns the variance. Null values are excluded.
* Returns the sample variance. Null values are excluded.
* Sample variance is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @return variance of non-null values.
* @return sample variance of non-null values.
*/
public static double var(${pt.vector} values) {
if (values == null) {
@@ -476,7 +479,7 @@ public class Numeric {
}
}

// Return NaN if poisoned or too few values to compute variance.
// Return NaN if poisoned or too few values to compute sample variance.
if (count <= 1 || Double.isNaN(sum) || Double.isNaN(sum2)) {
return Double.NaN;
}
@@ -487,19 +490,20 @@ public class Numeric {
final double delta = sum2 - vs2bar;
final double rel_eps = delta / eps;

// Return zero when the variance is leq the floating point error.
// Return zero when the sample variance is leq the floating point error.
return Math.abs(rel_eps) > 1.0 ? delta / (count - 1) : 0.0;
}

<#list primitiveTypes as pt2>
<#if pt2.valueType.isNumber >

/**
* Returns the weighted variance. Null values are excluded.
* Returns the weighted sample variance. Null values are excluded.
* Weighted sample variance is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @param weights weights
* @return weighted variance of non-null values.
* @return weighted sample variance of non-null values.
*/
public static double wvar(${pt.primitive}[] values, ${pt2.primitive}[] weights) {
if (values == null || weights == null) {
@@ -510,11 +514,12 @@ public class Numeric {
}

/**
* Returns the weighted variance. Null values are excluded.
* Returns the weighted sample variance. Null values are excluded.
* Weighted sample variance is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @param weights weights
* @return weighted variance of non-null values.
* @return weighted sample variance of non-null values.
*/
public static double wvar(${pt.primitive}[] values, ${pt2.vector} weights) {
if (values == null || weights == null) {
@@ -525,11 +530,12 @@ public class Numeric {
}

/**
* Returns the weighted variance. Null values are excluded.
* Returns the weighted sample variance. Null values are excluded.
* Weighted sample variance is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @param weights weights
* @return weighted variance of non-null values.
* @return weighted sample variance of non-null values.
*/
public static double wvar(${pt.vector} values, ${pt2.primitive}[] weights) {
if (values == null || weights == null) {
@@ -540,11 +546,12 @@ public class Numeric {
}

/**
* Returns the weighted variance. Null values are excluded.
* Returns the weighted sample variance. Null values are excluded.
* Weighted sample variance is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @param weights weights
* @return weighted variance of non-null values.
* @return weighted sample variance of non-null values.
*/
public static double wvar(${pt.vector} values, ${pt2.vector} weights) {
if (values == null || weights == null) {
@@ -579,7 +586,7 @@ public class Numeric {
}
}

// Return NaN if poisoned or too few values to compute variance.
// Return NaN if poisoned or too few values to compute sample variance.
if (count <= 1 || Double.isNaN(sum) || Double.isNaN(sum2) || Double.isNaN(count) || Double.isNaN(count2)) {
return Double.NaN;
}
@@ -597,20 +604,22 @@ public class Numeric {


/**
* Returns the standard deviation. Null values are excluded.
* Returns the sample standard deviation. Null values are excluded.
* Sample standard deviation is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @return standard deviation of non-null values.
* @return sample standard deviation of non-null values.
*/
public static double std(${pt.boxed}[] values) {
return std(unbox(values));
}

/**
* Returns the standard deviation. Null values are excluded.
* Returns the sample standard deviation. Null values are excluded.
* Sample standard deviation is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @return standard deviation of non-null values.
* @return sample standard deviation of non-null values.
*/
public static double std(${pt.primitive}... values) {
if (values == null) {
@@ -621,10 +630,11 @@ public class Numeric {
}

/**
* Returns the standard deviation. Null values are excluded.
* Returns the sample standard deviation. Null values are excluded.
* Sample standard deviation is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @return standard deviation of non-null values.
* @return sample standard deviation of non-null values.
*/
public static double std(${pt.vector} values) {
if (values == null) {
@@ -639,11 +649,12 @@ public class Numeric {
<#if pt2.valueType.isNumber >

/**
* Returns the weighted standard deviation. Null values are excluded.
* Returns the weighted sample standard deviation. Null values are excluded.
* Weighted sample standard deviation is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @param weights weights
* @return weighted standard deviation of non-null values.
* @return weighted sample standard deviation of non-null values.
*/
public static double wstd(${pt.primitive}[] values, ${pt2.primitive}[] weights) {
if (values == null || weights == null) {
@@ -654,11 +665,12 @@ public class Numeric {
}

/**
* Returns the weighted standard deviation. Null values are excluded.
* Returns the weighted sample standard deviation. Null values are excluded.
* Weighted sample standard deviation is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @param weights weights
* @return weighted standard deviation of non-null values.
* @return weighted sample standard deviation of non-null values.
*/
public static double wstd(${pt.primitive}[] values, ${pt2.vector} weights) {
if (values == null || weights == null) {
@@ -669,11 +681,12 @@ public class Numeric {
}

/**
* Returns the weighted standard deviation. Null values are excluded.
* Returns the weighted sample standard deviation. Null values are excluded.
* Weighted sample standard deviation is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @param weights weights
* @return weighted standard deviation of non-null values.
* @return weighted sample standard deviation of non-null values.
*/
public static double wstd(${pt.vector} values, ${pt2.primitive}[] weights) {
if (values == null || weights == null) {
@@ -684,11 +697,12 @@ public class Numeric {
}

/**
* Returns the weighted standard deviation. Null values are excluded.
* Returns the weighted sample standard deviation. Null values are excluded.
* Weighted sample standard deviation is computed using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
*
* @param values values.
* @param weights weights
* @return weighted standard deviation of non-null values.
* @return weighted sample standard deviation of non-null values.
*/
public static double wstd(${pt.vector} values, ${pt2.vector} weights) {
if (values == null || weights == null) {
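The unweighted `var` hunk in Numeric.ftl accumulates `sum` and `sum2`, returns NaN when `count <= 1` or either accumulator is NaN, and divides `delta` by `count - 1`. A minimal Go sketch of that one-pass shape follows. The function name `OnePassSampleVariance` is hypothetical, and the definitions of `eps` and `vs2bar` are not visible in the diff, so the sketch takes `delta = sum2 - sum*sum/count` as an assumption and clamps negative rounding noise to zero in place of the original relative-epsilon check.

```go
package main

import (
	"fmt"
	"math"
)

// OnePassSampleVariance computes the Bessel-corrected variance from
// running totals of x and x*x, in a single pass over the data.
func OnePassSampleVariance(xs []float64) float64 {
	var count, sum, sum2 float64
	for _, x := range xs {
		count++
		sum += x
		sum2 += x * x
	}

	// Too few values, or a NaN input poisoned the accumulators.
	if count <= 1 || math.IsNaN(sum) || math.IsNaN(sum2) {
		return math.NaN()
	}

	// Assumed form of delta: sum of squares minus count * mean^2.
	delta := sum2 - sum*sum/count

	// Stand-in for the source's epsilon test (an assumption): treat
	// a non-positive delta as floating-point noise and report zero.
	if delta <= 0 {
		return 0
	}

	// Bessel's correction: divide by count - 1, not count.
	return delta / (count - 1)
}

func main() {
	fmt.Println(OnePassSampleVariance([]float64{1, 2, 3, 4}))
	fmt.Println(OnePassSampleVariance([]float64{5, 5, 5}))
	fmt.Println(OnePassSampleVariance([]float64{42}))
}
```

The one-pass form avoids a second traversal but is more exposed to cancellation error than a two-pass computation, which is presumably why the source guards the result with an epsilon comparison.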
@@ -46,11 +46,11 @@ public enum AggType {
Sum,
/** Return the sum of absolute values in each group. */
AbsSum,
/** Return the variance of values in each group. */
/** Return the sample variance of values in each group. */
Var,
/** Return the average of values in each group. */
Avg,
/** Return the standard deviation of each group. */
/** Return the sample standard deviation of each group. */
Std,
/** Return the first value of each group. */
First,
12 changes: 8 additions & 4 deletions go/pkg/client/query.go
@@ -1013,13 +1013,15 @@ func (qb QueryNode) AvgBy(by ...string) QueryNode {
return qb.addOp(dedicatedAggOp{child: qb, colNames: by, kind: tablepb2.ComboAggregateRequest_AVG})
}

// StdBy returns the standard deviation for each group. Null values are ignored.
// StdBy returns the sample standard deviation for each group. Null values are ignored.
// Sample standard deviation is calculated using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
// Columns not used in the grouping must be numeric.
func (qb QueryNode) StdBy(by ...string) QueryNode {
return qb.addOp(dedicatedAggOp{child: qb, colNames: by, kind: tablepb2.ComboAggregateRequest_STD})
}

// VarBy returns the variance for each group. Null values are ignored.
// VarBy returns the sample variance for each group. Null values are ignored.
// Sample variance is calculated using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
// Columns not used in the grouping must be numeric.
func (qb QueryNode) VarBy(by ...string) QueryNode {
return qb.addOp(dedicatedAggOp{child: qb, colNames: by, kind: tablepb2.ComboAggregateRequest_VAR})
@@ -1156,14 +1158,16 @@ func (b *AggBuilder) Percentile(percentile float64, cols ...string) *AggBuilder
return b
}

// Std returns an aggregator that computes the standard deviation of values, within an aggregation group, for each input column.
// StdDev returns an aggregator that computes the sample standard deviation of values, within an aggregation group, for each input column.
// Sample standard deviation is calculated using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
// The source columns are specified by cols.
func (b *AggBuilder) StdDev(cols ...string) *AggBuilder {
b.addAgg(aggPart{matchPairs: cols, kind: tablepb2.ComboAggregateRequest_STD})
return b
}

// Var returns an aggregator that computes the variance of values, within an aggregation group, for each input column.
// Variance returns an aggregator that computes the sample variance of values, within an aggregation group, for each input column.
// Sample variance is calculated using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
// The source columns are specified by cols.
func (b *AggBuilder) Variance(cols ...string) *AggBuilder {
b.addAgg(aggPart{matchPairs: cols, kind: tablepb2.ComboAggregateRequest_VAR})
6 changes: 4 additions & 2 deletions go/pkg/client/tablehandle.go
@@ -543,7 +543,8 @@ func (th *TableHandle) AvgBy(ctx context.Context, cols ...string) (*TableHandle,
return th.client.dedicatedAggOp(ctx, th, cols, "", tablepb2.ComboAggregateRequest_AVG)
}

// StdBy returns the standard deviation for each group. Null values are ignored.
// StdBy returns the sample standard deviation for each group. Null values are ignored.
// Sample standard deviation is calculated using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
// Columns not used in the grouping must be numeric.
func (th *TableHandle) StdBy(ctx context.Context, cols ...string) (*TableHandle, error) {
if !th.rLockIfValid() {
@@ -553,7 +554,8 @@ func (th *TableHandle) StdBy(ctx context.Context, cols ...string) (*TableHandle,
return th.client.dedicatedAggOp(ctx, th, cols, "", tablepb2.ComboAggregateRequest_STD)
}

// VarBy returns the variance for each group. Null values are ignored.
// VarBy returns the sample variance for each group. Null values are ignored.
// Sample variance is calculated using Bessel's correction: https://en.wikipedia.org/wiki/Bessel%27s_correction
// Columns not used in the grouping must be numeric.
func (th *TableHandle) VarBy(ctx context.Context, cols ...string) (*TableHandle, error) {
if !th.rLockIfValid() {