The R summary is a fivenum summary (plus the mean) + most common item counts for non-numeric cols + count of NAs
A ddf fivenum is a fivenum summary + drop non-numeric cols + ignore NAs
A ddf summary is a method for an R base package generic and as such should be semantically close, and if possible identical, to the method for data.frames, particularly in this case, where in-memory and distributed data frames are implementations of the same abstraction (summary for vectors is by necessity a little different). Instead, a ddf summary has mean, min, max and count of NAs in common with R summary, adds stddev and count of rows, repeats these for each column, and returns NAs for string columns (not even for the rows that make sense, like count and cNA).
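For reference, here is a minimal sketch of what the base R functions return on an ordinary in-memory data frame. The data frame `df` and its columns are made up for illustration; only `summary` and `fivenum` from base R are used, nothing from ddf.

```r
# A small in-memory data frame with a numeric column, a factor column and NAs.
# (Illustrative data only.)
df <- data.frame(
  x = c(1, 2, 3, 4, NA),
  g = factor(c("a", "b", "a", NA, "a"))
)

# base::summary on a data.frame: min, quartiles, median, mean and max for
# numeric columns, level counts for factor columns, and NA counts for both.
print(summary(df))

# stats::fivenum works on a single numeric vector and drops NAs by default.
print(fivenum(df$x))
```

A ddf method that matched base::summary would presumably produce the same table for the same data, whichever backend holds it.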
My suggestion is: keep fivenum as it is, since everybody knows the five-number summary. Make summary a true method for base::summary; results should be identical.
Add a function for moments like mean, stddev, skewness, kurtosis, etc.
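A rough sketch of what such a function could look like for in-memory data, just to make the proposal concrete. The names `moments` and `moments_df` are hypothetical and not part of ddf or base R, and the choice of estimators (population central moments for skewness and kurtosis, sample standard deviation) is an assumption.

```r
# Hypothetical helper: name, arguments and return shape are illustrative only.
moments <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))
  if (na.rm) x <- x[!is.na(x)]
  m  <- mean(x)
  d  <- x - m
  m2 <- mean(d^2)  # population (biased) second central moment
  c(
    mean     = m,
    stddev   = sd(x),                 # sample standard deviation (n - 1)
    skewness = mean(d^3) / m2^1.5,    # third standardized moment
    kurtosis = mean(d^4) / m2^2       # not excess kurtosis; a normal gives about 3
  )
}

# Apply to every numeric column of a data frame, skipping the others.
moments_df <- function(df) {
  num <- vapply(df, is.numeric, logical(1))
  sapply(df[num], moments)
}

print(moments(c(1, 2, 3, 4, 5, NA)))
```

Keeping this separate from summary leaves summary free to match base R exactly.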
Whether the changes in the definition of summary should apply to other languages is a bit more subtle. I guess no one has a monopoly on the word summary. Unless there are specific expectations in Java or Python, it'd be simpler to make them all the same. In R there are clear expectations for what summary should do.