Add public API to deal with grouped data #17
Note that there are two issues here. The first case is something that, to my knowledge, is not possible in … Say you want to graph a histogram of income for white people and Hispanic people, but many people identify as both white and Hispanic.
The Grammar of Graphics assumes that categories are mutually exclusive, since it only allows grouping based on a single categorical variable. What I would love to be able to do is use a syntax along the lines of …
Here, …
Note that the above scenario only works (I think) if both … Perhaps this idea could be extended all the way to the grouping APIs themselves in JuliaDB and DataFrames. As far as I know, there isn't too much preventing a … cc @nalimilan because this seems like something a demographer might have wanted before. I think that …
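A minimal sketch of the non-exclusive grouping described above, written in Python purely for illustration. It does not use StatsMakie, JuliaDB or DataFrames; the helper names (histogram, overlapping_group_counts), the boolean indicator columns and the bin edges are all hypothetical. Each row contributes to every group whose indicator is true, so a person who identifies as both white and Hispanic lands in both histograms.

```python
from bisect import bisect_right


def histogram(values, edges):
    """Count values into bins [edges[i], edges[i+1]); the last bin also includes edges[-1]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the binned range, ignored
        i = bisect_right(edges, v) - 1
        if i == len(counts):  # v == edges[-1] belongs to the last bin
            i -= 1
        counts[i] += 1
    return counts


def overlapping_group_counts(rows, value_key, group_keys, edges):
    """One histogram per group; a row joins every group whose flag is True (hypothetical helper)."""
    result = {}
    for g in group_keys:
        members = [r[value_key] for r in rows if r[g]]
        result[g] = histogram(members, edges)
    return result


rows = [
    {"income": 20_000, "white": True, "hispanic": False},
    {"income": 35_000, "white": True, "hispanic": True},  # counted in both groups
    {"income": 50_000, "white": False, "hispanic": True},
    {"income": 65_000, "white": True, "hispanic": False},
]
edges = [0, 30_000, 60_000, 90_000]
print(overlapping_group_counts(rows, "income", ["white", "hispanic"], edges))
```

Here the two histograms together hold five counts for four people; that double counting is exactly what the reply below calls pseudoreplication.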
This type of grouping is not part of grouping APIs because it's statistically invalid, and the approach listed leads to pseudoreplication. I strongly believe that your plots should honestly portray your data, and that there should be a seamless correspondence between plots and statistics. A statistically appropriate approach (which is consistent with standard grouping) is to include a third factor level for those who self-identify as more than one ethnic group.
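A minimal Python sketch of the third-level approach, again only an illustration and not code from StatsMakie or any Julia package; the helper names and level labels are hypothetical. Every row maps to exactly one level, so ordinary single-factor grouping applies and no observation is counted twice.

```python
from collections import defaultdict


def exclusive_level(is_white, is_hispanic):
    """Map two overlapping yes/no identities to one mutually exclusive factor level."""
    if is_white and is_hispanic:
        return "white and hispanic"  # the extra, third level
    if is_white:
        return "white"
    if is_hispanic:
        return "hispanic"
    return "other"


def group_by_level(rows):
    """Group incomes by the exclusive level; each row goes to exactly one group."""
    groups = defaultdict(list)
    for r in rows:
        groups[exclusive_level(r["white"], r["hispanic"])].append(r["income"])
    return dict(groups)


rows = [
    {"income": 20_000, "white": True, "hispanic": False},
    {"income": 35_000, "white": True, "hispanic": True},
    {"income": 50_000, "white": False, "hispanic": True},
    {"income": 65_000, "white": True, "hispanic": False},
]
groups = group_by_level(rows)
print(groups)
```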
I think there are situations where it's fine to represent statistics for non-exclusive subgroups. For example, this can happen if you ask a battery of yes/no questions and want to see the characteristics of people who answered "yes" to each question. In this case it's neither practical nor interesting to have a level for each combination of possible answers. That said, I'm not familiar enough with StatsMakie to have an opinion regarding the API.
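A minimal Python sketch of the yes/no battery case, assuming hypothetical question names and a single numeric characteristic (age). Each question defines its own subgroup (the respondents who answered yes), and subgroups may overlap; with k questions, one level per combination of answers would instead need up to 2**k levels.

```python
from statistics import mean


def yes_subgroup_means(responses, questions, characteristic):
    """For each question, average the characteristic over respondents who answered yes.

    Subgroups may overlap: one respondent can appear under several questions.
    Returns None for a question nobody answered yes to.
    """
    out = {}
    for q in questions:
        values = [r[characteristic] for r in responses if r[q]]
        out[q] = mean(values) if values else None
    return out


responses = [
    {"age": 25, "owns_car": True, "has_pet": True},
    {"age": 40, "owns_car": True, "has_pet": False},
    {"age": 55, "owns_car": False, "has_pet": True},
]
print(yes_subgroup_means(responses, ["owns_car", "has_pet"], "age"))
```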
Reminder: while GoG assumes data is in the long, tidy format, one could probably be more flexible by allowing more methods to construct the PlottableTable: there could be some public API to build PlottableTables manually, starting from different data structures and grouping information. An added benefit would be if this gave some equivalent of plot(x, [y1 y2]) from Plots for free.
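As a language-neutral illustration (Python, not Julia), converting a wide set of columns into long, tidy rows is what would let plot(x, [y1 y2]) reuse ordinary grouping: each column becomes one group. The wide_to_long helper and the (x, y, group) row layout are hypothetical and are not the actual PlottableTable structure.

```python
def wide_to_long(x, columns, names):
    """Turn one x vector and several y columns into long-format rows (x, y, group).

    Mirrors the idea behind plot(x, [y1 y2]): each column is its own series/group.
    """
    if len(columns) != len(names):
        raise ValueError("need one name per column")
    rows = []
    for name, ys in zip(names, columns):
        if len(ys) != len(x):
            raise ValueError(f"column {name!r} has length {len(ys)}, expected {len(x)}")
        rows.extend((xi, yi, name) for xi, yi in zip(x, ys))
    return rows


x = [1, 2, 3]
y1 = [10, 20, 30]
y2 = [5, 4, 3]
long_rows = wide_to_long(x, [y1, y2], ["y1", "y2"])
for row in long_rows:
    print(row)
```

Once the data is in this shape, the group column plays the same role as any categorical grouping variable.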