Until now I had not fully realized the implications of our reliance on code from wordcloud::comparison.cloud(), which according to its man page does the following:
Let p_{i,j} be the rate at which word i occurs in document j, and p_j be the average across documents(∑_ip_{i,j}/ndocs). The size of each word is mapped to its maximum deviation ( max_i(p_{i,j}-p_j) ), and its angular position is determined by the document where that maximum occurs.
So words that occur at the same rate across partitions are not mapped, and each word is mapped to only one partition. If comparing three groups, for instance, where two talk a lot about "x" and a third about "y", then while group three will have "y" plotted for it, only one of groups one and two will have "x". And if they use "x" at the same rates, neither will have it plotted.
I can think of many reasons why we would want to change this behaviour, or at least provide alternative options.
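To make the mapping described in the man page concrete, here is a minimal base R sketch of the calculation as I read it. It is not the actual wordcloud or quanteda source; `max_deviation()` is a hypothetical helper written only for illustration, and the counts are the same two-document matrix used in the example in this issue.

```r
# Hypothetical helper (not part of wordcloud or quanteda): reproduces the
# size/group mapping described in the comparison.cloud() man page.
max_deviation <- function(counts) {
  # rate at which each word occurs within each document (rows = documents)
  rates <- counts / rowSums(counts)
  # average rate of each word across documents
  mean_rates <- colMeans(rates)
  # deviation of each document's rate from the cross-document average
  deviations <- sweep(rates, 2, mean_rates)
  data.frame(
    word = colnames(counts),
    # document in which the maximum deviation occurs (ties go to the first)
    group = rownames(counts)[apply(deviations, 2, which.max)],
    # the maximum deviation itself, which is what word size is mapped to
    size = apply(deviations, 2, max),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

counts <- matrix(
  c(1, 2, 3, 2, 1,
    3, 2, 1, 2, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("d1", "d2"), letters[1:5])
)

# each word is assigned to exactly one document
res_two <- max_deviation(counts)
print(res_two)

# two identical documents: every deviation is zero, so every size is zero
res_same <- max_deviation(counts[c(1, 1), ])
print(res_same)
```

Under this reading, each word appears under only one document, and with identical documents every size collapses to zero, which would be consistent with the empty plot and the invalid 'cex' error in the example below.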
library("quanteda")
## Package version: 2.1.2

dfmat <- as.dfm(
    matrix(c(
        1, 2, 3, 2, 1,
        3, 2, 1, 2, 3
    ),
    nrow = 2,
    dimnames = list(c("d1", "d2"), letters[1:5]), byrow = TRUE
    )
)
dfmat
## Document-feature matrix of: 2 documents, 5 features (0.0% sparse).
##     features
## docs a b c d e
##   d1 1 2 3 2 1
##   d2 3 2 1 2 3

# all are same size
textplot_wordcloud(dfmat, min_count = 1)

# three different sizes
textplot_wordcloud(dfmat[1, ], min_count = 1)

# empty because there is no "maximum deviation" across documents
textplot_wordcloud(dfmat[c(1, 1), ], min_count = 1, comparison = TRUE)
## Error in graphics::strwidth(word[i], cex = size[i]): invalid 'cex' value

# was this what we were expecting?
textplot_wordcloud(dfmat, min_count = 1, comparison = TRUE)