The ngroups parameter of stm() significantly modifies the gamma matrix #292

Open
tdelcey opened this issue Sep 16, 2024 · 0 comments

Hi,

Ceteris paribus, adding the ngroups parameter to the stm() call appears to modify the gamma matrix. From my understanding of the documentation, this could be normal: we should not expect the model to converge to the same solution as a model fitted without this parameter.

However, while the beta matrix and the global topic prevalence remain broadly similar between the two models, the gamma matrix (the document-topic proportions, stored as theta in the stm object) differs substantially. It appears incorrect: the top documents associated with each topic do not seem to be related to the topic itself.

I tested this with my own data, and a quick check using the sample data from the stm package suggests a similar issue.

Below is a simple example:

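library(stm)
library(dplyr)  # provides %>%, filter() and arrange() used below

# Sample data shipped with the stm package
docs <- stm::poliblog5k.docs
vocab <- stm::poliblog5k.voc
data <- stm::poliblog5k.meta

# Two models that are identical except for ngroups
stm_1 <- stm(documents = docs, vocab = vocab, K = 10, init.type = "Spectral", seed = 123)
stm_2 <- stm(documents = docs, vocab = vocab, K = 10, init.type = "Spectral", seed = 123, ngroups = 2)

# Global topic prevalence: broadly similar between the two models
plot(stm_1, type = "summary", n = 5)
plot(stm_2, type = "summary", n = 5)

# Top documents for topic 6: these differ between the two models
findThoughts(stm_1, texts = data$text, topics = 6, n = 5)
findThoughts(stm_2, texts = data$text, topics = 6, n = 5)

# Document-topic proportions for topic 6, sorted from highest to lowest
gamma_1 <- tidytext::tidy(stm_1, matrix = "gamma") %>%
  filter(topic == 6) %>%
  arrange(desc(gamma))

gamma_2 <- tidytext::tidy(stm_2, matrix = "gamma") %>%
  filter(topic == 6) %>%
  arrange(desc(gamma))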
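
To put a number on the difference, here is a minimal sketch that compares two document-by-topic matrices column by column. `compare_theta` is a hypothetical helper, not part of stm, and it assumes that both matrices have the same dimensions and that topic k in one model corresponds to topic k in the other (plausible here since both models use the same spectral initialization, but still an assumption). The toy matrices only illustrate what "similar prevalence, unrelated documents" looks like numerically; they say nothing about the cause.

```r
# Hypothetical helper (not part of stm): compare two document-by-topic
# matrices with identical dimensions and the same topic ordering.
compare_theta <- function(theta_a, theta_b) {
  stopifnot(identical(dim(theta_a), dim(theta_b)))
  data.frame(
    topic = seq_len(ncol(theta_a)),
    # Corpus-level prevalence of each topic in each model
    prevalence_a = colMeans(theta_a),
    prevalence_b = colMeans(theta_b),
    # Document-level agreement: correlation of per-document proportions
    doc_cor = vapply(
      seq_len(ncol(theta_a)),
      function(k) cor(theta_a[, k], theta_b[, k]),
      numeric(1)
    )
  )
}

# Toy illustration on synthetic data: reordering the rows keeps the
# prevalence columns equal while changing which document gets which row.
set.seed(1)
toy <- matrix(rgamma(200 * 3, shape = 0.5), ncol = 3)
toy <- toy / rowSums(toy)              # each row sums to 1, like theta
shuffled <- toy[sample(nrow(toy)), ]   # same rows, different order
compare_theta(toy, shuffled)

# With the models fitted above (assumes topics are aligned across models):
# compare_theta(stm_1$theta, stm_2$theta)
```

If the prevalence columns agree while `doc_cor` is far from 1 for the two fitted models, that would match what the plots and findThoughts() output suggest.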
