Add native textmodel_lda #30
I managed to get GibbsLDA++ working, so we now have both seeded and regular LDA.
# seeded LDA (replicates https://github.com/koheiw/quanteda.seededlda)
> result10 <- textmodel_lda(dfmt_spnik, verbose = FALSE, seeds = tfmt_spnik)
> terms(result10)
economy politics society diplomacy military nature other
[1,] "company" "parliament" "police" "diplomatic" "army" "human" "going"
[2,] "money" "congress" "school" "embassy" "navy" "sand" "really"
[3,] "market" "politicians" "hospital" "ambassador" "soldiers" "water" "come"
[4,] "bank" "parliamentary" "prison" "treaty" "marine" "syria" "see"
[5,] "industry" "lawmakers" "women" "diplomat" "korea" "syrian" "american"
[6,] "banks" "voters" "man" "diplomats" "korean" "terrorist" "know"
[7,] "markets" "lawmaker" "investigation" "sanctions" "missile" "daesh" "facebook"
[8,] "banking" "politician" "found" "iran" "air" "turkish" "much"
[9,] "china" "uk" "court" "deal" "nuclear" "turkey" "good"
[10,] "chinese" "eu" "children" "meeting" "force" "weapons" "team"
# regular (unseeded) LDA
> result11 <- textmodel_lda(dfmt_spnik, k = 7, verbose = FALSE)
> terms(result11)
topic1 topic2 topic3 topic4 topic5 topic6 topic7
[1,] "korea" "china" "syria" "eu" "going" "uk" "police"
[2,] "korean" "chinese" "syrian" "sanctions" "really" "house" "video"
[3,] "nuclear" "economic" "israel" "iran" "much" "british" "women"
[4,] "missile" "india" "terrorist" "deal" "know" "department" "court"
[5,] "air" "oil" "daesh" "union" "see" "white" "man"
[6,] "nato" "billion" "turkish" "agreement" "come" "campaign" "found"
[7,] "force" "trade" "turkey" "germany" "good" "ukrainian" "children"
[8,] "japan" "project" "weapons" "elections" "something" "secretary" "service"
[9,] "kim" "indian" "saudi" "parliament" "facebook" "ukraine" "swedish"
[10,] "aircraft" "companies" "iraq" "german" "problem" "intelligence" "rights"
My question is: should I separate the function to …
Just my very subjective two cents: I think a dedicated … Which doesn't mean, though, that …
@JBGruber Thanks for the input. I added …
Sorry to be a downer here (and I was offline for two weeks), but seeded LDA is already available through …
topicmodels::LDA is implemented using this library, which I can call directly via Rcpp: https://sourceforge.net/projects/gibbslda/files/
We can call the library the same way topicmodels does:
https://github.com/cran/topicmodels/blob/ade6dc5698f385ad222fd28aa8e90c1a4bd33cf5/R/lda.R#L134-L155
There is a lot going on in that code, but it shouldn't be too complex to implement the minimal functions that users usually need.
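For reference, the core of what GibbsLDA++ does is collapsed Gibbs sampling. The block below is a minimal base-R sketch of that scheme, not the GibbsLDA++ code and not the proposed textmodel_lda(). The names lda_gibbs() and top_terms() are hypothetical, documents are assumed to be integer vectors of word ids, and the defaults alpha = 50/k and beta = 0.1 are assumptions chosen to mirror commonly used GibbsLDA++ settings. A native implementation would presumably run the inner loop in C++ via Rcpp for speed.

```r
# Hypothetical minimal collapsed Gibbs sampler for LDA (base R only).
# A sketch of the sampling scheme GibbsLDA++ implements, not its actual code.
# docs: list of integer vectors of word ids in 1..V, one vector per document.
lda_gibbs <- function(docs, k, V, alpha = 50 / k, beta = 0.1,
                      iter = 200, seed = 1) {
  set.seed(seed)
  nw <- matrix(0L, V, k)             # word-topic counts
  nd <- matrix(0L, length(docs), k)  # document-topic counts
  nt <- integer(k)                   # total tokens assigned to each topic
  # random initial topic assignment for every token
  z <- lapply(docs, function(d) sample.int(k, length(d), replace = TRUE))
  for (m in seq_along(docs)) {
    for (i in seq_along(docs[[m]])) {
      w <- docs[[m]][i]; topic <- z[[m]][i]
      nw[w, topic] <- nw[w, topic] + 1L
      nd[m, topic] <- nd[m, topic] + 1L
      nt[topic] <- nt[topic] + 1L
    }
  }
  for (it in seq_len(iter)) {
    for (m in seq_along(docs)) {
      for (i in seq_along(docs[[m]])) {
        w <- docs[[m]][i]; topic <- z[[m]][i]
        # take the token out of the counts
        nw[w, topic] <- nw[w, topic] - 1L
        nd[m, topic] <- nd[m, topic] - 1L
        nt[topic] <- nt[topic] - 1L
        # full conditional p(z = j | rest), up to a normalising constant
        p <- (nw[w, ] + beta) / (nt + V * beta) * (nd[m, ] + alpha)
        topic <- sample.int(k, 1, prob = p)
        # put the token back under the newly sampled topic
        z[[m]][i] <- topic
        nw[w, topic] <- nw[w, topic] + 1L
        nd[m, topic] <- nd[m, topic] + 1L
        nt[topic] <- nt[topic] + 1L
      }
    }
  }
  phi <- t((nw + beta) / rep(nt + V * beta, each = V))  # k x V topic-word probs
  theta <- (nd + alpha) / (rowSums(nd) + k * alpha)      # D x k doc-topic probs
  list(phi = phi, theta = theta, z = z)
}

# Hypothetical helper: the n most probable words per topic (n x k matrix)
top_terms <- function(phi, vocab, n = 10) {
  apply(phi, 1, function(r) vocab[order(r, decreasing = TRUE)[seq_len(n)]])
}

# Toy usage with a made-up six-word vocabulary
vocab <- c("bank", "money", "market", "army", "navy", "missile")
docs <- list(c(1L, 2L, 3L, 1L, 2L), c(4L, 5L, 6L, 4L, 6L),
             c(1L, 3L, 2L, 3L), c(5L, 4L, 6L, 5L))
fit <- lda_gibbs(docs, k = 2, V = length(vocab), iter = 100)
print(top_terms(fit$phi, vocab, n = 3))
```

top_terms() returns one column per topic, laid out like the terms() output shown earlier in the thread.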
If we implement our own quanteda-native LDA, I will move quanteda.seededlda into this package:
https://github.com/koheiw/quanteda.seededlda