Add native textmodel_lda #30

koheiw · 2020-08-04T15:19:40Z

topicmodels::LDA() (with method = "Gibbs") is implemented using this library, GibbsLDA++, which I can call directly via Rcpp:

https://sourceforge.net/projects/gibbslda/files/

We can call the library the same way topicmodels does:

https://github.com/cran/topicmodels/blob/ade6dc5698f385ad222fd28aa8e90c1a4bd33cf5/R/lda.R#L134-L155

There is a lot going on there, but it shouldn't be too complex to implement the minimal functionality that users usually need.

If we implement a quanteda-native LDA, I will move quanteda.seededlda into this package:

https://github.com/koheiw/quanteda.seededlda
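GibbsLDA++ fits LDA by collapsed Gibbs sampling: each token's topic is resampled in turn from its full conditional, (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), with that token removed from the counts. To make the "minimal functionality" concrete, here is a pure-R sketch of that sampler. It is not GibbsLDA++'s code or the textmodel_lda() in this issue; lda_gibbs() and its arguments are hypothetical, documents are assumed to be integer word-id vectors, the defaults alpha = 50/k and beta = 0.1 are what I believe GibbsLDA++ documents, and the optional `prior` matrix is an assumed hook for seeding (extra pseudo-counts on topic-word cells).

```r
# Minimal collapsed Gibbs sampler for LDA, for illustration only.
# NOT GibbsLDA++'s code or the PR's textmodel_lda(); lda_gibbs() is hypothetical.
#   docs  : list of integer vectors, each a document's word ids in 1..V
#   k, V  : number of topics and vocabulary size
#   prior : optional k x V matrix of extra pseudo-counts (assumed seeding hook)
lda_gibbs <- function(docs, k, V, alpha = 50 / k, beta = 0.1,
                      iter = 200, prior = NULL) {
  if (is.null(prior)) prior <- matrix(0, k, V)
  prior_k <- rowSums(prior)                 # pseudo-counts per topic
  ndk <- matrix(0, length(docs), k)         # document-topic counts
  nkw <- matrix(0, k, V)                    # topic-word counts
  # random initial topic for every token
  z <- lapply(docs, function(d) sample.int(k, length(d), replace = TRUE))
  for (d in seq_along(docs)) {
    for (i in seq_along(docs[[d]])) {
      w <- docs[[d]][i]; tp <- z[[d]][i]
      ndk[d, tp] <- ndk[d, tp] + 1
      nkw[tp, w] <- nkw[tp, w] + 1
    }
  }
  nk <- rowSums(nkw)                        # tokens per topic
  for (it in seq_len(iter)) {
    for (d in seq_along(docs)) {
      for (i in seq_along(docs[[d]])) {
        w <- docs[[d]][i]; tp <- z[[d]][i]
        # take the current token out of the counts
        ndk[d, tp] <- ndk[d, tp] - 1
        nkw[tp, w] <- nkw[tp, w] - 1
        nk[tp] <- nk[tp] - 1
        # unnormalised full conditional p(z_i = t | all other assignments)
        p <- (ndk[d, ] + alpha) * (nkw[, w] + prior[, w] + beta) /
          (nk + prior_k + V * beta)
        tp <- sample.int(k, 1, prob = p)
        # put the token back under the newly sampled topic
        z[[d]][i] <- tp
        ndk[d, tp] <- ndk[d, tp] + 1
        nkw[tp, w] <- nkw[tp, w] + 1
        nk[tp] <- nk[tp] + 1
      }
    }
  }
  phi <- (nkw + prior + beta) / (nk + prior_k + V * beta)   # topic-word, rows sum to 1
  theta <- (ndk + alpha) / (rowSums(ndk) + k * alpha)       # doc-topic, rows sum to 1
  list(phi = phi, theta = theta, z = z)
}

# Toy corpus: documents 1-5 use words 1-3, documents 6-10 use words 4-6
set.seed(1)
docs <- c(rep(list(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1)), 5),
          rep(list(c(4, 5, 6, 4, 5, 6, 4, 5, 6, 4)), 5))
fit <- lda_gibbs(docs, k = 2, V = 6, iter = 50)
```

A real implementation would keep these loops in C++ (which is why calling GibbsLDA++ via Rcpp is attractive); the R version only shows the bookkeeping.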


koheiw · 2020-08-04T15:25:37Z

GibbsLDA++-0.2.tar.gz

koheiw · 2020-08-10T07:29:35Z

I managed to get GibbsLDA++ working, and we now have both seeded and regular LDA.

# seeded LDA (replicates https://github.com/koheiw/quanteda.seededlda)

> result10 <- textmodel_lda(dfmt_spnik, verbose = FALSE, seeds = tfmt_spnik)
> terms(result10)
      economy    politics        society         diplomacy    military   nature      other     
 [1,] "company"  "parliament"    "police"        "diplomatic" "army"     "human"     "going"   
 [2,] "money"    "congress"      "school"        "embassy"    "navy"     "sand"      "really"  
 [3,] "market"   "politicians"   "hospital"      "ambassador" "soldiers" "water"     "come"    
 [4,] "bank"     "parliamentary" "prison"        "treaty"     "marine"   "syria"     "see"     
 [5,] "industry" "lawmakers"     "women"         "diplomat"   "korea"    "syrian"    "american"
 [6,] "banks"    "voters"        "man"           "diplomats"  "korean"   "terrorist" "know"    
 [7,] "markets"  "lawmaker"      "investigation" "sanctions"  "missile"  "daesh"     "facebook"
 [8,] "banking"  "politician"    "found"         "iran"       "air"      "turkish"   "much"    
 [9,] "china"    "uk"            "court"         "deal"       "nuclear"  "turkey"    "good"    
[10,] "chinese"  "eu"            "children"      "meeting"    "force"    "weapons"   "team"  
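The seeded result above has one topic per dictionary key plus an unseeded "other" topic. One simple way to seed a Gibbs-sampled LDA is to give each seed word extra pseudo-counts in its own topic's row of the topic-word counts, which nudges the sampler toward those words. The sketch below builds such a matrix; seed_matrix(), its weight * n_tokens scaling, and the residual "other" row are assumptions for illustration, not necessarily how quanteda.seededlda or this issue's code implements seeding.

```r
# Hypothetical helper (not quanteda.seededlda's API): turn a named list of seed
# words into a k x V matrix of pseudo-counts for seeded LDA.
#   dict     : named list, one character vector of seed words per topic
#   vocab    : character vector of all features (columns of the dfm)
#   n_tokens : total number of tokens in the corpus
#   weight   : assumed scaling; each seed word gets weight * n_tokens pseudo-counts
#   residual : if TRUE, append one unseeded "other" topic (an all-zero row)
seed_matrix <- function(dict, vocab, n_tokens, weight = 0.01, residual = TRUE) {
  topics <- names(dict)
  if (residual) topics <- c(topics, "other")
  m <- matrix(0, nrow = length(topics), ncol = length(vocab),
              dimnames = list(topics, vocab))
  for (topic in names(dict)) {
    # mark every vocabulary entry that is a seed word for this topic
    m[topic, vocab %in% dict[[topic]]] <- weight * n_tokens
  }
  m
}

vocab <- c("company", "money", "army", "navy", "going")
dict <- list(economy = c("company", "money"), military = c("army", "navy"))
seeds <- seed_matrix(dict, vocab, n_tokens = 1000)
```

Scaling the pseudo-counts by corpus size keeps the seeds' influence roughly constant as the corpus grows; a fixed count would be drowned out in large corpora.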

# regular (unseeded) LDA
> result11 <- textmodel_lda(dfmt_spnik, k = 7, verbose = FALSE)
> terms(result11)
      topic1     topic2      topic3      topic4       topic5      topic6         topic7    
 [1,] "korea"    "china"     "syria"     "eu"         "going"     "uk"           "police"  
 [2,] "korean"   "chinese"   "syrian"    "sanctions"  "really"    "house"        "video"   
 [3,] "nuclear"  "economic"  "israel"    "iran"       "much"      "british"      "women"   
 [4,] "missile"  "india"     "terrorist" "deal"       "know"      "department"   "court"   
 [5,] "air"      "oil"       "daesh"     "union"      "see"       "white"        "man"     
 [6,] "nato"     "billion"   "turkish"   "agreement"  "come"      "campaign"     "found"   
 [7,] "force"    "trade"     "turkey"    "germany"    "good"      "ukrainian"    "children"
 [8,] "japan"    "project"   "weapons"   "elections"  "something" "secretary"    "service" 
 [9,] "kim"      "indian"    "saudi"     "parliament" "facebook"  "ukraine"      "swedish" 
[10,] "aircraft" "companies" "iraq"      "german"     "problem"   "intelligence" "rights"

My question is: should I split the function into textmodel_lda(x, k) and textmodel_seededlda(x, dictionary), as in my older package?

JBGruber · 2020-08-11T08:54:08Z

Just my very subjective two cents: I think a dedicated textmodel_seededlda() function would be good advertising for the concept, as it is not yet widely known.

That doesn't mean textmodel_lda() shouldn't be able to do it as well, much like stringi::stri_detect(), which runs stringi::stri_detect_fixed() if you ask it to via its fixed argument.
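A minimal sketch of that design: one general entry point that dispatches on whether seeds were supplied, plus a dedicated wrapper with a more discoverable name. All functions here are hypothetical stand-ins, not the code that was added in this issue; the fit_* functions are placeholder stubs so the example runs on its own.

```r
# Hypothetical API sketch, not the PR's code: one entry point plus a dedicated
# wrapper, similar in spirit to stringi::stri_detect() calling stri_detect_fixed().
# The fit_* functions are placeholder stubs standing in for the real samplers.
fit_lda_unseeded <- function(x, k, ...) list(type = "unseeded", k = k)
fit_lda_seeded <- function(x, seeds, ...) list(type = "seeded", k = length(seeds))

textmodel_lda <- function(x, k = 10, seeds = NULL, ...) {
  # dispatch on whether seed words were supplied
  if (is.null(seeds)) {
    fit_lda_unseeded(x, k, ...)
  } else {
    fit_lda_seeded(x, seeds, ...)
  }
}

# dedicated, more visible name for the seeded variant
textmodel_seededlda <- function(x, dictionary, ...) {
  textmodel_lda(x, seeds = dictionary, ...)
}
```

With this split, textmodel_lda(x, seeds = dict) and textmodel_seededlda(x, dict) do the same thing; the wrapper exists only to make the seeded variant easier to find, and in the seeded case the number of topics follows from the dictionary rather than from k.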

koheiw · 2020-08-11T10:35:07Z

@JBGruber thanks for the input. I added textmodel_seededlda() to make it more visible to users.

kbenoit · 2020-08-18T16:49:12Z

Sorry to be a downer here (and I was offline for two weeks), but seeded LDA is already available through topicmodels::LDA(). See my review on #31.

koheiw added the enhancement (New feature or request) label on Aug 4, 2020

koheiw added the help wanted (Extra attention is needed) label on Aug 4, 2020

koheiw added a commit that referenced this issue on Aug 9, 2020: Draft native LDA function #30 (3e6c67d)

koheiw added a commit that referenced this issue on Aug 10, 2020: Allow seed words for #30 (c22bd43)

koheiw mentioned this issue on Aug 11, 2020 in Add native LDA function #31 (Closed)
