3. Topic Modeling #7
Not important: http://brazenly.blogspot.ca/2016/05/r-text-classification-and-topic_1.html
Main guide that I followed:
DONE:
Tips and tuning parameters:
Based on the following: The graph above I produced using: Select
@neowangkkk
@neowangkkk Working on 40 topics...
@neowangkkk Working on 30 topics...
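For comparing runs with different numbers of topics, here is a rough sketch of one way to do it. This is not the code used in this thread. It assumes the `tm` and `topicmodels` packages are installed, and the toy documents and the candidate values in `candidate_k` are made up for illustration (replace them with the real data and with values such as 30 and 40).

```r
# Sketch only: the toy corpus and candidate k values are illustrative assumptions.
library(tm)           # assumed installed: install.packages("tm")
library(topicmodels)  # assumed installed: install.packages("topicmodels")

docs <- c(
  "cats and dogs are popular pets",
  "dogs chase cats around the garden",
  "stocks and bonds are common investments",
  "investors trade stocks on the market",
  "pets need food and care",
  "the market reacts to investment news"
)

# Build a document-term matrix from the toy corpus
corpus <- VCorpus(VectorSource(docs))
dtm <- DocumentTermMatrix(
  corpus,
  control = list(removePunctuation = TRUE, stopwords = TRUE)
)

# Fit one LDA model per candidate topic count and record its perplexity
candidate_k <- c(2, 3, 4)
perplexities <- sapply(candidate_k, function(k) {
  fit <- LDA(dtm, k = k, control = list(seed = 1234))
  perplexity(fit, newdata = dtm)  # perplexity on the data used for fitting
})

data.frame(k = candidate_k, perplexity = perplexities)
```

Lower perplexity is usually read as a better fit, but perplexity measured on the training data tends to keep improving as k grows, so it is probably worth checking on held-out documents and reading the topics by eye as well.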
@neowangkkk
R code for importing the dataset:

    # set your working directory
    setwd("/Users/Tao/Dropbox/Data/Reddit_data/")
    # install the tidyverse package (only needed once)
    install.packages("tidyverse")
    # load tidyverse so that read_csv() is available
    library(tidyverse)
    # read the file using read_csv
    data <- read_csv(file = "data_full.csv")
    # check summary stats
    summary(data)
@neowangkkk
## can you please replace these:
## can you please remove these:
@neowangkkk I am not sure it was a good idea to exclude so many words. It seems to influence and change the topics. I got rid of the following words:
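To make it easier to rerun the model with and without the excluded words, the removal step could be kept in one place, roughly like the base R sketch below. The example texts, the `custom_stopwords` vector, and the `remove_words()` helper are all hypothetical. They are not the actual word list or code from this thread.

```r
# Hypothetical example: the texts and the custom stop-word list are made up.
texts <- c(
  "I think the game was really good",
  "People say the game is just ok"
)
custom_stopwords <- c("think", "really", "just", "people")

# Lowercase a text, split it into word tokens, and drop the listed words
remove_words <- function(text, words) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[tokens != ""]
  paste(tokens[!tokens %in% words], collapse = " ")
}

cleaned <- vapply(
  texts, remove_words, character(1),
  words = custom_stopwords, USE.NAMES = FALSE
)
cleaned
# [1] "i the game was good" "say the game is ok"
```

Fitting the topic model once on the original texts and once on `cleaned` would show how much the excluded words actually shift the topics.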
You could start with topic modeling first. Dr. Yang was using LDA together with five or six methods such as SVM. I think you can easily find guides online for doing it in R or Python. This is quite mature now.
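As a starting point, a minimal LDA fit in R might look like the sketch below. It assumes the `tm` and `topicmodels` packages (one common choice, not necessarily what Dr. Yang used), and the four toy documents and `k = 2` are placeholders.

```r
# Sketch only: toy documents and k = 2 are illustrative assumptions.
library(tm)           # assumed installed
library(topicmodels)  # assumed installed

docs <- c(
  "the team won the football match",
  "the players trained before the match",
  "the bank raised interest rates",
  "interest rates affect bank loans"
)

# Turn the raw texts into a document-term matrix
corpus <- VCorpus(VectorSource(docs))
dtm <- DocumentTermMatrix(
  corpus,
  control = list(removePunctuation = TRUE, stopwords = TRUE)
)

# Fit LDA with two topics; the seed makes the run reproducible
lda_fit <- LDA(dtm, k = 2, control = list(seed = 1234))

top_terms <- terms(lda_fit, 3)            # top 3 terms per topic
doc_topics <- posterior(lda_fit)$topics   # per-document topic probabilities

top_terms
round(doc_topics, 2)
```

Each row of `doc_topics` sums to 1, so it can be read as the share of each topic in a document.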