Topic Modelling on US Presidential Election 2012 Speeches
We have downloaded some speeches made by B. Obama and M. Romney during the 2012 US Presidential election. The speeches are uploaded to GitHub, and we will try to find the important topics in each candidate's speeches.
Unsupervised ML algorithms are difficult to evaluate because there are no labels against which to measure performance; however, we will use a search method to choose the best parameters. For better management, we will do the topic modelling in 2 steps:
- Data cleaning using spaCy and NLTK
- Model creation using Gensim, with visualization using plots, WordCloud and pyLDAvis. We have chosen the LDA algorithm to start with.