Topic modeling is an unsupervised machine learning technique that scans a set of documents, detects word and phrase patterns within them, and automatically clusters the groups of words and similar expressions that best characterize those documents. The objective is to identify the main topics through frequently repeated words.
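Topic models such as LDA do not read text in order. They work from per-document word counts (a "bag of words"). The sketch below is not LDA itself. It only shows that input representation and the corpus-wide counts behind "frequently repeated words". The toy documents and the small stop-word set are illustrative assumptions.

```python
from collections import Counter
import re

# Illustrative stop words (assumption): very common words that carry no topic signal.
STOPWORDS = {"the", "a", "and", "with"}

def bag_of_words(doc: str) -> Counter:
    """Lowercase a document, split it into alphabetic tokens, drop stop words, and count them."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

docs = [
    "The match ended with a late goal and the fans cheered the goal",
    "The election results gave the party a narrow majority",
    "A late goal decided the match",
]

# Per-document counts: word order is discarded, only frequencies remain.
bows = [bag_of_words(d) for d in docs]

# Corpus-wide counts show which words repeat most often across all documents.
corpus_counts = sum(bows, Counter())
print(corpus_counts.most_common(1))
```

Words such as "goal", "match" and "late" co-occur in two of the three documents. LDA exploits co-occurrence patterns like these to group words into topics.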
Methodology: Topics are extracted with the Latent Dirichlet Allocation (LDA) algorithm, and the results of the Gensim LDA model and the Sklearn (scikit-learn) LDA model are compared.
Data summary: The models were trained on an articles dataset, using only its content column, a column of text strings with 50,000 records.
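Before the content column can be fed to either model, the raw strings need some cleaning. The original text does not describe the preprocessing, so the sketch below is an assumption: the helper `prepare_documents`, the tiny stop-word list and the `min_len` threshold are hypothetical. It skips missing or blank records (such as `NaN` values from a pandas column), lowercases the text and keeps alphabetic tokens.

```python
import re

# Small illustrative stop-word list (assumption); a real run would use a fuller list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}

def prepare_documents(contents, min_len=3):
    """Turn raw strings from the content column into lists of cleaned tokens.

    Missing or blank entries are dropped, text is lowercased, and only
    alphabetic tokens with at least `min_len` characters that are not stop
    words are kept. The token lists are the form gensim's Dictionary expects;
    joining each list with spaces gives plain strings for CountVectorizer.
    """
    prepared = []
    for text in contents:
        if not isinstance(text, str) or not text.strip():
            continue  # skip missing (e.g. NaN) or empty records
        tokens = [
            t for t in re.findall(r"[a-z]+", text.lower())
            if len(t) >= min_len and t not in STOPWORDS
        ]
        if tokens:
            prepared.append(tokens)
    return prepared

contents = ["The Market rallied on strong earnings.", "", None, "Rain is expected in the north."]
result = prepare_documents(contents)
print(result)  # [['market', 'rallied', 'strong', 'earnings'], ['rain', 'expected', 'north']]
```

With 50,000 records, dropping empty rows up front keeps them from producing empty documents, which carry no information for either LDA implementation.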