Topic-Modeling

Topic modeling is an unsupervised machine learning technique that scans a set of documents, detects word and phrase patterns within them, and automatically clusters the word groups and similar expressions that best characterize the collection. The objective is to identify the main topics in the documents through their frequently repeated words.

Methodology: The topics are extracted with the Latent Dirichlet Allocation (LDA) algorithm, and the results of the Gensim LDA model and the Sklearn LDA model are compared.

Data summary: The models were trained on an articles dataset, using its content column: a column of strings with 50,000 records.
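Before training, each string in the content column usually needs to be cleaned and tokenized. The sketch below is an assumption about that step, not the project's actual preprocessing: it uses only the standard library and a small hypothetical stop-word list. Token lists of this form are what Gensim's `corpora.Dictionary` expects, while Sklearn's `CountVectorizer` can take the raw strings directly.

```python
import re

# A small, hypothetical stop-word list; a real run would use a fuller list
# (for example scikit-learn's built-in English list or NLTK's).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
              "for", "is", "was", "it", "after"}

def preprocess(text: str, min_len: int = 3) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= min_len]

# Hypothetical rows standing in for the dataset's content column.
content = [
    "The Central Bank raised interest rates in 2023.",
    "Markets fell after the bank's announcement.",
]
tokenized = [preprocess(doc) for doc in content]
print(tokenized)
# [['central', 'bank', 'raised', 'interest', 'rates'], ['markets', 'fell', 'bank', 'announcement']]
```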

Most common words in the dataset, visualized as a word cloud:
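A word cloud is drawn from corpus-wide word frequencies. The sketch below computes those frequencies with `collections.Counter` on hypothetical tokenized documents. A mapping like this can then be rendered with the third-party `wordcloud` package's `WordCloud.generate_from_frequencies`; that this is how the figure was produced is an assumption.

```python
from collections import Counter

# Hypothetical tokenized documents (e.g. the output of a preprocessing step).
tokenized_docs = [
    ["bank", "interest", "rates", "inflation"],
    ["markets", "bank", "rates"],
    ["team", "match", "goal"],
]

# Count every token across the whole corpus.
frequencies = Counter(token for doc in tokenized_docs for token in doc)
print(frequencies.most_common(2))  # [('bank', 2), ('rates', 2)]
```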

Results:

Sklearn model results:

Gensim model results:
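Both libraries expose a topic-word matrix with one row per topic: `components_` in Sklearn's `LatentDirichletAllocation` and `get_topics()` on Gensim's `LdaModel`. The hypothetical helpers below extract the top words per topic from such a matrix and score how similar two topics' word lists are with Jaccard overlap. This is one simple way to compare the two models, offered as an assumption, not the comparison used in this project.

```python
def top_words(topic_word, vocab, n=5):
    """Return the n highest-weighted words for each topic.

    topic_word: rows = topics, columns = vocabulary entries (non-negative weights).
    vocab: list mapping column index -> word.
    """
    result = []
    for weights in topic_word:
        # Sort column indices by weight, largest first.
        order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        result.append([vocab[i] for i in order[:n]])
    return result

def jaccard(a, b):
    """Share of words two topics have in common (0 = disjoint, 1 = identical)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

# Hypothetical 2-topic, 4-word matrix (e.g. lda.components_.tolist() in Sklearn,
# or lda_model.get_topics().tolist() in Gensim, where column i is dictionary id i).
vocab = ["bank", "rates", "team", "goal"]
topic_word = [
    [0.45, 0.40, 0.10, 0.05],
    [0.05, 0.05, 0.50, 0.40],
]
print(top_words(topic_word, vocab, n=2))  # [['bank', 'rates'], ['team', 'goal']]
print(jaccard(["bank", "rates"], ["rates", "inflation"]))
```

Matching each Sklearn topic to the Gensim topic with the highest overlap gives a rough, label-free way to see which topics both models agree on.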

MaherAhmed0/Topic-Modeling

Folders and files

Latest commit

History

Repository files navigation

Topic-Modeling

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages