Topic modeling is an unsupervised machine learning technique that’s capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents.

MaherAhmed0/Topic-Modeling

Topic-Modeling

Topic modeling is an unsupervised machine learning technique that scans a set of documents, detects word and phrase patterns within them, and automatically clusters the word groups and similar expressions that best characterize the set. The objective here is to identify topics through frequently repeated words.

Methodology: The project uses the Latent Dirichlet Allocation (LDA) algorithm and compares the results of the Gensim LDA model with those of the Sklearn LDA model.

Data summary: The models were trained on an articles dataset. I used the content column, a column of strings with 50,000 records.

(images: data, data2)

Most common words in the dataset, shown as a word cloud. (image: WordCloud)
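A word cloud sizes each word by how often it occurs, so the underlying data is just a word-frequency table. The sketch below shows one way to compute such a table using only the standard library. The sample texts, the tiny stop-word list, and the function name `word_frequencies` are hypothetical and are not taken from the repository.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "as", "of", "to", "in", "is", "on"}


def word_frequencies(texts):
    """Count lowercase alphabetic tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts


# Stand-in for the dataset's content column (assumption).
sample_articles = [
    "The market rallied as the bank cut rates.",
    "The bank reported profits and the market reacted.",
    "A storm hit the coast on Monday.",
]

frequencies = word_frequencies(sample_articles)
top_two = frequencies.most_common(2)
print(top_two)
```

If the image was produced with the `wordcloud` package, as the caption suggests, a mapping like `frequencies` could be passed to `WordCloud().generate_from_frequencies(...)` to draw it.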

Results:

Sklearn model results: (2 images)

Gensim model results: (2 images)
