Classified text documents (news reports) with a Naïve Bayes classifier in Python and compared the performance of the Bernoulli (binomial) and multinomial models. The Bernoulli model reached an accuracy of 63% and the multinomial model 77%.
This README file corresponds to the final_classifier.py script.
Run "final_classifier.py"; it imports the other files it needs.
- The program loads the training and test data from the scikit-learn Python module.
- Features are extracted using TF-IDF.
- Bernoulli and multinomial Naïve Bayes classifiers are built on these features.
- Each model is tested on the test data.
- Performance is evaluated using precision, recall, and the F1 measure.
- The performance of the Bernoulli and multinomial models is compared.
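
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in final_classifier.py: the tiny in-memory corpus, the label names, the `evaluate` helper, and the macro averaging are all assumptions made so the example runs without downloading data. The real script loads its training and test sets from scikit-learn.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical stand-in corpus; the actual script loads its data from scikit-learn.
train_docs = [
    "the team won the match with a late goal",
    "the striker scored twice in the final game",
    "the coach praised the players after the match",
    "the new phone ships with a faster processor",
    "the software update fixes a security bug",
    "the laptop has a larger screen and more memory",
]
train_labels = ["sport", "sport", "sport", "tech", "tech", "tech"]
test_docs = [
    "the players celebrated the goal",
    "the processor update improves memory use",
]
test_labels = ["sport", "tech"]


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit a classifier and return macro-averaged precision, recall, and F1."""
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, predicted, average="macro", zero_division=0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# TF-IDF features: fit the vocabulary on training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

# BernoulliNB binarizes the TF-IDF weights (term present or absent);
# MultinomialNB uses the weights themselves.
results = {
    name: evaluate(model, X_train, train_labels, X_test, test_labels)
    for name, model in [("bernoulli", BernoulliNB()), ("multinomial", MultinomialNB())]
}

for name, scores in results.items():
    print(name, scores)
```

Scores on a toy corpus like this say nothing about the accuracy figures reported above; the sketch only shows how the two models can share one set of TF-IDF features and be compared on the same metrics.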