Naive Bayes is a Bayesian model often used for text classification. Multinomial Naive Bayes specifically classifies observations based on features with a multinomial distribution, i.e., counts of how often each event occurs, such as word counts in a document.
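For intuition, a document becomes a vector of counts over a vocabulary. The sketch below is purely illustrative and does not use Lurn's API; the vocabulary and document are made up.

```ruby
# Illustrative only: a document reduced to word counts over a small vocabulary.
vocabulary = ['ruby', 'is', 'great', 'language']
document   = 'ruby is a great language and ruby is fun'

counts = vocabulary.map { |word| document.split.count(word) }
# => [2, 2, 1, 1] (these counts are the multinomial features the model works with)
```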
Below is a simple text classification example using Multinomial Naive Bayes in Lurn.
- Start with some text documents and their labels
```ruby
documents = [
  'ruby is a great programming language',
  'the giants recently won the world series',
  'java is a compiled programming language',
  'the jets are a football team'
]
labels = ['computers', 'sports', 'computers', 'sports']
```
- Convert them to arrays of word counts representing how many times each word appears in each document. Lurn provides vectorizers for this purpose.
```ruby
vectorizer = Lurn::Text::WordCountVectorizer.new
vectorizer.fit(documents)
vectors = vectorizer.transform(documents)
```
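The fitted vectorizer learns its vocabulary from the training documents, so the same instance should be reused when transforming anything you later want to classify. To get a feel for the output you can inspect a transformed vector; its exact structure is up to Lurn, but each entry corresponds to a word the vectorizer has seen.

```ruby
# Peek at the first training vector (structure depends on Lurn's vectorizer).
puts vectors.first.inspect
```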
- Initialize and train the model
```ruby
model = Lurn::NaiveBayes::MultinomialNaiveBayes.new
model.fit(vectors, labels)
```
- Classify a new document
```ruby
new_vectors = vectorizer.transform(['programming is fun'])

# get the most probable class for the new document given the training data
model.max_class(new_vectors.first)

# get the probability score for the most probable class
model.max_probability(new_vectors.first)
```