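Denote the set of features by $X = \{X_1, \dots, X_d\}$, and the set of classes by $C$.
Given the full training data of size $N$, let $N_c$ be the number of samples that belong to class $c \in C$. We estimate the frequency of each class by
$$\hat{P}(c) = \frac{N_c}{N}.$$
For a categorical feature $X_i$, the probability of observing value $v$ given class $c$ is estimated by
$$\hat{P}(X_i = v \mid c) = \frac{N_{c,i,v} + \alpha}{N_c + \alpha n_i},$$
where $N_{c,i,v}$ is the number of samples in class $c$ with $X_i = v$, $\alpha$ is the smoothing prior and $n_i$ is the total number of values of $X_i$.
For a continuous feature $X_i$, the parameters of $P(X_i \mid c)$ are estimated by maximum likelihood estimation.
A sample $x = (x_1, \dots, x_d)$ is assigned to the class
$$\hat{c} = \arg\max_{c \in C} \hat{P}(c) \prod_{i=1}^{d} \hat{P}(x_i \mid c).$$
For numerical stability, you may instead compute
$$\hat{c} = \arg\max_{c \in C} \Big[ \log \hat{P}(c) + \sum_{i=1}^{d} \log \hat{P}(x_i \mid c) \Big].$$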
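The estimation and log-space prediction steps described above can be sketched in plain Python for the categorical case. This is a minimal illustration, not the software's implementation: the function names `fit_categorical_nb` and `predict`, the toy data, and the choice of `alpha=1.0` are all assumptions made for the example. It assumes the class frequency is the plain ratio of class count to sample count, and that feature-value probabilities use additive smoothing with the number of distinct values seen in training.

```python
import math
from collections import Counter


def fit_categorical_nb(rows, labels, alpha=1.0):
    """Estimate log class frequencies and smoothed log feature-value probabilities.

    Hypothetical helper for illustration; assumes every row has the same length.
    """
    n_samples = len(rows)
    n_features = len(rows[0])
    class_counts = Counter(labels)

    # Distinct values observed for each feature (the "total number of values").
    values = [{row[i] for row in rows} for i in range(n_features)]

    # counts[c][i][v] = number of class-c samples whose feature i equals v.
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            counts[c][i][v] += 1

    # Class frequency: N_c / N, stored as a log.
    log_prior = {c: math.log(k / n_samples) for c, k in class_counts.items()}

    # Additive smoothing: (count + alpha) / (N_c + alpha * n_values), stored as a log.
    log_lik = {
        c: [
            {
                v: math.log(
                    (counts[c][i][v] + alpha)
                    / (class_counts[c] + alpha * len(values[i]))
                )
                for v in values[i]
            }
            for i in range(n_features)
        ]
        for c in class_counts
    }
    return log_prior, log_lik


def predict(row, log_prior, log_lik):
    """Pick the class with the largest sum of log probabilities.

    Summing logs avoids the underflow that multiplying many small
    probabilities can cause. Values unseen in training raise KeyError here.
    """
    scores = {
        c: log_prior[c] + sum(log_lik[c][i][v] for i, v in enumerate(row))
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy data (made up for the example): two categorical features, two classes.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
log_prior, log_lik = fit_categorical_nb(rows, labels, alpha=1.0)
print(predict(("rainy", "cool"), log_prior, log_lik))
```

With `alpha=1.0`, a value never seen with a given class still receives a small nonzero probability, so one missing combination cannot zero out the whole product.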
- distribution: the form of the class-conditional distribution assumed for the features
- smoothing: additive smoothing parameter
- fit_prior: whether to learn class prior probabilities from data
- class_prior: user-specified prior probabilities of the classes
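How fit_prior and class_prior interact can be sketched as follows. The sketch assumes the two parameters behave like the same-named options on sklearn's discrete Naïve Bayes estimators: user-specified priors take precedence, otherwise priors are learned from the data when fit_prior is true, and a uniform prior is used when it is false. The helper `class_log_priors` is hypothetical and is not part of the software or of sklearn, and the assumption that class_prior entries follow the sorted class order is made for illustration only.

```python
import math
from collections import Counter


def class_log_priors(labels, fit_prior=True, class_prior=None):
    """Return log prior probabilities per class (hypothetical helper)."""
    classes = sorted(set(labels))

    # User-specified priors win; they are not adjusted according to the data.
    if class_prior is not None:
        if len(class_prior) != len(classes):
            raise ValueError("class_prior needs one entry per class")
        return {c: math.log(p) for c, p in zip(classes, class_prior)}

    # Learn priors from the empirical class frequencies.
    if fit_prior:
        counts = Counter(labels)
        return {c: math.log(counts[c] / len(labels)) for c in classes}

    # Otherwise fall back to a uniform prior over the classes.
    return {c: -math.log(len(classes)) for c in classes}


labels = ["a", "a", "a", "b"]
learned = class_log_priors(labels)                         # from data
uniform = class_log_priors(labels, fit_prior=False)        # equal weights
fixed = class_log_priors(labels, class_prior=[0.1, 0.9])   # user-specified
```

Fixing the priors by hand can help when the training set's class balance is known not to match the balance expected at prediction time.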
Check out the documentation listed below to view the attributes that are available in sklearn but not exposed to the user in the software.
- sklearn tutorial on Naïve Bayes.
- sklearn LogisticRegression documentation.