Sparse Learners

The learners.sparse package contains Learners meant for sparse data. By sparse data, we mean data where only the non-zero inputs are given explicitly.

The MLProblems for these Learners should have inputs that decompose into two parts: the values of the non-zero input variables and the corresponding indices of those variables. For instance, the non-sparse input vector [0,-1,0,0,0.5] would correspond to the pair ([-1,0.5],[1,4]) in sparse format, with indices starting at 0.
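To make the (values, indices) convention concrete, here is a minimal sketch that converts between the dense and sparse representations using plain Python lists and 0-based indices. The helper names `to_sparse` and `to_dense` are hypothetical and not part of the package, which may well store these parts as NumPy arrays instead.

```python
def to_sparse(dense):
    """Convert a dense vector to a (values, indices) pair, keeping only non-zero entries."""
    values = [x for x in dense if x != 0]
    indices = [i for i, x in enumerate(dense) if x != 0]
    return values, indices


def to_dense(values, indices, input_size):
    """Rebuild a dense vector of length input_size from a (values, indices) pair."""
    dense = [0] * input_size
    for v, i in zip(values, indices):
        dense[i] = v
    return dense


print(to_sparse([0, -1, 0, 0, 0.5]))   # ([-1, 0.5], [1, 4])
print(to_dense([-1, 0.5], [1, 4], 5))  # [0, -1, 0, 0, 0.5]
```

Note that the dense length (the input size) cannot be recovered from the sparse pair alone, which is why it has to be supplied separately.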

This package is divided into modules according to the type of task the Learners solve. Currently, the available modules are:

  • learners.sparse.classification: classification Learners on sparse inputs.

Sparse Classification Learners

Learners in this module are meant for classification problems on sparse data. They normally require (at least) the metadata 'targets'. The MLProblems for these Learners should be iterators over pairs of inputs and targets, where each input is in the sparse format described above and each target is a class index.
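The shape of such a problem can be sketched with a small stand-in class. `ToySparseProblem` is hypothetical and only mimics the described behaviour (iteration over (input, target) pairs plus a `metadata` dictionary); it is not the package's MLProblem class, and treating 'targets' as the list of possible class labels is an assumption.

```python
class ToySparseProblem:
    """Hypothetical stand-in for an MLProblem: iterates over (input, target) pairs,
    where each input is a (values, indices) pair and each target is a class index."""

    def __init__(self, examples, metadata):
        self.examples = examples
        self.metadata = metadata

    def __iter__(self):
        return iter(self.examples)

    def __len__(self):
        return len(self.examples)


problem = ToySparseProblem(
    examples=[
        (([2, 1], [0, 3]), 0),  # word 0 twice, word 3 once, class 0
        (([1, 4], [1, 2]), 1),  # word 1 once, word 2 four times, class 1
        (([3], [0]), 0),        # word 0 three times, class 0
    ],
    # Assumed layout: 'targets' lists the possible labels, 'input_size' is the vocabulary size
    metadata={"targets": ["sports", "politics"], "input_size": 4},
)

# Sanity-check every example against the metadata
for (values, indices), target in problem:
    assert 0 <= target < len(problem.metadata["targets"])
    assert all(0 <= i < problem.metadata["input_size"] for i in indices)
```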

The currently implemented algorithms are:

  • MultinomialNaiveBayesClassifier: a multinomial naive Bayes classifier.
class learners.sparse.classification.MultinomialNaiveBayesClassifier(dirichlet_prior_parameter=1)

Multinomial Naive Bayes Classifier.

This simple classifier has been found useful for text classification. Each non-zero input feature is treated as an indication of the presence of a word, and its value is treated as the frequency of that word.

Option dirichlet_prior_parameter controls the amount of regularization.
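The exact role of the option is not spelled out here. In the usual formulation of multinomial naive Bayes, a symmetric Dirichlet prior with parameter α adds α pseudo-counts to every word count. Assuming this implementation follows that convention, the probability of word w under class c would be estimated as below, where N_{c,w} is the summed frequency of word w over the training inputs of class c and V is the input size:

```latex
\hat{\theta}_{c,w} \;=\; \frac{\alpha + N_{c,w}}{\alpha V + \sum_{w'=1}^{V} N_{c,w'}}
```

With α = 1 (the default) this is Laplace (add-one) smoothing, the variant used by McCallum and Nigam. As α approaches 0 the estimate tends to the unsmoothed relative frequency, and as α grows it is pulled toward the uniform value 1/V, which is the sense in which larger values mean stronger regularization.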

Required metadata:

  • 'targets'
  • 'input_size'
Reference: McCallum and Nigam, "A Comparison of Event Models for Naive Bayes Text Classification".
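The following is a minimal sketch of multinomial naive Bayes on (values, indices) inputs, assuming the Dirichlet-smoothed word estimates of the standard formulation. It plausibly illustrates why both pieces of metadata are needed: the number of classes comes from 'targets' and the vocabulary size V from 'input_size'. The class `SparseMultinomialNB` and its `train`/`predict` methods are hypothetical and do not reproduce the MultinomialNaiveBayesClassifier API.

```python
import math


class SparseMultinomialNB:
    """Hypothetical multinomial naive Bayes sketch for sparse (values, indices) inputs."""

    def __init__(self, dirichlet_prior_parameter=1.0):
        # Must be > 0 here, otherwise unseen words would give log(0)
        self.alpha = dirichlet_prior_parameter

    def train(self, examples, n_classes, input_size):
        # word_counts[c][w]: summed frequency of word w over inputs of class c
        word_counts = [[0.0] * input_size for _ in range(n_classes)]
        class_counts = [0] * n_classes
        for (values, indices), target in examples:
            class_counts[target] += 1
            for v, i in zip(values, indices):
                word_counts[target][i] += v
        n_examples = sum(class_counts)
        # Class priors (this sketch assumes every class occurs at least once)
        self.log_prior = [math.log(n / n_examples) for n in class_counts]
        # Dirichlet-smoothed word probabilities, stored as logs
        self.log_theta = []
        for counts in word_counts:
            denom = self.alpha * input_size + sum(counts)
            self.log_theta.append([math.log((self.alpha + n) / denom) for n in counts])

    def predict(self, values, indices):
        # Only the non-zero features enter the score, so the cost per class
        # grows with the number of non-zero entries, not with input_size.
        scores = [
            self.log_prior[c]
            + sum(v * self.log_theta[c][i] for v, i in zip(values, indices))
            for c in range(len(self.log_prior))
        ]
        return max(range(len(scores)), key=scores.__getitem__)


examples = [
    (([2, 1], [0, 3]), 0),
    (([1, 4], [1, 2]), 1),
    (([3], [0]), 0),
]
model = SparseMultinomialNB(dirichlet_prior_parameter=1.0)
model.train(examples, n_classes=2, input_size=4)
print(model.predict([1], [0]))  # 0
print(model.predict([2], [2]))  # 1
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together for long documents.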