Milk Learners

The package learners.third_party.milk contains modules for learning algorithms using the Milk library. These modules all require that the Milk library be installed.

To use milk through MLPython, do the following:

  1. download milk from here: http://pypi.python.org/pypi/milk/#downloads

  2. install milk (instructions are from Milk’s INSTALL.rst); the following should work:

    python setup.py install
    
  3. IMPORTANT: when importing milk for the first time, make sure you are not in its source directory.

And that should do it! Try ‘import milk’ to see if your installation is working.

Currently, learners.third_party.milk contains the following modules:

  • learners.third_party.milk.classification: Classifiers from the Milk library.

Milk Classifiers

The learners.third_party.milk.classification module contains classifiers based on the Milk library:

  • TreeClassifier: Decision tree classifier.
class learners.third_party.milk.classification.TreeClassifier(criterion='information_gain', min_split=4, include_entropy=False)

Decision Tree Classifier using the Milk library

A decision tree classifier (currently implements the greedy ID3 algorithm, without any pruning).

Option criterion should be a string. Set it to 'information_gain' to use the information gain splitting criterion (see http://en.wikipedia.org/wiki/Information_gain_in_decision_trees), or to 'z1_loss' to use the 0-1 classification accuracy as the splitting criterion (default: 'information_gain').

Option min_split is a threshold: a node will not be split further if it contains fewer than min_split examples (default: 4).

If option include_entropy is True, the information gain criterion will include the original entropy (default: False).
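The two criteria and the include_entropy flag can be illustrated in plain Python. This is a sketch of the scoring idea only; the function names below are illustrative and are not Milk's actual API:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(left, right, include_entropy=False):
    """Score a candidate split into `left` and `right` child label sets.

    With include_entropy=False, only the negated weighted child entropy is
    returned; this ranks candidate splits identically but skips computing
    the parent (original) entropy.  With include_entropy=True, the parent
    entropy is added back, giving the textbook information gain.
    """
    n = len(left) + len(right)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(left + right) - weighted if include_entropy else -weighted

def z1_score(left, right):
    """0-1 criterion: number of examples classified correctly by the
    majority class of their child node."""
    majority = lambda labels: Counter(labels).most_common(1)[0][1]
    return majority(left) + majority(right)
```

For example, a perfect split of the labels [0, 0, 1, 1] into [0, 0] and [1, 1] yields an information gain of 1.0 bit (with include_entropy=True) and a z1_score of 4, since the majority class of each child classifies all four examples correctly.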

Required metadata:

  • 'targets'
  • 'class_to_id'

TODO:

  • Support Milk’s R and subsample options, which enable random sampling of splitting decisions.
train(trainset)

Trains the Milk Tree Learner.

use(dataset)

Outputs the class predictions for dataset.

test(dataset)

Outputs the result of use(dataset) and the classification error cost for each example in the dataset.
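To give a feel for the overall train/use/test workflow, here is a minimal greedy ID3-style tree in plain Python, restricted to binary features and without pruning. This is a toy analogue only: MLPython datasets and their 'targets'/'class_to_id' metadata are not modeled, and none of the names below come from Milk or MLPython:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(X, y, min_split=4):
    """Greedy ID3-style tree over binary features; no pruning (toy analogue
    of train()).  Leaves are class labels; internal nodes are tuples
    (feature, subtree_if_0, subtree_if_1)."""
    # Stop splitting if the node is pure or has fewer than min_split examples.
    if len(set(y)) == 1 or len(y) < min_split:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    best, parent = None, entropy(y)
    for f in range(len(X[0])):
        li = [i for i, x in enumerate(X) if x[f] == 0]
        ri = [i for i, x in enumerate(X) if x[f] == 1]
        if not li or not ri:
            continue  # split sends every example to one side; useless
        gain = parent - (len(li) / len(y)) * entropy([y[i] for i in li]) \
                      - (len(ri) / len(y)) * entropy([y[i] for i in ri])
        if best is None or gain > best[0]:
            best = (gain, f, li, ri)
    if best is None:
        return Counter(y).most_common(1)[0][0]
    _, f, li, ri = best
    return (f,
            build_tree([X[i] for i in li], [y[i] for i in li], min_split),
            build_tree([X[i] for i in ri], [y[i] for i in ri], min_split))

def predict(tree, x):
    """Walk the tree down to a leaf (toy analogue of use() per example)."""
    while isinstance(tree, tuple):
        f, zero_branch, one_branch = tree
        tree = zero_branch if x[f] == 0 else one_branch
    return tree

# Example: learn the binary AND function from eight examples.
X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 2
y = [a & b for a, b in X]
tree = build_tree(X, y)
# Analogue of use(): class predictions; analogue of test(): 0-1 error per example.
predictions = [predict(tree, x) for x in X]
errors = [int(p != t) for p, t in zip(predictions, y)]
```

On this dataset the greedy procedure recovers AND exactly, so every per-example error is 0; the real TreeClassifier reports the same kind of per-example classification error cost from test(dataset).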