
TreeLearn Learners

The package learners.third_party.treelearn contains modules for learning algorithms implemented in the TreeLearn library. These modules all require that the TreeLearn and scikit-learn libraries be installed.

To install scikit-learn, one option is to use easy_install:

easy_install -U scikit-learn

For other ways of installing scikit-learn, see http://scikit-learn.sourceforge.net/dev/install.html#installing-an-official-release.

To install TreeLearn:

  1. Download TreeLearn from this link: https://github.com/capitalk/treelearn/zipball/master

  2. Unzip the downloaded archive and, from the extracted directory, run (possibly with sudo):

    python setup.py install

And that should do it!
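One quick way to confirm the installation is to ask Python whether both packages can be found, without actually importing them. This is a minimal sketch, assuming scikit-learn is importable as `sklearn` and that TreeLearn installs a top-level package named `treelearn`; the helper `check_dependencies` is hypothetical and not part of the library.

```python
import importlib.util


def check_dependencies(modules=("sklearn", "treelearn")):
    """Report which top-level modules are importable (hypothetical helper).

    Assumption: scikit-learn is imported as ``sklearn`` and TreeLearn
    installs a top-level package called ``treelearn``.
    """
    # find_spec returns None for a top-level module that cannot be found.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If either entry prints MISSING, repeat the corresponding installation step above.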

Currently, learners.third_party.treelearn contains the following modules:

  • learners.third_party.treelearn.classification: Classifiers from the TreeLearn library.
  • learners.third_party.treelearn.regression: Regression models from the TreeLearn library.

TreeLearn Classifiers

The learners.third_party.treelearn.classification module contains classifiers from the TreeLearn library:

  • RandomForest: Random forest classifier.
class learners.third_party.treelearn.classification.RandomForest(n_trees=50, sample_percent=0.5, n_features_per_node=None, min_leaf_size=1, max_height=100, max_thresholds=None, seed=1234)

Random forest classifier based on the TreeLearn library.

Option n_trees is the number of trees to train in the ensemble (default = 50).

Option sample_percent is the proportion of the dataset sampled to train each tree (default = 0.5).
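To make the effect of n_trees, sample_percent and seed concrete, the sketch below draws one subset of row indices per tree. It assumes each tree sees int(sample_percent * n_examples) rows drawn without replacement; TreeLearn's actual sampling scheme (for example, with replacement) may differ, and `sample_tree_indices` is a hypothetical helper.

```python
import numpy as np


def sample_tree_indices(n_examples, n_trees=50, sample_percent=0.5, seed=1234):
    """Draw one row-index subset per tree (hypothetical helper).

    Assumption: rows are sampled without replacement; TreeLearn may
    use a different scheme.
    """
    rng = np.random.RandomState(seed)  # fixed seed -> reproducible subsets
    n_sample = max(1, int(sample_percent * n_examples))
    return [rng.choice(n_examples, size=n_sample, replace=False)
            for _ in range(n_trees)]
```

With 10 examples and the default sample_percent, each tree would be trained on 5 distinct rows, and the same seed always yields the same subsets.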

Option n_features_per_node is the number of inputs (features) to consider when splitting a tree node. The default (None) is to use the log of the input size.
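A minimal sketch of how the per-node feature subset could be chosen. The base of the logarithm and the rounding used by TreeLearn are not stated here, so this assumes the natural log rounded to the nearest integer, with at least one feature; `features_for_node` is a hypothetical helper.

```python
import math

import numpy as np


def features_for_node(input_size, n_features_per_node=None, rng=None):
    """Pick the feature indices considered at one tree node (hypothetical).

    Assumption: the default (None) is round(ln(input_size)), at least 1.
    """
    if n_features_per_node is None:
        n_features_per_node = max(1, int(round(math.log(input_size))))
    n = min(n_features_per_node, input_size)  # cannot exceed available inputs
    rng = rng if rng is not None else np.random.RandomState(1234)
    return np.sort(rng.choice(input_size, size=n, replace=False))
```

Under these assumptions, a 100-dimensional input would give 5 candidate features per node (ln 100 is about 4.6).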

Option min_leaf_size is the minimum number of training examples a node must contain to be split; nodes with fewer examples are not split (default = 1).

Option max_height is the maximum height of the trees (default = 100).
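The min_leaf_size and max_height options act as stopping rules during tree growth. The sketch below combines them into one test; it assumes depth is counted from the root (root depth = 0), and `should_split` is hypothetical. A full implementation would typically also stop on nodes whose examples all share one class.

```python
def should_split(n_examples, depth, min_leaf_size=1, max_height=100):
    """Hypothetical stopping test combining min_leaf_size and max_height.

    Assumption: depth counts edges from the root (root depth = 0).
    """
    if n_examples < 2:
        return False  # a single example cannot be divided further
    if n_examples < min_leaf_size:
        return False  # fewer examples than min_leaf_size: keep as a leaf
    if depth >= max_height:
        return False  # tree has reached its maximum height
    return True
```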

Option max_thresholds is the maximum number of thresholds to consider when splitting on an input (feature). These thresholds are evenly spaced between the minimum and maximum values of that input. The default (None) is to consider every midpoint between consecutive unique input values.
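The two threshold strategies can be sketched as follows. This assumes the evenly spaced thresholds exclude the minimum and maximum themselves, since a split there would send every example to one side; `candidate_thresholds` is a hypothetical helper, not TreeLearn's code.

```python
import numpy as np


def candidate_thresholds(values, max_thresholds=None):
    """Candidate split points for one input (hypothetical helper)."""
    unique = np.unique(np.asarray(values, dtype=float))  # sorted, deduplicated
    if unique.size < 2:
        return np.empty(0)  # a constant input offers no useful split
    if max_thresholds is None:
        # Default: every midpoint between consecutive unique values.
        return (unique[:-1] + unique[1:]) / 2.0
    # Assumption: max_thresholds interior points, evenly spaced in [min, max].
    return np.linspace(unique[0], unique[-1], max_thresholds + 2)[1:-1]
```

For input values [1, 3, 3, 5] the default gives thresholds 2 and 4, while values spanning [0, 10] with max_thresholds=4 give 2, 4, 6 and 8. Capping the count trades split precision for speed on inputs with many distinct values.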

Option seed is the seed of the random number generator (default = 1234).

Required metadata:

  • 'input_size'
  • 'targets'
  • 'class_to_id'
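As an illustration of the three required metadata entries, the sketch below derives them from a list of (input, target) pairs. The list-of-pairs layout and the helper `build_classification_metadata` are hypothetical; it assumes 'targets' holds the set of class labels and 'class_to_id' maps each label to an integer id.

```python
def build_classification_metadata(examples):
    """Build the metadata RandomForest requires (hypothetical helper).

    examples: list of (input_vector, target_label) pairs (assumed layout).
    """
    inputs, labels = zip(*examples)
    classes = sorted(set(labels))  # deterministic id assignment
    return {
        "input_size": len(inputs[0]),
        "targets": set(classes),
        "class_to_id": {label: i for i, label in enumerate(classes)},
    }
```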
train(trainset)

Trains a random forest using TreeLearn.

use(dataset)

Outputs the class predictions and the class probabilities for dataset.
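One common way a random forest turns per-tree outputs into class predictions and probabilities is majority voting, with vote fractions as probabilities. The sketch below shows that scheme; it is an assumption, since TreeLearn may instead average per-tree probability estimates, and `aggregate_votes` is hypothetical.

```python
import numpy as np


def aggregate_votes(tree_predictions, n_classes):
    """Combine per-tree class ids into predictions and probabilities.

    tree_predictions: int array of shape (n_trees, n_examples).
    Assumption: majority vote; probabilities are vote fractions.
    """
    tree_predictions = np.asarray(tree_predictions)
    n_trees, n_examples = tree_predictions.shape
    counts = np.zeros((n_examples, n_classes))
    rows = np.arange(n_examples)
    for preds in tree_predictions:
        counts[rows, preds] += 1  # one vote per example from this tree
    probs = counts / n_trees
    return probs.argmax(axis=1), probs  # ties resolve to the lowest class id
```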

test(dataset)

Outputs the result of use(dataset) and the classification error cost for each example in the dataset.
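The classification error cost is usually the 0/1 loss: 1 for a misclassified example and 0 otherwise. A minimal sketch, assuming predictions and targets are given as class ids; `classification_errors` is hypothetical.

```python
import numpy as np


def classification_errors(predictions, targets):
    """Per-example 0/1 classification error (1.0 when misclassified)."""
    return (np.asarray(predictions) != np.asarray(targets)).astype(float)
```

Averaging these costs over the dataset gives the overall error rate.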