Feature Learners

The learners.features module contains FeatureLearner objects, meant for feature or representation learning. The MLProblems for these Learners should be iterators over inputs. Their output should be a new feature representation of the input.

The currently implemented algorithms are:

  • FeatureLearner: The general interface for learners of features.
  • CenterAndNormalize: Removes the input’s mean and divides by its standard deviation, for each input.
  • PCA: Principal Component Analysis learner.
  • ZCA: ZCA whitening learner.
  • RBM: Restricted Boltzmann Machine learner for feature extraction.
  • k_means: The k-means clustering algorithm.
  • FeaturePipeline: A learner made from a pipeline of simpler FeatureLearner objects.
class learners.features.FeatureLearner[source]

Interface for all Learner objects that learn features.

The only additional requirement from Learner is to define a method compute_features(example) that outputs the feature representation for some given example (normally a single input).

compute_features(example)[source]

Return the feature representation of some given example.

A general implementation is provided here, but it is recommended that classes inheriting from FeatureLearner override it.
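To illustrate the interface (this is a toy sketch, not the library's actual code; the class name and identity mapping are ours), a minimal FeatureLearner-style object only needs a way to train and a compute_features method:

```python
import numpy as np

class IdentityFeatureLearner:
    """Toy stand-in for a FeatureLearner: its feature
    representation is simply the input itself."""

    def train(self, trainset):
        # The identity mapping has no parameters to learn.
        pass

    def compute_features(self, example):
        # Return the feature representation of a single input.
        return np.asarray(example, dtype=float)

    def use(self, dataset):
        # Apply compute_features to every input in the dataset.
        return [self.compute_features(x) for x in dataset]
```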

class learners.features.CenterAndNormalize(regularizer=10)[source]

Removes the input’s mean and divides by its standard deviation, for each input.

Note that the mean and standard deviation are computed for each input vector individually, not over the whole dataset.

Option regularizer is a small constant to add to the standard deviation, to avoid divisions by 0.

Required metadata:

  • 'input_size': Size of the inputs.
train(trainset)[source]

Does nothing: no training is needed.

use(dataset)[source]

Outputs the centered and normalized version of each input.

test(dataset)[source]

Outputs the squared difference between the processed and original input.
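The per-input operation can be sketched in plain NumPy (an illustrative sketch, not the library's implementation; the function name is ours). Each input is shifted by its own mean and scaled by its own standard deviation, with the regularizer guarding against division by zero:

```python
import numpy as np

def center_and_normalize(x, regularizer=10):
    """Subtract the input's own mean and divide by its own
    standard deviation plus a small regularizer."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + regularizer)
```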

class learners.features.PCA(n_components, regularizer=1e-10)[source]

Principal Component Analysis.

Outputs the input’s projection on the principal components, so as to obtain a representation with mean zero and identity covariance.

Option n_components is the number of principal components to compute.

Option regularizer is a small constant to add to the diagonal of the estimated covariance matrix (default=1e-10).

Required metadata:

  • 'input_size': Size of the inputs.
train(trainset)[source]

Extract principal components.

use(dataset)[source]

Outputs the projection on the principal components, so as to obtain a representation with mean zero and identity covariance.

test(dataset)[source]

Outputs the squared error of the reconstructed inputs.
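The whitening projection described above can be sketched as follows (an illustrative NumPy version under the stated options, not the library's code; function names are ours). The covariance is estimated on the training data, regularized on its diagonal, and the leading eigenvectors are rescaled so projected data has identity covariance:

```python
import numpy as np

def pca_fit(X, n_components, regularizer=1e-10):
    """Fit a whitening PCA projection on X (n_examples x input_size)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / X.shape[0] + regularizer * np.eye(X.shape[1])
    eigval, eigvec = np.linalg.eigh(cov)
    # Keep the n_components highest-variance directions, scaled
    # so the projected data has identity covariance.
    order = np.argsort(eigval)[::-1][:n_components]
    W = eigvec[:, order] / np.sqrt(eigval[order])
    return mean, W

def pca_use(X, mean, W):
    # Project centered inputs on the principal components.
    return (X - mean) @ W
```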

class learners.features.ZCA(regularizer=1e-10)[source]

ZCA whitening preprocessing.

Outputs the whitened input, which has the same dimensionality as the original input but with mean zero and identity covariance.

Option regularizer is a small constant to add to the diagonal of the estimated covariance matrix (default=1e-10).

Required metadata:

  • 'input_size': Size of the inputs.
train(trainset)[source]

Extract principal components required for ZCA whitening.

use(dataset)[source]

Outputs the whitened inputs.

test(dataset)[source]

Outputs the squared error between the inputs and whitened inputs.
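ZCA differs from PCA only in that the whitened data is rotated back to the original axes, which preserves dimensionality. A sketch under the same assumptions as the PCA example above (illustrative NumPy, not the library's code):

```python
import numpy as np

def zca_fit(X, regularizer=1e-10):
    """Fit the symmetric ZCA whitening transform E D^{-1/2} E^T."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / X.shape[0] + regularizer * np.eye(X.shape[1])
    eigval, eigvec = np.linalg.eigh(cov)
    # Whiten in the eigenbasis, then rotate back to input space.
    W = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
    return mean, W

def zca_use(X, mean, W):
    # Whitened output keeps the original dimensionality.
    return (X - mean) @ W
```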

class learners.features.RBM(n_stages, learning_rate=0.01, decrease_constant=0, hidden_size=100, l1_regularization=0, seed=1234)[source]

Restricted Boltzmann Machine for feature learning.

Option n_stages is the number of training iterations.

Options learning_rate and decrease_constant correspond to the learning rate and decrease constant used for stochastic gradient descent.

Option hidden_size should be a positive integer specifying the number of hidden units (features).

Option l1_regularization is the weight of L1 regularization on the connection matrix.

Option seed determines the seed for randomly initializing the weights.

Required metadata:

  • 'input_size': Size of the inputs.
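RBMs of this kind are typically trained by stochastic gradient descent with contrastive divergence; the sketch below shows one CD-1 update and the hidden-probability features for a binary RBM. This is a generic illustration of the technique, not the library's implementation, and all names are ours:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b, c, v0, learning_rate, rng):
    """One CD-1 stochastic gradient step for a binary RBM.
    W: (input_size, hidden_size); b, c: visible and hidden biases;
    v0: one visible input vector."""
    # Up pass: hidden probabilities and a binary sample.
    h0_prob = sigmoid(v0 @ W + c)
    h0 = (rng.rand(*h0_prob.shape) < h0_prob).astype(float)
    # Down pass: reconstruct the visible units, then go up again.
    v1_prob = sigmoid(h0 @ W.T + b)
    v1 = (rng.rand(*v1_prob.shape) < v1_prob).astype(float)
    h1_prob = sigmoid(v1 @ W + c)
    # Gradient estimate: positive minus negative statistics.
    W += learning_rate * (np.outer(v0, h0_prob) - np.outer(v1, h1_prob))
    b += learning_rate * (v0 - v1)
    c += learning_rate * (h0_prob - h1_prob)
    return W, b, c

def rbm_features(W, c, v):
    # Feature representation: hidden unit activation probabilities.
    return sigmoid(v @ W + c)
```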
class learners.features.k_means(n_stages=50, n_clusters=10, use_triangle_activation=False, seed=1234)[source]

The k-means clustering algorithm.

We use the first few examples in the training set to initialize the cluster means.

For a given input, the Learner outputs a vector in which the component at the index of the selected cluster is 1, and all others 0.

Option n_stages is the number of iterations over the training set (default=50).

Option n_clusters is the number of clusters (default=10).

Option use_triangle_activation is True if the triangle activation function should be used to compute features. If False, then a hard one-hot feature representation is used (default=False).

Option seed is the seed for the random number generator (only used when some clusters are initially empty, to spawn new clusters).

Required metadata:

  • 'input_size': Size of the inputs.
train(trainset)[source]

Extract clusters.

use(dataset)[source]

For a given input, the Learner outputs a vector in which the component at the index of the selected cluster is 1, and all others 0.

test(dataset)[source]

Outputs the squared error of the reconstructed inputs.
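The training loop and the hard one-hot features can be sketched as follows (illustrative NumPy, not the library's code; function names are ours). It follows the behaviour described above: means are initialized from the first examples, and the seed is only used to respawn empty clusters:

```python
import numpy as np

def kmeans(X, n_clusters, n_stages, seed=1234):
    """Plain k-means on X (n_examples x input_size)."""
    rng = np.random.RandomState(seed)
    # Initialize cluster means from the first examples.
    means = X[:n_clusters].copy()
    for _ in range(n_stages):
        # Assign each example to its nearest cluster mean.
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:
                means[k] = members.mean(axis=0)
            else:
                # Respawn an empty cluster at a random example.
                means[k] = X[rng.randint(len(X))]
    return means, labels

def one_hot_features(x, means):
    # Hard code: 1 at the nearest cluster's index, 0 elsewhere.
    k = ((means - x) ** 2).sum(axis=1).argmin()
    f = np.zeros(len(means))
    f[k] = 1.0
    return f
```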

class learners.features.FeaturePipeline(feature_learners)[source]

Learns a pipeline of FeatureLearners.

Outputs the result of applying each trained feature learner sequentially (i.e. stacked features).

Option feature_learners is the list of FeatureLearner objects to train, applied in the given order.

Required metadata:

  • 'input_size': Size of the inputs.
train(trainset)[source]

Trains the pipeline of features.

use(dataset)[source]

Outputs the result of applying each FeatureLearner sequentially.

test(dataset)[source]

Returns the outputs and costs based on the last FeatureLearner of the pipeline.
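The pipeline logic above can be sketched as follows (an illustrative version, not the library's code; the ScaleLearner stage and function names are our inventions). Each stage is trained on the features produced by the previous, already-trained stages:

```python
import numpy as np

class ScaleLearner:
    """Toy FeatureLearner-style stage: learns the largest absolute
    value on its training data and divides inputs by it."""
    def train(self, trainset):
        self.scale = max(np.abs(np.asarray(x)).max() for x in trainset)
    def compute_features(self, example):
        return np.asarray(example, dtype=float) / self.scale

def train_pipeline(feature_learners, trainset):
    """Train each stage on the previous stage's output."""
    data = list(trainset)
    for fl in feature_learners:
        fl.train(data)
        data = [fl.compute_features(x) for x in data]
    return feature_learners

def pipeline_features(feature_learners, example):
    # Apply every trained stage in sequence (stacked features).
    for fl in feature_learners:
        example = fl.compute_features(example)
    return example
```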