The learners.features module contains FeatureLearner objects, meant for feature or representation learning. The MLProblems for these Learners should be iterators over inputs, and the output of each Learner should be a new feature representation of the input.
The currently implemented algorithms are:
Interface for all Learner objects that learn features.
The only requirement beyond those of Learner is to define a method compute_features(example) that outputs the feature representation of a given example (normally a single input).
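The following is a minimal sketch of what this interface amounts to, using Python's `abc` module. The class names `LearnerSketch`, `FeatureLearnerSketch` and `IdentityFeatures`, and the `use()` helper, are hypothetical stand-ins for illustration and are not the module's actual classes or methods.

```python
from abc import ABC, abstractmethod


class LearnerSketch(ABC):
    """Hypothetical stand-in for the base Learner class."""

    @abstractmethod
    def train(self, trainset):
        """Fit the learner on an iterable of inputs."""


class FeatureLearnerSketch(LearnerSketch):
    """Hypothetical sketch: the one extra requirement is compute_features."""

    @abstractmethod
    def compute_features(self, example):
        """Return the feature representation of a single input."""

    def use(self, dataset):
        # Assumed helper: apply compute_features to every input in a dataset.
        return [self.compute_features(example) for example in dataset]


class IdentityFeatures(FeatureLearnerSketch):
    """Toy subclass that returns each input unchanged."""

    def train(self, trainset):
        pass  # nothing to learn

    def compute_features(self, example):
        return example
```

Because `compute_features` is abstract, a subclass that forgets to define it cannot be instantiated.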
Removes the input’s mean and divides by its standard deviation, for each input.
Note that the mean and standard deviation are computed for each input vector individually, not over the whole dataset.
Option regularizer is a small constant to add to the standard deviation, to avoid division by zero.
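A minimal sketch of this per-input standardization in plain Python, assuming the population standard deviation (dividing by the number of components) is used. The function name `standardize_input` and the default regularizer value are assumptions for illustration.

```python
import statistics


def standardize_input(x, regularizer=1e-10):
    """Hypothetical sketch: standardize one input vector using its own statistics."""
    # Mean and standard deviation over this input's components only, not the dataset.
    mean = statistics.fmean(x)
    std = statistics.pstdev(x)  # assumed: population standard deviation
    # The regularizer keeps the division finite for constant inputs (std == 0).
    return [(xi - mean) / (std + regularizer) for xi in x]
```

For example, `standardize_input([1.0, 2.0, 3.0], regularizer=0.0)` gives roughly `[-1.2247, 0.0, 1.2247]`, while a constant input such as `[5.0, 5.0, 5.0]` maps to zeros instead of raising a division error.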
Required metadata:
Principal Component Analysis.
Outputs the input’s projection onto the principal components, rescaled so as to obtain a representation with mean zero and identity covariance.
Option n_components is the number of principal components to compute.
Option regularizer is a small constant to add to the diagonal of the estimated covariance matrix (default=1e-10).
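A minimal NumPy sketch of PCA features with this behaviour, assuming the covariance is estimated with a 1/N normalization and that each projection is divided by the square root of its (regularized) eigenvalue, which is what yields identity covariance. The names `fit_pca` and `pca_features` are hypothetical and not the module's API.

```python
import numpy as np


def fit_pca(X, n_components, regularizer=1e-10):
    """Hypothetical sketch: estimate the mean and a whitening projection."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Covariance (1/N normalization assumed) with a small ridge on the diagonal.
    cov = Xc.T @ Xc / X.shape[0] + regularizer * np.eye(X.shape[1])
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    # Divide each principal direction by sqrt(eigenvalue) to get unit variance.
    W = eigvecs[:, top] / np.sqrt(eigvals[top])
    return mean, W


def pca_features(x, mean, W):
    """Project a centered input (or a batch of rows) onto the scaled components."""
    return (x - mean) @ W
```

On the training data, the outputs have (numerically) zero mean and an n_components × n_components identity covariance; the regularizer only shrinks each variance by a factor of λ/(λ + regularizer).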
Required metadata:
ZCA whitening preprocessing.
Outputs the whitened input, which has the same dimensionality as the original input but with mean zero and identity covariance.
Option regularizer is a small constant to add to the diagonal of the estimated covariance matrix (default=1e-10).
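ZCA differs from the PCA features above in that it keeps all dimensions and rotates the whitened data back into the original input axes, using the whitening matrix W = U diag(1/sqrt(λ)) Uᵀ. The sketch below assumes a 1/N covariance estimate; `fit_zca` and `zca_features` are hypothetical names, not the module's API.

```python
import numpy as np


def fit_zca(X, regularizer=1e-10):
    """Hypothetical sketch: estimate the mean and the symmetric ZCA whitening matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / X.shape[0] + regularizer * np.eye(X.shape[1])
    eigvals, eigvecs = np.linalg.eigh(cov)
    # U diag(1/sqrt(lambda)) U^T: whiten along the eigenvectors, then rotate back.
    W = (eigvecs / np.sqrt(eigvals)) @ eigvecs.T
    return mean, W


def zca_features(x, mean, W):
    """Whiten an input (or a batch of rows); output has the input's dimensionality."""
    return (x - mean) @ W
```

Since W is symmetric, multiplying on the left or the right gives the same result.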
Required metadata:
Restricted Boltzmann Machine for feature learning.
Option n_stages is the number of training iterations.
Options learning_rate and decrease_constant correspond to the learning rate and decrease constant used for stochastic gradient descent.
Option hidden_size should be a positive integer specifying the number of hidden units (features).
Option l1_regularization is the weight of L1 regularization on the connection matrix.
Option seed determines the seed for randomly initializing the weights.
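A minimal NumPy sketch of how these options could fit together, under several assumptions not stated above: binary visible and hidden units, training with one step of contrastive divergence (CD-1) on one example at a time, a learning rate that decays as learning_rate / (1 + decrease_constant · t), an L1 subgradient penalty on the connection matrix, and hidden-unit probabilities as the output features. The function names `train_rbm` and `rbm_features` are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def train_rbm(X, hidden_size, n_stages=10, learning_rate=0.01,
              decrease_constant=0.0, l1_regularization=0.0, seed=1234):
    """Hypothetical sketch: binary RBM trained with CD-1 (assumed), one example at a time."""
    rng = np.random.default_rng(seed)  # the seed makes initialization and sampling reproducible
    n_inputs = X.shape[1]
    W = rng.uniform(-0.01, 0.01, size=(hidden_size, n_inputs))  # connection matrix
    b = np.zeros(hidden_size)  # hidden biases
    c = np.zeros(n_inputs)     # visible biases
    t = 0
    for _ in range(n_stages):
        for x in X:
            # Assumed schedule: the step size shrinks with the number of updates t.
            lr = learning_rate / (1.0 + decrease_constant * t)
            # Positive phase: hidden probabilities and a binary sample given the data.
            h0 = sigmoid(W @ x + b)
            h_sample = (rng.random(hidden_size) < h0).astype(float)
            # Negative phase: one Gibbs step back to the visible layer and up again.
            v_prob = sigmoid(W.T @ h_sample + c)
            v_sample = (rng.random(n_inputs) < v_prob).astype(float)
            h1 = sigmoid(W @ v_sample + b)
            # CD-1 gradient estimates, plus an L1 subgradient on the connection matrix.
            W += lr * (np.outer(h0, x) - np.outer(h1, v_sample)
                       - l1_regularization * np.sign(W))
            b += lr * (h0 - h1)
            c += lr * (x - v_sample)
            t += 1
    return W, b, c


def rbm_features(x, W, b):
    """Features assumed to be the hidden-unit activation probabilities."""
    return sigmoid(W @ x + b)
```

Each feature vector has hidden_size components, each strictly between 0 and 1.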
Required metadata:
The k-means clustering algorithm.
The first n_clusters examples of the training set are used to initialize the cluster means.
By default, for a given input, the Learner outputs a one-hot vector: the component at the index of the closest cluster is 1, and all others are 0.
Option n_stages is the number of iterations over the training set (default=10).
Option n_clusters is the number of clusters (default=10).
Option use_triangle_activation is True if the triangle activation function should be used to compute features. If False, then a hard one-hot feature representation is used (default=False).
Option seed is the seed for the random number generator (only used when some clusters are initially empty, to spawn new clusters).
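A minimal NumPy sketch of this procedure. Two details are assumptions: an empty cluster is re-spawned at a randomly chosen training example, and the triangle activation follows the common definition max(0, mean distance − distance to cluster k) from Coates et al. (2011). The names `train_kmeans` and `kmeans_features` are hypothetical.

```python
import numpy as np


def train_kmeans(X, n_clusters=10, n_stages=10, seed=1234):
    """Hypothetical sketch of k-means with first-examples initialization."""
    rng = np.random.default_rng(seed)
    # Initialize the means with the first n_clusters training examples.
    means = np.array(X[:n_clusters], dtype=float)
    for _ in range(n_stages):
        # Squared distance from every example to every mean, shape (N, K).
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for k in range(n_clusters):
            members = X[assign == k]
            if len(members) > 0:
                means[k] = members.mean(axis=0)
            else:
                # Assumed strategy: re-spawn an empty cluster at a random example.
                means[k] = X[rng.integers(len(X))]
    return means


def kmeans_features(x, means, use_triangle_activation=False):
    """One-hot code of the closest mean, or the (assumed) triangle activation."""
    dist = np.sqrt(((means - x) ** 2).sum(axis=1))
    if use_triangle_activation:
        # Clusters farther away than the average distance get a zero activation.
        return np.maximum(0.0, dist.mean() - dist)
    one_hot = np.zeros(len(means))
    one_hot[dist.argmin()] = 1.0
    return one_hot
```

For two well-separated groups of points, the first two examples land in different groups, so the means converge to the two group centers after one iteration.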
Required metadata:
Learns a pipeline of FeatureLearners.
Outputs the result of applying each trained feature learner in sequence, with each learner's output serving as the next learner's input (i.e., stacked features).
Option feature_learners is the list of FeatureLearner objects to train, in the order in which they are applied.
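A minimal plain-Python sketch of such a pipeline, assuming greedy training in which each learner is trained on the outputs of the learners before it. `FeaturePipelineSketch` and the toy learners `CenterLearner` and `DoubleLearner` are hypothetical and only illustrate the train/compute_features chaining.

```python
class FeaturePipelineSketch:
    """Hypothetical sketch: train and apply feature learners one after another."""

    def __init__(self, feature_learners):
        self.feature_learners = feature_learners

    def train(self, inputs):
        current = [list(x) for x in inputs]
        for learner in self.feature_learners:
            learner.train(current)
            # Assumed: the next learner is trained on this learner's outputs.
            current = [learner.compute_features(x) for x in current]

    def compute_features(self, x):
        # Apply every trained learner in order (stacked features).
        for learner in self.feature_learners:
            x = learner.compute_features(x)
        return x


class CenterLearner:
    """Toy learner: subtracts the per-dimension training mean."""

    def train(self, inputs):
        n = len(inputs)
        self.mean = [sum(col) / n for col in zip(*inputs)]

    def compute_features(self, x):
        return [xi - mi for xi, mi in zip(x, self.mean)]


class DoubleLearner:
    """Toy learner with nothing to train: doubles every component."""

    def train(self, inputs):
        pass

    def compute_features(self, x):
        return [2 * xi for xi in x]
```

For example, after training on `[[1, 2], [3, 4]]` with `[CenterLearner(), DoubleLearner()]`, the input `[3, 4]` is centered to `[1, 1]` and then doubled to `[2, 2]`.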
Required metadata: