The Spider Tutorial

How can I have a look at it quickly?

Download it here. Just unzip it into the directory of your choice, start MATLAB and run the script "use_spider" to install. To get started, type "help spider" and "help demos", e.g. run spider/demos/feat_sel/go.m.
NOTE: The Spider was developed with MATLAB Version 11; it may not work with earlier versions.

What happens if this tutorial is incomprehensible to me?

Before we begin, we should note that some people who found this tutorial incomprehensible found an alternative Spider tutorial from the MPI machine learning summer school more useful; it is available here. Try that one if this one doesn't work for you.

How do I train and test an algorithm?

a) Prepare your data as a matrix of attributes (e.g. X) and a matrix of labels (e.g. Y), such that the rows are the examples.

b) Create a data object via d=data(X,Y)

X=rand(50)-0.5; Y=sign(sum(X,2)); d=data(X,Y) % make simple data

c) Train an algorithm, e.g. svm, knn or c45:

[tr a]=train(svm,d)

tr is the predictions (a data object), and
a is the model that you learnt (an algorithm object, in this case an svm object).

The predictions themselves can be accessed via tr.X. In general, any members of an object (e.g. hyperparameters, model parameters) are accessed in the same way.
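
For example, continuing from c) above (alpha here refers to the SVM's learnt coefficients):

tr.X      % the predictions
a.C       % a hyperparameter of the svm object
a.alpha   % model parameters, such as the learnt alphas, are read the same way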

d) Test the algorithm on new data d2:

tst=test(a,d2,'class_loss')

and measure the loss/error rate with the classification loss function. Alternatively, the predictions themselves can be accessed via

tst=test(a,d2); tst.X

We could also have calculated the loss equivalently with:

tst=test(a,d2); tst=loss(tst,'class_loss');

Note that train returns two parameters: the data results and the algorithm (which includes the learnt model). Test only returns the data results and hence cannot change the model of the underlying algorithm. Type "help train" and "help test" for a little more information. Type "help <object>" for help on any object, e.g. "help svm", and "help spider" for a list of objects.
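
Putting the above together, here is a minimal end-to-end sketch (the data is random and purely illustrative):

X=rand(50)-0.5;  Y=sign(sum(X,2));   d=data(X,Y);    % training set
X2=rand(50)-0.5; Y2=sign(sum(X2,2)); d2=data(X2,Y2); % independent test set
[tr,a]=train(svm,d);            % learn the model on d
tst=test(a,d2,'class_loss')     % classification error of the model on d2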

How do I set parameters of algorithms?

When an algorithm is initialized, the last parameter is a set of command strings (in a cell array) which set up its hyperparameters, e.g.:

a=svm('C=1')

sets the hyperparameter C. To set two parameters you can use either:
a=svm({'C=1','ridge=1e-10'}) or a=svm('C=1;ridge=1e-10').

Or, after an object has been created, one can access its members (model parameters or hyperparameters) with:

a=svm; a.C=1;

Type "help svm" for a list of its hyperparameters.
Another useful tip is to use the MATLAB conversion of an object to a struct to quickly examine its members, e.g.:

>> d=gen(toy2d); struct(d)

ans =

X: [50x2 double]
Y: [50x1 double]
index: [1x50 double]
findex: [1 2]
algorithm: [1x1 algorithm]
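
The same trick should also work on algorithm objects, e.g. to check that hyperparameters were set as intended (a small sketch):

a=svm({'C=10','ridge=1e-10'});  % construct with two hyperparameters
a.C                             % read a single member back
struct(a)                       % list all members of the svm object at once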

How do you plug objects together?

This section is the same as the "What is it?" page, but with MATLAB examples for each object described.

  • Both algorithms and data are objects.
  • All objects have two methods: train and test.
  • A data object has an X (input) component and, usually, a Y (output) component.
  • The train & test methods take a data object and output another data object.
  • train can access the Y component, but test cannot.
  • In general they transform the X component, e.g. in supervised learning an object transforms the X to make it close to the Y (see the sketch below).
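
    A minimal sketch of this idea, using the normalize object (the data here is random and purely illustrative):

    X=rand(20,3)-0.5; d=data(X,sign(sum(X,2)));  % a small random data object
    [r,n]=train(normalize,d);                    % r.X is a transformed (normalized) version of d.X
    r2=test(n,d);                                % test applies the same transform, but cannot use d.Y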

  • a chain object is a list of objects which pass their output as input to the next object in the list. Example: normalize the data, run a feature selection algorithm and then feed the output into an SVM:

    d=toy; train(chain({normalize l0 svm}),d)

  • a param object trains/tests multiple versions of an object with different parameters. Example: train k-NN with differing k:

    train(param(knn,'k',[1:5]),d)

    You could also use it to generate different kinds of data, e.g. train on toy data with different dimensionality (n):

    get_mean(train(cv(svm),param(toy,'n',[1:5])))

  • a group object puts several algorithms or data objects into a set.
    Example: training k-NN and SVM as a group produces a group of two outputs.

    train(group({knn svm}),d)

    You could also use it to group different kinds of data, e.g. train on two toy datasets with different dimensionality (n):

    train(group({knn svm}),group({toy('n=5') toy('n=10')}))

  • some objects, such as get_mean (get the mean of several results) and wilcoxon (the Wilcoxon statistical test), take group objects as input.
    Example: get the mean loss of training an SVM on 10 independently generated toy datasets.

    d=group; for i=1:10 d=add(d,toy); end; get_mean(train(svm,d))

  • a loss object takes data and produces a new data object which stores the loss between the original X and Y components.

    r=train(svm,d); a=loss('class_loss'); train(a,r)

    or, equivalently,

    d=toy; r=train(chain({svm loss('class_loss')}),d);

  • a cv object performs cross-validation: it takes data and produces a group object with one data object of results for each tested fold.

    get_mean(train(cv(svm),d))

  • a grid_sel object takes a group of algorithms and chooses the one with the best predicted generalization error (measured by another object, e.g. the cv object).

    r=train(grid_sel(param(knn,'k',[1:5])),d);
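
    For instance, a minimal sketch that selects C for an SVM and then estimates the error of the whole selection procedure by cross-validation (the candidate values of C are illustrative, and we assume grid_sel's default selection criterion):

    d=gen(toy);
    a=grid_sel(param(svm,'C',[0.01 0.1 1 10])); % candidate SVMs differing in C
    get_mean(train(cv(a),d))                    % mean cross-validation loss of the whole procedure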

Multi-class classification

Multi-class (and multi-label) classification is implemented by having a Y matrix with as many columns as there are classes; for a given example, the i-th column is 1 if the example belongs to class i, and -1 otherwise. Here is an example:

    d=gen(bayes({gauss([-1 3]) gauss([0 4]) gauss([1 2])},'l=6'))

This generates a three-class dataset, with each class coming from a Gaussian with a different mean and standard deviation. Now, if we look at d.X and d.Y (inputs and outputs) they are:

    X:
    1.6339 4.2481
    0.3305 4.1003
    -1.1005 3.1459
    -1.1459 2.8517
    1.2835 2.0098
    0.6552 1.2341


    Y:
    1 -1 -1
    -1 1 -1
    -1 -1 1
    1 -1 -1
    -1 -1 1
    -1 1 -1
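
If your labels come as integers 1..Q, a Y matrix of this form can be built by hand before wrapping it in a data object. A minimal sketch (quite separate from the d generated above; the variable names are illustrative, and y below reproduces the class pattern shown):

    y=[1;2;3;1;3;2];                          % integer class labels, values in 1..Q
    Q=max(y);                                 % number of classes
    Y=-ones(length(y),Q);                     % start with every entry equal to -1
    Y(sub2ind(size(Y),(1:length(y))',y))=1;   % put +1 in the column of the true class
    d2=data(rand(6,2),Y);                     % wrap (illustrative) inputs and the labels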

We now have to train a multi-class classifier on this, so we can no longer use a vanilla SVM. We could use, for example, the one-vs-the-rest method with an SVM:

    [r,a]=train(one_vs_rest(svm),d)

You can look at the individual trained machines and results using indices:
    r{1}
    a{2}
    r{2}.X % look at predictions of second machine
    a{1}.alpha % look at alphas of first machine

See spider/demos/mclass for more examples.

Feature Selection

Most feature selection objects have a member feat, a scalar indicating how many features should be selected. The output of a feature selection algorithm is then a data object whose X component has that chosen number of features. Often the features are ranked by column, with the most important feature, as decided by the algorithm, placed in the first column.

Many of the algorithms, e.g. fisher, rfe and l0, can also perform a classification step after feature selection. This is controlled by the boolean output_rank: if it is set to 1, the ranked features are output; otherwise classification is performed.

Let's look at an example. We will use rfe on some toy data, selecting 10 features.

    d=gen(toy);
    a=rfe;
    a.feat=10;
    a.output_rank=1;
    [r,a]=train(a,d);

Now we have trained the system. We are returned a trained model and a new data object with 10 features. The trained model stored in a contains a member rank which holds the indices of the chosen features.
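
To actually use the selected features for classification, one option is to put rfe in front of an SVM in a chain, as in the earlier chain example. A minimal sketch (assuming output_rank=1 makes rfe pass the selected features on rather than classify itself):

    d=gen(toy);
    a=rfe; a.feat=10; a.output_rank=1;   % output the 10 top-ranked features
    [r,c]=train(chain({a svm}),d);       % the SVM is trained on the selected features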

See spider/demos/feat_sel for more examples.

Unbalanced data

If the number of positive and negative examples is highly unbalanced, and/or you wish to measure the loss in a classification problem giving more weight to one of the two classes, then in some kernel methods such as svm you can use the balanced ridge to deal with this. You simply set

    a=svm;
    a.balanced_ridge=0.1;

or some other value to indicate the amount of regularization (larger = more regularization). It works like adding a ridge parameter, i.e. it adds a constant to the diagonal of the kernel matrix, but it adds r*p/(n+p) to the positive examples and r*n/(n+p) to the negative examples, where r is the balanced ridge parameter (0.1 in the example above), p is the number of positives, and n is the number of negatives. The parameter ridge is completely separate.

This has the effect that, e.g., if you have a very small number of positive examples, they will tend to be classified correctly at the expense of misclassifying examples from the larger negative class.
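
As a rough illustration (the class split below is made up): with 10 positives, 90 negatives and r=0.1, the positive examples get 0.1*10/100 = 0.01 added to their diagonal entries and the negatives get 0.1*90/100 = 0.09.

    X=rand(100,2)-0.5; y=-ones(100,1); y(1:10)=1;  % 10 positives, 90 negatives (illustrative)
    d=data(X,y);
    a=svm; a.balanced_ridge=0.1;                   % r = 0.1
    [r,a]=train(a,d);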

More advanced topics: kernels, saving memory, implementing your own objects, ...

Click here for more advanced topics.