The Spider Tutorial

How can I have a look at it quickly?

Download it here. Just unzip it into a directory of your choice. Start MATLAB and run the script "use_spider" to install. To get started, type "help spider" and "help demos"; e.g. run spider/demos/feat_sel/go.m.
NOTE: The Spider was developed with Matlab Version 11; it may not work with earlier versions.

What happens if this tutorial is incomprehensible to me?

Before we begin, we should note that some people who found this tutorial incomprehensible had better luck with an alternative spider tutorial from the MPI machine learning summer school, which is available here. Try that one if you don't get this one.

How do I train and test an algorithm?

a) Prepare your data as a matrix of attributes (e.g. X) and a matrix of labels (e.g. Y), with the examples as rows.

b) Create a data object via d=data(X,Y)

X=rand(50)-0.5; Y=sign(sum(X,2)); d=data(X,Y) % make simple data

c) Train an algorithm, e.g. svm, knn or c45:

[tr a]=train(svm,d)

tr is the predictions (a data object),
a is the model that you learnt (an algorithm object, in this case an svm object)

The predictions themselves can be accessed via tr.X. In general, any member of an object (e.g. hyperparameters, model parameters) is accessed in the same way.
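For example, continuing from the training call above:

tr.X % the predictions made on the training data
a.C % a hyperparameter of the trained svm (see the next section)
a.alpha % a learnt model parameter (the svm's alphas, used again later in this tutorial)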

d) Test the algorithm on new data d2:

tst=test(a,d2,'class_loss')

and measure the loss/error rate with the classification loss function. Alternatively, the predictions themselves can be accessed via

tst=test(a,d2); tst.X

We could also have calculated the loss equivalently with:

tst=test(a,d2); tst=loss(tst,'class_loss');

Note that train returns two outputs: the data results and the algorithm (which includes the learnt model). test only returns the data results and hence cannot change the model of the underlying algorithm. Type "help train" and "help test" for a little more information. Type "help" followed by an object name for help on any object, e.g. "help svm"; "help spider" gives a list of objects.

How do I set parameters of algorithms?

When an algorithm is initialized, the last argument is a command string (or a cell array of command strings) which sets up hyperparameters, e.g.:

a=svm('C=1')

sets hyperparameter C. To set two parameters you can do either:
a=svm({'C=1','ridge=1e-10'}) or a=svm('C=1;ridge=1e-10') .

Or, after an object is created, one can access its members (model parameters or hyperparameters) with:

a=svm; a.C=1;

Type "help svm" for a list of its hyperparameters.
Another useful tip is to use matlab's conversion of an object to a struct to quickly examine its members, e.g.:

>> d=gen(toy2d); struct(d)

ans =

X: [50x2 double]
Y: [50x1 double]
index: [1x50 double]
findex: [1 2]
algorithm: [1x1 algorithm]

How do you plug objects together?

This section is the same as the "What is it?" page, but with a matlab example for each object described.

  • Both algorithm and data are objects.
  • All objects have two methods: train and test.
  • A data object has an X (input) component and, usually, a Y (output) component.
  • train & test methods take a data object & output another data object.
  • train can access the Y component, but test cannot.
  • in general they transform the X component
  • e.g in supervised learning an object transforms the X to make it close to the Y.

  • a chain object is a list of objects, each of which passes its output as the input to the next object in the list. Example: normalize the data, run a feature selection algorithm and then feed the output into an SVM:

    d=toy; train(chain({normalize l0 svm}),d)

  • a param object trains/tests multiple versions of an object with different parameters. Example: train k-NN with differing k:

    train(param(knn,'k',[1:5]),d)

    You could also use it to generate different kinds of data, e.g train on toy data with different dimensionality (n):

    get_mean(train(cv(svm),param(toy,'n',[1:5])))

  • a group object puts several algorithms or data objects into a set.
    Example: training k-NN and SVM as a group produces a group of two outputs.

    train(group({knn svm}),d)

    You could also use it to group different data sources, e.g. train on two toy datasets with different dimensionality (n):

    train(group({knn svm}),group({toy('n=5') toy('n=10')}))

  • some objects, such as get_mean (take the mean of several results) and wilcoxon (the Wilcoxon statistical test), take group objects as input.
    Example: get the mean loss of training SVM on 10 different splits of data.

    d=group; for i=1:10 d=add(d,toy); end; get_mean(train(svm,d))

  • a loss object takes data and produces a new data object which stores the loss between the original X and Y components

    r=train(svm,d); a=loss('class_loss'); train(a,r)

    or, equivalently,

    d=toy; r=train(chain({svm loss('class_loss')}),d);

  • a cv object takes data and produces a group object of data objects for each tested fold.

    get_mean(train(cv(svm),d))

  • a grid_sel object takes a group of algorithms and chooses the one with the best predicted generalization error (measured by another object, e.g the cv object).

    r=train(grid_sel(param(knn,'k',[1:5])),d);

    Multi-class classification

    Multi-class (and multi-label) classification is implemented by having a Y matrix with as many columns as there are classes: for a given example, the i^th entry of its row is 1 if it belongs to class i, and -1 otherwise. Here is an example:

    d=gen(bayes({gauss([-1 3]) gauss([0 4]) gauss([1 2])},'l=6'))

    This generates a three class dataset, with each class coming from a Gaussian with differing mean and standard deviation. Now if we look at d.X and d.Y (inputs and outputs) they are:

    X:
    1.6339 4.2481
    0.3305 4.1003
    -1.1005 3.1459
    -1.1459 2.8517
    1.2835 2.0098
    0.6552 1.2341


    Y:
    1 -1 -1
    -1 1 -1
    -1 -1 1
    1 -1 -1
    -1 -1 1
    -1 1 -1

    We now have to train a multi-class classifier on this, so we can no longer use a vanilla SVM. We could use, for example, the one-vs-the-rest method with an SVM:

    [r,a]=train(one_vs_rest(svm),d)

    You can look at the individual trained machines and results using indices:
    r{1}
    a{2}
    r{2}.X % look at predictions of second machine
    a{1}.alpha % look at alphas of first machine

    See spider/demos/mclass for more examples.

    Feature Selection

    To perform feature selection, most feature selection objects have a member feat which is a scalar indicating how many features should be selected. The output of a feature selection algorithm is then a data object where the X component has that chosen number of features. Often, the features are ranked by column, with the most important feature as decided by the algorithm placed in the first column.

    Many of the algorithms, e.g. fisher, rfe and l0, can also perform a classification step after feature selection. This is controlled by the boolean output_rank: if it is set to 1, the ranked features are output; otherwise classification is performed.

    Let's look at an example. We will use rfe on some toy data, selecting 10 features.

    d=gen(toy);
    a=rfe;
    a.feat=10;
    a.output_rank=1;
    [r,a]=train(a,d);

    Now we have trained the system: we get back a trained model and a new data object with 10 features. The trained model stored in a contains a member rank which holds the indices of the chosen features.
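    By contrast, with output_rank=0 the rfe object itself classifies after selecting the features, so the whole select-then-classify procedure can be cross-validated directly. A short sketch, following the description of output_rank above:

    d=gen(toy);
    a=rfe; a.feat=10; a.output_rank=0; % assumed: 0 means rfe also performs the classification
    get_mean(train(cv(a),d)) % cross-validated error of the whole procedure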

    See spider/demos/feat_sel for more examples.

    Unbalanced data

    If the number of positive and negative examples is highly unbalanced, or if you wish to measure the loss in a classification problem while giving more weight to one of the two classes, some kernel methods such as svm provide a balanced ridge to deal with this. You simply set

    a=svm;
    a.balanced_ridge=0.1;

    or some other value indicating the amount of regularization (larger = more regularization). It works like adding a ridge parameter, i.e. it adds a constant to the diagonal of the kernel matrix, except that it adds r*p/(n+p) for the positive examples and r*n/(n+p) for the negative examples, where r is the balanced ridge parameter (0.1 in the example above), p is the number of positives and n is the number of negatives. The parameter ridge is completely separate.

    This has the effect that, e.g., if you have a very small number of positive examples, they are classified correctly at the expense of misclassifying some examples of the larger, negative class.
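    As a rough illustration of that rule (this is not the spider's internal code; Y and K below are assumed to be your label vector and kernel matrix):

    r=0.1; % the balanced ridge parameter
    p=sum(Y==1); n=sum(Y==-1); % numbers of positive and negative examples
    add=zeros(length(Y),1);
    add(Y==1)=r*p/(n+p); % added to the kernel diagonal for positives
    add(Y==-1)=r*n/(n+p); % added to the kernel diagonal for negatives
    K=K+diag(add); % same effect as setting a.balanced_ridge=r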


    Tell me how SVMs and KERNELS work

    Some explanation of kernel objects is important because there are quite a few objects which use them.

  • a kernel maps input data to output data where the output data is a matrix of inner products.
  • it is often stored in the member 'child'. Example: in SVMs,

    a=svm; a.child=kernel('rbf',1)

    makes an svm with an rbf kernel with sigma=1. One can also do a.kerparam=0.5 to change sigma. One could also have created the same kernel with

    a=svm(kernel('rbf',1))

    as kernels are automatically detected as hyperparameters. Kernels are created using kernel([name],[params]). See help kernel for other types of kernel. One other particularly useful kernel is the custom kernel. This allows you to specify a matrix of inner products of your own choice, e.g.:

    d=gen(toy); a=svm; a.child=kernel('custom',d.X*d.X');
    loss(train(cv(a),d))

    is equivalent to a vanilla linear svm:

    d=gen(toy); loss(train(cv(svm),d))

  • Training these algorithms in the normal way thus uses these kernels. The svm object uses mex files to implement fast training in C by calling svmlight or libsvm. This can be changed via the member optimizer, see help svm for more details.

    Note: 'custom' kernels do not work at present for LIBSVM, however they do work for svmlight.

    To see what is actually happening with a kernel one can use it without a host algorithm such as svm. For example make some data with:

    d=data(rand(5)); k=kernel('rbf',2); r=train(k,d); r.X

    The last command prints out the new kernel matrix returned in the data object r. In the spider a kernel is viewed as just another learning algorithm (not a very complicated one in terms of generalization):
  • After training, one can test on new data (to obtain the inner products between the test data and the original data).
  • Many of the spider objects don't implement things in this way yet; they use an old function called calc. Try not to use this.
  • Viewing kernels within the train/test methodology allows one to create complex kernels, e.g.

    k=chain({ kernel('custom',D) kernel('rbf_from_dist',0.1) })

    takes a custom distance matrix and transforms it into an rbf kernel. One can then use the param object to try different values of sigma for the rbf. Here's a simple example of using param and kernel objects together:

    s=svm(kernel('poly',1));
    a=param(s,'poly',[1:3]);
    [r a]=train(cv(a),toy)
    get_mean(r)

    Note that poly is used to access the parameter of the kernel, which is actually called kerparam in the object. This works because kerparam is also aliased as poly, which is useful to differentiate it from other kernel parameters, e.g. in a chain. See help algorithm on how to make aliases.

    Saving memory: data_global and custom_fast kernels

    Because matlab uses a lot of memory (it passes by value and copies variables every time you call a function), we wrote some routines that use global variables to avoid this problem. data_global works in exactly the same way as data except that it takes its X and Y components from the global variables X and Y. Example: the following are equivalent (apart from memory usage):

    rand('seed',1); d=gen(toy); get_mean(train(cv(svm),d))

    is equivalent to

    rand('seed',1); d=gen(toy);
    global X; global Y; X=d.X; Y=d.Y; clear d; % use global data instead
    d=data_global; get_mean(train(cv(svm),d))

  • If you give the constructor of data_global extra parameters it creates local copies of X and Y and doesn't save memory any more.
  • (It is very important when handling data objects, as with all objects, to use the access functions only, see implementation issues below.)
  • Also, the drawback of data_global is that it can only handle one dataset at a time (although you can create multiple data_global objects which hold subsets of the data, because they index the global matrices).
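    For example, assuming get indexes a data_global object the same way it indexes a data object (so the subsets only store indices into the globals, as mentioned above):

    global X; global Y; X=rand(100,5)-0.5; Y=sign(sum(X,2)); % one global dataset
    d=data_global;
    dtrn=get(d,1:50); dtst=get(d,51:100); % two objects indexing the same global matrices
    [r a]=train(svm,dtrn); loss(test(a,dtst),'class_loss')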

    Likewise, the 'custom_fast' kernel can replace the 'custom' kernel, but it takes its matrix from a global variable K. Example: the following are equivalent (apart from memory usage):

    d=gen(toy); a=svm; a.child=kernel('custom',d.X*d.X');
    loss(train(cv(a),d))

    is equivalent to

    d=gen(toy); a=svm; global K; K=d.X*d.X';
    a.child=kernel('custom_fast'); loss(train(cv(a),d))

    In order to be able to test on a test set, one has two options:

  • 1. Replace the kernel at testing time with the actual kernel calculation, e.g.:

    d=gen(toy); a=svm; global K; K=d.X*d.X';
    a.child=kernel('custom_fast'); [tr a]=train(a,d)
    dtst=gen(toy); a.child=kernel('linear');
    loss(test(a,dtst))

  • 2. Compute the custom kernel on the training and test data from the start and extract two data splits that access the same kernel. The data objects know which part of the kernel matrix to access because they remember their indices in the original data object that you extracted them from with the get command, e.g.:

    d=gen(toy('l=100')); a=svm; global K; K=d.X*d.X';
    a.child=kernel('custom_fast');
    dtrn=get(d,[1:50]); dtst=get(d,[51:100]);
    [tr a]=train(a,dtrn)
    r=test(a,dtst)
    loss(r)



    Structured objects: kernels for strings, trees and graphs

    At the moment the Spider doesn't have many special facilities for structured kernels such as those on strings, trees and graphs.
    In practice, you currently have to produce the Gram matrix with your own code and train on it via a 'custom' kernel.
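    A rough sketch of that workflow (my_string_kernel is a hypothetical function you would write yourself; we also assume, as in the custom kernel examples above, that the 'custom' kernel only consults the example indices, so a dummy X is enough):

    strs={'acgt','acct','ttgt','ggca'}; % your structured objects, stored however you like
    Yl=[1;-1;1;-1];
    K=my_string_kernel(strs); % hypothetical: your own code computes the Gram matrix
    d=data(zeros(length(strs),1),Yl); % dummy X; the custom kernel is indexed by example number
    a=svm; a.child=kernel('custom',K);
    [r a]=train(a,d)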

    Potentially, though, there is nothing stopping you from defining a data object with the same public functions as the usual data object but which handles another type of data (data only deals with vectors), e.g. a string_data object, together with a kernel or kernels which calculate dot products for this new data type.
    (If someone wants to write this, please send it to us!)

    Implementation: What is an object in Matlab?

    An object in matlab is implemented as a directory whose name begins with "@". So we have, e.g., spider/pat/@knn for k-nearest neighbours, which contains all the methods of that object (as M-files). Spider objects have a constructor which sets default hyperparameters and initializes the model. They also have a "training" and a "testing" method.

    Technically, all the objects in the spider are children of another object called algorithm. This stores some basic flags and also hosts the methods "train" and "test", which call the "training" and "testing" methods of the child object. These parent methods handle training on multiple datasets and other complications.

    Implementation: How do I make my own object?

    Take a look at the @template object in the spider/pat directory, which is a simple svm object. Copy this directory and just change the training.m and testing.m files and the constructor (which should have the same name as the directory) to make your new algorithm. Then, to find out about adding the object to the publicly available spider package, click here.
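    As a very rough sketch of what those two files contain (the signatures below are assumptions based on the train/test behaviour described earlier; the real @template files are the authoritative reference):

    % training.m -- assumed skeleton; see @template/training.m for the exact signature
    function [results,a]=training(a,d)
      a.w=mean(get_x(d))'; % learn a trivial model (w is a hypothetical member)
      results=test(a,d); % return the predictions on the training data

    % testing.m -- assumed skeleton; see @template/testing.m for the exact signature
    function results=testing(a,d)
      results=set_x(d,sign(get_x(d)*a.w)); % put predictions into a new data object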

    Implementation: PROGRAMMING GUIDELINES

    1) Try not to use "..." in your M-files to continue a statement onto the next line, because switching between Linux and Windows adds extra line feeds and makes it stop working...

    2) If you use a kernel, try to call the train and test methods of the kernel; see e.g. the svm object. This will help to make complex kernel objects possible, e.g. chained networks of kernels.
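    For instance, inside a training method one might obtain the kernel matrix via the kernel's own train method, in the spirit of the kernel example earlier (a.child is assumed to hold the kernel object, as in svm):

    kd=train(a.child,d); % kd is a data object
    K=kd.X; % its X component is the kernel matrix between the examples of d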

    3) Try to use get, get_x and set_x of data objects, and try not to create new data objects by hand. E.g. in your code do not do

    d2=data; d2.X=d.X(1:2,:); d2.Y=d.Y(1:2,:);
    d3=data(d.name,d.X,d.Y); d3.X=xx;

    but rather, do:

    d2=get(d,1:2);
    d3=set_x(d,xx);

    otherwise you may lose e.g. indexing information and other transparent members of an object, or become incompatible if members are added later.