The Spider Tutorial
How can I have a look at it quickly?
Download it here. Just unzip into the directory of your choice. Start MATLAB and run the script "use_spider" to install. To get started type "help spider" and "help demos". E.g. run spider/demos/feat_sel/go.m
NOTE: The Spider was developed with Matlab Version 11, it may not work with earlier versions.
What happens if this tutorial is incomprehensible to me?
Before we begin, we should note that some people who found this tutorial incomprehensible instead found an alternative tutorial on the spider, from the MPI machine learning summer school, useful; it is available here. Try that one if you don't get this one.
How do I train and test an algorithm?
a) Prepare your data in a matrix of attributes (e.g. X) and a matrix of labels (e.g. Y) such that the rows are the examples.
b) Create a data object via d=data(X,Y). For example:

X=rand(50)-0.5; Y=sign(sum(X,2)); d=data(X,Y) % make simple data

c) Train an algorithm, e.g. svm, knn, c45:

[tr a]=train(svm,d)
tr is the predictions (a data object),
a is the model that you learnt (an algorithm object, in this case an svm object). The predictions themselves can be accessed via tr.X. In general, any members of an object (e.g. hyperparameters, model parameters) are accessed in the same way.
d) Test the algorithm on new data d2:

tst=test(a,d2,'class_loss')

and measure the loss/error rate with the classification loss function. Alternatively, the predictions themselves can be accessed via

tst=test(a,d2); tst.X

We could also have calculated the loss equivalently with:

tst=test(a,d2); tst=loss(tst,'class_loss');
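Putting steps a)-d) together, a complete session might look like the following sketch (the matrix sizes and the second dataset are made up for illustration):

```matlab
% Minimal end-to-end sketch, assuming the spider is installed on the path.
X = rand(50,10) - 0.5;             % 50 training examples, 10 attributes
Y = sign(sum(X,2));                % labels from the sign of the row sums
d = data(X,Y);                     % wrap into a data object
[tr a] = train(svm,d);             % tr: predictions, a: trained svm

X2 = rand(20,10) - 0.5;            % a second, held-out dataset
Y2 = sign(sum(X2,2));
d2 = data(X2,Y2);
tst = test(a,d2,'class_loss')      % test error under the classification loss
```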
Note that train returns two parameters: the data results and the algorithm (which includes the learnt model). Test only returns the data results and hence cannot change the model of the underlying algorithm. Type "help train" and "help test" for a little more information.
Group objects
Some objects, such as get_mean (get the mean of several results) and wilcoxon (the Wilcoxon statistical test), take group objects as input.
Example: get the mean loss of training an SVM on 10 different splits of data:

d=group; for i=1:10 d=add(d,toy); end; get_mean(train(svm,d))

A loss object takes data and produces a new data object which stores the loss between the original X and Y components:

r=train(svm,d); a=loss('class_loss'); train(a,r)

or, equivalently,

d=toy; r=train(chain({svm loss('class_loss')}),d);

A cv object takes data and produces a group object of data objects, one for each tested fold:

get_mean(train(cv(svm),d))

A grid_sel object takes a group of algorithms and chooses the one with the best predicted generalization error (measured by another object, e.g. the cv object):

r=train(grid_sel(param(knn,'k',[1:5])),d);
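These special objects compose. For instance, one can estimate the generalization error of the whole model-selection procedure by wrapping grid_sel inside cv (a sketch, following the patterns above):

```matlab
% Sketch: cross-validate model selection itself. On each outer training
% fold, grid_sel picks the best k for knn (scored by its internal method,
% e.g. a cv object); the outer cv then measures the resulting test error.
d = gen(toy);
a = grid_sel(param(knn,'k',[1:5]));
get_mean(train(cv(a),d))
```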
Multi-class classification
Multi-class (and multi-label) classification is implemented by having a Y matrix with as many columns as classes: for a given example, the i^th entry of its row is 1 if it belongs to class i, and -1 otherwise. Here is an example:

d=gen(bayes({gauss([-1 3]) gauss([0 4]) gauss([1 2])},'l=6'))
This generates a three class dataset, with each class coming from a Gaussian with differing mean and standard deviation. Now if we look at d.X and d.Y (inputs and outputs) they are:

X:
1.6339 4.2481
0.3305 4.1003
-1.1005 3.1459
-1.1459 2.8517
1.2835 2.0098
0.6552 1.2341

Y:
1 -1 -1
-1 1 -1
-1 -1 1
1 -1 -1
-1 -1 1
-1 1 -1

We now have to train a multi-class classifier on this, so we can no longer use a vanilla SVM. We could use, for example, the one-vs-the-rest method with an SVM:

[r,a]=train(one_vs_rest(svm),d)
You can look at the individual trained machines and results using indices:
r{1}
a{2}
r{2}.X % look at predictions of second machine
a{1}.alpha % look at alphas of first machine
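If your labels start out as a single column of integers 1..c, they can be converted into the +/-1 matrix format described above with plain matlab code (a sketch using only built-in functions):

```matlab
% Convert integer class labels into the one-column-per-class +/-1 matrix.
labels = [1 2 3 1 3 2]';                   % integer labels, classes 1..3
c = max(labels);                           % number of classes
Y = -ones(length(labels),c);               % start with all -1
Y(sub2ind(size(Y),(1:length(labels))',labels)) = 1;  % set class entries to 1
% Y now matches the example above: row i has a 1 in column labels(i).
```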
See spider/demos/mclass for more examples.
Feature Selection
To perform feature selection, most feature selection objects have a member feat, a scalar indicating how many features should be selected. The output of a feature selection algorithm is then a data object whose X component has that chosen number of features. Often the features are ranked by column, with the most important feature (as decided by the algorithm) placed in the first column.
Many of the algorithms, e.g. fisher, rfe and l0, can also perform a classification step after feature selection. This is controlled by the boolean output_rank. (If this is set to 1, the ranked features are output; otherwise classification is performed.)
Let's look at an example. We will use rfe on some toy data, selecting 10 features.
d=gen(toy);
a=rfe;
a.feat=10;
a.output_rank=1;
[r,a]=train(a,d);
Now, we have trained the system. We are returned a trained model and a new data object with 10 features. The trained model stored in a contains a member rank which includes the indices of the chosen features.
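Continuing the example, one can inspect what was selected (a sketch; the members feat, output_rank and rank are as described above):

```matlab
% Sketch: train rfe and inspect the selected features.
d = gen(toy);
a = rfe;  a.feat = 10;  a.output_rank = 1;
[r a] = train(a,d);
a.rank(1:10)      % indices of the 10 chosen features, best first
size(r.X)         % the returned data keeps only those 10 columns
```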
See spider/demos/feat_sel for more examples.
Unbalanced data
If the number of positive and negative examples is highly unbalanced, and/or you wish to measure the loss in a classification problem giving more weight to one of the two classes, in some kernel methods such as svm you can use the balanced ridge to deal with this. You simply set

a=svm;
a.balanced_ridge=0.1;

or some other value to indicate the amount of regularization (larger = more regularization). It works like adding a ridge parameter, that is, it adds a constant to the diagonal of the kernel matrix, but it adds r*p/(n+p) to the positive examples and r*n/(n+p) to the negative examples, where r is the balanced ridge parameter (0.1 in the example above), p is the number of positives, and n is the number of negatives. The parameter ridge is entirely separate.
This has the effect that e.g. if you have a very small number of positive examples you make sure you classify them correctly at the expense of classifying incorrectly the larger, negative class.
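The diagonal additions can be computed by hand; a small numeric sketch with made-up class counts:

```matlab
r = 0.1;                  % balanced ridge parameter
p = 10;  n = 190;         % hypothetical counts of positives / negatives
ridge_pos = r * p/(n+p)   % 0.0050, added to the diagonal for positives
ridge_neg = r * n/(n+p)   % 0.0950, added to the diagonal for negatives
% The rare positive class receives the smaller addition, so its errors
% are effectively penalized more, as described above.
```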
Tell me how SVMs and KERNELS work
Some explanation of kernel objects is important because there are quite a few objects which use them. A kernel maps input data to output data where the output data is a matrix of inner products. It is often stored in the member object 'child'. Example: in SVMs,

a=svm; a.child=kernel('rbf',1)

makes an svm with an rbf kernel with sigma=1. One can also do a.kerparam=0.5 to change sigma. One could also have created the same kernel with

a=svm(kernel('rbf',1))
as kernels are automatically detected as hyperparameters. Kernels are created using kernel([name],[params]). See help kernel for other types of kernel. One other particularly useful kernel is the custom kernel. This allows you to specify a matrix of inner products of your own choice, e.g.:

d=gen(toy); a=svm; a.child=kernel('custom',d.X*d.X');
loss(train(cv(a),d))

is equivalent to a vanilla linear svm:

d=gen(toy); loss(train(cv(svm),d))
Training these algorithms in the normal way thus uses these kernels. The svm object uses mex files to implement fast training in C by calling svmlight or libsvm. This can be changed via the member optimizer, see help svm for more details. Note: 'custom' kernels do not work at present for LIBSVM, however they do work for svmlight.
To see what is actually happening with a kernel one can use it without a host algorithm such as svm. For example, make some data with:

d=data(rand(5)); k=kernel('rbf',2); r=train(k,d); r.X
The last command prints out the new kernel matrix returned in r. In the spider a kernel is viewed as another learning algorithm (just not a very complicated one in terms of generalization): after training one can test on new data (to know the inner products between the test data and the original data). Many of the spider objects don't implement things in this way yet; they use an old function called calc. Try not to use this. Seeing kernels in the train/test methodology allows one to create complex kernels, e.g.

k=chain({ kernel('custom',D) kernel('rbf_from_dist',0.1) })
takes a custom distance matrix and transforms it to an rbf kernel. One can then use the param object to try different values of sigma for the rbf. Here's a simple example of using param and kernel objects together:

s=svm(kernel('poly',1));
a=param(s,'poly',[1:3]);
[r a]=train(cv(a),toy)
get_mean(r)

Note that poly is used to access the parameter of the kernel, which is actually called kerparam in the object. This is because it is also aliased to be called poly; this is useful to differentiate it from other kernel parameters, e.g. in a chain. See help algorithm on how to make aliases.
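To check numerically what a kernel computes, one can compare its output against a direct calculation. The sketch below assumes the common convention K(i,j)=exp(-||xi-xj||^2/(2*sigma^2)) for the rbf kernel; see help kernel for the exact formula the spider uses:

```matlab
% Sketch: compare the spider's rbf kernel matrix with a hand computation.
d = data(rand(5));
sigma = 2;
r = train(kernel('rbf',sigma),d);
X = d.X;
sq = sum(X.^2,2);
D = sq*ones(1,size(X,1)) + ones(size(X,1),1)*sq' - 2*X*X';  % squared distances
K = exp(-D/(2*sigma^2));
max(max(abs(K - r.X)))   % near zero if the convention matches
```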
Saving memory: data_global and custom_fast kernels
Because matlab uses a lot of memory (it passes by value and copies variables every time you call a function) we wrote some routines that use global variables to avoid this problem. data_global works in exactly the same way as data except it takes its X and Y components from the global variables X and Y. Example: the following are equivalent (apart from memory usage):

rand('seed',1); d=gen(toy); get_mean(train(cv(svm),d))

is equivalent to

rand('seed',1); d=gen(toy);
global X; global Y; X=d.X; Y=d.Y; clear d; % use global data instead
d=data_global; get_mean(train(cv(svm),d))

If you give the constructor of data_global extra parameters it creates local copies of X and Y and doesn't save memory any more. (It is very important when handling data objects, as with all objects, to use the access functions only; see implementation issues below.) Also, the drawback of data_global is that it can only handle one dataset at a time (although you can create multiple data_global objects which hold subsets of the data, because they index the global matrices).
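The last point can be illustrated as follows (a sketch, assuming get works on data_global objects as it does on data objects):

```matlab
% Sketch: several data_global objects indexing the same global matrices.
global X Y;
X = rand(100,5);  Y = sign(sum(X,2));
d = data_global;                 % views the full global X and Y
dtrn = get(d,1:50);              % these subsets still index the globals,
dtst = get(d,51:100);            % so no copies of the data are made
[tr a] = train(svm,dtrn);
loss(test(a,dtst))
```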
Likewise, the 'custom_fast' kernel can replace the 'custom' kernel but with a global variable K. Example: the following are equivalent (apart from memory usage):

d=gen(toy); a=svm; a.child=kernel('custom',d.X*d.X');
loss(train(cv(a),d))

is equivalent to

d=gen(toy); a=svm; global K; K=d.X*d.X';
a.child=kernel('custom_fast'); loss(train(cv(a),d))

In order to be able to test on a test set one has two options:
1. Replace the kernel at testing time with the actual kernel calculation, e.g.:

d=gen(toy); a=svm; global K; K=d.X*d.X';
a.child=kernel('custom_fast'); [tr a]=train(a,d)
dtst=gen(toy); a.child=kernel('linear');
loss(test(a,dtst))
2. Compute the custom kernel on train and test data from the start and extract two data splits that access the same kernel. The data objects know which part of the kernel matrix to access because they remember the indices from the original data object that you extract them from with the get command, e.g.:

d=gen(toy('l=100')); a=svm; global K; K=d.X*d.X';
a.child=kernel('custom_fast');
dtrn=get(d,[1:50]); dtst=get(d,[51:100]);
[tr a]=train(a,dtrn)
r=test(a,dtst)
loss(r)
Structured objects: kernels for strings, trees and graphs
At the moment The Spider doesn't have many special facilities for structured kernels such as those on strings, trees and graphs.
Practically, you currently have to compute the gram matrix with your own code and use it in a 'custom' kernel to train on these. Potentially, though, there is nothing stopping you defining a data object with the same public functions as the usual data object which deals with another type of data (data only deals with vectors), e.g. a string_data object, and then a kernel or kernels which calculate dot products for this new data type.
(If someone wants to write this please send it to us!)
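As a concrete illustration of the 'custom' route, here is a hypothetical sketch of a very simple string kernel (a 1-spectrum, i.e. character-count, kernel) computed in plain matlab and plugged into an svm; the labels are made up, and the X component of the data object is only a placeholder since the kernel matrix is fixed:

```matlab
% Hypothetical sketch: a character-count kernel on strings via kernel('custom').
strs = {'abba','baab','abab','bbbb'};
alphabet = 'ab';
C = zeros(length(strs),length(alphabet));
for i = 1:length(strs)
  for j = 1:length(alphabet)
    C(i,j) = sum(strs{i}==alphabet(j));   % count of each character
  end
end
K = C*C';                                 % gram matrix of the count vectors
Y = [1;1;-1;-1];                          % made-up labels
a = svm;  a.child = kernel('custom',K);
d = data(zeros(length(strs),1),Y);        % placeholder X; the kernel is fixed
[r a] = train(a,d);
```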
Implementation: What is an object in Matlab?
An object in matlab is implemented as a directory whose name begins with "@". So we have spider/pat/@knn, e.g., for k-nearest neighbours, which includes all the methods of that object (which are M-files). Spider objects have a constructor which sets default hyperparameters and initializes the model. They also have a "training" and a "testing" method.
Technically, all the objects in the spider are children of another object called algorithm. This stores some basic flags and also hosts the methods "train" and "test", which call the "training" and "testing" of the child. The parent method handles training multiple datasets and other complications.
Implementation: How do I make my own object?
Take a look at the @template object in the spider/pat directory, which is a simple svm object. Copy this directory and just change the training.m and testing.m files and the constructor (which should have the same name as the directory) to make your new algorithm. Then, to find out about adding the object to the publicly available spider package, click here.
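For orientation, a hypothetical skeleton of the two methods is sketched below (modelled loosely on @template; the exact signatures and members may differ, so check the actual @template files):

```matlab
% --- @my_alg/training.m (hypothetical sketch) ---
function [results,a] = training(a,d)
  % Learn a model from the training data d and store it in the object a.
  a.w = get_x(d) \ get_y(d);     % e.g. least-squares weights (illustrative)
  results = testing(a,d);        % conventionally return training predictions

% --- @my_alg/testing.m (hypothetical sketch) ---
function results = testing(a,d)
  yEst = sign(get_x(d) * a.w);   % apply the learnt model to d
  results = set_x(d,yEst);       % return the predictions as a data object
```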
Implementation: PROGRAMMING GUIDELINES
1) Try not to use "..." in your M-files to extend to the next line, because switching between Linux and Windows adds extra line feeds and stops it working...
2) If you use a kernel, try to call train and test members of kernel, see e.g the svm object. This will help to make complex kernel objects, e.g chain networks of kernels.
3) Try to use get, get_x and set_x of data objects, and try not to create new data objects. E.g., in your code do not do

d2=data; d2.X=d.X(1:2,:); d2.Y=d.Y(1:2,:);

but rather, do:

d2=get(d,1:2);

otherwise you may lose e.g. indexing information and other transparent members in an object, or be incompatible if members are added later. Similarly, instead of

d3=data(d.name,d.X,d.Y); d3.X=xx;

do:

d3=set_x(d,xx);