A Kernel Statistical Test of Independence
Arthur Gretton, Kenji Fukumizu, Choon-Hui Teo, Le Song, Bernhard Schoelkopf, Alex Smola
We propose a test of whether random variables X and Y are independent, based on a sample of observed pairs (x_i, y_i). The test statistic is the squared Hilbert-Schmidt norm of the cross-covariance operator between RKHS mappings of X and Y, known as the Hilbert-Schmidt Independence Criterion (HSIC). The population HSIC is zero at independence, so a large empirical HSIC is evidence against independence. The test software returns both the empirical HSIC and a threshold, where the threshold is a user-specified quantile of the distribution of the empirical HSIC under independence (the null distribution). When the empirical HSIC exceeds this threshold, we reject the independence hypothesis. Aside from the papers listed below, a more intuitive explanation of HSIC and the associated test may be found in the accompanying talk slides.
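As a rough illustration of the statistic, the following Python sketch computes the biased empirical HSIC in its usual form, trace(KHLH)/m^2, where K and L are the kernel matrices of the two samples and H = I - (1/m)11^T is the centering matrix. It is not the distributed MATLAB code: the function names, the restriction to scalar observations, the Gaussian kernel form and the fixed bandwidths sigma_x and sigma_y are all assumptions made for this example.

```python
import math


def gaussian_gram(points, sigma):
    """Gram matrix with entries exp(-(a - b)^2 / (2 sigma^2)) for scalar points."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in points]
            for a in points]


def center(gram):
    """Double-center a Gram matrix, i.e. return H @ gram @ H."""
    m = len(gram)
    row_means = [sum(row) / m for row in gram]
    col_means = [sum(gram[i][j] for i in range(m)) / m for j in range(m)]
    total_mean = sum(row_means) / m
    return [[gram[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(m)] for i in range(m)]


def hsic_biased(xs, ys, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC, trace(K H L H) / m^2 (bandwidths are assumed values)."""
    m = len(xs)
    kc = center(gaussian_gram(xs, sigma_x))  # H K H
    l = gaussian_gram(ys, sigma_y)
    # trace(H K H L) equals the elementwise sum because L is symmetric.
    return sum(kc[i][j] * l[i][j] for i in range(m) for j in range(m)) / m ** 2


if __name__ == "__main__":
    xs = [i / 10.0 for i in range(20)]
    ys = [x ** 2 for x in xs]  # y depends deterministically on x
    print(hsic_biased(xs, ys))
```

The value on its own says little: it becomes a test only once it is compared against a threshold taken from the null distribution.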
Two strategies are used to calculate the test threshold: a resampling procedure, and a two-parameter Gamma approximation to the null distribution.
Code may be downloaded here. The zip file contains three programs: hsicTestBoot.m obtains the test threshold by a resampling procedure, hsicTestGamma.m obtains it from a two-parameter Gamma approximation to the null distribution, and rbf_dot.m computes the kernel matrices.
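To make the resampling strategy concrete, here is a minimal Python sketch of one way such a threshold could be obtained: the y sample is randomly reordered to break any dependence on x, the statistic is recomputed for each reordering, and the (1 - alpha) quantile of the resulting values is used as the threshold. This is an assumed scheme for illustration and does not reproduce hsicTestBoot.m; the function name hsic_permutation_test, its parameters (alpha, n_shuffles, sigma_x, sigma_y, seed) and the Gaussian kernel with fixed bandwidth are hypothetical.

```python
import math
import random


def _gram(points, sigma):
    """Gaussian Gram matrix for scalar points (kernel form is an assumption)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in points]
            for a in points]


def _center(gram):
    """Double-center a Gram matrix, i.e. return H @ gram @ H."""
    m = len(gram)
    row_means = [sum(row) / m for row in gram]
    col_means = [sum(gram[i][j] for i in range(m)) / m for j in range(m)]
    total_mean = sum(row_means) / m
    return [[gram[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(m)] for i in range(m)]


def hsic_permutation_test(xs, ys, alpha=0.05, n_shuffles=200,
                          sigma_x=1.0, sigma_y=1.0, seed=0):
    """Return (statistic, threshold, reject) for a permutation-based HSIC test."""
    rng = random.Random(seed)
    m = len(xs)
    kc = _center(_gram(xs, sigma_x))
    l = _gram(ys, sigma_y)

    def statistic(order):
        # Biased HSIC with the y sample reordered according to `order`.
        return sum(kc[i][j] * l[order[i]][order[j]]
                   for i in range(m) for j in range(m)) / m ** 2

    observed = statistic(list(range(m)))

    # Approximate the null distribution by shuffling the y ordering.
    null_values = []
    for _ in range(n_shuffles):
        order = list(range(m))
        rng.shuffle(order)
        null_values.append(statistic(order))
    null_values.sort()

    # Empirical (1 - alpha) quantile of the shuffled statistics.
    index = math.ceil((1.0 - alpha) * n_shuffles) - 1
    index = max(0, min(n_shuffles - 1, index))
    threshold = null_values[index]
    return observed, threshold, observed > threshold


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(30)]
    ys = [x + 0.1 * rng.gauss(0.0, 1.0) for x in xs]
    print(hsic_permutation_test(xs, ys))
```

Centering and kernel evaluation are done once and only the index order changes per shuffle, which keeps each resampling step to a single pass over the matrices. A Gamma-based threshold avoids the repeated shuffles altogether, at the cost of relying on an approximation to the null distribution.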
[GreEtAl08a] Gretton, A., K. Fukumizu, C.-H. Teo, L. Song, B. Schoelkopf and A. Smola: A Kernel Statistical Test of Independence. Advances in Neural Information Processing Systems 20 (NIPS 2007).
[GreEtAl08b] Gretton, A., K. Fukumizu, C.-H. Teo, L. Song, B. Schoelkopf and A. Smola: A Kernel Statistical Test of Independence. MPI Technical Report 168, 2008.