Workshop on Stability and Resampling Methods for Clustering

July 16 - 18, 2007

Max Planck Institute for Biological Cybernetics, Tübingen, Germany

Workshop program (PDF)

Videos and slides of all talks

Outline of the workshop topic and aims:

Model assessment is one of the most crucial aspects of statistical data analysis. In data clustering in particular, it is difficult to devise reasonable tools for this purpose; the most prominent example is the problem of choosing the number k of clusters to construct. Stability-based and resampling methods have become a popular choice for model selection in clustering, as documented by the extensive literature on the topic. The basic rationale of these approaches is that valid models should be reproducible under perturbation or resampling of the data. If the inferred models turn out to be highly unstable, the solution is unlikely to be a generally valid model, or at least it seems to have missed some important aspects of the data.
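The basic rationale above (a valid clustering should be reproducible under resampling) can be illustrated with a minimal sketch for choosing k. This is not a method presented at the workshop; it rests on several assumptions of our own: k-means (Lloyd's algorithm with k-means++ seeding) is the base clustering algorithm, instability is measured as the fraction of point pairs in the overlap of two random 80% subsamples whose co-membership differs between the two clusterings, and the score is averaged over 20 subsample pairs. The function names `kmeans` and `instability` and all parameter values are hypothetical.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_init=5, iters=100):
    """Hypothetical helper: Lloyd's k-means with k-means++ seeding.

    Runs n_init restarts and returns one label per point from the restart
    with the lowest within-cluster sum of squares."""
    best_labels, best_cost = None, float("inf")
    for _ in range(n_init):
        # k-means++ seeding: first center uniform, later ones proportional to D^2.
        centers = [rng.choice(points)]
        for _ in range(1, k):
            d2 = [min(sq_dist(p, c) for c in centers) for p in points]
            centers.append(rng.choices(points, weights=d2, k=1)[0])
        for _ in range(iters):
            # Assignment step: each point goes to its nearest center.
            labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
            # Update step: move centers to cluster means (empty clusters keep their center).
            new_centers = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
                else:
                    new_centers.append(centers[c])
            if new_centers == centers:
                break
            centers = new_centers
        cost = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


def instability(points, k, n_pairs=20, frac=0.8, seed=0):
    """Hypothetical score: average disagreement between clusterings of two subsamples.

    Counts point pairs in the overlap that are together in one clustering but
    apart in the other, which is invariant to label permutations. Assumes frac
    is well above 0.5 so the two subsamples share enough points."""
    rng = random.Random(seed)
    n = len(points)
    m = int(frac * n)
    total = 0.0
    for _ in range(n_pairs):
        idx_a = rng.sample(range(n), m)
        idx_b = rng.sample(range(n), m)
        lab_a = dict(zip(idx_a, kmeans([points[i] for i in idx_a], k, rng)))
        lab_b = dict(zip(idx_b, kmeans([points[i] for i in idx_b], k, rng)))
        common = sorted(set(idx_a) & set(idx_b))
        pairs = list(combinations(common, 2))
        disagree = sum((lab_a[i] == lab_a[j]) != (lab_b[i] == lab_b[j]) for i, j in pairs)
        total += disagree / len(pairs)
    return total / n_pairs


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated Gaussian blobs in the plane.
    data_rng = random.Random(1)
    pts = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
           for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(20)]
    for k in range(2, 6):
        print(k, round(instability(pts, k), 3))
```

On data like the toy blobs one would typically expect the smallest score near k = 3, but the sketch also shows why such scores need care: k = 1 is always perfectly stable under this measure, so it is excluded from the candidates, and symmetric configurations can make a wrong k look stable as well. A low score is evidence of reproducibility, not proof that k is correct.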

Many scientists report that stability and resampling methods work well for model selection in clustering. Moreover, for supervised learning there is a substantial body of literature proving that stable classification algorithms generalize well. On the other hand, it has recently been claimed that stability methods for clustering can be misleading and do not necessarily work the way people believe they do. How these results should be interpreted is still being debated. However, many researchers working on clustering stability agree that there is a lack of theoretical understanding of stability methods in clustering. In particular, it is unclear in which situations stability works and which mechanism makes it a successful tool in those situations.

This lack of understanding is the motivation for holding a workshop on stability and resampling methods for clustering. We plan to hold a rather small workshop for specialists working on stability questions in clustering, or on stability-related questions in other areas of computer science or mathematics. We plan a small number of invited talks and want to dedicate a considerable amount of time to discussion. We hope that combining the expertise of people working on different aspects of stability and resampling will lead to a deeper understanding of this tool and its role in clustering.

To guide the discussion, we propose the following questions about the theory of stability methods for clustering:

For which purposes can we use clustering stability? For example, (how) can stability be used for model selection?
What mechanism makes stability a valid tool in those situations?
Can we characterize the situations in which stability tools can be successful? Can we predict situations in which stability tools will not help at all or are misleading?
What are the inherent limitations of stability approaches? Which assumptions do we have to make?

Participants:

Joachim Buhmann, ETH Zürich, Switzerland
Shai Ben-David, University of Waterloo, Canada
Patrice Bertrand, ENST Bretagne, France
Michael Kaufmann, Tübingen University, Germany
Tilman Lange, ETH Zürich, Switzerland
Marina Meila, University of Washington, USA
Frank Meinecke, Fraunhofer FIRST, Germany
Tsuyoshi Okita, Vrije Universiteit Brussel, Belgium
Kristiaan Pelckmans, KU Leuven, Belgium
Volker Roth, ETH Zürich, Switzerland
Bernhard Schölkopf, Max Planck Institute for Biological Cybernetics, Germany
Ohad Shamir, Hebrew University, Israel
Nati Srebro, TTI Chicago, USA
Giorgio Valentini, Università di Milano, Italy
Zeev Volkovich, ORT Braude College, Karmiel, Israel
Ulrike von Luxburg, Max Planck Institute for Biological Cybernetics, Germany

Workshop location:

The workshop will be held at the Max Planck House next to the Max Planck Institute for Biological Cybernetics in Tübingen, Germany. Tübingen is a small university town in the south of Germany and is worth a visit in its own right. For links about Tübingen and its surroundings, see here.

Accommodation:
There is a limited number of rooms in the Max Planck Guest House, directly next to the workshop venue. For other hotels in Tübingen, please see the hotel page of the tourist information office. The institute can be reached by a 15-20 minute bus ride from almost any hotel in Tübingen.

Traveling to Tübingen:
For traveling directions see here.

Registration:
If you would like to take part in the workshop, please send an email to Tilman Lange.

Workshop organization:

This workshop is being organized by Ulrike von Luxburg, Max Planck Institute for Biological Cybernetics, Tübingen, and Tilman Lange, ETH Zürich, Switzerland. It is sponsored by the PASCAL network of excellence and the MPI for Biological Cybernetics.