Workshop on
Stability and Resampling Methods for Clustering
July 16 - 18, 2007
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Workshop Program (as PDF file)
Videos and slides of all talks
Outline of the workshop topic and aims:
Model assessment is one of the most crucial aspects of statistical
data analysis. In data clustering in particular, it is difficult to
devise reasonable tools for this purpose; the most prominent example
is the problem of choosing the number k of clusters to construct.
Stability-based methods and resampling methods have become a popular
choice for model selection in clustering, which is documented by the
wealth of literature on this topic. The basic rationale of those
approaches is that valid models should be reproducible under
perturbation or resampling of the data. If high instability of models
is observed, the inferred solution does not seem to be a generally
valid model, or at least seems to have missed some important aspects
of the data.
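The rationale above can be illustrated with a minimal sketch. It is not taken from any particular paper or from the workshop material; it assumes a plain k-means clusterer, subsampling without replacement as the perturbation, and one minus the Rand index as the instability score. The function names (`kmeans`, `assign`, `rand_index`, `instability`), the parameters `n_pairs` and `frac`, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's algorithm with random initial centers; returns the k centers."""
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):  # leave a center unchanged if its cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def assign(X, centers):
    """Label every point with the index of its nearest center."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float((same_a[iu] == same_b[iu]).mean())

def instability(X, k, n_pairs=20, frac=0.8, seed=0):
    """Mean disagreement between clusterings fitted on two random subsamples.

    Each fitted model is extended to the full data set by nearest-center
    assignment, so the two labelings can be compared on the same points.
    """
    rng = np.random.default_rng(seed)
    m = int(frac * len(X))
    scores = []
    for _ in range(n_pairs):
        labelings = []
        for _ in range(2):
            sub = X[rng.choice(len(X), size=m, replace=False)]
            labelings.append(assign(X, kmeans(sub, k, rng)))
        scores.append(1.0 - rand_index(labelings[0], labelings[1]))
    return float(np.mean(scores))

# Toy data (an assumption for illustration): three Gaussian blobs in the plane.
data_rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + data_rng.normal(size=(30, 2)) for c in blob_centers])

for k in range(2, 6):
    print(k, round(instability(X, k, n_pairs=5), 3))
```

A stability-based selection rule would prefer the k with the lowest score. Whether that choice is meaningful is exactly the open question: the score also depends on the clustering algorithm, its initialization, and the resampling scheme, not only on the structure of the data.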
Many scientists report that stability and resampling methods
work well for clustering model selection. Moreover, for supervised
learning there is a wealth of literature that proves that stable
classification algorithms have a good generalization performance. On
the other hand, it has recently been claimed that stability methods
for clustering can be misleading and do not necessarily work the way
people believe they do. There is still an ongoing debate on how those
results should be interpreted. But many researchers working on
clustering stability methods agree that there is a lack of theoretical
understanding of stability methods in clustering. In particular, it
seems unclear in which situations stability works and what mechanism
makes it a successful tool in those situations.
This lack of understanding is the motivation for holding a workshop on
stability and resampling methods for clustering. We plan to hold a
rather small workshop for specialists working on stability questions
for clustering, or on stability-related questions in other areas of
computer science or mathematics. We want to have a small number of
invited talks, but want to dedicate a considerable amount of time to
discussions. Hopefully, combining the expertise of people working on
different aspects of stability and resampling will lead to a deeper
understanding of this tool and its role with respect to clustering.
To guide the discussion, we would like to point out the following list of questions about the theory of stability methods for clustering:
- For which purposes can we use clustering stability? For example, (how) can stability be used for model selection?
- What is the mechanism which makes stability a valid tool in those situations?
- Can we characterize the situations when stability tools can be successful? Can we predict situations in which stability tools will not help at all or are misleading?
- What are the inherent limitations of stability approaches? What assumptions do we have to make?
Participants:
- Joachim Buhmann, ETH Zürich, Switzerland
- Shai Ben-David, University of Waterloo, Canada
- Patrice Bertrand, ENST Bretagne, France
- Michael Kaufmann, Tübingen University, Germany
- Tilman Lange, ETH Zürich, Switzerland
- Marina Meila, University of Washington, USA
- Frank Meinecke, Fraunhofer First, Germany
- Tsuyoshi Okita, Vrije Universiteit Brussel, Belgium
- Kristiaan Pelckmans, KU Leuven, Belgium
- Volker Roth, ETH Zürich, Switzerland
- Bernhard Schölkopf, Max Planck Institute for Biological Cybernetics, Germany
- Ohad Shamir, Hebrew University, Israel
- Nati Srebro, TTI Chicago, USA
- Giorgio Valentini, Universita di Milano, Italy
- Zeev Volkovich, ORT Braude College, Karmiel, Israel
- Ulrike von Luxburg, Max Planck Institute for Biological Cybernetics, Germany
Workshop location:
The workshop will be held at the Max Planck House next to the
Max Planck Institute for Biological Cybernetics in Tübingen, Germany.
Tübingen is a small university town in the south of Germany and
is worth a visit in its own right. For many links about Tübingen and
its surroundings see here.
Accommodation:
There is a limited number of rooms in the Max Planck Guest House, directly next to where the workshop will take place.
For other hotels in Tübingen please have a look at the
hotel page of the tourist information. In general, the institute can be reached by a 15-20 minute bus ride from more or less any hotel in Tübingen.
Traveling to Tübingen:
For traveling directions see here.
Registration:
If you would like to take part in the workshop, please send an email to Tilman Lange.
Workshop organization:
This workshop is being organized by Ulrike von Luxburg, Max Planck Institute for Biological Cybernetics, Tübingen, and Tilman Lange, ETH Zürich, Switzerland. It is sponsored by the PASCAL network of excellence and the MPI for Biological Cybernetics.