Organizers: Shai Ben-David, Ulrike von Luxburg, John Shawe-Taylor and Naftali Tishby
Background: Clustering is one of the most widely used techniques for exploratory data analysis. Despite the large number of algorithms and applications, the theoretical foundations of clustering seem to be distressingly meager, covering only some sub-domains and failing to address some of the most basic general aspects of the area. We wish to initiate a concerted discussion, in order to move towards a consolidation of the theoretical basis for -at least some of the aspects of -clustering. One prospective benefit of building a theoretical framework for clustering may come from enabling the transfer of tools developed in other related domains, such as machine learning and information theory, where the usefulness of having a general mathematical framework have been impressively demonstrated.
Questions we wish to address:
1. What is clustering? How can it be defined?
2. How should prior knowledge be encoded? As a pair-wise similarity/distance function over domain points? As a set of relevant features? Should data be embedded in a richer structure?
3. Is there a principled way to measure the quality of a clustering on particular data set?
4. Is there a principled way to measure the quality of a clustering algorithm? What are sufficient conditions for reasonable clustering? Are there obvious necessary conditions? What type of performance guarantees can one hope to provide?
5. How should the similarity between different clusterings be measured?
6. Can one distinguish clusterable data from structureless data?
7. What are the tools we should try to import from other relevant areas of research?
(as pdf file)
Morning session: 7:30am–10:30am
Introduction and Goals of theWorkshop. slides.
Towards a statistical theory of clustering. Ulrike von Luxburg. slides
What is clustering?
|7:45am||Attempts to formalize clustering, Shai Ben-David. abstract. slides|
|8:25am||Informational and computational limits of clustering, Nati Srebro, Sam Roweis, and Gregory Shakhnarovich. abstract. slides|
Stability and Evaluation of clustering
|8:55am||Invited talk: Confidence and stability in comparing clusterings, Marina Meila. abstract. slides|
|9:25am||Cascade evaluation, Laurent Candillier, Isabelle Tellier, Fabien Torre, and Olivier Bousquet. abstract. slides|
|9:35am||Invited talk: Data Clustering and the Stability Method, Joachim M Buhmann. slides|
|10:05 am||Stability of clustering method, Sasha Rakhlin. abstract. slides|
Afternoon session: 3:30pm–6:30pm
The clustering input structure
|3:30pm||Feature space generalization - the missing dimension of learning? Tali Tishby. slides|
|4:00pm||Invited talk: Learning with similarity functions, Avrim Blum. abstract. slides|
|4:30 pm||Learning clusterwise similarity with first order formulas, Aron Culotta and Andrew McCallum. abstract. slides|
Information theoretic and other principled approaches to clustering
|4:50pm||Invited talk: Clustering from a rate-distortion perspective, Joydeep Ghosh. abstract. slides|
|5:30pm||An MDL Framework for Data Clustering, Petri Myllymaki. abstract . slides|
|5:40pm||Density traversal clustering, Amos J Storkey and Tom G Griffiths. abstract. slides|
|5:50pm||Clustering and Staircases, K. Pelckmans, J.A.K. Suykens, B. De Moor. abstract. slides|
Extended workshop description
Call for contributions - deadline passed