Organizers: Shai Ben-David, Ulrike von Luxburg, John Shawe-Taylor and Naftali Tishby
Background: Clustering is one of the most widely used techniques for exploratory data analysis. Despite the large number of algorithms and applications, the theoretical foundations of clustering seem to be distressingly meager, covering only some sub-domains and failing to address some of the most basic general aspects of the area. We wish to initiate a concerted discussion, in order to move towards a consolidation of the theoretical basis for -at least some of the aspects of -clustering. One prospective benefit of building a theoretical framework for clustering may come from enabling the transfer of tools developed in other related domains, such as machine learning and information theory, where the usefulness of having a general mathematical framework have been impressively demonstrated.
Questions we wish to address:
1. What is clustering? How can it be defined?
2. How should prior knowledge be encoded? As a pair-wise similarity/distance function over domain points? As a set of relevant features? Should data be embedded in a richer structure?
3. Is there a principled way to measure the quality of a clustering on particular
data set?
4. Is there a principled way to measure the quality of a clustering algorithm? What are sufficient conditions for reasonable clustering? Are there obvious necessary conditions? What type of performance guarantees can one hope to provide?
5. How should the similarity between different clusterings be measured?
6. Can one distinguish clusterable data from structureless data?
7. What are the tools we should try to import from other relevant areas of research?
Morning session: 7:30am–10:30am
7:30am |
Introduction and Goals of theWorkshop. slides. Towards a statistical theory of clustering. Ulrike von Luxburg. slides |
What is clustering?
7:45am | Attempts to formalize clustering, Shai Ben-David. abstract. slides |
8:15am |
On the futility of attempts to formalize clustering within the conventional |
8:25am | Informational and computational limits of clustering, Nati Srebro, Sam Roweis, and Gregory Shakhnarovich. abstract. slides |
8:35am | discussion |
8:45am | coffee break |
Stability and Evaluation of clustering
8:55am | Invited talk: Confidence and stability in comparing clusterings, Marina Meila. abstract. slides |
9:25am | Cascade evaluation, Laurent Candillier, Isabelle Tellier, Fabien Torre, and Olivier Bousquet. abstract. slides |
9:35am | Invited talk: Data Clustering and the Stability Method, Joachim M Buhmann. slides |
10:05 am | Stability of clustering method, Sasha Rakhlin. abstract. slides |
10:15am | discussion |
Afternoon session: 3:30pm–6:30pm
The clustering input structure
3:30pm | Feature space generalization - the missing dimension of learning? Tali Tishby. slides |
4:00pm | Invited talk: Learning with similarity functions, Avrim Blum. abstract. slides |
4:30 pm | Learning clusterwise similarity with first order formulas, Aron Culotta and Andrew McCallum. abstract. slides |
4:40pm | coffee break |
Information theoretic and other principled approaches to clustering
4:50pm | Invited talk: Clustering from a rate-distortion perspective, Joydeep Ghosh. abstract. slides |
5:20pm |
Information based clustering - a principled approach for cluster analysis. Noam Slonim, Gurinder Singh Atwal, Gasper Tkacik, William Bialek. abstract. slides |
5:30pm | An MDL Framework for Data Clustering, Petri Myllymaki. abstract . slides |
5:40pm | Density traversal clustering, Amos J Storkey and Tom G Griffiths. abstract. slides |
5:50pm | Clustering and Staircases, K. Pelckmans, J.A.K. Suykens, B. De Moor. abstract. slides |
6:00pm | Concluding discussion |
Call for contributions - deadline passed