Title: On learning with similarity functions (with extensions to clustering)

ABSTRACT
========

There has been substantial work in both theory and practice on learning with {\em kernel} functions. However, there is also a disconnect between the two: in practice, kernels are viewed as measures of similarity, whereas the theory talks instead about margins of separators in implicit, high-dimensional spaces defined by mappings that one may not even be able to compute. In this talk I will discuss work on developing an alternative theory of learning with {\em similarity} functions (i.e., sufficient conditions for a similarity function to allow one to learn well) that (a) does not require reference to any implicit spaces, (b) does not require the function to be positive semi-definite, and (c) generalizes the standard theory, in that any good kernel function in the standard sense is also a good similarity function in the sense defined here. I will then talk about some preliminary work on extending this to clustering: that is, what conditions on a similarity function would suffice to allow one to cluster well, and how "clustering well" should itself be defined. This is joint work with Nina Balcan.
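To make condition (a) concrete, analyses in this line of work typically use an explicit "landmark" construction: map each example to the vector of its similarities to a sample of landmark points, then learn a linear separator over those explicit features. The following is a minimal Python sketch of that idea, not the talk's actual construction; the similarity function `sim` (one minus squared Euclidean distance, which need not be positive semi-definite), the number of landmarks, and the synthetic data are all illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def landmark_features(X, landmarks, sim):
        """Map each example x to (sim(x, l_1), ..., sim(x, l_d)):
        an explicit feature space, no implicit kernel mapping needed."""
        return np.array([[sim(x, l) for l in landmarks] for x in X])

    def sim(x, y):
        # Illustrative similarity: 1 - squared distance. Its Gram matrix
        # can have negative diagonal entries, so it is not a valid
        # (positive semi-definite) kernel, yet it is still informative.
        return 1.0 - np.sum((x - y) ** 2)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # a linearly separable toy task

    # Draw landmark points from the (unlabeled) data itself.
    landmarks = X[rng.choice(len(X), size=20, replace=False)]

    F = landmark_features(X, landmarks, sim)
    clf = LogisticRegression(max_iter=1000).fit(F, y)
    print("train accuracy:", clf.score(F, y))

The point of the sketch is that the learning happens in a space the algorithm can actually write down (similarities to sampled landmarks), which is how such results avoid any reference to implicit high-dimensional spaces and drop the positive semi-definiteness requirement.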