See our tutorial on "PAC-Bayesian Analysis and its Applications"
at ECML-PKDD here: https://sites.google.com/site/ecmlpkddtutorialpacbayes/.
PAC-Bayesian analysis is a basic and very general tool for data-dependent analysis in machine learning. By now, it has been applied in areas as diverse as supervised learning, unsupervised learning, and reinforcement learning, leading to state-of-the-art algorithms and accompanying generalization bounds. In a sense, PAC-Bayesian analysis combines the best of Bayesian methods and PAC learning: (1) it provides an easy way to exploit prior knowledge (like Bayesian methods); (2) it provides strict and explicit generalization guarantees (like VC theory); and (3) it is data-dependent and provides an easy and rigorous way of exploiting benign conditions (like Rademacher complexities). In addition, PAC-Bayesian bounds directly lead to efficient learning algorithms.
We will start with a general introduction to PAC-Bayesian
analysis, which should be accessible to any student with a basic
familiarity with machine learning. Then, we will
survey multiple forms of PAC-Bayesian bounds and their numerous
applications in different fields (including supervised and
unsupervised learning, finite and continuous domains, and the very
recent extension to martingales and reinforcement learning). Some
of these applications will be explained in more detail, while
others will be surveyed at a high level. We will also describe the
relations and distinctions between PAC-Bayesian analysis, Bayesian
learning, VC theory, and Rademacher complexities. We will discuss
the role, value, and shortcomings of frequentist bounds that are
inspired by Bayesian analysis.
The talk will be targeted at a general audience: people who
would like to learn about PAC-Bayesian analysis, understand it
better, and apply it in their research. We will assume that the
audience is familiar with the notions of a hypothesis space, loss
function, Markov's and Hoeffding's inequalities, KL-divergence,
and mutual information. No prior knowledge beyond these basics
will be assumed. We aim at several goals, of increasing
difficulty, for what we would like our tutorial participants to learn.
(1) The participants should learn what PAC-Bayesian analysis is
in general, what PAC-Bayesian bounds mean, where they can and
cannot be applied, and what can and cannot be achieved with
PAC-Bayesian analysis. (2) Participants should get an idea of how
to derive a PAC-Bayesian bound for a standard application and how
to derive a learning algorithm from a PAC-Bayesian bound. (3)
Advanced listeners, who are already familiar with PAC-Bayesian
analysis, will enjoy the newly organized and categorized view of
this subject and, possibly, learn some new form or application of
PAC-Bayesian analysis.
In the first part we present the simplest form of the PAC-Bayesian theorem, show two simple applications of this theorem in supervised learning, and discuss its relation to Bayesian learning.
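To give a flavor of this first part, one commonly cited form of such a theorem (the PAC-Bayes-kl bound in Maurer's form; the tutorial itself may present a different variant) states that for any prior distribution $P$ over the hypothesis space, fixed before seeing the sample, with probability at least $1-\delta$ over an i.i.d. sample of size $n$, simultaneously for all posterior distributions $Q$:

```latex
\mathrm{kl}\!\left(\hat{L}(Q)\,\middle\|\,L(Q)\right)
  \;\le\; \frac{\mathrm{KL}(Q\|P) + \ln\frac{2\sqrt{n}}{\delta}}{n},
```

where $\hat{L}(Q)$ and $L(Q)$ are the empirical and expected losses of the randomized (Gibbs) classifier defined by $Q$, $\mathrm{kl}(\cdot\|\cdot)$ is the KL divergence between two Bernoulli distributions, and $\mathrm{KL}(Q\|P)$ is the KL divergence between the posterior and the prior.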
In the second part we show the many different forms that PAC-Bayesian theorems can take, present tighter bounds for supervised learning, and cover additional applications in unsupervised and reinforcement learning.
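As a small illustration of how such bounds are used in practice, the following sketch (function names and example values are ours, not from the tutorial) numerically evaluates the kl-form of the bound stated above by inverting the binary KL divergence with bisection:

```python
import math

def bernoulli_kl(p, q):
    # Binary KL divergence kl(p || q) between Bernoulli(p) and Bernoulli(q),
    # with clipping to avoid taking log of zero.
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def kl_inverse(emp_loss, rhs, tol=1e-9):
    # Largest q in [emp_loss, 1] satisfying kl(emp_loss || q) <= rhs,
    # found by bisection (kl is increasing in q on this interval).
    lo, hi = emp_loss, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(emp_loss, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return lo

def pac_bayes_kl_bound(emp_loss, kl_q_p, n, delta):
    # Upper bound on the expected loss L(Q) given the empirical loss,
    # the divergence KL(Q||P), the sample size n, and confidence delta,
    # following the kl-form of the PAC-Bayes theorem (Maurer's variant).
    rhs = (kl_q_p + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse(emp_loss, rhs)

# Hypothetical example: 10% empirical loss, KL(Q||P) = 5 nats,
# n = 10000 samples, 95% confidence.
print(pac_bayes_kl_bound(0.1, 5.0, 10000, 0.05))
```

Note how the bound tightens as the sample size grows and loosens as the posterior moves further from the prior, which is exactly the trade-off that PAC-Bayes-driven learning algorithms optimize.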
Assistant Professor, Department of Computer Science, Université Laval (Québec), Canada
Head of Department of Computer Science, University College London, UK
For an implementation of the prior-learning technique in
PAC-Bayesian analysis, please see the web page of Emilio