Langevin MCMC: theory and methods

43 mins 14 secs,  174.98 MB,  WebM  640x360,  29.97 fps,  44100 Hz,  552.59 kbits/sec

About this item
Description: Moulines, E
Friday 7th July 2017 - 09:00 to 09:45
 
Created: 2017-07-24 12:16
Collection: Scalable inference; statistical, algorithmic, computational aspects
Publisher: Isaac Newton Institute
Copyright: Moulines, E
Language: eng (English)
Distribution: World     (downloadable)
Explicit content: No
Aspect Ratio: 16:9
Screencast: No
Bumper: UCS Default
Trailer: UCS Default
 
Abstract: Nicolas Brosse, Ecole Polytechnique, Paris
Alain Durmus, Telecom ParisTech and Ecole Normale Supérieure Paris-Saclay
Marcelo Pereira, Heriot-Watt University, Edinburgh


The complexity and sheer size of modern datasets, together with the ever more demanding questions posed of them, give rise to major challenges. Traditional simulation methods often scale poorly with data size and model complexity, and thus fail for the most complex of modern problems.
We consider the problem of sampling from a log-concave distribution. Many problems in machine learning fall into this framework,
such as linear ill-posed inverse problems with sparsity-inducing priors, or large-scale Bayesian binary regression.
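In standard notation (a generic formulation, not quoted from the abstract), this means sampling from a density of the form
\[
\pi(x) \;\propto\; \mathrm{e}^{-U(x)}, \qquad U \colon \mathbb{R}^d \to \mathbb{R} \ \text{convex},
\]
where the potential U typically combines a smooth negative log-likelihood with a convex, possibly non-smooth, prior term.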



The purpose of this lecture is to explain how ideas that have proven very useful in the machine learning community for solving
large-scale optimization problems can be used to design efficient sampling algorithms.
Most of the efficient algorithms known so far may be seen as variants of gradient descent,
most often coupled with « partial updates » (coordinate descent algorithms). This, of course, suggests studying methods derived from the Euler discretization of the Langevin diffusion; partial updates may be interpreted in this context as « Gibbs steps ». The algorithm may be generalized to the non-smooth case by « regularizing » the objective function, and the Moreau-Yosida inf-convolution is an appropriate candidate in such cases.
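As a rough sketch of these ideas (illustrative only: the quadratic likelihood, the l1 prior, the step size gamma and the smoothing parameter lam below are assumptions, not details from the talk), an unadjusted Langevin step X_{k+1} = X_k - gamma * grad U(X_k) + sqrt(2 gamma) Z_{k+1} and a Moreau-Yosida regularized variant for the non-smooth part could be written as follows in Python:

import numpy as np

# Illustrative target: pi(x) proportional to exp(-U(x)) with U = f + g,
# f smooth (quadratic data-fit term), g convex but non-smooth (l1 prior).
# Dimensions, data and tuning parameters below are arbitrary placeholders.
rng = np.random.default_rng(0)
d = 10
A = rng.standard_normal((20, d))
y = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(20)
alpha = 1.0                       # weight of the l1 prior

def grad_f(x):
    # Gradient of the smooth part f(x) = 0.5 * ||A x - y||^2.
    return A.T @ (A @ x - y)

def prox_g(x, lam):
    # Proximal map of lam * g with g(x) = alpha * ||x||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - lam * alpha, 0.0)

def ula_step(x, gamma):
    # Unadjusted Langevin step: Euler discretization of the Langevin diffusion
    # for the smooth part alone.
    return x - gamma * grad_f(x) + np.sqrt(2.0 * gamma) * rng.standard_normal(d)

def myula_step(x, gamma, lam):
    # Moreau-Yosida regularized step: g is replaced by its Moreau envelope,
    # whose gradient is (x - prox_g(x, lam)) / lam.
    drift = grad_f(x) + (x - prox_g(x, lam)) / lam
    return x - gamma * drift + np.sqrt(2.0 * gamma) * rng.standard_normal(d)

# Short chain with arbitrarily chosen step size and smoothing parameter.
x = np.zeros(d)
samples = []
for _ in range(5000):
    x = myula_step(x, gamma=1e-3, lam=1e-2)
    samples.append(x.copy())
samples = np.array(samples)
print("posterior mean estimate:", samples[1000:].mean(axis=0))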

We will prove convergence results for these algorithms, with explicit convergence bounds both in Wasserstein distance and in total variation. Numerical illustrations (computation of Bayes factors for model choice, Bayesian analysis of high-dimensional regression, aggregation of estimators) will be presented to illustrate our results.
Available Formats
Format          Quality     Bitrate             Size
MPEG-4 Video    640x360     1.94 Mbits/sec      628.97 MB
WebM *          640x360     552.59 kbits/sec    174.98 MB
iPod Video      480x270     522.26 kbits/sec    165.31 MB
MP3             44100 Hz    249.77 kbits/sec    79.15 MB