Perturbed Proximal Gradient Algorithm


1 Perturbed Proximal Gradient Algorithm Gersende FORT LTCI, CNRS, Telecom ParisTech, Université Paris-Saclay, 75013 Paris, France Large-scale inverse problems and optimization Applications to image processing and astrophysics Grenoble, November 2015

2 Introduction Works in collaboration with Eric Moulines (Professor, Ecole Polytechnique), Yves Atchadé (Assistant Professor, Univ. Michigan, USA), and also Jean-François Aujol (IMB, Univ. Bordeaux), Charles Dossal (IMB, Univ. Bordeaux) and Soukaina Douissi. Y. Atchadé, G. Fort and E. Moulines. On Stochastic Proximal Gradient Algorithms. arXiv:1402.2365 [math.ST].

3 Introduction Optimization problem Outline
Introduction
  Optimization problem
  Proximal-Gradient algorithm
  Intractable proximal-gradient iteration
  Perturbed Proximal Gradient
Convergence of the (stable) perturbed proximal-gradient algorithm
Convergence of the Monte Carlo proximal-gradient algorithm
Conclusion, Other results and Works in progress

4 Introduction Optimization problem
Problem: (arg)min_{θ∈Θ} F(θ) with F(θ) = f(θ) + g(θ), where
- Θ is a finite-dimensional Euclidean space with scalar product ⟨·,·⟩ and norm ‖·‖;
- the function f: Θ → ℝ is smooth, i.e. f is continuously differentiable and there exists L > 0 such that ‖∇f(θ) − ∇f(θ′)‖ ≤ L ‖θ − θ′‖;
- the function g: Θ → (−∞, +∞] is convex, not identically equal to +∞, and lower semi-continuous;
in the case where f(θ) and ∇f are intractable.

5 Introduction Proximal-Gradient algorithm
Classical algorithm when ∇f is tractable (1/3)
Since ∇f is Lipschitz, for any u, θ,
  f(θ) ≤ f(u) + ⟨∇f(u), θ − u⟩ + (L/2) ‖θ − u‖²,
which yields, for any γ ∈ (0, 1/L],
  F(θ) ≤ f(u) + ⟨∇f(u), θ − u⟩ + (1/(2γ)) ‖θ − u‖² + g(θ).
The RHS satisfies, for fixed u:
- it is an upper bound of θ ↦ F(θ);
- for θ = u, this upper bound is equal to F(u);
- it is convex (in θ) and can be written C(u) + (1/(2γ)) ‖θ − {u − γ∇f(u)}‖² + g(θ).

6 Introduction Proximal-Gradient algorithm
Classical algorithm when ∇f is tractable (2/3)
Denote the upper bound by
  Q_γ(θ | u) := C(u) + (1/(2γ)) ‖θ − {u − γ∇f(u)}‖² + g(θ).
Majorization-Minimization (MM) algorithm: define {θ_n, n ≥ 0} iteratively by
  θ_{n+1} = argmin_θ Q_γ(θ | θ_n),
or equivalently
  θ_{n+1} = Prox_γ(θ_n − γ∇f(θ_n)) with Prox_γ(τ) := argmin_θ { g(θ) + (1/(2γ)) ‖θ − τ‖² },
also called the Proximal-Gradient algorithm.
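To make the iteration concrete, here is a minimal pure-Python sketch (my example, not from the talk) of the proximal-gradient recursion for f(θ) = ½‖Aθ − b‖² and g(θ) = λ‖θ‖₁, whose proximal map is componentwise soft-thresholding; all function names are illustrative.

```python
# Sketch of the proximal-gradient (MM) iteration for
# f(theta) = 0.5*||A theta - b||^2 and g(theta) = lam*||theta||_1,
# whose proximal map is componentwise soft-thresholding.
def soft_threshold(t, s):
    # Prox of s*|.|, applied componentwise.
    return [max(abs(ti) - s, 0.0) * (1.0 if ti > 0 else -1.0) for ti in t]

def grad_f(theta, A, b):
    # Gradient of 0.5*||A theta - b||^2, i.e. A^T (A theta - b).
    r = [sum(A[i][j] * theta[j] for j in range(len(theta))) - b[i]
         for i in range(len(A))]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(theta))]

def prox_gradient(A, b, lam, gamma, n_iter=500):
    # theta_{n+1} = Prox_gamma(theta_n - gamma * grad f(theta_n)).
    theta = [0.0] * len(A[0])
    for _ in range(n_iter):
        g = grad_f(theta, A, b)
        step = [theta[j] - gamma * g[j] for j in range(len(theta))]
        theta = soft_threshold(step, gamma * lam)
    return theta

# Toy problem with A = I: the minimiser is soft_threshold(b, lam) = (1.5, 0).
theta = prox_gradient([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.1], lam=0.5, gamma=0.5)
```

With A = I one has L = 1, so γ = 0.5 ≤ 1/L and the iteration reaches the minimiser soft_threshold(b, λ) geometrically.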

7 Introduction Proximal-Gradient algorithm
Classical algorithm when ∇f is tractable (3/3)
The sequence {θ_n, n ≥ 0} is given by θ_{n+1} = argmin_θ Q_γ(θ | θ_n), where the upper bound θ ↦ Q_γ(θ | u) satisfies
  F(θ) ≤ Q_γ(θ | u),  F(u) = Q_γ(u | u).
Lyapunov function: F(θ_{n+1}) ≤ F(θ_n), since
  F(θ_{n+1}) ≤ Q_γ(θ_{n+1} | θ_n) ≤ Q_γ(θ_n | θ_n) = F(θ_n).

8 Introduction Intractable proximal-gradient iteration
The exact proximal-gradient algorithm:
  θ_{n+1} = Prox_{γ_{n+1}}(θ_n − γ_{n+1} ∇f(θ_n)),
where {γ_n, n ≥ 0} is a step-size sequence in (0, 1/L].
1. Prox_γ(u) can be intractable (not in this talk).
2. ∇f can be intractable (in this talk).

9 Introduction Intractable proximal-gradient iteration: explicit proximal operator
(Projection on C) When g(θ) = 0 if θ ∈ C and +∞ otherwise, where C is closed convex,
  Prox_γ(τ) = argmin_{θ∈C} ‖τ − θ‖².
(Elastic net penalty) g(θ) = λ ( (1−α)/2 ‖θ‖² + α ‖θ‖₁ ); componentwise,
  (Prox_γ(τ))_i = (τ_i − γλα)/(1 + γλ(1−α)) if τ_i ≥ γλα,
  (Prox_γ(τ))_i = (τ_i + γλα)/(1 + γλ(1−α)) if τ_i ≤ −γλα,
  (Prox_γ(τ))_i = 0 otherwise.
Proximal-gradient algorithm = thresholded gradient algorithm.
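The componentwise elastic-net prox is easy to transcribe; a sketch in pure Python (function and variable names are mine):

```python
def prox_elastic_net(tau, gamma, lam, alpha):
    # Componentwise prox of g(theta) = lam*((1-alpha)/2*||theta||^2 + alpha*||theta||_1):
    # soft-threshold at gamma*lam*alpha, then scale by 1/(1 + gamma*lam*(1-alpha)).
    shrink = gamma * lam * alpha
    scale = 1.0 + gamma * lam * (1.0 - alpha)
    out = []
    for t in tau:
        if t >= shrink:
            out.append((t - shrink) / scale)
        elif t <= -shrink:
            out.append((t + shrink) / scale)
        else:
            out.append(0.0)
    return out

# alpha = 1 recovers lasso soft-thresholding; alpha = 0 the pure ridge scaling.
p = prox_elastic_net([2.0, 0.1, -1.0], gamma=1.0, lam=0.5, alpha=1.0)
q = prox_elastic_net([3.0], gamma=1.0, lam=0.5, alpha=0.0)
```

The two extreme values of α show the "thresholded gradient" reading: α = 1 only thresholds, α = 0 only rescales by 1/(1 + γλ).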

11 Introduction Intractable proximal-gradient iteration: intractable gradient ∇f
1. Unknown function f, with gradient of the form ∇f(θ) = ∫ H_θ(x) π_θ(dx). In this case,
  ∇f(θ) ≈ (1/m) Σ_{k=1}^m H_θ(X_k),  {X_k, k ≥ 1}: (Online) Learning, Markov chain Monte Carlo.
2. Large-scale optimization: ∇f(θ) = (1/N) Σ_{k=1}^N ∇f_k(θ) with large N. In this case,
  ∇f(θ) ≈ (1/m) Σ_{k=1}^m ∇f_{I_k}(θ).
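Both cases replace the exact gradient by an m-sample average. For the large-scale case 2, a hypothetical mini-batch sketch (the toy functions f_k are mine):

```python
import random

def minibatch_grad(grad_fns, theta, m, rng):
    # Approximate (1/N) * sum_k grad f_k(theta) by an average of m terms
    # with indices I_1, ..., I_m drawn uniformly at random (unbiased estimate).
    g = [0.0] * len(theta)
    for _ in range(m):
        gi = grad_fns[rng.randrange(len(grad_fns))](theta)
        g = [a + b / m for a, b in zip(g, gi)]
    return g

# Toy sum: f_k(theta) = 0.5*(theta - c_k)^2 in 1D, so grad f_k(theta) = theta - c_k
# and the full gradient at theta = 0 is -mean(c_k) = -1.5.
cs = [0.0, 1.0, 2.0, 3.0]
grad_fns = [(lambda th, c=c: [th[0] - c]) for c in cs]
est = minibatch_grad(grad_fns, [0.0], m=2000, rng=random.Random(0))
```

The estimate fluctuates around the full gradient with standard deviation of order m^(-1/2), which is exactly the perturbation η analysed in the rest of the talk.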

12 Introduction Perturbed Proximal Gradient
In this talk:
The exact proximal-gradient algorithm
  θ_{n+1} = Prox_{γ_{n+1}}(θ_n − γ_{n+1} ∇f(θ_n)).
The perturbed proximal-gradient algorithm
  θ_{n+1} = Prox_{γ_{n+1}}(θ_n − γ_{n+1} {∇f(θ_n) + η_{n+1}}).
1. Which conditions on γ_n, η_n ensure convergence to the same limiting set as for the exact algorithm?
2. When η_n is a (random) Monte Carlo approximation, which conditions on γ_n, m_n?
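The perturbed iteration differs from the exact one only by the noise added to the gradient. A sketch (mine) with i.i.d. Gaussian η_n and one illustrative decaying step-size choice; the convergence of this particular run is a numerical observation, not a verification of the talk's conditions:

```python
import random

def soft_threshold(t, s):
    return [max(abs(ti) - s, 0.0) * (1.0 if ti > 0 else -1.0) for ti in t]

def perturbed_prox_gradient(grad_f, prox, theta0, gammas, noise):
    # theta_{n+1} = Prox_{gamma_{n+1}}(theta_n - gamma_{n+1}*(grad f(theta_n) + eta_{n+1}))
    theta = list(theta0)
    for n, gamma in enumerate(gammas):
        eta = noise(n)
        g = grad_f(theta)
        drift = [theta[j] - gamma * (g[j] + eta[j]) for j in range(len(theta))]
        theta = prox(drift, gamma)
    return theta

# Toy run: f(theta) = 0.5*||theta - b||^2 (so L = 1), g(theta) = 0.5*||theta||_1,
# i.i.d. N(0,1) noise, and gamma_n = (n+1)^(-0.7), which satisfies
# sum gamma_n = +inf and sum gamma_n^2 < +inf.
rng = random.Random(1)
b = [2.0, 0.1]
theta = perturbed_prox_gradient(
    grad_f=lambda th: [th[j] - b[j] for j in range(2)],
    prox=lambda t, gamma: soft_threshold(t, 0.5 * gamma),
    theta0=[0.0, 0.0],
    gammas=[(n + 1) ** -0.7 for n in range(4000)],
    noise=lambda n: [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)],
)
```

Despite unit-variance noise at every step, the decaying steps average it out and the iterate hovers near the exact minimiser (1.5, 0).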

13 Convergence of the (stable) perturbed proximal-gradient algorithm Outline
Introduction
Convergence of the (stable) perturbed proximal-gradient algorithm
  Assumptions
  On the convergence of {θ_n, n ≥ 0}
  On the convergence of F(θ̄_n)
Convergence of the Monte Carlo proximal-gradient algorithm
Conclusion, Other results and Works in progress

14 Convergence of the (stable) perturbed proximal-gradient algorithm Assumptions
(arg)min_{θ∈Θ} F(θ),  F(θ) = f(θ) + g(θ).
1. The function g: Θ → (−∞, +∞] is convex, not identically equal to +∞, and lower semi-continuous.
2. The function f: Θ → ℝ is continuously differentiable and there exists L > 0 such that ‖∇f(θ) − ∇f(θ′)‖ ≤ L ‖θ − θ′‖.
3. The function f is convex and the set L := argmin_θ F(θ) is not empty.
4. The step-size sequence {γ_n, n ≥ 0} satisfies γ_n ∈ (0, 1/L].

15 Convergence of the (stable) perturbed proximal-gradient algorithm Assumptions
The algorithm.
Stable sequence: let K ⊆ int(dom(g)) be a compact subset of Θ such that K ∩ L ≠ ∅.
Algorithm:
  θ̃_{n+1} = Prox_{γ_{n+1}}(θ_n − γ_{n+1} ∇f(θ_n) − γ_{n+1} η_{n+1}),
  θ_{n+1} = Proj_K(θ̃_{n+1}).
Weighted average sequence: let {a_n, n ≥ 0} be a non-negative sequence and set
  θ̄_n = (Σ_{k=1}^n a_k)^{-1} Σ_{k=1}^n a_k θ̃_k.
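The stabilisation device is just a projection of each iterate back onto K, plus a running weighted average. A sketch where K is a Euclidean ball and a_n = 1 (both choices are mine, for illustration):

```python
def proj_ball(theta, radius):
    # Projection onto the compact set K = {theta : ||theta|| <= radius}.
    norm = sum(t * t for t in theta) ** 0.5
    if norm <= radius:
        return list(theta)
    return [t * radius / norm for t in theta]

class WeightedAverage:
    # Running form of theta_bar_n = (sum_k a_k)^(-1) * sum_k a_k * theta_k.
    def __init__(self):
        self.total = 0.0
        self.avg = None

    def update(self, theta, a):
        self.total += a
        if self.avg is None:
            self.avg = list(theta)
        else:
            self.avg = [m + a * (t - m) / self.total
                        for m, t in zip(self.avg, theta)]

avg = WeightedAverage()
for theta in ([4.0, 0.0], [0.0, 4.0], [1.0, 1.0]):
    avg.update(proj_ball(theta, 2.0), a=1.0)
# Projections are (2,0), (0,2), (1,1); their equal-weight average is (1,1).
```

The incremental update avoids storing the whole trajectory, which matters when the averaging runs over many iterations.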

16 Convergence of the (stable) perturbed proximal-gradient algorithm On the convergence of {θ_n, n ≥ 0}
  θ̃_{n+1} = Prox_{γ_{n+1}}(θ_n − γ_{n+1} ∇f(θ_n) − γ_{n+1} η_{n+1}),  θ_{n+1} = Proj_K(θ̃_{n+1}).
Theorem (Atchadé, F., Moulines (2015)). If assumptions 1 to 4 hold and
  Σ_n γ_n = +∞, the series Σ_n γ_{n+1} η_{n+1} converges,
  the series Σ_n γ_{n+1} ⟨T_{γ_{n+1}}(θ_n), η_{n+1}⟩ converges,
  Σ_n γ_{n+1}² ‖η_{n+1}‖² < ∞,
then there exists θ* ∈ L ∩ K such that lim_n θ_n = lim_n θ̃_n = θ*, where T_γ(θ) = Prox_γ(θ − γ∇f(θ)).
Includes the convergence analysis for the exact algorithm (η_n ≡ 0) of Beck and Teboulle (2009); improves previous results of Combettes and Wajs (2005) and Combettes and Pesquet (2014).

17 Convergence of the (stable) perturbed proximal-gradient algorithm On the convergence of F(θ̄_n)
Rates of convergence for {F(θ̃_n), n ≥ 0}
  θ̃_{n+1} = Prox_{γ_{n+1}}(θ_n − γ_{n+1} ∇f(θ_n) − γ_{n+1} η_{n+1}),  θ_{n+1} = Proj_K(θ̃_{n+1}).
Theorem (Atchadé, F., Moulines (2015)). If assumptions 1 to 4 hold, then for any a_k ≥ 0 and any θ* ∈ L,
  Σ_{k=1}^n a_k {F(θ̃_k) − min F} ≤ U_n
with
  U_n := (1/2) Σ_{k=1}^n (a_k/γ_k − a_{k−1}/γ_{k−1}) ‖θ_{k−1} − θ*‖² + (a_0/(2γ_0)) ‖θ_0 − θ*‖²
         − Σ_{k=1}^n a_k ⟨T_{γ_k}(θ_{k−1}) − θ*, η_k⟩ + Σ_{k=1}^n a_k γ_k ‖η_k‖².
Includes the convergence analysis for the exact algorithm (η_n ≡ 0); extends previous results for the case γ_n = γ, a_n = 1 of Schmidt, Le Roux, Bach (2011), where it is assumed Σ_n ‖η_n‖ < ∞.

18 Convergence of the Monte Carlo proximal-gradient algorithm Outline
Introduction
Convergence of the (stable) perturbed proximal-gradient algorithm
Convergence of the Monte Carlo proximal-gradient algorithm
  Monte Carlo approximation
  Additional assumptions
  Convergence of θ_n
  Convergence of F(θ̄_n)
  How to choose γ_n, m_n?
Conclusion, Other results and Works in progress

21 Convergence of the Monte Carlo proximal-gradient algorithm Monte Carlo Approximation
Monte Carlo approximation of the gradient. Assume that ∇f(θ) is of the form ∇f(θ) = ∫ H_θ(x) π_θ(dx). Consider a Monte Carlo perturbation
  η_{n+1} = (1/m_{n+1}) Σ_{k=1}^{m_{n+1}} H_{θ_n}(X_{n+1}^{(k)}) − ∇f(θ_n),
which includes the cases:
1. {X_{n+1}^{(1)}, …, X_{n+1}^{(m_{n+1})}} are i.i.d. with distribution π_{θ_n}: E[η_{n+1} | Past_n] = 0 (unbiased approximation);
2. {X_{n+1}^{(1)}, …, X_{n+1}^{(m_{n+1})}} is a non-stationary Markov chain (e.g. an MCMC path) with invariant distribution π_{θ_n}: E[η_{n+1} | Past_n] ≠ 0 (biased approximation).
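In the i.i.d. case η_{n+1} is a centred Monte Carlo error whose size shrinks like m^(-1/2). A toy check (the target π and the function H below are illustrative choices of mine):

```python
import random

def mc_error(m, rng):
    # With pi = N(0,1) and H_theta(x) = x, grad f(theta) = E[X] = 0, so
    # eta = (1/m) * sum_k X_k - 0 is the pure Monte Carlo error.
    return sum(rng.gauss(0.0, 1.0) for _ in range(m)) / m

rng = random.Random(42)
trials = 200
rms_small = (sum(mc_error(10, rng) ** 2 for _ in range(trials)) / trials) ** 0.5
rms_big = (sum(mc_error(1000, rng) ** 2 for _ in range(trials)) / trials) ** 0.5
# The root-mean-square error scales like m^(-1/2), so the ratio is near 10.
```

This m^(-1/2) decay is what the conditions Σ γ_{n+1}² m_{n+1}^{-1} < ∞ (and, for the biased Markovian case, Σ γ_{n+1} m_{n+1}^{-1} < ∞) exploit.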

22 Convergence of the Monte Carlo proximal-gradient algorithm Additional assumptions
5. The error is of the form η_{n+1} = (1/m_{n+1}) Σ_{k=1}^{m_{n+1}} H_{θ_n}(X_{n+1}^{(k)}) − ∇f(θ_n), where ∇f(θ) = ∫ H_θ(x) π_θ(dx).
6. {X_{n+1}^{(k)}, k ≥ 0} is a Markov chain with transition kernel P_{θ_n}; for all θ, π_θ is invariant for P_θ.
7. The kernels {P_θ, θ ∈ Θ} are geometrically ergodic uniformly in θ (aperiodic, phi-irreducible, uniform-in-θ geometric drift inequalities w.r.t. W^p where p ≥ 2, level sets of W^p are small): there exists p ≥ 2 and, for any l ∈ (0, p], there exist C > 0, ρ ∈ (0, 1) such that
  sup_{θ∈K} ‖P_θ^n(x, ·) − π_θ‖_{W^l} ≤ C ρ^n W^l(x).
This is a trivial condition in the i.i.d. case; there exist many sufficient conditions for the Markov case when samples are drawn from MCMC samplers.

24 Convergence of the Monte Carlo proximal-gradient algorithm Convergence of θ_n
Convergence of θ_n when m_n → ∞, with η_{n+1} = (1/m_{n+1}) Σ_{k=1}^{m_{n+1}} H_{θ_n}(X_{n+1}^{(k)}) − ∇f(θ_n).
Theorem (Atchadé, F., Moulines (2015)). Assume assumptions 1 to 7 and Σ_n γ_n = +∞, Σ_n γ_{n+1}² m_{n+1}^{-1} < ∞. If the approximation is biased, assume also Σ_n γ_{n+1} m_{n+1}^{-1} < ∞. With probability one, there exists θ* ∈ L ∩ K such that lim_n θ_n = lim_n θ̃_n = θ*.
The key ingredient of the proof is the control (F. and Moulines (2003)), for p ≥ 2, w.p.1:
  ‖E[η_{n+1} | F_n]‖ ≤ C m_{n+1}^{-1} W(X_n^{(m_n)}),  E[‖η_{n+1}‖^p | F_n] ≤ C m_{n+1}^{-p/2} W^p(X_n^{(m_n)}),
and the decomposition
  η_{n+1} = (η_{n+1} − E[η_{n+1} | F_n]) + E[η_{n+1} | F_n] = Martingale increment + Bias.

25 Convergence of the Monte Carlo proximal-gradient algorithm Convergence of θ_n
Convergence of θ_n when m_n = m, with η_{n+1} = (1/m) Σ_{k=1}^{m} H_{θ_n}(X_{n+1}^{(k)}) − ∇f(θ_n).
Theorem (Atchadé, F., Moulines (2015)). Assume assumptions 1 to 7 and Σ_n γ_{n+1} = +∞, Σ_n γ_{n+1}² < ∞. If the approximation is biased, assume also:
- there exists a constant C such that for any θ, θ′ ∈ K, ‖H_θ − H_{θ′}‖_W + ‖P_θ − P_{θ′}‖_W + ‖π_θ − π_{θ′}‖_W ≤ C ‖θ − θ′‖;
- sup_{γ∈(0,1/L]} sup_{θ∈K} γ^{-1} ‖Prox_γ(θ) − θ‖ < ∞;
- Σ_n |γ_{n+1} − γ_n| < ∞.
With probability one, there exists θ* ∈ L ∩ K such that lim_n θ_n = lim_n θ̃_n = θ*.

26 Convergence of the Monte Carlo proximal-gradient algorithm Convergence of F(θ̄_n)
Convergence of F(θ̃_n) when m_n → ∞.
Theorem (Atchadé, F., Moulines (2015)). Assume assumptions 1 to 7. For any q ∈ (1, p/2], there exists C > 0 such that
  ‖ Σ_{k=1}^n a_k {F(θ̃_k) − min F} ‖_{L^q} ≤ C ( a_0/γ_0 + Σ_{k=1}^n |a_k/γ_k − a_{k−1}/γ_{k−1}| + (Σ_{k=1}^n a_k² m_{k+1}^{-1})^{1/2} + Σ_{k=1}^n a_k (γ_k + υ) m_{k+1}^{-1} )
and
  Σ_{k=1}^n a_k {E[F(θ̃_k)] − min F} ≤ C ( a_0/γ_0 + Σ_{k=1}^n |a_k/γ_k − a_{k−1}/γ_{k−1}| + Σ_{k=1}^n a_k (γ_k + υ) m_{k+1}^{-1} ),
where υ = 0 if the Monte Carlo approximation is unbiased and υ = 1 otherwise.

27 Convergence of the Monte Carlo proximal-gradient algorithm Convergence of F(θ̄_n)
Convergence of F(θ̃_n) when m_n = m.
Theorem (Atchadé, F., Moulines (2015)). Assume assumptions 1 to 7. For any q ∈ (1, p/2], there exists C > 0 such that
  ‖ Σ_{k=1}^n a_k {F(θ̃_k) − min F} ‖_{L^q} ≤ C ( a_0/γ_0 + Σ_{k=1}^n |a_k/γ_k − a_{k−1}/γ_{k−1}| + (Σ_{k=1}^n a_k²)^{1/2} + Σ_{k=1}^n a_k γ_k + υ Σ_{k=1}^n |a_{k+1} − a_k| )
and
  Σ_{k=1}^n a_k {E[F(θ̃_k)] − min F} ≤ C ( a_0/γ_0 + Σ_{k=1}^n |a_k/γ_k − a_{k−1}/γ_{k−1}| + Σ_{k=1}^n a_k γ_k + υ Σ_{k=1}^n |a_{k+1} − a_k| ),
where υ = 0 if the Monte Carlo approximation is unbiased and υ = 1 otherwise.

28 Convergence of the Monte Carlo proximal-gradient algorithm How to choose γ_n, m_n?
Fixed or increasing batch size m_n? Fixed or decreasing step size γ_n? Consider the L^q convergence rate of
  (Σ_{k=1}^n a_k)^{-1} Σ_{k=1}^n a_k F(θ̃_k) − F(θ*).
Increasing batch size m_n: with γ_n = γ, m_n ∝ n, a_n = 1. Rate: O(ln n / n). Complexity: O(ln n / √n).
Fixed batch size m_n = m: with γ_n ∝ γ/√n, a_n = 1 or a_n = γ_n. Rate: O(1/√n). Complexity: O(1/√n).
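The two regimes can be simulated side by side on a 1D toy problem (all constants and schedules below are illustrative choices of mine, not the talk's exact settings); both drift to the same minimiser θ* = soft_threshold(2, 0.5) = 1.5:

```python
import random

def soft_threshold(t, s):
    return max(abs(t) - s, 0.0) * (1.0 if t > 0 else -1.0)

def run(gamma_of, m_of, n_iter, rng):
    # 1D toy: grad f(theta) = E[theta - X] with X ~ N(2,1), g(theta) = 0.5*|theta|;
    # the minimiser is theta* = soft_threshold(2, 0.5) = 1.5.
    theta = 0.0
    for n in range(n_iter):
        gamma, m = gamma_of(n), m_of(n)
        grad_est = sum(theta - rng.gauss(2.0, 1.0) for _ in range(m)) / m
        theta = soft_threshold(theta - gamma * grad_est, 0.5 * gamma)
    return theta

rng = random.Random(7)
# Increasing batch size, fixed step:  gamma_n = 0.5, m_n = n + 1.
t_incr = run(lambda n: 0.5, lambda n: n + 1, 300, rng)
# Fixed batch size, decreasing step:  gamma_n = (n + 1)^(-1/2), m_n = 1.
t_fix = run(lambda n: (n + 1) ** -0.5, lambda n: 1, 20000, rng)
```

The first schedule spends its sampling budget on ever-larger batches at a fixed step; the second spends one sample per iteration and relies on the decaying step (plus averaging, omitted here) to kill the noise. Per the rates above, both achieve the same O(1/√n) complexity in the total number of Monte Carlo samples.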

29 Conclusion, Other results and Works in progress Outline Introduction Convergence of the (stable) perturbed proximal-gradient algorithm Convergence of the Monte Carlo proximal-gradient algorithm Conclusion, Other results and Works in progress

30 Conclusion, Other results and Works in progress Conclusion
Contributions:
a) results that do NOT require strong convexity;
b) sufficient conditions for the convergence of perturbed proximal-gradient algorithms;
c) the case of Monte Carlo approximations, biased or unbiased, with increasing or fixed batch size.
Major contributions: a) Monte Carlo approximations, b) biased approximations, c) fixed batch size.

31 Conclusion, Other results and Works in progress Other results, Works in progress and Future works
a) When f is not convex.
b) Accelerations (Nesterov, …).
c) Convergence of the Proximal Stochastic Approximation Expectation Maximization algorithm: maximization of a penalized likelihood in latent variable models using a generalization of the SAEM algorithm.
d) Rates of convergence: explicit controls.


Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

Introduction. log p θ (y k y 1:k 1 ), k=1

Introduction. log p θ (y k y 1:k 1 ), k=1 ESAIM: PROCEEDINGS, September 2007, Vol.19, 115-120 Christophe Andrieu & Dan Crisan, Editors DOI: 10.1051/proc:071915 PARTICLE FILTER-BASED APPROXIMATE MAXIMUM LIKELIHOOD INFERENCE ASYMPTOTICS IN STATE-SPACE

More information

Proximal Minimization by Incremental Surrogate Optimization (MISO)

Proximal Minimization by Incremental Surrogate Optimization (MISO) Proximal Minimization by Incremental Surrogate Optimization (MISO) (and a few variants) Julien Mairal Inria, Grenoble ICCOPT, Tokyo, 2016 Julien Mairal, Inria MISO 1/26 Motivation: large-scale machine

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

KERNEL ESTIMATORS OF ASYMPTOTIC VARIANCE FOR ADAPTIVE MARKOV CHAIN MONTE CARLO. By Yves F. Atchadé University of Michigan

KERNEL ESTIMATORS OF ASYMPTOTIC VARIANCE FOR ADAPTIVE MARKOV CHAIN MONTE CARLO. By Yves F. Atchadé University of Michigan Submitted to the Annals of Statistics arxiv: math.pr/0911.1164 KERNEL ESTIMATORS OF ASYMPTOTIC VARIANCE FOR ADAPTIVE MARKOV CHAIN MONTE CARLO By Yves F. Atchadé University of Michigan We study the asymptotic

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

Numerical methods for a fractional diffusion/anti-diffusion equation

Numerical methods for a fractional diffusion/anti-diffusion equation Numerical methods for a fractional diffusion/anti-diffusion equation Afaf Bouharguane Institut de Mathématiques de Bordeaux (IMB), Université Bordeaux 1, France Berlin, November 2012 Afaf Bouharguane Numerical

More information

Fast proximal gradient methods

Fast proximal gradient methods L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

The Theory behind PageRank

The Theory behind PageRank The Theory behind PageRank Mauro Sozio Telecom ParisTech May 21, 2014 Mauro Sozio (LTCI TPT) The Theory behind PageRank May 21, 2014 1 / 19 A Crash Course on Discrete Probability Events and Probability

More information

HYBRID DETERMINISTIC-STOCHASTIC GRADIENT LANGEVIN DYNAMICS FOR BAYESIAN LEARNING

HYBRID DETERMINISTIC-STOCHASTIC GRADIENT LANGEVIN DYNAMICS FOR BAYESIAN LEARNING COMMUNICATIONS IN INFORMATION AND SYSTEMS c 01 International Press Vol. 1, No. 3, pp. 1-3, 01 003 HYBRID DETERMINISTIC-STOCHASTIC GRADIENT LANGEVIN DYNAMICS FOR BAYESIAN LEARNING QI HE AND JACK XIN Abstract.

More information

Sequential convex programming,: value function and convergence

Sequential convex programming,: value function and convergence Sequential convex programming,: value function and convergence Edouard Pauwels joint work with Jérôme Bolte Journées MODE Toulouse March 23 2016 1 / 16 Introduction Local search methods for finite dimensional

More information

Information theoretic perspectives on learning algorithms

Information theoretic perspectives on learning algorithms Information theoretic perspectives on learning algorithms Varun Jog University of Wisconsin - Madison Departments of ECE and Mathematics Shannon Channel Hangout! May 8, 2018 Jointly with Adrian Tovar-Lopez

More information

Accelerated Proximal Gradient Methods for Convex Optimization

Accelerated Proximal Gradient Methods for Convex Optimization Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS

More information

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.

More information

Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method

Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Davood Hajinezhad Iowa State University Davood Hajinezhad Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method 1 / 35 Co-Authors

More information

Convex Optimization Lecture 16

Convex Optimization Lecture 16 Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean

More information

Stochastic gradient methods for machine learning

Stochastic gradient methods for machine learning Stochastic gradient methods for machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines, Nicolas Le Roux and Mark Schmidt - January 2013 Context Machine

More information

Large-scale machine learning and convex optimization

Large-scale machine learning and convex optimization Large-scale machine learning and convex optimization Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE IFCAM, Bangalore - July 2014 Big data revolution? A new scientific

More information

Sampling multimodal densities in high dimensional sampling space

Sampling multimodal densities in high dimensional sampling space Sampling multimodal densities in high dimensional sampling space Gersende FORT LTCI, CNRS & Telecom ParisTech Paris, France Journées MAS Toulouse, Août 4 Introduction Sample from a target distribution

More information

Lecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M.

Lecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M. Lecture 10 1 Ergodic decomposition of invariant measures Let T : (Ω, F) (Ω, F) be measurable, and let M denote the space of T -invariant probability measures on (Ω, F). Then M is a convex set, although

More information

Introduction to Restricted Boltzmann Machines

Introduction to Restricted Boltzmann Machines Introduction to Restricted Boltzmann Machines Ilija Bogunovic and Edo Collins EPFL {ilija.bogunovic,edo.collins}@epfl.ch October 13, 2014 Introduction Ingredients: 1. Probabilistic graphical models (undirected,

More information

Stochastic Gradient Descent with Variance Reduction

Stochastic Gradient Descent with Variance Reduction Stochastic Gradient Descent with Variance Reduction Rie Johnson, Tong Zhang Presenter: Jiawen Yao March 17, 2015 Rie Johnson, Tong Zhang Presenter: JiawenStochastic Yao Gradient Descent with Variance Reduction

More information

Towards stability and optimality in stochastic gradient descent

Towards stability and optimality in stochastic gradient descent Towards stability and optimality in stochastic gradient descent Panos Toulis, Dustin Tran and Edoardo M. Airoldi August 26, 2016 Discussion by Ikenna Odinaka Duke University Outline Introduction 1 Introduction

More information

A framework for adaptive Monte-Carlo procedures

A framework for adaptive Monte-Carlo procedures A framework for adaptive Monte-Carlo procedures Jérôme Lelong (with B. Lapeyre) http://www-ljk.imag.fr/membres/jerome.lelong/ Journées MAS Bordeaux Friday 3 September 2010 J. Lelong (ENSIMAG LJK) Journées

More information

Large-scale machine learning and convex optimization

Large-scale machine learning and convex optimization Large-scale machine learning and convex optimization Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Allerton Conference - September 2015 Slides available at www.di.ens.fr/~fbach/gradsto_allerton.pdf

More information

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method

More information

Learning Energy-Based Models of High-Dimensional Data

Learning Energy-Based Models of High-Dimensional Data Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero www.cs.toronto.edu/~hinton/energybasedmodelsweb.htm Discovering causal structure as a goal

More information

FAST DISTRIBUTED COORDINATE DESCENT FOR NON-STRONGLY CONVEX LOSSES. Olivier Fercoq Zheng Qu Peter Richtárik Martin Takáč

FAST DISTRIBUTED COORDINATE DESCENT FOR NON-STRONGLY CONVEX LOSSES. Olivier Fercoq Zheng Qu Peter Richtárik Martin Takáč FAST DISTRIBUTED COORDINATE DESCENT FOR NON-STRONGLY CONVEX LOSSES Olivier Fercoq Zheng Qu Peter Richtárik Martin Takáč School of Mathematics, University of Edinburgh, Edinburgh, EH9 3JZ, United Kingdom

More information

Bandit Convex Optimization

Bandit Convex Optimization March 7, 2017 Table of Contents 1 (BCO) 2 Projection Methods 3 Barrier Methods 4 Variance reduction 5 Other methods 6 Conclusion Learning scenario Compact convex action set K R d. For t = 1 to T : Predict

More information

Stochastic Dynamic Programming: The One Sector Growth Model

Stochastic Dynamic Programming: The One Sector Growth Model Stochastic Dynamic Programming: The One Sector Growth Model Esteban Rossi-Hansberg Princeton University March 26, 2012 Esteban Rossi-Hansberg () Stochastic Dynamic Programming March 26, 2012 1 / 31 References

More information

A regeneration proof of the central limit theorem for uniformly ergodic Markov chains

A regeneration proof of the central limit theorem for uniformly ergodic Markov chains A regeneration proof of the central limit theorem for uniformly ergodic Markov chains By AJAY JASRA Department of Mathematics, Imperial College London, SW7 2AZ, London, UK and CHAO YANG Department of Mathematics,

More information

MCMC: Markov Chain Monte Carlo

MCMC: Markov Chain Monte Carlo I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov

More information

Markov chain Monte Carlo

Markov chain Monte Carlo 1 / 26 Markov chain Monte Carlo Timothy Hanson 1 and Alejandro Jara 2 1 Division of Biostatistics, University of Minnesota, USA 2 Department of Statistics, Universidad de Concepción, Chile IAP-Workshop

More information

Non-homogeneous random walks on a semi-infinite strip

Non-homogeneous random walks on a semi-infinite strip Non-homogeneous random walks on a semi-infinite strip Chak Hei Lo Joint work with Andrew R. Wade World Congress in Probability and Statistics 11th July, 2016 Outline Motivation: Lamperti s problem Our

More information

Stochastic Gradient Descent in Continuous Time

Stochastic Gradient Descent in Continuous Time Stochastic Gradient Descent in Continuous Time Justin Sirignano University of Illinois at Urbana Champaign with Konstantinos Spiliopoulos (Boston University) 1 / 27 We consider a diffusion X t X = R m

More information

Lecture 2 February 25th

Lecture 2 February 25th Statistical machine learning and convex optimization 06 Lecture February 5th Lecturer: Francis Bach Scribe: Guillaume Maillard, Nicolas Brosse This lecture deals with classical methods for convex optimization.

More information

Recent Advances in Regional Adaptation for MCMC

Recent Advances in Regional Adaptation for MCMC Recent Advances in Regional Adaptation for MCMC Radu Craiu Department of Statistics University of Toronto Collaborators: Yan Bai (Statistics, Toronto) Antonio Fabio di Narzo (Statistics, Bologna) Jeffrey

More information

Estimators based on non-convex programs: Statistical and computational guarantees

Estimators based on non-convex programs: Statistical and computational guarantees Estimators based on non-convex programs: Statistical and computational guarantees Martin Wainwright UC Berkeley Statistics and EECS Based on joint work with: Po-Ling Loh (UC Berkeley) Martin Wainwright

More information

OWL to the rescue of LASSO

OWL to the rescue of LASSO OWL to the rescue of LASSO IISc IBM day 2018 Joint Work R. Sankaran and Francis Bach AISTATS 17 Chiranjib Bhattacharyya Professor, Department of Computer Science and Automation Indian Institute of Science,

More information

arxiv: v3 [stat.me] 12 Jul 2015

arxiv: v3 [stat.me] 12 Jul 2015 Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models Arnaud Doucet 1, Pierre E. Jacob and Sylvain Rubenthaler 3 1 Department of Statistics,

More information

Titles and Abstracts

Titles and Abstracts Titles and Abstracts Stability of the Nonlinear Filter for Random Expanding Maps Jochen Broecker A ubiquitous problem in science and engineering is to reconstruct the state of a hidden Markov process (the

More information