1 Penalized inference in models with intractable likelihood via perturbed proximal-gradient algorithms
Gersende Fort
Institut de Mathématiques de Toulouse, CNRS and Univ. Paul Sabatier, Toulouse, France
2 Based on joint works with:
- Yves Atchadé (Univ. Michigan, USA) and Eric Moulines (Ecole Polytechnique, France): On Perturbed Proximal-Gradient algorithms (JMLR, 2016)
- Edouard Ollier (ENS Lyon, France) and Adeline Samson (Univ. Grenoble Alpes, France): Penalized inference in Mixed Models by Proximal Gradient methods (work in progress)
- Jean-François Aujol (IMB, Bordeaux, France) and Charles Dossal (IMB, Bordeaux, France): Acceleration for perturbed Proximal Gradient algorithms (work in progress)
3 Motivation: Pharmacokinetics (1/2)
N patients. For patient i, observations {Y_ij, 1 ≤ j ≤ J}: evolution of the concentration at times t_ij, 1 ≤ j ≤ J. Initial dose D.
Model:
Y_ij = f(t_ij, X_i) + ε_ij,   X_i = Z_i β + d_i ∈ R^L,
where the ε_ij are i.i.d. N(0, σ²), the d_i are i.i.d. N_L(0, Ω) and independent of ε, and Z_i is a known matrix such that each row of X_i has an intercept (fixed effect) and covariates.
Statistical analysis: estimation of (β, σ², Ω) under sparsity constraints on β; selection of the covariates based on β̂. → Penalized Maximum Likelihood.
5 Motivation: Pharmacokinetics (2/2)
Model:
Y_ij = f(t_ij, X_i) + ε_ij,   X_i = Z_i β + d_i ∈ R^L,
with ε_ij i.i.d. N(0, σ²), d_i i.i.d. N_L(0, Ω) independent of ε, and Z_i a known matrix such that each row of X_i has an intercept (fixed effect) and covariates.
Likelihoods:
- The distribution of {Y_ij, X_i; 1 ≤ i ≤ N, 1 ≤ j ≤ J} has an explicit expression.
- The distribution of {Y_ij; 1 ≤ i ≤ N, 1 ≤ j ≤ J} does not have an explicit expression: it is the marginal of the previous one.
6 Outline
Penalized Maximum Likelihood inference in models with intractable likelihood
- Example 1: Latent variable models
- Example 2: Discrete graphical model (Markov random field)
Numerical methods for Penalized ML in such models: Perturbed Proximal Gradient algorithms
Convergence analysis
7 Penalized Maximum Likelihood inference with intractable likelihood
N observations: Y = (Y_1, …, Y_N).
A parametric statistical model indexed by θ ∈ Θ ⊆ R^{d_θ}; L(θ) is the likelihood of the observations.
A penalty on the parameter: θ ↦ g(θ), e.g. for sparsity constraints on θ. Usually g is non-smooth and convex.
Goal: computation of
θ̂ ∈ argmin_{θ∈Θ} ( −(1/N) log L(θ) + g(θ) )
when the likelihood L has no closed-form expression and cannot be evaluated.
8 Example 1: Latent variable model
The log-likelihood of the observations Y is of the form θ ↦ log L(θ) with
L(θ) = ∫_X p_θ(x) μ(dx),
where μ is a positive σ-finite measure on a set X; x is the missing/latent data and (x, Y) the complete data.
In these models, the complete likelihood p_θ(x) can be evaluated explicitly, but the likelihood has no closed-form expression.
The exact integral could be replaced by a Monte Carlo approximation, but this is known to be inefficient. Numerical methods based on the a posteriori distribution of the missing data are preferred (see e.g. Expectation-Maximization approaches).
What about the gradient of the (log-)likelihood?
9 Gradient of the likelihood in a latent variable model
log L(θ) = log ∫ p_θ(x) μ(dx).
Under regularity conditions, θ ↦ log L(θ) is C¹ and
∇ log L(θ) = ∫ ∇_θ p_θ(x) μ(dx) / ∫ p_θ(z) μ(dz) = ∫ ∇_θ log p_θ(x) · ( p_θ(x) / ∫ p_θ(z) μ(dz) ) μ(dx),
where p_θ(x) / ∫ p_θ(z) μ(dz) is the a posteriori distribution of the latent data.
Hence the gradient of the log-likelihood,
∇{ −(1/N) log L(θ) } = ∫ H_θ(x) π_θ(dx),
is an intractable expectation w.r.t. π_θ, the conditional distribution of the latent variable given the observations Y. For all (x, θ), H_θ(x) can be evaluated.
11 Approximation of the gradient
∇{ −(1/N) log L(θ) } = ∫_X H_θ(x) π_θ(dx).
1. Quadrature techniques: poor behavior w.r.t. the dimension of X.
2. Use i.i.d. samples from π_θ to define a Monte Carlo approximation: not possible, in general.
3. Use m samples from a non-stationary Markov chain {X_{j,θ}, j ≥ 0} with unique stationary distribution π_θ, and define a Monte Carlo approximation. MCMC samplers provide such a chain.
This yields a stochastic approximation of the gradient which is biased: E[h(X_{j,θ})] ≠ ∫ h(x) π_θ(dx). If the Markov chain is ergodic enough, the bias vanishes as j → ∞.
13 Example 2: Discrete graphical model (Markov random field)
N independent observations of an undirected graph with p nodes; each node takes values in a finite alphabet X.
N i.i.d. observations Y_i in X^p with distribution, for y = (y_1, …, y_p),
π_θ(y) := (1/Z_θ) exp( Σ_{k=1}^p θ_kk B(y_k, y_k) + Σ_{1≤j<k≤p} θ_kj B(y_k, y_j) ) = (1/Z_θ) exp( ⟨θ, B(y)⟩ ),
where B is a symmetric function and θ is a symmetric p × p matrix.
The normalizing constant (partition function) Z_θ cannot be computed: it is a sum over |X|^p terms.
14 Likelihood and its gradient in a Markov random field
Likelihood of the form (the scalar product between matrices is the Frobenius inner product)
(1/N) log L(θ) = ⟨θ, (1/N) Σ_{i=1}^N B(Y_i)⟩ − log Z_θ.
The likelihood is intractable.
Gradient of the form
∇_θ( (1/N) log L(θ) ) = (1/N) Σ_{i=1}^N B(Y_i) − Σ_{y∈X^p} B(y) π_θ(y),
with π_θ(y) := (1/Z_θ) exp( ⟨θ, B(y)⟩ ).
The gradient of the (log-)likelihood is intractable.
16 Approximation of the gradient
∇_θ( (1/N) log L(θ) ) = (1/N) Σ_{i=1}^N B(Y_i) − Σ_{y∈X^p} B(y) π_θ(y).
The Gibbs measure π_θ(y) := (1/Z_θ) exp( ⟨θ, B(y)⟩ ) is known up to the constant Z_θ. Exact sampling from π_θ can be approximated by MCMC samplers (Gibbs-type samplers such as Swendsen-Wang, ...).
A biased approximation of the gradient is therefore available.
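As an illustration of this biased Monte Carlo approximation, here is a minimal sketch (not the talk's code) of a systematic-scan Gibbs sampler for an Ising-type instance of the model above, with X = {−1, +1} and B(y_k, y_j) = y_k y_j; `mc_moment` returns the MCMC estimate of the intractable term Σ_y B(y) π_θ(y). All constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 5
theta = 0.2 * np.triu(rng.standard_normal((p, p)), 1)
theta = theta + theta.T                      # symmetric interaction matrix, zero diagonal

def gibbs_sweep(y, theta):
    """One systematic-scan Gibbs sweep for the Ising-type measure
    pi_theta(y) ∝ exp(sum_{j<k} theta[j,k] y_j y_k), y in {-1,+1}^p."""
    for k in range(len(y)):
        field = theta[k] @ y - theta[k, k] * y[k]     # local field at node k
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))   # P(y_k = +1 | rest)
        y[k] = 1 if rng.uniform() < p_plus else -1
    return y

def mc_moment(theta, m=2000, burn=200):
    """MCMC estimate of E_pi[B(y)] with B(y) = y y^T (the intractable gradient term)."""
    y = rng.choice([-1, 1], size=p)
    acc = np.zeros((p, p))
    for it in range(burn + m):
        y = gibbs_sweep(y, theta)
        if it >= burn:
            acc += np.outer(y, y)
    return acc / m

M_hat = mc_moment(theta)   # biased estimate of the moment matrix
```

The estimate is biased because the chain is not started at stationarity, exactly as in point 3 of slide 11.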
17 To summarize
Problem: argmin_{θ∈Θ} F(θ) with F(θ) = f(θ) + g(θ), θ ∈ Θ ⊆ R^d, where
- g is a convex, non-smooth, explicit function;
- f is C¹, with an intractable gradient of the form ∇f(θ) = ∫ H_θ(x) π_θ(dx), which can be approximated by biased Monte Carlo techniques;
- ∇f is Lipschitz: there exists L > 0 such that ‖∇f(θ) − ∇f(θ′)‖ ≤ L‖θ − θ′‖ for all θ, θ′;
- f is not necessarily convex.
18 Outline
Penalized Maximum Likelihood inference in models with intractable likelihood
Numerical methods for Penalized ML in such models: Perturbed Proximal Gradient algorithms
- Algorithms
- Numerical illustration
Convergence analysis
19 The Proximal-Gradient algorithm (1/2)
argmin_{θ∈Θ} F(θ) with F(θ) = f(θ) + g(θ), f smooth, g non-smooth.
The Proximal Gradient algorithm:
θ_{n+1} = Prox_{γ_{n+1},g}( θ_n − γ_{n+1} ∇f(θ_n) )
where
Prox_{γ,g}(τ) := argmin_{θ∈Θ} ( g(θ) + (1/2γ) ‖θ − τ‖² ).
Proximal map: Moreau (1962). Proximal Gradient algorithm: Beck-Teboulle (2010); Combettes-Pesquet (2011); Parikh-Boyd (2013).
It is a generalization of the gradient algorithm to a composite objective function: an iterative algorithm (of the Majorize-Minimize family) which produces a sequence {θ_n, n ≥ 0} such that F(θ_{n+1}) ≤ F(θ_n).
21 The Proximal-Gradient algorithm (2/2)
About the Prox step:
- when g = 0: Prox_{γ,g}(τ) = τ;
- when g is the indicator of a compact set: the algorithm is the projected gradient;
- in some cases, Prox is explicit (e.g. elastic net penalty); otherwise, it is computed by numerical approximation:
θ_{n+1} = Prox_{γ_{n+1},g}( θ_n − γ_{n+1} ∇f(θ_n) ) + ε_{n+1}.
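For instance, when g(θ) = λ‖θ‖₁ the proximal map is explicit: componentwise soft-thresholding. A minimal sketch of the resulting deterministic proximal gradient iteration on a toy least-squares problem (the data and all constants are illustrative, not from the talk):

```python
import numpy as np

def prox_l1(tau, gamma, lam):
    """Proximal map of gamma * lam * ||.||_1: componentwise soft-thresholding."""
    return np.sign(tau) * np.maximum(np.abs(tau) - gamma * lam, 0.0)

def proximal_gradient(grad_f, prox, theta0, gamma, n_iter=500):
    """theta_{n+1} = Prox_{gamma,g}(theta_n - gamma * grad f(theta_n))."""
    theta = theta0.copy()
    for _ in range(n_iter):
        theta = prox(theta - gamma * grad_f(theta), gamma)
    return theta

# toy lasso: f(theta) = 0.5 ||A theta - y||^2, g(theta) = lam ||theta||_1
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
theta_true = np.zeros(10)
theta_true[:3] = [2.0, -1.5, 1.0]
y = A @ theta_true
lam = 1.0
L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of grad f
grad_f = lambda th: A.T @ (A @ th - y)
theta_hat = proximal_gradient(grad_f, lambda t, g: prox_l1(t, g, lam),
                              np.zeros(10), gamma=1.0 / L)
```

With the step size γ = 1/L, this is exactly the monotone scheme of slide 19 applied to a composite objective whose Prox is closed-form.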
22 The Perturbed Proximal-Gradient algorithm
Given a deterministic sequence {γ_n, n ≥ 0},
θ_{n+1} = Prox_{γ_{n+1},g}( θ_n − γ_{n+1} H_{n+1} )
where H_{n+1} is an approximation of ∇f(θ_n).
23 Algorithm: Monte Carlo-Proximal Gradient for Penalized ML
When the gradient is of the form ∇f(θ) = ∫ H_θ(x) π_θ(x) μ(dx):
The MC-Proximal Gradient algorithm. Given the current value θ_n,
1. Sample a Markov chain {X_{j,n}, j ≥ 0} from an MCMC sampler with target distribution π_{θ_n} dμ.
2. Set H_{n+1} = (1/m_{n+1}) Σ_{j=1}^{m_{n+1}} H_{θ_n}(X_{j,n}).
3. Update the current value of the parameter: θ_{n+1} = Prox_{γ_{n+1},g}( θ_n − γ_{n+1} H_{n+1} ).
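A toy, self-contained sketch of this scheme (not the talk's pharmacokinetic model): a one-dimensional latent Gaussian model Y | x ~ N(x, 1), x ~ N(θ, 1), so that π_θ is the posterior of x, H_θ is computable, and the inner chain is a random-walk Metropolis sampler. The penalty weight, step sizes and batch size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = 3.0                        # a single observation, for illustration
lam = 0.1                      # hypothetical l1 penalty weight: g(theta) = lam * |theta|

def mh_chain(theta, x0, m):
    """Random-walk Metropolis targeting pi_theta(x) ∝ p_theta(x, Y)
    = exp(-(x - theta)^2/2 - (Y - x)^2/2), up to constants."""
    log_target = lambda x: -0.5 * (x - theta) ** 2 - 0.5 * (Y - x) ** 2
    xs, x = np.empty(m), x0
    for j in range(m):
        prop = x + rng.normal()
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        xs[j] = x
    return xs

def prox_l1(tau, gamma):
    return np.sign(tau) * max(abs(tau) - gamma * lam, 0.0)

# Here f(theta) = -log L(theta) = (Y - theta)^2 / 4 up to constants, and
# grad f(theta) = theta - E_{pi_theta}[x], so H_{n+1} = theta_n - chain mean.
theta, x = 0.0, 0.0
for n in range(500):
    gamma = 0.5 / (n + 1) ** 0.6          # vanishing step size
    chain = mh_chain(theta, x, m=20)      # fixed batch size m
    x = chain[-1]                         # warm start the next chain
    H = theta - chain.mean()
    theta = prox_l1(theta - gamma * H, gamma)

# the penalized MLE solves min (Y - theta)^2/4 + lam*|theta|, i.e. theta* = Y - 2*lam = 2.8
```

This is the "biased approximation with fixed batch size" regime discussed in the convergence analysis: each H_{n+1} is computed from a non-stationary chain of fixed length m.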
24 Algorithm: Stochastic Approximation-Proximal Gradient for Penalized ML
If in addition H_θ(x) = Φ(θ) + Ψ(θ) S(x), which implies
∇f(θ) = Φ(θ) + Ψ(θ) ∫ S(x) π_θ(x) μ(dx):
The SA-Proximal Gradient algorithm. Given the current value θ_n,
1. Sample a Markov chain {X_{j,n}, j ≥ 0} from an MCMC sampler with target distribution π_{θ_n} dμ.
2. Set H_{n+1} = Φ(θ_n) + Ψ(θ_n) S_{n+1} with
S_{n+1} = S_n + δ_{n+1} ( (1/m_{n+1}) Σ_{j=1}^{m_{n+1}} S(X_{j,n}) − S_n ).
3. Update the current value of the parameter: θ_{n+1} = Prox_{γ_{n+1},g}( θ_n − γ_{n+1} H_{n+1} ).
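Continuing the same toy one-dimensional latent Gaussian model (hypothetical constants throughout): ∇f(θ) = θ − E_{π_θ}[x] fits the required form with Φ(θ) = θ, Ψ(θ) = −1 and S(x) = x, so the Robbins-Monro averaging of the sufficient statistic reads:

```python
import numpy as np

rng = np.random.default_rng(2)
Y, lam = 3.0, 0.1               # one observation and an illustrative l1 weight

def mh_chain(theta, x0, m):
    # random-walk Metropolis targeting pi_theta(x) ∝ exp(-(x-theta)^2/2 - (Y-x)^2/2)
    log_target = lambda x: -0.5 * (x - theta) ** 2 - 0.5 * (Y - x) ** 2
    xs, x = np.empty(m), x0
    for j in range(m):
        prop = x + rng.normal()
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        xs[j] = x
    return xs

theta, x, S = 0.0, 0.0, 0.0
for n in range(500):
    gamma = 0.5 / (n + 1) ** 0.6
    delta = 1.0 / (n + 1) ** 0.6             # Robbins-Monro weight
    chain = mh_chain(theta, x, m=5)          # small fixed batch: SA smooths the noise
    x = chain[-1]
    S = S + delta * (chain.mean() - S)       # S_{n+1} = S_n + delta (mean of S(X) - S_n)
    H = theta - S                            # Phi(theta_n) + Psi(theta_n) * S_{n+1}
    tau = theta - gamma * H
    theta = np.sign(tau) * max(abs(tau) - gamma * lam, 0.0)   # prox of gamma*lam*|.|
```

Compared with the MC-Proximal Gradient sketch, the running average S lets the batch size stay very small, because the noise is smoothed across iterations rather than within each one.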
25 (*) Penalized Expectation-Maximization vs Proximal Gradient
EM (Dempster et al., 1977) is a Majorize-Minimize algorithm for the computation of the ML estimate in a latent variable model.
The Proximal Gradient algorithm is a Penalized Generalized-EM algorithm:
g(θ_{n+1}) − Q(θ_{n+1}, θ_n) ≤ g(θ_n) − Q(θ_n, θ_n)
where
Q(θ, θ′) := ∫ log p_θ(x) π_θ′(x) dμ(x),   π_θ′(x) := p_θ′(x) / ∫ p_θ′(z) dμ(z).
MC-Proximal Gradient corresponds to the Penalized Generalized-MCEM algorithm (Wei and Tanner, 1990); SA-Proximal Gradient corresponds to the Penalized Generalized-SAEM algorithm (Delyon et al., 1999).
27 Numerical illustration (1/3): pharmacokinetics
For the implementation of the algorithm:
- Penalty term: g(θ) = λ‖β‖₁. How to choose λ? A weighted norm?
- Step-size sequences: constant or vanishing sequence {γ_n, n ≥ 0}? (and δ_n for the SA-Prox Gdt algorithm)
- Monte Carlo approximation: fixed or increasing batch size?
28 Numerical illustration (2/3): pharmacokinetics
[Figure: parameter value vs. iteration, four panels: Proximal MCEM with decreasing and with adaptive step size (top), Proximal SAEM with decreasing and with adaptive step size (bottom).]
29 Numerical illustration (3/3): pharmacokinetics
Figure: Low-dimension simulation setting. Regularization path of the covariate parameters for the clearance (left), absorption constant (middle) and volume of distribution (right) parameters. The black dashed line corresponds to the λ value selected by EBIC; each colored curve corresponds to a covariate.
30 Outline
Penalized Maximum Likelihood inference in models with intractable likelihood
Numerical methods for Penalized ML in such models: Perturbed Proximal Gradient algorithms
Convergence analysis
31 Problem: argmin_{θ∈Θ} F(θ) with F(θ) = f(θ) + g(θ), f smooth, g non-smooth.
The Perturbed Proximal Gradient algorithm: given a (0, 1/L]-valued deterministic sequence {γ_n, n ≥ 0}, set for n ≥ 0
θ_{n+1} = Prox_{γ_{n+1},g}( θ_n − γ_{n+1} H_{n+1} )
where H_{n+1} is an approximation of ∇f(θ_n).
Which conditions on the sequence {γ_n, n ≥ 0} and on the perturbations H_{n+1} − ∇f(θ_n) ensure that this algorithm converges to the minima of F?
32 The assumptions
argmin_{θ∈Θ} F(θ) with F(θ) = f(θ) + g(θ), where
- the function g: R^d → [0, +∞] is convex, non-smooth, not identically equal to +∞, and lower semi-continuous;
- the function f: R^d → R is a smooth convex function, i.e. f is continuously differentiable and there exists L > 0 such that ‖∇f(θ) − ∇f(θ′)‖ ≤ L‖θ − θ′‖ for all θ, θ′ ∈ R^d;
- Θ ⊆ R^d is the domain of g: Θ = {θ ∈ R^d : g(θ) < ∞};
- the set argmin_Θ F is a non-empty subset of Θ.
33 Existing results in the literature
There exist results under (some of) the following assumptions:
- inf_n γ_n > 0 and Σ_n ‖H_{n+1} − ∇f(θ_n)‖ < ∞;
- i.i.d. Monte Carlo approximation, i.e. results for unbiased sampling; almost no conditions cover biased sampling, such as the MCMC one;
- non-vanishing step-size sequence {γ_n, n ≥ 0};
- increasing batch size: when H_{n+1} is a Monte Carlo sum, H_{n+1} = (1/m_{n+1}) Σ_{j=1}^{m_{n+1}} H_{θ_n}(X_{j,n}), then lim_n m_n = +∞ at some rate.
References: Combettes (2001), Elsevier Science; Combettes-Wajs (2005), Multiscale Modeling and Simulation; Combettes-Pesquet (2015, 2016), SIAM J. Optim., arXiv; Lin-Rosasco-Villa-Zhou (2015), arXiv; Rosasco-Villa-Vu (2014, 2015), arXiv; Schmidt-Le Roux-Bach (2011), NIPS.
34 Convergence of the perturbed proximal gradient algorithm
θ_{n+1} = Prox_{γ_{n+1},g}( θ_n − γ_{n+1} H_{n+1} ), with H_{n+1} ≈ ∇f(θ_n).
Set L = argmin_Θ (f + g) and η_{n+1} = H_{n+1} − ∇f(θ_n).
Theorem (Atchadé, F., Moulines (2015)). Assume:
- g convex, lower semi-continuous; f convex, C¹, and its gradient is Lipschitz with constant L; L is non-empty;
- Σ_n γ_n = +∞ and γ_n ∈ (0, 1/L];
- convergence of the series Σ_n γ²_{n+1} ‖η_{n+1}‖², Σ_n γ_{n+1} η_{n+1} and Σ_n γ_{n+1} ⟨T_n − θ⋆, η_{n+1}⟩, where T_n = Prox_{γ_{n+1},g}( θ_n − γ_{n+1} ∇f(θ_n) ).
Then there exists θ⋆ ∈ L such that lim_n θ_n = θ⋆.
35 Convergence when H_{n+1} is a Monte Carlo approximation (1/3)
In the case
∇f(θ_n) ≈ H_{n+1} = (1/m_{n+1}) Σ_{j=1}^{m_{n+1}} H_{θ_n}(X_{j,n}),
where {X_{j,n}, j ≥ 0} is a non-stationary Markov chain with unique stationary distribution π_{θ_n}, i.e. for any n ≥ 0,
X_{j+1,n} | past ∼ P_{θ_n}(X_{j,n}, ·),   π_θ P_θ = π_θ;
let us check one of the conditions:
Σ_n γ_{n+1} η_{n+1} = Σ_n γ_{n+1} ( H_{n+1} − ∇f(θ_n) )
= Σ_n γ_{n+1} { H_{n+1} − E[H_{n+1} | F_n] } + Σ_n γ_{n+1} { E[H_{n+1} | F_n] − ∇f(θ_n) }
= Σ_n γ_{n+1} × (martingale increment) + Σ_n γ_{n+1} × (bias, non-vanishing when m_n = m) + Σ_n γ_{n+1} × (remainder).
The most technical case is the biased approximation with fixed batch size.
38 Convergence when H_{n+1} is a Monte Carlo approximation (2/3)
Increasing batch size: lim_n m_n = +∞.
Conditions on the step sizes and batch sizes:
Σ_n γ_n = +∞,   Σ_n γ²_n / m_n < ∞;   Σ_n γ_n / m_n < ∞ (biased case).
Conditions on the Markov kernels: there exist λ ∈ (0, 1), b < ∞, p ≥ 2 and a measurable function W: X → [1, +∞) such that
sup_{θ∈Θ} ‖H_θ‖_W < ∞,   sup_{θ∈Θ} P_θ W^p ≤ λ W^p + b.
In addition, for any l ∈ (0, p], there exist C < ∞ and ρ ∈ (0, 1) such that for any x ∈ X,
sup_{θ∈Θ} ‖P^n_θ(x, ·) − π_θ‖_{W^l} ≤ C ρ^n W^l(x).
Condition on Θ: Θ is bounded.
39 Convergence when H_{n+1} is a Monte Carlo approximation (3/3)
Fixed batch size: m_n = m.
Conditions on the step sizes:
Σ_n γ_n = +∞,   Σ_n γ²_n < ∞,   Σ_n |γ_{n+1} − γ_n| < ∞.
Condition on the Markov chains: the same as in the increasing-batch-size case, and there exists a constant C such that for any θ, θ′ ∈ Θ,
‖H_θ − H_θ′‖_W + sup_x ( ‖P_θ(x, ·) − P_θ′(x, ·)‖_W / W(x) ) + ‖π_θ − π_θ′‖_W ≤ C ‖θ − θ′‖.
Condition on the Prox: sup_{γ∈(0,1/L]} sup_{θ∈Θ} γ⁻¹ ‖Prox_{γ,g}(θ) − θ‖ < ∞.
Condition on Θ: Θ is bounded.
40 Rates of convergence (1/3): the problem
For non-negative weights a_k, find an upper bound of
Σ_{k=1}^n ( a_k / Σ_{l=1}^n a_l ) ( F(θ_k) − min F ).
It provides:
- an upper bound for the cumulative regret (a_k = 1);
- an upper bound for an averaging strategy when F is convex, since by Jensen's inequality
F( Σ_{k=1}^n a_k θ_k / Σ_{l=1}^n a_l ) − min F ≤ Σ_{k=1}^n ( a_k / Σ_{l=1}^n a_l ) ( F(θ_k) − min F ).
41 Rates of convergence (2/3): a deterministic control
Theorem (Atchadé, F., Moulines (2016)). For any θ⋆ ∈ argmin_Θ F,
(1/A_n) Σ_{k=1}^n a_k ( F(θ_k) − min F ) ≤ a_0 ‖θ_0 − θ⋆‖² / (2 γ_0 A_n) + (1/(2A_n)) Σ_{k=1}^n ( a_k/γ_k − a_{k−1}/γ_{k−1} ) ‖θ_{k−1} − θ⋆‖² + (1/A_n) Σ_{k=1}^n a_k γ_k ‖η_k‖² − (1/A_n) Σ_{k=1}^n a_k ⟨T_k − θ⋆, η_k⟩,
where A_n = Σ_{l=1}^n a_l, η_k = H_k − ∇f(θ_{k−1}), and T_k = Prox_{γ_k,g}( θ_{k−1} − γ_k ∇f(θ_{k−1}) ).
42 Rates (3/3): when H_{n+1} is a Monte Carlo approximation, bound in L^q
‖ F( (1/n) Σ_{k=1}^n θ_k ) − min F ‖_{L^q} ≤ ‖ (1/n) Σ_{k=1}^n F(θ_k) − min F ‖_{L^q} ≤ u_n
- u_n = O(1/√n) with fixed batch size m_n = m and (slowly) decaying step size γ_n = γ⋆/n^a, a ∈ [1/2, 1]. With averaging: optimal rate, even with slowly decaying step size γ_n ∝ 1/√n.
- u_n = O(ln n / n) with increasing batch size m_n ∝ n and constant step size γ_n = γ⋆. Rate obtained with O(n²) Monte Carlo samples!
43 Acceleration (1)
Let {t_n, n ≥ 0} be a positive sequence such that γ_{n+1} t_n (t_n − 1) ≤ γ_n t²_{n−1}.
Nesterov acceleration of the Proximal Gradient algorithm (Nesterov (2004), Tseng (2008), Beck-Teboulle (2009)):
θ_{n+1} = Prox_{γ_{n+1},g}( τ_n − γ_{n+1} ∇f(τ_n) )
τ_{n+1} = θ_{n+1} + ((t_n − 1)/t_{n+1}) (θ_{n+1} − θ_n)
See also Zhu-Orecchia (2015); Attouch-Peypouquet (2015); Bubeck-Lee-Singh (2015); Su-Boyd-Candes (2015).
(deterministic) Proximal Gradient: F(θ_n) − min F = O(1/n)
(deterministic) Accelerated Proximal Gradient: F(θ_n) − min F = O(1/n²)
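The deterministic accelerated recursion above can be sketched as follows, with the classical choice t_{n+1} = (1 + √(1 + 4t_n²))/2 (a FISTA-type instance); the toy composite objective, f(θ) = ½‖θ − b‖² plus an l1 penalty, is illustrative and has a closed-form minimizer to check against.

```python
import numpy as np

def accelerated_prox_gradient(grad_f, prox_g, theta0, gamma, n_iter=200):
    """Nesterov-accelerated proximal gradient:
    theta_{n+1} = Prox_{gamma,g}(tau_n - gamma * grad f(tau_n)),
    tau_{n+1}   = theta_{n+1} + ((t_n - 1)/t_{n+1}) (theta_{n+1} - theta_n)."""
    theta, tau, t = theta0.copy(), theta0.copy(), 1.0
    for _ in range(n_iter):
        theta_next = prox_g(tau - gamma * grad_f(tau), gamma)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        tau = theta_next + ((t - 1.0) / t_next) * (theta_next - theta)
        theta, t = theta_next, t_next
    return theta

# toy composite problem: f(theta) = 0.5 ||theta - b||^2, g = lam ||theta||_1,
# whose minimizer is the soft-thresholding of b at level lam
b, lam = np.array([3.0, -0.5, 1.2]), 1.0
prox_l1 = lambda tau, g: np.sign(tau) * np.maximum(np.abs(tau) - g * lam, 0.0)
theta_hat = accelerated_prox_gradient(lambda th: th - b, prox_l1,
                                      np.zeros(3), gamma=1.0)
```

The perturbed variant discussed next replaces grad_f(tau) by a Monte Carlo approximation, which is where the conditions linking γ_n, m_n and t_n come in.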
44 Acceleration (2) — Aujol-Dossal-F.-Moulines, work in progress
Perturbed Nesterov acceleration: some convergence results. Choose γ_n, m_n, t_n such that
γ_n ∈ (0, 1/L],   lim_n γ_n t²_n = +∞,   Σ_n γ_n t_n (1 + γ_n t_n) / √m_n < ∞.
Then there exists θ⋆ ∈ argmin_Θ F such that lim_n θ_n = θ⋆. In addition,
F(θ_{n+1}) − min F = O( 1 / (γ_{n+1} t²_n) ).
Schmidt-Le Roux-Bach (2011); Dossal-Chambolle (2014); Aujol-Dossal (2015).
Table: control of F(θ_n) − min F.
| γ_n    | m_n | t_n | rate     | Nbr MC samples |
| γ⋆     | n³  | n   | n⁻²      | n⁴             |
| γ⋆/√n  | n²  | n   | n^(−3/2) | n³             |
More informationPattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods
Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationLearning Gaussian Graphical Models with Unknown Group Sparsity
Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation
More informationNotes on pseudo-marginal methods, variational Bayes and ABC
Notes on pseudo-marginal methods, variational Bayes and ABC Christian Andersson Naesseth October 3, 2016 The Pseudo-Marginal Framework Assume we are interested in sampling from the posterior distribution
More informationImage segmentation combining Markov Random Fields and Dirichlet Processes
Image segmentation combining Markov Random Fields and Dirichlet Processes Jessica SODJO IMS, Groupe Signal Image, Talence Encadrants : A. Giremus, J.-F. Giovannelli, F. Caron, N. Dobigeon Jessica SODJO
More informationBeyond stochastic gradient descent for large-scale machine learning
Beyond stochastic gradient descent for large-scale machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines - October 2014 Big data revolution? A new
More informationSVRG++ with Non-uniform Sampling
SVRG++ with Non-uniform Sampling Tamás Kern András György Department of Electrical and Electronic Engineering Imperial College London, London, UK, SW7 2BT {tamas.kern15,a.gyorgy}@imperial.ac.uk Abstract
More informationBeyond stochastic gradient descent for large-scale machine learning
Beyond stochastic gradient descent for large-scale machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE Joint work with Aymeric Dieuleveut, Nicolas Flammarion,
More informationSome Results on the Ergodicity of Adaptive MCMC Algorithms
Some Results on the Ergodicity of Adaptive MCMC Algorithms Omar Khalil Supervisor: Jeffrey Rosenthal September 2, 2011 1 Contents 1 Andrieu-Moulines 4 2 Roberts-Rosenthal 7 3 Atchadé and Fort 8 4 Relationship
More informationBayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference
1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE
More informationSparse Stochastic Inference for Latent Dirichlet Allocation
Sparse Stochastic Inference for Latent Dirichlet Allocation David Mimno 1, Matthew D. Hoffman 2, David M. Blei 1 1 Dept. of Computer Science, Princeton U. 2 Dept. of Statistics, Columbia U. Presentation
More informationOptimization for Machine Learning
Optimization for Machine Learning (Lecture 3-A - Convex) SUVRIT SRA Massachusetts Institute of Technology Special thanks: Francis Bach (INRIA, ENS) (for sharing this material, and permitting its use) MPI-IS
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationMixed effect model for the spatiotemporal analysis of longitudinal manifold value data
Mixed effect model for the spatiotemporal analysis of longitudinal manifold value data Stéphanie Allassonnière with J.B. Schiratti, O. Colliot and S. Durrleman Université Paris Descartes & Ecole Polytechnique
More informationMean field simulation for Monte Carlo integration. Part II : Feynman-Kac models. P. Del Moral
Mean field simulation for Monte Carlo integration Part II : Feynman-Kac models P. Del Moral INRIA Bordeaux & Inst. Maths. Bordeaux & CMAP Polytechnique Lectures, INLN CNRS & Nice Sophia Antipolis Univ.
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More informationSIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University
SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some
More informationManifold Monte Carlo Methods
Manifold Monte Carlo Methods Mark Girolami Department of Statistical Science University College London Joint work with Ben Calderhead Research Section Ordinary Meeting The Royal Statistical Society October
More informationLecture: Smoothing.
Lecture: Smoothing http://bicmr.pku.edu.cn/~wenzw/opt-2018-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Smoothing 2/26 introduction smoothing via conjugate
More informationStochastic and online algorithms
Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem
More informationSemi-Parametric Importance Sampling for Rare-event probability Estimation
Semi-Parametric Importance Sampling for Rare-event probability Estimation Z. I. Botev and P. L Ecuyer IMACS Seminar 2011 Borovets, Bulgaria Semi-Parametric Importance Sampling for Rare-event probability
More informationQuantifying Stochastic Model Errors via Robust Optimization
Quantifying Stochastic Model Errors via Robust Optimization IPAM Workshop on Uncertainty Quantification for Multiscale Stochastic Systems and Applications Jan 19, 2016 Henry Lam Industrial & Operations
More informationOptimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method
Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Davood Hajinezhad Iowa State University Davood Hajinezhad Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method 1 / 35 Co-Authors
More informationMarkov Chain Monte Carlo
Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).
More informationMultilevel Sequential 2 Monte Carlo for Bayesian Inverse Problems
Jonas Latz 1 Multilevel Sequential 2 Monte Carlo for Bayesian Inverse Problems Jonas Latz Technische Universität München Fakultät für Mathematik Lehrstuhl für Numerische Mathematik jonas.latz@tum.de November
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Maria Ryskina, Yen-Chia Hsu 1 Introduction
More informationBeyond stochastic gradient descent for large-scale machine learning
Beyond stochastic gradient descent for large-scale machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines, Nicolas Le Roux and Mark Schmidt - CAP, July
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More informationLearning Energy-Based Models of High-Dimensional Data
Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero www.cs.toronto.edu/~hinton/energybasedmodelsweb.htm Discovering causal structure as a goal
More informationInference in state-space models with multiple paths from conditional SMC
Inference in state-space models with multiple paths from conditional SMC Sinan Yıldırım (Sabancı) joint work with Christophe Andrieu (Bristol), Arnaud Doucet (Oxford) and Nicolas Chopin (ENSAE) September
More informationIntroduction to Restricted Boltzmann Machines
Introduction to Restricted Boltzmann Machines Ilija Bogunovic and Edo Collins EPFL {ilija.bogunovic,edo.collins}@epfl.ch October 13, 2014 Introduction Ingredients: 1. Probabilistic graphical models (undirected,
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationMetropolis-Hastings Algorithm
Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to
More informationMachine Learning Summer School
Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo
Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative
More informationRestricted Boltzmann Machines
Restricted Boltzmann Machines Boltzmann Machine(BM) A Boltzmann machine extends a stochastic Hopfield network to include hidden units. It has binary (0 or 1) visible vector unit x and hidden (latent) vector
More informationBayes: All uncertainty is described using probability.
Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w
More informationA Unified Approach to Proximal Algorithms using Bregman Distance
A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationIncremental Gradient, Subgradient, and Proximal Methods for Convex Optimization
Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology February 2014
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationEM Algorithm II. September 11, 2018
EM Algorithm II September 11, 2018 Review EM 1/27 (Y obs, Y mis ) f (y obs, y mis θ), we observe Y obs but not Y mis Complete-data log likelihood: l C (θ Y obs, Y mis ) = log { f (Y obs, Y mis θ) Observed-data
More informationConvex relaxation for Combinatorial Penalties
Convex relaxation for Combinatorial Penalties Guillaume Obozinski Equipe Imagine Laboratoire d Informatique Gaspard Monge Ecole des Ponts - ParisTech Joint work with Francis Bach Fête Parisienne in Computation,
More informationMonte Carlo Methods. Leon Gu CSD, CMU
Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte
More informationBAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage
BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement
More informationCONVERGENCE OF A STOCHASTIC APPROXIMATION VERSION OF THE EM ALGORITHM
The Annals of Statistics 1999, Vol. 27, No. 1, 94 128 CONVERGENCE OF A STOCHASTIC APPROXIMATION VERSION OF THE EM ALGORITHM By Bernard Delyon, Marc Lavielle and Eric Moulines IRISA/INRIA, Université Paris
More informationConfidence Intervals for Low-dimensional Parameters with High-dimensional Data
Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology
More informationMCMC Sampling for Bayesian Inference using L1-type Priors
MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling
More informationLasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices
Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,
More informationAdvanced Monte Carlo integration methods. P. Del Moral (INRIA team ALEA) INRIA & Bordeaux Mathematical Institute & X CMAP
Advanced Monte Carlo integration methods P. Del Moral (INRIA team ALEA) INRIA & Bordeaux Mathematical Institute & X CMAP MCQMC 2012, Sydney, Sunday Tutorial 12-th 2012 Some hyper-refs Feynman-Kac formulae,
More informationDual Proximal Gradient Method
Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method
More informationA framework for adaptive Monte-Carlo procedures
A framework for adaptive Monte-Carlo procedures Jérôme Lelong (with B. Lapeyre) http://www-ljk.imag.fr/membres/jerome.lelong/ Journées MAS Bordeaux Friday 3 September 2010 J. Lelong (ENSIMAG LJK) Journées
More informationA Review of Pseudo-Marginal Markov Chain Monte Carlo
A Review of Pseudo-Marginal Markov Chain Monte Carlo Discussed by: Yizhe Zhang October 21, 2016 Outline 1 Overview 2 Paper review 3 experiment 4 conclusion Motivation & overview Notation: θ denotes the
More informationOn Bayesian Computation
On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints
More informationVariational Inference via Stochastic Backpropagation
Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation
More information