Regularization with variance-mean mixtures


1 Regularization with variance-mean mixtures. Nick Polson (University of Chicago) and James Scott (University of Texas at Austin). Workshop on Sensing and Analysis of High-Dimensional Data, Duke University, July 2011.

2 Models: robust regression, logit models, multinomial logit models, extreme-value models, support vector machines, topic models, restricted Boltzmann machines, neural networks, autologistic models, penalized additive models. Priors/penalties: Student t, lasso/ridge/bridge, MC+, group lasso, normal/exponential-gamma, normal/gamma, generalized double Pareto, normal/inverted-beta, normal/inverse-Gaussian, meridian filter.

3 Models: robust regression, logit models, multinomial logit models, extreme-value models, support vector machines, topic models, restricted Boltzmann machines, neural networks, autologistic models, penalized additive models. Priors/penalties: Student t, lasso/ridge/bridge, MC+, group lasso, normal/exponential-gamma, normal/gamma, generalized double Pareto, normal/inverted-beta, normal/inverse-Gaussian, meridian filter. Our approach works for arbitrary combinations of likelihood with prior. No matrix inversion; no numerical derivatives. Fully parallelizable block updates.

5 But I can solve all of these with an $\ell_1$ constraint. True enough. But consider an example: p = 25, n = 500, 1000 simulated data sets. $y_i = 1\{z_i > 0\}$ for $i = 1, \dots, n$, $z \sim N(X\beta, I)$, and $\beta$ has five entries equal to 5 and twenty equal to 0. (Results table: median and mean SSE for MLE, Lasso-UT, Lasso-CV, HS.)

6 But I can solve all of these with an $\ell_1$ constraint. True enough. But consider an example: p = 25, n = 500, 1000 simulated data sets. $y_i = 1\{z_i > 0\}$ for $i = 1, \dots, n$, $z \sim N(X\beta, I)$, and $\beta$ has five entries equal to 5 and twenty equal to 0 (an r-spike signal). (Results table: median and mean SSE for MLE, Lasso-UT, Lasso-CV, HS.)
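
A minimal sketch of one draw from this simulation design in Python/numpy. The five-versus-twenty spike pattern, the thresholding rule, and the sample sizes are taken from the slide; the standard-normal design matrix and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 25
beta = np.concatenate([np.full(5, 5.0), np.zeros(20)])   # r-spike signal: 5 (x5), 0 (x20)

X = rng.standard_normal((n, p))          # design matrix (assumed standard normal)
z = X @ beta + rng.standard_normal(n)    # z ~ N(X beta, I)
y = (z > 0).astype(int)                  # y_i = 1{z_i > 0}
```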

7

8

9 There seems to be a precise separation of the computationally nice penalties/priors (e.g. the lasso) from the statistically nice priors (e.g. Cauchy, horseshoe). Our contribution: an algorithm for many priors in this second class, when used in conjunction with many common non-Gaussian likelihoods.

10 There seems to be a precise separation of the computationally nice penalties/priors (e.g. the lasso) from the statistically nice priors (e.g. Cauchy, horseshoe; PS, 2011; Fan and Li, 2001; Pericchi and Smith, 1992; Masreliez, 1975; Brown, 1971; etc.). Our contribution: an algorithm for many priors in this second class, when used in conjunction with many common non-Gaussian likelihoods.

11 A teaser example: logit with a bridge penalty.
$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^n \log(1 + \exp\{-y_i x_i^T \beta\}) + \sum_{j=1}^p |\beta_j/(\tau s_j)|^\alpha$

12 A teaser example: logit with a bridge penalty.
$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^n \log(1 + \exp\{-y_i x_i^T \beta\}) + \sum_{j=1}^p |\beta_j/(\tau s_j)|^\alpha$
ouch.

13 To find the MAP, just iterate three steps:
$\beta^{(g+1)} = \left(\tau^{-2} S^{-1} \hat\Lambda^{-1(g)} + X^T \hat\Omega^{-1(g)} X\right)^{-1} \tfrac{1}{2} X^T 1$
$\hat\omega_i^{-1(g+1)} = \frac{1}{z_i^{(g)}}\left(\frac{e^{z_i^{(g)}}}{1 + e^{z_i^{(g)}}} - \frac{1}{2}\right)$
$\hat\lambda_j^{-1(g+1)} = \alpha (\tau s_j)^{2-\alpha}\, |\beta_j^{(g)}|^{\alpha - 2}$

14 To find the MAP, just iterate three steps. (Don't actually do this.)
$\beta^{(g+1)} = \left(\tau^{-2} S^{-1} \hat\Lambda^{-1(g)} + X^T \hat\Omega^{-1(g)} X\right)^{-1} \tfrac{1}{2} X^T 1$
$\hat\omega_i^{-1(g+1)} = \frac{1}{z_i^{(g)}}\left(\frac{e^{z_i^{(g)}}}{1 + e^{z_i^{(g)}}} - \frac{1}{2}\right)$
$\hat\lambda_j^{-1(g+1)} = \alpha (\tau s_j)^{2-\alpha}\, |\beta_j^{(g)}|^{\alpha - 2}$
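
A literal numpy transcription of the three updates for the logit/bridge teaser, with s_j = 1, a small floor to keep the moments finite near zero, and toy data; all of those choices are assumptions, and, as the slide warns, forming and solving the dense p x p system directly is exactly what the conjugate-gradient version later avoids.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(1)
n, p, tau, alpha = 200, 10, 1.0, 0.5
X = rng.standard_normal((n, p))
y = np.where(X[:, 0] - X[:, 1] + rng.standard_normal(n) > 0, 1.0, -1.0)   # labels in {-1, +1}

Xt = y[:, None] * X              # rows y_i x_i, so z_i = y_i x_i' beta = (Xt @ beta)_i
beta = np.full(p, 0.01)
eps = 1e-8

for _ in range(200):
    z = Xt @ beta
    zs = np.where(np.abs(z) < eps, eps, z)
    # E step: conditional moments of the latent precisions
    omega_inv = (expit(zs) - 0.5) / zs                                          # hat{omega}_i^{-1}
    lam_inv = alpha * tau**(2 - alpha) * np.maximum(np.abs(beta), eps)**(alpha - 2)  # hat{lambda}_j^{-1}
    # M step: beta = (tau^{-2} Lambda^{-1} + Xt' Omega^{-1} Xt)^{-1} (1/2) Xt' 1
    A = np.diag(lam_inv / tau**2) + Xt.T @ (omega_inv[:, None] * Xt)
    beta = np.linalg.solve(A, 0.5 * Xt.T @ np.ones(n))
```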

15 Example: covariate-dependent disease networks. ~112m patient records comprising ICD-9 codes + prescriptions + covariates. (ICD-9 excerpt shown on the slide: ISCHEMIC HEART DISEASE ( ); 410 Acute myocardial infarction: of anterolateral wall, of other anterior wall, of inferolateral wall, of inferoposterior wall, of other inferior wall, of other lateral wall, true posterior wall infarction, subendocardial infarction, of other specified sites, unspecified site.) Our approach: a tree of graphs. Tree splits on covariates, mainly demographics and geography. Each terminal node is a disease network for a subgroup of the population.

16 Example: covariate-dependent disease networks. ~112m patient records comprising ICD-9 codes + prescriptions + covariates. (ICD-9 excerpt shown on the slide: ISCHEMIC HEART DISEASE ( ); 410 Acute myocardial infarction: of anterolateral wall, of other anterior wall, of inferolateral wall, of inferoposterior wall, of other inferior wall, of other lateral wall, true posterior wall infarction, subendocardial infarction, of other specified sites, unspecified site.) Our approach: a tree of graphs. Goh et al., PNAS (2007). Tree splits on covariates, mainly demographics and geography. Each terminal node is a disease network for a subgroup of the population.

17 The big picture. We use normal variance-mean mixtures to represent a wide class of objective functions commonly encountered in high-dimensional problems. By modern Bayesian standards, these are pretty simple. But they are ubiquitous, useful, and often the only tractable approach in their respective domains for data sets beyond a certain size.

18 This class is surprisingly broad. For example:
$p(\alpha_{1:K}, \beta_{1:D} \mid W, \mu, \Sigma) \propto \left[\prod_{d=1}^{D} p(\beta_d \mid \mu, \Sigma) \prod_{n=1}^{N_d} \sum_{k=1}^{K} \frac{e^{\beta_{dk}}}{\sum_{l=1}^{K} e^{\beta_{dl}}}\, \alpha_{k, w_n}\right] p(\alpha_{1:K})$
We get an exact MAP estimate. No variational approximation.

19 This class is surprisingly broad. For example (e.g. Blei and Lafferty, 2009):
$p(\alpha_{1:K}, \beta_{1:D} \mid W, \mu, \Sigma) \propto \left[\prod_{d=1}^{D} p(\beta_d \mid \mu, \Sigma) \prod_{n=1}^{N_d} \sum_{k=1}^{K} \frac{e^{\beta_{dk}}}{\sum_{l=1}^{K} e^{\beta_{dl}}}\, \alpha_{k, w_n}\right] p(\alpha_{1:K})$
We get an exact MAP estimate. No variational approximation.

20 Connection with previous work.
Penalties/priors corresponding to scale mixtures: lasso (Tibshirani, 1996; Park and Casella, 2008; Hans, 2009); bridge estimators (West, 1987; Huang et al., 2008); relevance vector machines (Tipping, 2001); normal/Jeffreys (Figueiredo, 2003; Bae and Mallick, 2004); normal/exponential-gamma (Griffin and Brown, 2005); normal/inverse-Gaussian (Caron and Doucet, 2008); normal/gamma (Griffin and Brown, 2010); horseshoe/inverted-beta prior (CPS, 2010; Polson and Scott, 2011); double Pareto (Armagan et al., 2010).
Algorithms for regularized regression: LARS (Efron et al., 2004); LQA (Fan and Li, 2001) and LLA (Zou and Li, 2008); EM/ECME (Dempster et al., 1977; Meng and Rubin, 1993; many others); MM (Hunter and Lange, 2000; Taddy, 2010); MCMC for support vector machines (Polson and Steve Scott, 2011); MCMC for logistic regression (Gramacy and Polson; Holmes and Held; Steve Scott; SFS; others).
Distributional theory based on variance-mean mixtures: Z distributions (Fisher, 1923); generalized inverse-Gaussian distributions (Barndorff-Nielsen, 1977); penalties, priors, and Lévy processes (Polson and Scott, 2011).

21 The standard scale-mixture trick: $p(z) = \int_0^\infty \phi(z \mid 0, v)\, p(v)\, dv$

22 $\hat\beta = \arg\min_\beta \|y - X\beta\|^2 + \nu \sum_{j=1}^p g(\beta_j)$
$(y \mid \beta) \sim N(X\beta, \sigma^2 I)$
$(\beta_i \mid \tau^2, \lambda_i^2) \sim N(0, \tau^2 \lambda_i^2)$
$\lambda_i^2 \sim \pi(\lambda_i^2), \qquad (\tau^2, \sigma^2) \sim \pi(\tau^2, \sigma^2)$
$(\beta \mid y, \tau^2, \Lambda, \sigma^2) \sim N(\hat\beta, \hat\Sigma_\beta), \qquad \hat\beta = (\tau^{-2}\Lambda^{-1} + X^T X)^{-1} X^T y$

23 $\hat\beta = \arg\min_\beta \|y - X\beta\|^2 + \nu \sum_{j=1}^p g(\beta_j)$
$(y \mid \beta) \sim N(X\beta, \sigma^2 I)$
$(\beta_i \mid \tau^2, \lambda_i^2) \sim N(0, \tau^2 \lambda_i^2)$
$\lambda_i^2 \sim \pi(\lambda_i^2), \qquad (\tau^2, \sigma^2) \sim \pi(\tau^2, \sigma^2)$
Exponential → lasso; inverted-beta → horseshoe; gamma → normal/gamma (Andrews and Mallows; West).
$(\beta \mid y, \tau^2, \Lambda, \sigma^2) \sim N(\hat\beta, \hat\Sigma_\beta), \qquad \hat\beta = (\tau^{-2}\Lambda^{-1} + X^T X)^{-1} X^T y$
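
The "Exponential → lasso" line can be checked numerically: mixing N(0, v) over v ~ Exp(rate a²/2) returns the Laplace density (a/2)e^{-a|β|}. The rate parametrization and the test points below are assumptions chosen so the identity holds exactly.

```python
import numpy as np
from scipy import integrate, stats

a = 1.5                                  # Laplace rate (assumed for the check)
rate = a**2 / 2                          # exponential mixing rate on the variance

def mixture_density(beta):
    # integrate N(beta | 0, v) * Exp(v; rate) over v > 0
    f = lambda v: stats.norm.pdf(beta, 0.0, np.sqrt(v)) * rate * np.exp(-rate * v)
    return integrate.quad(f, 0.0, np.inf)[0]

for b in (-2.0, -0.5, 0.5, 1.0, 3.0):
    print(f"beta={b:+.1f}  mixture={mixture_density(b):.6f}  laplace={(a/2)*np.exp(-a*abs(b)):.6f}")
```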

24 The standard scale-mixture trick: $p(z) = \int_0^\infty \phi(z \mid 0, v)\, p(v)\, dv$

25 The standard scale-mixture trick: $p(z) = \int_0^\infty \phi(z \mid 0, v)\, p(v)\, dv$
Two ways of generalizing this: (i) the Lévy representation; (ii) variance-mean mixtures.

26 1) The Lévy representation. Let $\psi(t)$, $t > 0$, be a nonnegative real-valued, totally monotone function such that $\lim_{t \to 0} \psi(t) = 0$.
Part A: Suppose that these conditions are met for $t = f(\beta_j)$. Then the prior distribution $p(\beta_j \mid s) \propto \exp\{-s\,\psi[f(\beta_j)]\}$, where $s > 0$, is the moment-generating function of a subordinator $T(s)$, evaluated at $f(\beta_j)$, whose Lévy measure satisfies
$\psi(t) = \int_0^\infty \{1 - \exp(-tx)\}\, \mu(dx). \quad (1)$
Part B: Suppose that these conditions are met for $t = \beta_j^2/2$. Then $p(\beta_j \mid s) \propto \exp\{-s\,\psi(\beta_j^2/2)\}$, where $s > 0$, is a mixture of normals given by
$p(\beta_j \mid s) \propto \int_0^\infty N(\beta_j \mid 0, T^{-1})\, T^{-1/2}\, p(T)\, dT$,
where $p(T)$ is the density of the subordinator $T$, observed at time $s$, whose Lévy measure $\mu(dx)$ satisfies (1).

27 1) The Lévy representation. Let $\psi(t)$, $t > 0$, be a nonnegative real-valued, totally monotone function such that $\lim_{t \to 0} \psi(t) = 0$.
Part A: Suppose that these conditions are met for $t = f(\beta_j)$. Then the prior distribution $p(\beta_j \mid s) \propto \exp\{-s\,\psi[f(\beta_j)]\}$, where $s > 0$, is the moment-generating function of a subordinator $T(s)$, evaluated at $f(\beta_j)$, whose Lévy measure satisfies
$\psi(t) = \int_0^\infty \{1 - \exp(-tx)\}\, \mu(dx). \quad (1)$
Part B: Suppose that these conditions are met for $t = \beta_j^2/2$. Then $p(\beta_j \mid s) \propto \exp\{-s\,\psi(\beta_j^2/2)\}$, where $s > 0$, is a mixture of normals given by
$p(\beta_j \mid s) \propto \int_0^\infty N(\beta_j \mid 0, T^{-1})\, T^{-1/2}\, p(T)\, dT$,
where $p(T)$ is the density of the subordinator $T$, observed at time $s$, whose Lévy measure $\mu(dx)$ satisfies (1).
(Super useful for block-wise penalties, e.g. the group lasso.)

28 2) Mean-variance mixtures:
$p(z) = \int_0^\infty \phi(z \mid 0, v)\, p(v)\, dv$
$p(z) = \int_0^\infty \phi(z \mid \mu + \kappa v, v)\, p(v)\, dv$
(like Brownian motion with a drift; useful for non-Gaussian likelihoods)

29 The generic regularization problem:
$Q(\beta) = \sum_{i=1}^n f(y_i, x_i^T\beta) + \sum_{j=1}^p g\!\left(\frac{\beta_j}{\tau s_j}\right)$
We consider properties of the posterior distribution
$p(\beta \mid \tau, y) \propto e^{-Q(\beta)} = \left\{\prod_{i=1}^n p(z_i \mid x_i^T\beta)\right\} \left\{\prod_{j=1}^p p(\beta_j \mid \tau)\right\} = p(z \mid \beta)\, p(\beta \mid \tau)$,
where $p(z_i \mid x_i^T\beta) \propto \exp\{-f(y_i, x_i^T\beta)\}$, $p(\beta_j \mid \tau) \propto \exp\{-g(\beta_j/\tau s_j)\}$, and $z_i = y_i - x_i^T\beta$ for regression, or $z_i = y_i x_i^T\beta$ for classification.

30 The generic regularization problem. Suppose that both the likelihood and prior/penalty can be represented as normal variance-mean mixtures:
$p(z_i \mid \beta) = \int_0^\infty \phi(z_i \mid \mu_z + \kappa_z \omega_i,\, \sigma^2 \omega_i)\, dP(\omega_i)$
$p(\beta_j \mid \tau) = \int_0^\infty \phi(\beta_j \mid \mu_\beta + \kappa_\beta \lambda_j,\, \tau^2 s_j^2 \lambda_j)\, dP(\lambda_j)$.

31 Then we have an easy EM.
E step: compute the expected value of the log posterior, given the current parameter estimate:
$Q(\beta \mid \beta^{(g)}) = \int \log p(\beta \mid \omega, \lambda, \tau, y)\, p(\omega, \lambda \mid \beta^{(g)}, \tau, y)\, d\omega\, d\lambda$.
M step: maximize the complete-data posterior to update the parameter estimate:
$\beta^{(g+1)} = \arg\max_\beta Q(\beta \mid \beta^{(g)})$.

32 The E step. The observed-data posterior is
$p(\beta \mid \tau, y) = \int \pi(\beta \mid \omega, \lambda, y)\, p(\omega, \lambda \mid y, \tau)\, d\omega\, d\lambda$.
Exploiting the variance-mean mixture representation, the complete-data log posterior is
$\log p(\beta \mid \omega, \lambda, \tau, y) = c_0(\omega, \lambda, y, \tau) - \frac{1}{2}\sum_{i=1}^n \frac{\omega_i^{-1}}{\sigma^2}\left(z_i - \mu_z - \kappa_z \omega_i\right)^2 - \frac{1}{2}\sum_{j=1}^p \frac{\lambda_j^{-1}}{s_j^2 \tau^2}\left(\beta_j - \mu_\beta - \kappa_\beta \lambda_j\right)^2$.
This can be shown to depend linearly upon the conditional moments $\{\hat\omega_i^{-1}\}$ and $\{\hat\lambda_j^{-1}\}$. Therefore the E step is simply to plug these conditional expected values into the complete-data log posterior.

33 The E step. Theorem: these conditional moments are
$(\beta_j - \mu_\beta)\,\hat\lambda_j^{-1(g)} = \kappa_\beta + \tau^2 s_j^2\, g'(\beta_j \mid \tau)$,
$(z_i - \mu_z)\,\hat\omega_i^{-1(g)} = \kappa_z + \sigma^2 f'(z_i \mid \beta)$.
Key fact: we don't need the conditional posterior for the latent variances. We only need the functional form of the likelihood (f) and prior (g). These are pre-specified.

34 Tilted, iteratively re-weighted least squares.
E step: given a current estimate $\beta = \beta^{(g)}$, compute the conditional moments of the latent variances as
$(\beta_j^{(g)} - \mu_\beta)\,\hat\lambda_j^{-1(g)} = \kappa_\beta + \tau^2 s_j^2\, g'(\beta_j^{(g)} \mid \tau)$,
$(z_i^{(g)} - \mu_z)\,\hat\omega_i^{-1(g)} = \kappa_z + \sigma^2 f'(z_i^{(g)} \mid \beta)$.
M step: for regression, compute $\beta^{(g+1)}$ as
$\beta^{(g+1)} = \left(\tau^{-2} S^{-1} \hat\Lambda^{-1(g)} + X^T \hat\Omega^{-1(g)} X\right)^{-1} X^T \left\{\hat\Omega^{-1(g)}(y - \mu_z 1) - \kappa_z 1\right\}$.
For classification, compute $\beta^{(g+1)}$ as
$\beta^{(g+1)} = \left(\tau^{-2} S^{-1} \hat\Lambda^{-1(g)} + X^T \hat\Omega^{-1(g)} X\right)^{-1} X^T \left\{\hat\Omega^{-1(g)} \mu_z 1 + \kappa_z 1\right\}$.
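
A concrete instance of the regression branch: a Gaussian likelihood written as ½‖y − Xβ‖² (so μ_z = κ_z = 0 and ω̂_i^{-1} ≡ 1) paired with a lasso penalty Σ_j |β_j|/τ, for which the prior-side weight τ^{-2}λ̂_j^{-1} reduces to 1/(τ|β_j|). The synthetic data, s_j = 1, and the small floor on |β_j| are assumptions; this is the familiar normal/exponential EM for the lasso MAP written in the slide's notation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, tau = 100, 8, 0.5
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

XtX, Xty, eps = X.T @ X, X.T @ y, 1e-8
beta = np.linalg.lstsq(X, y, rcond=None)[0]      # start at least squares

for _ in range(500):
    # E step: prior weight tau^{-2} * hat{lambda}_j^{-1} = 1 / (tau * |beta_j|)
    prior_prec = 1.0 / (tau * np.maximum(np.abs(beta), eps))
    # M step: beta = (tau^{-2} Lambda^{-1} + X'X)^{-1} X'y, since hat{omega}_i^{-1} = 1
    beta = np.linalg.solve(np.diag(prior_prec) + XtX, Xty)

print(np.round(beta, 3))
```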

35 This works for a broad class of models.
Likelihoods (f(z_i | β), κ_z, μ_z, p(ω_i)):
  Squared-error: z_i², 0, 0, ω_i ≡ 1
  Absolute-error: |z_i|, 0, 0, exponential
  Check loss: |z_i| + (2q − 1)z_i, 1 − 2q, 0, GIG
  SVM: max(1 − z_i, 0), 1, 1, GIG
  Logistic: log(1 + e^{−z_i}), 1/2, 0, Polya
Penalty functions (g(β_j | τ), κ_β, μ_β, p(λ_j)):
  Ridge: (β_j/τ)², 0, 0, λ_j ≡ 1
  Lasso: |β_j|/τ, 0, 0, exponential
  Bridge: |β_j/τ|^α, 0, 0, stable
  Gen. double Pareto: {(1 + α)/τ} log(1 + |β_j|/ατ), 0, 0, exponential-gamma
Plus multinomial logit, autologistic, topic models, RBMs, extreme-value, ...

36 Two key identities:
$\frac{\alpha^2 - \kappa^2}{2\alpha}\, e^{-\alpha|\theta - \mu| + \kappa(\theta - \mu)} = \int_0^\infty \phi(\theta \mid \mu + \kappa v,\, v)\, p_{\rm gig}\!\left(v \mid 1, 0, \alpha^2 - \kappa^2\right) dv$
$\frac{1}{B(\alpha, \kappa)}\, \frac{e^{\alpha(\theta - \mu)}}{(1 + e^{\theta - \mu})^{2(\alpha - \kappa)}} = \int_0^\infty \phi(\theta \mid \mu + \kappa v,\, v)\, p_{\rm pol}(v \mid \alpha, \alpha - 2\kappa)\, dv$.
Improper versions (treat purely as integral identities):
$a^{-1} \exp\{-2c^{-1}\max(au, 0)\} = \int_0^\infty \phi(u \mid -av,\, cv)\, dv$
$c^{-1} \exp\{-2c^{-1}\rho_q(u)\} = \int_0^\infty \phi(u \mid (1 - 2q)v,\, cv)\, e^{-2q(1-q)v}\, dv$
$(1 + \exp\{u - \mu\})^{-1} = \int_0^\infty \phi(u \mid \mu - \tfrac{1}{2}v,\, v)\, p_{\rm pol}(v \mid 0, 1)\, dv$
where $\rho_q(u) = \tfrac{1}{2}|u| + (q - \tfrac{1}{2})u$ is the check-loss function.
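
The first identity can be checked by quadrature. Reading p_gig(v | 1, 0, α² − κ²) as the exponential density with rate (α² − κ²)/2, which is a parametrization assumption, the two sides agree numerically:

```python
import numpy as np
from scipy import integrate, stats

alpha, kappa, mu = 2.0, 0.7, 0.3
rate = (alpha**2 - kappa**2) / 2          # assumed exponential reading of p_gig(v | 1, 0, alpha^2 - kappa^2)

def lhs(theta):
    return (alpha**2 - kappa**2) / (2 * alpha) * np.exp(-alpha * abs(theta - mu) + kappa * (theta - mu))

def rhs(theta):
    f = lambda v: stats.norm.pdf(theta, mu + kappa * v, np.sqrt(v)) * rate * np.exp(-rate * v)
    return integrate.quad(f, 0.0, np.inf)[0]

for t in (-1.0, 0.0, 0.5, 2.0):
    print(f"theta={t:+.1f}  lhs={lhs(t):.6f}  rhs={rhs(t):.6f}")
```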

37 Back to the teaser example: logit models. A Polya mixing distribution:
$p_{\rm pol}(v \mid \alpha, \alpha - 2\kappa) = \sum_{k=0}^\infty w_k\, e^{-a_k v}$.
The terms in this sum are built from $b = \tfrac{1}{2}(\alpha - \kappa)$ and $\delta = \tfrac{1}{2}(\alpha + \kappa)$: the rates $a_k$ involve $(\delta + k)$, the weights satisfy $w_k = \prod_{j \neq k} \frac{a_j}{a_j - a_k}$, and the coefficients $(-1)^k \frac{(2\delta)(2\delta + 1)\cdots(2\delta + k - 1)}{k!}$ enter along with the normalizing constant $B(\delta + b, \delta - b)$.

38 Back to the teaser example: logit models. A Polya mixing distribution leads to a Z-distribution marginal:
$p_Z(\theta \mid \mu, \alpha, \kappa) = \frac{1}{B(\alpha, \kappa)}\, \frac{(e^{\theta - \mu})^\alpha}{(1 + e^{\theta - \mu})^{2(\alpha - \kappa)}}$.
This family includes the logistic regression model as a limiting, improper case: $(\alpha, \kappa, \mu) = (1, 1/2, 0)$. Our representation theorem gives the relevant conditional moment as
$\hat\omega_i^{-1} = \frac{1}{z_i}\left(\frac{e^{z_i}}{1 + e^{z_i}} - \frac{1}{2}\right), \qquad z_i = y_i x_i^T \beta$.

39 Multinomial logit. The probability that observation $i$ falls into class $k$ is
$\theta_{ki} = P(y_i = k) = \frac{\exp(x_i^T \beta_k)}{\sum_{l=1}^K \exp(x_i^T \beta_l)}$.
To represent the multinomial logit model in our framework, let
$\eta_{ki} = \frac{\exp(x_i^T \beta_k - c_{ki})}{1 + \exp(x_i^T \beta_k - c_{ki})}, \qquad c_{ki}(\beta_{(-k)}) = \log \sum_{l \neq k} \exp\{x_i^T \beta_l\}$
(Holmes and Held, 2006).

40 Multinomial logit. Write the conditional likelihood for category $k$ as
$L(\beta_k \mid \beta_{(-k)}, y) = \prod_{i=1}^n \prod_{l=1}^K \theta_{li}^{\tilde y_{li}} \propto \prod_{i=1}^n \eta_{ki}^{\tilde y_{ki}}\, \{w_i(1 - \eta_{ki})\}^{1 - \tilde y_{ki}} \propto \prod_{i=1}^n \eta_{ki}^{\tilde y_{ki}}\, (1 - \eta_{ki})^{1 - \tilde y_{ki}}$
$= \prod_{i=1}^n \frac{\exp(\gamma_{ki} x_i^T \beta_k - \gamma_{ki} c_{ki})}{1 + \exp(\gamma_{ki} x_i^T \beta_k - \gamma_{ki} c_{ki})} = \prod_{i=1}^n \int_0^\infty \phi(z_{ki} \mid \mu_{ki} + \kappa \xi_{ki},\, \xi_{ki})\, p_{\rm PY}(\xi_{ki} \mid 1, 0)\, d\xi_{ki}$,
where $\kappa = 1/2$, $z_{ki} = \gamma_{ki} x_i^T \beta_k$, $\mu_{ki} = \gamma_{ki} c_{ki}$, and $p_{\rm PY}(\xi_{ki} \mid 1, 0)$ is a function of $\xi_{ki}$ that is the limit of a Polya density as $b \to 0$.

41 An exact ECM. For iteration $t = 1, 2, \dots$: for $k = 2, \dots, K$, cycle through the following steps.
Update $\Omega_k$:
$z_{ki}^{(t)} := \gamma_{ki}\, x_i^T \beta_k^{(t)}$
$\mu_{ki}^{(t)} := \gamma_{ki} \log \sum_{l \neq k} \exp(x_i^T \beta_l^{(t)})$
$\omega_{ki}^{(t)} := \frac{1}{z_{ki}^{(t)} - \mu_{ki}^{(t)}}\left\{\frac{\exp(z_{ki}^{(t)} - \mu_{ki}^{(t)})}{1 + \exp(z_{ki}^{(t)} - \mu_{ki}^{(t)})} - \frac{1}{2}\right\}$
$\Omega_k^{(t)} := \mathrm{diag}(\omega_{k1}^{(t)}, \dots, \omega_{kn}^{(t)})$,
where $\beta_k^{(t)}$ is the current estimate for the $k$th block of coefficients, and $\gamma_{ki} = \pm 1$ is an indicator of whether $y_i = k$.

42 An exact ECM. For iteration $t = 1, 2, \dots$: for $k = 2, \dots, K$, cycle through the following steps.
Update $\Lambda_k$:
$\lambda_{kj}^{(t)} := \tau^2\, \psi'(\beta_{kj}^{(t)} \mid \tau)\, /\, \beta_{kj}^{(t)}$
$\Lambda_k^{(t)} := \mathrm{diag}(\lambda_{k1}^{(t)}, \dots, \lambda_{kp}^{(t)})$,
where $\psi'$ is the derivative of the penalty function $\psi(\beta_{kj} \mid \tau)$ with respect to $\beta_{kj}$.

43 An exact ECM. For iteration $t = 1, 2, \dots$: for $k = 2, \dots, K$, cycle through the following steps.
Update $\beta_k$: solve the linear system $A_k^{(t)} \beta_k^{(t+1)} = b_k^{(t)}$ for $\beta_k^{(t+1)}$, with
$A_k^{(t)} := \tau^{-2} \Lambda_k^{(t)} + \tilde X_k^T \Omega_k^{(t)} \tilde X_k$
$b_k^{(t)} := \tilde X_k^T \left(\Omega_k^{(t)} \mu_k^{(t)} + \tfrac{1}{2} 1\right)$,
where $\tilde X_k$ is the $n \times p$ matrix having rows $\tilde x_i = \gamma_{ki} x_i$, $1$ is a column vector of ones, and $\mu_k^{(t)} = (\mu_{k1}^{(t)}, \dots, \mu_{kn}^{(t)})^T$.

44 An exact ECM. For iteration $t = 1, 2, \dots$: for $k = 2, \dots, K$, cycle through the following steps.
Update $\beta_k$: solve the linear system $A_k^{(t)} \beta_k^{(t+1)} = b_k^{(t)}$ for $\beta_k^{(t+1)}$, with
$A_k^{(t)} := \tau^{-2} \Lambda_k^{(t)} + \tilde X_k^T \Omega_k^{(t)} \tilde X_k$
$b_k^{(t)} := \tilde X_k^T \left(\Omega_k^{(t)} \mu_k^{(t)} + \tfrac{1}{2} 1\right)$,
where $\tilde X_k$ is the $n \times p$ matrix having rows $\tilde x_i = \gamma_{ki} x_i$, $1$ is a column vector of ones, and $\mu_k^{(t)} = (\mu_{k1}^{(t)}, \dots, \mu_{kn}^{(t)})^T$. Don't solve this exactly.
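
A sketch of one full Ω_k / Λ_k / β_k cycle for a single block, using a ridge penalty ψ(β | τ) = (β/τ)² so that the Λ update is constant (λ_kj = 2). The toy data, the ridge choice, and the direct linear solve, which this slide warns against, are assumptions for illustration.

```python
import numpy as np
from scipy.special import expit, logsumexp

rng = np.random.default_rng(3)
n, p, K, tau = 300, 5, 3, 1.0
X = rng.standard_normal((n, p))
y = rng.integers(0, K, size=n)                      # toy class labels 0, ..., K-1
beta = np.zeros((K, p))                             # class 0 kept at zero as the reference
eps = 1e-8

def update_block(k, beta):
    mask = np.arange(K) != k
    c = logsumexp(X @ beta[mask].T, axis=1)         # c_ki = log sum_{l != k} exp(x_i' beta_l)
    gamma = np.where(y == k, 1.0, -1.0)             # gamma_ki = +/-1 indicator of y_i = k
    z = gamma * (X @ beta[k])
    mu = gamma * c
    d = z - mu
    d = np.where(np.abs(d) < eps, eps, d)
    omega = (expit(d) - 0.5) / d                    # Omega_k weights
    lam = np.full(p, 2.0)                           # ridge: tau^2 psi'(beta | tau) / beta = 2
    Xk = gamma[:, None] * X                         # rows gamma_ki * x_i
    A = np.diag(lam / tau**2) + Xk.T @ (omega[:, None] * Xk)
    b = Xk.T @ (omega * mu + 0.5)
    return np.linalg.solve(A, b)

for t in range(20):                                 # a few ECM sweeps
    for k in range(1, K):
        beta[k] = update_block(k, beta)
```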

45 Tilted, iteratively re-weighted conjugate gradient. In the update step for $\beta$, don't solve the system exactly. Instead, while $\|\Delta_{(t,l)}\| > \delta_{\min}$, increment $l$ and set
$\beta_k^{(t,l)} := \beta_k^{(t,l-1)} + \Delta_{(t,l-1)}$
$r_{(t,l)} := r_{(t,l-1)} - \alpha_{(t,l-1)}\, c_{(t,l-1)}$
$\gamma_{(t,l)} := \frac{r_{(t,l)}^T r_{(t,l)}}{r_{(t,l-1)}^T r_{(t,l-1)}}$
$d_{(t,l)} := r_{(t,l)} + \gamma_{(t,l)}\, d_{(t,l-1)}$
$c_{(t,l)} := A_k^{(t)} d_{(t,l)}$
$\alpha_{(t,l)} := \frac{r_{(t,l)}^T r_{(t,l)}}{d_{(t,l)}^T c_{(t,l)}}$
$\Delta_{(t,l)} := \alpha_{(t,l)}\, d_{(t,l)}$.

46 Tilted, iteratively re-weighted conjugate gradient. In the update step for $\beta$, don't solve the system exactly. Instead, while $\|\Delta_{(t,l)}\| > \delta_{\min}$, increment $l$ and set
$\beta_k^{(t,l)} := \beta_k^{(t,l-1)} + \Delta_{(t,l-1)}$
$r_{(t,l)} := r_{(t,l-1)} - \alpha_{(t,l-1)}\, c_{(t,l-1)}$
$\gamma_{(t,l)} := \frac{r_{(t,l)}^T r_{(t,l)}}{r_{(t,l-1)}^T r_{(t,l-1)}}$
$d_{(t,l)} := r_{(t,l)} + \gamma_{(t,l)}\, d_{(t,l-1)}$
$c_{(t,l)} := A_k^{(t)} d_{(t,l)}$
$\alpha_{(t,l)} := \frac{r_{(t,l)}^T r_{(t,l)}}{d_{(t,l)}^T c_{(t,l)}}$
$\Delta_{(t,l)} := \alpha_{(t,l)}\, d_{(t,l)}$.
Parallelize this.
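
A plain conjugate-gradient loop for A_k β = b_k built only from matrix-vector products: the same recursions as above, written in the usual order. The stopping tolerance and the dense test matrix are assumptions; in the intended use A_k would never be formed explicitly.

```python
import numpy as np

def conjugate_gradient(A_mv, b, beta0, delta_min=1e-10, max_iter=1000):
    """Solve A beta = b given only the matrix-vector product A_mv(v)."""
    beta, r = beta0.copy(), b - A_mv(beta0)
    d = r.copy()
    for _ in range(max_iter):
        c = A_mv(d)                                   # c_(t,l) = A d_(t,l)
        step_size = (r @ r) / (d @ c)                 # alpha_(t,l)
        step = step_size * d                          # Delta_(t,l)
        beta = beta + step
        if np.linalg.norm(step) <= delta_min:
            break
        r_new = r - step_size * c
        gamma = (r_new @ r_new) / (r @ r)
        d = r_new + gamma * d
        r = r_new
    return beta

# toy check on a random symmetric positive-definite system
rng = np.random.default_rng(4)
M = rng.standard_normal((50, 50))
A = M.T @ M + 50 * np.eye(50)
b = rng.standard_normal(50)
x = conjugate_gradient(lambda v: A @ v, b, np.zeros(50))
print(np.allclose(A @ x, b, atol=1e-6))
```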

47 Other methods. You can solve many of these problems using tailored methods, but not all of them, and rarely this simply or efficiently. LARS: efficient only in Gaussian models. Coordinate descent: can get stuck; may require numerical derivatives; irredeemably serial. LLA/LQA: exact only in Gaussian models (where it's a special case of our method). Variational methods: never exact; sometimes inconsistent; poor in cases of interesting a posteriori dependence.

48 How would you do MCMC in this class? Some people have worked out special cases. Here's an interesting fact:
$p_{\rm pol(\frac{1}{2},\frac{1}{2})}(\lambda) = \sum_{n=0}^\infty (-1)^n \left(n + \tfrac{1}{2}\right) e^{-\frac{1}{2}(n + \frac{1}{2})^2 \lambda}$
$C \stackrel{D}{=} \frac{1}{4\pi^2 \lambda}, \qquad p(C) = 4\pi^2 \sum_{n=0}^\infty (-1)^n \left(n + \tfrac{1}{2}\right) e^{-2(n + \frac{1}{2})^2 \pi^2 C}$
$C \stackrel{D}{=} \frac{1}{2\pi^2} \sum_{n=1}^\infty \frac{\epsilon_n}{(n - \frac{1}{2})^2}, \qquad \epsilon_n \sim \mathrm{Exp}(1)$.

49 How would you do MCMC in this class? There is a duality between the scale-mixture and MGF representations for $C$:
$E\!\left(e^{-\frac{x^2}{2} C}\right) = E\!\left\{(2\pi C)^{-1/2}\, e^{-\frac{x^2}{8\pi^2 C}}\right\} = \frac{1}{\cosh\frac{x}{2}}$.
A logit-type likelihood would thus look like
$\frac{2^{a+b}\, e^{ax}}{(1 + e^x)^{a+b}} = e^{\frac{1}{2}(a - b)x} \left(\frac{1}{\cosh\frac{x}{2}}\right)^{a+b} = e^{\frac{1}{2}(a - b)x} \left\{E\!\left(e^{-\frac{x^2}{2} C}\right)\right\}^{a+b}$.

50 How would you do MCMC in this class? Therefore we can simulate from $C$ using Polya-Gamma distributions:
$(C \mid x) \stackrel{D}{=} \sum_{n=1}^\infty a_n\, g_n, \qquad g_n \sim \mathrm{Ga}(a + b,\, 1), \qquad a_n = \left[2\pi^2 \left(n - \tfrac{1}{2}\right)^2 \left\{1 + \frac{x^2}{4(n - \frac{1}{2})^2 \pi^2}\right\}\right]^{-1}$.
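
A sketch of drawing (C | x) by truncating the sum at N terms, under the reconstruction above. The weights a_n and the Ga(a + b, 1) innovations follow the Polya-Gamma sum representation, while the truncation level and the identification with the slide's C are assumptions; exact samplers exist and would be preferred in practice.

```python
import numpy as np

rng = np.random.default_rng(5)

def draw_C_given_x(x, a, b, n_terms=200):
    """Truncated draw from the infinite-sum representation of (C | x)."""
    n = np.arange(1, n_terms + 1)
    a_n = 1.0 / (2 * np.pi**2 * (n - 0.5)**2 * (1 + x**2 / (4 * (n - 0.5)**2 * np.pi**2)))
    g_n = rng.gamma(shape=a + b, scale=1.0, size=n_terms)     # Ga(a + b, 1) terms
    return float(np.sum(a_n * g_n))

x = 1.3
draws = [draw_C_given_x(x, a=1.0, b=0.0) for _ in range(2000)]
# compare with the Polya-Gamma mean (a + b) * tanh(x/2) / (2x)
print(np.mean(draws), np.tanh(x / 2) / (2 * x))
```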

51 Models: robust regression, logit models, multinomial logit models, extreme-value models, support vector machines, topic models, restricted Boltzmann machines, neural networks, autologistic models, penalized additive models. Priors/penalties: Student t, ridge, lasso, bridge, normal/exponential-gamma, normal/gamma, generalized-t (double Pareto), normal/inverted-beta, normal/inverse-Gaussian, meridian filter. The theory of variance-mean mixtures allows MAP estimation for arbitrary combinations of model (left) with prior or penalty (right). With the conjugate-gradient version, there are no matrix inverses and no numerical derivatives.

52 Three papers: Sparse Bayes estimation in non-Gaussian models via data augmentation; Sparse multinomial logistic regression via nonconcave penalized likelihood; Exact MAP estimation in logistic-normal topic models.
