Regularization with variance-mean mixtures
1 Regularization with variance-mean mixtures. Nick Polson (University of Chicago) and James Scott (University of Texas at Austin). Workshop on Sensing and Analysis of High-Dimensional Data, Duke University, July 2011.
3 Models: robust regression, logit models, multinomial logit models, extreme-value models, support vector machines, topic models, restricted Boltzmann machines, neural networks, autologistic models, penalized additive models. Priors/penalties: Student t, lasso/ridge/bridge, MC+, group lasso, normal/exponential-gamma, normal/gamma, generalized double Pareto, normal/inverted-beta, normal/inverse-Gaussian, meridian filter. Our approach works for arbitrary combinations of likelihood with prior. No matrix inversion; no numerical derivatives. Fully parallelizable block updates.
6 But I can solve all of these with an $\ell_1$ constraint. True enough. But consider an example: $p = 25$; $n = 500$; 1000 simulated data sets; $y_i = 1_{\{z_i > 0\}}$ for $i = 1, \ldots, n$, with $z \sim N(X\beta, I)$ and $\beta = (5, 5, 5, 5, 5, 0, \ldots, 0)$, an r-spike signal with five nonzero entries. [Table of median and mean SSE for MLE, Lasso-UT, Lasso-CV, and HS; the numerical values did not survive transcription.]
10 There seems to be a precise separation between the computationally nice penalties/priors and the statistically nice ones, such as the Cauchy and the horseshoe (PS, 2011; Fan and Li, 2001; Pericchi and Smith, 1992; Masreliez, 1975; Brown, 1971). Our contribution: an algorithm for many priors in this second class, used in conjunction with many common non-Gaussian likelihoods.
12 A teaser example: logit with a bridge penalty.
$$\hat\beta = \arg\min_\beta \sum_{i=1}^n \log\left(1 + \exp\{-y_i x_i^T \beta\}\right) + \sum_{j=1}^p \left|\beta_j / (\tau s_j)\right|^\alpha$$
Ouch.
14 To find the MAP, just iterate three steps (don't actually do this):
$$\beta^{(g+1)} = \left(\tau^{-2} S^{-1} \hat\Lambda^{-1(g)} + X^T \hat\Omega^{-1(g)} X\right)^{-1} \tfrac{1}{2} X^T 1$$
$$\hat\omega_i^{-1(g+1)} = \frac{1}{z_i^{(g)}} \left( \frac{e^{z_i^{(g)}}}{1 + e^{z_i^{(g)}}} - \frac{1}{2} \right)$$
$$\hat\lambda_j^{-1(g+1)} = \alpha (\tau s_j)^{2-\alpha} \left|\beta_j^{(g)}\right|^{\alpha-2}$$
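The three-step iteration above can be sketched in code. This is a minimal illustration, not the authors' implementation: it specializes to the ridge case (Gaussian prior, so the lambda step is a constant) to avoid the safeguards the bridge penalty needs near $\beta_j = 0$, and all function and variable names are mine. The omega update is the logit moment from the slide.

```python
import numpy as np

def logit_map_em(X, y, tau=2.0, n_iter=500):
    """EM/MM iteration for logistic regression with a N(0, tau^2) prior on
    each coefficient (the ridge special case; the bridge version would
    only change the lambda step).  y must be coded -1/+1."""
    n, p = X.shape
    Xt = y[:, None] * X              # rows y_i x_i, so z_i = y_i x_i' beta
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = Xt @ beta
        sig = 1.0 / (1.0 + np.exp(-z))
        # omega_i^{-1} = (1/z_i)(e^{z_i}/(1+e^{z_i}) - 1/2), with limit 1/4 at z = 0
        zs = np.where(np.abs(z) < 1e-8, 1.0, z)
        w = np.where(np.abs(z) < 1e-8, 0.25, (sig - 0.5) / zs)
        A = np.eye(p) / tau**2 + Xt.T @ (w[:, None] * Xt)
        b = 0.5 * Xt.T @ np.ones(n)
        beta = np.linalg.solve(A, b)
    return beta
```

At a fixed point, $\tau^{-2}\beta = \sum_i y_i x_i\, \sigma(-z_i)$, which is exactly the stationarity condition for the penalized logistic log posterior.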
16 Example: covariate-dependent disease networks. ~112m patient records comprising ICD-9 codes + prescriptions + covariates. [Excerpt from the ICD-9 hierarchy: ischemic heart disease; 410, acute myocardial infarction: of anterolateral wall; of other anterior wall; of inferolateral wall; of inferoposterior wall; of other inferior wall; of other lateral wall; true posterior wall infarction; subendocardial infarction; of other specified sites; unspecified site. Cf. Goh et al., PNAS (2007).] Our approach: a tree of graphs. The tree splits on covariates, mainly demographics and geography. Each terminal node is a disease network for a subgroup of the population.
17 The big picture. We use normal variance-mean mixtures to represent a wide class of objective functions commonly encountered in high-dimensional problems. By modern Bayesian standards, these are pretty simple. But they are ubiquitous, useful, and often the only tractable approach in their respective domains for data sets beyond a certain size.
19 This class is surprisingly broad. For example:
$$p(\alpha_{1:K}, \beta_{1:D} \mid W, \mu, \Sigma) \propto \left[ \prod_{d=1}^D p(\beta_d \mid \mu, \Sigma) \prod_{n=1}^{N_d} \sum_{k=1}^K \frac{e^{\beta_{dk}}}{\sum_{l=1}^K e^{\beta_{dl}}}\, \alpha_{k, w_n} \right] p(\alpha_{1:K})$$
(e.g. Blei and Lafferty, 2009). We get an exact MAP estimate. No variational approximation.
20 Connection with previous work.
Penalties/priors corresponding to scale mixtures: lasso (Tibshirani, 1996; Park and Casella, 2008; Hans, 2009); bridge estimators (West, 1987; Huang et al., 2008); relevance vector machines (Tipping, 2001); normal/Jeffreys (Figueiredo, 2003; Bae and Mallick, 2004); normal/exponential-gamma (Griffin and Brown, 2005); normal/inverse-Gaussian (Caron and Doucet, 2008); normal/gamma (Griffin and Brown, 2010); horseshoe/inverted-beta prior (CPS, 2010; Polson and Scott, 2011); double Pareto (Armagan et al., 2010).
Algorithms for regularized regression: LARS (Efron et al., 2004); LQA (Fan and Li, 2001) and LLA (Zou and Li, 2008); EM/ECME (Dempster et al., 1977; Meng and Rubin, 1993; many others); MM (Hunter and Lange, 2000; Taddy, 2010); MCMC for support vector machines (Polson and Steve Scott, 2011); MCMC for logistic regression (Gramacy and Polson; Holmes and Held; Steve Scott; SFS; others).
Distributional theory based on variance-mean mixtures: Z distributions (Fisher, 1923); generalized inverse-Gaussian distributions (Barndorff-Nielsen, 1977); penalties, priors, and Lévy processes (Polson and Scott, 2011).
21 The standard scale-mixture trick:
$$p(z) = \int_0^\infty \phi(z \mid 0, v)\, p(v)\, dv$$
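As a numerical sanity check on this identity (my example, not from the slides): exponential mixing on the variance reproduces the Laplace density, the classical Andrews-Mallows representation behind the Bayesian lasso.

```python
import numpy as np

# Check that int_0^inf N(z | 0, v) * (1/2) e^{-v/2} dv = (1/2) e^{-|z|},
# i.e. exponential mixing on the latent variance gives a Laplace marginal.
def laplace_via_mixture(z, v_max=80.0, n=400_000):
    v = np.linspace(1e-8, v_max, n)                      # latent-variance grid
    integrand = (np.exp(-z**2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)
                 * 0.5 * np.exp(-0.5 * v))
    h = v[1] - v[0]                                      # trapezoid rule
    return (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * h
```

For instance, `laplace_via_mixture(1.0)` agrees with `0.5 * np.exp(-1.0)` to several decimal places.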
23
$$\hat\beta = \arg\min_\beta \|y - X\beta\|^2 + \nu \sum_{j=1}^p g(\beta_j)$$
The corresponding hierarchy:
$$(y \mid \beta) \sim N(X\beta, \sigma^2 I), \quad (\beta_j \mid \tau^2, \lambda_j^2) \sim N(0, \tau^2 \lambda_j^2), \quad \lambda_j^2 \sim \pi(\lambda_j^2), \quad (\tau^2, \sigma^2) \sim \pi(\tau^2, \sigma^2).$$
Examples of mixing distributions: exponential gives the lasso; inverted-beta gives the horseshoe; gamma gives normal/gamma (Andrews and Mallows; West). The conditional posterior is
$$(\beta \mid y, \tau^2, \Lambda, \sigma^2) \sim N(\hat\beta, \hat\Sigma_\beta), \qquad \hat\beta = (\tau^{-2}\Lambda^{-1} + X^T X)^{-1} X^T y.$$
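In code, the conditional posterior mean above is a single weighted-ridge solve. A small sketch under my own naming, with $\sigma^2 = 1$ and the local variances held fixed:

```python
import numpy as np

# beta_hat = (tau^{-2} Lambda^{-1} + X'X)^{-1} X'y : the conditional posterior
# mean given local variances lam2 = (lambda_1^2, ..., lambda_p^2); equivalently
# the minimizer of ||y - X beta||^2 + tau^{-2} sum_j beta_j^2 / lambda_j^2.
def conditional_posterior_mean(X, y, lam2, tau):
    prior_prec = np.diag(1.0 / (tau**2 * lam2))   # tau^{-2} Lambda^{-1}
    return np.linalg.solve(prior_prec + X.T @ X, X.T @ y)
```

The returned vector zeroes the gradient of the penalized least-squares objective, which is an easy way to verify the formula.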
25 The standard scale-mixture trick:
$$p(z) = \int_0^\infty \phi(z \mid 0, v)\, p(v)\, dv$$
Two ways of generalizing this: (1) the Lévy representation; (2) variance-mean mixtures.
27 1) The Lévy representation. Let $\psi(t)$, $t > 0$, be a nonnegative, real-valued, totally monotone function such that $\lim_{t \to 0} \psi(t) = 0$.
Part A: Suppose that these conditions are met for $t = f(\beta_j)$. Then the prior distribution $p(\beta_j \mid s) \propto \exp\{-s\psi[f(\beta_j)]\}$, where $s > 0$, is the moment-generating function of a subordinator $T(s)$, evaluated at $f(\beta_j)$, whose Lévy measure satisfies
$$\psi(t) = \int_0^\infty \{1 - \exp(-tx)\}\, \mu(dx). \tag{1}$$
Part B: Suppose that these conditions are met for $t = \beta_j^2/2$. Then $p(\beta_j \mid s) \propto \exp\{-s\psi(\beta_j^2/2)\}$, where $s > 0$, is a mixture of normals given by
$$p(\beta_j \mid s) \propto \int_0^\infty N(\beta_j \mid 0, T^{-1})\, T^{-1/2}\, p(T)\, dT,$$
where $p(T)$ is the density of the subordinator $T$, observed at time $s$, whose Lévy measure $\mu(dx)$ satisfies (1). (Super useful for block-wise penalties, e.g. the group lasso.)
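A worked special case of Part B (my example; the lasso is among the penalties the slides list): take $\psi(t) = \sqrt{2t}$, the Laplace exponent of the stable(1/2) subordinator.

```latex
\psi(t) = \sqrt{2t} \text{ is totally monotone with } \psi(t) \to 0
\text{ as } t \to 0, \text{ and satisfies (1) with}
\qquad
\sqrt{2t} \;=\; \int_0^\infty \{1 - e^{-tx}\}\, \frac{x^{-3/2}}{\sqrt{2\pi}}\, dx.

\text{Evaluating at } t = \beta_j^2/2:\qquad
\exp\{-s\,\psi(\beta_j^2/2)\} \;=\; e^{-s|\beta_j|},
```

so the lasso (Laplace) prior arises from the stable(1/2) subordinator, and Part B exhibits it as a normal scale mixture over that subordinator's marginal at time $s$, consistent with the classical exponential-mixing representation.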
28 2) Mean-variance mixtures. The scale mixture
$$p(z) = \int_0^\infty \phi(z \mid 0, v)\, p(v)\, dv$$
generalizes to
$$p(z) = \int_0^\infty \phi(z \mid \mu + \kappa v,\, v)\, p(v)\, dv$$
(like Brownian motion with a drift; useful for non-Gaussian likelihoods).
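A numerical check of what the drift term buys (my worked example, using a standard identity): exponential mixing with drift $\kappa$ gives an asymmetric Laplace marginal, the kind of mixture that underlies check-loss likelihoods.

```python
import numpy as np

# Check that int_0^inf N(z | kappa*v, v) e^{-v} dv
#   = (kappa^2 + 2)^{-1/2} exp{ kappa*z - sqrt(kappa^2 + 2) |z| }.
def vm_mixture(z, kappa, v_max=80.0, n=400_000):
    v = np.linspace(1e-8, v_max, n)
    dens = np.exp(-(z - kappa * v)**2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)
    integrand = dens * np.exp(-v)
    h = v[1] - v[0]                                      # trapezoid rule
    return (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * h

def asym_laplace(z, kappa):
    c = np.sqrt(kappa**2 + 2.0)
    return np.exp(kappa * z - c * abs(z)) / c
```

With $\kappa = 0$ this collapses back to a symmetric Laplace, recovering the plain scale-mixture case.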
29 The generic regularization problem:
$$Q(\beta) = \sum_{i=1}^n f(y_i, x_i^T \beta) + \sum_{j=1}^p g\!\left(\frac{\beta_j}{\tau s_j}\right)$$
We consider properties of the posterior distribution
$$p(\beta \mid \tau, y) \propto e^{-Q(\beta)} = \prod_{i=1}^n p(z_i \mid \beta) \prod_{j=1}^p p(\beta_j \mid \tau),$$
with $p(z_i \mid \beta) \propto \exp\{-f(y_i, x_i^T\beta)\}$ and $p(\beta_j \mid \tau) \propto \exp\{-g(\beta_j/\tau s_j)\}$, where $z_i = y_i - x_i^T\beta$ for regression, or $z_i = y_i x_i^T\beta$ for classification.
30 The generic regularization problem. Suppose that both the likelihood and prior/penalty can be represented as normal variance-mean mixtures:
$$p(z_i \mid \beta) = \int_0^\infty \phi(z_i \mid \mu_z + \kappa_z \omega_i,\, \sigma^2 \omega_i)\, dP(\omega_i)$$
$$p(\beta_j \mid \tau) = \int_0^\infty \phi(\beta_j \mid \mu_\beta + \kappa_\beta \lambda_j,\, \tau^2 s_j^2 \lambda_j)\, dP(\lambda_j).$$
31 Then we have an easy EM.
E step: compute the expected value of the log posterior, given the current parameter estimate:
$$Q(\beta \mid \beta^{(g)}) = \int \log p(\beta \mid \omega, \lambda, \tau, y)\, p(\omega, \lambda \mid \beta^{(g)}, \tau, y)\, d\omega\, d\lambda.$$
M step: maximize the complete-data posterior to update the parameter estimate:
$$\beta^{(g+1)} = \arg\max_\beta Q(\beta \mid \beta^{(g)}).$$
32 The E step. The observed-data posterior is
$$p(\beta \mid \tau, y) = \int \pi(\beta \mid \omega, \lambda, y)\, p(\omega, \lambda \mid y, \tau)\, d\omega\, d\lambda.$$
Exploiting the variance-mean mixture representation, the complete-data log posterior is
$$\log p(\beta \mid \omega, \lambda, \tau, y) = c_0(\omega, \lambda, y, \tau) - \frac{1}{2\sigma^2}\sum_{i=1}^n \omega_i^{-1}(z_i - \mu_z - \kappa_z \omega_i)^2 - \sum_{j=1}^p \frac{1}{2 s_j^2 \tau^2}\, \lambda_j^{-1}(\beta_j - \mu_\beta - \kappa_\beta \lambda_j)^2.$$
This can be shown to depend linearly upon the conditional moments $\{\hat\omega_i^{-1}\}$ and $\{\hat\lambda_j^{-1}\}$. Therefore the E step simply plugs these conditional expected values into the complete-data log posterior.
33 The E step. Theorem: these conditional moments are
$$(z_i - \mu_z)\, \hat\omega_i^{-1(g)} = \kappa_z + \sigma^2 f'(z_i \mid \beta), \qquad (\beta_j - \mu_\beta)\, \hat\lambda_j^{-1(g)} = \kappa_\beta + \tau^2 s_j^2\, g'(\beta_j \mid \tau).$$
Key fact: we don't need the conditional posterior for the latent variances. We only need the functional form of the likelihood ($f$) and prior ($g$), which are pre-specified.
34 Tilted, iteratively re-weighted least squares.
E step: given a current estimate $\beta = \beta^{(g)}$, compute the conditional moments of the latent variances as
$$(\beta_j^{(g)} - \mu_\beta)\, \hat\lambda_j^{-1(g)} = \kappa_\beta + \tau^2 s_j^2\, g'(\beta_j^{(g)} \mid \tau), \qquad (z_i^{(g)} - \mu_z)\, \hat\omega_i^{-1(g)} = \kappa_z + \sigma^2 f'(z_i^{(g)} \mid \beta).$$
M step: for regression, compute
$$\beta^{(g+1)} = \left(\tau^{-2} S^{-1} \hat\Lambda^{-1(g)} + X^T \hat\Omega^{-1(g)} X\right)^{-1} X^T \left\{\hat\Omega^{-1(g)}(y - \mu_z 1) - \kappa_z 1\right\}.$$
For classification, compute
$$\beta^{(g+1)} = \left(\tau^{-2} S^{-1} \hat\Lambda^{-1(g)} + X^T \hat\Omega^{-1(g)} X\right)^{-1} X^T \left(\hat\Omega^{-1(g)} \mu_z 1 + \kappa_z 1\right).$$
35 This works for a broad class of models.

Likelihood:            f(z_i | beta);                              kappa_z;  mu_z;  p(omega_i)
Squared-error:         z_i^2;                                      0;        0;     point mass at 1
Absolute-error:        |z_i|;                                      0;        0;     exponential
Check loss:            |z_i| + (2q - 1) z_i;                       1 - 2q;   0;     GIG
SVM:                   max(1 - z_i, 0);                            -1;       1;     GIG
Logistic:              log(1 + e^{-z_i});                          1/2;      0;     Polya

Penalty function:      g(beta_j | tau);                            kappa_b;  mu_b;  p(lambda_j)
Ridge:                 (beta_j / tau)^2;                           0;        0;     point mass at 1
Lasso:                 |beta_j| / tau;                             0;        0;     exponential
Bridge:                |beta_j / tau|^alpha;                       0;        0;     stable
Gen. double Pareto:    {(1 + alpha)/tau} log(1 + |beta_j|/(alpha tau)); 0;   0;     exponential-gamma

Plus multinomial logit, autologistic, topic models, RBMs, extreme-value, ...
36 Two key identities:
$$\frac{\alpha^2 - \kappa^2}{2\alpha}\, e^{-\alpha|\theta-\mu| + \kappa(\theta-\mu)} = \int_0^\infty \phi(\theta \mid \mu + \kappa v,\, v)\, p_{\mathrm{GIG}}\!\left(v \mid 1, 0, \alpha^2 - \kappa^2\right) dv$$
$$\frac{1}{B(\alpha, \kappa)}\, \frac{e^{\alpha(\theta-\mu)}}{(1 + e^{\theta-\mu})^{2(\alpha-\kappa)}} = \int_0^\infty \phi(\theta \mid \mu + \kappa v,\, v)\, p_{\mathrm{pol}}(v \mid \alpha, \alpha - 2\kappa)\, dv.$$
Improper versions (treat purely as integral identities):
$$a^{-1} \exp\left\{-2c^{-1}\max(au, 0)\right\} = \int_0^\infty \phi(u \mid -av,\, cv)\, dv$$
$$\exp\left\{-2c^{-1}\rho_q(u)\right\} = \int_0^\infty \phi(u \mid (1-2q)v,\, cv)\, e^{-2q(1-q)v/c}\, dv$$
$$(1 + \exp\{u - \mu\})^{-1} = \int_0^\infty \phi(u \mid \mu - v/2,\, v)\, p_{\mathrm{pol}}(v \mid 0, 1)\, dv,$$
where $\rho_q(u) = \frac{1}{2}|u| + \left(q - \frac{1}{2}\right)u$ is the check-loss function.
37 Back to the teaser example: logit models. The Polya mixing distribution is an infinite convolution of exponentials, with density an alternating sum
$$p_{\mathrm{pol}}(v \mid \alpha, \alpha - 2\kappa) = \sum_{n=0}^\infty w_n\, e^{-a_n v}, \qquad w_n = \prod_{j \neq n} \frac{a_j}{a_j - a_n},$$
where the rates $a_n$ are determined by $b = \frac{1}{2}(\alpha - \kappa)$ and $\delta = \frac{1}{2}(\alpha + \kappa)$; the terms in this sum involve $B(\delta + b, \delta - b)$ and the coefficients $(-1)^n (2\delta)(2\delta+1)\cdots(2\delta+n-1)/n!$.
38 Back to the teaser example: logit models. A Polya mixing distribution leads to a Z-distribution marginal:
$$p_Z(\theta \mid \mu, \alpha, \kappa) = \frac{1}{B(\alpha, \kappa)}\, \frac{(e^{\theta-\mu})^\alpha}{(1 + e^{\theta-\mu})^{2(\alpha-\kappa)}}.$$
This family includes the logistic regression model as a limiting, improper case: $(\alpha, \kappa, \mu) = (1, 1/2, 0)$. Our representation theorem gives the relevant conditional moment as
$$\hat\omega_i^{-1} = \frac{1}{z_i}\left(\frac{e^{z_i}}{1 + e^{z_i}} - \frac{1}{2}\right), \qquad z_i = y_i x_i^T \beta.$$
39 Multinomial logit. The probability that observation $i$ falls into class $k$ is
$$\theta_{ki} = P(y_i = k) = \frac{\exp(x_i^T \beta_k)}{\sum_{l=1}^K \exp(x_i^T \beta_l)}.$$
To represent the multinomial logit model in our framework, let
$$\eta_{ki} = \frac{\exp\{x_i^T \beta_k - c_{ki}\}}{1 + \exp\{x_i^T \beta_k - c_{ki}\}}, \qquad c_{ki}(\beta_{(-k)}) = \log \sum_{l \neq k} \exp\{x_i^T \beta_l\}$$
(Holmes and Held, 2006).
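The reduction can be verified numerically. This snippet (names mine) checks that the binary logit probability $\eta_{ki}$ with offset $c_{ki}$ reproduces the softmax probability exactly:

```python
import numpy as np

# Holding the other coefficient blocks fixed, the multinomial probability
# for class k is a binary logit in x'beta_k with offset
# c_k = log sum_{l != k} exp(x'beta_l).
rng = np.random.default_rng(1)
K, p = 4, 3
B = rng.normal(size=(K, p))                    # rows beta_l
x = rng.normal(size=p)
eta = B @ x
theta = np.exp(eta) / np.exp(eta).sum()        # softmax probabilities
k = 2
c_k = np.log(np.sum(np.exp(np.delete(eta, k))))
eta_k = 1.0 / (1.0 + np.exp(-(eta[k] - c_k)))  # logistic(x'beta_k - c_k)
```

Here `eta_k` and `theta[k]` agree to machine precision, which is the algebraic identity the conditional likelihood on the next slide exploits.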
40 Multinomial logit. Write the conditional likelihood for category $k$ as
$$L(\beta_k \mid \beta_{(-k)}, y) = \prod_{i=1}^n \prod_{l=1}^K \theta_{li}^{\tilde y_{li}} \propto \prod_{i=1}^n \eta_{ki}^{\tilde y_{ki}} (1 - \eta_{ki})^{1 - \tilde y_{ki}} = \prod_{i=1}^n \frac{\exp(\gamma_{ki} x_i^T \beta_k - \gamma_{ki} c_{ki})}{1 + \exp(\gamma_{ki} x_i^T \beta_k - \gamma_{ki} c_{ki})}$$
$$\propto \prod_{i=1}^n \int_0^\infty \phi(z_{ki} \mid \mu_{ki} + \kappa \xi_{ki},\, \xi_{ki})\, p_{\mathrm{PY}}(\xi_{ki} \mid 1, 0)\, d\xi_{ki},$$
where $\kappa = 1/2$, $z_{ki} = \gamma_{ki} x_i^T \beta_k$, $\mu_{ki} = \gamma_{ki} c_{ki}$, and $p_{\mathrm{PY}}(\xi_{ki} \mid 1, 0)$ is a function of $\xi_{ki}$ that is the limit of a Polya density as $b \to 0$.
41 An exact ECM. For iteration $t = 1, 2, \ldots$, for $k = 2, \ldots, K$, cycle through the following steps. Update $\Omega_k$:
$$z_{ki}^{(t)} := \gamma_{ki}\, x_i^T \beta_k^{(t)}, \qquad \mu_{ki}^{(t)} := \gamma_{ki} \log \sum_{l \neq k} \exp(x_i^T \beta_l^{(t)}),$$
$$\omega_{ki}^{(t)} := \frac{1}{z_{ki}^{(t)} - \mu_{ki}^{(t)}} \left( \frac{\exp\{z_{ki}^{(t)} - \mu_{ki}^{(t)}\}}{1 + \exp\{z_{ki}^{(t)} - \mu_{ki}^{(t)}\}} - \frac{1}{2} \right), \qquad \Omega_k^{(t)} := \mathrm{diag}(\omega_{k1}^{(t)}, \ldots, \omega_{kn}^{(t)}),$$
where $\beta_k^{(t)}$ is the current estimate for the $k$th block of coefficients, and $\gamma_{ki} = \pm 1$ is an indicator of whether $y_i = k$.
42 An exact ECM. For iteration $t = 1, 2, \ldots$, for $k = 2, \ldots, K$, cycle through the following steps. Update $\Lambda_k$:
$$\lambda_{kj}^{(t)} := \frac{\tau^2\, \psi'(\beta_{kj}^{(t)}/\tau)}{\beta_{kj}^{(t)}}, \qquad \Lambda_k^{(t)} := \mathrm{diag}(\lambda_{k1}^{(t)}, \ldots, \lambda_{kp}^{(t)}),$$
where $\psi'$ is the derivative of the penalty function $\psi(\beta_{kj})$.
44 An exact ECM. For iteration $t = 1, 2, \ldots$, for $k = 2, \ldots, K$, cycle through the following steps. Update $\beta_k$: solve the linear system $A_k^{(t)} \beta_k^{(t+1)} = b_k^{(t)}$ for $\beta_k^{(t+1)}$, with
$$A_k^{(t)} := \tau^{-2} \Lambda_k^{(t)} + \tilde X_k^T \Omega_k^{(t)} \tilde X_k, \qquad b_k^{(t)} := \tilde X_k^T \left(\Omega_k^{(t)} \mu_k^{(t)} + \tfrac{1}{2} 1\right),$$
where $\tilde X_k$ is the $n \times p$ matrix having rows $\tilde x_i = \gamma_{ki} x_i$, $1$ is a column vector of ones, and $\mu_k^{(t)} = (\mu_{k1}^{(t)}, \ldots, \mu_{kn}^{(t)})^T$. Don't solve this exactly.
46 Tilted, iteratively re-weighted conjugate gradient. In the update step for $\beta$, don't solve the system exactly. Instead, while $\|\Delta^{(t,l)}\| > \delta_{\min}$, increment $l$ and set
$$\beta^{(t,l)} := \beta^{(t,l-1)} + \Delta^{(t,l-1)}$$
$$r^{(t,l)} := r^{(t,l-1)} - \alpha^{(t,l-1)} c^{(t,l-1)}$$
$$\gamma^{(t,l)} := \frac{r^{(t,l)T} r^{(t,l)}}{r^{(t,l-1)T} r^{(t,l-1)}}$$
$$d^{(t,l)} := r^{(t,l)} + \gamma^{(t,l)} d^{(t,l-1)}$$
$$c^{(t,l)} := A_k^{(t)} d^{(t,l)}$$
$$\alpha^{(t,l)} := \frac{r^{(t,l)T} r^{(t,l)}}{d^{(t,l)T} c^{(t,l)}}$$
$$\Delta^{(t,l)} := \alpha^{(t,l)} d^{(t,l)}.$$
Parallelize this.
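A plain conjugate-gradient solver implementing a loop of this kind (a sketch with my own names; the only expensive step is the matrix-vector product, which is the natural target for parallelization):

```python
import numpy as np

def cg_solve(A, b, beta0=None, tol=1e-10, max_iter=None):
    """Solve A beta = b for symmetric positive-definite A by conjugate
    gradient, never forming A^{-1}."""
    p = b.shape[0]
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    r = b - A @ beta                 # residual
    d = r.copy()                     # search direction
    max_iter = max_iter or 2 * p
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        c = A @ d                    # the expensive (parallelizable) step
        alpha = (r @ r) / (d @ c)
        beta += alpha * d
        r_new = r - alpha * c
        gamma = (r_new @ r_new) / (r @ r)
        d = r_new + gamma * d
        r = r_new
    return beta
```

In the ECM, one would warm-start `beta0` at the previous iterate and stop early, since an inexact inner solve suffices.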
47 Other methods. You can solve many of these problems using tailored methods, but not all of them, and rarely this simply or efficiently.
- LARS: efficient only in Gaussian models.
- Coordinate descent: can get stuck; may require numerical derivatives; irredeemably serial.
- LLA/LQA: exact only in Gaussian models (where it's a special case of our method).
- Variational methods: never exact; sometimes inconsistent; poor in cases of interesting a posteriori dependence.
48 How would you do MCMC in this class? Some people have worked out special cases. Here's an interesting fact:
$$p_{\mathrm{pol}(\frac{1}{2}, \frac{1}{2})}(\lambda) = \sum_{n=0}^\infty (-1)^n \left(n + \tfrac{1}{2}\right) e^{-\frac{1}{2}(n + \frac{1}{2})^2 \lambda}, \qquad C \overset{D}{=} \frac{\lambda}{4\pi^2},$$
$$p(C) = 4\pi^2\, p_{\mathrm{pol}(\frac{1}{2}, \frac{1}{2})}(4\pi^2 C) = \sum_{n=0}^\infty (-1)^n\, 2\pi^2 (2n+1)\, e^{-2(n + \frac{1}{2})^2 \pi^2 C}.$$
49 How would you do MCMC in this class? There is a duality between the scale-mixture and MGF representations for $C$:
$$E\left(e^{-\frac{x^2}{2} C}\right) = E\left\{(2\pi C)^{-1/2} e^{-\frac{x^2}{8\pi^2 C}}\right\} = \frac{1}{\cosh(x/2)}.$$
A logit-type likelihood would thus look like
$$\frac{2^{a+b}\, e^{ax}}{(1 + e^x)^{a+b}} = e^{\frac{1}{2}(a-b)x} \left(\frac{1}{\cosh(x/2)}\right)^{a+b} = e^{\frac{1}{2}(a-b)x}\, E\left(e^{-\frac{x^2}{2} C}\right)^{a+b}.$$
50 How would you do MCMC in this class? Therefore we can simulate from $C$ using Polya-Gamma distributions:
$$(C \mid x) \overset{D}{=} \sum_{n=1}^\infty \frac{g_n}{2\pi^2 \left(n - \frac{1}{2}\right)^2 \left\{1 + x^2 / \left(4\pi^2 (n - \frac{1}{2})^2\right)\right\}}, \qquad g_n \sim \mathrm{Ga}(a + b,\, 1).$$
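A truncated sampler based on a series of this Polya-Gamma form (a sketch: the truncation level and all names are mine, and exact samplers avoid truncating the infinite sum):

```python
import numpy as np

# Approximate draw from the Polya-Gamma-type series
#   omega = (1/(2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
# with g_k ~ Gamma(b, 1), truncated at n_terms.
def pg_approx(b, c, n_terms=200, rng=None, size=1):
    rng = rng or np.random.default_rng()
    k = np.arange(1, n_terms + 1)
    denom = (k - 0.5)**2 + c**2 / (4.0 * np.pi**2)
    g = rng.gamma(shape=b, scale=1.0, size=(size, n_terms))
    return (g / denom).sum(axis=1) / (2.0 * np.pi**2)
```

For $b = 1$, $c = 0$ the mean of the series is $(1/(2\pi^2)) \sum_k (k - 1/2)^{-2} = 1/4$, which gives a quick Monte Carlo check on the truncation.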
51 Models: robust regression, logit models, multinomial logit models, extreme-value models, support vector machines, topic models, restricted Boltzmann machines, neural networks, autologistic models, penalized additive models. Priors/penalties: Student t, ridge, lasso, bridge, normal/exponential-gamma, normal/gamma, generalized t (double Pareto), normal/inverted-beta, normal/inverse-Gaussian, meridian filter. The theory of variance-mean mixtures allows MAP estimation for arbitrary combinations of model (left) with prior or penalty (right). With the conjugate-gradient version, there are no matrix inverses and no numerical derivatives.
52 Three papers: "Sparse Bayes estimation in non-Gaussian models via data augmentation"; "Sparse multinomial logistic regression via nonconcave penalized likelihood"; "Exact MAP estimation in logistic-normal topic models."
he Polya-Gamma Gibbs Sampler for Bayesian Logistic Regression is Uniformly Ergodic Hee Min Choi and James P. Hobert Department of Statistics University of Florida August 013 Abstract One of the most widely
More informationProbabilistic machine learning group, Aalto University Bayesian theory and methods, approximative integration, model
Aki Vehtari, Aalto University, Finland Probabilistic machine learning group, Aalto University http://research.cs.aalto.fi/pml/ Bayesian theory and methods, approximative integration, model assessment and
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationLecture 1b: Linear Models for Regression
Lecture 1b: Linear Models for Regression Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced
More informationLogistic Regression. Seungjin Choi
Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationPaper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)
Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation
More informationRegularized Regression A Bayesian point of view
Regularized Regression A Bayesian point of view Vincent MICHEL Director : Gilles Celeux Supervisor : Bertrand Thirion Parietal Team, INRIA Saclay Ile-de-France LRI, Université Paris Sud CEA, DSV, I2BM,
More informationSelection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty
Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the
More informationLecture 6. Regression
Lecture 6. Regression Prof. Alan Yuille Summer 2014 Outline 1. Introduction to Regression 2. Binary Regression 3. Linear Regression; Polynomial Regression 4. Non-linear Regression; Multilayer Perceptron
More informationStatistical Inference
Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park
More informationRegularization Paths
December 2005 Trevor Hastie, Stanford Statistics 1 Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Saharon Rosset, Ji Zhu, Hui Zhou, Rob Tibshirani and
More informationA New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables
A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables Qi Tang (Joint work with Kam-Wah Tsui and Sijian Wang) Department of Statistics University of Wisconsin-Madison Feb. 8,
More informationLecture 2: Priors and Conjugacy
Lecture 2: Priors and Conjugacy Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 6, 2014 Some nice courses Fred A. Hamprecht (Heidelberg U.) https://www.youtube.com/watch?v=j66rrnzzkow Michael I.
More informationGaussian processes and bayesian optimization Stanisław Jastrzębski. kudkudak.github.io kudkudak
Gaussian processes and bayesian optimization Stanisław Jastrzębski kudkudak.github.io kudkudak Plan Goal: talk about modern hyperparameter optimization algorithms Bayes reminder: equivalent linear regression
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationIntroduction to Machine Learning
Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationFoundations of Nonparametric Bayesian Methods
1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models
More informationScale Mixture Modeling of Priors for Sparse Signal Recovery
Scale Mixture Modeling of Priors for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Jason Palmer, Zhilin Zhang and Ritwik Giri Outline Outline Sparse
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationRegularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics
Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,
More informationGeneralized Linear Models and Its Asymptotic Properties
for High Dimensional Generalized Linear Models and Its Asymptotic Properties April 21, 2012 for High Dimensional Generalized L Abstract Literature Review In this talk, we present a new prior setting for
More informationOverall Objective Priors
Overall Objective Priors Jim Berger, Jose Bernardo and Dongchu Sun Duke University, University of Valencia and University of Missouri Recent advances in statistical inference: theory and case studies University
More informationMarkov Chain Monte Carlo Methods for Stochastic
Markov Chain Monte Carlo Methods for Stochastic Optimization i John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U Florida, Nov 2013
More informationStable Limit Laws for Marginal Probabilities from MCMC Streams: Acceleration of Convergence
Stable Limit Laws for Marginal Probabilities from MCMC Streams: Acceleration of Convergence Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham NC 778-5 - Revised April,
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationChapter 17: Undirected Graphical Models
Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationAspects of Feature Selection in Mass Spec Proteomic Functional Data
Aspects of Feature Selection in Mass Spec Proteomic Functional Data Phil Brown University of Kent Newton Institute 11th December 2006 Based on work with Jeff Morris, and others at MD Anderson, Houston
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationPh.D. Qualifying Exam Friday Saturday, January 6 7, 2017
Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationCSE446: Clustering and EM Spring 2017
CSE446: Clustering and EM Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin, Dan Klein, and Luke Zettlemoyer Clustering systems: Unsupervised learning Clustering Detect patterns in unlabeled
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationLearning discrete graphical models via generalized inverse covariance matrices
Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More informationMachine Learning: Logistic Regression. Lecture 04
Machine Learning: Logistic Regression Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Supervised Learning Task = learn an (unkon function t : X T that maps input
More informationEM Algorithm II. September 11, 2018
EM Algorithm II September 11, 2018 Review EM 1/27 (Y obs, Y mis ) f (y obs, y mis θ), we observe Y obs but not Y mis Complete-data log likelihood: l C (θ Y obs, Y mis ) = log { f (Y obs, Y mis θ) Observed-data
More informationDeep Variational Inference. FLARE Reading Group Presentation Wesley Tansey 9/28/2016
Deep Variational Inference FLARE Reading Group Presentation Wesley Tansey 9/28/2016 What is Variational Inference? What is Variational Inference? Want to estimate some distribution, p*(x) p*(x) What is
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2
Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Expectation Maximization Mark Schmidt University of British Columbia Winter 2018 Last Time: Learning with MAR Values We discussed learning with missing at random values in data:
More informationBayesian Inference. Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More information