An Extended BIC for Model Selection


1 An Extended BIC for Model Selection
Presented at the JSM meeting, Salt Lake City
Surajit Ray, Boston University (Dept. of Mathematics and Statistics)
Joint work with James Berger, Duke University; Susie Bayarri, University of Valencia; Woncheol Jang, University of Georgia; Luis R. Pericchi, University of Puerto Rico

2 Motivation and Outline
Outline: Bayesian Information Criterion (BIC); First ambiguity: what is n?; Second ambiguity: can n be different for different parameters?; Third ambiguity: what is the number of parameters?; Example: random effects group means; Example: common mean, differing variances; Problems with n and p.
Motivation:
- Initiated in a SAMSI Social Sciences working group on getting the model right in structural equation modeling; introduced in the Wald Lecture by Jim Berger.
- Specific characteristics: independent cases (individuals); often multilevel or random effects models (p grows as n grows); regularity conditions may not be satisfied; problems with BIC.
- Can a generalization of BIC work? Goal: use only standard software output and closed-form expressions.
- Present a generalization of BIC (the extended BIC, or GBIC) and some examples in a linear regression scenario.
- Work in progress (comments are welcome!)

3 Bayesian Information Criterion: BIC
l(θ̂ | x) is the log-likelihood of θ̂ given x. Then
BIC = 2 l(θ̂ | x) − p log(n),
where p is the number of estimated parameters and n is the cardinality of x.
Schwarz's result: as n → ∞ (with p fixed), this is an approximation (up to a constant) to twice the log marginal likelihood of the model, m(x) = ∫ f(x | θ) π(θ) dθ, so that m(x) = c_π e^{BIC/2} (1 + o(1)).
BIC approximations to Bayes factors further ignore the constants c_π.
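As a point of reference (not from the slides), here is a minimal Python sketch of the BIC formula above applied to two nested Gaussian models; the data and the helper name are illustrative only.

```python
import numpy as np

def bic(loglik_hat, p, n):
    """Standard BIC as on the slide: 2*l(theta_hat | x) - p*log(n)."""
    return 2.0 * loglik_hat - p * np.log(n)

# Toy comparison of two nested Gaussian models on simulated data
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=50)

# Model 1: N(0, sigma^2), only sigma^2 estimated (p = 1)
s2_0 = np.mean(x**2)
ll_0 = -0.5 * len(x) * (np.log(2 * np.pi * s2_0) + 1)

# Model 2: N(mu, sigma^2), mu and sigma^2 estimated (p = 2)
s2_1 = np.var(x)
ll_1 = -0.5 * len(x) * (np.log(2 * np.pi * s2_1) + 1)

print(bic(ll_0, 1, len(x)), bic(ll_1, 2, len(x)))  # with this sign convention, larger is better
```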

4 First Ambiguity: What is n?
Consider the time series y_{it} = β y_{i,t−1} + ε_{it}, i = 1, …, n, t = 1, …, T, ε_{it} ~ N(0, σ²).
When the goal is to estimate the mean level ȳ, consider two extreme cases:
1. β = 0: the y_{it} are i.i.d., and the estimated parameter ȳ has sample size nT.
2. β = 1: the effective sample size for ȳ is clearly only n, the number of series.
See Thiebaux & Zwiers (1984) for discussion of effective sample size in time series.

5 Second Ambiguity: Can n be different for different parameters?
Consider data from r groups: y_{ip} = μ_p + ε_{ip}, i = 1, …, n, p = 1, …, r, with ε_{ip} ~ N(0, σ²) and μ_p and σ² both unknown.
Group 1: y_{11}, y_{21}, …, y_{n1}
Group 2: y_{12}, y_{22}, …, y_{n2}
…
Group p: y_{1p}, y_{2p}, …, y_{np}
…
Group r: y_{1r}, y_{2r}, …, y_{nr}

6 Second Ambiguity: Can n be different for different parameters? (continued)
Same model: y_{ip} = μ_p + ε_{ip}, i = 1, …, n, p = 1, …, r, ε_{ip} ~ N(0, σ²), μ_p and σ² both unknown.
Different parameters can have different sample sizes:
- the sample size for estimating each μ_p is n;
- the sample size for estimating σ² is n·r.

7 Second Ambiguity: Can n be different for different parameters? (continued)
Computation for this example yields the observed information matrix
Î = diag( (n/σ̂²) I_r , n r/(2σ̂⁴) ),
where σ̂² = (1/(n r)) ∑_{p=1}^{r} ∑_{i=1}^{n} (y_{ip} − ȳ_p)².
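A quick numerical check of this block-diagonal structure (not from the slides; the simulation settings are arbitrary): a finite-difference Hessian of the negative log-likelihood at the MLE is compared with the two analytic blocks.

```python
import numpy as np

# Group-means model: r groups, n observations per group, common variance.
rng = np.random.default_rng(1)
n, r = 8, 3
y = rng.normal(loc=[0.5, -1.0, 2.0], scale=1.5, size=(n, r))  # y[i, p]

mu_hat = y.mean(axis=0)               # MLEs of the group means
s2_hat = np.mean((y - mu_hat) ** 2)   # MLE of sigma^2 (divides by n*r)
theta_hat = np.append(mu_hat, s2_hat)

def negloglik(theta):
    mu, s2 = theta[:r], theta[r]
    return 0.5 * n * r * np.log(2 * np.pi * s2) + np.sum((y - mu) ** 2) / (2 * s2)

# Central finite-difference Hessian at the MLE
k, eps = len(theta_hat), 1e-4
H = np.zeros((k, k))
for a in range(k):
    for b in range(k):
        ea, eb = np.eye(k)[a] * eps, np.eye(k)[b] * eps
        H[a, b] = (negloglik(theta_hat + ea + eb) - negloglik(theta_hat + ea - eb)
                   - negloglik(theta_hat - ea + eb) + negloglik(theta_hat - ea - eb)) / (4 * eps**2)

print(np.round(H, 2))                         # approximately block diagonal
print(n / s2_hat, n * r / (2 * s2_hat**2))    # the analytic blocks (n/s2) I_r and nr/(2 s2^2)
```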

8 Third Ambiguity: What is the number of parameters?
Example: random effects group means. In the previous group-means example, with p groups and r observations X_{il} per group, X_{il} ~ N(μ_i, σ²) for l = 1, …, r, consider the multilevel version in which
μ_i ~ N(ξ, τ²),
with ξ and τ² unknown.
What is the number of parameters? (See also Pauler, 1998.)
(1) If τ² = 0, there are only two parameters: ξ and σ².
(2) If τ² is huge, is the number of parameters p + 3 (the p means along with ξ, τ², and σ²)?
Continued...

9 Example: Random effects group means (continued)
(3) But if one integrates out μ = (μ_1, …, μ_p), then
f(x | σ², ξ, τ²) = ∫ f(x | μ, σ²) π(μ | ξ, τ²) dμ ∝ (1/σ^{p(r−1)}) exp(−p r σ̂²/(2σ²)) ∏_{i=1}^{p} (σ²/r + τ²)^{−1/2} exp(−(x̄_i − ξ)²/(2(σ²/r + τ²))),
so p = 3 if one can work directly with f(x | σ², ξ, τ²).
Note: this seems to be common practice in the social sciences: latent variables are integrated out, so the remaining parameters are the only unknowns when applying BIC.
Note: in this example the effective sample sizes should be p·r for σ², p for ξ and τ², and r for the μ_i's.
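To make the "work directly with f(x | σ², ξ, τ²)" step concrete, here is a short Python sketch (mine, not the authors') of the integrated log-likelihood: the latent μ_i are marginalized exactly, leaving a within-group term and independent normal terms for the group means x̄_i.

```python
import numpy as np
from scipy.stats import norm

def integrated_loglik(x, sigma2, xi, tau2):
    """x has shape (p, r): p groups, r observations per group."""
    p, r = x.shape
    xbar = x.mean(axis=1)
    within = np.sum((x - xbar[:, None]) ** 2)
    # within-group part, involving only sigma^2
    ll = -0.5 * p * (r - 1) * np.log(2 * np.pi * sigma2) - within / (2 * sigma2)
    # between-group part: xbar_i ~ N(xi, sigma^2/r + tau^2), independently over groups
    ll += norm.logpdf(xbar, loc=xi, scale=np.sqrt(sigma2 / r + tau2)).sum()
    return ll
```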

10 Example: Common mean, differing variances (Cox mixtures)
Suppose each independent observation X_i, i = 1, …, n, has probability 1/2 of arising from N(θ, 1) and probability 1/2 of arising from N(θ, 1000).
Clearly, half of the observations are essentially worthless, so the effective sample size is roughly n/2.

11 Problems with n and p
Problems with n:
- Is n the number of vector observations or the number of real observations?
- Different θ_i can have different effective sample sizes.
- Some observations can be more informative than others, as in mixture models and models with mixed continuous and discrete observations.
Problems with p:
- What is p with random effects, latent variables, mixture models, etc.?
- Often p grows with n.

12 GBIC: a proposed solution
Based on a modified Laplace approximation to m(x) for good priors:
- The good priors can deal with growing p.
- The Laplace approximation retains the constant c_π in the expansion, and is often good even for moderate n.
- It implicitly uses effective sample size (discussion later).
- It is computable with standard software output, using MLEs and observed information matrices.

15 Laplace approximation
1. Preliminary: a nice reparameterization.
2. Taylor expansion. By a Taylor series expansion of e^{l(θ)} about the MLE θ̂,
m(x) = ∫ f(x | θ) π(θ) dθ = ∫ e^{l(θ)} π(θ) dθ ≈ ∫ exp[ l(θ̂) + (θ − θ̂)ᵗ ∇l(θ̂) − (1/2)(θ − θ̂)ᵗ Î (θ − θ̂) ] π(θ) dθ,
where ∇ denotes the gradient and Î = (Î_{jk}) is the observed information matrix, with (j, k) entry Î_{jk} = −∂² l(θ)/∂θ_j ∂θ_k evaluated at θ = θ̂.
3. Approximation to the marginal m(x). If θ̂ occurs in the interior of the parameter space, so that ∇l(θ̂) = 0 (if not, the analysis must proceed as in Haughton 1991, 1993), mild conditions yield
m(x) = e^{l(θ̂)} ∫ e^{−(1/2)(θ − θ̂)ᵗ Î (θ − θ̂)} π(θ) dθ · (1 + o_n(1)).
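For orientation (not the authors' code), a bare-bones standard Laplace approximation to log m(x) in Python; unlike the modified version described above, it evaluates the prior at the mode rather than integrating it against the Gaussian kernel, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_log_marginal(logpost, theta0):
    """logpost(theta) = log f(x | theta) + log pi(theta); theta0 is a starting value."""
    res = minimize(lambda t: -logpost(np.atleast_1d(t)), theta0)   # default BFGS
    theta_hat = np.atleast_1d(res.x)
    Sigma = np.atleast_2d(res.hess_inv)       # approximate inverse Hessian of -logpost
    k = len(theta_hat)
    sign, logdet = np.linalg.slogdet(Sigma)
    # log m(x) ~= logpost(theta_hat) + (k/2) log(2*pi) + (1/2) log|Sigma|
    return logpost(theta_hat) + 0.5 * k * np.log(2 * np.pi) + 0.5 * logdet
```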

16 Choosing a good prior
- Integrate out common parameters.
- Orthogonalize the parameters: let O be orthogonal and D = diag(d_i), i = 1, …, p_1, such that Σ = Oᵗ D O (a code sketch of this step follows this slide).
- Make the change of variables ξ = O θ^{(1)}, and let ξ̂ = O θ̂^{(1)}.
- If there is no θ^{(2)} to be integrated out, Σ = Î^{−1} = Oᵗ D O.
- We now use a prior that is independent in the ξ_i, i.e., π(ξ) = ∏_{i=1}^{p_1} π_i(ξ_i).
- The approximation to m(x) is then
m(x) ≈ e^{l(θ̂)} (2π)^{p/2} |Î|^{−1/2} ∏_{i=1}^{p_1} ∫ (2π d_i)^{−1/2} e^{−(ξ_i − ξ̂_i)²/(2 d_i)} π_i(ξ_i) dξ_i,
where we recall that the d_i are the diagonal elements of D from Σ = Oᵗ D O, that is, the variances of the ξ̂_i; they will appear often in what follows.
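A small sketch (not the authors' code) of the orthogonalization step: take Σ = Î^{−1}, eigendecompose it as Oᵗ D O, and transform the MLEs; the function name is hypothetical.

```python
import numpy as np

def orthogonalize(I_hat, theta1_hat):
    """Return O, the d_i = diag(D), and xi_hat = O @ theta1_hat, with Sigma = O.T @ diag(d) @ O."""
    Sigma = np.linalg.inv(I_hat)
    eigval, eigvec = np.linalg.eigh(Sigma)   # Sigma = eigvec @ diag(eigval) @ eigvec.T
    O = eigvec.T                             # hence Sigma = O.T @ np.diag(eigval) @ O
    d = eigval                               # d_i = Var(xi_i_hat)
    xi_hat = O @ np.asarray(theta1_hat)
    return O, d, xi_hat
```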

22 Univariate priors
For the priors π_i(ξ_i), there are several possibilities:
1. Jeffreys recommended the Cauchy(0, b_i) density, written as the scale mixture
π_i^C(ξ_i) = ∫_0^∞ N(ξ_i | 0, b_i/λ_i) Ga(λ_i | 1/2, 1/2) dλ_i,
where (b_i)^{−1} = (n_i d_i)^{−1} is the unit information for ξ_i, with n_i being the effective sample size for ξ_i (Kass & Wasserman, 1995). Note: for i.i.d. scenarios, d_i is like σ_i²/n_i, so b_i is like the variance of one observation.
2. Use other sensible (objective) testing priors, like the intrinsic prior.
3. The goal is to get closed-form expressions: we use the priors proposed in Berger (1985), which are very close to both the Cauchy and the intrinsic priors and produce closed-form expressions for m(x) for normal likelihoods.

23 Berger's robust priors
For b_i ≥ d_i, the prior is given by the scale mixture
π_i^R(ξ_i) = ∫_0^1 N(ξ_i | 0, (d_i + b_i)/(2λ_i) − d_i) · (1/(2√λ_i)) dλ_i.
- Introduced by Berger in the context of robust analyses; not widely used in model selection.
- Behaves like a Cauchy prior with parameter b_i (which itself yields no closed-form expression).
- Gives a closed-form expression for the approximation to m(x):
m(x) ≈ e^{l(θ̂)} (2π)^{p/2} |Î|^{−1/2} ∏_{i=1}^{p_1} [ (1 − e^{−ξ̂_i²/(d_i + b_i)}) / ( √(2π(d_i + b_i)) · √2 · ξ̂_i²/(d_i + b_i) ) ],
where (d_i)^{−1} is the global information about ξ_i and (b_i)^{−1} is the unit information.

26 Proposal for GBIC
Finally, we have, as the approximation to 2 log m(x),
GBIC = 2 l(θ̂) + (p − p_1) log(2π) − log|Î₂₂| − ∑_{i=1}^{p_1} log(1 + n_i) + 2 ∑_{i=1}^{p_1} log[ (1 − e^{−v_i}) / (√2 · v_i) ],
where v_i = ξ̂_i²/(b_i + d_i). (The error as an approximation to 2 log m(x) is o_n(1); the expression is exact for normal models.)
If no parameters are integrated out (so that p_1 = p), then
GBIC = 2 l(θ̂) − ∑_{i=1}^{p} log(1 + n_i) + 2 ∑_{i=1}^{p} log[ (1 − e^{−v_i}) / (√2 · v_i) ].
If all n_i = n, the dominant terms in the expression (as n → ∞) are 2 l(θ̂) − p log n. The third term (the constant ignored by Schwarz) is negative.
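A compact Python sketch of the p_1 = p case of this formula, as I read it from the slide (illustrative only; the helper and its inputs are hypothetical, with b_i = n_i d_i as on the "Univariate priors" slide):

```python
import numpy as np

def gbic(loglik_hat, xi_hat, d, n_eff):
    """xi_hat: orthogonalized MLEs; d: their variances (diagonal of D);
    n_eff: per-parameter effective sample sizes, so b_i = n_i * d_i."""
    xi_hat, d, n_eff = map(np.asarray, (xi_hat, d, n_eff))
    b = n_eff * d
    v = np.maximum(xi_hat**2 / (b + d), 1e-12)   # guard: (1 - e^{-v})/v -> 1 as v -> 0
    return (2.0 * loglik_hat
            - np.sum(np.log(1.0 + n_eff))
            + 2.0 * np.sum(np.log((1.0 - np.exp(-v)) / (np.sqrt(2.0) * v))))
```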

27 Comparison of GBIC and BIC
Compare GBIC and BIC:
GBIC = 2 l(θ̂) − ∑_{i=1}^{p} log(1 + n_i) − k_π,
BIC = 2 l(θ̂) − p log n,
where k_π > 0 is the ignored constant coming from the prior π.
It is easy to see intuitively that BIC makes two mistakes:
- It penalizes too much with p log n (note that n_i ≤ n).
- It penalizes too little with the prior (the third term, a new penalization, is absent).
The second mistake actually helps the first one (in this sense, BIC is "lucky"), but often the first mistake completely overwhelms this little help.

28 GBIC*: Modification More Favorable to Complex Models
- Priors centered at zero may penalize complex models too much.
- Priors centered around the MLE (Raftery, 1996) may favor complex models too much.
- An attractive compromise: Cauchy-type priors centered at zero, but with the scales b_i chosen so as to maximize the marginal likelihood of the model.
- This is an empirical Bayes alternative, popularized in the robust Bayesian literature (Berger, 1994).

29 GBIC*: Modification More Favorable to Complex Models (continued)
The b_i that maximizes m(x) can easily be seen to be
b̂_i = max{ d_i, ξ̂_i²/w − d_i }, with w such that e^w = 1 + 2w, i.e., w ≈ 1.3.
Problem: when ξ_i = 0, this empirical Bayes choice can result in inconsistency as n_i → ∞.
Solution: prevent b̂_i from being less than n_i d_i. This results, after an accurate approximation (for fixed p), in
GBIC* = 2 l(θ̂) − ∑_{i=1}^{p} log(1 + n_i) − ∑_{i=1}^{p} log(3 v_i + 2 max{v_i, 1}),
again a very simple expression.
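The constant w can be checked numerically (a quick verification, not from the slides); the positive root of e^w = 1 + 2w is about 1.26, consistent with the rounded value quoted above.

```python
import numpy as np
from scipy.optimize import brentq

# Positive root of e^w = 1 + 2w (w = 0 is the trivial root, so bracket away from it)
w = brentq(lambda w: np.exp(w) - 1.0 - 2.0 * w, 0.5, 3.0)
print(w)  # ~ 1.256
```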

30 Effective Sample Size
Calculate I = (I_{jk}), the expected information matrix, at the MLE:
I_{jk} = E_{X|θ}[ −∂² log f(x | θ) / ∂θ_j ∂θ_k ], evaluated at θ = θ̂.
For each case x_i, compute I_i = (I_{i,jk}), having (j, k) entry
I_{i,jk} = E_{X_i|θ}[ −∂² log f_i(x_i | θ) / ∂θ_j ∂θ_k ], evaluated at θ = θ̂.
Define I_{jj} = ∑_{i=1}^{n} I_{i,jj}, and define n_j as follows:
- Define the information weights w_{ij} = I_{i,jj} / ∑_{k=1}^{n} I_{k,jj}.
- Define the effective sample size for θ_j as
n_j = I_{jj} / ∑_{i=1}^{n} w_{ij} I_{i,jj} = (I_{jj})² / ∑_{i=1}^{n} (I_{i,jj})².
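This definition is easy to compute from per-case information matrices; a small Python sketch (mine, not the authors'), followed by a check against the Cox-mixture intuition that roughly half the observations carry essentially all the information:

```python
import numpy as np

def effective_sample_sizes(case_informations):
    """case_informations: array of shape (n_cases, p, p), the expected
    information matrix I_i of each independent case, at the MLE.
    Returns n_j = (sum_i I_{i,jj})^2 / sum_i I_{i,jj}^2 for each parameter."""
    diag = np.array([np.diag(Ii) for Ii in case_informations])   # (n_cases, p)
    return diag.sum(axis=0) ** 2 / (diag ** 2).sum(axis=0)

# Cox-mixture style illustration: half the cases have information ~1 (variance 1),
# half have information ~0.001 (variance 1000); the effective sample size is ~ n/2.
n = 100
I_cases = np.array([[[1.0]]] * (n // 2) + [[[0.001]]] * (n // 2))
print(effective_sample_sizes(I_cases))   # approximately [50.1]
```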

33 Effective Sample Size (continued)
Intuitively, ∑_i w_{ij} I_{i,jj} is a weighted measure of the information per observation, and dividing the total information about θ_j by this information per case seems plausible as an effective sample size.
- This is just a tentative proposal.
- It works for most of the examples we discussed.
- Research in progress.

34 A small linear regression simulation
Y_{n×1} = X_{n×8} β_{8×1} + ε_{n×1}, ε ~ N(0, σ² I), β = (3, 1.5, 0, 0, 2, 0, 0, 0)ᵗ, and the rows of the design matrix X are drawn from N(0, Σ_x), where Σ_x is the equicorrelation matrix with 1's on the diagonal and ρ in every off-diagonal entry.
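One replicate of this design can be generated as below (an illustrative sketch, not the authors' simulation code); candidate subsets of the eight covariates would then be scored with AIC, BIC, GBIC, etc.

```python
import numpy as np

def simulate(n=30, rho=0.2, sigma=2.0, seed=0):
    """One data set from the slide's design: n observations, 8 equicorrelated covariates."""
    rng = np.random.default_rng(seed)
    p = 8
    beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
    Sigma_x = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), Sigma_x, size=n)
    y = X @ beta + rng.normal(scale=sigma, size=n)
    return X, y

X, y = simulate(n=30, rho=0.5, sigma=1.0)
```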

35 Linear regression Example
[Figure: panels indexed by σ ∈ {1, 2, 3}, several sample sizes n, and ρ ∈ {0, .2, .5}; methods compared: abf2, aic, bic, Gbic, Gbic*, hbf, and the exact Bayes factor.]
Percentage of true models selected under different situations.

36 Linear regression Example
[Figure: same panel layout; methods: abf2, aic, bic, Gbic, Gbic*, hbf.]
Average number of 0's that were determined to be 0 (true model).

37 Linear regression Example
[Figure: same panel layout; methods: abf2, aic, bic, Gbic, Gbic*, hbf.]
Average number of non-zeroes that were determined to be zero (small is good).

38 Conclusions and future work
BIC can be improved by:
- keeping, rather than ignoring, the prior;
- using a sensible, objective prior that produces closed-form expressions;
- carefully acknowledging the effective sample size for each parameter, to guide the choice of the scale of the priors.
Work in progress:
- A satisfactory definition of effective sample size (if possible); the current proposal works in most cases.
- Extension to the non-independent case.

39 References
1. Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd edition. Springer-Verlag.
2. Berger, J.O., Ghosh, J.K., and Mukhopadhyay (2003). Approximations and consistency of Bayes factors as model dimension grows. Journal of Statistical Planning and Inference, 112.
3. Berger, J., Pericchi, L., and Varshavsky, J. (1998). Bayes factors and marginal distributions in invariant situations. Sankhya A, 60.
4. Bollen, K., Ray, S., and Zavisca, J. (2007). A scaled unit information prior approximation to the Bayes factor. Under preparation.
