Part 7: Hierarchical Modeling
1 Part 7: Hierarchical Modeling
2 Nested data It is common for data to be nested: i.e., observations on subjects are organized by a hierarchy. Such data are often called hierarchical or multilevel. For example: patients within several hospitals; genes within a group of animals; or people within counties within regions within countries.
3 Two levels The simplest type of multilevel data has two levels, in which one level consists of groups and the other consists of units within groups. In this case, we denote by y_{i,j} the data on the i-th unit within group j.
4 Hierarchical model The sampling model should reflect/acknowledge the hierarchy, so that we may distinguish between within-group variability and between-group variability. One typically uses the following hierarchical model, for j = 1,...,m groups with n_j observations in each group:
{Y_{1,j},...,Y_{n_j,j} | φ_j} ~ iid p(y | φ_j)   (within-group sampling variability)
{φ_1,...,φ_m | ψ} ~ iid p(φ | ψ)   (between-group sampling variability)
ψ ~ p(ψ)   (prior distribution)
5 Variability accounting It is important to recognize that the distributions p(y | φ) and p(φ | ψ) both represent sampling variability among populations of objects: p(y | φ) represents variability among measurements within a group, and p(φ | ψ) represents variability across groups. In contrast, p(ψ) represents information about a single fixed but unknown quantity. The first two are sampling distributions; the data are used to estimate the φ_j's and ψ; but p(ψ) is not estimated.
6 Hierarchical normal model A popular model for describing the heterogeneity of means across several populations is the hierarchical normal model, in which the within- and between-group sampling models are both normal:
φ_j = (μ_j, σ²),  p(y | φ_j) = N(μ_j, σ²)   (within-group model)
ψ = (μ, τ²),  p(μ_j | ψ) = N(μ, τ²)   (between-group model)
Note that p(μ_j | ψ) only describes heterogeneity across group means, and not any heterogeneity in group-specific variances. The within-group sampling variability σ² is assumed to be constant across groups.
7 Hierarchical diagram
(μ, τ²)
μ_1   μ_2   ···   μ_{m−1}   μ_m
Y_1   Y_2   ···   Y_{m−1}   Y_m
8 The priors The fixed but unknown parameters in the model are μ, σ², and τ². For convenience we will use the standard semiconjugate normal and inverse-gamma (IG) priors for these parameters:
σ² ~ IG(ν₀/2, ν₀σ₀²/2)
τ² ~ IG(η₀/2, η₀τ₀²/2)
μ ~ N(μ₀, γ₀²)
9 Posterior inference The full set of unknown quantities in our system includes the group-specific means {μ_1,...,μ_m}, the within-group sampling variability σ², and the mean and variance (μ, τ²) of the population of group-specific means. Posterior inference for these parameters can be made by Gibbs sampling (GS), which approximates the joint posterior distribution p(μ_1,...,μ_m, μ, τ², σ² | y_1,...,y_m). GS proceeds by iteratively sampling each parameter from its full conditional distribution.
10 Full conditionals Deriving the full conditional distributions in this highly parameterized system may seem like a daunting task. But it turns out that we have already worked out all of the necessary technical details. All that is required is that we recognize certain analogies between the current model and the univariate normal model.
11 Posterior essence The following factorization of the posterior distribution will be useful:
p(μ_1,...,μ_m, μ, τ², σ² | y_1,...,y_m)
∝ p(y_1,...,y_m | μ_1,...,μ_m, μ, τ², σ²) p(μ_1,...,μ_m | μ, τ², σ²) p(μ, τ², σ²)
= { ∏_{j=1}^m ∏_{i=1}^{n_j} p(y_{i,j} | μ_j, σ²) } { ∏_{j=1}^m p(μ_j | μ, τ²) } p(μ) p(τ²) p(σ²)
The first term is the result of an important conditional independence feature of our model. This relates back to our diagram...
12 Conditional independence
(μ, τ²)
μ_1   μ_2   ···   μ_{m−1}   μ_m
Y_1   Y_2   ···   Y_{m−1}   Y_m
The existence of a path from (μ, τ²) to each Y_j indicates that these parameters provide information about Y_j, but only indirectly through μ_j. Conditional on {μ_1,...,μ_m, μ, τ², σ²}, the random variables Y_{1,j},...,Y_{n_j,j} are independent with a distribution that depends only on μ_j and σ².
13 Full conditional for (μ, τ²) As a function of μ and τ², the posterior distribution is proportional to
∏_{j=1}^m p(μ_j | μ, τ²) p(μ) p(τ²)
And so the full conditional distributions of μ and τ² are also proportional to this quantity:
p(μ | μ_1,...,μ_m, τ², σ², y_1,...,y_m) ∝ ∏_j p(μ_j | μ, τ²) p(μ)
p(τ² | μ_1,...,μ_m, μ, σ², y_1,...,y_m) ∝ ∏_j p(μ_j | μ, τ²) p(τ²)
14 Full conditional for (μ, τ²) These conditionals are exactly the full conditional distributions from the one-sample normal problem. Therefore, by analogy, with μ̄ = (1/m) Σ_j μ_j:
{μ | μ_1,...,μ_m, τ²} ~ N( (m μ̄/τ² + μ₀/γ₀²) / (m/τ² + 1/γ₀²), [m/τ² + 1/γ₀²]^{−1} )
{τ² | μ_1,...,μ_m, μ} ~ IG( (η₀ + m)/2, (η₀τ₀² + Σ_j (μ_j − μ)²)/2 )
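The normal update above is just a precision-weighted combination of the prior and the group means. The slides use R, but as a minimal sketch (function and argument names are illustrative, not from the slides), the conditional mean and variance of μ can be computed as:

```python
import numpy as np

def mu_full_conditional(mus, tau2, mu0, gamma2_0):
    """Conditional posterior mean and variance of mu given group means and tau^2."""
    m = len(mus)
    var = 1.0 / (m / tau2 + 1.0 / gamma2_0)                 # [m/tau^2 + 1/gamma_0^2]^{-1}
    mean = var * (m * np.mean(mus) / tau2 + mu0 / gamma2_0)  # precision-weighted average
    return mean, var
```

With a flat enough prior (large γ₀²), the conditional mean approaches the average of the group means, as expected.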
15 Full conditional for μ_j Likewise, the full conditional distribution of μ_j must be proportional to
p(μ_j | μ, τ², σ², μ_{(−j)}, y_1,...,y_m) ∝ ∏_{i=1}^{n_j} p(y_{i,j} | μ_j, σ²) p(μ_j | μ, τ²)
μ_j is conditionally independent of {μ_k, y_k} for k ≠ j: while there is a path from each μ_j to every other μ_k in the diagram, the paths go through (μ, τ²).
16 Full conditional for μ_j We can think of this as meaning that the μ_j's contribute no information about each other beyond that contained in μ, τ², and σ². The terms in our conditional are
∏_{i=1}^{n_j} p(y_{i,j} | μ_j, σ²)   (product of normals)
p(μ_j | μ, τ²)   (normal)
Mathematically, this is the same setup as our one-sample normal model. So by analogy, the full conditional distribution is
{μ_j | μ, τ², σ², y_{1,j},...,y_{n_j,j}} ~ N( (n_j ȳ_j/σ² + μ/τ²) / (n_j/σ² + 1/τ²), [n_j/σ² + 1/τ²]^{−1} )
17 Full conditional for σ² By similar arguments, σ² is conditionally independent of {μ, τ²} given {y_1,...,y_m, μ_1,...,μ_m}. The derivation of the full conditional of σ² is similar to that in the one-sample normal model, except that now we have information about σ² from m separate groups:
p(σ² | μ_1,...,μ_m, y_1,...,y_m) ∝ ∏_{j=1}^m ∏_{i=1}^{n_j} p(y_{i,j} | μ_j, σ²) p(σ²)
∝ (σ²)^{−Σ_j n_j/2} e^{−Σ_j Σ_i (y_{i,j}−μ_j)²/(2σ²)} (σ²)^{−ν₀/2−1} e^{−ν₀σ₀²/(2σ²)}
18 Full conditional for σ² Adding powers of σ² and collecting terms in the exponent, we recognize that this is an IG:
{σ² | μ_1,...,μ_m, y_1,...,y_m} ~ IG( (ν₀ + Σ_{j=1}^m n_j)/2, (ν₀σ₀² + Σ_{j=1}^m Σ_{i=1}^{n_j} (y_{i,j} − μ_j)²)/2 )
Note that Σ_j Σ_i (y_{i,j} − μ_j)² is the sum of squared residuals across all groups, conditional on the within-group means. So the conditional distribution concentrates probability around a pooled-sample estimate of the variance.
19 Example: Math scores in US public schools Consider data that are part of the 2002 Educational Longitudinal Study (ELS), a survey of students from a large sample of schools in the United States. The data consist of math scores of 10th-grade students at 100 different urban public high schools, each with a 10th-grade enrollment of 400 or more. The scores are based on a national exam, standardized to produce a nationwide mean of 50 and standard deviation of 10.
20 Example: the data Scores for students within the same school are plotted along a common vertical bar. [Figure: math score versus rank of school-specific average math score.]
21 Example: the data The range of average scores (36, 65) is quite large. Extreme sample averages occur for schools with small sample sizes. This is a common relationship in hierarchical datasets. [Figures: histogram of the school averages ȳ_j; plot of ȳ_j versus sample size n_j.]
22 Example: priors The prior parameters that we need to specify are (μ₀, γ₀²) for p(μ), (ν₀, σ₀²) for p(σ²), and (η₀, τ₀²) for p(τ²). The exam was designed to give a variance of 100, both within and between schools, so the within-school variance should be at most 100, which we take as σ₀². This is likely an overestimate, so we take a weak prior concentration around this value with ν₀ = 1.
23 Example: priors, ctd. Similarly, the between-school variance should not be more than 100, so we likewise take (η₀, τ₀²) = (1, 100). Finally, the nationwide mean over all schools is 50. Although the mean for large urban public schools may be different than the nationwide average, it should not differ by too much. We shall take μ₀ = 50 and γ₀² = 25, so that the prior probability that μ ∈ (40, 60) is about 95%.
24 Example: Gibbs sampling Posterior approximation may now proceed by GS. Given a current state of the unknowns {μ_1^(s),...,μ_m^(s), μ^(s), τ²^(s), σ²^(s)}, a new state may be generated as follows:
1. sample μ^(s+1) ~ p(μ | μ_1^(s),...,μ_m^(s), τ²^(s))
2. sample τ²^(s+1) ~ p(τ² | μ_1^(s),...,μ_m^(s), μ^(s+1))
3. sample σ²^(s+1) ~ p(σ² | μ_1^(s),...,μ_m^(s), y_1,...,y_m)
4. for each j ∈ {1,...,m}, sample μ_j^(s+1) ~ p(μ_j | μ^(s+1), τ²^(s+1), σ²^(s+1), y_j)
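The four steps above can be put together in a short sampler. The course's own code is in R; this is a hedged Python/NumPy sketch (function name, data layout, and default hyperparameters are illustrative; the defaults mirror the priors chosen for the math-score example):

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs(y_groups, S=2000, mu0=50, g2_0=25, nu0=1, s2_0=100, eta0=1, t2_0=100):
    """Gibbs sampler for the hierarchical normal model (sketch).
    y_groups: list of 1-d arrays, one per group."""
    m = len(y_groups)
    n = np.array([len(y) for y in y_groups])
    ybar = np.array([np.mean(y) for y in y_groups])
    # initialize at sample statistics
    mus = ybar.copy()
    s2 = np.mean([np.var(y, ddof=1) for y in y_groups])
    mu, t2 = ybar.mean(), np.var(ybar, ddof=1)
    out = {"mu": [], "s2": [], "t2": [], "mus": []}
    for _ in range(S):
        # 1. mu | mus, t2
        v = 1 / (m / t2 + 1 / g2_0)
        mu = rng.normal(v * (m * mus.mean() / t2 + mu0 / g2_0), np.sqrt(v))
        # 2. t2 | mus, mu   (inverse-gamma via 1/Gamma(shape, scale))
        t2 = 1 / rng.gamma((eta0 + m) / 2, 2 / (eta0 * t2_0 + np.sum((mus - mu) ** 2)))
        # 3. s2 | mus, y    (pooled sum of squared residuals)
        ss = sum(np.sum((y - mj) ** 2) for y, mj in zip(y_groups, mus))
        s2 = 1 / rng.gamma((nu0 + n.sum()) / 2, 2 / (nu0 * s2_0 + ss))
        # 4. mu_j | mu, t2, s2, y_j  (vectorized over groups)
        vj = 1 / (n / s2 + 1 / t2)
        mus = rng.normal(vj * (n * ybar / s2 + mu / t2), np.sqrt(vj))
        out["mu"].append(mu); out["s2"].append(s2)
        out["t2"].append(t2); out["mus"].append(mus.copy())
    return {k: np.array(v) for k, v in out.items()}
```

Note the inverse-gamma draws: sampling g ~ Gamma(a, scale = 1/b) and returning 1/g gives an IG(a, b) draw, which is why the scale arguments are 2/(...) when the IG rate is (...)/2.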
25 Example: Gibbs sampling The order in which the new parameters are updated does not matter. What does matter is that each parameter is updated conditional upon the most current value of the other parameters. Checking for mixing or convergence problems with so many parameters can be difficult. Calculating the effective sample size (ESS) and plotting traces is the state of the art.
26 Example: Posterior summaries [Figure: marginal posterior densities of μ, σ, and τ, with posterior means of roughly σ ≈ 9.1 and τ ≈ 4.97.] So 95% of the scores within a school are within about 4σ ≈ 36 points of each other, whereas 95% of the average school scores are within about 4τ ≈ 20 points of each other.
27 Example: Shrinkage One of the motivations behind hierarchical modeling is that information can be shared across groups. Recall that, conditional on μ, τ², σ², and the data, the expected value of μ_j is a weighted average of ȳ_j and μ:
E{μ_j | y_j, μ, τ², σ²} = (ȳ_j n_j/σ² + μ/τ²) / (n_j/σ² + 1/τ²)
As a result, the expected value of μ_j is pulled a bit from ȳ_j towards μ by an amount depending upon n_j. This effect is called shrinkage.
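The weighted-average form makes the role of n_j explicit. A small sketch in Python (names are illustrative) showing that the weight on the group sample mean grows with n_j:

```python
def shrunk_mean(ybar_j, n_j, s2, mu, t2):
    """Conditional posterior mean of mu_j: a precision-weighted average
    of the group sample mean ybar_j and the population mean mu."""
    w = (n_j / s2) / (n_j / s2 + 1 / t2)  # weight on the group sample mean
    return w * ybar_j + (1 - w) * mu
```

For a fixed sample mean, a group with n_j = 5 is pulled much closer to μ than a group with n_j = 100, which is exactly the school-ranking behavior discussed below.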
28 Example: Shrinkage Consider the relationship between ȳ_j and μ̂_j = E{μ_j | y_j, μ, τ², σ²} for j = 1,...,m, obtained via our MCMC method. [Figure: posterior means of the μ_j's (apply(mu, 2, mean) in R) plotted against ȳ_j.] Notice that the relationship follows a line with a slope that is less than one, indicating that high values of ȳ_j correspond to slightly less high values of μ̂_j, and vice-versa for low values.
29 Example: Shrinkage It is also interesting to observe the shrinkage as a function of the group-specific sample size. [Figure: ȳ_j minus the posterior mean of μ_j, plotted against n_j.] Groups with low sample sizes get shrunk the most, whereas groups with large sample sizes hardly get shrunk at all. This makes sense: the larger the sample size, the more information we have for that group, and the less information we need to borrow from the rest of the population.
30 Example: Ranking schools Suppose our task is to rank the schools according to what we think their performances would be if every student in each school took the math exam. In this case, it makes sense to rank the schools according to the school-specific posterior expectations {μ̂_1,...,μ̂_m}. Alternatively, we could ignore the results of the hierarchical model and just use the school-specific averages {ȳ_1,...,ȳ_m}. The two methods will give similar, but not identical, rankings.
31 Example: Ranking schools Consider the posterior distributions of μ_46 and μ_82. Both schools have exceptionally low sample means, in the bottom 10% of all schools. [Figure: posterior densities for school 46 (n_46 = 21) and school 82 (n_82 = 5), with the data averages marked.]
32 Example: Ranking schools So school 82 is ranked lower than school 46 by the data averages, but higher by the posterior group means. Does this make sense? The posterior density for school 46 is more peaked because it was derived from a much larger sample size. Therefore our degree of certainty about μ_46 is much higher than that for μ_82. How does this translate into lack of faith in the data averages, if we are going to justify using the posterior means instead?
33 Example: Hypothetical absence Suppose that on the day of the exam the student who got the lowest exam score in school 82 did not come to class. Then the sample mean for school 82 would have changed from 38.76 by more than three points (and a change in ranking). In contrast, if the lowest-performing student in school 46 had not shown up, then ȳ_46 would have been 40.90 as opposed to 40.18, a change of only 3/4 of a point.
34 Example: Hypothetical absence In other words, the low value of the sample mean for school 82 can be explained either by μ_82 being very low, or just by the possibility that a few of the 5 sampled students were among the poorer-performing students in the school. In contrast, for school 46 this latter possibility cannot explain the low value of the sample mean, because of the large sample size. Therefore it makes sense to shrink the expectation of school 82 towards the population expectation by a greater amount than for school 46.
35 Example: Is it fair? To some people this reversal of rankings may seem strange or unfair. While fairness may be debated, the hierarchical model reflects the objective fact that there is more evidence that μ_46 is exceptionally low than there is evidence that μ_82 is exceptionally low. There are many other real-life situations where differing amounts of evidence result in a switch of ranking.
36 Hierarchical variances If the population means vary across groups, shouldn't we allow for the possibility that the population variances also vary across groups? Letting σ²_j be the variance for group j, our sampling model would then become
Y_{1,j},...,Y_{n_j,j} ~ iid N(μ_j, σ²_j)
and our full conditional for each μ_j would be
{μ_j | y_{1,j},...,y_{n_j,j}, σ²_j, μ, τ²} ~ N( (n_j ȳ_j/σ²_j + μ/τ²) / (n_j/σ²_j + 1/τ²), [n_j/σ²_j + 1/τ²]^{−1} )
37 Estimating σ²_j How does σ²_j get estimated? Suppose we were to specify that
σ²_1,...,σ²_m ~ iid IG(ν₀/2, ν₀σ₀²/2)
Then we can use our familiar result that the full conditional distribution of σ²_j is
{σ²_j | y_{1,j},...,y_{n_j,j}, μ_j, ν₀, σ₀²} ~ IG( (ν₀ + n_j)/2, (ν₀σ₀² + Σ_i (y_{i,j} − μ_j)²)/2 )
and estimation for σ²_1,...,σ²_m can proceed by iteratively sampling their values along with the other parameters via GS.
38 Inefficient independence If ν₀ and σ₀² are fixed in advance at some particular values, then our iid IG prior on the variances implies that, e.g., p(σ²_m | σ²_1,...,σ²_{m−1}) = p(σ²_m), and so the information we may have about σ²_1,...,σ²_{m−1} is not used to help us estimate σ²_m. This seems inefficient: if n_m was small, and we saw that σ²_1,...,σ²_{m−1} were tightly concentrated around a particular value, then we would want to use this fact to improve our estimation of σ²_m.
39 Heterogeneity in variances In other words, we want to be able to learn about the sampling distribution of the σ²_j's and use this information to improve our estimation for groups that may have low sample sizes. This can be done by treating ν₀ and σ₀² as parameters to be estimated, by thinking of
σ²_1,...,σ²_m ~ iid IG(ν₀/2, ν₀σ₀²/2)
as a sampling model for across-group heterogeneity in population variances (not as a prior distribution).
40 Hierarchical diagram Putting this together with our model for heterogeneity in population means gives
(μ, τ²)                                  (ν₀, σ₀²)
μ_1  μ_2  ···  μ_{m−1}  μ_m      σ²_1  σ²_2  ···  σ²_{m−1}  σ²_m
              Y_1  Y_2  ···  Y_{m−1}  Y_m
41 Unknown parameters Now, the unknown parameters to be estimated include:
{(μ_1, σ²_1),...,(μ_m, σ²_m)}, representing the within-group sampling distributions
{μ, τ²}, representing the across-group heterogeneity in means
{ν₀, σ₀²}, representing the across-group heterogeneity in variances
42 Gibbs sampling As before, the joint posterior distribution for all of these parameters can be approximated by iteratively sampling each parameter from its full conditional distribution given the others. The full conditional distributions for μ and τ² are unchanged in this new setup. We just derived the full conditionals for μ_j and σ²_j. What remains is to specify prior distributions for ν₀ and σ₀², and obtain their full conditionals.
43 Conjugate prior for σ₀² A conjugate class of prior densities for σ₀² are the gamma densities. If σ₀² ~ Gamma(a, b), then it is straightforward to show that
{σ₀² | σ²_1,...,σ²_m, ν₀} ~ Gamma( a + mν₀/2, b + (ν₀/2) Σ_{j=1}^m 1/σ²_j )
Notice that for small values of a and b, the conditional mean of σ₀² is approximately the harmonic mean of σ²_1,...,σ²_m.
44 No conjugate prior for ν₀ A simple conjugate prior for ν₀ does not exist. We shall restrict ν₀ to be a positive integer, so that it may retain its interpretation as a prior sample size. If we let the prior on ν₀ be the geometric distribution on {1, 2, ...}, so that p(ν₀) ∝ e^{−αν₀}, then the full conditional distribution is proportional to
p(ν₀ | σ₀², σ²_1,...,σ²_m) ∝ p(σ²_1,...,σ²_m | ν₀, σ₀²) p(ν₀)
∝ ( (ν₀σ₀²/2)^{ν₀/2} / Γ(ν₀/2) )^m ( ∏_{j=1}^m 1/σ²_j )^{ν₀/2+1} exp( −ν₀ ( α + (σ₀²/2) Σ_j 1/σ²_j ) )
45 Awkward conditional for ν₀ While this is not pretty, this un-normalized probability density can be computed for a large range of ν₀-values and then sampled from. In fact, since the interpretation of ν₀ is that it is the prior sample size for the σ²_j's, it is reasonable to restrict it to be smaller than the data sample size N = Σ_{j=1}^m n_j. The expression for p(ν₀ | σ₀², σ²_1,...,σ²_m) may easily be evaluated (and re-normalized) over this range.
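Evaluating and re-normalizing the expression over a grid is a few lines of code. The slides do this with R's sample function; the sketch below does the same in Python/NumPy (function names and the grid cutoff nmax are illustrative), working in log space for numerical stability:

```python
import math
import numpy as np

def nu0_full_conditional(s2_groups, s2_0, alpha=1.0, nmax=100):
    """Un-normalized full conditional of nu_0 on the grid {1,...,nmax},
    returned as a normalized probability vector (sketch)."""
    s2 = np.asarray(s2_groups, dtype=float)
    m = len(s2)
    nus = np.arange(1, nmax + 1)
    lgam = np.array([math.lgamma(v / 2.0) for v in nus])
    # log of: [ (nu*s2_0/2)^{nu/2} / Gamma(nu/2) ]^m * (prod 1/s2_j)^{nu/2+1}
    #         * exp( -nu*(alpha + (s2_0/2) * sum 1/s2_j) )
    logp = (m * (nus / 2.0 * np.log(nus * s2_0 / 2.0) - lgam)
            + (nus / 2.0 + 1.0) * np.sum(np.log(1.0 / s2))
            - nus * (alpha + (s2_0 / 2.0) * np.sum(1.0 / s2)))
    p = np.exp(logp - logp.max())  # subtract the max before exponentiating
    return nus, p / p.sum()

def sample_nu0(s2_groups, s2_0, alpha=1.0, nmax=100, rng=None):
    """Draw one nu_0 value from the discrete full conditional."""
    rng = rng or np.random.default_rng(0)
    nus, p = nu0_full_conditional(s2_groups, s2_0, alpha, nmax)
    return int(rng.choice(nus, p=p))
```

Subtracting the maximum log-probability before exponentiating avoids overflow/underflow when the grid is wide, and the normalization constant cancels anyway.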
46 Example: a particular full conditional for ν₀ Consider the full conditional for ν₀, conditional upon σ₀² = 100 and σ²_j = 85 for all j = 1,...,m. [Figure: probability mass function over ν₀-values.]
47 Gibbs sampler Therefore, we may complete the GS algorithm by sampling from the full conditional for ν₀, which is a discrete distribution over a finite set of values {1, 2, ..., N}. This may be accomplished with the sample function in R.
48 Example: math score data Let us re-analyze the math score data with our hierarchical model for school-specific means and variances. We shall take the parameters in our prior distributions to be the same as before, with the addition of α = 1 and {a = 1, b = 1/100} for the prior distributions on ν₀ and σ₀², respectively. We shall run the GS algorithm for 5,000 iterations, as before, and compare our results to the hierarchical model with a common group variance.
49 Example: group means and variances [Figures: marginal posterior densities of μ and τ under the equal-variances and unequal-variances models; posterior mass function of ν₀; posterior density of σ₀².]
50 Example: group variances Our first model, in which all within-group variances are forced to be equal, is equivalent to a value of ν₀ = ∞ in our current hierarchical model. In contrast, a low value of ν₀ = 1 would indicate that the variances are quite unequal, and little information about variances could be shared across groups. The posterior summaries from the GS output indicate that neither of these extremes is appropriate, as the posterior distribution is concentrated around a moderate value of 14 or 15.
51 Example: group variances This estimated distribution of σ²_1,...,σ²_m is used to shrink extreme sample variances towards an across-group center. [Figures: posterior means of the σ²_j's (apply(sigma2, 2, mean) in R) against the sample variances s²_j; shrinkage s²_j minus posterior mean against n_j.] The larger amounts of shrinkage generally occur for groups with smaller sample sizes.
52 Hierarchical binomial model Another commonly used hierarchical model is the beta-binomial model, where
Y_j | θ_j ~ Bin(n_j, θ_j)
θ_1,...,θ_m | α, β ~ iid Beta(α, β)
(α, β) ~ p(α, β)
Conditional on α and β, the posterior conditional for θ_j is the familiar beta distribution:
θ_j | Y_j, α, β ~ Beta(α + y_j, β + n_j − y_j)
53 Hierarchical diagram
(α, β)
θ_1   θ_2   ···   θ_{m−1}   θ_m
Y_1   Y_2   ···   Y_{m−1}   Y_m
As usual, we treat n_1,...,n_m as known.
54 Hyperprior Unfortunately, there is no (semi-)conjugate prior for α and β. However, it is possible to set up a non-informative hyperprior that is dominated by the likelihood and yields a proper posterior distribution, which leads to a convenient Metropolis-within-Gibbs sampling method.
55 A hyperprior choice A reasonable choice of diffuse hyperprior for the beta-binomial hierarchical model is uniform on
( α/(α + β), (α + β)^{−1/2} )
A change of variables shows that this implies the following prior on the original scale:
p(α, β) ∝ (α + β)^{−5/2}
There are many other possibilities. Why is this a sensible choice?
56 A hyperprior choice
p(α, β) ∝ (α + β)^{−5/2}
It may be shown that it gives a roughly uniform prior distribution to the log standard deviation of the θ_j ~ Beta(α, β) sampling model. It may also be shown that the posterior marginal p(α, β | y) is proper with this choice, as long as 0 < y_j < n_j for at least one experiment j.
57 The posterior marginal How can we calculate the posterior marginal distribution?
p(α, β | y) = p(α, β, θ | y) / p(θ | α, β, y)   (cond. prob.)
= p(y | θ, α, β) p(θ | α, β) p(α, β) / [ p(y) p(θ | α, β, y) ]   (Bayes' rule)
∝ [ ∏_{j=1}^m Bin(y_j; n_j, θ_j) ] [ ∏_{j=1}^m Beta(θ_j; α, β) ] p(α, β) / ∏_{j=1}^m Beta(θ_j; α + y_j, β + n_j − y_j)   (cond. indep. & cond. prob.)
= p(α, β) [ Γ(α + β) / (Γ(α)Γ(β)) ]^m ∏_{j=1}^m Γ(α + y_j) Γ(β + n_j − y_j) / Γ(α + β + n_j)
58 Metropolis-within-Gibbs We may sample from the marginal p(α, β | y) by generating (α^(s+1), β^(s+1)) from (α^(s), β^(s)) via
1. propose α′ ~ Unif(α^(s)/2, 2α^(s))
2. if u < p(α′, β^(s) | y) / p(α^(s), β^(s) | y) for u ~ Unif(0, 1), then α^(s+1) ← α′, else α^(s+1) ← α^(s)
3. propose β′ ~ Unif(β^(s)/2, 2β^(s))
4. if u < p(α^(s+1), β′ | y) / p(α^(s+1), β^(s) | y) for u ~ Unif(0, 1), then β^(s+1) ← β′, else β^(s+1) ← β^(s)
(Strictly speaking, this uniform proposal is not symmetric, so a Metropolis-Hastings correction factor, e.g. α^(s)/α′ in step 2, should multiply the acceptance ratio.)
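The two-block update above is short to implement. Again the course uses R; this is a hedged Python sketch (function names and the starting values are illustrative), which evaluates the marginal on the log scale and includes the proposal-asymmetry correction mentioned above:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def log_marg(a, b, y, n):
    """log p(alpha, beta | y) up to a constant, with p(a,b) proportional
    to (a+b)^(-5/2)."""
    if a <= 0 or b <= 0:
        return -np.inf
    lp = -2.5 * math.log(a + b)
    for yj, nj in zip(y, n):
        lp += (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
               + math.lgamma(a + yj) + math.lgamma(b + nj - yj)
               - math.lgamma(a + b + nj))
    return lp

def mwg(y, n, S=2000, a=1.0, b=1.0):
    """Metropolis-within-Gibbs over (alpha, beta) with multiplicative
    uniform proposals Unif(x/2, 2x); returns an (S, 2) array of samples."""
    out = []
    for _ in range(S):
        ap = rng.uniform(a / 2, 2 * a)  # propose alpha'
        # accept with the MH ratio, including the a/ap proposal correction
        if math.log(rng.uniform()) < (log_marg(ap, b, y, n) - log_marg(a, b, y, n)
                                      + math.log(a / ap)):
            a = ap
        bp = rng.uniform(b / 2, 2 * b)  # propose beta'
        if math.log(rng.uniform()) < (log_marg(a, bp, y, n) - log_marg(a, b, y, n)
                                      + math.log(b / bp)):
            b = bp
        out.append((a, b))
    return np.array(out)
```

Working with log_marg rather than the marginal itself keeps the gamma-function products from overflowing, and the log-uniform comparison is the standard log-scale accept/reject step.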
59 Completing the MC Conditional on samples (α^(s), β^(s)), we may complete the MC method for sampling from the joint posterior distribution by sampling from the conditional(s)
θ_j^(s) ~ Beta(α^(s) + y_j, β^(s) + n_j − y_j) for j = 1,...,m
We may also obtain samples from the posterior predictive p(ỹ | y) = p(ỹ_1,...,ỹ_m | y_1,...,y_m) as
ỹ_j^(s) ~ Bin(n_j, θ_j^(s)) for j = 1,...,m
60 Example: risk of tumors in a group of rats In the evaluation of drugs for possible clinical application, studies are routinely performed on rodents. In a particular study, the aim is to estimate the probability of a tumor in a population of female F344 rats that receive a zero dose of the drug (control group). The data show that 4/14 rats developed endometrial stromal polyps (a kind of tumor). Typically, the mean and standard deviation of underlying tumor risks are not available to form a prior.
61 Example: prior data Rather, historical data are available on previous experiments on similar groups of rats. Tarone (1982) provides data on the observations of tumor incidence in 70 groups of rats. [Figure: number of tumors versus number of rats in each group.]
62 Example: Bayesian analysis We shall model the m = 71 rat tumor data with a hierarchical beta-binomial sampling model, with MC(MC) inference as just described. First we must obtain samples from the marginal posterior of (α, β). [Figure: trace plots of the α and β samples, with their effective sample sizes (ESS).]
63 Example: The posterior marginal Once we have determined that the mixing is good, and we think the chain has achieved stationarity, we can inspect the marginal posterior in a number of ways. [Figures: samples of (α, β) on the original scale, and on a sensibly transformed scale, (log(α/β), log(α + β)).]
64 Example: Mean shrinkage As in the normal hierarchical model, we can assess the amount of shrinkage in the group-specific means, which may be obtained by direct MC, in a number of ways. [Figures: posterior means of the θ_j's versus the raw proportions y_j/n_j; shrinkage y_j/n_j minus posterior mean versus n_j.]
65 Example: Rat group F344 We can now present the posterior distribution for θ_71, our 71st rat group, and compare it to the population mean of tumor rates in the 70 prior rat groups: E{θ_71 | y} = 0.199. [Figure: posterior density of θ_71 for rat group F344, with the posterior for the population mean α/(α + β) overlaid, and the posterior probability P(θ_71 > α/(α + β) | y) reported.]
66 In summary We have seen how hierarchical models may be used to: model data which are nested or have a natural hierarchy; pool information about groups of similar populations, so that smaller groups may borrow information from larger ones (i.e., shrinkage); and provide an efficient way of using prior data in an appropriate way.
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationInference for a Population Proportion
Al Nosedal. University of Toronto. November 11, 2015 Statistical inference is drawing conclusions about an entire population based on data in a sample drawn from that population. From both frequentist
More informationmultilevel modeling: concepts, applications and interpretations
multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models
More informationGibbs Sampling in Linear Models #1
Gibbs Sampling in Linear Models #1 Econ 690 Purdue University Justin L Tobias Gibbs Sampling #1 Outline 1 Conditional Posterior Distributions for Regression Parameters in the Linear Model [Lindley and
More informationLecture 3. Univariate Bayesian inference: conjugate analysis
Summary Lecture 3. Univariate Bayesian inference: conjugate analysis 1. Posterior predictive distributions 2. Conjugate analysis for proportions 3. Posterior predictions for proportions 4. Conjugate analysis
More informationPubh 8482: Sequential Analysis
Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 10 Class Summary Last time... We began our discussion of adaptive clinical trials Specifically,
More informationBayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling
Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation
More informationBayesian GLMs and Metropolis-Hastings Algorithm
Bayesian GLMs and Metropolis-Hastings Algorithm We have seen that with conjugate or semi-conjugate prior distributions the Gibbs sampler can be used to sample from the posterior distribution. In situations,
More informationA Primer on Statistical Inference using Maximum Likelihood
A Primer on Statistical Inference using Maximum Likelihood November 3, 2017 1 Inference via Maximum Likelihood Statistical inference is the process of using observed data to estimate features of the population.
More informationBayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida
Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:
More information1 Inference for binomial proportion (Matlab/Python)
Bayesian data analysis exercises from 2015 1 Inference for binomial proportion (Matlab/Python) Algae status is monitored in 274 sites at Finnish lakes and rivers. The observations for the 2008 algae status
More informationBayesian Model Comparison:
Bayesian Model Comparison: Modeling Petrobrás log-returns Hedibert Freitas Lopes February 2014 Log price: y t = log p t Time span: 12/29/2000-12/31/2013 (n = 3268 days) LOG PRICE 1 2 3 4 0 500 1000 1500
More informationModel Checking and Improvement
Model Checking and Improvement Statistics 220 Spring 2005 Copyright c 2005 by Mark E. Irwin Model Checking All models are wrong but some models are useful George E. P. Box So far we have looked at a number
More informationLecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1
Lecture 5 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,
More informationHierarchical Linear Models. Jeff Gill. University of Florida
Hierarchical Linear Models Jeff Gill University of Florida I. ESSENTIAL DESCRIPTION OF HIERARCHICAL LINEAR MODELS II. SPECIAL CASES OF THE HLM III. THE GENERAL STRUCTURE OF THE HLM IV. ESTIMATION OF THE
More informationCS281A/Stat241A Lecture 22
CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2
Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationRegularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics
Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,
More informationIntroduc)on to Bayesian Methods
Introduc)on to Bayesian Methods Bayes Rule py x)px) = px! y) = px y)py) py x) = px y)py) px) px) =! px! y) = px y)py) y py x) = py x) =! y "! y px y)py) px y)py) px y)py) px y)py)dy Bayes Rule py x) =
More informationBayesian Graphical Models
Graphical Models and Inference, Lecture 16, Michaelmas Term 2009 December 4, 2009 Parameter θ, data X = x, likelihood L(θ x) p(x θ). Express knowledge about θ through prior distribution π on θ. Inference
More informationBayesian Methods: Naïve Bayes
Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior
More informationBayesian Inference for Regression Parameters
Bayesian Inference for Regression Parameters 1 Bayesian inference for simple linear regression parameters follows the usual pattern for all Bayesian analyses: 1. Form a prior distribution over all unknown
More informationDefault Priors and Efficient Posterior Computation in Bayesian Factor Analysis
Default Priors and Efficient Posterior Computation in Bayesian Factor Analysis Joyee Ghosh Institute of Statistics and Decision Sciences, Duke University Box 90251, Durham, NC 27708 joyee@stat.duke.edu
More informationDavid Giles Bayesian Econometrics
David Giles Bayesian Econometrics 5. Bayesian Computation Historically, the computational "cost" of Bayesian methods greatly limited their application. For instance, by Bayes' Theorem: p(θ y) = p(θ)p(y
More informationStatistical Models with Uncertain Error Parameters (G. Cowan, arxiv: )
Statistical Models with Uncertain Error Parameters (G. Cowan, arxiv:1809.05778) Workshop on Advanced Statistics for Physics Discovery aspd.stat.unipd.it Department of Statistical Sciences, University of
More informationPetr Volf. Model for Difference of Two Series of Poisson-like Count Data
Petr Volf Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodárenskou věží 4, 182 8 Praha 8 e-mail: volf@utia.cas.cz Model for Difference of Two Series of Poisson-like
More informationDAG models and Markov Chain Monte Carlo methods a short overview
DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex
More informationBayesian course - problem set 5 (lecture 6)
Bayesian course - problem set 5 (lecture 6) Ben Lambert November 30, 2016 1 Stan entry level: discoveries data The file prob5 discoveries.csv contains data on the numbers of great inventions and scientific
More informationSome slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2
Logistics CSE 446: Point Estimation Winter 2012 PS2 out shortly Dan Weld Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2 Last Time Random variables, distributions Marginal, joint & conditional
More informationBayesian Meta-analysis with Hierarchical Modeling Brian P. Hobbs 1
Bayesian Meta-analysis with Hierarchical Modeling Brian P. Hobbs 1 Division of Biostatistics, School of Public Health, University of Minnesota, Mayo Mail Code 303, Minneapolis, Minnesota 55455 0392, U.S.A.
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationGeneral Bayesian Inference I
General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for
More informationScaling Neighbourhood Methods
Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)
More informationBayesian Inference. Chapter 9. Linear models and regression
Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering
More informationExample using R: Heart Valves Study
Example using R: Heart Valves Study Goal: Show that the thrombogenicity rate (TR) is less than two times the objective performance criterion R and WinBUGS Examples p. 1/27 Example using R: Heart Valves
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationThe Metropolis Algorithm
16 Metropolis Algorithm Lab Objective: Understand the basic principles of the Metropolis algorithm and apply these ideas to the Ising Model. The Metropolis Algorithm Sampling from a given probability distribution
More informationEco517 Fall 2014 C. Sims MIDTERM EXAM
Eco57 Fall 204 C. Sims MIDTERM EXAM You have 90 minutes for this exam and there are a total of 90 points. The points for each question are listed at the beginning of the question. Answer all questions.
More informationBUGS Bayesian inference Using Gibbs Sampling
BUGS Bayesian inference Using Gibbs Sampling Glen DePalma Department of Statistics May 30, 2013 www.stat.purdue.edu/~gdepalma 1 / 20 Bayesian Philosophy I [Pearl] turned Bayesian in 1971, as soon as I
More informationIntroduction to Machine Learning. Lecture 2
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More informationMaking rating curves - the Bayesian approach
Making rating curves - the Bayesian approach Rating curves what is wanted? A best estimate of the relationship between stage and discharge at a given place in a river. The relationship should be on the
More informationST 740: Linear Models and Multivariate Normal Inference
ST 740: Linear Models and Multivariate Normal Inference Alyson Wilson Department of Statistics North Carolina State University November 4, 2013 A. Wilson (NCSU STAT) Linear Models November 4, 2013 1 /
More informationMultivariate spatial modeling
Multivariate spatial modeling Point-referenced spatial data often come as multivariate measurements at each location Chapter 7: Multivariate Spatial Modeling p. 1/21 Multivariate spatial modeling Point-referenced
More information2 Inference for Multinomial Distribution
Markov Chain Monte Carlo Methods Part III: Statistical Concepts By K.B.Athreya, Mohan Delampady and T.Krishnan 1 Introduction In parts I and II of this series it was shown how Markov chain Monte Carlo
More informationMore Spectral Clustering and an Introduction to Conjugacy
CS8B/Stat4B: Advanced Topics in Learning & Decision Making More Spectral Clustering and an Introduction to Conjugacy Lecturer: Michael I. Jordan Scribe: Marco Barreno Monday, April 5, 004. Back to spectral
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As
More informationEstimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters
Estimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters Audrey J. Leroux Georgia State University Piecewise Growth Model (PGM) PGMs are beneficial for
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework 3 Due Nov 12, 10.30 am Rules 1. Homework is due on the due date at 10.30 am. Please hand over your homework at the beginning of class. Please see
More informationHierarchical models. Dr. Jarad Niemi. STAT Iowa State University. February 14, 2018
Hierarchical models Dr. Jarad Niemi STAT 544 - Iowa State University February 14, 2018 Jarad Niemi (STAT544@ISU) Hierarchical models February 14, 2018 1 / 38 Outline Motivating example Independent vs pooled
More informationINTRODUCTION TO BAYESIAN STATISTICS
INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types
More informationECE285/SIO209, Machine learning for physical applications, Spring 2017
ECE285/SIO209, Machine learning for physical applications, Spring 2017 Peter Gerstoft, 534-7768, gerstoft@ucsd.edu We meet Wednesday from 5 to 6:20pm in Spies Hall 330 Text Bishop Grading A or maybe S
More information(I AL BL 2 )z t = (I CL)ζ t, where
ECO 513 Fall 2011 MIDTERM EXAM The exam lasts 90 minutes. Answer all three questions. (1 Consider this model: x t = 1.2x t 1.62x t 2 +.2y t 1.23y t 2 + ε t.7ε t 1.9ν t 1 (1 [ εt y t = 1.4y t 1.62y t 2
More informationMCMC Review. MCMC Review. Gibbs Sampling. MCMC Review
MCMC Review http://jackman.stanford.edu/mcmc/icpsr99.pdf http://students.washington.edu/fkrogsta/bayes/stat538.pdf http://www.stat.berkeley.edu/users/terry/classes/s260.1998 /Week9a/week9a/week9a.html
More information