Part 7: Hierarchical Modeling


1. Part 7: Hierarchical Modeling

2. Nested data
It is common for data to be nested: that is, observations on subjects are organized by a hierarchy. Such data are often called hierarchical or multilevel. For example: patients within several hospitals; genes within a group of animals; or people within counties within regions within countries.

3. Two groups
The simplest type of multilevel data has two levels, in which one level consists of groups and the other consists of units within groups. In this case, we denote y_{i,j} as the data on the i-th unit within group j.

4. Hierarchical model
The sampling model should reflect and acknowledge the hierarchy, so that we may distinguish between within-group variability and between-group variability. One typically uses the following hierarchical model, for groups j = 1,...,m with n_j observations in each group:

{Y_{1,j},...,Y_{n_j,j}} ~ iid p(y | φ_j)   (within-group sampling variability)
{φ_1,...,φ_m} ~ iid p(φ | ψ)              (between-group sampling variability)
ψ ~ p(ψ)                                  (prior distribution)

5. Variability accounting
It is important to recognize that the distributions p(y | φ) and p(φ | ψ) both represent sampling variability among populations of objects:
p(y | φ) represents variability among measurements within a group;
p(φ | ψ) represents variability across groups.
In contrast, p(ψ) represents information about a single fixed but unknown quantity. The first two are sampling distributions; the data are used to estimate the φ_j's and ψ, but p(ψ) itself is not estimated.

6. Hierarchical normal model
A popular model for describing the heterogeneity of means across several populations is the hierarchical normal model, in which the within-group and between-group sampling models are both normal:

φ_j = (μ_j, σ²),  p(y | φ_j) = N(μ_j, σ²)     (within-group model)
(ψ, τ²),  p(μ_j | ψ, τ²) = N(ψ, τ²)           (between-group model)

Note that p(μ_j | ψ, τ²) only describes heterogeneity across group means, and not any heterogeneity in group-specific variances: the within-group sampling variability σ² is assumed to be constant across groups.
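
As a quick illustration of the two levels of sampling, here is a minimal R sketch that simulates one data set from this model; the numerical settings (m, n_j, ψ, τ, σ) are made up for illustration.

    # Simulate from the hierarchical normal model (illustrative values only)
    set.seed(1)
    m <- 8; n_j <- 10                  # number of groups, observations per group
    psi <- 50; tau <- 5; sigma <- 9    # between-group mean and sd, within-group sd
    mu <- rnorm(m, psi, tau)                              # between-group sampling
    Y  <- lapply(mu, function(mj) rnorm(n_j, mj, sigma))  # within-group sampling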

7. Hierarchical diagram
[Diagram: (ψ, τ²) at the top, with arrows to the group means μ_1, μ_2, ..., μ_{m-1}, μ_m, each of which points to its group data Y_1, Y_2, ..., Y_{m-1}, Y_m.]

8. The priors
The fixed but unknown parameters in the model are σ², τ², and ψ. For convenience we will use the standard semiconjugate normal and inverse-gamma (IG) priors for these parameters:

σ² ~ IG(ν₀/2, ν₀σ₀²/2)
τ² ~ IG(η₀/2, η₀τ₀²/2)
ψ ~ N(μ₀, γ₀²)

9. Posterior inference
The full set of unknown quantities in our system includes the group-specific means {μ_1,...,μ_m}, the within-group sampling variability σ², and the mean and variance (ψ, τ²) of the population of group-specific means. Posterior inference for these parameters can be made by Gibbs sampling (GS), which approximates the joint posterior distribution p(μ_1,...,μ_m, σ², ψ, τ² | y_1,...,y_m). GS proceeds by iteratively sampling each parameter from its full conditional distribution.

10. Full conditionals
Deriving the full conditional distributions in this highly parameterized system may seem like a daunting task, but it turns out that we have already worked out all of the necessary technical details. All that is required is that we recognize certain analogies between the current model and the univariate normal model.

11. Posterior essence
The following factorization of the posterior distribution will be useful:

p(μ_1,...,μ_m, σ², ψ, τ² | y_1,...,y_m)
  ∝ p(y_1,...,y_m | μ_1,...,μ_m, σ², ψ, τ²) p(μ_1,...,μ_m | σ², ψ, τ²) p(σ², ψ, τ²)
  = { ∏_{j=1}^m ∏_{i=1}^{n_j} p(y_{i,j} | μ_j, σ²) } { ∏_{j=1}^m p(μ_j | ψ, τ²) } p(ψ) p(τ²) p(σ²)

The first term is the result of an important conditional independence feature of our model. This relates back to our diagram...

12. Conditional independence
[Diagram repeated: (ψ, τ²) → μ_1, μ_2, ..., μ_{m-1}, μ_m → Y_1, Y_2, ..., Y_{m-1}, Y_m.]
The existence of a path from (ψ, τ²) to each Y_j indicates that these parameters provide information about Y_j, but only indirectly through μ_j. Conditional on {μ_1,...,μ_m, σ², ψ, τ²}, the random variables Y_{1,j},...,Y_{n_j,j} are independent with a distribution that depends only on μ_j and σ².

13. Full conditional for (ψ, τ²)
As a function of ψ and τ², the posterior distribution is proportional to

∏_{j=1}^m p(μ_j | ψ, τ²) p(ψ) p(τ²)

and so the full conditional distributions of ψ and τ² are also proportional to this quantity:

p(ψ | μ_1,...,μ_m, τ², σ², y_1,...,y_m) ∝ ∏_j p(μ_j | ψ, τ²) p(ψ)
p(τ² | μ_1,...,μ_m, ψ, σ², y_1,...,y_m) ∝ ∏_j p(μ_j | ψ, τ²) p(τ²)

14. Full conditional for (ψ, τ²)
These conditionals are exactly the full conditional distributions from the one-sample normal problem. Therefore, by analogy (writing μ̄ = (1/m) Σ_j μ_j):

{ψ | μ_1,...,μ_m, τ²} ~ N( (m μ̄/τ² + μ₀/γ₀²) / (m/τ² + 1/γ₀²), [m/τ² + 1/γ₀²]^(-1) )
{τ² | μ_1,...,μ_m, ψ} ~ IG( (η₀ + m)/2, [η₀τ₀² + Σ_j (μ_j − ψ)²]/2 )
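
A minimal R sketch of these two draws, assuming mu is the current vector of group means and the hyperparameter names (mu0 and g20 for μ₀ and γ₀², eta0 and t20 for η₀ and τ₀²) follow the priors above; the function names are illustrative, not from the slides.

    # Full-conditional draw for psi: normal with precision-weighted mean
    update_psi <- function(mu, tau2, mu0, g20) {
      m <- length(mu)
      v <- 1 / (m / tau2 + 1 / g20)                 # conditional variance
      e <- v * (m * mean(mu) / tau2 + mu0 / g20)    # conditional mean
      rnorm(1, e, sqrt(v))
    }

    # Full-conditional draw for tau2: inverse-gamma, sampled as 1/Gamma
    update_tau2 <- function(mu, psi, eta0, t20) {
      m <- length(mu)
      1 / rgamma(1, (eta0 + m) / 2, (eta0 * t20 + sum((mu - psi)^2)) / 2)
    }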

15. Full conditional for μ_j
Likewise, the full conditional distribution of μ_j must be proportional to

p(μ_j | ψ, τ², σ², μ_(−j), y_1,...,y_m) ∝ { ∏_{i=1}^{n_j} p(y_{i,j} | μ_j, σ²) } p(μ_j | ψ, τ²)

That is, μ_j is conditionally independent of {μ_k, y_k} for k ≠ j.
[Diagram repeated: (ψ, τ²) → μ_1, ..., μ_m → Y_1, ..., Y_m.]
While there is a path from each μ_j to every other μ_k, the paths go through (ψ, τ²).

16. Full conditional for μ_j
We can think of this as meaning that the μ_j's contribute no information about each other beyond that contained in ψ, τ², and σ². The terms in our conditional are

{ ∏_{i=1}^{n_j} p(y_{i,j} | μ_j, σ²) }   (a product of normals)
p(μ_j | ψ, τ²)                            (a normal)

Mathematically, this is the same setup as our one-sample normal model, so by analogy the full conditional distribution is

{μ_j | ψ, τ², σ², y_{1,j},...,y_{n_j,j}} ~ N( (n_j ȳ_j/σ² + ψ/τ²) / (n_j/σ² + 1/τ²), [n_j/σ² + 1/τ²]^(-1) )
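
A corresponding R sketch for the μ_j draws, vectorized over groups; ybar and n are assumed to hold the group sample means and group sample sizes.

    # Full-conditional draws for all group means at once
    update_mu <- function(ybar, n, sigma2, psi, tau2) {
      v <- 1 / (n / sigma2 + 1 / tau2)            # conditional variances
      e <- v * (n * ybar / sigma2 + psi / tau2)   # conditional means
      rnorm(length(ybar), e, sqrt(v))
    }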

17. Full conditional for σ²
By similar arguments, σ² is conditionally independent of {ψ, τ²} given {y_1,...,y_m, μ_1,...,μ_m}. The derivation of the full conditional of σ² is similar to that in the one-sample normal model, except that now we have information about σ² from m separate groups:

p(σ² | μ_1,...,μ_m, y_1,...,y_m) ∝ { ∏_{j=1}^m ∏_{i=1}^{n_j} p(y_{i,j} | μ_j, σ²) } p(σ²)
  ∝ (σ²)^(−Σ_j n_j/2) exp{ −Σ_j Σ_i (y_{i,j} − μ_j)² / (2σ²) } × (σ²)^(−ν₀/2 − 1) exp{ −ν₀σ₀² / (2σ²) }

18. Full conditional for σ²
Adding powers of σ² and collecting terms in the exponent, we recognize that this is an IG distribution:

{σ² | μ_1,...,μ_m, y_1,...,y_m} ~ IG( [ν₀ + Σ_{j=1}^m n_j]/2, [ν₀σ₀² + Σ_{j=1}^m Σ_{i=1}^{n_j} (y_{i,j} − μ_j)²]/2 )

Note that Σ_j Σ_i (y_{i,j} − μ_j)² is the sum of squared residuals across all groups, conditional on the within-group means, so the conditional distribution concentrates probability around a pooled-sample estimate of the variance.
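
A sketch of the σ² draw in R, assuming Y is a list whose j-th element holds the observations for group j (an assumed data layout, not prescribed by the slides).

    # Full-conditional draw for the common within-group variance
    update_sigma2 <- function(Y, mu, nu0, s20) {
      N  <- sum(lengths(Y))
      ss <- sum(unlist(mapply(function(y, m) (y - m)^2, Y, mu)))  # pooled residual SS
      1 / rgamma(1, (nu0 + N) / 2, (nu0 * s20 + ss) / 2)
    }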

19. Example: Math scores in US public schools
Consider data that are part of the 2002 Educational Longitudinal Study (ELS), a survey of students from a large sample of schools in the United States. The data consist of math scores of 10th grade students at 100 different urban public high schools, each with a 10th grade enrollment of 400 or more. The scores are based on a national exam, standardized to produce a nationwide mean of 50 and standard deviation of 10.

20. Example: the data
[Figure: school-specific math scores plotted along a common vertical bar for each school, with schools ordered by the rank of their average math score and the group averages highlighted.]

21. Example: the data
The range of average scores (36, 65) is quite large. Extreme sample averages occur for schools with small sample sizes; this is a common relationship in hierarchical datasets.
[Figure: histogram of the school averages ȳ_j, and a scatterplot of ȳ_j against school sample size n_j.]

22. Example: priors
The prior parameters that we need to specify are (ν₀, σ₀²) for p(σ²), (η₀, τ₀²) for p(τ²), and (μ₀, γ₀²) for p(ψ). The exam was designed to give a variance of 100, both within and between schools, so the within-school variance should be at most 100, which we take as σ₀². This is likely an overestimate, so we take a weak prior concentration around this value with ν₀ = 1.

23. Example: priors, ctd.
Similarly, the between-school variance should not be more than 100, so we likewise take (η₀, τ₀²) = (1, 100). Finally, the nationwide mean over all schools is 50. Although the mean for large urban public schools may differ from the nationwide average, it should not differ by too much. We shall take μ₀ = 50 and γ₀² = 25, so that the prior probability that ψ ∈ (40, 60) is about 95%.

24. Example: Gibbs sampling
Posterior approximation may now proceed by GS. Given a current state of the unknowns {μ_1^(s),...,μ_m^(s), ψ^(s), σ²^(s), τ²^(s)}, a new state may be generated as follows:

1. sample ψ^(s+1) ~ p(ψ | μ_1^(s),...,μ_m^(s), τ²^(s))
2. sample τ²^(s+1) ~ p(τ² | μ_1^(s),...,μ_m^(s), ψ^(s+1))
3. sample σ²^(s+1) ~ p(σ² | μ_1^(s),...,μ_m^(s), y_1,...,y_m)
4. for each j ∈ {1,...,m}, sample μ_j^(s+1) ~ p(μ_j | ψ^(s+1), τ²^(s+1), σ²^(s+1), y_j)
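
Putting the pieces together, here is a sketch of the full Gibbs sampler for the school data, using the update functions sketched earlier; Y (the list of score vectors, one per school) is assumed to be available, and the prior settings follow the slides.

    # Gibbs sampler for the hierarchical normal model (sketch)
    m <- length(Y); ybar <- sapply(Y, mean); n <- lengths(Y)
    nu0 <- 1; s20 <- 100; eta0 <- 1; t20 <- 100; mu0 <- 50; g20 <- 25   # priors
    mu <- ybar; sigma2 <- mean(sapply(Y, var))                          # starting values
    psi <- mean(ybar); tau2 <- var(ybar)
    S <- 5000
    PAR <- matrix(NA, S, 3, dimnames = list(NULL, c("psi", "sigma2", "tau2")))
    MU  <- matrix(NA, S, m)
    for (s in 1:S) {
      psi    <- update_psi(mu, tau2, mu0, g20)
      tau2   <- update_tau2(mu, psi, eta0, t20)
      sigma2 <- update_sigma2(Y, mu, nu0, s20)
      mu     <- update_mu(ybar, n, sigma2, psi, tau2)
      PAR[s, ] <- c(psi, sigma2, tau2); MU[s, ] <- mu
    }

Posterior means of the group-specific means can then be read off as apply(MU, 2, mean), which is what the shrinkage plots on the later slides use.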

25. Example: Gibbs sampling
The order in which the parameters are updated does not matter. What does matter is that each parameter is updated conditional upon the most current value of the other parameters. Checking for mixing or convergence problems with so many parameters can be difficult; calculating the effective sample size (ESS) and plotting traces is the state of the art.

26. Example: Posterior summaries
[Figure: posterior density plots for ψ, σ, and τ, with posterior means of roughly 9.1 for σ and 4.97 for τ.]
Roughly speaking, 95% of the scores within a school lie within about 4σ points of each other, whereas 95% of the average school scores lie within about 4τ points of each other.

27. Example: Shrinkage
One of the motivations behind hierarchical modeling is that information can be shared across groups. Recall that, conditional on ψ, σ², τ², and the data, the expected value of μ_j is a weighted average of ȳ_j and ψ:

E{μ_j | y_j, ψ, σ², τ²} = (ȳ_j n_j/σ² + ψ/τ²) / (n_j/σ² + 1/τ²)

As a result, the expected value of μ_j is pulled a bit away from ȳ_j towards ψ, by an amount depending upon n_j. This effect is called shrinkage.
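
The formula is just a precision-weighted average; a small R sketch of the weight and the resulting shrunken mean (all parameter values here are made up for illustration):

    # Shrinkage: the weight on ybar_j increases with n_j
    shrunk_mean <- function(ybar, n, sigma2, tau2, psi) {
      w <- (n / sigma2) / (n / sigma2 + 1 / tau2)   # weight on the group's own data
      w * ybar + (1 - w) * psi                      # E{mu_j | ...}
    }
    shrunk_mean(ybar = 39, n = c(5, 25, 100), sigma2 = 85, tau2 = 25, psi = 48)

The call shows the same sample mean being pulled towards ψ by very different amounts as the group sample size grows.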

28. Example: Shrinkage
Consider the relationship between ȳ_j and μ̂_j = E{μ_j | y_j, ψ, σ², τ²} for j = 1,...,m, obtained via our MCMC method (in R, apply(mu, 2, mean) applied to the matrix of sampled group means). Notice that the relationship follows a line with a slope that is less than one, indicating that high values of ȳ_j correspond to slightly less high values of μ̂_j, and vice versa for low values.
[Figure: scatterplot of the posterior means of μ_j against ȳ_j.]

29. Example: Shrinkage
It is also interesting to observe the shrinkage as a function of the group-specific sample size.
[Figure: the shrinkage ȳ_j − E{μ_j | y} (in R, (ybar - apply(mu, 2, mean))[o]) plotted against sample size n[o], with schools ordered by sample size.]
Groups with small sample sizes get shrunk the most, whereas groups with large sample sizes hardly get shrunk at all. This makes sense: the larger the sample size, the more information we have for that group, and the less information we need to borrow from the rest of the population.

30. Example: Ranking schools
Suppose our task is to rank the schools according to what we think their performances would be if every student in each school took the math exam. In this case, it makes sense to rank the schools according to the school-specific posterior expectations {μ̂_1,...,μ̂_m}. Alternatively, we could ignore the results of the hierarchical model and just use the school-specific averages {ȳ_1,...,ȳ_m}. The two methods will give similar, but not identical, rankings.

31. Example: Ranking schools
Consider the posterior distributions of μ_46 and μ_82. Both schools have exceptionally low sample means, in the bottom 10% of all schools.
[Figure: posterior densities of μ_46 (n_46 = 21) and μ_82 (n_82 = 5), together with the posterior for ψ and the data averages for the two schools, based on 5,000 Gibbs samples.]

32. Example: Ranking schools
So school 82 is ranked lower than school 46 by the data averages, but higher by the posterior group means. Does this make sense? The posterior density for school 46 is more peaked because it was derived from a much larger sample size; therefore our degree of certainty about μ_46 is much higher than that for μ_82. How does this translate into a lack of faith in the data averages, if we are going to justify using the posterior means instead?

33. Example: Hypothetical absence
Suppose that on the day of the exam the student who got the lowest exam score in school 82 did not come to class. Then the sample mean for school 82 would have been more than three points higher than 38.76, a change large enough to alter the ranking. In contrast, if the lowest-performing student in school 46 had not shown up, then ȳ_46 would have been 40.9 as opposed to 40.18, a change of only about 3/4 of a point.

34. Example: Hypothetical absence
In other words, the low value of the sample mean for school 82 can be explained either by μ_82 being very low, or just by the possibility that a few of the 5 sampled students were among the poorer-performing students in the school. In contrast, for school 46 this latter possibility cannot explain the low value of the sample mean, because of the large sample size. Therefore it makes sense to shrink the expectation for school 82 towards the population expectation ψ by a greater amount than for school 46.

35. Example: Is it fair?
To some people this reversal of rankings may seem strange or unfair. While fairness may be debated, the hierarchical model reflects the objective fact that there is more evidence that μ_46 is exceptionally low than there is evidence that μ_82 is exceptionally low. There are many other real-life situations where differing amounts of evidence result in a switch of ranking.

36. Hierarchical variances
If the population means vary across groups, shouldn't we allow for the possibility that the population variances also vary across groups? Letting σ_j² be the variance for group j, our sampling model would then become

Y_{1,j},...,Y_{n_j,j} ~ iid N(μ_j, σ_j²)

and our full conditional for each μ_j would be

{μ_j | y_{1,j},...,y_{n_j,j}, σ_j², ψ, τ²} ~ N( (n_j ȳ_j/σ_j² + ψ/τ²) / (n_j/σ_j² + 1/τ²), [n_j/σ_j² + 1/τ²]^(-1) )

37. Estimating σ_j²
How does σ_j² get estimated? Suppose we were to specify that

σ_1²,...,σ_m² ~ iid IG(ν₀/2, ν₀σ₀²/2)

Then we can use our familiar result that the full conditional distribution of σ_j² is

{σ_j² | y_{1,j},...,y_{n_j,j}, μ_j, ν₀, σ₀²} ~ IG( (ν₀ + n_j)/2, [ν₀σ₀² + Σ_i (y_{i,j} − μ_j)²]/2 )

and estimation for σ_1²,...,σ_m² can proceed by iteratively sampling their values along with the other parameters via GS.
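
A sketch of this group-specific variance draw in R, vectorized over groups (Y again an assumed list of group observation vectors):

    # Full-conditional draws for all group-specific variances at once
    update_sigma2_j <- function(Y, mu, nu0, s20) {
      n  <- lengths(Y)
      ss <- mapply(function(y, m) sum((y - m)^2), Y, mu)   # residual SS per group
      1 / rgamma(length(Y), (nu0 + n) / 2, (nu0 * s20 + ss) / 2)
    }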

38. Inefficient independence
If ν₀ and σ₀² are fixed in advance at some particular values, then our iid IG prior on the variances implies that, for example,

p(σ_m² | σ_1²,...,σ_{m-1}²) = p(σ_m²)

and so the information we may have about σ_1²,...,σ_{m-1}² is not used to help us estimate σ_m². This seems inefficient: if n_m were small, and we saw that σ_1²,...,σ_{m-1}² were tightly concentrated around a particular value, then we would want to use this fact to improve our estimation of σ_m².

39. Heterogeneity in variances
In other words, we want to be able to learn about the sampling distribution of the σ_j²'s and use this information to improve our estimation for groups that may have small sample sizes. This can be done by treating ν₀ and σ₀² as parameters to be estimated, and by thinking of

σ_1²,...,σ_m² ~ iid IG(ν₀/2, ν₀σ₀²/2)

as a sampling model for across-group heterogeneity in population variances (not as a prior distribution).

40. Hierarchical diagram
Putting this together with our model for heterogeneity in population means gives:
[Diagram: (ψ, τ²) → μ_1, ..., μ_m → Y_1, ..., Y_m ← σ_1², ..., σ_m² ← (ν₀, σ₀²).]

41. Unknown parameters
Now, the unknown parameters to be estimated include:
{(μ_1, σ_1²),...,(μ_m, σ_m²)}, representing the within-group sampling distributions;
{ψ, τ²}, representing the across-group heterogeneity in means;
{ν₀, σ₀²}, representing the across-group heterogeneity in variances.

42. Gibbs sampling
As before, the joint posterior distribution for all of these parameters can be approximated by iteratively sampling each parameter from its full conditional distribution given the others. The full conditional distributions for ψ and τ² are unchanged in this new setup, and we just derived the full conditionals for μ_j and σ_j². What remains is to specify prior distributions for ν₀ and σ₀² and obtain their full conditionals.

43. Conjugate prior for σ₀²
A conjugate class of prior densities for σ₀² is the Gamma family. If σ₀² ~ Gamma(a, b), then it is straightforward to show that

{σ₀² | σ_1²,...,σ_m², ν₀} ~ Gamma( a + mν₀/2, b + (ν₀/2) Σ_{j=1}^m 1/σ_j² )

Notice that for small values of a and b, the conditional mean of σ₀² is approximately the harmonic mean of σ_1²,...,σ_m².
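
A sketch of this Gamma draw in R; sigma2 is assumed to be the vector of current group variances, and (a, b) the Gamma shape and rate.

    # Full-conditional draw for sigma0^2
    update_s20 <- function(sigma2, nu0, a, b) {
      m <- length(sigma2)
      rgamma(1, a + m * nu0 / 2, b + (nu0 / 2) * sum(1 / sigma2))
    }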

44. No conjugate prior for ν₀
A simple conjugate prior for ν₀ does not exist. We shall restrict ν₀ to be a positive integer, so that it may retain its interpretation as a prior sample size. If we let the prior on ν₀ be the geometric distribution on {1, 2, ...}, so that p(ν₀) ∝ e^(−αν₀), then the full conditional distribution is proportional to

p(ν₀ | σ₀², σ_1²,...,σ_m²) ∝ p(σ_1²,...,σ_m² | ν₀, σ₀²) p(ν₀)
  ∝ [ (ν₀σ₀²/2)^(ν₀/2) / Γ(ν₀/2) ]^m ( ∏_{j=1}^m 1/σ_j² )^(ν₀/2 + 1) exp{ −ν₀ [ α + (σ₀²/2) Σ_j 1/σ_j² ] }

45. Awkward conditional for ν₀
While this is not pretty, this un-normalized probability density can be computed for a large range of ν₀ values and then sampled from. In fact, since the interpretation of ν₀ is that it is a prior sample size for the σ_j²'s, it is reasonable to restrict it to be smaller than the data sample size N = Σ_{j=1}^m n_j. The expression for p(ν₀ | σ₀², σ_1²,...,σ_m²) may easily be evaluated (and re-normalized) over this range.

46. Example: a particular full conditional for ν₀
[Figure: the discrete full conditional distribution of ν₀ (probability mass against ν₀), computed conditional on σ₀² = 100 and σ_j² = 85 for all j = 1,...,m.]

47. Gibbs sampler
Therefore, we may complete the GS algorithm by sampling from the full conditional for ν₀, which is a discrete distribution over the finite set of values {1, 2, ..., N}. This may be accomplished with the sample function in R.
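
A sketch of this discrete draw in R, following the un-normalized density above and working on the log scale for numerical stability; alpha is the geometric prior rate, and the argument names are illustrative.

    # Full-conditional draw for nu0 over the grid 1:N
    update_nu0 <- function(sigma2, s20, alpha, N) {
      nu0s <- 1:N
      m <- length(sigma2)
      lp <- m * (nu0s / 2 * log(nu0s * s20 / 2) - lgamma(nu0s / 2)) -
            (nu0s / 2 + 1) * sum(log(sigma2)) -
            nu0s * (alpha + (s20 / 2) * sum(1 / sigma2))
      sample(nu0s, 1, prob = exp(lp - max(lp)))   # re-normalization is implicit in sample()
    }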

48. Example: math score data
Let us re-analyze the math score data with our hierarchical model for school-specific means and variances. We shall take the parameters in our prior distributions to be the same as before, with the addition of α = 1 and {a = 1, b = 1/100} for the prior distributions on ν₀ and σ₀², respectively. We shall run the GS algorithm for 5,000 iterations, as before, and compare our results to the hierarchical model with a common group variance.

49. Example: group means and variances
[Figure: posterior densities of ψ and τ under the equal-variance and unequal-variance models, together with the posterior distributions of ν₀ and σ₀² from the new model.]

50. Example: group variances
Our first model, in which all within-group variances are forced to be equal, is equivalent to a value of ν₀ = ∞ in our current hierarchical model. In contrast, a low value of ν₀ = 1 would indicate that the variances are quite unequal, and that little information about variances could be shared across groups. The posterior summaries from the GS output indicate that neither of these extremes is appropriate, as the posterior distribution of ν₀ is concentrated around a moderate value of 14 or 15.

51. Example: group variances
This estimated distribution of σ_1²,...,σ_m² is used to shrink extreme sample variances towards an across-group center.
[Figure: posterior means of σ_j² (in R, apply(sigma.2, 2, mean)) plotted against the sample variances s_j², and the shrinkage s_j² − E{σ_j² | y} plotted against sample size n_j.]
The larger amounts of shrinkage generally occur for groups with smaller sample sizes.

52. Hierarchical binomial model
Another commonly used hierarchical model is the Beta-binomial model, where

Y_j ~ Bin(n_j, θ_j)
θ_j ~ Beta(α, β)
(α, β) ~ p(α, β)

Conditional on α and β, the posterior conditional for θ_j is the familiar Beta distribution:

θ_j | Y_j, α, β ~ Beta(α + y_j, β + n_j − y_j)

53. Hierarchical diagram
[Diagram: (α, β) → θ_1, ..., θ_m → Y_1, ..., Y_m.]
As usual, we treat n_1,...,n_m as known.

54. Hyperprior
Unfortunately, there is no (semi-)conjugate prior for α and β. However, it is possible to set up a non-informative hyperprior that is dominated by the likelihood and yields a proper posterior distribution, and which leads to a convenient Metropolis-within-Gibbs sampling method.

55. A hyperprior choice
A reasonable choice of diffuse hyperprior for the Beta-binomial hierarchical model is uniform on

( α/(α + β), (α + β)^(−1/2) )

A change of variables shows that this implies the following prior on the original scale:

p(α, β) ∝ (α + β)^(−5/2)

There are many other possibilities. Why is this a sensible choice?

56. A hyperprior choice
p(α, β) ∝ (α + β)^(−5/2)
It may be shown that this gives a roughly uniform prior distribution to the log standard deviation of the resulting θ_j ~ Beta(α, β) sampling model. It may also be shown that the posterior marginal p(α, β | y) is proper with this choice as long as 0 < y_j < n_j for at least one experiment j.

57. The posterior marginal
How can we calculate the posterior marginal distribution?

p(α, β | y) = p(α, β, θ | y) / p(θ | α, β, y)                         (cond. prob.)
            ∝ p(y | α, β, θ) p(α, β, θ) / p(θ | α, β, y)               (Bayes' rule)
            = p(y | θ) p(θ | α, β) p(α, β) / p(θ | α, β, y)             (cond. indep. & cond. prob.)
            = [ ∏_{j=1}^m Bin(y_j; n_j, θ_j) ] [ ∏_{j=1}^m Beta(θ_j; α, β) ] p(α, β) / ∏_{j=1}^m Beta(θ_j; α + y_j, β + n_j − y_j)
            ∝ p(α, β) [ Γ(α + β) / (Γ(α)Γ(β)) ]^m ∏_{j=1}^m Γ(α + y_j) Γ(β + n_j − y_j) / Γ(α + β + n_j)
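
For computation it is convenient to work with the logarithm of this quantity. A sketch in R, assuming y and n are the vectors of counts and group sizes and using the (α + β)^(−5/2) hyperprior:

    # Log marginal posterior of (alpha, beta), up to an additive constant
    lpost_ab <- function(alpha, beta, y, n) {
      m <- length(y)
      -2.5 * log(alpha + beta) +                                     # hyperprior
        m * (lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)) +
        sum(lgamma(alpha + y) + lgamma(beta + n - y) - lgamma(alpha + beta + n))
    }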

58. Metropolis-within-Gibbs
We may sample (α^(s+1), β^(s+1)) from p(α, β | y), given the current state (α^(s), β^(s)), by generating:

1. propose α′ ~ Unif(α^(s)/2, 2α^(s))
2. if u < p(α′, β^(s) | y) / p(α^(s), β^(s) | y) for u ~ Unif(0, 1), then α^(s+1) ← α′, else α^(s+1) ← α^(s)
3. propose β′ ~ Unif(β^(s)/2, 2β^(s))
4. if u < p(α^(s+1), β′ | y) / p(α^(s+1), β^(s) | y) for u ~ Unif(0, 1), then β^(s+1) ← β′, else β^(s+1) ← β^(s)
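
A sketch of one sweep of this update in R, using the lpost_ab function above. The Unif(x/2, 2x) proposal is not symmetric, so the sketch also includes the proposal-density (Hastings) ratio as an assumption on my part; the acceptance ratio on the slide shows only the posterior densities.

    # One Metropolis-within-Gibbs sweep for (alpha, beta)
    update_ab <- function(alpha, beta, y, n) {
      a_prop <- runif(1, alpha / 2, 2 * alpha)
      log_r  <- lpost_ab(a_prop, beta, y, n) - lpost_ab(alpha, beta, y, n) +
                log(alpha) - log(a_prop)      # Hastings correction for the asymmetric proposal
      if (log(runif(1)) < log_r) alpha <- a_prop
      b_prop <- runif(1, beta / 2, 2 * beta)
      log_r  <- lpost_ab(alpha, b_prop, y, n) - lpost_ab(alpha, beta, y, n) +
                log(beta) - log(b_prop)
      if (log(runif(1)) < log_r) beta <- b_prop
      c(alpha = alpha, beta = beta)
    }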

59. Completing the MC
Conditional on samples (α^(s), β^(s)), we may complete the Monte Carlo method for sampling from the joint posterior distribution by sampling from the conditionals

θ_j^(s) ~ Beta(α^(s) + y_j, β^(s) + n_j − y_j) for j = 1,...,m

We may also obtain samples from the posterior predictive p(ỹ | y) = p(ỹ_1,...,ỹ_m | y_1,...,y_m) as

ỹ_j^(s) ~ Bin(n_j, θ_j^(s)) for j = 1,...,m
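
Given a stored draw (α^(s), β^(s)), these two steps are direct Monte Carlo; a short R sketch:

    # Complete the MC step: group-level rates and posterior-predictive counts
    draw_theta_ypred <- function(alpha, beta, y, n) {
      theta  <- rbeta(length(y), alpha + y, beta + n - y)   # theta_j draws
      ytilde <- rbinom(length(y), n, theta)                 # predictive replicates
      list(theta = theta, ytilde = ytilde)
    }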

60. Example: risk of tumors in a group of rats
In the evaluation of drugs for possible clinical application, studies are routinely performed on rodents. In a particular study, the aim is to estimate the probability of a tumor in a population of female rats of type F344 that receive a zero dose of the drug (the control group). The data show that 4/14 rats developed endometrial stromal polyps (a kind of tumor). Typically, the mean and standard deviation of the underlying tumor risks are not available to form a prior.

61. Example: prior data
Rather, historical data are available on previous experiments on similar groups of rats. Tarone (1982) provides data on the observations of tumor incidence in 70 groups of rats.
[Figure: number of tumors versus number of rats in each historical group.]

62. Example: Bayesian analysis
We shall model the m = 71 rat tumor data sets with a hierarchical Beta-binomial sampling model and MC(MC) inference, as just described. First we must obtain samples from the marginal posterior of (α, β).
[Figure: trace plots of the sampled α and β values, with their effective sample sizes (ESS).]

63. Example: The posterior marginal
Once we have determined that the mixing is good, and we think the chain has achieved stationarity, we can inspect the marginal posterior in a number of ways.
[Figure: scatterplots of the posterior samples of (α, β), on the original scale and on the scale of log(α/β) versus log(α + β), a sensible transformation.]

64. Example: Mean Shrinkage
As in the normal hierarchical model, we can assess the amount of shrinkage in the group-specific means, which may be obtained by direct MC, in a number of ways.
[Figure: posterior means of θ_j plotted against the raw proportions y_j/n_j, and the shrinkage plotted against group size n_j.]

65. Example: Rat group F344
We can now present the posterior distribution for θ_71, our 71st rat group, and compare it to the population mean of tumor rates in the 70 prior rat groups.
[Figure: posterior density of θ_71 (the current F344 experiment) together with the posterior for the population mean α/(α + β); E{θ_71 | y} = 0.199.]

66. In summary
We have seen how hierarchical models may be used to:
model data which are nested or have a natural hierarchy;
pool information about groups of similar populations, so that smaller groups may borrow information from larger ones (i.e., shrinkage);
provide an efficient way of using prior data in an appropriate way.
