Short Course: Applied Linear and Nonlinear Mixed Models*


Introduction

Mixed-effects models (or simply "mixed models") are like classical ("fixed-effects") statistical models, except that some of the parameters describing group effects or covariate effects are replaced by random variables, or random effects. The model thus has both parameters, also known as fixed effects, and random effects; hence the model has mixed effects.

Random effects can be thought of as random versions of parameters. So, in some sense, a mixed model has both fixed and "random parameters." This can be a useful way to think about it, but it's not quite right and can lead to confusion. The word "parameter" means a fixed, unknown constant, so "random parameter" is really something of an oxymoron. As we'll see, the distinction between a parameter and a random effect goes well beyond vocabulary.

* Temple-Inland Forest Products, Inc., Jan.

Random effects arise when the observations being analyzed are heterogeneous and can be thought of as belonging to several groups or clusters. This happens when there is one observation per experimental unit (tree, patient, plot, animal) and the experimental units occur, or are measured, in different locations, at different time points, from different sires or genetic strains, etc. It also often occurs when repeated measurements are taken on each experimental unit. E.g., several observations are taken through time of the height of each of 100 trees; the repeated height measurements are grouped, or clustered, by tree.

The use of random effects in linear models leads to linear mixed models (LMMs). LMMs are not new. Some examples from this class are among the simplest, most familiar linear models and are very old. However, until recently, software and statistical methods for inference were not well-developed enough to handle the general case. Thus, only recently has the full flexibility and power of this class of models been realized.

Some Simple LMMs: The one-way random effects model

Railway Rails: (See Pinheiro and Bates, 1.1.) The data displayed below are from an experiment conducted to measure longitudinal (lengthwise) stress in railway rails. Six rails were chosen at random and tested three times each by measuring the time it took for a certain type of ultrasonic wave to travel the length of the rail.

(Data table: zero-force travel time (nanoseconds) for three trials on each of the six rails.)

Clearly, these data are grouped, or clustered, by rail. This clustering has two closely related implications:

1. (within-cluster correlation) we should expect that observations from the same rail will be more similar to one another than observations from different rails; and
2. (between-cluster heterogeneity) we should expect that the mean response will vary from rail to rail, in addition to varying from one measurement to the next.

These ideas are really flip sides of the same coin.
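
The rail data are available in R; a minimal sketch, assuming the Rail data set that ships with the nlme package matches the values used in these notes:

   library(nlme)                           ## Rail: 18 rows, columns Rail and travel
   data(Rail)
   with(Rail, tapply(travel, Rail, mean))  ## per-rail sample means of travel time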

Although it is fairly obvious that the clustering by rail must be incorporated into the modeling of these data somehow, we first consider a naive approach. The primary interest here is in measuring the mean travel time. Therefore, we might naively consider the model

   y_ij = µ + e_ij,   i = 1, ..., 6, j = 1, ..., 3,

where y_ij is the travel time for the j-th trial on the i-th rail, and we assume e_11, ..., e_63 ~ iid N(0, σ²). Here, the notation "iid N(0, σ²)" means "are independent, identically distributed random variables, each with a normal distribution with mean 0 and (constant) variance σ²." In addition, µ is the mean travel time, which we wish to estimate. Its maximum likelihood (ML)/ordinary least-squares (OLS) estimate is the grand sample mean of all observations in the data set: ȳ = 66.5. The mean square error (MSE), s², estimates the error variance σ².

However, an examination of the residuals from this model, plotted separately by rail, reveals the inadequacy of the model:

(Figure: boxplots of raw residuals by rail for the simple mean model.)

Clearly, the mean response is changing from rail to rail. Therefore, we consider a one-way ANOVA model:

   y_ij = µ + α_i + e_ij.   (*)

Here, µ is a grand mean across the rails included in the experiment, and α_i is an effect, up or down from the grand mean, specific to the i-th rail. Alternatively, we could define µ_i = µ + α_i as the mean response for the i-th rail and reparameterize this model as

   y_ij = µ_i + e_ij.

The OLS estimates of the parameters of this model are ˆµ_i = ȳ_i, giving

   (ˆµ_1, ..., ˆµ_6) = (54.00, 31.67, 84.67, 96.00, 50.00, 82.67),

and s² is now much smaller. The residual plot looks much better:

(Figure: boxplots of raw residuals by rail for the one-way fixed effects model.)
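
A sketch of both fixed-effects fits in R (the object names fit0 and fit1 are mine, not taken from intro.r):

   fit0 <- lm(travel ~ 1, data = Rail)          ## naive single-mean model
   coef(fit0)                                   ## grand mean, 66.5
   boxplot(resid(fit0) ~ Rail$Rail,             ## residuals cluster by rail
           xlab = "Rail", ylab = "Residual")
   fit1 <- lm(travel ~ Rail - 1, data = Rail)   ## one-way fixed effects model (*)
   coef(fit1)                                   ## the six rail means mu_i-hat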

However, there are still drawbacks to this one-way fixed effects model:

- It only models the specific sample of rails used in the experiment, while the main interest is in the population of rails from which these rails were drawn.
- It does not produce an estimate of the rail-to-rail variability in travel time, which is a quantity of significant interest in the study.
- The number of parameters increases linearly with the number of rails used in the experiment.

These deficiencies are overcome by the one-way random effects model. To motivate this model, consider again the one-way fixed effects model. Model (*) can be written as

   y_ij = µ + (µ_i − µ) + e_ij,

where, under the usual constraint Σ_i α_i = 0, (µ_i − µ) = α_i has mean 0 when averaged over the groups (rails).

The one-way random effects model replaces the fixed parameter (µ_i − µ) with a random effect b_i, a random variable specific to the i-th rail, which is assumed to have mean 0 and an unknown variance σ_b². This yields the model

   y_ij = µ + b_i + e_ij,   (**)

where b_1, ..., b_6 are independent random variables, each with mean 0 and variance σ_b². Often the b_i's are assumed normal, and they are usually assumed independent of the e_ij's. Thus we have

   b_1, ..., b_a ~ iid N(0, σ_b²), independent of e_11, ..., e_an ~ iid N(0, σ²),

where a is the number of rails and n is the number of observations on the i-th rail.
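
A sketch of the random-effects fit (**) with lme() from nlme; random = ~ 1 | Rail requests a rail-specific random intercept b_i:

   fit2 <- lme(travel ~ 1, random = ~ 1 | Rail, data = Rail)
   summary(fit2)   ## mu-hat, plus StdDev estimates for b_i (rail) and e_ij (residual)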

Note that the interpretation of µ now changes from the mean over the 6 rails included in the experiment (fixed effects model) to the mean over the population of all rails from which the six rails were sampled. That is, our scope of inference has changed from the six rails included in the study to the population of rails from which those six rails were drawn.

In addition, we no longer estimate µ_i, the mean response for rail i, which is not of interest. Instead we estimate the population mean µ and the variance from rail to rail in the population, σ_b². Moreover:

- we can estimate the rail-to-rail variability σ_b²; and
- the number of parameters no longer increases with the number of rails tested in the experiment. The parameters in the fixed-effects model were the grand mean µ, the rail-specific effects α_1, ..., α_a, and the error variance σ². In the random effects model, the only parameters are µ, σ², and σ_b².

σ_b² quantifies heterogeneity from rail to rail, which is one consequence of having observations that are grouped or clustered by rail; but what about within-rail correlation?

Unlike a purely fixed-effects model, the one-way random effects model does not assume that all of the responses are independent. Instead, it implies that observations that share the same random effect are correlated. E.g., for two observations from the i-th rail, y_i1 and y_i3 say, the model implies

   y_i1 = µ + b_i + e_i1   and   y_i3 = µ + b_i + e_i3.

That is, y_i1 and y_i3 share the random effect b_i, and are therefore correlated. Why? Because one can easily show that

   var(y_ij) = σ_b² + σ²,
   cov(y_ij, y_ij') = σ_b², for j ≠ j',
   corr(y_ij, y_ij') = ρ ≡ σ_b² / (σ_b² + σ²), for j ≠ j', and
   cov(y_ij, y_i'j') = 0, for i ≠ i'.

That is, if we stack up all of the observations from the i-th rail (the observations that share the random effect b_i) as y_i = (y_i1, ..., y_in)^T, then

   var(y_i) = (σ_b² + σ²) ×
       [ 1  ρ  ...  ρ ]
       [ ρ  1  ...  ρ ]
       [ ...          ]
       [ ρ  ρ  ...  1 ]    (***)

and groups of observations from different rails (those that do not share random effects) are independent.

The variance-covariance structure given by (***) has a special name: compound symmetry. This means that

- observations from the same rail all have constant variance equal to σ² + σ_b²; and
- all pairs of observations from the same rail have constant correlation equal to

   ρ = σ_b² / (σ² + σ_b²).

ρ, the correlation between any two observations from the same rail, is called the intraclass correlation coefficient. In addition, because the total variance of any observation, var(y_ij) = σ_b² + σ², is the sum of two terms, σ_b² and σ² are called variance components.
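
A sketch of extracting the variance components from fit2 and computing the intraclass correlation; this assumes the usual layout of nlme's VarCorr() table, which is returned as character, hence the as.numeric():

   vc  <- VarCorr(fit2)
   s2b <- as.numeric(vc["(Intercept)", "Variance"])   ## sigma_b^2
   s2  <- as.numeric(vc["Residual",    "Variance"])   ## sigma^2
   s2b / (s2b + s2)                                   ## intraclass correlation rho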

Both fixed-effects and random-effects versions of the one-way model are fit to these data in intro.r.

- For (*), the fixed-effects version of the one-way model, we obtain ˆµ = 66.5 with a standard error of about 0.95.
- For (**), the random-effects version of the one-way model, we obtain ˆµ = 66.5 with a standard error of about 10.2.

The standard error is larger in the random-effects model, because this model has a larger scope of inference. That is, the two models are estimating different µ's: the fixed effects model is estimating the grand mean for the six rails in the study; the random effects model is estimating the grand mean for all possible rails. It makes sense that we would be much less certain of (i.e., there would be more error in) our estimate of the latter quantity, especially if there is a lot of rail-to-rail variability.

The usual method of moments/ANOVA/REML estimates of the variance components in model (**) are ˆσ = 4.022 and ˆσ_b = 24.81 (on the standard-deviation scale), so here there is much more between-rail variability than within-rail variability.

The randomized complete block model

Stool Example: In the last example, the data were grouped by rail and we were interested in only one treatment (there was only one experimental condition under which the travel time along the rail was measured). Often, several treatments are of interest and the data are grouped. In a randomized complete block design (RCBD), each of a treatments is observed in each of n blocks.

As an example, consider the data displayed below. These data come from an experiment to compare the ergonomics of four different stool designs. n = 9 subjects were asked to sit in each of a = 4 stools. The response measured was the amount of effort required to stand up.

(Data table: effort required to arise (Borg scale) for each of stool types T1-T4, tested by each of the 9 subjects.)

Here, subjects form the blocks, and we have a complete set of treatments observed in each block (each subject tests each stool). Thus we have an RCBD.

Let y_ij be the response for the j-th stool type tested by the i-th subject. The classical fixed effects model for the RCBD assumes

   y_ij = µ + α_j + β_i + e_ij
        = µ_j + β_i + e_ij,   i = 1, ..., n, j = 1, ..., a,

where e_11, ..., e_na ~ iid N(0, σ²). Here, µ_j is the mean response for the j-th stool type, which can be broken apart into a grand mean µ and a stool-type effect α_j. β_i is a fixed subject effect.

Again, the scope of inference for this model is the set of 9 subjects used in this experiment. If we wish to generalize to the population from which the 9 subjects were drawn, we should consider the subject effects to be random.

The RCBD model with random block effects is

   y_ij = µ_j + b_i + e_ij,

where b_1, ..., b_n ~ iid N(0, σ_b²), independent of e_11, ..., e_na ~ iid N(0, σ²). Since the µ_j's are fixed and the b_i's are random, this is a mixed model.

The variance-covariance structure here is quite similar to that in the one-way random effects model. Again, the model implies that any two observations that share a random effect (i.e., any two observations from the same block) are correlated. In fact, the same compound symmetry structure holds. In particular, if y_i = (y_i1, ..., y_ia)^T is the vector of observations from the i-th block, then as in the last example,

   var(y_i) = (σ_b² + σ²) ×
       [ 1  ρ  ...  ρ ]
       [ ρ  1  ...  ρ ]
       [ ...          ]
       [ ρ  ρ  ...  1 ]

- All pairs of observations from the same block have correlation ρ = σ_b² / (σ² + σ_b²);
- all pairs of observations from different blocks are independent; and
- all observations have variance σ² + σ_b² (two components: the within-block and between-block variances).

The RCBD model is fit to these data in intro.r, first treating block effects as fixed, then as random; see the sketch below.
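
A sketch of the two fits, assuming the ergoStool data from nlme (columns effort, Type, Subject) correspond to this experiment:

   library(nlme)
   fitF <- lm(effort ~ Type + Subject, data = ergoStool)     ## fixed block effects
   fitR <- lme(effort ~ Type, random = ~ 1 | Subject,
               data = ergoStool)                             ## random block effects
   summary(fitR)   ## stool-type effects plus sigma_b and sigma estimates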

It is often stated that whether block effects are assumed random or fixed does not affect the analysis of the RCBD. This is not completely true.

It is true that whether or not blocks are treated as random does not affect the ANOVA F test for treatments. Either way, we test for equal treatment means with the test statistic

   F = MS_Trt / MS_E.

However, there are important differences in the analysis of the two designs, and these differences affect inferences on treatment means. For instance, the variance of a treatment mean is

   var(ȳ_j) = σ²/n                 for fixed block effects,
   var(ȳ_j) = (σ_b² + σ²)/n        for random block effects.

Substituting the usual method-of-moments/ANOVA estimators for σ² and σ_b² leads to a standard error of

   s.e.(ȳ_j) = sqrt( MS_E / n )                             for fixed block effects,
   s.e.(ȳ_j) = sqrt( (MS_Blocks + (a − 1) MS_E) / (na) )    for random block effects.

Again, the standard error of a treatment mean is larger in the random effects model, because the scope of inference is broader. For these data, s.e.(ˆµ_j) = .367 in the fixed block effects model, and s.e.(ˆµ_j) = .576 in the random block effects model.

For these data, the estimated between- and within-subject variance components (as standard deviations) are ˆσ_b = 1.332 and ˆσ = 1.100. This means that the estimated correlation between any pair of observations on the same subject is

   ˆρ = ˆσ_b² / (ˆσ² + ˆσ_b²) = 1.332² / (1.100² + 1.332²) ≈ 0.59.
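
A sketch reproducing the two standard errors from the ANOVA mean squares (a = number of treatments, n = number of blocks; row/column names follow the ergoStool fit above):

   av  <- anova(lm(effort ~ Type + Subject, data = ergoStool))
   MSE <- av["Residuals", "Mean Sq"]
   MSB <- av["Subject",   "Mean Sq"]
   n <- 9; a <- 4
   sqrt(MSE / n)                          ## s.e. of a treatment mean, fixed blocks
   sqrt((MSB + (a - 1) * MSE) / (n * a))  ## s.e. of a treatment mean, random blocks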

A Split-Plot Model

Grass Example: A split-plot experimental design is one in which two sizes of experimental unit are used. The larger experimental unit, known as the whole plot, is randomized to some experimental design (an RCBD, say). The whole plot is then subdivided into smaller units, known as split plots, which are assigned to a second experimental design within each whole plot.

Example: A study of the effects of three bacterial inoculation treatments and two cultivar types on grass yield was conducted as follows. Four fields, or blocks, were divided in half, and the two cultivars (A_1 and A_2) were assigned at random to be grown in the two halves of each field. Then each half-field (the whole plot) was divided into three subunits, or split plots, and the three inoculation treatments (B_1, B_2, and B_3) were randomly assigned to the three split plots in each whole plot.

(Diagram and data table: the four blocks, each divided into two whole plots (cultivars A_1, A_2), and each whole plot divided into three split plots receiving inoculation treatments B_1-B_3, with the observed yields.)

Here it was easier to randomize the planting of the two cultivars to a few large units (the whole plots) than to many small units (the split plots). Convenience is the motivation for this design.

Here the 8 columns within the four rectangles are the whole plots, and cultivar is the whole plot factor. The 24 smaller squares within the columns are the split plots, and inoculation type is the split plot factor.

The Data: y_ijk, i = 1, ..., a (levels of the whole-plot factor), j = 1, ..., n (blocks), k = 1, ..., b (levels of the split-plot factor). That is, y_ijk is the response for the i-th cultivar in the j-th block treated with the k-th inoculation type.

Model:

   y_ijk = µ + α_i + τ_j + b_ij + β_k + (αβ)_ik + e_ijk,

where

   α_i = effect of the i-th cultivar,
   τ_j = effect of the j-th block (treated here as fixed),
   β_k = effect of the k-th inoculation treatment,
   (αβ)_ik = cultivar-by-inoculation interaction effect.

In addition, the b_ij's ~ iid N(0, σ_b²), independent of the e_ijk's ~ iid N(0, σ²).

The b_ij's are sometimes described as whole-plot error terms. In a sense, that is what a random effect is: an additional error term in the model.

The b_ij's are random effects for the whole plots (one for each half-field). They account for:

- heterogeneity from one whole plot to the next (quantified by σ_b²); and
- correlation among the three split plots within a given whole plot.

Again, the variance-covariance structure in this model is compound symmetric. The model implies

   var(y_ij) = (σ_b² + σ²) ×
       [ 1  ρ  ...  ρ ]
       [ ρ  1  ...  ρ ]
       [ ...          ]
       [ ρ  ρ  ...  1 ]

where y_ij = (y_ij1, y_ij2, ..., y_ijb)^T (the vector of all observations on the (i, j)-th whole plot). This means:

- all pairs of observations from the same whole plot have correlation ρ = σ_b² / (σ² + σ_b²);
- all pairs of observations from different whole plots are independent; and
- all observations have variance σ² + σ_b² (two components: the within-whole-plot and between-whole-plot variances).

The split-plot model is fit to these data with the lme() function in S-PLUS/R in intro.r; a sketch follows below. Note that here we've treated blocks as fixed. Later, we'll return to this example and model block effects as random.
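
A minimal sketch of that fit, assuming a data frame grass with (hypothetical) columns yield, cult, inoc, and block; the grouping variable identifies the whole plot (a block-by-cultivar half-field):

   grass$wplot <- with(grass, interaction(block, cult))   ## whole-plot identifier
   grass.lme <- lme(yield ~ block + cult * inoc,
                    random = ~ 1 | wplot, data = grass)
   anova(grass.lme)   ## cultivar is tested against whole-plot error (3 d.f.)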

The Experimental Unit, Pseudoreplication, D.F., & Balance

The split-plot design involves two different experimental units: the whole plot and the split plot. Whole plots are randomly assigned to the whole plot factor; e.g., half-fields were randomized to the two cultivars.

There are many fewer whole-plot experimental units than there are observations (which equals the number of split plots in the experiment): only 8 half-fields, but 24 observations, in the grass experiment. With respect to cultivar, then, 8 experimental units are randomized, so degrees of freedom for testing cultivar are based on a sample size of 8. At the whole-plot level, we have an RCBD with two treatments (cultivars) and four blocks, so the error d.f. for testing cultivars is the error d.f. in an RCBD of this size, namely (2 − 1)(4 − 1) = 3.

With respect to cultivar, the measurements on the three split plots in each whole plot are pseudoreplicates (or subsamples). That is, they are not independently randomized to cultivars and thus provide no additional d.f. (information) regarding cultivar effects.

In some sense, modeling whole plots with random effects:

- identifies the appropriate error term for the whole plot factor;
- identifies the appropriate d.f. (amount of relevant information in the data/design) for testing the whole plot factor (cultivar); and
- identifies which units are true experimental units and which are pseudoreplicates with respect to each experimental factor.

If a purely fixed-effects model is used in the split-plot design,* then the usual MSE and DFE based upon the e_ijk error term will lead to incorrect inferences on the whole plot factor. See model grass.lm1 in intro.r. Correct inferences on whole plot factors can sometimes be obtained from a fixed-effects analysis, but you have to really know what you're doing, especially in complex situations like split-split-plot models, etc.; and the design has to be balanced (rare!).

So, the use of random effects is also motivated by:

- the use of multiple sizes of experimental units with distinct randomizations;
- the presence of pseudoreplication; and
- imbalance.

Mixed-effects models handle these complications much more automatically than fixed-effects models and, consequently, avoid the incorrect inferences to which fixed-effects models are prone in these situations.

* This would be done by modeling variability among whole plots with a fixed cult*block interaction effect.

A More Complex Example

PMRC Site Preparation Study: A study of various site preparation and intensive management regimes on the growth of slash pine, involving plots nested within 16 sites in the lower coastal plain of GA and FL. The data consist of repeated plot-level measurements of hd = dominant height (m), ba = basal area (m²/ha), tph (100s of trees per ha), derived volume (total volume outside bark in m³/ha), and other variables, at ages 2, 5, 8, 11, 14, 17, and 20 years.

At each site, plots were randomized to eleven treatments consisting of a subset of the 2^5 = 32 combinations of five two-level (absent/present) treatment factors:

   A = Chop, site prep with a single pass of a rolling drum chopper;
   B = Fert, fertilizer following the 1st, 12th, and 17th growing seasons;
   C = Burn, broadcast burn of the site prior to planting;
   D = Bed, a double-pass bedding of the site;
   E = Herb, vegetation control with chemical herbicide.

Here is a plot of the data. Each panel represents a site, and each panel contains the tph-over-time profiles for each plot on that site.

(Figure: trees/hectare (100s of trees) versus age (yrs), one profile per plot, one panel per site.)

These data are grouped, or clustered, by site. We would expect heterogeneity from site to site, and we would expect correlation among plots within the same site.

The data are also grouped by plot, since we have repeated measures through time on each plot. Again, we would expect plots to be heterogeneous, and we expect stronger correlation among observations from the same plot than among observations from different plots.

In addition, we'd like to make inferences about the population of plantation sites for which these sites are representative, not just these sites alone. We would also like to be able to generalize to the population from which these plots are drawn. Hence, it makes sense to model sites with random site effects and plots with random plot effects. Plots are nested within sites, so this is an example of a multilevel mixed model.

In addition, plots are randomized to treatments, and then repeated measures through time are taken on each plot. With respect to treatments, plots are the experimental unit, but measurement occurs at a finer scale: times within plots. These time-specific measurements are a bit like measurements on split plots. However, in a split-plot example, observations from the same whole plot are correlated due to shared characteristics of that whole plot, and these are captured by whole-plot random effects. In a repeated measures context, observations through time from the same unit are correlated due to shared characteristics of that unit, and are also subject to serial correlation (observations taken close together in time are more similar than observations taken far apart in time).

Thus, in a repeated measures context, we may want both random effects and serial correlation built into our model. We'll soon see how multilevel random effects, serial correlation, and other features can be handled in the general form of the LMM; a sketch of such a fit appears below.
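
A minimal sketch of such a model in lme(), with entirely hypothetical names (a data frame pmrc with columns tph, trt, age, site, plot): nested random intercepts for sites and for plots within sites, plus serial correlation among the repeated measures on a plot. corCAR1() (continuous-time AR(1)) is used here because the measurement ages are unequally spaced.

   fit <- lme(tph ~ trt * age,
              random      = ~ 1 | site/plot,                    ## multilevel random effects
              correlation = corCAR1(form = ~ age | site/plot),  ## serial correlation in time
              data        = pmrc)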

Fixed vs. random effects: The effects in the model account for variability in the response across levels of treatment and design factors. The decision as to whether fixed effects or random effects should be used depends upon what the appropriate scope of generalization is.

If it is appropriate to think of the levels of a factor as randomly drawn from, or otherwise representative of, a population to which we'd like to generalize, then random effects are suitable. Design or grouping factors are usually more appropriately modeled with random effects: e.g., blocks (sections of land) in an agricultural experiment, days when an experiment is conducted over several days, lab technicians when measurements are taken by several technicians, subjects in a repeated measures design, or locations/sites along a river when we desire to generalize to the entire river.

If, however, the specific levels of the factor are of interest in and of themselves, then fixed effects are more appropriate. Treatment factors are usually more appropriately modeled with fixed effects: e.g., in experiments to compare drugs, amounts of fertilizer, hybrids of corn, teaching techniques, or measurement devices, these factors are most appropriately modeled with fixed effects.

A good litmus test for whether the level of some factor should be treated as fixed is to ask whether it would be of broad interest to report a mean for that level. For example, if I'm conducting an experiment in which each of four different classes of third-grade students is taught with each of three methods of instruction (e.g., in a crossover design), then it will be of broad interest to report the mean response (level of learning, say) for a particular method of instruction, but not for a particular classroom of third graders. Here, fixed effects are appropriate for instruction method, random effects for class.

Preliminaries/Background

In order to really understand the LMM, we need to study it in its vector/matrix form. So, we need to discuss/review random vectors and the multivariate normal distribution. We also need to review the classical linear model (CLM) before generalizing to the LMM. Estimation in the CLM is based on least squares, but in the LMM, maximum likelihood (ML) estimation is used; therefore, we need to cover/review the basic ideas of ML estimation.

Random Vectors:

Random Vector: A vector whose elements are random variables, e.g.,

   y = (y_1, y_2, ..., y_n)^T,

where y_1, y_2, ..., y_n are each random variables. Random vectors we will be concerned with:

- a vector containing the response variable measured on n units in the sample: y = (y_1, ..., y_n)^T;
- a vector of error terms in a model for y: e = (e_1, ..., e_n)^T;
- a vector of random effects: b = (b_1, b_2, ..., b_q)^T.

Expected Value: The expected value (population mean) of a random vector is the vector of expected values, often denoted µ. For y (n × 1),

   E(y) = (E(y_1), E(y_2), ..., E(y_n))^T = (µ_1, µ_2, ..., µ_n)^T = µ.

(Population) Variance-Covariance Matrix: For a random vector y = (y_1, y_2, ..., y_n)^T (n × 1) with mean µ = (µ_1, µ_2, ..., µ_n)^T, the matrix

   E[(y − µ)(y − µ)^T] =
       [ var(y_1)       cov(y_1, y_2)  ...  cov(y_1, y_n) ]
       [ cov(y_2, y_1)  var(y_2)       ...  cov(y_2, y_n) ]
       [ ...                                              ]
       [ cov(y_n, y_1)  cov(y_n, y_2)  ...  var(y_n)      ]

with (i, j)-th element σ_ij (so σ_ii = var(y_i) and σ_ij = cov(y_i, y_j)) is called the variance-covariance matrix of y and is denoted var(y).

(Population) Correlation Matrix: For a random vector y (n × 1), the population correlation matrix is the matrix of correlations among the elements of y:

   corr(y) =
       [ 1               corr(y_1, y_2)  ...  corr(y_1, y_n) ]
       [ corr(y_2, y_1)  1               ...  corr(y_2, y_n) ]
       [ ...                                                 ]
       [ corr(y_n, y_1)  corr(y_n, y_2)  ...  1              ]

Recall: for random variables y_i and y_j,

   corr(y_i, y_j) = cov(y_i, y_j) / sqrt( var(y_i) var(y_j) )

measures the amount of linear association between y_i and y_j. Correlation matrices are symmetric.
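
A quick sketch of the sample versions of these matrices in R (simulated data; rows are observations of a 3-dimensional random vector):

   Y <- cbind(y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))
   S <- cov(Y)     ## sample variance-covariance matrix (variances on the diagonal)
   cov2cor(S)      ## corresponding correlation matrix (1's on the diagonal)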

Properties of expected value and variance: Let x, y be random vectors of the same dimension, and let C and c be a matrix and a vector, respectively, of constants. Then

1. E(y + c) = E(y) + c.
2. E(x + y) = E(x) + E(y).
3. E(Cy) = C E(y).
4. var(y + c) = var(y).
5. var(y + x) = var(y) + var(x) + cov(y, x) + cov(x, y), where the covariance terms are 0 if x and y are independent.
6. var(Cy) = C var(y) C^T.

Multivariate Normal Distribution: The multivariate normal distribution is to a random vector as the univariate (usual) normal distribution is to a random variable. It is the version of the normal distribution appropriate to the joint distribution of several random variables (collected and stacked as a vector) rather than a single random variable.

Recall that we write y ~ N(µ, σ²) to signify that the univariate random variable y has the normal distribution with mean µ and variance σ². This means that y has probability density function (p.d.f.)

   f_Y(y) = (1 / sqrt(2πσ²)) exp{ −(y − µ)² / (2σ²) }.

Meaning: for two values y_1 < y_2, the area under the graph of the p.d.f. between y_1 and y_2 gives Pr(y_1 < Y < y_2).

We write y ~ N_n(µ, Σ) to denote that the random vector y follows the n-dimensional multivariate normal distribution with mean µ and variance-covariance matrix Σ. E.g., for a bivariate random vector y = (y_1, y_2)^T ~ N_2(µ, Σ), the p.d.f. of y maps out a bell over the (y_1, y_2) plane, centered at µ, with spread described by Σ.

In the multivariate case, for y ~ N_n(µ, Σ), the p.d.f. of y is

   f(y) = (2π)^{−n/2} |Σ|^{−1/2} exp{ −(1/2) (y − µ)^T Σ^{−1} (y − µ) },

where |Σ| denotes the determinant of the var-cov matrix Σ.
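
A sketch of evaluating this density numerically; dmvnorm() from the mvtnorm package (an assumed add-on, not part of base R) implements the formula above:

   library(mvtnorm)
   mu    <- c(0, 0)
   Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)        ## bivariate example
   dmvnorm(c(0.3, -0.2), mean = mu, sigma = Sigma)  ## f(y) at y = (0.3, -0.2)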

Review of the Classical (Fixed-Effects) Linear Model

Assume we observe a sample of independent pairs (y_1, x_1), ..., (y_n, x_n), where y_i is a response variable and x_i = (x_i1, ..., x_ip)^T is a p × 1 vector of explanatory variables. The classical linear model can be written

   y_i = β_1 x_i1 + ... + β_p x_ip + e_i = x_i^T β + e_i,   i = 1, ..., n,

where e_1, ..., e_n ~ iid N(0, σ²). Equivalently, we can stack these n equations and write the model as

   y = Xβ + e,

where y = (y_1, ..., y_n)^T, X is the n × p matrix whose i-th row is x_i^T = (x_i1, x_i2, ..., x_ip), β = (β_1, ..., β_p)^T, and e = (e_1, ..., e_n)^T.

Our assumptions on e_1, ..., e_n can be equivalently restated as e ~ N_n(0, σ² I_n). Since y = Xβ + e and e ~ N_n(0, σ² I_n), it follows that y is n-variate normal too:

   y ~ N_n(Xβ, σ² I_n).

The var-cov matrix for y is σ² I_n, the diagonal matrix with σ² down the diagonal and 0 elsewhere: the y_i's are uncorrelated and have constant variance σ². Therefore, in the CLM, y is assumed to have a multivariate normal joint p.d.f.
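
A sketch of the stacked form in R: model.matrix() builds the design matrix X from a model formula (simulated data with hypothetical names x1, x2):

   d <- data.frame(x1 = rnorm(6), x2 = rnorm(6))
   X <- model.matrix(~ x1 + x2, data = d)   ## rows are x_i^T: intercept, x1, x2
   X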

Estimation of β and σ²:

Maximum likelihood estimation: In general, the likelihood function is just the probability density function, but thought of as a function of the parameters rather than of the data. Interpretation: the likelihood function quantifies how likely the data are for a given value of the parameters.

The idea behind maximum likelihood estimation is to find the values of β and σ² under which the data are most likely. That is, we find the β and σ² that maximize the likelihood function, or equivalently the loglikelihood function, for the value of y actually observed. These values are the maximum likelihood estimates (MLEs) of the parameters.

For the CLM, the loglikelihood is

   l(β, σ²; y) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) (y − Xβ)^T (y − Xβ),

where the first term is a constant and the remaining two terms are the kernel of l.

Notice that maximizing l(β, σ²; y) with respect to β is equivalent to maximizing the third term,

   −(1/(2σ²)) (y − Xβ)^T (y − Xβ),

which is equivalent to minimizing

   (y − Xβ)^T (y − Xβ) = Σ_{i=1}^n (y_i − x_i^T β)²   (the least-squares criterion).

(y − Xβ)^T (y − Xβ) is the squared distance between y and its mean Xβ. The parameter estimate ˆβ minimizes this distance; that is, ˆβ gives the estimated mean Xˆβ that is closest to y. So, the estimators of β given by ML and (ordinary) least squares (OLS) coincide: for β in the CLM, ML = OLS, and if X is of full rank (the model is not overparameterized), then

   ˆβ = (X^T X)^{−1} X^T y.
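
A sketch verifying the closed form against lm() on simulated data:

   set.seed(1)
   n <- 50; x <- rnorm(n); y <- 2 + 3 * x + rnorm(n)
   X <- cbind(1, x)                           ## design matrix with intercept column
   beta.hat <- solve(t(X) %*% X, t(X) %*% y)  ## (X'X)^{-1} X'y
   cbind(beta.hat, coef(lm(y ~ x)))           ## the two columns agree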

Estimation of σ²: Setting the partial derivative of l with respect to σ² to 0 and solving leads to the MLE of σ²:

   ˆσ²_ML = (1/n) (y − Xˆβ)^T (y − Xˆβ) = (1/n) Σ_i (y_i − x_i^T ˆβ)² = SS_E / n.

Problem: This estimator is biased for σ². The bias can be easily fixed, which leads to the generally preferred estimator

   ˆσ² = (1/(n − p)) (y − Xˆβ)^T (y − Xˆβ) = SS_E / (n − p) = SS_E / df_E = MS_E.

Note that the MLE of σ² is biased, and this is due to using the wrong value for df_E (the divisor for SS_E). df_E = n − p is the information in the data left for estimating σ² after having estimated β_1, ..., β_p. Because ˆσ²_ML uses n rather than n − p, it is often said that the MLE of σ² fails to account for the d.f. used (or lost) in estimating β.

MS_E, the preferred estimator of σ², is an example of what is known as a restricted ML (REML) estimator. As we'll see, REML is the preferred method of estimating variance components in LMMs. The method simply generalizes the use of ˆσ² = MS_E rather than ˆσ²_ML in the CLM.
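
A sketch contrasting the two estimates, continuing the simulated fit above (p = 2: intercept and slope):

   fit <- lm(y ~ x)
   SSE <- sum(resid(fit)^2)
   SSE / n         ## ML estimate of sigma^2 (biased downward)
   SSE / (n - 2)   ## MSE, the REML-type estimate (equals summary(fit)$sigma^2)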

Example — Volume of Cherry Trees: For 31 black cherry trees, the following measurements were obtained:

   V = volume of usable wood (cubic feet),
   H = height of tree (feet),
   D = diameter at breast height (inches).

Goal: Predict usable wood volume from diameter and height. See the S-PLUS script, backgrnd.r.

Here, we first consider a simple multiple regression model, cherry.lm1, for these data:

   V_i = β_0 + β_1 H_i + β_2 D_i + e_i,   i = 1, ..., 31.

Initial plots of V against both explanatory variables, D and H, look linear, so this model may be reasonable. cherry.lm1 gives a high R² of .941, and most residual plots look pretty good. However, the plot of residuals vs. diameter looks U-shaped, so we consider some other models for these data.
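
These data ship with R as the built-in trees data frame (Girth = D, Height = H, Volume = V), so a sketch of cherry.lm1 is:

   cherry.lm1 <- lm(Volume ~ Height + Girth, data = trees)
   summary(cherry.lm1)$r.squared          ## high R^2, about .94
   plot(trees$Girth, resid(cherry.lm1))   ## U-shaped pattern in diameter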

Inference in the CLM: Under the basic assumptions of the CLM (independence, homoscedasticity, normality), ˆβ, the ML/OLS estimator of β, has distribution

   ˆβ ~ N(β, σ² (X^T X)^{−1}).

That is,

- ˆβ is unbiased for β;
- ˆβ has var-cov matrix σ² (X^T X)^{−1}, so ˆβ_j has standard error s.e.(ˆβ_j) = sqrt( MS_E [(X^T X)^{−1}]_jj );
- ˆβ is normally distributed.

It can also be shown that ˆβ is an optimal estimator (BLUE, UMVUE). These properties lead to a number of normal-theory methods of inference:

1. t tests and confidence intervals for an individual regression coefficient β_j, based on

   (ˆβ_j − β_j) / s.e.(ˆβ_j) ~ t(n − p),

the t distribution with n − p d.f. A 100(1 − α)% CI for β_j is given by ˆβ_j ± t_{1−α/2}(n − p) s.e.(ˆβ_j). For an α-level test of H_0: β_j = β_j0 versus H_1: β_j ≠ β_j0, we use the rule: reject H_0 if

   |ˆβ_j − β_j0| / s.e.(ˆβ_j) > t_{1−α/2}(n − p).

Tests of H_0: β_j = 0 for each β_j are given by the summary() function in S-PLUS/R.

2. More generally, inference on linear combinations c^T β of the β_j's (e.g., contrasts) is based on the t distribution:

   (c^T ˆβ − c^T β) / sqrt( MS_E c^T (X^T X)^{−1} c ) ~ t(n − p).

E.g., a 100(1 − α)% C.I. for the expected response at a given value x_0 of the vector of explanatory variables is given by

   x_0^T ˆβ ± t_{1−α/2}(n − p) sqrt( MS_E x_0^T (X^T X)^{−1} x_0 ).

A 100(1 − α)% prediction interval for the response on a new subject with vector of explanatory variables x_0 is given by

   x_0^T ˆβ ± t_{1−α/2}(n − p) sqrt( MS_E (1 + x_0^T (X^T X)^{−1} x_0) ).

Confidence intervals for fitted and predicted values are given by the predict() function in S-PLUS/R; see the sketch below.

3. Inference on the entire vector β is based on the fact that

   (ˆβ − β)^T (X^T X) (ˆβ − β) / (p MS_E) ~ F(p, n − p),

the F distribution with p and n − p d.f. E.g., we can test any hypothesis of the form H_0: Aβ = c, where A is a k × p matrix of constants (e.g., contrast coefficients), with an F test. The appropriate test has rejection rule: reject if

   F = (Aˆβ − c)^T {A (X^T X)^{−1} A^T}^{−1} (Aˆβ − c) / (k MS_E) > F_{1−α}(k, n − p).

4. The fit of nested models can be compared via an F test. This is accomplished with the anova() function in S-PLUS/R.
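
A sketch of these intervals for the cherry fit above (the x_0 values are hypothetical):

   confint(cherry.lm1, level = 0.95)                 ## t-based CIs for each beta_j
   x0 <- data.frame(Height = 75, Girth = 12)
   predict(cherry.lm1, x0, interval = "confidence")  ## CI for the mean response at x0
   predict(cherry.lm1, x0, interval = "prediction")  ## PI for a new tree at x0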

Clustered Data: Clustered data are data collected on subjects/animals/trees/units which are heterogeneous, falling into natural groupings, or clusters, based upon characteristics of the units themselves or of the experimental design, but not on the basis of treatments or interventions.

The most common example of clustered data is repeated measures data. By repeated measures, people typically mean data consisting of multiple measurements of essentially the same variable on a given subject or unit of observation. Repeated measurements are typically taken through time, but can be at different spatial locations, or can arise from multiple measuring devices, observers, etc. When repeated measures are taken through time, the terms longitudinal data and panel data are roughly synonymous. We'll use the more generic term clustered data to refer to any of these situations.

Clustered data also include data from split-plot designs, crossover designs, hierarchical sampling, and designs with pseudoreplication/subsampling.

Advantages of longitudinal/clustered data:

- Allow study of individual patterns of change, i.e., growth.
- Economize on experimental units.
- Heterogeneous experimental units are often better representative of the population to which we'd like to generalize.
- Each subject/unit can serve as his or her own control. E.g., in a split-plot experiment or crossover design, comparisons between treatments can be done within the same subject; in a longitudinal study, comparisons of time effects (growth) can be made within a subject rather than between subjects. Between-unit heterogeneity can be eliminated when assessing treatment or time effects, leading to more power/efficiency (think paired t-test versus two-sample t-test).

Disadvantages:

- Correlation and multiple sources of heterogeneity in the data make statistical methods harder to understand and implement. LMMs are flexible enough to deal with these features.
- Imbalance and incompleteness in the data are more common. This can be hard for some statistical methods, especially if missing data are not missing at random. LMMs handle unbalanced data relatively easily and well.

Linear Mixed Models (LMMs)

We will present the LMM for clustered data. It can be presented and used in a somewhat more general context, but most applications are to clustered data, and this is a simpler case to discuss/understand.

Examples revisited:

Example 1, One-way random effects model — Rails: Recall that we had three observations on each of 6 rails. Model:

   y_ij = µ + b_i + e_ij,   i = 1, ..., 6, j = 1, ..., 3,

where y_ij is the response from the j-th measurement on the i-th rail, µ is the grand mean response across the population of all rails, b_i is the random effect for the i-th rail, and e_ij is an error term.

The data are clustered by rail. The model for all data from the i-th rail can be written in vector/matrix form:

   [ y_i1 ]   [ 1 ]       [ 1 ]        [ e_i1 ]
   [ y_i2 ] = [ 1 ] µ  +  [ 1 ] b_i +  [ e_i2 ]
   [ y_i3 ]   [ 1 ]       [ 1 ]        [ e_i3 ]

or

   y_i = X_i β + Z_i b_i + e_i.

Example 2, RCBD model — Stools: Recall that we had n = 9 subjects, each of whom tested all a = 4 stool designs under study. Model:

   y_ij = µ_j + b_i + e_ij,   i = 1, ..., n, j = 1, ..., a,

where y_ij is the response for the j-th stool tested by the i-th subject, µ_j is the mean response for stool type j across the population of all subjects, b_i is the random effect for the i-th subject, and e_ij is an error term.

The data are clustered by subject. The model for all data from the i-th subject can be written in vector/matrix form:

   [ y_i1 ]   [ 1 0 0 0 ] [ µ_1 ]   [ 1 ]       [ e_i1 ]
   [ y_i2 ] = [ 0 1 0 0 ] [ µ_2 ] + [ 1 ] b_i + [ e_i2 ]
   [ y_i3 ]   [ 0 0 1 0 ] [ µ_3 ]   [ 1 ]       [ e_i3 ]
   [ y_i4 ]   [ 0 0 0 1 ] [ µ_4 ]   [ 1 ]       [ e_i4 ]

or

   y_i = X_i β + Z_i b_i + e_i.

Example 3, Split-plot model — Grass: Recall that we had 8 whole plots (half-fields) randomized to an RCBD, each then split into 3 split plots, which were randomized to the 3 inoculation types. Model:

   y_ijk = µ + α_i + β_k + (αβ)_ik + τ_j + b_ij + e_ijk,

where y_ijk is the response from the split plot assigned to the k-th inoculation type within the (i, j)-th whole plot (which is assigned to the i-th cultivar in the j-th block). In addition,

   µ = grand mean,
   α_i = i-th cultivar effect (fixed),
   β_k = k-th inoculation type effect (fixed),
   (αβ)_ik = cultivar × inoculation interaction effect (fixed),
   τ_j = j-th block effect (treated as fixed, but could be random),
   b_ij = effect for the (i, j)-th whole plot (random),
   e_ijk = error term (random).

The data are clustered by whole plot. The model for all data from the (i, j)-th whole plot can be written in vector/matrix form as

   y_ij = X_ij β + Z_ij b_ij + e_ij,

where β = (µ, α_i, β_1, β_2, β_3, (αβ)_i1, (αβ)_i2, (αβ)_i3, τ_j)^T collects the fixed effects, X_ij is the corresponding 3-row design matrix of 0's and 1's, and Z_ij = (1, 1, 1)^T.

The Linear Mixed Model for Clustered Data: Notice that all 3 of the previous examples have the same form. They are all examples of LMMs with a single (univariate) random effect: a random cluster-specific intercept.

Suppose we have data on n clusters, where y_i = (y_i1, ..., y_i,ti)^T are the t_i observations available on the i-th cluster, i = 1, ..., n. Then the LMM with a random cluster-specific intercept is given (in general) by

   y_i = X_i β + Z_i b_i + e_i,   i = 1, ..., n,

where X_i is a t_i × p design matrix for the fixed effects β, Z_i is a t_i × 1 vector of ones, and e_i is a vector of error terms. If you're not comfortable with the vector/matrix representation, another way to write it is

   y_ij = β_1 x_1ij + β_2 x_2ij + ... + β_p x_pij + z_ij b_i + e_ij,

where z_ij = 1; the β terms are the fixed part, and z_ij b_i + e_ij is the random part.

Assumptions:

- Cluster effects: the b_i's are independent, normal, with variance (variance component) σ_b².
- Error terms: the e_ij's are independent, normal, with variance (variance component) σ². We will relax both the assumption of independence and that of constant variance (homoscedasticity) later.
- The b_i's and e_ij's are assumed independent of each other.

Often, it makes sense to have more than one random effect in the model. To motivate this, let's consider another example.

Example — Microfibril Angle in Loblolly Pine: Whole-disk cross-sectional microfibril angle (MFA) was measured at 1.4, 4.6, 7.6, 10.7, and 13.7 meters up the stem of 59 trees, sampled from four physiographic regions. The regions (no. of trees) were Atlantic Coastal Plain (24), Piedmont (17), Gulf Coastal Plain (9), and Hilly Coastal Plain (9).

(Figure: whole-disk cross-sectional microfibril angle (deg) versus height on stem (m), with separate panels for the Atlantic, Gulf, Hilly, and Piedmont regions.)

Here we have 4 or 5 repeated measures on each tree — repeated not through time, but through space, up the stem of the tree. Any reasonable model would account for:

- heterogeneity between individual trees;
- correlation among observations on the same tree; and
- dependence of MFA on the height at which it is measured.

From the plots it is clear that MFA decreases with height. For simplicity, suppose it decreases linearly with height (it doesn't, but let's keep things easy). Let y_ijk be the MFA on the j-th tree in the i-th region, measured at the k-th height. Then a reasonable model might be

   y_ijk = µ_i + β height_ijk + b_ij + e_ijk,

where µ_i + β height_ijk is the fixed part, and

   µ_i = mean response for the i-th region,
   β = slope for the linear effect of height on MFA,
   b_ij = random effect for the (i, j)-th tree,
   e_ijk = error term for the height-specific measurements.

The fixed part of the model says that MFA decreases linearly in height, with an intercept that depends on region; i.e., mean MFA is different from one region to the next. The random effects (the b_ij's) say that the intercept varies from tree to tree within region.

Rather than just random tree-specific intercepts, suppose we believe that the slope (the linear effect of height on MFA) also varies from tree to tree. This leads to a random intercept and slope model:

   y_ijk = (µ_i + b_1ij) + (β + b_2ij) height_ijk + e_ijk
         = µ_i + β height_ijk + b_1ij + b_2ij height_ijk + e_ijk,

where µ_i + b_1ij is the intercept and β + b_2ij is the slope. Now there are two random effects, b_1ij and b_2ij, or a bivariate random effect b_ij = (b_1ij, b_2ij)^T.

There is no reason to expect that an individual tree's effect on the intercept would be independent of that same tree's effect on the slope. So, we would assume b_1ij and b_2ij are correlated (probably negatively).

The model can be written as

   [ y_ij1 ]   [ 1  height_ij1 ]            [ 1  height_ij1 ]             [ e_ij1 ]
   [  ...  ] = [      ...      ] ( µ_i ) +  [      ...      ] ( b_1ij ) + [  ...  ]
   [ y_ij5 ]   [ 1  height_ij5 ] (  β  )    [ 1  height_ij5 ] ( b_2ij )   [ e_ij5 ]

or

   y_ij = X_ij β + Z_ij b_ij + e_ij.

So, the LMM in general may have more than one random effect, which leads us to the general form of the model:

   y_i = X_i β + Z_i b_i + e_i,   i = 1, ..., n,

where

   X_i = design matrix for the fixed effects,
   β = p × 1 vector of fixed effects (parameters),
   Z_i = design matrix for the random effects,
   b_i = q × 1 vector of random effects,
   e_i = vector of error terms.

If you're not comfortable with the vector/matrix representation, another way to write it is

   y_ij = β_1 x_1ij + β_2 x_2ij + ... + β_p x_pij + z_1ij b_1i + ... + z_qij b_qi + e_ij,

where the β terms are the fixed part and the remaining terms are the random part.

Assumptions:

- Cluster effects: the b_i's are normal and independent from cluster to cluster. We allow b_1i, ..., b_qi (random effects from the same cluster, e.g., a random intercept and slope) to be correlated, with var-cov matrix D: b_i ~ iid N_q(0, D).
- Error terms: the e_ij's are independent, normal, with variance (variance component) σ²; that is, e_i ~ iid N_ti(0, σ² I). We will relax both the assumption of independence and that of constant variance (homoscedasticity) later.
- The b_i's and e_ij's are assumed independent of each other.

Example — Microfibril Angle in Loblolly Pine (Continued)

Recall the original random-intercept (only) model:

   y_ijk = µ_i + β height_ijk + b_ij + e_ijk.

This model is fit with the lme() function in LMM.R:

> mfa.lme1 <- lme(mfa ~ regname + diskht - 1, data = mfa, random = ~ 1 | tree)
> summary(mfa.lme1)
Linear mixed-effects model fit by REML
 Data: mfa
       AIC  BIC  logLik
Random effects:
 Formula: ~1 | tree
        (Intercept)  Residual
StdDev:
Fixed effects: mfa ~ regname + diskht - 1
                  Value  Std.Error  DF  t-value  p-value
regnameAtlantic
regnameGulf
regnameHilly
regnamePiedmont
diskht
Correlation:
                 rgnmAt  rgnmGl  rgnmHl  rgnmPd
regnameGulf
regnameHilly
regnamePiedmont
diskht
Standardized Within-Group Residuals:
   Min    Q1    Med    Q3    Max
Number of Observations: 274
Number of Groups: 59

The random intercept and random slope model was

   y_ijk = µ_i + β height_ijk + b_1ij + b_2ij height_ijk + e_ijk.

This model can be fit with lme() too, but an easy way to refit a model with a slight change is via update():

> mfa.lme2 <- update(mfa.lme1, random = ~ diskht | tree)
> summary(mfa.lme2)
Linear mixed-effects model fit by REML
 Data: mfa
       AIC  BIC  logLik
Random effects:
 Formula: ~diskht | tree
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev  Corr
(Intercept)         (Intr)
diskht
Residual
Fixed effects: mfa ~ regname + diskht - 1
                  Value  Std.Error  DF  t-value  p-value
regnameAtlantic
regnameGulf
regnameHilly
regnamePiedmont
diskht
Correlation:
                 rgnmAt  rgnmGl  rgnmHl  rgnmPd
regnameGulf
regnameHilly
regnamePiedmont
diskht
Standardized Within-Group Residuals:
   Min    Q1    Med    Q3    Max
Number of Observations: 274
Number of Groups: 59

Questions:

- The models were fit by REML. What does that mean?
- Which model is better?
- How do we know whether the model assumptions are met (diagnostics)?
- How do we predict MFA at a given height for a given tree? For the population of all trees from a given region?

Estimation and Inference in the LMM:

Estimation: In the classical linear model, the usual method of estimation is ordinary least squares. However, we saw that if we assume normal errors, then OLS gives the same estimates of β as maximum likelihood (ML) estimation.

In the LMM, there are fixed effects β, but also parameters related to the distribution of the random effects (e.g., variance components such as σ_b²) as well as parameters related to the error terms (e.g., the error variance σ²). Least squares doesn't provide a framework for estimation and inference for all of these parameters, so ML and related likelihood-based methods (i.e., restricted maximum likelihood, or REML) are generally preferred.

ML: Recall that ML proceeds by finding the parameter values that maximize the loglikelihood, or joint p.d.f., of the data; it finds the parameter values under which the observed data are most likely.

Since the LMM assumes that the errors are normal, the random effects are normal, and the response y is linearly related to the errors and random effects via y = Xβ + Zb + e, it's not hard to show that the LMM implies that the response vector y is normal too. That is, it is easy to show that

- observations from different clusters are independent, with y_i ~ N(X_i β, V_i), where V_i = Z_i D Z_i^T + σ² I;
- the joint p.d.f. of the data is multivariate normal; and
- the loglikelihood is a sum, over clusters, of logs of multivariate normal p.d.f.'s.

This loglikelihood is easy to write down, but requires an iterative algorithm to maximize. It is implemented optionally in lme() with the method="ML" option.
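
A sketch of the implied marginal covariance V_i = Z_i D Z_i^T + σ² I, built by hand for one rail (t_i = 3) from the variance components of the earlier rail fit fit2 (D is 1 × 1 for a random intercept):

   s2b <- as.numeric(VarCorr(fit2)["(Intercept)", "Variance"])
   s2  <- as.numeric(VarCorr(fit2)["Residual",    "Variance"])
   Zi  <- matrix(1, 3, 1)                     ## random-intercept design for one rail
   Vi  <- s2b * Zi %*% t(Zi) + s2 * diag(3)   ## Z_i D Z_i' + sigma^2 I
   Vi                                         ## compound symmetric, as shown earlier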

REML: Recall from the classical linear model that the MLE of σ² was biased — it did not adjust for the d.f. lost in estimating β (the fixed effects) — and that we instead used MSE as the preferred estimator of σ².

REML was developed as a general likelihood-based methodology that would be applicable to all LMMs, but which would

- take account of the d.f. lost in estimating β, to produce less biased estimates of variance-covariance parameters (e.g., variance components) than ML; and
- generalize the old, well-known, unbiased estimators in those simple cases of the LMM where such estimators are known; e.g., REML yields MSE as its estimator of σ² in the CLM.

REML is based upon maximizing the restricted loglikelihood, which can be thought of as that portion of the loglikelihood that doesn't depend on β. Like ML estimation, it requires an iterative algorithm to produce estimates.

REML is the default estimation method for the lme() function and for PROC MIXED in SAS. It's generally regarded as the preferred method of estimation for LMMs. However, some aspects of model selection are easier with ML, so sometimes competing models are fit and compared with ML, and then the best model is refit with REML at the end.

Inference on Fixed Effects: Remember, the framework for estimation and inference in the LMM is ML or REML, not least squares as in the CLM. The standard methods of inference in a likelihood-based framework are Wald tests and likelihood ratio tests (LRTs).

LRTs and Wald tests are based upon asymptotic theory. That is, they provide methods that hold exactly as the sample size goes to infinity, and only approximately for finite sample sizes.

LRTs are useful for comparing nested models. They shouldn't be used for comparing random-effect structures/variance-covariance structures, and shouldn't be used with REML, only with ML.

Wald tests are useful for testing linear hypotheses (e.g., contrasts) on fixed effects. Wald tests yield approximate z and chi-square tests. These tests can be refined into t and F tests to produce better inferences in small samples.

Wald Tests: It can be shown that the approximate (i.e., large-sample) distribution of the (restricted) ML estimator ˆβ in the LMM is

   ˆβ ~ N( β, [ Σ_{i=1}^n X_i^T V_i^{−1} X_i ]^{−1} ),   (*)

where V_i = Z_i D Z_i^T + σ² I and the bracketed quantity is var(ˆβ). In practice, var(ˆβ) is estimated by plugging in the final (restricted) ML estimates obtained from fitting the model. The standard error of ˆβ_j, the j-th component of ˆβ, is obtained as the square root of the j-th diagonal element of v̂ar(ˆβ).

The distributional result (*) leads to the general Wald test on β. In particular, for A a k × p matrix, we reject H_0: Aβ = c at level α if

   (Aˆβ − c)^T {A [v̂ar(ˆβ)] A^T}^{−1} (Aˆβ − c) > χ²_{1−α}(k),

where χ²_{1−α}(k) is the 1 − α quantile of a chi-square distribution on k d.f.

As a special case, an approximate z test of H_0: β_j = 0 versus H_1: β_j ≠ 0 rejects H_0 if

   |ˆβ_j| / s.e.(ˆβ_j) > z_{1−α/2},

where z_{1−α/2} is the (1 − α/2) quantile of a standard normal distribution. In addition, an approximate 100(1 − α)% CI for β_j is given by ˆβ_j ± z_{1−α/2} s.e.(ˆβ_j).
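
A sketch of the Wald chi-square computed by hand for an lme fit, using the intercept-only rail fit fit2 from earlier (so A is 1 × 1, c = 0, and the test is of H_0: µ = 0):

   A  <- matrix(1, 1, 1)
   bh <- fixef(fit2)                ## beta-hat (here just mu-hat)
   Vb <- vcov(fit2)                 ## estimated var(beta-hat)
   W  <- t(A %*% bh) %*% solve(A %*% Vb %*% t(A)) %*% (A %*% bh)
   pchisq(as.numeric(W), df = nrow(A), lower.tail = FALSE)   ## approximate p-value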


More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs) 36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)

More information

Estimation: Problems & Solutions

Estimation: Problems & Solutions Estimation: Problems & Solutions Edps/Psych/Soc 587 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline 1. Introduction: Estimation

More information

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects Topic 5 - One-Way Random Effects Models One-way Random effects Outline Model Variance component estimation - Fall 013 Confidence intervals Topic 5 Random Effects vs Fixed Effects Consider factor with numerous

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression

More information

Introduction to Random Effects of Time and Model Estimation

Introduction to Random Effects of Time and Model Estimation Introduction to Random Effects of Time and Model Estimation Today s Class: The Big Picture Multilevel model notation Fixed vs. random effects of time Random intercept vs. random slope models How MLM =

More information

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE Biostatistics Workshop 2008 Longitudinal Data Analysis Session 4 GARRETT FITZMAURICE Harvard University 1 LINEAR MIXED EFFECTS MODELS Motivating Example: Influence of Menarche on Changes in Body Fat Prospective

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University. Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall

More information

Review of CLDP 944: Multilevel Models for Longitudinal Data

Review of CLDP 944: Multilevel Models for Longitudinal Data Review of CLDP 944: Multilevel Models for Longitudinal Data Topics: Review of general MLM concepts and terminology Model comparisons and significance testing Fixed and random effects of time Significance

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

Describing Change over Time: Adding Linear Trends

Describing Change over Time: Adding Linear Trends Describing Change over Time: Adding Linear Trends Longitudinal Data Analysis Workshop Section 7 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section

More information

Linear Regression (9/11/13)

Linear Regression (9/11/13) STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

Outline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form

Outline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form Outline Statistical inference for linear mixed models Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark general form of linear mixed models examples of analyses using linear mixed

More information

Review of Statistics

Review of Statistics Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and

More information

Linear Regression. Chapter 3

Linear Regression. Chapter 3 Chapter 3 Linear Regression Once we ve acquired data with multiple variables, one very important question is how the variables are related. For example, we could ask for the relationship between people

More information

An overview of applied econometrics

An overview of applied econometrics An overview of applied econometrics Jo Thori Lind September 4, 2011 1 Introduction This note is intended as a brief overview of what is necessary to read and understand journal articles with empirical

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Outline for today. Maximum likelihood estimation. Computation with multivariate normal distributions. Multivariate normal distribution

Outline for today. Maximum likelihood estimation. Computation with multivariate normal distributions. Multivariate normal distribution Outline for today Maximum likelihood estimation Rasmus Waageetersen Deartment of Mathematics Aalborg University Denmark October 30, 2007 the multivariate normal distribution linear and linear mixed models

More information

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation PRE 905: Multivariate Analysis Spring 2014 Lecture 4 Today s Class The building blocks: The basics of mathematical

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Mixed models Yan Lu March, 2018, week 8 1 / 32 Restricted Maximum Likelihood (REML) REML: uses a likelihood function calculated from the transformed set

More information

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions Spatial inference I will start with a simple model, using species diversity data Strong spatial dependence, Î = 0.79 what is the mean diversity? How precise is our estimate? Sampling discussion: The 64

More information

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 An Introduction to Multilevel Models PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 Today s Class Concepts in Longitudinal Modeling Between-Person vs. +Within-Person

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

1 Least Squares Estimation - multiple regression.

1 Least Squares Estimation - multiple regression. Introduction to multiple regression. Fall 2010 1 Least Squares Estimation - multiple regression. Let y = {y 1,, y n } be a n 1 vector of dependent variable observations. Let β = {β 0, β 1 } be the 2 1

More information

Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models

Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models EPSY 905: Multivariate Analysis Spring 2016 Lecture #12 April 20, 2016 EPSY 905: RM ANOVA, MANOVA, and Mixed Models

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline

More information

Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood

Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood Regression Estimation - Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. β 0, β 1 Q = n (Y i (β 0 + β 1 X i )) 2 i=1 Minimize this by maximizing

More information

The Standard Linear Model: Hypothesis Testing

The Standard Linear Model: Hypothesis Testing Department of Mathematics Ma 3/103 KC Border Introduction to Probability and Statistics Winter 2017 Lecture 25: The Standard Linear Model: Hypothesis Testing Relevant textbook passages: Larsen Marx [4]:

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix

More information

A brief introduction to mixed models

A brief introduction to mixed models A brief introduction to mixed models University of Gothenburg Gothenburg April 6, 2017 Outline An introduction to mixed models based on a few examples: Definition of standard mixed models. Parameter estimation.

More information

Introduction to Estimation Methods for Time Series models. Lecture 1

Introduction to Estimation Methods for Time Series models. Lecture 1 Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation

More information

Bias Variance Trade-off

Bias Variance Trade-off Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]

More information

LECTURE 15: SIMPLE LINEAR REGRESSION I

LECTURE 15: SIMPLE LINEAR REGRESSION I David Youngberg BSAD 20 Montgomery College LECTURE 5: SIMPLE LINEAR REGRESSION I I. From Correlation to Regression a. Recall last class when we discussed two basic types of correlation (positive and negative).

More information

Random Intercept Models

Random Intercept Models Random Intercept Models Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline A very simple case of a random intercept

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment.

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Course information: Instructor: Tim Hanson, Leconte 219C, phone 777-3859. Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Text: Applied Linear Statistical Models (5th Edition),

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Mixed effects models

Mixed effects models Mixed effects models The basic theory and application in R Mitchel van Loon Research Paper Business Analytics Mixed effects models The basic theory and application in R Author: Mitchel van Loon Research

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Introduction to the Analysis of Hierarchical and Longitudinal Data

Introduction to the Analysis of Hierarchical and Longitudinal Data Introduction to the Analysis of Hierarchical and Longitudinal Data Georges Monette, York University with Ye Sun SPIDA June 7, 2004 1 Graphical overview of selected concepts Nature of hierarchical models

More information

Part 4: Multi-parameter and normal models

Part 4: Multi-parameter and normal models Part 4: Multi-parameter and normal models 1 The normal model Perhaps the most useful (or utilized) probability model for data analysis is the normal distribution There are several reasons for this, e.g.,

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Mixed effects models - II Henrik Madsen, Jan Kloppenborg Møller, Anders Nielsen April 16, 2012 H. Madsen, JK. Møller, A. Nielsen () Chapman & Hall

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

11 Hypothesis Testing

11 Hypothesis Testing 28 11 Hypothesis Testing 111 Introduction Suppose we want to test the hypothesis: H : A q p β p 1 q 1 In terms of the rows of A this can be written as a 1 a q β, ie a i β for each row of A (here a i denotes

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1 Regression diagnostics As is true of all statistical methodologies, linear regression analysis can be a very effective way to model data, as along as the assumptions being made are true. For the regression

More information

Homework 3 - Solution

Homework 3 - Solution STAT 526 - Spring 2011 Homework 3 - Solution Olga Vitek Each part of the problems 5 points 1. KNNL 25.17 (Note: you can choose either the restricted or the unrestricted version of the model. Please state

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

A (Brief) Introduction to Crossed Random Effects Models for Repeated Measures Data

A (Brief) Introduction to Crossed Random Effects Models for Repeated Measures Data A (Brief) Introduction to Crossed Random Effects Models for Repeated Measures Data Today s Class: Review of concepts in multivariate data Introduction to random intercepts Crossed random effects models

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n

More information