Sparse Matrix Methods and Mixed-effects Models


Douglas Bates, University of Wisconsin-Madison

Outline

- Theory and Practice of Statistics
- Challenges from Data Characteristics
- Linear Mixed Model Formulation
- Evaluating the deviance
- Other Criteria and Models

Theory and Practice of Statistics

We celebrate the 50th anniversary of the founding of our department this summer. From its inception our department has had as its goal developing excellence in both the theory and the practice of statistics, and fostering the interaction of theory and practice. Involvement in the practice of statistics, even if only by taking the required course on statistical consulting, provides a grounding in real problems addressed by real clients with their inevitably messy real data. Knowledge of theory grounds the practice of statistics in a solid framework. It isn't enough to fit models and report estimates and p-values. We should also assess the validity of the assumptions in the model. You can't do this if you don't know what the model is.

Why Consider the Interplay of Theory and Practice?

Deriving properties of models without reference to data is sterile, because "all models are wrong; some models are useful" (G.E.P. Box). Ideally the model is derived from properties of the data. The opposite approach (posit a model, derive its properties, and then go looking for some data that follow such a model) is not likely to prove useful. Conversely, deriving parameter estimates without assessing, or sometimes even knowing, the model is a risky practice.

> fortune("provocatively")

To paraphrase provocatively, machine learning is statistics minus any checking of models and assumptions.
   -- Brian D. Ripley (about the difference between machine learning and statistics)
      useR! 2004, Vienna (May 2004)

What Positive Role Does Computing Play?

Computing is an integral part of essentially all applications of statistics. It gives us the ability to explore complex models applied to large data sets with complex structure. In terms of theory, computing is widely used in simulation studies. It also provides (or should provide) a grounding for proposed methods. We have powerful computers, but not infinitely powerful and not with an infinite amount of storage.

What Negative Role Does Computing Play?

If you define the extent of statistical analysis by the capabilities of available software, you tend to shoehorn the data into a prefabricated model. The noted linguist Benjamin Lee Whorf observed, "Language shapes the way we think, and determines what we can think about." This is true not only for natural languages but also for computing languages. For example, not long ago many people believed that applied statistics is the use of SAS. More to the point for this discussion, many people believe that linear mixed models must be hierarchical models, even when the data are not hierarchical.

Combining Practice, Theory and Computing

To the extent possible, the methodology should encompass the characteristics of data encountered in real-world situations (large data sets with complex structures, such as non-nested random effects; unbalanced data is almost a given). Methodology should have a firm theoretical basis. Before you fit a model you should be able to write it down. Before you write code to produce some estimates, you should be able to describe the criterion optimized by those estimates, even if you are optimizing it indirectly. Theory should teach us that there isn't "the" formula or "the" representation. Often there are many representations of the same problem; take advantage of this. Computational methods should be reliable, robust, and based on stable calculations. They should handle the edge cases. As Kernighan and Plauger state, your code should be able to "do nothing gracefully" when appropriate.

Data challenges

As already stated, we find more and more that we encounter large, unbalanced data sets with complex structure. (Compare, for example, the data for M.S. Exam problems from 20 years ago, 10 years ago, and today.) For mixed-effects models one current challenge is the analysis of annual test scores on students in grades 3 to 8 mandated by the No Child Left Behind Act. These are longitudinal data, grouped by student, school and district. The act also mandates relating the scores to demographic variables (sex, race/ethnicity, socioeconomic status). Many states currently do not record information on teachers; that will change because of the Race to the Top program. Models with random effects associated with student, teacher, school and district will not be hierarchical (also called "multilevel"). The random effects will be partially crossed (i.e. neither nested nor fully crossed).
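The distinction between nested, fully crossed and partially crossed grouping factors can be checked mechanically. A minimal sketch (in Python rather than R, with an invented set of student/teacher pairs) illustrates the "partially crossed" case described above:

```python
# Toy illustration (hypothetical data) of nested versus partially crossed
# grouping factors. Nested: each student appears under exactly one teacher.
# Fully crossed: every student appears under every teacher. Partially
# crossed: neither holds, as with students changing teachers across years.
from collections import defaultdict

# (student, teacher) pairs over several years -- invented for illustration
obs = [("s1", "tA"), ("s1", "tB"), ("s2", "tA"),
       ("s2", "tC"), ("s3", "tB"), ("s3", "tC")]

teachers_per_student = defaultdict(set)
for student, teacher in obs:
    teachers_per_student[student].add(teacher)

# Nested would mean every student has exactly one teacher.
nested = all(len(ts) == 1 for ts in teachers_per_student.values())

# Fully crossed would mean every student meets every teacher.
all_teachers = {t for _, t in obs}
fully_crossed = all(ts == all_teachers for ts in teachers_per_student.values())

print(nested, fully_crossed)  # False False: the factors are partially crossed
```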

The Wisconsin Knowledge and Concepts Exam (WKCE)

> str(wkce)
'data.frame': ... obs. of 14 variables:
 $ LDS : Factor w/ ... levels "10000","10001",..
 $ Yr  : Ord.factor w/ 4 levels ...
 $ Gr  : Ord.factor w/ 7 levels "3"<"4"<"5"<"6"<..
 $ Rss : int ...
 $ Mss : int ...
 $ sch : Factor w/ 2411 levels "6","7","16","56",..
 $ dist: Factor w/ 445 levels "7","14","63",..
 $ Sx  : Factor w/ 2 levels "F","M"
 $ Ra  : Factor w/ 5 levels "W","B","H","A",..
 $ Dis : Factor w/ 2 levels "N","Y"
 $ ELP : Factor w/ 2 levels "N","Y"
 $ EC  : Factor w/ 2 levels "N","Y"
> xtabs(~ Yr + Gr, WKCE)

(table of counts by Yr and Gr lost in transcription)

How Do They Get the Scaled Scores?

The scores on a given test (about 50 questions) are not a simple tally of the number of correct answers. They are calculated according to "pattern scoring" determined by an Item Response Theory (IRT) model. The student abilities are assumed to be a sample from a distribution, and the question difficulties are also a sample. A reasonable model would be a generalized linear mixed model (binary response) with (crossed) random effects for student and for question. IRT models often go further and incorporate discrimination parameters for the questions, and even a per-question baseline probability of a correct response (i.e. a guessing parameter). Building this into a mixed model would result in a generalized nonlinear mixed model with crossed random effects. In practice, IRT models are fit by ad hoc methodologies with a poor theoretical basis, producing really weird results.
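The item-response probabilities described above can be sketched directly. This is a hypothetical illustration, not the scoring code used in practice: the Rasch (1PL) model uses only ability and difficulty, while the 3PL variant adds the discrimination and guessing parameters mentioned in the text.

```python
import math

def rasch_prob(ability, difficulty):
    """P(correct) under the Rasch (1PL) model: logistic(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def three_pl_prob(ability, difficulty, discrimination, guessing):
    """3PL model: a discrimination slope plus a guessing floor on P(correct)."""
    p = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * p

# A student whose ability equals the item difficulty answers correctly
# with probability 1/2 under the Rasch model.
print(rasch_prob(0.0, 0.0))  # 0.5
```

In the GLMM formulation of the slide, ability and difficulty would be crossed random effects for student and question; the 3PL extras are what push the model into the generalized nonlinear class.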

Formulating Linear Mixed-effects Models (LMMs)

We see linear mixed-effects models specified in many different ways. Usually we first see the "subscript fest" formulation

    y_ijklmn... = μ + α_i + β_j + b_k + ...

which is not generalizable and which confounds many aspects of the model. Later we may see a vector representation with model matrices

    y = Xβ + Zb + ε,   ε ~ N(0, σ²I),   b ~ N(0, Σ),   b ⊥ ε

but this too confounds aspects of the model. Writing a linear model as Xβ + ε separates the mean (the linear predictor) from the variance, ε ~ N(0, σ²I). In a mixed model, is Zb part of the mean or part of the variance? In a generalized linear model or generalized linear mixed model (GLMM) you can't separate the mean and the variance.
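For a single grouping factor with a simple scalar random effect, the model matrix Z in the vector representation is just an indicator matrix of the factor levels. A small sketch in Python/NumPy (the construction is language-agnostic; lmer builds these matrices internally from the model formula):

```python
import numpy as np

def indicator_matrix(factor):
    """Random-effects model matrix Z for a simple scalar random effect:
    one indicator column per level of the grouping factor."""
    levels = sorted(set(factor))
    col = {lev: j for j, lev in enumerate(levels)}
    Z = np.zeros((len(factor), len(levels)))
    for i, lev in enumerate(factor):
        Z[i, col[lev]] = 1.0
    return Z

# six observations grouped into three levels (invented grouping)
Z = indicator_matrix(["a", "a", "b", "b", "c", "c"])
print(Z.shape)        # (6, 3)
print(Z.sum(axis=1))  # each row has exactly one 1
```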

My Current Formulation of LMMs

It helps to follow the advice we give introductory students and distinguish between a random variable (Y or B) and its value (y or b). We observe y but not b. The probability model specifies the conditional distribution

    (Y | B = b) ~ N(Zb + Xβ, σ²I_n)

and the unconditional distribution

    B ~ N(0, Σ),

where X and Z are n × p and n × q model matrices, σ ≥ 0, and Σ is a positive semidefinite q × q variance-covariance matrix determined by the variance-component parameters. Note the emphasis on semidefinite: σ = 0 does not occur in practice, but singular Σ does. That is, estimates of zero for variance components can and do occur. During the course of numerical optimization it is common to want to evaluate likelihoods on the boundary of the parameter space.
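Reading this model generatively, one draws b from the unconditional distribution of B and then y from the conditional distribution of (Y | B = b). A toy simulation in Python/NumPy, with invented dimensions and variance components:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, q = 12, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # n x p fixed-effects matrix
Z = np.kron(np.eye(q), np.ones((n // q, 1)))           # n x q indicators: 4 obs per group
beta = np.array([1.0, 2.0])                            # invented fixed effects
sigma, Sigma = 0.5, 0.8 * np.eye(q)                    # invented variance components

b = rng.multivariate_normal(np.zeros(q), Sigma)        # B ~ N(0, Sigma)
y = rng.normal(Z @ b + X @ beta, sigma)                # (Y | B = b) ~ N(Zb + Xbeta, sigma^2 I_n)
print(y.shape)  # (12,)
```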

Expressing Σ

Model specification in lmer produces a parameterization that generates Σ through a general q × q matrix Λ_θ as

    Σ = σ² Λ_θ Λ_θᵀ,

where θ is the variance-component parameter. For models with simple scalar random effects the elements of θ are ratios of standard deviations, θ_i = σ_i/σ, subject to θ_i ≥ 0, and Λ_θ is block-diagonal with blocks of the form θ_i I. Let U ~ N(0, σ²I_q) be the "spherical" random effects for which B = Λ_θ U, so that

    (Y | U = u) ~ N(Xβ + ZΛ_θu, σ²I_n).

It is easy to verify that B has the desired properties. Note that the transformation from U to B is well-defined even when Λ_θ is singular.
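The claimed properties of B = Λ_θU can be checked by simulation, including the boundary case where one component of θ is zero and Λ_θ is therefore singular. A sketch with invented values:

```python
import numpy as np

sigma = 1.5
theta = np.array([0.0, 2.0])               # first component on the boundary
Lam = np.kron(np.diag(theta), np.eye(2))   # block-diagonal, blocks theta_i * I

rng = np.random.default_rng(0)
U = sigma * rng.standard_normal((4, 100_000))  # spherical U ~ N(0, sigma^2 I_q)
B = Lam @ U                                    # B = Lambda_theta U: fine even for singular Lam

Sigma = sigma**2 * Lam @ Lam.T                 # the target covariance of B
print(np.allclose(np.cov(B), Sigma, atol=0.3))  # True, up to Monte Carlo error
```

The zero rows of Λ_θ simply force the corresponding components of B to zero, which is exactly the "estimated variance component of zero" situation the text describes.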

Densities of U and Y

The joint density f_{U,Y}(u, y) = f_U(u) f_{Y|U}(y | u), expressed on the deviance scale (negative twice the log density), is

    (n + q) log(2πσ²) + (||y − Xβ − ZΛ_θu||² + ||u||²) / σ².

Evaluated at the observed y, this expression (on the density scale) provides the unnormalized conditional density, h(u | y), of (U | Y = y). The normalizing constant, ∫ h(u | y) du, is the likelihood of the parameters θ, β and σ given y.
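The deviance-scale expression translates directly into code. A sketch with dense toy matrices and invented values; at u = 0 with a zero residual it reduces to the constant term (n + q) log(2πσ²):

```python
import numpy as np

def joint_deviance(u, y, X, Z, Lam, beta, sigma):
    """-2 log of the joint density f_{U,Y}(u, y) for the Gaussian LMM."""
    n, q = len(y), len(u)
    resid = y - X @ beta - Z @ (Lam @ u)
    return (n + q) * np.log(2 * np.pi * sigma**2) + (resid @ resid + u @ u) / sigma**2

# Invented toy values: with u = 0 and y equal to its conditional mean,
# only the constant (n + q) log(2 pi sigma^2) remains.
n, q, sigma = 4, 2, 1.0
X, beta = np.ones((n, 1)), np.array([2.0])
Z, Lam = np.eye(n)[:, :q], np.eye(q)
y = X @ beta
print(joint_deviance(np.zeros(q), y, X, Z, Lam, beta, sigma))
```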

53 Evaluating the likelihood
For a LMM there are several different expressions for the likelihood. One method that generalizes to other forms of mixed models is first to determine the conditional mode
    ũ = arg max_u f_{U|Y}(u | y) = arg max_u h(u | y) = arg min_u ‖y − Xβ − Z Λ_θ u‖² + ‖u‖²,
and to expand h at ũ. Because we are dealing with multivariate Gaussians, the log conditional density is exactly quadratic in u. For other types of models it is only approximately quadratic.

55 Solving the Penalized Least Squares Problem
The point of all this development is that we can solve the penalized least squares problem, which can be rewritten as
    ũ = arg min_u ‖ [y − Xβ; 0] − [Z Λ_θ; I_q] u ‖²,
with solution satisfying
    (Λ_θᵀ Zᵀ Z Λ_θ + I_q) ũ = Λ_θᵀ Zᵀ (y − Xβ).
We do this by forming the sparse Cholesky factor, a sparse lower-triangular matrix L_θ satisfying
    L_θ L_θᵀ = Λ_θᵀ Zᵀ Z Λ_θ + I_q.
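A dense numpy stand-in for this computation (in lme4 the factor L_θ is sparse; here a small dense Cholesky illustrates the same pair of triangular solves, with all matrices simulated):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, p = 12, 4, 2                       # hypothetical dimensions
X = rng.normal(size=(n, p))
Z = rng.normal(size=(n, q))
Lam = np.diag([0.5, 0.5, 1.2, 1.2])      # hypothetical Lambda_theta
beta = np.array([2.0, -1.0])
y = rng.normal(size=n)

A = Lam.T @ Z.T @ Z @ Lam + np.eye(q)    # Lambda' Z'Z Lambda + I_q
rhs = Lam.T @ Z.T @ (y - X @ beta)
L = np.linalg.cholesky(A)                # L L' = A (dense stand-in for L_theta)

# Two solves with the triangular factor give the conditional mode u~
w = np.linalg.solve(L, rhs)              # forward solve: L w = rhs
u_tilde = np.linalg.solve(L.T, w)        # back solve:    L' u = w
```

`np.linalg.solve` is used as a general solver for brevity; with a genuinely sparse L_θ one would use sparse triangular solves instead.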

57 Practical Aspects of the Sparse Cholesky Decomposition
When working with very large matrices one must be careful about how certain computations are performed. Sparse matrix methods usually proceed in two phases: a symbolic phase, in which the positions of the non-zeros in the factor are determined, and a numeric phase, in which the numerical values of those non-zeros are calculated. During optimization of the likelihood we must evaluate L_θ for many different values of θ; the symbolic phase does not need to be repeated, only the numeric phase. Part of the symbolic phase is determining a fill-reducing permutation, represented by a q × q permutation matrix P. The actual decomposition used is
    L_θ L_θᵀ = P (Λ_θᵀ Zᵀ Z Λ_θ + I_q) Pᵀ.
Incorporating P does not affect the theory. It can profoundly affect time and storage requirements.
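The effect of the permutation on fill-in can be seen even in a tiny dense example. The classic case is an "arrow" matrix: factoring the hub row first produces a completely dense factor, while permuting it last preserves sparsity. This is a numpy sketch with an invented matrix; real implementations choose P with orderings such as approximate minimum degree during the symbolic phase:

```python
import numpy as np

# "Arrow" matrix: one hub variable connected to all others.  Factoring
# with the hub FIRST fills the factor in completely; permuting the hub
# LAST keeps it sparse -- the point of a fill-reducing permutation P.
q = 8
A = 4.0 * np.eye(q)
A[0, 1:] = A[1:, 0] = 1.0             # hub in position 0

def nnz_cholesky(M, tol=1e-12):
    """Count non-zeros in the lower Cholesky factor of M."""
    L = np.linalg.cholesky(M)
    return int(np.sum(np.abs(L) > tol))

perm = np.r_[1:q, 0]                  # move the hub to the last position
P = np.eye(q)[perm]                   # permutation matrix
nnz_bad = nnz_cholesky(A)             # dense triangle: q(q+1)/2 = 36
nnz_good = nnz_cholesky(P @ A @ P.T)  # arrow preserved: 2q - 1 = 15
```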

61 Expansion at ũ
The penalized residual sum of squares (PRSS) can now be written
    ‖y − Xβ − Z Λ_θ u‖² + ‖u‖² = r²_{θ,β} + ‖L_θᵀ (u − ũ)‖²,
where r²_{θ,β} is the PRSS at ũ. A simple change of variable can then be used to evaluate the likelihood. On the deviance scale it is
    −2ℓ(θ, β, σ) = n log(2πσ²) + log(|L_θ|²) + r²_{θ,β}/σ².
Notice that β affects only r²_{θ,β}. If we minimize the PRSS simultaneously w.r.t. u and β, producing r²_θ, and plug in the conditional estimate of σ, we obtain the profiled deviance
    d(θ | y) = log(|L_θ|²) + n [1 + log(2π r²_θ / n)].
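These pieces can be verified against the marginal distribution of Y directly, since Y ~ N(Xβ, σ²(ZΛ_θΛ_θᵀZᵀ + I_n)). A numpy check with simulated arrays; the O(n³) marginal route is for verification only, the Cholesky route is the one that scales:

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, p = 15, 4, 2                    # hypothetical dimensions
X = rng.normal(size=(n, p))
Z = rng.normal(size=(n, q))
Lam = np.diag([0.9, 0.9, 0.4, 0.4])   # hypothetical Lambda_theta
beta = np.array([1.0, 0.5])
sigma = 1.1
y = rng.normal(size=n)

# Deviance via the Cholesky route of the slides
A = Lam.T @ Z.T @ Z @ Lam + np.eye(q)
L = np.linalg.cholesky(A)
cu = np.linalg.solve(L, Lam.T @ Z.T @ (y - X @ beta))
u_tilde = np.linalg.solve(L.T, cu)
r2 = np.sum((y - X @ beta - Z @ Lam @ u_tilde) ** 2) + u_tilde @ u_tilde
dev = n * np.log(2 * np.pi * sigma**2) + 2 * np.sum(np.log(np.diag(L))) + r2 / sigma**2

# Same quantity from the marginal Y ~ N(X beta, sigma^2 V),
# V = Z Lam Lam' Z' + I -- an O(n^3) check, not the production route
V = Z @ Lam @ Lam.T @ Z.T + np.eye(n)
resid = y - X @ beta
dev_marg = (n * np.log(2 * np.pi * sigma**2) + np.linalg.slogdet(V)[1]
            + resid @ np.linalg.solve(V, resid) / sigma**2)
```

The agreement rests on two identities: det(I_q + Λᵀ Zᵀ Z Λ) = det(I_n + Z Λ Λᵀ Zᵀ) (Sylvester) and r²_{θ,β} = residᵀ V⁻¹ resid (Woodbury).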

64 More practical aspects
Because L_θ is triangular, its determinant is the product of its diagonal elements. When minimizing the PRSS w.r.t. both u and β we write the system to be solved as
    [ P (Λ_θᵀ Zᵀ Z Λ_θ + I) Pᵀ   P Λ_θᵀ Zᵀ X ] [ ũ    ]   [ P Λ_θᵀ Zᵀ y ]
    [ Xᵀ Z Λ_θ Pᵀ                Xᵀ X        ] [ β̂_θ ] = [ Xᵀ y        ]
and calculate the (left) Cholesky factor as
    [ L_θ     0    ]
    [ R_ZXᵀ   R_Xᵀ ].
These are almost Henderson's mixed-model equations, but with two important differences: the system is stable as Λ_θ becomes singular, and we decompose the part depending on Z first.
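The "decompose the Z part first" strategy amounts to a block Cholesky factorization. A numpy sketch with P taken as the identity for clarity and all arrays simulated:

```python
import numpy as np

rng = np.random.default_rng(3)
n, q, p = 20, 5, 3                      # hypothetical dimensions
X = rng.normal(size=(n, p))
Z = rng.normal(size=(n, q))
Lam = 0.6 * np.eye(q)                   # hypothetical Lambda_theta
y = rng.normal(size=n)

# Blocks of the bordered system (P = I here)
A = Lam.T @ Z.T @ Z @ Lam + np.eye(q)   # upper-left
B = Lam.T @ Z.T @ X                     # upper-right
C = X.T @ X                             # lower-right

# Decompose the Z part first, then border with the X part
L = np.linalg.cholesky(A)                    # L L' = A
RZX = np.linalg.solve(L, B)                  # L RZX = B
RX = np.linalg.cholesky(C - RZX.T @ RZX).T   # RX' RX = C - RZX' RZX

# Forward and back substitution for u~ and beta-hat_theta
cu = np.linalg.solve(L, Lam.T @ Z.T @ y)
cb = np.linalg.solve(RX.T, X.T @ y - RZX.T @ cu)
beta_hat = np.linalg.solve(RX, cb)
u_tilde = np.linalg.solve(L.T, cu - RZX @ beta_hat)

# Check against solving the full bordered system in one shot
M = np.block([[A, B], [B.T, C]])
sol = np.linalg.solve(M, np.concatenate([Lam.T @ Z.T @ y, X.T @ y]))
```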

67 Outline
Theory and Practice of Statistics
Challenges from Data Characteristics
Linear Mixed Model Formulation
Evaluating the deviance
Other Criteria and Models

68 Other criteria and model formulations
The profiled REML criterion has a similar form,
    d_R(θ | y) = log(|L_θ|² |R_X|²) + (n − p) [1 + log(2π r²_θ / (n − p))].
For a generalized linear mixed model (GLMM) the conditional mode, ũ, is determined by penalized iteratively reweighted least squares (PIRLS); for a nonlinear mixed model (NLMM) it is determined by penalized nonlinear least squares (PNLS). The Laplace approximation to the deviance for a GLMM or NLMM has an expression similar to that for the LMM. The log(|L_θ|²) term then depends on β and ũ, but usually this dependence is weak. For some models (those with a single grouping factor, or with shallowly nested grouping factors for the random effects) a further refinement is available using adaptive Gauss-Hermite quadrature.
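A one-dimensional sketch of the adaptive Gauss-Hermite refinement: center the quadrature rule at the conditional mode and scale it by the curvature there. The integrand below is a hypothetical Poisson-GLMM-style log density, not lme4's internal objective:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def f(u):                        # hypothetical log h(u|y), Poisson-style
    return 3.0 * u - np.exp(u) - 0.5 * u**2

# Conditional mode and curvature by Newton's method
u = 0.0
for _ in range(50):
    grad = 3.0 - np.exp(u) - u
    hess = -np.exp(u) - 1.0
    u -= grad / hess
s = 1.0 / np.sqrt(-hess)         # Gaussian scale at the mode

# Laplace: replace exp(f) by its Gaussian approximation at the mode
laplace = np.sqrt(2.0 * np.pi) * s * np.exp(f(u))

# AGQ: substitute u = u~ + sqrt(2) s x so the e^{-x^2} weight cancels
x, w = hermgauss(10)             # 10-node Gauss-Hermite rule
agq = np.sqrt(2.0) * s * np.sum(w * np.exp(f(u + np.sqrt(2.0) * s * x) + x**2))

# Brute-force Riemann sum over a wide grid, for comparison
g = np.linspace(u - 12.0 * s, u + 12.0 * s, 200001)
truth = np.sum(np.exp(f(g))) * (g[1] - g[0])
```

With one node, AGQ reduces to the Laplace approximation; extra nodes correct for the non-Gaussian shape of the integrand.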

72 Benefits for Moderate-sized Data Sets
Being able to evaluate and optimize the deviance rapidly is beneficial for small and moderate-sized data sets too. We can profile the deviance w.r.t. individual parameters; the change in the profiled deviance from its optimum is a LRT statistic on 1 degree of freedom. For inferences based only on estimates and standard errors to be reliable, the profiled deviance should be quadratic, and its signed square root, written ζ, should be linear.
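The signed square root ζ can be illustrated with the simplest possible profile: the mean of a Gaussian sample with σ treated as known, where the profiled deviance is exactly quadratic and ζ is therefore exactly linear. A numpy sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(loc=5.0, scale=2.0, size=40)   # simulated sample
sigma = 2.0                                   # sigma treated as known
mu_hat = y.mean()                             # ML estimate of mu

def deviance(mu):
    """-2 log-likelihood of mu for the N(mu, sigma^2) sample."""
    return len(y) * np.log(2 * np.pi * sigma**2) + np.sum((y - mu)**2) / sigma**2

mus = np.linspace(mu_hat - 1.5, mu_hat + 1.5, 100)
d = np.array([deviance(m) for m in mus]) - deviance(mu_hat)
zeta = np.sign(mus - mu_hat) * np.sqrt(np.maximum(d, 0.0))
```

Here ζ is exactly the straight line √n (μ − μ̂)/σ; for a variance component in a mixed model the corresponding curve is typically bent, which is what the profile zeta plot reveals.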

73 Data from Davies (1947), collected by G.E.P. Box
[Figure: dotplot of the yield of dyestuff (grams of standard color) by Batch, batches ordered E, C, B, A, D, F.]
Linear mixed model fit by maximum likelihood
  Formula: Yield ~ 1 + (1 | Batch)
     Data: Dyestuff
  AIC  BIC  logLik  deviance
Random effects:
 Groups   Name        Variance  Std.Dev.
 Batch    (Intercept)
 Residual
Number of obs: 30, groups: Batch, 6

74 Profile zeta plots
[Figure: profile ζ plots for the parameters σ₁ (.sig01), log(σ) (.lsig) and the fixed effect (Intercept).]

75 Profile pairs plots
[Figure: scatter-plot matrix of profile pairs for (Intercept), .lsig and .sig01.]

76 Summary
Theory and practice are both important in statistical research and applications; they should not be regarded as either/or. Computing determines what is feasible in practice and what theory is relevant to practice. Fortunately, computing capabilities become greater with each passing year. Sparse matrix methods are a valuable tool in many situations treating large data sets, and it is worthwhile learning how to use them. Linear mixed models can be formulated in a general way that allows for highly effective methods of determining the ML or REML estimates of the parameters, and this formulation can be extended to GLMMs and NLMMs. For models fit to small to moderate data sets these efficient methods allow for more effective evaluation of the precision of parameter estimates. Some current practices (quoting only an estimate and standard error of a variance component) can be misleading when the profiled deviance is far from quadratic.


More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

An R # Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM

An R # Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM An R Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM Lloyd J. Edwards, Ph.D. UNC-CH Department of Biostatistics email: Lloyd_Edwards@unc.edu Presented to the Department

More information

PQL Estimation Biases in Generalized Linear Mixed Models

PQL Estimation Biases in Generalized Linear Mixed Models PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized

More information

Introduction to Within-Person Analysis and RM ANOVA

Introduction to Within-Person Analysis and RM ANOVA Introduction to Within-Person Analysis and RM ANOVA Today s Class: From between-person to within-person ANOVAs for longitudinal data Variance model comparisons using 2 LL CLP 944: Lecture 3 1 The Two Sides

More information

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 An Introduction to Multilevel Models PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 Today s Class Concepts in Longitudinal Modeling Between-Person vs. +Within-Person

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Some general observations.

Some general observations. Modeling and analyzing data from computer experiments. Some general observations. 1. For simplicity, I assume that all factors (inputs) x1, x2,, xd are quantitative. 2. Because the code always produces

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Time-Invariant Predictors in Longitudinal Models

Time-Invariant Predictors in Longitudinal Models Time-Invariant Predictors in Longitudinal Models Today s Topics: What happens to missing predictors Effects of time-invariant predictors Fixed vs. systematically varying vs. random effects Model building

More information

Linear Regression (9/11/13)

Linear Regression (9/11/13) STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter

More information

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017 MLMED User Guide Nicholas J. Rockwood The Ohio State University rockwood.19@osu.edu Beta Version May, 2017 MLmed is a computational macro for SPSS that simplifies the fitting of multilevel mediation and

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Approximate Likelihoods

Approximate Likelihoods Approximate Likelihoods Nancy Reid July 28, 2015 Why likelihood? makes probability modelling central l(θ; y) = log f (y; θ) emphasizes the inverse problem of reasoning y θ converts a prior probability

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs) 36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Outline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form

Outline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form Outline Statistical inference for linear mixed models Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark general form of linear mixed models examples of analyses using linear mixed

More information

Regression tree-based diagnostics for linear multilevel models

Regression tree-based diagnostics for linear multilevel models Regression tree-based diagnostics for linear multilevel models Jeffrey S. Simonoff New York University May 11, 2011 Longitudinal and clustered data Panel or longitudinal data, in which we observe many

More information

A multivariate multilevel model for the analysis of TIMMS & PIRLS data

A multivariate multilevel model for the analysis of TIMMS & PIRLS data A multivariate multilevel model for the analysis of TIMMS & PIRLS data European Congress of Methodology July 23-25, 2014 - Utrecht Leonardo Grilli 1, Fulvia Pennoni 2, Carla Rampichini 1, Isabella Romeo

More information

Ron Heck, Fall Week 3: Notes Building a Two-Level Model

Ron Heck, Fall Week 3: Notes Building a Two-Level Model Ron Heck, Fall 2011 1 EDEP 768E: Seminar on Multilevel Modeling rev. 9/6/2011@11:27pm Week 3: Notes Building a Two-Level Model We will build a model to explain student math achievement using student-level

More information

Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1

Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1 Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

LECTURE NOTE #NEW 6 PROF. ALAN YUILLE

LECTURE NOTE #NEW 6 PROF. ALAN YUILLE LECTURE NOTE #NEW 6 PROF. ALAN YUILLE 1. Introduction to Regression Now consider learning the conditional distribution p(y x). This is often easier than learning the likelihood function p(x y) and the

More information

More Accurately Analyze Complex Relationships

More Accurately Analyze Complex Relationships SPSS Advanced Statistics 17.0 Specifications More Accurately Analyze Complex Relationships Make your analysis more accurate and reach more dependable conclusions with statistics designed to fit the inherent

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Generalized, Linear, and Mixed Models

Generalized, Linear, and Mixed Models Generalized, Linear, and Mixed Models CHARLES E. McCULLOCH SHAYLER.SEARLE Departments of Statistical Science and Biometrics Cornell University A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS, INC. New

More information

Generalized linear mixed models for biologists

Generalized linear mixed models for biologists Generalized linear mixed models for biologists McMaster University 7 May 2009 Outline 1 2 Outline 1 2 Coral protection by symbionts 10 Number of predation events Number of blocks 8 6 4 2 2 2 1 0 2 0 2

More information

Lecture 9 STK3100/4100

Lecture 9 STK3100/4100 Lecture 9 STK3100/4100 27. October 2014 Plan for lecture: 1. Linear mixed models cont. Models accounting for time dependencies (Ch. 6.1) 2. Generalized linear mixed models (GLMM, Ch. 13.1-13.3) Examples

More information

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project MLR Model Selection Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

Describing Within-Person Change over Time

Describing Within-Person Change over Time Describing Within-Person Change over Time Topics: Multilevel modeling notation and terminology Fixed and random effects of linear time Predicted variances and covariances from random slopes Dependency

More information

Mixed effects models

Mixed effects models Mixed effects models The basic theory and application in R Mitchel van Loon Research Paper Business Analytics Mixed effects models The basic theory and application in R Author: Mitchel van Loon Research

More information

The performance of estimation methods for generalized linear mixed models

The performance of estimation methods for generalized linear mixed models University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 2008 The performance of estimation methods for generalized linear

More information

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Second meeting of the FIRB 2012 project Mixture and latent variable models for causal-inference and analysis

More information

Review of CLDP 944: Multilevel Models for Longitudinal Data

Review of CLDP 944: Multilevel Models for Longitudinal Data Review of CLDP 944: Multilevel Models for Longitudinal Data Topics: Review of general MLM concepts and terminology Model comparisons and significance testing Fixed and random effects of time Significance

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Describing Within-Person Fluctuation over Time using Alternative Covariance Structures

Describing Within-Person Fluctuation over Time using Alternative Covariance Structures Describing Within-Person Fluctuation over Time using Alternative Covariance Structures Today s Class: The Big Picture ACS models using the R matrix only Introducing the G, Z, and V matrices ACS models

More information

36-720: The Rasch Model

36-720: The Rasch Model 36-720: The Rasch Model Brian Junker October 15, 2007 Multivariate Binary Response Data Rasch Model Rasch Marginal Likelihood as a GLMM Rasch Marginal Likelihood as a Log-Linear Model Example For more

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An

More information

An Introduction to Mplus and Path Analysis

An Introduction to Mplus and Path Analysis An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

36-463/663: Multilevel & Hierarchical Models

36-463/663: Multilevel & Hierarchical Models 36-463/663: Multilevel & Hierarchical Models (P)review: in-class midterm Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 In-class midterm Closed book, closed notes, closed electronics (otherwise I have

More information

STAT 526 Advanced Statistical Methodology

STAT 526 Advanced Statistical Methodology STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized

More information