Sparse Matrix Methods and Mixed-effects Models
Sparse Matrix Methods and Mixed-effects Models
Douglas Bates, University of Wisconsin - Madison
Outline: Theory and Practice of Statistics; Challenges from Data Characteristics; Linear Mixed Model Formulation; Evaluating the deviance; Other Criteria and Models
Theory and Practice of Statistics

We celebrate the 50th anniversary of the founding of our department this summer. From its inception our department has had as its goal developing excellence in both the theory and the practice of statistics, and fostering the interaction of the two. Involvement in the practice of statistics, even if only through the required course on statistical consulting, provides a grounding in real problems addressed by real clients with their inevitably messy real data. Knowledge of theory grounds the practice of statistics in a solid framework. It isn't enough to fit models and report estimates and p-values; we should also assess the validity of the assumptions in the model, and you can't do that if you don't know what the model is.
Why Consider the Interplay of Theory and Practice?

Deriving properties of models without reference to data is sterile: "All models are wrong; some models are useful" (G. E. P. Box). Ideally the model is derived from properties of the data. The opposite approach (posit a model, derive its properties, then go looking for some data that follow such a model) is not likely to prove useful. Conversely, deriving parameter estimates without assessing, or sometimes even knowing, the model is a risky practice.

> fortune("provocatively")
To paraphrase provocatively, machine learning is statistics minus any checking of models and assumptions.
   -- Brian D. Ripley (about the difference between machine learning and statistics), useR! 2004, Vienna (May 2004)
What Positive Role Does Computing Play?

Computing is an integral part of essentially all applications of statistics. It gives us the ability to explore complex models applied to large data sets with complex structure. In terms of theory, computing is widely used in simulation studies. It also provides (or should provide) a grounding for proposed methods. We have powerful computers, but not infinitely powerful and not with an infinite amount of storage.
What Negative Role Does Computing Play?

If you define the extent of statistical analysis by the capabilities of available software, you tend to shoehorn the data into a prefabricated model. The noted linguist Benjamin Lee Whorf observed, "Language shapes the way we think, and determines what we can think about." This is true not only for natural languages but also for computing languages. For example, not long ago many people believed that applied statistics is the use of SAS. More to the point for this discussion, many people believe that linear mixed models must be hierarchical models, even when the data are not hierarchical.
Combining Practice, Theory and Computing

To the extent possible, the methodology should encompass the characteristics of data encountered in real-world situations: large data sets with complex structures such as non-nested random effects, where unbalanced data is almost a given. Methodology should have a firm theoretical basis. Before you fit a model you should be able to write it down. Before you write code to produce some estimates, you should be able to describe the criterion optimized by those estimates, even if you are optimizing it indirectly. Theory should teach us that there isn't "the" formula or "the" representation; often there are many representations of the same problem, and we should take advantage of this. Computational methods should be reliable, robust and based on stable calculations, and they should handle the edge cases. As Kernighan and Plauger state, your code should be able to "do nothing" gracefully when appropriate.
Outline: Theory and Practice of Statistics; Challenges from Data Characteristics; Linear Mixed Model Formulation; Evaluating the deviance; Other Criteria and Models
Data Challenges

As already stated, we find more and more that we encounter large, unbalanced data sets with complex structure. (Compare the data for M.S. Exam problems from 20 years ago, 10 years ago and today.) For mixed-effects models one current challenge is the analysis of annual test scores, mandated by the No Child Left Behind Act, for students in grades 3 to 8. These are longitudinal data, grouped by student, school and district. The act also mandates relating the scores to demographic variables (sex, race/ethnicity, socioeconomic status). Many states currently do not record information on teachers; that will change because of the Race to the Top program. Models with random effects associated with student, teacher, school and district will not be hierarchical (also called "multilevel"). The random effects will be partially crossed (i.e. neither nested nor fully crossed).
The Wisconsin Knowledge and Concepts Exam (WKCE)

> str(wkce)
'data.frame': obs. of 14 variables:
 $ LDS : Factor w/ levels "10000","10001",..: ...
 $ Yr  : Ord.factor w/ 4 levels ...
 $ Gr  : Ord.factor w/ 7 levels "3"<"4"<"5"<"6"<..: ...
 $ Rss : int ...
 $ Mss : int ...
 $ sch : Factor w/ 2411 levels "6","7","16","56",..: ...
 $ dist: Factor w/ 445 levels "7","14","63",..: ...
 $ Sx  : Factor w/ 2 levels "F","M": ...
 $ Ra  : Factor w/ 5 levels "W","B","H","A",..: ...
 $ Dis : Factor w/ 2 levels "N","Y": ...
 $ ELP : Factor w/ 2 levels "N","Y": ...
 $ EC  : Factor w/ 2 levels "N","Y": ...
> xtabs(~ Yr + Gr, WKCE)
(the Yr by Gr table of counts was lost in transcription)
How Do They Get the Scaled Scores?

The scores on a given test (about 50 questions) are not a simple tally of the number of correct answers. They are calculated according to "pattern scoring" determined by an Item Response Theory (IRT) model. The student abilities are assumed to be a sample from a distribution, and the question difficulties are also a sample. A reasonable model would be a generalized linear mixed model (binary response) with crossed random effects for student and for question. IRT models often go further and incorporate discrimination parameters for the questions, and even a per-question baseline probability of a correct response (i.e. a guessing parameter). Building this into a mixed model would result in a generalized nonlinear mixed model with crossed random effects. In practice, IRT models are fit by ad hoc methodologies with a poor theoretical basis, sometimes producing really weird results.
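The basic IRT view described above (student abilities and question difficulties entering a binary-response model on the logit scale) can be sketched numerically. This is a minimal Rasch-style sketch in Python; all dimensions and parameter values are invented for illustration, and it is not the scoring procedure any state actually uses:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: abilities and difficulties drawn as samples, matching
# the GLMM view with crossed random effects for student and for question
n_students, n_questions = 5, 4
ability = rng.normal(0.0, 1.0, size=n_students)      # student random effects
difficulty = rng.normal(0.0, 1.0, size=n_questions)  # question random effects

# Rasch model: P(correct) = logistic(ability_i - difficulty_j)
eta = ability[:, None] - difficulty[None, :]
p_correct = 1.0 / (1.0 + np.exp(-eta))

# Simulate one binary response per student/question pair
responses = rng.random((n_students, n_questions)) < p_correct
```

A discrimination parameter would multiply `eta`, and a guessing parameter would mix this probability with a baseline, turning the sketch into the generalized nonlinear mixed model mentioned above.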
Outline: Theory and Practice of Statistics; Challenges from Data Characteristics; Linear Mixed Model Formulation; Evaluating the deviance; Other Criteria and Models
Formulating Linear Mixed-effects Models (LMMs)

We see linear mixed-effects models specified in many different ways. Usually we first see the "subscript fest" formulation

  y_ijklmn... = µ + α_i + β_j + b_k + ...

which is not generalizable and which confounds many aspects of the model. Later we may see a vector representation with model matrices,

  y = Xβ + Zb + ε,  ε ~ N(0, σ²I),  b ~ N(0, Σ),  b independent of ε,

but this too confounds aspects of the model. Writing a linear model as Xβ + ε separates the mean (the linear predictor) from the variance, ε ~ N(0, σ²I). In a mixed model, is Zb part of the mean or part of the variance? In a generalized linear model or generalized linear mixed model (GLMM) you can't separate the mean and the variance.
My Current Formulation of LMMs

It helps to follow the advice we give introductory students and distinguish between a random variable (Y or B) and its value (y or b). We observe y but not b. The probability model specifies the conditional distribution

  (Y | B = b) ~ N(Zb + Xβ, σ²I_n)

and the unconditional distribution

  B ~ N(0, Σ),

where X and Z are n x p and n x q model matrices, σ ≥ 0, and Σ is a positive semidefinite q x q variance-covariance matrix determined by the variance-component parameters. Note the emphasis on semidefinite: σ = 0 does not occur in practice, but singular Σ does. That is, estimates of zero for variance components can and do occur. During the course of numerical optimization it is common to want to evaluate likelihoods on the boundary of the parameter space.
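The two-stage specification can be simulated directly: draw b from the unconditional distribution of B, then draw y from the conditional distribution of Y given B = b. A minimal sketch, with all dimensions and parameter values invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed): n observations, p fixed effects, q random effects
n, p, q = 12, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # n x p model matrix
Z = rng.integers(0, 2, size=(n, q)).astype(float)       # n x q indicator-like
beta = np.array([1.0, -0.5])
sigma = 0.3
Sigma = 0.5 * np.eye(q)   # positive semidefinite q x q (could be singular)

# B ~ N(0, Sigma): the unconditional distribution of the random effects
b = rng.multivariate_normal(np.zeros(q), Sigma)

# (Y | B = b) ~ N(Z b + X beta, sigma^2 I_n)
y = Z @ b + X @ beta + sigma * rng.normal(size=n)
```

Note that b is drawn once and then held fixed while y is generated, which is exactly the distinction between the random variable B and its value b.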
Expressing Σ

Model specification in lmer produces a parameterization that generates Σ through a general q x q matrix Λ_θ as

  Σ = σ² Λ_θ Λ_θᵀ,

where θ is the variance-component parameter. For models with simple scalar random effects, the elements of θ are ratios of standard deviations, θ_i = σ_i/σ subject to θ_i ≥ 0, and Λ_θ is block-diagonal with blocks of the form θ_i I. Let U ~ N(0, σ²I_q) be the "spherical" random effects, for which

  B = Λ_θ U,  (Y | U = u) ~ N(Xβ + ZΛ_θu, σ²I_n).

It is easy to verify that B has the desired properties. Note that the transformation from U to B is well-defined even when Λ_θ is singular.
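The block-diagonal structure of Λ_θ, and the fact that B = Λ_θU stays well-defined when a θ_i is zero, can be checked numerically. A sketch assuming two simple scalar random-effects terms, the second with θ = 0 to show the singular case:

```python
import numpy as np

# Two scalar random-effects terms with (assumed) relative standard
# deviations theta_i = sigma_i / sigma; the second term is zero
theta = [0.8, 0.0]
block_sizes = [3, 2]                      # q = 5 random effects in total

# Lambda_theta is block-diagonal with blocks theta_i * I
Lambda = np.zeros((5, 5))
start = 0
for th, sz in zip(theta, block_sizes):
    Lambda[start:start + sz, start:start + sz] = th * np.eye(sz)
    start += sz

sigma = 1.5
Sigma = sigma**2 * Lambda @ Lambda.T      # Sigma = sigma^2 Lambda Lambda^T

# B = Lambda U is well-defined even though Lambda is singular here:
# the components belonging to the zero-variance term are exactly zero
rng = np.random.default_rng(2)
u = sigma * rng.normal(size=5)            # spherical: U ~ N(0, sigma^2 I_q)
b = Lambda @ u
```

The resulting Σ is positive semidefinite but singular, which is exactly the boundary case that arises when a variance component is estimated as zero.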
Outline: Theory and Practice of Statistics; Challenges from Data Characteristics; Linear Mixed Model Formulation; Evaluating the deviance; Other Criteria and Models
Densities of U and Y

The joint density f_{U,Y}(u, y) = f_U(u) f_{Y|U}(y|u), on the deviance scale (negative twice the log density), is

  (n + q) log(2πσ²) + (‖y − Xβ − ZΛ_θu‖² + ‖u‖²) / σ².

Evaluated at the observed y, this expression (on the density scale) provides the unnormalized conditional density, h(u|y), of (U | Y = y). The normalizing constant, ∫ h(u|y) du, is the likelihood of the parameters θ, β and σ given y.
Evaluating the likelihood

For a LMM there are several different expressions for the likelihood. One method that allows for generalization to other forms of mixed models is first to determine the conditional mode
$$\tilde{u} = \arg\max_u f_{U \mid Y}(u \mid y) = \arg\max_u h(u \mid y) = \arg\min_u \left[\|y - X\beta - Z\Lambda_\theta u\|^2 + \|u\|^2\right]$$
and then expand $h$ at $\tilde{u}$. Because we are dealing with multivariate Gaussians, the log conditional density is exactly quadratic. For other types of models it is approximately quadratic.
Solving the Penalized Least Squares Problem

The point of all this development is that we can solve the penalized least squares problem, which can be rewritten as
$$\tilde{u} = \arg\min_u \left\| \begin{bmatrix} y - X\beta \\ 0 \end{bmatrix} - \begin{bmatrix} Z\Lambda_\theta \\ I_q \end{bmatrix} u \right\|^2,$$
with solution satisfying
$$\left(\Lambda_\theta^T Z^T Z \Lambda_\theta + I_q\right)\tilde{u} = \Lambda_\theta^T Z^T (y - X\beta).$$
We do this by forming the sparse Cholesky factor, a sparse lower triangular matrix $L_\theta$ satisfying
$$L_\theta L_\theta^T = \Lambda_\theta^T Z^T Z \Lambda_\theta + I_q.$$
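The penalized least squares solve can be sketched with a small dense example. This is a NumPy stand-in for the sparse CHOLMOD factorization lme4 actually uses, with simulated data and an invented scalar $\Lambda_\theta$; only the linear algebra is the point:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, p = 12, 4, 2
Z = rng.standard_normal((n, q))        # random-effects model matrix
X = rng.standard_normal((n, p))        # fixed-effects model matrix
beta = np.array([1.0, -0.5])
y = rng.standard_normal(n)
Lam = 0.8 * np.eye(q)                  # a simple scalar Lambda_theta

# Normal equations: (Lam' Z' Z Lam + I) u = Lam' Z' (y - X beta)
A = Lam.T @ Z.T @ Z @ Lam + np.eye(q)
rhs = Lam.T @ Z.T @ (y - X @ beta)
L = np.linalg.cholesky(A)              # L L' = A, lower triangular
u_tilde = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # two triangular solves
```

Because $A = \Lambda_\theta^T Z^T Z \Lambda_\theta + I$ is positive definite for every $\theta$, the Cholesky factorization never breaks down, even when $\Lambda_\theta$ itself is singular.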
Practical Aspects of the Sparse Cholesky Decomposition

When working with very large matrices, one must be careful about how certain computations are performed. Sparse matrix methods are usually organized in two phases: a symbolic phase, in which the positions of the non-zeros are determined, and a numeric phase, in which the numerical values of those non-zeros are calculated. During optimization of the likelihood we need to evaluate $L_\theta$ for many different values of $\theta$; only the numeric phase needs to be repeated, not the symbolic phase. Part of the symbolic phase is determining a fill-reducing permutation, represented by the $q \times q$ permutation matrix $P$. The decomposition actually used is
$$L_\theta L_\theta^T = P\left(\Lambda_\theta^T Z^T Z \Lambda_\theta + I\right)P^T.$$
Incorporating $P$ does not affect the theory. It can profoundly affect time and storage requirements.
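The effect of a fill-reducing permutation can be seen on a tiny example. CHOLMOD (used by lme4) chooses orderings such as AMD during its symbolic phase; as a readily available stand-in, the sketch below uses SciPy's reverse Cuthill-McKee ordering on an "arrow" matrix, whose dense first row/column causes heavy fill-in if factored unpermuted:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Small sparse SPD "arrow" matrix: dense first row and column.
q = 6
A = 4.0 * np.eye(q)
A[0, :] = 1.0
A[:, 0] = 1.0
A[0, 0] = 4.0

# "Symbolic" phase: choose a fill-reducing ordering from the structure only.
perm = reverse_cuthill_mckee(csr_matrix(A), symmetric_mode=True)
P = np.eye(q)[perm]                    # permutation matrix for P A P^T

# "Numeric" phase: factor the permuted matrix (dense stand-in for sparse chol).
PAPt = P @ A @ P.T
L = np.linalg.cholesky(PAPt)
```

Moving the densely connected row toward the end of the ordering means the factor of $PAP^T$ has no more (and here, far fewer) non-zeros than the factor of $A$, while representing exactly the same system.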
Expansion at ũ

The penalized residual sum of squares (PRSS) can now be written
$$\|y - X\beta - Z\Lambda_\theta u\|^2 + \|u\|^2 = r_{\theta,\beta}^2 + \|L_\theta^T(u - \tilde{u})\|^2,$$
where $r_{\theta,\beta}^2$ is the PRSS at $\tilde{u}$. A simple change of variable can then be used to evaluate the likelihood. On the deviance scale it is
$$-2\ell(\theta, \beta, \sigma) = n\log(2\pi\sigma^2) + \log\left(|L_\theta|^2\right) + \frac{r_{\theta,\beta}^2}{\sigma^2}.$$
Notice that $\beta$ affects only $r_{\theta,\beta}^2$. If we minimize the PRSS simultaneously with respect to $u$ and $\beta$, producing $r_\theta^2$, and plug in the conditional estimate of $\sigma$, we obtain the profiled deviance
$$\tilde{d}(\theta \mid y) = \log\left(|L_\theta|^2\right) + n\left[1 + \log\left(\frac{2\pi r_\theta^2}{n}\right)\right].$$
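Once $\log(|L_\theta|^2)$ and the minimized PRSS $r_\theta^2$ are available from the decomposition, the profiled deviance is a one-line computation. A minimal sketch (the function name is invented; inputs are whatever the factorization produced):

```python
import numpy as np

def profiled_deviance(logdet_L2, prss, n):
    """Profiled deviance d(theta | y) = log(|L|^2) + n [1 + log(2 pi r^2 / n)],
    given logdet_L2 = log(|L_theta|^2), prss = r^2_theta, and sample size n."""
    return logdet_L2 + n * (1.0 + np.log(2.0 * np.pi * prss / n))
```

This scalar function of $\theta$ is what the optimizer minimizes; $\widehat{\beta}$ and $\widehat{\sigma}$ then follow in closed form.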
More practical aspects

Because $L_\theta$ is triangular, its determinant is the product of its diagonal elements. When minimizing the PRSS with respect to both $u$ and $\beta$, we write the system to be solved as
$$\begin{bmatrix} P\left(\Lambda_\theta^T Z^T Z \Lambda_\theta + I\right)P^T & P\Lambda_\theta^T Z^T X \\ X^T Z \Lambda_\theta P^T & X^T X \end{bmatrix} \begin{bmatrix} P\tilde{u} \\ \widehat{\beta}_\theta \end{bmatrix} = \begin{bmatrix} P\Lambda_\theta^T Z^T y \\ X^T y \end{bmatrix}$$
and calculate the (left) Cholesky factor as
$$\begin{bmatrix} L_\theta & 0 \\ R_{ZX}^T & R_X^T \end{bmatrix}.$$
These are almost Henderson's mixed-model equations, but with two important differences: the system is stable as $\Lambda_\theta$ becomes singular, and we decompose the part depending on $Z$ first.
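The triangularity of $L_\theta$ is what makes the $\log(|L_\theta|^2)$ term in the deviance cheap: it is just twice the sum of the logs of the diagonal, with no separate determinant computation. A small dense sketch (the matrix is made up for illustration):

```python
import numpy as np

A = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])               # symmetric positive definite
L = np.linalg.cholesky(A)                     # A = L L^T, L lower triangular
logdet_L2 = 2.0 * np.sum(np.log(np.diag(L)))  # log(|L|^2) = log(det(A))
```

In the sparse case the diagonal of $L_\theta$ is stored explicitly, so this sum costs only $O(q)$ per evaluation of $\theta$.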
Outline
- Theory and Practice of Statistics
- Challenges from Data Characteristics
- Linear Mixed Model Formulation
- Evaluating the deviance
- Other Criteria and Models
Other criteria and model formulations

The profiled REML criterion has a similar form:
$$\tilde{d}_R(\theta \mid y) = \log\left(|L_\theta|^2 |R_X|^2\right) + (n-p)\left[1 + \log\left(\frac{2\pi r_\theta^2}{n-p}\right)\right].$$
For a generalized linear mixed model (GLMM) the conditional mode, $\tilde{u}$, is determined by penalized iteratively reweighted least squares (PIRLS). For a nonlinear mixed model (NLMM), $\tilde{u}$ is determined by penalized nonlinear least squares (PNLS). The Laplace approximation to the deviance for a GLMM or NLMM has an expression similar to that for the LMM. The $\log(|L_\theta|^2)$ term then depends on $\beta$ and $\tilde{u}$, but usually this dependence is weak. For some models (those with a single grouping factor, or with shallowly nested grouping factors for the random effects) a further refinement is available using adaptive Gauss-Hermite quadrature.
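The ML and REML profiled criteria differ only in the $|R_X|^2$ term and in using $n$ versus $n - p$. A small sketch making the comparison explicit, given the pieces produced by the blocked decomposition (function names and test values are invented):

```python
import numpy as np

def profiled_ml(logdet_L2, prss, n):
    # d(theta | y) = log(|L|^2) + n [1 + log(2 pi r^2 / n)]
    return logdet_L2 + n * (1.0 + np.log(2.0 * np.pi * prss / n))

def profiled_reml(logdet_L2, logdet_RX2, prss, n, p):
    # d_R(theta | y) = log(|L|^2 |R_X|^2) + (n-p) [1 + log(2 pi r^2 / (n-p))]
    return logdet_L2 + logdet_RX2 \
        + (n - p) * (1.0 + np.log(2.0 * np.pi * prss / (n - p)))
```

With $p = 0$ fixed-effects columns and $|R_X|^2 = 1$ the two criteria coincide, which is one way to see REML as ML with the fixed effects integrated out.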
Benefits for Moderate-sized Data Sets

Being able to evaluate and optimize the deviance rapidly is beneficial for small or moderate-sized data sets too. We can profile the deviance with respect to individual parameters. The change in the profiled deviance from the optimum is an LRT statistic on 1 degree of freedom. For inferences based only on estimates and standard errors to be reliable, the profiled deviance should be quadratic and its signed square root, written $\zeta$, should be linear.
Data from Davies (1947) collected by G.E.P. Box

[Figure: dotplot of Yield of dyestuff (grams of standard color) by Batch, with batches ordered E, C, B, A, D, F]

Linear mixed model fit by maximum likelihood
  Formula: Yield ~ 1 + (1 | Batch)
  Data: Dyestuff
  AIC  BIC  logLik  deviance
Random effects:
  Groups   Name        Variance  Std.Dev.
  Batch    (Intercept)
  Residual
Number of obs: 30, groups: Batch, 6
Profile zeta plots

[Figure: profile zeta plots of $\zeta$ against the parameters $\sigma_1$ (.sig01), $\log(\sigma)$ (.lsig) and (Intercept)]
Profile pairs plots

[Figure: scatter-plot matrix of profile pairs for (Intercept), .lsig and .sig01]
Summary
- Theory and practice are both important in statistical research and applications. They should not be regarded as either/or.
- Computing determines what is feasible in practice and what theory is relevant to practice. Fortunately, computing capabilities become greater with each passing year.
- Sparse matrix methods are a valuable tool in many situations involving large data sets. It is worthwhile learning how to use them.
- Linear mixed models can be formulated in a general way that allows highly effective methods for determining the ML or REML estimates of the parameters. This formulation can be extended to GLMMs and NLMMs.
- For models fit to small to moderate data sets, these efficient methods allow more effective evaluation of the precision of parameter estimates. Some current practices (quoting an estimate and standard error or a variance component) are…
More informationOutline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form
Outline Statistical inference for linear mixed models Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark general form of linear mixed models examples of analyses using linear mixed
More informationRegression tree-based diagnostics for linear multilevel models
Regression tree-based diagnostics for linear multilevel models Jeffrey S. Simonoff New York University May 11, 2011 Longitudinal and clustered data Panel or longitudinal data, in which we observe many
More informationA multivariate multilevel model for the analysis of TIMMS & PIRLS data
A multivariate multilevel model for the analysis of TIMMS & PIRLS data European Congress of Methodology July 23-25, 2014 - Utrecht Leonardo Grilli 1, Fulvia Pennoni 2, Carla Rampichini 1, Isabella Romeo
More informationRon Heck, Fall Week 3: Notes Building a Two-Level Model
Ron Heck, Fall 2011 1 EDEP 768E: Seminar on Multilevel Modeling rev. 9/6/2011@11:27pm Week 3: Notes Building a Two-Level Model We will build a model to explain student math achievement using student-level
More informationEstimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1
Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationEPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7
Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review
More informationLECTURE NOTE #NEW 6 PROF. ALAN YUILLE
LECTURE NOTE #NEW 6 PROF. ALAN YUILLE 1. Introduction to Regression Now consider learning the conditional distribution p(y x). This is often easier than learning the likelihood function p(x y) and the
More informationMore Accurately Analyze Complex Relationships
SPSS Advanced Statistics 17.0 Specifications More Accurately Analyze Complex Relationships Make your analysis more accurate and reach more dependable conclusions with statistics designed to fit the inherent
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationGeneralized, Linear, and Mixed Models
Generalized, Linear, and Mixed Models CHARLES E. McCULLOCH SHAYLER.SEARLE Departments of Statistical Science and Biometrics Cornell University A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS, INC. New
More informationGeneralized linear mixed models for biologists
Generalized linear mixed models for biologists McMaster University 7 May 2009 Outline 1 2 Outline 1 2 Coral protection by symbionts 10 Number of predation events Number of blocks 8 6 4 2 2 2 1 0 2 0 2
More informationLecture 9 STK3100/4100
Lecture 9 STK3100/4100 27. October 2014 Plan for lecture: 1. Linear mixed models cont. Models accounting for time dependencies (Ch. 6.1) 2. Generalized linear mixed models (GLMM, Ch. 13.1-13.3) Examples
More informationMLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project
MLR Model Selection Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en
More informationDescribing Within-Person Change over Time
Describing Within-Person Change over Time Topics: Multilevel modeling notation and terminology Fixed and random effects of linear time Predicted variances and covariances from random slopes Dependency
More informationMixed effects models
Mixed effects models The basic theory and application in R Mitchel van Loon Research Paper Business Analytics Mixed effects models The basic theory and application in R Author: Mitchel van Loon Research
More informationThe performance of estimation methods for generalized linear mixed models
University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 2008 The performance of estimation methods for generalized linear
More informationExploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement
Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Second meeting of the FIRB 2012 project Mixture and latent variable models for causal-inference and analysis
More informationReview of CLDP 944: Multilevel Models for Longitudinal Data
Review of CLDP 944: Multilevel Models for Longitudinal Data Topics: Review of general MLM concepts and terminology Model comparisons and significance testing Fixed and random effects of time Significance
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationDescribing Within-Person Fluctuation over Time using Alternative Covariance Structures
Describing Within-Person Fluctuation over Time using Alternative Covariance Structures Today s Class: The Big Picture ACS models using the R matrix only Introducing the G, Z, and V matrices ACS models
More information36-720: The Rasch Model
36-720: The Rasch Model Brian Junker October 15, 2007 Multivariate Binary Response Data Rasch Model Rasch Marginal Likelihood as a GLMM Rasch Marginal Likelihood as a Log-Linear Model Example For more
More information1 Motivation for Instrumental Variable (IV) Regression
ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationAn Introduction to Mplus and Path Analysis
An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More information36-463/663: Multilevel & Hierarchical Models
36-463/663: Multilevel & Hierarchical Models (P)review: in-class midterm Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 In-class midterm Closed book, closed notes, closed electronics (otherwise I have
More informationSTAT 526 Advanced Statistical Methodology
STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized
More information