Author(s): Kerby Shedden, Ph.D., 2010
License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/
We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material. Copyright holders of content included in this material should contact with any questions, corrections, or clarification regarding the use of content.
Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please speak to your physician if you have questions about your medical condition. Viewer discretion is advised: some medical content is graphic and may not be suitable for all viewers.
Regression analysis with dependent data
Kerby Shedden
Department of Statistics, University of Michigan
December 8
Clustered data
Clustered data are sampled from a population that can be viewed as the union of a number of related subpopulations. Write the data as Y_ij ∈ ℝ, X_ij ∈ ℝ^p, where i indexes the subpopulation and j indexes the individuals in the sample belonging to the i-th subpopulation.
Clustered data
It may happen that units from the same subpopulation are more alike than units from different subpopulations, i.e.

cor(Y_ij, Y_ij′) > cor(Y_ij, Y_i′j′).

Part of the within-cluster similarity may be explained by X, i.e. units from the same subpopulation may have similar X values, which leads to similar Y values. In this case,

cor(Y_ij, Y_ij′) > cor(Y_ij, Y_ij′ | X_ij, X_ij′).

Even after accounting for measured covariates, units in a cluster may still resemble each other more than units in different clusters:

cor(Y_ij, Y_ij′ | X) > cor(Y_ij, Y_i′j′ | X_ij, X_i′j′).
Clustered data
A simple working model to account for this dependence is

Ŷ_ij = θ̂_i + β̂′X_ij.

The idea here is that β̂′X_ij explains the variation in Y that is related to the measured covariates, and the θ̂_i explain variation in Y that is related to the clustering. This working model would be correct if there were q omitted variables Z_ijl, l = 1, …, q, that were constant for all units in the same cluster (i.e. Z_ijl depends on l and i, but not on j). In that case, θ̂_i would stand in for the value of Σ_l ψ̂_l Z_ijl that we would have obtained if the Z_ijl were observed.
Clustered data
As an alternate notation, we can vectorize the data to express Y ∈ ℝ^n and X ∈ ℝ^{n×(p+1)}, then write

Ŷ = Σ_j θ̂_j I_j + Xβ̂,

where I_j ∈ ℝ^n is the indicator of which subjects belong to cluster j. If the observed covariates in X are related to the clustering (i.e. if the columns of X and the I_j are not orthogonal), then OLS apportions the overlapping variance between β̂ and θ̂.
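As a concrete sketch of this working model, cluster indicators can simply be placed alongside the covariates in the design matrix and fit by OLS. All data values below are made up for illustration; they do not come from the slides.

```python
import numpy as np

# Hypothetical toy data: two clusters, one scalar covariate.
rng = np.random.default_rng(0)
cluster = np.array([0, 0, 0, 1, 1, 1])           # cluster membership
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])     # covariate values
theta = np.array([2.0, -1.0])                    # true cluster effects
y = theta[cluster] + 0.5 * x + 0.01 * rng.standard_normal(6)

# Design matrix: one indicator column per cluster, plus the covariate.
I = np.zeros((6, 2))
I[np.arange(6), cluster] = 1.0
X = np.column_stack([I, x])

# OLS fit of Y-hat = theta_i + beta * x recovers both sets of parameters.
coef = np.linalg.lstsq(X, y, rcond=None)[0]
theta_hat, beta_hat = coef[:2], coef[2]
```

Because x varies within each cluster, the cluster effects and the slope are separately identifiable here.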
Test score example
Suppose Y_ij is a reading test score for the j-th student in the i-th classroom, out of a large number of classrooms that are considered. Suppose X_ij is the income of a student's family. We might postulate the population model

Y_ij = θ_i + β′X_ij + ɛ_ij,

which can be fit as

Ŷ_ij = θ̂_i + β̂′X_ij.
Test score example
Ideally we would want the parameter estimates to reflect sources of variation as follows:
- Direct effects of parent income, such as access to books, life experiences, and good health care, should go entirely to β̂.
- Attributes of classrooms that are not related to parent income, for example the effect of an exceptionally good or bad teacher, should go entirely to the θ̂_i.
- Attributes of classrooms that are correlated with parent income, such as teacher salary, training, and resources, will be apportioned by OLS between θ̂_i and β̂.
- Unique events affecting particular individuals, such as the severe illness of the student or a family member, should go entirely to ɛ.
Other examples of clustered data
- Treatment outcomes for patients treated in various hospitals.
- Crime rates in police precincts distributed over a number of large cities (the precincts are the units and the cities are the clusters).
- Prices of stocks belonging to various business sectors.
- Surveys in which the data are collected following a cluster sampling approach.
What if we ignore the θ_i?
The θ_i are usually not of primary interest, but we should be concerned that by failing to take account of the clustering, we may incorrectly assess the relationship between Y and X. If the θ_i are nonzero, but we fail to include them in the model, the working model is misspecified. Let X be the design matrix without intercept, and let Q be the matrix of cluster indicators (which includes the intercept). For example, with two clusters of sizes 2 and 3 and two covariates:

Y = (Y_11, Y_12, Y_21, Y_22, Y_23)′

Q =
[1 0]
[1 0]
[0 1]
[0 1]
[0 1]

X =
[X_111 X_112]
[X_121 X_122]
[X_211 X_212]
[X_221 X_222]
[X_231 X_232]
What if we ignore the θ_i?
Let X̃ = [1_n X]. The estimate that results from regressing Y on X̃ is β̃̂ = (β̂_0, β̂′)′, with

E β̃̂ = (X̃′X̃)⁻¹ X̃′ E Y = (X̃′X̃)⁻¹ X̃′(Xβ + Qθ).

For the bias of β̂ for β to be zero, we need

E β̃̂ − β̃ = (X̃′X̃)⁻¹ X̃′(Xβ + Qθ − X̃β̃) = 0,

where β_0 = E β̂_0 and β̃ = (β_0, β′)′. Thus we need

0 = X̃′(Xβ + Qθ − X̃β̃) = X̃′(Qθ − β_0 1_n).
What if we ignore the θ_i?
Let S = Qθ be the vector of cluster effects. Since the first column of X̃ consists of 1's, the condition above requires S̄ = β_0, where S̄ is the mean of S. For the other covariates, we need X_j′S = β_0 X_j′1_n, which implies that X_j and S have zero sample covariance. This is unlikely in many studies, where people tend to cluster (in schools, hospitals, etc.) with other people having similar covariate levels.
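A small simulation illustrates the resulting bias. The cluster sizes, effect values, and the way X is generated below are all assumed for illustration: the within-cluster means of X are made correlated with the θ_i, so the OLS slope from the model that omits the cluster effects is pulled away from the true β.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 5                    # assumed: 200 clusters of 5 units
beta = 1.0
theta = rng.standard_normal(m)   # cluster effects

# Within-cluster X centered at theta_i, so X and the cluster effects
# are correlated (the problematic case on this slide).
x = theta[:, None] + rng.standard_normal((m, n))
y = theta[:, None] + beta * x + 0.1 * rng.standard_normal((m, n))

# OLS of Y on (1, X), ignoring the clusters.
xf, yf = x.ravel(), y.ravel()
Xmat = np.column_stack([np.ones(m * n), xf])
b0, b1 = np.linalg.lstsq(Xmat, yf, rcond=None)[0]
# b1 estimates beta + cov(theta, x)/var(x) = 1 + 1/2, not beta = 1.
```

Here the probability limit of the naive slope is 1.5 rather than 1, matching the formula cov(θ, X)/var(X) for the omitted-variable bias.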
Fixed effects analysis
In a fixed effects approach, we model the θ_i as regression parameters, by including additional columns in the design matrix whose covariate levels are the cluster indicators. As the sample size grows, in most applications the cluster size will stay constant (e.g. a primary school classroom might have up to 30 students). Thus the number of clusters must grow, so the dimension of the parameter vector θ grows with the sample size. This is not typical in standard regression modeling.
Cluster effect examples
What does the inclusion of fixed effects do to the parameter estimates β̂, which are often of primary interest? The following slides show scatterplots of an outcome Y against a scalar covariate X, in a setting where there are three clusters (indicated by color). Above each plot are the coefficient estimates, Z-scores, and r² values for fits in which the cluster effects are either included (top line) or excluded (second line). These estimates are obtained from the following two working models:

Ŷ = α̂ + β̂X + Σ_j θ̂_j I_j
Ŷ = α̂ + β̂X.

The Z-scores are β̂/SD(β̂) and the r² values are cor(Ŷ, Y)².
Cluster effect examples
Left: No effect of clusters or X; X and clusters are independent; β̂ ≈ 0 with and without cluster effects in the model (β̂_1 = 0.07, Z = 0.49, r² = 0.02; β̂_1s = 0.06, Z_s = 0.44, r²_s = 0.00).
Right: An effect of X, but no cluster effects; X and clusters are independent; β̂, Z, and r² are similar with and without cluster effects in the model (β̂_1 = 1.04, Z = 7.75, r² = 0.57; β̂_1s = 1.01, Z_s = 7.81, r²_s = 0.56).
Cluster effect examples
Left: Cluster effects, but no effect of X; X and clusters are independent; β̂ ≈ 0 with and without cluster effects in the model (β̂_1 = 0.03, Z = 0.85, r² = 0.94; β̂_1s = 0.00, Z_s = 0.02, r²_s = 0.00).
Right: An effect of clusters but no effect of X; X and clusters are dependent; when cluster effects are not modeled, their effect is picked up by X (β̂_1 = 0.03, Z = 0.19, r² = 0.94; β̂_1s = 0.92, Z_s = 20.72, r²_s = 0.90).
Cluster effect examples
Left: Effects for clusters and for X; X and clusters are dependent (β̂_1 = 0.90, Z = 5.97, r² = 0.95; β̂_1s = 0.96, Z_s = 29.26, r²_s = 0.95).
Right: Effects for clusters but not for X; X and clusters are dependent, but the net cluster effect is not linearly related to X (β̂_1 = 0.04, Z = 0.31, r² = 0.84; β̂_1s = 0.00, Z_s = 0.02, r²_s = 0.00).
Cluster effect examples
Left: Effects for clusters and for X; X and clusters are dependent; the X effect has the opposite sign as the net cluster effect (β̂_1 = 1.93, r² = 0.99; β̂_1s = 2.72, Z_s = 16.81, r²_s = 0.85).
Right: Effects for clusters and for X; X and clusters are dependent; the signs and magnitudes of the X effects are cluster-specific (β̂_1 = 0.52, Z = 1.70, r² = 0.98; β̂_1s = 2.92, Z_s = 8.20, r²_s = 0.58).
Implications of fixed effects analysis for observational data
A stable confounder is a confounding factor that is approximately constant within clusters. A stable confounder will become part of the net cluster effect. If a stable confounder is correlated with an observed covariate X, then this will create non-orthogonality between the cluster effects and the effects of the observed covariates.
Implications of fixed effects analysis for experimental data
Experiments are often carried out on batches of objects (specimens, parts, etc.) in which uncontrolled factors cause elements of the same batch to be more similar than elements of different batches. If treatments are assigned randomly within each batch, there are no stable confounders (in general there are no confounders in randomized experiments). Therefore the overall OLS estimate of β is unbiased as long as the standard linear model assumptions hold.
Implications of fixed effects analysis for experimental data
Suppose the design is balanced (e.g. exactly half of each batch is treated and half is untreated). This is an orthogonal design, so the estimate β̂ based on the working model Ŷ_ij = β̂X_ij and the estimate β̃̂ based on the working model Ŷ_ij = θ̂_i + β̃̂X_ij are identical (β̂ = β̃̂). Thus they have the same variance. But the estimated variance of β̂ will be greater than the estimated variance of β̃̂ (since the corresponding estimate of σ² is greater), so the analysis without batch effects will have lower power and wider confidence intervals.
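The equality of the two treatment-effect estimates under a balanced design can be checked numerically. The batch sizes, effect sizes, and noise level below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
m, k = 20, 4                                # assumed: 20 batches of 4 units
n = m * k
x = np.tile([0.0, 0.0, 1.0, 1.0], m)        # balanced: half of each batch treated
batch = np.repeat(np.arange(m), k)
theta = rng.standard_normal(m)              # batch effects
y = theta[batch] + 2.0 * x + 0.5 * rng.standard_normal(n)

# Model without batch effects: intercept + treatment.
X1 = np.column_stack([np.ones(n), x])
b1 = np.linalg.lstsq(X1, y, rcond=None)[0][1]

# Model with batch indicators + treatment.
B = np.zeros((n, m))
B[np.arange(n), batch] = 1.0
X2 = np.column_stack([B, x])
b2 = np.linalg.lstsq(X2, y, rcond=None)[0][-1]
# With a balanced design, b1 and b2 coincide (up to floating point).
```

The equality follows from the Frisch–Waugh logic: residualizing the treatment column against the batch indicators and against the intercept yields the same centered column here.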
Random cluster effects
As we have seen, the cluster effects θ_i can be treated as unobserved constants, and estimated as we would estimate any other regression coefficient. An alternative way to handle cluster effects is to view the θ_i as unobserved (latent) random variables. Now we have two random variables in the model: θ_i and ɛ_ij. These are usually taken to be independent of each other. If the θ_i are independent and identically distributed, we can combine them with the error terms to get a single random error term per observation:

ɛ̃_ij = θ_i + ɛ_ij.
Random cluster effects
Let Y_i = (Y_i1, …, Y_in_i)′ denote the vector of responses in the i-th cluster, let X_i denote the matrix of predictor variables for the i-th cluster, and let ɛ̃_i denote the vector of combined errors ɛ̃_ij = θ_i + ɛ_ij for the i-th cluster. Thus we have the model

Y_i = X_iβ + ɛ̃_i,

for clusters i = 1, …, m. The ɛ̃_i are taken to be uncorrelated between clusters, i.e.

cov(ɛ̃_i, ɛ̃_i′) = 0 for i ≠ i′.
Random cluster effects
The covariance matrix S_i ≡ cov(ɛ̃_i | X_i) of the combined errors ɛ̃_ij = θ_i + ɛ_ij has diagonal entries σ² + σ_θ² and off-diagonal entries σ_θ², i.e.

S_i = σ²I + σ_θ² 11′,

where var(ɛ_ij) = σ² and var(θ_i) = σ_θ².
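This exchangeable structure can be written down directly. A minimal numpy sketch, with assumed variance components and cluster size:

```python
import numpy as np

# Assumed variance components and cluster size (illustrative values).
sigma2, sigma2_theta, n_i = 1.0, 0.5, 4

# S_i = sigma^2 I + sigma_theta^2 11'
S = sigma2 * np.eye(n_i) + sigma2_theta * np.ones((n_i, n_i))
# Diagonal entries are sigma^2 + sigma_theta^2, off-diagonals are
# sigma_theta^2, so the within-cluster correlation is
# sigma_theta^2 / (sigma^2 + sigma_theta^2).
```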
Generalized Least Squares
Suppose we have a linear model with mean structure E[Y|X] = Xβ for Y ∈ ℝ^n, X ∈ ℝ^{n×(p+1)}, and β ∈ ℝ^{p+1}, and variance structure var(Y|X) ≡ Σ, where Σ is a given n×n matrix. We can factor the covariance matrix as Σ = GG′, and consider the transformed model

G⁻¹Y = G⁻¹Xβ + G⁻¹ɛ.

Then letting η ≡ G⁻¹ɛ, it follows that cov(η) = I, and note that the slope vector β of the transformed model is identical to the slope vector of the original model.
Generalized Least Squares
We can use GLS to analyze clustered data with random cluster effects. Let Y*_i = S_i^{−1/2} Y_i, ɛ*_i = S_i^{−1/2} ɛ̃_i, and X*_i = S_i^{−1/2} X_i. Let Y*, X*, and ɛ* be the Y*_i, X*_i, and ɛ*_i, respectively, stacked over i.
Generalized Least Squares
Since cov(ɛ*_i) = I, the OLS estimate of β for the model Y* = X*β + ɛ* is the best estimate of β that is linear in Y* (by the Gauss-Markov theorem). Since the set of estimates that are linear in Y* is the same as the set of estimates that are linear in Y, it follows that the GLS estimate of β based on Y* is the best unbiased estimate of β that is linear in Y.
Generalized Least Squares
The covariance matrix of β̂ in GLS is

cov(β̂) = (X*′X*)⁻¹ = (X′Σ⁻¹X)⁻¹.

Note that there is no σ² factor here, because we have transformed the error distribution to have covariance matrix I. To apply GLS, it is only necessary to know Σ up to a multiplicative constant: the same estimated slopes β̂ are obtained whether we decorrelate with Σ or with kΣ for k > 0. However, when Σ is only correct up to a scalar multiple, we need to estimate σ² and use σ̂²(X′Σ⁻¹X)⁻¹ for inference.
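For clustered data the GLS estimate can be accumulated cluster by cluster, since Σ is block diagonal. A simulation sketch, with assumed variance components, cluster sizes, and coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 100, 4                          # assumed: 100 clusters of size 4
n = m * k
sigma2, sigma2_theta = 1.0, 2.0

# True exchangeable per-cluster covariance and its inverse.
S = sigma2 * np.eye(k) + sigma2_theta * np.ones((k, k))
S_inv = np.linalg.inv(S)

# Simulate clustered data with random cluster effects.
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([1.0, 2.0])
theta = np.sqrt(sigma2_theta) * rng.standard_normal(m)
eps = np.repeat(theta, k) + np.sqrt(sigma2) * rng.standard_normal(n)
y = X @ beta + eps

# GLS: beta_hat = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y,
# accumulated over the diagonal blocks.
A = np.zeros((2, 2))
b = np.zeros(2)
for i in range(m):
    Xi = X[i * k:(i + 1) * k]
    yi = y[i * k:(i + 1) * k]
    A += Xi.T @ S_inv @ Xi
    b += Xi.T @ S_inv @ yi
beta_gls = np.linalg.solve(A, b)
```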
Generalized Least Squares
Since the sampling covariance matrix for GLS is proportional to (X′Σ⁻¹X)⁻¹, the parameter estimates are uncorrelated with each other if and only if X′Σ⁻¹X is diagonal, that is, if the columns of X are mutually orthogonal with respect to the Mahalanobis metric defined by Σ. This is related to the fact that Ŷ in GLS is the projection of Y onto col(X) in the Mahalanobis metric defined by Σ, which generalizes the fact that Ŷ in OLS is the projection of Y onto col(X) in the Euclidean metric.
Least squares with mis-specified covariance
What if we perform GLS using a working covariance S that is incorrect? For example, what happens if we use OLS when the actual covariance matrix of the errors is Σ ≠ I? Since E[ɛ|X] = 0, and hence the transformed errors also satisfy E[G⁻¹ɛ|X] = 0, the estimate β̂ remains unbiased. However it has two problems: it may not be the BLUE (i.e. it may not have the least variance among unbiased estimates), and the usual linear model inference procedures will be wrong.
Least squares with mis-specified covariance
The sampling covariance when the error structure is mis-specified is given by the sandwich expression

cov(β̂) = (X*′X*)⁻¹ X*′ Σ*_ɛ X* (X*′X*)⁻¹,

where Σ*_ɛ = cov(ɛ*|X*). This result covers two situations: (i) we use OLS, so X* = X and Σ*_ɛ = Σ_ɛ is the covariance matrix of the errors, and (ii) we use a working covariance to define the decorrelating matrix G, and this working covariance is not equal to Σ.
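The sandwich expression can be evaluated directly for the OLS case. In this sketch (an assumed exchangeable error covariance and illustrative sizes), the intercept's sampling variance is several times what the naive iid-based formula reports:

```python
import numpy as np

rng = np.random.default_rng(3)
m, k = 50, 4                           # assumed number of clusters and size
n = m * k

# Assumed true per-cluster error covariance: variance 2.5, covariance 1.5.
Sigma_blk = np.eye(k) + 1.5 * np.ones((k, k))

X = np.column_stack([np.ones(n), rng.standard_normal(n)])
XtX_inv = np.linalg.inv(X.T @ X)

# "Meat" of the sandwich, X' Sigma_eps X, accumulated block by block.
M = np.zeros((2, 2))
for i in range(m):
    Xi = X[i * k:(i + 1) * k]
    M += Xi.T @ Sigma_blk @ Xi

cov_sandwich = XtX_inv @ M @ XtX_inv   # correct OLS sampling covariance
cov_naive = 2.5 * XtX_inv              # what iid-based inference would use
```

The inflation hits the intercept (a cluster-constant column) hardest; a covariate that varies freely within clusters is affected much less.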
Iterated GLS
In practice we will generally not know Σ, so it must be estimated from the data. Since Σ is the covariance matrix of the errors, we can estimate it from the residuals. Typically a low-dimensional parametric model Σ = Σ(α) is used for the error covariance. For a given error model, the fitting algorithm alternates between estimating α and estimating the regression slopes β.
Iterated GLS
For example, suppose our working covariance has the form

S_i(j, j) = ν
S_i(j, k) = rν (j ≠ k).

This is an exchangeable model with intraclass correlation coefficient (ICC) r between two observations in the same cluster. There are several closely related ICC estimates. One approach is based on the standardized residuals R^s_ij:

r̂ = Σ_i Σ_{j<j′} R^s_ij R^s_ij′ / (Σ_i n_i(n_i − 1)/2 − p − 1),

where n_i is the size of the i-th cluster.
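This pairwise-product estimate can be computed directly from a matrix of standardized residuals. In the sketch below, all settings are assumed: residuals with a known exchangeable correlation r are simulated, and the estimate recovers r approximately.

```python
import numpy as np

rng = np.random.default_rng(4)
m, k, p = 200, 4, 1        # assumed: 200 clusters of 4, one covariate
r = 0.5                    # true ICC used to generate the residuals

# Standardized residuals with exchangeable within-cluster correlation r.
theta = rng.standard_normal((m, 1))
R = np.sqrt(r) * theta + np.sqrt(1 - r) * rng.standard_normal((m, k))

# Sum of products over within-cluster pairs j < j'.
num = 0.0
for i in range(m):
    for j in range(k):
        for j2 in range(j + 1, k):
            num += R[i, j] * R[i, j2]

# Divide by (total number of pairs) - p - 1, as in the formula above.
icc_hat = num / (m * k * (k - 1) / 2 - p - 1)
```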
Iterated GLS
Another common model for the error covariance is the first order autoregressive (AR-1) model:

Σ_jk = α^{|j−k|},

where |α| < 1 is a parameter. There are several possible estimates of α borrowing ideas from time series analysis.
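The AR-1 covariance for a cluster can be constructed directly from the index distances; the parameter value and cluster size below are assumed for illustration.

```python
import numpy as np

alpha, k = 0.6, 5                     # assumed parameter and cluster size
idx = np.arange(k)

# Sigma[j, k] = alpha^{|j - k|}: 1 on the diagonal, geometric decay off it.
Sigma = alpha ** np.abs(idx[:, None] - idx[None, :])
```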
Generalized Least Squares and stable confounders
For GLS to give unbiased estimates of β (that also have minimum variance by the Gauss-Markov theorem), we must have E[θ_i + ɛ_ij | X] = 0. Since E[ɛ_ij | X] = 0, this is equivalent to requiring that E[θ_i | X] = 0. Thus if the X covariates have distinct within-cluster mean values, and the within-cluster mean values of X are correlated with the θ_i, then the GLS estimate of β will be biased.
Likelihood inference for the random intercepts model
The random intercepts model

Y_ij = θ_i + β′X_ij + ɛ_ij

can be analyzed using a likelihood approach, using:
- A random intercepts density φ(θ|X)
- A density for the data given the random intercept: f(Y|X, θ)
Likelihood inference for the random intercepts model
The marginal model for the observed data is obtained by integrating out the random intercepts:

f(Y|X) = ∫ f(Y|X, θ) φ(θ|X) dθ.

The parameters in f(Y|X) are all parameters in f(Y|X, θ) and all parameters in φ(θ|X). Typically, f(Y|X, θ) is parameterized in terms of β and σ², and φ(θ|X) is parameterized in terms of a scale parameter σ_θ.
Likelihood inference for the random intercepts model
The first two moments of the marginal distribution of Y_ij given X can be calculated explicitly. Let μ_θ ≡ E(θ|X) and σ_θ² ≡ var(θ|X). The moments are

E(Y_ij|X) = μ_θ + β′X_ij
var(Y_ij|X) = σ_θ² + σ²
cov(Y_ij1, Y_ij2|X) = σ_θ² (for j_1 ≠ j_2).
Likelihood inference for the random intercepts model
Suppose

ɛ|X ~ N(0, σ²)
θ|X ~ N(μ_θ, σ_θ²).

In this case Y|X is Gaussian, with mean and variance as given above. Thus the random intercept model can be equivalently written in marginal form as

Y_ij = μ_θ + β′X_ij + ɛ̃_ij,

where ɛ̃_ij = θ_i − μ_θ + ɛ_ij. It follows that E(ɛ̃_ij|X) = 0, var(ɛ̃_ij|X) = σ_θ² + σ², and the ɛ̃_ij values have correlation coefficient σ_θ²/(σ² + σ_θ²) within clusters.
Likelihood computation
Maximum likelihood estimates for the model can be calculated using a gradient-based optimization procedure, or the EM algorithm. Asymptotic standard errors can be obtained from the inverse of the Fisher information, and likelihood ratio tests can be used to compare nested models. The parameter estimates from iterated GLS and the MLE for the random intercepts model will be similar, and both are consistent under similar conditions.
Predicting the random intercepts
Since the model is fit by optimizing the marginal distribution, we obtain an estimate of σ_θ² = var(θ_i), but we don't automatically learn anything about the individual θ_i. If there is interest in the individual θ_i values, we can predict them using the best linear unbiased predictor (BLUP):

Ê(θ|Y, X) = μ̂_θ + ĉov(θ, Y) ĉov(Y|X)⁻¹ (Y − Ê[Y|X]),

where the moments are evaluated at the estimated parameters β̂, σ̂², σ̂_θ², μ̂_θ.
Predicting the random intercepts
The estimated second moments needed to calculate the BLUP are

ĉov(θ_i, Y_i) = σ̂_θ² 1′_{n_i}

and

ĉov(Y_i|X) = σ̂²I + σ̂_θ² 11′,

where n_i is the size of the i-th group, ĉov(θ_i, Y_j) = 0 for i ≠ j, and ĉov(Y|X) is block diagonal with blocks given by ĉov(Y_i|X).
Predicting the random intercepts
Note that once the parameters are estimated, the BLUP for θ_i only depends on the data in cluster i:

Ê(θ_i|Y, X) = μ̂_θ + ĉov(θ_i, Y_i) ĉov(Y_i|X)⁻¹ (Y_i − Ê[Y_i|X]),

with

ĉov(Y_i|X)⁻¹ = σ̂⁻² (I − σ̂_θ² 11′/(σ̂² + n_i σ̂_θ²)).

For a given set of parameter values, the BLUP is a linear function of the data. For the i-th cluster,

Ê(θ_i|Y, X) = μ̂_θ + [n_i σ̂_θ²/(σ̂² + n_i σ̂_θ²)] · 1′(Y_i − Ê[Y_i|X])/n_i.
Predicting the random intercepts
In the BLUP for θ_i, the mean of the residuals,

1′(Y_i − Ê[Y_i|X])/n_i,

is shrunk by the factor

n_i σ̂_θ²/(σ̂² + n_i σ̂_θ²).

This shrinkage allows us to interpret the random intercepts model as a partially pooled model that is intermediate between the fixed effects model and the model that completely ignores the clusters.
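For a single cluster, the BLUP therefore reduces to shrinking the mean residual toward μ̂_θ. A sketch with assumed parameter values and made-up residuals:

```python
import numpy as np

# Assumed estimated parameters (illustrative, not from the slides).
sigma2, sigma2_theta, mu_theta = 1.0, 2.0, 0.0

# Hypothetical residuals Y_i - E[Y_i | X] for one cluster of size 4.
resid = np.array([0.8, 1.2, 1.0, 1.0])
n_i = resid.size

# Shrinkage factor n_i * sigma_theta^2 / (sigma^2 + n_i * sigma_theta^2).
shrink = n_i * sigma2_theta / (sigma2 + n_i * sigma2_theta)

# BLUP: shrink the mean residual toward mu_theta.
theta_blup = mu_theta + shrink * resid.mean()
```

With these values the shrinkage factor is 8/9, so the mean residual of 1.0 is pulled toward 0 rather than taken at face value, as a fixed effects estimate would be.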
Predicting the random intercepts
In the fixed effects model, the parameter estimates θ̂_i are unbiased, but they are overdispersed, meaning that the sample variance of the θ̂_i will generally be greater than σ_θ². In the random intercepts model, the variance parameter σ_θ² is estimated with low bias, and the BLUPs of the θ_i are shrunk toward zero.
Random slopes
Suppose we are interested in a measured covariate Z whose effect may vary by cluster. We might start with the model

Y_i = X_iβ + γZ_i + ɛ_i,

so if Z_i ∈ {0, 1}^{n_i} is a vector of treatment indicators (1 for treated subjects, 0 for untreated subjects), then γ is the average change associated with being treated (the population treatment effect). In many cases it is reasonable to consider the possibility that different clusters have different treatment effects, that is, different clusters have different γ values. In this case we can let γ_i be the treatment effect for cluster i.
Random slopes
Suppose we model the random slopes γ_i as being Gaussian (given X) with expected value γ_0 and variance σ_γ². The marginal model has

E[Y_i|X_i, Z_i] = X_iβ + Z_iγ_0

and

cov[Y_i|X_i, Z_i] = σ_γ² Z_iZ_i′ + σ²I.

Again we can form a BLUP Ê(γ_i|Y, X, Z), and the BLUP is a shrunken version of the fixed effects estimate in which we regress Y on X and Z within every cluster.
Linear mixed effects models
The random intercept and random slope models are special cases of the linear mixed model

Y = Xβ + Zγ + ɛ,

where X is an n×p design matrix and Z is an n×q design matrix; γ is a random vector with mean 0 and covariance matrix Ψ, and ɛ is a random vector with mean 0 and covariance matrix σ²I.
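The marginal covariance implied by this model, cov(Y|X, Z) = ZΨZ′ + σ²I, can be assembled directly; the dimensions and parameter values below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 6, 2, 2                           # assumed dimensions
X = rng.standard_normal((n, p))             # fixed effects design
Z = rng.standard_normal((n, q))             # random effects design
Psi = np.array([[1.0, 0.3], [0.3, 0.5]])    # assumed covariance of gamma
sigma2 = 0.25

# Marginal covariance of Y implied by the mixed model.
V = Z @ Psi @ Z.T + sigma2 * np.eye(n)
```

Since Ψ is positive definite and σ² > 0, V is a valid (symmetric, positive definite) covariance matrix, and it is exactly the Σ that GLS or maximum likelihood would use for this model.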
More informationLecture 14 Simple Linear Regression
Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More informationStructural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall
1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Duke University Medical, Dept. of Biostatistics Joint work
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationGLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22
GLS and FGLS Econ 671 Purdue University Justin L. Tobias (Purdue) GLS and FGLS 1 / 22 In this lecture we continue to discuss properties associated with the GLS estimator. In addition we discuss the practical
More information9. Least squares data fitting
L. Vandenberghe EE133A (Spring 2017) 9. Least squares data fitting model fitting regression linear-in-parameters models time series examples validation least squares classification statistics interpretation
More informationStat 135 Fall 2013 FINAL EXAM December 18, 2013
Stat 135 Fall 2013 FINAL EXAM December 18, 2013 Name: Person on right SID: Person on left There will be one, double sided, handwritten, 8.5in x 11in page of notes allowed during the exam. The exam is closed
More informationWU Weiterbildung. Linear Mixed Models
Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationBias Variance Trade-off
Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]
More informationEco517 Fall 2014 C. Sims FINAL EXAM
Eco517 Fall 2014 C. Sims FINAL EXAM This is a three hour exam. You may refer to books, notes, or computer equipment during the exam. You may not communicate, either electronically or in any other way,
More informationStatement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.
MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss
More informationSTAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.
STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review
More informationEC402 - Problem Set 3
EC402 - Problem Set 3 Konrad Burchardi 11th of February 2009 Introduction Today we will - briefly talk about the Conditional Expectation Function and - lengthily talk about Fixed Effects: How do we calculate
More informationBusiness Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'
Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where
More informationIntroduction to Estimation Methods for Time Series models. Lecture 1
Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation
More informationAnalysis of variance and regression. December 4, 2007
Analysis of variance and regression December 4, 2007 Variance component models Variance components One-way anova with random variation estimation interpretations Two-way anova with random variation Crossed
More informationPanel Data Models. James L. Powell Department of Economics University of California, Berkeley
Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel
More informationApplied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013
Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationIntroduction to Simple Linear Regression
Introduction to Simple Linear Regression Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Introduction to Simple Linear Regression 1 / 68 About me Faculty in the Department
More informationRegression I: Mean Squared Error and Measuring Quality of Fit
Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationMultiple Linear Regression
Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there
More informationCorrelation and Regression
Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class
More informationBiostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE
Biostatistics Workshop 2008 Longitudinal Data Analysis Session 4 GARRETT FITZMAURICE Harvard University 1 LINEAR MIXED EFFECTS MODELS Motivating Example: Influence of Menarche on Changes in Body Fat Prospective
More informationCausal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions
Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census
More informationISQS 5349 Spring 2013 Final Exam
ISQS 5349 Spring 2013 Final Exam Name: General Instructions: Closed books, notes, no electronic devices. Points (out of 200) are in parentheses. Put written answers on separate paper; multiple choices
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationSTA 302f16 Assignment Five 1
STA 30f16 Assignment Five 1 Except for Problem??, these problems are preparation for the quiz in tutorial on Thursday October 0th, and are not to be handed in As usual, at times you may be asked to prove
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Supervised Learning: Regression I Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some of the
More informationCh 2: Simple Linear Regression
Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component
More informationKriging models with Gaussian processes - covariance function estimation and impact of spatial sampling
Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling François Bachoc former PhD advisor: Josselin Garnier former CEA advisor: Jean-Marc Martinez Department
More informationCorrelation in Linear Regression
Vrije Universiteit Amsterdam Research Paper Correlation in Linear Regression Author: Yura Perugachi-Diaz Student nr.: 2566305 Supervisor: Dr. Bartek Knapik May 29, 2017 Faculty of Sciences Research Paper
More informationPOLI 8501 Introduction to Maximum Likelihood Estimation
POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,
More informationStatistical Methods III Statistics 212. Problem Set 2 - Answer Key
Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423
More informationOutline of GLMs. Definitions
Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density
More informationModel Checking and Improvement
Model Checking and Improvement Statistics 220 Spring 2005 Copyright c 2005 by Mark E. Irwin Model Checking All models are wrong but some models are useful George E. P. Box So far we have looked at a number
More informationQuantitative Economics for the Evaluation of the European Policy
Quantitative Economics for the Evaluation of the European Policy Dipartimento di Economia e Management Irene Brunetti Davide Fiaschi Angela Parenti 1 25th of September, 2017 1 ireneb@ec.unipi.it, davide.fiaschi@unipi.it,
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix
More informationECON The Simple Regression Model
ECON 351 - The Simple Regression Model Maggie Jones 1 / 41 The Simple Regression Model Our starting point will be the simple regression model where we look at the relationship between two variables In
More informationLecture 1 Intro to Spatial and Temporal Data
Lecture 1 Intro to Spatial and Temporal Data Dennis Sun Stanford University Stats 253 June 22, 2015 1 What is Spatial and Temporal Data? 2 Trend Modeling 3 Omitted Variables 4 Overview of this Class 1
More informationRecap. Vector observation: Y f (y; θ), Y Y R m, θ R d. sample of independent vectors y 1,..., y n. pairwise log-likelihood n m. weights are often 1
Recap Vector observation: Y f (y; θ), Y Y R m, θ R d sample of independent vectors y 1,..., y n pairwise log-likelihood n m i=1 r=1 s>r w rs log f 2 (y ir, y is ; θ) weights are often 1 more generally,
More informationProblems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B
Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1
MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical
More informationLinear Methods for Prediction
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationEcon 583 Final Exam Fall 2008
Econ 583 Final Exam Fall 2008 Eric Zivot December 11, 2008 Exam is due at 9:00 am in my office on Friday, December 12. 1 Maximum Likelihood Estimation and Asymptotic Theory Let X 1,...,X n be iid random
More informationCategorical Predictor Variables
Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More information