Module 2. General Linear Model


D.G. Bonett (9/2018)

The relation between one response variable (y) and q ≥ 1 predictor variables (x_1, x_2, …, x_q) for one randomly selected person can be represented by the following general linear model (GLM)

y_i = β_0 + β_1x_1i + β_2x_2i + … + β_qx_qi + e_i

where y_i is the response variable score for person i, and x_ji is the score for person i on predictor variable j. The value β_0 + β_1x_1i + β_2x_2i + … + β_qx_qi is the predicted y score for person i, and e_i = y_i − (β_0 + β_1x_1i + β_2x_2i + … + β_qx_qi) is the prediction error for person i. The coefficients β_0, β_1, …, β_q are the unknown population parameters to be estimated from sample data. The coefficient β_0 is the y-intercept and β_1, …, β_q are the slope coefficients.

Each predictor variable in a GLM can be fixed or random. Fixed predictor variables (factors) have a predetermined number of values (levels). Random predictor variables are always quantitative, but fixed predictor variables can be quantitative (e.g., 10, 20, or 30 hours of training; 0, 1, 2, or 3 siblings) or qualitative (e.g., treatment A, treatment B, or treatment C; Democrat, Republican, or Independent). The fixed predictor variables can be treatment factors with levels to which participants are randomly assigned (e.g., Treatment A, Treatment B, or Treatment C; 10, 20, or 30 hours of training) or classification factors with levels that represent existing characteristics of the study population (e.g., 0, 1, 2, or 3 siblings; Democrat, Republican, or Independent).

In Module 1, we saw how a 2-level qualitative predictor variable (e.g., male/female) could be included in the model by dummy coding the 2-level predictor variable. To model a qualitative predictor variable with k levels, the qualitative predictor variable must be converted into k − 1 indicator variables, as will be explained later. A GLM where all predictor variables are indicator variables is called an analysis of variance (ANOVA) model; a GLM that has only quantitative predictor variables is called a multiple regression model; and a GLM that has both indicator variables and quantitative predictor variables is called an analysis of covariance (ANCOVA) model.
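
A GLM of this form can be fit in R with lm(). The following is a minimal sketch; the data frame and variable names (d, y, x1, x2) are hypothetical and the data are simulated only for illustration.

set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))    # hypothetical predictors
d$y <- 2 + 0.5 * d$x1 - 0.3 * d$x2 + rnorm(100)      # hypothetical response
fit <- lm(y ~ x1 + x2, data = d)
coef(fit)          # least squares estimates of beta_0, beta_1, beta_2
head(resid(fit))   # estimated prediction errors (residuals) for the first few people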

Interpreting Slope Coefficients in Nonexperimental Designs

In experiments with a quantitative treatment factor (x_j), β_j describes the change in the population mean of y that will be caused by any 1-point increase in x_j (within the range of x_j values used in the experiment). In nonexperimental designs with fixed or random quantitative predictor variables, the slope coefficient for x_j describes the change in y associated with a 1-point increase in x_j, controlling for the other q − 1 predictor variables. The phrase "controlling for" requires explanation. Consider the following model with two predictor variables

y_i = β_0 + β_1x_1i + β_2x_2i + e_i

Now consider a model where x_1 is treated as a response variable, x_2 is treated as a predictor variable, and e_x1 is the resulting prediction error. The e_x1 prediction error reflects the aspect of x_1 that is not linearly associated with x_2. It can be shown that β_1 in the above model is equal to β_1 in the following model

y_i = β_0 + β_1e_x1i + β_2x_2i + e_i

In general, each β_j in a GLM describes the relation between y and e_xj, where e_xj represents the part of x_j that is not linearly related to any of the other predictor variables, and it is in this sense that β_j describes the association between x_j and y controlling for all other predictor variables in the model.

In applications where two or more predictor variables are measuring similar attributes, β_j may be very difficult to interpret. For example, if x_1 = depression and x_2 = psychological well-being, e_x1 represents some component of depression that is unrelated to psychological well-being, and that component could be very difficult to explain. However, in some applications, β_j will become more meaningful when certain predictor variables are added to the model. For example, if x_1 is a measure of spatial ability that involves complicated verbal instructions and x_2 is a measure of reading comprehension, β_1 describes the relation between y and the component of the spatial ability measure that is unrelated to reading comprehension (e_x1). In this application, e_x1 could represent a more pure measure of spatial ability than x_1. In applications where the relation between y and x_1 is confounded with demographic variables such as age, gender, or ethnicity, controlling for these demographic variables usually provides a more meaningful interpretation of β_1. For example, if an indicator variable for gender is added to a model where y is predicted by x_1, β_1 then describes the slope of the line relating x_1 to y within each gender category.
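
The equivalence described above can be checked numerically. In this small simulation (all data and names hypothetical), the coefficient for x1 in the full model is identical to the coefficient for the x1 residuals from a regression of x1 on x2.

set.seed(2)
n  <- 500
x2 <- rnorm(n)
x1 <- 0.6 * x2 + rnorm(n)                  # x1 and x2 are correlated
y  <- 2 + 0.5 * x1 - 0.3 * x2 + rnorm(n)

b1_full  <- coef(lm(y ~ x1 + x2))["x1"]
e_x1     <- resid(lm(x1 ~ x2))             # part of x1 not linearly related to x2
b1_resid <- coef(lm(y ~ e_x1 + x2))["e_x1"]
c(b1_full, b1_resid)                       # the two estimates are identical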

The interpretation of β_j will also be simple if the correlations between x_j and all other predictor variables are small, because then e_xj will be essentially the same as x_j. It is a common mistake to interpret β_j as a description of the relation between y and x_j when x_j has a moderate or large correlation with the other predictor variables.

Confounding Variables

The value of the slope coefficient for a specific predictor variable in a GLM can change substantially if a confounding variable is added to the model. A confounding variable is a variable that is correlated with both y and a particular predictor variable. Consider a model with one predictor variable (x_1)

y_i = β_0* + β_1*x_1i + e_i*     (Model 1a)

and the following model that includes x_2 as an additional predictor variable

y_i = β_0 + β_1x_1i + β_2x_2i + e_i     (Model 1b)

where the asterisks in Model 1a indicate that the parameters and prediction errors are not necessarily identical to those in Model 1b. If x_1 in Model 1a is confounded with some other variable, then β_1* will contain confounding variable bias. The confounding variable bias in β_1* can be described in terms of how much its value changes if a confounding variable is added to the model. Suppose x_2 is a confounding variable and is added to Model 1a to give Model 1b. It can be shown that

β_1* = β_1 + (β_2 ρ_x1x2 σ_x2/σ_x1)

where the term in parentheses is the confounding variable bias in β_1* due to the omission of x_2 from the model. Note that the amount of confounding bias due to x_2 depends on the magnitude of the correlation between x_2 and x_1 as well as the magnitude of β_2, where the value of β_2 depends on the correlation between x_2 and y. The magnitude of the confounding variable bias due to x_2 will be small if either β_2 or ρ_x1x2 σ_x2/σ_x1 is small.

Confounding variable bias in a particular slope coefficient due to one or more confounding variables can be removed by including those confounding variables in the model. If the researcher can present a convincing argument that all potential confounding variables for predictor variable x_j that have not been included in the model are likely to have small correlations with the response variable or small correlations with all included predictor variables, this would imply that confounding variable bias is small, and β_j might cautiously and tentatively be interpreted as a description of a causal relation between x_j and y.
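
The bias expression above can be verified with simulated (hypothetical) data: the slope from the misspecified Model 1a equals the Model 1b slope plus the sample version of the bias term.

set.seed(3)
n  <- 1000
x2 <- rnorm(n, sd = 2)                        # the confounding variable
x1 <- 0.5 * x2 + rnorm(n)
y  <- 1 + 0.4 * x1 + 0.7 * x2 + rnorm(n)

b1_star <- coef(lm(y ~ x1))["x1"]             # Model 1a slope (contains bias)
full    <- lm(y ~ x1 + x2)                    # Model 1b
bias    <- coef(full)["x2"] * cor(x1, x2) * sd(x2) / sd(x1)
c(b1_star, coef(full)["x1"] + bias)           # equal (the identity is exact in-sample)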

In experimental designs where participants are randomly assigned to the levels of some treatment factor, the randomization process guarantees that all excluded predictor variables must be uncorrelated with the treatment predictor variables, and this precludes the possibility of any confounding variables. If there are no confounding variables, the slope coefficient for a particular treatment predictor variable will describe a causal relation between the treatment predictor variable and the response variable.

Analysis of Variance Table for a GLM

The variance of y (also called the total variance) can be decomposed into two sources of variability: the variance of the prediction errors (also called error variance or residual variance) and the variance of y that is predictable from the q predictor variables (also called model variance). The decomposition of the total variance can be summarized in an analysis of variance (ANOVA) table as shown below.

Source    SS      df                 MS                  F
MODEL     SS_M    df_M = q           MS_M = SS_M/df_M    MS_M/MS_E
ERROR     SS_E    df_E = n − q − 1   MS_E = SS_E/df_E
TOTAL     SS_T    df_T = n − 1       MS_T = SS_T/df_T

The estimated predicted y score for person i is

ŷ_i = β̂_0 + β̂_1x_1i + β̂_2x_2i + … + β̂_qx_qi

and the estimated prediction error (residual) for person i is ê_i = y_i − ŷ_i, where β̂_0, β̂_1, …, β̂_q are least squares estimates. The sum of squares (SS) values in the ANOVA table are

SS_T = Σᵢ₌₁ⁿ (y_i − μ̂_y)²
SS_E = Σᵢ₌₁ⁿ ê_i²
SS_M = SS_T − SS_E.

MS_E is an estimate of the variance of the prediction errors in the study population (σ²_e), and MS_T is an estimate of the variance of the y scores in the study population (σ²_y).
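
The SS decomposition can be assembled directly from lm output, as in this sketch with hypothetical data; the F value it produces matches the omnibus F statistic that summary.lm reports.

set.seed(4)
n <- 50; q <- 2
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.4 * d$x1 + 0.2 * d$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = d)

SST <- sum((d$y - mean(d$y))^2)          # total sum of squares
SSE <- sum(resid(fit)^2)                 # error sum of squares
SSM <- SST - SSE                         # model sum of squares
F   <- (SSM / q) / (SSE / (n - q - 1))   # MS_M / MS_E
c(F, summary(fit)$fstatistic["value"])   # same value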

If all predictor variables in the GLM are random, a multiple correlation between y and the q predictor variables is an interesting population parameter. A multiple correlation is equal to a Pearson correlation between y_i and the predicted y score (β_1x_1i + β_2x_2i + … + β_qx_qi). The OLS estimates of β_j define a unique linear function of the q predictor variables that maximizes the value of the Pearson correlation between y_i and ŷ_i, where ŷ_i = β̂_0 + β̂_1x_1i + β̂_2x_2i + … + β̂_qx_qi.

In the random-x case, the population multiple correlation is denoted as ρ_y.x, where x denotes the set of q predictor variables. The squared multiple correlation ρ²_y.x, referred to as the coefficient of multiple determination, is equal to 1 − σ²_e/σ²_y. The coefficient of multiple determination describes the proportion of the response variable variance that can be predicted from the q predictor variables. An estimate of ρ²_y.x (reported as R-squared in most statistical packages) is obtained from the SS values in the ANOVA table

ρ̂²_y.x = 1 − SS_E/SS_T.     (2.1)

An estimate of the multiple correlation is obtained by taking the square root of Equation 2.1.

It is instructive to express ρ²_y.x as the following function of Pearson correlations for the simple case of q = 2 predictor variables

ρ²_y.x = (ρ²_yx1 + ρ²_yx2 − 2ρ_yx1 ρ_yx2 ρ_x1x2)/(1 − ρ²_x1x2).     (2.2)

From Equation 2.2 we see that ρ²_y.x = ρ²_yx1 + ρ²_yx2 if x_1 and x_2 are uncorrelated. In general, if all q predictor variables are uncorrelated, then ρ²_y.x will equal the sum of the squared Pearson correlations between y and each of the predictor variables. If the correlations among all q predictor variables are very small, then ρ²_y.x will approximately equal the sum of the squared Pearson correlations between y and each of the predictor variables. Note that ρ²_y.x can be greater than ρ²_yx1 + ρ²_yx2 if ρ_yx1 ρ_yx2 ρ_x1x2 is a negative value.

The estimated squared multiple correlation has a positive bias, and the bias can be substantial when df_E is small. The following adjusted squared multiple correlation (reported as adjusted R-squared in most statistical packages) has less positive bias and is equal to

adj ρ̂²_y.x = 1 − MS_E/MS_T.     (2.3)
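
Equations 2.1 and 2.3 correspond to the R-squared and adjusted R-squared values that summary.lm reports, as this sketch with hypothetical data confirms.

set.seed(5)
n <- 50
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.4 * d$x1 + 0.2 * d$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = d)

SSE <- sum(resid(fit)^2); SST <- sum((d$y - mean(d$y))^2)
r2     <- 1 - SSE / SST                                   # Equation 2.1
adj_r2 <- 1 - (SSE / fit$df.residual) / (SST / (n - 1))   # Equation 2.3
c(r2, summary(fit)$r.squared, adj_r2, summary(fit)$adj.r.squared)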

A confidence interval for ρ²_y.x does not have a simple formula, but it can be obtained in R. APA journals now expect authors who report an estimate of ρ²_y.x (preferably adj ρ̂²_y.x) to also include a confidence interval for ρ²_y.x.

In the fixed-x case, the following estimate of the coefficient of multiple determination

η̂² = 1 − SS_E/SS_T     (2.4)

is an estimate of η² and is equal to ρ̂²_y.x. Like ρ̂²_y.x, η̂² has a positive bias, and the bias can be substantial when df_E is small. Although η̂² = ρ̂²_y.x, different symbols are used for the coefficient of multiple determination in the random-x and fixed-x cases because η̂² and ρ̂²_y.x have different sampling distributions, and a confidence interval for η² in the fixed-x model will be different than a confidence interval for ρ²_y.x in the random-x model. The confidence interval for η² in the fixed-x case is complicated, but it can be obtained in R.

Note that adj ρ̂²_y.x is a function of MS_E = SS_E/df_E. Although SS_E can never increase when predictor variables are added to the model, MS_E can increase because the decrease in df_E could be relatively greater than the decrease in SS_E. Unlike η̂² and ρ̂²_y.x, which can never decrease when predictor variables are added to the model, adj ρ̂²_y.x can decrease when predictor variables are added to the model.

The F statistic from the ANOVA table is used to test the omnibus null hypothesis H0: β_1 = β_2 = … = β_q = 0 against an alternative hypothesis that at least one population slope coefficient is nonzero. H0 is rejected if the p-value for the F statistic (F = MS_M/MS_E) is less than α. These null and alternative hypotheses are equivalent to testing H0: ρ²_y.x = 0 against H1: ρ²_y.x > 0. A statistical test that allows the researcher to simply decide if H0: ρ²_y.x = 0 can or cannot be rejected does not provide useful scientific information because the researcher knows, before any data have been collected, that H0 is almost certainly false and hence H1 is almost certainly true. Although the researcher knows that ρ²_y.x (or η²) will almost never equal 0, the value of ρ²_y.x (or η²) will not be known, and therefore a confidence interval for ρ²_y.x (or η²) provides useful information.
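
The text does not specify which R function it has in mind for the ρ²_y.x interval. As one rough alternative for the random-x case, a nonparametric percentile bootstrap can be sketched with base R alone (hypothetical data; this is an approximation, not the exact method referred to above).

set.seed(6)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 0.5 * d$x1 + 0.3 * d$x2 + rnorm(n)

boot_r2 <- replicate(2000, {
  i <- sample(n, replace = TRUE)   # resample whole rows (y and x together)
  summary(lm(y ~ x1 + x2, data = d[i, ]))$r.squared
})
quantile(boot_r2, c(.025, .975))   # approximate 95% CI for the squared multiple correlation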

Confidence Interval and Test for a Slope Coefficient

A 100(1 − α)% confidence interval for β_j (the population slope coefficient corresponding to x_j) is

β̂_j ± t_{α/2; df_E} SE_β̂j     (2.5)

where β̂_j is the OLS estimate of β_j and df_E = n − q − 1. The standard error of β̂_j can be expressed as

SE_β̂j = √( MS_E/[(1 − ρ̂²_xj.x) σ̂²_xj (n − 1)] )     (2.6)

where ρ̂²_xj.x denotes the estimated squared multiple correlation between x_j and the other q − 1 predictor variables, and σ̂²_xj is the estimated variance of x_j.

A confidence interval for β_j can be used to test H0: β_j = 0 against H1: β_j > 0 and H2: β_j < 0. SPSS and R also compute the test statistic t = β̂_j/SE_β̂j and its corresponding p-value for each β̂_j. The p-value can be used to decide if H0: β_j = 0 can be rejected. If H0: β_j = 0 is rejected, the sign of β̂_j determines which alternative hypothesis to accept.

MS_E can be expressed as df_T σ̂²_y (1 − ρ̂²_y.x)/df_E. This expression is informative because it shows that a larger value of the multiple correlation between the response variable and all predictor variables (ρ̂_y.x) will give a smaller value of MS_E, which in turn reduces the value of SE_β̂j and the width of the confidence interval for β_j. Furthermore, SE_β̂j (Equation 2.6) is related to the multiple correlation between predictor variable j and the other predictor variables (ρ̂_xj.x), where larger values of ρ̂²_xj.x increase the value of SE_β̂j. Thus, correlated predictor variables will inflate the value of SE_β̂j, which in turn will increase the width of the confidence interval for β_j and reduce the power of the test of H0: β_j = 0. However, correlated predictor variables will not reduce the power of the test of H0: ρ²_y.x = 0 (H0: β_1 = β_2 = … = β_q = 0), and so it is possible to reject H0: ρ²_y.x = 0 but then fail to reject H0: β_j = 0 for any of the predictor variables in the GLM.
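
In R, the Equation 2.5 intervals and the accompanying t tests come directly from a fitted lm object (hypothetical data below).

set.seed(7)
d <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
d$y <- 2 + 0.5 * d$x1 + rnorm(60)
fit <- lm(y ~ x1 + x2, data = d)

summary(fit)$coefficients   # estimates, standard errors, t statistics, p-values
confint(fit, level = 0.95)  # 95% confidence intervals for beta_0, beta_1, beta_2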

Example 2.1. A researcher obtained a random sample of n = 236 participants from a business directory containing contact information for about 50,000 working adults. All participants in the sample were given a life satisfaction questionnaire (y), a neuroticism questionnaire (x_1), a conscientiousness questionnaire (x_2), and an industriousness questionnaire (x_3). All three predictor variables were moderately correlated. The adjusted estimate of the squared multiple correlation was .227. The 95% confidence interval for the population squared multiple correlation was [.135, .319], indicating that 13.5% to 31.9% of the variance in life satisfaction scores can be predicted from a linear function of the neuroticism, conscientiousness, and industriousness scores. The estimated slope coefficients and 95% confidence intervals for the population slope coefficients are given below.

Predictor       β̂_j      95% CI for β_j
neurotic        …        [−0.79, …]
conscientious   …        [−0.536, …]
industrious     0.302    [0.148, 0.456]

Interaction Effects in a GLM

In many applications, the strength of the relation between x_1 and y will depend on the value of a second predictor variable x_2. When this occurs, it is also the case that the relation between x_2 and y will depend on the value of x_1. In these situations, we say that x_1 and x_2 interact. In applications where x_1 and x_2 interact and the relation between x_1 and y is of primary interest, it also could be said that x_2 moderates the relation between x_1 and y.

The interaction effect for x_1 and x_2 can be included in a GLM by adding a new predictor variable that is the product of x_1 and x_2. An example of a GLM with two predictor variables and their interaction is shown below.

y_i = β_0 + β_1x_1i + β_2x_2i + β_3x_1ix_2i + e_i     (Model 2)

The product variable can be highly correlated with x_1 and x_2, which can inflate the values of SE_β̂j. This problem can be reduced by centering x_1 and x_2 before computing the product. A GLM that includes a product (interaction) term allows the strength of the relation between x_1 and y to vary across the levels of x_2 and the strength of the relation between x_2 and y to vary across the levels of x_1. Consequently, the values of β_1 and β_2 may be uninteresting. A model of this form can be fit as sketched below.
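
A sketch of fitting Model 2 with centered predictors (all data and names hypothetical). The formula x1c * x2c expands to the two main effects plus their product.

set.seed(8)
n <- 150
d <- data.frame(x1 = rnorm(n, mean = 10), x2 = rnorm(n, mean = 5))
d$y <- 1 + 0.4 * d$x1 + 0.3 * d$x2 + 0.2 * d$x1 * d$x2 + rnorm(n)

d$x1c <- d$x1 - mean(d$x1)           # center before forming the product
d$x2c <- d$x2 - mean(d$x2)
fit <- lm(y ~ x1c * x2c, data = d)   # x1c + x2c + x1c:x2c
summary(fit)$coefficients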

The nature of the interaction effect in a GLM can be understood by examining the simple slope for x_1 at low and high values of x_2, or the simple slope for x_2 at low and high values of x_1. The simple slope for x_1 can be obtained by factoring x_1i out of the β_1x_1i and β_3x_1ix_2i terms, as shown below

y_i = β_0 + β_2x_2i + (β_1 + β_3x_2i)x_1i + e_i

where the term in parentheses is the simple slope for x_1. Factoring x_2i out of the β_2x_2i and β_3x_1ix_2i terms shows that the simple slope for x_2 is β_2 + β_3x_1i.

When x_1 and x_2 are random and have been centered, their means will be zero, and it is common practice to report the simple slope for x_1 at x_2 = −σ̂_x2 and at x_2 = σ̂_x2 (one standard deviation below and above the centered mean of x_2). Likewise, the simple slope for x_2 is typically reported for x_1 = −σ̂_x1 and x_1 = σ̂_x1. If x_1 is fixed, simple slopes for x_2 could be computed at the lowest and highest values of x_1. If x_1 is a dummy variable, simple slopes for x_2 would be computed for x_1 = 0 and x_1 = 1. If x_1 is the predictor variable of primary importance, the simple slopes for x_1 are typically more interesting than the simple slopes for x_2.

Centering will change the values of β_1 and β_2 but not β_3 in Model 2. Centering does not affect the values of the simple slopes at the equivalent centered and uncentered values of x_1 or x_2. Centering also has no effect on the estimates of ρ²_y.x, η², or σ²_e.

When the interaction effect (β_3) is not small, the simple slope for x_1 could differ meaningfully across the values of x_2, and the simple slope for x_2 could differ meaningfully across the values of x_1. In some applications it could be informative to determine the value of x_2 where the simple slope for x_1 changes sign, or the value of x_1 where the simple slope for x_2 changes sign. These change points can be estimated by setting β̂_2 + β̂_3x_1i to zero and solving for x_1, or setting β̂_1 + β̂_3x_2i to zero and solving for x_2. The estimated value of x_2 where the simple slope for x_1 changes sign is −β̂_1/β̂_3, and the estimated value of x_1 where the simple slope for x_2 changes sign is −β̂_2/β̂_3. The interpretation of the simple slopes will be less complicated if the change point is outside the range of typical x_1 or x_2 values.

Confidence Interval for Simple Slopes

A 100(1 − α)% confidence interval for the simple slope for x_1 at x_2* (where x_2* is a particular value of x_2) is

β̂_1 + β̂_3x_2* ± t_{α/2; df_E} SE_{β̂1+β̂3x2*}     (2.7a)

where SE_{β̂1+β̂3x2*} = √( SE²_β̂1 + x_2*² SE²_β̂3 + 2x_2* cov(β̂_1, β̂_3) ), cov(β̂_1, β̂_3) is the estimated covariance between β̂_1 and β̂_3, and df_E = n − q − 1. SPSS and R will report the estimated covariances among the slope estimates as optional output.
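
In R, the estimated covariances needed for Equation 2.7a come from vcov(). This sketch re-creates the hypothetical interaction model from the earlier sketch and computes the interval at one standard deviation above the centered mean of x2.

set.seed(8)
n <- 150
d <- data.frame(x1 = rnorm(n, mean = 10), x2 = rnorm(n, mean = 5))
d$y <- 1 + 0.4 * d$x1 + 0.3 * d$x2 + 0.2 * d$x1 * d$x2 + rnorm(n)
d$x1c <- d$x1 - mean(d$x1); d$x2c <- d$x2 - mean(d$x2)
fit <- lm(y ~ x1c * x2c, data = d)

x2star <- sd(d$x2c)                           # chosen value of x2 (one SD above the mean)
b <- coef(fit); V <- vcov(fit)                # estimates and their covariance matrix
ss <- b["x1c"] + b["x1c:x2c"] * x2star        # estimated simple slope for x1
se <- sqrt(V["x1c", "x1c"] + x2star^2 * V["x1c:x2c", "x1c:x2c"] +
           2 * x2star * V["x1c", "x1c:x2c"])
ss + c(-1, 1) * qt(.975, fit$df.residual) * se  # 95% CI (Equation 2.7a)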

A 100(1 − α)% confidence interval for the simple slope for x_2 at x_1* is

β̂_2 + β̂_3x_1* ± t_{α/2; df_E} SE_{β̂2+β̂3x1*}     (2.7b)

where SE_{β̂2+β̂3x1*} = √( SE²_β̂2 + x_1*² SE²_β̂3 + 2x_1* cov(β̂_2, β̂_3) ).

Indicator Variables

A qualitative predictor variable with k categories (e.g., male/female, Democrat/Republican/Independent, freshman/sophomore/junior/senior, etc.) is called a factor and can serve as a predictor variable in a GLM if it is first converted into k − 1 indicator variables. Dummy coded variables and effect coded variables are two types of indicator variables.

Dummy coded variables have values of 0 and 1. For a categorical predictor variable with k categories, dummy coded indicator variable j is equal to 1 in level j and 0 otherwise. For example, two dummy coded indicator variables x_1 and x_2 will code k = 3 levels, as shown below.

level   x_1   x_2
1       1     0
2       0     1
3       0     0

Participants with level 1 of the predictor variable are assigned an x_1 score of 1 and an x_2 score of 0, participants with level 2 of the predictor variable are assigned an x_1 score of 0 and an x_2 score of 1, and participants with level 3 of the predictor variable are assigned an x_1 score of 0 and an x_2 score of 0. The level for which all the dummy codes are equal to 0 is called the reference level. In the above example, level 3 is the reference level.

The GLM for a k = 3 level dummy coded qualitative factor is

y_i = β_0 + β_1x_1i + β_2x_2i + e_i

and it can be shown that β_0 = μ_3, β_1 = μ_1 − μ_3, and β_2 = μ_2 − μ_3. Equation 2.5 can be used to obtain confidence intervals for β_1 and β_2, which are confidence intervals for μ_1 − μ_3 and μ_2 − μ_3, respectively. Note that β_1 − β_2 = (μ_1 − μ_3) − (μ_2 − μ_3) = μ_1 − μ_2, and so a confidence interval for μ_1 − μ_2 is obtained from a confidence interval for β_1 − β_2 (Equation 2.4 can be used to obtain a confidence interval for β_1 − β_2).

In general, in a GLM for one qualitative factor with k levels that have been dummy coded, the parameters are β_0 = μ_k and β_j = μ_j − μ_k.

Effect coded variables have values of 1, 0, and −1 (or just 1 and −1 if there are only two categories). As with dummy coding, k − 1 effect coded variables are needed to code a qualitative factor with k levels. The two effect coded variables for a qualitative predictor variable with k = 3 levels are shown below.

level   x_1   x_2
1        1     0
2        0     1
3       −1    −1

For a k-level qualitative factor, effect coded variable j is equal to 1 in level j, −1 in level k, and 0 otherwise. In this general case, β_0 = (μ_1 + μ_2 + … + μ_k)/k and β_j = μ_j − β_0. To obtain a confidence interval for a pairwise mean difference, say μ_1 − μ_2, it would be necessary to compute a confidence interval for β_1 − β_2, which equals (μ_1 − β_0) − (μ_2 − β_0) = μ_1 − μ_2. However, pairwise comparisons involving the last category are not obvious. For k = 3, μ_1 − μ_3 = 2β_1 + β_2 and μ_2 − μ_3 = β_1 + 2β_2.

With k = 2 categories, only one effect coded variable is required and the model is y_i = β_0 + β_1x_i + e_i, with participants at level 1 assigned an x_i score of 1 and participants at level 2 assigned an x_i score of −1. With k = 2 and effect coding, β_0 = (μ_1 + μ_2)/2 and β_1 = μ_1 − β_0 = μ_1 − (μ_1 + μ_2)/2 = (μ_1 − μ_2)/2. A comparison of the two coding schemes in R is sketched after this paragraph.
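
In R, the two coding schemes correspond to the contr.treatment and contr.sum contrast functions. A sketch with a hypothetical k = 3 factor, choosing level 3 as the dummy-coding reference to match the tables above:

set.seed(9)
g <- factor(rep(c("lev1", "lev2", "lev3"), each = 20))
y <- rnorm(60, mean = rep(c(10, 12, 11), each = 20))
tapply(y, g, mean)   # the three sample means

fit_d <- lm(y ~ C(g, contr.treatment(3, base = 3)))  # dummy codes, level 3 as reference
coef(fit_d)   # intercept estimates mu_3; slopes estimate mu_1 - mu_3 and mu_2 - mu_3

fit_e <- lm(y ~ C(g, contr.sum))                     # effect codes
coef(fit_e)   # intercept estimates the unweighted grand mean; slopes estimate mu_j - grand mean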

A GLM can have two or more qualitative factors. Consider the simplest case of two qualitative factors (factor A and factor B) that each have two levels (called a 2 × 2 factorial design). The two levels of factor A are denoted as a1 and a2. The two levels of factor B are denoted as b1 and b2. Either dummy coding or effect coding can be used to code each factor. The interaction between the two factors is coded by taking the product of the dummy or effect codes for each factor. The dummy codes and effect codes are shown below for a 2 × 2 factorial design.

            Dummy Codes    Effect Codes
A     B     x_1   x_2      x_1   x_2
a1    b1     1     1        1     1
a1    b2     1     0        1    −1
a2    b1     0     1       −1     1
a2    b2     0     0       −1    −1

The GLM for a 2 × 2 factorial design is

y_i = β_0 + β_1x_1i + β_2x_2i + β_3x_1ix_2i + e_i

and the interpretation of the model parameters depends on the type of coding used. With dummy coded indicator variables, the model parameters have the following definitions.

β_0 = μ_22   (mean at a2 and b2)
β_1 = μ_12 − μ_22   (simple main effect of A at b2)
β_2 = μ_21 − μ_22   (simple main effect of B at a2)
β_3 = μ_11 − μ_12 − μ_21 + μ_22   (AB interaction effect)

With effect coded indicator variables, the model parameters have the following definitions.

β_0 = (μ_11 + μ_12 + μ_21 + μ_22)/4   (grand mean)
β_1 = (μ_11 + μ_12)/4 − (μ_21 + μ_22)/4   (main effect of A divided by 2)
β_2 = (μ_11 + μ_21)/4 − (μ_12 + μ_22)/4   (main effect of B divided by 2)
β_3 = (μ_11 − μ_12 − μ_21 + μ_22)/4   (AB interaction effect divided by 4)

If the interaction term is not included in the model, the definitions of β_0, β_1, and β_2 are unchanged with effect coding, but with dummy coding these parameters have the following definitions.

β_0 = μ_22   (mean at a2 and b2)
β_1 = (μ_11 + μ_12)/2 − (μ_21 + μ_22)/2   (main effect of A)
β_2 = (μ_11 + μ_21)/2 − (μ_12 + μ_22)/2   (main effect of B)

Dummy coding is preferred to effect coding when the model contains only one qualitative predictor variable or when there are multiple qualitative predictor variables that are assumed not to interact. Effect coding is sometimes preferred to dummy coding if the model contains two or more 2-level qualitative predictor variables and their interactions.

Quadratic Model

If a nonlinear relation between y and x cannot be linearized using transformations, the following quadratic model will be appropriate when the relation between y and x can be characterized by a curve with a single bend.

y_i = β_0 + β_1x_i + β_2x_i² + e_i

In a quadratic model, the slope of the line relating x to y varies across the levels of x. Specifically, the slope of the line at x = x* is equal to β_1 + 2β_2x*, which may be estimated by replacing β_1 and β_2 with their estimates. It is standard practice to center x in a quadratic model. Centering x will change the estimate of β_1 and can substantially reduce its standard error. Centering x will not change the estimate of β_2 or its standard error.

A quadratic model implies that the direction of the relation between x and y changes sign at some value of x. In some applications it could be informative to estimate the value of x where the relation between x and y changes direction. The estimated change point is equal to −β̂_1/(2β̂_2). If the change point is outside the range of typical x scores, this implies that the direction of the relation is constant across typical x scores, and this could simplify the interpretation of results.

To estimate the amount of curvature in the nonlinear relation between x and y, the slope of the line can be compared at low (x_L) and high (x_H) values of x. The difference in slopes at low and high values of x is equal to (β_1 + 2β_2x_H) − (β_1 + 2β_2x_L) = 2(x_H − x_L)β_2, where x_L and x_H are values specified by the researcher. A confidence interval for 2(x_H − x_L)β_2 is obtained by multiplying the endpoints of a confidence interval for β_2 by 2(x_H − x_L).

In applications where the slope of the line at x = x* does not have a clear interpretation, which would be the case in applications where it is difficult to assign a clear psychological meaning to specific values of the response variable, the researcher might be content with simply testing the following hypotheses about the values of β_1 and β_2.

H0: β_1 = 0   H1: β_1 > 0   H2: β_1 < 0
H0: β_2 = 0   H1: β_2 > 0   H2: β_2 < 0

Confidence intervals for β_1 and β_2 may be used to select H1 or H2. Deciding β_1 > 0 implies that there is a positive relation between y and x, and deciding β_1 < 0 implies that there is a negative relation between y and x. Deciding β_2 > 0 implies that the predicted y scores follow a curve that bends up, and deciding β_2 < 0 implies that the predicted y scores follow a curve that bends down. For the fixed-x case, a graph of the sample means with confidence interval bars at each level of x provides additional information about the nature of the nonlinear relation. It is common to center x in a quadratic model because doing so reduces the standard error of β̂_1 and increases the power of the test of H0: β_1 = 0 without affecting the power of the test of H0: β_2 = 0.
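
A sketch of a quadratic fit in R with a centered predictor (hypothetical data); I() protects the squared term inside the model formula.

set.seed(10)
x <- runif(120, 0, 10)
y <- 5 + 2 * x - 0.15 * x^2 + rnorm(120)

xc  <- x - mean(x)                 # center the predictor
fit <- lm(y ~ xc + I(xc^2))
b   <- coef(fit)
b[2] + 2 * b[3] * 1                # estimated slope of the curve at xc = 1
-b[2] / (2 * b[3]) + mean(x)       # estimated change point, on the original x scale
confint(fit)                       # CIs for beta_1 and beta_2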

Example 2.2. A random sample of psychology students was obtained from a volunteer pool. The sample was randomized into four study group conditions with ten study groups per condition. The four conditions used study groups of 2, 4, 6, or 8 students. The dependent variable is performance on a research project scored on a scale from 0 to 100. The 95% confidence intervals for β_1 and β_2 are [4.8, 9.64] and [−0.79, −0.34], respectively. These confidence interval results support H1: β_1 > 0 and H2: β_2 < 0, indicating that project scores are positively related to study group size but the relation is curved with a downward bend.

[Graph: estimated population means with 95% confidence interval bars at each study group size.]

Semipartial Correlation

A semipartial correlation (or part correlation) between x_1 and y controlling for x_2, …, x_s is denoted as ρ_y(x1.x0), where x_0 represents the set of control variables x_2, …, x_s. A semipartial correlation is a Pearson correlation between y and e_x1, where e_x1 is the prediction error in a model that predicts x_1 from x_2, …, x_s. Replacing x with e_x1 in Figure 1 (in Module 1) is helpful in assessing the importance of an estimated semipartial correlation. A semipartial correlation between x_1 and y controlling for x_2, …, x_s describes the standard deviation change in y associated with a 1 standard deviation increase in e_x1.

An estimate of a semipartial correlation between x_1 and y controlling for x_2, …, x_s can be obtained by first computing the residuals in a GLM where x_1 is the response variable and x_2, …, x_s are the predictor variables. The Pearson correlation between the y scores and the x_1 residuals (ê_x1) is an estimate of the semipartial correlation. It can be shown that the squared semipartial correlation between y and x_1 is equal to the difference between ρ²_y.x (where x is the set of all control variables plus x_1) and ρ²_y.x0 (where x_0 is the set of all control variables). This difference is referred to as ΔR² in APA journals.

In a random-x model, a semipartial correlation could be computed for each predictor variable controlling for all other predictor variables in the model. These semipartial correlations are conceptually similar to slope coefficients because they describe the relation between x_j and y after the linear effects of all other predictor variables have been removed only from x_j.
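
The residualizing recipe above is two lm calls and a cor call in R. A sketch with hypothetical data, including the ΔR² check:

set.seed(11)
n <- 200
x2 <- rnorm(n)
x1 <- 0.5 * x2 + rnorm(n)
y  <- 0.4 * x1 + 0.3 * x2 + rnorm(n)

e_x1 <- resid(lm(x1 ~ x2))   # part of x1 not linearly related to the control variable
cor(y, e_x1)                 # estimated semipartial correlation

# squared semipartial = R-squared(full model) - R-squared(controls only)
summary(lm(y ~ x1 + x2))$r.squared - summary(lm(y ~ x2))$r.squared
cor(y, e_x1)^2               # same value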

A semipartial correlation is a standardized measure of effect size that is easier to interpret than a slope coefficient in applications where the metrics of x_j and y are unfamiliar to the intended audience.

A confidence interval for ρ_y(x1.x0) is obtained in two steps. First, an approximate 100(1 − α)% confidence interval for a transformed semipartial correlation estimate is computed

ρ̂′_y(x1.x0) ± z_{α/2} √( g/(n − 3) )     (2.8)

where g = (ρ̂⁴_y.x − 2ρ̂²_y.x + ρ̂²_y.x0 − ρ̂⁴_y.x0 + 1)/(1 − ρ̂²_y(x1.x0))² and ρ̂′_y(x1.x0) = ln[(1 + ρ̂_y(x1.x0))/(1 − ρ̂_y(x1.x0))]/2. Let ρ̂′_L and ρ̂′_U denote the endpoints of Equation 2.8. Reverse transforming the endpoints of Equation 2.8 gives the following lower confidence limit for ρ_y(x1.x0)

[exp(2ρ̂′_L) − 1]/[exp(2ρ̂′_L) + 1]     (2.9a)

and the following upper confidence limit for ρ_y(x1.x0)

[exp(2ρ̂′_U) − 1]/[exp(2ρ̂′_U) + 1].     (2.9b)

Partial Correlation

A partial correlation between x_j and y removes the linear effects of one or more control variables from both x_j and y. In comparison, a semipartial correlation between x_j and y removes the linear effects of one or more control variables from only x_j. Let x_0 denote a set of control variables x_2, …, x_s. A partial correlation between x_1 and y controlling for x_0, denoted as ρ_yx1.x0, is a Pearson correlation between e_x1 and e_y, where e_x1 represents the prediction errors in a model that predicts x_1 from x_0 and e_y represents the prediction errors in a model that predicts y from x_0. A partial correlation between x_1 and y describes the standard deviation change in e_y associated with a 1 standard deviation increase in e_x1. Replacing y with e_y and x with e_x1 in Figure 1 is helpful in assessing the importance of an estimated partial correlation.

Like the multiple correlation and semipartial correlation, a partial correlation is appropriate only for random-x models. A partial correlation may be more interesting than a semipartial correlation if it is important to remove the effects of the control variables from both y and x_1. For example, if social skills and problem solving skills are measured in a sample of 6 to 9 year old children, the correlation between these two variables could be misleading because both variables are related to age, and it would be desirable to remove the effect of age from both variables.

A confidence interval for ρ_yx1.x0 is obtained in two steps. First, a 100(1 − α)% confidence interval for a transformed partial correlation estimate is computed

ρ̂′_yx1.x0 ± z_{α/2} √( 1/(n − s − 3) )     (2.10)

where ρ̂′_yx1.x0 = ln[(1 + ρ̂_yx1.x0)/(1 − ρ̂_yx1.x0)]/2 and s is the number of control variables. Let ρ̂′_L and ρ̂′_U denote the endpoints of Equation 2.10. Reverse transforming the endpoints of Equation 2.10 gives the following lower confidence limit for ρ_yx1.x0

[exp(2ρ̂′_L) − 1]/[exp(2ρ̂′_L) + 1]     (2.11a)

and the following upper confidence limit for ρ_yx1.x0

[exp(2ρ̂′_U) − 1]/[exp(2ρ̂′_U) + 1].     (2.11b)

Example 2.3. A validation study examined the correlation between a new 3-D spatial ability scale and a 2-D spatial ability scale. However, both scales have detailed written instructions, and it is likely that both scales are contaminated by differences in reading comprehension. The 2-D and 3-D spatial ability scales and a measure of reading comprehension were given to a random sample of 151 community college students. The sample partial correlation between the 3-D and 2-D scales, controlling for reading comprehension, was .70. The Fisher transformed partial correlation is 0.867, and Equation 2.10 with α = .05 gives 0.867 ± 1.96√(1/147) = [0.705, 1.029], which is reverse transformed to give [.61, .77].
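
The two-step interval of Equations 2.10 and 2.11ab is short to code. A sketch with hypothetical data and s = 1 control variable, mirroring the structure of Example 2.3:

set.seed(12)
n  <- 151
x0 <- rnorm(n)                  # control variable
x1 <- 0.6 * x0 + rnorm(n)
y  <- 0.6 * x0 + 0.5 * x1 + rnorm(n)

e_y  <- resid(lm(y ~ x0))       # remove the control variable from y
e_x1 <- resid(lm(x1 ~ x0))      # and from x1
r_p  <- cor(e_y, e_x1)          # estimated partial correlation

s  <- 1                                                # number of control variables
zr <- log((1 + r_p) / (1 - r_p)) / 2                   # Fisher transform
ci <- zr + c(-1, 1) * qnorm(.975) * sqrt(1 / (n - s - 3))
(exp(2 * ci) - 1) / (exp(2 * ci) + 1)                  # reverse transform (Equations 2.11ab)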

Comparing Two Correlations

Recall from Module 1 that Equations 1.9ab could be used to obtain a confidence interval for the difference between two Pearson or point-biserial correlations that were estimated from a two-group design. Equations 1.9ab also may be used to obtain confidence intervals for a difference between two squared multiple correlations, two semipartial correlations, or two partial correlations, where the two correlations have been estimated from a two-group design.

Standardized Slope Coefficients

An estimated standardized slope, denoted as β̂′_j, is computed from standardized y scores (y′_i = (y_i − μ̂_y)/σ̂_y) and standardized x_j scores (x′_ji = (x_ji − μ̂_xj)/σ̂_xj). It is not necessary to standardize indicator variables. Recall that β_j describes the relation between y and e_xj, where e_xj is the part of x_j that is not linearly related to the other predictor variables in the model. If the predictor variables are standardized and if x_j is predicted from all other predictor variables in the model, e_xj represents the part of x_j that is not linearly related to any of the other predictor variables. Then β′_j describes the change in y, in standard deviation units, associated with a 1-point increase in e_xj. However, it can be shown that the standard deviation of the e_xj scores is equal to √(1 − ρ²_xj.x), which will be less than 1 unless ρ²_xj.x = 0. For example, if ρ²_xj.x = .75, then the standard deviation of the e_xj scores is equal to √(1 − ρ²_xj.x) = .5, so that a 1-point increase in e_xj corresponds to a 2 standard deviation increase in e_xj. The fact that a 1-point increase in e_xj does not always correspond to a 1 standard deviation increase in e_xj makes the standardized slope difficult to interpret. If the model has only one predictor variable or if the predictor variables are uncorrelated, the standardized slope is equal to the Pearson correlation between y and x_j.

A standardized slope estimate can be expressed as β̂′_j = ρ̂_y(xj.x0)/√(1 − ρ̂²_xj.x). If ρ̂²_xj.x > 0, the standardized slope will be greater than the corresponding semipartial correlation. Unlike a semipartial correlation, which cannot be greater than 1 or less than −1, a standardized slope can have a value that is much greater than 1 or much less than −1 when ρ̂²_xj.x is large. Although most APA journals recommend the reporting of standardized slope estimates, semipartial correlations along with their confidence intervals are usually preferable.

An approximate 100(1 − α)% confidence interval for β′_j is

β̂′_j ± z_{α/2} SE_β̂′j     (2.12)

where SE_β̂′j is the standard error of the standardized slope. The formula for SE_β̂′j is complicated but can be obtained in R.
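
Standardized slope estimates themselves are easy to compute in R by z-scoring the variables before fitting (hypothetical data below); only the standard error requires specialized code.

set.seed(13)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 3 + 2 * d$x1 + 1.5 * d$x2 + rnorm(100, sd = 4)

dz <- data.frame(scale(d))             # z-score y and both predictors
coef(lm(y ~ x1 + x2, data = dz))[-1]   # estimated standardized slopes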

Example 2.4. In Example 2.1, life satisfaction (y), neuroticism (x_1), conscientiousness (x_2), and industriousness (x_3) were measured in a sample of n = 236 employees. The estimated standardized slopes and semipartial correlations, along with 95% confidence intervals for the population semipartial correlations and standardized slopes, are given below. Suppose industriousness (x_3) is the predictor variable of primary interest. After controlling for neuroticism and conscientiousness, we can be 95% confident that a one standard deviation increase in industriousness is associated with a .108 to .38 standard deviation increase in life satisfaction in the study population.

                 Semipartial   Standardized   95% CI for                95% CI for
Predictor        Correlation   Slope          Semipartial Correlation   Standardized Slope
neuroticism      …             …              [−.456, −.44]             [−0.554, −0.94]
conscientious    …             …              [−.53, −.031]             [−0.479, −0.17]
industrious      …             …              [.108, .38]               [0.170, 0.450]

Analysis of Covariance Model

An ANCOVA model is a GLM with one or more qualitative factors and one or more quantitative predictor variables. The quantitative predictor variables in the ANCOVA model are referred to as covariates. In experimental designs where participants are randomly assigned to the levels of a treatment factor, the covariates can be assumed to be uncorrelated with the indicator variables, and then the slope coefficients for the treatment indicator variables will not contain any confounding variable bias. In experimental designs, covariates are used primarily to reduce prediction error variance and secondarily because the researcher is interested in how the covariates relate to the dependent variable or interact with the treatment factor(s). Reducing prediction error variance has the beneficial effects of narrowing the confidence interval widths and increasing the power of the hypothesis tests.

The ANCOVA model is also used in nonexperimental designs where participants have self-selected into two or more treatment conditions. When participants self-select into treatment conditions, the slope coefficients for the treatment indicator variables will contain confounding variable bias because the participants in different treatment conditions could systematically differ in terms of attributes that are correlated with the response variable. If the most important confounding variables can be included in the model as covariates, the confounding variable bias could be substantially reduced, and then the slope coefficients for the treatment effects will be less misleading.

Consider an ANCOVA model for a 2-level factor and one covariate. Using dummy coding, the following model includes one covariate (x_1), one dummy coded variable (x_2), and their product (x_3 = x_1x_2)

y_i = β_0 + β_1x_1i + β_2x_2i + β_3x_3i + e_i.     (Model 3)

It is common to center the covariate by subtracting the overall sample covariate mean from every covariate score. The covariate is assumed to be centered in the above model. If a confidence interval for β_3 suggests that the interaction effect could be small, a confidence interval or test for β_2 would be examined. Otherwise, the simple slopes for the dummy coded variable would be examined at low and high values of the covariate.

The ANCOVA model can be represented as a multiple-group regression model. For example, the slope coefficients of Model 3 can be interpreted in terms of the parameters of a simple linear regression model for each of the two groups, as shown below

y_1i = β_10 + β_11x_11i + e_1i     (Model 4a)
y_2i = β_20 + β_21x_21i + e_2i     (Model 4b)

where the first subscript indicates group membership (1 or 2). When x_2 is a dummy coded variable in Model 3, it can be shown that β_2 = β_10 − β_20, β_1 = β_21, and β_3 = β_11 − β_21. When x_2 is an effect coded variable in Model 3, β_1 = (β_11 + β_21)/2, β_2 = (β_10 − β_20)/2, and β_3 = (β_11 − β_21)/2. In Models 4a and 4b, β_11 and β_21 are the simple slopes for x_1 at each level of the dummy variable.

Example 2.5. A researcher suspects that multiple attempts to recall recently learned information, even without performance feedback, will improve recall performance. A two-group experiment was conducted where all participants viewed a 50-minute video lecture. The first group was given a short test over the lecture material, without performance feedback, every day for five days. The second group was not tested during this 5-day period. Ten days later, all participants were given a test of the lecture material (scored 0 to 100). Using an ANOVA model y_i = β_0 + β_1x_1i + e_i, where x_1 is a dummy coded variable, the 95% confidence interval for β_1 was [1.1, 18.8], suggesting that attempts to recall information, even without feedback, will improve retention of learned material. The researcher also obtained the total SAT score for each participant, which is believed to correlate with the final test score. Using an ANCOVA model y_i = β_0 + β_1x_1i + β_2x_2i + β_3x_3i + e_i, where x_1 is the total SAT score and x_2 is the dummy coded variable, the interaction effect appeared to be small, and a 95% confidence interval for β_2 was [6.7, 13.2]. This confidence interval is considerably narrower, and hence more informative, than the confidence interval for β_1 using the ANOVA model.
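
A sketch of fitting a Model 3 style ANCOVA in R (hypothetical data, not the data from Example 2.5): center the covariate, then let the formula expand to the covariate, the dummy variable, and their product.

set.seed(14)
n  <- 80
x1 <- rnorm(n, mean = 50, sd = 10)   # covariate
x2 <- rep(0:1, each = n / 2)         # dummy coded treatment variable
y  <- 20 + 0.5 * x1 + 8 * x2 + rnorm(n, sd = 5)

x1c <- x1 - mean(x1)                 # center the covariate
fit <- lm(y ~ x1c * x2)              # x1c + x2 + x1c:x2
confint(fit)   # examine the interaction (x1c:x2) first, then the x2 coefficient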

Example 2.6. A researcher obtained a sample of retirees who began working crossword puzzles after retirement and another sample of retirees who did not take up crossword puzzles. Using an ANOVA model y_i = β_0 + β_1x_1i + e_i, where y is a measure of intelligence and x_1 is a dummy coded variable, the researcher obtained a 95% confidence interval for β_1 of [8.2, 12.5]. This result suggests that average intelligence is higher for retirees who began working crossword puzzles after retirement. However, this nonexperimental result does not imply that taking up crossword puzzles after retirement will cause an improvement in cognitive functioning. It is possible that retirees who take up crossword puzzles after retirement are more intelligent than retirees who do not take up crossword puzzles. Using an ANCOVA model y_i = β_0 + β_1x_1i + β_2x_2i + β_3x_3i + e_i, where x_1 is years of college education (an easily measured proxy of pre-retirement intelligence) and x_2 is the dummy coded variable, the interaction effect appeared to be small, and a 95% confidence interval for β_2 was [−0.4, 2.1]. This result suggests that taking up crossword puzzles after retirement may not have much of an effect on cognitive functioning.

The ANCOVA model has another important application in studies that compare two different treatments (e.g., treatment 1 and treatment 2) where certain participants are expected to benefit more from treatment 1 than from treatment 2. In these situations, it may be difficult to ethically justify an experiment that randomly assigns participants to treatment conditions. However, if a priority score can be assigned to each participant that assesses the degree to which a participant might benefit from treatment 1, participants with priority scores above some cut point would all be assigned to treatment 1 and participants with priority scores below the cut point would all be assigned to treatment 2. It is a remarkable fact that if the priority score is used as a covariate in a two-group ANCOVA, the slope coefficient for the treatment dummy variable will describe the causal effect of treatment on the response variable (assuming a linear relation between the priority scores and the response variable and no priority by treatment interaction). This type of design is called a regression discontinuity (RD) design.

Example 2.7. A new program to provide low income college-bound students with financial aid application training was evaluated using a random sample of 100 low income college-bound students from the Los Angeles school district. All students from families making less than $17,000 per year received the financial aid application training, and all other students received the usual assistance from their high school counselor. The dependent variable was the amount of financial aid obtained. The 95% confidence interval for the dummy variable slope coefficient was [$60, $918], indicating that if all low income college-bound students in the Los Angeles school district had received the financial aid application training, their mean financial aid would be $60 to $918 higher than if they had received the typical assistance provided by a high school counselor. Note that the researcher was able to make a causal claim about the effectiveness of the new program even though students were not randomly assigned to groups.

Although the RD design can reduce the "ethical costs" of a study, the RD design requires about 300% more participants than the corresponding two-group experimental design to achieve the same hypothesis testing power and confidence interval precision.

Thus, the RD design will have lower ethical costs combined with greater sampling costs (e.g., measurement costs, cost of administering treatment, payments to participants) than a two-group experimental design. The sampling costs of the RD design can be substantially reduced by specifying an "indifference range" of priority scores for which either treatment could be beneficial and then randomly assigning participants with priority scores within this range to one of the two treatment conditions. Participants with priority scores below the lower limit of the indifference range are assigned to treatment 2, and participants with priority scores above the upper limit of the indifference range are assigned to treatment 1. Increasing the width of the indifference range decreases the sample size requirements of the RD design. For example, with a wide indifference range that includes about 50% of the priority scores, the RD design requires only about 30% more participants than the corresponding two-group experimental design.

Exploratory Model Selection

In applications where the researcher has measured many potential predictor variables and wants to determine which of these variables are most useful in predicting y, there are exploratory procedures (e.g., forward selection, backward elimination, stepwise) in SPSS and R that will sift through a set of candidate predictor variables and identify the best subset. Although these exploratory procedures are popular, they can produce misleading results. In general, the p-values for the selected predictor variables will be too small, and the confidence intervals for the slope coefficients of the selected predictor variables will be too narrow.

If a large sample is available, the sample can be randomly divided into two samples. The exploratory model selection is performed in one sample (the training sample), and the selected model is then applied in the second sample (the test sample). Only the results in the test sample should be reported. Note, however, that the parameter estimates (e.g., ρ̂²_y.x, ρ̂_y(x1.x0), β̂_j, β̂′_j) obtained in the training sample tend to shrink towards 0 in the test sample, and the shrinkage can be substantial if the number of candidate predictor variables is large. The amount of shrinkage can be reduced by using a Bonferroni adjusted alpha level in the training sample variable selection process. LASSO regression (described in more advanced courses) is a newer method for selecting predictor variables in the training sample that tend to remain good predictors in the test sample.
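
A sketch of the training/test strategy in R with hypothetical data, using step() (one of the exploratory procedures mentioned above) for the selection step. Only the test-sample fit of the selected model would be reported.

set.seed(15)
d <- data.frame(matrix(rnorm(200 * 6), 200, 6))
names(d) <- paste0("x", 1:6)               # six candidate predictors
d$y <- 0.5 * d$x1 + 0.4 * d$x2 + rnorm(200)

train <- sample(nrow(d), nrow(d) / 2)                    # random half for selection
sel   <- step(lm(y ~ ., data = d[train, ]), trace = 0)   # backward elimination
summary(lm(formula(sel), data = d[-train, ]))            # refit and report in the test sample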

Assumptions

In fixed-x models, confidence intervals for slopes, simple slopes, η², and σ²_e assume: 1) random sampling, 2) independence among participants, 3) linearity between y and each predictor variable (linearity assumption), 4) constant variability of the prediction errors across the values of every predictor variable (equal prediction error variance assumption), and 5) approximate normality of the prediction errors in the study population (prediction error normality assumption). Scatterplots of y with each predictor variable are useful in assessing the linearity assumption. Scatterplots of the residuals with each predictor variable (called residual plots) are helpful in assessing the equal variance assumption. Skewness and kurtosis estimates of the residuals are useful in assessing the prediction error normality assumption. Transforming the response variable may reduce prediction error non-normality.

In random-x models, confidence intervals for slopes and simple slopes require the same assumptions given above for the fixed-x models. Confidence intervals for a squared multiple correlation, partial correlation, semipartial correlation, or standardized slope in random-x models require a stronger assumption: that y and all q predictor variables have an approximate multivariate normal distribution in the study population. The multivariate normality assumption implies the linearity and equal error variance assumptions of the fixed-x model, and also assumes that the predictor variables are linearly related to each other and each have an approximate normal distribution in the study population. To assess the multivariate normality assumption, assess the linearity and equal error variance assumptions as described above. Also examine scatterplots for all pairs of predictor variables to assess linearity of the predictor variables, and check all predictor variables for skewness and kurtosis. Transforming the predictor variables may reduce nonlinearity and nonnormality of the predictor variables.

Influential Observations

A participant with an unusually large residual (ê_i = y_i − ŷ_i) may excessively influence the least squares estimates of one or more β_j values. However, an examination of the residuals can be misleading because the least squares estimates of β_j have the property of minimizing the sum of the (y_i − ŷ_i)² values, and this tends to reduce the size of the residual for an outlier participant. A better approach is to compute ŷ_i for participant i using only the data from the other n − 1 participants and then subtract this predicted y score from y_i. These are called deleted residuals. The deleted residuals can be made easier to interpret by dividing them by their standard errors. Deleted residuals divided by their standard errors are called studentized deleted residuals.
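
In R, rstudent() returns the studentized deleted residuals without literally refitting the model n times. A sketch with hypothetical data containing one planted outlier:

set.seed(16)
d <- data.frame(x = rnorm(40))
d$y <- 1 + 0.5 * d$x + rnorm(40)
d$y[40] <- d$y[40] + 6                    # plant one outlier

fit <- lm(y ~ x, data = d)
head(sort(abs(rstudent(fit)), decreasing = TRUE))   # largest studentized deleted residuals
plot(fit)   # residual plots are also useful for the assumption checks above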


More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46 BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

More information

Chapter 14 Student Lecture Notes 14-1

Chapter 14 Student Lecture Notes 14-1 Chapter 14 Student Lecture Notes 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 14 Multiple Regression Analysis and Model Building Chap 14-1 Chapter Goals After completing this

More information

Least Squares Analyses of Variance and Covariance

Least Squares Analyses of Variance and Covariance Least Squares Analyses of Variance and Covariance One-Way ANOVA Read Sections 1 and 2 in Chapter 16 of Howell. Run the program ANOVA1- LS.sas, which can be found on my SAS programs page. The data here

More information

Finding Relationships Among Variables

Finding Relationships Among Variables Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis

More information

Lecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II

Lecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II Lecture 3: Multiple Regression Prof. Sharyn O Halloran Sustainable Development Econometrics II Outline Basics of Multiple Regression Dummy Variables Interactive terms Curvilinear models Review Strategies

More information

Simple Linear Regression: One Qualitative IV

Simple Linear Regression: One Qualitative IV Simple Linear Regression: One Qualitative IV Simple linear regression with one qualitative IV variable is essentially identical to linear regression with quantitative variables. The primary difference

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012

More information

The One-Way Independent-Samples ANOVA. (For Between-Subjects Designs)

The One-Way Independent-Samples ANOVA. (For Between-Subjects Designs) The One-Way Independent-Samples ANOVA (For Between-Subjects Designs) Computations for the ANOVA In computing the terms required for the F-statistic, we won t explicitly compute any sample variances or

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs

Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs The Analysis of Variance (ANOVA) The analysis of variance (ANOVA) is a statistical technique

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

with the usual assumptions about the error term. The two values of X 1 X 2 0 1

with the usual assumptions about the error term. The two values of X 1 X 2 0 1 Sample questions 1. A researcher is investigating the effects of two factors, X 1 and X 2, each at 2 levels, on a response variable Y. A balanced two-factor factorial design is used with 1 replicate. The

More information

Harvard University. Rigorous Research in Engineering Education

Harvard University. Rigorous Research in Engineering Education Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09 Statistical Inference You have a sample and want to use the data collected

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

Multiple Linear Regression II. Lecture 8. Overview. Readings

Multiple Linear Regression II. Lecture 8. Overview. Readings Multiple Linear Regression II Lecture 8 Image source:http://commons.wikimedia.org/wiki/file:vidrarias_de_laboratorio.jpg Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution

More information

Multiple Linear Regression II. Lecture 8. Overview. Readings. Summary of MLR I. Summary of MLR I. Summary of MLR I

Multiple Linear Regression II. Lecture 8. Overview. Readings. Summary of MLR I. Summary of MLR I. Summary of MLR I Multiple Linear Regression II Lecture 8 Image source:http://commons.wikimedia.org/wiki/file:vidrarias_de_laboratorio.jpg Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution

More information

4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES

4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES 4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES FOR SINGLE FACTOR BETWEEN-S DESIGNS Planned or A Priori Comparisons We previously showed various ways to test all possible pairwise comparisons for

More information

Multiple Regression. More Hypothesis Testing. More Hypothesis Testing The big question: What we really want to know: What we actually know: We know:

Multiple Regression. More Hypothesis Testing. More Hypothesis Testing The big question: What we really want to know: What we actually know: We know: Multiple Regression Ψ320 Ainsworth More Hypothesis Testing What we really want to know: Is the relationship in the population we have selected between X & Y strong enough that we can use the relationship

More information

Sociology 593 Exam 2 Answer Key March 28, 2002

Sociology 593 Exam 2 Answer Key March 28, 2002 Sociology 59 Exam Answer Key March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably

More information

Chapter 4. Regression Models. Learning Objectives

Chapter 4. Regression Models. Learning Objectives Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i B. Weaver (24-Mar-2005) Multiple Regression... 1 Chapter 5: Multiple Regression 5.1 Partial and semi-partial correlation Before starting on multiple regression per se, we need to consider the concepts

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Review of the General Linear Model

Review of the General Linear Model Review of the General Linear Model EPSY 905: Multivariate Analysis Online Lecture #2 Learning Objectives Types of distributions: Ø Conditional distributions The General Linear Model Ø Regression Ø Analysis

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Dr. Bob Gee Dean Scott Bonney Professor William G. Journigan American Meridian University 1 Learning Objectives Upon successful completion of this module, the student should

More information

Multiple Linear Regression II. Lecture 8. Overview. Readings

Multiple Linear Regression II. Lecture 8. Overview. Readings Multiple Linear Regression II Lecture 8 Image source:https://commons.wikimedia.org/wiki/file:autobunnskr%c3%a4iz-ro-a201.jpg Survey Research & Design in Psychology James Neill, 2016 Creative Commons Attribution

More information

Multiple Linear Regression II. Lecture 8. Overview. Readings. Summary of MLR I. Summary of MLR I. Summary of MLR I

Multiple Linear Regression II. Lecture 8. Overview. Readings. Summary of MLR I. Summary of MLR I. Summary of MLR I Multiple Linear Regression II Lecture 8 Image source:https://commons.wikimedia.org/wiki/file:autobunnskr%c3%a4iz-ro-a201.jpg Survey Research & Design in Psychology James Neill, 2016 Creative Commons Attribution

More information

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Making sense of Econometrics: Basics

Making sense of Econometrics: Basics Making sense of Econometrics: Basics Lecture 4: Qualitative influences and Heteroskedasticity Egypt Scholars Economic Society November 1, 2014 Assignment & feedback enter classroom at http://b.socrative.com/login/student/

More information

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

In ANOVA the response variable is numerical and the explanatory variables are categorical.

In ANOVA the response variable is numerical and the explanatory variables are categorical. 1 ANOVA ANOVA means ANalysis Of VAriance. The ANOVA is a tool for studying the influence of one or more qualitative variables on the mean of a numerical variable in a population. In ANOVA the response

More information

Do not copy, post, or distribute

Do not copy, post, or distribute 14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

10. Alternative case influence statistics

10. Alternative case influence statistics 10. Alternative case influence statistics a. Alternative to D i : dffits i (and others) b. Alternative to studres i : externally-studentized residual c. Suggestion: use whatever is convenient with the

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Lecture 6: Linear Regression (continued)

Lecture 6: Linear Regression (continued) Lecture 6: Linear Regression (continued) Reading: Sections 3.1-3.3 STATS 202: Data mining and analysis October 6, 2017 1 / 23 Multiple linear regression Y = β 0 + β 1 X 1 + + β p X p + ε Y ε N (0, σ) i.i.d.

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

Inferences About the Difference Between Two Means

Inferences About the Difference Between Two Means 7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

Sociology 593 Exam 1 Answer Key February 17, 1995

Sociology 593 Exam 1 Answer Key February 17, 1995 Sociology 593 Exam 1 Answer Key February 17, 1995 I. True-False. (5 points) Indicate whether the following statements are true or false. If false, briefly explain why. 1. A researcher regressed Y on. When

More information

Module 1. Study Populations

Module 1. Study Populations Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In social and behavioral research, a study population usually consists of a specific

More information

CS 5014: Research Methods in Computer Science

CS 5014: Research Methods in Computer Science Computer Science Clifford A. Shaffer Department of Computer Science Virginia Tech Blacksburg, Virginia Fall 2010 Copyright c 2010 by Clifford A. Shaffer Computer Science Fall 2010 1 / 207 Correlation and

More information

This gives us an upper and lower bound that capture our population mean.

This gives us an upper and lower bound that capture our population mean. Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

SPSS Output. ANOVA a b Residual Coefficients a Standardized Coefficients

SPSS Output. ANOVA a b Residual Coefficients a Standardized Coefficients SPSS Output Homework 1-1e ANOVA a Sum of Squares df Mean Square F Sig. 1 Regression 351.056 1 351.056 11.295.002 b Residual 932.412 30 31.080 Total 1283.469 31 a. Dependent Variable: Sexual Harassment

More information

Simple, Marginal, and Interaction Effects in General Linear Models

Simple, Marginal, and Interaction Effects in General Linear Models Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means

More information

Regression With a Categorical Independent Variable

Regression With a Categorical Independent Variable Regression With a Categorical Independent Variable Lecture 15 March 17, 2005 Applied Regression Analysis Lecture #15-3/17/2005 Slide 1 of 29 Today s Lecture» Today s Lecture» Midterm Note» Example Regression

More information

Research Design - - Topic 19 Multiple regression: Applications 2009 R.C. Gardner, Ph.D.

Research Design - - Topic 19 Multiple regression: Applications 2009 R.C. Gardner, Ph.D. Research Design - - Topic 19 Multiple regression: Applications 2009 R.C. Gardner, Ph.D. Curve Fitting Mediation analysis Moderation Analysis 1 Curve Fitting The investigation of non-linear functions using

More information

Classification: Linear Discriminant Analysis

Classification: Linear Discriminant Analysis Classification: Linear Discriminant Analysis Discriminant analysis uses sample information about individuals that are known to belong to one of several populations for the purposes of classification. Based

More information

Longitudinal Data Analysis of Health Outcomes

Longitudinal Data Analysis of Health Outcomes Longitudinal Data Analysis of Health Outcomes Longitudinal Data Analysis Workshop Running Example: Days 2 and 3 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development

More information

Correlation and Linear Regression

Correlation and Linear Regression Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means

More information

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47 ECON2228 Notes 2 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 2 2014 2015 1 / 47 Chapter 2: The simple regression model Most of this course will be concerned with

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts A Study of Statistical Power and Type I Errors in Testing a Factor Analytic Model for Group Differences in Regression Intercepts by Margarita Olivera Aguilar A Thesis Presented in Partial Fulfillment of

More information

Self-Assessment Weeks 8: Multiple Regression with Qualitative Predictors; Multiple Comparisons

Self-Assessment Weeks 8: Multiple Regression with Qualitative Predictors; Multiple Comparisons Self-Assessment Weeks 8: Multiple Regression with Qualitative Predictors; Multiple Comparisons 1. Suppose we wish to assess the impact of five treatments while blocking for study participant race (Black,

More information

Black White Total Observed Expected χ 2 = (f observed f expected ) 2 f expected (83 126) 2 ( )2 126

Black White Total Observed Expected χ 2 = (f observed f expected ) 2 f expected (83 126) 2 ( )2 126 Psychology 60 Fall 2013 Practice Final Actual Exam: This Wednesday. Good luck! Name: To view the solutions, check the link at the end of the document. This practice final should supplement your studying;

More information

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables. Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

More information

Statistics Introductory Correlation

Statistics Introductory Correlation Statistics Introductory Correlation Session 10 oscardavid.barrerarodriguez@sciencespo.fr April 9, 2018 Outline 1 Statistics are not used only to describe central tendency and variability for a single variable.

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Reducing Computation Time for the Analysis of Large Social Science Datasets

Reducing Computation Time for the Analysis of Large Social Science Datasets Reducing Computation Time for the Analysis of Large Social Science Datasets Douglas G. Bonett Center for Statistical Analysis in the Social Sciences University of California, Santa Cruz Jan 28, 2014 Overview

More information