Explanatory Variables Must Be Linearly Independent...
- Claude Chandler
- 5 years ago
Handout 1C - 1
Explanatory Variables Must Be Linearly Independent...

Recall that the multiple linear regression model

  Yj = β0 + β1 X1j + β2 X2j + ... + βp Xpj + εj,  j = 1, ..., n,

is shorthand for n linear relationships

  Y1 = β0 + β1 X11 + β2 X21 + ... + βp Xp1 + ε1
  Y2 = β0 + β1 X12 + β2 X22 + ... + βp Xp2 + ε2
  ...
  Yn = β0 + β1 X1n + β2 X2n + ... + βp Xpn + εn

The least squares estimate (β0, β1, ..., βp) exists under 2 conditions:
- n must be large enough relative to p (cannot include too many covariates)
- The p covariates and also the intercept must be linearly independent

What does linearly independent mean?
Handout 1C - 2
Definition of Linear Dependence and Independence

A set of vectors v1, v2, ..., vn is called linearly dependent if there exist scalars a1, a2, ..., an, not all zero, such that

  a1 v1 + a2 v2 + ... + an vn = 0.

Otherwise, the vectors v1, v2, ..., vn are linearly independent.

For example, the four vectors v1, v2, v3, v4 displayed on the slide are linearly dependent because

  v1 - v2 - v3 - v4 = 0,

but v1, v2, v3 are linearly independent because the only scalars a1, a2, a3 that make a1 v1 + a2 v2 + a3 v3 = 0 are a1 = a2 = a3 = 0.
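Numerically, linear independence of a set of vectors can be checked by comparing the rank of the matrix having them as columns with the number of vectors. A small base-R sketch, using made-up vectors (not the ones on the slide):

```r
v1 <- c(1, 0, 0)
v2 <- c(0, 1, 0)
v3 <- v1 + v2                  # v3 is a combination of v1 and v2

A <- cbind(v1, v2, v3)
qr(A)$rank                     # 2 < 3 columns: v1, v2, v3 are linearly dependent
qr(cbind(v1, v2))$rank         # 2 = 2 columns: v1, v2 are linearly independent
```

When the rank equals the number of columns, the columns are linearly independent; a deficit means some column is a combination of the others.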
Handout 1C - 3
Example

Suppose in some study, the covariates include
  WT2 = weight at age 2, in kg
  WT9 = weight at age 9, in kg
  DW = weight gain from age 2 to 9, in kg
The covariates WT2, WT9, DW are linearly dependent, because DW = WT9 - WT2.
Handout 1C - 4
What Happens When Explanatory Variables Are Linearly Dependent?

We cannot fit the model
  Y = β0 + β1 WT2 + β2 WT9 + β3 DW + ε,
because the coefficients cannot be uniquely determined. Observe

  Y = β0 + (β1 + c) WT2 + (β2 - c) WT9 + (β3 + c) DW + ε
    = β0 + β1 WT2 + β2 WT9 + β3 DW + c (WT2 - WT9 + DW) + ε,

where WT2 - WT9 + DW = 0. Regardless of the value of c, the mean of the response Y is the same. The set of coefficients (β1, β2, β3) will fit the data as well as (β1 + c, β2 - c, β3 + c) does, for any constant c.
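The non-uniqueness is easy to demonstrate numerically. The sketch below uses simulated weights (hypothetical values, not data from the study) to show that shifting the coefficients by c along (1, -1, 1) leaves the mean response unchanged, and that lm() reports NA for the redundant covariate:

```r
set.seed(1)
WT2 <- rnorm(20, mean = 12, sd = 1)        # simulated weight at age 2 (kg)
WT9 <- WT2 + rnorm(20, mean = 15, sd = 2)  # simulated weight at age 9 (kg)
DW  <- WT9 - WT2                           # exactly WT9 - WT2: linearly dependent

Xmat <- cbind(WT2, WT9, DW)
b  <- c(1, 2, 3)
b2 <- b + 5 * c(1, -1, 1)                  # (beta1 + c, beta2 - c, beta3 + c), c = 5
all.equal(as.vector(Xmat %*% b),
          as.vector(Xmat %*% b2))          # TRUE: identical mean responses

y <- 3 + 0.5 * WT2 + 0.8 * WT9 + rnorm(20)
coef(lm(y ~ WT2 + WT9 + DW))               # DW coefficient is NA: R drops the redundant column
```

R's default behavior when the design matrix is rank-deficient is to drop redundant columns and report NA, rather than refuse to fit.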
Handout 1C - 5
What to Do When Explanatory Variables Are Linearly Dependent?

- Remove some of the explanatory variables that are linearly dependent with others, until the remaining explanatory variables are linearly independent
  - e.g., removing any one of WT2, WT9, and DW will make the remaining two linearly independent
- Put constraint(s) on the β's so that they can be uniquely determined
  - commonly adopted approaches for models in experimental designs
Handout 1C - 6
Dummy Variables (1)

Sometimes the explanatory variables are categorical, like blood type (O, A, B, AB). However, it makes NO sense to write a model
  Y = β0 + β1 (blood type) + ε,
because blood type is not a number. In experimental design, the treatment factors are often categorical, e.g., the type of fertilizer.

How to represent categorical variables numerically in a model? Create a dummy variable (a.k.a. indicator variable) for each category of the categorical variable.
Handout 1C - 7
Dummy Variables (2)

For example, for the variable blood type, four dummy variables are created for the 4 categories O, A, B, and AB:
  D_O = 1 if one's blood type is O, and 0 otherwise
  D_A = 1 if one's blood type is A, and 0 otherwise
  D_B = 1 if one's blood type is B, and 0 otherwise
  D_AB = 1 if one's blood type is AB, and 0 otherwise

Though the model Y = β0 + β1 (blood type) + ε makes no sense, the following model does, because D_O, D_A, D_B and D_AB are all numbers (either 0 or 1):
  Y = β0 + β1 D_O + β2 D_A + β3 D_B + β4 D_AB + ε
The mean responses E[Y] for the 4 blood types are then
  blood type   E(Y)
  O            β0 + β1
  A            β0 + β2
  B            β0 + β3
  AB           β0 + β4
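In R, such dummy variables can be built directly from a factor. A minimal sketch with made-up blood types:

```r
blood <- factor(c("O", "A", "B", "AB", "O", "A"))  # hypothetical sample

# One 0/1 indicator per category
D_O  <- as.integer(blood == "O")
D_A  <- as.integer(blood == "A")
D_B  <- as.integer(blood == "B")
D_AB <- as.integer(blood == "AB")

D <- cbind(D_O, D_A, D_B, D_AB)
D
rowSums(D)   # every row sums to 1: each person falls in exactly one category
```

The row sums being identically 1 is exactly the linear dependence with the intercept discussed on the next slide.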
Handout 1C - 8
But the Dummy Variables Are Linearly Dependent...

Since every individual must fall in exactly one of the 4 categories, it is always true that
  D_O + D_A + D_B + D_AB - 1 = 0.
This means:
- One of the 4 dummy variables is redundant, because knowing any 3 tells us the remaining one
- D_O, D_A, D_B, D_AB and the intercept are linearly dependent, and consequently the coefficients (β0, β1, β2, β3, β4) cannot be uniquely determined

For this reason, we say the model
  Y = β0 + β1 D_O + β2 D_A + β3 D_B + β4 D_AB + ε
is overparameterized, because it specifies more parameters than we actually need.
Handout 1C - 9
How to Deal with Overparameterization?

There are various ways to deal with overparameterization in the model
  Y = β0 + β1 D_O + β2 D_A + β3 D_B + β4 D_AB + ε.
Some common ways include
- dropping the intercept (i.e., letting β0 = 0)
- dropping one dummy variable, e.g., D_O (i.e., letting β1 = 0)
  - The category whose dummy variable is dropped is called the baseline. If D_O is dropped, the baseline is blood type O
- letting β1 + β2 + β3 + β4 = 0
Handout 1C - 10
When the Intercept Is Dropped...

Dropping the intercept β0, the model becomes
  Y = β1 D_O + β2 D_A + β3 D_B + β4 D_AB + ε,
and the coefficients of the dummy variables become the mean responses E[Y] for the corresponding blood types:
  blood type   E(Y)
  O            β1
  A            β2
  B            β3
  AB           β4
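In R, the intercept is dropped by writing ~ 0 + (or ~ -1 +) in the formula, and each coefficient then estimates a group mean. A sketch with simulated data (hypothetical, not from any study):

```r
set.seed(2)
blood <- factor(rep(c("O", "A", "B", "AB"), each = 10))
y <- as.integer(blood) + rnorm(40)   # simulated response

fit <- lm(y ~ 0 + blood)             # no intercept: one coefficient per blood type
coef(fit)                            # equals the sample mean of each group
tapply(y, blood, mean)               # same numbers, computed directly
```

With no intercept, least squares within each group reduces to the group average, matching the E(Y) table above.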
Handout 1C - 11
When One of the Dummy Variables Is Dropped...

Dropping one of the dummy variables, say D_O, the model becomes
  Y = β0 + β2 D_A + β3 D_B + β4 D_AB + ε,
and the mean responses E[Y] for the 4 blood types are
  blood type   E(Y)
  O            β0
  A            β0 + β2
  B            β0 + β3
  AB           β0 + β4
- The mean of Y under the baseline (blood type O) is β0
- The mean of Y for blood type A is β0 + β2
- One can compare the means of Y for blood types A and O by testing β2 = 0
Useful for comparing categories with the baseline category.
Handout 1C - 12
Choice of the Baseline Category Can Be Arbitrary

If blood type O is the baseline:
  Y = β0 + β2 D_A + β3 D_B + β4 D_AB + ε
  blood type   E(Y)
  O            β0
  A            β0 + β2
  B            β0 + β3
  AB           β0 + β4

If blood type A is the baseline:
  Y = β0' + β1' D_O + β3' D_B + β4' D_AB + ε
  blood type   E(Y)
  O            β0' + β1'
  A            β0'
  B            β0' + β3'
  AB           β0' + β4'

The 2 models are equivalent in the sense that they give identical group means:
  β0 = β0' + β1'
  β0 + β2 = β0'
  β0 + β3 = β0' + β3'
  β0 + β4 = β0' + β4'
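This equivalence is easy to verify in R: refitting with a different baseline changes the coefficients but not the fitted group means. A minimal sketch with simulated blood-type data (hypothetical values):

```r
set.seed(3)
blood <- factor(rep(c("O", "A", "B", "AB"), each = 5))
y <- as.integer(blood) + rnorm(20)           # simulated response

fit_O <- lm(y ~ relevel(blood, ref = "O"))   # blood type O as baseline
fit_A <- lm(y ~ relevel(blood, ref = "A"))   # blood type A as baseline

coef(fit_O)                                  # different coefficients...
coef(fit_A)
all.equal(fitted(fit_O), fitted(fit_A))      # ...but identical fitted group means
```

The coefficients are just different parametrizations of the same four group means, so all fitted values, residuals, and overall F-tests agree.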
Handout 1C - 13
Example: Salary Survey

Variables in the data (S, X, E, M):
  S = Salary
  X = Experience, in years
  E = Education (1 if H.S. only, 2 if Bachelor's only, 3 if Advanced degree)
  M = Management status (1 if manager, 0 if non-manager)
Handout 1C - 14
Example: Salary Survey Coding Variables (1)

Let's first consider the effect of experience (X) and education (E) on employees' salary (S), ignoring the effect of management status.
- Experience (X): numerical
- Education (E): qualitative, 3 categories, need 3 dummy variables
    E_i1 = 1 if the i-th person has a high school diploma only, 0 otherwise
    E_i2 = 1 if the i-th person has a B.S. only, 0 otherwise
    E_i3 = 1 if the i-th person has an advanced degree, 0 otherwise

Model 1: S = β0 + βX + δ1 E1 + δ2 E2 + δ3 E3 + ε
Handout 1C - 15
Example: Salary Survey Coding Variables

Model 1: S = β0 + δ1 E1 + δ2 E2 + δ3 E3 + βX + ε
This model is overparameterized; we need a constraint. If we drop the intercept (letting β0 = 0), then
  S = δ1 + βX + ε   if H.S. only
      δ2 + βX + ε   if B.A. or B.S. only
      δ3 + βX + ε   if advanced
In this parametrization, δ1, δ2, δ3 represent the 3 different intercepts of the regression lines of S on X at the 3 different education levels. Often we are interested in comparisons between categories, e.g., whether Bachelors earn more than H.S. graduates on average (i.e., whether δ2 > δ1) or not.
Handout 1C - 16
Example: Salary Survey Coding Variables

If we drop the dummy variable E2 for Bachelors, i.e., use the Bachelor's degree as the baseline, then
  S = β0 + δ1 + βX + ε   if H.S. only
      β0 + βX + ε        if B.A. or B.S. only
      β0 + δ3 + βX + ε   if advanced
This parametrization is convenient for comparisons between categories. One can test whether Bachelors earn more than H.S. graduates by testing δ1 < 0, and test whether an advanced degree increases salary by testing δ3 > 0.
Handout 1C - 17
Example: Salary Survey Regression Fit (1)

> salary = read.table("salarysurvey.txt", head=TRUE)
> lm1a = lm(S ~ E+X, data = salary)
> summary(lm1a)
(... part of the R output is omitted)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-05 ***
E                                               **
X                                          e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3604 on 43 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 2 and 43 DF, p-value: 3.538e-06

Something wrong?
Handout 1C - 18
Example: Salary Survey Regression Fit (2)

Let's check the model matrix.
> model.matrix(lm1a)
   (Intercept) E X
   (omitted)
attr(,"assign")
[1] 0 1 2

R treats E (education) as a numerical variable taking values 1, 2, and 3, not a categorical one.
Handout 1C - 19
Example: Salary Survey Numerical or Categorical?

If one treats E (education) as a numerical variable taking values 1, 2, and 3, the model then becomes
  Model 2: S = β0 + βX + δE + ε.
But Model 2 has a different implication from Model 1: it says that, on average,
- a Bachelor's degree increases salary by δ;
- a Bachelor's degree + an advanced degree increases salary by 2δ.
That is, the salary bonus for completing college is as much as the bonus for completing an advanced degree, which is unrealistic and too restrictive. Treating E as a categorical variable allows the salary bonus for a Bachelor's degree and for an advanced degree to be different.

Remark: Model 2 is nested in Model 1 (Why?).
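The nesting can be checked numerically: the numeric-E model restricts the two education effects of the factor-E model to lie on a line (δ for level 2, 2δ for level 3), so its residual sum of squares can never be smaller. A sketch with simulated data (the actual salary file is not used here):

```r
set.seed(4)
E <- sample(1:3, 30, replace = TRUE)            # education coded 1, 2, 3
X <- runif(30, 1, 20)                           # years of experience
S <- 1000 * E + 500 * X + rnorm(30, sd = 300)   # simulated salaries

m2 <- lm(S ~ X + E)            # Model 2: E numeric, one slope delta
m1 <- lm(S ~ X + factor(E))    # Model 1: E categorical, separate effects

deviance(m1) <= deviance(m2)   # TRUE: the richer model fits at least as well
anova(m2, m1)                  # F-test of the linearity restriction on E
```

Because Model 2 is a restricted version of Model 1, anova() gives a valid F-test of whether the restriction is acceptable.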
Handout 1C - 20
Example: Salary Survey Regression Fit (3)

> salary$E = as.factor(salary$E)
> lm1 = lm(S ~ E+X, data = salary)
> summary(lm1)
(... part of the R output is omitted)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-10 ***
E2                                              *
E3                                              **
X                                          e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3622 on 42 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 3 and 42 DF, p-value: 1.291e-05

- The command as.factor(E) tells R that E is categorical
- By default, R uses the lowest level (E = 1) as the baseline
Handout 1C - 21
Example: Salary Survey Regression Fit (4)

Let's check the model matrix of Model 1.
> model.matrix(lm1)
   (Intercept) E2 E3 X
   (... omitted ...)
attr(,"assign")
[1] 0 1 1 2
attr(,"contrasts")
attr(,"contrasts")$E
[1] "contr.treatment"

Now R knows E is categorical: it creates 2 dummy variables, E2 and E3, and treats H.S. diploma (E = 1) as the baseline.
Handout 1C - 22
Example: Salary Survey Interpreting Coefficients

From the output of Model 1, the predicted salary is
  Ŝ = β̂0 + β̂ X + δ̂2 E2 + δ̂3 E3,
with the estimates taken from the output above. This model implies that, on average:
- each extra year of experience is worth $548.6;
- completing college increases salary by $3221.1;
- completing college + an advanced degree increases salary by the E3 estimate.
All 3 coefficients above are significantly different from 0 (P-value < 5%).
What if we want to compare Bachelors with advanced degree holders?
Handout 1C - 23
Example: Salary Survey Changing Baseline (1)

If not happy with the baseline category R chooses, say we want E = 2 (Bachelor's degree) to be the baseline, one can either manually create the dummy variables E1 and E3
> salary$E1 = as.integer(salary$E==1)
> salary$E3 = as.integer(salary$E==3)
> lm1b = lm(S ~ X + E1 + E3, data = salary)
or use the command relevel()
> salary$E = relevel(salary$E, ref = "2")
> lm1c = lm(S ~ X + E, data = salary)
Both will fit Model 1 using E = 2 as the baseline. See the R outputs on the next page.

Conclusion: Looking at the coefficient for E3 on the next page, we can conclude advanced degree holders do NOT earn significantly more than Bachelors (P-value ≈ 0.25).
Handout 1C - 24

> summary(lm1b)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-14 ***
X                                          e-06 ***
E1                                              *
E3
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3622 on 42 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 3 and 42 DF, p-value: 1.291e-05

> summary(lm1c)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-14 ***
X                                          e-06 ***
E1                                              *
E3
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3622 on 42 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 3 and 42 DF, p-value: 1.291e-05
Handout 1C - 25
What If We Want to Drop the Intercept?

> lm1e = lm(S ~ -1 + X + E, data = salary)
> summary(lm1e)
            Estimate Std. Error t value Pr(>|t|)
X                                          e-06 ***
E1                                         e-14 ***
E2                                         e-10 ***
E3                                       < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3622 on 42 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 4 and 42 DF, p-value: < 2.2e-16

This fits the model
  S = δ1 + βX + ε   if H.S. only
      δ2 + βX + ε   if Bachelor's only
      δ3 + βX + ε   if advanced
with δ1, δ2, δ3 and β equal to the estimates shown in the output above.
Handout 1C - 26
What About the Sum-to-Zero Constraint δ1 + δ2 + δ3 = 0?

For the salary example,
  S = β0 + δ1 E1 + δ2 E2 + δ3 E3 + βX + ε
    = β0 + δ1 + βX + ε   if H.S. only
      β0 + δ2 + βX + ε   if Bachelor's only
      β0 + δ3 + βX + ε   if advanced
the sum-to-zero constraint δ1 + δ2 + δ3 = 0 is NOT intuitive; under it, the coefficients δ1, δ2, δ3 and β0 have NO natural interpretations. Nonetheless, the sum-to-zero constraint will exhibit its power in factorial designs, in which two or more treatment factors are administered in an experiment. We will come back to this in Chapter 8.
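For reference, R can already fit under the sum-to-zero constraint via the contr.sum contrast. A minimal sketch with simulated data (hypothetical, not the salary file):

```r
set.seed(5)
E <- factor(rep(1:3, each = 10))   # 3 education levels, balanced groups
y <- as.integer(E) + rnorm(30)     # simulated response

# contr.sum imposes the sum-to-zero constraint on the level effects
fit <- lm(y ~ E, contrasts = list(E = "contr.sum"))
coef(fit)   # (Intercept), E1, E2; the third effect is -(E1 + E2)
```

Under this coding, the reported effects for the first two levels and the implied third effect add to zero, so the intercept plays the role of an overall mean rather than a baseline group mean.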
Handout 1C - 27
Interaction Between Categorical and Numerical Variables

Regardless of which constraint is used, the model
  S = β0 + δ1 E1 + δ2 E2 + δ3 E3 + βX + ε
    = β0 + δ1 + βX + ε   if H.S.
      β0 + δ2 + βX + ε   if B.A. or B.S.
      β0 + δ3 + βX + ε   if advanced
assumes a constant effect of experience X on salary S (the slope β) across all education levels, which can be unrealistic.
- If the effect of a variable on the response changes with the level of another variable, we say the effects of the two variables interact. If not, we say their effects are additive.
- e.g., the model above assumes the effects of education (E) and experience (X) on salary are additive.
How to write an MLR model with the slope of X changing with education levels?
Handout 1C - 28
Interaction Between Categorical and Numerical Variables

Consider the model
  S = β0 + δ1 E1 + δ2 E2 + δ3 E3 + βX + γ1 (E1 × X) + γ2 (E2 × X) + γ3 (E3 × X) + ε
Here (E1 × X) means the product of the variables E1 and X. Then
  S = β0 + δ1 + (β + γ1) X + ε   if H.S.
      β0 + δ2 + (β + γ2) X + ε   if B.A. or B.S.
      β0 + δ3 + (β + γ3) X + ε   if advanced
Again, the model is overparameterized. We need one additional constraint on β and the γ's. Some common constraints are
- β = 0
- γ1 = 0 (or γ2 = 0, or γ3 = 0)
- γ1 + γ2 + γ3 = 0
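One can see how R encodes these product terms by inspecting the model matrix for a formula with an interaction. A small sketch with made-up values of E and X:

```r
E <- factor(c(1, 2, 3, 1, 2, 3))   # made-up education levels
X <- c(5, 3, 8, 2, 7, 4)           # made-up years of experience

mm <- model.matrix(~ X + E + X:E)
colnames(mm)   # "(Intercept)" "X" "E2" "E3" "X:E2" "X:E3"
```

The columns X:E2 and X:E3 are literally the products of X with the dummies E2 and E3, so their coefficients are the slope adjustments γ2 and γ3 relative to the baseline level.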
Handout 1C - 29

If one uses the H.S. diploma as the baseline, i.e., letting δ1 = 0 and γ1 = 0,
  S = β0 + δ2 E2 + δ3 E3 + βX + γ2 (E2 × X) + γ3 (E3 × X) + ε
    = β0 + βX + ε                  if H.S.
      β0 + δ2 + (β + γ2) X + ε   if B.A. or B.S.
      β0 + δ3 + (β + γ3) X + ε   if advanced
Then γ2 is the extra salary per year of experience for completing college, and γ3 is that for getting an advanced degree.
Handout 1C - 30
Fitting Models with Interaction in R

In R, the term X:E in a model formula represents the interaction terms of X and E (and X*E is shorthand for X + E + X:E). By default, R uses the lowest level E = 1 (H.S. diploma) as the baseline.
> salary$E = relevel(salary$E, ref = "1")
> lm2 = lm(S ~ X+E+X:E, data = salary)
> summary(lm2)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-11 ***
X                                               **
E2
E3
X:E2
X:E3

Neither γ2 nor γ3 is significantly different from 0 (P-values ≈ 0.37 and 0.17).
Handout 1C - 31
Test for Interaction

Testing whether the effect of experience on salary changes with education level is equivalent to testing
  H0: γ1 = γ2 = γ3.
That is, it compares the full model and the reduced model below:
  S = β0 + δ1 E1 + δ2 E2 + δ3 E3 + βX + γ1 (E1 × X) + γ2 (E2 × X) + γ3 (E3 × X) + ε   (full)
  S = β0 + δ1 E1 + δ2 E2 + δ3 E3 + βX + ε   (reduced)
> lm1 = lm(S ~ X+E, data = salary)
> lm2 = lm(S ~ X+E+X:E, data = salary)
> anova(lm1,lm2)
Analysis of Variance Table
Model 1: S ~ X + E
Model 2: S ~ X + E + X:E
  Res.Df RSS Df Sum of Sq F Pr(>F)
1
2
The interaction is not significant.
Handout 1C - 32
Interaction Between Two Categorical Variables

Now let's take another categorical variable, management status (M), into account:
  M = 1 if manager, 0 if non-manager
Since M is a categorical variable, just like E, we should create dummy variables M0 and M1 for the two categories, and consider the model
  S = β0 + α0 M0 + α1 M1 + δ1 E1 + δ2 E2 + δ3 E3 + βX + ε.
However, we don't need both M0 and M1, since M0 + M1 = 1 and the model is again overparameterized. We can drop one of M0 and M1, and one of E1, E2 and E3. So we drop M0 and E1, and consider the model
  S = β0 + α1 M1 + δ2 E2 + δ3 E3 + βX + ε.
Handout 1C - 33
Interaction Between Two Categorical Variables

  S = β0 + α1 M1 + δ2 E2 + δ3 E3 + βX + ε.
This model implies that, on average,
- managers earn α1 more than non-managers;
- completing college increases salary by δ2;
- completing college + an advanced degree increases salary by δ3.
However, the model above assumes the effect of management status on salary does not change with education level. Thus we may consider the following model with management status by education level interactions:
  S = β0 + α1 M1 + δ2 E2 + δ3 E3 + θ2 (M × E2) + θ3 (M × E3) + βX + ε.
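As with the numerical-by-categorical case, the product terms M × E2 and M × E3 appear as extra columns in the model matrix. A small sketch with made-up values:

```r
E <- factor(rep(1:3, each = 2))   # made-up education levels
M <- rep(0:1, times = 3)          # made-up 0/1 management indicator

mm <- model.matrix(~ M + E + M:E)
colnames(mm)   # "(Intercept)" "M" "E2" "E3" "M:E2" "M:E3"
```

Each interaction column is 1 only for observations that are both managers and in the corresponding education category, which is exactly what lets the manager effect differ by education level.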
Handout 1C - 34
Interaction Between Two Categorical Variables in R

No interaction:
> lm3 = lm(S ~ X+E+M, data = salary)
> summary(lm3)
(... omitted ...)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              < 2e-16 ***
X                                        < 2e-16 ***
E2                                         e-11 ***
E3                                         e-09 ***
M                                        < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1027 on 41 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 4 and 41 DF, p-value: < 2.2e-16
Handout 1C - 35
Interaction Between Two Categorical Variables in R

With interaction:
> lm4 = lm(S ~ X+E+M+E:M, data = salary)
> summary(lm4)
(... omitted ...)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              <2e-16 ***
X                                        <2e-16 ***
E2                                       <2e-16 ***
E3                                       <2e-16 ***
M                                        <2e-16 ***
E2:M                                     <2e-16 ***
E3:M                                     <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: on 39 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 5517 on 6 and 39 DF, p-value: < 2.2e-16
Handout 1C - 36
Interaction Between Two Categorical Variables in R

Test of interaction:
> anova(lm3,lm4)
Analysis of Variance Table
Model 1: S ~ X + E + M
Model 2: S ~ X + E + M + E:M
  Res.Df RSS Df Sum of Sq F Pr(>F)
1
2                               < 2.2e-16 ***
More informationRandomized Block Designs with Replicates
LMM 021 Randomized Block ANOVA with Replicates 1 ORIGIN := 0 Randomized Block Designs with Replicates prepared by Wm Stein Randomized Block Designs with Replicates extends the use of one or more random
More informationLecture 18: Simple Linear Regression
Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More information6. Dummy variable regression
6. Dummy variable regression Why include a qualitative independent variable?........................................ 2 Simplest model 3 Simplest case.............................................................
More informationStat 401B Exam 2 Fall 2016
Stat 40B Eam Fall 06 I have neither given nor received unauthorized assistance on this eam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning will
More informationSTAT 571A Advanced Statistical Regression Analysis. Chapter 8 NOTES Quantitative and Qualitative Predictors for MLR
STAT 571A Advanced Statistical Regression Analysis Chapter 8 NOTES Quantitative and Qualitative Predictors for MLR 2015 University of Arizona Statistics GIDP. All rights reserved, except where previous
More informationLinear Regression is a very popular method in science and engineering. It lets you establish relationships between two or more numerical variables.
Lab 13. Linear Regression www.nmt.edu/~olegm/382labs/lab13r.pdf Note: the things you will read or type on the computer are in the Typewriter Font. All the files mentioned can be found at www.nmt.edu/~olegm/382labs/
More informationLecture 6: Linear Regression (continued)
Lecture 6: Linear Regression (continued) Reading: Sections 3.1-3.3 STATS 202: Data mining and analysis October 6, 2017 1 / 23 Multiple linear regression Y = β 0 + β 1 X 1 + + β p X p + ε Y ε N (0, σ) i.i.d.
More informationMultiple Linear Regression for the Salary Data
Multiple Linear Regression for the Salary Data 5 10 15 20 10000 15000 20000 25000 Experience Salary HS BS BS+ 5 10 15 20 10000 15000 20000 25000 Experience Salary No Yes Problem & Data Overview Primary
More informationBiostatistics 380 Multiple Regression 1. Multiple Regression
Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)
More informationy response variable x 1, x 2,, x k -- a set of explanatory variables
11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate
More informationTopic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model
Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is
More informationRegression Analysis Chapter 2 Simple Linear Regression
Regression Analysis Chapter 2 Simple Linear Regression Dr. Bisher Mamoun Iqelan biqelan@iugaza.edu.ps Department of Mathematics The Islamic University of Gaza 2010-2011, Semester 2 Dr. Bisher M. Iqelan
More information1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species
Lecture notes 2/22/2000 Dummy variables and extra SS F-test Page 1 Crab claw size and closing force. Problem 7.25, 10.9, and 10.10 Regression for all species at once, i.e., include dummy variables for
More informationChapter 14 Student Lecture Notes 14-1
Chapter 14 Student Lecture Notes 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 14 Multiple Regression Analysis and Model Building Chap 14-1 Chapter Goals After completing this
More informationApplied Regression Analysis
Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of
More informationSTA 101 Final Review
STA 101 Final Review Statistics 101 Thomas Leininger June 24, 2013 Announcements All work (besides projects) should be returned to you and should be entered on Sakai. Office Hour: 2 3pm today (Old Chem
More informationUsing R formulae to test for main effects in the presence of higher-order interactions
Using R formulae to test for main effects in the presence of higher-order interactions Roger Levy arxiv:1405.2094v2 [stat.me] 15 Jan 2018 January 16, 2018 Abstract Traditional analysis of variance (ANOVA)
More informationExtensions of One-Way ANOVA.
Extensions of One-Way ANOVA http://www.pelagicos.net/classes_biometry_fa18.htm What do I want You to Know What are two main limitations of ANOVA? What two approaches can follow a significant ANOVA? How
More informationStatistics 191 Introduction to Regression Analysis and Applied Statistics Practice Exam
Statistics 191 Introduction to Regression Analysis and Applied Statistics Practice Exam Prof. J. Taylor You may use your 4 single-sided pages of notes This exam is 14 pages long. There are 4 questions,
More informationNotes on Maxwell & Delaney
Notes on Maxwell & Delaney PSY710 9 Designs with Covariates 9.1 Blocking Consider the following hypothetical experiment. We want to measure the effect of a drug on locomotor activity in hyperactive children.
More information14 Multiple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in
More informationMultiple Regression: Chapter 13. July 24, 2015
Multiple Regression: Chapter 13 July 24, 2015 Multiple Regression (MR) Response Variable: Y - only one response variable (quantitative) Several Predictor Variables: X 1, X 2, X 3,..., X p (p = # predictors)
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationLinear Modelling in Stata Session 6: Further Topics in Linear Modelling
Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 14/11/2017 This Week Categorical Variables Categorical
More informationPart II { Oneway Anova, Simple Linear Regression and ANCOVA with R
Part II { Oneway Anova, Simple Linear Regression and ANCOVA with R Gilles Lamothe February 21, 2017 Contents 1 Anova with one factor 2 1.1 The data.......................................... 2 1.2 A visual
More informationRegression 1: Linear Regression
Regression 1: Linear Regression Marco Baroni Practical Statistics in R Outline Classic linear regression Linear regression in R Outline Classic linear regression Introduction Constructing the model Estimation
More information22s:152 Applied Linear Regression. Chapter 5: Ordinary Least Squares Regression. Part 2: Multiple Linear Regression Introduction
22s:152 Applied Linear Regression Chapter 5: Ordinary Least Squares Regression Part 2: Multiple Linear Regression Introduction Basic idea: we have more than one covariate or predictor for modeling a dependent
More informationST430 Exam 2 Solutions
ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving
More informationExtensions of One-Way ANOVA.
Extensions of One-Way ANOVA http://www.pelagicos.net/classes_biometry_fa17.htm What do I want You to Know What are two main limitations of ANOVA? What two approaches can follow a significant ANOVA? How
More informationUnit 6 - Introduction to linear regression
Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,
More informationChapter 6. Logistic Regression. 6.1 A linear model for the log odds
Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,
More informationSTA441: Spring Multiple Regression. More than one explanatory variable at the same time
STA441: Spring 2016 Multiple Regression More than one explanatory variable at the same time This slide show is a free open source document. See the last slide for copyright information. One Explanatory
More informationWeek 7 Multiple factors. Ch , Some miscellaneous parts
Week 7 Multiple factors Ch. 18-19, Some miscellaneous parts Multiple Factors Most experiments will involve multiple factors, some of which will be nuisance variables Dealing with these factors requires
More informationMarcel Dettling. Applied Statistical Regression AS 2012 Week 05. ETH Zürich, October 22, Institute for Data Analysis and Process Design
Marcel Dettling Institute for Data Analysis and Process Design Zurich University of Applied Sciences marcel.dettling@zhaw.ch http://stat.ethz.ch/~dettling ETH Zürich, October 22, 2012 1 What is Regression?
More informationApplied Regression Analysis. Section 2: Multiple Linear Regression
Applied Regression Analysis Section 2: Multiple Linear Regression 1 The Multiple Regression Model Many problems involve more than one independent variable or factor which affects the dependent or response
More informationUnit 7: Multiple linear regression 1. Introduction to multiple linear regression
Announcements Unit 7: Multiple linear regression 1. Introduction to multiple linear regression Sta 101 - Fall 2017 Duke University, Department of Statistical Science Work on your project! Due date- Sunday
More informationLab 10 - Binary Variables
Lab 10 - Binary Variables Spring 2017 Contents 1 Introduction 1 2 SLR on a Dummy 2 3 MLR with binary independent variables 3 3.1 MLR with a Dummy: different intercepts, same slope................. 4 3.2
More informationTrendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues
Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +
More informationRegression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison.
Regression Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison December 8 15, 2011 Regression 1 / 55 Example Case Study The proportion of blackness in a male lion s nose
More informationAnalytics 512: Homework # 2 Tim Ahn February 9, 2016
Analytics 512: Homework # 2 Tim Ahn February 9, 2016 Chapter 3 Problem 1 (# 3) Suppose we have a data set with five predictors, X 1 = GP A, X 2 = IQ, X 3 = Gender (1 for Female and 0 for Male), X 4 = Interaction
More informationACOVA and Interactions
Chapter 15 ACOVA and Interactions Analysis of covariance (ACOVA) incorporates one or more regression variables into an analysis of variance. As such, we can think of it as analogous to the two-way ANOVA
More information
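The non-uniqueness argument above can be checked numerically: because DW = WT 9 − WT 2, shifting the coefficients to (β 1 + c, β 2 − c, β 3 + c) leaves every fitted mean unchanged. The sketch below uses hypothetical weight data (the values, the variable names, and the chosen coefficients are illustrative, not from any real study).

```python
# Hypothetical weights (kg) for four children; DW is exactly WT9 - WT2,
# so the three covariates are linearly dependent.
WT2 = [12.0, 13.5, 11.8, 14.2]               # weight at age 2
WT9 = [28.0, 30.1, 27.5, 31.0]               # weight at age 9
DW = [w9 - w2 for w9, w2 in zip(WT9, WT2)]   # weight gain from age 2 to 9

def mean_response(b0, b1, b2, b3):
    """Mean of Y under coefficients (b0, b1, b2, b3):
    b0 + b1*WT2 + b2*WT9 + b3*DW for each subject."""
    return [b0 + b1 * x1 + b2 * x2 + b3 * x3
            for x1, x2, x3 in zip(WT2, WT9, DW)]

# Shift the coefficients by c along the dependence WT2 - WT9 + DW = 0.
c = 5.0
original = mean_response(1.0, 0.4, 0.9, 0.2)
shifted = mean_response(1.0, 0.4 + c, 0.9 - c, 0.2 + c)

# The extra term is c * (WT2 - WT9 + DW) = 0, so the fitted means agree.
print(all(abs(a - b) < 1e-9 for a, b in zip(original, shifted)))
```

Since every choice of c produces the same fitted means, the data cannot distinguish (β 1, β 2, β 3) from (β 1 + c, β 2 − c, β 3 + c), which is exactly why the least squares estimate fails to exist under linear dependence.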