Explanatory Variables Must Be Linearly Independent


Explanatory Variables Must Be Linearly Independent

Recall that the multiple linear regression model

  Y_j = β_0 + β_1 X_{1j} + β_2 X_{2j} + ... + β_p X_{pj} + ε_j,   j = 1, ..., n,

is shorthand for n linear relationships:

  Y_1 = β_0 + β_1 X_{11} + β_2 X_{21} + ... + β_p X_{p1} + ε_1
  Y_2 = β_0 + β_1 X_{12} + β_2 X_{22} + ... + β_p X_{p2} + ε_2
  ...
  Y_n = β_0 + β_1 X_{1n} + β_2 X_{2n} + ... + β_p X_{pn} + ε_n

The least squares estimate (β̂_0, β̂_1, ..., β̂_p) exists under 2 conditions:
- n > p: the model cannot include too many covariates
- the p covariates, together with the intercept, must be linearly independent

What does "linearly independent" mean?

Definition of Linear Dependence and Independence

A set of vectors v_1, v_2, ..., v_n is called linearly dependent if there exist scalars a_1, a_2, ..., a_n, not all zero, such that

  a_1 v_1 + a_2 v_2 + ... + a_n v_n = 0.

Otherwise, the vectors v_1, v_2, ..., v_n are linearly independent.

For example, four vectors v_1, v_2, v_3, v_4 satisfying v_1 - v_2 - v_3 - v_4 = 0 are linearly dependent (take a_1 = 1 and a_2 = a_3 = a_4 = -1), but v_1, v_2, v_3 are linearly independent if the only scalars a_1, a_2, a_3 that make a_1 v_1 + a_2 v_2 + a_3 v_3 = 0 are a_1 = a_2 = a_3 = 0.
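
Linear (in)dependence of concrete vectors can be checked numerically by comparing the rank of the matrix whose columns are the vectors with the number of vectors. A minimal sketch in R, using made-up vectors chosen only for illustration:

> v1 = c(1, 2, 3, 4); v2 = c(1, 0, 1, 0); v3 = v1 - v2   # v3 is built to be dependent
> qr(cbind(v1, v2, v3))$rank   # 2 < 3, so v1, v2, v3 are linearly dependent
> qr(cbind(v1, v2))$rank       # 2 = 2, so v1, v2 are linearly independent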

Example

Suppose in some study the covariates include

  WT2 = weight at age 2, in kg
  WT9 = weight at age 9, in kg
  DW  = weight gain from age 2 to 9, in kg

The covariates WT2, WT9, and DW are linearly dependent, because DW = WT9 - WT2.

What Happens When Explanatory Variables Are Linearly Dependent?

We cannot fit the model

  Y = β_0 + β_1 WT2 + β_2 WT9 + β_3 DW + ε,

because the coefficients cannot be uniquely determined. Observe that

  Y = β_0 + (β_1 + c) WT2 + (β_2 - c) WT9 + (β_3 + c) DW + ε
    = β_0 + β_1 WT2 + β_2 WT9 + β_3 DW + c (WT2 - WT9 + DW) + ε,

where WT2 - WT9 + DW = 0. Regardless of the value of c, the mean of the response Y is the same: the coefficients (β_1, β_2, β_3) fit the data exactly as well as (β_1 + c, β_2 - c, β_3 + c) do, for any constant c.
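
R detects this kind of exact linear dependence when fitting a model. A minimal sketch with simulated data (the variable names and numbers below are made up for illustration):

> set.seed(1)
> wt2 = rnorm(20, mean = 12)           # weight at age 2, simulated
> wt9 = rnorm(20, mean = 28)           # weight at age 9, simulated
> dw  = wt9 - wt2                      # weight gain, exactly WT9 - WT2
> y   = 5 + 0.3*wt2 + 0.5*wt9 + rnorm(20)
> coef(lm(y ~ wt2 + wt9 + dw))         # the coefficient for dw is NA

lm() reports NA for dw: the covariate is aliased with wt2 and wt9, so R effectively drops it, which is one of the remedies discussed next.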

What to Do When Explanatory Variables Are Linearly Dependent?

- Remove some of the explanatory variables that are linearly dependent on the others, until the remaining explanatory variables are linearly independent.
  - e.g., removing any one of WT2, WT9, and DW makes the remaining two linearly independent
- Put constraint(s) on the β's so that they can be uniquely determined.
  - This is the approach commonly adopted for models in experimental designs.

Dummy Variables (1)

Sometimes the explanatory variables are categorical, like blood type (O, A, B, AB). However, it makes NO sense to write a model

  Y = β_0 + β_1 (blood type) + ε,

because blood type is not a number. In experimental design, the treatment factors are often categorical, e.g., the type of fertilizer.

How do we represent categorical variables numerically in a model? Create a dummy variable (a.k.a. indicator variable) for each category of the categorical variable.

Dummy Variables (2)

For example, for the variable blood type, four dummy variables are created for the 4 categories O, A, B, and AB:

  D_O  = 1 if one's blood type is O, and 0 otherwise
  D_A  = 1 if one's blood type is A, and 0 otherwise
  D_B  = 1 if one's blood type is B, and 0 otherwise
  D_AB = 1 if one's blood type is AB, and 0 otherwise

Though the model Y = β_0 + β_1 (blood type) + ε makes no sense, the following model does, because D_O, D_A, D_B, and D_AB are all numbers (either 0 or 1):

  Y = β_0 + β_1 D_O + β_2 D_A + β_3 D_B + β_4 D_AB + ε

The mean response E[Y] for the 4 blood types is then

  blood type   E(Y)
  O            β_0 + β_1
  A            β_0 + β_2
  B            β_0 + β_3
  AB           β_0 + β_4

But the Dummy Variables Are Linearly Dependent...

Since every individual must fall in exactly one of the 4 categories, it is always true that

  D_O + D_A + D_B + D_AB - 1 = 0.

This means:
- One of the 4 dummy variables is redundant, because knowing any 3 of them tells us the remaining one.
- D_O, D_A, D_B, D_AB and the intercept are linearly dependent, and consequently the coefficients (β_0, β_1, β_2, β_3, β_4) cannot be uniquely determined.

For this reason, we say the model

  Y = β_0 + β_1 D_O + β_2 D_A + β_3 D_B + β_4 D_AB + ε

is overparameterized: it specifies more parameters than we actually need.
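
The identity D_O + D_A + D_B + D_AB = 1 is easy to verify in R. A minimal sketch with a made-up blood-type vector:

> bt = factor(c("O", "A", "B", "AB", "O", "A"))   # hypothetical blood types
> D = model.matrix(~ bt - 1)   # one 0/1 dummy column per category
> rowSums(D)                   # all 1: the dummies sum to the intercept column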

How to Deal with Overparametrization?

There are various ways to deal with overparametrization in the model

  Y = β_0 + β_1 D_O + β_2 D_A + β_3 D_B + β_4 D_AB + ε.

Some common ways include:
- dropping the intercept (i.e., letting β_0 = 0)
- dropping one dummy variable, e.g., D_O (i.e., letting β_1 = 0)
  - The category whose dummy variable is dropped is called the baseline. If D_O is dropped, the baseline is blood type O.
- letting β_1 + β_2 + β_3 + β_4 = 0

Each of these corresponds to a parameterization R can produce directly, as shown in the sketch below.
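
A minimal sketch of the three parameterizations (the factor bt is made up; "contr.sum" is R's built-in sum-to-zero contrast):

> bt = factor(c("O", "A", "B", "AB"), levels = c("O", "A", "B", "AB"))
> model.matrix(~ bt)       # default: drops the dummy for the baseline level "O"
> model.matrix(~ bt - 1)   # drops the intercept instead: one dummy per category
> model.matrix(~ bt, contrasts.arg = list(bt = "contr.sum"))   # sum-to-zero constraint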

When the Intercept Is Dropped...

Dropping the intercept β_0, the model becomes

  Y = β_1 D_O + β_2 D_A + β_3 D_B + β_4 D_AB + ε,

and the coefficients of the dummy variables become the mean responses E[Y] for the corresponding blood types:

  blood type   E(Y)
  O            β_1
  A            β_2
  B            β_3
  AB           β_4
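
In this parameterization, with no other covariates in the model, the fitted coefficients are exactly the sample means of the groups. A minimal sketch with simulated data (all names and numbers made up):

> set.seed(2)
> bt = factor(sample(c("O", "A", "B", "AB"), 40, replace = TRUE))
> y = rnorm(40, mean = c(O = 5, A = 6, B = 7, AB = 8)[as.character(bt)])
> coef(lm(y ~ bt - 1))   # one coefficient per blood type
> tapply(y, bt, mean)    # identical values: the group sample means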

When One of the Dummy Variables Is Dropped...

Dropping one of the dummy variables, say D_O, the model becomes

  Y = β_0 + β_2 D_A + β_3 D_B + β_4 D_AB + ε,

and the mean response E[Y] for the 4 blood types is

  blood type   E(Y)
  O            β_0
  A            β_0 + β_2
  B            β_0 + β_3
  AB           β_0 + β_4

- The mean of Y under the baseline (blood type O) is β_0.
- The mean of Y for blood type A is β_0 + β_2.
- One can compare the means of Y for blood types A and O by testing β_2 = 0.
- This parameterization is useful for comparing categories with the baseline category.

Choice of the Baseline Category Can Be Arbitrary

If blood type O is the baseline:

  Y = β_0 + β_2 D_A + β_3 D_B + β_4 D_AB + ε

  blood type   E(Y)
  O            β_0
  A            β_0 + β_2
  B            β_0 + β_3
  AB           β_0 + β_4

If blood type A is the baseline:

  Y = β'_0 + β'_1 D_O + β'_3 D_B + β'_4 D_AB + ε

  blood type   E(Y)
  O            β'_0 + β'_1
  A            β'_0
  B            β'_0 + β'_3
  AB           β'_0 + β'_4

The 2 models are equivalent in the sense that they give identical group means:

  β_0       = β'_0 + β'_1
  β_0 + β_2 = β'_0
  β_0 + β_3 = β'_0 + β'_3
  β_0 + β_4 = β'_0 + β'_4
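
The equivalence can be confirmed in R by releveling the factor and comparing fitted values. A minimal sketch with simulated data (all names and numbers made up):

> set.seed(3)
> bt = factor(sample(c("O", "A", "B", "AB"), 40, replace = TRUE))
> y = rnorm(40, mean = c(O = 5, A = 6, B = 7, AB = 8)[as.character(bt)])
> fit1 = lm(y ~ bt)                        # baseline: the first factor level ("A" here)
> fit2 = lm(y ~ relevel(bt, ref = "O"))    # baseline: blood type "O"
> all.equal(fitted(fit1), fitted(fit2))    # TRUE: the two fits give identical group means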

Example: Salary Survey

The data contain 4 variables:

  S = Salary
  X = Experience, in years
  E = Education (1 if H.S. only, 2 if Bachelor's only, 3 if advanced degree)
  M = Management status (1 if manager, 0 if non-manager)

Example: Salary Survey - Coding Variables (1)

Let's first consider the effect of experience (X) and education (E) on employees' salary (S), ignoring the effect of management status.

Experience (X): numerical.

Education (E): qualitative, 3 categories, so we need 3 dummy variables:

  E_i1 = 1 if the i-th person has a high school diploma only, 0 otherwise
  E_i2 = 1 if the i-th person has a B.S. only, 0 otherwise
  E_i3 = 1 if the i-th person has an advanced degree, 0 otherwise

Model 1:  S = β_0 + βX + δ_1 E_1 + δ_2 E_2 + δ_3 E_3 + ε

Example: Salary Survey - Coding Variables (2)

Model 1:  S = β_0 + δ_1 E_1 + δ_2 E_2 + δ_3 E_3 + βX + ε

This model is overparameterized; we need a constraint. If we drop the intercept (letting β_0 = 0), then

  S = δ_1 + βX + ε   if H.S. only
  S = δ_2 + βX + ε   if B.A. or B.S. only
  S = δ_3 + βX + ε   if advanced degree

In this parametrization, δ_1, δ_2, δ_3 represent the 3 different intercepts of the regression lines of S on X at the 3 education levels.

Often we are interested in comparisons between categories, e.g., do Bachelors earn more than H.S. graduates on average, i.e., is δ_2 > δ_1 or not?

Example: Salary Survey - Coding Variables (3)

If we drop the dummy variable E_2 for Bachelors, i.e., use the Bachelor's degree as the baseline, then

  S = β_0 + δ_1 + βX + ε   if H.S. only
  S = β_0 + βX + ε         if B.A. or B.S. only
  S = β_0 + δ_3 + βX + ε   if advanced degree

This parametrization is convenient for comparisons between categories. One can test whether Bachelors earn more than H.S. graduates by testing δ_1 < 0, and test whether an advanced degree increases salary beyond a Bachelor's by testing δ_3 > 0.

Example: Salary Survey - Regression Fit (1)

> salary = read.table("salarysurvey.txt", header = TRUE)
> lm1a = lm(S ~ E + X, data = salary)
> summary(lm1a)
(... part of the R output is omitted)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...  ...e-05 ***
E                ...        ...     ...  ...     **
X                ...        ...     ...  ...e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3604 on 43 degrees of freedom
Multiple R-squared: ..., Adjusted R-squared: ...
F-statistic: ... on 2 and 43 DF, p-value: 3.538e-06

Something wrong?

Example: Salary Survey - Regression Fit (2)

Let's check the model matrix.

> model.matrix(lm1a)
  (Intercept) E X
  (... rows omitted ...)
attr(,"assign")
[1] 0 1 2

R treats E (education) as a numerical variable taking values 1, 2, and 3, not a categorical one.

Example: Salary Survey - Numerical or Categorical?

If one treats E (education) as a numerical variable taking values 1, 2, and 3, the model becomes

  Model 2:  S = β_0 + βX + δE + ε.

But Model 2 has a different implication from Model 1: it says that, on average,
- a Bachelor's degree increases salary by δ over H.S. only;
- a Bachelor's degree + an advanced degree increases salary by 2δ over H.S. only.

That is, the salary bonus for completing college is exactly as much as the bonus for completing an advanced degree, which is unrealistic and too restrictive. Treating E as a categorical variable allows the salary bonuses for a Bachelor's degree and for an advanced degree to be different.

Remark: Model 2 is nested in Model 1 (Why?).

Example: Salary Survey - Regression Fit (3)

> salary$E = as.factor(salary$E)
> lm1 = lm(S ~ E + X, data = salary)
> summary(lm1)
(... part of the R output is omitted)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...  ...e-10 ***
E2               ...        ...     ...  ...     *
E3               ...        ...     ...  ...     **
X                ...        ...     ...  ...e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3622 on 42 degrees of freedom
Multiple R-squared: ..., Adjusted R-squared: ...
F-statistic: ... on 3 and 42 DF, p-value: 1.291e-05

- The command as.factor(E) tells R that E is categorical.
- By default, R uses the lowest level (E = 1) as the baseline.

Example: Salary Survey - Regression Fit (4)

Let's check the model matrix of Model 1.

> model.matrix(lm1)
  (Intercept) E2 E3 X
  (... rows omitted ...)
attr(,"assign")
[1] 0 1 1 2
attr(,"contrasts")
attr(,"contrasts")$E
[1] "contr.treatment"

Now R knows E is categorical: it creates 2 dummy variables, E2 and E3, and treats H.S. diploma (E = 1) as the baseline.

Example: Salary Survey - Interpreting Coefficients

From the output of Model 1, the predicted salary is

  Ŝ = ... + 548.6 X + 3221.1 E_2 + ... E_3.

This model implies that, on average:
- each extra year of experience is worth $548.6;
- completing college increases salary by $3221.1;
- completing college + an advanced degree increases salary by $... (the E_3 coefficient).

All 3 coefficients above are significantly different from 0 (P-value < 5%).

What if we want to compare Bachelors with advanced degree holders?

Example: Salary Survey - Changing the Baseline (1)

If we are not happy with the baseline category R chooses, say we want E = 2 (Bachelor's degree) to be the baseline, we can either manually create the dummy variables E1 and E3

> salary$E1 = as.integer(salary$E == 1)
> salary$E3 = as.integer(salary$E == 3)
> lm1b = lm(S ~ X + E1 + E3, data = salary)

or use the command relevel()

> salary$E = relevel(salary$E, ref = "2")
> lm1c = lm(S ~ X + E, data = salary)

Both fit Model 1 using E = 2 as the baseline. See the R outputs on the next page.

Conclusion: looking at the coefficient for E3 on the next page, we can conclude that advanced degree holders do NOT earn significantly more than Bachelors (P-value 0.25).

> summary(lm1b)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...  ...e-14 ***
X                ...        ...     ...  ...e-06 ***
E1               ...        ...     ...  ...      *
E3               ...        ...     ...  ...
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3622 on 42 degrees of freedom
Multiple R-squared: ..., Adjusted R-squared: ...
F-statistic: ... on 3 and 42 DF, p-value: 1.291e-05

> summary(lm1c)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...  ...e-14 ***
X                ...        ...     ...  ...e-06 ***
E1               ...        ...     ...  ...      *
E3               ...        ...     ...  ...
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3622 on 42 degrees of freedom
Multiple R-squared: ..., Adjusted R-squared: ...
F-statistic: ... on 3 and 42 DF, p-value: 1.291e-05

What If We Want to Drop the Intercept?

> lm1e = lm(S ~ -1 + X + E, data = salary)
> summary(lm1e)
   Estimate Std. Error t value Pr(>|t|)
X       ...        ...     ...  ...e-06 ***
E1      ...        ...     ...  ...e-14 ***
E2      ...        ...     ...  ...e-10 ***
E3      ...        ...     ...  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3622 on 42 degrees of freedom
Multiple R-squared: ..., Adjusted R-squared: ...
F-statistic: ... on 4 and 42 DF, p-value: < 2.2e-16

This fits the model

  S = δ_1 + βX + ε   if H.S. only
  S = δ_2 + βX + ε   if Bachelor's only
  S = δ_3 + βX + ε   if advanced degree

with the estimates δ̂_1, δ̂_2, δ̂_3, and β̂ given in the E1, E2, E3, and X rows of the output.

What About the Sum-to-Zero Constraint δ_1 + δ_2 + δ_3 = 0?

For the salary example,

  S = β_0 + δ_1 E_1 + δ_2 E_2 + δ_3 E_3 + βX + ε
    = β_0 + δ_1 + βX + ε   if H.S. only
    = β_0 + δ_2 + βX + ε   if Bachelor's only
    = β_0 + δ_3 + βX + ε   if advanced degree

the sum-to-zero constraint δ_1 + δ_2 + δ_3 = 0 is NOT intuitive: under it, the coefficients δ_1, δ_2, δ_3 and β_0 have NO natural interpretations.

Nonetheless, the sum-to-zero constraint will exhibit its power in factorial designs, in which two or more treatment factors are administered in an experiment. We will come back to this in Chapter 8.
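
In R, the sum-to-zero constraint can be requested through the built-in contrast "contr.sum". A minimal sketch, assuming the salary data frame from the earlier slides (with E already a factor):

> lm1s = lm(S ~ X + E, data = salary, contrasts = list(E = "contr.sum"))
> summary(lm1s)

The reported coefficients E1 and E2 are δ_1 and δ_2 under δ_1 + δ_2 + δ_3 = 0; the remaining one is recovered as δ_3 = -(δ_1 + δ_2).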

Interaction Between Categorical and Numerical Variables

Regardless of which constraint is used, the model

  S = β_0 + δ_1 E_1 + δ_2 E_2 + δ_3 E_3 + βX + ε
    = β_0 + δ_1 + βX + ε   if H.S.
    = β_0 + δ_2 + βX + ε   if B.A. or B.S.
    = β_0 + δ_3 + βX + ε   if advanced

assumes a constant effect of experience X on salary S (the slope β) across all education levels, which can be unrealistic.

If the effect of a variable on the response changes with the level of another variable, we say the effects of the two variables interact; if not, we say their effects are additive.
- e.g., the model above assumes the effects of education (E) and experience (X) on salary are additive

How do we write an MLR model with the slope of X changing with education level?

Interaction Between Categorical and Numerical Variables

Consider the model

  S = β_0 + δ_1 E_1 + δ_2 E_2 + δ_3 E_3 + βX + γ_1 (E_1 X) + γ_2 (E_2 X) + γ_3 (E_3 X) + ε.

Here (E_1 X) means the product of the variables E_1 and X. Then

  S = β_0 + δ_1 + (β + γ_1) X + ε   if H.S.
  S = β_0 + δ_2 + (β + γ_2) X + ε   if B.A. or B.S.
  S = β_0 + δ_3 + (β + γ_3) X + ε   if advanced

Again, the model is overparameterized; we need one additional constraint on β and the γ's. Some common constraints are:
- β = 0
- γ_1 = 0 (or γ_2 = 0, or γ_3 = 0)
- γ_1 + γ_2 + γ_3 = 0

If one uses H.S. diploma as the baseline, i.e., letting δ_1 = 0 and γ_1 = 0,

  S = β_0 + δ_2 E_2 + δ_3 E_3 + βX + γ_2 (E_2 X) + γ_3 (E_3 X) + ε
    = β_0 + βX + ε                    if H.S.
    = β_0 + δ_2 + (β + γ_2) X + ε     if B.A. or B.S.
    = β_0 + δ_3 + (β + γ_3) X + ε     if advanced

Then γ_2 is the extra salary per year of experience for completing college, and γ_3 is that for getting an advanced degree.

Fitting Models with Interaction in R

In R, the term X:E in a model formula represents the interaction terms of X and E, and X*E is shorthand for X + E + X:E. By default, R uses the lowest level E = 1 (H.S. diploma) as the baseline.

> salary$E = relevel(salary$E, ref = "1")
> lm2 = lm(S ~ X + E + X:E, data = salary)
> summary(lm2)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...  ...e-11 ***
X                ...        ...     ...  ...      **
E2               ...        ...     ...  ...
E3               ...        ...     ...  ...
X:E2             ...        ...     ...  ...
X:E3             ...        ...     ...  ...

Neither γ̂_2 nor γ̂_3 is significantly different from 0 (P-values 0.37 and 0.17).

Test for Interaction

Testing whether the effect of experience on salary changes with education level is equivalent to testing H_0: γ_1 = γ_2 = γ_3 (equal slopes across education levels). That is, it compares the full model and the reduced model below:

  S = β_0 + δ_1 E_1 + δ_2 E_2 + δ_3 E_3 + βX + γ_1 (E_1 X) + γ_2 (E_2 X) + γ_3 (E_3 X) + ε   (full)
  S = β_0 + δ_1 E_1 + δ_2 E_2 + δ_3 E_3 + βX + ε                                              (reduced)

> lm1 = lm(S ~ X+E, data = salary)
> lm2 = lm(S ~ X+E+X:E, data = salary)
> anova(lm1, lm2)
Analysis of Variance Table

Model 1: S ~ X + E
Model 2: S ~ X + E + X:E
  Res.Df RSS Df Sum of Sq F Pr(>F)
1    ... ...
2    ... ... ...      ... ...    ...

The interaction is not significant.

Interaction Between Two Categorical Variables

Now let's take another categorical variable, management status (M), into account:

  M = 1 if manager, 0 if non-manager.

Since M is a categorical variable, just like E, we should create dummy variables M_0 and M_1 for the two categories and consider the model

  S = β_0 + α_0 M_0 + α_1 M_1 + δ_1 E_1 + δ_2 E_2 + δ_3 E_3 + βX + ε.

However, we don't need both M_0 and M_1, since M_0 + M_1 = 1, and the model is again overparameterized. We can drop one of M_0 and M_1, and one of E_1, E_2, and E_3. Dropping M_0 and E_1, we consider the model

  S = β_0 + α_1 M_1 + δ_2 E_2 + δ_3 E_3 + βX + ε.

Interaction Between Two Categorical Variables

  S = β_0 + α_1 M_1 + δ_2 E_2 + δ_3 E_3 + βX + ε.

This model implies that, on average:
- managers earn α_1 more than non-managers;
- completing college increases salary by δ_2;
- completing college + an advanced degree increases salary by δ_3.

However, the model above assumes the effect of management status on salary does not change with education level. Thus we may consider the following model with management status by education level interactions:

  S = β_0 + α_1 M_1 + δ_2 E_2 + δ_3 E_3 + θ_2 (M_1 E_2) + θ_3 (M_1 E_3) + βX + ε.

Interaction Between Two Categorical Variables in R

No interaction:

> lm3 = lm(S ~ X+E+M, data = salary)
> summary(lm3)
(... omitted ...)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...  < 2e-16 ***
X                ...        ...     ...  < 2e-16 ***
E2               ...        ...     ...  ...e-11 ***
E3               ...        ...     ...  ...e-09 ***
M                ...        ...     ...  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1027 on 41 degrees of freedom
Multiple R-squared: ..., Adjusted R-squared: ...
F-statistic: ... on 4 and 41 DF, p-value: < 2.2e-16

Interaction Between Two Categorical Variables in R

With interaction:

> lm4 = lm(S ~ X+E+M+E:M, data = salary)
> summary(lm4)
(... omitted ...)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...  < 2e-16 ***
X                ...        ...     ...  < 2e-16 ***
E2               ...        ...     ...  < 2e-16 ***
E3               ...        ...     ...  < 2e-16 ***
M                ...        ...     ...  < 2e-16 ***
E2:M             ...        ...     ...  < 2e-16 ***
E3:M             ...        ...     ...  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: ... on 39 degrees of freedom
Multiple R-squared: ..., Adjusted R-squared: ...
F-statistic: 5517 on 6 and 39 DF, p-value: < 2.2e-16

Interaction Between Two Categorical Variables in R

Test of interaction:

> anova(lm3, lm4)
Analysis of Variance Table

Model 1: S ~ X + E + M
Model 2: S ~ X + E + M + E:M
  Res.Df RSS Df Sum of Sq F Pr(>F)
1    ... ...
2    ... ... ...      ... ... < 2.2e-16 ***

The interaction between education and management status is highly significant (P-value < 2.2e-16).

More information