BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users

Size: px
Start display at page:

Download "BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users"

Transcription

1 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users Unit Regression and Correlation of - Practice Problems Solutions R Users. In this exercise, you will gain some practice doing a simple linear regression using an R data set called week0.rdata. This data set has n=3 observations of boiling points (Y=boiling) and temperature (X=temp). Tip In this exercise you will need to create a new variable newy = 00*log 0 (boiling) Carry out an exploratory analysis to determine whether the relationship between temperature and boiling point is better represented using (i) (ii) Y = β 0+βX or ' ' 00 log (Y) = β +βx 0 0 In developing your answer, try our hand at producing (a) Estimates of the regression line parameters (b) Analysis of variance tables (c) R (d) Scatter plot with overlay of fitted line. Complete your answer with a one paragraph text that is an interpretation of your work. Take your time with this and have fun. # Input data that you have downloaded and saved to your desktop (edit my desktop) setwd( /Users/cbigelow/Desktop ) load(file= week0.rdata ) # Look at first 6 rows of data using command head( ) head(week0) temp boiling sol_regression( of ) R USERS.docx Page of 4

2 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users # I will create newvar=oldvar as follows: newvar <- dataframe$oldvar x <- week0$temp y <- week0$boiling m <- lm(y ~ x) # OR: m <- lm(boiling~temp, data=dat) # OR: m <- lm(dat$boiling ~ dat$temp) summary(m) Call: lm(formula = y ~ x) Residuals: Min Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e- 4 *** x < e- 6 *** Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' Residual standard error:.57 on 9 degrees of freedom Multiple R- squared: 0.907, Adjusted R- squared: F- statistic: on and 9 DF, p- value: <.e- 6 m$coefficients # output the intercept and slope (Intercept) x # confint(m) # output the CI for the parameters in the fitted model anova(m) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) x <.e- 6 *** Residuals Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' plot(x, y, main="simple Linear Regression of Y=boiling on X=temp", xlab="temp", ylab="boiling", pch=8) # pch means "plot character", can be to 8; abline(m, col="red", lty=, lwd=) # lty: "line type"; lwd: "line width". sol_regression( of ) R USERS.docx Page of 4

3 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users From the above summary(m) output, we know that: (a): Parameter Estimates: and 0.44, so regression line: Y= *X; (b): ANOVA (Analysis of Variance) table is obtained above using the anova(m) code; (c): R-square=0.907; (d): The scatterplot with the fitted line is plotted above. (). Now, instead of Y=boiling, we want to use newy=00*log0(boiling) newy=00*log0(y) m <- lm(newy ~ x) summary(m) Call: lm(formula = newy ~ x) Residuals: Min Q Median 3Q Max Coefficients: sol_regression( of ) R USERS.docx Page 3 of 4

4 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users Estimate Std. Error t value Pr(> t ) (Intercept) *** x e- 5 *** Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' Residual standard error:.96 on 9 degrees of freedom Multiple R- squared: 0.885, Adjusted R- squared: F- statistic: 3.7 on and 9 DF, p- value: 3.63e- 5 m$coefficients # output the intercept and slope (Intercept) x # confint(m) # output the CI for the parameters in the fitted model anova(m) Analysis of Variance Table Response: newy Df Sum Sq Mean Sq F value Pr(>F) x e- 5 *** Residuals Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' plot(x, newy, main="simple Linear Regression of Y=00*log0(boiling) on X=temp", xlab="te mp", ylab="00*log0(boiling)", pch=8) # pch means "plot character", can be to 8; abline(m, col="red", lwd=) # lwd means "line width" sol_regression( of ) R USERS.docx Page 4 of 4

5 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users From the above summary(m) output, we know that: (a): Parameter Estimates: and 0.93, so regression line: 00log0(Y)= *X; (b): ANOVA (Analysis of Variance) table is obtained above using the anova(m) code; (c): R-square=0.885; (d): The scatterplot with the fitted line is plotted above. Solution (one paragraph of text that is interpretation of analysis): Did you notice that the scatter plot of these data reveal two outlying values? Their inclusion may or may not be appropriate. If all n=3 data points are included in the analysis, then the model that explains more of the variability in boiling point is Y=boiling point modeled linearly in X=temp. It has a greater R^ (9.07% vs. 88.5%). Be careful - It would not make sense to compare the residual mean squares of the two models because the scales of measurement involved are different. sol_regression( of ) R USERS.docx Page 5 of 4

6 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users. A psychiatrist wants to know whether the level of pathology (Y) in psychotic patients 6 months after treatment could be predicted with reasonable accuracy from knowledge of pretreatment symptom ratings of thinking disturbance (X ) and hostile suspiciousness (X ). (a) The least squares estimation equation involving both independent variables is given by Y = (X ) 7.47(X ) Using this equation, determine the predicted level of pathology (Y) for a patient with pretreatment scores of.80 on thinking disturbance and 7.0 on hostile suspiciousness. How does the predicted value obtained compare with the actual value of 5 observed for this patient? Ŷ = X X with X =.80 and X =7.0 Ŷ = 5.53 This value is lower than the observed value of 5 (b) Using the analysis of variance tables below, carry out the overall regression F tests for models containing both X and X, X alone, and X alone. Source DF Sum of Squares Regression on X 546 Residual 5 46 Source DF Sum of Squares Regression on X 60 Residual sol_regression( of ) R USERS.docx Page 6 of 4

7 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users Source DF Sum of Squares Regression on X, X 784 Residual Model Containing X and X F = SSQ regression on X and X /Regression df SSQ residual / Residual df =,784 /,008 / 50 = 6.37 on DF=,50 p-value= à Application of the null hypothesis model has led to an extremely unlikely result (p-value =.00356), prompting statistical rejection of the null hypothesis. The fitted linear model in X and X explains statistically significantly more of the variability in level of pathology (Y) than is explained by Y (the intercept model) alone. Model Containing X ALONE F = SSQ Regression on X /DF Regression SSQ residual/df Residual = 546 /,46 / 5 = on DF=,5 p-value=0.047 Here, too, application of the null hypothesis model has led to an extremely unlikely result (p-value =.04), prompting statistical rejection of the null hypothesis. The fitted linear model in X explains statistically significantly more of the variability in level of pathology (Y) than is explained by Y (the intercept model) alone. Model Containing X ALONE F = SSQ Regresion on X /DF Regression SSQ Residual/DF Residual = 60 / 3,63 / 5 = on DF=,5 p-value= Here, application of the null hypothesis model has not led to an extremely unlikely result (p-value =.44). The null hypothesis is therefore not rejected. The fitted linear model in X does not explain statistically significantly more of the variability in level of pathology (Y) than is explained by Y (the intercept model) alone. (c) Based on your results in part (b), how would you rate the importance of the two variables in predicting Y? X explains a significant proportion of the variability in Y when modelled as a linear predictor. X does not. (However, we don't know if a different functional form might have been important.) sol_regression( of ) R USERS.docx Page 7 of 4

8 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users (d) What are the R values for the three regressions referred to in part (b)? Total SSQ= (Regression SSQ) + (Residual SSQ) is constant. Therefore total SSQ can be calculated from just one anova table: Total (SSQ)=,546 +,46 = 3,79 ( ) R X only = (Re gression SSQ)/(Total SSQ) R (X only) = (60)/(3,79) = 0.06 ( ) ( ) = (546)/(3,79) = 0. R Xand X = 784 /(3, 79) = 0.09 (e) What is the best model involving either one or both of the two independent variables? Eliminate from consideration model with X only. Compare model with X alone versus X and X using partial F test. Partial F = {(SSQ Regression on X,X ) - (SSQ Regression on X )}/VDF = SSQ Residual for model w X,X /Residual DF = on DF =,50 P-value =0.06 Addition of X to model containing X is statistically significant (p-value =.0). More appropriate model includes X and X ( ) / (,008) / 50 sol_regression( of ) R USERS.docx Page 8 of 4

9 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users 3. In an experiment to describe the toxic action of a certain chemical on silkworm larvae, the relationship of log 0 (dose) and log 0 (larva weight) to log 0 (survival) was sought. The data, obtained by feeding each larva a precisely measured dose of the chemical in an aqueous solution and then recording the survival time (ie time until death) are given in the table. Also given are relevant computer results and the analysis of variance table. Larva Y = log 0 (survival time) X =log 0 (dose) X =log 0 (weight) Larva Y = log 0 (survival time) X =log 0 (dose) X =log 0 (weight) Y = (X ) Y = (X ) Y = (X ) + ).87 (X ) Source DF Sum of Squares Regression on X Residual Source DF Sum of Squares Regression on X Residual Source DF Sum of Squares Regression on X, X Residual sol_regression( of ) R USERS.docx Page 9 of 4

10 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users (a) Test for the significance of the overall regression involving both independent variables X and X. X and X F = (SSQ regression on X and X ) / = (0.464) / = 59.8 on DF =, (SSQ Residual) / (0.047) / P value < Application of the null hypothesis model has led to an extremely unlikely result (p-value =.000), prompting statistical rejection of the null hypothesis. The fitted linear model in X and X explains statistically significantly more of the variability in log 0 (survival time) (Y) than is explained by Y (the intercept model) alone. (b) Test to see whether using X alone significantly helps in predicting survival time. X alone F = (0.3633) / (0.480) / 3 = (SSQ Regression on X ) / = 3.95 on DF =,3 (SSQ Residual) / 3 P value = Application of the null hypothesis model has led to an extremely unlikely result (p-value =.00008), prompting statistical rejection of the null hypothesis. The fitted linear model in X explains statistically significantly more of the variability in log 0 (survival time) (Y) than is explained by Y (the intercept model) alone. (c) Test to see whether using X alone significantly helps in predicting survival time. X alone F = (SSQ Regression on X ) / (SSQ Residual) / 3 P value = = (0.3367) / = 5.07 on DF =,3 (0.746) / 3 Application of the null hypothesis model has led to an extremely unlikely result (p-value =.0007), prompting statistical rejection of the null hypothesis. The fitted linear model in X explains statistically significantly more of the variability in log 0 (survival time) (Y) than is explained by Y (the intercept model) alone. sol_regression( of ) R USERS.docx Page 0 of 4

11 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users (d) Compute R for each of the three models. TotalSSQ = 0.53 ( ) R X and X = / 0.53 = R (X alone) = / 0.53 = ( ) R X alone = / 0.53 = (e) Which independent predictor do you consider to be the best single predictor of survival time? Using just the criteria of the overall F test and comparison of containing X is better. R, the single predictor model (f) Which model involving one or both of the independent predictors do you prefer and why? Partial F for comparing model with Xalone versus model with X and X ( ΔRegressionSSQ)/ΔReg DF ( = = )/ - ( ResidualSSQ-model w X, X )/ResidualDF-model w X, X 0.047/ = on DF=, P-value= Choose model with both X and X ( ) sol_regression( of ) R USERS.docx Page of 4

12 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users 4. Using R, try your hand at reproducing the analysis of variance tables you worked with in problem #3. load(file= larvae.rdata ) dat <- larvae # model regression on x alone m <- lm(y ~ x, data=dat) summary(m) Call: lm(formula = y ~ x, data = dat) Residuals: Min Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e- 5 *** x e- 05 *** Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' Residual standard error: on 3 degrees of freedom Multiple R- squared: 0.705, Adjusted R- squared: F- statistic: 3.9 on and 3 DF, p- value: 7.944e- 05 anova(m) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) x e- 05 *** Residuals Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' sol_regression( of ) R USERS.docx Page of 4

13 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users # model regression on x alone m <- lm(y ~ x, data=dat) summary(m) Call: lm(formula = y ~ x, data = dat) Residuals: Min Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e- 3 *** x *** Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' Residual standard error: 0.59 on 3 degrees of freedom Multiple R- squared: , Adjusted R- squared: 0.63 F- statistic: 5.07 on and 3 DF, p- value: anova(m) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) x *** Residuals Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' sol_regression( of ) R USERS.docx Page 3 of 4

14 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users # model regression on x and x m3 <- lm(y ~ x + x, data=dat) summary(m3) Call: lm(formula = y ~ x + x, data = dat) Residuals: Min Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e- 3 *** x e- 05 *** x *** Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' Residual standard error: on degrees of freedom Multiple R- squared: , Adjusted R- squared: F- statistic: 59.8 on and DF, p- value: 6.085e- 07 anova(m3) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) x e- 07 *** x *** Residuals Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' sol_regression( of ) R USERS.docx Page 4 of 4

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) STATA Users

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) STATA Users Unit Regression and Correlation 1 of - Practice Problems Solutions Stata Users 1. In this exercise, you will gain some practice doing a simple linear regression using a Stata data set called week0.dta.

More information

Unit 2 Regression and Correlation Practice Problems. SOLUTIONS Version STATA

Unit 2 Regression and Correlation Practice Problems. SOLUTIONS Version STATA PubHlth 640. Regression and Correlation Page 1 of 19 Unit Regression and Correlation Practice Problems SOLUTIONS Version STATA 1. A regression analysis of measurements of a dependent variable Y on an independent

More information

Unit 2 Regression and Correlation WEEK 3 - Practice Problems. Due: Wednesday February 10, 2016

Unit 2 Regression and Correlation WEEK 3 - Practice Problems. Due: Wednesday February 10, 2016 BIOSTATS 640 Spring 2016 2. Regression and Correlation Page 1 of 5 Unit 2 Regression and Correlation WEEK 3 - Practice Problems Due: Wednesday February 10, 2016 1. A psychiatrist wants to know whether

More information

STAT 3022 Spring 2007

STAT 3022 Spring 2007 Simple Linear Regression Example These commands reproduce what we did in class. You should enter these in R and see what they do. Start by typing > set.seed(42) to reset the random number generator so

More information

36-707: Regression Analysis Homework Solutions. Homework 3

36-707: Regression Analysis Homework Solutions. Homework 3 36-707: Regression Analysis Homework Solutions Homework 3 Fall 2012 Problem 1 Y i = βx i + ɛ i, i {1, 2,..., n}. (a) Find the LS estimator of β: RSS = Σ n i=1(y i βx i ) 2 RSS β = Σ n i=1( 2X i )(Y i βx

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Regression and Models with Multiple Factors. Ch. 17, 18

Regression and Models with Multiple Factors. Ch. 17, 18 Regression and Models with Multiple Factors Ch. 17, 18 Mass 15 20 25 Scatter Plot 70 75 80 Snout-Vent Length Mass 15 20 25 Linear Regression 70 75 80 Snout-Vent Length Least-squares The method of least

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

Multiple Regression Introduction to Statistics Using R (Psychology 9041B)

Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.

More information

Dealing with Heteroskedasticity

Dealing with Heteroskedasticity Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing

More information

Regression and the 2-Sample t

Regression and the 2-Sample t Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression

More information

Variance Decomposition and Goodness of Fit

Variance Decomposition and Goodness of Fit Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings

More information

SCHOOL OF MATHEMATICS AND STATISTICS

SCHOOL OF MATHEMATICS AND STATISTICS RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester

More information

Second Midterm Exam Name: Solutions March 19, 2014

Second Midterm Exam Name: Solutions March 19, 2014 Math 3080 1. Treibergs σιι Second Midterm Exam Name: Solutions March 19, 2014 (1. The article Withdrawl Strength of Threaded Nails, in Journal of Structural Engineering, 2001, describes an experiment to

More information

Tests of Linear Restrictions

Tests of Linear Restrictions Tests of Linear Restrictions 1. Linear Restricted in Regression Models In this tutorial, we consider tests on general linear restrictions on regression coefficients. In other tutorials, we examine some

More information

Chapter 16: Understanding Relationships Numerical Data

Chapter 16: Understanding Relationships Numerical Data Chapter 16: Understanding Relationships Numerical Data These notes reflect material from our text, Statistics, Learning from Data, First Edition, by Roxy Peck, published by CENGAGE Learning, 2015. Linear

More information

MODELS WITHOUT AN INTERCEPT

MODELS WITHOUT AN INTERCEPT Consider the balanced two factor design MODELS WITHOUT AN INTERCEPT Factor A 3 levels, indexed j 0, 1, 2; Factor B 5 levels, indexed l 0, 1, 2, 3, 4; n jl 4 replicate observations for each factor level

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 STAC67H3 Regression Analysis Duration: One hour and fifty minutes Last Name: First Name: Student

More information

Week 7 Multiple factors. Ch , Some miscellaneous parts

Week 7 Multiple factors. Ch , Some miscellaneous parts Week 7 Multiple factors Ch. 18-19, Some miscellaneous parts Multiple Factors Most experiments will involve multiple factors, some of which will be nuisance variables Dealing with these factors requires

More information

Comparing Nested Models

Comparing Nested Models Comparing Nested Models ST 370 Two regression models are called nested if one contains all the predictors of the other, and some additional predictors. For example, the first-order model in two independent

More information

Biostatistics 380 Multiple Regression 1. Multiple Regression

Biostatistics 380 Multiple Regression 1. Multiple Regression Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)

More information

Multiple Regression: Example

Multiple Regression: Example Multiple Regression: Example Cobb-Douglas Production Function The Cobb-Douglas production function for observed economic data i = 1,..., n may be expressed as where O i is output l i is labour input c

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

Practice 2 due today. Assignment from Berndt due Monday. If you double the number of programmers the amount of time it takes doubles. Huh?

Practice 2 due today. Assignment from Berndt due Monday. If you double the number of programmers the amount of time it takes doubles. Huh? Admistrivia Practice 2 due today. Assignment from Berndt due Monday. 1 Story: Pair programming Mythical man month If you double the number of programmers the amount of time it takes doubles. Huh? Invention

More information

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf

More information

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species Lecture notes 2/22/2000 Dummy variables and extra SS F-test Page 1 Crab claw size and closing force. Problem 7.25, 10.9, and 10.10 Regression for all species at once, i.e., include dummy variables for

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression Introduction to Correlation and Regression The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

Statistics - Lecture Three. Linear Models. Charlotte Wickham 1.

Statistics - Lecture Three. Linear Models. Charlotte Wickham   1. Statistics - Lecture Three Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Linear Models 1. The Theory 2. Practical Use 3. How to do it in R 4. An example 5. Extensions

More information

cor(dataset$measurement1, dataset$measurement2, method= pearson ) cor.test(datavector1, datavector2, method= pearson )

cor(dataset$measurement1, dataset$measurement2, method= pearson ) cor.test(datavector1, datavector2, method= pearson ) Tutorial 7: Correlation and Regression Correlation Used to test whether two variables are linearly associated. A correlation coefficient (r) indicates the strength and direction of the association. A correlation

More information

Inferences on Linear Combinations of Coefficients

Inferences on Linear Combinations of Coefficients Inferences on Linear Combinations of Coefficients Note on required packages: The following code required the package multcomp to test hypotheses on linear combinations of regression coefficients. If you

More information

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA s:5 Applied Linear Regression Chapter 8: ANOVA Two-way ANOVA Used to compare populations means when the populations are classified by two factors (or categorical variables) For example sex and occupation

More information

Lecture 19: Inference for SLR & Transformations

Lecture 19: Inference for SLR & Transformations Lecture 19: Inference for SLR & Transformations Statistics 101 Mine Çetinkaya-Rundel April 3, 2012 Announcements Announcements HW 7 due Thursday. Correlation guessing game - ends on April 12 at noon. Winner

More information

Swarthmore Honors Exam 2012: Statistics

Swarthmore Honors Exam 2012: Statistics Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim 0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#

More information

Garvan Ins)tute Biosta)s)cal Workshop 16/6/2015. Tuan V. Nguyen. Garvan Ins)tute of Medical Research Sydney, Australia

Garvan Ins)tute Biosta)s)cal Workshop 16/6/2015. Tuan V. Nguyen. Garvan Ins)tute of Medical Research Sydney, Australia Garvan Ins)tute Biosta)s)cal Workshop 16/6/2015 Tuan V. Nguyen Tuan V. Nguyen Garvan Ins)tute of Medical Research Sydney, Australia Introduction to linear regression analysis Purposes Ideas of regression

More information

Stat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb

Stat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb Stat 42/52 TWO WAY ANOVA Feb 6 25 Charlotte Wickham stat52.cwick.co.nz Roadmap DONE: Understand what a multiple regression model is. Know how to do inference on single and multiple parameters. Some extra

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

> modlyq <- lm(ly poly(x,2,raw=true)) > summary(modlyq) Call: lm(formula = ly poly(x, 2, raw = TRUE))

> modlyq <- lm(ly poly(x,2,raw=true)) > summary(modlyq) Call: lm(formula = ly poly(x, 2, raw = TRUE)) School of Mathematical Sciences MTH5120 Statistical Modelling I Tutorial 4 Solutions The first two models were looked at last week and both had flaws. The output for the third model with log y and a quadratic

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph.

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph. Regression, Part I I. Difference from correlation. II. Basic idea: A) Correlation describes the relationship between two variables, where neither is independent or a predictor. - In correlation, it would

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

1 Use of indicator random variables. (Chapter 8)

1 Use of indicator random variables. (Chapter 8) 1 Use of indicator random variables. (Chapter 8) let I(A) = 1 if the event A occurs, and I(A) = 0 otherwise. I(A) is referred to as the indicator of the event A. The notation I A is often used. 1 2 Fitting

More information

Chapter 8: Correlation & Regression

Chapter 8: Correlation & Regression Chapter 8: Correlation & Regression We can think of ANOVA and the two-sample t-test as applicable to situations where there is a response variable which is quantitative, and another variable that indicates

More information

Handout 4: Simple Linear Regression

Handout 4: Simple Linear Regression Handout 4: Simple Linear Regression By: Brandon Berman The following problem comes from Kokoska s Introductory Statistics: A Problem-Solving Approach. The data can be read in to R using the following code:

More information

Stat 5031 Quadratic Response Surface Methods (QRSM) Sanford Weisberg November 30, 2015

Stat 5031 Quadratic Response Surface Methods (QRSM) Sanford Weisberg November 30, 2015 Stat 5031 Quadratic Response Surface Methods (QRSM) Sanford Weisberg November 30, 2015 One Variable x = spacing of plants (either 4, 8 12 or 16 inches), and y = plant yield (bushels per acre). Each condition

More information

Introduction and Background to Multilevel Analysis

Introduction and Background to Multilevel Analysis Introduction and Background to Multilevel Analysis Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Background and

More information

Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression

Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Correlation Linear correlation and linear regression are often confused, mostly

More information

No other aids are allowed. For example you are not allowed to have any other textbook or past exams.

No other aids are allowed. For example you are not allowed to have any other textbook or past exams. UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Sample Exam Note: This is one of our past exams, In fact the only past exam with R. Before that we were using SAS. In

More information

Exercise 2 SISG Association Mapping

Exercise 2 SISG Association Mapping Exercise 2 SISG Association Mapping Load the bpdata.csv data file into your R session. LHON.txt data file into your R session. Can read the data directly from the website if your computer is connected

More information

Chapter 8 Conclusion

Chapter 8 Conclusion 1 Chapter 8 Conclusion Three questions about test scores (score) and student-teacher ratio (str): a) After controlling for differences in economic characteristics of different districts, does the effect

More information

1 The Classic Bivariate Least Squares Model

1 The Classic Bivariate Least Squares Model Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating

More information

CAS MA575 Linear Models

CAS MA575 Linear Models CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers

More information

Workshop 7.4a: Single factor ANOVA

Workshop 7.4a: Single factor ANOVA -1- Workshop 7.4a: Single factor ANOVA Murray Logan November 23, 2016 Table of contents 1 Revision 1 2 Anova Parameterization 2 3 Partitioning of variance (ANOVA) 10 4 Worked Examples 13 1. Revision 1.1.

More information

Inference with Simple Regression

Inference with Simple Regression 1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems

More information

STAT 215 Confidence and Prediction Intervals in Regression

STAT 215 Confidence and Prediction Intervals in Regression STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:

More information

Linear Model Specification in R

Linear Model Specification in R Linear Model Specification in R How to deal with overparameterisation? Paul Janssen 1 Luc Duchateau 2 1 Center for Statistics Hasselt University, Belgium 2 Faculty of Veterinary Medicine Ghent University,

More information

Extensions of One-Way ANOVA.

Extensions of One-Way ANOVA. Extensions of One-Way ANOVA http://www.pelagicos.net/classes_biometry_fa18.htm What do I want You to Know What are two main limitations of ANOVA? What two approaches can follow a significant ANOVA? How

More information

Recall that a measure of fit is the sum of squared residuals: where. The F-test statistic may be written as:

Recall that a measure of fit is the sum of squared residuals: where. The F-test statistic may be written as: 1 Joint hypotheses The null and alternative hypotheses can usually be interpreted as a restricted model ( ) and an model ( ). In our example: Note that if the model fits significantly better than the restricted

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

Extensions of One-Way ANOVA.

Extensions of One-Way ANOVA. Extensions of One-Way ANOVA http://www.pelagicos.net/classes_biometry_fa17.htm What do I want You to Know What are two main limitations of ANOVA? What two approaches can follow a significant ANOVA? How

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html MATH 282A University

More information

Chapter 12: Linear regression II

Chapter 12: Linear regression II Chapter 12: Linear regression II Timothy Hanson Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 14 12.4 The regression model

More information

Study Sheet. December 10, The course PDF has been updated (6/11). Read the new one.

Study Sheet. December 10, The course PDF has been updated (6/11). Read the new one. Study Sheet December 10, 2017 The course PDF has been updated (6/11). Read the new one. 1 Definitions to know The mode:= the class or center of the class with the highest frequency. The median : Q 2 is

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

Diagnostics and Transformations Part 2

Diagnostics and Transformations Part 2 Diagnostics and Transformations Part 2 Bivariate Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University Multilevel Regression Modeling, 2009 Diagnostics

More information

Foundations of Correlation and Regression

Foundations of Correlation and Regression BWH - Biostatistics Intermediate Biostatistics for Medical Researchers Robert Goldman Professor of Statistics Simmons College Foundations of Correlation and Regression Tuesday, March 7, 2017 March 7 Foundations

More information

Multiple Linear Regression (solutions to exercises)

Multiple Linear Regression (solutions to exercises) Chapter 6 1 Chapter 6 Multiple Linear Regression (solutions to exercises) Chapter 6 CONTENTS 2 Contents 6 Multiple Linear Regression (solutions to exercises) 1 6.1 Nitrate concentration..........................

More information

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz Dept of Information Science j.nerbonne@rug.nl incl. important reworkings by Harmut Fitz March 17, 2015 Review: regression compares result on two distinct tests, e.g., geographic and phonetic distance of

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

Regression Analysis Chapter 2 Simple Linear Regression

Regression Analysis Chapter 2 Simple Linear Regression Regression Analysis Chapter 2 Simple Linear Regression Dr. Bisher Mamoun Iqelan biqelan@iugaza.edu.ps Department of Mathematics The Islamic University of Gaza 2010-2011, Semester 2 Dr. Bisher M. Iqelan

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).

More information

Linear Modelling: Simple Regression

Linear Modelling: Simple Regression Linear Modelling: Simple Regression 10 th of Ma 2018 R. Nicholls / D.-L. Couturier / M. Fernandes Introduction: ANOVA Used for testing hpotheses regarding differences between groups Considers the variation

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Math 2311 Written Homework 6 (Sections )

Math 2311 Written Homework 6 (Sections ) Math 2311 Written Homework 6 (Sections 5.4 5.6) Name: PeopleSoft ID: Instructions: Homework will NOT be accepted through email or in person. Homework must be submitted through CourseWare BEFORE the deadline.

More information

STK4900/ Lecture 3. Program

STK4900/ Lecture 3. Program STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies

More information

STAT 350 Final (new Material) Review Problems Key Spring 2016

STAT 350 Final (new Material) Review Problems Key Spring 2016 1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,

More information

Example: 1982 State SAT Scores (First year state by state data available)

Example: 1982 State SAT Scores (First year state by state data available) Lecture 11 Review Section 3.5 from last Monday (on board) Overview of today s example (on board) Section 3.6, Continued: Nested F tests, review on board first Section 3.4: Interaction for quantitative

More information

Analysis of variance. Gilles Guillot. September 30, Gilles Guillot September 30, / 29

Analysis of variance. Gilles Guillot. September 30, Gilles Guillot September 30, / 29 Analysis of variance Gilles Guillot gigu@dtu.dk September 30, 2013 Gilles Guillot (gigu@dtu.dk) September 30, 2013 1 / 29 1 Introductory example 2 One-way ANOVA 3 Two-way ANOVA 4 Two-way ANOVA with interactions

More information

Lecture 18 Miscellaneous Topics in Multiple Regression

Lecture 18 Miscellaneous Topics in Multiple Regression Lecture 18 Miscellaneous Topics in Multiple Regression STAT 512 Spring 2011 Background Reading KNNL: 8.1-8.5,10.1, 11, 12 18-1 Topic Overview Polynomial Models (8.1) Interaction Models (8.2) Qualitative

More information

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. FINAL EXAM ** Two different ways to submit your answer sheet (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December

More information

Chapter 3 - Linear Regression

Chapter 3 - Linear Regression Chapter 3 - Linear Regression Lab Solution 1 Problem 9 First we will read the Auto" data. Note that most datasets referred to in the text are in the R package the authors developed. So we just need to

More information

STAT 572 Assignment 5 - Answers Due: March 2, 2007

STAT 572 Assignment 5 - Answers Due: March 2, 2007 1. The file glue.txt contains a data set with the results of an experiment on the dry sheer strength (in pounds per square inch) of birch plywood, bonded with 5 different resin glues A, B, C, D, and E.

More information

Booklet of Code and Output for STAC32 Final Exam

Booklet of Code and Output for STAC32 Final Exam Booklet of Code and Output for STAC32 Final Exam December 7, 2017 Figure captions are below the Figures they refer to. LowCalorie LowFat LowCarbo Control 8 2 3 2 9 4 5 2 6 3 4-1 7 5 2 0 3 1 3 3 Figure

More information

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION In this lab you will learn how to use Excel to display the relationship between two quantitative variables, measure the strength and direction of the

More information

STAT 571A Advanced Statistical Regression Analysis. Chapter 8 NOTES Quantitative and Qualitative Predictors for MLR

STAT 571A Advanced Statistical Regression Analysis. Chapter 8 NOTES Quantitative and Qualitative Predictors for MLR STAT 571A Advanced Statistical Regression Analysis Chapter 8 NOTES Quantitative and Qualitative Predictors for MLR 2015 University of Arizona Statistics GIDP. All rights reserved, except where previous

More information

Unit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS. X = cigarette consumption (per capita in 1930)

Unit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS. X = cigarette consumption (per capita in 1930) BIOSTATS 540 Fall 2015 Introductory Biostatistics Page 1 of 10 Unit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS Consider the following study of the relationship

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information