Topics on Statistics 2

Pejman Mahboubi

March 7

1 Regression vs Anova

In Anova, the groups are the predictors. When plotting, we can put the groups on the x axis in any order we wish, say in increasing or decreasing order of their means, or even in alphabetical order of the group names. There is no relation between the means μ_1, μ_2, ..., μ_g other than the ones specified by constraints. In regression, the predictors are numbers, say the average wintertime daily temperature, which has a natural order on the horizontal axis T. Let E be the corresponding values of the mean energy consumption. Furthermore, the regression assumption is that these means form a straight line.

> set.seed(1114)
> T<-sort(runif(15,20,50)) # independent or predictor variable
> E<-a+b*T+rnorm(15,0,.2)  # dependent or response variable; the numeric values of a and b were lost from this transcript
> (df<-data.frame(Temperature=round(T,1),Energy=round(E,2)))
   Temperature Energy
> plot(T,E,xlab="Temperature",ylab="Energy",cex.lab=.5,cex.axis=.4,pch=20,tcl=-0.1)

[Figure: scatter plot of Energy vs Temperature]

1. We assume that at every temperature T = t, the value E = e is sampled from an independent normal distribution assigned to that specific temperature t.

2. Therefore, each temperature corresponds to its own population.

3. We assume that the variances of these populations are all the same: σ².

4. We assume that the means of these populations lie on a straight line (the red line below).

What we are looking for is the equation of the straight line that goes through the points:

> plot(T,E,xlab="Temperature",ylab="Energy",cex.lab=.5,cex.axis=.4,pch=20,tcl=-0.1)
> abline(lm(E~T),col='red')

[Figure: the scatter plot of Energy vs Temperature with the fitted regression line in red]

2 Prediction

What is your predicted energy consumption on a day when the temperature is 30, or 43.44? The regression model claims that the average value of the energy consumption falls on the line

e := e(t) = β_0 + β_1 t.

R computes β_0 and β_1:

> fit<-lm(Energy~Temperature,df)
> summary(fit)

Call:
lm(formula = Energy ~ Temperature, data = df)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               < 2e-16 ***
Temperature                                  e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 13 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 13 DF, p-value: 4.541e-07

If you just want to see the coefficients, write

> coef(fit)
(Intercept) Temperature

Therefore, β_0 is the intercept estimate and β_1 is the Temperature estimate. R has a prediction function. It is very simple. Something like:

> predict_fn<-function(lmobj,t){
+   sum(coef(lmobj)*c(1,t))
+ }
> predict_fn(fit,44)

or we can use R's own function:

> newdata = data.frame(Temperature=44)
> predict(fit,newdata)
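predict() can also report the uncertainty around a prediction. The interval argument below is part of R's standard predict.lm interface; this is an added note, and since the exact numbers depend on the simulated data, none are shown here.

> predict(fit,newdata,interval="confidence")  # interval for the mean energy at Temperature = 44
> predict(fit,newdata,interval="prediction")  # wider interval for a single new day at 44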

3 Fitted values vs. Actual

For each temperature t, the fitted value is the energy consumption predicted by the model. The fitted value is the y value of the point on the red line at that specific t. There are multiple ways to compute the fitted values.

1. Use the predict function with the original data frame:

> predict(fit,df)

2. Use the object fit:

> fit$fitted.values

3. Use our own function:

> sapply(df$Temperature,function(x) predict_fn(fit,x))

3.1 Residuals

Just as in Anova, the distance between the fitted value and the actual value is called the residual:

residual at t = observed e − fitted value ê = e − ê.

We can easily compute all 15 residuals. There are multiple ways of doing this:

1. Use the predict function:

> df$Energy-predict(fit,df)

2. Use the resid() function:

> resid(fit)

Remember that the populations of energy corresponding to different temperatures are independent. Therefore, the residuals are independent normal. Residuals are sampled from a centered normal distribution with fixed variance. Let's take a look at the histogram of the residuals:

> hist(resid(fit),10)

[Figure: histogram of resid(fit)]

qqnorm() is another tool that allows us to compare any set of numbers with a normal distribution.

> qqnorm(resid(fit),ann=FALSE,cex.axis=.4,pch=20,tcl=-0.1)
> mtext(side=1,text="Theoretical Quantile",line=1,cex=.4)
> mtext(side=2,text="Sample Quantile",line=1,cex=.4)
> qqline(resid(fit))

[Figure: normal Q-Q plot of the residuals]
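Beyond these visual checks, a formal normality test can be run on the residuals. This is an addition to the notes; shapiro.test() is part of R's standard stats package, and with only 15 points the test has little power.

> shapiro.test(resid(fit))  # a large p-value would be consistent with normal residuals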

4 Explained, Unexplained variation and R-squared

If we consider every temperature as a separate group, then, similar to Anova, we can define the sum of squares of the residuals as the unexplained variation. Similarly, we can compute the grand mean and define the sum of squared distances of the fitted values from the grand mean as the explained variation. In our example we have the residuals, so the unexplained variation is

> (within.var<-sum((resid(fit))^2))

The explained variation is

> (between.var<-sum((fitted.values(fit)-mean(df$Energy))^2))

and the total variation is

> (total.var<-var(df$Energy)*(length(df$Energy)-1))

And, once again, we can check that total.var equals within.var + between.var:

> within.var+between.var

We can also compute the R-squared, which is the ratio of the explained variation to the total variation, or in code

> (r.sqrd<-between.var/total.var)

which matches the R-squared reported by the summary() function. The F-statistic is also mentioned in the summary() output. Remember that the F-statistic is the ratio of the average explained variation to the average unexplained variation. So the question is what the degrees of freedom are for the explained and unexplained variations. The unexplained variation comes from the residuals. There are 15 residuals r_1, ..., r_15. But they are not arbitrary numbers. Linear regression puts 2 constraints on them (we will soon see where these constraints come from):

Constraint 1. The sum of the residuals is zero:

r_1 + r_2 + ... + r_15 = 0.

Let's check this:

> sum(resid(fit))
[1] e-17

Constraint 2. The predictor Temperature is perpendicular to the residuals, i.e.,

Temperature · Residual = (t_1, ..., t_15) · (r_1, ..., r_15) = t_1 r_1 + t_2 r_2 + ... + t_15 r_15 = 0.

Let's check this:

> sum(df$Temperature*resid(fit))
[1] e-14

Therefore, with two constraints, the degrees of freedom of the residuals is 15 - 2 = 13. Next, look at the between variation, which comes from the distances of the 15 fitted values to the horizontal line Energy = mean(df$Energy), the grand mean. The only parameter we need for computing between.var is β_1 (though this is not so trivial). Therefore the degrees of freedom of the explained variation is 1, and the F-statistic is

> (between.var/1)/(within.var/13)

Definition 1 (Errors vs. Residuals). Let's draw 5 samples from N(10, 1):

> (x<-rnorm(5,10,1))

You can check that mean(x) is not exactly 10, and var(x) is not exactly one.

> mean(x)
> var(x)

The errors are x − 10:

> (errors=x-10) # distance to the mean of the population

You can check that mean(errors) != 0:

> mean(errors)

If we are not privy to the population, the errors remain unknown to us. But the residuals are x − mean(x):

> (residuals<-x-mean(x))

And the mean of the residuals is zero:

> mean(residuals)
[1] e-16
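Returning to the decomposition in this section: R tabulates the same quantities itself, which gives a quick cross-check of the hand computation. Both calls below are standard for lm objects; this check is an addition to the notes.

> anova(fit)               # the Sum Sq column holds between.var and within.var
> summary(fit)$fstatistic  # the same F value with 1 and 13 degrees of freedom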

4.1 Equation of the regression line

Assume we are given the points

> head(df)
  Temperature Energy

Likelihood Function

Cost Function

In linear regression, the cost function is the sum of squares of the residuals. Let f(x) = a + b·x denote the regression line. Then the residual corresponding to a point (T, E) is

E − f(T) = E − a − b·T.    (1)

Then the cost function is

cost(df, f) = Σ_{1}^{15} (E − f(T))²    (2)
            = Σ_{1}^{15} (E − a − b·T)².    (3)

4.2 Pearson's Correlation Coefficient

If x = x_1, ..., x_n and y = y_1, ..., y_n, then the correlation coefficient r between x and y is defined as

r = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / sqrt( Σ_{i=1}^{n} (x_i − x̄)² · Σ_{i=1}^{n} (y_i − ȳ)² ),

or, equivalently,

r = covar(x, y) / (sd(x) · sd(y)),

where x̄ and ȳ denote the means of x and y respectively.

1. |r| ≤ 1. This follows from a mathematical inequality called the Schwarz inequality.

2. If r = −1, then there is a perfect negative linear relationship between x and y. If r = 1, then there is a perfect positive linear relationship between x and y.

3. If r = 0, then there is no linear relationship between x and y.

4. All other values of r tell us that the relationship between x and y is not perfect. The closer r is to 0, the weaker the linear relationship. The closer r is to −1, the stronger the negative linear relationship. And the closer r is to 1, the stronger the positive linear relationship.

The Schwarz inequality states that

(X_1 Y_1 + ... + X_n Y_n)² ≤ (X_1² + ... + X_n²)(Y_1² + ... + Y_n²).

Let's see an example:

> x<-rnorm(10);y<-runif(10,20,1000)
> #Schwarz inequality implies that
> (sum(x*y))^2<sum(x^2)*sum(y^2)
[1] TRUE

The dot product of two vectors x, y is a measure of the similarity between the two. To make it comparable, we can normalize the vectors by their lengths. For example, for x = (1, 0), let us see which one of the following vectors is most similar to x according to the normalized dot product: y1 = (1, .2), y2 = (1, 1), y3 = (0, 1).

> x<-c(1,0)
> y1<-c(1,.2);y2<-c(1,1);y3<-c(0,1)
> sapply(list(y1,y2,y3),function(z) sum(x*z)/sum(z^2))

Returning to energy vs. temperature, let us see what the correlation coefficient between the two is:

> with(df,cor(Temperature,Energy))

Its magnitude is very close to 1, but the slope of the regression line

> coef(fit)
(Intercept) Temperature

is very close to 0. So, the value of b doesn't say much about the true strength of the relation between x and y. This is because the variance on the y axis is much smaller than the variance on the x axis. If we multiply b by sd(Temperature)/sd(Energy), we recover r:

> unname(coef(fit)[2])*sd(df$Temperature)/sd(df$Energy)

By the way, this gives us the formula for computing b, i.e.,

b = r · sd(y)/sd(x) = covar(x, y)/(sd(x) sd(y)) · sd(y)/sd(x) = covar(x, y)/var(x) = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)².
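Both of these formulas are easy to verify numerically on the energy data. The sketch below is an added check, not part of the original notes; cov(), var(), and cor() are standard R functions.

> xt <- df$Temperature; ye <- df$Energy
> sum((xt-mean(xt))*(ye-mean(ye))) /
+   sqrt(sum((xt-mean(xt))^2)*sum((ye-mean(ye))^2))  # r from its definition
> cor(xt,ye)             # agrees with the value above
> cov(xt,ye)/var(xt)     # b = covar(x,y)/var(x)
> unname(coef(fit)[2])   # agrees with lm's slope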

4.2.1 Connection to the R-squared

Both r and R² are indications of the goodness of the fit. For computing r we don't need to fit the data. To compute R² we need the explained and unexplained variation, which means that we need the fitted values in order to compute residuals. We know that 0 ≤ R² ≤ 1 and −1 ≤ r ≤ 1. However, r², just like R², is positive, and in simple linear regression the two are closely related. The relationship is

r² = R².

If R² = .9, then r² = .9, which means r = ±√.9 = ±0.9487. So either r = 0.9487 or r = −0.9487. How can we find out which? The answer is: if b > 0, then r > 0, and if b < 0, then we take the negative value of r.

Example 1. Using the fit object, calculate the Pearson r.

Solution. Since the model fit is linear, we can compute r using R². First we find R²:

> summary(fit)$r.squared

Therefore, the Pearson correlation is one of the two square roots of R²:

> (sqrt(summary(fit)$r.squared))
> #or
> -(sqrt(summary(fit)$r.squared))

Since b is negative:

> (b=coef(fit)[2])
Temperature

r is also negative, and r = −√R². We can compute r directly using the cor(x,y) function:

> cor(df$Energy,df$Temperature)

Remark. r measures the linear relation. R² measures the percentage of the variation explained by the model. So R² is not model independent; r is.
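Before moving on, the sign rule from Example 1 can be wrapped into a single line. This compact form is an addition to the notes; sign() is base R.

> sign(coef(fit)[2])*sqrt(summary(fit)$r.squared)  # reproduces cor(df$Energy,df$Temperature)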

Consider the following data:

> x<-sort(runif(20,-2,2))
> y<-x^2+rnorm(20,-3,.1)
> plot(x,y)

[Figure: scatter plot of y against x, showing a parabola-shaped cloud]

Here there is an almost perfect relation between x and y, i.e., y = x² − 3. Let me add the plot of y = x² − 3:

> x<-sort(runif(20,-2,2))
> y<-x^2+rnorm(20,-3,.1)
> plot(x,y)
> a<-seq(from = -2,to = 2,.5)

> b<-a^2-3
> lines(a,b)

[Figure: the scatter plot with the curve y = x² − 3 drawn through it]

If we didn't know that a line is not the best fit, we would do this:

> fit.1<-lm(y~x)

Let's look at the result:

> summary(fit.1)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                      ***
x
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 18 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 18 DF, p-value:

Let's look at the plots of the predictions and the data:

> x<-sort(runif(20,-2,2))
> y<-x^2+rnorm(20,-3,.1)
> plot(x,y)
> abline(fit.1,col='red')

[Figure: the parabola-shaped data with the poorly fitting red regression line]
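A residual plot makes the lack of fit even more obvious than the fitted-line plot. This diagnostic is an addition to the notes; plot() and resid() are used exactly as before.

> plot(x,resid(fit.1))  # residuals of the straight-line fit
> abline(h=0,lty=2)     # a clear U shape: the line misses the curvature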

So, what should we do? We know that a quadratic term would create a better result. Here is how we proceed:

> fit.2<-lm(y~I(x^2)+I(x))
> summary(fit.2)

Call:
lm(formula = y ~ I(x^2) + I(x))

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
I(x^2)                                    <2e-16 ***
I(x)
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 17 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 1389 on 2 and 17 DF, p-value: < 2.2e-16

Note how large R² is now. Let's check the correlation coefficient:

> cor(x,y)

How do we plot the data and the model?

> plot(x,y)
> lines(a,predict(object = fit.2,newdata = data.frame(x=a)),col='red')

[Figure: the data with the fitted quadratic curve in red]
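An equivalent way to specify the quadratic model is R's poly() helper. This alternative is an addition to the notes; with raw = TRUE it reproduces the I(x^2) + I(x) fit exactly, while the default orthogonal polynomials give the same fitted values with uncorrelated columns.

> fit.3<-lm(y~poly(x,2,raw=TRUE))  # same model as y ~ I(x^2) + I(x)
> all.equal(fitted(fit.2),fitted(fit.3))
[1] TRUE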

4.3 Optimization and Gradient Descent

Remember how we computed the coefficients a and b for the simple linear regression y = a + b·x. Assume the data are x = x_1, ..., x_n and y = y_1, ..., y_n. Then the fitted values are

ŷ_i = a + b·x_i    for i = 1, ..., n,

and the residuals are

res_i = y_i − ŷ_i = y_i − (a + b·x_i)    for i = 1, ..., n.

Therefore, the cost function is

cost(data, model) = Σ_{i=1}^{n} res_i² = Σ_{i=1}^{n} (y_i − (a + b·x_i))².

So cost is actually a function of a and b, and the goal is to determine the a and b that minimize cost(a, b). Last time we saw how to do this mathematically, i.e., we compute the roots of the partial derivatives by solving

∂cost/∂a = 0,    ∂cost/∂b = 0.

Computing software often doesn't solve these equations directly, because as the number of predictors increases, the complexity increases significantly; the solution involves inverting matrices, which is computationally very expensive.

Example 2. Compute the cost function for the energy data.

> #cost function
> cost<-function(a,b){
+   sum((df$Energy-a-b*df$Temperature)^2)
+ }
> (cost(5,-1))
> (cost(10,.02))

Since we solved this problem before, we already know the best values of a and b: they are the two entries of coef(fit), and the cost function is minimized at these values:

> cost(coef(fit)[1],coef(fit)[2])

The method of gradient descent says that, starting from any point, say a = 3, b = 0, we repeatedly step in the direction in which cost(a, b) decreases fastest, i.e., against the gradient.
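The original notes break off here, so the following loop is only a sketch of that idea, not the author's code. The gradient components come from differentiating cost(a, b) above; the step size and iteration count are illustrative choices, since plain gradient descent converges slowly on this problem.

> grad_descent<-function(a,b,rate=1e-5,iters=5e5){
+   for(i in 1:iters){
+     res<-df$Energy-a-b*df$Temperature
+     a<-a-rate*(-2*sum(res))                  # -2*sum(res) is d cost / d a
+     b<-b-rate*(-2*sum(res*df$Temperature))   # d cost / d b
+   }
+   c(a=a,b=b)
+ }
> grad_descent(3,0)  # should approach coef(fit)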
