Introduction to Linear Regression Rebecca C. Steorts September 15, 2015

Size: px
Start display at page:

Download "Introduction to Linear Regression Rebecca C. Steorts September 15, 2015"

Transcription

1 Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Today (Re-)Introduction to linear models and the model space What is linear regression Basic properties of linear regression Using data frames for statistical purposes Manipulation of data into more convenient forms How do we do exploratory analysis to see if linear regression is appropriate? Linear regression Let Y n 1 be the response (poverty rate) Let X n p represent a matrix of covariates (age, education level, state you live in, etc.) Thus, for each observation, there are p covariates. Let β be an unknown parameter that can help up estimate future poverty rates where ɛ N(0, σ 2 I). Y n 1 = X n p β p 1 + ɛ n 1 Estimation and Prediction We seek to estimate ˆβ which is the Let f(y ) = Y Xβ 2 and solve for arg min Y Xβ 2. f(y ) β = β [(Y Xβ)T (Y Xβ)] =... We thus, find that ˆβ = (X T X) 1 X T Y. Predicting Ŷ We can now predict new observations via, where H is often called the hat matrix. Ŷ = X(X T X) 1 X T Y = HY 1

2 How can we evaluate an linear model? Residual plots Outliers Colliearity Residual plots Graphical tool for identifying non-linearity. For each observation i, the residual is e i = y i ŷ i. Plotting the residuals is plotting e i versus one covariate x i for all i. Residuals for multiple covariates Translating to multiple covariates, we plot the residuals versus the fitted values. That is, we plot e i versus ŷ i. (Think about why on your own). A strong pattern in the residuals indicates non-linearity. Instead: non-linear transformation of the covariates. Outliers An outlier is a point for which y i is far from the value predicted by the model. Can identify these by residual plots but often it s hard to know how large a residual should be to consider a point an outlier. Instead, we can plot the studentized residuals, computed by dividing each residual e i by its estimated standard error. Observations whose studentized residuals are greater than 3 in absolute value are possible outliers. What to do if you think you have an outliner, remove it. High-leverage points Recall that outliers are observations for which the response y i is unusual given the predictor x i Observations with high leverage have an unusual value for x i Correlation It s important that the error terms e are all uncorrelated. e are uncorrelated, means that if e i > 0 provides little or no information about the sign of e i+1 The fitted values (or standard errors) are computed assumed the e are uncorrleated. If there is correlated, then the estimated std errors will tend to understimate the true std errors. This implies confidence and prediction intervals will be too narrow. 2

3 Figure 1: Observation 41 is a high leverage point, while 20 is not. The red line is the fit to all the data, and the dotted line is the fit with observation 41 removed. Center: The red observation is not unusual in terms of its X1 value or its X2 value, but still falls outside the bulk of the data, and hence has high leverage. Right: Observation 41 has a high leverage and a high residual. Similarly, p-value will be lower than they should (and this could cause a parameter to be thought to be statistically significant when it s not). Where does this happen often time series data (things are correlated). Collinearity Collinearity refers to the situation in which two or more predictor variables are closely related to each other. Suppose we are predicting balance of credit card. Credit limit and age when plotted appear to have no obvious relation. However, credit limit and credit rating do (and this is also intuitive is you have credit and own a credit card) Credit limit and rating are high correlated. Collinearity (continued) The presence of collinearity can pose problems in the regression context, since it can be difficult to separate out the individual effects of collinear variables on the response. Collinearity reduces accuracy of estimates of regression coefficients and causes std error for beta ˆ j to grow. Elements of this matrix that are large in absolute value indicated pair of highly correlated variables. Another way to check is computing the variance inflation factor (VIF). VIF of 1 says no colllinearity. VIF above 5 problematic amount. How to deal with collinearity? The first is to drop one of the problematic variables from the regression. The second solution is to combine the collinear variables together into a single predictor. (better solution since not throwing away data) Example: For take the average of standardized versions of limit and rating in order to create a new variable that measures credit worthiness. 3

4 So You ve Got A Data Frame What can we do with it? Plot it: examine multiple variables and distributions Test it: compare groups of individuals to each other Check it: does it conform to what we d like for our needs? Test Case: Birth weight data Included in R already: library(mass) data(birthwt) summary(birthwt) low age lwt race Min. : Min. :14.00 Min. : 80.0 Min. : st Qu.: st Qu.: st Qu.: st Qu.:1.000 Median : Median :23.00 Median :121.0 Median :1.000 Mean : Mean :23.24 Mean :129.8 Mean : rd Qu.: rd Qu.: rd Qu.: rd Qu.:3.000 Max. : Max. :45.00 Max. :250.0 Max. :3.000 smoke ptl ht ui Min. : Min. : Min. : Min. : st Qu.: st Qu.: st Qu.: st Qu.: Median : Median : Median : Median : Mean : Mean : Mean : Mean : rd Qu.: rd Qu.: rd Qu.: rd Qu.: Max. : Max. : Max. : Max. : ftv bwt Min. : Min. : 709 1st Qu.: st Qu.:2414 Median : Median :2977 Mean : Mean :2945 3rd Qu.: rd Qu.:3487 Max. : Max. :4990 From R help Go to R help for more info, because someone documented this (thanks, someone!) help(birthwt) Make it readable! 4

5 colnames(birthwt) [1] "low" "age" "lwt" "race" "smoke" "ptl" "ht" "ui" [9] "ftv" "bwt" colnames(birthwt) <- c("birthwt.below.2500", "mother.age", "mother.weight", "race", "mother.smokes", "previous.prem.labor", "hypertension", "uterine.irr", "physician.visits", "birthwt.grams") Make it readable, again! Let s make all the factors more descriptive. birthwt$race <- factor(c("white", "black", "other")[birthwt$race]) birthwt$mother.smokes <- factor(c("no", "Yes")[birthwt$mother.smokes + 1]) birthwt$uterine.irr <- factor(c("no", "Yes")[birthwt$uterine.irr + 1]) birthwt$hypertension <- factor(c("no", "Yes")[birthwt$hypertension + 1]) Make it readable, again! summary(birthwt) birthwt.below.2500 mother.age mother.weight race Min. : Min. :14.00 Min. : 80.0 black:26 1st Qu.: st Qu.: st Qu.:110.0 other:67 Median : Median :23.00 Median :121.0 white:96 Mean : Mean :23.24 Mean : rd Qu.: rd Qu.: rd Qu.:140.0 Max. : Max. :45.00 Max. :250.0 mother.smokes previous.prem.labor hypertension uterine.irr No :115 Min. : No :177 No :161 Yes: 74 1st Qu.: Yes: 12 Yes: 28 Median : Mean : rd Qu.: Max. : physician.visits birthwt.grams Min. : Min. : 709 1st Qu.: st Qu.:2414 Median : Median :2977 Mean : Mean :2945 3rd Qu.: rd Qu.:3487 Max. : Max. :4990 5

6 Explore it! R s basic plotting functions go a long way. plot (birthwt$race) title (main = "Count of Mother's Race in Springfield MA, 1986") Count of Mother's Race in Springfield MA, black other white Explore it! R s basic plotting functions go a long way. plot (birthwt$mother.age) title (main = "Mother's Ages in Springfield MA, 1986") 6

7 Mother's Ages in Springfield MA, 1986 birthwt$mother.age Index Explore it! R s basic plotting functions go a long way. plot (sort(birthwt$mother.age)) title (main = "(Sorted) Mother's Ages in Springfield MA, 1986") 7

8 (Sorted) Mother's Ages in Springfield MA, 1986 sort(birthwt$mother.age) Index Explore it! R s basic plotting functions go a long way. plot (birthwt$mother.age, birthwt$birthwt.grams) title (main = "Birth Weight by Mother's Age in Springfield MA, 1986") 8

9 Birth Weight by Mother's Age in Springfield MA, 1986 birthwt$birthwt.grams birthwt$mother.age Basic statistical testing Let s fit some models to the data pertaining to our outcome(s) of interest. plot (birthwt$mother.smokes, birthwt$birthwt.grams, main="birth Weight by Mother's Smoking Habit", ylab 9

10 Birth Weight by Mother's Smoking Habit Birth Weight (g) No Yes Mother Smokes Basic statistical testing Tough to tell! Simple two-sample t-test. We re testing: H o : µ 1 = µ 2 versus H a : µ 1 µ 2 where µ 1 is the mean birth weight when the mother smokes and µ 2 is the mean birth weight when the mother doesn t smoke. t.test (birthwt$birthwt.grams[birthwt$mother.smokes == "Yes"], birthwt$birthwt.grams[birthwt$mother.smokes == "No"]) Welch Two Sample t-test data: birthwt$birthwt.grams[birthwt$mother.smokes == "Yes"] and birthwt$birthwt.grams[birthwt$mother t = , df = 170.1, p-value = alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: sample estimates: mean of x mean of y We reject the null in favor that the two means are not the same. 10

11 Basic statistical testing Does this difference match the linear model? linear.model.1 <- lm (birthwt.grams ~ mother.smokes, data=birthwt) linear.model.1 Call: lm(formula = birthwt.grams ~ mother.smokes, data = birthwt) Coefficients: (Intercept) mother.smokesyes Basic statistical testing Does this difference match the linear model? summary(linear.model.1) Call: lm(formula = birthwt.grams ~ mother.smokes, data = birthwt) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) < 2e-16 *** mother.smokesyes ** --- Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: on 187 degrees of freedom Multiple R-squared: , Adjusted R-squared: F-statistic: on 1 and 187 DF, p-value: Basic statistical testing Does this difference match the linear model? linear.model.2 <- lm (birthwt.grams ~ mother.age, data=birthwt) linear.model.2 Call: 11

12 lm(formula = birthwt.grams ~ mother.age, data = birthwt) Coefficients: (Intercept) mother.age Basic statistical testing summary(linear.model.2) Call: lm(formula = birthwt.grams ~ mother.age, data = birthwt) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) <2e-16 *** mother.age Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: on 187 degrees of freedom Multiple R-squared: , Adjusted R-squared: F-statistic: on 1 and 187 DF, p-value: Basic statistical testing Diagnostics: R tries to make it as easy as possible (but no easier). Try in R proper: plot(linear.model.2) 12

13 Residuals Residuals vs Fitted Fitted values lm(birthwt.grams ~ mother.age) Normal Q Q Standardized residuals Theoretical Quantiles lm(birthwt.grams ~ mother.age) 13

14 Scale Location Standardized residuals Fitted values lm(birthwt.grams ~ mother.age) Residuals vs Leverage Standardized residuals Cook's distance Leverage lm(birthwt.grams ~ mother.age) Detecting Outliers These are the default diagnostic plots for the analysis. Note that our oldest mother and her heaviest child are greatly skewing this analysis as we suspected. 14

15 birthwt.noout <- birthwt[birthwt$mother.age <= 40,] linear.model.3 <- lm (birthwt.grams ~ mother.age, data=birthwt.noout) linear.model.3 Call: lm(formula = birthwt.grams ~ mother.age, data = birthwt.noout) Coefficients: (Intercept) mother.age Detecting Outliers summary(linear.model.3) Call: lm(formula = birthwt.grams ~ mother.age, data = birthwt.noout) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) <2e-16 *** mother.age Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: on 186 degrees of freedom Multiple R-squared: , Adjusted R-squared: F-statistic: on 1 and 186 DF, p-value: More complex models Add in smoking behavior: linear.model.3b <- lm (birthwt.grams ~ mother.age + mother.smokes*race, data=birthwt.noout) summary(linear.model.3b) Call: lm(formula = birthwt.grams ~ mother.age + mother.smokes * race, data = birthwt.noout) Residuals: 15

16 Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) < 2e-16 *** mother.age mother.smokesyes raceother racewhite ** mother.smokesyes:raceother mother.smokesyes:racewhite Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: on 181 degrees of freedom Multiple R-squared: , Adjusted R-squared: F-statistic: on 6 and 181 DF, p-value: More complex models plot(linear.model.3b) Residuals vs Fitted Residuals Fitted values lm(birthwt.grams ~ mother.age + mother.smokes * race) 16

17 Normal Q Q Standardized residuals Theoretical Quantiles lm(birthwt.grams ~ mother.age + mother.smokes * race) Scale Location 10 4 Standardized residuals Fitted values lm(birthwt.grams ~ mother.age + mother.smokes * race) 17

18 Residuals vs Leverage Standardized residuals Cook's distance Leverage lm(birthwt.grams ~ mother.age + mother.smokes * race) Everything Must Go (In) Let s do a kitchen sink model on this new data set: linear.model.4 <- lm (birthwt.grams ~., data=birthwt.noout) linear.model.4 Call: lm(formula = birthwt.grams ~., data = birthwt.noout) Coefficients: (Intercept) birthwt.below.2500 mother.age mother.weight raceother racewhite mother.smokesyes previous.prem.labor hypertensionyes uterine.irryes physician.visits Everything Must Go (In), Except What Must Not Whoops! One of those variables was birthwt.below.2500 which is a function of the outcome. 18

19 linear.model.4a <- lm (birthwt.grams ~. - birthwt.below.2500, data=birthwt.noout) summary(linear.model.4a) Call: lm(formula = birthwt.grams ~. - birthwt.below.2500, data = birthwt.noout) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e-13 *** mother.age mother.weight ** raceother racewhite *** mother.smokesyes ** previous.prem.labor hypertensionyes ** uterine.irryes *** physician.visits Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 638 on 178 degrees of freedom Multiple R-squared: , Adjusted R-squared: F-statistic: on 9 and 178 DF, p-value: 8.255e-08 Everything Must Go (In), Except What Must Not Whoops! One of those variables was birthwt.below.2500 which is a function of the outcome. plot(linear.model.4a) 19

20 Residuals vs Fitted Residuals Fitted values lm(birthwt.grams ~. birthwt.below.2500) Normal Q Q Standardized residuals Theoretical Quantiles lm(birthwt.grams ~. birthwt.below.2500) 20

21 Standardized residuals Scale Location Fitted values lm(birthwt.grams ~. birthwt.below.2500) Residuals vs Leverage Standardized residuals Cook's distance Leverage lm(birthwt.grams ~. birthwt.below.2500) 21

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

Introduction to linear model

Introduction to linear model Introduction to linear model Valeska Andreozzi 2012 References 3 Correlation 5 Definition.................................................................... 6 Pearson correlation coefficient.......................................................

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Linear Modelling: Simple Regression

Linear Modelling: Simple Regression Linear Modelling: Simple Regression 10 th of Ma 2018 R. Nicholls / D.-L. Couturier / M. Fernandes Introduction: ANOVA Used for testing hpotheses regarding differences between groups Considers the variation

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

Simple linear regression

Simple linear regression Simple linear regression Business Statistics 41000 Fall 2015 1 Topics 1. conditional distributions, squared error, means and variances 2. linear prediction 3. signal + noise and R 2 goodness of fit 4.

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

We d like to know the equation of the line shown (the so called best fit or regression line).

We d like to know the equation of the line shown (the so called best fit or regression line). Linear Regression in R. Example. Let s create a data frame. > exam1 = c(100,90,90,85,80,75,60) > exam2 = c(95,100,90,80,95,60,40) > students = c("asuka", "Rei", "Shinji", "Mari", "Hikari", "Toji", "Kensuke")

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).

More information

Lecture 8: Fitting Data Statistical Computing, Wednesday October 7, 2015

Lecture 8: Fitting Data Statistical Computing, Wednesday October 7, 2015 Lecture 8: Fitting Data Statistical Computing, 36-350 Wednesday October 7, 2015 In previous episodes Loading and saving data sets in R format Loading and saving data sets in other structured formats Intro

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Analytics 512: Homework # 2 Tim Ahn February 9, 2016

Analytics 512: Homework # 2 Tim Ahn February 9, 2016 Analytics 512: Homework # 2 Tim Ahn February 9, 2016 Chapter 3 Problem 1 (# 3) Suppose we have a data set with five predictors, X 1 = GP A, X 2 = IQ, X 3 = Gender (1 for Female and 0 for Male), X 4 = Interaction

More information

THE LINEAR DISCRIMINATION PROBLEM

THE LINEAR DISCRIMINATION PROBLEM What exactly is the linear discrimination story? In the logistic regression problem we have 0/ dependent variable, and we set up a model that predict this from independent variables. Specifically we use

More information

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n

More information

1 The Classic Bivariate Least Squares Model

1 The Classic Bivariate Least Squares Model Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating

More information

Multiple Regression and Regression Model Adequacy

Multiple Regression and Regression Model Adequacy Multiple Regression and Regression Model Adequacy Joseph J. Luczkovich, PhD February 14, 2014 Introduction Regression is a technique to mathematically model the linear association between two or more variables,

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

1 Least Squares Estimation - multiple regression.

1 Least Squares Estimation - multiple regression. Introduction to multiple regression. Fall 2010 1 Least Squares Estimation - multiple regression. Let y = {y 1,, y n } be a n 1 vector of dependent variable observations. Let β = {β 0, β 1 } be the 2 1

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

STAT 350: Summer Semester Midterm 1: Solutions

STAT 350: Summer Semester Midterm 1: Solutions Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

POLSCI 702 Non-Normality and Heteroskedasticity

POLSCI 702 Non-Normality and Heteroskedasticity Goals of this Lecture POLSCI 702 Non-Normality and Heteroskedasticity Dave Armstrong University of Wisconsin Milwaukee Department of Political Science e: armstrod@uwm.edu w: www.quantoid.net/uwm702.html

More information

Swarthmore Honors Exam 2012: Statistics

Swarthmore Honors Exam 2012: Statistics Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Statistics - Lecture Three. Linear Models. Charlotte Wickham 1.

Statistics - Lecture Three. Linear Models. Charlotte Wickham   1. Statistics - Lecture Three Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Linear Models 1. The Theory 2. Practical Use 3. How to do it in R 4. An example 5. Extensions

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Regression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur

Regression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur Regression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur Lecture 10 Software Implementation in Simple Linear Regression Model using

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Lecture 4: Regression Analysis

Lecture 4: Regression Analysis Lecture 4: Regression Analysis 1 Regression Regression is a multivariate analysis, i.e., we are interested in relationship between several variables. For corporate audience, it is sufficient to show correlation.

More information

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression 22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared

More information

Leverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.

Leverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response. Leverage Some cases have high leverage, the potential to greatly affect the fit. These cases are outliers in the space of predictors. Often the residuals for these cases are not large because the response

More information

Variance Decomposition and Goodness of Fit

Variance Decomposition and Goodness of Fit Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Regression and the 2-Sample t

Regression and the 2-Sample t Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Comparing Nested Models

Comparing Nested Models Comparing Nested Models ST 370 Two regression models are called nested if one contains all the predictors of the other, and some additional predictors. For example, the first-order model in two independent

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

Note on Bivariate Regression: Connecting Practice and Theory. Konstantin Kashin

Note on Bivariate Regression: Connecting Practice and Theory. Konstantin Kashin Note on Bivariate Regression: Connecting Practice and Theory Konstantin Kashin Fall 2012 1 This note will explain - in less theoretical terms - the basics of a bivariate linear regression, including testing

More information

MLR Model Checking. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

MLR Model Checking. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project MLR Model Checking Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

1 Introduction 1. 2 The Multiple Regression Model 1

1 Introduction 1. 2 The Multiple Regression Model 1 Multiple Linear Regression Contents 1 Introduction 1 2 The Multiple Regression Model 1 3 Setting Up a Multiple Regression Model 2 3.1 Introduction.............................. 2 3.2 Significance Tests

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

y ˆ i = ˆ  T u i ( i th fitted value or i th fit) 1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u

More information

STAT 215 Confidence and Prediction Intervals in Regression

STAT 215 Confidence and Prediction Intervals in Regression STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Chapter 12: Linear regression II

Chapter 12: Linear regression II Chapter 12: Linear regression II Timothy Hanson Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 14 12.4 The regression model

More information

Unless provided with information to the contrary, assume for each question below that the Classical Linear Model assumptions hold.

Unless provided with information to the contrary, assume for each question below that the Classical Linear Model assumptions hold. Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 1 SOLUTIONS Spring 2015 Instructor: Martin Farnham Unless provided with information to the contrary,

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Simple Linear Regression for the Climate Data

Simple Linear Regression for the Climate Data Prediction Prediction Interval Temperature 0.2 0.0 0.2 0.4 0.6 0.8 320 340 360 380 CO 2 Simple Linear Regression for the Climate Data What do we do with the data? y i = Temperature of i th Year x i =CO

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

Linear Regression. Furthermore, it is simple.

Linear Regression. Furthermore, it is simple. Linear Regression While linear regression has limited value in the classification problem, it is often very useful in predicting a numerical response, on a linear or ratio scale. Furthermore, it is simple.

More information

CAS MA575 Linear Models

CAS MA575 Linear Models CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers

More information

Chapter 8 Conclusion

Chapter 8 Conclusion 1 Chapter 8 Conclusion Three questions about test scores (score) and student-teacher ratio (str): a) After controlling for differences in economic characteristics of different districts, does the effect

More information

22s:152 Applied Linear Regression

22s:152 Applied Linear Regression 22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial

More information

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 y 1 2 3 4 5 6 7 x Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 32 Suhasini Subba Rao Previous lecture We are interested in whether a dependent

More information

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i

More information

Chapter 12: Multiple Linear Regression

Chapter 12: Multiple Linear Regression Chapter 12: Multiple Linear Regression Seungchul Baek Department of Statistics, University of South Carolina STAT 509: Statistics for Engineers 1 / 55 Introduction A regression model can be expressed as

More information

Multiple Linear Regression. Chapter 12

Multiple Linear Regression. Chapter 12 13 Multiple Linear Regression Chapter 12 Multiple Regression Analysis Definition The multiple regression model equation is Y = b 0 + b 1 x 1 + b 2 x 2 +... + b p x p + ε where E(ε) = 0 and Var(ε) = s 2.

More information

Tests of Linear Restrictions

Tests of Linear Restrictions Tests of Linear Restrictions 1. Linear Restricted in Regression Models In this tutorial, we consider tests on general linear restrictions on regression coefficients. In other tutorials, we examine some

More information

Lecture 2. Simple linear regression

Lecture 2. Simple linear regression Lecture 2. Simple linear regression Jesper Rydén Department of Mathematics, Uppsala University jesper@math.uu.se Regression and Analysis of Variance autumn 2014 Overview of lecture Introduction, short

More information

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima Applied Statistics Lecturer: Serena Arima Hypothesis testing for the linear model Under the Gauss-Markov assumptions and the normality of the error terms, we saw that β N(β, σ 2 (X X ) 1 ) and hence s

More information

Linear Probability Model

Linear Probability Model Linear Probability Model Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables. If

More information

Collinearity: Impact and Possible Remedies

Collinearity: Impact and Possible Remedies Collinearity: Impact and Possible Remedies Deepayan Sarkar What is collinearity? Exact dependence between columns of X make coefficients non-estimable Collinearity refers to the situation where some columns

More information

Introduction to Regression

Introduction to Regression Regression Introduction to Regression If two variables covary, we should be able to predict the value of one variable from another. Correlation only tells us how much two variables covary. In regression,

More information

Prepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti

Prepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Prepared by: Prof Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Putra Malaysia Serdang M L Regression is an extension to

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION Answer all parts. Closed book, calculators allowed. It is important to show all working,

More information

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf

More information

Introduction to Linear Regression

Introduction to Linear Regression Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring

More information

Modeling kid s test scores (revisited) Lecture 20 - Model Selection. Model output. Backward-elimination

Modeling kid s test scores (revisited) Lecture 20 - Model Selection. Model output. Backward-elimination Modeling kid s test scores (revisited) Lecture 20 - Model Selection Sta102 / BME102 Colin Rundel November 17, 2014 Predicting cognitive test scores of three- and four-year-old children using characteristics

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

STAT5044: Regression and Anova

STAT5044: Regression and Anova STAT5044: Regression and Anova Inyoung Kim 1 / 49 Outline 1 How to check assumptions 2 / 49 Assumption Linearity: scatter plot, residual plot Randomness: Run test, Durbin-Watson test when the data can

More information

Multiple Regression Introduction to Statistics Using R (Psychology 9041B)

Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment

More information

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots.

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots. Homework 2 1 Data analysis problems For the homework, be sure to give full explanations where required and to turn in any relevant plots. 1. The file berkeley.dat contains average yearly temperatures for

More information

Lecture 1 Intro to Spatial and Temporal Data

Lecture 1 Intro to Spatial and Temporal Data Lecture 1 Intro to Spatial and Temporal Data Dennis Sun Stanford University Stats 253 June 22, 2015 1 What is Spatial and Temporal Data? 2 Trend Modeling 3 Omitted Variables 4 Overview of this Class 1

More information

Statistical View of Least Squares

Statistical View of Least Squares May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples

More information