1 Multiple Regression
- Gerald Grant
In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only as an introduction. We start with an example.

Example 1.1. The dataset fat in the faraway package contains several body measurements of 252 adult males. Included in this dataset are two measures of the percentage of body fat, the Brozek and Siri indices. Each of these indices computes the percentage of body fat from the density (in gm/cm³), which in turn is approximated by an underwater weighing technique. This is a time-consuming procedure and it would be useful to be able to estimate the percentage of body fat from easily obtainable measurements. For example, it might be nice to have a relationship of the form

density = f(x1, ..., xk)

for k easily measured variables x1, ..., xk. We will first investigate the problem of approximating body fat by a function of only weight and abdomen circumference. The data on the first two individuals is given for illustration.

> fat[1:2,]
  brozek siri density age weight height adipos free neck chest abdom hip thigh knee ankle biceps forearm wrist

The notation gets a bit messy. We will continue to use y for the response variable and we will use x1, ..., xk for the k explanatory variables. We will again assume that there are n individuals and use the subscript i to range over individuals. Therefore, the i-th data point is (x1i, x2i, ..., xki, yi). The standard linear model now becomes the following.

The standard linear model. The standard linear model is given by the equation

y = β0 + β1 x1 + ... + βk xk + ε    (1)

where

1. ε has mean 0,
2. ε has standard deviation σ,
3. and ε has a normal distribution.

We again assume that the n data points are the result of independent errors ε1, ..., εn. To find good estimates of β0, ..., βk we proceed exactly as in the case of one predictor and find the least squares estimates.
Specifically, let bi be an estimate of βi and define

ŷi = b0 + b1 x1i + b2 x2i + ... + bk xki.

We choose these estimates so that we minimize SSE, where

SSE = Σ_{i=1}^{n} (yi − ŷi)².

It is routine to find the values of the bi's that minimize SSE, and R computes them with dispatch. Suppose that we use weight and abdomen circumference to try to predict the Brozek measure of body fat.
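The least-squares computation behind lm() can also be sketched outside R. The following is a minimal illustration in Python (not from the notes) that forms the normal equations (XᵀX)b = Xᵀy for two predictors and solves them; the data are made up, since the fat measurements are not reproduced here.

```python
# Least-squares fit of y = b0 + b1*x1 + b2*x2 by solving the normal
# equations (X'X) b = X'y -- a pure-Python sketch on toy data.

def solve(A, v):
    """Gaussian elimination with partial pivoting for A b = v."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (M[r][n] - sum(M[r][c] * b[c] for c in range(r + 1, n))) / M[r][r]
    return b

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [2 + 0.5 * a + 1.0 * b for a, b in zip(x1, x2)]   # exactly linear toy data

X = [[1.0, a, b] for a, b in zip(x1, x2)]             # design matrix with intercept
XtX = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(3)] for r in range(3)]
Xty = [sum(X[i][r] * y[i] for i in range(len(X))) for r in range(3)]

b = solve(XtX, Xty)                                   # (b0, b1, b2) minimizing SSE
yhat = [b[0] + b[1] * a + b[2] * c for a, c in zip(x1, x2)]
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
```

Because the toy response is exactly linear in the two predictors, the fitted coefficients recover (2, 0.5, 1.0) and SSE is essentially zero; on real data such as fat, SSE is the minimized but nonzero residual sum of squares.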
> l = lm(brozek ~ weight + abdom, data=fat)
> l

Call:
lm(formula = brozek ~ weight + abdom, data = fat)

Coefficients:
(Intercept)       weight        abdom

In the case of multiple predictors, we need to be very careful in how we interpret the various coefficients of the model. For example, b1 = −0.14 in this model seems to indicate that body fat is decreasing as a function of weight. This is counter to our intuition and our experience, which says that the heaviest men tend to have more body fat than average. On the other hand, the positive coefficient b2 ≈ 0.91 seems to be consistent with the relationship between stomach girth and body fat that we know. The key here is that the coefficient b1 measures the effect of weight on body fat for a fixed abdomen circumference. This makes more sense: among individuals with a fixed abdomen circumference, the heavier individuals tend to be taller and so have perhaps less body fat. Even this interpretation needs to be expressed carefully, however. It is misleading to say that body fat decreases as weight increases with abdomen circumference held fixed, since increasing weight tends to increase abdomen circumference. We will come back to this relationship in a moment, but first we investigate the problem of inference in this linear model.

The short story of inference is that all of the results for the one-predictor case have the obvious extensions to more than one variable. To estimate σ, we again use the residual standard error s_e, except that we now define it by

s_e = sqrt( SSE / (n − (k + 1)) ).

The quantity in the denominator is simply n − p, where p is the number of estimated coefficients in the model. Using the estimate s_e of σ, we can again produce an estimate se(bj) of the standard deviation of bj and produce confidence intervals for βj. For the body fat data we have

> summary(l)

Call:
lm(formula = brozek ~ weight + abdom, data = fat)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     ...        ...     ...  < 2e-16 ***
weight          ...        ...     ...   ...e-11 ***
abdom           ...        ...     ...  < 2e-16 ***

Residual standard error: ... on 249 degrees of freedom
Multiple R-squared: ..., Adjusted R-squared: ...
F-statistic: ... on 2 and 249 DF, p-value: < 2.2e-16

> confint(l)
             2.5 %  97.5 %
(Intercept)    ...     ...
weight         ...     ...
abdom          ...     ...

From the output we observe the following. Our estimate for σ is the residual standard error. We note that 249 degrees of freedom are used, which is 252 − 3, since there are three estimated parameters. We can compute the confidence interval for β1 from the summary table (b1 = −0.14 and se(b1) = 0.019) using the t distribution with 249 degrees of freedom, or from the R function confint.
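The two formulas above can be checked by hand with the numbers quoted in these notes. The sketch below is Python, not part of the notes: s_e is computed for the weight-only model from its SSE = 9,410 reported by anova() later in this section (n = 252, p = 2 coefficients), and a 95% confidence interval for β1 in the two-predictor model is computed from b1 = −0.14 and se(b1) = 0.019, using 1.96 as an approximation to the t critical value with 249 degrees of freedom (confint() uses the exact t quantile).

```python
# Residual standard error s_e = sqrt(SSE / (n - p)) and a 95% confidence
# interval b1 +/- t* se(b1), using values quoted in the text.
from math import sqrt

n, p, SSE = 252, 2, 9410                 # weight-only model: anova() SSE
s_e = sqrt(SSE / (n - p))                # estimate of sigma, on n - p = 250 df

b1, se_b1, t_crit = -0.14, 0.019, 1.96   # 1.96 approximates t with 249 df
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # roughly (-0.177, -0.103)
```

Since the interval lies entirely below zero, the negative partial slope on weight is statistically distinguishable from zero at the 5% level.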
We can compute confidence intervals for the expected value of body fat and prediction intervals for an individual observation as well. Investigating what happens for a male weighing 180 pounds with an abdomen measure of 82 cm gives the following prediction and confidence intervals:

> d = data.frame(weight=180, abdom=82)
> predict(l, d, interval='confidence')
       fit      lwr      upr

> predict(l, d, interval='prediction')
       fit      lwr      upr

The average body fat of such individuals is likely to be between 7.9% and 10.4%. An individual male not part of the dataset is likely to have body fat between 0.91% and 17.4%.

We now return to the issue of interpreting the coefficients in the linear model. In the case of the body fat example, let's fit a model with weight as the only predictor.

> lm(brozek ~ weight, data=fat)

Call:
lm(formula = brozek ~ weight, data = fat)

Coefficients:
(Intercept)      weight

Notice that the sign of the relationship between weight and body fat has changed! Using weight alone, we predict an increase of 0.16 in percentage of body fat for each pound increase in weight. What has happened? Let's first restate the two fitted linear relationships:

brozek = (intercept) − 0.14 weight + 0.91 abdom    (2)
brozek = (intercept) + 0.16 weight                 (3)

In order to understand the relationships above, it is important to understand that there is a relationship between weight and the abdomen measurement. One more regression is useful.

> lm(abdom ~ weight, data=fat)

Call:
lm(formula = abdom ~ weight, data = fat)

Coefficients:
(Intercept)      weight

Now suppose that we change weight by 10 pounds. The last analysis says that we would predict that the abdomen measure increases by 3.3 cm. Using (2) we see that an increase of 10 pounds in weight together with an increase of 3.3 cm in abdomen circumference produces an increase of 10 × (−0.14) + 3.3 × 0.91 ≈ 1.6% in Brozek index. But this is precisely what an increase of 10 pounds in weight should produce according to (3), namely 10 × 0.16 = 1.6%.
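The arithmetic of this reconciliation is easy to check. The scratch calculation below is Python, not part of the notes; the coefficients −0.14, 0.91, 0.16 and the 3.3 cm shift are the values quoted in the text.

```python
# Reconciling the two fitted models: a 10 lb weight increase also shifts
# abdomen circumference by about 3.3 cm, so the two-predictor model and
# the weight-only model predict nearly the same change in Brozek index.
via_two_predictor = 10 * (-0.14) + 3.3 * 0.91   # direct weight effect + induced abdom effect
via_weight_only   = 10 * 0.16                   # weight-only slope
```

The two predicted changes agree to within rounding of the published coefficients, which is exactly why the sign flip of the weight coefficient is not a contradiction.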
The fact that our predictors are linearly related in the set of data (and so presumably in the population that we are modeling) is known as multicollinearity. The presence of multicollinearity makes it difficult to give simple interpretations of the coefficients in a multiple regression.

Interaction terms

Consider our linear relationship, brozek = (intercept) − 0.14 weight + 0.91 abdom. This model implies that for any fixed value of abdom, the slope of the line relating brozek to weight is always −0.14. An alternative (and more complicated) model would be that the slope of this line also changes as the value of abdom changes. One strategy
for incorporating such behavior into our model is to add an additional term, an interaction term. The equation for the linear model with an interaction term, in the case that there are only two predictor variables, is

y = β0 + β1 x1 + β2 x2 + β1,2 x1 x2 + ε.

While this is not the only way that two variables could interact, it seems to be the simplest possible way. R allows us to add an interaction term using a colon.

> lm(brozek ~ weight + abdom + weight:abdom, data=fat)

Call:
lm(formula = brozek ~ weight + abdom + weight:abdom, data = fat)

Coefficients:
 (Intercept)       weight        abdom weight:abdom

While the coefficient for the interaction term seems small, one should realize that the values of the product of these two variables are large, so that this term contributes significantly to the sum. On the other hand, in the presence of this interaction term, the contribution of the term for weight is now very small. With all the possible variables that we might include in our model and with all the possible interaction terms, it is important to have some tools for evaluating different choices. We take up this issue in the next section.

2 Evaluating Models

In the previous section, we considered several different linear models for predicting the Brozek body fat index from easily determined physical measurements. Other models could be considered by using other physical measurements that were available in the dataset. How should we evaluate one of these models and how should we choose among them? One of the principal tools used to evaluate such models is known as the analysis of variance.

Given a linear model (any model, really), we choose the parameters to minimize SSE. Recall

SSE = Σ_{i=1}^{n} (yi − ŷi)².

Therefore it seems reasonable to suppose that a model with smaller SSE is better than one with larger SSE. Such a model seems to explain or account for more of the variation in the yi. Consider the two models for body fat, one using only abdomen circumference and the other only weight.
> la = lm(brozek ~ abdom, data=fat)
> anova(la)
Analysis of Variance Table

Response: brozek
           Df Sum Sq Mean Sq F value    Pr(>F)
abdom       1    ...     ...     ... < 2.2e-16 ***
Residuals 250   5095     ...

> lw = lm(brozek ~ weight, data=fat)
> anova(lw)
Analysis of Variance Table

Response: brozek
           Df Sum Sq Mean Sq F value    Pr(>F)
weight      1    ...     ...     ... < 2.2e-16 ***
Residuals 250   9410     ...

Among other things, the function anova() tells us that SSE = 5,095 for the linear model using abdomen circumference and SSE = 9,410 for the model using only weight. While this comparison seems clearly to indicate that abdomen circumference predicts the Brozek index better on average than does weight, using SSE as an absolute measure of goodness of fit has two shortcomings. First, the units of SSE are the squares of the y units, which means that SSE will tend to be large or small according as the observations are large or small. Second, we can obviously reduce SSE by including more variables in the model, so that comparing SSE does not give us a good way of comparing, say, the model with abdomen circumference and weight to the model with abdomen circumference alone.

We address the first issue first. We would like to transform SSE into a dimension-free measurement. The key to doing this is to compare SSE to the maximum possible SSE. To do this, define

SSTotal = Σ_{i=1}^{n} (yi − ȳ)².

The quantity SSTotal is just the quantity SSE for the model with only a constant term. The quantity SSTotal can be computed from the output of the function anova() by summing the column labeled Sum Sq. For the body fat data, that number is SSTotal ≈ 15,080. We first note that 0 ≤ SSE ≤ SSTotal. This is because choosing b0 = ȳ and all the remaining bj = 0 would already achieve SSE = SSTotal, but SSE is the minimum among all choices of the bj. Using this fact, we have a first measure of the fit of a linear model. Define

R² = 1 − SSE/SSTotal.

We have that 0 ≤ R² ≤ 1, and R² is close to 1 if the linear part of the model fits the data well.
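Given the two SSE values, the definition of R² can be checked numerically. In this Python sketch (not part of the notes), SSTotal ≈ 15,080 is the total sum of squares for brozek obtained by summing the Sum Sq column of either anova() table:

```python
# R^2 = 1 - SSE/SSTotal for the two one-predictor models, using the SSE
# values quoted from anova() and SSTotal ~ 15,080 for brozek.
SSTotal = 15080.0
r2_abdom = 1 - 5095 / SSTotal    # abdomen circumference explains ~66% of variation
r2_weight = 1 - 9410 / SSTotal   # weight explains ~38%
```

These match the Multiple R-squared values that summary() reports for the two models in the output below.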
The number R² is sometimes called the coefficient of determination of the model and is often read as a percentage. In the model for the Brozek index which uses only abdomen circumference, we can compute R² from the statistics in the analysis of variance table, or else we can read it from the summary of the regression, where it is labeled Multiple R-squared. We read the result below as: abdomen circumference explains 66.2% of the variation in the Brozek index, and weight explains 37.6% of the variation in the Brozek index.
> summary(la)

Call:
lm(formula = brozek ~ abdom, data = fat)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     ...        ...     ...   <2e-16 ***
abdom           ...        ...     ...   <2e-16 ***

Residual standard error: ... on 250 degrees of freedom
Multiple R-squared: 0.662, Adjusted R-squared: ...
F-statistic: ... on 1 and 250 DF, p-value: < 2.2e-16

> summary(lw)

Call:
lm(formula = brozek ~ weight, data = fat)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     ...        ...     ...   ...e-05 ***
weight          ...        ...     ...  < 2e-16 ***

Residual standard error: ... on 250 degrees of freedom
Multiple R-squared: 0.376, Adjusted R-squared: ...
F-statistic: ... on 1 and 250 DF, p-value: < 2.2e-16

The R² values for two different models with the same number of parameters give us a reasonable way to compare their usefulness. However, R² is a misleading tool for comparing models with differing numbers of parameters. After all, if we allow ourselves n different parameters (i.e., we have n different explanatory variables), we will be able to fit the data exactly and so achieve R² = 100%. There are many ways to compare two such models; the one that we explore here compares two nested models based on their respective SSEs. Suppose that we have two different models, one of which is nested in the other. For example, the models

model1: brozek = β0 + β1 weight
model2: brozek = β0 + β1 weight + β2 abdom

are nested in this way. We compute a statistic, called the F statistic, that compares the SSE of the two models. In the following notation, suppose we have two models, model1 with p1 parameters and sum of squared residuals equal to SSE1, and similarly for model2. Here model2 contains all the parameters of model1, so that p1 < p2. Define

F = [ (SSE1 − SSE2)/(p2 − p1) ] / [ SSE2/(n − p2) ].
If model2 is substantially better than model1, then the F statistic is large, but if not then the F statistic is small. We now take as our null hypothesis that all the extra parameters in model2 are 0. The anova() function of R computes the F statistic and the p-value for this null hypothesis:

> anova(lw, l)
Analysis of Variance Table

Model 1: brozek ~ weight
Model 2: brozek ~ weight + abdom
  Res.Df  RSS Df Sum of Sq    F    Pr(>F)
1    250 9410
2    249  ...  1       ...  ... < 2.2e-16 ***

This result (given the extremely low p-value) should be read as: including abdom in the model explains significantly more variation than the model that has weight alone. In particular, this p-value tests the null hypothesis H0: β2 = 0.
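The F statistic itself is a one-line computation once the two residual sums of squares are known. In the Python sketch below (not part of the notes), SSE1 = 9,410 is the residual sum of squares quoted for the weight-only model, while SSE2 for the two-predictor model is a made-up illustrative value, not the fitted one:

```python
# Partial F statistic for comparing nested models,
#   F = ((SSE1 - SSE2)/(p2 - p1)) / (SSE2/(n - p2)).
n = 252
SSE1, p1 = 9410.0, 2    # reduced model: brozek ~ weight (quoted value)
SSE2, p2 = 4950.0, 3    # full model: brozek ~ weight + abdom (hypothetical SSE)

F = ((SSE1 - SSE2) / (p2 - p1)) / (SSE2 / (n - p2))
# a large F means the extra predictor explains substantially more variation
```

With any SSE2 in this neighborhood, F is in the hundreds, which is why anova() reports a p-value below 2.2e-16 for the null hypothesis β2 = 0.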
More informationRegression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.
TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted
More informationStat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb
Stat 42/52 TWO WAY ANOVA Feb 6 25 Charlotte Wickham stat52.cwick.co.nz Roadmap DONE: Understand what a multiple regression model is. Know how to do inference on single and multiple parameters. Some extra
More informationSimple Linear Regression
Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)
More information1 Introduction 1. 2 The Multiple Regression Model 1
Multiple Linear Regression Contents 1 Introduction 1 2 The Multiple Regression Model 1 3 Setting Up a Multiple Regression Model 2 3.1 Introduction.............................. 2 3.2 Significance Tests
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More informationappstats27.notebook April 06, 2017
Chapter 27 Objective Students will conduct inference on regression and analyze data to write a conclusion. Inferences for Regression An Example: Body Fat and Waist Size pg 634 Our chapter example revolves
More information36-707: Regression Analysis Homework Solutions. Homework 3
36-707: Regression Analysis Homework Solutions Homework 3 Fall 2012 Problem 1 Y i = βx i + ɛ i, i {1, 2,..., n}. (a) Find the LS estimator of β: RSS = Σ n i=1(y i βx i ) 2 RSS β = Σ n i=1( 2X i )(Y i βx
More informationCAS MA575 Linear Models
CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers
More informationFigure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim
0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#
More informationMultimodel Inference: Understanding AIC relative variable importance values. Kenneth P. Burnham Colorado State University Fort Collins, Colorado 80523
November 6, 2015 Multimodel Inference: Understanding AIC relative variable importance values Abstract Kenneth P Burnham Colorado State University Fort Collins, Colorado 80523 The goal of this material
More informationPubH 7405: REGRESSION ANALYSIS. MLR: INFERENCES, Part I
PubH 7405: REGRESSION ANALYSIS MLR: INFERENCES, Part I TESTING HYPOTHESES Once we have fitted a multiple linear regression model and obtained estimates for the various parameters of interest, we want to
More informationLecture 10 Multiple Linear Regression
Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable
More information36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression
36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form
More informationINFERENCE FOR REGRESSION
CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We
More informationChapter 27 Summary Inferences for Regression
Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test
More informationAnnouncements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)
Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) We will cover Chs. 5 and 6 first, then 3 and 4. Mon,
More informationBiostatistics 380 Multiple Regression 1. Multiple Regression
Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)
More information1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species
Lecture notes 2/22/2000 Dummy variables and extra SS F-test Page 1 Crab claw size and closing force. Problem 7.25, 10.9, and 10.10 Regression for all species at once, i.e., include dummy variables for
More informationCorrelation Analysis
Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the
More informationLecture 11: Simple Linear Regression
Lecture 11: Simple Linear Regression Readings: Sections 3.1-3.3, 11.1-11.3 Apr 17, 2009 In linear regression, we examine the association between two quantitative variables. Number of beers that you drink
More informationStatistics 5100 Spring 2018 Exam 1
Statistics 5100 Spring 2018 Exam 1 Directions: You have 60 minutes to complete the exam. Be sure to answer every question, and do not spend too much time on any part of any question. Be concise with all
More informationPsychology Seminar Psych 406 Dr. Jeffrey Leitzel
Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting
More informationSections 7.1, 7.2, 7.4, & 7.6
Sections 7.1, 7.2, 7.4, & 7.6 Adapted from Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1 / 25 Chapter 7 example: Body fat n = 20 healthy females 25 34
More informationCorrelation & Simple Regression
Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.
More informationUnbalanced Data in Factorials Types I, II, III SS Part 1
Unbalanced Data in Factorials Types I, II, III SS Part 1 Chapter 10 in Oehlert STAT:5201 Week 9 - Lecture 2 1 / 14 When we perform an ANOVA, we try to quantify the amount of variability in the data accounted
More informationApplied Regression Analysis
Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of
More information1 The Classic Bivariate Least Squares Model
Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating
More informationRegression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison.
Regression Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison December 8 15, 2011 Regression 1 / 55 Example Case Study The proportion of blackness in a male lion s nose
More information