Stat 328 Regression Summary


Simple Linear Regression Model

The basic (normal) "simple linear regression" model says that a response/output variable $y$ depends on an explanatory/input/system variable $x$ in a "noisy but linear" way. That is, one supposes that there is a linear relationship between $x$ and mean $y$,

$$\mu_{y|x} = \beta_0 + \beta_1 x$$

and that (for fixed $x$) there is around that mean a distribution of $y$ that is normal. Further, the model assumption is that the standard deviation of the response distribution is constant in $x$. In symbols it is standard to write

$$y = \beta_0 + \beta_1 x + \epsilon$$

where $\epsilon$ is normal with mean 0 and standard deviation $\sigma$. This describes one $y$. Where several observations $y_i$ with corresponding values $x_i$ are under consideration, the assumption is that the $y_i$ (the $\epsilon_i$) are independent. (The $\epsilon_i$ are conceptually equivalent to unrelated random draws from the same fixed normal continuous distribution.) The model statement in its full glory is then

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i \quad \text{for } i = 1, \ldots, n$$

where the $\epsilon_i$ are independent normal random variables with mean 0 and standard deviation $\sigma$.

The model statement above is a perfectly theoretical matter. One can begin with it, and for specific choices of $\beta_0$, $\beta_1$, and $\sigma$, find probabilities for $y$ at given values of $x$. In applications, the real mode of operation is instead to take data pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ and use them to make inferences about the parameters $\beta_0$, $\beta_1$, and $\sigma$, and to make predictions based on the estimates (based on the empirically fitted model).

Descriptive Analysis of Approximately Linear $(x, y)$ Data

After plotting $(x_i, y_i)$ data to determine that the "linear in $x$ mean of $y$" model makes some sense, it is reasonable to try to quantify "how linear" the data look and to find a line of "best fit" to the scatterplot of the data. The sample correlation between $y$ and $x$,

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

is a measure of strength of linear relationship between $x$ and $y$.

Calculus can be invoked to find a slope $\beta_1$ and intercept $\beta_0$ minimizing the sum of squared vertical distances from data points to a fitted line, $\sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2$. These "least squares" values are

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

and

$$b_0 = \bar{y} - b_1 \bar{x}$$

It is further common to refer to the value of $y$ on the "least squares line" corresponding to $x_i$ as a fitted or predicted value

$$\hat{y}_i = b_0 + b_1 x_i$$

One might take the difference between what is observed ($y_i$) and what is "predicted" or "explained" ($\hat{y}_i$) as a kind of leftover part or "residual" corresponding to a data value,

$$e_i = y_i - \hat{y}_i$$

The sum $\sum_{i=1}^{n} (y_i - \bar{y})^2$ is most of the sample variance of the values $y_i$. It is a measure of raw variation in the response variable. People often call it the "total sum of squares" and write

$$SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

The sum of squared residuals is a measure of variation in response remaining unaccounted for after fitting a line to the data. People often call it the "error sum of squares" and write

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

One is guaranteed that $SSTO \geq SSE$. So the difference $SSTO - SSE$ is a non-negative measure of variation accounted for in fitting a line to $(x, y)$ data. People often call it the "regression sum of squares" and write

$$SSR = SSTO - SSE$$
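The least squares and sum-of-squares definitions above are easy to compute directly. The following sketch (plain Python, with a small invented five-point data set used purely for illustration; these numbers are not from the course) computes the least squares coefficients, residuals, and all three sums of squares:

```python
# Least squares fit and sums of squares for a small invented (x, y) data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2), b0 = ybar - b1*xbar
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * xi for xi in x]          # fitted values on the least squares line
e = [yi - yh for yi, yh in zip(y, yhat)]   # residuals e_i = y_i - yhat_i

SSTO = sum((yi - ybar) ** 2 for yi in y)   # total sum of squares
SSE = sum(ei ** 2 for ei in e)             # error sum of squares
SSR = SSTO - SSE                           # regression sum of squares (non-negative)
```

Running this confirms the guaranteed decomposition $SSTO = SSR + SSE$ for the fitted line.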

The coefficient of determination expresses $SSR$ as a fraction of $SSTO$ and is

$$R^2 = \frac{SSR}{SSTO}$$

which is interpreted as "the fraction of raw variation in $y$ accounted for in the model fitting process."

Parameter Estimates for SLR

The descriptive statistics for $(x, y)$ data can be used to provide "single number estimates" of the (typically unknown) parameters of the simple linear regression model. That is, the slope of the least squares line can serve as an estimate of $\beta_1$,

$$\hat{\beta}_1 = b_1$$

and the intercept of the least squares line can serve as an estimate of $\beta_0$,

$$\hat{\beta}_0 = b_0$$

The variance of $y$ for a given $x$ can be estimated by a kind of average of squared residuals,

$$s^2 = \frac{SSE}{n-2} = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2$$

Of course, the square root of this "regression sample variance" is $s = \sqrt{s^2}$, and it serves as a single number estimate of $\sigma$.

Interval-Based Inference Methods for SLR

The normal simple linear regression model provides inference formulas for model parameters. Confidence limits for $\beta_1$ (the slope of the line relating mean $y$ to $x$... the rate of change of average $y$ with respect to $x$) are

$$b_1 \pm t \frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

where $t$ is a quantile of the $t$ distribution with $\nu = n - 2$ degrees of freedom. Dielman calls the ratio $s \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ the standard error of $b_1$ and uses the symbol $s_{b_1}$ for it. In these terms, the confidence limits for $\beta_1$ are

$$b_1 \pm t\, s_{b_1}$$
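A minimal sketch of these estimates and the slope interval, in plain Python on an invented five-point data set. The value 3.182 is the 0.975 quantile of the $t$ distribution with $n - 2 = 3$ degrees of freedom, taken from a $t$ table rather than computed:

```python
# R^2, the estimate s of sigma, and 95% confidence limits for beta_1
# on an invented five-point data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

SSE = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
SSTO = sum((yi - ybar) ** 2 for yi in y)
R2 = (SSTO - SSE) / SSTO         # coefficient of determination
s = (SSE / (n - 2)) ** 0.5       # single number estimate of sigma

se_b1 = s / sxx ** 0.5           # standard error of the slope, s_{b1}
t = 3.182                        # 0.975 quantile, t distribution with 3 df (t table)
lo, hi = b1 - t * se_b1, b1 + t * se_b1   # 95% confidence limits for beta_1
```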

Confidence limits for $\mu_{y|x} = \beta_0 + \beta_1 x$ (the mean value of $y$ at a given value $x$) are

$$(b_0 + b_1 x) \pm t\, s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

One can abbreviate $(b_0 + b_1 x)$ as $\hat{y}$, and Dielman calls $s\sqrt{\tfrac{1}{n} + \tfrac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$ the standard error of the mean and uses the symbol $s_m$ for it. In these terms, the confidence limits for $\mu_{y|x}$ are

$$\hat{y} \pm t\, s_m$$

(Note that by choosing $x = 0$, this formula provides confidence limits for $\beta_0$, though this parameter is rarely of independent practical interest.)

Prediction limits for an additional observation $y$ at a given value $x$ are

$$(b_0 + b_1 x) \pm t\, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

Dielman calls $s\sqrt{1 + \tfrac{1}{n} + \tfrac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$ the standard error of prediction and uses the symbol $s_p$ for it. It is on occasion useful to notice that

$$s_p = \sqrt{s^2 + s_m^2}$$

Using the $s_p$ notation, prediction limits for an additional $y$ at $x$ are

$$\hat{y} \pm t\, s_p$$

Hypothesis Tests and SLR

The normal simple linear regression model supports hypothesis testing. $H_0\!: \beta_1 = \#$ can be tested using the test statistic

$$t = \frac{b_1 - \#}{s_{b_1}} = \frac{b_1 - \#}{s \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
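The two interval formulas differ only by the extra "1" under the square root, and $s_p^2 = s^2 + s_m^2$ ties them together. A sketch in plain Python, again on an invented five-point data set, with the $t$ quantile (3.182, for 3 df at 95%) hardcoded from a $t$ table and the new-observation value $x_0 = 3.5$ chosen arbitrarily:

```python
# Confidence limits for the mean response and prediction limits for a new y
# at x0 = 3.5, on an invented five-point data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
s = (sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)) ** 0.5

x0 = 3.5
yhat = b0 + b1 * x0
t = 3.182                                              # t quantile, 3 df, 95%
s_m = s * (1 / n + (x0 - xbar) ** 2 / sxx) ** 0.5      # standard error of the mean
s_p = s * (1 + 1 / n + (x0 - xbar) ** 2 / sxx) ** 0.5  # standard error of prediction

mean_limits = (yhat - t * s_m, yhat + t * s_m)
pred_limits = (yhat - t * s_p, yhat + t * s_p)
# The prediction interval is always the wider of the two.
```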

and a $t_{n-2}$ reference distribution. $H_0\!: \mu_{y|x} = \#$ can be tested using the test statistic

$$t = \frac{\hat{y} - \#}{s_m} = \frac{(b_0 + b_1 x) - \#}{s\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}}$$

and a $t_{n-2}$ reference distribution.

ANOVA and SLR

The breaking down of $SSTO$ into $SSR$ and $SSE$ can be thought of as a kind of "analysis of variance" in $y$. That enterprise is often summarized in a special kind of table. The general form is as below.

ANOVA Table (for SLR)

Source       SS      df     MS                 F
Regression   SSR     1      MSR = SSR/1        F = MSR/MSE
Error        SSE     n-2    MSE = SSE/(n-2)
Total        SSTO    n-1

In this table the ratios of sums of squares to degrees of freedom are called "mean squares." The mean square for error is, in fact, the estimate of $\sigma^2$ (i.e. $MSE = s^2$). As it turns out, the ratio in the "F" column can be used as a test statistic for the hypothesis $H_0\!: \beta_1 = 0$. The appropriate reference distribution is the $F_{1, n-2}$ distribution. As it turns out, the value $F = MSR/MSE$ is the square of the $t$ statistic for testing this hypothesis, and the $F$ test produces exactly the same $p$-values as a two-sided $t$ test.

Standardized Residuals and SLR

The theoretical variances of the residuals turn out to depend upon their corresponding $x$ values. As a means of putting these residuals all on the same footing, it is common to "standardize" them by dividing by an estimated standard deviation for each. This produces standardized residuals

$$e_i^* = \frac{e_i}{s\sqrt{1 - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}}}$$

These (if the normal simple linear regression model is a good one) "ought" to look as if they are approximately normal with mean 0 and standard deviation 1. Various kinds of plotting with these standardized residuals (or with the raw residuals) are used as means of "model checking" or "model diagnostics."
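The claim that the ANOVA $F$ statistic is the square of the slope $t$ statistic can be verified numerically. A sketch in plain Python on an invented five-point data set:

```python
# SLR ANOVA decomposition and the identity F = t^2 for H0: beta_1 = 0,
# on an invented five-point data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

SSTO = sum((yi - ybar) ** 2 for yi in y)
SSE = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
SSR = SSTO - SSE

MSR = SSR / 1           # regression mean square (1 numerator df in SLR)
MSE = SSE / (n - 2)     # error mean square; equals s^2
F = MSR / MSE           # compare to the F distribution with (1, n-2) df

s = MSE ** 0.5
t_stat = b1 / (s / sxx ** 0.5)  # t statistic for H0: beta_1 = 0
# F agrees with t_stat squared (up to floating point error).
```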

Multiple Linear Regression Model

The basic (normal) "multiple linear regression" model says that a response/output variable $y$ depends on explanatory/input/system variables $x_1, \ldots, x_k$ in a "noisy but linear" way. That is, one supposes that there is a linear relationship between $x_1, \ldots, x_k$ and mean $y$,

$$\mu_{y|x_1, \ldots, x_k} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

and that (for fixed $x_1, \ldots, x_k$) there is around that mean a distribution of $y$ that is normal. Further, the standard assumption is that the standard deviation of the response distribution is constant in $x_1, \ldots, x_k$. In symbols it is standard to write

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$$

where $\epsilon$ is normal with mean 0 and standard deviation $\sigma$. This describes one $y$. Where several observations $y_i$ with corresponding values $x_{1i}, \ldots, x_{ki}$ are under consideration, the assumption is that the $y_i$ (the $\epsilon_i$) are independent. (The $\epsilon_i$ are conceptually equivalent to unrelated random draws from the same fixed normal continuous distribution.) The model statement in its full glory is then

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \epsilon_i \quad \text{for } i = 1, \ldots, n$$

where the $\epsilon_i$ are independent normal random variables with mean 0 and standard deviation $\sigma$.

The model statement above is a perfectly theoretical matter. One can begin with it, and for specific choices of $\beta_0, \beta_1, \ldots, \beta_k$ and $\sigma$, find probabilities for $y$ at given values of $x_1, \ldots, x_k$. In applications, the real mode of operation is instead to take data vectors $(x_{11}, x_{21}, \ldots, x_{k1}, y_1), \ldots, (x_{1n}, x_{2n}, \ldots, x_{kn}, y_n)$ and use them to make inferences about the parameters $\beta_0, \beta_1, \ldots, \beta_k$ and $\sigma$, and to make predictions based on the estimates (based on the empirically fitted model).

Descriptive Analysis of Approximately Linear $(x_1, \ldots, x_k, y)$ Data

Calculus can be invoked to find coefficients $\beta_0, \beta_1, \ldots, \beta_k$ minimizing the sum of squared vertical distances from data points in $(k+1)$-dimensional space to a fitted surface, $\sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}))^2$. These "least squares" values DO NOT have simple formulas (unless one is willing to use matrix notation). And in particular, one can NOT simply somehow use the formulas from simple linear regression in this more complicated context. We will call these minimizing coefficients $b_0, b_1, \ldots, b_k$ and need to rely upon JMP to produce them for us.

It is further common to refer to the value of $y$ on the "least squares surface" corresponding to $x_1, \ldots, x_k$ as a fitted or predicted value

$$\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki}$$

Exactly as in SLR, one takes the difference between what is observed ($y_i$) and what is "predicted" or "explained" ($\hat{y}_i$) as a kind of leftover part or "residual" corresponding to a data value,

$$e_i = y_i - \hat{y}_i$$

The total sum of squares, $SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2$, is (still) most of the sample variance of the values $y_i$ and measures raw variation in the response variable. Just as in SLR, the sum of squared residuals is a measure of variation in response remaining unaccounted for after fitting the equation to the data. As in SLR, people call it the error sum of squares and write

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

(The formula looks exactly like the one for SLR. It is simply the case that now $\hat{y}_i$ is computed using all $k$ inputs, not just a single $x$.) One is still guaranteed that $SSTO \geq SSE$. So the difference $SSTO - SSE$ is a non-negative measure of variation accounted for in fitting the linear equation to the data. As in SLR, people call it the regression sum of squares and write

$$SSR = SSTO - SSE$$

The coefficient of (multiple) determination expresses $SSR$ as a fraction of $SSTO$ and is

$$R^2 = \frac{SSR}{SSTO}$$

which is interpreted as "the fraction of raw variation in $y$ accounted for in the model fitting process." This quantity can also be interpreted in terms of a correlation, as it turns out to be the square of the sample linear correlation between the observations $y_i$ and the fitted or predicted values $\hat{y}_i$.

Parameter Estimates for MLR

The descriptive statistics for $(x_1, \ldots, x_k, y)$ data can be used to provide "single number estimates" of the (typically unknown) parameters of the multiple linear regression model. That is, the least squares coefficients $b_0, b_1, \ldots, b_k$ serve as estimates of the parameters $\beta_0, \beta_1, \ldots, \beta_k$. The first of these is a kind of high-dimensional "intercept"
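Although the MLR least squares coefficients have no simple scalar formulas, they are easy to obtain in matrix notation. A sketch using numpy (standing in for the JMP computations the summary relies on), with a small invented two-predictor data set, which also checks the correlation interpretation of $R^2$:

```python
# MLR least squares in matrix notation on invented two-predictor data.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.0])
n = len(y)

# Design matrix with a leading column of 1's for the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # b = (b0, b1, b2)

yhat = X @ b                                # fitted values on the LS surface
e = y - yhat                                # residuals
SSTO = float(np.sum((y - y.mean()) ** 2))
SSE = float(np.sum(e ** 2))
SSR = SSTO - SSE
R2 = SSR / SSTO

# R^2 equals the squared sample correlation between y and yhat.
r = float(np.corrcoef(y, yhat)[0, 1])
```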

and in the case where the predictors are not functionally related, the others serve as rates of change of average $y$ with respect to a single $x$, provided the other $x$'s are held fixed.

The variance of $y$ for fixed values $x_1, \ldots, x_k$ can be estimated by a kind of average of squared residuals,

$$s^2 = \frac{SSE}{n - k - 1} = \frac{1}{n - k - 1} \sum_{i=1}^{n} e_i^2$$

The square root of this "regression sample variance" is $s = \sqrt{s^2}$, and it serves as a single number estimate of $\sigma$.

Interval-Based Inference Methods for MLR

The normal multiple linear regression model provides inference formulas for model parameters. Confidence limits for $\beta_j$ (the rate of change of average $y$ with respect to $x_j$) are

$$b_j \pm t\, s_{b_j}$$

where $t$ is a quantile of the $t$ distribution with $\nu = n - k - 1$ degrees of freedom and $s_{b_j}$ is a standard error of $b_j$. There is no simple formula for $s_{b_j}$, and in particular, one can NOT simply somehow use the formula from simple linear regression in this more complicated context. It IS the case that this standard error is a multiple of $s$, but we will have to rely upon JMP to provide it for us.

Confidence limits for $\mu_{y|x_1, \ldots, x_k} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$ (the mean value of $y$ at a particular choice of the $x$'s) are

$$\hat{y} \pm t\, s_m$$

where $s_m$ is a standard error for $\hat{y}$ and depends upon which values of $x_1, \ldots, x_k$ are under consideration. There is no simple formula for $s_m$, and in particular, one can NOT simply somehow use the formula from simple linear regression in this more complicated context. It IS the case that this standard error is a multiple of $s$, but we will have to rely upon JMP to provide it for us.

Prediction limits for an additional observation $y$ at a given vector $(x_1, \ldots, x_k)$ are

$$\hat{y} \pm t\, s_p$$

where $s_p$ is a standard error of prediction. It remains true (as in SLR) that

$$s_p = \sqrt{s^2 + s_m^2}$$

but otherwise there are no simple formulas for it, and in particular, one can NOT simply somehow use the formula from simple linear regression in this more complicated context.
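The matrix fact behind the "multiple of $s$" remark is that $s_{b_j} = s\sqrt{[(X'X)^{-1}]_{jj}}$, the standard linear-model result a package like JMP computes internally. A numpy sketch on invented two-predictor data:

```python
# Standard errors of MLR coefficients via (X'X)^{-1}, on invented data.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.0])
n, k = len(y), 2

X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                       # least squares coefficients (b0, b1, b2)

SSE = float(np.sum((y - X @ b) ** 2))
s = (SSE / (n - k - 1)) ** 0.5              # estimate of sigma on n-k-1 df

# Each standard error is s times a data-dependent constant, as the summary notes.
se_b = s * np.sqrt(np.diag(XtX_inv))        # (s_{b0}, s_{b1}, s_{b2})
t_stats = b / se_b                          # t statistics for H0: beta_j = 0
```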

Hypothesis Tests and MLR

The normal multiple linear regression model supports hypothesis testing. $H_0\!: \beta_j = \#$ can be tested using the test statistic

$$t = \frac{b_j - \#}{s_{b_j}}$$

and a $t_{n-k-1}$ reference distribution. $H_0\!: \mu_{y|x_1, \ldots, x_k} = \#$ can be tested using the test statistic

$$t = \frac{\hat{y} - \#}{s_m}$$

and a $t_{n-k-1}$ reference distribution.

ANOVA and MLR

As in SLR, the breaking down of $SSTO$ into $SSR$ and $SSE$ can be thought of as a kind of "analysis of variance" in $y$, and summarized in a special kind of table. The general form for MLR is as below.

ANOVA Table (for MLR Overall F Test)

Source       SS      df       MS                   F
Regression   SSR     k        MSR = SSR/k          F = MSR/MSE
Error        SSE     n-k-1    MSE = SSE/(n-k-1)
Total        SSTO    n-1

(Note that as in SLR, the mean square for error is, in fact, the estimate of $\sigma^2$, i.e. $MSE = s^2$.) As it turns out, the ratio in the "F" column can be used as a test statistic for the hypothesis $H_0\!: \beta_1 = \beta_2 = \cdots = \beta_k = 0$. The appropriate reference distribution is the $F_{k, n-k-1}$ distribution.

"Partial F Tests" in MLR

It is possible to use ANOVA ideas to invent F tests for investigating whether some whole group of $\beta$'s (short of the entire set) are all 0. For example, one might want to test the hypothesis

$$H_0\!: \beta_{l+1} = \beta_{l+2} = \cdots = \beta_k = 0$$

(This is the hypothesis that only the first $l$ of the $k$ input variables $x$ have any impact on the mean system response... the hypothesis that the first $l$ of the $x$'s are adequate to predict $y$... the hypothesis that after accounting for the first $l$ of the $x$'s, the others do not contribute "significantly" to one's ability to explain or predict $y$.) If we call the model for $y$ in terms of all $k$ of the predictors the "full model" and the model for $y$ involving only $x_1$ through $x_l$ the "reduced model," then an $F$ test of the above hypothesis can be made using the statistic

$$F = \frac{(SSR_{\text{full}} - SSR_{\text{reduced}})/(k - l)}{MSE_{\text{full}}}$$

and an $F_{k-l,\, n-k-1}$ reference distribution. $SSR_{\text{full}} \geq SSR_{\text{reduced}}$, so the numerator here is non-negative. Finding a $p$-value for this kind of test is a means of judging whether $R^2$ for the full model is "significantly"/detectably larger than $R^2$ for the reduced model. (Caution here: statistical significance is not the same as practical importance. With a big enough data set, essentially any increase in $R^2$ will produce a small $p$-value.)

It is reasonably common to expand the basic MLR ANOVA table to organize calculations for this test statistic. This is the

(Expanded) ANOVA Table (for MLR)

Source                                 SS                    df       MS                             F
Regression                             SSR(full)             k        SSR(full)/k                    MSR(full)/MSE(full)
  x_1, ..., x_l                        SSR(red)              l
  x_{l+1}, ..., x_k | x_1, ..., x_l    SSR(full)-SSR(red)    k-l      (SSR(full)-SSR(red))/(k-l)     ((SSR(full)-SSR(red))/(k-l))/MSE(full)
Error                                  SSE(full)             n-k-1    MSE(full) = SSE(full)/(n-k-1)
Total                                  SSTO                  n-1

Standardized Residuals in MLR

As in SLR, people sometimes wish to standardize residuals before using them to do model checking/diagnostics. While it is not possible to give a simple formula for the "standard error of $e_i$" without using matrix notation, most MLR programs will compute these values. The standardized residual for data point $i$ is then (as in SLR)

$$e_i^* = \frac{e_i}{\text{standard error of } e_i}$$

If the normal multiple linear regression model is a good one, these "ought" to look as if they are approximately normal with mean 0 and standard deviation 1.
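The full-versus-reduced comparison behind the partial $F$ statistic can be sketched by fitting both models and differencing their regression sums of squares. A numpy illustration on invented data with $k = 2$ and $l = 1$ (testing whether $x_2$ adds anything beyond $x_1$):

```python
# Partial F test sketch: full model (x1, x2) vs. reduced model (x1 only),
# on invented data, so k = 2 and l = 1.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.0])
n, k, l = len(y), 2, 1

def fit_sse_ssr(X, y):
    """Least squares fit; return (SSE, SSR) for the given design matrix."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    SSTO = float(np.sum((y - y.mean()) ** 2))
    SSE = float(np.sum(e ** 2))
    return SSE, SSTO - SSE

X_full = np.column_stack([np.ones(n), x1, x2])
X_red  = np.column_stack([np.ones(n), x1])

SSE_full, SSR_full = fit_sse_ssr(X_full, y)
SSE_red,  SSR_red  = fit_sse_ssr(X_red, y)

MSE_full = SSE_full / (n - k - 1)
# Compare F to the F distribution with (k-l, n-k-1) degrees of freedom.
F = (SSR_full - SSR_red) / (k - l) / MSE_full
```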

Intervals and Tests for Linear Combinations of $\beta$'s in MLR

It is sometimes important to do inference for a linear combination of MLR model coefficients,

$$L = c_0 \beta_0 + c_1 \beta_1 + c_2 \beta_2 + \cdots + c_k \beta_k$$

(where $c_0, c_1, \ldots, c_k$ are known constants). Note, for example, that $\mu_{y|x_1, \ldots, x_k}$ is of this form for $c_0 = 1$, $c_1 = x_1$, $c_2 = x_2$, ..., and $c_k = x_k$. Note too that a difference in mean responses at two sets of predictors, say $(x_1^{(1)}, \ldots, x_k^{(1)})$ and $(x_1^{(2)}, \ldots, x_k^{(2)})$, is of this form for $c_0 = 0$, $c_1 = x_1^{(1)} - x_1^{(2)}$, $c_2 = x_2^{(1)} - x_2^{(2)}$, ..., and $c_k = x_k^{(1)} - x_k^{(2)}$.

An obvious estimate of $L$ is

$$\hat{L} = c_0 b_0 + c_1 b_1 + c_2 b_2 + \cdots + c_k b_k$$

Confidence limits for $L$ are

$$\hat{L} \pm t\, (\text{standard error of } \hat{L})$$

There is no simple formula for "standard error of $\hat{L}$." This standard error is a multiple of $s$, but we will have to rely upon JMP to provide it for us. (Computation of $\hat{L}$ and its standard error is under the "Custom Test" option in JMP.)

$H_0\!: L = \#$ can be tested using the test statistic

$$t = \frac{\hat{L} - \#}{\text{standard error of } \hat{L}}$$

and a $t_{n-k-1}$ reference distribution. Or, if one thinks about it for a while, it is possible to find a reduced model that corresponds to the restriction that the null hypothesis places on the MLR model and to use an "$l = k - 1$" partial $F$ test (with 1 and $n - k - 1$ degrees of freedom) equivalent to the $t$ test for this purpose.
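In matrix notation the standard error of $\hat{L}$ is $s\sqrt{c'(X'X)^{-1}c}$, which is the quantity a package's custom-test facility reports. A numpy sketch on invented two-predictor data, with $c$ chosen (arbitrarily, for illustration) so that $L$ is the mean response at $(x_1, x_2) = (3.0, 1.0)$:

```python
# Estimating a linear combination of MLR coefficients and its standard error,
# on invented two-predictor data.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.0])
n, k = len(y), 2

X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
s = (float(np.sum((y - X @ b) ** 2)) / (n - k - 1)) ** 0.5

# c0 = 1, c1 = 3, c2 = 1 makes L the mean response at (x1, x2) = (3, 1).
c = np.array([1.0, 3.0, 1.0])
L_hat = float(c @ b)                          # estimate of L
se_L = s * float(c @ XtX_inv @ c) ** 0.5      # standard error, a multiple of s

t_stat = L_hat / se_L   # test statistic for H0: L = 0, t with n-k-1 df
```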


More information

Lecture 19: Inference for SLR & Transformations

Lecture 19: Inference for SLR & Transformations Lecture 19: Inference for SLR & Transformations Statistics 101 Mine Çetinkaya-Rundel April 3, 2012 Announcements Announcements HW 7 due Thursday. Correlation guessing game - ends on April 12 at noon. Winner

More information

Ch. 1: Data and Distributions

Ch. 1: Data and Distributions Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com 12 Simple Linear Regression Material from Devore s book (Ed 8), and Cengagebrain.com The Simple Linear Regression Model The simplest deterministic mathematical relationship between two variables x and

More information

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company Multiple Regression Inference for Multiple Regression and A Case Study IPS Chapters 11.1 and 11.2 2009 W.H. Freeman and Company Objectives (IPS Chapters 11.1 and 11.2) Multiple regression Data for multiple

More information

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i B. Weaver (24-Mar-2005) Multiple Regression... 1 Chapter 5: Multiple Regression 5.1 Partial and semi-partial correlation Before starting on multiple regression per se, we need to consider the concepts

More information

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression. 10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

Section 9.4. Notation. Requirements. Definition. Inferences About Two Means (Matched Pairs) Examples

Section 9.4. Notation. Requirements. Definition. Inferences About Two Means (Matched Pairs) Examples Objective Section 9.4 Inferences About Two Means (Matched Pairs) Compare of two matched-paired means using two samples from each population. Hypothesis Tests and Confidence Intervals of two dependent means

More information

Data Set 1A: Algal Photosynthesis vs. Salinity and Temperature

Data Set 1A: Algal Photosynthesis vs. Salinity and Temperature Data Set A: Algal Photosynthesis vs. Salinity and Temperature Statistical setting These data are from a controlled experiment in which two quantitative variables were manipulated, to determine their effects

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals

Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals 4 December 2018 1 The Simple Linear Regression Model with Normal Residuals In previous class sessions,

More information

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow) STAT40 Midterm Exam University of Illinois Urbana-Champaign October 19 (Friday), 018 3:00 4:15p SOLUTIONS (Yellow) Question 1 (15 points) (10 points) 3 (50 points) extra ( points) Total (77 points) Points

More information

Statistics for Engineers Lecture 9 Linear Regression

Statistics for Engineers Lecture 9 Linear Regression Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April

More information

Intro to Linear Regression

Intro to Linear Regression Intro to Linear Regression Introduction to Regression Regression is a statistical procedure for modeling the relationship among variables to predict the value of a dependent variable from one or more predictor

More information

Inference in Regression Analysis

Inference in Regression Analysis Inference in Regression Analysis Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 4, Slide 1 Today: Normal Error Regression Model Y i = β 0 + β 1 X i + ǫ i Y i value

More information

CHAPTER EIGHT Linear Regression

CHAPTER EIGHT Linear Regression 7 CHAPTER EIGHT Linear Regression 8. Scatter Diagram Example 8. A chemical engineer is investigating the effect of process operating temperature ( x ) on product yield ( y ). The study results in the following

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression OI CHAPTER 7 Important Concepts Correlation (r or R) and Coefficient of determination (R 2 ) Interpreting y-intercept and slope coefficients Inference (hypothesis testing and confidence

More information

Chapter 27 Summary Inferences for Regression

Chapter 27 Summary Inferences for Regression Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test

More information

Statistics II Exercises Chapter 5

Statistics II Exercises Chapter 5 Statistics II Exercises Chapter 5 1. Consider the four datasets provided in the transparencies for Chapter 5 (section 5.1) (a) Check that all four datasets generate exactly the same LS linear regression

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 28. SIMPLE LINEAR REGRESSION III Fitted Values and Residuals To each observed x i, there corresponds a y-value on the fitted line, y = βˆ + βˆ x. The are called fitted values. ŷ i They are the values of

More information

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i

More information

Stat 401B Exam 2 Fall 2015

Stat 401B Exam 2 Fall 2015 Stat 401B Exam Fall 015 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning

More information

Hypothesis Testing. We normally talk about two types of hypothesis: the null hypothesis and the research or alternative hypothesis.

Hypothesis Testing. We normally talk about two types of hypothesis: the null hypothesis and the research or alternative hypothesis. Hypothesis Testing Today, we are going to begin talking about the idea of hypothesis testing how we can use statistics to show that our causal models are valid or invalid. We normally talk about two types

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

Correlation and Linear Regression

Correlation and Linear Regression Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species Lecture notes 2/22/2000 Dummy variables and extra SS F-test Page 1 Crab claw size and closing force. Problem 7.25, 10.9, and 10.10 Regression for all species at once, i.e., include dummy variables for

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Dr. Bob Gee Dean Scott Bonney Professor William G. Journigan American Meridian University 1 Learning Objectives Upon successful completion of this module, the student should

More information

df=degrees of freedom = n - 1

df=degrees of freedom = n - 1 One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:

More information

III. Inferential Tools

III. Inferential Tools III. Inferential Tools A. Introduction to Bat Echolocation Data (10.1.1) 1. Q: Do echolocating bats expend more enery than non-echolocating bats and birds, after accounting for mass? 2. Strategy: (i) Explore

More information

The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January

More information

Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT. Charlotte Wickham. stat511.cwick.co.nz. Nov

Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT. Charlotte Wickham. stat511.cwick.co.nz. Nov Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT Nov 20 2015 Charlotte Wickham stat511.cwick.co.nz Quiz #4 This weekend, don t forget. Usual format Assumptions Display 7.5 p. 180 The ideal normal, simple

More information

1 Introduction to Minitab

1 Introduction to Minitab 1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you

More information

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

More information

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Explained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators Thilo Klein University of Cambridge Judge Business School Session 4: Linear regression,

More information

Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January

More information