Advanced Regression Topics: Violation of Assumptions


1 Advanced Regression Topics: Violation of Assumptions. Lecture 7, February 15, 2005. Applied Regression Analysis.

2 Today's Lecture. Revisiting residuals: outliers aside, what other things are important to look for? Nonconstant variance, nonlinearity, and nonnormality of the residuals.

3 Snow Geese. From Weisberg (1985, p. 102): Aerial survey methods are regularly used to estimate the number of snow geese in their summer range areas west of Hudson Bay in Canada. To obtain estimates, small aircraft fly over the range and, when a flock of geese is spotted, an experienced person estimates the number of geese in the flock. To investigate the reliability of this method of counting, an experiment was conducted in which an airplane carrying two observers flew over 45 flocks, and each observer made an independent estimate of the number of birds in each flock. Also, a photograph of the flock was taken so that an exact count of the number of birds in the flock could be made (data by Cook and Jacobsen, 1978).

4 Snow Geese.

5 Hudson Bay.

6 Regression Analysis. Using the first observer in the plane, we consider the relationship between this person's count and the count from the photograph. [Scatterplot: photo count against observer 1 count.]

7 Regression Analysis. One way of analyzing these data is to fit a regression that attempts to predict the count in the photo from the count by the observer. Using SPSS, this regression was estimated, giving estimates, standard errors, t statistics, and p-values for the intercept (a) and slope (b), with the slope highly significant (p < .001), along with SS_reg ≈ 254,000, SS_res ≈ 84,000 (so R² ≈ .75).

8 Assumptions. But, remember, before we can interpret these results, we must first check our assumptions. Assumptions of regression analyses revolve around the residuals, e = Y − Ŷ = Y − (a + bX). In particular, we specified that all residuals were: independent (uncorrelated); identically distributed; and normally distributed with a zero mean and a constant variance, e ~ N(0, σ_e²).

9 Residual Plot. From a previous lecture, recall that an easy way to check assumptions is to look at a plot of the standardized residuals against the unstandardized predicted values. [Residual plot: standardized residual vs. unstandardized predicted value.] Do you notice any problems from this plot?
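
As a minimal sketch of producing the same kind of residual plot outside SPSS (the lecture itself uses the SPSS menus), the Python code below fits a simple regression and plots standardized residuals against predicted values. The simulated obs1 and photo values only stand in for the actual geese data.

```python
# Sketch: residual plot (standardized residuals vs. unstandardized predicted values).
# The data below are simulated stand-ins for the geese counts, not the real data.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
obs1 = rng.uniform(20, 500, 45)                       # observer 1 counts (45 flocks)
photo = 5 + 0.9 * obs1 + rng.normal(0, 0.15 * obs1)   # photo counts, spread grows with flock size

fit = sm.OLS(photo, sm.add_constant(obs1)).fit()
pred = fit.fittedvalues                               # unstandardized predicted values
std_resid = fit.resid / np.sqrt(fit.mse_resid)        # residuals scaled by the root MSE

plt.scatter(pred, std_resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Unstandardized predicted value")
plt.ylabel("Standardized residual")
plt.show()
```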

10 Nonconstant Variance. One of the primary assumptions in a linear regression is that var(e_i) = σ_e² for all i = 1, ..., N observations. [Residual plot: standardized residual vs. unstandardized predicted value.] In our example, this assumption is clearly violated.

11 Detection. Detecting nonconstant variance is often accomplished by examining the residual plot. However, visual inspection can lead to some problems: interpretation is subjective, relying on experience, and how much is too much? Nonconstant variance is really a matter of degree. For this reason, one can construct a statistical hypothesis test for the constancy of variance.

12 Detection Test. Nonconstancy of var(e_i) can be caused by: the response Y; the predictors X; or some other quantity not involved in the regression, such as observations over time or observations related by space (spatial orientation). Any (or all) of these features can be put into a matrix Z, so that each observation i has a row vector z_i.

13 Detection Test. Given our suspected cause of nonconstancy of variance for each observation z_i, we can assume: var(e_i) = σ_e² exp(λ′z_i). This is a very technical way of making the variance a function of other variables. This form places the following constraints on the variance: 1. var(e_i) > 0 for all observations z_i. 2. The variance depends on z_i and λ only through the linear function λ′z_i. 3. var(e_i) is monotonic (either increasing or decreasing) in each component of z_i. 4. If λ = 0, then var(e_i) = σ_e² for all i.

14 Detection Test: Steps. STEP 0: Determine what is causing the nonconstant variance. Let's go back to our geese data example, and assume the nonconstancy in variance was due to the predictor: humans have a more difficult time estimating the number of geese consistently as the number observed gets very large. Because we feel that the nonconstancy in variability is caused by our predictor variables, we will construct Z from X. Note that X has a column vector of ones for the intercept.

15 Detection Test: Steps. 1. Estimate the regression line for the original model (Y = a + bX), and save the unstandardized residuals for each observation: ê_i = Y_i − (â + b̂X_i). 2. For each observation, compute the scaled squared residual u_i = ê_i² / σ̂_e², where σ̂_e² is the ML estimate of σ_e² (differing from MS_error because the denominator is N rather than N − k − 1): σ̂_e² = (Σ_{i=1}^N ê_i²) / N.

16 Detection Test: Steps. 3. Compute the regression of u_i onto z_i. Obtain from this regression SS_reg, and obtain df_reg, the number of predictors in Z (not including the intercept). 4. Compute the Score statistic (using SS_reg from step 3): S = SS_reg / 2. 5. Test the hypothesis that λ = 0 by obtaining a p-value for S, which is distributed χ²(df_reg), where df_reg is from step 3.
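
The five steps can also be scripted directly. Below is a minimal sketch in Python on simulated data (the lecture carries the steps out in SPSS); the variable names and simulated counts are illustrative only.

```python
# Sketch: the score test for nonconstant variance, following steps 1-5 above.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
z = rng.uniform(20, 500, 45)                      # Z built from the predictor (observer count)
y = 5 + 0.9 * z + rng.normal(0, 0.15 * z)         # response whose variance grows with the predictor

# Step 1: original regression, save unstandardized residuals.
ehat = sm.OLS(y, sm.add_constant(z)).fit().resid

# Step 2: scaled squared residuals using the ML variance estimate (denominator N).
N = len(y)
sigma2_ml = np.sum(ehat**2) / N
u = ehat**2 / sigma2_ml

# Step 3: regress u on Z; keep SS_reg and df_reg.
aux = sm.OLS(u, sm.add_constant(z)).fit()
ss_reg, df_reg = aux.ess, aux.df_model            # explained SS and number of predictors in Z

# Steps 4-5: score statistic and chi-square p-value.
S = ss_reg / 2.0
p = stats.chi2.sf(S, df_reg)
print(S, df_reg, p)
```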

17 From Our Example. 1. Original regression estimates found from SPSS: Analyze...Regression...Linear; save the unstandardized residuals from the Save button menu. 2. For each observation, compute the scaled squared residuals u_i = ê_i² / σ̂_e². Compute σ̂_e² = (Σ_{i=1}^N ê_i²) / N. In SPSS: Transform...Compute, and make a new variable that is the squared value of the unstandardized residual; then find the average of the new variable from Analyze...Descriptive Statistics...Descriptives. Alternative: take SS_res from the step 1 output and divide by N. Here σ̂_e² = 1,…. Compute u_i in SPSS by going to Transform...Compute.

18 From Our Example. 3. Compute the regression of u_i onto z_i. In SPSS: Analyze...Regression...Linear. 4. Compute the Score statistic S = SS_reg / 2, using SS_reg from step 3 (taken from the SPSS output): S = 81.41, with df_reg = 1. 5. Get the p-value for S. In Excel, type =chidist(81.41,1), which gives p < .001. Based on these results, we reject the null hypothesis of constant variance, and conclude that our example violates the constant variance assumption.
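
For anyone without Excel handy, the same tail probability is a one-liner in Python; the 81.41 and 1 degree of freedom come from the slide. A packaged version of a closely related score test is also available as statsmodels.stats.diagnostic.het_breuschpagan.

```python
# Sketch: the "=chidist(81.41, 1)" step done with scipy.
from scipy import stats

S, df_reg = 81.41, 1                 # score statistic and df from the slide
print(stats.chi2.sf(S, df_reg))      # upper-tail p-value; far smaller than .001
```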

19 ...Now What? We found in our example that we have statistical evidence for nonconstant variance. The biggest consequence of nonconstant variance is that our regression line does not accurately represent all cases in our sample. Also a problem: the hypothesis tests we use are based on the assumption that e_i ~ N(0, σ_e²). When nonconstant variance is found, two options are possible: 1. Estimate the regression using alternate methods (weighted least squares; median regression). 2. Transform either the response or the predictors.

20 Remedy #1: Alternate Estimation Algorithms. Much like the mean, which is extremely sensitive to highly skewed (outlying) observations, least squares estimates have a very low breakdown point: a few extreme observations can move them substantially. Instead of finding regression parameters that minimize Σ_{i=1}^N (Y_i − Ŷ_i)², alternative optimization criteria exist.

21 Alternate Estimation Algorithms. Two possible alternatives: Weighted Least Squares (WLS), which minimizes Σ_{i=1}^N w_i (Y_i − Ŷ_i)² and can be performed in SPSS; and minimum absolute deviation, which minimizes Σ_{i=1}^N |Y_i − Ŷ_i| and is much more technical (the simplex optimization method involves linear programming).
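
Both alternatives are available in Python as well as SPSS; a minimal sketch is below. The 1/X² weights are just one plausible choice when the spread grows with the predictor, not a choice made in the lecture, and the data are simulated.

```python
# Sketch: weighted least squares and median (least-absolute-deviation) regression.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
x = rng.uniform(20, 500, 45)
y = 5 + 0.9 * x + rng.normal(0, 0.15 * x)

# WLS: downweight observations expected to have large variance (illustrative weights).
wls_fit = sm.WLS(y, sm.add_constant(x), weights=1.0 / x**2).fit()

# Median regression (the 0.5 quantile) minimizes the sum of absolute deviations.
lad_fit = smf.quantreg("y ~ x", pd.DataFrame({"x": x, "y": y})).fit(q=0.5)

print(wls_fit.params, lad_fit.params)
```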

22 Remedy #2: Variance Stabilizing Transformations. The second (and perhaps most commonly used) remedy for nonconstant variance is to transform the response variable Y. From Weisberg (1985, p. 134):
Transformation | Reason | Situation
√Y | var(e_i) ∝ E(Y_i) | Poisson counts
√Y + √(Y + 1) | var(e_i) ∝ E(Y_i) | Poisson counts, small Y values
ln(Y) | var(e_i) ∝ [E(Y_i)]² | broad range of Y
ln(Y + 1) | var(e_i) ∝ [E(Y_i)]² | some Y are zero
1/Y | var(e_i) ∝ [E(Y_i)]⁴ | Y bunched near zero
1/(Y + 1) | var(e_i) ∝ [E(Y_i)]⁴ | some Y are zero
sin⁻¹(√Y) | var(e_i) ∝ E(Y_i)[1 − E(Y_i)] | binomial proportions
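
As a minimal sketch of putting the table to use, the code below refits a simulated count response after the √Y and ln(Y + 1) transformations; which row of the table applies in practice depends on how the residual variance grows with the mean.

```python
# Sketch: refit after a variance-stabilizing transformation of the response.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(10, 200, 60)
y = rng.poisson(lam=0.8 * x)                # Poisson-like counts: variance grows with the mean

X = sm.add_constant(x)
fit_sqrt = sm.OLS(np.sqrt(y), X).fit()      # sqrt(Y): suited to var(e) proportional to E(Y)
fit_log = sm.OLS(np.log(y + 1), X).fit()    # ln(Y + 1): handles the occasional zero count

print(fit_sqrt.params, fit_log.params)
```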

23 Nonlinear Relationship Between Predictor(s) and Y. Situations occur where a nonlinear relationship between the predictors and Y is present. For example, imagine that the true relationship between Y and X is something like Y = aX^b. To use the linear regression techniques we are well aware of, this function must be transformed (both Y and X): ln(Y) = ln(a) + b·ln(X).
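
A minimal sketch of this linearization on simulated data: regress ln(Y) on ln(X), read b off the slope, and recover a by exponentiating the intercept.

```python
# Sketch: fit Y = a * X**b by regressing ln(Y) on ln(X) (multiplicative error assumed).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(1, 50, 100)
y = 2.5 * x**1.7 * np.exp(rng.normal(0, 0.1, 100))   # simulated power-law data

loglog = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()
a_hat = np.exp(loglog.params[0])   # intercept on the log scale is ln(a)
b_hat = loglog.params[1]           # slope on the log scale is the exponent b
print(a_hat, b_hat)
```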

24 Nonlinear Relationship Between Predictor(s) and Y. Depending on the situation, not all functions can be made linear, for example: Y = a₁e^{b₁X₁} + a₂e^{b₂X₂}. Furthermore, depending on how the error enters (multiplicatively or additively), transformations will not always lead to errors that satisfy the distributional assumptions of linear regression. Linear regression can only go so far, so if the data follow a truly nonlinear functional relationship, other methods may be better suited.

25 Detection of Nonlinearity. Often, nonlinearity is detected visually, through use of the residual plots. [Residual plot: standardized residual vs. unstandardized predicted value.]

26 Possible Remedy for Nonlinearity. As discussed, a possible remedy for nonlinearity is to use a transformation of both Y and X. From Weisberg (1985, p. 142):
Y transformation | X transformation | Implied form
ln(Y) | ln(X) | Y = a·X₁^{b₁}·X₂^{b₂}·…·X_k^{b_k}
ln(Y) | X | Y = a·e^{b₁X₁ + b₂X₂ + … + b_kX_k}
Y | ln(X) | Y = a + b₁ln(X₁) + b₂ln(X₂) + … + b_k ln(X_k)
1/Y | 1/X | Y = 1 / (a + b₁/X₁ + b₂/X₂ + … + b_k/X_k)
1/Y | X | Y = 1 / (a + b₁X₁ + b₂X₂ + … + b_kX_k)
Y | 1/X | Y = a + b₁/X₁ + b₂/X₂ + … + b_k/X_k

27 Parameterizing Transformations. Instead of choosing some type of transformation function seemingly arbitrarily, statistical techniques have been developed to transform both Y and the set of all predictors X based on known functions. These techniques bear mention because from time to time you will encounter estimates based on these. Furthermore, a clear functional relationship between Y and X may not be known, either from substantive theory or empirical results. In these situations, linear methods are often easier to rely upon because of parsimony (real or perceived).

28 Transformation of Y: Optimization. Consider the family of regression models (power models): Y^λ = Xb + e. Finding λ gives an idea of the power relationship between the response and the predictor variables. Such transformations are often referred to as Box-Cox transformations, and they involve iterative techniques to find the most likely value of λ. Another transformation method is Atkinson's score method.
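
scipy includes a Box-Cox routine that picks λ by maximum likelihood; below is a minimal sketch on simulated data (the response must be strictly positive).

```python
# Sketch: estimate the Box-Cox lambda for Y, then refit on the transformed scale.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
x = rng.uniform(1, 100, 80)
y = (5 + 0.4 * x + rng.normal(0, 1, 80))**2    # a positive response that wants a power transform

y_trans, lam = stats.boxcox(y)                 # maximum-likelihood estimate of lambda
fit = sm.OLS(y_trans, sm.add_constant(x)).fit()
print(lam, fit.params)
```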

29 Transformation of X: Optimization. Similar methods have been developed for transforming X. Transforming X is inherently more difficult: oftentimes absurd results can be the outcome, and many times non-significant relationships obscure any needed transformations.

30 Nonnormality of Residuals. The last assumption to be checked is the normality of the residuals. Detecting nonnormality can be tricky, and detection often depends on sample size. Statistically speaking, there is no hypothesis test that can conclude that a variable is normally distributed. Violations of this assumption lead to inaccurate p-values in hypothesis tests (only); the p-values, however, are fairly robust to violations of this assumption.

31 Detection of Nonnormality: Probability Plots. The easiest way to detect nonnormal errors is to use a Q-Q plot: a plot of the ordered values of a variable against their expected values for a sample of size N from a normal distribution. In SPSS: Graphs...Q-Q (check Standardize values and be sure Test distribution is Normal). The points should fall on the line produced on the plot; if not, the data are not from a normal distribution.
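
Outside SPSS, the same plot is one call in Python; a minimal sketch, with simulated values standing in for the saved residuals.

```python
# Sketch: normal Q-Q plot (ordered residuals vs. expected normal quantiles).
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
resid = rng.normal(0, 1, 45)                   # stand-in for the saved regression residuals

stats.probplot(resid, dist="norm", plot=plt)   # points should fall close to the reference line
plt.show()
```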

32 Detection of Nonnormality: Probability Plots. [Q-Q plot of the residuals.]

33 Detection of Nonnormality: Hypothesis Tests. Additionally, statistical hypothesis tests have been developed to test the null hypothesis that the data (in this case the residuals) came from a normal distribution: the Shapiro-Wilk test and the Kolmogorov-Smirnov test. In SPSS: get the unstandardized residuals and go to Analyze...Descriptive Statistics...Explore; put the residuals in the Dependent List box; click on Plots and check "Normality plots with tests". If the p-value is less than some α, then reject the null hypothesis: the residuals are not from a normal distribution. These tests have little validity in small samples.
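
Both tests are also available in scipy; a minimal sketch follows, with simulated values standing in for the saved residuals. (Estimating the mean and SD from the data makes the plain Kolmogorov-Smirnov p-value only approximate.)

```python
# Sketch: Shapiro-Wilk and Kolmogorov-Smirnov tests of normality for the residuals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
resid = rng.normal(0, 1, 45)            # stand-in for the saved regression residuals

sw_stat, sw_p = stats.shapiro(resid)
ks_stat, ks_p = stats.kstest(resid, "norm", args=(resid.mean(), resid.std(ddof=1)))
print(sw_p, ks_p)                       # p below alpha -> reject normality of the residuals
```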

34 Test for Correlated Errors. Finally, there is a hypothesis test for serial (order-related) correlation in the errors. The Durbin-Watson statistic tests for correlation between adjacent observations. Really, this test is only valid if observations were made at equal time intervals. In SPSS: Analyze...Regression...Linear; click on the Statistics box; under Residuals, check Durbin-Watson.
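
The same statistic is available outside SPSS; a minimal sketch with simulated, equally spaced observations is below. Values near 2 indicate little correlation between adjacent residuals; values near 0 or 4 indicate positive or negative autocorrelation.

```python
# Sketch: Durbin-Watson statistic for correlation between adjacent residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(8)
x = np.arange(50, dtype=float)                  # equally spaced "time" ordering
y = 3 + 0.5 * x + rng.normal(0, 1, 50)

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
print(durbin_watson(resid))                     # roughly 2 when adjacent residuals are uncorrelated
```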

35 Here it is: Your Moment of Zen. Regression diagnostics are important parts of an analysis that must not be overlooked. Oftentimes, inferences can be wrong if the assumptions of the regression have not been met. Transformations can take forever, and may not get you closer to a good result with respect to the assumptions. Listen to your data; they are trying to tell you something.

36 Next Time. Bringing it all together: case studies in regression.
