Advanced Regression Topics: Violation of Assumptions
1 Advanced Regression Topics: Violation of Assumptions. Lecture 7, February 15, 2005. Applied Regression Analysis. Lecture #7-2/15/2005, Slide 1 of 36.
2 Today's Lecture. Revisiting residuals. Outliers aside, what other things are important to look for? Nonconstant variance, nonlinearity, and nonnormality.
3 Snow Geese. From Weisberg (1985, p. 102): Aerial survey methods are regularly used to estimate the number of snow geese in their summer range areas west of Hudson Bay in Canada. To obtain estimates, small aircraft fly over the range and, when a flock of geese is spotted, an experienced person estimates the number of geese in the flock. To investigate the reliability of this method of counting, an experiment was conducted in which an airplane carrying two observers flew over 45 flocks, and each observer made an independent estimate of the number of birds in each flock. Also, a photograph of the flock was taken so that an exact count of the number of birds in the flock could be made (data by Cook and Jacobsen, 1978).
4 Snow Geese. [image]
5 Hudson Bay. [map]
6 Regression Analysis. Using the first observer in the plane, we consider the relationship between this person's count and that from the photograph: [scatterplot of photo count versus observer 1 count]
7 Regression Analysis. One way of analyzing these data is to fit a regression that attempts to predict the count in the photo from the count by the observer. Using SPSS, this regression was estimated, giving the following statistics:

    Coefficient      Estimate   SE   t   p-value
    a (intercept)
    b (slope)                             <

    Statistic   Estimate
    SS_reg      254,
    SS_res      84,
    R^2
8 Assumptions. But, remember, before we can interpret these results, we must first check our assumptions. Assumptions of regression analyses revolve around the residuals, e = Y − Ŷ = Y − (a + bX). In particular, we specified that all residuals were: independent (or non-correlated); identically distributed; and normally distributed, with a zero mean and a constant variance: e ~ N(0, σ_e²).
9 Residual Plot. From a previous lecture, recall that an easy way to check assumptions was to look at a plot of the standardized residuals against the unstandardized predicted values: [plot of standardized residuals versus unstandardized predicted values] Do you notice any problems from this plot?
10 Nonconstant Variance. One of the primary assumptions in a linear regression is that var(e_i) = σ_e² for all i = 1, ..., N observations. [residual plot: standardized residual versus unstandardized predicted value] In our example, this assumption is clearly violated.
11 Detection. Detecting nonconstant variance is often accomplished by examining the residual plot. However, visual inspection can lead to some problems: interpretation is subjective, relying on experience, and how much is too much? Nonconstant variance is really a matter of degree. For this reason, one can construct a statistical hypothesis test for the constancy of variance.
12 Detection Test. Note that nonconstant var(e_i) can be caused by: the response, Y; the predictors, X; or some other quantity not involved in the regression, such as observations over time, or observations related by space (spatial orientation). Any (or all) of these features can be put into a large matrix, Z, so each observation i has a row vector z_i.
13 Detection Test. Given our suspected cause of nonconstancy of variance for each observation z_i, we can assume: var(e_i) = σ_e² exp(λ′z_i). This is a very technical way of making the variance a function of other variables. This form places the following constraints on our variance: 1. var(e_i) > 0 for all observations z_i. 2. The variance depends on z_i and λ, but only through the linear function λ′z_i. 3. var(e_i) is monotonic (either increasing or decreasing) in each component of z_i. 4. If λ = 0, then var(e_i) = σ_e² for all i.
14 Detection Test: Steps. STEP 0: Determine what is causing the nonconstant variance. Let's go back to our geese data example, and assume the nonconstancy in variance was due to the predictor: humans have a more difficult time detecting the number of geese consistently as the number observed gets very large. Because we feel that the nonconstancy in variability is caused by our predictor variables, we will construct Z from X. Note that X has a column vector of ones for the intercept.
15 Detection Test: Steps. 1. Estimate the regression line for the original model (Y = a + bX), and save the unstandardized residuals for each observation: ê_i = Y_i − (â + b̂X_i). 2. For each observation, compute the scaled squared residuals, u_i = ê_i² / σ̂_e², where σ̂_e² is the ML estimate of σ_e² (differing from MS_error because of a denominator of N rather than N − k − 1): σ̂_e² = (Σ_{i=1}^{N} ê_i²) / N.
16 Detection Test: Steps. 3. Compute the regression of u_i onto z_i. Obtain from this regression the SS_reg. Obtain df_reg, which is the number of predictors in Z (not including the intercept). 4. Compute the score statistic (using SS_reg from step 3): S = SS_reg / 2. 5. Test the hypothesis that λ = 0 by obtaining a p-value for S, which is distributed χ²(df_reg), where df_reg is from step 3.
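The steps above can also be carried out outside of SPSS. The following is a minimal sketch with simulated data (numpy; the sample values, seed, and degree of heteroscedasticity are hypothetical stand-ins, not the actual geese numbers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data in the spirit of the geese example: the spread of the
# errors grows with the predictor (observer count -> photo count).
n = 45
x = rng.uniform(10, 500, n)
y = 1.0 + 0.9 * x + rng.normal(0, 0.05 * x)

# Step 1: fit the original regression and save the residuals.
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

# Step 2: scaled squared residuals, using the ML variance (denominator N).
sigma2 = np.sum(e**2) / n
u = e**2 / sigma2          # by construction, u averages exactly 1

# Step 3: regress u on Z (here Z = X, the suspected cause).
bu = np.linalg.lstsq(X, u, rcond=None)[0]
ss_reg = np.sum((X @ bu - u.mean())**2)
df_reg = 1  # predictors in Z, excluding the intercept

# Steps 4-5: the score statistic, referred to chi-square(df_reg).
S = ss_reg / 2
print(S)
```

Under the null hypothesis of constant variance, S is compared to a χ² distribution with df_reg degrees of freedom (e.g., a 5% critical value of 3.84 when df_reg = 1).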
17 From Our Example. 1. Original regression estimates found from SPSS: Analyze...Regression...Linear. Save the unstandardized residuals from the Save button menu. 2. For each observation, compute the scaled squared residuals, u_i = ê_i² / σ̂_e²: Compute σ̂_e² = (Σ_{i=1}^{N} ê_i²) / N. In SPSS: Transform...Compute, and make a new variable that is the squared value of the unstandardized residual. Then find the average of the new variable from Analyze...Descriptive Statistics...Descriptives. Alternative: take SS_res from the step 1 output and divide by N. σ̂_e² = 1, Compute u_i in SPSS by going to Transform...Compute.
18 From Our Example. 3. Compute the regression of u_i onto z_i. In SPSS: Analyze...Regression...Linear. 4. Compute the score statistic S = SS_reg / 2, using SS_reg from step 3 (taken from the SPSS output): S = 81.41, with df_reg = 1. 5. Get the p-value for S. In Excel, type =chidist(81.41,1): p < .0001. Based on these results, we reject the null hypothesis of constant variance, and conclude that our example violates the constant variance assumption.
19 ...Now What? We found in our example that we have statistical evidence for nonconstant variance. The biggest consequence of nonconstant variance is that our regression line does not accurately represent all cases in our sample. Also a problem: the hypothesis tests we use are based on the assumption that e_i ~ N(0, σ_e²). When nonconstant variance is found, two options are possible: 1. Estimate the regression using alternate methods: weighted least squares, or median regression. 2. Transform either the response or the predictors.
20 Remedy #1: Alternate Estimation Algorithms. Much like the mean, which is extremely sensitive to highly skewed (outlying) observations, least squares estimates have a very low breakdown point: a small number of extreme observations can move the estimates arbitrarily far. Instead of finding regression parameters that minimize Σ_{i=1}^{N} (Y − Ŷ)², alternative optimization criteria exist.
21 Alternate Estimation Algorithms. Two possible alternatives: Weighted least squares (WLS): minimize Σ_{i=1}^{N} w_i (Y − Ŷ)². Can be performed in SPSS. Minimum absolute deviation: minimize Σ_{i=1}^{N} |Y − Ŷ|. Much more technical; the simplex optimization method involves linear programming.
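The WLS criterion has a closed-form solution, b = (X′WX)⁻¹X′Wy, which is easy to sketch directly. The numbers below are hypothetical, and the weights w_i = 1/X_i² reflect an assumed error standard deviation proportional to X:

```python
import numpy as np

# Weighted least squares: choose a and b to minimize sum_i w_i (Y_i - Yhat_i)^2.
# With error sd assumed proportional to X, a natural weight is w_i = 1 / X_i^2.
x = np.array([50., 100., 150., 200., 300., 400., 500.])
y = np.array([55., 90., 160., 185., 310., 380., 520.])

X = np.column_stack([np.ones_like(x), x])
W = np.diag(1.0 / x**2)

# Closed-form WLS solution: b = (X'WX)^{-1} X'Wy
b_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Ordinary (unweighted) least squares, for comparison
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_wls, b_ols)
```

The weights downweight the noisy large-X observations, so the WLS line tracks the precise small-X cases more closely than the OLS line does.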
22 Remedy #2: Variance Stabilizing Transformations. The second (and perhaps most commonly used) remedy for nonconstant variance is to transform the response variable Y (Weisberg, 1985, p. 134):

    Transformation     Situation                 Reason
    √Y                 Poisson counts            var(e_i) ∝ E(Y_i)
    √Y + √(Y+1)        Poisson, small Y values
    ln(Y)              broad range of Y          var(e_i) ∝ [E(Y_i)]²
    ln(Y+1)            some Y are zero
    1/Y                Y bunched near zero       var(e_i) ∝ [E(Y_i)]⁴
    1/(Y+1)            some Y are zero
    sin⁻¹(√Y)          binomial proportions      var(e_i) ∝ E(Y_i)(1 − E(Y_i))
23 Nonlinear Relationship Between Predictor(s) and Y. Situations occur where a non-linear relationship between predictors and Y is present. For example, imagine that the true relationship between Y and X is something like: Y = aX^b. To use the linear regression techniques we are well aware of, this function must be transformed (both Y and X): ln(Y) = ln(a) + b ln(X).
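The log-log trick can be checked numerically. A minimal sketch with made-up values of a and b (noise-free, purely to show that the linearized fit recovers the original parameters):

```python
import numpy as np

# If the truth is Y = a * X^b, then ln(Y) = ln(a) + b ln(X) is linear in ln(X),
# so an ordinary regression of ln(Y) on ln(X) recovers ln(a) and b.
a_true, b_true = 2.0, 0.5
x = np.array([1., 4., 9., 16., 25.])
y = a_true * x**b_true

L = np.column_stack([np.ones_like(x), np.log(x)])
intercept, slope = np.linalg.lstsq(L, np.log(y), rcond=None)[0]
a_hat, b_hat = np.exp(intercept), slope
print(a_hat, b_hat)  # recovers a = 2.0, b = 0.5
```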
24 Nonlinear Relationship Between Predictor(s) and Y. Depending on the situation, not all functions can be made linear: Y = a₁e^(b₁X₁) + a₂e^(b₂X₂). Furthermore, depending on the error dependency (multiplicative or additive), transformations will not lead to errors that satisfy the distributional assumptions of linear regression. Linear regression can only go so far, so if data have such a functional relationship, other methods may be better suited.
25 Detection of Nonlinearity. Often, nonlinearity is detected visually, through use of the residual plots. [residual plot: standardized residual versus unstandardized predicted value]
26 Possible Remedy for Nonlinearity. As discussed, a possible remedy for nonlinearity is to use a transformation of both Y and X (Weisberg, 1985, p. 142):

    Y Transformation   X Transformation   Form
    ln(Y)              ln(X)              Y = aX₁^(b₁) ··· X_k^(b_k)
    ln(Y)              X                  Y = a e^(b₁X₁ + b₂X₂ + ... + b_kX_k)
    Y                  ln(X)              Y = a + b₁ln(X₁) + b₂ln(X₂) + ... + b_k ln(X_k)
    1/Y                1/X                Y = 1 / (a + b₁/X₁ + b₂/X₂ + ... + b_k/X_k)
    1/Y                X                  Y = 1 / (a + b₁X₁ + b₂X₂ + ... + b_kX_k)
    Y                  1/X                Y = a + b₁/X₁ + b₂/X₂ + ... + b_k/X_k
27 Parameterizing Transformations. Instead of choosing some type of transformation function seemingly arbitrarily, statistical techniques have been developed to transform both Y and the set of all predictors X based on known function families. These techniques bear mention because from time to time you will encounter estimates based on them. Furthermore, a clear functional relationship between Y and X may not be known, either from substantive theory or from empirical results. In these situations, linear methods are often easier to rely upon because of parsimony (real or perceived).
28 Transformation of Y: Optimization. Consider the family of regression models (power models): y^(λ) = Xb + e. Finding λ gives an idea of the power relationship between the response and the predictor variables. Such transformations are often referred to as Box-Cox transformations, and involve iterative techniques to find the most likely value of λ. Another transformation method is called Atkinson's score method.
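The Box-Cox idea can be sketched with a simple grid search over λ, maximizing the profile log-likelihood. This is an illustrative reimplementation, not the iterative routine a package would use; the data are hypothetical and constructed so that λ near 2 should win:

```python
import numpy as np

# Profile log-likelihood for the Box-Cox model y^(lambda) = Xb + e, up to an
# additive constant: -n/2 * ln(RSS(lambda)/n) + (lambda - 1) * sum(ln y).
def boxcox_loglik(lam, y, X):
    n = len(y)
    z = np.log(y) if lam == 0 else (y**lam - 1.0) / lam
    b = np.linalg.lstsq(X, z, rcond=None)[0]
    rss = np.sum((z - X @ b)**2)
    return -n / 2.0 * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y))

# Hypothetical data where Y^2 is (nearly) linear in X, so lambda near 2 should win.
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 40)
y = np.sqrt(3.0 + 2.0 * x) + rng.normal(0.0, 0.02, x.size)
X = np.column_stack([np.ones_like(x), x])

grid = [round(-2.0 + 0.1 * k, 1) for k in range(51)]  # -2.0, -1.9, ..., 3.0
lam_hat = max(grid, key=lambda lam: boxcox_loglik(lam, y, X))
print(lam_hat)
```

In practice, packages refine λ with an iterative optimizer and report a likelihood-based confidence interval rather than a single grid point.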
29 Transformation of X: Optimization. Similar methods have been developed for transforming X. Transforming X is inherently more difficult: oftentimes absurd results can be the outcome, and many times non-significant relationships obscure any needed transformations.
30 Nonnormality. The last assumption to be checked is the normality of the residuals. Detecting nonnormality can be tricky, and detection often depends on sample size. Statistically speaking, there is no hypothesis test that can conclude that a variable is normally distributed. Violations of this assumption lead to inaccurate p-values in hypothesis tests (only). The p-values, however, are fairly robust to violations of this assumption.
31 Detection of Nonnormality: Probability Plots. The easiest way to detect nonnormal errors is to use a Q-Q plot. A Q-Q plot is a plot of an ordered variable against what its expected values would be for a variable from a normal distribution with sample size N. In SPSS: Graphs...Q-Q (check Standardize values and be sure Test distribution is Normal). The data should fall on the line produced on the plot; if not, the data are not from a normal distribution.
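The Q-Q construction itself is simple enough to do by hand: sort the residuals and pair them with standard normal quantiles at the plotting positions (i − 0.5)/N. A minimal sketch (the residual values are hypothetical; the correlation of the two columns is a crude numeric summary of how straight the plot would look):

```python
from statistics import NormalDist

def qq_pairs(data):
    # Pair sorted data with standard-normal quantiles at positions (i - 0.5)/n.
    n = len(data)
    xs = sorted(data)
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return qs, xs

def corr(a, b):
    # Plain Pearson correlation of two equal-length lists.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma)**2 for u in a)
    vb = sum((v - mb)**2 for v in b)
    return cov / (va * vb) ** 0.5

resid = [-1.8, -1.1, -0.6, -0.3, -0.1, 0.0, 0.2, 0.4, 0.7, 1.2, 1.9]
qs, xs = qq_pairs(resid)
print(round(corr(qs, xs), 3))  # near 1 when the residuals look normal
```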
32 Detection of Nonnormality: Probability Plots. [Q-Q plot of the example residuals]
33 Detection of Nonnormality: Hypothesis Tests. Additionally, statistical hypothesis tests have been developed to test the null hypothesis that the data (in this case, the residuals) came from a normal distribution: the Shapiro-Wilk test, and the Kolmogorov-Smirnov test. In SPSS: get the unstandardized residuals and go to Analyze...Descriptive Statistics...Explore. Put the residuals in the Dependent List box. Click on Plots and check "Normality plots with tests". If the p-value is less than some α, then reject the null hypothesis: the residuals are not from a normal distribution. These tests have little validity in small samples.
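Shapiro-Wilk and Kolmogorov-Smirnov are built into SPSS (and most packages). As a rough by-hand stand-in, one can compute a Kolmogorov-Smirnov-type statistic against a normal distribution whose mean and sd are estimated from the residuals (the Lilliefors variant). The 0.886/√N cutoff below is an approximate 5% critical value, and the residuals are hypothetical:

```python
import math
from statistics import NormalDist, mean, pstdev

def lilliefors_D(data):
    # Largest gap between the empirical CDF and a normal CDF whose mean and
    # sd are estimated from the same data (Lilliefors' version of K-S).
    n = len(data)
    dist = NormalDist(mean(data), pstdev(data))
    D = 0.0
    for i, x in enumerate(sorted(data), start=1):
        F = dist.cdf(x)
        D = max(D, abs(F - i / n), abs(F - (i - 1) / n))
    return D

resid = [-1.8, -1.1, -0.6, -0.3, -0.1, 0.0, 0.2, 0.4, 0.7, 1.2, 1.9]
D = lilliefors_D(resid)
cutoff = 0.886 / math.sqrt(len(resid))  # approximate 5% critical value
print(D, cutoff)  # D below the cutoff: no evidence against normality here
```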
34 Test for Correlated Errors. Finally, there is a hypothesis test for seriation effects in the residuals. The Durbin-Watson statistic tests for correlation between adjacent observations. Really, this test is only valid if observations were made at equal time intervals. In SPSS: Analyze...Regression...Linear. Click on the Statistics box; under Residuals, check Durbin-Watson.
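The Durbin-Watson statistic itself is easy to compute from time-ordered residuals. The two residual series below are made up to show the extremes:

```python
import numpy as np

def durbin_watson(e):
    # d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2; d near 2 means no first-order
    # autocorrelation, near 0 positive autocorrelation, near 4 negative.
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e)**2) / np.sum(e**2)

smooth = [1.0, 0.9, 0.8, 0.9, 1.0, 1.1, 1.0, 0.9]           # positively correlated
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # negatively correlated
print(durbin_watson(smooth), durbin_watson(alternating))  # about 0.0096 and 3.5
```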
35 Here it is: Your Moment of Zen. Regression diagnostics are important parts of an analysis that must not be overlooked. Oftentimes, inferences can be wrong if the assumptions of the regression have not been met. Transformations can take forever, and may not get you closer to a good result with respect to assumptions. Listen to your data; they are trying to tell you something.
36 Next Time. Bringing it all together: case studies in regression.
More informationAn introduction to plotting data
An introduction to plotting data Eric D. Black California Institute of Technology v2.0 1 Introduction Plotting data is one of the essential skills every scientist must have. We use it on a near-daily basis
More informationUnivariate analysis. Simple and Multiple Regression. Univariate analysis. Simple Regression How best to summarise the data?
Univariate analysis Example - linear regression equation: y = ax + c Least squares criteria ( yobs ycalc ) = yobs ( ax + c) = minimum Simple and + = xa xc xy xa + nc = y Solve for a and c Univariate analysis
More informationEconometrics. 4) Statistical inference
30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution
More informationIntro to Linear Regression
Intro to Linear Regression Introduction to Regression Regression is a statistical procedure for modeling the relationship among variables to predict the value of a dependent variable from one or more predictor
More informationRegression With a Categorical Independent Variable
Regression With a Categorical Independent Variable Lecture 15 March 17, 2005 Applied Regression Analysis Lecture #15-3/17/2005 Slide 1 of 29 Today s Lecture» Today s Lecture» Midterm Note» Example Regression
More informationLinear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation?
Did You Mean Association Or Correlation? AP Statistics Chapter 8 Be careful not to use the word correlation when you really mean association. Often times people will incorrectly use the word correlation
More informationA Re-Introduction to General Linear Models (GLM)
A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing
More informationMORE ON SIMPLE REGRESSION: OVERVIEW
FI=NOT0106 NOTICE. Unless otherwise indicated, all materials on this page and linked pages at the blue.temple.edu address and at the astro.temple.edu address are the sole property of Ralph B. Taylor and
More informationLecture 3: Inference in SLR
Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals
More informationKeller: Stats for Mgmt & Econ, 7th Ed July 17, 2006
Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationLecture 7 Remedial Measures
Lecture 7 Remedial Measures STAT 512 Spring 2011 Background Reading KNNL: 3.8-3.11, Chapter 4 7-1 Topic Overview Review Assumptions & Diagnostics Remedial Measures for Non-normality Non-constant variance
More informationRatio of Polynomials Fit One Variable
Chapter 375 Ratio of Polynomials Fit One Variable Introduction This program fits a model that is the ratio of two polynomials of up to fifth order. Examples of this type of model are: and Y = A0 + A1 X
More informationImportant note: Transcripts are not substitutes for textbook assignments. 1
In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance
More informationDensity Temp vs Ratio. temp
Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,
More informationMultiple Regression. More Hypothesis Testing. More Hypothesis Testing The big question: What we really want to know: What we actually know: We know:
Multiple Regression Ψ320 Ainsworth More Hypothesis Testing What we really want to know: Is the relationship in the population we have selected between X & Y strong enough that we can use the relationship
More informationSAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c
Inference About the Slope ffl As with all estimates, ^fi1 subject to sampling var ffl Because Y jx _ Normal, the estimate ^fi1 _ Normal A linear combination of indep Normals is Normal Simple Linear Regression
More informationAMS 7 Correlation and Regression Lecture 8
AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation
More informationA discussion on multiple regression models
A discussion on multiple regression models In our previous discussion of simple linear regression, we focused on a model in which one independent or explanatory variable X was used to predict the value
More informationIntroduction to Within-Person Analysis and RM ANOVA
Introduction to Within-Person Analysis and RM ANOVA Today s Class: From between-person to within-person ANOVAs for longitudinal data Variance model comparisons using 2 LL CLP 944: Lecture 3 1 The Two Sides
More informationLAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION
LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION In this lab you will learn how to use Excel to display the relationship between two quantitative variables, measure the strength and direction of the
More informationContents. Acknowledgments. xix
Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables
More informationStatistics Handbook. All statistical tables were computed by the author.
Statistics Handbook Contents Page Wilcoxon rank-sum test (Mann-Whitney equivalent) Wilcoxon matched-pairs test 3 Normal Distribution 4 Z-test Related samples t-test 5 Unrelated samples t-test 6 Variance
More informationData Science for Engineers Department of Computer Science and Engineering Indian Institute of Technology, Madras
Data Science for Engineers Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture 36 Simple Linear Regression Model Assessment So, welcome to the second lecture on
More informationMULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS Page 1 MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level
More informationOverview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation
Bivariate Regression & Correlation Overview The Scatter Diagram Two Examples: Education & Prestige Correlation Coefficient Bivariate Linear Regression Line SPSS Output Interpretation Covariance ou already
More informationQ1: What is the interpretation of the number 4.1? A: There were 4.1 million visits to ER by people 85 and older, Q2: What percent of people 65-74
Lecture 4 This week lab:exam 1! Review lectures, practice labs 1 to 4 and homework 1 to 5!!!!! Need help? See me during my office hrs, or goto open lab or GS 211. Bring your picture ID and simple calculator.(note
More informationRegression with Nonlinear Transformations
Regression with Nonlinear Transformations Joel S Steele Portland State University Abstract Gaussian Likelihood When data are drawn from a Normal distribution, N (µ, σ 2 ), we can use the Gaussian distribution
More informationSimple Linear Regression
Simple Linear Regression EdPsych 580 C.J. Anderson Fall 2005 Simple Linear Regression p. 1/80 Outline 1. What it is and why it s useful 2. How 3. Statistical Inference 4. Examining assumptions (diagnostics)
More information22S39: Class Notes / November 14, 2000 back to start 1
Model diagnostics Interpretation of fitted regression model 22S39: Class Notes / November 14, 2000 back to start 1 Model diagnostics 22S39: Class Notes / November 14, 2000 back to start 2 Model diagnostics
More informationIntroduction to the Analysis of Variance (ANOVA)
Introduction to the Analysis of Variance (ANOVA) The Analysis of Variance (ANOVA) The analysis of variance (ANOVA) is a statistical technique for testing for differences between the means of multiple (more
More information, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1
Regression diagnostics As is true of all statistical methodologies, linear regression analysis can be a very effective way to model data, as along as the assumptions being made are true. For the regression
More informationComparing IRT with Other Models
Comparing IRT with Other Models Lecture #14 ICPSR Item Response Theory Workshop Lecture #14: 1of 45 Lecture Overview The final set of slides will describe a parallel between IRT and another commonly used
More informationLecture 10: F -Tests, ANOVA and R 2
Lecture 10: F -Tests, ANOVA and R 2 1 ANOVA We saw that we could test the null hypothesis that β 1 0 using the statistic ( β 1 0)/ŝe. (Although I also mentioned that confidence intervals are generally
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More information