Project Report for STAT571 Statistical Methods
Instructor: Dr. Ramon V. Leon
Wage Data Analysis
Yuanlei Zhang
November
Part 1: Introduction

Data Set

The data set contains a random sample of observations on the eleven variables listed below, drawn from the Current Population Survey of 1985. It provides information on wages and other characteristics of the workers, including sex, number of years of education, years of work experience, occupational status, region of residence and union membership. The data set was obtained from StatLib; its original source is: Berndt, E.R. (1991). The Practice of Econometrics. New York: Addison-Wesley. (The JMP file containing this data set is attached as raw_data.jmp.)

Variables

The variables contained in the data set are summarized in the table below:

Variable Name | Description | Data Type

Response variable:
WAGE | Wage (dollars per hour) | Continuous

Predictor variables:
EDUCATION | Number of years of education | Continuous
SOUTH | Indicator for Southern region: 1 = person lives in South, 0 = person lives elsewhere | Nominal
SEX | Indicator for sex: 1 = female, 0 = male | Nominal
EXPERIENCE | Number of years of work experience | Continuous
UNION | Indicator for union membership: 1 = union member, 0 = not a union member | Nominal
AGE | Age (years) | Continuous
RACE | Race: Other / Hispanic / White | Nominal
OCCUPATION | Occupational category: Management / Sales / Clerical / Service / Professional / Other | Nominal
SECTOR | Sector: Other / Manufacturing / Construction | Nominal
MARR | Marital status (married vs. unmarried, coded 0/1) | Nominal

NOTE: The nominal variables SOUTH, SEX, UNION and MARR already serve as dummy variables, since each takes only the values 0 and 1. For the nominal variables RACE, OCCUPATION and SECTOR, dummy variables are introduced automatically by JMP when running the regressions.

Objective

Common sense suggests that wages depend, to a greater or lesser degree, on the other characteristics of the workers. The objectives of this project are to find out whether wages are indeed related to these characteristics and, if so, what possible ways there are to model such relationships and how good those models are. A final model will be selected that best describes the relationship between wages and these characteristics on the given data set, and it might help to predict a worker's wage from relevant characteristics. However, since the data were collected in 1985, it might not be appropriate to make general inferences about a worker's wage at the present time.
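The note above says JMP generates the dummy variables automatically. The same encoding step can be reproduced outside JMP; the sketch below uses pandas on a hypothetical mini-sample (the column values are illustrative, not taken from the report's data):

```python
import pandas as pd

# Hypothetical mini-sample mirroring the variable coding described above.
df = pd.DataFrame({
    "WAGE": [5.1, 9.8, 7.2, 11.0],
    "SEX": [1, 0, 0, 1],                      # already 0/1: usable as-is
    "RACE": ["White", "Hispanic", "Other", "White"],
    "OCCUPATION": ["Clerical", "Management", "Service", "Sales"],
})

# Multi-level nominal variables need explicit dummy columns; drop_first
# keeps one level as the baseline category, as JMP does internally.
X = pd.get_dummies(df[["SEX", "RACE", "OCCUPATION"]],
                   columns=["RACE", "OCCUPATION"], drop_first=True)
print(X.columns.tolist())
```

With three RACE levels and four OCCUPATION levels, this yields two and three dummy columns respectively, alongside the untouched 0/1 indicator SEX.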
Part 2: Data Analysis

Problems with the Data

As an initial attempt, WAGE is fitted against all predictor variables.

Model 1: WAGE = β0 + β1 EDUCATION + β2 SOUTH + β3 SEX + β4 EXPERIENCE + β5 UNION + β6 AGE + β7 RACE + β8 OCCUPATION + β9 SECTOR + β10 MARR

(NOTE: The nominal variables RACE, OCCUPATION and SECTOR should be replaced with the corresponding dummy variables. For simplicity, the model expressions do not show this, but the conversion is made whenever a regression is actually carried out.)

The output of the least squares regression immediately identifies two major problems that must be corrected before any further analysis can be carried out.

Problem 1: Unstable variance

The problem of unstable variance is identified by examining the residual-by-predicted plot in the JMP output:

[Residual by Predicted plot: WAGE residuals vs. predicted WAGE]

From this residual plot, we can easily see that the variance increases with the predicted value. This indicates that Var(Y) is a function of E(Y) rather than a constant, so the response needs to be transformed to stabilize the variance. Applying the Box-Cox transformation technique in JMP, we find that the transformation that best stabilizes Var(Y) is the log transformation. Therefore, the new model becomes:
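The Box-Cox step can be reproduced outside JMP as well. Below is a minimal sketch with SciPy, using synthetic right-skewed data (not the report's data), for which the maximum-likelihood λ should land near 0, the value that corresponds to the log transformation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic wage-like data: log-normal, so Var(Y) grows with E(Y)
# and the fitted Box-Cox lambda should come out near zero.
y = np.exp(1.5 + 0.5 * rng.standard_normal(500))

# With lmbda=None, boxcox returns the transformed data and the
# maximum-likelihood estimate of lambda.
y_trans, lam = stats.boxcox(y)
print(round(lam, 2))   # near 0 -> use log(WAGE)
```

A λ near 1 would instead suggest no transformation, and λ near 0.5 a square-root transformation; here the log is the natural choice.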
Model 1+: Log(WAGE) = β0 + β1 EDUCATION + β2 SOUTH + β3 SEX + β4 EXPERIENCE + β5 UNION + β6 AGE + β7 RACE + β8 OCCUPATION + β9 SECTOR + β10 MARR

After refitting Log(WAGE) against all predictor variables, the residual plot is as follows:

[Residual by Predicted plot: Log(WAGE) residuals vs. predicted Log(WAGE)]

The residual plot now suggests a much more constant variance. From this step on, all least squares regressions use Log(WAGE) in place of WAGE.

Problem 2: Multicollinearity

The problem of multicollinearity is identified by examining the VIF values in the Parameter Estimates table of the JMP output:

[Parameter Estimates table: VIF values for Intercept, EDUCATION, SOUTH, SEX, EXPERIENCE, UNION, AGE, RACE, OCCUPATION, SECTOR and MARR]
We see that the VIF values of EDUCATION, EXPERIENCE and AGE are much greater than 10, which suggests a serious multicollinearity problem. Multicollinearity problems are usually caused by correlated predictor variables, so the pairwise correlations between EDUCATION, EXPERIENCE and AGE are calculated to identify the cause:

[Correlations table and scatterplot matrix for EDUCATION, EXPERIENCE and AGE]

From the pairwise correlations and the scatterplot matrix, we see that AGE and EXPERIENCE are almost perfectly correlated, with a correlation coefficient of r = .978. This explains the multicollinearity problem identified above; to solve it, we need to remove either AGE or EXPERIENCE. The models evaluated next start by removing the AGE variable.
Models without the AGE Variable

Removing the AGE variable from Model 1+, we get the model below:

Model 1-AGE: Log(WAGE) = β0 + β1 EDUCATION + β2 SOUTH + β3 SEX + β4 EXPERIENCE + β5 UNION + β6 RACE + β7 OCCUPATION + β8 SECTOR + β9 MARR

Fitting this model in JMP gives the following least squares output:

[Summary of Fit, Parameter Estimates (with VIF values) and Effect Tests for Model 1-AGE]

Removing the AGE variable results in only a negligible drop in the R² value. The multicollinearity problem, however, has been solved: no VIF value now exceeds 10. The output also suggests that the coefficients of the RACE, SECTOR and MARR variables are not significant at the α = .05 level, since their P-values in the partial F-tests are greater than .05. Therefore, the next model drops these variables to see how the regression results are affected.
Model 2: Log(WAGE) = β0 + β1 EDUCATION + β2 SOUTH + β3 SEX + β4 EXPERIENCE + β5 UNION + β6 OCCUPATION

[Summary of Fit, Analysis of Variance, Parameter Estimates, Effect Tests, Cp and PRESS for Model 2]

The pooled F-test of the hypothesis H0: β1 = β2 = ... = β6 = 0 rejects H0, since its P-value is less than .0001. Each coefficient in the model is also statistically significant, as the P-values of the partial F-tests are very small, and all VIF values are below 10, which indicates no multicollinearity problem. The R² of this model drops only slightly from that of Model 1-AGE, so only a modest share of the variability in workers' wages is explained by the regression on these predictors. This may not seem a satisfactory result, but given the inherent variability in individual wages it is still acceptable. The R²adj, Cp and PRESS statistics are also calculated for comparison with competing models. In all, Model 2 is quite reasonable and is a good candidate model.

The next step is to consider all possible interactions between the predictors, as well as the corresponding quadratic terms (only for EDUCATION and EXPERIENCE), and see
if we can improve the regression results. Since there are many possible interaction terms plus the two quadratic terms, the JMP output is very lengthy and therefore omitted. In the end, only EXPERIENCE² turns out to be significant; EDUCATION² and all of the interaction terms can be discarded, since they are neither significant nor add much information to the model. Introducing EXPERIENCE² into Model 2, we get the following model:

Model 2+: Log(WAGE) = β0 + β1 EDUCATION + β2 SOUTH + β3 SEX + β4 EXPERIENCE + β5 UNION + β6 OCCUPATION + β7 EXPERIENCE²

[Summary of Fit, Analysis of Variance, Parameter Estimates (including the centered quadratic term (EXPERIENCE - mean)²), Effect Tests, Cp and PRESS for Model 2+]

This model gives better results on the R², R²adj, Cp and PRESS statistics than Model 2 does. Each coefficient in the model is highly significant, and all VIF values are below 10. Therefore, Model 2+ is another good candidate model.
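The quadratic term in the JMP output is centered at the sample mean of EXPERIENCE before squaring. Centering deserves a brief illustration: it keeps the linear and squared terms from being strongly correlated, which would otherwise reintroduce exactly the kind of VIF problem fixed earlier. A sketch with hypothetical experience values:

```python
import numpy as np

rng = np.random.default_rng(3)
exper = rng.uniform(0, 45, 400)            # hypothetical experience values

raw_sq = exper ** 2
centered_sq = (exper - exper.mean()) ** 2  # JMP's (X - mean)^2 construction

# Correlation of the linear term with each version of the squared term.
r_raw = np.corrcoef(exper, raw_sq)[0, 1]
r_centered = np.corrcoef(exper, centered_sq)[0, 1]
print(round(r_raw, 2), round(r_centered, 2))
```

The raw square is almost collinear with the linear term, while the centered square is nearly uncorrelated with it; the fitted curve is identical either way, only the parameterization changes.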
Models without the EXPERIENCE Variable

As mentioned in previous sections, we can instead remove the EXPERIENCE variable from Model 1+ to get the model below:

Model 1-EXPERIENCE: Log(WAGE) = β0 + β1 EDUCATION + β2 SOUTH + β3 SEX + β4 UNION + β5 AGE + β6 RACE + β7 OCCUPATION + β8 SECTOR + β9 MARR

Fitting this model in JMP gives the following least squares output:

[Summary of Fit, Parameter Estimates (with VIF values) and Effect Tests for Model 1-EXPERIENCE]

Removing the EXPERIENCE variable likewise results in only a negligible drop in the R² value, and the multicollinearity problem has again been solved, as no VIF value exceeds 10. Just as when the AGE variable was removed, the RACE, SECTOR and MARR variables are not significant at the α = .05 level and should be excluded from the model. Therefore, these variables are omitted in the next model to see how the regression results are affected.
Model 3: Log(WAGE) = β0 + β1 EDUCATION + β2 SOUTH + β3 SEX + β4 UNION + β5 AGE + β6 OCCUPATION

[Summary of Fit, Analysis of Variance, Parameter Estimates, Effect Tests, Cp and PRESS for Model 3]

This model also gives fairly good results on the R², R²adj, Cp and PRESS statistics. All of its regression coefficients are highly significant, and no VIF value exceeds 10. Therefore, Model 3 is also a good candidate model. As in the development of Model 2+, the next step is to consider all possible interactions between the predictors as well as the corresponding quadratic terms (only for EDUCATION and AGE), and see whether the regression results can be improved. In the end, only AGE² turns out to be significant; EDUCATION² and all the interaction terms can be discarded. Again, the lengthy JMP output is omitted. Introducing AGE² into Model 3, we get the following model:
Model 3+: Log(WAGE) = β0 + β1 EDUCATION + β2 SOUTH + β3 SEX + β4 UNION + β5 AGE + β6 OCCUPATION + β7 AGE²

[Summary of Fit, Analysis of Variance, Parameter Estimates (including the centered quadratic term (AGE - mean)²), Effect Tests, Cp and PRESS for Model 3+]

This model gives even better results on the R², R²adj, Cp and PRESS statistics than Model 3 does, while all the regression coefficients remain highly significant and all VIF values stay below 10. Therefore, Model 3+ is one more good candidate model.
Model Comparisons

Based on the above analyses, we have four candidate models altogether. Summarizing them by their R², R²adj, Cp and PRESS values gives the following table:

Model | Variables in Model
Model 2 | EDUCATION, SOUTH, SEX, EXPERIENCE, UNION, OCCUPATION
Model 2+ | EDUCATION, SOUTH, SEX, EXPERIENCE, UNION, OCCUPATION, EXPERIENCE²
Model 3 | EDUCATION, SOUTH, SEX, UNION, AGE, OCCUPATION
Model 3+ | EDUCATION, SOUTH, SEX, UNION, AGE, OCCUPATION, AGE²

[R², R²adj, Cp and PRESS values for each model as computed above]

According to this table, both Model 2+ and Model 3+ are better than Model 2 and Model 3, and of the two better ones, Model 2+ is just slightly better than Model 3+. In fact, the differences between Model 2+ and Model 3+ are negligible, which means EXPERIENCE and AGE can be used interchangeably in fitting the models, owing to their almost perfect correlation with each other. For decision-making purposes, I will choose Model 2+ as the final model, provided it passes the following checks for violations of the model assumptions.
Model Assumptions Verification

Performing regression diagnostics on Model 2+:

Log(WAGE) = β0 + β1 EDUCATION + β2 SOUTH + β3 SEX + β4 EXPERIENCE + β5 UNION + β6 OCCUPATION + β7 EXPERIENCE²

we get the following JMP results.

a) Checking for outliers and influential observations

Calculating the standardized residuals and the h_ii values detects one observation with an unusually large negative standardized residual. This was a male in a management position, who lived in the north and was not a union member, and who earned a much lower wage than his education and experience would predict. This outlier is simply omitted in the analyses that follow.

b) Residual plots against all predictor variables (checking for linearity)

[Residual plots of the Log(WAGE) residuals against EDUCATION and SOUTH]
[Residual plots of the Log(WAGE) residuals against SEX, EXPERIENCE, UNION, AGE, RACE, OCCUPATION, SECTOR and MARR]

Except for the large outlier identified in (a), the residuals exhibit random scatter around zero, and there are no unusual patterns in these residual plots. Therefore no further transformations are needed, and the omitted variables can remain excluded.
c) Residual plot against the predicted value (checking for constant variance)

[Plot of the residuals against predicted Log(WAGE)]

Except for the large outlier identified in (a), the dispersion of the residuals is approximately constant with respect to the predicted values. Therefore the constant-variance assumption is satisfied.

d) Normal plot of residuals (checking for normality)

[Normal quantile plot of the residuals]
Except for the large outlier identified in (a), the normal plot of the residuals is very close to a straight line, which indicates that the normality assumption is satisfied.

e) Run chart of residuals (checking for independence)

[Residual by Row plot and Durbin-Watson output]

The run chart of the residuals shows no sign of correlation introduced by time order. The estimated autocorrelation is also very small, and the P-value associated with the Durbin-Watson statistic is large, which indicates that there is no autocorrelation problem. Therefore, the independence assumption is satisfied.
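The Durbin-Watson check can be reproduced numerically with statsmodels. The sketch below contrasts independent residuals (statistic near 2) with a deliberately autocorrelated series (statistic well below 2), using synthetic data rather than the report's residuals:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
e_indep = rng.normal(size=500)        # independent residuals

e_ar1 = np.empty(500)                 # AR(1) residuals: positive autocorrelation
e_ar1[0] = rng.normal()
for t in range(1, 500):
    e_ar1[t] = 0.8 * e_ar1[t - 1] + rng.normal()

dw_ok = durbin_watson(e_indep)        # near 2 -> independence looks fine
dw_bad = durbin_watson(e_ar1)         # well below 2 -> positive autocorrelation
print(round(dw_ok, 2), round(dw_bad, 2))
```

Since the statistic is approximately 2(1 - ρ), a value near 2 supports the independence assumption, as it does in the report.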
Part 3: Final Conclusion

Based on all the above analyses, the final model decided on is:

Log(WAGE) = β0 + β1 EDUCATION + β2 SOUTH + β3 SEX + β4 EXPERIENCE + β5 UNION + β6 OCCUPATION + β7 EXPERIENCE²

(The EXPERIENCE² term is centered, and OCCUPATION is replaced with its dummy variables.)

Substituting the least squares estimates for the β's gives the fitted equation:

[Fitted equation with the estimated coefficients from the JMP output]

According to this final model, more education and more experience helped workers earn more; people living in the South earned less than people living elsewhere; females earned less than males; union members earned more than non-union members; and management and professional positions were paid the most, while service and clerical positions were paid the least. The model explains only a modest share of the overall variability in workers' wages, as reflected in its R² value. As mentioned at the beginning, since the data used to build this model were collected in 1985, we may not be able to use it to make general inferences about a worker's wage at the present time.
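Because the final model is fitted on Log(WAGE), its coefficients translate into approximate percentage effects on the wage itself: a coefficient b corresponds to a 100·(e^b − 1)% change in wage per unit increase in the predictor. A sketch with a hypothetical coefficient value (the actual estimates come from the JMP output, not this example):

```python
import numpy as np

# Hypothetical log-wage coefficient for one extra year of education;
# the true estimate is the one fitted in the report.
b_education = 0.09

# Exact percent change in the wage per additional year of education.
pct_change = (np.exp(b_education) - 1) * 100
print(round(pct_change, 1))

# Back-transforming a predicted log-wage into dollars per hour.
log_wage_hat = 2.1
wage_hat = np.exp(log_wage_hat)
```

For small b the percent change is close to 100·b itself, which is why log-wage coefficients are often read directly as proportional effects.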
More informationChapter 10 Correlation and Regression
Chapter 10 Correlation and Regression 10-1 Review and Preview 10-2 Correlation 10-3 Regression 10-4 Variation and Prediction Intervals 10-5 Multiple Regression 10-6 Modeling Copyright 2010, 2007, 2004
More informationLecture 6: Linear Regression
Lecture 6: Linear Regression Reading: Sections 3.1-3 STATS 202: Data mining and analysis Jonathan Taylor, 10/5 Slide credits: Sergio Bacallado 1 / 30 Simple linear regression Model: y i = β 0 + β 1 x i
More information10. Alternative case influence statistics
10. Alternative case influence statistics a. Alternative to D i : dffits i (and others) b. Alternative to studres i : externally-studentized residual c. Suggestion: use whatever is convenient with the
More informationContest Quiz 3. Question Sheet. In this quiz we will review concepts of linear regression covered in lecture 2.
Updated: November 17, 2011 Lecturer: Thilo Klein Contact: tk375@cam.ac.uk Contest Quiz 3 Question Sheet In this quiz we will review concepts of linear regression covered in lecture 2. NOTE: Please round
More informationRidge Regression. Summary. Sample StatFolio: ridge reg.sgp. STATGRAPHICS Rev. 10/1/2014
Ridge Regression Summary... 1 Data Input... 4 Analysis Summary... 5 Analysis Options... 6 Ridge Trace... 7 Regression Coefficients... 8 Standardized Regression Coefficients... 9 Observed versus Predicted...
More informationPBAF 528 Week 8. B. Regression Residuals These properties have implications for the residuals of the regression.
PBAF 528 Week 8 What are some problems with our model? Regression models are used to represent relationships between a dependent variable and one or more predictors. In order to make inference from the
More informationChecking model assumptions with regression diagnostics
@graemeleehickey www.glhickey.com graeme.hickey@liverpool.ac.uk Checking model assumptions with regression diagnostics Graeme L. Hickey University of Liverpool Conflicts of interest None Assistant Editor
More informationECON 4230 Intermediate Econometric Theory Exam
ECON 4230 Intermediate Econometric Theory Exam Multiple Choice (20 pts). Circle the best answer. 1. The Classical assumption of mean zero errors is satisfied if the regression model a) is linear in the
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
What is Multiple Linear Regression Several independent variables may influence the change in response variable we are trying to study. When several independent variables are included in the equation, the
More informationLecture 6: Linear Regression (continued)
Lecture 6: Linear Regression (continued) Reading: Sections 3.1-3.3 STATS 202: Data mining and analysis October 6, 2017 1 / 23 Multiple linear regression Y = β 0 + β 1 X 1 + + β p X p + ε Y ε N (0, σ) i.i.d.
More information5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is
Practice Final Exam Last Name:, First Name:. Please write LEGIBLY. Answer all questions on this exam in the space provided (you may use the back of any page if you need more space). Show all work but do
More informationMultiple Regression. Peerapat Wongchaiwat, Ph.D.
Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com The Multiple Regression Model Examine the linear relationship between 1 dependent (Y) & 2 or more independent variables (X i ) Multiple Regression Model
More informationBusiness Statistics. Lecture 10: Correlation and Linear Regression
Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form
More informationFAQ: Linear and Multiple Regression Analysis: Coefficients
Question 1: How do I calculate a least squares regression line? Answer 1: Regression analysis is a statistical tool that utilizes the relation between two or more quantitative variables so that one variable
More informationFinQuiz Notes
Reading 10 Multiple Regression and Issues in Regression Analysis 2. MULTIPLE LINEAR REGRESSION Multiple linear regression is a method used to model the linear relationship between a dependent variable
More informationMulticollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.
Multicollinearity Read Section 7.5 in textbook. Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Example of multicollinear
More informationREGRESSION DIAGNOSTICS AND REMEDIAL MEASURES
REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES Lalmohan Bhar I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 01 lmbhar@iasri.res.in 1. Introduction Regression analysis is a statistical methodology that utilizes
More information4. Nonlinear regression functions
4. Nonlinear regression functions Up to now: Population regression function was assumed to be linear The slope(s) of the population regression function is (are) constant The effect on Y of a unit-change
More informationIntroduction and Single Predictor Regression. Correlation
Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation
More informationAny of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.
STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed
More informationOverview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation
Bivariate Regression & Correlation Overview The Scatter Diagram Two Examples: Education & Prestige Correlation Coefficient Bivariate Linear Regression Line SPSS Output Interpretation Covariance ou already
More informationChapter 3 Multiple Regression Complete Example
Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be
More informationUniversidad Carlos III de Madrid Econometría Nonlinear Regression Functions Problem Set 8
Universidad Carlos III de Madrid Econometría Nonlinear Regression Functions Problem Set 8 1. The sales of a company amount to 196 millions of dollars in 2009 and increased up to 198 millions in 2010. (a)
More informationMultiple Regression Analysis. Part III. Multiple Regression Analysis
Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant
More informationLecture 4: Multivariate Regression, Part 2
Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above
More informationEconometrics (60 points) as the multivariate regression of Y on X 1 and X 2? [6 points]
Econometrics (60 points) Question 7: Short Answers (30 points) Answer parts 1-6 with a brief explanation. 1. Suppose the model of interest is Y i = 0 + 1 X 1i + 2 X 2i + u i, where E(u X)=0 and E(u 2 X)=
More informationRegression Models. Chapter 4. Introduction. Introduction. Introduction
Chapter 4 Regression Models Quantitative Analysis for Management, Tenth Edition, by Render, Stair, and Hanna 008 Prentice-Hall, Inc. Introduction Regression analysis is a very valuable tool for a manager
More informationMore on Roy Model of Self-Selection
V. J. Hotz Rev. May 26, 2007 More on Roy Model of Self-Selection Results drawn on Heckman and Sedlacek JPE, 1985 and Heckman and Honoré, Econometrica, 1986. Two-sector model in which: Agents are income
More informationAppendix B. Additional Results for. Social Class and Workers= Rent,
Appendix B Additional Results for Social Class and Workers= Rent, 1983-2001 How Strongly do EGP Classes Predict Earnings in Comparison to Standard Educational and Occupational groups? At the end of this
More informationa. The least squares estimators of intercept and slope are (from JMP output): b 0 = 6.25 b 1 =
Stat 28 Fall 2004 Key to Homework Exercise.10 a. There is evidence of a linear trend: winning times appear to decrease with year. A straight-line model for predicting winning times based on year is: Winning
More informationPsychology Seminar Psych 406 Dr. Jeffrey Leitzel
Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting
More informationECON 497 Final Exam Page 1 of 12
ECON 497 Final Exam Page of 2 ECON 497: Economic Research and Forecasting Name: Spring 2008 Bellas Final Exam Return this exam to me by 4:00 on Wednesday, April 23. It may be e-mailed to me. It may be
More informationK. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =
K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing
More informationECO321: Economic Statistics II
ECO321: Economic Statistics II Chapter 6: Linear Regression a Hiroshi Morita hmorita@hunter.cuny.edu Department of Economics Hunter College, The City University of New York a c 2010 by Hiroshi Morita.
More informationChapter 12 - Part I: Correlation Analysis
ST coursework due Friday, April - Chapter - Part I: Correlation Analysis Textbook Assignment Page - # Page - #, Page - # Lab Assignment # (available on ST webpage) GOALS When you have completed this lecture,
More informationEconometrics -- Final Exam (Sample)
Econometrics -- Final Exam (Sample) 1) The sample regression line estimated by OLS A) has an intercept that is equal to zero. B) is the same as the population regression line. C) cannot have negative and
More informationBasic Business Statistics, 10/e
Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:
More informationRegression with Qualitative Information. Part VI. Regression with Qualitative Information
Part VI Regression with Qualitative Information As of Oct 17, 2017 1 Regression with Qualitative Information Single Dummy Independent Variable Multiple Categories Ordinal Information Interaction Involving
More informationMISCELLANEOUS REGRESSION TOPICS
DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 816 MISCELLANEOUS REGRESSION TOPICS I. AGENDA: A. Example of correcting for autocorrelation. B. Regression with ordinary independent
More informationMultiple Regression Examples
Multiple Regression Examples Example: Tree data. we have seen that a simple linear regression of usable volume on diameter at chest height is not suitable, but that a quadratic model y = β 0 + β 1 x +
More informationStat 500 Midterm 2 12 November 2009 page 0 of 11
Stat 500 Midterm 2 12 November 2009 page 0 of 11 Please put your name on the back of your answer book. Do NOT put it on the front. Thanks. Do not start until I tell you to. The exam is closed book, closed
More informationPractice exam questions
Practice exam questions Nathaniel Higgins nhiggins@jhu.edu, nhiggins@ers.usda.gov 1. The following question is based on the model y = β 0 + β 1 x 1 + β 2 x 2 + β 3 x 3 + u. Discuss the following two hypotheses.
More information