sociology 362 regression


Regression is a means of modeling how the conditional distribution of a response variable (say, Y) varies across different values of one or more independent (explanatory) variables (say, X). The feature of the response variable's distribution that has attracted the most interest in the past is the mean. The response variable is frequently quantitative and measured on a true metric, but it doesn't have to be; similarly, the independent variables are frequently quantitative, but they don't have to be. For the time being we'll work exclusively with regression models in which both the dependent variable and the independent variables are quantitative.

Below we use data from respondents to the Current Population Survey (CPS) to look at how the mean of the sample conditional distribution of hourly wage varies across distinct values of schooling. Let's begin by graphing Y against X, i.e., wages (vertical axis) against years of schooling (horizontal axis).

[figure 1. conditional distributions of wage by schooling: scatterplot of hrwage against years of schooling]
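In Stata, a plot like figure 1 can be drawn with the scatter command. A minimal sketch, assuming the data are in memory with the wage and schooling variables named hrwage and edyrs, as in the output below; the axis titles and label range are illustrative choices, not part of the original handout:

    * scatterplot of hourly wage against years of schooling (cf. figure 1)
    scatter hrwage edyrs, ///
        xtitle("years of schooling") ytitle("hourly wage") xlabel(6(1)19)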

Model 1

We'll start with a model for the mean of wages that totally ignores schooling. Write this model as

    $M(y_j) = a$

where $x_j$, $8 \le x_j \le 18$, is a schooling value and $a$ is a constant that is calculated from sample data. Let the calculated value of $a$ be written as $\hat{a}$. Then the predicted or fitted value of wage for the $i$th person at the $j$th value of schooling can be written as

    $\hat{y}_{ij} = \hat{a}$

So the equation for the observed value of wage for the $i$th person at the $j$th year of schooling can be written as

    $y_{ij} = \hat{a} + \hat{e}_{ij}$

where the term on the end is the residual, the difference between the observed value of the response variable and the fitted value from the model. To render all this operational, the constant $a$ must be calculated from sample data. For that purpose we use the function of sample data that minimizes the sum of the squared residuals:

    $\sum_{ij} \hat{e}_{ij}^2 = \sum_{ij} (y_{ij} - \hat{y}_{ij})^2 = \sum_{ij} (y_{ij} - \hat{a})^2$

The value of $\hat{a}$ can be found by running

    . regress hrwage

          Source |       SS       df       MS         Number of obs =
    -------------+------------------------------      F(  0,    14) =
           Model |         0         0        .       Prob > F      =       .
        Residual |  1374.963        14    4.783       R-squared     =  0.0000
    -------------+------------------------------      Adj R-squared =  0.0000
           Total |  1374.963        14    4.783       Root MSE      =   4.967

    ------------------------------------------------------------------------------
          hrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |    9.88874        .16     4.36   0.000      8.66499    9.13649
    ------------------------------------------------------------------------------

which yields the least-squares value $\hat{a} = \bar{y}$, the overall sample mean of hourly wage (about $9). This will be our predicted or fitted value of wage for everyone in the sample, no matter how many years of schooling they have, since the model ignores schooling. Here's the graph of the fitted line against schooling.

[figure 2. fitting the constant function: the flat line of fitted values (grand) overlaid on the wage-schooling scatterplot]
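The same fit can be reproduced and plotted from the command line. A minimal sketch using the variable names above; grand is the name the fitted values carry in figure 2, while e1 is an illustrative name of mine for the residuals:

    * Model 1: constant-only regression; every fitted value is the grand mean
    regress hrwage
    predict grand, xb        // fitted values, all equal to a-hat = mean of hrwage
    predict e1, residuals    // residuals y - a-hat
    * overlay the flat fitted line on the scatterplot (cf. figure 2)
    twoway (scatter hrwage edyrs) (line grand edyrs, sort)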

Model 2

Now let's fit a model in which the fitted values of y are equal to the mean wage at each distinct value of schooling. In contrast to the previous model, in which there was the same mean wage at every value of schooling, this model accommodates a possibly different value of the mean at every value of X. Hence, there will be as many different, distinct predictions as there are different values of schooling, in this case, eleven. You can see from the scatter diagram that this makes more sense. So the second model for the mean of y is

    $M(y_j) = a_j$

Then the predicted or fitted value of wage for the $i$th person at the $j$th value of schooling can be written as

    $\hat{y}_{ij} = \hat{a}_j$

where the values of $\hat{a}_j$ that minimize the sum of squared residuals are the conditional sample means at each value of schooling, i.e., $\hat{a}_j = \bar{y}_j$. Then the equation for the $i$th observation at the $j$th value of schooling is

    $y_{ij} = \hat{a}_j + \hat{e}_{ij}$

To find the eleven fitted wage values, I issue the following Stata command:

    . oneway hrwage edyrs, tab

                          Summary of hrwage
         edyrs |        Mean   Std. Dev.       Freq.
    -----------+------------------------------------
             8 |        8.98         .43           9
             9 |        7.33          4.           1
            10 |         7.3         .66          17
            11 |         6.8        3.33           7
            12 |        7.89        3.69          13
            13 |          8.          4.          36
            14 |         .41          .3
            15 |         .67          .4          13
            16 |         .84          .3           7
            17 |       13.61        6.98           4
            18 |        13.3         6.9          31
    -----------+------------------------------------
         Total |         9.9        4.91

                         Analysis of Variance
        Source              SS       df        MS          F     Prob > F
    ------------------------------------------------------------------------
    Between groups        1.983      10      .1983        .91
    Within groups       17.9798           4.1844837
    ------------------------------------------------------------------------
    Total              1374.963      14      4.783

Here's the graph of this sample fitted conditional mean function:

[figure 3. conditional mean function: the conditional means (mean_y) of hrwage plotted against years of schooling]
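Instead of reading the eleven means off the oneway table, they can be computed and plotted directly. A sketch under the same variable-name assumptions; mean_y is the name these fitted values carry in figure 3:

    * Model 2: fitted value = sample mean of hrwage at each value of edyrs
    egen mean_y = mean(hrwage), by(edyrs)
    * the same fits come from regressing on a full set of schooling dummies:
    *   regress hrwage i.edyrs
    twoway (scatter hrwage edyrs) (connected mean_y edyrs, sort)   // cf. figure 3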

Model 3

Instead of a sample conditional mean function that fits exactly the mean of wage for every distinct value of schooling, perhaps we would prefer, or be satisfied with, a linear approximation to it. To get the best linear predictor of wage given schooling, we do a linear regression of wage on schooling. The model for the mean wage is then

    $M(y_j) = a + b x_j$

which yields the equation for the fitted line:

    $\hat{y}_j = \hat{a} + \hat{b} x_j$

So the equation for the $ij$th observation is

    $y_{ij} = \hat{a} + \hat{b} x_j + \hat{e}_{ij}$

To render all this operational, the constants $\hat{a}$ and $\hat{b}$ must be calculated from sample data. For that purpose we again use the function of sample data that minimizes the sum of the squared residuals:

    $\sum_{ij} \hat{e}_{ij}^2 = \sum_{ij} (y_{ij} - \hat{y}_{ij})^2 = \sum_{ij} (y_{ij} - \hat{a} - \hat{b} x_j)^2$
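Setting the derivatives of this sum with respect to $a$ and $b$ to zero (the normal equations) gives the standard closed-form solutions. The handout does not derive them, but they tie the slope to the covariance and variance of the data reported at the end of these notes:

    $\hat{b} = \dfrac{\sum_{ij} (x_j - \bar{x})(y_{ij} - \bar{y})}{\sum_{ij} (x_j - \bar{x})^2} = \dfrac{s_{xy}}{s_x^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$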

The values of $\hat{a}$ and $\hat{b}$ can be found by running

    . regress hrwage edyrs

          Source |       SS       df       MS         Number of obs =
    -------------+------------------------------      F(  1,    13) =    97.7
           Model |  198.6338         1  198.6338      Prob > F      =  0.0000
        Residual |  394.8996        13      .696      R-squared     =     .16
    -------------+------------------------------      Adj R-squared =     .84
           Total |  1374.963        14     4.783      Root MSE      =    4.14

    ------------------------------------------------------------------------------
          hrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           edyrs |      .8347      .8333     9.88   0.000      .698174    .987137
           _cons |   -1.77461     1.1167    -1.89   0.113    -3.968497    .419961
    ------------------------------------------------------------------------------

Here's the graph of the fitted values of wage from the linear regression.

[figure 4. best linear predictor: the fitted line (blp) overlaid on the wage-schooling scatterplot]
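The fitted line can be saved and graphed the same way as the earlier models. A sketch that reuses the grand and mean_y variables created above; blp is the name the linear fit carries in figures 4 and 5:

    * Model 3: linear regression of wage on schooling
    regress hrwage edyrs
    predict blp, xb    // fitted values a-hat + b-hat * edyrs
    * overlay all three fits on the scatterplot (cf. figure 5)
    twoway (scatter hrwage edyrs) (line grand edyrs, sort) ///
           (line mean_y edyrs, sort) (line blp edyrs, sort)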

Below is the graph of all the fitted models. The linear regression does a good job of tracking the exact fitted conditional mean function. To see how good, compare the mean square residuals from the different models, as given in the table.

[figure 5. constant, mean, and blp functions: the grand-mean, conditional-mean, and best-linear-predictor fits overlaid on the wage-schooling scatterplot]

model comparisons

                                            constant model   conditional mean   linear regression
    SST (total sum of squares)                  1374.96           1374.96            1374.96
    SSresidual (residual sum of squares)        1374.96            17.98              394.9
    SSregression (regression sum of squares)        0               1.98              198.6
    df (residual degrees of freedom)           (n-1) 14          (n-11) 4           (n-2) 13
    MSres (mean square residual, SSres/df)         4.7               .18                .6
    Root MSres (square root of MSres)            4.967              4.49               4.14
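The residual sums of squares in the table can be collected after each fit from the saved result e(rss). A sketch, where the factor-variable term i.edyrs reproduces the conditional-mean model and rss1 through rss3 are illustrative scalar names:

    * residual sums of squares for the three models
    quietly regress hrwage             // constant model
    scalar rss1 = e(rss)
    quietly regress hrwage i.edyrs     // conditional-mean model (one dummy per schooling value)
    scalar rss2 = e(rss)
    quietly regress hrwage edyrs       // linear regression
    scalar rss3 = e(rss)
    display "SSres: constant = " rss1 "  conditional mean = " rss2 "  linear = " rss3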

Other statistics for wages and schooling

    total variation in y:       $\sum_{ij} (y_{ij} - \bar{y})^2 = 1374.963$
    standard deviation of y:    $s_y = \sqrt{\sum_{ij} (y_{ij} - \bar{y})^2 / (n-1)} = 4.91$
    mean of y:                  $\bar{y} = 9.88$
    total variation in x:       $\sum_{ij} (x_j - \bar{x})^2 = 919.967$
    standard deviation of x:    $s_x = \sqrt{\sum_{ij} (x_j - \bar{x})^2 / (n-1)} = 2.38$
    mean of x:                  $\bar{x} = 13.19$
    covariation of y and x:     $\sum_{ij} (x_j - \bar{x})(y_{ij} - \bar{y}) = 44.8$
    covariance of y and x:      $s_{xy} = 4.68$
    correlation of x and y:     $r_{xy} = s_{xy} / (s_x s_y) = 4.68 / ((2.38)(4.91)) = .40$
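These summary statistics are linked directly to the Model 3 estimates: the slope is the covariance divided by the variance of schooling, and the correlation is the covariance scaled by both standard deviations. A sketch that checks both identities in Stata; C is an illustrative matrix name:

    * verify b-hat = s_xy / s_x^2 and r_xy = s_xy / (s_x * s_y)
    quietly correlate hrwage edyrs, covariance
    matrix C = r(C)                    // C[1,1]=s_y^2, C[2,2]=s_x^2, C[1,2]=s_xy
    display "b-hat = " C[1,2]/C[2,2]
    display "r_xy  = " C[1,2]/sqrt(C[1,1]*C[2,2])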