sociology 362 regression


Regression is a means of studying how the conditional distribution of a response variable (say, Y) varies across values of one or more independent (explanatory) variables (say, X). The feature of the response variable's distribution that most work on regression looks at is the mean. The response variable is frequently quantitative and measured on a true metric, but it doesn't have to be: we'll do regression with qualitative, categorical response variables. The independent variables (aka regressors) are frequently quantitative, but they don't have to be: we'll do regressions with qualitative, categorical independent variables. For the time being, though, we'll work exclusively with regression models in which both the dependent variable and the independent variable are quantitative.

Below we use data from respondents to the current population survey (cps) to look at how the mean of the sample conditional distribution of hourly wage varies across distinct values of schooling. Let's begin by graphing Y against X: wages (vertical axis) against schooling (horizontal axis).

[figure 1. conditional distributions of wage by schooling]

Let's begin by looking at a model for the mean of wages that totally ignores schooling. Write this model as

    M(y_j) = a

where a is a constant that is calculated from sample data. Let the calculated value of a be written as â. Then the predicted or fitted value of wage for the ith person at the jth value of schooling can be written as

    ŷ_ij = â

So the equation for the observed value of wage for the ith person at the jth year of schooling is

    y_ij = â + ê_ij

where the term on the end is the residual, the difference between the observed value of the response variable and the fitted value from the model. To render all this operational, the constant â must be calculated from sample data. For that purpose we use the function of sample data that minimizes the sum of the squared residuals:

    Σ ê_ij² = Σ (y_ij − ŷ_ij)² = Σ (y_ij − â)²

The value of â can be found by running

    1. regress hrwage

       Source        SS       df       MS          Number of obs =     1
    ---------+------------------------------       F(0, 14)      =     .
       Model          .        .        .          Prob > F      =     .
    Residual   1374.963       14    4.783          R-squared     =     .
    ---------+------------------------------       Adj R-squared =     .
       Total   1374.963       14    4.783          Root MSE      = 4.967

      hrwage      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
    ---------+----------------------------------------------------------------
       _cons    9.88874       .161    4.36     .        8.66499     9.13649

which yields the least-squares value â = ȳ, the sample mean of hrwage. This will be our predicted or fitted value of wage for everyone in the sample, no matter how many years of schooling they have, since the model ignores schooling.

    2. predict grand

Here's the graph of the fitted line against schooling.

[figure 2. fitting the constant function (hrwage and grand vs. schooling)]

Now let's fit a model in which the fitted/predicted values of y are equal to the mean wage at each value of schooling. In contrast to the previous model, in which there was the same mean wage at every value of schooling, let's consider a model in which there's a possibly different value of the mean at every value of X, a different fitted value. Hence there will be as many distinct predictions as there are distinct values of schooling; in this case, eleven. So the second model for the mean of y is

    M(y_j) = a_j

Then the predicted or fitted value of wage for the ith person at the jth value of schooling can be written as

    ŷ_ij = â_j

where the values of the â_j that minimize the sum of squared residuals are the conditional sample means at each value of schooling, ȳ_j. Then the equation for the ith observation at the jth value of schooling is

    y_ij = â_j + ê_ij

To find the eleven fitted wage values, I issue the following command:

    3. oneway hrwage edyrs, tab

                    Summary of hrwage
      edyrs |      Mean   Std. Dev.     Freq.
    --------+------------------------------------
          8 |       .98        .43         1
          9 |      7.33         4.         1
         10 |       7.3        .66        17
         11 |       6.8       3.33         7
         12 |      7.89       3.69         1
         13 |        8.         4.        36
         14 |      1.41         .3         1
         15 |      1.67         .4        13
         16 |      1.84         .3         7
         17 |     13.61       6.98         4
         18 |      13.3        6.9        31
    --------+------------------------------------
      Total |       9.9       4.91         1

                      Analysis of Variance
    Source               SS      df        MS         F     Prob > F
    ------------------------------------------------------------------
    Between groups     1.983      1     .1983       1.91       .
    Within groups   117.9798      4     .1844837
    ------------------------------------------------------------------
    Total           1374.963     14     4.783

Here's the graph of this sample fitted conditional mean function:

[figure 3. conditional mean function]

Instead of a sample conditional mean function that fits exactly the mean of wage for every distinct value of schooling, perhaps we would prefer, or be satisfied with, a linear approximation to it. To get the best linear predictor of wage given schooling, we do a linear regression of wage on schooling. The model for the mean wage is then

    M(y_j) = a + b·x_j

which yields the equation for the fitted line:

    ŷ = â + b̂·x

So the equation for the ijth observation is

    y_ij = â + b̂·x_j + ê_ij

The least-squares values of â and b̂ can be found by running:

    4. regress hrwage edyrs

       Source        SS       df        MS         Number of obs =     1
    ---------+------------------------------       F(1, 13)      =  97.7
       Model   198.6338        1  198.6338         Prob > F      =     .
    Residual  1394.8996       13      .696         R-squared     =   .16
    ---------+------------------------------       Adj R-squared =  .184
       Total   1374.963       14     4.783         Root MSE      =  4.14

      hrwage      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------+-----------------------------------------------------------------
       edyrs     .8347      .83333    9.88     .         .698174     .987137
       _cons  -1.77461     1.11671   -1.89     .113    -3.968497     .419961

    5. predict blp

Here's the graph of the fitted values of wage from the linear regression.

[figure 4. best linear predictor]
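The least-squares values that regress reports can also be computed directly from the moment formulas b̂ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and â = ȳ − b̂·x̄. Here is a minimal Python sketch; the (schooling, wage) values are invented for illustration, not the CPS sample:

```python
# Sketch: least-squares slope and intercept from sample moments.
# b = covariation(x, y) / variation(x), a = ybar - b * xbar.
# The data below are made up for illustration.
x = [8, 10, 12, 14, 16]
y = [6.0, 7.5, 8.0, 10.5, 11.0]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

covariation = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
variation_x = sum((xi - xbar) ** 2 for xi in x)

b = covariation / variation_x   # slope of the best linear predictor
a = ybar - b * xbar             # intercept

# Residuals from the fitted line sum to zero, as OLS guarantees.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(a, b)
```

Run on the actual CPS data, these two formulas would reproduce the edyrs and _cons coefficients in the regress output above.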

Here's the graph of all the fitted models. The linear regression does a good job of tracking the exact fitted conditional mean function. To see how good, compare the mean square residuals from the different models.

[figure 5. constant, mean, and blp functions]

model comparisons

                             constant model    conditional mean    linear regression
    SST (total SS)                  1374.96             1374.96              1374.96
    SSresidual                      1374.96              117.98               1394.9
    SSregression                          0                1.98                198.6
    df residual                  (n−1) = 14          (n−11) = 4           (n−2) = 13
    MS residual             1374.9/14 = 4.7      117.98/4 = .18       1394.9/13 = .6
    Root MS residual        sqrt(4.7) = 4.9     sqrt(.18) = 4.49       sqrt(.6) = 4.
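The bookkeeping behind the comparison table is just each model's residual sum of squares divided by its residual degrees of freedom. A small Python sketch, with placeholder n and sums of squares rather than the lecture's values:

```python
import math

# Sketch: mean square residual = SSresidual / df; root MSres = its sqrt.
# Residual df: n-1 for the constant model, n-k for k conditional means,
# n-2 for the fitted line. The n, k, and SS values are placeholders.
n = 100
k = 11                      # number of distinct schooling levels
models = {
    "constant":          (1375.0, n - 1),
    "conditional mean":  (1171.0, n - k),
    "linear regression": (1195.0, n - 2),
}

for name, (ss_res, df) in models.items():
    ms_res = ss_res / df
    print(f"{name}: MSres = {ms_res:.2f}, root MSres = {math.sqrt(ms_res):.2f}")
```

Note the trade-off the table illustrates: the conditional-mean model always has the smallest SSresidual, but it spends k degrees of freedom, so its mean square residual need not beat that of the two-parameter line.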

Other statistics for wages and schooling

    total variation in y:       Σ(y − ȳ)² = 1374.963
    standard deviation of y:    s_y = sqrt(1374.963 / 14) = 4.91
    mean of y:                  ȳ = 9.88
    total variation in x:       Σ(x − x̄)² = 919.967
    standard deviation of x:    s_x = sqrt(919.967 / 14) = .38
    mean of x:                  x̄ = 13.19
    covariation of y and x:     Σ(x − x̄)(y − ȳ) = 44.8
    covariance of y and x:      s_xy = 44.8 / 14 = 4.68
    correlation of x and y:     r_xy = s_xy / (s_x · s_y) = 4.68 / ((.38)(4.91)) = .4
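These last quantities are linked: the correlation is the covariance rescaled by both standard deviations, and the least-squares slope equals r_xy · (s_y / s_x). A Python sketch checking both identities on made-up data (not the CPS statistics above):

```python
# Sketch: covariance, correlation, and their relation to the OLS slope.
# The data below are invented for illustration.
x = [8, 10, 12, 14, 16]
y = [6.0, 7.5, 8.0, 10.5, 11.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
s_x = (sum((a - xbar) ** 2 for a in x) / (n - 1)) ** 0.5
s_y = (sum((b - ybar) ** 2 for b in y) / (n - 1)) ** 0.5

r_xy = s_xy / (s_x * s_y)       # correlation of x and y
slope = s_xy / s_x ** 2         # least-squares slope b

# The slope is the correlation rescaled by the ratio of standard deviations.
assert abs(slope - r_xy * s_y / s_x) < 1e-12
print(r_xy, slope)
```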