Lecture 9: Interactions, Quadratic Terms and Splines

Ani Manichaikul (amanicha@jhsph.edu), April 2007

Reminder: Nested models
- The parent model contains one set of variables.
- The extended model adds one or more new variables to the parent model.
- One variable added: compare the models with a t test.
- Two or more variables added: compare the models with an F test.
- We return to the example of wage versus experience.

Effect Modification
- Effect modification is the phenomenon in which the relationship between the primary predictor and the outcome varies across levels of another predictor.
- We say the other predictor modifies the effect of the primary predictor on the outcome.
- In linear regression, it is coded by including an interaction term between the primary predictor and the other predictor.

Model 1
\( E[\text{Wage}] = \hat\beta_0 + \hat\beta_1(\text{Experience}) + \hat\beta_2(\text{Gender}) \)
This model allows the average wage to differ for men and women, but the difference in average wage between men and women is always the same regardless of experience level.

Model 1: output

Source        SS          df     MS
Model         2651.50       2    1325.75
Residual     11425.20     531      21.52
Total        14076.70     533      26.41

Number of obs = 534, F(2, 531) = 61.62, Prob > F = 0.0000, R-squared = 0.1884, Adj R-squared = 0.1853, Root MSE = 4.6386

wagehr       Coef.    Std. Err.     t      P>|t|    [95% Conf. Interval]
educyrs      0.751     0.077       9.78    0.000     0.600     0.902
gender      -2.124     0.403      -5.27    0.000    -2.915    -1.333
_cons        0.218     1.036       0.21    0.834    -1.818     2.254
(values rounded)

Model 2: Creating the interaction variable
gender: 0 for men, 1 for women
gender*experience = 0*experience = 0 for men
gender*experience = 1*experience = experience for women

Model 2
\( E[\text{Wage}] = \hat\beta_0 + \hat\beta_1(\text{Experience}) + \hat\beta_2(\text{Gender}) + \hat\beta_3(\text{Gender} \times \text{Experience}) \)
What is the interaction variable?

Model 2: output

. generate gender_educ = gender*educ
. reg wagehr educyrs gender gender_educ

Source        SS          df     MS
Model         2677.43       3     892.48
Residual     11399.27     530      21.51
Total        14076.70     533      26.41

Number of obs = 534, F(3, 530) = 41.50, Prob > F = 0.0000, R-squared = 0.1902, Adj R-squared = 0.1856, Root MSE = 4.6377

wagehr        Coef.    Std. Err.     t      P>|t|    [95% Conf. Interval]
educyrs       0.683     0.099       6.92    0.000     0.489     0.877
gender       -4.370     2.086      -2.10    0.037    -8.466    -0.274
gender_educ   0.173     0.157       1.10    0.273    -0.136     0.481
_cons         1.105     1.314       0.84    0.401    -1.476     3.685
(values rounded)

Model 2: Interpretation
Equation for men:
\( E[\text{Wage}] = \hat\beta_0 + \hat\beta_1(\text{Experience}) \)
E[Wage] = 1.10 + 0.68(Experience)
Equation for women:
\( E[\text{Wage}] = (\hat\beta_0 + \hat\beta_2) + (\hat\beta_1 + \hat\beta_3)(\text{Experience}) \)
E[Wage] = (1.10 - 4.37) + (0.68 + 0.17)(Experience)
- \(\beta_2\): change in mean wage for women vs. men with no experience
- \(\beta_3\): change in slope (of experience) for women vs. men

Model 2: Predictions by gender, 1 year of experience
Men with 1 year of experience:
E[Wage] = 1.10 + 0.68(1) - 4.37(0) + 0.17(0 x 1) = 1.10 + 0.68 = \( \hat\beta_0 + \hat\beta_1 \)
Women with 1 year of experience:
E[Wage] = 1.10 + 0.68(1) - 4.37(1) + 0.17(1 x 1) = 1.10 + 0.68 - 4.37 + 0.17 = \( \hat\beta_0 + \hat\beta_1 + \hat\beta_2 + \hat\beta_3 \)
\( \hat\beta_2 + \hat\beta_3 \) is the difference in mean wage between women and men with one year of experience.

Model 2: Predictions by gender, no experience
Men with no experience:
E[Wage] = 1.10 + 0.68(0) - 4.37(0) + 0.17(0 x 0) = 1.10 = \( \hat\beta_0 \)
Women with no experience:
E[Wage] = 1.10 + 0.68(0) - 4.37(1) + 0.17(1 x 0) = 1.10 - 4.37 = \( \hat\beta_0 + \hat\beta_2 \)
\( \hat\beta_2 \) is the difference in mean wage between women and men with no experience.

Model 2: Predictions by gender, 2 years of experience
Men with 2 years of experience:
E[Wage] = 1.10 + 0.68(2) - 4.37(0) + 0.17(0 x 2) = 1.10 + 0.68(2) = \( \hat\beta_0 + 2\hat\beta_1 \)
Women with 2 years of experience:
E[Wage] = 1.10 + 0.68(2) - 4.37(1) + 0.17(1 x 2) = 1.10 + 0.68(2) - 4.37 + 0.17(2) = \( \hat\beta_0 + 2\hat\beta_1 + \hat\beta_2 + 2\hat\beta_3 \)
\( \hat\beta_2 + 2\hat\beta_3 \) is the difference in mean wage between women and men with two years of experience.
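The predictions above can be reproduced with a few lines of Python. This is only a sketch: the coefficient values are the rounded Model 2 estimates (1.10, 0.68, -4.37, 0.17), and the function names are made up for illustration. It shows that the gap between women and men at a given number of years is beta2 + years x beta3.

```python
# Rounded Model 2 estimates, as read from the regression output (assumed values).
B0, B1, B2, B3 = 1.10, 0.68, -4.37, 0.17


def predicted_wage(years, gender):
    """Predicted mean hourly wage; gender is 0 for men and 1 for women."""
    return B0 + B1 * years + B2 * gender + B3 * gender * years


def gender_gap(years):
    """Women minus men at the same number of years: B2 + years * B3."""
    return predicted_wage(years, 1) - predicted_wage(years, 0)


# One row per experience level: years, men, women, women minus men.
for years in (0, 1, 2):
    print(years,
          round(predicted_wage(years, 0), 2),
          round(predicted_wage(years, 1), 2),
          round(gender_gap(years), 2))
```

Because the interaction coefficient is positive, the estimated gap shrinks slightly with each additional year, although the test later in the lecture does not find this change in slope to be statistically significant.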

Model 2: Interpretation
- \(\beta_0\): the average wage for men with no experience
- \(\beta_1\): the difference in average wage for a one-year increase in experience among men
- \(\beta_2\): the difference in average wage between women and men with no experience
- \(\beta_3\): the difference, between women and men, of the difference in average wage for a one-year increase in experience. This is the change in slope between women and men; the slope for women is \(\beta_1 + \beta_3\).

Is the change in slope statistically significant?
- Test Model 1 vs. Model 2. Only one variable was added, so use the t test for that variable to compare the models.
- \(H_0: \beta_3 = 0\) in the population
- From the t statistic, p = 0.27. Fail to reject \(H_0\).
- Conclude that Model 1, without the interaction, is better.

Compare to Model 1
In the parent model:
- \(\beta_1\) was the slope for both men and women.
- \(\beta_2\) was the difference between women and men at every experience level.
In the extended model (with interaction):
- \(\beta_1\) is the slope for men.
- \(\beta_2\) is the difference between women and men for experience = 0.
- \(\beta_3\) is the change in slope per year of experience between men and women.

Model 3: Interaction of two binary predictors
- Model 2: continuous \(X_1\), binary \(X_2\), and their interaction. The slope changes by group.
- Model 3: binary \(X_1\), binary \(X_2\), and their interaction. The difference in means changes by group.

Model 3: output

Source        SS          df     MS
Model         1029.59       3     343.20
Residual     13047.11     530      24.62
Total        14076.70     533      26.41

Number of obs = 534, F(3, 530) = 13.94, Prob > F = 0.0000, R-squared = 0.0731, Adj R-squared = 0.0679, Root MSE = 4.9616

wagehr           Coef.    Std. Err.     t      P>|t|    [95% Conf. Interval]
gender          -0.095     0.735      -0.13    0.897    -1.539     1.349
married          2.521     0.612       4.12    0.000     1.319     3.724
gender_married  -3.097     0.907      -3.41    0.001    -4.880    -1.315
_cons            8.355     0.494      16.92    0.000     7.385     9.325
(values rounded)

Graph for Model 3
[Figure: mean hourly wage for unmarried men, unmarried women, married men and married women, with the differences between groups labelled \(\beta_1\), \(\beta_2\), \(\beta_1 + \beta_3\) and \(\beta_2 + \beta_3\). \(\beta_3\) = difference of differences.]

Model 3: Creating the interaction variable
gender: 0 for men, 1 for women
married: 0 if unmarried, 1 if married
gender*married = 0*0 = 0 for unmarried men
gender*married = 1*0 = 0 for unmarried women
gender*married = 0*1 = 0 for married men
gender*married = 1*1 = 1 for married women

Model 3: Interpretation
- \(\beta_0\): the average wage for unmarried men
- \(\beta_1\): the difference in average wage between unmarried women and unmarried men
- \(\beta_1 + \beta_3\): the difference in average wage between married women and married men
- \(\beta_3\): the difference of the differences, that is, the difference in average wage between married women and married men minus the difference between unmarried women and unmarried men
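As a check on this interpretation, the four group means can be rebuilt from the coefficients. The sketch below assumes the Model 3 estimates rounded to two decimals (8.35, -0.10, 2.52, -3.10); the names are illustrative only.

```python
# Model 3 estimates rounded to two decimals (assumed values from the output).
B0, B_GENDER, B_MARRIED, B_INTERACTION = 8.35, -0.10, 2.52, -3.10


def cell_mean(gender, married):
    """Mean hourly wage; gender: 0 men, 1 women; married: 0 no, 1 yes."""
    return (B0 + B_GENDER * gender + B_MARRIED * married
            + B_INTERACTION * gender * married)


# Women minus men, separately within each marital status.
gap_unmarried = cell_mean(1, 0) - cell_mean(0, 0)   # equals beta1
gap_married = cell_mean(1, 1) - cell_mean(0, 1)     # equals beta1 + beta3
diff_of_diffs = gap_married - gap_unmarried         # equals beta3

for gender, married in ((0, 0), (1, 0), (0, 1), (1, 1)):
    print(gender, married, round(cell_mean(gender, married), 2))
print(round(gap_unmarried, 2), round(gap_married, 2), round(diff_of_diffs, 2))
```

With two binary predictors and their interaction, the model has one parameter per group, so it simply reproduces the four group means.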

Model 3: Interpretation
- \(\beta_0\): the average wage for unmarried men
- \(\beta_2\): the difference in average wage between married men and unmarried men
- \(\beta_2 + \beta_3\): the difference in average wage between married women and unmarried women
- \(\beta_3\): the difference of the differences, that is, the difference in average wage between married women and unmarried women minus the difference between married men and unmarried men

Model 3: conclusion
- The coefficient of the interaction variable is statistically significantly different from 0 (p = 0.001, CI: -4.9 to -1.3).
- The difference in mean hourly wage between women and men is greater for married people than for unmarried people.
- Or, equivalently: the difference in mean hourly wage between married people and unmarried people is greater for men than for women.

Summary: Interaction
- interaction = var1*var2
- The interaction variable changes the interpretation of the entire model.
- With an interaction, the effect of one variable changes according to the level of the second variable.
- Test for interaction by testing the new variable. If it is significant (p < \(\alpha\), 0 not in the CI), keep it. If it is not significant, go back to the parent model without the interaction variable.

Flexibility in linear models
- In linear regression, we assume the outcome, Y, has a linear relationship with the predictors, X.
- However, we have flexibility in defining the predictors: we can transform X, for example to \(X^2\) or \(X^3\), or use linear splines to fit "broken arrow" models.

Example: Hospital Expenditures ($$)
The data are similar to an example from the book by Pagano and Gauvreau, Principles of Biostatistics.
Data:
- Y: average hospital expenditure ($s) per admission
- \(X_1\): average length of stay (days)
- \(X_2\): average employee salary ($s)
- n = 51; 50 U.S. states + DC

Model
We might formulate a multiple linear regression (MLR):
1) \( Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon \)
2) \( \varepsilon \sim N(0, \sigma^2) \)
where Y = expenditures per admission in $s, \(X_1\) = length of stay (LOS) in days, and \(X_2\) = salary in $s.

Scientific Question
How is per capita expenditure (Y) related to length of stay (\(X_1\)) and employee salary (\(X_2\))?
Model: \( E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \)

Parameter Interpretations
- \(\beta_0\): expected expenditure when LOS = 0 and salary = 0. (Need to center the model!)
- \(\beta_1\): difference in expected expenditure ($s) for two states with the same average salary but LOS that differs by one day
- \(\beta_2\): difference in expected per capita expenditure ($s) for two states with the same average LOS but salary that differs by one dollar

Basic Model

Source        SS           df     MS
Model        25555145        2    12777573
Residual     13311255       48      277318
Total        38866400       50      777328

Number of obs = 51, F(2, 48) = 46.08, Prob > F = 0.0000, R-squared = 0.6575, Adj R-squared = 0.6432, Root MSE = 526.61

expend       Coef.      Std. Err.     t      P>|t|    [95% Conf. Interval]
los           313.53      73.44      4.27    0.000      165.87     461.19
salary         0.333      0.038      8.79    0.000       0.257      0.410
_cons       -4662.34     808.72     -5.77    0.000    -6288.35   -3036.34
(values rounded)

Diagnosis
Check for curvature and other patterns of interest.
[Figures: added-variable plots of e(expend | X) against e(los | X) and against e(salary | X); standardized residuals plotted against length of stay (days) and against salary ($).]
The Alaskan outlier appears here, as well as some curvature in the salary relationship.

New Model
There appears to be a non-linear relationship between expenditures (Y) and salary (\(X_2\)). How could we incorporate this in our model? Define a new variable, salary squared, and include it in the model:
\( E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_2^2 \)
- Linear relationship with \(X_1\)
- Quadratic relationship with \(X_2\)

Quadratic Term
Expenditures are linearly related to length of stay, but have a quadratic relationship with salary. Define a new variable, salary2 = salary^2, and include it in the regression.

Model Output

Source        SS           df     MS
Model        17552265        3     5850755
Residual      1885257.79    46       40983.87
Total        19437522.9     49      396684.14

Number of obs = 50, F(3, 46) = 142.76, Prob > F = 0.0000, R-squared = 0.9030, Adj R-squared = 0.8967, Root MSE = 202.44

expend       Coef.       Std. Err.     t       P>|t|    [95% Conf. Interval]
los            442.00      29.35      15.06    0.000      382.94      501.06
salary          -2.88       0.29      -9.84    0.000       -3.47       -2.29
salary2        0.0001   9.58e-06      10.46    0.000   0.0000809   0.0001195
_cons        19724.65    2206.54       8.94    0.000    15283.11    24166.19
(values rounded)

Interpretations
- \(\beta_0\): ???
- \(\beta_1\): We estimate that expected expenditures per admission will be $442 higher (95% CI from the model output: $383 to $501) in a state whose average LOS is one day longer than another state with the same average employee salary.
- \(\beta_2\): ???
- \(\beta_3\): ???

Inferences
Is salary related to expenditures?
- Could test \(H_0: \beta_2 = 0\)? Or \(H_0: \beta_3 = 0\)?
- But we really want \(H_0: \beta_2 = \beta_3 = 0\), an overall test for salary.

Hospital Example
Recall the model: \( E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_2^2 \)
\( H_0: \beta_2 = \beta_3 = 0 \) (Test by hand: need \(SSE_E\) and \(SSE_F\).)

Null Model Results
Null model: \( E(Y \mid X) = \beta_0 + \beta_1 X_1 \)

Source        SS           df     MS
Model         9621038.76     1     9621038.76
Residual      9816484.1     48      204510.09
Total        19437522.9     49      396684.14

Number of obs = 50, F(1, 48) = 47.04, Prob > F = 0.0000, R-squared = 0.4950, Adj R-squared = 0.4845, Root MSE = 452.23

expend       Coef.     Std. Err.     t      P>|t|    [95% Conf. Interval]
los           443.36     64.64      6.86    0.000     313.39     573.32
_cons           -787       490     -1.60    0.115      -1773        199
(values rounded)

\(SSE_E\) = 9816484.1, s = 2 (the number of variables added in the full model)

Full Model Results
The full model is the quadratic model fitted earlier: F(3, 46) = 142.76, R-squared = 0.9030, Residual SS = 1885257.79 on 46 df.
\(SSE_F\) = 1885257.79, n - p - s - 1 = 50 - 1 - 2 - 1 = 46

F-test Results
\[ F_{2,46} = \frac{(SSE_E - SSE_F)/s}{SSE_F/(n - p - s - 1)} = \frac{(9816484.1 - 1885257.79)/2}{1885257.79/(50 - 1 - 2 - 1)} = \frac{7931226.3/2}{1885257.79/46} \approx 96.76 \]
(p < 0.001; \(F_{0.05,2,46} = 3.20\))
Reject the null: conclude that the salary effects are statistically significant in the regression model.
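The by-hand F test can be checked with a short calculation. This is a minimal sketch using the two residual sums of squares quoted from the Stata output; the function name and argument names are illustrative, not part of any library.

```python
def nested_f_statistic(sse_reduced, sse_full, n, p_reduced, s_added):
    """F statistic for comparing nested linear models.

    sse_reduced: residual SS of the null (parent) model
    sse_full:    residual SS of the full (extended) model
    n:           number of observations
    p_reduced:   number of predictors in the null model
    s_added:     number of predictors added in the full model
    """
    df_full = n - p_reduced - s_added - 1
    numerator = (sse_reduced - sse_full) / s_added
    denominator = sse_full / df_full
    return numerator / denominator, df_full


# Residual sums of squares taken from the null and full model output.
f_stat, df_full = nested_f_statistic(9816484.1, 1885257.79,
                                     n=50, p_reduced=1, s_added=2)
print(round(f_stat, 2), df_full)
```

The statistic is compared with an F distribution on (s, n - p - s - 1) = (2, 46) degrees of freedom, as on the slide.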

Linear Splines: set-up
The "broken arrow" model
- Example: A researcher tells you most Health Management Organizations (HMOs) will usually pay for the first week of a hospital stay only.
- She expects expenditures to increase dramatically if LOS is longer than one week.
- How should we set up the model?

Defining a New Variable
Similar to what we did in ANCOVA, we could define a new variable that checks whether the slope is indeed different when LOS is greater than 7.
Idea: include the term
\( (\text{LOS} - 7)^+ = \text{LOS} - 7 \) if LOS > 7
\( (\text{LOS} - 7)^+ = 0 \) if LOS <= 7
The spline allows you to change the magnitude of the slope!

The researcher thought the LOS regression line should look like:
[Figure: Broken Arrow Model. Expenditures plotted against length of stay (days).]

When to use a spline?
- When a continuous predictor is used, a typical regression equation assumes there is a straight-line relationship between X and Y in the population.
- If the relationship between X and Y is a bent line or a curve, adding a spline may more accurately model the relationship between X and Y.

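Visualizing the Model
Model: \( E(\text{expenditures}) = \beta_0 + \beta_1 \text{LOS} + \beta_2 (\text{LOS} - 7)^+ \)
where \( (\text{LOS} - 7)^+ = \text{LOS} - 7 \) if LOS > 7, and 0 if LOS <= 7.
Then:
\( E(\text{expenditures} \mid \text{LOS} \le 7) = \beta_0 + \beta_1 \text{LOS} \)
\( E(\text{expenditures} \mid \text{LOS} > 7) = \beta_0 + \beta_1 \text{LOS} + \beta_2(\text{LOS} - 7) = (\beta_0 - 7\beta_2) + (\beta_1 + \beta_2)\text{LOS} = \beta_0^* + \beta_1^* \text{LOS} \)
[Figure: Broken Arrow Model. Expenditures plotted against length of stay (days); slope = \(\beta_1\) up to 7 days and slope = \(\beta_1 + \beta_2\) beyond 7 days.]

New Model
\( E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 (X_1 - 7)^+ + \beta_3 X_2 + \beta_4 X_2^2 \)
- Broken arrow relationship with \(X_1\)
- Quadratic relationship with \(X_2\)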
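A small sketch may help make the spline term concrete. The coefficient values below are hypothetical, chosen only to show the change in slope at the knot; they are not estimates from the hospital data, and the function names are illustrative.

```python
def hinge(x, knot):
    """Linear spline term (x - knot)+ : zero up to the knot, x - knot after it."""
    return max(x - knot, 0.0)


def broken_arrow_mean(los, b0, b1, b2, knot=7.0):
    """Mean expenditure under E = b0 + b1*LOS + b2*(LOS - knot)+."""
    return b0 + b1 * los + b2 * hinge(los, knot)


# Hypothetical coefficients, for illustration only.
b0, b1, b2 = 1000.0, 200.0, 350.0

# A one-day increase below the knot changes the mean by b1 ...
slope_below = broken_arrow_mean(5, b0, b1, b2) - broken_arrow_mean(4, b0, b1, b2)
# ... and above the knot by b1 + b2.
slope_above = broken_arrow_mean(9, b0, b1, b2) - broken_arrow_mean(8, b0, b1, b2)
print(slope_below, slope_above)
```

The two line segments meet at the knot, because the spline term is exactly zero there.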

Adding Spline to Quadratic
Expenditures have a different linear relationship before and after a 7-day length of stay, and have a quadratic relationship with salary. We'll just define a new variable, los7 = (los-7)*(los>7), and include it in the regression.

Results

Source        SS           df     MS
Model        17844348        4     4461087
Residual      1593174.87    45       35403.89
Total        19437522.9     49      396684.14

Number of obs = 50, F(4, 45) = 126.01, Prob > F = 0.0000, R-squared = 0.9180, Adj R-squared = 0.9108, Root MSE = 188.16

expend       Coef.       Std. Err.     t       P>|t|    [95% Conf. Interval]
los             212.5       84.4       2.52    0.015        42.5       382.6
los7            347.8      121.1       2.87    0.006       103.9       591.6
salary          -3.14       0.29     -10.95    0.000       -3.72       -2.57
salary2      0.000108   9.32e-06      11.60    0.000   0.0000894   0.0001269
_cons        23276.97    2394.89       9.72    0.000    18453.41    28100.53
(values rounded)

Centering LOS in the expenditures model
- Y: average hospital expenditure ($s) per admission
- \(X_1\): average length of stay (days)
- \(X_2\): average employee salary ($1000s)
Centered model:
\( E(Y \mid X) = \beta_0 + \beta_1 (X_1 - 7) + \beta_2 (X_1 - 7)^+ + \beta_3 (X_2 - 15) + \beta_4 (X_2 - 15)^2 \)

Final Model for Expenditures

Source        SS           df     MS
Model        17844345        4     4461086
Residual      1593177.63    45       35403.95
Total        19437522.9     49      396684.14

Number of obs = 50, F(4, 45) = 126.01, Prob > F = 0.0000, R-squared = 0.9180, Adj R-squared = 0.9108, Root MSE = 188.16

expend       Coef.     Std. Err.     t       P>|t|    [95% Conf. Interval]
losc          212.5      84.4       2.52    0.015       42.5      382.6
losc7         347.8     121.1       2.87    0.006      103.9      591.6
salc          101.7      19.7       5.16    0.000       62.0      141.4
salc2         108.2       9.3      11.60    0.000       89.4      126.9
_cons        1954.4      68.7      28.45    0.000     1816.0     2092.8
(values rounded)

\( E(Y \mid X) = 1954 + 213(X_1 - 7) + 348(X_1 - 7)^+ + 102(X_2 - 15) + 108(X_2 - 15)^2 \)
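The fitted centered equation can be evaluated directly. The sketch below assumes the rounded coefficients 1954, 213, 348, 102 and 108, with LOS in days and salary in $1000s; the function name is illustrative. It builds the spline term the same way as the Stata expression (los-7)*(los>7), relying on the comparison evaluating to 1 or 0.

```python
def expected_expenditure(los, salary_k):
    """Fitted mean expenditure per admission ($) from the centered model.

    los:      average length of stay in days
    salary_k: average employee salary in $1000s
    Coefficients are the rounded estimates (assumed values).
    """
    los_c = los - 7                 # LOS centered at the knot
    los_c7 = (los - 7) * (los > 7)  # spline term: 0 unless LOS exceeds 7
    sal_c = salary_k - 15           # salary centered at 15 ($1000s)
    return 1954 + 213 * los_c + 348 * los_c7 + 102 * sal_c + 108 * sal_c ** 2


# At the centering point the prediction is just the intercept.
print(expected_expenditure(7, 15))
# One day shorter and one day longer than 7 days, at the same salary.
print(expected_expenditure(6, 15), expected_expenditure(8, 15))
```

Centering is what makes the intercept readable: it is the expected expenditure for a state with a 7-day average LOS and an average salary at the centering value, rather than for an impossible state with LOS = 0 and salary = 0.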

Back to modelling wages
[Figure: standardized residuals plotted against age.]
We removed an outlier, but do we still need a spline?

How should we add the spline?
- Goal: let the regression line bend.
- Model: \( E(\text{Wage}) = \beta_0 + \beta_1(\text{age} - 35) + \beta_2(\text{age} - 35)^+ \)
- What is \((\text{age} - 35)^+\)? It is 0 if age < 35 and (age - 35) if age >= 35.

Fitted model with spline at 35

Source        SS          df     MS
Model         1231.66       2     615.83
Residual     11584.14     530      21.86
Total        12815.80     532      24.09

Number of obs = 533, F(2, 530) = 28.18, Prob > F = 0.0000, R-squared = 0.0961, Adj R-squared = 0.0927, Root MSE = 4.6751

wagehr        Coef.    Std. Err.     t      P>|t|    [95% Conf. Interval]
age_cent       0.33     0.047       7.07    0.000      0.24      0.43
age_spline    -0.37     0.066      -5.65    0.000     -0.50     -0.24
_cons         10.45     0.358      29.22    0.000      9.75     11.16
(values rounded)

Fitted Graph (with spline)
[Figure: fitted wage ($/hour) plotted against age, with the bend at age 35.]

Better Interpretation
\( E(\text{Wage}) = 10.45 + 0.33(\text{age} - 35) - 0.37(\text{age} - 35)^+ \)
For a person under 35, the spline term is 0:
E(Wage) = 10.45 + 0.33(age - 35)
For a person 35 or older, the spline term is (age - 35):
E(Wage) = 10.45 + 0.33(age - 35) - 0.37(age - 35) = 10.45 - 0.04(age - 35)
- The average wage for people who are 35 years old is $10.45/hour (95% CI: $9.75, $11.16).
- For each additional year of age, those under age 35 earn an average of $0.33 more per hour (95% CI: $0.24, $0.43).
- For each additional year of age, those over age 35 earn an average of $0.04 less per hour (95% CI: -$0.10, $0.01).
- \(\beta_1 + \beta_2\) = new slope for those over 35

Interpretation
- \(\beta_0\) is the average wage for people who are 35 years old.
- \(\beta_1\) is the change in average wage per additional year of age for those under 35.
- \(\beta_2\) is the difference in the change in average wage per additional year of age for those over age 35 as compared to those under age 35. That is, \(\beta_2\) is the change in the slope for over 35 vs. under 35.

Is the change in slope statistically significant?
One variable was added to create the change in slope, so compare the nested models with a t test.

. regress wagehr age_cent age_spline if sres_age<6

The output is the fitted model with the spline at 35 shown earlier. For age_spline, the coefficient is -0.37 with t = -5.65, P>|t| = 0.000 and 95% CI (-0.50, -0.24).
- \(H_0\): the spline is not needed (no change in slope in the population).
- p < 0.001, or equivalently the CI does not include 0: reject \(H_0\).
- Conclude that the slope differs for those over vs. under 35 in the population.
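The two slopes can be read off the fitted spline model with a short calculation. This sketch assumes the rounded estimates 10.45, 0.33 and -0.37 and a knot at age 35; the names are illustrative.

```python
# Rounded estimates from the fitted spline model (assumed values).
B0, B1, B2 = 10.45, 0.33, -0.37
KNOT = 35


def expected_wage(age):
    """Fitted mean hourly wage: B0 + B1*(age - 35) + B2*(age - 35)+."""
    return B0 + B1 * (age - KNOT) + B2 * max(age - KNOT, 0)


slope_under_35 = B1        # change in mean wage per year of age below the knot
slope_over_35 = B1 + B2    # change in mean wage per year of age above the knot

print(round(slope_under_35, 2), round(slope_over_35, 2))
for age in (25, 35, 45):
    print(age, round(expected_wage(age), 2))
```

The spline coefficient itself is only the change in slope; the slope above the knot is the sum of the two coefficients, which is why its confidence interval cannot be read directly from the age_spline row of the output.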

L: Linear relationship
- With the spline, there is no longer any pattern in the residuals.
- After removing the one outlier, no others appear to stand out.
N: Normality of the residuals
- The residuals are slightly skewed to positive values.
- The estimated regression coefficients are still correct, but their confidence intervals may be misleading.
I: Independence
- We cannot check this by looking at the data.
E: Equal variance of the residuals across X
- The vertical spread of the residuals may be smaller for those under 25 years of age.
- The estimated regression coefficients are still correct, but their confidence intervals may be misleading.

Conclusion
The increase in hourly wage with increasing age is statistically significant for those who recently entered the workforce (ages 18-35): for each additional year, these workers earn an average of 33 cents more per hour. However, this increase in wage with increasing age levels off for those over age 35, so that no appreciable increase in average wage is observed for those over age 35.
One 21-year-old had much higher earnings ($44.50 per hour) than other young workers. This person's results were so unlike the rest of the sample that the observation was dropped from the analysis. It is possible that the data were incorrectly entered for this person, but we are unable to assess the data entry since the original completed surveys are unavailable.

Splines
- Splines are used to allow the regression line to bend.
- The breakpoint is arbitrary and decided graphically.
- The actual slope above and below the breakpoint is usually of more interest than the coefficient for the spline (i.e., the change in slope).