PubH 7405: REGRESSION ANALYSIS. SLR: INFERENCES, Part II


We cover the topic of inference in two sessions; the first session focused on inferences concerning the slope and the intercept; this is a continuation on estimating the mean response and more. Applications concerning the slope and the intercept are based on the following four (4) theorems:

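SAMPLING DISTRIBUTION OF SLOPE
Theorem 1A: Under the "Normal Error Regression Model"
$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),$$
the sampling distribution of the estimated slope $b_1$ is Normal with mean and variance
$$E(b_1) = \beta_1, \qquad \sigma^2(b_1) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}.$$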

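IMPLICATION
$$\frac{b_1 - \beta_1}{s(b_1)} = \frac{(b_1 - \beta_1)/\sigma(b_1)}{s(b_1)/\sigma(b_1)}, \qquad \frac{b_1 - \beta_1}{\sigma(b_1)} \sim N(0, 1), \qquad \frac{s^2(b_1)}{\sigma^2(b_1)} \sim \frac{\chi^2_{df}}{df}, \quad df = n - 2,$$
where $s^2(b_1) = MSE / \sum (x_i - \bar{x})^2$.
Theorem 1B: $(b_1 - \beta_1)/s(b_1)$ is distributed as "t" with $n - 2$ degrees of freedom.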

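CONFIDENCE INTERVALS
Theorem 1B: $(b_1 - \beta_1)/s(b_1)$ is distributed as "t" with $n - 2$ degrees of freedom.
The $(1 - \alpha)100\%$ Confidence Interval for $\beta_1$ is:
$$b_1 \pm t(1 - \alpha/2;\, n - 2)\, s(b_1)$$
where $t(1 - \alpha/2;\, n - 2)$ is the $(1 - \alpha/2)100$ percentile of the "t" distribution with $n - 2$ degrees of freedom.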
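The interval for the slope can be sketched numerically. The following is a minimal illustration in Python, not part of the original slides; it uses the birth weight data that appear later in these notes and takes the critical value 2.228 for t(0.975; 10) as an assumed table value rather than computing it.

```python
import math

# Birth weight x (oz) and growth y (% of birth weight), as listed in Example #1.
x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates and the error mean square.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)

# Standard error of the slope and the 95% interval b1 +/- t * s(b1).
se_b1 = math.sqrt(mse / sxx)
t_crit = 2.228  # assumed table value of t(0.975; n - 2 = 10)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

print(f"b1 = {b1:.3f}, s(b1) = {se_b1:.4f}")
print(f"95% CI for beta1: ({ci_b1[0]:.3f}, {ci_b1[1]:.3f})")
```

Because the whole interval lies below zero, the same calculation also anticipates the two-sided test of a zero slope at the 5% level.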

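SAMPLING DISTRIBUTION OF INTERCEPT
Theorem 2A: Under the "Normal Error Regression Model"
$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),$$
the sampling distribution of the estimated intercept $b_0$ is Normal with mean and variance
$$E(b_0) = \beta_0, \qquad \sigma^2(b_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right].$$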

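IMPLICATION
$$\frac{b_0 - \beta_0}{s(b_0)} = \frac{(b_0 - \beta_0)/\sigma(b_0)}{s(b_0)/\sigma(b_0)}, \qquad \frac{b_0 - \beta_0}{\sigma(b_0)} \sim N(0, 1), \qquad \frac{s^2(b_0)}{\sigma^2(b_0)} \sim \frac{\chi^2_{df}}{df}, \quad df = n - 2,$$
where $s^2(b_0) = MSE \left[ 1/n + \bar{x}^2 / \sum (x_i - \bar{x})^2 \right]$.
Theorem 2B: $(b_0 - \beta_0)/s(b_0)$ is distributed as "t" with $n - 2$ degrees of freedom.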

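CONFIDENCE INTERVALS
Theorem 2B: $(b_0 - \beta_0)/s(b_0)$ is distributed as "t" with $n - 2$ degrees of freedom.
The $(1 - \alpha)100\%$ Confidence Interval for $\beta_0$ is:
$$b_0 \pm t(1 - \alpha/2;\, n - 2)\, s(b_0)$$
where $t(1 - \alpha/2;\, n - 2)$ is the $(1 - \alpha/2)100$ percentile of the "t" distribution with $n - 2$ degrees of freedom.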
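As an illustration of the intercept interval, here is a short Python sketch (an addition, not from the slides) using the Age and SBP data that appear later in these notes. The critical value 2.160 for t(0.975; 13) is an assumed table value.

```python
import math

# Age (x) and systolic blood pressure (y), as listed in Example #2.
x = [42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75]
y = [130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160, 125, 150, 165]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

# s^2(b0) = MSE * [1/n + x_bar^2 / Sxx]
se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
t_crit = 2.160  # assumed table value of t(0.975; n - 2 = 13)
ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)

print(f"b0 = {b0:.3f}, s(b0) = {se_b0:.3f}")
print(f"95% CI for beta0: ({ci_b0[0]:.2f}, {ci_b0[1]:.2f})")
```

The interval is wide because x = 0 lies far outside the observed ages, so the intercept is estimated with little precision.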

The Mean Response: $E(Y \mid X = x) = \beta_0 + \beta_1 x$
A common objective in regression analysis is to estimate the mean response. For example, we may be interested in the average blood pressure for women at a certain age and in how to estimate it using the relationship between SBP and Age; and in a study of the relationship between level of pay (salary, X) and worker productivity (Y), the mean productivity at high, medium, and low levels of pay may be of particular interest to any company.

POINT ESTIMATE
The Mean Response: $E(Y \mid X = x) = \beta_0 + \beta_1 x$
Let $x_h$ denote the level of X for which we wish to estimate the mean response, i.e. $E(Y \mid X = x_h)$; this may be a value which occurred in the sample, or it may be some other value of the predictor variable within the scope of the model. The point estimate of the response is:
$$\hat{Y}_h = b_0 + b_1 x_h$$

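SAMPLING DISTRIBUTION
Theorem #3A: Under the "Normal Error Regression Model"
$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),$$
the sampling distribution of the estimated mean response $\hat{Y}_h$ is Normal with mean and variance
$$E(\hat{Y}_h) = E(Y \mid X = x_h) = \beta_0 + \beta_1 x_h, \qquad \sigma^2(\hat{Y}_h) = \sigma^2 \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right].$$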

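The sampling distribution of $\hat{Y}_h$ is normal because this estimated mean response, like the intercept and the slope, is a linear combination of the observations $y_i$, and the distribution of each observation is normal under the normal error regression model:
$$\hat{Y}_h = b_0 + b_1 x_h = \bar{y} + b_1 (x_h - \bar{x}) = \sum_{i=1}^{n} \left[ \frac{1}{n} + k_i (x_h - \bar{x}) \right] y_i, \qquad k_i = \frac{x_i - \bar{x}}{\sum (x_j - \bar{x})^2}.$$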

The estimated mean response is unbiased because the estimated intercept and estimated slope are both unbiased:
$$E(\hat{Y}_h) = E(b_0 + b_1 x_h) = E(b_0) + x_h E(b_1) = \beta_0 + \beta_1 x_h = E(Y \mid X = x_h).$$

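Since the observations are independent, each with variance $\sigma^2$, and since $\sum k_i = 0$ and $\sum k_i^2 = 1 / \sum (x_i - \bar{x})^2$:
$$Var(\hat{Y}_h) = \sigma^2 \sum_{i=1}^{n} \left[ \frac{1}{n} + k_i (x_h - \bar{x}) \right]^2 = \sigma^2 \left[ \frac{1}{n} + \frac{2 (x_h - \bar{x})}{n} \sum k_i + (x_h - \bar{x})^2 \sum k_i^2 \right] = \sigma^2 \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right].$$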

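Replacing $\sigma^2$ by MSE gives the estimated variance:
$$Var(\hat{Y}_h) = \sigma^2 \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right], \qquad s^2(\hat{Y}_h) = MSE \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right].$$
Taking the square root gives the Standard Error.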

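$$s^2(\hat{Y}_h) = MSE \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right], \qquad SE(\hat{Y}_h) = \sqrt{ MSE \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] }.$$
Implication: Our estimates are less precise toward the ends of the range of X, because the standard error grows with the distance of $x_h$ from $\bar{x}$.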
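The two facts used in this derivation, that the estimated mean response is a weighted sum of the observations and that the squared weights add up to the bracketed factor in the variance, can be checked numerically. The sketch below is an added illustration in Python; the choice of the birth weight data and of the level 95 is only for concreteness.

```python
import math

# Birth weight x (oz) and growth y (%), as listed in Example #1.
x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]
x_h = 95  # level of X at which the mean response is estimated

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# k_i are the weights that give the slope: b1 = sum(k_i * y_i).
k = [(xi - x_bar) / sxx for xi in x]
b1 = sum(ki * yi for ki, yi in zip(k, y))
b0 = y_bar - b1 * x_bar

# Weights of the estimated mean response: w_i = 1/n + k_i * (x_h - x_bar).
w = [1 / n + ki * (x_h - x_bar) for ki in k]

y_hat_direct = b0 + b1 * x_h
y_hat_weights = sum(wi * yi for wi, yi in zip(w, y))

factor_from_weights = sum(wi ** 2 for wi in w)
factor_from_formula = 1 / n + (x_h - x_bar) ** 2 / sxx

print(math.isclose(y_hat_direct, y_hat_weights))              # True
print(math.isclose(factor_from_weights, factor_from_formula)) # True
```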

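MORE ON SAMPLING DISTRIBUTION
$$\frac{\hat{Y}_h - E(Y \mid x_h)}{s(\hat{Y}_h)} = \frac{[\hat{Y}_h - E(Y \mid x_h)] / \sigma(\hat{Y}_h)}{s(\hat{Y}_h) / \sigma(\hat{Y}_h)}, \qquad \frac{\hat{Y}_h - E(Y \mid x_h)}{\sigma(\hat{Y}_h)} \sim N(0, 1), \qquad \frac{s^2(\hat{Y}_h)}{\sigma^2(\hat{Y}_h)} \sim \frac{\chi^2_{df}}{df}, \quad df = n - 2.$$
Theorem #3B: $[\hat{Y}_h - E(Y \mid x_h)] / s(\hat{Y}_h)$ is distributed as "t" with $n - 2$ degrees of freedom.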

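CONFIDENCE INTERVALS
Theorem #3B: $[\hat{Y}_h - E(Y \mid x_h)] / s(\hat{Y}_h)$ is distributed as "t" with $n - 2$ degrees of freedom.
The $(1 - \alpha)100\%$ Confidence Interval for $E(Y \mid x_h)$ is:
$$\hat{Y}_h \pm t(1 - \alpha/2;\, n - 2)\, s(\hat{Y}_h)$$
where $t(1 - \alpha/2;\, n - 2)$ is the $(1 - \alpha/2)100$ percentile of the "t" distribution with $n - 2$ degrees of freedom.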

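EXAMPLE #1: Birth weight data

x (oz):  112  111  107  119   92   80   81   84  118  106  103   94
y (%):    63   66   72   52   75  118  120  114   42   72   90   91

Intercept = 256.97
Slope = -1.737
MSE = 75.98
Mean of X = 100.58
SS of X = 2,156.93

For children with birth weight of 95 ounces, the point estimate and 95% Confidence Interval for the mean growth between 70-100 days (as % of BW) is:
$$\hat{Y}_h = b_0 + b_1 (95) = 91.757\%$$
$$s^2(\hat{Y}_h) = 75.98 \left[ \frac{1}{12} + \frac{(95 - 100.58)^2}{2{,}156.93} \right] = 7.429$$
$$91.76 \pm 2.228 \sqrt{7.43} = (85.69\%,\ 97.83\%)$$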
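The steps of this example can be retraced with a small helper. This is a sketch in Python added for illustration; the function name `mean_response_ci` is hypothetical, and the critical value 2.228 for t(0.975; 10) is taken as an assumed table value. Recomputing from the twelve data pairs reproduces the slope, MSE, mean of X and estimated variance quoted in the example, while the intercept and hence the point estimate come out somewhat lower (about 255.97 and 90.95), so the printed limits are best read as approximate.

```python
import math

def mean_response_ci(x, y, x_h, t_crit):
    """Point estimate, estimated variance and CI for E(Y | X = x_h) in SLR."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat = b0 + b1 * x_h
    s2 = mse * (1 / n + (x_h - x_bar) ** 2 / sxx)  # s^2 of the estimated mean
    half_width = t_crit * math.sqrt(s2)
    return y_hat, s2, (y_hat - half_width, y_hat + half_width)

# Birth weight x (oz) and growth y (%), as listed in the example.
x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]

# 2.228 is the assumed table value of t(0.975; 10).
y_hat, s2, ci = mean_response_ci(x, y, x_h=95, t_crit=2.228)
print(f"point estimate = {y_hat:.2f}, s^2 = {s2:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The same function applies to the other examples by changing the data, the level `x_h` and the critical value.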

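EXAMPLE #2: Age and SBP

Age (x):   42   46   42   71   80   74   70   80   85   72   64   81   41   61   75
SBP (y):  130  115  148  100  156  162  151  156  162  158  155  160  125  150  165

Intercept = 99.958
Slope = 0.705
MSE = 278.554
Mean of X = 65.6
SS of X = 3403.6

For 60-year-old women, the point estimate and 95% Confidence Interval for the mean SBP is:
$$\hat{Y}_h = 99.958 + 0.705 (60) = 142.26$$
$$s^2(\hat{Y}_h) = 278.554 \left[ \frac{1}{15} + \frac{(60 - 65.6)^2}{3403.6} \right] = 21.137$$
$$142.3 \pm 2.160 \sqrt{21.137} = (132.4,\ 152.2)$$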

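EXAMPLE #3: Toluca Company Data

LotSize  WorkHours     LotSize  WorkHours     LotSize  WorkHours
   80       399           50       157           90       377
   30       121           40       160          110       421
   50       221           70       252           30       273
   90       376           90       389           90       468
   70       361           20       113           40       244
   60       224          110       435           80       342
  120       546          100       420           70       323
   80       352           30       212
  100       353           50       268

Intercept = 62.366
Slope = 3.570
MSE = 2,384
Mean of X = 70.0
SS of X = 19,800

For a lot size of 65 units, the point estimate and 90% Confidence Interval for the mean Work Hours is:
$$\hat{Y}_h = 62.37 + 3.57 (65) = 294.4$$
$$s^2(\hat{Y}_h) = 2{,}384 \left[ \frac{1}{25} + \frac{(65 - 70.0)^2}{19{,}800} \right] = 98.37$$
$$294.4 \pm 1.714 \sqrt{98.37} = (277.4,\ 311.4)$$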

In regression analysis, besides estimating the mean response, sometimes one may want to estimate a new individual response. For example: in addition to estimating the average blood pressure for women at a certain age using the relationship between SBP and Age, we may be interested in estimating the SBP of a particular woman/patient at that age; and in a study of the relationship between pay (salary, X) and worker productivity (Y), the interest may focus on the productivity of a certain particular worker.

POINT ESTIMATE
Let $x_h$ denote the level of X under investigation, at which the mean response is $E(Y \mid X = x_h)$. Let $Y_{h(new)}$ be the value of the new individual response of interest. This new observation of Y to be predicted is often viewed as the result of a new trial, independent of the trials on which the regression line is formed. The point estimate is still the same as that of the mean response:
$$E(Y \mid X = x_h) = \beta_0 + \beta_1 x_h, \qquad \hat{Y}_{h(new)} = b_0 + b_1 x_h \quad \text{(same as the mean response)}.$$

VARIANCE
The point estimates of the mean response and of an individual response are the same, but the variances are different. In estimating an individual response, there are two layers of variation:
(a) the variation in the position of the distribution, that is, of the mean response; and
(b) the variation within that distribution, that is, from the individual response to the mean response.

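Theorem #4A: Under the "Normal Error Regression Model", the sampling distribution of the prediction error $Y_{h(new)} - \hat{Y}_h$ is normal, with mean zero and a variance that adds the two layers of variation:
$$Var(Y_{h(new)} - \hat{Y}_h) = Var(Y_{h(new)}) + Var(\hat{Y}_h) = \sigma^2 + \sigma^2 \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] = \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right].$$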

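Replacing $\sigma^2$ by MSE gives the estimated variance for a new response:
$$s^2(\hat{Y}_{h(new)}) = MSE \left[ 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right].$$
Taking the square root gives the Standard Error.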

MORE ON SAMPLING DISTRIBUTION
Inferences on a new individual response are based on the following result:
Theorem #4B: $(Y_{h(new)} - \hat{Y}_h) / s(\hat{Y}_{h(new)})$ is distributed as "t" with $n - 2$ degrees of freedom.

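$$s^2(\hat{Y}_{h(new)}) = MSE \left[ 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right], \qquad SE(\hat{Y}_{h(new)}) = \sqrt{ MSE \left[ 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] }.$$
Again: Our estimates are less precise toward the ends.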
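To see how much the second layer of variation widens the interval, the sketch below (an added Python illustration) computes both the confidence interval for the mean SBP and the prediction interval for an individual woman at age 60, using the Age and SBP data from these notes. The critical value 2.160 for t(0.975; 13) is an assumed table value, and the prediction interval itself is not worked out on the slides.

```python
import math

# Age (x) and systolic blood pressure (y), as listed in Example #2.
x = [42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75]
y = [130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160, 125, 150, 165]
x_h = 60
t_crit = 2.160  # assumed table value of t(0.975; n - 2 = 13)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

y_hat = b0 + b1 * x_h                              # same point estimate for both
s2_mean = mse * (1 / n + (x_h - x_bar) ** 2 / sxx)  # variance of the mean response
s2_new = mse + s2_mean                              # adds the individual-level variance

ci_mean = (y_hat - t_crit * math.sqrt(s2_mean), y_hat + t_crit * math.sqrt(s2_mean))
pi_new = (y_hat - t_crit * math.sqrt(s2_new), y_hat + t_crit * math.sqrt(s2_new))

print(f"point estimate        = {y_hat:.2f}")
print(f"95% CI, mean response = ({ci_mean[0]:.1f}, {ci_mean[1]:.1f})")
print(f"95% PI, new response  = ({pi_new[0]:.1f}, {pi_new[1]:.1f})")
```

The prediction interval is several times wider than the confidence interval here because MSE dominates the estimated variance of a new response.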

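Normal Error Regression Model:
$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$
The residuals $\{e_i\}$ form a sample with mean zero:
$$MSE = \frac{SSE}{n - 2} = \frac{\sum e_i^2}{n - 2}$$
Theorem #5: $SSE / \sigma^2$ is distributed as $\chi^2$ with $df = n - 2$, so that $E(MSE) = \sigma^2$.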

THE TEST FOR INDEPENDENCE
The Mean Response: $E(Y \mid X = x) = \beta_0 + \beta_1 x$
$$H_0: \beta_1 = 0$$
"t" test at $n - 2$ degrees of freedom:
$$t = \frac{b_1}{s(b_1)}$$
which is identical to the test using "r":
$$t = r \sqrt{\frac{n - 2}{1 - r^2}}$$
The method we use most often is this Test for Independence, which we are now approaching in a different way: ANOVA.

COMPONENTS OF VARIATION
The variation in Y is conventionally measured in terms of the deviations $(Y_i - \bar{Y})$; the total variation, denoted by SST, is the sum of squared deviations: $SST = \sum (Y_i - \bar{Y})^2$. For example, SST = 0 when all observations are the same; SST is the numerator of the sample variance of Y, so the greater SST, the greater the variation among the Y-values. In regression analysis, the variation in Y is decomposed into two components:
$$Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$$

DECOMPOSITION OF SST
In the decomposition
$$Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$$
the first term on the right-hand side reflects the variation around the regression line, the part that cannot be explained by the regression itself, with the sum of squared errors $SSE = \sum (Y_i - \hat{Y}_i)^2$. The difference between the above two sums of squares, $SSR = SST - SSE = \sum (\hat{Y}_i - \bar{Y})^2$, is called the regression sum of squares; SSR may be considered as a measure of the variation in Y associated with, or explained by, the regression line.

Regression helps to improve the estimate of Y from $\bar{Y}$ (without any information) to $\hat{Y}$ (with information provided by knowing X).

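$$SST = \sum (Y_i - \bar{Y})^2 = \sum \left[ (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y}) \right]^2 = SSE + SSR + 2 \sum e_i (\hat{Y}_i - \bar{Y}), \qquad e_i = Y_i - \hat{Y}_i$$
$$\sum e_i (\hat{Y}_i - \bar{Y}) = 0 \quad \Rightarrow \quad SST = SSE + SSR$$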
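The decomposition and the vanishing cross-product term can be verified on data. The following Python sketch is an added illustration using the Age and SBP data from these notes; the variable names are arbitrary.

```python
import math

# Age (x) and systolic blood pressure (y), as listed in Example #2.
x = [42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75]
y = [130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160, 125, 150, 165]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

sst = sum((yi - y_bar) ** 2 for yi in y)        # total variation
sse = sum(ei ** 2 for ei in resid)              # variation around the line
ssr = sum((fi - y_bar) ** 2 for fi in fitted)   # variation explained by the line
cross = sum(ei * (fi - y_bar) for ei, fi in zip(resid, fitted))

print(f"SST = {sst:.1f}, SSE = {sse:.1f}, SSR = {ssr:.1f}")
print(f"cross-product term = {cross:.2e}")      # zero up to rounding error
print(math.isclose(sst, sse + ssr))             # True
```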

ANALYSIS OF VARIANCE
SST measures the total variation in the sample of values of the dependent variable, with $n - 1$ degrees of freedom, where n is the sample size. It is decomposed into:
$$SST = SSE + SSR$$
SSE measures the variation that cannot be explained by the regression, with $n - 2$ degrees of freedom, and SSR measures the variation in Y associated with, or explained by, the regression line, with 1 degree of freedom representing the slope.

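$$SSR = \sum (\hat{Y}_i - \bar{Y})^2 = \sum (b_0 + b_1 x_i - \bar{y})^2 = \sum (\bar{y} - b_1 \bar{x} + b_1 x_i - \bar{y})^2 = b_1^2 \sum (x_i - \bar{x})^2$$
Since $Var(X) = E(X^2) - \{E(X)\}^2$, we have $E(X^2) = Var(X) + \{E(X)\}^2$; applying this to $b_1$:
$$E(MSR) = E(SSR) = E(b_1^2) \sum (x_i - \bar{x})^2 = \left[ Var(b_1) + \beta_1^2 \right] \sum (x_i - \bar{x})^2 = \sigma^2 + \beta_1^2 \sum (x_i - \bar{x})^2$$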

ANOVA TABLE
The breakdowns of the total sum of squares and its associated degrees of freedom are displayed in the form of an analysis of variance table (ANOVA table) for regression analysis as follows:

Source of Variation    SS     df      MS     F Statistic    p-value
Regression             SSR    1       MSR    MSR/MSE
Error                  SSE    n-2     MSE
Total                  SST    n-1

Recall: MSE, the error mean square, serves as an estimate of the constant variance $\sigma^2$ as stipulated by the regression model.

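$$E(MSE) = \sigma^2, \qquad E(MSR) = \sigma^2 + \beta_1^2 \sum (x_i - \bar{x})^2$$
Under the Null Hypothesis $H_0: \beta_1 = 0$, $E(MSE) = E(MSR)$, so that $F = MSR/MSE$ is expected to be near 1.0.
Theorem 6: F is distributed, under $H_0$, as $F(1, n - 2)$, following a theorem by Cochran.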
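The two expected mean squares can be illustrated by simulation. The sketch below is an addition, written in Python with the standard `random` module; the design points 1 to 10, the parameter values, the number of replications and the seed are all arbitrary assumptions chosen for the demonstration.

```python
import random

def simulate_mean_squares(beta1, reps=5000, seed=7405):
    """Average MSE and MSR over simulated samples from the normal error model."""
    rng = random.Random(seed)
    x = list(range(1, 11))        # assumed fixed design points
    beta0, sigma = 2.0, 1.0       # assumed intercept and error s.d.
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    total_mse = total_msr = 0.0
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / n
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        b0 = y_bar - b1 * x_bar
        sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
        total_mse += sse / (n - 2)
        total_msr += b1 ** 2 * sxx   # SSR has 1 df, so MSR = SSR
    return total_mse / reps, total_msr / reps, sxx

mse_h0, msr_h0, sxx = simulate_mean_squares(beta1=0.0)
mse_h1, msr_h1, _ = simulate_mean_squares(beta1=0.5)

print(f"beta1 = 0.0: mean MSE = {mse_h0:.3f}, mean MSR = {msr_h0:.3f}")
print(f"beta1 = 0.5: mean MSE = {mse_h1:.3f}, mean MSR = {msr_h1:.3f}")
print(f"theory for beta1 = 0.5: E(MSR) = {1.0 + 0.5 ** 2 * sxx:.3f}")
```

With a zero slope both averages should settle near the error variance of 1, while a nonzero slope inflates only the regression mean square, which is why large values of F count as evidence against the null hypothesis.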

THE F-TEST
The test statistic F for the above analysis of variance approach compares MSR and MSE; a value near 1 supports the null hypothesis of independence. In fact, we have $F = t^2$, where t is the test statistic for testing whether or not $\beta_1 = 0$; the F-test is equivalent to the two-sided t-test when referred to the F-table in Appendix B (Table B.4) with $(1, n - 2)$ degrees of freedom.

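THE TEST FOR INDEPENDENCE
The Null Hypothesis: $H_0: \beta_1 = 0$
Two identical choices:
"t" test at $n - 2$ degrees of freedom:
$$t = \frac{b_1}{s(b_1)}, \quad \text{which is identical to the test using "r":} \quad t = r \sqrt{\frac{n - 2}{1 - r^2}}$$
"F" test at $(1, n - 2)$ degrees of freedom:
$$F = \frac{MSR}{MSE}$$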
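That these choices give the same answer can be confirmed numerically. The Python sketch below is an added illustration using the birth weight data from these notes; it computes t from the slope, t from the correlation coefficient, and F from the mean squares.

```python
import math

# Birth weight x (oz) and growth y (%), as listed in Example #1.
x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
ssr = b1 ** 2 * sxx          # regression sum of squares
sse = syy - ssr              # error sum of squares
mse = sse / (n - 2)

t_slope = b1 / math.sqrt(mse / sxx)                 # t = b1 / s(b1)
r = sxy / math.sqrt(sxx * syy)                      # sample correlation
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # t from r
f_stat = ssr / mse                                  # F = MSR / MSE, with MSR = SSR / 1

print(f"t from slope = {t_slope:.4f}, t from r = {t_r:.4f}")
print(f"t^2 = {t_slope ** 2:.2f}, F = {f_stat:.2f}")
```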

COEFFICIENT OF DETERMINATION
We can express the coefficient of determination (the square of the coefficient of correlation r) as:
$$r^2 = \frac{SSR}{SST}$$
That is the portion of total variation attributable to regression; regression helps to improve the estimate of Y from $\bar{Y}$ (without any information) to $\hat{Y}$ (with information provided by knowing X), reducing the total variation by $100 r^2\%$.
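As a quick check of this identity, the added Python sketch below computes the coefficient of determination in two ways for the Age and SBP data from these notes: once as the squared sample correlation and once as the share of the total variation that the regression removes.

```python
import math

# Age (x) and systolic blood pressure (y), as listed in Example #2.
x = [42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75]
y = [130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160, 125, 150, 165]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * sst)
r2_from_r = r ** 2               # square of the correlation coefficient
r2_from_ss = 1 - sse / sst       # equals SSR / SST

print(f"r^2 from correlation    = {r2_from_r:.4f}")
print(f"r^2 from sums of squares = {r2_from_ss:.4f}")
print(f"variation reduced by {100 * r2_from_ss:.1f}%")
```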

EXAMPLE #1: Birth Weight Data

x (oz):  112  111  107  119   92   80   81   84  118  106  103   94
y (%):    63   66   72   52   75  118  120  114   42   72   90   91

SUMMARY OUTPUT
Regression Statistics
R Square       0.89546
Observations   12

ANOVA
              df     SS       MS       F        Significance F
Regression     1     6508     6508     85.66    3.6E-06
Residual      10     759.8    75.98
Total         11     7268

EXAMPLE #2: AGE & SBP

Age (x):   42   46   42   71   80   74   70   80   85   72   64   81   41   61   75
SBP (y):  130  115  148  100  156  162  151  156  162  158  155  160  125  150  165

SUMMARY OUTPUT
Regression Statistics
R Square       0.3183
Observations   15

ANOVA
              df     SS       MS       F       Significance F
Regression     1     1691     1691     6.07    0.028
Residual      13     3621     278.6
Total         14     5312

EXAMPLE #3: Toluca Company Data

LotSize  WorkHours     LotSize  WorkHours     LotSize  WorkHours
   80       399           50       157           90       377
   30       121           40       160          110       421
   50       221           70       252           30       273
   90       376           90       389           90       468
   70       361           20       113           40       244
   60       224          110       435           80       342
  120       546          100       420           70       323
   80       352           30       212
  100       353           50       268
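The ANOVA quantities for the Toluca data can be computed directly from the 25 pairs of lot size and work hours. The sketch below is an added Python illustration; it prints the sums of squares, degrees of freedom, mean squares, F statistic and R Square, and leaves out the p-value, which would need an F distribution function that the standard library does not provide.

```python
# Lot size (x) and work hours (y) for the 25 Toluca Company production runs.
x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
y = [399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 252, 389,
     113, 435, 420, 212, 268, 377, 421, 273, 468, 244, 342, 323]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = sst - sse

df_reg, df_err, df_tot = 1, n - 2, n - 1
msr = ssr / df_reg
mse = sse / df_err
f_stat = msr / mse
r2 = ssr / sst

print(f"intercept = {b0:.3f}, slope = {b1:.4f}, R Square = {r2:.4f}, n = {n}")
print(f"{'Source':<12}{'df':>4}{'SS':>12}{'MS':>12}{'F':>10}")
print(f"{'Regression':<12}{df_reg:>4}{ssr:>12.0f}{msr:>12.0f}{f_stat:>10.2f}")
print(f"{'Residual':<12}{df_err:>4}{sse:>12.0f}{mse:>12.0f}")
print(f"{'Total':<12}{df_tot:>4}{sst:>12.0f}")
```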

Normal Error Regression Model:
$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$
The normal regression model assumes that the X values are known constants. We do not impose any kind of distribution for the X-values.

In many cases, this is not true; for example, if we study the relationship between the height and the weight of a person, a sample of persons is taken but both measurements are random.

CORRELATION MODEL
Correlation data are often cross-sectional or observational. Instead of a regression model, one should consider a correlation model; the most widely used is the Bivariate Normal Distribution with density:
$$f(x, y) = \frac{1}{2 \pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp \left\{ -\frac{1}{2 (1 - \rho^2)} \left[ \left( \frac{x - \mu_x}{\sigma_x} \right)^2 - 2 \rho \left( \frac{x - \mu_x}{\sigma_x} \right) \left( \frac{y - \mu_y}{\sigma_y} \right) + \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right] \right\}$$
$$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}, \qquad \sigma_{xy} = Cov(X, Y) = E[(X - \mu_x)(Y - \mu_y)]$$
$\sigma_{xy}$ is the Covariance and $\rho$ is the Coefficient of Correlation between the two random variables X and Y; $\rho$ is estimated by the sample Coefficient of Correlation r.

The Coefficient of Correlation $\rho$ between the two random variables X and Y is estimated by the sample Coefficient of Correlation r, but the sampling distribution of r is far from being normal. Confidence intervals for $\rho$ are obtained by first making Fisher's z transformation; the distribution of z is approximately normal if the sample size is not too small.
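A confidence interval built through Fisher's z transformation can be sketched as follows. This Python illustration is an addition to the notes; it assumes the usual form z = (1/2) ln[(1 + r)/(1 - r)] with approximate standard error 1/sqrt(n - 3), neither of which is spelled out on the slide, and applies them to the Age and SBP data.

```python
import math
from statistics import NormalDist

# Age (x) and systolic blood pressure (y), as listed in Example #2.
x = [42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75]
y = [130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160, 125, 150, 165]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
r = sxy / math.sqrt(sxx * syy)

# Fisher's z transformation (assumed form): z = atanh(r), se = 1 / sqrt(n - 3).
z = math.atanh(r)
se_z = 1 / math.sqrt(n - 3)
z_crit = NormalDist().inv_cdf(0.975)   # standard normal 97.5th percentile

# Interval on the z scale, then transformed back to the correlation scale.
lo = math.tanh(z - z_crit * se_z)
hi = math.tanh(z + z_crit * se_z)

print(f"r = {r:.4f}")
print(f"approximate 95% CI for rho: ({lo:.3f}, {hi:.3f})")
```

The interval is not symmetric around r, which reflects the skewness of the sampling distribution of r that motivates the transformation.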

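CONDITIONAL DISTRIBUTION
$$f(x, y) = \frac{1}{2 \pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp \left\{ -\frac{1}{2 (1 - \rho^2)} \left[ \left( \frac{x - \mu_x}{\sigma_x} \right)^2 - 2 \rho \left( \frac{x - \mu_x}{\sigma_x} \right) \left( \frac{y - \mu_y}{\sigma_y} \right) + \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right] \right\}$$
Theorem: The conditional distribution of Y for any given X = x is normal with mean $\mu_{y|x}$ and standard deviation $\sigma_{y|x}$:
$$\mu_{y|x} = \beta_0 + \beta_1 x, \qquad \beta_1 = \rho \frac{\sigma_y}{\sigma_x}, \qquad \beta_0 = \mu_y - \beta_1 \mu_x, \qquad \sigma^2_{y|x} = \sigma_y^2 (1 - \rho^2)$$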

Again, since $Var(Y \mid X) = (1 - \rho^2) Var(Y)$, $\rho$ is both a measure of linear association and, through $\rho^2$, a measure of variance reduction in Y associated with knowledge of X; that is why we called $r^2$, an estimate of $\rho^2$, the coefficient of determination.
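The sample versions of these relationships hold exactly for the least squares fit, which can be checked on data. The Python sketch below is an added illustration with the Age and SBP data; it replaces the population quantities by the sample means, sample standard deviations and r, which is an assumption of the illustration rather than a statement from the slides.

```python
import math

# Age (x) and systolic blood pressure (y), as listed in Example #2.
x = [42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75]
y = [130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160, 125, 150, 165]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

s_x = math.sqrt(sxx / (n - 1))   # sample standard deviation of x
s_y = math.sqrt(syy / (n - 1))   # sample standard deviation of y
r = sxy / math.sqrt(sxx * syy)

# Least squares slope versus the sample analogue of rho * sigma_y / sigma_x.
b1 = sxy / sxx
slope_from_r = r * s_y / s_x

# Residual variation versus the sample analogue of sigma_y^2 * (1 - rho^2).
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
resid_var = sse / (n - 1)
var_from_r = s_y ** 2 * (1 - r ** 2)

print(f"b1 = {b1:.4f}, r * s_y / s_x = {slope_from_r:.4f}")
print(f"SSE/(n-1) = {resid_var:.2f}, s_y^2 (1 - r^2) = {var_from_r:.2f}")
```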

Readings & Exercises
Readings: A thorough reading of the text's sections .4-.5 pp. 5-6, .7 pp. 63-7, and . pp. 78-8 is highly recommended.
Exercises: The following exercises are good for practice, all from chapter of text: .3, .3, .4, .8, and .9.

Due As Homework
#9. Refer to dataset "Cigarettes", Y = Cotinine & X = CPD:
(a) Obtain the 95% confidence interval for the mean Cotinine level for subjects who consumed X = 30 cigarettes per day and give your interpretation.
(b) Obtain the 95% confidence interval for the Cotinine level of a subject who consumed 30 cigarettes per day; why is the result different from (a)?
(c) Plot the residuals against X; what would be your conclusion about their possible linear relationship? What would be the average residual?
(d) Set up the ANOVA table and test whether or not a linear association exists between Cotinine and CPD.
#9. Answer the 4 questions of Exercise 9. using dataset "Vital Capacity" with X = Age and Y = (100)(Vital Capacity); use X = 35 years for questions (a) and (b).