University of California, Los Angeles Department of Statistics. Practice problems - simple regression 2 - solutions

Statistics 100C, Instructor: Nicolas Christou

EXERCISE 1
Answer the following questions:

a. Suppose $y_1, y_2, \ldots, y_n$ are independent random variables and $y_i = \mu + \epsilon_i$ for $i = 1, \ldots, n$. Assume that $E(\epsilon_i) = 0$, $var(\epsilon_i) = \sigma^2$, and $cov(\epsilon_i, \epsilon_j) = 0$. Find the least squares estimate of $\mu$ and give the variance of this estimate.

We want to minimize $S = \sum_{i=1}^n (y_i - \mu)^2$ with respect to $\mu$. Setting $\frac{\partial S}{\partial \mu} = -2 \sum_{i=1}^n (y_i - \mu) = 0$ and solving for $\mu$ gives $\hat{\mu} = \bar{Y}$, and $var(\bar{Y}) = \frac{\sigma^2}{n}$.

b. Consider the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. Assume that $E(\epsilon_i) = 0$, $var(\epsilon_i) = \sigma^2$, and $cov(\epsilon_i, \epsilon_j) = 0$. In addition, it is given that $\bar{x} = 0$. What are the least squares estimates of $\beta_0$ and $\beta_1$?

If $\bar{x} = 0$ we get $\hat{\beta}_1 = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \bar{y}$.

c. We have shown that $\hat{y}_i$ can be expressed as $\hat{y}_i = h_{ii} y_i + \sum_{j \neq i} h_{ij} y_j$. Use this expression to find $var(\hat{y}_i)$.

Since $Y_1, Y_2, \ldots, Y_n$ are independent we find that $var(\hat{y}_i) = h_{ii}^2 \sigma^2 + \sum_{j \neq i} h_{ij}^2 \sigma^2 = \sigma^2 \sum_{j=1}^n h_{ij}^2$. This is simplified as follows. With $h_{ij} = \frac{1}{n} + \frac{(x_i - \bar{x})(x_j - \bar{x})}{\sum_k (x_k - \bar{x})^2}$,
$$h_{ij}^2 = \frac{1}{n^2} + \frac{2(x_i - \bar{x})(x_j - \bar{x})}{n \sum_k (x_k - \bar{x})^2} + \frac{(x_i - \bar{x})^2 (x_j - \bar{x})^2}{\left[\sum_k (x_k - \bar{x})^2\right]^2},$$
and after summing over $j$ (the middle term vanishes because $\sum_j (x_j - \bar{x}) = 0$) we get
$$var(\hat{y}_i) = \sigma^2 \left[\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_k (x_k - \bar{x})^2}\right] = \sigma^2 h_{ii}.$$

d. Find an expression for $corr(e_i, e_j)$ in terms of $h_{ii}$, $h_{jj}$, $h_{ij}$.

In the homework (exercise 6) we found that $cov(e_i, e_j) = -\sigma^2 \left[\frac{1}{n} + \frac{(x_i - \bar{x})(x_j - \bar{x})}{\sum_k (x_k - \bar{x})^2}\right] = -\sigma^2 h_{ij}$. In addition, from the class notes, $var(e_i) = \sigma^2 (1 - h_{ii})$ and $var(e_j) = \sigma^2 (1 - h_{jj})$. Therefore
$$corr(e_i, e_j) = \frac{cov(e_i, e_j)}{sd(e_i)\, sd(e_j)} = \frac{-h_{ij}}{\sqrt{(1 - h_{ii})(1 - h_{jj})}}.$$

EXERCISE 2
Answer the following questions:

a. Consider the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. Assume that $E(\epsilon_i) = 0$, $var(\epsilon_i) = \sigma^2$, and $cov(\epsilon_i, \epsilon_j) = 0$. Suppose we rescale the $x$ values as $x^* = x - \alpha$, and we want to fit the model $y_i = \beta_0^* + \beta_1^* x_i^* + \epsilon_i$. Find the least squares estimates of $\beta_0^*$ and $\beta_1^*$.

The new sample mean of $x^*$ is $\bar{x} - \alpha$, so the deviations $x_i^* - \bar{x}^* = x_i - \bar{x}$ are unchanged. Therefore $\hat{\beta}_1^* = \hat{\beta}_1$ will not change. But
$$\hat{\beta}_0^* = \bar{y} - \hat{\beta}_1^* (\bar{x} - \alpha) = \bar{y} - \hat{\beta}_1 \bar{x} + \alpha \hat{\beta}_1 = \hat{\beta}_0 + \alpha \hat{\beta}_1.$$

b. Refer to the model of part (a). Find the SSE of this model and compare it to the SSE of the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. What is your conclusion?

$SSE^* = SST - SSR^*$. Note that SST is the same (we only rescale $x$). Will SSR change? No:
$$SSR^* = \hat{\beta}_1^{*2} \sum_i (x_i^* - \bar{x}^*)^2 = \hat{\beta}_1^2 \sum_i \left[(x_i - \alpha) - (\bar{x} - \alpha)\right]^2 = \hat{\beta}_1^2 \sum_i (x_i - \bar{x})^2 = SSR.$$
Therefore $SSE^* = SSE$.
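These invariance results are easy to verify numerically. Below is a minimal R sketch (simulated data; the sample size, coefficients, and the shift $\alpha = 10$ are arbitrary choices, not part of the exercise) showing that centering $x$ leaves the slope and the SSE unchanged while the intercept becomes $\hat{\beta}_0 + \alpha \hat{\beta}_1$.

set.seed(1)
n <- 50
x <- runif(n, 0, 20)
y <- 2 + 0.5 * x + rnorm(n, sd = 2)     # true beta0 = 2, beta1 = 0.5
alpha <- 10
fit  <- lm(y ~ x)                       # original model
fit2 <- lm(y ~ I(x - alpha))            # model with x* = x - alpha
coef(fit); coef(fit2)                   # same slope, different intercept
coef(fit)[1] + alpha * coef(fit)[2]     # equals the intercept of fit2
sum(resid(fit)^2); sum(resid(fit2)^2)   # the two SSEs are identical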

c. Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $E(\epsilon_i) = 0$, $var(\epsilon_i) = \sigma^2$, and $cov(\epsilon_i, \epsilon_j) = 0$. Show that $E(S_{YY}) = (n-1)\sigma^2 + \beta_1^2 S_{XX}$, where $S_{YY} = \sum_i (y_i - \bar{y})^2$ and $S_{XX} = \sum_i (x_i - \bar{x})^2$.

$$E(S_{YY}) = E(SST) = E(SSE) + E(SSR) = E\left[(n-2) s_e^2\right] + E\left[\hat{\beta}_1^2 \sum_i (x_i - \bar{x})^2\right] = (n-2)\sigma^2 + \sum_i (x_i - \bar{x})^2\, E(\hat{\beta}_1^2).$$
Since $E(\hat{\beta}_1^2) = var(\hat{\beta}_1) + \left[E(\hat{\beta}_1)\right]^2 = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2} + \beta_1^2$, it follows that
$$E(S_{YY}) = (n-2)\sigma^2 + \sigma^2 + \beta_1^2 \sum_i (x_i - \bar{x})^2 = (n-1)\sigma^2 + \beta_1^2 S_{XX}.$$

d. Refer to the model of part (c). Find $cov(\epsilon_i, e_i)$.

$$cov(\epsilon_i, e_i) = cov\!\left(\epsilon_i,\, y_i - \bar{y} - \hat{\beta}_1 (x_i - \bar{x})\right) = cov(\epsilon_i, y_i) - cov(\epsilon_i, \bar{y}) - (x_i - \bar{x})\, cov(\epsilon_i, \hat{\beta}_1)$$
$$= \sigma^2 - \frac{\sigma^2}{n} - (x_i - \bar{x}) \frac{(x_i - \bar{x})\sigma^2}{\sum_k (x_k - \bar{x})^2} = \sigma^2 \left[1 - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{\sum_k (x_k - \bar{x})^2}\right] = \sigma^2 (1 - h_{ii}).$$

EXERCISE 3
Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $E(\epsilon_i) = 0$, $var(\epsilon_i) = \sigma^2$, and $cov(\epsilon_i, \epsilon_j) = 0$. Also, assume that $\epsilon_i \sim N(0, \sigma^2)$. Suppose we want to test simultaneously
$H_0$: $\beta_1 = \beta_1^*$ and $\beta_0 = \beta_0^*$
$H_a$: the hypothesis $H_0$ is not true.
Answer the following questions:

a. In the expression $Q = \sum_i (y_i - \beta_0^* - \beta_1^* x_i)^2$, if we add and subtract $\hat{\beta}_0$ and add and subtract $\hat{\beta}_1 x_i$, show that
$$Q = \sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 + (\hat{\beta}_1 - \beta_1^*)^2 \sum_i (x_i - \bar{x})^2 + n\left[(\hat{\beta}_0 - \beta_0^*) + (\hat{\beta}_1 - \beta_1^*)\bar{x}\right]^2.$$

Proof:
$$Q = \sum_i \left[(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) + (\hat{\beta}_0 - \beta_0^*) + (\hat{\beta}_1 - \beta_1^*) x_i\right]^2$$
$$= \sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 + \sum_i \left[(\hat{\beta}_0 - \beta_0^*) + (\hat{\beta}_1 - \beta_1^*) x_i\right]^2 + 2(\hat{\beta}_0 - \beta_0^*) \sum_i e_i + 2(\hat{\beta}_1 - \beta_1^*) \sum_i e_i x_i,$$
where the third term is zero because $\sum_i e_i = 0$ and the fourth term is zero because $\sum_i e_i x_i = 0$. For the second term,
$$\sum_i \left[(\hat{\beta}_0 - \beta_0^*) + (\hat{\beta}_1 - \beta_1^*) x_i\right]^2 = n(\hat{\beta}_0 - \beta_0^*)^2 + 2n\bar{x}(\hat{\beta}_0 - \beta_0^*)(\hat{\beta}_1 - \beta_1^*) + (\hat{\beta}_1 - \beta_1^*)^2 \sum_i x_i^2,$$
but $\sum_i x_i^2 = \sum_i (x_i - \bar{x})^2 + n\bar{x}^2$, so this equals $(\hat{\beta}_1 - \beta_1^*)^2 \sum_i (x_i - \bar{x})^2 + n\left[(\hat{\beta}_0 - \beta_0^*) + (\hat{\beta}_1 - \beta_1^*)\bar{x}\right]^2$, which completes the proof.

b. Let $D = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$. Show that the random variables $\hat{\beta}_1$ and $D$ are uncorrelated, and explain why $\hat{\beta}_1$ and $D$ must therefore be independent.

$$cov(D, \hat{\beta}_1) = cov(\hat{\beta}_0 + \hat{\beta}_1 \bar{x},\, \hat{\beta}_1) = cov(\hat{\beta}_0, \hat{\beta}_1) + \bar{x}\, var(\hat{\beta}_1) = -\frac{\bar{x}\sigma^2}{\sum_i (x_i - \bar{x})^2} + \frac{\bar{x}\sigma^2}{\sum_i (x_i - \bar{x})^2} = 0.$$
They are independent because $(\hat{\beta}_1, D)$ is bivariate normal, and for jointly normal random variables zero covariance implies independence.
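As a numerical sanity check on the decomposition in part (a), the following R sketch (simulated data; the coefficients, seed, and the hypothesized values $\beta_0^* = 1$, $\beta_1^* = 2$ are arbitrary, not part of the exercise) computes $Q$ both directly and through the three-term identity.

set.seed(2)
n <- 40
x <- rnorm(n)
y <- 1.3 + 1.8 * x + rnorm(n)
b0star <- 1; b1star <- 2                 # hypothesized values
fit <- lm(y ~ x)
b0 <- coef(fit)[1]; b1 <- coef(fit)[2]
Q.direct <- sum((y - b0star - b1star * x)^2)
Q.decomp <- sum(resid(fit)^2) +
  (b1 - b1star)^2 * sum((x - mean(x))^2) +
  n * ((b0 - b0star) + (b1 - b1star) * mean(x))^2
c(Q.direct, Q.decomp)                    # identical up to rounding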

c. Show that the sum of the last two terms in part (a) is equal to
$$\sigma^2 \left[\frac{(\hat{\beta}_1 - \beta_1^*)^2}{var(\hat{\beta}_1)} + \frac{(D - \beta_0^* - \beta_1^* \bar{x})^2}{var(D)}\right].$$

First let's find $var(D)$. Note that $E(D) = \beta_0 + \beta_1 \bar{x}$, and
$$var(D) = var(\hat{\beta}_0 + \hat{\beta}_1 \bar{x}) = var(\hat{\beta}_0) + \bar{x}^2\, var(\hat{\beta}_1) + 2\bar{x}\, cov(\hat{\beta}_0, \hat{\beta}_1) = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}\right] + \frac{\bar{x}^2 \sigma^2}{\sum_i (x_i - \bar{x})^2} - \frac{2\bar{x}^2 \sigma^2}{\sum_i (x_i - \bar{x})^2} = \frac{\sigma^2}{n}.$$
And now the proof. Since $var(\hat{\beta}_1) = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}$ and $D - \beta_0^* - \beta_1^* \bar{x} = (\hat{\beta}_0 - \beta_0^*) + (\hat{\beta}_1 - \beta_1^*)\bar{x}$,
$$\sigma^2 \left[\frac{(\hat{\beta}_1 - \beta_1^*)^2}{var(\hat{\beta}_1)} + \frac{(D - \beta_0^* - \beta_1^* \bar{x})^2}{var(D)}\right] = (\hat{\beta}_1 - \beta_1^*)^2 \sum_i (x_i - \bar{x})^2 + n\left[(\hat{\beta}_0 - \beta_0^*) + (\hat{\beta}_1 - \beta_1^*)\bar{x}\right]^2,$$
which is exactly the sum of the last two terms in part (a).

d. If $H_0$ is true, what are the degrees of freedom of the random variable $\frac{(\hat{\beta}_1 - \beta_1^*)^2}{var(\hat{\beta}_1)} + \frac{(D - \beta_0^* - \beta_1^* \bar{x})^2}{var(D)}$?

Since $\hat{\beta}_1 \sim N\!\left(\beta_1, \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}\right)$ and $D \sim N\!\left(\beta_0 + \beta_1 \bar{x}, \frac{\sigma^2}{n}\right)$, under $H_0$ each standardized square follows $\chi^2_1$, and because $\hat{\beta}_1$ and $D$ are independent (part (b)) the sum follows $\chi^2_1 + \chi^2_1 = \chi^2_2$, a chi-square distribution with 2 degrees of freedom.

EXERCISE 4
Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $E(\epsilon_i) = 0$, $var(\epsilon_i) = \sigma^2$, and $cov(\epsilon_i, \epsilon_j) = 0$. Also, assume that $\epsilon_i \sim N(0, \sigma^2)$. Answer the following questions:

a. Find $E(Y_i^2)$.
$E(Y_i^2) = var(Y_i) + \left[E(Y_i)\right]^2 = \sigma^2 + (\beta_0 + \beta_1 x_i)^2$.

b. Find the distribution of $\bar{Y}$.
$\bar{Y} \sim N\!\left(\beta_0 + \beta_1 \bar{x}, \frac{\sigma^2}{n}\right)$.

c. Find $E(\bar{Y}^2)$.
$E(\bar{Y}^2) = var(\bar{Y}) + \left[E(\bar{Y})\right]^2 = \frac{\sigma^2}{n} + (\beta_0 + \beta_1 \bar{x})^2$.

d. Find $cov\!\left(\sum_i \epsilon_i, \hat{\beta}_1\right)$.
Write $\hat{\beta}_1 = \sum_j k_j y_j$ with $k_j = \frac{x_j - \bar{x}}{\sum_k (x_k - \bar{x})^2}$. Then
$$cov\!\left(\sum_i \epsilon_i, \hat{\beta}_1\right) = cov\!\left(\sum_i \epsilon_i, \sum_j k_j y_j\right) = \sum_i \sum_j k_j\, cov(\epsilon_i, y_j) = \sigma^2 \sum_i k_i = 0,$$
because $cov(\epsilon_i, y_j) = \sigma^2$ when $j = i$ and $0$ otherwise, and $\sum_i k_i = 0$. Note that this is the same as $cov\!\left(\sum_i y_i, \hat{\beta}_1\right) = n\, cov(\bar{y}, \hat{\beta}_1)$, so $cov(\bar{y}, \hat{\beta}_1) = 0$.

e. Suppose $E(Y_i) = \beta_0 + \beta_1 x_i$, but the $Y_i$'s are not necessarily independent or normally distributed and do not necessarily have equal variances. Are $\hat{\beta}_0$ and $\hat{\beta}_1$ unbiased estimators of $\beta_0$ and $\beta_1$?
Yes, $\hat{\beta}_1$ and $\hat{\beta}_0$ are still unbiased. Taking the expectation of $\hat{\beta}_1$ and $\hat{\beta}_0$ uses only the linearity of expectation and $E(Y_i) = \beta_0 + \beta_1 x_i$; it does not involve the independence, normality, or equal-variance assumptions.
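Part (e) can be illustrated by simulation. A minimal R sketch (all values arbitrary): the errors below have strongly non-constant variance, yet the averages of the estimates over many replications still recover $\beta_0$ and $\beta_1$.

set.seed(3)
n <- 30
x <- seq(1, 10, length.out = n)
beta0 <- 2; beta1 <- 0.7
est <- replicate(5000, {
  eps <- rnorm(n, sd = 0.2 * x)          # variance grows with x
  y <- beta0 + beta1 * x + eps
  coef(lm(y ~ x))
})
rowMeans(est)                            # close to c(2, 0.7): still unbiased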

EXERCISE 5
Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $E(\epsilon_i) = 0$, $var(\epsilon_i) = \sigma^2$, and $cov(\epsilon_i, \epsilon_j) = 0$. Also, assume that $\epsilon_i \sim N(0, \sigma^2)$. In this numerical example, $y$ represents the concentration of lead (in ppm) and $x$ represents the concentration of zinc (in ppm) in the soil at a particular area of interest. The sample size was 5. These data gave the following results:
$$\sum_i (y_i - \bar{y})(\hat{y}_i - \bar{y}) = 7076708, \quad \sum_i (y_i - \bar{y})^2 = 73376, \quad \sum_i (x_i - \bar{x})^2 = 560, \quad \sum_i x_i^2 = 50706, \quad \bar{y} = 64.$$
Answer the following questions:
a. Find $\hat{\beta}_1$. Answer: 0.98.
b. Find $\hat{\beta}_0$. Answer: 58346.
c. Compute $s_e^2$. Answer: 9696.
d. Compute the value of the F statistic in testing the hypothesis $H_0$: $\beta_1 = 0$ against $H_a$: $\beta_1 \neq 0$. Answer: 3593.
e. Compute $\widehat{var}(\hat{\beta}_0)$. Answer: 46886.

EXERCISE 6
Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, \ldots, n$, with $E(\epsilon_i) = 0$, $var(\epsilon_i) = \sigma^2$, $cov(\epsilon_i, \epsilon_j) = 0$, and $\epsilon_i \sim N(0, \sigma^2)$. Answer the following questions:

a. Find $cov(\hat{y}_i, y_i)$.
$$cov(\hat{y}_i, y_i) = cov\!\left(\bar{y} + \hat{\beta}_1 (x_i - \bar{x}),\, y_i\right) = cov(\bar{y}, y_i) + (x_i - \bar{x})\, cov(y_i, \hat{\beta}_1) = \frac{\sigma^2}{n} + (x_i - \bar{x}) \frac{(x_i - \bar{x})\sigma^2}{\sum_k (x_k - \bar{x})^2} = \sigma^2 h_{ii}.$$

b. Find $cov\!\left(\sum_i \epsilon_i, \sum_i e_i\right)$.
Since $\sum_i e_i = 0$ (a constant), it follows that $cov\!\left(\sum_i \epsilon_i, \sum_i e_i\right) = cov\!\left(\sum_i \epsilon_i, 0\right) = 0$.

EXERCISE 7
Three variables $N$, $D$, and $Y$ all have zero sample means and unit sample variances. A fourth variable is $C = N + D$. In the regression of $C$ on $Y$, the slope is 0.8. In the regression of $C$ on $N$, the slope is 0.5. In the regression of $D$ on $Y$, the slope is 0.4. What is the error sum of squares in the regression of $C$ on $D$? There are 21 observations.

$var(C) = var(N) + var(D) + 2\,cov(N, D) = 2 + 2\,cov(N, D)$. From the simple regressions we have the following.
The slope of $C$ on $Y$ is $\frac{cov(C, Y)}{var(Y)} = cov(C, Y) = 0.8$. But $cov(C, Y) = cov(N + D, Y) = cov(N, Y) + cov(D, Y)$, and the slope of $D$ on $Y$ gives $cov(D, Y) = 0.4$, so $cov(N, Y) = 0.4$.
The slope of $C$ on $N$ is $\frac{cov(C, N)}{var(N)} = cov(C, N) = 0.5$. But $cov(C, N) = cov(N + D, N) = var(N) + cov(D, N) = 1 + cov(D, N) = 0.5$, so $cov(D, N) = -0.5$.
Therefore $var(C) = 1 + 1 + 2(-0.5) = 1$. Also, $cov(C, D) = cov(N + D, D) = cov(N, D) + var(D) = -0.5 + 1 = 0.5$. The slope of the regression of $C$ on $D$ is therefore $\hat{\beta}_1 = \frac{cov(C, D)}{var(D)} = 0.5$. Finally,
$$SSE = SST - SSR = (n-1) s_C^2 - \hat{\beta}_1^2 (n-1) s_D^2 = 20(1) - 0.5^2 (20) = 20 - 5 = 15.$$
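The covariance bookkeeping in Exercise 7 can be replayed in R. This is just the arithmetic above written out (the variable names are mine; only the quantities actually given in the problem are used as inputs):

n <- 21
slope.CY <- 0.8; slope.DY <- 0.4     # given; the Y information is not needed for the SSE
slope.CN <- 0.5                      # given (unit variances, so slopes equal covariances)
cov.DN <- slope.CN - 1               # from cov(C,N) = var(N) + cov(D,N)
var.C  <- 1 + 1 + 2 * cov.DN
cov.CD <- cov.DN + 1                 # from cov(C,D) = cov(N,D) + var(D)
b1  <- cov.CD / 1                    # slope of C on D
SST <- (n - 1) * var.C
SSR <- b1^2 * (n - 1) * 1
SST - SSR                            # SSE = 15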

EXERCISE 8
Answer the following questions:

a. Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, \ldots, n$, with $E(\epsilon_i) = 0$, $var(\epsilon_i) = \sigma^2$, $cov(\epsilon_i, \epsilon_j) = 0$, and $\epsilon_i \sim N(0, \sigma^2)$. Show that the correlation coefficient between $\hat{\beta}_0$ and $\hat{\beta}_1$ is
$$corr(\hat{\beta}_0, \hat{\beta}_1) = \frac{-\bar{x}\sqrt{n}}{\sqrt{\sum_i x_i^2}}.$$

$$corr(\hat{\beta}_0, \hat{\beta}_1) = \frac{cov(\hat{\beta}_0, \hat{\beta}_1)}{sd(\hat{\beta}_0)\, sd(\hat{\beta}_1)} = \frac{-\dfrac{\bar{x}\sigma^2}{\sum_i (x_i - \bar{x})^2}}{\sqrt{\dfrac{\sigma^2 \sum_i x_i^2}{n \sum_i (x_i - \bar{x})^2}}\, \sqrt{\dfrac{\sigma^2}{\sum_i (x_i - \bar{x})^2}}} = \frac{-\bar{x}}{\sqrt{\dfrac{\sum_i x_i^2}{n}}} = \frac{-\bar{x}\sqrt{n}}{\sqrt{\sum_i x_i^2}}.$$

b. Refer to the model of part (a). Given that $\bar{x} = 0$, derive an F statistic for testing the hypothesis $H_0$: $\beta_1 = \beta_0$ against the alternative $H_a$: $\beta_1 \neq \beta_0$. Follow these steps:
1. Find the distribution of $\hat{\beta}_1 - \hat{\beta}_0$. This is normal with mean $\beta_1 - \beta_0$ and variance $var(\hat{\beta}_1) + var(\hat{\beta}_0) - 2\,cov(\hat{\beta}_1, \hat{\beta}_0)$. When $\bar{x} = 0$ we have $cov(\hat{\beta}_1, \hat{\beta}_0) = 0$, $var(\hat{\beta}_0) = \frac{\sigma^2}{n}$, and $var(\hat{\beta}_1) = \frac{\sigma^2}{\sum_i x_i^2}$, so
$$\hat{\beta}_1 - \hat{\beta}_0 \sim N\!\left(\beta_1 - \beta_0,\; \sigma^2\left[\frac{1}{n} + \frac{1}{\sum_i x_i^2}\right]\right).$$
2. Also, $\frac{(n-2) s_e^2}{\sigma^2} \sim \chi^2_{n-2}$, independently of $(\hat{\beta}_0, \hat{\beta}_1)$.
Using (1) and (2) we can create a ratio that follows the F distribution with 1 and $n-2$ degrees of freedom: under $H_0$,
$$F = \frac{(\hat{\beta}_1 - \hat{\beta}_0)^2}{s_e^2\left[\dfrac{1}{n} + \dfrac{1}{\sum_i x_i^2}\right]} \sim F_{1, n-2},$$
which can also be obtained by squaring the corresponding $t_{n-2}$ statistic.

EXERCISE 9
Access the following data in R:

a <- read.table("http://www.stat.ucla.edu/~christo/statistics100C/soil_complete.txt", header=TRUE)

Answer the following questions:

a. Run the regression of cadmium on zinc. Attach the R output.

q <- lm(a$cadmium ~ a$zinc)
summary(q)

Call:
lm(formula = a$cadmium ~ a$zinc)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0976  -0.785    0.08   0.607   4.539 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.885463     0.1855   -4.78 4.04e-06 ***
a$zinc       0.008795     0.0003   28.84  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.47 on 153 degrees of freedom
Multiple R-squared: 0.8394,     Adjusted R-squared: 0.8384
F-statistic: 800 on 1 and 153 DF,  p-value: < 2.2e-16

b. Compute the leverage values.

leverage <- influence(q)$hat
head(leverage)   # list the first 6 leverage values
          1           2           3           4           5           6
   0.050933   0.0867869 0.007849009   0.0086300    0.008393  0.00867903

c. Suppose one observation, say the $i$th, is deleted. Give the formula that computes the new $\hat{\beta}_1$ and $\hat{\beta}_0$. Use R to compute them and attach the code.

The formula for computing $\hat{\beta}_1$ after point $i$ is deleted is given by
$$\hat{\beta}_{1(i)} = \hat{\beta}_1 - \frac{(x_i - \bar{x})\, e_i}{(1 - h_{ii}) \sum_j (x_j - \bar{x})^2},$$
and $\hat{\beta}_0$ after point $i$ is deleted is given by
$$\hat{\beta}_{0(i)} = \bar{y}_{(i)} - \hat{\beta}_{1(i)}\, \bar{x}_{(i)},$$
where $\bar{y}_{(i)}$ and $\bar{x}_{(i)}$ are the sample means of $y$ and $x$ after observation $i$ is deleted from the data set.
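For the R computation requested in part (c), a minimal sketch is below. It applies the deletion formulas with $i = 1$ as an arbitrary illustration (any index works) and verifies them against a brute-force refit without that row:

a <- read.table("http://www.stat.ucla.edu/~christo/statistics100C/soil_complete.txt", header=TRUE)
q <- lm(a$cadmium ~ a$zinc)
x <- a$zinc; y <- a$cadmium
h <- influence(q)$hat            # leverage values
e <- resid(q)                    # residuals
Sxx <- sum((x - mean(x))^2)
i <- 1                           # observation to delete (arbitrary choice)
b1.i <- coef(q)[2] - (x[i] - mean(x)) * e[i] / ((1 - h[i]) * Sxx)
b0.i <- mean(y[-i]) - b1.i * mean(x[-i])
c(b0.i, b1.i)
coef(lm(y[-i] ~ x[-i]))          # brute-force check: same estimates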

EXERCISE 10
Breast cancer mortality data: the data contain breast cancer mortality ($y$, deaths from 1950 to 1960) and the adult white female population in 1960 ($x$) for 301 counties in North Carolina, South Carolina, and Georgia. Access the data:

a <- read.table("http://www.stat.ucla.edu/~christo/statistics100C/cancer.txt", sep=",", header=TRUE)

Answer the following questions:
1. Construct a scatterplot of $y$ on $x$.
2. Run the regression through the origin of $y$ on $x$.
3. Check the assumptions.
4. Now run the regression through the origin of $\sqrt{y}$ on $\sqrt{x}$.
5. Check the assumptions of the model of question 4.

# Breast cancer mortality data.
# Read the data:
a <- read.table("http://www.stat.ucla.edu/~christo/statistics100C/cancer.txt", sep=",", header=TRUE)

# See the names of the variables:
names(a)

# Plot y on x:
plot(a$x, a$y)                   # we see non-constant variance

# Run the regression of y on x without the intercept:
q <- lm(a$y ~ a$x + 0)

# See the summary of the regression:
summary(q)

# Non-constant variance can be detected with the following two plots:
# Residuals against the fitted values:
plot(q$fitted, q$res)
# Residuals against x:
plot(a$x, q$res)

# One suggestion is to transform the variables (take square roots).
# Run the regression on the transformed variables:
q1 <- lm(sqrt(a$y) ~ sqrt(a$x) + 0)

# See the summary of the new regression:
summary(q1)

# Make some plots:
# First, a scatterplot of the transformed variables:
plot(sqrt(a$x), sqrt(a$y))
# Then the residuals against the fitted values of the regression on the transformed variables:
plot(q1$fitted, q1$res)
# And the residuals against sqrt(x):
plot(sqrt(a$x), q1$res)

# These plots using the transformed variables show that the variance is much more nearly constant than before.