NANYANG TECHNOLOGICAL UNIVERSITY SEMESTER I EXAMINATION MTH352/MH3510 Regression Analysis


NANYANG TECHNOLOGICAL UNIVERSITY
SEMESTER I EXAMINATION 2014-2015
MTH352/MH3510 Regression Analysis
December 2014
TIME ALLOWED: 2 HOURS

INSTRUCTIONS TO CANDIDATES
1. This examination paper contains FOUR (4) questions and comprises SIX (6) printed pages, including two appendices (pages 5 and 6).
2. Answer all questions. The marks for each question are indicated at the beginning of each question.
3. Answer each question beginning on a FRESH page of the answer book.
4. This IS NOT an OPEN BOOK exam.
5. Candidates may use calculators. However, they should write down systematically the steps in the workings.

QUESTION 1. (32 marks)

Suppose one wishes to study the effectiveness of a mathematical clinic center in improving the undergraduate experience. To this end, the center keeps a complete log of all users and the duration of each visit. The researchers select 225 students registered in mathematical courses, check them against the center log to see the hours spent (if any), and ask them to rate the overall value of their learning experience on a scale from 0 to 100. The 225 students spent on average 2.03 hours at the center, with a standard deviation of 4.56 hours. On average they gave a rating of 53.43, with a standard deviation of 9.38. The correlation coefficient between hours spent and rating is 0.41.

(i) Find a regression model for the variables of interest and estimate the parameters.
(ii) Is the regression coefficient significant at the 5% significance level ($t^{0.05}_{1} = 1.96$, $t^{0.05}_{120} = 1.98$, $t^{0.01}_{1} = 63.6$)?
(iii) Construct an ANOVA table to test the significance of the relationship between the response variable and the predictor variable at the 5% level.
(iv) Estimate the increase in the expected rating when student A spends 2 more hours at the center than student B. Find its 95% confidence interval ($t^{0.05}_{1} = 1.96$, $t^{0.05}_{120} = 1.98$, $t^{0.01}_{1} = 63.6$).

Solution

(i) From the data we see that
$\bar{x} = 2.03$, $S_{xx} = \sum (x_i - \bar{x})^2 = 4.56^2 \times 225 = 4678.56$,
$\bar{y} = 53.43$, $S_{yy} = \sum (y_i - \bar{y})^2 = 9.38^2 \times 225 = 19796.49$.
Note that
$r_{xy}^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}}$, $r_{xy} = 0.41$,
which implies that
$S_{xy}^2 = r_{xy}^2 S_{xx} S_{yy} = 0.41^2 \times 4678.56 \times 19796.49 = 15569265$, so $S_{xy} = 3945.791$.
It follows that
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{3945.791}{4678.56} = 0.8434$
and that
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 53.43 - 0.8434 \times 2.03 = 51.7179$.
The fitted model is $\hat{y} = 51.7179 + 0.8434x$.

(ii) Note that $SSR = \hat{\beta}_1^2 S_{xx} = 0.8434^2 \times 4678.56 = 3327.97$. This ensures that
$SSE = S_{yy} - SSR = 19796.49 - 3327.97 = 16468.52$
and that $s^2 = SSE/(n-2) = 16468.52/223 = 73.84987$. Hence
$t = \frac{\hat{\beta}_1}{\sqrt{s^2/S_{xx}}} = \frac{0.8434}{\sqrt{73.84987/4678.56}} = \frac{0.8434}{0.1256373} = 6.712975 > t^{0.05}_{1} = 1.96$.
We then reject the null hypothesis $H_0: \beta_1 = 0$, and the regression coefficient is significant.

(iii) The ANOVA table is as follows.

Source       df    SS                  MS         F                              p-value
Regression   1     SSR = 3327.97       3327.97    3327.97/73.84987 = 45.06399
Residual     223   SSE = 16468.52      73.84987
Total        224   S_yy = 19796.49

We see that
$F = \frac{SSR}{s^2} = \frac{3327.97}{73.84987} = 45.06399 > (t^{0.05}_{1})^2 = 3.8416$,
so we again reject the null hypothesis, and the regression coefficient is significant.

(iv) Note that $x_1 - x_2 = 2$, and the estimator of the increase is $Ey_1 - Ey_2 = 0.8434 \times 2 = 1.6868$. The prediction interval is
$\hat{y}_0 \pm s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \, t_{n-2} = 1.6868 \pm 8.593595 \sqrt{\frac{1}{225} + \frac{(2 - 2.03)^2}{4678.56}} \times 1.96 = (0.563879, 2.809721)$.
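The arithmetic in parts (i) to (iii) can be checked from the summary statistics alone. The following is a minimal sketch in Python, not part of the exam paper; the variable names are illustrative, and it assumes the solution's conventions of taking $S_{xx} = s_x^2 n$ (rather than $n - 1$) and of rounding the slope to four decimals before continuing.

```python
import math

# Summary statistics quoted in Question 1
n = 225
x_bar, s_x = 2.03, 4.56
y_bar, s_y = 53.43, 9.38
r = 0.41

# Assumption: follow the worked solution, which uses S = s^2 * n
S_xx = s_x ** 2 * n
S_yy = s_y ** 2 * n
S_xy = r * math.sqrt(S_xx * S_yy)

# Assumption: round the slope to 4 decimals, as the worked solution does,
# so that the later intermediate values match the printed ones
b1 = round(S_xy / S_xx, 4)
b0 = y_bar - b1 * x_bar

# Sums of squares, residual variance, and test statistics
SSR = b1 ** 2 * S_xx
SSE = S_yy - SSR
s2 = SSE / (n - 2)
t_stat = b1 / math.sqrt(s2 / S_xx)
F_stat = SSR / s2

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"SSR = {SSR:.2f}, SSE = {SSE:.2f}, s^2 = {s2:.5f}")
print(f"t = {t_stat:.4f}, F = {F_stat:.4f}")
```

In simple linear regression the F statistic equals the square of the t statistic for the slope, which is why the solution compares F with the squared critical value of t.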

QUESTION 2. (18 marks)

Consider the general linear regression situation with a $\beta_0$ term in the model:

(i) Verify that the correlation between the vectors $e$ and $Y$ is $(1 - R^2)^{1/2}$, where $e = (e_1, \ldots, e_p)'$ with $e_i = y_i - \hat{y}_i$ and $Y = (y_1, \ldots, y_p)'$.
(ii) Can we find defective regressions by a plot of residuals $e_i$ against observations $y_i$? Justify your answer.
(iii) Find the correlation between $e$ and $\hat{Y}$.

Solution

(i) Observe that
$\sum (e_i - \bar{e})(y_i - \bar{y}) = \sum e_i (y_i - \bar{y}) = \sum e_i y_i = e'Y = e'e$,
because $\bar{e} = 0$ if a $\beta_0$ term is in the model and
$e'e = Y'(I - H)'(I - H)Y = Y'(I - H)Y = Y'e$.
Moreover we have
$\sum (e_i - \bar{e})^2 = \sum e_i^2 = e'e$.
It follows that
$r_{eY} = \frac{e'e}{(e'e)^{1/2} \left( \sum (y_i - \bar{y})^2 \right)^{1/2}} = \left( \frac{e'e}{\sum (y_i - \bar{y})^2} \right)^{1/2} = (1 - R^2)^{1/2}$.

(ii) No, we cannot find defective regressions by a plot of residuals $e_i$ against observations $y_i$, because by part (i) the plot always shows a slope.

(iii) Since $e = (I - H)Y$ and $\hat{Y} = HY$, write
$\sum (e_i - \bar{e})(\hat{y}_i - \bar{\hat{y}}) = \sum e_i \hat{y}_i = e'\hat{Y} = Y'(I - H)'HY = Y'(H - H^2)Y = 0$,
so that the correlation is zero.
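Both identities can be checked numerically on any least-squares fit that includes an intercept. The following is a minimal sketch in Python using a simple linear regression; the data values are hypothetical and are not taken from the exam, and the helper `corr` is an illustrative name.

```python
import math

def corr(u, v):
    """Sample correlation coefficient between two equal-length lists."""
    m = len(u)
    mu, mv = sum(u) / m, sum(v) / m
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Hypothetical toy data (an assumption for illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.7]

# Least-squares fit with an intercept
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
S_xx = sum((xi - x_bar) ** 2 for xi in x)
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
S_yy = sum((yi - y_bar) ** 2 for yi in y)
b1 = S_xy / S_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]
R2 = 1 - sum(e ** 2 for e in resid) / S_yy

r_e_y = corr(resid, y)         # should equal sqrt(1 - R^2)
r_e_fit = corr(resid, fitted)  # should be zero up to rounding error

print(r_e_y, math.sqrt(1 - R2), r_e_fit)
```

The first two printed numbers agree, which is why a plot of residuals against observed values always shows an upward trend, while a plot against fitted values does not.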

QUESTION 3. (30 marks)

The dataset in Appendix II contains the price per capita of pork annually from 1936 to 1952, together with other variables relevant to the price of pork. A multiple linear regression model $\Omega$ is proposed to describe the relationship between the response variable PP (price of pork) and the other 5 predictor variables $(x_1, x_2, x_3, x_4, x_5)$. However, a farmer believes that the variation in PP can be adequately explained by the variable $x_4$ alone and therefore proposes a simple linear regression model $\omega$ for the data. Fitting $\Omega$ and $\omega$ separately to the dataset yields the following table.

       Model Ω      Model ω
SSE    133.17269    ?
SSR    632.48848    457.92568

(i) Write down the full model $\Omega$ and the reduced model $\omega$.
(ii) Calculate the SSE, denoted by the question mark in the above table, for fitting $\omega$.
(iii) Fitting $\Omega$ produces the following estimates of the regression coefficients:

Intercept    x_1      x_2       x_3     x_4       x_5
-3704.869    2.147    -0.866    2.96    -3.961    -1.859

Predict the value of PP in the year 2006 based on the model $\Omega$ when $x_2 = 98$, $x_3 = 100$, $x_4 = 100$, $x_5 = 110$.
(iv) Is the farmer's belief correct at the 5% significance level? Justify your answer ($F^{0.95}_{4,11} = 3.36$, $F^{0.95}_{5,11} = 3.20$, $F^{0.95}_{4,12} = 3.26$, $F^{0.95}_{5,12} = 3.11$).

Assume that the variable $x_1$ (the year) is categorized into 3 levels: (1) 1936-1940; (2) 1941-1950; (3) 1951-1952. Suppose that $\Omega$ is an adequate model after categorization of $x_1$.

(v) Define dummy variables to represent the categorized $x_1$ variable.
(vi) Propose a test statistic to examine whether PP changes significantly with time under $\Omega$. State clearly the distribution and the parameters of the test statistic (such as the degrees of freedom).

Solution

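(i) The full model $\Omega$ is
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon$
and the reduced model $\omega$ is
$y = \beta_0 + \beta_4 x_4 + \varepsilon$.

(ii) Both models share the same total sum of squares, so $SSE_\omega = SSE_\Omega + SSR_\Omega - SSR_\omega = 133.17269 + 632.48848 - 457.92568 = 307.7355$.

(iii) The point predictor is
$\hat{y}_{2006} = -3704.869 + 2.147 \times 2006 - 0.866 \times 98 + 2.96 \times 100 - 3.961 \times 100 - 1.859 \times 110 = 16.616$.

(iv) The extra SSE is
$SSE_{EXT} = 307.7355 - 133.17269 = 174.5628$.
The statistic is
$F = \frac{174.5628/4}{133.17269/(17 - 5 - 1)} = 3.6047 > F^{0.95}_{4,11} = 3.36$,
which implies that we reject the null hypothesis $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_5 = 0$. The farmer's belief is not supported by the current data at the 5% level.

(v) The two dummy variables are as follows:
$I_1 = 1$ for level 1, $0$ for level 2, $0$ for level 3;
$I_2 = 0$ for level 1, $1$ for level 2, $0$ for level 3.

(vi) In this case the full model is
$y = \beta_0 + \gamma_1 I_1 + \gamma_2 I_2 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon$
and the reduced model is
$y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon$.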

The proposed F statistic is
$F = \frac{MS_{EXT}}{MSE} \sim F_{k-1,\, n-p-k}$,
where $MS_{EXT} = (SSE_\omega - SSE_\Omega)/(k - 1)$ and $MSE = SSE_\Omega/(n - p - k)$ with $k = 3$ and $p = 4$.

QUESTION 4. (20 marks)

A study was conducted to compare the effectiveness of two different medications for treating individuals with high blood pressure. To control for unknown sources of variation, ten patients were assigned at random to each of the two medications. The response is a coded measure of the decrease in the diastolic blood pressure measurement after a specified period. In this study, treatment 1 is a standard medication and the other is a new experimental medication. The data, $y_{ir}$, $i = 1, 2$ and $r = 1, 2, \ldots, 10$, are shown in the table below.

Observation   Trt. 1   Trt. 2      Observation   Trt. 1   Trt. 2
1             1.5      3.5         6             0.6      3.2
2             0.2      2.7         7             -0.5     4.3
3             -0.2     2.2         8             1.1      1.3
4             2.1      1.6         9             -1.2     1.5
5             -1.2     1.7         10            1.2      2.5

Does the new medication have some improvement over the standard treatment?

(i) Propose two statistical approaches for the above question that can be tested by the data above.
(ii) Answer the above question and justify it (at the 5% level).
(iii) Find the 95% confidence interval to estimate the difference between the two medications.

($F^{0.95}_{1,18} = 4.41$, $F^{0.95}_{1,20} = 4.35$, $F^{0.95}_{15,2} = 19.43$.)

Solution

(i) One-way ANOVA model and t-test.

(ii) Below we use the one-way ANOVA model. From the data we see that
$\bar{y}_{..} = 28.1/20 = 1.405$, c.f. $= n \bar{y}_{..}^2 = 20 \times 1.405^2$
and
$S_{yy} = \sum_{i,j} y_{ij}^2 - \text{c.f.} = 12.88 + 68.75 - 20 \times 1.405^2 = 42.1495$.
Also
$SST = \sum_i y_{i.}^2 / n_i - \text{c.f.} = (3.6^2 + 24.5^2)/10 - 20 \times 1.405^2 = 21.8405$.
It follows that
$SSE = S_{yy} - SST = 42.1495 - 21.8405 = 20.309$
and the F statistic is
$F = \frac{21.8405/1}{20.309/18} = 19.35737 > F^{0.95}_{1,18} = 4.41$.
This implies that the new medication has some improvement over the standard treatment.

(iii) Note that $\bar{y}_{1.} = 3.6/10 = 0.36$ and $\bar{y}_{2.} = 24.5/10 = 2.45$. The point estimator of the difference between the two medications is $\bar{y}_{1.} - \bar{y}_{2.} = -2.09$. It follows that the confidence interval is
$-2.09 \pm \sqrt{4.41} \sqrt{\frac{20.309}{18}} \sqrt{\frac{1}{10} + \frac{1}{10}} = (-3.087568, -1.092432)$.

END OF PAPER
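The sums of squares, the F statistic and the confidence interval in Question 4 can be recomputed directly from the twenty observations. The following is a minimal sketch in Python, not part of the exam paper; the variable names are illustrative, and the critical value is taken as $\sqrt{4.41}$, the square root of the tabulated $F^{0.95}_{1,18}$, as in the solution.

```python
import math

# Observations for the standard (trt1) and new (trt2) medication
trt1 = [1.5, 0.2, -0.2, 2.1, -1.2, 0.6, -0.5, 1.1, -1.2, 1.2]
trt2 = [3.5, 2.7, 2.2, 1.6, 1.7, 3.2, 4.3, 1.3, 1.5, 2.5]

n1, n2 = len(trt1), len(trt2)
n = n1 + n2
grand_mean = (sum(trt1) + sum(trt2)) / n

# Correction factor and the one-way ANOVA decomposition
cf = n * grand_mean ** 2
S_yy = sum(v ** 2 for v in trt1 + trt2) - cf
SST = sum(trt1) ** 2 / n1 + sum(trt2) ** 2 / n2 - cf
SSE = S_yy - SST

df_trt, df_err = 1, n - 2
MSE = SSE / df_err
F_stat = (SST / df_trt) / MSE

# 95% confidence interval for the difference in treatment means.
# Assumption: critical value sqrt(4.41), i.e. the tabulated F(0.95; 1, 18).
t_crit = math.sqrt(4.41)
diff = sum(trt1) / n1 - sum(trt2) / n2
half_width = t_crit * math.sqrt(MSE * (1 / n1 + 1 / n2))
ci = (diff - half_width, diff + half_width)

print(f"SST = {SST:.4f}, SSE = {SSE:.4f}, F = {F_stat:.5f}")
print(f"difference = {diff:.2f}, 95% CI = ({ci[0]:.6f}, {ci[1]:.6f})")
```

With two groups, the one-way ANOVA F test and the pooled two-sample t-test are equivalent, so the interval excludes zero exactly when F exceeds its critical value.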

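Appendix I

Formulae for the final examination

Simple linear regression

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, $\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$, $\mathrm{var}(\hat{\beta}_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)$, $\mathrm{var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}$,

$\mathrm{s.e.}(\hat{\beta}_0 + x_0 \hat{\beta}_1) = s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$, $SSR = \hat{\beta}_1^2 S_{xx}$, $r_{xy}^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}}$.

Prediction interval for $y_0$ at $x = x_0$ is $\hat{y}_0 \pm s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \, t_{n-2}$.

Prediction interval for $Ey_0$ at $x = x_0$ is $\hat{y}_0 \pm s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \, t_{n-2}$.

Multiple linear regression

$\hat{\beta} = (X'X)^{-1} X'Y$, $\mathrm{var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$,

$SSE = Y'[I - X(X'X)^{-1}X']Y$, $S_{yy} = Y'Y - n\bar{y}^2$, $R^2 = \frac{SSR}{S_{yy}}$.

One-way ANOVA

$\mathrm{var}(\bar{y}_{i.}) = \frac{\sigma^2}{n_i}$, $SST = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (\bar{Y}_{i.} - \bar{Y}_{..})^2 = \sum_{i=1}^{r} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$,

$SSE = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i.})^2$, $S_{yy} = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{..})^2$.

Two-way ANOVA (equal sample sizes)

$\mathrm{var}(\bar{y}_{i..}) = \frac{\sigma^2}{bn}$, $\mathrm{var}(\bar{y}_{.j.}) = \frac{\sigma^2}{an}$,

$SS_A = \sum_i \sum_j \sum_k (\bar{Y}_{i..} - \bar{Y}_{...})^2 = nb \sum_i (\bar{Y}_{i..} - \bar{Y}_{...})^2$,

$SS_B = \sum_i \sum_j \sum_k (\bar{Y}_{.j.} - \bar{Y}_{...})^2 = na \sum_j (\bar{Y}_{.j.} - \bar{Y}_{...})^2$,

$SSE = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij.})^2$,

$SS_{AB} = n \sum_i \sum_j (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2$.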

Appendix II

x_1     x_2     x_3     x_4     x_5     PP
1936    65.8    51.4    90.9    68.5    59.7
1937    68.0    5.6     9.1     69.6    59.7
1938    65.5    5.1     90.9    70.2    63.0
1939    64.8    5.7     90.9    71.9    71.0
1940    65.6    55.1    91.1    75.2    71.0
1941    6.4     48.8    90.7    68.3    74.2
1942    51.4    41.5    90.0    64.0    7.1
1943    4.8     31.4    87.8    53.9    79.0
1944    41.6    9.4     88.0    53.2    73.1
1945    46.4    33.2    89.1    58.0    70.2
1946    49.7    37.0    87.3    63.2    8.
1947    50.1    41.8    90.5    70.5    68.4
1948    5.1     44.5    90.4    7.5     73.0
1949    48.4    40.8    90.6    67.8    70.2
1950    47.1    43.5    93.8    73.2    67.8
1951    47.8    46.5    95.5    77.6    63.4
1952    5.      56.3    97.5    89.5    56.0