STAT 3340 Assignment 1 solutions

(out of 15 points)

(10) 1. Find the equation of the line which passes through the points (1,1) and (4,5).

The slope is $\beta_1 = (5-1)/(4-1) = 4/3$. The equation for the line is $y - y_0 = \beta_1 (x - x_0)$, where $(x_0, y_0)$ is a point on the line. Using the point (1,1), the equation is $y - 1 = \frac{4}{3}(x - 1)$, or $y = \frac{4}{3}x - \frac{1}{3}$. Could also use the point (4,5), and would get the same equation.

(10) 2. Suppose you are given three data points (1,4), (2,6) and (3,7) and the line $y = 1 + 4x$. Give the three residuals and their sum of squares. (5 points for residuals, 5 points for residual sum of squares)

    > x=c(1,2,3)
    > y=c(4,6,7)
    > yhat=1+4*x
    > yhat # predicted values
    [1]  5  9 13
    > resids=y-yhat # residuals
    > resids
    [1] -1 -3 -6
    > sum(resids^2) # residual sum of squares
    [1] 46

3. Some data gives the summaries: $n = 10$, $\sum x_i y_i = 100$, $\sum x_i = 20$ and $\sum y_i = 10$. Suppose that the response $y$ is temperature in degrees Celsius.

(3) (a) What is $S_{xy}$?

$$S_{xy} = \sum x_i y_i - n\bar{x}\bar{y} = 100 - 10(20/10)(10/10) = 80.$$

(3) (b) If the response was converted to temperature in degrees Fahrenheit $y'$, so that $y' = 32 + 1.8y$, what is $\sum y'_i$?

$$\sum_{i=1}^{10} y'_i = \sum_{i=1}^{10} (32 + 1.8 y_i) = 32(10) + 1.8 \sum_{i=1}^{10} y_i = 320 + 1.8(10) = 338$$

(4) (c) If the response was converted to temperature in degrees Fahrenheit $y'$, so that $y' = 32 + 1.8y$, what is $S_{xy'}$?

$$\sum x_i y'_i = \sum x_i (32 + 1.8 y_i) = 32 \sum x_i + 1.8 \sum x_i y_i = 32(20) + 1.8(100) = 820$$

Then

$$S_{xy'} = \sum x_i y'_i - \Big(\sum_{i=1}^{10} x_i\Big)\Big(\sum_{i=1}^{10} y'_i\Big)/10 = 820 - (20)(338)/10 = 144.$$
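The answer to 3(c) can also be reached without recomputing any sums. The sketch below works for any linear change of units $y' = a + b y$; the symbols $a$ and $b$ are introduced here only for illustration and do not appear in the assignment. Adding a constant to the response leaves $S_{xy}$ unchanged, and rescaling the response multiplies it by the scale factor.

```latex
\begin{align*}
\bar{y}' &= a + b\,\bar{y}, \\
S_{xy'} &= \sum_{i=1}^{n} (x_i - \bar{x})(y'_i - \bar{y}')
         = \sum_{i=1}^{n} (x_i - \bar{x})\, b\,(y_i - \bar{y})
         = b\, S_{xy}.
\end{align*}
% With a = 32 and b = 1.8: S_{xy'} = 1.8 \times 80 = 144, matching 3(c).
```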

4. In a simple linear regression, the sum of squares function is

$$S(\beta_0, \beta_1) = 1000 - 100\beta_0 - 700\beta_1 + 100\beta_0\beta_1 + 50\beta_0^2 + 70\beta_1^2.$$

Find the least squares values for $\beta_0$ and $\beta_1$.

First differentiate with respect to $\beta_0$ and $\beta_1$. (5 points for derivatives)

$$\frac{\partial S}{\partial \beta_0} = -100 + 100\beta_1 + 2(50)\beta_0$$

$$\frac{\partial S}{\partial \beta_1} = -700 + 100\beta_0 + 2(70)\beta_1$$

Setting the partial derivatives each equal to 0 gives the following two equations, after a bit of simplifying:

$$\beta_1 + \beta_0 = 1 \implies \beta_0 = 1 - \beta_1$$

$$1.4\beta_1 + \beta_0 = 7$$

Substituting the first equation into the second, $1.4\beta_1 + (1 - \beta_1) = 7 \implies 0.4\beta_1 = 6 \implies \beta_1 = 15$. Substituting this back into the first equation gives $\beta_0 = 1 - \beta_1 = -14$. The least squares solution is $(-14, 15)$. (5 points for solution)
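The hand solution to Question 4 can be checked numerically. The following is a minimal sketch, not part of the original solutions: it writes the two equations obtained by setting the partial derivatives to zero as a linear system and solves it with base R's `solve()`. The names `A`, `rhs` and `beta` are chosen here for illustration.

```r
# Setting both partial derivatives of S to zero gives the linear system
#   100*beta0 + 100*beta1 = 100
#   100*beta0 + 140*beta1 = 700
A <- matrix(c(100, 100,
              100, 140), nrow = 2, byrow = TRUE)
rhs <- c(100, 700)

beta <- solve(A, rhs)   # c(beta0, beta1)
print(beta)

# A is also the matrix of second derivatives of S.
# Both eigenvalues positive means the stationary point is a minimum.
print(eigen(A)$values)
```

Because the coefficient matrix coincides with the matrix of second derivatives of $S$, the same object confirms both the location of the stationary point and that it is a minimum.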

5. A random sample of 11 elementary school students is selected, and each student is measured on a creativity score (x) using a well-defined testing instrument and on a task score (y) using a new instrument. The task score is the mean time taken to perform several hand-eye coordination tasks. The data are:

    STUDENT  CREATIVITY(X)  TASKS(Y)
    FR       35             3.9
    HT       37             3.9
    IO       50             6.1
    DP       69             4.3
    YR       84             8.8
    QD       40             2.1
    DF       29             5.7
    ER       42             3.0
    RR       51             7.1
    TG       45             7.3
    EF       31             3.3

Use R to do the following questions. Show your commands. Make sure your output is integrated into your responses. (Cut and paste as necessary.)

(a) Plot Tasks versus Creativity and comment on the form and strength of the association. Be sure to label the axes.

To make the plot, use:

    x=c(35,37,50,69,84,40,29,42,51,45,31)
    y=c(3.9,3.9,6.1,4.3,8.8,2.1,5.7,3.0,7.1,7.3,3.3)
    plot(x,y,xlab="creativity",ylab="tasks")

Plot is below, together with added least squares line.

(b) Calculate the summaries $S_{xx}$, $S_{xy}$, $S_{yy}$ and $\bar{X}$ and $\bar{Y}$. (1 point for each of the 5 summary statistics)

    > x=c(35,37,50,69,84,40,29,42,51,45,31)
    > y=c(3.9,3.9,6.1,4.3,8.8,2.1,5.7,3.0,7.1,7.3,3.3)
    > xbar=mean(x)
    > xbar
    [1] 46.63636
    > ybar=mean(y)
    > ybar
    [1] 5.045455
    > Sxx=sum((x-mean(x))^2)
    > Sxx
    [1] 2778.545
    > Syy=sum((y-mean(y))^2)
    > Syy
    [1] 44.02727

    > Sxy=sum((x-mean(x))*(y-mean(y)))
    > Sxy
    [1] 201.5818

(2) (c) Use these data summaries to calculate the correlation coefficient. Does the value agree with your visual assessment in (a)? (2 points for the correlation coefficient)

    > corrxy=Sxy/sqrt(Sxx*Syy)
    > corrxy
    [1] 0.5763439

From the plot following, it looks like there is a moderately strong increasing relationship between x and y, and this is borne out by the moderate size of r.

(4) (d) Use these summaries to calculate the least squares values for the intercept and slope. (2 points each for $\hat\beta_1$ and $\hat\beta_0$)

    > b1=Sxy/Sxx # estimated slope
    > b1
    [1] 0.0725494
    > b0=ybar-b1*xbar # estimated intercept
    > b0
    [1] 1.662014

(e) Add the least squares line to the plot in (a). (A convenient way to add the line is the command abline(intercept,slope).) (5 points for the plot with added line)

    > x=c(35,37,50,69,84,40,29,42,51,45,31)
    > y=c(3.9,3.9,6.1,4.3,8.8,2.1,5.7,3.0,7.1,7.3,3.3)
    > plot(x,y,xlab="creativity",ylab="tasks")
    > abline(b0,b1)

[Figure: scatter plot of Tasks versus Creativity, with the least squares line added.]

(6) (f) Obtain the residuals, $e = y - \hat{y}$. Calculate their sample mean to verify it is zero, and the correlation with X to verify it is also zero. (2 points for the residuals, 2 points for showing the mean of residuals is 0, 2 points for showing correlation of residuals and x is 0.)

    > yhat = b0+b1*x # predicted values
    > resids=y-yhat # residuals
    > resids
     [1] -0.3012433 -0.4463421  0.8105156 -2.3679230  1.0438359 -2.4639903
     [7]  1.9340531 -1.7090891  1.7379662  2.3732627 -0.6110457
    > print(mean(resids)) # 0 to round off error
    [1] 4.04064e-17
    > print(cor(x,resids)) # 0 to round off error
    [1] -1.03577e-16

(g) Plot the residuals versus X. Do the residuals look random? (5 points for the residual plot)

    > #<<fig=t,echo=true,keep.source=t>>=
    > plot(x,resids,main="residual plot", xlab="x",ylab="residuals")

[Figure: "residual plot", residuals versus x.]

The residuals look random, with no evidence that mean or variance change with x.

(6) (h) Obtain the residual, regression and total sums of squares, using the data summaries. (2 points for each of SSE, SST, SSR)

    > SSE=sum(resids^2) # residual sum of squares
    > SSE
    [1] 29.40263
    > SST=Syy # total sum of squares
    > SST
    [1] 44.02727
    > SSR=SST-SSE # regression sum of squares
    > SSR
    [1] 14.62464
    > b1*Sxy # another way to get the regression sum of squares
    [1] 14.62464

(2) (i) What is the value of the coefficient of determination?

    > R2=corrxy^2
    > R2
    [1] 0.3321723
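The sums of squares in 5(h) and the coefficient of determination in 5(i) can be cross-checked against R's built-in model fitting. This is a sketch rather than part of the original solutions: it refits the line with `lm()` and reads the sums of squares from the `anova()` table. The names `fit` and `tab` are chosen here for illustration.

```r
x <- c(35, 37, 50, 69, 84, 40, 29, 42, 51, 45, 31)
y <- c(3.9, 3.9, 6.1, 4.3, 8.8, 2.1, 5.7, 3.0, 7.1, 7.3, 3.3)

fit <- lm(y ~ x)     # least squares fit of tasks on creativity
tab <- anova(fit)    # analysis of variance table for the fit

SSR <- tab["x", "Sum Sq"]           # regression sum of squares
SSE <- tab["Residuals", "Sum Sq"]   # residual sum of squares
SST <- SSR + SSE                    # total sum of squares
R2  <- summary(fit)$r.squared       # coefficient of determination

print(c(SSE = SSE, SSR = SSR, SST = SST, R2 = R2))
```

The printed values should agree with the hand calculations above up to rounding, since `SSR/SST` and the squared correlation are the same quantity in simple linear regression.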

6. Use the data summaries calculated for the previous question and the formulae from the book or notes to do the following questions.

(6) (a) Assess the null hypothesis that there is no relationship between task score and creativity. Use a test based on the normal assumption. State the hypotheses, show calculation of the test statistic, calculate the P value and draw a conclusion. (2 points for hypotheses, 2 points for observed test statistic, 2 points for p-value)

$H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$

    > MSE=SSE/(11-2)
    > tobs=b1/sqrt(MSE/Sxx)
    > tobs
    [1] 2.115781
    > pvalue=2*(1-pt(tobs,11-2))
    > pvalue
    [1] 0.06347107

The P value of about 0.063 is larger than 0.05, so the null hypothesis is not rejected at the 5% level; the data give only weak evidence of a linear relationship between task score and creativity.

(2) (b) Calculate the 95% confidence interval for the mean task score when the creativity is 50. (2 points for CI)

    > c(b0+b1*50 - qt(.975,11-2)*sqrt(MSE)*sqrt(1/11+(50-xbar)^2/Sxx),
    + b0+b1*50 + qt(.975,11-2)*sqrt(MSE)*sqrt(1/11+(50-xbar)^2/Sxx))
    [1] 4.029361 6.549608

(2) (c) Calculate the 95% prediction interval for a new value for task score when the creativity is 50. (2 points for prediction interval)

    > c(b0+b1*50 - qt(.975,11-2)*sqrt(MSE)*sqrt(1+1/11+(50-xbar)^2/Sxx),
    + b0+b1*50 + qt(.975,11-2)*sqrt(MSE)*sqrt(1+1/11+(50-xbar)^2/Sxx))
    [1] 1.010921 9.568047
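The test and the two intervals in Question 6 were computed from the formulae. As a cross-check, the sketch below obtains the same quantities from R's `lm()`, `summary()` and `predict()` functions. It is not part of the original solutions, and the names `fit`, `slope`, `new`, `conf_int` and `pred_int` are chosen here for illustration.

```r
x <- c(35, 37, 50, 69, 84, 40, 29, 42, 51, 45, 31)
y <- c(3.9, 3.9, 6.1, 4.3, 8.8, 2.1, 5.7, 3.0, 7.1, 7.3, 3.3)

fit <- lm(y ~ x)

# Row of the coefficient table for the slope:
# Estimate, Std. Error, t value, Pr(>|t|) (two-sided P value)
slope <- summary(fit)$coefficients["x", ]
print(slope)

# Intervals at creativity = 50
new <- data.frame(x = 50)
conf_int <- predict(fit, new, interval = "confidence", level = 0.95)
pred_int <- predict(fit, new, interval = "prediction", level = 0.95)
print(conf_int)   # columns: fit, lwr, upr (interval for the mean response)
print(pred_int)   # columns: fit, lwr, upr (interval for a new observation)
```

The prediction interval is wider than the confidence interval because it also accounts for the variability of a single new observation around the mean response.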

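7. Suppose X and Y are random variables with $\mu_x = 10$, $\sigma_x = 3$, $\mu_y = 4$, $\sigma_y = 1$, and $Cov[X, Y] = 1.5$. Calculate:

(2) (a) $E[2X - Y]$

$$E[2X - Y] = 2E[X] - E[Y] = 2(10) - 4 = 16$$

(4) (b) $Var[2X - Y]$

$$V[2X - Y] = V[2X] - 2Cov[2X, Y] + V[Y] = 4V[X] - 4Cov[X, Y] + V[Y] = 4(3^2) - 4(1.5) + 1 = 31$$

(4) (c) $Cor[2X, Y]$, where Cor stands for correlation.

$$Cor[2X, Y] = \frac{Cov[2X, Y]}{\sqrt{V[2X]V[Y]}} = \frac{2Cov[X, Y]}{\sqrt{4V[X]V[Y]}} = 1.5/(3(1)) = 0.5$$

8. Suppose you have data $(x_i, y_i)$, $i = 1, \ldots, n$ and want to fit the line with known intercept, $y_i = 2 + \beta_1 x_i + \epsilon_i$, with the usual assumptions about $\epsilon_i$. The least squares estimate for $\beta_1$ is

$$\hat\beta_1 = \frac{\sum x_i y_i - 2\sum x_i}{\sum x_i^2}.$$

(a) Find the expected value of $\hat\beta_1$. Is $\hat\beta_1$ unbiased? (5 points for the expected value)

$$E[\hat\beta_1] = \frac{\sum x_i E[y_i] - 2\sum x_i}{\sum x_i^2} = \frac{\sum x_i (2 + \beta_1 x_i) - 2\sum x_i}{\sum x_i^2} = \frac{\sum x_i (\beta_1 x_i)}{\sum x_i^2} = \beta_1 \frac{\sum x_i^2}{\sum x_i^2} = \beta_1$$

$\hat\beta_1$ is unbiased.

(b) Find the variance of $\hat\beta_1$. (5 points for the variance)

$$V[\hat\beta_1] = \frac{\sum x_i^2 V[y_i]}{\left(\sum x_i^2\right)^2} = \frac{\sigma^2 \sum x_i^2}{\left(\sum x_i^2\right)^2} = \frac{\sigma^2}{\sum x_i^2}$$

(10) 9. Suppose you have data $(x_i, y_i)$, $i = 1, \ldots, n$ and want to fit the line with known slope equal to 1,

$$y_i = \beta_0 + x_i + \epsilon_i.$$

Derive the least squares estimator of $\beta_0$.

The error sum of squares is $S(\beta_0) = \sum_{i=1}^{n} (y_i - \beta_0 - x_i)^2$. Differentiating with respect to $\beta_0$ gives

$$\frac{d}{d\beta_0} S(\beta_0) = -2\sum_{i=1}^{n} (y_i - \beta_0 - x_i) = -2\sum_{i=1}^{n} y_i + 2n\beta_0 + 2\sum_{i=1}^{n} x_i.$$

Setting the derivative equal to zero and solving gives $\hat\beta_0 = \bar{y} - \bar{x}$.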
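The estimator derived in Question 9 can be checked numerically. The sketch below uses a small data set made up purely for illustration (it does not come from the assignment) and fits the model with the slope fixed at 1 by passing `x` as an offset to `lm()`, so that only the intercept is estimated. The names `fit`, `b0_lm` and `b0_formula` are chosen here for illustration.

```r
# Hypothetical data, invented only to illustrate the result.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1)

# offset(x) enters the model with coefficient fixed at 1,
# so least squares is applied to the intercept alone.
fit <- lm(y ~ 1 + offset(x))
b0_lm <- unname(coef(fit))

# Closed-form estimator from the derivation: ybar - xbar
b0_formula <- mean(y) - mean(x)

print(c(lm = b0_lm, formula = b0_formula))
```

The two printed values should coincide, since minimizing $\sum (y_i - \beta_0 - x_i)^2$ is the same as estimating the mean of $y_i - x_i$.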

10. An experiment is to be run to determine the linear association between x and y. Two possible arrangements of the x values are proposed: (a) x = (1, 1, 1, 1, 1, 10, 10, 10, 10, 10) and (b) x = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10).

(6) i. Calculate $S_{xx}$ for each proposal. (3 points for each of the sums of squares.)

    > x1=c(1,1,1,1,1,10,10,10,10,10)
    > SSx1=sum((x1-mean(x1))^2)
    > SSx1
    [1] 202.5
    > x2=c(1,2,3,4,5,6,7,8,9,10)
    > SSx2=sum((x2-mean(x2))^2)
    > SSx2
    [1] 82.5

(4) ii. Which arrangement will lead to the most precise (i.e. smallest variance) estimator of the slope? Justify your answer. (4 points for a reasonable justification.)

The variance of $\hat\beta_1$ is given by $\sigma^2 / SS_{XX}$, where $SS_{XX}$ is the sum of squares of the x's. The first configuration has a larger value of the X sum of squares, so a smaller value of $V[\hat\beta_1]$, which means a more precise estimator.
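The conclusion of Question 10 can be illustrated by simulation. The sketch below is not part of the original solutions: it repeatedly generates responses from a line with normal errors under each design and compares the empirical variance of the slope estimates with $\sigma^2 / SS_{XX}$. The true values `beta0 = 1`, `beta1 = 2`, `sigma = 1`, the seed and the number of replications are all assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Simulate least squares slope estimates for a fixed design x.
# beta0, beta1 and sigma are hypothetical true values.
sim_slopes <- function(x, reps = 2000, beta0 = 1, beta1 = 2, sigma = 1) {
  replicate(reps, {
    y <- beta0 + beta1 * x + rnorm(length(x), sd = sigma)
    sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
  })
}

x1 <- c(1, 1, 1, 1, 1, 10, 10, 10, 10, 10)   # design (a)
x2 <- 1:10                                    # design (b)

var_a <- var(sim_slopes(x1))
var_b <- var(sim_slopes(x2))

# Compare with the theoretical variances sigma^2 / SSxx (sigma = 1 here).
print(c(simulated_a = var_a, theory_a = 1 / 202.5,
        simulated_b = var_b, theory_b = 1 / 82.5))
```

Under these assumptions the simulated variances should land close to the theoretical ones, with design (a) giving the smaller variance. Note that design (a) puts observations at only two x values, so it cannot reveal any departure from a straight line.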