STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007


LAST NAME: SOLUTIONS    FIRST NAME:    STUDENT NUMBER:
ENROLLED IN: (circle one) STA 302 / STA 1001

INSTRUCTIONS:
Time: 90 minutes
Aids allowed: calculator. A table of values from the t distribution is on the last page (page 8).
Total points: 50

Some formulae:

b_1 = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)²  =  (Σ X_i Y_i − nX̄Ȳ) / (Σ X_i² − nX̄²)
b_0 = Ȳ − b_1 X̄
Var(b_1) = σ² / Σ(X_i − X̄)²
Var(b_0) = σ² ( 1/n + X̄² / Σ(X_i − X̄)² )
Cov(b_0, b_1) = −σ² X̄ / Σ(X_i − X̄)²
SSTO = Σ(Y_i − Ȳ)²
SSE = Σ(Y_i − Ŷ_i)²
SSR = b_1² Σ(X_i − X̄)² = Σ(Ŷ_i − Ȳ)²
σ²{Ŷ_h} = Var(Ŷ_h) = σ² ( 1/n + (X_h − X̄)² / Σ(X_i − X̄)² )
σ²{pred} = Var(Y_h − Ŷ_h) = σ² ( 1 + 1/n + (X_h − X̄)² / Σ(X_i − X̄)² )
r = Σ(X_i − X̄)(Y_i − Ȳ) / √[ Σ(X_i − X̄)² Σ(Y_i − Ȳ)² ]
Working-Hotelling coefficient: W² = 2 F_{2, n−2; α}
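As a quick numerical check of the formula sheet, the sketch below applies the estimator and sum-of-squares formulas to a small made-up data set (the data here are hypothetical, not the exam data):

```python
# Simple linear regression quantities from the formula sheet,
# computed on a small hypothetical data set (not the exam data).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n

Sxx = sum((x - xbar) ** 2 for x in X)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))

b1 = Sxy / Sxx           # slope estimate
b0 = ybar - b1 * xbar    # intercept estimate

Yhat = [b0 + b1 * x for x in X]
e = [y - yh for y, yh in zip(Y, Yhat)]   # residuals; they sum to 0

SSTO = sum((y - ybar) ** 2 for y in Y)
SSE = sum(ei ** 2 for ei in e)
SSR = b1 ** 2 * Sxx      # equals sum((yh - ybar)**2), and SSTO = SSR + SSE
```

The decomposition SSTO = SSR + SSE checked at the end is exactly the identity proved in Question 1(b).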

1. The following questions require derivations of results for the simple linear regression model.

(a) (2 marks) In lecture we showed that Σ e_i = 0 and Σ e_i X_i = 0. Given these results, what is Σ e_i Ŷ_i? Justify your answer.

Σ e_i Ŷ_i = Σ e_i (b_0 + b_1 X_i) = b_0 Σ e_i + b_1 Σ e_i X_i = 0

(b) (5 marks) Show that the total Sum of Squares in a regression can be decomposed as

Σ(Y_i − Ȳ)² = Σ(Ŷ_i − Ȳ)² + Σ(Y_i − Ŷ_i)²

You may use any results that were derived in lecture.

SSTO = Σ(Y_i − Ȳ)² = Σ(Y_i − Ŷ_i + Ŷ_i − Ȳ)²
     = Σ(Ŷ_i − Ȳ)² + Σ(Y_i − Ŷ_i)² + 2 Σ(Ŷ_i − Ȳ)(Y_i − Ŷ_i)

and

Σ(Ŷ_i − Ȳ)(Y_i − Ŷ_i) = Σ(Ŷ_i − Ȳ)e_i = Σ e_i Ŷ_i − Ȳ Σ e_i = 0

So SSTO = Σ(Ŷ_i − Ȳ)² + Σ(Y_i − Ŷ_i)².

(c) (5 marks) Assume that the X_i are non-random. Derive the formula for Cov(b_0, b_1) given on the first page. You may use any other formulae from the first page that you require except the formulae whose derivations require knowing Cov(b_0, b_1). (Hint: You may want to start with the formula for the estimated intercept.)

From the formula for b_0: Ȳ = b_0 + b_1 X̄. So

Var(Ȳ) = Var(b_0) + X̄² Var(b_1) + 2X̄ Cov(b_0, b_1)
σ²/n = σ² ( 1/n + X̄²/S_XX ) + X̄² σ²/S_XX + 2X̄ Cov(b_0, b_1)

Rearranging gives

Cov(b_0, b_1) = −σ² X̄ / S_XX,  where S_XX = Σ(X_i − X̄)².
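The covariance formula derived in part (c) can also be checked by simulation. The sketch below (with hypothetical parameter values and design points) compares the empirical covariance of (b_0, b_1) over repeated samples against the theoretical value −σ²X̄/S_XX:

```python
import random

# Monte Carlo check of Cov(b0, b1) = -sigma^2 * xbar / Sxx  (Question 1(c)).
# The design points and parameter values are hypothetical, for illustration only.
random.seed(42)
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n = len(X)
xbar = sum(X) / n
Sxx = sum((x - xbar) ** 2 for x in X)
beta0, beta1, sigma = 2.0, 0.5, 1.0

b0s, b1s = [], []
for _ in range(20000):
    # Generate Y from the normal-error model, then fit by least squares.
    Y = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in X]
    ybar = sum(Y) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / Sxx
    b0 = ybar - b1 * xbar
    b0s.append(b0)
    b1s.append(b1)

m0 = sum(b0s) / len(b0s)
m1 = sum(b1s) / len(b1s)
emp_cov = sum((a - m0) * (b - m1) for a, b in zip(b0s, b1s)) / len(b0s)
theory = -sigma ** 2 * xbar / Sxx   # here -3.5/17.5 = -0.2
```

Note the covariance is negative when X̄ > 0: samples that tilt the fitted line up (larger b_1) pull the intercept down.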

2. The SAS output that follows was produced to examine the relationship between full-scale IQ (FSIQ) and brain size as measured by MRI (MRIcount). Measurements were taken on 20 university students chosen because their full-scale IQ was at least 130.

The REG Procedure

Descriptive Statistics
Variable    Sum         Mean       Uncorrected SS   Variance     Std Deviation
Intercept   20.00000    1.00000    20.00000         0            0
MRIcount    18518961    925948     1.725648E13      5730703420   75701
FSIQ        2728.00000  136.40000  372396           15.62105     3.95235

Dependent Variable: FSIQ

Analysis of Variance
Source           DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1         89.22306      89.22306      7.74   0.0123
Error            18        207.57694      11.53205
Corrected Total  19        296.80000

Root MSE         3.39589    R-Square   0.3006
Dependent Mean 136.40000    Adj R-Sq   0.2618
Coeff Var        2.48965

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   109.89399            9.55947          11.50     <.0001
MRIcount     1   0.00002863           0.00001029       2.78      0.0123

(a) (5 marks) Give estimates of the following quantities:
- the percent of total variability in FSIQ that is explained by its linear relationship with MRIcount: 30.06%
- the FSIQ for a person whose MRIcount is 1,000,000: 138.5
- the error variance: 11.532
- the variance of the slope: (0.00001029)² ≈ 1.06 × 10⁻¹⁰
- the difference in FSIQ between 2 people when the second person has an MRIcount that is 10,000 units larger than the first person: 0.286
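The part (a) answers can be reproduced directly from the numbers printed in the SAS output:

```python
# Reproducing the Question 2(a) estimates from the printed SAS output.
b0 = 109.89399           # intercept estimate
b1 = 0.00002863          # slope estimate
se_b1 = 0.00001029       # standard error of the slope
MSE = 11.53205           # error variance estimate
SSR, SSTO = 89.22306, 296.80000

r_square = SSR / SSTO               # 0.3006: proportion of variability explained
fit_at_1e6 = b0 + b1 * 1_000_000    # estimated FSIQ at MRIcount = 1,000,000
var_slope = se_b1 ** 2              # about 1.06e-10
diff_10000 = b1 * 10_000            # effect of a 10,000-unit MRIcount increase
```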

(Question 2 continued.)

(b) (2 marks) Explain the practical meaning of the estimated intercept.

It has no practical meaning in this context because an MRIcount of 0 is impossible.

(c) (2 marks) Give a 95% confidence interval for the slope.

t_{18, .025} = 2.101
CI: 0.00002863 ± 2.101(0.00001029) = (0.00000701, 0.0000502)

(d) (1 mark) What are the null and alternative hypotheses for the test with p-value of 0.0123?

H_0: β_1 = 0 versus H_a: β_1 ≠ 0

(e) (3 marks) The p-value of 0.0123 appears twice in the SAS output. Explain clearly how the test statistics are related for these data and show that this relationship holds for all simple linear regressions.

The test statistics are t_obs = 2.78 and F_obs = 7.74, and 2.78² ≈ 7.74 (within roundoff error). In general,

t²_obs = ( b_1 / s.e.(b_1) )² = b_1² / (MSE/S_XX) = b_1² S_XX / MSE = SSR / MSE = F_obs

(f) (4 marks) Calculate a 90% interval estimate of FSIQ for an additional student whose MRIcount is 1,025,948.

Ŷ_h = 109.89 + 0.0000286(1025948) = 139.27
S_XX = (n − 1)s²_X = 19(5730703420) = 108,883,364,980
t_{18, .05} = 1.734
PI: 139.27 ± 1.734(3.396)√( 1 + 1/20 + (1025948 − 925948)² / 108883364980 ) = (132.97, 145.56)
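The interval arithmetic in parts (c) and (f) can be reproduced as follows, using the t critical values quoted in the solutions (t_{18,.025} = 2.101 and t_{18,.05} = 1.734):

```python
import math

# Reproducing the confidence interval (part c) and prediction interval (part f).
b0, b1, se_b1 = 109.89399, 0.00002863, 0.00001029
root_mse = 3.39589
n = 20
xbar = 925948
Sxx = 19 * 5730703420          # (n - 1) * sample variance of MRIcount

# (c) 95% confidence interval for the slope
ci = (b1 - 2.101 * se_b1, b1 + 2.101 * se_b1)

# (f) 90% prediction interval for a new student with MRIcount = 1,025,948
xh = 1025948
fit = b0 + b1 * xh
se_pred = root_mse * math.sqrt(1 + 1 / n + (xh - xbar) ** 2 / Sxx)
pi = (fit - 1.734 * se_pred, fit + 1.734 * se_pred)
```

The prediction interval uses the σ²{pred} formula from the first page; the extra "1 +" inside the square root is what distinguishes predicting a new observation from estimating the mean response.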

(Question 2 continued.)

(g) (2 marks) In addition to the student considered in part (f), suppose we also need an interval estimate of FSIQ for another student who has an MRIcount of 825,948. For these two intervals, a simultaneous confidence level of 90% is required. How should the interval for the student considered in part (f) be adjusted to take into account the fact that you are now interested in two additional students? (Note that you do not need to calculate anything for the second additional student and you do not need to re-calculate the interval from part (f), just explain how it would change.)

Use the Bonferroni method; that is, make each interval at confidence level 95%. The only adjustment necessary is to substitute t_{18, .025} = 2.101 for t_{18, .05}.

(h) (3 marks) Explain clearly what it means for the intervals discussed in part (g) to be simultaneous.

In repeated samples of 20 students, with the prediction intervals re-calculated for each sample, at least 90% of the pairs of prediction intervals will both capture the actual FSIQs of the 2 additional students.

(i) (5 marks) The following is a plot of the residuals versus the predicted values. What assumptions can be evaluated from this plot? What do you conclude?

Assumptions:
- the linear model is appropriate (no curvature, outliers, or influential points)
- the variance is constant

Both of these assumptions appear to be satisfied.
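The Bonferroni adjustment in part (g) is just this arithmetic: with k intervals and a required family confidence level of 1 − α, each individual interval is constructed at level 1 − α/k:

```python
# Bonferroni adjustment for part (g): family level 90% across k = 2 intervals
# means each interval is built at level 1 - alpha/k = 95%,
# i.e. use t_{18, .025} = 2.101 in place of t_{18, .05} = 1.734.
alpha, k = 0.10, 2
per_interval_level = 1 - alpha / k
per_interval_tail = alpha / (2 * k)     # each tail of each interval gets alpha/(2k)
```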

(Question 2 continued.)

(j) Suppose the data were collected for 40 students, 20 with high IQs and 20 with low IQs, and suppose the scatterplot of FSIQ versus MRIcount for all 40 observations looks like the following plot. A regression line is fit to these 40 points.

i. (2 marks) Explain why you would expect a high R².

There is a lot of variation in the Y's, particularly between the two groups, and a regression line (which will roughly connect the centres of the 2 groups) accounts for much of it.

ii. (3 marks) In fact, R² for the 40 observations in this plot is 85%. Considering only this fact and the plot above, is this evidence of a strong linear relationship between FSIQ and MRIcount? Why or why not? Is there any additional information you would like?

It is not evidence of a strong linear relationship, as the line just connects the 2 clumps of points and may not fit either clump well. I'd like to see a separate regression line fit to each group and to evaluate each of them for fit.

3. (6 marks (2 each)) For each of the following statements regarding simple linear regression, state whether you agree or disagree. Briefly explain your choice.

(a) 95% confidence limits for an intercept were constructed in a regression analysis for a study. The confidence interval may be interpreted as follows: If we were able to repeat the study and the corresponding analysis a large number of times with the same sample size, we would expect that 95% of the resulting estimated intercepts would fall in the original confidence interval.

DISAGREE. We expect that 95% of the resulting confidence intervals will include the true (unknown) intercept.

(b) The sample mean of the residuals always equals the true mean of the error term.

AGREE. The true mean of the errors is E(ε_i) = 0, and we have shown that Σ e_i = 0, so ē = 0.

(c) For the least squares method to be valid, the error terms ε_i must be normally distributed with mean zero and common variance σ².

DISAGREE. For least squares the only assumption is that the relationship is linear; normality of the errors is needed for exact tests and confidence intervals, not for computing the estimates.