ST430 Exam 1 with Answers


Date: October 5, 2015    Name:

Guideline: You may use one page (front and back of a standard A4 sheet) of notes. No laptop or textbook is permitted, but you may use a calculator. Giving or receiving assistance from other students is not allowed. Show work to receive partial credit! Partial credit will be given, but only for work written on the exam. The total points are 25. Good luck!

1. Assume that the math scores of high school seniors in North Carolina are normally distributed with mean 82 and standard deviation 5.

(a) (2 points) Compute the z-score of a student with math score 85. Would you say this student has an extremely high math score? Justify your answer.

ANSWER: z-score = (85 - 82)/5 = 0.6. Note that for the standard normal distribution, about 95% of the data lies within 2 standard deviations of 0; here 0.6 < 2, suggesting that 85 is not an extreme value.

(b) (3 points) Let X̄ be the mean math score of a class of 25. What is the probability that X̄ is greater than 83.6?

ANSWER: P(X̄ > 83.6) = P( (X̄ - 82)/(5/√25) > (83.6 - 82)/(5/√25) ) = P(Z > 1.6) = 0.0548
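
Both calculations can be reproduced in base R. This is a minimal sketch using only pnorm and sqrt; no data beyond the stated mean and standard deviation is assumed:

z <- (85 - 82) / 5                               # part (a): z-score = 0.6
1 - pnorm(83.6, mean = 82, sd = 5 / sqrt(25))    # part (b): P(Xbar > 83.6) = 0.0548
1 - pnorm(1.6)                                   # equivalent, after standardizing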

2. Consider the following three scatter plots with least squares lines.

(a) (1 point) Which least squares line has the largest intercept?
ANSWER: Plot 3. About 3.1.

(b) (1 point) Which least squares line has the largest slope?
ANSWER: Plot 1 has the largest slope.

(c) (2 points) Which simple linear regression has the largest coefficient of determination (R²)?
ANSWER: Plot 3, as its line has the best fit among the three.
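
The quantities compared here (intercept, slope, R²) can be read off a fitted model in R. A minimal sketch, assuming hypothetical vectors x and y holding the data behind one of the plots:

fit <- lm(y ~ x)          # least squares line for one plot
coef(fit)                 # intercept and slope
summary(fit)$r.squared    # coefficient of determination R^2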

3. The British Journal of Sports Medicine (April 2000) published a study of the effect of massage on boxing performance. Two variables measured on the boxers were blood lactate concentration (mM) and the boxer's perceived recovery (28-point scale). The data were obtained for 16 five-round boxing performances, where a massage was given to the boxer between rounds. The plot below gives the 95% intervals for the average value and for a particular value of perceived recovery at several levels of blood lactate concentration.

(a) (2 points) Explain why the interval for a particular value is considerably wider than the interval for the average value.

ANSWER: Recall the formulas for the confidence interval and the prediction interval; the only difference lies in the margin term,

√(1/n + (xₚ - x̄)²/SSxx) < √(1 + 1/n + (xₚ - x̄)²/SSxx).

Intuitively, for the confidence interval we estimate E(y) = β₀ + β₁x with ŷ = β̂₀ + β̂₁x, so the error is just ŷ - E(y). For the prediction interval, however, we estimate y itself with ŷ, and the error is ŷ - y = (ŷ - E(y)) + (E(y) - y) = (ŷ - E(y)) - ε, which carries the additional individual-level error ε.

(b) (1 point) Would it be wise to use this simple linear regression model to predict a boxer's perceived recovery if the blood lactate level is 1 mM? Explain.

ANSWER: No. Prediction at a value of the explanatory variable that lies outside the range of the observed data (extrapolation) produces unreliable results, and 1 mM is well below the lower bound of the data.
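
The two kinds of intervals in the plot can be obtained with predict() in R. A minimal sketch, assuming the fitted model from the next problem, fit <- lm(RECOVERY ~ LACTATE, data = BOXING2), and an illustrative lactate value of 4 mM (chosen only for the example):

new <- data.frame(LACTATE = 4)
predict(fit, new, interval = "confidence")   # interval for the mean recovery E(y)
predict(fit, new, interval = "prediction")   # wider interval for an individual boxer's recovery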

4. The R output for the data of the previous problem is:

Call:
lm(formula = RECOVERY ~ LACTATE, data = BOXING2)

Residuals:
    Min      1Q  Median      3Q     Max
 -6.577  -3.752   0.060   3.067   8.043

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.7967     4.9838   0.561   0.5836
LACTATE       2.5667     0.9883   2.597   0.0211 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.28 on 14 degrees of freedom
Multiple R-squared:  0.3251,    Adjusted R-squared:  0.2769
F-statistic: 6.744 on 1 and 14 DF,  p-value: 0.0211

(a) (1 point) Give the least squares regression line.
ANSWER: ŷ = 2.7967 + 2.5667 LACTATE

(b) (2 points) Give the slope and its interpretation in the context of the problem.
ANSWER: The slope is 2.5667. Thus the boxer's perceived recovery (28-point scale) increases by 2.5667 points on average for each additional unit increase in blood lactate concentration (mM).

(c) (1 point) Give the sample correlation between blood lactate level and perceived recovery.
ANSWER: The sample correlation is r = √R² = √0.3251 ≈ 0.57 (note that the sign of r must agree with the sign of the slope; here the slope is positive, so r is positive).

(d) (2 points) Is there a statistically significant association between blood lactate level and perceived recovery at the 0.05 level? Explain.
ANSWER: H₀: β₁ = 0 versus Hₐ: β₁ ≠ 0. The t value corresponding to LACTATE has p-value 0.0211 < 0.05. Therefore we reject the null and conclude there is a statistically significant association between blood lactate level and perceived recovery.

(e) (1 point) Would you say there is a strong linear relationship between blood lactate level and perceived recovery? Explain.
ANSWER: The R-squared is 0.3251, which means only about 32.5 percent of the variation in perceived recovery is explained by the model. Hence the linear relationship is not very strong.
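
The output and the answers above could be reproduced from the data. A minimal sketch, assuming the BOXING2 data frame (with columns RECOVERY and LACTATE, as in the lm call above) is loaded:

fit <- lm(RECOVERY ~ LACTATE, data = BOXING2)
summary(fit)                                  # coefficients, R-squared, t test for the slope
sign(coef(fit)["LACTATE"]) * sqrt(summary(fit)$r.squared)   # sample correlation, about 0.57
cor(BOXING2$LACTATE, BOXING2$RECOVERY)        # same value, computed directly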

5. The first-order multiple regression model with two predictors is

Y = β₀ + β₁X₁ + β₂X₂ + ε,

where Y is the dependent variable, X₁ and X₂ are the independent variables, and ε is the random error. We collect 32 observations and perform a multiple regression. The R output is:

Call:
lm(formula = Y ~ X1 + X2)

Residuals:
    Min      1Q  Median      3Q     Max
-2.2128 -0.5937  0.1083  0.7110  1.8639

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -2.8676     1.3173  -2.177  0.03778 *
X1            2.4296     0.6857   3.543  0.00136 **
X2            2.2206     0.6615   3.357  0.00222 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.012 on 29 degrees of freedom
Multiple R-squared:  0.396,    Adjusted R-squared:  0.3543
F-statistic: 9.505 on 2 and 29 DF,  p-value: 0.0006691

(a) (1 point) What is the least squares regression line?
ANSWER: ŷ = -2.8676 + 2.4296X₁ + 2.2206X₂

(b) (1 point) Conduct a test of overall model utility. Use α = .05.
ANSWER: H₀: β₁ = β₂ = 0 versus Hₐ: at least one of β₁ and β₂ is not equal to zero. The F statistic is 9.505 with p-value .0006691, which is less than .05. Thus the model is useful in that it explains some of the variation in Y using X₁ and X₂.

(c) (2 points) Conduct a test of whether X₁ is significantly associated with Y. Use α = .05.
ANSWER: H₀: β₁ = 0 and Hₐ: β₁ ≠ 0. The t value for X₁ is 3.543 with p-value .00136, which is less than .05. Thus we reject the null and conclude X₁ is significantly associated with Y.

(d) (2 points) What assumptions about ε's distribution are needed for the test in (c)?
ANSWER: We need the following:
1. εᵢ follows the same normal distribution N(0, σ²) for all i;
2. εᵢ is independent of εⱼ for j ≠ i.
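
The overall F test in (b) and the error assumptions in (d) are usually examined from the fitted object. A minimal sketch, assuming the variables Y, X1, and X2 from the 32 observations are available in the workspace:

fit <- lm(Y ~ X1 + X2)
summary(fit)            # overall F statistic and per-coefficient t tests
par(mfrow = c(2, 2))
plot(fit)               # residuals vs fitted (constant variance) and normal Q-Q (normality) checks
shapiro.test(resid(fit))   # a formal check of the normality assumption on the residuals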