Stat 500 Midterm 2 8 November 2007 page 0 of 4

Similar documents
Stat 500 Midterm 2 12 November 2009 page 0 of 11

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

Correlation Analysis

Inferences for Regression

Concordia University (5+5)Q 1.

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Chapter 16. Simple Linear Regression and Correlation

Section 3: Simple Linear Regression

Statistics for Managers using Microsoft Excel 6 th Edition

Chapter 16. Simple Linear Regression and dcorrelation

Homework 2: Simple Linear Regression

28. SIMPLE LINEAR REGRESSION III

Ph.D. Preliminary Examination Statistics June 2, 2014

Math 3330: Solution to midterm Exam

LI EAR REGRESSIO A D CORRELATIO

PART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable,

Midterm 2 - Solutions

Psych 10 / Stats 60, Practice Problem Set 10 (Week 10 Material), Solutions

MATH 644: Regression Analysis Methods

ST430 Exam 1 with Answers

Problem #1 #2 #3 #4 #5 #6 Total Points /6 /8 /14 /10 /8 /10 /56

Section 11: Quantitative analyses: Linear relationships among variables

ST430 Exam 2 Solutions

Simple Linear Regression

PLSC PRACTICE TEST ONE

Sociology 593 Exam 1 February 14, 1997

Stat 217 Final Exam. Name: May 1, 2002

Practice A Exam 3. November 14, 2018

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

Chapter 12 - Lecture 2 Inferences about regression coefficient

Formal Statement of Simple Linear Regression Model

Battery Life. Factory

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

Algebra 2 CP Semester 1 PRACTICE Exam January 2015

Basic Business Statistics 6 th Edition

STATISTICS 110/201 PRACTICE FINAL EXAM

Apart from this page, you are not permitted to read the contents of this question paper until instructed to do so by an invigilator.

MAT 2377C FINAL EXAM PRACTICE

A discussion on multiple regression models

Ch 2: Simple Linear Regression

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

You have 3 hours to complete the exam. Some questions are harder than others, so don t spend too long on any one question.

Stat 401B Exam 2 Fall 2015

Stat 412/512 REVIEW OF SIMPLE LINEAR REGRESSION. Jan Charlotte Wickham. stat512.cwick.co.nz

Ch 11- One Way Analysis of Variance

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 212 Business Statistics II 1

Rule of Thumb Think beyond simple ANOVA when a factor is time or dose think ANCOVA.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

STAT 212: BUSINESS STATISTICS II Third Exam Tuesday Dec 12, 6:00 PM

Test 3 Practice Test A. NOTE: Ignore Q10 (not covered)

Problem # Number of points 1 /20 2 /20 3 /20 4 /20 5 /20 6 /20 7 /20 8 /20 Total /150

ECON 497 Midterm Spring

Page Points Score Total: 100

Note: k = the # of conditions n = # of data points in a condition N = total # of data points

Written Exam (2 hours)

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya

Chapter 3 Multiple Regression Complete Example

STA 6167 Exam 1 Spring 2016 PRINT Name

Correlation and Simple Linear Regression

CHAPTER EIGHT Linear Regression

Algebra 2 CP Semester 1 PRACTICE Exam

Eco 391, J. Sandford, spring 2013 April 5, Midterm 3 4/5/2013

Direction: This test is worth 250 points and each problem worth points. DO ANY SIX

Stat 231 Exam 2 Fall 2013

Statistics 5100 Spring 2018 Exam 1

[4+3+3] Q 1. (a) Describe the normal regression model through origin. Show that the least square estimator of the regression parameter is given by

Inference for Regression Simple Linear Regression

Regression Analysis II

Interactions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept

Business 320, Fall 1999, Final

Inference for Regression

appstats27.notebook April 06, 2017

Diagnostics and Remedial Measures

Answers: Problem Set 9. Dynamic Models

Fall 2013 Northwestern University STAT Section 23 Introduction to Statistics. Final Exam. Name:

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10)

STAT 516 Midterm Exam 2 Friday, March 7, 2008

STAT 3A03 Applied Regression Analysis With SAS Fall 2017

Chapter 14 Student Lecture Notes 14-1

Simple Linear Regression

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

Circle the single best answer for each multiple choice question. Your choice should be made clearly.

Econometrics. 4) Statistical inference

This exam contains 5 questions. Each question is worth 10 points. Therefore, this exam is worth 50 points.

Unit 12: Analysis of Single Factor Experiments

Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013

Simple Linear Regression

STAT 515 MIDTERM 2 EXAM November 14, 2018

In the previous chapter, we learned how to use the method of least-squares

regression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist

Regression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur

Unit 10: Simple Linear Regression and Correlation

Regression With a Categorical Independent Variable: Mean Comparisons

Factorial designs. Experiments

ST 305: Final Exam ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) ( ) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ Y. σ X. σ n.

Stat 101 L: Laboratory 5

Econometrics Homework 1

Transcription:

Stat 500 Midterm 2 8 November 2007 page 0 of 4 Please put your name on the back of your answer book. Do NOT put it on the front. Thanks. DO NOT START until I tell you to. You are welcome to read this front page. The exam is closed book, closed notes. Use only the formula sheet and tables I provide today. You may use a calculator. Write your answers in your blue book. Ask if you need a second (or third) blue book. You have 2 hours (120 minutes) to complete the exam. Stop working when the end of the exam is announced. Points are indicated for each question. There are 120 total points. Important reminders: budget your time. Some parts of each question should be easy; others may be hard. Make sure you do all parts you can. notice that some parts do not require any computations. show your work neatly so you can receive partial credit. Good luck!

Stat 500 Midterm 2 8 November 2007 page 1 of 4 1. 45 points. Are baseball free agents worth it? After a professional baseball s player contract expires, that player becomes a free agent and is permitted to sell his services to the highest bidder. The salaries paid to free agents are often much higher than those paid to regular players. Do players perform better when paid higher salaries? The batting average is one measure of performance of a baseball player. Higher batting averages indicate better performance. Data were collected on 132 players who became free agents between 1976 through 1989. For each player, two batting averages were computed. PRE is the batting average the year before the player became a free agent. POST is the batting average the next year, after the player became a free agent. We want to make inferences about the mean difference in batting average. (a) These data are an example of: (write the numbers of all appropriate terms on your answer sheet) 1) Block design 2) Paired data 3) Randomized experiment 4) Observational data 5) Latin Square design (b) You want to test the null hypothesis that the difference (POST - PRE) = 0. Here are the means and standard deviations for four quantities that might be useful. Please calculate the appropriate t-statistic. Note: you do not need to provide a p-value. Quantity Average s.d. Pre Free Agent 0.258 0.0337 Post Free Agent 0.249 0.0418 Difference -0.00929 0.0459 Player average 0.253 0.0302 (c) There are 132 players and 264 observations in the data set. What are the d.f. associated the T statistic in part 1b? (d) Fill in the missing d.f. and SS. in the ANOVA table. The variable PLAYER is the player name; the variable PERIOD has two values: PRE or POST. Note: you do not need to compute MS or F values. Source d.f. SS Player 0.23902 Period 0.00569 Error Total 263 0.38316 (e) The s.d. for the pre values is smaller than that for the post values. Does this invalidate your test in part 1b or part 1d? Explain why or why not. (f) One assumption of this analysis is additivity. Briefly explain, using this study as an example, the meaning of additivity. (g) This study will be repeated, but the choice of design is not clear. Is it better to use the same design or something different? The correlation between two observations on the same player is estimated to be 0.26. What design would you recommend? Justify your choice.

Stat 500 Midterm 2 8 November 2007 page 2 of 4 (h) Below is the residual vs. predicted value plot from the ANOVA in part 1d. Does this plot indicate any problems or concerns about the analysis? 2. 20 pts. Pharmaceuticals usually have a shelf life. One reason is that the concentration of active ingredient decreases over time. One common model for this is the exponential decay model Y i = C 0 e λx i ɛ i, where Y i is the concentration at time i, C 0 is the initial concentration (at time of manufacture), X i is the time since manufacture, and λ is the decay rate. This model can be fit using linear regression by log transforming both sides to get Y i = log Y i = β 0 + λx i + ε i. Data were collected from a randomized experiment. Twenty packages of a particular drug were randomly assigned to be sampled at one of five times, 0 months, 2 months, 4 months, 8 months, and 12 months after manufacture. 4 packages were assigned to each sampling time. Each package was measured once and only once, at the assigned sampling time. The data are shown below.

Stat 500 Midterm 2 8 November 2007 page 3 of 4 (a) Four different models were fit to the data. Some unimportant details of each model are left out. The Error sums of squares for each are: Model # parameters Error SS Yi = β 0 1 10.020 Yi = β 0 + λx i 2 0.5051 Yi = β 0 + λx i + β 2 Xi 2 3 0.5050 Yi = µ + τ i 5 0.2550 Please test whether λ = 0. Report the test statistic, the appropriate d.f. and an approximate p-value. If this is not possible from the information provided, indicate what additional information you need. (b) Construct the most appropriate test of lack of fit of the linear regression, using one or more of the sums of squares in part 2a. Report the test statistic, its d.f., an approximate p-value and a short conclusion. (c) The drug company has a large batch of the drug (many packages) that has been in storage for four months. They will use the regression to predict the mean concentration of drug in this batch. They also want to report the precision of that estimate. There are three possible formulae that could be used: 1) 2) 3) MSE ( 1 + n MSE (1 + 1 + n MSE/4 (4 x)2 (xi ) x) 2 (4 x)2 (xi ) x) 2 Which is the most appropriate formula to calculate standard error? Explain (briefly) why your choice is the most appropriate. The next problem starts on the next page.

Stat 500 Midterm 2 8 November 2007 page 4 of 4 3. 55 pts. Mercury is a naturally occuring contaminant in fish in freshwater lakes. In some lakes, the mercury concentration is sufficiently high that warnings are posted to limit the consumption of fish from that lake. The mercury concentration is influenced by many characteristics of lakes. The state of Maine collected information from a random sample of 110 lakes. The variables of interest included: hg: mercury concentration in fish. elev: elevation of the lake area: area of lake drain: size of the drainage basin for the lake runoff: annual runoff into lake Here are the parameter estimates and standard errors when a multiple linear regression is fit to the data, and an ANOVA table: parameter estimate s.e. VIF Source d.f. SS β 0 0.758 0.144 0 Model 1.137 β elev -0.211 0.061 1.07 Error 7.515 β area -0.014 0.016 1.82 Total 8.652 β drain 0.0137 0.029 1.85 β runoff -0.292 0.262 1.02 (a) Fill in the degrees of freedom in the above ANOVA table. (b) An undergraduate student calculated the slope for runoff in a simple linear regression as -0.527. When they see your results (β runoff = -0.292) they are concerned that they made a calculation error. Should the two estimates be the same? If not, explain why they are likely to be different. (c) Construct a test of H0: β elev = 0, β area = 0, β drain = 0, β runoff = 0, if that is possible from the data provided. If not, indicate how you would construct the desired test. (d) Construct a test of H0: β drain = 0, β runoff = 0, if that is possible from the data provided. If not, indicate how you would construct the desired test. (e) Construct a test of H0: β area + β drain = 0, if that is possible from the data provided. If not, indicate how you would construct the desired test. (f) Construct a test of H0: β drain = 0, if that is possible from the data provided. If not, indicate how you would construct the desired test. (g) The next two pages provide a variety of diagnostic plots. In order, they are a plot of h ii for each observation, DF F IT S i for each observation, Cook s distance for each observation, externally studentized residual for each observation, and a plot of externally studentized residuals against predicted values. Also, VIF values for each parameter are in the table at the top of the page. The investigators want to use this model to predict mercury concentrations in unsampled lakes (there are a lot of lakes in Maine!). Do you have any concerns about the fitted model? If so, describe your concerns. (h) Would it be appropriate to use a Durbin-Watson test here? Explain why or why not. (i) Would it be appropriate to use a Breusch-Pagan test here? Explain why or why not.