Practice Final Exam. December 14, 2009


1 New Material

1.1 ANOVA

1. A purification process for a chemical involves passing it, in solution, through a resin on which impurities are adsorbed. A chemical engineer is testing the efficiency of 3 different resins in collecting impurities; he breaks each resin into 5 pieces and measures the concentration of impurities after passing through the resins. The data are as follows:

Concentration of impurities
Resin 1    Resin 2    Resin 3
.046       .038       .031
.025       .035       .042
.014       .031       .020
.017       .022       .018
.043       .012       .039

Test the hypothesis that there is no difference in the efficiency of the resins, using analysis of variance techniques.

Solution. We want to test the hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$. The analysis of variance table is

Source        d.f.   SOS                                Mean Squares                      F-statistic
Treatments    2      SSTr = $1.4533 \times 10^{-5}$     MSTr = $7.27 \times 10^{-6}$      F = .0487
Error         12     SSE = .0018                        MSE = $1.49 \times 10^{-4}$
Total         14     SST = .0018

p-value: .953

Since our p-value is .953, we can accept the null hypothesis.

2. Four standard chemical procedures are used to determine the magnesium content in a certain chemical compound. Each procedure is used four times on a given compound with the following data resulting:

Magnesium content
Method 1    Method 2    Method 3    Method 4
76.42       80.41       74.20       86.20
78.62       82.26       72.68       86.04
80.40       81.15       78.84       84.36
78.20       79.20       80.32       80.68

Do the data indicate that the procedures yield equivalent results?

Solution. We want to test the hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$. The analysis of variance table is

Source        d.f.   SOS                Mean Squares      F-statistic
Treatments    3      SSTr = 135.7625    MSTr = 45.254     F = 7.474
Error         12     SSE = 72.66        MSE = 6.055
Total         15     SST = 208.423

p-value: .0044

Since our p-value is .0044, we can safely reject the null hypothesis.
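The sums of squares in an ANOVA table like the one for Problem 1 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `one_way_anova` is hypothetical, and the resin values are entered with their leading zeros restored (for example .046), which is an assumption about how the table reads. It does not compute the p-value, since that requires the F distribution.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SSTr, SSE, F) for a list of samples, one list per treatment."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of treatments
    n_total = len(all_values)  # total number of observations
    # Between-treatment sum of squares: group size times the squared
    # deviation of each group mean from the grand mean.
    ss_tr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-treatment (error) sum of squares.
    ss_e = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_tr = ss_tr / (k - 1)
    ms_e = ss_e / (n_total - k)
    return ss_tr, ss_e, ms_tr / ms_e

# Resin data, one list per resin (leading zeros restored: an assumption).
resins = [
    [0.046, 0.025, 0.014, 0.017, 0.043],
    [0.038, 0.035, 0.031, 0.022, 0.012],
    [0.031, 0.042, 0.020, 0.018, 0.039],
]
ss_tr, ss_e, f_stat = one_way_anova(resins)
print(f"SSTr = {ss_tr:.4e}, SSE = {ss_e:.4e}, F = {f_stat:.4f}")
```

The same function applies to the magnesium data of Problem 2 by passing four lists of four values each.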

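3. For data $x_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, show that

$$\bar{x}_{\cdot\cdot} = \sum_{i=1}^{m} \bar{x}_{i\cdot} / m = \sum_{j=1}^{n} \bar{x}_{\cdot j} / n$$

where $\bar{x}_{\cdot\cdot} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij}$ is the sample mean of all $x_{ij}$.

Solution. We can write

$$\bar{x}_{\cdot\cdot} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij} = \frac{1}{m} \sum_{i=1}^{m} \underbrace{\frac{1}{n} \sum_{j=1}^{n} x_{ij}}_{\bar{x}_{i\cdot}} = \sum_{i=1}^{m} \bar{x}_{i\cdot} / m$$

Also,

$$\bar{x}_{\cdot\cdot} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij} = \frac{1}{n} \sum_{j=1}^{n} \underbrace{\frac{1}{m} \sum_{i=1}^{m} x_{ij}}_{\bar{x}_{\cdot j}} = \sum_{j=1}^{n} \bar{x}_{\cdot j} / n$$

4. Problem 11.1.21 in the text.

Solution. The first confidence interval, for example, is

$$\mu_1 - \mu_2 \in \bar{x}_1 - \bar{x}_2 \pm \frac{s \, q_{\alpha,k,\nu}}{\sqrt{2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 46.09 - 42.21 \pm \frac{4.33 \, q_{.05,5,40}}{\sqrt{2}} \sqrt{\frac{1}{10} + \frac{1}{9}} = (-1.803, 9.563)$$

This establishes that the hypothesis $\mu_1 = \mu_2$ is plausible at the .05 significance level, since the interval contains 0. Carrying out this procedure for all pairs, the confidence intervals that contain 0 are those for $\mu_1 - \mu_2$, $\mu_2 - \mu_5$, and $\mu_3 - \mu_4$. The largest mean is $\mu_3$ or $\mu_4$ and the smallest mean is either $\mu_2$ or $\mu_5$, which can be verified by looking at the values of $\bar{x}_1$ through $\bar{x}_5$.

1.2 Regression

1. The following table shows the number of units of a good that customers ordered, when the good was priced at various levels. (In economics, these points are said to lie on a demand curve.)

Number ordered    88    112    123    136    158    172
Price             50    40     35     30     20     15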

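How many units do you think would be ordered if the price were 25?

Solution. We perform a regression of the form $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with $y_i$ being the number ordered and $x_i$ the price. Solving for $\beta_0$ and $\beta_1$, we find that $\hat\beta_0 = 206.7$ and $\hat\beta_1 = -2.38$. At $x = 25$ we estimate $\hat{Y}_x = 206.7 - 2.38 \cdot 25 = 147$ units ordered.

2. Consider the simple linear regression model

$$Y = \beta_0 + \beta_1 x + \epsilon$$

Suppose that $0 < \beta_1 < 1$.

a) Show that if $x < \frac{\beta_0}{1 - \beta_1}$, then

$$x < E(Y) < \frac{\beta_0}{1 - \beta_1}$$

Solution. We can write

$$E(Y) = \beta_0 + \beta_1 x$$
$$x = \frac{E(Y) - \beta_0}{\beta_1} < \frac{\beta_0}{1 - \beta_1}$$
$$E(Y) < \frac{\beta_1 \beta_0}{1 - \beta_1} + \beta_0 = \frac{\beta_0}{1 - \beta_1}$$

Now, to show that $x < E(Y)$, we just have to show that $x < \beta_0 + \beta_1 x$, since $E(Y) = \beta_0 + \beta_1 x$. This is straightforward: because $1 - \beta_1 > 0$, we have

$$x < \frac{\beta_0}{1 - \beta_1}$$
$$x (1 - \beta_1) < \beta_0$$
$$x - \beta_1 x < \beta_0$$
$$x < \beta_0 + \beta_1 x$$

as desired.

b) Show that if $x > \frac{\beta_0}{1 - \beta_1}$, then

$$x > E(Y) > \frac{\beta_0}{1 - \beta_1}$$

and hence conclude that $E(Y)$ always lies between $x$ and $\frac{\beta_0}{1 - \beta_1}$.

Solution. We can write

$$E(Y) = \beta_0 + \beta_1 x$$
$$x = \frac{E(Y) - \beta_0}{\beta_1} > \frac{\beta_0}{1 - \beta_1}$$
$$E(Y) > \frac{\beta_1 \beta_0}{1 - \beta_1} + \beta_0 = \frac{\beta_0}{1 - \beta_1}$$

Now, to show that $x > E(Y)$, we just have to show that $x > \beta_0 + \beta_1 x$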

since $E(Y) = \beta_0 + \beta_1 x$. This is, again, straightforward: we have

$$x > \frac{\beta_0}{1 - \beta_1}$$
$$x (1 - \beta_1) > \beta_0$$
$$x - \beta_1 x > \beta_0$$
$$x > \beta_0 + \beta_1 x$$

as desired.

3. It has been determined that the relation between stress ($S$) and the number of cycles to failure ($N$) for a particular type of alloy is given by

$$S = \frac{A}{N^m}$$

where $A$ and $m$ are unknown constants. An experiment is run yielding the following data:

Stress          55.0    50.5    43.5    42.5    42.0    41.0    35.7    34.5    33.0
N (millions)    .223    .925    6.75    18.1    29.1    50.5    126     215     445

a) Estimate $A$ and $m$ (hint: use a logarithmic transformation).

Solution. Using a logarithmic transformation, we find that

$$\log S = \log A - m \log N$$

so there is a linear relationship between $\log S$ and $\log N$. We set $y = \log S$ and $x = \log N$, and we obtain the new table

y    4.01     3.92    3.77    3.75    3.74    3.71    3.57    3.54    3.50
x    -1.50    -.08    1.91    2.89    3.37    3.92    4.84    5.37    6.10

Solving for $\beta_0$ and $\beta_1$ we find that $\hat\beta_0 = 3.92$ and $\hat\beta_1 = -.066$, and therefore $\hat{A} = e^{\hat\beta_0} = 50.51$ and $\hat{m} = -\hat\beta_1 = .066$.

b) Estimate $\beta_0$, $\beta_1$, and $\beta_2$ if we instead use the relation

$$S = \beta_0 + \beta_1 N + \beta_2 N^2$$

Why is this (probably) a less reasonable model? In particular, what happens to each model as $N \to \infty$?

Solution. This is a multi-variable regression of the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ with $y = S$, $x_1 = N$, $x_2 = x_1^2 = N^2$. We write the matrices

$$X = \begin{pmatrix} 1 & .223 & .0497 \\ 1 & .925 & .8556 \\ \vdots & \vdots & \vdots \\ 1 & 445 & 1.98 \times 10^5 \end{pmatrix}; \quad Y = \begin{pmatrix} 55.0 \\ 50.5 \\ \vdots \\ 33.0 \end{pmatrix}$$

and solve the normal equations $X^T X \beta = X^T Y$, which gives us $\hat\beta_0 = 47.86$, $\hat\beta_1 = -.114$, and $\hat\beta_2 = .0002$. This is not a very good model because $S \to \infty$ as $N \to \infty$ (as $\hat\beta_2 > 0$), which is not reflected in the data set, whereas in our original model we have $S \to 0$ as $N \to \infty$, which is reflected in the data set.

1.3 Multi-factor experiments

1. Suppose we observe the following data in a two-factor experiment:

                        Factor A
                        Level 1     Level 2
Factor B    Level 1     2, 4, 6     1, 3, 5
            Level 2     1, 3, 11    2, 6, 12

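Estimate the parameters $\mu$, $\alpha_i$, $\beta_j$, and $(\alpha\beta)_{ij}$ using the appropriate estimators.

Solution. We estimate the parameters with

$$\hat\mu = \bar{x}_{\cdot\cdot\cdot} = 4.67$$
$$\hat\alpha_1 = \bar{x}_{1\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} = -.167$$
$$\hat\alpha_2 = \bar{x}_{2\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} = .167$$
$$\hat\beta_1 = \bar{x}_{\cdot 1 \cdot} - \bar{x}_{\cdot\cdot\cdot} = -1.167$$
$$\hat\beta_2 = \bar{x}_{\cdot 2 \cdot} - \bar{x}_{\cdot\cdot\cdot} = 1.167$$
$$\widehat{(\alpha\beta)}_{11} = \bar{x}_{11\cdot} - \hat\mu - \hat\alpha_1 - \hat\beta_1 = .67$$
$$\widehat{(\alpha\beta)}_{12} = \bar{x}_{12\cdot} - \hat\mu - \hat\alpha_1 - \hat\beta_2 = -.67$$
$$\widehat{(\alpha\beta)}_{21} = \bar{x}_{21\cdot} - \hat\mu - \hat\alpha_2 - \hat\beta_1 = -.67$$
$$\widehat{(\alpha\beta)}_{22} = \bar{x}_{22\cdot} - \hat\mu - \hat\alpha_2 - \hat\beta_2 = .67$$

2. (This one's hard) Consider a two-factor experiment with cell means $\mu_{ij}$ decomposed as

$$\mu_{ij} = \mu + \alpha_i + \beta_j$$

Notice that there are no interaction effects (i.e. $(\alpha\beta)_{ij} = 0$ for all $ij$). Suppose that $(\mu, (\alpha_1, \ldots, \alpha_a), (\beta_1, \ldots, \beta_b))$ and $(\bar\mu, (\bar\alpha_1, \ldots, \bar\alpha_a), (\bar\beta_1, \ldots, \bar\beta_b))$ satisfy

$$\mu + \alpha_i + \beta_j = \bar\mu + \bar\alpha_i + \bar\beta_j \qquad (1)$$

for all $ij$ and

$$\sum_{i=1}^{a} \alpha_i = \sum_{i=1}^{a} \bar\alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{j=1}^{b} \bar\beta_j = 0 \qquad (2)$$

Show that $\mu = \bar\mu$, $\alpha_i = \bar\alpha_i$, and $\beta_j = \bar\beta_j$. (This shows that the parameters $(\mu, (\alpha_1, \ldots, \alpha_a), (\beta_1, \ldots, \beta_b))$ are uniquely determined.)

Hint 1. First, show that $\mu = \bar\mu$. To do this, suppose for a contradiction that $\mu > \bar\mu$. Then from (1) it must be true that

$$\alpha_i + \beta_j < \bar\alpha_i + \bar\beta_j$$

If we sum over all $i$ and $j$, we have

$$\sum_{i=1}^{a} \sum_{j=1}^{b} (\alpha_i + \beta_j) < \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar\alpha_i + \bar\beta_j)$$

which is a contradiction with (2). Why?

Hint 2. Since $\mu = \bar\mu$, we know from (1) that

$$\alpha_i + \beta_j = \bar\alpha_i + \bar\beta_j \qquad (3)$$

for all $ij$. Suppose that $\alpha_i > \bar\alpha_i$ for some index $i$. Since (3) says that

$$\alpha_i + \beta_j = \bar\alpha_i + \bar\beta_j$$

it must be true that

$$\beta_j < \bar\beta_j$$

for all $j$. However, this is a contradiction with (2). Why?

Solution. Following Hint 1, we have

$$\sum_{i=1}^{a} \sum_{j=1}^{b} (\alpha_i + \beta_j) < \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar\alpha_i + \bar\beta_j) \qquad (4)$$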

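However, (2) says that

$$\sum_{i=1}^{a} \alpha_i = \sum_{i=1}^{a} \bar\alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{j=1}^{b} \bar\beta_j = 0$$

and therefore

$$\sum_{i=1}^{a} \sum_{j=1}^{b} (\alpha_i + \beta_j) = \sum_{i=1}^{a} \Big( b\alpha_i + \underbrace{\sum_{j=1}^{b} \beta_j}_{0} \Big) = \sum_{i=1}^{a} b\alpha_i = b \sum_{i=1}^{a} \alpha_i = 0$$

We could also conclude that $\sum_{i=1}^{a} \sum_{j=1}^{b} (\bar\alpha_i + \bar\beta_j) = 0$, using the same reasoning. Then (4) says that $0 < 0$, a contradiction.

Next, following Hint 2, we have $\beta_j < \bar\beta_j$ for all $j$. This is again a contradiction because (2) says that

$$\sum_{j=1}^{b} \beta_j = \sum_{j=1}^{b} \bar\beta_j = 0$$

but we concluded in the hint that $\beta_j < \bar\beta_j$ for all $j$, so

$$\sum_{j=1}^{b} \beta_j < \sum_{j=1}^{b} \bar\beta_j$$

which again implies that $0 < 0$, a contradiction. The cases $\mu < \bar\mu$ and $\alpha_i < \bar\alpha_i$ are handled identically with the inequalities reversed, and once $\mu = \bar\mu$ and $\alpha_i = \bar\alpha_i$ for all $i$, (3) gives $\beta_j = \bar\beta_j$.

3. Consider a two-factor layout. Show that the estimators

$$\hat\mu = \bar{x}_{\cdot\cdot\cdot}$$
$$\hat\alpha_i = \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot}$$
$$\hat\beta_j = \bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot\cdot}$$
$$\widehat{(\alpha\beta)}_{ij} = \bar{x}_{ij\cdot} - \hat\mu - \hat\alpha_i - \hat\beta_j$$

satisfy

$$\sum_{i=1}^{a} \hat\alpha_i = \sum_{j=1}^{b} \hat\beta_j = 0$$

(in fact, it is also true that $\sum_{i=1}^{a} \widehat{(\alpha\beta)}_{ij} = 0$ and $\sum_{j=1}^{b} \widehat{(\alpha\beta)}_{ij} = 0$, but you don't need to show that here).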

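Solution. We have

$$\sum_{i=1}^{a} \hat\alpha_i = \sum_{i=1}^{a} (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot}) = \sum_{i=1}^{a} \bar{x}_{i\cdot\cdot} - a \bar{x}_{\cdot\cdot\cdot}$$
$$= \sum_{i=1}^{a} \frac{1}{bn} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk} - a \cdot \frac{1}{abn} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk}$$
$$= \frac{1}{bn} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk} - \frac{1}{bn} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk} = 0$$

Similarly,

$$\sum_{j=1}^{b} \hat\beta_j = \sum_{j=1}^{b} (\bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot\cdot}) = \sum_{j=1}^{b} \bar{x}_{\cdot j \cdot} - b \bar{x}_{\cdot\cdot\cdot}$$
$$= \sum_{j=1}^{b} \frac{1}{an} \sum_{i=1}^{a} \sum_{k=1}^{n} x_{ijk} - b \cdot \frac{1}{abn} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk}$$
$$= \frac{1}{an} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk} - \frac{1}{an} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk} = 0$$

2 Old statistics material

1. Find a maximum likelihood estimator for the parameter $p$ in a Bernoulli random variable, letting $A$ be the number of successes and $B = n - A$ the number of failures.

Solution. Let $x_1, \ldots, x_n$ represent a collection of samples. We write the likelihood function

$$L(x_1, \ldots, x_n; p) = p^A (1 - p)^B$$
$$l = \log L = A \log p + B \log(1 - p)$$
$$\frac{dl}{dp} = \frac{A}{p} - \frac{B}{1 - p} = 0$$
$$\hat{p} = A / (A + B) = A / n$$

2. Find a maximum likelihood estimator for the parameter $p$ in a geometric random variable. Is this the same estimator that one would obtain with the method of moments?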

Solution. Let $x_1, \ldots, x_n$ represent a collection of samples. We write the likelihood function

$$L(x_1, \ldots, x_n; p) = \left[ (1 - p)^{x_1 - 1} p \right] \cdots \left[ (1 - p)^{x_n - 1} p \right] = (1 - p)^{x_1 + \cdots + x_n - n} p^n$$
$$l = \log L = (x_1 + \cdots + x_n - n) \log(1 - p) + n \log p$$
$$\frac{dl}{dp} = \frac{n}{p} - \frac{x_1 + \cdots + x_n - n}{1 - p} = 0$$
$$\hat{p} = \frac{n}{x_1 + \cdots + x_n} = 1 / \bar{x}$$

which is indeed the same estimator as the method of moments would give.

3. During two consecutive seasons in the NBA, Larry Bird shot a pair of free throws on 338 occasions. On 251 occasions he made both shots; on 34 occasions he made the first shot but missed the second one; on 48 occasions he missed the first shot but made the second one; on 5 occasions he missed both shots.

a) Use these data to test the hypothesis that Bird's probability of making the first shot is equal to his probability of making the second shot.

Solution. We'll model this as a two-population hypothesis test, using the method of paired samples. Let $x_1, \ldots, x_{338}$ denote the set of all first shots that Bird made, and let $y_1, \ldots, y_{338}$ denote the set of all second shots that he made. We want to test the null hypothesis $H_0: \mu = 0$, where $\mu = E(Z)$ with $z_i = x_i - y_i$. Notice that, given the data, we know that we have $z_i = 0$ for $251 + 5 = 256$ occasions (the occasions when he made both or missed both), $z_i = -1$ for 48 occasions, and $z_i = 1$ for 34 occasions. Hence, we find that

$$\hat\mu = \bar{z} = \frac{256 \cdot 0 + 48 \cdot (-1) + 34 \cdot 1}{338} = -.0414$$

We find that

$$S^2 = \frac{\sum_{i=1}^{338} (z_i - \bar{z})^2}{337} = \frac{256 (0 - \bar{z})^2 + 48 (-1 - \bar{z})^2 + 34 (1 - \bar{z})^2}{337} = .2416$$

Our t-statistic is

$$t = \frac{\sqrt{n} (\bar{z} - \mu_0)}{s} = \frac{\sqrt{338} (-.0414)}{\sqrt{.2416}} = -1.55$$

Using the 10% significance level, we see that $t_{.05,338} = 1.645$. Since $|t| = 1.55 < 1.645$, the hypothesis is plausible.

b) Use these data to test the hypothesis that Bird's probability of making the second shot is the same regardless of whether he made or missed the first one.

Solution. We'll again model this as a two-population hypothesis test, but this time we can't use paired samples.
The two populations we're comparing are:

a) The set of second shots, after a successful first shot (population A)
b) The set of second shots, after an unsuccessful first shot (population B)

There are $251 + 34 = 285$ occasions in which Bird successfully made his first shot, and $48 + 5 = 53$ occasions in which he missed his first shot. We'll test the hypothesis $H_0: \mu_A = \mu_B$. Let the members of population A be denoted by $x_1, \ldots, x_{285}$, where $x_i$ is 0 if Bird missed his second shot and $x_i$ is 1 if he made his second shot, and let the members of population B be denoted by $y_1, \ldots, y_{53}$, with similar definitions for $y_i$. Since Bird made his second shot on 251 of the 285 occasions that he made his first shot, we have

$$\bar{x} = 251 / 285 = .8807$$

Similarly, since Bird made his second shot on 48 of the 53 occasions that he missed his first shot, we have

$$\bar{y} = 48 / 53 = .905$$

So, we have $\bar{x}$ and $\bar{y}$; we just need the variances $S_x^2$ and $S_y^2$, and we'll be all set. We have

$$S_x^2 = \frac{\sum_{i=1}^{285} (x_i - \bar{x})^2}{284} = \frac{251 (1 - .8807)^2 + 34 (0 - .8807)^2}{284} = .1054$$
$$S_y^2 = \frac{\sum_{i=1}^{53} (y_i - \bar{y})^2}{52} = \frac{48 (1 - .905)^2 + 5 (0 - .905)^2}{52} = .0871$$

Our t-statistic is

$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{S_x^2}{n} + \frac{S_y^2}{m}}} = \frac{.8807 - .905}{\sqrt{\frac{.1054}{285} + \frac{.0871}{53}}} = -.5416$$

Using the 10% significance level, we see that $t_{.05,52} = 1.677$. Since $|t| = .5416 < 1.677$, the hypothesis is accepted.

Hint. Each shot is a Bernoulli random variable, with $X = 0$ indicating a miss and $X = 1$ indicating a basket. The first question asks you to test the hypothesis $H_0: \mu_1 = \mu_2$, where $\mu_1$ is the probability of success of the first shot and $\mu_2$ is the probability of success of the second shot. The second question asks you to test the hypothesis $H_0: \mu_1 = \mu_2$, where $\mu_1$ is the probability of success of the second shot when the first shot was a miss, and $\mu_2$ is the probability of success of the second shot when the first shot was a success.

4. In a certain chemical process, it is very important that a particular solution that is to be used as a reactant have a pH of exactly 8.20. Suppose 10 independent measurements yielded the following pH values:

8.18, 8.17, 8.16, 8.15, 8.17, 8.21, 8.22, 8.16, 8.19, 8.18

a) What conclusion can be drawn at the $\alpha = .10$ level of significance?

Solution. We want to test the null hypothesis $H_0: \mu = 8.20$. We find that $\bar{x} = 8.179$. The sample variance is $S^2 = .00049889$, so $s = .0223$. The t-statistic is

$$t = \frac{\sqrt{n} (\bar{x} - \mu_0)}{s} = \frac{\sqrt{10} (8.179 - 8.20)}{.0223} = -2.9779$$

Since $t_{.05,9} = 1.833$, we find that $|t| > t_{.05,9}$, so the hypothesis is rejected.

b) What about at the $\alpha = .05$ level of significance?

Solution. Since $t_{.025,9} = 2.262$, we find that $|t| > t_{.025,9}$, so we can still reject the hypothesis.

5. A certain type of bipolar transistor has a mean value of current gain that is at least 210. A sample of these transistors is tested. If the sample mean value of current gain is 200 with a sample standard deviation of 35, would the claim be rejected at the 5 percent level of significance if

a) the sample size is 25?

Solution. We want to test the null hypothesis $H_0: \mu \geq 210$.
If the sample size is 25 and $s = 35$, the t-statistic is

$$t = \frac{\sqrt{n} (\bar{x} - \mu_0)}{s} = \frac{\sqrt{25} (200 - 210)}{35} = -1.4286$$

Since $t_{.05,24} = 1.711$, we find that $t > -1.711$, so we can accept the null hypothesis.

b) the sample size is 64?

Solution. We want to test the null hypothesis $H_0: \mu \geq 210$. If the sample size is 64 and $s = 35$, the t-statistic is

$$t = \frac{\sqrt{n} (\bar{x} - \mu_0)}{s} = \frac{\sqrt{64} (200 - 210)}{35} = -2.2857$$

Since $t_{.05,64} = 1.671$, we find that $t < -1.671$, so we should reject the null hypothesis.
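The t-statistics in Problems 4 and 5 all use the same formula, $t = \sqrt{n}(\bar{x} - \mu_0)/s$, so they are easy to recompute. Below is a minimal sketch with the standard library only; the helper name `t_statistic` is hypothetical, and the critical values are simply the table values quoted in the solutions (the standard library has no t-distribution quantile function).

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t-statistic: sqrt(n) * (sample mean - mu0) / s."""
    return sqrt(n) * (sample_mean - mu0) / s

# Problem 4: pH measurements, testing H0: mu = 8.20.
ph = [8.18, 8.17, 8.16, 8.15, 8.17, 8.21, 8.22, 8.16, 8.19, 8.18]
t_ph = t_statistic(mean(ph), 8.20, stdev(ph), len(ph))

# Problem 5: transistor gain, sample mean 200, s = 35, mu0 = 210.
t_25 = t_statistic(200, 210, 35, 25)
t_64 = t_statistic(200, 210, 35, 64)

# Critical values copied from the solutions (table lookups, not computed).
T_05_9, T_025_9 = 1.833, 2.262
T_05_24, T_05_64 = 1.711, 1.671

print(f"pH: t = {t_ph:.4f}, reject at .10: {abs(t_ph) > T_05_9}, "
      f"reject at .05: {abs(t_ph) > T_025_9}")
print(f"n = 25: t = {t_25:.4f}, reject: {t_25 < -T_05_24}")
print(f"n = 64: t = {t_64:.4f}, reject: {t_64 < -T_05_64}")
```

Because `stdev` keeps the unrounded sample standard deviation, the pH statistic comes out near -2.97 rather than the -2.98 obtained with $s$ rounded to .0223; the conclusions are unchanged.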

3 Probability Questions

1. The density function of $X$ is given by

$$f(x) = \begin{cases} a + bx^2 & 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}$$

If $E(X) = 3/5$, find $a$ and $b$.

Solution. We have

$$\int_0^1 (a + bx^2) \, dx = 1$$
$$\left[ ax + \frac{b}{3} x^3 \right]_0^1 = 1$$
$$a + b/3 = 1$$

Next, we also have

$$\int_0^1 x (a + bx^2) \, dx = 3/5$$
$$\left[ \frac{a}{2} x^2 + \frac{b}{4} x^4 \right]_0^1 = 3/5$$
$$a/2 + b/4 = 3/5$$

Solving simultaneously for $a$ and $b$, we have $a = 3/5$, $b = 6/5$.

2. The lifetime in hours of electronic tubes is a random variable having a probability density function given by

$$f(x) = a^2 x e^{-ax}, \quad x \geq 0$$

Compute $E(X)$.

Solution. We have

$$E(X) = \int_0^\infty x \left( a^2 x e^{-ax} \right) dx = a^2 \int_0^\infty x^2 e^{-ax} \, dx$$

Integrating by parts, we find that

$$\int x^2 e^{-ax} \, dx = -e^{-ax} \, \frac{ax(ax + 2) + 2}{a^3}$$

and therefore

$$a^2 \int_0^\infty x^2 e^{-ax} \, dx = a^2 \left[ -e^{-ax} \, \frac{ax(ax + 2) + 2}{a^3} \right]_0^\infty = \frac{2}{a}$$

3. Consider a sequence of independent uniform random variables $X_i \sim U(0, 1)$.

a) Let

$$X = \max\{X_1, \ldots, X_n\}$$

Write the c.d.f. and p.d.f. of $X$.

Solution. The c.d.f. for $X$ is

$$F(x) = \Pr(X \leq x) = \Pr(X_1, \ldots, X_n \leq x)$$

For any particular $X_i$, the probability that $X_i \leq x$ is precisely $x$. Therefore, by independence, $F(x) = x^n$ for $0 \leq x \leq 1$, so $f(x) = n x^{n-1}$.

b) Compute $E(X)$.

Solution. We have

$$E(X) = \int_0^1 x \left( n x^{n-1} \right) dx = \frac{n}{n + 1} \left[ x^{n+1} \right]_0^1 = \frac{n}{n + 1}$$

4. The annual rainfall in Cincinnati is normally distributed with mean 40.14 inches and standard deviation 8.7 inches.

a) What is the probability this year's rainfall will exceed 42 inches?

Solution. Let $X$ denote the annual rainfall in Cincinnati. We have

$$\Pr(X > 42) = \Pr\left( \frac{X - 40.14}{8.7} > \frac{42 - 40.14}{8.7} \right) = \Pr(Z > .2138) = 1 - \Phi(.2138) = .4154$$

where, as usual, $Z \sim N(0, 1)$.

b) What is the probability that the sum of the next 2 years' rainfall will exceed 84 inches?

Solution. Let $X_1$ denote the rainfall next year, and $X_2$ the rainfall the year after that. Then $X_1, X_2 \sim N(40.14, 8.7^2)$ and therefore the sum $X = X_1 + X_2$ satisfies $X \sim N(80.28, 8.7^2 + 8.7^2)$. We have

$$\Pr(X > 84) = \Pr\left( \frac{X - 80.28}{\sqrt{8.7^2 + 8.7^2}} > \frac{84 - 80.28}{\sqrt{8.7^2 + 8.7^2}} \right) = \Pr(Z > .3023) = 1 - \Phi(.3023) = .3812$$

c) What is the probability that the sum of the next 3 years' rainfall will exceed 126 inches?

Solution. Let $X_1$ denote the rainfall next year, $X_2$ the rainfall the year after that, and $X_3$ the rainfall the year after that. Then $X_1, X_2, X_3 \sim N(40.14, 8.7^2)$ and therefore the sum $X = X_1 + X_2 + X_3$ satisfies $X \sim N(120.42, 8.7^2 + 8.7^2 + 8.7^2)$. We have

$$\Pr(X > 126) = \Pr\left( \frac{X - 120.42}{\sqrt{8.7^2 + 8.7^2 + 8.7^2}} > \frac{126 - 120.42}{\sqrt{8.7^2 + 8.7^2 + 8.7^2}} \right) = \Pr(Z > .3703) = 1 - \Phi(.3703) = .3556$$

d) For parts b) and c), what independence assumptions are you making?

Solution. We're assuming that $X_1$, $X_2$, and $X_3$ are all independent; this assumption is necessary to justify the statement that $\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2)$, for example.
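The rainfall probabilities in Problem 4 can be reproduced without a normal table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (available from Python 3.8); the function name `prob_total_exceeds` is hypothetical, and the code relies on the same independence assumption as part d), so that the variance of the sum is the sum of the variances.

```python
from math import sqrt
from statistics import NormalDist

MEAN, SD = 40.14, 8.7  # annual rainfall parameters, in inches

def prob_total_exceeds(threshold, years):
    """P(total rainfall over `years` independent years > threshold)."""
    # Sum of independent normals: means add, variances add.
    total = NormalDist(mu=years * MEAN, sigma=SD * sqrt(years))
    return 1.0 - total.cdf(threshold)

p1 = prob_total_exceeds(42, 1)
p2 = prob_total_exceeds(84, 2)
p3 = prob_total_exceeds(126, 3)
print(f"{p1:.4f} {p2:.4f} {p3:.4f}")
```

The three probabilities come out near .415, .381, and .356 for one, two, and three years respectively.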