
(Elementary) Regression Methods & Computational Statistics (405.95)
Part IV: Hypothesis Testing and Confidence Intervals (cont.)

Assoz. Prof. Dr.
Arbeitsgruppe Stochastik/Statistik, Fachbereich Mathematik, Universität Salzburg
Salzburg, January 2019

The classical t confidence interval for µ_D = µ_x - µ_y

We return once more to the two-sided t-test. Suppose that X ~ N(µ_x, σ²), where µ_x and σ² are unknown, and that Y ~ N(µ_y, σ²), where µ_y is unknown as well. Notice that X and Y have the same (unknown) variance. Given a sample X_1, ..., X_n from X and a sample Y_1, ..., Y_m from Y with m, n ≥ 2, we now want to calculate a confidence interval for the parameter µ_D := µ_x - µ_y. Remember that when testing H_0: µ_D = 0, R also returned a 95% confidence interval.

    mux <- muy <- 0
    sigmax <- 1; sigmay <- 2   # sigmay is not legible in the source; 2 is an assumed placeholder
    n <- 100                   # n is not legible in the source; 100 is an assumed placeholder
    x <- rnorm(n, mean=mux, sd=sigmax)
    y <- rnorm(n, mean=muy, sd=sigmay)

    # two-sided two-sample t-test (Welch version, since var.equal is not set)
    test <- t.test(x, y, paired=FALSE, alternative="two.sided")
    test

yields

        Welch Two Sample t-test

    data:  x and y
    t = ..., df = ..., p-value = ...
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     ... ...
    sample estimates:
    mean of x mean of y
          ...       ...

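How is this 95% confidence interval calculated? We know that S_{n,m}, given by

\[ S_{n,m} = \frac{\bar{X}_n - \bar{Y}_m - (\mu_x - \mu_y)}{S_p \sqrt{\frac{1}{n} + \frac{1}{m}}}, \]

follows a t_{n+m-2}-distribution, where S_p^2 = ((n-1) S_x^2 + (m-1) S_y^2) / (n+m-2) denotes the pooled sample variance. As a consequence,

\[ P\left( S_{n,m} \in \left[ t_{n+m-2;\,\alpha/2},\; t_{n+m-2;\,1-\alpha/2} \right] \right) = 1 - \alpha. \]

Based on this we can easily derive the following confidence interval C^{1-α}_{n,m} with coverage probability 1 - α:

\[ C^{1-\alpha}_{n,m}(X_1, \ldots, X_n, Y_1, \ldots, Y_m) = C^{1-\alpha}_{n,m} = \left[ \bar{X}_n - \bar{Y}_m - \Delta,\; \bar{X}_n - \bar{Y}_m + \Delta \right] \]

with

\[ \Delta = t_{n+m-2;\,1-\alpha/2} \; S_p \sqrt{\frac{1}{n} + \frac{1}{m}}. \]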
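The step from the probability statement to the interval is left implicit on the slide. A sketch of the rearrangement, assuming only the symmetry of the t-distribution (so that t_{n+m-2; α/2} = -t_{n+m-2; 1-α/2}) and the notation used above:

```latex
\begin{align*}
t_{n+m-2;\,\alpha/2} \le S_{n,m} \le t_{n+m-2;\,1-\alpha/2}
 &\iff \left| \bar{X}_n - \bar{Y}_m - (\mu_x - \mu_y) \right|
       \le t_{n+m-2;\,1-\alpha/2}\, S_p \sqrt{\tfrac{1}{n} + \tfrac{1}{m}} = \Delta \\
 &\iff \bar{X}_n - \bar{Y}_m - \Delta \;\le\; \mu_x - \mu_y \;\le\; \bar{X}_n - \bar{Y}_m + \Delta .
\end{align*}
```

Since the first event has probability 1 - α, so does the last one, which is exactly the statement that the interval covers µ_D with probability 1 - α.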

    # t-test for H0: mud = mux - muy = 0
    mux <- 0
    muy <- 0.2                 # shown truncated as "0." in the source; 0.2 is assumed
    sigmax <- sigmay <- 1
    n <- m <- 50
    x <- rnorm(n, mean=mux, sd=sigmax)
    y <- rnorm(m, mean=muy, sd=sigmay)
    test <- t.test(x, y, paired=FALSE, alternative="two.sided", var.equal=TRUE)
    test

    # confidence interval for mud manually
    alpha <- 0.05              # matches the default 95% level of t.test
    sp <- ((n-1)*var(x) + (m-1)*var(y)) / (n+m-2)   # pooled sample variance
    Delta <- qt(p=1-alpha/2, df=n+m-2) * sqrt(sp*(1/n+1/m))
    conf.int <- c(mean(x)-mean(y)-Delta, mean(x)-mean(y)+Delta)
    test$conf.int[1:2]
    conf.int

yields

    [1] ... ...
    [1] ... ...

Check whether the confidence interval does what it should.

    R <- 1000                  # number of repetitions; not legible in the source, 1000 is assumed
    CI <- data.frame(lower=rep(0,R), upper=rep(0,R))
    for (i in 1:R) {
      mux <- 0
      muy <- 0.2               # shown truncated as "0." in the source; 0.2 is assumed
      sigmax <- sigmay <- 1
      n <- m <- 50
      x <- rnorm(n, mean=mux, sd=sigmax)
      y <- rnorm(m, mean=muy, sd=sigmay)
      test <- t.test(x, y, paired=FALSE, alternative="two.sided", var.equal=TRUE)
      CI[i,] <- test$conf.int[1:2]
    }

    # 1 if the interval contains the true difference mux - muy, 0 otherwise
    CI$contained <- ifelse(CI$lower <= mux-muy & CI$upper >= mux-muy, 1, 0)
    coverage <- mean(CI$contained)
    coverage
    [1] ...

What happens if we change the values of µ_x and µ_y? What happens if we change n and m?

How is the hypothesis test of H_0: µ_D = 0 against the two-sided alternative related to the confidence interval? Answer: We reject H_0 if and only if 0 ∉ C^{1-α}_{n,m}, i.e. if the confidence interval does not contain 0.

Exercise 39: Confirm the answer just stated by simulations and proceed as follows:
1. Choose some values for µ_x and µ_y and simulate samples of X and Y.
2. Apply the two-sided t-test and save the p-value as well as the confidence interval.
3. Repeat the two steps R times and verify that in all R cases the p-value is less than 0.05 if and only if 0 ∉ C^{1-α}_{n,m}.
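One possible way to organise the check of the test/interval correspondence is sketched below. The seed, the number of repetitions R and the parameter values are assumptions chosen for illustration, not values from the slides.

```r
# Sketch: check that "p-value < alpha" and "0 lies outside the CI" always agree.
set.seed(1)                 # assumed seed, for reproducibility only
alpha <- 0.05
R <- 1000                   # assumed number of repetitions
mux <- 0; muy <- 0.5        # assumed parameter values
n <- m <- 50
agree <- logical(R)
for (i in 1:R) {
  x <- rnorm(n, mean = mux, sd = 1)
  y <- rnorm(m, mean = muy, sd = 1)
  test <- t.test(x, y, alternative = "two.sided", var.equal = TRUE,
                 conf.level = 1 - alpha)
  reject <- test$p.value < alpha                               # test decision
  zero.outside <- test$conf.int[1] > 0 | test$conf.int[2] < 0  # 0 not in CI
  agree[i] <- (reject == zero.outside)
}
all(agree)   # TRUE means both criteria agreed in every repetition
```

Both criteria are the same inequality on the t statistic, so they should agree in every run regardless of the chosen µ_x and µ_y.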

The bootstrap confidence interval for µ_D = µ_x - µ_y

Suppose that x_1, ..., x_n is a sample from X and that y_1, ..., y_m is a sample from Y. We repeat the following steps R times:
1. Randomly draw n values from x_1, ..., x_n and m values from y_1, ..., y_m with (!) replacement. The resulting samples x*_1, ..., x*_n and y*_1, ..., y*_m are called bootstrap samples or bootstrap replications.
2. Calculate the difference of the means of the two bootstrap samples, \bar{x}*_n - \bar{y}*_m, and save this value.

Let d*_1, ..., d*_R denote the resulting values (i.e. the differences of the means of the bootstrap samples). The bootstrap confidence interval C*_{n,m,1-α} is then defined as the interval formed by the α/2-quantile and the (1 - α/2)-quantile of the sample d*_1, ..., d*_R, i.e.

\[ C^{*}_{n,m,1-\alpha} = \left[ (F^{*}_{d})^{-1}\!\left(\tfrac{\alpha}{2}\right),\; (F^{*}_{d})^{-1}\!\left(1 - \tfrac{\alpha}{2}\right) \right], \]

where F*_d denotes the empirical distribution function of d*_1, ..., d*_R and (F*_d)^{-1} its quantile function. Let's check the details in R.

    mux <- 0
    muy <- 0.2                 # shown truncated as "0." in the source; 0.2 is assumed
    sigmax <- sigmay <- 1
    n <- m <- 50
    x <- rnorm(n, mean=mux, sd=sigmax)
    y <- rnorm(m, mean=muy, sd=sigmay)
    # classical t interval, computed for comparison
    test <- t.test(x, y, paired=FALSE, alternative="two.sided", var.equal=TRUE)
    test$conf.int[1:2]

    alpha <- 0.05
    R <- 1000                  # number of bootstrap replications; not legible in the source, 1000 is assumed
    boot.diff <- rep(0,R)
    for (i in 1:R) {
      x.boot <- sample(x, size = n, replace = TRUE)
      y.boot <- sample(y, size = m, replace = TRUE)
      boot.diff[i] <- mean(x.boot) - mean(y.boot)
    }
    ci.boot <- as.numeric(quantile(boot.diff, probs = c(alpha/2, 1-alpha/2)))
    # compare the two intervals
    test$conf.int[1:2]
    ci.boot

yields (lucky coincidence?)

    [1] ... ...
    [1] ... ...

    # systematic comparison of the two CIs
    alpha <- 0.05
    outer.R <- 100             # number of simulation runs; not legible in the source, 100 is assumed
    Results <- data.frame(lower.t=rep(0,outer.R), lower.boot=rep(0,outer.R),
                          upper.t=rep(0,outer.R), upper.boot=rep(0,outer.R))
    for (k in 1:outer.R) {
      mux <- 0; muy <- 0.2     # muy shown truncated as "0." in the source; 0.2 is assumed
      sigmax <- sigmay <- 1
      n <- m <- 50
      x <- rnorm(n, mean=mux, sd=sigmax)
      y <- rnorm(m, mean=muy, sd=sigmay)
      # classical t interval (columns 1 and 3)
      test <- t.test(x, y, paired=FALSE, alternative="two.sided", var.equal=TRUE)
      Results[k, c(1,3)] <- test$conf.int[1:2]

      # bootstrap interval (columns 2 and 4)
      R <- 1000                # not legible in the source; 1000 is assumed
      boot.diff <- rep(0,R)
      for (i in 1:R) {
        x.boot <- sample(x, size = n, replace = TRUE)
        y.boot <- sample(y, size = m, replace = TRUE)
        boot.diff[i] <- mean(x.boot) - mean(y.boot)
      }
      Results[k, c(2,4)] <- as.numeric(quantile(boot.diff, probs = c(alpha/2, 1-alpha/2)))
    }

[Figure: the two intervals for each simulation run; legend "type" with entries ci.boot and ci.t, horizontal axis "run".]
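The comparison of the two intervals can also be summarised numerically through their empirical coverage. The following is a minimal sketch under assumed settings: the seed, outer.R, R and the parameter values are illustrative choices, and replicate() is used here in place of the inner for loop.

```r
# Sketch: empirical coverage of the t interval and the bootstrap interval.
set.seed(1)                   # assumed seed, for reproducibility only
alpha <- 0.05
outer.R <- 200; R <- 500      # assumed numbers of runs / bootstrap replications
mux <- 0; muy <- 0.2          # assumed parameter values
n <- m <- 50
cover.t <- cover.boot <- logical(outer.R)
for (k in 1:outer.R) {
  x <- rnorm(n, mean = mux, sd = 1)
  y <- rnorm(m, mean = muy, sd = 1)
  # classical t interval with pooled variance
  ci.t <- t.test(x, y, var.equal = TRUE, conf.level = 1 - alpha)$conf.int[1:2]
  # differences of the means of R pairs of bootstrap samples
  boot.diff <- replicate(R, mean(sample(x, n, replace = TRUE)) -
                            mean(sample(y, m, replace = TRUE)))
  ci.boot <- as.numeric(quantile(boot.diff, probs = c(alpha/2, 1 - alpha/2)))
  # does each interval contain the true difference mux - muy?
  cover.t[k]    <- ci.t[1]    <= mux - muy & mux - muy <= ci.t[2]
  cover.boot[k] <- ci.boot[1] <= mux - muy & mux - muy <= ci.boot[2]
}
coverage <- c(t = mean(cover.t), boot = mean(cover.boot))
coverage
```

Both entries of coverage should be close to 1 - α if the intervals behave as intended; a larger outer.R reduces the simulation noise in this estimate.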

Exercises

Exercise 40: Fix µ ∈ R and σ > 0.
1. Generate a sample X_1, ..., X_n from X ~ N(µ, σ²).
2. Calculate a bootstrap confidence interval C*_{n,1-α} for the parameter µ based on the sample.
3. Use the t-test to get an exact confidence interval and compare this interval with the bootstrap interval.
4. Repeat the previous steps to get a more systematic picture of the performance of the bootstrap confidence interval.

Exercise 41: Fix µ ∈ R and σ > 0.
1. Generate a sample X_1, ..., X_n from X ~ N(µ, σ²).
2. Calculate a bootstrap confidence interval C*_{n,1-α} for the parameter σ² based on the sample.
3. Calculate the exact confidence interval

\[ I = \left[ \frac{(n-1) S_n^2}{\chi^2_{n-1;\,1-\alpha/2}},\; \frac{(n-1) S_n^2}{\chi^2_{n-1;\,\alpha/2}} \right] \]

   and compare it with the bootstrap interval.
4. Repeat the previous steps to get a more systematic picture of the performance of the bootstrap confidence interval.
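As a starting point for the comparison, the exact interval I can be evaluated with the chi-square quantile function qchisq(). This is only a sketch of that one formula; the seed and the values of mu, sigma and n are assumptions made for illustration.

```r
# Sketch: exact chi-square confidence interval for the variance sigma^2.
set.seed(1)                      # assumed seed, for reproducibility only
alpha <- 0.05
mu <- 0; sigma <- 2; n <- 30     # assumed parameter values and sample size
x <- rnorm(n, mean = mu, sd = sigma)
S2 <- var(x)                     # sample variance S_n^2
I <- c((n - 1) * S2 / qchisq(1 - alpha/2, df = n - 1),   # lower limit
       (n - 1) * S2 / qchisq(alpha/2,     df = n - 1))   # upper limit
I
```

Note that the larger quantile appears in the lower limit, so the interval is not symmetric around S2.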

Exercise 42: We have already mentioned the correspondence between two-sided hypothesis tests and confidence intervals. Return to the situation discussed in the slides (confidence interval for µ_D = µ_x - µ_y) and use the bootstrap confidence interval to derive a bootstrap hypothesis test. Evaluate the performance of the test via simulations.

Exercise 43: Fix λ > 0.
1. Generate a sample X_1, ..., X_n from X ~ E(λ) (exponential distribution).
2. Calculate a bootstrap confidence interval C*_{n,1-α} for the parameter λ based on the sample.
3. Evaluate the performance of the bootstrap confidence interval via simulations and compare the interval with an exact confidence interval (as derived in the UV "Angewandte Statistik").
