1 Hypothesis test of a mean vector


THE UNIVERSITY OF CHICAGO
Booth School of Business
Business 41912, Spring Quarter 2010, Mr. Ruey S. Tsay

Lecture: Inference about sample mean

Key concepts:
1. Hotelling's $T^2$ test
2. Likelihood ratio test
3. Various confidence regions
4. Applications
5. Missing values
6. Impact of serial correlations

1 Hypothesis test of a mean vector

Let $x_1, \ldots, x_n$ be a random sample from a $p$-dimensional normal population with mean $\mu$ and positive-definite variance-covariance matrix $\Sigma$. Consider the testing problem

$$H_o: \mu = \mu_o \quad \text{versus} \quad H_a: \mu \neq \mu_o,$$

where $\mu_o$ is a known vector. This is a generalization of the one-sample $t$ test of the univariate case.

When $p = 1$, the test statistic is

$$t = \frac{\bar{x} - \mu_o}{s/\sqrt{n}}, \quad \text{with} \quad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$

The statistic follows a Student-$t$ distribution with $n-1$ degrees of freedom. One rejects $H_o$ if the p-value of $t$ is less than the type-I error, denoted by $\alpha$. For generalization, we rewrite the test as

$$t^2 = (\bar{x} - \mu_o)\left(\frac{s^2}{n}\right)^{-1}(\bar{x} - \mu_o).$$

One rejects $H_o$ if and only if $t^2 \geq t^2_{n-1}(\alpha/2)$, where $t_{n-1}(\alpha/2)$ is the upper $100(\alpha/2)$th percentile of the $t$-distribution with $n-1$ degrees of freedom. A natural generalization of this test statistic is

$$T^2 = (\bar{x} - \mu_o)'\left(\frac{S}{n}\right)^{-1}(\bar{x} - \mu_o), \tag{1}$$

where $S = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'$ is the sample covariance matrix.
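The statistic in (1) can be computed in a few lines of base R. The sketch below is only an illustration: the function name hotelling_t2 is hypothetical (it is not the course's own program), the data are simulated, and the p-value relies on the scaled F reference distribution of $T^2$ under normality, $\frac{n-p}{(n-1)p}T^2 \sim F_{p,n-p}$.

```r
# Hypothetical helper (not the course's own program): one-sample Hotelling T^2 test.
hotelling_t2 <- function(x, mu0) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  stopifnot(n > p, length(mu0) == p)
  d <- colMeans(x) - mu0                   # xbar - mu_o
  S <- var(x)                              # sample covariance matrix (divisor n - 1)
  T2 <- n * drop(t(d) %*% solve(S, d))     # n (xbar - mu_o)' S^{-1} (xbar - mu_o)
  Fstat <- (n - p) / ((n - 1) * p) * T2    # F(p, n - p) under H_o
  pval <- pf(Fstat, p, n - p, lower.tail = FALSE)
  c(T2 = T2, F = Fstat, p.value = pval)
}

# Simulated data, used only to exercise the function
set.seed(1)
x <- matrix(rnorm(60), nrow = 20, ncol = 3)
res <- hotelling_t2(x, mu0 = c(0, 0, 0))
print(res)

# With p = 1 the statistic should equal the squared one-sample t statistic
y <- x[, 1, drop = FALSE]
res1 <- hotelling_t2(y, 0)
tt <- t.test(y[, 1], mu = 0)
```

Using solve(S, d) instead of forming the inverse of S explicitly is a small numerical convenience; the notes' own transcript inverts S with solve(S).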

The statistic in (1) is Hotelling's $T^2$ statistic. Under $H_o$ it is distributed as

$$T^2 \sim \frac{(n-1)p}{n-p}F_{p,n-p},$$

where $F_{u,v}$ denotes the $F$-distribution with degrees of freedom $u$ and $v$. Recall that $[t_{n-1}(\alpha/2)]^2 = F_{1,n-1}(\alpha)$, where $F_{u,v}(\alpha)$ is the upper $100\alpha$th percentile of the $F$-distribution with $u$ and $v$ degrees of freedom. Thus, when $p = 1$, the $T^2$ test reduces to the usual one-sample $t$ test.

Example 1. Consider the monthly log returns of Boeing (BA), Abbott Labs (ABT), Motorola (MOT) and General Motors (GM) from January 1998 to December 2007. The log returns are in percentages. Let $\mu_o = 0$. Test the null hypothesis that the mean vector of the log returns is zero.

Answer: Except for three possible outlying observations, the chi-square QQ-plot indicates that the normal assumption is reasonable. See the QQ-plot in Figure 1.

> setwd("c:/users/rst/teaching/ama")
> x=read.table("m-ba4c9807.txt")
> head(x)
  V1 V2 V3 V4
> source("qqchi2.r")   <== Check normality
> qqchi2(x)
[1] "correlation coefficient:"
## Summary statistics
> xave=colMeans(x)   <== Column means
> xave
V1 V2 V3 V4
> sqrt(apply(x,2,var))
V1 V2 V3 V4
> S=var(x)
> S
   V1 V2 V3 V4
V1
V2
V3
V4
> Si=solve(S)   <== Find the inverse of S
> T2=nrow(x)*t(xave)%*%Si%*%xave
> T2
     [,1]
[1,]
### Results of marginal tests
> t.test(x[,1])
        One Sample t-test
data:  x[, 1]
t = , df = 119, p-value =
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x
> t.test(x[,2])
        One Sample t-test
data:  x[, 2]
t = 0.7465, df = 119, p-value =
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x
> t.test(x[,3])
        One Sample t-test
data:  x[, 3]
t = 1.1524, df = 119, p-value =
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x

Figure 1: Chi-square QQ plot for monthly log returns of 4 stocks (vertical axis: d; horizontal axis: quantile of chi-square).

Example 2. Consider the Perspiration data of 20 observations in Table 5.1 of the text. There are three measurements, namely sweat rate ($X_1$), Sodium ($X_2$), and Potassium ($X_3$). Test $H_o: \mu = (4, 50, 10)'$ versus $H_a: \mu \neq (4, 50, 10)'$.

> z=read.table("t5-1.dat")
> dim(z)
[1] 20  3
> colnames(z) <- c("rate","sodium","potassium")
> qqchi2(z)
[1] "correlation coefficient:"   <== Normality seems reasonable

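> source("hotelling.r")
> Hotelling(z,c(4,50,10))
             [,1]
Hotelling-T2
p.value            <== Cannot reject the null hypothesis at the 5% level

Remark: Let $y = Cx + d$, where $C$ is a $p \times p$ non-singular matrix and $d$ is a $p$-dimensional vector. Then, testing $H_o: \mu = \mu_o$ of $x$ is equivalent to testing $H_o: \mu_y = C\mu_o + d$ of $y$. It turns out that, as expected, the Hotelling $T^2$ statistic is identical.

Proof: $\bar{y} = C\bar{x} + d$ and $S_y = CSC'$. Writing $\mu_{yo} = C\mu_o + d$,

$$
\begin{aligned}
T^2 &= n(\bar{y} - \mu_{yo})' S_y^{-1} (\bar{y} - \mu_{yo}) \\
    &= n(C\bar{x} + d - C\mu_o - d)' (CSC')^{-1} (C\bar{x} + d - C\mu_o - d) \\
    &= n[C(\bar{x} - \mu_o)]' (C')^{-1} S^{-1} C^{-1} [C(\bar{x} - \mu_o)] \\
    &= n(\bar{x} - \mu_o)' S^{-1} (\bar{x} - \mu_o).
\end{aligned}
$$

2 Likelihood ratio test

Key result: The Hotelling $T^2$ test statistic is equivalent to the likelihood ratio test statistic.

General theory: Let $\theta$ be the parameter vector of a likelihood function $L(\theta)$ with observations $x_1, \ldots, x_n$. Suppose that the parameter space is $\Theta$. Consider the null hypothesis $H_o: \theta \in \Theta_o$ versus $H_a: \theta \notin \Theta_o$, where $\Theta_o$ is a subspace of $\Theta$. The likelihood ratio statistic is

$$\Lambda = \frac{\max_{\theta \in \Theta_o} L(\theta)}{\max_{\theta \in \Theta} L(\theta)}. \tag{2}$$

One rejects $H_o$ if $\Lambda < c$, where $c$ is some critical value depending on the type-I error. Intuitively, one rejects $H_o$ if the maximized likelihood function over the subspace $\Theta_o$ is much smaller than that over the parameter space $\Theta$, indicating that it is unlikely for $\theta$ to be in $\Theta_o$. Specifically, under the null hypothesis $H_o$, the likelihood ratio statistic satisfies

$$-2\ln(\Lambda) \sim \chi^2_{v - v_o}, \quad \text{as } n \to \infty,$$

where $v = \dim(\Theta)$ and $v_o = \dim(\Theta_o)$.

Remark: $\Theta_o \subset \Theta$ indicates that the null model is nested in the alternative model. For non-nested models, the likelihood ratio test does not apply. [See some recent papers on the generalized likelihood ratio test.]

Normal case: For the multivariate normal distribution, the likelihood ratio statistic is relatively simple because the limiting distribution of $\Lambda$ is available. Specifically, consider $H_o: \mu = \mu_o$ versus $H_a: \mu \neq \mu_o$. Here $\theta = (\mu', \sigma_{11}, \sigma_{21}, \sigma_{22}, \ldots, \sigma_{p1}, \ldots, \sigma_{pp})'$ is of dimension $p + p(p+1)/2$. Under $H_o$, $\mu$ is fixed at $\mu_o$ so that $\Theta_o$ consists of the space for the elements of $\Sigma$. In this case,

$$\Lambda = \frac{\max_{\Sigma} L(\mu_o, \Sigma)}{\max_{\mu, \Sigma} L(\mu, \Sigma)}.$$

Recall that, under $\Theta$,

$$\max_{\mu, \Sigma} L(\mu, \Sigma) = L(\hat{\mu}, \hat{\Sigma}) = \frac{1}{(2\pi)^{np/2} |\hat{\Sigma}|^{n/2}} e^{-np/2},$$

where $\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'$. Under $\Theta_o$, the likelihood function becomes

$$L(\mu_o, \Sigma) = \frac{1}{(2\pi)^{np/2} |\Sigma|^{n/2}} \exp\left[-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu_o)' \Sigma^{-1} (x_i - \mu_o)\right].$$

By the same method as before, we can show that

$$\max_{\Sigma} L(\mu_o, \Sigma) = L(\mu_o, \hat{\Sigma}_o) = \frac{1}{(2\pi)^{np/2} |\hat{\Sigma}_o|^{n/2}} e^{-np/2},$$

where $\hat{\Sigma}_o = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_o)(x_i - \mu_o)'$. Consequently,

$$\Lambda = \left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_o|}\right)^{n/2}.$$

The statistic $\Lambda^{2/n} = |\hat{\Sigma}| / |\hat{\Sigma}_o|$ is called Wilks' lambda.

Result 5.1. Let $X_1, \ldots, X_n$ be a random sample from a $N_p(\mu, \Sigma)$ population. Consider the hypothesis testing $H_o: \mu = \mu_o$ versus $H_a: \mu \neq \mu_o$. Then, the likelihood ratio statistic becomes

$$\Lambda^{2/n} = \left(1 + \frac{T^2}{n-1}\right)^{-1},$$

where $T^2$ is Hotelling's $T^2$ statistic.

Proof. The following equality of determinants is useful:

$$(-1)\left|\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' + n(\bar{x} - \mu_o)(\bar{x} - \mu_o)'\right| = \left|\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'\right| \left| -1 - n(\bar{x} - \mu_o)'\left[\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'\right]^{-1}(\bar{x} - \mu_o)\right|.$$

This follows directly from the identity

$$|A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}| = |A_{11}|\,|A_{22} - A_{21}A_{11}^{-1}A_{12}|$$

with

$$A = \begin{bmatrix} \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' & \sqrt{n}(\bar{x} - \mu_o) \\ \sqrt{n}(\bar{x} - \mu_o)' & -1 \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}.$$

Note also that

$$\sum_{i=1}^{n}(x_i - \mu_o)(x_i - \mu_o)' = \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' + n(\bar{x} - \mu_o)(\bar{x} - \mu_o)'.$$

Therefore,

$$(-1)\left|\sum_{i=1}^{n}(x_i - \mu_o)(x_i - \mu_o)'\right| = \left|\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'\right| (-1)\left(1 + \frac{T^2}{n-1}\right).$$

That is,

$$|n\hat{\Sigma}_o| = |n\hat{\Sigma}|\left(1 + \frac{T^2}{n-1}\right).$$

Consequently,

$$\Lambda^{2/n} = \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_o|} = \left(1 + \frac{T^2}{n-1}\right)^{-1}.$$

Remark: The prior result suggests that there is no need to calculate $S^{-1}$ in computing Hotelling's $T^2$ statistic. Indeed,

$$T^2 = \frac{(n-1)|\hat{\Sigma}_o|}{|\hat{\Sigma}|} - (n-1).$$

3 Confidence region

Let $\theta$ be the parameter vector of interest and $\Theta$ be the parameter space. For a random sample $X_1, \ldots, X_n$, let $X$ denote the data. A $100(1-\alpha)\%$ confidence region $R(X)$ is defined as

$$\Pr[\theta \in R(X)] = 1 - \alpha,$$

where the probability is evaluated at the unknown true parameter $\theta$. That is, the region $R(X)$ will cover $\theta$ with probability $1 - \alpha$.

For the mean vector $\mu$ of a multivariate normal distribution, the confidence region (CR) can be obtained by the result

$$T^2 \sim \frac{(n-1)p}{n-p}F_{p,n-p}.$$

That is,

$$\Pr\left[n(\bar{X} - \mu)' S^{-1} (\bar{X} - \mu) \leq \frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)\right] = 1 - \alpha,$$

whatever the values of $\mu$ and $\Sigma$, where $S = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})'$. In other words, if we measure the distance using the variance-covariance matrix $\frac{1}{n}S$, then $\bar{X}$ will be within $[(n-1)pF_{p,n-p}(\alpha)/(n-p)]^{1/2}$ of $\mu$ with probability $1 - \alpha$.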
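As a numerical check of Result 5.1 and of the determinant form of $T^2$, the following sketch compares the two ways of computing the statistic and then tests whether a candidate $\mu_o$ falls inside the 95% confidence region. The data, the value of mu0 and all variable names are assumptions made for illustration.

```r
# Simulated data for illustration only (not the data used in the notes)
set.seed(2)
n <- 30; p <- 3
x <- matrix(rnorm(n * p), n, p)
mu0 <- c(0.1, -0.2, 0)

xbar <- colMeans(x)
S <- var(x)
d <- xbar - mu0

# Direct form: T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)
T2_direct <- n * drop(t(d) %*% solve(S, d))

# Determinant form: T^2 = (n - 1) |Sigma_o_hat| / |Sigma_hat| - (n - 1)
Sigma_hat  <- crossprod(sweep(x, 2, xbar)) / n   # MLE of Sigma, mu unrestricted
Sigma0_hat <- crossprod(sweep(x, 2, mu0)) / n    # MLE of Sigma, mu fixed at mu0
T2_det <- (n - 1) * det(Sigma0_hat) / det(Sigma_hat) - (n - 1)

# Wilks' lambda, Lambda^(2/n), computed two ways
wilks <- det(Sigma_hat) / det(Sigma0_hat)
wilks_from_T2 <- 1 / (1 + T2_direct / (n - 1))

# mu0 lies in the 95% confidence region iff T^2 <= (n-1)p/(n-p) F_{p,n-p}(0.05)
crit <- (n - 1) * p / (n - p) * qf(0.95, p, n - p)
in_region <- T2_direct <= crit

print(c(T2_direct = T2_direct, T2_det = T2_det))
print(c(wilks = wilks, wilks_from_T2 = wilks_from_T2))
print(in_region)
```

The two versions of $T^2$ should agree up to rounding error, which is the content of the remark on avoiding $S^{-1}$.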

Remark: The quantity $(\bar{X} - \mu)'(S/n)^{-1}(\bar{X} - \mu)$ can be considered as the Mahalanobis distance of $\mu$ from $\bar{X}$, because the covariance matrix of $\bar{X}$ is $\frac{1}{n}\Sigma$, which is consistently estimated by $\frac{1}{n}S$. Compared with the Euclidean distance, the Mahalanobis distance takes into consideration the covariance structure. It is a distance measure based on correlations between variables.

For the normal distribution, the CR can be viewed as ellipsoids centered at $\bar{x}$ with axes determined by the eigenvalues and eigenvectors of $S$. For the $N_p(\mu, \Sigma)$ distribution, the contours of constant density are ellipsoids defined by $x$ such that

$$(x - \mu)'\Sigma^{-1}(x - \mu) = c^2.$$

These ellipsoids are centered at $\mu$ and have axes $\pm c\sqrt{\lambda_i}\,e_i$, where $\Sigma e_i = \lambda_i e_i$ for $i = 1, \ldots, p$. Consequently, the CR are centered at $\bar{x}$ and the axes of the confidence ellipsoid are

$$\pm\sqrt{\lambda_i}\sqrt{\frac{(n-1)p}{n(n-p)}F_{p,n-p}(\alpha)}\;e_i,$$

where $Se_i = \lambda_i e_i$ for $i = 1, \ldots, p$.

Simultaneous confidence intervals: Suppose that $X \sim N_p(\mu, \Sigma)$. For any non-zero $p$-dimensional vector $a$, $Z = a'X \sim N(a'\mu, a'\Sigma a)$. If $x_1, \ldots, x_n$ are a random sample from $X$, then $\{z_i = a'x_i\}$ is a random sample of $Z$. The sample mean and variance are $\bar{z} = a'\bar{x}$ and $s_z^2 = a'Sa$, where $S$ is the sample covariance matrix of $x_i$. If $a$ is fixed, the $100(1-\alpha)\%$ confidence interval of $a'\mu$ is

$$a'\bar{x} - t_{n-1}(\alpha/2)\frac{\sqrt{a'Sa}}{\sqrt{n}} \leq a'\mu \leq a'\bar{x} + t_{n-1}(\alpha/2)\frac{\sqrt{a'Sa}}{\sqrt{n}}.$$

This interval consists of $a'\mu$ values for which

$$\left|\frac{\sqrt{n}(a'\bar{x} - a'\mu)}{\sqrt{a'Sa}}\right| \leq t_{n-1}(\alpha/2),$$

or, equivalently,

$$\frac{n[a'(\bar{x} - \mu)]^2}{a'Sa} \leq t^2_{n-1}(\alpha/2).$$

If we consider all values of $a$ for which the prior inequality holds, then we are naturally led to the determination of

$$\max_a \frac{n[a'(\bar{x} - \mu)]^2}{a'Sa}.$$

Using the Maximization Lemma (see Eq. (2.50) of the text, p. 80) with $x = a$, $d = \bar{x} - \mu$ and $B = S$, we obtain

$$\max_a \frac{n[a'(\bar{x} - \mu)]^2}{a'Sa} = n\left[\max_a \frac{[a'(\bar{x} - \mu)]^2}{a'Sa}\right] = n(\bar{x} - \mu)'S^{-1}(\bar{x} - \mu) = T^2,$$

with the maximum occurring for $a$ proportional to $S^{-1}(\bar{x} - \mu)$. Noting that $T^2 \sim \frac{(n-1)p}{n-p}F_{p,n-p}$, we have the following result.

Result 5.3. Let $X_1, \ldots, X_n$ be a random sample from a $N_p(\mu, \Sigma)$ population with $\Sigma$ being positive definite. Then, simultaneously for all $a$, the interval

$$\left(a'\bar{X} - \sqrt{\frac{p(n-1)}{n(n-p)}F_{p,n-p}(\alpha)\,a'Sa},\quad a'\bar{X} + \sqrt{\frac{p(n-1)}{n(n-p)}F_{p,n-p}(\alpha)\,a'Sa}\right)$$

will contain $a'\mu$ with probability $1 - \alpha$.

For convenience, we refer to the simultaneous confidence intervals of Result 5.3 as $T^2$-intervals. In particular, the choices of $a = (1, 0, \ldots, 0)'$, $(0, 1, 0, \ldots, 0)'$, $\ldots$, $(0, \ldots, 0, 1)'$ allow us to conclude that

$$\bar{x}_i - \sqrt{\frac{p(n-1)}{n-p}F_{p,n-p}(\alpha)}\sqrt{\frac{s_{ii}}{n}} \leq \mu_i \leq \bar{x}_i + \sqrt{\frac{p(n-1)}{n-p}F_{p,n-p}(\alpha)}\sqrt{\frac{s_{ii}}{n}}$$

hold simultaneously with confidence coefficient $1 - \alpha$ for all $i = 1, \ldots, p$. Furthermore, by choosing $a = (0, \ldots, 0, 1, 0, \ldots, 0, -1, 0, \ldots, 0)'$ with $1$ at the $i$th position and $-1$ at the $k$th position, where $i \neq k$, we have $a'\mu = \mu_i - \mu_k$ and $a'Sa = s_{ii} - 2s_{ik} + s_{kk}$, and the $T^2$-intervals

$$\bar{x}_i - \bar{x}_k - \sqrt{\frac{p(n-1)}{n-p}F_{p,n-p}(\alpha)}\sqrt{\frac{s_{ii} - 2s_{ik} + s_{kk}}{n}} \leq \mu_i - \mu_k \leq \bar{x}_i - \bar{x}_k + \sqrt{\frac{p(n-1)}{n-p}F_{p,n-p}(\alpha)}\sqrt{\frac{s_{ii} - 2s_{ik} + s_{kk}}{n}}.$$

Comparison. An alternative approach to constructing confidence intervals for the component means is to consider the components one at a time. This approach ignores the covariance structure of the variables and leads to the intervals

$$\bar{x}_i - t_{n-1}(\alpha/2)\sqrt{\frac{s_{ii}}{n}} \leq \mu_i \leq \bar{x}_i + t_{n-1}(\alpha/2)\sqrt{\frac{s_{ii}}{n}},$$

for $i = 1, \ldots, p$. Here each interval has a confidence coefficient $1 - \alpha$. However, we do not know the confidence coefficient that all intervals contain their respective $\mu_i$, except for the case that the variables $X_i$ are independent. In the latter case, the confidence coefficient is $(1 - \alpha)^p$. Obviously, the $T^2$-interval is wider than the individual interval for each $\mu_i$. The table below gives some comparisons of critical multipliers for one-at-a-time $t$-intervals and the $T^2$-intervals for selected $n$ and $p$ when the coverage probability is $(1 - \alpha) = 0.95$.

Table 1: Critical multipliers for one-at-a-time $t$-intervals and the $T^2$-intervals for selected sample size $n$ and dimension $p$, where $(1 - \alpha) = 0.95$. Columns: $n$; $t_{n-1}(0.025)$; $\sqrt{\frac{(n-1)p}{n-p}F_{p,n-p}(0.05)}$ for $p = 3$, $p = 5$, $p = $

Bonferroni method for multiple comparisons

In some applications, the $T^2$-intervals might be too wide to be of practical value and one may only be concerned with a finite number of linear combinations, say $a_1, \ldots, a_m$. In this case, we may construct $m$ confidence intervals that are shorter than the $T^2$-intervals. This approach to multiple comparisons is called the Bonferroni method. Let $C_i$ denote a confidence statement about the value of $a_i'\mu$ with $\Pr[C_i \text{ true}] = 1 - \alpha_i$, where $i = 1, \ldots, m$. Then,

$$
\begin{aligned}
\Pr[\text{all } C_i \text{ true}] &= 1 - \Pr[\text{at least one } C_i \text{ false}] \\
&\geq 1 - \sum_{i=1}^{m}\Pr[C_i \text{ false}] = 1 - \sum_{i=1}^{m}[1 - \Pr(C_i \text{ true})] \\
&= 1 - (\alpha_1 + \cdots + \alpha_m).
\end{aligned}
$$

This is a special case of the Bonferroni inequality and allows us to control the overall type-I error rate $\alpha_1 + \cdots + \alpha_m$. The choices of $\alpha_i$ make the method flexible, depending on the prior knowledge of the importance of each interval. If there is no prior information, then $\alpha_i = \alpha/m$ is often used. Let $\bar{z}_i = a_i'\bar{x}$ and $s^2_{z_i} = a_i'Sa_i$. Then, each of the intervals

$$\bar{z}_i \pm t_{n-1}\left(\frac{\alpha}{2m}\right)\sqrt{\frac{s^2_{z_i}}{n}}, \quad i = 1, \ldots, m,$$

contains $a_i'\mu$ with probability $1 - \alpha/m$. Jointly, all $a_i'\mu$ are in their respective intervals with probability at least $1 - m(\alpha/m) = 1 - \alpha$. For instance, if $m = p$ and $a_i$ is the $i$th unit vector, then the intervals

$$\bar{x}_i - t_{n-1}\left(\frac{\alpha}{2p}\right)\sqrt{\frac{s_{ii}}{n}} \leq \mu_i \leq \bar{x}_i + t_{n-1}\left(\frac{\alpha}{2p}\right)\sqrt{\frac{s_{ii}}{n}}$$

hold with probability at least $1 - \alpha$. We refer to these intervals as the Bonferroni intervals.
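The critical multipliers of $\sqrt{s_{ii}/n}$ for the one-at-a-time, Bonferroni (with $m = p$) and $T^2$ intervals depend only on $n$, $p$ and $\alpha$, so they are easy to tabulate. The sketch below does this in base R; the function name multipliers and the grid of $n$ and $p$ values are arbitrary choices for illustration and are not meant to reproduce the entries of the notes' tables.

```r
# Critical multipliers of sqrt(s_ii / n) for three kinds of 100(1 - alpha)% intervals.
multipliers <- function(n, p, alpha = 0.05) {
  c(one_at_a_time = qt(1 - alpha / 2, df = n - 1),
    bonferroni    = qt(1 - alpha / (2 * p), df = n - 1),
    T2            = sqrt((n - 1) * p / (n - p) * qf(1 - alpha, p, n - p)))
}

# Arbitrary grid of sample sizes and dimensions
grid <- expand.grid(n = c(15, 25, 50, 100), p = c(3, 5))
tab <- t(mapply(multipliers, grid$n, grid$p))

# Ratio of Bonferroni to T^2 interval lengths (same sqrt(s_ii / n) cancels)
out <- cbind(grid, round(tab, 3),
             ratio = round(tab[, "bonferroni"] / tab[, "T2"], 3))
print(out)
```

For every row the ordering one-at-a-time < Bonferroni < $T^2$ should hold, which is the sense in which the Bonferroni intervals are a compromise between the two.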

In general, if we use $\alpha_i = \alpha/p$, then we can compare the length of the Bonferroni intervals with those of the $T^2$-intervals. The ratio is

$$\frac{\text{Length of Bonferroni interval}}{\text{Length of } T^2 \text{ interval}} = \frac{t_{n-1}(\alpha/(2p))}{\sqrt{\frac{p(n-1)}{n-p}F_{p,n-p}(\alpha)}}.$$

This ratio does not depend on the sample mean or covariance matrix. Table 2 gives the prior ratio for some selected $n$ and $p$. See also Table 5.4 of the text (page 234).

Table 2: Ratio of the lengths between the Bonferroni interval and the $T^2$-interval for selected sample size $n$ and dimension $p$ when the probability of the interval is 0.95.

4 Large sample case

When the sample size is large, we can apply the central limit theory so that the population need not be normal. Recall that, when $n$ is large,

$$n(\bar{X} - \mu)'S^{-1}(\bar{X} - \mu) \sim \chi^2_p \quad \text{approximately}.$$

Thus,

$$\Pr[n(\bar{X} - \mu)'S^{-1}(\bar{X} - \mu) \leq \chi^2_p(\alpha)] \approx 1 - \alpha,$$

where $\chi^2_p(\alpha)$ is the upper $100\alpha$th percentile of a chi-square distribution with $p$ degrees of freedom. This property can be used (a) to construct asymptotic confidence intervals for the means $\mu_i$ and (b) to perform hypothesis testing about $\mu$.

Asymptotic confidence intervals: Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and positive covariance matrix $\Sigma$. If $n - p$ is sufficiently large,

$$a'\bar{X} \pm \sqrt{\chi^2_p(\alpha)}\sqrt{\frac{a'Sa}{n}}$$

will contain $a'\mu$, for every $a$, with probability approximately $1 - \alpha$. Since the chi-square distribution does not depend on $a$, the above confidence intervals are simultaneous confidence intervals. This result can be used to compare means of different components of $X$.
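To compare the means of two components, one can take $a$ to be a contrast vector and apply the asymptotic interval directly. The following is a minimal sketch under stated assumptions: the helper asymp_interval is hypothetical, and the data are simulated independent exponentials chosen only because they are clearly non-normal.

```r
# Hypothetical helper: large-sample simultaneous interval for a'mu,
# a'xbar +/- sqrt(chi^2_p(alpha)) * sqrt(a'Sa / n).
asymp_interval <- function(x, a, alpha = 0.05) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  stopifnot(length(a) == p)
  center <- sum(a * colMeans(x))                       # a'xbar
  half <- sqrt(qchisq(1 - alpha, df = p)) *
    sqrt(drop(t(a) %*% var(x) %*% a) / n)              # half-length
  c(lower = center - half, upper = center + half)
}

# Simulated non-normal data (independent exponentials), for illustration only
set.seed(3)
x <- cbind(rexp(500, rate = 1), rexp(500, rate = 1), rexp(500, rate = 2))
ci12 <- asymp_interval(x, a = c(1, -1, 0))   # interval for mu_1 - mu_2
ci13 <- asymp_interval(x, a = c(1, 0, -1))   # interval for mu_1 - mu_3
print(rbind(ci12, ci13))
```

Because the chi-square multiplier is the same for every $a$, any number of such contrasts can be examined without further adjustment.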

Asymptotic testing: Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and positive covariance matrix $\Sigma$. When $n - p$ is sufficiently large, the hypothesis $H_o: \mu = \mu_o$ is rejected in favor of $H_a: \mu \neq \mu_o$ at the significance level $\alpha$ if

$$n(\bar{x} - \mu_o)'S^{-1}(\bar{x} - \mu_o) > \chi^2_p(\alpha).$$

Remark: A simple R function is available on the course web that computes various types of confidence intervals for the means of the components of $X$. To use the program, do the following: (a) download the program into your working directory, (b) use the command source("r-cregion.txt") to compile the function. The command confreg is then ready to use. You can modify the function to construct confidence regions for differences between means of components of $X$.

Example: Consider the monthly log returns of four Midwest companies used before. The various confidence intervals are obtained as follows:

> setwd("c:/users/rst/teaching/ama")
> x=read.table("m-ba4c9807.txt")
> source("cregion.r")
> confreg(x)
[1] "CR based on T^2"
     [,1] [,2]
[1,]
[2,]
[3,]
[4,]
[1] "CR based on individual t"
     [,1] [,2]
[1,]
[2,]
[3,]
[4,]
[1] "CR based on Bonferroni"
     [,1] [,2]
[1,]
[2,]
[3,]
[4,]
[1] "Asymp. simu. CR"
     [,1] [,2]
[1,]
[2,]
[3,]
[4,]
>
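The code of confreg is not reproduced in these notes. As a rough stand-in, the sketch below computes the same four kinds of intervals for the component means from the formulas given earlier; the function name conf_intervals, its return format and the simulated data are all assumptions, and the course's program may differ in details.

```r
# Hypothetical stand-in for the course's confreg(); it only follows the
# interval formulas stated in the notes.
conf_intervals <- function(x, alpha = 0.05) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  xbar <- colMeans(x)
  se <- sqrt(diag(var(x)) / n)            # sqrt(s_ii / n)
  mult <- c(
    T2         = sqrt((n - 1) * p / (n - p) * qf(1 - alpha, p, n - p)),
    individual = qt(1 - alpha / 2, n - 1),
    bonferroni = qt(1 - alpha / (2 * p), n - 1),
    asymptotic = sqrt(qchisq(1 - alpha, p))
  )
  # One two-column matrix (lower, upper) per interval type
  lapply(mult, function(m) cbind(lower = xbar - m * se, upper = xbar + m * se))
}

# Simulated data with 120 rows and 4 columns (not the stock returns)
set.seed(4)
x <- matrix(rnorm(120 * 4, mean = 1, sd = 5), 120, 4)
cr <- conf_intervals(x)
print(cr$T2)
print(cr$bonferroni)
```

Returning a named list keeps the four interval types side by side, so the widening from individual $t$ to Bonferroni to $T^2$ can be read off row by row.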

5 Multivariate control charts

We discuss two cases. In the first case, the goal is to identify unusual observations in a sample, including the possibility of drift over time. This is called the $T^2$-chart, which uses the distance $d_i^2$ discussed before in assessing the normality assumption. For the $i$th observation, the $d_i^2$ statistic is

$$d_i^2 = (x_i - \bar{x})'S^{-1}(x_i - \bar{x}).$$

The upper control limit is then set by the upper quantiles of $\chi^2_p$. Typically, the 95th and 99th percentiles are used to set the upper control limit, and the lower control limit is zero. Once a point is found to be outside the control limit, the individual confidence intervals for the component means can be used to identify the source of the deviation.

As an illustration, consider the monthly log returns of the four Midwest companies from 1998 to 2007. The $T^2$-chart is given in Figure 2. About three data points are outside of the 99% limit, indicating a volatile period.

Figure 2: $T^2$ chart for the monthly log returns of four Midwest companies. The limits are the 95% and 99% quantiles.

In the second case, we consider a control chart for future observations. The theory behind this type of control chart is the following result.

Result 5.6. Let $X_1, \ldots, X_n$ be independently distributed as $N_p(\mu, \Sigma)$ and let $X$ be a future observation from the same population. Then,

$$T^2 = \frac{n}{n+1}(X - \bar{X})'S^{-1}(X - \bar{X}) \sim \frac{(n-1)p}{n-p}F_{p,n-p},$$

and a $100(1-\alpha)\%$ $p$-dimensional prediction ellipsoid is given by all $x$ satisfying

$$(x - \bar{x})'S^{-1}(x - \bar{x}) \leq \frac{p(n^2 - 1)}{n(n-p)}F_{p,n-p}(\alpha).$$

Proof: First, $E(X - \bar{X}) = 0$. Since $X$ and $\bar{X}$ are independent,

$$\text{Cov}(X - \bar{X}) = \Sigma + \frac{1}{n}\Sigma = \frac{n+1}{n}\Sigma.$$

Thus, $\sqrt{n/(n+1)}\,(X - \bar{X}) \sim N_p(0, \Sigma)$. Furthermore, by the result in Eq. (5.6) of the textbook, $\frac{n}{n+1}(X - \bar{X})'S^{-1}(X - \bar{X})$ has a scaled $F$ distribution as stated.

$T^2$-chart for future observations. For each new observation $x$, plot

$$T^2 = \frac{n}{n+1}(x - \bar{x})'S^{-1}(x - \bar{x})$$

against time order. Set the lower control limit to zero and the upper control limit as

$$\text{UCL} = \frac{(n-1)p}{n-p}F_{p,n-p}(0.01).$$

For illustration, consider the monthly log return series of the four Midwest companies. We start with the initial $n = 40$ observations. The $T^2$-chart for future observations is given in Figure 3.

Figure 3: $T^2$ future chart for monthly log returns of four Midwest companies. The limits are the 95% and 99% quantiles.

Remark: R functions for the two control charts are available on the course web.

6 Missing values in normal random sample

A key assumption: Missing at random. Two methods are available:
1. The EM algorithm: Dempster, Laird and Rubin (1977, JRSSB)
2. Markov chain Monte Carlo (MCMC) method

Some references:

1. The EM Algorithm and Extensions, by G. J. McLachlan and T. Krishnan (2008), John Wiley.
2. The EM Algorithm and Related Statistical Models, by M. Watanabe and K. Yamaguchi (2003), CRC Press.
3. Bayesian Data Analysis, 2nd Ed., by Gelman, Carlin, Stern and Rubin (2003), CRC Press.
4. Bayes and Empirical Bayes Methods for Data Analysis, 2nd Ed., by B. P. Carlin and T. A. Louis (2001), CRC Press.

EM algorithm: Iterate between the Expectation step and the Maximization step.

E-step: For each data point with missing values, use the conditional distribution

$$X_1 \mid X_2 = x_2 \sim N_k\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right),$$

where $k = \dim(X_1)$ and the partition is based on $X = (X_1', X_2')'$. Here $X_1$ denotes the missing components and $X_2$ the observed components.

M-step: Perform maximum likelihood estimation based on the complete data. It is helpful to make use of sufficient statistics.

MCMC method: This also makes use of the same conditional distribution. However, instead of using the expectation, one draws a random sample from the conditional distribution to fill in the missing values.

Basic result used in the MCMC method: Suppose that $x_1, \ldots, x_n$ form a random sample from $N(\mu, \Sigma)$, where $\Sigma$ is known. Suppose that the prior distribution of $\mu$ is $N(\mu_o, \Sigma_o)$. Then the posterior distribution of $\mu$ is also multivariate normal with mean $\mu_*$ and covariance matrix $\Sigma_*$, where

$$\Sigma_*^{-1} = \Sigma_o^{-1} + n\Sigma^{-1}, \quad \mu_* = \Sigma_*\left(\Sigma_o^{-1}\mu_o + n\Sigma^{-1}\bar{x}\right),$$

and $\bar{x} = \sum_{i=1}^{n} x_i / n$.

7 Impact of serial correlations

Consider the case in which $X_t$ is serially correlated, for example when it follows a vector AR(1) model

$$X_t - \mu = \Phi(X_{t-1} - \mu) + a_t, \quad a_t \sim \text{iid } N_p(0, \Sigma),$$

where $E(X_t) = \mu$ and all eigenvalues of $\Phi$ are less than 1 in modulus. Let $X_1, \ldots, X_n$ be $n$ consecutive observations of the model. Define the lag-$l$ autocovariance matrix of $X_t$ as

$$\Gamma_l = E(X_t - \mu)(X_{t-l} - \mu)', \quad l = 0, \pm 1, \pm 2, \ldots$$

It is easy to see that $\Gamma_l = \Gamma_{-l}'$. Also, by repeated substitution, we have

$$X_t - \mu = a_t + \Phi a_{t-1} + \Phi^2 a_{t-2} + \cdots$$

Therefore,

$$\Gamma_0 = \sum_{i=0}^{\infty}\Phi^i\Sigma(\Phi^i)'.$$

For $l > 0$, post-multiplying the model by $(X_{t-l} - \mu)'$, taking expectation, and using the fact that $X_{t-l}$ is uncorrelated with $a_t$, we get

$$\Gamma_l = \Phi\Gamma_{l-1}, \quad l = 1, 2, \ldots$$

Taking the transpose of the above equation, we have

$$\Gamma_l' = \Gamma_{l-1}'\Phi'.$$

Using $\Gamma_l' = \Gamma_{-l}$, we obtain

$$\Gamma_{-l} = \Gamma_{-(l-1)}\Phi', \quad l \geq 1,$$

which is equivalent to

$$\Gamma_l = \Gamma_{l+1}\Phi', \quad l = -1, -2, \ldots$$

For the vector AR(1) model, we can show the following properties:
1. $E(\bar{X}) = \mu$.
2. $\text{Cov}\left(n^{-1/2}\sum_{t=1}^{n}X_t\right) \to \Omega$ as $n \to \infty$, where $\Omega = (I - \Phi)^{-1}\Gamma_0 + \Gamma_0(I - \Phi')^{-1} - \Gamma_0$.
3. $S = \frac{1}{n-1}\sum_{t=1}^{n}(X_t - \bar{X})(X_t - \bar{X})' \to_p \Gamma_0$ as $n \to \infty$.

Using these properties, we can show that $\sqrt{n}(\bar{X} - \mu)$ is approximately normal with mean 0 and covariance $\Omega$. This implies that

$$n(\bar{X} - \mu)'\Omega^{-1}(\bar{X} - \mu) \sim \chi^2_p \quad \text{approximately},$$

not the usual statistic $n(\bar{X} - \mu)'S^{-1}(\bar{X} - \mu)$. See Table 5.10 of the text (page 257) for the difference in coverage probability due to the effect of AR(1) serial correlations.
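To see how far $\Omega$ can be from $\Gamma_0$, the sketch below evaluates both for one assumed bivariate AR(1) model. The matrices Phi and Sigma are arbitrary illustrative values, not estimates from the data in these notes. $\Gamma_0$ is obtained from the fixed-point equation $\Gamma_0 = \Phi\Gamma_0\Phi' + \Sigma$, which the infinite sum above satisfies.

```r
# Assumed VAR(1) parameters, chosen only for illustration
Phi <- matrix(c(0.5, 0.1,
                0.2, 0.3), 2, 2, byrow = TRUE)
Sigma <- matrix(c(1.0, 0.3,
                  0.3, 2.0), 2, 2, byrow = TRUE)
p <- nrow(Phi)
stopifnot(max(Mod(eigen(Phi)$values)) < 1)   # all eigenvalues inside the unit circle

# Gamma_0 solves Gamma_0 = Phi Gamma_0 Phi' + Sigma,
# i.e. vec(Gamma_0) = (I - Phi (x) Phi)^{-1} vec(Sigma)
Gamma0 <- matrix(solve(diag(p^2) - kronecker(Phi, Phi), as.vector(Sigma)), p, p)

# Limiting covariance of sqrt(n) * (Xbar - mu)
I <- diag(p)
Omega <- solve(I - Phi) %*% Gamma0 + Gamma0 %*% solve(I - t(Phi)) - Gamma0

print(Gamma0)
print(Omega)
print(diag(Omega) / diag(Gamma0))   # variance inflation of each component mean
```

With positive serial dependence of this kind the diagonal of $\Omega$ exceeds that of $\Gamma_0$, so intervals built from $S$ would be too narrow.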
