Likelihood Ratio Tests for High-dimensional Normal Distributions


Likelihood Ratio Tests for High-dimensional Normal Distributions

A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Fan Yang IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy

Tiefeng Jiang, Adviser

December 2011

© Fan Yang 2011. ALL RIGHTS RESERVED

Acknowledgements

Ever since I first began graduate study at the University of Minnesota, I have been extremely interested in probability theory and the challenges and opportunities that this topic presents in statistics and many other areas of research. During the time that I have been constructing my thesis, I have had the chance to reconsider, revise, and refine many of the original concepts and methodologies foundational to my research. I have also had the opportunity to explore some of these concepts in applied settings, which has helped me to integrate my theoretical exploration in ways that have proven useful and practical. The combined research and writing process has been a rich and enlightening one for me, and over time, I have grown and evolved both in my thinking and in my approach to research.

As every doctoral student can attest, completion of a dissertation cannot be accomplished in isolation. It involves the support and contribution of many people, both inside and outside the department. I would first like to thank my advisor, Professor Tiefeng Jiang. I am deeply indebted to him for his support and commitment, his enthusiastic guidance, and his insight throughout the research and writing process. He could not have been more generous with his time and effort in directing me toward the completion of my thesis. As my research has been largely carried out in the School of Statistics, I would therefore like to profusely thank the faculty, staff, and fellow students whose contributions were always tremendously helpful and illuminating to me.

I would also like to extend my deepest thanks to Boston Scientific Corporation, to my supervisor Mark Balhorn, and to my many co-workers and colleagues for their extraordinary support. Excellence infuses every aspect of Boston Scientific and my affiliation with this company has helped me to challenge myself in an environment where innovation

and creativity are the very foundation. Finally, I should express my sincere gratitude to my best friend Lauren Pacelli for her tireless effort to review and edit my thesis and for her constant encouragement and motivation throughout my entire graduate study.

Dedication

This thesis is dedicated to my parents, Yongyu Yang and Liuqian Zeng, for their strength and encouragement no matter what path I have chosen, and for their unwavering belief in me. It is also dedicated to my wife Lili Yang, who is a great joy in my life and a source of inspiration to me always.

Abstract

For a random sample of size n obtained from p-variate normal distributions, we consider the likelihood ratio tests (LRT) for their means and covariance matrices. Most of these test statistics have been extensively studied in the classical multivariate analysis (see, e.g., [2] and [36]), and their limiting distributions under the null hypothesis were proved to be chi-square under the assumption that n goes to infinity while p remains fixed. In our research, we consider the high-dimensional case where both p and n go to infinity with their ratio p/n → y ∈ (0, 1]. We prove that the likelihood ratio test statistics under this assumption converge in distribution to a normal random variable, and we also give the explicit forms of its mean and variance. We run simulation studies to show that the likelihood ratio tests using these new central limit theorems outperform those using the traditional chi-square approximation for analyzing high-dimensional data.

Contents

Acknowledgements
Dedication
Abstract
List of Tables
List of Figures

1 Background
2 Asymptotic Expansion of Multivariate Gamma Function
3 Testing Covariance Matrix of Normal Distribution Proportional to Identity Matrix (Sphericity Test)
  3.1 Introduction
  3.2 High-dimensional Likelihood Ratio Test for Sphericity
  3.3 Proof of Theorem 3.1
4 Testing Independence of Components of Normal Distribution
  4.1 Introduction
  4.2 High-dimensional LRT for Testing Independence of Components of a Normal Distribution
  4.3 Proof of Theorem 4.1

5 Testing Equality of Multiple Normal Distributions
  5.1 Introduction
  5.2 High-dimensional Likelihood Ratio Test for Equality of Multiple Normal Distributions
  5.3 Proof of Theorem 5.1
6 Testing Equality of Multiple Covariance Matrices of Normal Distributions
  6.1 Introduction
  6.2 High-dimensional Likelihood Ratio Test for Equality of Multiple Covariance Matrices
  6.3 Proof of Theorem 6.1
7 Testing Specified Mean Vector and Covariance Matrix of Normal Distribution
  7.1 Introduction
  7.2 High-dimensional LRT for Testing Specified Mean Vector and Covariance Matrix
  7.3 Proof of Theorem 7.1
8 Testing Complete Independence of Normal Distribution
  8.1 Introduction
  8.2 High-dimensional Likelihood Ratio Test for Complete Independence
  8.3 Proof of Theorem 8.1
9 Conclusion and Discussion

References

List of Tables

3.1 Sizes and Powers of LRT for Sphericity
4.1 Sizes and Powers of LRT for Independent Normal Components
5.1 Sizes and Powers of LRT for Equality of Multiple Normal Distributions
6.1 Sizes and Powers of LRT for Equality of Multiple Covariance Matrices
7.1 Sizes and Powers of LRT for Specified Normal Distribution
8.1 Sizes and Powers of LRT for Complete Independence

List of Figures

3.1 Sizes and Powers of LRT for Sphericity
4.1 Sizes and Powers of LRT for Independent Normal Components
5.1 Sizes and Powers of LRT for Equality of Multiple Normal Distributions
6.1 Sizes and Powers of LRT for Multiple Covariance Matrices
7.1 Sizes and Powers of LRT for Specified Normal Distribution
8.1 Sizes and Powers of LRT for Complete Independence

Chapter 1

Background

In recent years, researchers have become increasingly sophisticated at using technology to expand their capabilities. They are now able to collect massive quantities of data that can produce revelations leading to new insights, more predictive power, better treatment protocols, and faster results at lower costs. However, as technology continues to expand access to ever larger amounts and types of data, down to minute levels, a whole new set of challenges arises concerning how the data is to be managed to maximize its potential usefulness. The data must be identified, reviewed, categorized, evaluated, and understood if it is to become meaningful information. The cost and time of managing data become significant; the collection, organization, and processing of datasets of enormous dimensions therefore pose a critical challenge for both researchers and professionals, one that has become a major focus of statistics, mathematics, and computer science and will remain so far into the foreseeable future.

Traditional statistical theory, particularly in multivariate analysis, did not contemplate the demands of high dimensionality in data analysis, due to technological limitations. Consequently, in many cases, tests of hypotheses and many other modeling procedures found in the classical multivariate analysis textbooks, such as Anderson [2], Muirhead [36], and Siotani et al. [43], were developed under the assumption that the dimension of the dataset, conventionally denoted by p, is a fixed small constant or at least negligible compared with the sample size n. However, this assumption is no longer true for many modern datasets, because their dimensions can be proportionally

large compared with the sample size. Examples of high-dimensional datasets include:

Financial Data. Over the past decades, the global financial market has grown rapidly in size and complexity thanks to innovation in financial engineering and the proliferation of financial derivatives. Today, tens of thousands of financial products, including securities, commodities, currencies, options, swaps, and credit contracts, are traded around the clock, with bid information recorded to financial databases. One big challenge in financial data analysis is to make short-term forecasts for portfolios comprised of a great variety of financial products. This usually involves analyzing a dataset with a high dimension (the number of financial products) but a limited sample size (the number of trading records within a short time period).

Consumer Data. The increasing popularity of search engines and online shopping is fundamentally changing the landscape of consumer behavior study. As browsing, searching, and purchasing activities are recorded and compiled into databases, advertisers and marketing researchers increasingly rely on analyzing these data to correlate consumer actions with product attributes. One such example is the movie rating data published by Netflix for its one-million-dollar prize contest between 2006 and 2009, which aimed to improve the prediction of movie ratings based on consumers' movie preferences. The training data for this competition included ratings from approximately 480,000 customers on 18,000 movies.

Manufacturing Data. Modern statistical process control for lean manufacturing often adopts real-time data collection that is automatically performed by test equipment on production lines.
Data corresponding to a large number of product and process characteristics, from raw material receiving inspection and in-process component monitoring to finished goods testing, are recorded to production databases and analyzed to prevent defects and detect process shifts. In some cases, the number of attributes tested on products is comparable to, or even exceeds, the production volume, leading to datasets with dimension p greater than the sample size n.

Multimedia Data. Multimedia objects, such as images, flash animations, and

audio and video clips, are ubiquitous in contemporary life. One of the challenging problems in multimedia analysis is similarity search, i.e., seeking a collection of multimedia objects from a larger pool that are similar to a given query object. As the similarity between two multimedia objects cannot be measured directly, the comparison usually involves mapping some important primitives of a multimedia object to a numeric vector (called a feature vector) in a high-dimensional space, so that the similarity can be quantified by the distance in that space. Because the quality of the search improves as the dimension of that space goes up, a similarity search often ends up handling high-dimensional datasets.

More examples of high-dimensional data can be found in Donoho [14] and Johnstone [27]. The failure of the traditional multivariate methods for high-dimensional data was observed by Dempster [13] as early as 1958. Dempster identified the issue that the classical Hotelling's T² test for the difference of the means of two p-variate normal distributions becomes inapplicable when the dimension p is greater than the within-sample degrees of freedom (n₁ − 1) + (n₂ − 1), because in this case the Hotelling's T² statistic is undefined. Dempster proposed a so-called non-exact F test based on the ratio of two mean square distances as an alternative solution for this case. Bai and Saranadasa [4] further showed that even when the Hotelling's T² statistic is well defined, Dempster's F test is more powerful than the Hotelling's T² test if the dimension is proportionally close to the within-sample degrees of freedom. Most recently, Bai et al. [3] studied the likelihood ratio test (LRT) for the covariance matrix of a normal distribution and showed that using the traditional chi-square approximation (e.g., see Section 8.4 of [36]) to the limiting distribution of the test statistic will result in a much inflated test size (or alpha error) even with moderate sizes of p and n. Bai et al.
[3] developed corrections to the traditional likelihood ratio test to make it suitable for testing a high-dimensional normal distribution. In Bai's derivation, the dimension p is no longer considered a fixed constant, but rather a variable that goes to infinity along with the sample size n, and the ratio between p and n converges to a constant y, i.e.,

    lim_{n→∞} p_n / n = y ∈ (0, 1).    (1.1)

Jiang et al. [21] further extended Bai's result to cover the case y = 1 and also proposed a new LRT for testing the equality of two covariance matrices of normal distributions in the high-dimensional situation. Besides the likelihood ratio tests, many other traditional hypothesis tests in multivariate analysis have also been revisited in the past decade for high-dimensional cases. Examples include Ledoit and Wolf [29] and Schott [40], [41], [42]. Fujikoshi et al. [16] gave a book-length survey on multivariate methods under the high-dimensional framework where p/n → y > 0.

In this thesis, we study several other likelihood ratio tests for means and covariance matrices of high-dimensional normal distributions. Most of these tests have asymptotic results for their test statistics derived decades ago under the assumption of a large n but a fixed p. Our results supplement these traditional results in providing alternatives for analyzing high-dimensional datasets.

The rest of this thesis is organized as follows. In Chapter 2, we prepare the proof of our main theorems by developing an asymptotic expansion of a multivariate Gamma function. In Chapter 3 through Chapter 8, we prove central limit theorems for six commonly used likelihood ratio test statistics in the high-dimensional case. Using these central limit theorems will allow one to perform the likelihood ratio tests on datasets with dimension p comparable to the sample size n. We also compare the performance (test size and power) of the proposed high-dimensional LRTs against the traditional ones through simulation. In the final chapter, we summarize our results and conclude by offering some open problems for future consideration.

Chapter 2

Asymptotic Expansion of Multivariate Gamma Function

In this chapter, we develop an asymptotic expansion of the multivariate Gamma function. This result plays a pivotal role in later chapters as we derive the limiting distributions of the likelihood ratio test statistics for mean vectors and covariance matrices of high-dimensional normal distributions. This is because the moments of these test statistics can be expressed in a closed form using multivariate Gamma functions. We begin our discussion with the definition of the multivariate Gamma function. Recall the definition of the univariate Gamma function Γ(·) on the complex space:

    Γ(α) := ∫₀^∞ exp(−x) x^{α−1} dx  for α ∈ C with Re(α) > 0.    (2.1)

A multivariate Gamma function, denoted by Γ_p(·), is a generalization of the univariate Gamma function:

DEFINITION 2.1 The multivariate Gamma function of dimension p > 1, denoted by Γ_p(α), is defined as

    Γ_p(α) := ∫_{A>0} exp(−tr A) det(A)^{α−(p+1)/2} dA    (2.2)

for α ∈ C with Re(α) > (p−1)/2, where the integration is over the set of all positive definite symmetric matrices {A : A > 0}.

Apparently, when p = 1, the multivariate Gamma function (2.2) reduces to its univariate form (2.1), i.e., Γ_1(α) = Γ(α). It was also shown (e.g., Theorem 2.1.1 from Muirhead [36]) that a multivariate Gamma function can be expressed as a product of univariate Gamma functions,

    Γ_p(α) = π^{p(p−1)/4} ∏_{i=1}^{p} Γ(α − (i−1)/2)  for Re(α) > (p−1)/2.    (2.3)

This result is more useful from the computational aspect and is sometimes considered an equivalent definition of the multivariate Gamma function. The multivariate Gamma function has wide application in multivariate statistics. For instance, it appears in the expressions of the probability density functions of the Wishart and inverse Wishart distributions (see James [20] for reference). The asymptotic expansion of the univariate Gamma function is best known as the Stirling formula (see, e.g., p. 368 from [17] or eq. (37) on page 204 from [1]):

    log Γ(z) = (z − 1/2) log z − z + (1/2) log(2π) + 1/(12z) + O(1/Re(z)³).    (2.4)

Next, we derive some useful results on the asymptotic expansion of the multivariate Gamma function. Since our applications of these results are mainly statistical, we limit our discussion to the real space R for the sake of simplicity in derivation. We begin with the following lemma:

LEMMA 2.1 Let b := b(x) be a real-valued function defined on (0, ∞). As x → +∞,

    log [Γ(x + b)/Γ(x)] = b log x + (b² − b)/(2x) + c(x),    (2.5)

where Γ(·) is the univariate Gamma function as defined in (2.1) and

    c(x) = O(x^{−1/2}), if b(x) = O(√x) as x → +∞;
    c(x) = O(x^{−2}),   if b(x) = O(1) as x → +∞.

Proof. It follows from the Stirling formula (2.4) that

    log [Γ(x + b)/Γ(x)] = (x + b) log(x + b) − x log x − b − (1/2)[log(x + b) − log x] + (1/12)[1/(x + b) − 1/x] + O(1/x³)    (2.6)

as x → +∞. First, using the fact that log(1 + t) = t − t²/2 + O(t³) as t → 0, we have

    (x + b) log(x + b) − x log x = (x + b)[log x + log(1 + b/x)] − x log x
      = (x + b)[log x + b/x − b²/(2x²) + O(b³/x³)] − x log x
      = b log x + b + b²/(2x) − b³/(2x²) + O(b⁴/x³)
      = b log x + b + b²/(2x) + c₁(x),

where

    c₁(x) = O(x^{−1/2}), if b(x) = O(√x);  c₁(x) = O(x^{−2}), if b(x) = O(1),

as x → +∞. Similarly, as x → +∞,

    log(x + b) − log x = log(1 + b/x) = b/x + O(x^{−1}) if b(x) = O(√x);  b/x + O(x^{−2}) if b(x) = O(1);

    1/(x + b) − 1/x = −b/(x(x + b)) = O(x^{−3/2}) if b(x) = O(√x);  O(x^{−2}) if b(x) = O(1).

Substituting these assertions in (2.6), we have

    log [Γ(x + b)/Γ(x)] = b log x + (b² − b)/(2x) + c(x),

where c(x) = O(x^{−1/2}) if b(x) = O(√x), and c(x) = O(x^{−2}) if b(x) = O(1), as x → +∞.

LEMMA 2.2 Let n > p = p_n. Assume that p/n → y ∈ (0, 1) and t = t_n = O(1) as n → ∞. Then, as n → ∞,

    log ∏_{i=n−p}^{n−1} [Γ(i/2 + t)/Γ(i/2)] = [n log n − p(1 + log 2) − (n − p) log(n − p)] t − (t² − 1.5t) log(1 − y) + o(1).

Proof. Since p/n → y ∈ (0, 1), we have n − p → +∞ as n → ∞. By Lemma 2.1, there exists an integer C₁ such that

    log [Γ(i/2 + t)/Γ(i/2)] = t log(i/2) + (t² − t)/i + ϕ(i)  with  |ϕ(i)| ≤ C₁/i²

for all i ≥ n − p as n is sufficiently large, where here and later in this proof we write t for t_n for short notation. Notice t log(i/2) = t log i − t log 2. Then,

    ∑_{i=n−p}^{n−1} log [Γ(i/2 + t)/Γ(i/2)]
      = −pt log 2 + t ∑_{i=n−p}^{n−1} log i + (t² − t) ∑_{i=n−p}^{n−1} (1/i) + ∑_{i=n−p}^{n−1} ϕ(i)
      = −pt log 2 + (t² − t) ∑_{i=n−p}^{n−1} (1/i) + t log[n!/(n−p)!] − t log[n/(n−p)] + O(1/n)
      = −pt log 2 + (t² − t) ∑_{i=n−p}^{n−1} (1/i) + t log(1 − y) + t log[n!/(n−p)!] + o(1),    (2.7)

since ∑_{i=n−p}^{n−1} ϕ(i) = O(1/n) and −log[n/(n−p)] = log(1 − p/n) → log(1 − y) as n → ∞. Notice that

    ∑_{i=n−p}^{n−1} (1/i) ≤ ∑_{i=n−p}^{n−1} ∫_{i−1}^{i} (1/x) dx = ∫_{n−p−1}^{n−1} (1/x) dx.

By working on the lower bound in a similar way, we also have

    log[n/(n−p)] = ∫_{n−p}^{n} (1/x) dx ≤ ∑_{i=n−p}^{n−1} (1/i) ≤ ∫_{n−p−1}^{n−1} (1/x) dx = log[(n−1)/(n−p−1)].

This implies, by the assumption that p/n → y ∈ (0, 1), that

    ∑_{i=n−p}^{n−1} (1/i) → −log(1 − y)    (2.8)

as n → ∞. Second, by the Stirling formula on factorials (see, e.g., p. 120 from [15]), there are some θ_n, θ'_n ∈ (0, 1) such that

    log[n!/(n−p)!] = log[√(2πn) n^n e^{−n+θ_n/(12n)}] − log[√(2π(n−p)) (n−p)^{n−p} e^{−(n−p)+θ'_n/(12(n−p))}]
      = n log n − (n−p) log(n−p) − p + (1/2) log[n/(n−p)] + o(1)
      = n log n − (n−p) log(n−p) − p − (1/2) log(1 − y) + o(1)

as n → ∞. Joining this with (2.7) and (2.8), we arrive at

    log ∏_{i=n−p}^{n−1} [Γ(i/2 + t)/Γ(i/2)]
      = −pt log 2 − (t² − t) log(1 − y) + t log(1 − y) + tn log n − t(n−p) log(n−p) − pt − (t/2) log(1 − y) + o(1)
      = −pt(1 + log 2) − (t² − 1.5t) log(1 − y) + tn log n − t(n−p) log(n−p) + o(1)

as n → ∞. The proof is then complete.

LEMMA 2.3 Let n > p = p_n and r_n = [−log(1 − p/n)]^{1/2}. Assume that p/n → 1 and t = t_n = O(1/r_n) as n → ∞. Then, as n → ∞,

    log ∏_{i=n−p}^{n−1} [Γ(i/2 + t)/Γ(i/2)] = p(log n − 1 − log 2)t + r_n²[t² − (p − n + 1.5)t] + o(1).    (2.9)

Proof. Obviously, lim_{n→∞} r_n = +∞; hence t_n → 0 as n → ∞, and in particular {t_n; n ≥ 2} is bounded. By Lemma 2.1, there exist integers C₁ and C₂ such that

    log [Γ(i/2 + t)/Γ(i/2)] = t log(i/2) + (t² − t)/i + ϕ(i)  and  |ϕ(i)| ≤ C₁/i²    (2.10)

for all i ≥ C₂. We will use (2.10) to estimate ∏_{i=n−p}^{n−1} Γ(i/2 + t)/Γ(i/2). However, when n − p is small, say, 2 or 3 (which is possible since p/n → 1), the identity (2.10) cannot be applied directly to estimate each term in the product. We next use a truncation to solve the problem, thanks to the fact that Γ(i/2 + t)/Γ(i/2) → 1 as n → ∞ for each fixed i. Fix M ≥ C₂. Write

    a_i = Γ(i/2 + t)/Γ(i/2) for i ≥ 1,  and  γ_n = 1 if n − p ≥ M;  γ_n = ∏_{i=n−p}^{M−1} a_i if n − p < M.

Then,

    ∏_{i=n−p}^{n−1} [Γ(i/2 + t)/Γ(i/2)] = γ_n · ∏_{i=(n−p)∨M}^{n−1} [Γ(i/2 + t)/Γ(i/2)],    (2.11)

where (n−p)∨M := max(n−p, M).

Easily,

    [min_{1≤i≤M} (1 ∧ a_i)]^M ≤ γ_n ≤ [max_{1≤i≤M} (1 ∨ a_i)]^M

for all n ≥ 1. Note that, for each i ≥ 1, a_i → 1 as n → ∞ since lim_{n→∞} t_n = 0. Thus, since M is fixed, the two bounds above go to 1 as n → ∞. Consequently, lim_{n→∞} γ_n = 1. This and (2.11) say that

    ∏_{i=n−p}^{n−1} [Γ(i/2 + t)/Γ(i/2)] ~ ∏_{i=(n−p)∨M}^{n−1} [Γ(i/2 + t)/Γ(i/2)]    (2.12)

as n → ∞. By (2.10), as n is sufficiently large, we know

    log ∏_{i=(n−p)∨M}^{n−1} [Γ(i/2 + t)/Γ(i/2)] = ∑_{i=(n−p)∨M}^{n−1} [ t log(i/2) + (t² − t)/i + ϕ(i) ]

with |ϕ(i)| ≤ C₁/i² for i ≥ C₂. Write t log(i/2) = t log i − t log 2. It follows that

    log ∏_{i=(n−p)∨M}^{n−1} [Γ(i/2 + t)/Γ(i/2)]
      = −t[n − (n−p)∨M] log 2 + t ∑_{i=(n−p)∨M}^{n−1} log i + (t² − t) ∑_{i=(n−p)∨M}^{n−1} (1/i) + ∑_{i=(n−p)∨M}^{n−1} ϕ(i)
      := A_n + B_n + C_n + D_n    (2.13)

as n is sufficiently large. Now we analyze the four terms above separately. By distinguishing the cases n − p > M and n − p ≤ M, we get

    |A_n + pt log 2| ≤ (log 2)|t| · |n − p − M| · I(n − p ≤ M) ≤ (M log 2)|t|.    (2.14)

Now we estimate B_n. By the same argument as in (2.14), we get

    | ∑_{i=(n−p)∨M}^{n−1} h(i) − ∑_{i=n−p}^{n−1} h(i) | ≤ ∑_{i=1}^{M} |h(i)|    (2.15)

for h(x) = log x or h(x) = 1/x on x ∈ (0, ∞). By the Stirling formula (see, e.g., p. 120 from [15]), n! = √(2πn) n^n e^{−n + θ_n/(12n)} with θ_n ∈ (0, 1) for all n ≥ 1. It follows that, for some θ_n, θ'_n ∈ (0, 1),

    ∑_{i=n−p}^{n−1} log i = log[n!/(n−p)!] + log[(n−p)/n]
      = log[√(2πn) n^n e^{−n+θ_n/(12n)}] − log[√(2π(n−p)) (n−p)^{n−p} e^{−(n−p)+θ'_n/(12(n−p))}] + log[(n−p)/n]
      = n log n − (n−p) log(n−p) − p + (1/2) log[n/(n−p)] + log[(n−p)/n] + R_n

with |R_n| ≤ 1 as n is sufficiently large. Recall B_n = t ∑_{i=(n−p)∨M}^{n−1} log i. We know from (2.15) that

    | B_n − t[ n log n − (n−p) log(n−p) − p − (1/2) log(n/(n−p)) ] | ≤ C|t|,    (2.16)

where C here and later stands for a constant and can be different from line to line.

Now we estimate C_n. Recall the identity s_n := ∑_{i=1}^{n} (1/i) = log n + c_n for all n ≥ 1, where lim_{n→∞} c_n = c, the Euler constant. Thus,

    | s_n − s_{n−p} − log[n/(n−p)] | ≤ |c_n| + |c_{n−p}| ≤ C.

Moreover, ∑_{i=n−p+1}^{n} (1/i) = s_n − s_{n−p} and | ∑_{i=n−p}^{n−1} (1/i) − ∑_{i=n−p+1}^{n} (1/i) | ≤ 1. Therefore,

    | ∑_{i=n−p}^{n−1} (1/i) − log[n/(n−p)] | ≤ C.

Consequently, since C_n = (t² − t) ∑_{i=(n−p)∨M}^{n−1} (1/i), we know from (2.15) that

    | C_n − (t² − t) log[n/(n−p)] | ≤ C(t² + |t|).    (2.17)

Finally, it is easy to see from the second fact in (2.10) that

    |D_n| ≤ C₁ ∑_{i=M}^{∞} (1/i²)    (2.18)

for all n. Now, recalling that t = t_n → 0 as n → ∞, we have from (2.12), (2.13), (2.14), (2.16) and (2.17) that, for a fixed integer M > 0,

    A_n + B_n + C_n + D_n
      = −pt log 2 + t[n log n − (n−p) log(n−p)] − tp − (t/2) log[n/(n−p)] + (t² − t) log[n/(n−p)] + D_n + o(1)
      = −pt(1 + log 2) + (t² − 1.5t + nt) log n − [t² − (p − n + 1.5)t] log(n−p) + D_n + o(1)

as n → ∞. Write log(n−p) = log n − r_n². Then the main term above becomes

    E_n := −pt(1 + log 2) + (t² − 1.5t + nt) log n − [t² − (p − n + 1.5)t](log n − r_n²)
         = p(log n − 1 − log 2)t + r_n²[t² − (p − n + 1.5)t].

From (2.18) we have that

    limsup_{n→∞} | A_n + B_n + C_n + D_n − E_n | ≤ C₁ ∑_{i=M}^{∞} (1/i²)

for any M ≥ C₂. Recalling (2.12) and (2.13), letting M → ∞, we eventually obtain the desired conclusion.

For simplicity, we may combine Lemma 2.2 and Lemma 2.3 into one proposition:

PROPOSITION 2.1 Let n > p = p_n and r_n = [−log(1 − p/n)]^{1/2}. Assume that p/n → y ∈ (0, 1] and t = t_n = O(1/r_n) as n → ∞. Then, as n → ∞,

    log ∏_{i=n−p}^{n−1} [Γ(i/2 + t)/Γ(i/2)] = pt(log n − 1 − log 2) + r_n²[t² − (p − n + 1.5)t] + o(1).

Proof. The equality corresponding to the case y = 1 follows from Lemma 2.3. If y ∈ (0, 1), then lim_{n→∞} r_n = [−log(1 − y)]^{1/2}, and hence {t_n : n ≥ 1} is bounded. It follows that

    pt(log n − 1 − log 2) + r_n²[t² − (p − n + 1.5)t]
      = pt(log n − 1 − log 2) + t(p − n) log(1 − p/n) − (t² − 1.5t) log(1 − p/n).

The last term above is identical to −(t² − 1.5t) log(1 − y) + o(1) since p/n → y as n → ∞. Moreover,

    pt log n + t(p − n) log(1 − p/n) = pt log n + t(p − n)[log(n − p) − log n] = nt log n − t(n − p) log(n − p).

The above assertions conclude

    pt(log n − 1 − log 2) + r_n²[t² − (p − n + 1.5)t]
      = nt log n − pt(1 + log 2) − t(n − p) log(n − p) − (t² − 1.5t) log(1 − y) + o(1)

as n → ∞. This is exactly the right-hand side of the expansion in Lemma 2.2.

LEMMA 2.4 Let n > p = p_n and r_n = [−log(1 − p/n)]^{1/2}. Assume p/n → y ∈ (0, 1] and t = t_n = O(1/r_n) as n → ∞. Then

    log [ Γ(n/2 + t) Γ((n−p)/2) / ( Γ(n/2) Γ((n−p)/2 + t) ) ] = r_n² t + o(1)    (2.19)

as n → ∞.

Proof. We prove the lemma by considering two cases.

Case (i): y ∈ (0, 1). In this case, n − p → ∞ and lim_{n→∞} r_n = [−log(1 − y)]^{1/2} ∈ (0, ∞), and hence {t_n} is bounded. By Lemma 2.1,

    log [Γ(n/2 + t)/Γ(n/2)] = t log(n/2) + O(1/n),
    log [Γ((n−p)/2 + t)/Γ((n−p)/2)] = t log((n−p)/2) + O(1/(n−p))

as n → ∞. Subtracting the two assertions, we get that the left-hand side of (2.19) is equal to

    t log(n/2) − t log((n−p)/2) + o(1) = −t log(1 − p/n) + o(1) = r_n² t + o(1)    (2.20)

as n → ∞. So the lemma holds for y ∈ (0, 1).

Case (ii): y = 1. In this case, r_n → +∞ and t_n → 0 as n → ∞. By Lemma 2.1, there exist integers C₁ and C₂ such that

    log [Γ(m/2 + t)/Γ(m/2)] = t log(m/2) + (t² − t)/m + ϕ(m)  and  |ϕ(m)| ≤ C₁/m²    (2.21)

for all m ≥ C₂. For any ε > 0, take an integer M ≥ C₂ such that C₁/M ≤ ε. Set

    A_n = log min_{1≤i≤M} [Γ(i/2 + t)/Γ(i/2)]  and  B_n = log max_{1≤i≤M} [Γ(i/2 + t)/Γ(i/2)].

Thus, A_n ≤ log[Γ((n−p)/2 + t)/Γ((n−p)/2)] ≤ B_n for all n with 1 ≤ n − p ≤ M. Consequently,

    | log [Γ((n−p)/2 + t)/Γ((n−p)/2)] − t log((n−p)/2) | ≤ |A_n| + |B_n| + |t| log(M/2)    (2.22)

for all n with 1 ≤ n − p ≤ M. If n − p > M, noticing lim_{n→∞} t_n = 0, then there exists an integer C₃ ≥ C₂ such that

    | (t² − t)/(n−p) + ϕ(n−p) | ≤ (t² + |t| + C₁)/M ≤ ε

as n ≥ C₃, by the second assertion in (2.21). Consequently, by the first assertion in (2.21),

    | log [Γ((n−p)/2 + t)/Γ((n−p)/2)] − t log((n−p)/2) | ≤ ε    (2.23)

for all n ≥ C₃ with n − p > M. Since lim_{n→∞} t_n = 0, we know A_n → 0 and B_n → 0 as n → ∞. Joining (2.22) with (2.23), we conclude that

    | log [Γ((n−p)/2 + t)/Γ((n−p)/2)] − t log((n−p)/2) | ≤ |A_n| + |B_n| + |t| log(M/2) + ε < 3ε

as n is sufficiently large. This says that the left-hand side above goes to 0 as n → ∞. Equivalently,

    log [Γ((n−p)/2 + t)/Γ((n−p)/2)] = t log((n−p)/2) + o(1)    (2.24)

as n → ∞. By Lemma 2.1 and the fact that lim_{n→∞} t_n = 0,

    log [Γ(n/2 + t)/Γ(n/2)] = t log(n/2) + O(1/n)

as n → ∞. Subtracting (2.24) from this, and then using the same argument as in (2.20), we obtain (2.19).

Based on all the results presented, we complete this chapter with our main result on the asymptotic expansion of the multivariate Gamma function:

PROPOSITION 2.2 Let n > p = p_n and r_n = [−log(1 − p/n)]^{1/2}. Assume p/n → y ∈ (0, 1] and t = t_n = O(1/r_n) as n → ∞. Then, as n → ∞,

    log [Γ_p(n/2 + t)/Γ_p(n/2)] = pt(log n − 1 − log 2) + r_n²[t² − (p − n + 0.5)t] + o(1).

Proof. First, by the product formula (2.3) (the factors π^{p(p−1)/4} cancel in the ratio),

    Γ_p(n/2 + t)/Γ_p(n/2) = ∏_{i=1}^{p} [Γ(n/2 + t − (i−1)/2) / Γ(n/2 − (i−1)/2)] = ∏_{j=n−p+1}^{n} [Γ(j/2 + t)/Γ(j/2)].

It follows that

    Γ_p(n/2 + t)/Γ_p(n/2) = [ Γ(n/2 + t) Γ((n−p)/2) / ( Γ(n/2) Γ((n−p)/2 + t) ) ] · ∏_{j=n−p}^{n−1} [Γ(j/2 + t)/Γ(j/2)].    (2.25)

Now, according to Proposition 2.1, we have

    log ∏_{j=n−p}^{n−1} [Γ(j/2 + t)/Γ(j/2)] = pt(log n − 1 − log 2) + r_n²[t² − (p − n + 1.5)t] + o(1)

as n → ∞. On the other hand, from Lemma 2.4,

    log [ Γ(n/2 + t) Γ((n−p)/2) / ( Γ(n/2) Γ((n−p)/2 + t) ) ] = r_n² t + o(1)

as n → ∞. Combining the last three assertions, we have

    log [Γ_p(n/2 + t)/Γ_p(n/2)] = pt(log n − 1 − log 2) + r_n²[t² − (p − n + 0.5)t] + o(1).

COROLLARY 2.1 Let n > p = p_n and r_n = [−log(1 − p/n)]^{1/2}. Assume p/n → y ∈ (0, 1], s = s_n = O(1/r_n) and t = t_n = O(1/r_n) as n → ∞. Then, as n → ∞,

    log [Γ_p(n/2 + t)/Γ_p(n/2 + s)] = p(t − s)(log n − 1 − log 2) + r_n²[(t² − s²) − (p − n + 0.5)(t − s)] + o(1).

Proof. Simply write

    log [Γ_p(n/2 + t)/Γ_p(n/2 + s)] = log [Γ_p(n/2 + t)/Γ_p(n/2)] − log [Γ_p(n/2 + s)/Γ_p(n/2)].

Then the conclusion is a direct result of Proposition 2.2.
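As a quick numerical plausibility check (our own sketch, not part of the original derivation), the ratio Γ_p(n/2 + t)/Γ_p(n/2) can be evaluated exactly through the product formula (2.3) with log-Gamma functions and compared against the expansion of Proposition 2.2. The function names below are ours, and the agreement tolerance is an assumption about the size of the o(1) term at these values of n and p:

```python
import math

def log_gamma_p_ratio(p, a, t):
    # log[Gamma_p(a + t) / Gamma_p(a)] computed exactly via the product
    # formula (2.3); the pi^{p(p-1)/4} factors cancel in the ratio.
    return sum(math.lgamma(a + t - (i - 1) / 2.0) - math.lgamma(a - (i - 1) / 2.0)
               for i in range(1, p + 1))

def proposition_2_2(n, p, t):
    # Right-hand side of Proposition 2.2, without the o(1) term.
    rn2 = -math.log(1.0 - p / n)  # this is r_n^2
    return p * t * (math.log(n) - 1.0 - math.log(2.0)) + rn2 * (t * t - (p - n + 0.5) * t)

n, p, t = 2000, 1000, 0.2
print(log_gamma_p_ratio(p, n / 2.0, t), proposition_2_2(n, p, t))
# the two values agree up to the o(1) error
```

For p = 1 the proposition reduces to Lemma 2.1 applied to a single Gamma ratio, which gives an even sharper agreement.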

Chapter 3

Testing Covariance Matrix of Normal Distribution Proportional to Identity Matrix (Sphericity Test)

3.1 Introduction

Let x₁, ..., x_n be i.i.d. R^p-valued random vectors from a normal distribution N_p(µ, Σ), where µ ∈ R^p is the mean vector and Σ is the p × p covariance matrix. Consider the hypothesis test

    H₀: Σ = λI_p  vs  H₁: Σ ≠ λI_p,    (3.1)

where λ > 0 is unknown. Denote

    x̄ = (1/n) ∑_{i=1}^{n} x_i,  A = ∑_{i=1}^{n} (x_i − x̄)(x_i − x̄)′,  and  S = A/(n − 1).    (3.2)

The likelihood ratio statistic of the test (3.1) was first derived by Mauchly [34] as

    V_n = det(A) / [tr(A)/p]^p = det(S) / [tr(S)/p]^p.    (3.3)
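For concreteness, (3.3) can be computed directly from a data matrix. The following is a minimal sketch (the function name `log_Vn` is ours, and NumPy is assumed available); a log-determinant is used for numerical stability, since det(A) itself over- or underflows quickly in high dimensions:

```python
import numpy as np

def log_Vn(x):
    # x: n-by-p data matrix; returns log V_n for the statistic in (3.3)
    n, p = x.shape
    xc = x - x.mean(axis=0)               # center each column at the sample mean
    A = xc.T @ xc                         # A = sum_i (x_i - xbar)(x_i - xbar)'
    sign, logdetA = np.linalg.slogdet(A)  # stable log det(A); needs p <= n - 1
    return logdetA - p * np.log(np.trace(A) / p)

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 50))
print(log_Vn(x))  # always <= 0, since det(A)^(1/p) <= tr(A)/p by AM-GM
```

Note that log V_n is invariant under rescaling x → λx, which reflects that the scale λ in H₀ is unknown and irrelevant to the test.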

Notice that the matrices A and S are not of full rank when p > n, and consequently their determinants are equal to zero in this case. This indicates that the likelihood ratio test of (3.1) only exists when p ≤ n. The statistic V_n is commonly known as the ellipticity statistic. Gleser [18] showed that the likelihood ratio test with the rejection region {V_n ≤ c_α} (where c_α is chosen so that the test has a significance level of α) is unbiased. The distribution of the test statistic V_n can be studied through its moments. When the null hypothesis H₀: Σ = λI_p is true, the following result is referenced from page 341 of Muirhead [36]:

    E(V_n^h) = p^{ph} · [ Γ((n−1)p/2) / Γ((n−1)p/2 + ph) ] · [ Γ_p((n−1)/2 + h) / Γ_p((n−1)/2) ]  for h > (p − n)/2.    (3.4)

When p is assumed a fixed integer, the following result, referenced from [36] and [2], gives an explicit expansion of the distribution function of −(n−1)ρ log V_n, where ρ = 1 − (2p² + p + 2)/(6p(n−1)), as M = (n−1)ρ → ∞:

    Pr[ −(n−1)ρ log V_n ≤ x ] = Pr(χ²_f ≤ x) + (γ/M²)[ Pr(χ²_{f+4} ≤ x) − Pr(χ²_f ≤ x) ] + O(M^{−3}),    (3.5)

where f = (p + 2)(p − 1)/2, γ = (n−1)²ρ²ω₂, and ω₂ is given by

    ω₂ = (p − 1)(p − 2)(p + 2)(2p³ + 6p² + 3p + 2) / [288 p² (n−1)² ρ²].    (3.6)

Nagarsenker and Pillai [37] tabulated the lower 5 and 1 percentiles of the asymptotic distribution of V_n under the null hypothesis H₀: Σ = λI_p. A different test for sphericity, other than the likelihood ratio test, was recommended by John [25], who studied the test statistic

    U = (1/p) tr[ ( S/((1/p) tr S) − I_p )² ] = [ (1/p) tr(S²) ] / [ (1/p) tr S ]² − 1.    (3.7)

According to John [25], the test that rejects the null hypothesis when U > c_α, where c_α is determined by the significance level α, is a locally most powerful invariant test for sphericity, and this test is more universal than the aforementioned likelihood ratio test because it can be performed even with p > n. John [26] further showed that under the

null hypothesis of (3.1), the limiting distribution of the test statistic U, as the sample size n goes to infinity while the dimension p remains fixed, is given by

    (np/2) U →_d χ²_{p(p+1)/2 − 1}.    (3.8)

Ledoit and Wolf [29] re-examined the limiting distribution of the test statistic U in the high-dimensional situation where p/n → c ∈ (0, ∞). They proved that under the null hypothesis of (3.1),

    nU − p →_d N(1, 4).    (3.9)

Ledoit and Wolf further argued that, since

    (2/p) χ²_{p(p+1)/2 − 1} − p →_d N(1, 4),    (3.10)

John's n-asymptotic results (assuming p is fixed) for the test statistic U remain valid in practice in the high-dimensional case (i.e., both p and n are large). Most recently, Chen, Zhang and Zhong [9] extended Ledoit and Wolf's asymptotic result to non-normal distributions with certain conditions on their covariance matrices.

3.2 High-dimensional Likelihood Ratio Test for Sphericity

In this section, we focus on the likelihood ratio test for sphericity in the high-dimensional case (p/n → y ∈ (0, 1]) and develop a central limit theorem for the likelihood ratio test statistic log V_n as given in (3.3). The proof of this theorem is deferred until the next section.

THEOREM 3.1 Assume that p := p_n is a sequence of positive integers depending on n such that n > 1 + p for all n ≥ 3 and p/n → y ∈ (0, 1] as n → ∞. Let V_n be defined as in (3.3). Then under H₀: Σ = λI_p (λ unknown), (log V_n − µ_n)/σ_n converges in distribution to N(0, 1) as n → ∞, where

    µ_n = −p − (n − p − 1.5) log(1 − p/(n − 1)),
    σ_n² = −2 [ p/(n − 1) + log(1 − p/(n − 1)) ] > 0.
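In practice, Theorem 3.1 yields a simple one-sided rejection rule: reject H₀ when the normalized statistic (log V_n − µ_n)/σ_n falls below the lower α-quantile of N(0, 1), since small values of V_n are evidence against sphericity. A minimal sketch follows; the function name is ours, and the use of Python's statistics.NormalDist for the normal quantile is an implementation choice, not part of the thesis:

```python
import math
from statistics import NormalDist

def hd_sphericity_test(log_vn, n, p, alpha=0.05):
    # mu_n and sigma_n from Theorem 3.1 (requires n > p + 1)
    ratio = p / (n - 1.0)
    mu_n = -p - (n - p - 1.5) * math.log(1.0 - ratio)
    sigma_n = math.sqrt(-2.0 * (ratio + math.log(1.0 - ratio)))
    z = (log_vn - mu_n) / sigma_n
    z_alpha = NormalDist().inv_cdf(alpha)   # lower-tail critical value, about -1.645 at 0.05
    return z, z < z_alpha

z, reject = hd_sphericity_test(-20.0, 200, 50)
print(z, reject)  # reject is True for this strongly sub-spherical value of log V_n
```

For (n, p) = (200, 50), µ_n ≈ −7.03 and σ_n ≈ 0.28, so a value of log V_n near µ_n is retained while markedly smaller values are rejected.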

We compare the performance of the sphericity test using Theorem 3.1 against the traditional LRT using (3.5). Based on the calculated critical values of the test statistic at α = 0.05, we conduct a simulation study with 10,000 replicates from the normal distribution to obtain the realized size of the two tests (i.e., the probability of rejecting the null hypothesis) for different pairs (p, n). Our results are summarized in Table 3.1 and charted in Figure 3.1.

Table 3.1: Sizes and Powers of LRT for Sphericity. For each pair (p, n) with n = 200 and p ∈ {5, 10, 50, ..., 190, 198}, the table reports the critical value (at α = 0.05), the size, and the power of the traditional LRT and of the high-dimensional LRT. Sizes of the sphericity likelihood ratio tests are computed based on 10,000 independent replications of the tests with n samples drawn from N_p(2e, 0.5 I_p), where e = (1, ..., 1)′ ∈ R^p. The powers are estimated under the alternative hypothesis that Σ = diag(0.5, 0.09, 0.09, ..., 0.09).

It can be seen from Table 3.1 and Figure 3.1 that when p is small, e.g., p = 5, 10, and 50 in our simulation, the traditional LRT using the chi-square approximation of (3.5) has a size that matches the test significance level of 0.05 very well. In these cases, the high-dimensional LRT using Theorem 3.1 also demonstrates a good size, yet slightly greater than 0.05. When p is large, our simulation shows that the traditional LRT using (3.5) rejects H₀ with a much higher probability than 0.05. In particular, when p = 190 and p = 198, the traditional LRT always rejects H₀ in our simulation, leading to a 100% alpha error. However, the high-dimensional LRT using Theorem 3.1 outperforms

the traditional one, with its sizes still very close to 0.05 in these cases. On the power side, our simulation shows that the traditional LRT and the high-dimensional one have comparable powers when p is small. However, when p becomes large, the power of the traditional LRT goes up to 100% due to the failure of the test (100% alpha error) in these cases, while the power of the high-dimensional LRT stays valid. In summary, our simulation indicates that, in regard to the test size (or alpha error), the proposed high-dimensional LRT using Theorem 3.1 shows non-inferiority to the traditional LRT when p is small, yet a significant improvement over the traditional one when p becomes large.

Figure 3.1: Sizes and Powers of LRT for Sphericity

3.3 Proof of Theorem 3.1

Recall that a sufficient condition (see, e.g., page 408 from [6]) for a sequence of random variables {Z_n; n ≥ 1} to converge to Z in distribution as n → ∞ is that

    lim_{n→∞} E e^{tZ_n} = E e^{tZ} < ∞    (3.11)

for all t ∈ (−t_0, t_0), where t_0 > 0 is a constant. Thus, to prove the theorem, it suffices to show that there exists δ_0 > 0 such that

E exp{ ((log V_n − μ_n)/σ_n) s } → e^{s²/2}   (3.12)

as n → ∞ for all |s| < δ_0.

First, by the fact that x + log(1 − x) < 0 for all x ∈ (0, 1), we know that σ_n > 0 for each n and p with n ≥ 3, and

σ_n² → −2[ y + log(1 − y) ] > 0 as n → ∞ if y ∈ (0, 1), and σ_n² → +∞ as n → ∞ if y = 1.

Therefore, let δ_0 := inf{ σ_n : n ≥ 3 } > 0. Fix |s| < δ_0/2 and set t = t_n = s/σ_n. Then {t_n; n ≥ 3} is bounded and |t_n| < 1/2 for all n ≥ 3. By the moment result (3.4),

E e^{t log V_n} = E V_n^t = p^{pt} · ( Γ[(n−1)p/2] / Γ[(n−1)p/2 + pt] ) · ( Γ_p[(n−1)/2 + t] / Γ_p[(n−1)/2] )

for all n ≥ 3. By Lemma 2.1 and the assumption that p/n → y ∈ (0, 1],

log( Γ[(n−1)p/2] / Γ[(n−1)p/2 + pt] ) = −log( Γ[(n−1)p/2 + pt] / Γ[(n−1)p/2] )
   = −pt log( (n−1)p/2 ) − (p²t² − pt)/((n−1)p) + O( 1/((n−1)p) )
   = −pt log( (n−1)p/2 ) − pt²/(n−1) + O(1/n)   (3.13)

as n → ∞. Set r_x := [ −log(1 − p/x) ]^{1/2} for x > p, and notice

t r_{n−1} = (s/σ_n) [ −log(1 − p/(n−1)) ]^{1/2} → { s [−log(1−y)]^{1/2} / ( −2[y + log(1−y)] )^{1/2}, if y ∈ (0, 1); s/√2, if y = 1 }

as n → ∞. Thus, t = O(1/r_{n−1}) as n → ∞. By Proposition 2.2,

log( Γ_p[(n−1)/2 + t] / Γ_p[(n−1)/2] ) = pt[ log((n−1)/2) − 1 ] + r_{n−1}²[ t² − (p − n + 1.5) t ] + o(1)

as n → ∞. This together with (3.13) gives

log E e^{t log V_n} = pt log p + log( Γ[(n−1)p/2] / Γ[(n−1)p/2 + pt] ) + log( Γ_p[(n−1)/2 + t] / Γ_p[(n−1)/2] )
   = pt log p − pt log( (n−1)p/2 ) − pt²/(n−1) + pt[ log((n−1)/2) − 1 ] + r_{n−1}²[ t² − (p − n + 1.5) t ] + o(1)
   = ( r_{n−1}² − p/(n−1) ) t² + [ −p + (n − p − 1.5) r_{n−1}² ] t + o(1)

as n → ∞. Recalling the notation of μ_n, σ_n and t = t_n = s/σ_n, the above says that

log E exp{ (log V_n / σ_n) s } = log E e^{t log V_n} = (σ_n² t²)/2 + μ_n t + o(1) = s²/2 + (μ_n / σ_n) s + o(1)

as n → ∞ for all |s| < δ_0/2. This implies (3.12). The proof is completed.
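As a numerical companion to the simulation above, the critical values of log V_n are straightforward to compute from μ_n and σ_n in Theorem 3.1. The following sketch is our own illustration (the function names are hypothetical, and the hard-coded constant is the standard normal 0.95 quantile); it uses the one-sided rule that sphericity is rejected when log V_n falls below μ_n − z_{0.95} σ_n.

```python
import math

Z_95 = 1.6448536269514722  # standard normal 0.95 quantile

def sphericity_mu_sigma(p, n):
    """mu_n and sigma_n from Theorem 3.1:
    mu_n      = -p - (n - p - 1.5) * log(1 - p/(n-1)),
    sigma_n^2 = -2 * [ p/(n-1) + log(1 - p/(n-1)) ]."""
    q = p / (n - 1.0)
    mu = -p - (n - p - 1.5) * math.log(1.0 - q)
    sigma2 = -2.0 * (q + math.log(1.0 - q))
    return mu, math.sqrt(sigma2)

def critical_value(p, n, z=Z_95):
    # reject H0 (sphericity) at level 0.05 when log V_n < this value
    mu, sigma = sphericity_mu_sigma(p, n)
    return mu - z * sigma

# the (p, n) pairs of Table 3.1
cvs = {p: critical_value(p, 200) for p in (5, 10, 50, 100, 150, 190, 198)}
```

Note that σ_n² > 0 for every n ≥ 3 by the inequality x + log(1 − x) < 0 used in the proof, so the standardization is always well defined.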

Chapter 4

Testing Independence of Components of Normal Distribution

4.1 Introduction

Suppose that a random vector X follows a p-dimensional normal distribution N_p(μ, Σ), where X, μ, and Σ are partitioned as X = (X_1', X_2', ..., X_k')', μ = (μ_1', μ_2', ..., μ_k')' and

Σ = [ Σ_11  Σ_12  ...  Σ_1k
      Σ_21  Σ_22  ...  Σ_2k
      ..................
      Σ_k1  Σ_k2  ...  Σ_kk ],

where X_i and μ_i are p_i-dimensional vectors and Σ_ij is a p_i × p_j matrix, with i, j = 1, ..., k and Σ_{i=1}^k p_i = p. Consider the hypothesis test that the sub-vectors X_1, X_2, ..., X_k are independent:

H_0 : Σ_ij = 0 (i, j = 1, ..., k; i ≠ j).   (4.1)

Let x_1, ..., x_n be n independent random samples of X, let

x̄ = (1/n) Σ_{i=1}^n x_i = (x̄_1', x̄_2', ..., x̄_k')'

be the sample mean vector, partitioned into k sub-vectors with x̄_i of dimension p_i × 1, and let

A = Σ_{i=1}^n (x_i − x̄)(x_i − x̄)' = [ A_11  A_12  ...  A_1k
                                       A_21  A_22  ...  A_2k
                                       ..................
                                       A_k1  A_k2  ...  A_kk ]   (4.2)

with A_ij of dimension p_i × p_j. Wilks [46] showed that the likelihood ratio statistic for testing (4.1) is

Λ_n = ( det(A) / Π_{i=1}^k det(A_ii) )^{n/2},   (4.3)

and the likelihood ratio test rejects H_0 when Λ_n ≤ c_α, where c_α is chosen so that the significance level is equal to α. Notice that this likelihood ratio test only exists when p ≤ n, because otherwise the matrix A is not of full rank and consequently its determinant is equal to zero.

The distribution of the likelihood ratio statistic Λ_n can be studied through its moments. Define

W_n = Λ_n^{2/n} = det(A) / Π_{i=1}^k det(A_ii).   (4.4)

When the null hypothesis H_0 : Σ_ij = 0 (i, j = 1, ..., k; i ≠ j) is true, the following moment result is from Muirhead [36]:

E( W_n^h ) = ( Γ_p[(n−1)/2 + h] / Γ_p[(n−1)/2] ) Π_{i=1}^k ( Γ_{p_i}[(n−1)/2] / Γ_{p_i}[(n−1)/2 + h] )   (4.5)

for h > (p − n)/2. It was also shown in [36] that under the null hypothesis H_0, the test statistic W_n has the same distribution as

Π_{i=2}^k Π_{j=1}^{p_i} V_ij,   (4.6)

where the V_ij are independent random variables and V_ij follows the Beta( (n − p̄_i − j)/2, p̄_i/2 ) distribution with p̄_i = Σ_{l=1}^{i−1} p_l.

Let

f = (1/2)( p² − Σ_{i=1}^k p_i² ),

ρ = 1 − [ 2( p³ − Σ_{i=1}^k p_i³ ) + 9( p² − Σ_{i=1}^k p_i² ) ] / [ 6n( p² − Σ_{i=1}^k p_i² ) ],

ω_2 = (1/(ρn)²) [ ( p⁴ − Σ_{i=1}^k p_i⁴ )/48 − (5/96)( p² − Σ_{i=1}^k p_i² ) − ( p³ − Σ_{i=1}^k p_i³ )² / ( 72( p² − Σ_{i=1}^k p_i² ) ) ],

and γ = (ρn)² ω_2. When n goes to infinity while all p_i remain fixed, the traditional Chi-square approximation to the limiting distribution of Λ_n (see Section 9.5 of Anderson [2]) is

Pr( −2ρ log Λ_n ≤ x ) = Pr( χ²_f ≤ x ) + (γ/M²)[ Pr( χ²_{f+4} ≤ x ) − Pr( χ²_f ≤ x ) ] + O( M^{−3} ),   (4.7)

where M = ρn. The upper 100α% points of the distribution of −2ρ log Λ_n for α = 0.05 and 0.01 were tabulated by Davis and Field [12].

In the high-dimensional case, Schott [41] studied a related hypothesis test for complete independence of a multivariate normal distribution, i.e., the hypothesis that all off-diagonal entries of the covariance matrix Σ are zero. Schott studied the test statistic

T_np = Σ_{i=2}^p Σ_{j=1}^{i−1} r_ij² − p(p−1)/(2n),   (4.8)

where r_ij is the (i, j) entry of the sample correlation matrix. He proved that if complete independence holds and p/n → c ∈ (0, ∞), then T_np converges in distribution to a normal distribution with mean 0 and variance c².
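Evaluating the right-hand side of (4.7) in practice requires χ² quantiles. When no statistics library is at hand, the classical Wilson–Hilferty cube-root approximation gives them to within a few hundredths for moderate degrees of freedom. The sketch below is our own illustration (not the thesis code; the configuration (p_1, p_2, p_3) = (2, 2, 1), n = 200 is one plausible small case), computing f, ρ and the first-order α = 0.05 critical value of −2ρ log Λ_n.

```python
import math

Z_95 = 1.6448536269514722  # standard normal 0.95 quantile

def chi2_quantile_wh(f, z):
    """Wilson-Hilferty: chi2_f quantile ~ f * (1 - 2/(9f) + z*sqrt(2/(9f)))^3."""
    a = 2.0 / (9.0 * f)
    return f * (1.0 - a + z * math.sqrt(a)) ** 3

def box_f_rho(p_list, n):
    # first-order terms of the Box correction for the independence LRT
    p = sum(p_list)
    s2 = p ** 2 - sum(q ** 2 for q in p_list)
    s3 = p ** 3 - sum(q ** 3 for q in p_list)
    f = s2 / 2.0
    rho = 1.0 - (2.0 * s3 + 9.0 * s2) / (6.0 * n * s2)
    return f, rho

f, rho = box_f_rho([2, 2, 1], 200)
cv = chi2_quantile_wh(f, Z_95)  # reject H0 when -2*rho*log(Lambda_n) exceeds cv
```

The second-order γ term in (4.7) shifts this critical value only slightly when n is large relative to p, since it enters at order M^{−2}.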

4.2 High-dimensional LRT for Testing Independence of Components of a Normal Distribution

In this section, we develop a high-dimensional likelihood ratio test for testing independence of the components of a normal distribution. Our proposed test is based on the following central limit theorem for log W_n, a function of the likelihood ratio statistic Λ_n. The proof of this theorem is deferred to the next section.

THEOREM 4.1 Assume that p_i := p_i(n) (i = 1, ..., k) are sequences of positive integers depending on n such that p = p_1 + ... + p_k < n − 1 for all n ≥ 3 and p_i/n → y_i ∈ (0, 1) as n → ∞ for each 1 ≤ i ≤ k. Let W_n be defined as in (4.4). Then under H_0 : Σ_ij = 0 (i, j = 1, ..., k; i ≠ j), (log W_n − μ_n)/σ_n converges in distribution to N(0, 1) as n → ∞, where

μ_n = (p − n + 1.5) log( 1 − p/(n−1) ) − Σ_{i=1}^k (p_i − n + 1.5) log( 1 − p_i/(n−1) );
σ_n² = −2[ log( 1 − p/(n−1) ) − Σ_{i=1}^k log( 1 − p_i/(n−1) ) ] > 0.

We compare the performance of the high-dimensional likelihood ratio test derived from Theorem 4.1 against the traditional LRT using the Chi-square approximation (4.7). Based on the calculated critical values of the test statistic log W_n under both cases at α = 0.05, we run a simulation study with 10,000 replicates from three independent normal distribution components to estimate the size of the test (i.e., the probability of rejecting the null hypothesis, or alpha error) for different pairs of (p, n). Our results are summarized in Table 4.1 and charted in Figure 4.1.

It can be seen from Table 4.1 and Figure 4.1 that the traditional LRT for independence using the Chi-square approximation only performs well in the low-dimensional cases. When the dimension grows large, the size of the test rises significantly, leading to an alpha error much higher than 0.05. On the contrary, the proposed high-dimensional LRT using Theorem 4.1 always returns a good test size regardless of the dimension. This suggests the superiority of the proposed high-dimensional LRT over the traditional one.
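Representation (4.6) gives a way to examine the null distribution of W_n without generating any normal data: log W_n is a sum of independent log-beta variables. The sketch below is our own check (not the thesis simulation code; `betavariate` is the standard-library beta sampler), drawing log W_n for (p_1, p_2, p_3) = (2, 2, 1) and n = 200 and comparing the empirical mean with μ_n from Theorem 4.1.

```python
import math
import random

def sample_log_Wn(p_list, n, rng):
    """Draw log W_n via (4.6): W_n = prod_{i=2}^k prod_{j=1}^{p_i} V_ij,
    V_ij ~ Beta( (n - pbar_i - j)/2, pbar_i/2 ), pbar_i = p_1 + ... + p_{i-1}."""
    log_w = 0.0
    pbar = p_list[0]
    for i in range(1, len(p_list)):
        for j in range(1, p_list[i] + 1):
            v = rng.betavariate(0.5 * (n - pbar - j), 0.5 * pbar)
            log_w += math.log(v)
        pbar += p_list[i]
    return log_w

def mu_sigma(p_list, n):
    # mu_n and sigma_n from Theorem 4.1, with p = p_1 + ... + p_k
    p = sum(p_list)
    lg = lambda q: math.log(1.0 - q / (n - 1.0))
    mu = (p - n + 1.5) * lg(p) - sum((pi - n + 1.5) * lg(pi) for pi in p_list)
    sigma2 = -2.0 * (lg(p) - sum(lg(pi) for pi in p_list))
    return mu, math.sqrt(sigma2)

rng = random.Random(0)
draws = [sample_log_Wn([2, 2, 1], 200, rng) for _ in range(2000)]
mean = sum(draws) / len(draws)
mu, sigma = mu_sigma([2, 2, 1], 200)
```

Every draw is negative (W_n < 1 almost surely), and the empirical mean lands close to μ_n even though p is tiny here.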

Table 4.1: Sizes and Powers of LRT for Independent Normal Components. For each (p_1, p_2, p_3, n) with n = 200, the table reports the critical value of log W_n at α = 0.05, the size, and the power, for both the traditional LRT and the high-dimensional LRT. [Numeric entries not recovered.] Sizes of the likelihood ratio test of (4.1) are computed based on 10,000 independent applications of the tests with n = 200 samples drawn from N_p(0, I_p). The powers are estimated under the alternative hypothesis that all entries of Σ_ij (i, j = 1, ..., k; i ≠ j) are equal to 0.2.

4.3 Proof of Theorem 4.1

For convenience, set m = n − 1. Recall the notation r_x := [ −log(1 − p/x) ]^{1/2} for x > p, and write r_{m,i} := [ −log(1 − p_i/m) ]^{1/2} for 1 ≤ i ≤ k. Then we need to prove that, as n → ∞,

(log W_n − μ_m)/σ_m converges in distribution to N(0, 1),   (4.9)

where

μ_m = −r_m²( p − m + 0.5 ) + Σ_{i=1}^k r_{m,i}²( p_i − m + 0.5 ) and σ_m² = 2( r_m² − Σ_{i=1}^k r_{m,i}² ).

First, by the assumptions in the theorem,

p/m = Σ_{i=1}^k p_i/m → Σ_{i=1}^k y_i := y ∈ (0, 1]   (4.10)

as n → ∞. Secondly, it is known that Π_{i=1}^k (1 − x_i) > 1 − Σ_{i=1}^k x_i for x_i ∈ (0, 1), 1 ≤ i ≤ k; see, e.g., p. 60 of [19]. Taking the logarithm of both sides and then taking x_i = p_i/m,

Figure 4.1: Sizes and Powers of LRT for Independent Normal Components

we see that

σ_m²/2 = r_m² − Σ_{i=1}^k r_{m,i}² = Σ_{i=1}^k log( 1 − p_i/m ) − log( 1 − p/m ) > 0.   (4.11)

Now, by the assumptions and (4.10), it is easy to see that

lim_{n→∞} σ_m² = { −2 log(1 − y) + 2 Σ_{i=1}^k log(1 − y_i), if y < 1; +∞, if y = 1. }

By the same argument as in the last inequality of (4.11), we know the limit above is always positive. Recalling that m = n − 1 > p, we then set δ_0 := inf{ σ_m ; m ≥ 2 } > 0. Fix |s| < δ_0/2. Set t = t_m = s/σ_m. Then {t_m; m ≥ 2} is bounded and |t_m| < 1/2 for all m ≥ 2. In particular, as n → ∞, we have

t = t_m = O( 1/r_{m,i} ), 1 ≤ i ≤ k,   (4.12)

due to the fact that lim_{n→∞} r_{m,i} = [ −log(1 − y_i) ]^{1/2} ∈ (0, ∞) for 1 ≤ i ≤ k. On the other hand, notice

Σ_{i=1}^k r_{m,i}² = −Σ_{i=1}^k log( 1 − p_i/m ) → −Σ_{i=1}^k log( 1 − y_i )

as n → ∞. It follows from (4.10) that

lim_{n→∞} r_m²/σ_m² = { −log(1−y) / ( −2 log(1−y) + 2 Σ_{i=1}^k log(1−y_i) ), if y ∈ (0, 1); 1/2, if y = 1. }

This implies that

t = s/σ_m = O( 1/r_m )   (4.13)

as n → ∞. Now, using the moment result (4.5),

E e^{t log W_n} = E W_n^t = ( Γ_p[m/2 + t] / Γ_p[m/2] ) Π_{i=1}^k ( Γ_{p_i}[m/2] / Γ_{p_i}[m/2 + t] )   (4.14)

since |t| = |t_m| < 1/2. By Lemma 2.2 and (4.13),

log( Γ_p[m/2 + t] / Γ_p[m/2] ) = pt( log m − 1 − log 2 ) + r_m²[ t² − (p − m + 0.5) t ] + o(1)   (4.15)

as n → ∞. Similarly, by Lemma 2.2 and (4.12),

log( Γ_{p_i}[m/2 + t] / Γ_{p_i}[m/2] ) = p_i t( log m − 1 − log 2 ) + r_{m,i}²[ t² − (p_i − m + 0.5) t ] + o(1)   (4.16)

as n → ∞ for 1 ≤ i ≤ k. Therefore, using the identity p = p_1 + ... + p_k, we have

log Π_{i=1}^k ( Γ_{p_i}[m/2 + t] / Γ_{p_i}[m/2] ) = Σ_{i=1}^k log( Γ_{p_i}[m/2 + t] / Γ_{p_i}[m/2] )
   = pt( log m − 1 − log 2 ) + t² Σ_{i=1}^k r_{m,i}² − t Σ_{i=1}^k r_{m,i}²( p_i − m + 0.5 ) + o(1)

as n → ∞. This together with (4.14) and (4.15) gives

log E W_n^t = t²( r_m² − Σ_{i=1}^k r_{m,i}² ) − t[ r_m²( p − m + 0.5 ) − Σ_{i=1}^k r_{m,i}²( p_i − m + 0.5 ) ]
   + o(1) = (σ_m² t²)/2 + μ_m t + o(1) = s²/2 + (μ_m/σ_m) s + o(1)

as n → ∞, by the definitions of μ_m and σ_m as well as the fact t = s/σ_m. We then arrive at

E exp{ ((log W_n − μ_m)/σ_m) s } = e^{−μ_m s/σ_m} E W_n^t → e^{s²/2}

as n → ∞ for all |s| < δ_0/2. This implies (4.9) by the moment generating function method stated in (3.11).
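The positivity step (4.11) rests on the elementary Weierstrass-type inequality Π(1 − x_i) > 1 − Σ x_i. A short numerical sketch (our own illustration, with hypothetical values) makes both the inequality and the resulting σ_m² > 0 concrete.

```python
import math

def prod_vs_sum(xs):
    # Weierstrass-type inequality: prod(1 - x_i) > 1 - sum(x_i) for x_i in (0, 1)
    prod = 1.0
    for x in xs:
        prod *= (1.0 - x)
    return prod, 1.0 - sum(xs)

def sigma_m2(p_list, m):
    # sigma_m^2 / 2 = sum_i log(1 - p_i/m) - log(1 - p/m), cf. (4.11)
    p = sum(p_list)
    half = sum(math.log(1.0 - pi / m) for pi in p_list) - math.log(1.0 - p / m)
    return 2.0 * half

lhs, rhs = prod_vs_sum([0.1, 0.2, 0.3])  # 0.504 vs 0.4
s2 = sigma_m2([20, 40, 20], 199)
```

Any choice of x_i ∈ (0, 1) behaves the same way, which is exactly why σ_m² stays bounded away from zero in the proof.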

Chapter 5

Testing Equality of Multiple Normal Distributions

5.1 Introduction

Let x_{i1}, ..., x_{in_i} be i.i.d. R^p-valued random vectors from k p-variate normal distributions N_p(μ_i, Σ_i) for i = 1, ..., k, where k is a fixed integer. Consider the hypothesis test that these k normal distributions are identical, i.e.,

H_0 : μ_1 = ... = μ_k and Σ_1 = ... = Σ_k.   (5.1)

Define

A = Σ_{i=1}^k n_i ( x̄_i − x̄ )( x̄_i − x̄ )',   (5.2)

B_i = Σ_{j=1}^{n_i} ( x_ij − x̄_i )( x_ij − x̄_i )',   (5.3)

B = Σ_{i=1}^k B_i,   (5.4)

with

x̄_i = (1/n_i) Σ_{j=1}^{n_i} x_ij,  x̄ = (1/n) Σ_{i=1}^k n_i x̄_i,  n = Σ_{i=1}^k n_i.   (5.5)

The likelihood ratio statistic for testing (5.1) was first derived by Wilks [45]:

Λ_n = ( Π_{i=1}^k det(B_i)^{n_i/2} / det(A + B)^{n/2} ) · ( n^{np/2} / Π_{i=1}^k n_i^{n_i p/2} ),   (5.6)

and the likelihood ratio test rejects the null hypothesis H_0 if Λ_n ≤ c_α, where the critical value c_α is determined so that the significance level of the test is equal to α. Note that when p > min_{1≤i≤k} n_i, the matrices B_i (i = 1, 2, ..., k) are not of full rank, and consequently their determinants are equal to zero, as is the likelihood ratio statistic Λ_n. Therefore, this likelihood ratio test of (5.1) only exists when p ≤ min_{1≤i≤k} n_i.

Anderson suggested (see Section 10.3 of [2]) using a modified test statistic Λ*_n, which is identical to the likelihood ratio statistic Λ_n of (5.6) except that each n_i is replaced by n_i − 1 and n is replaced by n − k:

Λ*_n = ( Π_{i=1}^k det(B_i)^{(n_i−1)/2} / det(A + B)^{(n−k)/2} ) · ( (n−k)^{(n−k)p/2} / Π_{i=1}^k (n_i − 1)^{(n_i−1)p/2} ).   (5.7)

However, according to Perlman [39], it is the likelihood ratio statistic Λ_n, not the modified statistic Λ*_n, that gives an unbiased test of (5.1).

The distribution of Λ_n can be studied through its moments. For notational convenience, define

λ_n := Λ_n · ( Π_{i=1}^k n_i^{n_i p/2} / n^{np/2} ) = Π_{i=1}^k det(B_i)^{n_i/2} / det(A + B)^{n/2}.   (5.8)

The general expression for the moments of λ_n was derived in [36] with the use of hypergeometric functions (see also Section 10.4 of [2] for the moment results of the modified statistic Λ*_n). When the null hypothesis (5.1) is true, this expression simplifies considerably to

E( λ_n^h ) = ( Γ_p[(n−1)/2] / Γ_p[ n(1+h)/2 − 1/2 ] ) Π_{i=1}^k ( Γ_p[ n_i(1+h)/2 − 1/2 ] / Γ_p[(n_i−1)/2] )   (5.9)

for h > max_{1≤i≤k}{ p/n_i } − 1.
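The moment formula (5.9) is expressed through the multivariate gamma function Γ_p, which factors into ordinary gamma functions and can therefore be evaluated on the log scale with `math.lgamma`. The sketch below is our own illustration (function names are ours): it implements log Γ_p and the log-moment log E(λ_n^h) from (5.9); at h = 0 the expression collapses to 0, which serves as a sanity check.

```python
import math

def lgamma_p(a, p):
    """log of the multivariate gamma function:
    Gamma_p(a) = pi^{p(p-1)/4} * prod_{j=1}^p Gamma(a - (j-1)/2)."""
    return (p * (p - 1) / 4.0) * math.log(math.pi) + sum(
        math.lgamma(a - (j - 1) / 2.0) for j in range(1, p + 1))

def log_moment(h, p, n_list):
    """log E(lambda_n^h) under H0, per (5.9); requires h > max(p/n_i) - 1."""
    n = sum(n_list)
    val = lgamma_p((n - 1) / 2.0, p) - lgamma_p(n * (1 + h) / 2.0 - 0.5, p)
    for ni in n_list:
        val += lgamma_p(ni * (1 + h) / 2.0 - 0.5, p) - lgamma_p((ni - 1) / 2.0, p)
    return val

m0 = log_moment(0.0, 5, [200, 200, 200])      # exactly 0: E(lambda_n^0) = 1
m_small = log_moment(0.01, 5, [200, 200, 200])
```

Because λ_n < 1 almost surely (Λ_n ≤ 1 and the sample-size factor in (5.8) is below 1), log E(λ_n^h) is negative for small h > 0.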

When the dimension p is considered fixed, the following asymptotic expansion of the distribution function of −2 log Λ_n under the null hypothesis (5.1) is from [36]. Let

f = (1/2) p(k−1)(p+3) and ρ = 1 − [ (2p² + 9p + 11) / ( 6(k−1)(p+3)n ) ] ( Σ_{i=1}^k n/n_i − 1 ).

Then

Pr( −2ρ log Λ_n ≤ x ) = Pr( χ²_f ≤ x ) + (γ/M²)[ Pr( χ²_{f+4} ≤ x ) − Pr( χ²_f ≤ x ) ] + O( M^{−3} ),   (5.10)

where M = ρn and the second-order coefficient γ = (ρn)² ω_2 depends only on p, k and n_1, ..., n_k; its explicit expression is given in [36]. A similar expansion of the distribution function of −2ρ log Λ_n, with a higher-order error term in M, is given in Section 10.5 of Anderson [2]. The modified test statistic Λ*_n was studied more thoroughly by Lee, Chang and Krishnaiah [30], with the upper percentage points of its limiting distribution tabulated.

5.2 High-dimensional Likelihood Ratio Test for Equality of Multiple Normal Distributions

In this section, we develop the likelihood ratio test for testing equality of multiple normal distributions in the high-dimensional case, i.e., p/n_i → y_i ∈ (0, 1]. Our proposed test is based on the following central limit theorem for the likelihood ratio statistic log Λ_n under the null hypothesis (5.1). The proof of this theorem is deferred to the next section.

THEOREM 5.1 Assume that n_i := n_i(p) (i = 1, ..., k) are sequences of positive integers depending on p such that min_{1≤i≤k} n_i > 1 + p for all p ≥ 1 and p/n_i → y_i ∈ (0, 1] as p → ∞ for each 1 ≤ i ≤ k. Let n = n_1 + ... + n_k and Λ_n be defined as in (5.6). Then under H_0 : μ_1 = ... = μ_k and Σ_1 = ... = Σ_k, (log Λ_n − μ_n)/(nσ_n) converges in distribution to N(0, 1) as p → ∞, where

μ_n = (1/4)[ −2kp + n(2n − 2p − 3) log( 1 − p/(n−k) ) − Σ_{i=1}^k n_i(2n_i − 2p − 3) log( 1 − p/(n_i − 1) ) ],
σ_n² = (1/2)[ log( 1 − p/(n−k) ) − Σ_{i=1}^k (n_i/n)² log( 1 − p/(n_i − 1) ) ] > 0.

We compare the performance of the likelihood ratio test of (5.1) using Theorem 5.1 against the traditional Chi-square approximation (5.10). The critical values of the test statistic at the 0.05 significance level are calculated under both cases, and a simulation study with 10,000 replicates from three normal distributions is performed to estimate the actual sizes and powers of the tests (i.e., the probability of rejecting the null hypothesis) for different pairs of (p, n_i). Our results are summarized in Table 5.1 and charted in Figure 5.1.

It can be seen from Table 5.1 and Figure 5.1 that when p is small, e.g., p = 5, 10, and 50 in our simulation, the traditional LRT using the Chi-square approximation (5.10) has a size that matches the significance level of 0.05 very well. In these cases, the proposed high-dimensional LRT using Theorem 5.1 also shows acceptable sizes, yet slightly higher than 0.05. When p is large, our simulation shows that the traditional LRT using (5.10) rejects H_0 with a much higher probability than 0.05. In particular, when p = 150, 190, and 198, the traditional LRT always rejects H_0 in our simulation, leading to a 100% alpha error. However, in these cases, the size of the proposed high-dimensional LRT using Theorem 5.1 is still close to 0.05. On the power side, our simulation shows that the traditional LRT and the high-dimensional one have comparable powers when p is small. However, when p becomes large, the power of

Table 5.1: Sizes and Powers of LRT for Equality of Multiple Normal Distributions (k = 3). For each (p, n_i) with p = 5, 10, 50, 100, 150, 190, 198 and n_i = 200, the table reports the critical value of log Λ_n at α = 0.05, the size, and the power, for both the traditional LRT and the high-dimensional LRT. [Numeric entries not recovered.] Sizes (or alpha errors) are computed based on 10,000 independent applications of the tests with n_i = 200 samples drawn from three normal distributions with zero mean and covariance matrix whose on-diagonal entries equal 1 and off-diagonal entries equal 0.5. The powers were estimated under the alternative hypothesis that μ_1 = (0, ..., 0)', Σ_1 = 0.5·1_p + 0.5·I_p; μ_2 = (0.2, ..., 0.2)', Σ_2 = 0.4·1_p + 0.6·I_p; μ_3 = (0.4, ..., 0.4)', Σ_3 = 0.3·1_p + 0.7·I_p.

the traditional LRT goes up to 100% due to the failure of the test (100% alpha error) in these cases, while the power of the high-dimensional LRT stays valid. In summary, our simulation indicates that the proposed high-dimensional LRT using Theorem 5.1 is non-inferior to the traditional LRT when p is small, and outperforms the traditional one when p becomes large.

5.3 Proof of Theorem 5.1

According to (5.8),

log Λ_n = log λ_n + (1/2) p n log n − (1/2) Σ_{i=1}^k p n_i log n_i.   (5.11)

Evidently, as p → ∞,

p/n = p / Σ_{i=1}^k n_i → ( Σ_{i=1}^k 1/y_i )^{−1} := y ∈ (0, 1).   (5.12)

Recall that r_n = [ −log(1 − p/n) ]^{1/2}. Therefore, as p → ∞,

r_n → [ −log(1 − y) ]^{1/2} ∈ (0, ∞).   (5.13)
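The quantities appearing in Theorem 5.1 are straightforward to evaluate numerically. The sketch below is our own illustration (names hypothetical; the hard-coded constant is the standard normal 0.95 quantile), computing μ_n, nσ_n and the one-sided α = 0.05 critical value μ_n − z_{0.95}·nσ_n of log Λ_n for the k = 3, n_i = 200 settings of Table 5.1.

```python
import math

Z_95 = 1.6448536269514722  # standard normal 0.95 quantile

def equality_mu_nsigma(p, n_list):
    """mu_n and n*sigma_n from Theorem 5.1 for k = len(n_list) populations."""
    k = len(n_list)
    n = sum(n_list)
    lg_all = math.log(1.0 - p / float(n - k))
    mu = -2.0 * k * p + n * (2 * n - 2 * p - 3) * lg_all
    s2 = lg_all
    for ni in n_list:
        lg_i = math.log(1.0 - p / (ni - 1.0))
        mu -= ni * (2 * ni - 2 * p - 3) * lg_i
        s2 -= (float(ni) / n) ** 2 * lg_i
    return mu / 4.0, n * math.sqrt(s2 / 2.0)

# reject H0 at level 0.05 when log Lambda_n falls below the critical value
cvs = {}
for p in (5, 10, 50, 100, 150, 190, 198):
    mu, nsig = equality_mu_nsigma(p, [200, 200, 200])
    cvs[p] = mu - Z_95 * nsig
```

The centering μ_n grows rapidly in magnitude with p, which is why the fixed-p Chi-square calibration breaks down in the high-dimensional regime.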


More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

Chapter 5. Chapter 5 sections

Chapter 5. Chapter 5 sections 1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

An Introduction to Multivariate Statistical Analysis

An Introduction to Multivariate Statistical Analysis An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents

More information

Lecture 2: Review of Basic Probability Theory

Lecture 2: Review of Basic Probability Theory ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent

More information

Testing Block-Diagonal Covariance Structure for High-Dimensional Data

Testing Block-Diagonal Covariance Structure for High-Dimensional Data Testing Block-Diagonal Covariance Structure for High-Dimensional Data MASASHI HYODO 1, NOBUMICHI SHUTOH 2, TAKAHIRO NISHIYAMA 3, AND TATJANA PAVLENKO 4 1 Department of Mathematical Sciences, Graduate School

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

On the Testing and Estimation of High- Dimensional Covariance Matrices

On the Testing and Estimation of High- Dimensional Covariance Matrices Clemson University TigerPrints All Dissertations Dissertations 1-009 On the Testing and Estimation of High- Dimensional Covariance Matrices Thomas Fisher Clemson University, thomas.j.fisher@gmail.com Follow

More information

On the conservative multivariate Tukey-Kramer type procedures for multiple comparisons among mean vectors

On the conservative multivariate Tukey-Kramer type procedures for multiple comparisons among mean vectors On the conservative multivariate Tukey-Kramer type procedures for multiple comparisons among mean vectors Takashi Seo a, Takahiro Nishiyama b a Department of Mathematical Information Science, Tokyo University

More information

The Multinomial Model

The Multinomial Model The Multinomial Model STA 312: Fall 2012 Contents 1 Multinomial Coefficients 1 2 Multinomial Distribution 2 3 Estimation 4 4 Hypothesis tests 8 5 Power 17 1 Multinomial Coefficients Multinomial coefficient

More information

Mathematical Statistics

Mathematical Statistics Mathematical Statistics MAS 713 Chapter 8 Previous lecture: 1 Bayesian Inference 2 Decision theory 3 Bayesian Vs. Frequentist 4 Loss functions 5 Conjugate priors Any questions? Mathematical Statistics

More information

STAT FINAL EXAM

STAT FINAL EXAM STAT101 2013 FINAL EXAM This exam is 2 hours long. It is closed book but you can use an A-4 size cheat sheet. There are 10 questions. Questions are not of equal weight. You may need a calculator for some

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Week 1 Quantitative Analysis of Financial Markets Distributions A

Week 1 Quantitative Analysis of Financial Markets Distributions A Week 1 Quantitative Analysis of Financial Markets Distributions A Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 October

More information

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data Yujun Wu, Marc G. Genton, 1 and Leonard A. Stefanski 2 Department of Biostatistics, School of Public Health, University of Medicine

More information

A Few Special Distributions and Their Properties

A Few Special Distributions and Their Properties A Few Special Distributions and Their Properties Econ 690 Purdue University Justin L. Tobias (Purdue) Distributional Catalog 1 / 20 Special Distributions and Their Associated Properties 1 Uniform Distribution

More information

Empirical properties of large covariance matrices in finance

Empirical properties of large covariance matrices in finance Empirical properties of large covariance matrices in finance Ex: RiskMetrics Group, Geneva Since 2010: Swissquote, Gland December 2009 Covariance and large random matrices Many problems in finance require

More information

TEST FOR INDEPENDENCE OF THE VARIABLES WITH MISSING ELEMENTS IN ONE AND THE SAME COLUMN OF THE EMPIRICAL CORRELATION MATRIX.

TEST FOR INDEPENDENCE OF THE VARIABLES WITH MISSING ELEMENTS IN ONE AND THE SAME COLUMN OF THE EMPIRICAL CORRELATION MATRIX. Serdica Math J 34 (008, 509 530 TEST FOR INDEPENDENCE OF THE VARIABLES WITH MISSING ELEMENTS IN ONE AND THE SAME COLUMN OF THE EMPIRICAL CORRELATION MATRIX Evelina Veleva Communicated by N Yanev Abstract

More information

Detection of structural breaks in multivariate time series

Detection of structural breaks in multivariate time series Detection of structural breaks in multivariate time series Holger Dette, Ruhr-Universität Bochum Philip Preuß, Ruhr-Universität Bochum Ruprecht Puchstein, Ruhr-Universität Bochum January 14, 2014 Outline

More information

On testing the equality of mean vectors in high dimension

On testing the equality of mean vectors in high dimension ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 17, Number 1, June 2013 Available online at www.math.ut.ee/acta/ On testing the equality of mean vectors in high dimension Muni S.

More information

3. (a) (8 points) There is more than one way to correctly express the null hypothesis in matrix form. One way to state the null hypothesis is

3. (a) (8 points) There is more than one way to correctly express the null hypothesis in matrix form. One way to state the null hypothesis is Stat 501 Solutions and Comments on Exam 1 Spring 005-4 0-4 1. (a) (5 points) Y ~ N, -1-4 34 (b) (5 points) X (X,X ) = (5,8) ~ N ( 11.5, 0.9375 ) 3 1 (c) (10 points, for each part) (i), (ii), and (v) are

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Canonical Edps/Soc 584 and Psych 594 Applied Multivariate Statistics Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Canonical Slide

More information

ABOUT PRINCIPAL COMPONENTS UNDER SINGULARITY

ABOUT PRINCIPAL COMPONENTS UNDER SINGULARITY ABOUT PRINCIPAL COMPONENTS UNDER SINGULARITY José A. Díaz-García and Raúl Alberto Pérez-Agamez Comunicación Técnica No I-05-11/08-09-005 (PE/CIMAT) About principal components under singularity José A.

More information

STA 2101/442 Assignment 2 1

STA 2101/442 Assignment 2 1 STA 2101/442 Assignment 2 1 These questions are practice for the midterm and final exam, and are not to be handed in. 1. A polling firm plans to ask a random sample of registered voters in Quebec whether

More information

Optimal primitive sets with restricted primes

Optimal primitive sets with restricted primes Optimal primitive sets with restricted primes arxiv:30.0948v [math.nt] 5 Jan 203 William D. Banks Department of Mathematics University of Missouri Columbia, MO 652 USA bankswd@missouri.edu Greg Martin

More information

Supplemental Material 1 for On Optimal Inference in the Linear IV Model

Supplemental Material 1 for On Optimal Inference in the Linear IV Model Supplemental Material 1 for On Optimal Inference in the Linear IV Model Donald W. K. Andrews Cowles Foundation for Research in Economics Yale University Vadim Marmer Vancouver School of Economics University

More information

Analysis of variance, multivariate (MANOVA)

Analysis of variance, multivariate (MANOVA) Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices

Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices Takahiro Nishiyama a,, Masashi Hyodo b, Takashi Seo a, Tatjana Pavlenko c a Department of Mathematical

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

T 2 Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem

T 2 Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem T Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem Toshiki aito a, Tamae Kawasaki b and Takashi Seo b a Department of Applied Mathematics, Graduate School

More information

EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large

EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large Tetsuji Tonda 1 Tomoyuki Nakagawa and Hirofumi Wakaki Last modified: March 30 016 1 Faculty of Management and Information

More information

Prepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti

Prepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Prepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Putra Malaysia Serdang Use in experiment, quasi-experiment

More information

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

Lecture 11. Multivariate Normal theory

Lecture 11. Multivariate Normal theory 10. Lecture 11. Multivariate Normal theory Lecture 11. Multivariate Normal theory 1 (1 1) 11. Multivariate Normal theory 11.1. Properties of means and covariances of vectors Properties of means and covariances

More information

Determinants of Partition Matrices

Determinants of Partition Matrices journal of number theory 56, 283297 (1996) article no. 0018 Determinants of Partition Matrices Georg Martin Reinhart Wellesley College Communicated by A. Hildebrand Received February 14, 1994; revised

More information

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y

More information

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 71. Decide in each case whether the hypothesis is simple

More information

Partitioned Covariance Matrices and Partial Correlations. Proposition 1 Let the (p + q) (p + q) covariance matrix C > 0 be partitioned as C = C11 C 12

Partitioned Covariance Matrices and Partial Correlations. Proposition 1 Let the (p + q) (p + q) covariance matrix C > 0 be partitioned as C = C11 C 12 Partitioned Covariance Matrices and Partial Correlations Proposition 1 Let the (p + q (p + q covariance matrix C > 0 be partitioned as ( C11 C C = 12 C 21 C 22 Then the symmetric matrix C > 0 has the following

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

THE EXPONENTIAL DISTRIBUTION ANALOG OF THE GRUBBS WEAVER METHOD

THE EXPONENTIAL DISTRIBUTION ANALOG OF THE GRUBBS WEAVER METHOD THE EXPONENTIAL DISTRIBUTION ANALOG OF THE GRUBBS WEAVER METHOD ANDREW V. SILLS AND CHARLES W. CHAMP Abstract. In Grubbs and Weaver (1947), the authors suggest a minimumvariance unbiased estimator for

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Multivariate Linear Models

Multivariate Linear Models Multivariate Linear Models Stanley Sawyer Washington University November 7, 2001 1. Introduction. Suppose that we have n observations, each of which has d components. For example, we may have d measurements

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

Parameter Estimation and Fitting to Data

Parameter Estimation and Fitting to Data Parameter Estimation and Fitting to Data Parameter estimation Maximum likelihood Least squares Goodness-of-fit Examples Elton S. Smith, Jefferson Lab 1 Parameter estimation Properties of estimators 3 An

More information

Experimental designs for multiple responses with different models

Experimental designs for multiple responses with different models Graduate Theses and Dissertations Graduate College 2015 Experimental designs for multiple responses with different models Wilmina Mary Marget Iowa State University Follow this and additional works at:

More information

Lecture 21. Hypothesis Testing II

Lecture 21. Hypothesis Testing II Lecture 21. Hypothesis Testing II December 7, 2011 In the previous lecture, we dened a few key concepts of hypothesis testing and introduced the framework for parametric hypothesis testing. In the parametric

More information

Near-exact approximations for the likelihood ratio test statistic for testing equality of several variance-covariance matrices

Near-exact approximations for the likelihood ratio test statistic for testing equality of several variance-covariance matrices Near-exact approximations for the likelihood ratio test statistic for testing euality of several variance-covariance matrices Carlos A. Coelho 1 Filipe J. Marues 1 Mathematics Department, Faculty of Sciences

More information