Tail probability of linear combinations of chi-square variables and its application to influence analysis in QTL detection

Satoshi Kuriki and Xiaoling Dou (Inst. Statist. Math., Tokyo)
ISM Cooperative Research Symposium: Extreme value theory and applications, Fri 27 July 2012

Contents of talk
1. Some theoretical results
2. Application to statistical genetics
3. Summary

1. Some theoretical results

Quadratic form of a Gaussian vector

Canonical form: $T = \sum_{i=1}^{n} a_i \xi_i^2$, $\xi_i \sim N(0,1)$ i.i.d. Note that the $a_i$'s are not necessarily positive, and some of the $a_i$'s may take the same value.

Our purpose is to obtain the tail probability $\bar F(x) = \Pr(T > x)$ as $x \to \infty$. We also propose a PP-plot based on this tail probability.

For the case where every distinct value among the $a_i$'s occurs an even number of times, i.e., $T$ is a linear combination of chi-square variables with 2 d.f., see Imhof (1961, Biometrika).
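
As a baseline for the tail probability just defined, here is a crude Monte Carlo estimate of $\bar F(x)$. It is not part of the talk, and `tail_prob_mc` is a hypothetical helper name; it simply simulates $T = \sum_i a_i \xi_i^2$ with a fixed seed and counts exceedances.

```python
import math
import random


def tail_prob_mc(x, a, n_sim=100_000, seed=1):
    """Crude Monte Carlo estimate of Pr(T > x) for T = sum_i a_i * xi_i**2, xi_i ~ N(0, 1) i.i.d."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    hits = 0
    for _ in range(n_sim):
        t = sum(ai * rng.gauss(0.0, 1.0) ** 2 for ai in a)
        if t > x:
            hits += 1
    return hits / n_sim


# Check: 0.5*xi_1^2 + 0.5*xi_2^2 is Exp(1), so Pr(T > 1) = e^{-1}.
estimate = tail_prob_mc(1.0, [0.5, 0.5])
print(f"Monte Carlo: {estimate:.4f}, exact: {math.exp(-1.0):.4f}")
```

For large $x$ the event $\{T > x\}$ is rare, so the hit count becomes tiny and the relative error of this estimator grows quickly; this is one practical reason to want an asymptotic formula for $\bar F(x)$.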

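The case where the $a_i$'s are positive

Proposition 1. Let $a_1 = \cdots = a_m > a_{m+1} \ge \cdots \ge a_n > 0$. Then
$$\bar F(x) = \Pr\Big(\sum_{i=1}^{n} a_i \xi_i^2 > x\Big) \sim \Pr\Big(\chi^2_m > \frac{x}{a_1}\Big) \prod_{i \ge m+1} \Big(1 - \frac{a_i}{a_1}\Big)^{-1/2} \quad (x \to \infty).$$
Note that
$$\Pr(\chi^2_m > x) \sim \frac{1}{2^{m/2-1}\Gamma(m/2)}\, x^{m/2-1} e^{-x/2} \quad (x \to \infty).$$
For $m = 1$, see, e.g., Beran (1975, Ann. Statist.).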
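
A minimal numerical sketch of Proposition 1, assuming stdlib Python only. The helper names `chi2_sf`, `chi2_sf_leading` and `tail_approx_prop1` are hypothetical. The exact chi-square tail uses the standard recursion $Q(s+1,t) = Q(s,t) + t^s e^{-t}/\Gamma(s+1)$ for the regularized upper incomplete gamma function, and ties among the $a_i$ are detected with a relative tolerance (an assumption of this sketch). The check uses $T = \chi^2_2 + 0.5\,\chi^2_2$, whose exact tail $2e^{-x/2} - e^{-x}$ follows from the sum of two independent exponentials.

```python
import math


def chi2_sf(y, m):
    """Exact Pr(chi^2_m > y) via Q(s+1, t) = Q(s, t) + t^s e^{-t} / Gamma(s+1), t = y/2."""
    if y <= 0:
        return 1.0
    t = y / 2.0
    # Start at m = 1 (Q(1/2, t) = erfc(sqrt t)) or m = 2 (Q(1, t) = e^{-t}).
    s, q = (0.5, math.erfc(math.sqrt(t))) if m % 2 else (1.0, math.exp(-t))
    while s < m / 2.0:
        q += math.exp(s * math.log(t) - t - math.lgamma(s + 1.0))
        s += 1.0
    return q


def chi2_sf_leading(y, m):
    """Leading term y^{m/2-1} e^{-y/2} / (2^{m/2-1} Gamma(m/2)) of Pr(chi^2_m > y)."""
    return y ** (m / 2 - 1) * math.exp(-y / 2) / (2 ** (m / 2 - 1) * math.gamma(m / 2))


def tail_approx_prop1(x, a, rel_tol=1e-12):
    """Pr(T > x) ~ Pr(chi^2_m > x/a_1) * prod_{i > m} (1 - a_i/a_1)^{-1/2}, all a_i > 0."""
    if min(a) <= 0:
        raise ValueError("Proposition 1 assumes all a_i > 0")
    a1 = max(a)
    # Multiplicity m of the largest coefficient (tolerance-based tie detection is an assumption).
    is_top = [math.isclose(ai, a1, rel_tol=rel_tol) for ai in a]
    m = sum(is_top)
    const = math.prod((1.0 - ai / a1) ** -0.5 for ai, top in zip(a, is_top) if not top)
    return chi2_sf(x / a1, m) * const


# Exact case: T = chi2_2 + 0.5*chi2_2 has Pr(T > x) = 2 e^{-x/2} - e^{-x}.
a = [1.0, 1.0, 0.5, 0.5]
for x in (5.0, 15.0, 30.0):
    exact = 2 * math.exp(-x / 2) - math.exp(-x)
    print(x, tail_approx_prop1(x, a) / exact)  # ratio tends to 1
```

In this example the ratio equals $1/(1 - e^{-x/2}/2)$, so the relative error of the approximation decays like $e^{-x/2}$.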

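An intuitive explanation of Proposition 1

For simplicity, let $a_1 = \cdots = a_m\ (=1) > a_{m+1} \ge \cdots \ge a_n > 0$. We want to prove
$$\bar F(x) \sim \frac{C}{\Gamma(m/2)}\, x^{m/2-1} e^{-x/2} \quad (x \to \infty), \qquad C = \frac{1}{2^{m/2-1}} \prod_{i \ge m+1} (1 - a_i)^{-1/2},$$
or equivalently
$$e^{x/2}\bar F(x) \sim \frac{C}{\Gamma(m/2)}\, x^{m/2-1} \quad (x \to \infty).$$
Since $\int_0^\infty e^{-sx} x^{m/2-1}\,dx = \Gamma(m/2)\, s^{-m/2}$, by a Tauberian theorem it suffices to show that
$$\int_0^\infty e^{-sx} e^{x/2} \bar F(x)\,dx \sim C s^{-m/2} \quad (s \to 0),$$
provided that the regularity condition (ultimate monotonicity) is ensured.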

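An intuitive explanation of Proposition 1 (contd)

By integration by parts,
$$\text{LHS} = \int_0^\infty e^{-(s-1/2)x}\bar F(x)\,dx = \frac{1}{s-1/2}\Big\{\Big[-e^{-(s-1/2)x}\bar F(x)\Big]_0^\infty + \int_0^\infty e^{-(s-1/2)x}\,d\bar F(x)\Big\} = \frac{1-\varphi(s-1/2)}{s-1/2},$$
where $\varphi(s) = \int_0^\infty e^{-sx}\,dF(x)$ (the boundary term vanishes at $\infty$ for $s > 0$, and $\bar F(0) = 1$). Actually, in our case,
$$\varphi(s) = E\Big[e^{-s\sum_i a_i \xi_i^2}\Big] = \prod_{i=1}^{n} (1 + 2 s a_i)^{-1/2},$$
and
$$\text{LHS} = \frac{1 - \{1 + 2(s-1/2)\}^{-m/2} \prod_{i \ge m+1} \{1 + 2(s-1/2) a_i\}^{-1/2}}{s - 1/2} \sim \frac{s^{-m/2}}{2^{m/2-1}} \prod_{i \ge m+1} (1 - a_i)^{-1/2} = C s^{-m/2} = \text{RHS} \quad (s \to 0).$$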
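
The last display can be checked numerically. This sketch (hypothetical helpers `laplace_lhs` and `laplace_rhs`, with $a_1 = \cdots = a_m = 1$ as in the simplified setting) evaluates $(1 - \varphi(s - 1/2))/(s - 1/2)$ with $\varphi(u) = \prod_i (1 + 2ua_i)^{-1/2}$ and compares it with $C s^{-m/2}$. It only confirms the Laplace-transform asymptotics; it says nothing about the Tauberian regularity condition.

```python
import math


def laplace_lhs(s, m, a_rest):
    """(1 - phi(s - 1/2)) / (s - 1/2), phi(u) = prod_i (1 + 2 u a_i)^{-1/2}, a_1 = ... = a_m = 1."""
    u = s - 0.5
    phi = (1.0 + 2.0 * u) ** (-m / 2) * math.prod((1.0 + 2.0 * u * ai) ** -0.5 for ai in a_rest)
    return (1.0 - phi) / u


def laplace_rhs(s, m, a_rest):
    """C s^{-m/2} with C = 2^{-(m/2 - 1)} prod_{i > m} (1 - a_i)^{-1/2}."""
    C = 2.0 ** (-(m / 2 - 1)) * math.prod((1.0 - ai) ** -0.5 for ai in a_rest)
    return C * s ** (-m / 2)


m, a_rest = 3, [0.5, 0.2]  # the remaining a_i must lie in (0, 1)
for s in (1e-1, 1e-3, 1e-6):
    print(s, laplace_lhs(s, m, a_rest) / laplace_rhs(s, m, a_rest))  # ratio tends to 1
```

For $m = 1$ the constant 1 in the numerator is only negligible relative to $\varphi(s - 1/2) \approx (2s)^{-1/2}$, so much smaller $s$ is needed before the ratio settles near 1.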

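An approach to prove Proposition 1

Recall that $T = \sum_{i=1}^{n} a_i \xi_i^2$, $\xi_i \sim N(0,1)$ i.i.d. Define a Gaussian process on $S^{n-1}$ (the set of unit vectors in $\mathbb{R}^n$) by
$$Z(h) = \sum_{i=1}^{n} h_i \sqrt{a_i}\, \xi_i, \qquad h = (h_i) \in S^{n-1}.$$
Then, by the Cauchy-Schwarz inequality, $\max_{h \in S^{n-1}} Z(h) = \sqrt{T}$, so that $\bar F(x) = \Pr(\max_h Z(h) > \sqrt{x})$. Hence various methods for approximating the tail probability of the maximum of a Gaussian process are applicable.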
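
The identity $\max_h Z(h) = \sqrt{T}$ can be checked directly: with $v_i = \sqrt{a_i}\,\xi_i$, the maximiser is $h^* = v/\lVert v \rVert$ and the maximum is $\lVert v \rVert$. The sketch below (hypothetical helper `max_over_sphere`, all $a_i > 0$ assumed) verifies this for one draw of $\xi$.

```python
import math
import random


def max_over_sphere(a, xi):
    """Maximiser h* and maximum of Z(h) = sum_i h_i sqrt(a_i) xi_i over unit vectors h (a_i > 0)."""
    v = [math.sqrt(ai) * x for ai, x in zip(a, xi)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    h_star = [vi / norm for vi in v]  # Cauchy-Schwarz: the maximiser is v / |v|
    return h_star, sum(h * vi for h, vi in zip(h_star, v))


rng = random.Random(0)
a = [3.0, 1.0, 0.5]
xi = [rng.gauss(0.0, 1.0) for _ in a]
T = sum(ai * x * x for ai, x in zip(a, xi))
h_star, z_max = max_over_sphere(a, xi)
print(z_max, math.sqrt(T))  # equal up to rounding
```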

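An approach to prove Proposition 1 (contd)

One approach is the Euler-characteristic heuristic (volume-of-tube method):
$$\Pr\Big(\max_{h \in S^{n-1}} Z(h) \ge x\Big) \approx E[\chi(A_x)] \quad (x \to \infty),$$
where $A_x = \{h \in S^{n-1} : Z(h) \ge x\}$ is the excursion set and $\chi(\cdot)$ is the Euler characteristic. Thanks to Morse's theorem (see, e.g., Worsley, 1995; Kuriki & Takemura, 2009),
$$E[\chi(A_x)] = \int_{S^{n-1}} E\Big[\mathbf{1}(Z(h) \ge x)\, \det\big(-\ddot Z(h)\big) \,\Big|\, \dot Z(h) = 0\Big]\, \theta(0)\, ds^{n-1}(h),$$
where $\theta(0)$ is the density function of $\dot Z(h)$ evaluated at $\dot Z(h) = 0$, and $ds^{n-1}(h)$ is the volume element of $S^{n-1}$. Details are omitted.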

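The case where the $a_i$'s are not necessarily positive

Proposition 2. Let $a_1 = \cdots = a_m > a_{m+1} \ge \cdots \ge a_n$ with $a_1 > 0$ (the remaining $a_i$'s may be negative). Then
$$\bar F(x) = \Pr\Big(\sum_{i=1}^{n} a_i \xi_i^2 > x\Big) \sim \Pr\Big(\chi^2_m > \frac{x}{a_1}\Big) \prod_{i \ge m+1} \Big(1 - \frac{a_i}{a_1}\Big)^{-1/2} \quad (x \to \infty).$$
This is of the same form as Proposition 1.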

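Proof of Proposition 2

Assume that $a_1 = \cdots = a_m > a_{m+1} \ge \cdots > 0 > \cdots \ge -b_{m'+1} > -b_{m'} = \cdots = -b_1$, i.e., the negative coefficients are written as $-b_j$ with $b_j > 0$. Let
$$T = \sum_i a_i \xi_i^2 - \sum_j b_j \eta_j^2 =: Y - Z,$$
where the $\xi_i$ and $\eta_j$ are i.i.d. $N(0,1)$. We evaluate
$$\bar F(x) = \Pr(T > x) = E_Z\big[\Pr(Y - Z > x \mid Z)\big] = E_Z\big[\bar F_Y(x + Z)\big], \qquad \bar F_Y(x) = \Pr(Y > x).$$
Lemma. Let $Z$ be a nonnegative random variable. If $\bar F_1(x) \sim \bar F_2(x)$ $(x \to \infty)$, then $E_Z[\bar F_1(x + Z)] \sim E_Z[\bar F_2(x + Z)]$.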

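Proof of Proposition 2 (contd)

By Proposition 1,
$$\bar F_Y(x) \sim D_m\, x^{m/2-1} e^{-x/(2a_1)}, \qquad D_m = \frac{1}{(2a_1)^{m/2-1}\Gamma(m/2)} \prod_{i \ge m+1} \Big(1 - \frac{a_i}{a_1}\Big)^{-1/2}.$$
Applying the Lemma together with this result, we have
$$\bar F(x) \sim D_m e^{-x/(2a_1)}\, E_Z\Big[(x + Z)^{m/2-1} e^{-Z/(2a_1)}\Big] \sim D_m e^{-x/(2a_1)} x^{m/2-1}\, E_Z\Big[e^{-Z/(2a_1)}\Big] \sim \Pr\Big(\chi^2_m > \frac{x}{a_1}\Big) \prod_{i \ge m+1} \Big(1 - \frac{a_i}{a_1}\Big)^{-1/2} \prod_j \Big(1 + \frac{b_j}{a_1}\Big)^{-1/2},$$
since $E_Z[e^{-Z/(2a_1)}] = \prod_j (1 + b_j/a_1)^{-1/2}$. Writing the negative coefficients as $a_i = -b_j$, this is the formula of Proposition 2.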
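
The representation $\bar F(x) = E_Z[\bar F_Y(x + Z)]$ used in this proof also gives a conditional Monte Carlo estimator: simulate only $Z$ and average the exact tail of $Y$. The sketch below assumes a hypothetical $Y = a_1\chi^2_2 + a_2\chi^2_2$ with $a_1 > a_2 > 0$ (so $m = 2$ and $\bar F_Y$ is a closed-form sum of two exponentials) and compares the estimate with the Proposition 2 approximation. All function names are illustrative.

```python
import math
import random


def tail_Y(y, a1, a2):
    """Exact Pr(Y > y), y >= 0, for Y = a1*chi2_2 + a2*chi2_2 with a1 > a2 > 0."""
    return (a1 * math.exp(-y / (2 * a1)) - a2 * math.exp(-y / (2 * a2))) / (a1 - a2)


def tail_T_conditional(x, a1, a2, b, n_sim=50_000, seed=2):
    """Estimate Pr(Y - Z > x) = E_Z[Pr(Y > x + Z)] by averaging the exact Y-tail over simulated Z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        z = sum(bj * rng.gauss(0.0, 1.0) ** 2 for bj in b)  # Z = sum_j b_j eta_j^2 >= 0
        total += tail_Y(x + z, a1, a2)
    return total / n_sim


def tail_T_prop2(x, a1, a2, b):
    """Proposition 2 with m = 2: Pr(chi2_2 > x/a1) (1 - a2/a1)^{-1} prod_j (1 + b_j/a1)^{-1/2}."""
    return (math.exp(-x / (2 * a1)) / (1 - a2 / a1)
            * math.prod((1 + bj / a1) ** -0.5 for bj in b))


a1, a2, b = 1.0, 0.4, [0.7, 0.3]
for x in (5.0, 20.0, 40.0):
    print(x, tail_T_conditional(x, a1, a2, b) / tail_T_prop2(x, a1, a2, b))  # tends to 1
```

Because only $Z$ is simulated, the estimator never has to observe the rare event $\{T > x\}$ itself, so it stays accurate at $x = 40$, where the tail probability is of order $10^{-9}$ and crude simulation would see essentially no hits.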

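Example

Double exponential distribution: $f(x) = \frac{1}{2} e^{-|x|}$,
$$\bar F(x) = \begin{cases} \frac{1}{2} e^{-x} & (x \ge 0), \\ 1 - \frac{1}{2} e^{x} & (x < 0). \end{cases}$$
On the other hand, $T = Y - Z$ with $Y, Z \sim \mathrm{Exp}(1)$ i.i.d., and
$$T = \sum_{i=1}^{4} a_i \xi_i^2, \qquad (a_i) = \Big(\frac{1}{2}, \frac{1}{2}, -\frac{1}{2}, -\frac{1}{2}\Big).$$
Here $m = 2$ and $\prod_{i=3}^{4} (1 - a_i/a_1)^{-1/2} = 2^{-1}$, so Proposition 2 gives
$$\bar F(x) \sim \Pr\Big(\chi^2_2 > \frac{x}{1/2}\Big) \cdot \frac{1}{2} = \frac{1}{2} e^{-x} \quad (x \to \infty),$$
which agrees with the exact tail.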
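
This example can be reproduced numerically. The sketch evaluates the Proposition 2 formula in the special case $m = 2$, where $\Pr(\chi^2_2 > x/a_1) = e^{-x/(2a_1)}$ in closed form (`prop2_tail_m2` is a hypothetical name), and also simulates the quadratic form directly.

```python
import math
import random


def prop2_tail_m2(x, a):
    """Proposition 2 when the largest coefficient a_1 > 0 appears exactly twice (m = 2):
    Pr(chi2_2 > x/a_1) * prod over the other a_i of (1 - a_i/a_1)^{-1/2}."""
    a = list(a)
    a1 = max(a)
    if a1 <= 0 or a.count(a1) != 2:
        raise ValueError("this sketch assumes a_1 > 0 with multiplicity exactly 2")
    rest = [ai for ai in a if ai != a1]
    return math.exp(-x / (2 * a1)) * math.prod((1 - ai / a1) ** -0.5 for ai in rest)


a = [0.5, 0.5, -0.5, -0.5]
for x in (0.5, 2.0, 5.0):
    print(x, prop2_tail_m2(x, a), 0.5 * math.exp(-x))  # agree up to rounding

# Direct simulation of T = sum_i a_i xi_i^2 at x = 1.
rng = random.Random(3)
n_sim = 100_000
hits = sum(1 for _ in range(n_sim) if sum(ai * rng.gauss(0.0, 1.0) ** 2 for ai in a) > 1.0)
mc = hits / n_sim
print("MC Pr(T > 1):", mc, "exact:", 0.5 * math.exp(-1.0))
```

Here the asymptotic formula is exact for every $x \ge 0$, because $Y \sim \mathrm{Exp}(1)$ is memoryless and the conditioning step of the proof involves no approximation.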

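PP-plot

Let $X_1, \dots, X_N \sim \mathcal{L}\big(\sum_i \lambda_i \xi_i^2\big)$ i.i.d., where $\mathcal{L}(\cdot)$ denotes the law. Assume that $\lambda_{\max} = \max_i \lambda_i > 0 > \lambda_{\min} = \min_i \lambda_i$, and that the multiplicities of $\lambda_{\max}$ and $\lambda_{\min}$ are both 1. Denote the order statistics by $X_{(1)} < \cdots < X_{(N)}$.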

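PP-plot (contd)

The PP-plot consists of the points
$$\Big(\log \bar G_1\Big(\frac{X_{(i)}}{\lambda_{\max}}\Big) - \frac{1}{2}\sum_{k \ne k_{\max}} \log\Big(1 - \frac{\lambda_k}{\lambda_{\max}}\Big),\ \log\Big(1 - \frac{i}{N+1}\Big)\Big) \quad \text{for } i \text{ such that } X_{(i)} > 0,$$
$$\Big(\log \bar G_1\Big(\frac{X_{(i)}}{\lambda_{\min}}\Big) - \frac{1}{2}\sum_{k \ne k_{\min}} \log\Big(1 - \frac{\lambda_k}{\lambda_{\min}}\Big),\ \log \frac{i}{N+1}\Big) \quad \text{for } i \text{ such that } X_{(i)} < 0,$$
where $\bar G_1(x) = \Pr(\chi^2_1 > x)$, and $k_{\max}$, $k_{\min}$ are the indices of $\lambda_{\max}$, $\lambda_{\min}$. In each point, the first coordinate is the log of the tail approximation of Proposition 2 (upper tail for $X_{(i)} > 0$, lower tail for $X_{(i)} < 0$), and the second is the log of the corresponding empirical tail probability.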
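
A sketch of how the PP-plot coordinates could be computed, assuming stdlib Python and $\bar G_1(y) = \operatorname{erfc}(\sqrt{y/2})$. The function `pp_points` is hypothetical, and using the empirical lower-tail probability $i/(N+1)$ for negative order statistics is this sketch's assumption. Each first coordinate is the log of Proposition 2 with $m = 1$, applied to $T$ for the upper tail and to $-T$ for the lower tail.

```python
import math
import random


def g1_bar(y):
    """Pr(chi^2_1 > y) = erfc(sqrt(y/2)) for y >= 0 (assumes no underflow to 0 for extreme y)."""
    return math.erfc(math.sqrt(y / 2.0))


def pp_points(samples, lam):
    """PP-plot points (log approximate tail, log empirical tail) for T = sum_k lam_k xi_k^2.

    Assumes lam_max > 0 > lam_min, each with multiplicity 1.
    """
    lam_max, lam_min = max(lam), min(lam)
    # log of prod_{k != k_max} (1 - lam_k/lam_max)^{-1/2}, and likewise for lam_min
    c_max = -0.5 * sum(math.log(1 - lk / lam_max) for lk in lam if lk != lam_max)
    c_min = -0.5 * sum(math.log(1 - lk / lam_min) for lk in lam if lk != lam_min)
    xs = sorted(samples)
    n = len(xs)
    upper, lower = [], []
    for i, x in enumerate(xs, start=1):
        if x > 0:  # upper tail Pr(T > x) against 1 - i/(N+1)
            upper.append((math.log(g1_bar(x / lam_max)) + c_max, math.log(1 - i / (n + 1))))
        elif x < 0:  # lower tail Pr(T < x) against i/(N+1)
            lower.append((math.log(g1_bar(x / lam_min)) + c_min, math.log(i / (n + 1))))
    return upper, lower


# Simulated data from the model itself: in the tails the points should lie near the diagonal.
rng = random.Random(4)
lam = [1.0, 0.3, -0.6]
samples = [sum(lk * rng.gauss(0.0, 1.0) ** 2 for lk in lam) for _ in range(2000)]
upper, lower = pp_points(samples, lam)
print(upper[-3:], lower[:3])
```

Plotting on the log scale stretches out the extreme order statistics, which is where an influential observation would show up as a departure from the diagonal.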

2. Application to influence analysis in QTL detection

What is QTL analysis?

Data on $N$ individuals (e.g., mice): individual $i$ ($i = 1, \dots, N$) has a phenotype $y_i$ and genotypes $z_{i1}, \dots, z_{iM}$ at $M$ loci.
Phenotype $y_i$: the measurement of the feature of interest for individual $i$.
Genotype $z_{ij}$: the type of gene at locus $j$ of individual $i$.
Purpose of the analysis: to identify the loci $j$ for which $z_{ij}$ is highly correlated with $y_i$. Such a locus $j$ is called a QTL (quantitative trait locus).

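LOD score

[Figure: LOD score plotted against test positions on the chromosomes.]

$H_j$ (QTL at $j$): $y_i \sim N(\mu + \alpha z_{ij}, \sigma^2)$; $H_0$ (no QTL): $y_i \sim N(\mu, \sigma^2)$.
$$\mathrm{LOD}(j) = \text{const} \times \log \frac{\hat\sigma^2(H_0)}{\hat\sigma^2(H_j)} \quad (\text{LRT of } H_0 \text{ vs. } H_j)$$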
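
A minimal sketch of $\mathrm{LOD}(j)$ for a single locus, assuming the conventional choice const $= N/2$ with base-10 logarithms (the slide only writes "const") and maximum-likelihood variances $\hat\sigma^2 = \mathrm{RSS}/N$. The function name `lod_score` and the toy data are hypothetical.

```python
import math


def lod_score(y, z):
    """LOD for H_j: y_i ~ N(mu + alpha z_i, sigma^2) vs H_0: y_i ~ N(mu, sigma^2).

    Assumes the constant N/2 and log base 10: LOD = (N/2) log10(sigma0_hat^2 / sigmaj_hat^2).
    """
    n = len(y)
    ybar, zbar = sum(y) / n, sum(z) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    alpha = szy / szz          # ML (= least-squares) slope under H_j
    mu = ybar - alpha * zbar   # ML intercept under H_j
    s0 = sum((yi - ybar) ** 2 for yi in y) / n                         # sigma_hat^2(H_0)
    sj = sum((yi - mu - alpha * zi) ** 2 for yi, zi in zip(y, z)) / n  # sigma_hat^2(H_j)
    return n / 2 * math.log10(s0 / sj)


# Hypothetical toy data: 4 individuals, genotype at one locus coded 0/1.
y = [0.0, 1.0, 2.0, 4.0]
z = [0, 0, 1, 1]
print(lod_score(y, z))  # equals 2 * log10(3.5), about 1.088
```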

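Influence function

Empirical influence function (EIF) of $\mathrm{LOD}(j)$ for individual $i$ (Dou et al., 2012):
$$\mathrm{EIF}_i(j) = \text{const} \times \Big\{\frac{\hat\varepsilon_i(H_0)^2}{\hat\sigma^2(H_0)} - \frac{\hat\varepsilon_i(H_j)^2}{\hat\sigma^2(H_j)}\Big\},$$
where $\hat\varepsilon_i(H_j) = y_i - \hat\mu(H_j) - \hat\alpha(H_j) z_{ij}$ and $\hat\varepsilon_i(H_0) = y_i - \hat\mu(H_0)$ are the residuals under $H_j$ and $H_0$.

The EIF of the weighted LOD score $\sum_{j \in J} c_j \mathrm{LOD}(j)$ is $\sum_{j \in J} c_j \mathrm{EIF}_i(j)$ (weighted EIF). It can be used to detect individuals that affect the shape of the LOD score specified by the coefficients $(c_j)$.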
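
A sketch of the EIF and weighted EIF formulas above, assuming stdlib Python, ML fits under $H_0$ and $H_j$, and a placeholder const $= 1$ (the slide does not give the constant). All function names are hypothetical. With ML variances, $\sum_i \hat\varepsilon_i^2/\hat\sigma^2 = N$ under both hypotheses, so the EIF values over all individuals sum to zero; the test uses this as a sanity check.

```python
import math


def fit_residuals(y, z=None):
    """Residuals and ML variance sigma_hat^2 under H_0 (z is None) or H_j (y regressed on z)."""
    n = len(y)
    ybar = sum(y) / n
    if z is None:
        res = [yi - ybar for yi in y]
    else:
        zbar = sum(z) / n
        alpha = (sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
                 / sum((zi - zbar) ** 2 for zi in z))
        mu = ybar - alpha * zbar
        res = [yi - mu - alpha * zi for yi, zi in zip(y, z)]
    return res, sum(r * r for r in res) / n


def eif(y, z, const=1.0):
    """EIF_i(j) = const * (e_i(H0)^2 / s2(H0) - e_i(Hj)^2 / s2(Hj)) for every individual i."""
    e0, s0 = fit_residuals(y)
    ej, sj = fit_residuals(y, z)
    return [const * (r0 * r0 / s0 - rj * rj / sj) for r0, rj in zip(e0, ej)]


def weighted_eif(y, loci, c, const=1.0):
    """sum_{j in J} c_j EIF_i(j); loci[k] is the genotype column for the k-th locus in J."""
    per_locus = [eif(y, z, const) for z in loci]
    return [sum(cj * col[i] for cj, col in zip(c, per_locus)) for i in range(len(y))]


# Hypothetical toy data: 4 individuals, two loci with 0/1 genotype coding.
y = [0.0, 1.0, 2.0, 4.0]
loci = [[0, 0, 1, 1], [0, 1, 0, 1]]
print(eif(y, loci[0]))  # the first entry is 1.4 - 0.4 = 1.0 up to floating-point rounding
print(weighted_eif(y, loci, c=[1.0, -0.5]))
```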

Influence function (contd)

$(c_j) = (1.04, .356, 1.314)$

[Figure: weighted EIF plotted against individual index.]

We want to determine whether mouse No. 60 is influential.

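Influence function (contd)

Approximation: suppose that in
$$\mathrm{EIF}_i(j) = \text{const} \times \Big\{\frac{\hat\varepsilon_i(H_0)^2}{\hat\sigma^2(H_0)} - \frac{\hat\varepsilon_i(H_j)^2}{\hat\sigma^2(H_j)}\Big\},$$
$\hat\varepsilon_i(H_j)$ and $\hat\varepsilon_i(H_0)$ are Gaussian random variables, and $\hat\sigma^2(H_j)$, $\hat\sigma^2(H_0)$ are constants. Then
$$\sum_{j \in J} c_j \mathrm{EIF}_i(j) \overset{d}{=} \sum_{j \in J} a_j \xi_j^2, \qquad \xi_j \sim N(0,1) \text{ i.i.d.},$$
with $(a_j) = (16.143, .69, \ldots)$.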
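
The slide does not say how $(a_j)$ was computed. One standard route, sketched here under the stated approximation: the weighted EIF is $\varepsilon^\top A \varepsilon$ for the Gaussian vector $\varepsilon = (\hat\varepsilon_i(H_0), \hat\varepsilon_i(H_j), j \in J)$ with diagonal $A = \mathrm{diag}(\text{const}\sum_j c_j/\hat\sigma^2(H_0), -\text{const}\,c_j/\hat\sigma^2(H_j))$. If $\varepsilon \sim N(0, \Sigma)$ with $\Sigma = LL^\top$, then $\varepsilon^\top A \varepsilon = \xi^\top (L^\top A L)\xi$, so the coefficients are the eigenvalues of $L^\top A L$. The covariance $\Sigma$ and all numbers below are hypothetical, and the stdlib Cholesky and cyclic Jacobi routines are illustrative stand-ins for a linear-algebra library.

```python
import math


def cholesky(S):
    """Lower-triangular L with S = L L^T (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(S[i][i] - s) if i == j else (S[i][j] - s) / L[j][j]
    return L


def jacobi_eigenvalues(S, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, in ascending order."""
    n = len(S)
    A = [row[:] for row in S]
    for _ in range(max_sweeps):
        if sum(A[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q); this zeroes A[p][q]
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))


def quadratic_form_coefficients(Sigma, A_diag):
    """a_k with eps^T diag(A_diag) eps =d sum_k a_k xi_k^2 when eps ~ N(0, Sigma)."""
    L = cholesky(Sigma)
    n = len(Sigma)
    # M = L^T A L; writing eps = L xi with xi ~ N(0, I) gives eps^T A eps = xi^T M xi.
    M = [[sum(L[k][i] * A_diag[k] * L[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return jacobi_eigenvalues(M)


# Hypothetical setting: |J| = 2 loci, eps = (e(H0), e(H1), e(H2)).
const, c = 1.0, [1.0, -0.5]
s2 = [2.0, 1.5, 1.8]  # sigma_hat^2(H0), sigma_hat^2(H1), sigma_hat^2(H2), treated as constants
Sigma = [[2.0, 1.2, 1.0], [1.2, 1.5, 0.9], [1.0, 0.9, 1.8]]  # assumed covariance of eps
A_diag = [const * sum(c) / s2[0]] + [-const * cj / s for cj, s in zip(c, s2[1:])]
print(quadratic_form_coefficients(Sigma, A_diag))
```

The resulting coefficients can be fed directly into the PP-plot construction, since they play the role of the $\lambda_k$ there.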

PP-plot

[Figure: PP-plot (axes q.x and q.y).]

The PP-plot suggests that mouse No. 60 is influential.

Concluding remarks

In Propositions 1 and 2, we provided an upper tail probability formula for a linear combination of chi-square random variables (a quadratic form of a Gaussian vector). We applied the PP-plot to influence analysis in QTL detection. We want to extend our result to the case where the number $n$ of terms in $T$ is infinite.

Acknowledgment: The authors thank Hsien-Kuei Hwang for his comments on the original version of the slides.
