Sliced Inverse Moment Regression Using Weighted Chi-Squared Tests for Dimension Reduction


Zhishen Ye^a, Jie Yang^{*,b,1}

^a Amgen Inc., Thousand Oaks, CA 91320-1799, USA
^b Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA

Abstract

We propose a new method for dimension reduction in regression using the first two inverse moments, and we develop corresponding weighted chi-squared tests for the dimension of the regression. The proposed method considers linear combinations of Sliced Inverse Regression (SIR) and a method based on a new candidate matrix designed to recover the entire inverse second moment subspace. The optimal combination may be selected based on the p-values derived from the dimension tests. Theoretically, the proposed method, like Sliced Average Variance Estimate (SAVE), is more capable of recovering the complete central dimension reduction subspace than SIR and Principal Hessian Directions (pHd). It can therefore substitute for SIR, pHd, SAVE, or any linear combination of them at a theoretical level. A simulation study indicates that the proposed method may have consistently greater power than SIR, pHd, and SAVE.

* Corresponding author at: Department of Mathematics, Statistics, and Computer Science (MC 249), University of Illinois at Chicago, 851 South Morgan Street, SEO 322, Chicago, Illinois 60607, USA. Tel.: ; fax: . E-mail address: jyang06@math.uic.edu (J. Yang).
1 The authors thank Robert Weiss for comments on an earlier draft.

Preprint submitted to Elsevier, February 11, 2009

Key words: Dimension reduction in regression, pHd, SAVE, SIMR, SIR, weighted chi-squared test

1. Introduction

The purpose of the regression of a univariate response y on a p-dimensional predictor vector x is to make inference about the conditional distribution of y | x. Following Cook (1998b), the nature of the problem remains invariant if x is replaced by its standardized version

\[ z = \Sigma_x^{-1/2}(x - \mu_x), \tag{1} \]

where µ_x denotes the population mean of x and Σ_x denotes the corresponding population covariance matrix, assumed nonsingular. (A short computational sketch of (1) appears below.) To reduce the complexity that accompanies a high-dimensional predictor, we focus on replacing the p-dimensional vector z by a d-dimensional vector γ'z without specifying any parametric model and without losing any information for predicting y, where γ is a p × d (d ≤ p) matrix and γ' is the transpose of γ. This procedure is known as dimension reduction in regression. By definition, such a γ always exists; for example, γ can be the p × p identity matrix I_p. For our purposes, we are interested in those γ's with d < p. The smallest applicable d is called the dimension of the regression; see Section 2 for more details.

Based on the inverse mean E(z | y), Li (1991a) proposed Sliced Inverse Regression (SIR) for dimension reduction in regression. It has been recognized that SIR cannot recover symmetric dependencies (Li, 1991b; Cook and Weisberg, 1991). After SIR, many dimension reduction methods have been introduced.
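For concreteness, the standardization in (1) with the usual sample estimates can be coded as follows. This is a minimal numpy sketch (not part of the original paper), assuming a data matrix `x` with n rows and p columns and a nonsingular sample covariance:

```python
import numpy as np

def standardize(x):
    """Return z_i = Sigma_x^{-1/2} (x_i - mu_x) using the usual sample estimates."""
    mu_hat = x.mean(axis=0)
    sigma_hat = np.cov(x, rowvar=False)
    # inverse square root via the spectral decomposition of the sample covariance
    vals, vecs = np.linalg.eigh(sigma_hat)
    root_inv = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return (x - mu_hat) @ root_inv
```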

Sliced Average Variance Estimate (SAVE), proposed by Cook and Weisberg (1991), and Principal Hessian Directions (pHd), proposed by Li (1992), are two other popular ones. Both pHd and SAVE involve the second inverse moment, centered or non-centered. Compared with SAVE, pHd cannot detect certain dependencies hidden in the second moment (Yin and Cook, 2002; Ye and Weiss, 2003) or linear dependencies (Li, 1992; Cook, 1998a). Among the dimension reduction methods using only the first two inverse moments, SAVE seems to be the preferred one.

Nevertheless, SAVE is not always the winner. For example, Ye and Weiss (2003) implied that a linear combination of SIR and pHd may perform better than SAVE in some cases. This is not surprising, since Li (1991b) had already suggested that a suitable combination of two different methods might sharpen the dimension reduction results. Ye and Weiss (2003) further proposed that a bootstrap method could be used to pick the best linear combination of two known methods in terms of the variability of the estimators, although lower variability under the bootstrap procedure does not necessarily lead to a better estimator.

The purpose of this article is to develop a class of dimension reduction methods using only the first two inverse moments which covers SIR, pHd, SAVE, and their linear combinations as well. Large-sample tests for the dimension of the regression and a convenient criterion for choosing a suitable candidate from the class are desirable. In Section 2, we review the necessary dimension reduction context. In Section 3, we introduce a new candidate matrix M_{zz'|y} which targets the entire inverse second moment subspace; it is in fact the candidate matrix of an intermediate method between pHd and SAVE. In Section 4, we propose a new class of dimension reduction methods

called Sliced Inverse Moment Regression (SIMR), along with corresponding weighted chi-squared tests for the dimension of the regression. Theoretically, SIMR can substitute for SIR, pHd, SAVE, and their linear combinations. In Section 5, we use SIMR to analyze a simulated example and illustrate how to choose a good candidate within SIMR with the aid of the p-values derived from the weighted chi-squared tests. A simulation study shows that the chosen one may have consistently greater power than SIR, pHd, and SAVE. In Section 6, a real example is used to illustrate how the proposed method works. It suggests that a class of dimension reduction methods, along with a reasonable criterion for choosing a suitable member, is preferable in practice to any single method. We conclude this article with a discussion and proofs of the results presented.

2. Dimension Reduction Context

2.1. Central Dimension Reduction Subspace (CDRS)

Following Cook (1998b), let γ denote a p × d matrix such that

\[ y \perp\!\!\!\perp z \mid \gamma' z, \tag{2} \]

where ⊥⊥ indicates independence. Such a γ always exists, since (2) is true for γ = I_p. Given such a γ, (2) remains true if the columns of γ are replaced by any basis of Span{γ}, the linear space spanned by the columns of γ. Thus (2) is actually a statement about the space Span{γ}, which is called a dimension reduction subspace for the regression of y on z. Cook (1994b, 1996) introduced the notion of the central dimension reduction subspace (CDRS), denoted by S_{y|z}, which is the intersection of all dimension

reduction subspaces. The CDRS S_{y|z} is itself a dimension reduction subspace under fairly weak restrictions on the joint distribution of y and x (Cook, 1994b, 1996). In this article, we always assume that S_{y|z} is a dimension reduction subspace and that the columns of γ form an orthonormal basis of S_{y|z}.

In practice, we usually first transform the original data {x_i} into their standardized versions {z_i} by replacing Σ_x and µ_x in (1) with their usual sample estimates Σ̂_x and µ̂_x. We can then estimate S_{y|x} by Ŝ_{y|x} = Σ̂_x^{-1/2} Ŝ_{y|z}, where Ŝ_{y|z} is an estimate of S_{y|z} (see the short sketch below). Therefore, the goal of dimension reduction in regression is to find the dimension of the regression d and the CDRS S_{y|z} = Span{γ}.
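The back-transformation Ŝ_{y|x} = Σ̂_x^{-1/2}Ŝ_{y|z} is a one-liner in code. The following sketch (our own illustration, not the authors' software) assumes a numpy matrix `gamma_z_hat` whose columns span the estimated CDRS on the z scale:

```python
import numpy as np

def to_x_scale(gamma_z_hat, sigma_hat):
    """Map a basis of S_{y|z} back to the x scale: S_{y|x} = Sigma_x^{-1/2} S_{y|z}."""
    vals, vecs = np.linalg.eigh(sigma_hat)
    root_inv = vecs @ np.diag(vals ** -0.5) @ vecs.T   # Sigma_x^{-1/2}
    return root_inv @ gamma_z_hat
```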

2.2. Linearity Condition and Constant Covariance Condition

Most current dimension reduction methods, including SIR, pHd, and SAVE, require the following condition on the distribution of z:

Condition 1. Linearity Condition (Li, 1991a; Cook, 1998b): E(z | γ'z) = P_γ z, where P_γ = γγ'.

In addition, pHd and SAVE, which use the second inverse moment, also require:

Condition 2. Constant Covariance Condition (Cook, 1998b; Shao et al., 2007): Var(z | γ'z) = Q_γ, where Q_γ = I − P_γ.

These two conditions hold if z is normally distributed, although normality is not necessary. Both conditions are assumed throughout the rest of the article.

2.3. Candidate Matrix

To separate the underlying matrices from the existing dimension reduction methods, Ye and Weiss (2003) introduced the concept of a candidate matrix, which is a p × p matrix A satisfying A = P_γ A P_γ. They showed that any eigenvector corresponding to a nonzero eigenvalue of A belongs to the CDRS Span{γ}. On the other hand, the set of all candidate matrices, denoted by M, contains all symmetric matrices that could potentially be used to estimate Span{γ}. Moreover, M is closed under scalar multiplication, transposition, addition, multiplication, and thus under linear combination and expectation.

Based on the following identities, pointed out by Cook and Lee (1999) and Cook and Weisberg (1991) respectively:

\[ E(z \mid y) = P_\gamma E(z \mid y), \qquad I - \mathrm{Var}(z \mid y) = P_\gamma \big(I - \mathrm{Var}(z \mid y)\big) P_\gamma, \]

Ye and Weiss (2003) also showed that the matrices [µ_1(y)µ_1(y)'] and [µ_2(y) − I] belong to M for all y, where µ_1(y) = E(z | y) and µ_2(y) = E(zz' | y). Moreover, they identified the common components [µ_1(y)µ_1(y)'] and [µ_2(y) − I] in the symmetric matrices that SIR, SAVE, and y-pHd estimate, and proved that

those matrices all belong to M:

\[
\begin{aligned}
M_{SIR} &= \mathrm{Var}\big(E(z \mid y)\big) = E[\mu_1(y)\mu_1(y)'],\\
M_{SAVE} &= E\big[(I - \mathrm{Var}(z \mid y))^2\big]\\
&= E\big([\mu_1(y)\mu_1(y)']^2 + [\mu_2(y) - I]^2 - [\mu_1(y)\mu_1(y)'][\mu_2(y) - I] - [\mu_2(y) - I][\mu_1(y)\mu_1(y)']\big),\\
M_{y\text{-}pHd} &= E\big[(y - E(y))\,zz'\big] = E\big[y(\mu_2(y) - I)\big].
\end{aligned}
\]

3. A New Candidate Matrix M_{zz'|y}

It has been recognized that SIR and pHd are sometimes able to discover only a proper subspace of the CDRS. Circumstances in which SIR and pHd fail to reveal the complete Span{γ} have been discussed by Li (1991b, 1992), Cook and Weisberg (1991), Cook (1998a), Yin and Cook (2002), and Ye and Weiss (2003). SAVE, however, provides more accurate estimates of Span{γ} in those cases. To understand the possible comprehensiveness of SAVE, one must first understand the relationship among SIR, pHd, and SAVE. It is now known that what SIR intends to estimate is a subspace of what SAVE does, at both the practical and the theoretical level (Cook and Critchley, 2000; Cook and Yin, 2001; Ye and Weiss, 2003). Before discussing further the relationship between pHd and SAVE, we first introduce a new candidate matrix, which connects pHd to SAVE.

3.1. A New Candidate Matrix

As mentioned in Section 2.3, the matrices [µ_1(y)µ_1(y)'] and [µ_2(y) − I] are two fundamental components of M_SIR, M_SAVE, and M_{y-pHd}. The matrix M_SIR only involves the first component [µ_1(y)µ_1(y)'], while both M_SAVE and

M_{y-pHd} share the second component [µ_2(y) − I]. Realizing that this common feature may lead to the connection between SAVE and pHd, we investigate the behavior of the matrix [µ_2(y) − I]. To avoid the inconvenience caused by E([µ_2(y) − I]) = 0, we define

\[ M_{zz'|y} = E\big([E(zz' - I \mid y)]^2\big) = E\big([\mu_2(y) - I]^2\big). \]

Note that M_{zz'|y} takes a simpler form than the rescaled version of SIR-II (Li, 1991b, Remark R.3) while still retaining the theoretical comprehensiveness of the second inverse moments. The simplicity is critical here because it leads to a less complicated large-sample test and larger power. (A slice-based sample version of M_{zz'|y} is sketched below.) We next gradually establish the relationship between M_{y-pHd} and M_{zz'|y}.

Lemma 1. Let M be a p × q random matrix defined on a probability space (Ω, F, P). Then there exists an event Ω_0 ∈ F with probability 1 such that Span{E(MM')} = Span{M(ω), ω ∈ Ω_0}.

A similar result can be found in Yin and Cook (2003, Proposition 2(i)); the lemma here is more general. By the definition of M_{zz'|y}, Corollary 1 follows directly.

Corollary 1. Span{M_{zz'|y}} = Span{[µ_2(y) − I], y ∈ Ω(y)}, where Ω(y) is the support of y.

Based on Corollary 1, Ye and Weiss (2003, Lemma 3), and the fact that [µ_2(y) − I] ∈ M for all y, the matrix M_{zz'|y} is in fact a candidate matrix too.
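To make the new candidate matrix concrete, here is a minimal slice-based sample version of M_{zz'|y} (our own sketch, paralleling the estimation strategy of Section 4.2; it assumes standardized predictors `z` and equal-count slices of `y`):

```python
import numpy as np

def m_zz_y_hat(z, y, n_slices=10):
    """Slice-based estimate of M_{zz'|y} = E([E(zz'|y) - I]^2).

    z: (n, p) standardized predictors; y: (n,) response.
    Slices are formed from the order statistics of y with (nearly) equal counts.
    """
    n, p = z.shape
    order = np.argsort(y)
    m_hat = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        f_h = len(idx) / n                       # intraslice proportion
        zz_h = z[idx].T @ z[idx] / len(idx)      # intraslice mean of zz'
        a_h = zz_h - np.eye(p)                   # E(zz' | slice h) - I
        m_hat += f_h * a_h @ a_h
    return m_hat
```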

Corollary 1 also implies a strong connection between M_{y-pHd} and M_{zz'|y}:

Corollary 2. Span{M_{y-pHd}} ⊆ Span{M_{zz'|y}}.

To further understand the relationship between M_{y-pHd} and M_{zz'|y}, recall the central k-th moment dimension reduction subspace (Yin and Cook, 2003), S^{(k)}_{y|z} = Span{η^{(k)}}. The corresponding random vector (η^{(k)})'z contains all the available information about y from the first k conditional moments of y given z; in other words,

\[ y \perp\!\!\!\perp \{E(y \mid z), \ldots, E(y^k \mid z)\} \mid (\eta^{(k)})' z. \]

Similar to

\[ \mathrm{Span}\{E(yz), \ldots, E(y^k z)\} = \mathrm{Span}\{E(y\mu_1(y)), \ldots, E(y^k \mu_1(y))\} \subseteq S^{(k)}_{y|z} \subseteq S_{y|z}, \]

the subspace Span{E(y[µ_2(y) − I]), ..., E(y^k[µ_2(y) − I])} is also contained in S^{(k)}_{y|z}. Parallel to Yin and Cook (2002, Proposition 4), the result for M_{zz'|y} is:

Proposition 1. (a) If y has finite support Ω(y) = {a_0, ..., a_k}, then Span{M_{zz'|y}} = Span{E[y^i(µ_2(y) − I)], i = 1, ..., k}. (b) If y is continuous and µ_2(y) is continuous on the support Ω(y) of y, then Span{M_{zz'|y}} = Span{E[y^i(µ_2(y) − I)], i = 1, 2, ...}.

According to Proposition 1 and Yin and Cook (2002, Proposition 4), the relationship between E[y(µ_2(y) − I)] = M_{y-pHd} and M_{zz'|y} is quite comparable to the relationship between E(yµ_1(y)) = E(yz) and M_SIR. Both E(yz) and

M_{y-pHd} actually target the central mean (first moment) dimension reduction subspace (Cook and Li, 2002), while M_SIR and M_{zz'|y} target the central k-th moment dimension reduction subspace for any k, or equivalently the CDRS S_{y|z} as k goes to infinity.

To understand the similarity from another perspective, recall the inverse mean subspace of S_{y|z} (Yin and Cook, 2002), S_{E(z|y)} = Span{E(z | y), y ∈ Ω(y)}. Similarly, we define the inverse second moment subspace of S_{y|z}:

\[ \mathrm{Span}\{E(zz' \mid y) - I,\ y \in \Omega(y)\}. \]

By definition, the matrices M_SIR and M_{zz'|y} are designed to recover the entire inverse mean subspace and the entire inverse second moment subspace respectively, while E(yz) and M_{y-pHd} are only able to recover portions of those subspaces. We are therefore interested in combining the matrices M_SIR and M_{zz'|y}, because both are comprehensive. In the next section, we reveal the relationships among M_SIR, M_{zz'|y}, and M_SAVE, and thereby connect pHd to SAVE.

3.2. SAVE versus SIR and pHd

After separating the underlying candidate matrices from the methods estimating them, Ye and Weiss (2003) showed that

\[ \mathrm{Span}\{M_{SIR}\} \subseteq \mathrm{Span}\{M_{SAVE}\}. \tag{3} \]

In other words, M_SAVE is more comprehensive than M_SIR in terms of the ability to discover the complete CDRS, although the method used to estimate M_SAVE does not necessarily behave better than the one used to estimate M_SIR.

To connect M_SAVE with M_{y-pHd}, we first apply Lemma 1 to M_SAVE:

\[ \mathrm{Span}\{M_{SAVE}\} = \mathrm{Span}\big\{E\big([\mu_1(y)\mu_1(y)' - (\mu_2(y) - I)]^2\big)\big\} = \mathrm{Span}\big\{[\mu_1(y)\mu_1(y)' - (\mu_2(y) - I)],\ y \in \Omega(y)\big\}. \]

We then prove the following proposition (see the Appendix for the proof):

Proposition 2. Span{M_SAVE} = Span{M_SIR} + Span{M_{zz'|y}}.

A straightforward consequence of Proposition 2 and Corollary 2 is:

Corollary 3. Span{M_{y-pHd}}, Span{M_SIR}, Span{M_{zz'|y}} ⊆ Span{M_SAVE}.

Corollary 3 tells us that the subspaces that SIR, y-pHd, and M_{zz'|y} intend to discover are all contained in the subspace that SAVE does. It also explains why SAVE is able to provide better estimates of the CDRS than SIR and y-pHd in many cases.

4. Sliced Inverse Moment Regression Using Weighted Chi-Squared Tests

4.1. Sliced Inverse Moment Regression

Section 3.2 implies that M_{zz'|y} plays a key role in the comprehensiveness of SAVE over SIR and pHd. As an alternative candidate matrix in a simple form, M_{zz'|y} is designed to reveal the entire inverse second moment subspace (Section 3.1). In order to simplify the candidate matrices using the first two inverse moments while still keeping the comprehensiveness of SAVE, a natural idea is to combine M_{zz'|y} with M_SIR as follows:

\[
\alpha M_{SIR} + (1-\alpha) M_{zz'|y}
= E\big(\alpha[\mu_1(y)\mu_1(y)'] + (1-\alpha)[\mu_2(y) - I]^2\big)
= E\Big( \big(\sqrt{\alpha}\,\mu_1(y),\ \sqrt{1-\alpha}\,[\mu_2(y) - I]\big)\big(\sqrt{\alpha}\,\mu_1(y),\ \sqrt{1-\alpha}\,[\mu_2(y) - I]\big)' \Big),
\]

where α ∈ (0, 1). We call this matrix M^{(α)}_{SIMR} and the corresponding dimension reduction method Sliced Inverse Moment Regression (SIMR or SIMR_α). Again, the combination here is simpler than the SIR_α method (Li, 1991b; Gannoun and Saracco, 2003) while still satisfying the minimal requirement of comprehensiveness. In fact, for any α ∈ (0, 1), SIMR_α is as comprehensive as SAVE at a theoretical level, based on the following proposition:

Proposition 3. Span{M^{(α)}_{SIMR}} = Span{M_SAVE} for all α ∈ (0, 1).

Combined with Corollary 3, it follows directly that any linear combination of SIR, pHd, and SAVE is covered by SIMR_α:

Corollary 4. Span{a M_SIR + b M_{y-pHd} + c M_SAVE} ⊆ Span{M^{(α)}_{SIMR}}, where a, b, and c are arbitrary real numbers.

Note that the way SIMR_α is constructed makes it easier to develop a corresponding large-sample test for the dimension of the regression (Section 4.3). From now on, we assume that the data {(y_i, x_i)}_{i=1,...,n} are i.i.d. from a population with finite first four moments and conditional moments.

4.2. Algorithm for SIMR_α

Given an i.i.d. sample (y_1, x_1), ..., (y_n, x_n), first standardize x_i into ẑ_i, sort the data by y, and divide the data into H slices with intraslice sample sizes n_h,

h = 1, ..., H. Second, construct the intraslice sample means

\[ \overline{(zz')}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} \hat z_{ih} \hat z_{ih}', \qquad \bar z_h = \frac{1}{n_h} \sum_{i=1}^{n_h} \hat z_{ih}, \]

where the ẑ_{ih}'s are the predictors falling into slice h. Third, calculate

\[ \hat M^{(\alpha)}_{SIMR} = \sum_{h=1}^{H} \hat f_h \Big( (1-\alpha)\big[\overline{(zz')}_h - I_p\big]\big[\overline{(zz')}_h - I_p\big]' + \alpha\, \bar z_h \bar z_h' \Big) = \hat U_n \hat U_n', \]

where f̂_h = n_h/n and

\[ \hat U_n = \Big( \ldots,\ \sqrt{1-\alpha}\,\big[\overline{(zz')}_h - I_p\big]\sqrt{\hat f_h},\ \ldots;\ \ldots,\ \sqrt{\alpha}\,\bar z_h \sqrt{\hat f_h},\ \ldots \Big)_{p \times (pH+H)}. \]

Finally, calculate the eigenvalues λ̂_1 ≥ ... ≥ λ̂_p of M̂^{(α)}_{SIMR} and the corresponding eigenvectors γ̂_1, ..., γ̂_p. Then Span{γ̂_1, ..., γ̂_d} is an estimate of the CDRS Span{γ}, where d is determined by the weighted chi-squared test described in the next section. (A compact computational sketch of this algorithm is given below.)
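The algorithm above can be summarized in a few lines of code. The following is a minimal sketch, not the authors' implementation; slice boundaries, ties in y, and the choice of H are handled only crudely, and the inputs are assumed to be numpy arrays `x` of shape (n, p) and `y` of shape (n,):

```python
import numpy as np

def simr(x, y, alpha=0.5, n_slices=10):
    """Sketch of the SIMR_alpha algorithm of Section 4.2 (illustrative only).

    Returns the eigenvalues and eigenvectors of M_hat^(alpha)_SIMR in
    decreasing order of eigenvalue.
    """
    n, p = x.shape
    # standardize x into z_hat
    mu_hat = x.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(x, rowvar=False))
    root_inv = vecs @ np.diag(vals ** -0.5) @ vecs.T
    z = (x - mu_hat) @ root_inv
    # sort by y and slice with (nearly) equal counts
    order = np.argsort(y)
    m_hat = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        f_h = len(idx) / n                          # intraslice proportion f_h
        z_h = z[idx].mean(axis=0)                   # intraslice mean of z
        a_h = z[idx].T @ z[idx] / len(idx) - np.eye(p)   # mean of zz' minus I
        m_hat += f_h * ((1 - alpha) * a_h @ a_h + alpha * np.outer(z_h, z_h))
    lam, gam = np.linalg.eigh(m_hat)
    return lam[::-1], gam[:, ::-1]
```

The first d columns of the returned eigenvector matrix estimate the CDRS on the z scale; they can be mapped back to the x scale as in Section 2.1.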

4.3. A Weighted Chi-Squared Test for SIMR_α

Define the population version of Û_n:

\[
B = \Big( \ldots,\ \sqrt{1-\alpha}\,[E(zz' \mid \tilde y = h) - I_p]\sqrt{f_h},\ \ldots;\ \ldots,\ \sqrt{\alpha}\,E(z \mid \tilde y = h)\sqrt{f_h},\ \ldots \Big)
= \big((\Gamma_{11})_{p\times d},\ (\Gamma_{12})_{p\times(p-d)}\big)
\begin{pmatrix} D_{d\times d} & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} (\Gamma_{21}')_{d\times(pH+H)} \\ (\Gamma_{22}')_{(pH+H-d)\times(pH+H)} \end{pmatrix}, \tag{4}
\]

where ỹ is a slice indicator with ỹ ≡ h for all observations falling into slice h, f_h = P(ỹ = h) is the population version of f̂_h, (Γ_11, Γ_12) is an orthogonal p × p matrix, (Γ_21, Γ_22) is an orthogonal (pH+H) × (pH+H) matrix, and D is a diagonal matrix whose diagonal entries are the positive singular values of B. In other words, (4) is the singular value decomposition of B.

In order to study the asymptotic behavior of SIMR, let Ũ_n = √n(Û_n − B). By the multivariate central limit theorem and the multivariate version of Slutsky's theorem, Ũ_n converges in distribution to a certain random p × (pH + H) matrix U as n goes to infinity (Gannoun and Saracco, 2003). Since singular values are invariant under left and right multiplication by orthogonal matrices, the distribution of the singular values of √n Û_n is the same as the distribution of the singular values of √n (Γ_11, Γ_12)' Û_n (Γ_21, Γ_22). Based on Eaton and Tyler (1994, Theorems 4.1 and 4.2), the asymptotic distribution of the smallest (p − d) singular values of √n Û_n is the same as the asymptotic distribution of the corresponding singular values of the following (p − d) × (pH + H − d) matrix:

\[ \sqrt{n}\, \Gamma_{12}' \hat U_n \Gamma_{22}. \tag{5} \]

Construct the statistic

\[ \hat\Lambda_d = n \sum_{h=d+1}^{p} \hat\lambda_h, \]

which is the sum of the squared smallest (p − d) singular values of √n Û_n.
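Given the eigenvalues returned by the `simr` sketch above, the statistic Λ̂_d is immediate to compute; a small illustrative helper:

```python
import numpy as np

def lambda_hat(eigenvalues, d, n):
    """Lambda_hat_d = n * (sum of the smallest p - d eigenvalues of M_hat)."""
    lam = np.sort(np.asarray(eigenvalues))[::-1]   # decreasing order
    return n * lam[d:].sum()
```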

Then the asymptotic distribution of Λ̂_d is the same as that of the sum of the squared singular values of (5), that is,

\[ n\,\mathrm{trace}\big([\Gamma_{12}'\hat U_n\Gamma_{22}][\Gamma_{12}'\hat U_n\Gamma_{22}]'\big) = n\,[\mathrm{Vec}(\Gamma_{12}'\hat U_n\Gamma_{22})]'[\mathrm{Vec}(\Gamma_{12}'\hat U_n\Gamma_{22})], \]

where Vec(A_{r×c}) denotes the rc × 1 vector (a_1', ..., a_c')' for any matrix A = (a_1, ..., a_c). By the multivariate central limit theorem and Slutsky's theorem again,

\[ \mathrm{Vec}(\tilde U_n) \xrightarrow{L} N_{(p^2H + pH)}(0, V), \]

where V is a nonrandom (p²H + pH) × (p²H + pH) matrix. Thus,

\[ \sqrt{n}\,\mathrm{Vec}(\Gamma_{12}'\hat U_n\Gamma_{22}) \xrightarrow{L} N_{(p-d)(pH+H-d)}(0, W), \]

where W = (Γ_22 ⊗ Γ_12)' V (Γ_22 ⊗ Γ_12) is a (p−d)(pH+H−d) × (p−d)(pH+H−d) matrix. Combined with Slutsky's theorem, this yields the following theorem:

Theorem 1. The asymptotic distribution of Λ̂_d is the same as that of Σ_{i=1}^{(p−d)(pH+H−d)} α_i K_i, where the K_i are independent χ²_1 random variables and the α_i are the eigenvalues of the matrix W.

Clearly, a consistent estimate of W is needed for testing the dimension of the regression based on Theorem 1. Because of the way we define it, M^{(α)}_{SIMR} allows us to study the asymptotic distribution of Û_n through the partition below, and therefore simplifies the derivation of W:

\[ \hat U_{n,1} = \Big(\ldots,\ \sqrt{1-\alpha}\,\big[\overline{(zz')}_h - I_p\big]\sqrt{\hat f_h},\ \ldots\Big)_{p \times pH}, \qquad \hat U_{n,2} = \Big(\ldots,\ \sqrt{\alpha}\,\bar z_h \sqrt{\hat f_h},\ \ldots\Big)_{p \times H}. \]

The asymptotic distribution of the matrix Û_{n,2} has been fully explored by Bura and Cook (2001), resulting in a weighted chi-squared test for SIR.

Similar techniques can also be applied to the matrix Û_{n,1}, and therefore to the matrix Û_n as a whole, although the details are much more complicated.

Define the population versions of Û_{n,1} and Û_{n,2}:

\[ B_1 = \Big(\ldots,\ \sqrt{1-\alpha}\,[E(zz' \mid \tilde y = h) - I_p]\sqrt{f_h},\ \ldots\Big)_{p \times pH}, \qquad B_2 = \Big(\ldots,\ \sqrt{\alpha}\,E(z \mid \tilde y = h)\sqrt{f_h},\ \ldots\Big)_{p \times H}. \]

Then Û_n = (Û_{n,1}, Û_{n,2}) and B = (B_1, B_2).

Let f, f̂, and 1_H be H × 1 vectors with elements f_h, f̂_h, and 1 respectively; let G and Ĝ be H × H diagonal matrices with diagonal entries f_h and f̂_h respectively; and let

\[ \hat F = I_H - \hat f 1_H', \qquad F = I_H - f 1_H', \qquad
\begin{pmatrix} \Gamma_{21}' \\ \Gamma_{22}' \end{pmatrix}
= \begin{pmatrix} (\Gamma_{211}')_{d\times pH} & (\Gamma_{212}')_{d\times H} \\ (\Gamma_{221}')_{(pH+H-d)\times pH} & (\Gamma_{222}')_{(pH+H-d)\times H} \end{pmatrix}. \]

Finally, define the four matrices

\[ M = (\ldots, E(x \mid \tilde y = h), \ldots)_{p \times H}, \qquad N = (\ldots, E(x \mid \tilde y = h)', \ldots)_{1 \times pH} = \mathrm{Vec}(M)', \]
\[ O = (\ldots, E(xx' \mid \tilde y = h), \ldots)_{p \times pH}, \qquad C = \big[O - M(I_H \otimes \mu_x') - \mu_x N\big]_{p \times pH}, \]

and their corresponding sample versions

\[ M_n = (\ldots, \bar x_h, \ldots)_{p \times H}, \qquad N_n = (\ldots, \bar x_h', \ldots)_{1 \times pH} = \mathrm{Vec}(M_n)', \]

\[ O_n = \big(\ldots, \overline{(xx')}_h, \ldots\big)_{p \times pH}, \qquad C_n = \big[O_n - M_n(I_H \otimes \hat\mu_x') - \hat\mu_x N_n\big]_{p \times pH}, \]

where (xx')_h-bar and x̄_h, h = 1, ..., H, are intraslice sample means defined similarly to (zz')_h-bar and z̄_h. By the central limit theorem,

\[ \sqrt{n}\,\mathrm{Vec}\big[(C_n, M_n) - (C, M)\big] \xrightarrow{L} N_{(p^2H + pH)}(0, \Delta) \]

for a nonrandom (p²H + pH) × (p²H + pH) matrix Δ. As a result (see the Appendix for a detailed proof),

Theorem 2. The covariance matrix in Theorem 1 is

\[
W = \left( \left[ \begin{pmatrix} \sqrt{1-\alpha}\,(FG^{1/2}) \otimes \Sigma_x^{-1/2} & 0 \\ 0 & \sqrt{\alpha}\, FG^{1/2} \end{pmatrix} \Gamma_{22} \right]' \otimes \big(\Gamma_{12}'\Sigma_x^{-1/2}\big) \right)
\Delta
\left( \left[ \begin{pmatrix} \sqrt{1-\alpha}\,(FG^{1/2}) \otimes \Sigma_x^{-1/2} & 0 \\ 0 & \sqrt{\alpha}\, FG^{1/2} \end{pmatrix} \Gamma_{22} \right]' \otimes \big(\Gamma_{12}'\Sigma_x^{-1/2}\big) \right)'.
\]

The only difficulty left now is to obtain a consistent estimate of Δ. By the central limit theorem,

\[ \sqrt{n}\,\mathrm{Vec}\big[(O_n, M_n, \hat\mu_x) - (O, M, \mu_x)\big] \xrightarrow{L} N_{(p^2H + pH + p)}(0, \Delta_0), \]

where Δ_0 is a nonrandom (p²H + pH + p) × (p²H + pH + p) matrix, with details shown in the Appendix. On the other hand,

\[
\mathrm{Vec}(C_n, M_n) = \begin{pmatrix} I_{p^2H} & -I_H \otimes \hat\mu_x \otimes I_p - I_{pH} \otimes \hat\mu_x & 0 \\ 0 & I_{pH} & 0 \end{pmatrix} \mathrm{Vec}(O_n, M_n, \hat\mu_x) = g\big(\mathrm{Vec}(O_n, M_n, \hat\mu_x)\big)
\]

for a certain mapping g: R^{p²H+pH+p} → R^{p²H+pH} such that Vec(C, M) = g(Vec(O, M, µ_x)). Thus the closed form of Δ can be obtained by Cramér's theorem (Cramér, 1946):

\[ \Delta = \big[\dot g\big(\mathrm{Vec}(O, M, \mu_x)\big)\big]\, \Delta_0\, \big[\dot g\big(\mathrm{Vec}(O, M, \mu_x)\big)\big]', \tag{6} \]

where the (p²H + pH) × (p²H + pH + p) derivative matrix is

\[ \dot g\big(\mathrm{Vec}(O, M, \mu_x)\big) = \begin{pmatrix} I_{p^2H} & -I_H \otimes \mu_x \otimes I_p - I_{pH} \otimes \mu_x & \dot g_{13} \\ 0 & I_{pH} & 0 \end{pmatrix} \tag{7} \]

with ġ_13 = −(..., I_p ⊗ E(x | ỹ = h), ...)' − Vec(M) ⊗ I_p.

In summary, to compose a consistent estimate of the matrix W, one can (i) substitute the usual sample moments to get the sample estimate of Δ_0, following the details in the Appendix; (ii) estimate Δ by substituting the usual sample estimates for E(x | ỹ = h), µ_x, and M in (6) and (7); (iii) obtain the usual sample estimates of Γ_12 and Γ_22 from the singular value decomposition of Û_n; and (iv) substitute the usual sample estimates for F, G, Σ_x, Γ_12, and Γ_22 in Theorem 2 to form an estimate of W. Note that neither Δ nor Δ_0 depends on α, which saves a lot of computational time when multiple α's need to be checked.

To approximate a linear combination of chi-squared random variables and reduce the cost of the calculation, one may use Satterthwaite's statistic, Wood's statistic, or Satorra and Bentler's statistic. Here we use Satterthwaite's statistic for illustration purposes. Let

\[ \tilde\Lambda_d = \frac{t}{\mathrm{Trace}(\hat W)}\, \hat\Lambda_d, \]

where t is the nearest integer to [Trace(Ŵ)]²/Trace(Ŵ²). This adjusted statistic has an approximate chi-squared distribution with t degrees of freedom.
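As a rough illustration of the adjustment, the following sketch computes the Satterthwaite-adjusted statistic and its approximate p-value from Λ̂_d and a consistent estimate Ŵ, both assumed to be already available (the construction of Ŵ follows steps (i)-(iv) above and is not reproduced here):

```python
import numpy as np
from scipy.stats import chi2

def satterthwaite_pvalue(lambda_hat_d, w_hat):
    """Satterthwaite approximation to the weighted chi-squared null distribution.

    lambda_hat_d: the statistic n * sum of the smallest (p - d) eigenvalues.
    w_hat: a consistent estimate of the matrix W in Theorem 1.
    """
    tr1 = np.trace(w_hat)
    tr2 = np.trace(w_hat @ w_hat)
    t = max(int(round(tr1 ** 2 / tr2)), 1)    # approximate degrees of freedom
    stat = t * lambda_hat_d / tr1             # adjusted statistic
    return stat, t, chi2.sf(stat, df=t)       # right-tail p-value
```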

In the applications below, we only present tests based on the adjusted statistic Λ̃_d instead of the original Λ̂_d.

4.4. Choosing Optimal α

The proposed M^{(α)}_{SIMR} = αM_SIR + (1−α)M_{zz'|y}, 0 < α < 1, is basically a family of linear combinations. For a given data set, the practical performance of different combinations will not be the same, so it is important to find a reasonable criterion for choosing the optimal α.

Ye and Weiss (2003) proposed a bootstrap method to pick the best linear combination of two known methods in terms of the variability of the estimated CDRS Ŝ_{y|z}. To evaluate a fixed combination, bootstrap samples x^{(b)} are generated to produce bootstrapped estimates Ŝ^{(b)}_{y|z}. The vector correlation coefficient q (Hotelling, 1936) or the trace correlation r (Hooper, 1959) is used to measure the variability between Ŝ^{(b)}_{y|z} and the estimate Ŝ_{y|z} based on the original data set; less variability indicates a better combination. The bootstrap method could be used here to choose α for SIMR too, and it works reasonably well; see Section 5.1 for an illustration.

An alternative criterion for the optimal α is based on the weighted chi-squared tests for SIMR_α (Section 4.3). When multiple tests with different α report the same dimension d, we pick the α with the smallest p-value. Given the true dimension d, the last eigenvector γ̂_d added in with the chosen α is then the most significant one among the candidates based on different α. Based on the simulation studies in Section 5, the performance of the p-value criterion is comparable with that of the bootstrap criterion; the advantage of the former is that it requires much less computation.
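The p-value criterion can be sketched as follows. The helper `dim_test(x, y, alpha, d)` is hypothetical: it stands for whatever routine returns the weighted chi-squared p-value of the marginal dimension test of "dimension = d" for SIMR_α; the selection rule below is our reading of the criterion described above:

```python
def choose_alpha(x, y, alphas, d_hat, dim_test, level=0.05):
    """p-value criterion of Section 4.4 (sketch).

    Among the alphas whose tests agree on the dimension d_hat, pick the one
    for which the last direction added (the d_hat-th) is most significant.
    """
    pvals = {}
    for a in alphas:
        p_reject = dim_test(x, y, a, d_hat - 1)   # test of dimension d_hat - 1
        p_keep = dim_test(x, y, a, d_hat)         # test of dimension d_hat
        # alpha "reports" d_hat if it rejects d_hat - 1 but not d_hat
        if p_reject < level <= p_keep:
            pvals[a] = p_reject
    return min(pvals, key=pvals.get) if pvals else None
```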

When a model or an algorithm is specified for the data analysis, cross-validation could also be used for choosing the optimal α, just as it is used for model selection; see, for example, Hastie et al. (2001, chap. 7). It is not covered in this paper since we aim at model-free dimension reduction.

5. Simulation Study

5.1. A Simulated Example

To emphasize the potential advantage of the proposed SIMR over SIR and pHd, as well as SAVE, consider the following example in which both SIR and pHd fail to discover the complete CDRS. For pHd, r-pHd (Li, 1992), a variant of y-pHd obtained by replacing y with its least squares residuals r, is used instead. The technical details about r-pHd and its advantage over y-pHd can be found in Li (1992) and Cook (1998a).

Let the response be y = 2z_1ε + z_2² + z_3, where (z', ε)' = (z_1, z_2, z_3, z_4, ε)' is an i.i.d. sample from the N_5(0, I_5) distribution. Then the true dimension of the regression is 3 and the true CDRS is spanned by (1, 0, 0, 0)', (0, 1, 0, 0)', and (0, 0, 1, 0)', that is, by z_1, z_2, and z_3. (A short data-generating sketch is given at the end of this subsection.) Theoretically, since z_1, z_2, and ε depend on y symmetrically,

\[
\begin{aligned}
M_{SIR} &= \mathrm{Var}\big((0, 0, E(z_3 \mid y), 0)'\big) = \mathrm{Diag}\{0, 0, \mathrm{Var}(E(z_3 \mid y)), 0\},\\
M_{y\text{-}pHd} &= E\big((2z_1\varepsilon + z_2^2 + z_3 - 1)\, zz'\big) = \mathrm{Diag}\{0, 2, 0, 0\},\\
M_{r\text{-}pHd} &= E(r\, zz') = E\big((2z_1\varepsilon + z_2^2 - 1)\, zz'\big) = \mathrm{Diag}\{0, 2, 0, 0\},
\end{aligned}
\]

where Diag{0, 2, 0, 0} is the diagonal matrix with diagonal entries {0, 2, 0, 0}, the residual is r = y − E(y) − [E(yz)]'z, and M_{r-pHd} is the candidate matrix for r-pHd.
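For concreteness, one way to generate a data set from this model (a sketch; the seed and generator choices are arbitrary and not part of the original study):

```python
import numpy as np

def simulate(n=400, seed=0):
    """Generate one data set from y = 2*z1*eps + z2^2 + z3,
    with (z1, ..., z4, eps) ~ N_5(0, I_5)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, 4))
    eps = rng.standard_normal(n)
    y = 2 * z[:, 0] * eps + z[:, 1] ** 2 + z[:, 2]
    return z, y
```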

All three candidate matrices have rank one and are therefore only able to find a one-dimensional proper subspace of the CDRS. A linear combination of any two of them, as suggested by Ye and Weiss (2003), can at most find a two-dimensional proper subspace of the CDRS. On the contrary,

\[ \mathrm{Span}(M_{SAVE}) = \mathrm{Span}\big(M^{(\alpha)}_{SIMR}\big) = \mathrm{Span}\big(M_{SIR} + M_{zz'|y}\big) = \mathrm{Span}\big(M_{SIR} + E([\mu_2(y) - I]^2)\big) = \mathrm{Span}\big(\mathrm{Diag}\{\sigma_{11}, \sigma_{22}, \mathrm{Var}(E(z_3 \mid y)) + \sigma_{33}, 0\}\big), \]

where σ_ii = E([E(z_i² | y) − 1]²), i = 1, 2, 3. Therefore, both SAVE and SIMR are able to recover the complete CDRS at a theoretical level.

5.2. A Single Simulation

We begin with a single simulation with sample size n = 400. SIR, r-pHd, SAVE, and SIMR_α are applied to the data. H = 10 slices are used for SIR, SAVE, and SIMR. The CDRS estimates are displayed in Table 1, and the p-values are listed in Table 2. The R package dr (Weisberg, 2002, 2009, version 3.0.3) is used for SIR, r-pHd, and SAVE, as well as their corresponding marginal dimension tests. SIMR_α with α = 0, 0.01, 0.05, 0.1 through 0.9 in steps of 0.1, 0.95, 0.99, and 1 is applied. For illustration purposes, only α = 0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95 are listed in the tables.

For this typical simulation, SIR identifies only the direction (.018, .000, .999, .035)', which is roughly z_3, the linear trend. r-pHd identifies only the direction (.011, .999, .038, .020)', which is roughly z_2, the quadratic component. As expected, SAVE works better: it identifies z_2 and z_1. However,

the marginal dimension tests for SAVE (Shao et al., 2007) fail to detect the third predictor, z_3; the p-value of the corresponding test is reported in Table 2. Roughly speaking, SAVE with its marginal dimension test is comparable with SIMR_{0.1} in this case (Tables 1 and 2). The comparison between SAVE and SIMR_α suggests that the failure of SAVE might be due to the weights it uses to combine the first and second inverse moments.

As α increases, SIMR_{0.3}, SIMR_{0.5}, SIMR_{0.6}, and SIMR_{0.8} all succeed in detecting all three effective predictors z_1, z_2, and z_3. The CDRS estimates based on these candidate matrices are similar to each other, which implies that the results with different α are fairly consistent. In fact, the fourth and last eigenvectors corresponding to α = 0.3, 0.5, 0.6, 0.8 are (.132, .049, .342, .929)', (.166, .062, .186, .966)', (.191, .065, .123, .972)', and (.305, .065, .029, .950)' respectively, which span the complement of the estimated CDRS. The major difference among SIMR_{0.3}, SIMR_{0.5}, SIMR_{0.6}, and SIMR_{0.8} is that the order of the detected predictors changes roughly from {z_2, z_1, z_3} to {z_3, z_2, z_1} as α increases. This is reasonable because α is the coefficient of M_SIR in M^{(α)}_{SIMR}, and M_SIR is capable of finding z_3 only. As expected, SIMR_α is comparable with SIR if α is close to 1 (see, for example, the largest values of α listed in the tables).

For this particular simulation, SIMR_α with α between 0.3 and 0.8 is first selected, because the corresponding tests all imply d = 3 and the estimated CDRS are fairly consistent. Second, it is practically important to pick an optimal α according to a clear criterion. If we knew the true CDRS, the optimal α would be the one minimizing the distance between the estimated CDRS and the true CDRS. The three distance measures arccos(q), 1 − q, and 1 − r (Ye and Weiss, 2003, p. 974) all imply α = 0.6 for this particular simulation (see the left panel of Figure 1).
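The 1 − r distance between two estimated subspaces can be computed as in the sketch below. The normalization of the trace correlation follows our reading of Hooper (1959) as used by Ye and Weiss (2003), so treat it as an illustrative assumption rather than the paper's exact code:

```python
import numpy as np

def one_minus_r(a, b):
    """Distance 1 - r between the column spaces of a and b (both p x d).

    r is the trace correlation: r^2 = trace(P_a P_b) / d, with P_a, P_b the
    orthogonal projections onto the two d-dimensional subspaces.
    """
    qa, _ = np.linalg.qr(a)      # orthonormal basis of span(a)
    qb, _ = np.linalg.qr(b)      # orthonormal basis of span(b)
    d = qa.shape[1]
    r2 = np.trace(qa @ qa.T @ qb @ qb.T) / d
    return 1.0 - np.sqrt(r2)
```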

The 1 − r values are listed in Table 2 as well. Since the true CDRS is unknown in practice, the bootstrap criterion and the p-value criterion (Section 4.4) are applied separately. Figure 2 shows the variability of 200 bootstrapped estimates of the CDRS. The distance 1 − r is used because it remains comparable across different dimensions. The minimum variability is attained at d = 3 and α = 0.6, which happens to be the optimal choice based on the truth. In fact, more simulations reveal that about 75% of the optimal α based on the bootstrap fall in , while 60% of the optimal α based on the truth vary from . Based on another 200 simulations, SIMR with α chosen by the bootstrap criterion attains an average distance of 1 − r = from the true CDRS, which is reasonably good. Note that low variability does not necessarily imply that the estimated CDRS is accurate. For example, SIMR_1, or SIR, can only detect the one direction z_3; however, the estimated one-dimensional CDRS is fairly stable under bootstrapping (see Figure 2).

The right panel of Figure 1 shows that the p-value criterion also picks α = 0.6 for this single simulation (see the line d = 3, which is the highest one that still goes below the significance level 0.05). Based on the same 200 simulations, about 80% of the best α selected by the p-value criterion fall between 0.4 and 0.7. On average, SIMR with such an α attains 1 − r = , the distance between the estimated CDRS and the truth. These results are comparable with the bootstrap ones. We prefer the p-value criterion because it is computationally cheaper.

5.3. Power Analysis

In the single simulation in Section 5.2, SIMR_{0.6} using weighted chi-squared tests performs better than SIR, pHd, and SAVE. In this section, we conduct

1,000 independent simulations to show that it is not an isolated case. We summarize in Table 3 the empirical powers and sizes of the marginal dimension tests at significance level 0.05 for SIR, SAVE, r-pHd, and SIMR_α with α chosen by the p-value criterion. For illustration purposes, we omit the simulation results of y-pHd because there is little difference between y-pHd and r-pHd in this case. The empirical powers and sizes at significance level 0.01 are also omitted since their pattern is similar to Table 3.

In Table 3, the rows d = 0, d = 1, d = 2, and d = 3 indicate the different null hypotheses. Following Bura and Cook (2001), the numerical entries in the rows d = 0, d = 1, and d = 2 are empirical estimates of the powers of the corresponding tests, while the entries in the row d = 3 are empirical estimates of the sizes of the tests. As expected, SIR claims d = 1 in most cases. r-pHd works a little better: at significance level 0.05, r-pHd has about a 30% chance of finding d ≥ 2 (Table 3); at level 0.01, the chance shrinks to about 15%. Both SAVE and SIMR perform much better than SIR and pHd. Compared with SAVE, SIMR has consistently greater power for the null hypotheses d = 0, d = 1, and d = 2 across different choices of sample size, number of slices, and significance level. For example, under the null hypothesis d = 2 with sample size 400, the empirical powers of SIMR at level 0.05 are under 5 slices and under 10 slices, while the corresponding powers of SAVE are only and , respectively (Table 3). These differences become even bigger at level 0.01. The empirical sizes of SIMR are roughly under the nominal size 0.05, although they tend to be larger than those of all the other methods.

For comparison purposes, the inverse regression estimator (IRE) method

(Cook and Ni, 2005; Wen and Cook, 2007; Weisberg, 2009) is also applied to the same 1000 simulations. Roughly speaking, IRE performs similarly to SIR in this example. Among the five dimension reduction methods, SIMR is the most reliable one. The optimal α chosen by the p-value criterion may vary from simulation to simulation; nevertheless, about 85% of the optimal α values range from 0.4 to 0.6, which indicates that the results are fairly stable. The chi-squared test for SIMR does not seem to be very sensitive to the number of slices. Nevertheless, based on the simulation results, we suggest that the number of slices should not be greater than 3%-5% of the sample size.

6. A Real Example: Ozone Data

To examine how SIMR works in practice, we consider a data set taken from Breiman and Friedman (1985). The response Ozone is the daily ozone concentration in parts per million, measured in the Los Angeles basin for 330 days in 1976. For illustration purposes, the dependence of Ozone on the following four predictors is studied: Height, Vandenburg 500 millibar height in meters; Humidity, in percent; ITemp, inversion base temperature in degrees Fahrenheit; and STemp, Sandburg Air Force Base temperature in degrees Fahrenheit.

To approximate the Linearity Condition 1 and the Constant Covariance Condition 2 simultaneously, power transformations of the predictors are estimated to improve the normality of their joint distribution. After replacing Humidity, ITemp, and STemp with Humidity^1.68, ITemp^1.25, and STemp^1.11 respectively, SIR, r-pHd, SAVE, and SIMR are applied to the data.
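The preprocessing step can be written compactly. The sketch below assumes a pandas DataFrame named `ozone` with columns Ozone, Height, Humidity, ITemp, and STemp; these column names are our own labels for illustration, not taken from the original data file:

```python
import pandas as pd

def transform_predictors(ozone: pd.DataFrame) -> pd.DataFrame:
    """Apply the power transformations reported in Section 6."""
    out = ozone.copy()
    out["Humidity"] = out["Humidity"] ** 1.68
    out["ITemp"] = out["ITemp"] ** 1.25
    out["STemp"] = out["STemp"] ** 1.11
    return out
```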

For SIR, SAVE, and SIMR, various numbers of slices are applied, and the results are fairly consistent; here we only present the output based on H = 8.

Table 4 summarizes the p-values of the marginal dimension tests for SIR, r-pHd, SAVE, and SIMR. SIR suggests that the dimension of the regression is d = 1, while r-pHd claims d = 2. Using the visualization tools described by Cook and Weisberg (1994) and Cook (1998b), the first pHd predictor appears to be somewhat symmetric about the response Ozone, and the second pHd predictor seems to be similar to the first SIR predictor (plots not shown in this article). The symmetric dependency explains why SIR is not able to find the first pHd predictor. The resulting inference based on pHd is therefore more reliable than the inference based on SIR. When checking the SAVE predictors, visual tools show a clear quadratic, or even higher-order polynomial, dependency between the response and the first SAVE predictor. The second SAVE predictor is similar to the second pHd predictor, and the third SAVE predictor is similar to the first pHd predictor. Both SIR's and pHd's tests miss the first SAVE predictor.

Now apply SIMR to the ozone data. The bootstrap criterion picks α = 0.2, while the p-value criterion suggests α = 0. Nevertheless, both SIMR_{0.2} and SIMR_0 lead to very similar estimated CDRS in this case (see Table 5). As expected, they recover all three SAVE predictors; in fact, the three estimated CDRS appear to be almost identical.

7. Discussion

There are two fundamental goals of the dimension reduction methods discussed in this article: recovering the CDRS S_{y|z} = Span{γ} and determining

the dimension of the regression d. In this paper, we first identify an important candidate matrix, M_{zz'|y}. It helps to interpret the recovered subspace of the CDRS based on the inverse second moment. Similar to M_SIR, which targets the complete inverse mean subspace (Yin and Cook, 2002), M_{zz'|y} is able to recover the complete inverse second moment subspace of the CDRS. It plays a key role in revealing the comprehensiveness of SAVE and SIMR over SIR and pHd. Moreover, M_{zz'|y} has a concise form, which simplifies the construction of large-sample tests.

Second, we propose a new class of dimension reduction methods, SIMR_α with α ∈ (0, 1), and develop corresponding weighted chi-squared tests for the dimension of the regression. SIMR_α and SAVE are theoretically equivalent, since the subspaces spanned by their underlying matrices are identical. Nevertheless, the simulation study shows that SIMR_α with a suitably chosen α may perform better than SAVE. The main reason is that SAVE is only one fixed combination of the first two inverse moments. The simulation example in Section 5 implies that no fixed combination can always be the winner. Apparently, SIMR_{0.6} cannot always be the winner either; for example, if the simulation example is changed so that the linear component in z_3 is relatively stronger, SIMR_α with α closer to 1 will perform better. For practical use, multiple methods, as well as their combinations, should be tried and unified, and SIMR_α with α ∈ (0, 1) provides a simple solution. A candidate interval of α can be determined conveniently based on the weighted chi-squared tests when the estimated CDRS are fairly stable, and an optimal α can then be chosen based on the p-value criterion (Section 5.2). As expected, the chosen one works well.

One of the main concerns about the proposed tests for SIMR_α is their

stability and computational cost, since a (p−d)(pH+H−d) × (p−d)(pH+H−d) matrix W needs to be calculated. Based on the 1000 simulations in Section 5.3, SIMR_α's weighted chi-squared test actually works fairly accurately under various sample sizes and numbers of slices.

In conclusion, we propose SIMR using weighted chi-squared tests as an important class of dimension reduction methods, which should be routinely considered during the search for the central dimension reduction subspace and its dimension.

Appendix

Proof of Lemma 1: By definition, Span{E(MM')} ⊆ Span{M(ω), ω ∈ Ω_0} whenever P(Ω_0) = 1. On the other hand, for any v_{p×1} ≠ 0,

\[ v'E(MM') = 0 \iff v'E(MM')v = 0 \iff E\big([v'M][v'M]'\big) = 0 \iff v'M(\omega) \equiv 0 \ \text{with probability } 1. \]

Since {v : v'E(MM') = 0} is finite-dimensional, there exists an Ω_0 with probability 1 such that dim(Span{E(MM')}) ≥ dim(Span{M(ω), ω ∈ Ω_0}). Thus Span{E(MM')} = Span{M(ω), ω ∈ Ω_0}.

Proof of Corollary 2: Span{M_{y-pHd}} = Span{E[y(µ_2(y) − I)]} ⊆ Span{[µ_2(y) − I], y ∈ Ω(y)} = Span{M_{zz'|y}}.

Proof of Proposition 1: Define µ_i = E[(zz' − I) | y = a_i] = E(zz' | y = a_i) − I and f_i = Pr(y = a_i) for i = 0, ..., k. Then Σ_{i=0}^k f_i = 1 and Σ_{i=0}^k f_i µ_i = E(zz' − I) = 0. The rest of the steps follow exactly the same proof as in Yin and Cook (2002, A.3, Proposition 4).

Proof of Proposition 2: By Lemma 1,

\[
\begin{aligned}
\mathrm{Span}\{M_{SAVE}\} &= \mathrm{Span}\{[\mu_1(y)\mu_1(y)' - (\mu_2(y) - I)],\ y\}\\
&\subseteq \mathrm{Span}\{\mu_1(y),\ y\} + \mathrm{Span}\{(\mu_2(y) - I),\ y\} = \mathrm{Span}\{M_{SIR}\} + \mathrm{Span}\{M_{zz'|y}\}\\
&\subseteq \mathrm{Span}\{M_{SIR}\} + \big[\mathrm{Span}\{\mu_1(y)\mu_1(y)' - (\mu_2(y) - I),\ y\} + \mathrm{Span}\{\mu_1(y),\ y\}\big]\\
&\subseteq \mathrm{Span}\{M_{SIR}\} + \mathrm{Span}\{M_{SAVE}\} + \mathrm{Span}\{M_{SIR}\} = \mathrm{Span}\{M_{SAVE}\}.
\end{aligned}
\]

The last step follows from (3).

Proof of Proposition 3: By Lemma 1,

\[ \mathrm{Span}\{M^{(\alpha)}_{SIMR}\} = \mathrm{Span}\{(\mu_1(y), [\mu_2(y) - I]),\ y\} = \mathrm{Span}\{\mu_1(y),\ y\} + \mathrm{Span}\{[\mu_2(y) - I],\ y\} = \mathrm{Span}\{M_{SIR}\} + \mathrm{Span}\{M_{zz'|y}\} = \mathrm{Span}\{M_{SAVE}\}. \]

Proof of Corollary 4: By Corollary 3 and Proposition 3, any column of M_SIR, M_{y-pHd}, or M_SAVE belongs to Span{M^{(α)}_{SIMR}}. Therefore, any linear combination of them still belongs to Span{M^{(α)}_{SIMR}}.

Proof of Theorem 2: One can verify that

\[ \hat U_n = \hat\Sigma_x^{-1/2}(C_n, M_n)\begin{pmatrix} \sqrt{1-\alpha}\,(\hat F\hat G^{1/2})\otimes\hat\Sigma_x^{-1/2} & 0\\ 0 & \sqrt{\alpha}\,\hat F\hat G^{1/2}\end{pmatrix}, \qquad
B = \Sigma_x^{-1/2}(C, M)\begin{pmatrix} \sqrt{1-\alpha}\,(FG^{1/2})\otimes\Sigma_x^{-1/2} & 0\\ 0 & \sqrt{\alpha}\,FG^{1/2}\end{pmatrix}, \]

and that

\[ (\Gamma_{12}'B_1, \Gamma_{12}'B_2) = 0_{(p-d)\times(pH+H)}, \qquad B_1\Gamma_{221} + B_2\Gamma_{222} = 0_{p\times(pH+H-d)}, \]
\[ \mathrm{Span}\{C'\Sigma_x^{-1/2}\Gamma_{12}\} \subseteq \mathrm{Span}\{1_H\otimes I_p\}, \qquad \mathrm{Span}\{M'\Sigma_x^{-1/2}\Gamma_{12}\} \subseteq \mathrm{Span}\{1_H\}, \qquad 1_H'\hat F = 0, \qquad 1_H'F = 0. \]

Writing Î_p = Σ̂_x^{-1/2}Σ_x^{1/2} and expanding √n Γ_12'Û_nΓ_22 = √n Γ_12'Û_{n,1}Γ_221 + √n Γ_12'Û_{n,2}Γ_222 with Î_p = (Î_p − I_p) + I_p, C_n = (C_n − C) + C, M_n = (M_n − M) + M, and F̂Ĝ^{1/2} = (F̂Ĝ^{1/2} − FG^{1/2}) + FG^{1/2}, the facts above eliminate the terms involving C and M, and the remaining cross terms are O_p(n^{-1/2}). Hence

\[ \sqrt{n}\,\Gamma_{12}'\hat U_n\Gamma_{22} = \sqrt{n}\,\Gamma_{12}'\Sigma_x^{-1/2}\big[(C_n, M_n) - (C, M)\big]\begin{pmatrix} \sqrt{1-\alpha}\,(FG^{1/2})\otimes\Sigma_x^{-1/2} & 0\\ 0 & \sqrt{\alpha}\,FG^{1/2}\end{pmatrix}\Gamma_{22} + O_p(n^{-1/2}). \]

Therefore, the asymptotic distribution of Γ_12'Û_nΓ_22 is determined only by the asymptotic distribution of (C_n, M_n).

The details of Δ_0, which is (p²H + pH + p) × (p²H + pH + p), are:

\[ \Delta_0 = \begin{pmatrix} \Delta_0^{1,1} & \Delta_0^{1,2} & \Delta_0^{1,3} \\ \Delta_0^{2,1} & \Delta_0^{2,2} & \Delta_0^{2,3} \\ \Delta_0^{3,1} & \Delta_0^{3,2} & \Delta_0^{3,3} \end{pmatrix}, \]

where

\[
\begin{aligned}
\Delta_0^{1,1} &= \mathrm{diag}\{\ldots, \mathrm{Cov}(\mathrm{Vec}(xx') \mid \tilde y = h)/f_h, \ldots\}_{p^2H \times p^2H},\\
\Delta_0^{2,1} &= \mathrm{diag}\{\ldots, \mathrm{Cov}(x, \mathrm{Vec}(xx') \mid \tilde y = h)/f_h, \ldots\}_{pH \times p^2H},\\
\Delta_0^{2,2} &= \mathrm{diag}\{\ldots, \mathrm{Cov}(x \mid \tilde y = h)/f_h, \ldots\}_{pH \times pH},\\
\Delta_0^{3,1} &= [\ldots, \mathrm{Cov}(x, \mathrm{Vec}(xx') \mid \tilde y = h), \ldots]_{p \times p^2H},\\
\Delta_0^{3,2} &= [\ldots, \mathrm{Cov}(x \mid \tilde y = h), \ldots]_{p \times pH},\\
\Delta_0^{3,3} &= \Sigma_x \quad (p \times p),\\
\Delta_0^{1,2} &= (\Delta_0^{2,1})', \qquad \Delta_0^{1,3} = (\Delta_0^{3,1})', \qquad \Delta_0^{2,3} = (\Delta_0^{3,2})'.
\end{aligned}
\]

References

Breiman, L., Friedman, J., 1985. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association 80.

Bura, E., Cook, R.D., 2001. Extending sliced inverse regression: the weighted chi-squared test. Journal of the American Statistical Association 96.

Cook, R.D., 1994a. On the interpretation of regression plots. Journal of the American Statistical Association 89.

Cook, R.D., 1994b. Using dimension-reduction subspaces to identify important inputs in models of physical systems. In: Proceedings of the Section on Physical and Engineering Sciences. American Statistical Association, Alexandria, VA.

Cook, R.D., 1996. Graphics for regressions with a binary response. Journal of the American Statistical Association 91.

Cook, R.D., 1998a. Principal Hessian directions revisited (with discussion). Journal of the American Statistical Association 93.

Cook, R.D., 1998b. Regression Graphics: Ideas for Studying Regressions through Graphics. Wiley, New York.

Cook, R.D., Critchley, F., 2000. Identifying regression outliers and mixtures graphically. Journal of the American Statistical Association 95.

Cook, R.D., Li, B., 2002. Dimension reduction for conditional mean in regression. Annals of Statistics 30.

Cook, R.D., Lee, H., 1999. Dimension-reduction in binary response regression. Journal of the American Statistical Association 94.

Cook, R.D., Ni, L., 2005. Sufficient dimension reduction via inverse regression: a minimum discrepancy approach. Journal of the American Statistical Association 100.

Cook, R.D., Weisberg, S., 1991. Discussion of "Sliced inverse regression for dimension reduction". Journal of the American Statistical Association 86.

Cook, R.D., Weisberg, S., 1994. An Introduction to Regression Graphics. Wiley, New York.

Cook, R.D., Yin, X., 2001. Dimension reduction and visualization in discriminant analysis (with discussion). Australian & New Zealand Journal of Statistics 43.

Cramér, H., 1946. Mathematical Methods of Statistics. Princeton University Press, Princeton.

Eaton, M.L., Tyler, D.E., 1994. The asymptotic distributions of singular values with applications to canonical correlations and correspondence analysis. Journal of Multivariate Analysis 50.

Gannoun, A., Saracco, J., 2003. Asymptotic theory for the SIR_α method. Statistica Sinica 13.

Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York.

Hooper, J., 1959. Simultaneous equations and canonical correlation theory. Econometrica 27.

Hotelling, H., 1936. Relations between two sets of variates. Biometrika 28.

Li, K.-C., 1991a. Sliced inverse regression for dimension reduction (with discussion). Journal of the American Statistical Association 86.

Li, K.-C., 1991b. Rejoinder to "Sliced inverse regression for dimension reduction". Journal of the American Statistical Association 86.

Li, K.-C., 1992. On principal Hessian directions for data visualization and dimension reduction: another application of Stein's lemma. Journal of the American Statistical Association 87.

Satorra, A., Bentler, P.M., 1994. Corrections to test statistics and standard errors in covariance structure analysis. In: von Eye, A., Clogg, C.C. (Eds.), Latent Variables Analysis: Applications for Developmental Research. Sage, Newbury Park, CA.

Satterthwaite, F.E., 1941. Synthesis of variance. Psychometrika 6.

Shao, Y., Cook, R.D., Weisberg, S., 2007. Marginal tests with sliced average variance estimation. Biometrika 94.

Weisberg, S., 2002. Dimension reduction regression in R. Journal of Statistical Software 7.

Weisberg, S., 2009. The dr package.

Wen, X., Cook, R.D., 2007. Optimal sufficient dimension reduction in regressions with categorical predictors. Journal of Statistical Planning and Inference 137.

Wood, A., 1989. An F-approximation to the distribution of a linear combination of chi-squared random variables. Communications in Statistics, Part B - Simulation and Computation 18.

Yin, X., Cook, R.D., 2002. Dimension reduction for the conditional kth moment in regression. Journal of the Royal Statistical Society, Ser. B 64.

Yin, X., Cook, R.D., 2003. Estimating central subspaces via inverse third moments. Biometrika 90.

Ye, Z., Weiss, R.E., 2003. Using the bootstrap to select one of a new class of dimension reduction methods. Journal of the American Statistical Association 98.

Table 1: Simulated Example: Estimated CDRS (first through fourth eigenvectors) by SIR (H = 10), r-pHd, SAVE (H = 10), and SIMR_α (H = 10), α = 0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95.

Note: Parentheses indicate a nonsignificant direction at level 0.05 based on the corresponding test for the dimension of the regression.

Table 2: Simulated Example: p-values of SIR (H = 10), r-pHd, SAVE (H = 10), and SIMR_α (H = 10), α = 0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95, as well as the 1 − r distance from the true CDRS. Rows: null hypotheses d = 0, 1, 2, 3 and the 1 − r distance; columns: the methods listed above.

Figure 1: Optimal α according to the log distance from the true CDRS (left panel; arccos(q), 1 − q, and 1 − r are the three types of distance) or the p-values of the weighted chi-squared tests (right panel; d = 3 indicates the test of d = 2 versus d ≥ 3, and so on). Both panels are plotted against α.

Figure 2: Optimal α according to the variability of 200 bootstrapped estimated CDRS (d = 3 indicates that the first 3 eigenvectors are considered, and so on). Left panel: mean of log(1 − r); right panel: log mean of (1 − r); both plotted against α.

Table 3: Empirical Power and Size of Marginal Dimension Tests for SIR, SAVE, SIMR_α with α Chosen by the p-value Criterion, and r-pHd, as well as the Mean of the 1 − r Distances between the Estimated 3-Dimensional CDRS and the True CDRS, Based on 1000 Simulations (significance level: 0.05; sample sizes: 200, 400, 600; numbers of slices: 5, 10, 15). For each sample size and number of slices, rows give the null hypotheses d = 0, 1, 2, 3 and mean(1 − r); columns give SIR, SAVE, SIMR_α, and r-pHd.


More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Addressing ourliers 1 Addressing ourliers 2 Outliers in Multivariate samples (1) For

More information

Factor Analysis (10/2/13)

Factor Analysis (10/2/13) STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.

More information

An Introduction to Spectral Learning

An Introduction to Spectral Learning An Introduction to Spectral Learning Hanxiao Liu November 8, 2013 Outline 1 Method of Moments 2 Learning topic models using spectral properties 3 Anchor words Preliminaries X 1,, X n p (x; θ), θ = (θ 1,

More information

Diagnostics for Linear Models With Functional Responses

Diagnostics for Linear Models With Functional Responses Diagnostics for Linear Models With Functional Responses Qing Shen Edmunds.com Inc. 2401 Colorado Ave., Suite 250 Santa Monica, CA 90404 (shenqing26@hotmail.com) Hongquan Xu Department of Statistics University

More information

Interaction effects for continuous predictors in regression modeling

Interaction effects for continuous predictors in regression modeling Interaction effects for continuous predictors in regression modeling Testing for interactions The linear regression model is undoubtedly the most commonly-used statistical model, and has the advantage

More information

Robustness of Principal Components

Robustness of Principal Components PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

More information

MATRICES ARE SIMILAR TO TRIANGULAR MATRICES

MATRICES ARE SIMILAR TO TRIANGULAR MATRICES MATRICES ARE SIMILAR TO TRIANGULAR MATRICES 1 Complex matrices Recall that the complex numbers are given by a + ib where a and b are real and i is the imaginary unity, ie, i 2 = 1 In what we describe below,

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data. Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

More information

The Hilbert Space of Random Variables

The Hilbert Space of Random Variables The Hilbert Space of Random Variables Electrical Engineering 126 (UC Berkeley) Spring 2018 1 Outline Fix a probability space and consider the set H := {X : X is a real-valued random variable with E[X 2

More information

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012 Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component

More information

Canonical Correlation Analysis of Longitudinal Data

Canonical Correlation Analysis of Longitudinal Data Biometrics Section JSM 2008 Canonical Correlation Analysis of Longitudinal Data Jayesh Srivastava Dayanand N Naik Abstract Studying the relationship between two sets of variables is an important multivariate

More information

Factor Analysis. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Factor Analysis. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Factor Analysis Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA 1 Factor Models The multivariate regression model Y = XB +U expresses each row Y i R p as a linear combination

More information

A note on sufficient dimension reduction

A note on sufficient dimension reduction Statistics Probability Letters 77 (2007) 817 821 www.elsevier.com/locate/stapro A note on sufficient dimension reduction Xuerong Meggie Wen Department of Mathematics and Statistics, University of Missouri,

More information

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu Dimension Reduction Techniques Presented by Jie (Jerry) Yu Outline Problem Modeling Review of PCA and MDS Isomap Local Linear Embedding (LLE) Charting Background Advances in data collection and storage

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

Learning gradients: prescriptive models

Learning gradients: prescriptive models Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Laurenz Wiskott Institute for Theoretical Biology Humboldt-University Berlin Invalidenstraße 43 D-10115 Berlin, Germany 11 March 2004 1 Intuition Problem Statement Experimental

More information

Supplementary Materials for Tensor Envelope Partial Least Squares Regression

Supplementary Materials for Tensor Envelope Partial Least Squares Regression Supplementary Materials for Tensor Envelope Partial Least Squares Regression Xin Zhang and Lexin Li Florida State University and University of California, Bereley 1 Proofs and Technical Details Proof of

More information

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support

More information

Cointegration Lecture I: Introduction

Cointegration Lecture I: Introduction 1 Cointegration Lecture I: Introduction Julia Giese Nuffield College julia.giese@economics.ox.ac.uk Hilary Term 2008 2 Outline Introduction Estimation of unrestricted VAR Non-stationarity Deterministic

More information

The lasso, persistence, and cross-validation

The lasso, persistence, and cross-validation The lasso, persistence, and cross-validation Daniel J. McDonald Department of Statistics Indiana University http://www.stat.cmu.edu/ danielmc Joint work with: Darren Homrighausen Colorado State University

More information

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

MTH Linear Algebra. Study Guide. Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education

MTH Linear Algebra. Study Guide. Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education MTH 3 Linear Algebra Study Guide Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education June 3, ii Contents Table of Contents iii Matrix Algebra. Real Life

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

LECTURE NOTE #10 PROF. ALAN YUILLE

LECTURE NOTE #10 PROF. ALAN YUILLE LECTURE NOTE #10 PROF. ALAN YUILLE 1. Principle Component Analysis (PCA) One way to deal with the curse of dimensionality is to project data down onto a space of low dimensions, see figure (1). Figure

More information

Motivating the Covariance Matrix

Motivating the Covariance Matrix Motivating the Covariance Matrix Raúl Rojas Computer Science Department Freie Universität Berlin January 2009 Abstract This note reviews some interesting properties of the covariance matrix and its role

More information

Uniqueness of the Solutions of Some Completion Problems

Uniqueness of the Solutions of Some Completion Problems Uniqueness of the Solutions of Some Completion Problems Chi-Kwong Li and Tom Milligan Abstract We determine the conditions for uniqueness of the solutions of several completion problems including the positive

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

The Multivariate Gaussian Distribution [DRAFT]

The Multivariate Gaussian Distribution [DRAFT] The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and

More information

Fused estimators of the central subspace in sufficient dimension reduction

Fused estimators of the central subspace in sufficient dimension reduction Fused estimators of the central subspace in sufficient dimension reduction R. Dennis Cook and Xin Zhang Abstract When studying the regression of a univariate variable Y on a vector x of predictors, most

More information

A Generalization of Principal Component Analysis to the Exponential Family

A Generalization of Principal Component Analysis to the Exponential Family A Generalization of Principal Component Analysis to the Exponential Family Michael Collins Sanjoy Dasgupta Robert E. Schapire AT&T Labs Research 8 Park Avenue, Florham Park, NJ 7932 mcollins, dasgupta,

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information

Independent component analysis for functional data

Independent component analysis for functional data Independent component analysis for functional data Hannu Oja Department of Mathematics and Statistics University of Turku Version 12.8.216 August 216 Oja (UTU) FICA Date bottom 1 / 38 Outline 1 Probability

More information

ECE 598: Representation Learning: Algorithms and Models Fall 2017

ECE 598: Representation Learning: Algorithms and Models Fall 2017 ECE 598: Representation Learning: Algorithms and Models Fall 2017 Lecture 1: Tensor Methods in Machine Learning Lecturer: Pramod Viswanathan Scribe: Bharath V Raghavan, Oct 3, 2017 11 Introduction Tensors

More information

Bivariate Relationships Between Variables

Bivariate Relationships Between Variables Bivariate Relationships Between Variables BUS 735: Business Decision Making and Research 1 Goals Specific goals: Detect relationships between variables. Be able to prescribe appropriate statistical methods

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

Forecasting Macroeconomic Variables

Forecasting Macroeconomic Variables Cluster-Based Regularized Sliced Inverse Regression for Forecasting Macroeconomic Variables arxiv:1110.6135v2 [stat.ap] 2 Dec 2013 Yue Yu 1, Zhihong Chen 2, and Jie Yang 3 1 TradeLink L.L.C., Chicago,

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

An Introduction to Multivariate Statistical Analysis

An Introduction to Multivariate Statistical Analysis An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents

More information

Discriminant Analysis with High Dimensional. von Mises-Fisher distribution and

Discriminant Analysis with High Dimensional. von Mises-Fisher distribution and Athens Journal of Sciences December 2014 Discriminant Analysis with High Dimensional von Mises - Fisher Distributions By Mario Romanazzi This paper extends previous work in discriminant analysis with von

More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form

More information

Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club

Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club 36-825 1 Introduction Jisu Kim and Veeranjaneyulu Sadhanala In this report

More information

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection

More information

Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global

Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global homas Laurent * 1 James H. von Brecht * 2 Abstract We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 4: Factor analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2017/2018 Master in Mathematical Engineering Pedro

More information

Economics 620, Lecture 5: exp

Economics 620, Lecture 5: exp 1 Economics 620, Lecture 5: The K-Variable Linear Model II Third assumption (Normality): y; q(x; 2 I N ) 1 ) p(y) = (2 2 ) exp (N=2) 1 2 2(y X)0 (y X) where N is the sample size. The log likelihood function

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Journal of Multivariate Analysis. Sphericity test in a GMANOVA MANOVA model with normal error

Journal of Multivariate Analysis. Sphericity test in a GMANOVA MANOVA model with normal error Journal of Multivariate Analysis 00 (009) 305 3 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Sphericity test in a GMANOVA MANOVA

More information

Statistical Inference of Covariate-Adjusted Randomized Experiments

Statistical Inference of Covariate-Adjusted Randomized Experiments 1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu

More information

A Note on Hilbertian Elliptically Contoured Distributions

A Note on Hilbertian Elliptically Contoured Distributions A Note on Hilbertian Elliptically Contoured Distributions Yehua Li Department of Statistics, University of Georgia, Athens, GA 30602, USA Abstract. In this paper, we discuss elliptically contoured distribution

More information

Stat 159/259: Linear Algebra Notes

Stat 159/259: Linear Algebra Notes Stat 159/259: Linear Algebra Notes Jarrod Millman November 16, 2015 Abstract These notes assume you ve taken a semester of undergraduate linear algebra. In particular, I assume you are familiar with the

More information

Total Least Squares Approach in Regression Methods

Total Least Squares Approach in Regression Methods WDS'08 Proceedings of Contributed Papers, Part I, 88 93, 2008. ISBN 978-80-7378-065-4 MATFYZPRESS Total Least Squares Approach in Regression Methods M. Pešta Charles University, Faculty of Mathematics

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

Regression #5: Confidence Intervals and Hypothesis Testing (Part 1)

Regression #5: Confidence Intervals and Hypothesis Testing (Part 1) Regression #5: Confidence Intervals and Hypothesis Testing (Part 1) Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #5 1 / 24 Introduction What is a confidence interval? To fix ideas, suppose

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Day 4: Shrinkage Estimators

Day 4: Shrinkage Estimators Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have

More information

The LIML Estimator Has Finite Moments! T. W. Anderson. Department of Economics and Department of Statistics. Stanford University, Stanford, CA 94305

The LIML Estimator Has Finite Moments! T. W. Anderson. Department of Economics and Department of Statistics. Stanford University, Stanford, CA 94305 The LIML Estimator Has Finite Moments! T. W. Anderson Department of Economics and Department of Statistics Stanford University, Stanford, CA 9435 March 25, 2 Abstract The Limited Information Maximum Likelihood

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

9.1 Orthogonal factor model.

9.1 Orthogonal factor model. 36 Chapter 9 Factor Analysis Factor analysis may be viewed as a refinement of the principal component analysis The objective is, like the PC analysis, to describe the relevant variables in study in terms

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information