WALD CONSISTENCY AND THE METHOD OF SIEVES IN REML ESTIMATION. By Jiming Jiang Case Western Reserve University
The Annals of Statistics 1997, Vol. 25, No. 4

WALD CONSISTENCY AND THE METHOD OF SIEVES IN REML ESTIMATION

By Jiming Jiang

Case Western Reserve University

We prove that for all unconfounded balanced mixed models of the analysis of variance, estimates of variance components parameters that maximize the (restricted) Gaussian likelihood are consistent and asymptotically normal, and this is true whether normality is assumed or not. For a general (nonnormal) mixed model, we show that estimates of the variance components parameters that maximize the (restricted) Gaussian likelihood over a sequence of approximating parameter spaces (i.e., a sieve) constitute a consistent sequence of roots of the REML equations, and that the sequence is also asymptotically normal. The results do not require the rank p of the design matrix of the fixed effects to be bounded. An example shows that, in some unbalanced cases, estimates that maximize the Gaussian likelihood over the full parameter space can be inconsistent under the condition that ensures consistency of the sieve estimates.

1. Introduction. In many cases, exploration of the asymptotic properties of maximum likelihood (ML) estimates has led to one of two types of consistency: the Cramér (1946) or the Wald (1949) type. The former establishes the consistency of some root of the likelihood equation(s), but usually gives no indication of how to identify such a root when the roots of the likelihood equation(s) are not unique. The latter, however, states that an estimate of the parameter vector that maximizes the likelihood function is consistent. Therefore it is not unusual that Wald consistency sometimes fails whereas the Cramér one holds [e.g., Le Cam (1979)]. Often, obstacles to Wald consistency arise on the boundary of the parameter space. For this reason, the maximization is sometimes carried out over a sequence of subspaces which may belong to the interior of the parameter space and approach it as the sample size increases.
Methods of this type have been studied in the literature, among them those that Grenander (1981) called sieves. In this paper we consider the asymptotic behavior of REML (restricted, or residual, maximum likelihood) estimates in variance components estimation. The REML method was first proposed by W. A. Thompson (1962). Several authors have given overviews of REML, which can be found in Harville (1977), Khuri and Sahai (1985), Robinson (1987) and Searle, Casella and McCulloch (1992). A general mixed model can be written as

(1) y = Xβ + Z_1 α_1 + ··· + Z_s α_s + ε,

Received February 1995; revised October.
AMS 1991 subject classification. 62F12.
Key words and phrases. Mixed models, restricted maximum likelihood, Wald consistency, the method of sieves.
1781
where y is an N × 1 vector of observations, X is an N × p known matrix of full rank p, β is a p × 1 vector of unknown constants (the fixed effects), Z_i is an N × m_i known matrix, α_i is an m_i × 1 vector of i.i.d. random variables with mean 0 and variance σ_i² (the random effects), i = 1, ..., s; and ε is an N × 1 vector of i.i.d. random variables with mean 0 and variance σ_0² (the errors). Asymptotic results on maximum likelihood type estimates for model (1) are few in number, with or without normality assumptions. Assuming normality, and the model having a standard ANOVA structure, Miller (1977) proved a result of Cramér consistency and asymptotic normality for the maximum likelihood estimates (MLE) of both the fixed effects and the variance components σ_0², ..., σ_s². Under conditions slightly stronger than those of Miller, Das (1979) obtained a similar result for the REML estimates. Also under the normality assumption, Cressie and Lahiri (1993) gave conditions under which Cramér consistency and asymptotic normality of the REML estimates in a general mixed model would hold, using a result of Sweeting (1980). Without assuming normality, in which case the REML estimates are defined as solutions of the REML equations derived under normality, Richardson and Welsh (1994) proved Cramér consistency and asymptotic normality for hierarchical (nested) mixed models. A common feature of the above results is that the rank p of the design matrix X of the fixed effects was held fixed. A more important and interesting question, related to the (possible) superiority of REML over straight ML in estimating the variance components, is how the REML estimates behave asymptotically when p increases with the sample size. In such situations a well-known example showing the inconsistency of the MLE is due to Neyman and Scott (1948).
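As a concrete numerical companion to model (1), the following sketch simulates a one-way special case (s = 1). All sizes, designs and parameter values here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Simulation sketch of model (1) for a one-way special case (s = 1); all sizes
# and parameter values below are illustrative assumptions, not from the paper.
rng = np.random.default_rng(0)

p, m1, reps = 2, 50, 4               # rank of X, number of random effects, obs per effect
N = m1 * reps                        # total number of observations
beta = np.array([1.0, -0.5])         # fixed effects
sig2_err, sig2_a = 1.0, 2.0          # variance components sigma_0^2 and sigma_1^2

X = np.column_stack([np.ones(N), rng.standard_normal(N)])   # N x p, full rank p
Z1 = np.kron(np.eye(m1), np.ones((reps, 1)))                # N x m_1 incidence matrix

alpha1 = rng.standard_normal(m1) * np.sqrt(sig2_a)   # i.i.d., mean 0, variance sigma_1^2
eps = rng.standard_normal(N) * np.sqrt(sig2_err)     # i.i.d. errors
y = X @ beta + Z1 @ alpha1 + eps

# Cov(y) = sigma_1^2 Z_1 Z_1' + sigma_0^2 I_N (block diagonal for this design)
V = sig2_a * Z1 @ Z1.T + sig2_err * np.eye(N)
```

Note that normality of alpha1 and eps is used here only for convenience of simulation; the paper's results do not require it.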
The question was answered recently by Jiang (1996), in which Cramér consistency and asymptotic normality of the REML estimates were proved under the assumption that the model is asymptotically identifiable and infinitely informative. The results do not require boundedness of p, normality, or any structure (such as an ANOVA or nested design) for the model. As a consequence, Cramér consistency was shown to hold for all unconfounded balanced mixed models of the analysis of variance. In contrast to the preceding results, the only consistency result in variance components estimation that might be considered of Wald type was Hartley and Rao (1967) for the MLE. As was noted by Rao (1977), a corrected version of Hartley and Rao's proof would establish the Wald consistency of the MLE, under the assumption that the number of observations falling into any particular level of any random factor stays below a universal constant. To our knowledge, there has been no discussion of the method of sieves in variance components estimation. From both theoretical and practical points of view, consistency of Wald type (nonsieve or sieve) is more gratifying than that of Cramér type, although the former often requires stronger assumptions. Since in nonnormal cases [e.g., Richardson and Welsh (1994), Jiang (1996)] the REML estimates are defined as a solution of the (normal) REML equations, a natural question regarding Wald-type consistency is whether the solution that maximizes the (restricted) Gaussian likelihood is consistent. The aim of this paper is to establish the following.
1. Wald consistency holds for all unconfounded balanced mixed models of the analysis of variance. That is, estimates of the variance components parameters that maximize the restricted Gaussian likelihood are consistent.

2. The method of sieves works for all mixed models, provided the models are asymptotically identifiable and infinitely informative globally. That is, there is a clear way of constructing the sieve which will produce consistent estimates.

3. In both 1 and 2 the estimates are also asymptotically normal.

Again the results do not require normality, boundedness of p, or structures for the model. The Wald consistency in the balanced case is also obtained without the assumption that the true parameter vector is an interior point of the parameter space. We also give an example showing that in some unbalanced cases there is sieve Wald consistency but no Wald consistency. The restricted Gaussian likelihood is given in Section 2. Wald consistency in the balanced case is proved in Section 3. In Section 4 we discuss the method of sieves for the general mixed model (1). An example and a counterexample are given in Section 5. In Section 6 we make some remarks about the techniques we use and about open problems.

2. Definitions, basic assumptions, and a simple lemma. There are two parametrizations of the variance components: θ_0 = λ = σ_0², θ_i = µ_i = σ_i²/σ_0², 1 ≤ i ≤ s [Hartley and Rao (1967)], and φ_i = σ_i², 0 ≤ i ≤ s. There is a one-to-one correspondence between the two sets of parameters. In this paper, estimates of the two sets of parameters are equivalent in the sense that consistency for one set of parameters implies that for the other. The parameter spaces are Θ = {θ : λ > 0, µ_i ≥ 0, 1 ≤ i ≤ s} and Φ = {φ : σ_0² > 0, σ_i² ≥ 0, 1 ≤ i ≤ s}. The true parameter vectors will be denoted by θ_0 = (θ_0i)_{0≤i≤s} = (λ_0, µ_0) = (λ_0, (µ_0i)_{1≤i≤s}) and φ_0 = (φ_0i)_{0≤i≤s}.
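The one-to-one correspondence between the two parametrizations can be written down directly; a minimal sketch (the function names are ours, not the paper's):

```python
import numpy as np

# The two parametrizations of Section 2: theta = (lambda, mu_1, ..., mu_s) with
# lambda = sigma_0^2 and mu_i = sigma_i^2 / sigma_0^2, versus
# phi = (sigma_0^2, ..., sigma_s^2).  The map is one-to-one because sigma_0^2 > 0.
def theta_to_phi(lam, mu):
    return np.concatenate([[lam], lam * np.asarray(mu, dtype=float)])

def phi_to_theta(phi):
    phi = np.asarray(phi, dtype=float)
    return phi[0], phi[1:] / phi[0]      # lambda = phi_0, mu_i = phi_i / phi_0
```

Consistency of an estimate in one parametrization therefore carries over to the other by the continuous mapping above.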
Given the data y, the restricted (or residual) Gaussian log-likelihood is the Gaussian log-likelihood based on z = A'y, where A is an N × (N − p) matrix such that

(2) rank(A) = N − p, A'X = 0.

Thus the restricted Gaussian log-likelihoods for estimating the two sets of parameters (θ_i)_{0≤i≤s} and (φ_i)_{0≤i≤s} are given, respectively, by

(3) L(θ) = c − ((N − p)/2) log λ − (1/2) log |V_A(µ)| − (1/(2λ)) z'V_A(µ)^{-1} z

and

(4) L(φ) = c − (1/2) log |U_A(φ)| − (1/2) z'U_A(φ)^{-1} z,

where c = −((N − p)/2) log 2π, V_A(µ) = G_0 + Σ_{i=1}^s µ_i G_i, U_A(φ) = Σ_{i=0}^s φ_i G_i = λ V_A(µ), with G_0 = A'A and G_i = A'Z_i Z_i'A, 1 ≤ i ≤ s. The maximizers θ̂ = (θ̂_i)_{0≤i≤s} and φ̂ = (φ̂_i)_{0≤i≤s} of (3) and (4) do not depend on the choice of A so long as (2) is satisfied. Therefore we may assume, w.l.o.g., that

(5) A'A = I_{N−p}.

Note that the relation between (θ̂_i)_{0≤i≤s} and (φ̂_i)_{0≤i≤s} is the same as that between the parameters. The dependence of V_A(µ) and U_A(φ) on A is not important in this paper; therefore we will abbreviate these by V(µ) and U(φ), respectively. Also w.l.o.g., we can focus on a sequence of designs indexed by N, since consistency and asymptotic normality hold iff they hold for any sequence with N increasing strictly monotonically. Thus, for example, p and m_i, 1 ≤ i ≤ s, can be considered as functions of N, and y, X, the Z_i's and so on as depending on N [e.g., Jiang (1996)]. Let m_0 = N, α_0 = ε. The following assumptions A1 and A2 are made for model (1).

A1. For each N, α_0, α_1, ..., α_s are mutually independent.

A2. For 0 ≤ i ≤ s, the common distribution of α_i1, ..., α_im_i may depend on N. However, it is required that

(6) lim_{x→∞} sup_N max_{0≤i≤s} E[α_i1⁴ 1_{(|α_i1|>x)}] = 0.

The basic idea of proving Wald consistency (nonsieve or sieve) is simple. Consider, for example, the log-likelihood (3). The difference L(θ) − L(θ_0) can be decomposed as

(7) L(θ) − L(θ_0) = e(θ, θ_0) + d(θ, θ_0),

where

(8) e(θ, θ_0) = E_{θ_0}[L(θ) − L(θ_0)] = −(1/2){(N − p) log(λ/λ_0) + log(|V(µ)|/|V(µ_0)|) + (λ_0/λ) tr(V(µ)^{-1}V(µ_0)) − (N − p)},

(9) d(θ, θ_0) = L(θ) − L(θ_0) − E_{θ_0}[L(θ) − L(θ_0)] = −(1/2){z'[(1/λ)V(µ)^{-1} − (1/λ_0)V(µ_0)^{-1}]z − E_{θ_0}(z'[(1/λ)V(µ)^{-1} − (1/λ_0)V(µ_0)^{-1}]z)}.
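For readers who want to compute with these quantities, here is a minimal numerical sketch of the restricted Gaussian log-likelihood (4). The SVD-based choice of A and the helper names are our own assumptions; any A satisfying (2) and (5) gives the same value, and the additive constant c is immaterial for maximization.

```python
import numpy as np

def null_basis(X):
    """An N x (N - p) matrix A with A'X = 0 and A'A = I, as in (2) and (5):
    an orthonormal basis of the orthogonal complement of col(X), via a full SVD."""
    N, p = X.shape
    U = np.linalg.svd(X, full_matrices=True)[0]
    return U[:, p:]

def restricted_loglik(phi, y, X, Zs):
    """The restricted Gaussian log-likelihood (4); phi = (sigma_0^2, ..., sigma_s^2),
    Zs = [Z_1, ..., Z_s]."""
    N, p = X.shape
    A = null_basis(X)
    z = A.T @ y
    G = [A.T @ A] + [A.T @ Z @ Z.T @ A for Z in Zs]    # G_0 = A'A, G_i = A'Z_i Z_i'A
    U_phi = sum(ph * Gi for ph, Gi in zip(phi, G))     # U_A(phi) = sum_i phi_i G_i
    logdet = np.linalg.slogdet(U_phi)[1]
    c = -0.5 * (N - p) * np.log(2.0 * np.pi)
    return c - 0.5 * logdet - 0.5 * z @ np.linalg.solve(U_phi, z)
```

With s = 0 (no random effects) this reduces to the Gaussian likelihood of the N − p residual contrasts, which gives a quick sanity check on the implementation.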
Essentially, what we are going to show is that, with probability tending to 1 and uniformly for θ outside a small neighborhood S_δ(θ_0) = {θ : |θ − θ_0| < δ}, the second term on the RHS of (7) is negligible compared with the first term, which is negative. Therefore the LHS of (7) will be less than 0 for θ ∉ S_δ(θ_0), and the rest of the argument is the same as in Jiang (1996). Let A, B, A_i, 1 ≤ i ≤ n, be matrices. Define |A| = the determinant of A; ‖A‖ = (λ_max(A'A))^{1/2} (λ_max denotes the largest eigenvalue); ‖A‖_R = (tr(A'A))^{1/2}; Cor(A_1, ..., A_n) = (cor(A_i, A_j)) if A_i ≠ 0, 1 ≤ i ≤ n, and 0 otherwise, where cor(A, B) = tr(A'B)/(‖A‖_R ‖B‖_R) if A, B ≠ 0; and diag(A_i) to be the block-diagonal matrix with A_i as its ith diagonal block. Define b(µ) = (I, √µ_1 Z_1, ..., √µ_s Z_s), B(µ) = AV(µ)^{-1}A' [note that B(µ) indeed does not depend on A], B_0(µ) = b(µ)'B(µ)b(µ) and B_i(µ) = b(µ)'B(µ)Z_iZ_i'B(µ)b(µ), i = 1, ..., s. Let p_i, 0 ≤ i ≤ s, be sequences of positive numbers. Denote U_i(φ) = U(φ)^{-1/2}G_iU(φ)^{-1/2}, 0 ≤ i ≤ s; V_0(θ) = (1/λ)I_{N−p}, V_i(θ) = V(µ)^{-1/2}G_iV(µ)^{-1/2}, 1 ≤ i ≤ s; I_ij(θ) = tr(V_i(θ)V_j(θ))/(p_ip_j); and

K_ij(θ) = (1/(2p_ip_j)) Σ_{l=1}^{N+m} (EW_l⁴ − 3)(B_i(µ)_{ll}/λ)(B_j(µ)_{ll}/λ), i, j = 0, 1, ..., s,

where m = m_1 + ··· + m_s, B_{ll} denotes the lth diagonal element of B, and W_l = ε_l/√λ_0 for 1 ≤ l ≤ N, W_l = α_{i, l−N−Σ_{k<i}m_k}/√(λ_0µ_0i) for N + Σ_{k<i}m_k + 1 ≤ l ≤ N + Σ_{k≤i}m_k, 1 ≤ i ≤ s; or W = (W_l)_{1≤l≤N+m} = (ε'/√λ_0, α_1'/√(λ_0µ_01), ..., α_s'/√(λ_0µ_0s))', where α_i/√(λ_0µ_0i) is understood as 0 (α_ij/√(λ_0µ_0i) := 0, 1 ≤ j ≤ m_i) if µ_0i = 0. Let I(θ) = (I_ij(θ)), K(θ) = (K_ij(θ)), J(θ) = I(θ) + K(θ). We say model (1) has positive variance components if θ_0 is an interior point of Θ, and is nondegenerate if

(10) inf_N min_{0≤i≤s} var(α_i1²) > 0.

A sequence of estimates (θ̂_0, ..., θ̂_s) is called asymptotically normal if there are sequences of numbers p_i, 0 ≤ i ≤ s, and matrices M(θ_0) with lim sup ‖M^{-1}(θ_0)‖ ‖M(θ_0)‖ < ∞ such that

(11) M(θ_0)(p_0(θ̂_0 − θ_00), ..., p_s(θ̂_s − θ_0s))' → N(0, I_{s+1}) in distribution.

Similarly we define the asymptotic normality of estimates of the φ's.
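The sign structure of the first term in the decomposition (7) can be checked numerically: e(θ, θ_0) in (8) is minus a Kullback-Leibler-type divergence between two centered Gaussian laws, so it is 0 at θ = θ_0 and strictly negative whenever λV(µ) ≠ λ_0V(µ_0). A sketch, under assumed toy matrices G_i of our own choosing:

```python
import numpy as np

def e_term(lam, mu, lam0, mu0, G):
    """e(theta, theta_0) of (8), computed from G = [G_0, ..., G_s]."""
    V  = G[0] + sum(m * Gi for m, Gi in zip(mu,  G[1:]))
    V0 = G[0] + sum(m * Gi for m, Gi in zip(mu0, G[1:]))
    n = V.shape[0]                                   # n plays the role of N - p
    ld  = np.linalg.slogdet(V)[1]
    ld0 = np.linalg.slogdet(V0)[1]
    return -0.5 * (n * np.log(lam / lam0) + ld - ld0
                   + (lam0 / lam) * np.trace(np.linalg.solve(V, V0)) - n)
```

Evaluating e_term at the true parameter returns 0; at any other parameter it is negative, which is exactly why the proof can concentrate on showing the d term is uniformly negligible.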
The following lemma will be used in the proofs of both Theorem 3.1 and Theorem 4.1 in the next two sections.

Lemma 2.1. Let ξ = (X_1, ..., X_n)' be a vector of independent random variables such that EX_i = 0, EX_i⁴ < ∞, 1 ≤ i ≤ n, and let A = (a_ij) be an n × n symmetric matrix. Then

var(ξ'Aξ) = Σ_{i=1}^n a_ii² var(X_i²) + 2 Σ_{i≠j} a_ij² var(X_i) var(X_j) ≤ {max_{1≤i≤n} [var(X_i²)/(var X_i)²] ∨ 2} Σ_{i,j=1}^n a_ij² var(X_i) var(X_j),

where var(X_i²)/(var X_i)² is understood as 0 if var(X_i) = 0.

Proof. We have

(12) ξ'Aξ − E(ξ'Aξ) = Σ_{i=1}^n {a_ii(X_i² − EX_i²) + 2(Σ_{j<i} a_ij X_j)X_i},

where Σ_{j<i} = 0 if i = 1. The summands in (12) form a sequence of martingale differences with respect to the σ-fields F_i = σ(X_1, ..., X_i), 1 ≤ i ≤ n. The result then follows easily.

3. The balanced case: Wald consistency. A balanced r-factor mixed model of the analysis of variance can be expressed (after possible reparametrization) in the following way [e.g., Searle, Casella and McCulloch (1992), Rao and Kleffe (1988)]:

(13) y = Xβ + Σ_{i∈S} Z_i α_i + ε,

where X = ⊗_{q=1}^{r+1} 1_{n_q}^{d_q} with d = (d_1, ..., d_{r+1}) ∈ S_{r+1} = {0, 1}^{r+1}; Z_i = ⊗_{q=1}^{r+1} 1_{n_q}^{i_q} with i = (i_1, ..., i_{r+1}) ∈ S ⊆ S_{r+1}; 1_n⁰ = I_n and 1_n¹ = 1_n, with I_n and 1_n being the n-dimensional identity matrix and the n-dimensional vector of 1's, respectively. [For examples, see Jiang (1996).] We assume, as usual, that factor r + 1 corresponds to repetition within cells. As a consequence, in (13) one must have d_{r+1} = 1 and i_{r+1} = 1, i ∈ S. Also we can assume, w.l.o.g., that n_q ≥ 2, 1 ≤ q ≤ r (since if n_q = 1, factor q is not really a factor and the model is not really an r-factor one). The model is called unconfounded if (1) the fixed effects are not confounded with the random effects and errors, that is, rank(X, Z_i) > p, i ∈ S, and X ≠ I_N; and (2) the random effects and errors are not confounded, that is, I_N and Z_iZ_i', i ∈ S, are linearly independent [e.g., Miller (1977)]. Note that under the balanced model (13) we have the expressions m_i = ∏_{i_q=0} n_q, i ∈ S. This allows us to extend the definition of m_i to all i ∈ S_{r+1}. In particular, p = m_d = ∏_{d_q=0} n_q, and N = m_{(0,...,0)} = ∏_{q=1}^{r+1} n_q.
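The Kronecker-product structure of the balanced model (13) is easy to reproduce numerically. The sketch below builds the design matrices for an assumed two-factor layout (r = 2) with cell sizes n = (3, 4, 2) and illustrates m_i = ∏_{i_q=0} n_q and N = ∏ n_q; the specific sizes are illustrative.

```python
import numpy as np

def design(i_tup, sizes):
    """Kronecker factor of model (13): entry 0 contributes I_{n_q}, entry 1 contributes 1_{n_q}."""
    M = np.ones((1, 1))
    for iq, n in zip(i_tup, sizes):
        M = np.kron(M, np.eye(n) if iq == 0 else np.ones((n, 1)))
    return M

sizes = (3, 4, 2)                  # n_1, n_2, n_3; factor r + 1 = 3 is repetition within cells
X   = design((1, 1, 1), sizes)     # grand-mean design, d = (1, 1, 1): N x 1
Za  = design((0, 1, 1), sizes)     # main effect of factor 1: m_i = n_1 = 3 columns
Zb  = design((1, 0, 1), sizes)     # main effect of factor 2: m_i = n_2 = 4 columns
Zab = design((0, 0, 1), sizes)     # interaction: m_i = n_1 * n_2 = 12 columns
N = X.shape[0]                     # N = n_1 * n_2 * n_3 = 24
```

Each "0" coordinate of i contributes an identity factor (and hence a factor n_q to the number of columns m_i), while each "1" coordinate pools that direction, matching the formula m_i = ∏_{i_q=0} n_q above.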
Let S̄ = {(0, ..., 0)} ∪ S, and set µ_i = µ_0i = 1 for i = (0, ..., 0) ∈ S_{r+1}.
7 REML ESTIMATIO 1787 In the balanced case we have the following nice representations for the e θ θ 0 and d θ θ 0 in (7). (14) (15) Lemma 3.1. In the balanced case, e θ θ 0 = 1 S θ d θ θ 0 = 1 r l θ 1ξ l l d where for u v S u v iff u q v q 1 q r + 1, and u v iff u is not v; r l θ = λ 0 C l µ 0 /λc l µ with C l µ = i S µ i /m i 1 i l S θ = l dr l θ 1 log r l θp l with p l = l q =0n q 1 1/ ; and ξ l l d are random variables whose definition will be given later on in the proof. Proof. By the proof of Lemma 7.3 in Jiang (1996), there is an orthogonal matrix T such that T Z i Z i T = diagλ ij T XX T = diagλ dj, where λ i1 λ i = r+1 q=1 λ iqk q 1 k q n q 1 q r + 1 λ d1 λ d = r+1 q=1 λ dqk q 1 k q n q 1 q r + 1 with λ iqw = 1 i q + n q i q δ w 1 i S d (δ x y = 1 if x = y and = 0 otherwise). Let A T = B = b 1 b, then since [by (5)] AA = P X = I XX X 1 X = I p/xx B B = diagγ j, with γ j = 1 p/λ dj 1 j It is easy to see that r+1 q=1 1 i q + n q i q l q = /m i 1 i l i S d l S r+1. Hence r+1 λ iqkq q=1 = m i 1 i δk 1 i S r+1 λ dqkq q=1 = p 1 d δ k 1 where δ k1 = δ k1 1 δ kr+1 1 for k = k 1 k r+1. Thus γ 1 γ = 1 δk 1 d 1 k q n q 1 q r + 1 ow G i = A Z i Z i A = B diagλ ijb = j=1 λ ij b j b j i S, thus Vµ = I p + µ t G t = BB + µ t G t t S t S (16) = ( j=1 1 + ) µ t λ tj b j b j = t S γ j 0 ( 1 + µ t λ tj )b j b j t S since γ j = 0 b j = 0. Hence Vµ 1 = γ j 01 + t S µ t λ tj 1 b j b j since γ j 0 γ j = 1. Let Ɣ + = j γ j 0. Then Ɣ + = p (hereafter E denotes the cardinality of a set E). It follows from (16) that the eigenvalues
8 1788 J. JIAG of Vµ are 1 + t S µ t λ tj j Ɣ +. Thus by (8), e θ θ 0 = 1 { λ0 1 + t S µ 0t λ tj λ1 + t S µ t λ tj 1 log λ } 01 + t S µ 0t λ tj λ1 + t S µ t λ tj = 1 = 1 γ j 0 n 1 k 1 =1 1 l 1 =0 n r+1 k r+1 =1 1 l r+1 =0 = 1 { λ0 C l µ 0 λc l µ l d 1 δk 1 d { λ0 1 + t S µ 0t /m t 1 t δk 1 λ1 + t S µ t /m t 1 t δk 1 } 1 log { λ0 1 + t S µ 0t /m t 1 t l 1 l d λ1 + t S µ t /m t 1 t l } 1 log n q 1 l q =0 1 log λ } 0C l µ 0 n λc l µ q 1 = 1 S θ l q =0 using a formula given at the end of the proof of Lemma 7.3 in Jiang (1996) for the third equation; and by (9), d θ θ 0 = 1 ( [λ ) 1 ( µ t λ tj λ ) 1 ] µ 0t λ tj γ j 0 t S t S = 1 = 1 n 1 k 1 =1 b j z E θ0 b j z n r+1 ( [λ 1 1 l 1 =0 k r+1 =1 1 1 δk 1 d 1 + µ t 1 m t δk 1 t S t ) 1 ( λ ) 1 ] µ 0t 1 m t δk 1 t S t b k z E θ0 b k z l r+1 =0 δ k 1 =l = 1 r l θ 1ξ l l d 1 δk 1 d b k z E θ0 b k z where ξ l = λ 0 C l µ 0 1 δ k 1 =lb k z E θ0 b k z l d. Let S 1 θ = l dr l θ 1 p l S 3θ = l d r l θ 1p l. Define S + = i S µ 0i > 0 S + = 00 0 S + b 0 = min i S + µ 0i 1 i S µ + 0i if S +, and b 0 = 1 if S + = ; ε 0 = λ 0 /3 i S µ 0i 1 ; b = 1+explog /r
9 REML ESTIMATIO (here x means the largest interger less than or equal to x); s = S; and T l = q 1 q r + 1 l q = 1. The following lemma plays a key role in the proof of Wald consistency in the balanced case. (17) (18) (19) Lemma 3.. Suppose model (13) is unconfounded. For any B > 0, { ( ) ( inf S 1 θ 1/ r+3 min m i θ / B i S { ( ) ( inf S θ 1/ r+5 b 1 0 min m i θ/ B i S ( S 3θ sup θ/ B S θ r+3 b 0 inf θ/ B S θ ) 1/ + r+1 ( b log ε 0 B 3ε 0 + sλ 0 ) } ε 0 B 3ε 0 + sλ 0 ) ( ) 1/ min m i i S ) } where B = θ λ λ 0 ε 0 B/ µ l µ 0l b T l B/ m l l S. Proof. Let i = 0 = 0 0 if n r+1 ; i = 0 01 if n r+1 = 1. It is easy to show that i d, i i i S, and p i 1 r+1 m i = 1 r+1. So if λ < /3λ 0 or λ > λ 0, then (0) S 1 θ r i θ 1 p i = λ 0 /λ 1 p i 1/ r+3 If /3λ 0 λ λ 0 and µ l > λ 0 /ε 0 for some l S, then r l θ = λ 0 D l µ 0 /λd l µ < 1/, where D l µ = i S µ i m l /m i 1 i l. So S 1 θ r l θ 1 p l 1/r+ m l. ow suppose /3λ 0 λ λ 0 0 µ l λ 0 /ε 0 l S, and θ / B. If λ λ 0 > ε 0 B/, then as in (0) we have S 1 θ 1/ r+3 ε 0 /λ 0 B. If λ λ 0 ε 0 B/ and µ l µ 0l > b Tl B/ m l for some l S, there is l S such that µ l µ 0l > b Tl B/ m l and µ i µ 0i b Ti B/ m i i S i l i l. Then D l µ 0 D l µ µ 0l µ l µ 0i µ i m l /m i 1 i l i l > B/ m l i S since µ 0i µ i m l /m i 1 i l i l B/ m l i S i l i l b T i = B/ m l b + 1 Tl b Tl B/ m l b Tl 1
10 1790 J. JIAG Therefore, λ 0 D l µ 0 λd l µ λd l µ 0 D l µ λ λ 0 D l µ 0 /3λ 0 B/ m l ε 0 B/ µ 0i i S λ 0 /3B/ m l Hence S 1 θ r l θ 1 p l 1/r+ ε 0 B/3ε 0 + sλ 0 To prove (18), let l S + l = i S + i l such that m l = min i S + m l i if S + l, and l = i if S + l =. Then it is easy to show that C l µ 0 /C l µ 0 b 0 and hence (1) r l θ r l θ = C lµ 0 C l µ C l µ 0 C l µ b 0 If r l θ b 0 for some l d, then r l θ by (1). Therefore, () S θ r l θ 1 log r l θp l ( ) 1 log r l θp l 1/ r+1 1 log min m i i S using that x 1 log x/x 1 log / x. If r l θ < b 0 for all l d, then S θ S 1 θ/4b 0 by x 1 log x x 1 /L 0 < x < L L > 1. To prove (19), let S = i S. Write (3) S 3 θ = For any ε > 0, l d r l θ<b 0 r l θ 1p l + l d r l θ b 0 r l θ 1p l = S 1 + S (4) ( ) 1/ ( S 1 1 r l θ 1 p l l d r l θ<b 0 l d r l θ<b 0 } {ε r ε r l θ 1 p l l d r l θ<b 0 { ε r 1 1 } S θ + 4b 0ε S θ ) 1/ by x ε 1 + εx / and that S θ 4b 0 1 l d r l θ<b 0 r l θ 1 p l. By (1), the facts that (according to the definition) l only takes values in
S = i S and l l (therefore p l p l ), and the fact that ( ) 1 log S θ r i θp i it follows that (25) i S r i θ S b 0 + 1/ r l θp l l d r l θ = b 0 + 1/ r+1 b 0 + 1/ i S r i θ r i θ i S r i θ p l l d l =i r i θp i r+1 b log min i S p i S θ Inequality (19) now follows by (23)-(25), taking ε = (4b_0 inf_{θ∉B} S_2(θ))^{1/2} in (24), and the fact that min_{i∈S̄} p_i ≥ 2^{−(r+1)}(min_{i∈S} m_i)^{1/2}.

Let θ̂ = (λ̂, (µ̂_i)_{i∈S}) be the maximizer of (3) over Θ (one may define θ̂ as any fixed point in Θ when the maximum cannot be reached).

Theorem 3.1. Let the balanced mixed model (13) be unconfounded. As N → ∞ and m_i → ∞, i ∈ S, we have the following:

(i) θ̂ is consistent, and the sequence (√(N − p)(λ̂ − λ_0), √m_i(µ̂_i − µ_0i), i ∈ S) is bounded in probability.

(ii) If, moreover, the model has positive variance components and is nondegenerate, then θ̂ is asymptotically normal with p_0 = √(N − p), p_i = √m_i, i ∈ S, and M(θ_0) = J^{−1/2}(θ_0)I(θ_0).

Remark. Part (i) of Theorem 3.1 does not require that the model have positive variance components. In particular, when that requirement does hold, θ̂ constitutes (with probability tending to 1) a consistent sequence of roots of the REML equations [e.g., Jiang (1996)], which are identical to the ANOVA estimates in the balanced case [e.g., Searle, Casella and McCulloch (1992)]. On the other hand, it is necessary for part (ii) of Theorem 3.1 that the model have positive variance components, since otherwise the estimates, which are nonnegative, cannot be asymptotically normal.

Proof. According to the definitions in Section 2, it is always true (no matter whether µ_0i = 0 or not) that Z_iα_i = √(λ_0µ_0i) Z_i(α_i/√(λ_0µ_0i)), i ∈ S, and

(26) z = A'y = √λ_0 A'b(µ_0)'W.
Note that W is a vector of independent random variables satisfying EW_l = 0, var(W_l) ≤ 1 [var(W_l) = 1 if the corresponding µ_0i ≠ 0], and sup_l EW_l⁴ < ∞. By the definition of ξ_l at the end of the proof of Lemma 3.1, Lemma 2.1 and assumption A2 in Section 2, we see that, with B_(l) = b(µ_0)A(Σ_{δ(k−1)=l} b_k b_k')A'b(µ_0)',

(27) E_{θ_0} ξ_l² = var_{θ_0}(W'B_(l)W)/(C_l(µ_0))² ≤ C(C_l(µ_0))^{−2} tr((Σ_{δ(k−1)=l} b_k b_k' V(µ_0))²) ≤ Cp_l

for some constant C. Thus, for any M > 0,

(28) P_{θ_0}(|ξ_l| > Mp_l for some l ⊴ d) ≤ 2^{r+1}C/M².

For any ε > 0, choose M > (2^{r+1}Cε^{−1})^{1/2}. Then choose B > 0 and K such that, by Lemma 3.2, sup_{θ∉B} S_3(θ)/S_2(θ) < 1/M whenever m_i > K, i ∈ S. It then follows by (7), (14), (15), (18) and (28) that P_{θ_0}(L(θ) < L(θ_0) for all θ ∉ B) > 1 − ε when m_i > K, i ∈ S. The conclusion of (i) then follows by the definition of B. The proof for (ii) is the same as that of Theorem 4.1 of Jiang (1996).

4. The general case: the method of sieves. For convenience we consider in this section the parameters φ_i, 0 ≤ i ≤ s. Let δ_N, M_N be two sequences of positive numbers such that δ_N → 0, M_N → ∞. Let φ̂ = φ̂(δ_N, M_N) be the maximizer of L(φ) of (4) over Φ(δ_N, M_N) = {φ : δ_N ≤ φ_i ≤ M_N, 0 ≤ i ≤ s}. The above procedure is called the method of sieves [e.g., Grenander (1981)]. The following definitions are explained intuitively in Jiang (1996).

Definition 4.1. Model (1) is called asymptotically identifiable under the invariant class (AI²) at φ (θ) if

(29) lim inf λ_min(Cor(V_0(θ), ..., V_s(θ))) > 0

(see Section 2; λ_min means the smallest eigenvalue). Here the invariant class is the class of (location) invariant estimates

(30) I = {estimates which are functions of A'y with A satisfying (2)}.

Definition 4.2. Model (1) is called infinitely informative under the invariant class (I³) at φ (θ) if

(31) lim ‖V_i(θ)‖_R = ∞, 0 ≤ i ≤ s.
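A minimal computational sketch of the sieve procedure, for a simulated one-way model: the maximization of (4) is carried out over the truncated space [δ_N, M_N]² only. A coarse geometric grid stands in for a bounded numerical optimizer here, and the truncation levels and all model sizes are illustrative assumptions rather than the paper's q_N-based rates.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated one-way model: y = 2 + Z alpha + eps, sigma_0^2 = 1, sigma_1^2 = 2.
m, reps = 30, 4
N = m * reps
X = np.ones((N, 1))
Z = np.kron(np.eye(m), np.ones((reps, 1)))
y = 2.0 + Z @ (rng.standard_normal(m) * np.sqrt(2.0)) + rng.standard_normal(N)

A = np.linalg.svd(X, full_matrices=True)[0][:, 1:]   # A'X = 0, A'A = I, as in (2), (5)
z = A.T @ y
G0, G1 = A.T @ A, A.T @ Z @ Z.T @ A

def L(phi):
    """Restricted Gaussian log-likelihood (4), additive constant dropped."""
    U = phi[0] * G0 + phi[1] * G1
    return -0.5 * (np.linalg.slogdet(U)[1] + z @ np.linalg.solve(U, z))

# Sieve: maximize over the truncated space [delta_N, M_N]^2 only (a finite net
# over the sieve, in the spirit of the proof of Theorem 4.1).
delta_N, M_N = 1e-2, 1e2
grid = np.geomspace(delta_N, M_N, 30)
phi_hat = max(((a, b) for a in grid for b in grid), key=L)
```

The point of the truncation is that the maximizer is kept away from the boundary σ_0² = 0 and from infinity, while δ_N → 0 and M_N → ∞ let the sieve grow to the full parameter space.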
13 REML ESTIMATIO 1793 The following notation is needed for this section: λ θ = λ min CorV 0 θ V s θ ρ a b = inf θ a b λ θ τ c d = inf φ c d λ θ where a b = θ a θ i b 0 i s and so on. Lemma 4.1. For any x i 0 i s and φ, ( ){ } s x i U i φ λ θ s x 0 R 4s + 1 V 0θ R + λ x i V iθ R i=1 ( ) λ θ s x 4M i s + 1 V i1 R where M = max 0 i s φ i. Proof. By the relation (see Section ): U 0 φ = V 0 θ s i=1 λ 1 µ i V i θ U i φ = λ 1 V i θ 1 i s, we have (( s s ) ) s x i U i φ = tr y i V i θ λ θ y i V iθ R R where y 0 = x 0 y i = λ 1 x i µ i x 0 1 i s. Let B = 1 i s x i µ i x 0 1/x i, then s y i V iθ R 1 s + 1 x 0 V 0θ R + 1 s s + 1 { x 0 V 0θ R + λ x 0 V 0θ R + y i V iθ R i/ B i B } s x i V iθ R using the fact that V 0 θ R λ 1 µ i V i θ R 1 i s. The second inequality is obvious. Lemma 4.. Suppose model (1) is AI at all θ a b with a > 0, then lim inf ρ a b > 0. Proof. By direct computation it can be shown that (see Section ): corv θ i θ V j θ 4V kθ 4a 1 0 i j k s k The result thus follows by an argument of subsequences [e.g., Billingsley (1986), A10]. Denote, respectively, the interiors of and by and. Corollary 4.1. Suppose model (1) is AI at all θ, then lim inf τ c d > 0 for all 0 < c < d <. i=1
Let model (1) be AI² at all θ ∈ Θ° and I³ at 1 = (1, ..., 1). Define the sequences δ_N and M_N as follows. Let q_N = min_{0≤i≤s} ‖V_i(1)‖_R², and let a, b be positive numbers such that

(32) a + b(1 ∨ (b − 1)) < 2/(3s + 7).

Pick f, g, h such that

(33) 3a + b < f < (2/(s + 1))(1 − a − ((s + 7)/2)b),

(34) 0 < g < min((f − 3a − b)/2, 1 − a − ((s + 7)/2)b − ((s + 1)/2)f),

(35) 0 < h < min(f − 3a − b − 2g, 1 − a − ((s + 7)/2)b − ((s + 1)/2)f − g).

Now pick sequences c_N and d_N such that 0 < c_N ≤ c_1 ≤ 1 ≤ d_1 ≤ d_N and

(36) lim inf q_N^h τ(c_N, d_N) > 0

(by Corollary 4.1, one certainly can do this). Finally, pick a baseline region (δ_B, M_B) such that 0 < δ_B < 1 < M_B < ∞, and let δ_N = δ_B ∧ (c_N q_N^{−a}), M_N = M_B ∨ (d_N q_N^b).

Note. The selection of δ_B and M_B will not make a difference for the following asymptotic result. However, a reasonable choice of δ_B and M_B may be of practical convenience since, when N is not sufficiently large, the region (c_N q_N^{−a}, d_N q_N^b) may be too narrow.

Theorem 4.1. Consider a general mixed model (1) having positive variance components.

(i) If the model is asymptotically identifiable and infinitely informative (AI⁴) at all θ ∈ Θ° (or φ ∈ Φ°), then φ̂ is consistent and the sequence (‖V_i(1)‖_R(φ̂_i − φ_0i), 0 ≤ i ≤ s) is bounded in probability.

(ii) If, moreover, the model is nondegenerate, then φ̂ is asymptotically normal with p_0 = √(N − p), p_i being any sequence ∼ ‖V_i(1)‖_R, 1 ≤ i ≤ s, and M(θ_0) = J^{−1/2}(θ_0)I(θ_0).

Remark 1. Thus, with probability tending to 1, φ̂ constitutes a consistent sequence of roots of the REML equations.

Remark 2. The assumption that model (1) is AI⁴ at all θ ∈ Θ° is equivalent to saying that it is AI² at all θ ∈ Θ° and I³ at some θ ∈ Θ°, and w.l.o.g. one can take θ = 1.
15 REML ESTIMATIO 1795 Proof. (i) Let = φ = φ i 0 i s φ i = δ + k i ε 0 i s for some integers 0 k i < K 0 i s, where K = q b+f ε = M δ /K. Let φ δ M, then by (4) we have a decomposition similar to (7): (37) L φ L φ 0 = ẽ φ φ 0 + d φ φ 0 By an expression similar to (8), we have ẽ φ φ 0 = E φ0 L φ L φ 0 = 1/ { log Uφ 0 1/ Uφ 1 Uφ 0 1/ + tri p Uφ 0 1/ Uφ 1 Uφ 0 1/ } p = 1/ log λ i + 1 λ i i=1 where λ 1 λ p are the eigenvalues of Bφ = Uφ 0 1/ Uφ 1 Uφ 0 1/. Since λ max Bφ 1 M 0 δ 1, where M 0 = max 0 i s φ 0i, we have by x 1 log x x 1 /L 0 < x L L 1 and Lemma 4.1 that (38) (39) δ ẽ φ φ 0 41 M 0 truφ 0 1/ Uφ 1 Uφ 0 1/ I p δ = s 41 M 0 φ i φ 0i U i φ δ τ c d 16s + 11 M 0 M R s V i 1 R φ i φ 0i It is easy to verify the following identity which holds for any φ : where Uφ 0 1 Uφ 1 = s φ i φ 0i Uφ 0 1 G i Uφ s j=0 s H = 1/ j=0 s φ i φ 0i φ j φ 0j Uφ 0 1 G i Uφ 1 G j Uφ 0 1 s φ i φ 0i φ j φ 0j Uφ 0 1 G i Uφ 0 1/ H Uφ 0 1/ G j Uφ 0 1 s φ k φ k Uφ 0 1/ k=0 Uφ 1 G k Uφ 1 + Uφ 1 G k Uφ 1 Uφ 0 1/
16 1796 J. JIAG In the following we pick φ in (39) such that φ i φ i < ε 0 i s. By an expression similar to (9) and by (39) we get (40) d φ φ 0 = L φ L φ 0 E φ0 L φ L φ 0 { s = 1/ φ i φ 0i z Uφ 0 1 G i Uφ 0 1 z E φ0 s s φ i φ 0i φ j φ 0j + j=0 s j=0 ( z Uφ 0 1 G i Uφ 1 G j Uφ 0 1 z E φ0 ) s φ i φ 0i φ j φ 0j z Uφ 0 1 G i Uφ 0 1/ = 1/I 1 I + I 3 E φ0 I 3 It follows from Lemma.1 and (6) that H Uφ 0 1/ G j Uφ 0 1 z E φ0 ( s var φ0 z Uφ 0 1 G i Uφ 0 1 z CU i φ 0 R Cδ 0 V i1 R s j=0 )} where C = varε1 /varε 1 max 1 i s varα i1 /varα i1 δ 0 = min 0 i s φ 0i. Thus with L 1 = 3s + 1C 1/ q g, we have on E 1 = { z Uφ 0 1 G i Uφ 0 1 z E φ0 L 1 δ 1 0 V i1 R 0 i s } that (41) and (4) Similarly, ( s ) 1/ I 1 L 1 s + 1 1/ δ0 1 V i 1 R φ i φ 0i P φ0 E c 1 s var φ0 z z L 1 δ 0 V i1 R 1/3q g var φ0 z Uφ 0 1 G i Uφ 1 G j Uφ 0 1 z CUφ 0 1/ G i Uφ 1 G j Uφ 0 1/ R CU i φ 0 U j φ 0 U i φ R U j φ R Cδ 0 δ V i 1 R V j 1 R (Here is a brief derivation of the second inequality in the above: Uφ 0 1/ G i Uφ 1 G j Uφ 0 1/ R = AB R A A R B B R
17 REML ESTIMATIO 1797 where A = Uφ 0 1/ G i Uφ 1/ B = Uφ 1/ G j Uφ 0 1/. ow A A λ max U i φ 0 U i φ. Using the fact that A B 0 A B tra trb [note that it is not true that A B 0 A B A B (e.g., Chan and Kwong (1985))], we get A A R U i φ 0 U i φ R, and so on.) Thus with L = s + 13K s+1 C1/ q g we have on that (43) and (44) E = { z Uφ 0 1 G i Uφ 1 G j Uφ 0 1 z E φ0 L δ 0 δ 1 V i 1 1/ R V j1 1/ 0 i j s φ } ( s ) I L δ 0 δ 1 V i 1 1/ R φ i φ 0i ( s ) 1/ L s + 1 3/ δ 0 δ 1 M 0 M V i 1 R φ i φ 0i P φ0 E c φ 0 i j s R var φ0 z z L δ 1/3q g 0δ V i 1 R V j 1 R Finally, I 3 = U H U, where U = s φ i φ 0i Uφ 0 1/ G i Uφ 0 1 z. We have s H 1/ φ k φ k Uφ 0 1/ Uφ 0 1/ s + 1M 0 δ ε and k=0 U s + 1 s φ i φ 0i Uφ 0 1/ G i Uφ 0 1 z E φ0 Uφ 0 1/ G i Uφ 0 1 z δ 0 V i1 R Thus with L 3 = 3s+1 1/ q g we have on E 3 = Uφ 0 1/ G i Uφ 0 1 z L 3 δ 1 0 V i1 R 0 i s that s (45) I 3 L 3 s + 1 δ0 M 0δ V i 1 R φ i φ 0i and (46) It also follows that (47) P φ0 E c 3 s ε E φ0 z L 3 δ 0 V i1 R E φ0 I 3 s + 1 δ 0 M 0δ ε 1/3q g s V i 1 R φ i φ 0i
18 1798 J. JIAG Combining (37), (38), (40), (41), (43), (45) and (47), we have on E = E 1 E E 3 that for φ δ M, L φ L φ 0 { Q δ τ c d (48) 16s + 11 M 0 M + 1/s + 1 δ 0 M 0L 3 + 1δ ε + 1/s + 1 1/ δ 1 0 L 1 + s + 1L δ 1 M 0 M Q 1 where Q = s V i 1 R φ i φ 0i 1/. And, combining (4), (44) and (46), we have } (49) P φ0 E c q g ote that by I 3 at 1 (= 1 1 ), q tends to infinity, and so the probability of E tends to 1. It remains to show that for any η > 0 there is 0 > 0 such that for 0 the in (48) is less than 0 for all φ δ M S η φ 0 c, where S η φ 0 = φ φ i φ 0i < η 0 i s. Since δ 0 M as, this implies that ˆφ is consistent. Since φ δ M S η φ 0 c implies Q ηq, in (48) qh τ c d 16s + 11 M 0 q a b h + C η 1 q a+s+3/b+s+1/f+g 1 + C 1 q a f+g where C 1 C are constants. Also lim inf q h τ c d > 0, and ( a b h > max a f + g a + s + 3 b + s + 1 ) f + g 1 by the way we picked a, b, f, g, h. Thus in (48) is less than 0 for large uniformly for all φ δ M S η φ 0 c To prove that V i 1 R ˆφ i φ 0i 0 i s is bounded in probability, replace δ M K ε and q g by δ = 1/δ 0, M = 3/M 0, K, ε = M δ/k and η 1, respectively. Then by the same argument we have with probability greater than or equal to 1 η that for all φ δ M, { L φ L φ 0 Q δτ δ M 16s + 1M 0 M + C 1η δ ε (50) } + C η 1 K s+1/ δ 1 MQ 1 The result follows from (50) and the consistency of ˆφ. The proof of (ii) is the same as that of Theorem 4.3 in Jiang (1996).
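As a numerical companion to the identifiability condition (29) used throughout this section (and anticipating condition (53) of Example 5.1 in the next section), λ_min(Cor(V_0, V_1)) can be computed in closed form for a two-way nested layout from tr V_1 = Σ n_i/(1 + µn_i) and tr V_1² = Σ (n_i/(1 + µn_i))². The n_i patterns below are illustrative choices of ours.

```python
import numpy as np

def lambda_min_cor(n_sizes, mu):
    """lambda_min(Cor(V_0, V_1)) for a two-way nested layout (cf. Example 5.1),
    using tr V_1 = sum n_i/(1 + mu*n_i) and tr V_1^2 = sum (n_i/(1 + mu*n_i))^2."""
    n = np.asarray(n_sizes, dtype=float)
    N, p = 2.0 * n.sum(), len(n)            # N observations, p fixed effects
    t = n / (1.0 + mu * n)
    c = t.sum() / np.sqrt((N - p) * (t ** 2).sum())   # cor(V_0, V_1); lambda cancels
    return 1.0 - abs(c)                     # smallest eigenvalue of the 2 x 2 Cor matrix

balanced   = lambda_min_cor([5] * 100, mu=1.0)   # p/N = 0.1 < 1/2: identifiable
degenerate = lambda_min_cor([1] * 100, mu=1.0)   # all n_i = 1: lambda_min = 0
```

With n_i = 5 for all i the value is exactly 2/3, while with all n_i = 1 the two matrices V_0 and V_1 become perfectly correlated and λ_min drops to 0, which is precisely the asymptotic confounding phenomenon discussed in Example 5.1.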
5. Examples. The first example is used to illustrate the AI⁴ condition.

Example 5.1. Consider the two-way (unbalanced) nested model

(51) y_ijk = β_i + α_ij + ε_ijk,

where i = 1, ..., p; j = 1, 2; k = 1, ..., n_i; and β, α and ε correspond to the fixed effects, the random effects and the errors, respectively. Assume, w.l.o.g., that n_i ≥ 1, 1 ≤ i ≤ p. Let X = diag(1_{n_i} ⊗ 1_2) and Z = diag(1_{n_i} ⊗ I_2); then the model can be written as

(52) y = Xβ + Zα + ε.

So N = 2 Σ_{i=1}^p n_i. Direct calculation shows that H = Z'AA'Z = Z'(I − X(X'X)^{-1}X')Z = diag(n_i(I_2 − (1/2)J_2)), where J = 1 1', and that Z'AV(µ)^{-1}A'Z = H − H(µ^{-1}I_{2p} + H)^{-1}H = diag(n_i(1 + µn_i)^{-1}(I_2 − (1/2)J_2)). Thus

tr(V_1(θ)) = tr(Z'AV(µ)^{-1}A'Z) = Σ_{i=1}^p n_i/(1 + µn_i),

tr(V_1(θ)²) = tr((Z'AV(µ)^{-1}A'Z)²) = Σ_{i=1}^p (n_i/(1 + µn_i))².

Let q = Σ_{n_i>1}(n_i − 1) and r = q/p; then it is easy to show that

(53) lim inf λ_min(Cor(V_0(θ), V_1(θ))) > 0 for all θ ∈ Θ° = {θ = (λ, µ) : λ > 0, µ > 0}

iff lim sup p/N < 1/2 iff lim inf r > 0. Since q ≥ #{1 ≤ i ≤ p : n_i > 1}, lim inf r = 0 would mean the model is asymptotically confounded. Note that s = 1 in this example, so (32)-(35) become: a + b(1 ∨ (b − 1)) < 0.2; 3a + b < f < 1 − a − 4b; 0 < g < min(0.5f − 1.5a − 0.5b, 1 − a − 4b − f); 0 < h < min(f − 3a − b − 2g, 1 − a − 4b − f − g).

The next example is a counterexample showing that AI⁴ is not enough for Wald consistency.

Example 5.2. Let λ_1 = 0 and λ_2, ..., λ_N be positive numbers such that

(54) lim sup [Σ_{i=1}^N λ_i/(1 + µλ_i)] / {N^{1/2}[Σ_{i=1}^N (λ_i/(1 + µλ_i))²]^{1/2}} < 1

and

(55) lim Σ_{i=1}^N (λ_i/(1 + µλ_i))² = ∞
for all µ ≥ 0 [e.g., λ_2 = ··· = λ_{N/2} = a, λ_{N/2+1} = ··· = λ_N = b, where a, b > 0, a ≠ b]. Consider

(56) y_i = √λ_i α_i + ε_i, i = 1, ..., N,

where the α_i's are random effects and the ε_i's are errors satisfying P(ε_1 = 0) > 0. Thus y = Zα + ε, where Z = diag(√λ_i). Equations (54) and (55) imply lim inf λ_min(Cor(V_0(θ), V_1(θ))) > 0 and lim ‖V_i(θ)‖_R = ∞, i = 0, 1, for all θ ∈ Θ = {θ = (λ, µ) : λ > 0, µ ≥ 0}, so the model is AI⁴ not only at all θ ∈ Θ° but at all θ ∈ Θ. Assume φ_0 ∈ Φ°. It is easy to derive

(57) L(φ) − L(φ_0) = −(1/2) Σ_{i=1}^N {log[(σ_0² + σ_1²λ_i)/(σ_00² + σ_01²λ_i)] + [(σ_00² + σ_01²λ_i)/(σ_0² + σ_1²λ_i) − 1] w_i²},

where w_i = (ε_i + √λ_i α_i)/√(σ_00² + σ_01²λ_i), i = 1, ..., N.

Let B = {φ = (σ_0², σ_1²) : σ_i² ≥ σ_0i²/2, i = 0, 1} and φ* = (σ*_0², σ_01²), with

0 < σ*_0² < σ_00² exp{−3δ^{-1}(2 + (σ_00²/σ_01²) Σ_{i=2}^N λ_i^{-1})},

where δ = P_{φ_0}(ε_1 = 0). Define ξ = N^{-1} Σ_{i=1}^N w_i², η = Σ_{i=2}^N λ_i^{-1}w_i² / Σ_{i=2}^N λ_i^{-1}, and E = {ε_1 = 0, ξ ∨ η ≤ 3δ^{-1}}. Then it follows from (57) that, on E, sup_{φ∈B} L(φ) < L(φ*). Thus

P_{φ_0}(sup_{φ∈B} L(φ) < L(φ*)) ≥ P_{φ_0}(E) ≥ δ/3.

Therefore, with probability greater than some positive number, the maximum point φ̂ will not fall into B no matter how large N is. Note that in this example the REML estimates are the same as the MLE.
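The mechanism behind Example 5.2 can be seen in a small simulation: since λ_1 = 0 and P(ε_1 = 0) > 0, on the event {y_1 = 0} the Gaussian log-likelihood grows without bound as σ_0² ↓ 0, so the maximizer over the full parameter space is driven to the boundary. The specific error law and λ pattern below are assumed for illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed specifics: lambda_1 = 0; lambda_i in {1, 4} otherwise; errors equal 0
# with probability 1/2 and +-sqrt(2) with probability 1/4 each (variance 1);
# true variance components sigma_00^2 = sigma_01^2 = 1.
n = 200
lam = np.concatenate([[0.0], np.where(np.arange(1, n) < n // 2, 1.0, 4.0)])
alpha = rng.standard_normal(n)
eps = rng.choice([0.0, np.sqrt(2.0), -np.sqrt(2.0)], p=[0.5, 0.25, 0.25], size=n)
eps[0] = 0.0                     # condition on the positive-probability event {eps_1 = 0}
y = np.sqrt(lam) * alpha + eps   # model (56)

def loglik(s0, s1=1.0):
    """Gaussian log-likelihood of the independent y_i with var(y_i) = s0 + s1*lambda_i."""
    v = s0 + s1 * lam
    return -0.5 * np.sum(np.log(2.0 * np.pi * v) + y ** 2 / v)

# Since y_1 = 0 and lambda_1 = 0, the i = 1 term equals -log(2*pi*s0)/2, which
# tends to +infinity as s0 -> 0 while the remaining terms stay bounded: the
# likelihood is unbounded near the boundary on an event of positive probability.
values = [loglik(s0) for s0 in (1.0, 1e-30, 1e-100)]
```

On this event the full-space Gaussian maximizer thus degenerates to σ_0² = 0, whereas the sieve, by construction, never lets the estimate reach the boundary.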
6. Concluding remarks.

6.1. In virtually all problems regarding Wald consistency there are two sources of difficulty. The first is the arbitrariness of the parameters (here, θ or φ). The second is the randomness of some quantities involved in the likelihood function (here, z = A′y). One does not want these two kinds of difficulty to be tied together, and keeping them apart is the technical origin of our proofs. As we mentioned earlier, the basic idea is to prove that the difference, say L(θ) − L(θ0), is negative for θ outside a small neighborhood of θ0. Since the first term in the difference decomposition [e.g., (7)] is nonrandom, the focus is on the second term (i.e., the d term). In the balanced case we can further decompose d(θ, θ0) as a sum in which each summand is a product of two factors, the first depending only on θ and the second only on z, and the number of summands is bounded. This technique of separating the two kinds of difficulty proves entirely successful, since it then becomes clear how to construct the small neighborhood (see the proof of Theorem 3.1). Unfortunately, this nice decomposition of the d term no longer exists in the unbalanced case (a counterexample we construct makes this clear). In the unbalanced situation we return to the original idea of Wald (1949): divide the parameter space into small regions and approximate the likelihood within each region by its value at, say, the center of the region. Note that once φ is fixed at some point, L(φ) becomes a function of z alone, so this technique does the job of separating φ and z, except for one thing: the number of such small regions has to be kept well under control, and this, as we have found, is impossible without the method of sieves. Since we are dealing with some nonstandard situations, namely quadratic forms of non-Gaussian random variables that cannot be expressed as i.i.d. sums, special techniques are needed to evaluate carefully the orders of various quantities and hence, eventually, to construct the sieve. These include an expansion of a matrix-valued function and an inequality for the variance of a quadratic form. Finally, the counterexample makes it clear that sieve–Wald consistency is all one can expect in a general unbalanced non-Gaussian situation.

6.2. In Section 4 a sequence of approximating spaces (i.e., a sieve) is constructed, and the consistency of the resulting estimates is ensured under AI 4. Although the AI 4 condition seems minimal for a theorem of this generality, the rate at which the sieve converges to the full parameter space may be slow. The rate can be much improved if, for example, the true parameter vector is known to belong to a compact subset of the interior of the parameter space, say φ0 ∈ [δ, M], where 0 < δ < M < ∞ (e.g., δ = 10^{-6}, M = 10^{6}). In such a case the actual parameter space is Φ0 = [δ, M], each approximating space can be taken as Φ0 itself (i.e., δ_B = δ and M_B = M), and the corresponding sequence of estimates φ̂ is consistent [see (50) in the proof of Theorem 4.1].
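The sieve idea of maximizing the restricted Gaussian likelihood over a compact truncation [δ_B, M_B] of the variance-component space, rather than over the full space, can be sketched numerically. The following is a minimal illustration only, not the paper's construction: it fits a balanced one-way random-effects model by grid-maximizing the restricted log-likelihood −(1/2)[log|V| + log|X′V⁻¹X| + y′Py] over a truncated box; the model sizes, the grid, and the bounds δ_B, M_B are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Balanced one-way random-effects model: y_ij = mu + a_i + e_ij,
# i = 1..m groups, j = 1..k replicates (sizes chosen for illustration).
m, k = 10, 4
n = m * k
a = rng.normal(0.0, np.sqrt(2.0), size=m)        # true sigma_a^2 = 2
y = 1.0 + np.repeat(a, k) + rng.normal(0.0, 1.0, size=n)  # true sigma_e^2 = 1

X = np.ones((n, 1))                               # fixed-effects design (intercept)
Z = np.kron(np.eye(m), np.ones((k, 1)))           # random-effects design

def restricted_loglik(theta):
    """Restricted (REML) Gaussian log-likelihood at theta = (sigma_a^2, sigma_e^2)."""
    sa2, se2 = theta
    V = sa2 * Z @ Z.T + se2 * np.eye(n)
    Vinv = np.linalg.inv(V)
    XtVinvX = X.T @ Vinv @ X
    P = Vinv - Vinv @ X @ np.linalg.inv(XtVinvX) @ X.T @ Vinv
    _, logdetV = np.linalg.slogdet(V)
    _, logdetXVX = np.linalg.slogdet(XtVinvX)
    return -0.5 * (logdetV + logdetXVX + y @ P @ y)

# Sieve step: maximize over the compact box [delta_B, M_B]^2, which stays
# inside the interior of the parameter space (away from the boundary 0).
delta_B, M_B = 1e-2, 1e2
grid = np.geomspace(delta_B, M_B, 30)
best = max(((sa2, se2) for sa2 in grid for se2 in grid), key=restricted_loglik)
print(best)
```

A coarse grid search is used here purely to keep the sketch transparent; in practice one would run a constrained numerical optimizer over the same truncated box, letting δ_B → 0 and M_B → ∞ as the sample grows.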
6.3. However, the AI 4 condition is not sufficient for Wald consistency. Additional assumptions are needed, either on the structure of the model or on the distributions of the random effects and errors. For example, it would be interesting to know whether Wald consistency holds when normality actually holds (in which case the Gaussian likelihood is the true likelihood). Note that in the example of Section 5 the distribution of the errors is not normal.

Acknowledgments. This paper is based on part of the author's Ph.D. dissertation at the University of California, Berkeley. I sincerely thank my advisor, Professor P. J. Bickel, for his guidance, encouragement and many helpful discussions. I also thank Professor L. Le Cam for helpful suggestions, and a referee and an Associate Editor for their excellent and detailed suggestions, which led to substantial improvement of the presentation.

REFERENCES
Billingsley, P. (1986). Probability and Measure. Wiley, New York.
Chan, N. N. and Kwong, M. K. (1985). Hermitian matrix inequalities and a conjecture. Amer. Math. Monthly.
Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press.
Cressie, N. and Lahiri, S. N. (1993). The asymptotic distribution of REML estimators. J. Multivariate Anal.
Das, K. (1979). Asymptotic optimality of restricted maximum likelihood estimates for the mixed model. Calcutta Statist. Assoc. Bull.
Grenander, U. (1981). Abstract Inference. Wiley, New York.
Hartley, H. O. and Rao, J. N. K. (1967). Maximum likelihood estimation for the mixed analysis of variance model. Biometrika.
Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems (with discussion). J. Amer. Statist. Assoc.
Jiang, J. (1996). REML estimation: asymptotic behavior and related topics. Ann. Statist.
Khuri, A. I. and Sahai, H. (1985). Variance components analysis: a selective literature survey. Internat. Statist. Rev.
Le Cam, L. (1979). Maximum likelihood: an introduction.
Lecture Notes in Statist. 18. Univ. Maryland, College Park.
Miller, J. J. (1977). Asymptotic properties of maximum likelihood estimates in the mixed model of the analysis of variance. Ann. Statist.
Neyman, J. and Scott, E. (1948). Consistent estimates based on partially consistent observations. Econometrica.
Rao, C. R. and Kleffe, J. (1988). Estimation of Variance Components and Applications. North-Holland, Amsterdam.
Rao, J. N. K. (1977). Contribution to the discussion of "Maximum likelihood approaches to variance component estimation and to related problems" by D. A. Harville. J. Amer. Statist. Assoc.
Richardson, A. M. and Welsh, A. H. (1994). Asymptotic properties of restricted maximum likelihood (REML) estimates for hierarchical mixed linear models. Austral. J. Statist.
Robinson, D. L. (1987). Estimation and use of variance components. Statistician.
Searle, S. R., Casella, G. and McCulloch, C. E. (1992). Variance Components. Wiley, New York.
Sweeting, T. J. (1980). Uniform asymptotic normality of the maximum likelihood estimator. Ann. Statist.
Thompson, W. A., Jr. (1962). The problem of negative estimates of variance components. Ann. Math. Statist.
Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist.

Department of Statistics
Case Western Reserve University
Euclid Avenue
Cleveland, Ohio
jiang@reml.cwru.edu