WALD CONSISTENCY AND THE METHOD OF SIEVES IN REML ESTIMATION. By Jiming Jiang Case Western Reserve University


The Annals of Statistics 1997, Vol. 25, No. 4

WALD CONSISTENCY AND THE METHOD OF SIEVES IN REML ESTIMATION

By Jiming Jiang

Case Western Reserve University

We prove that for all unconfounded balanced mixed models of the analysis of variance, estimates of the variance components parameters that maximize the (restricted) Gaussian likelihood are consistent and asymptotically normal, and this is true whether or not normality is assumed. For a general (nonnormal) mixed model, we show that estimates of the variance components parameters that maximize the (restricted) Gaussian likelihood over a sequence of approximating parameter spaces (i.e., a sieve) constitute a consistent sequence of roots of the REML equations, and that the sequence is also asymptotically normal. The results do not require the rank p of the design matrix of the fixed effects to be bounded. An example shows that, in some unbalanced cases, estimates that maximize the Gaussian likelihood over the full parameter space can be inconsistent, even under the condition that ensures consistency of the sieve estimates.

1. Introduction. In many cases, exploration of the asymptotic properties of maximum likelihood (ML) estimates has led to one of two types of consistency: the Cramér (1946) type or the Wald (1949) type. The former establishes the consistency of some root of the likelihood equation(s) but usually gives no indication of how to identify such a root when the roots of the likelihood equation(s) are not unique. The latter, however, states that an estimate of the parameter vector that maximizes the likelihood function is consistent. Therefore it is not unusual that Wald consistency sometimes fails whereas the Cramér one holds [e.g., Le Cam (1979)]. Often, obstacles to Wald consistency arise on the boundary of the parameter space $\Theta$. For this reason, the maximization is sometimes carried out over a sequence of subspaces which may belong to the interior of $\Theta$ and approach $\Theta$ as the sample size increases.
Methods of this type have been studied in the literature, among them those that Grenander (1981) called sieves. In this paper we consider the asymptotic behavior of REML (restricted or residual maximum likelihood) estimates in variance components estimation. The REML method was first proposed by W. A. Thompson (1962). Several authors have given overviews of REML, which can be found in Harville (1977), Khuri and Sahai (1985), Robinson (1987) and Searle, Casella and McCulloch (1992). A general mixed model can be written as

(1) $y = X\beta + Z_1\alpha_1 + \cdots + Z_s\alpha_s + \varepsilon$,

Received February 1995; revised October 1996.
AMS 1991 subject classification. 62F12.
Key words and phrases. Mixed models, restricted maximum likelihood, Wald consistency, the method of sieves.

where y is an $N \times 1$ vector of observations, X is an $N \times p$ known matrix of full rank p, β is a $p \times 1$ vector of unknown constants (the fixed effects), $Z_i$ is an $N \times m_i$ known matrix, $\alpha_i$ is an $m_i \times 1$ vector of i.i.d. random variables with mean 0 and variance $\sigma_i^2$ (the random effects), $i = 1, \ldots, s$; and ε is an $N \times 1$ vector of i.i.d. random variables with mean 0 and variance $\sigma_0^2$ (the errors). Asymptotic results on maximum likelihood type estimates for model (1) are few in number, with or without normality assumptions. Assuming normality, and the model having a standard ANOVA structure, Miller (1977) proved a result of Cramér consistency and asymptotic normality for the maximum likelihood estimates (MLE) of both the fixed effects and the variance components $\sigma_0^2, \ldots, \sigma_s^2$. Under conditions slightly stronger than those of Miller, Das (1979) obtained a similar result for the REML estimates. Also under the normality assumption, Cressie and Lahiri (1993) gave conditions under which Cramér consistency and asymptotic normality of the REML estimates in a general mixed model would hold, using a result of Sweeting (1980). Without assuming normality, in which case the REML estimates are defined as solutions of the REML equations derived under normality, Richardson and Welsh (1994) proved Cramér consistency and asymptotic normality for hierarchical (nested) mixed models. A common feature of the above results is that the rank p of the design matrix X of the fixed effects was held fixed. A more important and interesting question, related to the (possible) superiority of REML over straight ML in estimating the variance components, is how the REML estimates behave asymptotically when $p \to \infty$. In such situations a well-known example showing the inconsistency of the MLE is due to Neyman and Scott (1948).
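To fix ideas, model (1) with s = 1 can be simulated directly. The following is our own illustrative sketch; the sizes, names and variance values are assumptions, not taken from the paper.

```python
import numpy as np

# Simulate model (1): y = X*beta + Z1*alpha1 + eps, with s = 1 random factor.
rng = np.random.default_rng(0)
m1, k = 10, 5                                # m1 groups, k observations each
N = m1 * k
X = np.ones((N, 1))                          # N x p fixed-effects design, rank p = 1
beta = np.array([2.0])
Z1 = np.kron(np.eye(m1), np.ones((k, 1)))    # N x m1 random-effects design
sigma0_sq, sigma1_sq = 1.0, 1.5              # error and random-effect variances
alpha1 = rng.normal(0.0, np.sqrt(sigma1_sq), size=m1)
eps = rng.normal(0.0, np.sqrt(sigma0_sq), size=N)
y = X @ beta + Z1 @ alpha1 + eps

# Cov(y) = sigma0^2 I_N + sigma1^2 Z1 Z1', block-diagonal in this layout.
Sigma = sigma0_sq * np.eye(N) + sigma1_sq * Z1 @ Z1.T
print(y.shape, Sigma.shape)                  # (50,) (50, 50)
```

Observations in the same group share the covariance $\sigma_1^2$, while observations in different groups are uncorrelated.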
The question was answered recently by Jiang (1996), in which Cramér consistency and asymptotic normality of the REML estimates were proved under the assumption that the model is asymptotically identifiable and infinitely informative. The results do not require boundedness of p, normality, or any structure (such as ANOVA or nested design, etc.) for the model. As a consequence, Cramér consistency was shown to hold for all unconfounded balanced mixed models of the analysis of variance. In contrast to the preceding results, the only consistency result in variance components estimation that might be considered of Wald type was that of Hartley and Rao (1967) for the MLE. As was noted by Rao (1977), a corrected version of the Hartley–Rao proof would establish the Wald consistency of the MLE, under the assumption that the number of observations falling into any particular level of any random factor stays below a universal constant. To our knowledge, there has been no discussion of the method of sieves in variance components estimation. From both theoretical and practical points of view, consistency of Wald type (nonsieve or sieve) is more gratifying than that of Cramér type, although the former often requires stronger assumptions. Since in nonnormal cases [e.g., Richardson and Welsh (1994), Jiang (1996)] the REML estimates are defined as a solution of the (normal) REML equations, a natural question regarding Wald type consistency is whether the solution that maximizes the (restricted) Gaussian likelihood is consistent. The aim of this paper is to establish the following.

1. Wald consistency holds for all unconfounded balanced mixed models of the analysis of variance. That is, estimates of the variance components parameters that maximize the restricted Gaussian likelihood are consistent.

2. The method of sieves works for all mixed models provided the models are asymptotically identifiable and infinitely informative globally. That is, there is a clear way of constructing the sieve which will produce consistent estimates.

3. In both (1) and (2) the estimates are also asymptotically normal.

Again, the results do not require normality, boundedness of p, or structures for the model. The Wald consistency in the balanced case is also obtained without the assumption that the true parameter vector is an interior point of the parameter space. We also give an example showing that in some unbalanced cases there is sieve Wald consistency but no Wald consistency. The restricted Gaussian likelihood is given in Section 2. Wald consistency in the balanced case is proved in Section 3. In Section 4 we discuss the method of sieves for the general mixed model (1). An example and a counterexample are given in Section 5. In Section 6 we make some remarks about the techniques we use and about open problems.

2. Definitions, basic assumptions and a simple lemma. There are two parametrizations of the variance components: $\theta_0 = \lambda = \sigma_0^2$, $\theta_i = \mu_i = \sigma_i^2/\sigma_0^2$, $1 \le i \le s$ [Hartley and Rao (1967)]; and $\phi_i = \sigma_i^2$, $0 \le i \le s$. There is a one-to-one correspondence between the two sets of parameters. In this paper, estimates of the two sets of parameters are equivalent in the sense that consistency for one set of parameters implies that for the other. The parameter spaces are $\Theta = \{\theta:\ \lambda > 0,\ \mu_i \ge 0,\ 1 \le i \le s\}$ and $\Phi = \{\phi:\ \sigma_0^2 > 0,\ \sigma_i^2 \ge 0,\ 1 \le i \le s\}$. The true parameter vectors will be denoted by $\theta_0 = (\theta_{0i})_{0 \le i \le s} = (\lambda_0, \mu_0)$ with $\mu_0 = (\mu_{0i})_{1 \le i \le s}$, and $\phi_0 = (\phi_{0i})_{0 \le i \le s}$.
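The one-to-one correspondence between the two parametrizations is elementary: $\phi_0 = \lambda$ and $\phi_i = \lambda\mu_i$, $1 \le i \le s$. A minimal sketch (the helper names are ours):

```python
def theta_to_phi(lam, mu):
    """(lambda, mu_1, ..., mu_s) -> (phi_0, ..., phi_s):
    phi_0 = sigma_0^2 = lambda, phi_i = sigma_i^2 = lambda * mu_i."""
    return [lam] + [lam * m for m in mu]

def phi_to_theta(phi):
    """(phi_0, ..., phi_s) -> (lambda, [mu_1, ..., mu_s]); requires phi_0 > 0."""
    return phi[0], [p / phi[0] for p in phi[1:]]

print(theta_to_phi(2.0, [0.5, 1.0]))   # [2.0, 1.0, 2.0]
```

Consistency transfers between the two scales because the map and its inverse are continuous wherever $\phi_0 > 0$.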
Given the data y, the restricted (or residual) Gaussian log-likelihood is the Gaussian log-likelihood based on $z = A'y$, where A is an $N \times (N-p)$ matrix such that

(2) $\operatorname{rank}(A) = N - p$, $A'X = 0$.

Thus the restricted Gaussian log-likelihoods for estimating the two sets of parameters $(\theta_i)_{0 \le i \le s}$ and $(\phi_i)_{0 \le i \le s}$ are given, respectively, by

(3) $L(\theta) = c - \dfrac{N-p}{2}\log\lambda - \dfrac{1}{2}\log|V_A(\mu)| - \dfrac{1}{2\lambda}\,z'V_A(\mu)^{-1}z$

and

(4) $\tilde L(\phi) = c - \dfrac{1}{2}\log|U_A(\phi)| - \dfrac{1}{2}\,z'U_A(\phi)^{-1}z$,

where $c = -[(N-p)/2]\log 2\pi$, $V_A(\mu) = G_0 + \sum_{i=1}^s \mu_i G_i$ and $U_A(\phi) = \sum_{i=0}^s \phi_i G_i = \lambda V_A(\mu)$, with $G_0 = A'A$, $G_i = A'Z_iZ_i'A$, $1 \le i \le s$. The

maximizers $\hat\theta = (\hat\theta_i)_{0 \le i \le s}$ and $\hat\phi = (\hat\phi_i)_{0 \le i \le s}$ of (3) and (4) do not depend on the choice of A so long as (2) is satisfied. Therefore we may assume, w.l.o.g., that

(5) $A'A = I_{N-p}$.

Note that the relation between $(\hat\theta_i)_{0 \le i \le s}$ and $(\hat\phi_i)_{0 \le i \le s}$ is the same as that for the parameters. The dependence of $V_A(\mu)$ and $U_A(\phi)$ on A is not important in this paper; therefore we will abbreviate these by $V(\mu)$ and $U(\phi)$, respectively. Also w.l.o.g., we can focus on a sequence of designs indexed by N, since consistency and asymptotic normality hold iff they hold for any sequence with N increasing strictly monotonically. Thus, for example, p and $m_i$, $1 \le i \le s$, can be considered as functions of N, and y, X, the $Z_i$'s and so on as depending on N [e.g., Jiang (1996)]. Let $m_0 = N$, $\alpha_0 = \varepsilon$. The following assumptions A1 and A2 are made for model (1).

A1. For each N, $\alpha_0, \alpha_1, \ldots, \alpha_s$ are mutually independent.

A2. For $0 \le i \le s$, the common distribution of $\alpha_{i1}, \ldots, \alpha_{im_i}$ may depend on N. However, it is required that

(6) $\lim_{x\to\infty}\sup_N\max_{0\le i\le s} E\big[\alpha_{i1}^4 1_{(|\alpha_{i1}|>x)}\big] = 0$.

The basic idea of proving Wald consistency (nonsieve or sieve) is simple. Consider, for example, the log-likelihood (3). The difference $L(\theta) - L(\theta_0)$ can be decomposed as

(7) $L(\theta) - L(\theta_0) = e(\theta, \theta_0) + d(\theta, \theta_0)$,

where

(8) $e(\theta, \theta_0) = E_{\theta_0}[L(\theta) - L(\theta_0)] = -\dfrac{1}{2}\Big\{(N-p)\log\dfrac{\lambda}{\lambda_0} + \log\dfrac{|V(\mu)|}{|V(\mu_0)|} + \dfrac{\lambda_0}{\lambda}\operatorname{tr}\big(V(\mu)^{-1}V(\mu_0)\big) - (N-p)\Big\}$,

(9) $d(\theta, \theta_0) = L(\theta) - L(\theta_0) - E_{\theta_0}[L(\theta) - L(\theta_0)] = -\dfrac{1}{2}\Big\{z'\Big[\dfrac{1}{\lambda}V(\mu)^{-1} - \dfrac{1}{\lambda_0}V(\mu_0)^{-1}\Big]z - E_{\theta_0}\,z'\Big[\dfrac{1}{\lambda}V(\mu)^{-1} - \dfrac{1}{\lambda_0}V(\mu_0)^{-1}\Big]z\Big\}$.
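For concreteness, (4) can be evaluated numerically once a matrix A satisfying (2) and (5) is in hand; an orthonormal basis of the null space of X' serves. A sketch under these conventions (function and variable names are ours):

```python
import numpy as np
from scipy.linalg import null_space

def restricted_loglik(phi, y, X, Z_list):
    """Restricted Gaussian log-likelihood (4), dropping the additive constant c.
    phi = (phi_0, ..., phi_s) with phi_0 = sigma_0^2, phi_i = sigma_i^2."""
    A = null_space(X.T)      # orthonormal columns: rank(A) = N - p, A'X = 0, A'A = I
    z = A.T @ y
    G = [A.T @ A] + [A.T @ Z @ Z.T @ A for Z in Z_list]   # G_0, G_1, ..., G_s
    U = sum(p_i * G_i for p_i, G_i in zip(phi, G))        # U_A(phi)
    _, logdet = np.linalg.slogdet(U)
    return -0.5 * logdet - 0.5 * z @ np.linalg.solve(U, z)

# The value is invariant to translations in the column space of X, since A'X = 0:
X = np.ones((6, 1))
Z = np.kron(np.eye(3), np.ones((2, 1)))
y = np.array([1.0, 2.0, 0.5, 1.5, 3.0, 2.5])
v1 = restricted_loglik([1.0, 0.5], y, X, [Z])
v2 = restricted_loglik([1.0, 0.5], y + 10.0, X, [Z])
print(np.isclose(v1, v2))    # True
```

The invariance check reflects the fact that z depends on y only through the residual contrasts, which is what makes the restricted likelihood free of β.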

Essentially, what we are going to show is that, with probability tending to one and uniformly for θ outside a small neighborhood $\Theta_\delta(\theta_0) = \{\theta:\ |\theta - \theta_0| < \delta\}$, the second term on the RHS of (7) is negligible compared with the first term, which is negative. Therefore the LHS of (7) will be less than 0 for $\theta \notin \Theta_\delta(\theta_0)$, and the rest of the argument is the same as in Jiang (1996).

Let $A, B, A_i$, $1 \le i \le n$, be matrices. Define $|A|$ = the determinant of A; $\|A\| = [\lambda_{\max}(A'A)]^{1/2}$ ($\lambda_{\max}$ denotes the largest eigenvalue); $\|A\|_R = [\operatorname{tr}(A'A)]^{1/2}$; $\operatorname{Cor}(A_1, \ldots, A_n) = (\operatorname{cor}(A_i, A_j))$ if $A_i \ne 0$, $1 \le i \le n$, and 0 otherwise, where $\operatorname{cor}(A, B) = \operatorname{tr}(A'B)/(\|A\|_R\|B\|_R)$ if $A, B \ne 0$; and $\operatorname{diag}(A_i)$ to be the block-diagonal matrix with $A_i$ as its ith diagonal block. Define $b(\mu) = (I, \sqrt{\mu_1}\,Z_1, \ldots, \sqrt{\mu_s}\,Z_s)$, $B(\mu) = AV(\mu)^{-1}A'$ [note that $B(\mu)$ indeed does not depend on the choice of A], $B_0(\mu) = b(\mu)'B(\mu)^2b(\mu)$ and $B_i(\mu) = b(\mu)'B(\mu)Z_iZ_i'B(\mu)b(\mu)$, $i = 1, \ldots, s$. Let $p_i$, $0 \le i \le s$, be sequences of positive numbers. Denote $U_i(\phi) = U(\phi)^{-1/2}G_iU(\phi)^{-1/2}$, $0 \le i \le s$; $V_0(\theta) = (1/\lambda)I_{N-p}$, $V_i(\theta) = V(\mu)^{-1/2}G_iV(\mu)^{-1/2}$, $1 \le i \le s$;

$I_{ij}(\theta) = \operatorname{tr}\big(V_i(\theta)V_j(\theta)\big)/(p_ip_j)$,

$K_{ij}(\theta) = \dfrac{1}{p_ip_j}\sum_{l=1}^{N+m}\big(EW_l^4 - 3\big)[B_i(\mu)]_{ll}[B_j(\mu)]_{ll}/\lambda^2$, $i, j = 0, 1, \ldots, s$,

where $m = m_1 + \cdots + m_s$, $[B]_{ll}$ denotes the lth diagonal element of B, and $W_l = \varepsilon_l/\sqrt{\lambda_0}$, $1 \le l \le N$; $W_l = \alpha_{i,\,l-N-\sum_{k<i}m_k}/\sqrt{\lambda_0\mu_{0i}}$, $N + \sum_{k<i}m_k + 1 \le l \le N + \sum_{k\le i}m_k$, $1 \le i \le s$; that is, $W = (W_l)_{1\le l\le N+m} = (\varepsilon'/\sqrt{\lambda_0},\ \alpha_1'/\sqrt{\lambda_0\mu_{01}},\ \ldots,\ \alpha_s'/\sqrt{\lambda_0\mu_{0s}})'$, where $\alpha_i'/\sqrt{\lambda_0\mu_{0i}}$ is understood as $(0, \ldots, 0)$ ($\alpha_{ij}/\sqrt{\lambda_0\mu_{0i}} \equiv 0$, $1 \le j \le m_i$) if $\mu_{0i} = 0$. Let $I(\theta) = (I_{ij}(\theta))$, $K(\theta) = (K_{ij}(\theta))$ and $J(\theta) = I(\theta) + K(\theta)$. We say model (1) has positive variance components if $\theta_0$ is an interior point of $\Theta$, and is nondegenerate if

(10) $\inf_N\min_{0\le i\le s}\operatorname{var}(\alpha_{i1}^2) > 0$.

A sequence of estimates $(\hat\theta_0, \ldots, \hat\theta_s)$ is called asymptotically normal if there are sequences of numbers $p_i$, $0 \le i \le s$, and matrices $M(\theta_0)$ with $\limsup_N(\|M^{-1}(\theta_0)\| \vee \|M(\theta_0)\|) < \infty$ such that

(11) $M(\theta_0)\big(p_0(\hat\theta_0 - \theta_{00}), \ldots, p_s(\hat\theta_s - \theta_{0s})\big)' \to_d N(0, I_{s+1})$.

Similarly we define the asymptotic normality of estimates of the φ's.
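That $e(\theta, \theta_0) \le 0$ can be seen by writing (8) in terms of the eigenvalues $t_j$ of $(\lambda V(\mu))^{-1/2}(\lambda_0 V(\mu_0))(\lambda V(\mu))^{-1/2}$: then $e(\theta, \theta_0) = -\frac12\sum_j(t_j - 1 - \log t_j) \le 0$, with equality iff every $t_j = 1$. A numerical sanity check of this, with toy matrices of our own choosing:

```python
import numpy as np

def e_term(lam, mu, lam0, mu0, G0, G1):
    """e(theta, theta0) of (8) for s = 1."""
    V, V0 = G0 + mu * G1, G0 + mu0 * G1
    npq = G0.shape[0]                                  # plays the role of N - p
    _, ld = np.linalg.slogdet(V)
    _, ld0 = np.linalg.slogdet(V0)
    return -0.5 * (npq * np.log(lam / lam0) + ld - ld0
                   + (lam0 / lam) * np.trace(np.linalg.solve(V, V0)) - npq)

rng = np.random.default_rng(1)
n = 8
M = rng.normal(size=(n, 3))
G0, G1 = np.eye(n), M @ M.T          # G1 is p.s.d., as A'Z Z'A must be
lam0, mu0 = 1.0, 0.7
vals = [e_term(lam, mu, lam0, mu0, G0, G1)
        for lam in (0.5, 1.0, 2.0) for mu in (0.1, 0.7, 1.5)]
print(max(vals) <= 1e-10, np.isclose(e_term(lam0, mu0, lam0, mu0, G0, G1), 0.0))
# prints: True True
```

The grid includes the true point, where $e$ vanishes; everywhere else $e$ is strictly negative, which is the "first term" behavior the proof exploits.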

The following lemma will be used in the proofs of both Theorem 3.1 and Theorem 4.1 in the next two sections.

Lemma 2.1. Let $\xi = (X_1, \ldots, X_n)'$ be a vector of independent random variables such that $EX_i = 0$, $EX_i^4 < \infty$, $1 \le i \le n$, and let $A = (a_{ij})$ be an $n \times n$ symmetric matrix. Then

$\operatorname{var}(\xi'A\xi) = \sum_{i=1}^n a_{ii}^2\operatorname{var}(X_i^2) + 2\sum_{i\ne j}a_{ij}^2\operatorname{var}(X_i)\operatorname{var}(X_j) \le \Big\{\max_{1\le i\le n}\dfrac{EX_i^4}{(\operatorname{var}X_i)^2} \vee 2\Big\}\sum_{i,j=1}^n a_{ij}^2\operatorname{var}(X_i)\operatorname{var}(X_j)$,

where $EX_i^4/(\operatorname{var}X_i)^2$ is understood as 0 if $\operatorname{var}X_i = 0$.

Proof. We have

(12) $\xi'A\xi - E(\xi'A\xi) = \sum_{i=1}^n\Big\{a_{ii}(X_i^2 - EX_i^2) + 2X_i\Big(\sum_{j<i}a_{ij}X_j\Big)\Big\}$,

where $\sum_{j<i} = 0$ if $i = 1$. The summands in (12) form a sequence of martingale differences with respect to the σ-fields $\mathcal F_i = \sigma(X_1, \ldots, X_i)$, $1 \le i \le n$. The result then follows easily.

3. The balanced case: Wald consistency. A balanced r-factor mixed model of the analysis of variance can be expressed (after possible reparametrization) in the following way [e.g., Searle, Casella and McCulloch (1992), Rao and Kleffe (1988)]:

(13) $y = X\beta + \sum_{i\in S}Z_i\alpha_i + \varepsilon$,

where $X = \bigotimes_{q=1}^{r+1}1_{n_q}^{d_q}$ with $d = (d_1, \ldots, d_{r+1}) \in S_{r+1} = \{0, 1\}^{r+1}$, and $Z_i = \bigotimes_{q=1}^{r+1}1_{n_q}^{i_q}$ with $i = (i_1, \ldots, i_{r+1}) \in S \subset S_{r+1}$; here $1_n^0 = I_n$ and $1_n^1 = 1_n$, with $I_n$ and $1_n$ being the n-dimensional identity matrix and the n-dimensional vector of 1's, respectively. [For examples, see Jiang (1996).] We assume, as usual, that factor r + 1 corresponds to repetition within cells. As a consequence, in (13) one must have $d_{r+1} = 1$ and $i_{r+1} = 1$, $i \in S$. Also we can assume, w.l.o.g., that $n_q \ge 2$, $1 \le q \le r$ (since if $n_q = 1$, factor q is not really a factor and the model is not really an r-factor one). The model is called unconfounded if (1) the fixed effects are not confounded with the random effects and errors, that is, $\operatorname{rank}(X, Z_i) > p$, $i \in S$, and $X \ne I_N$; and (2) the random effects and errors are not confounded, that is, $I_N$ and $Z_iZ_i'$, $i \in S$, are linearly independent [e.g., Miller (1977)]. Note that under the balanced model (13) we have the expressions $m_i = \prod_{q:\,i_q=0}n_q$, $i \in S$. This allows us to extend the definition of $m_i$ to all $i \in S_{r+1}$. In particular, $p = m_d = \prod_{q:\,d_q=0}n_q$, and $N = m_{(0,\ldots,0)} = \prod_{q=1}^{r+1}n_q$.
Let $\bar S = \{(0, \ldots, 0)\} \cup S$, and set $\mu_i = \mu_{0i} = 1$ for $i = (0, \ldots, 0) \in S_{r+1}$.
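The Kronecker-product designs in (13) are mechanical to construct. A sketch for the balanced one-way layout (r = 1, with factor 2 the repetition within cells); the helper function is ours:

```python
import numpy as np

def design(exponents, n):
    """Build the Kronecker product in (13): factor q contributes I_{n_q}
    when its exponent is 0 and the all-ones vector 1_{n_q} when it is 1."""
    M = np.ones((1, 1))
    for e, nq in zip(exponents, n):
        M = np.kron(M, np.ones((nq, 1)) if e == 1 else np.eye(nq))
    return M

n = (4, 3)                 # n_1 = 4 levels, n_2 = 3 repetitions; N = 12
X = design((1, 1), n)      # d = (1, 1): grand-mean design; empty product gives p = 1
Z = design((0, 1), n)      # i = (0, 1): the single random factor, m_i = n_1 = 4
print(X.shape, Z.shape)    # (12, 1) (12, 4)
```

The column counts recover the formulas $p = \prod_{q: d_q=0} n_q$ and $m_i = \prod_{q: i_q=0} n_q$ given above.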

7 REML ESTIMATIO 1787 In the balanced case we have the following nice representations for the e θ θ 0 and d θ θ 0 in (7). (14) (15) Lemma 3.1. In the balanced case, e θ θ 0 = 1 S θ d θ θ 0 = 1 r l θ 1ξ l l d where for u v S u v iff u q v q 1 q r + 1, and u v iff u is not v; r l θ = λ 0 C l µ 0 /λc l µ with C l µ = i S µ i /m i 1 i l S θ = l dr l θ 1 log r l θp l with p l = l q =0n q 1 1/ ; and ξ l l d are random variables whose definition will be given later on in the proof. Proof. By the proof of Lemma 7.3 in Jiang (1996), there is an orthogonal matrix T such that T Z i Z i T = diagλ ij T XX T = diagλ dj, where λ i1 λ i = r+1 q=1 λ iqk q 1 k q n q 1 q r + 1 λ d1 λ d = r+1 q=1 λ dqk q 1 k q n q 1 q r + 1 with λ iqw = 1 i q + n q i q δ w 1 i S d (δ x y = 1 if x = y and = 0 otherwise). Let A T = B = b 1 b, then since [by (5)] AA = P X = I XX X 1 X = I p/xx B B = diagγ j, with γ j = 1 p/λ dj 1 j It is easy to see that r+1 q=1 1 i q + n q i q l q = /m i 1 i l i S d l S r+1. Hence r+1 λ iqkq q=1 = m i 1 i δk 1 i S r+1 λ dqkq q=1 = p 1 d δ k 1 where δ k1 = δ k1 1 δ kr+1 1 for k = k 1 k r+1. Thus γ 1 γ = 1 δk 1 d 1 k q n q 1 q r + 1 ow G i = A Z i Z i A = B diagλ ijb = j=1 λ ij b j b j i S, thus Vµ = I p + µ t G t = BB + µ t G t t S t S (16) = ( j=1 1 + ) µ t λ tj b j b j = t S γ j 0 ( 1 + µ t λ tj )b j b j t S since γ j = 0 b j = 0. Hence Vµ 1 = γ j 01 + t S µ t λ tj 1 b j b j since γ j 0 γ j = 1. Let Ɣ + = j γ j 0. Then Ɣ + = p (hereafter E denotes the cardinality of a set E). It follows from (16) that the eigenvalues

8 1788 J. JIAG of Vµ are 1 + t S µ t λ tj j Ɣ +. Thus by (8), e θ θ 0 = 1 { λ0 1 + t S µ 0t λ tj λ1 + t S µ t λ tj 1 log λ } 01 + t S µ 0t λ tj λ1 + t S µ t λ tj = 1 = 1 γ j 0 n 1 k 1 =1 1 l 1 =0 n r+1 k r+1 =1 1 l r+1 =0 = 1 { λ0 C l µ 0 λc l µ l d 1 δk 1 d { λ0 1 + t S µ 0t /m t 1 t δk 1 λ1 + t S µ t /m t 1 t δk 1 } 1 log { λ0 1 + t S µ 0t /m t 1 t l 1 l d λ1 + t S µ t /m t 1 t l } 1 log n q 1 l q =0 1 log λ } 0C l µ 0 n λc l µ q 1 = 1 S θ l q =0 using a formula given at the end of the proof of Lemma 7.3 in Jiang (1996) for the third equation; and by (9), d θ θ 0 = 1 ( [λ ) 1 ( µ t λ tj λ ) 1 ] µ 0t λ tj γ j 0 t S t S = 1 = 1 n 1 k 1 =1 b j z E θ0 b j z n r+1 ( [λ 1 1 l 1 =0 k r+1 =1 1 1 δk 1 d 1 + µ t 1 m t δk 1 t S t ) 1 ( λ ) 1 ] µ 0t 1 m t δk 1 t S t b k z E θ0 b k z l r+1 =0 δ k 1 =l = 1 r l θ 1ξ l l d 1 δk 1 d b k z E θ0 b k z where ξ l = λ 0 C l µ 0 1 δ k 1 =lb k z E θ0 b k z l d. Let S 1 θ = l dr l θ 1 p l S 3θ = l d r l θ 1p l. Define S + = i S µ 0i > 0 S + = 00 0 S + b 0 = min i S + µ 0i 1 i S µ + 0i if S +, and b 0 = 1 if S + = ; ε 0 = λ 0 /3 i S µ 0i 1 ; b = 1+explog /r

9 REML ESTIMATIO (here x means the largest interger less than or equal to x); s = S; and T l = q 1 q r + 1 l q = 1. The following lemma plays a key role in the proof of Wald consistency in the balanced case. (17) (18) (19) Lemma 3.. Suppose model (13) is unconfounded. For any B > 0, { ( ) ( inf S 1 θ 1/ r+3 min m i θ / B i S { ( ) ( inf S θ 1/ r+5 b 1 0 min m i θ/ B i S ( S 3θ sup θ/ B S θ r+3 b 0 inf θ/ B S θ ) 1/ + r+1 ( b log ε 0 B 3ε 0 + sλ 0 ) } ε 0 B 3ε 0 + sλ 0 ) ( ) 1/ min m i i S ) } where B = θ λ λ 0 ε 0 B/ µ l µ 0l b T l B/ m l l S. Proof. Let i = 0 = 0 0 if n r+1 ; i = 0 01 if n r+1 = 1. It is easy to show that i d, i i i S, and p i 1 r+1 m i = 1 r+1. So if λ < /3λ 0 or λ > λ 0, then (0) S 1 θ r i θ 1 p i = λ 0 /λ 1 p i 1/ r+3 If /3λ 0 λ λ 0 and µ l > λ 0 /ε 0 for some l S, then r l θ = λ 0 D l µ 0 /λd l µ < 1/, where D l µ = i S µ i m l /m i 1 i l. So S 1 θ r l θ 1 p l 1/r+ m l. ow suppose /3λ 0 λ λ 0 0 µ l λ 0 /ε 0 l S, and θ / B. If λ λ 0 > ε 0 B/, then as in (0) we have S 1 θ 1/ r+3 ε 0 /λ 0 B. If λ λ 0 ε 0 B/ and µ l µ 0l > b Tl B/ m l for some l S, there is l S such that µ l µ 0l > b Tl B/ m l and µ i µ 0i b Ti B/ m i i S i l i l. Then D l µ 0 D l µ µ 0l µ l µ 0i µ i m l /m i 1 i l i l > B/ m l i S since µ 0i µ i m l /m i 1 i l i l B/ m l i S i l i l b T i = B/ m l b + 1 Tl b Tl B/ m l b Tl 1

10 1790 J. JIAG Therefore, λ 0 D l µ 0 λd l µ λd l µ 0 D l µ λ λ 0 D l µ 0 /3λ 0 B/ m l ε 0 B/ µ 0i i S λ 0 /3B/ m l Hence S 1 θ r l θ 1 p l 1/r+ ε 0 B/3ε 0 + sλ 0 To prove (18), let l S + l = i S + i l such that m l = min i S + m l i if S + l, and l = i if S + l =. Then it is easy to show that C l µ 0 /C l µ 0 b 0 and hence (1) r l θ r l θ = C lµ 0 C l µ C l µ 0 C l µ b 0 If r l θ b 0 for some l d, then r l θ by (1). Therefore, () S θ r l θ 1 log r l θp l ( ) 1 log r l θp l 1/ r+1 1 log min m i i S using that x 1 log x/x 1 log / x. If r l θ < b 0 for all l d, then S θ S 1 θ/4b 0 by x 1 log x x 1 /L 0 < x < L L > 1. To prove (19), let S = i S. Write (3) S 3 θ = For any ε > 0, l d r l θ<b 0 r l θ 1p l + l d r l θ b 0 r l θ 1p l = S 1 + S (4) ( ) 1/ ( S 1 1 r l θ 1 p l l d r l θ<b 0 l d r l θ<b 0 } {ε r ε r l θ 1 p l l d r l θ<b 0 { ε r 1 1 } S θ + 4b 0ε S θ ) 1/ by x ε 1 + εx / and that S θ 4b 0 1 l d r l θ<b 0 r l θ 1 p l. By (1), the facts that (according to the definition) l only takes values in

11 REML ESTIMATIO 1791 S = i S and l l (therefore p l p l ), and the fact that ( ) 1 log S θ r i θp i it follows that (5) i S r i θ S b 0 + 1/ r l θp l l d r l θ = b 0 + 1/ r+1 b 0 + 1/ i S r i θ r i θ i S r i θ p l l d l =i r i θp i r+1 b log min i S p i S θ Inequality (19) now follows by (3) (5), taking ε = 4b 0 inf θ/ B S θ 1/ in (4), and the fact that min p i r+1( ) 1/ min m i i S Let ˆθ = ˆλ ˆµ i i S be the maximizer of (3) (one may define ˆθ as any fixed point in when the maximum can not be reached). Theorem 3.1. Let the balanced mixed model (13) be unconfounded. As and m i i S, we have the following: (i) ˆθ is consistent and the sequence p ˆλ λ 0 m i ˆµ i µ 0i i S is bounded in probability. (ii) If, moreover, the model has positive variance components and is nondegenerate, then ˆθ is asymptotically normal with p 0 = p p i = mi i S and M θ 0 = J 1/ θ 0I θ 0. Remark. Part (i) of Theorem 3.1 does not require that the model have positive variance components. In particular, when such requirement does hold, ˆθ constitutes (with probability 1) a consistent sequence of roots of the REML equations [e.g., Jiang (1996)], which are identical to the AOVA estimates in the balanced case [e.g., Searle, Casella and McCulloch (199)]. On the other hand, it is necessary for part (ii) of Theorem 3.1 that the model have positive variance components since otherwise the estimates, which are nonnegative, cannot be asymptotically normal. Proof. According to the definition in Section, it is always true (no matter whether µ 0i = 0 or not) that Z i α i = λ 0 µ 0i Z i α i / λ 0 µ 0i 1 i s, and i S (6) z = A y = λ 0 A bµ 0

Note that W is a vector of independent random variables satisfying $EW_l = 0$, $\operatorname{var}(W_l) \le 1$ [$\operatorname{var}(W_l) = 1$ if the corresponding $\mu_{0i} \ne 0$] and $EW_l^4 < \infty$. By the definition of $\xi_l$ at the end of the proof of Lemma 3.1, Lemma 2.1 and assumption A2 in Section 2, we see that, with $B = b(\mu_0)'A\big(\sum_{\delta(k-1)=l}b_kb_k'\big)A'b(\mu_0)$,

(27) $E_{\theta_0}\xi_l^2 = C_l(\mu_0)^{-2}\operatorname{var}_{\theta_0}(W'BW) \le C\,C_l(\mu_0)^{-2}\operatorname{tr}\Big[\Big(\Big(\sum_{\delta(k-1)=l}b_kb_k'\Big)V(\mu_0)\Big)^2\Big] = Cp_l$

for some constant C. Thus, for any $M > 0$,

(28) $P_{\theta_0}\big(|\xi_l| > Mp_l \text{ for some } l \le d\big) \le 2^{r+1}CM^{-2}$.

For any $\varepsilon > 0$, choose $M > (2^{r+1}C\varepsilon^{-1})^{1/2}$. Then choose $B > 0$ and $K$ such that, by Lemma 3.2, $\sup_{\theta\notin B}S_3(\theta)/S_2(\theta) < 1/M$ whenever $m_i > K$, $i \in S$. It then follows by (27), (14), (15), (18) and (28) that $P_{\theta_0}(L(\theta) < L(\theta_0),\ \forall\theta\notin B) > 1 - \varepsilon$ when $m_i > K$, $i \in S$. The conclusion of (i) then follows by the definition of B. The proof of (ii) is the same as that of Theorem 4.1 of Jiang (1996).

4. The general case: the method of sieves. For convenience we consider in this section the parameters $\phi_i$, $0 \le i \le s$. Let $\delta_N$, $M_N$ be two sequences of positive numbers such that $\delta_N \to 0$, $M_N \to \infty$. Let $\hat\phi = \hat\phi(\delta_N, M_N)$ be the maximizer of $\tilde L(\phi)$ of (4) over $\{\phi:\ \delta_N \le \phi_i \le M_N,\ 0 \le i \le s\}$. The above procedure is called the method of sieves [e.g., Grenander (1981)]. The following definitions are explained intuitively in Jiang (1996).

Definition 4.1. Model (1) is called asymptotically identifiable under the invariant class (AI²) at φ (θ) if

(29) $\liminf_N\lambda_{\min}\operatorname{Cor}\big(V_0(\theta), \ldots, V_s(\theta)\big) > 0$

(see Section 2; $\lambda_{\min}$ means the smallest eigenvalue). Here the invariant class is the class of (location) invariant estimates,

(30) $\mathcal I = \{\text{estimates which are functions of } A'y \text{ with } A \text{ satisfying (2)}\}$.

Definition 4.2. Model (1) is called infinitely informative under the invariant class (I³) at φ (θ) if

(31) $\lim_N\|V_i(\theta)\|_R = \infty$, $0 \le i \le s$.

13 REML ESTIMATIO 1793 The following notation is needed for this section: λ θ = λ min CorV 0 θ V s θ ρ a b = inf θ a b λ θ τ c d = inf φ c d λ θ where a b = θ a θ i b 0 i s and so on. Lemma 4.1. For any x i 0 i s and φ, ( ){ } s x i U i φ λ θ s x 0 R 4s + 1 V 0θ R + λ x i V iθ R i=1 ( ) λ θ s x 4M i s + 1 V i1 R where M = max 0 i s φ i. Proof. By the relation (see Section ): U 0 φ = V 0 θ s i=1 λ 1 µ i V i θ U i φ = λ 1 V i θ 1 i s, we have (( s s ) ) s x i U i φ = tr y i V i θ λ θ y i V iθ R R where y 0 = x 0 y i = λ 1 x i µ i x 0 1 i s. Let B = 1 i s x i µ i x 0 1/x i, then s y i V iθ R 1 s + 1 x 0 V 0θ R + 1 s s + 1 { x 0 V 0θ R + λ x 0 V 0θ R + y i V iθ R i/ B i B } s x i V iθ R using the fact that V 0 θ R λ 1 µ i V i θ R 1 i s. The second inequality is obvious. Lemma 4.. Suppose model (1) is AI at all θ a b with a > 0, then lim inf ρ a b > 0. Proof. By direct computation it can be shown that (see Section ): corv θ i θ V j θ 4V kθ 4a 1 0 i j k s k The result thus follows by an argument of subsequences [e.g., Billingsley (1986), A10]. Denote, respectively, the interiors of and by and. Corollary 4.1. Suppose model (1) is AI at all θ, then lim inf τ c d > 0 for all 0 < c < d <. i=1

Let model (1) be AI² at all $\theta \in \Theta^\circ$ and I³ at $1 = (1, \ldots, 1)'$. Define sequences $\delta_N$ and $M_N$ as follows. Let $q_N = \min_{0\le i\le s}\|V_i(1)\|_R^2$, and let a, b be positive numbers such that

(32) a + b1 b 1 < 3s

Pick f, g, h such that

(33) $3a + b < f < \dfrac{1 - 2a - (s+7)b}{s+1}$,

(34) $0 < g < \min\Big\{\dfrac{f - 3a - b}{2},\ \dfrac{1 - 2a - (s+7)b}{s+1} - f\Big\}$,

(35) $0 < h < \min\Big\{f - 3a - b - g,\ \dfrac{1 - 2a - (s+7)b}{s+1} - f - g\Big\}$.

Now pick sequences $c_N$ and $d_N$ such that $0 \le c_N \le c_{N+1} \le 1 \le d_{N+1} \le d_N$ and

(36) $\liminf_N q_N^h\,\tau(c_N, d_N) > 0$

(by Corollary 4.1, one certainly can do this). Finally, pick a baseline region $[\delta_B, M_B]$ with $0 < \delta_B < 1 < M_B < \infty$, and let $\delta_N = \delta_B \wedge (c_Nq_N^{-a})$, $M_N = M_B \vee (d_Nq_N^{b})$.

Note. The selection of $\delta_B$ and $M_B$ will not make a difference for the following asymptotic result. However, a reasonable choice of $\delta_B$ and $M_B$ may be of practical convenience since, when N is not sufficiently large, the region $[c_Nq_N^{-a}, d_Nq_N^{b}]$ may appear too narrow.

Theorem 4.1. Consider a general mixed model (1) having positive variance components.

(i) If the model is asymptotically identifiable and infinitely informative (AI⁴) at all $\theta \in \Theta^\circ$ (or $\phi \in \Phi^\circ$), then $\hat\phi$ is consistent, and the sequence $\big(\|V_i(1)\|_R(\hat\phi_i - \phi_{0i})\big)_{0\le i\le s}$ is bounded in probability.

(ii) If, moreover, the model is nondegenerate, then $\hat\phi$ is asymptotically normal with $p_0 = \sqrt{N-p}$, $p_i$ being any sequence $\asymp \|V_i(1)\|_R$, $1 \le i \le s$, and $M(\theta_0) = J^{-1/2}(\theta_0)I(\theta_0)$.

Remark 1. Thus, with probability tending to 1, $\hat\phi$ constitutes a consistent sequence of roots of the REML equations.

Remark 2. The assumption that model (1) is AI⁴ at all $\theta \in \Theta^\circ$ is equivalent to saying that it is AI² at all $\theta \in \Theta^\circ$ and I³ at some $\theta \in \Theta^\circ$, and, w.l.o.g., one can take θ = 1.
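Operationally, the sieve is just box-constrained maximization of (4) over $[\delta_N, M_N]^{s+1}$. A sketch with off-the-shelf bounded optimization; the shrink and growth rates below are illustrative placeholders of our own, not the a, b, f, g, h calibration of the theorem:

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import minimize

def neg_restricted_loglik(phi, z, G):
    U = sum(p_i * G_i for p_i, G_i in zip(phi, G))
    _, logdet = np.linalg.slogdet(U)
    return 0.5 * logdet + 0.5 * z @ np.linalg.solve(U, z)

def sieve_reml(y, X, Z_list, delta_N, M_N):
    """Maximize the restricted log-likelihood (4) over the sieve [delta_N, M_N]^{s+1}."""
    A = null_space(X.T)                      # A'X = 0, A'A = I, rank N - p
    z = A.T @ y
    G = [A.T @ A] + [A.T @ Z @ Z.T @ A for Z in Z_list]
    res = minimize(neg_restricted_loglik, x0=np.ones(len(G)), args=(z, G),
                   bounds=[(delta_N, M_N)] * len(G), method="L-BFGS-B")
    return res.x

# One-way layout, true phi = (1.0, 2.0); the box widens as N grows.
rng = np.random.default_rng(2)
m1, k = 40, 4
N = m1 * k
X = np.ones((N, 1))
Z = np.kron(np.eye(m1), np.ones((k, 1)))
y = X @ np.array([1.0]) + Z @ rng.normal(0, np.sqrt(2.0), m1) + rng.normal(0, 1.0, N)
phi_hat = sieve_reml(y, X, [Z], delta_N=1.0 / N, M_N=float(N))
print(phi_hat.round(2))
```

Because the estimate is constrained to the box, it stays off the boundary of Φ for every finite N, which is exactly what the sieve is designed to enforce.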

15 REML ESTIMATIO 1795 Proof. (i) Let = φ = φ i 0 i s φ i = δ + k i ε 0 i s for some integers 0 k i < K 0 i s, where K = q b+f ε = M δ /K. Let φ δ M, then by (4) we have a decomposition similar to (7): (37) L φ L φ 0 = ẽ φ φ 0 + d φ φ 0 By an expression similar to (8), we have ẽ φ φ 0 = E φ0 L φ L φ 0 = 1/ { log Uφ 0 1/ Uφ 1 Uφ 0 1/ + tri p Uφ 0 1/ Uφ 1 Uφ 0 1/ } p = 1/ log λ i + 1 λ i i=1 where λ 1 λ p are the eigenvalues of Bφ = Uφ 0 1/ Uφ 1 Uφ 0 1/. Since λ max Bφ 1 M 0 δ 1, where M 0 = max 0 i s φ 0i, we have by x 1 log x x 1 /L 0 < x L L 1 and Lemma 4.1 that (38) (39) δ ẽ φ φ 0 41 M 0 truφ 0 1/ Uφ 1 Uφ 0 1/ I p δ = s 41 M 0 φ i φ 0i U i φ δ τ c d 16s + 11 M 0 M R s V i 1 R φ i φ 0i It is easy to verify the following identity which holds for any φ : where Uφ 0 1 Uφ 1 = s φ i φ 0i Uφ 0 1 G i Uφ s j=0 s H = 1/ j=0 s φ i φ 0i φ j φ 0j Uφ 0 1 G i Uφ 1 G j Uφ 0 1 s φ i φ 0i φ j φ 0j Uφ 0 1 G i Uφ 0 1/ H Uφ 0 1/ G j Uφ 0 1 s φ k φ k Uφ 0 1/ k=0 Uφ 1 G k Uφ 1 + Uφ 1 G k Uφ 1 Uφ 0 1/

16 1796 J. JIAG In the following we pick φ in (39) such that φ i φ i < ε 0 i s. By an expression similar to (9) and by (39) we get (40) d φ φ 0 = L φ L φ 0 E φ0 L φ L φ 0 { s = 1/ φ i φ 0i z Uφ 0 1 G i Uφ 0 1 z E φ0 s s φ i φ 0i φ j φ 0j + j=0 s j=0 ( z Uφ 0 1 G i Uφ 1 G j Uφ 0 1 z E φ0 ) s φ i φ 0i φ j φ 0j z Uφ 0 1 G i Uφ 0 1/ = 1/I 1 I + I 3 E φ0 I 3 It follows from Lemma.1 and (6) that H Uφ 0 1/ G j Uφ 0 1 z E φ0 ( s var φ0 z Uφ 0 1 G i Uφ 0 1 z CU i φ 0 R Cδ 0 V i1 R s j=0 )} where C = varε1 /varε 1 max 1 i s varα i1 /varα i1 δ 0 = min 0 i s φ 0i. Thus with L 1 = 3s + 1C 1/ q g, we have on E 1 = { z Uφ 0 1 G i Uφ 0 1 z E φ0 L 1 δ 1 0 V i1 R 0 i s } that (41) and (4) Similarly, ( s ) 1/ I 1 L 1 s + 1 1/ δ0 1 V i 1 R φ i φ 0i P φ0 E c 1 s var φ0 z z L 1 δ 0 V i1 R 1/3q g var φ0 z Uφ 0 1 G i Uφ 1 G j Uφ 0 1 z CUφ 0 1/ G i Uφ 1 G j Uφ 0 1/ R CU i φ 0 U j φ 0 U i φ R U j φ R Cδ 0 δ V i 1 R V j 1 R (Here is a brief derivation of the second inequality in the above: Uφ 0 1/ G i Uφ 1 G j Uφ 0 1/ R = AB R A A R B B R

17 REML ESTIMATIO 1797 where A = Uφ 0 1/ G i Uφ 1/ B = Uφ 1/ G j Uφ 0 1/. ow A A λ max U i φ 0 U i φ. Using the fact that A B 0 A B tra trb [note that it is not true that A B 0 A B A B (e.g., Chan and Kwong (1985))], we get A A R U i φ 0 U i φ R, and so on.) Thus with L = s + 13K s+1 C1/ q g we have on that (43) and (44) E = { z Uφ 0 1 G i Uφ 1 G j Uφ 0 1 z E φ0 L δ 0 δ 1 V i 1 1/ R V j1 1/ 0 i j s φ } ( s ) I L δ 0 δ 1 V i 1 1/ R φ i φ 0i ( s ) 1/ L s + 1 3/ δ 0 δ 1 M 0 M V i 1 R φ i φ 0i P φ0 E c φ 0 i j s R var φ0 z z L δ 1/3q g 0δ V i 1 R V j 1 R Finally, I 3 = U H U, where U = s φ i φ 0i Uφ 0 1/ G i Uφ 0 1 z. We have s H 1/ φ k φ k Uφ 0 1/ Uφ 0 1/ s + 1M 0 δ ε and k=0 U s + 1 s φ i φ 0i Uφ 0 1/ G i Uφ 0 1 z E φ0 Uφ 0 1/ G i Uφ 0 1 z δ 0 V i1 R Thus with L 3 = 3s+1 1/ q g we have on E 3 = Uφ 0 1/ G i Uφ 0 1 z L 3 δ 1 0 V i1 R 0 i s that s (45) I 3 L 3 s + 1 δ0 M 0δ V i 1 R φ i φ 0i and (46) It also follows that (47) P φ0 E c 3 s ε E φ0 z L 3 δ 0 V i1 R E φ0 I 3 s + 1 δ 0 M 0δ ε 1/3q g s V i 1 R φ i φ 0i

18 1798 J. JIAG Combining (37), (38), (40), (41), (43), (45) and (47), we have on E = E 1 E E 3 that for φ δ M, L φ L φ 0 { Q δ τ c d (48) 16s + 11 M 0 M + 1/s + 1 δ 0 M 0L 3 + 1δ ε + 1/s + 1 1/ δ 1 0 L 1 + s + 1L δ 1 M 0 M Q 1 where Q = s V i 1 R φ i φ 0i 1/. And, combining (4), (44) and (46), we have } (49) P φ0 E c q g ote that by I 3 at 1 (= 1 1 ), q tends to infinity, and so the probability of E tends to 1. It remains to show that for any η > 0 there is 0 > 0 such that for 0 the in (48) is less than 0 for all φ δ M S η φ 0 c, where S η φ 0 = φ φ i φ 0i < η 0 i s. Since δ 0 M as, this implies that ˆφ is consistent. Since φ δ M S η φ 0 c implies Q ηq, in (48) qh τ c d 16s + 11 M 0 q a b h + C η 1 q a+s+3/b+s+1/f+g 1 + C 1 q a f+g where C 1 C are constants. Also lim inf q h τ c d > 0, and ( a b h > max a f + g a + s + 3 b + s + 1 ) f + g 1 by the way we picked a, b, f, g, h. Thus in (48) is less than 0 for large uniformly for all φ δ M S η φ 0 c To prove that V i 1 R ˆφ i φ 0i 0 i s is bounded in probability, replace δ M K ε and q g by δ = 1/δ 0, M = 3/M 0, K, ε = M δ/k and η 1, respectively. Then by the same argument we have with probability greater than or equal to 1 η that for all φ δ M, { L φ L φ 0 Q δτ δ M 16s + 1M 0 M + C 1η δ ε (50) } + C η 1 K s+1/ δ 1 MQ 1 The result follows from (50) and the consistency of ˆφ. The proof of (ii) is the same as that of Theorem 4.3 in Jiang (1996).

5. Examples. The first example is used to illustrate the AI⁴ condition.

Example 5.1. Consider the two-way (unbalanced) nested model

(51) $y_{ijk} = \beta_i + \alpha_{ij} + \varepsilon_{ijk}$,

where $i = 1, \ldots, p$; $j = 1, 2$; $k = 1, \ldots, n_i$; and β, α and ε correspond to the fixed effects, random effects and errors, respectively. Assume, w.l.o.g., that $n_i \ge 1$, $1 \le i \le p$. Let $X = \operatorname{diag}(1_{2n_i},\ 1 \le i \le p)$ and $Z = \operatorname{diag}(1_{n_i} \otimes I_2,\ 1 \le i \le p)$; then the model can be written as

(52) $y = X\beta + Z\alpha + \varepsilon$.

So $N = 2\sum_{i=1}^p n_i$. Direct calculation shows

$H = Z'AA'Z = Z'\big(I - X(X'X)^{-1}X'\big)Z = \operatorname{diag}\big(n_i(I_2 - \tfrac12J_2)\big)$,

where $J_2 = 1_21_2'$, and

$Z'AV(\mu)^{-1}A'Z = H - H(\mu^{-1}I_{2p} + H)^{-1}H = \operatorname{diag}\big(n_i(1 + \mu n_i)^{-1}(I_2 - \tfrac12J_2)\big)$.

Thus

$\operatorname{tr}\big(V_1(\theta)\big) = \operatorname{tr}\big(Z'AV(\mu)^{-1}A'Z\big) = \sum_{i=1}^p\dfrac{n_i}{1 + \mu n_i}$,

$\operatorname{tr}\big(V_1(\theta)^2\big) = \operatorname{tr}\big[\big(Z'AV(\mu)^{-1}A'Z\big)^2\big] = \sum_{i=1}^p\Big(\dfrac{n_i}{1 + \mu n_i}\Big)^2$.

Let $q = \sum_{n_i>1}(n_i - 1)$ and $r = q/p$; then it is easy to show that

(53) $\liminf_N\lambda_{\min}\operatorname{Cor}\big(V_0(\theta), V_1(\theta)\big) > 0$ for all $\theta \in \{(\lambda, \mu):\ \lambda > 0,\ \mu > 0\}$

iff $\limsup p/N < 1/2$ iff $\liminf r > 0$. Since $q \ge \#\{1 \le i \le p:\ n_i > 1\}$, $\liminf r = 0$ would mean the model is asymptotically confounded. Note that s = 1 in this example, so (32)–(35) become a + b1 b 1 < 0, $3a + b < f < \tfrac12 - a - 4b$, $0 < g < \min\{\tfrac12(f - 3a - b),\ \tfrac12 - a - 4b - f\}$ and $0 < h < \min\{f - 3a - b - g,\ \tfrac12 - a - 4b - f - g\}$.

The next example is a counterexample showing that AI⁴ is not enough for Wald consistency.

Example 5.2. Let $\lambda_1 = 0$, and let $\lambda_2 \le \cdots \le \lambda_N$ be positive numbers such that

(54) $\limsup_N\dfrac{\sum_{i=1}^N\lambda_i/(1 + \mu\lambda_i)}{N^{1/2}\big[\sum_{i=1}^N\lambda_i^2/(1 + \mu\lambda_i)^2\big]^{1/2}} < 1$

and

(55) $\lim_N\sum_{i=1}^N\Big(\dfrac{\lambda_i}{1 + \mu\lambda_i}\Big)^2 = \infty$

for all $\mu \ge 0$ [e.g., $\lambda_2 = \cdots = \lambda_{N/2} = a$, $\lambda_{N/2+1} = \cdots = \lambda_N = b$, where $a, b > 0$, $a \ne b$]. Consider

(56) $y_i = \sqrt{\lambda_i}\,\alpha_i + \varepsilon_i$, $i = 1, \ldots, N$,

where the $\alpha_i$'s are random effects and the $\varepsilon_i$'s are errors satisfying $P(\varepsilon_1 = 0) > 0$. Thus $y = Z\alpha + \varepsilon$, where $Z = \operatorname{diag}(\sqrt{\lambda_i})$. Equations (54) and (55) imply $\liminf_N\lambda_{\min}\operatorname{Cor}(V_0(\theta), V_1(\theta)) > 0$ and $\lim_N\|V_i(\theta)\|_R = \infty$, $i = 0, 1$, for all $\theta \in \{(\lambda, \mu):\ \lambda > 0,\ \mu \ge 0\}$, so the model is AI⁴ not only at all $\theta \in \Theta^\circ$ but at all $\theta \in \Theta$. Assume $\phi_0 \in \Phi^\circ$. It is easy to derive

(57) $\tilde L(\phi) - \tilde L(\phi_0) = -\dfrac{1}{2}\sum_{i=1}^N\Big\{\log\dfrac{\sigma_0^2 + \sigma_1^2\lambda_i}{\sigma_{00}^2 + \sigma_{01}^2\lambda_i} + \Big(\dfrac{\sigma_{00}^2 + \sigma_{01}^2\lambda_i}{\sigma_0^2 + \sigma_1^2\lambda_i} - 1\Big)w_i^2\Big\}$,

where $w_i = (\varepsilon_i + \sqrt{\lambda_i}\,\alpha_i)/\sqrt{\sigma_{00}^2 + \sigma_{01}^2\lambda_i}$, $i = 1, \ldots, N$. Let $B = \{\phi = (\sigma_0^2, \sigma_1^2):\ \sigma_i^2 \ge \sigma_{0i}^2/2,\ i = 0, 1\}$ and $\bar\phi = (\bar\sigma_0^2, \sigma_{01}^2)$ with $0 < \bar\sigma_0^2 < \sigma_{00}^2\exp\{-3\delta^{-1}(2 + (\sigma_{00}^2/\sigma_{01}^2)\sum_{i=2}^N\lambda_i^{-1})\}$, where $\delta = P_{\phi_0}(\varepsilon_1 = 0)$. Define $\xi = N^{-1}\sum_{i=1}^Nw_i^2$, $\eta = \sum_{i=2}^N\lambda_i^{-1}w_i^2\big/\sum_{i=2}^N\lambda_i^{-1}$ and $E = \{\varepsilon_1 = 0,\ \xi \vee \eta \le 3\delta^{-1}\}$. Then it follows from (57) that, on E, $\tilde L(\phi) < \tilde L(\bar\phi)$ for all $\phi \in B$. Thus

$P_{\phi_0}\Big(\sup_{\phi\in B}\tilde L(\phi) < \tilde L(\bar\phi)\Big) \ge P_{\phi_0}(E) \ge \delta/3$.

Therefore, with probability greater than some positive number, the maximum point $\hat\phi$ will not fall into B, no matter how large N is. Note that in this example the REML estimates are the same as the MLE.
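Returning to Example 5.1, the closed-form traces there can be checked against a brute-force matrix computation. A sketch (taking $j = 1, 2$ per cell, which is our reading of the nested layout, with arbitrary unbalanced $n_i$):

```python
import numpy as np
from scipy.linalg import null_space

# Nested model (51): y_ijk = beta_i + alpha_ij + eps_ijk, i = 1..p, j = 1, 2, k = 1..n_i.
n = np.array([2, 3, 5])                  # n_i; N = 2 * sum(n_i) = 20
p = len(n)
X = np.zeros((2 * n.sum(), p))
Z = np.zeros((2 * n.sum(), 2 * p))
row = 0
for i, ni in enumerate(n):
    for j in range(2):
        X[row:row + ni, i] = 1.0         # fixed effect beta_i
        Z[row:row + ni, 2 * i + j] = 1.0 # random effect alpha_ij
        row += ni

mu = 0.8
A = null_space(X.T)                           # A'X = 0, A'A = I
V = A.T @ A + mu * (A.T @ Z) @ (Z.T @ A)      # V(mu)
T = Z.T @ A @ np.linalg.solve(V, A.T @ Z)     # Z'A V(mu)^{-1} A'Z
print(np.isclose(np.trace(T), np.sum(n / (1 + mu * n))),
      np.isclose(np.trace(T @ T), np.sum((n / (1 + mu * n)) ** 2)))   # True True
```

Each group i contributes a single nonzero eigenvalue $n_i/(1+\mu n_i)$ to T, which is why both traces reduce to the displayed sums.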

6. Concluding remarks.

6.1. In virtually all problems regarding Wald consistency, there are two sources of difficulty. The first is the arbitrariness of the parameters (here, θ or φ). The second is the randomness of some quantities involved in the likelihood function (here, $z = A'y$). One does not want these two kinds of things to be tied up, and this turns out to be the technical origin of our proofs. As we mentioned in Section 2, the basic idea is to prove that the difference, say $L(\theta) - L(\theta_0)$, is negative for θ outside a small neighborhood of $\theta_0$. Since the first term in the difference decomposition [e.g., (7)] is nonrandom, the focus is therefore on the second term (i.e., the d term). In the balanced case, we are able to further decompose $d(\theta, \theta_0)$ as a sum such that each summand is a product of two factors, the first depending only on θ and the second only on z, and the number of terms in the sum is bounded. This technique of separating the two kinds of difficulties mentioned above has proved totally successful, since it then becomes clear how to construct the small neighborhood (see the proof of Theorem 3.1). Unfortunately, this nice decomposition of the d term no longer exists in the unbalanced case (thanks to a counterexample we are able to construct, which helps clarify that). In the unbalanced situation we go back to the old idea of Wald (1949), that is, dividing the parameter space into small regions and approximating the likelihood within each small region by its value at, say, the center of the region. Note that once φ is fixed at some point $\phi^*$, $\tilde L(\phi) = \tilde L(\phi^*)$ becomes a function of z alone. So this technique often does the job of separating φ and z, except for one thing: the number of such small regions has to be well under control, which is impossible, as we have found, without using the method of sieves. Since we are dealing with some nonstandard situations, namely, quadratic forms of non-Gaussian random variables which cannot be expressed as i.i.d.
sums, special techniques are needed to evaluate carefully the orders of different quantities and hence eventually construct the sieve. These include an expansion of some matrix-valued function and an inequality for the variance of a quadratic form. Finally, the counterexample has made it clear that sieve-wald consistency is all one can expect in a general unbalanced non-gaussian situation. 6.. In Section 4 a sequence of approximating spaces (i.e., a sieve) is constructed, and the consistency of the resulting estimates can be ensured under AI 4. Although the AI 4 condition seems minimal for a theorem of this generality, the convergence rate of the sieve to the full parameter space could be slow. The rate can be much improved if, for example, the true parameter vector is known to belong to a compact subspace of the interior of the parameter space, say, φ 0 δ M, where 0 < δ < M < (e.g., δ = 10 6 M = 10 6 ). In such a case the actual parameter space is 0 = δ M, each approximating space can be taken as 0 (i.e., δ = δ B = δ M = M B = M) and the corresponding sequence ˆφ is consistent [see (50) in the proof of Theorem 4.1].
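The two ingredients discussed above can be sketched numerically. The following Python example is an illustration only, not the paper's construction: the function reml_loglik, the simulated balanced one-way layout, and the box bounds delta, M are assumptions of this sketch. It computes the restricted (Gaussian) log-likelihood from the error contrasts z = A'y, verifies that z, and hence the criterion, is unaffected by the fixed effects, and then maximizes the criterion over a compact box [δ, M]² of the kind used for the approximating space.

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import minimize

def reml_loglik(theta, y, X, Z):
    """Restricted Gaussian log-likelihood (up to an additive constant)
    based on the error contrasts z = A'y, where the columns of A span
    the orthogonal complement of the column space of X (so A'X = 0)."""
    sig2_a, sig2_e = theta                     # variance components
    n = len(y)
    A = null_space(X.T)                        # n x (n - rank(X)) basis
    V = sig2_a * (Z @ Z.T) + sig2_e * np.eye(n)
    W = A.T @ V @ A                            # covariance matrix of z
    z = A.T @ y
    _, logdet = np.linalg.slogdet(W)
    return -0.5 * (logdet + z @ np.linalg.solve(W, z))

# simulated balanced one-way layout: m groups, k observations per group
m, k = 8, 5
n = m * k
rng = np.random.default_rng(0)
X = np.ones((n, 1))                            # intercept only
Z = np.kron(np.eye(m), np.ones((k, 1)))        # group indicator matrix
y = Z @ rng.normal(0.0, 1.0, m) + rng.normal(0.0, 0.5, n)

theta0 = np.array([0.5, 0.5])

# z = A'y is free of the fixed effects, so shifting y by X*beta
# leaves the restricted likelihood unchanged
shifted = reml_loglik(theta0, y + X @ np.array([7.0]), X, Z)
assert abs(shifted - reml_loglik(theta0, y, X, Z)) < 1e-6

# maximize over the compact box [delta, M]^2 (the approximating space)
delta, M = 1e-6, 1e6
res = minimize(lambda t: -reml_loglik(t, y, X, Z), theta0,
               bounds=[(delta, M), (delta, M)], method="L-BFGS-B")
sig2_a_hat, sig2_e_hat = res.x
```

With a fixed compact box as in 6.2, the constraint mainly keeps the optimizer away from the boundary σ² = 0, where the Gaussian likelihood can degenerate; in the general sieve construction the bounds would relax as the sample size grows.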

6.3. However, condition AI 4 is not sufficient for Wald consistency. Additional assumptions are needed, either on the structure of the model or on the distributions of the random effects and errors. For example, it would be interesting to see whether Wald consistency holds when normality actually holds (in which case the Gaussian likelihood is the true likelihood). Note that in the example of Section 5 the distribution of the errors is not normal.

Acknowledgments. The paper is based on part of the author's Ph.D. dissertation at the University of California, Berkeley. I sincerely thank my advisor, Professor P. J. Bickel, for his guidance, encouragement and many helpful discussions. I also thank Professor L. Le Cam for helpful suggestions. The author also thanks a referee and an Associate Editor for their excellent and detailed suggestions, which led to substantial improvement of the presentation.

REFERENCES

Billingsley, P. (1986). Probability and Measure. Wiley, New York.
Chan, N. N. and Kwong, M. K. (1985). Hermitian matrix inequalities and a conjecture. Amer. Math. Monthly.
Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press.
Cressie, N. and Lahiri, S. N. (1993). The asymptotic distribution of REML estimators. J. Multivariate Anal.
Das, K. (1979). Asymptotic optimality of restricted maximum likelihood estimates for the mixed model. Calcutta Statist. Assoc. Bull.
Grenander, U. (1981). Abstract Inference. Wiley, New York.
Hartley, H. O. and Rao, J. N. K. (1967). Maximum likelihood estimation for the mixed analysis of variance model. Biometrika.
Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems (with discussion). J. Amer. Statist. Assoc.
Jiang, J. (1996). REML estimation: asymptotic behavior and related topics. Ann. Statist.
Khuri, A. I. and Sahai, H. (1985). Variance components analysis: a selective literature survey. Internat. Statist. Rev.
Le Cam, L. (1979). Maximum likelihood: an introduction.
Lecture Notes in Statist. 18. Univ. Maryland, College Park.
Miller, J. J. (1977). Asymptotic properties of maximum likelihood estimates in the mixed model of the analysis of variance. Ann. Statist.
Neyman, J. and Scott, E. (1948). Consistent estimates based on partially consistent observations. Econometrica.
Rao, C. R. and Kleffe, J. (1988). Estimation of Variance Components and Applications. North-Holland, Amsterdam.
Rao, J. N. K. (1977). Contribution to the discussion of "Maximum likelihood approaches to variance component estimation and to related problems" by D. A. Harville. J. Amer. Statist. Assoc.
Richardson, A. M. and Welsh, A. H. (1994). Asymptotic properties of restricted maximum likelihood (REML) estimates for hierarchical mixed linear models. Austral. J. Statist.
Robinson, D. L. (1987). Estimation and use of variance components. Statistician.
Searle, S. R., Casella, G. and McCulloch, C. E. (1992). Variance Components. Wiley, New York.
Sweeting, T. J. (1980). Uniform asymptotic normality of the maximum likelihood estimator. Ann. Statist.

Thompson, W. A., Jr. (1962). The problem of negative estimates of variance components. Ann. Math. Statist.
Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist.

Department of Statistics
Case Western Reserve University
Euclid Avenue
Cleveland, Ohio
jiang@reml.cwru.edu


RATES OF CONVERGENCE OF ESTIMATES, KOLMOGOROV S ENTROPY AND THE DIMENSIONALITY REDUCTION PRINCIPLE IN REGRESSION 1 The Annals of Statistics 1997, Vol. 25, No. 6, 2493 2511 RATES OF CONVERGENCE OF ESTIMATES, KOLMOGOROV S ENTROPY AND THE DIMENSIONALITY REDUCTION PRINCIPLE IN REGRESSION 1 By Theodoros Nicoleris and Yannis

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Laurenz Wiskott Institute for Theoretical Biology Humboldt-University Berlin Invalidenstraße 43 D-10115 Berlin, Germany 11 March 2004 1 Intuition Problem Statement Experimental

More information

Journal of Inequalities in Pure and Applied Mathematics

Journal of Inequalities in Pure and Applied Mathematics Journal of Inequalities in Pure and Applied Mathematics http://jipam.vu.edu.au/ Volume 7, Issue 1, Article 34, 2006 MATRIX EQUALITIES AND INEQUALITIES INVOLVING KHATRI-RAO AND TRACY-SINGH SUMS ZEYAD AL

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

1 Mixed effect models and longitudinal data analysis

1 Mixed effect models and longitudinal data analysis 1 Mixed effect models and longitudinal data analysis Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between

More information

The Skorokhod reflection problem for functions with discontinuities (contractive case)

The Skorokhod reflection problem for functions with discontinuities (contractive case) The Skorokhod reflection problem for functions with discontinuities (contractive case) TAKIS KONSTANTOPOULOS Univ. of Texas at Austin Revised March 1999 Abstract Basic properties of the Skorokhod reflection

More information

RANKS OF QUANTUM STATES WITH PRESCRIBED REDUCED STATES

RANKS OF QUANTUM STATES WITH PRESCRIBED REDUCED STATES RANKS OF QUANTUM STATES WITH PRESCRIBED REDUCED STATES CHI-KWONG LI, YIU-TUNG POON, AND XUEFENG WANG Abstract. Let M n be the set of n n complex matrices. in this note, all the possible ranks of a bipartite

More information

determinantal conjecture, Marcus-de Oliveira, determinants, normal matrices,

determinantal conjecture, Marcus-de Oliveira, determinants, normal matrices, 1 3 4 5 6 7 8 BOUNDARY MATRICES AND THE MARCUS-DE OLIVEIRA DETERMINANTAL CONJECTURE AMEET SHARMA Abstract. We present notes on the Marcus-de Oliveira conjecture. The conjecture concerns the region in the

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

More information

Exam 2. Jeremy Morris. March 23, 2006

Exam 2. Jeremy Morris. March 23, 2006 Exam Jeremy Morris March 3, 006 4. Consider a bivariate normal population with µ 0, µ, σ, σ and ρ.5. a Write out the bivariate normal density. The multivariate normal density is defined by the following

More information

Department of Mathematics, University of California, Berkeley. GRADUATE PRELIMINARY EXAMINATION, Part A Fall Semester 2014

Department of Mathematics, University of California, Berkeley. GRADUATE PRELIMINARY EXAMINATION, Part A Fall Semester 2014 Department of Mathematics, University of California, Berkeley YOUR 1 OR 2 DIGIT EXAM NUMBER GRADUATE PRELIMINARY EXAMINATION, Part A Fall Semester 2014 1. Please write your 1- or 2-digit exam number on

More information

Fuzzy Statistical Limits

Fuzzy Statistical Limits Fuzzy Statistical Limits Mark Burgin a and Oktay Duman b a Department of Mathematics, University of California, Los Angeles, California 90095-1555, USA b TOBB Economics and Technology University, Faculty

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

BALANCING GAUSSIAN VECTORS. 1. Introduction

BALANCING GAUSSIAN VECTORS. 1. Introduction BALANCING GAUSSIAN VECTORS KEVIN P. COSTELLO Abstract. Let x 1,... x n be independent normally distributed vectors on R d. We determine the distribution function of the minimum norm of the 2 n vectors

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix

More information