FUNCTIONAL REGRESSION FOR GENERAL EXPONENTIAL FAMILIES

Size: px
Start display at page:

Download "FUNCTIONAL REGRESSION FOR GENERAL EXPONENTIAL FAMILIES"

Transcription

1 Submitted to the Annals of Statistics FUNCTIONAL REGRESSION FOR GENERAL EXPONENTIAL FAMILIES BY WEI DOU, DAVID POLLARD AND HARRISON H. ZHOU Yale University The paper derives a minimax lower bound for rates of convergence for an infinite-dimensional parameter in an exponential family model. An estimator that achieves the optimal rate is constructed by maximum likelihood on finite-dimensional approximations with parameter dimension that grows with sample size. 1. Introduction. Our main purpose in this paper is to extend the theory developed by Hall and Horowitz 2007 for regression with mean a linear function of an unknown square integrable function B defined on a compact interval of the real line to observations y i from an exponential famly whose canonical parameter is of the form 1 0 BtX it dt for observed Gaussian processes X i. Our methods introduce several new technical devices. We establish a sharp approximation for maximum likelihood estimators for exponential families parametrized by linear functions of m-dimensional parameters, for an m that grows with sample size. We develop a change of measure argument inspired by ideas from Le Cam s theory of asymptotic equivalence of models to eliminate the effect of bias terms from the asymptotics of maximization estimators. And we obtain improved bounds for projections onto subspaces defined by eigenfunctions of perturbations of compact operators, bounds that simplify arguments involving estimates of unknown covariance kernels. More precisely, we consider problems where the observed data consist of independent, identically distributed pairs y i, X i where each X i is a Gaussian process indexed by a compact subinterval of the real line, which with no loss of generality we take to be [0, 1]. We write m for Lebesgue measure on the Borel sigma-field of [0, 1]. We denote the corresponding norm and inner product in the space L 2 m by and,. We assume the conditional distribution of y i given the process X i comes from Supported in part by NSF FRG grant DMS Supported in part by NSF grant MSPA-MCS Supported in part by NSF Career Award DMS and NSF FRG grant DMS AMS 2000 subject classifications: Primary 62J05, 60K35; secondary 62G20 Keywords and phrases: Functional estimation, exponential families, minimax rates of convergence, approximation of compact operators. 1

2 2 an exponential family {Q λ : λ R} with parameter 1 1 λ i = a + X i tbt dt 0 for an unknown constant a and an unknown B L 2 m. We focus on estimation of B using integrated squared error loss: LB, B n = B B n 2 = 1 0 Bt B n t 2 dt. In a companion paper we will show that our methods can be adapted to treat the problem of prediction of a linear functional 1 0 xtbt dt for a known x, extending theory developed by Cai and Hall In that paper we also consider some of the practical realities in applying the results to the economic problem of predicting occurence of recessions from the U.S. Treasury yield curve. Our models are indexed by a set F of parameters f = a, B, K, µ, where µ is the mean and K is the covariance kernel of the Gaussian process. Under assumptions on F see Section 3 analogous to the assumptions made by Hall and Horowitz 2007 for a problem of functional linear regression, we find a sequence {ρ n } that decreases to zero for which 2 lim inf sup n f F P n,f B B n 2 /ρ n > 0 for every estimating sequence { B n } and construct one particular estimating sequence of B n s for which: for each ɛ > 0 there exists a finite constant C ɛ such that 3 sup P n,f { B B n 2 > C ɛ ρ n } < ɛ for large enough n. f F For the collection of models F = FR, α, β defined in Section 3, the rate ρ n equals n 1 2β/α+2β. In Section 9 we establish a minimax lower bound by means of a variation on Assouad s Lemma. We begin our analysis of the rate-optimal estimator in Section 4, with an approximation theorem for maximum likelihood estimators in exponential family models for parameters whose dimensions change with sample size. The main result is stated in a form slightly more general than we need for the present paper because we expect the result to find other sieve-like applications. The approximations from this section lie at the heart of our construction of an estimator that achieves the minimax rate from Section 9. As an aid to the reader, we present our construction of the estimating sequence for 3 in two stages. First Section 5 we assume that both the mean µ and the

3 FUNCTIONAL REGRESSION 3 covariance kernel K are known. This allows us to emphasize the key ideas in our proofs without the many technical details that need to be handled when µ and K are estimated in the natural way. Many of those details involve the spectral theory of compact operators. We have found some of the results that we need quite difficult to dig out of the spectral theory literature. In Section 6 we summarize the theory that we use to control errors when approximating K: some of it is a rearrangement of ideas from Hall and Horowitz 2007 and Hall and Hosseini-Nasab 2006; some is adapted from the notes by Bosq 2000 and the monograph by Birman and Solomjak 1987; and some, such as the material in subsection 6.3 on approximation of projections, we believe to be new. Armed with the spectral theory, we proceed in Section 7 to the case where µ and K are estimated. We emphasize the parallels with the argument for known µ and K, postponing the proofs of the extra approximation arguments mostly collected together as Lemma 28 to the following section. The final two sections of the paper establish a bound on the Hellinger distance between members of an exponential family, the key to our change of measure argument, and a maximal inequality for Gaussian processes. 2. Notation. For each matrix A, the spectral norm is defined as A 2 := 1/2. sup u 1 Au and the Frobenius norm by A F := i,j A2 i,j If A is symmetric, with eigenvalues λ 1,..., λ k, then A 2 = max i λ i = sup u 1 u Au A F. If A is also positive definite then the absolute values are superfluous for the first two equalities. When we want to indicate that a bound involving constants c, C, C 1,... holds uniformly over all models indexed by a set of parameters F, we write cf, CF, C 1 F,.... By the usual convention for eliminating subscripts, the values of the constants might change from one paragraph to the next: a constant C 1 F in one place needn t be the same as a constant C 1 F in another place. For sequences of constants c n that might depend on F, we write c n = O F 1 and o F 1 and so on to show that the asymptotic bounds hold uniformly over F. We write hp, Q for the Hellinger distance between two probability measures P and Q. If both P and Q are dominated by some measure ν, with densities p and q, then h 2 P, Q = ν p q 2. We use Hellinger distance to bound total variation distance, P Q TV := sup A P A QA = 1 2ν p q hp, Q.

4 4 For product measures we use the bound h 2 i n P i, i n Q i i n h2 P i, Q i. To avoid confusion with transposes, we use the dot notation or superscript notation to denote derivatives. For example,... ψ or ψ 3 both denote the third derivative of a function ψ, 3. The model. Let {Q λ : λ R} be an exponential family of probability measures with densities dq λ /dq 0 = f λ y = exp λy ψλ. Remember that e ψλ = Q 0 e λy and that the distribution Q λ has mean ψ 1 λ and variance ψ 2 λ. We assume: ψ3 There exists an increasing real function G on R + such that ψ 3 λ + h ψ 2 λg h for all λ and h Without loss of generality we assume G0 1. ψ2 For each ɛ > 0 there exists a finite constant C ɛ for which ψ 2 λ C ɛ expɛλ 2 for all λ R. Equivalently, ψ 2 λ exp oλ 2 as λ. As shown in Section 10, these assumptions on the ψ function imply that 4 h 2 Q λ, Q λ+δ δ 2 ψ 2 λ 1 + δ G δ for all λ, δ R. Remark. We may assume that ψ 2 λ > 0 for every real λ. Otherwise we would have 0 = ψ 2 λ 0 = var λ0 y = νf λ0 yy ψ 1 λ 0 2 for some λ 0, which would make y = ψ 1 λ 0 for ν almost all y and Q λ Q λ0 for every λ. We assume the observed data are iid pairs y i, X i for i = 1,..., n, where: a Each {X i t : 0 t 1} is distributed like {Xt : 0 t 1}, a Gaussian process with mean µt and covariance kernel Ks, t. b y i X i Q λi with λ i = a + X i, B for an unknown {Bt : 0 t 1} in L 2 m and a R. DEFINITION 5. For real constants α > 1 and β > α + 3/2 and R > 0, define F = FR, α, β as the set of all f = a, B, µ, K that satisfy the following conditions. K The covariance kernel is square integrable with respect to m m and has an eigenfunction expansion as a compact operator on L 2 m Ks, t = k N θ kφ k sφ k t where the eigenvalues θ k are decreasing with Rk α θ k θ k+1 +α/rk α 1.

5 FUNCTIONAL REGRESSION 5 a a R µ µ R B B has an expansion Bt = k N b kφ k t with b k Rk β, for the eigenfunctions defined by the kernel K. Remarks. all k < j, The awkward lower bound for θ k in Assumption K implies, for j 6 θ k θ j R 1 αx α 1 dx = R 1 k α j α. k If K and µ were known, we would only need the lower bound θ k R 1 k α and not the lower bound for θ k θ k+1. As explained by Hall and Horowitz 2007, page 76, the stronger assumption is needed when one estimates the individual eigenfunctions of K. Note that the subset B K of L 2 m in which B lies depends on K. We regard the need for the stronger assumption on the eigenvalues and the irksome Assumption B as artifacts of the method of proof, but we have not yet succeeded in removing either assumption. More formally, we write P µ,k for the distribution a probability measure on L 2 m of each Gaussian process X i. The joint distribution of X 1,..., X n is then P n,µ,k = Pµ,K n. We identify the y i s with the coordinate maps on R n equipped with the product measure Q n,a,b,x1,...,x n := i n Q λi, which can also be thought of as the conditional joint distribution of y 1,..., y n given X 1,..., X n. Thus the P n,f in equations 2 and 3 can be rewritten as an iterated expectation, P n,f = P n,µ,k Q n,a,b,x1,...,x n, the second expectation on the right-hand side averaging out over y 1,..., y n for given X 1,..., X n, the first averaging out over X 1,..., X n. To simplify notation, we will often abbreviate Q n,a,b,x1,...,x n to Q n,a,b. 4. Maximum likelihood estimation. The theory in this section combine ideas from Portnoy 1988 and from Hjort and Pollard We write our results in a notation that makes the applications in Section 5 and 7 more straightforward. The notational cost is that the parameters are indexed by {0, 1,..., N}. To avoid an excess of parentheses we write N + for N + 1. In the applications N changes with the sample size n and Q is replaced by Q n,a,b,n or Q n,a,b,n. Suppose ξ 1,..., ξ n are nonrandom vectors in R N +. Suppose Q = i n Q λi with λ i = ξ i γ for a fixed γ = γ 0, γ 1,..., γ N in R N +. Under Q, the coordinate maps y 1,..., y n are independent random variables with y i Q λi. The log-likelihood for fitting the model is L n g = i n ξ igy i ψξ ig for g R N +, which is maximized over R N + at the MLE ĝ = ĝ n.

6 6 Remark. As a small amount of extra bookkeeping in the following argument would show, we do not need ĝ to exactly maximize L n. It would suffice to have L n ĝ suitably close to sup g L n g. In particular, we need not be concerned with questions regarding existence or uniqueness of the argmax. Define i J n = i n ξ iξ i ψ2 λ i, an N + N + matrix ii w i := Jn 1/2 ξ i, an element of R N + iii W n = i n w i y i ψ 1 λ i, an element of R N + Notice that QW n = 0 and var Q W n = i n w iw i ψ2 λ i = I N+ and Q W n 2 = trace var Q W n = N +. LEMMA 7. Suppose 0 < ɛ 1 1/2 and 0 < ɛ 2 < 1 and max i n w i ɛ 1ɛ 2 2G1N + with G as in Assumption ψ3. Then ĝ = γ + J 1/2 n W n + r n with r n ɛ 1 on the set { W n N + /ɛ 2 }, which has Q-probability greater than 1 ɛ 2. PROOF. The equality Q W n 2 = N + and Tchebychev give Q{ W n > N + /ɛ 2 } ɛ 2. Reparametrize by defining t = Jn 1/2 g γ. The concave function L n t := L n γ + Jn 1/2 t L n γ = y iw i n it + ψλ i ψλ i + w it is maximized at t n = J 1/2 n ĝ γ. It has derivative L n t = i n w i y i ψ 1 λ i + w it. For a fixed unit vector u R N + and a fixed t R N +, consider the real-valued function of the real variable s, Hs := u Ln st = i n u w i y i ψ 1 λ i + sw it, which has derivatives Ḣs = i n u w i w itψ 2 λ i + sw it Ḧs = i n u w i w it 2 ψ 3 λ i + sw it.

7 FUNCTIONAL REGRESSION 7 Notice that H0 = u W n and Ḣ0 = u i n w iw i ψ2 λ i t = u t. Write M n for max i n w i. By virtue of Assumption ψ3, Ḧs i n u w i w it 2 ψ 2 λ i G sw it M n G M n st t i n w iw iψ 2 λ i t = M n G M n st t 2. By Taylor expansion, for some 0 < s < 1, H1 H0 Ḣ0 1 2 Ḧs 1 2 M ng M n t t 2. That is, 8 u Ln t W n + t 1 2 M ng M n t t 2. Approximation 8 will control the behavior of Ls := L n W n +su, a concave function of the real argument s, for each unit vector u. By concavity, the derivative is a decreasing function of s with Ls = u Ln W n + su = s + Rs Rs 1 2 M ng M n W n + su W n + su 2 On the set { W n N + /ɛ 2 } we have W n ± ɛ 1 u N + /ɛ 2 + ɛ 1. Thus implying M n W n ± ɛ 1 u ɛ 1ɛ 2 N + /ɛ 2 + ɛ 1 < 1, 2G1N + R±ɛ M ng1 W n ± ɛ 1 u 2 ɛ 1ɛ 2 N + /ɛ 2 + ɛ 2 1 G1N + ɛ ɛ 2 1ɛ 2 /N + < 5 8 ɛ 1. Deduce that Lɛ 1 = ɛ 1 + Rɛ ɛ 1 L ɛ 1 = ɛ 1 + R ɛ ɛ 1 The concave function s L n W n + su must achieve its maximum for some s in the interval [ ɛ 1, ɛ 1 ], for each unit vector u. It follows that t n W n ɛ 1.

8 8 COROLLARY 9. J n = nda n D Suppose ξ i = Dη i for some nonsingular matrix D, so that wherea n := 1 n If B n is another nonsingular matrix for which 10 A n B n 2 2 Bn and if i n η iη iψ 2 λ i. 11 max i n η i ɛ n/n + for some 0 < ɛ < 1 G1 32 Bn 1 2 then for each set of vectors κ 0,..., κ N in R N + there is a set Y κ,ɛ with QY c κ,ɛ < 2ɛ on which 0 j N κ jĝ γ 2 6 B 1 n 2 nɛ 0 j N D 1 κ j 2. Remark. For our applications of the Corollary in Sections 5 and 7, we need D = diagd 0, D 1,..., D N and κ j = e j, the unit vector with a 1 in its jth position, for j m and κ j = 0 for j > m. In our companion paper we will need the more general κ j s. and B 1 2 A n B n 2 1/2, which justifies PROOF. First we establish a bound on the spectral distance between A 1 n Define H = Bn 1 the expansion A 1 n Bn 1 2 = A n I. Then H 2 B 1 n I + H 1 I B 1 n 2 j 1 H k 2 B 1 n n. 2 B 1 n 2. As a consequence, A 1 n 2 2 Bn 1 2. Choose ɛ 1 = 1/2 and ɛ 2 = ɛ in Lemma 7. The bound on max i n η i gives the bound on max i n w i needed by the Lemma: n w i 2 = η idj n /n 1 Dη i = η ia 1 n η i A 1 n 2 η i 2. Define K j := Jn 1/2 κ j, so that κ j ĝ γ 2 2K j W n 2 + 2K j r n 2. By Cauchy-Schwarz, j K jr n 2 K j 2 r n 2 = U κ r n 2 j where U κ := j κ jj 1 n κ j = j n 1 D 1 κ j A 1 D 1 κ j 2n 1 B 1 n 2 j D 1 κ j 2. n

9 FUNCTIONAL REGRESSION 9 For the contribution V κ := j K j W n 2 the Cauchy-Schwarz bound is too crude. Instead, notice that QV κ = U κ, which ensures that the complement of the set Y κ,ɛ := { W n N + /ɛ} {V κ U κ /ɛ} has Q probability less that 2ɛ. On the set Y κ,ɛ, 0 j N κ jĝ γ 2 2V κ + 2U κ r n 2 3U κ /ɛ. The asserted bound follows. 5. Known Gaussian distribution. Initially we suppose that µ and K are known. We can then calculate all the eigenvalues θ k, the eigenfunctions φ k for K, and the coefficients z i,k := Z i, φ k for the expansion X i µ = Z i = k N z i,kφ k. The random variables z i,k are independent with z i,k N0, θ k. The random variables η i,k := z i,k / θ k are independent standard normals. Under Q n = Q n,a,b, the y i s are independent, with y i Q λi and λ i = a + X i, B = b 0 + k N z i,kb k where b 0 = a + µ, B. Our task is to estimate the b k s with sufficient accuracy to be able to estimate Bt = k N b kφ k t within an error of order ρ n = n 1 2β/α+2β. In fact it will suffice to estimate the component H m B of B in the subspace spanned by {φ 1,..., φ m } with m n 1/α+2β because 12 H mb 2 = k>m b2 k = O F m 1 2β = O F ρ n. We might try to estimate the coefficients b 0,..., b m by choosing ĝ = ĝ 0,..., ĝ m to maximize a conditional log likelihood over all g in R m+1, i n y iλ i,m ψλ i,m with λ i,m = g k m z i,kg k. To this end we might try to appeal to Corollary 9 in Section 4, with κ j equal to the unit vector with a 1 in its jth position for j m and κ j = 0 otherwise. That would give a bound for j m ĝ j γ j 2. Unfortunately, we cannot directly invoke the Corollary with N = m to estimate γ = b 0, b 1,..., b N when 13 Q = Q n,a,b and D = diag1, θ 1,..., θ N 1/2 ξ i = 1, z i,1,..., z i,n and η i = 1, η i,1,..., η i,n because λ i ξ i γ.

10 10 Remark. We could modify Corollary 9 to allow l i = ξ i γ + bias i, for a suitably small bias term, but at the cost of extra regularity conditions and a more delicate argument. The same difficulty arises whenever one investigates the asymptotics of maximum likelihood with the true distribution outside the model family. Instead, we use a two-stage estimation procedure that eliminates the bias term by a change of measure. Condition on the X i s. Consider an N much larger than m for which N n ζ with 2 + 2α 1 > ζ > α + 2β 1 1, Such a ζ exists because the assumptions α > 1 and β > α + 3/2 imply α + 2β 1 > 2 + 2α. Define ξ i, D, and η i as in equation 13. For Q use the probability measure Q n,a,b,n := i n Q λi,n with λ i,n := ξ i γ and γ = b 0, b 1,..., b N. Choose B n := P n,µ,k A n. Define X n = X Z,n X η,n X A,n, where X Z,n := {max i n Z i 2 C 0 log n} X η,n := {max i n η i 2 C 0 N log n} X A,n := { A n B n 2 2 B 1 n 2 1 } If we choose a large enough constant C 0 = C 0 F, Lemma 41 and its Corollary in Section 11 ensure that P n,µ,k X c Z,n 2/n and P n,µ,kx c η,n 2/n; and in subsection 5.1 we show that B 1 n 2 = O F 1 and P n,µ,k A n B n 2 2 = o F 1. Thus P n,µ,k X c n = o F 1. Moreover, on the set X n, inequality 10 holds by construction and inequality 11 holds for large enough n because max i n η i 2 O F N log n = o F n/n. Estimate γ by the ĝ = ĝ 0,..., ĝ N defined in Section 4. Then discard most of the estimates by defining B n := 1 k m ĝkφ k. For each realization of the X i s in X n, the Lemma gives a set Y m,ɛ with Q n,a,b,n Y c m,ɛ < 2ɛ on which 1 k m ĝ k γ k 2 = O F which implies 1 k m θ 1 k = O F m 1+α /n = O F ρ n, B n B 2 = 1 k m ĝ k γ k 2 + k>m b2 k = O F ρ n.

11 FUNCTIONAL REGRESSION 11 In replacing Q n,a,b by Q n,a,b,n we eliminate the bias problem but now we have to relate the probability bounds for Q n,a,b,n to bounds involving Q n,a,b. As we show in subsection 5.2, there exists a sequence of nonnegative constants c n of order o F log n, such that 17 Q n,a,b Q n,a,b,n 2 TV e 2cn i n λ i λ i,n 2 on X n. From this inequality it follows, for a large enough constant C ɛ, that P n,µ,k Q n,a,b { B n B 2 > C ɛ ρ n } P n,µ,k X c n + P n,µ,k X n Q n,a,b Q n,a,b,n TV + Q n,a,b,n Y c m,ɛ o F 1 + 2ɛ + e cn i n P n,µ,k λ i λ i,n 2 1/2. By construction, λ i λ i,n = k>n z i,kb k with the z i,k s independent and z i,k N0, θ k. Thus i n P n,µ,k λ i λ i,n 2 n k>n θ kb 2 k = O F nn 1 α 2β = o F e 2cn because ζ > α + 2β 1 1. That is, we have an estimator that achieves the O F ρ n minimax rate Approximation of A n. Throughout this subsection abbreviate P n,µ,k to P. Remember that where A n = n 1 i n η iη iψ 2 λ i,n with λ i,n = γ Dη i, γ = ā, b 1,..., b N D = diag1, θ 1,..., θ N η i = 1, η i,1,..., η i,n With B n = PA n, we need to show Bn 1 2 = O F 1 and P A n B n 2 2 = o F1. The matrix A n is an average of n independent random matrices each of which is distributed like NN ψ 2 γ DN, where N = N 0, N 1,..., N N with N 0 1 and the other N j s are independent N0, 1 s. Moreover, by rotational invariance of the spherical normal, we may assume with no loss of generality that γ DN = ā+κn 1, where κ 2 = N θ k=1 kb 2 k = O F 1.

12 12 Thus where B n = PNN ψ 2 ā + κn 1 = diagf, r 0 I N 1 [ ] r j := PN j r0 r 1 ψ2 ā + κn 1 and F = 1. r 1 r 2 The block diagonal form of B n simplifies calculation of spectral norms. B 1 n 2 = diagf 1, r 1 max 0 I N 1 2 F 1 2, r0 1 I r0 + r 2 N 1 2 max r 0 r 2 r 2 1, r0 1. Assumption ψ2 ensures that both r 0 and r 2 are O F 1. Continuity and strict positivity of ψ 2, together with max ā, κ = O F 1, ensure that c 0 := infā,κ inf x 1 ψ 2 ā + κx > 0. Thus +1 2πr0 c 0 e x2 /2 dx > 0 Similarly 2πr0 r 2 r 2 1 = 2πr 0 Pψ 2 ā + κn 1 N 1 r 1 /r c 0 r 0 x r 1 /r 0 2 e x2 /2 dx c 0 r 0 x 2 e x2 /2 dx. 1 It follows that B 1 n 2 = O F 1. The random matrix A n B n is an average of n independent random matrices each distributed like NN ψ 2 ā + κn 1 minus its expected value. Thus P A n B n 2 2 P A n B n 2 F = n 1 N var j N 0 j,k N k ψ 2 γ DN. Assumption ψ2 ensures that each summand is O F 1, which leaves us with a O F N 2 /n = o F 1 upper bound Total variation argument. To establish inequality 17 we use the bound Q n,a,b Q n,a,b 2 TV h 2 Q n,a,b, Q n,a,b i n h2 Q λi, Q λi,n By Lemma 4 h 2 Q λi, Q λi,n δ 2 i ψ 2 λ i 1 + δ i g δ i 1

13 FUNCTIONAL REGRESSION 13 where δ i = λ i λ i,n = Z i, B H N Z i, B = Z i, H NB Z i HNB O F N 1 2β log n = o F 1 Thus all the 1 + δ i g δ i factors can be bounded by a single O F 1 term. For a, B, µ, K FR, α, β and with the Z i s controlled by X n, λ i a + µ + Z i B C 2 log n for some constant C 2 = C 2 F. Assumption ψ2 then ensures that all the ψ 2 λ i are bounded by a single exp o F log n term. 6. Approximation of compact operators. Suppose T is a positive, self-adjoint compact operator on a Hilbert space H with eigenvectors {e k } and eigenvalues {θ k }. That is, T e i = θ i e i with θ 1 θ 2 0. For each x in H, T = k N θ ke k e k, a series that converges in operator norm. Let T be another positive, self-adjoint compact operator on H with corresponding representation T = θ k N k ẽ k ẽ k. Define := T T and δ =. The operator T also has a representation 18 T = j,k N T j,k e j e k. Note that T j,k = T j,k because T is self-adjoint. This representation gives = T j,k N j,k θ j {j = k} e j e k and 2 = sup x =1 x, x 2 j,k N T j,k θ j {j = k} 2. The last inequality will lend itself to the calculation of the expected value of 2 when T is random, leading to probabilistic bounds for δ.

14 14 In this section we collect some general consequences of δ being small. In the next section we draw probabilistic conclusions when T is random, for the special case where T = K and T = K, the usual estimate of the covariance kernel, both acting on H = L 2 m. The eigenvectors will become eigenfunctions φ 1, φ 2,... and φ 1, φ 2,.... We feel this approach makes it easier to follow the overall argument. Both {e j : j N} and {ẽ k : k N} are orthonormal bases for H. Define σ j,k := e j, ẽ k. Then e j = k N σ j,kẽ k and ẽ k = j N σ j,ke j and {j = j } = e j, e j = k N σ j,kσ j,k Approximation of eigenvalues. The eigenvalues have a variational characterization Bosq, 2000, Section 4.2: 19 θ j = inf sup{ x, T x : x L and x = 1}. diml<j The first infimum runs over all subspaces L with dimension at most j 1. When j equals 1 the only such subspace is. Both the infimum and the supremum are achieved: by L j 1 = span{e i : 1 i < j} and x = e j. Similar assertions hold for T and its eigenvalues. By the analog of 19 for T, θ j sup{ x, T x : x L j 1 and x = 1} sup{ x, T x δ : x L j 1 and x = 1} = θ j δ. Argue similarly with the roles of T and T reversed to conclude that 20 θ j θ j δ for all j N Approximation of eigenvectors. We cannot hope to find a useful bound on ẽ k e k, because there is no way to decide which of ±ẽ k should be approximating e k. However, we can bound f k, where f k = σ k ẽ k e k with σ k := sign σ k,k := which will be enough for our purposes. { +1 if σk,k 0 1 otherwise,

15 FUNCTIONAL REGRESSION 15 We also need to assume that the eigenvalue θ k is well separated from the other θ j s, to avoid the problem that the eigenspace of T for the eigenvalue θ k might have dimension greater than one. More precisely, we consider a k for which which implies ɛ k := min{ θ j θ k : j k} > 5δ, θ k θ j θ k θ j δ 4 5 θ k θ j 4 5 ɛ k. The starting point for our approximations is the equality 21 ẽ k, e j = T ẽ k, e j ẽ k, T e j = θ k θ j σ j,k. For j k we then have which implies θ k θ j 2 σ 2 j,k σ k ẽ k, e j 2 2 f k, e j e k, e j 2, σ 2 j,k 25 8 f k, e j 2 /ɛ 2 k + 2 T 2 j,k/θ k θ j 2 because T e k, e j = 0 for j k. To simplify notation, write j for j N {j k}. The introduction of the σ k also ensures that f k 2 = e k 2 + ẽ k 2 2σ k e k, ẽ k = 2 2 σ k,k 2 2σ 2 k,k because σ k,k 1 = 2 j σ2 j,k j 25 4 f k, e j 2 /ɛ 2 k j The first sum on the right-hand side is less than T 2 j,k/θ k θ j f k 2 /ɛ 2 k 2 f k 2 /4δ 2 f k 2 /4. The second sum can be written as 25 Λ k 2 /4 for Λ k := { T j,k /θ k θ j if j k Λ k,j e j with Λ k,j := j N 0 if j = k. Our bound for f k 2 with an untidy 25/3 increased to 9 then takes the convenient form 22 f k 2 9 Λ k 2 if ɛ k > 5.

16 16 For our applications, P Λ k 2 will be of order Ok 2 /n. When δ is much smaller than ɛ k we can get an even better approximation for f k itself. Start once more from equality 21, still assuming that ɛ k > 5δ. For j k, σ k σ j,k = σ k ẽ k, e j / θ k θ j = e k + f k, e j /θ k + γ k θ j where γ k = θ k θ k = Λ k,j 1 γ 1 k + f k, e j θ j θ k θ k θ j because T e k, e j = 0 = Λ k,j + r k,j where r k,j := θ k θ k θ j θ k Λ k,j + f k, e j θ k θ j. The r k,j s are small: r k,j 5 δ Λk,j + f k, e j 4 θ k θ j 23 5δ Λ k θ k θ j by inequality 22. for j k, if ɛ k > 5δ Define r k,k = σ k,k 1 = 1 2 f k 2 and r k = j N r k,je j. We then have a representation cf. Hall and Hosseini-Nasab, 2006, equation 2.8 and Cai and Hall, 2006, f k = σ k ẽ k e k = σ k ẽ k, e k 1 e k + j σ kσ j,k e j = Λ k + r k Approximation of projections. The operator H J = k J e k e k projects elements of H orthogonally onto span{e k : k J}; the operator H J = k J ẽk ẽ k projects elements of H orthogonally onto span{e k : k J}. We will be interested in the case J = {1, 2,..., p} with p equal to either the m or the N from Section 5. In that case, we also write H p and H p for the projection operators. In this subsection we establish a bound for H J B H J B for a B = j b je j in H. The difference H J H J equals k J σ kẽ k σ k ẽ k e k e k = k J σ kẽ k r k + k J e k + f k Λ k + k J e k + Λ k + r k e k e k e k = R J + k J e k Λ k + Λ k e k where R J := k J σ kẽ k r k + f k Λ k + r k e k.

17 FUNCTIONAL REGRESSION 17 Self-adjointness of T implies T j,k = T k,j and hence Λ j,k = Λ j,k. The antisymmetry eliminates some terms from the main contribution to H J H J : e k J k Λ k + Λ k e k = Λ k J j J c k,j e k e j + e j e k. With this simplification we get the following bound for H J H J B 2 : 3 e k J k Λ j J k,jb c j e j J c j Λ k J k,jb k R J B 2 The first two sums contribute 3 k J j J c Λ k,jb j j J c k J Λ k,jb k 2 In the next section the expected value of both sums will simplify because PΛ k,j Λ k,j will be zero if j j. For the three contributions to the bound for R J B 2 we make repeated use of the inequality, based on equations 22 and 23, r k, x 81 2 Λ k 2 x k + 5δ Λ k x j j θ k θ j, which is valid whenever ɛ k > 5δ. To avoid an unnecessary calculation of precise constants, we adopt the convention of the variable constant: we write C for a universal constant whose value might change from one line to the next. The first two contributions are: k J σ kẽ k r k, B 2 = k J r k, B 2 and C k J b2 k Λ k 4 + Cδ 2 k J Λ k 2 j f k J k Λ k, B 2 f k J k Λ k, B C k J Λ k 2 2 k J j Λ k,jb j 2. 2 b j θ k θ j For the third contribution, let x = j x je j be an arbitrary unit vector in H. Then 2 2 r k J k e k B, x = b k J k r k, x C b k J kx k Λ k Cδ 2 Λ 2 x j k J k b k j θ k θ j C k J b k Λ k Cδ 2 k J Λ k 2 k J b2 k j 1 θ k θ j 2

18 18 take the supremum over x, which doesn t even appear in the last line, to get the same bound for k J b kr k 2. In summary: if min k J ɛ k > 5δ then H J H J B 2 is bounded by a universal constant times Λ k J k,jb k + k J b2 k Λ k 4 25 k J j J c Λ k,jb j j J c + k J b k Λ k k J Λ k 2 k J + δ 2 2 Λ k J k 2 b j j + δ 2 θ k θ j k J Λ k 2 k J b2 k j 2 1. θ k θ j j Λ k,jb j 7. Unknown Gaussian distribution. When µ and K are unknown, we estimate them in the usual way: µ n t = X n t = n 1 i n X it and Ks, t = n 1 1 X i s X n s X i t X n t i n = n 1 1 Z i s Zs Z i t Zt, i n which has spectral representation Ks, t = θ φ k N k k s φ k t. In fact we must have θ k = 0 for k n because all the eigenfunctions φ k corresponding to nonzero θ k s must lie in the n 1-dimensional space spanned by {Z i Z : i = 1, 2,..., n}. The construction and analysis of the new estimator B will parallel the method developed in Section 5 for the case of known K and µ. The quantities m and N are the same as before. We write H p for p = N or p = m for the operator that projects orthogonally onto span{ φ 1,..., φ p }. Essentially we have only to estimate all the quantities that appeared in the previous proof then show that none of the errors of estimation is large enough to upset analogs of the calculations from Section 5. There is a slight complication caused by the fact that we do not know which of ± φ j should be used to approximate φ j. At strategic moments we will be forced to multiply by the matrix S := diagσ 0,..., σ N with σ 0 = 1 and σ k = sign φ k, φ k for k 1. The results from Section 6 will control the difference f k := σ φ k k φ k. The other key quantities are: i := K K 2

19 ii D = diag1, θ 1,..., θ N 1/2 iii z i = z i,1,..., z i,n where z i,k = Z i, φ k FUNCTIONAL REGRESSION 19 iv z = z 1,..., z N where z k = Z, φ k = n 1 i n z i,k v ξ i = 1, z i z and η i = D 1 ξ i. [We could define η i = D 1 ξ i but then we would need to show that D 1 ξ i D 1 ξ i. Our definition merely rearranges the approximation steps.] vi γ := γ 0, b 1,..., b N where B = b φ k N k k and γ 0 := a + B, X. [Note that λ i = γ 0 + B, Z i Z.] vii λ i,n = γ 0 + H N B, Z i Z = ξ i γ. viii ĝ = argmax g R N+1 i n y i ξ i g ψ ξ i g and B = 1 k m ĝk φ k. [Note that these two quantities differ from the ĝ and B in Section 5.] ix Ãn = n 1 i n η i η i ψ2 λ i,n The use of estimated quantities has one simplifying consequence: Z i t Zt = z k N i,k z k φ k t so that θ k {j = k} = Ks, t φ j s φ k t ds dt which implies n 1 1 i n z i z i = D 2 and = n 1 1 i n z i,j z j z i,k z k, 26 n 1 1 i n η i η i = D 1 D 2 D 1 := diag1, θ 1 /θ 1,..., θ N /θ N. We will analyze K by rewriting it using the eigenfunctions for K. Remember that z i,j = Z i, φ j and the standardized variables η i,j = z i,j / θ j are independent N0, 1 s. Define z j = Z, φ j and η j = n 1 i n η i,j and C j,k := n 1 1 i n η i,j η j η i,k η k, a sample covariance between two independent N0, I N random vectors. Then Z i t Zt = j N z i,j z j φ j t = j N θ j η i,j η j φ j t

20 20 and 27 Ks, t = j,k N K j,k φ j sφ k t with K j,k = θ j θ k C j,k Moreover, as shown in Section 6, the main contribution to f k = σ k φ k φ k is Λ k := j N Λ k,jφ j with Λ k,j := { θj θ k C j,k /θ k θ j if j k 0 if j = k. In fact, most of the inequalities that we need to study the new B come from simple moment bounds Lemma 31 for the sample covariances C j,k and the derived bounds Lemma 32 for the Λ k s. As before, most of the analysis will be conditional on the X i s lying in a set with high probability on which the various estimators and other random quantities are well behaved. LEMMA 28. For each ɛ > 0 there exists a set X ɛ,n, depending on µ and K, with sup F P X c n,µ,k ɛ,n < ɛ for all large enough n and on which, for some constant C ɛ that does not depend on µ or K, i C ɛ n 1/2 ii max i n Z i C ɛ log n and Z Cɛ n 1/2 iii H m H m B 2 = o F ρ n iv H N H N B 2 = O F n 1 ν for some ν > 0 that depends only on α and β v max i n η i 2 = o F n/n vi SÃn S A n 2 = o F 1 This Lemma whose proof appears in Section 8 contains everything we need to show that B B 2 has the uniform O F ρ n rate of convergence in P n,f probability, as asserted by equation 3. In what follows, all assertions refer to the numbered parts of Lemma 28. As before, the component of B orthogonal to span{ φ 1,..., φ m } causes no trouble because B B 2 = ĝ γ H mb 2 and, by iii, H mb 2 2 H mb H m H m B 2 = O F ρ n on X ɛ,n.

21 FUNCTIONAL REGRESSION 21 To handle ĝ γ 2, invoke Corollary 9 for X i s in X ɛ,n, with η i replaced by η i and A n replaced by Ãn and B n replaced by B n = SB n S, the same B n and D as before, and Q equal to Q n,a,b,n = i n Q λi,n. to get a set Ỹm,ɛ with Q n,a,b,nỹc m,ɛ < 2ɛ on which ĝ γ 2 2 = O Fρ n. The conditions of the Corollary are satisfied on X ɛ,n, because of v and Ãn B n 2 Ãn SA n S 2 + SA n S SB n S 2 = o F 1. To complete the proof it suffices to show that Q n,a,b,n Q n,a,b,n TV tends to zero. First note that λ i,n λ i,n = a + B, X + H N B, Z i Z a B, µ H N B, Z i which implies that, on X ɛ,n, = H NB, Z H NB, Z + H NB, Z + H N B H N B, Z i λ i,n λ i,n 2 2 HNB, Z H 2 N B H N B 2 Z i + Z O F N 1 2β Cɛ 2 n 1 + O F n 1 ν Cɛ 2 n 1/2 + log n 2 29 = O F n 1 ν for some 0 < ν < ν. Now argue as in subsection 5.2: on X ɛ,n, Q n,a,b,n Q n,a,b,n 2 TV i n h2 Q λi,n, Q λi,n exp o F log n i n λ i,n λ i,n 2 = o F 1. Finish the argument as before, by splitting into contributions from X c n and X n Ỹc m,ɛ and X n Ỹm,ɛ. 8. Proofs of unproven assertions from Section 7. Many of the inequalities in this section involve sums of functions of the θ j s. The following result will save us a lot of repetition. To simplify the notation, we drop the subscripts from P n,µ,k. LEMMA 30. i For each r 1 there is a constant C r = C r F for which κ k r, γ := j γ {j k} j N θ j θ k r C r 1 + k r1+α γ if r > 1 C k 1+α γ log k if r = 1

22 22 ii For each p, k p j>p k α 2β j α θ k θ j 2 = O F p 1 α PROOF. For i, argue in the same way as Hall and Horowitz 2007, page 85, using the lower bounds c α j α if j < k/2 θ j θ k c α j k k α 1 if k/2 j 2k c α k α if j > 2k where c α is a positive constant. For ii, split the range of summation into two subsets: {k, j : j > maxp, 2k} and {k, j : p/2 < k p < j 2k}. The first subset contributes at most k p k α 2β j>maxp,2k j α c α k α 2 = O F p 1 α because α 2β < 3. The second subset contributes at most p/2<k p k α 2β c 2 α k 2α+2 j>p j α j k 2 = O F p.p 2+α 2β p α O1, which is of order o F p α. The distribution of C j,k does not depend on the parameters of our model. Indeed, by the usual rotation of axes we can rewrite n 1C j,k as U j U k, where U 1, U 2,... are independent N0, I n 1 random vectors. This representation gives some useful equalities and bounds. LEMMA 31. Uniformly over distinct j, k, l, i PC j,j = 1 and P C j,j 1 2 = 2n 1 1 ii PC j,k = PC j,k C j,l = 0 iii PC 2 j,k = On 1 iv PC 2 j,k C2 l,k = ton 2 v PC 4 j,k = On 2 PROOF. Assertion i is classical because U j 2 χ 2 n 1. For assertion ii use PU 1 U 2 U 2 = 0 and PU 1U 2 U 2U 3 U 2 = trace U 2 U 2PU 3 U 1 = 0. For iii use PU 1 U 1 = I n 1 and PU 1U 2 U 2U 1 U 2 = trace U 2 U 2PU 1 U 1 = traceu 2 U 2 = U 2 2.

23 For iv use P U 2 4 = n 2 1 and FUNCTIONAL REGRESSION 23 PU 1U 2 2 U 3U 2 2 U 2 = U 2 4 For v, check that the coefficient of t 4 in the Taylor expansion of P exptu 1U 2 = P exp 1 2 t2 U 1 2 = 1 t 2 n 1/2 is of order n 2. LEMMA 32. Uniformly over distinct j, k, l, i PΛ k,j = PΛ k,j Λ k,l = 0 ii PΛ 2 k,j = O F n 1 k α j α θ k θ j 2 iii PΛ 4 k,j = O F n 2 k 2α j 2α θ k θ j 4 iv P Λ k 2 = O F n 1 k 2 v P Λ k 4 = O F n 2 k 4 PROOF. Assertions i, ii, and iii follow from Assertions ii and iii of Lemma 31. For iv, note that For v note that P Λ k 2 = j PΛ2 j,k = O F n 1 k α κ k 2, α 2 P Λ k 4 = P θ jθ j k Sj,kθ 2 k θ j 2 = θ jθ j l l θkθ 2 k θ j 2 θ k θ l 2 PSj,kS 2 l,k 2 = O F n 2 = O F n 2 k 4. j θ jθ k θ k θ j 2 To prove Lemma 28 we define X ɛ,n as an intersection of sets chosen to make the six assertions of the Lemma hold, 2 X ɛ,n := X,n X Z,n X Λ,n X η,n X A,n, where the complement of each of the five sets appearing on the right-hand side has probability less than ɛ/5. More specifically, for a large enough constant C ɛ, we

24 24 define X,n = { C ɛ n 1/2 } X Z,n = {max i n Z i 2 C ɛ log n and Z C ɛ n 1/2 } X η,n = {max i n η i 2 C ɛ N log n} as in Section 5 X A,n = { η i η i n i 2 C ɛ n} The definition of X Λ,n, in subsection 8.3, is slightly more complicated. It is defined by requiring various functions of the Λ k s to be smaller than C ɛ times their expected values. The set X A,n is almost redundant. From Definition 5 we know that min θ 1 j<j j θ j α/rn 1 α and min θ j R 1 N α. N 1 j N The choice N n ζ with ζ < 2+2α 1 ensures that n 1/2 N 1 α. On X,n the spacing assumption used in Section 6 holds for all n large enough; all the bounds from that Section are avaiable to us on X ɛ,n. In particular, max j N θ j /θ j 1 O F N α = o F 1. Equality 26 shows that X A,n X,n eventually if we make sure C ɛ > Proof of Lemma 28 part i. Observe that P 2 = 2 K P j,k j,k θ j {j = k} = j,k θ jθ k P S j,k {j = k} 2 j θ jo F n 1 + j,k θ jθ k O F n 2 = O F n Proof of Lemma 28 part ii. As before, Corollary 42 controls max i n Z i 2. To control the Z contribution, note that n Z 2 has the same distribution as Z 1 2, which has expected value j N θ j <.

25 FUNCTIONAL REGRESSION Proof of Lemma 28 parts iii and iv. Calculate expected values for all the terms that appear in the bound 25 from Section and P n,µ,k k p = k p Λ j>p k,jb j P n,µ,kλ 2 j>p k,j 2 + P n,µ,k b 2 j + b 2 k j>p = O F n 1 k p j>p k α 2β j α θ k θ j 2 = O F n 1 p 1 α by Lemma 30 P n,µ,k k p b2 k Λ k 4 = O F n 2 k p k4 2β = O F n 2 and P n,µ,k k p b k Λ k 2 = O F n 1 k J k2 β = O F n 1 k p Λ k,jb k 2 by Lemma 32i 1 + p 5 2β + log p 1 + p 3 β + log p and P n,µ,k k p Λ k 2 = O F n 1 p 3 and P n,µ,k 2 = O F n 1 Λ k p j k,jb j k p 34 = O F n 1 by Lemma 30 and j k α j a 2β θ k θ j 2 35 δ 2 2 P n,µ,k Λ k p k 2 b j j θ k θ j = O F n 1 δ 2 p 3 + p 5+2α 2β log 2 p and 36 k p b2 k j 2 1 = O F 1 + p 3+2α 2β log 2 p by Lemma 30. θ k θ j For some constant C ɛ = C ɛ F, on a set X Λ,n with P n,µ,k X c Λ,n < ɛ, each of the random quantities in the previous set of inequalities for both p = m and p = N is bounded by C ɛ times its P n,µ,k expected value. By virtue of Lemma 32iv, we may also assume that Λ k 2 C ɛ k 2 /n on X Λ,n.

26 26 From inequality 25, it follows that on the set X,n X Λ,n, for both p = m and p = N, H p H p B 2 O F n 1 p 1 α + O F n p 5 2β + log p + p 6 β + log 2 p + O F n 1 p 3 O F n 1 + O F n 2 p 3 + p 5+2α 2β log 2 p + O F n 2 p 3 O F 1 + p 3+2α 2β log 2 p = O F n 1 p 1 α if p N. This inequality leads to the asserted conclusions when p = m or p = N Proof of Lemma 28 part v. By construction, η i1 = 1 for every i and, for j 2, θ j η i,j = z i,j z j = Z i Z, φ j Thus, for j 2, σ j η i,j = θ 1/2 j Z i Z, φ j + f j = η i,j + δ i,j with δ i,j 2 θj 1 2 j Z i + Z fj 2 2+α log n O F n on X ɛ,n. In vector form, 37 S η i = η i + δ i with δ i 2 = O F N 3+α log n n It follows that o F n/n 2 on X ɛ,n. max i n η i = max i n S η i max i n η i +o F n/n = O F n/n on X ɛ,n Proof of Lemma 28 part vi. From inequality 29 we know that ɛ N := max i n λ i,n λ i,n = O F n 1+ν /2 on X ɛ,n and from subsection 5.2 we have max i n λ i,n = O F log n. Assumption ψ3 in Section 3 and the Mean-Value theorem then give max i n ψ 2 λ i,n ψ 2 λ i,n ɛ N ψ 2 λ i,n Gɛ N = o F 1.

27 FUNCTIONAL REGRESSION 27 If we replace ψ 2 λ i,n in the definition of Ãn by L i := ψ 2 λ i,n we make a change Γ with Γ 2 o F 1 n 1 1 i n η i η i 2, which, by equality 26, is of order o F 1 on X ɛ,n. From Assumption ψ2 we have c n := log max i n L i = o F log n. Uniformly over all unit vectors u in R N+1 we therefore have u SÃn Su = o F 1 + n 1 1 L iu η i + δ i η i + δ i u i n = o F On 1 u A n u + O F n 1 i n L i u δ i 2 + 2u η i u δ i Rearrange then take a supremum over u to conclude that SÃn S A n 2 o F 1 + O F e cn max i n δ i δ i η i Representation 37 and the defining property of X η,n then ensure that the upper bound is of order o F 1 on X ɛ,n. 9. The minimax lower bound. We will apply a slight variation on Assouad s Lemma combining ideas from Yu 1997 and from van der Vaart 1998, Section 24.3 to establish inequality 2. We consider behavior only for µ = 0 and a = 0, for a fixed K with spectral decomposition j N θ jφ j φ j. For simplicity we abbreviate P n,0,k to P. Let J = {m + 1, m + 2,..., 2m} and Γ = {0, 1} J. Let β j = Rj β. For each γ in Γ define B γ = ɛ j J γ jβ j φ j, for a small ɛ > 0 to be specified, and write Q γ for the product measure i n Q λi γ with λ i γ = B γ, Z i = ɛ j J γ jβ j z i,j. For each j let Γ j = {γ Γ : γ j = 1} and let ψ j be the bijection on Γ that flips the jth coordinate but leaves all other coordinates unchanged. Let π be the uniform distribution on Γ, that is, π γ = 2 m for each γ. For each estimator B = b j N j φ j we have B γ B 2 j J γ j β j b 2 j and so sup P n,f B B 2 π γ F γ Γ 38 = 2 m j J 2 m j J ɛγ j β j b j 2 PQ γ j J P Q γ ɛβ j b j 2 + Q γ Γ ψj j γ0 b j 2 γ Γ j 1 4 ɛβ j 2 P Q γ Q ψj γ,

28 28 the last lower bound coming from the fact that ɛβ j b j b j ɛβ j 2 for all b j. We assert that, if ɛ is chosen appropriately, 39 min j,γ P Q γ Q ψj γ stays bounded away from zero as n, which will ensure that the lower bound in 38 is eventually larger than a constant multiple of j J β2 j cρ n for some constant c > 0. Inequality 2 will then follow. To prove 39, consider a γ in Γ and the corresponding γ = ψ j γ. By virtue of the inequality Q γ Q γ = 1 Q γ Q γ TV 1 2 1/2 i n h2 Q λi γ, Q λi γ it is enough to show that 40 lim sup n max j,γ P 2 i n h2 Q λi γ, Q λi γ < 1. Define X n = {max i n Z i 2 C 0 log n}, with the constant C 0 large enough that PX c n = o 1. On X n we have and, by inequality 4, λ i γ 2 j J β2 j Z i 2 = Oρ n log n = o1 h 2 Q λi γ, Q λi γ O F 1 λ i γ λ i γ 2 ɛ 2 O F 1β 2 j z 2 i,j. We deduce that P 2 i n h2 Q λi γ, Q λi γ 2PX c n + i n ɛ2 O F 1βj 2 PX n zi,j 2 o1 + ɛ 2 O1nβ 2 j θ j. The choice of J makes β 2 j θ j R 2 m α 2β R 2 /n. Assertion 40 follows. 10. Hellinger distances in an exponential family. We need to show that h 2 Q λ, Q λ+δ δ 2 ψ 2 λ 1 + δ G δ for all real λ and δ. Temporarily write λ for λ + δ and λ for λ + λ /2 = λ + δ/ h2 Q λ, Q λ = f λ yf λ y = = exp exp λy 1 2 ψλ 1 2 ψλ ψλ 1 2 ψλ 1 2 ψλ 1 + ψλ 1 2 ψλ 1 2 ψλ

29 That is, FUNCTIONAL REGRESSION 29 h 2 Q λ, Q λ ψλ + ψλ + δ 2ψλ + δ/2. By Taylor expansion in δ around 0, the right-hand side is less than 1 4 δ2 ψ 2 λ δ3 ψ 3 λ + δ 1 8 ψ3 λ δ /2 where 0 < δ < δ. Invoke inequality 3 twice to bound the coefficient of δ 3 /6 in absolute value by ψ 2 λ G δ G δ /2 9 8 ψ2 λg δ. The stated bound simplifies some unimportant constants. 11. Bounds for Gaussian processes. As a consequence of defining property K, the centered process Z := X µ has an expansion Zt = k N θk η k φ k t where the η k s are independent N0, 1 s, implying Z 2 = θk θ k,k N k η k η k φ k tφ k s dt ds = θ k N kηk. 2 LEMMA 41. Suppose W i = k N τ i,kηi,k 2 for i = 1,..., n, where the η i,k s are independent standard normals and the τ i,k s are nonnegative constants with > T := max i n k N τ i,k. Then P{max i n W i > 4T log n + x} < 2e x for each x 0. PROOF. Without loss of generality suppose T = 1. For s = 1/4, note that P expsw i = 1 2sτ k N i,k 1/2 exp e 1/4 k N sτ i,k by virtue of the inequality log1 t 2t for t 1/2. With the same s, it then follows that P{max i n W i > 4log n + x} exp 4slog n + x P exp max i n sw i e x 1 n P expsw i. i n The 2 is just a clean upper bound for e 1/4. COROLLARY 42. where C = 4C k N k α <. P n {max i n Z i 2 > C log n + x} 2e x

30 30 References. BIRMAN, M. S. and SOLOMJAK, M. Z Spectral Theory of Self-Adjoint Operators in Hilbert Space. Reidel Translation from the 1980 Russian edition. BOSQ, D Linear Processes in Function Spaces: Theory and Applications. Lecture Notes in Statistics 149. Springer-Verlag. CAI, T. T. and HALL, P Prediction in functional linear regression. Annals of Statistics HALL, P. and HOROWITZ, J. L Methodology and convergence rates for functional linear regression. Annals of Statistics HALL, P. and HOSSEINI-NASAB, M On properties of functional principal components analysis. J. R. Statist. Soc. B HJORT, N. L. and POLLARD, D Asymptotics for minimisers of convex processes Technical Report, Yale University. PORTNOY, S Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. Annals of Statistics VAN DER VAART, A. W Asymptotic Statistics. Cambridge University Press. YU, B Assouad, Fano, and Le Cam. In A Festschrift for Lucien Le Cam D. Pollard, E. Torgersen and G. L. Yang, eds Springer-Verlag, New York. STATISTICS DEPARTMENT YALE UNIVERSITY Wei.Dou@yale.edu David.Pollard@yale.edu Huibin.Zhou@yale.edu URL:

5.1 Approximation of convex functions

5.1 Approximation of convex functions Chapter 5 Convexity Very rough draft 5.1 Approximation of convex functions Convexity::approximation bound Expand Convexity simplifies arguments from Chapter 3. Reasons: local minima are global minima and

More information

A linear algebra proof of the fundamental theorem of algebra

A linear algebra proof of the fundamental theorem of algebra A linear algebra proof of the fundamental theorem of algebra Andrés E. Caicedo May 18, 2010 Abstract We present a recent proof due to Harm Derksen, that any linear operator in a complex finite dimensional

More information

A linear algebra proof of the fundamental theorem of algebra

A linear algebra proof of the fundamental theorem of algebra A linear algebra proof of the fundamental theorem of algebra Andrés E. Caicedo May 18, 2010 Abstract We present a recent proof due to Harm Derksen, that any linear operator in a complex finite dimensional

More information

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS G. RAMESH Contents Introduction 1 1. Bounded Operators 1 1.3. Examples 3 2. Compact Operators 5 2.1. Properties 6 3. The Spectral Theorem 9 3.3. Self-adjoint

More information

CONDITIONED POISSON DISTRIBUTIONS AND THE CONCENTRATION OF CHROMATIC NUMBERS

CONDITIONED POISSON DISTRIBUTIONS AND THE CONCENTRATION OF CHROMATIC NUMBERS CONDITIONED POISSON DISTRIBUTIONS AND THE CONCENTRATION OF CHROMATIC NUMBERS JOHN HARTIGAN, DAVID POLLARD AND SEKHAR TATIKONDA YALE UNIVERSITY ABSTRACT. The paper provides a simpler method for proving

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Random Bernstein-Markov factors

Random Bernstein-Markov factors Random Bernstein-Markov factors Igor Pritsker and Koushik Ramachandran October 20, 208 Abstract For a polynomial P n of degree n, Bernstein s inequality states that P n n P n for all L p norms on the unit

More information

Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data

Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Raymond K. W. Wong Department of Statistics, Texas A&M University Xiaoke Zhang Department

More information

Optimization Theory. A Concise Introduction. Jiongmin Yong

Optimization Theory. A Concise Introduction. Jiongmin Yong October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization

More information

5 Compact linear operators

5 Compact linear operators 5 Compact linear operators One of the most important results of Linear Algebra is that for every selfadjoint linear map A on a finite-dimensional space, there exists a basis consisting of eigenvectors.

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

Submitted to the Brazilian Journal of Probability and Statistics

Submitted to the Brazilian Journal of Probability and Statistics Submitted to the Brazilian Journal of Probability and Statistics Multivariate normal approximation of the maximum likelihood estimator via the delta method Andreas Anastasiou a and Robert E. Gaunt b a

More information

In particular, if A is a square matrix and λ is one of its eigenvalues, then we can find a non-zero column vector X with

In particular, if A is a square matrix and λ is one of its eigenvalues, then we can find a non-zero column vector X with Appendix: Matrix Estimates and the Perron-Frobenius Theorem. This Appendix will first present some well known estimates. For any m n matrix A = [a ij ] over the real or complex numbers, it will be convenient

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

arxiv: v5 [math.na] 16 Nov 2017

arxiv: v5 [math.na] 16 Nov 2017 RANDOM PERTURBATION OF LOW RANK MATRICES: IMPROVING CLASSICAL BOUNDS arxiv:3.657v5 [math.na] 6 Nov 07 SEAN O ROURKE, VAN VU, AND KE WANG Abstract. Matrix perturbation inequalities, such as Weyl s theorem

More information

here, this space is in fact infinite-dimensional, so t σ ess. Exercise Let T B(H) be a self-adjoint operator on an infinitedimensional

here, this space is in fact infinite-dimensional, so t σ ess. Exercise Let T B(H) be a self-adjoint operator on an infinitedimensional 15. Perturbations by compact operators In this chapter, we study the stability (or lack thereof) of various spectral properties under small perturbations. Here s the type of situation we have in mind:

More information

Minimax lower bounds I

Minimax lower bounds I Minimax lower bounds I Kyoung Hee Kim Sungshin University 1 Preliminaries 2 General strategy 3 Le Cam, 1973 4 Assouad, 1983 5 Appendix Setting Family of probability measures {P θ : θ Θ} on a sigma field

More information

Linear Algebra I. Ronald van Luijk, 2015

Linear Algebra I. Ronald van Luijk, 2015 Linear Algebra I Ronald van Luijk, 2015 With many parts from Linear Algebra I by Michael Stoll, 2007 Contents Dependencies among sections 3 Chapter 1. Euclidean space: lines and hyperplanes 5 1.1. Definition

More information

Global Maxwellians over All Space and Their Relation to Conserved Quantites of Classical Kinetic Equations

Global Maxwellians over All Space and Their Relation to Conserved Quantites of Classical Kinetic Equations Global Maxwellians over All Space and Their Relation to Conserved Quantites of Classical Kinetic Equations C. David Levermore Department of Mathematics and Institute for Physical Science and Technology

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

Mathematics Department Stanford University Math 61CM/DM Inner products

Mathematics Department Stanford University Math 61CM/DM Inner products Mathematics Department Stanford University Math 61CM/DM Inner products Recall the definition of an inner product space; see Appendix A.8 of the textbook. Definition 1 An inner product space V is a vector

More information

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space.

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Hilbert Spaces Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Vector Space. Vector space, ν, over the field of complex numbers,

More information

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim

More information

The Codimension of the Zeros of a Stable Process in Random Scenery

The Codimension of the Zeros of a Stable Process in Random Scenery The Codimension of the Zeros of a Stable Process in Random Scenery Davar Khoshnevisan The University of Utah, Department of Mathematics Salt Lake City, UT 84105 0090, U.S.A. davar@math.utah.edu http://www.math.utah.edu/~davar

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004
