Existence and Uniqueness of Penalized Least Square Estimation for Smoothing Spline Nonlinear Nonparametric Regression Models
Chunlei Ke and Yuedong Wang

March 1, 2004

1  The Model

A smoothing spline nonlinear nonparametric regression model (SSNNRM) assumes that

    y_i = N_i(g_1, ..., g_r) + \epsilon_i,   i = 1, ..., n,   (1)

where the N_i are known nonlinear functionals, g = (g_1, ..., g_r) are unknown functions, and the \epsilon_i are iid N(0, \sigma^2) random errors. Without loss of generality, we assume that r = 2. As in O'Sullivan (1990), we express the design points x explicitly in the functional N_i: N_i(g_1, g_2) = \eta(g_1, g_2; x_i), where \eta is a known nonlinear functional. In the following sections, \eta(g_1, g_2; x) is sometimes also written as \eta(g_1, g_2) or \eta when the meaning is clear. g_1 and g_2 are estimated as minimizers of the penalized least squares (PLS) criterion

    l_{n\lambda}(g_1, g_2) = (1/n) \sum_{i=1}^n (y_i - \eta(g_1, g_2; x_i))^2 + \lambda_1 J_1(g_1) + \lambda_2 J_2(g_2),   (2)

where g_j \in H_j, j = 1, 2, the H_j are Hilbert spaces, \lambda_1 and \lambda_2 are smoothing parameters, and J_1 and J_2 are penalty functionals. In this supplement, adapting the frameworks in Cox (1988), Cox and O'Sullivan (1990) and O'Sullivan (1990), we establish the existence and uniqueness of the solution to (2). To save space, some details are omitted in the following sections.

Chunlei Ke (cke@sjm.com) is Statistician, St. Jude Medical, Cardiac Rhythm Management Division, Sylmar, CA. Yuedong Wang (yuedong@pstat.ucsb.edu) is Professor, Department of Statistics and Applied Probability, University of California, Santa Barbara, California. Yuedong Wang's research was supported by NIH Grant R01 GM. Address for correspondence: Yuedong Wang, Department of Statistics and Applied Probability, University of California, Santa Barbara, California.
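To make the criterion (2) concrete, the following is a small numerical sketch (not part of the original supplement) that minimizes a PLS objective after representing g_1 and g_2 in a finite basis. The polynomial basis, the toy functional \eta(g_1, g_2; x) = g_1(x) exp(g_2(x)), and the crude quadratic penalties are all illustrative assumptions, not constructions from this paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 100
x = np.sort(rng.uniform(0.0, 1.0, n))
X = np.vander(x, 4, increasing=True)  # cubic polynomial basis [1, x, x^2, x^3]

# true coefficient vectors for g1 and g2 (hypothetical)
c1_true = np.array([1.0, 0.5, 0.0, 0.0])
c2_true = np.array([0.0, -1.0, 0.5, 0.0])
y = (X @ c1_true) * np.exp(X @ c2_true) + rng.normal(0.0, 0.05, n)

lam1, lam2 = 1e-4, 1e-4  # smoothing parameters lambda_1, lambda_2

def pls(theta):
    c1, c2 = theta[:4], theta[4:]
    eta = (X @ c1) * np.exp(X @ c2)  # toy nonlinear functional eta(g1, g2; x)
    # quadratic penalties on the higher-order coefficients stand in for J1, J2
    penalty = lam1 * c1[2:] @ c1[2:] + lam2 * c2[2:] @ c2[2:]
    return np.mean((y - eta) ** 2) + penalty

fit = minimize(pls, np.zeros(8), method="BFGS")
print(fit.fun)  # minimized PLS objective, near the noise level
```

With finite-dimensional representations, (2) is an ordinary penalized nonlinear least squares problem; the existence and uniqueness questions below concern the infinite-dimensional version.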
2  Some Assumptions

We make the following assumptions. See O'Sullivan (1990) for a discussion of these assumptions.

Assumption A.1 (Observational model)
(i) x_i \in \Omega_0 \subset R^d, where \Omega_0 is a bounded, open and simply connected set with C^\infty boundary.
(ii) {x_i, i = 1, ..., n} is a random sample from a probability density function f, which is strictly bounded away from zero and infinity on \Omega_0. Denote by F and F_n the CDF and empirical CDF of the sample x_i's.
(iii) The x_i's and \epsilon_i's are independent.

Denote the true functions in model (1) as g_{10} and g_{20}. It is obvious that g_{10} and g_{20} are minimizers of

    l_0(g_1, g_2) = \int_{\Omega_0} (\eta(g_1, g_2; x) - \eta(g_{10}, g_{20}; x))^2 dF(x).   (3)

As in Cox and O'Sullivan (1990) and O'Sullivan (1990), we define l_\lambda(g_1, g_2) as the limiting regularization functional of l_{n\lambda}:

    l_\lambda(g_1, g_2) = \int_{\Omega_0} (\eta(g_1, g_2; x) - \eta(g_{10}, g_{20}; x))^2 dF(x) + \lambda_1 J_1(g_1) + \lambda_2 J_2(g_2).   (4)

Assumption A.2 (Parameter space)
(i) H_1 and H_2 are Hilbert spaces with norms ||.||_1, ||.||_2 and inner products <.,.>_1 and <.,.>_2. For simplicity of notation, the subscripts will be dropped when there is no confusion. Let H = H_1 x H_2 with the norm ||(g_1, g_2)|| = ||g_1|| + ||g_2||.
(ii) The penalty functionals are of quadratic form: J_1(g_1) = <g_1, W_1 g_1> and J_2(g_2) = <g_2, W_2 g_2>, where W_1 and W_2 are nonnegative definite linear operators on H_1 and H_2 respectively.
(iii) There are bounded linear operators L_1 : H_1 -> L_2(\Omega_0) and L_2 : H_2 -> L_2(\Omega_0) with zero null-space, and strictly positive constants M_1 and M_2, such that for all g_i \in H_i,

    M_1 ||g_i||^2 <= ||L_i g_i||^2_{L_2} + <g_i, W_i g_i> <= M_2 ||g_i||^2,   i = 1, 2.

We then define bounded linear operators U_1 and U_2 by (L_i g_i, L_i h_i)_{L_2} = <g_i, U_i h_i> for all g_i, h_i \in H_i, i = 1, 2.
U_1 and U_2 are compact, and satisfy

    M_1 ||g_i||^2 <= <g_i, U_i g_i> + <g_i, W_i g_i> <= M_2 ||g_i||^2,   g_i \in H_i, i = 1, 2.

We now define normed spaces based on U_i and W_i as in O'Sullivan (1990). For i = 1, 2, there is a sequence {\phi_{vi} : v = 1, 2, ...} of eigenfunctions and a sequence {\gamma_{vi} : v = 1, 2, ...} of eigenvalues that satisfy

    <\phi_{\mu i}, U_i \phi_{vi}> = \delta_{v\mu},   <\phi_{\mu i}, W_i \phi_{vi}> = \gamma_{vi} \delta_{v\mu},

where v and \mu are any positive integers and \delta_{v\mu} is Kronecker's delta. For b >= 0, define

    ||g_i||_{bi} = { \sum_{v=1}^\infty (1 + \gamma_{vi})^b <g_i, U_i \phi_{vi}>^2 }^{1/2},

and let H_{bi} be the normed linear space obtained by completing {g \in H_i : ||g||_{bi} < \infty}. H_{bi} is a Hilbert space with inner product

    <g, h>_{bi} = \sum_{v=1}^\infty (1 + \gamma_{vi})^b <g, U_i \phi_{vi}> <h, U_i \phi_{vi}>.

It is easy to see that H_i = H_{1i}.

Assumption A.3 (Properties of \eta, g_{10} and g_{20}) For some \alpha_1, \alpha_2 \in (0, 1], there are g_{10} \in H_{\alpha_1 1}, g_{20} \in H_{\alpha_2 2} and neighbourhoods N_1 \subset H_{\alpha_1 1} of g_{10} and N_2 \subset H_{\alpha_2 2} of g_{20}, such that
(i) \eta(g_1, g_2; x) is three times continuously Frechet differentiable with respect to g_1 and g_2 in N_1 x N_2, and (g_{10}, g_{20}) is the unique root of D_1 l_0(g_1, g_2) = 0 and D_2 l_0(g_1, g_2) = 0.
(ii) For some s such that m\alpha > s > d/2, there exists M > 0 such that for g_1 \in N_1 and g_2 \in N_2, ||\eta(g_1, g_2; x)||_{W_2^s} < M, where W_2^s is a Sobolev space with (possibly noninteger) order s.

For simplicity of notation, we assume that \alpha_1 = \alpha_2 = \alpha. The construction and proofs are similar for the case \alpha_1 \ne \alpha_2.

3  Linearizations

In this section, we approximate the systematic and stochastic errors in the PLS estimates using Taylor series expansions. We first derive a Taylor series expansion for a bivariate nonlinear operator.

Theorem 1  Let f : D(f) \subset X x Y -> Z, where X, Y and Z are Banach spaces. If f'' exists at (x, y), then the partial Frechet derivatives f_xx, f_xy, f_yx and f_yy exist at (x, y) and for any h, a \in X, k, b \in Y,

    f''(x, y)(h, k)(a, b) = f_xx(x, y)ha + f_xy(x, y)ka + f_yx(x, y)hb + f_yy(x, y)kb.   (5)
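Identity (5) can be checked numerically in the simplest case X = Y = Z = R, where the partial Frechet derivatives reduce to ordinary partial derivatives; the particular function f below is a hypothetical example chosen for the check, not one from the paper.

```python
import numpy as np

# a smooth toy map f: R x R -> R (illustrative assumption)
f = lambda x, y: np.sin(x) * np.exp(y) + x * y**2

def second_directional(f, x, y, u, v, eps=1e-4):
    # f''(x, y)(u)(v) via a central finite difference in the two directions
    g = lambda s, t: f(x + s * u[0] + t * v[0], y + s * u[1] + t * v[1])
    return (g(eps, eps) - g(eps, -eps) - g(-eps, eps) + g(-eps, -eps)) / (4 * eps**2)

x0, y0 = 0.3, -0.2
h, k, a, b = 0.7, -0.4, 0.2, 0.9
lhs = second_directional(f, x0, y0, (h, k), (a, b))

# partial derivatives of this f, computed by hand
fxx = -np.sin(x0) * np.exp(y0)
fxy = np.cos(x0) * np.exp(y0) + 2 * y0
fyy = np.sin(x0) * np.exp(y0) + 2 * x0
# right-hand side of (5); f_xy = f_yx for this smooth f
rhs = fxx * h * a + fxy * (k * a + h * b) + fyy * k * b
print(abs(lhs - rhs))  # agreement up to finite-difference error
```

The two sides agree up to the O(eps^2) truncation error of the finite difference.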
[Proof] By definition, for any (h, k), (a, b) \in X x Y,

    f'(x + h, y + k)(a, b) = f'(x, y)(a, b) + f''(x, y)(h, k)(a, b) + o(||(h, k)||),   as (h, k) -> 0.

For the norm on the product space X x Y, choose ||(h, k)|| = ||h|| + ||k||. Setting b = k = 0, we have

    f'(x + h, y)(a, 0) = f'(x, y)(a, 0) + f''(x, y)(h, 0)(a, 0) + o(||h||).

Applying equation (15) in Chapter 4 of Zeidler (1985), we have

    f_x(x + h, y)a = f_x(x, y)a + f''(x, y)(h, 0)(a, 0) + o(||h||).

Therefore f_xx(x, y)ha = f''(x, y)(h, 0)(a, 0). Similarly, by setting a = h = 0, (h, b) = (0, 0) and (a, k) = (0, 0) respectively, we get

    f_yy(x, y)kb = f''(x, y)(0, k)(0, b),
    f_xy(x, y)ka = f''(x, y)(0, k)(a, 0),
    f_yx(x, y)hb = f''(x, y)(h, 0)(0, b).

Addition gives (5).

Based on the generalized Taylor's theorem (Theorem 4.A(b), Zeidler, 1985), equation (15) in Chapter 4 of Zeidler (1985) and Theorem 1, we have the first order Taylor series expansion

    f(x + h, y + k) = f(x, y) + \int_0^1 [ f_x(x + \tau h, y + \tau k)h + f_y(x + \tau h, y + \tau k)k ] d\tau,   (6)

and the second order Taylor series expansion

    f(x + h, y + k) = f(x, y) + f_x(x, y)h + f_y(x, y)k + R,   (7)

where the remainder is

    R = \int_0^1 (1 - \tau) f''(x + \tau h, y + \tau k)(h, k)(h, k) d\tau
      = \int_0^1 (1 - \tau) [ f_xx(x + \tau h, y + \tau k)hh + f_xy(x + \tau h, y + \tau k)kh
          + f_yx(x + \tau h, y + \tau k)hk + f_yy(x + \tau h, y + \tau k)kk ] d\tau.

We now use Taylor expansions to approximate the systematic and stochastic errors of the estimates. Denote by D_1 and D_2 the partial Frechet derivatives of \eta with respect to g_1 and g_2 respectively, by D_{11} and D_{22} the second partial Frechet derivatives of \eta with respect to g_1 and g_2 respectively, and by D_{12} the mixed second partial Frechet derivative of \eta with respect to g_1 and g_2 (Zeidler, 1985). Higher partial Frechet derivatives are denoted similarly. Let

    Z_i(g_1, g_2) = (1/2) D_i l_\lambda(g_1, g_2),   i = 1, 2.

Note that Z_i also depends on \lambda, which is not expressed explicitly. For g_1 + h_1 \in N_1, g_2 + h_2 \in N_2 and i = 1, 2,

    Z_i(g_1 + h_1, g_2 + h_2) = Z_i(g_1, g_2) + D_1 Z_i(g_1, g_2)h_1 + D_2 Z_i(g_1, g_2)h_2
        + \int_0^1 [ D_{11} Z_i(g_1 + \tau h_1, g_2 + \tau h_2)h_1 h_1 + D_{12} Z_i(g_1 + \tau h_1, g_2 + \tau h_2)h_1 h_2
        + D_{21} Z_i(g_1 + \tau h_1, g_2 + \tau h_2)h_2 h_1 + D_{22} Z_i(g_1 + \tau h_1, g_2 + \tau h_2)h_2 h_2 ](1 - \tau) d\tau.

Define U_1(g_1, g_2), U_2(g_1, g_2) and U_{12}(g_1, g_2) by

    <u_1, U_1(g_1, g_2) v_1>_1 = (D_1\eta(g_1, g_2)u_1, D_1\eta(g_1, g_2)v_1)_{L_2},
    <u_2, U_2(g_1, g_2) v_2>_2 = (D_2\eta(g_1, g_2)u_2, D_2\eta(g_1, g_2)v_2)_{L_2},
    <u_1, U_{12}(g_1, g_2) v_2>_1 = (D_1\eta(g_1, g_2)u_1, D_2\eta(g_1, g_2)v_2)_{L_2},

for any u_i, v_i \in H_i, i = 1, 2, and let U_{21} be the adjoint operator of U_{12}, i.e. U_{21} = U_{12}^*. Let

    G_i(g_1, g_2) = U_i(g_1, g_2) + \lambda_i W_i,   i = 1, 2.

Then we have

    Z_1(g_1 + h_1, g_2 + h_2) = Z_1(g_1, g_2) + G_1(g_1, g_2)h_1 + U_{12}(g_1, g_2)h_2
        + \int_0^1 [ e_{11}(g_1 + \tau h_1, g_2 + \tau h_2)h_1 h_1 + e_{12}(g_1 + \tau h_1, g_2 + \tau h_2)h_1 h_2
        + e_{13}(g_1 + \tau h_1, g_2 + \tau h_2)h_2 h_1 + e_{14}(g_1 + \tau h_1, g_2 + \tau h_2)h_2 h_2 ](1 - \tau) d\tau,

where e_{1i}(g_1, g_2)uvw = \int e_{1i}(g_1, g_2; x)uvw dF(x), i = 1, 2, 3, 4, and

    e_{11}(g_1, g_2; x)uvw = [\eta(g_1, g_2; x) - \eta(g_{10}, g_{20}; x)] D_{111}\eta uvw
        + D_1\eta u D_{11}\eta vw + D_1\eta v D_{11}\eta uw + D_1\eta w D_{11}\eta uv,
    e_{12}(g_1, g_2; x)uvw = [\eta(g_1, g_2; x) - \eta(g_{10}, g_{20}; x)] D_{112}\eta uvw
        + D_1\eta u D_{12}\eta vw + D_2\eta w D_{11}\eta uv + D_1\eta v D_{12}\eta uw,
    e_{13}(g_1, g_2; x)uvw = [\eta(g_1, g_2; x) - \eta(g_{10}, g_{20}; x)] D_{121}\eta uvw
        + D_1\eta u D_{21}\eta vw + D_1\eta w D_{12}\eta uv + D_2\eta v D_{11}\eta uw,
    e_{14}(g_1, g_2; x)uvw = [\eta(g_1, g_2; x) - \eta(g_{10}, g_{20}; x)] D_{122}\eta uvw
        + D_1\eta u D_{22}\eta vw + D_2\eta v D_{12}\eta uw + D_2\eta w D_{12}\eta uv.
Similarly, for Z_2 we have

    Z_2(g_1 + h_1, g_2 + h_2) = Z_2(g_1, g_2) + G_2(g_1, g_2)h_2 + U_{21}(g_1, g_2)h_1
        + \int_0^1 [ e_{21}(g_1 + \tau h_1, g_2 + \tau h_2)h_1 h_1 + e_{22}(g_1 + \tau h_1, g_2 + \tau h_2)h_1 h_2
        + e_{23}(g_1 + \tau h_1, g_2 + \tau h_2)h_2 h_1 + e_{24}(g_1 + \tau h_1, g_2 + \tau h_2)h_2 h_2 ](1 - \tau) d\tau,

where e_{2i}(g_1, g_2)uvw = \int e_{2i}(g_1, g_2; x)uvw dF(x), i = 1, 2, 3, 4, and

    e_{21}(g_1, g_2; x)uvw = [\eta(g_1, g_2; x) - \eta(g_{10}, g_{20}; x)] D_{211}\eta uvw
        + D_2\eta u D_{11}\eta vw + D_1\eta v D_{21}\eta uw + D_1\eta w D_{21}\eta uv,
    e_{22}(g_1, g_2; x)uvw = [\eta(g_1, g_2; x) - \eta(g_{10}, g_{20}; x)] D_{212}\eta uvw
        + D_2\eta u D_{12}\eta vw + D_2\eta w D_{21}\eta uv + D_1\eta v D_{22}\eta uw,
    e_{23}(g_1, g_2; x)uvw = [\eta(g_1, g_2; x) - \eta(g_{10}, g_{20}; x)] D_{221}\eta uvw
        + D_2\eta u D_{21}\eta vw + D_1\eta w D_{22}\eta uv + D_2\eta v D_{21}\eta uw,
    e_{24}(g_1, g_2; x)uvw = [\eta(g_1, g_2; x) - \eta(g_{10}, g_{20}; x)] D_{222}\eta uvw
        + D_2\eta u D_{22}\eta vw + D_2\eta v D_{22}\eta uw + D_2\eta w D_{22}\eta uv.

Let

    e_1 = \int_0^1 [ e_{11}(g_1 + \tau h_1, g_2 + \tau h_2)h_1 h_1 + e_{12}(g_1 + \tau h_1, g_2 + \tau h_2)h_1 h_2
        + e_{13}(g_1 + \tau h_1, g_2 + \tau h_2)h_2 h_1 + e_{14}(g_1 + \tau h_1, g_2 + \tau h_2)h_2 h_2 ](1 - \tau) d\tau,

and

    e_2 = \int_0^1 [ e_{21}(g_1 + \tau h_1, g_2 + \tau h_2)h_1 h_1 + e_{22}(g_1 + \tau h_1, g_2 + \tau h_2)h_1 h_2
        + e_{23}(g_1 + \tau h_1, g_2 + \tau h_2)h_2 h_1 + e_{24}(g_1 + \tau h_1, g_2 + \tau h_2)h_2 h_2 ](1 - \tau) d\tau.

Then we have the following system of equations:

    Z_1(g_1 + h_1, g_2 + h_2) = Z_1(g_1, g_2) + G_1(g_1, g_2)h_1 + U_{12}(g_1, g_2)h_2 + e_1,
    Z_2(g_1 + h_1, g_2 + h_2) = Z_2(g_1, g_2) + G_2(g_1, g_2)h_2 + U_{21}(g_1, g_2)h_1 + e_2.   (8)

In the following, the dependence of Z_i, U_{ij} and G_i on g_1 and g_2 is sometimes dropped when there is no confusion. Let

    G_{11} = G_1 - U_{12} G_2^{-1} U_{21},   G_{22} = G_2 - U_{21} G_1^{-1} U_{12}.

Define the systematic errors as g_{\lambda 1} - g_{10} and g_{\lambda 2} - g_{20}, where (g_{\lambda 1}, g_{\lambda 2}) is the locally unique root of Z_i(g_1, g_2) = 0, i = 1, 2, in N_1 x N_2.
Ignoring e_1 and e_2 in the system of equations (8) and assuming the existence of G_{11}^{-1}(g_{10}, g_{20}) and G_{22}^{-1}(g_{10}, g_{20}), we have linear approximations to the systematic errors:

    \bar g_{\lambda 1} - g_{10} = -G_{11}^{-1}(g_{10}, g_{20}) ( Z_1(g_{10}, g_{20}) - U_{12}(g_{10}, g_{20}) G_2^{-1}(g_{10}, g_{20}) Z_2(g_{10}, g_{20}) ),
    \bar g_{\lambda 2} - g_{20} = -G_{22}^{-1}(g_{10}, g_{20}) ( Z_2(g_{10}, g_{20}) - U_{21}(g_{10}, g_{20}) G_1^{-1}(g_{10}, g_{20}) Z_1(g_{10}, g_{20}) ).   (9)
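The structure of (9) can be illustrated in a finite-dimensional analogue (a sketch, not the paper's infinite-dimensional setting): for a single parameter block, the root of the linearized score is theta0 - G^{-1} Z(theta0). The model, the ridge penalty, and the Gauss-Newton operator G below are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

x = np.linspace(0.0, 1.0, 200)
theta0 = np.array([0.5, 1.0])  # plays the role of the true functions
lam = 1e-3                     # a single smoothing parameter, for simplicity

def eta(th):
    # hypothetical nonlinear model
    return np.exp(th[0] * x) + th[1] * np.cos(x)

def l_lam(th):
    # finite-dimensional analogue of the limiting criterion l_lambda
    return np.mean((eta(th) - eta(theta0)) ** 2) + lam * th @ th

theta_lam = minimize(l_lam, theta0, method="BFGS").x  # exact penalized root

# One-block analogue of (9): Z(theta0) = lam * theta0, since the data term
# of the gradient vanishes at the truth; G = J'J/n + lam*I with J the Jacobian.
J = np.column_stack([x * np.exp(theta0[0] * x), np.cos(x)])
G = J.T @ J / len(x) + lam * np.eye(2)
theta_bar = theta0 - np.linalg.solve(G, lam * theta0)

print(theta_lam - theta0)  # systematic error
print(theta_bar - theta0)  # its linear approximation
```

The exact systematic error and its linearization agree to higher order in lam, mirroring the role the remainders e_1, e_2 play above.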
The stochastic errors are defined as g_{n\lambda 1} - g_{\lambda 1} and g_{n\lambda 2} - g_{\lambda 2}. Let Z_{ni}(g_1, g_2) = (1/2) D_i l_{n\lambda}(g_1, g_2), i = 1, 2. Approximations of the stochastic errors will be obtained from expansions of Z_{n1} and Z_{n2}. For g_{\lambda 1} + h_1 \in N_1 and g_{\lambda 2} + h_2 \in N_2, we have

    Z_{n1}(g_{\lambda 1} + h_1, g_{\lambda 2} + h_2) = Z_{n1}(g_{\lambda 1}, g_{\lambda 2}) + D_1 Z_{n1}(g_{\lambda 1}, g_{\lambda 2})h_1 + D_2 Z_{n1}(g_{\lambda 1}, g_{\lambda 2})h_2 + e_{n1},
    Z_{n2}(g_{\lambda 1} + h_1, g_{\lambda 2} + h_2) = Z_{n2}(g_{\lambda 1}, g_{\lambda 2}) + D_1 Z_{n2}(g_{\lambda 1}, g_{\lambda 2})h_1 + D_2 Z_{n2}(g_{\lambda 1}, g_{\lambda 2})h_2 + e_{n2},

where e_{n1} and e_{n2} are defined similarly to e_1 and e_2. Again, ignoring the remainder terms and assuming the existence of G_{11}^{-1}(g_{\lambda 1}, g_{\lambda 2}) and G_{22}^{-1}(g_{\lambda 1}, g_{\lambda 2}), we approximate g_{n\lambda 1} and g_{n\lambda 2} by

    \bar g_{n\lambda 1} - g_{\lambda 1} = -G_{11}^{-1}(g_{\lambda 1}, g_{\lambda 2}) ( Z_{n1}(g_{\lambda 1}, g_{\lambda 2}) - U_{12}(g_{\lambda 1}, g_{\lambda 2}) G_2^{-1}(g_{\lambda 1}, g_{\lambda 2}) Z_{n2}(g_{\lambda 1}, g_{\lambda 2}) ),
    \bar g_{n\lambda 2} - g_{\lambda 2} = -G_{22}^{-1}(g_{\lambda 1}, g_{\lambda 2}) ( Z_{n2}(g_{\lambda 1}, g_{\lambda 2}) - U_{21}(g_{\lambda 1}, g_{\lambda 2}) G_1^{-1}(g_{\lambda 1}, g_{\lambda 2}) Z_{n1}(g_{\lambda 1}, g_{\lambda 2}) ).

For 0 <= b <= \alpha, \lambda_1 > 0, \lambda_2 > 0, g_1, h_1 \in N_1, g_2, h_2 \in N_2, unit elements u_1, u_2 \in H_{\alpha 1} (||u_i||_\alpha = 1) and unit elements v_1, v_2 \in H_{\alpha 2}, let

    K_{11} = sup_{u_1, u_2} || G_{11}(h_1, h_2)^{-1} [ e_{11}(g_1, g_2)u_1 u_2 - U_{12}(h_1, h_2) G_2^{-1}(h_1, h_2) e_{21}(g_1, g_2)u_1 u_2 ] ||_b,
    K_{12} = sup_{u_1, v_1} || G_{11}(h_1, h_2)^{-1} [ e_{12}(g_1, g_2)u_1 v_1 - U_{12}(h_1, h_2) G_2^{-1}(h_1, h_2) e_{22}(g_1, g_2)u_1 v_1 ] ||_b,
    K_{13} = sup_{u_1, v_1} || G_{11}(h_1, h_2)^{-1} [ e_{13}(g_1, g_2)v_1 u_1 - U_{12}(h_1, h_2) G_2^{-1}(h_1, h_2) e_{23}(g_1, g_2)v_1 u_1 ] ||_b,
    K_{14} = sup_{v_1, v_2} || G_{11}(h_1, h_2)^{-1} [ e_{14}(g_1, g_2)v_1 v_2 - U_{12}(h_1, h_2) G_2^{-1}(h_1, h_2) e_{24}(g_1, g_2)v_1 v_2 ] ||_b,
    K_{21} = sup_{u_1, u_2} || G_{22}(h_1, h_2)^{-1} [ e_{21}(g_1, g_2)u_1 u_2 - U_{21}(h_1, h_2) G_1^{-1}(h_1, h_2) e_{11}(g_1, g_2)u_1 u_2 ] ||_b,
    K_{22} = sup_{u_1, v_1} || G_{22}(h_1, h_2)^{-1} [ e_{22}(g_1, g_2)u_1 v_1 - U_{21}(h_1, h_2) G_1^{-1}(h_1, h_2) e_{12}(g_1, g_2)u_1 v_1 ] ||_b,
    K_{23} = sup_{u_1, v_1} || G_{22}(h_1, h_2)^{-1} [ e_{23}(g_1, g_2)v_1 u_1 - U_{21}(h_1, h_2) G_1^{-1}(h_1, h_2) e_{13}(g_1, g_2)v_1 u_1 ] ||_b,
    K_{24} = sup_{v_1, v_2} || G_{22}(h_1, h_2)^{-1} [ e_{24}(g_1, g_2)v_1 v_2 - U_{21}(h_1, h_2) G_1^{-1}(h_1, h_2) e_{14}(g_1, g_2)v_1 v_2 ] ||_b.

Similarly we can define K_{ij,n}, i = 1, 2, j = 1, 2, 3, 4. Denote

    A_{ij}(g_1, g_2) = D_j Z_{ni}(g_1, g_2) - D_j Z_i(g_1, g_2),   i, j = 1, 2.

Let

    E^1_{21,n} = sup_{g_1, g_2} || G_{11}^{-1}(g_1, g_2) U_{12}(g_1, g_2) G_2^{-1}(g_1, g_2) A_{21}(g_1, g_2) u ||_b,
    E^2_{21,n} = sup_{g_1, g_2} || G_{11}^{-1}(g_1, g_2) A_{11}(g_1, g_2) u ||_b,
    E^1_{31,n} = sup_{g_1, g_2} || G_{11}^{-1}(g_1, g_2) U_{12}(g_1, g_2) G_2^{-1}(g_1, g_2) A_{22}(g_1, g_2) u ||_b,
    E^2_{31,n} = sup_{g_1, g_2} || G_{11}^{-1}(g_1, g_2) A_{12}(g_1, g_2) u ||_b,
    E^1_{22,n} = sup_{g_1, g_2} || G_{22}^{-1}(g_1, g_2) U_{21}(g_1, g_2) G_1^{-1}(g_1, g_2) A_{11}(g_1, g_2) u ||_b,
    E^2_{22,n} = sup_{g_1, g_2} || G_{22}^{-1}(g_1, g_2) A_{21}(g_1, g_2) u ||_b,
    E^1_{32,n} = sup_{g_1, g_2} || G_{22}^{-1}(g_1, g_2) U_{21}(g_1, g_2) G_1^{-1}(g_1, g_2) A_{12}(g_1, g_2) u ||_b,
    E^2_{32,n} = sup_{g_1, g_2} || G_{22}^{-1}(g_1, g_2) A_{22}(g_1, g_2) u ||_b,   (10)

where the suprema are over g_1 \in N_1, g_2 \in N_2 and u ranges over unit elements of the appropriate space.
Let K_1 = K_{11} + K_{12} + K_{13} + K_{14}, K_2 = K_{21} + K_{22} + K_{23} + K_{24}, E_{21,n} = E^1_{21,n} + E^2_{21,n}, E_{31,n} = E^1_{31,n} + E^2_{31,n}, E_{22,n} = E^1_{22,n} + E^2_{22,n} and E_{32,n} = E^1_{32,n} + E^2_{32,n}. Standard analysis based on Taylor series expansions leads to the following estimates of the error terms:

    || \bar g_{\lambda 1} - g_{10} ||_b <= K_1 ||h_1|| ||h_2||,
    || \bar g_{\lambda 2} - g_{20} ||_b <= K_2 ||h_1|| ||h_2||,
    || \bar g_{n\lambda 1} - g_{\lambda 1} ||_b <= E_{21,n} ||h_1|| + E_{31,n} ||h_2|| + K_{11,n} ||h_1|| ||h_2||,
    || \bar g_{n\lambda 2} - g_{\lambda 2} ||_b <= E_{22,n} ||h_1|| + E_{32,n} ||h_2|| + K_{12,n} ||h_1|| ||h_2||.

4  Existence and Uniqueness

In this section, we show that l_\lambda and l_{n\lambda} have unique minimizers. Let

    S_{1,(g_1, g_2)}(r, b) = { h \in H_{b1} : ||h - g_1||_b <= r },
    S_{2,(g_1, g_2)}(r, b) = { h \in H_{b2} : ||h - g_2||_b <= r },
    S_1(r, b) = S_{1,(0,0)}(r, b),   S_2(r, b) = S_{2,(0,0)}(r, b),
    d_1(\lambda, b) = || \bar g_{\lambda 1} - g_{10} ||_b,   d_2(\lambda, b) = || \bar g_{\lambda 2} - g_{20} ||_b,
    r_1(\lambda, b) = (K_{11} + K_{21}) d_1(\lambda, \alpha) + (K_{12} + K_{22}) d_2(\lambda, \alpha),
    r_2(\lambda, b) = (K_{13} + K_{23}) d_1(\lambda, \alpha) + (K_{14} + K_{24}) d_2(\lambda, \alpha).

The following theorem guarantees the existence and uniqueness of g_{\lambda 1} and g_{\lambda 2}.

Theorem 2  If d_i(\lambda, \alpha) -> 0 and r_i(\lambda, \alpha) -> 0 as \lambda_1 -> 0 and \lambda_2 -> 0, then for \lambda_1 and \lambda_2 sufficiently small:
(a) There are unique g_{\lambda 1} \in S_{1,(g_{10}, g_{20})}(2 d_1(\lambda, \alpha), \alpha) and g_{\lambda 2} \in S_{2,(g_{10}, g_{20})}(2 d_2(\lambda, \alpha), \alpha) satisfying Z_1(g_{\lambda 1}, g_{\lambda 2}) = 0 and Z_2(g_{\lambda 1}, g_{\lambda 2}) = 0.
(b) || \bar g_{\lambda 1} - g_{\lambda 1} ||_b + || \bar g_{\lambda 2} - g_{\lambda 2} ||_b <= 4 r_1(\lambda, b) d_1(\lambda, \alpha) + 4 r_2(\lambda, b) d_2(\lambda, \alpha) for 0 <= b <= \alpha.

[Proof] For simplicity of notation, write S_i = S_i(r, b), d_i = d_i(\lambda, \alpha) and r_i = r_i(\lambda, b), i = 1, 2. Let

    F_1(h, k) = h - G_{11}^{-1}(g_{10}, g_{20}) ( Z_1(g_{10} + h, g_{20} + k) - U_{12}(g_{10}, g_{20}) G_2^{-1}(g_{10}, g_{20}) Z_2(g_{10} + h, g_{20} + k) ),
    F_2(h, k) = k - G_{22}^{-1}(g_{10}, g_{20}) ( Z_2(g_{10} + h, g_{20} + k) - U_{21}(g_{10}, g_{20}) G_1^{-1}(g_{10}, g_{20}) Z_1(g_{10} + h, g_{20} + k) ).
F(h, k) = (F_1(h, k), F_2(h, k)) defines an operator on the product Hilbert space H. It is not difficult to show that F(S_1 x S_2) \subset S_1 x S_2. We then need to show that F is a contraction on S_1 x S_2. For h_1, h_2 \in S_1 and k_1, k_2 \in S_2,

    F_1(h_1, k_1) - F_1(h_2, k_2) = h_1 - h_2 - G_{11}^{-1}(g_{10}, g_{20}) [ Z_1(g_{10} + h_1, g_{20} + k_1) - Z_1(g_{10} + h_2, g_{20} + k_2)
        - U_{12}(g_{10}, g_{20}) G_2^{-1}(g_{10}, g_{20}) ( Z_2(g_{10} + h_1, g_{20} + k_1) - Z_2(g_{10} + h_2, g_{20} + k_2) ) ].

Applying the Taylor series expansion (6), we have

    Z_1(g_{10} + h_2, g_{20} + k_2) = Z_1(g_{10} + h_1, g_{20} + k_1)
        + \int_0^1 [ D_1 Z_1(g_{10} + h_1 + \tau(h_2 - h_1), g_{20} + k_1 + \tau(k_2 - k_1))(h_2 - h_1)
        + D_2 Z_1(g_{10} + h_1 + \tau(h_2 - h_1), g_{20} + k_1 + \tau(k_2 - k_1))(k_2 - k_1) ] d\tau.

Applying the Taylor expansion again to the terms inside the integral, we have

    Z_1(g_{10} + h_2, g_{20} + k_2) - Z_1(g_{10} + h_1, g_{20} + k_1)
        = D_1 Z_1(g_{10}, g_{20})(h_2 - h_1) + D_2 Z_1(g_{10}, g_{20})(k_2 - k_1)
        + \int_0^1 \int_0^1 [ D_{11} Z_1(g_{10} + \tau' h^*, g_{20} + \tau' k^*) h^* + D_{12} Z_1(g_{10} + \tau' h^*, g_{20} + \tau' k^*) k^* ](h_2 - h_1) d\tau' d\tau
        + \int_0^1 \int_0^1 [ D_{21} Z_1(g_{10} + \tau' h^*, g_{20} + \tau' k^*) h^* + D_{22} Z_1(g_{10} + \tau' h^*, g_{20} + \tau' k^*) k^* ](k_2 - k_1) d\tau' d\tau,

where h^* = h_1 + \tau(h_2 - h_1) and k^* = k_1 + \tau(k_2 - k_1). A similar expansion applies to Z_2(g_{10} + h_2, g_{20} + k_2) - Z_2(g_{10} + h_1, g_{20} + k_1). Plugging in, after some algebra we have

    || F_1(h_1, k_1) - F_1(h_2, k_2) ||_b <= 2(K_{11} d_1 + K_{12} d_2) ||h_2 - h_1||_b + 2(K_{13} d_1 + K_{14} d_2) ||k_2 - k_1||_b,
    || F_2(h_1, k_1) - F_2(h_2, k_2) ||_b <= 2(K_{21} d_1 + K_{22} d_2) ||h_2 - h_1||_b + 2(K_{23} d_1 + K_{24} d_2) ||k_2 - k_1||_b.

Therefore,

    || F(h_1, k_1) - F(h_2, k_2) ||_b <= 2((K_{11} + K_{21}) d_1 + (K_{12} + K_{22}) d_2) ||h_2 - h_1||_b
        + 2((K_{13} + K_{23}) d_1 + (K_{14} + K_{24}) d_2) ||k_2 - k_1||_b
        <= 2 r_1 ||h_2 - h_1||_b + 2 r_2 ||k_2 - k_1||_b.

Choose \bar\lambda_1 and \bar\lambda_2 such that for \lambda_1 \in (0, \bar\lambda_1] and \lambda_2 \in (0, \bar\lambda_2] we have r_1, r_2 <= C for some constant C < 1/2. For this choice of \lambda_1 and \lambda_2, F is a contraction. The contraction mapping theorem yields a unique fixed point (h_{\lambda 1}, k_{\lambda 2}) with F(h_{\lambda 1}, k_{\lambda 2}) = (h_{\lambda 1}, k_{\lambda 2}). Let g_{\lambda 1} = g_{10} + h_{\lambda 1} and g_{\lambda 2} = g_{20} + k_{\lambda 2}; then g_{\lambda 1} \in S_{1,(g_{10}, g_{20})}(2 d_1, \alpha) and g_{\lambda 2} \in S_{2,(g_{10}, g_{20})}(2 d_2, \alpha) are the unique solutions to Z_1(g_{\lambda 1}, g_{\lambda 2}) = 0 and Z_2(g_{\lambda 1}, g_{\lambda 2}) = 0. To obtain the bound, note that

    || \bar g_{\lambda 1} - g_{\lambda 1} ||_b + || \bar g_{\lambda 2} - g_{\lambda 2} ||_b = || F(h_{\lambda 1}, k_{\lambda 2}) - F(0, 0) ||_b <= 2 r_1 ||h_{\lambda 1}||_b + 2 r_2 ||k_{\lambda 2}||_b <= 4(r_1 d_1 + r_2 d_2).

This completes the proof.

We next consider the solution to l_{n\lambda}. For g_{n\lambda 1} \in N_1 and g_{n\lambda 2} \in N_2, define

    d_{n1}(\lambda, b) = || \bar g_{n\lambda 1} - g_{\lambda 1} ||_b,   d_{n2}(\lambda, b) = || \bar g_{n\lambda 2} - g_{\lambda 2} ||_b,
    r_{n1}(\lambda, b) = (K_{11,n} + K_{21,n}) d_{n1}(\lambda, \alpha) + (K_{12,n} + K_{22,n}) d_{n2}(\lambda, \alpha) + E_{21,n} + E_{22,n},
    r_{n2}(\lambda, b) = (K_{13,n} + K_{23,n}) d_{n1}(\lambda, \alpha) + (K_{14,n} + K_{24,n}) d_{n2}(\lambda, \alpha) + E_{31,n} + E_{32,n}.

Theorem 3  If \lambda_{n1} and \lambda_{n2} tend to zero such that g_{\lambda_{n1}} \in N_1 and g_{\lambda_{n2}} \in N_2 for all n, and d_{n1}(\lambda, b) ->_p 0, d_{n2}(\lambda, b) ->_p 0, r_{n1}(\lambda, b) ->_p 0 and r_{n2}(\lambda, b) ->_p 0, then with probability tending to one as n -> \infty:
(a) There is a unique root (g_{n\lambda_n 1}, g_{n\lambda_n 2}) satisfying Z_{n1}(g_{n\lambda_n 1}, g_{n\lambda_n 2}) = 0 and Z_{n2}(g_{n\lambda_n 1}, g_{n\lambda_n 2}) = 0.
(b) For b \in [0, \alpha],

    || \bar g_{n\lambda 1} - g_{n\lambda 1} ||_b + || \bar g_{n\lambda 2} - g_{n\lambda 2} ||_b <= 4 r_{n1}(\lambda, b) d_{n1}(\lambda, \alpha) + 4 r_{n2}(\lambda, b) d_{n2}(\lambda, \alpha).

[Proof] For simplicity of notation, write d_{ni} = d_{ni}(\lambda, \alpha) and r_{ni} = r_{ni}(\lambda, b), i = 1, 2. Let

    F_{n1}(h, k) = h - G_{11}^{-1}(g_{\lambda 1}, g_{\lambda 2}) ( Z_{n1}(g_{\lambda 1} + h, g_{\lambda 2} + k) - U_{12}(g_{\lambda 1}, g_{\lambda 2}) G_2^{-1}(g_{\lambda 1}, g_{\lambda 2}) Z_{n2}(g_{\lambda 1} + h, g_{\lambda 2} + k) ),
    F_{n2}(h, k) = k - G_{22}^{-1}(g_{\lambda 1}, g_{\lambda 2}) ( Z_{n2}(g_{\lambda 1} + h, g_{\lambda 2} + k) - U_{21}(g_{\lambda 1}, g_{\lambda 2}) G_1^{-1}(g_{\lambda 1}, g_{\lambda 2}) Z_{n1}(g_{\lambda 1} + h, g_{\lambda 2} + k) ).

The proof proceeds similarly to that of Theorem 2, with additional terms to approximate Z_{n1} and Z_{n2} by Z_1 and Z_2. Take n large enough so that, with probability arbitrarily close to one, S_{1,(g_{\lambda 1}, g_{\lambda 2})}(2 d_{n1}, \alpha) \subset N_1, S_{2,(g_{\lambda 1}, g_{\lambda 2})}(2 d_{n2}, \alpha) \subset N_2, r_{n1} < 1/2 and r_{n2} < 1/2. For the rest of the proof, we restrict to this event. It is not difficult to show that F_n(S_1(2 d_{n1}, \alpha) x S_2(2 d_{n2}, \alpha)) \subset S_1(2 d_{n1}, \alpha) x S_2(2 d_{n2}, \alpha). We then need to show that F_n is a contraction on S_1(2 d_{n1}, \alpha) x S_2(2 d_{n2}, \alpha). Expanding Z_{n1} and Z_{n2} as in Theorem 2 and Section 3, after some algebra we obtain, for h_1, h_2 \in S_1(2 d_{n1}, \alpha) and k_1, k_2 \in S_2(2 d_{n2}, \alpha),

    || F_{n1}(h_1, k_1) - F_{n1}(h_2, k_2) ||_b <= 2(K_{11,n} d_{n1} + K_{12,n} d_{n2} + E_{21,n}) ||h_2 - h_1||_b + 2(K_{13,n} d_{n1} + K_{14,n} d_{n2} + E_{31,n}) ||k_2 - k_1||_b,
    || F_{n2}(h_1, k_1) - F_{n2}(h_2, k_2) ||_b <= 2(K_{21,n} d_{n1} + K_{22,n} d_{n2} + E_{22,n}) ||h_2 - h_1||_b + 2(K_{23,n} d_{n1} + K_{24,n} d_{n2} + E_{32,n}) ||k_2 - k_1||_b.

Therefore

    || F_n(h_1, k_1) - F_n(h_2, k_2) ||_b <= 2 r_{n1} ||h_1 - h_2||_b + 2 r_{n2} ||k_1 - k_2||_b,

which shows that F_n is a contraction. The contraction mapping theorem yields a unique fixed point (h_{n\lambda 1}, k_{n\lambda 2}); set g_{n\lambda 1} = g_{\lambda 1} + h_{n\lambda 1} and g_{n\lambda 2} = g_{\lambda 2} + k_{n\lambda 2}. To obtain the upper bound, notice that

    || \bar g_{n\lambda 1} - g_{n\lambda 1} ||_b + || \bar g_{n\lambda 2} - g_{n\lambda 2} ||_b = || F_n(h_{n\lambda 1}, k_{n\lambda 2}) - F_n(0, 0) ||_b <= 4 r_{n1} d_{n1} + 4 r_{n2} d_{n2}.

This completes the proof of Theorem 3.

References

[1] Cox, D. D. (1988). Approximation of method of regularization estimators. Ann. Statist. 16.
[2] Cox, D. D. and O'Sullivan, F. (1990). Asymptotic analysis of penalized likelihood and related estimators. Ann. Statist. 18.
[3] O'Sullivan, F. (1990). Convergence characteristics of methods of regularization estimators for nonlinear operator equations. SIAM Journal on Numerical Analysis 27.
[4] Zeidler, E. (1985). Nonlinear Functional Analysis and its Applications. Springer: New York.
Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley
More informationLinear-quadratic control problem with a linear term on semiinfinite interval: theory and applications
Linear-quadratic control problem with a linear term on semiinfinite interval: theory and applications L. Faybusovich T. Mouktonglang Department of Mathematics, University of Notre Dame, Notre Dame, IN
More informationMath 225B: Differential Geometry, Final
Math 225B: Differential Geometry, Final Ian Coley March 5, 204 Problem Spring 20,. Show that if X is a smooth vector field on a (smooth) manifold of dimension n and if X p is nonzero for some point of
More informationDiffeomorphic Warping. Ben Recht August 17, 2006 Joint work with Ali Rahimi (Intel)
Diffeomorphic Warping Ben Recht August 17, 2006 Joint work with Ali Rahimi (Intel) What Manifold Learning Isn t Common features of Manifold Learning Algorithms: 1-1 charting Dense sampling Geometric Assumptions
More informationThe Nearest Doubly Stochastic Matrix to a Real Matrix with the same First Moment
he Nearest Doubly Stochastic Matrix to a Real Matrix with the same First Moment William Glunt 1, homas L. Hayden 2 and Robert Reams 2 1 Department of Mathematics and Computer Science, Austin Peay State
More informationBayesian Aggregation for Extraordinarily Large Dataset
Bayesian Aggregation for Extraordinarily Large Dataset Guang Cheng 1 Department of Statistics Purdue University www.science.purdue.edu/bigdata Department Seminar Statistics@LSE May 19, 2017 1 A Joint Work
More information3.4. ZEROS OF POLYNOMIAL FUNCTIONS
3.4. ZEROS OF POLYNOMIAL FUNCTIONS What You Should Learn Use the Fundamental Theorem of Algebra to determine the number of zeros of polynomial functions. Find rational zeros of polynomial functions. Find
More informationTHE DVORETZKY KIEFER WOLFOWITZ INEQUALITY WITH SHARP CONSTANT: MASSART S 1990 PROOF SEMINAR, SEPT. 28, R. M. Dudley
THE DVORETZKY KIEFER WOLFOWITZ INEQUALITY WITH SHARP CONSTANT: MASSART S 1990 PROOF SEMINAR, SEPT. 28, 2011 R. M. Dudley 1 A. Dvoretzky, J. Kiefer, and J. Wolfowitz 1956 proved the Dvoretzky Kiefer Wolfowitz
More informationSmooth simultaneous confidence bands for cumulative distribution functions
Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang
More informationSample Solutions of Assignment 10 for MAT3270B
Sample Solutions of Assignment 1 for MAT327B 1. For the following ODEs, (a) determine all critical points; (b) find the corresponding linear system near each critical point; (c) find the eigenvalues of
More informationEffective Dimension and Generalization of Kernel Learning
Effective Dimension and Generalization of Kernel Learning Tong Zhang IBM T.J. Watson Research Center Yorktown Heights, Y 10598 tzhang@watson.ibm.com Abstract We investigate the generalization performance
More information( ) y 2! 4. ( )( y! 2)
1. Dividing: 4x3! 8x 2 + 6x 2x 5.7 Division of Polynomials = 4x3 2x! 8x2 2x + 6x 2x = 2x2! 4 3. Dividing: 1x4 + 15x 3! 2x 2!5x 2 = 1x4!5x 2 + 15x3!5x 2! 2x2!5x 2 =!2x2! 3x + 4 5. Dividing: 8y5 + 1y 3!
More informationON THE BOUNDEDNESS BEHAVIOR OF THE SPECTRAL FACTORIZATION IN THE WIENER ALGEBRA FOR FIR DATA
ON THE BOUNDEDNESS BEHAVIOR OF THE SPECTRAL FACTORIZATION IN THE WIENER ALGEBRA FOR FIR DATA Holger Boche and Volker Pohl Technische Universität Berlin, Heinrich Hertz Chair for Mobile Communications Werner-von-Siemens
More informationTwo special equations: Bessel s and Legendre s equations. p Fourier-Bessel and Fourier-Legendre series. p
LECTURE 1 Table of Contents Two special equations: Bessel s and Legendre s equations. p. 259-268. Fourier-Bessel and Fourier-Legendre series. p. 453-460. Boundary value problems in other coordinate system.
More informationMATH 205C: STATIONARY PHASE LEMMA
MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)
More informationSolving a class of nonlinear two-dimensional Volterra integral equations by using two-dimensional triangular orthogonal functions.
Journal of Mathematical Modeling Vol 1, No 1, 213, pp 28-4 JMM Solving a class of nonlinear two-dimensional Volterra integral equations by using two-dimensional triangular orthogonal functions Farshid
More informationStatistics 3657 : Moment Approximations
Statistics 3657 : Moment Approximations Preliminaries Suppose that we have a r.v. and that we wish to calculate the expectation of g) for some function g. Of course we could calculate it as Eg)) by the
More informationSpatial Process Estimates as Smoothers: A Review
Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed
More informationTEST CODE: MMA (Objective type) 2015 SYLLABUS
TEST CODE: MMA (Objective type) 2015 SYLLABUS Analytical Reasoning Algebra Arithmetic, geometric and harmonic progression. Continued fractions. Elementary combinatorics: Permutations and combinations,
More information1 Lyapunov theory of stability
M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability
More informationThe p-adic Numbers. Akhil Mathew
The p-adic Numbers Akhil Mathew ABSTRACT These are notes for the presentation I am giving today, which itself is intended to conclude the independent study on algebraic number theory I took with Professor
More informationSPECTRAL PROPERTIES AND NODAL SOLUTIONS FOR SECOND-ORDER, m-point, BOUNDARY VALUE PROBLEMS
SPECTRAL PROPERTIES AND NODAL SOLUTIONS FOR SECOND-ORDER, m-point, BOUNDARY VALUE PROBLEMS BRYAN P. RYNNE Abstract. We consider the m-point boundary value problem consisting of the equation u = f(u), on
More informationEconomics 204 Fall 2013 Problem Set 5 Suggested Solutions
Economics 204 Fall 2013 Problem Set 5 Suggested Solutions 1. Let A and B be n n matrices such that A 2 = A and B 2 = B. Suppose that A and B have the same rank. Prove that A and B are similar. Solution.
More informationNotes on Linear Algebra and Matrix Theory
Massimo Franceschet featuring Enrico Bozzo Scalar product The scalar product (a.k.a. dot product or inner product) of two real vectors x = (x 1,..., x n ) and y = (y 1,..., y n ) is not a vector but a
More informationNonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix
Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationSensitivity of GLS estimators in random effects models
of GLS estimators in random effects models Andrey L. Vasnev (University of Sydney) Tokyo, August 4, 2009 1 / 19 Plan Plan Simulation studies and estimators 2 / 19 Simulation studies Plan Simulation studies
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models
Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationCommunication constraints and latency in Networked Control Systems
Communication constraints and latency in Networked Control Systems João P. Hespanha Center for Control Engineering and Computation University of California Santa Barbara In collaboration with Antonio Ortega
More informationLinear Algebra in Hilbert Space
Physics 342 Lecture 16 Linear Algebra in Hilbert Space Lecture 16 Physics 342 Quantum Mechanics I Monday, March 1st, 2010 We have seen the importance of the plane wave solutions to the potentialfree Schrödinger
More informationPenalty Methods for Bivariate Smoothing and Chicago Land Values
Penalty Methods for Bivariate Smoothing and Chicago Land Values Roger Koenker University of Illinois, Urbana-Champaign Ivan Mizera University of Alberta, Edmonton Northwestern University: October 2001
More informationLECTURE NOTES ELEMENTARY NUMERICAL METHODS. Eusebius Doedel
LECTURE NOTES on ELEMENTARY NUMERICAL METHODS Eusebius Doedel TABLE OF CONTENTS Vector and Matrix Norms 1 Banach Lemma 20 The Numerical Solution of Linear Systems 25 Gauss Elimination 25 Operation Count
More informationFunction Spaces. 1 Hilbert Spaces
Function Spaces A function space is a set of functions F that has some structure. Often a nonparametric regression function or classifier is chosen to lie in some function space, where the assume structure
More informationOn intermediate value theorem in ordered Banach spaces for noncompact and discontinuous mappings
Int. J. Nonlinear Anal. Appl. 7 (2016) No. 1, 295-300 ISSN: 2008-6822 (electronic) http://dx.doi.org/10.22075/ijnaa.2015.341 On intermediate value theorem in ordered Banach spaces for noncompact and discontinuous
More informationA Note on Two Different Types of Matrices and Their Applications
A Note on Two Different Types of Matrices and Their Applications Arjun Krishnan I really enjoyed Prof. Del Vecchio s Linear Systems Theory course and thought I d give something back. So I ve written a
More informationReproducing Kernel Hilbert Spaces
Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 12, 2007 About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert
More informationGeneralized Additive Models
Generalized Additive Models The Model The GLM is: g( µ) = ß 0 + ß 1 x 1 + ß 2 x 2 +... + ß k x k The generalization to the GAM is: g(µ) = ß 0 + f 1 (x 1 ) + f 2 (x 2 ) +... + f k (x k ) where the functions
More informationApplication of density estimation methods to quantal analysis
Application of density estimation methods to quantal analysis Koichi Yoshioka Tokyo Medical and Dental University Summary There has been controversy for the quantal nature of neurotransmission of mammalian
More informationNew Results for Second Order Discrete Hamiltonian Systems. Huiwen Chen*, Zhimin He, Jianli Li and Zigen Ouyang
TAIWANESE JOURNAL OF MATHEMATICS Vol. xx, No. x, pp. 1 26, xx 20xx DOI: 10.11650/tjm/7762 This paper is available online at http://journal.tms.org.tw New Results for Second Order Discrete Hamiltonian Systems
More informationTesting Statistical Hypotheses
E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions
More informationIntroduction to Functional Analysis With Applications
Introduction to Functional Analysis With Applications A.H. Siddiqi Khalil Ahmad P. Manchanda Tunbridge Wells, UK Anamaya Publishers New Delhi Contents Preface vii List of Symbols.: ' - ix 1. Normed and
More informationB553 Lecture 1: Calculus Review
B553 Lecture 1: Calculus Review Kris Hauser January 10, 2012 This course requires a familiarity with basic calculus, some multivariate calculus, linear algebra, and some basic notions of metric topology.
More informationLasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices
Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,
More informationSemi-parametric estimation of non-stationary Pickands functions
Semi-parametric estimation of non-stationary Pickands functions Linda Mhalla 1 Joint work with: Valérie Chavez-Demoulin 2 and Philippe Naveau 3 1 Geneva School of Economics and Management, University of
More informationA Theoretical Framework for the Regularization of Poisson Likelihood Estimation Problems
c de Gruyter 2007 J. Inv. Ill-Posed Problems 15 (2007), 12 8 DOI 10.1515 / JIP.2007.002 A Theoretical Framework for the Regularization of Poisson Likelihood Estimation Problems Johnathan M. Bardsley Communicated
More informationVector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms.
Vector Spaces Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. For each two vectors a, b ν there exists a summation procedure: a +
More informationSpectrum (functional analysis) - Wikipedia, the free encyclopedia
1 of 6 18/03/2013 19:45 Spectrum (functional analysis) From Wikipedia, the free encyclopedia In functional analysis, the concept of the spectrum of a bounded operator is a generalisation of the concept
More informationMultiscale Frame-based Kernels for Image Registration
Multiscale Frame-based Kernels for Image Registration Ming Zhen, Tan National University of Singapore 22 July, 16 Ming Zhen, Tan (National University of Singapore) Multiscale Frame-based Kernels for Image
More informationEstimation of the Bivariate and Marginal Distributions with Censored Data
Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new
More informationTEST CODE: MIII (Objective type) 2010 SYLLABUS
TEST CODE: MIII (Objective type) 200 SYLLABUS Algebra Permutations and combinations. Binomial theorem. Theory of equations. Inequalities. Complex numbers and De Moivre s theorem. Elementary set theory.
More information1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0
Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =
More information11 Survival Analysis and Empirical Likelihood
11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with
More informationAn Alternative Proof of Primitivity of Indecomposable Nonnegative Matrices with a Positive Trace
An Alternative Proof of Primitivity of Indecomposable Nonnegative Matrices with a Positive Trace Takao Fujimoto Abstract. This research memorandum is aimed at presenting an alternative proof to a well
More information