A note on SVM estimators in RKHS for the deconvolution problem

Communications for Statistical Applications and Methods, 2016, Vol. 23, No. 1

Sungho Lee
Department of Statistics, Daegu University, Kyungbuk 38453, Korea. E-mail: shlee@daegu.ac.kr
Published January 2016. © 2016 The Korean Statistical Society and Korean International Statistical Society. All rights reserved.

Abstract

In this paper we discuss a deconvolution density estimator obtained using the support vector machine (SVM) and Tikhonov's regularization method for solving ill-posed problems in a reproducing kernel Hilbert space (RKHS). A remarkable property of the SVM is that it leads to sparse solutions, but the support vector deconvolution density estimator does not preserve sparsity as well as we expected. Thus, in Section 3, we propose another support vector deconvolution estimator (Method II) which leads to a very sparse solution. The performance of the deconvolution density estimators based on the support vector method is compared with the classical kernel deconvolution density estimator for the important cases of Gaussian and Laplacian measurement error by means of a simulation study. In the case of Gaussian error, the proposed support vector deconvolution estimator shows the same performance as the classical kernel deconvolution density estimator.

Keywords: deconvolution, ill-posed problem, kernel density estimator, regularization, reproducing kernel Hilbert space (RKHS), support vector machines (SVM).

1. Introduction

The support vector machine (SVM) was developed in the 1960s by Vapnik and co-workers (Vapnik and Lerner, 1963; Vapnik and Chervonenkis, 1964). It was initially designed to solve pattern recognition problems. Later the SVM was extended to classification, regression, and real-valued function estimation (e.g., Vapnik, 1995). In the early 1990s the SVM was proposed for classification in the context of Vapnik's learning theory. The use of the SVM grew rapidly among computer scientists due to its success with real-world data analysis problems. In the 1990s it became clear to statisticians that the SVM with the kernel trick is the solution of an optimization problem in a reproducing kernel Hilbert space (RKHS). Reproducing kernel Hilbert spaces (Aronszajn, 1950; Wahba, 1990) became prominent when their relation to the SVM became clear (Moguerza and Munoz, 2006; Wahba, 2006). Reproducing kernel Hilbert spaces were explicitly introduced into learning theory by Girosi (1998). In general it is quite difficult to find useful function spaces that are not RKHS.

The problem of measurements being contaminated with noise exists in many different fields (e.g., Mendelsohn and Rice, 1982; Stefanski and Carroll, 1990; Zhang, 1992). This deconvolution problem can be stated as follows. Let X and Z be independent random variables with density functions f(x) and q(z), respectively, where f(x) is unknown and q(z) is known. One observes a sample of random variables Y_i = X_i + Z_i, i = 1, 2, ..., n, whose density g(y) is the convolution of f(x) and q(z), g(y) = (f * q)(y) = ∫ f(y − z) q(z) dz. The objective is to estimate the density function f(x). The most popular approach to this deconvolution problem has been to estimate f(x) using a kernel estimator and the Fourier transform (e.g., Carroll and Hall, 1988; Fan, 1991).
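To make the setup concrete, here is a minimal simulation sketch (not from the paper; the distributions, sample size, and variable names are illustrative) showing that a naive density estimate built from the contaminated sample Y targets the convolution g = f * q rather than the density f of interest.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
x = rng.normal(0.0, 1.0, size=n)      # latent X ~ f (here N(0, 1); unknown in practice)
z = rng.laplace(0.0, 0.25, size=n)    # measurement error Z ~ q (Laplacian, known)
y = x + z                             # observed sample Y = X + Z

kde_y = stats.gaussian_kde(y)         # naive KDE: estimates g = f * q, not f
grid = np.linspace(-4.0, 4.0, 9)
print(np.c_[grid, kde_y(grid), stats.norm.pdf(grid)])  # compare with the true f
```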

Kernel density estimation is widely considered the most popular approach to density deconvolution; however, other alternatives have been proposed (e.g., Hall and Qiu, 2005; Hazelton and Turlach, 2009; Mendelsohn and Rice, 1982; Pensky and Vidakovic, 1999). The support vector regression approach to density estimation was discussed by Weston et al. (1999), and support vector density estimation in RKHS based on Phillips' residual method (Phillips, 1962) was discussed by Mukherjee and Vapnik (1999). The SVM in RKHS can be applied to the deconvolution problem with support vector density estimation and the Fourier transform (Lee, 2010, 2012). In this paper we discuss deconvolution estimators obtained by support vector density estimation and the Fourier transform in the framework of the Tikhonov regularization method in RKHS and propose a support vector deconvolution estimator which leads to a very sparse solution.

2. RKHS and SVM

The support vector regression algorithm computes a nonlinear function in the space of the input data R^m by using a linear function in a high dimensional feature space F with a dot product. The functions take the form f(x) = ⟨ω, Φ(x)⟩ + b with Φ : R^m → F and ω ∈ F. In order to estimate f(x) in the deconvolution problem by applying the support vector regression method (Vapnik, 1995), we first take G(y) = ⟨ω, Φ(y)⟩ + b from a training set {(y_1, G_n(y_1)), ..., (y_n, G_n(y_n))}, where G_n(y) is the empirical distribution function. Then we try to minimize the empirical risk function R_emp(G) with a complexity term ‖ω‖²:

R_reg(G) = R_emp(G) + (λ/2)‖ω‖² = (1/n) ∑_{i=1}^n |G(y_i) − G_n(y_i)|_ε + (λ/2)‖ω‖²,   (2.1)

where Vapnik's ε-insensitive loss function |G(y) − G_n(y)|_ε is described by

|G(y) − G_n(y)|_ε = |G(y) − G_n(y)| − ε  if |G(y) − G_n(y)| ≥ ε,  and 0 otherwise.   (2.2)

Equation (2.1) can be minimized by solving a quadratic programming problem formulated in terms of dot products in the feature space F. Then, applying the Fourier inversion formula, we can obtain f̂(x) (Lee and Taylor, 2008). This method can be constructed in the framework of regularization theory in RKHS.

A (real) RKHS H is a Hilbert space of real-valued functions f on an interval T with the property that, for each t ∈ T, the evaluation functional L_t : f ↦ f(t) is a bounded linear functional. Then, by the Riesz representation theorem, for each t ∈ T there exists a unique element K_t ∈ H such that L_t(f) = f(t) = ⟨K_t, f⟩ for each f ∈ H. The function defined by K_u(v) = K(u, v) = ⟨K_u, K_v⟩ for u, v ∈ T is called the reproducing kernel. Conversely, a positive definite kernel K defines a unique RKHS by the following Moore-Aronszajn theorem. This result is important because it gives a construction of an RKHS in terms of its reproducing kernel. Let X be some set, for example a subset of R^n or R^n itself. A kernel is a symmetric function K : X × X → R. A kernel K is called positive definite if its associated kernel matrix satisfies ∑_{i,j=1}^n c_i c_j K(x_i, x_j) ≥ 0 for any n ∈ N, any x_1, x_2, ..., x_n ∈ X, and any c_1, c_2, ..., c_n ∈ R. Alternatively, if ∫_{X×X} K(x, z) u(x) u(z) dx dz ≥ 0 for all u ∈ L_1(X), then the kernel K is called integrally positive definite. The two definitions are equivalent for a continuous kernel K.

Theorem 1. (Aronszajn, 1950) For every positive definite kernel K on X × X there is a unique RKHS H_K on X with K as its reproducing kernel.
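As a small illustration of (2.1)-(2.2), the sketch below (illustrative only; G_hat, the toy data, and the variable names are not from the paper) evaluates the ε-insensitive loss of a fitted distribution function against the empirical distribution function.

```python
import numpy as np
from scipy import stats

def eps_insensitive(residual, eps):
    # |r|_eps = max(|r| - eps, 0): errors inside the eps-tube are ignored (eq. 2.2)
    return np.maximum(np.abs(residual) - eps, 0.0)

def regularized_risk(G_hat, y, eps, lam, omega_norm_sq):
    # (1/n) sum_i |G_hat(y_i) - G_n(y_i)|_eps + (lam/2) * ||omega||^2   (eq. 2.1)
    y = np.sort(y)
    G_n = np.arange(1, len(y) + 1) / len(y)   # empirical distribution function at the sorted y_i
    return eps_insensitive(G_hat(y) - G_n, eps).mean() + 0.5 * lam * omega_norm_sq

# toy usage with a hypothetical fitted distribution function G_hat = standard normal CDF
y = stats.norm.rvs(size=50, random_state=1)
print(regularized_risk(stats.norm.cdf, y, eps=0.05, lam=0.1, omega_norm_sq=1.0))
```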

We now briefly present Mercer's theorem (Mercer, 1909), which is one of the fundamental mathematical results underlying learning theory with kernels. In the SVM literature the kernel is called a Mercer kernel and is used as a linear function in a high dimensional feature space F with a dot product. Mercer's theorem is as follows (e.g., see Scholkopf and Smola (2002) for the details): a positive definite kernel K : X × X → R, with X a closed subset of R^n, has the expansion

K(x, y) = ∑_{i=1}^∞ λ_i φ_i(x) φ_i(y),

where the convergence is in L_2(X, µ) and φ_i ∈ L_2(X, µ), i = 1, 2, ..., are the orthonormal eigenfunctions of the integral equation

∫_X K(x, y) φ(x) dµ(x) = λ φ(y).

We can now define the RKHS as the space of functions spanned by the eigenfunctions of the integral operator defined by the kernel. Let us consider the set of functions

H_K = { f : f(x) = ∑_{r=1}^∞ ω_r φ_r(x) and ‖f‖_{H_K} < ∞ },

where, for f = ∑_r ω_r φ_r and g = ∑_r ω'_r φ_r, the inner product is defined as ⟨f, g⟩ = ∑_i ω_i ω'_i / λ_i and the RKHS norm ‖f‖_{H_K} is defined by ‖f‖²_{H_K} = ∑_i ω_i² / λ_i. Then we have an RKHS H_K with its reproducing kernel K. Assume further that ∫_{X×X} K(x, y)² dx dy < ∞. Then the kernel K will have a countable sequence of eigenvalues and eigenfunctions, and hence a natural feature map that arises from Mercer's theorem is

y ↦ Φ(y) = (√λ_1 φ_1(y), ..., √λ_i φ_i(y), ...)  with  K(x, y) = ⟨Φ(x), Φ(y)⟩.

We can apply these properties of RKHS to density estimation. Notice that for the Gaussian kernel on the whole real line, ∫∫ e^{−(x−y)²/σ²} dx dy = ∞, and hence Mercer's theorem does not apply. However, it turns out that one can derive a Mercer expansion for the Gaussian kernel with orthonormal functions in L_2(R, ρ), ρ(x) = (α/√π) e^{−α²x²}, α > 0 (Rasmussen and Williams, 2006). Eigenvalues are not necessarily countable and Mercer's theorem does not apply if the kernel is defined over an unbounded domain; feature maps do not necessarily arise from Mercer's theorem.

Let us consider a special class of kernels that is widely used in practice: translation invariant kernels, K(x, y) = K(x − y). This implies that we will have to consider Fourier hypothesis spaces, and all these spaces will be defined via the Fourier transform. The eigenvalue problem for translation invariant kernels is ∫ K(x − y) φ(x) dx = λ φ(y), and hence by the convolution theorem the Fourier transform of the equation is K̃(ω) φ̃(ω) = λ φ̃(ω), where φ̃(ω) = ∫ φ(x) e^{−iωx} dx. The following Bochner's theorem relates the Fourier transform of a kernel to its being positive definite.

Theorem 2. (Bochner, 1959) A function K(x − y) is positive definite if and only if it is the Fourier transform of a symmetric, positive function K̃(ω) decreasing to 0 at infinity.

If we consider a positive definite function K(x − y) and define, in the Fourier domain, the inner product ⟨f(x), g(x)⟩_{H_K} = (1/2π) ∫_R f̃(ω) g̃(ω) / K̃(ω) dω, then the subspace H_K of the L_2 space of functions f with ‖f‖²_{H_K} = (1/2π) ∫_R |f̃(ω)|² / K̃(ω) dω < ∞ is an RKHS with that norm. The well-known Gaussian kernel represents an inner product in feature space F. Let C_0(R) denote the set of continuous functions on R that vanish at infinity. Then the reproducing kernel Hilbert space H_K (Vert and Vert, 2006) associated with the normalized Gaussian kernel K(x, y) = (1/(σ√(2π))) e^{−(x−y)²/(2σ²)} is

H_K = { f ∈ C_0(R) : f ∈ L_1(R) and ∫_R |f̃(ω)|² e^{σ²ω²/2} dω < ∞ },

and since ∫_R |f̃(ω)|² e^{σ²ω²/2} dω = ∑_{k=0}^∞ (σ^{2k} / (2^k k!)) ∫_R ω^{2k} |f̃(ω)|² dω, the space H_K for the Gaussian kernel consists of the set of square integrable functions whose derivatives of all orders are square integrable.
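As a quick numerical illustration of positive definiteness (an illustrative sketch, not part of the paper), one can check that the Gram matrix of the Gaussian kernel at arbitrary points has nonnegative eigenvalues, as the definition above requires.

```python
import numpy as np

def gaussian_kernel_matrix(pts, sigma):
    # K(x, y) = exp(-(x - y)^2 / (2 sigma^2)): translation invariant, so Bochner's theorem applies
    d = pts[:, None] - pts[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2)

rng = np.random.default_rng(1)
pts = rng.uniform(-3, 3, size=30)
K = gaussian_kernel_matrix(pts, sigma=0.7)
eigvals = np.linalg.eigvalsh(K)          # symmetric matrix -> real eigenvalues
print(eigvals.min() >= -1e-10)           # numerically nonnegative: the Gram matrix is PSD
```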

When sample observations are contaminated by normally distributed errors, a kernel K with compactly supported Fourier transform, K̃(ω) = (1 − ω²)³ I_{[−1,1]}(ω), is commonly used. Since K̃(ω) = (1 − ω²)³ I_{[−1,1]}(ω) is also a symmetric, positive function decreasing to 0 at infinity, the RKHS H_K associated with the kernel K,

K(x, y) = K(x − y) = (48 cos(x − y) / (π (x − y)⁴)) (1 − 15/(x − y)²) − (144 sin(x − y) / (π (x − y)⁵)) (2 − 5/(x − y)²),

is a subspace of L_2(R) with norm ‖f‖²_{H_K} = (1/2π) ∫_R |f̃(ω)|² / K̃(ω) dω < ∞.

3. SVM deconvolution estimators in RKHS

The problem of approximating a density function from data can be dealt with as an ill-posed problem (e.g., Vapnik, 1995). A density function f(x) is defined as the solution of the equation ∫_{−∞}^x f(t) dt = F(x), where F(x) is a distribution function, and hence the problem of solving the linear operator equation Af = F with an approximation F̂_n(x) of the unknown distribution function F(x) is ill-posed. A classical solution to this problem is regularization theory (Tikhonov and Arsenin, 1977). Given a training sample {(x_1, y_1), ..., (x_n, y_n)}, the Tikhonov regularization method that we will consider here is defined by an optimization problem over the RKHS:

f = arg min_{f ∈ H_K} (1/n) ∑_{i=1}^n V(f(x_i), y_i) + (λ/2) ‖f‖²_{H_K},

where V is a loss function, ‖f‖_{H_K} is the norm in the RKHS H_K defined by the positive definite function K, and λ is the regularization parameter. In the RKHS H_K defined by the positive definite kernel K, the solution to the Tikhonov regularization method has a very compact representation by the generalized representer theorem (refer to Scholkopf et al. (2001) for the details):

f(x) = ∑_{i=1}^n α_i K(x, x_i),  α_i ∈ R,  1 ≤ i ≤ n.

Now we discuss an SVM deconvolution estimator by Method I (Mukherjee and Vapnik, 1999; Lee, 2012) within the framework of the Tikhonov regularization method in RKHS H_K for ill-posed problems, and propose a new SVM deconvolution estimator by Method II. Let X and Z be independent random variables with density functions f(x) and q(z) respectively, where f(x) is unknown and q(z) is known. Assume that we observe a random sample Y_1, Y_2, ..., Y_n from a density g(y) ∈ H_K, where Y_i = X_i + Z_i, i = 1, 2, ..., n, and let {y_1, y_2, ..., y_n} be a training data set.
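For intuition about the Tikhonov problem and the representer theorem, note that with the squared-error loss V(f(x), y) = (f(x) − y)² the problem above has a closed-form representer solution (kernel ridge regression). The sketch below is an illustrative instance under that assumed loss, not the paper's estimator; the kernel, data, and names are hypothetical.

```python
import numpy as np

def gaussian_K(a, b, h):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / h) ** 2)

def fit_kernel_ridge(x, y, h, lam):
    # Tikhonov problem with squared loss: (1/n) sum_i (f(x_i) - y_i)^2 + (lam/2) ||f||_HK^2.
    # By the representer theorem f(x) = sum_j alpha_j K(x, x_j); the first-order
    # condition gives (K + (n * lam / 2) I) alpha = y.
    n = len(x)
    K = gaussian_K(x, x, h)
    alpha = np.linalg.solve(K + 0.5 * n * lam * np.eye(n), y)
    return lambda t: gaussian_K(np.atleast_1d(t), x, h) @ alpha

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 40)
y = np.sin(x) + rng.normal(0, 0.1, size=x.size)
f_hat = fit_kernel_ridge(x, y, h=0.5, lam=0.01)
print(f_hat(np.array([0.0, 1.5])))
```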

Method I.

Step 1. Choose a positive definite kernel K and the ε-insensitive loss function which consists of the distribution function G(y) of Y and the empirical distribution function G_n(y) of a random sample Y_1, Y_2, ..., Y_n.

Step 2. Find a density estimator ĝ(y) satisfying the following Tikhonov minimization problem by applying the representer theorem on the RKHS H_K and the SVM regression method (e.g., Smola and Scholkopf, 2003; Vapnik, 1995):

ĝ = arg min_{g ∈ H_K} (1/n) ∑_{i=1}^n |G(y_i) − G_n(y_i)|_ε + (λ/2) ‖g‖²_{H_K},

where g(y, ω) = ∑_{i=1}^n ω_i K_h(y_i, y), ω_i ≥ 0, ∑_{i=1}^n ω_i = 1, ‖g‖²_{H_K} = ∑_{i,j=1}^n ω_i ω_j K_h(y_i, y_j), G_n(y) = (1/n) ∑_{i=1}^n I(Y_i ≤ y), and G(y) = ∫_{−∞}^y ∑_{i=1}^n ω_i K_h(y_i, t) dt. The coefficients ω_i of ĝ(y, ω) can be found by solving the following problem:

max_{α, α*, η, η*} min_{ω, ξ, ξ*} L(ω, ξ, ξ*, α, α*, η, η*) = (1/2) ∑_{i,j=1}^n ω_i ω_j K_h(y_i, y_j) + C ∑_{i=1}^n (ξ_i + ξ_i*) − ∑_{i=1}^n (η_i ξ_i + η_i* ξ_i*)
  − ∑_{i=1}^n α_i ( ε + ξ_i − G_n(y_i) + ∫_{−∞}^{y_i} ∑_{j=1}^n ω_j K_h(y, y_j) dy )
  − ∑_{i=1}^n α_i* ( ε + ξ_i* + G_n(y_i) − ∫_{−∞}^{y_i} ∑_{j=1}^n ω_j K_h(y, y_j) dy ),

where ω, ξ, ξ*, α, α*, η, η* ≥ 0. The coefficients ω_i can then be found by solving the following quadratic programming problem and applying the equation ω = Γ_h^{−1} R (α − α*):

min_{α, α*} (1/2) (α − α*)^t R^t Γ_h^{−1} R (α − α*) − ∑_{i=1}^n G_n(y_i)(α_i − α_i*) + ε ∑_{i=1}^n (α_i + α_i*),  0 ≤ α_i, α_i* ≤ C,  i = 1, ..., n,

where Γ_h = [K_h(y_i, y_j)]_{n×n}, R = [r_ij]_{n×n}, and r_ij = ∫_{−∞}^{y_j} K_h(y, y_i) dy = ∫_{−∞}^{y_j} (1/h_n) K((y − y_i)/h_n) dy.

Step 3. Find the density estimator f̂(x) by applying the Fourier inversion formula,

f̂(x) = (1/2π) ∫ ( ĝ̃(ξ) / q̃(ξ) ) e^{−iξx} dξ,  where ĝ̃(ξ) = ∑_{i=1}^n ω_i ∫ K_h(y_i, y) e^{iξy} dy.

A remarkable property of the SVM is that the ε-insensitive loss function (2.2) leads to sparse solutions. Unfortunately, the support vector deconvolution density estimator by Method I does not preserve sparsity in terms of the coefficients ω, ω = Γ_h^{−1} R (α − α*), as expected, because a sparse decomposition in α and α* is spoiled by Γ_h^{−1} R, which is not in general diagonal. However, it is still attractive in the sense that some coefficients in ω = Γ_h^{−1} R (α − α*) are very close to zero.

Now we propose another support vector method which has a sparser solution than Method I. The following Method II uses support vector regression and the Tikhonov regularization method in RKHS. The difference between the two methods lies in the components of the ε-insensitive loss function. In Method I, the ε-insensitive loss function is defined by the distribution function G(y) and the empirical distribution function G_n(y), which causes it to lose the sparse solution. In Method II, the ε-insensitive loss function is defined by the density function g(y) and a classical kernel density estimator ĝ_k, which leads to a sparse solution.
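To make the Step 2 bookkeeping concrete, the sketch below (illustrative only) builds Γ_h and R for an assumed Gaussian kernel, for which r_ij reduces to the normal distribution function Φ((y_j − y_i)/h), and recovers ω from a given (α − α*). The α values here are random placeholders standing in for the output of the quadratic program.

```python
import numpy as np
from scipy.stats import norm

def method1_matrices(y, h):
    d = y[:, None] - y[None, :]
    Gamma = norm.pdf(d, scale=h)                   # Gamma_h[i, j] = K_h(y_i, y_j), Gaussian kernel
    R = norm.cdf((y[None, :] - y[:, None]) / h)    # R[i, j] = integral_{-inf}^{y_j} K_h(y, y_i) dy
    return Gamma, R

rng = np.random.default_rng(3)
y = np.sort(rng.normal(size=20))
Gamma, R = method1_matrices(y, h=0.4)
alpha = rng.uniform(0, 0.1, size=20)               # placeholder for the QP solution
alpha_star = rng.uniform(0, 0.1, size=20)          # placeholder for the QP solution
omega = np.linalg.solve(Gamma, R @ (alpha - alpha_star))   # omega = Gamma_h^{-1} R (alpha - alpha*)
print(np.round(omega, 3))                          # generally not sparse, even if alpha - alpha* is
```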

In kernel density estimation, ĝ_k(y) converges to g(y) in probability if g is continuous at y and h_n → 0, nh_n → ∞ as n → ∞. We now consider the problem of approximation based on the data set {(y_1, ĝ_k(y_1)), (y_2, ĝ_k(y_2)), ..., (y_n, ĝ_k(y_n))} by applying support vector regression and the Tikhonov regularization method in RKHS.

Method II.

Step 1. Choose a positive definite kernel K and the ε-insensitive loss function which consists of the density function g(y) and a classical kernel density estimator ĝ_k(y) with the kernel K.

Step 2. Find a density estimator ĝ(y) satisfying the following Tikhonov regularization problem by applying the representer theorem on the RKHS H_K and the SVM regression method:

ĝ = arg min_{g ∈ H_K} (1/n) ∑_{i=1}^n |g(y_i) − ĝ_k(y_i)|_ε + (λ/2) ‖g‖²_{H_K},

where ĝ_k(y) = (1/(n h_n)) ∑_{j=1}^n K((y − y_j)/h_n), and the coefficients ω_i can be found by solving the following problem:

max_{α, α*, η, η*} min_{ω, ξ, ξ*} L(ω, ξ, ξ*, α, α*, η, η*) = (1/2) ∑_{i,j=1}^n ω_i ω_j K_h(y_i, y_j) + C ∑_{i=1}^n (ξ_i + ξ_i*) − ∑_{i=1}^n (η_i ξ_i + η_i* ξ_i*)
  − ∑_{i=1}^n α_i ( ε + ξ_i − ĝ_k(y_i) + ∑_{j=1}^n ω_j K_h(y_i, y_j) )
  − ∑_{i=1}^n α_i* ( ε + ξ_i* + ĝ_k(y_i) − ∑_{j=1}^n ω_j K_h(y_i, y_j) ),

where ω, ξ, ξ*, α, α*, η, η* ≥ 0. The coefficients ω_i can then be found by solving the following quadratic programming problem and applying the equation ω = α − α*:

min_{α, α*} (1/2) (α − α*)^t Γ_h (α − α*) − ∑_{i=1}^n ĝ_k(y_i)(α_i − α_i*) + ε ∑_{i=1}^n (α_i + α_i*),

where 0 ≤ α_i, α_i* ≤ C, i = 1, ..., n, and Γ_h = [K_h(y_i, y_j)]_{n×n}.

Step 3. Find the density estimator f̂(x) by applying the Fourier inversion formula,

f̂(x) = (1/2π) ∫ ( ĝ̃(ξ) / q̃(ξ) ) e^{−iξx} dξ,  where ĝ̃(ξ) = ∑_{i=1}^n ω_i ∫ K_h(y_i, y) e^{iξy} dy.

This setting of Method II preserves sparsity in terms of the coefficients ω because ω = α − α* and α_i, α_i* vanish for |g(y_i) − ĝ_k(y_i)| < ε. However, Method I does not preserve sparsity because ω = Γ_h^{−1} R (α − α*), and a sparse decomposition in α and α* is spoiled by Γ_h^{−1} R.
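Below is a sketch of how the box-constrained quadratic program in Step 2 of Method II could be solved numerically. It assumes a Gaussian kernel and uses a generic bounded quasi-Newton optimizer; the paper itself used Gunn's MATLAB SVM toolbox, so this is only an illustrative reimplementation with hypothetical parameter values.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def method2_weights(y, h, eps, C):
    n = len(y)
    Gamma = norm.pdf(y[:, None] - y[None, :], scale=h)    # Gamma_h[i, j] = K_h(y_i, y_j)
    g_k = Gamma.mean(axis=1)                              # classical KDE evaluated at the sample points

    def objective(v):
        a, a_star = v[:n], v[n:]
        u = a - a_star
        grad_u = Gamma @ u - g_k
        obj = 0.5 * u @ (Gamma @ u) - g_k @ u + eps * (a + a_star).sum()
        grad = np.concatenate([grad_u + eps, -grad_u + eps])
        return obj, grad

    res = minimize(objective, np.zeros(2 * n), jac=True,
                   method="L-BFGS-B", bounds=[(0.0, C)] * (2 * n))
    a, a_star = res.x[:n], res.x[n:]
    return a - a_star                                     # omega = alpha - alpha*, mostly zero

y = np.sort(np.random.default_rng(4).normal(size=40))
omega = method2_weights(y, h=0.4, eps=0.02, C=1.0)
print(np.sum(np.abs(omega) > 1e-6), "nonzero coefficients out of", len(y))
```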

4. Simulation and discussion

The asymptotic properties of the kernel density estimator in deconvolution problems depend strongly on the error distribution. Following the work of Fan (1991), two types of error distributions can be considered: ordinary smooth and supersmooth distributions. Normal, mixture normal, and Cauchy distributions are supersmooth; that is, the Fourier transform q̃(ξ) = ∫ e^{iξz} q(z) dz of q(z) satisfies

d_0 |ξ|^{β_0} exp(−|ξ|^β / γ) ≤ |q̃(ξ)| ≤ d_1 |ξ|^{β_1} exp(−|ξ|^β / γ)  as ξ → ∞,

for some positive constants d_0, d_1, β, γ and constants β_0 and β_1. Gamma and double exponential distributions are ordinary smooth, that is, the Fourier transform q̃(ξ) of q(z) satisfies

d_0 |ξ|^{−β} ≤ |q̃(ξ)| ≤ d_1 |ξ|^{−β}  as ξ → ∞,

for some positive constants d_0, d_1, β. Thus in the classical deconvolution literature, normal and double exponential distributions have typically been selected and investigated as error distributions. In this section we compare the performance of three different deconvolution density estimators when measurement errors are Laplacian or normal: classical kernel deconvolution estimators and support vector deconvolution density estimators based on Methods I and II. Target distributions are selected from the distribution functions used in Hazelton and Turlach (2009).

When measurement errors are double exponential, the classical kernel deconvolution density estimator with Gaussian kernel is evaluated as

f̂(x) = (1/(2πn)) ∑_{j=1}^n ∫ e^{−iξ(x−y_j)} K̃_{σ_h}(ξ) / q̃(ξ) dξ
     = (1/(n √(2π) σ_h)) ∑_{j=1}^n e^{−(x−y_j)²/(2σ_h²)} [ 1 + (σ_Z²/σ_h²)(1 − (x−y_j)²/σ_h²) ],   (4.1)

where K̃_{σ_h}(ξ) = e^{−0.5 σ_h² ξ²}, q(z) = (1/(2σ_Z)) e^{−|z|/σ_Z}, and q̃(ξ) = 1/(1 + σ_Z² ξ²). The support vector deconvolution density estimator with Gaussian kernel is evaluated as

f̂(x) = (1/2π) ∑_{j=1}^n ω_j ∫ K̃_h(y_j, ξ) e^{−iξx} / q̃(ξ) dξ
     = (1/(√(2π) σ_h)) ∑_{j=1}^n ω_j e^{−(x−y_j)²/(2σ_h²)} [ 1 + (σ_Z²/σ_h²)(1 − (x−y_j)²/σ_h²) ].   (4.2)

When measurement errors are normal, in order to avoid problems of integrability, a kernel K that has compactly supported Fourier transform K̃(ω) = (1 − ω²)³ I_{[−1,1]}(ω) is commonly used. Using this kernel, K(x) = (48 cos x / (π x⁴))(1 − 15/x²) − (144 sin x / (π x⁵))(2 − 5/x²), a kernel deconvolution density estimator f̂(x) in the presence of normal measurement error can be calculated as

f̂(x) = (1/(2πn)) ∑_{j=1}^n ∫ e^{−iξ(x−y_j)} K̃(h_n ξ) / q̃(ξ) dξ
     = (1/(n π h_n)) ∑_{j=1}^n ∫_0^1 (1 − ξ²)³ cos(ξ (x − y_j)/h_n) e^{σ_Z² ξ² / (2 h_n²)} dξ,   (4.3)

where K̃(ξ) = (1 − ξ²)³ I_{[−1,1]}(ξ), q(z) = (1/(√(2π) σ_Z)) e^{−z²/(2σ_Z²)}, and q̃(ξ) = e^{−σ_Z² ξ² / 2}.
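For the Laplacian-error case, the closed forms (4.1)-(4.2) can be evaluated directly. The sketch below is illustrative code, not the authors' MATLAB implementation; σ_Z denotes the Laplace scale parameter in q(z) = (1/(2σ_Z)) e^{−|z|/σ_Z}, and the data and parameter values are hypothetical.

```python
import numpy as np

def kernel_decon_laplace(x, y, sigma_h, sigma_z, weights=None):
    # weights = 1/n reproduces (4.1); SVM weights omega_j reproduce (4.2)
    n = len(y)
    w = np.full(n, 1.0 / n) if weights is None else weights
    u = (x[:, None] - y[None, :]) / sigma_h
    gauss = np.exp(-0.5 * u ** 2) / (np.sqrt(2 * np.pi) * sigma_h)
    correction = 1.0 + (sigma_z ** 2 / sigma_h ** 2) * (1.0 - u ** 2)
    return (gauss * correction) @ w

rng = np.random.default_rng(5)
y = rng.normal(size=200) + rng.laplace(scale=0.3, size=200)   # Y = X + Z with Laplace(0.3) noise
grid = np.linspace(-3, 3, 7)
print(kernel_decon_laplace(grid, y, sigma_h=0.5, sigma_z=0.3))
```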

Table 1: The bandwidth h_n used in the simulation when q(z) is Gaussian, for the target densities f(x) = N(0,1), 0.5N(−2.5,1) + 0.5N(2.5,1), and (2/3)N(0,1) + (1/3)N(0,0.4), at each noise level var(Z)/var(X).

Figure 1: f(x) is N(0,1) and q(z) is Gaussian; (a) var(Z)/var(X) = 0.1, (b) var(Z)/var(X) = 0.25, (c) var(Z)/var(X) = 0.5.

The support vector deconvolution density estimator is evaluated as

f̂(x) = (1/2π) ∑_{j=1}^n ω_j ∫ K̃_h(y_j, ξ) e^{−iξx} / q̃(ξ) dξ
     = (1/2π) ∑_{j=1}^n ω_j ∫_{−1/h_n}^{1/h_n} (1 − h_n² ξ²)³ e^{iξ y_j} e^{σ_Z² ξ² / 2} e^{−iξx} dξ
     = (1/(π h_n)) ∑_{j=1}^n ω_j ∫_0^1 (1 − ξ²)³ cos(ξ (x − y_j)/h_n) e^{σ_Z² ξ² / (2 h_n²)} dξ.   (4.4)

Figures 1-6 show plots of the classical kernel deconvolution estimates and support vector deconvolution estimates when points are randomly generated from a target distribution f(x) and a noise distribution q(z), either a normal distribution with mean zero or a double exponential distribution with mean zero. The measurement error variance is set at low (var(Z)/var(X) = 0.1), moderate (var(Z)/var(X) = 0.25), and high (var(Z)/var(X) = 0.5) levels, as in Hazelton and Turlach (2009). The exact probability density function f(x) is shown as a bold line. For the support vector deconvolution estimates, Gunn's program (Gunn, 1998) and MATLAB 6.5 were used. In kernel density estimation the choice of kernel is not crucial, but the choice of bandwidth is very important.
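The normal-error estimators (4.3)-(4.4) have no closed form, but the integrand is smooth on [0, 1], so standard quadrature suffices. The sketch below is an illustrative numerical evaluation (not the paper's MATLAB code); the data and parameter values are hypothetical.

```python
import numpy as np
from scipy.integrate import quad

def decon_normal_error(x, y, h_n, sigma_z, weights=None):
    # weights = 1/n reproduces (4.3); SVM weights omega_j reproduce (4.4)
    n = len(y)
    w = np.full(n, 1.0 / n) if weights is None else weights

    def f_hat(x0):
        def integrand(xi, yj):
            return (1 - xi ** 2) ** 3 * np.cos(xi * (x0 - yj) / h_n) \
                   * np.exp(sigma_z ** 2 * xi ** 2 / (2 * h_n ** 2))
        vals = [quad(integrand, 0.0, 1.0, args=(yj,))[0] for yj in y]
        return (np.asarray(vals) @ w) / (np.pi * h_n)

    return np.array([f_hat(x0) for x0 in np.atleast_1d(x)])

rng = np.random.default_rng(6)
y = rng.normal(size=100) + rng.normal(scale=0.5, size=100)    # Y = X + Z with N(0, 0.25) noise
print(decon_normal_error(np.array([-1.0, 0.0, 1.0]), y, h_n=0.45, sigma_z=0.5))
```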

Figure 2: f(x) is 0.5N(−2.5,1) + 0.5N(2.5,1) and q(z) is Gaussian; (a) var(Z)/var(X) = 0.1, (b) var(Z)/var(X) = 0.25, (c) var(Z)/var(X) = 0.5.

Figure 3: f(x) is (2/3)N(0,1) + (1/3)N(0,0.4) and q(z) is Gaussian; (a) var(Z)/var(X) = 0.1, (b) var(Z)/var(X) = 0.25, (c) var(Z)/var(X) = 0.5.

In the following figures, a rule-of-thumb bandwidth is used as an initial value (Fan, 1991, 1992; Wang and Wang, 2011): for Laplacian errors, σ_h = (5σ_Z⁴/n)^{1/9}, and for Gaussian errors, h_n = σ_Z (ln n)^{−1/2}. Figures 1-3 present the simulation study when q(z) is Gaussian. The bandwidths h_n used for the kernel deconvolution estimates (4.3) and the support vector deconvolution estimates (4.4) in the simulation are given in Table 1. All the deconvolution estimates used the same bandwidth, and the same parameters (ε, C) were used for all the support vector deconvolution estimates. In the figures, the three curves plotted are the kernel deconvolution estimate, the support vector deconvolution estimate by Method I, and the support vector deconvolution estimate by Method II, respectively.
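The two rule-of-thumb starting bandwidths quoted above are simple to compute; a small illustrative helper (not from the paper) is given below.

```python
import numpy as np

def bw_laplace(sigma_z, n):
    # sigma_h = (5 * sigma_Z^4 / n)^(1/9) for Laplacian measurement error
    return (5.0 * sigma_z ** 4 / n) ** (1.0 / 9.0)

def bw_normal(sigma_z, n):
    # h_n = sigma_Z * (ln n)^(-1/2) for Gaussian measurement error
    return sigma_z / np.sqrt(np.log(n))

print(bw_laplace(sigma_z=0.5, n=100), bw_normal(sigma_z=0.5, n=100))
```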

Table 2: The bandwidth σ_h used in the simulation when q(z) is Laplacian, for the target densities f(x) = N(0,1), 0.5N(−2.5,1) + 0.5N(2.5,1), and (2/3)N(0,1) + (1/3)N(0,0.4), at each noise level var(Z)/var(X).

Figure 4: f(x) is N(0,1) and q(z) is Laplacian; (a) var(Z)/var(X) = 0.1, (b) var(Z)/var(X) = 0.25, (c) var(Z)/var(X) = 0.5.

In the case of Gaussian error, all the support vector deconvolution estimates by Method II show the same figures as the usual kernel deconvolution estimates, as Figures 1-3 indicate. In Figures 1-3, the number of data points (support vectors) used by Method II is only a small fraction of the generated random sample, while the Method I estimate uses considerably more of the data. Figures 4-6 present a simulation study when q(z) is a double exponential distribution with mean zero. The bandwidths σ_h used for the kernel deconvolution estimator (4.1) and the support vector deconvolution estimator (4.2) in the simulation are given in Table 2. The parameters (ε, C) were chosen separately for the support vector deconvolution estimates by Method I and by Method II. In the case of Laplacian error, the support vector deconvolution density estimates by Method II show very similar figures to the usual kernel deconvolution density estimates, as Figures 4-6 indicate. In Figures 4-6, the number of data points (support vectors) used by Method II is again only a small fraction of the generated random sample. However, the solution for Method I is not sparse, as ω = Γ_h^{−1} R (α − α*) indicates.

5. Concluding remarks

Three different deconvolution density estimators were discussed in this paper for the case where the sample observations are contaminated by Gaussian or double exponentially distributed errors. The support vector deconvolution density estimators by Method II lead to very sparse solutions, with only a small number of support vectors among the generated data in the simulation.

Figure 5: f(x) is 0.5N(−2.5,1) + 0.5N(2.5,1) and q(z) is Laplacian; (a) var(Z)/var(X) = 0.1, (b) var(Z)/var(X) = 0.25, (c) var(Z)/var(X) = 0.5.

Figure 6: f(x) is (2/3)N(0,1) + (1/3)N(0,0.4) and q(z) is Laplacian; (a) var(Z)/var(X) = 0.1, (b) var(Z)/var(X) = 0.25, (c) var(Z)/var(X) = 0.5.

Furthermore, the Method II estimator is as good as the usual kernel deconvolution density estimators, even though the simulation study conducted is limited. In the case of Gaussian error, the support vector deconvolution density estimates by Method II show the same figures as the kernel deconvolution density estimates. The support vector deconvolution density estimator by Method I is also attractive in the sense that some coefficients in ω = Γ_h^{−1} R (α − α*) are close to zero.

Acknowledgements

This research was supported by the Daegu University Research Grant, 2014.

References

Aronszajn N (1950). Theory of reproducing kernels, Transactions of the American Mathematical Society, 68.
Bochner S (1959). Lectures on Fourier Integrals, Princeton University Press, Princeton, New Jersey.
Carroll RJ and Hall P (1988). Optimal rates of convergence for deconvolving a density, Journal of the American Statistical Association, 83.
Fan J (1991). On the optimal rates of convergence for nonparametric deconvolution problems, Annals of Statistics, 19.
Fan J (1992). Deconvolution with supersmooth distributions, Canadian Journal of Statistics, 20.
Girosi F (1998). An equivalence between sparse approximation and support vector machines, Neural Computation, 10.
Gunn SR (1998). Support vector machines for classification and regression, Technical report, University of Southampton.
Hall P and Qiu P (2005). Discrete-transform approach to deconvolution problems, Biometrika, 92.
Hazelton ML and Turlach BA (2009). Nonparametric density deconvolution by weighted kernel estimators, Statistics and Computing, 19.
Lee S (2010). A support vector method for the deconvolution problem, Communications of the Korean Statistical Society, 17.
Lee S (2012). A note on deconvolution estimators when measurement errors are normal, Communications of the Korean Statistical Society, 19.
Lee S and Taylor RL (2008). A note on support vector density estimation for the deconvolution problem, Communications in Statistics: Theory and Methods, 37.
Mendelsohn J and Rice R (1982). Deconvolution of microfluorometric histograms with B splines, Journal of the American Statistical Association, 77.
Mercer J (1909). Functions of positive and negative type and their connection with the theory of integral equations, Philosophical Transactions of the Royal Society of London, A 209.
Moguerza JM and Munoz A (2006). Support vector machines with applications, Statistical Science, 21.
Mukherjee S and Vapnik V (1999). Support vector method for multivariate density estimation, In Proceedings in Neural Information Processing Systems.
Pensky M and Vidakovic B (1999). Adaptive wavelet estimator for nonparametric density deconvolution, Annals of Statistics, 27.
Phillips DL (1962). A technique for the numerical solution of integral equations of the first kind, Journal of the Association for Computing Machinery, 9.
Rasmussen CE and Williams CKI (2006). Gaussian Processes for Machine Learning, MIT Press, Cambridge, MA.
Scholkopf B, Herbrich R, and Smola AJ (2001). A generalized representer theorem, Computational Learning Theory, Lecture Notes in Computer Science, 2111.
Scholkopf B and Smola AJ (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA.

Smola AJ and Scholkopf B (2003). A tutorial on support vector regression, Statistics and Computing, 14.
Stefanski L and Carroll RJ (1990). Deconvoluting kernel density estimators, Statistics, 21.
Tikhonov AN and Arsenin VY (1977). Solutions of Ill-posed Problems, W. H. Winston, Washington.
Vapnik V (1995). The Nature of Statistical Learning Theory, Springer-Verlag, New York.
Vapnik V and Chervonenkis A (1964). A note on one class of perceptrons, Automation and Remote Control, 25.
Vapnik V and Lerner L (1963). Pattern recognition using generalized portrait method, Automation and Remote Control, 24.
Vert R and Vert J (2006). Consistency and convergence rates of one-class SVMs and related algorithms, Journal of Machine Learning Research, 7.
Wahba G (1990). Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, 59, SIAM, Philadelphia.
Wahba G (2006). Comments to "Support vector machines with applications" by J. M. Moguerza and A. Munoz, Statistical Science, 21.
Wang X and Wang B (2011). Deconvolution estimation in measurement error models: The R package decon, Journal of Statistical Software, 39.
Weston J, Gammerman A, Stitson M, Vapnik V, Vovk V, and Watkins C (1999). Support vector density estimation. In Scholkopf B and Smola A (Eds), Advances in Kernel Methods-Support Vector Learning, MIT Press, Cambridge, MA.
Zhang HP (1992). On deconvolution using time of flight information in positron emission tomography, Statistica Sinica, 2.

Received November 2015; Revised January 2016; Accepted January 2016.
