A note on SVM estimators in RKHS for the deconvolution problem

Communications for Statistical Applications and Methods, 2016, Vol. 23, No. 1

Sungho Lee
Department of Statistics, Daegu University, Kyungbuk 38453, Korea. E-mail: shlee@daegu.ac.kr
Published January 2016. © 2016 The Korean Statistical Society and Korean International Statistical Society. All rights reserved.

Abstract

In this paper we discuss a deconvolution density estimator obtained using the support vector machine (SVM) and Tikhonov's regularization method for solving ill-posed problems in a reproducing kernel Hilbert space (RKHS). A remarkable property of the SVM is that it leads to sparse solutions, but the support vector deconvolution density estimator does not preserve sparsity as well as we expected. Thus, in Section 3, we propose another support vector deconvolution estimator (Method II) which leads to a very sparse solution. The performance of the deconvolution density estimators based on the support vector method is compared with the classical kernel deconvolution density estimator for the important cases of Gaussian and Laplacian measurement error by means of a simulation study. In the case of Gaussian error, the proposed support vector deconvolution estimator shows the same performance as the classical kernel deconvolution density estimator.

Keywords: deconvolution, ill-posed problem, kernel density estimator, regularization, reproducing kernel Hilbert space (RKHS), support vector machines (SVM).

1. Introduction

The support vector machine (SVM) was developed in the 1960s by Vapnik and co-workers (Vapnik and Lerner, 1963; Vapnik and Chervonenkis, 1964). It was initially designed to solve pattern recognition problems. Later the SVM was extended to classification, regression, and real-valued function estimation (e.g., Vapnik, 1995). In the early 1990s the SVM was proposed for classification in the context of Vapnik's learning theory. The use of the SVM grew rapidly among computer scientists due to its success with real-world data analysis problems. In the 1990s it became clear to statisticians that the SVM with the kernel trick is the solution of an optimization problem in a reproducing kernel Hilbert space (RKHS). Reproducing kernel Hilbert spaces (Aronszajn, 1950; Wahba, 1990) became prominent when their relation to the SVM became clear (Moguerza and Munoz, 2006; Wahba, 2006). Reproducing kernel Hilbert spaces were explicitly introduced into learning theory by Girosi (1998). In general it is quite difficult to find useful function spaces that are not RKHS.

The problem of measurements being contaminated with noise exists in many different fields (e.g., Mendelsohn and Rice, 1982; Stefanski and Carroll, 1990; Zhang, 1992). This deconvolution problem can be stated as follows. Let X and Z be independent random variables with density functions f(x) and q(z), respectively, where f(x) is unknown and q(z) is known. One observes a sample of random variables Y_i = X_i + Z_i, i = 1, 2, ..., n, whose density g(y) is the convolution of f(x) and q(z), g(y) = (f * q)(y) = ∫ f(y − z) q(z) dz. The objective is to estimate the density function f(x). The most popular approach to this deconvolution problem has been to estimate f(x) using a kernel estimator and the Fourier transform (e.g., Carroll and Hall, 1988; Fan, 1991).
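To make the setup concrete, here is a minimal simulation sketch (not from the paper; the distributions, sample size, and variable names are illustrative) showing that a naive density estimate built from the contaminated sample Y targets the convolution g = f * q rather than the density f of interest.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
x = rng.normal(0.0, 1.0, size=n)      # latent X ~ f (here N(0, 1); unknown in practice)
z = rng.laplace(0.0, 0.25, size=n)    # measurement error Z ~ q (Laplacian, known)
y = x + z                             # observed sample Y = X + Z

kde_y = stats.gaussian_kde(y)         # naive KDE: estimates g = f * q, not f
grid = np.linspace(-4.0, 4.0, 9)
print(np.c_[grid, kde_y(grid), stats.norm.pdf(grid)])  # compare with the true f
```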

Kernel density estimation is widely considered the most popular approach to density deconvolution; however, other alternatives have been proposed (e.g., Hall and Qiu, 2005; Hazelton and Turlach, 2009; Mendelsohn and Rice, 1982; Pensky and Vidakovic, 1999). The support vector regression approach to density estimation was discussed by Weston et al. (1999), and support vector density estimation in RKHS based on Phillips' residual method (Phillips, 1962) was discussed by Mukherjee and Vapnik (1999). The SVM in RKHS can be applied to the deconvolution problem with support vector density estimation and the Fourier transform (Lee, 2010, 2012). In this paper we discuss deconvolution estimators obtained by support vector density estimation and the Fourier transform in the framework of the Tikhonov regularization method in RKHS and propose a support vector deconvolution estimator which leads to a very sparse solution.

2. RKHS and SVM

The support vector regression algorithm computes a nonlinear function in the space of the input data R^m by using a linear function in a high dimensional feature space F with a dot product. The functions take the form f(x) = ⟨ω, Φ(x)⟩ + b with Φ : R^m → F and ω ∈ F. In order to estimate f(x) in the deconvolution problem by applying the support vector regression method (Vapnik, 1995), we first take G(y) = ⟨ω, Φ(y)⟩ + b from a training set {(y_1, G_n(y_1)), ..., (y_n, G_n(y_n))}, where G_n(y) is the empirical distribution function. Then we try to minimize the empirical risk function R_emp(G) with a complexity term ‖ω‖²:

R_reg(G) = R_emp(G) + (λ/2)‖ω‖² = (1/n) ∑_{i=1}^n |G(y_i) − G_n(y_i)|_ε + (λ/2)‖ω‖²,   (2.1)

where Vapnik's ε-insensitive loss function |G(y) − G_n(y)|_ε is described by

|G(y) − G_n(y)|_ε = |G(y) − G_n(y)| − ε  if |G(y) − G_n(y)| ≥ ε,  and 0 otherwise.   (2.2)

Equation (2.1) can be minimized by solving a quadratic programming problem formulated in terms of dot products in the feature space F. Then, applying the Fourier inversion formula, we can obtain f̂(x) (Lee and Taylor, 2008). This method can be constructed in the framework of regularization theory in RKHS.

A (real) RKHS H is a Hilbert space of real-valued functions f on an interval T with the property that, for each t ∈ T, the evaluation functional L_t : f ↦ f(t) is a bounded linear functional. Then, by the Riesz representation theorem, for each t ∈ T there exists a unique element K_t ∈ H such that L_t(f) = f(t) = ⟨K_t, f⟩ for each f ∈ H. The function defined by K_u(v) = K(u, v) = ⟨K_u, K_v⟩ for u, v ∈ T is called the reproducing kernel. Conversely, a positive definite kernel K defines a unique RKHS by the following Moore-Aronszajn theorem. This result is important because it gives a construction of an RKHS in terms of its reproducing kernel. Let X be some set, for example a subset of R^n or R^n itself. A kernel is a symmetric function K : X × X → R. A kernel K is called positive definite if its associated kernel matrix satisfies ∑_{i,j=1}^n c_i c_j K(x_i, x_j) ≥ 0 for any n ∈ N, any x_1, x_2, ..., x_n ∈ X, and any c_1, c_2, ..., c_n ∈ R. Alternatively, if ∫_{X×X} K(x, z) u(x) u(z) dx dz ≥ 0 for all u ∈ L_1(X), then the kernel K is called integrally positive definite. The two definitions are equivalent for a continuous kernel K.

Theorem 1. (Aronszajn, 1950) For every positive definite kernel K on X × X there is a unique RKHS H_K on X with K as its reproducing kernel.
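As a small illustration of (2.1)-(2.2), the sketch below (illustrative only; G_hat, the toy data, and the variable names are not from the paper) evaluates the ε-insensitive loss of a fitted distribution function against the empirical distribution function.

```python
import numpy as np
from scipy import stats

def eps_insensitive(residual, eps):
    # |r|_eps = max(|r| - eps, 0): errors inside the eps-tube are ignored (eq. 2.2)
    return np.maximum(np.abs(residual) - eps, 0.0)

def regularized_risk(G_hat, y, eps, lam, omega_norm_sq):
    # (1/n) sum_i |G_hat(y_i) - G_n(y_i)|_eps + (lam/2) * ||omega||^2   (eq. 2.1)
    y = np.sort(y)
    G_n = np.arange(1, len(y) + 1) / len(y)   # empirical distribution function at the sorted y_i
    return eps_insensitive(G_hat(y) - G_n, eps).mean() + 0.5 * lam * omega_norm_sq

# toy usage with a hypothetical fitted distribution function G_hat = standard normal CDF
y = stats.norm.rvs(size=50, random_state=1)
print(regularized_risk(stats.norm.cdf, y, eps=0.05, lam=0.1, omega_norm_sq=1.0))
```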

We now briefly present Mercer's theorem (Mercer, 1909), which is one of the fundamental mathematical results underlying learning theory with kernels. In the SVM literature the kernel is called a Mercer kernel and is used as a linear function in a high dimensional feature space F with a dot product. Mercer's theorem is as follows (e.g., see Scholkopf and Smola (2002) for the details): a positive definite kernel K : X × X → R, with X a closed subset of R^n, has the expansion

K(x, y) = ∑_{i=1}^∞ λ_i φ_i(x) φ_i(y),

where the convergence is in L_2(X, µ) and φ_i ∈ L_2(X, µ), i = 1, 2, ..., are the orthonormal eigenfunctions of the integral equation

∫_X K(x, y) φ(x) dµ(x) = λ φ(y).

We can now define the RKHS as the space of functions spanned by the eigenfunctions of the integral operator defined by the kernel. Let us consider the set of functions

H_K = { f : f(x) = ∑_{r=1}^∞ ω_r φ_r(x) and ‖f‖_{H_K} < ∞ },

where, for f = ∑_r ω_r φ_r and g = ∑_r ω'_r φ_r, the inner product is defined as ⟨f, g⟩ = ∑_i ω_i ω'_i / λ_i and the RKHS norm ‖f‖_{H_K} is defined by ‖f‖²_{H_K} = ∑_i ω_i² / λ_i. Then we have an RKHS H_K with its reproducing kernel K. Assume further that ∫_{X×X} K(x, y)² dx dy < ∞. Then the kernel K will have a countable sequence of eigenvalues and eigenfunctions, and hence a natural feature map that arises from Mercer's theorem is

y ↦ Φ(y) = (√λ_1 φ_1(y), ..., √λ_i φ_i(y), ...)  with  K(x, y) = ⟨Φ(x), Φ(y)⟩.

We can apply these properties of RKHS to density estimation. Notice that for the Gaussian kernel on the whole real line, ∫∫ e^{−(x−y)²/σ²} dx dy = ∞, and hence Mercer's theorem does not apply. However, it turns out that one can derive a Mercer expansion for the Gaussian kernel with orthonormal functions in L_2(R, ρ), ρ(x) = (α/√π) e^{−α²x²}, α > 0 (Rasmussen and Williams, 2006). Eigenvalues are not necessarily countable and Mercer's theorem does not apply if the kernel is defined over an unbounded domain; feature maps do not necessarily arise from Mercer's theorem.

Let us consider a special class of kernels that is widely used in practice: translation invariant kernels, K(x, y) = K(x − y). This implies that we will have to consider Fourier hypothesis spaces, and all these spaces will be defined via the Fourier transform. The eigenvalue problem for translation invariant kernels is ∫ K(x − y) φ(x) dx = λ φ(y), and hence by the convolution theorem the Fourier transform of the equation is K̃(ω) φ̃(ω) = λ φ̃(ω), where φ̃(ω) = ∫ φ(x) e^{−iωx} dx. The following Bochner's theorem relates the Fourier transform of a kernel to its being positive definite.

Theorem 2. (Bochner, 1959) A function K(x − y) is positive definite if and only if it is the Fourier transform of a symmetric, positive function K̃(ω) decreasing to 0 at infinity.

If we consider a positive definite function K(x − y) and define, in the Fourier domain, the inner product ⟨f(x), g(x)⟩_{H_K} = (1/2π) ∫_R f̃(ω) g̃(ω) / K̃(ω) dω, then the subspace H_K of the L_2 space of functions f with ‖f‖²_{H_K} = (1/2π) ∫_R |f̃(ω)|² / K̃(ω) dω < ∞ is an RKHS with that norm. The well-known Gaussian kernel represents an inner product in feature space F. Let C_0(R) denote the set of continuous functions on R that vanish at infinity. Then the reproducing kernel Hilbert space H_K (Vert and Vert, 2006) associated with the normalized Gaussian kernel K(x, y) = (1/(σ√(2π))) e^{−(x−y)²/(2σ²)} is

H_K = { f ∈ C_0(R) : f ∈ L_1(R) and ∫_R |f̃(ω)|² e^{σ²ω²/2} dω < ∞ },

and since ∫_R |f̃(ω)|² e^{σ²ω²/2} dω = ∑_{k=0}^∞ (σ^{2k} / (2^k k!)) ∫_R ω^{2k} |f̃(ω)|² dω, the space H_K for the Gaussian kernel consists of the set of square integrable functions whose derivatives of all orders are square integrable.
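As a quick numerical illustration of positive definiteness (an illustrative sketch, not part of the paper), one can check that the Gram matrix of the Gaussian kernel at arbitrary points has nonnegative eigenvalues, as the definition above requires.

```python
import numpy as np

def gaussian_kernel_matrix(pts, sigma):
    # K(x, y) = exp(-(x - y)^2 / (2 sigma^2)): translation invariant, so Bochner's theorem applies
    d = pts[:, None] - pts[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2)

rng = np.random.default_rng(1)
pts = rng.uniform(-3, 3, size=30)
K = gaussian_kernel_matrix(pts, sigma=0.7)
eigvals = np.linalg.eigvalsh(K)          # symmetric matrix -> real eigenvalues
print(eigvals.min() >= -1e-10)           # numerically nonnegative: the Gram matrix is PSD
```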

When sample observations are contaminated by normally distributed errors, a kernel K with compactly supported Fourier transform, K̃(ω) = (1 − ω²)³ I_{[−1,1]}(ω), is commonly used. Since K̃(ω) = (1 − ω²)³ I_{[−1,1]}(ω) is also a symmetric, positive function decreasing to 0 at infinity, the RKHS H_K associated with the kernel K,

K(x, y) = K(x − y) = (48 cos(x − y) / (π (x − y)⁴)) (1 − 15/(x − y)²) − (144 sin(x − y) / (π (x − y)⁵)) (2 − 5/(x − y)²),

is a subspace of L_2(R) with norm ‖f‖²_{H_K} = (1/2π) ∫_R |f̃(ω)|² / K̃(ω) dω < ∞.

3. SVM deconvolution estimators in RKHS

The problem of approximating a density function from data can be dealt with as an ill-posed problem (e.g., Vapnik, 1995). A density function f(x) is defined as the solution of the equation ∫_{−∞}^x f(t) dt = F(x), where F(x) is a distribution function, and hence the problem of solving the linear operator equation Af = F with an approximation F̂_n(x) of the unknown distribution function F(x) is ill-posed. A classical solution to this problem is regularization theory (Tikhonov and Arsenin, 1977). Given a training sample {(x_1, y_1), ..., (x_n, y_n)}, the Tikhonov regularization method that we will consider here is defined by an optimization problem over the RKHS:

f = arg min_{f ∈ H_K} (1/n) ∑_{i=1}^n V(f(x_i), y_i) + (λ/2) ‖f‖²_{H_K},

where V is a loss function, ‖f‖_{H_K} is the norm in the RKHS H_K defined by the positive definite function K, and λ is the regularization parameter. In the RKHS H_K defined by the positive definite kernel K, the solution to the Tikhonov regularization method has a very compact representation by the generalized representer theorem (refer to Scholkopf et al. (2001) for the details):

f(x) = ∑_{i=1}^n α_i K(x, x_i),  α_i ∈ R,  1 ≤ i ≤ n.

Now we discuss an SVM deconvolution estimator by Method I (Mukherjee and Vapnik, 1999; Lee, 2012) within the framework of the Tikhonov regularization method in RKHS H_K for ill-posed problems, and propose a new SVM deconvolution estimator by Method II. Let X and Z be independent random variables with density functions f(x) and q(z) respectively, where f(x) is unknown and q(z) is known. Assume that we observe a random sample Y_1, Y_2, ..., Y_n from a density g(y) ∈ H_K, where Y_i = X_i + Z_i, i = 1, 2, ..., n, and let {y_1, y_2, ..., y_n} be a training data set.
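For intuition about the Tikhonov problem and the representer theorem, note that with the squared-error loss V(f(x), y) = (f(x) − y)² the problem above has a closed-form representer solution (kernel ridge regression). The sketch below is an illustrative instance under that assumed loss, not the paper's estimator; the kernel, data, and names are hypothetical.

```python
import numpy as np

def gaussian_K(a, b, h):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / h) ** 2)

def fit_kernel_ridge(x, y, h, lam):
    # Tikhonov problem with squared loss: (1/n) sum_i (f(x_i) - y_i)^2 + (lam/2) ||f||_HK^2.
    # By the representer theorem f(x) = sum_j alpha_j K(x, x_j); the first-order
    # condition gives (K + (n * lam / 2) I) alpha = y.
    n = len(x)
    K = gaussian_K(x, x, h)
    alpha = np.linalg.solve(K + 0.5 * n * lam * np.eye(n), y)
    return lambda t: gaussian_K(np.atleast_1d(t), x, h) @ alpha

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 40)
y = np.sin(x) + rng.normal(0, 0.1, size=x.size)
f_hat = fit_kernel_ridge(x, y, h=0.5, lam=0.01)
print(f_hat(np.array([0.0, 1.5])))
```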

Method I.

Step 1. Choose a positive definite kernel K and the ε-insensitive loss function which consists of the distribution function G(y) of Y and the empirical distribution function G_n(y) of a random sample Y_1, Y_2, ..., Y_n.

Step 2. Find a density estimator ĝ(y) satisfying the following Tikhonov minimization problem by applying the representer theorem on the RKHS H_K and the SVM regression method (e.g., Smola and Scholkopf, 2003; Vapnik, 1995):

ĝ = arg min_{g ∈ H_K} (1/n) ∑_{i=1}^n |G(y_i) − G_n(y_i)|_ε + (λ/2) ‖g‖²_{H_K},

where g(y, ω) = ∑_{i=1}^n ω_i K_h(y_i, y), ω_i ≥ 0, ∑_{i=1}^n ω_i = 1, ‖g‖²_{H_K} = ∑_{i,j=1}^n ω_i ω_j K_h(y_i, y_j), G_n(y) = (1/n) ∑_{i=1}^n I(Y_i ≤ y), and G(y) = ∫_{−∞}^y ∑_{i=1}^n ω_i K_h(y_i, t) dt. The coefficients ω_i of ĝ(y, ω) can be found by solving the following problem:

max_{α, α*, η, η*} min_{ω, ξ, ξ*} L(ω, ξ, ξ*, α, α*, η, η*) = (1/2) ∑_{i,j=1}^n ω_i ω_j K_h(y_i, y_j) + C ∑_{i=1}^n (ξ_i + ξ_i*) − ∑_{i=1}^n (η_i ξ_i + η_i* ξ_i*)
  − ∑_{i=1}^n α_i ( ε + ξ_i − G_n(y_i) + ∫_{−∞}^{y_i} ∑_{j=1}^n ω_j K_h(y, y_j) dy )
  − ∑_{i=1}^n α_i* ( ε + ξ_i* + G_n(y_i) − ∫_{−∞}^{y_i} ∑_{j=1}^n ω_j K_h(y, y_j) dy ),

where ω, ξ, ξ*, α, α*, η, η* ≥ 0. The coefficients ω_i can then be found by solving the following quadratic programming problem and applying the equation ω = Γ_h^{−1} R (α − α*):

min_{α, α*} (1/2) (α − α*)^t R^t Γ_h^{−1} R (α − α*) − ∑_{i=1}^n G_n(y_i)(α_i − α_i*) + ε ∑_{i=1}^n (α_i + α_i*),  0 ≤ α_i, α_i* ≤ C,  i = 1, ..., n,

where Γ_h = [K_h(y_i, y_j)]_{n×n}, R = [r_ij]_{n×n}, and r_ij = ∫_{−∞}^{y_j} K_h(y, y_i) dy = ∫_{−∞}^{y_j} (1/h_n) K((y − y_i)/h_n) dy.

Step 3. Find the density estimator f̂(x) by applying the Fourier inversion formula,

f̂(x) = (1/2π) ∫ ( ĝ̃(ξ) / q̃(ξ) ) e^{−iξx} dξ,  where ĝ̃(ξ) = ∑_{i=1}^n ω_i ∫ K_h(y_i, y) e^{iξy} dy.

A remarkable property of the SVM is that the ε-insensitive loss function (2.2) leads to sparse solutions. Unfortunately, the support vector deconvolution density estimator by Method I does not preserve sparsity in terms of the coefficients ω, ω = Γ_h^{−1} R (α − α*), as expected, because a sparse decomposition in α and α* is spoiled by Γ_h^{−1} R, which is not in general diagonal. However, it is still attractive in the sense that some coefficients in ω = Γ_h^{−1} R (α − α*) are very close to zero.

Now we propose another support vector method which has a sparser solution than Method I. The following Method II uses support vector regression and the Tikhonov regularization method in RKHS. The difference between the two methods lies in the components of the ε-insensitive loss function. In Method I, the ε-insensitive loss function is defined by the distribution function G(y) and the empirical distribution function G_n(y), which causes it to lose the sparse solution. In Method II, the ε-insensitive loss function is defined by the density function g(y) and a classical kernel density estimator ĝ_k, which leads to a sparse solution.
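To make the Step 2 bookkeeping concrete, the sketch below (illustrative only) builds Γ_h and R for an assumed Gaussian kernel, for which r_ij reduces to the normal distribution function Φ((y_j − y_i)/h), and recovers ω from a given (α − α*). The α values here are random placeholders standing in for the output of the quadratic program.

```python
import numpy as np
from scipy.stats import norm

def method1_matrices(y, h):
    d = y[:, None] - y[None, :]
    Gamma = norm.pdf(d, scale=h)                   # Gamma_h[i, j] = K_h(y_i, y_j), Gaussian kernel
    R = norm.cdf((y[None, :] - y[:, None]) / h)    # R[i, j] = integral_{-inf}^{y_j} K_h(y, y_i) dy
    return Gamma, R

rng = np.random.default_rng(3)
y = np.sort(rng.normal(size=20))
Gamma, R = method1_matrices(y, h=0.4)
alpha = rng.uniform(0, 0.1, size=20)               # placeholder for the QP solution
alpha_star = rng.uniform(0, 0.1, size=20)          # placeholder for the QP solution
omega = np.linalg.solve(Gamma, R @ (alpha - alpha_star))   # omega = Gamma_h^{-1} R (alpha - alpha*)
print(np.round(omega, 3))                          # generally not sparse, even if alpha - alpha* is
```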

In kernel density estimation, ĝ_k(y) converges to g(y) in probability if g is continuous at y and h_n → 0, nh_n → ∞ as n → ∞. We now consider the problem of approximation based on the data set {(y_1, ĝ_k(y_1)), (y_2, ĝ_k(y_2)), ..., (y_n, ĝ_k(y_n))} by applying support vector regression and the Tikhonov regularization method in RKHS.

Method II.

Step 1. Choose a positive definite kernel K and the ε-insensitive loss function which consists of the density function g(y) and a classical kernel density estimator ĝ_k(y) with the kernel K.

Step 2. Find a density estimator ĝ(y) satisfying the following Tikhonov regularization problem by applying the representer theorem on the RKHS H_K and the SVM regression method:

ĝ = arg min_{g ∈ H_K} (1/n) ∑_{i=1}^n |g(y_i) − ĝ_k(y_i)|_ε + (λ/2) ‖g‖²_{H_K},

where ĝ_k(y) = (1/(n h_n)) ∑_{j=1}^n K((y − y_j)/h_n), and the coefficients ω_i can be found by solving the following problem:

max_{α, α*, η, η*} min_{ω, ξ, ξ*} L(ω, ξ, ξ*, α, α*, η, η*) = (1/2) ∑_{i,j=1}^n ω_i ω_j K_h(y_i, y_j) + C ∑_{i=1}^n (ξ_i + ξ_i*) − ∑_{i=1}^n (η_i ξ_i + η_i* ξ_i*)
  − ∑_{i=1}^n α_i ( ε + ξ_i − ĝ_k(y_i) + ∑_{j=1}^n ω_j K_h(y_i, y_j) )
  − ∑_{i=1}^n α_i* ( ε + ξ_i* + ĝ_k(y_i) − ∑_{j=1}^n ω_j K_h(y_i, y_j) ),

where ω, ξ, ξ*, α, α*, η, η* ≥ 0. The coefficients ω_i can then be found by solving the following quadratic programming problem and applying the equation ω = α − α*:

min_{α, α*} (1/2) (α − α*)^t Γ_h (α − α*) − ∑_{i=1}^n ĝ_k(y_i)(α_i − α_i*) + ε ∑_{i=1}^n (α_i + α_i*),

where 0 ≤ α_i, α_i* ≤ C, i = 1, ..., n, and Γ_h = [K_h(y_i, y_j)]_{n×n}.

Step 3. Find the density estimator f̂(x) by applying the Fourier inversion formula,

f̂(x) = (1/2π) ∫ ( ĝ̃(ξ) / q̃(ξ) ) e^{−iξx} dξ,  where ĝ̃(ξ) = ∑_{i=1}^n ω_i ∫ K_h(y_i, y) e^{iξy} dy.

This setting of Method II preserves sparsity in terms of the coefficients ω because ω = α − α* and α_i, α_i* vanish for |g(y_i) − ĝ_k(y_i)| < ε. However, Method I does not preserve sparsity because ω = Γ_h^{−1} R (α − α*), and a sparse decomposition in α and α* is spoiled by Γ_h^{−1} R.
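Below is a sketch of how the box-constrained quadratic program in Step 2 of Method II could be solved numerically. It assumes a Gaussian kernel and uses a generic bounded quasi-Newton optimizer; the paper itself used Gunn's MATLAB SVM toolbox, so this is only an illustrative reimplementation with hypothetical parameter values.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def method2_weights(y, h, eps, C):
    n = len(y)
    Gamma = norm.pdf(y[:, None] - y[None, :], scale=h)    # Gamma_h[i, j] = K_h(y_i, y_j)
    g_k = Gamma.mean(axis=1)                              # classical KDE evaluated at the sample points

    def objective(v):
        a, a_star = v[:n], v[n:]
        u = a - a_star
        grad_u = Gamma @ u - g_k
        obj = 0.5 * u @ (Gamma @ u) - g_k @ u + eps * (a + a_star).sum()
        grad = np.concatenate([grad_u + eps, -grad_u + eps])
        return obj, grad

    res = minimize(objective, np.zeros(2 * n), jac=True,
                   method="L-BFGS-B", bounds=[(0.0, C)] * (2 * n))
    a, a_star = res.x[:n], res.x[n:]
    return a - a_star                                     # omega = alpha - alpha*, mostly zero

y = np.sort(np.random.default_rng(4).normal(size=40))
omega = method2_weights(y, h=0.4, eps=0.02, C=1.0)
print(np.sum(np.abs(omega) > 1e-6), "nonzero coefficients out of", len(y))
```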

4. Simulation and discussion

The asymptotic properties of the kernel density estimator in deconvolution problems depend strongly on the error distribution. Following the work of Fan (1991), two types of error distributions can be considered: ordinary smooth and supersmooth distributions. Normal, mixture normal, and Cauchy distributions are supersmooth; that is, the Fourier transform q̃(ξ) = ∫ e^{iξz} q(z) dz of q(z) satisfies

d_0 |ξ|^{β_0} exp(−|ξ|^β / γ) ≤ |q̃(ξ)| ≤ d_1 |ξ|^{β_1} exp(−|ξ|^β / γ)  as ξ → ∞,

for some positive constants d_0, d_1, β, γ and constants β_0 and β_1. Gamma and double exponential distributions are ordinary smooth, that is, the Fourier transform q̃(ξ) of q(z) satisfies

d_0 |ξ|^{−β} ≤ |q̃(ξ)| ≤ d_1 |ξ|^{−β}  as ξ → ∞,

for some positive constants d_0, d_1, β. Thus in the classical deconvolution literature, normal and double exponential distributions have typically been selected and investigated as error distributions. In this section we compare the performance of three different deconvolution density estimators when measurement errors are Laplacian or normal: classical kernel deconvolution estimators and support vector deconvolution density estimators based on Methods I and II. Target distributions are selected from the distribution functions used in Hazelton and Turlach (2009).

When measurement errors are double exponential, the classical kernel deconvolution density estimator with Gaussian kernel is evaluated as

f̂(x) = (1/(2πn)) ∑_{j=1}^n ∫ e^{−iξ(x−y_j)} K̃_{σ_h}(ξ) / q̃(ξ) dξ
     = (1/(n √(2π) σ_h)) ∑_{j=1}^n e^{−(x−y_j)²/(2σ_h²)} [ 1 + (σ_Z²/σ_h²)(1 − (x−y_j)²/σ_h²) ],   (4.1)

where K̃_{σ_h}(ξ) = e^{−0.5 σ_h² ξ²}, q(z) = (1/(2σ_Z)) e^{−|z|/σ_Z}, and q̃(ξ) = 1/(1 + σ_Z² ξ²). The support vector deconvolution density estimator with Gaussian kernel is evaluated as

f̂(x) = (1/2π) ∑_{j=1}^n ω_j ∫ K̃_h(y_j, ξ) e^{−iξx} / q̃(ξ) dξ
     = (1/(√(2π) σ_h)) ∑_{j=1}^n ω_j e^{−(x−y_j)²/(2σ_h²)} [ 1 + (σ_Z²/σ_h²)(1 − (x−y_j)²/σ_h²) ].   (4.2)

When measurement errors are normal, in order to avoid problems of integrability, a kernel K that has compactly supported Fourier transform K̃(ω) = (1 − ω²)³ I_{[−1,1]}(ω) is commonly used. Using this kernel, K(x) = (48 cos x / (π x⁴))(1 − 15/x²) − (144 sin x / (π x⁵))(2 − 5/x²), a kernel deconvolution density estimator f̂(x) in the presence of normal measurement error can be calculated as

f̂(x) = (1/(2πn)) ∑_{j=1}^n ∫ e^{−iξ(x−y_j)} K̃(h_n ξ) / q̃(ξ) dξ
     = (1/(n π h_n)) ∑_{j=1}^n ∫_0^1 (1 − ξ²)³ cos(ξ (x − y_j)/h_n) e^{σ_Z² ξ² / (2 h_n²)} dξ,   (4.3)

where K̃(ξ) = (1 − ξ²)³ I_{[−1,1]}(ξ), q(z) = (1/(√(2π) σ_Z)) e^{−z²/(2σ_Z²)}, and q̃(ξ) = e^{−σ_Z² ξ² / 2}.
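For the Laplacian-error case, the closed forms (4.1)-(4.2) can be evaluated directly. The sketch below is illustrative code, not the authors' MATLAB implementation; σ_Z denotes the Laplace scale parameter in q(z) = (1/(2σ_Z)) e^{−|z|/σ_Z}, and the data and parameter values are hypothetical.

```python
import numpy as np

def kernel_decon_laplace(x, y, sigma_h, sigma_z, weights=None):
    # weights = 1/n reproduces (4.1); SVM weights omega_j reproduce (4.2)
    n = len(y)
    w = np.full(n, 1.0 / n) if weights is None else weights
    u = (x[:, None] - y[None, :]) / sigma_h
    gauss = np.exp(-0.5 * u ** 2) / (np.sqrt(2 * np.pi) * sigma_h)
    correction = 1.0 + (sigma_z ** 2 / sigma_h ** 2) * (1.0 - u ** 2)
    return (gauss * correction) @ w

rng = np.random.default_rng(5)
y = rng.normal(size=200) + rng.laplace(scale=0.3, size=200)   # Y = X + Z with Laplace(0.3) noise
grid = np.linspace(-3, 3, 7)
print(kernel_decon_laplace(grid, y, sigma_h=0.5, sigma_z=0.3))
```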

Table 1: The bandwidth h_n used in the simulation when q(z) is Gaussian, for the target densities f(x) = N(0,1), 0.5N(−2.5,1) + 0.5N(2.5,1), and (2/3)N(0,1) + (1/3)N(0,0.4), at each noise level var(Z)/var(X).

Figure 1: f(x) is N(0,1) and q(z) is Gaussian; (a) var(Z)/var(X) = 0.1, (b) var(Z)/var(X) = 0.25, (c) var(Z)/var(X) = 0.5.

The support vector deconvolution density estimator is evaluated as

f̂(x) = (1/2π) ∑_{j=1}^n ω_j ∫ K̃_h(y_j, ξ) e^{−iξx} / q̃(ξ) dξ
     = (1/2π) ∑_{j=1}^n ω_j ∫_{−1/h_n}^{1/h_n} (1 − h_n² ξ²)³ e^{iξ y_j} e^{σ_Z² ξ² / 2} e^{−iξx} dξ
     = (1/(π h_n)) ∑_{j=1}^n ω_j ∫_0^1 (1 − ξ²)³ cos(ξ (x − y_j)/h_n) e^{σ_Z² ξ² / (2 h_n²)} dξ.   (4.4)

Figures 1-6 show plots of the classical kernel deconvolution estimates and support vector deconvolution estimates when points are randomly generated from a target distribution f(x) and a noise distribution q(z), either a normal distribution with mean zero or a double exponential distribution with mean zero. The measurement error variance is set at low (var(Z)/var(X) = 0.1), moderate (var(Z)/var(X) = 0.25), and high (var(Z)/var(X) = 0.5) levels, as in Hazelton and Turlach (2009). The exact probability density function f(x) is shown as a bold line. For the support vector deconvolution estimates, Gunn's program (Gunn, 1998) and MATLAB 6.5 were used. In kernel density estimation the choice of kernel is not crucial, but the choice of bandwidth is very important.
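The normal-error estimators (4.3)-(4.4) have no closed form, but the integrand is smooth on [0, 1], so standard quadrature suffices. The sketch below is an illustrative numerical evaluation (not the paper's MATLAB code); the data and parameter values are hypothetical.

```python
import numpy as np
from scipy.integrate import quad

def decon_normal_error(x, y, h_n, sigma_z, weights=None):
    # weights = 1/n reproduces (4.3); SVM weights omega_j reproduce (4.4)
    n = len(y)
    w = np.full(n, 1.0 / n) if weights is None else weights

    def f_hat(x0):
        def integrand(xi, yj):
            return (1 - xi ** 2) ** 3 * np.cos(xi * (x0 - yj) / h_n) \
                   * np.exp(sigma_z ** 2 * xi ** 2 / (2 * h_n ** 2))
        vals = [quad(integrand, 0.0, 1.0, args=(yj,))[0] for yj in y]
        return (np.asarray(vals) @ w) / (np.pi * h_n)

    return np.array([f_hat(x0) for x0 in np.atleast_1d(x)])

rng = np.random.default_rng(6)
y = rng.normal(size=100) + rng.normal(scale=0.5, size=100)    # Y = X + Z with N(0, 0.25) noise
print(decon_normal_error(np.array([-1.0, 0.0, 1.0]), y, h_n=0.45, sigma_z=0.5))
```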

Figure 2: f(x) is 0.5N(−2.5,1) + 0.5N(2.5,1) and q(z) is Gaussian; (a) var(Z)/var(X) = 0.1, (b) var(Z)/var(X) = 0.25, (c) var(Z)/var(X) = 0.5.

Figure 3: f(x) is (2/3)N(0,1) + (1/3)N(0,0.4) and q(z) is Gaussian; (a) var(Z)/var(X) = 0.1, (b) var(Z)/var(X) = 0.25, (c) var(Z)/var(X) = 0.5.

In the following figures, a rule-of-thumb bandwidth is used as an initial value (Fan, 1991, 1992; Wang and Wang, 2011): for Laplacian errors, σ_h = (5σ_Z⁴/n)^{1/9}, and for Gaussian errors, h_n = σ_Z (ln n)^{−1/2}. Figures 1-3 present the simulation study when q(z) is Gaussian. The bandwidths h_n used for the kernel deconvolution estimates (4.3) and the support vector deconvolution estimates (4.4) in the simulation are given in Table 1. All the deconvolution estimates used the same bandwidth, and the same parameters (ε, C) were used for all the support vector deconvolution estimates. In the figures, the three curves plotted are the kernel deconvolution estimate, the support vector deconvolution estimate by Method I, and the support vector deconvolution estimate by Method II, respectively.
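The two rule-of-thumb starting bandwidths quoted above are simple to compute; a small illustrative helper (not from the paper) is given below.

```python
import numpy as np

def bw_laplace(sigma_z, n):
    # sigma_h = (5 * sigma_Z^4 / n)^(1/9) for Laplacian measurement error
    return (5.0 * sigma_z ** 4 / n) ** (1.0 / 9.0)

def bw_normal(sigma_z, n):
    # h_n = sigma_Z * (ln n)^(-1/2) for Gaussian measurement error
    return sigma_z / np.sqrt(np.log(n))

print(bw_laplace(sigma_z=0.5, n=100), bw_normal(sigma_z=0.5, n=100))
```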

Table 2: The bandwidth σ_h used in the simulation when q(z) is Laplacian, for the target densities f(x) = N(0,1), 0.5N(−2.5,1) + 0.5N(2.5,1), and (2/3)N(0,1) + (1/3)N(0,0.4), at each noise level var(Z)/var(X).

Figure 4: f(x) is N(0,1) and q(z) is Laplacian; (a) var(Z)/var(X) = 0.1, (b) var(Z)/var(X) = 0.25, (c) var(Z)/var(X) = 0.5.

In the case of Gaussian error, all the support vector deconvolution estimates by Method II show the same figures as the usual kernel deconvolution estimates, as Figures 1-3 indicate. In Figures 1-3, the number of data points (support vectors) used by Method II is only a small fraction of the generated random sample, while the Method I estimate uses considerably more of the data. Figures 4-6 present a simulation study when q(z) is a double exponential distribution with mean zero. The bandwidths σ_h used for the kernel deconvolution estimator (4.1) and the support vector deconvolution estimator (4.2) in the simulation are given in Table 2. The parameters (ε, C) were chosen separately for the support vector deconvolution estimates by Method I and by Method II. In the case of Laplacian error, the support vector deconvolution density estimates by Method II show very similar figures to the usual kernel deconvolution density estimates, as Figures 4-6 indicate. In Figures 4-6, the number of data points (support vectors) used by Method II is again only a small fraction of the generated random sample. However, the solution for Method I is not sparse, as ω = Γ_h^{−1} R (α − α*) indicates.

5. Concluding remarks

Three different deconvolution density estimators were discussed in this paper for the case where the sample observations are contaminated by Gaussian or double exponentially distributed errors. The support vector deconvolution density estimators by Method II lead to very sparse solutions, with only a small number of support vectors among the generated data in the simulation.

Figure 5: f(x) is 0.5N(−2.5,1) + 0.5N(2.5,1) and q(z) is Laplacian; (a) var(Z)/var(X) = 0.1, (b) var(Z)/var(X) = 0.25, (c) var(Z)/var(X) = 0.5.

Figure 6: f(x) is (2/3)N(0,1) + (1/3)N(0,0.4) and q(z) is Laplacian; (a) var(Z)/var(X) = 0.1, (b) var(Z)/var(X) = 0.25, (c) var(Z)/var(X) = 0.5.

Furthermore, the Method II estimator is as good as the usual kernel deconvolution density estimators, even though the simulation study conducted is limited. In the case of Gaussian error, the support vector deconvolution density estimates by Method II show the same figures as the kernel deconvolution density estimates. The support vector deconvolution density estimator by Method I is also attractive in the sense that some coefficients in ω = Γ_h^{−1} R (α − α*) are close to zero.

Acknowledgements

This research was supported by the Daegu University Research Grant, 2014.

References

Aronszajn N (1950). Theory of reproducing kernels, Transactions of the American Mathematical Society, 68.
Bochner S (1959). Lectures on Fourier Integrals, Princeton University Press, Princeton, New Jersey.
Carroll RJ and Hall P (1988). Optimal rates of convergence for deconvolving a density, Journal of the American Statistical Association, 83.
Fan J (1991). On the optimal rates of convergence for nonparametric deconvolution problems, Annals of Statistics, 19.
Fan J (1992). Deconvolution with supersmooth distributions, Canadian Journal of Statistics, 20.
Girosi F (1998). An equivalence between sparse approximation and support vector machines, Neural Computation, 10.
Gunn SR (1998). Support vector machines for classification and regression, Technical report, University of Southampton.
Hall P and Qiu P (2005). Discrete-transform approach to deconvolution problems, Biometrika, 92.
Hazelton ML and Turlach BA (2009). Nonparametric density deconvolution by weighted kernel estimators, Statistics and Computing, 19.
Lee S (2010). A support vector method for the deconvolution problem, Communications of the Korean Statistical Society, 17.
Lee S (2012). A note on deconvolution estimators when measurement errors are normal, Communications of the Korean Statistical Society, 19.
Lee S and Taylor RL (2008). A note on support vector density estimation for the deconvolution problem, Communications in Statistics: Theory and Methods, 37.
Mendelsohn J and Rice R (1982). Deconvolution of microfluorometric histograms with B splines, Journal of the American Statistical Association, 77.
Mercer J (1909). Functions of positive and negative type and their connection with the theory of integral equations, Philosophical Transactions of the Royal Society of London, A 209.
Moguerza JM and Munoz A (2006). Support vector machines with applications, Statistical Science, 21.
Mukherjee S and Vapnik V (1999). Support vector method for multivariate density estimation, In Proceedings in Neural Information Processing Systems.
Pensky M and Vidakovic B (1999). Adaptive wavelet estimator for nonparametric density deconvolution, Annals of Statistics, 27.
Phillips DL (1962). A technique for the numerical solution of integral equations of the first kind, Journal of the Association for Computing Machinery, 9.
Rasmussen CE and Williams CKI (2006). Gaussian Processes for Machine Learning, MIT Press, Cambridge, MA.
Scholkopf B, Herbrich R, and Smola AJ (2001). A generalized representer theorem, Computational Learning Theory, Lecture Notes in Computer Science, 2111.
Scholkopf B and Smola AJ (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA.

Smola AJ and Scholkopf B (2003). A tutorial on support vector regression, Statistics and Computing, 14.
Stefanski L and Carroll RJ (1990). Deconvoluting kernel density estimators, Statistics, 21.
Tikhonov AN and Arsenin VY (1977). Solutions of Ill-posed Problems, W. H. Winston, Washington.
Vapnik V (1995). The Nature of Statistical Learning Theory, Springer-Verlag, New York.
Vapnik V and Chervonenkis A (1964). A note on one class of perceptrons, Automation and Remote Control, 25.
Vapnik V and Lerner L (1963). Pattern recognition using generalized portrait method, Automation and Remote Control, 24.
Vert R and Vert J (2006). Consistency and convergence rates of one-class SVMs and related algorithms, Journal of Machine Learning Research, 7.
Wahba G (1990). Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, 59, SIAM, Philadelphia.
Wahba G (2006). Comments to "Support vector machines with applications" by J. M. Moguerza and A. Munoz, Statistical Science, 21.
Wang X and Wang B (2011). Deconvolution estimation in measurement error models: The R package decon, Journal of Statistical Software, 39.
Weston J, Gammerman A, Stitson M, Vapnik V, Vovk V, and Watkins C (1999). Support vector density estimation. In Scholkopf B and Smola A (Eds), Advances in Kernel Methods-Support Vector Learning, MIT Press, Cambridge, MA.
Zhang HP (1992). On deconvolution using time of flight information in positron emission tomography, Statistica Sinica, 2.

Received November 2015; Revised January 2016; Accepted January 2016.
