Derivative reproducing properties for kernel methods in learning theory
Journal of Computational and Applied Mathematics 220 (2008)

Derivative reproducing properties for kernel methods in learning theory

Ding-Xuan Zhou
Department of Mathematics, City University of Hong Kong, Kowloon, Hong Kong, China
E-mail address: mazhou@cityu.edu.hk

Received 27 June 2007

Abstract

The regularity of functions from reproducing kernel Hilbert spaces (RKHSs) is studied in the setting of learning theory. We provide a reproducing property for partial derivatives up to order $s$ when the Mercer kernel is $C^{2s}$. For such a kernel on a general domain we show that the RKHS can be embedded into the function space $C^s$. These observations yield a representer theorem for regularized learning algorithms involving data for function values and gradients. Examples of Hermite learning and semi-supervised learning penalized by gradients on data are considered.
© 2007 Elsevier B.V. All rights reserved.

MSC: 68T05; 62J02

Keywords: Learning theory; Reproducing kernel Hilbert spaces; Derivative reproducing; Representer theorem; Hermite learning and semi-supervised learning

1. Introduction

Reproducing kernel Hilbert spaces (RKHSs) form an important class of function spaces in learning theory. Their reproducing property together with the Hilbert space structure ensures the effectiveness of many practical learning algorithms implemented in these function spaces.

Let $X$ be a separable metric space and $K : X \times X \to \mathbb{R}$ be a continuous and symmetric function such that for any finite set of points $\{x_1, \ldots, x_l\} \subset X$, the matrix $(K(x_i, x_j))_{i,j=1}^{l}$ is positive semidefinite. Such a function is called a Mercer kernel. The RKHS $\mathcal{H}_K$ associated with the kernel $K$ is defined (see [2]) to be the completion of the linear span of the set of functions $\{K_x := K(x, \cdot) : x \in X\}$ with the inner product $\langle \cdot, \cdot \rangle_K$ given by $\langle K_x, K_y \rangle_K = K(x, y)$. That is, $\langle \sum_i \alpha_i K_{x_i}, \sum_j \beta_j K_{y_j} \rangle_K = \sum_{i,j} \alpha_i \beta_j K(x_i, y_j)$. The reproducing property takes the form

$$\langle K_x, f \rangle_K = f(x) \qquad \forall x \in X,\ f \in \mathcal{H}_K. \tag{1.1}$$
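For illustration, consider the Gaussian kernel $K(x, y) = \exp(-|x - y|^2/(2\sigma^2))$ on $X = [0, 1]$, a standard example of a Mercer kernel. The following minimal sketch (in Python with NumPy; the kernel choice, seeds and names are assumptions made only for this example) builds a Gram matrix, confirms it is positive semidefinite, and checks the inner-product formula and the reproducing property (1.1) on a function from the span of kernel sections.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=0.5):
    """Mercer kernel K(x, y) = exp(-|x - y|^2 / (2 sigma^2)) on X = [0, 1]."""
    return np.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 1.0, size=6)      # points x_1, ..., x_l in X
coef = rng.normal(size=6)                # coefficients alpha_1, ..., alpha_l

# For a Mercer kernel the Gram matrix (K(x_i, x_j)) is positive semidefinite.
gram = gaussian_kernel(pts[:, None], pts[None, :])
print("smallest eigenvalue:", np.linalg.eigvalsh(gram).min())   # >= 0 up to roundoff

# f = sum_i alpha_i K_{x_i} lies in the span generating H_K, and
# <f, f>_K = alpha^T (Gram) alpha.
f = lambda t: np.sum(coef * gaussian_kernel(t, pts))
print("||f||_K^2 =", coef @ gram @ coef)

# Reproducing property (1.1): <K_x, f>_K = sum_i alpha_i K(x, x_i) = f(x).
x = 0.3
print(np.sum(coef * gaussian_kernel(x, pts)), f(x))   # the two values coincide
```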
1.1. Learning algorithms by regularization in RKHS

A large family of learning algorithms are generated by regularization schemes in RKHSs. Such a scheme can be expressed [6] in terms of a sample $z = \{(x_i, y_i)\}_{i=1}^{m} \in (X \times \mathbb{R})^m$ for learning and a loss function $V : \mathbb{R}^2 \to \mathbb{R}_+$ as

$$f_{z,\lambda} = \arg\min_{f \in \mathcal{H}_K} \Big\{ \frac{1}{m} \sum_{i=1}^{m} V(y_i, f(x_i)) + \lambda \|f\|_K^2 \Big\}, \qquad \lambda > 0. \tag{1.2}$$

The reproducing property (1.1) makes the solution of (1.2), a minimization problem over the possibly infinite dimensional space $\mathcal{H}_K$, achievable in the finite dimensional subspace spanned by $\{K_{x_i}\}_{i=1}^{m}$. This is called a representer theorem [5]. When $V$ is convex with respect to the second variable, (1.2) can be solved by a convex optimization problem for the coefficients of $f_{z,\lambda} = \sum_{i=1}^{m} c_i K_{x_i}$ over $\mathbb{R}^m$. For some special loss functions such as those in support vector machines, this convex optimization problem is actually a convex quadratic programming one, hence many efficient computing tools are available. This makes the kernel method in learning theory very powerful in various applications.

For regression problems, one often takes the loss function to be $V(y, f(x)) = \psi(y - f(x))$ with $\psi : \mathbb{R} \to \mathbb{R}_+$ a convex function. In particular, for least square regression we choose $\psi(t) = t^2$, and for support vector machine regression we choose $\psi(t) = \max\{0, |t| - \varepsilon\}$, an $\varepsilon$-insensitive loss function with some threshold $\varepsilon > 0$. For binary classification problems, $y \in \{1, -1\}$, so one usually sets $V(y, f(x)) = \phi(y f(x))$ with $\phi : \mathbb{R} \to \mathbb{R}_+$ a convex function. In particular, for least square classification, $\phi(t) = (1 - t)^2$; for support vector machine classification, we choose $\phi$ to be the hinge loss $\phi(t) = \max\{0, 1 - t\}$ or the support vector machine $q$-norm loss $\phi_q(t) = (\max\{0, 1 - t\})^q$ with $1 < q < \infty$.

1.2. Learning with gradients

For some applications, one may have gradient data or unlabelled data available for improving learning ability [3,7]. Such situations yield learning algorithms involving data for function values or their gradients. Here $X \subset \mathbb{R}^n$ and, with the $n$ variables $\{x^1, \ldots, x^n\}$ of $\mathbb{R}^n$, the gradient of a differentiable function $f : X \to \mathbb{R}$ is the vector of partial derivatives $\nabla f = (\partial f/\partial x^1, \ldots, \partial f/\partial x^n)^T$.

Example 1 (Semi-supervised learning with gradients of functions in $\mathcal{H}_K$). If $z = \{(x_i, y_i)\}_{i=1}^{m} \in (X \times \mathbb{R})^m$ are labelled data and $u = \{x_i\}_{i=m+1}^{m+l} \in X^l$ are unlabelled data, we introduce a semi-supervised learning algorithm involving gradients as

$$f_{z,u,\lambda,\mu} = \arg\min_{f \in \mathcal{H}_K} \Big\{ \frac{1}{m} \sum_{i=1}^{m} V(y_i, f(x_i)) + \frac{\mu}{m+l} \sum_{i=1}^{m+l} \|\nabla f(x_i)\|^2 + \lambda \|f\|_K^2 \Big\}. \tag{1.3}$$

Here $\lambda, \mu > 0$ are two regularization parameters.

Example 2 (Hermite learning with gradient data). Assume that, in addition to the data $z$ approximating values of a desired function $f^*$ (i.e. $y_i \approx f^*(x_i)$), we get sampling values $y' = \{y_i'\}_{i=1}^{m}$, $y_i' \in \mathbb{R}^n$, for the gradients of $f^*$ (i.e. $y_i' \approx \nabla f^*(x_i)$). Then we introduce a Hermite learning algorithm by learning the function values and gradients simultaneously as

$$f_{z,y',\lambda} = \arg\min_{f \in \mathcal{H}_K} \Big\{ \frac{1}{m} \sum_{i=1}^{m} \big[ (y_i - f(x_i))^2 + \|y_i' - \nabla f(x_i)\|^2 \big] + \lambda \|f\|_K^2 \Big\}. \tag{1.4}$$

To solve optimization problems like (1.3) and (1.4) with effective computing tools (such as those for convex quadratic programming), we study representer theorems and need reproducing properties for gradients similar to (1.1). This is the first purpose of this paper.
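To make the representer theorem for (1.2) concrete, here is a minimal sketch of the least square case: with $V(y, f(x)) = (y - f(x))^2$, writing $f_{z,\lambda} = \sum_i c_i K_{x_i}$ reduces (1.2) to the linear system $(\mathbf{K} + \lambda m I)c = y$ for the Gram matrix $\mathbf{K} = (K(x_i, x_j))$. The Gaussian kernel, the synthetic data and the parameter values below are assumptions chosen only for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=0.2):
    return np.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

# toy sample z = {(x_i, y_i)} around a target function on X = [0, 1]
rng = np.random.default_rng(1)
m = 30
x = rng.uniform(0.0, 1.0, m)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=m)

lam = 1e-3
K = gaussian_kernel(x[:, None], x[None, :])

# Representer theorem: f_{z,lambda} = sum_i c_i K_{x_i}; for the square loss,
# (1/m)|Kc - y|^2 + lam * c^T K c is minimized by solving (K + lam*m*I) c = y.
c = np.linalg.solve(K + lam * m * np.eye(m), y)

def f_hat(t):
    return gaussian_kernel(t, x) @ c

print(f_hat(0.25), np.sin(2 * np.pi * 0.25))   # fitted value vs. noise-free target
```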
1.3. Capacity of RKHS

The learning ability of algorithms in $\mathcal{H}_K$ depends on the kernel, a loss function measuring errors, and the probability distributions from which the samples are drawn [1,11]. Its quantitative estimates in terms of $\mathcal{H}_K$ rely on two features of the RKHS: the approximation power and the capacity [2,5,6,7]. The latter can be measured by covering numbers of the unit ball of the RKHS as a subset of $C(X)$. These covering numbers have been extensively studied in learning theory, see e.g. [1,19,18]. In particular, when $X = [0, 1]^n$ and $K$ is $C^s$ with $s$ not being an even integer, an explicit bound for the covering numbers was presented in [19]. This was done by showing that $\mathcal{H}_K$ can be embedded in $C^{s/2}(X)$, a Hölder space on $X$. The embedding result yields error estimates in the $C^{s/2}$ metric by means of bounds in the $\mathcal{H}_K$ metric for learning algorithms. For example, for the least square regularized regression algorithm (1.2) with $V(y, f(x)) = (f(x) - y)^2$, when $K \in C^{2+\varepsilon}(X \times X)$ with some $\varepsilon > 0$, rates of the error $\|f_{z,\lambda} - f_\rho\|_{C(X)}$ for learning the regression function $f_\rho$ were provided in [4] from bounds for $\|f_{z,\lambda} - f_\rho\|_K$. A natural question is whether the extra $\varepsilon > 0$ can be omitted. The general problem for $K \in C^s$ is whether $\mathcal{H}_K$ can be embedded in $C^{s/2}(X)$ when $X$ is a general domain in $\mathbb{R}^n$ or when $s$ is an even integer. Solving this general question is the second purpose of this paper. The main difficulty we overcome here is the lack of regularity of the general domain.

2. Reproducing partial derivatives in RKHS

To allow a general situation, we do not assume any regularity for the boundary of $X$. To consider partial derivatives, we assume that the interior of $X$ is nonempty. For $s \in \mathbb{Z}_+$, we denote an index set $I_s := \{\alpha \in \mathbb{Z}_+^n : |\alpha| \le s\}$, where $|\alpha| = \sum_{j=1}^{n} \alpha_j$ for $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{Z}_+^n$. For a function $f$ of $n$ variables and $x = (x^1, \ldots, x^n) \in \mathbb{R}^n$, we denote its partial derivative $D^\alpha f$ at $x$ (if it exists) as

$$D^\alpha f(x) = D_1^{\alpha_1} \cdots D_n^{\alpha_n} f(x) = \frac{\partial^{|\alpha|} f(x)}{(\partial x^1)^{\alpha_1} \cdots (\partial x^n)^{\alpha_n}}.$$

Definition (Ziemer [21]). Let $X$ be a compact subset of $\mathbb{R}^n$ which is the closure of its nonempty interior $X^o$. Define $C^s(X^o)$ to be the space of functions $f$ on $X^o$ such that $D^\alpha f$ is well-defined and continuous on $X^o$ for each $\alpha \in I_s$. Define $C^s(X)$ to be the space of continuous functions $f$ on $X$ such that $f|_{X^o} \in C^s(X^o)$ and, for each $\alpha \in I_s$, $D^\alpha(f|_{X^o})$ has a continuous extension to $X$ denoted as $D^\alpha f$. In particular, the extension of $f|_{X^o}$ is $f$ itself.

The linear space $C^s(X)$ is actually a Banach space with the norm

$$\|f\|_{C^s(X)} = \sum_{\alpha \in I_s} \|D^\alpha f\|_\infty = \sum_{\alpha \in I_s} \sup_{x \in X^o} |D^\alpha(f|_{X^o})(x)|.$$

Observe that for $f \in C^1(X)$, the gradient $\nabla f$ equals $(D^{e_1} f, \ldots, D^{e_n} f)$, where $e_j$ is the $j$th standard unit vector in $\mathbb{R}^n$.

The property of reproducing partial derivatives of functions in $\mathcal{H}_K$ is given by partial derivatives of the Mercer kernel. If $K \in C^{2s}(X \times X)$, for $\alpha \in I_s$ we extend $\alpha$ to $\mathbb{Z}_+^{2n}$ by adding zeros to the last $n$ components and denote the corresponding partial derivative of $K$ as $D^\alpha K$. That is,

$$D^\alpha K(x, y) = \frac{\partial^{|\alpha|} K}{(\partial x^1)^{\alpha_1} \cdots (\partial x^n)^{\alpha_n}}(x^1, \ldots, x^n, y^1, \ldots, y^n), \qquad x, y \in X^o,$$

and $D^\alpha K$ is a continuous extension of $D^\alpha(K|_{X^o \times X^o})$ to $X \times X$. For $x \in X$, denote $(D^\alpha K)_x$ as the function on $X$ given by $(D^\alpha K)_x(y) = D^\alpha K(x, y)$. By the symmetry of $K$, we have

$$D^\alpha(K_y)(x) = (D^\alpha K)_x(y) = D^\alpha K(x, y), \qquad x, y \in X. \tag{2.1}$$

Now we can give the result on reproducing partial derivatives and embedding of $\mathcal{H}_K$ into $C^s(X)$. Here (1.1) and the weak compactness of a closed ball of a Hilbert space play an important role.

Theorem 1. Let $s \in \mathbb{N}$ and $K : X \times X \to \mathbb{R}$ be a Mercer kernel such that $K \in C^{2s}(X \times X)$. Then the following statements hold:

(a) For any $x \in X$ and $\alpha \in I_s$, $(D^\alpha K)_x \in \mathcal{H}_K$.

(b) A partial derivative reproducing property holds true for $\alpha \in I_s$:
$$D^\alpha f(x) = \langle (D^\alpha K)_x, f \rangle_K, \qquad \forall x \in X,\ f \in \mathcal{H}_K. \tag{2.2}$$

(c) The inclusion $J : \mathcal{H}_K \to C^s(X)$ is well-defined and bounded:
$$\|f\|_{C^s(X)} \le n^s \|K\|_{C^{2s}(X \times X)}^{1/2} \|f\|_K, \qquad \forall f \in \mathcal{H}_K. \tag{2.3}$$

(d) $J$ is compact. More strongly, for any closed bounded subset $B$ of $\mathcal{H}_K$, $J(B)$ is a compact subset of $C^s(X)$.
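Before turning to the proof, the reproducing identity (2.2) can be checked numerically on functions from the span of kernel sections. The sketch below is a hypothetical one-dimensional example with a Gaussian kernel (the derivative formulas and all names are assumptions for this illustration): it compares $\sum_i c_i D^{e_1}K(x, x_i)$, which is what (2.2) gives for $f = \sum_i c_i K_{x_i}$, with a finite-difference approximation of $f'(x)$, and evaluates the identity $\langle (D^{e_1}K)_x, (D^{e_1}K)_x \rangle_K = D^{(e_1,e_1)}K(x, x)$ that appears at the start of the proof.

```python
import numpy as np

sigma = 0.5
K   = lambda x, y: np.exp(-(x - y) ** 2 / (2 * sigma ** 2))
DK  = lambda x, y: -(x - y) / sigma ** 2 * K(x, y)                         # d/dx K(x, y)
DDK = lambda x, y: (1 / sigma ** 2 - (x - y) ** 2 / sigma ** 4) * K(x, y)  # d^2/(dx dy) K(x, y)

rng = np.random.default_rng(2)
pts = rng.uniform(0, 1, 5)
c = rng.normal(size=5)
f = lambda t: np.sum(c * K(t, pts))          # f = sum_i c_i K_{x_i} in H_K

x0, h = 0.4, 1e-6
# (2.2) with alpha = e_1: f'(x0) = <(DK)_{x0}, f>_K = sum_i c_i DK(x0, x_i)
kernel_side = np.sum(c * DK(x0, pts))
finite_diff = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(kernel_side, finite_diff)              # the two values agree

# ||(DK)_x||_K^2 = D^{(e1,e1)}K(x, x); for the Gaussian kernel this is 1/sigma^2
print(DDK(x0, x0), 1 / sigma ** 2)
```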
Proof. We first prove (a) and (b) together by induction on $|\alpha| = 0, 1, \ldots, s$. The case $|\alpha| = 0$ is trivial since then $\alpha = 0$ and $(D^0 K)_x = K_x$ satisfies (1.1).

Let $0 \le l < s$. Suppose that $(D^\alpha K)_x \in \mathcal{H}_K$ and (2.2) holds for any $x \in X$ and $\alpha \in \mathbb{Z}_+^n$ with $|\alpha| = l$. Then (2.2) implies that for any $y \in X$,

$$\langle (D^\alpha K)_y, (D^\alpha K)_x \rangle_K = D^\alpha\big((D^\alpha K)_x\big)(y) = D^\alpha\big(D^\alpha K(x, \cdot)\big)(y) = D^{(\alpha,\alpha)} K(x, y). \tag{2.4}$$

Here $(\alpha, \alpha) \in \mathbb{Z}_+^{2n}$ is formed by $\alpha$ in the first and second sets of $n$ components.

Now we turn to the case $l + 1$. Consider the index $\alpha + e_j$ with $|\alpha + e_j| = l + 1$. We prove (a) and (b) for this index in four steps.

Step 1: Proving $(D^{\alpha+e_j} K)_x \in \mathcal{H}_K$ for $x \in X^o$. Since $x \in X^o$, there exists some $r > 0$ such that $x + \{y \in \mathbb{R}^n : |y| \le r\} \subseteq X^o$. Then by (2.4), the set $\{\tfrac{1}{t}((D^\alpha K)_{x+te_j} - (D^\alpha K)_x) : |t| \le r\}$ of functions in $\mathcal{H}_K$ satisfies

$$\Big\|\frac{1}{t}\big((D^\alpha K)_{x+te_j} - (D^\alpha K)_x\big)\Big\|_K^2 = \frac{1}{t^2}\Big\{D^{(\alpha,\alpha)}K(x+te_j, x+te_j) - D^{(\alpha,\alpha)}K(x+te_j, x) - D^{(\alpha,\alpha)}K(x, x+te_j) + D^{(\alpha,\alpha)}K(x, x)\Big\} \le \big\|D^{(\alpha+e_j,\alpha+e_j)}K\big\|_\infty, \qquad |t| \le r.$$

Here we have used the assumption $K \in C^{2s}(X \times X)$ and $|(\alpha+e_j, \alpha+e_j)| = 2|\alpha| + 2 = 2l + 2 \le 2s$. That means $\{\tfrac{1}{t}((D^\alpha K)_{x+te_j} - (D^\alpha K)_x) : |t| \le r\}$ lies in the closed ball of the Hilbert space $\mathcal{H}_K$ with the finite radius $\|D^{(\alpha+e_j,\alpha+e_j)}K\|_\infty^{1/2}$. Since this ball is weakly compact, there is a sequence $\{t_i\}_{i=1}^{\infty}$ with $|t_i| \le r$ and $\lim_{i\to\infty} t_i = 0$ such that $\{\tfrac{1}{t_i}((D^\alpha K)_{x+t_i e_j} - (D^\alpha K)_x)\}$ converges weakly to an element $g_x$ of $\mathcal{H}_K$ as $i \to \infty$. The weak convergence tells us that

$$\lim_{i\to\infty} \Big\langle \frac{1}{t_i}\big((D^\alpha K)_{x+t_i e_j} - (D^\alpha K)_x\big), f \Big\rangle_K = \langle g_x, f \rangle_K, \qquad \forall f \in \mathcal{H}_K. \tag{2.5}$$

In particular, by taking $f = K_y$ with $y \in X$, there holds

$$g_x(y) = \lim_{i\to\infty} \Big\langle \frac{1}{t_i}\big((D^\alpha K)_{x+t_i e_j} - (D^\alpha K)_x\big), K_y \Big\rangle_K.$$

By (2.2) for $\alpha$ and (2.1) we have

$$g_x(y) = \lim_{i\to\infty} \frac{1}{t_i}\big(D^\alpha(K_y)(x + t_i e_j) - D^\alpha(K_y)(x)\big) = \lim_{i\to\infty} \frac{1}{t_i}\big(D^\alpha K(x + t_i e_j, y) - D^\alpha K(x, y)\big) = D^{\alpha+e_j} K(x, y) = (D^{\alpha+e_j} K)_x(y).$$

This is true for an arbitrary point $y \in X$. Hence $(D^{\alpha+e_j} K)_x = g_x$ as functions on $X$. Since $g_x \in \mathcal{H}_K$, we know $(D^{\alpha+e_j} K)_x \in \mathcal{H}_K$.

Step 2: Proving for $x \in X^o$ the convergence

$$\frac{1}{t}\big((D^\alpha K)_{x+te_j} - (D^\alpha K)_x\big) \to (D^{\alpha+e_j} K)_x \quad \text{in } \mathcal{H}_K \ (t \to 0). \tag{2.6}$$
Applying (2.5) and (2.2) for $\alpha$ to the function $(D^{\alpha+e_j} K)_x \in \mathcal{H}_K$ yields

$$\big\langle (D^{\alpha+e_j} K)_x, (D^{\alpha+e_j} K)_x \big\rangle_K = \lim_{i\to\infty} \frac{1}{t_i}\Big\{D^\alpha\big((D^{\alpha+e_j} K)_x\big)(x + t_i e_j) - D^\alpha\big((D^{\alpha+e_j} K)_x\big)(x)\Big\} = \lim_{i\to\infty} \frac{1}{t_i}\Big\{D^\alpha\big(D^{\alpha+e_j} K(x, \cdot)\big)(x + t_i e_j) - D^\alpha\big(D^{\alpha+e_j} K(x, \cdot)\big)(x)\Big\} = D^{(\alpha+e_j,\alpha+e_j)} K(x, x).$$

This in connection with (2.2) implies

$$\Big\|\frac{1}{t}\big((D^\alpha K)_{x+te_j} - (D^\alpha K)_x\big) - (D^{\alpha+e_j} K)_x\Big\|_K^2 = \frac{1}{t^2}\Big\{D^{(\alpha,\alpha)}K(x+te_j, x+te_j) - 2 D^{(\alpha,\alpha)}K(x+te_j, x) + D^{(\alpha,\alpha)}K(x, x)\Big\} - \frac{2}{t}\Big\{D^\alpha\big((D^{\alpha+e_j} K)_x\big)(x+te_j) - D^\alpha\big((D^{\alpha+e_j} K)_x\big)(x)\Big\} + D^{(\alpha+e_j,\alpha+e_j)} K(x, x)$$
$$= \frac{1}{t^2}\int_0^t\!\!\int_0^t D^{(\alpha+e_j,\alpha+e_j)} K(x+ue_j, x+ve_j)\, du\, dv - \frac{2}{t}\int_0^t D^{(\alpha+e_j,\alpha+e_j)} K(x, x+ve_j)\, dv + D^{(\alpha+e_j,\alpha+e_j)} K(x, x)$$
$$= \frac{1}{t^2}\int_0^t\!\!\int_0^t \Big\{D^{(\alpha+e_j,\alpha+e_j)} K(x+ue_j, x+ve_j) - 2 D^{(\alpha+e_j,\alpha+e_j)} K(x, x+ve_j) + D^{(\alpha+e_j,\alpha+e_j)} K(x, x)\Big\}\, du\, dv.$$

If we define the modulus of continuity for a function $g \in C(X \times X)$ to be a function of $\delta \in (0, \infty)$ as

$$\omega(g, \delta) := \sup\{|g(x_1, y_1) - g(x_2, y_2)| : x_i, y_i \in X \text{ with } |x_1 - x_2| \le \delta,\ |y_1 - y_2| \le \delta\}, \tag{2.7}$$

we know from the uniform continuity of $g$ that $\lim_{\delta \to 0+} \omega(g, \delta) = 0$. Moreover, the function $\omega(g, \delta)$ is continuous on $(0, \infty)$. Using the modulus of continuity for the function $D^{(\alpha+e_j,\alpha+e_j)} K \in C(X \times X)$, we see that

$$\Big\|\frac{1}{t}\big((D^\alpha K)_{x+te_j} - (D^\alpha K)_x\big) - (D^{\alpha+e_j} K)_x\Big\|_K^2 \le 2\,\omega\big(D^{(\alpha+e_j,\alpha+e_j)} K, |t|\big). \tag{2.8}$$

This converges to zero as $t \to 0$. Therefore (2.6) holds true.
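The convergence (2.6) and the bound (2.8) established in Step 2 can be made tangible for a concrete kernel: the squared $\mathcal{H}_K$-norm of the difference quotient minus the derivative section expands entirely into kernel derivative values, as in the display above, so it can be evaluated exactly. The sketch below is a hypothetical one-dimensional Gaussian example with $\alpha = 0$ (names and values are assumptions for illustration) showing this quantity shrinking as $t \to 0$.

```python
import numpy as np

sigma = 0.5
K   = lambda x, y: np.exp(-(x - y) ** 2 / (2 * sigma ** 2))
DK  = lambda x, y: -(x - y) / sigma ** 2 * K(x, y)                         # d/dx K(x, y)
DDK = lambda x, y: (1 / sigma ** 2 - (x - y) ** 2 / sigma ** 4) * K(x, y)  # d^2/(dx dy) K(x, y)

x = 0.3
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    # ||(1/t)(K_{x+t} - K_x) - (DK)_x||_K^2, expanded with the kernel identities
    # <K_a, K_b>_K = K(a, b), <K_a, (DK)_x>_K = DK(x, a), ||(DK)_x||_K^2 = DDK(x, x):
    norm_sq = ((K(x + t, x + t) - 2 * K(x + t, x) + K(x, x)) / t ** 2
               - 2 / t * (DK(x, x + t) - DK(x, x))
               + DDK(x, x))
    print(t, norm_sq)   # decreases towards 0, consistent with (2.6) and (2.8)
```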
Step 3: Proving (2.2) for $x \in X^o$ and $\alpha + e_j$. Let $f \in \mathcal{H}_K$. By (2.6) we have

$$\big\langle (D^{\alpha+e_j} K)_x, f \big\rangle_K = \lim_{t\to 0} \Big\langle \frac{1}{t}\big((D^\alpha K)_{x+te_j} - (D^\alpha K)_x\big), f \Big\rangle_K.$$

By (2.2) for $\alpha$, we see that this equals

$$\big\langle (D^{\alpha+e_j} K)_x, f \big\rangle_K = \lim_{t\to 0} \frac{1}{t}\big\{D^\alpha f(x + te_j) - D^\alpha f(x)\big\}.$$

That is, $D^{\alpha+e_j} f(x)$ exists and equals $\langle (D^{\alpha+e_j} K)_x, f \rangle_K$. This verifies (2.2) for $\alpha + e_j$.

Step 4: Proving (a) and (b) for $x \in \partial X := X \setminus X^o$. Notice from the first three steps that for $x, \tilde{x} \in X^o$, there holds

$$\big\|(D^{\alpha+e_j} K)_x - (D^{\alpha+e_j} K)_{\tilde{x}}\big\|_K^2 = \Big\{D^{(\alpha+e_j,\alpha+e_j)}K(x, x) - D^{(\alpha+e_j,\alpha+e_j)}K(x, \tilde{x}) - D^{(\alpha+e_j,\alpha+e_j)}K(\tilde{x}, x) + D^{(\alpha+e_j,\alpha+e_j)}K(\tilde{x}, \tilde{x})\Big\} \le 2\,\omega\big(D^{(\alpha+e_j,\alpha+e_j)} K, |x - \tilde{x}|\big).$$

It follows that for any sequence $\{x^{(i)}\}_{i=1}^{\infty} \subset X^o$ converging to $x$, the sequence of functions $\{(D^{\alpha+e_j} K)_{x^{(i)}}\}$ is a Cauchy sequence in the Hilbert space $\mathcal{H}_K$. So it converges to a limit function $h \in \mathcal{H}_K$. Applying what we have proved for $x^{(i)} \in X^o$, we get

$$h(y) = \langle h, K_y \rangle_K = \lim_{i\to\infty} \big\langle (D^{\alpha+e_j} K)_{x^{(i)}}, K_y \big\rangle_K = (D^{\alpha+e_j} K)_x(y), \qquad \forall y \in X.$$

This verifies $(D^{\alpha+e_j} K)_x = h \in \mathcal{H}_K$.

Let $f \in \mathcal{H}_K$. We define a function $f^{[j]}$ on $X$ as $f^{[j]}(x) = \langle (D^{\alpha+e_j} K)_x, f \rangle_K$, $x \in X$. By the conclusion in Step 3, we know that $f^{[j]}(x) = D^{\alpha+e_j} f(x)$ for $x \in X^o$, hence $f^{[j]}$ is continuous on $X^o$. Let us now prove the continuity of $f^{[j]}$ at each $x \in \partial X$. If $\{x^{(i)}\}_{i=1}^{\infty} \subset X^o$ is a sequence satisfying $\lim_{i\to\infty} x^{(i)} = x$, then the above proof tells us that $(D^{\alpha+e_j} K)_{x^{(i)}}$ converges to $(D^{\alpha+e_j} K)_x$ in the $\mathcal{H}_K$ metric, meaning that $\lim_{i\to\infty} \|(D^{\alpha+e_j} K)_{x^{(i)}} - (D^{\alpha+e_j} K)_x\|_K = 0$. So by the definition of $f^{[j]}$ and the Schwarz inequality, we have

$$|f^{[j]}(x^{(i)}) - f^{[j]}(x)| = \big|\big\langle (D^{\alpha+e_j} K)_{x^{(i)}} - (D^{\alpha+e_j} K)_x, f \big\rangle_K\big| \le \big\|(D^{\alpha+e_j} K)_{x^{(i)}} - (D^{\alpha+e_j} K)_x\big\|_K \|f\|_K \to 0 \quad \text{as } i \to \infty.$$

Thus the function $f^{[j]}$ is continuous on $X$, and it is a continuous extension of $D^{\alpha+e_j} f$ from $X^o$ onto $X$. So (b) holds true for $x \in \partial X$. This completes the induction procedure for proving the statements in (a) and (b).

(c) We use (2.2) and (2.4). For $f \in \mathcal{H}_K$, $x, \tilde{x} \in X$ and $\alpha \in I_s$, the Schwarz inequality implies

$$|D^\alpha f(x) - D^\alpha f(\tilde{x})| = \big|\big\langle (D^\alpha K)_x - (D^\alpha K)_{\tilde{x}}, f \big\rangle_K\big| \le \big\|(D^\alpha K)_x - (D^\alpha K)_{\tilde{x}}\big\|_K \|f\|_K.$$

Hence

$$|D^\alpha f(x) - D^\alpha f(\tilde{x})| \le \Big\{D^{(\alpha,\alpha)}K(x, x) - 2 D^{(\alpha,\alpha)}K(x, \tilde{x}) + D^{(\alpha,\alpha)}K(\tilde{x}, \tilde{x})\Big\}^{1/2} \|f\|_K \le \sqrt{2\,\omega\big(D^{(\alpha,\alpha)}K, |x - \tilde{x}|\big)}\, \|f\|_K, \qquad \forall x, \tilde{x} \in X. \tag{2.9}$$

As $\lim_{\delta\to 0+} \omega(D^{(\alpha,\alpha)}K, \delta) = 0$, we know that $D^\alpha f \in C(X)$. It means $f \in C^s(X)$ and the inclusion $J$ is well-defined. To see the boundedness, we apply the Schwarz inequality again and have

$$|D^\alpha f(x)| = |\langle (D^\alpha K)_x, f \rangle_K| \le \sqrt{D^{(\alpha,\alpha)}K(x, x)}\, \|f\|_K \le \big\|D^{(\alpha,\alpha)}K\big\|_\infty^{1/2} \|f\|_K.$$

It follows that

$$\|f\|_{C^s(X)} = \sum_{\alpha \in I_s} \|D^\alpha f\|_\infty \le \sum_{\alpha \in I_s} \big\|D^{(\alpha,\alpha)}K\big\|_\infty^{1/2} \|f\|_K \le n^s \Big(\sum_{\alpha \in I_s} \big\|D^{(\alpha,\alpha)}K\big\|_\infty\Big)^{1/2} \|f\|_K.$$

Then (2.3) is verified.

(d) If $B$ is a closed bounded subset of $\mathcal{H}_K$, there is some $R > 0$ such that $B \subseteq \{f \in \mathcal{H}_K : \|f\|_K \le R\}$. To show that $J(B)$ is compact, let $\{f_j\}_{j=1}^{\infty}$ be a sequence in $B$. The estimate (2.9) tells us that for each $\alpha \in I_s$ and $j \in \mathbb{N}$,

$$|D^\alpha f_j(x) - D^\alpha f_j(\tilde{x})| \le \sqrt{2\,\omega\big(D^{(\alpha,\alpha)}K, |x - \tilde{x}|\big)}\, R, \qquad \forall x, \tilde{x} \in X.$$

It says that the sequence of functions $\{D^\alpha f_j\}_{j=1}^{\infty}$ is equicontinuous; it is also uniformly bounded, since $\|D^\alpha f_j\|_\infty \le \|D^{(\alpha,\alpha)}K\|_\infty^{1/2} R$ by the estimate in the proof of (c). This is true for each $\alpha \in I_s$. So by taking subsequences for $\alpha$ (one after another), we know that there is a subsequence $\{f_{j_l}\}_{l=1}^{\infty}$ which converges to a function $f \in C^s(X)$ in the metric of $C^s(X)$. Observe that $\{f_{j_l}\}_{l=1}^{\infty}$ lies in the ball of $\mathcal{H}_K$ with radius $R$, which is weakly compact, so it contains a subsequence $\{f_{j_{l_k}}\}_{k=1}^{\infty}$ which is also a subsequence of $\{f_j\}_{j=1}^{\infty}$ and converges weakly in $\mathcal{H}_K$ to a function $\tilde{f} \in \mathcal{H}_K$. According to (2.2), the weak convergence in $\mathcal{H}_K$ tells us that $\{f_{j_{l_k}}\}_{k=1}^{\infty}$ converges to $\tilde{f}$ pointwise on $X$, so the $C^s(X)$ limit satisfies $f = \tilde{f} \in \mathcal{H}_K$, and $\{f_j\}_{j=1}^{\infty} \subset J(B)$ contains a subsequence which converges in $C^s(X)$ to $f$. This proves that $J(B)$ is compact. The proof of Theorem 1 is complete. □
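The pointwise bounds used in the proof of part (c), namely $|f(x)| \le \sqrt{K(x,x)}\,\|f\|_K$ and $|D^{e_1} f(x)| \le \sqrt{D^{(e_1,e_1)}K(x,x)}\,\|f\|_K$, are easy to confirm numerically for functions in the span of kernel sections. The sketch below is again a hypothetical one-dimensional Gaussian example (the grid, seed and names are assumptions for illustration); it compares the sup norms of $f$ and $f'$ on a fine grid with these bounds.

```python
import numpy as np

sigma = 0.5
K   = lambda x, y: np.exp(-(x - y) ** 2 / (2 * sigma ** 2))
DK  = lambda x, y: -(x - y) / sigma ** 2 * K(x, y)                         # d/dx K(x, y)
DDK = lambda x, y: (1 / sigma ** 2 - (x - y) ** 2 / sigma ** 4) * K(x, y)  # d^2/(dx dy) K(x, y)

rng = np.random.default_rng(3)
pts = rng.uniform(0, 1, 8)
c = rng.normal(size=8)
gram = K(pts[:, None], pts[None, :])
norm_K = np.sqrt(c @ gram @ c)                    # ||f||_K for f = sum_i c_i K_{x_i}

grid = np.linspace(0, 1, 2001)
f_vals = K(grid[:, None], pts[None, :]) @ c       # f on a fine grid
df_vals = DK(grid[:, None], pts[None, :]) @ c     # f' on the same grid

# |f(x)|  <= sqrt(K(x, x))   * ||f||_K = ||f||_K          (alpha = 0)
# |f'(x)| <= sqrt(DDK(x, x)) * ||f||_K = ||f||_K / sigma  (alpha = e_1)
print(np.abs(f_vals).max(), "<=", norm_K)
print(np.abs(df_vals).max(), "<=", norm_K / sigma)
```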
Theorem 1 can be extended to other kernels [10]. Relation (2.3) tells us that error bounds in the $\|\cdot\|_K$ norm can be used to estimate convergence rates of learning algorithms in the norm of $C^s(X)$, as done in [4].

3. Representer theorems for learning with derivative data

A general learning algorithm of regularization in $\mathcal{H}_K$ involving partial derivative data takes the form

$$f_{x,\bar{y},\lambda} = \arg\min_{f \in \mathcal{H}_K} \Big\{ \sum_{i=1}^{m} V_i\big(\bar{y}_i, \{D^\alpha f(x_i)\}_{\alpha \in J_i}\big) + \lambda \|f\|_K^2 \Big\}, \tag{3.1}$$

where for each $i \in \{1, \ldots, m\}$, $x_i \in X$, $\bar{y}_i$ is a vector, $J_i$ is a subset of $I_s$, and $V_i$ is a loss function with values in $\mathbb{R}_+$ of compatible variables. Denote the number of elements in the set $J_i$ as $\#(J_i)$.

The partial derivative reproducing property (2.2) stated in Theorem 1 enables us to derive a representer theorem for the learning algorithm (3.1), which asserts that the minimization over the possibly infinite dimensional space $\mathcal{H}_K$ can be achieved in a finite dimensional subspace generated by $\{K_{x_i}\}$ and their partial derivatives.

Theorem 2. Let $s \in \mathbb{N}$ and $K : X \times X \to \mathbb{R}$ be a Mercer kernel such that $K \in C^{2s}(X \times X)$. If $\lambda > 0$, then the solution $f_{x,\bar{y},\lambda}$ of scheme (3.1) exists and lies in the subspace spanned by $\{(D^\alpha K)_{x_i} : \alpha \in J_i,\ i = 1, \ldots, m\}$. If we write $f_{x,\bar{y},\lambda} = \sum_{i=1}^{m} \sum_{\alpha \in J_i} c_{i,\alpha} (D^\alpha K)_{x_i}$ with $c = (c_{i,\alpha})_{\alpha \in J_i,\ i=1,\ldots,m} \in \mathbb{R}^N$, where $N = \sum_{i=1}^{m} \#(J_i)$, then

$$c = \arg\min_{c \in \mathbb{R}^N} \Bigg\{ \sum_{i=1}^{m} V_i\Big(\bar{y}_i, \Big\{\sum_{j=1}^{m} \sum_{\beta \in J_j} c_{j,\beta}\, D^{(\beta,\alpha)} K(x_j, x_i)\Big\}_{\alpha \in J_i}\Big) + \lambda \sum_{i=1}^{m} \sum_{\alpha \in J_i} \sum_{j=1}^{m} \sum_{\beta \in J_j} c_{i,\alpha} c_{j,\beta}\, D^{(\beta,\alpha)} K(x_j, x_i) \Bigg\}.$$

Proof. By Theorem 1, we know that for any $\alpha$ in $J_i$, the function $(D^\alpha K)_{x_i}$ lies in $\mathcal{H}_K$. Denote the subspace of $\mathcal{H}_K$ spanned by $\{(D^\alpha K)_{x_i} : \alpha \in J_i,\ i = 1, \ldots, m\}$ as $\mathcal{H}_{K,x}$. Let $P$ be the orthogonal projection onto this subspace. Then for any $f \in \mathcal{H}_K$, the function $f - P(f)$ is orthogonal to $\mathcal{H}_{K,x}$. In particular, $\langle f - P(f), (D^\alpha K)_{x_i} \rangle_K = 0$ for any $\alpha \in J_i$ and $1 \le i \le m$. This in connection with the partial derivative reproducing property (2.2) tells us that

$$D^\alpha(f - P(f))(x_i) = D^\alpha f(x_i) - D^\alpha(P(f))(x_i) = 0, \qquad \forall \alpha \in J_i,\ i = 1, \ldots, m.$$

Thus, if we denote $\mathcal{E}_z(f) = \sum_{i=1}^{m} V_i(\bar{y}_i, \{D^\alpha f(x_i)\}_{\alpha \in J_i})$, we see that $\mathcal{E}_z(f) = \mathcal{E}_z(P(f))$. Notice that $\|P(f)\|_K \le \|f\|_K$ and the strict inequality holds unless $f = P(f)$, i.e., $f \in \mathcal{H}_{K,x}$. Therefore,

$$\min_{f \in \mathcal{H}_K} \{\mathcal{E}_z(f) + \lambda \|f\|_K^2\} = \min_{f \in \mathcal{H}_{K,x}} \{\mathcal{E}_z(f) + \lambda \|f\|_K^2\},$$

and a minimizer $f_{x,\bar{y},\lambda}$ exists and lies in $\mathcal{H}_{K,x}$ since the subspace is finite dimensional. The second statement is trivial. □

We shall not discuss learning rates of the learning algorithms (1.3) and (1.4) here. Though rough estimates can be given using methods from [4,8,3], a satisfactory error analysis for more general learning algorithms [9,20] will be done later.
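To see how Theorem 2 turns an algorithm such as the Hermite scheme (1.4) into a finite linear-algebra problem, take one input dimension, $J_i = \{0, e_1\}$ for every sample point, and square losses. The expansion coefficients then solve a linear system built from the blocks $K$, $D^{(e_1,0)}K$, $D^{(0,e_1)}K$ and $D^{(e_1,e_1)}K$. The sketch below is a hypothetical worked example under these assumptions (Gaussian kernel, synthetic data, illustrative parameter values and names).

```python
import numpy as np

sigma, lam = 0.2, 1e-4
K    = lambda u, v: np.exp(-(u - v) ** 2 / (2 * sigma ** 2))
D10K = lambda u, v: -(u - v) / sigma ** 2 * K(u, v)                         # d/du K(u, v)
D01K = lambda u, v:  (u - v) / sigma ** 2 * K(u, v)                         # d/dv K(u, v)
D11K = lambda u, v: (1 / sigma ** 2 - (u - v) ** 2 / sigma ** 4) * K(u, v)  # d^2/(du dv) K(u, v)

# Hermite data: noisy function values and derivative values on [0, 1]
rng = np.random.default_rng(4)
m = 15
x = np.sort(rng.uniform(0, 1, m))
y  = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=m)
yp = 2 * np.pi * np.cos(2 * np.pi * x) + 0.05 * rng.normal(size=m)

# Theorem 2 with J_i = {0, e_1}: f = sum_j a_j K_{x_j} + b_j (D^{e_1} K)_{x_j}.
# The Gram matrix has entries D^{(beta,alpha)} K(x_j, x_i), arranged in blocks:
XI, XJ = x[:, None], x[None, :]
G = np.block([[K(XJ, XI),    D10K(XJ, XI)],
              [D01K(XJ, XI), D11K(XJ, XI)]])
d = np.concatenate([y, yp])

# For the square losses of (1.4), minimizing (1/m)|G c - d|^2 + lam * c^T G c
# leads to the linear system (G + lam * m * I) c = d.
c = np.linalg.solve(G + lam * m * np.eye(2 * m), d)
a, b = c[:m], c[m:]

def f_hat(t):
    return K(x, t) @ a + D10K(x, t) @ b

print(f_hat(0.3), np.sin(2 * np.pi * 0.3))   # fitted value vs. noise-free target
```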
Acknowledgements

The work described in this paper was supported by the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 103704) and a Joint Research Fund for Hong Kong and Macau Young Scholars from the National Science Fund for Distinguished Young Scholars (Project No. ).

References

[1] M. Anthony, P.L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge University Press, Cambridge, 1999.
[2] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc. 68 (1950).
[3] M. Belkin, P. Niyogi, Semi-supervised learning on Riemannian manifolds, Mach. Learn. 56 (2004).
[4] E. De Vito, A. Caponnetto, L. Rosasco, Model selection for regularized least-squares algorithm in learning theory, Found. Comput. Math. 5 (2005).
[5] E. De Vito, L. Rosasco, A. Caponnetto, M. Piana, A. Verri, Some properties of regularized kernel methods, J. Mach. Learn. Res. 5 (2004).
[6] T. Evgeniou, M. Pontil, T. Poggio, Regularization networks and support vector machines, Adv. Comput. Math. 13 (2000) 1–50.
[7] D. Hardin, I. Tsamardinos, C.F. Aliferis, A theoretical characterization of linear SVM-based feature selection, in: Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004.
[8] S. Mukherjee, P. Niyogi, T. Poggio, R. Rifkin, Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization, Adv. Comput. Math. 25 (2006).
[9] S. Mukherjee, Q. Wu, Estimation of gradients and coordinate covariation in classification, J. Mach. Learn. Res. 7 (2006).
[10] R. Schaback, J. Werner, Linearly constrained reconstruction of functions by kernels with applications to machine learning, Adv. Comput. Math. 25 (2006).
[11] J. Shawe-Taylor, P.L. Bartlett, R.C. Williamson, M. Anthony, Structural risk minimization over data-dependent hierarchies, IEEE Trans. Inform. Theory 44 (1998).
[12] S. Smale, D.X. Zhou, Estimating the approximation error in learning theory, Anal. Appl. 1 (2003) 17–41.
[13] S. Smale, D.X. Zhou, Shannon sampling and function reconstruction from point values, Bull. Amer. Math. Soc. 41 (2004).
[14] S. Smale, D.X. Zhou, Learning theory estimates via integral operators and their approximations, Constr. Approx. 26 (2007).
[15] G. Wahba, Spline Models for Observational Data, SIAM, Philadelphia, PA, 1990.
[16] Y. Yao, On complexity issues of online learning algorithms, IEEE Trans. Inform. Theory, to appear.
[17] Y. Ying, Convergence analysis of online algorithms, Adv. Comput. Math. 27 (2007).
[18] D.X. Zhou, The covering number in learning theory, J. Complexity 18 (2002).
[19] D.X. Zhou, Capacity of reproducing kernel spaces in learning theory, IEEE Trans. Inform. Theory 49 (2003).
[20] D.X. Zhou, K. Jetter, Approximation with polynomial kernels and SVM classifiers, Adv. Comput. Math. 25 (2006).
[21] W.P. Ziemer, Weakly Differentiable Functions, Springer, New York, 1989.