Derivative reproducing properties for kernel methods in learning theory


Ding-Xuan Zhou, Department of Mathematics, City University of Hong Kong, Kowloon, Hong Kong, China (mazhou@cityu.edu.hk)

Journal of Computational and Applied Mathematics 220 (2008). Received 27 June 2007.

Abstract

The regularity of functions from reproducing kernel Hilbert spaces (RKHSs) is studied in the setting of learning theory. We provide a reproducing property for partial derivatives up to order $s$ when the Mercer kernel is $C^{2s}$. For such a kernel on a general domain we show that the RKHS can be embedded into the function space $C^s$. These observations yield a representer theorem for regularized learning algorithms involving data for function values and gradients. Examples of Hermite learning and semi-supervised learning penalized by gradients on data are considered.
© 2007 Elsevier B.V. All rights reserved.

MSC: 68T05; 62J02

Keywords: Learning theory; Reproducing kernel Hilbert spaces; Derivative reproducing; Representer theorem; Hermite learning and semi-supervised learning

1. Introduction

Reproducing kernel Hilbert spaces (RKHSs) form an important class of function spaces in learning theory. Their reproducing property, together with the Hilbert space structure, ensures the effectiveness of many practical learning algorithms implemented in these function spaces.

Let $X$ be a separable metric space and $K : X \times X \to \mathbb{R}$ be a continuous and symmetric function such that for any finite set of points $\{x_1, \ldots, x_l\} \subset X$, the matrix $(K(x_i, x_j))_{i,j=1}^{l}$ is positive semidefinite. Such a function is called a Mercer kernel. The RKHS $\mathcal{H}_K$ associated with the kernel $K$ is defined (see [2]) to be the completion of the linear span of the set of functions $\{K_x := K(x, \cdot) : x \in X\}$ with the inner product $\langle \cdot, \cdot \rangle_K$ given by $\langle K_x, K_y \rangle_K = K(x, y)$. That is, $\bigl\langle \sum_i \alpha_i K_{x_i}, \sum_j \beta_j K_{y_j} \bigr\rangle_K = \sum_{i,j} \alpha_i \beta_j K(x_i, y_j)$. The reproducing property takes the form

$$\langle K_x, f \rangle_K = f(x) \qquad \forall x \in X,\ f \in \mathcal{H}_K. \tag{1.1}$$
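To make these definitions concrete, here is a small numerical sketch (not part of the original paper). It uses a Gaussian kernel, a classical example of a Mercer kernel, builds the kernel matrix for a handful of points, and evaluates the RKHS inner product of finite kernel expansions; the kernel choice, bandwidth and sample points are assumptions made purely for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=0.5):
    # The Gaussian kernel is one classical example of a Mercer kernel;
    # the kernel choice and bandwidth sigma are assumptions made for illustration.
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(1)
pts = rng.uniform(size=(6, 2))          # a finite set {x_1, ..., x_l} in X = [0, 1]^2

# Kernel matrix (K(x_i, x_j))_{i,j=1}^l; for a Mercer kernel it is positive semidefinite.
K = np.array([[gaussian_kernel(a, b) for b in pts] for a in pts])
print("symmetric:", np.allclose(K, K.T))
print("smallest eigenvalue:", np.linalg.eigvalsh(K).min())   # >= 0 up to rounding error

# For f = sum_i alpha_i K_{x_i} and g = sum_j beta_j K_{x_j}, the RKHS inner product
# is <f, g>_K = sum_{i,j} alpha_i beta_j K(x_i, x_j) = alpha^T K beta.
alpha = rng.standard_normal(len(pts))
beta = rng.standard_normal(len(pts))
print("<f, g>_K =", alpha @ K @ beta)

# Reproducing property (1.1): <K_x, f>_K = f(x).  With x = pts[0], the inner product
# <K_x, f>_K = sum_i alpha_i K(x, x_i) is exactly the value of f at x.
print("f(pts[0]) =", sum(a * gaussian_kernel(pts[0], p) for a, p in zip(alpha, pts)))
```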

1.1. Learning algorithms by regularization in RKHS

A large family of learning algorithms is generated by regularization schemes in RKHSs. Such a scheme can be expressed [6] in terms of a sample $z = \{(x_i, y_i)\}_{i=1}^{m} \in (X \times \mathbb{R})^m$ for learning and a loss function $V : \mathbb{R}^2 \to \mathbb{R}_+$ as

$$f_{z,\lambda} = \arg\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{m} \sum_{i=1}^{m} V(y_i, f(x_i)) + \lambda \|f\|_K^2 \right\}, \qquad \lambda > 0. \tag{1.2}$$

The reproducing property (1.1) ensures that the solution of (1.2), a minimization problem over the possibly infinite dimensional space $\mathcal{H}_K$, is achieved in the finite dimensional subspace spanned by $\{K_{x_i}\}_{i=1}^{m}$. This is called a representer theorem [5]. When $V$ is convex with respect to the second variable, (1.2) can be solved by a convex optimization problem for the coefficients of $f_{z,\lambda} = \sum_{i=1}^{m} c_i K_{x_i}$ over $\mathbb{R}^m$. For some special loss functions such as those in support vector machines, this convex optimization problem is actually a convex quadratic programming one, hence many efficient computing tools are available. This makes the kernel method in learning theory very powerful in various applications.

For regression problems, one often takes the loss function to be $V(y, f(x)) = \psi(y - f(x))$ with $\psi : \mathbb{R} \to \mathbb{R}_+$ a convex function. In particular, for least square regression we choose $\psi(t) = t^2$, and for support vector machine regression we choose $\psi(t) = \max\{0, |t| - \varepsilon\}$, an $\varepsilon$-insensitive loss function with some threshold $\varepsilon > 0$. For binary classification problems, $y \in \{1, -1\}$, so one usually sets $V(y, f(x)) = \phi(y f(x))$ with $\phi : \mathbb{R} \to \mathbb{R}_+$ a convex function. In particular, for least square classification, $\phi(t) = (1 - t)^2$; for support vector machine classification, we choose $\phi$ to be the hinge loss $\phi(t) = \max\{0, 1 - t\}$ or the support vector machine $q$-norm loss $\phi_q(t) = (\max\{0, 1 - t\})^q$ with $1 < q < \infty$.

1.2. Learning with gradients

For some applications, one may have gradient data or unlabelled data available for improving learning ability [3,7]. Such situations yield learning algorithms involving data for function values or their gradients. Here $X \subset \mathbb{R}^n$ and, with the $n$ variables $\{x^1, \ldots, x^n\}$ of $\mathbb{R}^n$, the gradient of a differentiable function $f : X \to \mathbb{R}$ is the vector formed by its partial derivatives, $\nabla f = (\partial f/\partial x^1, \ldots, \partial f/\partial x^n)^T$.

Example 1 (Semi-supervised learning with gradients of functions in $\mathcal{H}_K$). If $z = \{(x_i, y_i)\}_{i=1}^{m} \in (X \times \mathbb{R})^m$ are labelled data and $u = \{x_i\}_{i=m+1}^{m+l} \in X^l$ are unlabelled data, we introduce a semi-supervised learning algorithm involving gradients as

$$f_{z,u,\lambda,\mu} = \arg\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{m} \sum_{i=1}^{m} V(y_i, f(x_i)) + \frac{\mu}{m+l} \sum_{i=1}^{m+l} |\nabla f(x_i)|^2 + \lambda \|f\|_K^2 \right\}. \tag{1.3}$$

Here $\lambda, \mu > 0$ are two regularization parameters.

Example 2 (Hermite learning with gradient data). Assume that, in addition to the data $z$ approximating values of a desired function $f^\ast$ (i.e. $y_i \approx f^\ast(x_i)$), we get sampling values $y' = \{y_i'\}_{i=1}^{m}$, $y_i' \in \mathbb{R}^n$, for the gradients of $f^\ast$ (i.e. $y_i' \approx \nabla f^\ast(x_i)$). Then we introduce a Hermite learning algorithm by learning the function values and gradients simultaneously as

$$f_{z,y',\lambda} = \arg\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{m} \sum_{i=1}^{m} \bigl[(y_i - f(x_i))^2 + |y_i' - \nabla f(x_i)|^2\bigr] + \lambda \|f\|_K^2 \right\}. \tag{1.4}$$

To solve optimization problems like (1.3) and (1.4) with effective computing tools (such as those for convex quadratic programming), we study representer theorems and need reproducing properties for gradients similar to (1.1). This is the first purpose of this paper.
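As an illustration of the representer theorem behind scheme (1.2) from Section 1.1, the following hedged sketch solves (1.2) with the least square loss $V(y, f(x)) = (y - f(x))^2$: writing $f_{z,\lambda} = \sum_i c_i K_{x_i}$, the coefficients can be obtained from the linear system $(K + m\lambda I)c = y$ with $K$ the kernel matrix. The Gaussian kernel, its bandwidth, the regularization parameter and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=0.5):
    # Assumed kernel and bandwidth, for illustration only.
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

# Synthetic sample z = {(x_i, y_i)}: noisy observations of a target function.
rng = np.random.default_rng(0)
m = 30
X = rng.uniform(0.0, 1.0, size=(m, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(m)
lam = 1e-3                                   # regularization parameter lambda in (1.2)

# Kernel matrix K = (K(x_i, x_j))_{i,j=1}^m.
K = np.array([[gaussian_kernel(a, b) for b in X] for a in X])

# For V(y, f(x)) = (y - f(x))^2 the representer theorem reduces (1.2) to finding the
# coefficients of f_{z,lambda} = sum_i c_i K_{x_i}; one solution is (K + m*lam*I) c = y.
c = np.linalg.solve(K + m * lam * np.eye(m), y)

def f_hat(t):
    """Evaluate the learned function f_{z,lambda} at a point t."""
    return sum(ci * gaussian_kernel(xi, t) for ci, xi in zip(c, X))

print(f_hat(np.array([0.25])))               # should be close to sin(pi/2) = 1
```

For non-quadratic convex losses the same finite-dimensional parametrization applies, but the coefficients are found by a convex optimization (for SVM losses, a quadratic program) instead of a linear solve.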

1.3. Capacity of RKHS

Learning ability of algorithms in $\mathcal{H}_K$ depends on the kernel, a loss function measuring errors, and the probability distributions from which the samples are drawn [1,11]. Its quantitative estimates in terms of $\mathcal{H}_K$ rely on two features of the RKHS: the approximation power and the capacity [2,5,6,7]. The latter can be measured by covering numbers of the unit ball of the RKHS as a subset of $C(X)$. These covering numbers have been extensively studied in learning theory, see e.g. [1,19,18]. In particular, when $X = [0,1]^n$ and $K$ is $C^s$ with $s$ not being an even integer, an explicit bound for the covering numbers was presented in [19]. This was done by showing that $\mathcal{H}_K$ can be embedded in $C^{s/2}(X)$, a Hölder space on $X$. The embedding result yields error estimates in the $C^{s/2}$ metric by means of bounds in the $\mathcal{H}_K$ metric for learning algorithms. For example, for the least square regularized regression algorithm (1.2) with $V(y, f(x)) = (f(x) - y)^2$, when $K \in C^{2+\varepsilon}(X \times X)$ with some $\varepsilon > 0$, rates of the error $\|f_{z,\lambda} - f_\rho\|_{C^1(X)}$ for learning the regression function $f_\rho$ were provided in [4] from bounds for $\|f_{z,\lambda} - f_\rho\|_K$. A natural question is whether the extra $\varepsilon > 0$ can be omitted. The general problem for a $C^s$ kernel is whether $\mathcal{H}_K$ can be embedded in $C^{s/2}(X)$ when $X$ is a general domain in $\mathbb{R}^n$ or when $s$ is an even integer. Solving this general question is the second purpose of this paper. The main difficulty we overcome here is the lack of regularity of the general domain.

2. Reproducing partial derivatives in RKHS

To allow a general situation, we do not assume any regularity for the boundary of $X$. To consider partial derivatives, we assume that the interior of $X$ is nonempty.

For $s \in \mathbb{Z}_+$, we denote an index set $I_s := \{\alpha \in \mathbb{Z}_+^n : |\alpha| \le s\}$ where $|\alpha| = \sum_{j=1}^{n} \alpha_j$ for $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{Z}_+^n$. For a function $f$ of $n$ variables and $x = (x^1, \ldots, x^n) \in \mathbb{R}^n$, we denote its partial derivative $D^\alpha f$ at $x$ (if it exists) as

$$D^\alpha f(x) = D_1^{\alpha_1} \cdots D_n^{\alpha_n} f(x) = \frac{\partial^{|\alpha|} f(x)}{(\partial x^1)^{\alpha_1} \cdots (\partial x^n)^{\alpha_n}}.$$

Definition 1 (Ziemer [21]). Let $X$ be a compact subset of $\mathbb{R}^n$ which is the closure of its nonempty interior $X^o$. Define $C^s(X^o)$ to be the space of functions $f$ on $X^o$ such that $D^\alpha f$ is well-defined and continuous on $X^o$ for each $\alpha \in I_s$. Define $C^s(X)$ to be the space of continuous functions $f$ on $X$ such that $f|_{X^o} \in C^s(X^o)$ and, for each $\alpha \in I_s$, $D^\alpha(f|_{X^o})$ has a continuous extension to $X$ denoted as $D^\alpha f$. In particular, the extension of $f|_{X^o}$ is $f$ itself.

The linear space $C^s(X)$ is actually a Banach space with the norm

$$\|f\|_{C^s(X)} = \sum_{\alpha \in I_s} \|D^\alpha f\|_\infty = \sum_{\alpha \in I_s} \sup_{x \in X^o} |D^\alpha(f|_{X^o})(x)|.$$

Observe that for $f \in C^1(X)$, the gradient $\nabla f$ equals $(D^{e_1} f, \ldots, D^{e_n} f)$, where $e_j$ is the $j$th standard unit vector in $\mathbb{R}^n$.

The property of reproducing partial derivatives of functions in $\mathcal{H}_K$ is given by partial derivatives of the Mercer kernel $K$. If $K \in C^{2s}(X \times X)$, for $\alpha \in I_s$ we extend $\alpha$ to $\mathbb{Z}_+^{2n}$ by adding zeros to the last $n$ components and denote the corresponding partial derivative of $K$ as $D^\alpha K$. That is,

$$D^\alpha K(x, y) = \frac{\partial^{|\alpha|} K}{(\partial x^1)^{\alpha_1} \cdots (\partial x^n)^{\alpha_n}}(x^1, \ldots, x^n, y^1, \ldots, y^n), \qquad x, y \in X^o,$$

and $D^\alpha K$ is a continuous extension of $D^\alpha(K|_{X^o \times X^o})$ to $X \times X$. For $x \in X$, denote by $(D^\alpha K)_x$ the function on $X$ given by $(D^\alpha K)_x(y) = D^\alpha K(x, y)$. By the symmetry of $K$, we have

$$D^\alpha(K_y)(x) = (D^\alpha K)_x(y) = D^\alpha K(x, y), \qquad x, y \in X. \tag{2.1}$$

Now we can give the result on reproducing partial derivatives and embedding of $\mathcal{H}_K$ into $C^s(X)$. Here (1.1) and the weak compactness of a closed ball of a Hilbert space play an important role.

Theorem 1. Let $s \in \mathbb{N}$ and $K : X \times X \to \mathbb{R}$ be a Mercer kernel such that $K \in C^{2s}(X \times X)$. Then the following statements hold:

(a) For any $x \in X$ and $\alpha \in I_s$, $(D^\alpha K)_x \in \mathcal{H}_K$.

(b) A partial derivative reproducing property holds true for $\alpha \in I_s$:

$$D^\alpha f(x) = \langle (D^\alpha K)_x, f \rangle_K \qquad \forall x \in X,\ f \in \mathcal{H}_K. \tag{2.2}$$

(c) The inclusion $J : \mathcal{H}_K \to C^s(X)$ is well-defined and bounded:

$$\|f\|_{C^s(X)} \le n^s \sqrt{\|K\|_{C^{2s}(X \times X)}}\, \|f\|_K \qquad \forall f \in \mathcal{H}_K. \tag{2.3}$$

(d) $J$ is compact. More strongly, for any closed bounded subset $B$ of $\mathcal{H}_K$, $J(B)$ is a compact subset of $C^s(X)$.
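Before turning to the proof, here is a brief numerical sketch (not part of the paper) of property (2.2). For a finite expansion $f = \sum_i c_i K_{x_i}$ and a $C^\infty$ Gaussian kernel, the right-hand side of (2.2) with $\alpha = e_j$ reduces to $\sum_i c_i D^{e_j}K(x, x_i)$, which can be compared against a finite-difference approximation of $D^{e_j}f(x)$; the kernel, bandwidth, centers and coefficients below are illustrative assumptions.

```python
import numpy as np

sigma = 0.7   # assumed bandwidth of an illustrative Gaussian kernel (which is C^infinity)

def K(x, y):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def dK_dx(x, y, j):
    # Partial derivative of K(x, y) in the j-th coordinate of the first argument.
    return -(x[j] - y[j]) / sigma ** 2 * K(x, y)

rng = np.random.default_rng(2)
centers = rng.uniform(size=(5, 2))
c = rng.standard_normal(5)

def f(x):
    # f = sum_i c_i K_{x_i} lies in H_K.
    return sum(ci * K(x, xi) for ci, xi in zip(c, centers))

# By (2.2) with alpha = e_j, D^{e_j} f(x) = <(D^{e_j}K)_x, f>_K = sum_i c_i dK_dx(x, x_i, j).
x0, j, h = np.array([0.3, 0.6]), 0, 1e-5
analytic = sum(ci * dK_dx(x0, xi, j) for ci, xi in zip(c, centers))
e_j = np.eye(2)[j]
finite_diff = (f(x0 + h * e_j) - f(x0 - h * e_j)) / (2 * h)
print(analytic, finite_diff)   # the two numbers should agree to high accuracy
```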

Proof. We first prove (a) and (b) together by induction on $|\alpha| = 0, 1, \ldots, s$. The case $|\alpha| = 0$ is trivial since $\alpha = 0$ and $(D^0 K)_x = K_x$ satisfies (1.1).

Let $0 \le l < s$. Suppose that $(D^\alpha K)_x \in \mathcal{H}_K$ and (2.2) holds for any $x \in X$ and $\alpha \in \mathbb{Z}_+^n$ with $|\alpha| = l$. Then (2.2) implies that for any $y \in X$,

$$\langle (D^\alpha K)_y, (D^\alpha K)_x \rangle_K = D^\alpha\bigl((D^\alpha K)_x\bigr)(y) = D^\alpha\bigl(D^\alpha K(x, \cdot)\bigr)(y) = D^{(\alpha,\alpha)}K(x, y). \tag{2.4}$$

Here $(\alpha, \alpha) \in \mathbb{Z}_+^{2n}$ is formed by $\alpha$ in the first and second sets of $n$ components.

Now we turn to the case $l + 1$. Consider the index $\alpha + e_j$ with $|\alpha + e_j| = l + 1$. We prove (a) and (b) for this index in four steps.

Step 1: Proving $(D^{\alpha+e_j}K)_x \in \mathcal{H}_K$ for $x \in X^o$. Since $x \in X^o$, there exists some $r > 0$ such that $x + \{y \in \mathbb{R}^n : |y| \le r\} \subset X^o$. Then by (2.4), the set $\{\frac{1}{t}((D^\alpha K)_{x+te_j} - (D^\alpha K)_x) : 0 < |t| \le r\}$ of functions in $\mathcal{H}_K$ satisfies

$$\Bigl\| \frac{1}{t}\bigl((D^\alpha K)_{x+te_j} - (D^\alpha K)_x\bigr) \Bigr\|_K^2 = \frac{1}{t^2}\bigl\{ D^{(\alpha,\alpha)}K(x+te_j, x+te_j) - D^{(\alpha,\alpha)}K(x+te_j, x) - D^{(\alpha,\alpha)}K(x, x+te_j) + D^{(\alpha,\alpha)}K(x, x) \bigr\} \le \|D^{(\alpha+e_j,\alpha+e_j)}K\|_\infty, \qquad 0 < |t| \le r.$$

Here we have used the assumption $K \in C^{2s}(X \times X)$ and $|(\alpha+e_j, \alpha+e_j)| = 2|\alpha| + 2 = 2l + 2 \le 2s$. That means $\{\frac{1}{t}((D^\alpha K)_{x+te_j} - (D^\alpha K)_x) : 0 < |t| \le r\}$ lies in the closed ball of the Hilbert space $\mathcal{H}_K$ with the finite radius $\sqrt{\|D^{(\alpha+e_j,\alpha+e_j)}K\|_\infty}$. Since this ball is weakly compact, there is a sequence $\{t_i\}_{i=1}^\infty$ with $0 < |t_i| \le r$ and $\lim_{i\to\infty} t_i = 0$ such that $\{\frac{1}{t_i}((D^\alpha K)_{x+t_ie_j} - (D^\alpha K)_x)\}$ converges weakly to an element $g_x$ of $\mathcal{H}_K$ as $i \to \infty$. The weak convergence tells us that

$$\lim_{i\to\infty} \Bigl\langle \frac{1}{t_i}\bigl((D^\alpha K)_{x+t_ie_j} - (D^\alpha K)_x\bigr), f \Bigr\rangle_K = \langle g_x, f \rangle_K \qquad \forall f \in \mathcal{H}_K. \tag{2.5}$$

In particular, by taking $f = K_y$ with $y \in X$, there holds

$$g_x(y) = \lim_{i\to\infty} \Bigl\langle \frac{1}{t_i}\bigl((D^\alpha K)_{x+t_ie_j} - (D^\alpha K)_x\bigr), K_y \Bigr\rangle_K.$$

By (2.2) for $\alpha$ and (2.1) we have

$$g_x(y) = \lim_{i\to\infty} \frac{1}{t_i}\bigl(D^\alpha(K_y)(x+t_ie_j) - D^\alpha(K_y)(x)\bigr) = \lim_{i\to\infty} \frac{1}{t_i}\bigl(D^\alpha K(x+t_ie_j, y) - D^\alpha K(x, y)\bigr) = D^{\alpha+e_j}K(x, y) = (D^{\alpha+e_j}K)_x(y).$$

This is true for an arbitrary point $y \in X$. Hence $(D^{\alpha+e_j}K)_x = g_x$ as functions on $X$. Since $g_x \in \mathcal{H}_K$, we know $(D^{\alpha+e_j}K)_x \in \mathcal{H}_K$.

Step 2: Proving for $x \in X^o$ the convergence

$$\frac{1}{t}\bigl((D^\alpha K)_{x+te_j} - (D^\alpha K)_x\bigr) \to (D^{\alpha+e_j}K)_x \quad \text{in } \mathcal{H}_K \ (t \to 0). \tag{2.6}$$

Applying (2.5) and (2.2) for $\alpha$ to the function $(D^{\alpha+e_j}K)_x \in \mathcal{H}_K$ yields

$$\langle (D^{\alpha+e_j}K)_x, (D^{\alpha+e_j}K)_x \rangle_K = \lim_{i\to\infty} \frac{1}{t_i}\bigl\{ D^\alpha\bigl((D^{\alpha+e_j}K)_x\bigr)(x+t_ie_j) - D^\alpha\bigl((D^{\alpha+e_j}K)_x\bigr)(x) \bigr\} = \lim_{i\to\infty} \frac{1}{t_i}\bigl\{ D^\alpha\bigl(D^{\alpha+e_j}K(x,\cdot)\bigr)(x+t_ie_j) - D^\alpha\bigl(D^{\alpha+e_j}K(x,\cdot)\bigr)(x) \bigr\} = D^{(\alpha+e_j,\alpha+e_j)}K(x, x).$$

This in connection with (2.2) and (2.4) implies

$$\Bigl\| \frac{1}{t}\bigl((D^\alpha K)_{x+te_j} - (D^\alpha K)_x\bigr) - (D^{\alpha+e_j}K)_x \Bigr\|_K^2 = \frac{1}{t^2}\bigl\{ D^{(\alpha,\alpha)}K(x+te_j, x+te_j) - 2D^{(\alpha,\alpha)}K(x+te_j, x) + D^{(\alpha,\alpha)}K(x, x) \bigr\} - \frac{2}{t}\bigl\{ D^\alpha\bigl((D^{\alpha+e_j}K)_x\bigr)(x+te_j) - D^\alpha\bigl((D^{\alpha+e_j}K)_x\bigr)(x) \bigr\} + D^{(\alpha+e_j,\alpha+e_j)}K(x, x)$$

$$= \frac{1}{t^2} \int_0^t \int_0^t D^{(\alpha+e_j,\alpha+e_j)}K(x+ue_j, x+ve_j)\,du\,dv - \frac{2}{t} \int_0^t D^{(\alpha+e_j,\alpha+e_j)}K(x, x+ve_j)\,dv + D^{(\alpha+e_j,\alpha+e_j)}K(x, x)$$

$$= \frac{1}{t^2} \int_0^t \int_0^t \bigl\{ D^{(\alpha+e_j,\alpha+e_j)}K(x+ue_j, x+ve_j) - 2D^{(\alpha+e_j,\alpha+e_j)}K(x, x+ve_j) + D^{(\alpha+e_j,\alpha+e_j)}K(x, x) \bigr\}\,du\,dv.$$

If we define the modulus of continuity for a function $g \in C(X \times X)$ to be a function of $\delta \in (0, \infty)$ as

$$\omega(g, \delta) := \sup\bigl\{ |g(x_1, y_1) - g(x_2, y_2)| : x_i, y_i \in X \text{ with } |x_1 - x_2| \le \delta,\ |y_1 - y_2| \le \delta \bigr\}, \tag{2.7}$$

we know from the uniform continuity of $g$ that $\lim_{\delta\to 0+} \omega(g, \delta) = 0$. Moreover, the function $\omega(g, \delta)$ is continuous on $(0, \infty)$. Using the modulus of continuity for the function $D^{(\alpha+e_j,\alpha+e_j)}K \in C(X \times X)$ we see that

$$\Bigl\| \frac{1}{t}\bigl((D^\alpha K)_{x+te_j} - (D^\alpha K)_x\bigr) - (D^{\alpha+e_j}K)_x \Bigr\|_K^2 \le 2\,\omega\bigl(D^{(\alpha+e_j,\alpha+e_j)}K, |t|\bigr). \tag{2.8}$$

This converges to zero as $t \to 0$. Therefore (2.6) holds true.

Step 3: Proving (2.2) for $x \in X^o$ and $\alpha + e_j$. Let $f \in \mathcal{H}_K$. By (2.6) we have

$$\langle (D^{\alpha+e_j}K)_x, f \rangle_K = \lim_{t\to 0} \Bigl\langle \frac{1}{t}\bigl((D^\alpha K)_{x+te_j} - (D^\alpha K)_x\bigr), f \Bigr\rangle_K.$$

By (2.2) for $\alpha$, we see that this equals

$$\langle (D^{\alpha+e_j}K)_x, f \rangle_K = \lim_{t\to 0} \frac{1}{t}\bigl\{ D^\alpha f(x+te_j) - D^\alpha f(x) \bigr\}.$$

That is, $D^{\alpha+e_j}f(x)$ exists and equals $\langle (D^{\alpha+e_j}K)_x, f \rangle_K$. This verifies (2.2) for $\alpha + e_j$.

Step 4: Proving (a) and (b) for $x \in \partial X := X \setminus X^o$. Notice from the first three steps that for $x, x' \in X^o$, there holds

$$\|(D^{\alpha+e_j}K)_x - (D^{\alpha+e_j}K)_{x'}\|_K^2 = \bigl\{ D^{(\alpha+e_j,\alpha+e_j)}K(x, x) - D^{(\alpha+e_j,\alpha+e_j)}K(x, x') - D^{(\alpha+e_j,\alpha+e_j)}K(x', x) + D^{(\alpha+e_j,\alpha+e_j)}K(x', x') \bigr\} \le 2\,\omega\bigl(D^{(\alpha+e_j,\alpha+e_j)}K, |x - x'|\bigr).$$

It follows that for any sequence $\{x^{(i)} \in X^o\}_{i=1}^\infty$ converging to $x$, the sequence of functions $\{(D^{\alpha+e_j}K)_{x^{(i)}}\}$ is a Cauchy sequence in the Hilbert space $\mathcal{H}_K$. So it converges to a limit function $h \in \mathcal{H}_K$. Applying what we have proved for $x^{(i)} \in X^o$ we get

$$h(y) = \langle h, K_y \rangle_K = \lim_{i\to\infty} \langle (D^{\alpha+e_j}K)_{x^{(i)}}, K_y \rangle_K = (D^{\alpha+e_j}K)_x(y) \qquad \forall y \in X.$$

This verifies $(D^{\alpha+e_j}K)_x = h \in \mathcal{H}_K$.

Let $f \in \mathcal{H}_K$. We define a function $f^{[j]}$ on $X$ as $f^{[j]}(x) = \langle (D^{\alpha+e_j}K)_x, f \rangle_K$, $x \in X$. By the conclusion in Step 3, we know that $f^{[j]}(x) = D^{\alpha+e_j}f(x)$ for $x \in X^o$, hence $f^{[j]}$ is continuous on $X^o$. Let us now prove the continuity of $f^{[j]}$ at each $x \in \partial X$. If $\{x^{(i)} \in X^o\}_{i=1}^\infty$ is a sequence satisfying $\lim_{i\to\infty} x^{(i)} = x$, then the above proof tells us that $(D^{\alpha+e_j}K)_{x^{(i)}}$ converges to $(D^{\alpha+e_j}K)_x$ in the $\mathcal{H}_K$ metric, meaning that $\lim_{i\to\infty} \|(D^{\alpha+e_j}K)_{x^{(i)}} - (D^{\alpha+e_j}K)_x\|_K = 0$. So by the definition of $f^{[j]}$ and the Schwarz inequality, we have

$$|f^{[j]}(x^{(i)}) - f^{[j]}(x)| = |\langle (D^{\alpha+e_j}K)_{x^{(i)}} - (D^{\alpha+e_j}K)_x, f \rangle_K| \le \|(D^{\alpha+e_j}K)_{x^{(i)}} - (D^{\alpha+e_j}K)_x\|_K\, \|f\|_K \to 0 \quad \text{as } i \to \infty.$$

Thus the function $f^{[j]}$ is continuous on $X$, and it is a continuous extension of $D^{\alpha+e_j}f$ from $X^o$ onto $X$. So (b) holds true for $x \in \partial X$. This completes the induction procedure for proving the statements in (a) and (b).

(c) We use (2.2) and (2.4). For $f \in \mathcal{H}_K$, $x, \bar{x} \in X$ and $\alpha \in I_s$, the Schwarz inequality implies

$$|D^\alpha f(x) - D^\alpha f(\bar{x})| = |\langle (D^\alpha K)_x - (D^\alpha K)_{\bar{x}}, f \rangle_K| \le \|(D^\alpha K)_x - (D^\alpha K)_{\bar{x}}\|_K\, \|f\|_K.$$

Hence

$$|D^\alpha f(x) - D^\alpha f(\bar{x})| \le \sqrt{D^{(\alpha,\alpha)}K(x,x) - 2D^{(\alpha,\alpha)}K(x,\bar{x}) + D^{(\alpha,\alpha)}K(\bar{x},\bar{x})}\; \|f\|_K \le \sqrt{2\,\omega\bigl(D^{(\alpha,\alpha)}K, |x - \bar{x}|\bigr)}\; \|f\|_K, \qquad x, \bar{x} \in X. \tag{2.9}$$

As $\lim_{\delta\to 0+} \omega(D^{(\alpha,\alpha)}K, \delta) = 0$, we know that $D^\alpha f \in C(X)$. It means $f \in C^s(X)$ and the inclusion $J$ is well-defined. To see the boundedness, we apply the Schwarz inequality again and have

$$|D^\alpha f(x)| = |\langle (D^\alpha K)_x, f \rangle_K| \le \sqrt{D^{(\alpha,\alpha)}K(x,x)}\; \|f\|_K \le \sqrt{\|D^{(\alpha,\alpha)}K\|_\infty}\; \|f\|_K.$$

It follows that

$$\|f\|_{C^s(X)} = \sum_{\alpha \in I_s} \|D^\alpha f\|_\infty \le \sum_{\alpha \in I_s} \sqrt{\|D^{(\alpha,\alpha)}K\|_\infty}\; \|f\|_K \le n^s \sqrt{\sum_{\alpha \in I_s} \|D^{(\alpha,\alpha)}K\|_\infty}\; \|f\|_K.$$

Then (2.3) is verified.

(d) If $B$ is a closed bounded subset of $\mathcal{H}_K$, there is some $R > 0$ such that $B \subseteq \{f \in \mathcal{H}_K : \|f\|_K \le R\}$. To show that $J(B)$ is compact, let $\{f_j\}_{j=1}^\infty$ be a sequence in $B$. The estimate (2.9) tells us that for each $\alpha \in I_s$ and $j \in \mathbb{N}$,

$$|D^\alpha f_j(x) - D^\alpha f_j(\bar{x})| \le \sqrt{2\,\omega\bigl(D^{(\alpha,\alpha)}K, |x - \bar{x}|\bigr)}\; R, \qquad x, \bar{x} \in X.$$

It says that the sequence of functions $\{D^\alpha f_j\}_{j=1}^\infty$ is uniformly equicontinuous (it is also uniformly bounded by $\sqrt{\|D^{(\alpha,\alpha)}K\|_\infty}\, R$). This is true for each $\alpha \in I_s$. So by taking subsequences for $\alpha$ (one after one), we know that there is a subsequence $\{f_{j_l}\}_{l=1}^\infty$ which converges to a function $f^\ast \in C^s(X)$ in the metric of $C^s(X)$. Observe that $\{f_{j_l}\}_{l=1}^\infty$ lies in the ball of $\mathcal{H}_K$ with radius $R$, which is weakly compact; hence it contains a subsequence $\{f_{j_{l_k}}\}_{k=1}^\infty$, which is also a subsequence of $\{f_j\}_{j=1}^\infty$, converging weakly to a function $\bar{f} \in \mathcal{H}_K$. According to (2.2), the weak convergence in $\mathcal{H}_K$ tells us that $D^\alpha f_{j_{l_k}}(x) \to D^\alpha \bar{f}(x)$ for every $x \in X$ and $\alpha \in I_s$. Therefore, $f^\ast = \bar{f} \in \mathcal{H}_K$ and $\{f_j\}_{j=1}^\infty \subset J(B)$ contains a subsequence which converges in $C^s(X)$ to $f^\ast$. This proves that $J(B)$ is compact. The proof of Theorem 1 is complete.
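The pointwise estimate $|D^\alpha f(x)| \le \sqrt{D^{(\alpha,\alpha)}K(x,x)}\,\|f\|_K$ used in the proof of part (c) can be checked numerically for finite kernel expansions. The sketch below (not from the paper) does this for first-order derivatives of a Gaussian kernel, for which $D^{(e_j,e_j)}K(x,x) = 1/\sigma^2$; the kernel, bandwidth, coefficients and test points are assumptions made for illustration.

```python
import numpy as np

sigma = 0.7   # assumed Gaussian bandwidth; then D^{(e_j,e_j)}K(x, x) = 1 / sigma^2

def K(x, y):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def dK_dx(x, y, j):
    return -(x[j] - y[j]) / sigma ** 2 * K(x, y)

rng = np.random.default_rng(3)
centers = rng.uniform(size=(8, 2))
c = rng.standard_normal(8)

# ||f||_K^2 = c^T (K(x_i, x_j))_{i,j} c for f = sum_i c_i K_{x_i}.
G = np.array([[K(a, b) for b in centers] for a in centers])
f_norm = np.sqrt(c @ G @ c)
bound = f_norm / sigma       # sqrt(D^{(e_j,e_j)}K(x,x)) * ||f||_K for this kernel

# Compare with the largest first-order derivative of f over a random grid of points.
grid = rng.uniform(size=(200, 2))
max_grad = max(abs(sum(ci * dK_dx(x, xi, j) for ci, xi in zip(c, centers)))
               for x in grid for j in range(2))
print(max_grad, "<=", bound)
```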

Theorem 1 can be extended to other kernels [10]. Relation (2.3) tells us that error bounds in the $\mathcal{H}_K$ norm can be used to estimate convergence rates of learning algorithms in the norm of $C^s(X)$, as done in [4].

3. Representer theorems for learning with derivative data

A general learning algorithm of regularization in $\mathcal{H}_K$ involving partial derivative data takes the form

$$f_{\mathbf{x},\mathbf{y},\lambda} = \arg\min_{f \in \mathcal{H}_K} \left\{ \sum_{i=1}^{m} V_i\bigl( \mathbf{y}_i, \{D^\alpha f(x_i)\}_{\alpha \in J_i} \bigr) + \lambda \|f\|_K^2 \right\}, \tag{3.1}$$

where for each $i \in \{1, \ldots, m\}$, $x_i \in X$, $\mathbf{y}_i$ is a vector, $J_i$ is a subset of $I_s$, and $V_i$ is a loss function with values in $\mathbb{R}_+$ of compatible variables. Denote the number of elements in the set $J_i$ as $\#(J_i)$.

The partial derivative reproducing property (2.2) stated in Theorem 1 enables us to derive a representer theorem for the learning algorithm (3.1), which asserts that the minimization over the possibly infinite dimensional space $\mathcal{H}_K$ can be achieved in a finite dimensional subspace generated by $\{K_{x_i}\}$ and their partial derivatives.

Theorem 2. Let $s \in \mathbb{N}$ and $K : X \times X \to \mathbb{R}$ be a Mercer kernel such that $K \in C^{2s}(X \times X)$. If $\lambda > 0$, then the solution $f_{\mathbf{x},\mathbf{y},\lambda}$ of scheme (3.1) exists and lies in the subspace spanned by $\{(D^\alpha K)_{x_i} : \alpha \in J_i,\ i = 1, \ldots, m\}$. If we write $f_{\mathbf{x},\mathbf{y},\lambda} = \sum_{i=1}^{m} \sum_{\alpha \in J_i} c_{i,\alpha} (D^\alpha K)_{x_i}$ with $\mathbf{c} = (c_{i,\alpha})_{\alpha \in J_i,\ i=1,\ldots,m} \in \mathbb{R}^N$ where $N = \sum_{i=1}^{m} \#(J_i)$, then

$$\mathbf{c} = \arg\min_{\mathbf{c} \in \mathbb{R}^N} \left\{ \sum_{i=1}^{m} V_i\Bigl( \mathbf{y}_i, \Bigl\{ \sum_{j=1}^{m} \sum_{\beta \in J_j} c_{j,\beta}\, D^{(\beta,\alpha)}K(x_j, x_i) \Bigr\}_{\alpha \in J_i} \Bigr) + \lambda \sum_{i=1}^{m} \sum_{\alpha \in J_i} \sum_{j=1}^{m} \sum_{\beta \in J_j} c_{i,\alpha}\, c_{j,\beta}\, D^{(\beta,\alpha)}K(x_j, x_i) \right\}.$$

Proof. By Theorem 1, we know that for any $\alpha$ in $J_i$, the function $(D^\alpha K)_{x_i}$ lies in $\mathcal{H}_K$. Denote the subspace of $\mathcal{H}_K$ spanned by $\{(D^\alpha K)_{x_i} : \alpha \in J_i,\ i = 1, \ldots, m\}$ as $\mathcal{H}_{K,\mathbf{x}}$. Let $P$ be the orthogonal projection onto this subspace. Then for any $f \in \mathcal{H}_K$, the function $f - P(f)$ is orthogonal to $\mathcal{H}_{K,\mathbf{x}}$. In particular, $\langle f - P(f), (D^\alpha K)_{x_i} \rangle_K = 0$ for any $\alpha \in J_i$ and $1 \le i \le m$. This in connection with the partial derivative reproducing property (2.2) tells us that

$$D^\alpha(f - P(f))(x_i) = D^\alpha f(x_i) - D^\alpha(P(f))(x_i) = 0, \qquad \alpha \in J_i,\ i = 1, \ldots, m.$$

Thus, if we denote $E_z(f) = \sum_{i=1}^{m} V_i(\mathbf{y}_i, \{D^\alpha f(x_i)\}_{\alpha \in J_i})$, we see that $E_z(f) = E_z(P(f))$. Notice that $\|P(f)\|_K \le \|f\|_K$ and the strict inequality holds unless $f = P(f)$, i.e., $f \in \mathcal{H}_{K,\mathbf{x}}$. Therefore,

$$\min_{f \in \mathcal{H}_K} \{E_z(f) + \lambda \|f\|_K^2\} = \min_{f \in \mathcal{H}_{K,\mathbf{x}}} \{E_z(f) + \lambda \|f\|_K^2\}$$

and a minimizer $f_{\mathbf{x},\mathbf{y},\lambda}$ exists and lies in $\mathcal{H}_{K,\mathbf{x}}$ since the subspace is finite dimensional. The second statement is trivial.

We shall not discuss learning rates of the learning algorithms (1.3) and (1.4) here. Though rough estimates can be given using methods from [4,8,3], a satisfactory error analysis for more general learning algorithms [9,20] is left for future work.
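To make Theorem 2 concrete, the following hedged sketch (not from the paper) applies it to the Hermite learning scheme (1.4) in one dimension with a Gaussian kernel, which is $C^\infty$, so Theorem 2 applies for every $s$. With the square loss, the coefficient problem of Theorem 2 becomes a linear system $(G + m\lambda I)\mathbf{c} = Y$, where $G$ is the Gram matrix of the representers $\{K_{x_i}, (D^{e_1}K)_{x_i}\}$ and $Y$ stacks the value and gradient data; the kernel, bandwidth, regularization parameter and target function below are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

sigma, lam, m = 0.4, 1e-4, 12    # assumed bandwidth, regularization parameter, sample size

def K(u, v):   return np.exp(-(u - v) ** 2 / (2 * sigma ** 2))
def K10(u, v): return -(u - v) / sigma ** 2 * K(u, v)                         # d/du K(u, v)
def K01(u, v): return  (u - v) / sigma ** 2 * K(u, v)                         # d/dv K(u, v)
def K11(u, v): return (1 / sigma ** 2 - (u - v) ** 2 / sigma ** 4) * K(u, v)  # d2/(du dv) K

# Hermite data: values y_i ~ f*(x_i) and derivatives y_i' ~ (f*)'(x_i), f*(x) = sin(2 pi x).
x = np.linspace(0.0, 1.0, m)
y = np.sin(2 * np.pi * x)
yp = 2 * np.pi * np.cos(2 * np.pi * x)

# Gram matrix of the representers {K_{x_i}} and {(D^{e_1}K)_{x_i}} from Theorem 2:
# block (i, j) entries are K, d/dv K, d/du K and d2/(du dv) K evaluated at (x_i, x_j).
U, V = np.meshgrid(x, x, indexing="ij")
G = np.block([[K(U, V),   K01(U, V)],
              [K10(U, V), K11(U, V)]])
Y = np.concatenate([y, yp])

# With the square loss of (1.4), any c with G[(G + m*lam*I)c - Y] = 0 is a minimizer;
# c = (G + m*lam*I)^{-1} Y is one such solution.
c = np.linalg.solve(G + m * lam * np.eye(2 * m), Y)
c0, c1 = c[:m], c[m:]

def f_hat(t):
    # f_{z,y',lambda}(t) = sum_j [ c_{j,0} K(x_j, t) + c_{j,1} (d/du K)(x_j, t) ].
    return np.sum(c0 * K(x, t) + c1 * K10(x, t))

print(f_hat(0.25), np.sin(2 * np.pi * 0.25))   # learned value vs. target value
```

The same block structure extends to $n > 1$ by adding one derivative block per coordinate, giving the $N = \sum_i \#(J_i)$ coefficients of Theorem 2.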

Acknowledgements

The work described in this paper was supported by the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 103704) and by a Joint Research Fund for Hong Kong and Macau Young Scholars from the National Science Fund for Distinguished Young Scholars, China.

References

[1] M. Anthony, P.L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge University Press, Cambridge, 1999.
[2] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc. 68 (1950).
[3] M. Belkin, P. Niyogi, Semi-supervised learning on Riemannian manifolds, Mach. Learn. 56 (2004).
[4] E. De Vito, A. Caponnetto, L. Rosasco, Model selection for regularized least-squares algorithm in learning theory, Found. Comput. Math. 5 (2005).
[5] E. De Vito, L. Rosasco, A. Caponnetto, M. Piana, A. Verri, Some properties of regularized kernel methods, J. Mach. Learn. Res. 5 (2004).
[6] T. Evgeniou, M. Pontil, T. Poggio, Regularization networks and support vector machines, Adv. Comput. Math. 13 (2000) 1-50.
[7] D. Hardin, I. Tsamardinos, C.F. Aliferis, A theoretical characterization of linear SVM-based feature selection, in: Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004.
[8] S. Mukherjee, P. Niyogi, T. Poggio, R. Rifkin, Learning theory: stability is sufficient for generalization and necessary and sufficient for empirical risk minimization, Adv. Comput. Math. 25 (2006).
[9] S. Mukherjee, Q. Wu, Estimation of gradients and coordinate covariation in classification, J. Mach. Learn. Res. 7 (2006).
[10] R. Schaback, J. Werner, Linearly constrained reconstruction of functions by kernels with applications to machine learning, Adv. Comput. Math. 25 (2006).
[11] J. Shawe-Taylor, P.L. Bartlett, R.C. Williamson, M. Anthony, Structural risk minimization over data-dependent hierarchies, IEEE Trans. Inform. Theory 44 (1998).
[12] S. Smale, D.X. Zhou, Estimating the approximation error in learning theory, Anal. Appl. 1 (2003) 17-41.
[13] S. Smale, D.X. Zhou, Shannon sampling and function reconstruction from point values, Bull. Amer. Math. Soc. 41 (2004).
[14] S. Smale, D.X. Zhou, Learning theory estimates via integral operators and their approximations, Constr. Approx. 26 (2007).
[15] G. Wahba, Spline Models for Observational Data, SIAM, Philadelphia, PA, 1990.
[16] Y. Yao, On complexity issues of online learning algorithms, IEEE Trans. Inform. Theory, to appear.
[17] Y. Ying, Convergence analysis of online algorithms, Adv. Comput. Math. 27 (2007).
[18] D.X. Zhou, The covering number in learning theory, J. Complexity 18 (2002).
[19] D.X. Zhou, Capacity of reproducing kernel spaces in learning theory, IEEE Trans. Inform. Theory 49 (2003).
[20] D.X. Zhou, K. Jetter, Approximation with polynomial kernels and SVM classifiers, Adv. Comput. Math. 25 (2006).
[21] W.P. Ziemer, Weakly Differentiable Functions, Springer, New York, 1989.
