Insights into the Geometry of the Gaussian Kernel and an Application in Geometric Modeling


Insights into the Geometry of the Gaussian Kernel and an Application in Geometric Modeling

Master Thesis
Michael Eigensatz

Advisor: Joachim Giesen
Professor: Mark Pauly

Swiss Federal Institute of Technology Zürich
March 13, 2006


Contents

1 Introduction

2 Kernels and Nonlinear Feature Maps
  2.1 Nonlinear Feature Maps
  2.2 Positive (Semi-)Definite Kernels
      2.2.1 Gram Matrices and Positive Definiteness
      2.2.2 Kernel Induced Feature Spaces and the Kernel Trick
      2.2.3 RBF Kernels
  2.3 Conditionally Positive Semidefinite Kernels

3 Surface Modeling with the SlabSVM
  3.1 The SlabSVM
      3.1.1 Problem Formulation
      3.1.2 The Solution of the SlabSVM
      3.1.3 The Geometry of the SlabSVM
  3.2 The OpenSlabSVM
      3.2.1 Problem Formulation
      3.2.2 The Solution of the OpenSlabSVM
      3.2.3 The Geometry of the OpenSlabSVM Using the Gaussian Kernel
  3.3 The ZeroSlabSVM
      3.3.1 Problem Formulation
      3.3.2 The Solution of the ZeroSlabSVM
      3.3.3 The Geometry of the ZeroSlabSVM Using the Gaussian Kernel
      3.3.4 The ZeroSlabSVM and Shape Reconstruction Using Radial Basis Functions
      3.3.5 Solving the ZeroSlabSVM
  3.4 The Geometry of the SlabSVM Revisited

4 Center-of-Ball Approximation Algorithm
  4.1 The Algorithm
      4.1.1 Basic Two-Phase Iteration Algorithm
      4.1.2 Geometric Interpretation
      4.1.3 Refined Algorithm
  4.2 Remarks
      4.2.1 Convergence and Complexity
      4.2.2 Kernel Induced Feature Spaces
      4.2.3 Off-Surface Points
      4.2.4 Orthogonalization
  4.3 Results

5 Feature-Coefficient Correlation
  5.1 Features of a Shape
  5.2 Feature-Coefficient Correlation

Chapter 1

Introduction

Since the introduction of Support Vector Machines, kernel techniques have become an important instrument for a number of tasks in machine learning and statistics. Each kernel defines an implicit transformation from objective space into a (usually higher dimensional) feature space. Depending on the chosen kernel, the geometry of this induced feature space can be very specific. Using RBF kernels, for example, the points in feature space all lie on a hypersphere around the origin. Our goal was to analyze the geometric constraints of the feature space induced by an RBF kernel (and the Gaussian kernel in particular) and their implications on the geometry of problems formulated with such kernels. We demonstrate, using the example of shape reconstruction with the SlabSVM, that those implications can be used to restate or even improve algorithms performed in kernel feature spaces.

Note that the techniques presented in this thesis, such as kernels or Support Vector Machines, are mainly used in machine learning. However, our research focus is not so much the analysis of learning-related concepts (such as risk and loss functions or other elements of statistical learning theory) but rather to investigate the properties (especially the geometric ones) of these techniques and to gain insights possibly leading to new perspectives.

Chapter 2 gives a short introduction to the basic concepts crucial for the succeeding chapters, such as nonlinear feature maps and kernels. In chapter 3 the SlabSVM is introduced as a method for shape reconstruction, along with a study of the solution properties, interesting special cases and geometric interpretations. Chapter 4 offers an in-depth analysis of a new algorithm for a special case of the SlabSVM, which exploits the geometric insights gained so far. Finally, chapter 5 sheds some light on the interesting observation that the solution of the SlabSVM seems to correlate with features of the shape such as ridges, ravines or sharp edges.


Chapter 2

Kernels and Nonlinear Feature Maps

2.1 Nonlinear Feature Maps

A nonlinear feature map can be written as a nonlinear transformation function φ, which transforms any point x ∈ X to a corresponding point φ(x) ∈ Y. The space X of the original samples is then called objective space and Y is called feature space. Nonlinear feature maps can be useful for many applications, as the following symbolic example demonstrates:

Example 2.1 Given a set of sample points x_i = ([x_i]_1, [x_i]_2) ∈ R² approximately arranged on a circle around the origin (fig. 2.1(a)), we want to fit a curve through these points. To this end, we apply the feature map

    φ(x) = ([x]_1², [x]_2²) ∈ R².   (2.1)

In objective space, points x on circles centered at the origin meet the condition

    [x]_1² + [x]_2² = const.   (2.2)

For the corresponding points φ(x) = y in feature space it consequently holds

    [y]_1 + [y]_2 = const.,   (2.3)

which means that they lie on a straight line. Using the described feature map, the transformed sample points will thus approximately form a linear shape in feature space (fig. 2.1(b)), which is easier to learn.

Of course this simple toy example is somewhat artificial and may seem oversimplified. Nevertheless, it is sufficient to highlight the important fact that nonlinear feature maps can linearize problems and therefore reduce their solution complexity.
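A minimal numerical sketch of Example 2.1, assuming NumPy; the sample points, noise level and radius are made up for illustration. It applies the feature map (2.1) to noisy circle samples and checks that the transformed points are approximately collinear.

```python
import numpy as np

# Sketch of Example 2.1: noisy points on a circle become (approximately)
# collinear under the feature map phi(x) = ([x]_1^2, [x]_2^2).
rng = np.random.default_rng(0)
n = 50
angles = rng.uniform(0.0, 2.0 * np.pi, n)
radii = 3.0 + 0.05 * rng.standard_normal(n)          # roughly circular samples
X = np.column_stack((radii * np.cos(angles), radii * np.sin(angles)))

Y = X ** 2                                            # feature map (2.1)

# In feature space the points satisfy [y]_1 + [y]_2 ~ const (eq. 2.3),
# i.e. they lie close to the line y_1 + y_2 = r^2.
line_sum = Y.sum(axis=1)
print("mean of [y]_1 + [y]_2:", line_sum.mean())      # ~ 9 = 3^2
print("std  of [y]_1 + [y]_2:", line_sum.std())       # small compared to the mean
```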

Figure 2.1: Nonlinear feature map: (a) objective space, (b) feature space.

2.2 Positive (Semi-)Definite Kernels

Let us look again at our simple toy example 2.1: We have used the feature map

    X → Y : x ↦ φ(x) = ([x]_1², [x]_2²) = y.   (2.4)

Scalar products in feature space can thus be computed as

    ⟨y_1, y_2⟩ = ⟨φ(x_1), φ(x_2)⟩   (2.5)
              = ⟨([x_1]_1², [x_1]_2²), ([x_2]_1², [x_2]_2²)⟩   (2.6)
              = [x_1]_1² [x_2]_1² + [x_1]_2² [x_2]_2²   (2.7)
              =: k(x_1, x_2).   (2.8)

It is important to see that the scalar product of two vectors φ(x_1), φ(x_2) in feature space can therefore be computed as a function k which takes as input the two corresponding vectors x_1, x_2 in objective space. This function k is called a kernel.

2.2.1 Gram Matrices and Positive Definiteness

Some important definitions concerning kernels are:

Definition 2.2 (Gram Matrix) Given a function k : X² → K (where K = C or K = R) and samples x_1, ..., x_n ∈ X, the n × n matrix K with elements

    K_ij = k(x_i, x_j)   (2.9)

is called the Gram matrix (or kernel matrix) of k with respect to x_1, ..., x_n.

Definition 2.3 (Positive Semidefinite Matrix) A real n × n matrix K satisfying

    Σ_{i,j=1}^n c_i c_j K_ij ≥ 0   (2.10)

for all c_i ∈ R is called positive semidefinite. If equality in (2.10) only holds for c_1 = ... = c_n = 0 then it is called positive definite.

Definition 2.4 (Positive Semidefinite Kernel) Let X be a nonempty set. A function k on X × X which for all n ∈ N and all x_1, ..., x_n ∈ X gives rise to a positive semidefinite Gram matrix is called a positive semidefinite kernel. If equality in (2.10) only holds for c_1 = ... = c_n = 0 then it is called a positive definite kernel.

Note that positive definiteness is a special case of positive semidefiniteness and therefore every property stated for positive semidefinite kernels also holds for positive definite ones. Another class of kernels are the conditionally positive semidefinite kernels, defined in section 2.3, which play an important role in computer graphics. Since positive (semi)definite kernels give rise to nice geometric interpretations, they will be our main interest. Unless noted otherwise, a kernel will therefore always mean a positive (semi)definite kernel. However, we will discuss the use of conditionally positive semidefinite kernels for surface reconstruction using radial basis functions when we study the ZeroSlabSVM in section 3.3.

2.2.2 Kernel Induced Feature Spaces and the Kernel Trick

At the beginning of this section we saw that we can compute scalar products in the feature space of our simple toy example 2.1 by an evaluation of a kernel k on the samples in objective space. In fact, for every positive semidefinite kernel it holds that

    k(x_1, x_2) = ⟨φ(x_1), φ(x_2)⟩   (2.11)

for some feature map φ. It can therefore be stated that every positive semidefinite kernel implicitly defines a feature map φ into some Hilbert space. For our example this means that the kernel

    k(x_1, x_2) = [x_1]_1² [x_2]_1² + [x_1]_2² [x_2]_2²   (2.12)

implicitly defines the feature map (2.4).

This second statement indicates the power of kernels: Let us assume we are given some sample points in objective space and we want to perform an algorithm on these samples (e.g. learning the shape described by those samples). Since the structure of the samples is nonlinear, we would like to apply a feature transformation φ into a feature space in order to linearize the problem. Then we will perform the algorithm in feature space. Let us now further assume that this algorithm in feature space uses the scalar product as its sole basic operation. We know that there exists a kernel k, which implicitly defines the used feature map φ and computes scalar products in the feature space as a function on points in the objective space. Using this kernel it is thus possible

to formulate our algorithm, which operates in feature space, directly in objective space, without the need to perform the feature transformation explicitly! This can result in a massive reduction in complexity, especially when the feature space is very high (or even infinite) dimensional. An example of how this is done in practice is the SlabSVM introduced in chapter 3.

Of course this kernel trick only works when the algorithm in feature space depends only on scalar products. Fortunately, due to the power of the scalar product, many interesting problems can be solved by algorithms fulfilling this condition. Consequently there exists quite a number of well-investigated kernels used for a variety of problems in machine learning and many other fields. The next sections will introduce some kernels important for the following chapters. For an in-depth analysis of kernels, their definitions, properties and applications, the interested reader is referred to the extensive literature, including [1], [2], [3].

2.2.3 RBF Kernels

Radial basis function (RBF) kernels have the form

    k(x_1, x_2) = f(d(x_1, x_2)),   (2.13)

where f is a function on R₀⁺ and d is a metric on X, for which the usual choice is

    d(x_1, x_2) = ‖x_1 − x_2‖.   (2.14)

The fact that d(x, x) = 0 for every metric gives rise to a first geometric statement:

Proposition 2.5 (Geometry of RBF Kernels) The transformed points φ(x) in the feature space induced by a positive semidefinite RBF kernel are equidistant to the origin and thus all lie on a hypersphere with radius √k(x, x) = √f(0) around the origin.

The Gaussian Kernel

A very popular choice of a positive definite RBF kernel in machine learning is the Gaussian kernel:

    k(x_1, x_2) = exp(−‖x_1 − x_2‖² / (2σ²)),   σ > 0.   (2.15)

It was mentioned before that when using positive semidefinite RBF kernels the transformed points in feature space all lie on a hypersphere around the origin. For the Gaussian kernel it holds that

    ‖φ(x_i)‖² = ⟨φ(x_i), φ(x_i)⟩ = k(x_i, x_i) = 1.   (2.16)

Therefore the hypersphere has radius one in this case. The Gaussian kernel has another very important property:

Theorem 2.6 (Full Rank of Gaussian Gram Matrices) Suppose that x_1, ..., x_n ∈ X are distinct points and σ ≠ 0. The matrix K given by

    K_ij = exp(−‖x_i − x_j‖² / (2σ²))   (2.17)

has full rank.

This leads to two crucial implications:

- The transformed feature points φ(x_1), ..., φ(x_n) are linearly independent.
- In principle, the feature space induced by a Gaussian kernel is infinite dimensional. For geometric considerations, however, it is usually sufficient to look at the n-dimensional subspace spanned by the feature points φ(x_1), ..., φ(x_n).

The n points φ(x_i) also lie on an (n − 1)-dimensional hyperball, which of course is trivial since n linearly independent points always do. With only three points, the geometric setup in feature space is indicated in figure 2.2: the feature points φ_i = φ(x_i), i = 1, 2, 3, lie on the three dimensional unit sphere around the origin and can be interpolated by a circle on this sphere.

Figure 2.2: Feature space of a Gaussian kernel with three sample points.
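The two Gaussian-kernel properties used throughout this thesis, the unit norm of the feature points (eq. (2.16), proposition 2.5) and the full rank of the Gram matrix (theorem 2.6), can be checked numerically. The following is a small sketch assuming NumPy; the helper name gaussian_gram and the random sample points are mine and purely illustrative.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), eq. (2.17)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))            # 20 distinct sample points in R^3
K = gaussian_gram(X, sigma=1.0)

# Proposition 2.5 / eq. (2.16): every feature point has unit norm,
# ||phi(x_i)||^2 = k(x_i, x_i) = 1.
print("unit diagonal:", np.allclose(np.diag(K), 1.0))

# Theorem 2.6: the Gaussian Gram matrix of distinct points has full rank,
# hence it is positive definite (all eigenvalues > 0).
eigvals = np.linalg.eigvalsh(K)
print("rank:", np.linalg.matrix_rank(K), "of", len(X))
print("smallest eigenvalue:", eigvals.min())
```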

2.3 Conditionally Positive Semidefinite Kernels

Let us conclude the chapter with some notes on conditionally positive semidefinite kernels.

Definition 2.7 (Conditionally Positive Semidefinite Kernels of Order q) A symmetric kernel k : X × X → R is called conditionally positive semidefinite of order q on X ⊆ R^d if for any distinct points x_1, ..., x_n ∈ R^d the quadratic form satisfies

    Σ_{i,j=1}^n α_i α_j k(x_i, x_j) ≥ 0,   (2.18)

provided that the coefficients α_1, ..., α_n satisfy

    Σ_{i=1}^n α_i p(x_i) = 0   (2.19)

for all polynomials p(x) on R^d of degree lower than q. If equality in (2.18) only holds for α_1 = ... = α_n = 0 then it is called conditionally positive definite.

Note that (unconditional) positive definiteness is identical to conditional positive definiteness of order zero and that conditional positive definiteness of order q implies conditional positive definiteness of any larger order. Examples of conditionally positive definite radial kernels are:

    k(x_1, x_2) = (−1)^⌈β/2⌉ (c² + ‖x_1 − x_2‖²)^{β/2},   order q = ⌈β/2⌉,  β > 0,  β ∉ 2N   (2.20)
    k(x_1, x_2) = (−1)^{k+1} ‖x_1 − x_2‖^{2k} log ‖x_1 − x_2‖,   order q = k + 1,  k ∈ N   (2.21)

Conditionally positive semidefinite kernels of order larger than zero no longer directly define a dot product in some feature space, and geometric considerations of spaces induced by such kernels are not as straightforward as in the case of order zero. Thus they are not the focus of this thesis. However, their use in the context of shape reconstruction using radial basis functions will be discussed briefly in subsection 3.3.4.
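Definition 2.7 can also be tested numerically for the thin-plate spline kernel (2.21) with k = 1 (order q = 2): restricted to coefficients α satisfying (2.19) for all polynomials of degree < 2, the quadratic form (2.18) is nonnegative. The sketch below assumes NumPy; the function name, the random 2D points and the SVD-based null-space construction are my own illustrative choices, not part of the thesis.

```python
import numpy as np

def tps_kernel_matrix(X):
    """Thin-plate spline kernel (2.21) with k = 1: k(x, y) = ||x-y||^2 log ||x-y||,
    conditionally positive definite of order q = 2 (diagonal set to the limit 0)."""
    r2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    r = np.sqrt(r2)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0.0, r2 * np.log(r), 0.0)

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 2))
K = tps_kernel_matrix(X)

# Condition (2.19): sum_i alpha_i p(x_i) = 0 for all polynomials p of degree < 2,
# i.e. P^T alpha = 0 with P = [1, x, y].
P = np.column_stack((np.ones(len(X)), X))
_, s, Vt = np.linalg.svd(P.T)          # rows of Vt beyond rank(P) span null(P^T)
Z = Vt[P.shape[1]:].T                  # orthonormal basis of the constrained subspace

# Restricted quadratic form (2.18): Z^T K Z should be positive semidefinite.
eigvals = np.linalg.eigvalsh(Z.T @ K @ Z)
print("smallest eigenvalue on the constrained subspace:", eigvals.min())  # >= ~0
```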

Chapter 3

Surface Modeling with the SlabSVM

This chapter studies shape reconstruction using the Slab Support Vector Machine (SlabSVM). Section 3.1 introduces the SlabSVM and shows some interesting aspects and properties. As special cases of the SlabSVM, the OpenSlabSVM and the ZeroSlabSVM are presented in sections 3.2 and 3.3, together with some relations to shape reconstruction using radial basis functions as done in computer graphics, and geometric insights which will eventually lead to the algorithm introduced in chapter 4.

3.1 The SlabSVM

3.1.1 Problem Formulation

Assume we are given n sample points x_1, ..., x_n ∈ X as a sampling of an unknown shape, where usually but not necessarily X = R² or X = R³. We now want to reconstruct this shape by learning a function f(x), which implicitly defines the learned shape by one of its level-sets {x ∈ X : f(x) = c}. The SlabSVM was introduced in [4] as a kernel method for such implicit surface modeling. Without the use of outliers it can be stated as the optimization problem

    min_{w,ρ}  (1/2)‖w‖² − ρ   (3.1)
    s.t.  δ ≤ ⟨w, φ(x_i)⟩ − ρ ≤ δ*,  ∀i,   (3.2)

defining the following geometric setup (fig. 3.1): We first apply a feature map φ(x) in order to linearize the problem. We will later see that it will not be necessary to perform this transformation explicitly, since the kernel trick will be applicable (see chapter 2 for an introduction to these important concepts). The goal is then to find a slab, defined by two parallel hyperplanes orthogonal to the solution vector w, enclosing all the sample points

φ(x_i). The width of the slab has to be given in advance.

Figure 3.1: Setup SlabSVM (the slab is bounded by hyperplanes at distances (ρ+δ)/‖w‖ and (ρ+δ*)/‖w‖ from the origin).

Having found the solution vector w, we can then reconstruct the shape as a level-set of the function

    f(x) = ⟨w, φ(x)⟩.   (3.3)

In feature space, this represents a hyperplane parallel to those defining the slab. The level-set value is usually chosen such that this hyperplane lies within the slab, which holds for values in the interval [ρ + δ, ρ + δ*]. Also for the evaluation of the function f, the use of a kernel will be of great help, as shown later.

To solve the optimization problem (3.1)-(3.2) we compute the generalized Lagrangian function as

    L(w, ρ, α, α*) = (1/2)‖w‖² − ρ − Σ_{i=1}^n α_i (⟨w, φ(x_i)⟩ − ρ − δ) + Σ_{i=1}^n α_i* (⟨w, φ(x_i)⟩ − ρ − δ*).   (3.4)

The solution of our primal optimization problem is then equivalent to the one of the Lagrangian dual problem defined as

    max_{α,α*}  θ(α, α*)   (3.5)
    subject to  α ≥ 0   (3.6)
                α* ≥ 0,   (3.7)

where θ(α, α*) = inf_{w,ρ} L(w, ρ, α, α*). To compute the infimum of L with respect to the primal variables w and ρ we set the corresponding derivatives to zero

    ∂L(w, ρ, α, α*)/∂w = 0  ⟹  w = Σ_{i=1}^n (α_i − α_i*) φ(x_i)   (3.8)

    ∂L(w, ρ, α, α*)/∂ρ = 0  ⟹  Σ_{i=1}^n (α_i − α_i*) = 1   (3.9)

and use the resulting equations to replace the primal variables in L by functions of the dual variables α_i and α_i*. The dual problem can then be formulated as

    min_{α,α*}  (1/2) Σ_{i,j=1}^n (α_i − α_i*)(α_j − α_j*) ⟨φ(x_i), φ(x_j)⟩ − δ Σ_{i=1}^n α_i + δ* Σ_{i=1}^n α_i*   (3.10)
    s.t.  α^(*) ≥ 0   (3.11)
    and   Σ_{i=1}^n (α_i − α_i*) = 1.   (3.12)

Since the only operation performed on the feature points φ(x_i) is the scalar product, we can apply the kernel trick and replace it by kernel evaluations, which leads to the final problem formulation of the SlabSVM:

    min_{α,α*}  (1/2) Σ_{i,j=1}^n (α_i − α_i*)(α_j − α_j*) k(x_i, x_j) − δ Σ_{i=1}^n α_i + δ* Σ_{i=1}^n α_i*   (3.13)
    s.t.  α^(*) ≥ 0   (3.14)
    and   Σ_{i=1}^n (α_i − α_i*) = 1.   (3.15)

This is a good example of how the use of kernels saves us from having to compute the feature map φ explicitly, because it no longer appears in the optimization problem. Instead, the feature transformation is implicitly defined by the chosen kernel k. The result is a convex quadratic program, which can be solved using standard techniques.

3.1.2 The Solution of the SlabSVM

As mentioned earlier, the shape is reconstructed as a level-set of the function f (equation (3.3)). Since f too is defined only using scalar products, the use of kernels becomes possible here as well, and with equation (3.8) we get

    f(x) = ⟨w, φ(x)⟩   (3.16)
         = ⟨Σ_{i=1}^n (α_i − α_i*) φ(x_i), φ(x)⟩   (3.17)
         = Σ_{i=1}^n (α_i − α_i*) ⟨φ(x_i), φ(x)⟩   (3.18)

         = Σ_{i=1}^n (α_i − α_i*) k(x_i, x).   (3.19)

Thus, an explicit transformation into the feature space is at no point needed when reconstructing a shape with the SlabSVM, since we can perform all its computations directly in objective space.

Properties of the Solution Using a Gaussian Kernel

When we use a Gaussian kernel, the solution function f becomes

    f(x) = Σ_{i=1}^n (α_i − α_i*) exp(−‖x_i − x‖² / (2σ²)).   (3.20)

There is a Gaussian bell located at each sample point x_i contributing to the overall value of f. The kernel parameter σ controls the breadth of those bells, also called the support of the kernel. The dual variables α_i and α_i* specify the amount of the contribution of the bell located at sample point x_i to f. The reconstructed shape is then a level-set of this weighted sum of Gaussians. To study some interesting properties of such a sum, let us investigate the following example:

Example 3.1 (Circular shape in 2D) Given samples x_1, ..., x_n ∈ R², n ≥ 2, uniformly arranged on a circle with radius r around the origin (fig. 3.2):

    x_i = ( r cos(2π(i−1)/n), r sin(2π(i−1)/n) ),   where i = 1, ..., n.   (3.21)

To interpolate these points with a level-set of a function described by equation (3.20), we can choose uniform weights¹ (remember that they have to fulfill equation (3.15)):

    α_i − α_i* = 1/n.   (3.22)

Proof: Due to the symmetry of the setup, the function f(x) takes the same value at each sample point x_i, and therefore the shape defined by the level-set {x : f(x) = f(x_i)} for any i interpolates every sample point correctly. q.e.d.

The resulting function f is therefore

    f(x) = (1/n) Σ_{i=1}^n exp( −‖ ( r cos(2π(i−1)/n), r sin(2π(i−1)/n) ) − x ‖² / (2σ²) )   (3.23)

and only depends on the number of sample points n, the circle radius r and the kernel parameter σ.

¹ This would actually be the result of the ZeroSlabSVM applied to this setup, which is explained in section 3.3.
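A small numerical sketch of Example 3.1, assuming NumPy; the parameter values r = 6, n = 8 and σ = 2.5 follow figure 3.3 and the function name is mine. It evaluates (3.23) with uniform weights 1/n and checks that all samples share one function value, and it computes the ratio g = f((r,0))/f((0,0)) that distinguishes the valley-type from the hill-type case discussed next.

```python
import numpy as np

# Example 3.1: f from eq. (3.23) with uniform weights alpha_i - alpha_i^* = 1/n.
r, n, sigma = 6.0, 8, 2.5                         # illustrative values as in fig. 3.3
angles = 2.0 * np.pi * np.arange(n) / n
samples = r * np.column_stack((np.cos(angles), np.sin(angles)))

def f(x):
    """Weighted sum of Gaussians, eq. (3.23)."""
    d2 = ((samples - x) ** 2).sum(axis=1)
    return np.mean(np.exp(-d2 / (2.0 * sigma ** 2)))

# By symmetry, every sample lies on the same level set ...
values = np.array([f(p) for p in samples])
print("f at the samples:", values.round(6))

# ... and g > 1 indicates the valley-type f with an additional inner ring.
g = f(samples[0]) / f(np.zeros(2))
print("g(n, r/sigma) =", g)
```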

Figure 3.2: Points uniformly distributed on a circle.

Figure 3.3 shows the plot of such a function for some values of the remaining parameters. Considering the level-sets of f for different parameter values, two cases can be observed:

1. The function f decreases again and forms a valley in the middle of the circle, causing a second (inner) ring to appear in the level-set (figures 3.3(a)-3.3(d)).

2. The function does not sink below the level-set value again in the middle of the circle and thus only the outer, point-interpolating ring lies in the level-set (figures 3.3(e)-3.3(f)).

Restating f as

    f(x) = (1/n) Σ_{i=1}^n exp( −(r²/(2σ²)) ‖ ( cos(2π(i−1)/n), sin(2π(i−1)/n) ) − x/r ‖² ),   (3.24)

one can easily see that σ has the exact inverse influence on f that r has, when with each change of r we rescale the evaluation points x such that they occupy the same relative position with respect to the circle. Thus, for the qualitative description of f, increasing r has the same effect as decreasing σ. In order to investigate the two observed cases in a more mathematical manner, let us compare the value of f at a sample point with the value of f at the origin (the

Figure 3.3: The function f with r = 6, n = 8 for different values of σ ((a) σ = 2.5, (c) σ = 3.5, (e) σ = 5) and the corresponding reconstructed shapes ((b), (d), (f)).

center of the circle). At the center, f becomes

    f((0,0)) = (1/n) Σ_{i=1}^n exp( −‖ ( r cos(2π(i−1)/n), r sin(2π(i−1)/n) ) ‖² / (2σ²) )   (3.25)
             = (1/n) Σ_{i=1}^n exp(−r²/(2σ²))   (3.26)
             = exp(−r²/(2σ²)).   (3.27)

Since f takes on the same value for every sample point, we are free to choose a point, for example x_1 = (r, 0):

    f((r,0)) = (1/n) Σ_{i=1}^n exp( −(r²/(2σ²)) ‖ ( cos(2π(i−1)/n), sin(2π(i−1)/n) ) − (1, 0) ‖² )   (3.28)
             = (1/n) Σ_{i=1}^n exp( −(r²/(2σ²)) ( 2 − 2cos(2π(i−1)/n) ) )   (3.29)
             = (1/n) exp(−r²/(2σ²)) Σ_{i=1}^n exp( (r²/(2σ²)) ( 2cos(2π(i−1)/n) − 1 ) )   (3.30)
             = f((0,0)) · (1/n) Σ_{i=1}^n exp( (1/2)(r/σ)² ( 2cos(2π(i−1)/n) − 1 ) )   (3.31)
             =: f((0,0)) · g(n, r/σ).   (3.32)

We derived that f((r,0)) has the value f((0,0)) multiplied by a factor g which is a function of n and r/σ. When g is larger than one, the value of f at a sample point is higher than at the origin, which leads to the first case mentioned above, where f forms a valley in the center and there are two level-sets. In the case of g being smaller than one, f will not decrease below the level-set value again when moving from the rim to the center of the circle. In figure 3.4 the function g is plotted for different numbers of sample points n. When n goes to infinity, g asymptotically becomes

    lim_{n→∞} g(n, r/σ) = (1/(2π)) ∫_0^{2π} exp( (1/2)(r/σ)² ( 2cos(ω) − 1 ) ) dω,   (3.33)

which is also plotted in figure 3.4. From the plots we can learn that increasing r/σ will increase the value of f((r,0)) with respect to f((0,0)). On the other hand,

when r/σ lies in the range between 0 and about 1 to 1.7 (depending on n), f((r,0)) will be less than f((0,0)).

Figure 3.4: The function g as a function of r/σ for different values of n (n = 2, 3, 4, 5 and n → ∞).

Thus, increasing σ or decreasing r will change f from a valley-type structure into more of a hill-type one and will make the second, inner ring disappear from the level-set. It will also smoothen the outer ring of the level-set.

Let us conclude this example with an analysis of one last question: We saw that for a certain range of r/σ, f gets a hill-type structure and the inner ring observed earlier (fig. 3.3) disappears from our level-set. Is it also possible to increase r/σ enough such that the inner and outer rings of our level-set coincide? In the qualitative geography of f this would mean that the level-set lies exactly on the ridge of the hills arranged around the circle and forming the valley in its center (figures 3.3(a) and 3.3(c)). To investigate this matter we form the gradient of f at an arbitrary sample point, for example at x_1 = (r, 0):

    ∂f(x)/∂[x]_1 |_{x=(r,0)} = (r/(nσ²)) Σ_{i=1}^n ( cos(2π(i−1)/n) − 1 ) exp( (r²/σ²) ( cos(2π(i−1)/n) − 1 ) )   (3.34)

    ∂f(x)/∂[x]_2 |_{x=(r,0)} = (r/(nσ²)) Σ_{i=1}^n sin(2π(i−1)/n) exp( (r²/σ²) ( cos(2π(i−1)/n) − 1 ) ).   (3.35)

Due to the symmetry of the problem and because n ≥ 2, equation (3.35) will always be zero. Since exp(x) > 0 and

    cos(2π(i−1)/n) − 1 ≤ 0,   i = 1, ..., n,   (3.36)

where equality only holds for i = 1, equation (3.34) will always be less than zero for n ≥ 2. Therefore the gradient always points to the center of the circle and has nonzero length. This proves that f cannot have a maximum at any sample point and thus the case that the level-set lies exactly on the ridge of f cannot occur. Equation (3.34) also tells us that for very high values of r/σ the gradient will be very small, so in any numerical setup this case becomes possible. However, when we increase r/σ too much with respect to the number of sample points, the learned shape will fall apart into several subshapes (fig. 3.5), which is usually not a satisfactory result.

Figure 3.5: For σ too small relative to r (here σ = 2.3), the reconstructed shape falls apart.

This extensive example reveals quite a number of interesting properties of the solution as a level-set of a weighted sum of Gaussians. Of course all the computations only hold for a uniform sampling of circles, but they provide insights and intuitions very useful for understanding the solutions for arbitrary shapes. We learned for example that for certain parameter settings the resulting level-set can

contain additional shapes (such as the additional inner ring in the example) or even disintegrate, which we actually wish to avoid. This can be done by increasing the value of σ and changing f from a valley-type to a hill-type structure. Also, for algorithmically finding the level-set (e.g. in a rendering process) such a hill-type structure would be preferable, since level-sets are located more reliably when the function is steep around the iso-value. Unfortunately, as will be discussed later, increasing σ usually introduces severe numerical difficulties and thus reduces the quality of the reconstructed shape.

3.1.3 The Geometry of the SlabSVM

Until now we have not yet studied the geometric implications for the SlabSVM when using feature spaces induced by specific kernels, such as the Gaussian kernel. To do so, we will first consider some interesting special cases of the SlabSVM in the following sections. We will then return to the SlabSVM in section 3.4 and try to incorporate the insights from these special cases into a further analysis of the SlabSVM itself.

3.2 The OpenSlabSVM

3.2.1 Problem Formulation

In section 3.1 we defined the SlabSVM as finding a slab such that all sample points in feature space lie within this slab and its distance to the origin is maximal. The special case presented in the current section is the one where the slab width is set to infinity. We therefore have only one remaining hyperplane to consider, for which we request that all the sample points lie on its side not containing the origin. Again, its distance to the origin is maximized (fig. 3.6).

Figure 3.6: Setup OpenSlabSVM

We therefore want to solve the following optimization problem:

    min_{w,ρ}  (1/2)‖w‖² − ρ   (3.37)
    s.t.  ⟨w, φ(x_i)⟩ ≥ ρ,  ∀i.   (3.38)

As before, the sample points in objective space are denoted x_1, ..., x_n ∈ X and φ : X → Y is a feature map into some feature space. In machine learning, this problem is also called the OneClassSVM or SingleClassSVM [1]. To solve the problem, we can again form the generalized Lagrangian

    L(w, ρ, α) = (1/2)‖w‖² − ρ − Σ_{i=1}^n α_i (⟨w, φ(x_i)⟩ − ρ),   (3.39)

set its derivatives to zero

    ∂L(w, ρ, α)/∂w = 0  ⟹  w = Σ_{i=1}^n α_i φ(x_i)   (3.40)
    ∂L(w, ρ, α)/∂ρ = 0  ⟹  Σ_{i=1}^n α_i = 1   (3.41)

and state the dual problem

    min_α  (1/2) Σ_{i,j=1}^n α_i α_j ⟨φ(x_i), φ(x_j)⟩   (3.42)
    s.t.  α ≥ 0   (3.43)
    and   Σ_{i=1}^n α_i = 1.   (3.44)

Again, we can use the kernel trick to replace the scalar products, which ultimately leads to the problem

    min_α  (1/2) Σ_{i,j=1}^n α_i α_j k(x_i, x_j)   (3.45)
    s.t.  α ≥ 0   (3.46)
    and   Σ_{i=1}^n α_i = 1.   (3.47)

As for the SlabSVM, we have found a convex quadratic program to solve the OpenSlabSVM.
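A minimal sketch of solving the dual (3.45)-(3.47) numerically, assuming NumPy and SciPy; the sample data, the kernel width and the use of the SLSQP solver are illustrative choices of mine, and any quadratic programming solver could be substituted.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_gram(X, sigma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Illustrative 2D samples, roughly on a circle of radius 6.
rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0, 2 * np.pi, 12))
X = 6.0 * np.column_stack((np.cos(t), np.sin(t)))
K = gaussian_gram(X, sigma=2.5)
n = len(X)

# Dual of the OpenSlabSVM, eqs. (3.45)-(3.47):
#   min_alpha 1/2 alpha^T K alpha   s.t.  alpha >= 0,  sum(alpha) = 1.
res = minimize(
    fun=lambda a: 0.5 * a @ K @ a,
    x0=np.full(n, 1.0 / n),
    jac=lambda a: K @ a,                       # gradient of the quadratic objective
    bounds=[(0.0, None)] * n,
    constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}],
    method="SLSQP",
)
alpha = res.x
print("sum(alpha) =", alpha.sum(), " min(alpha) =", alpha.min())
```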

Figure 3.7: Solution of an OpenSlabSVM.

3.2.2 The Solution of the OpenSlabSVM

The solution of the OpenSlabSVM is the hyperplane defined by

    ⟨w, φ(x)⟩ = ρ,   (3.48)

fulfilling the conditions stated in equations (3.37)-(3.38). In objective space, the points x for which (3.48) holds will not necessarily form a hyperplane anymore. They will rather define an arbitrary nonlinear shape enclosing the sample points x_i (fig. 3.7). Using (3.40) this shape is defined as a level-set of the function

    f(x) = ⟨Σ_{i=1}^n α_i φ(x_i), φ(x)⟩   (3.49)
         = Σ_{i=1}^n α_i ⟨φ(x_i), φ(x)⟩   (3.50)
         = Σ_{i=1}^n α_i k(x_i, x).   (3.51)

Note that the use of kernels here, and for the quadratic program in the last subsection, once more saved us from having to compute any feature transformation directly. The level-set value is ρ, which can be computed by

    ρ = min_i f(x_i),   (3.52)

since it is the lower bound of the inequality (3.38) and because there will always be a sample point where equality holds. Using a Gaussian kernel, f will again be a weighted sum of Gaussians, as discussed in subsection 3.1.2.
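Continuing the hypothetical sketch after (3.47), and reusing the illustrative X, alpha and σ = 2.5 defined there, the solution function (3.51) and the level-set value (3.52) can be evaluated as follows.

```python
import numpy as np  # X, alpha and sigma = 2.5 are reused from the previous sketch

def f(x, X=X, alpha=alpha, sigma=2.5):
    """Solution function (3.51): f(x) = sum_i alpha_i k(x_i, x) for the Gaussian kernel."""
    d2 = ((X - x) ** 2).sum(axis=1)
    return alpha @ np.exp(-d2 / (2.0 * sigma ** 2))

# Level-set value (3.52): the reconstructed shape is the level set {x : f(x) = rho},
# and every sample satisfies f(x_i) >= rho.
rho = min(f(x_i) for x_i in X)
print("level-set value rho =", rho)
```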

Figure 3.8: Geometric setup of the OpenSlabSVM in Gaussian feature space with three sample points.

3.2.3 The Geometry of the OpenSlabSVM Using the Gaussian Kernel

When we combine equation (3.40) and the quadratic program obtained at the end of subsection 3.2.1, we get another optimization problem for the OpenSlabSVM, namely

    min_w  (1/2)‖w‖²   (3.53)
    s.t.  α ≥ 0   (3.54)
          Σ_i α_i = 1   (3.55)
          w = Σ_i α_i φ(x_i).   (3.56)

This alternative formulation reveals some nice geometric insights into the solution w in feature space (fig. 3.8):

- Equation (3.56) tells us that w is a linear combination of the feature points φ(x_i).
- The constraints (3.54) and (3.55) on the coefficients α_i of this linear combination narrow the solution region of w to points in the convex hull of the feature points φ(x_i).
- Because of the objective function (3.53), w will be the point closest to the origin in this convex hull.

There is another interesting geometric property of the OpenSlabSVM when the feature map φ is defined implicitly by a Gaussian kernel:

Proposition 3.2 (OpenSlabSVM - MiniBall Analogy) In the feature space induced by a Gaussian kernel, solving the OpenSlabSVM (which means finding the point w ∈ ConvexHull({φ(x_i)}) closest to the origin) is equivalent to finding the center w of the minimal enclosing ball of the points φ(x_i). In fact, this analogy even holds for positive semidefinite RBF kernels in general.

Proof: The validity of this geometric analogy could actually be read off directly from the implications of proposition 2.5 on the geometric setup. For a mathematical proof, let us state the problem of finding the center of the minimal enclosing ball of the points φ(x_i) as

    min_{R,w}  R²   (3.57)
    s.t.  ‖φ(x_i) − w‖² ≤ R²,  ∀i.   (3.58)

We can express the squared norm in (3.58) by dot products

    ‖φ(x_i) − w‖² = ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩.   (3.59)

As done before, we state the generalized Lagrangian as

    L(w, R, α) = R² − R² Σ_{i=1}^n α_i + Σ_{i=1}^n α_i ( ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩ ),   (3.60)

set its partial derivatives with respect to the primal variables to zero

    ∂L(w, R, α)/∂R = 0  ⟹  Σ_{i=1}^n α_i = 1   (3.61)
    ∂L(w, R, α)/∂w = 0  ⟹  w = Σ_{i=1}^n α_i φ(x_i),   (3.62)

use the resulting equations to substitute the primal variables and introduce a kernel to compute the dot product. The resulting dual problem then becomes:

    min_α  Σ_{i,j=1}^n α_i α_j k(x_i, x_j) − Σ_{i=1}^n α_i k(x_i, x_i)   (3.63)
    s.t.  α ≥ 0   (3.64)
    and   Σ_{i=1}^n α_i = 1.   (3.65)

Because of proposition 2.5, using an RBF kernel renders the linear term in the objective function (3.63) constant, and it is easy to see that as a consequence the MiniBall problem stated in (3.63)-(3.65) and its solution (3.62) become equivalent to the OpenSlabSVM defined in (3.45)-(3.47) and (3.40). q.e.d.

Taking these geometric insights into account, an alternative approach to the OpenSlabSVM could be a geometric algorithm solving the corresponding MiniBall problem, which is well studied in theoretical computer science (e.g. [5]).

3.3 The ZeroSlabSVM

3.3.1 Problem Formulation

In the last section we presented the OpenSlabSVM as the special case of the SlabSVM where the slab width is set to infinity. Let us now consider the other extreme case, reducing the slab width to zero, and call that case the ZeroSlabSVM. It can be formulated as finding a hyperplane orthogonal to w ∈ Y which exactly interpolates the sample points φ(x_1), ..., φ(x_n) ∈ Y and has maximal distance to the origin (fig. 3.9). Because of the nonlinear feature map φ : X → Y, the samples x_1, ..., x_n ∈ X in objective space do not necessarily have to form a linear shape. In the context of geometric modeling and shape reconstruction, the ZeroSlabSVM seems more interesting than the OpenSlabSVM, since the learned shape will interpolate the given sample points exactly. The problem of the ZeroSlabSVM can be stated as

    min_{w,ρ}  (1/2)‖w‖² − ρ   (3.66)
    s.t.  ⟨w, φ(x_i)⟩ = ρ,  ∀i.   (3.67)

Figure 3.9: Setup ZeroSlabSVM

Note that the only difference from the OpenSlabSVM is the equality in (3.67) instead of the inequality in (3.38). Because of this, only equality constraints are involved

and solving the problem with Lagrangian optimization theory becomes easier: necessary and sufficient conditions for the solution are that the partial derivatives of the Lagrangian function be zero. Thus:

    L(w, ρ, α) = (1/2)‖w‖² − ρ − Σ_{i=1}^n α_i (⟨w, φ(x_i)⟩ − ρ)   (3.68)

    ∂L(w, ρ, α)/∂w = 0  ⟹  w = Σ_{i=1}^n α_i φ(x_i)   (3.69)
    ∂L(w, ρ, α)/∂ρ = 0  ⟹  Σ_{i=1}^n α_i = 1   (3.70)
    ∂L(w, ρ, α)/∂α_j = 0  ⟹  ⟨w, φ(x_j)⟩ = ρ   (3.71)
                         ⟹  Σ_{i=1}^n α_i ⟨φ(x_i), φ(x_j)⟩ = ρ,  ∀j.   (3.72)

By introducing the variable transformation

    β_i := α_i / ρ   (3.73)

and replacing the scalar product with a kernel, we get the new system of equations

    w = Σ_i α_i φ(x_i)   (3.74)
    α = ρβ   (3.75)
    Σ_{i=1}^n β_i = 1/ρ   (3.76)
    Σ_{i=1}^n β_i k(x_i, x_j) = 1,  ∀j.   (3.77)

The most important equation is (3.77), since it lets us compute the coefficients β_i. Using equations (3.74) to (3.76) we can then find values for α_i, ρ and w. Consequently, solving the ZeroSlabSVM is reduced to solving the system of linear equations (3.77), which can be rewritten in matrix notation as

    Kβ = 1,   (3.78)

where K_ij = k(x_i, x_j). Of course, in order to find a unique feasible solution, the matrix K has to have full rank. Because of theorem 2.6 we know that this holds, for example, when using the Gaussian kernel. With kernels of inferior rank, the ZeroSlabSVM will not be

solvable (except with the trivial solution w = 0²). This can also be understood from the geometry of the kernel induced feature space: The ZeroSlabSVM tries to interpolate the endpoints of n vectors, within the space spanned by these vectors, by a hyperplane. In general this is only possible if the vectors are linearly independent, i.e., their endpoints are affinely independent and define a hyperplane. A necessary and sufficient condition for the linear independence of the samples in a kernel induced feature space is the full rank of its kernel.

3.3.2 The Solution of the ZeroSlabSVM

The solution of the ZeroSlabSVM is the hyperplane defined by

    ⟨w, φ(x)⟩ = ρ,   (3.79)

interpolating the sample points φ(x_1), ..., φ(x_n) in feature space. In objective space, this hyperplane corresponds to a more complex shape interpolating the original sample points x_1, ..., x_n. Using (3.69) it can be computed as the level-set of the function

    f(x) = ⟨Σ_{i=1}^n α_i φ(x_i), φ(x)⟩   (3.80)
         = Σ_{i=1}^n α_i ⟨φ(x_i), φ(x)⟩   (3.81)
         = Σ_{i=1}^n α_i k(x_i, x).   (3.82)

Once more the use of kernels, here and for the linear equation system in the last subsection, saved us from having to compute any feature transformation explicitly when solving the ZeroSlabSVM, since all computations can be performed directly in objective space. The level-set value is ρ, which can be computed for example with (3.76). Using a Gaussian kernel, f will again be a weighted sum of Gaussians as discussed in subsection 3.1.2.

The solution (3.22) suggested in example 3.1 was in fact the solution of the ZeroSlabSVM applied to the 2D-circle scenario, since the resulting shape interpolated all the points exactly. It should now be clear that when the setup is as symmetric as it was in example 3.1, the rows of the matrix K are simply permutations of each other and thus choosing the weights uniformly will solve the system of linear equations (3.78).

² In this case the variable transformation (3.73) is no longer valid and (3.78) does not hold anymore, since ρ = 0.
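A small sketch of the direct solution route, assuming NumPy; the sample shape and kernel width are made up for illustration and the helper name is mine. It solves Kβ = 1 (eq. (3.78)), recovers ρ and α from (3.75)-(3.76), checks the interpolation property, and also verifies numerically the center-of-ball picture of proposition 3.3 (introduced in the next subsection): all feature points are equidistant from w, which can be checked purely through kernel evaluations.

```python
import numpy as np

def gaussian_gram(X, sigma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Illustrative samples of a closed 2D curve and an illustrative kernel width.
rng = np.random.default_rng(4)
t = np.sort(rng.uniform(0, 2 * np.pi, 30))
X = np.column_stack(((6 + 0.3 * np.sin(3 * t)) * np.cos(t),
                     (6 + 0.3 * np.sin(3 * t)) * np.sin(t)))
sigma = 2.0
K = gaussian_gram(X, sigma)

# ZeroSlabSVM as the linear system (3.78), then (3.75)-(3.76).
beta = np.linalg.solve(K, np.ones(len(X)))
rho = 1.0 / beta.sum()
alpha = rho * beta

# Interpolation: f(x_j) = sum_i alpha_i k(x_i, x_j) equals rho at every sample.
print("max |f(x_j) - rho|:", np.abs(K @ alpha - rho).max())

# Equidistance in feature space: ||phi(x_j) - w||^2 = 1 - 2 f(x_j) + alpha^T K alpha.
dist2 = 1.0 - 2.0 * (K @ alpha) + alpha @ K @ alpha
print("squared distances to w:", dist2.min(), dist2.max())   # numerically equal
```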

3.3.3 The Geometry of the ZeroSlabSVM Using the Gaussian Kernel

Let us consider an approach to the problem of the ZeroSlabSVM (3.66)-(3.67) slightly different from the one in the last subsection. Again using Lagrangian optimization theory and equations (3.68), (3.69) and (3.70), we can formulate its dual problem:

    min_w  (1/2)‖w‖²   (3.83)
    s.t.  Σ_i α_i = 1   (3.84)
          w = Σ_i α_i φ(x_i).   (3.85)

As for the OpenSlabSVM, this alternative formulation leads to some interesting insights into the geometry of the ZeroSlabSVM. In fact, if we compare (3.83)-(3.85) with (3.53)-(3.56) from subsection 3.2.3, they look surprisingly similar. The only difference is the missing nonnegativity constraint (3.54) on the dual variables in the case of the ZeroSlabSVM. Therefore the only consequence of changing the inequality of the OpenSlabSVM in (3.38) to an equality for the ZeroSlabSVM in (3.67) is the loss of this nonnegativity constraint. What are the geometric implications of this observation? The alternative formulation of the ZeroSlabSVM stated above gives rise to the following geometric properties (fig. 3.10):

- Because of (3.85), w is a linear combination of the feature points φ(x_i).
- (3.84) restricts w to lie in the affine hull of the feature points φ(x_i) (compared to the convex hull for the OpenSlabSVM).
- Because of the objective function (3.83), w will be the point closest to the origin. Therefore it will be the projection of the origin onto the affine hull of {φ(x_i)}.

In subsection 3.2.3 we proved the analogy between the OpenSlabSVM and the MiniBall problem when a Gaussian kernel is used. For the ZeroSlabSVM, a very similar statement can be made:

Proposition 3.3 (ZeroSlabSVM - Center of Ball Analogy) In the feature space induced by a Gaussian kernel, solving the ZeroSlabSVM (which means finding the point w ∈ AffineHull({φ(x_i)}) closest to the origin) is equivalent to finding the center w of the ball with the smallest radius interpolating the points φ(x_i). This analogy also holds for positive semidefinite RBF kernels in general.

Proof: Compared to proposition 3.2, proposition 3.3 is even easier to understand by just combining the above geometric insights with proposition 2.5, or simply by looking at figure 3.10.

Figure 3.10: Geometric setup of the ZeroSlabSVM in Gaussian feature space with three sample points.

The mathematical proof is very similar to the one for the

OpenSlabSVM: The problem of finding the center of the ball with the smallest radius interpolating the points φ(x_i) can be formulated as

    min_{R,w}  R²   (3.86)
    s.t.  ‖φ(x_i) − w‖² = R²,  ∀i.   (3.87)

We can again express the squared norm in (3.87) by dot products

    ‖φ(x_i) − w‖² = ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩.   (3.88)

Stating the Lagrangian function

    L(w, R, α) = R² − R² Σ_{i=1}^n α_i + Σ_{i=1}^n α_i ( ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩ ),   (3.89)

setting its partial derivatives to zero

    ∂L(w, R, α)/∂R = 0  ⟹  Σ_{i=1}^n α_i = 1   (3.90)
    ∂L(w, R, α)/∂w = 0  ⟹  w = Σ_{i=1}^n α_i φ(x_i)   (3.91)

    ∂L(w, R, α)/∂α_j = 0  ⟹  2 Σ_{i=1}^n α_i ⟨φ(x_i), φ(x_j)⟩ = Σ_{s,t=1}^n α_s α_t ⟨φ(x_s), φ(x_t)⟩ + ⟨φ(x_j), φ(x_j)⟩ − R²,  ∀j,   (3.92)

and introducing a kernel to compute the dot product, we get the necessary and sufficient conditions:

    w = Σ_i α_i φ(x_i)   (3.93)
    Σ_{i=1}^n α_i = 1   (3.94)
    Σ_{i=1}^n α_i k(x_i, x_j) = (1/2) ( Σ_{s,t=1}^n α_s α_t k(x_s, x_t) + k(x_j, x_j) − R² ) =: c_j,  ∀j.   (3.95)

Because of proposition 2.5, using an RBF kernel renders c_j in (3.95) constant for all j, and it is easy to see that as a consequence the center-of-ball problem stated with (3.93)-(3.95) becomes equivalent to the ZeroSlabSVM defined in (3.69)-(3.71), where c_j corresponds to ρ. q.e.d.

3.3.4 The ZeroSlabSVM and Shape Reconstruction Using Radial Basis Functions

A well known technique in computer graphics for implicit surface modeling is shape reconstruction with radial basis functions ([6],[7]). In this context a radial basis function with respect to the sample points x_1, ..., x_n ∈ R^d is a function of the form

    f(x) = p(x) + Σ_{i=1}^n α_i g(‖x − x_i‖),   (3.96)

where p is a polynomial and the basic function g is a real valued function on [0, ∞). Usually, g is a conditionally positive definite RBF kernel k,

    g(‖x − x_i‖) = k(x, x_i),   (3.97)

of order q (see section 2.3), and p is a polynomial of degree lower than q. We will now show a connection between shape reconstruction with such radial basis functions and the ZeroSlabSVM. Given a set of sample points x_1, ..., x_n ∈ R^d, the

problem of shape reconstruction using radial basis functions with a conditionally positive definite RBF kernel of order q is formulated as the interpolation problem

    f(x_i) = y_i,  ∀i,   (3.98)
    Σ_{i=1}^n α_i s(x_i) = 0  for all polynomials s of degree smaller than q,   (3.99)

where the y_i are given values for f at the sample points x_i, for example y_i = 0 for on-surface points and y_i = ε_i ≠ 0 for off-surface points. The reconstructed shape is then implicitly defined by f as one of its level-sets. Let k be a conditionally positive definite RBF kernel, let {p_1, ..., p_l} be a basis for the polynomials of degree smaller than q, and let

    p(x) = Σ_{i=1}^l c_i p_i(x).   (3.100)

Then the interpolation problem (3.98)-(3.99) can be written as the following linear system:

    ( K    P ) ( α )   ( y )
    ( Pᵀ   0 ) ( c ) = ( 0 ),   (3.101)

where

    K_ij = k(x_i, x_j),   i, j = 1, ..., n,   (3.102)
    P_ij = p_j(x_i),      i = 1, ..., n;  j = 1, ..., l.   (3.103)

When the used kernel is even positive definite (as for example the Gaussian kernel) and thus has order q = 0, the polynomial p is not required anymore and the linear system becomes

    Kα = y.   (3.104)

Note that this is, up to a multiplicative factor, exactly the system of linear equations we derived for the ZeroSlabSVM in subsection 3.3.1 (yet we did not use off-surface points for the ZeroSlabSVM). While in computer graphics these equations were stated directly as an interpolation problem, it is interesting to see that for certain kernels they can also be derived as a special case of the SlabSVM.

3.3.5 Solving the ZeroSlabSVM

In the last few subsections we have presented different ways of approaching the ZeroSlabSVM. An obvious way to find its solution is to solve the linear equation system (3.78) using standard techniques. However, the Gram matrix K is notoriously ill conditioned, especially for a high number of data points (n > 1000). Its condition also strongly depends on the kernel parameter(s) and the used kernel. In the case of the Gaussian kernel, increasing σ typically leads to a very poor condition. Choosing σ too small, on the other hand, can decrease the quality of the

reconstructed shape. For reconstructions with radial basis functions in computer graphics, the multipole method ([6],[7]) is used to deal with this problem. It solves the problem approximately with quite some success, but is not easy to implement. In this thesis we present a different approach for the case when the Gaussian kernel (or any other full-rank RBF kernel) is used: exploiting the geometric insights gained in subsection 3.3.3, we introduce a purely geometric algorithm which approximately finds the solution w as the center of the ball with the smallest radius interpolating the points φ(x_1), ..., φ(x_n) in feature space. This algorithm is discussed in detail in chapter 4 and is a good example of how the geometric analysis of kernel methods can lead to new solution strategies, approaching the problem from a different angle.

3.4 The Geometry of the SlabSVM Revisited

So far we have studied in detail the two extreme cases of the SlabSVM, their geometric properties and the geometric implications when a Gaussian kernel is used. Let us now conclude this chapter by gaining some understanding of the geometry of the SlabSVM with a Gaussian kernel itself. Equations (3.8) and (3.9) tell us that, independent of the chosen slab width, the solution w will always lie in the affine hull of the samples φ(x_1), ..., φ(x_n) in feature space. We have seen that it is the point w_OpenSlab closest to the origin within the convex hull of those points when the slab width goes to infinity, and that it is the point w_ZeroSlab closest to the origin within the whole affine hull when the slab width is set to zero. Thus, solutions for slab widths between those extreme cases must lie on some parametrized curve starting at w_OpenSlab and leading to w_ZeroSlab. An open issue is the formal description of this curve. For only three sample points, this curve is a straight line (fig. 3.11). It is possible that arguments similar to those in [8] apply here, proving that the whole solution path of the SlabSVM - from w_OpenSlab to w_ZeroSlab - is piecewise linear in the slab width. However, this is only an intuition and remains to be proven.

It is not surprising that an analogy in the spirit of propositions 3.2 and 3.3 can also be found for the SlabSVM:

Proposition 3.4 (SlabSVM - Spherical Slab Analogy) In the feature space induced by a Gaussian kernel, solving the SlabSVM (which means finding the point w ∈ AffineHull({φ(x_i)}) fulfilling (3.1)-(3.2)) is equivalent to finding the center w of the spherical slab with the smallest radius and a given slab width, defined by two concentric hyperballs such that the points φ(x_i) lie outside the inner and inside the outer ball. This analogy also holds for positive semidefinite RBF kernels in general.

Figure 3.11: In the feature space induced by a Gaussian kernel and with only three points, the solutions w of the SlabSVM for different slab widths lie on a straight line between the solutions w_OpenSlab and w_ZeroSlab of its extreme cases.

Proof: The problem of finding the center of the spherical slab with the smallest radius for the points φ(x_i) can be stated as

    min_{R,w}  R²   (3.105)
    s.t.  R² + ε* ≤ ‖φ(x_i) − w‖² ≤ R² + ε,  ∀i.   (3.106)

As done before, we can express the squared norm in (3.106) by dot products

    ‖φ(x_i) − w‖² = ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩,   (3.107)

state the generalized Lagrangian

    L(w, R, α, α*) = R² + Σ_{i=1}^n α_i ( ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩ − R² − ε )
                        − Σ_{i=1}^n α_i* ( ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩ − R² − ε* ),   (3.108)

set its partial derivatives with respect to the primal variables to zero

    ∂L(w, R, α, α*)/∂R = 0  ⟹  Σ_{i=1}^n (α_i − α_i*) = 1   (3.109)

    ∂L(w, R, α, α*)/∂w = 0  ⟹  w Σ_{i=1}^n (α_i − α_i*) = Σ_{i=1}^n (α_i − α_i*) φ(x_i)   (3.110)
                            ⟹  w = Σ_{i=1}^n (α_i − α_i*) φ(x_i),   (3.111)

use the resulting equations to substitute the primal variables and introduce a kernel to compute the dot product. The resulting dual problem then becomes:

    min_{α,α*}  Σ_{i,j=1}^n (α_i − α_i*)(α_j − α_j*) k(x_i, x_j)   (3.112)
                + ε Σ_{i=1}^n α_i − ε* Σ_{i=1}^n α_i*   (3.113)
                − Σ_{i=1}^n (α_i − α_i*) k(x_i, x_i)   (3.114)
    s.t.  α^(*) ≥ 0   (3.115)
    and   Σ_{i=1}^n (α_i − α_i*) = 1.   (3.116)

Due to proposition 2.5, using an RBF kernel renders the linear term (3.114) constant, and the problem stated with (3.112)-(3.116), together with its solution (3.111), becomes equivalent to the SlabSVM stated in (3.13)-(3.15) and (3.8), where the slab parameters are related by ε^(*) = −2δ^(*). q.e.d.

Chapter 4

Center-of-Ball Approximation Algorithm

Proposition 3.3 stated that for positive semidefinite RBF kernels, solving the ZeroSlabSVM is equivalent to finding the center of the ball with the smallest radius interpolating the points in the kernel induced feature space. In this chapter we introduce a geometric algorithm to approximate this center, which exploits the geometric insights into the problem gained in subsection 3.3.3. The algorithm thus computes the point w with minimal but equal distance to n sample points φ_1, ..., φ_n in a Hilbert space, which all have the same distance from the origin. w is also the projection of the origin onto the affine hull of the samples {φ_i}.

Section 4.1 first presents the algorithm by stating its basic steps in subsection 4.1.1, explaining their geometric role in subsection 4.1.2, and eventually investigating some implementation details in subsection 4.1.3, refining the algorithm to its final form. In section 4.2 it is shown how the algorithm can be computed directly in objective space when using kernels, along with implications of using the Gaussian kernel, interesting remarks, observations and open questions. Section 4.3 concludes the chapter with some results and examples.

4.1 The Algorithm

4.1.1 Basic Two-Phase Iteration Algorithm

Given: n points φ_1, ..., φ_n with equal distance r to the origin.

Goal: Find the center w of the ball with the smallest radius interpolating the sample points.

Remarks: We can express any point p in the affine hull of {φ_i} as a linear combination

    p = Σ_{i=1}^n α_i φ_i,   where  Σ_{i=1}^n α_i = 1.   (4.1)

Since the points are equidistant to the origin, w can be viewed as the projection of the origin onto the affine hull of {φ_i}.

Definitions: In each iteration step k, the current approximation of the center of the ball is described by

    c_k = Σ_{i=1}^n α_i^k φ_i,   Σ_{i=1}^n α_i^k = 1.   (4.2)

Initialization (k = 0):

    α_i^0 = 1/n,  ∀i.   (4.3)

Iteration step (k → k + 1):

Phase 1: choose φ_{s_k}. In each step we find

    s_k = arg max_j  ⟨φ_j − c_k, −c_k⟩ / ‖φ_j − c_k‖.   (4.4)

Phase 2: update c_k → c_{k+1}. Then we update

    c_{k+1} = c_k + ( ⟨φ_{s_k} − c_k, −c_k⟩ / ‖φ_{s_k} − c_k‖² ) (φ_{s_k} − c_k)   (4.5)
            =: c_k + f_k (φ_{s_k} − c_k).   (4.6)

For the coefficients α_i this means

    α_{s_k}^{k+1} = (1 − f_k) α_{s_k}^k + f_k   (4.7)
    α_{i≠s_k}^{k+1} = (1 − f_k) α_i^k.   (4.8)

4.1.2 Geometric Interpretation

The current approximation of the center in each iteration is c_k. It lies in the affine hull AH of {φ_i}, since we have chosen 1/n as the initial value for α_i^0 and the update rules (4.7)-(4.8) thus never violate Σ_{i=1}^n α_i^{k+1} = 1. The solution w also lies in AH, being the projection of the origin onto AH. We now want to move to a point c_{k+1} closer to the center w. For this we consider the directions (φ_j − c_k) defined for every sample point, which allow us to move within AH. There are now two questions to be answered:

1. Which direction (φ_j − c_k) should we choose?

2. How far should we move in this direction to define our new approximation c_{k+1}?

We will start by answering the second question and then investigate the first one. It may be helpful to study figure 4.1, which illustrates the following considerations graphically in a three dimensional example.

Figure 4.1: The k-th iteration step.

How far should we move? Let us assume we have already found the optimal direction (φ_{s_k} − c_k). Starting at c_k and moving in this direction, we cannot get closer to w than the projection of w onto the line l(t) = c_k + t(φ_{s_k} − c_k) defined by the chosen direction. We cannot, of course, compute this projection directly, because w is the solution to our problem and we do not know it in advance. However, using the fact that w is the projection of the origin onto AH, the projection of w onto l(t) is equivalent to the projection of the origin onto l(t) (the projection point, w and the origin span a plane orthogonal to l(t)). We can reach this projection point, which will be our new center approximation c_{k+1}, by projecting the vector −c_k onto (φ_{s_k} − c_k), which gives us the distance to move in this direction, starting at c_k. This step is performed by phase two (4.5) of the algorithm.

Which direction should we choose? It remains to find the direction (φ_{s_k} − c_k) in which we will move as described above. This is the purpose of phase one (4.4). We have seen that the distance ‖c_{k+1} − c_k‖ we will move in phase two is the projection of −c_k onto the direction

(φ_{s_k} − c_k). Since c_k, c_{k+1} and w form a triangle with a right angle at c_{k+1}, the direction which will bring us closest to w in one step is the one with the highest value for the resulting moving distance ‖c_{k+1} − c_k‖. Therefore, in phase one (4.4) the direction (φ_{s_k} − c_k) maximizing this distance is chosen.

4.1.3 Refined Algorithm

In subsection 4.1.1 the basic steps defining the algorithm have been presented. This subsection shows how to reduce the complexity by computing the terms in equations (4.4) and (4.5) iteratively. To this end we define

    d_j^k := ⟨φ_j, c_k⟩ = Σ_{i=1}^n α_i^k ⟨φ_i, φ_j⟩   (4.9)

and

    e^k := ⟨c_k, c_k⟩ = Σ_{i,j=1}^n α_i^k α_j^k ⟨φ_i, φ_j⟩.   (4.10)

We can then rewrite the two phases of the algorithm, replacing the corresponding terms in equations (4.4) and (4.5):

Phase 1

    s_k = arg max_j  (e^k − d_j^k) / √(r² − 2d_j^k + e^k)   (4.11)

Phase 2

    c_{k+1} = c_k + ( (e^k − d_{s_k}^k) / (r² − 2d_{s_k}^k + e^k) ) (φ_{s_k} − c_k)   (4.12)
            =: c_k + f_k (φ_{s_k} − c_k)   (4.13)

Iterative update of d_j and e. Since we know how the α_i are updated in every step (equations (4.7)-(4.8)), we can also compute d_j and e iteratively:

    e^{k+1} = Σ_{i,j=1}^n α_i^{k+1} α_j^{k+1} ⟨φ_i, φ_j⟩
            = Σ_{i≠s_k} Σ_{j≠s_k} (1 − f_k)² α_i^k α_j^k ⟨φ_i, φ_j⟩   (4.14)
              + Σ_{i≠s_k} (1 − f_k) α_i^k ( (1 − f_k) α_{s_k}^k + f_k ) ⟨φ_i, φ_{s_k}⟩
              + Σ_{j≠s_k} ( (1 − f_k) α_{s_k}^k + f_k ) (1 − f_k) α_j^k ⟨φ_{s_k}, φ_j⟩

              + ( (1 − f_k) α_{s_k}^k + f_k )² ⟨φ_{s_k}, φ_{s_k}⟩
            = (1 − f_k)² Σ_{i≠s_k} Σ_{j=1}^n α_i^k α_j^k ⟨φ_i, φ_j⟩ + (1 − f_k) f_k Σ_{i≠s_k} α_i^k ⟨φ_{s_k}, φ_i⟩   (4.15)
              + (1 − f_k)² α_{s_k}^k Σ_{j≠s_k} α_j^k ⟨φ_{s_k}, φ_j⟩ + (1 − f_k) f_k Σ_{j≠s_k} α_j^k ⟨φ_{s_k}, φ_j⟩
              + (1 − f_k)² (α_{s_k}^k)² ⟨φ_{s_k}, φ_{s_k}⟩ + 2(1 − f_k) f_k α_{s_k}^k ⟨φ_{s_k}, φ_{s_k}⟩ + f_k² ⟨φ_{s_k}, φ_{s_k}⟩   (4.16)
            = (1 − f_k)² Σ_{i,j=1}^n α_i^k α_j^k ⟨φ_i, φ_j⟩ + 2 f_k (1 − f_k) Σ_{i=1}^n α_i^k ⟨φ_{s_k}, φ_i⟩ + r² f_k²   (4.17)
            = (1 − f_k)² e^k + 2 f_k (1 − f_k) d_{s_k}^k + r² f_k²   (4.18)

    d_j^{k+1} = Σ_{i=1}^n α_i^{k+1} ⟨φ_i, φ_j⟩   (4.19)
              = Σ_{i≠s_k} (1 − f_k) α_i^k ⟨φ_i, φ_j⟩ + ( (1 − f_k) α_{s_k}^k + f_k ) ⟨φ_{s_k}, φ_j⟩   (4.20)
              = (1 − f_k) Σ_{i≠s_k} α_i^k ⟨φ_i, φ_j⟩ + (1 − f_k) α_{s_k}^k ⟨φ_{s_k}, φ_j⟩ + f_k ⟨φ_{s_k}, φ_j⟩   (4.21)
              = (1 − f_k) Σ_{i=1}^n α_i^k ⟨φ_i, φ_j⟩ + f_k ⟨φ_{s_k}, φ_j⟩   (4.22)
              = (1 − f_k) d_j^k + f_k ⟨φ_{s_k}, φ_j⟩   (4.23)

Final Algorithm

Taking all of the above into account, the algorithm becomes:

Init:

    α_i^0 = 1/n,  ∀i   (4.24)
    d_i^0 = (1/n) Σ_{j=1}^n ⟨φ_j, φ_i⟩,  ∀i   (4.25)
    e^0 = (1/n) Σ_{i=1}^n d_i^0   (4.26)

Step (k → k + 1):

(phase 1: search)

    s_k = arg max_j  (e^k − d_j^k) / √(r² − 2d_j^k + e^k)   (4.27)
        =: arg max_j  g_j^k   (4.28)

(phase 2: step)

    f_k := (e^k − d_{s_k}^k) / (r² − 2d_{s_k}^k + e^k)   (4.29)
    α_{s_k}^{k+1} = (1 − f_k) α_{s_k}^k + f_k   (4.30)
    α_{i≠s_k}^{k+1} = (1 − f_k) α_i^k   (4.31)
    e^{k+1} = (1 − f_k)² e^k + 2 f_k (1 − f_k) d_{s_k}^k + r² f_k²   (4.32)
    d_j^{k+1} = (1 − f_k) d_j^k + f_k ⟨φ_{s_k}, φ_j⟩,  ∀j   (4.33)
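The final algorithm (4.24)-(4.33) only needs the pairwise scalar products ⟨φ_i, φ_j⟩, so it can be run entirely on a kernel Gram matrix (the kernel trick discussed in section 4.2). Below is a minimal sketch, assuming NumPy; the function name, iteration count, sample data and kernel width are mine and purely illustrative. For the Gaussian kernel r² = 1, and the result is compared with the direct ZeroSlabSVM solve of (3.78).

```python
import numpy as np

def center_of_ball(K, r2=1.0, n_iter=2000):
    """Sketch of the two-phase iteration (4.24)-(4.33), run on the Gram matrix
    K[i, j] = <phi_i, phi_j>, with r2 = <phi_i, phi_i>.
    Returns coefficients alpha such that w = sum_i alpha_i phi_i."""
    n = K.shape[0]
    alpha = np.full(n, 1.0 / n)                    # (4.24)
    d = K.mean(axis=0)                             # (4.25): d_j = <phi_j, c>
    e = d.mean()                                   # (4.26): e = <c, c>
    for _ in range(n_iter):
        # Phase 1 (4.27): direction with the largest projected step length.
        g = (e - d) / np.sqrt(r2 - 2.0 * d + e)
        s = int(np.argmax(g))
        # Phase 2 (4.29)-(4.33): step to the projection of the origin onto the
        # line through c and phi_s, and update the cached scalar products.
        f = (e - d[s]) / (r2 - 2.0 * d[s] + e)
        alpha *= (1.0 - f)                         # (4.31)
        alpha[s] += f                              # (4.30)
        e = (1 - f) ** 2 * e + 2 * f * (1 - f) * d[s] + r2 * f ** 2   # (4.32)
        d = (1.0 - f) * d + f * K[s]               # (4.33)
    return alpha

# Usage sketch: Gaussian Gram matrix of illustrative circle samples, compared
# with the direct ZeroSlabSVM solve K beta = 1, eq. (3.78).
rng = np.random.default_rng(5)
t = np.sort(rng.uniform(0, 2 * np.pi, 40))
X = 6.0 * np.column_stack((np.cos(t), np.sin(t)))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2.0 * 2.0 ** 2))

alpha_iter = center_of_ball(K)                     # r2 = 1 for the Gaussian kernel
beta = np.linalg.solve(K, np.ones(len(X)))
alpha_direct = beta / beta.sum()
print("equidistance residual:",
      np.abs(K @ alpha_iter - alpha_iter @ K @ alpha_iter).max())
print("max difference to the direct solution:",
      np.abs(alpha_iter - alpha_direct).max())
```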

4.2 Remarks

4.2.1 Convergence and Complexity

Lemma 4.1 (Convergence) The solution c_k of each iteration step described by (4.27)-(4.33) converges to the center w of the smallest ball interpolating the sample points φ_1, ..., φ_n.

Proof: In phase two of the iteration step we move from c_k to the projection of w onto l(t) = c_k + t(φ_{s_k} − c_k). Therefore the triangle defined by w, c_k and c_{k+1} has a right angle at c_{k+1} and

    ‖w − c_{k+1}‖ ≤ ‖w − c_k‖.   (4.34)

Equality only holds in two cases: either ‖w − c_k‖ equals zero, or the moving distance ‖c_{k+1} − c_k‖ equals zero. In the first case we have reached the solution w exactly and are done. The second case implies that the directions (φ_j − c_k) for every φ_j are orthogonal to (w − c_k). This can only occur when c_k is the projection of the origin onto the affine hull AH of the points {φ_j}, and thus also in the second case it holds that w = c_k. By this we have shown that the approximation c_{k+1} of each iteration step lies strictly closer to w until w is reached exactly. It now remains to prove that c_k cannot converge to any other point c^# ≠ w. Let us assume there exists such a limit point c^#. We have seen that if we choose c_k to be c^#, then c_{k+1} will lie a certain distance d > 0 closer to w. Since c^# is a convergence point, we can get arbitrarily close to it. But since the moving distance ‖c_{k+1} − c_k‖ is a continuous function of c_k, for a point c_k arbitrarily close to c^# virtually the same step as for c_k = c^# will be performed, and we get to a point c_{k+1} which is d > 0 closer to w and further away from c^#. Thus, c^# cannot be a convergence point. q.e.d.

The speed of the convergence remains an open issue. Consequently this also holds for the overall complexity of the algorithm. The complexity per iteration is O(n) (search for the best of n directions and update the coefficients α_1, ..., α_n). A good alternative to the overall complexity would be the approximation strength of the algorithm, i.e., how well the samples are interpolated after n steps. However, such a measure is not easily found and depends on the geometric setup of the original problem and the used kernel (see subsection 4.2.2). Additionally, the quality of the solution highly depends on numerical issues. Since shape reconstruction usually works with a large number of samples (and thus with a high dimensional kernel induced feature space), numerical errors will be inevitable. Therefore, the solution c_k will always remain an approximation. For reasons shown in subsection 4.2.2, when using kernels, their parameters will play an important role in the numerical condition of the problem. We have already discussed the numerical difficulties of solving the ZeroSlabSVM via a system of linear equations in subsection 3.3.5. Our hope is that the presented algorithm, by exploiting geometric insights, increases the numerical stability of finding a sufficient approximation in short time.

4.2.2 Kernel Induced Feature Spaces

The algorithm introduced in section 4.1 can, of course, operate on a set of sample points in any Hilbert space, provided the points are equidistant to the origin and thus lie on a hypersphere centered at the origin. Due to proposition 3.3, it can also be used as an alternative method to solve the ZeroSlabSVM presented in section 3.3, when a positive semidefinite RBF kernel is chosen. In that case, the space is the kernel induced feature space, which by proposition 2.5 fulfills the hypersphere constraint mentioned above. The sample points φ_i can then be seen as a nonlinear transformation φ(x_i) of some points x_i in objective space and we can compute their scalar products directly in objective

4.2.2 Kernel Induced Feature Spaces

The algorithm introduced in section 4.1 can, of course, operate on a set of sample points in any Hilbert space, provided that the points are equidistant from the origin and thus lie on a hypersphere centered at the origin. Due to proposition 3.3, it can also be used as an alternative method to solve the ZeroSlabSVM presented in section 3.3 when a positive semidefinite RBF kernel is chosen. In that case, the space is the kernel induced feature space, which by proposition 2.5 fulfills the hypersphere constraint mentioned above. The sample points φ_i can then be seen as nonlinear transformations φ(x_i) of points x_i in objective space, and we can compute their scalar products directly in objective space as kernel evaluations of those objective points:

    \langle \phi_i, \phi_j \rangle = \langle \phi(x_i), \phi(x_j) \rangle = k(x_i, x_j)        (4.35)

Since the geometric algorithm uses the scalar product as its sole operation on the sample points φ_i, the kernel trick is applicable. Therefore, also with the algorithm presented in this chapter, the ZeroSlabSVM can be computed entirely in objective space.

In section 3.3 we have seen that for the ZeroSlabSVM the Gram matrix (see definition 2.2) of the used kernel must have full rank. If this is not the case, the algorithm will find the trivial solution w = 0. An example of an RBF kernel whose Gram matrices have full rank is the Gaussian kernel. In that case the value of r used in section 4.1 is 1 (see subsection 2.2.3).

As indicated earlier, the parameter σ of the Gaussian kernel has a strong influence on the geometric setup in feature space and on the resulting numerical stability of the problem. Increasing σ decreases the distances between the points φ(x_i) in feature space. The algorithm clearly performs better on points that are broadly scattered over the hypersphere. As shown in figure 4.2, when some points lie close together, the algorithm may quickly reach a point with approximately the same distance to all sample points, but then enter a phase in which it approaches the correct solution w only slowly and thus needs many iteration steps.

Figure 4.2: Two points lying close together can already strongly influence the speed of convergence.

Moreover, small displacements of two close sample points (for example due to a small numerical error in the computation of the scalar product) can move the center w to an entirely different place in feature space (figure 4.3).

Figure 4.3: Small displacements of close sample points can relocate the solution.
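To make the kernel-trick formulation concrete, a Gram matrix filled with Gaussian kernel evaluations can be fed directly to the center_of_ball sketch from above; the parameterization k(x, y) = exp(−‖x − y‖² / (2σ²)), the toy data and the value of σ are assumptions for illustration only.

    import numpy as np

    def gaussian_gram(X, sigma):
        # K[i, j] = k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)); the diagonal is 1 = r^2.
        sq = np.sum(X ** 2, axis=1)
        dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
        return np.exp(-dist2 / (2.0 * sigma ** 2))

    X = np.random.rand(200, 3)         # placeholder sample points in objective space
    K = gaussian_gram(X, sigma=0.1)    # larger sigma pushes all entries towards 1
    alpha = center_of_ball(K)          # the whole iteration runs in objective space

Increasing σ drives all kernel values towards 1, i.e. the images φ(x_i) move closer together on the hypersphere, which is exactly the regime in which the iteration slows down and the problem becomes numerically delicate.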

On the other hand, as discussed in subsection 3.1.2, the shape reproduction quality of the solution can be poor when σ is chosen too small. Thus, also when using the algorithm described in section 4.1, the kernel parameter σ needs to be chosen wisely in order to obtain a moderate numerical condition and a satisfactory reproduction quality.

4.2.3 Off-Surface Points

In computer graphics, besides the sample points defining the surface, additional off-surface points are often used for shape reconstruction. For the resulting function f, implicitly defining the surface as one of its level-sets, the values f(x_offsurf) at these off-surface points provide additional constraints. For example, we can demand that all surface points lie in one level-set of f and that the off-surface points inside and outside the shape each lie in another one. We will not go into the details of the theory of off-surface points and how to generate them. However, we shortly demonstrate a way to solve the shape reconstruction problem with off-surface points using the algorithm introduced in this chapter. We have seen that the algorithm can solve linear equation systems of the form

    K\alpha = c                                                                                (4.36)

where c is a vector with constant entries c and K is the Gram matrix of a positive semidefinite RBF kernel with respect to some sample points x_1, ..., x_n. The entries of the vector c can have any constant value other than zero, since we can always solve (4.36) with c' = λc as right-hand side by setting α'_i = λα_i. Introducing

off-surface points, the system slightly changes to

    K\alpha = \begin{pmatrix} c_l \\ c_m + \epsilon_m \\ c_n - \epsilon_n \end{pmatrix}        (4.37)

where c_x and ε_x denote x-dimensional vectors with constant entries c and ε. K again denotes the Gram matrix of a positive semidefinite RBF kernel, now with respect to the on-surface points x_1, ..., x_l and the inner and outer off-surface points x_{l+1}, ..., x_{l+m} and x_{l+m+1}, ..., x_{l+m+n}. The level-set values of f at the inner and outer off-surface points will thus be c + ε and c − ε, whereas the level-set value for the on-surface points will be c.

How can we now solve (4.37) using the algorithm presented in section 4.1? A possible way is to decompose it into components as

    K\alpha = K\alpha_1 + \tfrac{1}{2}\left( K\alpha_2 - K\alpha_3 \right)                     (4.38)

and to solve

    K\alpha_1 = c_{l+m+n}                                                                      (4.39)

    K\alpha_2 = \begin{pmatrix} \epsilon_l \\ \epsilon_m \\ -\epsilon_n \end{pmatrix}          (4.40)

    K\alpha_3 = \begin{pmatrix} \epsilon_l \\ -\epsilon_m \\ \epsilon_n \end{pmatrix}          (4.41)

The solution can then be found as

    \alpha = \alpha_1 + \tfrac{1}{2}\left( \alpha_2 - \alpha_3 \right).                        (4.42)

As stated before, the geometric algorithm is able to solve linear equation systems of the form (4.36). The first system (4.39) already has that form. To solve the second system (4.40), we transform it into

    K'\alpha_2' = \epsilon_{l+m+n}                                                             (4.43)

with

    K'_{ij} := \begin{cases} -K_{ij} & \text{if } (i \le l+m \text{ and } j > l+m) \text{ or } (i > l+m \text{ and } j \le l+m) \\ \phantom{-}K_{ij} & \text{otherwise} \end{cases}    (4.44)

    [\alpha_2']_i := \begin{cases} -[\alpha_2]_i & \text{if } i > l+m \\ \phantom{-}[\alpha_2]_i & \text{otherwise} \end{cases}                                                         (4.45)

Equation (4.43) now has the form required by the geometric algorithm. The third system (4.41) can be solved analogously.
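This decomposition translates directly into code: the sign flips of (4.44) amount to conjugating K with a diagonal sign vector, and each subsystem is handled by the constant-right-hand-side solver, rescaled to the desired constant as described before (4.37). The sketch below reuses the hypothetical center_of_ball routine from section 4.1; the function names and the rescaling strategy are our own.

    import numpy as np

    def solve_constant_rhs(K, value):
        # Run the geometric iteration, then rescale the coefficients so that K a = value * 1
        # (scaling alpha scales the attained constant level linearly, cf. the remark on (4.36)).
        a = center_of_ball(K)
        level = (K @ a).mean()        # constant level <w, phi_j> reached by the iteration
        return a * (value / level)

    def off_surface_coefficients(K, l, m, n, c, eps):
        # K: Gram matrix over the l on-surface, m inner and n outer off-surface points (in that order).
        D_out = np.ones(l + m + n)
        D_out[l + m:] = -1.0          # reflect the outer off-surface points at the origin
        D_in = np.ones(l + m + n)
        D_in[l:l + m] = -1.0          # reflect the inner off-surface points at the origin
        a1 = solve_constant_rhs(K, c)                                       # (4.39)
        a2 = D_out * solve_constant_rhs(D_out[:, None] * K * D_out, eps)    # (4.40) via (4.43)-(4.45)
        a3 = D_in * solve_constant_rhs(D_in[:, None] * K * D_in, eps)       # (4.41), analogous flip
        return a1 + 0.5 * (a2 - a3)                                         # (4.42)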

The geometric interpretation of this transformation is that we have reflected the points φ(x_{l+m+1}), ..., φ(x_{l+m+n}) in feature space, corresponding to the outer off-surface points x_{l+m+1}, ..., x_{l+m+n}, at the origin. All points therefore still lie on the hypersphere in feature space, and K' is still a matrix of pairwise scalar products of this new set of partially reflected points.

One problem of this approach to solving the shape reconstruction problem with off-surface points using our geometric algorithm is that the numerical difficulties discussed in subsection 4.2.2 can get worse. If we use the Gaussian kernel, for example, we saw that for increasing values of σ the points in feature space lie closer together. When some of these points are now reflected at the origin, the affine hull of the points will inevitably pass very close to the origin. Thus the solution w will also be very close to the origin, and the values f(x) = ⟨w, φ(x)⟩ will all be very small. Increasing these values by scaling α (which scales w correspondingly) will also amplify the numerical error and reduce the quality of the reconstructed shape. Consequently, for a reasonable number of sample points this approach does not seem feasible.

4.2.4 Orthogonalization

In the geometric interpretation (subsection 4.1.2) of the algorithm we have seen that in every step the direction (φ_{s_k} − c_k) which brings us closest to the solution w in one step is chosen. This resembles a steepest descent approach, except that we only use a finite set of n possible directions. Similar to the idea of the conjugate gradient method, we could also orthogonalize the next direction against all previously chosen directions in every step. This would lead to the solution w in at most n (= number of samples) steps. We believe, however, that this would introduce the same numerical difficulties as when we attempted to solve the problem through the linear equation system presented in section 3.3. Due to numerical errors, especially in the last iterations, the solution would still be an approximation. Furthermore, the complexity per iteration would increase, and the argument that we reach a satisfactory solution in reasonable time would be weakened.

4.3 Results

Figures 4.4 to 4.7 show some examples of shape reconstruction with the ZeroSlabSVM, using the algorithm introduced in the current chapter and a Gaussian kernel. As already mentioned in subsection 3.1.2, the kernel parameter σ should not be chosen too large in order to maintain a feasible numerical condition. Note that one then has to find a good rendering strategy to visualize the level-set of the resulting function f defining the shape, since f will be more of a valley-type structure, which can pose problems for standard rendering techniques for implicit surfaces such as marching cubes. To address these problems, one can for example slightly decrease the level-set value. However, changing the level-set value also reduces the interpolation quality of the shape.
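For rendering, the resulting implicit function f(x) = ⟨w, φ(x)⟩ = Σ_i α_i k(x_i, x) can be sampled on a regular grid and handed to any marching-cubes style extractor. The sketch below reuses the X, alpha and sigma variables assumed in the earlier sketches; the grid resolution, the bounding box margin, the slightly lowered iso-value and the commented-out scikit-image call are illustrative choices only.

    import numpy as np

    def implicit_f(Q, X, alpha, sigma):
        # f(q) = sum_i alpha_i k(x_i, q) for the Gaussian kernel
        d2 = np.sum(Q ** 2, 1)[:, None] + np.sum(X ** 2, 1)[None, :] - 2.0 * Q @ X.T
        return np.exp(-d2 / (2.0 * sigma ** 2)) @ alpha

    res = 64
    lo, hi = X.min(axis=0) - 0.1, X.max(axis=0) + 0.1
    axes = [np.linspace(lo[k], hi[k], res) for k in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    # evaluate in chunks to keep the intermediate distance matrices small
    volume = np.concatenate([implicit_f(q, X, alpha, sigma=0.1)
                             for q in np.array_split(grid, 64)]).reshape(res, res, res)

    level = implicit_f(X, X, alpha, sigma=0.1).mean()   # value of f attained at the samples
    iso = 0.98 * level                                  # slightly lowered level-set value for rendering
    # verts, faces, *_ = skimage.measure.marching_cubes(volume, level=iso)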

Figure 4.4: Cactus (3337 Points)

Figure 4.5: Flat Ball (492 Points)

Figure 4.6: Max Planck Head (2022 Points)

Figure 4.7: Ball Joint (1964 Points)


Chapter 5

Feature-Coefficient Correlation

5.1 Features of a Shape

In the context of pattern recognition, computer vision, medical imaging and free-form shape design, working with features of a shape has become increasingly important. The two features ridge and ravine are defined as local positive maxima of the maximal and local negative minima of the minimal principal curvature along their associated principal curvature lines ([9],[10]). Ridges and ravines are deeply connected with interesting properties of a shape such as its medial axis or its distance function.

5.2 Feature-Coefficient Correlation

The observation was already made in [4] that features seem to correlate with the coefficients (α_i − α_i^*) obtained from the solution of the SlabSVM using a Gaussian kernel. As we have seen in chapter 3, such a coefficient (α_i − α_i^*) is assigned to every sample point x_i. The observation was that the coefficients with the highest values and the ones with negative values seem to correspond to ridges and ravines of the shape. This is shown in the figures below, where the samples are color coded according to their corresponding coefficients.

We now made the same observation for the ZeroSlabSVM. The figures show some examples, where the sample points are again color coded according to their associated coefficients α_i, now stemming from the solution of the ZeroSlabSVM. In figure 5.5 we can see that not only features such as ridges, ravines or sharp edges but also the sampling density influences the values of those coefficients. A change of sampling density on both arms of the cactus can, for example, produce negative coefficients at points where no features exist.

We know that the coefficients are the weights of a sum of Gaussians (see subsection 3.1.2) which implicitly defines the reconstructed shape through one of its level-sets. It is therefore not difficult to gain some intuition on why the highest and the negative coefficients tend to appear at such features of the shape.
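A color coding like the one used in these figures can be produced in a few lines; the sketch below assumes that the sample points X and their coefficients alpha from the ZeroSlabSVM solution are available, and the diverging colormap with a symmetric range is our choice so that negative coefficients and the largest positive ones stand out.

    import numpy as np
    import matplotlib.pyplot as plt

    # X: n x 3 sample points, alpha: their associated coefficients
    lim = np.max(np.abs(alpha))                  # symmetric color range around zero
    ax = plt.figure().add_subplot(projection="3d")
    sc = ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=alpha, cmap="coolwarm",
                    vmin=-lim, vmax=lim, s=4)
    plt.colorbar(sc, label="coefficient value")
    plt.show()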

Figure 5.1: Bunny

Figure 5.2: Rocker Arm
