Insights into the Geometry of the Gaussian Kernel and an Application in Geometric Modeling


Insights into the Geometry of the Gaussian Kernel and an Application in Geometric Modeling

Master Thesis
Michael Eigensatz

Advisor: Joachim Giesen
Professor: Mark Pauly

Swiss Federal Institute of Technology Zürich
March 13, 2006


Contents

1 Introduction

2 Kernels and Nonlinear Feature Maps
  2.1 Nonlinear Feature Maps
  2.2 Positive (Semi-)Definite Kernels
      2.2.1 Gram Matrices and Positive Definiteness
      2.2.2 Kernel Induced Feature Spaces and the Kernel Trick
      2.2.3 RBF Kernels
  2.3 Conditionally Positive Semidefinite Kernels

3 Surface Modeling with the SlabSVM
  3.1 The SlabSVM
      3.1.1 Problem Formulation
      3.1.2 The Solution of the SlabSVM
      3.1.3 The Geometry of the SlabSVM
  3.2 The OpenSlabSVM
      3.2.1 Problem Formulation
      3.2.2 The Solution of the OpenSlabSVM
      3.2.3 The Geometry of the OpenSlabSVM Using the Gaussian Kernel
  3.3 The ZeroSlabSVM
      3.3.1 Problem Formulation
      3.3.2 The Solution of the ZeroSlabSVM
      3.3.3 The Geometry of the ZeroSlabSVM Using the Gaussian Kernel
      3.3.4 The ZeroSlabSVM and Shape Reconstruction Using Radial Basis Functions
      3.3.5 Solving the ZeroSlabSVM
  3.4 The Geometry of the SlabSVM Revisited

4 Center-of-Ball Approximation Algorithm
  4.1 The Algorithm
      4.1.1 Basic Two-Phase Iteration Algorithm
      4.1.2 Geometric Interpretation
      4.1.3 Refined Algorithm
  4.2 Remarks
      4.2.1 Convergence and Complexity
      4.2.2 Kernel Induced Feature Spaces
      4.2.3 Off-Surface Points
      4.2.4 Orthogonalization
  4.3 Results

5 Feature-Coefficient Correlation
  5.1 Features of a Shape
  5.2 Feature-Coefficient Correlation

Chapter 1

Introduction

Since the introduction of Support Vector Machines, kernel techniques have become an important instrument for a number of tasks in machine learning and statistics. Each kernel defines an implicit transformation from objective space into a (usually higher dimensional) feature space. Depending on the chosen kernel, the geometry of this induced feature space can be very specific. Using RBF kernels, for example, the points in feature space all lie on a hypersphere around the origin. Our goal was to analyze the geometric constraints of the feature space induced by an RBF kernel (and the Gaussian kernel in particular) and their implications on the geometry of problems formulated with such kernels. We demonstrate, using the example of shape reconstruction with the SlabSVM, that those implications can be used to restate or even improve algorithms performed in kernel feature spaces.

Note that the techniques presented in this thesis, such as kernels or Support Vector Machines, are mainly used in machine learning. However, our research focus is not so much the analysis of learning-related concepts (such as risk and loss functions or other elements of statistical learning theory) but rather to investigate the properties (especially the geometric ones) of these techniques and to gain insights possibly leading to new perspectives.

Chapter 2 gives a short introduction to the basic concepts crucial for the succeeding chapters, such as nonlinear feature maps and kernels. In chapter 3 the SlabSVM is introduced as a method for shape reconstruction, along with a study of the solution properties, interesting special cases and geometric interpretations. Chapter 4 offers an in-depth analysis of a new algorithm for a special case of the SlabSVM, which exploits the geometric insights gained so far. Finally, chapter 5 sheds some light on the interesting observation that the solution of the SlabSVM seems to correlate with features of the shape such as ridges, ravines or sharp edges.


Chapter 2

Kernels and Nonlinear Feature Maps

2.1 Nonlinear Feature Maps

A nonlinear feature map can be written as a nonlinear transformation function φ, which transforms any point x ∈ X to a corresponding point φ(x) ∈ Y. The space X of the original samples is then called objective space and Y is called feature space. Nonlinear feature maps can be useful for many applications, as the following symbolic example demonstrates:

Example 2.1 Given a set of sample points x_i = ([x_i]_1, [x_i]_2) ∈ R² approximately arranged on a circle around the origin (fig. 2.1(a)), we want to fit a curve through these points. To this end, we apply the feature map

    φ(x) = ([x]_1², [x]_2²) ∈ R².   (2.1)

In objective space, points x on circles centered at the origin meet the condition

    [x]_1² + [x]_2² = const.   (2.2)

For the corresponding points φ(x) = y in feature space it consequently holds

    [y]_1 + [y]_2 = const.,   (2.3)

which means that they lie on a straight line. Using the described feature map, the transformed sample points will thus approximately form a linear shape in feature space (fig. 2.1(b)), which is easier to learn.

Of course this simple toy example is somewhat artificial and may seem oversimplified. Nevertheless, it is sufficient to highlight the important fact that nonlinear feature maps can linearize problems and therefore reduce their solution complexity.
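A minimal numerical sketch of Example 2.1, assuming NumPy; the sample points, noise level and radius are made up for illustration. It applies the feature map (2.1) to noisy circle samples and checks that the transformed points are approximately collinear.

```python
import numpy as np

# Sketch of Example 2.1: noisy points on a circle become (approximately)
# collinear under the feature map phi(x) = ([x]_1^2, [x]_2^2).
rng = np.random.default_rng(0)
n = 50
angles = rng.uniform(0.0, 2.0 * np.pi, n)
radii = 3.0 + 0.05 * rng.standard_normal(n)          # roughly circular samples
X = np.column_stack((radii * np.cos(angles), radii * np.sin(angles)))

Y = X ** 2                                            # feature map (2.1)

# In feature space the points satisfy [y]_1 + [y]_2 ~ const (eq. 2.3),
# i.e. they lie close to the line y_1 + y_2 = r^2.
line_sum = Y.sum(axis=1)
print("mean of [y]_1 + [y]_2:", line_sum.mean())      # ~ 9 = 3^2
print("std  of [y]_1 + [y]_2:", line_sum.std())       # small compared to the mean
```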

Figure 2.1: Nonlinear feature map: (a) objective space, (b) feature space.

2.2 Positive (Semi-)Definite Kernels

Let us look again at our simple toy example 2.1: We have used the feature map

    X → Y : x ↦ φ(x) = ([x]_1², [x]_2²) = y.   (2.4)

Scalar products in feature space can thus be computed as

    ⟨y_1, y_2⟩ = ⟨φ(x_1), φ(x_2)⟩   (2.5)
              = ⟨([x_1]_1², [x_1]_2²), ([x_2]_1², [x_2]_2²)⟩   (2.6)
              = [x_1]_1² [x_2]_1² + [x_1]_2² [x_2]_2²   (2.7)
              =: k(x_1, x_2).   (2.8)

It is important to see that the scalar product of two vectors φ(x_1), φ(x_2) in feature space can therefore be computed as a function k which takes as input the two corresponding vectors x_1, x_2 in objective space. This function k is called a kernel.

2.2.1 Gram Matrices and Positive Definiteness

Some important definitions concerning kernels are:

Definition 2.2 (Gram Matrix) Given a function k : X² → K (where K = C or K = R) and samples x_1, ..., x_n ∈ X, the n × n matrix K with elements

    K_ij = k(x_i, x_j)   (2.9)

is called the Gram matrix (or kernel matrix) of k with respect to x_1, ..., x_n.

Definition 2.3 (Positive Semidefinite Matrix) A real n × n matrix K satisfying

    Σ_{i,j=1}^n c_i c_j K_ij ≥ 0   (2.10)

for all c_i ∈ R is called positive semidefinite. If equality in (2.10) only holds for c_1 = ... = c_n = 0 then it is called positive definite.

Definition 2.4 (Positive Semidefinite Kernel) Let X be a nonempty set. A function k on X × X which for all n ∈ N and all x_1, ..., x_n ∈ X gives rise to a positive semidefinite Gram matrix is called a positive semidefinite kernel. If equality in (2.10) only holds for c_1 = ... = c_n = 0 then it is called a positive definite kernel.

Note that positive definiteness is a special case of positive semidefiniteness and therefore every property stated for positive semidefinite kernels also holds for positive definite ones. Another class of kernels are the conditionally positive semidefinite kernels, defined in section 2.3, which play an important role in computer graphics. Since positive (semi)definite kernels give rise to nice geometric interpretations, they will be our main interest. Unless noted otherwise, a kernel will therefore always mean a positive (semi)definite kernel. However, we will discuss the use of conditionally positive semidefinite kernels for surface reconstruction using radial basis functions when we study the ZeroSlabSVM in section 3.3.

2.2.2 Kernel Induced Feature Spaces and the Kernel Trick

At the beginning of this section we saw that we can compute scalar products in the feature space of our simple toy example 2.1 by an evaluation of a kernel k on the samples in objective space. In fact, for every positive semidefinite kernel it holds that

    k(x_1, x_2) = ⟨φ(x_1), φ(x_2)⟩   (2.11)

for some feature map φ. It can therefore be stated that every positive semidefinite kernel implicitly defines a feature map φ into some Hilbert space. For our example this means that the kernel

    k(x_1, x_2) = [x_1]_1² [x_2]_1² + [x_1]_2² [x_2]_2²   (2.12)

implicitly defines the feature map (2.4).

This second statement indicates the power of kernels: Let us assume we are given some sample points in objective space and we want to perform an algorithm on these samples (e.g. learning the shape described by those samples). Since the structure of the samples is nonlinear, we would like to apply a feature transformation φ into a feature space in order to linearize the problem. Then we will perform the algorithm in feature space. Let us now further assume that this algorithm in feature space uses the scalar product as its sole basic operation. We know that there exists a kernel k, which implicitly defines the used feature map φ and computes scalar products in the feature space as a function on points in the objective space. Using this kernel it is thus possible

to formulate our algorithm, which operates in feature space, directly in objective space, without the need to perform the feature transformation explicitly! This can result in a massive reduction in complexity, especially when the feature space is very high (or even infinite) dimensional. An example of how this is done in practice is the SlabSVM introduced in chapter 3.

Of course this kernel trick only works when the algorithm in feature space depends only on scalar products. Fortunately, due to the power of the scalar product, many interesting problems can be solved by algorithms fulfilling this condition. Consequently there exists quite a number of well-investigated kernels used for a variety of problems in machine learning and many other fields. The next sections will introduce some kernels important for the following chapters. For an in-depth analysis of kernels, their definitions, properties and applications, the interested reader is referred to the extensive literature, including [1], [2], [3].

2.2.3 RBF Kernels

Radial basis function (RBF) kernels have the form

    k(x_1, x_2) = f(d(x_1, x_2)),   (2.13)

where f is a function on R₀⁺ and d is a metric on X, for which the usual choice is

    d(x_1, x_2) = ‖x_1 − x_2‖.   (2.14)

The fact that d(x, x) = 0 for every metric gives rise to a first geometric statement:

Proposition 2.5 (Geometry of RBF Kernels) The transformed points φ(x) in the feature space induced by a positive semidefinite RBF kernel are equidistant to the origin and thus all lie on a hypersphere with radius √k(x, x) = √f(0) around the origin.

The Gaussian Kernel

A very popular choice of a positive definite RBF kernel in machine learning is the Gaussian kernel:

    k(x_1, x_2) = exp(−‖x_1 − x_2‖² / (2σ²)),   σ > 0.   (2.15)

It was mentioned before that when using positive semidefinite RBF kernels the transformed points in feature space all lie on a hypersphere around the origin. For the Gaussian kernel it holds that

    ‖φ(x_i)‖² = ⟨φ(x_i), φ(x_i)⟩ = k(x_i, x_i) = 1.   (2.16)

Therefore the hypersphere has radius one in this case. The Gaussian kernel has another very important property:

Theorem 2.6 (Full Rank of Gaussian Gram Matrices) Suppose that x_1, ..., x_n ∈ X are distinct points and σ ≠ 0. The matrix K given by

    K_ij = exp(−‖x_i − x_j‖² / (2σ²))   (2.17)

has full rank.

This leads to two crucial implications:

- The transformed feature points φ(x_1), ..., φ(x_n) are linearly independent.
- In principle, the feature space induced by a Gaussian kernel is infinite dimensional. For geometric considerations, however, it is usually sufficient to look at the n-dimensional subspace spanned by the feature points φ(x_1), ..., φ(x_n).

The n points φ(x_i) also lie on an (n − 1)-dimensional hyperball, which of course is trivial since n linearly independent points always do. With only three points, the geometric setup in feature space is indicated in figure 2.2: the feature points φ_i = φ(x_i), i = 1, 2, 3, lie on the three dimensional unit sphere around the origin and can be interpolated by a circle on this sphere.

Figure 2.2: Feature space of a Gaussian kernel with three sample points.
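The two Gaussian-kernel properties used throughout this thesis, the unit norm of the feature points (eq. (2.16), proposition 2.5) and the full rank of the Gram matrix (theorem 2.6), can be checked numerically. The following is a small sketch assuming NumPy; the helper name gaussian_gram and the random sample points are mine and purely illustrative.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), eq. (2.17)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))            # 20 distinct sample points in R^3
K = gaussian_gram(X, sigma=1.0)

# Proposition 2.5 / eq. (2.16): every feature point has unit norm,
# ||phi(x_i)||^2 = k(x_i, x_i) = 1.
print("unit diagonal:", np.allclose(np.diag(K), 1.0))

# Theorem 2.6: the Gaussian Gram matrix of distinct points has full rank,
# hence it is positive definite (all eigenvalues > 0).
eigvals = np.linalg.eigvalsh(K)
print("rank:", np.linalg.matrix_rank(K), "of", len(X))
print("smallest eigenvalue:", eigvals.min())
```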

2.3 Conditionally Positive Semidefinite Kernels

Let us conclude the chapter with some notes on conditionally positive semidefinite kernels.

Definition 2.7 (Conditionally Positive Semidefinite Kernels of Order q) A symmetric kernel k : X × X → R is called conditionally positive semidefinite of order q on X ⊆ R^d if for any distinct points x_1, ..., x_n ∈ R^d the quadratic form satisfies

    Σ_{i,j=1}^n α_i α_j k(x_i, x_j) ≥ 0,   (2.18)

provided that the coefficients α_1, ..., α_n satisfy

    Σ_{i=1}^n α_i p(x_i) = 0   (2.19)

for all polynomials p(x) on R^d of degree lower than q. If equality in (2.18) only holds for α_1 = ... = α_n = 0 then it is called conditionally positive definite.

Note that (unconditional) positive definiteness is identical to conditional positive definiteness of order zero and that conditional positive definiteness of order q implies conditional positive definiteness of any larger order. Examples of conditionally positive definite radial kernels are:

    k(x_1, x_2) = (−1)^⌈β/2⌉ (c² + ‖x_1 − x_2‖²)^{β/2},   order q = ⌈β/2⌉,  β > 0,  β ∉ 2N   (2.20)
    k(x_1, x_2) = (−1)^{k+1} ‖x_1 − x_2‖^{2k} log ‖x_1 − x_2‖,   order q = k + 1,  k ∈ N   (2.21)

Conditionally positive semidefinite kernels of order larger than zero no longer directly define a dot product in some feature space, and geometric considerations of spaces induced by such kernels are not as straightforward as in the case of order zero. Thus they are not the focus of this thesis. However, their use in the context of shape reconstruction using radial basis functions will be discussed briefly in subsection 3.3.4.
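Definition 2.7 can also be tested numerically for the thin-plate spline kernel (2.21) with k = 1 (order q = 2): restricted to coefficients α satisfying (2.19) for all polynomials of degree < 2, the quadratic form (2.18) is nonnegative. The sketch below assumes NumPy; the function name, the random 2D points and the SVD-based null-space construction are my own illustrative choices, not part of the thesis.

```python
import numpy as np

def tps_kernel_matrix(X):
    """Thin-plate spline kernel (2.21) with k = 1: k(x, y) = ||x-y||^2 log ||x-y||,
    conditionally positive definite of order q = 2 (diagonal set to the limit 0)."""
    r2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    r = np.sqrt(r2)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0.0, r2 * np.log(r), 0.0)

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 2))
K = tps_kernel_matrix(X)

# Condition (2.19): sum_i alpha_i p(x_i) = 0 for all polynomials p of degree < 2,
# i.e. P^T alpha = 0 with P = [1, x, y].
P = np.column_stack((np.ones(len(X)), X))
_, s, Vt = np.linalg.svd(P.T)          # rows of Vt beyond rank(P) span null(P^T)
Z = Vt[P.shape[1]:].T                  # orthonormal basis of the constrained subspace

# Restricted quadratic form (2.18): Z^T K Z should be positive semidefinite.
eigvals = np.linalg.eigvalsh(Z.T @ K @ Z)
print("smallest eigenvalue on the constrained subspace:", eigvals.min())  # >= ~0
```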

Chapter 3

Surface Modeling with the SlabSVM

This chapter studies shape reconstruction using the Slab Support Vector Machine (SlabSVM). Section 3.1 introduces the SlabSVM and shows some interesting aspects and properties. As special cases of the SlabSVM, the OpenSlabSVM and the ZeroSlabSVM are presented in sections 3.2 and 3.3, together with some relations to shape reconstruction using radial basis functions as done in computer graphics, and geometric insights which will eventually lead to the algorithm introduced in chapter 4.

3.1 The SlabSVM

3.1.1 Problem Formulation

Assume we are given n sample points x_1, ..., x_n ∈ X as a sampling of an unknown shape, where usually but not necessarily X = R² or X = R³. We now want to reconstruct this shape by learning a function f(x), which implicitly defines the learned shape by one of its level-sets {x ∈ X : f(x) = c}. The SlabSVM was introduced in [4] as a kernel method for such implicit surface modeling. Without the use of outliers it can be stated as the optimization problem

    min_{w,ρ}  (1/2)‖w‖² − ρ   (3.1)
    s.t.  δ ≤ ⟨w, φ(x_i)⟩ − ρ ≤ δ*,  ∀i,   (3.2)

defining the following geometric setup (fig. 3.1): We first apply a feature map φ(x) in order to linearize the problem. We will later see that it will not be necessary to perform this transformation explicitly, since the kernel trick will be applicable (see chapter 2 for an introduction to these important concepts). The goal is then to find a slab, defined by two parallel hyperplanes orthogonal to the solution vector w, enclosing all the sample points

φ(x_i). The width of the slab has to be given in advance.

Figure 3.1: Setup SlabSVM (the slab is bounded by hyperplanes at distances (ρ+δ)/‖w‖ and (ρ+δ*)/‖w‖ from the origin).

Having found the solution vector w, we can then reconstruct the shape as a level-set of the function

    f(x) = ⟨w, φ(x)⟩.   (3.3)

In feature space, this represents a hyperplane parallel to those defining the slab. The level-set value is usually chosen such that this hyperplane lies within the slab, which holds for values in the interval [ρ + δ, ρ + δ*]. Also for the evaluation of the function f, the use of a kernel will be of great help, as shown later.

To solve the optimization problem (3.1)-(3.2) we compute the generalized Lagrangian function as

    L(w, ρ, α, α*) = (1/2)‖w‖² − ρ − Σ_{i=1}^n α_i (⟨w, φ(x_i)⟩ − ρ − δ) + Σ_{i=1}^n α_i* (⟨w, φ(x_i)⟩ − ρ − δ*).   (3.4)

The solution of our primal optimization problem is then equivalent to the one of the Lagrangian dual problem defined as

    max_{α,α*}  θ(α, α*)   (3.5)
    subject to  α ≥ 0   (3.6)
                α* ≥ 0,   (3.7)

where θ(α, α*) = inf_{w,ρ} L(w, ρ, α, α*). To compute the infimum of L with respect to the primal variables w and ρ we set the corresponding derivatives to zero

    ∂L(w, ρ, α, α*)/∂w = 0  ⟹  w = Σ_{i=1}^n (α_i − α_i*) φ(x_i)   (3.8)

    ∂L(w, ρ, α, α*)/∂ρ = 0  ⟹  Σ_{i=1}^n (α_i − α_i*) = 1   (3.9)

and use the resulting equations to replace the primal variables in L by functions of the dual variables α_i and α_i*. The dual problem can then be formulated as

    min_{α,α*}  (1/2) Σ_{i,j=1}^n (α_i − α_i*)(α_j − α_j*) ⟨φ(x_i), φ(x_j)⟩ − δ Σ_{i=1}^n α_i + δ* Σ_{i=1}^n α_i*   (3.10)
    s.t.  α^(*) ≥ 0   (3.11)
    and   Σ_{i=1}^n (α_i − α_i*) = 1.   (3.12)

Since the only operation performed on the feature points φ(x_i) is the scalar product, we can apply the kernel trick and replace it by kernel evaluations, which leads to the final problem formulation of the SlabSVM:

    min_{α,α*}  (1/2) Σ_{i,j=1}^n (α_i − α_i*)(α_j − α_j*) k(x_i, x_j) − δ Σ_{i=1}^n α_i + δ* Σ_{i=1}^n α_i*   (3.13)
    s.t.  α^(*) ≥ 0   (3.14)
    and   Σ_{i=1}^n (α_i − α_i*) = 1.   (3.15)

This is a good example of how the use of kernels saves us from having to compute the feature map φ explicitly, because it no longer appears in the optimization problem. Instead, the feature transformation is implicitly defined by the chosen kernel k. The result is a convex quadratic program, which can be solved using standard techniques.

3.1.2 The Solution of the SlabSVM

As mentioned earlier, the shape is reconstructed as a level-set of the function f (equation (3.3)). Since f too is defined only using scalar products, the use of kernels becomes possible here as well, and with equation (3.8) we get

    f(x) = ⟨w, φ(x)⟩   (3.16)
         = ⟨Σ_{i=1}^n (α_i − α_i*) φ(x_i), φ(x)⟩   (3.17)
         = Σ_{i=1}^n (α_i − α_i*) ⟨φ(x_i), φ(x)⟩   (3.18)

         = Σ_{i=1}^n (α_i − α_i*) k(x_i, x).   (3.19)

Thus, an explicit transformation into the feature space is at no point needed when reconstructing a shape with the SlabSVM, since we can perform all its computations directly in objective space.

Properties of the Solution Using a Gaussian Kernel

When we use a Gaussian kernel, the solution function f becomes

    f(x) = Σ_{i=1}^n (α_i − α_i*) exp(−‖x_i − x‖² / (2σ²)).   (3.20)

There is a Gaussian bell located at each sample point x_i contributing to the overall value of f. The kernel parameter σ controls the breadth of those bells, also called the support of the kernel. The dual variables α_i and α_i* specify the amount of the contribution of the bell located at sample point x_i to f. The reconstructed shape is then a level-set of this weighted sum of Gaussians. To study some interesting properties of such a sum, let us investigate the following example:

Example 3.1 (Circular shape in 2D) Given samples x_1, ..., x_n ∈ R², n ≥ 2, uniformly arranged on a circle with radius r around the origin (fig. 3.2):

    x_i = ( r cos(2π(i−1)/n), r sin(2π(i−1)/n) ),   where i = 1, ..., n.   (3.21)

To interpolate these points with a level-set of a function described by equation (3.20), we can choose uniform weights¹ (remember that they have to fulfill equation (3.15)):

    α_i − α_i* = 1/n.   (3.22)

Proof: Due to the symmetry of the setup, the function f(x) takes the same value at each sample point x_i, and therefore the shape defined by the level-set {x : f(x) = f(x_i)} for any i interpolates every sample point correctly. q.e.d.

The resulting function f is therefore

    f(x) = (1/n) Σ_{i=1}^n exp( −‖ ( r cos(2π(i−1)/n), r sin(2π(i−1)/n) ) − x ‖² / (2σ²) )   (3.23)

and only depends on the number of sample points n, the circle radius r and the kernel parameter σ.

¹ This would actually be the result of the ZeroSlabSVM applied to this setup, which is explained in section 3.3.
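A small numerical sketch of Example 3.1, assuming NumPy; the parameter values r = 6, n = 8 and σ = 2.5 follow figure 3.3 and the function name is mine. It evaluates (3.23) with uniform weights 1/n and checks that all samples share one function value, and it computes the ratio g = f((r,0))/f((0,0)) that distinguishes the valley-type from the hill-type case discussed next.

```python
import numpy as np

# Example 3.1: f from eq. (3.23) with uniform weights alpha_i - alpha_i^* = 1/n.
r, n, sigma = 6.0, 8, 2.5                         # illustrative values as in fig. 3.3
angles = 2.0 * np.pi * np.arange(n) / n
samples = r * np.column_stack((np.cos(angles), np.sin(angles)))

def f(x):
    """Weighted sum of Gaussians, eq. (3.23)."""
    d2 = ((samples - x) ** 2).sum(axis=1)
    return np.mean(np.exp(-d2 / (2.0 * sigma ** 2)))

# By symmetry, every sample lies on the same level set ...
values = np.array([f(p) for p in samples])
print("f at the samples:", values.round(6))

# ... and g > 1 indicates the valley-type f with an additional inner ring.
g = f(samples[0]) / f(np.zeros(2))
print("g(n, r/sigma) =", g)
```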

Figure 3.2: Points uniformly distributed on a circle.

Figure 3.3 shows the plot of such a function for some values of the remaining parameters. Considering the level-sets of f for different parameter values, two cases can be observed:

1. The function f decreases again and forms a valley in the middle of the circle, causing a second (inner) ring to appear in the level-set (figures 3.3(a)-3.3(d)).

2. The function does not sink below the level-set value again in the middle of the circle and thus only the outer, point-interpolating ring lies in the level-set (figures 3.3(e)-3.3(f)).

Restating f as

    f(x) = (1/n) Σ_{i=1}^n exp( −(r²/(2σ²)) ‖ ( cos(2π(i−1)/n), sin(2π(i−1)/n) ) − x/r ‖² ),   (3.24)

one can easily see that σ has the exact inverse influence on f that r has, when with each change of r we rescale the evaluation points x such that they occupy the same relative position with respect to the circle. Thus, for the qualitative description of f, increasing r has the same effect as decreasing σ. In order to investigate the two observed cases in a more mathematical manner, let us compare the value of f at a sample point with the value of f at the origin (the

Figure 3.3: The function f with r = 6, n = 8 for different values of σ ((a) σ = 2.5, (c) σ = 3.5, (e) σ = 5) and the corresponding reconstructed shapes ((b), (d), (f)).

center of the circle). At the center, f becomes

    f((0,0)) = (1/n) Σ_{i=1}^n exp( −‖ ( r cos(2π(i−1)/n), r sin(2π(i−1)/n) ) ‖² / (2σ²) )   (3.25)
             = (1/n) Σ_{i=1}^n exp(−r²/(2σ²))   (3.26)
             = exp(−r²/(2σ²)).   (3.27)

Since f takes on the same value for every sample point, we are free to choose a point, for example x_1 = (r, 0):

    f((r,0)) = (1/n) Σ_{i=1}^n exp( −(r²/(2σ²)) ‖ ( cos(2π(i−1)/n), sin(2π(i−1)/n) ) − (1, 0) ‖² )   (3.28)
             = (1/n) Σ_{i=1}^n exp( −(r²/(2σ²)) ( 2 − 2cos(2π(i−1)/n) ) )   (3.29)
             = (1/n) exp(−r²/(2σ²)) Σ_{i=1}^n exp( (r²/(2σ²)) ( 2cos(2π(i−1)/n) − 1 ) )   (3.30)
             = f((0,0)) · (1/n) Σ_{i=1}^n exp( (1/2)(r/σ)² ( 2cos(2π(i−1)/n) − 1 ) )   (3.31)
             =: f((0,0)) · g(n, r/σ).   (3.32)

We derived that f((r,0)) has the value f((0,0)) multiplied by a factor g which is a function of n and r/σ. When g is larger than one, the value of f at a sample point is higher than at the origin, which leads to the first case mentioned above, where f forms a valley in the center and there are two level-sets. In the case of g being smaller than one, f will not decrease below the level-set value again when moving from the rim to the center of the circle. In figure 3.4 the function g is plotted for different numbers of sample points n. When n goes to infinity, g asymptotically becomes

    lim_{n→∞} g(n, r/σ) = (1/(2π)) ∫_0^{2π} exp( (1/2)(r/σ)² ( 2cos(ω) − 1 ) ) dω,   (3.33)

which is also plotted in figure 3.4. From the plots we can learn that increasing r/σ will increase the value of f((r,0)) with respect to f((0,0)). On the other hand,

when r/σ lies in the range between 0 and about 1 to 1.7 (depending on n), f((r,0)) will be less than f((0,0)).

Figure 3.4: The function g as a function of r/σ for different values of n (n = 2, 3, 4, 5 and n → ∞).

Thus, increasing σ or decreasing r will change f from a valley-type structure into more of a hill-type one and will make the second, inner ring disappear from the level-set. It will also smoothen the outer ring of the level-set.

Let us conclude this example with an analysis of one last question: We saw that for a certain range of r/σ, f gets a hill-type structure and the inner ring observed earlier (fig. 3.3) disappears from our level-set. Is it also possible to increase r/σ enough such that the inner and outer rings of our level-set coincide? In the qualitative geography of f this would mean that the level-set lies exactly on the ridge of the hills arranged around the circle and forming the valley in its center (figures 3.3(a) and 3.3(c)). To investigate this matter we form the gradient of f at an arbitrary sample point, for example at x_1 = (r, 0):

    ∂f(x)/∂[x]_1 |_{x=(r,0)} = (r/(nσ²)) Σ_{i=1}^n ( cos(2π(i−1)/n) − 1 ) exp( (r²/σ²) ( cos(2π(i−1)/n) − 1 ) )   (3.34)

    ∂f(x)/∂[x]_2 |_{x=(r,0)} = (r/(nσ²)) Σ_{i=1}^n sin(2π(i−1)/n) exp( (r²/σ²) ( cos(2π(i−1)/n) − 1 ) ).   (3.35)

Due to the symmetry of the problem and because n ≥ 2, equation (3.35) will always be zero. Since exp(x) > 0 and

    cos(2π(i−1)/n) − 1 ≤ 0,   i = 1, ..., n,   (3.36)

where equality only holds for i = 1, equation (3.34) will always be less than zero for n ≥ 2. Therefore the gradient always points to the center of the circle and has nonzero length. This proves that f cannot have a maximum at any sample point and thus the case that the level-set lies exactly on the ridge of f cannot occur. Equation (3.34) also tells us that for very high values of r/σ the gradient will be very small, so in any numerical setup this case becomes possible. However, when we increase r/σ too much with respect to the number of sample points, the learned shape will fall apart into several subshapes (fig. 3.5), which is usually not a satisfactory result.

Figure 3.5: For σ too small relative to r (here σ = 2.3), the reconstructed shape falls apart.

This extensive example reveals quite a number of interesting properties of the solution as a level-set of a weighted sum of Gaussians. Of course all the computations only hold for a uniform sampling of circles, but they provide insights and intuitions very useful for understanding the solutions for arbitrary shapes. We learned for example that for certain parameter settings the resulting level-set can

contain additional shapes (such as the additional inner ring in the example) or even disintegrate, which we actually wish to avoid. This can be done by increasing the value of σ and changing f from a valley-type to a hill-type structure. Also, for algorithmically finding the level-set (e.g. in a rendering process) such a hill-type structure would be preferable, since level-sets are located more reliably when the function is steep around the iso-value. Unfortunately, as will be discussed later, increasing σ usually introduces severe numerical difficulties and thus reduces the quality of the reconstructed shape.

3.1.3 The Geometry of the SlabSVM

Until now we have not yet studied the geometric implications for the SlabSVM when using feature spaces induced by specific kernels, such as the Gaussian kernel. To do so, we will first consider some interesting special cases of the SlabSVM in the following sections. We will then return to the SlabSVM in section 3.4 and try to incorporate the insights from these special cases into a further analysis of the SlabSVM itself.

3.2 The OpenSlabSVM

3.2.1 Problem Formulation

In section 3.1 we defined the SlabSVM as finding a slab such that all sample points in feature space lie within this slab and its distance to the origin is maximal. The special case presented in the current section is the one where the slab width is set to infinity. We therefore have only one remaining hyperplane to consider, for which we request that all the sample points lie on its side not containing the origin. Again, its distance to the origin is maximized (fig. 3.6).

Figure 3.6: Setup OpenSlabSVM

We therefore want to solve the following optimization problem:

    min_{w,ρ}  (1/2)‖w‖² − ρ   (3.37)
    s.t.  ⟨w, φ(x_i)⟩ ≥ ρ,  ∀i.   (3.38)

As before, the sample points in objective space are denoted x_1, ..., x_n ∈ X and φ : X → Y is a feature map into some feature space. In machine learning, this problem is also called the OneClassSVM or SingleClassSVM [1]. To solve the problem, we can again form the generalized Lagrangian

    L(w, ρ, α) = (1/2)‖w‖² − ρ − Σ_{i=1}^n α_i (⟨w, φ(x_i)⟩ − ρ),   (3.39)

set its derivatives to zero

    ∂L(w, ρ, α)/∂w = 0  ⟹  w = Σ_{i=1}^n α_i φ(x_i)   (3.40)
    ∂L(w, ρ, α)/∂ρ = 0  ⟹  Σ_{i=1}^n α_i = 1   (3.41)

and state the dual problem

    min_α  (1/2) Σ_{i,j=1}^n α_i α_j ⟨φ(x_i), φ(x_j)⟩   (3.42)
    s.t.  α ≥ 0   (3.43)
    and   Σ_{i=1}^n α_i = 1.   (3.44)

Again, we can use the kernel trick to replace the scalar products, which ultimately leads to the problem

    min_α  (1/2) Σ_{i,j=1}^n α_i α_j k(x_i, x_j)   (3.45)
    s.t.  α ≥ 0   (3.46)
    and   Σ_{i=1}^n α_i = 1.   (3.47)

As for the SlabSVM, we have found a convex quadratic program to solve the OpenSlabSVM.
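A minimal sketch of solving the dual (3.45)-(3.47) numerically, assuming NumPy and SciPy; the sample data, the kernel width and the use of the SLSQP solver are illustrative choices of mine, and any quadratic programming solver could be substituted.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_gram(X, sigma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Illustrative 2D samples, roughly on a circle of radius 6.
rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0, 2 * np.pi, 12))
X = 6.0 * np.column_stack((np.cos(t), np.sin(t)))
K = gaussian_gram(X, sigma=2.5)
n = len(X)

# Dual of the OpenSlabSVM, eqs. (3.45)-(3.47):
#   min_alpha 1/2 alpha^T K alpha   s.t.  alpha >= 0,  sum(alpha) = 1.
res = minimize(
    fun=lambda a: 0.5 * a @ K @ a,
    x0=np.full(n, 1.0 / n),
    jac=lambda a: K @ a,                       # gradient of the quadratic objective
    bounds=[(0.0, None)] * n,
    constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}],
    method="SLSQP",
)
alpha = res.x
print("sum(alpha) =", alpha.sum(), " min(alpha) =", alpha.min())
```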

Figure 3.7: Solution of an OpenSlabSVM.

3.2.2 The Solution of the OpenSlabSVM

The solution of the OpenSlabSVM is the hyperplane defined by

    ⟨w, φ(x)⟩ = ρ,   (3.48)

fulfilling the conditions stated in equations (3.37)-(3.38). In objective space, the points x for which (3.48) holds will not necessarily form a hyperplane anymore. They will rather define an arbitrary nonlinear shape enclosing the sample points x_i (fig. 3.7). Using (3.40) this shape is defined as a level-set of the function

    f(x) = ⟨Σ_{i=1}^n α_i φ(x_i), φ(x)⟩   (3.49)
         = Σ_{i=1}^n α_i ⟨φ(x_i), φ(x)⟩   (3.50)
         = Σ_{i=1}^n α_i k(x_i, x).   (3.51)

Note that the use of kernels here, and for the quadratic program in the last subsection, once more saved us from having to compute any feature transformation directly. The level-set value is ρ, which can be computed by

    ρ = min_i f(x_i),   (3.52)

since it is the lower bound of the inequality (3.38) and because there will always be a sample point where equality holds. Using a Gaussian kernel, f will again be a weighted sum of Gaussians, as discussed in subsection 3.1.2.
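Continuing the hypothetical sketch after (3.47), and reusing the illustrative X, alpha and σ = 2.5 defined there, the solution function (3.51) and the level-set value (3.52) can be evaluated as follows.

```python
import numpy as np  # X, alpha and sigma = 2.5 are reused from the previous sketch

def f(x, X=X, alpha=alpha, sigma=2.5):
    """Solution function (3.51): f(x) = sum_i alpha_i k(x_i, x) for the Gaussian kernel."""
    d2 = ((X - x) ** 2).sum(axis=1)
    return alpha @ np.exp(-d2 / (2.0 * sigma ** 2))

# Level-set value (3.52): the reconstructed shape is the level set {x : f(x) = rho},
# and every sample satisfies f(x_i) >= rho.
rho = min(f(x_i) for x_i in X)
print("level-set value rho =", rho)
```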

Figure 3.8: Geometric setup of the OpenSlabSVM in Gaussian feature space with three sample points.

3.2.3 The Geometry of the OpenSlabSVM Using the Gaussian Kernel

When we combine equation (3.40) and the quadratic program obtained at the end of subsection 3.2.1, we get another optimization problem for the OpenSlabSVM, namely

    min_w  (1/2)‖w‖²   (3.53)
    s.t.  α ≥ 0   (3.54)
          Σ_i α_i = 1   (3.55)
          w = Σ_i α_i φ(x_i).   (3.56)

This alternative formulation reveals some nice geometric insights into the solution w in feature space (fig. 3.8):

- Equation (3.56) tells us that w is a linear combination of the feature points φ(x_i).
- The constraints (3.54) and (3.55) on the coefficients α_i of this linear combination narrow the solution region of w to points in the convex hull of the feature points φ(x_i).
- Because of the objective function (3.53), w will be the point closest to the origin in this convex hull.

There is another interesting geometric property of the OpenSlabSVM when the feature map φ is defined implicitly by a Gaussian kernel:

Proposition 3.2 (OpenSlabSVM - MiniBall Analogy) In the feature space induced by a Gaussian kernel, solving the OpenSlabSVM (which means finding the point w ∈ ConvexHull({φ(x_i)}) closest to the origin) is equivalent to finding the center w of the minimal enclosing ball of the points φ(x_i). In fact, this analogy even holds for positive semidefinite RBF kernels in general.

Proof: The validity of this geometric analogy could actually be read off directly from the implications of proposition 2.5 on the geometric setup. For a mathematical proof, let us state the problem of finding the center of the minimal enclosing ball of the points φ(x_i) as

    min_{R,w}  R²   (3.57)
    s.t.  ‖φ(x_i) − w‖² ≤ R²,  ∀i.   (3.58)

We can express the squared norm in (3.58) by dot products

    ‖φ(x_i) − w‖² = ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩.   (3.59)

As done before, we state the generalized Lagrangian as

    L(w, R, α) = R² − R² Σ_{i=1}^n α_i + Σ_{i=1}^n α_i ( ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩ ),   (3.60)

set its partial derivatives with respect to the primal variables to zero

    ∂L(w, R, α)/∂R = 0  ⟹  Σ_{i=1}^n α_i = 1   (3.61)
    ∂L(w, R, α)/∂w = 0  ⟹  w = Σ_{i=1}^n α_i φ(x_i),   (3.62)

use the resulting equations to substitute the primal variables and introduce a kernel to compute the dot product. The resulting dual problem then becomes:

    min_α  Σ_{i,j=1}^n α_i α_j k(x_i, x_j) − Σ_{i=1}^n α_i k(x_i, x_i)   (3.63)
    s.t.  α ≥ 0   (3.64)
    and   Σ_{i=1}^n α_i = 1.   (3.65)

Because of proposition 2.5, using an RBF kernel renders the linear term in the objective function (3.63) constant, and it is easy to see that as a consequence the MiniBall problem stated in (3.63)-(3.65) and its solution (3.62) become equivalent to the OpenSlabSVM defined in (3.45)-(3.47) and (3.40). q.e.d.

Taking these geometric insights into account, an alternative approach to the OpenSlabSVM could be a geometric algorithm solving the corresponding MiniBall problem, which is well studied in theoretical computer science (e.g. [5]).

3.3 The ZeroSlabSVM

3.3.1 Problem Formulation

In the last section we presented the OpenSlabSVM as the special case of the SlabSVM where the slab width is set to infinity. Let us now consider the other extreme case, reducing the slab width to zero, and call that case the ZeroSlabSVM. It can be formulated as finding a hyperplane orthogonal to w ∈ Y which exactly interpolates the sample points φ(x_1), ..., φ(x_n) ∈ Y and has maximal distance to the origin (fig. 3.9). Because of the nonlinear feature map φ : X → Y, the samples x_1, ..., x_n ∈ X in objective space do not necessarily have to form a linear shape. In the context of geometric modeling and shape reconstruction, the ZeroSlabSVM seems more interesting than the OpenSlabSVM, since the learned shape will interpolate the given sample points exactly. The problem of the ZeroSlabSVM can be stated as

    min_{w,ρ}  (1/2)‖w‖² − ρ   (3.66)
    s.t.  ⟨w, φ(x_i)⟩ = ρ,  ∀i.   (3.67)

Figure 3.9: Setup ZeroSlabSVM

Note that the only difference from the OpenSlabSVM is the equality in (3.67) instead of the inequality in (3.38). Because of this, only equality constraints are involved

and solving the problem with Lagrangian optimization theory becomes easier: necessary and sufficient conditions for the solution are that the partial derivatives of the Lagrangian function be zero. Thus:

    L(w, ρ, α) = (1/2)‖w‖² − ρ − Σ_{i=1}^n α_i (⟨w, φ(x_i)⟩ − ρ)   (3.68)

    ∂L(w, ρ, α)/∂w = 0  ⟹  w = Σ_{i=1}^n α_i φ(x_i)   (3.69)
    ∂L(w, ρ, α)/∂ρ = 0  ⟹  Σ_{i=1}^n α_i = 1   (3.70)
    ∂L(w, ρ, α)/∂α_j = 0  ⟹  ⟨w, φ(x_j)⟩ = ρ   (3.71)
                         ⟹  Σ_{i=1}^n α_i ⟨φ(x_i), φ(x_j)⟩ = ρ,  ∀j.   (3.72)

By introducing the variable transformation

    β_i := α_i / ρ   (3.73)

and replacing the scalar product with a kernel, we get the new system of equations

    w = Σ_i α_i φ(x_i)   (3.74)
    α = ρβ   (3.75)
    Σ_{i=1}^n β_i = 1/ρ   (3.76)
    Σ_{i=1}^n β_i k(x_i, x_j) = 1,  ∀j.   (3.77)

The most important equation is (3.77), since it lets us compute the coefficients β_i. Using equations (3.74) to (3.76) we can then find values for α_i, ρ and w. Consequently, solving the ZeroSlabSVM is reduced to solving the system of linear equations (3.77), which can be rewritten in matrix notation as

    Kβ = 1,   (3.78)

where K_ij = k(x_i, x_j). Of course, in order to find a unique feasible solution, the matrix K has to have full rank. Because of theorem 2.6 we know that this holds, for example, when using the Gaussian kernel. With kernels of inferior rank, the ZeroSlabSVM will not be

solvable (except with the trivial solution w = 0²). This can also be understood from the geometry of the kernel induced feature space: The ZeroSlabSVM tries to interpolate the endpoints of n vectors, within the space spanned by these vectors, by a hyperplane. In general this is only possible if the vectors are linearly independent, i.e., their endpoints are affinely independent and define a hyperplane. A necessary and sufficient condition for the linear independence of the samples in a kernel induced feature space is the full rank of its kernel.

3.3.2 The Solution of the ZeroSlabSVM

The solution of the ZeroSlabSVM is the hyperplane defined by

    ⟨w, φ(x)⟩ = ρ,   (3.79)

interpolating the sample points φ(x_1), ..., φ(x_n) in feature space. In objective space, this hyperplane corresponds to a more complex shape interpolating the original sample points x_1, ..., x_n. Using (3.69) it can be computed as the level-set of the function

    f(x) = ⟨Σ_{i=1}^n α_i φ(x_i), φ(x)⟩   (3.80)
         = Σ_{i=1}^n α_i ⟨φ(x_i), φ(x)⟩   (3.81)
         = Σ_{i=1}^n α_i k(x_i, x).   (3.82)

Once more the use of kernels, here and for the linear equation system in the last subsection, saved us from having to compute any feature transformation explicitly when solving the ZeroSlabSVM, since all computations can be performed directly in objective space. The level-set value is ρ, which can be computed for example with (3.76). Using a Gaussian kernel, f will again be a weighted sum of Gaussians as discussed in subsection 3.1.2.

The solution (3.22) suggested in example 3.1 was in fact the solution of the ZeroSlabSVM applied to the 2D-circle scenario, since the resulting shape interpolated all the points exactly. It should now be clear that when the setup is as symmetric as it was in example 3.1, the rows of the matrix K are simply permutations of each other and thus choosing the weights uniformly will solve the system of linear equations (3.78).

² In this case the variable transformation (3.73) is no longer valid and (3.78) does not hold anymore, since ρ = 0.
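A small sketch of the direct solution route, assuming NumPy; the sample shape and kernel width are made up for illustration and the helper name is mine. It solves Kβ = 1 (eq. (3.78)), recovers ρ and α from (3.75)-(3.76), checks the interpolation property, and also verifies numerically the center-of-ball picture of proposition 3.3 (introduced in the next subsection): all feature points are equidistant from w, which can be checked purely through kernel evaluations.

```python
import numpy as np

def gaussian_gram(X, sigma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Illustrative samples of a closed 2D curve and an illustrative kernel width.
rng = np.random.default_rng(4)
t = np.sort(rng.uniform(0, 2 * np.pi, 30))
X = np.column_stack(((6 + 0.3 * np.sin(3 * t)) * np.cos(t),
                     (6 + 0.3 * np.sin(3 * t)) * np.sin(t)))
sigma = 2.0
K = gaussian_gram(X, sigma)

# ZeroSlabSVM as the linear system (3.78), then (3.75)-(3.76).
beta = np.linalg.solve(K, np.ones(len(X)))
rho = 1.0 / beta.sum()
alpha = rho * beta

# Interpolation: f(x_j) = sum_i alpha_i k(x_i, x_j) equals rho at every sample.
print("max |f(x_j) - rho|:", np.abs(K @ alpha - rho).max())

# Equidistance in feature space: ||phi(x_j) - w||^2 = 1 - 2 f(x_j) + alpha^T K alpha.
dist2 = 1.0 - 2.0 * (K @ alpha) + alpha @ K @ alpha
print("squared distances to w:", dist2.min(), dist2.max())   # numerically equal
```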

3.3.3 The Geometry of the ZeroSlabSVM Using the Gaussian Kernel

Let us consider an approach to the problem of the ZeroSlabSVM (3.66)-(3.67) slightly different from the one in the last subsection. Again using Lagrangian optimization theory and equations (3.68), (3.69) and (3.70), we can formulate its dual problem:

    min_w  (1/2)‖w‖²   (3.83)
    s.t.  Σ_i α_i = 1   (3.84)
          w = Σ_i α_i φ(x_i).   (3.85)

As for the OpenSlabSVM, this alternative formulation leads to some interesting insights into the geometry of the ZeroSlabSVM. In fact, if we compare (3.83)-(3.85) with (3.53)-(3.56) from subsection 3.2.3, they look surprisingly similar. The only difference is the missing nonnegativity constraint (3.54) on the dual variables in the case of the ZeroSlabSVM. Therefore the only consequence of changing the inequality of the OpenSlabSVM in (3.38) to an equality for the ZeroSlabSVM in (3.67) is the loss of this nonnegativity constraint. What are the geometric implications of this observation? The alternative formulation of the ZeroSlabSVM stated above gives rise to the following geometric properties (fig. 3.10):

- Because of (3.85), w is a linear combination of the feature points φ(x_i).
- (3.84) restricts w to lie in the affine hull of the feature points φ(x_i) (compared to the convex hull for the OpenSlabSVM).
- Because of the objective function (3.83), w will be the point closest to the origin. Therefore it will be the projection of the origin onto the affine hull of {φ(x_i)}.

In subsection 3.2.3 we proved the analogy between the OpenSlabSVM and the MiniBall problem when a Gaussian kernel is used. For the ZeroSlabSVM, a very similar statement can be made:

Proposition 3.3 (ZeroSlabSVM - Center of Ball Analogy) In the feature space induced by a Gaussian kernel, solving the ZeroSlabSVM (which means finding the point w ∈ AffineHull({φ(x_i)}) closest to the origin) is equivalent to finding the center w of the ball with the smallest radius interpolating the points φ(x_i). This analogy also holds for positive semidefinite RBF kernels in general.

Proof: Compared to proposition 3.2, proposition 3.3 is even easier to understand by just combining the above geometric insights with proposition 2.5, or simply by looking at figure 3.10.

Figure 3.10: Geometric setup of the ZeroSlabSVM in Gaussian feature space with three sample points.

The mathematical proof is very similar to the one for the

OpenSlabSVM: The problem of finding the center of the ball with the smallest radius interpolating the points φ(x_i) can be formulated as

    min_{R,w}  R²   (3.86)
    s.t.  ‖φ(x_i) − w‖² = R²,  ∀i.   (3.87)

We can again express the squared norm in (3.87) by dot products

    ‖φ(x_i) − w‖² = ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩.   (3.88)

Stating the Lagrangian function

    L(w, R, α) = R² − R² Σ_{i=1}^n α_i + Σ_{i=1}^n α_i ( ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩ ),   (3.89)

setting its partial derivatives to zero

    ∂L(w, R, α)/∂R = 0  ⟹  Σ_{i=1}^n α_i = 1   (3.90)
    ∂L(w, R, α)/∂w = 0  ⟹  w = Σ_{i=1}^n α_i φ(x_i)   (3.91)

    ∂L(w, R, α)/∂α_j = 0  ⟹  2 Σ_{i=1}^n α_i ⟨φ(x_i), φ(x_j)⟩ = Σ_{s,t=1}^n α_s α_t ⟨φ(x_s), φ(x_t)⟩ + ⟨φ(x_j), φ(x_j)⟩ − R²,  ∀j,   (3.92)

and introducing a kernel to compute the dot product, we get the necessary and sufficient conditions:

    w = Σ_i α_i φ(x_i)   (3.93)
    Σ_{i=1}^n α_i = 1   (3.94)
    Σ_{i=1}^n α_i k(x_i, x_j) = (1/2) ( Σ_{s,t=1}^n α_s α_t k(x_s, x_t) + k(x_j, x_j) − R² ) =: c_j,  ∀j.   (3.95)

Because of proposition 2.5, using an RBF kernel renders c_j in (3.95) constant for all j, and it is easy to see that as a consequence the center-of-ball problem stated with (3.93)-(3.95) becomes equivalent to the ZeroSlabSVM defined in (3.69)-(3.71), where c_j corresponds to ρ. q.e.d.

3.3.4 The ZeroSlabSVM and Shape Reconstruction Using Radial Basis Functions

A well known technique in computer graphics for implicit surface modeling is shape reconstruction with radial basis functions ([6],[7]). In this context a radial basis function with respect to the sample points x_1, ..., x_n ∈ R^d is a function of the form

    f(x) = p(x) + Σ_{i=1}^n α_i g(‖x − x_i‖),   (3.96)

where p is a polynomial and the basic function g is a real valued function on [0, ∞). Usually, g is a conditionally positive definite RBF kernel k,

    g(‖x − x_i‖) = k(x, x_i),   (3.97)

of order q (see section 2.3), and p is a polynomial of degree lower than q. We will now show a connection between shape reconstruction with such radial basis functions and the ZeroSlabSVM. Given a set of sample points x_1, ..., x_n ∈ R^d, the

problem of shape reconstruction using radial basis functions with a conditionally positive definite RBF kernel of order q is formulated as the interpolation problem

    f(x_i) = y_i,  ∀i,   (3.98)
    Σ_{i=1}^n α_i s(x_i) = 0  for all polynomials s of degree smaller than q,   (3.99)

where the y_i are given values for f at the sample points x_i, for example y_i = 0 for on-surface points and y_i = ε_i ≠ 0 for off-surface points. The reconstructed shape is then implicitly defined by f as one of its level-sets. Let k be a conditionally positive definite RBF kernel, let {p_1, ..., p_l} be a basis for the polynomials of degree smaller than q, and let

    p(x) = Σ_{i=1}^l c_i p_i(x).   (3.100)

Then the interpolation problem (3.98)-(3.99) can be written as the following linear system:

    ( K    P ) ( α )   ( y )
    ( Pᵀ   0 ) ( c ) = ( 0 ),   (3.101)

where

    K_ij = k(x_i, x_j),   i, j = 1, ..., n,   (3.102)
    P_ij = p_j(x_i),      i = 1, ..., n;  j = 1, ..., l.   (3.103)

When the used kernel is even positive definite (as for example the Gaussian kernel) and thus has order q = 0, the polynomial p is not required anymore and the linear system becomes

    Kα = y.   (3.104)

Note that this is, up to a multiplicative factor, exactly the system of linear equations we derived for the ZeroSlabSVM in subsection 3.3.1 (yet we did not use off-surface points for the ZeroSlabSVM). While in computer graphics these equations were stated directly as an interpolation problem, it is interesting to see that for certain kernels they can also be derived as a special case of the SlabSVM.

3.3.5 Solving the ZeroSlabSVM

In the last few subsections we have presented different ways of approaching the ZeroSlabSVM. An obvious way to find its solution is to solve the linear equation system (3.78) using standard techniques. However, the Gram matrix K is notoriously ill conditioned, especially for a high number of data points (n > 1000). Its condition also strongly depends on the kernel parameter(s) and the used kernel. In the case of the Gaussian kernel, increasing σ typically leads to a very poor condition. Choosing σ too small, on the other hand, can decrease the quality of the

reconstructed shape. For reconstructions with radial basis functions in computer graphics, the multipole method ([6],[7]) is used to deal with this problem. It solves the problem approximately with quite some success, but is not easy to implement. In this thesis we present a different approach for the case when the Gaussian kernel (or any other full-rank RBF kernel) is used: exploiting the geometric insights gained in subsection 3.3.3, we introduce a purely geometric algorithm which approximately finds the solution w as the center of the ball with the smallest radius interpolating the points φ(x_1), ..., φ(x_n) in feature space. This algorithm is discussed in detail in chapter 4 and is a good example of how the geometric analysis of kernel methods can lead to new solution strategies, approaching the problem from a different angle.

3.4 The Geometry of the SlabSVM Revisited

So far we have studied in detail the two extreme cases of the SlabSVM, their geometric properties and the geometric implications when a Gaussian kernel is used. Let us now conclude this chapter by gaining some understanding of the geometry of the SlabSVM with a Gaussian kernel itself. Equations (3.8) and (3.9) tell us that, independent of the chosen slab width, the solution w will always lie in the affine hull of the samples φ(x_1), ..., φ(x_n) in feature space. We have seen that it is the point w_OpenSlab closest to the origin within the convex hull of those points when the slab width goes to infinity, and that it is the point w_ZeroSlab closest to the origin within the whole affine hull when the slab width is set to zero. Thus, solutions for slab widths between those extreme cases must lie on some parametrized curve starting at w_OpenSlab and leading to w_ZeroSlab. An open issue is the formal description of this curve. For only three sample points, this curve is a straight line (fig. 3.11). It is possible that arguments similar to those in [8] apply here, proving that the whole solution path of the SlabSVM - from w_OpenSlab to w_ZeroSlab - is piecewise linear in the slab width. However, this is only an intuition and remains to be proven.

It is not surprising that an analogy in the spirit of propositions 3.2 and 3.3 can also be found for the SlabSVM:

Proposition 3.4 (SlabSVM - Spherical Slab Analogy) In the feature space induced by a Gaussian kernel, solving the SlabSVM (which means finding the point w ∈ AffineHull({φ(x_i)}) fulfilling (3.1)-(3.2)) is equivalent to finding the center w of the spherical slab with the smallest radius and a given slab width, defined by two concentric hyperballs such that the points φ(x_i) lie outside the inner and inside the outer ball. This analogy also holds for positive semidefinite RBF kernels in general.

Figure 3.11: In the feature space induced by a Gaussian kernel and with only three points, the solutions w of the SlabSVM for different slab widths lie on a straight line between the solutions w_OpenSlab and w_ZeroSlab of its extreme cases.

Proof: The problem of finding the center of the spherical slab with the smallest radius for the points φ(x_i) can be stated as

    min_{R,w}  R²   (3.105)
    s.t.  R² + ε* ≤ ‖φ(x_i) − w‖² ≤ R² + ε,  ∀i.   (3.106)

As done before, we can express the squared norm in (3.106) by dot products

    ‖φ(x_i) − w‖² = ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩,   (3.107)

state the generalized Lagrangian

    L(w, R, α, α*) = R² + Σ_{i=1}^n α_i ( ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩ − R² − ε )
                        − Σ_{i=1}^n α_i* ( ⟨φ(x_i), φ(x_i)⟩ + ⟨w, w⟩ − 2⟨φ(x_i), w⟩ − R² − ε* ),   (3.108)

set its partial derivatives with respect to the primal variables to zero

    ∂L(w, R, α, α*)/∂R = 0  ⟹  Σ_{i=1}^n (α_i − α_i*) = 1   (3.109)

    ∂L(w, R, α, α*)/∂w = 0  ⟹  w Σ_{i=1}^n (α_i − α_i*) = Σ_{i=1}^n (α_i − α_i*) φ(x_i)   (3.110)
                            ⟹  w = Σ_{i=1}^n (α_i − α_i*) φ(x_i),   (3.111)

use the resulting equations to substitute the primal variables and introduce a kernel to compute the dot product. The resulting dual problem then becomes:

    min_{α,α*}  Σ_{i,j=1}^n (α_i − α_i*)(α_j − α_j*) k(x_i, x_j)   (3.112)
                + ε Σ_{i=1}^n α_i − ε* Σ_{i=1}^n α_i*   (3.113)
                − Σ_{i=1}^n (α_i − α_i*) k(x_i, x_i)   (3.114)
    s.t.  α^(*) ≥ 0   (3.115)
    and   Σ_{i=1}^n (α_i − α_i*) = 1.   (3.116)

Due to proposition 2.5, using an RBF kernel renders the linear term (3.114) constant, and the problem stated with (3.112)-(3.116), together with its solution (3.111), becomes equivalent to the SlabSVM stated in (3.13)-(3.15) and (3.8), where the slab parameters are related by ε^(*) = −2δ^(*). q.e.d.

Chapter 4

Center-of-Ball Approximation Algorithm

Proposition 3.3 stated that for positive semidefinite RBF kernels, solving the ZeroSlabSVM is equivalent to finding the center of the ball with the smallest radius interpolating the points in the kernel induced feature space. In this chapter we introduce a geometric algorithm to approximate this center, which exploits the geometric insights into the problem gained in subsection 3.3.3. The algorithm thus computes the point w with minimal but equal distance to n sample points φ_1, ..., φ_n in a Hilbert space, which all have the same distance from the origin. w is also the projection of the origin onto the affine hull of the samples {φ_i}.

Section 4.1 first presents the algorithm by stating its basic steps in subsection 4.1.1, explaining their geometric role in subsection 4.1.2, and eventually investigating some implementation details in subsection 4.1.3, refining the algorithm to its final form. In section 4.2 it is shown how the algorithm can be computed directly in objective space when using kernels, along with implications of using the Gaussian kernel, interesting remarks, observations and open questions. Section 4.3 concludes the chapter with some results and examples.

4.1 The Algorithm

4.1.1 Basic Two-Phase Iteration Algorithm

Given: n points φ_1, ..., φ_n with equal distance r to the origin.

Goal: Find the center w of the ball with the smallest radius interpolating the sample points.

Remarks: We can express any point p in the affine hull of {φ_i} as a linear combination

    p = Σ_{i=1}^n α_i φ_i,   where  Σ_{i=1}^n α_i = 1.   (4.1)

Since the points are equidistant to the origin, w can be viewed as the projection of the origin onto the affine hull of {φ_i}.

Definitions: In each iteration step k, the current approximation of the center of the ball is described by

    c_k = Σ_{i=1}^n α_i^k φ_i,   Σ_{i=1}^n α_i^k = 1.   (4.2)

Initialization (k = 0):

    α_i^0 = 1/n,  ∀i.   (4.3)

Iteration step (k → k + 1):

Phase 1: choose φ_{s_k}. In each step we find

    s_k = arg max_j  ⟨φ_j − c_k, −c_k⟩ / ‖φ_j − c_k‖.   (4.4)

Phase 2: update c_k → c_{k+1}. Then we update

    c_{k+1} = c_k + ( ⟨φ_{s_k} − c_k, −c_k⟩ / ‖φ_{s_k} − c_k‖² ) (φ_{s_k} − c_k)   (4.5)
            =: c_k + f_k (φ_{s_k} − c_k).   (4.6)

For the coefficients α_i this means

    α_{s_k}^{k+1} = (1 − f_k) α_{s_k}^k + f_k   (4.7)
    α_{i≠s_k}^{k+1} = (1 − f_k) α_i^k.   (4.8)

4.1.2 Geometric Interpretation

The current approximation of the center in each iteration is c_k. It lies in the affine hull AH of {φ_i}, since we have chosen 1/n as the initial value for α_i^0 and the update rules (4.7)-(4.8) thus never violate Σ_{i=1}^n α_i^{k+1} = 1. The solution w also lies in AH, being the projection of the origin onto AH. We now want to move to a point c_{k+1} closer to the center w. For this we consider the directions (φ_j − c_k) defined for every sample point, which allow us to move within AH. There are now two questions to be answered:

1. Which direction (φ_j − c_k) should we choose?

2. How far should we move in this direction to define our new approximation c_{k+1}?

We will start by answering the second question and then investigate the first one. It may be helpful to study figure 4.1, which illustrates the following considerations graphically in a three dimensional example.

Figure 4.1: The k-th iteration step.

How far should we move? Let us assume we have already found the optimal direction (φ_{s_k} − c_k). Starting at c_k and moving in this direction, we cannot get closer to w than the projection of w onto the line l(t) = c_k + t(φ_{s_k} − c_k) defined by the chosen direction. We cannot, of course, compute this projection directly, because w is the solution to our problem and we do not know it in advance. However, using the fact that w is the projection of the origin onto AH, the projection of w onto l(t) is equivalent to the projection of the origin onto l(t) (the projection point, w and the origin span a plane orthogonal to l(t)). We can reach this projection point, which will be our new center approximation c_{k+1}, by projecting the vector −c_k onto (φ_{s_k} − c_k), which gives us the distance to move in this direction, starting at c_k. This step is performed by phase two (4.5) of the algorithm.

Which direction should we choose? It remains to find the direction (φ_{s_k} − c_k) in which we will move as described above. This is the purpose of phase one (4.4). We have seen that the distance ‖c_{k+1} − c_k‖ we will move in phase two is the projection of −c_k onto the direction

(φ_{s_k} − c_k). Since c_k, c_{k+1} and w form a triangle with a right angle at c_{k+1}, the direction which will bring us closest to w in one step is the one with the highest value for the resulting moving distance ‖c_{k+1} − c_k‖. Therefore, in phase one (4.4) the direction (φ_{s_k} − c_k) maximizing this distance is chosen.

4.1.3 Refined Algorithm

In subsection 4.1.1 the basic steps defining the algorithm have been presented. This subsection shows how to reduce the complexity by computing the terms in equations (4.4) and (4.5) iteratively. To this end we define

    d_j^k := ⟨φ_j, c_k⟩ = Σ_{i=1}^n α_i^k ⟨φ_i, φ_j⟩   (4.9)

and

    e^k := ⟨c_k, c_k⟩ = Σ_{i,j=1}^n α_i^k α_j^k ⟨φ_i, φ_j⟩.   (4.10)

We can then rewrite the two phases of the algorithm, replacing the corresponding terms in equations (4.4) and (4.5):

Phase 1

    s_k = arg max_j  (e^k − d_j^k) / √(r² − 2d_j^k + e^k)   (4.11)

Phase 2

    c_{k+1} = c_k + ( (e^k − d_{s_k}^k) / (r² − 2d_{s_k}^k + e^k) ) (φ_{s_k} − c_k)   (4.12)
            =: c_k + f_k (φ_{s_k} − c_k)   (4.13)

Iterative update of d_j and e. Since we know how the α_i are updated in every step (equations (4.7)-(4.8)), we can also compute d_j and e iteratively:

    e^{k+1} = Σ_{i,j=1}^n α_i^{k+1} α_j^{k+1} ⟨φ_i, φ_j⟩
            = Σ_{i≠s_k} Σ_{j≠s_k} (1 − f_k)² α_i^k α_j^k ⟨φ_i, φ_j⟩   (4.14)
              + Σ_{i≠s_k} (1 − f_k) α_i^k ( (1 − f_k) α_{s_k}^k + f_k ) ⟨φ_i, φ_{s_k}⟩
              + Σ_{j≠s_k} ( (1 − f_k) α_{s_k}^k + f_k ) (1 − f_k) α_j^k ⟨φ_{s_k}, φ_j⟩

              + ( (1 − f_k) α_{s_k}^k + f_k )² ⟨φ_{s_k}, φ_{s_k}⟩
            = (1 − f_k)² Σ_{i≠s_k} Σ_{j=1}^n α_i^k α_j^k ⟨φ_i, φ_j⟩ + (1 − f_k) f_k Σ_{i≠s_k} α_i^k ⟨φ_{s_k}, φ_i⟩   (4.15)
              + (1 − f_k)² α_{s_k}^k Σ_{j≠s_k} α_j^k ⟨φ_{s_k}, φ_j⟩ + (1 − f_k) f_k Σ_{j≠s_k} α_j^k ⟨φ_{s_k}, φ_j⟩
              + (1 − f_k)² (α_{s_k}^k)² ⟨φ_{s_k}, φ_{s_k}⟩ + 2(1 − f_k) f_k α_{s_k}^k ⟨φ_{s_k}, φ_{s_k}⟩ + f_k² ⟨φ_{s_k}, φ_{s_k}⟩   (4.16)
            = (1 − f_k)² Σ_{i,j=1}^n α_i^k α_j^k ⟨φ_i, φ_j⟩ + 2 f_k (1 − f_k) Σ_{i=1}^n α_i^k ⟨φ_{s_k}, φ_i⟩ + r² f_k²   (4.17)
            = (1 − f_k)² e^k + 2 f_k (1 − f_k) d_{s_k}^k + r² f_k²   (4.18)

    d_j^{k+1} = Σ_{i=1}^n α_i^{k+1} ⟨φ_i, φ_j⟩   (4.19)
              = Σ_{i≠s_k} (1 − f_k) α_i^k ⟨φ_i, φ_j⟩ + ( (1 − f_k) α_{s_k}^k + f_k ) ⟨φ_{s_k}, φ_j⟩   (4.20)
              = (1 − f_k) Σ_{i≠s_k} α_i^k ⟨φ_i, φ_j⟩ + (1 − f_k) α_{s_k}^k ⟨φ_{s_k}, φ_j⟩ + f_k ⟨φ_{s_k}, φ_j⟩   (4.21)
              = (1 − f_k) Σ_{i=1}^n α_i^k ⟨φ_i, φ_j⟩ + f_k ⟨φ_{s_k}, φ_j⟩   (4.22)
              = (1 − f_k) d_j^k + f_k ⟨φ_{s_k}, φ_j⟩   (4.23)

Final Algorithm

Taking all of the above into account, the algorithm becomes:

Init:

    α_i^0 = 1/n,  ∀i   (4.24)
    d_i^0 = (1/n) Σ_{j=1}^n ⟨φ_j, φ_i⟩,  ∀i   (4.25)
    e^0 = (1/n) Σ_{i=1}^n d_i^0   (4.26)

Step (k → k + 1):

(phase 1: search)

    s_k = arg max_j  (e^k − d_j^k) / √(r² − 2d_j^k + e^k)   (4.27)
        =: arg max_j  g_j^k   (4.28)

(phase 2: step)

    f_k := (e^k − d_{s_k}^k) / (r² − 2d_{s_k}^k + e^k)   (4.29)
    α_{s_k}^{k+1} = (1 − f_k) α_{s_k}^k + f_k   (4.30)
    α_{i≠s_k}^{k+1} = (1 − f_k) α_i^k   (4.31)
    e^{k+1} = (1 − f_k)² e^k + 2 f_k (1 − f_k) d_{s_k}^k + r² f_k²   (4.32)
    d_j^{k+1} = (1 − f_k) d_j^k + f_k ⟨φ_{s_k}, φ_j⟩,  ∀j   (4.33)
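The final algorithm (4.24)-(4.33) only needs the pairwise scalar products ⟨φ_i, φ_j⟩, so it can be run entirely on a kernel Gram matrix (the kernel trick discussed in section 4.2). Below is a minimal sketch, assuming NumPy; the function name, iteration count, sample data and kernel width are mine and purely illustrative. For the Gaussian kernel r² = 1, and the result is compared with the direct ZeroSlabSVM solve of (3.78).

```python
import numpy as np

def center_of_ball(K, r2=1.0, n_iter=2000):
    """Sketch of the two-phase iteration (4.24)-(4.33), run on the Gram matrix
    K[i, j] = <phi_i, phi_j>, with r2 = <phi_i, phi_i>.
    Returns coefficients alpha such that w = sum_i alpha_i phi_i."""
    n = K.shape[0]
    alpha = np.full(n, 1.0 / n)                    # (4.24)
    d = K.mean(axis=0)                             # (4.25): d_j = <phi_j, c>
    e = d.mean()                                   # (4.26): e = <c, c>
    for _ in range(n_iter):
        # Phase 1 (4.27): direction with the largest projected step length.
        g = (e - d) / np.sqrt(r2 - 2.0 * d + e)
        s = int(np.argmax(g))
        # Phase 2 (4.29)-(4.33): step to the projection of the origin onto the
        # line through c and phi_s, and update the cached scalar products.
        f = (e - d[s]) / (r2 - 2.0 * d[s] + e)
        alpha *= (1.0 - f)                         # (4.31)
        alpha[s] += f                              # (4.30)
        e = (1 - f) ** 2 * e + 2 * f * (1 - f) * d[s] + r2 * f ** 2   # (4.32)
        d = (1.0 - f) * d + f * K[s]               # (4.33)
    return alpha

# Usage sketch: Gaussian Gram matrix of illustrative circle samples, compared
# with the direct ZeroSlabSVM solve K beta = 1, eq. (3.78).
rng = np.random.default_rng(5)
t = np.sort(rng.uniform(0, 2 * np.pi, 40))
X = 6.0 * np.column_stack((np.cos(t), np.sin(t)))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2.0 * 2.0 ** 2))

alpha_iter = center_of_ball(K)                     # r2 = 1 for the Gaussian kernel
beta = np.linalg.solve(K, np.ones(len(X)))
alpha_direct = beta / beta.sum()
print("equidistance residual:",
      np.abs(K @ alpha_iter - alpha_iter @ K @ alpha_iter).max())
print("max difference to the direct solution:",
      np.abs(alpha_iter - alpha_direct).max())
```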

4.2 Remarks

4.2.1 Convergence and Complexity

Lemma 4.1 (Convergence) The solution c_k of each iteration step described by (4.27)-(4.33) converges to the center w of the smallest ball interpolating the sample points φ_1, ..., φ_n.

Proof: In phase two of the iteration step we move from c_k to the projection of w onto l(t) = c_k + t(φ_{s_k} − c_k). Therefore the triangle defined by w, c_k and c_{k+1} has a right angle at c_{k+1} and

    ‖w − c_{k+1}‖ ≤ ‖w − c_k‖.   (4.34)

Equality only holds in two cases: either ‖w − c_k‖ equals zero, or the moving distance ‖c_{k+1} − c_k‖ equals zero. In the first case we have reached the solution w exactly and are done. The second case implies that the directions (φ_j − c_k) for every φ_j are orthogonal to (w − c_k). This can only occur when c_k is the projection of the origin onto the affine hull AH of the points {φ_j}, and thus also in the second case it holds that w = c_k. By this we have shown that the approximation c_{k+1} of each iteration step lies strictly closer to w until w is reached exactly. It now remains to prove that c_k cannot converge to any other point c^# ≠ w. Let us assume there exists such a limit point c^#. We have seen that if we choose c_k to be c^#, then c_{k+1} will lie a certain distance d > 0 closer to w. Since c^# is a convergence point, we can get arbitrarily close to it. But since the moving distance ‖c_{k+1} − c_k‖ is a continuous function of c_k, for a point c_k arbitrarily close to c^# virtually the same step as for c_k = c^# will be performed, and we get to a point c_{k+1} which is d > 0 closer to w and further away from c^#. Thus, c^# cannot be a convergence point. q.e.d.

The speed of the convergence remains an open issue. Consequently this also holds for the overall complexity of the algorithm. The complexity per iteration is O(n) (search for the best of n directions and update the coefficients α_1, ..., α_n). A good alternative to the overall complexity would be the approximation strength of the algorithm, i.e., how well the samples are interpolated after n steps. However, such a measure is not easily found and depends on the geometric setup of the original problem and the used kernel (see subsection 4.2.2). Additionally, the quality of the solution highly depends on numerical issues. Since shape reconstruction usually works with a large number of samples (and thus with a high dimensional kernel induced feature space), numerical errors will be inevitable. Therefore, the solution c_k will always remain an approximation. For reasons shown in subsection 4.2.2, when using kernels, their parameters will play an important role in the numerical condition of the problem. We have already discussed the numerical difficulties of solving the ZeroSlabSVM via a system of linear equations in subsection 3.3.5. Our hope is that the presented algorithm, by exploiting geometric insights, increases the numerical stability of finding a sufficient approximation in short time.

4.2.2 Kernel Induced Feature Spaces

The algorithm introduced in section 4.1 can, of course, operate on a set of sample points in any Hilbert space, provided the points are equidistant to the origin and thus lie on a hypersphere centered at the origin. Due to proposition 3.3, it can also be used as an alternative method to solve the ZeroSlabSVM presented in section 3.3, when a positive semidefinite RBF kernel is chosen. In that case, the space is the kernel induced feature space, which by proposition 2.5 fulfills the hypersphere constraint mentioned above. The sample points φ_i can then be seen as a nonlinear transformation φ(x_i) of some points x_i in objective space and we can compute their scalar products directly in objective

4.2.2 Kernel Induced Feature Spaces

The algorithm introduced in section 4.1 can, of course, operate on a set of sample points in any Hilbert space, provided that the points are equidistant from the origin and thus lie on a hypersphere centered at the origin. Due to proposition 3.3, it can also be used as an alternative method to solve the ZeroSlabSVM presented in section 3.3 when a positive semidefinite RBF kernel is chosen. In that case, the space is the kernel induced feature space, which by proposition 2.5 fulfills the hypersphere constraint mentioned above. The sample points φ_i can then be seen as nonlinear transformations φ(x_i) of points x_i in objective space, and we can compute their scalar products directly in objective space as kernel evaluations of those objective points:

    \langle \phi_i, \phi_j \rangle = \langle \phi(x_i), \phi(x_j) \rangle = k(x_i, x_j)        (4.35)

Since the geometric algorithm uses the scalar product as its sole operation on the sample points φ_i, the kernel trick is applicable. Therefore, also with the algorithm presented in this chapter, the ZeroSlabSVM can be computed entirely in objective space.

In section 3.3 we have seen that for the ZeroSlabSVM the Gram matrix (see definition 2.2) of the used kernel must have full rank. If this is not the case, the algorithm will find the trivial solution w = 0. An example of an RBF kernel whose Gram matrices have full rank is the Gaussian kernel. In that case the value of r used in section 4.1 is 1 (see subsection 2.2.3).

As indicated earlier, the parameter σ of the Gaussian kernel has a strong influence on the geometric setup in feature space and on the resulting numerical stability of the problem. Increasing σ decreases the distances between the points φ(x_i) in feature space. The algorithm clearly performs better on points that are broadly scattered over the hypersphere. As shown in figure 4.2, when some points lie close together, the algorithm may quickly reach a point with approximately the same distance to all sample points, but then enter a phase in which it approaches the correct solution w only slowly and thus needs many iteration steps.

Figure 4.2: Two points lying close together can already strongly influence the speed of convergence.

Moreover, small displacements of two close sample points (for example due to a small numerical error in the computation of the scalar product) can move the center w to an entirely different place in feature space (figure 4.3).

Figure 4.3: Small displacements of close sample points can relocate the solution.
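To make the kernel-trick formulation concrete, a Gram matrix filled with Gaussian kernel evaluations can be fed directly to the center_of_ball sketch from above; the parameterization k(x, y) = exp(−‖x − y‖² / (2σ²)), the toy data and the value of σ are assumptions for illustration only.

    import numpy as np

    def gaussian_gram(X, sigma):
        # K[i, j] = k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)); the diagonal is 1 = r^2.
        sq = np.sum(X ** 2, axis=1)
        dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
        return np.exp(-dist2 / (2.0 * sigma ** 2))

    X = np.random.rand(200, 3)         # placeholder sample points in objective space
    K = gaussian_gram(X, sigma=0.1)    # larger sigma pushes all entries towards 1
    alpha = center_of_ball(K)          # the whole iteration runs in objective space

Increasing σ drives all kernel values towards 1, i.e. the images φ(x_i) move closer together on the hypersphere, which is exactly the regime in which the iteration slows down and the problem becomes numerically delicate.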

On the other hand, as discussed in subsection 3.1.2, the shape reproduction quality of the solution can be poor when σ is chosen too small. Thus, also when using the algorithm described in section 4.1, the kernel parameter σ needs to be chosen wisely in order to obtain a moderate numerical condition and a satisfactory reproduction quality.

4.2.3 Off-Surface Points

In computer graphics, besides the sample points defining the surface, additional off-surface points are often used for shape reconstruction. For the resulting function f, implicitly defining the surface as one of its level-sets, the values f(x_offsurf) at these off-surface points provide additional constraints. For example, we can demand that all surface points lie in one level-set of f and that the off-surface points inside and outside the shape each lie in another one. We will not go into the details of the theory of off-surface points and how to generate them. However, we shortly demonstrate a way to solve the shape reconstruction problem with off-surface points using the algorithm introduced in this chapter. We have seen that the algorithm can solve linear equation systems of the form

    K\alpha = c                                                                                (4.36)

where c is a vector with constant entries c and K is the Gram matrix of a positive semidefinite RBF kernel with respect to some sample points x_1, ..., x_n. The entries of the vector c can have any constant value other than zero, since we can always solve (4.36) with c' = λc as right-hand side by setting α'_i = λα_i. Introducing

off-surface points, the system slightly changes to

    K\alpha = \begin{pmatrix} c_l \\ c_m + \epsilon_m \\ c_n - \epsilon_n \end{pmatrix}        (4.37)

where c_x and ε_x denote x-dimensional vectors with constant entries c and ε. K again denotes the Gram matrix of a positive semidefinite RBF kernel, now with respect to the on-surface points x_1, ..., x_l and the inner and outer off-surface points x_{l+1}, ..., x_{l+m} and x_{l+m+1}, ..., x_{l+m+n}. The level-set values of f at the inner and outer off-surface points will thus be c + ε and c − ε, whereas the level-set value for the on-surface points will be c.

How can we now solve (4.37) using the algorithm presented in section 4.1? A possible way is to decompose it into components as

    K\alpha = K\alpha_1 + \tfrac{1}{2}\left( K\alpha_2 - K\alpha_3 \right)                     (4.38)

and to solve

    K\alpha_1 = c_{l+m+n}                                                                      (4.39)

    K\alpha_2 = \begin{pmatrix} \epsilon_l \\ \epsilon_m \\ -\epsilon_n \end{pmatrix}          (4.40)

    K\alpha_3 = \begin{pmatrix} \epsilon_l \\ -\epsilon_m \\ \epsilon_n \end{pmatrix}          (4.41)

The solution can then be found as

    \alpha = \alpha_1 + \tfrac{1}{2}\left( \alpha_2 - \alpha_3 \right).                        (4.42)

As stated before, the geometric algorithm is able to solve linear equation systems of the form (4.36). The first system (4.39) already has that form. To solve the second system (4.40), we transform it into

    K'\alpha_2' = \epsilon_{l+m+n}                                                             (4.43)

with

    K'_{ij} := \begin{cases} -K_{ij} & \text{if } (i \le l+m \text{ and } j > l+m) \text{ or } (i > l+m \text{ and } j \le l+m) \\ \phantom{-}K_{ij} & \text{otherwise} \end{cases}    (4.44)

    [\alpha_2']_i := \begin{cases} -[\alpha_2]_i & \text{if } i > l+m \\ \phantom{-}[\alpha_2]_i & \text{otherwise} \end{cases}                                                         (4.45)

Equation (4.43) now has the form required by the geometric algorithm. The third system (4.41) can be solved analogously.
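This decomposition translates directly into code: the sign flips of (4.44) amount to conjugating K with a diagonal sign vector, and each subsystem is handled by the constant-right-hand-side solver, rescaled to the desired constant as described before (4.37). The sketch below reuses the hypothetical center_of_ball routine from section 4.1; the function names and the rescaling strategy are our own.

    import numpy as np

    def solve_constant_rhs(K, value):
        # Run the geometric iteration, then rescale the coefficients so that K a = value * 1
        # (scaling alpha scales the attained constant level linearly, cf. the remark on (4.36)).
        a = center_of_ball(K)
        level = (K @ a).mean()        # constant level <w, phi_j> reached by the iteration
        return a * (value / level)

    def off_surface_coefficients(K, l, m, n, c, eps):
        # K: Gram matrix over the l on-surface, m inner and n outer off-surface points (in that order).
        D_out = np.ones(l + m + n)
        D_out[l + m:] = -1.0          # reflect the outer off-surface points at the origin
        D_in = np.ones(l + m + n)
        D_in[l:l + m] = -1.0          # reflect the inner off-surface points at the origin
        a1 = solve_constant_rhs(K, c)                                       # (4.39)
        a2 = D_out * solve_constant_rhs(D_out[:, None] * K * D_out, eps)    # (4.40) via (4.43)-(4.45)
        a3 = D_in * solve_constant_rhs(D_in[:, None] * K * D_in, eps)       # (4.41), analogous flip
        return a1 + 0.5 * (a2 - a3)                                         # (4.42)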

The geometric interpretation of this transformation is that we have reflected the points φ(x_{l+m+1}), ..., φ(x_{l+m+n}) in feature space, corresponding to the outer off-surface points x_{l+m+1}, ..., x_{l+m+n}, at the origin. All points therefore still lie on the hypersphere in feature space, and K' is still a matrix of pairwise scalar products of this new set of partially reflected points.

One problem of this approach to solving the shape reconstruction problem with off-surface points using our geometric algorithm is that the numerical difficulties discussed in subsection 4.2.2 can get worse. If we use the Gaussian kernel, for example, we saw that for increasing values of σ the points in feature space lie closer together. When some of these points are now reflected at the origin, the affine hull of the points will inevitably pass very close to the origin. Thus the solution w will also be very close to the origin, and the values f(x) = ⟨w, φ(x)⟩ will all be very small. Increasing these values by scaling α (which scales w correspondingly) will also amplify the numerical error and reduce the quality of the reconstructed shape. Consequently, for a reasonable number of sample points this approach does not seem feasible.

4.2.4 Orthogonalization

In the geometric interpretation (subsection 4.1.2) of the algorithm we have seen that in every step the direction (φ_{s_k} − c_k) which brings us closest to the solution w in one step is chosen. This resembles a steepest descent approach, except that we only use a finite set of n possible directions. Similar to the idea of the conjugate gradient method, we could also orthogonalize the next direction against all previously chosen directions in every step. This would lead to the solution w in at most n (= number of samples) steps. We believe, however, that this would introduce the same numerical difficulties as when we attempted to solve the problem through the linear equation system presented in section 3.3. Due to numerical errors, especially in the last iterations, the solution would still be an approximation. Furthermore, the complexity per iteration would increase, and the argument that we reach a satisfactory solution in reasonable time would be weakened.

4.3 Results

Figures 4.4 to 4.7 show some examples of shape reconstruction with the ZeroSlabSVM, using the algorithm introduced in the current chapter and a Gaussian kernel. As already mentioned in subsection 3.1.2, the kernel parameter σ should not be chosen too large in order to maintain a feasible numerical condition. Note that one then has to find a good rendering strategy to visualize the level-set of the resulting function f defining the shape, since f will be more of a valley-type structure, which can pose problems for standard rendering techniques for implicit surfaces such as marching cubes. To address these problems, one can for example slightly decrease the level-set value. However, changing the level-set value also reduces the interpolation quality of the shape.
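For rendering, the resulting implicit function f(x) = ⟨w, φ(x)⟩ = Σ_i α_i k(x_i, x) can be sampled on a regular grid and handed to any marching-cubes style extractor. The sketch below reuses the X, alpha and sigma variables assumed in the earlier sketches; the grid resolution, the bounding box margin, the slightly lowered iso-value and the commented-out scikit-image call are illustrative choices only.

    import numpy as np

    def implicit_f(Q, X, alpha, sigma):
        # f(q) = sum_i alpha_i k(x_i, q) for the Gaussian kernel
        d2 = np.sum(Q ** 2, 1)[:, None] + np.sum(X ** 2, 1)[None, :] - 2.0 * Q @ X.T
        return np.exp(-d2 / (2.0 * sigma ** 2)) @ alpha

    res = 64
    lo, hi = X.min(axis=0) - 0.1, X.max(axis=0) + 0.1
    axes = [np.linspace(lo[k], hi[k], res) for k in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    # evaluate in chunks to keep the intermediate distance matrices small
    volume = np.concatenate([implicit_f(q, X, alpha, sigma=0.1)
                             for q in np.array_split(grid, 64)]).reshape(res, res, res)

    level = implicit_f(X, X, alpha, sigma=0.1).mean()   # value of f attained at the samples
    iso = 0.98 * level                                  # slightly lowered level-set value for rendering
    # verts, faces, *_ = skimage.measure.marching_cubes(volume, level=iso)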

Figure 4.4: Cactus (3337 Points)

Figure 4.5: Flat Ball (492 Points)

Figure 4.6: Max Planck Head (2022 Points)

Figure 4.7: Ball Joint (1964 Points)


Chapter 5

Feature-Coefficient Correlation

5.1 Features of a Shape

In the context of pattern recognition, computer vision, medical imaging and free-form shape design, working with features of a shape has become increasingly important. The two features ridge and ravine are defined as local positive maxima of the maximal and local negative minima of the minimal principal curvature along their associated principal curvature lines ([9],[10]). Ridges and ravines are deeply connected with interesting properties of a shape such as its medial axis or its distance function.

5.2 Feature-Coefficient Correlation

The observation was already made in [4] that features seem to correlate with the coefficients (α_i − α_i^*) obtained from the solution of the SlabSVM using a Gaussian kernel. As we have seen in chapter 3, such a coefficient (α_i − α_i^*) is assigned to every sample point x_i. The observation was that the coefficients with the highest values and the ones with negative values seem to correspond to ridges and ravines of the shape. This is shown in the figures below, where the samples are color coded according to their corresponding coefficients.

We now made the same observation for the ZeroSlabSVM. The figures show some examples, where the sample points are again color coded according to their associated coefficients α_i, now stemming from the solution of the ZeroSlabSVM. In figure 5.5 we can see that not only features such as ridges, ravines or sharp edges but also the sampling density influences the values of those coefficients. A change of sampling density on both arms of the cactus can, for example, produce negative coefficients at points where no features exist.

We know that the coefficients are the weights of a sum of Gaussians (see subsection 3.1.2) which implicitly defines the reconstructed shape through one of its level-sets. It is therefore not difficult to gain some intuition on why the highest and the negative coefficients tend to appear at such features of the shape.
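A color coding like the one used in these figures can be produced in a few lines; the sketch below assumes that the sample points X and their coefficients alpha from the ZeroSlabSVM solution are available, and the diverging colormap with a symmetric range is our choice so that negative coefficients and the largest positive ones stand out.

    import numpy as np
    import matplotlib.pyplot as plt

    # X: n x 3 sample points, alpha: their associated coefficients
    lim = np.max(np.abs(alpha))                  # symmetric color range around zero
    ax = plt.figure().add_subplot(projection="3d")
    sc = ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=alpha, cmap="coolwarm",
                    vmin=-lim, vmax=lim, s=4)
    plt.colorbar(sc, label="coefficient value")
    plt.show()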

Figure 5.1: Bunny

Figure 5.2: Rocker Arm
