EIGEN-ANALYSIS OF KERNEL OPERATORS FOR NONLINEAR DIMENSION REDUCTION AND DISCRIMINATION


EIGEN-ANALYSIS OF KERNEL OPERATORS FOR NONLINEAR DIMENSION REDUCTION AND DISCRIMINATION

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Zhiyu Liang, Graduate Program in Statistics

The Ohio State University, 2014

Dissertation Committee: Yoonkyung Lee, Advisor; Tao Shi; Vincent Vu

© Copyright by Zhiyu Liang, 2014

ABSTRACT

There has been growing interest in kernel methods for classification, clustering and dimension reduction. For example, kernel linear discriminant analysis, spectral clustering and kernel principal component analysis are widely used in statistical learning and data mining applications. The empirical success of the kernel method is generally attributed to the nonlinear feature mapping induced by the kernel, which in turn determines a low dimensional data embedding. It is important to understand the effect of a kernel and its associated kernel parameter(s) on the embedding in relation to data distributions. In this dissertation, we examine the geometry of the nonlinear embeddings for kernel PCA and kernel LDA through spectral analysis of the corresponding kernel operators. In particular, we carry out eigen-analysis of the polynomial kernel operator associated with data distributions and investigate the effect of the degree of polynomial on the data embedding. We also investigate the effect of centering kernels on the spectral property of both polynomial and Gaussian kernel operators. In addition, we extend the framework of the eigen-analysis of kernel PCA to kernel LDA by considering between-class and within-class variation operators for polynomial kernels. The results provide both insights into the geometry of nonlinear data embeddings given by kernel methods and practical guidelines for choosing an appropriate degree for dimension reduction and discrimination with polynomial kernels.

This is dedicated to my parents, Yigui Liang and Yunmei Hou; my sister, Zhiyan Liang; and my husband, Sungmin Kim.

ACKNOWLEDGMENTS

I would like to express my sincere appreciation and gratitude first and foremost to my advisor, Dr. Yoonkyung Lee, for her continuous advice and encouragement throughout my doctoral studies and dissertation work. Without her excellent guidance, patience and constructive comments, I could have never finished my dissertation successfully. My gratitude also goes to the members of my committee, Dr. Tao Shi, Dr. Vincent Vu, and Dr. Prem Goel, for their guidance in my research and valuable comments. Special thanks also go to Dr. Randolph Moses for his willingness to participate in my final defense committee and for giving constructive comments afterwards. I also thank Dr. Elizabeth Stasny for her help throughout my Ph.D. study, and Dr. Rebecca Sela and Dr. Nader Gemayel for their valuable advice that made my internship fruitful and exciting. I also thank my parents and sister for always supporting me and encouraging me with their best wishes. Last but not least, I would like to express my love, appreciation, and gratitude to my husband Sungmin Kim for his continuous help, support, love and much valuable academic advice.

VITA

B.Sc. in Applied Mathematics, Shanghai University of Finance and Economics, Shanghai, China

Present: Graduate Teaching/Research Associate, The Ohio State University, Columbus, OH

PUBLICATIONS

Liang, Z. and Lee, Y. (2013). "Eigen-analysis of Nonlinear PCA with Polynomial Kernels." Statistical Analysis and Data Mining, Vol. 6, Issue 6.

FIELDS OF STUDY

Major Field: Statistics

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Figures

1 Introduction

2 Kernel Methods
  2.1 Kernel
  2.2 Kernel method
    2.2.1 Examples of kernel methods
  2.3 Kernel Operator
    2.3.1 Definition
    2.3.2 Eigen-analysis of the Gaussian kernel operator

3 Eigen-analysis of Kernel Operators for Nonlinear Dimension Reduction
  3.1 Eigen-analysis of the Polynomial Kernel Operator
    3.1.1 Two-dimensional setting
    3.1.2 Multi-dimensional setting
    3.1.3 Polynomial kernel with constant
  3.2 Simulation Studies
    3.2.1 Uniform example
    3.2.2 Mixture normal example
    3.2.3 Effect of degree
    3.2.4 Restriction in embeddings
  3.3 Analysis of Handwritten Digit Data

4 On the Effect of Centering Kernels in Kernel PCA
  4.1 Centering in the feature space
    4.1.1 Simple illustration
    4.1.2 Centered data with centered kernel
    4.1.3 Uncentered data with uncentered kernel
    4.1.4 Uncentered data with centered kernel
    4.1.5 Extension
  4.2 General Eigen-analysis for Centered Kernel Operator
    4.2.1 Derivation of eigenfunction-eigenvalue pair
    4.2.2 Orthogonality of eigenfunctions
    4.2.3 Connection with the centered polynomial kernel operator result
  4.3 Analysis of Centered Gaussian Kernel Operator
    4.3.1 Centered Gaussian kernel operator
    4.3.2 One-component normal
    4.3.3 Mixture normal distribution

5 Eigen-analysis of Kernel Operators for Nonlinear Discrimination
  5.1 The Population Version of Kernel LDA
  5.2 Eigen-analysis of the Polynomial Kernel Operator
    5.2.1 Two-dimensional setting
    5.2.2 Multi-dimensional setting
  5.3 Simulation Studies
  5.4 Effect of Degree

6 Conclusion and Discussion
  6.1 Conclusion
  6.2 Discussion

APPENDICES

A Proof for the form of leading eigenfunctions in a simple centered data example
B Example for the centered data with centered kernel
C Remarks on K_p being a valid mapping

Bibliography

LIST OF FIGURES

3.1 Comparison of the contours of the nonlinear embeddings given by three leading eigenvectors and the theoretical eigenfunctions for the uniform data. The upper three panels are for the embeddings induced by the eigenvectors for three nonzero eigenvalues, and the lower three panels are for the corresponding eigenfunctions.

3.2 Comparison of the contours of the nonlinear embeddings given by three leading eigenvectors (top panels) and the theoretical eigenfunctions (bottom panels) for the mixture normal data when degree is 2.

3.3 Comparison of the contours of the nonlinear embeddings given by four leading eigenvectors (top panels) and the theoretical eigenfunctions (bottom panels) for the mixture normal data when degree is 3.

3.4 The mixture normal data and their projections through principal components with polynomial kernel of varying degrees. The colors distinguish the two normal components.

3.5 Wheel data and their projections through principal components with polynomial kernel of varying degrees. The colors distinguish the two clusters.

3.6 Restricted projection space for kernel PCA with quadratic kernel when the leading eigenfunctions are φ₁(x) = 0.1x₁² − 0.0x₂² and φ₂(x) = 0.5x₁x₂.

3.7 Projections of handwritten digits by kernel PCA with polynomial kernels of increasing degree.

3.8 Images corresponding to a 5 × 5 grid over the first two principal components for kernel PCA of the handwritten digits.

3.9 Projections of handwritten digits by kernel PCA with polynomial kernels of increasing degree.

3.10 Projections of digits given by approximate eigenfunctions of kernel PCA that are based on the sample moment matrices.

4.1 Comparison of the contours of the leading eigenfunction for the uncentered kernel operator (left) and the centered kernel operator (right) in the bivariate normal example.

4.2 Contour plots of the leading eigenfunctions of the uncentered kernel operator for the distribution setting in (4.6) when the center of the data distribution gradually moves along the x₁ axis away from the origin.

4.3 The trichotomy of the leading eigenfunction form for the uncentered data distribution case in (4.6) with centered kernel.

4.4 Contours of the leading eigenfunction as m increases when k = 1.5 for the uncentered data distribution in (4.6) with the centered kernel operator.

4.5 Contours of the leading eigenfunction when the center of the data distribution moves from the origin along the 45 degree line.

4.6 The first row shows the first five eigenvectors of an uncentered Gaussian kernel matrix with bandwidth w = 1.5 for data sampled from a normal distribution N(·, 1), and the second row shows the first five eigenvectors of the centered kernel matrix. The third row shows the linear combinations of eigenvectors with the coefficients derived from our analysis; the fourth row shows the first five theoretical eigenfunctions for the centered kernel operator; the fifth row shows the five eigenfunctions for the uncentered kernel operator.

4.7 The inner product ⟨φ₁, φ̃⟩_p versus the value of w.

4.8 The top row shows five leading eigenvectors of a Gaussian kernel matrix of data sampled from a mixture normal distribution 0.6 N(·, 1) + 0.4 N(·, 1); the second row shows the five leading eigenvectors of the uncentered kernel matrix. The third row shows the theoretical eigenfunctions of the centered kernel operator we obtained.

5.1 The contours of the probability density function of a mixture normal example with two classes. The red circles and blue crosses show the data points generated from the distribution.

5.2 The left panel shows the contours of the empirical discriminant function from the kernel LDA algorithm with linear kernel; the right panel shows the contours of the theoretical discriminant function.

5.3 The left panel shows the contours of the empirical discriminant function from the kernel LDA algorithm; the right panel shows the contours of the theoretical discriminant function (d = 2).

5.4 The left panel shows the embeddings of the kernel LDA algorithm; the right panel shows the contours of the discriminant function (d = 2).

5.5 Wheel data and contours of the theoretical discriminant functions of kernel LDA with polynomial kernel of varying degrees based on the sample moments.

5.6 Contours of the empirical discriminant functions for wheel data.

5.7 The scatterplot of two explicit features x₁² vs x₁x₂ (polynomial kernel with d = 2) for the bivariate normal example in (5.4). Red circles and blue crosses represent two classes.

5.8 The scatterplot of two features x₁² vs x₂² (polynomial kernel with d = 2) for the wheel data. Black circles and red crosses represent the outer circle and inner cluster in the original data.

5.9 Comparison between the first principal component of kernel PCA and the discriminant function of kernel LDA for the polynomial kernel with d = 2 over wheel data.

CHAPTER 1
INTRODUCTION

Kernel methods have drawn great attention in machine learning and data mining in recent years (Schölkopf and Smola 2002; Hofmann et al. 2008). They are given as nonlinear generalizations of linear methods by mapping data into a high dimensional feature space and applying the linear methods in the so-called feature space (Aizerman et al. 1964). Kernels are the functions that define the inner product of the feature vectors and play an important role in capturing the nonlinear mapping desired for data analysis. Historically, they are closely related to reproducing kernels used in statistics for nonparametric function estimation; see Wahba (1990) for spline models. The explicit form of the feature mapping is not required. Instead, specification of a kernel is sufficient for kernel methods. Application of the nonlinear generalization through kernels has led to various methods for classification, clustering and dimension reduction. Examples include support vector machines (SVMs) (Schölkopf et al. 1998; Vapnik 1995), kernel linear discriminant analysis (kernel LDA) (Mika et al. 1999), spectral clustering (Scott and Longuet-Higgins 1990; von Luxburg 2007), and kernel principal component analysis (kernel PCA) (Schölkopf et al. 1998).

There have been many studies examining the effect of a kernel function and its associated parameters on the performance of kernel methods. For example, Brown et al. (2000), Ahn (2010) and Baudat and Anouar (2000) investigated how to select the bandwidth of the Gaussian kernel for SVM and kernel LDA.

In spectral clustering and kernel PCA, the kernel determines the projections or data embeddings to be used for uncovering clusters or for representing data effectively in a low dimensional space, which are given as the leading eigenvectors of the kernel matrix. As kernel PCA regards the spectral analysis of a finite-dimensional kernel matrix, we can consider the eigen-analysis of the kernel operator as an infinite dimensional analogue, where eigenfunctions are viewed as a continuous version of the eigenvectors of the kernel matrix. Such eigen-analysis can provide a viewpoint of the method at the population level. In general, it is important to understand the effect of a kernel on the nonlinear data embedding in relation to data distributions. In this dissertation, we examine the geometry of the data embedding for kernel PCA. Zhu et al. (1998), Williams and Seeger (2000) and Shi et al. (2009) studied the relation between Gaussian kernels and the eigenfunctions of the corresponding kernel operator under normal distributions. Zhu et al. (1998) computed the eigenvalues and eigenfunctions of the Gaussian kernel operator explicitly when data follow a univariate normal distribution. Williams and Seeger (2000) investigated how eigenvalues and eigenfunctions change depending on the input density function, and stated that the eigenfunctions with relatively large eigenvalues are useful in classification, in the context of approximating the kernel matrix using a low rank eigen-expansion. Shi et al. (2009) extended the discussion to spectral clustering, explaining which eigenvectors to use for clustering when the distribution is a mixture of multiple components.

Among the kernel functions, the Gaussian kernel and polynomial kernels are commonly used. Although the Gaussian kernel is generally more flexible as a universal approximator, the two kernels have different merits, and the polynomial kernel with an appropriate degree can often be as effective as the Gaussian kernel. For example, Kaufmann (1999) discussed the application of polynomial kernels to handwritten digit recognition and the checkerboard problem in the context of classification using support vector machines, which produced decent results. Extending the current studies of the Gaussian kernel operator, we carry out eigen-analysis of the polynomial kernel operator under various data distributions. In addition, we investigate the effect of the degree on the geometry of the nonlinear embedding with polynomial kernels. In standard PCA, eigen-decomposition is performed on the covariance matrix to obtain the principal components. Analogous to this standard practice, data are centered in the feature space and the corresponding centered version of the kernel matrix is commonly used in kernel PCA. We explore the effect of centering kernels on the spectral property of both polynomial and Gaussian kernel operators, using the explicit form of the centered kernel operator. In particular, we characterize the change in the spectrum from the uncentered counterpart. As another popular kernel method, kernel LDA has been used successfully in many applications. For example, Mika et al. (1999) conducted an experimental study showing that kernel LDA is competitive in comparison to other classification methods. We extend the eigen-analysis of the kernel operator for kernel PCA to the generalized eigen-problem associated with kernel LDA, which leads to a better understanding of the kernel LDA projections in relation to the underlying data distribution on the nonlinear embedding for discrimination. We mainly investigate the eigen-analysis of the polynomial kernel operator for kernel LDA and comment on the effect of the degree.

Chapter 2 gives an introduction to the technical details of the kernel, kernel operator and kernel methods. It also provides a review of the eigen-analysis of the Gaussian kernel operator. Chapter 3 presents the eigen-analysis of nonlinear PCA with polynomial kernels. Section 3.1 includes the general results of the eigen-analysis of the polynomial kernel operator defined through data distributions, and we show that the matrix of moments determines the eigenvalues and eigenfunctions. In Section 3.2, numerical examples are given to illustrate the relationship between the eigenvectors of a sample kernel matrix and the eigenfunctions from the theoretical analysis. We comment on the effect of degrees (especially even or odd) on data projections given by the leading eigenvectors, in relation to some features of the data distribution in the original input space. We also discuss how the eigenfunctions can explain some geometric patterns observed in data projections. In Section 3.3, we present kernel principal component analysis of the handwritten digit data from Le Cun et al. (1990) for some pairs of digits and explain the geometry of the embeddings of digit pairs through analysis of the sample moment matrices. Chapter 4 mainly focuses on the effect of centering kernels. Section 4.1 regards how centering the kernel affects the spectral property of the polynomial kernel operator. We show examples using both centered and uncentered polynomial kernels to illustrate the difference. In Section 4.2, we use Mercer's theorem to express the kernel function for a general analysis of the centered kernel operator, which encompasses the result for the polynomial kernel operator. Section 4.3 examines the effect of centering kernels on the spectral property of the Gaussian kernel operator. We investigate both one-component normal and multi-component normal examples and describe the change in the spectrum after centering.

Chapter 5 extends the current framework for analysis of kernel PCA to kernel LDA. By solving the generalized eigen-problem associated with the population version of kernel LDA, we characterize the theoretical discriminant function that maximizes the between-class variation relative to the within-class variation. The polynomial kernel function is used in this derivation. Numerical examples are given in Section 5.3 and Section 5.4 to compare the empirical discriminant function and the theoretical discriminant function. Chapter 6 concludes the dissertation with discussions.

CHAPTER 2
KERNEL METHODS

2.1 Kernel

Suppose that data D = {x_1, ..., x_n} consist of an iid sample from a probability distribution P and the input domain for the data is X, e.g. X = R^p. Then a kernel function is defined as a non-negative definite mapping from X × X to R, i.e.

    K : X × X → R,  (x_i, x_j) ↦ K(x_i, x_j).

The kernel function is symmetric, which means K(x, y) = K(y, x). Besides, there are some properties of a kernel that are worth noting:

    K(x, x) ≥ 0  and  K(u, v)² ≤ K(u, u) K(v, v).

For a pair of data points (x_i, x_j), the kernel function can be expressed in terms of an inner product as follows:

    K(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩,

where Φ is typically a nonlinear map from the input space to an inner product space H, Φ : X → H. The reason we introduce the inner product space is that being able to compute the inner product allows us to perform related geometrical constructions with the information of angles, distances or lengths. Given such a formulation, we call the similarity measure function K a kernel, Φ its feature map and H the corresponding feature space. We say that the kernel K corresponds to inner products in the feature space H via the feature mapping Φ. Schölkopf and Smola (2002) showed that kernels which correspond to inner products in the feature space coincide with the class of non-negative definite kernels. Some non-negative definite kernels can be evaluated efficiently even though they correspond to inner products in an infinite dimensional inner product space. The correspondence is thus critical.

Historically, kernels are closely related to reproducing kernels typically used for nonparametric function estimation. The following summary of the construction of a reproducing kernel Hilbert space and its reproducing kernel gives an example of a well-defined kernel space and the corresponding kernel. To define reproducing kernels, consider a Hilbert space H_K of real valued functions on an input domain X. Note that a Hilbert space H_K is a complete inner product linear space, which is different from the feature space H. In Wahba (1990), a reproducing kernel Hilbert space is defined as a Hilbert space of real valued functions, where for each x ∈ X, the evaluation functional L_x(f) = f(x) is bounded in H_K.

By the Riesz representation theorem, if H_K is a reproducing kernel Hilbert space, then there exists an element K_x ∈ H_K, the representer of evaluation at x, such that

    L_x(f) = ⟨K_x, f⟩ = f(x)  for all f ∈ H_K;

see Aronszajn (1950) for details. The symmetric bivariate function K(x, y) (note that K(x, y) = K_x(y) = ⟨K_x, K_y⟩ = ⟨K_y, K_x⟩) is called the reproducing kernel, and it has the reproducing property

    ⟨K(x, ·), f(·)⟩ = f(x).

It can be shown that any reproducing kernel is non-negative definite. There exists a one-to-one correspondence between reproducing kernel Hilbert spaces and non-negative definite functions. The Moore-Aronszajn theorem states that for every reproducing kernel Hilbert space H_K of functions, there corresponds a unique reproducing kernel K(x, y), which is non-negative definite. Conversely, given a non-negative definite function K(s, t) on X, we can construct a unique reproducing kernel Hilbert space H_K that has K(s, t) as its reproducing kernel.

Given the kernel function, we define the kernel matrix in the following way. Let x_1, ..., x_n ∈ X be an iid sample and K be the kernel function. The kernel matrix is given as the n × n matrix K_n = [K(x_i, x_j)]. We say a kernel matrix is non-negative definite if it satisfies the condition

    Σ_{i,j} c_i c_j K(x_i, x_j) ≥ 0  for all c_1, ..., c_n ∈ R.

A kernel function which generates a non-negative definite kernel matrix K_n is called a non-negative definite kernel.
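To make the non-negative definiteness condition concrete, the following minimal sketch (an illustration added here; the Gaussian bandwidth, sample size and seed are arbitrary choices) forms a kernel matrix K_n and checks both the eigenvalue criterion and the quadratic form Σ_{i,j} c_i c_j K(x_i, x_j) ≥ 0:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                 # n = 50 points in R^2

# Gaussian kernel K(x, y) = exp(-||x - y||^2 / sigma^2), as in Section 2.2
sigma = 1.0
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K_n = np.exp(-sq_dists / sigma ** 2)

# K_n is symmetric, so it has real eigenvalues; non-negative definiteness
# of the kernel is reflected in non-negative eigenvalues of K_n.
print(np.linalg.eigvalsh(K_n).min() >= -1e-10)   # True up to rounding

# Equivalently, c^t K_n c >= 0 for any coefficient vector c.
c = rng.normal(size=50)
print(c @ K_n @ c >= 0.0)                        # True
```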

2.2 Kernel method

Kernel methods are given as nonlinear generalizations of linear methods by mapping data into a high dimensional feature space and applying the linear methods in the feature space. In most kernel methods, the key step is to replace the inner product with the kernel so that the explicit form of the feature mapping is not required. This substitution is called the kernel trick in machine learning. The trick allows us to handle problems which are difficult to solve in the high or even infinite dimensional feature space directly. We introduce three popular kernel methods in the following section where the kernel trick is applied. Some examples of positive definite kernels in those kernel methods include the Gaussian kernel K(x, x′) = exp(−‖x − x′‖²/σ²), the polynomial kernel of degree d, K(x, x′) = (1 + ⟨x, x′⟩)^d, the sigmoid kernel K(x, x′) = tanh(κ(x^t x′) + Θ), and so on.

2.2.1 Examples of kernel methods

(a) Support Vector Machines (SVM)

Several authors considered the class of hyperplanes based on the data (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), with input domain x_i ∈ X and y_i ∈ {−1, 1},

    x^t β + β_0 = 0,  where β ∈ R^p,

corresponding to the following classification rule:

    f(x) = sign(x^t β + β_0),

and proposed a learning algorithm for problems which are linearly separable by finding the hyperplane that creates the largest margin between the points of different classes through an optimization problem (Vapnik and Lerner 1963; Vapnik and Chervonenkis 1964). We call the above classifier, which finds linear boundaries in the input space, the support vector classifier (Hastie et al. 2009). In the non-separable case, where the classes overlap, the optimization problem can be generalized by allowing some points on the wrong side of the margin, which leads to the objective function

    L_D = Σ_{i=1}^n α_i − (1/2) Σ_{i=1}^n Σ_{i′=1}^n α_i α_{i′} y_i y_{i′} x_i^t x_{i′}.

While the support vector classifier finds a linear boundary, the procedure can be made more flexible by mapping the data into the feature space. We call this extension the Support Vector Machine. It produces nonlinear boundaries in the input space by constructing linear boundaries in the feature space to achieve better separation.

Through the feature mapping, the objective function has the form

    L_D = Σ_{i=1}^n α_i − (1/2) Σ_{i=1}^n Σ_{i′=1}^n α_i α_{i′} y_i y_{i′} ⟨Φ(x_i), Φ(x_{i′})⟩.

By applying the kernel trick, we replace K(x_i, x_{i′}) = ⟨Φ(x_i), Φ(x_{i′})⟩; then the support vector machine for two-class classification problems has the following form:

    f(x) = Σ_{i=1}^n α_i K(x, x_i) + β_0.

The support vector machine is generally used to solve two-class problems. It can also be extended to multiclass problems by solving many two-class problems. SVMs have applications in many supervised and unsupervised learning problems.
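As a small illustration of the kernelized decision rule f(x) = Σ_i α_i K(x, x_i) + β_0 (a sketch added here: in practice the coefficients α_i and intercept β_0 come from solving the dual optimization problem, while below they are placeholders chosen only to show the evaluation step):

```python
import numpy as np

def poly_kernel(x, y, d=2):
    """Polynomial kernel K(x, y) = (1 + <x, y>)^d."""
    return (1.0 + x @ y) ** d

def svm_decision(x, X_train, alpha, beta0, kernel=poly_kernel):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + beta_0."""
    return sum(a * kernel(x, xi) for a, xi in zip(alpha, X_train)) + beta0

rng = np.random.default_rng(1)
X_train = rng.normal(size=(10, 2))
alpha = rng.normal(size=10)     # placeholder; normally solved from the dual
beta0 = 0.5                     # placeholder intercept
x_new = np.array([0.3, -0.2])
print(np.sign(svm_decision(x_new, X_train, alpha, beta0)))  # predicted class
```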

(b) Kernel PCA

Standard PCA is a powerful tool for extracting a linear structure in the data. The principal components are obtained by eigen-decomposition of the covariance matrix. Schölkopf et al. (1998) proposed kernel PCA by computing the inner product in the feature space using kernel functions in the input space. In this kernel method, one can compute the principal components in a high-dimensional feature space, which is related to the input space by some nonlinear feature mapping. Similar to SVM, this kernel method enables the construction of a nonlinear version of principal component analysis.

Suppose the covariance matrix for the centered observations x_i, i = 1, ..., n, is C = (1/n) Σ_{i=1}^n x_i x_i^t. PCA computes the principal components by solving the equation

    C v = λ v.

By mapping the data into the feature space H, the covariance matrix in H can be written in the form C̄ = (1/n) Σ_{i=1}^n Φ(x_i) Φ(x_i)^t. The problem is thus turned into finding the eigen-decomposition of C̄,

    C̄ u = λ u.

Notice that the computation involved is prohibitive when the feature space is very high dimensional. Replacing u = Σ_{i=1}^n α_i Φ(x_i) in the above equation, we have the eigenvalue problem of the kernel matrix

    K_n α = nλ α,

where the kernel matrix is given by K_n = [K(x_i, x_j)] = [⟨Φ(x_i), Φ(x_j)⟩]. We thus have the matrix α with its columns as the eigenvectors of the kernel matrix; let α^k indicate the eigenvector with respect to the eigenvalue λ_k. Correspondingly, the projection of any point x onto the normalized eigenvector u_k of the covariance matrix in the feature space can be derived as Σ_{i=1}^n α_i^k K(x_i, x). We are thus able to obtain embeddings in the feature space for any data point in the kernel PCA setting.
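The steps above translate directly into a short procedure. The sketch below (ours; uncentered for simplicity, with centering in the feature space deferred to Chapter 4) solves K_n α = nλα and returns both λ_n/n and the projection map x ↦ Σ_i α_i^k K(x_i, x):

```python
import numpy as np

def kernel_pca(X, kernel, n_components=2):
    """Uncentered kernel PCA via the eigen-decomposition of K_n."""
    n = X.shape[0]
    K_n = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    eigval, eigvec = np.linalg.eigh(K_n)            # ascending order
    idx = np.argsort(eigval)[::-1][:n_components]   # leading components
    lam, alpha = eigval[idx], eigvec[:, idx]
    # Scale alpha_k so that the feature-space eigenvector u_k has unit norm:
    # ||u_k||^2 = alpha_k^t K_n alpha_k = lam_k for a unit-norm alpha_k.
    alpha = alpha / np.sqrt(lam)
    def project(x):
        k_x = np.array([kernel(xi, x) for xi in X])
        return alpha.T @ k_x                        # sum_i alpha_i^k K(x_i, x)
    return lam / n, project       # lam / n approximates operator eigenvalues

poly2 = lambda x, y: (x @ y) ** 2
X = np.random.default_rng(2).uniform(size=(200, 2))
lam, project = kernel_pca(X, poly2)
print(lam, project(np.array([0.5, 0.5])))
```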

(c) Kernel LDA

In kernel PCA, we aim to find the principal components explaining as much variance of the data as possible, which serve to describe the data. When it comes to classification, we look for the features which discriminate between the two classes given the label information. The classical classification algorithms include linear and quadratic discriminant analysis, which assume a Gaussian distribution for each class. Fisher's linear discriminant direction is obtained by maximizing the between-class variance relative to the within-class variance. Mika et al. (1999) proposed a nonlinear classification technique based on Fisher's linear discriminant analysis, which could be useful when the classification boundary is not clear. By mapping the data into the feature space, the kernel trick allows us to find Fisher's linear discriminant in the feature space H, leading to a nonlinear discriminant direction in the input space.

Assume we have two classes for this classification problem and let D_1 = {x_1^1, ..., x_{n_1}^1} and D_0 = {x_1^0, ..., x_{n_0}^0} be samples from the two different classes. Then the sample size is n = n_1 + n_0. Let Φ be the feature mapping into the feature space H. To find the linear discriminant in H, we need to find the direction w which maximizes the between-class variation relative to the within-class variation in the feature space, i.e.

    J(w) = (w^t S_B w) / (w^t S_W w).   (2.1)

Here w ∈ H, and S_B and S_W are the matrices in the feature space,

    S_B = (m_1^Φ − m_0^Φ)(m_1^Φ − m_0^Φ)^t  and  S_W = Σ_{l=1,0} Σ_{x ∈ D_l} (Φ(x) − m_l^Φ)(Φ(x) − m_l^Φ)^t,

where m_l^Φ = (1/n_l) Σ_{j=1}^{n_l} Φ(x_j^l), l = 1, 0, is the mean of the feature vectors in class l.

When w ∈ H is in the span of all training samples in the feature space, w can be written as w = Σ_{i=1}^n α_i Φ(x_i). Plugging w into the equation (2.1) and expanding both the numerator and denominator in terms of α using the kernel trick, we have

    w^t S_B w = α^t B α  and  w^t S_W w = α^t W α,

where B and W are defined based on the kernel matrix; see Section 5.1 for the details of B and W. Therefore, Fisher's linear discriminant can be found through the generalized eigen-problem:

    B α = λ W α.

Similar to kernel PCA, the projection of a new pattern on the direction w in the feature space is given by a linear combination of the coefficients α_i and the kernel functions evaluated at the new point and the original data, K(x_i, x), which gives the empirical discriminant function

    f̂(x) = Σ_{i=1}^n α_i K(x_i, x)

in kernel LDA.
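A compact numerical sketch of this procedure is given below. It is our illustration: the matrices B and W are built in the standard form of Mika et al. (1999) (the dissertation's exact definitions appear in Section 5.1), and a small ridge term regularizes W:

```python
import numpy as np

def kernel_lda(X1, X0, kernel, reg=1e-6):
    """Two-class kernel LDA: solve B alpha = lambda W alpha."""
    X = np.vstack([X1, X0])
    n, n1 = len(X), len(X1)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    M1 = K[:, :n1].mean(axis=1)          # (1/n1) sum_j K(x_i, x_j^1)
    M0 = K[:, n1:].mean(axis=1)
    B = np.outer(M1 - M0, M1 - M0)       # between-class matrix
    W = np.zeros((n, n))
    for K_l in (K[:, :n1], K[:, n1:]):   # within-class matrix
        n_l = K_l.shape[1]
        C = np.eye(n_l) - np.ones((n_l, n_l)) / n_l   # within-class centering
        W += K_l @ C @ K_l.T
    evals, evecs = np.linalg.eig(np.linalg.solve(W + reg * np.eye(n), B))
    alpha = np.real(evecs[:, np.argmax(np.real(evals))])
    return lambda x: alpha @ np.array([kernel(xi, x) for xi in X])

rng = np.random.default_rng(3)
X1, X0 = rng.normal(1, 1, (30, 2)), rng.normal(-1, 1, (30, 2))
f = kernel_lda(X1, X0, lambda x, y: (1 + x @ y) ** 2)
s1, s0 = np.mean([f(x) for x in X1]), np.mean([f(x) for x in X0])
print(abs(s1 - s0) > 0)   # the two classes separate along the discriminant
```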

2.3 Kernel Operator

2.3.1 Definition

As we mentioned before, a kernel function K is defined as a non-negative definite mapping from X × X to R, and there is a unique function space H_K (called a reproducing kernel Hilbert space) corresponding to the kernel. Given a probability distribution P with density function p(x) and a kernel function K, the distribution-dependent kernel operator is defined as

    K_p f(y) = ∫_X K(x, y) f(x) p(x) dx   (2.2)

as a mapping from H_K to H_K. Then an eigenfunction φ ∈ H_K and the corresponding eigenvalue λ for the operator K_p are defined through the equation K_p φ = λφ, or

    ∫_X K(x, y) φ(x) p(x) dx = λ φ(y).   (2.3)

Note that the eigenvalue and eigenfunction depend on both the kernel and the probability distribution.

To see the connection between the kernel operator and the kernel matrix as its sample version, consider the n × n kernel matrix K_n = [K(x_i, x_j)]. From the discussion about kernel PCA in Section 2.2.1, we know that kernel PCA finds nonlinear data embeddings for dimension reduction through eigen-analysis of the kernel matrix. Suppose that λ_n and v = (v_1, ..., v_n)^t are a pair of eigenvalue and eigenvector of K_n such that K_n v = λ_n v. Then for each i = 1, 2, ..., n, we have

    (1/n) Σ_{j=1}^n K(x_i, x_j) v_j = (λ_n/n) v_i.

When x_1, ..., x_n are sampled from the distribution with density p(x) and v is considered as a discrete version of φ(·) at the data points, (φ(x_1), ..., φ(x_n))^t, we can see that the left-hand side of the above equation is an approximation to its integral counterpart:

    (1/n) Σ_{j=1}^n K(x_i, x_j) φ(x_j) ≈ ∫_X K(x, x_i) φ(x) p(x) dx.

As a result, λ_n/n can be viewed as an approximation to the eigenvalue λ of the kernel operator with eigenfunction φ. The pair of λ_n and v yield a nonlinear principal component or nonlinear embedding from X to R given by

    φ̂(x) = (1/λ_n) Σ_{i=1}^n v_i K(x_i, x).
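The following sketch (ours; the sample size, bandwidth and seed are arbitrary) carries out exactly this approximation for a Gaussian kernel with data from N(0, 1): it extracts the leading eigenpair of K_n, reports λ_n/n, and extends the eigenvector to the function φ̂:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 1))                  # sample from p(x) = N(0, 1)
w = 1.0
K_n = np.exp(-(X - X.T) ** 2 / (2 * w ** 2))   # Gaussian kernel matrix

lam_all, V = np.linalg.eigh(K_n)
lam_n, v = lam_all[-1], V[:, -1]               # leading eigenpair of K_n

print("approximate operator eigenvalue:", lam_n / len(X))   # lam_n / n

def phi_hat(x):
    """Nonlinear embedding phi_hat(x) = (1/lam_n) sum_i v_i K(x_i, x),
    a continuous extension of the eigenvector v."""
    return np.sum(v * np.exp(-(X.ravel() - x) ** 2 / (2 * w ** 2))) / lam_n

print(phi_hat(0.0), phi_hat(1.0))
```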

Hence, eigen-analysis of the kernel operator amounts to an infinite-dimensional analogue of kernel PCA. Baker (1977) gives the theory of the numerical solution of eigenvalue problems, showing that the eigenvalues of K_n converge to the eigenvalues of the kernel operator as n → ∞. The eigen-analysis of kernel PCA is useful for understanding the kernel method at the population level.

2.3.2 Eigen-analysis of the Gaussian kernel operator

As we discussed in the introduction, Zhu et al. (1998), Williams and Seeger (2000) and Shi et al. (2009) studied the relation between Gaussian kernels and the eigenfunctions of the corresponding kernel operator under normal distributions. Shi et al. (2009) obtained a refined version of the analytic results in Zhu et al. (1998) for the spectrum of the Gaussian kernel operator in the univariate Gaussian case. When the probability density function is normal with P = N(µ, σ²) and the kernel function is K(x, y) = exp(−(x − y)²/(2w²)), the eigenvalues and eigenfunctions are given explicitly by

    λ_i = √(2/(1 + β + √(1 + 2β))) · (β/(1 + β + √(1 + 2β)))^{i−1},

    φ_i(x) = ((1 + 2β)^{1/8} / √(2^{i−1} (i−1)!)) · exp(−((x − µ)²/(2σ²)) · (√(1 + 2β) − 1)/2) · H_{i−1}((1/4 + β/2)^{1/4} (x − µ)/σ),

for i = 1, 2, ..., where β = 2σ²/w² and H_i is the ith order Hermite polynomial; see Koekoek and Swarttouw (1998) for more details about Hermite polynomials. Williams and Seeger (2000) investigated the dependence of the eigenfunctions on the Gaussian input density and discussed how this dependence determines the basis functions for classification problems. Shi et al. (2009) explored this connection in an attempt to understand the spectral clustering method from a population point of view.
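The displayed spectrum can be verified against a sample kernel matrix, using the convergence of the eigenvalues of K_n/n noted above. In the sketch below (ours; µ, σ and w are arbitrary choices), the analytic λ_i come out to 0.75, 0.1875, 0.0469, ... and the sample values agree closely:

```python
import numpy as np

mu, sigma, w = 0.0, 1.0, 1.5
beta = 2 * sigma ** 2 / w ** 2                 # beta = 2 sigma^2 / w^2
s = np.sqrt(1 + 2 * beta)

# Analytic eigenvalues of the Gaussian kernel operator under N(mu, sigma^2)
lam = np.array([np.sqrt(2 / (1 + beta + s)) * (beta / (1 + beta + s)) ** i
                for i in range(5)])            # i = 0 corresponds to lambda_1

# Empirical counterpart: leading eigenvalues of K_n / n for a large sample
rng = np.random.default_rng(5)
X = rng.normal(mu, sigma, size=2000)
K_n = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * w ** 2))
lam_hat = np.sort(np.linalg.eigvalsh(K_n))[::-1][:5] / len(X)

print(np.round(lam, 4))       # theory:  0.75, 0.1875, 0.0469, ...
print(np.round(lam_hat, 4))   # sample approximation
```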

Many clustering algorithms use the top eigenvectors of the kernel matrix or its normalized version (Scott and Longuet-Higgins 1990; Perona and Freeman 1998; Shi and Malik 2000). Despite their empirical success, some limitations of the above standard spectral clustering methods are noted in Nadler and Galun (2007). For example, they pointed out that clustering algorithms based on a kernel matrix with a single parameter (e.g., the Gaussian kernel) would fail when dealing with clusters of different scales. Shi et al. (2009) investigated spectral clustering at the population level when the distribution P includes several separate high-density components. They found that when there is enough separation among the components, each of the top eigenfunctions of the kernel operator corresponds to one of the separate components, with the order of the eigenfunctions determined by the mixture proportions and the eigenvalues. They also showed that the top eigenfunction of the kernel operator for a separate component is the only eigenfunction with no sign change. Hence, when each mixture component has enough separation from the other components, the number of eigenfunctions of the kernel operator K_P with no sign change suggests the number of components of the distribution. Using the relationship between the kernel matrix and the kernel operator, we can estimate the number of clusters by the number of eigenvectors that have no sign change up to some precision.

CHAPTER 3
EIGEN-ANALYSIS OF KERNEL OPERATORS FOR NONLINEAR DIMENSION REDUCTION

3.1 Eigen-analysis of the Polynomial Kernel Operator

In this section, we study the dependence of the eigenfunctions and eigenvalues of the kernel operator on the data distribution when polynomial kernels are used. We examine the eigen-expansion of the polynomial kernel operator based on the equation (2.3) when X = R^p, and establish the dependence of the eigen-expansion on the data distribution. There are two types of polynomial kernels of degree d: i) K(x, y) = (x^t y)^d and ii) K̃(x, y) = (1 + x^t y)^d. We begin with the eigen-analysis for the first type in the two-dimensional setting in Section 3.1.1 and generalize it to the p-dimensional setting in Section 3.1.2. Then we extend the analysis further to the second type with an additional constant in Section 3.1.3.

3.1.1 Two-dimensional setting

Suppose that data arise from a two-dimensional setting, X = R² with probability density p(x). For the polynomial kernel of degree d, K(x, y) = (x^t y)^d, we derive λ and φ(·) satisfying

    ∫_{R²} (x^t y)^d φ(x) p(x) dx = λ φ(y)   (3.1)

in this setting. More explicitly,

    K(x, y) = (x₁y₁ + x₂y₂)^d = Σ_{j=0}^d \binom{d}{j} (x₁y₁)^{d−j} (x₂y₂)^j = Σ_{j=0}^d \binom{d}{j} (x₁^{d−j} x₂^j)(y₁^{d−j} y₂^j).

Note that the polynomial kernel can also be expressed as the inner product of the so-called feature vectors, Φ(x)^t Φ(y), through the feature map

    Φ(x) = ( \binom{d}{0}^{1/2} x₁^d, \binom{d}{1}^{1/2} x₁^{d−1} x₂, ..., \binom{d}{d}^{1/2} x₂^d )^t.

Appendix C comments on the fact that the mapping K_p is valid with the polynomial kernel. With the explicit expression of K, the equation (3.1) becomes

    ∫ [ Σ_{j=0}^d \binom{d}{j} (x₁^{d−j} x₂^j)(y₁^{d−j} y₂^j) ] φ(x) p(x) dx = λ φ(y),

which is re-expressed as

    Σ_{j=0}^d \binom{d}{j}^{1/2} y₁^{d−j} y₂^j [ \binom{d}{j}^{1/2} ∫ x₁^{d−j} x₂^j φ(x) p(x) dx ] = λ φ(y).

Let C_j = \binom{d}{j}^{1/2} ∫ x₁^{d−j} x₂^j φ(x) p(x) dx be a distribution-dependent constant for j = 0, ..., d. Then for λ ≠ 0, the corresponding eigenfunction φ(·) should be of the form

    φ(y) = (1/λ) Σ_{k=0}^d \binom{d}{k}^{1/2} C_k y₁^{d−k} y₂^k.   (3.2)

By substituting (3.2) for φ(x) in the defining equation for C_j, we get the following equations for the constants (j = 0, ..., d):

    C_j = (1/λ) \binom{d}{j}^{1/2} ∫ x₁^{d−j} x₂^j [ Σ_{k=0}^d \binom{d}{k}^{1/2} C_k x₁^{d−k} x₂^k ] p(x) dx,

which leads to

    λ C_j = Σ_{k=0}^d \binom{d}{j}^{1/2} \binom{d}{k}^{1/2} C_k ∫ x₁^{2d−j−k} x₂^{j+k} p(x) dx.

Note that ∫ x₁^{2d−j−k} x₂^{j+k} p(x) dx is E(X₁^{2d−j−k} X₂^{j+k}), a moment of the random vector X = (X₁, X₂)^t distributed with p(x). Let µ_{2d−j−k, j+k} denote this moment. Then the set of the equations can be written as

    Σ_{k=0}^d \binom{d}{j}^{1/2} \binom{d}{k}^{1/2} µ_{2d−j−k, j+k} C_k = λ C_j  for j = 0, ..., d.   (3.3)

Defining the (d+1) × (d+1) matrix with entries given by moments of total degree 2d,

    M_2^d = [ \binom{d}{j}^{1/2} \binom{d}{k}^{1/2} µ_{2d−j−k, j+k} ]_{j,k=0}^d,   (3.4)

whose (0, 0) entry is \binom{d}{0} µ_{2d,0} and whose (d, d) entry is \binom{d}{d} µ_{0,2d}, we can succinctly express the set of equations as

    M_2^d (C_0, C_1, ..., C_d)^t = λ (C_0, C_1, ..., C_d)^t.   (3.5)
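In code, constructing M_2^d and solving (3.5) takes only a few lines. The sketch below (ours) takes a moment function µ_{a,b} and a degree d; the Uniform(0,1) moments used here anticipate the example of Section 3.2.1:

```python
import numpy as np
from math import comb

def moment_matrix(moment, d):
    """Moment matrix M_2^d with entries
    sqrt(C(d,j) C(d,k)) * mu_{2d-j-k, j+k}, for j, k = 0, ..., d."""
    M = np.empty((d + 1, d + 1))
    for j in range(d + 1):
        for k in range(d + 1):
            M[j, k] = (np.sqrt(comb(d, j) * comb(d, k))
                       * moment(2 * d - j - k, j + k))
    return M

# X1, X2 iid Uniform(0, 1):  mu_{a,b} = E[X1^a X2^b] = 1 / ((a+1)(b+1))
unif_moment = lambda a, b: 1.0 / ((a + 1) * (b + 1))
M = moment_matrix(unif_moment, d=2)
lam, C = np.linalg.eigh(M)        # columns of C are the eigenvectors
print(np.sort(lam)[::-1])         # at most d + 1 = 3 nonzero eigenvalues:
                                  # approximately 0.5206, 0.0889, 0.0127
```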

For the moment matrix M_2^d, the subscript indicates the input dimension, and the superscript refers to the degree of the polynomial kernel. From the equation (3.5), we can see that the pairs of eigenvalue and eigenfunction for the polynomial kernel operator are determined by the spectral decomposition of the moment matrix M_2^d. Note that the eigenvectors of M_2^d need to be scaled so that ∫ φ²(x) p(x) dx = 1. Obviously, the determinant of M_2^d − λI is a polynomial in λ of degree (d+1). Therefore, there are at most (d+1) nonzero eigenvalues of the polynomial kernel operator. The statements so far lead to the following theorem.

Theorem 1. Suppose that the probability distribution p(x₁, x₂) defined on R² has finite 2d-th moments µ_{j,2d−j} = E(X₁^j X₂^{2d−j}), j = 0, ..., 2d. For the polynomial kernel of degree d, K(x, y) = (x^t y)^d,

(i) The eigenvalues of the polynomial kernel operator are given by the eigenvalues of the moment matrix M_2^d = [ \binom{d}{j}^{1/2} \binom{d}{k}^{1/2} µ_{2d−j−k, j+k} ]_{j,k=0}^d.

(ii) There are at most d + 1 nonzero eigenvalues.

(iii) The eigenfunctions are polynomials of total degree d of the form in (3.2) with coefficients determined by the eigenvectors of M_2^d.

(iv) The eigenfunctions φ_i are orthogonal in the sense that ⟨φ_i, φ_j⟩_p = ∫_{R²} φ_i(x) φ_j(x) p(x) dx = 0 for i ≠ j.

Proof. We prove the statement (iv). Let C_i and C_j be the eigenvectors of M_2^d corresponding to a pair of eigenfunctions φ_i and φ_j ∈ H_K with eigenvalues λ_i and λ_j. Then for i ≠ j,

    ∫_{R²} φ_i(x) φ_j(x) p(x) dx ∝ C_i^t M_2^d C_j = C_i^t (λ_j C_j) = λ_j C_i^t C_j = 0,

since eigenvectors of the symmetric matrix M_2^d corresponding to distinct eigenvalues are orthogonal.

3.1.2 Multi-dimensional setting

In general, consider the p-dimensional input space (X = R^p) for the data. For x, y ∈ R^p, the kernel function can be expanded as

    (x^t y)^d = ( Σ_{k=1}^p x_k y_k )^d = Σ_{j₁+⋯+j_p=d} \binom{d}{j₁,...,j_p} Π_{k=1}^p (x_k y_k)^{j_k},

and the equation (2.3) becomes

    ∫ Σ_{j₁+⋯+j_p=d} \binom{d}{j₁,...,j_p} Π_{k=1}^p (x_k y_k)^{j_k} φ(x) p(x) dx = λ φ(y),

i.e.

    Σ_{j₁+⋯+j_p=d} \binom{d}{j₁,...,j_p}^{1/2} Π_{k=1}^p y_k^{j_k} [ \binom{d}{j₁,...,j_p}^{1/2} ∫ Π_{k=1}^p x_k^{j_k} φ(x) p(x) dx ] = λ φ(y).

Letting C_{j₁,...,j_p} = \binom{d}{j₁,...,j_p}^{1/2} ∫ Π_{k=1}^p x_k^{j_k} φ(x) p(x) dx, we can write the eigenfunction φ(·) as

    φ(x) = (1/λ) Σ_{j₁+⋯+j_p=d} \binom{d}{j₁,...,j_p}^{1/2} C_{j₁,...,j_p} Π_{k=1}^p x_k^{j_k}.   (3.6)

Again, by plugging this expansion of φ(x) in the equation that defines C_{j₁,...,j_p}, we get a set of equations for the constants:

    C_{j₁,...,j_p} = (1/λ) \binom{d}{j₁,...,j_p}^{1/2} ∫ Π_k x_k^{j_k} [ Σ_{i₁+⋯+i_p=d} \binom{d}{i₁,...,i_p}^{1/2} C_{i₁,...,i_p} Π_k x_k^{i_k} ] p(x) dx,

which is rewritten as

    λ C_{j₁,...,j_p} = Σ_{i₁+⋯+i_p=d} \binom{d}{i₁,...,i_p}^{1/2} \binom{d}{j₁,...,j_p}^{1/2} C_{i₁,...,i_p} ∫ Π_{k=1}^p x_k^{i_k+j_k} p(x) dx.

Let µ_{j₁+i₁,...,j_p+i_p} denote the moment E(Π_{k=1}^p X_k^{j_k+i_k}) = ∫ Π_{k=1}^p x_k^{i_k+j_k} p(x) dx for (i₁,...,i_p) with i₁+⋯+i_p = d and (j₁,...,j_p) with j₁+⋯+j_p = d. Then we have

    Σ_{i₁+⋯+i_p=d} \binom{d}{i₁,...,i_p}^{1/2} \binom{d}{j₁,...,j_p}^{1/2} µ_{j₁+i₁,...,j_p+i_p} C_{i₁,...,i_p} = λ C_{j₁,...,j_p}.   (3.7)

To express the above equation in matrix form, we generalize the moment matrix M_2^d to M_p^d with entries given by \binom{d}{i₁,...,i_p}^{1/2} \binom{d}{j₁,...,j_p}^{1/2} µ_{j₁+i₁,...,j_p+i_p}. The dimension of M_p^d is the number of combinations of non-negative integers j_k's satisfying j₁+⋯+j_p = d, which is p_d = \binom{d+p−1}{d}. Then the equation (3.7) is written as M_p^d C = λ C, where C is a p_d-vector with entries C_{j₁,...,j_p} for j₁+⋯+j_p = d. Applying the argument used for the two-dimensional setting, we conclude that there are at most p_d = \binom{d+p−1}{d} nonzero eigenvalues of the polynomial kernel operator, and p_d depends on both the input dimension and the degree of the polynomial kernel. Thus we arrive at the following theorem.

35 Theorem. Suppose that the probability istribution p(x 1, x,, x p ) efine on p R p has finite th moments, µ i1 +j 1,,i p+j p = E( X i k+j k k ) for j j p =, i i p =. For the polynomial kernel of egree, K(x, y) = (x t y), k=1 (i) The eigenvalues of the polynomial kernel operator are given by the eigenvalues of the moment matrix Mp. ( ) + p 1 (ii) There are at most p = nonzero eigenvalues. (iii) The eigenfunctions are polynomials of total egree of the form in (.6) with coefficients given by the eigenvectors of M p. (iv) The eigenfunctions are orthogonal with respect to the inner prouct, φ i, φ j p = R p φ i (x)φ j (x)p(x)x..1. Polynomial kernel with constant The kernel operator for the secon type of polynomial kernel with constant can be treate as a special case of what we have iscusse in the previous section. For example, K (x, y) = (1 + x 1 y 1 + x y ) in the two-imensional setting can be viewe as K(x, y) = (x 1 y 1 + x y + x y ) in the three-imensional setting with x = y = 1. Using the connection between K an K, we know that the number of nonzero eigenvalues for the kernel operator with K is at most ( ) ( ) = = 6 from Theorem. The eigenfunctions in this case are of the following form: φ(x) = 1 ( ) 1 Cj1,j λ j 1, j, j,j x j 1 1 x j. (.) j 1 +j +j = 4

There are six combinations of non-negative integers j_k's such that j₁ + j₂ + j₃ = 2. The matrix M̃ in general is given as follows:

    M̃ = [ µ_{4,0,0}     √2 µ_{3,1,0}  √2 µ_{3,0,1}  µ_{2,2,0}     √2 µ_{2,1,1}  µ_{2,0,2}
          √2 µ_{3,1,0}  2 µ_{2,2,0}   2 µ_{2,1,1}   √2 µ_{1,3,0}  2 µ_{1,2,1}   √2 µ_{1,1,2}
          √2 µ_{3,0,1}  2 µ_{2,1,1}   2 µ_{2,0,2}   √2 µ_{1,2,1}  2 µ_{1,1,2}   √2 µ_{1,0,3}
          µ_{2,2,0}     √2 µ_{1,3,0}  √2 µ_{1,2,1}  µ_{0,4,0}     √2 µ_{0,3,1}  µ_{0,2,2}
          √2 µ_{2,1,1}  2 µ_{1,2,1}   2 µ_{1,1,2}   √2 µ_{0,3,1}  2 µ_{0,2,2}   √2 µ_{0,1,3}
          µ_{2,0,2}     √2 µ_{1,1,2}  √2 µ_{1,0,3}  µ_{0,2,2}     √2 µ_{0,1,3}  µ_{0,0,4} ],

and the vector C of constants C_{j₁,j₂,j₃} satisfies the following equation:

    M̃ (C_{2,0,0}, C_{1,1,0}, C_{1,0,1}, C_{0,2,0}, C_{0,1,1}, C_{0,0,2})^t = λ (C_{2,0,0}, C_{1,1,0}, C_{1,0,1}, C_{0,2,0}, C_{0,1,1}, C_{0,0,2})^t.

Since X₃ = 1, the moments µ_{i₁+j₁, i₂+j₂, i₃+j₃} = E(Π_{k=1}^3 X_k^{i_k+j_k}) are simplified to µ_{i₁+j₁, i₂+j₂} = E(X₁^{i₁+j₁} X₂^{i₂+j₂}).

In summary, we conclude that for a data distribution in R^p and the polynomial kernel of degree d with a constant term, the resulting eigenvalues and eigenfunctions of the kernel operator can be obtained on the basis of Theorem 2. The extension is accomplished by application of the result with the polynomial kernel of degree d for a data distribution in R^{p+1} with X_{p+1} fixed at 1, where the moments µ_{i₁+j₁,...,i_{p+1}+j_{p+1}} = E(Π_{k=1}^{p+1} X_k^{i_k+j_k}) reduce to µ_{i₁+j₁,...,i_p+j_p} = E(Π_{k=1}^p X_k^{i_k+j_k}).
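The reduction to Theorem 2 rests on the identity (1 + x^t y)^d = (x̃^t ỹ)^d for the augmented vectors x̃ = (x₁, x₂, 1)^t, which is easy to confirm numerically (a sketch with arbitrary test points):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(5, 2))
d = 2

# K~(x, y) = (1 + x^t y)^d on R^2 ...
K_tilde = (1 + X @ X.T) ** d

# ... equals K(x, y) = (x^t y)^d on R^3 after appending the coordinate 1
X_aug = np.hstack([X, np.ones((5, 1))])
K = (X_aug @ X_aug.T) ** d

print(np.allclose(K_tilde, K))   # True: the two kernels coincide
```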

3.2 Simulation Studies

We present simulation studies to illustrate the relationship between the theoretical eigenfunctions and the sample eigenvectors for kernel PCA. First we consider two simulation settings in R² and examine the explicit forms of the eigenfunctions using Theorem 1. With an additional example, we investigate the effect of degree (the parameter for polynomial kernels) on the nonlinear data embeddings induced by the kernel, which can be used for uncovering data clusters or discriminating different classes. Furthermore, we explore how eigenfunctions can be used to understand certain geometric patterns observed in data projections.

3.2.1 Uniform example

For X = (X₁, X₂)^t, let X₁ and X₂ be iid with the uniform distribution on (0, 1). Suppose that we use the second-order polynomial kernel, K(x, y) = (x₁y₁ + x₂y₂)². Since all the fourth moments µ_{j,4−j} = E(X₁^j X₂^{4−j}), j = 0, ..., 4, are finite in this case, we can compute the theoretical moment matrix M_2^2 explicitly, and it is given by

    M_2^2 = [ µ_{4,0}     √2 µ_{3,1}  µ_{2,2}
              √2 µ_{3,1}  2 µ_{2,2}   √2 µ_{1,3}
              µ_{2,2}     √2 µ_{1,3}  µ_{0,4} ]
          = [ 0.2     0.1768  0.1111
              0.1768  0.2222  0.1768
              0.1111  0.1768  0.2 ].

Notice that there is symmetry in the moments due to the exchangeability of X₁ and X₂ (e.g. µ_{1,3} = µ_{3,1}). The eigenvalues of the kernel operator are the same as those of M_2^2. We can get the eigenvalues of the matrix numerically, which are given by λ₁ = 0.5206, λ₂ = 0.0889, and λ₃ = 0.0127.

According to Theorem 1, given each eigenvalue λ, the corresponding eigenfunction can be written explicitly in the form

    φ(x) = (1/λ)(C₀ x₁² + √2 C₁ x₁x₂ + C₂ x₂²),

where (C₀, C₁, C₂)^t is a scaled version of the eigenvector of M_2^2 corresponding to λ. For simplicity of exposition, we choose not to scale the eigenfunctions to the unit norm but to go with the scale given by the eigenvectors throughout our numerical studies. With the unit-normed eigenvectors, we have the following eigenfunctions for the uniform distribution:

    φ₁(x) = −(0.542x₁² + 0.908x₁x₂ + 0.542x₂²),
    φ₂(x) = 0.707x₁² − 0.707x₂²,
    φ₃(x) = 0.454x₁² − 1.084x₁x₂ + 0.454x₂².

To make numerical comparisons, we took a sample of size 400 from the distribution and computed the sample kernel matrix for the second-order polynomial kernel. Then we obtained its eigenvalues and corresponding eigenvectors. There are three non-zero eigenvalues, and they are λ̂₁ = 0.540, λ̂₂ = 0.044, and λ̂₃ = 0.01 after being scaled by the sample size n as discussed in Section 2.3.1. The sample eigenvalues are quite close to the theoretical ones. Figure 3.1 compares the contour plots of the nonlinear embeddings given by the sample eigenvectors and the theoretical eigenfunctions. The top panels are for the embeddings induced by the leading eigenvectors of the kernel matrix, while the bottom panels are for the theoretical eigenfunctions obtained from the moment matrix. The change in color from blue to yellow in each panel indicates an increase in values. There is great similarity between the contours of the true eigenfunction and its sample version through the eigenvector in terms of the shape and the gradient indicated by the color change.

We also observe in Figure 3.1 that the nonlinear embeddings given by the first two leading eigenvectors and eigenfunctions of the second-order polynomial kernel are roughly along the two diagonal lines of the unit square (0,1)², which correspond to the directions of the largest variation in the uniform distribution.

Figure 3.1: Comparison of the contours of the nonlinear embeddings given by three leading eigenvectors and the theoretical eigenfunctions for the uniform data. The upper three panels are for the embeddings induced by the eigenvectors for three nonzero eigenvalues, and the lower three panels are for the corresponding eigenfunctions.
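The comparison in this example is easy to reproduce (a sketch with our own seed; sample values vary slightly from run to run). The nonzero eigenvalues of K_n/n here coincide with those of the sample version of M_2^2, so they approach the theoretical values 0.5206, 0.0889 and 0.0127 as n grows:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(size=(400, 2))
K_n = (X @ X.T) ** 2                      # second-order polynomial kernel

lam_hat = np.sort(np.linalg.eigvalsh(K_n))[::-1][:3] / 400
print(np.round(lam_hat, 4))               # compare with 0.5206, 0.0889, 0.0127
```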

3.2.2 Mixture normal example

We turn to a mixture of normal distributions for (X₁, X₂)^t. Suppose that X₁ and X₂ are two independent variables, where X₁ follows a two-component mixture of normal distributions and X₂ follows a normal distribution. For this example, we consider the polynomial kernels of degrees 2 and 3.

When degree is 2

The moment matrix for the mixture distribution can be obtained as follows:

    M_2^2 = [ µ_{4,0}     √2 µ_{3,1}  µ_{2,2}
              √2 µ_{3,1}  2 µ_{2,2}   √2 µ_{1,3}
              µ_{2,2}     √2 µ_{1,3}  µ_{0,4} ].

Three nonzero eigenvalues of the matrix are λ₁ = 110.5, λ₂ = 1.415, and λ₃ = .4. With the corresponding eigenvectors of the moment matrix, we get the following three eigenfunctions:

    φ₁(x) = 0.6x₁² − 0.5x₁x₂ + ⋯,
    φ₂(x) = −0.45x₁² + ⋯ + 0.4x₂²,
    φ₃(x) = 0.064x₁² + ⋯ + 0.5x₂².

Contours of these eigenfunctions are displayed in the bottom panels of Figure 3.2. For their sample counterparts, we generated a random sample of size 400 from the mixture of two normals. A scatter plot of the sample is displayed in the top left panel of Figure 3.4.

Three nonzero eigenvalues from the kernel matrix are found to be λ̂₁ = ⋯, λ̂₂ = 1.6, and λ̂₃ = ⋯. The top panels of Figure 3.2 show the contours of the data embeddings given by the corresponding eigenvectors. The data embeddings and eigenfunctions for this mixture normal example also exhibit strong similarity. The contours of the leading embedding and eigenfunction are ellipses centered at the origin. It appears that the minor axis of the ellipses for the leading eigenfunction corresponds to the line connecting the two mean vectors of the mixture distribution, capturing the largest data variation, and the major axis is perpendicular to the mean difference. The contours of the second leading eigenfunction are hyperbolas centered at the origin. The asymptotes of the hyperbolas for the eigenfunction are the same as the major and minor axes for the leading eigenfunction. Although the approximate symmetry around the origin that the data embeddings and eigenfunctions exhibit reflects that of the underlying distribution, information about the two normal components is lost after projection. If dimension reduction is to be used primarily for identifying the clusters later, then the quadratic kernel would not be useful in this case.

When degree is 3

The moment matrix for d = 3 involves the moments up to order 6, and for the mixture distribution, it is explicitly given by

    M_2^3 = [ µ_{6,0}     √3 µ_{5,1}  √3 µ_{4,2}  µ_{3,3}
              √3 µ_{5,1}  3 µ_{4,2}   3 µ_{3,3}   √3 µ_{2,4}
              √3 µ_{4,2}  3 µ_{3,3}   3 µ_{2,4}   √3 µ_{1,5}
              µ_{3,3}     √3 µ_{2,4}  √3 µ_{1,5}  µ_{0,6} ].

Figure 3.2: Comparison of the contours of the nonlinear embeddings given by three leading eigenvectors (top panels) and the theoretical eigenfunctions (bottom panels) for the mixture normal data when degree is 2.

The matrix has four nonzero eigenvalues, λ₁ = ⋯, λ₂ = 4.4, λ₃ = 5.66, and λ₄ = ⋯, and the corresponding eigenfunctions are

    φ₁(x) = 0.4x₁³ − 0.1x₁²x₂ + 0.4x₁x₂² − 0.0x₂³,
    φ₂(x) = −0.51x₁³ − 1.0x₁²x₂ + 0.6x₁x₂² − 0.1x₂³,
    φ₃(x) = −0.11x₁³ − 1.0x₁²x₂ + ⋯,
    φ₄(x) = −0.06x₁³ + ⋯ + 1.0x₁x₂² + ⋯.

We obtained the kernel matrix with the polynomial kernel of degree 3 for the same data as in the d = 2 case. Four leading eigenvalues for this kernel matrix are λ̂₁ = 1.5, λ̂₂ = 46.10, λ̂₃ = 5.54, and λ̂₄ = .65.
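When the moments of the data distribution are tedious to derive in closed form, the moment matrix of Theorem 1 can be estimated from sample moments. The sketch below (ours; the mixture weights and means are illustrative choices, not necessarily those of the example above) does this for d = 3 and confirms that the eigenvalues of the estimated M_2^3 match the leading eigenvalues of K_n/n:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(8)
n, d = 2000, 3
# Illustrative mixture: X1 ~ 0.5 N(-2, 1) + 0.5 N(2, 1), X2 ~ N(0, 1)
x1 = np.where(rng.random(n) < 0.5, rng.normal(-2, 1, n), rng.normal(2, 1, n))
x2 = rng.normal(0, 1, n)
X = np.column_stack([x1, x2])

# Sample moment matrix: mu_{a,b} estimated by the mean of X1^a X2^b
mu = lambda a, b: np.mean(x1 ** a * x2 ** b)
M = np.array([[np.sqrt(comb(d, j) * comb(d, k)) * mu(2 * d - j - k, j + k)
               for k in range(d + 1)] for j in range(d + 1)])

lam_M = np.sort(np.linalg.eigvalsh(M))[::-1]
lam_K = np.sort(np.linalg.eigvalsh((X @ X.T) ** d))[::-1][:d + 1] / n
print(np.round(lam_M, 3))   # eigenvalues of the sample moment matrix
print(np.round(lam_K, 3))   # kernel matrix eigenvalues / n; same values
```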


More information

Math 1B, lecture 8: Integration by parts

Math 1B, lecture 8: Integration by parts Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

Linear First-Order Equations

Linear First-Order Equations 5 Linear First-Orer Equations Linear first-orer ifferential equations make up another important class of ifferential equations that commonly arise in applications an are relatively easy to solve (in theory)

More information

Delocalization of boundary states in disordered topological insulators

Delocalization of boundary states in disordered topological insulators Journal of Physics A: Mathematical an Theoretical J. Phys. A: Math. Theor. 48 (05) FT0 (pp) oi:0.088/75-83/48//ft0 Fast Track Communication Delocalization of bounary states in isorere topological insulators

More information

Situation awareness of power system based on static voltage security region

Situation awareness of power system based on static voltage security region The 6th International Conference on Renewable Power Generation (RPG) 19 20 October 2017 Situation awareness of power system base on static voltage security region Fei Xiao, Zi-Qing Jiang, Qian Ai, Ran

More information

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems Construction of the Electronic Raial Wave Functions an Probability Distributions of Hyrogen-like Systems Thomas S. Kuntzleman, Department of Chemistry Spring Arbor University, Spring Arbor MI 498 tkuntzle@arbor.eu

More information

6 General properties of an autonomous system of two first order ODE

6 General properties of an autonomous system of two first order ODE 6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x

More information

ELEC3114 Control Systems 1

ELEC3114 Control Systems 1 ELEC34 Control Systems Linear Systems - Moelling - Some Issues Session 2, 2007 Introuction Linear systems may be represente in a number of ifferent ways. Figure shows the relationship between various representations.

More information

A Modification of the Jarque-Bera Test. for Normality

A Modification of the Jarque-Bera Test. for Normality Int. J. Contemp. Math. Sciences, Vol. 8, 01, no. 17, 84-85 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1988/ijcms.01.9106 A Moification of the Jarque-Bera Test for Normality Moawa El-Fallah Ab El-Salam

More information

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21 Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting

More information

A Hybrid Approach for Modeling High Dimensional Medical Data

A Hybrid Approach for Modeling High Dimensional Medical Data A Hybri Approach for Moeling High Dimensional Meical Data Alok Sharma 1, Gofrey C. Onwubolu 1 1 University of the South Pacific, Fii sharma_al@usp.ac.f, onwubolu_g@usp.ac.f Abstract. his work presents

More information

Implicit Differentiation

Implicit Differentiation Implicit Differentiation Thus far, the functions we have been concerne with have been efine explicitly. A function is efine explicitly if the output is given irectly in terms of the input. For instance,

More information

arxiv: v4 [math.pr] 27 Jul 2016

arxiv: v4 [math.pr] 27 Jul 2016 The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,

More information

Qubit channels that achieve capacity with two states

Qubit channels that achieve capacity with two states Qubit channels that achieve capacity with two states Dominic W. Berry Department of Physics, The University of Queenslan, Brisbane, Queenslan 4072, Australia Receive 22 December 2004; publishe 22 March

More information

ANALYSIS OF A GENERAL FAMILY OF REGULARIZED NAVIER-STOKES AND MHD MODELS

ANALYSIS OF A GENERAL FAMILY OF REGULARIZED NAVIER-STOKES AND MHD MODELS ANALYSIS OF A GENERAL FAMILY OF REGULARIZED NAVIER-STOKES AND MHD MODELS MICHAEL HOLST, EVELYN LUNASIN, AND GANTUMUR TSOGTGEREL ABSTRACT. We consier a general family of regularize Navier-Stokes an Magnetohyroynamics

More information

Chapter 4. Electrostatics of Macroscopic Media

Chapter 4. Electrostatics of Macroscopic Media Chapter 4. Electrostatics of Macroscopic Meia 4.1 Multipole Expansion Approximate potentials at large istances 3 x' x' (x') x x' x x Fig 4.1 We consier the potential in the far-fiel region (see Fig. 4.1

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

Schrödinger s equation.

Schrödinger s equation. Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of

More information

arxiv:hep-th/ v1 3 Feb 1993

arxiv:hep-th/ v1 3 Feb 1993 NBI-HE-9-89 PAR LPTHE 9-49 FTUAM 9-44 November 99 Matrix moel calculations beyon the spherical limit arxiv:hep-th/93004v 3 Feb 993 J. Ambjørn The Niels Bohr Institute Blegamsvej 7, DK-00 Copenhagen Ø,

More information

Proof of SPNs as Mixture of Trees

Proof of SPNs as Mixture of Trees A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a

More information

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments 2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor

More information

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k A Proof of Lemma 2 B Proof of Lemma 3 Proof: Since the support of LL istributions is R, two such istributions are equivalent absolutely continuous with respect to each other an the ivergence is well-efine

More information

Final Exam Study Guide and Practice Problems Solutions

Final Exam Study Guide and Practice Problems Solutions Final Exam Stuy Guie an Practice Problems Solutions Note: These problems are just some of the types of problems that might appear on the exam. However, to fully prepare for the exam, in aition to making

More information

Separation of Variables

Separation of Variables Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical

More information

u!i = a T u = 0. Then S satisfies

u!i = a T u = 0. Then S satisfies Deterministic Conitions for Subspace Ientifiability from Incomplete Sampling Daniel L Pimentel-Alarcón, Nigel Boston, Robert D Nowak University of Wisconsin-Maison Abstract Consier an r-imensional subspace

More information

Optimization of Geometries by Energy Minimization

Optimization of Geometries by Energy Minimization Optimization of Geometries by Energy Minimization by Tracy P. Hamilton Department of Chemistry University of Alabama at Birmingham Birmingham, AL 3594-140 hamilton@uab.eu Copyright Tracy P. Hamilton, 1997.

More information

Diagonalization of Matrices Dr. E. Jacobs

Diagonalization of Matrices Dr. E. Jacobs Diagonalization of Matrices Dr. E. Jacobs One of the very interesting lessons in this course is how certain algebraic techniques can be use to solve ifferential equations. The purpose of these notes is

More information

θ x = f ( x,t) could be written as

θ x = f ( x,t) could be written as 9. Higher orer PDEs as systems of first-orer PDEs. Hyperbolic systems. For PDEs, as for ODEs, we may reuce the orer by efining new epenent variables. For example, in the case of the wave equation, (1)

More information

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France APPROXIMAE SOLUION FOR RANSIEN HEA RANSFER IN SAIC URBULEN HE II B. Bauouy CEA/Saclay, DSM/DAPNIA/SCM 91191 Gif-sur-Yvette Ceex, France ABSRAC Analytical solution in one imension of the heat iffusion equation

More information

Hyperbolic Systems of Equations Posed on Erroneous Curved Domains

Hyperbolic Systems of Equations Posed on Erroneous Curved Domains Hyperbolic Systems of Equations Pose on Erroneous Curve Domains Jan Norström a, Samira Nikkar b a Department of Mathematics, Computational Mathematics, Linköping University, SE-58 83 Linköping, Sween (

More information

Conservation Laws. Chapter Conservation of Energy

Conservation Laws. Chapter Conservation of Energy 20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action

More information

On conditional moments of high-dimensional random vectors given lower-dimensional projections

On conditional moments of high-dimensional random vectors given lower-dimensional projections Submitte to the Bernoulli arxiv:1405.2183v2 [math.st] 6 Sep 2016 On conitional moments of high-imensional ranom vectors given lower-imensional projections LUKAS STEINBERGER an HANNES LEEB Department of

More information

Make graph of g by adding c to the y-values. on the graph of f by c. multiplying the y-values. even-degree polynomial. graph goes up on both sides

Make graph of g by adding c to the y-values. on the graph of f by c. multiplying the y-values. even-degree polynomial. graph goes up on both sides Reference 1: Transformations of Graphs an En Behavior of Polynomial Graphs Transformations of graphs aitive constant constant on the outsie g(x) = + c Make graph of g by aing c to the y-values on the graph

More information

THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE

THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE Journal of Soun an Vibration (1996) 191(3), 397 414 THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE E. M. WEINSTEIN Galaxy Scientific Corporation, 2500 English Creek

More information

New Statistical Test for Quality Control in High Dimension Data Set

New Statistical Test for Quality Control in High Dimension Data Set International Journal of Applie Engineering Research ISSN 973-456 Volume, Number 6 (7) pp. 64-649 New Statistical Test for Quality Control in High Dimension Data Set Shamshuritawati Sharif, Suzilah Ismail

More information

Lecture 2 Lagrangian formulation of classical mechanics Mechanics

Lecture 2 Lagrangian formulation of classical mechanics Mechanics Lecture Lagrangian formulation of classical mechanics 70.00 Mechanics Principle of stationary action MATH-GA To specify a motion uniquely in classical mechanics, it suffices to give, at some time t 0,

More information

A Weak First Digit Law for a Class of Sequences

A Weak First Digit Law for a Class of Sequences International Mathematical Forum, Vol. 11, 2016, no. 15, 67-702 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1288/imf.2016.6562 A Weak First Digit Law for a Class of Sequences M. A. Nyblom School of

More information

Permanent vs. Determinant

Permanent vs. Determinant Permanent vs. Determinant Frank Ban Introuction A major problem in theoretical computer science is the Permanent vs. Determinant problem. It asks: given an n by n matrix of ineterminates A = (a i,j ) an

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms

More information

Acute sets in Euclidean spaces

Acute sets in Euclidean spaces Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of

More information

SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES

SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES Communications on Stochastic Analysis Vol. 2, No. 2 (28) 289-36 Serials Publications www.serialspublications.com SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES

More information

Introduction to Machine Learning

Introduction to Machine Learning How o you estimate p(y x)? Outline Contents Introuction to Machine Learning Logistic Regression Varun Chanola April 9, 207 Generative vs. Discriminative Classifiers 2 Logistic Regression 2 3 Logistic Regression

More information

A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs

A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs Mathias Fuchs, Norbert Krautenbacher A variance ecomposition an a Central Limit Theorem for empirical losses associate with resampling esigns Technical Report Number 173, 2014 Department of Statistics

More information

Homework 2 EM, Mixture Models, PCA, Dualitys

Homework 2 EM, Mixture Models, PCA, Dualitys Homework 2 EM, Mixture Moels, PCA, Dualitys CMU 10-715: Machine Learning (Fall 2015) http://www.cs.cmu.eu/~bapoczos/classes/ml10715_2015fall/ OUT: Oct 5, 2015 DUE: Oct 19, 2015, 10:20 AM Guielines The

More information

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson JUST THE MATHS UNIT NUMBER 10.2 DIFFERENTIATION 2 (Rates of change) by A.J.Hobson 10.2.1 Introuction 10.2.2 Average rates of change 10.2.3 Instantaneous rates of change 10.2.4 Derivatives 10.2.5 Exercises

More information

Topic 7: Convergence of Random Variables

Topic 7: Convergence of Random Variables Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information

More information

Some vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10

Some vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10 Some vector algebra an the generalize chain rule Ross Bannister Data Assimilation Research Centre University of Reaing UK Last upate 10/06/10 1. Introuction an notation As we shall see in these notes the

More information

Generalizing Kronecker Graphs in order to Model Searchable Networks

Generalizing Kronecker Graphs in order to Model Searchable Networks Generalizing Kronecker Graphs in orer to Moel Searchable Networks Elizabeth Boine, Babak Hassibi, Aam Wierman California Institute of Technology Pasaena, CA 925 Email: {eaboine, hassibi, aamw}@caltecheu

More information

Some properties of random staircase tableaux

Some properties of random staircase tableaux Some properties of ranom staircase tableaux Sanrine Dasse Hartaut Pawe l Hitczenko Downloae /4/7 to 744940 Reistribution subject to SIAM license or copyright; see http://wwwsiamorg/journals/ojsaphp Abstract

More information

The Three-dimensional Schödinger Equation

The Three-dimensional Schödinger Equation The Three-imensional Schöinger Equation R. L. Herman November 7, 016 Schröinger Equation in Spherical Coorinates We seek to solve the Schröinger equation with spherical symmetry using the metho of separation

More information

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing

More information

LeChatelier Dynamics

LeChatelier Dynamics LeChatelier Dynamics Robert Gilmore Physics Department, Drexel University, Philaelphia, Pennsylvania 1914, USA (Date: June 12, 28, Levine Birthay Party: To be submitte.) Dynamics of the relaxation of a

More information

Switching Time Optimization in Discretized Hybrid Dynamical Systems

Switching Time Optimization in Discretized Hybrid Dynamical Systems Switching Time Optimization in Discretize Hybri Dynamical Systems Kathrin Flaßkamp, To Murphey, an Sina Ober-Blöbaum Abstract Switching time optimization (STO) arises in systems that have a finite set

More information

Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection and System Identification

Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection and System Identification Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection an System Ientification Borhan M Sananaji, Tyrone L Vincent, an Michael B Wakin Abstract In this paper,

More information

ON THE OPTIMALITY SYSTEM FOR A 1 D EULER FLOW PROBLEM

ON THE OPTIMALITY SYSTEM FOR A 1 D EULER FLOW PROBLEM ON THE OPTIMALITY SYSTEM FOR A D EULER FLOW PROBLEM Eugene M. Cliff Matthias Heinkenschloss y Ajit R. Shenoy z Interisciplinary Center for Applie Mathematics Virginia Tech Blacksburg, Virginia 46 Abstract

More information

3 The variational formulation of elliptic PDEs

3 The variational formulation of elliptic PDEs Chapter 3 The variational formulation of elliptic PDEs We now begin the theoretical stuy of elliptic partial ifferential equations an bounary value problems. We will focus on one approach, which is calle

More information

Hybrid Fusion for Biometrics: Combining Score-level and Decision-level Fusion

Hybrid Fusion for Biometrics: Combining Score-level and Decision-level Fusion Hybri Fusion for Biometrics: Combining Score-level an Decision-level Fusion Qian Tao Raymon Velhuis Signals an Systems Group, University of Twente Postbus 217, 7500AE Enschee, the Netherlans {q.tao,r.n.j.velhuis}@ewi.utwente.nl

More information

One-dimensional I test and direction vector I test with array references by induction variable

One-dimensional I test and direction vector I test with array references by induction variable Int. J. High Performance Computing an Networking, Vol. 3, No. 4, 2005 219 One-imensional I test an irection vector I test with array references by inuction variable Minyi Guo School of Computer Science

More information

Lagrangian and Hamiltonian Mechanics

Lagrangian and Hamiltonian Mechanics Lagrangian an Hamiltonian Mechanics.G. Simpson, Ph.. epartment of Physical Sciences an Engineering Prince George s Community College ecember 5, 007 Introuction In this course we have been stuying classical

More information

Differentiation ( , 9.5)

Differentiation ( , 9.5) Chapter 2 Differentiation (8.1 8.3, 9.5) 2.1 Rate of Change (8.2.1 5) Recall that the equation of a straight line can be written as y = mx + c, where m is the slope or graient of the line, an c is the

More information

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix

More information

arxiv: v1 [physics.flu-dyn] 8 May 2014

arxiv: v1 [physics.flu-dyn] 8 May 2014 Energetics of a flui uner the Boussinesq approximation arxiv:1405.1921v1 [physics.flu-yn] 8 May 2014 Kiyoshi Maruyama Department of Earth an Ocean Sciences, National Defense Acaemy, Yokosuka, Kanagawa

More information

On Characterizing the Delay-Performance of Wireless Scheduling Algorithms

On Characterizing the Delay-Performance of Wireless Scheduling Algorithms On Characterizing the Delay-Performance of Wireless Scheuling Algorithms Xiaojun Lin Center for Wireless Systems an Applications School of Electrical an Computer Engineering, Purue University West Lafayette,

More information

Applications of the Wronskian to ordinary linear differential equations

Applications of the Wronskian to ordinary linear differential equations Physics 116C Fall 2011 Applications of the Wronskian to orinary linear ifferential equations Consier a of n continuous functions y i (x) [i = 1,2,3,...,n], each of which is ifferentiable at least n times.

More information

Free rotation of a rigid body 1 D. E. Soper 2 University of Oregon Physics 611, Theoretical Mechanics 5 November 2012

Free rotation of a rigid body 1 D. E. Soper 2 University of Oregon Physics 611, Theoretical Mechanics 5 November 2012 Free rotation of a rigi boy 1 D. E. Soper 2 University of Oregon Physics 611, Theoretical Mechanics 5 November 2012 1 Introuction In this section, we escribe the motion of a rigi boy that is free to rotate

More information

Solution to the exam in TFY4230 STATISTICAL PHYSICS Wednesday december 1, 2010

Solution to the exam in TFY4230 STATISTICAL PHYSICS Wednesday december 1, 2010 NTNU Page of 6 Institutt for fysikk Fakultet for fysikk, informatikk og matematikk This solution consists of 6 pages. Solution to the exam in TFY423 STATISTICAL PHYSICS Wenesay ecember, 2 Problem. Particles

More information