Remarks on kernel Bayes rule


STATISTICS | RESEARCH ARTICLE

Remarks on kernel Bayes rule

Hisashi Johno, Kazunori Nakamoto* and Tatsuhiko Saigo

Received: 09 April 2017; Accepted: 5 February 2018; First Published: 05 March 2018

*Corresponding author: Kazunori Nakamoto, Center for Medical Education and Sciences, Faculty of Medicine, University of Yamanashi, Shimokato, Chuo, Yamanashi, Japan. E-mail: nakamoto@yamanashi.ac.jp

Reviewing editor: Hiroshi Shiraishi, Keio University, Japan

Additional information is available at the end of the article.

Abstract: The kernel Bayes rule has been proposed as a nonparametric kernel-based method to realize Bayesian inference in reproducing kernel Hilbert spaces. However, we demonstrate both theoretically and experimentally that the way of incorporating the prior in the kernel Bayes rule is unnatural. In particular, we show that under some reasonable conditions, the posterior in the kernel Bayes rule is completely unaffected by the prior, which seems to be irrelevant in the context of Bayesian inference. We consider that this phenomenon is in part due to the fact that the assumptions in the kernel Bayes rule do not hold in general.

Subjects: Mathematics & Statistics

Keywords: Kernel method; Bayes rule; reproducing kernel Hilbert space

1. Introduction
The kernel Bayes rule has recently emerged as a novel framework for Bayesian inference (Fukumizu, Song, & Gretton, 2013; Song, Fukumizu, & Gretton, 2013; Song, Huang, Smola, & Fukumizu, 2009). It is generally agreed that, in this framework, we can estimate the kernel mean of the posterior distribution, given kernel mean expressions of the prior and likelihood distributions. Since the distributions are mapped and nonparametrically manipulated in infinite-dimensional feature spaces called reproducing kernel Hilbert spaces (RKHSs), it is believed that the kernel Bayes rule can accurately evaluate the statistical features of high-dimensional data and enable Bayesian inference even if there were no appropriate parametric models. To date, several applications of the kernel Bayes rule have been reported (Fukumizu et al., 2013; Kanagawa et al., 2014). However, the basic theory and the algorithm of the kernel Bayes rule might need to be modified for the following reasons:

ABOUT THE AUTHORS
Hisashi Johno is a PhD student at the Department of Mathematical Sciences, Faculty of Medicine, University of Yamanashi, Japan. His current research interests include probability theory and interpretable machine learning.
Kazunori Nakamoto is a professor of mathematics at the Center for Medical Education and Sciences, Faculty of Medicine, University of Yamanashi, Japan. His main research interests include algebraic geometry, invariant theory, and the moduli of representations.
Tatsuhiko Saigo is an associate professor of probability and statistics at the Center for Medical Education and Sciences, Faculty of Medicine, University of Yamanashi, Japan. His research fields include probability theory, statistics, and applied mathematics.

PUBLIC INTEREST STATEMENT
This paper examines the validity of the kernel Bayes rule, a recently proposed nonparametric framework for Bayesian inference. The researchers working on the kernel Bayes rule aim to apply this method to a wide range of Bayesian inference problems. However, as we demonstrate in this paper, the way of incorporating the prior in the kernel Bayes rule seems wrong in the context of Bayesian inference. Several theorems of the kernel Bayes rule rely on a strong assumption which does not hold in general. The problems of
the kernel Bayes rule seem to be nontrivial and difficult, and we currently have no idea how to solve them. We hope that this study will trigger reexamination and correction of the basic framework of the kernel Bayes rule.

© 2018 The Authors. This open access article is distributed under a Creative Commons Attribution (CC-BY) 4.0 license.

(1) The posterior in the kernel Bayes rule is in some cases completely unaffected by the prior.
(2) The posterior in the kernel Bayes rule depends considerably upon the choice of the parameters used to regularize covariance operators.
(3) It does not hold in general that conditional expectation functions are included in the RKHS, which is an essential assumption of the kernel Bayes rule.

This paper is organized as follows. We begin in Section 2 with a brief review of the kernel Bayes rule. In Section 3, we theoretically address the three arguments described above. Numerical experiments are performed in Section 4 to confirm the theoretical results in Section 3. In Section 5, we summarize the theoretical and experimental results and present our conclusions. Some of the proofs for Sections 2 and 3 are given in Section 6.

2. Kernel Bayes rule
In this section, we briefly review the kernel Bayes rule following Fukumizu et al. (2013). Let $\mathcal{X}$ and $\mathcal{Y}$ be measurable spaces, $(X, Y)$ be a random variable with an observed distribution $P$ on $\mathcal{X}\times\mathcal{Y}$, $U$ be a random variable with the prior distribution $\Pi$ on $\mathcal{X}$, and $(Z, W)$ be a random variable with the joint distribution $Q$ on $\mathcal{X}\times\mathcal{Y}$. Note that $Q$ is defined by the prior $\Pi$ and the family $\{P_x \mid x\in\mathcal{X}\}$, where $P_x$ denotes the conditional distribution of $Y$ given $X = x$. For each $y\in\mathcal{Y}$, let $Q_y$ represent the posterior distribution of $Z$ given $W = y$. The aim of the kernel Bayes rule is to derive the kernel mean of $Q_y$.

Definition 1. Let $k_\mathcal{X}$ and $k_\mathcal{Y}$ be measurable positive definite kernels on $\mathcal{X}$ and $\mathcal{Y}$ such that $E[k_\mathcal{X}(X, X)] < \infty$ and $E[k_\mathcal{Y}(Y, Y)] < \infty$, respectively, where $E[\,\cdot\,]$ denotes the expectation operator. Let $\mathcal{H}_\mathcal{X}$ and $\mathcal{H}_\mathcal{Y}$ be the RKHSs defined by $k_\mathcal{X}$ and $k_\mathcal{Y}$, respectively. We consider two bounded linear operators $C_{YX}\colon\mathcal{H}_\mathcal{X}\to\mathcal{H}_\mathcal{Y}$ and $C_{XX}\colon\mathcal{H}_\mathcal{X}\to\mathcal{H}_\mathcal{X}$ such that
$$\langle g, C_{YX} f\rangle_{\mathcal{H}_\mathcal{Y}} = E[f(X)g(Y)] \quad\text{and}\quad \langle f_1, C_{XX} f_2\rangle_{\mathcal{H}_\mathcal{X}} = E[f_1(X)f_2(X)]$$
for any $f, f_1, f_2\in\mathcal{H}_\mathcal{X}$ and $g\in\mathcal{H}_\mathcal{Y}$, where $\langle\cdot,\cdot\rangle_{\mathcal{H}_\mathcal{X}}$ and $\langle\cdot,\cdot\rangle_{\mathcal{H}_\mathcal{Y}}$ denote the inner products on $\mathcal{H}_\mathcal{X}$ and $\mathcal{H}_\mathcal{Y}$, respectively. The integral expressions for $C_{YX}$ and $C_{XX}$ are given by
$$C_{YX} f = \int k_\mathcal{Y}(\cdot, y)\, f(x)\, dP(x, y) \quad\text{and}\quad C_{XX} f = \int k_\mathcal{X}(\cdot, x)\, f(x)\, dP_\mathcal{X}(x),$$
where $P_\mathcal{X}$ denotes the marginal distribution of $X$. Let $C_{XY}\colon\mathcal{H}_\mathcal{Y}\to\mathcal{H}_\mathcal{X}$ be the bounded linear operator defined by $\langle f, C_{XY} g\rangle_{\mathcal{H}_\mathcal{X}} = E[f(X)g(Y)]$ for any $f\in\mathcal{H}_\mathcal{X}$ and $g\in\mathcal{H}_\mathcal{Y}$. Then $C_{XY}$ is the adjoint of $C_{YX}$.

Theorem 1 (Fukumizu et al., 2013). If $E[g(Y)\mid X=\cdot\,]\in\mathcal{H}_\mathcal{X}$ for $g\in\mathcal{H}_\mathcal{Y}$, then
$$C_{XX}\, E[g(Y)\mid X=\cdot\,] = C_{XY}\, g. \tag{1}$$

Definition 2. Let $Q_\mathcal{Y}$ denote the marginal distribution of $W$. Assuming that $E[k_\mathcal{X}(U, U)] < \infty$ and $E[k_\mathcal{Y}(W, W)] < \infty$, we can define the kernel means of $\Pi$ and $Q_\mathcal{Y}$ by $m_\Pi = E[k_\mathcal{X}(\cdot, U)]$ and $m_{Q_\mathcal{Y}} = E[k_\mathcal{Y}(\cdot, W)]$, respectively. Due to the reproducing properties of $\mathcal{H}_\mathcal{X}$ and $\mathcal{H}_\mathcal{Y}$, the kernel means satisfy $\langle f, m_\Pi\rangle_{\mathcal{H}_\mathcal{X}} = E[f(U)]$ and $\langle g, m_{Q_\mathcal{Y}}\rangle_{\mathcal{H}_\mathcal{Y}} = E[g(W)]$ for any $f\in\mathcal{H}_\mathcal{X}$ and $g\in\mathcal{H}_\mathcal{Y}$.
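To make Definition 2 concrete, the following minimal Python sketch (not from the original article; the Gaussian kernel, the prior $N(0,1)$ and the function name gauss_kernel are illustrative assumptions) estimates the kernel mean of a prior from a sample and checks at a few points that the empirical embedding $\hat m_\Pi(x) = \frac{1}{l}\sum_j k(x, U_j)$ approximates $E[k(x, U)]$, which has a closed form in this Gaussian setting.

```python
import numpy as np

def gauss_kernel(A, B, sigma=1.0):
    """Gram matrix K[i, j] = exp(-(A_i - B_j)^2 / (2 sigma^2)) for 1-D inputs."""
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
sigma, l = 1.0, 5000
U = rng.normal(0.0, 1.0, size=l)                 # sample from the prior Pi = N(0, 1)
x = np.array([-1.0, 0.0, 1.0])

# Empirical kernel mean evaluated at x: m_Pi(x) = (1/l) sum_j k(x, U_j),
# which by the reproducing property estimates E[k(x, U)].
m_hat = gauss_kernel(x, U, sigma).mean(axis=1)

# Closed form of E[exp(-(x - U)^2 / (2 sigma^2))] for U ~ N(0, 1).
m_true = sigma / np.sqrt(sigma ** 2 + 1.0) * np.exp(-x ** 2 / (2.0 * (sigma ** 2 + 1.0)))
print(m_hat)
print(m_true)     # the two vectors agree up to Monte Carlo error
```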

Theorem 2 (Fukumizu et al., 2013). If $C_{XX}$ is injective, $m_\Pi\in\mathrm{Ran}(C_{XX})$, and $E[g(Y)\mid X=\cdot\,]\in\mathcal{H}_\mathcal{X}$ for any $g\in\mathcal{H}_\mathcal{Y}$, then
$$m_{Q_\mathcal{Y}} = C_{YX}\, C_{XX}^{-1}\, m_\Pi, \tag{2}$$
where $\mathrm{Ran}(C_{XX})$ denotes the range of $C_{XX}$.

Here we have, for any $x\in\mathcal{X}$,
$$E[k_\mathcal{Y}(\cdot, Y)\mid X = x] = C_{YX}\, C_{XX}^{-1}\, k_\mathcal{X}(\cdot, x) \tag{3}$$
by replacing $m_\Pi$ in Equation (2) with $k_\mathcal{X}(\cdot, x)$.

It is noted in Fukumizu et al. (2013) that the assumption $m_\Pi\in\mathrm{Ran}(C_{XX})$ does not hold in general. In order to remove this assumption, it has been suggested that $C_{XX}+\varepsilon I$ be used instead of $C_{XX}$, where $\varepsilon$ is a regularization constant and $I$ is the identity operator. Thus, the approximations of Equations (2) and (3) are respectively given by
$$m_{Q_\mathcal{Y}}^{\mathrm{reg}} = C_{YX}(C_{XX}+\varepsilon I)^{-1} m_\Pi \quad\text{and}\quad E^{\mathrm{reg}}[k_\mathcal{Y}(\cdot, Y)\mid X = x] = C_{YX}(C_{XX}+\varepsilon I)^{-1} k_\mathcal{X}(\cdot, x).$$
Similarly, for any $y\in\mathcal{Y}$, the approximation of $m_{Q_y}$ is provided by
$$m_{Q_y}^{\mathrm{reg}} = E^{\mathrm{reg}}[k_\mathcal{X}(\cdot, Z)\mid W = y] = C_{ZW}(C_{WW}+\delta I)^{-1} k_\mathcal{Y}(\cdot, y), \tag{4}$$
where $\delta$ is a regularization constant and the linear operators $C_{ZW}$ and $C_{WW}$ will be defined below.

Definition 3. We consider the kernel means $m_{ZW}\in\mathcal{H}_\mathcal{Y}\otimes\mathcal{H}_\mathcal{X}$ and $m_{WW}\in\mathcal{H}_\mathcal{Y}\otimes\mathcal{H}_\mathcal{Y}$ such that
$$\langle m_{ZW},\, g\otimes f\rangle = E[f(Z)g(W)] \quad\text{and}\quad \langle m_{WW},\, g_1\otimes g_2\rangle = E[g_1(W)g_2(W)]$$
for any $f\in\mathcal{H}_\mathcal{X}$ and $g, g_1, g_2\in\mathcal{H}_\mathcal{Y}$, where $\otimes$ denotes the tensor product. Let $C_{(YX)X}\colon\mathcal{H}_\mathcal{X}\to\mathcal{H}_\mathcal{Y}\otimes\mathcal{H}_\mathcal{X}$ and $C_{(YY)X}\colon\mathcal{H}_\mathcal{X}\to\mathcal{H}_\mathcal{Y}\otimes\mathcal{H}_\mathcal{Y}$ be bounded linear operators which respectively satisfy
$$\langle g\otimes f,\, C_{(YX)X} h\rangle = E[g(Y)f(X)h(X)], \qquad \langle g_1\otimes g_2,\, C_{(YY)X} f\rangle = E[g_1(Y)g_2(Y)f(X)] \tag{5}$$
for any $f, h\in\mathcal{H}_\mathcal{X}$ and $g, g_1, g_2\in\mathcal{H}_\mathcal{Y}$.

From Theorem 2, Fukumizu et al. (2013) proposed that $m_{ZW}$ and $m_{WW}$ can be given by $m_{ZW} = C_{(YX)X} C_{XX}^{-1} m_\Pi$ and $m_{WW} = C_{(YY)X} C_{XX}^{-1} m_\Pi$. In case $m_\Pi$ is not included in $\mathrm{Ran}(C_{XX})$, they suggested that $m_{ZW}$ and $m_{WW}$ could be approximated by
$$m_{ZW}^{\mathrm{reg}} = C_{(YX)X}(C_{XX}+\varepsilon I)^{-1} m_\Pi \quad\text{and}\quad m_{WW}^{\mathrm{reg}} = C_{(YY)X}(C_{XX}+\varepsilon I)^{-1} m_\Pi.$$

Remark 1 (Fukumizu et al., 2013, p. 3760). $m_{ZW}$ and $m_{WW}$ can respectively be identified with operators $C_{ZW}\colon\mathcal{H}_\mathcal{Y}\to\mathcal{H}_\mathcal{X}$ and $C_{WW}\colon\mathcal{H}_\mathcal{Y}\to\mathcal{H}_\mathcal{Y}$.

Here, we introduce the empirical method for estimating the posterior kernel mean $m_{Q_y}$, following Fukumizu et al. (2013).

Definition 4. Suppose we have an independent and identically distributed (i.i.d.) sample $\{(X_i, Y_i)\}_{i=1}^n$ from the observed distribution $P$ on $\mathcal{X}\times\mathcal{Y}$ and a sample $\{U_j\}_{j=1}^l$ from the prior distribution $\Pi$ on $\mathcal{X}$. The prior kernel mean $m_\Pi$ is estimated by
$$\hat m_\Pi = \sum_{j=1}^l \gamma_j\, k_\mathcal{X}(\cdot, U_j), \tag{6}$$
where the $\gamma_j$ are weights.
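As a small illustration of Definition 4 and Equation (6) (not part of the original article; the uniform weights $\gamma_j = 1/l$ and the Gaussian kernel are assumptions made only for the sketch), the vector $\hat{\mathbf m}_\Pi = (\hat m_\Pi(X_1), \dots, \hat m_\Pi(X_n))^T$ used below can be computed as follows.

```python
import numpy as np

def gauss_kernel(A, B, sigma=1.0):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
n, l = 100, 50
X = rng.normal(0.0, 1.0, size=n)          # training inputs X_1, ..., X_n
U = rng.normal(0.5, 1.0, size=l)          # prior sample U_1, ..., U_l
gamma = np.full(l, 1.0 / l)               # one simple choice of weights in Equation (6)

# m_Pi_vec[i] = m_Pi(X_i) = sum_j gamma_j k(X_i, U_j)
m_Pi_vec = gauss_kernel(X, U) @ gamma
print(m_Pi_vec[:5])
```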

Let us put $\hat{\mathbf m}_\Pi = (\hat m_\Pi(X_1), \dots, \hat m_\Pi(X_n))^T$, $G_X = (k_\mathcal{X}(X_i, X_j))_{i,j\le n}$, and $G_Y = (k_\mathcal{Y}(Y_i, Y_j))_{i,j\le n}$.

Proposition 1 (Fukumizu et al., 2013, Proposition 3, revised). Let $I_n$ denote the identity matrix of size $n$. The estimates of $C_{ZW}$ and $C_{WW}$ are given by
$$\hat C_{ZW} = \sum_{i=1}^n \hat\mu_i\, k_\mathcal{X}(\cdot, X_i)\otimes k_\mathcal{Y}(\cdot, Y_i) \quad\text{and}\quad \hat C_{WW} = \sum_{i=1}^n \hat\mu_i\, k_\mathcal{Y}(\cdot, Y_i)\otimes k_\mathcal{Y}(\cdot, Y_i),$$
respectively, where $\hat\mu = (\hat\mu_1, \dots, \hat\mu_n)^T = (G_X + n\varepsilon I_n)^{-1}\hat{\mathbf m}_\Pi$ and $\varepsilon$ is the regularization constant. The proof of this revised proposition is given in Section 6.1.

It is suggested in Fukumizu et al. (2013) that Equation (4) can be empirically estimated by
$$\hat m_{Q_y} = \hat C_{ZW}\big(\hat C_{WW}^2 + \delta I\big)^{-1}\hat C_{WW}\, k_\mathcal{Y}(\cdot, y).$$

Theorem 3 (Fukumizu et al., 2013, Proposition 4). Given an observation $y$, $\hat m_{Q_y}$ can be calculated by
$$\hat m_{Q_y} = \mathbf{k}_X^T\, R_{X|Y}\, \mathbf{k}_Y(y), \qquad R_{X|Y} = \Lambda G_Y\big((\Lambda G_Y)^2 + \delta I_n\big)^{-1}\Lambda,$$
where $\Lambda = \mathrm{diag}(\hat\mu)$ is the diagonal matrix with the elements of $\hat\mu$, $\mathbf{k}_X = (k_\mathcal{X}(\cdot, X_1), \dots, k_\mathcal{X}(\cdot, X_n))^T$, and $\mathbf{k}_Y = (k_\mathcal{Y}(\cdot, Y_1), \dots, k_\mathcal{Y}(\cdot, Y_n))^T$. If we want to know the posterior expectation of a function $f\in\mathcal{H}_\mathcal{X}$ given an observation $y$, it is estimated by
$$\langle f, \hat m_{Q_y}\rangle = \mathbf{f}_X^T\, R_{X|Y}\, \mathbf{k}_Y(y),$$
where $\mathbf{f}_X = (f(X_1), \dots, f(X_n))^T$.

3. Theoretical arguments
In this section, we theoretically support the three arguments raised in Section 1. First, we show in Section 3.1 that the posterior kernel mean $\hat m_{Q_y}$ is completely unaffected by the prior distribution $\Pi$ under the condition that $\Lambda$ and $G_Y$ are non-singular. This implies that, at least in some cases, $\Pi$ does not properly affect $\hat m_{Q_y}$. Second, we mention in Section 3.2 that the linear operators $C_{XX}$ are not always surjective, and address the problems associated with the setting of the regularization parameters $\varepsilon$ and $\delta$. Third, we demonstrate in Section 3.3 that conditional expectation functions are not generally contained in the RKHS, which means that Theorems 1, 2, and 5–8 in Fukumizu et al. (2013) do not work in some situations.

3.1. Relations between the posterior $\hat m_{Q_y}$ and the prior $\Pi$
Let us review Theorem 3. Assume that $G_Y$ and $\Lambda$ are non-singular matrices. This assumption is not so strange, as shown in Section 6.2. The matrix $R_{X|Y} = \Lambda G_Y\big((\Lambda G_Y)^2 + \delta I_n\big)^{-1}\Lambda$ tends to $G_Y^{-1}$ as $\delta$ tends to $0$. Furthermore, if we set $\delta = 0$ from the beginning, we obtain $R_{X|Y} = G_Y^{-1}$. This implies that the posterior kernel mean $\hat m_{Q_y} = \mathbf{k}_X^T R_{X|Y}\mathbf{k}_Y(y)$ never depends on the prior distribution $\Pi$ on $\mathcal{X}$, which seems to be a contradiction to the nature of Bayes rule. Some readers may argue that, even in this case, we should not set $\delta = 0$. Then, however, there is ambiguity about why and how the regularization parameters are introduced in the kernel Bayes rule, since Fukumizu et al. originally used the regularization parameters just to solve inverse problems, as an analog of ridge regression (Fukumizu et al., 2013, p. 3758).
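The following toy Python sketch (our illustration, not the authors' code; the data, the Gaussian kernel, the tiny value of $\varepsilon$ and the function posterior_mean_of_X are assumptions) evaluates the estimator of Theorem 3 for two very different priors with a negligible $\delta$. Because $R_{X|Y}$ collapses to $G_Y^{-1}$ as $\delta\to 0$, the two posterior expectations coincide, which is exactly the prior-independence discussed above.

```python
import numpy as np

def gauss_kernel(A, B, sigma=0.7):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2.0 * sigma ** 2))

# Tiny, well-separated data so that every matrix below is safely invertible in float64.
X = np.linspace(-2.5, 2.5, 6)
Y = X + 0.1 * np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
y_obs = np.array([0.4])
delta, eps, n = 1e-12, 1e-12, len(X)

G_X, G_Y = gauss_kernel(X, X), gauss_kernel(Y, Y)
k_Y_at_y = gauss_kernel(Y, y_obs)[:, 0]

def posterior_mean_of_X(prior_centre):
    """Theorem 3 with f = identity: <f, m_Qy> = f_X^T R_{X|Y} k_Y(y), where
    R_{X|Y} = Lam G_Y ((Lam G_Y)^2 + delta I)^{-1} Lam and Lam = diag((G_X + n eps I)^{-1} m_Pi)."""
    U = prior_centre + np.linspace(-2.0, 2.0, 9)            # crude sample from the prior
    m_Pi_vec = gauss_kernel(X, U).mean(axis=1)              # m_Pi evaluated at X_1, ..., X_n
    mu = np.linalg.solve(G_X + n * eps * np.eye(n), m_Pi_vec)
    Lam = np.diag(mu)
    LG = Lam @ G_Y
    R = LG @ np.linalg.solve(LG @ LG + delta * np.eye(n), Lam)
    return X @ (R @ k_Y_at_y)

print(posterior_mean_of_X(-0.5), posterior_mean_of_X(+0.5))   # two different priors, nearly identical values
print(X @ np.linalg.solve(G_Y, k_Y_at_y))                      # the delta = 0 limit k_X^T G_Y^{-1} k_Y(y)
```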

They seem to support the validity of the regularization parameters by Theorems 5, 6, 7, and 8 in Fukumizu et al. (2013); however, these theorems do not work without the strong assumption that conditional expectation functions are included in the RKHS, as will be discussed in Section 3.3. In addition, since the theorems work only when $\delta_n$, etc. decay to zero sufficiently slowly, it seems that we have no principled way to choose values for the regularization parameters, except for cross-validation or similar techniques. It is worth mentioning that, in our simple experiments in Section 4, we could not obtain a reasonable result with the kernel Bayes rule using any combination of values for the regularization parameters.

3.2. The inverse of the operators $C_{XX}$
As noted by Fukumizu et al. (2013), the linear operators $C_{XX}$ are not surjective in some usual cases; the proof of this is given in Section 6.3. Therefore, they proposed an alternative way of obtaining a solution $f$ of the equation $C_{XX} f = m_\Pi$, namely the regularized inversion $f = (C_{XX}+\varepsilon I)^{-1} m_\Pi$, as an analog of ridge regression, where $\varepsilon$ is a regularization parameter and $I$ is the identity operator. One of the disadvantages of this method is that the solution $f = (C_{XX}+\varepsilon I)^{-1} m_\Pi$ depends upon the choice of $\varepsilon$. In Section 4, we numerically show that prediction with the kernel Bayes rule depends considerably on the regularization parameters $\varepsilon$ and $\delta$. Theorems 5–8 in Fukumizu et al. (2013) seem to support the appropriateness of the regularized inversion. However, these theorems work under the condition that conditional expectation functions are contained in the RKHS, which does not hold in some cases, as proved in Section 3.3. Furthermore, since we need to assume sufficiently slow decay of the regularization constants $\varepsilon$ and $\delta$ in these theorems, it is practically difficult to set appropriate values for $\varepsilon$ and $\delta$. A cross-validation procedure seems to be useful for tuning the parameters, and we may obtain good experimental results; however, it seems to lack theoretical background. Instead of the regularized inversion method, we can compute generalized inverse matrices of $G_X$ and $\Lambda G_Y$, given a sample $\{(X_i, Y_i)\}_{i=1}^n$. Below, we briefly introduce a generalization of a matrix inverse. For more details, see Horn and Johnson (2013).

Definition 5. Let $A$ be a matrix of size $m\times n$ over the complex number field $\mathbb{C}$. We say that a matrix $A^{-}$ of size $n\times m$ is a generalized inverse matrix of $A$ if $A A^{-} A = A$. We also say that a matrix $A^{\dagger}$ of size $n\times m$ is the Moore–Penrose generalized inverse matrix of $A$ if $A A^{\dagger}$ and $A^{\dagger} A$ are Hermitian, $A A^{\dagger} A = A$, and $A^{\dagger} A A^{\dagger} = A^{\dagger}$.

Remark 2. In fact, any matrix $A$ has the Moore–Penrose generalized inverse matrix $A^{\dagger}$. Note that $A^{\dagger}$ is uniquely determined by $A$. If $A$ is square and non-singular, then $A^{-} = A^{\dagger} = A^{-1}$. For a generalized inverse matrix $A^{-}$ of size $n\times m$, $A A^{-} v = v$ for any vector $v\in\mathbb{C}^m$ contained in the image of $A$. In particular, $A^{-} v$ is then a vector contained in the preimage of $v$ under $A$. In the calculation of $\hat m_{Q_y} = \mathbf{k}_X^T R_{X|Y}\mathbf{k}_Y(y)$, we numerically compare the case $R^{\dagger}_{X|Y} = (\Lambda^{\dagger} G_Y)^{\dagger}\Lambda^{\dagger}$ with the original case $R_{X|Y} = \Lambda G_Y\big((\Lambda G_Y)^2+\delta I_n\big)^{-1}\Lambda$ in Section 4, where $\Lambda^{\dagger} = \mathrm{diag}(G_X^{\dagger}\hat{\mathbf m}_\Pi)$.
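The defining identities of Definition 5 and Remark 2 are easy to check numerically; the short sketch below (our illustration) uses numpy.linalg.pinv, which returns the Moore–Penrose generalized inverse.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])               # a rank-deficient 2x3 matrix
A_dag = np.linalg.pinv(A)                      # Moore-Penrose generalized inverse of A

print(np.allclose(A @ A_dag @ A, A))           # A A^dagger A = A
print(np.allclose(A_dag @ A @ A_dag, A_dag))   # A^dagger A A^dagger = A^dagger

# For v in the image of A we have A A^dagger v = v, so A^dagger v lies in the preimage of v.
v = A @ np.array([1.0, 0.0, 1.0])
print(np.allclose(A @ (A_dag @ v), v))
```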

3.3. Conditional expectation functions and RKHS
In this subsection, we show that conditional expectation functions are in some cases not contained in the RKHS.

Definition 6. For $p\in[1,\infty)$, we define the spaces $L^p(\mathbb{R})$, $L^p(\mathbb{R},\mathbb{C})$, and $L^p(\mathbb{R}^2,\mathbb{R})$ as
$$L^p(\mathbb{R}) := \Big\{ f\colon\mathbb{R}\to\mathbb{R} \ \Big|\ \int |f(x)|^p\,dx < \infty \Big\},$$
$$L^p(\mathbb{R},\mathbb{C}) := \Big\{ f\colon\mathbb{R}\to\mathbb{C} \ \Big|\ \int |f(x)|^p\,dx < \infty \Big\},$$
$$L^p(\mathbb{R}^2,\mathbb{R}) := \Big\{ f\colon\mathbb{R}^2\to\mathbb{R} \ \Big|\ \iint |f(x_1,x_2)|^p\,dx_1\,dx_2 < \infty \Big\}.$$
We also define the $L^p$ norm for $f\in L^p(\mathbb{R})$ or $f\in L^p(\mathbb{R},\mathbb{C})$ as
$$\|f\|_p := \Big(\int |f(x)|^p\,dx\Big)^{1/p},$$
and the $L^p$ norm for $f\in L^p(\mathbb{R}^2,\mathbb{R})$ as
$$\|f\|_p := \Big(\iint |f(x_1,x_2)|^p\,dx_1\,dx_2\Big)^{1/p}.$$

Definition 7. For a function $f\in L^1(\mathbb{R},\mathbb{C})\cap L^2(\mathbb{R},\mathbb{C})$, we define its Fourier transform as
$$\hat f(t) := \frac{1}{\sqrt{2\pi}}\int f(x)\exp(-\mathrm{i}tx)\,dx.$$
We can uniquely extend the Fourier transform to an isometry $\widehat{\ \cdot\ }\colon L^2(\mathbb{R},\mathbb{C})\to L^2(\mathbb{R},\mathbb{C})$. We also define the inverse Fourier transform $\check{\ \cdot\ }\colon L^2(\mathbb{R},\mathbb{C})\to L^2(\mathbb{R},\mathbb{C})$ as the isometry uniquely determined by
$$\check f(t) := \frac{1}{\sqrt{2\pi}}\int f(x)\exp(\mathrm{i}tx)\,dx$$
for $f\in L^1(\mathbb{R},\mathbb{C})\cap L^2(\mathbb{R},\mathbb{C})$.

Definition 8. Let us define a Gaussian kernel $k_G$ on $\mathbb{R}$ by
$$k_G(x,y) := \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{(x-y)^2}{2\sigma^2}\Big).$$
As described in Fukumizu (2014), the RKHSs of real-valued functions and of complex-valued functions corresponding to the positive definite kernel $k_G$ are given by
$$\mathcal{H}_G := \Big\{ f\in L^2(\mathbb{R}) \ \Big|\ \int |\hat f(t)|^2\exp\Big(\frac{\sigma^2 t^2}{2}\Big)\,dt < \infty \Big\}, \qquad \mathcal{H}_G(\mathbb{R},\mathbb{C}) := \Big\{ f\in L^2(\mathbb{R},\mathbb{C}) \ \Big|\ \int |\hat f(t)|^2\exp\Big(\frac{\sigma^2 t^2}{2}\Big)\,dt < \infty \Big\},$$
respectively, and the inner product of $f, g\in\mathcal{H}_G$ or $f, g\in\mathcal{H}_G(\mathbb{R},\mathbb{C})$ on the RKHS is calculated by
$$\langle f, g\rangle = \int \hat f(t)\,\overline{\hat g(t)}\,\exp\Big(\frac{\sigma^2 t^2}{2}\Big)\,dt, \tag{7}$$
where the overline denotes the complex conjugate. Remark that $\mathcal{H}_G$ is a real Hilbert subspace contained in the complex Hilbert space $\mathcal{H}_G(\mathbb{R},\mathbb{C})$.

Fukumizu et al. (2013) mentioned that the conditional expectation function $E[g(Y)\mid X=\cdot\,]$ is not always included in $\mathcal{H}_\mathcal{X}$. Indeed, if the variables $X$ and $Y$ are independent, then $E[g(Y)\mid X=\cdot\,]$ becomes a constant function on $\mathcal{X}$, the value of which might be non-zero. In the case that $\mathcal{X}=\mathbb{R}$ and $k_\mathcal{X}=k_G$, a constant function with non-zero value is not contained in $\mathcal{H}_\mathcal{X}=\mathcal{H}_G$. Additionally, in order to prove Theorems 5 and 8 in Fukumizu et al. (2013), they made the assumption that $E[k_\mathcal{Y}(Y,\tilde Y)\mid X=x, \tilde X=\tilde x]\in\mathcal{H}_\mathcal{X}\otimes\mathcal{H}_\mathcal{X}$ and $E[k_\mathcal{X}(Z,\tilde Z)\mid W=y, \tilde W=\tilde y]\in\mathcal{H}_\mathcal{Y}\otimes\mathcal{H}_\mathcal{Y}$, where $(\tilde X, \tilde Y)$ and $(\tilde Z, \tilde W)$ are independent copies of the random variables $(X, Y)$ and $(Z, W)$ on $\mathcal{X}\times\mathcal{Y}$, respectively. We also see that this assumption does not hold in general. Suppose that $X$ and $Y$ are independent and that so are $\tilde X$ and $\tilde Y$. Then $E[k_\mathcal{Y}(Y,\tilde Y)\mid X=x, \tilde X=\tilde x]$ is a constant function of $(x, \tilde x)$, the value of which might be non-zero. In the case that $\mathcal{X}=\mathbb{R}$ and $k_\mathcal{X}=k_G$, a constant function having non-zero value is not contained in $\mathcal{H}_\mathcal{X}\otimes\mathcal{H}_\mathcal{X}=\mathcal{H}_G\otimes\mathcal{H}_G$.
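The Fourier-domain inner product of Definition 8 can be checked numerically: for two kernel sections $k_G(\cdot, x)$ and $k_G(\cdot, y)$, Equation (7) must return $k_G(x, y)$ by the reproducing property. The sketch below (our illustration; the quadrature grid is an arbitrary choice) verifies this.

```python
import numpy as np

sigma, x, y = 1.0, 0.3, -0.8
t = np.linspace(-20.0, 20.0, 200001)          # quadrature grid; the integrand decays like exp(-t^2/2)
dt = t[1] - t[0]

def kG_hat(t, centre, sigma):
    """Fourier transform of k_G(., centre): (1/sqrt(2 pi)) exp(-i t centre - sigma^2 t^2 / 2)."""
    return np.exp(-1j * t * centre - sigma ** 2 * t ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

# Equation (7): <f, g> = integral of f_hat(t) conj(g_hat(t)) exp(sigma^2 t^2 / 2) dt
integrand = kG_hat(t, x, sigma) * np.conj(kG_hat(t, y, sigma)) * np.exp(sigma ** 2 * t ** 2 / 2.0)
inner = integrand.real.sum() * dt              # simple Riemann sum

k_xy = np.exp(-(x - y) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
print(inner, k_xy)                             # both numbers agree: the reproducing property holds
```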

Note that $\mathcal{H}_G\otimes\mathcal{H}_G$ is isomorphic to the RKHS corresponding to the kernel $k\big((x_1,x_2),(x_1',x_2')\big) = k_G(x_1,x_1')\,k_G(x_2,x_2')$ on $\mathbb{R}^2$, that is,
$$\mathcal{H}_G\otimes\mathcal{H}_G = \Big\{ f\in L^2(\mathbb{R}^2,\mathbb{R}) \ \Big|\ \iint |\hat f(t_1,t_2)|^2\exp\Big(\frac{\sigma^2(t_1^2+t_2^2)}{2}\Big)\,dt_1\,dt_2 < \infty \Big\},$$
where the Fourier transform of $f\colon\mathbb{R}^2\to\mathbb{R}$ is defined by
$$\hat f(t_1,t_2) := \lim_{n\to\infty}\frac{1}{2\pi}\iint_{x_1^2+x_2^2<n}f(x_1,x_2)\exp\big(-\mathrm{i}(t_1x_1+t_2x_2)\big)\,dx_1\,dx_2.$$
Thus, the assumption that conditional expectation functions are included in the RKHS does not hold in general. Since most of the theorems in Fukumizu et al. (2013) require this assumption, the kernel Bayes rule may not work in several cases.

4. Numerical experiments
In this section, we perform numerical experiments to illustrate the theoretical results in Sections 3.1 and 3.2. We first introduce, in Section 4.1, probabilistic classifiers based on conventional Bayes rule assuming Gaussian distributions (BR), the original kernel Bayes rule (KBR), and the kernel Bayes rule using Moore–Penrose generalized inverse matrices (KBR†). In Section 4.2, we apply the three classifiers to a binary classification problem with computer-simulated data sets. Numerical experiments are implemented in version 2.7.6 of the Python software (Python Software Foundation, Wolfeboro Falls, NH, USA).

4.1. Algorithms of the three classifiers, BR, KBR, and KBR†
Let $(X, Y)$ be a random variable with a distribution $P$ on $\mathcal{X}\times\mathcal{Y}$, where $\mathcal{X}=\{C_1,\dots,C_g\}$ is a family of classes and $\mathcal{Y}=\mathbb{R}^d$. Let $\Pi$ and $Q$ be the prior and the joint distributions on $\mathcal{X}$ and $\mathcal{X}\times\mathcal{Y}$, respectively. Suppose we have an i.i.d. training sample $\{(X_i, Y_i)\}_{i=1}^n$ from the distribution $P$. The aim of this subsection is to derive the algorithms of the three classifiers, BR, KBR, and KBR†, which calculate the posterior probability for each class given an observation $y$, that is, $\hat Q_y(C_1), \dots, \hat Q_y(C_g)$.

4.1.1. The algorithm of BR
In BR, we estimate the posterior probability of the $j$-th class ($j=1,\dots,g$) given a test value $y$ by
$$\hat Q_y(C_j) = \frac{\hat P_{C_j}(y)\,\Pi(C_j)}{\sum_{k=1}^g \hat P_{C_k}(y)\,\Pi(C_k)},$$
where $\hat P_{C_j}$ is the density function of the $d$-dimensional normal distribution $N(\hat M_j, \hat S_j)$, defined by
$$\hat P_{C_j}(y) = \frac{1}{(2\pi)^{d/2}\,|\hat S_j|^{1/2}}\exp\Big(-\frac{1}{2}(y-\hat M_j)^T\hat S_j^{-1}(y-\hat M_j)\Big).$$
The mean vector $\hat M_j\in\mathbb{R}^d$ and the covariance matrix $\hat S_j\in\mathbb{R}^{d\times d}$ are calculated from the training data of the class $C_j$.
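A minimal implementation of the BR classifier of Section 4.1.1 might look as follows (our sketch, not the authors' published code; the function names are ours).

```python
import numpy as np

def fit_gaussian(Y_class):
    """Mean vector M_j and covariance matrix S_j estimated from the training data of one class."""
    return Y_class.mean(axis=0), np.cov(Y_class, rowvar=False)

def gaussian_density(y, M, S):
    d = len(M)
    diff = y - M
    quad = diff @ np.linalg.solve(S, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(S))

def br_posterior(y, class_params, priors):
    """Q_y(C_j) = P_{C_j}(y) Pi(C_j) / sum_k P_{C_k}(y) Pi(C_k)."""
    scores = np.array([gaussian_density(y, M, S) * pi
                       for (M, S), pi in zip(class_params, priors)])
    return scores / scores.sum()
```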

4.1.2. The algorithm of KBR
Let us define positive definite kernels $k_\mathcal{X}$ and $k_\mathcal{Y}$ as
$$k_\mathcal{X}(X, X') = \begin{cases}1 & (X = X')\\ 0 & (X\neq X')\end{cases} \quad\text{for } X, X'\in\mathcal{X}, \qquad k_\mathcal{Y}(Y, Y') = \frac{1}{(\sqrt{2\pi}\,\sigma)^d}\exp\Big(-\frac{\|Y-Y'\|^2}{2\sigma^2}\Big),$$
and the corresponding RKHSs as $\mathcal{H}_\mathcal{X}$ and $\mathcal{H}_\mathcal{Y}$, respectively. Here we set $\|Y\|^2 = \sum_{i=1}^d y_i^2$ for $Y = (y_1, y_2, \dots, y_d)^T\in\mathcal{Y}=\mathbb{R}^d$. Then, the prior kernel mean is given by
$$\hat m_\Pi = \sum_{j=1}^g \Pi(C_j)\, k_\mathcal{X}(\cdot, C_j),$$
where $\sum_{j=1}^g \Pi(C_j) = 1$. Let us put $G_X = (k_\mathcal{X}(X_i, X_j))_{i,j\le n}$, $G_Y = (k_\mathcal{Y}(Y_i, Y_j))_{i,j\le n}$, $D = \big(\chi_{\{C_i\}}(X_j)\big)_{i\le g,\, j\le n}\in\{0,1\}^{g\times n}$, $\hat{\mathbf m}_\Pi = (\hat m_\Pi(X_1),\dots,\hat m_\Pi(X_n))^T$, $\hat\mu = (\hat\mu_1,\dots,\hat\mu_n)^T = (G_X + n\varepsilon I_n)^{-1}\hat{\mathbf m}_\Pi$, $\Lambda = \mathrm{diag}(\hat\mu)$, $\mathbf{k}_X = (k_\mathcal{X}(\cdot, X_1),\dots,k_\mathcal{X}(\cdot, X_n))^T$, $\mathbf{k}_Y = (k_\mathcal{Y}(\cdot, Y_1),\dots,k_\mathcal{Y}(\cdot, Y_n))^T$, and $R_{X|Y} = \Lambda G_Y\big((\Lambda G_Y)^2 + \delta I_n\big)^{-1}\Lambda$, where $I_n$ is the identity matrix of size $n$ and $\varepsilon, \delta\in\mathbb{R}$ are heuristically set regularization parameters. Note that $\chi_A(t)$ stands for the indicator function of a set $A$, described as
$$\chi_A(t) := \begin{cases}1 & (t\in A)\\ 0 & (t\notin A).\end{cases}$$
Following Theorem 3, the posterior kernel mean given a test value $y$ is estimated by $\hat m_{Q_y} = \mathbf{k}_X^T R_{X|Y}\mathbf{k}_Y(y)$. Here, we estimate the posterior probabilities for the classes given a test value $y$ by
$$\big(\hat Q_y(C_1), \dots, \hat Q_y(C_g)\big)^T = \big(\langle\chi_{\{C_1\}}, \hat m_{Q_y}\rangle, \dots, \langle\chi_{\{C_g\}}, \hat m_{Q_y}\rangle\big)^T = D\, R_{X|Y}\,\mathbf{k}_Y(y).$$

4.1.3. The algorithm of KBR†
Let $G_X^{\dagger}$ denote the Moore–Penrose generalized inverse matrix of $G_X$. Let us put $\hat\mu^{\dagger} = (\hat\mu_1^{\dagger},\dots,\hat\mu_n^{\dagger})^T = G_X^{\dagger}\hat{\mathbf m}_\Pi$, $\Lambda^{\dagger} = \mathrm{diag}(\hat\mu^{\dagger})$, and $R^{\dagger}_{X|Y} = (\Lambda^{\dagger} G_Y)^{\dagger}\Lambda^{\dagger}$. Replacing $R_{X|Y}$ in Section 4.1.2 with $R^{\dagger}_{X|Y}$, the posterior probabilities for the classes given a test value $y$ are estimated by
$$\big(\hat Q_y(C_1), \dots, \hat Q_y(C_g)\big)^T = D\, R^{\dagger}_{X|Y}\,\mathbf{k}_Y(y).$$

4.2. Probabilistic predictions by the three classifiers
Here, we apply the three classifiers defined in Section 4.1 to a binary classification problem using computer-simulated data sets, where $\mathcal{X}=\{C_1, C_2\}$ and $\mathcal{Y}=\mathbb{R}^2$. In the first step, we independently generate 100 sets of training samples, each training sample being $\{(X_i, Y_i)\}_{i=1}^{100}$, where $X_i = C_1$ and $Y_i\sim N(M_1, S_1)$ if $i\le 50$, and $X_i = C_2$ and $Y_i\sim N(M_2, S_2)$ if $51\le i\le 100$, with $M_1 = (1, 0)^T$, $M_2 = (0, 1)^T$, and $S_1 = S_2 = \mathrm{diag}(0.1, 0.1)$. Here, $\{Y_i\}_{i=1}^{50}$ and $\{Y_i\}_{i=51}^{100}$ are sampled i.i.d. from $N(M_1, S_1)$ and $N(M_2, S_2)$, respectively. Individual $Y$-values of one of the training samples are plotted in Figure 1. With each of the 100 training samples and a simulated prior probability of $C_1$, namely $\Pi(C_1)\in\{0.1, 0.2, \dots, 0.9\}$, the classifiers defined in Section 4.1 estimate the posterior probability of $C_1$ given a test value $y\in\{(0.5, 0.5), (0.6, 0.4), (0.7, 0.3)\}$, that is, $\hat Q_y(C_1)$. Figures 2–5 show the mean (plus or minus the standard error of the mean, SEM) of the 100 values of $\hat Q_y(C_1)$ calculated by each of the classifiers BR, KBR, and KBR†. Here we show the case where $\sigma$ in KBR and KBR† is fixed and the regularization parameters of KBR are set to $\varepsilon=\delta=10^{-7}$ (Figure 2), $\varepsilon=\delta=10^{-5}$ (Figure 3), $\varepsilon=\delta=10^{-3}$ (Figure 4), and $\varepsilon=\delta=10^{-1}$ (Figure 5). In Figures 2–5, BR_th represents the theoretical value of BR, which coincides with BR if the parameters $\hat M_1$, $\hat M_2$, $\hat S_1$, and $\hat S_2$ are set to $M_1$, $M_2$, $S_1$, and $S_2$, respectively. Consistent with Section 3.1, $\hat Q_y(C_1)$ calculated by KBR is poorly influenced by $\Pi(C_1)$ compared with that by BR when $\varepsilon$ and $\delta$ are set to be small (see Figures 2 and 3). In addition, $\hat Q_y(C_1)$ calculated by KBR† also seems to be uninfluenced by $\Pi(C_1)$. When $\varepsilon$ and $\delta$ are set to be larger, the effect of $\Pi(C_1)$ on $\hat Q_y(C_1)$ becomes apparent in KBR; however, the value of $\hat Q_y(C_1)$ becomes too small (see Figures 4 and 5).
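For reference, the sketch below (our reconstruction, not the authors' published code; kernel normalization and scaling constants are guesses, so the exact numbers will differ from Figures 2–5) reproduces the flavour of the experiment: it simulates one training sample as in Section 4.2 and prints the KBR class scores of a test point for several prior probabilities $\Pi(C_1)$ with small $\varepsilon$ and $\delta$. The printed scores barely react to the prior, in line with the discussion above.

```python
import numpy as np

def gauss_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
n_per = 50
Y = np.vstack([rng.multivariate_normal([1.0, 0.0], np.diag([0.1, 0.1]), n_per),   # class C1
               rng.multivariate_normal([0.0, 1.0], np.diag([0.1, 0.1]), n_per)])  # class C2
labels = np.array([0] * n_per + [1] * n_per)
n = len(Y)
y_test = np.array([[0.5, 0.5]])
eps = delta = 1e-7

G_X = (labels[:, None] == labels[None, :]).astype(float)   # Gram matrix of the discrete kernel on classes
G_Y = gauss_kernel(Y, Y)
k_Y = gauss_kernel(Y, y_test)[:, 0]
D = np.vstack([(labels == 0).astype(float), (labels == 1).astype(float)])

for prior_C1 in (0.1, 0.5, 0.9):
    m_Pi_vec = np.where(labels == 0, prior_C1, 1.0 - prior_C1)   # m_Pi(X_i) = Pi(class of X_i)
    mu = np.linalg.solve(G_X + n * eps * np.eye(n), m_Pi_vec)
    Lam = np.diag(mu)
    LG = Lam @ G_Y
    R = LG @ np.linalg.solve(LG @ LG + delta * np.eye(n), Lam)
    print(prior_C1, D @ (R @ k_Y))     # KBR class scores for (C1, C2); almost flat in the prior
```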

Figure 1. Individual Y-values of a training sample.

These results suggest that, in the kernel Bayes rule, the posterior does not depend on the prior if $\varepsilon$ and $\delta$ are negligible, which might be a contradiction to the nature of Bayes' theorem. Moreover, even though the prior affects the posterior when $\varepsilon$ and $\delta$ become larger, the posterior seems too dependent on $\varepsilon$ and $\delta$, which are initially defined just for the regularization of matrices. We have also tested all possible combinations of the following values for the parameters in KBR and/or KBR†: $\varepsilon\in\{10^{-1}, 10^{-3}, 10^{-5}, 10^{-7}, 10^{-9}, 10^{-11}, 10^{-13}, 10^{-15}\}$, $\delta\in\{10^{-1}, 10^{-3}, 10^{-5}, 10^{-7}, 10^{-9}, 10^{-11}, 10^{-13}, 10^{-15}\}$, and $\sigma\in\{0.01, 0.1, 1, 10, 100\}$. All the experimental results have been evaluated in a similar manner as above, and none of the results are found to be reasonable in the context of Bayesian inference (see Supplementary material).

5. Conclusions
One of the important features of Bayesian inference is that it provides a reasonable way of updating the probability for a hypothesis as additional evidence is acquired. The kernel Bayes rule has been expected to enable Bayesian inference in RKHSs. In other words, the posterior kernel mean has been considered to be reasonably estimated by the kernel Bayes rule, given kernel mean expressions of the prior and likelihood. What is "reasonable" depends on circumstances; however, some of the results in this paper seem to show obviously unreasonable aspects of the kernel Bayes rule, at least in the context of Bayesian inference.

First, as shown in Section 3.1, when $\Lambda$ and $G_Y$ are non-singular matrices and we therefore set $\delta = 0$, the posterior kernel mean $\hat m_{Q_y}$ is entirely unaffected by the prior distribution $\Pi$ on $\mathcal{X}$. This means that, in Bayesian inference with the kernel Bayes rule, prior beliefs are in some cases completely neglected in calculating the kernel mean of the posterior distribution. Numerical evidence is also presented in Section 4. When the regularization parameters $\varepsilon$ and $\delta$ are set to be small, the posterior probability calculated by the kernel Bayes rule (KBR) is almost unaffected by the prior probability in comparison with that by conventional Bayes rule (BR). Consistently, when the regularized inverse matrices in KBR are replaced with the Moore–Penrose generalized inverse matrices (KBR†), the posterior probability is also uninfluenced by the prior probability, which seems to be unsuitable in the context of Bayesian updating of a probability distribution.

Figure 2. The case ε = δ = 10⁻⁷.
Figure 3. The case ε = δ = 10⁻⁵.

Figure 4. The case ε = δ = 10⁻³.
Figure 5. The case ε = δ = 10⁻¹.

Second, as discussed in Sections 3.2 and 4, the posterior estimated by the kernel Bayes rule depends considerably upon the regularization parameters $\varepsilon$ and $\delta$, which are originally introduced just for the regularization of matrices. A cross-validation approach is proposed in Fukumizu et al. (2013) to search for the optimal values of the parameters. However, the theoretical foundations seem to be insufficient for the correct tuning of the parameters. Furthermore, in our experimental settings, we were not able to obtain a reasonable result using any combination of the parameter values, suggesting the possibility that there are no appropriate values for the parameters in general. Thus, we consider it difficult to solve the problem that the $C_{XX}$ are not surjective by just adding regularization parameters.

Third, as shown in Section 3.3, the assumption that conditional expectation functions are included in the RKHS does not hold in general. Since this assumption is necessary for most of the theorems in Fukumizu et al. (2013), we believe that the assumption itself may need to be reconsidered.

In summary, even though current research efforts are focused on the application of the kernel Bayes rule (Fukumizu et al., 2013; Kanagawa et al., 2014), it might be necessary to reexamine its basic framework of combining new evidence with prior beliefs.

6. Proofs
In this section, we provide some proofs for Sections 2 and 3.

6.1. Estimation of $C_{ZW}$ and $C_{WW}$
Here we give the proof of Proposition 1.

Proof. Let $\hat C_{XX}$, $\hat C_{(YX)X}$, and $\hat C_{(YY)X}$ denote the estimates of $C_{XX}$, $C_{(YX)X}$, and $C_{(YY)X}$, respectively. We define the estimates of $m_{ZW}$ and $m_{WW}$ as $\hat m_{ZW} = \hat C_{(YX)X}(\hat C_{XX}+\varepsilon I)^{-1}\hat m_\Pi$ and $\hat m_{WW} = \hat C_{(YY)X}(\hat C_{XX}+\varepsilon I)^{-1}\hat m_\Pi$, and put $\hat h = (\hat C_{XX}+\varepsilon I)^{-1}\hat m_\Pi$. According to Equation (5), for any $f\in\mathcal{H}_\mathcal{X}$ and $g\in\mathcal{H}_\mathcal{Y}$,
$$\langle \hat m_{ZW},\, g\otimes f\rangle = \langle \hat C_{(YX)X}\hat h,\, g\otimes f\rangle = \hat E[f(X)g(Y)\hat h(X)] = \frac{1}{n}\sum_{i=1}^n f(X_i)g(Y_i)\hat h(X_i) = \Big\langle \frac{1}{n}\sum_{i=1}^n \hat h(X_i)\, k_\mathcal{X}(\cdot, X_i)\otimes k_\mathcal{Y}(\cdot, Y_i),\ f\otimes g\Big\rangle,$$
where $\hat E[\,\cdot\,]$ represents the empirical expectation operator. Thus, from Remark 1,
$$\hat C_{ZW} = \hat m_{ZW} = \frac{1}{n}\sum_{i=1}^n \hat h(X_i)\, k_\mathcal{X}(\cdot, X_i)\otimes k_\mathcal{Y}(\cdot, Y_i). \tag{8}$$
Similarly, for any $g_1, g_2\in\mathcal{H}_\mathcal{Y}$,
$$\langle \hat m_{WW},\, g_1\otimes g_2\rangle = \langle \hat C_{(YY)X}\hat h,\, g_1\otimes g_2\rangle = \hat E[g_1(Y)g_2(Y)\hat h(X)] = \frac{1}{n}\sum_{i=1}^n g_1(Y_i)g_2(Y_i)\hat h(X_i) = \Big\langle \frac{1}{n}\sum_{i=1}^n \hat h(X_i)\, k_\mathcal{Y}(\cdot, Y_i)\otimes k_\mathcal{Y}(\cdot, Y_i),\ g_1\otimes g_2\Big\rangle.$$
Thus, from Remark 1,
$$\hat C_{WW} = \hat m_{WW} = \frac{1}{n}\sum_{i=1}^n \hat h(X_i)\, k_\mathcal{Y}(\cdot, Y_i)\otimes k_\mathcal{Y}(\cdot, Y_i). \tag{9}$$
Next, we will derive $\hat h(X_1), \dots, \hat h(X_n)$.

Since $\hat C_{XX}$ is a self-adjoint operator,
$$\langle \hat h,\, (\hat C_{XX}+\varepsilon I) f\rangle = \langle (\hat C_{XX}+\varepsilon I)\hat h,\, f\rangle = \langle \hat m_\Pi, f\rangle = \sum_{j=1}^l \gamma_j f(U_j)$$
for any $f\in\mathcal{H}_\mathcal{X}$. On the other hand, from the definition of $\hat C_{XX}$,
$$\langle \hat h,\, \hat C_{XX} f\rangle = \hat E[f(X)\hat h(X)] = \frac{1}{n}\sum_{i=1}^n f(X_i)\hat h(X_i)$$
for any $f\in\mathcal{H}_\mathcal{X}$. Hence we have
$$\sum_{j=1}^l \gamma_j f(U_j) = \frac{1}{n}\sum_{i=1}^n f(X_i)\hat h(X_i) + \varepsilon\,\langle \hat h, f\rangle \tag{10}$$
for any $f\in\mathcal{H}_\mathcal{X}$. Replacing $f$ in Equation (10) by $k_\mathcal{X}(\cdot, X_1), \dots, k_\mathcal{X}(\cdot, X_n)$, we have
$$\begin{pmatrix} k_\mathcal{X}(X_1, U_1) & \cdots & k_\mathcal{X}(X_1, U_l)\\ \vdots & & \vdots\\ k_\mathcal{X}(X_n, U_1) & \cdots & k_\mathcal{X}(X_n, U_l)\end{pmatrix}\begin{pmatrix}\gamma_1\\ \vdots\\ \gamma_l\end{pmatrix} = \frac{1}{n}\, G_X\begin{pmatrix}\hat h(X_1)\\ \vdots\\ \hat h(X_n)\end{pmatrix} + \varepsilon\begin{pmatrix}\hat h(X_1)\\ \vdots\\ \hat h(X_n)\end{pmatrix}. \tag{11}$$
Using Equation (6), the left-hand side of Equation (11) is given by
$$\begin{pmatrix}\sum_{j=1}^l\gamma_j\, k_\mathcal{X}(X_1, U_j)\\ \vdots\\ \sum_{j=1}^l\gamma_j\, k_\mathcal{X}(X_n, U_j)\end{pmatrix} = \begin{pmatrix}\hat m_\Pi(X_1)\\ \vdots\\ \hat m_\Pi(X_n)\end{pmatrix} = \hat{\mathbf m}_\Pi.$$
Therefore, we have
$$\frac{1}{n}\begin{pmatrix}\hat h(X_1)\\ \vdots\\ \hat h(X_n)\end{pmatrix} = (G_X + n\varepsilon I_n)^{-1}\hat{\mathbf m}_\Pi = \hat\mu.$$
Replacing $\frac{1}{n}(\hat h(X_1), \dots, \hat h(X_n))^T$ by $\hat\mu = (\hat\mu_1, \dots, \hat\mu_n)^T$, Equations (8) and (9) become
$$\hat C_{ZW} = \sum_{i=1}^n \hat\mu_i\, k_\mathcal{X}(\cdot, X_i)\otimes k_\mathcal{Y}(\cdot, Y_i) \quad\text{and}\quad \hat C_{WW} = \sum_{i=1}^n \hat\mu_i\, k_\mathcal{Y}(\cdot, Y_i)\otimes k_\mathcal{Y}(\cdot, Y_i),$$
respectively. $\square$

6.2. Non-singularity of $G_Y$ and $\Lambda$
Here we show that the assumption in Section 3.1 holds under reasonable conditions.

Definition 9. Let $f$ be a real-valued function defined on a non-empty open domain $\mathrm{Dom}(f)\subset\mathbb{R}^d$. We say that $f$ is analytic if $f$ can be described by a Taylor expansion on a neighborhood of each point of $\mathrm{Dom}(f)$.

Proposition 2. Let $k$ be a positive definite kernel on $\mathbb{R}^d$. Let $\nu$ be a probability measure on $\mathbb{R}^d$ which is absolutely continuous with respect to Lebesgue measure. Assume that $k$ is an analytic function on $\mathbb{R}^d\times\mathbb{R}^d$ and that the RKHS corresponding to $k$ is infinite dimensional. Then for any i.i.d. random variables $X_1, X_2, \dots, X_n$ with the same distribution $\nu$, the Gram matrix $G_X = (k(X_i, X_j))_{i,j\le n}$ is non-singular almost surely with respect to $\nu^n = \nu\otimes\nu\otimes\cdots\otimes\nu$ ($n$ times).

Proof. Let us put $f(x_1, x_2, \dots, x_n) := \det\big(k(x_i, x_j)\big)_{i,j\le n}$. Since the RKHS corresponding to $k$ is infinite dimensional, there are $\xi_1, \xi_2, \dots, \xi_n\in\mathbb{R}^d$ such that $\{k(\cdot, \xi_i)\}_{1\le i\le n}$ are linearly independent. Then $f(\xi_1, \xi_2, \dots, \xi_n)\neq 0$, and hence $f$ is a non-zero analytic function. Note that any non-trivial subvariety of a Euclidean space defined by analytic functions has Lebesgue measure zero. By this fact, the subvariety
$$V_f := \big\{(x_1, x_2, \dots, x_n)\in(\mathbb{R}^d)^n \mid f(x_1, x_2, \dots, x_n) = 0\big\}\subset(\mathbb{R}^d)^n$$
has Lebesgue measure zero. Since $\nu$ is absolutely continuous, $\nu^n(V_f) = 0$. This completes the proof. $\square$

From Proposition 2, we easily obtain the following corollary.

Corollary 1. Let $k$ be a Gaussian kernel on $\mathbb{R}^d$ and let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with the same normal distribution on $\mathbb{R}^d$. Then the Gram matrix $G_X = (k(X_i, X_j))_{i,j\le n}$ is non-singular almost surely.
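Corollary 1 is easy to probe numerically. The sketch below (our illustration) draws an i.i.d. normal sample, builds the Gaussian Gram matrix, and reports its log-determinant and smallest eigenvalue: the matrix is non-singular, but its smallest eigenvalues are already small for moderate n, which is the source of the ill-conditioned inverse problems discussed in Section 3.2.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, sigma = 12, 2, 1.0
X = rng.multivariate_normal(np.zeros(d), np.eye(d), n)      # i.i.d. normal sample
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
G = np.exp(-d2 / (2.0 * sigma ** 2))                         # Gaussian Gram matrix

sign, logdet = np.linalg.slogdet(G)
print(sign, logdet)                      # sign = 1.0, so det(G) > 0 and G is non-singular
print(np.linalg.eigvalsh(G).min())       # positive, but already small for n = 12
```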

Proposition 3. Let $k$ be a positive definite kernel on $\mathcal{X}=\mathbb{R}^d$, and let $\nu$ be a probability measure on $\mathcal{X}$ which is absolutely continuous with respect to Lebesgue measure. Assume that $k$ is an analytic function on $\mathcal{X}\times\mathcal{X}$ and that the RKHS corresponding to $k$ is infinite dimensional. Then for any $(\varepsilon, \gamma_1, \dots, \gamma_l, U_1, \dots, U_l)\in\mathbb{R}_+\times\mathbb{R}^l\times(\mathbb{R}^d)^l$ except a set of Lebesgue measure zero, and for any i.i.d. random variables $X_1, X_2, \dots, X_n$ with the same distribution $\nu$, each $\hat\mu_i$ for $i=1,2,\dots,n$ is defined almost surely and non-zero almost surely, where $\hat\mu = (\hat\mu_1, \hat\mu_2, \dots, \hat\mu_n)^T = (G_X + n\varepsilon I_n)^{-1}\hat{\mathbf m}_\Pi$, $\hat{\mathbf m}_\Pi = (\hat m_\Pi(X_1), \hat m_\Pi(X_2), \dots, \hat m_\Pi(X_n))^T$, and $\hat m_\Pi = \sum_{j=1}^l\gamma_j\, k(\cdot, U_j)$. Here $\mathbb{R}_+$ denotes the set of positive real numbers.

Proof. Let us put $\Theta := \mathbb{R}_+\times\mathbb{R}^l\times(\mathbb{R}^d)^l$ and
$$f_i(x_1, x_2, \dots, x_n, \varepsilon, \gamma_1, \dots, \gamma_l, U_1, \dots, U_l) := \hat\mu_i \qquad (i=1,2,\dots,n)$$
for $(x_1, x_2, \dots, x_n)\in\mathcal{X}^n$ and $(\varepsilon, \gamma_1, \dots, \gamma_l, U_1, \dots, U_l)\in\Theta$. Since the RKHS corresponding to $k$ is infinite dimensional, we can obtain $\xi_1, \xi_2, \dots, \xi_n\in\mathcal{X}$ such that $\{k(\cdot, \xi_i)\}_{1\le i\le n}$ are linearly independent. The Gram matrix $(k(\xi_i,\xi_j))_{i,j\le n} = (\langle k(\cdot,\xi_i), k(\cdot,\xi_j)\rangle)_{i,j\le n}$ is positive definite, and its eigenvalues are all positive. Hence $\det\big((k(\xi_i,\xi_j))_{i,j\le n} + n\varepsilon I_n\big) > 0$ for each $\varepsilon\in\mathbb{R}_+$, and $\det(G_X + n\varepsilon I_n) = \det\big((k(x_i,x_j))_{i,j\le n} + n\varepsilon I_n\big)$ is a non-zero analytic function on $\mathcal{X}^n$ for each $\varepsilon\in\mathbb{R}_+$. For $(\varepsilon, \gamma, U)\in\Theta$, let us define a closed measure-zero set
$$V_{\varepsilon,\gamma,U} := \big\{(x_1, x_2, \dots, x_n)\in\mathcal{X}^n \mid \det(G_X + n\varepsilon I_n) = 0\big\}.$$
Then $f_i(\,\cdot\,, \varepsilon, \gamma, U)$ is defined on $\mathcal{X}^n\setminus V_{\varepsilon,\gamma,U}$ for each $i\in\{1,2,\dots,n\}$. Using Cramer's rule,
$$\hat\mu_i = \frac{\det\big(\eta_1, \eta_2, \dots, \eta_{i-1}, \hat{\mathbf m}_\Pi, \eta_{i+1}, \dots, \eta_n\big)}{\det(G_X + n\varepsilon I_n)},$$
where $\eta_m$ stands for the $m$-th column vector of $G_X + n\varepsilon I_n$. Here we denote by $g_i$ the numerator of $\hat\mu_i$, that is, $g_i = \hat\mu_i\,\det(G_X + n\varepsilon I_n)$. It is easy to see that $g_i(\xi_1, \xi_2, \dots, \xi_n,\,\cdot\,)$ is a non-zero analytic function on $\Theta$. Indeed, if $\varepsilon\to+0$, $U_1 = \xi_i$, $\gamma_1 = 1$, and $\gamma_2 = \gamma_3 = \cdots = \gamma_l = 0$, then $g_i\to\det\big(\langle k(\cdot,\xi_i), k(\cdot,\xi_j)\rangle\big)_{i,j\le n}\neq 0$. Hence $\Sigma_i := \{\theta\in\Theta \mid g_i(\xi_1, \xi_2, \dots, \xi_n, \theta) = 0\}$ is a closed subset of $\Theta$ with Lebesgue measure zero for each $i\in\{1,2,\dots,n\}$. For any $(\varepsilon, \gamma, U)\in\Theta\setminus\bigcup_{i=1}^n\Sigma_i$, the set
$$\Sigma_i(\varepsilon, \gamma, U) := \big\{(x_1, x_2, \dots, x_n)\in\mathcal{X}^n \mid g_i(x_1, x_2, \dots, x_n, \varepsilon, \gamma, U) = 0\big\}$$
is a closed subset of $\mathcal{X}^n$ with Lebesgue measure zero for each $i\in\{1,2,\dots,n\}$, since $g_i(\,\cdot\,, \varepsilon, \gamma, U)$ is then a non-zero analytic function on $\mathcal{X}^n$. Therefore, $\hat\mu_i = f_i(\,\cdot\,, \varepsilon, \gamma, U)$ is defined and non-zero for $i=1,2,\dots,n$ and for $(x_1, x_2, \dots, x_n)\in\mathcal{X}^n\setminus\big(V_{\varepsilon,\gamma,U}\cup\bigcup_{i=1}^n\Sigma_i(\varepsilon, \gamma, U)\big)$, provided $(\varepsilon, \gamma_1, \dots, \gamma_l, U_1, \dots, U_l)\in\Theta\setminus\bigcup_{i=1}^n\Sigma_i$. This completes the proof. $\square$

The following corollary directly follows from Proposition 3.

Corollary 2. Let $k$ be a Gaussian kernel on $\mathbb{R}^d$ and let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with the same normal distribution on $\mathbb{R}^d$. All other notations are as in Proposition 3. Then $\Lambda := \mathrm{diag}(\hat\mu_1, \hat\mu_2, \dots, \hat\mu_n)$ is non-singular almost surely for any $(\varepsilon, \gamma_1, \dots, \gamma_l, U_1, \dots, U_l)\in\mathbb{R}_+\times\mathbb{R}^l\times(\mathbb{R}^d)^l$ except for those in a set of Lebesgue measure zero.

6.3. Non-surjectivity of $C_{XX}$
The covariance operators $C_{XX}$ are not surjective in general. This can be verified by the fact that they are compact operators: if such an operator were surjective on the corresponding RKHS, which is infinite-dimensional, then it could not be compact, because of the open mapping theorem. Here we present some easy examples where $C_{XX}$ is not surjective. Let us consider for simplicity the case $\mathcal{X}=\mathbb{R}$. Let $X$ be a random variable on $\mathbb{R}$ with a normal distribution $N(\mu_0, \sigma_0^2)$. We prove that $C_{XX}$ is not surjective under the usual assumption that the positive definite kernel on $\mathbb{R}$ is Gaussian. In order to demonstrate this, we use the symbols defined in Section 3.3 and several proven results on function spaces and Fourier transforms (see Rudin, 1987, for example). Note that the following three propositions are introduced without proofs.

Proposition 4. Let us put $f(x) = \exp(-ax^2 + bx + c)$ for $a, b, c\in\mathbb{R}$, where $a > 0$. Then
$$\hat f(t) = \frac{1}{\sqrt{2a}}\exp\Big(-\frac{t^2 + 2\mathrm{i}bt - b^2 - 4ac}{4a}\Big).$$

Proposition 5. For $f\in L^2(\mathbb{R},\mathbb{C})$, $\widehat{\bar f}(t) = \overline{\hat f(-t)}$ almost everywhere. In particular, if $f\in L^2(\mathbb{R})$, then $\hat f(-t) = \overline{\hat f(t)}$ almost everywhere.

Proposition 6. For $f\in L^2(\mathbb{R},\mathbb{C})$, put $f_a(x) := f(x-a)$. Then $\hat f_a(t) = \exp(-\mathrm{i}at)\hat f(t)$.

Definition 10. Let $p$ denote the density function of the normal distribution $N(\mu_0, \sigma_0^2)$ on $\mathbb{R}$, that is,
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\Big(-\frac{(x-\mu_0)^2}{2\sigma_0^2}\Big).$$
Let $X$ be a random variable on $\mathbb{R}$ with $N(\mu_0, \sigma_0^2)$. The linear operator $C_{XX}\colon\mathcal{H}_G\to\mathcal{H}_G$ is defined by $\langle C_{XX} f, g\rangle_{\mathcal{H}_G} = E[f(X)g(X)]$ for any $f, g\in\mathcal{H}_G$, which is also described as
$$C_{XX} f = \int f(x)\, k_G(\cdot, x)\, p(x)\, dx$$
for any $f\in\mathcal{H}_G$.

Proposition 7. If $f, g\in\mathcal{H}_G$, then $\langle f, g\rangle_{\mathcal{H}_G}\in\mathbb{R}$.

Proof. From Proposition 5, $\hat f(-t) = \overline{\hat f(t)}$ and $\hat g(-t) = \overline{\hat g(t)}$ for any $f, g\in\mathcal{H}_G$. Then, using Equation (7), we have
$$\overline{\langle f, g\rangle_{\mathcal{H}_G}} = \int\overline{\hat f(t)}\,\hat g(t)\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt = \int\hat f(-t)\,\overline{\hat g(-t)}\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt = \int\hat f(t)\,\overline{\hat g(t)}\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt = \langle f, g\rangle_{\mathcal{H}_G}.$$
Therefore, $\langle f, g\rangle_{\mathcal{H}_G}\in\mathbb{R}$. $\square$

Proposition 8. If $f\in\mathcal{H}_G(\mathbb{R},\mathbb{C})$, then $\bar f\in\mathcal{H}_G(\mathbb{R},\mathbb{C})$.

Proof. From Proposition 5, $\widehat{\bar f}(t) = \overline{\hat f(-t)}$ for $f\in L^2(\mathbb{R},\mathbb{C})$. Then, using Equation (7), we have
$$\|\bar f\|^2_{\mathcal{H}_G(\mathbb{R},\mathbb{C})} = \int\big|\widehat{\bar f}(t)\big|^2\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt = \int\big|\hat f(-t)\big|^2\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt = \int\big|\hat f(t)\big|^2\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt = \|f\|^2_{\mathcal{H}_G(\mathbb{R},\mathbb{C})} < \infty.$$
Therefore, $\bar f\in\mathcal{H}_G(\mathbb{R},\mathbb{C})$. $\square$

Here, we denote by $\mathrm{Re}$ and $\mathrm{Im}$ the real part and the imaginary part of a complex number, respectively. We also denote by $\mathrm{Cl}(A)$ the closure of a subset $A$ in a topological space.

Corollary 3. If $f\in\mathcal{H}_G(\mathbb{R},\mathbb{C})$, then $\mathrm{Re}f, \mathrm{Im}f\in\mathcal{H}_G$.

Proof. If $f\in\mathcal{H}_G(\mathbb{R},\mathbb{C})$, then $\bar f\in\mathcal{H}_G(\mathbb{R},\mathbb{C})$ by Proposition 8. Hence we see that
$$\mathrm{Re}f = \frac{f+\bar f}{2}\in\mathcal{H}_G, \qquad \mathrm{Im}f = \frac{f-\bar f}{2\mathrm{i}}\in\mathcal{H}_G.$$
This completes the proof. $\square$

Remark 3. If $f\in\mathcal{H}_G(\mathbb{R},\mathbb{C})$, then there uniquely exist $f_1, f_2\in\mathcal{H}_G$ such that $f = f_1 + \mathrm{i}f_2$, by Corollary 3. This means that $\mathcal{H}_G(\mathbb{R},\mathbb{C}) = \mathcal{H}_G\oplus\mathrm{i}\mathcal{H}_G$, where $\oplus$ denotes the direct sum.

Proposition 9. For any $f\in L^2(\mathbb{R},\mathbb{C})$ and for any $\varepsilon>0$, there exists $g\in\mathcal{H}_G(\mathbb{R},\mathbb{C})$ such that $\|f-g\|_2<\varepsilon$. In other words, $\mathcal{H}_G(\mathbb{R},\mathbb{C})$ is dense in $L^2(\mathbb{R},\mathbb{C})$.

Proof. Let $C_0(\mathbb{R},\mathbb{C})$ denote the space of continuous complex-valued functions with compact support on $\mathbb{R}$. Let us define $\widehat{\mathcal{H}}_G(\mathbb{R},\mathbb{C})$ by
$$\widehat{\mathcal{H}}_G(\mathbb{R},\mathbb{C}) := \Big\{ h\in L^2(\mathbb{R},\mathbb{C}) \ \Big|\ \int|h(t)|^2\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt<\infty \Big\}.$$
Note that $\widehat{\mathcal{H}}_G(\mathbb{R},\mathbb{C})$ coincides with the image of $\mathcal{H}_G(\mathbb{R},\mathbb{C})$ under the Fourier transform. Then $C_0(\mathbb{R},\mathbb{C})\subset\widehat{\mathcal{H}}_G(\mathbb{R},\mathbb{C})\subset L^2(\mathbb{R},\mathbb{C})$ and $\mathrm{Cl}(C_0(\mathbb{R},\mathbb{C})) = L^2(\mathbb{R},\mathbb{C})$. Hence $\mathrm{Cl}(\widehat{\mathcal{H}}_G(\mathbb{R},\mathbb{C})) = L^2(\mathbb{R},\mathbb{C})$. In other words, for any $f\in L^2(\mathbb{R},\mathbb{C})$ and for any $\varepsilon>0$, there exists $\hat g\in\widehat{\mathcal{H}}_G(\mathbb{R},\mathbb{C})$ such that $\|\hat f-\hat g\|_2<\varepsilon$, because $\hat f\in L^2(\mathbb{R},\mathbb{C})$; since the Fourier transform is an isometry, this implies that there exists $g\in\mathcal{H}_G(\mathbb{R},\mathbb{C})$ such that $\|f-g\|_2<\varepsilon$. This completes the proof. $\square$

The following corollary has also been shown in Theorem 4.63 of Steinwart and Christmann (2008).

Corollary 4. $\mathrm{Cl}(\mathcal{H}_G) = L^2(\mathbb{R})$.

Proof. From Proposition 9, for any $f\in L^2(\mathbb{R})\subset L^2(\mathbb{R},\mathbb{C})$ and for any $\varepsilon>0$, there exists $g\in\mathcal{H}_G(\mathbb{R},\mathbb{C})$ such that $\|f-g\|_2<\varepsilon$. By Remark 3, there exist $g_1, g_2\in\mathcal{H}_G$ such that $g = g_1+\mathrm{i}g_2$. Thus,
$$\varepsilon^2 > \|f-g\|_2^2 = \int|f-g|^2\,dx = \int\big((f-g_1)^2 + g_2^2\big)\,dx \ge \int(f-g_1)^2\,dx.$$
Therefore, $\|f-g_1\|_2<\varepsilon$. This completes the proof. $\square$

Definition 11. Let us define $r, r_n\in L^2(\mathbb{R})$ by
$$r(t) := \frac{\chi(t)}{|t|}, \qquad r_n(t) := \frac{\chi_n(t)}{|t|},$$
where $\chi$ and $\chi_n$ denote the indicator functions of the sets $\{t\in\mathbb{R}\mid 1<|t|\}$ and $\{t\in\mathbb{R}\mid 1<|t|<n\}$, respectively. We also put $h_n := \check r_n$ and $h := \check r$. Note that $\lim_{n\to\infty} r_n = r$ in $L^2(\mathbb{R})$, because
$$\lim_{n\to\infty}\|r - r_n\|_2^2 = \lim_{n\to\infty} 2\int_n^\infty\frac{dx}{x^2} = 0.$$

Proposition 10. $h_n, h\in L^2(\mathbb{R})$.

Proof. It is obvious that $h_n, h\in L^2(\mathbb{R},\mathbb{C})$. Since $r_n\in L^1(\mathbb{R})\cap L^2(\mathbb{R})$, we see that
$$\overline{h_n(x)} = \overline{\check r_n(x)} = \frac{1}{\sqrt{2\pi}}\overline{\int r_n(t)\exp(\mathrm{i}tx)\,dt} = \frac{1}{\sqrt{2\pi}}\int r_n(t)\exp(-\mathrm{i}tx)\,dt = \frac{1}{\sqrt{2\pi}}\int r_n(-t')\exp(\mathrm{i}t'x)\,dt' = \frac{1}{\sqrt{2\pi}}\int r_n(t')\exp(\mathrm{i}t'x)\,dt' = h_n(x),$$
where $t' = -t$. Hence $h_n$ is real-valued, that is, $h_n\in L^2(\mathbb{R})$. On the other hand,
$$h(x) = \lim_{n\to\infty}\check r_n(x) = \lim_{n\to\infty}\frac{1}{\sqrt{2\pi}}\int r_n(t)\exp(\mathrm{i}tx)\,dt$$
in $L^2(\mathbb{R},\mathbb{C})$, and the same computation shows that $\overline{h(x)} = h(x)$ almost everywhere. Therefore $h\in L^2(\mathbb{R})$. $\square$

Let us define $k_a := \sqrt{2\pi}\,\sigma\, k_G(\cdot, a) = \exp\Big(-\frac{(\,\cdot\,-a)^2}{2\sigma^2}\Big)\in\mathcal{H}_G$ for $a\in\mathbb{R}$. Now, we prove that $k_a\notin\mathrm{Ran}(C_{XX})$ for any $a\in\mathbb{R}$. This implies that $C_{XX}$ is not surjective.

Proposition 11. For any $a\in\mathbb{R}$, $k_a\in\mathcal{H}_G\setminus\mathrm{Ran}(C_{XX})$.

Proof. Suppose that there exists $g\in\mathcal{H}_G$ such that $C_{XX}\, g = k_a$. Then, for any $f\in\mathcal{H}_G$,
$$\langle k_a, f\rangle_{\mathcal{H}_G} = \langle C_{XX}\, g, f\rangle_{\mathcal{H}_G}. \tag{12}$$
Let us put $k := \sqrt{2\pi}\,\sigma\, k_G(\cdot, 0) = \exp\Big(-\frac{(\,\cdot\,)^2}{2\sigma^2}\Big)$. From Proposition 4, $\hat k(t) = \sigma\exp\Big(-\frac{\sigma^2t^2}{2}\Big)$.

Then, using Equation (7) and Proposition 6, the left-hand side of Equation (12) equals
$$\int\hat k_a(t)\,\overline{\hat f(t)}\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt = \int\exp(-\mathrm{i}at)\,\hat k(t)\,\overline{\hat f(t)}\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt = \sigma\int\exp(-\mathrm{i}at)\,\overline{\hat f(t)}\,dt.$$
The right-hand side of Equation (12) is equal to
$$E[g(X)f(X)] = \int g(x)f(x)p(x)\,dx = \langle gp, f\rangle_{L^2(\mathbb{R})}.$$
Thus, Equation (12) is equivalent to the following equation:
$$\langle gp, f\rangle_{L^2(\mathbb{R})} = \sigma\int\exp(-\mathrm{i}at)\,\overline{\hat f(t)}\,dt. \tag{13}$$
Let us define $h_{n,a}(x) := h_n(x-a)$ and $h_a(x) := h(x-a)$. Then $h_{n,a}, h_a\in L^2(\mathbb{R})$. It is easy to see that $\|h_{n,a}-h_a\|_2 = \|h_n-h\|_2 = \|r_n-r\|_2\to 0$ as $n\to\infty$, and hence $\lim_{n\to\infty}h_{n,a} = h_a$ in $L^2(\mathbb{R})$. Since $\hat h_{n,a}(t) = \exp(-\mathrm{i}at)\hat h_n(t)$ by Proposition 6, we have
$$\int\big|\hat h_{n,a}(t)\big|^2\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt = \int\big|\hat h_n(t)\big|^2\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt = \int r_n(t)^2\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt \le 2\int_1^n\frac{1}{t^2}\exp\Big(\frac{\sigma^2t^2}{2}\Big)dt < \infty,$$
which indicates that $h_{n,a}\in\mathcal{H}_G$. Substituting $h_{n,a}$ for $f$, Equation (13) becomes
$$\langle gp, h_{n,a}\rangle_{L^2(\mathbb{R})} = \sigma\int\exp(-\mathrm{i}at)\,\overline{\hat h_{n,a}(t)}\,dt. \tag{14}$$
If $n$ goes to infinity, the left-hand side of Equation (14) tends to $\langle gp, h_a\rangle_{L^2(\mathbb{R})}\in\mathbb{R}$. On the other hand, the right-hand side of Equation (14) becomes
$$\sigma\int\exp(-\mathrm{i}at)\,\overline{\exp(-\mathrm{i}at)\,\hat h_n(t)}\,dt = \sigma\int\overline{\hat h_n(t)}\,dt = \sigma\int r_n(t)\,dt = 2\sigma\int_1^n\frac{dt}{t} = 2\sigma\log n \to \infty \qquad (n\to\infty).$$
This is a contradiction. Therefore, there exists no $g\in\mathcal{H}_G$ such that $C_{XX}\, g = k_a$. This completes the proof. $\square$

Supplementary material
Supplementary material for this article can be accessed online.

Funding
Kazunori Nakamoto was partially supported by JSPS KAKENHI [grant numbers JP, JP5K0484].

Author details
Hisashi Johno (hisashijohno@gmail.com) — Department of Mathematical Sciences, Faculty of Medicine, University of Yamanashi, Shimokato, Chuo, Yamanashi, Japan.
Kazunori Nakamoto (nakamoto@yamanashi.ac.jp) — Center for Medical Education and Sciences, Faculty of Medicine, University of Yamanashi, Shimokato, Chuo, Yamanashi, Japan.
Tatsuhiko Saigo (tsaigoh@yamanashi.ac.jp) — Center for Medical Education and Sciences, Faculty of Medicine, University of Yamanashi, Shimokato, Chuo, Yamanashi, Japan.

Citation information
Cite this article as: Remarks on kernel Bayes rule, Hisashi Johno, Kazunori Nakamoto & Tatsuhiko Saigo, Cogent Mathematics & Statistics (2018), 5: 4470.

References
Fukumizu, K. (2014). Introduction to kernel methods (in Japanese). Tokyo: Asakura Shoten.
Fukumizu, K., Song, L., & Gretton, A. (2013). Kernel Bayes' rule: Bayesian inference with positive definite kernels. Journal of Machine Learning Research, 14, 3753–3783.
Horn, R. A., & Johnson, C. R. (2013). Matrix analysis (2nd ed.). Cambridge: Cambridge University Press.
Kanagawa, M., Nishiyama, Y., Gretton, A., & Fukumizu, K. (2014). Monte Carlo filtering using kernel embedding of

distributions. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence.
Rudin, W. (1987). Real and complex analysis (3rd ed.). New York, NY: McGraw-Hill.
Song, L., Fukumizu, K., & Gretton, A. (2013). Kernel embeddings of conditional distributions. IEEE Signal Processing Magazine, 30(4), 98–111.
Song, L., Huang, J., Smola, A., & Fukumizu, K. (2009). Hilbert space embeddings of conditional distributions with applications to dynamical systems. In Proceedings of the 26th Annual International Conference on Machine Learning.
Steinwart, I., & Christmann, A. (2008). Support vector machines. New York, NY: Springer.


More information

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

Kernel Methods. Outline

Kernel Methods. Outline Kernel Methods Quang Nguyen University of Pittsburgh CS 3750, Fall 2011 Outline Motivation Examples Kernels Definitions Kernel trick Basic properties Mercer condition Constructing feature space Hilbert

More information

Chapter 3: Baire category and open mapping theorems

Chapter 3: Baire category and open mapping theorems MA3421 2016 17 Chapter 3: Baire category and open mapping theorems A number of the major results rely on completeness via the Baire category theorem. 3.1 The Baire category theorem 3.1.1 Definition. A

More information

Functional Analysis Exercise Class

Functional Analysis Exercise Class Functional Analysis Exercise Class Week 9 November 13 November Deadline to hand in the homeworks: your exercise class on week 16 November 20 November Exercises (1) Show that if T B(X, Y ) and S B(Y, Z)

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

Geometric interpretation of signals: background

Geometric interpretation of signals: background Geometric interpretation of signals: background David G. Messerschmitt Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-006-9 http://www.eecs.berkeley.edu/pubs/techrpts/006/eecs-006-9.html

More information

On new structure of N-topology

On new structure of N-topology PURE MATHEMATICS RESEARCH ARTICLE On new structure of N-topology M. Lellis Thivagar 1 *, V. Ramesh 1 and M. Arockia Dasan 1 Received: 17 February 2016 Accepted: 15 June 2016 First Published: 21 June 2016

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

DEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

DEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER MACHINE LEARNING AND ADAPTIVE INTELLIGENCE Data Provided: None DEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER 204 205 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE hour Please note that the rubric of this paper is made different from many other papers.

More information

Data fitting by vector (V,f)-reproducing kernels

Data fitting by vector (V,f)-reproducing kernels Data fitting by vector (V,f-reproducing kernels M-N. Benbourhim to appear in ESAIM.Proc 2007 Abstract In this paper we propose a constructive method to build vector reproducing kernels. We define the notion

More information

NOTES ON PRODUCT SYSTEMS

NOTES ON PRODUCT SYSTEMS NOTES ON PRODUCT SYSTEMS WILLIAM ARVESON Abstract. We summarize the basic properties of continuous tensor product systems of Hilbert spaces and their role in non-commutative dynamics. These are lecture

More information

A Process over all Stationary Covariance Kernels

A Process over all Stationary Covariance Kernels A Process over all Stationary Covariance Kernels Andrew Gordon Wilson June 9, 0 Abstract I define a process over all stationary covariance kernels. I show how one might be able to perform inference that

More information

NAME: MATH 172: Lebesgue Integration and Fourier Analysis (winter 2012) Final exam. Wednesday, March 21, time: 2.5h

NAME: MATH 172: Lebesgue Integration and Fourier Analysis (winter 2012) Final exam. Wednesday, March 21, time: 2.5h NAME: SOLUTION problem # 1 2 3 4 5 6 7 8 9 points max 15 20 10 15 10 10 10 10 10 110 MATH 172: Lebesgue Integration and Fourier Analysis (winter 2012 Final exam Wednesday, March 21, 2012 time: 2.5h Please

More information

9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee

9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee 9.520: Class 20 Bayesian Interpretations Tomaso Poggio and Sayan Mukherjee Plan Bayesian interpretation of Regularization Bayesian interpretation of the regularizer Bayesian interpretation of quadratic

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:

More information

Midterm exam CS 189/289, Fall 2015

Midterm exam CS 189/289, Fall 2015 Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

Strictly Positive Definite Functions on a Real Inner Product Space

Strictly Positive Definite Functions on a Real Inner Product Space Strictly Positive Definite Functions on a Real Inner Product Space Allan Pinkus Abstract. If ft) = a kt k converges for all t IR with all coefficients a k 0, then the function f< x, y >) is positive definite

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

Convergence Rates of Kernel Quadrature Rules

Convergence Rates of Kernel Quadrature Rules Convergence Rates of Kernel Quadrature Rules Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE NIPS workshop on probabilistic integration - Dec. 2015 Outline Introduction

More information

Support Vector Machine Classification via Parameterless Robust Linear Programming

Support Vector Machine Classification via Parameterless Robust Linear Programming Support Vector Machine Classification via Parameterless Robust Linear Programming O. L. Mangasarian Abstract We show that the problem of minimizing the sum of arbitrary-norm real distances to misclassified

More information

1.3.1 Definition and Basic Properties of Convolution

1.3.1 Definition and Basic Properties of Convolution 1.3 Convolution 15 1.3 Convolution Since L 1 (R) is a Banach space, we know that it has many useful properties. In particular the operations of addition and scalar multiplication are continuous. However,

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Effective Dimension and Generalization of Kernel Learning

Effective Dimension and Generalization of Kernel Learning Effective Dimension and Generalization of Kernel Learning Tong Zhang IBM T.J. Watson Research Center Yorktown Heights, Y 10598 tzhang@watson.ibm.com Abstract We investigate the generalization performance

More information

Here is an example of a block diagonal matrix with Jordan Blocks on the diagonal: J

Here is an example of a block diagonal matrix with Jordan Blocks on the diagonal: J Class Notes 4: THE SPECTRAL RADIUS, NORM CONVERGENCE AND SOR. Math 639d Due Date: Feb. 7 (updated: February 5, 2018) In the first part of this week s reading, we will prove Theorem 2 of the previous class.

More information

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define HILBERT SPACES AND THE RADON-NIKODYM THEOREM STEVEN P. LALLEY 1. DEFINITIONS Definition 1. A real inner product space is a real vector space V together with a symmetric, bilinear, positive-definite mapping,

More information

Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 April 5, 2013 Due: April 19, 2013

Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 April 5, 2013 Due: April 19, 2013 Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 April 5, 2013 Due: April 19, 2013 A. Kernels 1. Let X be a finite set. Show that the kernel

More information

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence

More information

Worst-Case Bounds for Gaussian Process Models

Worst-Case Bounds for Gaussian Process Models Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis

More information

Topological vectorspaces

Topological vectorspaces (July 25, 2011) Topological vectorspaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ Natural non-fréchet spaces Topological vector spaces Quotients and linear maps More topological

More information

Estimates for probabilities of independent events and infinite series

Estimates for probabilities of independent events and infinite series Estimates for probabilities of independent events and infinite series Jürgen Grahl and Shahar evo September 9, 06 arxiv:609.0894v [math.pr] 8 Sep 06 Abstract This paper deals with finite or infinite sequences

More information

Machine Learning 2017

Machine Learning 2017 Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

More information

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar

More information

Rudiments of Ergodic Theory

Rudiments of Ergodic Theory Rudiments of Ergodic Theory Zefeng Chen September 24, 203 Abstract In this note we intend to present basic ergodic theory. We begin with the notion of a measure preserving transformation. We then define

More information

Kernels for Multi task Learning

Kernels for Multi task Learning Kernels for Multi task Learning Charles A Micchelli Department of Mathematics and Statistics State University of New York, The University at Albany 1400 Washington Avenue, Albany, NY, 12222, USA Massimiliano

More information

Outline. Motivation. Mapping the input space to the feature space Calculating the dot product in the feature space

Outline. Motivation. Mapping the input space to the feature space Calculating the dot product in the feature space to The The A s s in to Fabio A. González Ph.D. Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá April 2, 2009 to The The A s s in 1 Motivation Outline 2 The Mapping the

More information