Shannon Sampling II. Connections to Learning Theory


Shannon Sampling II: Connections to Learning Theory

Steve Smale
Toyota Technological Institute at Chicago
1427 East 60th Street, Chicago, IL 60637, USA
E-mail:

Ding-Xuan Zhou
Department of Mathematics, City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong, CHINA
E-mail:

July 2004

Abstract

We continue our study [12] of Shannon sampling and function reconstruction. In this paper, the error analysis is improved. Then the problem of function reconstruction is extended to a more general setting with frames beyond point evaluation. Then we show how our approach can be applied to learning theory: a functional analysis framework is presented; sharp, dimension independent probability estimates are given not only for the error in the $L^2$ spaces, but also for the error in the reproducing kernel Hilbert space where the learning algorithm is performed. Covering number arguments are replaced by estimates of integral operators.

Keywords and Phrases: Shannon sampling, function reconstruction, learning theory, reproducing kernel Hilbert space, frames.

1 Introduction

This paper considers regularization schemes associated with the least square loss and Hilbert spaces $H$ of continuous functions. Our target is to provide a unified approach for two topics: interpolation theory, or more generally, function reconstruction in Shannon sampling theory, with $H$ being a space of band-limited functions or functions with certain decay; and the regression problem in learning theory, with $H$ being a reproducing kernel Hilbert space $H_K$.

The first author is partially supported by an NSF grant. The second author is supported partially by the Research Grants Council of Hong Kong [Project No. CityU ] and by City University of Hong Kong [Project No. ].

First, we improve the probability estimates in [12] with a simplified development. Then we apply the technique for function reconstruction to learning theory. In particular, we show that a regression function $f_\rho$ can be approximated by a regularization scheme $f_{\mathbf z,\lambda}$ in $H_K$. Dimension independent exponential probability estimates are given for the error $\|f_{\mathbf z,\lambda}-f_\rho\|_K$. Our error bounds provide clues to the asymptotic choice of the regularization parameter $\gamma$ or $\lambda$.

2 Sampling Operator

Let $H$ be a Hilbert space of continuous functions on a complete metric space $X$ such that the inclusion $J: H \to C(X)$ is bounded with $\|J\| < \infty$. Then for each $x \in X$, the point evaluation functional $f \mapsto f(x)$ is bounded on $H$ with norm at most $\|J\|$. Hence there exists an element $E_x \in H$ such that
$$ f(x) = \langle f, E_x \rangle_H, \qquad \forall f \in H. \tag{2.1} $$
Let $\mathbf x$ be a discrete subset of $X$. Define the sampling operator $S_{\mathbf x}: H \to \ell^2(\mathbf x)$ by $S_{\mathbf x} f = (f(x))_{x \in \mathbf x}$. We shall always assume that $S_{\mathbf x}$ is bounded. Denote $S_{\mathbf x}^*$ as the adjoint of $S_{\mathbf x}$. Then for each $c \in \ell^2(\mathbf x)$, there holds
$$ \langle f, S_{\mathbf x}^* c\rangle_H = \langle S_{\mathbf x} f, c\rangle_{\ell^2(\mathbf x)} = \sum_{x\in\mathbf x} c_x f(x) = \Big\langle f, \sum_{x\in\mathbf x} c_x E_x\Big\rangle_H, \qquad \forall f\in H. $$
It follows that
$$ S_{\mathbf x}^* c = \sum_{x\in\mathbf x} c_x E_x, \qquad c\in\ell^2(\mathbf x). $$
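The sampling operator and its adjoint become concrete when $H$ is a reproducing kernel Hilbert space, so that $E_x = K_x$ (the setting of Section 7). The following is a minimal numerical sketch of the duality $\langle S_{\mathbf x}f, c\rangle = \langle f, S_{\mathbf x}^*c\rangle_H$; the Gaussian kernel, the sample points, and the test function are assumptions for illustration, not part of the paper.

```python
import numpy as np

# Assumed setting: H is the RKHS of a Gaussian kernel on X = [0, 1],
# so the representer of point evaluation is E_x = K_x = K(x, .).
def K(s, t, width=0.2):
    return np.exp(-(s - t) ** 2 / (2 * width ** 2))

x = np.linspace(0.0, 1.0, 9)          # the discrete sampling set x

# S_x acts on f in H by S_x f = (f(x))_{x in x}.
def sample(f):
    return np.array([f(t) for t in x])

# S_x* acts on c in l^2(x) by S_x* c = sum_x c_x E_x, a function on X.
def adjoint(c):
    return lambda t: sum(ci * K(xi, t) for ci, xi in zip(c, x))

# The duality <S_x f, c> = <f, S_x* c>_H is easy to verify when f itself
# is a finite kernel expansion f = sum_j a_j K_{u_j}, since then
# <f, S_x* c>_H = sum_{j, x} a_j c_x K(u_j, x) by the reproducing property.
u = np.array([0.15, 0.5, 0.85])
a = np.array([1.0, -0.5, 2.0])
f = lambda t: sum(aj * K(uj, t) for aj, uj in zip(a, u))

c = np.array([0.3] * len(x))
lhs = sample(f) @ c                                   # <S_x f, c>_{l^2(x)}
rhs = sum(a[j] * c[i] * K(u[j], x[i])
          for j in range(len(u)) for i in range(len(x)))  # <f, S_x* c>_H
print(abs(lhs - rhs))                                 # ~ 0 up to rounding
```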

3 Algorithm

To allow noise, we make the following assumption.

Special Assumption. The sampled values $y=(y_x)_{x\in\mathbf x}$ have the form: for some $f^*\in H$ and each $x\in\mathbf x$,
$$ y_x = f^*(x) + \eta_x, \qquad \text{where } \eta_x \text{ is drawn from } \rho_x. \tag{3.1} $$
Here for each $x\in X$, $\rho_x$ is a probability measure with zero mean, and its variance $\sigma_x^2$ satisfies $\sigma^2 := \sum_{x\in\mathbf x}\sigma_x^2 < \infty$.

Note that $\sum_{x\in\mathbf x}|f^*(x)|^2 = \|S_{\mathbf x} f^*\|^2_{\ell^2(\mathbf x)} \le \|S_{\mathbf x}\|^2\|f^*\|^2_H < \infty$.

The Markov inequality for a nonnegative random variable $\xi$ asserts that
$$ \operatorname{Prob}\Big\{\xi \ge \frac{E\xi}{\delta}\Big\} \le \delta, \qquad 0<\delta<1. \tag{3.2} $$
It tells us that for every $\varepsilon>0$,
$$ \operatorname{Prob}\big\{\|(\eta_x)\|_{\ell^2(\mathbf x)} > \varepsilon\big\} \le \frac{E\|(\eta_x)\|^2_{\ell^2(\mathbf x)}}{\varepsilon^2} = \frac{\sigma^2}{\varepsilon^2}. $$
By taking $\varepsilon\to\infty$, we see that $(\eta_x)\in\ell^2(\mathbf x)$, and hence $y\in\ell^2(\mathbf x)$, in probability.

Let $\gamma\ge0$. With the sample $\mathbf z := (x, y_x)_{x\in\mathbf x}$, consider the algorithm
$$ \text{(Function reconstruction)}\qquad f_{\mathbf z,\gamma} := \arg\min_{f\in H}\Big\{\sum_{x\in\mathbf x}\big(f(x)-y_x\big)^2 + \gamma\|f\|^2_H\Big\}. \tag{3.3} $$

Theorem 1. If $S_{\mathbf x}^* S_{\mathbf x}+\gamma I$ is invertible, then $f_{\mathbf z,\gamma}$ exists, is unique, and
$$ f_{\mathbf z,\gamma} = Ly, \qquad L := \big(S_{\mathbf x}^* S_{\mathbf x}+\gamma I\big)^{-1} S_{\mathbf x}^*. $$

Proof. Denote $E_{\mathbf z}(f) := \sum_{x\in\mathbf x}(f(x)-y_x)^2$. Since $\sum_{x\in\mathbf x}(f(x))^2 = \|S_{\mathbf x} f\|^2_{\ell^2(\mathbf x)} = \langle S_{\mathbf x}^*S_{\mathbf x} f, f\rangle_H$, we know that for $f\in H$,
$$ E_{\mathbf z}(f) + \gamma\|f\|^2_H = \big\langle(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)f,\, f\big\rangle_H - 2\langle S_{\mathbf x}^* y, f\rangle_H + \|y\|^2_{\ell^2(\mathbf x)}. $$
Taking the functional derivative [10] for $f\in H$, we see that any minimizer $f_{\mathbf z,\gamma}$ of (3.3) satisfies $(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)f_{\mathbf z,\gamma} = S_{\mathbf x}^* y$. This proves Theorem 1.

The invertibility of the operator $S_{\mathbf x}^*S_{\mathbf x}+\gamma I$ is valid for rich data.
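Theorem 1 reduces the variational problem (3.3) to a linear system: writing $f_{\mathbf z,\gamma}=\sum_{x\in\mathbf x}c_xE_x$ and using $\langle E_x,E_{x'}\rangle_H=E_{x'}(x)$, the normal equation $(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)f_{\mathbf z,\gamma}=S_{\mathbf x}^*y$ becomes $(G+\gamma I)c=y$ with Gram matrix $G=(E_{x'}(x))_{x,x'}$. Below is a hedged sketch under the assumption that $H$ is a Gaussian-kernel RKHS (so $E_x=K_x$); the target function, noise level, and $\gamma$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def K(s, t, width=0.2):
    return np.exp(-(s - t) ** 2 / (2 * width ** 2))

# Samples y_x = f*(x) + eta_x at a discrete set x (Special Assumption).
x = np.linspace(0.0, 1.0, 25)
f_star = lambda t: np.sin(2 * np.pi * t)
y = f_star(x) + 0.1 * rng.standard_normal(x.size)

gamma = 1e-2
G = K(x[:, None], x[None, :])            # Gram matrix <E_x, E_x'>_H = K(x, x')
c = np.linalg.solve(G + gamma * np.eye(x.size), y)

# The reconstruction f_{z,gamma}(t) = sum_x c_x K(x, t), per Theorem 1.
f_rec = lambda t: K(x, t) @ c

t = np.linspace(0.0, 1.0, 200)
err = np.max(np.abs(np.array([f_rec(ti) for ti in t]) - f_star(t)))
print("sup norm error on the grid:", err)
```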

Definition 1. We say that $\mathbf x$ provides rich data with respect to $H$ if
$$ \lambda_{\mathbf x} := \inf_{f\in H}\|S_{\mathbf x} f\|^2_{\ell^2(\mathbf x)}\big/\|f\|^2_H \tag{3.4} $$
is positive. It provides poor data if $\lambda_{\mathbf x}=0$.

The problem of function reconstruction here is to estimate the error $\|f_{\mathbf z,\gamma}-f^*\|_H$. In this paper we shall show in Corollary 2 below that in the rich data case, with $\gamma=0$, for every $0<\delta<1$, with probability $1-\delta$, there holds
$$ \|f_{\mathbf z,\gamma}-f^*\|_H \le \frac{\|J\|\sqrt{\sigma^2/\delta}}{\lambda_{\mathbf x}}. \tag{3.5} $$
This estimate does not require the boundedness of the noise $\rho_x$. Moreover, under the stronger condition (see [12]) that $|\eta_x|\le M$ for each $x\in\mathbf x$, we shall use the McDiarmid inequality and prove in Theorem 5 below that for every $0<\delta<1$, with probability $1-\delta$,
$$ \|f_{\mathbf z,\gamma}-f^*\|_H \le \frac{\|J\|}{\lambda_{\mathbf x}}\Big(\sigma + \sqrt{8\sigma^2\log\tfrac1\delta} + \tfrac{4M}{3}\log\tfrac1\delta\Big). \tag{3.6} $$
The two estimates, (3.5) and (3.6), improve the bounds in [12]. It turns out that Theorem 4 in [12] is a consequence of the remark which follows it about the Markov inequality. Conversations with David McAllester were important to clarify this point.
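Definition 1 can be computed exactly when $H$ is finite dimensional: if $\{\varphi_1,\dots,\varphi_n\}$ is an orthonormal basis of $H$ and $M$ is the evaluation matrix $M_{x,j}=\varphi_j(x)$, then $\|S_{\mathbf x}f\|^2/\|f\|^2_H$ equals $\|Ma\|^2/\|a\|^2$ for $f=\sum_j a_j\varphi_j$, so $\lambda_{\mathbf x}$ is the smallest eigenvalue of $M^\top M$. A small sketch follows; the trigonometric space and the sample sets are assumptions chosen only to contrast rich and poor data.

```python
import numpy as np

# Assumed H: real trigonometric polynomials of degree <= N on [0, 2*pi),
# with the orthonormal-basis coefficients carrying the H norm.
N = 3

def eval_matrix(points):
    # Columns: 1, cos(kt), sin(kt) for k = 1..N, evaluated at the sample points.
    cols = [np.ones_like(points)]
    for k in range(1, N + 1):
        cols.append(np.cos(k * points))
        cols.append(np.sin(k * points))
    return np.stack(cols, axis=1)

def richness(points):
    M = eval_matrix(points)
    return np.linalg.eigvalsh(M.T @ M)[0]   # lambda_x = smallest eigenvalue

rich = np.linspace(0, 2 * np.pi, 20, endpoint=False)   # more points than dim(H)
poor = np.linspace(0, 2 * np.pi, 4, endpoint=False)    # fewer points than dim(H)
print("rich data lambda_x:", richness(rich))            # strictly positive
print("poor data lambda_x:", richness(poor))            # ~ 0 (rank deficient)

# Proposition 1 in this finite-dimensional picture: the operator norm of
# (S_x* S_x + gamma I)^{-1} is at most 1 / (lambda_x + gamma); here it is
# in fact an equality.
gamma = 0.1
M = eval_matrix(rich)
inv_norm = np.linalg.norm(np.linalg.inv(M.T @ M + gamma * np.eye(2 * N + 1)), 2)
print(inv_norm, "<=", 1.0 / (richness(rich) + gamma))
```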

4 Sample Error

Define
$$ f_{\mathbf x,\gamma} := L S_{\mathbf x} f^*. \tag{4.1} $$
The sample error takes the form $\|f_{\mathbf z,\gamma}-f_{\mathbf x,\gamma}\|_H$.

Theorem 2. If $S_{\mathbf x}^*S_{\mathbf x}+\gamma I$ is invertible and the Special Assumption holds, then for every $0<\delta<1$, with probability $1-\delta$, there holds
$$ \|f_{\mathbf z,\gamma}-f_{\mathbf x,\gamma}\|_H \le \big\|(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)^{-1}\big\|\,\|J\|\sqrt{\frac{\sigma^2}{\delta}}. $$
If $|\eta_x|\le M$ for some $M\ge0$ and each $x\in\mathbf x$, then for every $\varepsilon>0$, we have
$$ \operatorname{Prob}_y\Big\{\|f_{\mathbf z,\gamma}-f_{\mathbf x,\gamma}\|_H \le \|L\|\,\sigma\sqrt{1+\varepsilon}\Big\} \ge 1-\exp\Big\{-\frac{\varepsilon\sigma^2}{2M^2}\log(1+\varepsilon)\Big\}. $$

Proof. Write $\|f_{\mathbf z,\gamma}-f_{\mathbf x,\gamma}\|_H = \|L(y - S_{\mathbf x}f^*)\|_H \le \|(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)^{-1}\|\,\|S_{\mathbf x}^* y - S_{\mathbf x}^*S_{\mathbf x} f^*\|_H$. But
$$ S_{\mathbf x}^* y - S_{\mathbf x}^*S_{\mathbf x} f^* = \sum_{x\in\mathbf x}\big(y_x - f^*(x)\big)E_x. $$
Hence
$$ \|S_{\mathbf x}^* y - S_{\mathbf x}^*S_{\mathbf x} f^*\|^2_H = \sum_{x\in\mathbf x}\sum_{x'\in\mathbf x}\big(y_x-f^*(x)\big)\big(y_{x'}-f^*(x')\big)\langle E_x, E_{x'}\rangle_H. $$
By the independence of the samples and $E\{y_x-f^*(x)\}=0$, $E\{(y_x-f^*(x))^2\}=\sigma_x^2$, its expected value is
$$ E\|S_{\mathbf x}^* y - S_{\mathbf x}^*S_{\mathbf x} f^*\|^2_H = \sum_{x\in\mathbf x}\sigma_x^2\langle E_x, E_x\rangle_H. $$
Now $\langle E_x,E_x\rangle_H = \|E_x\|^2_H \le \|J\|^2$. Then the expected value of the squared sample error can be bounded as
$$ E\|f_{\mathbf z,\gamma}-f_{\mathbf x,\gamma}\|^2_H \le \big\|(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)^{-1}\big\|^2\|J\|^2\sigma^2. $$
The first desired probability estimate follows from the Markov inequality (3.2).

For the second estimate, we apply Theorem 3 from [12] with $w\equiv1$ to the random variables $\{\eta_x\}_{x\in\mathbf x}$. The Special Assumption tells us that $E\eta_x=0$, which implies $E\eta_x^2=\sigma_x^2$. Then we see that for every $\varepsilon>0$,
$$ \operatorname{Prob}_y\Big\{\sum_{x\in\mathbf x}\big(\eta_x^2-\sigma_x^2\big) > \varepsilon\Big\} \le \exp\Big\{-\frac{\varepsilon}{2M^2}\log\Big(1+\frac{M^2\varepsilon}{\sum_{x\in\mathbf x}\sigma^2(\eta_x^2)}\Big)\Big\}. $$
Here we have used the condition $|\eta_x|\le M$, which implies $|\eta_x^2-\sigma_x^2|\le M^2$ and $\sum_{x\in\mathbf x}\sigma^2(\eta_x^2)\le\sum_{x\in\mathbf x}E\eta_x^4\le M^2\sigma^2<\infty$. The desired bound then follows from $\|f_{\mathbf z,\gamma}-f_{\mathbf x,\gamma}\|_H\le\|L\|\,\|(\eta_x)\|_{\ell^2(\mathbf x)}$.

Remark. When $\mathbf x$ contains $m$ elements, we can take $\sigma^2\le mM^2<\infty$.

Proposition 1. The sampling operator $S_{\mathbf x}$ satisfies
$$ \big\|(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)^{-1}\big\| \le \frac{1}{\lambda_{\mathbf x}+\gamma}. $$
For the operator $L$, we have
$$ \|L\| \le \frac{\|S_{\mathbf x}\|}{\lambda_{\mathbf x}+\gamma}. $$

Proof. Let $v\in H$ and $u=(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)^{-1}v$. Then $(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)u=v$. Taking inner products on both sides with $u$, we have
$$ \langle S_{\mathbf x}u, S_{\mathbf x}u\rangle_{\ell^2(\mathbf x)} + \gamma\|u\|^2_H = \langle v,u\rangle_H \le \|v\|_H\|u\|_H. $$
The definition of the richness $\lambda_{\mathbf x}$ tells us that $\langle S_{\mathbf x}u,S_{\mathbf x}u\rangle_{\ell^2(\mathbf x)} = \|S_{\mathbf x}u\|^2_{\ell^2(\mathbf x)} \ge \lambda_{\mathbf x}\|u\|^2_H$. It follows that
$$ (\lambda_{\mathbf x}+\gamma)\|u\|^2_H \le \|v\|_H\|u\|_H. $$
Hence $\|u\|_H \le (\lambda_{\mathbf x}+\gamma)^{-1}\|v\|_H$. This is true for every $v\in H$. Then the bound for the first operator follows. The second inequality is trivial.

Corollary 1. If $S_{\mathbf x}^*S_{\mathbf x}+\gamma I$ is invertible and the Special Assumption holds, then for every $0<\delta<1$, with probability $1-\delta$, there holds
$$ \|f_{\mathbf z,\gamma}-f_{\mathbf x,\gamma}\|_H \le \frac{\|J\|}{\lambda_{\mathbf x}+\gamma}\sqrt{\frac{\sigma^2}{\delta}}. $$

5 Integration Error

Recall that $f_{\mathbf x,\gamma} = LS_{\mathbf x}f^* = (S_{\mathbf x}^*S_{\mathbf x}+\gamma I)^{-1}S_{\mathbf x}^*S_{\mathbf x}f^*$. Then
$$ f_{\mathbf x,\gamma} = (S_{\mathbf x}^*S_{\mathbf x}+\gamma I)^{-1}\big((S_{\mathbf x}^*S_{\mathbf x}+\gamma I)-\gamma I\big)f^* = f^* - \gamma(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)^{-1}f^*. \tag{5.1} $$
This in connection with Proposition 1 proves the following proposition.

Proposition 2. If $S_{\mathbf x}^*S_{\mathbf x}+\gamma I$ is invertible, then
$$ \|f_{\mathbf x,\gamma}-f^*\|_H \le \frac{\gamma\|f^*\|_H}{\lambda_{\mathbf x}+\gamma}. $$

Corollary 2. If $\lambda_{\mathbf x}>0$ and $\gamma=0$, then for every $0<\delta<1$, with probability $1-\delta$, (3.5) holds:
$$ \|f_{\mathbf z,\gamma}-f^*\|_H \le \frac{\|J\|\sqrt{\sigma^2/\delta}}{\lambda_{\mathbf x}}. $$

For the poor data case $\lambda_{\mathbf x}=0$, we need to estimate $\gamma\|(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)^{-1}f^*\|_H$ according to (5.1). Recall that for a positive self-adjoint linear operator $L$ on a Hilbert space $H$, there holds
$$ \big\|\gamma(L+\gamma I)^{-1}f\big\|_H = \big\|\gamma(L+\gamma I)^{-1}(f-Lg)+\gamma(L+\gamma I)^{-1}Lg\big\|_H \le \|f-Lg\|_H + \gamma\|g\|_H $$
for every $g\in H$. Taking the infimum over $g\in H$, we have
$$ \big\|\gamma(L+\gamma I)^{-1}f\big\|_H \le K(f,\gamma) := \inf_{g\in H}\big\{\|f-Lg\|_H+\gamma\|g\|_H\big\}, \qquad f\in H,\ \gamma>0. \tag{5.2} $$
This is the K-functional between $H$ and the range of $L$. Thus, when the range of $L$ is dense in $H$, we have $\lim_{\gamma\to0}\|\gamma(L+\gamma I)^{-1}f\|_H=0$ for every $f\in H$. If $f$ is in the range of $L^r$ for some $0<r\le1$, then
$$ \big\|\gamma(L+\gamma I)^{-1}f\big\|_H \le \|L^{-r}f\|_H\,\gamma^r. $$
See [11].

Using (5.2) for $L=S_{\mathbf x}^*S_{\mathbf x}$, we can use a K-functional between $H$ and the range of $S_{\mathbf x}^*S_{\mathbf x}$ to get the convergence rate.

Proposition 3. Define $f_\gamma$ as
$$ f_\gamma := \arg\inf_{g\in H}\big\{\|f^*-S_{\mathbf x}^*S_{\mathbf x}g\|_H+\gamma\|g\|_H\big\}, \qquad \gamma>0; $$
then there holds
$$ \|f_{\mathbf x,\gamma}-f^*\|_H \le \|f^*-S_{\mathbf x}^*S_{\mathbf x}f_\gamma\|_H + \gamma\|f_\gamma\|_H. $$
In particular, if $f^*$ lies in the closure of the range of $S_{\mathbf x}^*S_{\mathbf x}$, then $\lim_{\gamma\to0}\|f_{\mathbf x,\gamma}-f^*\|_H=0$. If $f^*$ is in the range of $(S_{\mathbf x}^*S_{\mathbf x})^r$ for some $0<r\le1$, then
$$ \|f_{\mathbf x,\gamma}-f^*\|_H \le \big\|(S_{\mathbf x}^*S_{\mathbf x})^{-r}f^*\big\|_H\,\gamma^r. $$
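The bound $\|\gamma(L+\gamma I)^{-1}f\|_H\le\|L^{-r}f\|_H\,\gamma^r$ behind Proposition 3 is easy to check numerically when a finite positive matrix plays the role of $L$. The matrix, the exponent $r$, and the vector $g$ below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# A positive self-adjoint "operator" L on a finite-dimensional H.
A = rng.standard_normal((8, 8))
L = A @ A.T + 1e-3 * np.eye(8)

r = 0.5
g = rng.standard_normal(8)
# Put f in the range of L^r:  f = L^r g, so ||L^{-r} f|| = ||g||.
w, V = np.linalg.eigh(L)
L_r = V @ np.diag(w ** r) @ V.T
f = L_r @ g

for gamma in [1e-1, 1e-2, 1e-3, 1e-4]:
    lhs = np.linalg.norm(gamma * np.linalg.solve(L + gamma * np.eye(8), f))
    rhs = np.linalg.norm(g) * gamma ** r
    print(f"gamma={gamma:.0e}  gamma*||(L+gamma I)^-1 f|| = {lhs:.3e}  bound = {rhs:.3e}")
```

In the eigenbasis of $L$ the left-hand side has coefficients $\gamma\lambda_i^r g_i/(\lambda_i+\gamma)$, each of modulus at most $\gamma^r|g_i|$, which is exactly the decay the loop exhibits as $\gamma\to0$.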

Compared with Corollary 2, Proposition 3 in connection with Corollary 1 gives an error estimate for the poor data case when $f^*$ is in the range of $(S_{\mathbf x}^*S_{\mathbf x})^r$: for every $0<\delta<1$, with probability $1-\delta$, there holds
$$ \|f_{\mathbf z,\gamma}-f^*\|_H \le \frac{\|J\|}{\gamma}\sqrt{\frac{\sigma^2}{\delta}} + \big\|(S_{\mathbf x}^*S_{\mathbf x})^{-r}f^*\big\|_H\,\gamma^r. $$

6 More General Setting of Function Reconstruction

From (2.1) we see that the boundedness of $S_{\mathbf x}$ is equivalent to the Bessel sequence property of the family $\{E_x\}_{x\in\mathbf x}$ of elements in $H$, i.e., there is a positive constant $B$ such that
$$ \sum_{x\in\mathbf x}\big|\langle f,E_x\rangle_H\big|^2 \le B\|f\|^2_H, \qquad \forall f\in H. \tag{6.1} $$
Moreover, $\mathbf x$ provides rich data if and only if this family forms a frame of $H$, i.e., there are two positive constants $A\le B$ (called frame bounds) such that
$$ A\|f\|^2_H \le \sum_{x\in\mathbf x}\big|\langle f,E_x\rangle_H\big|^2 \le B\|f\|^2_H, \qquad \forall f\in H. $$
In this case, the operator $S_{\mathbf x}^*S_{\mathbf x}$ is called the frame operator. Its inverse is usually difficult to compute, but it satisfies the reconstruction property:
$$ f = \sum_{x\in\mathbf x}\big\langle f,(S_{\mathbf x}^*S_{\mathbf x})^{-1}E_x\big\rangle_H\,E_x, \qquad \forall f\in H. $$
For these basic facts about frames, see [17].

The function reconstruction algorithm studied in the previous sections can be generalized to a setting with a Bessel sequence $\{E_x\}_{x\in\mathbf x}$ in $H$ satisfying (6.1). Here the point evaluation (2.1) is replaced by the functional $\langle f,E_x\rangle_H$ and the algorithm becomes
$$ f_{\mathbf z,\gamma} := \arg\min_{f\in H}\Big\{\sum_{x\in\mathbf x}\big(\langle f,E_x\rangle_H-y_x\big)^2 + \gamma\|f\|^2_H\Big\}. \tag{6.2} $$
The sample values are given by $y_x=\langle f^*,E_x\rangle_H+\eta_x$. If we replace the sampling operator $S_{\mathbf x}$ by the operator from $H$ to $\ell^2(\mathbf x)$ mapping $f$ to $(\langle f,E_x\rangle_H)_{x\in\mathbf x}$, then the algorithm can be analyzed in the same way as above and all the error bounds hold true.

Concrete examples for this generalized setting can be found in the literature of image processing, inverse problems [6] and sampling theory [1]: the Fredholm integral equation of the first kind, the moment problem, and the function reconstruction from weighted averages.

One can even consider more general function reconstruction schemes: replacing the least-square loss in (6.2) by some other loss function and $\|\cdot\|_H$ by some other norm. For example, if we choose Vapnik's $\epsilon$-insensitive loss $|t|_\epsilon := \max\{|t|-\epsilon,\,0\}$, and a function space $H_2$ included in $H$ such as a Sobolev space in $L^2$, then a function reconstruction scheme becomes
$$ f_{\mathbf z,\gamma} := \arg\min_{f\in H_2}\Big\{\sum_{x\in\mathbf x}\big|\langle f,E_x\rangle_H-y_x\big|_\epsilon + \gamma\|f\|^2_{H_2}\Big\}. $$

The rich data requirement is reasonable for function reconstruction such as sampling theory [13]. On the other hand, in learning theory, the situation of poor data (or poor frame bounds: $A\to0$ as the number of points in $\mathbf x$ increases) often happens. For such situations, we take $\mathbf x$ to be random samples of some probability distribution.

7 Learning Theory

From now on we assume that $X$ is compact. Let $\rho$ be a probability measure on $Z:=X\times Y$ with $Y:=\mathbb R$. The error for a function $f:X\to Y$ is given by
$$ \mathcal E(f) := \int_Z\big(f(x)-y\big)^2\,d\rho. $$
The function minimizing the error is called the regression function and is given by
$$ f_\rho(x) := \int_Y y\,d\rho(y|x), \qquad x\in X. $$
Here $\rho(y|x)$ is the conditional distribution at $x$ induced by $\rho$. The marginal distribution on $X$ is denoted as $\rho_X$. We assume that $f_\rho\in L^2_{\rho_X}$. Denote $\|f\|_\rho:=\|f\|_{L^2_{\rho_X}}$ and $\sigma^2_\rho$ as the variance of $\rho$.

The purpose of the regression problem in learning theory [3, 7, 9, 14, 15] is to find good approximations of the regression function from a set of random samples $\mathbf z:=\{(x_i,y_i)\}_{i=1}^m$ drawn independently according to $\rho$. This purpose is achieved in Corollaries 3, 4, and 5 below.

Here we consider kernel based learning algorithms. Let $K:X\times X\to\mathbb R$ be continuous, symmetric and positive semidefinite, i.e., for any finite set of distinct points $\{x_1,\dots,x_l\}\subset X$, the matrix $\big(K(x_i,x_j)\big)_{i,j=1}^l$ is positive semidefinite. Such a kernel is called a Mercer kernel.
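A quick way to see the positive semidefiniteness required of a Mercer kernel is to form the kernel matrix on an arbitrary finite point set and inspect its spectrum. The Gaussian kernel used below is a standard example of a Mercer kernel; the point set and width are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_kernel(X, width=0.5):
    # K(x, t) = exp(-|x - t|^2 / (2 width^2)) on R^d, a Mercer kernel.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * width ** 2))

X = rng.uniform(size=(30, 2))               # 30 arbitrary points in [0, 1]^2
Kmat = gaussian_kernel(X)
eigs = np.linalg.eigvalsh(Kmat)
print("smallest eigenvalue:", eigs.min())    # >= 0 up to rounding error
print("kappa = sup sqrt(K(x, x)):", np.sqrt(np.max(np.diag(Kmat))))  # = 1 here
```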

The Reproducing Kernel Hilbert Space (RKHS) $H_K$ associated with the kernel $K$ is defined to be the closure [2] of the linear span of the set of functions $\{K_x:=K(x,\cdot):x\in X\}$, with the inner product $\langle\cdot,\cdot\rangle_K$ satisfying $\langle K_x,K_y\rangle_K=K(x,y)$. The reproducing property takes the form
$$ \langle K_x,g\rangle_K = g(x), \qquad \forall x\in X,\ g\in H_K. \tag{7.1} $$
The learning algorithm we study here is a regularized one:
$$ \text{(Learning Scheme)}\qquad f_{\mathbf z,\lambda} := \arg\min_{f\in H_K}\Big\{\frac1m\sum_{i=1}^m\big(f(x_i)-y_i\big)^2 + \lambda\|f\|^2_K\Big\}. \tag{7.2} $$
We shall investigate how $f_{\mathbf z,\lambda}$ approximates $f_\rho$ and how the choice of the regularization parameter $\lambda$ leads to optimal convergence rates. The convergence in $L^2_{\rho_X}$ has been considered in [4, 5, 18]. The purpose of this section is to present a simple functional analysis approach, and to provide convergence rates in the space $H_K$ as well as sharper, dimension independent probability estimates in $L^2_{\rho_X}$.

The reproducing kernel property (7.1) tells us that the minimizer of (7.2) lies in $H_{K,\mathbf z}:=\operatorname{span}\{K_{x_i}\}_{i=1}^m$ (by projection onto this subspace). Thus, the algorithm can be written in the same way as (3.3). To see this, we denote $\mathbf x=\{x_i\}_{i=1}^m$, and for $x\in X$ let $\rho_x$ be the distribution of $y-f_\rho(x)$ with $y$ drawn from $\rho(\cdot|x)$. Then $E_x=K_x$ for $x\in\mathbf x$, the Special Assumption holds, and (3.1) is true except that $f^*\in H$ is replaced by $f^*=f_\rho$. Denote $y=(y_i)_{i=1}^m$. The learning scheme (7.2) becomes
$$ f_{\mathbf z,\lambda} := \arg\min_{f\in H_{K,\mathbf z}}\Big\{\sum_{x\in\mathbf x}\big(f(x)-y_x\big)^2 + \gamma\|f\|^2_K\Big\}, \qquad \gamma=m\lambda. $$
Therefore, Theorem 1 still holds and we have
$$ f_{\mathbf z,\lambda} = \big(S_{\mathbf x}^*S_{\mathbf x}+m\lambda I\big)^{-1}S_{\mathbf x}^*y. $$
This implies the expression (see, e.g., [3]) that $f_{\mathbf z,\lambda}=\sum_{i=1}^m c_iK_{x_i}$ with $c=(c_i)_{i=1}^m$ satisfying
$$ \Big(\big(K(x_i,x_j)\big)_{i,j=1}^m + m\lambda I\Big)c = y. $$
Denote $\kappa:=\sup_{x\in X}\sqrt{K(x,x)}$ and $f_\rho(\mathbf x):=(f_\rho(x))_{x\in\mathbf x}$. Define
$$ f_{\mathbf x,\lambda} := \big(S_{\mathbf x}^*S_{\mathbf x}+m\lambda I\big)^{-1}S_{\mathbf x}^*f_\rho(\mathbf x). $$
Observe that $S_{\mathbf x}^*:\mathbb R^m\to H_{K,\mathbf z}$ is given by $S_{\mathbf x}^*c=\sum_{i=1}^m c_iK_{x_i}$.
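In matrix form, the learning scheme (7.2) therefore only requires solving $\big((K(x_i,x_j))_{i,j}+m\lambda I\big)c=y$ and setting $f_{\mathbf z,\lambda}=\sum_i c_iK_{x_i}$. The sketch below draws $(x_i,y_i)$ from an assumed model, solves the system, and estimates $\|f_{\mathbf z,\lambda}-f_\rho\|_\rho$ by Monte Carlo; the kernel, the regression function, and $\lambda$ are illustrative assumptions, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def K(s, t, width=0.15):
    return np.exp(-(s - t) ** 2 / (2 * width ** 2))

# Assumed model: rho_X uniform on [0, 1], y = f_rho(x) + bounded noise.
f_rho = lambda t: np.sin(2 * np.pi * t)
m, lam = 200, 1e-3
x = rng.uniform(size=m)
y = f_rho(x) + 0.2 * rng.uniform(-1, 1, size=m)

Kmat = K(x[:, None], x[None, :])
c = np.linalg.solve(Kmat + m * lam * np.eye(m), y)       # (K + m*lambda*I) c = y

f_z = lambda t: K(t[:, None], x[None, :]) @ c             # f_{z,lambda} = sum_i c_i K_{x_i}

# Monte Carlo estimate of the L^2(rho_X) error and the RKHS norm of f_{z,lambda}.
t = rng.uniform(size=20000)
l2_err = np.sqrt(np.mean((f_z(t) - f_rho(t)) ** 2))
rkhs_norm = np.sqrt(c @ Kmat @ c)                         # ||f_{z,lambda}||_K^2 = c^T K c
print("||f_z - f_rho||_rho ~", l2_err, "   ||f_z||_K =", rkhs_norm)
```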

Then $\frac1m S_{\mathbf x}^*S_{\mathbf x}$ satisfies
$$ \frac1m S_{\mathbf x}^*S_{\mathbf x}f = \frac1m\sum_{x\in\mathbf x}f(x)K_x = L_{K,\mathbf x}(S_{\mathbf x}f), \qquad f\in H_{K,\mathbf z}, $$
where $L_{K,\mathbf x}:\ell^2(\mathbf x)\to H_K$ is defined as $L_{K,\mathbf x}c:=\frac1m\sum_{i=1}^m c_iK_{x_i}$. It is a good approximation of the integral operator $L_K:L^2_{\rho_X}\to H_K$ defined by
$$ L_Kf(x) := \int_X K(x,y)f(y)\,d\rho_X(y), \qquad x\in X. $$
The operator $L_K$ can also be defined as a self-adjoint operator on $H_K$ or on $L^2_{\rho_X}$. We shall use the same notion $L_K$ for these operators defined on different domains. As operators on $H_K$, $L_{K,\mathbf x}S_{\mathbf x}$ approximates $L_K$ well. In fact, it was shown in [5] that $E\|L_{K,\mathbf x}S_{\mathbf x}-L_K\|_{H_K\to H_K}\le\kappa^2/\sqrt m$. To get sharper error bounds, we need to get estimates for the operators with domain $L^2_{\rho_X}$.

Lemma 1. Let $\mathbf x\in X^m$ be randomly drawn according to $\rho_X$. Then for any $f\in L^2_{\rho_X}$,
$$ E\big\|L_{K,\mathbf x}f(\mathbf x)-L_Kf\big\|_K = E\Big\|\frac1m\sum_{i=1}^m f(x_i)K_{x_i}-L_Kf\Big\|_K \le \frac{\kappa\|f\|_\rho}{\sqrt m}. $$

Proof. Define $\xi$ to be the $H_K$-valued random variable $\xi:=f(x)K_x$ on $(X,\rho_X)$. Then $\frac1m\sum_{i=1}^m f(x_i)K_{x_i}-L_Kf=\frac1m\sum_{i=1}^m\big(\xi(x_i)-E\xi\big)$. We know that
$$ \Big\{E\Big\|\frac1m\sum_{i=1}^m\big(\xi(x_i)-E\xi\big)\Big\|_K\Big\}^2 \le E\Big\|\frac1m\sum_{i=1}^m\big(\xi(x_i)-E\xi\big)\Big\|^2_K = \frac1m\big(E\|\xi\|^2_K-\|E\xi\|^2_K\big), $$
which is bounded by $\kappa^2\|f\|^2_\rho/m$.

The function $f_{\mathbf x,\lambda}$ may be considered as an approximation of $f_\lambda$, where $f_\lambda$ is defined by
$$ f_\lambda := (L_K+\lambda I)^{-1}L_Kf_\rho. \tag{7.3} $$
In fact, $f_\lambda$ is a minimizer of the optimization problem:
$$ f_\lambda := \arg\min_{f\in H_K}\big\{\|f-f_\rho\|^2_\rho+\lambda\|f\|^2_K\big\} = \arg\min_{f\in H_K}\big\{\mathcal E(f)-\mathcal E(f_\rho)+\lambda\|f\|^2_K\big\}. \tag{7.4} $$
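Lemma 1 controls the $H_K$ distance between the empirical object $\frac1m\sum_i f(x_i)K_{x_i}$ and $L_Kf$. Both are (or can be approximated by) finite kernel expansions, so the $H_K$ norm of their difference reduces to kernel sums. In the sketch below $L_Kf$ is itself approximated by a large independent quadrature sample, and the averaged distance is compared with $\kappa\|f\|_\rho/\sqrt m$; the kernel, the marginal distribution, and $f$ are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def K(s, t, width=0.3):
    return np.exp(-(s - t) ** 2 / (2 * width ** 2))

f = lambda t: np.cos(3 * t)          # an assumed test function
kappa = 1.0                          # sup_x sqrt(K(x, x)) for this kernel

def expansion_sq_norm(a_pts, a_wts, b_pts, b_wts):
    # ||sum_i a_i K_{p_i} - sum_k b_k K_{q_k}||_K^2 via kernel sums.
    pts = np.concatenate([a_pts, b_pts])
    wts = np.concatenate([a_wts, -b_wts])
    return wts @ K(pts[:, None], pts[None, :]) @ wts

m, N, trials = 50, 2000, 30
# rho_X: uniform on [0, 1]; L_K f is approximated by a large quadrature sample u.
u = rng.uniform(size=N)
dists = []
for _ in range(trials):
    x = rng.uniform(size=m)
    d2 = expansion_sq_norm(x, f(x) / m, u, f(u) / N)
    dists.append(np.sqrt(max(d2, 0.0)))

f_rho_norm = np.sqrt(np.mean(f(u) ** 2))     # ||f||_rho, Monte Carlo
print("average H_K distance :", np.mean(dists))
print("Lemma 1 bound         :", kappa * f_rho_norm / np.sqrt(m))
```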

Theorem 3. Let $\mathbf z$ be randomly drawn according to $\rho$. Then
$$ E_{\mathbf z\in Z^m}\|f_{\mathbf z,\lambda}-f_{\mathbf x,\lambda}\|_K \le \frac{\kappa\sigma_\rho}{\lambda\sqrt m} $$
and
$$ E_{\mathbf x\in X^m}\|f_{\mathbf x,\lambda}-f_\lambda\|_K \le \frac{3\kappa\|f_\rho\|_\rho}{\lambda\sqrt m}. $$

Proof. The same proof as that of Theorem 2 and Proposition 1 shows that
$$ E_y\|f_{\mathbf z,\lambda}-f_{\mathbf x,\lambda}\|_K \le \frac{\kappa\sqrt{\sum_{i=1}^m\sigma^2_{x_i}}}{\lambda_{\mathbf x}+m\lambda}. $$
But $E_{\mathbf x}\sum_{i=1}^m\sigma^2_{x_i}=m\sigma^2_\rho$. Then the first statement follows.

To see the second statement we write $f_{\mathbf x,\lambda}-f_\lambda$ as $(f_{\mathbf x,\lambda}-\bar f_\lambda)+(\bar f_\lambda-f_\lambda)$, where $\bar f_\lambda$ is defined by
$$ \bar f_\lambda := \big(L_{K,\mathbf x}S_{\mathbf x}+\lambda I\big)^{-1}L_Kf_\rho. \tag{7.5} $$
Since
$$ f_{\mathbf x,\lambda}-\bar f_\lambda = \big(L_{K,\mathbf x}S_{\mathbf x}+\lambda I\big)^{-1}\Big(\frac1mS_{\mathbf x}^*f_\rho(\mathbf x)-L_Kf_\rho\Big), \tag{7.6} $$
Lemma 1 tells us that
$$ E\|f_{\mathbf x,\lambda}-\bar f_\lambda\|_K \le \frac1\lambda E\Big\|\frac1mS_{\mathbf x}^*f_\rho(\mathbf x)-L_Kf_\rho\Big\|_K \le \frac{\kappa\|f_\rho\|_\rho}{\lambda\sqrt m}. $$
To estimate $\bar f_\lambda-f_\lambda$, we write $L_Kf_\rho$ as $(L_K+\lambda I)f_\lambda$. Then
$$ \bar f_\lambda-f_\lambda = \big(L_{K,\mathbf x}S_{\mathbf x}+\lambda I\big)^{-1}(L_K+\lambda I)f_\lambda-f_\lambda = \big(L_{K,\mathbf x}S_{\mathbf x}+\lambda I\big)^{-1}\big(L_Kf_\lambda-L_{K,\mathbf x}S_{\mathbf x}f_\lambda\big). $$
Hence
$$ \|\bar f_\lambda-f_\lambda\|_K \le \frac1\lambda\|L_Kf_\lambda-L_{K,\mathbf x}S_{\mathbf x}f_\lambda\|_K. \tag{7.7} $$
Applying Lemma 1 again, we see that
$$ E\|\bar f_\lambda-f_\lambda\|_K \le \frac1\lambda E\|L_Kf_\lambda-L_{K,\mathbf x}S_{\mathbf x}f_\lambda\|_K \le \frac{\kappa\|f_\lambda\|_\rho}{\lambda\sqrt m}. $$
Note that $f_\lambda$ is a minimizer of (7.4). Taking $f=0$ yields $\|f_\lambda-f_\rho\|^2_\rho+\lambda\|f_\lambda\|^2_K\le\|f_\rho\|^2_\rho$. Hence
$$ \|f_\lambda\|_\rho \le 2\|f_\rho\|_\rho \quad\text{and}\quad \|f_\lambda\|_K \le \|f_\rho\|_\rho/\sqrt\lambda. \tag{7.8} $$
Therefore, our second estimate follows.

The last step is to estimate the approximation error $f_\lambda-f_\rho$.

Theorem 4. Define $f_\lambda$ by (7.3). If $L_K^{-r}f_\rho\in L^2_{\rho_X}$ for some $0<r\le1$, then
$$ \|f_\lambda-f_\rho\|_\rho \le \lambda^r\big\|L_K^{-r}f_\rho\big\|_\rho. \tag{7.9} $$
When $\tfrac12<r\le1$, we have
$$ \|f_\lambda-f_\rho\|_K \le \lambda^{r-\frac12}\big\|L_K^{-r}f_\rho\big\|_\rho. \tag{7.10} $$

We follow the same line as we did in [11]. Estimates similar to (7.9) can be found in [3, Theorem 3.1]: for a self-adjoint strictly positive compact operator $A$ on a Hilbert space $H$, there holds for $0<r<s$,
$$ \inf_{b\in H}\big\{\|b-a\|^2+\gamma\|A^{-s}b\|^2\big\} \le \gamma^{r/s}\|A^{-r}a\|^2. \tag{7.11} $$
A mistake was made in [3] when scaling from $s=1$ to general $s>0$: $r$ should be $<1$ in the general situation. A proof of (7.9) was given in [5]. Here we provide a complete proof because the idea is used for verifying (7.10).

Proof of Theorem 4. If $\{(\lambda_i,\psi_i)\}_{i\ge1}$ are the normalized eigenpairs of the integral operator $L_K:L^2_{\rho_X}\to L^2_{\rho_X}$, then $\|\sqrt{\lambda_i}\,\psi_i\|_K=1$ when $\lambda_i>0$. Write $f_\rho=L_K^rg$ for some $g=\sum_{i\ge1}d_i\psi_i$ with $\|(d_i)\|_{\ell^2}=\|g\|_\rho<\infty$. Then $f_\rho=\sum_{i\ge1}\lambda_i^rd_i\psi_i$ and by (7.3),
$$ f_\lambda-f_\rho = (L_K+\lambda I)^{-1}L_Kf_\rho-f_\rho = -\sum_{i\ge1}\frac{\lambda}{\lambda_i+\lambda}\lambda_i^rd_i\psi_i. $$
It follows that
$$ \|f_\lambda-f_\rho\|_\rho = \Big\{\sum_{i\ge1}\Big(\frac{\lambda}{\lambda_i+\lambda}\lambda_i^rd_i\Big)^2\Big\}^{1/2} = \lambda^r\Big\{\sum_{i\ge1}\Big[\Big(\frac{\lambda}{\lambda+\lambda_i}\Big)^{1-r}\Big(\frac{\lambda_i}{\lambda_i+\lambda}\Big)^{r}d_i\Big]^2\Big\}^{1/2}. $$
This is bounded by $\lambda^r\|(d_i)\|_{\ell^2}=\lambda^r\|g\|_\rho=\lambda^r\|L_K^{-r}f_\rho\|_\rho$. Hence (7.9) holds.

When $r>\tfrac12$, we have
$$ \|f_\lambda-f_\rho\|^2_K = \sum_{\lambda_i>0}\frac1{\lambda_i}\Big(\frac{\lambda}{\lambda_i+\lambda}\lambda_i^rd_i\Big)^2 = \lambda^{2r-1}\sum_{i\ge1}\Big(\frac{\lambda}{\lambda+\lambda_i}\Big)^{3-2r}\Big(\frac{\lambda_i}{\lambda_i+\lambda}\Big)^{2r-1}d_i^2. $$
This is again bounded by $\lambda^{2r-1}\|(d_i)\|^2_{\ell^2}=\lambda^{2r-1}\|L_K^{-r}f_\rho\|^2_\rho$. The second statement (7.10) has been verified.

Combining Theorems 3 and 4, we find the expected value of the error $\|f_{\mathbf z,\lambda}-f_\rho\|$. By choosing the optimal parameter in this bound, we get the following convergence rates.

Corollary 3. Let $\mathbf z$ be randomly drawn according to $\rho$. Assume $L_K^{-r}f_\rho\in L^2_{\rho_X}$ for some $\tfrac12<r\le1$. Then
$$ E_{\mathbf z\in Z^m}\|f_{\mathbf z,\lambda}-f_\rho\|_K \le C_{\rho,K}\Big\{\frac1{\lambda\sqrt m}+\lambda^{r-\frac12}\Big\}, \tag{7.12} $$
where $C_{\rho,K}:=\kappa\sigma_\rho+3\kappa\|f_\rho\|_\rho+\|L_K^{-r}f_\rho\|_\rho$ is independent of the dimension. Hence
$$ \lambda = m^{-\frac1{2r+1}} \ \Longrightarrow\ E_{\mathbf z\in Z^m}\|f_{\mathbf z,\lambda}-f_\rho\|_K \le 2C_{\rho,K}\,m^{-\frac{2r-1}{4r+2}}. \tag{7.13} $$

Remark. Corollary 3 provides estimates for the $H_K$-norm error of $f_{\mathbf z,\lambda}-f_\rho$. So we require $f_\rho\in H_K$, which is equivalent to $L_K^{-1/2}f_\rho\in L^2_{\rho_X}$. To get convergence rates we assume a stronger condition $L_K^{-r}f_\rho\in L^2_{\rho_X}$ for some $\tfrac12<r\le1$. The optimal rate derived from Corollary 3 is achieved by $r=1$. In this case, $E_{\mathbf z\in Z^m}\|f_{\mathbf z,\lambda}-f_\rho\|_K=O(m^{-1/6})$. The norm $\|f_{\mathbf z,\lambda}-f_\rho\|_K$ cannot be bounded by the error $\mathcal E(f_{\mathbf z,\lambda})-\mathcal E(f_\rho)$; hence our bound for the $H_K$-norm error is new in learning theory.

Corollary 4. Let $\mathbf z$ be randomly drawn according to $\rho$. If $L_K^{-r}f_\rho\in L^2_{\rho_X}$ for some $0<r\le1$, then
$$ E_{\mathbf z\in Z^m}\|f_{\mathbf z,\lambda}-f_\rho\|_\rho \le C_{\rho,K}\Big\{\frac1{\lambda\sqrt m}+\lambda^r\Big\}, \tag{7.14} $$
where $C_{\rho,K}:=\kappa^2\sigma_\rho+3\kappa^2\|f_\rho\|_\rho+\|L_K^{-r}f_\rho\|_\rho$. Thus,
$$ \lambda = m^{-\frac1{2r+2}} \ \Longrightarrow\ E_{\mathbf z\in Z^m}\|f_{\mathbf z,\lambda}-f_\rho\|_\rho \le 2C_{\rho,K}\,m^{-\frac r{2r+2}}. \tag{7.15} $$

Remark. The convergence rate (7.15) for the $L^2_{\rho_X}$-norm is obtained by optimizing the regularization parameter $\lambda$ in (7.14). The sharp rate derived from Corollary 4 is $m^{-1/4}$, which is achieved by $r=1$.
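The parameter choices in (7.13) and (7.15) simply balance the two terms of (7.12) and (7.14). A small helper, under the assumption that the constant $C_{\rho,K}$ is normalized away, makes the arithmetic explicit.

```python
def balance_K_norm(m, r):
    # (7.12): 1/(lambda*sqrt(m)) + lambda^(r - 1/2), for 1/2 < r <= 1.
    lam = m ** (-1.0 / (2 * r + 1))
    rate = m ** (-(2 * r - 1) / (4 * r + 2))
    return lam, rate

def balance_rho_norm(m, r):
    # (7.14): 1/(lambda*sqrt(m)) + lambda^r, for 0 < r <= 1.
    lam = m ** (-1.0 / (2 * r + 2))
    rate = m ** (-r / (2 * r + 2))
    return lam, rate

for m in [10**3, 10**4, 10**5]:
    lam, rate = balance_K_norm(m, r=1.0)
    print(f"m={m:>6}  lambda={lam:.3e}  H_K-norm rate m^(-1/6)={rate:.3e}")
```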

In [18], a leave-one-out technique was used to derive the expected value of learning schemes. For the scheme (7.2), the result can be expressed as
$$ E_{\mathbf z\in Z^m}\mathcal E(f_{\mathbf z,\lambda}) \le \Big(1+\frac{\kappa^2}{m\lambda}\Big)^2\inf_{f\in H_K}\big\{\mathcal E(f)+\lambda\|f\|^2_K\big\}. \tag{7.16} $$
Notice that $\mathcal E(f)-\mathcal E(f_\rho)=\|f-f_\rho\|^2_\rho$. If we denote the regularization error (see [12]) as
$$ D(\lambda) := \inf_{f\in H_K}\big\{\mathcal E(f)-\mathcal E(f_\rho)+\lambda\|f\|^2_K\big\} = \inf_{f\in H_K}\big\{\|f-f_\rho\|^2_\rho+\lambda\|f\|^2_K\big\}, \tag{7.17} $$
then the bound (7.16) can be restated as
$$ E_{\mathbf z\in Z^m}\|f_{\mathbf z,\lambda/2}-f_\rho\|^2_\rho \le D(\lambda/2) + \big(\mathcal E(f_\rho)+D(\lambda/2)\big)\Big(\frac{4\kappa^2}{m\lambda}+\frac{4\kappa^4}{m^2\lambda^2}\Big). $$
One can then derive the convergence rate in expectation when $f_\rho\in H_K$ and $\mathcal E(f_\rho)>0$. In fact, (7.11) with $H=L^2_{\rho_X}$, $A=L_K$ holds for $r=s=1/2$, which yields the best rate for the regularization error: $D(\lambda)\le\|f_\rho\|^2_K\,\lambda$. One can thus get $E_{\mathbf z\in Z^m}\|f_{\mathbf z,\lambda}-f_\rho\|^2_\rho=O(m^{-1/2})$ by taking $\lambda=m^{-1/2}$. Applying (3.2), one can have the probability estimate
$$ \|f_{\mathbf z,\lambda}-f_\rho\|_\rho \le \frac{C}{\sqrt\delta}\,m^{-1/4} $$
for the confidence $1-\delta$.

In [5], a functional analysis approach was employed for the error analysis of the scheme (7.2). The main result asserts that for any $0<\delta<1$, with confidence $1-\delta$,
$$ \mathcal E(f_{\mathbf z,\lambda})-\mathcal E(f_\lambda) \le \frac{M\kappa}{\sqrt m\,\lambda}\Big(1+\frac{\kappa}{\sqrt\lambda}\Big)\big(1+\log(2/\delta)\big). \tag{7.18} $$
Convergence rates were also derived in [5, Corollary 1] by combining (7.18) with (7.9): when $f_\rho$ lies in the range of $L_K$, for any $0<\delta<1$, with confidence $1-\delta$, there holds
$$ \|f_{\mathbf z,\lambda}-f_\rho\|_\rho \le C\log(2/\delta)\,m^{-1/5}, \qquad \text{if } \lambda = \Big(\frac{\log(2/\delta)}{\sqrt m}\Big)^{2/5}. $$
Thus the confidence is improved from $1/\delta$ to $\log(2/\delta)$, while the rate is weakened to $m^{-1/5}$. In the next section we shall verify the same confidence estimate while the sharp rate is kept. Our approach is short and neat, without involving the leave-one-out technique. Moreover, we can derive convergence rates in the space $H_K$.

8 Probability Estimates by McDiarmid Inequalities

In this section we apply some McDiarmid inequalities to improve the probability estimates derived from expected values by the Markov inequality.

Let $(\Omega,\rho)$ be a probability space. For $t=(t_1,\dots,t_m)\in\Omega^m$ and $t'_i\in\Omega$, we denote $t^i:=(t_1,\dots,t_{i-1},t'_i,t_{i+1},\dots,t_m)$.

Lemma 2. Let $\{t_i,t'_i\}$ be i.i.d. draws of the probability distribution $\rho$ on $\Omega$, and $F:\Omega^m\to\mathbb R$ be a measurable function.

(1) If for each $i$ there is $c_i$ such that $\sup_{t\in\Omega^m,\,t'_i\in\Omega}|F(t)-F(t^i)|\le c_i$, then
$$ \operatorname{Prob}_{t\in\Omega^m}\big\{F(t)-E_tF(t)\ge\varepsilon\big\} \le \exp\Big\{-\frac{2\varepsilon^2}{\sum_{i=1}^mc_i^2}\Big\}, \qquad \varepsilon>0. \tag{8.1} $$

(2) If there is $B\ge0$ such that $\sup_{t\in\Omega^m,\,1\le i\le m}|F(t)-E_{t_i}F(t)|\le B$, then
$$ \operatorname{Prob}_{t\in\Omega^m}\big\{F(t)-E_tF(t)\ge\varepsilon\big\} \le \exp\Big\{-\frac{\varepsilon^2}{2\big(B\varepsilon/3+\sum_{i=1}^m\sigma_i^2(F)\big)}\Big\}, \qquad \varepsilon>0, \tag{8.2} $$
where $\sigma_i^2(F):=\sup_{t\setminus\{t_i\}\in\Omega^{m-1}}E_{t_i}\big\{\big(F(t)-E_{t_i}F(t)\big)^2\big\}$.

The first inequality is the McDiarmid inequality; see [8]. The second inequality is its Bernstein form, which is presented in [16].
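A quick simulation illustrates the first inequality of Lemma 2: for an empirical mean of $m$ independent $[0,1]$-valued variables, each coordinate changes $F$ by at most $c_i=1/m$, so the deviation probability is at most $\exp(-2m\varepsilon^2)$. The Beta distribution below is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

m, trials, eps = 200, 20000, 0.05
# F(t) = empirical mean of m independent Beta(2, 5) variables in [0, 1];
# changing one coordinate changes F by at most c_i = 1/m, and E F = 2/7.
samples = rng.beta(2, 5, size=(trials, m))
F = samples.mean(axis=1)
deviation_prob = np.mean(F - 2 / 7 >= eps)

mcdiarmid_bound = np.exp(-2 * eps ** 2 / (m * (1 / m) ** 2))   # exp(-2 m eps^2)
print("empirical  Prob{F - EF >= eps} =", deviation_prob)
print("McDiarmid bound                =", mcdiarmid_bound)
```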

First, we show how the probability estimate for function reconstruction stated in Theorem 2 can be improved.

Theorem 5. If $S_{\mathbf x}^*S_{\mathbf x}+\gamma I$ is invertible and the Special Assumption holds, then under the condition that $|y_x-f^*(x)|\le M$ for each $x\in\mathbf x$, we have for every $0<\delta<1$, with probability $1-\delta$,
$$ \|f_{\mathbf z,\gamma}-f_{\mathbf x,\gamma}\|_H \le \big\|(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)^{-1}\big\|\,\|J\|\Big(\sigma+\sqrt{8\sigma^2\log\tfrac1\delta}+\tfrac{4M}3\log\tfrac1\delta\Big) \le \frac{\|J\|}{\lambda_{\mathbf x}+\gamma}\Big(\sigma+\sqrt{8\sigma^2\log\tfrac1\delta}+\tfrac{4M}3\log\tfrac1\delta\Big). $$

Proof. Write $\|f_{\mathbf z,\gamma}-f_{\mathbf x,\gamma}\|_H=\|L(y-S_{\mathbf x}f^*)\|_H\le\|(S_{\mathbf x}^*S_{\mathbf x}+\gamma I)^{-1}\|\,\|S_{\mathbf x}^*y-S_{\mathbf x}^*S_{\mathbf x}f^*\|_H$. Consider the function $F:\ell^2(\mathbf x)\to\mathbb R$ defined by $F(y)=\|S_{\mathbf x}^*y-S_{\mathbf x}^*S_{\mathbf x}f^*\|_H$. Recall from the proof of Theorem 2 that $F(y)=\|\sum_{x\in\mathbf x}(y_x-f^*(x))E_x\|_H$ and
$$ \big(E_yF\big)^2 \le E_yF^2 = \sum_{x\in\mathbf x}\sigma_x^2\langle E_x,E_x\rangle_H \le \|J\|^2\sigma^2. \tag{8.3} $$
Then we can apply the McDiarmid inequality. Let $x_0\in\mathbf x$ and $y'_{x_0}$ be a new sample at $x_0$. We have
$$ \big|F(y)-F(y^{x_0})\big| \le \big\|S_{\mathbf x}^*(y-y^{x_0})\big\|_H. $$
The bound equals $|y_{x_0}-y'_{x_0}|\,\|E_{x_0}\|_H\le|y_{x_0}-y'_{x_0}|\,\|J\|$. Since $|y_x-f^*(x)|\le M$ for each $x\in\mathbf x$, it can be bounded by $2M\|J\|$, which can be taken as $B$ in Lemma 2. Also,
$$ E_{y_{x_0}}\big\{\big(F(y)-E_{y_{x_0}}F(y)\big)^2\big\} \le \iint\big(|y_{x_0}-y'_{x_0}|\,\|J\|\big)^2\,d\rho_{x_0}(y_{x_0})\,d\rho_{x_0}(y'_{x_0}) \le 4\|J\|^2\sigma_{x_0}^2. $$
This yields $\sum_{x_0\in\mathbf x}\sigma_{x_0}^2(F)\le4\|J\|^2\sigma^2$. Thus Lemma 2 tells us that for every $\varepsilon>0$,
$$ \operatorname{Prob}_{y\in Y^{\mathbf x}}\big\{F(y)-E_yF(y)\ge\varepsilon\big\} \le \exp\Big\{-\frac{\varepsilon^2}{2\big(2M\|J\|\varepsilon/3+4\|J\|^2\sigma^2\big)}\Big\}. $$
Solving the quadratic equation
$$ \frac{\varepsilon^2}{2\big(2M\|J\|\varepsilon/3+4\|J\|^2\sigma^2\big)} = \log\frac1\delta $$
gives the probability estimate
$$ F(y) \le E_yF + \|J\|\Big(\sqrt{8\sigma^2\log\tfrac1\delta}+\tfrac{4M}3\log\tfrac1\delta\Big) $$
with confidence $1-\delta$. This in connection with (8.3) proves Theorem 5.

Then we turn to the learning theory estimates. The purpose is to improve the bound in Theorem 3 by applying the McDiarmid inequality. To this end, we refine Lemma 1 to a probability estimate.

Lemma 3. Let $\mathbf x\in X^m$ be randomly drawn according to $\rho_X$. Then for any $f\in L^\infty_{\rho_X}$ and $0<\delta<1$, with confidence $1-\delta$, there holds
$$ \Big\|\frac1m\sum_{i=1}^mf(x_i)K_{x_i}-L_Kf\Big\|_K \le \frac{4\kappa\|f\|_\infty}{3m}\log\frac1\delta + \frac{\kappa\|f\|_\rho}{\sqrt m}\Big(1+\sqrt{8\log\frac1\delta}\Big). $$

Proof. Define a function $F:X^m\to\mathbb R$ as
$$ F(\mathbf x)=F(x_1,\dots,x_m)=\Big\|\frac1m\sum_{i=1}^mf(x_i)K_{x_i}-L_Kf\Big\|_K. $$
For $j\in\{1,\dots,m\}$, we have
$$ \big|F(\mathbf x)-F(\mathbf x^j)\big| \le \frac1m\big\|f(x_j)K_{x_j}-f(x'_j)K_{x'_j}\big\|_K \le \frac\kappa m\big|f(x_j)-f(x'_j)\big|. $$
It follows that $|F(\mathbf x)-E_{x_j}F(\mathbf x)|\le\frac{2\kappa\|f\|_\infty}m=:B$. Moreover,
$$ E_{x_j}\big\{\big(F(\mathbf x)-E_{x_j}F(\mathbf x)\big)^2\big\} \le \frac{\kappa^2}{m^2}\iint\big|f(x_j)-f(x'_j)\big|^2\,d\rho_X(x_j)\,d\rho_X(x'_j) \le \frac{2\kappa^2}{m^2}\iint\big(|f(x_j)|^2+|f(x'_j)|^2\big)\,d\rho_X(x_j)\,d\rho_X(x'_j) = \frac{4\kappa^2\|f\|^2_\rho}{m^2}. $$
Then we have $\sum_{j=1}^m\sigma_j^2(F)\le\frac{4\kappa^2\|f\|^2_\rho}m$. Thus we can apply Lemma 2 to the function $F$ and find that
$$ \operatorname{Prob}_{\mathbf x\in X^m}\big\{F(\mathbf x)-E_{\mathbf x}F(\mathbf x)\ge\varepsilon\big\} \le \exp\Big\{-\frac{\varepsilon^2}{2\big(\frac{2\kappa\|f\|_\infty}{3m}\varepsilon+\frac{4\kappa^2\|f\|^2_\rho}m\big)}\Big\}. $$
Solving a quadratic equation again, we see that with confidence $1-\delta$, we have
$$ F(\mathbf x) \le E_{\mathbf x}F(\mathbf x) + \frac{4\kappa\|f\|_\infty}{3m}\log\frac1\delta + \kappa\|f\|_\rho\sqrt{\frac{8\log\frac1\delta}m}. $$
Lemma 1 says that $E_{\mathbf x}F(\mathbf x)\le\kappa\|f\|_\rho/\sqrt m$. Then our conclusion follows.

Theorem 6. Let $\mathbf z$ be randomly drawn according to $\rho$ satisfying $|y|\le M$ almost everywhere. Then for any $0<\delta<1$, with confidence $1-\delta$ we have
$$ \|f_{\mathbf z,\lambda}-f_\lambda\|_K \le \kappa M\log\frac4\delta\Big(\frac{2\kappa}{m\lambda^{3/2}}+\frac{22}{\sqrt m\,\lambda}\Big). $$

Proof. Since $|y|\le M$ almost everywhere, we know that $\|f_\rho\|_\rho\le\|f_\rho\|_\infty\le M$. Recall the function $\bar f_\lambda$ defined by (7.5). It satisfies (7.6). Hence
$$ \|f_{\mathbf x,\lambda}-\bar f_\lambda\|_K \le \frac1\lambda\Big\|\frac1m\sum_{i=1}^mf_\rho(x_i)K_{x_i}-L_Kf_\rho\Big\|_K. $$

Applying Lemma 3 to the function $f_\rho$, we have with confidence $1-\delta$,
$$ \|f_{\mathbf x,\lambda}-\bar f_\lambda\|_K \le \frac{4\kappa M}{3\lambda m}\log\frac1\delta + \frac{\kappa M}{\lambda\sqrt m}\Big(1+\sqrt{8\log\frac1\delta}\Big). $$
In the same way, by Lemma 3 with the function $f_\lambda$ and (7.7), we find
$$ \operatorname{Prob}_{\mathbf x\in X^m}\Big\{\|\bar f_\lambda-f_\lambda\|_K \le \frac{4\kappa\|f_\lambda\|_\infty}{3\lambda m}\log\frac1\delta + \frac{\kappa\|f_\lambda\|_\rho}{\lambda\sqrt m}\Big(1+\sqrt{8\log\frac1\delta}\Big)\Big\} \ge 1-\delta. $$
By (7.8), we have $\|f_\lambda\|_\rho\le2M$ and $\|f_\lambda\|_\infty\le\kappa\|f_\lambda\|_K\le\kappa M/\sqrt\lambda$. Therefore, with confidence $1-\delta$, there holds
$$ \|\bar f_\lambda-f_\lambda\|_K \le \frac{4\kappa^2M}{3\lambda\sqrt\lambda\,m}\log\frac1\delta + \frac{2\kappa M}{\lambda\sqrt m}\Big(1+\sqrt{8\log\frac1\delta}\Big). $$
Finally, we apply Theorem 5. For each $\mathbf x\in X^m$, there holds with confidence $1-\delta$,
$$ \|f_{\mathbf z,\lambda}-f_{\mathbf x,\lambda}\|_K \le \frac\kappa{m\lambda}\Big(\sigma+\sqrt{8\sigma^2\log\tfrac1\delta}+\tfrac{8M}3\log\tfrac1\delta\Big). \tag{8.4} $$
Here $\sigma^2=\sum_{i=1}^m\sigma_{x_i}^2$. Apply the Bernstein inequality
$$ \operatorname{Prob}_{\mathbf x\in X^m}\Big\{\frac1m\sum_{i=1}^m\xi(x_i)-E\xi\ge\varepsilon\Big\} \le \exp\Big\{-\frac{m\varepsilon^2}{2\big(B\varepsilon/3+\sigma^2(\xi)\big)}\Big\} $$
to the random variable $\xi(x)=\int_Y\big(y-f_\rho(x)\big)^2\,d\rho(y|x)$. It satisfies $0\le\xi\le4M^2$, $E\xi=\sigma_\rho^2$, and $\sigma^2(\xi)\le4M^2\sigma_\rho^2$. Then we see that
$$ \operatorname{Prob}_{\mathbf x\in X^m}\Big\{\frac1m\sum_{i=1}^m\sigma_{x_i}^2 \ge \sigma_\rho^2+\frac{8M^2\log(1/\delta)}{3m}+\Big(\frac{8M^2\sigma_\rho^2\log(1/\delta)}m\Big)^{1/2}\Big\} \le \delta. $$
Hence
$$ \frac\sigma{\sqrt m} \le \Big\{\sigma_\rho^2+\frac{8M^2\log(1/\delta)}{3m}+\Big(\frac{8M^2\sigma_\rho^2\log(1/\delta)}m\Big)^{1/2}\Big\}^{1/2}, $$
which is bounded by $\sigma_\rho+M\sqrt{\frac{3\log(1/\delta)}m}$. Together with (8.4), we see that with probability $1-2\delta$ in $Z^m$, we have the bound
$$ \|f_{\mathbf z,\lambda}-f_{\mathbf x,\lambda}\|_K \le \frac{4\kappa\sigma_\rho}\lambda\sqrt{\frac{\log(1/\delta)}m} + \frac{34\kappa M}{3\lambda m}\log\frac1\delta. $$

Combining the above three bounds, we know that for $0<\delta<1/4$, with confidence $1-4\delta$, $\|f_{\mathbf z,\lambda}-f_\lambda\|_K$ is bounded by
$$ \frac{\kappa M}\lambda\Big\{\frac{13\log(1/\delta)}m+\frac{3\big(1+\sqrt{8\log(1/\delta)}\big)}{\sqrt m}\Big\} + \frac{4\kappa\sigma_\rho}\lambda\sqrt{\frac{\log(1/\delta)}m} + \frac{4\kappa^2M}{3m\lambda^{3/2}}\log\frac1\delta. $$
But $\sigma_\rho\le M$. Replacing $\delta$ by $\delta/4$, our conclusion follows.

We are in a position to state our convergence rates in both the $\|\cdot\|_K$ and $\|\cdot\|_\rho$ norms.

Corollary 5. Let $\mathbf z$ be randomly drawn according to $\rho$ satisfying $|y|\le M$ almost everywhere. If $f_\rho$ is in the range of $L_K$, then for any $0<\delta<1$, with confidence $1-\delta$ we have
$$ \|f_{\mathbf z,\lambda}-f_\rho\|_K \le C\log(4/\delta)\,m^{-1/6}, \qquad \text{by taking } \lambda=\Big(\frac{\log(4/\delta)}m\Big)^{1/3}, \tag{8.5} $$
and
$$ \|f_{\mathbf z,\lambda}-f_\rho\|_\rho \le C\log(4/\delta)\,m^{-1/4}, \qquad \text{by taking } \lambda=\Big(\frac{\log^2(4/\delta)}m\Big)^{1/4}, \tag{8.6} $$
where $C$ is a constant independent of the dimension: $C:=30\kappa M+2\kappa^2M+\|L_K^{-1}f_\rho\|_\rho$.

References

[1] A. Aldroubi and K. Gröchenig, Non-uniform sampling and reconstruction in shift-invariant spaces, SIAM Review.

[2] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc.

[3] F. Cucker and S. Smale, On the mathematical foundations of learning, Bull. Amer. Math. Soc., 1-49.

[4] F. Cucker and S. Smale, Best choices for regularization parameters in learning theory, Found. Comput. Math.

[5] E. De Vito, A. Caponnetto, and L. Rosasco, Model selection for regularized least-squares algorithm in learning theory, Found. Comput. Math., to appear.

[6] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Mathematics and Its Applications 375, Kluwer, Dordrecht, 1996.

[7] T. Evgeniou, M. Pontil, and T. Poggio, Regularization networks and support vector machines, Adv. Comput. Math., 1-50.

[8] C. McDiarmid, Concentration, in: Probabilistic Methods for Algorithmic Discrete Mathematics, Springer-Verlag, Berlin, 1998.

[9] P. Niyogi, The Informational Complexity of Learning, Kluwer, 1998.

[10] T. Poggio and S. Smale, The mathematics of learning: dealing with data, Notices Amer. Math. Soc.

[11] S. Smale and D. X. Zhou, Estimating the approximation error in learning theory, Anal. Appl. 1 (2003).

[12] S. Smale and D. X. Zhou, Shannon sampling and function reconstruction from point values, Bull. Amer. Math. Soc.

[13] M. Unser, Sampling-50 years after Shannon, Proc. IEEE.

[14] V. Vapnik, Statistical Learning Theory, John Wiley & Sons, 1998.

[15] G. Wahba, Spline Models for Observational Data, SIAM, 1990.

[16] Y. Ying, McDiarmid inequalities of Bernstein and Bennett forms, Technical Report, City University of Hong Kong, 2004.

[17] R. M. Young, An Introduction to Non-Harmonic Fourier Series, Academic Press, New York, 1980.

[18] T. Zhang, Leave-one-out bounds for kernel methods, Neural Comp.
