arxiv: v4 [math.st] 9 Aug 2017

PARALLELIZING SPECTRAL ALGORITHMS FOR KERNEL LEARNING

GILLES BLANCHARD AND NICOLE MÜCKE

Date: August 10,

Abstract. We consider a distributed learning approach in supervised learning for a large class of spectral regularization methods in an RKHS framework. The data set of size $n$ is partitioned into $m = O(n^{\alpha})$ disjoint subsets. On each subset, some spectral regularization method (belonging to a large class, including in particular Kernel Ridge Regression, $L^2$-boosting and spectral cut-off) is applied. The regression function $f$ is then estimated via simple averaging, leading to a substantial reduction in computation time. We show that minimax optimal rates of convergence are preserved if $m$ grows sufficiently slowly (corresponding to an upper bound for $\alpha$) as $n \to \infty$, depending on the smoothness assumptions on $f$ and the intrinsic dimensionality. In spirit, our approach is classical.

1. Introduction

Distributed learning (DL) algorithms are a standard tool for saving computation time in machine learning problems where massive datasets are involved: dividing randomly data of cardinality $n$ into $m$ equally-sized, easily manageable partitions and evaluating them in parallel roughly gains a factor $m^2$ (for time and memory) compared to the single machine approach. The final output is obtained from averaging the individual outputs.¹

¹ For the sake of simplicity, throughout this paper we assume that $n$ is divisible by $m$. This could always be achieved by disregarding some data; alternatively, it is straightforward to show that admitting one smaller block in the partition does not affect the asymptotic results of this paper. We shall not try to discuss this point in greater detail. In particular, we shall not analyze in which general framework our simple averages could be replaced by weighted averages.

Recently, DL was studied in several machine learning contexts: in point estimation [14], matrix factorization [17], smoothing spline models and testing [4], local average regression [3], in classification (kernel SVMs [13] and feature space decomposition [11]) and also in kernel (ridge) regression (KRR) [21], [16], [20].

In this paper, we study the DL approach for the statistical learning problem
\[ Y_j := f(X_j) + \varepsilon_j, \qquad j = 1, \dots, n, \tag{1.1} \]
at random i.i.d. data points $X_1, \dots, X_n$ drawn according to a probability distribution $\nu$ on $\mathcal X$, where the $\varepsilon_j$ are independent centered noise variables. The unknown regression function $f$ is real-valued and belongs to some reproducing kernel Hilbert space $\mathcal H_K$ with bounded kernel $K$. We partition the given data set $D = \{(X_1, Y_1), \dots, (X_n, Y_n)\} \subset \mathcal X \times \mathbb R$ into $m$ disjoint equal-size subsets $D_1, \dots, D_m$. On each subset $D_j$, we compute a local estimator $\hat f^{\lambda}_{D_j}$, using

a spectral regularization method. The final estimator for the target function $f$ is obtained by simple averaging:
\[ \bar f^{\lambda}_D := \frac{1}{m} \sum_{j=1}^{m} \hat f^{\lambda}_{D_j} . \]
The non-distributed setting ($m = 1$) has been studied in the recent paper [2], which forms the basis of our results in the distributed setting and where (weak and strong) minimax optimal rates of convergence are established. Our aim is to extend these results to distributed learning and to derive minimax optimal rates. We again apply a fairly large class of spectral regularization methods, including the popular KRR, $L^2$-boosting and spectral cut-off.

As in [2], we let $T : f \mapsto \int f(x) K(x, \cdot)\, d\nu(x)$ denote the kernel integral operator associated to $K$ and the sampling measure $\nu$. Our rates of convergence are governed by a source condition assumption on $f$ of the form $\|T^{-r} f\|_{\mathcal H_K} \le R$ for some constants $r, R > 0$, as well as by the ill-posedness of the problem, as measured by an assumed power decay of the eigenvalues of $T$ with exponent $b > 1$. We show that for $s \in [0, \frac12]$, in the sense of $p$-th moment expectation,
\[ \big\| T^{s} (f - \bar f^{\lambda_n}_D) \big\|_{\mathcal H_K} \lesssim R \left( \frac{\sigma^2}{R^2 n} \right)^{\frac{r+s}{2r+1+1/b}} \tag{1.2} \]
for an appropriate choice of the regularization parameter $\lambda_n$, depending on the global sample size $n$ as well as on $R$ and the noise variance $\sigma^2$ (but not on the number $m$ of subsample sets). Note that $s = 0$ corresponds to the reconstruction error (i.e. $\mathcal H_K$-norm), and $s = \frac12$ to the prediction error (i.e. $L^2(\nu)$-norm). The symbol $\lesssim$ means that the inequality holds up to a multiplicative constant that can depend on various parameters entering in the assumptions of the result, but not on $n$, $m$, $\sigma$, nor $R$. An important assumption is that the inequality $q \ge r + s$ should hold, where $q$ is the qualification of the regularization method, a quantity defined in the classical theory of inverse problems (see Section 2.3 for a precise definition).

Basic problems are the choice of the regularization parameter on the subsamples and, most importantly, the proper choice of $m$, since it is well known that choosing $m$ too large gives a suboptimal convergence rate in the limit $n \to \infty$, see e.g. [20].

Our approach to this problem is classical. Using a bias-variance decomposition and choosing the regularization parameter according to the total sample size $n$ yields undersmoothing on each of the individual samples. The bias estimate is then straightforward. For the hard part we write the variance as a sum of independent random variables, leading to a substantial reduction of variance by averaging.

To the best of our knowledge, comparable results up to completion of this article had been restricted to KRR, corresponding to Tikhonov regularization. In [21] the authors derive minimax-optimal rates in 3 cases (finite rank kernels, sub-Gaussian decay of eigenvalues of the kernel and polynomial decay), provided $m$ satisfies a certain upper bound, depending on the rate of decay of the eigenvalues and an additional crucial upper bound on the eigenfunctions $\phi_j$ of the Mercer kernel (see Section 5).

It is therefore of great interest to investigate if and how $m$ can be allowed to go to infinity as a function of $n$ without imposing any conditions on the eigenfunctions of the kernel. Results in this direction have been obtained in the recent paper [16], for KRR, which is a great improvement on the

worst rate of [21]. The authors dub their approach a second order decomposition, which uses concentration inequalities and certain resolvent identities adapted to KRR.

After this paper had been completed, however, we learned of the Oberwolfach report [23], where the authors have reported results for general spectral regularization methods, which are similar to the results in this paper. At the time of writing, we are not aware of any published proof. It is unclear to us how the authors of [23] prove their results. They require bounded output space and a continuous kernel (ours need only be bounded), and their estimates are only in $L^2$ sense, not in RKHS-norm. Furthermore, they do not seem to track the dependence on the noise variance and the source condition as precisely as we do. For more detail, we refer to our Discussion in Section 5.

The outline of the paper is as follows. Section 2 contains notation and the setting. Section 3 states our main result on distributed learning. Section 4 presents numerical studies, followed by a concluding discussion and a more detailed comparison of our results in Section 5. In Section 6 we prove our theorems.

2. Notation, Statistical model and distributed learning Algorithm

In this section, we specify the mathematical background and the statistical model for (distributed) regularized learning. We have included this section for self-sufficiency and reader convenience. It essentially repeats the setting in [2] in summarized form.

2.1. Kernel-induced operators

We assume that the input space $\mathcal X$ is a standard Borel space endowed with a probability measure $\nu$; the output space is equal to $\mathbb R$. We let $K$ be a positive semidefinite kernel on $\mathcal X \times \mathcal X$ which is bounded by $\kappa$. The associated reproducing kernel Hilbert space will be denoted by $\mathcal H_K$. It is assumed that all functions $f \in \mathcal H_K$ are measurable and bounded in supremum norm, i.e. $\|f\|_{\infty} \le \kappa \|f\|_{\mathcal H_K}$ for all $f \in \mathcal H_K$. Therefore, $\mathcal H_K$ is a subset of $L^2(\mathcal X, \nu)$, with $S : \mathcal H_K \to L^2(\mathcal X, \nu)$ being the inclusion operator, satisfying $\|S\| \le \kappa$. The adjoint operator $S^* : L^2(\mathcal X, \nu) \to \mathcal H_K$ is identified as
\[ S^* g = \mathbb E_{\nu}[g(X) K_X] = \int_{\mathcal X} g(x) K_x \, \nu(dx) . \]
Setting $T_x := K_x \otimes K_x^* : \mathcal H_K \to \mathcal H_K$, the covariance operator is given by
\[ T = \mathbb E_{\nu}[K_X \otimes K_X^*] = \int_{\mathcal X} \langle \cdot, K_x \rangle_{\mathcal H_K} K_x \, \nu(dx), \]
which can be shown to be positive self-adjoint trace class (and hence is compact). The corresponding empirical versions of these operators are given by
\[ S_{\mathbf x} : \mathcal H_K \to \mathbb R^n, \qquad (S_{\mathbf x} f)_j = \langle f, K_{x_j} \rangle_{\mathcal H_K}, \]
\[ S_{\mathbf x}^* : \mathbb R^n \to \mathcal H_K, \qquad S_{\mathbf x}^* \mathbf y = \frac{1}{n} \sum_{j=1}^{n} y_j K_{x_j}, \]
\[ T_{\mathbf x} := S_{\mathbf x}^* S_{\mathbf x} : \mathcal H_K \to \mathcal H_K, \qquad T_{\mathbf x} = \frac{1}{n} \sum_{j=1}^{n} K_{x_j} \otimes K_{x_j}^* . \]

We introduce the shortcut notation $\bar T = \kappa^{-2} T$ and $\bar T_{\mathbf x} := \kappa^{-2} T_{\mathbf x}$, ensuring $\|\bar T\| \le 1$ and $\|\bar T_{\mathbf x}\| \le 1$. Similarly, $\bar S = \kappa^{-1} S$ and $\bar S_{\mathbf x} := \kappa^{-2} S_{\mathbf x}$, ensuring $\|\bar S\| \le 1$ and $\|\bar S_{\mathbf x}\| \le 1$. The numbers $\mu_j$ are the positive eigenvalues of $\bar T$ satisfying $0 < \mu_{j+1} \le \mu_j$ for all $j > 0$ and $\mu_j \searrow 0$.

2.2. Noise assumption and prior classes

In our setting of kernel learning, the sampling is assumed to be random i.i.d., where each observation point $(X_i, Y_i)$ follows the model $Y = f(X) + \varepsilon$. For $(X, Y)$ having distribution $\rho$, we assume: the conditional expectation w.r.t. $\rho$ of $Y$ given $X$ exists and it holds for $\nu$-almost all $x \in \mathcal X$:
\[ \mathbb E_{\rho}[Y \mid X = x] = f_{\rho}(x), \quad \text{for some } f_{\rho} \in \mathcal H_K . \tag{2.1} \]
Furthermore, we will make the following assumption on the observation noise distribution: there exists $\sigma > 0$ such that
\[ \mathbb E\big[\, |Y - f_{\rho}(X)|^2 \mid X \,\big] \le \sigma^2 \quad \nu\text{-a.s.} \tag{2.2} \]
To derive nontrivial rates of convergence, we concentrate our attention on specific subsets (also called models) of the class of probability measures. If $\mathcal P$ denotes the set of all probability distributions on $\mathcal X$, we define classes of sampling distributions by introducing decay conditions on the eigenvalues $\mu_j$ of the operator $\bar T_{\nu}$. For $b > 1$ and $\beta > 0$, we set
\[ \mathcal P^{<}(b, \beta) := \{ \nu \in \mathcal P : \mu_j \le \beta / j^{b} \ \ \forall j \ge 1 \} . \]
For a subset $\Omega \subseteq \mathcal H_K$, we let $\mathcal K(\Omega)$ be the set of regular conditional probability distributions $\rho(\cdot \mid \cdot)$ on $\mathcal B(\mathbb R) \times \mathcal X$ such that (2.1) and (2.2) hold for some $f_{\rho} \in \Omega$. We will focus on a Hölder-type source condition, i.e. given $r > 0$, $R > 0$ and $\nu \in \mathcal P$, we define
\[ \Omega_{\nu}(r, R) := \{ f \in \mathcal H_K : f = \bar T_{\nu}^{\,r} h, \ \|h\|_{\mathcal H_K} \le R \} . \tag{2.3} \]
Then the class of models which we will consider will be defined as
\[ \mathcal M(r, R, \mathcal P') := \{ \rho(dx, dy) = \rho(dy \mid x)\, \nu(dx) : \rho(\cdot \mid \cdot) \in \mathcal K(\Omega_{\nu}(r, R)), \ \nu \in \mathcal P' \}, \tag{2.4} \]
with $\mathcal P' = \mathcal P^{<}(b, \beta)$. As a consequence, the class of models depends not only on the smoothness properties of the solution (reflected in the parameters $R > 0$, $r > 0$), but also essentially on the decay of the eigenvalues of $\bar T_{\nu}$.

2.3. Regularization

In this subsection, we introduce the class of linear regularization methods based on spectral theory for self-adjoint linear operators. These are standard methods for finding stable solutions for ill-posed inverse problems. Originally, these methods were developed in the deterministic context, see [8]. Later on, they have been applied to probabilistic problems in machine learning, see [10] or [2].

Definition 2.1 (Regularization function). Let $g : (0, 1] \times [0, 1] \to \mathbb R$ be a function and write $g_{\lambda} = g(\lambda, \cdot)$. The family $\{g_{\lambda}\}_{\lambda}$ is called regularization function, if the following conditions hold:

(i) There exists a constant $D < \infty$ such that for any $0 < \lambda \le 1$
\[ \sup_{0 < t \le 1} |t\, g_{\lambda}(t)| \le D . \]

(ii) There exists a constant $E < \infty$ such that for any $0 < \lambda \le 1$
\[ \sup_{0 < t \le 1} |g_{\lambda}(t)| \le \frac{E}{\lambda} . \tag{2.5} \]

(iii) Defining the residual $r_{\lambda}(t) := 1 - g_{\lambda}(t)\, t$, there exists a constant $\gamma_0 < \infty$ such that for any $0 < \lambda \le 1$
\[ \sup_{0 < t \le 1} |r_{\lambda}(t)| \le \gamma_0 . \]

It has been shown in e.g. [6], [2] that attainable learning rates are essentially linked with the qualification of the regularization $\{g_{\lambda}\}_{\lambda}$, being the maximal $q$ such that for any $0 < \lambda \le 1$
\[ \sup_{0 < t \le 1} |r_{\lambda}(t)|\, t^{q} \le \gamma_q \lambda^{q} \tag{2.6} \]
for some constant $\gamma_q > 0$. The most popular examples include:

Example 2.2 (Tikhonov Regularization, Kernel Ridge Regression). The choice $g_{\lambda}(t) = \frac{1}{\lambda + t}$ corresponds to Tikhonov regularization. In this case we have $D = E = \gamma_0 = 1$. The qualification of this method is $q = 1$ with $\gamma_q = 1$.

Example 2.3 (Landweber Iteration, gradient descent). The Landweber Iteration (gradient descent algorithm with constant stepsize) is defined by
\[ g_k(t) = \sum_{j=0}^{k-1} (1 - t)^{j} \quad \text{with } k = 1/\lambda \in \mathbb N . \]
We have $D = E = \gamma_0 = 1$. The qualification $q$ of this algorithm can be arbitrary, with $\gamma_q = 1$ if $0 < q \le 1$ and $\gamma_q = q^{q}$ if $q > 1$.

Example 2.4 ($\nu$-method). The $\nu$-method belongs to the class of so-called semi-iterative regularization methods. This method has finite qualification $q = \nu$ with $\gamma_q$ a positive constant. Moreover, $D = 1$ and $E = 2$. The filter is given by $g_k(t) = p_k(t)$, a polynomial of degree $k - 1$, with regularization parameter $\lambda \sim k^{-2}$, which makes this method much faster than e.g. gradient descent.

2.4. Distributed Learning Algorithm

We let $D = \{(x_j, y_j)\}_{j=1}^{n} \subset \mathcal X \times \mathcal Y$ be the dataset, which we partition into $m$ disjoint subsets $D_1, \dots, D_m$, each having size $n/m$. Denote the $j$th data vector by $(\mathbf x_j, \mathbf y_j) \in (\mathcal X \times \mathbb R)^{n/m}$. On each subset we compute a local estimator for a suitable a-priori parameter choice $\lambda = \lambda_n$ according to
\[ f^{\lambda_n}_{D_j} := g_{\lambda_n}(\kappa^{-2} T_{\mathbf x_j})\, \kappa^{-2} S^*_{\mathbf x_j} \mathbf y_j = g_{\lambda_n}(\bar T_{\mathbf x_j})\, \bar S^*_{\mathbf x_j} \mathbf y_j . \tag{2.7} \]
By $f^{\lambda}_D$ we will denote the estimator using the whole sample ($m = 1$). The final estimator is given by simple averaging of the local ones:
\[ \bar f^{\lambda}_D := \frac{1}{m} \sum_{j=1}^{m} f^{\lambda}_{D_j} . \tag{2.8} \]
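The estimators (2.7) and (2.8) can be sketched numerically as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes $\kappa = 1$ (i.e. $K(x, x) \le 1$), uses the identity $g(S^*S)S^* = S^* g(SS^*)$ so that the filter is applied to the scaled Gram matrix, and sums the Landweber filter in closed form. All function names are hypothetical.

```python
import numpy as np

def tikhonov(t, lam):
    """Filter of Example 2.2: g_lambda(t) = 1 / (lambda + t)."""
    return 1.0 / (lam + t)

def landweber(t, lam):
    """Filter of Example 2.3 with k = round(1/lambda) steps, in closed form."""
    k = max(int(round(1.0 / lam)), 1)
    t = np.asarray(t, dtype=float)
    safe_t = np.where(t > 0, t, 1.0)            # placeholder where t == 0
    return np.where(t > 0, (1.0 - (1.0 - t) ** k) / safe_t, float(k))

def local_coefficients(gram, y, lam, filt):
    """Coefficients a with sum_i a_i K(x_i, .) = g_lam(T_x) S_x^* y.

    Assumption: kappa = 1. S_x S_x^* acts on R^n as gram / n, so the
    filter is applied to the eigenvalues of gram / n.
    """
    n = len(y)
    evals, evecs = np.linalg.eigh(gram / n)
    evals = np.clip(evals, 0.0, 1.0)            # guard against round-off
    return evecs @ (filt(evals, lam) * (evecs.T @ y)) / n

def distributed_predict(kernel, x, y, m, lam, filt, x_new):
    """Average (2.8) of the m local estimators (2.7), evaluated at x_new."""
    preds = []
    for xs, ys in zip(np.split(x, m), np.split(y, m)):   # requires m | n
        a = local_coefficients(kernel(xs[:, None], xs[None, :]), ys, lam, filt)
        preds.append(kernel(x_new[:, None], xs[None, :]) @ a)
    return np.mean(preds, axis=0)
```

With the Tikhonov filter the local coefficients reduce to the familiar KRR solution $(\mathbf K + n\lambda I)^{-1} \mathbf y$, which gives a quick consistency check of the sketch.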

3. Main Results

This section presents our main results. Theorem 3.1 and Theorem 3.2 contain separate estimates on the approximation error and the sample error and lead to Corollary 3.3, which gives an upper bound for the error $\|\bar T^{s}(f_{\rho} - \bar f^{\lambda}_D)\|_{\mathcal H_K}$ and presents an upper rate of convergence for the sequence of distributed learning algorithms. For the sake of the reader we recall Theorem 3.4, which was already shown in [2], presenting the minimax optimal rate for the single machine problem. This yields an estimate on the difference between the single machine and the distributed learning algorithm in Corollary 3.5.

We want to track the precise behavior of these rates not only for what concerns the exponent in the number of examples $n$, but also in terms of their scaling (multiplicative constant) as a function of some important parameters (namely the noise variance $\sigma^2$ and the complexity radius $R$ in the source condition). For this reason, we introduce a notion of a family of rates over a family of models. More precisely, we consider an indexed family $(\mathcal M_{\theta})_{\theta \in \Theta}$, where for all $\theta \in \Theta$, $\mathcal M_{\theta}$ is a class of Borel probability distributions on $\mathcal X \times \mathbb R$ satisfying the basic general assumptions (2.1) and (2.2). We consider rates of convergence in the sense of the $p$-th moments of the estimation error, where $1 \le p < \infty$ is a fixed real number.

As already mentioned in the Introduction, our proofs are based on a classical bias-variance decomposition as follows. Introducing
\[ \tilde f^{\lambda}_D := \frac{1}{m} \sum_{j=1}^{m} g_{\lambda}(\bar T_{\mathbf x_j})\, \bar T_{\mathbf x_j} f_{\rho}, \tag{3.1} \]
we write
\[ \bar T^{s}(f_{\rho} - \bar f^{\lambda}_D) = \bar T^{s}(f_{\rho} - \tilde f^{\lambda}_D) + \bar T^{s}(\tilde f^{\lambda}_D - \bar f^{\lambda}_D) = \underbrace{\frac{1}{m} \sum_{j=1}^{m} \bar T^{s} r_{\lambda}(\bar T_{\mathbf x_j}) f_{\rho}}_{\text{Approximation Error}} + \underbrace{\frac{1}{m} \sum_{j=1}^{m} \bar T^{s} g_{\lambda}(\bar T_{\mathbf x_j}) \big( \bar T_{\mathbf x_j} f_{\rho} - \bar S^*_{\mathbf x_j} \mathbf y_j \big)}_{\text{Sample Error}} . \tag{3.2} \]

In all the forthcoming results in this section, we let $s \in [0, \frac12]$, $p \ge 1$ and consider the model $\mathcal M_{\sigma, M, R} := \mathcal M(r, R, \mathcal P^{<}(b, \beta))$, where $r > 0$, $b > 1$ and $\beta > 0$ are fixed, and $\theta = (R, M, \sigma)$ varies in $\Theta = \mathbb R^3_+$. Given a sample $D \subset (\mathcal X \times \mathbb R)^n$ of size $n$, define $\bar f^{\lambda_n}_D$, $f^{\lambda_n}_D$ as in Section 2.4 and $\tilde f^{\lambda_n}_D$ as in (3.1), using a regularization function of qualification $q \ge r + s$, with parameter sequence
\[ \lambda_n := \lambda_{n, (\sigma, R)} := \min\left( \left( \frac{\sigma^2}{R^2 n} \right)^{\frac{b}{2br + b + 1}}, \ 1 \right), \tag{3.3} \]

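independent of $M$. Define the sequence
\[ a_n := a_{n, (\sigma, R)} := R \left( \frac{\sigma^2}{R^2 n} \right)^{\frac{b(r+s)}{2br + b + 1}} . \tag{3.4} \]
We recall from the introduction that we shall always assume that $n$ is a multiple of $m$. With these preparations, our main results are:

Theorem 3.1 (Approximation Error). If the number $m$ of subsample sets satisfies
\[ m \le n^{\alpha}, \qquad \alpha < \frac{2b \min\{r, 1\}}{2br + b + 1}, \tag{3.5} \]
then
\[ \sup_{(\sigma, M, R) \in \mathbb R^3_+} \ \limsup_{n \to \infty} \ \sup_{\rho \in \mathcal M_{\sigma, M, R}} \frac{ \mathbb E_{\rho^{\otimes n}} \big[ \| \bar T^{s}(f_{\rho} - \tilde f^{\lambda_n}_D) \|^{p}_{\mathcal H_K} \big]^{1/p} }{a_n} < \infty . \]

Theorem 3.2 (Sample Error). If the number $m$ of subsample sets satisfies
\[ m \le n^{\alpha}, \qquad \alpha < \frac{2br}{2br + b + 1}, \tag{3.6} \]
then
\[ \sup_{(\sigma, M, R) \in \mathbb R^3_+} \ \limsup_{n \to \infty} \ \sup_{\rho \in \mathcal M_{\sigma, M, R}} \frac{ \mathbb E_{\rho^{\otimes n}} \big[ \| \bar T^{s}(\tilde f^{\lambda_n}_D - \bar f^{\lambda_n}_D) \|^{p}_{\mathcal H_K} \big]^{1/p} }{a_n} < \infty . \]

And, as a consequence (by (3.2) and applying the triangle inequality):

Corollary 3.3. If the number $m$ of subsample sets satisfies
\[ m \le n^{\alpha}, \qquad \alpha < \frac{\min\{2br, \, b + 1\}}{2br + b + 1}, \tag{3.7} \]
then the sequence (3.4) is an upper rate of convergence in $L^p$, for the interpolation norm of parameter $s$, for the sequence of estimated solutions $(\bar f^{\lambda_{n, (\sigma, R)}}_D)$ over the family of models $(\mathcal M_{\sigma, M, R})_{(\sigma, M, R) \in \mathbb R^3_+}$, i.e.
\[ \sup_{(\sigma, M, R) \in \mathbb R^3_+} \ \limsup_{n \to \infty} \ \sup_{\rho \in \mathcal M_{\sigma, M, R}} \frac{ \mathbb E_{\rho^{\otimes n}} \big[ \| \bar T^{s}(f_{\rho} - \bar f^{\lambda_n}_D) \|^{p}_{\mathcal H_K} \big]^{1/p} }{a_n} < \infty . \]

Theorem 3.4 (Blanchard, Mücke (2017) [2]). The sequence (3.4) is an upper rate of convergence in $L^p$ for all $p \ge 1$, for the interpolation norm of parameter $s$, for the sequence of estimated solutions $(f^{\lambda_{n, (\sigma, R)}}_D)$, independent of $M$, over the family of models $(\mathcal M_{\sigma, M, R})_{(\sigma, M, R) \in \mathbb R^3_+}$, i.e.
\[ \sup_{(\sigma, M, R) \in \mathbb R^3_+} \ \limsup_{n \to \infty} \ \sup_{\rho \in \mathcal M_{\sigma, M, R}} \frac{ \mathbb E_{\rho^{\otimes n}} \big[ \| \bar T^{s}(f_{\rho} - f^{\lambda_n}_D) \|^{p}_{\mathcal H_K} \big]^{1/p} }{a_n} < \infty . \]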
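To get a feeling for the quantities in (3.3) to (3.7), the following minimal sketch evaluates the regularization parameter, the rate and the admissible exponents for given $(n, \sigma, R, r, b, s)$. The function names are hypothetical, and the formulas are simply transcribed from the displays above.

```python
def regularization_parameter(n, sigma, R, r, b):
    """lambda_n from (3.3)."""
    return min((sigma**2 / (R**2 * n)) ** (b / (2 * b * r + b + 1)), 1.0)

def rate(n, sigma, R, r, b, s=0.0):
    """a_n from (3.4)."""
    return R * (sigma**2 / (R**2 * n)) ** (b * (r + s) / (2 * b * r + b + 1))

def alpha_bounds(r, b):
    """Strict upper bounds on alpha in m <= n**alpha from (3.5), (3.6), (3.7)."""
    d = 2 * b * r + b + 1
    return {
        "approximation (3.5)": 2 * b * min(r, 1.0) / d,
        "sample (3.6)": 2 * b * r / d,
        "corollary (3.7)": min(2 * b * r, b + 1) / d,
    }
```

For instance, with $b = 2$ and $r = 0.75$ all three bounds evaluate to $0.5$, and for $s = 0$ the rate exponent $b(r+s)/(2br+b+1)$ equals $0.25$.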

Combining Corollary 3.3 with Theorem 3.4 by applying the triangle inequality immediately yields:

Corollary 3.5. If the number $m$ of subsample sets satisfies
\[ m \le n^{\alpha}, \qquad \alpha < \frac{2b \min\{r, 1\}}{2br + b + 1}, \tag{3.8} \]
then
\[ \sup_{(\sigma, M, R) \in \mathbb R^3_+} \ \limsup_{n \to \infty} \ \sup_{\rho \in \mathcal M_{\sigma, M, R}} \frac{ \mathbb E_{\rho^{\otimes n}} \big[ \| \bar T^{s}(f^{\lambda_n}_D - \bar f^{\lambda_n}_D) \|^{p}_{\mathcal H_K} \big]^{1/p} }{a_n} < \infty . \]

4. Numerical Studies

In this section we numerically study the error in $\mathcal H_K$-norm, corresponding to $s = 0$ in Corollary 3.3 (in expectation with $p = 2$), both in the single machine and distributed learning setting. Our main interest is to study the upper bound for our theoretical exponent $\alpha$, parametrizing the size of subsamples in terms of the total sample size, $m = n^{\alpha}$, in different smoothness regimes. In addition we shall demonstrate in which way parallelization serves as a form of regularization.

More specifically, we let $\mathcal H_K = H^1_0[0, 1]$ with kernel $K(x, t) = x \wedge t - xt$. For all experiments in this section, we simulate data from the regression model
\[ Y_i = f_{\rho}(X_i) + \varepsilon_i, \qquad i = 1, \dots, n, \]
where the input variables $X_i \sim \mathrm{Unif}[0, 1]$ are uniformly distributed and the noise variables $\varepsilon_i \sim N(0, \sigma^2)$ are normally distributed with standard deviation $\sigma = 0.005$. We choose the target function $f_{\rho}$ according to two different cases, namely $r < 1$ (low smoothness) and $r = \infty$ (high smoothness). To accurately determine the degree of smoothness $r > 0$, we apply Proposition 4.1 below by explicitly calculating the Fourier coefficients $(\langle f_{\rho}, e_j \rangle_{\mathcal H_K})_{j \in \mathbb N}$, where $e_j(x) = \frac{\sqrt 2}{\pi j} \sin(\pi j x)$, for $j \in \mathbb N$, forms an ONB of $\mathcal H_K$. Recall that the rate of eigenvalue decay is explicitly given by $b = 2$, meaning that we have full control over all parameters in (3.8). From [8] we need

Proposition 4.1. Let $H_1$, $H_2$ be separable Hilbert spaces and $S : H_1 \to H_2$ be a compact linear operator with singular system $\{\sigma_j, \varphi_j, \psi_j\}$.² Denoting by $S^{\dagger}$ the generalized inverse³ of $S$, one has for any $r > 0$ and $g \in H_2$: $g$ is in the domain of $S^{\dagger}$ and $S^{\dagger} g \in \mathrm{Im}((S^* S)^{r})$ if and only if
\[ \sum_{j=0}^{\infty} \frac{ |\langle g, \psi_j \rangle_{H_2}|^2 }{ \sigma_j^{2 + 4r} } < \infty . \]

² i.e., the $\varphi_j$ are the normalized eigenfunctions of $S^* S$ with eigenvalues $\sigma_j^2$ and $\psi_j = S\varphi_j / \|S\varphi_j\|$; thus $S = \sum_j \sigma_j \langle \varphi_j, \cdot \rangle \psi_j$.

³ the unique unbounded linear operator with domain $\mathrm{Im}(S) \oplus (\mathrm{Im}(S))^{\perp}$ in $H_2$ vanishing on $(\mathrm{Im}(S))^{\perp}$ and satisfying $S S^{\dagger} = 1$ on $\mathrm{Im}(S)$, with range orthogonal to the null space $N(S)$.

In our case, $H_1 = \mathcal H_K$ is as above, $H_2$ is $L^2([0, 1])$ with Lebesgue measure and $S : H^1_0[0, 1] \to L^2([0, 1])$ is the inclusion. Since $H^1_0[0, 1]$ is dense in $L^2([0, 1])$, we know that $(\mathrm{Im}(S))^{\perp}$ is trivial, giving $S S^{\dagger} = 1$ on $\mathrm{Im}(S)$. Furthermore, $\varphi_j = e_j$ is a normalized eigenbasis of $T = S^* S$ with eigenvalues $\sigma_j^2 = (\pi j)^{-2}$. With $\psi_j = S\varphi_j / \|S\varphi_j\|_{L^2}$ we obtain for $f \in H^1_0[0, 1]$
\[ \langle Sf, \psi_j \rangle_{L^2} = \Big\langle Sf, \frac{S e_j}{\|S e_j\|_{L^2}} \Big\rangle_{L^2} = \Big\langle f, \frac{S^* S e_j}{\|S e_j\|_{L^2}} \Big\rangle_{H^1_0} = \sigma_j \langle f, e_j \rangle_{H^1_0} . \]
Thus, applying Proposition 4.1 gives

Corollary 4.2. For $S$ and $T = S^* S$ as above we have for any $r > 0$: $f \in \mathrm{Im}(T^{r})$ if and only if
\[ \sum_{j=1}^{\infty} j^{4r}\, |\langle f, e_j \rangle_{H^1_0}|^2 < \infty . \]

Thus, as expected, abstract smoothness measured by the parameter $r$ in the source condition corresponds in this special case to decay of the classical Fourier coefficients which, by the classical theory of Fourier series, measures smoothness of the periodic continuation of $f \in L^2([0, 1])$ to the real line.

4.0.1. Low smoothness

We choose $f_{\rho}(x) = \frac12 x(1 - x)$, which clearly belongs to $\mathcal H_K$. A straightforward calculation gives the Fourier coefficient $\langle f_{\rho}, e_j \rangle_{\mathcal H_K} = 2\sqrt 2\, (\pi j)^{-2}$ for $j$ odd (vanishing for $j$ even). Thus, by the above criterion, $f_{\rho}$ satisfies the source condition $f_{\rho} \in \mathrm{Ran}(T^{r})$ precisely for $0 < r < 0.75$. According to Theorem 3.4, the worst case rate in the single machine problem is given by $n^{-\gamma}$, with $\gamma = 0.25$. Regularization is done using the $\nu$-method (see Example 2.4), with qualification $q = \nu = 1$. Recall that the stopping index $k_{\mathrm{stop}}$ serves as the regularization parameter, where $\lambda \sim k_{\mathrm{stop}}^{-2}$. We consider sample sizes from 500 to 9000.

In the model selection step, we estimate the performance of different models and choose the oracle stopping time $\hat k_{\mathrm{oracle}}$ by minimizing the reconstruction error:
\[ \hat k_{\mathrm{oracle}} = \arg\min_{k} \left( \frac{1}{M} \sum_{j=1}^{M} \big\| f_{\rho} - \hat f^{k}_{j} \big\|^2_{\mathcal H_K} \right)^{1/2} \]
over $M = 30$ runs.

In the model assessment step, we partition the dataset into $m \sim n^{\alpha}$ subsamples, for any $\alpha \in \{0, 0.05, 0.1, \dots, 0.85\}$. On each subsample we regularize using the oracle stopping time $\hat k_{\mathrm{oracle}}$ (determined by using the whole sample). Corresponding to Corollary 3.3, the accuracy should be comparable to the one using the whole sample as long as $\alpha < 0.5$. In Figure 1 (left panel) we plot the reconstruction error $\|\bar f^{\hat k} - f_{\rho}\|_{\mathcal H_K}$ versus the ratio $\alpha = \log(m) / \log(n)$ for different sample sizes. We execute each simulation $M = 30$ times. The plot supports our theoretical finding. The right panel shows the reconstruction error versus the total number of samples using different partitions of the data. The black curve ($\alpha = 0$) corresponds to the baseline error ($m = 1$, no partition of data). Error curves below a threshold $\alpha < 0.6$ are roughly comparable, whereas curves above this threshold show a gap in performances.
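The Fourier coefficients used in the low smoothness case can be checked numerically. The sketch below assumes the basis $e_j(x) = \frac{\sqrt 2}{\pi j}\sin(\pi j x)$ and the inner product $\langle f, g \rangle_{H^1_0} = \int_0^1 f'(x) g'(x)\,dx$, and approximates the integral with a midpoint rule; the function name and the grid size are illustrative choices.

```python
import numpy as np

def fourier_coefficient(j, num=200_000):
    """<f_rho, e_j> in H_0^1[0,1] for f_rho(x) = x(1-x)/2, via the midpoint rule."""
    x = (np.arange(num) + 0.5) / num                 # midpoints of a uniform grid
    d_f = 0.5 - x                                    # derivative of f_rho
    d_e = np.sqrt(2.0) * np.cos(np.pi * j * x)       # derivative of e_j
    return np.mean(d_f * d_e)
```

Under these assumptions the coefficients decay like $j^{-2}$ for odd $j$, so $\sum_j j^{4r} |\langle f_{\rho}, e_j \rangle|^2$ is finite exactly when $4r - 4 < -1$, which is the range $r < 0.75$ stated above.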

In another experiment we study the performances in case of (very) different regularization: only partitioning the data (no regularization), underregularization (higher stopping index) and overregularization (lower stopping index). The outcome of this experiment amplifies the regularization effect of parallelizing. Figure 2 shows the main point: overregularization is always hopeless, underregularization is better. In the extreme case of no or almost no regularization there is a sharp minimum in the reconstruction error which is only slightly larger than the minimax optimal value for the oracle regularization parameter and which is achieved at an attractively large degree of parallelization. Qualitatively, this agrees very well with the intuitive notion that parallelizing serves as regularization.

We emphasize that numerical results seem to indicate that parallelization is possible to a slightly larger degree than indicated by our theoretical estimate. A similar result was reported in the paper [21], which also treats the low smoothness case.

4.0.2. High smoothness

We choose $f_{\rho}(x) = \frac{1}{2\pi} \sin(2\pi x)$, which corresponds to just one non-vanishing Fourier coefficient and by our criterion Corollary 4.2 has $r = \infty$. In view of our main Corollary 3.3 this requires a regularization method with higher qualification; we take the Gradient Descent method (see Example 2.3). The appearance of the term $2b \min\{1, r\}$ in our theoretical result 3.3 gives a predicted value $\alpha = 0$ (and would imply that parallelization is strictly forbidden for infinite smoothness). More specifically, the left panel in Figure 3 shows the absence of any plateau for the reconstruction error as a function of $\alpha$. This corresponds to the right panel showing that no group of values of $\alpha$ performs roughly equivalently, meaning that we do not have any optimality guarantees. Plotting different values of regularization in Figure 4 we again identify overregularization as hopeless, while severe underregularization exhibits a sharp minimum in the reconstruction error. But its value at roughly 0.25 is much less attractive compared to the case of low smoothness, where the error is an order of magnitude less.
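A small simulation in the spirit of the low smoothness experiment can be sketched as follows. It is not a reproduction of the figures: kernel ridge regression is swapped in for the $\nu$-method, and the sample size, the value of $\lambda$ and the grid of $m$ are arbitrary illustrative choices. The $\mathcal H_K$-norm error is computed exactly from the reproducing property, using $\|f_{\rho}\|^2_{H^1_0} = \int_0^1 (\tfrac12 - x)^2 dx = \tfrac{1}{12}$.

```python
import numpy as np

def kernel(s, t):
    """K(x, t) = min(x, t) - x t, the reproducing kernel of H_0^1[0, 1]."""
    return np.minimum(s, t) - s * t

def f_rho(x):
    """Low smoothness target f_rho(x) = x (1 - x) / 2."""
    return 0.5 * x * (1.0 - x)

def rkhs_error(x, y, m, lam):
    """H_K-norm error of the average of m local KRR estimators (requires m | n)."""
    centers, coefs = [], []
    for xs, ys in zip(np.split(x, m), np.split(y, m)):
        k = len(xs)
        gram = kernel(xs[:, None], xs[None, :])
        # Local KRR coefficients, already weighted by 1/m for the average.
        coefs.append(np.linalg.solve(gram + k * lam * np.eye(k), ys) / m)
        centers.append(xs)
    c, a = np.concatenate(centers), np.concatenate(coefs)
    # ||sum_i a_i K_{c_i} - f_rho||^2 = a'Ka - 2 a'f_rho(c) + ||f_rho||^2
    sq = a @ kernel(c[:, None], c[None, :]) @ a - 2.0 * a @ f_rho(c) + 1.0 / 12.0
    return np.sqrt(max(sq, 0.0))

rng = np.random.default_rng(0)
n, sigma, lam = 512, 0.005, 1e-3          # illustrative values only
x = rng.uniform(size=n)
y = f_rho(x) + sigma * rng.normal(size=n)
errors = {m: rkhs_error(x, y, m, lam) for m in (1, 2, 4, 8, 16, 32, 64)}
```

Plotting `errors` against $\log(m)/\log(n)$ for several $n$ gives a rough analogue of the left panels described in this section; any quantitative agreement with the reported figures should not be expected from this simplified setup.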

Figure 1. The reconstruction error $\|\bar f^{\hat k_{\mathrm{oracle}}}_D - f_{\rho}\|_{\mathcal H_K}$ in the low smoothness case. Left plot: reconstruction error curves for various (but fixed) sample sizes as a function of the number of partitions. Right plot: reconstruction error curves for various (but fixed) numbers of partitions as a function of the sample size (on log-scale).

Figure 2. The reconstruction error $\|\bar f^{\lambda}_D - f_{\rho}\|_{\mathcal H_K}$ in the low smoothness case. Left plot: error curves for different stopping times for $n = 500$ samples, as a function of the number of partitions. Right plot: error curves for different stopping times for $n = 5000$ samples, as a function of the number of partitions.

Figure 3. The reconstruction error $\|\bar f^{\lambda_{\mathrm{oracle}}}_D - f_{\rho}\|_{\mathcal H_K}$ in the high smoothness case. Left plot: reconstruction error curves for various (but fixed) sample sizes as a function of the number of partitions. Right plot: reconstruction error curves for various (but fixed) numbers of partitions as a function of the sample size (on log-scale).

Figure 4. The reconstruction error $\|\bar f^{\lambda}_D - f_{\rho}\|$ in the high smoothness case. Left plot: error curves for different stopping times for $n = 500$ samples, as a function of the number of partitions. Right plot: error curves for different stopping times for $n = 5000$ samples, as a function of the number of partitions.

5. Discussion

Minimax Optimality: We have shown that for a large class of spectral regularization methods the error of the distributed algorithm $\|\bar T^{s}(\bar f^{\lambda_n}_D - f_{\rho})\|_{\mathcal H_K}$ satisfies the same upper bound as the error $\|\bar T^{s}(f^{\lambda_n}_D - f_{\rho})\|_{\mathcal H_K}$ for the single machine problem, if the regularization parameter $\lambda_n$ is chosen according to (3.3), provided the number of subsamples grows sufficiently slowly with the sample size $n$. Since, by [2], the rates for the latter are minimax optimal, our rates in Corollary 3.3 are minimax optimal also.

Comparison with other results: In [21] the authors derive minimax-optimal rates in 3 cases: finite rank kernels, sub-Gaussian decay of eigenvalues of the kernel and polynomial decay, provided $m$ satisfies a certain upper bound, depending on the rate of decay of the eigenvalues, under two crucial assumptions on the eigenfunctions of the integral operator associated to the kernel: for any $j \in \mathbb N$
\[ \mathbb E[\phi_j(X)^{2k}] \le \rho_k^{2k}, \tag{5.1} \]
for some $k \ge 2$ and $\rho_k < \infty$, or, even stronger, it is assumed that the eigenfunctions are uniformly bounded, i.e.
\[ \sup_{x \in \mathcal X} |\phi_j(x)| \le \rho, \tag{5.2} \]

for any $j \in \mathbb N$ and some $\rho < \infty$. We shall describe in more detail the case of polynomially decaying eigenvalues, which corresponds to our setting. Assuming eigenvalue decay $\mu_j \lesssim j^{-b}$ with $b > 1$, the authors choose a regularization parameter $\lambda_n = n^{-\frac{b}{b+1}}$ and
\[ m \lesssim \left( \frac{ n^{\frac{b(k-4) - k}{b+1}} }{ \rho^{4k} \log^{k}(n) } \right)^{\frac{1}{k-2}}, \]
leading to an error in $L^2$-norm
\[ \mathbb E\big[ \| \bar f^{\lambda_n}_D - f_{\rho} \|^2_{L^2} \big] \lesssim n^{-\frac{b}{b+1}}, \]
being minimax optimal. For $k < 4$, this is not a useful bound, since it forces $m \lesssim 1$ as $n \to \infty$ in this case (for any sort of eigenvalue decay). On the other hand, if $k$ and $b$ might be taken arbitrarily large, corresponding to almost bounded eigenfunctions and arbitrarily large polynomial decay of eigenvalues, $m$ might be chosen proportional to $n^{1 - \epsilon}$, for any $\epsilon > 0$. As might be expected, replacing the $L^{2k}$ bound on the eigenfunctions by a bound in $L^{\infty}$ gives an upper bound on $m$ which simply is the limit for $k \to \infty$ in the bound given above, namely
\[ m \lesssim \frac{ n^{\frac{b-1}{b+1}} }{ \rho^{4} \log n }, \]
which for large $b$ behaves as above. Granted bounds on the eigenfunctions in $L^{2k}$ for (very) large $k$, this is a strong result. While the decay rate of the eigenvalues can be determined by the smoothness of $K$ (see, e.g., [9] and references therein), it is a widely open question which general properties of the kernel imply estimates as in (5.1) and (5.2) on the eigenfunctions.

The author in [22] even gives a counterexample and presents a $C^{\infty}$ Mercer kernel on $[0, 1]$ where the eigenfunctions of the corresponding integral operator are not uniformly bounded. Thus, smoothness of the kernel is not a sufficient condition for (5.2) to hold. Moreover, we point out that the upper bound (5.1) on the eigenfunctions (and thus the upper bound for $m$ in [21]) depends on the (unknown) marginal distribution $\nu$ (only the strongest assumption, a bound in sup-norm (5.2), does not depend on $\nu$). Concerning this point, our approach is agnostic.

As already mentioned in the Introduction, these bounds on the eigenfunctions have been eliminated in [16], for KRR, imposing polynomial decay of eigenvalues as above. This is very similar to our approach. As a general rule, our bounds on $m$ and the bounds in [16] are worse than the bounds in [21] for eigenfunctions in (or close to) $L^{\infty}$, but in the complementary case where nothing is known on the eigenfunctions, $m$ still can be chosen as an increasing function of $n$, namely $m = n^{\alpha}$. More precisely, choosing $\lambda_n$ as in (3.3), the authors in [16] derive as an upper bound
\[ m \lesssim n^{\alpha}, \qquad \alpha = \frac{2br}{2br + b + 1}, \]

with $r$ being the smoothness parameter arising in the source condition. We recall here that due to our assumption $q \ge r + s$, the smoothness parameter $r$ is restricted to the interval $(0, \frac12]$ for KRR ($q = 1$) and $L^2$ risk ($s = \frac12$).

Our results (which hold for a general class of spectral regularization methods) are in some ways comparable to [16]. Specialized to KRR, our estimates for the exponent $\alpha$ in $m = O(n^{\alpha})$ coincide with the result given in [16]. Furthermore we emphasize that [21] and [16] estimate the DL-error only for $s = 1/2$ in our notation (corresponding to $L^2(\nu)$-norm), while our result holds for all values of $s \in [0, 1/2]$, which smoothly interpolates between $L^2(\nu)$-norm and RKHS-norm, and, in addition, for all values of $p \in [1, \infty)$. Thus, our results also apply to the case of non-parametric inverse regression, where one is particularly interested in the reconstruction error (i.e. $\mathcal H_K$-norm), see e.g. [2]. Additionally, we precisely analyze the dependence on the noise variance $\sigma^2$ and the complexity radius $R$ in the source condition.

Concerning general strategy, while [16] uses a novel second order decomposition in an essential way, our approach is more classical. We clearly distinguish between estimating the approximation error and the sample error. The bias using a subsample should be of the same order as when using the whole sample, whereas the estimation error is higher on each subsample, but gets reduced by averaging, by writing the variance as a sum of i.i.d. random variables (which allows us to use Rosenthal's inequality).

Finally, we want to mention the recent works [15] and [12], which were worked out independently from our work. The authors in [12] also treat general spectral regularization methods (going beyond kernel ridge) and obtain essentially the same results, but with error bounds only in $L^2$-norm, excluding inverse learning problems. In [15], the authors investigate distributed learning on the example of Gradient Descent algorithms, which have infinite qualification and allow larger smoothness of the regression function. They are able to improve the upper bound for the number of local machines to
\[ m \lesssim \frac{n^{\alpha}}{\log^{5}(n) + 1}, \qquad \alpha < \frac{br}{2br + b + 1}, \]
which is larger in case $r > 2$. In the intermediate case $1 < r < 2$, our bound in (3.7) is still better. An interesting feature is the fact that it is possible to allow more local machines by using additional unlabeled data. This indicates that finding the upper bound for the number of machines in the high smoothness regime is still an open problem.

Number of Subsamples: We follow the line of reasoning in earlier work on distributed learning insofar as we only prove sufficient conditions on the cardinality $m = n^{\alpha}$ of subsamples compatible with minimax optimal rates of convergence. On the complementary problem of proving necessity, analytical results are unknown to the best of our knowledge. However, our numerical results seem to indicate that the exponent $\alpha$ might actually be taken larger than we have proved so far in the low smoothness regime.

Adaptivity: It is clear from the theoretical results that both the regularization parameter $\lambda$ and the allowed cardinality of subsamples $m$ depend on the parameters $r$ and $b$,

which in general are unknown. Thus, an adaptive approach to both parameters $b$ and $r$ for choosing $\lambda$ and $m$ is of interest. To the best of our knowledge, there are as yet no rigorous results on adaptivity in this more general sense. Progress in this field may well be crucial in finally assessing the relative merits of the distributed learning approach as compared with alternative strategies to effectively deal with large data sets.

We sketch an alternative naive approach to adaptivity, based on hold-out in the direct case, where we consider each $f \in \mathcal H_K$ also as a function in $L^2(\mathcal X, \nu)$. We split the data $\mathbf z \in (\mathcal X \times \mathcal Y)^n$ into a training and validation part $\mathbf z = (\mathbf z^{t}, \mathbf z^{v})$ of cardinality $m_t$, $m_v$. We further subdivide $\mathbf z^{t}$ into $m_k$ subsamples, roughly of size $m_t / m_k$, where $m_k \le m_t$, $k = 1, 2, \dots$, is some strictly decreasing sequence. For each $k$ and each subsample $\mathbf z_j$, $1 \le j \le m_k$, we define the estimators $\hat f^{\lambda}_{\mathbf z_j}$ as in (2.7) and their average
\[ \bar f^{\lambda}_{k, \mathbf z^{t}} := \frac{1}{m_k} \sum_{j=1}^{m_k} \hat f^{\lambda}_{\mathbf z_j} . \tag{5.3} \]
Here, $\lambda$ varies in some sufficiently fine lattice $\Lambda$. Then evaluation on $\mathbf z^{v}$ gives the associated empirical $L^2$ error
\[ \mathrm{Err}^{\lambda}_{k}(\mathbf z^{v}) := \frac{1}{m_v} \sum_{i=1}^{m_v} \big| y^{v}_i - \bar f^{\lambda}_{k, \mathbf z^{t}}(x^{v}_i) \big|^2, \qquad \mathbf z^{v} = (\mathbf y^{v}, \mathbf x^{v}), \quad \mathbf y^{v} = (y^{v}_1, \dots, y^{v}_{m_v}), \tag{5.4} \]
leading us to define
\[ \hat\lambda_k := \operatorname{Argmin}_{\lambda \in \Lambda} \mathrm{Err}^{\lambda}_{k}(\mathbf z^{v}), \qquad \mathrm{Err}(k) := \mathrm{Err}^{\hat\lambda_k}_{k}(\mathbf z^{v}) . \tag{5.5} \]
Then, an appropriate stopping criterion for $k$ might be to stop at
\[ k^{*} := \min\{ k \ge 3 : \Delta(k) \le \delta \inf_{2 \le j < k} \Delta(j) \}, \qquad \Delta(j) := |\mathrm{Err}(j) - \mathrm{Err}(j - 1)|, \tag{5.6} \]
for some $\delta < 1$ (which might require tuning). The corresponding regularization parameter is $\hat\lambda^{*} = \hat\lambda_{k^{*}}$, given by (5.5). At least intuitively, it is then reasonable to define a purely data driven estimator as
\[ \hat f_n := \bar f^{\hat\lambda^{*}}_{k^{*}, \mathbf z^{t}} . \tag{5.7} \]
Note that the training data $\mathbf z^{t}$ enter the definition of $\hat f_n$ via the explicit formula (5.3) encoding our kernel based approach, while $\mathbf z^{v}$ serves to determine $(k^{*}, \hat\lambda^{*})$ via minimization of the empirical $L^2$ error and some form of the discrepancy principle, which tells one to stop where $\mathrm{Err}(j)$ does not appreciably improve anymore. It is open if such a procedure achieves optimal rates, and we have to leave this for future research.

6. Proofs

For ease of reading we make use of the following conventions: we are interested in a precise dependence of multiplicative constants on the parameters $\sigma$, $M$, $R$, $\eta$, $m$, $n$ and $p$;

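• the dependence of multiplicative constants on various other parameters, including the kernel parameter $\kappa$, the norm parameter $s \in [0, \tfrac12]$, the parameters arising from the regularization method, $b > 1$, $\beta > 0$, $r > 0$, etc. will (generally) be omitted and simply indicated by the symbol $\blacktriangle$
• the value of $C_{\blacktriangle}$ might change from line to line
• the expression "for $n$ sufficiently large" means that the statement holds for $n \ge n_0$, with $n_0$ potentially depending on all model parameters (including $\sigma$, $M$ and $R$), but not on $\eta$.

6.1 Preliminaries

For proving our error bounds, we recall some results (without proof) from [5]. We introduce the effective dimension $\mathcal N(\lambda)$, being a measure for the complexity of $\mathcal H_K$ with respect to the marginal distribution $\nu$: For $\lambda \in (0, 1]$ we set

\[ \mathcal N(\lambda) = \operatorname{tr}\bigl( (\bar T + \lambda)^{-1} \bar T \bigr). \tag{6.1} \]

Since the operator $\bar T$ is trace-class, $\mathcal N(\lambda) < \infty$. Moreover, $\mathcal N(\lambda)$ satisfies

\[ \frac12 \le \mathcal N(\lambda) \le \frac{\beta b}{b-1} (\kappa^2 \lambda)^{-1/b}, \]

provided the marginal distribution $\nu$ of $X$ belongs to $\mathcal P^{<}(b, \beta)$ with $b > 1$ and $\beta > 0$ (see [5], Proposition 3).

Proposition 6.1 ([12], Proposition 1). Let $x_1, \ldots, x_n$ be an iid sample, drawn according to $\nu$ on $\mathcal X$. Define

\[ B_n(\lambda) := 1 + \left( \frac{2}{n\lambda} + \sqrt{\frac{\mathcal N(\lambda)}{n\lambda}} \right)^2. \tag{6.2} \]

For any $\lambda > 0$, $\eta \in (0, 1]$, with probability at least $1 - \eta$ one has

\[ \bigl\| (\bar T_{\mathbf x} + \lambda)^{-1} (\bar T + \lambda) \bigr\| \le 8 \log^2(2\eta^{-1})\, B_n(\lambda). \tag{6.3} \]

Corollary 6.2. Let $\eta \in (0, 1)$. For $n \in \mathbb N$ let $\tilde\lambda_n$ be implicitly defined as the unique solution of $\mathcal N(\tilde\lambda_n) = n \tilde\lambda_n$. Then for any $\tilde\lambda_n \le \lambda \le 1$ one has

\[ B_n(\lambda) \le 26. \]

In particular,

\[ \bigl\| (\bar T_{\mathbf x} + \lambda)^{-1} (\bar T + \lambda) \bigr\| \le 208 \log^2(2\eta^{-1}), \]

with probability at least $1 - \eta$.

Proof of Corollary 6.2. Let $\tilde\lambda_n$ be defined via $\mathcal N(\tilde\lambda_n) = n \tilde\lambda_n$. Since $\mathcal N(\lambda)/\lambda$ is decreasing, we have for any $\lambda \ge \tilde\lambda_n$

\[ \frac{\mathcal N(\lambda)}{n\lambda} \le \frac{\mathcal N(\tilde\lambda_n)}{n \tilde\lambda_n} = 1. \]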

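Since the effective dimension is lower bounded by $\tfrac12$, by the inequality above

\[ \frac{1}{n\lambda} \le \frac{2\,\mathcal N(\lambda)}{n\lambda} \le 2 \]

for any $\lambda \ge \tilde\lambda_n$. Inserting these bounds into (6.3) and noticing that $1 \le 2\log(2\eta^{-1})$ for any $\eta \in (0, 1)$ leads to the conclusion.

Corollary 6.3. If $\lambda_n$ is defined by (3.3) and if

\[ m_n \le n^{\alpha}, \qquad \alpha < \frac{2br}{2br + b + 1}, \]

one has

\[ B_{n/m_n}(\lambda_n) \le 2, \]

provided $n$ is sufficiently large.

Proof of Corollary 6.3. Recall that $\mathcal N(\lambda_n) \le C_b \lambda_n^{-1/b}$ and

\[ \sigma \sqrt{\frac{\lambda_n^{-1/b}}{n\lambda_n}} = R \lambda_n^{r}. \]

Using the definition of $\lambda_n$ in (3.3) yields

\[ \frac{2 m_n}{n\lambda_n} = o\bigl( \sqrt{m_n}\, \lambda_n^{r} \bigr), \]

provided

\[ m_n \le n^{\alpha}, \qquad \alpha < \frac{2(br + 1)}{2br + b + 1}. \]

Finally, $\sqrt{m_n}\, \lambda_n^{r} = o(1)$ if

\[ m_n \le n^{\alpha}, \qquad \alpha < \frac{2br}{2br + b + 1}. \]

We shortly illustrate how Corollary 6.2 and Proposition 6.1 will be used. Let $u \in [0, 1]$, $\tilde\lambda_n \le \lambda$ as above and $f \in \mathcal H_K$. We have

\[
\begin{aligned}
\| \bar T^{u} f \|_{\mathcal H_K}
&= \bigl\| \bar T^{u} (\bar T + \lambda)^{-u} (\bar T + \lambda)^{u} (\bar T_{\mathbf x} + \lambda)^{-u} (\bar T_{\mathbf x} + \lambda)^{u} f \bigr\|_{\mathcal H_K} \\
&\le \bigl\| \bar T^{u} (\bar T + \lambda)^{-u} \bigr\| \, \bigl\| (\bar T + \lambda)^{u} (\bar T_{\mathbf x} + \lambda)^{-u} \bigr\| \, \bigl\| (\bar T_{\mathbf x} + \lambda)^{u} f \bigr\|_{\mathcal H_K} \\
&\le 8 \log^{2u}(2\eta^{-1})\, B_n(\lambda)^{u}\, \bigl\| (\bar T_{\mathbf x} + \lambda)^{u} f \bigr\|_{\mathcal H_K},
\end{aligned}
\tag{6.4}
\]

with probability at least $1 - \eta$, for any $\eta \in (0, 1)$. In particular, for any $\tilde\lambda_n \le \lambda$ (with $\tilde\lambda_n$ as in Corollary 6.2)

\[ \| \bar T^{u} f \|_{\mathcal H_K} \le 208^{u} \log^{2u}(2\eta^{-1})\, \bigl\| (\bar T_{\mathbf x} + \lambda)^{u} f \bigr\|_{\mathcal H_K}, \tag{6.5} \]

with probability at least $1 - \eta$. In the following, we constantly use (6.5).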
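The quantities of this subsection are easy to evaluate numerically once a spectrum is fixed. The following sketch is only an illustration under stated assumptions: the eigenvalues of $\bar T$ are taken to be $\mu_j = j^{-b}$ with $b = 2$, truncated at 2000 terms (a hypothetical choice, not taken from the paper), and the helper names are ours. It computes $\mathcal N(\lambda)$ from (6.1), solves $\mathcal N(\tilde\lambda_n) = n\tilde\lambda_n$ by bisection, and checks the bound $B_n(\lambda) \le 26$ of Corollary 6.2 on a grid.

```python
import numpy as np

def effective_dimension(lam, eigs):
    """N(lam) = tr((T + lam)^{-1} T) for an operator with eigenvalues `eigs`."""
    return float(np.sum(eigs / (eigs + lam)))

def B_n(lam, n, eigs):
    """B_n(lam) as in (6.2)."""
    N = effective_dimension(lam, eigs)
    return 1.0 + (2.0 / (n * lam) + np.sqrt(N / (n * lam))) ** 2

def solve_lambda_tilde(n, eigs, lo=1e-12, hi=1.0, iters=200):
    """Bisection (on a log scale) for the root of N(lam) = n * lam.

    N(lam) - n * lam is strictly decreasing in lam, so the root is unique.
    """
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if effective_dimension(mid, eigs) > n * mid:
            lo = mid
        else:
            hi = mid
    return hi

# Assumed spectrum (illustrative only): polynomial decay with b = 2.
eigs = np.arange(1, 2001, dtype=float) ** (-2.0)
n = 1000
lam_tilde = solve_lambda_tilde(n, eigs)
grid = np.geomspace(lam_tilde, 1.0, 50)
max_B = max(B_n(lam, n, eigs) for lam in grid)
print(f"lambda_tilde = {lam_tilde:.3e}, max B_n on grid = {max_B:.3f}")
```

Because the largest assumed eigenvalue equals 1, $\mathcal N(\lambda) \ge 1/2$ holds for all $\lambda \le 1$, which is the lower bound used in the proof above.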

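6.2 Approximation Error Bound

Recall that $\nu$ denotes the input sampling distribution and $\mathcal P$ the set of all probability distributions on the input space $\mathcal X$.

Lemma 6.4. Let $\nu \in \mathcal P$, $v \in \mathbb R$ and let $\mathbf x \in \mathcal X^n$ be an iid sample, drawn according to $\nu$. Assume the regularization $(g_\lambda)_\lambda$ has qualification $q \ge v + s + 1$. Then with probability at least $1 - \eta$

\[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x}) \bar T_{\mathbf x}^{v} (\bar T - \bar T_{\mathbf x}) \bigr\|_{\mathcal H_K} \le C_{\blacktriangle} \log^{4}(4\eta^{-1})\, \lambda^{s+v+1} B_n^{s+1}(\lambda) \left( \frac{2}{n\lambda} + \sqrt{\frac{\mathcal N(\lambda)}{n\lambda}} \right), \]

for some $C_{\blacktriangle} < \infty$.

Proof of Lemma 6.4. From (6.4) and from Proposition A.1, since $q \ge s + v + 1$, one has

\[
\begin{aligned}
\bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x}) \bar T_{\mathbf x}^{v} (\bar T - \bar T_{\mathbf x}) \bigr\|_{\mathcal H_K}
&\le C_{\blacktriangle} \log^{2(s+1)}(4\eta^{-1})\, B_n^{s+1}(\lambda)\, \bigl\| (\bar T_{\mathbf x} + \lambda)^{s} r_\lambda(\bar T_{\mathbf x}) \bar T_{\mathbf x}^{v} (\bar T_{\mathbf x} + \lambda) \bigr\| \, \bigl\| (\bar T + \lambda)^{-1} (\bar T - \bar T_{\mathbf x}) \bigr\| \\
&\le C_{\blacktriangle} \log^{4}(4\eta^{-1})\, \lambda^{s+v+1} B_n^{s+1}(\lambda) \left( \frac{2}{n\lambda} + \sqrt{\frac{\mathcal N(\lambda)}{n\lambda}} \right),
\end{aligned}
\]

for any $\lambda \in (0, 1]$, $\eta \in (0, 1]$, with probability at least $1 - \eta$. We also used that $s \le \tfrac12$.

Lemma 6.5. Let $\nu \in \mathcal P$, $v \in \mathbb R$ and let $\mathbf x \in \mathcal X^n$ be an iid sample, drawn according to $\nu$. Assume the regularization $(g_\lambda)_\lambda$ has qualification $q \ge v + s$. Then for any $\lambda \in (0, 1]$, $\eta \in (0, 1]$, with probability at least $1 - \eta$

\[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x}) \bar T_{\mathbf x}^{v} \bigr\| \le C_{\blacktriangle} \log^{2s}(2\eta^{-1})\, B_n^{s}(\lambda)\, \lambda^{s+v}, \]

for some $C_{\blacktriangle} < \infty$.

Proof of Lemma 6.5. Using (6.4), since $q \ge v + s$,

\[
\bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x}) \bar T_{\mathbf x}^{v} \bigr\|
\le C_{\blacktriangle} \log^{2s}(2\eta^{-1})\, B_n^{s}(\lambda)\, \bigl\| (\bar T_{\mathbf x} + \lambda)^{s} r_\lambda(\bar T_{\mathbf x}) \bar T_{\mathbf x}^{v} \bigr\|
\le C_{\blacktriangle} \log^{2s}(2\eta^{-1})\, B_n^{s}(\lambda)\, \lambda^{s+v},
\]

with probability at least $1 - \eta$.

Proposition 6.6 (Expectation of Approximation Error). Let $f_\rho \in \Omega_\nu(r, R)$, $\lambda \in (0, 1]$ and let $B_{n/m}(\lambda)$ be defined in (6.2). Assume the regularization has qualification $q \ge r + s$. For any $p \ge 1$ one has:

(1) If $r \le 1$, then

\[ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} (f_\rho - \tilde f^{\lambda}_D) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p} \le C_p R\, \lambda^{s+r} B^{s+r}_{n/m}(\lambda). \]

(2) If $r > 1$, then

\[ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} (f_\rho - \tilde f^{\lambda}_D) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p} \le C_p R\, \lambda^{s} B^{s+1}_{n/m}(\lambda) \left( \lambda^{r} + \lambda \left( \frac{2m}{n\lambda} + \sqrt{\frac{m\, \mathcal N(\lambda)}{n\lambda}} \right) \right). \]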

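In (1) and (2) the constant $C_p$ does not depend on $(\sigma, M, R) \in \mathbb R^3_+$.

Proof of Proposition 6.6. Since $f_\rho \in \Omega_\nu(r, R)$,

\[
\begin{aligned}
\mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} (f_\rho - \tilde f^{\lambda}_D) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p}
&= \mathbb E_{\rho^{\otimes n}} \Bigl[ \Bigl\| \frac1m \sum_{j=1}^{m} \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) f_\rho \Bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p} \\
&\le \frac1m \sum_{j=1}^{m} \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) f_\rho \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p} \\
&\le \frac{R}{m} \sum_{j=1}^{m} \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) \bar T^{r} \bigr\|^{p} \Bigr]^{1/p}.
\end{aligned}
\tag{6.6}
\]

The first inequality is just the triangle inequality for the $p$-norm $\| f \|_p = \mathbb E[\| f \|^{p}_{\mathcal H_K}]^{1/p}$. We bound the expectation for each separate subsample of size $n/m$ by first deriving a probabilistic estimate and then we integrate.

Consider first the case where $r \le 1$. Using (6.4) and Cordes Inequality Proposition A.3, one has for any $j = 1, \ldots, m$

\[
\begin{aligned}
\bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) \bar T^{r} \bigr\|
&\le C_{\blacktriangle} \log^{2(s+r)}(4\eta^{-1})\, B^{s+r}_{n/m}(\lambda)\, \bigl\| (\bar T_{\mathbf x_j} + \lambda)^{s} r_\lambda(\bar T_{\mathbf x_j}) (\bar T_{\mathbf x_j} + \lambda)^{r} \bigr\| \\
&\le C_{\blacktriangle} \log^{3}(4\eta^{-1})\, \lambda^{s+r} B^{s+r}_{n/m}(\lambda),
\end{aligned}
\]

with probability at least $1 - \eta$ and where $B_{n/m}(\lambda)$ is defined in (6.2). Recall that the regularization has qualification $q \ge r + s$. By integration one has

\[ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) \bar T^{r} \bigr\|^{p} \Bigr]^{1/p} \le C_{\blacktriangle,p}\, \lambda^{s+r} B^{s+r}_{n/m}(\lambda), \]

for some $C_{\blacktriangle,p} < \infty$, not depending on $\sigma, M, R$. Finally, from (6.6)

\[ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} (f_\rho - \tilde f^{\lambda}_D) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p} \le C_{\blacktriangle,p}\, R\, \lambda^{s+r} B^{s+r}_{n/m}(\lambda). \]

In the case where $r \ge 1$, we write $r = k + u$, with $k = \lfloor r \rfloor$ and $u = r - k < 1$. We shall use the decomposition

\[ \bar T^{k} = \sum_{l=0}^{k-1} \bar T_{\mathbf x}^{l} (\bar T - \bar T_{\mathbf x}) \bar T^{k-(l+1)} + \bar T_{\mathbf x}^{k}. \tag{6.7} \]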
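Decomposition (6.7) is a telescoping identity that holds for any two bounded operators, whether or not they commute. As a sanity check, the following sketch verifies it numerically; the random $5 \times 5$ positive semi-definite matrices standing in for $\bar T$ and $\bar T_{\mathbf x}$ and the helper name `random_psd` are our own illustrative assumptions.

```python
import numpy as np
from numpy.linalg import matrix_power as mpow

rng = np.random.default_rng(0)

def random_psd(d):
    """Random symmetric positive semi-definite d x d matrix (illustrative stand-in)."""
    G = rng.standard_normal((d, d))
    return G @ G.T / d

d, k = 5, 3
T, Tx = random_psd(d), random_psd(d)

# Right-hand side of (6.7): sum_{l=0}^{k-1} Tx^l (T - Tx) T^{k-(l+1)} + Tx^k.
telescoped = sum(mpow(Tx, l) @ (T - Tx) @ mpow(T, k - (l + 1)) for l in range(k))
rhs = telescoped + mpow(Tx, k)

identity_holds = np.allclose(mpow(T, k), rhs)
print("decomposition (6.7) holds numerically:", identity_holds)
```

The order of the factors matters in the sum, since the two matrices do not commute in general.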

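We proceed by bounding (6.6) according to decomposition (6.7). For any $j = 1, \ldots, m$ one has

\[
\begin{aligned}
\mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) \bar T^{k+u} \bigr\|^{p} \Bigr]^{1/p}
&\le \sum_{l=0}^{k-1} \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) \bar T_{\mathbf x_j}^{l} (\bar T - \bar T_{\mathbf x_j}) \bar T^{k-(l+1)+u} \bigr\|^{p} \Bigr]^{1/p}
+ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) \bar T_{\mathbf x_j}^{k} \bar T^{u} \bigr\|^{p} \Bigr]^{1/p} \\
&\le \sum_{l=0}^{k-1} \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) \bar T_{\mathbf x_j}^{l} (\bar T - \bar T_{\mathbf x_j}) \bigr\|^{p} \Bigr]^{1/p}
+ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) \bar T_{\mathbf x_j}^{k} \bar T^{u} \bigr\|^{p} \Bigr]^{1/p}.
\end{aligned}
\tag{6.8}
\]

Here we use that $\| \bar T^{k-(l+1)+u} \|$ is bounded by 1. By Lemma 6.5 and by (6.4), with probability at least $1 - \eta$

\[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) \bar T_{\mathbf x_j}^{k} \bar T^{u} \bigr\| \le C_{\blacktriangle} \log^{2(s+u)}(2\eta^{-1})\, B^{s+u}_{n/m}(\lambda)\, \lambda^{s+r}, \]

and thus integration yields

\[ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) \bar T_{\mathbf x_j}^{k} \bar T^{u} \bigr\|^{p} \Bigr]^{1/p} \le C_{\blacktriangle,p}\, B^{s+u}_{n/m}(\lambda)\, \lambda^{s+r}. \tag{6.9} \]

For estimating the first term in (6.8) we may use Lemma 6.4. For any $l = 0, \ldots, k-1$, $j = 1, \ldots, m$, with probability at least $1 - \eta$

\[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) \bar T_{\mathbf x_j}^{l} (\bar T - \bar T_{\mathbf x_j}) \bigr\| \le C_{\blacktriangle} \log^{4}(8\eta^{-1})\, \lambda^{s+l+1} B^{s+1}_{n/m}(\lambda) \left( \frac{2m}{n\lambda} + \sqrt{\frac{m\, \mathcal N(\lambda)}{n\lambda}} \right). \]

Again by integration, since $\lambda^{l} \le 1$ for any $l = 0, \ldots, k-1$, one has

\[ \sum_{l=0}^{k-1} \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} r_\lambda(\bar T_{\mathbf x_j}) \bar T_{\mathbf x_j}^{l} (\bar T - \bar T_{\mathbf x_j}) \bigr\|^{p} \Bigr]^{1/p} \le C_{\blacktriangle,p}\, r\, \lambda^{s+1} B^{s+1}_{n/m}(\lambda) \left( \frac{2m}{n\lambda} + \sqrt{\frac{m\, \mathcal N(\lambda)}{n\lambda}} \right). \tag{6.10} \]

Finally, combining (6.9) and (6.10) with (6.6) gives in the case where $r > 1$

\[ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} (f_\rho - \tilde f^{\lambda}_D) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p} \le C_{\blacktriangle,p}\, R\, \lambda^{s} B^{s+1}_{n/m}(\lambda) \left( \lambda^{r} + \lambda \left( \frac{2m}{n\lambda} + \sqrt{\frac{m\, \mathcal N(\lambda)}{n\lambda}} \right) \right). \]

The rest of the proof follows from (6.8).

Proof of Theorem 3.1. Let $\lambda_n$ be defined by (3.3). According to Corollary 6.3, we have $B_{n/m_n}(\lambda_n) \le 2$ provided $\alpha < \frac{2br}{2br+b+1}$. We immediately obtain from the first part of Proposition 6.6 in the case where $r \le 1$

\[ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} (f_\rho - \tilde f^{\lambda_n}_D) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p} \le C_{\blacktriangle,p}\, R\, \lambda_n^{s+r} = C_{\blacktriangle,p}\, a_n. \]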

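We turn to the case where $r > 1$. We apply the second part of Proposition 6.6. By Corollary 6.3 we have

\[
\begin{aligned}
\mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} (f_\rho - \tilde f^{\lambda_n}_D) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p}
&\le C_{\blacktriangle,p}\, R\, \lambda_n^{s} B^{s+1}_{n/m_n}(\lambda_n) \left( \lambda_n^{r} + \lambda_n \left( \frac{2 m_n}{n\lambda_n} + \sqrt{\frac{m_n\, \mathcal N(\lambda_n)}{n\lambda_n}} \right) \right) \\
&\le C_{\blacktriangle,p}\, R\, \lambda_n^{s} \left( \lambda_n^{r} + \lambda_n \left( \frac{2 m_n}{n\lambda_n} + \sqrt{\frac{m_n\, \mathcal N(\lambda_n)}{n\lambda_n}} \right) \right),
\end{aligned}
\]

where we used that $\mathcal N(\lambda_n) \le C_b \lambda_n^{-1/b}$ and the definition of $\lambda_n$. Observe that

\[ \frac{2 m_n}{n\lambda_n} = o\bigl( \sqrt{m_n}\, \lambda_n^{r} \bigr), \]

provided

\[ m_n \le n^{\alpha}, \qquad \alpha < \frac{2(br + 1)}{2br + b + 1}. \]

Furthermore, for $n$ sufficiently large, $\frac{R}{\sigma} \sqrt{m_n}\, \lambda_n \le 1$, provided that

\[ \alpha < \frac{2b}{2br + b + 1}. \]

As a result, for any $p \ge 1$

\[ \limsup_{n \to \infty}\ \sup_{\rho \in \mathcal M_{\sigma,M,R}} \frac{ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} (f_\rho - \tilde f^{\lambda_n}_D) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p} }{a_n} \le C_{\blacktriangle,p}, \]

for some $C_{\blacktriangle,p} < \infty$, not depending on $\sigma, M, R$.

6.3 Sample Error Bound

The main idea for deriving an upper bound for the sample error is to identify it as a sum of unbiased Hilbert space-valued iid variables and then to apply a suitable version of Rosenthal's inequality. Given $\lambda \in (0, 1]$, we define the random variable $\xi_\lambda : (\mathcal X \times \mathbb R)^{n/m} \to \mathcal H_K$ by

\[ \xi_\lambda(\mathbf x, \mathbf y) := \bar T^{s} g_\lambda(\bar T_{\mathbf x}) \bigl( \bar T_{\mathbf x} f_\rho - \bar S^{*}_{\mathbf x} \mathbf y \bigr). \]

Recall that according to Assumption 2.1, the conditional expectation wrt $\rho$ of $Y$ given $X$ satisfies

\[ \mathbb E_\rho[\, Y \mid X = x \,] = \bar S_x f_\rho, \]

implying that $\xi_\lambda$ is unbiased (since $\bar T_{\mathbf x} = \bar S^{*}_{\mathbf x} \bar S_{\mathbf x}$). Thus,

\[ \bar T^{s} \bigl( \tilde f^{\lambda}_D - \bar f^{\lambda}_D \bigr) = \frac1m \sum_{j=1}^{m} \xi_\lambda(\mathbf x_j, \mathbf y_j) \tag{6.11} \]

is a sum of centered iid random variables.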
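For orientation, the averaged estimator $\bar f^{\lambda}_D$ entering (6.11) can be sketched in a few lines for the special case of kernel ridge regression, $g_\lambda(t) = (t + \lambda)^{-1}$. This is a minimal illustration, not the authors' implementation: the Gaussian kernel (for which $\kappa = 1$), its width, the synthetic regression function and all function names are assumptions made here for the example.

```python
import numpy as np

def gaussian_kernel(A, B, width=0.5):
    """Gaussian kernel matrix; K(x, x) = 1, so kappa = 1 (assumed kernel)."""
    sq = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-sq.sum(axis=2) / (2.0 * width ** 2))

def krr_fit(X, y, lam):
    """KRR coefficients alpha = (K + n*lam*I)^{-1} y on one (sub)sample."""
    n = X.shape[0]
    K = gaussian_kernel(X, X)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def averaged_krr_predict(X, y, X_test, lam, m):
    """Average the predictions of m KRR estimators fitted on disjoint subsets."""
    parts = np.array_split(np.arange(X.shape[0]), m)
    preds = [
        gaussian_kernel(X_test, X[idx]) @ krr_fit(X[idx], y[idx], lam)
        for idx in parts
    ]
    return np.mean(preds, axis=0)

# Synthetic iid data (illustrative): y = sin(pi * x) + noise.
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(n)
X_test = np.linspace(-1.0, 1.0, 50)[:, None]

single = averaged_krr_predict(X, y, X_test, lam=1e-3, m=1)    # single-machine KRR
averaged = averaged_krr_predict(X, y, X_test, lam=1e-3, m=8)  # m = 8 subsets
print("max |single - averaged| on the test grid:", np.max(np.abs(single - averaged)))
```

Since the data are iid, splitting the index range into contiguous blocks plays the role of a random partition; each block is solved with the same $\lambda$, as in the analysis above.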

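Furthermore, we need the following result from [18], Theorem 5.2, which generalizes Rosenthal's inequalities from [19] (originally only formulated for real valued random variables) to random variables with values in a Banach space. For Hilbert spaces this looks particularly nice.

Proposition 6.7. Let $\mathcal H$ be a Hilbert space and $\xi_1, \ldots, \xi_m$ be a finite sequence of independent, mean zero $\mathcal H$-valued random variables. If $2 \le p < \infty$, then there exists a constant $C_p > 0$, only depending on $p$, such that

\[ \left( \mathbb E \Bigl\| \frac1m \sum_{j=1}^{m} \xi_j \Bigr\|^{p}_{\mathcal H} \right)^{1/p} \le \frac{C_p}{m} \max\left\{ \Bigl( \sum_{j=1}^{m} \mathbb E \| \xi_j \|^{p}_{\mathcal H} \Bigr)^{1/p},\ \Bigl( \sum_{j=1}^{m} \mathbb E \| \xi_j \|^{2}_{\mathcal H} \Bigr)^{1/2} \right\}. \tag{6.12} \]

We remark in passing that [7], Corollary 1.22, contains the interesting result that in addition to the upper bound in (6.12) there is also a corresponding lower bound where the constant $C_p$ is replaced by another constant $c_p > 0$, only depending on $p$.

Proposition 6.8 (Expectation of Sample Error). Let $\rho$ be a source distribution belonging to $\mathcal M_{\sigma,M,R}$, $s \in [0, \tfrac12]$ and let $\lambda \in (0, 1]$. Define $B_{n/m}(\lambda)$ as in (6.2). Assume the regularization has qualification $q \ge r + s$. For any $p \ge 1$ one has:

\[ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} ( \tilde f^{\lambda}_D - \bar f^{\lambda}_D ) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p} \le C_p\, m^{-1/2}\, B_{n/m}(\lambda)^{\frac12 + s}\, \lambda^{s} \left( \frac{m M}{n\lambda} + \sigma \sqrt{\frac{m\, \mathcal N(\lambda)}{n\lambda}} \right), \]

where $C_p$ does not depend on $(\sigma, M, R) \in \mathbb R^3_+$.

Proof of Proposition 6.8. Let $\lambda \in (0, 1]$ and $p \ge 2$. From Proposition 6.7

\[
\begin{aligned}
\mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} ( \tilde f^{\lambda}_D - \bar f^{\lambda}_D ) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p}
&= \mathbb E_{\rho^{\otimes n}} \Bigl[ \Bigl\| \frac1m \sum_{j=1}^{m} \xi_\lambda(\mathbf x_j, \mathbf y_j) \Bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p} \\
&\le \frac{C_p}{m} \max\left\{ \Bigl( \sum_{j=1}^{m} \mathbb E_{\rho^{\otimes n}} \bigl[ \| \xi_\lambda(\mathbf x_j, \mathbf y_j) \|^{p}_{\mathcal H_K} \bigr] \Bigr)^{1/p},\ \Bigl( \sum_{j=1}^{m} \mathbb E_{\rho^{\otimes n}} \bigl[ \| \xi_\lambda(\mathbf x_j, \mathbf y_j) \|^{2}_{\mathcal H_K} \bigr] \Bigr)^{1/2} \right\}.
\end{aligned}
\tag{6.13}
\]

Again, the estimates in expectation will follow from integrating a bound holding with high probability. By (6.4), one has for any $j = 1, \ldots, m$

\[
\begin{aligned}
\| \xi_\lambda(\mathbf x_j, \mathbf y_j) \|_{\mathcal H_K}
&= \bigl\| \bar T^{s} g_\lambda(\bar T_{\mathbf x_j}) ( \bar T_{\mathbf x_j} f_\rho - \bar S^{*}_{\mathbf x_j} \mathbf y_j ) \bigr\|_{\mathcal H_K} \\
&\le 8 \log^{2s}(4\eta^{-1})\, B_{n/m}(\lambda)^{s}\, \bigl\| (\bar T_{\mathbf x_j} + \lambda)^{s} g_\lambda(\bar T_{\mathbf x_j}) ( \bar T_{\mathbf x_j} f_\rho - \bar S^{*}_{\mathbf x_j} \mathbf y_j ) \bigr\|_{\mathcal H_K},
\end{aligned}
\tag{6.14}
\]

holding with probability at least $1 - \frac{\eta}{2}$, where $B_{n/m}(\lambda)$ is defined in (6.2). We proceed by splitting:

\[ (\bar T_{\mathbf x_j} + \lambda)^{s} g_\lambda(\bar T_{\mathbf x_j}) ( \bar T_{\mathbf x_j} f_\rho - \bar S^{*}_{\mathbf x_j} \mathbf y_j ) = H^{(1)}_{\mathbf x_j} \cdot H^{(2)}_{\mathbf x_j} \cdot h^{\lambda}_{z_j}, \]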

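with

\[
\begin{aligned}
H^{(1)}_{\mathbf x_j} &:= (\bar T_{\mathbf x_j} + \lambda)^{s} g_\lambda(\bar T_{\mathbf x_j}) (\bar T_{\mathbf x_j} + \lambda)^{\frac12}, \\
H^{(2)}_{\mathbf x_j} &:= (\bar T_{\mathbf x_j} + \lambda)^{-\frac12} (\bar T + \lambda)^{\frac12}, \\
h^{\lambda}_{z_j} &:= (\bar T + \lambda)^{-\frac12} ( \bar T_{\mathbf x_j} f_\rho - \bar S^{*}_{\mathbf x_j} \mathbf y_j ).
\end{aligned}
\]

The first term is estimated using (2.6) and gives

\[ \| H^{(1)}_{\mathbf x_j} \| \le C_{\blacktriangle}\, \lambda^{s - \frac12}. \tag{6.15} \]

The second term is now bounded using (6.4) once more. One has with probability at least $1 - \frac{\eta}{4}$

\[ \| H^{(2)}_{\mathbf x_j} \| \le 8 \log(8\eta^{-1})\, B_{n/m}(\lambda)^{\frac12}. \tag{6.16} \]

Finally, $h^{\lambda}_{z_j}$ is estimated using Proposition A.2:

\[ \| h^{\lambda}_{z_j} \|_{\mathcal H_K} \le 2 \log(8\eta^{-1}) \left( \frac{m M}{n \sqrt{\lambda}} + \sigma \sqrt{\frac{m\, \mathcal N(\lambda)}{n}} \right), \tag{6.17} \]

holding with probability at least $1 - \frac{\eta}{4}$. Thus, combining (6.15), (6.16) and (6.17) with (6.14) gives for any $j = 1, \ldots, m$

\[ \| \xi_\lambda(\mathbf x_j, \mathbf y_j) \|_{\mathcal H_K} \le C_{\blacktriangle} \log^{2(s+1)}(8\eta^{-1})\, B_{n/m}(\lambda)^{\frac12 + s}\, \lambda^{s} \left( \frac{m M}{n\lambda} + \sigma \sqrt{\frac{m\, \mathcal N(\lambda)}{n\lambda}} \right), \]

with probability at least $1 - \eta$. Integration gives for any $p \ge 2$

\[ \mathbb E_{\rho^{\otimes n}} \bigl[ \| \xi_\lambda(\mathbf x_j, \mathbf y_j) \|^{p}_{\mathcal H_K} \bigr] \le C_{\blacktriangle,p}\, A^{p}, \]

with

\[ A := A_{n/m}(\lambda) := B_{n/m}(\lambda)^{\frac12 + s}\, \lambda^{s} \left( \frac{m M}{n\lambda} + \sigma \sqrt{\frac{m\, \mathcal N(\lambda)}{n\lambda}} \right). \]

Combining this with (6.13) implies, since $p \ge 2$,

\[
\mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} ( \tilde f^{\lambda}_D - \bar f^{\lambda}_D ) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p}
\le \frac{C_{\blacktriangle,p}}{m} \max\Bigl( (m A^{p})^{\frac1p},\ (m A^{2})^{\frac12} \Bigr)
= C_{\blacktriangle,p}\, A \max\Bigl( m^{\frac1p - 1},\ m^{-\frac12} \Bigr)
= C_{\blacktriangle,p}\, A\, m^{-\frac12},
\]

where $C_{\blacktriangle,p}$ does not depend on $(\sigma, M, R) \in \mathbb R^3_+$. The result for the case $1 \le p \le 2$ immediately follows from Hölder's inequality.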
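The exponent arithmetic behind the final step of this proof can be spelled out as follows. This is a short worked version under the assumptions already made above: $m$ iid summands, each with $p$-th moment at most $A^p$ up to a constant, and $p \ge 2$; the second moment is controlled by the $p$-th through Jensen's inequality.

```latex
\begin{align*}
\frac{1}{m}\Bigl(\sum_{j=1}^{m} A^{p}\Bigr)^{1/p}
  &= \frac{(m A^{p})^{1/p}}{m} = m^{\frac1p - 1} A, \\
\frac{1}{m}\Bigl(\sum_{j=1}^{m} A^{2}\Bigr)^{1/2}
  &= \frac{(m A^{2})^{1/2}}{m} = m^{-\frac12} A, \\
p \ge 2 \;\Longrightarrow\; \frac1p - 1 \le -\frac12
  &\;\Longrightarrow\; \max\bigl(m^{\frac1p - 1},\, m^{-\frac12}\bigr) = m^{-\frac12}
  \quad \text{for } m \ge 1 .
\end{align*}
```

In other words, for $p \ge 2$ the second-moment term dominates, and averaging over $m$ subsamples reduces the sample error by the usual factor $m^{-1/2}$.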

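Proof of Theorem 3.2. Let $\lambda_n$ be defined by (3.3). According to Corollary 6.3 we have $B_{n/m_n}(\lambda_n) \le 2$ provided $\alpha < \frac{2br}{2br+b+1}$. We immediately obtain from Proposition 6.8

\[
\begin{aligned}
\mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} ( \tilde f^{\lambda_n}_D - \bar f^{\lambda_n}_D ) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p}
&\le C_{\blacktriangle,p}\, m_n^{-1/2}\, \lambda_n^{s} \left( \frac{m_n M}{n\lambda_n} + \sigma \sqrt{\frac{m_n\, \mathcal N(\lambda_n)}{n\lambda_n}} \right) \\
&\le C_{\blacktriangle,p}\, \lambda_n^{s} \left( \frac{\sqrt{m_n}\, M}{n\lambda_n} + \sigma \sqrt{\frac{\mathcal N(\lambda_n)}{n\lambda_n}} \right).
\end{aligned}
\]

Again, we use that $\mathcal N(\lambda_n) \le C_b \lambda_n^{-1/b}$ and

\[ \frac{\sqrt{m_n}\, M}{n\lambda_n} = o\left( \sigma \sqrt{\frac{\lambda_n^{-1/b}}{n\lambda_n}} \right), \]

provided

\[ m_n \le n^{\alpha}, \qquad \alpha < \frac{2(br + 1)}{2br + b + 1}. \]

Recalling that

\[ \sigma \sqrt{\frac{\lambda_n^{-1/b}}{n\lambda_n}} = R \lambda_n^{r} = \lambda_n^{-s} a_n, \]

we arrive at

\[ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} ( \tilde f^{\lambda_n}_D - \bar f^{\lambda_n}_D ) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p} \le C_{\blacktriangle,p}\, a_n. \]

As a result, for any $p \ge 1$

\[ \limsup_{n \to \infty}\ \sup_{\rho \in \mathcal M_{\sigma,M,R}} \frac{ \mathbb E_{\rho^{\otimes n}} \Bigl[ \bigl\| \bar T^{s} ( \tilde f^{\lambda_n}_D - \bar f^{\lambda_n}_D ) \bigr\|^{p}_{\mathcal H_K} \Bigr]^{1/p} }{a_n} \le C_{\blacktriangle,p}, \]

for some $C_{\blacktriangle,p} < \infty$, not depending on the model parameter $(\sigma, M, R) \in \mathbb R^3_+$.

Appendix A

Proposition A.1 (see e.g. [2]). For any $n \in \mathbb N$, $\lambda \in (0, 1]$ and $\eta \in (0, 1)$, one has with probability at least $1 - \eta$:

\[ \bigl\| (\bar T + \lambda)^{-1} (\bar T - \bar T_{\mathbf x}) \bigr\|_{HS} \le 2 \log(2\eta^{-1}) \left( \frac{2}{n\lambda} + \sqrt{\frac{\mathcal N(\lambda)}{n\lambda}} \right). \]

Proposition A.2 (see e.g. [2]). For $n \in \mathbb N$, $\lambda \in (0, 1]$ and $\eta \in (0, 1]$, it holds with probability at least $1 - \eta$:

\[ \bigl\| (\bar T + \lambda)^{-\frac12} ( \bar T_{\mathbf x} f_\rho - \bar S^{*}_{\mathbf x} \mathbf y ) \bigr\|_{\mathcal H_K} \le 2 \log(2\eta^{-1}) \left( \frac{M}{n \sqrt{\lambda}} + \sqrt{\frac{\sigma^2 \mathcal N(\lambda)}{n}} \right). \]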

Proposition A.3 (Cordes Inequality, [1], Theorem IX.2.1-2). Let $A$, $B$ be two self-adjoint, positive operators on a Hilbert space. Then for any $s \in [0, 1]$:

\[ \| A^{s} B^{s} \| \le \| A B \|^{s}. \tag{A.1} \]

References

[1] R. Bhatia. Matrix Analysis. Springer, 1997.
[2] G. Blanchard and N. Mücke. Optimal rates for regularization of statistical inverse learning problems. Foundations of Computational Mathematics, 2017. doi:10.1007/s
[3] L Chang and Wang. Divide and conquer local average regression. arXiv preprint, 2016.
[4] G. Cheng and Z. Shang. Computational limits of divide-and-conquer method. arXiv preprint, 2015.
[5] E. De Vito and A. Caponnetto. Optimal rates for regularized least-squares algorithm. Foundations of Computational Mathematics, 7(3), 2006.
[6] L. Dicker, D. Foster, and D. Hsu. Kernel methods and regularization techniques for nonparametric regression: Minimax optimality and adaptation. Technical report, Rutgers University, 2015.
[7] S. Dirksen. Noncommutative and vector-valued Rosenthal inequalities. PhD thesis, Delft Univ. Technology, 2011.
[8] H. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems. Kluwer Academic Publishers, 2000.
[9] J. C. Ferreira and V. A. Menegatto. Eigenvalues of integral operators defined by smooth positive definite kernels. Integral Equations and Operator Theory, 64, 2009.
[10] L. L. Gerfo, L. Rosasco, F. Odone, E. De Vito, and A. Verri. Spectral algorithms for supervised learning. Neural Computation, 20(7), 2008.
[11] Q. Guo et al. Efficient divide-and-conquer classification based on parallel feature-space decomposition for distributed systems. IEEE Systems Journal, 2015.
[12] Z.-C. Guo, S.-B. Lin, and D.-X. Zhou. Learning theory of distributed spectral algorithms. Inverse Problems, 33(7):074009, 2017.
[13] C. J. Hsieh, S. Si, and I. Dhillon. A divide-and-conquer solver for kernel support vector machine. Proceedings of the 31st International Conference on Machine Learning, 2014.
[14] R. Li, D. K. J. Lin, and B. Li. Statistical inference in massive data sets. Applied Stochastic Models in Business and Industry, 29(5), 2013.
[15] S.-B. Lin and D.-X. Zhou. Distributed kernel-based gradient descent algorithms. Constructive Approximation, May 2017.
[16] S. Lin, X. Guo, and D.-X. Zhou. Distributed learning with regularized least squares. arXiv preprint, 2016.
[17] L. Mackey, A. Talwalkar, and M. I. Jordan. Divide-and-conquer matrix factorization. Advances in Neural Information Processing Systems 24 (NIPS 2011), 2011.
[18] I. Pinelis. Optimum bounds for the distributions of martingales in Banach spaces. The Annals of Probability, 22(4), 1994.
[19] H. P. Rosenthal. On the subspaces of $L^p$ ($p > 2$) spanned by sequences of independent random variables. Israel J. Math., 8, 1970.
[20] C. Xu, Y. Zhang, and R. Li. On the feasibility of distributed kernel regression for big data. arXiv preprint, 2015.
[21] Y. Zhang, J. Duchi, and M. Wainwright. Divide and conquer kernel ridge regression. JMLR: Workshop and Conference Proceedings, 30, 2013.
[22] D.-X. Zhou. The covering number in learning theory. Journal of Complexity, 18(3), 2002.
[23] D.-X. Zhou. Distributed learning algorithms. Technical report, Mathematisches Forschungsinstitut Oberwolfach, Report No. 33, 2016.

Institute of Mathematics, University of Potsdam, Karl-Liebknecht-Straße, Potsdam, Germany
E-mail address:


More information

Computationally Efficient Control System Based on Digital Dynamic Pulse Frequency Modulation for Microprocessor Implementation

Computationally Efficient Control System Based on Digital Dynamic Pulse Frequency Modulation for Microprocessor Implementation IJCSI International Journal of Couter Science Issues, Vol. 0, Issue 3, No 2, May 203 ISSN (Print): 694-084 ISSN (Online): 694-0784 www.ijcsi.org 20 Coutationally Efficient Control Syste Based on Digital

More information

The CIA (consistency in aggregation) approach A new economic approach to elementary indices

The CIA (consistency in aggregation) approach A new economic approach to elementary indices The CIA (consistency in aggregation) aroach A new econoic aroach to eleentary indices Dr Jens ehrhoff*, Head of Section Business Cycle and Structural Econoic Statistics * Jens This ehrhoff, resentation

More information

Applications to stochastic PDE

Applications to stochastic PDE 15 Alications to stochastic PE In this final lecture we resent some alications of the theory develoed in this course to stochastic artial differential equations. We concentrate on two secific examles:

More information

Quadratic Reciprocity. As in the previous notes, we consider the Legendre Symbol, defined by

Quadratic Reciprocity. As in the previous notes, we consider the Legendre Symbol, defined by Math 0 Sring 01 Quadratic Recirocity As in the revious notes we consider the Legendre Sybol defined by $ ˆa & 0 if a 1 if a is a quadratic residue odulo. % 1 if a is a quadratic non residue We also had

More information

RESOLVENT ESTIMATES FOR ELLIPTIC SYSTEMS IN FUNCTION SPACES OF HIGHER REGULARITY

RESOLVENT ESTIMATES FOR ELLIPTIC SYSTEMS IN FUNCTION SPACES OF HIGHER REGULARITY Electronic Journal of Differential Equations, Vol. 2011 2011, No. 109,. 1 12. ISSN: 1072-6691. URL: htt://ejde.ath.txstate.edu or htt://ejde.ath.unt.edu ft ejde.ath.txstate.edu RESOLVENT ESTIMATES FOR

More information

Input-Output (I/O) Stability. -Stability of a System

Input-Output (I/O) Stability. -Stability of a System Inut-Outut (I/O) Stability -Stability of a Syste Outline: Introduction White Boxes and Black Boxes Inut-Outut Descrition Foralization of the Inut-Outut View Signals and Signal Saces he Notions of Gain

More information

A Subspace Iteration for Calculating a Cluster of Exterior Eigenvalues

A Subspace Iteration for Calculating a Cluster of Exterior Eigenvalues Advances in Linear Algebra & Matrix heory 05 5 76-89 Published Online Seteber 05 in SciRes htt://wwwscirorg/ournal/alat htt://dxdoiorg/0436/alat0553008 A Subsace Iteration for Calculating a Cluster of

More information

1 Generalization bounds based on Rademacher complexity

1 Generalization bounds based on Rademacher complexity COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges

More information

Approximating min-max k-clustering

Approximating min-max k-clustering Aroximating min-max k-clustering Asaf Levin July 24, 2007 Abstract We consider the roblems of set artitioning into k clusters with minimum total cost and minimum of the maximum cost of a cluster. The cost

More information

Security Transaction Differential Equation

Security Transaction Differential Equation Security Transaction Differential Equation A Transaction Volue/Price Probability Wave Model Shi, Leilei This draft: June 1, 4 Abstract Financial arket is a tyical colex syste because it is an oen trading

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

1. (2.5.1) So, the number of moles, n, contained in a sample of any substance is equal N n, (2.5.2)

1. (2.5.1) So, the number of moles, n, contained in a sample of any substance is equal N n, (2.5.2) Lecture.5. Ideal gas law We have already discussed general rinciles of classical therodynaics. Classical therodynaics is a acroscoic science which describes hysical systes by eans of acroscoic variables,

More information

1 Bounding the Margin

1 Bounding the Margin COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost

More information

Binomial and Poisson Probability Distributions

Binomial and Poisson Probability Distributions Binoial and Poisson Probability Distributions There are a few discrete robability distributions that cro u any ties in hysics alications, e.g. QM, SM. Here we consider TWO iortant and related cases, the

More information

A GENERAL THEORY OF PARTICLE FILTERS IN HIDDEN MARKOV MODELS AND SOME APPLICATIONS. By Hock Peng Chan National University of Singapore and

A GENERAL THEORY OF PARTICLE FILTERS IN HIDDEN MARKOV MODELS AND SOME APPLICATIONS. By Hock Peng Chan National University of Singapore and Subitted to the Annals of Statistics A GENERAL THEORY OF PARTICLE FILTERS IN HIDDEN MARKOV MODELS AND SOME APPLICATIONS By Hock Peng Chan National University of Singaore and By Tze Leung Lai Stanford University

More information

Phase field modelling of microstructural evolution using the Cahn-Hilliard equation: A report to accompany CH-muSE

Phase field modelling of microstructural evolution using the Cahn-Hilliard equation: A report to accompany CH-muSE Phase field odelling of icrostructural evolution using the Cahn-Hilliard equation: A reort to accoany CH-uSE 1 The Cahn-Hilliard equation Let us consider a binary alloy of average coosition c 0 occuying

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

POWER RESIDUES OF FOURIER COEFFICIENTS OF MODULAR FORMS

POWER RESIDUES OF FOURIER COEFFICIENTS OF MODULAR FORMS POWER RESIDUES OF FOURIER COEFFICIENTS OF MODULAR FORMS TOM WESTON Abstract Let ρ : G Q > GL nq l be a otivic l-adic Galois reresentation For fixed > 1 we initiate an investigation of the density of the

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

A STUDY OF UNSUPERVISED CHANGE DETECTION BASED ON TEST STATISTIC AND GAUSSIAN MIXTURE MODEL USING POLSAR SAR DATA

A STUDY OF UNSUPERVISED CHANGE DETECTION BASED ON TEST STATISTIC AND GAUSSIAN MIXTURE MODEL USING POLSAR SAR DATA A STUDY OF UNSUPERVISED CHANGE DETECTION BASED ON TEST STATISTIC AND GAUSSIAN MIXTURE MODEL USING POLSAR SAR DATA Yang Yuxin a, Liu Wensong b* a Middle School Affiliated to Central China Noral University,

More information

The Semantics of Data Flow Diagrams. P.D. Bruza. Th.P. van der Weide. Dept. of Information Systems, University of Nijmegen

The Semantics of Data Flow Diagrams. P.D. Bruza. Th.P. van der Weide. Dept. of Information Systems, University of Nijmegen The Seantics of Data Flow Diagras P.D. Bruza Th.P. van der Weide Det. of Inforation Systes, University of Nijegen Toernooiveld, NL-6525 ED Nijegen, The Netherlands July 26, 1993 Abstract In this article

More information

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion Suppleentary Material for Fast and Provable Algoriths for Spectrally Sparse Signal Reconstruction via Low-Ran Hanel Matrix Copletion Jian-Feng Cai Tianing Wang Ke Wei March 1, 017 Abstract We establish

More information

3.3 Variational Characterization of Singular Values

3.3 Variational Characterization of Singular Values 3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and

More information

New Set of Rotationally Legendre Moment Invariants

New Set of Rotationally Legendre Moment Invariants New Set of Rotationally Legendre Moent Invariants Khalid M. Hosny Abstract Orthogonal Legendre oents are used in several attern recognition and iage rocessing alications. Translation and scale Legendre

More information

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volue 19, 2013 htt://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Physical Acoustics Session 1PAb: Acoustics in Microfluidics and for Particle

More information

Metric Cotype. 1 Introduction. Manor Mendel California Institute of Technology. Assaf Naor Microsoft Research

Metric Cotype. 1 Introduction. Manor Mendel California Institute of Technology. Assaf Naor Microsoft Research Metric Cotye Manor Mendel California Institute of Technology Assaf Naor Microsoft Research Abstract We introduce the notion of cotye of a etric sace, and rove that for Banach saces it coincides with the

More information

Anomalous heat capacity for nematic MBBA near clearing point

Anomalous heat capacity for nematic MBBA near clearing point Journal of Physics: Conference Series Anoalous heat caacity for neatic MA near clearing oint To cite this article: D A Lukashenko and M Khasanov J. Phys.: Conf. Ser. 394 View the article online for udates

More information

ON THE INTEGER PART OF A POSITIVE INTEGER S K-TH ROOT

ON THE INTEGER PART OF A POSITIVE INTEGER S K-TH ROOT ON THE INTEGER PART OF A POSITIVE INTEGER S K-TH ROOT Yang Hai Research Center for Basic Science, Xi an Jiaotong University, Xi an, Shaanxi, P.R.China Fu Ruiqin School of Science, Xi an Shiyou University,

More information

One- and multidimensional Fibonacci search very easy!

One- and multidimensional Fibonacci search very easy! One and ultidiensional ibonacci search One and ultidiensional ibonacci search very easy!. Content. Introduction / Preliinary rearks...page. Short descrition of the ibonacci nubers...page 3. Descrition

More information

Review from last time Time Series Analysis, Fall 2007 Professor Anna Mikusheva Paul Schrimpf, scribe October 23, 2007.

Review from last time Time Series Analysis, Fall 2007 Professor Anna Mikusheva Paul Schrimpf, scribe October 23, 2007. Review fro last tie 4384 ie Series Analsis, Fall 007 Professor Anna Mikusheva Paul Schrif, scribe October 3, 007 Lecture 3 Unit Roots Review fro last tie Let t be a rando walk t = ρ t + ɛ t, ρ = where

More information

Wind Loading for the Design of the Solar Tower

Wind Loading for the Design of the Solar Tower Wind Loading for the Design of the Solar Tower H.-J. Nieann, R. Höffer Faculty of Civil Engineering, Ruhr-University Bochu, Gerany Keywords: solar chiney, wind seed, wind turbulence and wind load u to

More information

Suppress Parameter Cross-talk for Elastic Full-waveform Inversion: Parameterization and Acquisition Geometry

Suppress Parameter Cross-talk for Elastic Full-waveform Inversion: Parameterization and Acquisition Geometry Suress Paraeter Cross-talk for Elastic Full-wavefor Inversion: Paraeterization and Acquisition Geoetry Wenyong Pan and Kris Innanen CREWES Project, Deartent of Geoscience, University of Calgary Suary Full-wavefor

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

The Number of Information Bits Related to the Minimum Quantum and Gravitational Masses in a Vacuum Dominated Universe

The Number of Information Bits Related to the Minimum Quantum and Gravitational Masses in a Vacuum Dominated Universe Wilfrid Laurier University Scholars Coons @ Laurier Physics and Couter Science Faculty Publications Physics and Couter Science 01 The uber of Inforation Bits Related to the Miniu Quantu and Gravitational

More information

On Conditions for Linearity of Optimal Estimation

On Conditions for Linearity of Optimal Estimation On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at

More information

VACUUM chambers have wide applications for a variety of

VACUUM chambers have wide applications for a variety of JOURNAL OF THERMOPHYSICS AND HEAT TRANSFER Vol. 2, No., January March 27 Free Molecular Flows Between Two Plates Equied with Pus Chunei Cai ZONA Technology, Inc., Scottsdale, Arizona 85258 Iain D. Boyd

More information

CALIFORNIA INSTITUTE OF TECHNOLOGY

CALIFORNIA INSTITUTE OF TECHNOLOGY CALIFORNIA INSIUE OF ECHNOLOGY Control and Dynaical Systes Course Project CDS 270 Instructor: Eugene Lavretsky, eugene.lavretsky@boeing.co Sring 2007 Project Outline: his roject consists of two flight

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

Algorithm Design and Implementation for a Mathematical Model of Factoring Integers

Algorithm Design and Implementation for a Mathematical Model of Factoring Integers IOSR Journal of Matheatics (IOSR-JM e-iss: 78-578, -ISS: 39-765X. Volue 3, Issue I Ver. VI (Jan. - Feb. 07, PP 37-4 www.iosrjournals.org Algorith Design leentation for a Matheatical Model of Factoring

More information

An l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions

An l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions Journal of Matheatical Research with Applications Jul., 207, Vol. 37, No. 4, pp. 496 504 DOI:0.3770/j.issn:2095-265.207.04.0 Http://jre.dlut.edu.cn An l Regularized Method for Nuerical Differentiation

More information

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS ISSN 1440-771X AUSTRALIA DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS An Iproved Method for Bandwidth Selection When Estiating ROC Curves Peter G Hall and Rob J Hyndan Working Paper 11/00 An iproved

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

Learnability of Gaussians with flexible variances

Learnability of Gaussians with flexible variances Learnability of Gaussians with flexible variances Ding-Xuan Zhou City University of Hong Kong E-ail: azhou@cityu.edu.hk Supported in part by Research Grants Council of Hong Kong Start October 20, 2007

More information

An Investigation into the Effects of Roll Gyradius on Experimental Testing and Numerical Simulation: Troubleshooting Emergent Issues

An Investigation into the Effects of Roll Gyradius on Experimental Testing and Numerical Simulation: Troubleshooting Emergent Issues An Investigation into the Effects of Roll Gyradius on Exeriental esting and Nuerical Siulation: roubleshooting Eergent Issues Edward Dawson Maritie Division Defence Science and echnology Organisation DSO-N-140

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic

More information