Universal algorithms for learning theory Part II : piecewise polynomial functions

Universal algorithms for learning theory Part II: piecewise polynomial functions

Peter Binev, Albert Cohen, Wolfgang Dahmen, and Ronald DeVore

December 6, 2005

Abstract

This paper is concerned with estimating the regression function $f_\rho$ in supervised learning by utilizing piecewise polynomial approximations on adaptively generated partitions. The main point of interest is algorithms that with high probability are optimal in terms of the least squares error achieved for a given number $m$ of observed data. In a previous paper [1], we have developed for each $\beta > 0$ an algorithm for piecewise constant approximation which is proven to provide such optimal order estimates with probability larger than $1 - m^{-\beta}$. In this paper, we consider the case of higher degree polynomials. We show that for general probability measures $\rho$ empirical least squares minimization will not provide optimal error estimates with high probability. We go further in identifying certain conditions on the probability measure $\rho$ which will allow optimal estimates with high probability.

Key words: Regression, universal piecewise polynomial estimators, optimal convergence rates in probability, adaptive partitioning, thresholding, on-line updates

AMS Subject Classification: 62G08, 62G20, 41A25

1 Introduction

This paper is concerned with providing estimates in probability for the approximation of the regression function in supervised learning when using piecewise polynomials on adaptively generated partitions. We shall work in the following setting. We suppose that $\rho$ is an unknown measure on a product space $Z := X \times Y$, where $X$ is a bounded domain of $\mathbb{R}^d$ and $Y = \mathbb{R}$. Given $m$ independent random observations $z_i = (x_i, y_i)$, $i = 1, \dots, m$,

This research was supported by the Office of Naval Research Contracts ONR-N, ONR/DEPSCoR N and ONR/DEPSCoR N; the Army Research Office Contract DAAD; the AFOSR Contract UF/USAF F; and NSF contracts DMS and DMS; by the European Community's Human Potential Programme under contract HPRN-CT (BREAKING COMPLEXITY); and by the National Science Foundation Grant DMS.

identically distributed according to $\rho$, we are interested in estimating the regression function $f_\rho(x)$ defined as the conditional expectation of the random variable $y$ at $x$:

$$f_\rho(x) := \int_Y y \, d\rho(y|x) \qquad (1.1)$$

with $\rho(y|x)$ the conditional probability measure on $Y$ with respect to $x$. We shall use $z = \{z_1, \dots, z_m\} \in Z^m$ to denote the set of observations.

One of the goals of learning is to provide estimates under minimal restrictions on the measure $\rho$, since this measure is unknown to us. In this paper, we shall always work under the assumption that

$$|y| \le M \qquad (1.2)$$

almost surely. It follows in particular that $|f_\rho| \le M$. This property of $\rho$ can often be inferred in practical applications.

We denote by $\rho_X$ the marginal probability measure on $X$ defined by

$$\rho_X(S) := \rho(S \times Y). \qquad (1.3)$$

We shall assume that $\rho_X$ is a Borel measure on $X$. We have

$$d\rho(x, y) = d\rho(y|x)\, d\rho_X(x). \qquad (1.4)$$

It is easy to check that $f_\rho$ is the minimizer of the risk functional

$$\mathcal{E}(f) := \int_Z (y - f(x))^2 \, d\rho \qquad (1.5)$$

over $f \in L_2(X, \rho_X)$, where this space consists of all functions from $X$ to $Y$ which are square integrable with respect to $\rho_X$. In fact, one has

$$\mathcal{E}(f) = \mathcal{E}(f_\rho) + \|f - f_\rho\|^2, \qquad f \in L_2(X, \rho_X), \qquad (1.6)$$

where

$$\|\cdot\| := \|\cdot\|_{L_2(X, \rho_X)}. \qquad (1.7)$$

Our objective will be to find an estimator $f_z$ for $f_\rho$ based on $z$ such that the quantity $\|f_z - f_\rho\|$ is small with high probability. This type of regression problem is referred to as distribution-free. A recent survey on distribution-free regression theory is provided in the book [8], which includes most existing approaches as well as the analysis of their rates of convergence in the expectation sense.

A common approach to this problem is to choose a hypothesis (or model) class $\mathcal{H}$ and then to define $f_z$, in analogy to (1.5), as the minimizer of the empirical risk

$$f_z := \operatorname{argmin}_{f \in \mathcal{H}} \mathcal{E}_z(f), \quad \text{with} \quad \mathcal{E}_z(f) := \frac{1}{m}\sum_{j=1}^m (y_j - f(x_j))^2. \qquad (1.8)$$

In other words, $f_z$ is the best approximation to $(y_j)_{j=1}^m$ from $\mathcal{H}$ in the empirical norm

$$\|g\|_m^2 := \frac{1}{m}\sum_{j=1}^m |g(x_j)|^2. \qquad (1.9)$$

Typically, $\mathcal{H} = \mathcal{H}_m$ depends on a finite number $n = n(m)$ of parameters. In some algorithms, the number $n$ is chosen using an a priori assumption on $f_\rho$. We want to avoid such prior assumptions. In other procedures, the number $n$ is adapted to the data and thereby avoids any a priori assumptions. We shall be interested in estimators of this type.

The usual way of evaluating the performance of the estimator $f_z$ is by studying its convergence either in probability or in expectation, i.e. the rate of decay of the quantities

$$P\{\|f_\rho - f_z\| \ge \eta\}, \quad \eta > 0, \quad \text{or} \quad E(\|f_\rho - f_z\|^2) \qquad (1.10)$$

as the sample size $m$ increases. Here both the expectation and the probability are taken with respect to the product measure $\rho^m$ defined on $Z^m$. Estimates in probability are to be preferred since they give more information about the success of a particular algorithm and they automatically yield an estimate in expectation by integrating with respect to $\eta$. Much more is known about the performance of algorithms in expectation than in probability, as we shall explain below. The present paper will be mainly concerned with estimates in probability, and we shall show that this problem has some interesting twists.

Estimates for the decay of the quantities in (1.10) are usually obtained under certain assumptions (called priors) on $f_\rho$. We emphasize that the algorithms should not depend on prior assumptions on $f_\rho$. Only in the analysis of the algorithms do we impose such prior assumptions in order to see how well the algorithm performs. Priors on $f_\rho$ are typically expressed by a condition of the type $f_\rho \in \Theta$, where $\Theta$ is a class of functions that necessarily must be contained in $L_2(X, \rho_X)$. If we wish the error, as measured in (1.10), to tend to zero as the number $m$ of samples tends to infinity, then we necessarily need that $\Theta$ is a compact subset of $L_2(X, \rho_X)$. There are three common ways to measure the compactness of a set $\Theta$: (i) minimal coverings, (ii) smoothness conditions on the elements of $\Theta$, (iii) the rate of approximation of the elements of $\Theta$ by a specific approximation process. We have discussed the advantages and limitations of each of these approaches in [1] (see also [6]). In the present paper we shall take the view of (iii) and seek estimators which are optimal for a certain collection of priors of this type. Describing compactness in this way provides a benchmark for what could be achieved at best by a concrete estimator.

Our previous work [1] has considered the special case of approximation using piecewise constants on adaptively generated partitions. In that case, we have introduced algorithms that we prove to be optimal in the sense that their rate of convergence is best possible, for a large collection of prior classes, among all methods that utilize piecewise constant approximation on adaptively generated partitions based on isotropic refinements. Moreover, the methods we proposed in [1] had certain aesthetic and numerical advantages. For example, they are implemented by a simple thresholding procedure that can be done on line. This means that in the case of streaming data only a small number of updates are necessary as new data appear. Also, the analysis of our methods provided not only estimates in expectation but also the estimates in probability which we seek.
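To fix ideas, the following short sketch (added for this transcription and not part of the original paper; the monomial hypothesis class, the sample sizes and all names are our own illustrative choices) computes the empirical minimizer (1.8) over a small linear space and evaluates the empirical norm (1.9) with numpy.

```python
import numpy as np

def empirical_least_squares(x, y, degree=3):
    """Minimize E_z(f) = (1/m) sum_j (y_j - f(x_j))^2 over the linear
    hypothesis class H = span{1, x, ..., x^degree} on X = [0, 1].
    Returns the coefficient vector of the minimizer f_z."""
    # Design matrix whose columns span H, evaluated at the sample points.
    A = np.vander(x, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def empirical_norm_sq(values):
    """Squared empirical norm ||g||_m^2 = (1/m) sum_j |g(x_j)|^2."""
    return np.mean(values ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = 200
    x = rng.uniform(0.0, 1.0, size=m)                          # samples of rho_X
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(m)   # noisy observations
    coef = empirical_least_squares(x, y)
    residual = y - np.vander(x, len(coef), increasing=True) @ coef
    print("empirical risk E_z(f_z):", empirical_norm_sq(residual))
```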

On the other hand, from an approximation-theoretic point of view, using just piecewise constants severely limits the range of error decay rates that can be obtained even for very regular approximands. Much better rates can be expected when employing higher order piecewise polynomials, which would result in equally local and, due to higher accuracy, overall more economical procedures. However, in this previous work, we have purposefully not considered the general case of piecewise polynomial approximation because we had already identified certain notable, perhaps at first sight surprising, distinctions with the piecewise constant case. The purpose of the present paper is to analyze the case of general piecewise polynomial approximation, and in particular, to draw out these distinctions. We mention a few of these in this introduction.

In the piecewise constant case, we have shown that estimators built on empirical risk minimization (1.8) are guaranteed with high probability to approximate the regression function with optimal accuracy (in terms of rates of convergence). Here, by high probability we mean that the probability of not providing an optimal order of approximation tends to zero faster than $m^{-\beta}$ for any prescribed $\beta > 0$. We shall show in §3 that in general such probability estimates do not hold when using empirical risk minimization with piecewise polynomial approximation. This means that if we seek estimators which perform well in probability, then either we must assume something more about the underlying probability measure $\rho$ or we must find an alternative to empirical risk minimization. In §4 we put some additional restrictions on the measure $\rho_X$ and show that under these restrictions, we can again design algorithms based on empirical risk minimization which perform optimally with high probability. While, as we have already mentioned, these assumptions on $\rho_X$ are undesirable, we believe that in view of the counterexample of §3 they represent roughly what can be done if one proceeds only with empirical risk minimization.

2 Approximating the Regression Function: General Strategies

In studying the estimation of the regression function, the question arises at the outset as to what are the best approximation methods to use in deriving algorithms for approximating $f_\rho$, and therefore indirectly in defining prior classes. With no additional knowledge of $\rho$ (and thereby $f_\rho$) there is no general answer to this question. However, we can draw some distinctions between certain strategies.

Suppose that we seek to approximate $f_\rho$ by elements from a hypothesis class $\mathcal{H} = \Sigma_n$. Here the parameter $n$ measures the complexity associated to the process. In the case of approximation by elements from linear spaces we will take the space $\Sigma_n$ to be of dimension $n$. For nonlinear methods, the space $\Sigma_n$ is not linear and now $n$ represents the number of parameters used in the approximation. For example, if we choose to approximate by piecewise polynomials on partitions with the degree $r$ of the polynomials fixed, then $n$ could be chosen as the number of cells in the partition. The potential effectiveness of the approximation process for our regression problem would be measured by the error of approximation in the $L_2(X, \rho_X)$ norm. We define this error for a function

$g \in L_2(X, \rho_X)$ by

$$E_n(g) := E(g, \Sigma_n) := \inf_{S \in \Sigma_n} \|g - S\|, \qquad n = 1, 2, \dots. \qquad (2.1)$$

If we have two approximation methods corresponding to sequences of approximation spaces $(\Sigma_n)$ and $(\Sigma_n')$, then the second process would be superior to the first in terms of rates of approximation if $E_n'(g) \le C E_n(g)$ for all $g$ and an absolute constant $C > 0$. For example, approximation using piecewise linear functions would in this sense be superior to approximation by piecewise constants. In our learning context, however, there are other considerations since: (i) the rate of approximation need not translate directly into results about estimating $f_\rho$ because of the uncertainty in our observations; (ii) it may be that the superior approximation method is in fact much more difficult (or impossible) to implement in practice. For example, a typical nonlinear method may consist of finding an approximation to $g$ from a family of linear spaces, each of dimension $N$. The larger the family, the more powerful the approximation method. However, too large a family will generally make the numerical implementation of this method of approximation impossible.

Suppose that we have chosen the space $\Sigma_n$ to be used as our hypothesis class $\mathcal{H}$ in the approximation of $f_\rho$ from our given data $z$. How should we define our approximation? As we have noted in the introduction, the most common approach is empirical risk minimization, which gives the function $\hat f_z := \hat f_{z, \Sigma_n}$ defined by (1.8). However, since we know $|f_\rho| \le M$, the approximation will be improved if we post-truncate $\hat f_z$ by $M$. For this, we define the truncation operator

$$T_M(x) := \min(|x|, M)\,\mathrm{sign}(x) \qquad (2.2)$$

for any real number $x$, and define

$$f_z := f_{z, \mathcal{H}} := T_M(\hat f_{z, \mathcal{H}}). \qquad (2.3)$$

There are general results that provide estimates for how well $f_z$ approximates $f_\rho$. One such estimate given in [8] (see Theorem 11.3) applies when $\mathcal{H}$ is a linear space of dimension $n$ and gives¹

$$E(\|f_\rho - f_z\|^2) \lesssim \frac{n \log m}{m} + \inf_{g \in \mathcal{H}} \|f_\rho - g\|^2. \qquad (2.4)$$

The second term is the bias and equals our approximation error $E_n(f_\rho)$ for approximation using the elements of $\mathcal{H}$. The first term is the variance, which bounds the error due to uncertainty. One can derive rates of convergence in expectation by balancing both terms (see [8] and [6]) for specific applications. The deficiency of this approach is that one needs to know the behavior of $E_n(f_\rho)$ in order to choose the best value of $n$, and this requires a priori knowledge of $f_\rho$.

There is a general procedure known as model selection which circumvents this difficulty and tries to automatically choose a good value of $n$ (depending on $f_\rho$) by introducing a penalty term. Suppose that $(\Sigma_n)_{n=1}^m$ is a family of linear spaces, each of dimension $n$. For each

¹ Here and later in this paper we use the notation $A \lesssim B$ to mean $A \le CB$ for some absolute constant $C$.

$n = 1, 2, \dots, m$, we have the corresponding estimator $f_{z, \Sigma_n}$ defined by (2.3) and the empirical error

$$E_{n,z} := \frac{1}{m}\sum_{j=1}^m (y_j - f_{z, \Sigma_n}(x_j))^2. \qquad (2.5)$$

Notice that $E_{n,z}$ is a computable quantity which we can view as an estimate for $E_n(f_\rho)$. In complexity regularization, one chooses a value of $n$ by

$$n^* := n^*(z) := \operatorname{argmin}_{1 \le n \le m}\Big\{E_{n,z} + \frac{n \log m}{m}\Big\}. \qquad (2.6)$$

We now define

$$f_z := f_{z, \Sigma_{n^*}} \qquad (2.7)$$

as our estimator of $f_\rho$. One can then prove (see Chapter 12 of [8]) that whenever $f_\rho$ can be approximated to accuracy $E_n(f_\rho) \le M n^{-s}$ for some $s > 0$, then²

$$E(\|f_\rho - f_z\|^2) \le C\Big[\frac{\log m}{m}\Big]^{\frac{2s}{2s+1}}, \qquad (2.8)$$

which, save for the logarithm, is an optimal rate estimate in expectation. Notice that the estimator did not need to have knowledge of $s$ and nevertheless obtains the optimal performance.

Model selection can also be applied in the setting of nonlinear approximation, i.e. when the spaces $\Sigma_n$ are nonlinear, but in this case one needs to invoke conditions on the compatibility of the penalty with the complexity of the approximation process as measured by an entropy restriction. We refer the reader to Chapter 12 of [8] for a more detailed discussion of this topic and will briefly take up this point again in §2.3.

Let us also note that the penalty approach is not always compatible with the practical requirement of on-line computations. By on-line computation, we mean that the estimator for the sample size $m$ can be derived by a simple update of the estimator for the sample size $m-1$. In penalty methods, the optimization problem needs to be globally re-solved when adding a new sample. However, when there is additional structure in the approximation process, such as the adaptive partitioning that we discuss in the next section, then there are algorithms that circumvent this difficulty (see the discussion of CART algorithms given in the following section).

2.1 Adaptive Partitioning

In this paper, we shall be interested in approximation by piecewise polynomials on partitions generated by adaptive partitioning. We shall restrict our discussion to the case $X = [0,1]^d$ and the case of dyadic partitions. However, all results would follow in the more general setting described in [1].

² We use the following conventions concerning constants throughout this paper. Constants like $C$, $c$, $\tilde c$ depend on the specified parameters but they may vary at each occurrence, even in the same line. We shall indicate the dependence of the constant on other parameters whenever this is important.

Let $D_j = D_j(X)$ be the collection of dyadic subcubes of $X$ of sidelength $2^{-j}$ and $D := \cup_{j \ge 0} D_j$. These cubes are naturally aligned on a tree $T = T(D)$. Each node of the tree $T$ corresponds to a cube $I \in D$. If $I \in D_j$, then its children are the $2^d$ dyadic cubes $J \in D_{j+1}$ with $J \subset I$. We denote the set of children of $I$ by $\mathcal{C}(I)$. We call $I$ the parent of each such child $J$ and write $I = P(J)$. A proper subtree $T_0$ of $T$ is a collection of nodes of $T$ with the properties: (i) the root node $I = X$ is in $T_0$; (ii) if $I \neq X$ is in $T_0$, then its parent is also in $T_0$. We obtain (dyadic) partitions $\Lambda$ of $X$ from finite proper subtrees $T_0$ of $T$. Given any such $T_0$, the outer leaves of $T_0$ consist of all $J \in T$ such that $J \notin T_0$ but $P(J)$ is in $T_0$. The collection $\Lambda = \Lambda(T_0)$ of outer leaves of $T_0$ is a partition of $X$ into dyadic cubes. It is easily checked that

$$\#(T_0) \le \#(\Lambda) \le 2^d \#(T_0). \qquad (2.9)$$

A uniform partition of $X$ into dyadic cubes consists of all dyadic cubes in $D_j(X)$ for some $j \ge 0$. Thus, each cube in a uniform partition has the same measure $2^{-jd}$. Another way of generating partitions is through some refinement strategy. One begins at the root $X$ and decides whether to refine $X$ (i.e. subdivide $X$) based on some refinement criterion. If $X$ is subdivided, then one examines each child and decides whether or not to refine such a child based on the refinement strategy. Partitions obtained this way are called adaptive.

We let $\Pi_K$ denote the space of multivariate polynomials of total degree $\le K$, with $K \ge 0$ a fixed integer. In the analysis we present, the space $\Pi_K$ could be replaced by any space of functions of fixed finite dimension without affecting our general discussion. Given a dyadic cube $I \in D$ and a function $f \in L_2(X, \rho_X)$, we denote by $p_I(f)$ the best approximation to $f$ on $I$:

$$p_I(f) := \operatorname{argmin}_{p \in \Pi_K} \|f - p\|_{L_2(I, \rho_X)}. \qquad (2.10)$$

Given $K \ge 0$ and a partition $\Lambda$, let us denote by $S_\Lambda^K$ the space of piecewise polynomial functions of degree $\le K$ subordinate to $\Lambda$. Each $S \in S_\Lambda^K$ can be written

$$S = \sum_{I \in \Lambda} p_I \chi_I, \qquad p_I \in \Pi_K, \qquad (2.11)$$

where for $G \subset X$ we denote by $\chi_G$ the indicator function, i.e. $\chi_G(x) = 1$ for $x \in G$ and $\chi_G(x) = 0$ for $x \notin G$. We shall consider the approximation of a given function $f \in L_2(X, \rho_X)$ by the elements of $S_\Lambda^K$. The best approximation to $f$ in this space is given by

$$P_\Lambda f := \sum_{I \in \Lambda} p_I(f) \chi_I. \qquad (2.12)$$

We shall be interested in two types of approximation, corresponding to uniform refinement and adaptive refinement. We first discuss uniform refinement. Let

$$E_n(f) := \|f - P_{\Lambda_n} f\|, \qquad n = 0, 1, \dots, \qquad (2.13)$$

which is the error for uniform refinement. We shall denote by $\mathcal{A}^s$ the approximation class consisting of all functions $f \in L_2(X, \rho_X)$ such that

$$E_n(f) \le M_0 2^{-nds}, \qquad n = 0, 1, \dots. \qquad (2.14)$$
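The tree notation can be made concrete with a few lines of code. The sketch below (illustrative only, added for this transcription; the tuple encoding of cubes is our own convention) indexes a cube $I \in D_j$ by its level $j$ and integer corner $k \in \{0, \dots, 2^j-1\}^d$, lists its $2^d$ children $\mathcal{C}(I)$, and recovers the partition $\Lambda(T_0)$ of outer leaves from a finite proper subtree $T_0$, in agreement with (2.9).

```python
from itertools import product

# A dyadic cube I in D_j is identified by (j, k) with k in {0,...,2^j - 1}^d,
# i.e. I = prod_i [k_i 2^{-j}, (k_i + 1) 2^{-j}].

def children(cube):
    """The 2^d children of I, i.e. the set C(I) in D_{j+1}."""
    j, k = cube
    return [(j + 1, tuple(2 * ki + b for ki, b in zip(k, bits)))
            for bits in product((0, 1), repeat=len(k))]

def parent(cube):
    """The parent P(I) of a cube I different from the root X."""
    j, k = cube
    return (j - 1, tuple(ki // 2 for ki in k))

def outer_leaves(proper_tree):
    """Partition Lambda(T_0): all J whose parent lies in T_0 but J itself does not."""
    return [child for cube in proper_tree for child in children(cube)
            if child not in proper_tree]

if __name__ == "__main__":
    d = 2
    root = (0, (0,) * d)
    T0 = {root, (1, (0, 0))}          # a proper subtree: the root plus one refined child
    Lam = outer_leaves(T0)
    print(len(T0), len(Lam))          # satisfies #(T_0) <= #(Lambda) <= 2^d #(T_0), cf. (2.9)
```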

Notice that $\#(\Lambda_n) = 2^{nd}$, so that the decay in (2.14) is like $N^{-s}$ with $N$ the number of elements in the partition. The smallest $M_0$ for which (2.14) holds serves to define the semi-norm $|f|_{\mathcal{A}^s}$ on $\mathcal{A}^s$. The space $\mathcal{A}^s$ can be viewed as a smoothness space of order $ds > 0$ with smoothness measured with respect to $\rho_X$. For example, if $\rho_X$ is the Lebesgue measure, then $\mathcal{A}^{s/d} = B^s_\infty(L_2)$, $0 < s \le 1$, with equivalent norms. Here $B^s_\infty(L_2)$ is a Besov space. For $s < K$ one can take $\sup_{t > 0} t^{-s}\omega_K(f, t)_{L_2}$ as a norm for this space, where $\omega_K(f, t)_{L_2}$ is the $K$th order modulus of smoothness in $L_2$ (see [2] for the definition and properties of Besov spaces).

Instead of working with a priori fixed partitions, there is a second kind of approximation where the partition is generated adaptively and will vary with $f$. Adaptive partitions are typically generated by using some refinement criterion that determines whether or not to subdivide a given cell. We shall use a refinement criterion that is motivated by adaptive wavelet constructions such as those given in [4] for image compression. Given a function $f \in L_2(X, \rho_X)$, we define the local atoms

$$\psi_I(f) := \sum_{J \in \mathcal{C}(I)} p_J(f)\chi_J - p_I(f)\chi_I, \quad I \neq X, \qquad \psi_X(f) := p_X(f), \qquad (2.15)$$

and

$$\epsilon_I(f) := \|\psi_I(f)\|. \qquad (2.16)$$

Clearly, we have

$$f = \sum_{I \in D} \psi_I(f), \qquad (2.17)$$

and since the $\psi_I$ are mutually orthogonal, we also have

$$\|f\|^2_{L_2(X, \rho_X)} = \sum_{I \in D} \epsilon_I(f)^2. \qquad (2.18)$$

The number $\epsilon_I(f)^2$ gives the improvement in the squared $L_2(X, \rho_X)$ error when the cell $I$ is refined.

We let $T(f, \eta)$ be the smallest proper tree that contains all $I \in D$ such that $\epsilon_I(f) > \eta$. Corresponding to this tree we have the partition $\Lambda(f, \eta)$ consisting of the outer leaves of $T(f, \eta)$. We shall define some new smoothness spaces $\mathcal{B}^s$ which measure the regularity of a given function $f$ by the size of the tree $T(f, \eta)$. Given $s > 0$, we let $\mathcal{B}^s$ be the collection of all $f \in L_2(X, \rho_X)$ such that, for $p = (s + 1/2)^{-1}$, the following is finite:

$$|f|^p_{\mathcal{B}^s} := \sup_{\eta > 0} \eta^p \,\#(T(f, \eta)). \qquad (2.19)$$

We obtain the norm for $\mathcal{B}^s$ by adding $\|f\|$ to $|f|_{\mathcal{B}^s}$. One can show that

$$\|f - P_{\Lambda(f,\eta)} f\| \le C_s |f|^{\frac{1}{2s+1}}_{\mathcal{B}^s}\, \eta^{\frac{2s}{2s+1}} \le C_s |f|_{\mathcal{B}^s} N^{-s}, \qquad N := \#(T(f, \eta)), \quad \eta > 0, \qquad (2.20)$$

where the constant $C_s$ depends only on $s$. The proof of this estimate can be based on the same strategy as used in [4], where a similar result is proven in the case of the Lebesgue

measure: one introduces the trees $T_j := T(f, 2^{-j}\eta)$, which have the property $T_j \subset T_{j+1}$, and then writes

$$\|f - P_{\Lambda(f,\eta)} f\|^2 = \sum_{I \notin T(f,\eta)} \|\psi_I\|^2 = \sum_{j \ge 0}\ \sum_{I \in T_{j+1}\setminus T_j} \|\psi_I\|^2 \le \sum_{j \ge 0} \#(T_{j+1})(2^{-j}\eta)^2 \le 2^p |f|^p_{\mathcal{B}^s} \sum_{j \ge 0} (2^{-j}\eta)^{2-p},$$

which gives (2.20) with $C_s^2 := 2^p \sum_{j \ge 0} 2^{(p-2)j}$.

Invoking (2.9), it follows that every function $f \in \mathcal{B}^s$ can be approximated to order $O(N^{-s})$ by $P_\Lambda f$ for some partition $\Lambda$ with $\#(\Lambda) = N$. This should be contrasted with $\mathcal{A}^s$, which has the same approximation order for the uniform partition. It is easy to see that $\mathcal{B}^s$ is larger than $\mathcal{A}^s$. In classical settings, the class $\mathcal{B}^s$ is well understood. For example, in the case of Lebesgue measure and dyadic partitions we know that each Besov space $B^s_q(L_\tau)$ with $\tau > (s/d + 1/2)^{-1}$ and $0 < q \le \infty$ is contained in $\mathcal{B}^{s/d}$ (see [4]). This should be compared with $\mathcal{A}^s$, for which we know that $\mathcal{A}^{s/d} = B^s_\infty(L_2)$, as we have noted earlier.

2.2 An adaptive algorithm for learning

In the learning context, we cannot use the algorithm described in the previous section since the regression function $f_\rho$ and the measure $\rho$ are not known to us. Instead we shall use an empirical version of this adaptive procedure. Given the data $z$ and any Borel set $I \subset X$, we define

$$p_{I,z} := \operatorname{argmin}_{p \in \Pi_K} \frac{1}{m}\sum_{i=1}^m (p(x_i) - y_i)^2 \chi_I(x_i). \qquad (2.21)$$

When there are no $x_i$ in $I$, we set $p_{I,z} = 0$. Given a partition $\Lambda$ of $X$, we define the estimator $f_z$ as

$$f_z = f_{z,\Lambda} := \sum_{I \in \Lambda} T_M(p_{I,z})\chi_I, \qquad (2.22)$$

with $T_M$ the truncation operator defined earlier. Note that the empirical minimization (2.21) is not done over the truncated polynomials, since this is not numerically feasible. Instead, truncation is only used as a post-processing.

As in the previous section, we have two ways to generate partitions $\Lambda$. The first is to use the uniform partition $\Lambda_n$ consisting of all dyadic cells in $D_n$. The second is to define an empirical analogue of the $\epsilon_I$. For each cell $I$ in the master tree $T$, we define

$$\epsilon_I(z) := \Big\|T_M\Big(\sum_{J \in \mathcal{C}(I)} p_{J,z}\chi_J - p_{I,z}\chi_I\Big)\Big\|_m, \qquad (2.23)$$

where $\|\cdot\|_m$ is the empirical norm defined in (1.9).

A data-based adaptive partitioning requires limiting the depth of the corresponding trees. To this end, let $\gamma > 0$ be an arbitrary but fixed constant. We define $j_0 = j_0(m, \gamma)$ as

the smallest integer $j$ such that $2^{jd} \ge \tau_m^{-1/\gamma}$. We then consider the smallest tree $T(\tau_m, z)$ which contains the set

$$\Sigma(z, m) := \{I \in \cup_{j \le j_0} \Lambda_j : \epsilon_I(z) \ge \tau_m\}, \qquad (2.24)$$

where $\tau_m$ is a threshold to be set further. We then define the partition $\Lambda = \Lambda(\tau_m, z)$ associated to this tree and the corresponding estimator $f_z := f_{z,\Lambda}$. Obviously, the role of the integer $j_0$ is to limit the depth search for the coefficients $\epsilon_I(z)$ which are larger than the threshold $\tau_m$. Without this restriction, the tree $T(\tau_m, z)$ could be infinite, preventing a numerical implementation.

The essential steps of the adaptive algorithm in the present setting read as follows:

Algorithm: Given $z$, choose $\gamma > 0$ and $\tau_m := \kappa\sqrt{\frac{\log m}{m}}$; for $j_0(m, \gamma)$ determine the set $\Sigma(z, m)$ according to (2.24); form $T(\tau_m, z)$ and $\Lambda(\tau_m, z)$ and compute $f_z$ according to (2.22).

For further comments concerning the treatment of streaming data we refer to an analogous strategy outlined in [1]. In our previous work [1], we have analyzed the above algorithm in the case of piecewise constant approximation and we have proved the following result.

Theorem 2.1 Let $\beta, \gamma > 0$ be arbitrary. Then, using piecewise constant approximations in the above scheme, i.e. $K = 0$, there exists $\kappa_0 = \kappa_0(\beta, \gamma, M)$ such that if $\kappa \ge \kappa_0$ in the definition of $\tau_m$, then whenever $f_\rho \in \mathcal{A}^\gamma \cap \mathcal{B}^s$ for some $s > 0$, the following concentration estimate holds:

$$P\Big\{\|f_\rho - f_z\| \ge c\Big(\frac{\log m}{m}\Big)^{\frac{s}{2s+1}}\Big\} \le C m^{-\beta}, \qquad (2.25)$$

where the constants $c$ and $C$ are independent of $m$.

Let us make some remarks on this theorem. First note that truncation does not play any role in the case of piecewise constant approximation, since in that case the constant of best empirical approximation automatically is at most $M$ in absolute value. The theorem gives an estimate for the error $\|f_\rho - f_z\|$ in probability, which is the type of estimate we are looking for in this paper. From this one obtains a corresponding estimate in expectation. The order of approximation can be shown to be optimal save for the logarithmic term by using the results on lower estimates from [6]. Finally, note that the role of the space $\mathcal{A}^\gamma$ is a minor one since the only assumption on $\gamma$ is that it be positive. This assumption merely guarantees that a finite depth search will behave close to an infinite depth search.

The goal of the present paper is to determine whether the analogue of Theorem 2.1 holds when piecewise polynomials are used in place of piecewise constants. We shall see that this is not the case by means of a counterexample in §3. We shall then show that such estimates are possible if we place restrictions on the measure $\rho_X$.
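A minimal one-dimensional sketch of the adaptive scheme just described is given below (written for this transcription, not the authors' implementation; $X = [0,1)$, and the degree $K$, bound $M$ and constants $\kappa, \gamma$ are placeholder choices). It computes the cell-wise empirical fits (2.21), the empirical indicators (2.23), the depth limit $j_0$, the smallest tree containing $\Sigma(z,m)$, and the estimator (2.22) on the resulting partition of outer leaves.

```python
import numpy as np

M, K = 1.0, 1                  # bound |y| <= M and polynomial degree K (placeholders)
kappa, gamma, d = 1.0, 1.0, 1  # threshold/depth parameters; dimension d = 1

def fit_poly(x, y, a, b):
    """Empirical least squares polynomial p_{I,z} on I = [a, b), cf. (2.21);
    by convention p_{I,z} = 0 when no sample falls in I."""
    inside = (x >= a) & (x < b)
    if not inside.any():
        return np.zeros(K + 1)
    A = np.vander(x[inside], K + 1, increasing=True)
    return np.linalg.lstsq(A, y[inside], rcond=None)[0]

def eval_poly(c, x):
    return np.vander(x, K + 1, increasing=True) @ c

def eps_emp(x, y, a, b):
    """Indicator eps_I(z) of (2.23): empirical norm of the truncated difference
    between the children fits and the parent fit on I = [a, b)."""
    mid, diff = 0.5 * (a + b), np.zeros_like(x)
    for lo, hi in ((a, mid), (mid, b)):
        sel = (x >= lo) & (x < hi)
        diff[sel] = eval_poly(fit_poly(x, y, lo, hi), x[sel])
    sel = (x >= a) & (x < b)
    diff[sel] -= eval_poly(fit_poly(x, y, a, b), x[sel])
    return np.sqrt(np.mean(np.clip(diff, -M, M) ** 2))

def adaptive_partition(x, y):
    """Lambda(tau_m, z): outer leaves of the smallest tree containing all cells
    I at levels <= j_0 with eps_I(z) >= tau_m."""
    m = len(x)
    tau = kappa * np.sqrt(np.log(m) / m)
    j0 = int(np.ceil(-np.log2(tau) / (gamma * d)))        # smallest j with 2^{jd} >= tau^{-1/gamma}
    marked = {(j, k) for j in range(j0 + 1) for k in range(2 ** j)
              if eps_emp(x, y, k * 2.0 ** (-j), (k + 1) * 2.0 ** (-j)) >= tau}
    tree = {(0, 0)}
    for (j, k) in marked:                                  # add each marked cell and its ancestors
        while j >= 0 and (j, k) not in tree:
            tree.add((j, k))
            j, k = j - 1, k // 2
    leaves = [(j + 1, c) for (j, k) in tree for c in (2 * k, 2 * k + 1)
              if (j + 1, c) not in tree]
    return [(k * 2.0 ** (-j), (k + 1) * 2.0 ** (-j)) for (j, k) in leaves]

def estimator(x, y):
    """f_z of (2.22): truncated per-cell least squares on the adaptive partition."""
    fits = [(a, b, fit_poly(x, y, a, b)) for (a, b) in adaptive_partition(x, y)]
    return lambda t: sum(np.clip(eval_poly(c, t), -M, M) * ((t >= a) & (t < b))
                         for (a, b, c) in fits)
```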

2.3 Estimates in Expectation

Before starting the analysis of results in probability for the higher order piecewise polynomial case, let us point out that it is possible to derive estimates in expectation for adaptive partitions constructed by other strategies, such as model selection by complexity regularization. In particular, the following result can easily be derived from Theorem 12.1 in [8].

Theorem 2.2 Let $\gamma > 0$ be arbitrary and let $j_0 = j_0(\gamma, m)$ be defined again as the smallest integer $j$ such that $2^{jd} \ge (m/\log m)^{1/(2\gamma)}$. Consider the set $\mathcal{M}_m$ of all partitions $\Lambda$ induced by proper trees $T \subset \cup_{j \le j_0}\Lambda_j$. Then, there exists $\kappa_0 = \kappa_0(d, K)$ such that if

$$\mathrm{pen}_m(\Lambda) = \kappa\, \frac{\log m}{m}\, \#(\Lambda)$$

for some $\kappa \ge \kappa_0$, the estimator defined by $f_z := f_{z,\Lambda^*}$ with

$$\Lambda^* := \operatorname{argmin}_{\Lambda \in \mathcal{M}_m} \big\{\|y - f_{z,\Lambda}\|_m^2 + \mathrm{pen}_m(\Lambda)\big\}$$

satisfies

$$E(\|f_\rho - f_z\|) \le C\Big(\frac{\log m}{m}\Big)^{\frac{s}{2s+1}}, \qquad m = 1, 2, \dots, \qquad (2.26)$$

if $f_\rho \in \mathcal{A}^\gamma \cap \mathcal{B}^s$, where the constant $C$ depends on $\kappa$, $M$, $\|f_\rho\|_{\mathcal{B}^s}$, $\|f_\rho\|_{\mathcal{A}^\gamma}$, but not on $m$.

Let us also remark that the search for the optimal partition $\Lambda^*$ in the above theorem can be performed at a reasonable computational cost using a CART algorithm (see e.g. [3] or [7]). Note that our approach for selecting the appropriate partition differs from the CART algorithms, in the sense that it is based on a thresholding procedure rather than on solving an optimization problem.

3 A counterexample

We begin by showing that in general we cannot obtain optimal estimates with high probability when using empirical risk minimization with piecewise polynomials of degree larger than zero. We shall first consider the case of approximation by linear functions on the interval $X = [-1, 1]$ for the bound $M = 1$. For each $m = 1, 2, \dots$, we will construct a measure $\rho = \rho_m$ on $[-1,1] \times [-1,1]$ for which empirical risk minimization does not perform well in probability. Let $p$ be the polynomial

$$p := \operatorname{argmin}_{g \in \Pi_1} E(|g(x) - y|^2) = \operatorname{argmin}_{g \in \Pi_1} \|f_\rho - g\|^2, \qquad (3.1)$$

with $f_\rho$ the regression function and $\|\cdot\|$ the $L_2(X, \rho_X)$ norm. Consider also the empirical least squares minimizer

$$\hat p := \operatorname{argmin}_{g \in \Pi_1} \frac{1}{m}\sum_{i=1}^m |g(x_i) - y_i|^2. \qquad (3.2)$$

We are interested in the concentration properties between $T(p)$ and $T(\hat p)$ in the $\|\cdot\|$ metric, where $T(u) = \mathrm{sign}(u)\min\{1, |u|\}$ is the truncation operator. We shall prove the following result, which expresses the fact that we cannot hope for a distribution-free concentration inequality with a fast rate of decay of the probability.

Lemma 3.1 Given any $\beta > 2$, there exist absolute constants $c, \tilde c > 0$ such that for each $m = 1, 2, \dots$, there is a distribution $\rho = \rho_m$ such that the following inequalities hold:

$$P\{\|T(p) - T(\hat p)\| \ge c\} \ge \tilde c\, m^{-\beta+1} \qquad (3.3)$$

and

$$P\{\|f_\rho - T(\hat p)\| \ge c\} \ge \tilde c\, m^{-\beta+1}. \qquad (3.4)$$

On the other hand, we have

$$\|f_\rho - T(p)\|^2 \le C_0\big[m^{-2\beta+4} + m^{-\beta}\big] \qquad (3.5)$$

for an absolute constant $C_0$.

Remark 3.2 Note that this clearly implies the same results as in (3.3) and (3.4) with the truncation operator removed. By contrast, for least squares approximation by constants $q$, we have (see [1]) for all $\rho$, $m$ and $\eta$

$$P\{\|q - \hat q\| \ge \eta\} \lesssim e^{-cm\eta^2}. \qquad (3.6)$$

Proof of Lemma 3.1: In order to prove this result, we consider the probability measure

$$\rho_X := (1/2 - \kappa)(\delta_\gamma + \delta_{-\gamma}) + \kappa(\delta_1 + \delta_{-1}), \qquad (3.7)$$

where $\gamma := \gamma_m = \frac{1}{3m}$ and $\kappa := \kappa_m := m^{-\beta}$. We then define $\rho = \rho_m$ completely by

$$y(\gamma) = 1, \quad y(-\gamma) = -1, \quad y(\pm 1) = 0, \quad \text{with probability } 1. \qquad (3.8)$$

Therefore, there is no randomness in the $y$ direction. It follows that

$$f_\rho(\gamma) = 1, \quad f_\rho(-\gamma) = -1, \quad f_\rho(\pm 1) = 0. \qquad (3.9)$$

We next proceed in three steps.

1. Properties of $p$: By symmetry, the linear function that best approximates $f_\rho$ in $L_2(\rho_X)$ is of the form $p(x) = ax$, where $a$ minimizes

$$F(a) = (1/2 - \kappa)(a\gamma - 1)^2 + \kappa a^2. \qquad (3.10)$$

We therefore find that

$$p(\pm\gamma) = \pm a\gamma = \pm\frac{(1/2 - \kappa)\gamma^2}{(1/2 - \kappa)\gamma^2 + \kappa} = \pm 1 + O(m^{-\beta+2}). \qquad (3.11)$$

This shows that

$$\|f_\rho - T(p)\|^2 \le C_0 m^{-2\beta+4} + 2\kappa \le C_0\big[m^{-2\beta+4} + m^{-\beta}\big] \qquad (3.12)$$

with $C_0$ an absolute constant.

2. Properties of $\hat p$: We can write the empirical least squares polynomial $\hat p$ as

$$\hat p(x) = \hat b + \hat a(x - \hat\xi), \qquad (3.13)$$

where $\hat b = \frac{1}{m}\sum_{i=1}^m y_i$ and $\hat\xi := \frac{1}{m}\sum_{i=1}^m x_i$. (Notice that $1$ and $x - \hat\xi$ are orthogonal with respect to the empirical measure $\frac{1}{m}\sum_{i=1}^m \delta_{x_i}$.) From

$$\frac{\hat p(\gamma) - \hat p(-\gamma)}{2\gamma} = \hat a = \frac{\hat p(\hat\xi) - \hat p(\gamma)}{\hat\xi - \gamma}, \qquad (3.14)$$

it follows that

$$\hat p(\gamma) = \hat p(-\gamma)\Big(1 + \frac{2\gamma}{\hat\xi - \gamma}\Big)^{-1} + \hat p(\hat\xi)\Big(1 + \frac{\hat\xi - \gamma}{2\gamma}\Big)^{-1}. \qquad (3.15)$$

Since $\hat p(\hat\xi) = \hat b \in [-1, 1]$, (3.15) shows that whenever $\hat\xi \ge 2\gamma$, then either $\hat p(\gamma) \le 1/2$ or $\hat p(-\gamma) \ge -1/2$. It follows that whenever $\hat\xi \ge 2\gamma$, we have

$$\|f_\rho - T(\hat p)\|^2 \ge (1/2 - \kappa)(1/2)^2 \ge c^2 \qquad (3.16)$$

with $c$ an absolute constant. Using (3.12), we see that (3.3) is also satisfied provided that $P\{\hat\xi \ge 2\gamma\} \ge \tilde c\, m^{-\beta+1}$.

3. Study of $\hat\xi$: We consider the event where $x_i = 1$ for one $i \in \{1, \dots, m\}$ and $x_j = \gamma$ or $-\gamma$ for the other $j \neq i$. In such an event, we have

$$\hat\xi \ge \frac{1}{m}\big(1 - (m-1)\gamma\big) > \frac{1}{m}\Big(1 - \frac13\Big) = \frac{2}{3m} = 2\gamma. \qquad (3.17)$$

The probability of this event is

$$P = m\kappa(1 - 2\kappa)^{m-1} \ge \tilde c\, m^{-\beta+1}. \qquad (3.18)$$

This concludes the proof of (3.3). $\Box$

Let us now adjust the example of the lemma to give information about piecewise linear approximation on adaptively generated dyadic partitions. We first note that we can rescale the measure of the lemma to any given interval $I$. We choose a dyadic interval $I$ of length $2^{-m}$ and scale the measure of the lemma to that interval. We denote this measure again by $\rho_m$. The regression function $f_{\rho_m}$ will be denoted by $f_m$; it takes values $\pm 1$ at the rescaled points corresponding to $\gamma$ and $-\gamma$ and takes the value $0$ everywhere else. We know from Lemma 3.1 that we can approximate $f_m$ by a single polynomial to accuracy

$$\|f_\rho - p\|^2 \le C_0\big[m^{-2\beta+4} + m^{-\beta}\big]. \qquad (3.19)$$
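The construction in the proof above is easy to simulate. The sketch below (an illustration added in this transcription, not part of the paper) draws samples from $\rho_m$, computes the truncated empirical line $T(\hat p)$, and records how often $\|f_\rho - T(\hat p)\|_{L_2(\rho_X)}$ stays bounded away from zero; the observed frequency decays only polynomially in $m$, in line with (3.4) and (3.18).

```python
import numpy as np

def sample_rho(m, beta, rng):
    """Draw m points from rho_X = (1/2 - kappa)(delta_gamma + delta_{-gamma})
    + kappa(delta_1 + delta_{-1}), gamma = 1/(3m), kappa = m^{-beta}, with the
    deterministic labels y(gamma) = 1, y(-gamma) = -1, y(+-1) = 0."""
    gamma, kappa = 1.0 / (3 * m), m ** (-beta)
    support = np.array([gamma, -gamma, 1.0, -1.0])
    labels = np.array([1.0, -1.0, 0.0, 0.0])
    weights = np.array([0.5 - kappa, 0.5 - kappa, kappa, kappa])
    idx = rng.choice(4, size=m, p=weights)
    return support[idx], labels[idx], support, labels, weights

def bad_event_frequency(m, beta, trials, rng):
    """Monte Carlo frequency of ||f_rho - T(p_hat)||^2_{L_2(rho_X)} >= 1/16."""
    hits = 0
    for _ in range(trials):
        x, y, supp, f_rho, w = sample_rho(m, beta, rng)
        a1, a0 = np.polyfit(x, y, 1)                  # empirical least squares line
        p_hat = np.clip(a0 + a1 * supp, -1.0, 1.0)    # truncation T = T_1
        err2 = np.sum(w * (f_rho - p_hat) ** 2)       # squared L_2(rho_X) error
        hits += err2 >= 1.0 / 16.0
    return hits / trials

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # The failure probability is of order m^{1 - beta}, i.e. only polynomially
    # small in m, rather than exponentially small as in the constant case (3.6).
    print(bad_event_frequency(m=50, beta=2.5, trials=20000, rng=rng))
```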

Also, if we allow dyadic partitions with more than $m$ elements, we can approximate $f_\rho$ exactly. In other words, $f_\rho$ is in $\mathcal{B}^s$ with $s = \min(\beta - 2, \beta/2)$. On the other hand, any adaptively generated partition with at most $m$ elements will have an interval $J$ containing $I$. The empirical data sets will be rescaled to $I$ and the empirical $\hat p_J$'s will all be the empirical linear functions of Lemma 3.1 scaled to $I$. Hence, for any of the bad draws $z$ of Lemma 3.1, we will have

$$\|f_\rho - \hat f_z\| \ge c \qquad (3.20)$$

on a set of $z$ with probability larger than $\tilde c\, m^{-\beta+1}$. This shows that empirical least squares using piecewise linear functions on adaptively generated partitions will not provide optimal bounds with high probability.

Note that the above results are not in contradiction with optimal estimates in expectation. The counterexample also indicates, however, that the arguments leading to optimal rates in expectation based on complexity regularization cannot be expected to be refined towards estimates in probability.

4 Optimal results in probability under regularity assumptions on $\rho_X$

In view of the results of the previous section, it is not possible to prove optimal convergence rates with high probability for adaptive methods based on piecewise polynomials in the general setting of the regression problem in learning. In this section, we want to show that if we impose some restrictions on the marginal measure $\rho_X$, then high probability results are possible.

We fix the value $K$ of the polynomial space $\Pi_K$ in this section. For each dyadic cube $I \subset \mathbb{R}^d$ and any function $f \in L_2(X, \rho_X)$, we define $p_I(f)$ as in (2.10). This means that given any dyadic partition $\Lambda$, the best approximation to $f$ from $S_\Lambda$ is given by the projector

$$P_\Lambda f = \sum_{I \in \Lambda} p_I(f)\chi_I. \qquad (4.1)$$

We shall work under the following assumption in this section.

Assumption A: There exists a constant $C_A > 0$ such that for each dyadic cube $I$, there exists an $L_2(I, \rho_X)$-orthonormal basis $(L_{I,k})_{k=1,\dots,\lambda}$ of $\Pi_K$ (with $\lambda$ the algebraic dimension of $\Pi_K$) such that

$$\|L_{I,k}\|_{L_\infty(I)} \le C_A (\rho_X(I))^{-1/2}, \qquad k = 1, \dots, \lambda. \qquad (4.2)$$

Note that this assumption implies that the least squares projection is uniquely defined on each $I$ by

$$p_I(f) = \sum_{k=1}^\lambda \langle f, L_{I,k}\rangle_{L_2(I, \rho_X)} L_{I,k}, \qquad (4.3)$$

and in particular $\rho_X(I) \neq 0$. It also implies that for all partitions $\Lambda$ and for all $f \in L_\infty(X)$,

$$\|P_\Lambda f\|_{L_\infty} \le \lambda C_A \|f\|_{L_\infty}, \qquad (4.4)$$

i.e. the projectors $P_\Lambda$ are bounded in $L_\infty$ independently of $\Lambda$. Indeed, from the Cauchy-Schwarz inequality, we have

$$\|L_{I,k}\|_{L_1(I, \rho_X)} \le (\rho_X(I))^{1/2}\|L_{I,k}\|_{L_2(I, \rho_X)} = (\rho_X(I))^{1/2}, \qquad (4.5)$$

and therefore for all $x \in I \in \Lambda$

$$|(P_\Lambda f)(x)| \le \sum_{k=1}^\lambda |\langle f, L_{I,k}\chi_I\rangle|\, |L_{I,k}(x)| \le \lambda C_A \|f\|_{L_\infty(I)}. \qquad (4.6)$$

It is readily seen that Assumption A holds when $\rho_X$ is the Lebesgue measure $dx$. On the other hand, it may not hold when $\rho_X$ is a very irregular measure. A particularly simple case where Assumption A always holds is the following:

Assumption B: We have $d\rho_X = \omega(x)dx$ and there exists a constant $C_B > 0$ such that for all $I \in D$, there exists a convex subset $D \subset I$ with $I \subset BD$ and such that

$$0 < \rho_X(I) \le C_B |I| \inf_{x \in D} \omega(x).$$

This assumption is obviously fulfilled by weight functions such that $0 < c \le \omega(x) \le C$, but it is also fulfilled by functions which may vanish (or go to $+\infty$) with a power-like behaviour at isolated points or lower dimensional manifolds.

Let us explain why Assumption B implies Assumption A. Any convex domain $D$ can be framed by $E \subset D \subset \tilde E$, where $E$ is an ellipsoid and $\tilde E$ a dilated version of $E$ by a fixed factor depending only on the dimension $d$. Therefore, if $P$ is a polynomial with $\|P\|_{L_2(I, \rho_X)} = 1$, the bound $\|P\|^2_{L_\infty(I)} \le C\|P\|^2_{L_\infty(D)}$ holds for a constant $C$ depending only on the degree $K$, the spatial dimension $d$ and the constant $B$ in Assumption B. Hence, using Assumption B, we further conclude

$$\|P\|^2_{L_\infty(I)} \lesssim \|P\|^2_{L_\infty(D)} \lesssim |D|^{-1}\int_D |P(x)|^2\,dx \lesssim |D|^{-1}|I|\,\rho_X(I)^{-1}\int_D |P(x)|^2\omega(x)\,dx \lesssim \rho_X(I)^{-1}\int_I |P(x)|^2\,d\rho = \rho_X(I)^{-1}.$$

Here, all the constants in the inequalities depend on $d$, $K$, $B$, $C_B$ but are independent of $I \in T$.

4.1 Probability estimates for the empirical estimator

We shall now show that under Assumption A, we can estimate the discrepancy between the truncated least squares polynomial approximation to $f_\rho$ and the truncated least squares

polynomial fit to the empirical data. This should be compared with the counterexample of the last section, which showed that for general $\rho$ we do not have this property.

Lemma 4.1 There exists a constant $c > 0$, which depends on the constant $C_A$ in Assumption A, on the polynomial space dimension $\lambda = \lambda(K)$ of $\Pi_K$ and on the bound $M$, such that for all $I \in D$

$$P\{\|T_M(p_I)\chi_I - T_M(p_{I,z})\chi_I\| > \eta\} \le \tilde c\, e^{-cm\eta^2}, \qquad (4.7)$$

where $\tilde c = 2(\lambda + \lambda^2)$.

Proof: We clearly have

$$\|T_M(p_I)\chi_I - T_M(p_{I,z})\chi_I\|^2 \le 4M^2\rho_X(I), \qquad (4.8)$$

and therefore it suffices to prove (4.7) for those $I$ such that $\eta^2 \le 4M^2\rho_X(I)$. We use the basis $(L_{I,k})_{k=1,\dots,\lambda}$ of Assumption A to express $p_I$ and $p_{I,z}$ according to

$$p_I = \sum_{k=1}^\lambda c_{I,k}L_{I,k} \quad \text{and} \quad p_{I,z} = \sum_{k=1}^\lambda c_{I,k}(z)L_{I,k}, \qquad (4.9)$$

and we denote by $c_I$ and $c_I(z)$ the corresponding coordinate vectors. We next remark that since $T_M$ is a contraction, it follows that

$$\|T_M(p_I)\chi_I - T_M(p_{I,z})\chi_I\| \le \|p_I\chi_I - p_{I,z}\chi_I\|, \qquad (4.10)$$

and therefore

$$\|T_M(p_I)\chi_I - T_M(p_{I,z})\chi_I\|^2 \le \sum_{k=1}^\lambda |c_{I,k} - c_{I,k}(z)|^2 = \|c_I - c_I(z)\|^2_{\ell_2(\mathbb{R}^\lambda)}, \qquad (4.11)$$

where $\|\cdot\|_{\ell_2(\mathbb{R}^\lambda)}$ is the $\lambda$-dimensional Euclidean norm. Introducing

$$\alpha_{I,k} := \int_Z yL_{I,k}(x)\chi_I(x)\,d\rho \quad \text{and} \quad \alpha_{I,k}(z) := \frac{1}{m}\sum_{i=1}^m y_iL_{I,k}(x_i)\chi_I(x_i), \qquad (4.12)$$

with $\alpha_I$ and $\alpha_I(z)$ the corresponding coordinate vectors, we clearly have

$$c_I = \alpha_I. \qquad (4.13)$$

On the other hand, we have $G_I(z)c_I(z) = \alpha_I(z)$, where

$$(G_I(z))_{k,l} := \frac{1}{m}\sum_{i=1}^m L_{I,k}(x_i)L_{I,l}(x_i) = \langle L_{I,k}, L_{I,l}\rangle_m,$$

and $c_I(z) := 0$ when there are no $x_i$'s in $I$. Therefore, when $G_I(z)$ is invertible, we can write

$$c_I(z) - c_I = G_I(z)^{-1}\alpha_I(z) - \alpha_I = G_I(z)^{-1}\big[\alpha_I(z) - G_I(z)\alpha_I\big],$$

and therefore

$$\|c_I(z) - c_I\|_{\ell_2(\mathbb{R}^\lambda)} \le \|G_I(z)^{-1}\|_{\ell_2(\mathbb{R}^\lambda)}\big(\|\alpha_I(z) - \alpha_I\|_{\ell_2(\mathbb{R}^\lambda)} + \|G_I(z) - I\|_{\ell_2(\mathbb{R}^\lambda)}\|\alpha_I\|_{\ell_2(\mathbb{R}^\lambda)}\big), \qquad (4.14)$$

where we also denote by $\|G\|_{\ell_2(\mathbb{R}^\lambda)}$ the spectral norm of a $\lambda \times \lambda$ matrix $G$. Since $\|G_I(z) - I\|_{\ell_2(\mathbb{R}^\lambda)} \le 1/2$ implies $\|G_I(z)^{-1}\|_{\ell_2(\mathbb{R}^\lambda)} \le 2$, it follows that $\|c_I(z) - c_I\|_{\ell_2(\mathbb{R}^\lambda)} \le \eta$ provided that we have jointly

$$\|\alpha_I(z) - \alpha_I\|_{\ell_2(\mathbb{R}^\lambda)} \le \frac{\eta}{4} \qquad (4.15)$$

and

$$\|G_I(z) - I\|_{\ell_2(\mathbb{R}^\lambda)} \le \min\Big\{\frac12, \frac{\eta}{4}\|\alpha_I\|^{-1}_{\ell_2(\mathbb{R}^\lambda)}\Big\}. \qquad (4.16)$$

By Bernstein's inequality, we have

$$P\Big\{\|\alpha_I(z) - \alpha_I\|_{\ell_2(\mathbb{R}^\lambda)} > \frac{\eta}{4}\Big\} \le P\Big\{|\alpha_{I,k}(z) - \alpha_{I,k}| > \frac{\eta}{4\lambda^{1/2}} \text{ for some } k\Big\} \le 2\lambda e^{-\frac{cm\eta^2}{\sigma^2 + S\eta}}, \qquad (4.17)$$

with $c$ depending on $\lambda$, where, on account of (4.2),

$$S := \sup_k \sup_{x,y} |yL_{I,k}(x)| \le MC_A\rho_X(I)^{-1/2}, \quad \text{and} \quad \sigma^2 := \sup_k E(y^2|L_{I,k}|^2) \le M^2.$$

Since $\eta^2 \le 4M^2\rho_X(I)$, we thus have, up to a change in the constant $c$,

$$P\Big\{\|\alpha_I(z) - \alpha_I\|_{\ell_2(\mathbb{R}^\lambda)} > \frac{\eta}{4}\Big\} \le 2\lambda e^{-\frac{cm\eta^2}{M^2 + M\rho_X(I)^{-1/2}\eta}} \le 2\lambda e^{-cm\eta^2}, \qquad (4.18)$$

with $c$ depending on $\lambda$, $C_A$ and $M$. In a similar way, we obtain by Bernstein's inequality that

$$P\{\|G_I(z) - I\|_{\ell_2(\mathbb{R}^\lambda)} > \xi\} \le P\Big\{|(G_I(z))_{k,l} - \delta_{k,l}| > \frac{\xi}{\lambda} \text{ for some } (k,l)\Big\} \le 2\lambda^2 e^{-\frac{cm\xi^2}{\sigma^2 + S\xi}}, \qquad (4.19)$$

with $c$ depending on $\lambda$, where now, again by (4.2),

$$S := \sup_{k,l}\sup_x |L_{I,k}(x)L_{I,l}(x)| \le C_A^2\rho_X(I)^{-1}, \qquad (4.20)$$

and

$$\sigma^2 := \sup_{k,l} E(|L_{I,k}(x)L_{I,l}(x)|^2) \le C_A^2\rho_X(I)^{-1}. \qquad (4.21)$$

In the case $\xi = \frac12 \le \frac{\eta}{4}\|\alpha_I\|^{-1}_{\ell_2(\mathbb{R}^\lambda)}$, we thus obtain from (4.19), using that $\eta^2 \le 4M^2\rho_X(I)$,

$$P\{\|G_I(z) - I\|_{\ell_2(\mathbb{R}^\lambda)} > \xi\} \le 2\lambda^2 e^{-\tilde c m\rho_X(I)} \le 2\lambda^2 e^{-cm\eta^2}, \qquad (4.22)$$

with $c$ depending on $\lambda$, $C_A$ and $M$. In the case $\xi = \frac{\eta}{4}\|\alpha_I\|^{-1}_{\ell_2(\mathbb{R}^\lambda)} \le \frac12$, we notice that

$$\|\alpha_I\|_{\ell_2(\mathbb{R}^\lambda)} \le \lambda^{1/2}M\sup_k \|L_{I,k}\|_{L_1(I,\rho_X)} \le \lambda^{1/2}MC_A\rho_X(I)^{1/2}, \qquad (4.23)$$

because of Assumption A. It follows that $\xi \ge \eta(16\lambda M^2C_A^2\rho_X(I))^{-1/2}$, and the absolute value of the exponent in the exponential of (4.19) is bounded from below by

$$\frac{cm\eta^2\rho_X(I)^{-1}}{24\lambda M^2C_A^4\rho_X(I)^{-1}} \ge cm\eta^2, \qquad (4.24)$$

with $c$ depending on $\lambda$, $C_A$ and $M$. The proof of (4.7) is therefore complete. $\Box$

Remark 4.2 The constant $c$ in (4.7) depends on $M$ and $C_A$ and behaves like $(MC_A^2)^{-2}$.

4.2 Learning on Fixed and Uniform Partitions

From the basic estimate (4.7), we can immediately derive an estimate for an arbitrary but fixed partition $\Lambda$ consisting of disjoint dyadic cubes. If $\#(\Lambda) = N$, we have

$$P\{\|f_{z,\Lambda} - T_M(P_\Lambda f_\rho)\| > \eta\} \le P\{\|T_M(p_I)\chi_I - T_M(p_{I,z})\chi_I\| > \eta N^{-1/2} \text{ for some } I \in \Lambda\},$$

which yields the following analogue of Theorem 2.1 of [1].

Remark 4.3 Under Assumption A one has, for any fixed integer $K \ge 0$, any partition $\Lambda$ and any $\eta > 0$,

$$P\{\|f_{z,\Lambda} - T_M(P_\Lambda f_\rho)\| > \eta\} \le C_0 N e^{-\frac{cm\eta^2}{N}}, \qquad (4.25)$$

where $N := \#(\Lambda)$, $C_0 = C_0(\lambda)$ and $c = c(\lambda, M, C_A)$. We can then derive by integration over $\eta > 0$ an estimate in the mean square sense,

$$E\big(\|f_{z,\Lambda} - T_M(P_\Lambda f_\rho)\|^2\big) \le C\,\frac{N\log N}{m}, \qquad (4.26)$$

similar to Corollary 2.2 of [1], with $C = C(M, C_A, \lambda)$.

Combining these estimates with the definition of the approximation classes $\mathcal{A}^s$, we obtain convergence rates similar to those in Theorem 2.3 of [1].

Theorem 4.4 If $f_\rho \in \mathcal{A}^s$ and the estimator is defined by $f_z := f_{\Lambda_j,z}$ with $j$ chosen as the smallest integer such that $2^{jd(1+2s)} \ge \frac{m}{\log m}$, then given any $\beta > 0$, there exist constants $C = C(M, C_A, \lambda, \beta)$ and $c = c(M, C_A, \lambda, \beta)$ such that

$$P\Big\{\|f_\rho - f_z\| > (c + |f_\rho|_{\mathcal{A}^s})\Big(\frac{\log m}{m}\Big)^{\frac{s}{2s+1}}\Big\} \le Cm^{-\beta}. \qquad (4.27)$$

There also exists a constant $C = C(M, C_A, \lambda)$ such that

$$E\big(\|f_\rho - f_z\|^2\big) \le (C + |f_\rho|^2_{\mathcal{A}^s})\Big(\frac{\log m}{m}\Big)^{\frac{2s}{2s+1}}. \qquad (4.28)$$
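For the estimator of Theorem 4.4, only the level $j$ of the uniform partition has to be chosen; the per-cell fits are then exactly those of (2.21)-(2.22) (see the earlier sketch). The snippet below (illustrative only; in practice $s$ is unknown, so this choice of $j$ is available only in the analysis or under prior knowledge) computes that level.

```python
import numpy as np

def uniform_level(m, d, s):
    """Smallest j with 2^{j d (1 + 2s)} >= m / log m, as in Theorem 4.4.
    The smoothness s is assumed known here purely for illustration."""
    target = m / np.log(m)
    j = 0
    while 2.0 ** (j * d * (1 + 2 * s)) < target:
        j += 1
    return j

if __name__ == "__main__":
    # Example: m = 10**4 samples in dimension d = 2 with s = 1/2 gives the
    # uniform partition Lambda_j on which the cell-wise fits are computed.
    j = uniform_level(10 ** 4, d=2, s=0.5)
    print(j, 2 ** (j * 2), "cells")
```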

As we have noted earlier, the second estimate (4.28) could have been obtained in a different way, namely by using Theorem 11.3 in [8] and without Assumption A. On the other hand, it is not clear that the probability estimate (4.27) could be obtained without this assumption.

4.3 Learning on Adaptive Partitions

We now turn to the adaptive algorithm as defined in §2.2. Here, we shall extend Theorem 2.5 of [1] in two ways. Recall that the depth of the tree is limited by $j_0 = j_0(m, \gamma)$, the smallest integer $j$ such that $2^{jd} \ge \tau_m^{-1/\gamma}$.

Theorem 4.5 We fix an arbitrary $\beta \ge 1$ and $\gamma > 0$. If we take the threshold $\tau_m := \kappa\sqrt{\frac{\log m}{m}}$ with $\kappa \ge \kappa_0 = \kappa_0(\gamma, \beta, M, \lambda, d, C_A)$, then for any $\rho$ such that $f_\rho \in \mathcal{A}^\gamma \cap \mathcal{B}^s$, $s > 0$, the adaptive algorithm gives an $f_z$ satisfying the concentration estimate

$$P\Big\{\|f_\rho - f_z\| \ge c\Big(\frac{\log m}{m}\Big)^{\frac{s}{2s+1}}\Big\} \le m^{-\beta}, \qquad (4.29)$$

with $c = c(s, d, \lambda, \beta, \|f_\rho\|_{\mathcal{B}^s}, \|f_\rho\|_{\mathcal{A}^\gamma})$. We also have the expectation bound

$$E\big(\|f_\rho - f_z\|^2\big) \le C\Big(\frac{\log m}{m}\Big)^{\frac{2s}{2s+1}}, \qquad (4.30)$$

with $C = C(s, \lambda, M, C_A, d, \beta, \|f_\rho\|_{\mathcal{B}^s}, \|f_\rho\|_{\mathcal{A}^\gamma})$.

A defect of this theorem is the dependency of $\kappa_0$ on $C_A$, which may be unknown in our regression problem. We shall propose later another convergence theorem where this defect is removed.

The strategy for proving Theorem 4.5 is similar to the one for Theorem 2.5 in [1]. The idea is to show that the set of coefficients chosen by the adaptive empirical algorithm is with high probability similar to the set that would be chosen if the adaptive thresholding took place directly on $f_\rho$. This will be established by probability estimates which control the discrepancy between $\epsilon_I$ and $\epsilon_I(z)$. This is given by the following result, which is a substitute for Lemma 4.1 in [1].

Lemma 4.6 For any $\eta > 0$ and any element $I \in \cup_{j \le j_0}\Lambda_j$, one has

$$P\{\epsilon_I(z) \le \eta \text{ and } \epsilon_I \ge 8\lambda C_A\eta\} \le c(1 + \eta^{-C})e^{-\tilde cm\eta^2} \qquad (4.31)$$

and

$$P\{\epsilon_I \le \eta \text{ and } \epsilon_I(z) \ge 4\eta\} \le c(1 + \eta^{-C})e^{-\tilde cm\eta^2}, \qquad (4.32)$$

where $c = c(\lambda, M, d)$, $\tilde c = \tilde c(\lambda, M, C_A, d)$ and $C = C(\lambda, d)$.

The proof of Lemma 4.6 is rather different from its counterpart in [1] and is postponed to the end of this section. Its usage in deriving the claimed bounds in Theorem 4.5, however, is very similar to the proof in [1]. Because of a few changes due to the truncation

operator, we briefly recall first its main steps, suppressing at times details that could be recovered from [1]. For any two partitions $\Lambda_0, \Lambda_1$ we denote by $\Lambda_0 \vee \Lambda_1$, $\Lambda_0 \wedge \Lambda_1$ the partitions induced by the corresponding trees $T_0 \cup T_1$, $T_0 \cap T_1$, respectively. Recall that the data-dependent partitions $\Lambda(\eta, z)$, defined in (2.24), do not contain cubes from levels higher than $j_0$. Likewise, for any threshold $\eta > 0$, we need the deterministic analogue $\Lambda(\eta) := \Lambda(f_\rho, \eta)$, where $\Lambda(f_\rho, \eta)$ is the partition associated to the smallest tree $T(f_\rho, \eta)$ that contains all $I$ for which $\epsilon_I(f_\rho) > \eta$.

As in the proof of Theorem 2.5 in [1], we estimate the error by

$$\|f_\rho - f_{z,\Lambda(\tau_m,z)}\| \le e_1 + e_2 + e_3 + e_4, \qquad (4.33)$$

with each term now defined by

$$e_1 := \|f_\rho - T_M(P_{\Lambda(\tau_m,z)\vee\Lambda(8C_A\lambda\tau_m)}f_\rho)\|,$$
$$e_2 := \|T_M(P_{\Lambda(\tau_m,z)\vee\Lambda(8C_A\lambda\tau_m)}f_\rho) - T_M(P_{\Lambda(\tau_m,z)\vee\Lambda(\tau_m/4)}f_\rho)\|,$$
$$e_3 := \|T_M(P_{\Lambda(\tau_m,z)\vee\Lambda(\tau_m/4)}f_\rho) - f_{z,\Lambda(\tau_m,z)\vee\Lambda(\tau_m/4)}\|,$$
$$e_4 := \|f_{z,\Lambda(\tau_m,z)\vee\Lambda(\tau_m/4)} - f_{z,\Lambda(\tau_m,z)}\|.$$

The first term $e_1$ is treated by approximation:

$$e_1 \le C_s(8C_A\lambda\tau_m)^{\frac{2s}{2s+1}}\|f_\rho\|^{\frac{1}{2s+1}}_{\mathcal{B}^s} + \tau_m\|f_\rho\|_{\mathcal{A}^\gamma} \le c\big(\|f_\rho\|_{\mathcal{B}^s} + \|f_\rho\|_{\mathcal{A}^\gamma}\big)\Big(\frac{\log m}{m}\Big)^{\frac{s}{2s+1}}, \qquad (4.34)$$

where $c = c(s, C_A, \lambda, \kappa)$. The second summand in the first estimate of (4.34) stems from the fact that optimal trees might be clipped by the restriction to $\Lambda_{j_0}$, and this missing part exploits the assumption $f_\rho \in \mathcal{A}^\gamma$.

The third term $e_3$ can be estimated by an inequality similar to (4.25), although $\Lambda(\tau_m,z)\vee\Lambda(\tau_m/4)$ is a data-dependent partition. We use the fact that all the cubes in this partition are always contained in the tree consisting of $T(f_\rho, \tau_m/4)$ completed by its outer leaves, i.e. by the cubes of $\Lambda(\tau_m/4)$. We denote by $\tilde T(f_\rho, \tau_m/4)$ this tree and by $\tilde N$ its cardinality. Using the same argument that led us to (4.25) for a fixed partition, we obtain

$$P\{e_3 > \eta\} \le C_0\tilde N e^{-\frac{cm\eta^2}{\tilde N}}, \qquad (4.35)$$

with $c$ and $C_0$ the same constants as in (4.25). From the definition of the space $\mathcal{B}^s$, we have the estimate

$$\tilde N \le (2^d + 1)\#(T(f_\rho, \tau_m/4)) \le (2^d + 1)\big(4\kappa^{-1}\|f_\rho\|_{\mathcal{B}^s}\big)^{\frac{2}{2s+1}}\Big(\frac{m}{\log m}\Big)^{\frac{1}{2s+1}}. \qquad (4.36)$$

Thus, for $\kappa$ larger than $\kappa_0(\beta)$ we obtain that

$$P\Big\{e_3 > c\Big(\frac{\log m}{m}\Big)^{\frac{s}{2s+1}}\Big\} \le m^{-\beta}, \qquad (4.37)$$

with $c = c(d, \lambda, \beta, C_A, \|f_\rho\|_{\mathcal{B}^s}, s, \kappa)$.

The terms $e_2$ and $e_4$ are treated using the probabilistic estimates of Lemma 4.6. From these estimates it follows that

$$P\{e_2 > 0\} + P\{e_4 > 0\} \le 2\#(\Lambda_{j_0})\,c(1 + \eta^{-C})e^{-\tilde cm\eta^2}, \qquad (4.38)$$

with $\eta = \tau_m/4$. It follows that for $\kappa \ge \kappa_0(\beta, \gamma, \lambda, M, C_A, d)$, we have

$$P\{e_2 > 0\} + P\{e_4 > 0\} \le m^{-\beta}. \qquad (4.39)$$

Combining the estimates for $e_1$, $e_2$, $e_3$ and $e_4$, we therefore obtain the probabilistic estimate (4.29).

For the expectation estimate, we clearly infer from (4.34)

$$E(e_1^2) \le c^2\big(\|f_\rho\|^2_{\mathcal{B}^s} + \|f_\rho\|^2_{\mathcal{A}^\gamma}\big)\Big(\frac{\log m}{m}\Big)^{\frac{2s}{2s+1}}, \qquad (4.40)$$

with $c = c(s, C_A, \lambda, \kappa)$. Also, since $e_2^2, e_4^2 \le 4M^2$, we obtain

$$E(e_2^2) + E(e_4^2) \le CM^2m^{-\beta}, \qquad (4.41)$$

with $C = C(\kappa, \lambda, C_A, M, d)$. We can estimate $E(e_3^2)$ by integration of (4.35), which gives

$$E(e_3^2) \le 4M^2m^{-\beta} + c^2\Big(\frac{\log m}{m}\Big)^{\frac{2s}{2s+1}}, \qquad (4.42)$$

with $c$ as in (4.37). This completes the proof of Theorem 4.5. $\Box$

We now turn to the proof of Lemma 4.6. We shall give a different approach than that used in [1] for piecewise constant approximation. Recall that

$$\epsilon_I(z) := \Big\|T_M\Big(\sum_{J \in \mathcal{C}(I)} p_{J,z}\chi_J - p_{I,z}\chi_I\Big)\Big\|_m \qquad (4.43)$$

and

$$\epsilon_I := \epsilon_I(f_\rho) := \Big\|\sum_{J \in \mathcal{C}(I)} p_J\chi_J - p_I\chi_I\Big\|. \qquad (4.44)$$

We introduce the auxiliary quantities

$$\tilde\epsilon_I(z) := \Big\|T_M\Big(\sum_{J \in \mathcal{C}(I)} p_{J,z}\chi_J - p_{I,z}\chi_I\Big)\Big\| \qquad (4.45)$$

and

$$\tilde\epsilon_I := \Big\|T_M\Big(\sum_{J \in \mathcal{C}(I)} p_J\chi_J - p_I\chi_I\Big)\Big\|. \qquad (4.46)$$

For the proof of (4.31), we first remark that from the $L_\infty$ bound (4.4) on the least squares projection, we know that

$$\Big\|\sum_{J \in \mathcal{C}(I)} p_J\chi_J - p_I\chi_I\Big\|_{L_\infty} \le \sup_{J \in \mathcal{C}(I)}\|p_J\chi_J\|_{L_\infty} + \|p_I\chi_I\|_{L_\infty} \le 2\lambda C_A M. \qquad (4.47)$$

It follows that the inequality

$$\Big|\sum_{J \in \mathcal{C}(I)} p_J\chi_J - p_I\chi_I\Big| \le 2\lambda C_A\Big|T_M\Big(\sum_{J \in \mathcal{C}(I)} p_J\chi_J - p_I\chi_I\Big)\Big| \qquad (4.48)$$

holds at all points, and therefore $\epsilon_I \le 2\lambda C_A\tilde\epsilon_I$, so that

$$P\{\epsilon_I(z) \le \eta \text{ and } \epsilon_I \ge 8\lambda C_A\eta\} \le P\{\epsilon_I(z) \le \eta \text{ and } \tilde\epsilon_I \ge 4\eta\}. \qquad (4.49)$$

We now write

$$P\{\epsilon_I(z) \le \eta \text{ and } \tilde\epsilon_I \ge 4\eta\} \le P\{\tilde\epsilon_I(z) \le 3\eta \text{ and } \tilde\epsilon_I \ge 4\eta\} + P\{\epsilon_I(z) \le \eta \text{ and } \tilde\epsilon_I(z) \ge 3\eta\} \le P\{|\tilde\epsilon_I(z) - \tilde\epsilon_I| \ge \eta\} + P\{\tilde\epsilon_I(z) - 2\epsilon_I(z) \ge \eta\} =: p_1 + p_2.$$

The first probability $p_1$ is estimated by

$$p_1 \le P\Big\{\Big\|T_M\Big(\sum_{J} p_J\chi_J - p_I\chi_I\Big) - T_M\Big(\sum_{J} p_{J,z}\chi_J - p_{I,z}\chi_I\Big)\Big\| > \eta\Big\} \le ce^{-\tilde cm\eta^2}, \qquad (4.50)$$

with $c = c(\lambda, d)$ and $\tilde c = \tilde c(\lambda, M, C_A, d)$, where the last inequality is obtained by the same technique as in the proof of Lemma 4.1, applied to $\sum_J p_J\chi_J - p_I\chi_I$ instead of $p_I\chi_I$ only.

The second probability $p_2$ is estimated by using results from [8]. Fix $I$ and let $\mathcal{F} = \mathcal{F}(I)$ be the set of all functions $f$ of the form

$$f = T_M\Big(\sum_{J \in \mathcal{C}(I)} q_J\chi_J - q_I\chi_I\Big), \qquad (4.51)$$

where $q_I$ and the $q_J$ are arbitrary polynomials from $\Pi_K$. Let $t = (t_1, \dots, t_{2m})$ with the $t_j$ chosen independently and at random with respect to the measure $\rho_X$, and consider the discrete norm

$$\|f\|_t := \Big(\frac{1}{2m}\sum_{j=1}^{2m}|f(t_j)|^2\Big)^{1/2}. \qquad (4.52)$$

We will need a bound for the covering number $\mathcal{N}(\mathcal{F}, \eta, t)$, which is the smallest number of balls of radius $\eta$ which cover $\mathcal{F}$ with respect to the norm $\|\cdot\|_t$. It is well known that if $V$ is a linear space of dimension $q$ and $\mathcal{G} := \{T_Mg : g \in V\}$, then

$$\mathcal{N}(\mathcal{G}, \eta, t) \le (C\eta^{-1})^{2q+1}, \qquad 0 < \eta \le 1, \qquad (4.53)$$

with $C = C(M)$ (see e.g. Theorems 9.4 and 9.5 in [8]). In our present situation, this leads to

$$\mathcal{N}(\mathcal{F}, \eta, t) \le (C\eta^{-1})^{2\lambda(2^d+1)+2}, \qquad 0 < \eta < 1. \qquad (4.54)$$

We apply this bound on covering numbers in Theorem 11.2 of [8], which states that

$$P\{\|f\| - 2\|f\|_m > \eta \text{ for some } f \in \mathcal{F}\} \le 3e^{-\frac{m\eta^2}{288M^2}}\,E\big(\mathcal{N}(\mathcal{F}, \eta, t)\big). \qquad (4.55)$$

Here the probability is with respect to $z$ and the expectation is with respect to $t$. We can put the entropy bound (4.54) for $\mathcal{N}$ into (4.55) and obtain

$$p_2 \le P\{\|f\| - 2\|f\|_m > \eta \text{ for some } f \in \mathcal{F}\} \le c\eta^{-C}e^{-\tilde cm\eta^2}, \qquad (4.56)$$

with $c$, $C$, $\tilde c$ as stated in the lemma. This is the estimate we wanted for $p_2$.

For the proof of (4.32), we first remark that obviously $\tilde\epsilon_I \le \epsilon_I$, so that

$$P\{\epsilon_I \le \eta \text{ and } \epsilon_I(z) \ge 4\eta\} \le P\{\tilde\epsilon_I \le \eta \text{ and } \epsilon_I(z) \ge 4\eta\}. \qquad (4.57)$$

We proceed similarly to the proof of (4.31) by writing

$$P\{\tilde\epsilon_I \le \eta \text{ and } \epsilon_I(z) \ge 4\eta\} \le P\Big\{\tilde\epsilon_I \le \eta \text{ and } \tilde\epsilon_I(z) \ge \frac32\eta\Big\} + P\Big\{\tilde\epsilon_I(z) \le \frac32\eta \text{ and } \epsilon_I(z) \ge 4\eta\Big\} \le P\{|\tilde\epsilon_I(z) - \tilde\epsilon_I| \ge \eta/2\} + P\{\epsilon_I(z) - 2\tilde\epsilon_I(z) \ge \eta\} =: p_1 + p_2. \qquad (4.58)$$

The first probability $p_1$ is estimated as previously. For the second probability $p_2$, we need a statement symmetric to Theorem 11.2 in [8] in order to derive

$$P\{\|f\|_m - 2\|f\| > \eta \text{ for some } f \in \mathcal{F}\} \le Ce^{-\tilde cm\eta^2}, \qquad (4.59)$$

with $\tilde c$ and $C$ as before. It is easily checked from the proof of Theorem 11.2 in [8] that such a statement also holds (one only needs to modify the first page of the proof of this theorem in [8] in order to bound $P\{\|f\|_m - 2\|f\| > \eta \text{ for some } f \in \mathcal{F}\}$ by $\frac32 P\{\|f\|_m - \|f\|'_m > \eta/4 \text{ for some } f \in \mathcal{F}\}$, where $\|\cdot\|'_m$ denotes the empirical norm with respect to a second, independent sample; the rest of the proof is then identical). The proof of Lemma 4.6 is therefore complete. $\Box$

We finally modify our algorithm slightly by choosing a slightly larger threshold which is now independent of the unknown constant $C_A$, namely $\tau_m := \frac{\log m}{\sqrt m}$. We could actually use any threshold of the type

$$\tau_m := \kappa(m)\sqrt{\frac{\log m}{m}}, \qquad (4.60)$$

where $\kappa(m)$ is a sequence which grows very slowly to $+\infty$. This results in an additional logarithmic factor in our convergence estimates. Moreover, the same analysis as in §5 of [1] shows that this new algorithm is universally consistent. We record this in the following theorem, for which we do not give a proof since it is very similar to the proof of Theorem 4.5.

Theorem 4.7 Given an arbitrary $\beta \ge 1$ and $\gamma > 0$, we take the threshold $\tau_m := \frac{\log m}{\sqrt m}$. Then the adaptive algorithm has the property that whenever $f_\rho \in \mathcal{A}^\gamma \cap \mathcal{B}^s$ for some $s > 0$, the following concentration estimate holds:

$$P\Big\{\|f_\rho - f_z\| \ge c\Big(\frac{(\log m)^2}{m}\Big)^{\frac{s}{2s+1}}\Big\} \le m^{-\beta}, \qquad (4.61)$$

with $c = c(s, C_A, \lambda, \|f_\rho\|_{\mathcal{B}^s}, \|f_\rho\|_{\mathcal{A}^\gamma})$, as well as the following expectation bound:

$$E\big(\|f_\rho - f_z\|^2\big) \le C\Big(\frac{(\log m)^2}{m}\Big)^{\frac{2s}{2s+1}}, \qquad (4.62)$$

with $C = C(s, \lambda, M, C_A, d, \|f_\rho\|_{\mathcal{B}^s}, \|f_\rho\|_{\mathcal{A}^\gamma})$. For a general regression function $f_\rho$, we have the universal consistency estimate

$$\lim_{m \to +\infty} E\big(\|f_\rho - f_z\|^2\big) = 0, \qquad (4.63)$$

which in turn implies convergence in probability: for all $\epsilon > 0$,

$$\lim_{m \to +\infty} P\{\|f_\rho - f_z\| > \epsilon\} = 0. \qquad (4.64)$$

References

[1] Binev, P., A. Cohen, W. Dahmen, R. DeVore, and V. Temlyakov (2005), Universal algorithms in learning theory - Part I: piecewise constant functions, Journal of Machine Learning Research (JMLR) 6.

[2] Bennett, C., and R. Sharpley (1988), Interpolation of Operators, Vol. 129 in Pure and Applied Mathematics, Academic Press, New York.

[3] Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone (1984), Classification and Regression Trees, Wadsworth International, Belmont, CA.

[4] Cohen, A., W. Dahmen, I. Daubechies, and R. DeVore (2001), Tree-structured approximation and optimal encoding, Appl. Comput. Harmon. Anal. 11.

[5] Cohen, A., R. DeVore, G. Kerkyacharian, and D. Picard (2001), Maximal spaces with given rate of convergence for thresholding algorithms, Appl. Comput. Harmon. Anal. 11.

[6] DeVore, R., G. Kerkyacharian, D. Picard, and V. Temlyakov (2004), Mathematical methods for supervised learning, to appear in J. of FoCM.

[7] Donoho, D.L. (1997), CART and best-ortho-basis: a connection, Ann. Stat. 25.

[8] Györfi, L., M. Kohler, A. Krzyżak, and H. Walk (2002), A Distribution-Free Theory of Nonparametric Regression, Springer, Berlin.

Peter Binev, Industrial Mathematics Institute, University of South Carolina, Columbia, SC 29208

Albert Cohen, Laboratoire Jacques-Louis Lions, Université Pierre et Marie Curie, 175 rue du Chevaleret, Paris, France, cohen@ann.jussieu.fr


More information

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs International Cobinatorics Volue 2011, Article ID 872703, 9 pages doi:10.1155/2011/872703 Research Article On the Isolated Vertices and Connectivity in Rando Intersection Graphs Yilun Shang Institute for

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

The degree of a typical vertex in generalized random intersection graph models

The degree of a typical vertex in generalized random intersection graph models Discrete Matheatics 306 006 15 165 www.elsevier.co/locate/disc The degree of a typical vertex in generalized rando intersection graph odels Jerzy Jaworski a, Michał Karoński a, Dudley Stark b a Departent

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

Generalized eigenfunctions and a Borel Theorem on the Sierpinski Gasket.

Generalized eigenfunctions and a Borel Theorem on the Sierpinski Gasket. Generalized eigenfunctions and a Borel Theore on the Sierpinski Gasket. Kasso A. Okoudjou, Luke G. Rogers, and Robert S. Strichartz May 26, 2006 1 Introduction There is a well developed theory (see [5,

More information

Lecture October 23. Scribes: Ruixin Qiang and Alana Shine

Lecture October 23. Scribes: Ruixin Qiang and Alana Shine CSCI699: Topics in Learning and Gae Theory Lecture October 23 Lecturer: Ilias Scribes: Ruixin Qiang and Alana Shine Today s topic is auction with saples. 1 Introduction to auctions Definition 1. In a single

More information

On Conditions for Linearity of Optimal Estimation

On Conditions for Linearity of Optimal Estimation On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at

More information

E0 370 Statistical Learning Theory Lecture 5 (Aug 25, 2011)

E0 370 Statistical Learning Theory Lecture 5 (Aug 25, 2011) E0 370 Statistical Learning Theory Lecture 5 Aug 5, 0 Covering Nubers, Pseudo-Diension, and Fat-Shattering Diension Lecturer: Shivani Agarwal Scribe: Shivani Agarwal Introduction So far we have seen how

More information

Necessity of low effective dimension

Necessity of low effective dimension Necessity of low effective diension Art B. Owen Stanford University October 2002, Orig: July 2002 Abstract Practitioners have long noticed that quasi-monte Carlo ethods work very well on functions that

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

Learnability and Stability in the General Learning Setting

Learnability and Stability in the General Learning Setting Learnability and Stability in the General Learning Setting Shai Shalev-Shwartz TTI-Chicago shai@tti-c.org Ohad Shair The Hebrew University ohadsh@cs.huji.ac.il Nathan Srebro TTI-Chicago nati@uchicago.edu

More information

Fixed-to-Variable Length Distribution Matching

Fixed-to-Variable Length Distribution Matching Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

New upper bound for the B-spline basis condition number II. K. Scherer. Institut fur Angewandte Mathematik, Universitat Bonn, Bonn, Germany.

New upper bound for the B-spline basis condition number II. K. Scherer. Institut fur Angewandte Mathematik, Universitat Bonn, Bonn, Germany. New upper bound for the B-spline basis condition nuber II. A proof of de Boor's 2 -conjecture K. Scherer Institut fur Angewandte Matheati, Universitat Bonn, 535 Bonn, Gerany and A. Yu. Shadrin Coputing

More information

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material Consistent Multiclass Algoriths for Coplex Perforance Measures Suppleentary Material Notations. Let λ be the base easure over n given by the unifor rando variable (say U over n. Hence, for all easurable

More information

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions Tight Inforation-Theoretic Lower Bounds for Welfare Maxiization in Cobinatorial Auctions Vahab Mirrokni Jan Vondrák Theory Group, Microsoft Dept of Matheatics Research Princeton University Redond, WA 9805

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

Testing Properties of Collections of Distributions

Testing Properties of Collections of Distributions Testing Properties of Collections of Distributions Reut Levi Dana Ron Ronitt Rubinfeld April 9, 0 Abstract We propose a fraework for studying property testing of collections of distributions, where the

More information

Support Vector Machines. Goals for the lecture

Support Vector Machines. Goals for the lecture Support Vector Machines Mark Craven and David Page Coputer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Soe of the slides in these lectures have been adapted/borrowed fro aterials developed

More information

On the Use of A Priori Information for Sparse Signal Approximations

On the Use of A Priori Information for Sparse Signal Approximations ITS TECHNICAL REPORT NO. 3/4 On the Use of A Priori Inforation for Sparse Signal Approxiations Oscar Divorra Escoda, Lorenzo Granai and Pierre Vandergheynst Signal Processing Institute ITS) Ecole Polytechnique

More information

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval Unifor Approxiation and Bernstein Polynoials with Coefficients in the Unit Interval Weiang Qian and Marc D. Riedel Electrical and Coputer Engineering, University of Minnesota 200 Union St. S.E. Minneapolis,

More information

1 Generalization bounds based on Rademacher complexity

1 Generalization bounds based on Rademacher complexity COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges

More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

1 Bounding the Margin

1 Bounding the Margin COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost

More information

Least Squares Fitting of Data

Least Squares Fitting of Data Least Squares Fitting of Data David Eberly, Geoetric Tools, Redond WA 98052 https://www.geoetrictools.co/ This work is licensed under the Creative Coons Attribution 4.0 International License. To view a

More information

An l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions

An l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions Journal of Matheatical Research with Applications Jul., 207, Vol. 37, No. 4, pp. 496 504 DOI:0.3770/j.issn:2095-265.207.04.0 Http://jre.dlut.edu.cn An l Regularized Method for Nuerical Differentiation

More information

Supplement to: Subsampling Methods for Persistent Homology

Supplement to: Subsampling Methods for Persistent Homology Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation

More information

Konrad-Zuse-Zentrum für Informationstechnik Berlin Heilbronner Str. 10, D Berlin - Wilmersdorf

Konrad-Zuse-Zentrum für Informationstechnik Berlin Heilbronner Str. 10, D Berlin - Wilmersdorf Konrad-Zuse-Zentru für Inforationstechnik Berlin Heilbronner Str. 10, D-10711 Berlin - Wilersdorf Folkar A. Borneann On the Convergence of Cascadic Iterations for Elliptic Probles SC 94-8 (Marz 1994) 1

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

Lecture 21. Interior Point Methods Setup and Algorithm

Lecture 21. Interior Point Methods Setup and Algorithm Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and

More information

Math Reviews classifications (2000): Primary 54F05; Secondary 54D20, 54D65

Math Reviews classifications (2000): Primary 54F05; Secondary 54D20, 54D65 The Monotone Lindelöf Property and Separability in Ordered Spaces by H. Bennett, Texas Tech University, Lubbock, TX 79409 D. Lutzer, College of Willia and Mary, Williasburg, VA 23187-8795 M. Matveev, Irvine,

More information

Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements

Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements 1 Copressive Distilled Sensing: Sparse Recovery Using Adaptivity in Copressive Measureents Jarvis D. Haupt 1 Richard G. Baraniuk 1 Rui M. Castro 2 and Robert D. Nowak 3 1 Dept. of Electrical and Coputer

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago

More information

Support recovery in compressed sensing: An estimation theoretic approach

Support recovery in compressed sensing: An estimation theoretic approach Support recovery in copressed sensing: An estiation theoretic approach Ain Karbasi, Ali Horati, Soheil Mohajer, Martin Vetterli School of Coputer and Counication Sciences École Polytechnique Fédérale de

More information

FAST DYNAMO ON THE REAL LINE

FAST DYNAMO ON THE REAL LINE FAST DYAMO O THE REAL LIE O. KOZLOVSKI & P. VYTOVA Abstract. In this paper we show that a piecewise expanding ap on the interval, extended to the real line by a non-expanding ap satisfying soe ild hypthesis

More information

Exact tensor completion with sum-of-squares

Exact tensor completion with sum-of-squares Proceedings of Machine Learning Research vol 65:1 54, 2017 30th Annual Conference on Learning Theory Exact tensor copletion with su-of-squares Aaron Potechin Institute for Advanced Study, Princeton David

More information

Lecture 20 November 7, 2013

Lecture 20 November 7, 2013 CS 229r: Algoriths for Big Data Fall 2013 Prof. Jelani Nelson Lecture 20 Noveber 7, 2013 Scribe: Yun Willia Yu 1 Introduction Today we re going to go through the analysis of atrix copletion. First though,

More information

The Transactional Nature of Quantum Information

The Transactional Nature of Quantum Information The Transactional Nature of Quantu Inforation Subhash Kak Departent of Coputer Science Oklahoa State University Stillwater, OK 7478 ABSTRACT Inforation, in its counications sense, is a transactional property.

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

Symmetrization and Rademacher Averages

Symmetrization and Rademacher Averages Stat 928: Statistical Learning Theory Lecture: Syetrization and Radeacher Averages Instructor: Sha Kakade Radeacher Averages Recall that we are interested in bounding the difference between epirical and

More information

A Bernstein-Markov Theorem for Normed Spaces

A Bernstein-Markov Theorem for Normed Spaces A Bernstein-Markov Theore for Nored Spaces Lawrence A. Harris Departent of Matheatics, University of Kentucky Lexington, Kentucky 40506-0027 Abstract Let X and Y be real nored linear spaces and let φ :

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

The Methods of Solution for Constrained Nonlinear Programming

The Methods of Solution for Constrained Nonlinear Programming Research Inventy: International Journal Of Engineering And Science Vol.4, Issue 3(March 2014), PP 01-06 Issn (e): 2278-4721, Issn (p):2319-6483, www.researchinventy.co The Methods of Solution for Constrained

More information

Chaotic Coupled Map Lattices

Chaotic Coupled Map Lattices Chaotic Coupled Map Lattices Author: Dustin Keys Advisors: Dr. Robert Indik, Dr. Kevin Lin 1 Introduction When a syste of chaotic aps is coupled in a way that allows the to share inforation about each

More information

The isomorphism problem of Hausdorff measures and Hölder restrictions of functions

The isomorphism problem of Hausdorff measures and Hölder restrictions of functions The isoorphis proble of Hausdorff easures and Hölder restrictions of functions Doctoral thesis András Máthé PhD School of Matheatics Pure Matheatics Progra School Leader: Prof. Miklós Laczkovich Progra

More information

Hybrid System Identification: An SDP Approach

Hybrid System Identification: An SDP Approach 49th IEEE Conference on Decision and Control Deceber 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA Hybrid Syste Identification: An SDP Approach C Feng, C M Lagoa, N Ozay and M Sznaier Abstract The

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

THE POLYNOMIAL REPRESENTATION OF THE TYPE A n 1 RATIONAL CHEREDNIK ALGEBRA IN CHARACTERISTIC p n

THE POLYNOMIAL REPRESENTATION OF THE TYPE A n 1 RATIONAL CHEREDNIK ALGEBRA IN CHARACTERISTIC p n THE POLYNOMIAL REPRESENTATION OF THE TYPE A n RATIONAL CHEREDNIK ALGEBRA IN CHARACTERISTIC p n SHEELA DEVADAS AND YI SUN Abstract. We study the polynoial representation of the rational Cherednik algebra

More information

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion Suppleentary Material for Fast and Provable Algoriths for Spectrally Sparse Signal Reconstruction via Low-Ran Hanel Matrix Copletion Jian-Feng Cai Tianing Wang Ke Wei March 1, 017 Abstract We establish

More information

arxiv: v1 [cs.ds] 17 Mar 2016

arxiv: v1 [cs.ds] 17 Mar 2016 Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass

More information

Yongquan Zhang a, Feilong Cao b & Zongben Xu a a Institute for Information and System Sciences, Xi'an Jiaotong. Available online: 11 Mar 2011

Yongquan Zhang a, Feilong Cao b & Zongben Xu a a Institute for Information and System Sciences, Xi'an Jiaotong. Available online: 11 Mar 2011 This article was downloaded by: [Xi'an Jiaotong University] On: 15 Noveber 2011, At: 18:34 Publisher: Taylor & Francis Infora Ltd Registered in England and Wales Registered Nuber: 1072954 Registered office:

More information

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup)

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup) Recovering Data fro Underdeterined Quadratic Measureents (CS 229a Project: Final Writeup) Mahdi Soltanolkotabi Deceber 16, 2011 1 Introduction Data that arises fro engineering applications often contains

More information

OPTIMIZATION in multi-agent networks has attracted

OPTIMIZATION in multi-agent networks has attracted Distributed constrained optiization and consensus in uncertain networks via proxial iniization Kostas Margellos, Alessandro Falsone, Sione Garatti and Maria Prandini arxiv:603.039v3 [ath.oc] 3 May 07 Abstract

More information

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint.

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint. 59 6. ABSTRACT MEASURE THEORY Having developed the Lebesgue integral with respect to the general easures, we now have a general concept with few specific exaples to actually test it on. Indeed, so far

More information

LORENTZ SPACES AND REAL INTERPOLATION THE KEEL-TAO APPROACH

LORENTZ SPACES AND REAL INTERPOLATION THE KEEL-TAO APPROACH LORENTZ SPACES AND REAL INTERPOLATION THE KEEL-TAO APPROACH GUILLERMO REY. Introduction If an operator T is bounded on two Lebesgue spaces, the theory of coplex interpolation allows us to deduce the boundedness

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

ON THE TWO-LEVEL PRECONDITIONING IN LEAST SQUARES METHOD

ON THE TWO-LEVEL PRECONDITIONING IN LEAST SQUARES METHOD PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical and Matheatical Sciences 04,, p. 7 5 ON THE TWO-LEVEL PRECONDITIONING IN LEAST SQUARES METHOD M a t h e a t i c s Yu. A. HAKOPIAN, R. Z. HOVHANNISYAN

More information

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it

More information

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution Testing approxiate norality of an estiator using the estiated MSE and bias with an application to the shape paraeter of the generalized Pareto distribution J. Martin van Zyl Abstract In this work the norality

More information

A Note on Online Scheduling for Jobs with Arbitrary Release Times

A Note on Online Scheduling for Jobs with Arbitrary Release Times A Note on Online Scheduling for Jobs with Arbitrary Release Ties Jihuan Ding, and Guochuan Zhang College of Operations Research and Manageent Science, Qufu Noral University, Rizhao 7686, China dingjihuan@hotail.co

More information

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs On the Inapproxiability of Vertex Cover on k-partite k-unifor Hypergraphs Venkatesan Guruswai and Rishi Saket Coputer Science Departent Carnegie Mellon University Pittsburgh, PA 1513. Abstract. Coputing

More information