Online Learning of Noisy Data with Kernels


Nicolò Cesa-Bianchi (Università degli Studi di Milano), Shai Shalev-Shwartz (The Hebrew University), Ohad Shamir (The Hebrew University)

Abstract
We study online learning when individual instances are corrupted by adversarially chosen random noise. We assume the noise distribution is unknown, and may change over time with no restriction other than having zero mean and bounded variance. Our technique relies on a family of unbiased estimators for non-linear functions, which may be of independent interest. We show that a variant of online gradient descent can learn functions in any dot-product (e.g., polynomial) or Gaussian kernel space with any analytic convex loss function. Our variant uses randomized estimates that need to query a random number of noisy copies of each instance, where with high probability this number is upper bounded by a constant. Allowing such multiple queries cannot be avoided: indeed, we show that online learning is in general impossible when only one noisy copy of each instance can be accessed.

1 Introduction
In many machine learning applications training data are typically collected by measuring certain physical quantities. Examples include bioinformatics, medical tests, robotics, and remote sensing. These measurements have errors that may be due to several reasons: sensor costs, communication constraints, or intrinsic physical limitations. In all such cases, the learner trains on a distorted version of the actual target data, which is where the learner's predictive ability is eventually evaluated. In this work we investigate the extent to which a learning algorithm can achieve a good predictive performance when training data are corrupted by noise with unknown distribution.

We prove upper and lower bounds on the learner's cumulative loss in the framework of online learning, where examples are generated by an arbitrary and possibly adversarial source. We model the measurement error via a random perturbation which affects each instance observed by the learner. We do not assume any specific property of the noise distribution other than zero mean and bounded variance. Moreover, we allow the noise distribution to change at every step in an adversarial way and fully hidden from the learner.

Our positive results are quite general: by using a randomized unbiased estimate for the loss gradient and a randomized feature mapping to estimate kernel values, we show that a variant of online gradient descent can learn functions in any dot-product (e.g., polynomial) or Gaussian RKHS under any given analytic convex loss function. Our techniques are readily extendable to other kernel types as well. In order to obtain unbiased estimates of loss gradients and kernel values, we allow the learner to query a random number of independently perturbed copies of the current unseen instance. We show how low-variance estimates can be computed using a number of queries that is constant with high probability. This is in sharp contrast with standard averaging techniques, which attempt to directly estimate the noisy instance and therefore require a sample whose size depends on the scale of the problem. Finally, we formally show that learning is impossible, even without kernels, when only one perturbed copy of each instance can be accessed. This is true for essentially any reasonable loss function.

Our paper is organized as follows. In the next subsection we discuss related work. In Sec. 2 we introduce our setting and justify some of our choices. In Sec. 4 we present our main results, but before that, in Sec. 3, we discuss the techniques used to obtain them.
In the same section, we also explain why existing techniques are insufficient to deal with our problem. The detailed proofs and subroutine implementations appear in Sec. 5, with some of the more technical lemmas and proofs relegated to [7]. We wrap up with a discussion on possible avenues for future work in Sec. 6.

1.1 Related Work
In the machine learning literature, the problem of learning from noisy examples, and, in particular, from noisy training instances, has traditionally received a lot of attention (see, for example, the recent survey [12]). On the other hand, there are comparably few theoretically principled studies on this topic. Two of them focus on models quite different from the one studied here: random attribute noise in PAC boolean learning [3, 9], and malicious noise [10, 5]. In the first case, learning is restricted to classes of boolean functions and the noise must be independent across each boolean coordinate. In the second case, an adversary is allowed to perturb a small fraction of the training examples in an arbitrary way, making learning impossible in a strong informational sense unless this perturbed fraction is very small (of the order of the desired accuracy for the predictor).

The previous work perhaps closest to the one presented here is [11], where binary classification mistake bounds are proven for the online Winnow algorithm in the presence of attribute errors. Similarly to our setting, the sequence of instances observed by the learner is chosen by an adversary. However, in [11] the noise is generated by an adversary who may change the value of each attribute in an arbitrary way. The final mistake bound, which only applies when the noiseless data sequence is linearly separable without kernels, depends on the sum of all adversarial perturbations.

2 Setting
We consider a setting where the goal is to predict values $y \in \mathbb{R}$ based on instances $x \in \mathbb{R}^d$. In this paper we focus on kernel-based linear predictors of the form $x \mapsto \langle w, \Psi(x)\rangle$, where $\Psi$ is a feature mapping into some reproducing kernel Hilbert space (RKHS). We assume there exists a kernel function that efficiently implements dot products in that space, i.e., $k(x,x') = \langle\Psi(x),\Psi(x')\rangle$. Note that a special case of this setting is linear kernels, where $\Psi$ is the identity mapping and $k(x,x') = \langle x,x'\rangle$.

The standard online learning protocol for linear prediction with kernels is defined as follows: at each round $t$, the learner picks a linear hypothesis $w_t$ from the RKHS. The adversary then picks an example $(x_t, y_t)$ and reveals it to the learner. The loss suffered by the learner is $\ell(\langle w_t, \Psi(x_t)\rangle, y_t)$, where $\ell$ is a known and fixed loss function. The goal of the learner is to minimize regret with respect to a fixed convex set of hypotheses $W$, namely

$$\sum_{t=1}^{T} \ell\big(\langle w_t, \Psi(x_t)\rangle, y_t\big) \;-\; \min_{w\in W}\sum_{t=1}^{T}\ell\big(\langle w, \Psi(x_t)\rangle, y_t\big).$$

Typically, we wish to find a strategy for the learner such that, no matter what is the adversary's strategy of choosing the sequence of examples, the expression above is sub-linear in $T$.

We now make the following twist, which limits the information available to the learner: instead of receiving $(x_t, y_t)$, the learner observes $y_t$ and is given access to an oracle $A_t$. On each call, $A_t$ returns an independent copy of $x_t + Z_t$, where $Z_t$ is a zero-mean random vector with some known finite bound on its variance, in the sense that $\mathbb{E}[\|Z_t\|^2] \le a$ for some uniform constant $a$. In general, the distribution of $Z_t$ is unknown to the learner. It might be chosen by the adversary, and change from round to round or even between consecutive calls to $A_t$. Note that here we assume that $y_t$ remains unperturbed, but we emphasize that this is just for simplicity: our techniques can be readily extended to deal with noisy values as well. The learner may call $A_t$ more than once.
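To make the oracle protocol concrete, the following is a minimal sketch of such an oracle in Python. The Gaussian noise, the class name, and the query counter are our own illustrative choices; the analysis only requires that each call returns an independent copy of $x_t + Z_t$ with $\mathbb{E}[Z_t]=0$ and $\mathbb{E}[\|Z_t\|^2]\le a$.

```python
import numpy as np

class NoisyOracle:
    """Oracle A_t: each call returns a fresh independent noisy copy of x_t.

    Illustrative sketch only: here the noise is Gaussian, but the setting only
    assumes zero mean and E[||Z_t||^2] <= a, with the distribution possibly
    changing between calls.
    """
    def __init__(self, x_t, noise_std, rng=None):
        self.x_t = np.asarray(x_t, dtype=float)
        self.noise_std = noise_std
        self.rng = rng or np.random.default_rng()
        self.num_queries = 0            # how many noisy copies were requested

    def __call__(self):
        self.num_queries += 1
        z = self.rng.normal(0.0, self.noise_std, size=self.x_t.shape)
        return self.x_t + z
```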
In fact, as we discuss later on, being able to call $A_t$ more than once is necessary for the learner to have any hope to succeed. On the other hand, if the learner calls $A_t$ an unlimited number of times, it can reconstruct $x_t$ arbitrarily well by averaging, and we are back to the standard learning setting. In this paper we focus on learning algorithms that call $A_t$ only a small, essentially constant number of times, which depends only on our choice of loss function and kernel, rather than on $T$, the norm of $x_t$, or the variance of $Z_t$, as would happen with naïve averaging techniques. Moreover, since the number of queries is bounded with very high probability, one can even produce an algorithm with an absolute bound on the number of queries, which will fail or introduce some bias with an arbitrarily small probability. For simplicity, we ignore these issues in this paper.

In this setting, we wish to minimize the regret in hindsight with respect to the unperturbed data and averaged over the noise introduced by the oracle, namely

$$\mathbb{E}\left[\sum_{t=1}^{T}\ell\big(\langle w_t,\Psi(x_t)\rangle, y_t\big)\right] \;-\; \min_{w\in W}\sum_{t=1}^{T}\ell\big(\langle w,\Psi(x_t)\rangle, y_t\big), \qquad (1)$$

where the random quantities are the predictors $w_1, w_2, \dots$ generated by the learner, which depend on the observed noisy instances. In [7], we briefly discuss alternative regret measures, and why they are unsatisfactory.

This kind of regret is relevant where we actually wish to learn from data, without the noise causing a hindrance. In particular, consider the batch setting, where the examples $\{(x_t,y_t)\}_{t=1}^{T}$ are actually sampled i.i.d. from some unknown distribution, and we wish to find a predictor which minimizes the expected loss $\mathbb{E}[\ell(\langle w,x\rangle,y)]$ with respect to new examples $(x,y)$. Using standard online-to-batch conversion techniques, if we can find an online algorithm with a sublinear bound on Eq. (1), then it is possible to construct learning algorithms for the batch setting which are robust to noise; that is, algorithms generating a predictor $w$ with close to minimal expected loss $\mathbb{E}[\ell(\langle w,x\rangle,y)]$ among all $w\in W$.

While our techniques are quite general, the exact algorithmic and theoretical results depend a lot on which loss function and kernel is used. Discussing the loss function first, we will assume that $\ell(\langle w,\Psi(x)\rangle,y)$ is a convex function of $w$ for each example $(x,y)$. Somewhat abusing notation, we assume the loss can be written either as $\ell(\langle w,\Psi(x)\rangle,y)=f(y\langle w,\Psi(x)\rangle)$ or as $\ell(\langle w,\Psi(x)\rangle,y)=f(\langle w,\Psi(x)\rangle-y)$ for some function $f$. We refer to the first type as classification losses, as it encompasses most reasonable losses for classification, where $y\in\{-1,+1\}$ and the goal is to predict the label. We refer to the second type as regression losses, as it encompasses most reasonable regression losses, where $y$ takes arbitrary real values. For simplicity, we present some of our results in terms of classification losses, but they all hold for regression losses as well with slight modifications.

We present our results under the assumption that the loss function is smooth, in the sense that $\ell'(a)$ can be written as $\sum_{n=0}^{\infty}\gamma_n a^n$ for any $a$ in its domain. This assumption holds, for instance, for the squared loss $\ell(a)=a^2$, the exponential loss $\ell(a)=e^{-a}$, and smoothed versions of loss functions such as the hinge loss and the absolute loss (we discuss examples in more detail in Subsection 4.2). This assumption can be relaxed under certain conditions, and this is further discussed in Subsection 3.2.

Turning to the issue of kernels, we note that the general presentation of our approach is somewhat hampered by the fact that it needs to be tailored to the kernel we use. In this paper, we focus on two families of kernels:
Dot-product kernels: the kernel $k(x,x')$ can be written as a function of $\langle x,x'\rangle$. Examples of such kernels are linear kernels $\langle x,x'\rangle$; homogeneous polynomial kernels $\langle x,x'\rangle^n$; inhomogeneous polynomial kernels $(1+\langle x,x'\rangle)^n$; exponential kernels $e^{\langle x,x'\rangle}$; binomial kernels $(1+\langle x,x'\rangle)^{\alpha}$; and more (see for instance [15, 17]).
Gaussian kernels: $k(x,x')=e^{-\|x-x'\|^2/\sigma^2}$ for some $\sigma^2>0$.
Again, we emphasize that our techniques are extendable to other kernel types as well.

3 Techniques
Our results are based on two key ideas: the use of online gradient descent algorithms, and the construction of unbiased gradient estimators in the kernel setting. The latter is based on a general method to build unbiased estimators for non-linear functions, which may be of independent interest.

3.1 Online Gradient Descent
There exists a well developed theory, as well as algorithms, for the standard online learning setting, where the example $(x_t,y_t)$ is revealed after each round, and for general convex loss functions. One of the simplest and most well known algorithms is the online gradient descent algorithm due to Zinkevich [18]. Since this algorithm forms a basis for our algorithm in the new setting, we briefly review it below, as adapted to our setting. The algorithm initializes the classifier $w_1=0$. At round $t$, the algorithm predicts according to $w_t$.
The predictor is then updated according to the rule $w_{t+1}=P\big(w_t-\eta_t\nabla_t\big)$, where $\eta_t$ is a suitably chosen constant which might depend on $t$; $\nabla_t=\ell'\big(y_t\langle w_t,\Psi(x_t)\rangle\big)\,y_t\,\Psi(x_t)$ is the gradient of $\ell\big(y_t\langle w,\Psi(x_t)\rangle\big)$ with respect to $w$ at $w_t$; and $P$ is a projection operator onto the convex set $W$, on whose elements we wish to achieve low regret. In particular, if we wish to compete with hypotheses of bounded squared norm $B_w$, then $P$ simply involves rescaling the norm of the predictor so as to have squared norm at most $B_w$. With this algorithm, one can prove regret bounds with respect to any $w\in W$.

A folklore result about this algorithm is that, in fact, we do not need to update the predictor by the gradient at each step. Instead, it is enough to update by some random vector of bounded variance which merely equals the gradient in expectation. This is a useful property in settings where $(x_t,y_t)$ is not revealed to the learner, and has been used before, for instance in the online bandit setting (see [6, 8, 1]). Here, we will use this property in a new way, in order to devise algorithms which are robust to noise.
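As a point of reference, here is a minimal sketch of projected online gradient descent with a stochastic (merely unbiased) gradient. The interface and function names are our own; the projection shown is the norm rescaling onto $\{w:\|w\|^2\le B_w\}$ described above.

```python
import numpy as np

def projected_ogd(examples, grad_estimate, eta, B_w, d):
    """Projected online gradient descent with (possibly stochastic) gradients.

    grad_estimate(w, example) should return a vector whose expectation is the
    true loss gradient at w; the regret analysis only needs this unbiasedness
    plus a bounded second moment.  Illustrative sketch, not the paper's code.
    """
    w = np.zeros(d)
    iterates = []
    for example in examples:
        iterates.append(w.copy())
        g = grad_estimate(w, example)      # unbiased gradient estimate
        w = w - eta * g                    # gradient step
        sq_norm = float(np.dot(w, w))
        if sq_norm > B_w:                  # project onto {w : ||w||^2 <= B_w}
            w = w * np.sqrt(B_w / sq_norm)
    return iterates
```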

When the kernel and loss function are linear (e.g., $\Psi(x)=x$ and $\ell(a)=ca+b$ for some constants $b,c$), this property already ensures that the algorithm is robust to noise without any further changes. This is because the noise injected into each $x_t$ merely causes the exact gradient to change to a random vector which is correct in expectation: if we assume $\ell$ is a classification loss, then $\mathbb{E}\big[\ell'\big(y_t\langle w_t,\Psi(\tilde x_t)\rangle\big)\Psi(\tilde x_t)\big]=\mathbb{E}[c\,\tilde x_t]=c\,x_t$. On the other hand, when we use nonlinear kernels and nonlinear loss functions, standard online gradient descent suffers from systematic and unknown biases (since the noise distribution is unknown), which prevents the method from working properly. To deal with this problem, we now turn to describe a technique for estimating expressions such as $\ell'\big(y_t\langle w_t,\Psi(x_t)\rangle\big)$ in an unbiased manner. In Subsection 3.3, we discuss how $\Psi(x_t)$ can be estimated in an unbiased manner.

3.2 Unbiased Estimators for Non-Linear Functions
Suppose that we are given access to independent copies of a real random variable $X$ with expectation $\mathbb{E}[X]$, and some real function $f$, and we wish to construct an unbiased estimate of $f(\mathbb{E}[X])$. If $f$ is a linear function, then this is easy: just sample $x$ from $X$ and return $f(x)$; by linearity, $\mathbb{E}[f(X)]=f(\mathbb{E}[X])$ and we are done. The problem becomes less trivial when $f$ is a general, nonlinear function, since usually $\mathbb{E}[f(X)]\ne f(\mathbb{E}[X])$. In fact, when $X$ takes finitely many values and $f$ is not a polynomial function, one can prove that no unbiased estimator can exist (see [14], Proposition 8 and its proof). Nevertheless, we show how in many cases one can construct an unbiased estimator of $f(\mathbb{E}[X])$, including cases covered by the impossibility result. There is no contradiction, because we do not construct a standard estimator. Usually, an estimator is a function from a given sample to the range of the parameter we wish to estimate. An implicit assumption is that the size of the sample given to it is fixed, and this is also a crucial ingredient in the impossibility result. We circumvent this by constructing an estimator based on a random number of samples.

Here is the key idea: suppose $f:\mathbb{R}\to\mathbb{R}$ is any function continuous on a bounded interval. It is well known that one can construct a sequence of polynomials $(Q_n)_{n\ge 1}$, where $Q_n$ is a polynomial of degree $n$, which converges uniformly to $f$ on the interval. If $Q_n(x)=\sum_{i=0}^{n}\gamma_{n,i}x^i$, let $Q'_n(x_1,\dots,x_n)=\sum_{i=0}^{n}\gamma_{n,i}\prod_{j=1}^{i}x_j$. Now, consider the estimator which draws a positive integer $N$ according to some distribution $\Pr(N=n)=p_n$, samples $X$ for $N$ times to get $x_1,x_2,\dots,x_N$, and returns $\frac{1}{p_N}\big(Q'_N(x_1,\dots,x_N)-Q'_{N-1}(x_1,\dots,x_{N-1})\big)$, where we assume $Q'_0=0$. The expected value of this estimator is equal to

$$\mathbb{E}_{N,x_1,\dots,x_N}\!\left[\frac{Q'_N(x_1,\dots,x_N)-Q'_{N-1}(x_1,\dots,x_{N-1})}{p_N}\right] = \sum_{n=1}^{\infty}p_n\,\frac{\mathbb{E}_{x_1,\dots,x_n}\big[Q'_n(x_1,\dots,x_n)-Q'_{n-1}(x_1,\dots,x_{n-1})\big]}{p_n} = \sum_{n=1}^{\infty}\big(Q_n(\mathbb{E}[X])-Q_{n-1}(\mathbb{E}[X])\big) = f(\mathbb{E}[X]).$$

Thus, we have an unbiased estimator of $f(\mathbb{E}[X])$.

This technique appeared in a rather obscure early 1960's paper [16] from sequential estimation theory, and appears to be little known, particularly outside the sequential estimation community. However, we believe this technique is interesting, and expect it to have useful applications for other problems as well.

While this may seem at first like a very general result, the variance of this estimator must be bounded for it to be useful. Unfortunately, this is not true for general continuous functions. More precisely, let $N$ be distributed according to $p_n$, and let $\theta$ be the value returned by the estimator. In [2], it is shown that if $X$ is a Bernoulli random variable, and if $\mathbb{E}[|\theta|N^k]<\infty$ for some integer $k$, then $f$ must be $k$ times continuously differentiable. Since $\mathbb{E}[|\theta|N^k]\le\big(\mathbb{E}[\theta^2]+\mathbb{E}[N^{2k}]\big)/2$, this means that functions $f$ which yield an estimator with finite variance, while using a number of queries with bounded variance, must be continuously differentiable.
Moreover, in case we desire the number of queries to be essentially constant (i.e., choose a distribution for $N$ with exponentially decaying tails), we must have $\mathbb{E}[N^k]<\infty$ for all $k$, which means that $f$ should be infinitely differentiable (in fact, in [2] it is conjectured that $f$ must be analytic in such cases). Thus, we focus in this paper on functions $f$ which are analytic, i.e., they can be written as $f(x)=\sum_{i=0}^{\infty}\gamma_i x^i$ for appropriate constants $\gamma_0,\gamma_1,\dots$. In that case, $Q_n$ can simply be the truncated Taylor expansion of $f$ to order $n$, i.e., $Q_n(x)=\sum_{i=0}^{n}\gamma_i x^i$. Moreover, we can pick $\Pr(N=n)\propto 1/p^n$ for any $p>1$.

So the estimator becomes the following: we sample a nonnegative integer $N$ according to $\Pr(N=n)=(p-1)/p^{n+1}$, sample $X$ independently $N$ times to get $x_1,x_2,\dots,x_N$, and return

$$\theta = \gamma_N\,\frac{p^{N+1}}{p-1}\,x_1 x_2\cdots x_N,$$

where we set $\theta=\gamma_0\,\frac{p}{p-1}$ if $N=0$.¹ We have the following:

Lemma 1. For the above estimator, it holds that $\mathbb{E}[\theta]=f(\mathbb{E}[X])$. The expected number of samples used by the estimator is $1/(p-1)$, and the probability of it being at least $z$ is $p^{-z}$. Moreover, if we assume that $f_+(x)=\sum_{n=0}^{\infty}|\gamma_n|x^n$ exists for any $x$ in the domain of interest, then $\mathbb{E}[\theta^2]\le\frac{p}{p-1}\,f_+^2\!\big(\sqrt{p\,\mathbb{E}[X^2]}\big)$.

Proof. The fact that $\mathbb{E}[\theta]=f(\mathbb{E}[X])$ follows from the discussion above. The results about the number of samples follow directly from properties of the geometric distribution. As for the second moment, $\mathbb{E}[\theta^2]$ equals

$$\mathbb{E}_{N,x_1,\dots,x_N}\!\left[\gamma_N^2\,\frac{p^{2N+2}}{(p-1)^2}\,x_1^2 x_2^2\cdots x_N^2\right] = \sum_{n=0}^{\infty}\frac{p-1}{p^{\,n+1}}\cdot\frac{p^{2n+2}}{(p-1)^2}\,\gamma_n^2\,\mathbb{E}_{x_1,\dots,x_n}\!\big[x_1^2 x_2^2\cdots x_n^2\big] = \frac{1}{p-1}\sum_{n=0}^{\infty}p^{\,n+1}\gamma_n^2\,\mathbb{E}[X^2]^n$$
$$= \frac{p}{p-1}\sum_{n=0}^{\infty}\Big(|\gamma_n|\big(\sqrt{p\,\mathbb{E}[X^2]}\big)^n\Big)^2 \;\le\; \frac{p}{p-1}\Big(\sum_{n=0}^{\infty}|\gamma_n|\big(\sqrt{p\,\mathbb{E}[X^2]}\big)^n\Big)^2 \;=\; \frac{p}{p-1}\,f_+^2\!\big(\sqrt{p\,\mathbb{E}[X^2]}\big).$$

The parameter $p$ provides a tradeoff between the variance of the estimator and the number of samples needed: the larger $p$ is, the fewer samples we need, but the estimator has more variance. In any case, the sample size distribution decays exponentially fast, so the sample size is essentially bounded.

It should be emphasized that the estimator associated with Lemma 1 is tailored for generality, and is suboptimal in some cases. For example, if $f$ is a polynomial function, then $\gamma_n=0$ for sufficiently large $n$, and there is no reason to sample $N$ from a distribution supported on all nonnegative integers; it just increases the variance. Nevertheless, in order to keep the presentation unified and general, we will always use this type of estimator. If needed, the estimator can always be optimized for specific cases.

We also note that this technique can be improved in various directions, if more is known about the distribution of $X$. For instance, if we have some estimate of the expectation and variance of $X$, then we can perform a Taylor expansion around the estimated $\mathbb{E}[X]$ rather than around $0$, and tune the probability distribution of $N$ to be different from the one we used above. These modifications can allow us to make the variance of the estimator arbitrarily small, if the variance of $X$ is small enough. Moreover, one can take polynomial approximations to $f$ which are perhaps better than truncated Taylor expansions. In this paper, for simplicity, we will ignore these potential improvements.

Finally, we note that a related result in [2] implies that it is impossible to estimate $f(\mathbb{E}[X])$ in an unbiased manner when $f$ is discontinuous, even if we allow a number of queries and estimator values which are infinite in expectation. Therefore, since the derivative of the hinge loss is not continuous, estimating the gradient of the hinge loss in an unbiased manner with arbitrary noise appears to be impossible. Thus, if online learning with noise and hinge loss is at all feasible, a rather different approach than ours will need to be taken.

¹ Admittedly, the event $N=0$ should receive zero probability, as it amounts to skipping the sampling altogether. However, setting $\Pr(N=0)=0$ appears to improve the bound in this paper only in the smaller order terms, while making the analysis in the paper more complicated.
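The following is a minimal sketch of this estimator, assuming $f$ is analytic with Taylor coefficients supplied as a function `gamma(n)`; the function names and the worked example ($f=\exp$, $X$ a noisy copy of a fixed value) are ours, chosen only to illustrate the construction above.

```python
import numpy as np

def unbiased_estimate(sample_X, gamma, p, rng=None):
    """One unbiased estimate of f(E[X]) for analytic f(x) = sum_n gamma(n) x^n.

    sample_X() returns one independent draw of X; p > 1 trades sample size
    against variance.  Draws N with Pr(N = n) = (p-1)/p^(n+1) and returns
        gamma(N) * p^(N+1)/(p-1) * x_1 * ... * x_N.
    Illustrative sketch of the construction in Subsection 3.2.
    """
    rng = rng or np.random.default_rng()
    N = rng.geometric(1.0 - 1.0 / p) - 1   # geometric on {0,1,...}: Pr(N=n)=(p-1)/p^(n+1)
    prod = 1.0
    for _ in range(N):
        prod *= sample_X()
    return gamma(N) * (p ** (N + 1) / (p - 1.0)) * prod

if __name__ == "__main__":
    # Example: f = exp, so gamma(n) = 1/n!, and X = mu + Gaussian noise.
    from math import factorial, exp
    rng = np.random.default_rng(0)
    mu = 0.3
    estimates = [unbiased_estimate(lambda: mu + rng.normal(0.0, 0.1),
                                   lambda n: 1.0 / factorial(n), p=2.0, rng=rng)
                 for _ in range(200000)]
    print(np.mean(estimates), "vs f(E[X]) =", exp(mu))   # averages converge to e^0.3
```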

3.3 Unbiasing Noise in the RKHS
The third component of our approach involves the unbiased estimation of $\Psi(x_t)$, when we only have unbiased noisy copies of $x_t$. Here again, we have a non-trivial problem, because the feature mapping $\Psi$ is usually highly non-linear, so $\mathbb{E}[\Psi(\tilde x_t)]\ne\Psi(\mathbb{E}[\tilde x_t])$ in general. Moreover, $\Psi$ is not a scalar function, so the technique of Subsection 3.2 will not work as-is.

To tackle this problem, we construct an explicit feature mapping, which needs to be tailored to the kernel we want to use. To give a very simple example, suppose we use the homogeneous 2nd-degree polynomial kernel, $k(r,s)=\langle r,s\rangle^2$. It is not hard to verify that the function $\Psi:\mathbb{R}^d\to\mathbb{R}^{d^2}$, defined via $\Psi(x)=(x_1x_1,\,x_1x_2,\dots,x_1x_d,\,x_2x_1,\dots,x_dx_d)$, is an explicit feature mapping for this kernel. Now, if we query two independent noisy copies $\tilde x,\tilde x'$ of $x$, the expectation of the random vector $(\tilde x_1\tilde x'_1,\,\tilde x_1\tilde x'_2,\dots,\tilde x_d\tilde x'_d)$ is nothing more than $\Psi(x)$. Thus, we can construct unbiased estimates of $\Psi(x)$ in the RKHS. Of course, this example pertains to a very simple RKHS with a finite dimensional representation. By a randomization trick somewhat similar to the one in Subsection 3.2, we can adapt this approach to infinite dimensional RKHS as well. In a nutshell, we represent $\Psi(x)$ as an infinite-dimensional vector, and its noisy unbiased estimate is a vector which is non-zero on only finitely many entries, using finitely many noisy queries. Moreover, inner products between these estimates can be computed efficiently, allowing us to implement the learning algorithms, and to use the resulting predictor on test instances.

4 Main Results

4.1 Algorithm
We present our algorithmic approach in a modular form. We start by introducing the main algorithm, which contains several subroutines. Then we prove our two main results, which bound the regret of the algorithm, the number of queries to the oracle, and the running time, for two types of kernels: dot-product and Gaussian (our results can be extended to other kernel types as well). In itself, the algorithm is nothing more than a standard online gradient descent algorithm with a standard $O(\sqrt T)$ regret bound. Thus, most of the proofs are devoted to a detailed discussion of how the subroutines are implemented (including explicit pseudo-code). In this section, we just describe one subroutine, based on the techniques discussed in Sec. 3. The other subroutines require a more detailed and technical discussion, and thus their implementation is described as part of the proofs in Sec. 5. In any case, the intuition behind the implementations and the techniques used are described in Sec. 3.

For simplicity, we will focus on a finite-horizon setting, where the number of online rounds $T$ is fixed and known to the learner. The algorithm can easily be modified to deal with the infinite horizon setting, where the learner needs to achieve sub-linear regret for all $T$ simultaneously. Also, for the remainder of this subsection, we assume for simplicity that $\ell$ is a classification loss, namely it can be written as a function of $y\langle w,\Psi(x)\rangle$. It is not hard to adapt the results below to the case where $\ell$ is a regression loss (a function of $\langle w,\Psi(x)\rangle-y$).

We note that at each round, the algorithm below constructs an object which we denote as $\tilde\Psi(x_t)$. This object has two interpretations here: formally, it is an element of a reproducing kernel Hilbert space (RKHS) corresponding to the kernel we use, and is equal in expectation to $\Psi(x_t)$. However, in terms of implementation, it is simply a data structure consisting of a finite set of vectors from $\mathbb{R}^d$. Thus, it can be efficiently stored in memory and handled even for infinite-dimensional RKHS.

Algorithm 1: Kernel Learning Algorithm with Noisy Input
Parameters: learning rate $\eta>0$, number of rounds $T$, sample parameter $p>1$.
Initialize: $\alpha_i:=0$ for all $i=1,\dots,T$; $\tilde\Psi(x_i):=\emptyset$ for all $i=1,\dots,T$   // $\tilde\Psi(x_i)$ is a data structure which can store a variable number of vectors in $\mathbb{R}^d$
For $t=1,\dots,T$:
  Define $w_t:=\sum_{i=1}^{t-1}\alpha_i\,\tilde\Psi(x_i)$
  Receive $A_t, y_t$   // the oracle $A_t$ provides noisy estimates of $x_t$
  Let $\tilde\Psi(x_t):=\mathrm{Map\_Estimate}(A_t,p)$   // get unbiased estimate of $\Psi(x_t)$ in the RKHS
  Let $\tilde g_t:=\mathrm{Grad\_Length\_Estimate}(A_t,y_t,p)$   // get unbiased estimate of $y_t\,\ell'(y_t\langle w_t,\Psi(x_t)\rangle)$
  Let $\alpha_t:=-\tilde g_t\,\eta/\sqrt T$   // perform gradient step
  Let $\tilde n_t:=\sum_{i=1}^{t}\sum_{j=1}^{t}\alpha_i\alpha_j\,\mathrm{Prod}\big(\tilde\Psi(x_i),\tilde\Psi(x_j)\big)$   // compute squared norm, where $\mathrm{Prod}(\tilde\Psi(x_i),\tilde\Psi(x_j))$ returns $\langle\tilde\Psi(x_i),\tilde\Psi(x_j)\rangle$
  If $\tilde n_t>B_w$:   // if the squared norm is larger than $B_w$, then project
    Let $\alpha_i:=\alpha_i\sqrt{B_w/\tilde n_t}$ for all $i=1,\dots,t$

Like $\tilde\Psi(x_t)$, $w_{t+1}$ also has two interpretations: formally, it is an element in the RKHS, as defined in the pseudocode; in terms of implementation, it is defined via the data structures $\tilde\Psi(x_1),\dots,\tilde\Psi(x_t)$ and the values of $\alpha_1,\dots,\alpha_t$ at round $t$. To apply this hypothesis on a given instance $x$, we compute $\sum_{i=1}^{t}\alpha_i\,\mathrm{Prod}(\tilde\Psi(x_i),x)$, where $\mathrm{Prod}(\tilde\Psi(x_i),x)$ is a subroutine which returns $\langle\tilde\Psi(x_i),\Psi(x)\rangle$ (pseudocode is provided as part of the proofs later on). A sketch of the main loop in code form is given below.

7 t i α t,iprod Ψx i, x, where Prod Ψx i, x is a subroutine which returns Ψx i, Ψx a seudocode is rovided as art of the roofs later on We now turn to the main results ertaining to the algorithm The first result shows what regret bound is achievable by the algorithm for any dot-roduct kernel, as well as characterize the number of oracle queries er instance, and the overall running time of the algorithm Theorem Assume that the loss function l has an analytic derivative l a γ na n for all a in its domain, and let l +a γ n a n assuming it exists Assume also that the kernel kx, x can be written as Q x, x for all x, x R d Finally, assume that E[ x t 2 ] B x for any x t returned by the oracle at round t, for all t,, T Then, for all B w > 0 and >, it is ossible to imlement the subroutines of Algorithm such that: 2 The exected number of queries to each oracle A t is The exected running time of the algorithm is O T 3 + d 2 / 2 If we run Algorithm with η B w ul + u, where u Bw QB x, then [ T ] T E ly t w t, Ψx t min ly t w, Ψx t l + u ut w : w 2 B w The exectations are with resect to the randomness of the oracles and the algorithm throughout its run We note that the distribution of the number of oracle queries can be secified exlicitly, and it decays very raidly - see the roof for details Also, for simlicity, we only bound the exected regret in the theorem above If the noise is bounded almost surely or with sub-gaussian tails rather than just bounded variance, then it is ossible to obtain similar guarantees with high robability, by relying on Azuma s inequality or variants thereof see for examle [4] We now turn to the case of Gaussian kernels Theorem 2 Assume that the loss function l has an analytic derivative l a γ na n for all a in its domain, and let l +a γ n a n assuming it exists Assume that the kernel kx, x is defined as ex x x 2 /σ 2 Finally, assume that E[ x t 2 ] B x for any x t returned by the oracle at round t, for all t,, T Then for all B w > 0 and > it is ossible to imlement the subroutines of Algorithm such that 3 2 The exected number of queries to each oracle A t is The exected running time of the algorithm is O T 3 + d 2 / If we run Algorithm with η B w ul + u, where 3 B x + 2 B x u B w ex σ 2 then [ T E ly t w t, Ψx t min w : w 2 B w ] T ly t w, Ψx t l + u ut The exectations are with resect to the randomness of the oracles and the algorithm throughout its run As in Thm, note that the number of oracle queries has a fast decaying distribution Also, note that with Gaussian kernels, σ 2 is usually chosen to be on the order of the examle s squared norms Thus, if the noise added to the examles is roortional to their original norm, we can assume that B x /σ 2 O, and thus u which aears in the bound is also bounded by a constant As reviously mentioned, most of the subroutines are described in the roofs section, as art of the roof of Thm Here, we only show how to imlement Grad Length Estimate subroutine, which returns the gradient length estimate g t The idea is based on the technique described in

We prove that $\tilde g_t$ is an unbiased estimate of $y_t\,\ell'\big(y_t\langle w_t,\Psi(x_t)\rangle\big)$, and bound $\mathbb{E}[\tilde g_t^2]$. As discussed earlier, we assume that $\ell'$ is analytic and can be written as $\ell'(a)=\sum_{n=0}^{\infty}\gamma_n a^n$.

Subroutine 1: Grad_Length_Estimate($A_t, y_t, p$)
  Sample a nonnegative integer $n$ according to $\Pr(n)=(p-1)/p^{n+1}$
  For $j=1,\dots,n$: let $\tilde\Psi(x_t)^j:=\mathrm{Map\_Estimate}(A_t,p)$   // get unbiased estimate of $\Psi(x_t)$ in the RKHS
  Return $\tilde g_t := y_t\,\gamma_n\,\frac{p^{\,n+1}}{p-1}\prod_{j=1}^{n}\Big(\sum_{i=1}^{t-1}\alpha_i\,\mathrm{Prod}\big(\tilde\Psi(x_i),\tilde\Psi(x_t)^j\big)\Big)$

Lemma 2. Assume that $\mathbb{E}[\tilde\Psi(x_t)]=\Psi(x_t)$, and that $\mathrm{Prod}(\tilde\Psi(x),\tilde\Psi(x'))$ returns $\langle\tilde\Psi(x),\tilde\Psi(x')\rangle$ for all $x,x'$. Then for any given $w_t=\alpha_1\tilde\Psi(x_1)+\dots+\alpha_{t-1}\tilde\Psi(x_{t-1})$ it holds that

$$\mathbb{E}_t[\tilde g_t]=y_t\,\ell'\big(y_t\langle w_t,\Psi(x_t)\rangle\big) \qquad\text{and}\qquad \mathbb{E}_t[\tilde g_t^2]\le\frac{p}{p-1}\Big(\ell'_+\big(\sqrt{p\,B_w B_{\tilde\Psi}}\big)\Big)^2,$$

where the expectation is with respect to the randomness of Subroutine 1, $\ell'_+(a)=\sum_{n=0}^{\infty}|\gamma_n|a^n$, and $B_{\tilde\Psi}$ is a uniform upper bound on $\mathbb{E}[\|\tilde\Psi(x_t)\|^2]$.

Proof. The result follows from Lemma 1, where $\tilde g_t$ corresponds to the estimator $\theta$, the function $f$ corresponds to $\ell'$, and the random variable $X$ corresponds to $\langle w_t,\tilde\Psi(x_t)\rangle$ (where $\tilde\Psi(x_t)$ is random and $w_t$ is held fixed). The term $\mathbb{E}[X^2]$ in Lemma 1 can be upper bounded as $\mathbb{E}_t\big[\langle w_t,\tilde\Psi(x_t)\rangle^2\big]\le\|w_t\|^2\,\mathbb{E}_t\big[\|\tilde\Psi(x_t)\|^2\big]\le B_w B_{\tilde\Psi}$.

4.2 Loss Function Examples
Theorems 1 and 2 both deal with generic loss functions $\ell$ whose derivative can be written as $\sum_{n=0}^{\infty}\gamma_n a^n$, and the regret bounds involve the functions $\ell'_+(a)=\sum_{n=0}^{\infty}|\gamma_n|a^n$. Below, we present a few examples of loss functions and their corresponding $\ell'_+$. As mentioned earlier, while the theorems in the previous subsection are stated in terms of classification losses (i.e., $\ell$ is a function of $y\langle w,\Psi(x)\rangle$), virtually identical results can be proven for regression losses (i.e., $\ell$ is a function of $\langle w,\Psi(x)\rangle-y$), so we will give examples from both families. Working out the first two examples is straightforward. The proofs of the other two appear in Sec. 5. The loss functions are illustrated graphically in Fig. 1.

Example 1. For the squared loss function, $\ell(\langle w,x\rangle,y)=(\langle w,x\rangle-y)^2$, we have $\ell'_+(u)=2u$.

Example 2. For the exponential loss function, $\ell(\langle w,x\rangle,y)=e^{-y\langle w,x\rangle}$, we have $\ell'_+(u)=e^{u}$.

Example 3. Consider a smoothed absolute loss function $\ell(\langle w,\Psi(x)\rangle-y)$, defined as an antiderivative of $\mathrm{erf}(sa)$ for some $s>0$ (see the proof for its exact analytic form). Then we have that $\ell'_+(u)\le\frac{2\big(e^{s^2u^2}-1\big)}{s\sqrt{\pi}\,u}$.

Example 4. Consider a smoothed hinge loss $\ell(y\langle w,\Psi(x)\rangle)$, defined as an antiderivative of $\big(\mathrm{erf}(sa)-1\big)/2$ for some $s>0$ (see the proof for its exact analytic form). Then we have that $\ell'_+(u)\le\frac{1}{2}+\frac{e^{s^2u^2}-1}{s\sqrt{\pi}\,u}$.

For any $s$, the loss functions in the last two examples are convex, and respectively approximate the absolute loss $|\langle w,\Psi(x)\rangle-y|$ and the hinge loss $\max\{0,\,-y\langle w,\Psi(x)\rangle\}$ arbitrarily well for large enough $s$. Fig. 1 shows these loss functions graphically. Note that $s$ need not be large in order to get a good approximation. Also, we note that both the loss itself and its gradient are computationally easy to evaluate.

Finally, we remind the reader that, as discussed in Subsection 3.2, performing an unbiased estimate of the gradient for non-differentiable losses directly (such as the hinge loss or the absolute loss) appears to be impossible in general. On the flip side, if one is willing to use a random number of queries with polynomial rather than exponential tails, then one can achieve much better sample complexity results, by focusing on loss functions (or approximations thereof) which are only differentiable to a bounded order, rather than fully analytic. This again demonstrates the tradeoff between the sample size and the amount of information that needs to be gathered on each training example.
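For concreteness, here is a small sketch of one natural closed form for the smoothed losses of Examples 3 and 4, obtained from the standard antiderivative $\int\mathrm{erf}(x)\,dx = x\,\mathrm{erf}(x)+e^{-x^2}/\sqrt{\pi}$. The particular integration constants and the function names are our own choices; they match the derivatives stated above.

```python
import math

SQRT_PI = math.sqrt(math.pi)

def smoothed_abs_loss(a, s=1.0):
    """Antiderivative of erf(s*a): approximates |a|, up to a gap that vanishes
    as s grows."""
    return a * math.erf(s * a) + math.exp(-(s * a) ** 2) / (s * SQRT_PI)

def smoothed_hinge_loss(a, s=1.0):
    """Antiderivative of (erf(s*a) - 1)/2, normalized to vanish as a -> +inf:
    approximates max(0, -a), up to a gap that vanishes as s grows."""
    return 0.5 * (a * math.erf(s * a) + math.exp(-(s * a) ** 2) / (s * SQRT_PI) - a)

def smoothed_hinge_grad(a, s=1.0):
    """Derivative of the smoothed hinge loss: tends to -1 for a < 0 and to 0
    for a > 0 as s grows."""
    return 0.5 * (math.erf(s * a) - 1.0)
```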

[Figure 1: Absolute loss and hinge loss, together with their smoothed approximations.]

4.3 One Noisy Copy is Not Enough
The previous results might lead one to wonder whether it is really necessary to query the same instance more than once. In some applications this is inconvenient, and one would prefer a method which works when just a single noisy copy of each instance is made available. In this subsection we show that, unfortunately, such a method cannot be found. Specifically, we prove that under very mild assumptions, no method can achieve sub-linear regret when it has access to just a single noisy copy of each instance. On the other hand, for the case of squared loss and linear kernels, our techniques can be adapted to work with exactly two noisy copies of each instance (see the footnote at the end of this subsection), so without further assumptions, the lower bound that we prove here is indeed tight.

For simplicity, we prove the result for linear kernels (i.e., where $k(x,x')=\langle x,x'\rangle$). It is an interesting open problem to show improved lower bounds when nonlinear kernels are used. We also note that the result crucially relies on the learner not knowing the noise distribution, and we leave to future work the investigation of what happens when this assumption is relaxed.

Theorem 3. Let $W$ be a compact convex subset of $\mathbb{R}^d$, and let $\ell(\cdot,\cdot):\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ satisfy the following: (1) it is bounded from below; (2) it is differentiable with respect to its first argument at $0$, with $\ell'(0,1)<0$. For any learning algorithm which selects hypotheses from $W$ and is allowed access to a single noisy copy of the instance at each round $t$, there exists a strategy for the adversary such that the sequence $w_1,w_2,\dots$ of predictors output by the algorithm satisfies

$$\limsup_{T\to\infty}\;\max_{w\in W}\;\frac{1}{T}\sum_{t=1}^{T}\Big(\ell\big(\langle w_t,x_t\rangle,y_t\big)-\ell\big(\langle w,x_t\rangle,y_t\big)\Big) \;>\; 0$$

with probability 1 with respect to the randomness of the oracles.

Note that condition (1) is satisfied by virtually any loss function other than the linear loss, while condition (2) is satisfied by most regression losses, and by all classification-calibrated losses, which include all reasonable losses for classification (see [13]). The most obvious example where the conditions are not satisfied is when $\ell(\cdot,\cdot)$ is a linear function. This is not surprising, because when $\ell(\cdot,\cdot)$ is linear, the learner is always robust to noise (see the discussion in Sec. 3).

The intuition of the proof is very simple: the adversary chooses beforehand whether the examples are drawn i.i.d. from a distribution $D$ and then perturbed by noise, or drawn i.i.d. from some other distribution $D'$ without adding noise. The distributions $D$, $D'$ and the noise are designed so that the examples observed by the learner are distributed in the same way irrespective of which of the two sampling strategies the adversary chooses. Therefore, it is impossible for the learner, accessing a single copy of each instance, to be statistically consistent with respect to both distributions simultaneously. As a result, the adversary can always choose a distribution on which the algorithm will be inconsistent, leading to constant regret. The full proof is presented in Section 5.3.

Footnote: in a nutshell, for squared loss and linear kernels, we just need to estimate $2\big(\langle w_t,x_t\rangle-y_t\big)x_t$ in an unbiased manner at each round $t$. This can be done by computing $2\big(\langle w_t,\tilde x_t\rangle-y_t\big)\tilde x'_t$, where $\tilde x_t,\tilde x'_t$ are two noisy copies of $x_t$.
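The two-copy trick in the footnote above is simple enough to write down directly; the sketch below (names ours) splits the two independent noisy copies between the two factors of the squared-loss gradient, which keeps the estimate unbiased.

```python
import numpy as np

def two_copy_squared_loss_gradient(w_t, y_t, oracle):
    """Unbiased gradient estimate for the squared loss with a linear kernel,
    using exactly two noisy copies of x_t.

    The true gradient is 2*(<w_t, x_t> - y_t) * x_t.  Since the two noisy
    copies are independent with mean x_t, the returned vector has exactly this
    expectation.  Illustrative sketch: `oracle()` returns a fresh noisy copy.
    """
    x1 = oracle()    # first noisy copy, used inside the inner product
    x2 = oracle()    # second noisy copy, used as the direction
    return 2.0 * (np.dot(w_t, x1) - y_t) * x2
```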

5 Proofs
Due to the lack of space, some of the proofs are given in [7].

5.1 Preliminary Result
To prove Thm. 1 and Thm. 2, we need a theorem which basically states that if all subroutines in Algorithm 1 behave as they should, then one can achieve an $O(\sqrt T)$ regret bound. This is provided in the following theorem, which is an adaptation of a standard result of online convex optimization (see, e.g., [18]). The proof is given in [7].

Theorem 4. Assume the following conditions hold with respect to Algorithm 1:
1. For all $t$, $\tilde\Psi(x_t)$ and $\tilde g_t$ are independent of each other (as random variables induced by the randomness of Algorithm 1), as well as independent of any $\tilde\Psi(x_i)$ and $\tilde g_i$ for $i<t$.
2. For all $t$, $\mathbb{E}[\tilde\Psi(x_t)]=\Psi(x_t)$, and there exists a constant $B_{\tilde\Psi}>0$ such that $\mathbb{E}[\|\tilde\Psi(x_t)\|^2]\le B_{\tilde\Psi}$.
3. For all $t$, $\mathbb{E}[\tilde g_t]=y_t\,\ell'\big(y_t\langle w_t,\Psi(x_t)\rangle\big)$, and there exists a constant $B_g>0$ such that $\mathbb{E}[\tilde g_t^2]\le B_g$.
4. For any pair of instances $x,x'$, $\mathrm{Prod}\big(\tilde\Psi(x),\tilde\Psi(x')\big)=\langle\tilde\Psi(x),\tilde\Psi(x')\rangle$.
Then if Algorithm 1 is run with $\eta=\sqrt{B_w/(B_g B_{\tilde\Psi})}$, the following inequality holds:

$$\mathbb{E}\left[\sum_{t=1}^{T}\ell\big(y_t\langle w_t,\Psi(x_t)\rangle\big)\right] - \min_{w:\|w\|^2\le B_w}\sum_{t=1}^{T}\ell\big(y_t\langle w,\Psi(x_t)\rangle\big) \;\le\; \sqrt{B_w B_g B_{\tilde\Psi}\,T},$$

where the expectation is with respect to the randomness of the oracles and the algorithm throughout its run.

5.2 Proof of Thm. 1
In this subsection, we present the proof of Thm. 1. We first show how to implement the subroutines of Algorithm 1, and prove the relevant results on their behavior. Then, we prove the theorem itself.

It is known that for $k(x,x')=Q(\langle x,x'\rangle)$ to be a valid kernel, it is necessary that $Q(\langle x,x'\rangle)$ can be written as a Taylor expansion $\sum_{n=0}^{\infty}\beta_n\langle x,x'\rangle^n$, where $\beta_n\ge 0$ (see [15]). This makes these types of kernels amenable to our techniques. We start by constructing an explicit feature mapping $\Psi$ corresponding to the RKHS induced by our kernel. For any $x,x'$, we have that

$$k(x,x')=\sum_{n=0}^{\infty}\beta_n\langle x,x'\rangle^n=\sum_{n=0}^{\infty}\beta_n\Big(\sum_{i=1}^{d}x_i x'_i\Big)^{n} = \sum_{n=0}^{\infty}\beta_n\sum_{k_1=1}^{d}\cdots\sum_{k_n=1}^{d}x_{k_1}\cdots x_{k_n}\,x'_{k_1}\cdots x'_{k_n} = \sum_{n=0}^{\infty}\sum_{k_1=1}^{d}\cdots\sum_{k_n=1}^{d}\big(\sqrt{\beta_n}\,x_{k_1}\cdots x_{k_n}\big)\big(\sqrt{\beta_n}\,x'_{k_1}\cdots x'_{k_n}\big).$$

This suggests the following feature representation: for any $x$, $\Psi(x)$ returns an infinite-dimensional vector, indexed by $n$ and $k_1,\dots,k_n\in\{1,\dots,d\}$, with the entry corresponding to $(n,k_1,\dots,k_n)$ being $\sqrt{\beta_n}\,x_{k_1}\cdots x_{k_n}$. The dot product between $\Psi(x)$ and $\Psi(x')$ is analogous to a standard dot product between two vectors, and by the derivation above equals $k(x,x')$ as required.

We now use a slightly more elaborate variant of our unbiased estimation technique to derive an unbiased estimate of $\Psi(x)$. First, we sample $N$ according to $\Pr(N=n)=(p-1)/p^{n+1}$. Then, we query the oracle for $x$ for $N$ times to get $\tilde x^1,\dots,\tilde x^N$, and formally define $\tilde\Psi(x)$ as

$$\tilde\Psi(x)=\frac{p^{N+1}}{p-1}\,\sqrt{\beta_N}\sum_{k_1=1}^{d}\cdots\sum_{k_N=1}^{d}\tilde x^1_{k_1}\cdots\tilde x^N_{k_N}\;e_{N,k_1,\dots,k_N}, \qquad (2)$$

where $e_{n,k_1,\dots,k_n}$ represents the unit vector in the direction indexed by $(n,k_1,\dots,k_n)$ as explained above. Since the oracle queries are i.i.d., the expectation of this expression is

$$\sum_{n=0}^{\infty}\frac{p-1}{p^{\,n+1}}\cdot\frac{p^{\,n+1}}{p-1}\,\sqrt{\beta_n}\sum_{k_1=1}^{d}\cdots\sum_{k_n=1}^{d}\mathbb{E}\big[\tilde x^1_{k_1}\big]\cdots\mathbb{E}\big[\tilde x^n_{k_n}\big]\;e_{n,k_1,\dots,k_n} \;=\; \sum_{n=0}^{\infty}\sqrt{\beta_n}\sum_{k_1=1}^{d}\cdots\sum_{k_n=1}^{d}x_{k_1}\cdots x_{k_n}\;e_{n,k_1,\dots,k_n},$$

which is exactly $\Psi(x)$. We formalize the needed properties of $\tilde\Psi(x)$ in the following lemma.

Lemma 3. Assuming $\tilde\Psi(x)$ is constructed as in the discussion above, it holds that $\mathbb{E}[\tilde\Psi(x)]=\Psi(x)$ for any $x$. Moreover, if the noisy samples $\tilde x_t$ returned by the oracle $A_t$ satisfy $\mathbb{E}[\|\tilde x_t\|^2]\le B_x$, then

$$\mathbb{E}\big[\|\tilde\Psi(x_t)\|^2\big]\le\frac{p}{p-1}\,Q(pB_x),$$

where we recall that $Q$ defines the kernel by $k(x,x')=Q(\langle x,x'\rangle)$.

Proof. The first part of the lemma follows from the discussion above. As to the second part, note that by (2),

$$\mathbb{E}\big[\|\tilde\Psi(x_t)\|^2\big] = \mathbb{E}\left[\frac{p^{2N+2}}{(p-1)^2}\,\beta_N\sum_{k_1,\dots,k_N=1}^{d}\big(\tilde x^1_{t,k_1}\cdots\tilde x^N_{t,k_N}\big)^2\right] = \mathbb{E}\left[\frac{p^{2N+2}}{(p-1)^2}\,\beta_N\prod_{j=1}^{N}\|\tilde x^j_t\|^2\right]$$
$$= \sum_{n=0}^{\infty}\frac{p-1}{p^{\,n+1}}\cdot\frac{p^{2n+2}}{(p-1)^2}\,\beta_n\prod_{j=1}^{n}\mathbb{E}\big[\|\tilde x^j_t\|^2\big] \;\le\; \frac{p}{p-1}\sum_{n=0}^{\infty}\beta_n\big(pB_x\big)^n \;=\; \frac{p}{p-1}\,Q(pB_x),$$

where the second-to-last step used the fact that $\beta_n\ge 0$ for all $n$.

Of course, explicitly storing $\tilde\Psi(x)$ as defined above is infeasible, since the number of entries is huge. Fortunately, this is not needed: we just need to store $\tilde x^1_t,\dots,\tilde x^N_t$. The representation above is used implicitly when we calculate dot products between $\tilde\Psi(x)$ and other elements in the RKHS, via the subroutine Prod. We note that while $N$ is a random quantity which might be unbounded, its distribution decays exponentially fast, so the number of vectors to store is essentially bounded. After the discussion above, the pseudocode for Map_Estimate below should be self-explanatory.

Subroutine 2: Map_Estimate($A_t, p$)
  Sample a nonnegative integer $N$ according to $\Pr(N=n)=(p-1)/p^{n+1}$
  Query $A_t$ for $N$ times to get $\tilde x^1_t,\dots,\tilde x^N_t$
  Return $(\tilde x^1_t,\dots,\tilde x^N_t)$ as $\tilde\Psi(x_t)$

We now turn to the subroutine Prod, which, given two elements in the RKHS, returns their dot product. This subroutine comes in two flavors: either as a procedure defined over $\tilde\Psi(x),\tilde\Psi(x')$, returning $\langle\tilde\Psi(x),\tilde\Psi(x')\rangle$ (Subroutine 3); or as a procedure defined over $\tilde\Psi(x)$ and $x'$ (Subroutine 4), where the second element is an explicitly given vector, returning $\langle\tilde\Psi(x),\Psi(x')\rangle$. This second variant of Prod is needed when we wish to apply the learned predictor on a new given instance.

Subroutine 3: Prod($\tilde\Psi(x),\tilde\Psi(x')$)
  Let $N$ and $\tilde x^1,\dots,\tilde x^N$ be the index and vectors comprising $\tilde\Psi(x)$
  Let $N'$ and $\tilde x'^1,\dots,\tilde x'^{N'}$ be the index and vectors comprising $\tilde\Psi(x')$
  If $N\ne N'$ return $0$; else return $\beta_N\,\frac{p^{2N+2}}{(p-1)^2}\prod_{j=1}^{N}\langle\tilde x^j,\tilde x'^j\rangle$

Lemma 4. Prod($\tilde\Psi(x),\tilde\Psi(x')$) returns $\langle\tilde\Psi(x),\tilde\Psi(x')\rangle$.

Proof. Using the formal representation of $\tilde\Psi(x),\tilde\Psi(x')$ in (2), we have that $\langle\tilde\Psi(x),\tilde\Psi(x')\rangle$ is $0$ whenever $N\ne N'$, because then these two elements are composed of different unit vectors with respect to an orthogonal basis. Otherwise, we have that

$$\langle\tilde\Psi(x),\tilde\Psi(x')\rangle = \beta_N\,\frac{p^{2N+2}}{(p-1)^2}\sum_{k_1=1}^{d}\cdots\sum_{k_N=1}^{d}\tilde x^1_{k_1}\cdots\tilde x^N_{k_N}\,\tilde x'^1_{k_1}\cdots\tilde x'^N_{k_N} = \beta_N\,\frac{p^{2N+2}}{(p-1)^2}\prod_{j=1}^{N}\langle\tilde x^j,\tilde x'^j\rangle,$$

which is exactly what the algorithm returns, hence the lemma follows.
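The following is a minimal sketch of Subroutines 2 and 3, assuming the kernel's Taylor coefficients are available as a function `beta(n)`; the representation of $\tilde\Psi(x)$ is simply the list of noisy copies, as in Subroutine 2. Names and signatures are ours; the variant for an explicitly given instance (Subroutine 4 below) is analogous.

```python
import numpy as np

def map_estimate(oracle, p, rng=None):
    """Subroutine 2 sketch: an unbiased estimate of Psi(x) for a dot-product
    kernel k(x, x') = Q(<x, x'>) with Q(z) = sum_n beta(n) z^n.

    Returns the list of noisy copies, which implicitly represents ~Psi(x);
    `oracle()` returns a fresh noisy copy of x.
    """
    rng = rng or np.random.default_rng()
    N = rng.geometric(1.0 - 1.0 / p) - 1      # Pr(N = n) = (p-1)/p^(n+1)
    return [oracle() for _ in range(N)]

def prod(psi_a, psi_b, p, beta):
    """Subroutine 3 sketch: <~Psi(x), ~Psi(x')> computed from the stored copies."""
    n = len(psi_a)
    if n != len(psi_b):
        return 0.0
    out = beta(n) * p ** (2 * n + 2) / (p - 1.0) ** 2
    for a, b in zip(psi_a, psi_b):
        out *= float(np.dot(a, b))
    return out
```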

The pseudocode for calculating the dot product $\langle\tilde\Psi(x),\Psi(x')\rangle$, where $x'$ is explicitly known, is very similar, and the proof is essentially the same.

Subroutine 4: Prod($\tilde\Psi(x), x'$)
  Let $N$ and $\tilde x^1,\dots,\tilde x^N$ be the index and vectors comprising $\tilde\Psi(x)$
  Return $\beta_N\,\frac{p^{N+1}}{p-1}\prod_{j=1}^{N}\langle\tilde x^j, x'\rangle$

We are now ready to prove Thm. 1. First, regarding the expected number of queries, notice that to run Algorithm 1 we invoke Map_Estimate once and Grad_Length_Estimate once at each round $t$. Map_Estimate uses a random number $B$ of queries distributed as $\Pr(B=n)=(p-1)/p^{n+1}$, and Grad_Length_Estimate invokes Map_Estimate a random number $C$ of times, where $C$ is also distributed as $\Pr(C=n)=(p-1)/p^{n+1}$. The total number of queries is therefore $\sum_{j=1}^{C+1}B_j$, where the $B_j$ are i.i.d. copies of $B$. The expected value of this expression, using a standard result on the expected value of a sum of a random number of independent random variables, is equal to $(1+\mathbb{E}[C])\,\mathbb{E}[B_j]$, or $\frac{1}{p-1}+\frac{1}{(p-1)^2}$.

In terms of running time, we note that the expected running time of Prod is $O\big(1+\frac{d}{p-1}\big)$: this is because it performs $N$ multiplications of inner products, each with running time $O(d)$, and $\mathbb{E}[N]=\frac{1}{p-1}$. The expected running time of Map_Estimate is $O\big(1+\frac{d}{p-1}\big)$. The expected running time of Grad_Length_Estimate is $O\big(\frac{1}{p-1}\,T\,\big(1+\frac{d}{p-1}\big)\big)$, which can be written as $O\big(1+\frac{T}{p-1}+\frac{Td}{(p-1)^2}\big)$. Since Algorithm 1, at each of the $T$ rounds, calls Map_Estimate once, Grad_Length_Estimate once, Prod $O(T^2)$ times, and performs $O(T)$ other operations, we get that the overall runtime is

$$O\left(T\left(\frac{T}{p-1}+\frac{Td}{(p-1)^2}+T^2\Big(1+\frac{d}{p-1}\Big)\right)\right),$$

which we can upper bound by $O\!\left(T^3\Big(1+\frac{d\,p^2}{(p-1)^2}\Big)\right)$.

The regret bound in the theorem follows from Thm. 4, with the expressions for the constants following from Lemma 2, Lemma 3, and Lemma 4.

5.3 Proof Sketch of Thm. 3
To prove the theorem, we use a more general result which leads to non-vanishing regret, and then show that under the assumptions of Thm. 3, the result holds. The proof of the result is given in [7].

Theorem 5. Let $W$ be a compact convex subset of $\mathbb{R}^d$ and pick any learning algorithm which selects hypotheses from $W$ and is allowed access to a single noisy copy of the instance at each round $t$. If there exists a distribution over a compact subset of $\mathbb{R}^d$ such that

$$\arg\min_{w\in W}\,\mathbb{E}\big[\ell(\langle w,x\rangle,1)\big] \qquad\text{and}\qquad \arg\min_{w\in W}\,\ell\big(\langle w,\mathbb{E}[x]\rangle,1\big) \qquad (3)$$

are disjoint, then there exists a strategy for the adversary such that the sequence $w_1,w_2,\dots\in W$ of predictors output by the algorithm satisfies

$$\limsup_{T\to\infty}\;\max_{w\in W}\;\frac{1}{T}\sum_{t=1}^{T}\Big(\ell\big(\langle w_t,x_t\rangle,y_t\big)-\ell\big(\langle w,x_t\rangle,y_t\big)\Big) \;>\; 0$$

with probability 1 with respect to the randomness of the oracles.

Another way to phrase this theorem is that the regret cannot vanish if, given examples sampled i.i.d. from a distribution, the learning problem is more complicated than just finding the mean of the data. Indeed, the adversary's strategy we choose later on is simply drawing and presenting examples from such a distribution. Below, we sketch how we use Thm. 5 in order to prove Thm. 3. A full proof is provided in [7].
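As a sanity check on the per-round query count derived above, $(1+\mathbb{E}[C])\,\mathbb{E}[B]=\frac{1}{p-1}+\frac{1}{(p-1)^2}=\frac{p}{(p-1)^2}$, here is a small Monte-Carlo sketch (ours, for illustration only).

```python
import numpy as np

def simulate_query_count(p, num_rounds=200000, rng=None):
    """Monte-Carlo check of the expected per-round number of oracle queries.

    Per round, Map_Estimate is called once directly and C more times inside
    Grad_Length_Estimate, where C and each per-call query count B are all
    distributed as Pr(n) = (p-1)/p^(n+1); the expected total is p/(p-1)^2.
    """
    rng = rng or np.random.default_rng(0)
    q = 1.0 - 1.0 / p                        # success probability of the geometric draw
    totals = np.empty(num_rounds)
    for r in range(num_rounds):
        C = rng.geometric(q) - 1             # extra Map_Estimate calls
        Bs = rng.geometric(q, size=C + 1) - 1
        totals[r] = Bs.sum()
    return totals.mean()

if __name__ == "__main__":
    for p in (1.5, 2.0, 3.0):
        print(p, simulate_query_count(p), "theory:", p / (p - 1.0) ** 2)
```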

We construct a very simple one-dimensional distribution which satisfies the conditions of Thm. 5: it is simply the uniform distribution on $\{3x,-x\}$, where $x$ is the vector $(1,0,\dots,0)$. Thus, it is enough to show that

$$\arg\min_{w:\|w\|^2\le B_w}\big(\ell(3w_1,1)+\ell(-w_1,1)\big) \qquad\text{and}\qquad \arg\min_{w:\|w\|^2\le B_w}\ell(w_1,1) \qquad (4)$$

are disjoint, for some appropriately chosen $B_w$. Assuming the contrary, then under the assumptions on $\ell$, we show that the first set in Eq. (4) lies inside a bounded ball around the origin, in a way which is independent of $B_w$, no matter how large it is. Thus, if we pick $B_w$ to be large enough, and assume that the two sets in Eq. (4) are not disjoint, then there must be some $w$ such that both $\ell(3w_1,1)+\ell(-w_1,1)$ and $\ell(w_1,1)$ have a subgradient of zero at $w$. However, this can be shown to contradict the assumptions on $\ell$, leading to the desired result.

6 Future Work
There are several interesting research directions worth pursuing in the noisy learning framework introduced here. For instance, doing away with unbiasedness could lead to the design of estimators that are applicable to more types of loss functions, for which unbiased estimators may not even exist. Also, it would be interesting to show how additional information one has about the noise distribution can be used to design improved estimates, possibly in association with specific losses or kernels. Another open question is whether our lower bound (Thm. 3) can be improved when nonlinear kernels are used.

References
[1] J. Abernethy, E. Hazan, and A. Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In COLT, 2008.
[2] S. Bhandari and A. Bose. Existence of unbiased estimators in sequential binomial experiments. Sankhyā: The Indian Journal of Statistics, 52, 1990.
[3] N. Bshouty, J. Jackson, and C. Tamon. Uniform-distribution attribute noise learnability. Information and Computation, 187(2), 2003.
[4] N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9), September 2004.
[5] N. Cesa-Bianchi, E. Dichterman, P. Fischer, E. Shamir, and H. Simon. Sample-efficient strategies for learning in the presence of noise. Journal of the ACM, 46(5):684–719, 1999.
[6] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[7] N. Cesa-Bianchi, S. Shalev-Shwartz, and O. Shamir. Online learning of noisy data with kernels. Technical report, available on arXiv.
[8] A. Flaxman, A. Tauman Kalai, and H. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of SODA, 2005.
[9] S. Goldman and R. Sloan. Can PAC learning algorithms tolerate random attribute noise? Algorithmica, 14:70–84, 1995.
[10] M. Kearns and M. Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4), 1993.
[11] N. Littlestone. Redundant noisy attributes, attribute errors, and linear threshold learning using Winnow. In Proceedings of COLT, 1991.
[12] D. Nettleton, A. Orriols-Puig, and A. Fornells. A study of the effect of different types of noise on the precision of supervised learning techniques. Artificial Intelligence Review, 2010.
[13] P. Bartlett, M. Jordan, and J. McAuliffe. Convexity, classification and risk bounds. Journal of the American Statistical Association, 101(473):138–156, March 2006.
[14] L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15(6):1191–1253, 2003.
[15] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, 2002.
[16] R. Singh. Existence of unbiased estimates. Sankhyā: The Indian Journal of Statistics, 26:93–96, 1964.
[17] I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.
[18] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of ICML, 2003.


More information

Positive decomposition of transfer functions with multiple poles

Positive decomposition of transfer functions with multiple poles Positive decomosition of transfer functions with multile oles Béla Nagy 1, Máté Matolcsi 2, and Márta Szilvási 1 Deartment of Analysis, Technical University of Budaest (BME), H-1111, Budaest, Egry J. u.

More information

q-ary Symmetric Channel for Large q

q-ary Symmetric Channel for Large q List-Message Passing Achieves Caacity on the q-ary Symmetric Channel for Large q Fan Zhang and Henry D Pfister Deartment of Electrical and Comuter Engineering, Texas A&M University {fanzhang,hfister}@tamuedu

More information

On a Markov Game with Incomplete Information

On a Markov Game with Incomplete Information On a Markov Game with Incomlete Information Johannes Hörner, Dinah Rosenberg y, Eilon Solan z and Nicolas Vieille x{ January 24, 26 Abstract We consider an examle of a Markov game with lack of information

More information

Research Article An iterative Algorithm for Hemicontractive Mappings in Banach Spaces

Research Article An iterative Algorithm for Hemicontractive Mappings in Banach Spaces Abstract and Alied Analysis Volume 2012, Article ID 264103, 11 ages doi:10.1155/2012/264103 Research Article An iterative Algorithm for Hemicontractive Maings in Banach Saces Youli Yu, 1 Zhitao Wu, 2 and

More information

Distributed Rule-Based Inference in the Presence of Redundant Information

Distributed Rule-Based Inference in the Presence of Redundant Information istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced

More information

Introduction to Probability and Statistics

Introduction to Probability and Statistics Introduction to Probability and Statistics Chater 8 Ammar M. Sarhan, asarhan@mathstat.dal.ca Deartment of Mathematics and Statistics, Dalhousie University Fall Semester 28 Chater 8 Tests of Hyotheses Based

More information

Improved Bounds on Bell Numbers and on Moments of Sums of Random Variables

Improved Bounds on Bell Numbers and on Moments of Sums of Random Variables Imroved Bounds on Bell Numbers and on Moments of Sums of Random Variables Daniel Berend Tamir Tassa Abstract We rovide bounds for moments of sums of sequences of indeendent random variables. Concentrating

More information

Convex Games in Banach Spaces

Convex Games in Banach Spaces Technical Reort TTIC-TR-2009-6 Setember 2009 Convex Games in Banach Saces Karthik Sridharan Toyota Technological Institute at Chicago karthik@tti-c.org Ambuj Tewari Toyota Technological Institute at Chicago

More information

Topic 7: Using identity types

Topic 7: Using identity types Toic 7: Using identity tyes June 10, 2014 Now we would like to learn how to use identity tyes and how to do some actual mathematics with them. By now we have essentially introduced all inference rules

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell February 10, 2010 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

EE 508 Lecture 13. Statistical Characterization of Filter Characteristics

EE 508 Lecture 13. Statistical Characterization of Filter Characteristics EE 508 Lecture 3 Statistical Characterization of Filter Characteristics Comonents used to build filters are not recisely redictable L C Temerature Variations Manufacturing Variations Aging Model variations

More information

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek Use of Transformations and the Reeated Statement in PROC GLM in SAS Ed Stanek Introduction We describe how the Reeated Statement in PROC GLM in SAS transforms the data to rovide tests of hyotheses of interest.

More information

Estimating Time-Series Models

Estimating Time-Series Models Estimating ime-series Models he Box-Jenkins methodology for tting a model to a scalar time series fx t g consists of ve stes:. Decide on the order of di erencing d that is needed to roduce a stationary

More information

Almost 4000 years ago, Babylonians had discovered the following approximation to. x 2 dy 2 =1, (5.0.2)

Almost 4000 years ago, Babylonians had discovered the following approximation to. x 2 dy 2 =1, (5.0.2) Chater 5 Pell s Equation One of the earliest issues graled with in number theory is the fact that geometric quantities are often not rational. For instance, if we take a right triangle with two side lengths

More information

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK Towards understanding the Lorenz curve using the Uniform distribution Chris J. Stehens Newcastle City Council, Newcastle uon Tyne, UK (For the Gini-Lorenz Conference, University of Siena, Italy, May 2005)

More information

Brownian Motion and Random Prime Factorization

Brownian Motion and Random Prime Factorization Brownian Motion and Random Prime Factorization Kendrick Tang June 4, 202 Contents Introduction 2 2 Brownian Motion 2 2. Develoing Brownian Motion.................... 2 2.. Measure Saces and Borel Sigma-Algebras.........

More information

ON UNIFORM BOUNDEDNESS OF DYADIC AVERAGING OPERATORS IN SPACES OF HARDY-SOBOLEV TYPE. 1. Introduction

ON UNIFORM BOUNDEDNESS OF DYADIC AVERAGING OPERATORS IN SPACES OF HARDY-SOBOLEV TYPE. 1. Introduction ON UNIFORM BOUNDEDNESS OF DYADIC AVERAGING OPERATORS IN SPACES OF HARDY-SOBOLEV TYPE GUSTAVO GARRIGÓS ANDREAS SEEGER TINO ULLRICH Abstract We give an alternative roof and a wavelet analog of recent results

More information

Some results of convex programming complexity

Some results of convex programming complexity 2012c12 $ Ê Æ Æ 116ò 14Ï Dec., 2012 Oerations Research Transactions Vol.16 No.4 Some results of convex rogramming comlexity LOU Ye 1,2 GAO Yuetian 1 Abstract Recently a number of aers were written that

More information

On Wald-Type Optimal Stopping for Brownian Motion

On Wald-Type Optimal Stopping for Brownian Motion J Al Probab Vol 34, No 1, 1997, (66-73) Prerint Ser No 1, 1994, Math Inst Aarhus On Wald-Tye Otimal Stoing for Brownian Motion S RAVRSN and PSKIR The solution is resented to all otimal stoing roblems of

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models

Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models Ketan N. Patel, Igor L. Markov and John P. Hayes University of Michigan, Ann Arbor 48109-2122 {knatel,imarkov,jhayes}@eecs.umich.edu

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell October 25, 2009 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

John Weatherwax. Analysis of Parallel Depth First Search Algorithms

John Weatherwax. Analysis of Parallel Depth First Search Algorithms Sulementary Discussions and Solutions to Selected Problems in: Introduction to Parallel Comuting by Viin Kumar, Ananth Grama, Anshul Guta, & George Karyis John Weatherwax Chater 8 Analysis of Parallel

More information

Elementary theory of L p spaces

Elementary theory of L p spaces CHAPTER 3 Elementary theory of L saces 3.1 Convexity. Jensen, Hölder, Minkowski inequality. We begin with two definitions. A set A R d is said to be convex if, for any x 0, x 1 2 A x = x 0 + (x 1 x 0 )

More information

GOOD MODELS FOR CUBIC SURFACES. 1. Introduction

GOOD MODELS FOR CUBIC SURFACES. 1. Introduction GOOD MODELS FOR CUBIC SURFACES ANDREAS-STEPHAN ELSENHANS Abstract. This article describes an algorithm for finding a model of a hyersurface with small coefficients. It is shown that the aroach works in

More information

An Estimate For Heilbronn s Exponential Sum

An Estimate For Heilbronn s Exponential Sum An Estimate For Heilbronn s Exonential Sum D.R. Heath-Brown Magdalen College, Oxford For Heini Halberstam, on his retirement Let be a rime, and set e(x) = ex(2πix). Heilbronn s exonential sum is defined

More information

Location of solutions for quasi-linear elliptic equations with general gradient dependence

Location of solutions for quasi-linear elliptic equations with general gradient dependence Electronic Journal of Qualitative Theory of Differential Equations 217, No. 87, 1 1; htts://doi.org/1.14232/ejqtde.217.1.87 www.math.u-szeged.hu/ejqtde/ Location of solutions for quasi-linear ellitic equations

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Numerous alications in statistics, articularly in the fitting of linear models. Notation and conventions: Elements of a matrix A are denoted by a ij, where i indexes the rows and

More information

Estimation of Separable Representations in Psychophysical Experiments

Estimation of Separable Representations in Psychophysical Experiments Estimation of Searable Reresentations in Psychohysical Exeriments Michele Bernasconi (mbernasconi@eco.uninsubria.it) Christine Choirat (cchoirat@eco.uninsubria.it) Raffaello Seri (rseri@eco.uninsubria.it)

More information

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales Lecture 6 Classification of states We have shown that all states of an irreducible countable state Markov chain must of the same tye. This gives rise to the following classification. Definition. [Classification

More information

A Special Case Solution to the Perspective 3-Point Problem William J. Wolfe California State University Channel Islands

A Special Case Solution to the Perspective 3-Point Problem William J. Wolfe California State University Channel Islands A Secial Case Solution to the Persective -Point Problem William J. Wolfe California State University Channel Islands william.wolfe@csuci.edu Abstract In this aer we address a secial case of the ersective

More information

Various Proofs for the Decrease Monotonicity of the Schatten s Power Norm, Various Families of R n Norms and Some Open Problems

Various Proofs for the Decrease Monotonicity of the Schatten s Power Norm, Various Families of R n Norms and Some Open Problems Int. J. Oen Problems Comt. Math., Vol. 3, No. 2, June 2010 ISSN 1998-6262; Coyright c ICSRS Publication, 2010 www.i-csrs.org Various Proofs for the Decrease Monotonicity of the Schatten s Power Norm, Various

More information

Adaptive Online Gradient Descent

Adaptive Online Gradient Descent University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 6-4-2007 Adaptive Online Gradient Descent Peter Bartlett Elad Hazan Alexander Rakhlin University of Pennsylvania Follow

More information

Stochastic integration II: the Itô integral

Stochastic integration II: the Itô integral 13 Stochastic integration II: the Itô integral We have seen in Lecture 6 how to integrate functions Φ : (, ) L (H, E) with resect to an H-cylindrical Brownian motion W H. In this lecture we address the

More information

MATH 250: THE DISTRIBUTION OF PRIMES. ζ(s) = n s,

MATH 250: THE DISTRIBUTION OF PRIMES. ζ(s) = n s, MATH 50: THE DISTRIBUTION OF PRIMES ROBERT J. LEMKE OLIVER For s R, define the function ζs) by. Euler s work on rimes ζs) = which converges if s > and diverges if s. In fact, though we will not exloit

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information

Impossibility of a Quantum Speed-up with a Faulty Oracle

Impossibility of a Quantum Speed-up with a Faulty Oracle Imossibility of a Quantum Seed-u with a Faulty Oracle Oded Regev Liron Schiff Abstract We consider Grover s unstructured search roblem in the setting where each oracle call has some small robability of

More information

Hotelling s Two- Sample T 2

Hotelling s Two- Sample T 2 Chater 600 Hotelling s Two- Samle T Introduction This module calculates ower for the Hotelling s two-grou, T-squared (T) test statistic. Hotelling s T is an extension of the univariate two-samle t-test

More information

COMMUNICATION BETWEEN SHAREHOLDERS 1

COMMUNICATION BETWEEN SHAREHOLDERS 1 COMMUNICATION BTWN SHARHOLDRS 1 A B. O A : A D Lemma B.1. U to µ Z r 2 σ2 Z + σ2 X 2r ω 2 an additive constant that does not deend on a or θ, the agents ayoffs can be written as: 2r rθa ω2 + θ µ Y rcov

More information

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules. Introduction: The is widely used in industry to monitor the number of fraction nonconforming units. A nonconforming unit is

More information

Equivalence of Wilson actions

Equivalence of Wilson actions Prog. Theor. Ex. Phys. 05, 03B0 7 ages DOI: 0.093/te/tv30 Equivalence of Wilson actions Physics Deartment, Kobe University, Kobe 657-850, Jaan E-mail: hsonoda@kobe-u.ac.j Received June 6, 05; Revised August

More information

Analysis of some entrance probabilities for killed birth-death processes

Analysis of some entrance probabilities for killed birth-death processes Analysis of some entrance robabilities for killed birth-death rocesses Master s Thesis O.J.G. van der Velde Suervisor: Dr. F.M. Sieksma July 5, 207 Mathematical Institute, Leiden University Contents Introduction

More information

AKRON: An Algorithm for Approximating Sparse Kernel Reconstruction

AKRON: An Algorithm for Approximating Sparse Kernel Reconstruction : An Algorithm for Aroximating Sarse Kernel Reconstruction Gregory Ditzler Det. of Electrical and Comuter Engineering The University of Arizona Tucson, AZ 8572 USA ditzler@email.arizona.edu Nidhal Carla

More information

Proof: We follow thearoach develoed in [4]. We adot a useful but non-intuitive notion of time; a bin with z balls at time t receives its next ball at

Proof: We follow thearoach develoed in [4]. We adot a useful but non-intuitive notion of time; a bin with z balls at time t receives its next ball at A Scaling Result for Exlosive Processes M. Mitzenmacher Λ J. Sencer We consider the following balls and bins model, as described in [, 4]. Balls are sequentially thrown into bins so that the robability

More information

Sets of Real Numbers

Sets of Real Numbers Chater 4 Sets of Real Numbers 4. The Integers Z and their Proerties In our revious discussions about sets and functions the set of integers Z served as a key examle. Its ubiquitousness comes from the fact

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

Yixi Shi. Jose Blanchet. IEOR Department Columbia University New York, NY 10027, USA. IEOR Department Columbia University New York, NY 10027, USA

Yixi Shi. Jose Blanchet. IEOR Department Columbia University New York, NY 10027, USA. IEOR Department Columbia University New York, NY 10027, USA Proceedings of the 2011 Winter Simulation Conference S. Jain, R. R. Creasey, J. Himmelsach, K. P. White, and M. Fu, eds. EFFICIENT RARE EVENT SIMULATION FOR HEAVY-TAILED SYSTEMS VIA CROSS ENTROPY Jose

More information

LEIBNIZ SEMINORMS IN PROBABILITY SPACES

LEIBNIZ SEMINORMS IN PROBABILITY SPACES LEIBNIZ SEMINORMS IN PROBABILITY SPACES ÁDÁM BESENYEI AND ZOLTÁN LÉKA Abstract. In this aer we study the (strong) Leibniz roerty of centered moments of bounded random variables. We shall answer a question

More information

Probability Estimates for Multi-class Classification by Pairwise Coupling

Probability Estimates for Multi-class Classification by Pairwise Coupling Probability Estimates for Multi-class Classification by Pairwise Couling Ting-Fan Wu Chih-Jen Lin Deartment of Comuter Science National Taiwan University Taiei 06, Taiwan Ruby C. Weng Deartment of Statistics

More information

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)]

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)] LECTURE 7 NOTES 1. Convergence of random variables. Before delving into the large samle roerties of the MLE, we review some concets from large samle theory. 1. Convergence in robability: x n x if, for

More information

Fault Tolerant Quantum Computing Robert Rogers, Thomas Sylwester, Abe Pauls

Fault Tolerant Quantum Computing Robert Rogers, Thomas Sylwester, Abe Pauls CIS 410/510, Introduction to Quantum Information Theory Due: June 8th, 2016 Sring 2016, University of Oregon Date: June 7, 2016 Fault Tolerant Quantum Comuting Robert Rogers, Thomas Sylwester, Abe Pauls

More information

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014 Morten Frydenberg Section for Biostatistics Version :Friday, 05 Setember 204 All models are aroximations! The best model does not exist! Comlicated models needs a lot of data. lower your ambitions or get

More information

Positive Definite Uncertain Homogeneous Matrix Polynomials: Analysis and Application

Positive Definite Uncertain Homogeneous Matrix Polynomials: Analysis and Application BULGARIA ACADEMY OF SCIECES CYBEREICS AD IFORMAIO ECHOLOGIES Volume 9 o 3 Sofia 009 Positive Definite Uncertain Homogeneous Matrix Polynomials: Analysis and Alication Svetoslav Savov Institute of Information

More information

Machine Learning: Homework 4

Machine Learning: Homework 4 10-601 Machine Learning: Homework 4 Due 5.m. Monday, February 16, 2015 Instructions Late homework olicy: Homework is worth full credit if submitted before the due date, half credit during the next 48 hours,

More information

#A64 INTEGERS 18 (2018) APPLYING MODULAR ARITHMETIC TO DIOPHANTINE EQUATIONS

#A64 INTEGERS 18 (2018) APPLYING MODULAR ARITHMETIC TO DIOPHANTINE EQUATIONS #A64 INTEGERS 18 (2018) APPLYING MODULAR ARITHMETIC TO DIOPHANTINE EQUATIONS Ramy F. Taki ElDin Physics and Engineering Mathematics Deartment, Faculty of Engineering, Ain Shams University, Cairo, Egyt

More information

ON POLYNOMIAL SELECTION FOR THE GENERAL NUMBER FIELD SIEVE

ON POLYNOMIAL SELECTION FOR THE GENERAL NUMBER FIELD SIEVE MATHEMATICS OF COMPUTATIO Volume 75, umber 256, October 26, Pages 237 247 S 25-5718(6)187-9 Article electronically ublished on June 28, 26 O POLYOMIAL SELECTIO FOR THE GEERAL UMBER FIELD SIEVE THORSTE

More information

Best approximation by linear combinations of characteristic functions of half-spaces

Best approximation by linear combinations of characteristic functions of half-spaces Best aroximation by linear combinations of characteristic functions of half-saces Paul C. Kainen Deartment of Mathematics Georgetown University Washington, D.C. 20057-1233, USA Věra Kůrková Institute of

More information

LORENZO BRANDOLESE AND MARIA E. SCHONBEK

LORENZO BRANDOLESE AND MARIA E. SCHONBEK LARGE TIME DECAY AND GROWTH FOR SOLUTIONS OF A VISCOUS BOUSSINESQ SYSTEM LORENZO BRANDOLESE AND MARIA E. SCHONBEK Abstract. In this aer we analyze the decay and the growth for large time of weak and strong

More information

A Note on Guaranteed Sparse Recovery via l 1 -Minimization

A Note on Guaranteed Sparse Recovery via l 1 -Minimization A Note on Guaranteed Sarse Recovery via l -Minimization Simon Foucart, Université Pierre et Marie Curie Abstract It is roved that every s-sarse vector x C N can be recovered from the measurement vector

More information

A Social Welfare Optimal Sequential Allocation Procedure

A Social Welfare Optimal Sequential Allocation Procedure A Social Welfare Otimal Sequential Allocation Procedure Thomas Kalinowsi Universität Rostoc, Germany Nina Narodytsa and Toby Walsh NICTA and UNSW, Australia May 2, 201 Abstract We consider a simle sequential

More information