Replication Strategies in Unstructured Peer-to-Peer Networks

Size: px
Start display at page:

Download "Replication Strategies in Unstructured Peer-to-Peer Networks"

Transcription

1 Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen AT&T Labs Research 180 Park Avenue Florha Park, NJ 07932, USA Scott Shenker ICSI Berkeley, CA USA ABSTRACT The Peer-to-Peer (P2P) architectures that are ost prevalent in today s Internet are decentralized and unstructured. Search is blind in that it is independent of the query and is thus not ore effective than probing randoly chosen peers. One technique to iprove the effectiveness of blind search is to proactively replicate data. We evaluate and copare different replication strategies and reveal interesting structure: Two very coon but very different replication strategies unifor and proportional yield the sae average perforance on successful queries, and are in fact worse than any replication strategy which lies between the. The optial strategy lies between the two and can be achieved by siple distributed algoriths. These fundaental results offer a new understanding of replication and show that currently deployed replication strategies are far fro optial and that optial replication is attainable by protocols that reseble existing ones in siplicity and operation. Categories and Subject Descriptors C.2 [Counication Networks]; H.3 [Inforation Storage and Retrieval]; F.2 [Analysis of Algoriths] General Ters Algoriths Keywords replication; peer-to-peer; rando search 1. INTRODUCTION Peer-to-peer (P2P) systes, alost unheard of three years ago, are now one of the ost popular Internet applications and a very significant source of Internet traffic. While Napster s recent legal troubles ay lead to its deise, there are Perission to ake digital or hard copies of all or part of this work for personal or classroo use is granted without fee provided that copies are not ade or distributed for profit or coercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific perission and/or a fee. SIGCOMM 02, August 19-23, 2002, Pittsburgh, Pennsylvania, USA. Copyright 2002 ACM X/02/ $5.00. any other P2P systes that are continuing their eteoric growth. Despite their growing iportance, the perforance of P2P systes is not yet well understood. P2P systes were classified by [9] into three different categories. Soe P2P systes, such as Napster [6], are centralized in that they have a central directory server to which users can subit queries (or searches). Other P2P systes are decentralized and have no central server; the hosts for an ad hoc network aong theselves and send their queries to their peers. Of these decentralized designs, soe are structured in that they have close coupling between the P2P network topology and the location of data; see [12, 10, 11, 15, 2] for a sapling of these designs. Other decentralized P2P systes, such as Gnutella [3] and FastTrack [13]- based Morpheus [5] and KaZaA [4], are unstructured with no coupling between topology and data location. Each of these design styles centralized, decentralized structured, and decentralized unstructured have their advantages and disadvantages and it is not our intent to advocate for a particular choice aong the. However, the decentralized unstructured systes are the ost coonly used in today s Internet. They also raise an iportant perforance issue how to replicate data in such systes and that perforance issue is the subject of this paper. In these decentralized unstructured P2P systes, the hosts for a P2P overlay network; each host has a set of neighbors that are chosen when it joins the network. A host sends its query (e.g., searching for a particular file) to other hosts in the network; Gnutella uses a flooding algorith to propagate the query, but any other query-propagation approaches are possible. The fundaental point is that in these unstructured systes, because the P2P network topology is unrelated to the location of data, the set of nodes receiving a particular query is unrelated to the content of the query. A host doesn t have any inforation about which other hosts ay best be able to resolve the query. Thus, these blind search probes can not be on average ore effective than probing rando nodes. Indeed, siulations in [9] suggest that rando probing is a reasonable odel for search perforance in these decentralized unstructured P2P systes. To iprove syste perforance, one wants to iniize the nuber of hosts that have to be probed before the query is resolved. One way to do this is to replicate the data on several hosts. 1 That is, either when the data is originally 1 For the basic questions we address, it does not atter if the actual data is replicated or if only pointers to the data

2 stored or when it is the subject of a later search, the data can be proactively replicated on other hosts. Gnutella does not support proactive replication, but at least part of the current success of FastTrack-based P2P networks can be attributed to replication: FastTrack designates high-bandwidth nodes as search-hubs (super-nodes). Each supernode replicates the index of several other peers. As a result, each FastTrack search probe eulates several Gnutella probes and thus is uch ore effective. It is clear that blind search is ore effective when a larger index can be viewed per probe, but even though FastTrack increased per-probe capacity, it basically uses the sae replication strategy as Gnutella: The relative index capacity dedicated to each ite is proportional to the nuber of peers that have copies. Thus, current approaches to replication are largely iplicit. We consider the case where one designs an explicit replication strategy. The fundaental question we address in our paper is: given fixed constraints on per-probe capacity, what is the optial way to replicate data? 2 We define replication strategies that, given a query frequency distribution, specify for each ite the nuber of copies ade. The ain etric we consider is the perforance on successful queries, which we easure by the resulting expected (or average) search size A. We also consider perforance on insoluble queries, which is captured by the axiu search size allowed by the syste. Our analysis reveals several surprising fundaental results. We first consider two natural but very different replication strategies: Unifor and Proportional. The Unifor strategy, replicating everything equally, appears naive, whereas the Proportional strategy, where ore popular ites are ore replicated, is designed to perfor better. However, we show that the two replication strategies have the sae expected search size on successful queries. Furtherore, we show that the Unifor and Proportional strategies constitute two extree points of a large faily of strategies which lie between the two and that any other strategy in this faily has better expected search size. We then show that one of the strategies in this faily, Square-root replication, iniizes the expected search size on successful queries. Proportional and Unifor replications, however, are not equivalent. Proportional akes popular ites easier to find and less popular ites harder to find. In particular, Proportional replication requires a uch higher liit on the axiu search size while Unifor iniizes this liit and, thus, iniizes resources consued on processing insoluble queries. We show that the axiu search size with Square-root strategy is in-between the two (though closer to Unifor), and define a range of optial strategies that balance the processing of soluble and insoluble queries. Interestingly, these optial strategies lie between Unifor and Square-root and are far fro Proportional. Last, we address how to ipleent these replication strategies in the distributed setting of an unstructured P2P network. It is easy to ipleent the Unifor and Proporare replicated. The actual use depends on the architecture and is orthogonal to the scope of this paper. Thus, in the sequel, copies refers to actual copies or pointers. 2 Our analysis assues adaptive terination echanis, where the search is stopped once the query is resolved. See [7] for a discussion of a different question related to replication in P2P networks without adaptive terination. tional replication strategies; for Unifor the syste creates a fixed nuber of copies when the ite first enters the syste, for Proportional the syste creates a fixed nuber of copies every tie the ite is queried. What isn t clear is how to ipleent the Square-root replication policy in a distributed fashion. We present several siple distributed algoriths which produce optial replication. Surprisingly, one of these algoriths, path replication, is ipleented, in a soewhat different for, in the Freenet P2P syste. These results are particularly intriguing given that Proportional replication is close to what is used by current P2P networks: With Gnutella there is no proactive replication; if we assue that all copies are the result of previous queries, then the nuber of copies is proportional to the query rate for that ite. Even FastTrack, which uses proactive replication, replicates the coplete index of each node at a search hub so the relative representation of each ite reains the sae. Our results suggest that Proportional, although intuitively appealing, is far fro optial and a siple distributed algorith which is consistent in spirit with Fast- Track replication is able to obtain optial replication. In the next section we introduce our odel and etrics and present a precise stateent of the proble. In Section 3 we exaine the Unifor and Proportional replication strategies. In Section 4 we show that Square-root replication iniizes the expected search size on soluble queries and provide soe coparisons to Proportional and Unifor replication. Section 5 defines the optial policy paraeterized by varying cost of insoluble queries. In Section 6 we present distributed algoriths that yields Square-root replication and present siulation results of its perforance. 2. MODEL AND PROBLEM STATEMENT The network consists of n nodes, each with capacity ρ which is the nuber of copies/keys that the node can hold. 3 Let R = nρ denote the total capacity of the syste. There are distinct data ites in the syste. The noralized vector of query rates takes the for q = q 1 q 2 q with È q i = 1. The query rate q i is the fraction of all queries that are issued for the ith ite. An allocation is a apping of ites to the nuber of copies of that ite (where we assue there is no ore than one copy per node). We let r i denote the nuber of copies of the i th ite (r i counts all copies, including the original È one), and let p i r i/r be the fraction of the total syste capacity allotted to ite i: ri = R. The allocation is represented by the vector p = (r 1/R, r 2/R,..., r /R). A replication or allocation strategy is a apping fro the query rate distribution q to the allocation p. We assue R ρ because outside of this region the proble is either trivial (if ρ the optial allocation is to have copies of all ites on all nodes) or insoluble (if > R there is no allocation with all ites having at least one copy). Our analysis is geared for large values of n (and R = ρn) with our results generally stated in ters of p and ρ with n factored out. Thus, we do not concern ourselves with integrality restrictions on the r i s. However, we do care about the bounds on the quantities p i. Since r i 1, we 3 Later on in this section we explain how to extend the definitions, results, and etrics to the case where nodes have heterogeneous capacities.

3 have p i l where l = 1. Since there is no reason to have R ore than one copy of an ite on a single node, we have r i n and so p i u where u = n = R ρ 1. Later in this section we will discuss reasons why these bounds ay be ade ore strict. We argued in the introduction that perforance of blind search is captured well by rando probes. This abstraction allows us to evaluate perforance without having to consider the specifics of the overlay structure. Siulations in [9] show that rando probes are a reasonable odel for several conceivable designs including Gnutella-like overlays. Specifically, the search echanis we consider is rando search: The search repeatedly draws a node uniforly at rando and asks for a copy of the ite; the search is stopped when the ite is found. The search size is the nuber of nodes drawn until an answer is found. With the rando search echanis, search sizes are rando variables drawn fro a Geoetric distribution with expectation ρ/p i. Thus, perforance is deterined by how any nodes have copies of any particular ite. For a query distribution q and an allocation p, we define the expected search size (ESS) A q(p) to be the expected nuber of nodes one needs to visit until an answer to the query is found, averaged over all ites. It is not hard to see that A q(p) = 1/ρ( i q i/p i). (1) The set of legal allocations P is a polyhedron defined by the ( 1)-diensional siplex of distributions on ites, intersected with an -diensional hypercube, which defines upper and lower bounds on the nuber of copies of each ite: p i = 1 (2) l p i u. (3) The legal allocation that iniizes the expected search size is the solution to the optiization proble Miniize q i/p i such that p P. One obvious property of the optial solution is onotonicity u p 1 p 2 p l (4) (If p is not onotone then consider two ites i, j with q i > q j and p i < p j. The allocation with p i and p j swapped is legal, if p was legal, and has a lower ESS.) In the next section we explore the perforance of two coon replication strategies, Unifor and Proportional. However, we first discuss refineents of our basic odel. 2.1 Bounded search size and insoluble queries So far we have assued that searches continue until the ite is found, but we now discuss applying these results in a ore realistic setting when searches are truncated when soe axial search size L is reached. We say an ite is locatable if a search for it is successful with high probability. Evidently, an ite is locatable if the fraction of nodes with a copy ρp i is sufficiently large with respect to L. Clearly, truncated searches have soe likelihood of failing altogether. The probability of failure is at ost 2 C if p i C/(ρL). Thus, if C = O(log L) then the failure probability would be polynoially sall in L. In order to copare two allocations under truncated search, we set the relation between L and l such that the sae base set of ites is locatable. We now argue that when all ites are locatable, Expression (1) constitutes a close approxiation of the expected truncated search size. The distribution of search sizes under truncated rando searches is a Geoetric distribution with paraeter p i, where values greater by L are truncated by L. Thus, Expression (1), which odels untruncated search, constitutes an upper bound on the truncated expected search size. The error, however, is at ost in i Èj 1 (1 ρpi)l+j = in i(1 ρp i) L /(ρp i) exp( C)/(ρl) = (L/C) exp( C). In the sequel, we assue that l is such that l > ln L/(ρL) and use the untruncated approxiation (1) for the expected search size (it is within an additive ter of exp( lρl/ln L) < 1). Also note that the likelihood for an unsuccessful search for a locatable ite is at ost exp( lρl) < 1/L. Thus, even though each locatable ite can have soe failed searches, these searches constitute a very sall fraction of total searches for the ite. The axiu search size paraeter L is also iportant for analyzing the cost of insoluble queries; that is, queries ade to ites that are not locatable. In actual systes, soe fraction of queries are insoluble, and search perfored on such queries would continue until the axiu search size is exceeded. The cost of these queries is not captured by the ESS etric, but is proportional to L and to the fraction of queries that are insoluble. When coparing replication strategies on the sae set of locatable ites, we need to consider both the ESS, which captures perforance on soluble queries and L, which captures perforance on insoluble queries. In this situation we assue that the respective L(p) is the solution of in i p i = ln L/(ρL). Thus, if f s is the fraction of queries which are soluble and (1 f s) is the fraction of insoluble queries, the perforance of the allocation p is f sa q(p) + (1 f s)l(p). (5) 2.2 Heterogeneous capacities and bandwidth So far we consider a hoogeneous setting, where all copies have the sae size and all nodes have the sae storage and the sae likelihood of getting probed. In reality, hosts have different capacities and bandwidth and, in fact, the ost recent wave of unstructured P2P networks [5, 4] exploits this asyetry. For siplicity of presentation we will keep using the hoogeneous setting in the sequel, but we note that all out results can be generalized in a fairly straightforward way to heterogeneous systes. Suppose nodes have capacities ρ i and visitation weight v i (in the hoogeneous case v i = 1, in general v i is the factor in which the visitation rate differs fro the average). It is not hard È to see that the average capacity seen per probe is ρ = i viρi. The quantity ρ siply replaces ρ in Equation 1 and in particular, the ESS of all allocations are affected by the sae factor; thus all our findings on the relative perforance of different allocations also apply for heterogeneous capacities and visitation rates. An issue that arises when the replication is of copies (rather than pointers) is that ites often have very different sizes. Our subsequent analysis can be extended to this case by treating the ith ite as a group of c i ites with the sae

4 query rate, where c i is the size of ite i. As a result we obtain altered definitions of the three basic allocations we consider: È Proportional has p i = c iq i/ j cjqj (proportional to query rate and size). È Unifor has p i = c i/ j cj (proportional to ite s size). Square-root has p i = c i qi/ È j cj qj (proportional to size and to the square-root of the query rate). 3. ALLOCATION STRATEGIES 3.1 Unifor and Proportional We now address the perforance of two replication strategies. The Unifor replication strategy is where all ites are equally replicated: Definition 3.1. Unifor allocation is defined when l 1/ u and has p i = 1/ for all i = 1,...,. This is a very priitive replication strategy, where all ites are treated identically even though soe ites are ore popular than others. One wouldn t, initially, think that such a strategy would produce good results. When there are restriction on the search size (and thus on l), Unifor allocation has the property that it is defined for all q for which soe legal allocation exists: Any allocation p other than Unifor ust have i where p i > 1/ and j where p j < 1/; If Unifor results in allocations outside the interval [l, u], then either 1/ > u (and thus p i > u) or 1/ < l (and thus p j < l). An iportant appeal of the Unifor allocation is that it iniizes the required axiu search size, and thus, iniizes syste resources spent on insoluble queries. It follows fro Equation (5) that when a large fraction of queries are insoluble, Unifor is (close to) optial. Another natural replication strategy is to have r i be proportional to the query rate. Definition 3.2. Proportional allocation is defined when l q i u. The Proportional solution has p i = q i for all i. Soe of the intuitive appeal of Proportional allocation is that it iniizes the axiu utilization rate [9]. This etric is relevant when the replication is of copies rather than of pointers; that is, when a successful probe is uch ore expensive to process than an unsuccessful probe. The utilization rate of a copy is the average rate of requests it serves. Under rando search, all copies of the sae ite i have the sae utilization rate q i/p i; when there are ore copies, each individual copy has a lower utilization. To avoid hot-spots, it is desirable to have low values for the axiu utilization rate È of a copy ax i q i/p i. The average utilization over all copies, piqi/pi = 1, is independent of the allocation p. The proportional allocation has q i/p i = 1 for all i, and so it clearly iniizes the axial value ax i q i/p i. When copared to Unifor, Proportional iproves the ost coon searches at the expense of the rare ones, which presuably would iprove overall perforance. We now analyze soe of the properties of these two strategies. A straightforward calculation reveals the following surprising result: Lea 3.1. Proportional and Unifor allocations have the sae expected search size A = /ρ, which is independent of the query distribution. We note that Lea 3.1 easily generalizes to all allocations that are a ix of Proportional and Unifor, in which each ite i has p i {1/, q i}. We next characterize the space of allocations. 3.2 Characterizing allocations As a warup, we consider the space of allocations for two ites ( = 2). We shall see that Proportional and Unifor constitute two points in this space with anything between the achieving better perforance, and anything outside having worse perforance. Consider a pair of ites with q i q j. The range of allocations is defined by a single paraeter 0 < x < 1, with p i/(p i +p j) = x and p j/(p i +p j) = (1 x). Proportional corresponds to x = q i/(q i + q j), and Unifor to x = 0.5. The range 0.5 x q i/(q i + q j) captures allocations between Unifor and Proportional. The outside range 0 < x < 0.5 contains non-onotone allocations where the less-popular ite obtains a larger allocation. The outside range 1 > x > q i/(q i + q j) contains allocations where the relative allocation of the ore popular ite is larger than its relative query rate. These different allocations are visualized in Figure 1(A), which plots p i/p j as a function of q i/q j. The ESS for these two ites is proportional to q i/x + q j/(1 x). This function has equal value on x = 0.5 and x = q i/(q i +q j) and is convex. The iniu is obtained at soe iddle point in the between range. By taking the first derivative, equating it to zero, and solving the resulting quadratic equation, we obtain that the iniu is obtained at x = q i/( q i + q j). Figure 1(B) shows the expected search size when using these allocations on two ites ( = 2 and ρ = 1). In this case, the axiu gain factor by using the optial allocation over Unifor or Proportional is 2. In the sequel we extend the observations ade here to arbitrary nuber of ites ( 2). In particular, we develop a notion of an allocation being between Unifor and Proportional and show that these allocations have better ESS, and that outside allocations have worse ESS. We also define the policy that iniizes the ESS, and we bound the axiu gain as a function on, u, and l. 3.3 Between Unifor and Proportional For = 2, we noticed that all allocations that lie between Unifor and Proportional have saller ESS. For general we first define a precise notion of being between Unifor and Proportional: Definition 3.3. An allocation p lies between Unifor and Proportional if for any pair of ites i < j we have q i/q j p i/p j 1; that is, the ratio of allocations p i/p j is between 1 ( Unifor ) and the ratio of their query rates q i/q j ( Proportional ). Note that this faily includes Unifor and Proportional and that all allocations in this faily are onotone. We now establish that all allocations in this faily other than Unifor and Proportional have a strictly better expected search size: Theore 3.1. Consider an allocation p between Unifor and Proportional. Then p has an expected search size of at

5 p2/p Relative allocations for two ites worse ESS than Proportional/Unifor Unifor Square-Root better ESS than Proportional/Unifor worse ESS than Proportional Proportional/Unifor ESS Expected Search Size for =2; rho=1; q_2=1-q_ q2/q1 (A) 1.2 Unifor/Proportional 1 Square-Root q1 (B) Figure 1: (A) The space of allocations on two ites (B) The average query cost for two ites ost /ρ. Moreover, if p is different fro Unifor or Proportional then its expected search size is strictly less than /ρ. Proof. The liits q i/q j p i/p j 1 hold if and only if they hold for any consecutive pair of ites (that is, for j = i + 1). Thus, the set of between allocations is characterized by the 1-diensional polyhedron obtained by intersecting È the 1-diensional siplex (the constraints p i = 1 p i > 0 defining the space of all allocations) with the additional constraints p i p i+1 and p i+1 p iq i+1/q i. Observe È that the expected search size function F(p 1,..., p ) = qi/pi is convex. Thus, its axiu value(s) ust be obtained on vertices of this polyhedron. The vertices are allocations where for any 1 i <, either p i = p j or p i = p jq i/q j. We refer to these allocations as vertex allocations. It reains to show that the axiu of the expected search size function over all vertex allocations is obtained on the Unifor or Proportional allocations. We show that if we are at a vertex other than Proportional or Unifor, we can get to Unifor or Proportional by a series of oves, where each ove is to a vertex with a larger ESS than the one we are currently at. Consider a vertex allocation p which is different than Proportional and Unifor. Let 1 < k < be the iniu such that the ratios p 2/p 1,..., p k+1 /p k are not all 1 or not all equal to the respective ratios q i+1/q i. Note that by definition we ust have that q i+1 < q i for at least one i = 1,..., k 1; and q k+1 < q k. (Otherwise, if q 1 = = q k then any vertex allocation is such that p 1,..., p k+1 are consistent with Unifor or with Proportional; if q k = q k+1 then iniality of k is contradicted). There are now two possibilities for the structure of the (k + 1)-prefix of p: 1. We have p i+1/p i = 1 for i = 1,..., k 1 and p k+1 /p k = q k /q k We have p i+1/p i = q i+1/q i for i = 1,..., k 1 and p k+1 /p k = 1. Clai: at least one of the two candidate vertices characterized by p : p i+1/p i = 1 for i = 1,..., k and p i+1/p i = p i+1/p i for (i > k), or p : p i+1/p i = q i+1/q i for i = 1,..., k and p p i+1/p i for (i > k). i+1/p i = has strictly worse ESS than p. Note that this clai concludes the proof: We can iteratively apply this ove, each tie selecting a worse candidate. After each ove, the prefix of the allocation which is consistent with either Unifor or Proportional is longer. Thus, after at ost 1 such oves, the algorith reaches the Unifor or the Proportional allocation. The proof of the clai is fairly technical and is deferred to the Appendix. We next consider the two failies of outside allocations and show that they perfor worse than Unifor and Proportional. Lea 3.2. Allocations such that one of the following holds j, p j < p j+1 (less popular ite gets larger allocation) or j, p j/p j+1 > q j/q j+1 (between any two ites, the ore popular ite get ore than its share according to query rates) perfor strictly worse than Unifor and Proportional. Proof. The arguents here are sipler, but follow the lines of the proof of Theore 3.1. We start with the first faily of outside allocations. Consider the intersection of the allocation siplex with the halfspace constraints p i p i+1. These constraints constitute a cone with one vertex which is the Unifor allocation. The resulting polyhedron obtained fro the intersection with the siplex has additional vertices, but note that all these vertices lie on the boundary of the siplex (and thus, the ESS function is infinite on the). We now need to show that the iniu of the ESS function over this polyhedron is obtained on the Unifor allocation. Recall that the function is convex,

6 and its global iniu is obtained outside this polyhedron. Thus, the iniu over the polyhedron ust be obtained on a vertex, but the only vertex with bounded ESS value is the Unifor allocation. Thus, the iniu ust be obtained on the Unifor allocation. Siilar arguents apply for the second faily. We consider the halfspace constraints p i+1 p iq i+1/q i. The resulting polyhedron includes the Proportional allocation as the only vertex with finite ESS value. The following observation will becoe useful when bounds are iposed on allocation sizes of ites. Lea 3.3. All allocations in the between faily result in a narrower range of possible ite allocation values than Proportional, that is, we have p 1 q 1 and p q. Proof. Consider such an È allocation p. We have that p i (q i/q 1)p 1. Thus 1 pi (È qi)p1/q1 = p 1/q 1. Hence, p 1 q 1. Siilarly p i (q i/q )p, and we obtain that p q. 4. THE SQUARE-ROOT ALLOCATION We consider an allocation where for any two ites, the ratio of allocations is the square root of the ratio of query rates. Note that this allocation lies between Unifor and Proportional in the sense of Definition 3.3. We show that this allocation iniizes the ESS. Definition 4.1. Square-root allocation is defined when l q i/ È i qi u and has p i = q i/ È i qi for all i. Lea 4.1. Square-root allocation, when defined, iniizes the expected search size. È Proof. The goal is to iniize qi/pi. This natural siple optiization proble had arisen in different contexts, e.g, the capacity assignent proble [8] and scheduling data broadcast [14]. We include a proof for the sake of copleteness. È 1 Substituting p = 1 pi we have F(p 1,..., p 1) = 1 1 q i/p i + q /(1 p i). We are looking for the iniu of F when È 1 pi < 1 and p i > 0. The value of F approaches when we get closer to the boundary of the siplex. Thus, the iniu ust be obtained at an interior point. By solving df/dp i = 0 we obtain that 1 p i = (1 Ô Ô p j) q i/q = p qi/q. j=1 4.1 How uch can we gain? Recall that for all q, the expected search size under both Unifor and Proportional allocations is /ρ. The expected search size under Square-Root allocation is ( q 1/2 i ) 2 /ρ, and depends on the query distribution. An interesting question is the potential gain of applying Square-root rather than Unifor or Proportional allocations. We refer to the ratio of the ESS under Unifor/Proportional to the ESS under Square-root as the gain factor. We first bound the gain factor by, u, and l (proof is in the Appendix): Lea 4.2. Let A SR be the expected search size using Square-root allocation. Let A unifor be the expected search size using Proportional or Unifor allocation. Then A unifor /A SR (u + l lu). Moreover, this is tight for soe distributions. Note that if l = 1/ or u = 1/ then the only legal allocation is 1/ on all ites, and indeed the gain factor is 1. If l 1/, the gain factor is roughly u. ESS Gain Factor w=0.5 w=1 w=1.5 w= e+06 Nuber of ites () Figure 2: The ratio of the ESS under Unifor/Proportional to the ESS under square root allocation, for truncated Zipf-like query distributions on ites. The query rate of the ith ost popular ite is proportional to i w. In [9] we considered truncated Zipf-like query distributions, where there are ites and the query rate of the ith ost popular ite is proportional to i w. A siple calculation (see Table 1) shows that the gain factor draatically grows with the skew. In particular, the gain factor is a conw ρa SR Gain factor w < 1 w = 1 1 < w < 2 1 w (1 w/2) 2 4/ln w 1 2 w (1 w/2) 2 (1 w/2) 2 1 w 1 4 ln (1 w/2)2 w 1 w = 2 ln 2 /ln 2 w > 2 w 1 (w/2 1) 2 (w/2 1) 2 w 1 w 1 Table 1: Asyptotic dependence on of the gain factor and ESS under Square-root allocation for truncated -ites Zipf distribution with power w. stant fraction for w < 1, logarithic in for w = 1, and (fractional) polynoial in for w > 1; for w > 2 the ESS

7 under Square-root is constant and the gain factor is thus linear in. Figure 2 plots the gain factor as a function of the nuber of ites for representative values of the power paraeter w. We next consider two natural query distribution obtained fro Web proxy logs (the Boeing logs described in [1]). We looked at the top Urls, with query rates proportional to the nuber of requests, and top hostnaes (Web sites), with query rates proportional to the nuber of users that issued a request to the site. Figure 3 shows the gain-factor and the ESS under Square-root allocation and Proportional allocation as a function of the nuber of ites for these two distributions. For larger values of, the ESS of Square-root allocation is 30%-50% of that of Proportional or Unifor allocations. Although these gaps are not as draatic as for highly-skewed Zipf distributions, they are substantial. The Unifor, Proportional, and Square-root allocations result in different values of p (the allocation assigned to the locatable ite with sallest query rate); in turn, this corresponds to different required iniu values of the axiu search size, which deterines the cost of insoluble queries. Figure 4 shows that for the hostnae distribution, the axiu search size under Square-root is within a factor of 2 of the sallest possible (Unifor allocation) whereas Proportional requires a uch larger axiu search size. In the next section we develop the optial policy, which iniizes the cobined resource consuption on soluble and insoluble queries. 5. SQUARE-ROOT AND PROPORTIONAL ALLOCATIONS Suppose now that we fix the set of locatable ites and the bound on the axiu search size (that is, we fix the lower bound l on the allocation of any one ite). Phrased differently, this is like fixing the resources consued on insoluble queries. With l (or u) fixed, Square-root allocation ay not be defined - since the sallest (largest) allocation ay be below (above) the bound. We now ask what is the replication strategy which iniizes the ESS under these constraints? We define a natural extension of Square-root, Squareroot, which always results in a legal allocation (when one exists, that is, when l 1/ u). Square-root allocation lies between Square-root and Unifor and we show that Square-root iniizes the ESS. Since Square-root iniizes the ESS while fixing the axiu search size, by sweeping l we obtain a range of optial strategies for any given ratio of soluble and insoluble queries. The extrees of these range are Unifor allocation, which is optial if insoluble queries copletely doinate and Square-root allocation which is optial if there are relatively few insoluble queries. 5.1 Square-root allocation Lea 5.1. Consider a query distribution q where q 1 q and l 1/ u. There is a unique onotone allocation p P for which the following conditions apply. 1. if u > p i, p j > l then p i/p j = Ô (q i/q j). 2. Ô if p j = l p i or if p j u = p i, then p i/p j (qi/q j). Furtherore, this allocation iniizes the ESS. The proof is deferred to the Appendix, and provides a siple iterative procedure to copute the Square-root allocation for arbitrary q i s. The Square-root allocation as a function of l varies between l q / È i qi (where Square-root coincides with the Square-root allocation) and l = 1/ (where Squareroot coincides with the Unifor allocation). On interediate values of l, a suffix of the ites is assigned the iniu allocation value l, but the reaining ites still have allocations proportional to the square-root of their query rate. In the range [ q / È i qi,1/], the ESS as a function of l is a piecewise hyperbolic increasing function that is iniized at l = q / È i qi. The breakpoints of this function correspond to a suffix of the ites that have iniu allocation. Basic algebraic È anipulations show that the breakpoints are l n = s n/(1 i=n si + sn( n + 1)), where s i = È q i/ j qj is the allocation of the ith ite under Square-root allocation. Note that the extrees of this range are indeed l = s (Square-root allocation) and l 1 = 1/ (Unifor allocation). The ESS for l [l n, l n 1) is given by i=n n 1 q i/l + ( q i/s i) È n 1 si 1 l( n + 1) and is increasing with l. The axiu search size needed to support the allocation is approxiately 1/l and is decreasing with l. The overall search size (as defined in Equation 5) is a convex cobination of the ESS and the MSS and is iniized inside the interval [s,1/]. Figure 5 illustrates the cobined expected search size for different ixes of soluble and insoluble queries as a function of l. When all queries are soluble (f s = 1), the iniu is obtained at l = s, where Square-root coincides with Square-root (which iniizes the ESS). At the other extree, when all queries are insoluble (f s = 0), the iniu is obtained at l = 1/, where Square-root coincides with Unifor (which iniizes the MSS). 5.2 Proportional allocation Siilarly, with l (or u) fixed, the Proportional allocation ay not be defined. We siilarly define Proportional allocation to be the legal allocation that iniizes the axiu utilization rate. Proportional allocation is defined whenever there exists a legal allocation, that is, when l 1/ u. We define Proportional allocation as the unique allocation defined by the following conditions: if u > p i, p j > l then p i/p j = q i/q j. if p j = l p i or if p j u = p i, then p i/p j q i/q j. When we have l q i u for all i then Proportional is the sae as Proportional. It follows fro Theore 3.1 that Proportional allocation has ESS no higher than Unifor allocation and when Proportional is different than Proportional (that is, there are q i < l or q i > u) then Proportional has a lower ESS than Unifor. Proportional allocation lies between Proportional and Unifor (in the sense of Theore 3.3).

8 Unifor,Proportional Square-root (hostnae) Square-root (url) hostnae(nuber of users) url(nuber of requests) ESS Gain Factor Nuber of Locatable Ites (A) e+06 Nuber of Locatable Ites (B) Figure 3: (A) The ESS of the different strategies versus the nuber of locatable ites (the x axis), for the hostnae and Url distributions. (B) The gain factor as a function of for the two distributions. Maxiu Search Size Max Search Size vs Nuber of Ites (hostnae dist.) Proportional Square-root Unifor Nuber of Locatable Ites Maxiu Search Size relative to Unifor Ratio of Max Search Size to Opt vs Nuber of Ites (hostnae dist.) Proportional/Unifor Square-root/Unifor Nuber of Locatable Ites Figure 4: Maxiu search size as a function of nuber of locatable ites, and ratio of axiu search size to the iniu possible (Unifor allocation). 6. DISTRIBUTED REPLICATION Our previous results identified the optial replication strategy. In this section we show that it can be achieved, via siple distributed protocols, in a decentralized unstructured P2P network. At least conceptually, The Unifor allocation can be obtained via a siple schee which replicates each ite in a fixed nuber of locations when it first enters the syste. We had seen that the optial allocation is a hybrid of Squareroot and Unifor allocations. We first develop algoriths that obtain Square-root allocation, and then discuss how they can be slightly odified to optially balance the cost of soluble and insoluble queries. In order to consider replication algoriths 4 aied at queryrate-dependent allocations (like Proportional and Squareroot), we odel a dynaic setting where copies are created 4 A note on terinology: a replication strategy is a apping between q and p whereas a replication algorith is a distributed algorith that realizes the desired replication strategy. and deleted: Creation: New copies can be created after each query. After a successful rando search, the requesting node creates soe nuber of copies, call it C, at randoly-selected nodes. This nuber C can only depend on quantities locally observable to the requesting node. Deletion: Copies do not reain in the syste forever; they can be deleted through any different echaniss. All we assue here of copies is that their lifeties are independent of the identity of the ite and the survival probability of a copy is non-increasing with its age: if two copies were generated at ties t 1 and t 2 > t 1 then the second copy is ore likely to still exist at tie t 3 > t 2. This creation and deletion processes are consistent with our search and replication odel and with the operation of unstructured networks. In our odel, and perhaps in reality, the network would perfor copy creation by visiting

9 Search size over all queries (all insoluble) MSS (fs=0%) 0.25*ESS+0.75*MSS (fs= 25%) 0.5*ESS+0.5*MSS (fs= 50%) 0.75*ESS+0.25*MSS (fs= 75%) (all soluble) ESS (fs=100%) Miniu allocation (l) Figure 5: The search size over all queries (soluble and insoluble), as a function of l, when Squareroot allocation is used (Hostnae distribution and =1000). The search size is shown for different ixes of soluble and insoluble queries, The point that iniizes the search size depends on the ix. nodes in a siilar fashion to how it perfors search. 5 Our copy deletion process is consistent with the expectation that in deployed networks, copy deletion occurs either by a node going offline or by soe replaceent procedure internal to a node. We expect node crashes to be unrelated to the content (part of the index) they possess. The replaceent procedure within each node should not discriinate copies based on access patterns. Two coon policies, Least Recently Used (LRU) or Least Frequently Used (LFU) are inconsistent with our assuption, but other natural policies such as First In First Out (FIFO), fixed lifetie durations, or rando deletions (when a new copy is created reove a cached copy selected at rando), are consistent with it. Let C i be the average value of C that is used for creating copies of ite i ( C i ay change over tie). We say that the syste is in steady state when the lifetie distribution of copies does not change over tie. We use the following property of this deletion process in our distributed replication algorith: Clai 6.1. If the ratio C i / C j reains fixed over tie and C i, C j are bounded fro below by soe constant, then p i/p j q i C i /(q j C j ) Thus, Proportional allocation is obtained if we use the sae value of C for all ites. A ore challenging task is designing algoriths that result in Square-root allocation. The challenge in achieving this is that no individual node issues or sees enough queries to estiate the query rate q i, and we would like to deterine C without using additional inter-host counication on top of what is required for the search and replication. We can use the above clai to obtain the following condition for Square-root allocation 5 This process autoatically adjust to designs where search is perfored on a fraction of nodes, or when nodes are not evenly utilized, such as with FastTrack, since hosts receive copies in the sae rate that they receive search probes. Corollary 6.1. If C i 1/ q i then p i/p j Ô q i/q j (the proportion factor ay vary with tie but should be the sae for all ites). We propose three algoriths that achieve the above property while obeying our general guidelines. The algoriths differ in the aount and nature of bookkeeping, and so which algorith is ore desirable in practice will depend on the details of the deployent setting. The first algorith, path replication, uses no additional bookkeeping; the second, replication with sibling-nuber eory, records with each copy the value C (nuber of sibling copies) used in its creation; the third probe eory has every node record a suary on each ite it sees a probe for. We show that under soe reasonable conditions, sibling-nuber and probeeory have C i close to its desired value and path replication has C i converge over tie to its desired value. 6.1 Path replication At any given tie, the expected search size for ite i, A i, is inversely proportional to the allocation p i. If C i is steady then p i q i C i and thus A i 1/(q i C i ). The Path replication algorith sets the nuber of new copies C to be the size of the search (the nuber of nodes probed). At the fixed point, when A i and C i are steady and equal, they are proportional to 1/ q i; and p i is proportional to q i. We show that in steady state, under reasonable conditions, A i and C i converge to this fixed point. Let A i be the value of A i at the fixed point. At a given point in tie, the rate of generating new copies is A i/a i ties the optial. This eans that the rate is higher (respectively, lower) than at the fixed point when the nuber of copies is lower (respectively, higher) than at the fixed point. If the tie between queries is at least of the order of the tie between a search and subsequent copy generation then path replication will converge to its fixed points. A possible disadvantage of path replication is that the current C i overshoots or undershoots the fixed point by a large factor (A i/a i). Thus, if queries arrive in large bursts or if the tie between search and subsequent copy generation is large copared to the query rate then the nuber of copies can fluctuate fro too-few to too-any and never reach the fixed point. Note that this convergence issue ay occur even for a large nuber of nodes. We next consider different algoriths that circuvents this issue by using C i that is close to this fixed-point value. 6.2 Replication with sibling-nuber eory Observe that the search size alone is not sufficient for estiating the query rate q i at any point in tie (path replication only reached the appropriate estiates at the fixed point). In order to have C i 1/ q i we ust use soe additional bookkeeping. Under replication with sibling-nuber eory (SNM), with each new copy we record the nuber of sibling copies that were generated when it was generated, and its generation tie. The algorith assues that each node known the historic lifetie distribution as a function of the age of the copy (thus, according to our assuptions, the age of the copy provides the sibling survival rate). Consider soe past query for the ite, a j, let d j > 0 be the nuber of sibling copies generated after a j and let λ j be the expected fraction of surviving copies at the current tie.

10 Suppose that every copy stores (d, λ) as above. Let T be such that copies generated T tie units ago have a positive probability of survival. Let P T be the set of copies with age at ost T. Then, È c P T 1/(λ cd c) is an unbiased estiator for the nuber of requests for the ite in the past T tie units. Denote by 1/(λ cd c) the expectation of 1/(dλ) over copies. We thus obtain that q i 1/(λ cd c) p i 1/(λ cd c) (1/A i) Hence, it suffices to choose C i with expected value C i (1/ 1/(λ cd c) ) 0.5 A 0.5 i. 6.3 Replication with probe eory We now consider a schee where each node records a suary of all recent 6 probes it had received: For each ite it had seen at least one probe for, the node records the total nuber of probes it received and the cobined size of the searches which these probes where part of (in fact, it is sufficient to use the search size up to the point that the recording node is probed). Interpret q i as the rate per node in which queries for ite i are generated. The probe rate of ite i, which is the rate in which a host in the network receives search probes for ite i, is then r i = q ia i, where A i = 1/(ρp i) is the size of the expected search size for i. Thus, intuitively, the query rate q i can be obtained (estiated) fro (estiates on) A i and r i. In order to estiate r i, which could be very low, with reasonable confidence, we can aggregate it across ultiple nodes. A natural choice for the size for this group, that allows the replication strategy to be integrated with the search process, is to aggregate along nodes on the search path. Consider now the aggregated rate in which nodes on a search path for i received search probes for i. The aggregated rate for a group of nodes of size A i is q ia 2 i = q i/(ρp i) 2. Observe that for Square-root allocation, this aggregated rate is fixed for all ites and is equal to ( È i qi) 2 /ρ 2 ties the rate in which nodes generate queries. This suggests a siple replication algorith: when searching for ite i, count the total nuber of recent probes for i seen on nodes in the search path. The count being lower than the threshold suggests that the ite is over-replicated (with respect to Square-root allocation). Otherwise, it is under-replicated. More generally, by carefully cobining inforation fro nodes on the search path, we can obtain high confidence estiates on A i and r i, that can lead to better estiates on q i. Let v be a node and let k v be the nuber of recent probes it had received for ite i. Let s j (j = 1,..., k v) be the search size of the jth probe seen by v for ite i. Let S v = È k v j=1 sj,7 Suppose that each node stores (k v, S v) for each ite. Let V be a set of nodes (say, the nodes encountered on a search for ite i.). We can estiate the probe rate by 6 We use the inforal ter recent for the a tie duration in which the allocation p i did not significantly change. Given this constraint, we would typically want the duration to be as long as possible. 7 Note that the expected value of s j is A i/2, since it is a Geoetrically distributed rando variable, we expect S v/k v to rapidly converge to A i/2 as k v grows. È ˆr i = v V kv/ V, and the expected search size by  i = 2 v V S v/ v V k v. È È Since q i = r i/a i, we can use ( v V kv)2 /(2 V v V Sv) to estiate q i. We thus can use 2 V S v/ k v v V v V as a (biased) estiator for 1/ q i. È The accuracy of these estiates grows with v V kv. We earlier argued that È when the allocation is Square-root, the expectation of v V kv (when V are nodes on the search path for i) is fixed across ites. This iplies that it is desirable to aggregate statistics on probes for an ite over nuber of nodes that is proportional to the search size of the ite. The confidence level in the estiates can be increased by increasing the proportion factor. If an ite is over-allocated with respect to Square root, then the probe rate over the search path is low, in which case the estiates are not as accurate, but this only iplies that the ite should be replicated at a higher rate. If an ite is under-allocated, then the statistics collects ore probes and the estiates obtained have even larger confidence. 6.4 Obtaining the optial allocation The algoriths presented above converge to Square-root allocation, but now we discuss how they can be odified so that we obtain the optial allocation. For the purposes of the discussion, assue that we fix the set of locatable ites, that is, ites for which we want queries to be soluble. The optial algorith is a hybrid of a Unifor and a Square-root algoriths: For each locatable ite, the Unifor coponent of the replication algorith assigns a nuber of peranent copies (e.g., by noinating nodes that never delete it and are replaced when they go offline). The Square-root coponent, which can be based on any of the specific algoriths proposed above, generates transient copies as a followup on searches. The axiu search size is set together with the nuber of peranent copies so that ites with this iniu nuber of copies would be locatable. The value of these two dependent paraeters is then tuned according to the ix of soluble and insoluble queries to obtain the optial balance between the cost of soluble and insoluble queries. 6.5 Siulations We argued that all the distributed replication algoriths discussed here do indeed achieve the Square-root allocation, though with different rates of convergence and degrees of stability. We illustrate these convergence issued by siulating two of the proposed algoriths: path replication, and replication with sibling-nuber eory. The siulations track the fraction of nodes containing copies of a single ite in a network with 10K nodes. In our siulations, copies had a fixed lifetie duration and queries are issued in fixed intervals. Each search process continues until k copies are found (k {1, 5}). The nuber of new copies generated following each search is the search size for path replication, and the estiators discussed above for sibling-nuber eory. Soe of the siulations included delay between the tie a search is perfored and

Analyzing Simulation Results

Analyzing Simulation Results Analyzing Siulation Results Dr. John Mellor-Cruey Departent of Coputer Science Rice University johnc@cs.rice.edu COMP 528 Lecture 20 31 March 2005 Topics for Today Model verification Model validation Transient

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

ANALYSIS OF HALL-EFFECT THRUSTERS AND ION ENGINES FOR EARTH-TO-MOON TRANSFER

ANALYSIS OF HALL-EFFECT THRUSTERS AND ION ENGINES FOR EARTH-TO-MOON TRANSFER IEPC 003-0034 ANALYSIS OF HALL-EFFECT THRUSTERS AND ION ENGINES FOR EARTH-TO-MOON TRANSFER A. Bober, M. Guelan Asher Space Research Institute, Technion-Israel Institute of Technology, 3000 Haifa, Israel

More information

3.8 Three Types of Convergence

3.8 Three Types of Convergence 3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

arxiv: v1 [cs.ds] 17 Mar 2016

arxiv: v1 [cs.ds] 17 Mar 2016 Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass

More information

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations Randoized Accuracy-Aware Progra Transforations For Efficient Approxiate Coputations Zeyuan Allen Zhu Sasa Misailovic Jonathan A. Kelner Martin Rinard MIT CSAIL zeyuan@csail.it.edu isailo@it.edu kelner@it.edu

More information

Lecture 21. Interior Point Methods Setup and Algorithm

Lecture 21. Interior Point Methods Setup and Algorithm Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and

More information

arxiv: v1 [cs.ds] 3 Feb 2014

arxiv: v1 [cs.ds] 3 Feb 2014 arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/

More information

Birthday Paradox Calculations and Approximation

Birthday Paradox Calculations and Approximation Birthday Paradox Calculations and Approxiation Joshua E. Hill InfoGard Laboratories -March- v. Birthday Proble In the birthday proble, we have a group of n randoly selected people. If we assue that birthdays

More information

New Slack-Monotonic Schedulability Analysis of Real-Time Tasks on Multiprocessors

New Slack-Monotonic Schedulability Analysis of Real-Time Tasks on Multiprocessors New Slack-Monotonic Schedulability Analysis of Real-Tie Tasks on Multiprocessors Risat Mahud Pathan and Jan Jonsson Chalers University of Technology SE-41 96, Göteborg, Sweden {risat, janjo}@chalers.se

More information

Estimating Entropy and Entropy Norm on Data Streams

Estimating Entropy and Entropy Norm on Data Streams Estiating Entropy and Entropy Nor on Data Streas Ait Chakrabarti 1, Khanh Do Ba 1, and S. Muthukrishnan 2 1 Departent of Coputer Science, Dartouth College, Hanover, NH 03755, USA 2 Departent of Coputer

More information

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy Storage Capacity and Dynaics of Nononotonic Networks Bruno Crespi a and Ignazio Lazzizzera b a. IRST, I-38050 Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I-38050 Povo (Trento) Italy INFN Gruppo

More information

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna

More information

Controlled Flooding Search with Delay Constraints

Controlled Flooding Search with Delay Constraints Controlled Flooding Search with Delay Constraints Nicholas B. Chang and Mingyan Liu Departent of Electrical Engineering and Coputer Science University of Michigan, Ann Arbor, MI 4809-222 Eail: {changn,ingyan}@eecs.uich.edu

More information

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

Ph 20.3 Numerical Solution of Ordinary Differential Equations

Ph 20.3 Numerical Solution of Ordinary Differential Equations Ph 20.3 Nuerical Solution of Ordinary Differential Equations Due: Week 5 -v20170314- This Assignent So far, your assignents have tried to failiarize you with the hardware and software in the Physics Coputing

More information

Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space

Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space Journal of Machine Learning Research 3 (2003) 1333-1356 Subitted 5/02; Published 3/03 Grafting: Fast, Increental Feature Selection by Gradient Descent in Function Space Sion Perkins Space and Reote Sensing

More information

Fixed-to-Variable Length Distribution Matching

Fixed-to-Variable Length Distribution Matching Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

1 Identical Parallel Machines

1 Identical Parallel Machines FB3: Matheatik/Inforatik Dr. Syaantak Das Winter 2017/18 Optiizing under Uncertainty Lecture Notes 3: Scheduling to Miniize Makespan In any standard scheduling proble, we are given a set of jobs J = {j

More information

Figure 1: Equivalent electric (RC) circuit of a neurons membrane

Figure 1: Equivalent electric (RC) circuit of a neurons membrane Exercise: Leaky integrate and fire odel of neural spike generation This exercise investigates a siplified odel of how neurons spike in response to current inputs, one of the ost fundaental properties of

More information

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

Least Squares Fitting of Data

Least Squares Fitting of Data Least Squares Fitting of Data David Eberly, Geoetric Tools, Redond WA 98052 https://www.geoetrictools.co/ This work is licensed under the Creative Coons Attribution 4.0 International License. To view a

More information

List Scheduling and LPT Oliver Braun (09/05/2017)

List Scheduling and LPT Oliver Braun (09/05/2017) List Scheduling and LPT Oliver Braun (09/05/207) We investigate the classical scheduling proble P ax where a set of n independent jobs has to be processed on 2 parallel and identical processors (achines)

More information

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all Lecture 6 Introduction to kinetic theory of plasa waves Introduction to kinetic theory So far we have been odeling plasa dynaics using fluid equations. The assuption has been that the pressure can be either

More information

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies OPERATIONS RESEARCH Vol. 52, No. 5, Septeber October 2004, pp. 795 803 issn 0030-364X eissn 1526-5463 04 5205 0795 infors doi 10.1287/opre.1040.0130 2004 INFORMS TECHNICAL NOTE Lost-Sales Probles with

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

Spine Fin Efficiency A Three Sided Pyramidal Fin of Equilateral Triangular Cross-Sectional Area

Spine Fin Efficiency A Three Sided Pyramidal Fin of Equilateral Triangular Cross-Sectional Area Proceedings of the 006 WSEAS/IASME International Conference on Heat and Mass Transfer, Miai, Florida, USA, January 18-0, 006 (pp13-18) Spine Fin Efficiency A Three Sided Pyraidal Fin of Equilateral Triangular

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a ournal published by Elsevier. The attached copy is furnished to the author for internal non-coercial research and education use, including for instruction at the authors institution

More information

Multi-Dimensional Hegselmann-Krause Dynamics

Multi-Dimensional Hegselmann-Krause Dynamics Multi-Diensional Hegselann-Krause Dynaics A. Nedić Industrial and Enterprise Systes Engineering Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu B. Touri Coordinated Science Laboratory

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 2, FEBRUARY ETSP stands for the Euclidean traveling salesman problem.

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 2, FEBRUARY ETSP stands for the Euclidean traveling salesman problem. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO., FEBRUARY 015 37 Target Assignent in Robotic Networks: Distance Optiality Guarantees and Hierarchical Strategies Jingjin Yu, Meber, IEEE, Soon-Jo Chung,

More information

Scheduling Contract Algorithms on Multiple Processors

Scheduling Contract Algorithms on Multiple Processors Fro: AAAI Technical Report FS-0-04. Copilation copyright 200, AAAI (www.aaai.org). All rights reserved. Scheduling Contract Algoriths on Multiple Processors Daniel S. Bernstein, Theodore. Perkins, Shloo

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October

More information

A Study of Information Dissemination Under Multiple Random Walkers and Replication Mechanisms

A Study of Information Dissemination Under Multiple Random Walkers and Replication Mechanisms A Study of Inforation Disseination Under Multiple Rando Walkers and Replication Mechaniss Konstantinos Oikonoou Dept. of Inforatics Ionian University Corfu, Greece okon@ionio.gr Diitrios Kogias Dept. of

More information

Biostatistics Department Technical Report

Biostatistics Department Technical Report Biostatistics Departent Technical Report BST006-00 Estiation of Prevalence by Pool Screening With Equal Sized Pools and a egative Binoial Sapling Model Charles R. Katholi, Ph.D. Eeritus Professor Departent

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm Acta Polytechnica Hungarica Vol., No., 04 Sybolic Analysis as Universal Tool for Deriving Properties of Non-linear Algoriths Case study of EM Algorith Vladiir Mladenović, Miroslav Lutovac, Dana Porrat

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links

Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Tie-Varying Jaing Links Jun Kurihara KDDI R&D Laboratories, Inc 2 5 Ohara, Fujiino, Saitaa, 356 8502 Japan Eail: kurihara@kddilabsjp

More information

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it

More information

time time δ jobs jobs

time time δ jobs jobs Approxiating Total Flow Tie on Parallel Machines Stefano Leonardi Danny Raz y Abstract We consider the proble of optiizing the total ow tie of a strea of jobs that are released over tie in a ultiprocessor

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information

1 Bounding the Margin

1 Bounding the Margin COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost

More information

Analytical Model of Epidemic Routing for Delay-Tolerant Networks Qingshan Wang School of Mathematics Hefei University of Technology Hefei, China

Analytical Model of Epidemic Routing for Delay-Tolerant Networks Qingshan Wang School of Mathematics Hefei University of Technology Hefei, China Analytical Model of Epideic Routing for Delay-Tolerant etwors Qingshan Wang School of Matheatics Hefei University of Technology Hefei, China qswang@hfut.edu.cn Zygunt J. Haas School of Electrical and Coputer

More information

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost

More information

About the definition of parameters and regimes of active two-port networks with variable loads on the basis of projective geometry

About the definition of parameters and regimes of active two-port networks with variable loads on the basis of projective geometry About the definition of paraeters and regies of active two-port networks with variable loads on the basis of projective geoetry PENN ALEXANDR nstitute of Electronic Engineering and Nanotechnologies "D

More information

Device-to-Device Collaboration through Distributed Storage

Device-to-Device Collaboration through Distributed Storage Device-to-Device Collaboration through Distributed Storage Negin Golrezaei, Student Meber, IEEE, Alexandros G Diakis, Meber, IEEE, Andreas F Molisch, Fellow, IEEE Dept of Electrical Eng University of Southern

More information

MULTIAGENT Resource Allocation (MARA) is the

MULTIAGENT Resource Allocation (MARA) is the EDIC RESEARCH PROPOSAL 1 Designing Negotiation Protocols for Utility Maxiization in Multiagent Resource Allocation Tri Kurniawan Wijaya LSIR, I&C, EPFL Abstract Resource allocation is one of the ain concerns

More information

Approximation in Stochastic Scheduling: The Power of LP-Based Priority Policies

Approximation in Stochastic Scheduling: The Power of LP-Based Priority Policies Approxiation in Stochastic Scheduling: The Power of -Based Priority Policies Rolf Möhring, Andreas Schulz, Marc Uetz Setting (A P p stoch, r E( w and (B P p stoch E( w We will assue that the processing

More information

Controlled Flooding Search with Delay Constraints

Controlled Flooding Search with Delay Constraints Controlled Flooding Search with Delay Constraints Nicholas B. Chang and Mingyan Liu Departent of Electrical Engineering and Coputer Science University of Michigan, Ann Arbor, MI 48109-2122 Eail: {changn,ingyan}@eecs.uich.edu

More information

General Properties of Radiation Detectors Supplements

General Properties of Radiation Detectors Supplements Phys. 649: Nuclear Techniques Physics Departent Yarouk University Chapter 4: General Properties of Radiation Detectors Suppleents Dr. Nidal M. Ershaidat Overview Phys. 649: Nuclear Techniques Physics Departent

More information

1 Generalization bounds based on Rademacher complexity

1 Generalization bounds based on Rademacher complexity COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks Bounds on the Miniax Rate for Estiating a Prior over a VC Class fro Independent Learning Tasks Liu Yang Steve Hanneke Jaie Carbonell Deceber 01 CMU-ML-1-11 School of Coputer Science Carnegie Mellon University

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Now multiply the left-hand-side by ω and the right-hand side by dδ/dt (recall ω= dδ/dt) to get:

Now multiply the left-hand-side by ω and the right-hand side by dδ/dt (recall ω= dδ/dt) to get: Equal Area Criterion.0 Developent of equal area criterion As in previous notes, all powers are in per-unit. I want to show you the equal area criterion a little differently than the book does it. Let s

More information

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint.

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint. 59 6. ABSTRACT MEASURE THEORY Having developed the Lebesgue integral with respect to the general easures, we now have a general concept with few specific exaples to actually test it on. Indeed, so far

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

arxiv: v2 [math.co] 3 Dec 2008

arxiv: v2 [math.co] 3 Dec 2008 arxiv:0805.2814v2 [ath.co] 3 Dec 2008 Connectivity of the Unifor Rando Intersection Graph Sion R. Blacburn and Stefanie Gere Departent of Matheatics Royal Holloway, University of London Egha, Surrey TW20

More information

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs On the Inapproxiability of Vertex Cover on k-partite k-unifor Hypergraphs Venkatesan Guruswai and Rishi Saket Coputer Science Departent Carnegie Mellon University Pittsburgh, PA 1513. Abstract. Coputing

More information

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks 1 Optial Resource Allocation in Multicast Device-to-Device Counications Underlaying LTE Networks Hadi Meshgi 1, Dongei Zhao 1 and Rong Zheng 2 1 Departent of Electrical and Coputer Engineering, McMaster

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

Hybrid System Identification: An SDP Approach

Hybrid System Identification: An SDP Approach 49th IEEE Conference on Decision and Control Deceber 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA Hybrid Syste Identification: An SDP Approach C Feng, C M Lagoa, N Ozay and M Sznaier Abstract The

More information

Physically Based Modeling CS Notes Spring 1997 Particle Collision and Contact

Physically Based Modeling CS Notes Spring 1997 Particle Collision and Contact Physically Based Modeling CS 15-863 Notes Spring 1997 Particle Collision and Contact 1 Collisions with Springs Suppose we wanted to ipleent a particle siulator with a floor : a solid horizontal plane which

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

When Short Runs Beat Long Runs

When Short Runs Beat Long Runs When Short Runs Beat Long Runs Sean Luke George Mason University http://www.cs.gu.edu/ sean/ Abstract What will yield the best results: doing one run n generations long or doing runs n/ generations long

More information

Solutions of some selected problems of Homework 4

Solutions of some selected problems of Homework 4 Solutions of soe selected probles of Hoework 4 Sangchul Lee May 7, 2018 Proble 1 Let there be light A professor has two light bulbs in his garage. When both are burned out, they are replaced, and the next

More information