Reliable Agnostic Learning

Adam Tauman Kalai (Microsoft Research, Cambridge, MA), Varun Kanade (Georgia Tech, Atlanta, GA), Yishay Mansour (Google Research and Tel Aviv University)

Abstract

It is well known that in many applications erroneous predictions of one type or another must be avoided. In some applications, like spam detection, false positive errors are serious problems. In other applications, like medical diagnosis, abstaining from making a prediction may be more desirable than making an incorrect prediction. In this paper we consider different types of reliable classifiers suited for such situations. We formalize and study properties of reliable classifiers in the spirit of agnostic learning (Haussler, 1992; Kearns, Schapire, and Sellie, 1994), a PAC-like model where no assumption is made on the function being learned. We then give two algorithms for reliable agnostic learning under natural distributions. The first reliably learns DNF formulas with no false positives, using membership queries. The second reliably learns halfspaces from random examples with no false positives or false negatives, but the classifier sometimes abstains from making predictions.

1 Introduction

In many machine learning applications, a crucial requirement is that mistakes of one type or another should be avoided at all cost. As a motivating example, consider spam detection, where classifying a correct email as spam (a false positive) can have dire, or even fatal, consequences. On the other hand, if a spam message is not tagged correctly (a false negative), it is comparatively a smaller problem. In other situations, abstaining from making a prediction might be better than making wrong predictions; in a medical test it is preferable to have inconclusive predictions rather than wrong ones, so that the remaining predictions can be relied upon. A reliable classifier would have a pre-specified error bound for false positives or false negatives or both. We present formal models for different types of reliable classifiers and give efficient algorithms for some of the learning problems that arise.

(Footnotes: Part of this research was done while the author was at Georgia Institute of Technology, supported in part by NSF SES-7347, an NSF CAREER award, and a Sloan Fellowship. The author was supported in part by an NSF CCF grant. This work was supported in part by the IST Programme of the European Community, under the PASCAL2 Network of Excellence, by a grant from the Israel Science Foundation and by a grant from the United States-Israel Binational Science Foundation (BSF). This publication reflects the authors' views only.)

Figure 1: Three different learning settings. Depending on the application and data, one may be more appropriate than the others. (a) The best (most accurate) classifier (from the class of halfspaces) for a typical agnostic learning problem. (b) The best positive-reliable halfspace classifier for a problem like spam prediction, in which false positives are to be avoided. (c) The best fully reliable halfspace sandwich classifier for a problem in which all errors are to be avoided. It predicts -, +, or ?.

In agnostic learning [Hau92, KSS94], one would like provably efficient learning algorithms that make no assumption about the function to be learned. The learner's goal is to nearly match (within ε) the accuracy of the best classifier from a specified class of functions (see Figure 1a). Agnostic learning can be viewed as PAC learning [Val84] with arbitrary noise. In reliable agnostic learning, the learner's goal is to output a nearly reliable classifier whose accuracy nearly matches the accuracy of the best reliable classifier from a specified class of functions (see Figure 1b-c). Since agnostic learning is extremely computationally demanding, it is interesting that we can find efficient algorithms for reliably agnostically learning interesting classes of functions over natural distributions.

Our algorithms build on recent results in agnostic learning, one for learning decision trees [GKK08] and one for learning halfspaces [KKMS05]. Throughout the paper, our focus is on computationally efficient algorithms for learning problems. The contributions of this paper are the following:

- We introduce a model of reliable agnostic learning. Following prior work on agnostic learning, we consider both distribution-specific and distribution-free learning, as well as learning both from membership queries and from random examples. We show that reliable agnostic learning is no harder than agnostic learning and no easier than PAC learning.
- We give an algorithm for reliably learning DNF (with almost no false positives) over the uniform distribution, using membership queries. More generally, we show that if a concept class C is agnostically learnable, then the class of disjunctions of concepts from C is reliably learnable (with almost no false positives).
- We give an algorithm for reliably learning halfspace sandwich classifiers over the unit ball under the uniform distribution. This algorithm is fully reliable in the sense that it almost never makes mistakes of any type. We also extend this algorithm to have tolerant reliability, in which case a permissible rate of false positives and false negatives is specified, and the goal of the algorithm is to achieve maximal accuracy subject to these constraints.

Positive Reliability: A positive-reliable classifier is one that never produces false positives. (One can similarly define a negative-reliable classifier as one that never produces false negatives.) In the spam example, a positive-reliable DNF could be (Nigeria ∧ bank transaction ∧ million) ∨ (Viagra ∧ ¬COLT) ∨ .... An email should be almost certainly spam if it fits into any of a number of categories, where each category is specified by a set of words that it must contain and must not contain. Our goal is to output a classifier that is (almost) positive-reliable and has a false negative rate (almost) as low as that of the best positive-reliable classifier from a fixed set of classifiers, such as size-s DNFs. Consider a positive-reliable classifier that has the least rate of false negatives. We require our algorithm to output a classifier whose rate of false negatives is (within ε) as low as this best one, and require that the false positive rate of our classifier be less than ε.

Our first algorithm efficiently learns the class of DNF formulas in the positive reliable agnostic model over the uniform distribution, using membership queries. Our model is conceptually akin to a one-sided-error agnostic noise model; hence this may be a step towards agnostically learning DNF. Note that our algorithm also gives an alternative way of learning DNFs (without noise) that is somewhat different from Jackson's celebrated harmonic sieve [Jac97]. More generally, we show that if a class of functions is efficiently agnostically learnable, then polynomial-sized disjunctions of concepts from that class are efficiently learnable in the positive reliable agnostic model. Similarly, it can be shown that agnostically learning a class of functions implies learning polynomial-size conjunctions of concepts from that class in the negative reliable agnostic model. A consequence of this is that a polynomial-time algorithm for agnostically learning DNFs over the uniform distribution would give an algorithm for the challenging problem of uniform-distribution PAC learning of depth-3 circuits (since PAC learning is easier than reliable learning).

Full Reliability: We consider the notion of full reliability, which means simultaneous positive and negative reliability. In order to achieve this, we need to consider partial classifiers, ones that may predict positive, negative, or ?, where ? means no prediction. A partial classifier is fully reliable if it never produces false positives or false negatives. Given a concept class of partial classifiers, the goal is to find a (nearly) fully reliable classifier that is almost as accurate as the best fully reliable classifier from the concept class. We show that reliable agnostic learning is easier (or at least no harder) than agnostic learning; if a concept class is efficiently learnable in the agnostic setting, it is also efficiently learnable in the positive-reliable, negative-reliable, and fully reliable models.

Tolerant Reliability: We also consider the notion of tolerant reliability, where we are willing to tolerate given rates τ+, τ− of false positives and false negatives. In this most general version, we consider the class of halfspace sandwich partial classifiers (halfspace sandwiches for short), in which the examples that are in one halfspace are positive, the examples that are in another are negative, and the rest of the examples are classified as ?. We extend the agnostic halfspace algorithm and analysis of Kalai, Klivans, Mansour, and Servedio [KKMS05] to the case of halfspace sandwiches. In particular, we show that given arbitrary rates τ+, τ−, we can learn the class of halfspace sandwiches to within any constant ε over the uniform distribution in polynomial time using random examples. Our algorithm outputs a hypothesis h such that h has false positive and false negative rates close (within ε) to τ+ and τ− respectively, and has accuracy within ε of the best halfspace sandwich with false positive and false negative rates bounded by τ+ and τ−.

Related Work: A classical approach to the problem of reliable classification is to define a loss function that has different penalties for different types of errors. For example, by having an infinite loss on false positive errors and a loss of one on false negative errors, we essentially define a positive-reliable classifier as one that minimizes the loss. The main issue that arises is computational, since there is no efficient way to compute a halfspace that would minimize a general loss function. The contribution of this paper is to show that one can define interesting sub-classes for which the computational tasks are feasible. Conceptually, our work is similar to the study of cautious classifiers, i.e. classifiers which may output "unknown" as a label [FHO04]. The models we introduce, in particular the fully reliable learning model, are very similar to the bounded-improvement model of [Pie07]. The methods we use to prove the reduction between agnostic learning and reliable learning are related to work on delegating classifiers [FFHO04] and cost-sensitive learning [Elk01, ZLA03]. The cost-sensitive classification model is more general than reliable learning. In particular, to simulate positive reliability, one would impose a very large cost on false positives. However, we do not see how to extend our DNF algorithm to this setting. Even when the mistake costs are equal, the problem of agnostically learning DNF remains an open problem [GKK08]. The requirement that there be almost no false positives seems to make the problem significantly simpler. Extending our halfspace algorithm to the cost-sensitive classification setting seems more straightforward.

The motivation of our work is closely related to the Neyman-Pearson criterion, an application of which is the following: given a joint distribution over points and labels, minimize the rate of false negatives subject to the condition that the rate of false positives is bounded by a given input parameter. The Neyman-Pearson criterion classifies points based on the ratio between the likelihoods of the point being labeled positive and negative. Neyman and Pearson [NP33] proved that the optimal classification strategy is to choose a threshold on the ratio of the likelihoods. This method was applied to statistical learning in [CHHS02, SN05] to solve constrained versions of empirical risk minimization or structural risk minimization. Unfortunately, the optimization problems that arise for most classes of interest are computationally intractable. In contrast, our work focuses on deriving computationally efficient algorithms for some interesting concept classes.

The main focus of our work is to develop formal models for reliable learning and to give algorithms with theoretical guarantees on their performance. Our models are similar in spirit to Valiant's PAC learning model [Val84] and the agnostic learning models of Kearns, Schapire and Sellie [KSS94]. In our algorithms we do not change the underlying distribution on the instance space, but we sometimes flip labels of instances. This allows us to convert distribution-specific agnostic learning algorithms into algorithms for reliable learning.

Organization: For readability, the paper is divided into three parts. In Section 2, we focus only on positive reliability. Section 3 contains a reduction from agnostic learning to fully reliable learning. Finally, in Section 4, we consider reliable learning with pre-specified permissible error rates.

2 Positive reliability

The main result we prove in this section is the following: if a concept class C is efficiently agnostically learnable, then the composite class of size-s disjunctions of concepts from C is efficiently positive-reliably learnable. An application of this result is an algorithm for positive reliably learning size-s DNF formulas, based on the recent result by Gopalan, Kalai and Klivans [GKK08] for agnostically learning decision trees. (Recall that a monomial has a small decision tree, and therefore the class of DNF is included in the class of disjunctions of decision trees.)

2.1 Preliminaries

Throughout the paper, we consider ⟨X_n⟩_{n≥1} as the instance space (for example, the boolean cube X_n = {0,1}^n or the unit ball X_n = B_n). For each n, the target is an arbitrary function f_n : X_n → [0,1], which we interpret as f_n(x) = Pr[y = 1 | x]. A distribution D_n over X_n, together with f_n, induces a joint distribution (D_n, f_n) over examples X_n × {0,1} as follows: to draw a random example (x, y), pick x ∈ X_n according to D_n, set y = 1 with probability f_n(x), and otherwise set y = 0. A false positive is a prediction of 1 when the label is y = 0; similarly, a false negative is a prediction of 0 when the label is y = 1. The rates of false positives and false negatives of a classifier c : X_n → {0,1} are defined below. When D_n and f_n are clear from context, we omit them.

false+(c) = false+(c, D_n, f_n) := Pr_{(x,y)∼(D_n,f_n)}[c(x) = 1 ∧ y = 0] = E_{x∼D_n}[c(x)(1 - f_n(x))]
false−(c) = false−(c, D_n, f_n) := Pr_{(x,y)∼(D_n,f_n)}[c(x) = 0 ∧ y = 1] = E_{x∼D_n}[(1 - c(x)) f_n(x)]

A classifier c is said to be positive-reliable if false+(c) = 0; in other words, it never makes false positive predictions. Although we focus on positive reliability, an entirely similar definition can be made in terms of negative reliability. The error of a classifier c is

err(c) = err(c, D_n, f_n) := Pr_{(x,y)∼(D_n,f_n)}[c(x) ≠ y] = false+(c) + false−(c).

To keep notation simple, we drop the subscript n except in definitions.

Oracles: As is standard in computational learning theory, we consider two types of oracles: membership query (MQ) and example (EX). Given a target function f : X → [0,1], the two types of oracles behave as follows:
- Membership Query (MQ) oracle: for a query x ∈ X, the oracle returns y = 1 with probability f(x), and y = 0 otherwise, independently each time it is invoked.
- Example (EX) oracle: when invoked, the oracle draws x ∈ X according to the distribution D, sets y = 1 with probability f(x) and y = 0 otherwise, and returns (x, y).
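To make these quantities concrete, the following minimal Python sketch simulates the two oracles and estimates false+ and false− from a sample. It is illustrative only; the function names, the example target f, and the conjunction c are assumptions for the demonstration and do not come from the paper.

import random

def ex_oracle(f, sample_x):
    """EX oracle sketch: draw x ~ D (via sample_x), label it 1 w.p. f(x), else 0."""
    x = sample_x()
    y = 1 if random.random() < f(x) else 0
    return x, y

def mq_oracle(f, x):
    """MQ oracle sketch: label a queried x with 1 w.p. f(x), independently per call."""
    return 1 if random.random() < f(x) else 0

def empirical_rates(c, examples):
    """Estimate false+(c) and false-(c) from labeled examples (x, y)."""
    m = len(examples)
    fp = sum(1 for x, y in examples if c(x) == 1 and y == 0) / m
    fn = sum(1 for x, y in examples if c(x) == 0 and y == 1) / m
    return fp, fn

# Example: target f on the boolean cube {0,1}^n, uniform D, and a small conjunction c.
n = 10
sample_x = lambda: tuple(random.randint(0, 1) for _ in range(n))
f = lambda x: 1.0 if (x[0] == 1 and x[1] == 1) else 0.05   # noisy target
c = lambda x: 1 if (x[0] == 1 and x[1] == 1 and x[2] == 1) else 0
sample = [ex_oracle(f, sample_x) for _ in range(20000)]
print(empirical_rates(c, sample))   # c is positive-reliable for this f: it predicts 1 only where f = 1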

Almost all our results hold for both types of oracles, and none of our algorithms change the underlying distribution D over X. We use O(f) to denote an oracle, which may be of either of these types. Note that neither type of oracle we consider is persistent: either may return both (x, 0) and (x, 1) for the same x. In this sense, our model is similar to the p-concept distribution model for agnostic learning introduced in Kearns, Schapire and Sellie [KSS94]. Whenever an algorithm A has access to oracle O(f), we denote it by A^{O(f)}.

Agnostic Learning. Algorithm A efficiently agnostically learns a sequence of concept classes ⟨C_n⟩_{n≥1}, under distributions ⟨D_n⟩_{n≥1}, if there exists a polynomial p(n, 1/ε, 1/δ) such that, for every n, every ε, δ > 0 and every f_n : X_n → [0,1], with probability at least 1 - δ, A^{O(f_n)}(ε, δ) outputs a hypothesis h that satisfies

err(h, D_n, f_n) ≤ min_{c∈C_n} err(c, D_n, f_n) + ε.

The time complexity of both A and h is bounded by p(n, 1/ε, 1/δ).

Below we define positive-reliable learning, and postpone the definitions of full reliability and tolerant reliability to Sections 3 and 4 respectively. Define the subset of positive-reliable classifiers (relative to a fixed distribution and target function) to be

C+ = {c ∈ C | false+(c) = 0}.

Note that if we assume that C contains the classifier which predicts 0 on all examples, then C+ is non-empty.

Positive Reliable Learning. Algorithm A efficiently positive reliably learns a sequence of concept classes ⟨C_n⟩_{n≥1}, under distributions ⟨D_n⟩_{n≥1}, if there exists a polynomial p(n, 1/ε, 1/δ) such that, for every n, every ε, δ > 0 and every f_n : X_n → [0,1], with probability at least 1 - δ, A^{O(f_n)}(ε, δ) outputs a hypothesis h that satisfies

false+(h, D_n, f_n) ≤ ε   and   false−(h, D_n, f_n) ≤ min_{c∈C_n+} false−(c, D_n, f_n) + ε.

The time complexity of both A and h is bounded by p(n, 1/ε, 1/δ). Each oracle call is assumed to take unit time. Hence, an upper bound on the running time is also an upper bound on the sample complexity, i.e., the number of oracle calls.

2.2 Positive reliably learning DNF

Our main theorem for this section is the following:

Theorem 1. Let A be an algorithm that (using oracle O(f)) efficiently agnostically learns concept class C under distribution D. Then there exists an algorithm A' that (using oracle O(f) and black-box access to A) efficiently positive reliably learns the class of size-s disjunctions of concepts from C.

Using the result on learning decision trees [GKK08], we immediately get Corollary 1. The remainder of this section is devoted to proving Theorem 1 and Corollary 1.

Corollary 1. There is an MQ algorithm B such that for any n, s, B positive reliably learns the class of size-s DNF formulas on n variables in time poly(n, s, 1/ε, 1/δ) with respect to the uniform distribution.

We show that if we have access to oracle O(f), we can simulate oracle O(f'), where f'(x) = q(x)f(x) + (1 - q(x))r(x), q, r : X → [0,1] are arbitrary functions and we only assume black-box access to q and r.

Lemma 1. Given access to oracle O(f) and black-box access to the functions q and r, we can simulate oracle O(f') for f' = qf + (1 - q)r.

Proof. We show this assuming that O(f) is an example oracle; the case when O(f) is a membership query oracle is simpler. Let f' = qf + (1 - q)r. To simulate O(f'), we do the following. First draw (x, y) from O(f); with probability q(x), return (x, y). With probability 1 - q(x), do the following: set y' = 1 with probability r(x), y' = 0 otherwise, and return (x, y'). It is easy to see that Pr[y = 1 | x] = f'(x); thus this simulates the oracle O(f').
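The simulation in Lemma 1 is straightforward to implement. The sketch below is illustrative (the function names are assumptions, not from the paper) and represents an EX oracle as a zero-argument function returning a labeled example.

import random

def simulate_mixture_oracle(draw_from_O_f, q, r):
    """Sketch of Lemma 1: given an EX oracle for f and black-box q, r : X -> [0,1],
    return an EX oracle for f' = q*f + (1-q)*r."""
    def draw():
        x, y = draw_from_O_f()                          # (x, y) with Pr[y=1|x] = f(x)
        if random.random() < q(x):
            return x, y                                 # keep the original label w.p. q(x)
        y_prime = 1 if random.random() < r(x) else 0    # relabel via r w.p. 1 - q(x)
        return x, y_prime
    return draw

# Sanity check at a single point: Pr[y=1] should equal q*f + (1-q)*r = 0.5*0.8 + 0.5*0.3 = 0.55.
O_f = lambda: ("x0", 1 if random.random() < 0.8 else 0)
O_fprime = simulate_mixture_oracle(O_f, q=lambda x: 0.5, r=lambda x: 0.3)
draws = [O_fprime()[1] for _ in range(50000)]
print(sum(draws) / len(draws))   # approximately 0.55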
When C is a concept class that is agnostically learnable, we give an algorithm that uses the agnostic learner as a black box and outputs a hypothesis with a low rate of false positives. This hypothesis will have a rate of false negatives close to optimal with respect to the positive-reliable concepts in C. Our construction modifies the target function f, leaving the distribution D unchanged, in such a way that false positives (with respect to the original function) are penalized much more than false negatives. By choosing parameters correctly, the output of the black-box agnostic learner is close to the best positive-reliable classifier.

Lemma 2. Assume that algorithm A^{O(f)}(ε, δ) efficiently agnostically learns concept class C under distribution D in time T(ε, δ). Then A^{O(f')}(ε²/2, δ), where f' = (1/2 + ε/4)·f, positive reliably learns C in time T(ε²/2, δ).

Proof. Let f' = (1/2 + ε/4)·f. Let g : X → {0,1} be an arbitrary function and let p_1 = E_{x∼D}[f(x)]. We relate the quantities false+ and false− of g with respect to the functions f and f'; simple calculations show that

false+(g, D, f') = false+(g, D, f) + (1/2 - ε/4)·(p_1 - false−(g, D, f))   (1)
false−(g, D, f') = (1/2 + ε/4)·false−(g, D, f)   (2)
err(g, D, f') = false+(g, D, f) + (1/2 - ε/4)·p_1 + (ε/2)·false−(g, D, f)   (3)

Let c ∈ C be such that false+(c, D, f) = 0 and false−(c, D, f) = opt+ = min_{c'∈C+} false−(c', D, f). If h is the output of A^{O(f')}(ε²/2, δ), then with probability at least 1 - δ,

err(h, D, f') ≤ err(c, D, f') + ε²/2.   (4)

Substituting identity (3) into (4), once for c and once for h, we get

false+(h, D, f) ≤ (ε/2)·(opt+ - false−(h, D, f)) + ε²/2.   (5)

Since false−(h, D, f) ≥ 0 and opt+ ≤ 1, inequality (5) gives false+(h, D, f) ≤ ε/2 + ε²/2 ≤ ε. Dropping the non-negative term false+(h, D, f) from (5) and rearranging, we get

false−(h, D, f) ≤ opt+ + ε.   (6)

Hence h satisfies the requirements of positive reliable learning with accuracy ε.
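The reduction of Lemma 2 is easy to express in code on top of the oracle simulation of Lemma 1. The sketch below is illustrative (names are assumptions, not from the paper): it wraps an EX oracle with the transformation f' = (1/2 + ε/4)·f, which is the mixture q ≡ 1/2 + ε/4, r ≡ 0, and hands it to a black-box agnostic learner with accuracy ε²/2.

import random

def rescaled_oracle(draw_from_O_f, eps):
    """Simulate O(f') for f' = (1/2 + eps/4) * f, the special case q = 1/2 + eps/4, r = 0."""
    q = 0.5 + eps / 4.0
    def draw():
        x, y = draw_from_O_f()
        if random.random() < q:
            return x, y          # keep the true label w.p. q
        return x, 0              # otherwise force a 0 label (r = 0)
    return draw

def positive_reliable_learn(agnostic_learner, draw_from_O_f, eps, delta):
    """Run the black-box agnostic learner on the transformed oracle with accuracy eps^2/2,
    as in Lemma 2.  `agnostic_learner(oracle, eps, delta)` is an assumed interface."""
    return agnostic_learner(rescaled_oracle(draw_from_O_f, eps), eps**2 / 2, delta)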

Figure 2: Algorithm Disjunction Learner

input: ε, δ, M, O(f), CPRL
set H_0 := 0, ε' := ε/(4M), δ' := δ/(2M);
for i = 1 to M {
    set m := ⌈(2/ε²) log(2M/δ)⌉;
    draw Z = (x_1, y_1), ..., (x_m, y_m) from the distribution (D, f);
    set p̂_i := (1/m) Σ_{j=1}^m (1 - H_{i-1}(x_j));
    if p̂_i ≥ ε/2 {
        h_i := CPRL(ε', δ', (2/3)·p̂_i, O(f), 1 - H_{i-1});
    } else {
        h_i := 0;
    }
    H_i := H_{i-1} ∨ h_i;
}
output H_M

For learning disjunctions of concepts, we first need to learn the concept class on a subset of the instance space. We show that in the agnostic setting, learning on a subset of the instance space is only as hard as learning over the entire instance space. The simple reduction we present here does not seem feasible in the noiseless case. Let S ⊆ X be a subset; a distribution D over X induces a conditional distribution D_S over S: for any T ⊆ X, Pr_{D_S}[T] = Pr_D[T ∩ S] / Pr_D[S]. We assume access to the indicator function of the set S, i.e., I_S : X → {0,1} with I_S(x) = 1 if x ∈ S and I_S(x) = 0 otherwise. If Pr_D[S] is not negligible, it is possible to agnostically learn under the conditional distribution D_S using a black-box agnostic learner.

Lemma 3. Suppose algorithm A^{O(f)}(ε, δ) efficiently agnostically learns C under distribution D in time T(ε, δ). Let S ⊆ X be such that Pr_D[S] ≥ γ (= 1/poly). Then algorithm A^{O(f')}(εγ, δ), where f' = f·I_S + (1 - I_S)/2, efficiently agnostically learns C under the conditional distribution D_S in time T(εγ, δ).

Proof. We first relate the errors with respect to f and f' = f·I_S + (1 - I_S)/2. In words, f' equals f on S and is random outside S. Therefore, the error of any function g is 1/2 outside of S and equals its error under D_S inside S. More formally, for any function g : X → {0,1} we have

err(g, D, f') = E_{x∼D}[g(x)(1 - f'(x)) + (1 - g(x))f'(x)]
             = E_{x∼D}[(g(x)(1 - f(x)) + (1 - g(x))f(x))·I_S(x)] + E_{x∼D}[(1 - I_S(x))/2]
             = Pr_D[S]·err(g, D_S, f) + (1 - Pr_D[S])/2.   (7)

Let c ∈ C be such that err(c, D_S, f) = min_{c'∈C} err(c', D_S, f) = opt. Using (7), we conclude that c is also optimal under distribution D with respect to the function f'. Algorithm A^{O(f')}(εγ, δ) outputs a hypothesis h which, with probability at least 1 - δ, satisfies err(h, D, f') ≤ err(c, D, f') + εγ. Using (7), and since Pr_D[S] ≥ γ, we immediately get err(h, D_S, f) ≤ opt + ε.

We define algorithm CPRL (conditional positive reliable learner) as follows. The inputs to the algorithm are: ε, the accuracy parameter; δ, the confidence parameter; γ, a lower bound on the probability of S; O(f), oracle access to the target function; and I_S, the indicator function of the subset S. Algorithm CPRL first defines f' = (1/2 + ε/4)·(f·I_S + (1 - I_S)/2). By Lemma 1 we can sample from O(f') given access to O(f) and I_S. Algorithm CPRL runs A^{O(f')}(ε²γ²/2, δ) and returns its output. We state the properties of algorithm CPRL as Lemma 4.

Lemma 4. Suppose algorithm A^{O(f)}(ε, δ) agnostically learns C under distribution D in time T(ε, δ). For a subset S ⊆ X with Pr_D[S] ≥ γ (= 1/poly), algorithm CPRL (which uses A as a black box), with probability at least 1 - δ, returns a hypothesis h such that

false+(h, D_S, f) ≤ ε   and   false−(h, D_S, f) ≤ min_{c∈C+} false−(c, D_S, f) + ε.

Algorithm CPRL has access to oracle O(f) and black-box access to I_S. The running time of algorithm CPRL is T(ε²γ²/2, δ).

Let OR_s(C) = {c_1 ∨ ··· ∨ c_s | c_j ∈ C, 1 ≤ j ≤ s} be the composite class of size-s disjunctions of concepts from C. Theorem 2 states that if C is agnostically learnable under distribution D, then OR_s(C) is positive reliably learnable under distribution D.
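Because Figure 2 is stated abstractly in terms of the black box CPRL, a compact executable rendering may help. The sketch below is illustrative only: its constants (ε' = ε/(4M), δ' = δ/(2M), the sample size m) follow the reconstruction above, and `cprl` and `draw_example` are assumed interfaces rather than functions defined in the paper.

import math, random

def disjunction_learner(cprl, draw_example, eps, delta, s):
    """Sketch of Figure 2 (Disjunction Learner).  `cprl(eps, delta, gamma, draw_example,
    indicator)` is assumed to be a conditional positive-reliable learner that returns a
    {0,1}-valued hypothesis."""
    M = max(1, math.ceil(s * math.log(2.0 / eps)))
    eps0, delta0 = eps / (4 * M), delta / (2 * M)
    hypotheses = []                                              # h_1, ..., h_i learned so far
    H = lambda x: max((h(x) for h in hypotheses), default=0)     # H_i = h_1 OR ... OR h_i
    for i in range(M):
        m = math.ceil((2.0 / eps**2) * math.log(2.0 * M / delta))
        Z = [draw_example() for _ in range(m)]
        p_hat = sum(1 - H(x) for x, _ in Z) / m                  # mass not yet covered by H
        if p_hat >= eps / 2:
            # indicator of S = {x : H(x) = 0}; evaluated during this call, i.e. against H_{i-1}
            indicator = lambda x: 1 - H(x)
            h_i = cprl(eps0, delta0, (2.0 / 3.0) * p_hat, draw_example, indicator)
            hypotheses.append(h_i)
    return H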
We need two simple lemmas to prove Theorem 2; their proofs are elementary and we omit them.

Lemma 5. Let D be a distribution over X, let f : X → [0,1] be an arbitrary function, and let c, h be such that false+(c) = 0 and false−(h) ≤ false−(c) + ε. Then Pr_D[h(x) = 1] ≥ Pr_D[c(x) = 1] - ε.

Lemma 6. Let D be a distribution over X, let f : X → [0,1] be an arbitrary function, and let c, h be such that false+(c) = 0, false+(h) ≤ ε_1 and Pr_D[h(x) = 1] ≥ Pr_D[c(x) = 1] - ε_2. Then false−(h) ≤ false−(c) + ε_1 + ε_2.

Lemma 5 states that if a hypothesis h has a rate of false negatives close to that of a concept c that makes no false positive predictions, then h must predict 1 almost as often as c. Lemma 6 states that if h is a hypothesis with a low rate of false positives and it predicts 1 almost as often as a concept c which never predicts false positives, then the rate of false negatives of h must be close to that of c.

Theorem 2. Let C be a concept class that is agnostically learnable (in polynomial time) under D, and let f : X → [0,1] be an arbitrary function. Algorithm Disjunction Learner (see Figure 2), run with parameters ε, δ, M = s·log(2/ε), access to oracle O(f) and black-box access to CPRL, with probability at least 1 - δ outputs a hypothesis H_M such that

false+(H_M, D, f) ≤ ε   and   false−(H_M, D, f) ≤ min_{ψ∈OR_s(C)+} false−(ψ, D, f) + ε.

The running time is polynomial in s, 1/ε and 1/δ.

Proof. For the definitions of H_i and p̂_i, refer to Figure 2; recall that ε' = ε/(4M) and δ' = δ/(2M). Let ϕ ∈ OR_s(C) satisfy false+(ϕ, D, f) = 0 and false−(ϕ, D, f) = opt+ = min_{ψ∈OR_s(C)+} false−(ψ, D, f). Suppose ϕ = c_1 ∨ ··· ∨ c_s, where c_j ∈ C. Let S_i = {x ∈ X | H_{i-1}(x) = 0}, so that 1 - H_{i-1} is an indicator function for S_i. Let p_i = Pr_D[S_i] and p+ = Pr_D[ϕ(x) = 1]. Our goal is to show that Pr_D[H_M(x) = 1] ≥ p+ - 3ε/4 and false+(H_M, D, f) ≤ ε/4; Lemma 6 then completes the proof.

In the i-th iteration we compute, using a sample, the estimate p̂_i of p_i. Using Hoeffding's bound, with probability at least 1 - δ/(2M), |p̂_i - p_i| ≤ ε/2. Suppose this holds for all M iterations, allowing our algorithm a failure probability of δ/2 so far. In this scenario, if p_i ≥ ε then p̂_i ≥ ε/2 and p̂_i ≤ (3/2)·p_i, so (2/3)·p̂_i is a valid lower bound on Pr_D[S_i]. Note that if p_i < ε, then false−(H_{i-1}, D, f) < ε and Pr_D[H_{i-1}(x) = 1] > 1 - ε, and hence we are done.

Let D_{S_i} denote the conditional distribution given S_i. In the i-th iteration, the call to CPRL returns a hypothesis h_i such that false+(h_i, D_{S_i}, f) ≤ ε' and false−(h_i, D_{S_i}, f) ≤ opt_i + ε', where opt_i = min_{c∈C+} false−(c, D_{S_i}, f). Note that for j = 1, ..., s we have false+(c_j, D_{S_i}, f) = 0, and hence false−(c_j, D_{S_i}, f) ≥ opt_i for all i. Thus false−(h_i, D_{S_i}, f) ≤ false−(c_j, D_{S_i}, f) + ε', and hence, using Lemma 5,

Pr_{D_{S_i}}[h_i(x) = 1] ≥ Pr_{D_{S_i}}[c_j(x) = 1] - ε'   for every j,   so   Pr_{D_{S_i}}[h_i(x) = 1] ≥ (1/s)·Pr_{D_{S_i}}[ϕ(x) = 1] - ε'.

Let q_i = 1 - p_{i+1} = Pr_D[H_i(x) = 1]. We define d_i = p+ - q_i and show by induction that

d_i ≤ p+·(1 - 1/s)^i + ε'·Σ_{j=0}^{i-1} (1 - 1/s)^j.

For the base step, d_1 = p+ - q_1 = p+ - Pr_D[h_1(x) = 1] ≤ p+ - p+/s + ε' = p+·(1 - 1/s) + ε'. For the induction step we have

d_{i+1} = p+ - Pr_D[H_{i+1}(x) = 1]
        = p+ - Pr_D[H_i(x) = 1] - Pr_D[H_i(x) = 0 ∧ h_{i+1}(x) = 1]
        = d_i - p_{i+1}·Pr_{D_{S_{i+1}}}[h_{i+1}(x) = 1]
        ≤ d_i - (p_{i+1}/s)·Pr_{D_{S_{i+1}}}[ϕ(x) = 1] + ε'
        ≤ d_i - (1/s)·(Pr_D[ϕ(x) = 1] - Pr_D[H_i(x) = 1]) + ε'
        = d_i·(1 - 1/s) + ε'
        ≤ p+·(1 - 1/s)^{i+1} + ε'·Σ_{j=0}^{i} (1 - 1/s)^j.

When M = s·log(2/ε), we get d_M ≤ p+·e^{-M/s} + ε'·s ≤ ε/2 + ε/4 = 3ε/4. Moreover, since each h_i has false positive rate at most ε' on D_{S_i}, false+(H_M, D, f) ≤ Σ_{i=1}^M p_i·false+(h_i, D_{S_i}, f) ≤ M·ε' = ε/4 ≤ ε. Hence, using Lemma 6 (with ε_1 = ε/4 and ε_2 = 3ε/4), false−(H_M, D, f) ≤ opt+ + ε. The probability that some call to CPRL fails is at most M·δ' = δ/2, which combined with the probability of failure caused by incorrect estimation of some p̂_i gives total failure probability at most δ.

Theorem 2 is a more precise restatement of Theorem 1. Combined with Lemma 7 below (which is Theorem 1 of [GKK08]), it proves Corollary 1.

Lemma 7 ([GKK08]). The class of polynomial-size decision trees can be agnostically learned (using queries) to accuracy ε in time poly(n, 1/ε, 1/δ) with respect to the uniform distribution.

3 Full reliability

In this section we deal with the case of full reliability. We are interested in obtaining a hypothesis that has a low error rate in terms of both false positives and false negatives. In the noisy (agnostic) setting this is not possible unless we allow our hypothesis to refrain from making a prediction. For this reason we use the notion of partial classifiers, which predict a value from the set {0, 1, ?}, where a prediction of ? is treated as uncertainty of the classifier. Recall that our instance space is ⟨X_n⟩_{n≥1} and that for each n the target is an unknown function f_n : X_n → [0,1]. Formally, a partial classifier is a function c : X_n → {0, 1, ?}. Let I(E) denote the indicator function of the event E. The error (and similarly false+, false−) of a partial classifier is defined as

err(c, D_n, f_n) = E_{x∼D_n}[I(c(x) = 0)·f_n(x) + I(c(x) = 1)·(1 - f_n(x))].

We define the uncertainty of a partial classifier c to be

?(c, D_n, f_n) = E_{x∼D_n}[I(c(x) = ?)].

Finally, we define the accuracy of a partial classifier c to be

acc(c, D_n, f_n) = E_{x∼D_n}[I(c(x) = 0)·(1 - f_n(x)) + I(c(x) = 1)·f_n(x)].
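For concreteness, the three quantities for a partial classifier can be estimated from a sample as in the short sketch below (illustrative, not from the paper); it uses the string '?' to represent abstention.

def partial_metrics(c, examples):
    """Empirical err, uncertainty (?), and acc of a partial classifier c : X -> {0, 1, '?'},
    estimated from labeled examples (x, y)."""
    m = len(examples)
    err = sum(1 for x, y in examples if c(x) in (0, 1) and c(x) != y) / m
    unc = sum(1 for x, y in examples if c(x) == '?') / m
    acc = sum(1 for x, y in examples if c(x) == y) / m
    return err, unc, acc      # err + unc + acc == 1 up to floating point error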

Note that for a partial classifier c we have err(c) + ?(c) + acc(c) = 1. Suppose that ⟨C_n⟩ is a sequence of concept classes; recall that we defined C_n+ = {c ∈ C_n | false+(c) = 0}. Similarly we define C_n− = {c ∈ C_n | false−(c) = 0}. We can now define the class of partial classifiers derived from C_n as PC(C_n) = {(c+, c−) | c+ ∈ C_n+, c− ∈ C_n−}. The definition ensures that for a partial classifier c = (c+, c−) ∈ PC(C_n), with probability 1 a sample x ∼ D_n satisfies c+(x) ≤ c−(x). For any x, c(x) = 1 when both c+(x) = 1 and c−(x) = 1; c(x) = 0 when both c+(x) = 0 and c−(x) = 0; and c(x) = ? otherwise. A fully reliable classifier is a partial classifier which makes no errors. We define fully reliable learning as follows.

Fully Reliable Learning. Algorithm A efficiently fully reliably learns a sequence of concept classes ⟨C_n⟩_{n≥1}, under distributions ⟨D_n⟩_{n≥1}, if there exists a polynomial p(n, 1/ε, 1/δ) such that, for every n, every ε, δ > 0 and every f_n : X_n → [0,1], algorithm A^{O(f_n)}(ε, δ), with probability at least 1 - δ, outputs a hypothesis h such that

err(h, D_n, f_n) ≤ ε   and   acc(h, D_n, f_n) ≥ max_{c∈PC(C_n)} acc(c, D_n, f_n) - ε.

The time complexity of both A and h is bounded by p(n, 1/ε, 1/δ).

In Section 2 we gave an algorithm for positive reliable learning using a black-box agnostic learner for C_n (using Lemma 2). We define this as algorithm PRL (positive reliable learner). The inputs to the algorithm are ε, the accuracy parameter; δ, the confidence parameter; and O(f), oracle access to the target function. The output of the algorithm is a hypothesis h such that false+(h) ≤ ε and false−(h) ≤ min_{c∈C+} false−(c) + ε. The algorithm runs in time polynomial in n, 1/ε and 1/δ. A symmetric algorithm NRL (negative reliable learner) has identical properties with the false positive and false negative error bounds interchanged. The main result of this section shows that agnostic learning implies fully reliable learning.

Theorem 3. If C is efficiently agnostically learnable under distribution D, then it is efficiently fully reliably learnable under distribution D.

Proof. A simple algorithm to fully reliably learn C is the following. Let h+ = PRL^{O(f)}(ε/4, δ/2) and h− = NRL^{O(f)}(ε/4, δ/2). Define h : X → {0, 1, ?} as

h(x) = 1 if h+(x) = h−(x) = 1;   h(x) = 0 if h+(x) = h−(x) = 0;   h(x) = ? otherwise.

We claim that the hypothesis h is close to the best fully reliable partial classifier. Let c = (c+, c−) ∈ PC(C) be such that err(c, D, f) = 0 and acc(c, D, f) = max_{c'∈PC(C)} acc(c', D, f). We know that the following hold:

false+(h+, D, f) ≤ ε/4,   false−(h+, D, f) ≤ false−(c+, D, f) + ε/4.

Similarly,

false−(h−, D, f) ≤ ε/4,   false+(h−, D, f) ≤ false+(c−, D, f) + ε/4.

Then err(h, D, f) ≤ false+(h+, D, f) + false−(h−, D, f) ≤ ε/2. Note that when h(x) = ?, then h+(x) ≠ h−(x), and hence one of them makes an error. Therefore we also have 1 - acc(h, D, f) ≤ err(h+, D, f) + err(h−, D, f). We know that

err(h+, D, f) = false+(h+, D, f) + false−(h+, D, f) ≤ false−(c+, D, f) + ε/2.

Similarly err(h−, D, f) ≤ false+(c−, D, f) + ε/2. Finally, we check that false−(c+, D, f) + false+(c−, D, f) = 1 - acc(c, D, f); hence acc(h, D, f) ≥ acc(c, D, f) - ε.
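The combination step in this proof is mechanical; the following sketch (illustrative names, not from the paper) shows it for hypotheses represented as Python functions mapping a point to 0 or 1.

def fully_reliable_from_prl_nrl(h_plus, h_minus):
    """Combination used in the proof of Theorem 3: h_plus is the output of the
    positive-reliable learner, h_minus of the negative-reliable learner.  The combined
    partial classifier predicts a label only when the two agree and abstains otherwise."""
    def h(x):
        if h_plus(x) == 1 and h_minus(x) == 1:
            return 1
        if h_plus(x) == 0 and h_minus(x) == 0:
            return 0
        return '?'
    return h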
4 Reliable learning with tolerance

In this section we consider our final generalization, where we allow tolerance rates τ+ and τ− for false positives and false negatives respectively. As in the case of fully reliable learning, we need to consider the class of partial classifiers. Given a sequence of concept classes ⟨C_n⟩, for each C_n we define the class of sandwich classifiers as SC(C_n) = {(c+, c−) | c+, c− ∈ C_n, c+ ≤ c−}. For a sandwich classifier c = (c+, c−), given x, c(x) = 1 when both c+(x) = 1 and c−(x) = 1; c(x) = 0 when both c+(x) = 0 and c−(x) = 0; and c(x) = ? otherwise.

Sandwich classifiers are a generalization of the class of partial classifiers PC(C_n) that we defined in the previous section. In particular, the class PC(C_n) is the set of all sandwich classifiers with zero error. We call a sandwich classifier c ∈ SC(C_n) (τ+, τ−)-tolerant reliable if false+(c, D_n, f_n) ≤ τ+ and false−(c, D_n, f_n) ≤ τ−. Given acceptable tolerance rates τ+, τ−, define opt as

opt(τ+, τ−) = max_{c∈SC(C_n)} acc(c)   subject to   false+(c) ≤ τ+ and false−(c) ≤ τ−.

We show that algorithm Sandwich Learner (Figure 3) efficiently learns the class of halfspace sandwiches over the unit ball B_n under the uniform distribution. With minor modifications, this algorithm can also learn under log-concave distributions. A halfspace sandwich over B_n can be visualized as two slices, one labeled 1, the other labeled 0, and the remaining ball labeled ?. Our algorithm is based on the results of Kalai, Klivans, Mansour and Servedio [KKMS05], and the analysis is largely similar. In each iteration, algorithm Sandwich Learner draws a sample of size m and sets up an ℓ1 regression problem, (8)-(11), defined below.

The ℓ1 polynomial regression problem is

min over polynomials p, q with deg(p) ≤ d, deg(q) ≤ d:
    (1/m)·[ Σ_{i: y_i=1} |1 - p(x_i)| + Σ_{i: y_i=0} |q(x_i)| ]   (8)
subject to
    (1/m)·Σ_{i: y_i=0} |p(x_i)| ≤ τ+ + ε/2   (9)
    (1/m)·Σ_{i: y_i=1} |1 - q(x_i)| ≤ τ− + ε/2   (10)
    (1/m)·Σ_{i=1}^{m} max(p(x_i) - q(x_i), 0) ≤ ε/8   (11)

Figure 3: Algorithm Sandwich Learner

inputs: ε, m, T, d
1. Draw m labeled examples Z_t = (x_1^t, y_1^t), ..., (x_m^t, y_m^t) ∈ (B_n × {0,1})^m and set up the ℓ1 polynomial regression problem (8)-(11).
2. Solve the regression problem, obtaining polynomials p^t(x), q^t(x), and record z_t, the value of the objective function.
3. Repeat the above two steps T times and take p = p^s, q = q^s to be the polynomials for which z_s was smallest.
4. Output the randomized partial hypothesis h : B_n × {0, 1, ?} → [0,1]:
   h(x, 1) = crop(min(p(x), q(x)))
   h(x, 0) = crop(min(1 - p(x), 1 - q(x)))
   h(x, ?) = 1 - h(x, 1) - h(x, 0)

This problem can be solved efficiently by setting it up as a linear program (see the appendix of [KKMS05]). Thus, the running time of the algorithm is polynomial in m, T and n^d. The function crop : R → [0,1] used in the algorithm is defined as crop(z) = z for z ∈ [0,1], crop(z) = 0 for z < 0 and crop(z) = 1 for z > 1. The output of the algorithm is a randomized partial classifier. We define a randomized partial classifier as a function c : X × {0, 1, ?} → [0,1] such that c(x, 1) + c(x, 0) + c(x, ?) = 1, where c(x, y) is interpreted as the probability that the output on x is y. We define the error and empirical error of such classifiers as

err(c, D) = E_{(x,y)∼D}[c(x, 1 - y)],   êrr(c, Z) = (1/m)·Σ_{i=1}^m c(x_i, 1 - y_i).

Other quantities are defined similarly. Theorem 4 below shows that for any constant ε, the class of halfspace sandwiches over the unit ball B_n is efficiently (τ+, τ−)-tolerant reliably learnable. The running time of the algorithm is exponential in 1/ε.

Theorem 4. Let U be the uniform distribution on the unit ball B_n and let f : B_n → [0,1] be an arbitrary unknown function. There exists a polynomial P such that, for any ε, δ > 0, n, τ+, τ−, algorithm Sandwich Learner with parameters m = P(n^d/(εδ)), T = log(2/δ), d = O(1/ε²), with probability at least 1 - δ, returns a randomized hypothesis h which satisfies false+(h) ≤ τ+ + ε, false−(h) ≤ τ− + ε and acc(h) ≥ opt(τ+, τ−) - ε, where opt is with respect to the class of halfspace sandwich classifiers.

The proof of Theorem 4 is given in the appendix.
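As an illustration of steps 1 and 2 of Figure 3, the sketch below sets up the regression (8)-(11) with off-the-shelf convex-programming tools. This is an assumption for brevity: the paper itself reduces the problem to an explicit linear program as in the appendix of [KKMS05], and the constraint thresholds here simply mirror the reconstruction of (9)-(11) above.

import itertools
import numpy as np
import cvxpy as cp

def monomial_features(X, d):
    """Expand each point into all monomials of degree <= d (the n^d-dimensional lift)."""
    n = X.shape[1]
    cols = [np.ones(len(X))]
    for deg in range(1, d + 1):
        for idx in itertools.combinations_with_replacement(range(n), deg):
            cols.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(cols)

def sandwich_regression(X, y, tau_plus, tau_minus, eps, d):
    """Solve (8)-(11); X is an (m, n) array, y an (m,) 0/1 array.
    Returns coefficient vectors for p and q in the monomial basis."""
    Phi = monomial_features(X, d)
    m, k = Phi.shape
    a, b = cp.Variable(k), cp.Variable(k)            # coefficients of p and q
    Phi_pos, Phi_neg = Phi[y == 1], Phi[y == 0]      # rows with label 1 / label 0
    objective = (cp.sum(cp.abs(1 - Phi_pos @ a)) + cp.sum(cp.abs(Phi_neg @ b))) / m
    constraints = [
        cp.sum(cp.abs(Phi_neg @ a)) / m <= tau_plus + eps / 2,       # (9)
        cp.sum(cp.abs(1 - Phi_pos @ b)) / m <= tau_minus + eps / 2,  # (10)
        cp.sum(cp.pos(Phi @ a - Phi @ b)) / m <= eps / 8,            # (11)
    ]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return a.value, b.value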
References

[CHHS02] A. Cannon, J. Howse, D. Hush, and C. Scovel. Learning with the Neyman-Pearson and min-max criteria. Technical Report LA-UR-02-2951, Los Alamos National Laboratory, 2002.

[Elk01] C. Elkan. The foundations of cost-sensitive learning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, 2001.

[FFHO04] C. Ferri, P. Flach, and J. Hernández-Orallo. Delegating classifiers. In ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning, pages 37-44, New York, NY, USA, 2004. ACM.

[FHO04] C. Ferri and J. Hernández-Orallo. Cautious classifiers. In Proceedings of the First International Workshop on ROC Analysis in Artificial Intelligence, pages 27-36, 2004.

[GKK08] P. Gopalan, A. T. Kalai, and A. R. Klivans. Agnostically learning decision trees. In STOC '08: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, New York, NY, USA, 2008. ACM.

[Hau92] D. Haussler. Decision-theoretic generalizations of the PAC model for neural networks and other applications. Information and Computation, 100:78-150, 1992.

[Jac97] J. C. Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55(3):414-440, 1997.

[KKMS05] A. T. Kalai, A. R. Klivans, Y. Mansour, and R. A. Servedio. Agnostically learning halfspaces. In FOCS '05: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pages 11-20, Washington, DC, USA, 2005. IEEE Computer Society.

[KSS94] M. J. Kearns, R. E. Schapire, and L. M. Sellie. Toward efficient agnostic learning. Machine Learning, 17(2):115-141, 1994.

[NP33] J. Neyman and E. S. Pearson. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A, Containing Papers of a Mathematical or Physical Character, 231:289-337, 1933.

[Pie07] T. Pietraszek. On the use of ROC analysis for the optimization of abstaining classifiers. Machine Learning, 68(2):137-169, 2007.

[SN05] C. Scott and R. Nowak. A Neyman-Pearson approach to statistical learning. IEEE Transactions on Information Theory, 51(11):3806-3819, 2005.

[Val84] L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.

[ZLA03] B. Zadrozny, J. Langford, and N. Abe. Cost-sensitive learning by cost-proportionate example weighting. In ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining, page 435, Washington, DC, USA, 2003. IEEE Computer Society.

A Proof of Theorem 4

To prove Theorem 4, we need Lemma 8 as an intermediate step. Here B_n is the unit ball.

Lemma 8. Let D be a distribution over B_n and let f : B_n → [0,1] be an arbitrary function. Let τ+, τ− be the required tolerance parameters. Suppose c = (c+, c−) is a halfspace sandwich classifier such that false+(c) ≤ τ+, false−(c) ≤ τ− and acc(c) = opt(τ+, τ−). If p+, p− are degree-d polynomials such that E[|c+(x) - p+(x)|] ≤ ε²/128 and E[|c−(x) - p−(x)|] ≤ ε²/128, and if m is large enough (a fixed polynomial in 1/ε suffices for the Hoeffding bounds below), then with probability at least 1/2, p+, p− are feasible solutions to the ℓ1 polynomial regression problem (8)-(11), and the value of the objective function at (p+, p−) is at most false−(c+) + false+(c−) + ε/2.

Proof. Using Hoeffding's bound, for m large enough,

Pr[ f̂alse+(c+) ≥ false+(c+) + ε/8 ] ≤ 1/16,   (12)

where f̂alse+(c+) is the empirical estimate of false+(c+) using a sample of size m. Also, by Markov's inequality,

Pr[ (1/m)·Σ_i |c+(x_i) - p+(x_i)| ≥ ε/8 ] ≤ (8/ε)·E[|c+(x) - p+(x)|] ≤ 1/16.   (13)

And hence, with probability at least 7/8,

(1/m)·Σ_{i: y_i=0} |p+(x_i)| ≤ (1/m)·Σ_{i: y_i=0} ( c+(x_i) + |c+(x_i) - p+(x_i)| ) ≤ f̂alse+(c+) + ε/8 ≤ false+(c+) + ε/4 ≤ τ+ + ε/4,   (14)

and hence p+ satisfies constraint (9). Similarly, one can show that with probability at least 7/8, p− satisfies constraint (10). Observe that

E[max(p+(x) - p−(x), 0)] ≤ E[|p+(x) - p−(x) + c−(x) - c+(x)|] ≤ E[|p+(x) - c+(x)|] + E[|p−(x) - c−(x)|] ≤ ε²/64.

Here we use the fact that c−(x) ≥ c+(x) for all x (according to our definition of SC(C)). Hence, by Markov's inequality,

Pr[ (1/m)·Σ_i max(p+(x_i) - p−(x_i), 0) ≥ ε/8 ] ≤ 1/8.   (15)

By a union bound, with probability at least 5/8, p+, p− satisfy all the constraints of the regression problem. Let us assume we are in the event where all of (12)-(15) and the corresponding statements for false negatives hold. Using Hoeffding's bound, Pr[ f̂alse−(c+) ≥ false−(c+) + ε/8 ] ≤ 1/16 and Pr[ f̂alse+(c−) ≥ false+(c−) + ε/8 ] ≤ 1/16. We allow ourselves a further 1/8 loss in probability so that these two events do not occur either. Thus, with probability at least 1/2, p+, p− are feasible solutions to the regression problem and the value of the objective is

(1/m)·Σ_{i: y_i=1} |1 - p+(x_i)| + (1/m)·Σ_{i: y_i=0} |p−(x_i)|
≤ (1/m)·Σ_{i: y_i=1} ( (1 - c+(x_i)) + |c+(x_i) - p+(x_i)| ) + (1/m)·Σ_{i: y_i=0} ( c−(x_i) + |c−(x_i) - p−(x_i)| )
≤ f̂alse−(c+) + f̂alse+(c−) + ε/4 ≤ false−(c+) + false+(c−) + ε/2.

Proof of Theorem 4. We use threshold functions θ_t in our proof, where for any t ∈ [0,1], θ_t : R → {0,1} is defined by θ_t(z) = 1 if z ≥ t and θ_t(z) = 0 otherwise. Although the threshold functions θ_t do not occur in the algorithm, they significantly simplify the proof. To get a decision from the randomized hypothesis that our algorithm outputs, a simple technique is to choose a threshold uniformly in [0,1] and use it to output a value in {0, 1, ?}. Thresholds over polynomials are halfspaces in a higher (n^d)-dimensional space, and we can use standard results from VC theory to bound the difference between the empirical and true error rates. We will frequently use the following observation: for any z ∈ R, crop(z) = ∫_0^1 θ_t(z) dt.

For a degree-d polynomial p : R^n → R, the function θ_t ∘ p can be viewed as a halfspace in dimension n^d, where we expand the terms of the polynomial into a linear function. Using classical VC theory, for any distribution D over B_n there exists a polynomial Q such that for m = Q(n^d, 1/ε, 1/δ) and a sample S of m examples drawn from (D, f), with probability at least 1 - δ, |êrr(θ_t ∘ p) - err(θ_t ∘ p)| ≤ ε, where êrr(g) = (1/m)·Σ_{(x,y)∈S} I(g(x) ≠ y). Similar bounds hold for false+ and false−, where f̂alse+(g, S) = (1/m)·Σ_{(x,y)∈S} g(x)(1 - y) and f̂alse−(g, S) = (1/m)·Σ_{(x,y)∈S} (1 - g(x))·y.
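The crop identity and the uniform-threshold decision rule mentioned above are easy to check numerically; the sketch below is illustrative only, and the cumulative sampling rule in decide() is just one way to realize the stated marginals h(x,1), h(x,0), h(x,?), not necessarily the authors' exact scheme.

import random

def crop(z):
    """crop : R -> [0,1] as defined for the algorithm."""
    return min(1.0, max(0.0, z))

def theta(t, z):
    """Threshold function theta_t(z) = 1 if z >= t else 0."""
    return 1.0 if z >= t else 0.0

# Numerical check of crop(z) = integral over t in [0,1] of theta_t(z) dt.
ts = [i / 10000.0 for i in range(1, 10001)]
for z in (-0.3, 0.0, 0.4, 1.0, 1.7):
    approx = sum(theta(t, z) for t in ts) / len(ts)
    print(z, crop(z), round(approx, 3))

def decide(p_x, q_x):
    """Turn the randomized partial hypothesis at a point x (given p(x), q(x)) into a
    decision in {1, 0, '?'} by drawing a single uniform threshold t."""
    t = random.random()
    h1 = crop(min(p_x, q_x))                  # h(x, 1)
    h0 = crop(min(1.0 - p_x, 1.0 - q_x))      # h(x, 0)
    if t < h1:
        return 1
    if t < h1 + h0:
        return 0
    return '?'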

Let c = (c+, c−) be the best halfspace sandwich classifier with respect to the tolerance rates τ+, τ−. Using the results of Kalai, Klivans, Mansour and Servedio [KKMS05], for d = O(1/ε²) there exist polynomials p+, p− such that E[|p+(x) - c+(x)|] ≤ ε²/128 and E[|p−(x) - c−(x)|] ≤ ε²/128. Thus, when algorithm Sandwich Learner is run with T = log(2/δ), by Lemma 8, with probability at least 1 - δ/2, at least one of the iterations has objective value smaller than false−(c+) + false+(c−) + ε/2. It can be checked that false−(c+) + false+(c−) = 1 - acc(c) = err(c) + ?(c). We assume we are in the case when the least objective value is smaller than 1 - acc(c) + ε/2, allowing our algorithm to fail with probability δ/2 so far. Let p, q be the polynomials which solve the regression problem with the least objective value.

We now analyze the quantities false+, false− and acc of the randomized hypothesis h output by algorithm Sandwich Learner. We present the analysis of false+; the analysis of false− is similar. We assume that the number of examples m is large enough that all the bounds from VC theory that we require in the proof hold simultaneously with probability at least 1 - δ/2; this can be achieved by a union bound using only a polynomial number of examples, say P(n^d/(εδ)). (All our requirements compare the error of a halfspace to its observed error on the sample: we need that, for the sample S = {(x_1, y_1), ..., (x_m, y_m)}, no halfspace has a difference larger than the stated slack.)

For the first part, consider halfspaces of the form θ_t ∘ p and θ_t ∘ (1 - q) for t ∈ [0,1]. By the VC bound we have f̂alse+(θ_t ∘ p, S) ≥ false+(θ_t ∘ p, U, f) - ε/2 and f̂alse−(θ_t ∘ (1 - q), S) ≥ false−(θ_t ∘ (1 - q), U, f) - ε/2. The analysis then proceeds as follows:

false+(h, U, f) = E_{x∼U}[h(x, 1)·(1 - f(x))]
= E_{x∼U}[ ∫_0^1 θ_t(min(p(x), q(x))) dt · (1 - f(x)) ]
≤ ∫_0^1 E_{x∼U}[θ_t(p(x))·(1 - f(x))] dt
= ∫_0^1 false+(θ_t ∘ p, U, f) dt
≤ ∫_0^1 f̂alse+(θ_t ∘ p, S) dt + ε/2
= (1/m)·Σ_{i: y_i=0} ∫_0^1 θ_t(p(x_i)) dt + ε/2
= (1/m)·Σ_{i: y_i=0} crop(p(x_i)) + ε/2
≤ τ+ + ε,

where the last step uses constraint (9) of the regression.

Next we analyze the quantity 1 - acc(h) = err(h) + ?(h). Let S̄ = {(x_1, 1 - y_1), ..., (x_m, 1 - y_m)}, namely S with the labels flipped, and let S_0 = {(x_1, 0), ..., (x_m, 0)}, namely the sample S with all labels 0. Here we assume that the following bounds hold: f̂alse+(θ_t ∘ (1 - p), S̄) ≥ false+(θ_t ∘ (1 - p), U, 1 - f) - ε/8, f̂alse+(θ_t ∘ q, S) ≥ false+(θ_t ∘ q, U, f) - ε/8 and êrr(θ_t ∘ (p - q), S_0) ≥ err(θ_t ∘ (p - q), U, 0) - ε/8. Step (16) below holds because 1 - crop(a) = crop(1 - a). Step (17) uses the facts that min(a, b) = a - (a - b)·I(a > b) and max(a, b) = b + (a - b)·I(a > b). Finally, step (18) uses crop(a + b) ≤ crop(a) + crop(b) for b ≥ 0. All these facts are easily checked.

1 - acc(h) = 1 - E[h(x, 1)·f(x) + h(x, 0)·(1 - f(x))]
= E[(1 - h(x, 1))·f(x) + (1 - h(x, 0))·(1 - f(x))]
= E[(1 - crop(min(p(x), q(x))))·f(x) + (1 - crop(min(1 - p(x), 1 - q(x))))·(1 - f(x))]
= E[crop(1 - min(p(x), q(x)))·f(x) + crop(max(p(x), q(x)))·(1 - f(x))]   (16)
= E[crop(1 - p(x) + (p(x) - q(x))·I(p(x) > q(x)))·f(x) + crop(q(x) + (p(x) - q(x))·I(p(x) > q(x)))·(1 - f(x))]   (17)
≤ E[crop(1 - p(x))·f(x) + crop(q(x))·(1 - f(x)) + crop(p(x) - q(x))]   (18)
= E[ ∫_0^1 θ_t(1 - p(x))·f(x) dt + ∫_0^1 θ_t(q(x))·(1 - f(x)) dt + ∫_0^1 θ_t(p(x) - q(x)) dt ]
= ∫_0^1 ( false+(θ_t ∘ (1 - p), U, 1 - f) + false+(θ_t ∘ q, U, f) + err(θ_t ∘ (p - q), U, 0) ) dt
≤ ∫_0^1 ( f̂alse+(θ_t ∘ (1 - p), S̄) + f̂alse+(θ_t ∘ q, S) + êrr(θ_t ∘ (p - q), S_0) ) dt + 3ε/8
= (1/m)·Σ_{i: y_i=1} ∫_0^1 θ_t(1 - p(x_i)) dt + (1/m)·Σ_{i: y_i=0} ∫_0^1 θ_t(q(x_i)) dt + (1/m)·Σ_{i=1}^m ∫_0^1 θ_t(p(x_i) - q(x_i)) dt + 3ε/8   (19)
= (1/m)·Σ_{i: y_i=1} crop(1 - p(x_i)) + (1/m)·Σ_{i: y_i=0} crop(q(x_i)) + (1/m)·Σ_{i=1}^m crop(p(x_i) - q(x_i)) + 3ε/8
≤ (1/m)·Σ_{i: y_i=1} |1 - p(x_i)| + (1/m)·Σ_{i: y_i=0} |q(x_i)| + (1/m)·Σ_{i=1}^m max(p(x_i) - q(x_i), 0) + 3ε/8
≤ 1 - acc(c, U, f) + ε,

where in the last inequality we used the fact that the first two terms are the objective function of the regression (and we assumed that their value is at most false−(c+) + false+(c−) + ε/2 = 1 - acc(c) + ε/2), and that the third term is bounded by ε/8 via constraint (11). Hence acc(h) ≥ acc(c) - ε.


This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a ournal published by Elsevier. The attached copy is furnished to the author for internal non-coercial research and education use, including for instruction at the authors institution

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13 CSE55: Randoied Algoriths and obabilistic Analysis May 6, Lecture Lecturer: Anna Karlin Scribe: Noah Siegel, Jonathan Shi Rando walks and Markov chains This lecture discusses Markov chains, which capture

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and

More information

arxiv: v1 [cs.ds] 17 Mar 2016

arxiv: v1 [cs.ds] 17 Mar 2016 Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass

More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

Robust polynomial regression up to the information theoretic limit

Robust polynomial regression up to the information theoretic limit 58th Annual IEEE Syposiu on Foundations of Coputer Science Robust polynoial regression up to the inforation theoretic liit Daniel Kane Departent of Coputer Science and Engineering / Departent of Matheatics

More information

arxiv: v3 [cs.lg] 7 Jan 2016

arxiv: v3 [cs.lg] 7 Jan 2016 Efficient and Parsionious Agnostic Active Learning Tzu-Kuo Huang Alekh Agarwal Daniel J. Hsu tkhuang@icrosoft.co alekha@icrosoft.co djhsu@cs.colubia.edu John Langford Robert E. Schapire jcl@icrosoft.co

More information

A Theoretical Framework for Deep Transfer Learning

A Theoretical Framework for Deep Transfer Learning A Theoretical Fraewor for Deep Transfer Learning Toer Galanti The School of Coputer Science Tel Aviv University toer22g@gail.co Lior Wolf The School of Coputer Science Tel Aviv University wolf@cs.tau.ac.il

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

Symmetrization and Rademacher Averages

Symmetrization and Rademacher Averages Stat 928: Statistical Learning Theory Lecture: Syetrization and Radeacher Averages Instructor: Sha Kakade Radeacher Averages Recall that we are interested in bounding the difference between epirical and

More information

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION GUANGHUI LAN AND YI ZHOU Abstract. In this paper, we consider a class of finite-su convex optiization probles defined over a distributed

More information

Estimating Entropy and Entropy Norm on Data Streams

Estimating Entropy and Entropy Norm on Data Streams Estiating Entropy and Entropy Nor on Data Streas Ait Chakrabarti 1, Khanh Do Ba 1, and S. Muthukrishnan 2 1 Departent of Coputer Science, Dartouth College, Hanover, NH 03755, USA 2 Departent of Coputer

More information

Support Vector Machines. Goals for the lecture

Support Vector Machines. Goals for the lecture Support Vector Machines Mark Craven and David Page Coputer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Soe of the slides in these lectures have been adapted/borrowed fro aterials developed

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

MSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE

MSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE Proceeding of the ASME 9 International Manufacturing Science and Engineering Conference MSEC9 October 4-7, 9, West Lafayette, Indiana, USA MSEC9-8466 MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL

More information

Support Vector Machines. Maximizing the Margin

Support Vector Machines. Maximizing the Margin Support Vector Machines Support vector achines (SVMs) learn a hypothesis: h(x) = b + Σ i= y i α i k(x, x i ) (x, y ),..., (x, y ) are the training exs., y i {, } b is the bias weight. α,..., α are the

More information

Iterative Decoding of LDPC Codes over the q-ary Partial Erasure Channel

Iterative Decoding of LDPC Codes over the q-ary Partial Erasure Channel 1 Iterative Decoding of LDPC Codes over the q-ary Partial Erasure Channel Rai Cohen, Graduate Student eber, IEEE, and Yuval Cassuto, Senior eber, IEEE arxiv:1510.05311v2 [cs.it] 24 ay 2016 Abstract In

More information

The Frequent Paucity of Trivial Strings

The Frequent Paucity of Trivial Strings The Frequent Paucity of Trivial Strings Jack H. Lutz Departent of Coputer Science Iowa State University Aes, IA 50011, USA lutz@cs.iastate.edu Abstract A 1976 theore of Chaitin can be used to show that

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

Hybrid System Identification: An SDP Approach

Hybrid System Identification: An SDP Approach 49th IEEE Conference on Decision and Control Deceber 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA Hybrid Syste Identification: An SDP Approach C Feng, C M Lagoa, N Ozay and M Sznaier Abstract The

More information

Handout 7. and Pr [M(x) = χ L (x) M(x) =? ] = 1.

Handout 7. and Pr [M(x) = χ L (x) M(x) =? ] = 1. Notes on Coplexity Theory Last updated: October, 2005 Jonathan Katz Handout 7 1 More on Randoized Coplexity Classes Reinder: so far we have seen RP,coRP, and BPP. We introduce two ore tie-bounded randoized

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

Bootstrapping Dependent Data

Bootstrapping Dependent Data Bootstrapping Dependent Data One of the key issues confronting bootstrap resapling approxiations is how to deal with dependent data. Consider a sequence fx t g n t= of dependent rando variables. Clearly

More information

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information

Supplement to: Subsampling Methods for Persistent Homology

Supplement to: Subsampling Methods for Persistent Homology Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation

More information

On Conditions for Linearity of Optimal Estimation

On Conditions for Linearity of Optimal Estimation On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at

More information

Support Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab

Support Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab Support Vector Machines Machine Learning Series Jerry Jeychandra Bloh Lab Outline Main goal: To understand how support vector achines (SVMs) perfor optial classification for labelled data sets, also a

More information

Support Vector Machines MIT Course Notes Cynthia Rudin

Support Vector Machines MIT Course Notes Cynthia Rudin Support Vector Machines MIT 5.097 Course Notes Cynthia Rudin Credit: Ng, Hastie, Tibshirani, Friedan Thanks: Şeyda Ertekin Let s start with soe intuition about argins. The argin of an exaple x i = distance

More information

Agnostic Learning of Disjunctions on Symmetric Distributions

Agnostic Learning of Disjunctions on Symmetric Distributions Agnostic Learning of Disjunctions on Symmetric Distributions Vitaly Feldman vitaly@post.harvard.edu Pravesh Kothari kothari@cs.utexas.edu May 26, 2014 Abstract We consider the problem of approximating

More information

Discriminative Learning can Succeed where Generative Learning Fails

Discriminative Learning can Succeed where Generative Learning Fails Discriminative Learning can Succeed where Generative Learning Fails Philip M. Long, a Rocco A. Servedio, b,,1 Hans Ulrich Simon c a Google, Mountain View, CA, USA b Columbia University, New York, New York,

More information

Quantile Search: A Distance-Penalized Active Learning Algorithm for Spatial Sampling

Quantile Search: A Distance-Penalized Active Learning Algorithm for Spatial Sampling Quantile Search: A Distance-Penalized Active Learning Algorith for Spatial Sapling John Lipor 1, Laura Balzano 1, Branko Kerkez 2, and Don Scavia 3 1 Departent of Electrical and Coputer Engineering, 2

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

VC Dimension and Sauer s Lemma

VC Dimension and Sauer s Lemma CMSC 35900 (Spring 2008) Learning Theory Lecture: VC Diension and Sauer s Lea Instructors: Sha Kakade and Abuj Tewari Radeacher Averages and Growth Function Theore Let F be a class of ±-valued functions

More information

Solutions of some selected problems of Homework 4

Solutions of some selected problems of Homework 4 Solutions of soe selected probles of Hoework 4 Sangchul Lee May 7, 2018 Proble 1 Let there be light A professor has two light bulbs in his garage. When both are burned out, they are replaced, and the next

More information

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost

More information

Analyzing Simulation Results

Analyzing Simulation Results Analyzing Siulation Results Dr. John Mellor-Cruey Departent of Coputer Science Rice University johnc@cs.rice.edu COMP 528 Lecture 20 31 March 2005 Topics for Today Model verification Model validation Transient

More information

3.8 Three Types of Convergence

3.8 Three Types of Convergence 3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Energy-Efficient Threshold Circuits Computing Mod Functions

Energy-Efficient Threshold Circuits Computing Mod Functions Energy-Efficient Threshold Circuits Coputing Mod Functions Akira Suzuki Kei Uchizawa Xiao Zhou Graduate School of Inforation Sciences, Tohoku University Araaki-aza Aoba 6-6-05, Aoba-ku, Sendai, 980-8579,

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

Homework 3 Solutions CSE 101 Summer 2017

Homework 3 Solutions CSE 101 Summer 2017 Hoework 3 Solutions CSE 0 Suer 207. Scheduling algoriths The following n = 2 jobs with given processing ties have to be scheduled on = 3 parallel and identical processors with the objective of iniizing

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

A Smoothed Boosting Algorithm Using Probabilistic Output Codes

A Smoothed Boosting Algorithm Using Probabilistic Output Codes A Soothed Boosting Algorith Using Probabilistic Output Codes Rong Jin rongjin@cse.su.edu Dept. of Coputer Science and Engineering, Michigan State University, MI 48824, USA Jian Zhang jian.zhang@cs.cu.edu

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem

More information

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it

More information