Multiple Instance Learning with Query Bags

Boris Babenko, UC San Diego · Piotr Dollár, California Institute of Technology · Serge Belongie, UC San Diego

Abstract

In many machine learning applications, precisely labeled data is either burdensome or impossible to collect. Multiple Instance Learning (MIL), in which training data is provided in the form of labeled bags rather than labeled instances, is one approach for dealing with ambiguously labeled data. In this paper we argue that in many applications of MIL (e.g. image, audio, text, bioinformatics) a single bag actually consists of a large or infinite number of instances, such as all points on a low dimensional manifold. For practical reasons, these bags get subsampled before training. Instead, we introduce a MIL formulation which directly models the underlying structure of these bags. We propose and analyze the query bag model, in which instances are obtained by repeatedly querying an oracle in a way that can capture relationships between instances. We show that sampling more instances results in better classification performance, which motivates us to develop algorithmic strategies for sampling many instances while maintaining efficiency.

1 Introduction

Traditional supervised learning requires example/label pairs during training. However, in many domains precisely labeled data is either burdensome or impossible to collect. E.g., less effort is required to specify what digits are present in an image than to accurately delineate their location, a problem first studied by Keeler et al. [15]. There may also be inherent ambiguity during labeling. The multiple instance learning framework (MIL), introduced by Dietterich et al. [10], provides a general paradigm for weakly supervised learning. Instead of example/label pairs, MIL requires unordered sets of instances, or bags, and labels are assigned to each bag, rather than each instance. A bag is labeled positive if it contains at least one positive instance. In recent years MIL has received significant attention and numerous algorithms and applications have been proposed [19, 1, 24, 4, 23, 7].

In many MIL applications a bag is generated by taking an object, e.g. an image or audio wave, and splitting it into overlapping pieces, which serve as the instances. However, both the existing MIL theory and algorithms are underdeveloped for this scenario. E.g., a typical data model used in classic MIL theory papers [2, 17] assumes that bags are finite and all instances are i.i.d., which is not appropriate for many of the datasets we consider here. While existing MIL algorithms do not always explicitly make these assumptions, they do not offer a principled way of dealing with large or infinite bags. Instead, instances are sampled prior to training. Note that a sampled positive bag may actually contain no positive instances. Furthermore, by separating sampling from training, the two phases cannot be jointly optimized. These observations inspire us to propose the query bag model¹ for MIL. In this model, instances are obtained by repeatedly querying an oracle in a way that can capture relationships between instances (e.g. in many applications all instances in a bag lie on a low dimensional manifold). We also propose to integrate the query bag model directly into existing MIL algorithms. To do so we turn to the filtering paradigm, where the data is drawn from a continuous stream or oracle [6], and extend these ideas to MIL with query bags. We show extensive experiments to validate the proposed approach.
¹ Note that although the terminology is similar to [11], our model and assumptions are quite different.

Figure 1: Object detection is a typical MIL application. In this example a pedestrian is known to exist in the image but at an unknown location. The image can be broken up into multiple overlapping patches using a sliding window, at multiple scales and orientations, to create a positive bag. Note that the number of instances is essentially infinite, since a sliding window can change location by some small ɛ. If we subsample the bag by randomly selecting a limited number of patches, we run the chance of missing the object of interest; moreover, relationships between instances are discarded. Instead, in this work we propose strategies that more directly take advantage of such bag structure.

There are numerous MIL applications where our model is appropriate. In computer vision, an image that contains an object at an unknown position and scale can be partitioned into overlapping sub-images, one of which will contain the object, either by sliding a rectangular window [23] or through segmentation [1]. In computer audition, prior to processing, an audio wave can be partitioned either in the spatial domain [18] or in the frequency domain [21]. In text processing, a document can be partitioned into many pieces by using a sliding window [1]. Similarly, in bioinformatics, a label is often assigned to a long sequence even though a short sub-sequence is responsible for a positive label [2]. In these scenarios the number of instances in a bag can become prohibitively large or even infinite (e.g. if subpixel image window locations are used). The query bag model is well suited for the above scenarios; in fact, while surveying the literature we found very few applications for which it was not well suited (e.g. [1]).

2 The Query Bag Model for MIL

We begin with a review of supervised learning and MIL with the classic bag model. In standard supervised learning, data consists of instances {x_1, ..., x_n}, x_i ∈ X, and labels {y_1, ..., y_n}, y_i ∈ Y, where the instance/label pairs are usually assumed to be drawn i.i.d. from some distribution, (x_i, y_i) ~ D_{x,y}. The goal is to learn a classification function h : X → Y that generalizes well to unseen data. In MIL with the classic bag model, data is given in the form {X_1, ..., X_n}, where each X_i = {x_i1, ..., x_im} is a bag, with associated labels {Y_1, ..., Y_n}, Y_i ∈ {0, 1}. For notational simplicity we assume all bags have the same cardinality m. A bag is positive if one or more of its instances are positive according to an unknown classifier h*:

Y_i = max_{j ∈ {1,...,m}} { h*(x_ij) }    (1)

Alternatively, to model noisy instance labels, we can write Y_i = max_j {y_ij}, where y_ij need not equal h*(x_ij). Typically, the instance/label pairs are assumed to be drawn i.i.d. from some distribution, (x_ij, y_ij) ~ D_{x,y}, much like in the standard supervised setting.

In the query bag model, rather than being given n bags with m instances each, we model each bag as an oracle that can be queried to obtain an arbitrary number of instances per bag. Furthermore, we design the query model so we can ask for nearby instances. Each query bag i is represented by an object o_i ∈ O (e.g. a large image). To sample an instance from o_i, we must specify a location parameter α ∈ A (e.g. coordinates of an image patch). A sample x_ij ∈ X is then generated using a query function Q : O × A → X. We write x_ij = Q(o_i, α_ij). Given certain assumptions about Q, formalized below, we can query Q(o_i, α + ɛ) for small ɛ ∈ A to get an instance near Q(o_i, α).
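The following minimal sketch illustrates the query bag interface just described. The class and function names are ours (hypothetical), not part of the paper, and it assumes a user-supplied query function Q and location prior D_i.

```python
import numpy as np

class QueryBag:
    """A query bag: an object o_i, a query function Q, and a prior D_i over A."""

    def __init__(self, obj, query_fn, sample_alpha):
        self.obj = obj                    # o_i, e.g. a large image
        self.query_fn = query_fn          # Q : (o_i, alpha) -> instance x
        self.sample_alpha = sample_alpha  # draws alpha ~ D_i

    def query(self, alpha):
        """Return the instance Q(o_i, alpha)."""
        return self.query_fn(self.obj, alpha)

    def sample(self, m, rng=np.random.default_rng()):
        """Draw m instances by sampling alpha_ij ~ D_i and querying the oracle."""
        alphas = [self.sample_alpha(rng) for _ in range(m)]
        return alphas, np.array([self.query(a) for a in alphas])

def sampled_bag_label(bag, h_star, m, rng=np.random.default_rng()):
    """Empirical bag label: positive if any of m sampled instances is positive
    under h*. This approximates the max in Eqn. 1 and may miss the positive
    instance when m is small."""
    _, xs = bag.sample(m, rng)
    return int(max(h_star(x) for x in xs))
```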
The query function Q and the associated spaces are problem specific; examples are given in Sec. 2.1. We can express the set of all instances in a bag i as X_i = {x | x = Q(o_i, α), α ∈ A}. Note that unlike in the classic bag model, this set may be potentially infinite. Likewise, we can define the bag label for query bags in a manner analogous to Eqn. 1:

Y_i = max_{α ∈ A} { h*( Q(o_i, α) ) },    (2)

where h* is again an instance classifier that determines the bag labels. In addition, for each bag i, we associate a distribution D_i over A that provides some prior information on where positive instances are likely to be located. Most often no prior knowledge is available and this distribution is uniform. Let p be the probability that h*(Q(o_i, α_ij)) = 1 given that bag i is positive and α_ij ~ D_i. p is a measure of the expected informativeness of D_i and helps determine the difficulty of the resulting MIL problem. Given m independent draws from a positive bag, the probability that all m instances are negative is (1 − p)^m. Thus, to determine a bag's label with confidence δ, we must query m instances, where:

m ≥ log(1 − δ) / log(1 − p)    (3)

(e.g., for p = 0.1 and δ = 0.95 this gives m ≥ 29). Given query bags, we can convert them to a suitable format for standard MIL algorithms by simply querying m instances per bag prior to training. We can define a sample of the i-th bag, X_i = {x_i1, ..., x_im}, where each x_ij is generated using Q(o_i, α_ij), where α_ij ~ D_i. Note that by sampling, there is a chance that for a positive bag, Eqn. 1 will no longer hold. This observation suggests that sampling more instances for each bag is advantageous. We study this in more detail in Sec. 3.

One intuitive interpretation of the query bag model is that all instances in a bag i are noisy observations of some key instance x_i*. Let α_i* = argmax_α { h*( Q(o_i, α) ) } and x_i* = Q(o_i, α_i*). We call x_i* the key to bag i; the key need not be unique and is arbitrary for negative bags (since ∀α, h*(Q(o_i, α)) = 0). Regardless, every instance x_ij in a bag can be written as being some ɛ_ij ∈ A away from the key instance: x_ij = Q(o_i, α_i* + ɛ_ij). If the ɛ_ij were known we could recover the α_i*, and thus the keys x_i*, and the problem would reduce to the standard supervised case. Instead, we can query noisy observations, where D_i is the noise distribution (specifically, ɛ_ij ~ D_i − α_i*).

To make the model more tractable, we assume Q is well behaved. Typically, we expect Q to at least be continuous w.r.t. α ∈ A, and often we can additionally assume that Q is differentiable. In most of the examples that follow, A = R^d and X = R^D where d ≪ D; note that under these conditions, a bag defines a d-dimensional manifold in R^D (cf. Fig. 2). Knowing that Q meets these requirements can be useful. E.g., given a classifier h : R^D → [0, 1] that is also differentiable, we can attempt to find the maximum of h(Q(o_i, α)) w.r.t. α using gradient descent. This also allows us to query the oracle for nearby instances. Given x_ij = Q(o_i, α_ij), we can query an instance x'_ij = Q(o_i, α_ij + ɛ) such that for small ɛ ∈ R^d, ||x_ij − x'_ij||_2 is small (see Sec. 4).

Figure 2: Data generation model.

Datasets where instances lie on low dimensional manifolds have been studied extensively and many interesting techniques have been proposed to take advantage of their structure. These range from techniques for estimating distances [22] to algorithms that traverse the manifold during training [9]. We draw inspiration from this work and propose some effective filtering strategies that take advantage of the manifold structure in Sec. 4.

2.1 Examples of Query Bags

We now consider some concrete, illustrative examples of query bags. These examples are by no means comprehensive; rather, they are meant to clarify the model and allow us to perform a number of carefully controlled experiments in Sec. 3. For each model we define Q : O × A → X, and also O, A and X. Remaining details are deferred to the experiments.

Line Bags: A simple case of bags that lie on a manifold is when instances fall on a line. Let o_i = {u_i, v_i} where u_i, v_i ∈ R^D, A = R^1, and let the query function be Q(o_i, α) = u_i + α v_i. In other words, Q(o_i, ·) defines a line that extends in direction v_i and passes through point u_i. Each instance x_ij ∈ X = R^D is a point on that line.

Hypercube Bags: Under this model each bag i is defined to be a hypercube in R^D (or a square in R^2) with side length 2r and center u_i. Let o_i = {u_i, r} where u_i ∈ R^D and r ∈ R, and let A = [−1, 1]^D. The query function is Q(o_i, α) = u_i + α r.
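As a concrete illustration of the two synthetic query functions above, here is a minimal sketch (the function names are ours); it simply evaluates Q(o_i, α) for line and hypercube bags.

```python
import numpy as np

def line_bag_query(obj, alpha):
    """Line bag: o_i = (u_i, v_i), A = R; Q(o_i, alpha) = u_i + alpha * v_i."""
    u, v = obj
    return u + alpha * v

def hypercube_bag_query(obj, alpha):
    """Hypercube bag: o_i = (u_i, r), A = [-1, 1]^D; Q(o_i, alpha) = u_i + alpha * r."""
    u, r = obj
    return u + np.asarray(alpha) * r

# Example: sample 5 instances from a 2D line bag with location prior D_i = N(0, 1).
rng = np.random.default_rng(0)
u, v = np.array([0.5, 0.5]), np.array([1.0, 0.0])
instances = np.array([line_bag_query((u, v), a) for a in rng.normal(0.0, 1.0, size=5)])
```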
Image Translation Bags: Let I represent a large image and x = I(α) a p × p patch cropped from image I centered at location α ∈ R^2. Each image corresponds to a bag and patches cropped from the image are instances; e.g., for face detection, α_i* for a positive bag i would specify the coordinates of a face. More formally, A = R^2, X = R^{p×p}, o_i = I_i and Q(o_i, α) = I_i(α).

The query bag model is appropriate for many typical MIL applications. E.g., it is straightforward to extend image bags, defined above, to include other image transformations, such as rotation and scale change, by increasing the dimension of A to account for all degrees of freedom and updating Q accordingly. Also, recall the various applications discussed in Sec. 1. For both bioinformatics and text processing, A could encode the position and size of a sliding window; for audio processing, A could encode the frequency bandwidth. Further details are omitted for space.

3 Implications of Bag Size

For the classic bag model the notion of bag size is clearly defined by the number of instances in the bag, which stays constant for any given bag. On the other hand, for our query bag model the number of instances in a bag, m, is variable since it depends on the number of instances we decide to sample. Therefore, one interpretation of bag size in our model is how much we choose to sample the bags. Alternatively, we could consider the distribution D_i to define bag size. For example, consider the image bag shown in Fig. 1. D_i in this case could be uniform over the entire image, or over a smaller region around the pedestrian (this depends on the data and is out of our control). The bag in the former case is effectively larger than in the latter case. Next we perform some experiments studying the effects of these interpretations of bag size on the performance of a MIL algorithm.

Figure 3: Error versus m. (TOP) Illustrations of the datasets (Line Bags, Square Bags, MNIST Bags, Classic MIL Bags); for synthetic datasets the black square designates the positive region, and point color/symbol indicates bag membership. (BOTTOM) Equal error rate (EER) vs. m. See text for details.

3.1 Experiments: Error versus m

We begin with a set of experiments that study the performance of a MIL algorithm as we increase the number of instances we sample from each bag. In all experiments we use MIL-BOOST, proposed by Viola et al. [23]. In Sec. 4 we propose some modifications so MIL-BOOST can work directly with the query bag model; however, for the experiments that follow we use the original algorithm. To do this we sample a fixed number of instances m per bag prior to training. We expect the performance of other MIL algorithms to be qualitatively similar. We measure the effect of varying m on 2 synthetic datasets and 1 real dataset: (1) line bags, (2) hypercube bags, and (3) image translation bags (with the MNIST image dataset). Thorough details on how each dataset was generated are given below. All errors reported are bag errors. We estimate the predicted bag label by sampling instances from each bag. The datasets and plots showing performance averaged over 25 trials are shown in the first three columns of Fig. 3. As expected, the results show that as m increases, error decreases. Furthermore, the MNIST results closely resemble the two synthetic cases, suggesting the query bag model correctly captures the properties of such data. Since our model and these experiments suggest that sampling more instances is advantageous, we must consider the computational consequences. In Sec. 4 we propose strategies that allow us to sample many instances while being efficient.

Details for synthetic experiments: The first 2 experiments involve 2D point datasets (X = R^2). In each case h*(x_ij) = 1 if x_ij falls inside a fixed square region (the "positive region"; the black square in Fig. 3, top). For both training and testing, 5 positive and 5 negative bags are used. Decision stumps served as the weak classifiers. Remaining details follow: (1) For each line bag, u_i is sampled uniformly from the positive region for positive bags and from outside it for negative bags; v_i is sampled uniformly from the unit circle (making sure that the resulting line for negative bags does not pass through the positive region). D_i = N(0, 1) for each i.
(2) For each hypercube bag, u_i is sampled uniformly from [0.5, 0.75]^2 for positive bags and from outside of this region (but inside [0, 1]^2) for negative bags. D_i is uniform over A = [−1, 1]^2 and r is set to a fixed value. See Fig. 3, top, for example bags.

Details for MNIST digit experiment: The final experiment was based on a variation of the MNIST handwritten digits [16]. Each digit is originally a 28 × 28 image; we pad each with an 8 pixel border and randomly translate the digit within the resulting image. Each resulting bag I_i contains 256 instances (28 × 28 patches), one of which is the original image (see Fig. 3). No prior knowledge is assumed, and thus D_i is uniform. We arbitrarily labeled bags containing 3's as positive and the rest negative. Decision stumps over Haar features served as the weak classifiers, as in [23]. We use an equal number of positive and negative bags for training, and the rest of the data for testing.

3.2 Experiments: Classic Bags

We now briefly review the behavior of classic bags with respect to bag size. Using the classic bag model, Blum and Kalai proposed a PAC algorithm [5] with sample complexity Õ(D m² / ε²) (where D is the dimensionality of X, m is the number of instances per bag, and ε is the error). Previous results reported sample complexity that included even higher powers of m [2, 17]. All of these results suggest MIL becomes more difficult as the bag size m increases. Given the independence assumption these results are intuitive: the larger the bag, the more difficult it is to identify positive instances in positive bags. Note that unlike in our model, here we have no control over the value of m. For comparison, we perform an experiment similar to the ones in the previous section. We generate synthetic classic bags, modeled on the experiment described in Fig. 2 of [19]. We use a setup similar to the previous synthetic experiments; we generate bags by uniformly sampling m points from [0, 1]^2 and assign the bag a positive label if any of these points fall into the positive region. The results are shown in the last column of Fig. 3; as is predicted by the PAC bounds, the error goes up as bag size increases. The more important point, however, is that this bag model is not appropriate for MIL datasets like the ones described in Sec. 1.

3.3 Experiments: Error versus p

Finally, we study the alternate interpretation of bag size for query bags. Recall that the distributions D_i provide prior information on where positive instances are likely to be located and p quantifies their expected informativeness. Eqn. 3 suggests that for a fixed bag size m, a lower value for p would make learning more difficult, as sampled positive bags are less likely to contain positive instances. To investigate this, we repeat the line bag experiment, setting D_i = N(0, σ) for each i and varying σ. Note that in this particular setup, σ = 0 implies p = 1, and as σ increases p approaches 0 (as the bag becomes "bigger").

Figure 4: Line bags, EER vs. variance σ for several values of m. See text.

Results for multiple values of m and σ are shown in Fig. 4; indeed, increasing σ degrades performance while increasing m improves it. Note that in real applications we can control m, but the value of p is fixed and dependent on the data.

4 Filtering Instances

In the previous sections we saw that sampling more instances for bags improves the performance of a trained classifier. We now present some strategies for reaping the benefits of sampling many instances, while maintaining efficiency. When training a classifier, we would like to minimize the empirical error over bags. Substituting h for h* in Eqn. 2, we can write this as:

ĥ = argmin_h Σ_{i=1}^{n} 1[ max_{α ∈ A} h(Q(o_i, α)) ≠ Y_i ]    (4)

This is a challenging optimization for a number of reasons. One of the key difficulties is that finding the maximum in terms of α may be intractable, especially if A is infinite. Moreover, for practical reasons we wish to deal with only limited amounts of data at a time. Recently, Bradley and Schapire [6] introduced a boosting algorithm called FilterBoost which learns from a continuous source of data. Boosting is well adapted to this scenario because training happens in stages: weak classifiers are trained sequentially. FilterBoost alternates between training an additional weak classifier and querying the oracle for more data, using the latest version of the overall classifier to evaluate the weights of queried instances.

MILBoost()
Input: {o_1, ..., o_n}, {Y_1, ..., Y_n}, {D_1, ..., D_n}
1: for t = 1 to T do
2:   Call FilterInstances(i) for i = 1 ... n
3:   Compute weights w_ij = ∂L / ∂H_t(x_ij)
4:   Train weak classifier h_t: h_t = argmax_h Σ_ij w_ij h(x_ij)
5:   Find a_t via line search to maximize L: a_t = argmax_a L(H_{t−1} + a h_t)
6:   Update strong classifier: H_t ← H_{t−1} + a_t h_t
7: end for

FilterInstances(i)
Parameters: MEM, SRCH, m, ɛ, R, F
1: if t > 1 and mod(t, F) ≠ 0 then return
2: if MEM and t > 1 then r = m + 1 else r = 1
3: α_ij ~ D_i, x_ij = Q(o_i, α_ij), for j = r ... R
4: Compute p(x_ij) for each j, keep top m instances
5: if SRCH then
6:   for j = 1 : m do
7:     if p( Q(o_i, α_ij + ɛ) ) > p(x_ij) then
8:       α_ij = α_ij + ɛ, x_ij = Q(o_i, α_ij)
9:     end if
10:   end for
11: end if

Inspired by this work, we make an analogous extension of MIL for the case of query bags. In our case, however, we query the oracle for additional instances from a given bag i, rather than requesting additional labeled data (recall that for MIL, labels are required only for bags). Therefore, as opposed to FilterBoost, the number of labels required does not depend on the number of queries to the oracle. We focus on the MIL-BOOST algorithm, which was originally proposed in [23] and extended in [3]. We begin with a brief review of MIL-BOOST, and then describe our extensions to filter instances during training.

4.1 MIL-BOOST Review

MIL-BOOST trains a classifier of the form H(x) = Σ_{t=1}^{T} a_t h_t(x), where h_t : X → {−1, 1} is a weak classifier, such as a decision stump, and a_t is a scalar weight. Using Friedman's gradient boosting framework [14], we can train this classifier to optimize the log likelihood of the data (since 0/1 loss is generally difficult to optimize). For shorthand we define p_i ≡ p(Y_i = 1 | o_i), the probability that bag i is positive. We can write down the log likelihood (defined over bag, not instance, probabilities) as:

L(H) = Σ_{i=1}^{n} [ Y_i log p_i + (1 − Y_i) log(1 − p_i) ]    (5)

We will model the probability of an instance x being positive as p(x) = σ( H(x) ), where σ(v) = (1 + exp{−v})^{−1} is the sigmoid function. What remains is to define the probability of a bag, p_i, as a function of its instances. In our model with query bags we could write this as:

p_i = max_{α ∈ A} { p(Q(o_i, α)) }    (6)

The above is analogous to Eqn. 2. However, as mentioned before, the max over α is difficult to deal with. If we subsample our bag to get m instances {x_i1, ..., x_im}, we can re-write the above as p̂_i = max_{j ∈ {1,...,m}} { p(x_ij) }. Furthermore, we replace the max with a differentiable approximation such as Noisy-OR ([3] proposes several options for this). To optimize Eqn. 5, we perform gradient descent in function space. The MIL-BOOST algorithm is summarized above; for details see [23, 3].

4.2 Filtering Strategies

Previously, we observed that when data obeys the query bag model, having more instances per bag is advantageous. Since the number of candidate instances per bag can be quite large, even infinite, for practical reasons we are limited to a relatively small number of instances per bag for use during training. Instead of using a fixed set, we propose querying the data oracle for new instances during each boosting iteration. In each iteration, we assume we have the computational resources to train the weak classifiers with m instances per bag. Our goal is to optimize Eqn. 5. To get a good estimate of the likelihood, filtering is used to select instances x_ij for each bag that have high probability p(x_ij) given the current classifier H_t = Σ_{k ≤ t} a_k h_k.
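A minimal sketch of the Noisy-OR bag probability and the resulting per-instance boosting weights (step 3 of MILBoost above) is given below. The closed forms follow from differentiating Eqn. 5 under the Noisy-OR model; the code and its function names are ours, not the paper's.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def noisy_or_bag_prob(scores):
    """Differentiable stand-in for the max in Eqn. 6:
    p_i = 1 - prod_j (1 - p(x_ij)), with p(x_ij) = sigma(H(x_ij))."""
    p_inst = sigmoid(np.asarray(scores, dtype=float))
    return p_inst, 1.0 - np.prod(1.0 - p_inst)

def instance_weights(scores, bag_label):
    """Weights w_ij = dL/dH(x_ij) for one bag under Noisy-OR:
    positive bag: w_ij = p(x_ij) * (1 - p_i) / p_i; negative bag: w_ij = -p(x_ij)."""
    p_inst, p_bag = noisy_or_bag_prob(scores)
    if bag_label == 1:
        return p_inst * (1.0 - p_bag) / max(p_bag, 1e-12)
    return -p_inst

# Example: in a positive bag, the instance with the highest score H(x_ij)
# receives the largest weight when training the next weak classifier.
w = instance_weights([-2.0, -1.0, 3.0, 0.0], bag_label=1)
```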
For negative bags this is similar to traditional bootstrapping: we want to select the hardest negative examples. For positive bags we want to get the most correct examples. Some cost is incurred for querying the oracle (e.g. cropping a patch out of an image) and evaluating p(x_ij). Assume we have the computational resources to evaluate the probability of R instances per bag and that we can filter instances once in every F iterations of boosting. Given these constraints, we propose the following filtering strategies.

Random Sampling (RAND): The simplest filtering strategy is to query R > m samples and keep the m with the highest probability, resulting in O(nR) classifier evaluations (n is the number of bags). Samples are queried using α_ij ~ D_i followed by x_ij = Q(o_i, α_ij).
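The sketch below illustrates a single RAND filtering step for one bag; it is our own illustrative code (not from the paper) and assumes user-supplied callables for the prior D_i, the query function Q, and the current classifier probability p(x) = σ(H(x)).

```python
import numpy as np

def rand_filter(sample_alpha, query_fn, prob_fn, m, R, rng=np.random.default_rng()):
    """RAND step for one bag: draw R > m fresh locations alpha ~ D_i,
    query x = Q(o_i, alpha), and keep the m highest-probability instances."""
    alphas = [sample_alpha(rng) for _ in range(R)]
    xs = np.array([query_fn(a) for a in alphas])
    probs = np.array([prob_fn(x) for x in xs])
    keep = np.argsort(probs)[-m:]                # indices of the m best instances
    return [alphas[k] for k in keep], xs[keep]

# Example with a 2D line bag Q(o_i, alpha) = u + alpha * v and D_i = N(0, 1):
u, v = np.array([0.5, 0.5]), np.array([1.0, 0.0])
kept_alphas, kept_xs = rand_filter(
    sample_alpha=lambda rng: rng.normal(0.0, 1.0),
    query_fn=lambda a: u + a * v,
    prob_fn=lambda x: 1.0 / (1.0 + np.exp(-x.sum())),  # stand-in classifier p(x)
    m=4, R=16)
```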

Memory (MEM): Instead of sampling a fresh batch of R instances per bag in each iteration, we can retain the m instances from the previous iteration, sample (R − m) new instances, and then keep the best m of the combined set (resulting in O(nR) evaluations as before). The classifier changes between iterations; nevertheless, memory allows high probability instances to accumulate over time.

Search (SRCH): Recall that in many scenarios we expect that given x_ij = Q(o_i, α_ij) and x'_ij = Q(o_i, α_ij + ɛ), ||x_ij − x'_ij||_2 is small for small ɛ ∈ A. Although H need not be smooth in the technical sense, it is likely that |p(x_ij) − p(x'_ij)| is also small for nearby instances. Thus, given a high probability instance x_ij, we can search for a nearby x'_ij such that p(x'_ij) > p(x_ij). A straightforward way to operationalize this search is to test c nearby locations for each instance at offsets {ɛ_1, ..., ɛ_c} ⊂ A, and keep the neighbor with the highest probability. Note that this incurs an additional cost of O(nc) classifier evaluations.

We summarize the three strategies (random sampling, memory, and search) in Algorithm 3. Memory (MEM) and search (SRCH) can be turned on/off, and we can adjust the sampling amount (R) and filtering period (F). We emphasize that more sophisticated filtering strategies could be developed, e.g. a true gradient descent strategy to take advantage of the underlying manifold structure of the instances, as opposed to the steepest descent type search described above. However, our goal is to demonstrate that even simple filtering strategies can result in significant performance gains.

4.3 Filtering Experiments

Figure 5: Plots showing performance for various filtering strategies (RAND, SRCH, MEM, MEM+SRCH, NONE; see text) for the MNIST dataset. In each plot we show the equal error rate (EER) plotted against four different parameters: (A) R, the amount of sampling; (B) m, the number of instances per bag during training (only MEM shown); (C) F, the filtering period (low F = frequent); and (D) T, the number of weak classifiers.

MNIST: For this experiment we used the MNIST handwritten digit dataset described in Sec. 3.1. A = R^2 encodes the 2D pixel coordinates of the center of an image patch, and D_i is set to uniform as we have no prior information as to where the digit may be located. For SRCH we used four values of ɛ = (±1, 0), (0, ±1). Our goal is to measure the effectiveness of various combinations of filtering strategies. We report the equal error rate (EER), the point where false positives equal false negatives, repeating each experiment 10-25 times depending on the measured variance, and plotting the averaged results with standard deviation bars. We measured filtering performance with MEM on/off and SRCH on/off, while sweeping through the parameters R (sampling amount), m (bag size), F (filtering period) and T (number of boosting iterations). In each experiment we keep 3 of the 4 parameters fixed at their default values of 16, 4, 1, 64, respectively, and sweep through the fourth. Where appropriate, we also include performance without filtering (NONE). Results are shown in Fig. 5.

In Fig. 5(A) we show the effect of altering R, the number of instances queried per bag per iteration. Both MEM strategies converge to low error when R = 8, while the other strategies take up to R = 128. SRCH is beneficial for random sampling but makes a smaller difference when MEM is turned on. Fig. 5(B) shows the advantage of filtering (MEM) versus sampling bags and keeping them fixed during training.
Both strategies converge to low error as m increases; however, with filtering only an eighth of the instances (m = 8 vs. m = 64) are required during training. Fig. 5(C) shows in more detail the effects of the filtering period F (lower values of F result in more frequent sampling). Frequent filtering is beneficial for all strategies; however, the improvement is most significant for the MEM strategies (see Sec. 4.4 for discussion). In Fig. 5(D) we plot error vs. the number of boosting iterations T. These results are particularly interesting, as without MEM the error actually increases for large values of T. This is true also for training error (not shown). This behavior is counterintuitive as boosting is known to have excellent generalization, with error decreasing as T increases [13]. We observed that this behavior is not as severe given larger R (results not shown). Essentially, as the classifier becomes more refined its response becomes more peaked, and random sampling with low R may not yield any high probability instances for a given bag, thus preventing the classifier from converging. Using memory alleviates this problem by guaranteeing high probability instances are retained (more details in Sec. 4.4).

INRIA Pedestrians: The MNIST dataset is convenient because it is not particularly large or difficult, and it allowed us to study the behavior of the algorithm in a well controlled manner. To ensure that the general trends of this behavior hold for a more challenging dataset, we repeat similar experiments with the INRIA pedestrian dataset [8]. This dataset is currently a standard benchmark for pedestrian detection, and is used to evaluate many recent state of the art systems (e.g. [12]). We use a setup analogous to the MNIST experiments: we resized the images to be 8 × 4 pixels in size, and then padded both the negative and positive images by 3 pixels in each direction by replicating the border. The number of possible instances for each bag is therefore very large. Unlike the MNIST dataset, here it would be impossible to sample the bags exhaustively and store this in memory². We use 5 positive and 5 negative bags for training. The default parameters used are as follows: T = 128, F = 8, m = 4, R = 16. Results are shown in Fig. 6. We see that the general trends we observed with the MNIST data appear in these results as well.

Figure 6: Plots showing performance for various filtering strategies (RAND, SRCH, MEM, MEM+SRCH, NONE; see text) for the INRIA dataset. In each plot we show the miss rate at a false positive rate of 1% plotted against two parameters: (A) R, the amount of sampling; (B) m, the number of instances per bag during training.

4.4 Analysis

We now consider possible explanations for the results discussed above. Let L_m(H) be defined as in Eqn. 5, but with p̂_i replacing p_i. It is easy to show that as m → ∞, L_m(H) → L(H) for a fixed classifier H. This follows from the fact that p̂_i → p_i. This yet again suggests that training with more sampled instances is advantageous. Now let us consider some of the filtering strategies proposed above, momentarily assuming the classifier H is fixed during training (obviously this is not the case in the algorithm). For RAND, we are drawing novel instances in each iteration, and there is no guarantee that at time step t + 1 our estimate of the likelihood L_m(H) would be any closer to L(H) than at time t. With MEM, however, during every time step we are effectively evaluating an additional (R − m) new instances per bag. Therefore, as t → ∞, L_m(H) → L(H). In practice, however, at every time step t the classifier H changes slightly. This makes it difficult to prove convergence for MIL-BOOST with filtering. In particular, note that for two different classifiers H and H̃, L_m(H) > L_m(H̃) does not imply L(H) > L(H̃). In other words, if the boosting algorithm finds a local maximum of L_m, it is not necessarily a local maximum of the true log likelihood L. However, the above analysis provides some intuition as to which filtering strategies should work; in particular, it helps explain why MEM is essential.
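The following toy computation (ours, with an arbitrary one-dimensional bag and classifier) illustrates the two claims above: the sampled bag probability p̂_i approaches p_i as m grows, and a MEM-style running maximum keeps improving over filtering rounds even with small R.

```python
import numpy as np

rng = np.random.default_rng(0)
p = lambda a: 1.0 / (1.0 + np.exp(-(3.0 - 10.0 * np.abs(a - 0.7))))  # peaked p(Q(o_i, alpha))
true_p = p(0.7)                      # p_i = max over alpha (attained at alpha = 0.7)

# p_hat = max over m i.i.d. draws alpha ~ D_i (uniform) approaches p_i as m grows.
for m in (1, 4, 16, 64):
    est = np.mean([p(rng.uniform(0, 1, m)).max() for _ in range(1000)])
    print(f"m = {m:3d}   E[p_hat] = {est:.3f}   (p_i = {true_p:.3f})")

# MEM: retaining the best instance seen so far accumulates over rounds (R = 4).
best = 0.0
for t in range(20):
    best = max(best, p(rng.uniform(0, 1, 4)).max())
print(f"MEM running estimate after 20 rounds: {best:.3f}")
```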
5 Conclusions

In this work we argued that the majority of MIL applications in recent literature have diverged from the assumptions and data generation models associated with the original MIL formulation. We presented the query bag model, which more accurately fits the data in these applications. We showed that sampling more instances for each bag is advantageous and proposed a number of filtering strategies for dealing with a large number of samples. These strategies open the door to effectively dealing with a range of MIL applications in computer vision, audition, text, bioinformatics, and other domains that previously required heavy sampling of bags and thus resulted in suboptimal performance. We envision developing more sophisticated techniques for specific domains, thus extending the effectiveness and applicability of the MIL framework.

² Storing the 25-dimensional feature vectors for this many instances would require over 6GB of memory.

References

[1] S. Andrews, T. Hofmann, and I. Tsochantaridis. Multiple instance learning with generalized support vector machines. A.I., 2002.
[2] P. Auer. On learning from multi-instance examples: Empirical evaluation of a theoretical approach. In ICML, 1997.
[3] B. Babenko, P. Dollár, Z. Tu, and S. Belongie. Simultaneous learning and alignment: Multi-instance and multi-pose learning. In Faces in Real-Life Images, 2008.
[4] J. Bi, Y. Chen, and J. Wang. A sparse support vector machine approach to region-based image categorization. In CVPR, 2005.
[5] A. Blum and A. Kalai. A note on learning from multiple-instance examples. Machine Learning, 30(1):23-29, 1998.
[6] J. Bradley and R. Schapire. FilterBoost: Regression and classification on large datasets. In NIPS, 2007.
[7] R. Bunescu and R. Mooney. Multiple instance learning for sparse positive bags. In ICML, 2007.
[8] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[9] T. Dietterich, A. Jain, R. Lathrop, and T. Lozano-Perez. A comparison of dynamic reposing and tangent distance for drug activity prediction. In NIPS, 1994.
[10] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Perez. Solving the multiple-instance problem with axis-parallel rectangles. A.I., 1997.
[11] D. Dooly, S. Goldman, and S. Kwek. Real-valued multiple-instance learning with queries. Journal of Computer and System Sciences, 72:1-15, 2006.
[12] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR, 2008.
[13] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:119-139, 1997.
[14] J. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189-1232, 2001.
[15] J. D. Keeler, D. E. Rumelhart, and W. K. Leow. Integrated segmentation and recognition of hand-printed numerals. In NIPS, 1990.
[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[17] P. Long and L. Tan. PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples. Machine Learning, 30(1):7-21, 1998.
[18] M. Mandel and D. Ellis. Multiple-instance learning for music information retrieval. In ISMIR, 2008.
[19] O. Maron and T. Lozano-Perez. A framework for multiple-instance learning. In NIPS, 1998.
[20] S. Ray and M. Craven. Supervised versus multiple instance learning: an empirical comparison. In ICML, 2005.
[21] L. K. Saul, M. G. Rahim, and J. B. Allen. A statistical model for robust integration of narrowband cues in speech. Computer Speech and Language, 15, 2001.
[22] P. Simard, Y. LeCun, and J. Denker. Efficient pattern recognition using a new transformation distance. In NIPS, 1993.
[23] P. Viola, J. C. Platt, and C. Zhang. Multiple instance boosting for object detection. In NIPS, 2005.
[24] Q. Zhang and S. Goldman. EM-DD: An improved multiple-instance learning technique. In NIPS, volume 14, 2002.


More information

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding IEEE TRANSACTIONS ON INFORMATION THEORY (SUBMITTED PAPER) 1 Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding Lai Wei, Student Meber, IEEE, David G. M. Mitchell, Meber, IEEE, Thoas

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence Best Ar Identification: A Unified Approach to Fixed Budget and Fixed Confidence Victor Gabillon Mohaad Ghavazadeh Alessandro Lazaric INRIA Lille - Nord Europe, Tea SequeL {victor.gabillon,ohaad.ghavazadeh,alessandro.lazaric}@inria.fr

More information

Weighted- 1 minimization with multiple weighting sets

Weighted- 1 minimization with multiple weighting sets Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University

More information

Transformation-invariant Collaborative Sub-representation

Transformation-invariant Collaborative Sub-representation Transforation-invariant Collaborative Sub-representation Yeqing Li, Chen Chen, Jungzhou Huang Departent of Coputer Science and Engineering University of Texas at Arlington, Texas 769, USA. Eail: yeqing.li@avs.uta.edu,

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

Support recovery in compressed sensing: An estimation theoretic approach

Support recovery in compressed sensing: An estimation theoretic approach Support recovery in copressed sensing: An estiation theoretic approach Ain Karbasi, Ali Horati, Soheil Mohajer, Martin Vetterli School of Coputer and Counication Sciences École Polytechnique Fédérale de

More information

3.8 Three Types of Convergence

3.8 Three Types of Convergence 3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to

More information

arxiv: v3 [cs.lg] 7 Jan 2016

arxiv: v3 [cs.lg] 7 Jan 2016 Efficient and Parsionious Agnostic Active Learning Tzu-Kuo Huang Alekh Agarwal Daniel J. Hsu tkhuang@icrosoft.co alekha@icrosoft.co djhsu@cs.colubia.edu John Langford Robert E. Schapire jcl@icrosoft.co

More information

Ph 20.3 Numerical Solution of Ordinary Differential Equations

Ph 20.3 Numerical Solution of Ordinary Differential Equations Ph 20.3 Nuerical Solution of Ordinary Differential Equations Due: Week 5 -v20170314- This Assignent So far, your assignents have tried to failiarize you with the hardware and software in the Physics Coputing

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

Domain-Adversarial Neural Networks

Domain-Adversarial Neural Networks Doain-Adversarial Neural Networks Hana Ajakan, Pascal Gerain 2, Hugo Larochelle 3, François Laviolette 2, Mario Marchand 2,2 Départeent d inforatique et de génie logiciel, Université Laval, Québec, Canada

More information

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy Storage Capacity and Dynaics of Nononotonic Networks Bruno Crespi a and Ignazio Lazzizzera b a. IRST, I-38050 Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I-38050 Povo (Trento) Italy INFN Gruppo

More information

DERIVING PROPER UNIFORM PRIORS FOR REGRESSION COEFFICIENTS

DERIVING PROPER UNIFORM PRIORS FOR REGRESSION COEFFICIENTS DERIVING PROPER UNIFORM PRIORS FOR REGRESSION COEFFICIENTS N. van Erp and P. van Gelder Structural Hydraulic and Probabilistic Design, TU Delft Delft, The Netherlands Abstract. In probles of odel coparison

More information

arxiv: v2 [cs.lg] 30 Mar 2017

arxiv: v2 [cs.lg] 30 Mar 2017 Batch Renoralization: Towards Reducing Minibatch Dependence in Batch-Noralized Models Sergey Ioffe Google Inc., sioffe@google.co arxiv:1702.03275v2 [cs.lg] 30 Mar 2017 Abstract Batch Noralization is quite

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5,

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, 2015 31 11 Motif Finding Sources for this section: Rouchka, 1997, A Brief Overview of Gibbs Sapling. J. Buhler, M. Topa:

More information

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations Randoized Accuracy-Aware Progra Transforations For Efficient Approxiate Coputations Zeyuan Allen Zhu Sasa Misailovic Jonathan A. Kelner Martin Rinard MIT CSAIL zeyuan@csail.it.edu isailo@it.edu kelner@it.edu

More information

HIGH RESOLUTION NEAR-FIELD MULTIPLE TARGET DETECTION AND LOCALIZATION USING SUPPORT VECTOR MACHINES

HIGH RESOLUTION NEAR-FIELD MULTIPLE TARGET DETECTION AND LOCALIZATION USING SUPPORT VECTOR MACHINES ICONIC 2007 St. Louis, O, USA June 27-29, 2007 HIGH RESOLUTION NEAR-FIELD ULTIPLE TARGET DETECTION AND LOCALIZATION USING SUPPORT VECTOR ACHINES A. Randazzo,. A. Abou-Khousa 2,.Pastorino, and R. Zoughi

More information

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution Testing approxiate norality of an estiator using the estiated MSE and bias with an application to the shape paraeter of the generalized Pareto distribution J. Martin van Zyl Abstract In this work the norality

More information

Birthday Paradox Calculations and Approximation

Birthday Paradox Calculations and Approximation Birthday Paradox Calculations and Approxiation Joshua E. Hill InfoGard Laboratories -March- v. Birthday Proble In the birthday proble, we have a group of n randoly selected people. If we assue that birthdays

More information

Analyzing Simulation Results

Analyzing Simulation Results Analyzing Siulation Results Dr. John Mellor-Cruey Departent of Coputer Science Rice University johnc@cs.rice.edu COMP 528 Lecture 20 31 March 2005 Topics for Today Model verification Model validation Transient

More information

lecture 36: Linear Multistep Mehods: Zero Stability

lecture 36: Linear Multistep Mehods: Zero Stability 95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

arxiv: v1 [cs.cv] 28 Aug 2015

arxiv: v1 [cs.cv] 28 Aug 2015 Discrete Hashing with Deep Neural Network Thanh-Toan Do Anh-Zung Doan Ngai-Man Cheung Singapore University of Technology and Design {thanhtoan do, dung doan, ngaian cheung}@sutd.edu.sg arxiv:58.748v [cs.cv]

More information