Examples are not Enough, Learn to Criticize! Criticism for Interpretability


Been Kim (Allen Institute for AI), Rajiv Khanna (UT Austin), Oluwasanmi Koyejo (UIUC). All authors contributed equally.

Abstract

Example-based explanations are widely used in the effort to improve the interpretability of highly complex distributions. However, prototypes alone are rarely sufficient to represent the gist of the complexity. In order for users to construct better mental models and understand complex data distributions, we also need criticism to explain what is not captured by prototypes. Motivated by the Bayesian model criticism framework, we develop MMD-critic, which efficiently learns prototypes and criticism, designed to aid human interpretability. A human subject pilot study shows that MMD-critic selects prototypes and criticism that are useful to facilitate human understanding and reasoning. We also evaluate the prototypes selected by MMD-critic via a nearest prototype classifier, showing competitive performance compared to baselines.

1 Introduction and Related Work

As machine learning (ML) methods have become ubiquitous in human decision making, their transparency and interpretability have grown in importance (Varshney, 2016). Interpretability is particularly important in domains where decisions can have significant consequences. For example, the pneumonia risk prediction case study in Caruana et al. (2015) showed that a more interpretable model could reveal important but surprising patterns in the data that complex models overlooked. Studies of human reasoning have shown that the use of examples (prototypes) is fundamental to the development of effective strategies for tactical decision-making (Newell and Simon, 1972; Cohen et al., 1996).

Example-based explanations are widely used in the effort to improve interpretability. A popular research program along these lines is case-based reasoning (CBR) (Aamodt and Plaza, 1994), which has been successfully applied to real-world problems (Bichindaritz and Marling, 2006). More recently, the Bayesian framework has been combined with CBR-based approaches in the unsupervised-learning setting, leading to improvements in user interpretability (Kim et al., 2014). In a supervised learning setting, example-based classifiers have been shown to achieve performance comparable to non-interpretable methods, while offering a condensed view of a dataset (Bien and Tibshirani, 2011).

However, examples are not enough. Relying only on examples to explain a model's behavior can lead to over-generalization and misunderstanding. Examples alone may be sufficient when the distribution of the data points is clean, in the sense that there exists a set of prototypical examples which sufficiently represent the data. However, this is rarely the case in real-world data. For instance, fitting models to complex datasets often requires the use of regularization. While the regularization adds bias to the model to improve generalization performance, this same bias may conflict with the distribution of the data.

29th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Thus, to maintain interpretability, it is important, along with prototypical examples, to deliver insights signifying the parts of the input space where prototypical examples do not provide good explanations. We call the data points that do not quite fit the model criticism samples. Together with prototypes, criticism can help humans build a better mental model of the complex data space.

Bayesian model criticism (BMC) is a framework for evaluating fitted Bayesian models, and was developed to aid model development and selection by helping to identify where and how a particular model may fail to explain the data. It has quickly developed into an important part of model design, and Bayesian statisticians now view model criticism as an important component in the cycle of model construction, inference and criticism (Gelman et al., 2014). Lloyd and Ghahramani (2015) recently proposed an exploratory approach for statistical model criticism using the maximum mean discrepancy (MMD) two sample test, and explored the use of the witness function to identify the portions of the input space where the model most misrepresents the data. Instead of using the MMD to compare two models as in classic two sample testing (Gretton et al., 2008), or to compare the model to input data as in the Bayesian model criticism of Lloyd and Ghahramani (2015), we consider a novel application of the MMD, and its associated witness function, as a principled approach for selecting prototype and criticism samples.

We present MMD-critic, a scalable framework for prototype and criticism selection to improve the interpretability of machine learning methods. To the best of our knowledge, ours is the first work which leverages the BMC framework to generate explanations for machine learning methods. MMD-critic uses the MMD statistic as a measure of similarity between points and potential prototypes, and efficiently selects prototypes that maximize the statistic. In addition to prototypes, MMD-critic selects criticism samples, i.e. samples that are not well-explained by the prototypes, using a regularized witness function score. The scalability follows from our analysis, where we show that under certain conditions, the MMD for prototype selection is a supermodular set function. Our supermodularity proof is general and may be of independent interest. While we are primarily concerned with prototype selection and criticism, we quantitatively evaluate the performance of MMD-critic as a nearest prototype classifier, and show that it achieves performance comparable to existing methods. We also present results from a human subject pilot study which shows that including criticism together with prototypes is helpful for an end-task that requires the data distributions to be well-explained.

2 Preliminaries

This section includes notation and a few important definitions. Vectors are denoted by lower case x and matrices by capital X. The Euclidean inner product between matrices A and B is given by ⟨A, B⟩ = Σ_{i,j} a_{i,j} b_{i,j}. Let det(X) denote the determinant of X. Sets are denoted by sans serif, e.g. S. The reals are denoted by R. [n] denotes the set of integers {1, ..., n}, and 2^V denotes the power set of V. The indicator function 1_[a] takes the value of 1 if its argument a is true and is 0 otherwise. We denote probability distributions by either P or Q. The notation |·| denotes cardinality when applied to sets, and absolute value when applied to real values.

2.1 Maximum Mean Discrepancy (MMD)

The maximum mean discrepancy (MMD) is a measure of the difference between distributions P and Q, given by the supremum over a function space F of differences between the expectations with respect to the two distributions. The MMD is given by:

MMD(F, P, Q) = sup_{f ∈ F} ( E_{X ∼ P}[f(X)] − E_{Y ∼ Q}[f(Y)] ).   (1)

When F is a reproducing kernel Hilbert space (RKHS) with kernel function k : X × X → R, the supremum is achieved at (Gretton et al., 2008):

f(x) = E_{X' ∼ P}[k(x, X')] − E_{X' ∼ Q}[k(x, X')].   (2)

The function (2) is also known as the witness function, as it measures the maximum discrepancy between the two expectations in F. Observe that the witness function is positive wherever Q underfits the density of P, and negative wherever Q overfits P. We can substitute (2) into (1) and square the result, leading to:

MMD^2(F, P, Q) = E_{X,X' ∼ P}[k(X, X')] − 2 E_{X ∼ P, Y ∼ Q}[k(X, Y)] + E_{Y,Y' ∼ Q}[k(Y, Y')].   (3)
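The step from (1)-(2) to (3) can be made explicit via the kernel mean embedding; the following short derivation is a standard argument added here for completeness (the mean embedding notation μ_P is ours, not the paper's). Writing μ_P(·) = E_{X∼P}[k(·, X)],

$$ \mathrm{MMD}(\mathcal{F},P,Q) = \sup_{\|f\|_{\mathcal{F}} \le 1} \langle f,\; \mu_P - \mu_Q \rangle_{\mathcal{F}} = \|\mu_P - \mu_Q\|_{\mathcal{F}}, $$

and squaring and expanding the norm,

$$ \mathrm{MMD}^2(\mathcal{F},P,Q) = \langle \mu_P,\mu_P\rangle - 2\langle \mu_P,\mu_Q\rangle + \langle \mu_Q,\mu_Q\rangle = \mathbb{E}_{X,X'\sim P}[k(X,X')] - 2\,\mathbb{E}_{X\sim P,\,Y\sim Q}[k(X,Y)] + \mathbb{E}_{Y,Y'\sim Q}[k(Y,Y')], $$

where the last equality uses the reproducing property ⟨μ_P, μ_Q⟩ = E_{X∼P, Y∼Q}[k(X, Y)]. The unnormalized maximizer μ_P − μ_Q is exactly the witness function (2).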

It is clear that MMD^2(F, P, Q) ≥ 0, and MMD^2(F, P, Q) = 0 iff P is indistinguishable from Q on the RKHS F. This population definition can be approximated using sample expectations. In particular, given n samples from P as X = {x_i ∼ P, i ∈ [n]}, and m samples from Q as Z = {z_i ∼ Q, i ∈ [m]}, the following is a finite sample approximation:

MMD^2_b(F, X, Z) = (1/n^2) Σ_{i,j ∈ [n]} k(x_i, x_j) − (2/(nm)) Σ_{i ∈ [n], j ∈ [m]} k(x_i, z_j) + (1/m^2) Σ_{i,j ∈ [m]} k(z_i, z_j),   (4)

and the witness function is approximated as:

f(x) = (1/n) Σ_{i ∈ [n]} k(x, x_i) − (1/m) Σ_{j ∈ [m]} k(x, z_j).   (5)

3 MMD-critic for Prototype Selection and Criticism

Given n samples from a statistical model X = {x_i, i ∈ [n]}, let S ⊆ [n] represent a subset of the indices, so that X_S = {x_i | i ∈ S}. Given an RKHS with the kernel function k(·,·), we can measure the maximum mean discrepancy between the samples and any selected subset using MMD^2(F, X, X_S). MMD-critic selects prototype indices S which minimize MMD^2(F, X, X_S). For our purposes, it will be convenient to pose the problem as a normalized discrete maximization. To this end, consider the following cost function, given by the negation of MMD^2(F, X, X_S) with an additive bias:

J_b(S) = (1/n^2) Σ_{i,j=1}^{n} k(x_i, x_j) − MMD^2(F, X, X_S)
       = (2/(n|S|)) Σ_{i ∈ [n], j ∈ S} k(x_i, x_j) − (1/|S|^2) Σ_{i,j ∈ S} k(x_i, x_j).   (6)

Note that the additive bias (1/n^2) Σ_{i,j=1}^{n} k(x_i, x_j) is a constant with respect to S. Further, J_b(S) is normalized, since, when evaluated on the empty set, we have that:

J_b(∅) = min_{S ∈ 2^[n]} J_b(S) = (1/n^2) Σ_{i,j=1}^{n} k(x_i, x_j) − (1/n^2) Σ_{i,j=1}^{n} k(x_i, x_j) = 0.

MMD-critic selects prototypes as the subset of indices S ⊆ [n] which optimize:

max_{S ∈ 2^[n], |S| ≤ m*} J_b(S),   (7)

where m* is the number of prototypes to be selected. For the purposes of optimizing the cost function (6), it will prove useful to exploit its linearity with respect to the kernel entries. The following Lemma is easily shown by enumeration.

Lemma 1. Let J_b(·) be defined as in (6); then J_b(·) is a linear function of the kernel entries k(x_i, x_j). In particular, define K ∈ R^{n×n} with k_{i,j} = k(x_i, x_j), and A(S) ∈ R^{n×n} with entries a_{i,j}(S) = (2/(n|S|)) 1_[j ∈ S] − (1/|S|^2) 1_[i ∈ S] 1_[j ∈ S]; then J_b(S) = ⟨A(S), K⟩.

3.1 Submodularity and Efficient Prototype Selection

While the discrete optimization problem (6) may be quite complicated to optimize, we show that the cost function J_b(S) is monotone submodular under conditions on the kernel matrix which are often satisfied in practice, and which can be easily checked given a kernel matrix. Based on this result, we describe the greedy forward selection algorithm for efficient prototype selection.

Let F : 2^[n] → R represent a set function. F is normalized if F(∅) = 0. F is monotone if for all subsets U ⊆ V ⊆ [n] it holds that F(U) ≤ F(V). F is submodular if for all subsets U, V ∈ 2^[n] it holds that F(U ∪ V) + F(U ∩ V) ≤ F(U) + F(V). Submodular functions have a diminishing returns property (Nemhauser et al., 1978), i.e. the marginal gain of adding elements decreases with the size of the set. When F is submodular, −F is supermodular (and vice versa).
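For concreteness, the finite-sample quantities (4)-(5) and the prototype objective (6) can be computed directly from a kernel matrix. The following NumPy sketch is ours and uses illustrative names (rbf_kernel, mmd2_b, witness, J_b); it assumes the RBF kernel used later in Section 5 and is not part of any released implementation.

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2), evaluated for all pairs of rows of A and B
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def mmd2_b(X, Z, gamma=1.0):
    # biased finite-sample estimate (4) of MMD^2 between samples X and Z
    return (rbf_kernel(X, X, gamma).mean()
            - 2.0 * rbf_kernel(X, Z, gamma).mean()
            + rbf_kernel(Z, Z, gamma).mean())

def witness(Xq, X, Z, gamma=1.0):
    # empirical witness function (5) at the query points Xq:
    # positive where Z under-represents X, negative where it over-represents X
    return rbf_kernel(Xq, X, gamma).mean(1) - rbf_kernel(Xq, Z, gamma).mean(1)

def J_b(K, S):
    # prototype objective (6), from a precomputed kernel matrix K and a list of prototype indices S
    n, m = K.shape[0], len(S)
    return 2.0 / (n * m) * K[:, S].sum() - 1.0 / m**2 * K[np.ix_(S, S)].sum()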

We prove submodularity for a larger class of problems, then show submodularity of (6) as a special case. Our proof for the larger class may be of independent interest. In particular, the following Theorem considers general discrete optimization problems which are linear matrix functionals, and shows sufficient conditions on the matrix for the problem to be monotone and/or submodular.

Theorem 2 (Monotone Submodularity for Linear Forms). Let H ∈ R^{n×n} (not necessarily symmetric) be element-wise non-negative and bounded, with upper bound h = max_{i,j ∈ [n]} h_{i,j} > 0. Further, construct the binary matrix representation of the indices that achieve the maximum as E ∈ {0,1}^{n×n}, with e_{i,j} = 1 if h_{i,j} = h and e_{i,j} = 0 otherwise, and its complement Ē = 1 − E with the corresponding set Ē = {(i, j) s.t. e_{i,j} = 0}. Given the ground set 𝒮 ⊆ 2^[n], consider the linear form F(H, S) = ⟨A(S), H⟩, S ∈ 𝒮. Given m = |S|, define the functions:

α(n, m) = (a(S ∪ {u}) − a(S)) / b(S),
β(n, m) = (a(S ∪ {u}) + a(S ∪ {v}) − a(S ∪ {u, v}) − a(S)) / (b(S ∪ {u, v}) + b(S)),   (8)

where a(S) = F(E, S), b(S) = F(Ē, S), for elements u, v ∉ S (additional notation suppressed in α(·) and β(·) for clarity). Let m* = max_{S ∈ 𝒮} |S| be the maximal cardinality of any element in the ground set.

1. If h_{i,j} ≤ h α(n, m*) ∀ (i, j) ∈ Ē, then F(H, S) is monotone.
2. If h_{i,j} ≤ h β(n, m*) ∀ (i, j) ∈ Ē, then F(H, S) is submodular.

Finally, we consider a special case of Theorem 2 for the MMD.

Corollary 3 (Monotone Submodularity for MMD). Let the kernel matrix K ∈ R^{n×n} be element-wise non-negative, with equal diagonal terms k_{i,i} = k > 0 ∀ i ∈ [n], and be diagonally dominant. If the off-diagonal terms k_{i,j}, i, j ∈ [n], i ≠ j, satisfy 0 ≤ k_{i,j} ≤ k / (n^3 + 2n^2 − 2n − 3), then J_b(S) given by (6) is monotone submodular.

The diagonal dominance condition expressed by Corollary 3 is easy to check given a kernel matrix. We also note that the conditions can be significantly weakened if one determines the required number of prototypes m* = max_S |S| ≤ n a-priori. This is further simplified for the MMD since the bounds (8) are both monotonically decreasing functions of m, so the condition need only be checked for m*. Observe that diagonal dominance is not a necessary condition, as the more general approach in Theorem 2 allows arbitrarily indexed maximal entries in the kernel. Diagonal dominance is assumed to simplify the resulting expressions. Perhaps more important to practice is our observation that the diagonal dominance condition expressed by Corollary 3 is satisfied by parametrized kernels with appropriately selected parameters. We provide an example for radial basis function (RBF) kernels and powers of positive standardized kernels. Further examples and more general conditions are left for future work.

Example 4 (Radial Basis Function Kernel). Consider the radial basis function kernel K with entries k_{i,j} = k(x_i, x_j) = exp(−γ ‖x_i − x_j‖^2), evaluated on a sample X with non-duplicate points, i.e. x_i ≠ x_j ∀ x_i, x_j ∈ X. The off-diagonal kernel entries k_{i,j}, i ≠ j, monotonically decrease with respect to increasing γ. Thus, there exists γ* such that Corollary 3 is satisfied for all γ ≥ γ*.

Example 5 (Powers of Positive Standardized Kernels). Consider an element-wise positive kernel matrix G standardized to be element-wise bounded 0 ≤ g_{i,j} < 1 with unitary diagonal g_{i,i} = 1 ∀ i ∈ [n]. Define the kernel power K with k_{i,j} = g_{i,j}^p. The off-diagonal kernel entries k_{i,j}, i ≠ j, monotonically decrease with respect to increasing p. Thus, there exists p* such that Corollary 3 is satisfied for all p ≥ p*.

Beyond the examples outlined here, similar conditions can be enumerated for a wide range of parametrized kernel functions, and are easily checked for model-based kernels, e.g. the Fisher kernel (Jaakkola et al., 1999), useful for comparing data points based on similarity with respect to a probabilistic model. Our interpretation from these examples is that the conditions of Corollary 3 are not excessively restrictive.

While constrained maximization of submodular functions is generally NP-hard, the simple greedy forward selection heuristic has been shown to perform almost as well as the optimum in practice, and is known to have strong theoretical guarantees.

Theorem 6 (Nemhauser et al. (1978)). In the case of any normalized, monotone submodular function F, the set S* obtained by the greedy algorithm achieves at least a constant fraction (1 − 1/e) of the objective value obtained by the optimal solution, i.e. F(S*) ≥ (1 − 1/e) max_{|S| ≤ m*} F(S).
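Before turning to the greedy algorithm itself, the kernel condition above is straightforward to verify in practice. The following sketch (ours, with illustrative names; a simple check under the stated assumptions, not the authors' tooling) tests the Corollary 3 bound for a given kernel matrix, and scans RBF bandwidths as in Example 4.

import numpy as np

def satisfies_corollary3(K):
    # checks 0 <= k_ij <= k / (n^3 + 2 n^2 - 2 n - 3) for all off-diagonal entries,
    # assuming equal diagonal terms k_ii = k
    n = K.shape[0]
    k_diag = K[0, 0]
    off = K[~np.eye(n, dtype=bool)]
    return bool(off.min() >= 0 and off.max() <= k_diag / (n**3 + 2*n**2 - 2*n - 3))

def smallest_feasible_gamma(X, gammas):
    # Example 4: off-diagonal RBF entries shrink as gamma grows, so scan candidate
    # bandwidths and return the first one satisfying the bound (None if none do)
    sq = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    for g in sorted(gammas):
        if satisfies_corollary3(np.exp(-g * sq)):
            return g
    return None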

In addition, no polynomial time algorithm can provide a better approximation guarantee unless P = NP (Feige, 1998). An additional benefit of the greedy approach is that it does not require the number of prototypes to be decided at training time, so, assuming the kernel satisfies the appropriate conditions, training can be stopped at any m* based on computational constraints, while still returning meaningful results. The greedy algorithm is outlined in Algorithm 1.

Algorithm 1 Greedy algorithm: max F(S) s.t. |S| ≤ m*
  Input: m*, S = ∅
  while |S| < m* do
    for each i ∈ [n] \ S: f_i = F(S ∪ {i}) − F(S)
    S = S ∪ {argmax_i f_i}
  end while
  Return: S

3.2 Model Criticism

In addition to selecting prototype samples, MMD-critic characterizes the data points not well explained by the prototypes, which we call the model criticism. These data points are selected as the largest values of the witness function (5), i.e. where the similarity between the dataset and the prototypes deviates the most. Consider the cost function:

L(C) = Σ_{l ∈ C} | (1/n) Σ_{i ∈ [n]} k(x_i, x_l) − (1/|S|) Σ_{j ∈ S} k(x_j, x_l) |.   (9)

The absolute value ensures that we measure both positive deviations f(x) > 0, where the prototypes underfit the density of the samples, and negative deviations f(x) < 0, where the prototypes overfit the density of the samples. Thus, we focus primarily on the magnitude of deviation rather than its sign. The following theorem shows that (9) is a linear function of C.

Theorem 7. The criticism function L(C) is a linear function of C.

We found that the addition of a regularizer which encourages a diverse selection of criticism points improved performance. Let r : 2^[n] → R represent a regularization function. We select the criticism points as the maximizers of the cost function:

max_{C ⊆ [n] \ S, |C| ≤ c*} L(C) + r(K, C),   (10)

where [n] \ S denotes all indices which do not include the prototypes, and c* is the number of criticism points desired. Fortunately, due to the linearity of L(C) (Theorem 7), the objective (10) is submodular when the regularization function is submodular. We encourage the use of regularizers which incorporate diversity into the criticism selection. We found the best qualitative performance using the log-determinant regularizer (Krause et al., 2008). Let K_{C,C} be the sub-matrix of K corresponding to the pairs of indices in C × C; then the log-determinant regularizer is given by:

r(K, C) = log det K_{C,C},   (11)

which is known to be submodular. Further, several researchers have found, both in theory and practice (Sharma et al., 2015), that greedy optimization is an effective strategy for its optimization. We apply the greedy algorithm for criticism selection with the function F(C) = L(C) + r(K, C).
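Putting the two stages together, the following sketch (ours; it restates the J_b helper from the earlier sketch for self-containment and uses illustrative names, so it is an illustration of the procedure rather than a reference implementation) greedily selects prototypes via Algorithm 1 with F = J_b, then greedily selects criticism points by maximizing (9) plus the log-determinant regularizer (11).

import numpy as np

def J_b(K, S):
    # prototype objective (6), as in the earlier sketch
    n, m = K.shape[0], len(S)
    return 2.0 / (n * m) * K[:, S].sum() - 1.0 / m**2 * K[np.ix_(S, S)].sum()

def greedy_prototypes(K, m_star):
    # Algorithm 1 with F = J_b: repeatedly add the index with the largest marginal gain
    S = []
    while len(S) < min(m_star, K.shape[0]):
        current = J_b(K, S) if S else 0.0
        gains = [(J_b(K, S + [i]) - current, i) for i in range(K.shape[0]) if i not in S]
        S.append(max(gains)[1])
    return S

def greedy_criticisms(K, S, c_star):
    # witness magnitudes w_l from (9), given the selected prototypes S
    w = np.abs(K.mean(axis=0) - K[S, :].mean(axis=0))
    def score(C):
        # L(C) + log det K_{C,C}, cf. (10)-(11); the empty set scores zero
        if not C:
            return 0.0
        return w[C].sum() + np.linalg.slogdet(K[np.ix_(C, C)])[1]
    candidates = [i for i in range(K.shape[0]) if i not in S]
    C = []
    while len(C) < min(c_star, len(candidates)):
        gains = [(score(C + [i]) - score(C), i) for i in candidates if i not in C]
        C.append(max(gains)[1])
    return C

For instance, with K = rbf_kernel(X, X, gamma) from the earlier sketch, greedy_prototypes(K, m_star) followed by greedy_criticisms(K, S, c_star) returns prototype and criticism indices into X.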

4 Related Work

There is a large literature on techniques for selecting prototypes that summarize a dataset, and a full literature survey is beyond the scope of this manuscript. Instead, we overview a few of the most relevant references. K-medoid clustering (Kaufman and Rousseeuw, 1987) is a classic technique for selecting a representative subset of data points, and can be solved using various iterative algorithms. K-medoid clustering is quite similar to K-means clustering, with the additional condition that the presented prototypes must be in the dataset. The ubiquity of large datasets has led to a resurgence of interest in the data summarization problem, also known as the set cover problem. Progress has included novel cost functions and algorithms for several domains including image summarization (Simon et al., 2007) and document summarization (Lin and Bilmes, 2011). Recent innovations also include highly scalable and distributed algorithms (Badanidiyuru et al., 2014; Mirzasoleiman et al., 2015). There is also a large literature on variations of the set cover problem tuned for classification, such as the cover digraph approach of Priebe et al. (2003) and prototype selection for interpretable classification (Bien and Tibshirani, 2011), which involves selecting prototypes that maximize the coverage within the class, but minimize the coverage across classes.

Submodular / supermodular functions are well studied in the combinatorial optimization literature, with several scalable algorithms that come with optimization-theoretic optimality guarantees (Nemhauser et al., 1978). In the Bayesian modeling literature, submodular optimization has previously been applied for approximate inference by Koyejo et al. (2014). The technical conditions required for submodularity of (6) are due to the averaging of the kernel similarity scores, as the average requires a division by the cardinality |S|. In particular, the analogue of (6) which replaces all the averages by sums (i.e. removes all division by |S|) is equivalent to the well known submodular functions previously used for scene (Simon et al., 2007) and document (Lin and Bilmes, 2011) summarization, given by:

(2/n) Σ_{i ∈ [n], j ∈ S} k(x_i, x_j) − λ Σ_{i,j ∈ S} k(x_i, x_j),

where λ > 0 is a regularization parameter. The function that results is known to be submodular when the kernel is element-wise positive, i.e. without the need for additional diagonal dominance conditions. On the other hand, the averaging has a desirable built-in balancing effect: when using the sum, practitioners must tune the additional regularization parameter λ to achieve a similar balance.

5 Results

We present results for the proposed technique MMD-critic using the USPS handwritten digits (Hull, 1994) and Imagenet (Deng et al., 2009) datasets. We quantitatively evaluate the prototypes in terms of predictive quality as compared to related baselines on the USPS handwritten digits dataset. We also present preliminary results from a human subject pilot study. Our results suggest that the model criticism, which is unique to the proposed MMD-critic, is especially useful to facilitate human understanding. For all datasets, we employed the radial basis function (RBF) kernel with entries k_{i,j} = k(x_i, x_j) = exp(−γ ‖x_i − x_j‖^2), which satisfies the conditions of Corollary 3 for sufficiently large γ (c.f. Example 4; see Example 5 and the following discussion for alternative feasible kernels).

The Nearest Prototype Classifier: While our primary interest is in interpretable prototype selection and criticism, prototypes may also be useful for speeding up memory-based machine learning techniques such as the nearest neighbor classifier, by restricting the neighbor search to the prototypes, sometimes known as the nearest prototype classifier (Bien and Tibshirani, 2011; Kuncheva and Bezdek, 1998). This classification provides an objective (although indirect) evaluation of the quality of the selected prototypes, and is useful for setting hyperparameters. We employ a 1-nearest-neighbor classifier using the Hilbert space distance induced by the kernels. Let y_i ∈ [k] denote the label associated with each prototype i ∈ S, for k classes. As we employ normalized kernels (where the diagonal is 1), it is sufficient to measure the pairwise kernel similarity. Thus, for a test point x̂, the nearest neighbor classifier reduces to:

ŷ = y_i, where i = argmin_{i ∈ S} ‖x̂ − x_i‖^2_{H_K} = argmax_{i ∈ S} k(x̂, x_i).
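In code, the reduction above is a one-liner given the kernel similarities between test points and prototypes (a sketch with illustrative names; rbf_kernel is the helper from the earlier sketch and X_test, X_train, y_train, S are hypothetical variables):

import numpy as np

def nearest_prototype_predict(K_test_proto, proto_labels):
    # K_test_proto[t, i] = k(x_hat_t, x_i) for test point t and prototype i in S;
    # predict the label of the most similar prototype
    return np.asarray(proto_labels)[K_test_proto.argmax(axis=1)]

# e.g. y_hat = nearest_prototype_predict(rbf_kernel(X_test, X_train[S], gamma), y_train[S])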

5.1 MMD-critic Evaluated on the USPS Digits Dataset

The USPS handwritten digits dataset (Hull, 1994) consists of n = 7291 training (and 2007 test) greyscale images of 10 handwritten digits from 0 to 9. We consider two kinds of RBF kernels: (i) global, where the pairwise kernel is computed between all data points, and (ii) local, given by exp(−γ ‖x_i − x_j‖^2) 1_[y_i = y_j], i.e. points in different classes are assigned a similarity score of zero. The local approach has the effect of pushing points in different classes further apart. The kernel hyperparameter γ was chosen to maximize the average cross-validated classification performance, then fixed for all other experiments.

Classification: We evaluated nearest prototype classifiers using MMD-critic, and compared to baselines (and reported performance) from Bien and Tibshirani (2011) (abbreviated as PS) and their implementation of K-medoids.

[Figure 1: Classification error vs. number of prototypes m* = |S| for MMD-global, MMD-local, PS and K-medoids. MMD-critic shows comparable (or improved) performance as compared to the other models (left). Random subset of prototypes and criticism samples from the USPS dataset (right).]

Figure 1 (left) compares MMD-critic with global and local kernels to the baselines for different numbers of selected prototypes m* = |S|. Our results show comparable (or improved) performance as compared to the other models. In particular, we observe that the global kernels out-perform the local kernels [2] by a small margin. We note that MMD is particularly effective at selecting the first few prototypes (i.e. the speed of error reduction as the number of prototypes increases), suggesting its utility for rapidly summarizing the dataset.

Selected Prototypes and Criticism: Fig. 1 (right) presents a randomly selected subset of the prototypes and criticism from MMD-critic using the local kernel. We observe that the prototypes capture many of the common ways of writing digits, while the criticism samples clearly capture outliers.

5.2 Qualitative Measure: Prototypes and Criticisms of Images

In this section, we learn prototypes and criticisms from the Imagenet dataset (Russakovsky et al., 2015) using image embeddings from He et al. (2015). Each image is represented by a 2048-dimensional vector embedding, and each image belongs to one of 1000 categories. We select two breeds of one category (e.g., Blenheim spaniel) and run MMD-critic to learn prototypes and criticisms. As shown in Figure 2, MMD-critic learns reasonable prototypes and criticisms for the two types of dog breeds. On the left, criticisms picked out the different coloring (the second criticism is a black and white picture), as well as pictures capturing movements of dogs (first and third criticisms). Similarly, on the right, criticisms capture the unusual, but potentially frequent, pictures of dogs in costumes (first and second criticisms).

5.3 Quantitative Measure: Prototypes and Criticisms Improve Interpretability

We conducted a human pilot study to collect objective and subjective measures of interpretability using MMD-critic. The experiment used the same dataset as Section 5.2. We define interpretability in this work as follows: a method is interpretable if a user can correctly and efficiently predict the method's results. Under this definition, we designed a predictive task to quantitatively evaluate interpretability. Given a randomly sampled data point, we measure how well a human can predict the group it belongs to (accuracy), and how fast they can perform the task (efficiency). We chose this dataset as the task of assigning a new image to a group requires groups to be well-explained but does not require specialized training.

We presented four conditions in the experiment: 1) raw images (Raw Condition); 2) prototypes only (Proto Only Condition); 3) prototypes and criticisms (Proto and Criticism Condition); 4) uniformly sampled data points per group (Uniform Condition). Raw Condition contained 100 images per species (e.g., if a group contains 2 species, there are 200 images). Proto Only Condition, Proto and Criticism Condition and Uniform Condition contain the same number of images.

[2] Note that the local kernel trivially achieves perfect accuracy. Thus, in order to measure generalization performance, we do not use class labels for local kernel test instances, i.e. we use the global kernel instead of the local kernel for test instances regardless of training.

[Figure 2: Learned prototypes and criticisms from the Imagenet dataset (two types of dog breeds).]

We used a within-subject design to minimize the effect of inter-participant variability, with a balanced Latin square to account for a potential learning effect. The four conditions were assigned to four participants (four males) in a balanced manner. Each subject answered 21 questions, where the first three questions were practice questions and are not included in the analysis. Each question showed six groups (e.g., red fox, kit fox) of a species (e.g., fox), and a randomly sampled data point that belongs to one of the groups. Subjects were encouraged to answer the questions as quickly and accurately as possible. A break was imposed after each question to mitigate the potential effect of fatigue. We measured the accuracy of answers as well as the time subjects took to answer each question. Participants were also asked to respond to 10 five-point Likert scale survey questions about their subjective measures of accuracy and efficiency. Each survey question compared a pair of conditions (e.g., "Condition A was more helpful than condition B to correctly (or efficiently) assign the image to a group").

Subjects performed the best using Proto and Criticism Condition (M=87.5%, SD=20%). The performance with Proto Only Condition was relatively similar (M=75%, SD=41%), while that with Uniform Condition (M=55%, SD=38%, 37% decrease) and Raw Condition (M=56%, SD=33%, 36% decrease) was substantially lower. In terms of speed, subjects were most efficient using Proto Only Condition (M=1.04 mins/question, SD=0.28, 44% decrease compared to Raw Condition), followed by Uniform Condition (M=1.31 mins/question, SD=0.59) and Proto and Criticism Condition (M=1.37 mins/question, SD=0.8). Subjects spent the most time with Raw Condition (M=1.86 mins/question, SD=0.67).

Subjects indicated their preference for Proto and Criticism Condition over Raw Condition and Uniform Condition. In a survey question that asked to compare Proto and Criticism Condition and Raw Condition, a subject added that "[Proto and Criticism Condition resulted in] less confusion from trying to discover hidden patterns in a ton of images, more clues indicating what features are important". In particular, in a question that asked to compare Proto and Criticism Condition and Proto Only Condition, a subject said that "The addition of criticisms made it easier to locate the defining features of the cluster within the prototypical images". The humans' superior performance with prototypes and criticism in this preliminary study shows that providing criticism together with prototypes is a promising direction to improve interpretability.

6 Conclusion

We present MMD-critic, a scalable framework for prototype and criticism selection to improve the interpretability of complex data distributions. To the best of our knowledge, ours is the first work which leverages the BMC framework to generate explanations. Further, MMD-critic shows competitive performance as a nearest prototype classifier compared to existing methods. When criticism is given together with prototypes, a human pilot study suggests that humans are better able to perform a predictive task that requires the data distributions to be well-explained. This suggests that criticism and prototypes are a step towards improving the interpretability of complex data distributions. For future work, we hope to further explore the properties of MMD-critic, such as the effect of the choice of kernel and weaker conditions on the kernel matrix for submodularity. We plan to explore applications to larger datasets, aided by recent work on distributed algorithms for submodular optimization. We also intend to complete a larger scale user study on how criticism and prototypes presented together affect human understanding.

References

A. Aamodt and E. Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 1994.
A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming submodular maximization: Massive data summarization on the fly. In KDD. ACM, 2014.
I. Bichindaritz and C. Marling. Case-based reasoning in the health sciences: What's next? AI in Medicine, 2006.
J. Bien and R. Tibshirani. Prototype selection for interpretable classification. The Annals of Applied Statistics, 2011.
R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In KDD, 2015.
M. S. Cohen, J. T. Freeman, and S. Wolf. Metarecognition in time-stressed decision making: Recognizing, critiquing, and correcting. Human Factors, 1996.
J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009.
U. Feige. A threshold of ln n for approximating set cover. JACM, 1998.
A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian data analysis. Taylor & Francis, 2014.
A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel method for the two-sample problem. JMLR, 2008.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv preprint, 2015.
J. J. Hull. A database for handwritten text recognition research. TPAMI, 1994.
T. S. Jaakkola, D. Haussler, et al. Exploiting generative models in discriminative classifiers. In NIPS, 1999.
L. Kaufman and P. Rousseeuw. Clustering by means of medoids. North-Holland, 1987.
B. Kim, C. Rudin, and J. A. Shah. The Bayesian Case Model: A generative approach for case-based reasoning and prototype classification. In NIPS, 2014.
O. O. Koyejo, R. Khanna, J. Ghosh, and R. Poldrack. On prior distributions and approximate inference for structured variables. In NIPS, 2014.
A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. JMLR, 2008.
L. I. Kuncheva and J. C. Bezdek. Nearest prototype classification: clustering, genetic algorithms, or random search? IEEE Transactions on Systems, Man, and Cybernetics, 28(1), 1998.
H. Lin and J. Bilmes. A class of submodular functions for document summarization. In ACL, 2011.
J. R. Lloyd and Z. Ghahramani. Statistical model criticism using kernel two sample tests. In NIPS, 2015.
B. Mirzasoleiman, A. Karbasi, A. Badanidiyuru, and A. Krause. Distributed submodular cover: Succinctly summarizing massive data. In NIPS, 2015.
G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 1978.
A. Newell and H. A. Simon. Human problem solving. Prentice-Hall, Englewood Cliffs, 1972.
C. E. Priebe, D. J. Marchette, J. G. DeVinney, and D. A. Socolinsky. Classification using class cover catch digraphs. Journal of Classification, 2003.
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.
D. Sharma, A. Kapoor, and A. Deshpande. On greedy maximization of entropy. In ICML, 2015.
I. Simon, N. Snavely, and S. M. Seitz. Scene summarization for online image collections. In ICCV, 2007.
K. R. Varshney. Engineering safety in machine learning. arXiv preprint, 2016.

Proof of Theorem 2

Observe that from the element-wise upper bound on H, the following element-wise inequality holds:

h E ≤ H ≤ h E + ν Ē,

where ν = max_{(i,j) ∈ Ē} h_{i,j} denotes the largest entry of H outside E. Thus, from the linearity of F(H, S) = ⟨A(S), H⟩ with respect to H, we have that:

F(h E, S) ≤ F(H, S) ≤ F(h E + ν Ē, S),

where (by linearity) F(h E + ν Ē, S) = h F(E, S) + ν F(Ē, S). Next, employing the terms a(S) = F(E, S) = ⟨A(S), E⟩ and b(S) = F(Ē, S) = ⟨A(S), Ē⟩, we may rewrite the bounds as:

h a(S) ≤ F(H, S) ≤ h a(S) + ν b(S).

Monotonicity: The function F(H, S) is monotone with respect to S if F(H, S ∪ {u}) − F(H, S) ≥ 0. Applying the lower and upper bounds, we have that:

F(H, S ∪ {u}) − F(H, S) ≥ h a(S ∪ {u}) − h a(S) − ν b(S),

and the right-hand side is non-negative whenever:

ν ≤ h (a(S ∪ {u}) − a(S)) / b(S) = h α(n, m).

Thus, when the off-diagonal terms satisfy h_{i,j} ≤ h α(n, m), ∀ (i, j) ∈ Ē, we have that F(H, S) is monotone.

Submodularity: The function F(H, S) is submodular with respect to S if F(H, S ∪ {u}) + F(H, S ∪ {v}) ≥ F(H, S ∪ {u, v}) + F(H, S). Again, applying the lower and upper bounds, we have that:

F(H, S ∪ {u}) + F(H, S ∪ {v}) − F(H, S ∪ {u, v}) − F(H, S)
≥ h a(S ∪ {u}) + h a(S ∪ {v}) − h a(S ∪ {u, v}) − ν b(S ∪ {u, v}) − h a(S) − ν b(S),

and the right-hand side is non-negative whenever:

ν ≤ h (a(S ∪ {u}) + a(S ∪ {v}) − a(S ∪ {u, v}) − a(S)) / (b(S ∪ {u, v}) + b(S)) = h β(n, m).

Thus, when the off-diagonal terms satisfy h_{i,j} ≤ h β(n, m), ∀ (i, j) ∈ Ē, we have that F(H, S) is submodular.

Proof of Corollary 3

Based on the diagonal dominance assumption on K, it is clear that Ē = {(i, j) ∈ [n] × [n] : i ≠ j} indexes the off-diagonal terms, and E = 1 − Ē = I. Given A(S) with entries a_{i,j}(S) = (2/(n|S|)) 1_[j ∈ S] − (1/|S|^2) 1_[i ∈ S] 1_[j ∈ S], we can compute the bounds (8) simply by enumerating sums. With m = |S|:

a(S) = ⟨A(S), I⟩ = 2/n − 1/m,
b(S) = ⟨A(S), 1 − I⟩ = 1 − 2/n + 1/m.

Monotonicity: J_b(·) is monotone when the upper bound on the off-diagonal terms is given by k α(n, m), by Theorem 2. We have that:

a(S ∪ {u}) − a(S) = 1/m − 1/(m + 1) = 1/(m(m + 1)),

so that:

α(n, m) = (a(S ∪ {u}) − a(S)) / b(S) = n / ((m + 1)((n − 2)m + n)).

This is a decreasing function with respect to m. Further, for the ground set 2^[n], we have that m* = n, and α(n, n) = 1/(n^2 − 1).

Submodularity: J_b(·) is submodular when the upper bound on the off-diagonal terms is given by k β(n, m), by Theorem 2. We have that:

a(S ∪ {u}) + a(S ∪ {v}) − a(S ∪ {u, v}) − a(S) = 1/m − 2/(m + 1) + 1/(m + 2) = 2/(m(m + 1)(m + 2)),
b(S ∪ {u, v}) + b(S) = 2 − 4/n + 1/(m + 2) + 1/m.

Thus:

β(n, m) = n / ((m + 1)(n(m^2 + 3m + 1) − 2(m^2 + 2m))).

This is a decreasing function with respect to m. Further, for the ground set 2^[n], we have that m* = n, and β(n, n) = 1/(n^3 + 2n^2 − 2n − 3).

Combined Bound: Finally, we show that β(n, n) ≤ α(n, n), so that the bound k_{i,j} ≤ k β(n, n) is sufficient to guarantee both monotonicity and submodularity:

β(n, n) ≤ α(n, n) ⟺ 1/(n^3 + 2n^2 − 2n − 3) ≤ 1/(n^2 − 1) ⟺ 0 ≤ n^3 + n^2 − 2n − 2 = (n + 1)(n^2 − 2),

which holds whenever n ≥ 2. Thus β(n, n) ≤ α(n, n). The proof is complete.

Proof of Theorem 7

A discrete function is linear if it can be written in the form F(C) = Σ_{i ∈ [n]} w_i 1_[i ∈ C]. Consider (9) and observe that:

L(C) = Σ_{l ∈ C} | (1/n) Σ_{i ∈ [n]} k(x_i, x_l) − (1/|S|) Σ_{j ∈ S} k(x_j, x_l) |
     = Σ_{l ∈ [n]} | (1/n) Σ_{i ∈ [n]} k(x_i, x_l) − (1/|S|) Σ_{j ∈ S} k(x_j, x_l) | 1_[l ∈ C]
     = Σ_{l ∈ [n]} w_l 1_[l ∈ C],

where:

w_l = | (1/n) Σ_{i ∈ [n]} k(x_i, x_l) − (1/|S|) Σ_{j ∈ S} k(x_j, x_l) |.
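As a quick numerical sanity check of the closed forms above (our own sketch; it simply re-evaluates the reconstructed expressions under the assumption that they are transcribed correctly), both bounds decrease in m, and the combined bound β(n, n) is the binding one:

def alpha(n, m):
    # alpha(n, m) = n / ((m + 1)((n - 2) m + n))
    return n / ((m + 1) * ((n - 2) * m + n))

def beta(n, m):
    # beta(n, m) = n / ((m + 1)(n (m^2 + 3 m + 1) - 2 (m^2 + 2 m)))
    return n / ((m + 1) * (n * (m**2 + 3*m + 1) - 2 * (m**2 + 2*m)))

for n in (5, 20, 100):
    a_vals = [alpha(n, m) for m in range(1, n + 1)]
    b_vals = [beta(n, m) for m in range(1, n + 1)]
    assert all(x >= y for x, y in zip(a_vals, a_vals[1:]))   # decreasing in m
    assert all(x >= y for x, y in zip(b_vals, b_vals[1:]))   # decreasing in m
    assert abs(alpha(n, n) - 1.0 / (n**2 - 1)) < 1e-12
    assert abs(beta(n, n) - 1.0 / (n**3 + 2*n**2 - 2*n - 3)) < 1e-12
    assert beta(n, n) <= alpha(n, n)                         # combined bound suffices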


More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

Multi-view Discriminative Manifold Embedding for Pattern Classification

Multi-view Discriminative Manifold Embedding for Pattern Classification Multi-view Discriinative Manifold Ebedding for Pattern Classification X. Wang Departen of Inforation Zhenghzou 450053, China Y. Guo Departent of Digestive Zhengzhou 450053, China Z. Wang Henan University

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

Fixed-to-Variable Length Distribution Matching

Fixed-to-Variable Length Distribution Matching Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

UNIVERSITY OF TRENTO ON THE USE OF SVM FOR ELECTROMAGNETIC SUBSURFACE SENSING. A. Boni, M. Conci, A. Massa, and S. Piffer.

UNIVERSITY OF TRENTO ON THE USE OF SVM FOR ELECTROMAGNETIC SUBSURFACE SENSING. A. Boni, M. Conci, A. Massa, and S. Piffer. UIVRSITY OF TRTO DIPARTITO DI IGGRIA SCIZA DLL IFORAZIO 3823 Povo Trento (Italy) Via Soarive 4 http://www.disi.unitn.it O TH US OF SV FOR LCTROAGTIC SUBSURFAC SSIG A. Boni. Conci A. assa and S. Piffer

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS ISSN 1440-771X AUSTRALIA DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS An Iproved Method for Bandwidth Selection When Estiating ROC Curves Peter G Hall and Rob J Hyndan Working Paper 11/00 An iproved

More information

Detection and Estimation Theory

Detection and Estimation Theory ESE 54 Detection and Estiation Theory Joseph A. O Sullivan Sauel C. Sachs Professor Electronic Systes and Signals Research Laboratory Electrical and Systes Engineering Washington University 11 Urbauer

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning Boosting Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Weak Learning Definition: concept class C is weakly PAC-learnable if there exists a (weak)

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic

More information

Shannon Sampling II. Connections to Learning Theory

Shannon Sampling II. Connections to Learning Theory Shannon Sapling II Connections to Learning heory Steve Sale oyota echnological Institute at Chicago 147 East 60th Street, Chicago, IL 60637, USA E-ail: sale@athberkeleyedu Ding-Xuan Zhou Departent of Matheatics,

More information

Statistics and Probability Letters

Statistics and Probability Letters Statistics and Probability Letters 79 2009 223 233 Contents lists available at ScienceDirect Statistics and Probability Letters journal hoepage: www.elsevier.co/locate/stapro A CLT for a one-diensional

More information

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words)

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words) 1 A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine (1900 words) Contact: Jerry Farlow Dept of Matheatics Univeristy of Maine Orono, ME 04469 Tel (07) 866-3540 Eail: farlow@ath.uaine.edu

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information

Bayesian Learning. Chapter 6: Bayesian Learning. Bayes Theorem. Roles for Bayesian Methods. CS 536: Machine Learning Littman (Wu, TA)

Bayesian Learning. Chapter 6: Bayesian Learning. Bayes Theorem. Roles for Bayesian Methods. CS 536: Machine Learning Littman (Wu, TA) Bayesian Learning Chapter 6: Bayesian Learning CS 536: Machine Learning Littan (Wu, TA) [Read Ch. 6, except 6.3] [Suggested exercises: 6.1, 6.2, 6.6] Bayes Theore MAP, ML hypotheses MAP learners Miniu

More information

Predictive Vaccinology: Optimisation of Predictions Using Support Vector Machine Classifiers

Predictive Vaccinology: Optimisation of Predictions Using Support Vector Machine Classifiers Predictive Vaccinology: Optiisation of Predictions Using Support Vector Machine Classifiers Ivana Bozic,2, Guang Lan Zhang 2,3, and Vladiir Brusic 2,4 Faculty of Matheatics, University of Belgrade, Belgrade,

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

Introduction to Machine Learning. Recitation 11

Introduction to Machine Learning. Recitation 11 Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,

More information

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy Storage Capacity and Dynaics of Nononotonic Networks Bruno Crespi a and Ignazio Lazzizzera b a. IRST, I-38050 Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I-38050 Povo (Trento) Italy INFN Gruppo

More information

Bootstrapping Dependent Data

Bootstrapping Dependent Data Bootstrapping Dependent Data One of the key issues confronting bootstrap resapling approxiations is how to deal with dependent data. Consider a sequence fx t g n t= of dependent rando variables. Clearly

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

Lecture 9 November 23, 2015

Lecture 9 November 23, 2015 CSC244: Discrepancy Theory in Coputer Science Fall 25 Aleksandar Nikolov Lecture 9 Noveber 23, 25 Scribe: Nick Spooner Properties of γ 2 Recall that γ 2 (A) is defined for A R n as follows: γ 2 (A) = in{r(u)

More information

CONTROL SYSTEMS, ROBOTICS, AND AUTOMATION Vol. IX Uncertainty Models For Robustness Analysis - A. Garulli, A. Tesi and A. Vicino

CONTROL SYSTEMS, ROBOTICS, AND AUTOMATION Vol. IX Uncertainty Models For Robustness Analysis - A. Garulli, A. Tesi and A. Vicino UNCERTAINTY MODELS FOR ROBUSTNESS ANALYSIS A. Garulli Dipartiento di Ingegneria dell Inforazione, Università di Siena, Italy A. Tesi Dipartiento di Sistei e Inforatica, Università di Firenze, Italy A.

More information

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm Acta Polytechnica Hungarica Vol., No., 04 Sybolic Analysis as Universal Tool for Deriving Properties of Non-linear Algoriths Case study of EM Algorith Vladiir Mladenović, Miroslav Lutovac, Dana Porrat

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

Asynchronous Gossip Algorithms for Stochastic Optimization

Asynchronous Gossip Algorithms for Stochastic Optimization Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu

More information