Rotational Prior Knowledge for SVMs

Arkady Epshteyn and Gerald DeJong
University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

Abstract. Incorporation of prior knowledge into the learning process can significantly improve low-sample classification accuracy. We show how to introduce prior knowledge into linear support vector machines in the form of constraints on the rotation of the normal to the separating hyperplane. Such knowledge frequently arises naturally, e.g., as inhibitory and excitatory influences of input variables. We demonstrate that the generalization ability of rotationally-constrained classifiers is improved by analyzing their VC and fat-shattering dimensions. Interestingly, the analysis shows that the large-margin classification framework justifies the use of stronger prior knowledge than the traditional VC framework. Empirical experiments with text categorization and political party affiliation prediction confirm the usefulness of rotational prior knowledge.

1 Introduction

Support vector machines (SVMs) have outperformed competing classifiers on many classification tasks [1,2,3]. However, the amount of labeled data needed for SVM training can be prohibitively large for some domains. Intelligent user interfaces, for example, must adapt to the behavior of an individual user after a limited amount of interaction in order to be useful. Medical systems diagnosing rare diseases have to generalize well after seeing very few examples. Natural language processing systems learning to identify infrequent social events (e.g., revolutions, wars, etc.) from news articles have access to very few training examples. Moreover, they rely on manually labeled data for training, and such data is often expensive to obtain.

Various techniques have been proposed specifically to deal with the problem of learning from very small datasets. These include active learning [4], hybrid generative-discriminative classification [5], learning-to-learn by extracting common information from related learning tasks [6], and using prior knowledge. In this work, we focus on the problem of using prior knowledge to increase the accuracy of a large margin classifier at low sample sizes. Several studies have shown the efficacy of this method. Scholkopf et al. [7] demonstrate how to integrate prior knowledge about invariance under transformations and importance of local structure into the kernel function. Fung et al. [8] use domain knowledge in the form of labeled polyhedral sets to augment the training data. Wu and Srihari [9] allow human users to specify their confidence in the example's label, varying the effect of each example on the separating hyperplane proportionately

to its confidence. Mangasarian et al. [10] introduce prior knowledge into the large-margin regression framework. While the ability of prior knowledge to improve any classifier's generalization performance is well-known, the properties of large margin classifiers with prior knowledge are not well understood. In order to study this problem, we introduce a new form of prior knowledge for SVMs (rotational constraints) and prove that it is possible to obtain stronger guarantees for the generalization ability of constrained classifiers in the large-margin framework than in the classical VC framework. Specifically, we show that the VC dimension of our classifier remains large even when its hypothesis space is severely constrained by prior knowledge. The fat-shattering dimension, however, continues to decrease with the decreasing hypothesis space, justifying the use of stronger domain knowledge. We conduct experiments to demonstrate improvements in performance due to rotational prior knowledge and compare them with improvements achievable by active learning.

2 Preliminaries

The SVM classifier with a linear kernel learns a function of the form

    sign(f(x; ω, θ)),  where  f(x; ω, θ) = ω^T x + θ = Σ_{i=1}^{n} ω_i x_i + θ    (1)

that maps (x; ω, θ) ∈ R^n × W × Θ to one of the two possible output labels {−1, 1}.¹ Given a training sample of m points (x_1, y_1), ..., (x_m, y_m), the SVM seeks to maximize the margin between the separating hyperplane and the points closest to it [1]. For canonical hyperplanes (i.e., hyperplanes with unit margins), the maximum-margin hyperplane minimizes the regularized risk functional

    R_reg[f, l] = Σ_{i=1}^{m} l(y_i, f(x_i; ω, θ)) + (C/2) ‖ω‖_2²    (2)

with the hard margin 0-1 loss given by l(y_i, f(x_i; ω, θ)) = I_{1 − y_i f(x_i; ω, θ) > 0}. The soft margin formulation allows for deviation from the objective of maximizing the margin in order to better fit the data. This is done by substituting the hinge loss function l(y_i, f(x_i; ω, θ)) = max(1 − y_i f(x_i; ω, θ), 0) into (2). Minimizing the regularized risk (2) in the soft margin case is equivalent to solving the following (primal) optimization problem:

    minimize_{ω, θ, ξ}  (1/2) ‖ω‖² + C Σ_{i=1}^{m} ξ_i
    subject to  y_i (ω^T x_i + θ) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1 ... m    (3)

¹ sign(y) = 1 if y ≥ 0, −1 otherwise.
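For concreteness, here is a minimal sketch of the soft-margin primal (3) written out as an explicit quadratic program. It is illustrative only and not the authors' implementation; the cvxpy solver, the synthetic data, and the value of C are assumptions made for the demonstration.

```python
# Minimal sketch of the soft-margin primal (3) as a quadratic program.
# Illustrative only: cvxpy, the synthetic data, and C are assumptions.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, C = 40, 2, 1.0
X = rng.normal(size=(m, n))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=m) > 0, 1.0, -1.0)  # labels in {-1, +1}

omega = cp.Variable(n)             # normal of the separating hyperplane
theta = cp.Variable()              # offset
xi = cp.Variable(m, nonneg=True)   # slack variables

objective = cp.Minimize(0.5 * cp.sum_squares(omega) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ omega + theta) >= 1 - xi]  # y_i(omega^T x_i + theta) >= 1 - xi_i
cp.Problem(objective, constraints).solve()
print(omega.value, theta.value)
```

In practice the dual problem (4) below is solved instead, but the primal form makes the role of the slack variables ξ_i explicit.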

Calculating the Wolfe dual from (3) and solving the resulting maximization problem:

    maximize_{α}  Σ_{i=1}^{m} α_i − (1/2) Σ_{i,j=1}^{m} α_i α_j y_i y_j (x_i^T x_j)    (4)
    subject to  C ≥ α_i ≥ 0,  i = 1 ... m,  and  Σ_{i=1}^{m} α_i y_i = 0

yields the solution

    ω = Σ_{i=1}^{m} α_i y_i x_i    (5)

Setting ξ_i = 0, i = 1 ... m, (3), (4), and (5) can be used to define and solve the original hard margin optimization problem.

The generalization error of a classifier is governed by its VC dimension [1]:

Definition 1. A set of points S = {x_1, ..., x_m} is shattered by a set of functions F mapping from a domain X to {−1, 1} if, for each b ∈ {−1, 1}^m, there is a function f_b in F with b_i f_b(x_i) = 1, i = 1 ... m. The VC-dimension of F is the cardinality of the largest shattered set S.

Alternatively, the fat-shattering dimension can be used to bound the generalization error of a large margin classifier [11]:

Definition 2. A set of points S = {x_1, ..., x_m} is γ-shattered by a set of functions F mapping from a domain X to R if there are real numbers r_1, ..., r_m such that, for each b ∈ {−1, 1}^m, there is a function f_b in F with b_i (f_b(x_i) − r_i) ≥ γ, i = 1 ... m. We say that r_1, ..., r_m witness the shattering. Then the fat-shattering dimension of F is a function fat_F(γ) that maps γ to the cardinality of the largest γ-shattered set S.

3 Problem Formulation and Generalization Error

In this work, we introduce prior knowledge which has not been previously applied in the SVM framework. This prior is specified in terms of explicit constraints placed on the normal vector of the separating hyperplane. For example, consider the task of determining whether a posting came from the newsgroup alt.atheism or talk.politics.guns, based on the presence of the words "gun" and "atheism" in the posting. Consider the unthresholded perceptron f(posting; ω_atheism, ω_gun, θ) = ω_atheism I_{atheism present} + ω_gun I_{gun present} + θ (I_{x present} is the indicator function that is 1 when the word x is present in the posting and 0 otherwise). A positive value of ω_atheism captures the excitatory influence of the word "atheism" on the outcome of classification by ensuring that the value of f(posting; ω_atheism, ω_gun, θ) increases when the word "atheism" is encountered in the posting, all other things being equal. Similarly, constraining ω_gun to be

negative captures an inhibitory influence. Note that such constraints restrict the rotation of the hyperplane, but not its translation offset θ. Thus, prior knowledge by itself does not determine the decision boundary. However, it does restrict the hypothesis space.

We are interested in imposing constraints on the parameters of the family F of functions sign(f(x; ω, θ)) defined by (1). Constraints of the form ω^T c > 0 generalize excitatory and inhibitory sign constraints² (e.g., ω_i > 0 is given by c = [c_1 = 0, ..., c_i = 1, ..., c_n = 0]^T). In addition, sometimes it is possible to determine the approximate orientation of the hyperplane a priori. Normalizing all the coefficients ω_i to the range [−1, 1] enables the domain expert to specify the strength of the contribution of ω_gun and ω_atheism in addition to the signs of their influence. When prior knowledge is specified in terms of an orientation vector v, the conic constraint ω^T v / (‖ω‖ ‖v‖) > ρ (ρ ∈ [0, 1)) prevents the normal ω from deviating too far from v.

It is well-known that the VC-dimension of F in R^n is n + 1 (see, e.g., [12]). Interestingly, the VC-dimension of a constrained F is at least n with any number of constraints imposed on ω ∈ W as long as there is an open subset of W that satisfies the constraints (this result follows from [13]). This means that no value of ρ in the conic constraint can result in a significant improvement in the classifier's generalization ability as measured by its VC-dimension. Similarly, sign constraints placed on all the input variables cannot decrease the classifier's VC-dimension by more than 1. The following theorem shows that the VC-dimension of a relatively weakly constrained classifier achieves this lower bound of n:

Theorem 1. For the class F_C = {x ↦ sign(Σ_{i=1}^{n} ω_i x_i + θ) : ω_1 > 0}, the VC-dimension of F_C is n.

Proof. The proof uses techniques from [12]. Let F_C = {x ↦ sign(ω_1 x_1 + ω'^T x' + θ) : ω_1 > 0}, where x' = [x_2, ..., x_n]^T is the projection of x onto the hyperplane {ω_1 = 0} and ω' = [ω_2, ..., ω_n]^T. First, observe that {ω_1 > 0} defines an open subset of W. Hence, the VC-dimension of F_C is at least n. Now, we show by contradiction that a set of n + 1 points cannot be shattered by F_C. Assume that some set of points x_1, ..., x_{n+1} ∈ R^n can be shattered. Let x'_1, ..., x'_{n+1} be their projections onto the hyperplane {ω_1 = 0}. There are two cases:

Case 1: x'_1, ..., x'_{n+1} are distinct. Since these are n + 1 points in an (n − 1)-dimensional hyperplane, by Radon's Theorem [14] they can be divided into two sets S_1 and S_2 whose convex hulls intersect. Thus, there exist λ_i ≥ 0, λ_j ≥ 0 (0 ≤ λ_i, λ_j ≤ 1)

² In the rest of the paper, we refer to excitatory and inhibitory constraints of the form ω_i > 0 (ω_i < 0) as sign constraints because they constrain the sign of ω_i.

such that

    Σ_{i: x'_i ∈ S_1} λ_i x'_i = Σ_{j: x'_j ∈ S_2} λ_j x'_j    (6)

and

    Σ_{i: x'_i ∈ S_1} λ_i = Σ_{j: x'_j ∈ S_2} λ_j = 1    (7)

Since x_1, ..., x_{n+1} are shattered in R^n, there exist ω_1, ω', θ such that ω_1 x_{i1} + ω'^T x'_i ≥ −θ for all x'_i ∈ S_1. Multiplying by λ_i and summing over i, we get (after applying (7))

    ω'^T Σ_{i: x'_i ∈ S_1} λ_i x'_i ≥ −θ − ω_1 Σ_{i: x'_i ∈ S_1} λ_i x_{i1}    (8)

Similarly, for all x'_j ∈ S_2, ω_1 x_{j1} + ω'^T x'_j < −θ, so

    ω'^T Σ_{j: x'_j ∈ S_2} λ_j x'_j < −θ − ω_1 Σ_{j: x'_j ∈ S_2} λ_j x_{j1}    (9)

Combining (8), (9), and (6) yields

    ω_1 (Σ_{j: x'_j ∈ S_2} λ_j x_{j1} − Σ_{i: x'_i ∈ S_1} λ_i x_{i1}) < 0    (10)

Since ω_1 > 0,

    Σ_{j: x'_j ∈ S_2} λ_j x_{j1} − Σ_{i: x'_i ∈ S_1} λ_i x_{i1} < 0    (11)

Now, shattering the same set of points, but reversing the labels of S_1 and S_2, implies that there exist ω_1, ω', θ such that ω_1 x_{i1} + ω'^T x'_i < −θ for all x'_i ∈ S_1 and ω_1 x_{j1} + ω'^T x'_j ≥ −θ for all x'_j ∈ S_2. An argument identical to the one above shows that

    ω_1 (Σ_{j: x'_j ∈ S_2} λ_j x_{j1} − Σ_{i: x'_i ∈ S_1} λ_i x_{i1}) > 0    (12)

Since ω_1 > 0, Σ_{j: x'_j ∈ S_2} λ_j x_{j1} − Σ_{i: x'_i ∈ S_1} λ_i x_{i1} > 0, which contradicts (11).

Case 2: Two distinct points x_1 and x_2 project to the same point x'_1 = x'_2 (13) on the hyperplane {ω_1 = 0}. Assume, w.l.o.g., that x_{11} < x_{21} (14). Since x_1 and x_2 are shattered, there exist ω_1, ω', θ such that ω_1 x_{11} + ω'^T x'_1 ≥ −θ > ω_1 x_{21} + ω'^T x'_2, which, together with (13) and (14), implies that ω_1 < 0, a contradiction.

This result means that imposing a sign constraint on a single input variable³ or using ρ = 0 in the conic constraint is sufficient to achieve the maximum theoretical improvement within the VC framework. However, it is unsatisfactory in the sense that it contradicts our intuition (and empirical results), which suggests that stronger prior knowledge should help the classifier reduce its generalization error faster. The following theorem shows that the fat-shattering dimension decreases continuously with increasing ρ in the conic constraint, giving us the desired guarantee. Technically, the fat-shattering dimension is a function of the margin γ, so we use the following definition of function domination to specify what we mean by a decreasing fat-shattering dimension:

Definition 3. A function f_1(x) is dominated by a function f_2(x) if, for all x, f_1(x) ≤ f_2(x) and, for at least one a, f_1(a) < f_2(a). When we say that f_ρ(x) decreases with increasing ρ, we mean that ρ_1 < ρ_2 implies that f_{ρ_2}(x) is dominated by f_{ρ_1}(x).

Theorem 2. For the class F_{v,ρ} = {x ↦ ω^T x + θ : ‖ω‖_2 = 1, ‖v‖_2 = 1, ‖x‖_2 ≤ R, ω^T v > ρ}, fat_{F_{v,ρ}}(γ) decreases with increasing ρ.⁴

Proof. The fat-shattering dimension obviously cannot increase with increasing ρ, so we only need to find a value of γ where it decreases. We show that this happens at γ* = R√(1 − ρ_2²). First, we upper bound fat_{F_{v,ρ_2}}(γ*) by showing that, in order to γ*-shatter two points, the separating hyperplane must be able to rotate through a larger angle than that allowed by the constraint ω^T v > ρ_2. Assume that two points x_1, x_2 can be γ*-shattered by F_{v,ρ_2}. Then there exist ω_1, ω_2, θ_1, θ_2, r_1, r_2 such that ω_1^T x_1 + θ_1 − r_1 ≥ γ*, −(ω_1^T x_2 + θ_1 − r_2) ≥ γ*, −(ω_2^T x_1 + θ_2 − r_1) ≥ γ*, and ω_2^T x_2 + θ_2 − r_2 ≥ γ*. Combining the terms and applying the Cauchy-Schwarz inequality, we get ‖ω_1 − ω_2‖ ≥ 2γ*/R. Squaring both sides, expanding ‖ω_1 − ω_2‖² as ‖ω_1‖² + ‖ω_2‖² − 2 ω_1^T ω_2, and using the fact that ‖ω_1‖ = ‖ω_2‖ = 1 yields

    ω_1^T ω_2 ≤ 1 − 2γ*²/R² = 2ρ_2² − 1    (15)

Since the angle between ω_1 and ω_2 cannot exceed the sum of the angle between ω_1 and the prior v and the angle between v and ω_2, both of which are bounded above by arccos(ρ_2), we get (after some algebra) ω_1^T ω_2 > 2ρ_2² − 1, which contradicts (15). Thus,

    fat_{F_{v,ρ_2}}(R√(1 − ρ_2²)) < 2    (16)

Now, we lower bound fat_{F_{v,ρ_1}}(γ*) by exhibiting two points γ*-shattered by F_{v,ρ_1}. W.l.o.g., let v = [0, 1, 0, ...]^T. It is easy to verify that x_1 = [R, 0, ...]^T and x_2 =

³ The constraint {ω_1 > 0} is weak since it only cuts the volume of the hypothesis space by 1/2.
⁴ Note that the statement of this theorem deals with hyperplanes with unit normals, not canonical hyperplanes. The margin of a unit-normal hyperplane is given by min_{i=1...m} |ω^T x_i + θ|.

[−R, 0, ...]^T can be R√(1 − ρ_2²)-shattered by F_{v,ρ_1}, witnessed by r_1 = r_2 = 0. Hence,

    fat_{F_{v,ρ_1}}(R√(1 − ρ_2²)) ≥ 2    (17)

which, combined with (16), completes the argument.

The result of Theorem 1 is important because it shows that even weak prior knowledge improves the classifier's generalization performance in the VC framework, which makes fewer assumptions about the data than the fat-shattering framework. However, it is the result of Theorem 2 within the fat-shattering framework that justifies the use of stronger prior knowledge.

4 Implementation

The quadratic optimization problem for finding the maximum margin separating hyperplane (2) can be easily modified to take into account linear rotational constraints of the form ω^T c_j > 0, j = 1 ... l. The soft margin/soft constraint formulation, which allows for the possibility of violating both the margin maximization objective and the rotational constraints, minimizes the following regularization functional:

    R_reg[f, l, l'] = Σ_{i=1}^{m} l(y_i, f(x_i; ω, θ)) + (C_2 / (C_1 l)) Σ_{j=1}^{l} l'(ω, c_j) + (C/2) ‖ω‖_2²    (18)

with 0-1 losses for the data and the prior, l(y_i, f(x_i; ω, θ)) = I_{1 − y_i f(x_i; ω, θ) > 0} and l'(ω, c_j) = I_{−ω^T c_j > 0}, in the hard margin/hard rotational constraints case, and hinge losses, l(y_i, f(x_i; ω, θ)) = max(1 − y_i f(x_i; ω, θ), 0) and l'(ω, c_j) = max(−ω^T c_j, 0), in the soft margin/soft rotational constraints case. The regularization functional above is the same as in (2) with an additional loss function which penalizes the hyperplanes that violate the prior. Minimizing (18) with hinge loss functions is equivalent to solving:

    minimize_{ω, θ, ξ, ν}  (1/2) ‖ω‖² + C_1 Σ_{i=1}^{m} ξ_i + (C_2/l) Σ_{j=1}^{l} ν_j    (19)
    subject to  y_i (ω^T x_i + θ) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1 ... m,
                ω^T c_j ≥ −ν_j,  ν_j ≥ 0,  j = 1 ... l.

Constructing the Lagrangian from (19) and calculating the Wolfe dual results in the following maximization problem:

    maximize_{α, β}  Σ_{i=1}^{m} α_i − (1/2) Σ_{i,j=1}^{m} α_i α_j y_i y_j (x_i^T x_j) − Σ_{i=1}^{m} Σ_{j=1}^{l} α_i β_j y_i (x_i^T c_j) − (1/2) Σ_{i,j=1}^{l} β_i β_j (c_i^T c_j)    (20)

    subject to  C_1 ≥ α_i ≥ 0, i = 1 ... m,  C_2/l ≥ β_j ≥ 0, j = 1 ... l,  and  Σ_{i=1}^{m} α_i y_i = 0

The solution to (20) is given by

    ω = Σ_{i=1}^{m} α_i y_i x_i + Σ_{j=1}^{l} β_j c_j    (21)

As before, setting ξ_i = 0, ν_j = 0, i = 1 ... m, j = 1 ... l, (19), (20), and (21) can be used to solve the hard margin/hard rotational constraints optimization problem. Note that in the soft-margin formulation, the constants C_1 and C_2 define a trade-off between fitting the data, maximizing the margin, and respecting the rotational constraints.

Fig. 1. Approximating a conic constraint: a) Start with the known sign constraints ω_1 > 0, ω_2 > 0, and ω_3 > 0 around v_0 = [1, 1, 1]^T. The figure shows the linear constraints around v_0 (white vector) and the cone ω^T v_0 / (‖ω‖ ‖v_0‖) > ρ_0 approximated by these constraints. b) Rotate the bounding hyperplanes {ω_1 = 0}, {ω_2 = 0}, {ω_3 = 0} toward v_0, approximating a cone with the required angle ρ around v_0. c) Rotate the whole boundary from v_0 (white vector in (a),(b)) to the required orientation v (white vector in (c)).

The above calculation can impose linear constraints on the orientation of the large margin separating hyperplane when such constraints are given. This is the case with sign-constrained prior knowledge. However, domain knowledge in the form of a cone centered around an arbitrary orientation vector v cannot be represented as a linear constraint in the quadratic optimization problem given by (19). The approach taken in this work is to approximate an n-dimensional cone with n hyperplanes. For example, the sign constraints ω_1 > 0, ω_2 > 0, and ω_3 > 0 approximate a cone of a fixed angle ρ_0 around v_0 = [1, 1, 1]^T (see Figure 1-(a)). To approximate a cone of arbitrary angle ρ around an arbitrary orientation vector v, 1) the normal ω_i of each bounding hyperplane {ω_i = 0} (as defined by the sign constraints above) is rotated in the plane spanned by {ω_i, v_0} by an angle arccos(ω_i^T v_0) − ρ, and 2) a solid body rotation that transforms v_0 into v is subsequently applied to all the bounding hyperplanes, as illustrated in Figure 1. This construction generalizes in a straightforward way from R^3 to R^n.
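To make the modified optimization concrete, the following sketch solves a small instance of (19) directly, with the rotational prior supplied as rows c_j of a constraint matrix (a sign constraint on feature i is simply the row ±e_i). This is an illustrative reconstruction, not the authors' implementation; the cvxpy solver, the toy data, and the values of C_1 and C_2 are assumptions.

```python
# Illustrative sketch of the soft rotational-constraint primal (19).
# Each row c_j of Cmat encodes one linear prior constraint omega^T c_j > 0;
# a sign constraint on feature i is the row +e_i (excitatory) or -e_i (inhibitory).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 20, 4
C1, C2 = 1.0, 1.0                                   # assumed trade-off constants
X = rng.normal(size=(m, n))
y = np.where(X[:, 0] - X[:, 1] > 0, 1.0, -1.0)

# Prior: feature 0 is excitatory (omega_0 > 0), feature 1 is inhibitory (omega_1 < 0).
Cmat = np.array([[1.0,  0.0, 0.0, 0.0],
                 [0.0, -1.0, 0.0, 0.0]])
l = Cmat.shape[0]

omega, theta = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)                    # data slacks
nu = cp.Variable(l, nonneg=True)                    # prior slacks

objective = cp.Minimize(0.5 * cp.sum_squares(omega)
                        + C1 * cp.sum(xi)
                        + (C2 / l) * cp.sum(nu))
constraints = [cp.multiply(y, X @ omega + theta) >= 1 - xi,  # margin constraints
               Cmat @ omega >= -nu]                          # soft rotational constraints
cp.Problem(objective, constraints).solve()
print(omega.value)
```

Replacing the last constraint with `Cmat @ omega >= 0` (and dropping ν) would correspond to the hard rotational constraint case described above.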

5 Experiments

Fig. 2. a) Prior knowledge for voting: Democrat (ω_i > 0) — Handicapped Infants, Anti-Satellite Weapon Test Ban, Aid To Nicaraguan Contras, Immigration, South Africa Export Administration Act; Republican (ω_i < 0) — Military Aid to El Salvador, Religious Groups in Schools. b) Generalization error as a percentage versus number of training points for voting classification (svm w/prior vs. svm). For each classification task, the data set is split randomly into training and test sets in several different ways. The SVM classifier is trained on the training set with and without prior knowledge, and its average error on the test set is plotted, along with error bars showing 95% confidence intervals.

Experiments were performed on two distinct real-world domains:

Voting Records. This is a UCI database [15] of congressional voting records. The vote of each representative is recorded on 16 key issues. The task is to predict the representative's political party (Democrat or Republican) based on his/her votes. The domain theory was specified in the form of inhibitory/excitatory sign constraints. An excitatory constraint means that a vote of yea correlates with the Democratic position on the issue; an inhibitory constraint means that Republicans favor the proposal. The complete domain theory is specified in Figure 2-(a). Note that sign constraints are imposed on relatively few features (7 out of 16). Since this type of domain knowledge is weak, a hard rotational constraint SVM was used. Only representatives whose positions are known on all 16 issues were used in this experiment. The results shown in Figure 2-(b) demonstrate that sign constraints decrease the generalization error of the classifier. As expected, prior knowledge helps more when the data is scarce.

Text classification. The task is to determine the newsgroup that a posting was taken from based on the posting's content. We used the 20-newsgroups dataset [16]. Each posting was treated as a bag-of-words, with each binary feature encoding whether or not the word is present in the posting. Stemming was used in the preprocessing stage to reduce the number of features. Feature selection based on mutual information between each individual feature and the label was employed (a fixed number of maximally informative features was chosen). Since SVMs are best suited for binary classification tasks, all of our experiments involve pairwise newsgroup classification.
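The preprocessing just described (binary bag-of-words features ranked by mutual information with the label) can be sketched as follows. This is a generic reconstruction with scikit-learn, not the authors' pipeline; the toy postings, the number of retained features, and the omission of stemming are assumptions.

```python
# Generic sketch of binary bag-of-words features ranked by mutual information.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

postings = ["atheism is a belief about god",          # hypothetical postings
            "the gun law debate continues",
            "atheism and religion in the schools",
            "guns and ammunition for sale"]
labels = np.array([0, 1, 0, 1])                       # newsgroup of each posting

vectorizer = CountVectorizer(binary=True)             # feature is 1 iff the word is present
X = vectorizer.fit_transform(postings)

mi = mutual_info_classif(X, labels, discrete_features=True, random_state=0)
top = np.argsort(mi)[::-1][:3]                        # keep the most informative features
print(vectorizer.get_feature_names_out()[top])
```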

The problem of applying SVMs to multicategory classification has been researched extensively ([2,3]) and is orthogonal to our work.

Fig. 3. Generalization error as a percentage versus number of training points for different classification experiments: a) atheism vs politics.guns, b) politics.guns vs mideast, c) religion.christian vs autos, d) sci.medicine vs sport.hockey, e) sci.medicine vs politics.guns. Legend: svm w/prior, svm, active svm. For each random sample selection, the data set is split randomly into training and test sets in several different ways. For active learning experiments, the data set is split randomly into two equal-sized sets in several different ways, with one set used as the unlabeled pool for query selection and the other set used for testing. All error bars are based on 95% confidence intervals.

Prior knowledge in this experiment is represented by a conic constraint around a specific orientation vector v. While it may be hard for human experts to supply such a prior, there are readily available sources of domain knowledge that were not developed specifically for the classification task at hand. In order to be able to utilize them, it is essential to decode the information into a form usable by the learning algorithm. This is the virtue of rotational constraints: they are directly usable by SVMs, and they can approximate more sophisticated pre-existing forms of information. In our experiments, domain knowledge from Wordnet, a lexical system which encodes semantic relations between words [17], is automatically converted into v. The coefficient v_x of each word x is calculated from the relative proximity of x to each category label in the hypernym (is-a) hierarchy of Wordnet (measured in hops). A natural approximation of v_x is given by hops(x, label_+) / (hops(x, label_−) + hops(x, label_+)), normalized by a linear mapping to the required range [−1, 1], where label_+ and label_− are the names of the two newsgroups.
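A sketch of how such hop counts could be turned into the prior orientation vector v, following the formula above: the hop counts are hypothetical inputs (the Wordnet lookup itself is not shown), and the convention that words closer to label_+ receive positive coefficients is an assumption about the linear mapping to [−1, 1].

```python
# Sketch of converting hypernym-hierarchy hop counts into a prior orientation vector v.
import numpy as np

def prior_vector(hops_pos, hops_neg):
    """hops_pos[i], hops_neg[i]: hops from word i to label_+ / label_- in the is-a hierarchy."""
    hops_pos = np.asarray(hops_pos, dtype=float)
    hops_neg = np.asarray(hops_neg, dtype=float)
    ratio = hops_pos / (hops_neg + hops_pos)   # in [0, 1]; small when the word is near label_+
    return 1.0 - 2.0 * ratio                   # linear map to [-1, 1]; +1 means close to label_+

# e.g. vocabulary ["atheism", "gun"] for alt.atheism (label_+) vs talk.politics.guns (label_-)
v = prior_vector(hops_pos=[2, 9], hops_neg=[8, 3])
v = v / np.linalg.norm(v)                      # the conic constraint uses a unit orientation vector
print(v)
```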

Performance of the following three classifiers on this task was evaluated:

1. A soft rotational constraint SVM with the Wordnet prior (ρ = 0.99); reasonable values of the constants C_1 and C_2 were picked based on the alt.atheism vs. politics.guns classification task, with no attempt to optimize them for other tasks.
2. An SVM which actively selects the points to be labeled out of a pool of unlabeled newsgroup postings. We implemented a strategy suggested in [4] which always queries the point closest to the separating hyperplane.
3. A traditional SVM trained on a randomly selected sample.

Typical results of this experiment for a few different pairwise classification tasks appear in Figure 3. For small data samples, the prior consistently decreases generalization error, showing that even a very approximate prior orientation vector v can result in a significant performance improvement. Since prior knowledge is imposed with soft constraints, the data overwhelms the prior with increasing sample size. Figure 3 also compares the effect of introducing rotational constraints with the effect of active learning. It has been shown theoretically that active learning can improve the convergence rate of the classification error under a favorable distribution of the input data [18], although no such guarantees exist for general distributions. In our experiments, active learning begins to improve performance only after enough data is collected. Active learning does not help when the sample size is very small, probably due to the fact that the separating hyperplane of the classifier cannot be approximated well, resulting in uninformative choices of query points. Rotational prior knowledge, on the other hand, is most helpful at the lowest sample sizes and ceases to be useful in the region where active learning helps. Thus, the strengths of prior knowledge and active learning are complementary. Combining them is a direction for future research.

6 Conclusions

We presented a simple framework for incorporating rotational prior knowledge into support vector machines. This framework has proven not only practically useful, but also useful for gaining insight into the generalization ability of a priori constrained large-margin classifiers. Related work includes using Wordnet for feature creation for text categorization ([19]) and introducing sign constraints into the perceptron learning algorithm [20,21]. These studies do not provide generalization error guarantees for classification.

Acknowledgements. We thank Ilya Shpitser and the anonymous reviewers for helpful suggestions on improving this paper. This material is based upon work supported in part by the National Science Foundation under Award NSF CCR -24 ITR and in part by the Information Processing Technology Office of the Defense Advanced Research Projects Agency under award HR---4. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Defense Advanced Research Projects Agency.

References

1. Vapnik, V.: The Nature of Statistical Learning Theory. Springer-Verlag (1995)
2. Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Proceedings of the Tenth European Conference on Machine Learning, Number 1398 (1998)
3. Dumais, S., Platt, J., Heckerman, D., Sahami, M.: Inductive learning algorithms and representations for text categorization. Proceedings of the Seventh International Conference on Information and Knowledge Management (1998)
4. Campbell, C., Cristianini, N., Smola, A.: Query learning with large margin classifiers. Proceedings of the Seventeenth International Conference on Machine Learning (2000)
5. Raina, R., Shen, Y., Ng, A., McCallum, A.: Classification with hybrid generative/discriminative models. Proceedings of the Seventeenth Annual Conference on Neural Information Processing Systems (2003)
6. Fink, M.: Object classification from a single example utilizing class relevance metrics. Proceedings of the Eighteenth Annual Conference on Neural Information Processing Systems (2004)
7. Scholkopf, B., Simard, P., Vapnik, V., Smola, A.: Prior knowledge in support vector kernels. Advances in kernel methods - support vector learning (2002)
8. Fung, G., Mangasarian, O., Shavlik, J.: Knowledge-based support vector machine classifiers. Proceedings of the Sixteenth Annual Conference on Neural Information Processing Systems (2002)
9. Wu, X., Srihari, R.: Incorporating prior knowledge with weighted margin support vector machines. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2004)
10. Mangasarian, O., Shavlik, J., Wild, E.: Knowledge-based kernel approximation. Journal of Machine Learning Research (2004)
11. Shawe-Taylor, J., Bartlett, P.L., Williamson, R.C., Anthony, M.: Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory 44 (1998)
12. Anthony, M., Biggs, N.: PAC learning and artificial neural networks. Technical report (2000)
13. Erlich, Y., Chazan, D., Petrack, S., Levy, A.: Lower bound on VC-dimension by local shattering. Neural Computation 9 (1997)
14. Grunbaum, B.: Convex Polytopes. John Wiley (1967)
15. Blake, C., Merz, C.: UCI repository of machine learning databases, www.ics.uci.edu/~mlearn/MLRepository.html (1998)
16. Blake, C., Merz, C.: 20 newsgroups database (1998)
17. Miller, G.: WordNet: an online lexical database. International Journal of Lexicography (1990)
18. Dasgupta, S., Kalai, A.T., Monteleoni, C.: Analysis of perceptron-based active learning. Eighteenth Annual Conference on Learning Theory (2005)
19. Gabrilovich, E., Markovitch, S.: Text categorization with many redundant features: Using aggressive feature selection to make SVMs competitive with C4.5. Proceedings of The Twenty-First International Conference on Machine Learning (2004)
20. Amit, D., Campbell, C., Wong, K.: The interaction space of neural networks with sign-constrained weights. Journal of Physics (1989)
21. Barber, D., Saad, D.: Does extra knowledge necessarily improve generalization? Neural Computation 8 (1996)
