Active Learning Models and Noise

Sara Stolbach
COMS 6253, Advanced CLT
May 3, 2007

Abstract

I study active learning in general pool-based active learning models as well as noisy active learning algorithms, and then compare them for the class of linear separators under the uniform distribution.

1 Introduction

There are often cases where data is abundant and easily accessible but labeling the data is costly. For example, in bioinformatics many DNA sequences are available, but decoding one sequence can take a person many hours or days. This is the scenario addressed by active learning: the labels of the data are hidden, and the learner can pay for the label of any example. This is not captured in the typical PAC supervised learning scenario. Because labeling is expensive, an active learning algorithm will want to minimize the number of examples it needs to label. In this paper I focus on pool-based active learning models, in which the learner can pay for the label of any example in a pool of unlabeled examples, as opposed to models in which query points can be created synthetically. A noisy dataset is difficult in the active learning setting since standard active learning models seek out the most informative examples, which tend to be the most noise-prone. I will first discuss some noiseless active learning models and show why they are noise-prone, then discuss two noisy active learning models and how they compare, and finally discuss the next steps that should be taken.

For the most part, active learning methods fall under three orthogonal techniques: generalized binary search, opportunistic priors (or algorithmic luckiness), and Bayesian assumptions. Opportunistic priors means that a uniform bet over all of Ĥ leads to standard VC generalization bounds; if the algorithm places more weight on a certain hypothesis, it can do excellently if it guesses right but worse than usual if it guesses wrong. This method is not as practical because of its unpredictability. The other two methods are discussed in this paper. Progress can be measured in a number of ways, such as the rate at which the size of the version space decreases or the number of label requests needed. I will be focusing on the number of label requests needed.

2 Preliminaries

Some general definitions used throughout the paper are collected here; the variables have these meanings unless specified otherwise. Let X be the instance space of examples x drawn i.i.d. from the uniform input distribution D, and let V denote the version space. Let n be the number of label requests. Let C be the concept class over distribution P. Let η denote the noise rate in noisy models.

A common application examined in active learning is linear separators through the origin of the unit sphere in R^d. X is the set of all data on the surface of the sphere, so that X = {x ∈ R^d : ‖x‖ = 1}. Each example in X is denoted (x̄, t), where x̄ is the direction of the example and t is the offset; in other words, X = S_d × [−1, +1], where S_d is the unit sphere around the origin of R^d. The distribution D on X is uniform. H is the class of linear separators through the origin, and any h ∈ H is a homogeneous hyperplane. h is represented by a unit vector w ∈ X with the classification rule h(x) = sign(w · x). The distance between two hypotheses u and v in H with respect to D is given by distance_D(u, v) = θ(u, v)/π, where θ(u, v) = arccos(u · v).
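As a concrete illustration of this setup, the following is a minimal Python sketch (with names of my own choosing) of the unit-sphere instance space, the homogeneous classification rule h(x) = sign(w · x), and the distance measure distance_D(u, v) = arccos(u · v)/π.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n i.i.d. examples uniformly from the unit sphere S_d in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def h(w, x):
    """Homogeneous linear separator represented by the unit vector w."""
    return np.sign(w @ x)

def distance_D(u, v):
    """distance_D(u, v) = theta(u, v) / pi with theta(u, v) = arccos(u . v)."""
    return np.arccos(np.clip(u @ v, -1.0, 1.0)) / np.pi
```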

3 General Models

3.1 Bayesian Model

The Query by Committee (QBC) algorithm [1] is one of the most significant pool-based active learning algorithms. It shows that using queries over random unlabeled examples can accelerate the learning of some concept classes over standard learning approaches. The work is done in the Bayesian model, which differs from the PAC model in that the target concept is assumed to be chosen according to a prior distribution P over C, and this distribution is known to the learner. A Bayesian model has an immediate benefit for active learning: if there is large agreement on unlabeled data, you can stop and output the current hypothesis. The algorithm assumes realizability, meaning a perfect classifier exists. The paper's analysis is based on probabilistic assumptions, and they show that queries can help accelerate learning of concept classes that are deterministic and noiseless. The paper also discusses a generalized form of QBC that uses two distributions; it is more computationally intensive, but the outcome need not be binary or discrete and the inputs can be stochastic.

Algorithm 1 QBC Algorithm
Input: ε > 0, δ > 0, Gibbs, Sample, Label
Initialize: n = 0, V_0 = C
repeat
  Call the Sample oracle to get a random instance x.
  Call Gibbs twice to get two predictions p_1 and p_2 for x.
  if p_1 = p_2 then
    reject the example
  else
    call Label(x) to get c(x), increase n by 1, and set V_n to be all concepts c' ∈ V_{n−1} with c'(x) = c(x)
  end if
until more than t_n consecutive examples are rejected.
Output: the Gibbs prediction hypothesis

The Query by Committee algorithm (Algorithm 1) uses an oracle denoted Gibbs(V, x), which computes the Gibbs prediction rule. It predicts the label of a new example x by randomly choosing an h ∈ C according to the prior restricted to V ⊆ C, and labeling x according to it. Two calls to Gibbs with the same V and x can result in different predictions. The goal is to label x so as to maximize the expected information gain, causing an exponentially fast decrease in the error of the Gibbs prediction rule. However, this is not guaranteed, mainly because the distribution is ignored: if queries of the same type are always made, the prediction error will stay large. The Sample oracle returns an unlabeled example x ∈ X chosen according to D. A call to the Label oracle with input x returns c(x), the label of x according to the target concept. The iterations are done in a batch learning scenario until some termination condition is achieved. The termination condition is met when t_n = (1/ε) ln(π²(n+1)²/(3δ)) consecutive examples are rejected.
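To make the loop concrete, here is a minimal Python sketch of the QBC filtering loop under the realizability assumption, using the termination threshold t_n above. The Gibbs oracle is implemented by naive rejection sampling of unit vectors consistent with the labeled data, which is only meant to illustrate the rule rather than to be an efficient implementation; all function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs(labeled, d):
    """Gibbs oracle: draw a homogeneous separator from the (uniform) prior
    restricted to the current version space, by naive rejection sampling."""
    while True:
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)
        if all(np.sign(w @ x) == y for x, y in labeled):
            return w

def qbc(sample, label, d, eps, delta, max_labels=200):
    """QBC filtering loop (Algorithm 1): query only where two Gibbs
    predictions disagree; stop after t_n consecutive rejections."""
    t_n = lambda n: (1.0 / eps) * np.log(np.pi ** 2 * (n + 1) ** 2 / (3 * delta))
    labeled, n, rejected = [], 0, 0
    while rejected <= t_n(n) and n < max_labels:
        x = sample()
        p1 = np.sign(gibbs(labeled, d) @ x)
        p2 = np.sign(gibbs(labeled, d) @ x)
        if p1 == p2:
            rejected += 1                      # committee agrees: no label cost
        else:
            labeled.append((x, label(x)))      # disagreement: pay for the label
            n, rejected = n + 1, 0
    return gibbs(labeled, d)                   # final Gibbs prediction hypothesis
```

Here sample() would draw a point uniformly from the unit sphere and label(x) would return sign(w* · x) for a hidden target w*. Rejection sampling becomes impractically slow as the version space shrinks, which is part of why the modified perceptron of [2], discussed below, is attractive.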

Theorem 1 If a concept class C has VC-dimension 0 < d < ∞ and the expected information gain of queries made by QBC is uniformly bounded below by g > 0 bits, then with probability larger than 1 − δ over the random choice of the target concept, the sequence of examples, and the choices made by QBC, the number of calls to Sample is smaller than

m_0 = max( 4d/(εδ), (160(d+1)/(εg)) · max(6, ln(80(d+1)/(εδ²g)))² ),

the number of calls to Label is smaller than

n_0 = (10(d+1)/g) ln(4m_0/δ),

and the probability that the Gibbs prediction using the final version space of QBC makes a mistake is smaller than ε. [1]

It is easy to show that if QBC ever stops, then the error of the resulting hypothesis is small with high probability. The real question is whether QBC will stop. The proof of Theorem 1 in [1] shows that it will stop if the number of examples rejected between consecutive queries increases (linearly) with the number of queries. This implies that the probability of accepting a query, or of making a prediction mistake, is exponentially small in the number of queries asked (based on the information gain g > 0).

A common class examined in active learning is uniformly distributed halfspaces through the origin of the unit sphere in R^d. The information gain from random examples vanishes as d goes to infinity because in high dimension the volume of the sphere is concentrated near the equator, and a typical example cuts the sphere away from the equator; this means that query examples are especially important in high dimensions. QBC will likely choose two random points near the equator, so an example that separates them will likely be near the equator, which implies that QBC can attain a finite information gain in high dimensions.

[1] prove a lower bound on this information gain, which implies that Theorem 1 holds and that the number of calls to the Sample oracle is O((d/ε) log(1/δ)) while the number of calls to the Label oracle is O(d log(1/ε)). The paper also proves that the QBC algorithm obtains such results for the perceptron class, by modeling it as a special case of the uniformly distributed halfspaces problem.

Dasgupta et al. [2] show an algorithm which uses a simple modification to the perceptron update to provide even better results. The perceptron algorithm uses the same concept class of linear separators where datapoints lie on the surface of the unit sphere in R^d. It starts with an initial hypothesis v_0 ∈ R^d, and in each iteration it receives an unlabeled point x_t, makes a prediction sign(v_t · x_t), and during the filtering step decides whether to ask for the label based on a threshold s_t applied to |v_t · x_t|. If the label is requested, the update step is called. The regular perceptron update is: if (x_t, y_t) is misclassified, then v_{t+1} = v_t + y_t x_t. With this update the error rate cannot be better than Ω(1/√(l_t)), where l_t is the number of labels queried up to time t. They change the update to: if (x_t, y_t) is misclassified, then v_{t+1} = v_t − 2(v_t · x_t) x_t. This scales the update by a factor of 2|v_t · x_t| to avoid the oscillations caused by points close to the hyperplane. The filtering step is based on s_t, whose choice is crucial; [2] set it adaptively, starting high and repeatedly halving it until a level is reached where enough of the queried points are misclassified. The modified perceptron yields a theorem stating that O(d log(1/ε)) labels suffice when drawing O((d/ε) log(1/ε)) data points at random from the unit sphere in R^d, as opposed to QBC's need of O(d/ε²) datapoints. It will make O(d log(1/ε)) errors and have final error ε. The bound improvements come from the change to the update step and from the threshold s_t used in the filtering step.

The QBC algorithm would have very poor results in a noisy setting because the wrong examples could be queried for labels, producing poor version spaces. In addition, an adversarial noise model could cause the algorithm to never stop.
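A rough Python sketch of the modified perceptron's filtering and update steps is given below. The reflection update v ← v − 2(v · x)x and the |v · x| ≤ s filtering rule come from the description above; the particular adaptive-threshold schedule here (halve s when too few recent queries are mistakes) is only a crude stand-in for the one analyzed in [2], and all names are illustrative.

```python
import numpy as np

def modified_perceptron(stream, label, s0=1.0, budget=100, window=20, min_frac=0.25):
    """Active learning with the modified perceptron update of [2] (sketch).
    stream yields unit vectors in R^d; label(x) returns the +1/-1 label."""
    x0 = next(stream)
    v = label(x0) * x0                       # initial hypothesis v_0 (unit norm)
    s, queries, recent = s0, 1, []
    for x in stream:
        if abs(v @ x) > s:                   # filtering step: skip confident points
            continue
        y = label(x)                         # pay for this label
        queries += 1
        mistake = np.sign(v @ x) != y
        recent.append(mistake)
        if mistake:
            v = v - 2 * (v @ x) * x          # reflection update keeps ||v|| = 1
        if len(recent) == window:            # crude adaptive threshold
            if sum(recent) < min_frac * window:
                s /= 2                       # too few mistakes near the margin: shrink s
            recent = []
        if queries >= budget:
            break
    return v
```

Because x and v are unit vectors, the reflection update preserves ‖v‖ = 1, so no renormalization is needed; the standard update v + yx does not have this property and, as noted above, cannot beat an error rate of Ω(1/√(l_t)).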

3.2 Generalized Binary Search

Active learning can also be viewed as an approach to improving the standard supervised setting. In a supervised setting, for a class with VC dimension d and target error rate ε over distribution P, some m = m(ε, d) labeled points are needed. Dasgupta [3] uses a greedy generalized binary search to examine whether fewer than m labels are sufficient to learn the class when the points arrive unlabeled.

Figure 1: The boundary can be found with just O(log m) labels using binary search. (Figure taken from Dasgupta's paper [3].)

In the case of data lying on the real line with a hypothesis class H of simple threshold functions, it is enough to draw m = O(1/ε) random labeled examples from P and return a classifier consistent with them. However, as in Figure 1, if we use unlabeled examples we can run a simple binary search to find the transition from 0 to 1, which requires only log m labels to infer the rest of them. Hence there is an exponential improvement. But what about the general case: is it possible to pick among O(m^d) possibilities using o(m) labels? If binary search were possible, just O(d log m) labels would be needed. This is not the case: in d ≥ 2 there are cases where the target hypothesis cannot be identified without querying all the labels. However, in the average case the number of labels needed is small.

A variant of a popular greedy scheme is used, where one always asks for the label which most evenly divides the current effective version space, weighted by π. Here π is merely a device for averaging querying counts over some distribution on Ĥ, and Ĥ is used instead of H; it reflects the underlying combinatorial structure of the problem, and π can often be chosen to mask its structure. The expected number of labels needed by this strategy is at most O(ln |Ĥ|) times that of any other strategy, which is a significant performance guarantee. A query tree structure is used, and there is not always a tree of average depth o(m); the best hope is to come close to minimizing the number of queries, and this is done using a greedy approach. Let S ⊆ Ĥ be the current version space. For each unlabeled x_i, let S⁺ be the hypotheses which label x_i positive and S⁻ the ones which label it negative. Pick the x_i for which the positive and negative sets are most nearly equal in π-mass, in other words the x_i for which min{π(S⁺), π(S⁻)} is largest.
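The greedy selection rule just described is straightforward to write down. The sketch below (plain Python, hypothetical names) picks the unlabeled point whose induced split of the version space is most balanced in π-mass.

```python
def greedy_gbs_query(S, pi, unlabeled, predict):
    """Greedy generalized binary search: return the unlabeled point x for
    which min(pi(S+), pi(S-)) is largest, where S+ / S- are the hypotheses
    in the current version space S labeling x positive / negative.
    pi(h) is the weight of hypothesis h; predict(h, x) returns +1 or -1."""
    best_x, best_score = None, -1.0
    for x in unlabeled:
        plus = sum(pi(h) for h in S if predict(h, x) > 0)
        minus = sum(pi(h) for h in S if predict(h, x) < 0)
        score = min(plus, minus)
        if score > best_score:
            best_x, best_score = x, score
    return best_x
```

After querying the chosen point, the hypotheses on the losing side of the split are removed from S and the rule is applied again; Dasgupta's guarantee concerns the expected number of such queries under π.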

Generalized binary search would clearly have poor results in a noisy setting because, as previously mentioned, the datapoints that are chosen to be labeled tend to be the most noise-prone. A small amount of adversarial noise can cause the datapoints that would be chosen to divide the version space to give virtually no help in learning the concept.

4 Noisy Models

There are two active learning models that work with arbitrary classification noise. The only restriction is that samples are drawn i.i.d. from some underlying distribution; the results hold for any mechanism used to generate the noise. The algorithms have different restrictions on η. It is important to note, however, that Kääriäinen [10] shows a lower bound of Ω(η²/ε²) on the sample complexity of any active learner, so speedups cannot be hoped for when η is large.

4.1 Agnostic Active Learning

Balcan et al. [4] describe an algorithm known as the A² algorithm (Algorithm 2), which is noise tolerant. It was the first noise-tolerant active learning algorithm and shows some positive results. They produce bounds for linear threshold functions and for linear separators under the uniform distribution, where the algorithm succeeds for any amount of noise and shows exponential improvements if η < ε/16. However, the algorithm is not very sample efficient.

A² relies on a subroutine, such as the VC bound or the Occam's Razor bound, to compute a lower bound LB(S, h, δ) and an upper bound UB(S, h, δ) on the true error rate err_P(h) of h, using a sample S of examples drawn i.i.d. from P, such that LB(S, h, δ) ≤ err_P(h) ≤ UB(S, h, δ) holds for all h with probability 1 − δ.

The A² algorithm can be viewed as a robust selective sampling algorithm [9]. Selective sampling keeps track of two spaces: the current version space H_i, consistent with all labels queried so far, and the region of uncertainty R_i. The region of uncertainty includes all datapoints x ∈ X on which some two hypotheses in H_i disagree. In each round of the selective sampling algorithm, a random unlabeled example is picked from R_i and queried, eliminating all hypotheses in H_i inconsistent with the received label. In the agnostic case we cannot eliminate a hypothesis based on a single example.
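For reference, here is a minimal sketch of one round of realizable selective sampling over a finite hypothesis set, which is the procedure A² makes robust: query a random point from the region of uncertainty and discard every inconsistent hypothesis. The names and the finite-pool setting are assumptions of the sketch.

```python
import random

def region_of_uncertainty(H, pool, predict):
    """All pool points on which at least two hypotheses in H disagree."""
    return [x for x in pool if len({predict(h, x) for h in H}) > 1]

def selective_sampling_round(H, pool, predict, label):
    """One round of realizable selective sampling [9]: query a random point
    from the region of uncertainty and keep only the consistent hypotheses.
    A^2 replaces this hard elimination with comparisons of UB and LB."""
    R = region_of_uncertainty(H, pool, predict)
    if not R:
        return H                      # every hypothesis agrees everywhere: done
    x = random.choice(R)
    y = label(x)
    return [h for h in H if predict(h, x) == y]
```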

The A² algorithm samples a set of examples S and uses UB and LB to calculate the disagreement of a region,

DISAG_D(G) = Pr_{x∼D}[∃ h_1, h_2 ∈ G : h_1(x) ≠ h_2(x)].

If all h ∈ H_i agree on some region, that region can be safely eliminated, thereby reducing the region of uncertainty. The algorithm also eliminates all hypotheses whose lower bound is greater than the minimum upper bound. Each round completes when S is large enough to cut the region of uncertainty in half; therefore the number of rounds is bounded by log(1/ε). The algorithm stops when

DISAG_D(H_i)(min_{h∈H_i} UB(S, h, δ_k) − min_{h∈H_i} LB(S, h, δ_k)) ≤ ε,

and A² returns ĥ = argmin_{h∈H_i} UB(S, h, δ_k), where i and k are the iteration indices at which the algorithm satisfied the condition. The bounds of A² for the class of linear separators under the uniform distribution over the unit sphere are described in a later section.

Algorithm 2 A² Algorithm
Input: ε, Sample oracle for D, Label oracle O, H
Initialize: i = 1, D_1 = D, H_1 = H, S = ∅, k = 1
while DISAG_D(H_i)(min_{h∈H_i} UB(S, h, δ_k) − min_{h∈H_i} LB(S, h, δ_k)) > ε do
  Set S = ∅, H'_i = H_i, k = k + 1
  while DISAG_D(H'_i) ≥ (1/2) DISAG_D(H_i) do
    if DISAG_D(H_i)(min_{h∈H'_i} UB(S, h, δ_k) − min_{h∈H'_i} LB(S, h, δ_k)) ≤ ε then
      Output: ĥ = argmin_{h∈H'_i} UB(S, h, δ_k)
    else
      S' = a set of 2|S| + 1 samples from D_i satisfying ∃ h_1, h_2 ∈ H'_i : h_1(x) ≠ h_2(x)
      S = S ∪ {(x, O(x)) : x ∈ S'}
      H'_i = {h ∈ H'_i : LB(S, h, δ_k) ≤ min_{h'∈H'_i} UB(S, h', δ_k)}
      k = k + 1
    end if
  end while
  H_{i+1} = H'_i;  D_{i+1} = D_i conditioned on ∃ h_1, h_2 ∈ H'_i : h_1(x) ≠ h_2(x);  i = i + 1
end while
Output: ĥ = argmin_{h∈H_i} UB(S, h, δ_k)
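The UB/LB subroutine that Algorithm 2 repeatedly invokes can be instantiated in several ways; the paper suggests the VC bound or the Occam's Razor bound. As a minimal, illustrative stand-in, the sketch below uses a Hoeffding-plus-union bound over a finite hypothesis set (so it is not the VC-based version used in [4]); the names are mine.

```python
import numpy as np

def empirical_error(h, S, predict):
    """Fraction of the labeled sample S = [(x, y), ...] that h misclassifies."""
    return float(np.mean([predict(h, x) != y for x, y in S])) if S else 1.0

def lb_ub(h, S, delta, n_hypotheses, predict):
    """LB(S, h, delta) and UB(S, h, delta): with probability 1 - delta,
    LB <= err_P(h) <= UB simultaneously for all n_hypotheses hypotheses,
    by Hoeffding's inequality and a union bound."""
    if not S:
        return 0.0, 1.0
    slack = np.sqrt(np.log(2 * n_hypotheses / delta) / (2 * len(S)))
    e = empirical_error(h, S, predict)
    return max(0.0, e - slack), min(1.0, e + slack)
```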

4.2 Teaching Dimension and Active Learning

Hanneke [5] describes a general noise-tolerant active learning algorithm that is based on exact learning with membership queries, and shows the first nontrivial general upper bound on label complexity in a noisy active learning model. In exact learning, the algorithm is required to identify the oracle's actual target function rather than approximating it with high probability; there is no classification noise, and the algorithm can ask for the label of any example. In a sense it is a limiting case of PAC active learning. An additional restriction of the algorithm in [5] is that it works only for (arbitrary but) persistent classification noise, meaning the label of a datapoint cannot change from one query to the next.

The goal of exact learning is to ask for labels f(x) until the only concept in C consistent with the observed labels is the target f ∈ C. Here C ⊆ C_F, where F is the corresponding σ-algebra on the set X and C_F is the set of all F-measurable f : X → {−1, 1}. MembHalving (Hegedüs [6]) is an example of an exact learning algorithm. It uses a majority vote to continuously shrink the version space until only one hypothesis is left. Querying a specifying set for h_maj guarantees that we at least halve the version space each round, because either h_maj makes a mistake or we identify f.

Definition 1 For f ∈ C_F, XTD(f, V, U) = inf{ t : ∃ R ⊆ U such that |{h ∈ V : h(R) = f(R)}| ≤ 1 and |R| ≤ t }.

The teaching dimension is the minimum number of instances a teacher must reveal to uniquely identify any target concept chosen from the class. The extended teaching dimension (XTD) is a more restrictive form: the labels f(R) on a minimal subset R ⊆ U can be matched by at most one hypothesis in V, and |R| is at most the value t = XTD.
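The halving argument behind MembHalving can be made concrete with a small sketch: query the points of a specifying set R for the majority-vote hypothesis; either the oracle contradicts the majority somewhere (so every hypothesis that voted with the majority there is wrong, eliminating at least half of V), or the oracle agrees with h_maj on all of R and, by the definition of a specifying set, the target is pinned down. The code below is an illustration under these realizable exact-learning assumptions, not Hanneke's noise-tolerant procedure; finding a small specifying set is itself a nontrivial step that is simply assumed here.

```python
def majority_label(V, x, predict):
    """Label that the majority of the version space V assigns to x."""
    votes = sum(predict(h, x) for h in V)
    return 1 if votes >= 0 else -1

def membhalving_round(V, R, oracle, predict):
    """One MembHalving-style round [6]: query a specifying set R for h_maj.
    Returns the reduced version space and whether the target is identified."""
    for x in R:
        y = oracle(x)
        if y != majority_label(V, x, predict):
            # The majority was wrong at x, so at least half of V is eliminated.
            return [h for h in V if predict(h, x) == y], False
    # h_maj agreed with the oracle on all of R: at most one h in V is consistent.
    return [h for h in V if all(predict(h, x) == oracle(x) for x in R)], True
```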

The Teaching Dimension and Active learning algorithm (TDA), given as Algorithm 3, works by continuously reducing the size of the version space until it is between specified sizes. The method Reduce achieves this by obtaining a minimal specifying set R_j of a subsequence of U based upon the majority vote h_maj of V; V̂_j is the set of h ∈ V with h(R_j) ≠ Oracle(R_j). Reduce obtains such minimal sets r times, and V̂ is the set of all h ∈ V that appeared in more than θr of the sets V̂_j. It returns V' = V \ V̂; it is unlikely that these eliminations were caused by noisy datapoints. TDA then obtains, via the method Label, the labels from the version space that should be used for the final hypothesis, and returns the hypothesis with the smallest error on them. Label obtains the minimal specifying set for V based upon h_maj, as in Reduce, and labels those points; every example in Ū that was not in the minimal set is labeled by its majority value over the subsets in which h(R_j) = h_maj(R_j) = Oracle(R_j).

Algorithm 3 ReduceAndLabel (TDA)
Input: finite V ⊆ C_F, U = {x_1, x_2, ..., x_m} ∈ X^m, values ε, δ, η̂ ∈ (0, 1].
Initialize: u = |U|/(5 ln |V|), V_0 = V, i = 0
repeat
  i = i + 1
  Let U_i = {x_{1+u(i−1)}, x_{2+u(i−1)}, ..., x_{ui}}
  V_i = Reduce(V_{i−1}, U_i, δ/(48 ln |V|), η̂ + ε/2)
until |V_i| > (3/4)|V_{i−1}| or |V_i| ≤ 1
Let Ū = {x_{ui+1}, x_{ui+2}, ..., x_{ui+l}}, where l = (12/η̂) ln(12|V|²/δ)
L = Label(V_{i−1}, Ū, δ/12, η̂ + ε/2)
Output: the concept h ∈ V_i having smallest er_L(h) (or any h ∈ V_i if V_i = ∅).

It is important to use subsamples of size < 1/(16η) in TDA because the probability of such a subsample containing a noisy example is small.

Theorem 2 Let n = 1/(16(η + 3ε/4)), let N be the size of a minimal ε-cover of C, let l and s be the auxiliary quantities defined in [5], and let t = XTD(C, D, n, δ/(2s)). Then the number of labels queried in (C, D, ε, δ, η) is at most ts = O( t (η²/ε² + 1)(d log(1/ε) + log(1/δ)) log(d/ε) ). [5]

The theorem states that the bound is st, where s is based on n, N, and l, and t is the extended teaching dimension. This implies that the upper bound for any concept class is governed by its extended teaching dimension. The number of datapoints TDA requires is determined by the size of V, and it is known that |V| ≤ N < 2((4e/ε) ln(4e/ε))^d [11]; substituting this bound into the definitions of u, U_i, and Ū in Algorithm 3 gives the datapoint bound listed in Table 1.

The concept class of axis-aligned rectangles is shown as an application in [5], with XTD(C, D, n, δ) ≤ O((2/λ) log(nm/δ)). Results were not shown for any other concept classes; in particular, the most common application in the active learning model, the one described for QBC and A², is not mentioned. Why wasn't this concept class examined? How does this algorithm compare to the other noisy model, A²?

4.3 Linear Separators under the Uniform Distribution

Model               | # of Datapoints                                                 | # of Labels Queried
QBC                 | O((d/ε) log(1/δ))                                               | O(d log(1/ε))
Modified Perceptron | O((d/ε) log(1/ε))                                               | O(d log(1/ε))
A²                  | (64/ε²)(2d ln(12/ε) + ln(4/δ))                                  | O(d(d ln d + ln(1/δ)) ln(1/ε))
TDA                 | (224/(η + ε/2)²) ln(48N/δ) · 5 ln N, N < 2((4e/ε) ln(4e/ε))^d   | > (2^d/√d)(η²/ε² + 1)(d log(1/ε) + log(1/δ)) log(d/ε)

Table 1: Bounds for the models that analyze the class of linear separators under D

A common application analyzed is data drawn from the unit sphere in R^d, where the labels are divided by a linear separator that passes through the origin of the sphere. The Teaching Dimension and Active Learning model did not examine this case, which would have been useful in analyzing the difference between the two noisy models. I will show the upper bounds of each algorithm for this application and provide an analysis of the two relative to each other. Table 1 displays the bounds on datapoints and queries for each of the models that analyzed this classifier. QBC and the Modified Perceptron run the perceptron algorithm on this classifier to produce these bounds.

A²

The A² algorithm shows exponential improvements for the linear separator over the unit sphere. It is well-designed for this application due to the minimization that it does to the area of uncertainty.

Theorem 3 Let X, H, and D be as defined above, and let LB and UB be the VC bound. Then for any 0 < ε < 1/2, 0 < η < ε/(16√d), and δ > 0, with probability 1 − δ the algorithm A² requires

O( d (d ln d + ln(1/δ)) ln(1/ε) )

calls to the labeling oracle for linear separators under the uniform distribution, where δ' = δ/N²(ε, δ, H) and N(ε, δ, H) is as given in [4].

δ' is based on N(ε, δ, H), which is an upper bound on the number of bound (LB and UB) evaluations needed in the algorithm. If H is a set of functions from X to {−1, 1} with finite VC dimension d ≥ 1, then for any ε, δ > 0 the sample size required from D, an arbitrary but fixed probability distribution, is (64/ε²)(2d ln(12/ε) + ln(4/δ)). This is based upon standard sample complexity bounds from Anthony and Bartlett [12], and it implies that with probability at least 1 − δ, |err(h) − êrr(h)| ≤ ε for all h ∈ H.

TDA

The number of queries required in the TDA model is based upon the extended teaching dimension of the concept class. This poses a problem here: regardless of the number of datapoints given by the teacher, the separator cannot be identified exactly, because there are infinitely many linear separators in R^d. The instance space must be discretized to X = {0, 1}^d to produce any results.

Theorem 4 Let XTD be as defined above, let X = {0, 1}^d, and assume no datapoints lie on the separator. The bound on the number of labels queried in (C, D, ε, δ, η) for linear separators under the uniform distribution on X is greater than

(2^d/√d)(η²/ε² + 1)(d log(1/ε) + log(1/δ)) log(d/ε).

Proof: The teaching dimension of linearly separable functions is 2^d [8]. We are only concerned with linear separators through the origin, which implies that we only need to be concerned with the datapoints that lie near the origin. It is possible to shift any one of those datapoints slightly without changing the others and thereby require a different linear separator. The teaching dimension is therefore on the order of 2^d/√d (this value was received from Rocco Servedio). The XTD is even worse, since it is a more restrictive quantity; however, the TD is poor enough by itself, and it is therefore not necessary to find the XTD.

The proof therefore gives an approximate bound, obtained by substituting the TD for t = XTD, with s as defined in Theorem 2.

Based upon Theorem 4, it would not be a good idea to use TDA for linear separators under the uniform distribution. The poor bounds for this class are caused by the dependence of the TDA algorithm on the XTD. It can be assumed that if the XTD for a class is small, the algorithm performs well and is therefore a good algorithm to use; it would appear that this is the reason linear separators over the unit sphere were not examined in [5]. Hanneke shows the classifier of axis-aligned rectangles, which has a low XTD (although it required discretization as well). For the class of linear separators under the uniform distribution, the A² algorithm has a significantly smaller upper bound on the number of queries and would therefore be the better choice.

5 Conclusion and Open Questions

I have described a number of active learning algorithms and analyzed and compared the A² and TDA algorithms. I have produced results showing that A² is better for linear separators over the unit sphere. Some open questions:

1. In order to fully analyze and compare the two algorithms, what bounds does A² have for axis-aligned rectangles (the concept class shown using TDA)?

2. TDA is useful since it is a general active learning and noise model, but it does not do well in the setting analyzed by many other papers on this topic. Can TDA be altered so that it does not depend on the exact teaching dimension?

3. Can a general algorithm be written which would produce reasonable bounds for all of the applications?

4. Can general bounds be given for A²?

6 References

[1] Y. Freund, S. Seung, E. Shamir, and N. Tishby. (1997) Selective sampling using the query by committee algorithm. Machine Learning, 28.

[2] S. Dasgupta, A. Kalai, and C. Monteleoni. (2005) Analysis of perceptron-based active learning. COLT.

[3] S. Dasgupta. (2004) Analysis of a greedy active learning strategy. NIPS.

[4] M.-F. Balcan, A. Beygelzimer, and J. Langford. (2006) Agnostic active learning. Proc. of the 23rd International Conference on Machine Learning.

[5] S. Hanneke. (2007) Teaching dimension and the complexity of active learning. COLT.

[6] T. Hegedüs. (1995) Generalised teaching dimension and the query complexity of learning. Proc. of the 8th Annual Conference on Computational Learning Theory.

[7] S. A. Goldman and M. J. Kearns. (1995) On the complexity of teaching. Journal of Computer and System Sciences, 50.

[8] M. Anthony, G. Brightwell, and J. Shawe-Taylor. (1995) On specifying Boolean functions by labelled examples. Discrete Applied Mathematics, 61:1-25.

[9] D. Cohn, L. Atlas, and R. Ladner. (1994) Improving generalization with active learning. Machine Learning, 15(2).

[10] M. Kääriäinen. (2005) On active learning in the non-realizable case. NIPS Workshop on Foundations of Active Learning.

[11] D. Haussler. (1992) Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100.

[12] M. Anthony and P. Bartlett. (1999) Neural Network Learning: Theoretical Foundations. Cambridge University Press.
