Online Linear Regression using Burg Entropy


Prateek Jain, Brian Kulis, and Inderjit Dhillon
Technical Report TR
University of Texas at Austin, Austin, TX 78712
February 4, 2007

Abstract

We consider the problem of online prediction with a linear model. In contrast to existing work in online regression, which regularizes based on squared loss or KL-divergence, we regularize using divergences arising from the Burg entropy. We demonstrate regret bounds for our resulting online gradient-descent algorithm; to our knowledge, these are the first online bounds involving Burg entropy. We extend this analysis to the matrix case, where our algorithm employs LogDet-based regularization, and discuss an application to online metric learning. We demonstrate empirically that using Burg entropy for regularization is useful in the presence of noisy data.

1 Introduction

Recently, online learning has received significant attention, and several recent results have demonstrated strong provable online bounds. In the online linear regression problem considered in this paper, the algorithm receives an instance $x_t$ ($x_t \in \mathbb{R}^m_+$) at time $t$ and predicts $\hat{y}_t = w_t \cdot x_t$, where $w_t$ is the current model used by the algorithm. We denote $w_{t,i}$ as the $i$-th component of the model at the $t$-th step. The error incurred at step $t$ is $\ell_t(w_t) = (y_t - \hat{y}_t)^2$, where $y_t$ is the true/target output, and the goal is to minimize the total loss over all time steps, $\sum_t \ell_t(w_t)$. If there is no correlation between the input $x_t$ and the output $y_t$, then the cumulative loss of the algorithm may be unbounded. Hence, online algorithms are traditionally compared to the performance of the best possible offline solution, where all the input and output instances are provided beforehand. The goal is to prove a relative loss bound on how the online algorithm compares to the optimal offline algorithm. Given a $T$-trial sequence $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_T, y_T)\}$, the optimal offline solution is given by

$$u = \arg\min_{w} \sum_{t=1}^{T} \ell_t(w) \quad \text{s.t. } w > 0.$$

A typical approach to solving the online linear regression problem uses the standard gradient descent method with regularization for minimizing the loss at each step. Specifically, the algorithm tries to minimize the following function at each step:

$$U(w) = \underbrace{D(w, w_t)}_{\text{Regularization Term}} + \eta_t \underbrace{L(y_t, w \cdot x_t)}_{\text{Loss Term}}, \tag{1}$$

where $\eta_t$ is the learning rate at the $t$-th step and $D(w, w_t)$ is some distance measure between the new weight vector $w$ and the current weight vector $w_t$. In the above objective function, the regularization term favors weight vectors which are close to the current model $w_t$, representing a tendency for conservativeness. On the other hand, the loss term is minimized when $w$ is updated to exactly satisfy $y_t = \hat{y}_t$; hence the loss term has a tendency to satisfy target outputs for recent examples.
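As a rough illustration of the template in (1) (this sketch is not part of the original report), the regularizer $D$ can be instantiated with the squared Euclidean distance, which recovers the familiar online gradient-descent step; all function and variable names below are hypothetical.

```python
import numpy as np

def online_regression(stream, dim, eta=0.05):
    """Schematic online protocol for the update template (1).

    With D(w, w_t) = 0.5 * ||w - w_t||^2 and the squared loss, setting the
    gradient of (1) to zero and evaluating the loss gradient at w_t gives
    the familiar step  w_{t+1} = w_t - 2 * eta * (yhat_t - y_t) * x_t.
    """
    w = np.ones(dim) / dim          # initial model
    total_loss = 0.0
    for x_t, y_t in stream:         # receive instance, predict, then see outcome
        yhat_t = w @ x_t
        total_loss += (y_t - yhat_t) ** 2
        w = w - 2 * eta * (yhat_t - y_t) * x_t   # conservative update toward w_t
    return w, total_loss
```

The algorithms studied in this report replace the Euclidean regularizer with Burg-entropy-based divergences, which changes the form of the update but not this overall protocol.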

The tradeoff between these two quantities is handled by the learning rate $\eta_t$, an important parameter in online learning problems. The performance of any method following this approach depends heavily on the choice of the regularization term and the loss function. Kivinen et al. [6, 8] show that the performance of any regularization term in turn depends heavily on the problem domain. For example, for sparse inputs, relative entropy as a regularization term often works better than squared Euclidean distance. For dense inputs, relative entropy's performance is generally very poor compared to that of squared Euclidean distance.

In this paper, we study the performance of online linear regression when the Burg entropy is used as the regularization term:

$$D_{\text{Burg}}(w_{t+1}, w_t) = \sum_{i=1}^{N}\left( \frac{w_{t+1,i}}{w_{t,i}} - \log\left(\frac{w_{t+1,i}}{w_{t,i}}\right) - 1 \right).$$

The relative Burg entropy (or Itakura-Saito distance) is a special case of a Bregman divergence, and such measures have been shown to be useful distance measures for various machine learning tasks [1] (relative entropy and squared Euclidean distance are other special cases of Bregman divergences). The Burg entropy in particular has recently been shown to be useful for matrix approximations and kernel learning [7]. Our study of Burg entropy is also motivated by the fact that online information-theoretic metric learning reduces to an online linear regression problem with the LogDet divergence (the natural generalization of the Burg entropy) as the regularization term [3]. The results in this paper complement those of our work on information-theoretic metric learning. We present an algorithm that uses the Burg relative entropy for regularization and prove an associated regret bound. We also generalize this approach to the matrix case when the input matrices are rank-one; this is precisely the case needed for information-theoretic metric learning. In Section 5, we empirically show how the Burg entropy performs compared to other standard regularization terms.

2 The Vector Case

In this study, we select the Bregman divergence corresponding to the Burg entropy as our regularization term for (1):

$$D_{\text{Burg}}(w_{t+1}, w_t) = \sum_{i=1}^{N}\left( \frac{w_{t+1,i}}{w_{t,i}} - \log\left(\frac{w_{t+1,i}}{w_{t,i}}\right) - 1 \right).$$

We derive Algorithm 1 from standard gradient descent; in particular, if we differentiate $U(w)$, we obtain the update given as (2) in Algorithm 1.

Algorithm 1 Burg Gradient (BG) Algorithm
1: Initialize: Set $\eta'_t = \frac{1}{4mR^2}$ and $w_{0,i} = \frac{1}{N}, \forall i$.
2: Prediction: Upon receiving the $t$-th instance $x_t$, give the prediction $\hat{y}_t = w_t \cdot x_t$.
3: Update: Upon receiving the $t$-th outcome $y_t$, update the weights according to the rule

$$w_{t+1,i} = \frac{w_{t,i}}{1 + 2\eta_t(\hat{y}_t - y_t)\,w_{t,i}x_{t,i}}, \qquad\text{equivalently}\qquad \frac{1}{w_{t+1,i}} = \frac{1}{w_{t,i}} + 2\eta_t(\hat{y}_t - y_t)x_{t,i}, \tag{2}$$

where

$$\eta_t = \begin{cases} \eta'_t & \text{if } \hat{y}_t - y_t > 0, \\[4pt] \min\left(\eta'_t,\; \min_i \dfrac{1}{2(y_t - \hat{y}_t)\,x_{t,i}\,w_{t,i}}\right) & \text{if } \hat{y}_t - y_t < 0. \end{cases} \tag{3}$$
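The following is a minimal NumPy sketch of Algorithm 1 (it is not from the original report; the names, the data stream, and the safety margin are hypothetical choices). It applies the coordinate-wise update (2) with the adaptive rate (3), which keeps every weight strictly positive.

```python
import numpy as np

def burg_gradient(stream, dim, m, R):
    """Sketch of the Burg Gradient (BG) algorithm (Algorithm 1).

    `m` bounds the sum of the weights and `R` bounds the entries of x_t,
    as assumed in the regret analysis below.
    """
    eta_prime = 1.0 / (4 * m * R ** 2)
    w = np.full(dim, 1.0 / dim)                 # positive initial weights, w_0i = 1/N
    for x_t, y_t in stream:                     # x_t is assumed to have positive entries
        yhat_t = w @ x_t                        # step 2: prediction
        err = yhat_t - y_t
        if err >= 0:
            eta_t = eta_prime
        else:
            # adaptive rate (3): keep 1 + 2*eta*err*w_i*x_i strictly positive
            caps = 1.0 / (2 * (y_t - yhat_t) * x_t * w)
            eta_t = min(eta_prime, 0.9 * caps.min())   # 0.9 is a safety margin (sketch choice)
        w = w / (1 + 2 * eta_t * err * w * x_t)        # update (2) in ratio form
    return w
```

A call such as `burg_gradient(zip(xs, ys), dim=100, m=1.0, R=1.0)` on positive inputs returns a positive weight vector, matching the positivity property discussed next.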

Note that the divergence $D_{\text{Burg}}$ is only defined over positive-valued weight vectors. To maintain positivity, we use an adaptive learning rate $\eta_t$, which guarantees that the resulting vector $w$ is positive. Assuming $\eta'_t > 0$, then $\eta_t > 0$ since $\eta_t$ is the minimum of two positive quantities. Also, if $\hat{y}_t - y_t > 0$ then clearly $w_{t+1,i} > 0$ (assuming $w_{t,i} > 0$). On the other hand, if $\hat{y}_t - y_t < 0$, then since $\eta_t \le \frac{1}{2(y_t - \hat{y}_t)x_{t,i}w_{t,i}}$, we can conclude that $w_{t+1,i} > 0$. Hence, we maintain positivity.

2.1 Regret Bounds

Now we demonstrate regret bounds for the proposed algorithm. Let the total loss of Algorithm 1 be $\text{Loss}_{BG} = \sum_t \ell_t(w_t)$, and let $\text{Loss}_u = \sum_t \ell_t(u)$ be the loss of the best offline solution. First we bound the loss of the algorithm at a single trial in terms of the loss of a comparison vector $u$ at that trial and the progress of the algorithm towards $u$.

Lemma 1. $a_t(\hat{y}_t - y_t)^2 - b_t(u \cdot x_t - y_t)^2 \le D_{\text{Burg}}(u, w_t) - D_{\text{Burg}}(u, w_{t+1})$, where $a_t$ and $b_t$ are positive constants and $u$ is the optimal offline solution.

Proof.

$$\begin{aligned}
D_{\text{Burg}}(u, w_t) - D_{\text{Burg}}(u, w_{t+1}) &= \sum_{i=1}^{N}\left[ u_i\left(\frac{1}{w_{t,i}} - \frac{1}{w_{t+1,i}}\right) + \ln\frac{w_{t,i}}{w_{t+1,i}} \right]\\
&= -2\eta_t(\hat{y}_t - y_t)\,u\cdot x_t + \sum_{i=1}^{N}\ln\big(1 + 2\eta_t(\hat{y}_t - y_t)w_{t,i}x_{t,i}\big).
\end{aligned}$$

Consider the following two cases:

1. $\hat{y}_t - y_t \ge 0$: Since $w_{t,i} > 0$ and $x_{t,i} > 0$, it follows that $2\eta_t(\hat{y}_t - y_t)w_{t,i}x_{t,i} \ge 0$.

2. $\hat{y}_t - y_t < 0$: Since $\eta_t < \frac{1}{2(y_t - \hat{y}_t)x_{t,i}w_{t,i}}$, it follows that $2\eta_t(y_t - \hat{y}_t)w_{t,i}x_{t,i} < 1$, and so $2\eta_t(\hat{y}_t - y_t)w_{t,i}x_{t,i} > -1$.

Thus, in either case, $2\eta_t(\hat{y}_t - y_t)w_{t,i}x_{t,i} > -1$. We can therefore apply the inequality

$$\ln(1 + z) \ge z - z^2/2,$$

which holds for all $z > -1$. Hence

$$D_{\text{Burg}}(u, w_t) - D_{\text{Burg}}(u, w_{t+1}) \ge -2\eta_t(\hat{y}_t - y_t)\,u\cdot x_t + \sum_{i=1}^{N}\Big( 2\eta_t(\hat{y}_t - y_t)w_{t,i}x_{t,i} - 2\eta_t^2(\hat{y}_t - y_t)^2(w_{t,i}x_{t,i})^2 \Big).$$

Assume $x_{t,i} \le R$ and $\sum_{i=1}^{N} w_{t,i} \le m$; the above inequality simplifies to

$$D_{\text{Burg}}(u, w_t) - D_{\text{Burg}}(u, w_{t+1}) \ge -2\eta_t(\hat{y}_t - y_t)\,u\cdot x_t + 2\eta_t(\hat{y}_t - y_t)\hat{y}_t - 2\eta_t^2(\hat{y}_t - y_t)^2 mR^2. \tag{4}$$

Let $r = u \cdot x_t$. By (4), proving the lemma amounts to showing that

$$a_t(\hat{y}_t - y_t)^2 - b_t(r - y_t)^2 \le -2\eta_t(\hat{y}_t - y_t)r + 2\eta_t(\hat{y}_t - y_t)\hat{y}_t - 2\eta_t^2(\hat{y}_t - y_t)^2 mR^2 \tag{5}$$

for some positive constants $a_t$ and $b_t$. Consider the function

$$F(r) = a_t(\hat{y}_t - y_t)^2 - b_t(r - y_t)^2 + 2\eta_t(\hat{y}_t - y_t)r - 2\eta_t(\hat{y}_t - y_t)\hat{y}_t + 2\eta_t^2(\hat{y}_t - y_t)^2 mR^2.$$

Equation (5) is equivalent to $F(r) \le 0$ for all $r$. It can easily be seen that $F(r)$ is maximized when $r = y_t + \frac{\eta_t}{b_t}(\hat{y}_t - y_t)$. Substituting for $r$ in $F(r)$ and simplifying, we get:

$$a_t(\hat{y}_t - y_t)^2 + \frac{\eta_t^2}{b_t}(\hat{y}_t - y_t)^2 - 2\eta_t(\hat{y}_t - y_t)^2 + 2\eta_t^2(\hat{y}_t - y_t)^2 mR^2.$$

Hence, we need to prove that

$$0 \ge a_t(\hat{y}_t - y_t)^2 + \frac{\eta_t^2}{b_t}(\hat{y}_t - y_t)^2 - 2\eta_t(\hat{y}_t - y_t)^2 + 2\eta_t^2(\hat{y}_t - y_t)^2 mR^2.$$

Since $(\hat{y}_t - y_t)^2 \ge 0$, this amounts to showing

$$0 \ge a_t b_t + \eta_t^2 - 2\eta_t b_t + 2\eta_t^2 mR^2 b_t = Q(\eta_t).$$

Now, $Q(\eta_t)$ is minimized for $\eta_t = \frac{b_t}{1 + 2mR^2 b_t}$, which implies $b_t = \frac{\eta_t}{1 - 2\eta_t mR^2}$. Since $\eta_t \le \eta'_t = \frac{1}{4mR^2}$, we know that $b_t > 0$. With this choice of $b_t$, $Q(\eta_t) \le 0$ iff $a_t \le \eta_t$. Hence, for $0 \le a_t \le \eta_t$ and $b_t = \frac{\eta_t}{1 - 2\eta_t mR^2}$, Lemma 1 holds.

Using Lemma 1, we can sum the loss over all time steps to get an overall bound on the loss of the Burg entropy-based online learning algorithm.

Theorem 1. $L_{BG} \le \frac{1}{1 - 2\eta_T mR^2}\,L_u + \frac{1}{\eta_T}D_{\text{Burg}}(u, w_0)$, where $L_{BG}$ is the loss incurred by the Burg Gradient descent algorithm and $L_u$ is the loss incurred by the optimal offline algorithm.

Proof. By Lemma 1, for each trial $t$,

$$\eta_t(\hat{y}_t - y_t)^2 - \frac{\eta_t}{1 - 2\eta_t mR^2}(u\cdot x_t - y_t)^2 \le D_{\text{Burg}}(u, w_t) - D_{\text{Burg}}(u, w_{t+1}).$$

Hence, adding over all trials we get

$$\sum_{t=1}^{T}\eta_t(\hat{y}_t - y_t)^2 \le \sum_{t=1}^{T}\frac{\eta_t}{1 - 2\eta_t mR^2}(u\cdot x_t - y_t)^2 + D_{\text{Burg}}(u, w_0) - D_{\text{Burg}}(u, w_T).$$

Thus, since $\eta_t \ge \eta_T$ for all $t$, we have:

$$L_{BG} \le \frac{1}{1 - 2\eta_T mR^2}\,L_u + \frac{D_{\text{Burg}}(u, w_0)}{\eta_T}.$$

Note that similar bounds can be obtained using the framework presented in Gordon [5], but the resulting bound is substantially weaker than our bound and the proof is much more involved.

3 The Matrix Case

In the matrix case, at the $t$-th step the algorithm receives an instance $X_t$ ($X_t \in \mathbb{R}^{m\times m}_+$) and predicts $\hat{y}_t = \text{tr}(W_tX_t)$, where $W_t$ is the current model used by the algorithm. The error incurred at the $t$-th step is $\ell_t(W_t) = (y_t - \hat{y}_t)^2$. Similarly to the vector case, for a given $T$-trial sequence $S = \{(X_1, y_1), (X_2, y_2), \ldots, (X_T, y_T)\}$, the optimal offline solution is given by

$$U = \arg\min_{W}\sum_{t=1}^{T}\ell_t(W) \quad \text{s.t. } W \succ 0.$$

As a result, the goal here is to compare the loss of the online algorithm with that of the optimal offline solution, just as in the vector case. For the matrix case, we use the LogDet divergence as the regularization in the objective. The LogDet divergence is the generalization of $D_{\text{Burg}}$ to matrices, given by

$$D_{ld}(W, W_t) = \text{tr}(WW_t^{-1}) - \log\det(WW_t^{-1}) - m.$$
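As a small illustration (not part of the original report), the LogDet divergence above can be evaluated numerically as follows; the function name is hypothetical, and the $-m$ normalization term follows the standard definition and cancels in all the differences used in the analysis below.

```python
import numpy as np

def logdet_divergence(W, W_t):
    """D_ld(W, W_t) = tr(W W_t^{-1}) - log det(W W_t^{-1}) - m,
    for symmetric positive definite W and W_t of size m x m."""
    m = W.shape[0]
    M = np.linalg.solve(W_t, W)        # W_t^{-1} W has the same trace and determinant as W W_t^{-1}
    sign, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - m

# Sanity check: the divergence is zero when the two arguments coincide.
A = np.array([[2.0, 0.3], [0.3, 1.0]])
print(logdet_divergence(A, A))         # ~0.0
```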

3.1 Rank-One Input

We consider the special case where each input $X_t$ is a rank-one matrix. This assumption holds in the case of information-theoretic metric learning (discussed later), where the inputs are rank-one constraints. We further assume that the input is symmetric; thus, the input matrix $X_t$ can be written as $X_t = z_tz_t^T$. Using LogDet as the regularization term, we derive Algorithm 2 analogously to the vector case. We set $\eta_t$ carefully at each step so that the algorithm ensures positive definiteness of the weight matrix $W_t$; Lemma 2 proves this invariant.

Algorithm 2 Matrix Burg Gradient (MBG) Algorithm
1: Initialize: Set $\eta'_t = \frac{1}{4R^2}$ and $W_0 = \frac{1}{N}I$.
2: Prediction: Upon receiving the $t$-th instance $X_t$, give the prediction $\hat{y}_t = \text{tr}(W_tX_t)$.
3: Update: Upon receiving the $t$-th outcome $y_t$, update the weights according to the rule

$$W_{t+1}^{-1} = W_t^{-1} + 2\eta_t(\hat{y}_t - y_t)X_t, \tag{6}$$

where

$$\eta_t = \begin{cases}\eta'_t & \text{if } \hat{y}_t - y_t > 0, \\[6pt] \min\left(\eta'_t,\; \dfrac{1}{2(y_t - \hat{y}_t)\,\text{tr}\!\left(\big(I + (W_t^{-1} - I)^{-1}\big)W_tX_t\right)}\right) & \text{if } \hat{y}_t - y_t < 0. \end{cases} \tag{7}$$
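Below is a minimal NumPy sketch of a single rank-one step of Algorithm 2 (it is not from the original report; the names and the safety margin are hypothetical choices). It computes the adaptive rate (7) and applies the update (6) in its Sherman-Morrison form, Equation (8) below, so no explicit inversion of the new model is needed.

```python
import numpy as np

def mbg_step(W_t, z_t, y_t, R):
    """One step of the Matrix Burg Gradient (MBG) algorithm for X_t = z_t z_t^T."""
    n = len(z_t)
    eta_prime = 1.0 / (4 * R ** 2)
    yhat_t = z_t @ W_t @ z_t                  # tr(W_t X_t) = z_t^T W_t z_t
    err = yhat_t - y_t
    if err >= 0:
        eta_t = eta_prime
    else:
        # adaptive cap from (7); W_t^{-1} - I is positive definite because 0 < W_t < I
        M = np.eye(n) + np.linalg.inv(np.linalg.inv(W_t) - np.eye(n))
        cap = 1.0 / (2 * (y_t - yhat_t) * np.trace(M @ W_t @ np.outer(z_t, z_t)))
        eta_t = min(eta_prime, 0.9 * cap)     # 0.9 keeps the inequality strict (sketch choice)
    # Sherman-Morrison form of update (6), i.e., Equation (8)
    Wz = W_t @ z_t
    coeff = 2 * eta_t * err / (1 + 2 * eta_t * err * (z_t @ Wz))
    return W_t - coeff * np.outer(Wz, Wz)
```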

Lemma 2. Algorithm 2 preserves the invariant $0 \prec W_t \prec I$.

Proof. We prove this by induction over $t$. The base case holds trivially for $N > 1$. By the induction hypothesis, $0 \prec W_t \prec I$. Write $X_t = zz^T$, since $X_t$ is a symmetric positive semidefinite rank-one matrix. Thus, using the update rule and the Sherman-Morrison-Woodbury formula [4]:

$$W_{t+1} = W_t - \frac{2\eta_t(\hat{y}_t - y_t)\,W_tzz^TW_t}{1 + 2\eta_t(\hat{y}_t - y_t)\,z^TW_tz}. \tag{8}$$

We break the proof into the following two cases.

Case 1: $\hat{y}_t - y_t \ge 0$. Consider the update $W_{t+1}^{-1} = W_t^{-1} + 2\eta_t(\hat{y}_t - y_t)X_t$. Since $\eta_t > 0$ and $\hat{y}_t - y_t \ge 0$, we have $2\eta_t(\hat{y}_t - y_t)X_t \succeq 0$. Now, $W_{t+1}^{-1}$ is the sum of a symmetric positive definite matrix and a symmetric positive semidefinite matrix, and so $W_{t+1}$ is positive definite. Using Equation (8),

$$I - W_{t+1} = I - W_t + \frac{2\eta_t(\hat{y}_t - y_t)\,W_tzz^TW_t}{1 + 2\eta_t(\hat{y}_t - y_t)\,z^TW_tz}.$$

Since $W_t$ is symmetric, $W_tzz^TW_t = W_tzz^TW_t^T = vv^T \succeq 0$, where $v = W_tz$. Also, $\eta_t > 0$ and $\hat{y}_t - y_t \ge 0$, and thus the second term above is positive semidefinite. Furthermore, $W_t \prec I$, and so $I - W_t \succ 0$. Thus, $I - W_{t+1}$ is the sum of a symmetric positive definite and a symmetric positive semidefinite matrix, implying $I - W_{t+1} \succ 0$. We conclude that $W_{t+1} \prec I$.

Case 2: $\hat{y}_t - y_t < 0$. By Equation (8),

$$W_{t+1} = W_t - \frac{2\eta_t(\hat{y}_t - y_t)\,W_tzz^TW_t}{1 + 2\eta_t(\hat{y}_t - y_t)\,z^TW_tz} = W_t + \frac{2\eta_t(y_t - \hat{y}_t)\,W_tzz^TW_t}{1 - 2\eta_t(y_t - \hat{y}_t)\,z^TW_tz}.$$

The update rule guarantees that $\eta_t < \frac{1}{2(y_t - \hat{y}_t)\,\text{tr}(W_tX_t)}$. Since

$$\text{tr}(W_tX_t) = \text{tr}(W_tzz^T) = \text{tr}(z^TW_tz) = z^TW_tz, \tag{9}$$

we know that $1 - 2\eta_t(y_t - \hat{y}_t)z^TW_tz > 0$, implying that $W_{t+1}$ is the sum of a positive definite and a positive semidefinite matrix; hence $W_{t+1} \succ 0$. Write $W_t$ via its eigendecomposition $W_t = Q\Sigma Q^T$. Since $W_t$ is a positive definite matrix, $\Sigma$ is a diagonal matrix with positive diagonal. Using this decomposition, the update for $W_{t+1}$ is expressed as

$$W_{t+1} = Q\Sigma Q^T - \frac{2\eta_t(\hat{y}_t - y_t)\,Q\Sigma Q^Tzz^TQ\Sigma Q^T}{1 + 2\eta_t(\hat{y}_t - y_t)\,z^TQ\Sigma Q^Tz}. \tag{10}$$

Since $W_t \prec I$, we have $I - \Sigma \succ 0$. Using Equation (10),

$$\begin{aligned}
I - W_{t+1} &= I - Q\Sigma Q^T - \frac{2\eta_t(y_t - \hat{y}_t)\,Q\Sigma Q^Tzz^TQ\Sigma Q^T}{1 + 2\eta_t(\hat{y}_t - y_t)\,z^TQ\Sigma Q^Tz}\\
&= Q\left(I - \Sigma - \frac{2\eta_t(y_t - \hat{y}_t)\,\Sigma Q^Tzz^TQ\Sigma}{1 + 2\eta_t(\hat{y}_t - y_t)\,z^TQ\Sigma Q^Tz}\right)Q^T\\
&= Q(I - \Sigma)^{1/2}\left(I - \frac{2\eta_t(y_t - \hat{y}_t)\,(I-\Sigma)^{-1/2}\Sigma Q^Tzz^TQ\Sigma(I-\Sigma)^{-1/2}}{1 + 2\eta_t(\hat{y}_t - y_t)\,z^TQ\Sigma Q^Tz}\right)(I - \Sigma)^{1/2}Q^T.
\end{aligned}$$

Let $G = Q(I - \Sigma)^{1/2}$. If the middle term is symmetric positive definite, then it can be written as $EE^T$. This would lead to $I - W_{t+1} = GEE^TG^T = (GE)(GE)^T \succeq 0$, since $AA^T \succeq 0$ for any $A$. Thus, to prove that $W_{t+1} \prec I$, we need to prove that

$$I - \frac{2\eta_t(y_t - \hat{y}_t)\,(I-\Sigma)^{-1/2}\Sigma Q^Tzz^TQ\Sigma(I-\Sigma)^{-1/2}}{1 + 2\eta_t(\hat{y}_t - y_t)\,z^TQ\Sigma Q^Tz} \succ 0. \tag{11}$$

To do this, we calculate the eigenvalues of this matrix and show that they are greater than zero. Let $v = (I-\Sigma)^{-1/2}\Sigma Q^Tz$. Then $(I-\Sigma)^{-1/2}\Sigma Q^Tzz^TQ\Sigma(I-\Sigma)^{-1/2} = vv^T$. Since $vv^T$ is a rank-one matrix, it has exactly one nonzero eigenvalue, and that eigenvalue is equal to its trace. Expanding the trace yields

$$\text{tr}\big((I-\Sigma)^{-1/2}\Sigma Q^Tzz^TQ\Sigma(I-\Sigma)^{-1/2}\big) = \text{tr}\big(Q\Sigma(I-\Sigma)^{-1}\Sigma Q^Tzz^T\big) = \text{tr}\big(Q(\Sigma^{-1} - I)^{-1}\Sigma Q^Tzz^T\big) = \text{tr}\big((W_t^{-1} - I)^{-1}W_tX_t\big).$$

To prove Equation (11), we need to show that the eigenvalues are positive; equivalently, we can show that

$$1 - \frac{2\eta_t(y_t - \hat{y}_t)\,\text{tr}\big((W_t^{-1} - I)^{-1}W_tX_t\big)}{1 + 2\eta_t(\hat{y}_t - y_t)\,\text{tr}(W_tX_t)} > 0. \tag{12}$$

By the update rule, $\eta_t < \frac{1}{2(y_t - \hat{y}_t)\,\text{tr}\left(\big(I + (W_t^{-1} - I)^{-1}\big)W_tX_t\right)}$, so

$$2\eta_t(y_t - \hat{y}_t)\Big(\text{tr}(W_tX_t) + \text{tr}\big((W_t^{-1} - I)^{-1}W_tX_t\big)\Big) < 1.$$

In particular $2\eta_t(y_t - \hat{y}_t)\,\text{tr}(W_tX_t) < 1$, and hence the denominator of the second term in (12) is positive. Also, $2\eta_t(y_t - \hat{y}_t)\,\text{tr}\big((W_t^{-1} - I)^{-1}W_tX_t\big) < 1 - 2\eta_t(y_t - \hat{y}_t)\,\text{tr}(W_tX_t) = 1 + 2\eta_t(\hat{y}_t - y_t)\,\text{tr}(W_tX_t)$, so Equation (12) holds. Thus $W_{t+1} \prec I$.

3.2 Regret Bound

Similar to the vector case, we bound the loss incurred by the algorithm at each step in terms of the loss incurred by the optimal offline solution. Assume $\|z\|_2 \le R$.

Lemma 3. $a_t(\hat{y}_t - y_t)^2 - b_t(\text{tr}(UX_t) - y_t)^2 \le D_{ld}(U, W_t) - D_{ld}(U, W_{t+1})$, where $a_t$ and $b_t$ are positive constants and $U$ is the optimal offline solution.

Proof.

$$D_{ld}(U, W_t) - D_{ld}(U, W_{t+1}) = \log\left(\frac{\det W_t}{\det W_{t+1}}\right) + \text{tr}\big((W_t^{-1} - W_{t+1}^{-1})U\big).$$

Since $\log(\det(A)) = \text{tr}(\log(A))$ and $\text{tr}(AB) = \text{tr}(BA)$, we have that

$$\begin{aligned}
D_{ld}(U, W_t) - D_{ld}(U, W_{t+1}) &= \text{tr}\big(\log(W_t) - \log(W_{t+1})\big) + \text{tr}\big((W_t^{-1} - W_{t+1}^{-1})U\big)\\
&= \text{tr}\big(\log(W_tW_{t+1}^{-1})\big) - 2\eta_t(\hat{y}_t - y_t)\,\text{tr}(X_tU)\\
&= \text{tr}\big(\log(I + 2\eta_t(\hat{y}_t - y_t)W_tX_t)\big) - 2\eta_t(\hat{y}_t - y_t)\,\text{tr}(UX_t).
\end{aligned}$$

Using the Taylor series expansion and ignoring higher-order terms,

$$D_{ld}(U, W_t) - D_{ld}(U, W_{t+1}) \ge 2\eta_t(\hat{y}_t - y_t)\,\text{tr}(W_tX_t) - 2\eta_t^2(\hat{y}_t - y_t)^2\,\text{tr}\big((W_tX_t)^2\big) - 2\eta_t(\hat{y}_t - y_t)\,\text{tr}(UX_t).$$

We write $X_t$ as $zz^T$, so $\text{tr}\big((W_tX_t)^2\big) = \text{tr}(W_tzz^TW_tzz^T) = (z^TW_tz)\,\text{tr}(W_tzz^T) = \text{tr}(W_tX_t)^2$. Since $W_t \prec I$ by Lemma 2, $\text{tr}(W_tX_t) = z^TW_tz \le z^Tz \le R^2$. As a result,

$$\begin{aligned}
D_{ld}(U, W_t) - D_{ld}(U, W_{t+1}) &\ge 2\eta_t(\hat{y}_t - y_t)\,\text{tr}(W_tX_t) - 2\eta_t^2(\hat{y}_t - y_t)^2R^2 - 2\eta_t(\hat{y}_t - y_t)\,\text{tr}(UX_t)\\
&= 2\eta_t(\hat{y}_t - y_t)\hat{y}_t - 2\eta_t^2(\hat{y}_t - y_t)^2R^2 - 2\eta_t(\hat{y}_t - y_t)r,
\end{aligned}$$

where $r = \text{tr}(UX_t)$. Thus, to prove Lemma 3, we need to prove the same inequality as in Equation (5), with $mR^2$ replaced by $R^2$. Hence, for $0 \le a_t \le \eta_t$ and $b_t = \frac{\eta_t}{1 - 2\eta_tR^2}$, Lemma 3 holds.

As in the vector case, we can obtain the final bound by summing the loss at each step.

Theorem 2. $L_{MBG} \le \frac{1}{1 - 2\eta_TR^2}\,L_U + \frac{1}{\eta_T}H_{ld}(U)$, where $L_{MBG}$ is the loss incurred by the Matrix Burg Gradient descent algorithm, $L_U$ is the loss incurred by the optimal offline algorithm, and $H_{ld}(U) = \log\det(U)$.

Proof. The proof is very similar to that of the vector case.

4 Application: Online Metric Learning

Selecting an appropriate distance measure (or metric) is fundamental to many learning algorithms such as k-means, nearest neighbor searches, and others. However, choosing such a measure is highly problem-specific and ultimately dictates the success or failure of the learning algorithm. To this end, there have been several recent approaches that attempt to learn distance functions. These methods work by exploiting distance information that is intrinsically available in many learning settings. For example, in the problem of semi-supervised clustering, points are constrained to be either similar (i.e., the distance between them should be relatively small) or dissimilar (the distance should be larger). In information retrieval settings, constraints between pairs of distances can be gathered from click-through feedback. In fully supervised settings, constraints can be inferred so that points in the same class have smaller distances to each other than to points in different classes.

Given a set of $n$ points $\{x_1, \ldots, x_n\}$ in $\mathbb{R}^d$, we seek a positive definite matrix $A$ which parameterizes the Mahalanobis distance:

$$d_A(x_i, x_j) = (x_i - x_j)^TA(x_i - x_j). \tag{13}$$

(The term Mahalanobis distance frequently refers to the squared Mahalanobis distance, even though it is $\sqrt{d_A(x, y)}$ that is a metric. This difference does not affect our method, hence we drop the qualification "squared" when referring to (13).)
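As a quick numerical illustration (not in the original report; the variable names are hypothetical), the Mahalanobis distance (13) can equivalently be computed as a trace against the rank-one matrix $zz^T$ with $z = x_i - x_j$, which is exactly the form used by the rank-one regression algorithm above.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
B = rng.normal(size=(d, d))
A = B @ B.T + np.eye(d)                   # a positive definite Mahalanobis matrix
x_i, x_j = rng.normal(size=d), rng.normal(size=d)

z = x_i - x_j
direct = z @ A @ z                        # (x_i - x_j)^T A (x_i - x_j), Equation (13)
as_trace = np.trace(A @ np.outer(z, z))   # tr(A z z^T)

print(np.isclose(direct, as_trace))       # True: the two forms agree
```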

We can quantify the measure of closeness between two Mahalanobis matrices $A$ and $A_0$ via a natural information-theoretic approach. There exists a simple bijection (up to a scaling function) between the set of Mahalanobis distances and the set of equal-mean multivariate Gaussian distributions (without loss of generality, we can assume the Gaussians have mean $\mu$). Given a Mahalanobis distance parameterized by $A$, we express its corresponding multivariate Gaussian as $p(x; A) = \frac{1}{Z}\exp\left(-\frac{1}{2}d_A(x, \mu)\right)$, where $Z$ is a normalizing constant. Using this bijection, we measure the distance between two Mahalanobis distance functions parameterized by $A$ and $A_0$ by the (differential) relative entropy between their corresponding multivariate Gaussians:

$$KL\big(p(x; A)\,\|\,p(x; A_0)\big) = \int p(x; A)\log\frac{p(x; A)}{p(x; A_0)}\,dx. \tag{14}$$

The distance (14) provides a well-founded measure of closeness between two Mahalanobis distance functions and forms the basis of information-theoretic metric learning. It is known that the differential relative entropy between two multivariate Gaussians can be expressed as the convex combination of a Mahalanobis distance between mean vectors and the LogDet divergence between the covariance matrices. Assuming the means to be the same, we have

$$KL\big(p(x; A)\,\|\,p(x; A_0)\big) = \frac{1}{2}D_{ld}(A, A_0). \tag{15}$$

We therefore select $D(A, A_t) = D_{ld}(A, A_t)$ as the regularizer for (1), using the above equivalence (15) to obtain the update

$$A_{t+1} = \arg\min_{A}\; D_{ld}(A, A_t) + \eta_t\big(d_t - \hat{d}_t\big)^2.$$

We can view the problem of online metric learning in the following manner: at each time step, we receive a matrix $X_t$ and a target $y_t$; these encode the distance between two vectors $x_i$ and $x_j$. If we let $z = x_i - x_j$, then

$$\text{tr}(A_tX_t) = z^TA_tz = (x_i - x_j)^TA_t(x_i - x_j) = d_{A_t}(x_i, x_j).$$

Thus, the prediction $\text{tr}(A_tX_t)$ captures the Mahalanobis distance between the vectors $x_i$ and $x_j$ using the current Mahalanobis matrix $A_t$. The resulting loss $(\text{tr}(A_tX_t) - y_t)^2$, where $y_t$ is the target distance between the two vectors, tells us how close the current Mahalanobis matrix $A_t$ is in terms of satisfying the given distance constraint. As a result, the rank-one online regression algorithm using the LogDet divergence can be used as a method for learning a Mahalanobis matrix $A$, where at every time step two points, along with their target distance, are presented to the algorithm. The regret bound proven in the previous section applies directly to this case. See [2, 3] for further details on information-theoretic metric learning, including the analysis of an offline algorithm.

5 Experimental Results

In Sections 2.1 and 3.2, we showed worst-case bounds for our algorithms. To simulate the online learning problem, we now generate simple artificial data. First, we generate a random vector $u$ drawn from $S^m$. Then, we generate some random inputs $x_t$, also drawn from $S^m$. We output $y_t$ as $u \cdot x_t$ mixed with some noise, i.e., $y_t = (u \cdot x_t)(1 + \nu Z)$, where $Z$ is a uniformly distributed random number over $[-0.5, 0.5]$ and $\nu$ is the noise level. We generate this artificial data for $N = 100$ and $T = 10000$ iterations. Figure 1 compares the performance of the Burg entropy-based algorithm with those using relative entropy and squared Euclidean distance as the regularization term, when the noise level is set to 0. Clearly, the Burg algorithm converges more slowly than the other methods. But Figures 2 and 3 show that as noise increases, the Burg algorithm's performance improves in comparison to both relative entropy and squared Euclidean distance. Figure 4 shows that when the noise is around 30%, the Burg algorithm performs comparably to relative entropy, and squared Euclidean distance performs poorly.
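A minimal sketch of the simulation protocol described above (not from the original report): it generates the noisy targets $y_t = (u \cdot x_t)(1 + \nu Z)$ and can feed them to any of the online updates; the sampling distribution, normalization, and function names are hypothetical choices.

```python
import numpy as np

def make_stream(dim=100, T=10000, noise=0.1, seed=0):
    """Yield (x_t, y_t) pairs with multiplicative uniform noise on the targets."""
    rng = np.random.default_rng(seed)
    u = rng.random(dim)
    u /= u.sum()                      # a fixed positive "true" model (assumed normalization)
    for _ in range(T):
        x_t = rng.random(dim)         # positive inputs, as required by the Burg updates
        Z = rng.uniform(-0.5, 0.5)
        y_t = (u @ x_t) * (1 + noise * Z)
        yield x_t, y_t

# Example: peek at one trial from a 30%-noise stream.
x0, y0 = next(make_stream(dim=100, T=10000, noise=0.3))
print(x0.shape, y0)
```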
Figure 1: Comparison of the performance of the Burg, relative entropy, and Euclidean distance measures with noise = 0%.
Figure 2: Comparison of the performance of the Burg, relative entropy, and Euclidean distance measures with noise = 10%.
Figure 3: Comparison of the performance of the Burg, relative entropy, and Euclidean distance measures with noise = 20%.
Figure 4: Comparison of the performance of the Burg, relative entropy, and Euclidean distance measures with noise = 30%.

6 Conclusion

In this paper, we introduced online regression based on Burg entropy regularization, leading to new online regret bounds. An extension to the rank-one matrix case was also analyzed. The main application is to online metric learning, which can be cast as an online regression problem for rank-one matrices; this metric learning formulation has an information-theoretic motivation, and results in [2] have shown that it outperforms existing methods. Future work includes improving the bounds so that they do not depend on the learning rate, and extending the matrix analysis to the general case.


References

[1] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, Clustering with Bregman divergences, in SDM, 2004.
[2] J. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, Information-theoretic metric learning, in International Conference on Machine Learning (ICML), 2007, to appear.
[3] J. Davis, B. Kulis, S. Sra, and I. Dhillon, Information-theoretic metric learning, in NIPS 2006 Workshop on Learning to Compare Examples, 2006.
[4] G. H. Golub and C. F. van Loan, Matrix Computations, Johns Hopkins Series in the Mathematical Sciences, Johns Hopkins Univ. Press, second ed., 1989.
[5] G. J. Gordon, Regret bounds for prediction problems, in COLT, 1999.
[6] J. Kivinen and M. K. Warmuth, Exponentiated gradient versus gradient descent for linear predictors, Inf. Comput., 132 (1997), pp. 1-63.
[7] B. Kulis, M. Sustik, and I. S. Dhillon, Learning low-rank kernel matrices, in ICML, 2006.
[8] K. Tsuda, G. Raetsch, and M. K. Warmuth, Matrix exponentiated gradient updates for online learning and Bregman projection, Journal of Machine Learning Research, 6 (2005).
