A Novel Family of Boosted Online Regression Algorithms with Strong Theoretical Bounds


Dariush Kari, Farhan Khan, Selami Ciftci, Suleyman S. Kozat

arXiv, v2 [math.ST], 6 Dec 2016

Dariush Kari and Suleyman S. Kozat are with the Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey (kari@ee.bilkent.edu.tr, kozat@ee.bilkent.edu.tr). Selami Ciftci is with Turk Telekom Communications Services Inc., Istanbul, Turkey (selami.ciftci1@turktelekom.com.tr). Farhan Khan is with the Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey, and also with the Electrical Engineering Department, COMSATS Institute of Information Technology, Pakistan (khan@ee.bilkent.edu.tr, engrfarhan@ciit.net.pk).

Abstract: We investigate boosted online regression and propose a novel family of regression algorithms with strong theoretical bounds. In addition, we implement several variants of the proposed generic algorithm. We specifically provide theoretical bounds for the performance of our proposed algorithms that hold in a strong mathematical sense. We achieve guaranteed performance improvement over the conventional online regression methods without any statistical assumptions on the desired data or feature vectors. We demonstrate an intrinsic relationship, in terms of boosting, between the adaptive mixture-of-experts and data reuse algorithms. Furthermore, we introduce a boosting algorithm based on random updates that is significantly faster than the conventional boosting methods and the other variants of our proposed algorithms, while achieving an enhanced performance gain. Hence, the random updates method is specifically applicable to fast and high dimensional streaming data. Specifically, we investigate Newton Method-based and Stochastic Gradient Descent-based linear regression algorithms in a mixture-of-experts setting, and provide several variants of these well known adaptation methods. However, the proposed algorithms can be extended to other base learners, e.g., nonlinear or tree-based piecewise linear learners. Furthermore, we provide theoretical bounds for the computational complexity of our proposed algorithms. We demonstrate substantial performance gains in terms of mean square error over

the base learners through an extensive set of benchmark real data sets and simulated examples.

Keywords: Online boosting, online regression, boosted regression, ensemble learning, smooth boost, mixture methods

1 Introduction

Boosting is considered one of the most important ensemble learning methods in the machine learning literature and it is extensively used in several different real life applications from classification to regression (Bauer and Kohavi 1999; Dietterich 2000; Schapire and Singer 1999; Schapire and Freund 2012; Freund and Schapire 1997; Shrestha and Solomatine 2006; Shalev-Shwartz and Singer 2010; Saigo et al. 2009; Demiriz et al. 2002). As an ensemble learning method (Fern and Givan 2003; Soltanmohammadi et al. 2016; Duda et al. 2001), boosting combines several parallel running weakly performing algorithms to build a final strongly performing algorithm (Soltanmohammadi et al. 2016; Freund 2001; Schapire and Freund 2012; Mannor and Meir 2002). This is accomplished by finding a linear combination of weak learning algorithms in order to minimize the total loss over a set of training data, commonly using a functional gradient descent (Duffy and Helmbold 2002; Freund and Schapire 1997). Boosting has been successfully applied to several different problems in the machine learning literature, including classification (Jin and Zhang 2007; Chapelle et al. 2011; Freund and Schapire 1997), regression (Duffy and Helmbold 2002; Shrestha and Solomatine 2006), and prediction (Taieb and Hyndman 2014, 2013). However, significantly less attention has been given to the idea of boosting in the online regression framework. To this end, our goal is (a) to introduce a new boosting approach for online regression, (b) to derive several different online regression algorithms based on this boosting approach, (c) to provide mathematical guarantees for the performance improvements of our algorithms, and (d) to demonstrate the intrinsic connections of boosting with the adaptive mixture-of-experts algorithms (Arenas-Garcia et al. 2016; Kozat et al. 2010) and data reuse algorithms (Shaffer and Williams 1983).

Although boosting was initially introduced in the batch setting (Freund and Schapire 1997), where algorithms boost themselves over a fixed set of training data, it was later extended to the online setting (Oza and Russell 2001). In the online setting, however, we neither need nor have access to a fixed set of training data, since the data samples arrive one by one as a stream (Ben-David et al. 1997; Fern and Givan 2003; Lu et al. 2016). Each newly arriving data sample is processed and then discarded without any storing. The online setting is naturally motivated by many real life applications, especially those involving big data, where there may not be enough storage space available or the constraints of the problem require instant processing (Bottou and Bousquet 2008). Therefore, we concentrate on the online boosting framework and propose several algorithms for online regression tasks. In addition, since our algorithms are online, they can be directly used in adaptive filtering applications to improve the performance of conventional mixture-of-experts methods (Arenas-Garcia et al. 2016). For adaptive filtering purposes, the online setting is especially important, where the sequentially

arriving data is used to adjust the internal parameters of the filter, either to dynamically learn the underlying model or to track the nonstationary data statistics (Arenas-Garcia et al. 2016; Sayed 2003). Specifically, we have m parallel running weak learners (WLs) (Schapire and Freund 2012) that receive the input vectors sequentially. Each WL uses an update method, such as the second order Newton's Method (NM) or Stochastic Gradient Descent (SGD), depending on the target of the application or the problem constraints (Sayed 2003). After receiving the input vector, each algorithm produces its output and then calculates its instantaneous error after the observation is revealed. In the most generic setting, this estimation/prediction error and the corresponding input vector are then used to update the internal parameters of the algorithm to minimize an a priori defined loss function, e.g., the instantaneous squared error for the SGD algorithm. These updates are performed for all of the m WLs in the mixture. However, in the online boosting approaches, these adaptations at each time proceed in rounds from top to bottom, starting from the first WL to the last one, to achieve the boosting effect (Chen et al. 2012). Furthermore, unlike the usual mixture approaches (Arenas-Garcia et al. 2016; Kozat et al. 2010), the update of each WL depends on the previous WLs in the mixture. In particular, at each time t, after the k-th WL calculates its error over the (x_t, d_t) pair, it passes a certain weight to the next WL, the (k+1)-th WL, quantifying how much error the constituent WLs from the 1st to the k-th made on the current (x_t, d_t) pair. Based on the performance of the WLs 1 to k on the current (x_t, d_t) pair, the (k+1)-th WL may give a different emphasis (importance weight) to the (x_t, d_t) pair in its adaptation in order to rectify the mistakes of the previous WLs.

The proposed idea for online boosting is clearly related to the adaptive mixture-of-experts algorithms widely used in the machine learning literature, where several parallel running adaptive algorithms are combined to improve the performance. In the mixture methods, the performance improvement is achieved due to the diversity provided by using several different adaptive algorithms, each having a different view or advantage (Kozat et al. 2010). This diversity is exploited to yield a final combined algorithm, which achieves a performance better than any of the algorithms in the mixture. Although the online boosting approach is similar to mixture approaches (Kozat et al. 2010), there are significant differences. In the online boosting notion, the parallel running algorithms are not independent, i.e., one deliberately introduces the diversity by updating the WLs one by one, from the first WL to the m-th WL, for each new sample based on the performance of all the previous WLs on this sample. In this sense, each adaptive algorithm, say the (k+1)-th WL, receives feedback from the previous WLs, i.e., the 1st to the k-th, and updates its inner parameters accordingly. As an example, if the current (x_t, d_t) is well modeled by the previous WLs, then the (k+1)-th WL performs a minor update using (x_t, d_t) and may give more emphasis (importance weight) to later arriving samples that may be worse modeled by the previous WLs. Thus, by boosting, each adaptive algorithm in the mixture can concentrate on different parts of the input and output pairs, achieving diversity and significantly improving the gain.

The linear online learning algorithms, such as SGD or NM, are among the simplest as well as the most widely used regression algorithms in real-life applications (Sayed 2003). Therefore, we use such algorithms as base WLs in our boosting algorithms. To this end, we first apply the boosting notion to several parallel

running linear NM-based WLs and introduce three different approaches to use the importance weights (Chen et al. 2012), namely weighted updates, data reuse, and random updates. In the first approach, we use the importance weights directly to produce certain weighted NM algorithms. In the second approach, we use the importance weights to construct data reuse adaptive algorithms (Oza and Russell 2001). However, data reuse in boosting, such as in (Oza and Russell 2001), is significantly different from the usual data reusing approaches in adaptive filtering (Shaffer and Williams 1983). As an example, in boosting, the importance weight coming from the k-th WL determines the data reuse amount in the (k+1)-th WL, i.e., it is not used for the k-th filter, hence achieving the diversity. The third approach uses the importance weights to decide whether to update the constituent WLs or not, based on a random number generated from a Bernoulli distribution with parameter equal to the weight. The latter method can be effectively used for big data processing (Malik 2013) due to its reduced complexity. The outputs of the constituent WLs are also combined using a linear mixture algorithm to construct the final output. We then update the final combination algorithm using the SGD algorithm (Kozat et al. 2010). Furthermore, we extend the boosting idea to parallel running linear SGD-based algorithms, similar to the NM case.

We start our discussions by investigating the related works in Section 2. We then introduce the problem setup and background in Section 3, where we provide individual sequence as well as MSE convergence results for the NM and SGD algorithms. We introduce our generic boosted online regression algorithm in Section 4 and provide the mathematical justifications for its performance. Then, in Sections 5 and 6, three different variants of the proposed boosting algorithm are derived, using the NM and SGD, respectively. Then, in Section 7 we provide the mathematical analysis of the computational complexity of the proposed algorithms. The paper concludes with extensive sets of experiments over well known benchmark data sets and simulation models widely used in the machine learning literature to demonstrate the significant gains achieved by the boosting notion.

2 Related Works

AdaBoost is one of the earliest and most popular boosting methods, which has been used for binary and multiclass classification as well as regression (Freund and Schapire 1997). This algorithm has been well studied and has clear theoretical guarantees, and its excellent performance is explained rigorously (Breiman 1997). However, AdaBoost cannot perform well on noisy data sets (Servedio 2003); therefore, other boosting methods have been suggested that are more robust against noise. In order to reduce the effect of noise, SmoothBoost was introduced in (Servedio 2003) in a batch setting. Moreover, in (Servedio 2003) the author proves the termination time of the SmoothBoost algorithm by simultaneously obtaining upper and lower bounds on the weighted advantage of all samples over all of the weak learners. We note that the SmoothBoost algorithm avoids overemphasizing the noisy samples, hence providing robustness against noise. In (Oza and Russell 2001), the authors extend bagging and boosting methods to an online setting, where they use a Poisson sampling process to approximate the reweighting algorithm. However, the online boosting method in (Oza and Russell 2001)

corresponds to AdaBoost, which is susceptible to noise. In (Babenko et al. 2009), the authors use a greedy optimization approach to develop the boosting notion in the online setting and introduce stochastic boosting. Nevertheless, while most of the online boosting algorithms in the literature seek to approximate AdaBoost, (Chen et al. 2012) investigates the inherent difference between batch and online learning, extends the SmoothBoost algorithm to an online setting, and provides mathematical guarantees for their algorithm. (Chen et al. 2012) points out that the online weak learners do not need to perform well on all possible distributions of data; instead, they have to perform well only with respect to smoother distributions. Recently, in (Beygelzimer et al. 2015b) the authors have developed two online boosting algorithms for classification, an optimal algorithm in terms of the number of weak learners, and also an adaptive algorithm using potential functions and boost-by-majority (Freund 1995). In addition to the classification task, the boosting approach has also been developed for regression (Duffy and Helmbold 2002). In (Bertoni et al. 1997), a boosting algorithm for regression is proposed, which is an extension of AdaBoost.R (Bertoni et al. 1997). Moreover, in (Duffy and Helmbold 2002), several gradient descent algorithms are presented, and some bounds on their performances are provided. In (Babenko et al. 2009) the authors present a family of boosting algorithms for online regression through greedy minimization of a loss function. Also, in (Beygelzimer et al. 2015a) the authors propose an online gradient boosting algorithm for regression.

In this paper we propose a novel family of boosted online algorithms for the regression task using the online boosting notion introduced in (Chen et al. 2012), and investigate three different variants of the introduced algorithm. Furthermore, we show that our algorithm can achieve a desired mean squared error (MSE), given a sufficient amount of data and a sufficient number of weak learners. In addition, we use techniques similar to (Servedio 2003) to prove the correctness of our algorithm. We emphasize that our algorithm has a guaranteed performance in an individual sequence manner, i.e., without any statistical assumptions on the data. In establishing our algorithm and its justifications, we refrain from converting the regression problem into a classification problem, unlike AdaBoost.R (Freund and Schapire 1997). Furthermore, unlike the online SmoothBoost (Chen et al. 2012), our algorithm can learn the guaranteed MSE of the weak learners, which in turn improves its adaptivity.

3 Problem Description and Background

All vectors are column vectors and are represented by bold lower case letters. Matrices are represented by bold upper case letters. For a vector a (or a matrix A), a^T (or A^T) is its transpose and Tr(A) is the trace of the matrix A. Here, I_m and 0_m represent the identity matrix of dimension m x m and the all zeros vector of length m, respectively. Except for I_m and 0_m, the time index is given in the subscript, i.e., x_t is the sample at time t. We work with real data for notational simplicity. We denote the mean of a random variable x as E[x]. Also, we denote the cardinality of a set S by |S|. We sequentially receive r-dimensional input (regressor) vectors {x_t}, x_t in R^r, and desired data {d_t}, and estimate d_t by \hat{d}_t = f_t(x_t), where f_t(.) is an

online regression algorithm. At each time t the estimation error is given by e_t = d_t - \hat{d}_t and is used to update the parameters of the WL. For presentation purposes, we assume that d_t is in [-1, 1]; however, our derivations hold for any bounded but arbitrary desired data sequences. In our framework, we do not use any statistical assumptions on the input feature vectors or on the desired data, such that our results are guaranteed to hold in an individual sequence manner (Kozat and Singer Jan. 2008).

The linear methods are considered the simplest online modeling or learning algorithms, which estimate the desired data d_t by a linear model as \hat{d}_t = w_t^T x_t, where w_t is the linear algorithm's coefficient vector at time t. Note that this expression also covers the affine model if one includes a constant term in x_t, hence we use the purely linear form for notational simplicity. When the true d_t is revealed, the algorithm updates its coefficients w_t based on the error e_t. As an example, in the basic implementation of the NM algorithm, the coefficients are selected to minimize the accumulated squared regression error up to time t-1 as

w_t = \arg\min_{w} \sum_{l=1}^{t-1} (d_l - x_l^T w)^2 = \left( \sum_{l=1}^{t-1} x_l x_l^T \right)^{-1} \left( \sum_{l=1}^{t-1} x_l d_l \right),   (1)

where w is a fixed vector of coefficients. The NM algorithm is shown to enjoy several optimality properties under different statistical settings (Sayed 2003). Apart from these results, and more related to the framework of this paper, the NM algorithm is also shown to be rate optimal in an individual sequence manner (Merhav and Feder 1993). As shown in (Merhav and Feder 1993, Section V), when applied to any sequences {x_t} and {d_t}, the accumulated squared error of the NM algorithm is as small as the accumulated squared error of the best batch least squares (LS) method that is directly optimized for these realizations of the sequences, i.e., for all T, {x_t} and {d_t}, the NM achieves

\sum_{l=1}^{T} (d_l - x_l^T w_l)^2 - \min_{w} \sum_{l=1}^{T} (d_l - x_l^T w)^2 \le O(\ln T).   (2)

The NM algorithm is a member of the Follow-the-Leader type algorithms (Cesa-Bianchi and Lugosi 2006, Section 3), where one uses the best performing linear model up to time t-1 to predict d_t. Hence, (2) follows by direct application of the online convex optimization results (Shalev-Shwartz 2012) after regularization. The convergence rate (or the rate of the regret) of the NM algorithm is also shown to be optimal, so that the O(\ln T) in the upper bound cannot be improved (Singer et al. 2002). It is also shown in (Singer et al. 2002) that one can reach the optimal upper bound (with exact scaling terms) by using a slightly modified version of (1):

w_t = \left( \sum_{l=1}^{t} x_l x_l^T \right)^{-1} \left( \sum_{l=1}^{t-1} x_l d_l \right).   (3)

Note that the extension (3) of (1) is a forward algorithm (Section 5 of Azoury and Warmuth 2001), and one can show that, in the scalar case, the predictions of (3) are always bounded, which is not the case for (1) (Singer et al. 2002).
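To make the forward prediction rule (3) concrete, the following is a minimal NumPy sketch of an online NM (follow-the-leader least squares) learner; the small ridge term `reg` is an assumption added only to keep the matrix invertible at start-up, and the function name is illustrative rather than the paper's:

```python
import numpy as np

def nm_forward_predictions(X, d, reg=1e-3):
    """Online NM predictions in the spirit of (1) and (3).

    X: (T, r) regressor matrix, d: (T,) desired data.
    reg: small ridge term (an assumption; the paper only refers to
    regularization abstractly when invoking online convex optimization).
    """
    T, r = X.shape
    A = reg * np.eye(r)          # running sum of x_l x_l^T (plus regularization)
    b = np.zeros(r)              # running sum of x_l d_l
    preds = np.zeros(T)
    for t in range(T):
        x = X[t]
        A += np.outer(x, x)      # forward rule (3): include x_t before predicting
        w = np.linalg.solve(A, b)
        preds[t] = w @ x
        b += d[t] * x            # d_t is revealed only after the prediction
    return preds
```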

We emphasize that in the basic application of the NM algorithm, all data pairs (d_l, x_l), l = 1, ..., t, receive the same importance or weight in (1). Although there exist exponentially weighted or windowed versions of the basic NM algorithm (Sayed 2003), these methods weight (or concentrate on) the most recent samples for better modeling of the nonstationarity (Sayed 2003). However, in the boosting framework (Freund and Schapire 1997), each sample pair receives a different weight based not only on those weighting schemes, but also on the performance of the boosted algorithms on this pair. As an example, if a WL performs worse on a sample, the next WL concentrates more on this example to better rectify this mistake. In the following sections, we use this notion to derive different boosted online regression algorithms. Although in this paper we use linear WLs for the sake of notational simplicity, one can readily extend our approach to nonlinear and piecewise linear regression methods. For example, one can use tree based online regression methods (Khan et al. 2016; Vanli and Kozat 2014; Kozat et al. 2007) as the weak learners, and boost them with the proposed approach.

4 New Boosted Online Regression Algorithm

In this section we present the generic form of our proposed algorithms and provide the guaranteed performance bounds. Regarding the notion of online boosting introduced in (Chen et al. 2012), the online weak learners need to perform well only over smooth distributions of data points. We first present the generic algorithm in Algorithm 1 and provide its theoretical justifications, and then discuss its structure and the intuition behind it.

Algorithm 1 Boosted online regression algorithm
1: Input: (x_t, d_t) (data stream), m (number of weak learners running in parallel), sigma_m^2 (the modified desired MSE), and sigma^2 (the guaranteed achievable weighted MSE).
2: Initialize the regression coefficients w_1^(k) for each WL, and the combination coefficients as z_1 = (1/m)[1, 1, ..., 1]^T;
3: for t = 1 to T do
4:   Receive the regressor data instance x_t;
5:   Compute the WLs' outputs \hat{d}_t^(k);
6:   Produce the final estimate \hat{d}_t = z_t^T y_t, where y_t = [\hat{d}_t^(1), ..., \hat{d}_t^(m)]^T;
7:   Receive the true output d_t (desired data);
8:   lambda_t^(1) = 1; l_t^(1) = 0;
9:   for k = 1 to m do
10:     lambda_t^(k) = min{1, (sigma^2)^(l_t^(k)/2)};
11:     Update the k-th WL, such that it has a weighted MSE of at most sigma^2;
12:     e_t^(k) = d_t - \hat{d}_t^(k);
13:     l_t^(k+1) = l_t^(k) + [sigma_m^2 - (e_t^(k))^2];
14:   end for
15:   Update z_t based on e_t = d_t - z_t^T y_t;
16: end for
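For illustration, the loop structure of Algorithm 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions: the `predict`/`update` interface of the weak learners and the combiner step size `mu_z` are illustrative choices, not the paper's notation, and the combiner update anticipates the normalized SGD rule (8) detailed in Section 4.1.

```python
import numpy as np

def boosted_online_regression(stream, weak_learners, sigma_m2, sigma2, mu_z=0.01):
    """Sketch of Algorithm 1; `stream` yields (x, d) pairs one at a time."""
    m = len(weak_learners)
    z = np.ones(m) / m                              # combination weights z_1
    for x, d in stream:
        y = np.array([wl.predict(x) for wl in weak_learners])
        d_hat = z @ y                               # final estimate (line 6)
        lam, loss = 1.0, 0.0                        # lambda_t^(1) = 1, l_t^(1) = 0
        for k, wl in enumerate(weak_learners):
            lam = min(1.0, sigma2 ** (loss / 2.0))  # line 10
            wl.update(x, d, lam)                    # line 11: weighted update of the k-th WL
            e_k = d - y[k]                          # line 12
            loss += sigma_m2 - e_k ** 2             # line 13: l passed to the next WL
        e = d - d_hat
        z = z + mu_z * e * y / max(y @ y, 1e-12)    # line 15, normalized SGD as in (8)
        yield d_hat
```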

In Algorithm 1, we have m copies of an online WL, each of which is guaranteed to have a weighted MSE of at most sigma^2. We prove that Algorithm 1 can reach a desired MSE, sigma_d^2, through Lemma 1, Lemma 2, and Theorem 1. Note that since we assume d_t is in [-1, 1], the trivial solution \hat{d}_t = 0 incurs an MSE of at most 1. Therefore, we define a weak learner as an algorithm which has an MSE less than 1.

Lemma 1. In Algorithm 1, if there is an integer M such that \sum_{t=1}^{T} lambda_t^(k) >= kappa T for every k <= M, and also \sum_{t=1}^{T} lambda_t^(M+1) < kappa T, where 0 < kappa < sigma_d^2 is arbitrarily chosen, it can reach a desired MSE, sigma_d^2.

Proof. The proof of Lemma 1 is given in Appendix A.

Lemma 2. If the weak learners are guaranteed to have a weighted MSE less than sigma^2, i.e., for all k: ( \sum_{t=1}^{T} lambda_t^(k) (e_t^(k))^2 ) / ( 4 \sum_{t=1}^{T} lambda_t^(k) ) <= sigma^2 < 1/4, then there is an integer M that satisfies the conditions in Lemma 1.

Proof. The proof of Lemma 2 is given in Appendix B.

Theorem 1. If the weak learners in line 11 of Algorithm 1 achieve a weighted MSE of at most sigma^2 < 1/4, there exists an upper bound on m such that the algorithm reaches the desired MSE.

Proof. This theorem is a direct consequence of combining Lemma 1 and Lemma 2.

Note that although we use copies of a base learner as the weak learners and seek to improve its performance, the constituent WLs can be different. However, by using the boosting approach, we can improve the MSE performance of the overall system as long as the WLs can provide a weighted MSE of at most sigma^2. For example, we can improve the performance of mixture-of-experts algorithms (Arenas-Garcia et al. 2016) by leveraging the boosting approach introduced in this paper.

As shown in Fig. 1, at each iteration t, we have m parallel running WLs with estimating functions f_t^(k), producing estimates \hat{d}_t^(k) = f_t^(k)(x_t) of d_t, k = 1, ..., m. As an example, if we use m linear algorithms, \hat{d}_t^(k) = x_t^T w_t^(k) is the estimate generated by the k-th WL. The outputs of these m WLs are then combined using the linear weights z_t to produce the final estimate as \hat{d}_t = z_t^T y_t (Kozat et al. 2010), where y_t = [\hat{d}_t^(1), ..., \hat{d}_t^(m)]^T is the vector of outputs. After the desired output d_t is revealed, the m parallel running WLs are updated for the next iteration. Moreover, the linear combination coefficients z_t are also updated using the normalized SGD (Sayed 2003), as detailed later in Section 4.1.

After d_t is revealed, the constituent WLs, f_t^(k), k = 1, ..., m, are consecutively updated, as shown in Fig. 1, from top to bottom, i.e., first k = 1 is updated, then k = 2, and finally k = m is updated. However, to enhance the performance, we use a boosted updating approach (Freund and Schapire 1997), such that the (k+1)-th WL receives a total loss parameter, l_t^(k+1), from the k-th WL, as

l_t^(k+1) = l_t^(k) + [ sigma_m^2 - (d_t - f_t^(k)(x_t))^2 ],   (4)

to compute a weight lambda_t^(k+1). The total loss parameter l_t^(k) indicates the sum of the differences between the modified desired MSE (sigma_m^2) and the squared errors of the first k-1 WLs at time t.

Fig. 1: The block diagram of a boosted online regression system that uses the input vector x_t to produce the final estimate \hat{d}_t. There are m constituent WLs f^(1), ..., f^(m), each of which is an online linear algorithm that generates its own estimate \hat{d}_t^(k). The final estimate \hat{d}_t is a linear combination of the estimates generated by all these constituent WLs, with the combination weights z_t^(k) corresponding to the \hat{d}_t^(k). The combination weights are stored in a vector which is updated after each iteration. At time t the k-th WL is updated based on the values of lambda_t^(k) and e_t^(k), and provides the (k+1)-th filter with l_t^(k+1), which is used to compute lambda_t^(k+1). The parameter delta_{t-1}^(k) indicates the weighted MSE of the k-th WL over the first t-1 estimations, and is used in computing lambda_t^(k).

Then, we add the difference sigma_m^2 - (e_t^(k))^2 to l_t^(k) to generate l_t^(k+1), and pass l_t^(k+1) to the next WL, as shown in Fig. 1. Here, [ sigma_m^2 - (d_t - f_t^(k)(x_t))^2 ] measures how much the k-th WL is off with respect to the final MSE performance goal. For example, in a stationary environment, if d_t = f(x_t) + nu_t, where f(.) is a deterministic function and nu_t is the observation noise, one can select the desired MSE sigma_d^2 as an upper bound on the variance of the noise process nu_t, and define a modified desired MSE as sigma_m^2 = (sigma_d^2 - kappa) / (1 - kappa). In this sense, l_t^(k) measures how the WLs j = 1, ..., k-1 are cumulatively performing on the (d_t, x_t) pair with respect to the final performance goal. We then use the weight lambda_t^(k) to update the k-th WL with the weighted updates, data reuse, or random updates method, which we explain later in Sections 5 and 6.

Our aim is to make lambda_t^(k) large if the first k-1 WLs made large errors on d_t, so that the k-th WL gives more importance to (x_t, d_t) in order to rectify the performance of the overall system. We now explain how to construct these weights such that 0 < lambda_t^(k) <= 1. To this end, we set lambda_t^(1) = 1 for all t, and introduce a weighting similar to (Servedio 2003; Chen et al. 2012). We define the weights as

lambda_t^(k) = \min\{ 1, (\sigma^2)^{l_t^{(k)}/2} \},   (5)

where sigma^2 is the guaranteed upper bound on the weighted MSE of the weak learners. However, since there is no prior information about the exact MSE performance of the weak learners, we use the following weighting scheme

lambda_t^(k) = \min\{ 1, (\delta_{t-1}^{(k)})^{c\, l_t^{(k)}} \},   (6)

where delta_{t-1}^(k) indicates an estimate of the k-th weak learner's MSE, and c >= 0 is a design parameter which determines the dependence of each WL update on the performance of the previous WLs, i.e., c = 0 corresponds to independent updates, like the ordinary combination of WLs in adaptive filtering (Kozat et al. 2010; Arenas-Garcia et al. 2016), while a greater c indicates a greater effect of the previous WLs' performance on the weight lambda_t^(k) of the current WL. Note that including the parameter c does not change the validity of our proofs, since one can take (delta_{t-1}^(k))^{2c} as the new guaranteed weighted MSE. Here, delta_{t-1}^(k) is an estimate of the Weighted Mean Squared Error (WMSE) of the k-th WL over {x_t} and {d_t}. In the basic implementation of online boosting (Servedio 2003; Chen et al. 2012), 1 - delta_{t-1}^(k) is set to the classification advantage of the weak learners (Servedio 2003), where this advantage is assumed to be the same for all weak learners. In this paper, to avoid using any a priori knowledge and to be completely adaptive, we choose delta_{t-1}^(k) as the weighted and thresholded MSE of the k-th WL up to time t-1 as

\delta_t^{(k)} = \frac{ \sum_{\tau=1}^{t} \frac{\lambda_\tau^{(k)}}{4} \left( d_\tau - [f_\tau^{(k)}(x_\tau)]^{+} \right)^2 }{ \sum_{\tau=1}^{t} \lambda_\tau^{(k)} } = \frac{ \Lambda_{t-1}^{(k)} \delta_{t-1}^{(k)} + \frac{\lambda_t^{(k)}}{4} \left( d_t - [f_t^{(k)}(x_t)]^{+} \right)^2 }{ \Lambda_{t-1}^{(k)} + \lambda_t^{(k)} },   (7)

where Lambda_t^(k) = Lambda_{t-1}^(k) + lambda_t^(k), and [f_\tau^(k)(x_\tau)]^+ thresholds f_\tau^(k)(x_\tau) into the range [-1, 1]. This thresholding is necessary to ensure that 0 < delta_t^(k) <= 1, which guarantees 0 < lambda_t^(k) <= 1 for all k = 1, ..., m and all t. We point out that (7) can be calculated recursively. Regarding the definition of lambda_t^(k), if the first k WLs are good, we will pass less weight to the next WLs, such that those WLs can concentrate more on the other samples. Hence, the WLs can increase the diversity by concentrating on different parts of the data (Kozat et al. 2010). Furthermore, following this idea, in (6) the weight lambda_t^(k) is larger, i.e., close to 1, if most of the WLs 1, ..., k-1 have squared errors larger than sigma_m^2 on (x_t, d_t), and smaller, i.e., close to 0, if the pair (x_t, d_t) is easily modeled by the previous WLs, such that the WLs k, ..., m do not need to concentrate more on this pair.
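The bookkeeping behind (6) and (7) for a single WL can be sketched as follows. This is a minimal illustration assuming plain Python/NumPy; the class and method names are not from the paper, and the guard for an uninitialized delta is an implementation assumption.

```python
import numpy as np

class WeightState:
    """Recursive weight bookkeeping of (6)-(7) for one weak learner."""
    def __init__(self):
        self.delta = 0.0      # delta_{t-1}^(k), weighted thresholded MSE estimate
        self.Lambda = 0.0     # Lambda_{t-1}^(k), running sum of weights

    def weight(self, loss, c=1.0):
        # lambda_t^(k) = min{1, (delta_{t-1}^(k))^(c * l_t^(k))}, eq. (6)
        if self.delta <= 0.0:
            return 1.0                      # before any update, behave like lambda = 1
        return min(1.0, self.delta ** (c * loss))

    def update(self, lam, d, f_x):
        f_clip = np.clip(f_x, -1.0, 1.0)    # threshold the WL output into [-1, 1]
        sq = (d - f_clip) ** 2 / 4.0        # normalized squared error
        self.delta = (self.Lambda * self.delta + lam * sq) / (self.Lambda + lam)
        self.Lambda += lam                  # Lambda_t^(k) = Lambda_{t-1}^(k) + lambda_t^(k)
```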

4.1 The Combination Algorithm

Although in the proof of our algorithm we assume a constant combination vector z over time, we use a time varying combination vector in practice, since there is no knowledge about the exact number of required weak learners for each problem. Hence, after d_t is revealed, we also update the final combination weights z_t based on the final output \hat{d}_t = z_t^T y_t, where y_t = [\hat{d}_t^(1), ..., \hat{d}_t^(m)]^T. To update the final combination weights, we use the normalized SGD algorithm (Sayed 2003), yielding

z_{t+1} = z_t + \mu_z e_t \frac{y_t}{\|y_t\|^2}.   (8)

4.2 Choice of Parameter Values

The choice of sigma_m^2 is a crucial task, i.e., we cannot reach any desired MSE for any data sequence unconditionally. As an example, suppose that the data are generated randomly according to a known distribution, while they are contaminated with a white noise process. It is clear that we cannot obtain an MSE level below the noise power. However, if the WLs are guaranteed to satisfy the conditions of Theorem 1, this would not happen. Intuitively, there is a guaranteed upper bound (i.e., sigma^2) on the worst case performance, since in the weighted MSE the samples with a higher error have a more important effect. On the other hand, if one chooses a sigma_m^2 smaller than the noise power, l_t^(k) will be negative for almost every k, turning most of the weights into 1, and as a result the weak learners fail to reach a weighted MSE smaller than sigma^2. Nevertheless, in practice we have to choose the parameter sigma_m^2 reasonably and precisely, such that the conditions of Theorem 1 are satisfied. For instance, we set sigma_m^2 to be an upper bound on the noise power. In addition, the number of weak learners, m, is chosen according to the computational complexity constraints. However, in our experiments we choose a moderate number of weak learners, m = 20, which successfully improves the performance. Moreover, according to the results in Section 8.3, the optimum value for c is around 1, hence we set the parameter c = 1 in our simulations.

5 Boosted NM Algorithms

At each time t, all of the WLs (shown in Fig. 1) estimate the desired data d_t in parallel, and the final estimate is a linear combination of the results generated by the WLs. When the k-th WL receives the weight lambda_t^(k), it updates the linear coefficients w_t^(k) using one of the following methods.

5.1 Directly Using the lambda's as Sample Weights

Here, we consider lambda_t^(k) as the weight for the observation pair (x_t, d_t) and apply a weighted NM update to w_t^(k). For this particular weighted NM algorithm, we define the Hessian matrix and the gradient vector as

R_{t+1}^{(k)} = \beta R_t^{(k)} + \lambda_t^{(k)} x_t x_t^T,   (9)
p_{t+1}^{(k)} = \beta p_t^{(k)} + \lambda_t^{(k)} x_t d_t,   (10)

where beta is the forgetting factor (Sayed 2003), and w_{t+1}^(k) = (R_{t+1}^(k))^{-1} p_{t+1}^(k) can be calculated in a recursive manner as

e_t^{(k)} = d_t - x_t^T w_t^{(k)},
g_t^{(k)} = \frac{\lambda_t^{(k)} P_t^{(k)} x_t}{\beta + \lambda_t^{(k)} x_t^T P_t^{(k)} x_t},
w_{t+1}^{(k)} = w_t^{(k)} + e_t^{(k)} g_t^{(k)},
P_{t+1}^{(k)} = \beta^{-1} \left( P_t^{(k)} - g_t^{(k)} x_t^T P_t^{(k)} \right),   (11)

where P_t^(k) = (R_t^(k))^{-1}, P_0^(k) = v^{-1} I, and 0 < v <= 1. The complete algorithm is given in Algorithm 2 with the weighted NM implementation in (11).

5.2 Data Reuse Approaches Based on the Weights

Another approach follows Ozaboost (Oza and Russell 2001). In this approach, from lambda_t^(k) we generate an integer, say n_t^(k) = ceil(K lambda_t^(k)), where K is a design parameter that takes on positive integer values. We then apply the NM update on the (x_t, d_t) pair repeatedly n_t^(k) times, i.e., run the NM update on the same (x_t, d_t) pair n_t^(k) times consecutively. Note that K should be determined according to the computational complexity constraints. However, increasing K does not necessarily result in a better performance; therefore, we use moderate values for K, e.g., we use K = 5 in our simulations. The final w_{t+1}^(k) is calculated after n_t^(k) NM updates. As a major advantage, this reusing approach can clearly be generalized to other adaptive algorithms in a straightforward manner. We point out that Ozaboost (Oza and Russell 2001) uses a different data reuse strategy. In that approach, lambda_t^(k) is used as the parameter of a Poisson distribution and an integer n_t^(k) is randomly generated from this Poisson distribution. One then applies the NM update n_t^(k) times.

5.3 Random Updates Approach Based on the Weights

In this approach, we simply use the weight lambda_t^(k) as the probability of updating the k-th WL at time t. To this end, we generate a Bernoulli random variable, which is 1 with probability lambda_t^(k) and 0 with probability 1 - lambda_t^(k). Then, we update the k-th WL only if the Bernoulli random variable equals 1. With this method, we significantly reduce the computational complexity of the algorithm. Moreover, due to the dependence of this Bernoulli random variable on the performance of the previous constituent WLs, this method does not degrade the MSE performance, while offering a considerably lower complexity, i.e., when the MSE is low, there is no need for further updates, hence the probability of an update is low, while this probability is larger when the MSE is high.
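A minimal sketch of the weighted NM recursion (11) from Section 5.1 is given below, assuming a simple per-learner class; the class name and constructor defaults are illustrative, with beta the forgetting factor and P_0 = I / v as in the text.

```python
import numpy as np

class WeightedNMLearner:
    """One NM-based WL with the weighted recursion (9)-(11)."""
    def __init__(self, r, beta=0.999, v=1.0):
        self.w = np.zeros(r)
        self.P = np.eye(r) / v               # P_0^(k) = I / v
        self.beta = beta

    def predict(self, x):
        return self.w @ x

    def update(self, x, d, lam):
        e = d - self.w @ x                                   # a priori error e_t^(k)
        Px = self.P @ x
        g = lam * Px / (self.beta + lam * (x @ Px))          # gain vector g_t^(k)
        self.w = self.w + e * g                              # w_{t+1}^(k)
        self.P = (self.P - np.outer(g, Px)) / self.beta      # P_{t+1}^(k)
```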

Algorithm 2 Boosted NM-based algorithm
1: Input: (x_t, d_t) (data stream), m (number of WLs), and sigma_m^2.
2: Initialize the regression coefficients w_1^(k) for each WL, the combination coefficients as z_1 = (1/m)[1, 1, ..., 1]^T, and for all k set delta_0^(k) = 0.
3: for t = 1 to T do
4:   Receive the regressor data instance x_t;
5:   Compute the WLs' outputs \hat{d}_t^(k) = x_t^T w_t^(k);
6:   Produce the final estimate \hat{d}_t = z_t^T [\hat{d}_t^(1), ..., \hat{d}_t^(m)]^T;
7:   Receive the true output d_t (desired data);
8:   lambda_t^(1) = 1; l_t^(1) = 0;
9:   for k = 1 to m do
10:     lambda_t^(k) = min{1, (delta_{t-1}^(k))^(c l_t^(k))};
11:     Update the regression coefficients w_t^(k) by using the NM and the weight lambda_t^(k), based on one of the algorithms introduced in Section 5;
12:     e_t^(k) = d_t - \hat{d}_t^(k);
13:     delta_t^(k) = ( Lambda_{t-1}^(k) delta_{t-1}^(k) + (lambda_t^(k)/4) ( d_t - [f_t^(k)(x_t)]^+ )^2 ) / ( Lambda_{t-1}^(k) + lambda_t^(k) );
14:     Lambda_t^(k) = Lambda_{t-1}^(k) + lambda_t^(k);
15:     l_t^(k+1) = l_t^(k) + [sigma_m^2 - (e_t^(k))^2];
16:   end for
17:   e_t = d_t - z_t^T y_t;
18:   z_{t+1} = z_t + mu_z e_t y_t / ||y_t||^2;
19: end for

6 Boosted SGD Algorithms

In this case, as shown in Fig. 1, we have m parallel running WLs, each of which is updated using the SGD algorithm. Based on the weights given in (6) and the total loss and MSE parameters in (4) and (7), we next introduce three SGD based boosting algorithms, similar to those introduced in Section 5.

6.1 Directly Using the lambda's to Scale the Learning Rates

We note that by the construction method in (6), 0 < lambda_t^(k) <= 1, thus these weights can be directly used to scale the learning rates for the SGD updates. When the k-th

WL receives the weight lambda_t^(k), it updates its coefficients w_t^(k) as

w_{t+1}^{(k)} = \left( I - \mu^{(k)} \lambda_t^{(k)} x_t x_t^T \right) w_t^{(k)} + \mu^{(k)} \lambda_t^{(k)} x_t d_t,   (12)

where 0 < mu^(k) lambda_t^(k) <= mu^(k). Note that we can choose mu^(k) = mu for all k, since the online algorithms work consecutively from top to bottom and the k-th WL will have a different effective learning rate mu^(k) lambda_t^(k).

6.2 A Data Reuse Approach Based on the Weights

In this scenario, for updating w_t^(k), we use the SGD update n_t^(k) = ceil(K lambda_t^(k)) times to obtain w_{t+1}^(k) as

q^{(0)} = w_t^{(k)},
q^{(a)} = \left( I - \mu^{(k)} x_t x_t^T \right) q^{(a-1)} + \mu^{(k)} x_t d_t, \quad a = 1, ..., n_t^{(k)},
w_{t+1}^{(k)} = q^{(n_t^{(k)})},   (13)

where K is a constant design parameter. Similar to the NM case, if we follow Ozaboost (Oza and Russell 2001), we use the weights to generate a random number n_t^(k) from a Poisson distribution with parameter lambda_t^(k), and perform the SGD update n_t^(k) times on w_t^(k) as explained above.

6.3 Random Updates Based on the Weights

Again, in this scenario, similar to the NM case, we use the weight lambda_t^(k) to generate a random number from a Bernoulli distribution, which equals 1 with probability lambda_t^(k) and 0 with probability 1 - lambda_t^(k). Then we update w_t^(k) using SGD only if the generated number is 1.
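The three SGD-based update modes above reduce to a few lines each. The following sketch assumes a raw coefficient vector per WL and plain NumPy; function names and the random generator handling are illustrative.

```python
import numpy as np
from math import ceil

def sgd_weighted_update(w, x, d, mu, lam):
    """Eq. (12): scale the SGD learning rate by the importance weight."""
    return w + mu * lam * x * (d - w @ x)

def sgd_data_reuse_update(w, x, d, mu, lam, K=5):
    """Eq. (13): repeat the plain SGD update ceil(K * lambda) times on the same pair."""
    for _ in range(int(ceil(K * lam))):
        w = w + mu * x * (d - w @ x)
    return w

def sgd_random_update(w, x, d, mu, lam, rng=None):
    """Section 6.3: perform one plain SGD update with probability lambda."""
    if rng is None:
        rng = np.random.default_rng()
    return w + mu * x * (d - w @ x) if rng.random() < lam else w
```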

7 Analysis of the Proposed Algorithms

In this section we provide the complexity analysis of the proposed algorithms. We prove an upper bound on the weights lambda_t^(k) which is significantly less than 1. This bound shows that the complexity of the random updates algorithm is significantly less than that of the other proposed algorithms, and only slightly greater than that of a single WL. Hence, it shows the considerable advantage of boosting with random updates in the processing of high dimensional data.

7.1 Complexity Analysis

Here we compare the complexity of the proposed algorithms and find an upper bound for the computational complexity of the random updates scenario (introduced in Section 5.3 for NM and in Section 6.3 for SGD updates), which shows its significantly lower computational burden with respect to the two other approaches. For x_t in R^r, each WL performs O(r) computations to generate its estimate and, if updated using the NM algorithm, requires O(r^2) computations due to updating the matrix R_t^(k), while it needs O(r) computations when updated using the SGD method (in their most basic implementations). We first derive the computational complexity of using the NM updates in the different boosting scenarios. Since there are a total of m WLs, all of which are updated in the weighted updates method, this method has a computational cost of order O(m r^2) per iteration. However, in the random updates, at iteration t the k-th WL may or may not be updated, with probabilities lambda_t^(k) and 1 - lambda_t^(k) respectively, yielding

C_t^{(k)} = \begin{cases} O(r^2) & \text{with probability } \lambda_t^{(k)} \\ O(r) & \text{with probability } 1 - \lambda_t^{(k)}, \end{cases}   (14)

where C_t^(k) indicates the complexity of running the k-th WL at iteration t. Therefore, the total computational complexity C_t at iteration t will be C_t = \sum_{k=1}^{m} C_t^(k), which yields

E[C_t] = E\left[ \sum_{k=1}^{m} C_t^{(k)} \right] = \sum_{k=1}^{m} E[\lambda_t^{(k)}] O(r^2).   (15)

Hence, if E[lambda_t^(k)] is upper bounded by \bar{\lambda}^(k) < 1, the average computational complexity of the random updates method will be

E[C_t] < \sum_{k=1}^{m} \bar{\lambda}^{(k)} O(r^2).   (16)

In Theorem 2, we provide sufficient constraints to have such an upper bound. Furthermore, we can use such a bound for the data reuse mode as well. In this case, for each WL f^(k), we perform the NM update ceil(lambda_t^(k) K) times, resulting in a computational complexity of order E[C_t] < K \sum_{k=1}^{m} \bar{\lambda}^{(k)} O(r^2). For the SGD updates, we similarly obtain the computational complexities O(mr), \sum_{k=1}^{m} O(\bar{\lambda}^{(k)} r), and \sum_{k=1}^{m} O(K \bar{\lambda}^{(k)} r) for the weighted updates, random updates, and data reuse scenarios, respectively. The following theorem determines the upper bound \bar{\lambda}^(k) for E[lambda_t^(k)].

Theorem 2. If the WLs converge and achieve a sufficiently small MSE (in the sense made precise in the proof following this theorem), the following upper bound is obtained for lambda_t^(k), given that sigma_m^2 is chosen properly,

E[\lambda_t^{(k)}] \le \bar{\lambda}^{(k)} = \left( \gamma^{-2\sigma_m^2} \left( 1 + 2\zeta^2 \ln\gamma \right) \right)^{\frac{1-k}{2}},   (17)

where gamma = E[delta_{t-1}^(k)] and zeta^2 = E[(e_t^(k))^2]. It can be straightforwardly shown that this bound is less than 1 for appropriate choices of sigma_m^2 and reasonable values of the MSE (according to the proof). This theorem states that if we adjust sigma_m^2 such that it is achievable, i.e., the WLs can provide a slightly lower MSE than sigma_m^2, the probability of updating the WLs in the random updates scenario will decrease. This is of course our desired result, since if the WLs are performing sufficiently well, there is no need for additional updates. Moreover, if sigma_m^2 is chosen such that the WLs cannot achieve an MSE equal to sigma_m^2, the WLs have to be updated at each iteration, which increases the complexity.

Proof: For simplicity, in this proof we assume that c = 1; however, the results are readily extended to general values of c. We construct our proof based on the following assumption.

Assumption: The e_t^(k) are independent and identically distributed (i.i.d.) zero-mean Gaussian random variables with variance zeta^2. We have

E[\lambda_t^{(k)}] = E\left[ \min\left\{ 1, (\delta_{t-1}^{(k)})^{l_t^{(k)}} \right\} \right] \le \min\left\{ 1, E\left[ (\delta_{t-1}^{(k)})^{l_t^{(k)}} \right] \right\}.   (18)

Now, we show that under certain conditions E[(delta_{t-1}^(k))^(l_t^(k))] will be less than 1, and hence we obtain an upper bound for E[lambda_t^(k)]. We define s = ln(delta_{t-1}^(k)), yielding

E\left[ (\delta_{t-1}^{(k)})^{l_t^{(k)}} \right] = E\left[ E\left[ \exp(s\, l_t^{(k)}) \mid s \right] \right] = E\left[ M_{l_t^{(k)}}(s) \right],   (19)

where M_{l_t^(k)}(.) is the moment generating function of the random variable l_t^(k). From Algorithm 2, l_t^(k) = (k-1)\sigma_m^2 - \sum_{j=1}^{k-1} (e_t^(j))^2. According to the Assumption, e_t^(j)/zeta is a standard normal random variable. Therefore, \sum_{j=1}^{k-1} (e_t^(j))^2 has a Gamma distribution Gamma((k-1)/2, 2 zeta^2) (Papoulis and Pillai 2002), which results in the following moment generating function for l_t^(k):

M_{l_t^{(k)}}(s) = \exp\left( s (k-1) \sigma_m^2 \right) \left( 1 + 2\zeta^2 s \right)^{-\frac{k-1}{2}} = \left( \delta_{t-1}^{(k)} \right)^{(k-1)\sigma_m^2} \left( 1 + 2\zeta^2 \ln \delta_{t-1}^{(k)} \right)^{-\frac{k-1}{2}}.   (20)

In the above equality delta_{t-1}^(k) is a random variable, the mean of which is denoted by gamma. We point out that gamma will approach zeta^2 in convergence. We define a function phi(.) such that E[lambda_t^(k)] = E[phi(delta_{t-1}^(k))], and seek a condition for phi(.) to be a concave function. Then, by using Jensen's inequality for concave functions, we have

E[\lambda_t^{(k)}] \le \phi(\gamma).   (21)

Inspired by (20), we define the auxiliary quantity A(delta_{t-1}^(k)) = 1 + 2 zeta^2 ln(delta_{t-1}^(k)), so that phi(delta_{t-1}^(k)) = (delta_{t-1}^(k))^((k-1) sigma_m^2) A(delta_{t-1}^(k))^((1-k)/2). With these definitions we obtain the first and second derivatives phi'(delta_{t-1}^(k)) and phi''(delta_{t-1}^(k)).   (22)

Considering that k > 1, in order for phi(.) to be concave it suffices to have phi''(delta_{t-1}^(k)) <= 0,   (23)

which reduces to the following necessary and sufficient conditions:

\left( \delta_{t-1}^{(k)} \right)^{2\sigma_m^2} \left( \zeta^2 \ln \delta_{t-1}^{(k)} \right)^2 < \frac{(\sigma_m^2)^2}{4(k+1)},   (24)

and

\frac{(1 - \xi_1)\,\sigma_m^2}{1 - 2\sigma_m^2 \ln \delta_{t-1}^{(k)}} < \zeta^2 < \frac{(1 - \xi_2)\,\sigma_m^2}{1 - 2\sigma_m^2 \ln \delta_{t-1}^{(k)}},   (25)

where xi_1 and xi_2 are given in terms of alpha = 1 + 2 zeta^2 ln(delta_{t-1}^(k)), sigma_m^2, k, and delta_{t-1}^(k). Under these conditions phi(.) is concave; therefore, by substituting phi(.) in (21) we obtain (17). This concludes the proof of Theorem 2.

8 Experiments

In this section, we demonstrate the efficacy of the proposed boosting algorithms for NM and SGD linear WLs under different scenarios. To this end, we first consider the online regression of data generated with a stationary linear model. Then, we illustrate the performance of our algorithms under nonstationary conditions, to thoroughly test the adaptation capabilities of the proposed boosting framework. Furthermore, since the most important parameters in the proposed methods are sigma_m^2, c, and m, we investigate their effects on the final MSE performance. Finally, we provide the results of the experiments over several real and synthetic benchmark data sets.

Throughout this section, SGD represents the linear SGD-based WL, NM represents the linear NM-based WL, and a prefix B indicates the boosted algorithms. In addition, we use the suffixes -WU, -RU, and -DR to denote the weighted updates, random updates, and data reuse modes, respectively; e.g., BSGD-RU represents the Boosted SGD-based algorithm using Random Updates. In order to observe the boosting effect, in all experiments we set the step size of the SGD and the forgetting factor of the NM to their optimal values, and use those parameters for the WLs, too. In addition, the initial values of all of the weak learners in all of the experiments are set to zero. However, in all experiments, since we use K = 5 in the BSGD-DR algorithm, we set the step size of the WLs in the BSGD-DR method to mu/K = mu/5, where mu is the step size of the SGD. To compare the MSE results, we provide the Accumulated Squared Error (ASE) results.

8.1 Stationary Data

In this experiment, we consider the case where the desired data is generated by a stationary linear model. The input vectors x_t = [x_{1,t} x_{2,t} 1]^T are 3-dimensional, where [x_{1,t} x_{2,t}] is drawn from a jointly Gaussian random process and then scaled such that [x_{1,t} x_{2,t}]^T lies in [0, 1]^2. We include 1 as the third entry of x_t to consider affine learners. Specifically, the desired data is generated by d_t = [1 1 1]^T x_t + nu_t, where nu_t is a random Gaussian noise. In our simulations, we use m = 20 WLs and mu = 0.1 for all SGD learners. In addition, for the NM-based boosting algorithms, we set the same forgetting factor beta for all algorithms. Moreover, we choose sigma_m^2 = 0.02 for the SGD-based algorithms, set sigma_m^2 separately for the NM-based algorithms, use K = 5 for the data reuse approaches, and c = 1 for all boosting algorithms. To achieve robustness, we average the results over 100 trials. As depicted in Fig. 2, our proposed methods boost the performance of a single linear SGD-based WL. Nevertheless, we cannot further improve the performance of a linear NM-based WL in such a stationary experiment, since the NM achieves the lowest MSE. We point out that the random updates method achieves the performance of the weighted updates method and the data reuse method with a much lower complexity. In addition, we observe that by increasing the data length, the performance improvement increases (note that the distance between the ASE curves is slightly increasing).

8.2 Chaotic Data

Here, in order to show the tracking capability of our algorithms in nonstationary environments, we consider the case where the desired data is generated by the Duffing map (Wiggins 2003) as a chaotic model. Specifically, the data is generated by the following equation: x_{t+1} = 2.75 x_t - x_t^3 - 0.2 x_{t-1}, where we set x_{-1} and x_0 to fixed initial values. We consider d_t = x_{t+1} as the desired data and [x_{t-1} x_t 1]^T as the input vector. In this experiment, each boosting algorithm uses 20 WLs. The step sizes for the SGD-based algorithms are set to 0.1, the forgetting factor beta for the NM-based algorithms is set to 0.999, and the modified desired MSE parameter sigma_m^2 is set to 0.25 for the BSGD methods and 0.17 for the BNM methods.
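The Duffing-map data used in this experiment can be generated with a few lines of NumPy. This is a sketch under the stated recursion; the initial values x_{-1} and x_0 are passed in as free parameters since the paper fixes them to specific constants not repeated here, and the function name is illustrative.

```python
import numpy as np

def duffing_series(T, x_prev, x_curr):
    """Generate T (input, desired) pairs from the Duffing map
    x_{t+1} = 2.75 x_t - x_t**3 - 0.2 x_{t-1} (Section 8.2)."""
    xs = [x_prev, x_curr]
    for _ in range(T):
        xs.append(2.75 * xs[-1] - xs[-1] ** 3 - 0.2 * xs[-2])
    x = np.array(xs)
    # input vector [x_{t-1}, x_t, 1] and desired output d_t = x_{t+1}
    X = np.stack([x[:-2], x[1:-1], np.ones(T)], axis=1)
    d = x[2:]
    return X, d
```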

Fig. 2: The ASE performance of the proposed algorithms in the stationary data experiment. (Plot: Accumulated Squared Error versus Data Length (T) for SGD, NM, BNM-WU, BNM-DR, BNM-RU, BSGD-WU, BSGD-DR, and BSGD-RU.)

Note that although the value of sigma_m^2 is higher than the achieved MSE, it can improve the performance significantly. This is because of the boosting effect, i.e., emphasizing the harder data patterns. The figures show the superior performance of our algorithms over a single WL (whose step size is chosen to be the best) in this highly nonstationary environment. Moreover, as shown in Fig. 3, among the SGD-based boosted algorithms, the data reuse method shows a better performance relative to the other boosting methods. However, the random updates method has a significantly lower time consumption, which makes it desirable for larger data lengths. From Fig. 3, one can see that our method is truly boosting the performance of the conventional linear WLs in this chaotic environment. From Fig. 4, we observe the approximate changes of the weights in the BSGD-RU algorithm running over the Duffing data. As shown in this figure, the weights do not change monotonically, which shows the capability of our algorithm in effectively tracking the nonstationary data. Furthermore, since we update the WLs in an ordered manner, i.e., we update the (k+1)-th WL after the k-th WL is updated, the weights assigned to the last WLs are generally smaller than the weights assigned to the previous WLs. As an example, in Fig. 4 we see that the weights assigned to the 5th WL are larger than those of the 10th and 20th WLs. Furthermore, note that in this experiment the dependency parameter c is set to 1. We should mention that increasing the value of this parameter, in general, causes lower weights, hence it can considerably reduce the complexity of the random updates and data reuse methods.

8.3 The Effect of Parameters

In this section, we investigate the effects of the dependence parameter c and the modified desired MSE sigma_m^2, as well as the number of WLs, m, on the boosting performance of our methods in the Duffing data experiment explained in Section 8.2. From the results in Fig. 5c, we observe that increasing the number of WLs up to 30 can improve the performance significantly, while further increasing m only increases the computational complexity without improving the performance.

Fig. 3: The ASE performance of the proposed methods on the Duffing data set. (Plot: Accumulated Squared Error versus Data Length (T) for NM, SGD, BNM-RU, BNM-DR, BNM-WU, BSGD-WU, BSGD-RU, and BSGD-DR.)

Fig. 4: The changes of the weights lambda in the BSGD-RU algorithm in the Duffing data experiment. (Plot: lambda versus Data Length (T) for the 5th, 10th, and 20th WLs.)

In addition, as shown in Fig. 5b, in this experiment the dependency parameter c has an optimum value around 1. We note that choosing small values for c reduces the boosting effect and causes the weights to be larger, which in turn increases the computational complexity of the random updates and data reuse approaches. On the other hand, choosing very large values for c increases the dependency, i.e., in this case the generated weights are very close to 1 or 0, hence the boosting effect is decreased. Overall, one should choose values around 1 for c to avoid these extreme cases. Furthermore, as depicted in Fig. 5a, there is an optimum value around 0.5 for sigma_m^2 in this experiment. Note that choosing small values for sigma_m^2 results in large weights, which increases the complexity and reduces the diversity. However, choosing higher values for sigma_m^2 results in smaller weights, and in turn reduces the complexity. Nevertheless, we note that increasing the value of sigma_m^2 does not necessarily enhance the performance. Through the experiments, we find that sigma_m^2 must be of the order of the MSE to obtain the best performance.

Fig. 5: The effect of the parameters sigma_m^2, c, and m on the MSE performance of the BNM-RU and BSGD-RU algorithms in the Duffing data experiment. (a) The effect of the parameter sigma_m^2; (b) the effect of the parameter c; (c) the effect of the parameter m (the number of WLs). (y-axis: Mean Squared Error.)

8.4 Benchmark Real and Synthetic Data Sets

In this section, we demonstrate the efficiency of the introduced methods over several widely used real life machine learning regression data sets. We have normalized each dimension of the data to the interval [-1, 1] in all algorithms. We present the MSE performance of the algorithms in Table 1. These experiments show that our algorithms can successfully improve the performance of single linear WLs.

Table 1: The MSE of the proposed algorithms on real data sets. (Columns: SGD, BSGD-WU, BSGD-DR, BSGD-RU, NM, BNM-WU, BNM-DR, BNM-RU; rows: MV, Puma8NH, Kinematics, Compactiv, Protein Tertiary, ONP, California Housing, YPMSD.)

We now describe the experiments and provide the results. Here, we briefly explain the details of the data sets:

1. MV: This is an artificial dataset with dependencies between the attribute values. One can refer to (Torgo) for further details. There are 10 attributes and one target value. In this dataset, we can slightly improve the performance of a single linear WL by using any of the proposed methods.
2. Puma Dynamics (Puma8NH): This dataset is a realistic simulation of the dynamics of a Puma 560 robot arm (Torgo). The task is to predict the angular acceleration of one of the robot arm's links. The inputs include angular positions, velocities, and torques of the robot arm. According to the ASE results in Fig. 6a, BNM-WU has the best boosting performance in this experiment. Nonetheless, the SGD-based methods also improve the performance.
3. Kinematics: This dataset is concerned with the forward kinematics of an 8 link robot arm (Torgo). We use the variant 8nm, which is highly non-linear and noisy. As shown in Fig. 6b, our proposed algorithms slightly improve the performance in this experiment.
4. Computer Activity (Compactiv): This real dataset is a collection of computer systems activity measures (Torgo). The task is to predict USR, the portion of time that CPUs run in user mode, from all attributes (Torgo). The NM-based boosting algorithms deliver a significant performance improvement in this experiment, as shown by the results in Table 1.
5. Protein Tertiary (Lichman 2013): This dataset is collected from Critical Assessment of protein Structure Prediction (CASP) experiments 5-9. The aim is to predict the size of the residue using 9 attributes over the data instances.
6. Online News Popularity (ONP) (Lichman 2013; Pereira et al. 2015): This dataset summarizes a heterogeneous set of features about articles published by Mashable over a period of two years. The goal is to predict the number of shares in social networks (popularity).
7. California Housing: This dataset has been obtained from the StatLib repository. Information on the variables was collected using all the block groups in California from the 1990 Census. Here, we seek to find the house median values, based on the given attributes. For further description one can refer to (Torgo).
8. Year Prediction Million Song Dataset (YPMSD) (Bertin-Mahieux et al. 2011): The aim is to predict the release year of a song from its audio features. Songs are mostly western, commercial tracks ranging from 1922 to 2011, with a peak in the 2000s. We use a subset of the Million Song Dataset (Bertin-Mahieux


Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis Speaker Adapaion Techniques For Coninuous Speech Using Medium and Small Adapaion Daa Ses Consaninos Boulis Ouline of he Presenaion Inroducion o he speaker adapaion problem Maximum Likelihood Sochasic Transformaions

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

References are appeared in the last slide. Last update: (1393/08/19)

References are appeared in the last slide. Last update: (1393/08/19) SYSEM IDEIFICAIO Ali Karimpour Associae Professor Ferdowsi Universi of Mashhad References are appeared in he las slide. Las updae: 0..204 393/08/9 Lecure 5 lecure 5 Parameer Esimaion Mehods opics o be

More information

A Shooting Method for A Node Generation Algorithm

A Shooting Method for A Node Generation Algorithm A Shooing Mehod for A Node Generaion Algorihm Hiroaki Nishikawa W.M.Keck Foundaion Laboraory for Compuaional Fluid Dynamics Deparmen of Aerospace Engineering, Universiy of Michigan, Ann Arbor, Michigan

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1. Spike-count autocorrelations in time.

Nature Neuroscience: doi: /nn Supplementary Figure 1. Spike-count autocorrelations in time. Supplemenary Figure 1 Spike-coun auocorrelaions in ime. Normalized auocorrelaion marices are shown for each area in a daase. The marix shows he mean correlaion of he spike coun in each ime bin wih he spike

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

Lecture Notes 2. The Hilbert Space Approach to Time Series

Lecture Notes 2. The Hilbert Space Approach to Time Series Time Series Seven N. Durlauf Universiy of Wisconsin. Basic ideas Lecure Noes. The Hilber Space Approach o Time Series The Hilber space framework provides a very powerful language for discussing he relaionship

More information

A Forward-Backward Splitting Method with Component-wise Lazy Evaluation for Online Structured Convex Optimization

A Forward-Backward Splitting Method with Component-wise Lazy Evaluation for Online Structured Convex Optimization A Forward-Backward Spliing Mehod wih Componen-wise Lazy Evaluaion for Online Srucured Convex Opimizaion Yukihiro Togari and Nobuo Yamashia March 28, 2016 Absrac: We consider large-scale opimizaion problems

More information

Notes on online convex optimization

Notes on online convex optimization Noes on online convex opimizaion Karl Sraos Online convex opimizaion (OCO) is a principled framework for online learning: OnlineConvexOpimizaion Inpu: convex se S, number of seps T For =, 2,..., T : Selec

More information

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD HAN XIAO 1. Penalized Leas Squares Lasso solves he following opimizaion problem, ˆβ lasso = arg max β R p+1 1 N y i β 0 N x ij β j β j (1.1) for some 0.

More information

Navneet Saini, Mayank Goyal, Vishal Bansal (2013); Term Project AML310; Indian Institute of Technology Delhi

Navneet Saini, Mayank Goyal, Vishal Bansal (2013); Term Project AML310; Indian Institute of Technology Delhi Creep in Viscoelasic Subsances Numerical mehods o calculae he coefficiens of he Prony equaion using creep es daa and Herediary Inegrals Mehod Navnee Saini, Mayank Goyal, Vishal Bansal (23); Term Projec

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

Presentation Overview

Presentation Overview Acion Refinemen in Reinforcemen Learning by Probabiliy Smoohing By Thomas G. Dieerich & Didac Busques Speaer: Kai Xu Presenaion Overview Bacground The Probabiliy Smoohing Mehod Experimenal Sudy of Acion

More information

Random Walk with Anti-Correlated Steps

Random Walk with Anti-Correlated Steps Random Walk wih Ani-Correlaed Seps John Noga Dirk Wagner 2 Absrac We conjecure he expeced value of random walks wih ani-correlaed seps o be exacly. We suppor his conjecure wih 2 plausibiliy argumens and

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED 0.1 MAXIMUM LIKELIHOOD ESTIMATIO EXPLAIED Maximum likelihood esimaion is a bes-fi saisical mehod for he esimaion of he values of he parameers of a sysem, based on a se of observaions of a random variable

More information

Georey E. Hinton. University oftoronto. Technical Report CRG-TR February 22, Abstract

Georey E. Hinton. University oftoronto.   Technical Report CRG-TR February 22, Abstract Parameer Esimaion for Linear Dynamical Sysems Zoubin Ghahramani Georey E. Hinon Deparmen of Compuer Science Universiy oftorono 6 King's College Road Torono, Canada M5S A4 Email: zoubin@cs.orono.edu Technical

More information

Boosting with Online Binary Learners for the Multiclass Bandit Problem

Boosting with Online Binary Learners for the Multiclass Bandit Problem Shang-Tse Chen School of Compuer Science, Georgia Insiue of Technology, Alana, GA Hsuan-Tien Lin Deparmen of Compuer Science and Informaion Engineering Naional Taiwan Universiy, Taipei, Taiwan Chi-Jen

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

DEPARTMENT OF STATISTICS

DEPARTMENT OF STATISTICS A Tes for Mulivariae ARCH Effecs R. Sco Hacker and Abdulnasser Haemi-J 004: DEPARTMENT OF STATISTICS S-0 07 LUND SWEDEN A Tes for Mulivariae ARCH Effecs R. Sco Hacker Jönköping Inernaional Business School

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018 MATH 5720: Gradien Mehods Hung Phan, UMass Lowell Ocober 4, 208 Descen Direcion Mehods Consider he problem min { f(x) x R n}. The general descen direcions mehod is x k+ = x k + k d k where x k is he curren

More information

Exponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits

Exponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits DOI: 0.545/mjis.07.5009 Exponenial Weighed Moving Average (EWMA) Char Under The Assumpion of Moderaeness And Is 3 Conrol Limis KALPESH S TAILOR Assisan Professor, Deparmen of Saisics, M. K. Bhavnagar Universiy,

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

Appendix to Creating Work Breaks From Available Idleness

Appendix to Creating Work Breaks From Available Idleness Appendix o Creaing Work Breaks From Available Idleness Xu Sun and Ward Whi Deparmen of Indusrial Engineering and Operaions Research, Columbia Universiy, New York, NY, 127; {xs2235,ww24}@columbia.edu Sepember

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t Exercise 7 C P = α + β R P + u C = αp + βr + v (a) (b) C R = α P R + β + w (c) Assumpions abou he disurbances u, v, w : Classical assumions on he disurbance of one of he equaions, eg. on (b): E(v v s P,

More information

Technical Report Doc ID: TR March-2013 (Last revision: 23-February-2016) On formulating quadratic functions in optimization models.

Technical Report Doc ID: TR March-2013 (Last revision: 23-February-2016) On formulating quadratic functions in optimization models. Technical Repor Doc ID: TR--203 06-March-203 (Las revision: 23-Februar-206) On formulaing quadraic funcions in opimizaion models. Auhor: Erling D. Andersen Convex quadraic consrains quie frequenl appear

More information

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important on-parameric echniques Insance Based Learning AKA: neares neighbor mehods, non-parameric, lazy, memorybased, or case-based learning Copyrigh 2005 by David Helmbold 1 Do no fi a model (as do LDA, logisic

More information

OBJECTIVES OF TIME SERIES ANALYSIS

OBJECTIVES OF TIME SERIES ANALYSIS OBJECTIVES OF TIME SERIES ANALYSIS Undersanding he dynamic or imedependen srucure of he observaions of a single series (univariae analysis) Forecasing of fuure observaions Asceraining he leading, lagging

More information

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important on-parameric echniques Insance Based Learning AKA: neares neighbor mehods, non-parameric, lazy, memorybased, or case-based learning Copyrigh 2005 by David Helmbold 1 Do no fi a model (as do LTU, decision

More information

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems.

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems. di ernardo, M. (995). A purely adapive conroller o synchronize and conrol chaoic sysems. hps://doi.org/.6/375-96(96)8-x Early version, also known as pre-prin Link o published version (if available):.6/375-96(96)8-x

More information

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8) I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression

More information

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

15. Vector Valued Functions

15. Vector Valued Functions 1. Vecor Valued Funcions Up o his poin, we have presened vecors wih consan componens, for example, 1, and,,4. However, we can allow he componens of a vecor o be funcions of a common variable. For example,

More information

Mean Square Projection Error Gradient-based Variable Forgetting Factor FAPI

Mean Square Projection Error Gradient-based Variable Forgetting Factor FAPI 3rd Inernaional Conference on Advances in Elecrical and Elecronics Engineering (ICAEE'4) Feb. -, 4 Singapore Mean Square Projecion Error Gradien-based Variable Forgeing Facor FAPI Young-Kwang Seo, Jong-Woo

More information

Sensors, Signals and Noise

Sensors, Signals and Noise Sensors, Signals and Noise COURSE OUTLINE Inroducion Signals and Noise: 1) Descripion Filering Sensors and associaed elecronics rv 2017/02/08 1 Noise Descripion Noise Waveforms and Samples Saisics of Noise

More information

Notes for Lecture 17-18

Notes for Lecture 17-18 U.C. Berkeley CS278: Compuaional Complexiy Handou N7-8 Professor Luca Trevisan April 3-8, 2008 Noes for Lecure 7-8 In hese wo lecures we prove he firs half of he PCP Theorem, he Amplificaion Lemma, up

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

2. Nonlinear Conservation Law Equations

2. Nonlinear Conservation Law Equations . Nonlinear Conservaion Law Equaions One of he clear lessons learned over recen years in sudying nonlinear parial differenial equaions is ha i is generally no wise o ry o aack a general class of nonlinear

More information

Failure of the work-hamiltonian connection for free energy calculations. Abstract

Failure of the work-hamiltonian connection for free energy calculations. Abstract Failure of he work-hamilonian connecion for free energy calculaions Jose M. G. Vilar 1 and J. Miguel Rubi 1 Compuaional Biology Program, Memorial Sloan-Keering Cancer Cener, 175 York Avenue, New York,

More information

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals

More information

EKF SLAM vs. FastSLAM A Comparison

EKF SLAM vs. FastSLAM A Comparison vs. A Comparison Michael Calonder, Compuer Vision Lab Swiss Federal Insiue of Technology, Lausanne EPFL) michael.calonder@epfl.ch The wo algorihms are described wih a planar robo applicaion in mind. Generalizaion

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

Appendix to Online l 1 -Dictionary Learning with Application to Novel Document Detection

Appendix to Online l 1 -Dictionary Learning with Application to Novel Document Detection Appendix o Online l -Dicionary Learning wih Applicaion o Novel Documen Deecion Shiva Prasad Kasiviswanahan Huahua Wang Arindam Banerjee Prem Melville A Background abou ADMM In his secion, we give a brief

More information

4.1 Other Interpretations of Ridge Regression

4.1 Other Interpretations of Ridge Regression CHAPTER 4 FURTHER RIDGE THEORY 4. Oher Inerpreaions of Ridge Regression In his secion we will presen hree inerpreaions for he use of ridge regression. The firs one is analogous o Hoerl and Kennard reasoning

More information

Lab #2: Kinematics in 1-Dimension

Lab #2: Kinematics in 1-Dimension Reading Assignmen: Chaper 2, Secions 2-1 hrough 2-8 Lab #2: Kinemaics in 1-Dimension Inroducion: The sudy of moion is broken ino wo main areas of sudy kinemaics and dynamics. Kinemaics is he descripion

More information

Deep Learning: Theory, Techniques & Applications - Recurrent Neural Networks -

Deep Learning: Theory, Techniques & Applications - Recurrent Neural Networks - Deep Learning: Theory, Techniques & Applicaions - Recurren Neural Neworks - Prof. Maeo Maeucci maeo.maeucci@polimi.i Deparmen of Elecronics, Informaion and Bioengineering Arificial Inelligence and Roboics

More information

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H. ACE 56 Fall 005 Lecure 5: he Simple Linear Regression Model: Sampling Properies of he Leas Squares Esimaors by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Inference in he Simple

More information

20. Applications of the Genetic-Drift Model

20. Applications of the Genetic-Drift Model 0. Applicaions of he Geneic-Drif Model 1) Deermining he probabiliy of forming any paricular combinaion of genoypes in he nex generaion: Example: If he parenal allele frequencies are p 0 = 0.35 and q 0

More information

Longest Common Prefixes

Longest Common Prefixes Longes Common Prefixes The sandard ordering for srings is he lexicographical order. I is induced by an order over he alphabe. We will use he same symbols (,

More information

STRUCTURAL CHANGE IN TIME SERIES OF THE EXCHANGE RATES BETWEEN YEN-DOLLAR AND YEN-EURO IN

STRUCTURAL CHANGE IN TIME SERIES OF THE EXCHANGE RATES BETWEEN YEN-DOLLAR AND YEN-EURO IN Inernaional Journal of Applied Economerics and Quaniaive Sudies. Vol.1-3(004) STRUCTURAL CHANGE IN TIME SERIES OF THE EXCHANGE RATES BETWEEN YEN-DOLLAR AND YEN-EURO IN 001-004 OBARA, Takashi * Absrac The

More information

An recursive analytical technique to estimate time dependent physical parameters in the presence of noise processes

An recursive analytical technique to estimate time dependent physical parameters in the presence of noise processes WHAT IS A KALMAN FILTER An recursive analyical echnique o esimae ime dependen physical parameers in he presence of noise processes Example of a ime and frequency applicaion: Offse beween wo clocks PREDICTORS,

More information

FITTING OF A PARTIALLY REPARAMETERIZED GOMPERTZ MODEL TO BROILER DATA

FITTING OF A PARTIALLY REPARAMETERIZED GOMPERTZ MODEL TO BROILER DATA FITTING OF A PARTIALLY REPARAMETERIZED GOMPERTZ MODEL TO BROILER DATA N. Okendro Singh Associae Professor (Ag. Sa.), College of Agriculure, Cenral Agriculural Universiy, Iroisemba 795 004, Imphal, Manipur

More information

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

KINEMATICS IN ONE DIMENSION

KINEMATICS IN ONE DIMENSION KINEMATICS IN ONE DIMENSION PREVIEW Kinemaics is he sudy of how hings move how far (disance and displacemen), how fas (speed and velociy), and how fas ha how fas changes (acceleraion). We say ha an objec

More information

1. VELOCITY AND ACCELERATION

1. VELOCITY AND ACCELERATION 1. VELOCITY AND ACCELERATION 1.1 Kinemaics Equaions s = u + 1 a and s = v 1 a s = 1 (u + v) v = u + as 1. Displacemen-Time Graph Gradien = speed 1.3 Velociy-Time Graph Gradien = acceleraion Area under

More information

Testing for a Single Factor Model in the Multivariate State Space Framework

Testing for a Single Factor Model in the Multivariate State Space Framework esing for a Single Facor Model in he Mulivariae Sae Space Framework Chen C.-Y. M. Chiba and M. Kobayashi Inernaional Graduae School of Social Sciences Yokohama Naional Universiy Japan Faculy of Economics

More information

Dimitri Solomatine. D.P. Solomatine. Data-driven modelling (part 2). 2

Dimitri Solomatine. D.P. Solomatine. Data-driven modelling (part 2). 2 Daa-driven modelling. Par. Daa-driven Arificial di Neural modelling. Newors Par Dimiri Solomaine Arificial neural newors D.P. Solomaine. Daa-driven modelling par. 1 Arificial neural newors ANN: main pes

More information

An Online Minimax Optimal Algorithm for Adversarial Multi-Armed Bandit Problem

An Online Minimax Optimal Algorithm for Adversarial Multi-Armed Bandit Problem 1 An Online Minimax Opimal Algorihm for Adversarial Muli-Armed Bandi Problem Kaan Gokcesu and Suleyman S. Koza, Senior Member, IEEE Absrac We invesigae he adversarial muli-armed bandi problem and inroduce

More information

A DELAY-DEPENDENT STABILITY CRITERIA FOR T-S FUZZY SYSTEM WITH TIME-DELAYS

A DELAY-DEPENDENT STABILITY CRITERIA FOR T-S FUZZY SYSTEM WITH TIME-DELAYS A DELAY-DEPENDENT STABILITY CRITERIA FOR T-S FUZZY SYSTEM WITH TIME-DELAYS Xinping Guan ;1 Fenglei Li Cailian Chen Insiue of Elecrical Engineering, Yanshan Universiy, Qinhuangdao, 066004, China. Deparmen

More information

Final Spring 2007

Final Spring 2007 .615 Final Spring 7 Overview The purpose of he final exam is o calculae he MHD β limi in a high-bea oroidal okamak agains he dangerous n = 1 exernal ballooning-kink mode. Effecively, his corresponds o

More information

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 38, No. 2, May 2013, pp. 209 227 ISSN 0364-765X (prin) ISSN 1526-5471 (online) hp://dx.doi.org/10.1287/moor.1120.0562 2013 INFORMS On Boundedness of Q-Learning Ieraes

More information

On-line Adaptive Optimal Timing Control of Switched Systems

On-line Adaptive Optimal Timing Control of Switched Systems On-line Adapive Opimal Timing Conrol of Swiched Sysems X.C. Ding, Y. Wardi and M. Egersed Absrac In his paper we consider he problem of opimizing over he swiching imes for a muli-modal dynamic sysem when

More information

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 RL Lecure 7: Eligibiliy Traces R. S. Suon and A. G. Baro: Reinforcemen Learning: An Inroducion 1 N-sep TD Predicion Idea: Look farher ino he fuure when you do TD backup (1, 2, 3,, n seps) R. S. Suon and

More information

Object tracking: Using HMMs to estimate the geographical location of fish

Object tracking: Using HMMs to estimate the geographical location of fish Objec racking: Using HMMs o esimae he geographical locaion of fish 02433 - Hidden Markov Models Marin Wæver Pedersen, Henrik Madsen Course week 13 MWP, compiled June 8, 2011 Objecive: Locae fish from agging

More information

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS NA568 Mobile Roboics: Mehods & Algorihms Today s Topic Quick review on (Linear) Kalman Filer Kalman Filering for Non-Linear Sysems Exended Kalman Filer (EKF)

More information

Tom Heskes and Onno Zoeter. Presented by Mark Buller

Tom Heskes and Onno Zoeter. Presented by Mark Buller Tom Heskes and Onno Zoeer Presened by Mark Buller Dynamic Bayesian Neworks Direced graphical models of sochasic processes Represen hidden and observed variables wih differen dependencies Generalize Hidden

More information