New Optimisation Methods for Machine Learning


New Optimisation Methods for Machine Learning

Aaron Defazio
(Under Examination)

A thesis submitted for the degree of Doctor of Philosophy of The Australian National University

November 2014

© Aaron Defazio 2014

Except where otherwise indicated, this thesis is my own original work.

Aaron Defazio
7 November 2014


Acknowledgements

I would like to thank several NICTA researchers for conversations and brainstorming sessions during the course of my PhD, particularly Scott Sanner and my supervisor Tiberio Caetano. I would like to thank Justin Domke for many discussions about the Finito algorithm, and his assistance with developing and checking the proof. Likewise, for the SAGA algorithm I would like to thank Francis Bach and Simon Lacoste-Julien for discussion and assistance with the proofs. The SAGA algorithm was discovered in collaboration with them while visiting the INRIA lab, with some financial support from INRIA.

I would also like to thank my family for all their support during the course of my PhD. Particularly my mother, for giving me a place to stay for part of the duration of the PhD as well as food, love and support. I do not thank her often enough.

I also would like to thank NICTA for their scholarship during the course of the PhD. NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program.


Abstract

In this work we introduce several new optimisation methods for problems in machine learning. Our algorithms broadly fall into two categories: optimisation of finite sums and of graph structured objectives. The finite sum problem is simply the minimisation of objective functions that are naturally expressed as a summation over a large number of terms, where each term has a similar or identical weight. Such objectives most often appear in machine learning in the empirical risk minimisation framework in the non-online learning setting. The second category, that of graph structured objectives, consists of objectives that result from applying maximum likelihood to Markov random field models. Unlike the finite sum case, all the non-linearity is contained within a partition function term, which does not readily decompose into a summation.

For the finite sum problem, we introduce the Finito and SAGA algorithms, as well as variants of each. The Finito algorithm is best suited to strongly convex problems where the number of terms is of the same order as the condition number of the problem. We prove the fast convergence rate of Finito for strongly convex problems and demonstrate its state-of-the-art empirical performance on 5 datasets.

The SAGA algorithm we introduce is complementary to the Finito algorithm. It is more generally applicable, as it can be applied to problems without strong convexity, and to problems that have a non-differentiable regularisation term. In both cases we establish strong convergence rate proofs. It is also better suited to sparse problems than Finito. The SAGA method has a broader and simpler theory than any existing fast method for the problem class of finite sums; in particular it is the first such method that can provably be applied to non-strongly convex problems with non-differentiable regularisers without introduction of additional regularisation.

For graph-structured problems, we take three complementary approaches. We look at learning the parameters for a fixed structure, learning the structure independently, and learning both simultaneously. Specifically, for the combined approach, we introduce a new method for encouraging graph structures with the scale-free property. For the structure learning problem, we establish SHORTCUT, an $O(n^{2.5})$ expected time approximate structure learning method for Gaussian graphical models. For problems where the structure is known but the parameters unknown, we introduce an approximate maximum likelihood learning algorithm that is capable of learning a useful subclass of Gaussian graphical models.

Our thesis as a whole introduces a new suite of techniques for machine learning practitioners that increases the size and type of problems that can be efficiently solved. Our work is backed by extensive theory, including proofs of convergence for each method discussed.

Contents

1 Introduction and Overview
  1.1 Convex Machine Learning Problems
  1.2 Problem Structure and Black Box Methods
  1.3 Early & Late Stage Convergence
  1.4 Approximations
  1.5 Non-differentiability in Machine Learning
  1.6 Publications Related to This Thesis

2 Incremental Gradient Methods
  2.1 Problem Setup
    2.1.1 Exploiting problem structure
    2.1.2 Randomness and expected convergence rates
    2.1.3 Data access order
  2.2 Early Incremental Gradient Methods
  2.3 Stochastic Dual Coordinate Descent (SDCA)
    2.3.1 Alternative steps
    2.3.2 Reducing storage requirements
    2.3.3 Accelerated SDCA
  2.4 Stochastic Average Gradient (SAG)
  2.5 Stochastic Variance Reduced Gradient (SVRG)

3 New Dual Incremental Gradient Methods
  3.1 The Finito Algorithm
    3.1.1 Additional notation
    3.1.2 Method
    3.1.3 Storage costs
  3.2 Permutation & the Importance of Randomness
  3.3 Experiments
  3.4 The MISO Method
  3.5 A Primal Form of SDCA
  3.6 Prox-Finito: a Novel Midpoint Algorithm
    3.6.1 Prox-Finito in relation to Finito
    3.6.2 Non-Uniform Lipschitz Constants
  3.7 Finito Theory
    3.7.1 Main proof
  3.8 Prox-Finito Theory
    3.8.1 Main result
    3.8.2 Proof of Theorem

4 New Primal Incremental Gradient Methods
  4.1 Composite Objectives
  4.2 SAGA Algorithm
  4.3 Relation to Existing Methods
    4.3.1 SAG
    4.3.2 SVRG
    4.3.3 Finito
  4.4 Implementation
  4.5 Experiments
  4.6 SAGA Theory
    4.6.1 Linear convergence for strongly convex problems
    4.6.2 1/k convergence for non-strongly convex problems
  4.7 Understanding the Convergence of the SVRG Method
  4.8 Verifying SAGA Constants
    4.8.1 Strongly convex step size γ = 1/(2(µn + L))
    4.8.2 Strongly convex step size γ = 1/(3L)
    4.8.3 Non-strongly convex step size γ = 1/(3L)

5 Access Orders and Complexity Bounds
  5.1 Lower Complexity Bounds
    5.1.1 Technical assumptions
    5.1.2 A simple (1 − 1/n)^k bound
    5.1.3 Minimisation of non-strongly convex finite sums
    5.1.4 Open problems
  5.2 Access Orderings
  5.3 MISO Robustness

6 Beyond Finite Sums: Learning Graphical Models
  6.1 Beyond the Finite Sum Structure
  6.2 The Structure Learning Problem
  6.3 Covariance Selection
    6.3.1 Direct optimisation approaches
    6.3.2 Neighbourhood selection
    6.3.3 Thresholding approaches
    6.3.4 Conditional thresholding
  6.4 Alternative Regularisers

7 Learning Scale Free Networks
  7.1 Combinatorial Objective
  7.2 Submodularity
  7.3 Optimisation
    7.3.1 Alternating direction method of multipliers
    7.3.2 Proximal operator using dual decomposition
  7.4 Alternative Degree Priors
  7.5 Experiments
    7.5.1 Reconstruction of synthetic networks
    7.5.2 Reconstruction of a gene activation network
    7.5.3 Runtime comparison: different proximal operator methods
    7.5.4 Runtime comparison: submodular relaxation against other approaches
  7.6 Proof of Correctness

8 Fast Approximate Structural Inference
  8.1 SHORTCUT
  8.2 Running Time
  8.3 Experiments
    8.3.1 Synthetic datasets
    8.3.2 Real world datasets
  8.4 Theoretical Properties

9 Fast Approximate Parameter Inference
  9.1 Model Class
    9.1.1 Improper models
    9.1.2 Precision matrix restrictions
  9.2 An Approximate Constrained Maximum Entropy Learning Algorithm
    9.2.1 Maximum Entropy Learning
    9.2.2 The Bethe Approximation
    9.2.3 Maximum entropy learning of unconstrained Gaussian distributions
    9.2.4 Restricted Gaussian distributions
  9.3 Maximum Likelihood Learning with Belief Propagation
  9.4 Collaborative Filtering
    9.4.1 The Item Graph
    9.4.2 Limitations of previous approaches
  9.5 The Item Field Model
    9.5.1 Prediction Rule
  9.6 Experiments
  9.7 Related Work
  9.8 Extensions
    9.8.1 Missing Data & Kernel Functions
    9.8.2 Conditional Random Field Variants

10 Conclusion and Discussion
  10.1 Incremental Gradient Methods
    10.1.1 Summary of contributions
    10.1.2 Applications
    10.1.3 Open problems
  10.2 Learning Graph Models
    10.2.1 Summary of contributions
    10.2.2 Applications
    10.2.3 Open Problems

A Basic Convexity Theorems
  A.1 Definitions
  A.2 Useful Properties of Convex Conjugates
  A.3 Types of Duality
  A.4 Properties of Differentiable Functions
  A.5 Convexity Bounds
    A.5.1 Taylor like bounds
    A.5.2 Gradient difference bounds
    A.5.3 Inner product bounds
    A.5.4 Strengthened bounds using both Lipschitz and strong convexity

B Miscellaneous Lemmas

Bibliography


Chapter 1

Introduction and Overview

Numerical optimisation is in many ways the core problem in modern machine learning. Virtually all learning problems can be tackled by formulating a real valued objective function expressing some notion of loss or suboptimality which can be optimised over. Indeed, approaches that don't have well founded objective functions are rare, perhaps contrastive divergence (Hinton, 2002) and some sampling schemes being notable examples. Many methods that started as heuristics were able to be significantly improved once well-founded objectives were discovered and exploited, non-tree belief propagation and the relation to the Bethe approximation (Yedidia et al., 2000), and the later development of tree weighted variants (Wainwright et al., 2003), being a notable example.

The core of this thesis is the development of several new numerical optimisation schemes, which either address limitations of existing approaches, or improve on the performance of state-of-the-art algorithms. These methods increase the breadth and depth of machine learning problems that are tractable on modern computers.

1.1 Convex Machine Learning Problems

In this work we particularly focus on problems that have convex objectives. This is a major restriction, and one at the core of much of modern optimisation theory, but one that nevertheless requires justification. The primary reasons for targeting convex problems are their ubiquitousness in applications and the relative ease of solving them. Logistic regression, least-squares, support vector machines, hidden Markov models, conditional random fields and tree-weighted belief propagation all involve convex models. All of these techniques have seen real world application, although their use has been overshadowed in recent years by non-convex models such as neural networks.

Convex optimisation is still of interest when addressing non-convex problems though. Many algorithms that were developed for convex problems, motivated by their provably fast convergence, have later been applied to non-convex problems with good empirical results.

The class of convex numerical problems is sometimes considered synonymous with that of computationally tractable problems. This is no longer necessarily the case in practice, as we can tackle non-convex problems of massive scale using modern approaches (i.e. Dean et al., 2012). Instead, convex problems can be better thought of as the reliably solvable problems. (We should note that the general statement that all convex problems are computationally tractable is actually not true in the usual computer science sense. There exist convex problems that are NP-hard, in the sense that problems in the NP-hard class are reducible to convex optimisation problems (i.e. de Klerk and Pasechnik, 2006).) For convex problems we can almost always establish theoretical results giving a practical bound on the amount of computation time required to solve a given convex problem (Nesterov and Nemirovski, 1994). Together with the small or no tuning required by convex optimisation algorithms, they can be used as building blocks within larger programs; details of the problem can be abstracted away from the users. This is not the case for non-convex problems, where known methods require substantial hand tuning.

Given these advantages, many researchers consider convex optimisation a solved problem. This is largely the result of undergraduate courses and text books treating interior point methods as the principal solution to convex problems. This view is particularly prevalent among statisticians, under the reweighted least-squares nomenclature. While Newton's method is strikingly successful on small problems, its approximately cubic running time per iteration, resulting from the need to do a linear solve, means that it scales extremely poorly to problems with large numbers of variables. It is also unable to directly handle non-differentiable problems common in machine learning. Both of these shortcomings have been addressed to some degree (Nocedal, 1980; Liu and Nocedal, 1989; Andrew and Gao, 2007), by the use of low-rank approximations and tricks for specific non-differentiable structures, although problems remain.

An additional complication is a divergence between the numerical optimisation and machine learning communities. Numerical convex optimisation researchers in the 80s and 90s largely focused on solving problems with large numbers of complex constraints, particularly Quadratic Programming (QP) and Linear Programming (LP) problems. These advances were applicable to the kernel methods of the early 2000s, but at odds with many of the more modern machine learning problems, which are characterised by large numbers of potentially non-differentiable terms. The core examples would be linear support vector machines, other max-margin methods and neural networks with non-differentiable activation functions. The problem we address in Chapter 7 also fits into this class.

In this thesis we will focus on smooth optimisation problems that obey the Lipschitz smoothness criterion. A function f is Lipschitz smooth with constant L if its gradients are Lipschitz continuous. That is, for all x, y ∈ R^d:

$$\|f'(x) - f'(y)\| \le L\,\|x - y\|.$$
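To make the smoothness constant concrete: for a quadratic f(x) = ½xᵀAx the gradient is f'(x) = Ax, and the smallest valid L is the largest eigenvalue of A. The following minimal numerical check of the inequality is our own illustration, not part of the thesis:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 5
    B = rng.standard_normal((d, d))
    A = B @ B.T                      # symmetric positive semi-definite Hessian
    grad = lambda x: A @ x           # gradient of f(x) = 0.5 * x @ A @ x
    L = np.linalg.eigvalsh(A)[-1]    # largest eigenvalue = spectral norm of A

    for _ in range(1000):
        x, y = rng.standard_normal(d), rng.standard_normal(d)
        assert np.linalg.norm(grad(x) - grad(y)) <= L * np.linalg.norm(x - y) + 1e-9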

Lipschitz smooth functions are differentiable, and if their Hessian matrix exists it is bounded in spectral norm. The other assumption we will sometimes make is that of strong convexity. A function f is strongly convex with constant µ if for all x, y ∈ R^d and α ∈ [0, 1]:

$$f(\alpha x + (1-\alpha)y) \le \alpha f(x) + (1-\alpha) f(y) - \alpha(1-\alpha)\frac{\mu}{2}\|x - y\|^2.$$

Essentially, rather than the usual convexity interpolation bound f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y), we have it strengthened by a quadratic term.

1.2 Problem Structure and Black Box Methods

The last few years have seen a resurgence in convex optimisation centred around the technique of exploiting problem structure, an approach we take as well. When no structure is assumed by the optimisation method about the problem, other than the degree of convexity, very strong results are known about the best possible convergence rates obtainable. These results date back to the seminal work of Nemirovsky and Yudin (1983) and Nesterov (1998, earlier work in Russian). These results have contributed to the widely held attitude that convex optimisation is a solved problem. But when the problem has some sort of additional structure these worst-case theoretical results are no longer applicable. Indeed, a series of recent results suggest that practically all problems of interest have such structure, allowing advances in theoretical, not just practical, convergence. For example, non-differentiable problems under reasonable Lipschitz assumptions can be solved with an error reduction of at best O(1/√t) after t iterations, for standard measures of convergence rate (Nesterov, 1998, Theorem 3.2.1). In practice, virtually all non-differentiable problems can be treated by a smoothing transformation, giving an O(1/t) reduction in error after t iterations when an optimal algorithm is used (Nesterov, 2005).

Many problems of interest have a structure where most terms in the objective involve only a small number of variables. This is the case for example in inference problems on graphical models. In such cases block coordinate descent methods can give better theoretical and practical results (Richtarik and Takac, 2011).

Another exploitable structure involves a sum of two terms F(x) = f(x) + h(x), where the first term f(x) is structurally nice, say smooth and differentiable, but potentially complex to evaluate, and where the second term h(x) is non-differentiable. As long as h(x) is simple in the sense that its proximal operator is easy to evaluate, then algorithms exist with the same theoretical convergence rate as if h(x) was not part of the objective at all (F(x) = f(x)) (Beck and Teboulle, 2009).

[Figure 1.1: Schematic illustration of convergence rates, plotting suboptimality against iteration for LBFGS, SGD and incremental gradient methods.]

The proximal operator is a key construction in this work, and indeed in modern optimisation theory. It is defined for a function h and constant γ as:

$$\operatorname{prox}^{\gamma}_{h}(v) = \arg\min_x \left\{ h(x) + \frac{\gamma}{2}\|x - v\|^2 \right\}.$$

Some definitions of the proximal operator use the weighting 1/(2γ) instead of γ/2; we use this form throughout this work. The proximal operator is itself an optimisation problem, and so in general it is only useful when the function h is simple. In many cases of interest the proximal operator has a closed form solution.
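As an example of such a closed form: for h(x) = λ||x||₁ the proximal operator is elementwise soft thresholding. The sketch below uses the γ/2 weighting convention defined above; it is our illustration and the function name is ours:

    import numpy as np

    def prox_l1(v, lam, gamma):
        """Minimise lam * ||x||_1 + (gamma / 2) * ||x - v||^2 elementwise:
        soft thresholding with threshold lam / gamma."""
        return np.sign(v) * np.maximum(np.abs(v) - lam / gamma, 0.0)

    print(prox_l1(np.array([1.5, -0.2, 0.7]), lam=0.5, gamma=1.0))
    # -> [ 1.  -0.   0.2]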

The first four chapters of this work focus on quite possibly the simplest problem structure, that of a finite summation. This occurs when there is a large number of terms with similar structure added together or averaged in the objective. Recent results have shown that for strongly convex problems better convergence rates are possible under such summation structures than is possible for black box problems (Schmidt et al., 2013; Shalev-Shwartz and Zhang, 2013b). We provide three new algorithms for this problem structure, discussed in Chapters 3 and 4. We also discuss properties of problems in the finite sum class extensively in Chapter 5.

1.3 Early & Late Stage Convergence

When dealing with problems with a finite sum structure, practitioners have traditionally had to make a key trade-off between stochastic methods, which access the objective one term at a time, and batch methods, which work directly with the full objective. Stochastic methods such as SGD exhibit rapid convergence during early stages of optimisation, yielding a good approximate solution quickly, but this convergence slows down over time; getting a high accuracy solution is nearly impossible with SGD. Fortunately, in machine learning it is often the case that a low accuracy solution gives just as good a result as a high accuracy solution for minimising the test loss on held out data. A high accuracy solution can effectively over-fit to the training data. Running SGD for a small number of epochs is common practice. Batch methods on the other hand are slowly converging but steady; if run for long enough they yield a high accuracy solution. For strongly convex problems, the difference in convergence is between an O(1/t) error after t iterations for SGD versus an O(r^t) error (r < 1) for LBFGS, the most popular batch method (Nocedal, 1980). We have illustrated the difference schematically in Figure 1.1. The SGD and LBFGS lines here are typical of simple logistic regression problems, where SGD gives acceptable solutions after 5-10 epochs (passes over the data), whereas LBFGS eventually gives a better solution, taking many iterations to do so. LBFGS is particularly well suited to use in a distributed computing setting, and it is sometimes the case that LBFGS will give better results ultimately on the test loss, particularly for poorly conditioned (high-curvature) problems.

Figure 1.1 also illustrates the kind of convergence that the recently developed class of incremental gradient methods potentially offers. Incremental gradient methods have the same linear O(r^t) error after t epochs as a batch method, but with a coefficient r dramatically better. The difference being in theory thousands of times faster convergence, and in practice usually 10-20 times better. Essentially incremental gradient methods are able to offer the best of both worlds, having rapid initial convergence without the later stage slow-down of SGD.

Another traditional advantage of batch methods over stochastic methods is their ease of use. Methods such as LBFGS require no hand tuning to be applied to virtually any smooth problem. Some tuning of the memory constant that holds the number of past gradients to remember at each step can give faster convergence, but bad choices of this constant still result in convergence. SGD and other traditional stochastic methods on the other hand require a step size parameter and a parameter annealing schedule to be set. SGD is sensitive to these choices, and will diverge for poor choices. Incremental gradient methods offer a solution to the tuning problem as well. Most incremental gradient algorithms have only a single step size parameter that needs to be set. Fortunately the convergence rate is fairly robust to the value of this parameter. The SDCA algorithm reduces this to 0 parameters, but at the expense of being limited to problems with efficient to compute proximal operators.

(Quasi-Newton methods are often cited as having super-linear convergence. This is only true if the dimensionality of the underlying parameter space is comparable to the number of iterations used. In machine learning the parameter space is usually much larger in effective dimension than the number of iterations.)

[Figure 1.2: Gaussian graphical model defined by the precision matrix P, together with the non-sparse covariance matrix C it induces, with rounding to 1 significant figure. Correlations are indicated by negative edge weights in a Gaussian model. (Matrix entries not recoverable in this transcription.)]

1.4 Approximations

The exploitation of problem structure is not always directly possible with the objectives we encounter in machine learning. A case we focus on in this work is the learning of weight parameters in a Gaussian graphical model structure. This is an undirected graph structure with weights associated with both edges and nodes. These weights are the entries of the precision matrix (inverse covariance matrix) of a Gaussian distribution. Absent edges effectively have a weight of zero (Figure 1.2). A formal definition is given in Chapter 6.

A key approach to such problems is the use of approximations that introduce additional structure in the objective which we can exploit. The regularised maximum likelihood objective for fitting a Gaussian graphical model can require time O(n³) to evaluate. (Theoretically it takes time equivalent to the big-O cost of a fast matrix multiplication such as Strassen's algorithm, O(n^2.8), but in practice simpler O(n³) techniques are used.) This is prohibitively long on many problems of interest. Instead, approximations can be introduced that decompose the objective, allowing more efficient techniques to be used. In Chapter 9 we show how the Bethe approximation may be applied for learning the edge weights on restricted classes of Gaussian graphical models. This approximation allows for the use of an efficient dual decomposition optimisation method, and has direct practical applicability in the domain of recommendation systems.
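The effect shown in Figure 1.2, a sparse precision matrix inducing a fully dense covariance matrix, is easy to reproduce numerically. A minimal sketch with an arbitrary chain-structured precision matrix (our example, not the matrix from the figure):

    import numpy as np

    # Chain graph 1 - 2 - 3 - 4: only adjacent nodes have an edge (nonzero entry).
    P = np.array([[ 2., -1.,  0.,  0.],
                  [-1.,  2., -1.,  0.],
                  [ 0., -1.,  2., -1.],
                  [ 0.,  0., -1.,  2.]])
    C = np.linalg.inv(P)   # induced covariance matrix: every entry is nonzero
    print(np.round(C, 1))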

Besides parameter learning, the other primary task involving graphs is directly learning the structure. Structure learning for Gaussian graphical models is a problem that has seen a lot of interest in machine learning. The structure can be used in a machine learning pipeline as the precursor to parameter learning, or it can be used for its own sake as an indicator of correlation structure in a dataset. The use of approximations in structure learning is more widespread than in parameter learning, and we give an overview of approaches in Chapter 6. We improve on an existing technique in Chapter 8, where we show that an existing approximation can be further approximated, giving a substantial practical and theoretical speed-up by a factor of O(√n).

1.5 Non-differentiability in Machine Learning

As mentioned, machine learning problems tend to have substantial non-differentiable structure compared to the constraint structures more commonly addressed in numerical optimisation. These two forms of structure are in a sense two sides of the same coin, as for convex problems the transformation to the dual problem can often convert from one to the other. The primary example being support vector machines, where non-differentiability in the primal hinge loss is converted to a constraint set when the dual is considered. Recent progress in optimisation has seen the use of proximal methods as the tool of choice for handling both structures in machine learning problems. When using a regularised loss objective of the form F(x) = f(x) + h(x), as mentioned above in Section 1.2, the non-differentiability can be in the regulariser h(x) or in the loss term f(x). We introduce methods addressing both cases in this work.

The SAGA algorithm of Chapter 4 is a new primal method, the first primal incremental gradient method able to be used on non-strongly convex problems with non-differentiable regularisers directly. It makes use of the proximal operator of the regulariser. It can also be used on problems with constraints, where the function h(x) is the indicator function of the constraint set, and the proximal operator is projection onto the constraint set.

In this work we also introduce a new non-differentiable regulariser for the above mentioned graph structure learning problem, which can also be attacked using proximal methods. Its non-differentiable structure is substantially more complex than other regularisers used in machine learning, requiring a special optimisation procedure to be used just to evaluate the proximal operator (typically proximal operators in machine learning have closed form solutions).

For non-differentiable losses, we introduce the Prox-Finito algorithm (Section 3.6). This incremental gradient algorithm uses the proximal operator of the single datapoint loss. It provides a bridge between the Finito algorithm (Section 3.1) and the SDCA algorithm (Shalev-Shwartz and Zhang, 2013b), having properties of both methods.

1.6 Publications Related to This Thesis

The majority of the content in this thesis has been published as conference articles. For the work on incremental gradient methods, the Finito method has been published as Defazio et al. (2014b), and the SAGA method as Defazio et al. (2014a). Chapters 3 & 4 contain much more detailed theory than has been previously published. Some of the discussion in Chapter 5 appears in Defazio et al. (2014b) also. For the portion of this thesis on Gaussian graphical models, Chapter 7 largely follows the publication Defazio and Caetano (2012a). Chapter 9 is based on the work in Defazio and Caetano (2012b), although heavily revised.

Chapter 2

Incremental Gradient Methods

In this chapter we give an introduction to the class of incremental gradient (IG) methods. Incremental gradient methods are simply the class of methods that take advantage of any known summation structure in an optimisation objective by accessing the objective one term at a time. Objectives that are decomposable as a sum of a number of terms come up often in applied mathematics and scientific computing, but are particularly prevalent in machine learning applications.

Research in the last two decades on optimisation problems with a summation structure has focused more on the stochastic approximation setting, where the summation is assumed to be over an infinite set of terms. The finite sum case that incremental gradient methods cover has seen a resurgence in recent years after the discovery that there exist fast incremental gradient methods whose convergence rates are better than any possible black box method for finite sums with particular (common) structures. We provide an extensive overview of all known fast incremental gradient methods in the later parts of this chapter. Building on the described methods, in Chapters 3 & 4 we introduce three novel fast incremental gradient methods. Depending on the problem structure, each of these methods can have state-of-the-art performance.

2.1 Problem Setup

We are interested in minimising functions of the form

$$f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x),$$

where x ∈ R^d and each f_i is convex and Lipschitz smooth with constant L. We will also consider the case where each f_i is additionally strongly convex with constant µ. Incremental gradient methods are algorithms that at each step evaluate the gradient and function value of only a single f_i.
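As a concrete running instance of this setup (our illustration, not an example from the thesis), binary logistic regression fits the template directly; the functions below implement the single-term oracle (f_j(x), f_j'(x)) that incremental gradient methods access:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 5
    A = rng.standard_normal((n, d))        # data points a_i as rows
    b = rng.choice([-1.0, 1.0], size=n)    # binary labels

    def f_j(x, j):
        """Single term: logistic loss f_j(x) = log(1 + exp(-b_j <a_j, x>))."""
        return np.log1p(np.exp(-b[j] * (A[j] @ x)))

    def f_j_grad(x, j):
        """Gradient of the single term f_j at x."""
        return -b[j] * A[j] / (1.0 + np.exp(b[j] * (A[j] @ x)))

    def f(x):
        """Full objective f(x) = (1/n) sum_i f_i(x)."""
        return np.mean([f_j(x, j) for j in range(n)])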

We will measure convergence rates in terms of the number of (f_i(x), f_i'(x)) evaluations; normally these are much cheaper computationally than evaluations of the whole function gradient f', such as performed by the gradient descent algorithm. We use the notation x* to denote a minimiser of f. For strongly convex problems this is the unique minimiser.

This setup differs from the traditional black box smooth convex optimisation problem only in that we are assuming that our function is decomposable into a finite sum structure. This finite sum structure is widespread in machine learning applications. For example, the standard framework of Empirical Risk Minimisation (ERM) takes this form, where for a loss function L : R^d × R → R and data-label tuples (x_i, y_i), we have:

$$R_{\mathrm{emp}}(h) = \frac{1}{n}\sum_{i=1}^{n} L(h(x_i), y_i),$$

where h is the hypothesis function that we intend to optimise over. The most common case of ERM is minimisation of the negative log-likelihood, for instance the classical logistic regression problem.

2.1.1 Exploiting problem structure

Given the very general nature of the finite sum structure, we cannot expect to get faster convergence than we would by accessing the whole gradient, without additional assumptions. For example, suppose the summation only has one term, or alternatively each f_i is the zero function except one of them. Notice that the Lipschitz smoothness and strong convexity assumptions we made are on each f_i rather than on f. This is a key point. If the directions of maximum curvature of each term are aligned and of similar magnitude, then we can expect the term Lipschitz smoothness to be similar to the smoothness of the whole function. However, it is easy to construct problems for which this is not the case; in fact the Lipschitz smoothness of f may be n times smaller than that of each f_i. In that case the incremental gradient methods will give no improvement over black box optimisation methods.

For machine learning problems, and particularly the empirical risk minimisation problem, this worst case behaviour is not common. The curvature, and hence the Lipschitz constants, are defined largely by the loss function, which is shared between the terms, rather than the data point. Common data preprocessing methods such as data whitening can improve this even further.

The requirement that the magnitude of the Lipschitz constants be approximately balanced can be relaxed in some cases.

It is possible to formulate IG methods where the convergence is stated in terms of the average of the Lipschitz constants of the f_i instead of the maximum. This is the case for the Prox-Finito algorithm described in Section 3.6. All known methods that make use of the average Lipschitz constant require knowledge of the ratios of the Lipschitz constants of the f_i terms, which limits their practicality unfortunately.

Regardless of the condition number of the problem, if we have a summation with enough terms optimisation becomes easy. This is made precise in the definition that follows.

Definition 2.1. The big data condition: For some known constant b,

$$n \ge b\,\frac{L}{\mu}.$$

This condition obviously requires strong convexity and Lipschitz smoothness so that L/µ is well defined. It is a very strong assumption for small n, as the condition number L/µ in typical machine learning problems is at least in the thousands. For applications of this assumption, b is typically between 1 and 8. Several of the methods we describe below have a fixed and very fast convergence rate independent of the condition number when this big-data condition holds.

2.1.2 Randomness and expected convergence rates

This thesis works extensively with optimisation methods that make random decisions during the course of the algorithm. Unlike the stochastic approximation setting, we are dealing with deterministic, known optimisation problems; the stochasticity is introduced by our optimisation methods, it is not inherent in the problem. We introduce randomness because it allows us to get convergence rates faster than that of any currently known deterministic methods. The caveat is that these convergence rates are in expectation, so they don't always hold precisely.

This is not as bad as it first seems though. Determining that the expectation of a general random variable converges is normally quite a weak result, as its value may vary around the expectation substantially in practice, potentially by far more than it converges by. The reason why this is not an issue for the optimisation methods we consider is that all the random variables we bound are non-negative. A non-negative random variable X with a very small expectation, say E[X] = 10⁻⁵, is with high probability close to its expectation. This is a fundamental result implied by Markov's inequality. For example, suppose E[X] = 10⁻⁵ and we want to bound the probability that X is greater than 10⁻³, i.e. a factor of 100 worse than its expectation. Then Markov's inequality tells us that:

$$P(X \ge 10^{-3}) \le \frac{1}{100}.$$

So there is only a 1% chance of X being larger than 100 times its expected value here. We will largely focus on methods with linear convergence in the following chapters, so in order to increase the probability of the value X holding by a factor r, only a logarithmic number of additional iterations in r is required (O(log r)). We would also like to note that Markov's inequality can be quite conservative. Our experiments in later chapters show little in the way of random noise attributable to the optimisation procedure, particularly when the amount of data is large.

2.1.3 Data access order

The source of randomness in all the methods considered in this chapter is the order of accessing the f_i terms. By access we mean the evaluation of f_i(x) and f_i'(x) at an x of our choice. This is more formally known as an oracle evaluation (see Section 5.1), and typically constitutes the most computationally expensive part of the main loop of each algorithm we consider. The access order is defined on a per-epoch basis, where an epoch is n evaluations. Only three different access orders are considered in this work:

Cyclic: Each step with j = 1 + (k mod n). Effectively we access the f_i in the order they appear, then loop to the beginning at the end of every epoch.

Permuted: Each epoch, j is sampled without replacement from the set of indices not accessed yet that epoch. This is equivalent to permuting the f_i at the beginning of each epoch, then using the cyclic order within the epoch.

Randomised: The value of j is sampled uniformly at random with replacement from 1, ..., n.

The permuted terminology is our nomenclature, whereas the other two terms are standard.
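The three access orders are easy to express as index streams; the following generators are our own illustration (zero-based indices):

    import numpy as np

    def cyclic(n):
        k = 0
        while True:
            yield k % n              # j = 1 + (k mod n), zero-based here
            k += 1

    def permuted(n, seed=0):
        rng = np.random.default_rng(seed)
        while True:                  # fresh permutation at each epoch
            yield from rng.permutation(n)

    def randomised(n, seed=0):
        rng = np.random.default_rng(seed)
        while True:                  # uniform sampling with replacement
            yield int(rng.integers(n))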

2.2 Early Incremental Gradient Methods

The classical incremental gradient (IG) method is simply a step of the form:

$$x^{k+1} = x^k - \gamma_k f_j'(x^k),$$

where at step k we use cyclic access, taking j = 1 + (k mod n). This is similar to the more well known stochastic gradient descent, but with a cyclic order of access of the data instead of a random order. We have introduced here a superscript notation x^k for the variable x at step k. We use this notation throughout this work.

It turns out to be much easier to analyse such methods under a random access ordering. For the random order IG method (i.e. SGD) on smooth strongly convex problems, the following rate holds for appropriately chosen step sizes:

$$E\left[f(x^k) - f(x^*)\right] \le \frac{L}{2k}\left\|x^0 - x^*\right\|^2.$$

The step size scheme required is of the form γ_k = q/k, where q is a constant that depends on the gradient norm bound R as well as the degree of strong convexity µ. It may be required to be quite small in some cases. This is what is known as a sublinear rate of convergence, as the dependence on k is of the form O(L/2k), which is slower than the linear rate O((1 − a)^k), for any a ∈ (0, 1), asymptotically.

Incremental gradient methods for strongly convex smooth problems were of little interest up until the development of fast variants (discussed below), as the sublinear rates for the previously known methods did not compare favourably to the linear rate of quasi-Newton methods. For non-strongly convex problems, or strongly convex but non-smooth problems, the story is quite different. In those cases, the theoretical and practical rates are hard to beat with full (sub-)gradient methods. The non-convex case is of particular interest in machine learning; SGD has been the de facto standard optimisation method for neural networks for example since the 1980s (Rumelhart et al., 1986). Such incremental gradient methods have a long history, having been applied to specific problems as far back as the 1960s (Widrow and Hoff, 1960). An up-to-date survey can be found in Bertsekas (2012).
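A sketch of the random-order method with the γ_k = q/k schedule, reusing the logistic regression oracle sketched in Section 2.1 (our illustration; the constant q needs problem-specific tuning):

    import numpy as np

    def sgd(f_j_grad, n, d, q=1.0, epochs=50, seed=0):
        """Random-order incremental gradient (SGD) with step sizes gamma_k = q / k."""
        rng = np.random.default_rng(seed)
        x = np.zeros(d)
        for k in range(1, epochs * n + 1):
            j = int(rng.integers(n))           # randomised access order
            x = x - (q / k) * f_j_grad(x, j)
        return x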

2.3 Stochastic Dual Coordinate Descent (SDCA)

The stochastic dual coordinate descent method (Shalev-Shwartz and Zhang, 2013b) is based on the principle that for problems with explicit quadratic regularisers, the dual takes a particularly easy to work with form. Recall the finite sum structure f(x) = (1/n)∑_i f_i(x) defined earlier. Instead of assuming that each f_i is strongly convex, we instead need to consider the regularised objective:

$$f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x) + \frac{\mu}{2}\|x\|^2.$$

For any strongly convex f, we may transform our function to this form by replacing each f_i with f_i − (µ/2)||x||², then including a separate regulariser. This changes the Lipschitz smoothness constant for each f_i to L − µ, and preserves convexity. We are now ready to consider the dual transformation.

We apply the technique of dual decomposition, where we decouple the terms in our objective as follows:

$$\min_x f(x) = \min_{x, x_1, \ldots, x_i, \ldots, x_n} \frac{1}{n}\sum_{i=1}^{n} f_i(x_i) + \frac{\mu}{2}\|x\|^2, \quad \text{s.t. } x_i = x \text{ for all } i.$$

This reformulation initially achieves nothing, but the key idea is that we now have a constrained optimisation problem, and so we may apply Lagrangian duality (Section A.3). The Lagrangian function is:

$$L(x, x_1, \ldots, \alpha_1, \ldots) = \frac{1}{n}\sum_{i=1}^{n} f_i(x_i) + \frac{\mu}{2}\|x\|^2 + \frac{1}{n}\sum_{i=1}^{n}\left\langle \alpha_i, x - x_i \right\rangle = \frac{1}{n}\sum_{i=1}^{n}\left( f_i(x_i) - \left\langle \alpha_i, x_i \right\rangle \right) + \frac{\mu}{2}\|x\|^2 + \left\langle \frac{1}{n}\sum_{i=1}^{n}\alpha_i,\ x \right\rangle, \quad (2.1)$$

where α_i ∈ R^d are the introduced dual variables. The Lagrangian dual function is formed by taking the minimum of L with respect to each x_i and x, leaving α, the set of α_i for i = 1, ..., n, free:

$$D(\alpha) = \frac{1}{n}\sum_{i=1}^{n}\min_{x_i}\left\{ f_i(x_i) - \left\langle \alpha_i, x_i \right\rangle \right\} + \min_x\left\{ \frac{\mu}{2}\|x\|^2 + \left\langle \frac{1}{n}\sum_{i=1}^{n}\alpha_i,\ x \right\rangle \right\}. \quad (2.2)$$

Now recall that the definition of the convex conjugate (Section A.2) says that:

$$\min_x\left\{ f(x) - \left\langle \alpha, x \right\rangle \right\} = -\sup_x\left\{ \left\langle \alpha, x \right\rangle - f(x) \right\} = -f^*(\alpha).$$

Clearly we can plug this in for each f_i to get:

$$D(\alpha) = -\frac{1}{n}\sum_{i=1}^{n} f_i^*(\alpha_i) + \min_x\left\{ \frac{\mu}{2}\|x\|^2 + \left\langle \frac{1}{n}\sum_{i=1}^{n}\alpha_i,\ x \right\rangle \right\}.$$

We still need to simplify the remaining min term, which is also in the form of a convex conjugate. We know that squared norms are self-conjugate, and scaling a function by a positive constant b transforms its conjugate from f*(α) to b f*(α/b), so we in fact have:

$$D(\alpha) = -\frac{1}{n}\sum_{i=1}^{n} f_i^*(\alpha_i) - \frac{\mu}{2}\left\|\frac{1}{\mu n}\sum_{i=1}^{n}\alpha_i\right\|^2.$$
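As a concrete instance of the conjugates f_i* appearing in D(α), consider the scalar squared loss (a worked example of ours, not from the thesis):

$$\phi(z) = \frac{1}{2}(z - b)^2 \quad\Longrightarrow\quad \phi^*(\alpha) = \sup_z\left\{\alpha z - \frac{1}{2}(z - b)^2\right\} = \frac{1}{2}\alpha^2 + b\alpha,$$

with the supremum attained at z = b + α. For least-squares losses the dual D(α) is therefore a concave quadratic, which is why SDCA steps have closed forms in that case.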

This is the objective directly maximised by SDCA. As the name implies, SDCA is randomised (block) coordinate ascent on this objective, where only one α_i is changed each step. In coordinate descent we have the option of performing a gradient step in a coordinate direction, or an exact minimisation. For the exact coordinate minimisation, the update is easy to derive:

$$\alpha_j^{k+1} = \arg\min_{\alpha_j}\left[\frac{1}{n}\sum_{i=1}^{n} f_i^*(\alpha_i) + \frac{\mu}{2}\left\|\frac{1}{\mu n}\sum_{i=1}^{n}\alpha_i\right\|^2\right] = \arg\min_{\alpha_j}\left[f_j^*(\alpha_j) + \frac{\mu n}{2}\left\|\frac{1}{\mu n}\sum_{i=1}^{n}\alpha_i\right\|^2\right]. \quad (2.3)$$

The primal point x^k corresponding to the dual variables α^k at step k is the minimiser of the conjugate problem x^k = argmin_x { (µ/2)||x||² + ⟨(1/n)∑_i α_i^k, x⟩ }, which in closed form is simply

$$x^k = -\frac{1}{\mu n}\sum_{i=1}^{n}\alpha_i^k.$$

This can be used to further simplify Equation 2.3. The full method is given as Algorithm 2.1.

Algorithm 2.1 SDCA (exact coordinate descent)
  Initialise x^0 and each α_i^0 as the zero vector.
  Step k + 1:
    1. Pick an index j uniformly at random.
    2. Update α_j, leaving the other α_i unchanged:
       α_j^{k+1} = argmin_y { f_j*(y) + (µn/2) || x^k − (1/µn)(y − α_j^k) ||² }.
    3. Update x^{k+1} = x^k − (1/µn)(α_j^{k+1} − α_j^k).
  At completion, for smooth f_i return x^k. For non-smooth f_i, return a tail average of the x^k sequence.

The SDCA method has a simple geometric convergence rate in the dual objective D of the form:

$$E\left[D(\alpha^*) - D(\alpha^k)\right] \le \left(1 - \frac{\mu}{L + \mu n}\right)^{k}\left[D(\alpha^*) - D(\alpha^0)\right].$$

This is easily extended to a statement about the duality gap f(x^k) − D(α^k), and hence the suboptimality f(x^k) − f(x*), by using the relation:

$$f(x^k) - D(\alpha^k) \le \frac{L + \mu n}{\mu n}\left[D(\alpha^*) - D(\alpha^k)\right].$$

2.3.1 Alternative steps

The full coordinate minimisation step discussed in the previous section is not always practical. If we are treating each element f_i in the summation as a single data point loss, then even for the simple binary logistic loss there is not a closed form solution for the exact coordinate step. We can use a black-box 1D optimisation method to find the coordinate minimiser, but this will generally require exponential function evaluations, together with one vector dot product. For the multiclass logistic loss, the subproblem solve is not fast enough to yield a usable algorithm.

In the case of non-differentiable losses, the situation is better. Most non-differentiable functions we use in machine learning, such as the hinge loss, yield closed form solutions. For performance reasons we often want to treat each f_i as a minibatch loss, in which case we virtually never have a closed form solution for the subproblem, even in the non-differentiable case.

Shalev-Shwartz and Zhang (2013a) describe a number of other possible steps which lead to the same theoretical convergence rate as the exact minimisation step, but which are more usable in practice:

Interval line search: It turns out that it is sufficient to perform the minimisation in Equation 2.3 along the interval between the current dual variable α_j^k and the point u = f_j'(x^k). The update takes the form:

$$s^* = \arg\min_{s \in [0,1]}\left[ f_j^*\left(\alpha_j^k + s(u - \alpha_j^k)\right) + \frac{\mu n}{2}\left\| x^k - \frac{s}{\mu n}\left(u - \alpha_j^k\right)\right\|^2\right], \qquad \alpha_j^{k+1} = \alpha_j^k + s^*(u - \alpha_j^k).$$

Constant step: If the value of the Lipschitz smoothness constant L is known, we can calculate a conservative value for the parameter s instead of optimising over it with an interval line search. This gives an update of the form:

$$\alpha_j^{k+1} = \alpha_j^k + s(u - \alpha_j^k), \quad \text{where} \quad s = \frac{\mu n}{\mu n + L}.$$

This method is much slower in practice than performing a line-search, just as a 1/L step size with gradient descent is much slower than performing a line search.
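The constant step variant is simple enough to sketch without evaluating any conjugate function, since it only needs u = f_j'(x^k). The sketch below reuses the logistic oracle from Section 2.1 and keeps the (µ/2)||x||² regulariser implicit through the primal correspondence x = −(1/µn)∑_i α_i; it is our illustration under those assumptions, not reference code:

    import numpy as np

    def sdca_constant_step(f_j_grad, n, d, mu, L, epochs=50, seed=0):
        """SDCA with the conservative constant step s = mu*n / (mu*n + L)."""
        rng = np.random.default_rng(seed)
        alpha = np.zeros((n, d))            # dual variables, one per term
        x = np.zeros(d)                     # x = -(1 / (mu * n)) * alpha.sum(0)
        s = mu * n / (mu * n + L)
        for _ in range(epochs * n):
            j = int(rng.integers(n))
            u = f_j_grad(x, j)              # u = f_j'(x^k)
            delta = s * (u - alpha[j])
            alpha[j] += delta
            x -= delta / (mu * n)           # keep x consistent with alpha
        return x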

2.3.2 Reducing storage requirements

We have presented the SDCA algorithm in full generality above. This results in n dual variables of dimension d, for which the total storage nd can be prohibitive. In practice, the dual variables often lie on a low-dimensional subspace. This is the case with linear classifiers and regressors, where an r class problem has gradients on an r dimensional subspace. A linear classifier takes the form f_i(x) = φ_i(X_i^T x), for a fixed loss φ_i : R^r → R and data instance matrix X_i of size d × r. In the simplest case X_i is just the data point duplicated r times. Then the dual variables are r dimensional, and the x^k updates change to:

$$x^k = -\frac{1}{\mu n}\sum_{i=1}^{n} X_i \alpha_i, \qquad \alpha_j^{k+1} = \arg\min_{\alpha}\left[\phi_j^*(\alpha) + \frac{\mu n}{2}\left\| x^k - \frac{1}{\mu n}X_j\left(\alpha - \alpha_j^k\right)\right\|^2\right].$$

This is the form of SDCA presented by Shalev-Shwartz and Zhang (2013a), although with the negation of our dual variables.

2.3.3 Accelerated SDCA

The SDCA method is also currently the only fast incremental gradient method to have a known accelerated variant. By acceleration, we refer to the modification of an optimisation method to improve the convergence rate by an amount greater than any constant factor. This terminology is common in optimisation although a precise definition is not normally given.

The ASDCA method (Shalev-Shwartz and Zhang, 2013a) works by utilising the regular SDCA method as a sub-procedure. It has an outer loop, which at each step invokes SDCA on a modified problem

$$x^{k+1} = \arg\min_x\left[ f(x) + \frac{\lambda}{2}\|x - y\|^2\right],$$

where y is chosen as an over-relaxed step of the form:

$$y = x^k + \beta(x^k - x^{k-1}),$$

for some known constant β. The constant λ is likewise computed from the Lipschitz smoothness and strong convexity constants. These regularised sub-problems f(x) + (λ/2)||x − y||² have a greater degree of strong convexity than f(x), and so individually are much faster to solve.

By a careful choice of the accuracy to which they are computed, the total number of steps made between all the subproblem solves is much smaller than would be required if regular SDCA is applied directly to f(x) to reach the same accuracy. In particular, they state that to reach an accuracy of ε in expectation for the function value, they need k iterations, where:

$$k = \tilde{O}\left(\left(nd + \min\left\{ d\sqrt{\frac{nL}{\mu}},\ \frac{dL}{\mu}\right\}\right)\log(1/\varepsilon)\right).$$

The Õ notation hides constant factors. This rate is not of the same precise form as the other convergence rates we will discuss in this chapter. We can make some general statements though. When n is in the range of the big-data condition, this rate is no better than regular SDCA's rate, and probably worse in practice due to overheads hidden by the Õ notation. When n is much smaller than L/µ, then potentially it can be much faster than regular SDCA.

Unfortunately, the ASDCA procedure has significant computational overheads that make it not necessarily the best choice in practice. Probably the biggest issue however is a sensitivity to the Lipschitz smoothness and strong convexity constants. It assumes these are known, and if the used values differ from the true values, it may be significantly slower than regular SDCA. In contrast, regular SDCA requires no knowledge of the Lipschitz smoothness constants (for the prox variant at least), just the strong convexity (regularisation) constant.

2.4 Stochastic Average Gradient (SAG)

The SAG algorithm (Schmidt et al., 2013) is the closest in form to the classical SGD algorithm among the fast incremental gradient methods. Instead of storing dual variables α_i like SDCA above, we store a table of past gradients y_i, which has the same storage cost in general, nd. The SAG method is given in Algorithm 2.2. The key equation for SAG is the step:

$$x^{k+1} = x^k - \frac{\gamma}{n}\sum_{i=1}^{n} y_i^k.$$

Essentially we move in the direction of the average of the past gradients. Note that this average contains one past gradient for each term, and they are equally weighted. This can be contrasted to the SGD method with momentum, which uses a geometrically decaying weighted sum of all past gradient evaluations. SGD with momentum however is not a linearly convergent method. It is surprising that using equal weights like this actually yields a much faster converging algorithm, even though some of the gradients in the summation can be extremely out of date.

Algorithm 2.2 SAG
  Initialise x^0 as the zero vector, and y_i = f_i'(x^0) for each i.
  Step k + 1:
    1. Pick an index j uniformly at random.
    2. Update x using step length constant γ:
       x^{k+1} = x^k − (γ/n)∑_i y_i^k.
    3. Set y_j^{k+1} = f_j'(x^{k+1}). Leave y_i^{k+1} = y_i^k for i ≠ j.

SAG is an evolution of the earlier incremental averaged gradient method (IAG, Blatt et al., 2007), which has the same update with a different constant factor, and with cyclic access used instead of randomised. It has a more limited convergence theory covering quadratic or bounded gradient problems, and a much slower rate of convergence.

The convergence rate of SAG for strongly convex problems is of the same order as SDCA, although the constants are not quite as good. In particular, we have an expected convergence rate in terms of function value suboptimality of:

$$E[f(x^k) - f(x^*)] \le \left(1 - \min\left\{\frac{1}{8n}, \frac{\mu}{16L}\right\}\right)^{k} L_0,$$

where L_0 is a complex expression involving f(x^0 + (γ/n)∑_i y_i^0) and a quadratic form of x^0 and each y_i^0. This theoretical convergence rate is between 8 and 16 times worse than SDCA. In practice SAG is often faster than SDCA though, suggesting that the SAG theory is not tight. A nice feature of SAG is that unlike SDCA, it can be directly applied to non-strongly convex problems. Differentiability is still required though. The convergence rate is then in terms of the average iterate x̄^k = (1/k)∑_{l=1}^{k} x^l:

$$E[f(\bar{x}^k) - f(x^*)] \le \frac{32n}{k} L_0.$$

The SAG algorithm has great practical performance, but it is surprisingly difficult to analyse theoretically. The above rates are likely conservative by a factor of between 4 and 8. Due to the difficulty of analysis, the proximal version for composite losses has not yet had its theoretical convergence established.
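A direct transcription of Algorithm 2.2 (our sketch, again assuming the logistic oracle from Section 2.1; note the initialisation pass over all n terms and the O(d) update of the running average):

    import numpy as np

    def sag(f_j_grad, n, d, gamma, epochs=50, seed=0):
        """SAG: step along the average of the table of past per-term gradients."""
        rng = np.random.default_rng(seed)
        x = np.zeros(d)
        y = np.array([f_j_grad(x, j) for j in range(n)])   # table of past gradients
        y_avg = y.mean(axis=0)
        for _ in range(epochs * n):
            j = int(rng.integers(n))
            x = x - gamma * y_avg                          # step 2
            g_new = f_j_grad(x, j)                         # step 3: refresh y_j
            y_avg += (g_new - y[j]) / n                    # maintain average in O(d)
            y[j] = g_new
        return x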

2.5 Stochastic Variance Reduced Gradient (SVRG)

The SVRG method (Johnson and Zhang, 2013) is a recently developed fast incremental gradient method. It was developed to address the potentially high storage costs of SDCA and SAG, by trading off storage against computation. The SVRG method is given in Algorithm 2.3.

Algorithm 2.3 SVRG
  Initialise x^0 as the zero vector, g^0 = (1/n)∑_i f_i'(x^0) and x̃^0 = x^0.
  Step k + 1:
    1. Pick j uniformly at random.
    2. Update x:
       x^{k+1} = x^k − η( f_j'(x^k) − f_j'(x̃^k) + g^k ).
    3. Every m iterations, set x̃ and recalculate the full gradient at that point:
       x̃^{k+1} = x^{k+1},  g^{k+1} = (1/n)∑_i f_i'(x̃^{k+1}).
       Otherwise leave x̃^{k+1} = x̃^k and g^{k+1} = g^k.
  At completion return x̃.

Unlike the other methods discussed, there is a tunable parameter m, which specifies the number of iterations to complete before the current gradient approximation is recalibrated by computing a full gradient f'(x̃) at the last iterate before the recalibration, x̃ := x^k. Essentially, instead of maintaining a table of past gradients y_i for each i like SAG does, the algorithm just stores the location x̃ at which those gradients should be evaluated, then re-evaluates them when needed by just computing f_j'(x̃). Like the SAG algorithm, at each step we need to know the updated term gradient f_j'(x^k), the old term gradient f_j'(x̃) and the average of the old gradients (1/n)∑_i f_i'(x̃). Since we are not storing the old term gradients, just their average, we need to calculate two term gradients instead of the one term gradient calculated by SAG at each step.

The S2GD method (Konečný and Richtárik, 2013) was concurrently developed with SVRG. It has the same update as SVRG, differing just in the theoretical choice of x̃ discussed in the next paragraph. We use SVRG henceforth to refer to both methods.

The update x̃^{k+1} = x^{k+1} in step 3 above is technically not supported by the theory. Instead, one of the following two updates is used:

1. x̃ is the average of the x values from the last m iterations. This is the variant suggested by Johnson and Zhang (2013).

2. x̃ is a randomly sampled x from the last m iterations. This is used in the S2GD variant (Konečný and Richtárik, 2013).

These alternative updates are required theoretically, as the convergence between recalibrations is expressed in terms of the sum of function values of the last m points, ∑_{r=k−m}^{k} [f(x^r) − f(x*)], instead of in terms of f(x^k) − f(x*) directly. Variant 1 avoids this issue by using Jensen's inequality to pull the summation inside:

$$\frac{1}{m}\sum_{r=k-m}^{k}\left[f(x^r) - f(x^*)\right] \ge f\left(\frac{1}{m}\sum_{r=k-m}^{k} x^r\right) - f(x^*).$$

Variant 2 uses a sampled x̃, which in expectation will also have the required value. In practice, there is a very high probability that f(x^k) − f(x*) is less than the last-m sum, so just taking x̃ = x^k works.

The SVRG method has the following convergence rate if k is a multiple of m:

$$E[f(\tilde{x}^k) - f(x^*)] \le \rho^{k/m}\left[f(\tilde{x}^0) - f(x^*)\right], \quad \text{where} \quad \rho = \frac{1}{\mu\eta(1 - 4L\eta)m} + \frac{4L\eta(m + 1)}{(1 - 4L\eta)m}.$$

Note also that each step requires two term gradients, so the rate must be halved when comparing against the other methods described in this chapter. There is also the cost of the recalibration pass, which (depending on m) can further increase the run time to three times that of SAG per step. This convergence rate has quite a different form from that of the other methods considered in this section, making direct comparison difficult. However, for most parameter values this theoretical rate is worse than that of the other fast incremental gradient methods. In Section 4.7 we give an analysis of SVRG that requires additional assumptions, but gives a rate that is directly comparable to the other fast incremental gradient methods.
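A sketch of the practical variant that takes x̃ = x^{k+1} at each recalibration (our illustration; as discussed above, the theory instead supports the averaged or sampled choice of x̃):

    import numpy as np

    def svrg(f_j_grad, n, d, eta, m, epochs=50, seed=0):
        """SVRG: variance-reduced steps around a periodically refreshed snapshot.
        Only the snapshot x_tilde and its full gradient g are stored; each inner
        step evaluates two term gradients, as noted in the text."""
        rng = np.random.default_rng(seed)
        x = np.zeros(d)
        x_tilde = x.copy()
        g = np.mean([f_j_grad(x_tilde, j) for j in range(n)], axis=0)
        for k in range(1, epochs * n + 1):
            j = int(rng.integers(n))
            x = x - eta * (f_j_grad(x, j) - f_j_grad(x_tilde, j) + g)
            if k % m == 0:                   # recalibration pass
                x_tilde = x.copy()
                g = np.mean([f_j_grad(x_tilde, j) for j in range(n)], axis=0)
        return x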


Chapter 3

New Dual Incremental Gradient Methods

In this chapter we introduce a novel fast incremental gradient method for strongly convex problems that we call Finito. Like SDCA, SVRG and SAG, Finito is a stochastic method that is able to achieve linear convergence rates for strongly convex problems. Although the Finito algorithm only uses primal quantities directly, the proof of its convergence rate uses lower bounds extensively, so it can be considered a dual method, like SDCA. Similar to SDCA, its theory does not support its use on non-strongly convex problems, although there are no practical issues with its application.

In Section 3.7 we prove the convergence rate of the Finito method under the big-data condition described in the previous chapter. This theoretical rate is better than the SAG and SVRG rates but not quite as good as the SDCA rate. In Section 3.3 we compare Finito empirically against SAG and SDCA, showing that it converges faster, particularly if the permuted access order is used.

The relationship between Finito and SDCA allows a kind of midpoint algorithm to be constructed, which has favourable properties of both methods. We call this midpoint Prox-Finito. It is described in Section 3.6. An earlier version of the work in this chapter has been published as Defazio et al. (2014b).

3.1 The Finito Algorithm

As discussed in Chapter 2, we are interested in convex functions of the form

$$f(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w).$$


More information

Simple Linear Regression

Simple Linear Regression Statstcal Methods I (EST 75) Page 139 Smple Lear Regresso Smple regresso applcatos are used to ft a model descrbg a lear relatoshp betwee two varables. The aspects of least squares regresso ad correlato

More information

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections ENGI 441 Jot Probablty Dstrbutos Page 7-01 Jot Probablty Dstrbutos [Navd sectos.5 ad.6; Devore sectos 5.1-5.] The jot probablty mass fucto of two dscrete radom quattes, s, P ad p x y x y The margal probablty

More information

PTAS for Bin-Packing

PTAS for Bin-Packing CS 663: Patter Matchg Algorthms Scrbe: Che Jag /9/00. Itroducto PTAS for B-Packg The B-Packg problem s NP-hard. If we use approxmato algorthms, the B-Packg problem could be solved polyomal tme. For example,

More information

CHAPTER 4 RADICAL EXPRESSIONS

CHAPTER 4 RADICAL EXPRESSIONS 6 CHAPTER RADICAL EXPRESSIONS. The th Root of a Real Number A real umber a s called the th root of a real umber b f Thus, for example: s a square root of sce. s also a square root of sce ( ). s a cube

More information

Investigating Cellular Automata

Investigating Cellular Automata Researcher: Taylor Dupuy Advsor: Aaro Wootto Semester: Fall 4 Ivestgatg Cellular Automata A Overvew of Cellular Automata: Cellular Automata are smple computer programs that geerate rows of black ad whte

More information

UNIT 2 SOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS

UNIT 2 SOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS Numercal Computg -I UNIT SOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS Structure Page Nos..0 Itroducto 6. Objectves 7. Ital Approxmato to a Root 7. Bsecto Method 8.. Error Aalyss 9.4 Regula Fals Method

More information

PGE 310: Formulation and Solution in Geosystems Engineering. Dr. Balhoff. Interpolation

PGE 310: Formulation and Solution in Geosystems Engineering. Dr. Balhoff. Interpolation PGE 30: Formulato ad Soluto Geosystems Egeerg Dr. Balhoff Iterpolato Numercal Methods wth MATLAB, Recktewald, Chapter 0 ad Numercal Methods for Egeers, Chapra ad Caale, 5 th Ed., Part Fve, Chapter 8 ad

More information

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity ECONOMETRIC THEORY MODULE VIII Lecture - 6 Heteroskedastcty Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur . Breusch Paga test Ths test ca be appled whe the replcated data

More information

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements Aoucemets No-Parametrc Desty Estmato Techques HW assged Most of ths lecture was o the blacboard. These sldes cover the same materal as preseted DHS Bometrcs CSE 90-a Lecture 7 CSE90a Fall 06 CSE90a Fall

More information

Special Instructions / Useful Data

Special Instructions / Useful Data JAM 6 Set of all real umbers P A..d. B, p Posso Specal Istructos / Useful Data x,, :,,, x x Probablty of a evet A Idepedetly ad detcally dstrbuted Bomal dstrbuto wth parameters ad p Posso dstrbuto wth

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematcs of Mache Learg Lecturer: Phlppe Rgollet Lecture 3 Scrbe: James Hrst Sep. 6, 205.5 Learg wth a fte dctoary Recall from the ed of last lecture our setup: We are workg wth a fte dctoary

More information

L5 Polynomial / Spline Curves

L5 Polynomial / Spline Curves L5 Polyomal / Sple Curves Cotets Coc sectos Polyomal Curves Hermte Curves Bezer Curves B-Sples No-Uform Ratoal B-Sples (NURBS) Mapulato ad Represetato of Curves Types of Curve Equatos Implct: Descrbe a

More information

Taylor s Series and Interpolation. Interpolation & Curve-fitting. CIS Interpolation. Basic Scenario. Taylor Series interpolates at a specific

Taylor s Series and Interpolation. Interpolation & Curve-fitting. CIS Interpolation. Basic Scenario. Taylor Series interpolates at a specific CIS 54 - Iterpolato Roger Crawfs Basc Scearo We are able to prod some fucto, but do ot kow what t really s. Ths gves us a lst of data pots: [x,f ] f(x) f f + x x + August 2, 25 OSU/CIS 54 3 Taylor s Seres

More information

CIS 800/002 The Algorithmic Foundations of Data Privacy October 13, Lecture 9. Database Update Algorithms: Multiplicative Weights

CIS 800/002 The Algorithmic Foundations of Data Privacy October 13, Lecture 9. Database Update Algorithms: Multiplicative Weights CIS 800/002 The Algorthmc Foudatos of Data Prvacy October 13, 2011 Lecturer: Aaro Roth Lecture 9 Scrbe: Aaro Roth Database Update Algorthms: Multplcatve Weghts We ll recall aga) some deftos from last tme:

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON430 Statstcs Date of exam: Frday, December 8, 07 Grades are gve: Jauary 4, 08 Tme for exam: 0900 am 00 oo The problem set covers 5 pages Resources allowed:

More information

Eulerian numbers revisited : Slices of hypercube

Eulerian numbers revisited : Slices of hypercube Eulera umbers revsted : Slces of hypercube Kgo Kobayash, Hajme Sato, Mamoru Hosh, ad Hroyosh Morta Abstract I ths talk, we provde a smple proof o a terestg equalty coectg the umber of permutatos of,...,

More information

22 Nonparametric Methods.

22 Nonparametric Methods. 22 oparametrc Methods. I parametrc models oe assumes apror that the dstrbutos have a specfc form wth oe or more ukow parameters ad oe tres to fd the best or atleast reasoably effcet procedures that aswer

More information

Statistics MINITAB - Lab 5

Statistics MINITAB - Lab 5 Statstcs 10010 MINITAB - Lab 5 PART I: The Correlato Coeffcet Qute ofte statstcs we are preseted wth data that suggests that a lear relatoshp exsts betwee two varables. For example the plot below s of

More information

X ε ) = 0, or equivalently, lim

X ε ) = 0, or equivalently, lim Revew for the prevous lecture Cocepts: order statstcs Theorems: Dstrbutos of order statstcs Examples: How to get the dstrbuto of order statstcs Chapter 5 Propertes of a Radom Sample Secto 55 Covergece

More information

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution:

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution: Chapter 4 Exercses Samplg Theory Exercse (Smple radom samplg: Let there be two correlated radom varables X ad A sample of sze s draw from a populato by smple radom samplg wthout replacemet The observed

More information

ENGI 3423 Simple Linear Regression Page 12-01

ENGI 3423 Simple Linear Regression Page 12-01 ENGI 343 mple Lear Regresso Page - mple Lear Regresso ometmes a expermet s set up where the expermeter has cotrol over the values of oe or more varables X ad measures the resultg values of aother varable

More information

This lecture and the next. Why Sorting? Sorting Algorithms so far. Why Sorting? (2) Selection Sort. Heap Sort. Heapsort

This lecture and the next. Why Sorting? Sorting Algorithms so far. Why Sorting? (2) Selection Sort. Heap Sort. Heapsort Ths lecture ad the ext Heapsort Heap data structure ad prorty queue ADT Qucksort a popular algorthm, very fast o average Why Sortg? Whe doubt, sort oe of the prcples of algorthm desg. Sortg used as a subroute

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ  1 STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Recall Assumpto E(Y x) η 0 + η x (lear codtoal mea fucto) Data (x, y ), (x 2, y 2 ),, (x, y ) Least squares estmator ˆ E (Y x) ˆ " 0 + ˆ " x, where ˆ

More information

Logistic regression (continued)

Logistic regression (continued) STAT562 page 138 Logstc regresso (cotued) Suppose we ow cosder more complex models to descrbe the relatoshp betwee a categorcal respose varable (Y) that takes o two (2) possble outcomes ad a set of p explaatory

More information

Regression and the LMS Algorithm

Regression and the LMS Algorithm CSE 556: Itroducto to Neural Netorks Regresso ad the LMS Algorthm CSE 556: Regresso 1 Problem statemet CSE 556: Regresso Lear regresso th oe varable Gve a set of N pars of data {, d }, appromate d b a

More information

Investigation of Partially Conditional RP Model with Response Error. Ed Stanek

Investigation of Partially Conditional RP Model with Response Error. Ed Stanek Partally Codtoal Radom Permutato Model 7- vestgato of Partally Codtoal RP Model wth Respose Error TRODUCTO Ed Staek We explore the predctor that wll result a smple radom sample wth respose error whe a

More information

C-1: Aerodynamics of Airfoils 1 C-2: Aerodynamics of Airfoils 2 C-3: Panel Methods C-4: Thin Airfoil Theory

C-1: Aerodynamics of Airfoils 1 C-2: Aerodynamics of Airfoils 2 C-3: Panel Methods C-4: Thin Airfoil Theory ROAD MAP... AE301 Aerodyamcs I UNIT C: 2-D Arfols C-1: Aerodyamcs of Arfols 1 C-2: Aerodyamcs of Arfols 2 C-3: Pael Methods C-4: Th Arfol Theory AE301 Aerodyamcs I Ut C-3: Lst of Subects Problem Solutos?

More information

MATH 247/Winter Notes on the adjoint and on normal operators.

MATH 247/Winter Notes on the adjoint and on normal operators. MATH 47/Wter 00 Notes o the adjot ad o ormal operators I these otes, V s a fte dmesoal er product space over, wth gve er * product uv, T, S, T, are lear operators o V U, W are subspaces of V Whe we say

More information

Lecture 3 Probability review (cont d)

Lecture 3 Probability review (cont d) STATS 00: Itroducto to Statstcal Iferece Autum 06 Lecture 3 Probablty revew (cot d) 3. Jot dstrbutos If radom varables X,..., X k are depedet, the ther dstrbuto may be specfed by specfyg the dvdual dstrbuto

More information

A New Family of Transformations for Lifetime Data

A New Family of Transformations for Lifetime Data Proceedgs of the World Cogress o Egeerg 4 Vol I, WCE 4, July - 4, 4, Lodo, U.K. A New Famly of Trasformatos for Lfetme Data Lakhaa Watthaacheewakul Abstract A famly of trasformatos s the oe of several

More information

Algorithms Design & Analysis. Hash Tables

Algorithms Design & Analysis. Hash Tables Algorthms Desg & Aalyss Hash Tables Recap Lower boud Order statstcs 2 Today s topcs Drect-accessble table Hash tables Hash fuctos Uversal hashg Perfect Hashg Ope addressg 3 Symbol-table problem Symbol

More information

Mean is only appropriate for interval or ratio scales, not ordinal or nominal.

Mean is only appropriate for interval or ratio scales, not ordinal or nominal. Mea Same as ordary average Sum all the data values ad dvde by the sample sze. x = ( x + x +... + x Usg summato otato, we wrte ths as x = x = x = = ) x Mea s oly approprate for terval or rato scales, ot

More information

1 Convergence of the Arnoldi method for eigenvalue problems

1 Convergence of the Arnoldi method for eigenvalue problems Lecture otes umercal lear algebra Arold method covergece Covergece of the Arold method for egevalue problems Recall that, uless t breaks dow, k steps of the Arold method geerates a orthogoal bass of a

More information

Cubic Nonpolynomial Spline Approach to the Solution of a Second Order Two-Point Boundary Value Problem

Cubic Nonpolynomial Spline Approach to the Solution of a Second Order Two-Point Boundary Value Problem Joural of Amerca Scece ;6( Cubc Nopolyomal Sple Approach to the Soluto of a Secod Order Two-Pot Boudary Value Problem W.K. Zahra, F.A. Abd El-Salam, A.A. El-Sabbagh ad Z.A. ZAk * Departmet of Egeerg athematcs

More information

Lecture 2 - What are component and system reliability and how it can be improved?

Lecture 2 - What are component and system reliability and how it can be improved? Lecture 2 - What are compoet ad system relablty ad how t ca be mproved? Relablty s a measure of the qualty of the product over the log ru. The cocept of relablty s a exteded tme perod over whch the expected

More information

Analysis of Lagrange Interpolation Formula

Analysis of Lagrange Interpolation Formula P IJISET - Iteratoal Joural of Iovatve Scece, Egeerg & Techology, Vol. Issue, December 4. www.jset.com ISS 348 7968 Aalyss of Lagrage Iterpolato Formula Vjay Dahya PDepartmet of MathematcsMaharaja Surajmal

More information

For combinatorial problems we might need to generate all permutations, combinations, or subsets of a set.

For combinatorial problems we might need to generate all permutations, combinations, or subsets of a set. Addtoal Decrease ad Coquer Algorthms For combatoral problems we mght eed to geerate all permutatos, combatos, or subsets of a set. Geeratg Permutatos If we have a set f elemets: { a 1, a 2, a 3, a } the

More information

Module 7: Probability and Statistics

Module 7: Probability and Statistics Lecture 4: Goodess of ft tests. Itroducto Module 7: Probablty ad Statstcs I the prevous two lectures, the cocepts, steps ad applcatos of Hypotheses testg were dscussed. Hypotheses testg may be used to

More information

9 U-STATISTICS. Eh =(m!) 1 Eh(X (1),..., X (m ) ) i.i.d

9 U-STATISTICS. Eh =(m!) 1 Eh(X (1),..., X (m ) ) i.i.d 9 U-STATISTICS Suppose,,..., are P P..d. wth CDF F. Our goal s to estmate the expectato t (P)=Eh(,,..., m ). Note that ths expectato requres more tha oe cotrast to E, E, or Eh( ). Oe example s E or P((,

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Mache Learg Problem set Due Frday, September 9, rectato Please address all questos ad commets about ths problem set to 6.867-staff@a.mt.edu. You do ot eed to use MATLAB for ths problem set though

More information

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions. Ordary Least Squares egresso. Smple egresso. Algebra ad Assumptos. I ths part of the course we are gog to study a techque for aalysg the lear relatoshp betwee two varables Y ad X. We have pars of observatos

More information

Objectives of Multiple Regression

Objectives of Multiple Regression Obectves of Multple Regresso Establsh the lear equato that best predcts values of a depedet varable Y usg more tha oe eplaator varable from a large set of potetal predctors {,,... k }. Fd that subset of

More information

Lecture Notes Types of economic variables

Lecture Notes Types of economic variables Lecture Notes 3 1. Types of ecoomc varables () Cotuous varable takes o a cotuum the sample space, such as all pots o a le or all real umbers Example: GDP, Polluto cocetrato, etc. () Dscrete varables fte

More information

Comparison of Dual to Ratio-Cum-Product Estimators of Population Mean

Comparison of Dual to Ratio-Cum-Product Estimators of Population Mean Research Joural of Mathematcal ad Statstcal Sceces ISS 30 6047 Vol. 1(), 5-1, ovember (013) Res. J. Mathematcal ad Statstcal Sc. Comparso of Dual to Rato-Cum-Product Estmators of Populato Mea Abstract

More information

Strong Convergence of Weighted Averaged Approximants of Asymptotically Nonexpansive Mappings in Banach Spaces without Uniform Convexity

Strong Convergence of Weighted Averaged Approximants of Asymptotically Nonexpansive Mappings in Banach Spaces without Uniform Convexity BULLETIN of the MALAYSIAN MATHEMATICAL SCIENCES SOCIETY Bull. Malays. Math. Sc. Soc. () 7 (004), 5 35 Strog Covergece of Weghted Averaged Appromats of Asymptotcally Noepasve Mappgs Baach Spaces wthout

More information

EECE 301 Signals & Systems

EECE 301 Signals & Systems EECE 01 Sgals & Systems Prof. Mark Fowler Note Set #9 Computg D-T Covoluto Readg Assgmet: Secto. of Kame ad Heck 1/ Course Flow Dagram The arrows here show coceptual flow betwee deas. Note the parallel

More information

Median as a Weighted Arithmetic Mean of All Sample Observations

Median as a Weighted Arithmetic Mean of All Sample Observations Meda as a Weghted Arthmetc Mea of All Sample Observatos SK Mshra Dept. of Ecoomcs NEHU, Shllog (Ida). Itroducto: Iumerably may textbooks Statstcs explctly meto that oe of the weakesses (or propertes) of

More information

NP!= P. By Liu Ran. Table of Contents. The P versus NP problem is a major unsolved problem in computer

NP!= P. By Liu Ran. Table of Contents. The P versus NP problem is a major unsolved problem in computer NP!= P By Lu Ra Table of Cotets. Itroduce 2. Prelmary theorem 3. Proof 4. Expla 5. Cocluso. Itroduce The P versus NP problem s a major usolved problem computer scece. Iformally, t asks whether a computer

More information

The Selection Problem - Variable Size Decrease/Conquer (Practice with algorithm analysis)

The Selection Problem - Variable Size Decrease/Conquer (Practice with algorithm analysis) We have covered: Selecto, Iserto, Mergesort, Bubblesort, Heapsort Next: Selecto the Qucksort The Selecto Problem - Varable Sze Decrease/Coquer (Practce wth algorthm aalyss) Cosder the problem of fdg the

More information

1 Review and Overview

1 Review and Overview CS9T/STATS3: Statstcal Learg Teory Lecturer: Tegyu Ma Lecture #7 Scrbe: Bra Zag October 5, 08 Revew ad Overvew We wll frst gve a bref revew of wat as bee covered so far I te frst few lectures, we stated

More information

NP!= P. By Liu Ran. Table of Contents. The P vs. NP problem is a major unsolved problem in computer

NP!= P. By Liu Ran. Table of Contents. The P vs. NP problem is a major unsolved problem in computer NP!= P By Lu Ra Table of Cotets. Itroduce 2. Strategy 3. Prelmary theorem 4. Proof 5. Expla 6. Cocluso. Itroduce The P vs. NP problem s a major usolved problem computer scece. Iformally, t asks whether

More information

The Occupancy and Coupon Collector problems

The Occupancy and Coupon Collector problems Chapter 4 The Occupacy ad Coupo Collector problems By Sarel Har-Peled, Jauary 9, 08 4 Prelmares [ Defto 4 Varace ad Stadard Devato For a radom varable X, let V E [ X [ µ X deote the varace of X, where

More information

Application of Calibration Approach for Regression Coefficient Estimation under Two-stage Sampling Design

Application of Calibration Approach for Regression Coefficient Estimation under Two-stage Sampling Design Authors: Pradp Basak, Kaustav Adtya, Hukum Chadra ad U.C. Sud Applcato of Calbrato Approach for Regresso Coeffcet Estmato uder Two-stage Samplg Desg Pradp Basak, Kaustav Adtya, Hukum Chadra ad U.C. Sud

More information

Generative classification models

Generative classification models CS 75 Mache Learg Lecture Geeratve classfcato models Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square Data: D { d, d,.., d} d, Classfcato represets a dscrete class value Goal: lear f : X Y Bar classfcato

More information

AN UPPER BOUND FOR THE PERMANENT VERSUS DETERMINANT PROBLEM BRUNO GRENET

AN UPPER BOUND FOR THE PERMANENT VERSUS DETERMINANT PROBLEM BRUNO GRENET AN UPPER BOUND FOR THE PERMANENT VERSUS DETERMINANT PROBLEM BRUNO GRENET Abstract. The Permaet versus Determat problem s the followg: Gve a matrx X of determates over a feld of characterstc dfferet from

More information

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades STAT 101 Dr. Kar Lock Morga 11/20/12 Exam 2 Grades Multple Regresso SECTIONS 9.2, 10.1, 10.2 Multple explaatory varables (10.1) Parttog varablty R 2, ANOVA (9.2) Codtos resdual plot (10.2) Trasformatos

More information

Runtime analysis RLS on OneMax. Heuristic Optimization

Runtime analysis RLS on OneMax. Heuristic Optimization Lecture 6 Rutme aalyss RLS o OeMax trals of {,, },, l ( + ɛ) l ( ɛ)( ) l Algorthm Egeerg Group Hasso Platter Isttute, Uversty of Potsdam 9 May T, We wat to rgorously uderstad ths behavor 9 May / Rutme

More information

å 1 13 Practice Final Examination Solutions - = CS109 Dec 5, 2018

å 1 13 Practice Final Examination Solutions - = CS109 Dec 5, 2018 Chrs Pech Fal Practce CS09 Dec 5, 08 Practce Fal Examato Solutos. Aswer: 4/5 8/7. There are multle ways to obta ths aswer; here are two: The frst commo method s to sum over all ossbltes for the rak of

More information

Analysis of System Performance IN2072 Chapter 5 Analysis of Non Markov Systems

Analysis of System Performance IN2072 Chapter 5 Analysis of Non Markov Systems Char for Network Archtectures ad Servces Prof. Carle Departmet of Computer Scece U Müche Aalyss of System Performace IN2072 Chapter 5 Aalyss of No Markov Systems Dr. Alexader Kle Prof. Dr.-Ig. Georg Carle

More information

Lecture 1 Review of Fundamental Statistical Concepts

Lecture 1 Review of Fundamental Statistical Concepts Lecture Revew of Fudametal Statstcal Cocepts Measures of Cetral Tedecy ad Dsperso A word about otato for ths class: Idvduals a populato are desgated, where the dex rages from to N, ad N s the total umber

More information

( ) 2 2. Multi-Layer Refraction Problem Rafael Espericueta, Bakersfield College, November, 2006

( ) 2 2. Multi-Layer Refraction Problem Rafael Espericueta, Bakersfield College, November, 2006 Mult-Layer Refracto Problem Rafael Espercueta, Bakersfeld College, November, 006 Lght travels at dfferet speeds through dfferet meda, but refracts at layer boudares order to traverse the least-tme path.

More information

A Remark on the Uniform Convergence of Some Sequences of Functions

A Remark on the Uniform Convergence of Some Sequences of Functions Advaces Pure Mathematcs 05 5 57-533 Publshed Ole July 05 ScRes. http://www.scrp.org/joural/apm http://dx.do.org/0.436/apm.05.59048 A Remark o the Uform Covergece of Some Sequeces of Fuctos Guy Degla Isttut

More information