Online l1-Dictionary Learning with Application to Novel Document Detection


Online l1-Dictionary Learning with Application to Novel Document Detection

Shiva Prasad Kasiviswanathan (General Electric Global Research, kasivisw@gmail.com), Arindam Banerjee (University of Minnesota, banerjee@cs.umn.edu), Huahua Wang (University of Minnesota, huwang@cs.umn.edu), Prem Melville (IBM T.J. Watson Research Center, pmelvil@us.ibm.com)

Abstract

Given their pervasive use, social media such as Twitter have become a leading source of breaking news. A key task in the automated identification of such news is the detection of novel documents from a voluminous stream of text documents in a robust and scalable manner. Motivated by this challenge, we introduce the problem of online l1-dictionary learning where, unlike traditional dictionary learning which uses squared loss, the l1-penalty is used for measuring the reconstruction error. We present an efficient online algorithm for this problem based on the alternating directions method of multipliers, and establish a sublinear regret bound for this algorithm. Empirical results on news-stream and Twitter data show that this online l1-dictionary learning algorithm for novel document detection gives more than an order of magnitude speedup over the previously known batch algorithm, without any significant loss in quality of results. Our algorithm for online l1-dictionary learning could be of independent interest.

1 Introduction

The high volume and velocity of social media, such as blogs and Twitter, have propelled them to the forefront as sources of breaking news. On Twitter, it is possible to find the latest updates on diverse topics, from natural disasters to celebrity deaths, and identifying such emerging topics has many practical applications, such as in marketing, disease control, and national security [18]. The key challenge in automatic detection of breaking news is being able to detect novel documents in a stream of text, where a document is considered novel if it is unlike documents seen in the past. Recently, this has been made possible by dictionary learning, which has emerged as a powerful data representation framework. In dictionary learning, each data point y is represented as a sparse linear combination Ax of dictionary atoms, where A is the dictionary and x is a sparse vector [1, 16]. A dictionary learning approach can easily be converted into a novel document detection method: let A_t be a dictionary representing all documents seen up to time t; for a new document y arriving at time t+1, if one cannot find a sparse combination x of the dictionary atoms and the best reconstruction A_t x yields a large loss, then y is clearly not well represented by the dictionary A_t and is hence novel compared to the documents seen in the past. At the end of timestep t+1, the dictionary is updated to represent all the documents up to time t+1.

(Part of this work was done while the author was a postdoc at the IBM T.J. Watson Research Center. H. Wang and A. Banerjee were supported in part by an NSF CAREER grant, NSF grants, and a NASA grant.)

Kasiviswanathan et al. [14] presented such a batch dictionary learning approach for detecting novel documents/topics. They used an l1-penalty on the reconstruction error instead of the squared loss commonly used in the dictionary learning literature, as the l1-penalty has been found to be more effective for text analysis (see Section 3). They also showed that this approach outperforms other techniques, such as a nearest-neighbor approach popular in the related area of First Story Detection [20]. We build upon this work by proposing an efficient algorithm for online dictionary learning with an l1-penalty. Our online dictionary learning algorithm is based on the online alternating directions method recently proposed by Wang and Banerjee [26] to solve online composite optimization problems with additional linear equality constraints. Traditional online convex optimization methods such as [32, 11, 7, 8, 29] require explicit computation of the subgradient, making them computationally expensive to apply in our high-volume text setting, whereas in our algorithm the subgradients are computed implicitly. The algorithm has simple closed-form updates for all steps, yielding a fast and scalable algorithm for updating the dictionary. Under suitable assumptions to cope with the non-convexity of the dictionary learning problem, we establish an O(√T) regret bound for the objective, matching the regret bounds of existing methods [32, 7, 8, 29]. Using this online algorithm for l1-dictionary learning, we obtain an online algorithm for novel document detection, which we empirically validate on traditional news-streams as well as streaming data from Twitter. Experimental results show a substantial speedup over the batch l1-dictionary learning based approach of Kasiviswanathan et al. [14], without a loss of performance in detecting novel documents.

Related Work. Online convex optimization is an area of active research; for a detailed survey of the literature we refer the reader to [24]. Online dictionary learning was recently introduced by Mairal et al. [16], who showed that it provides a scalable approach for handling large dynamic datasets. They considered an l2-penalty and showed that their online algorithm converges to the minimum objective value in the stochastic case (i.e., with distributional assumptions on the data). However, the ideas proposed in [16] do not translate to the l1-penalty. The problem of novel document/topic detection was also addressed in recent work by Saha et al. [22], who proposed a non-negative matrix factorization based approach for capturing evolving and novel topics. However, their algorithm operates over a sliding time window (so it does not have online regret guarantees) and works only for the l2-penalty.

2 Preliminaries

Notation. Vectors are always column vectors and are denoted by boldface letters. For a matrix Z, its l1-norm is ‖Z‖_1 = Σ_{i,j} |z_{ij}| and its squared Frobenius norm is ‖Z‖_F^2 = Σ_{i,j} z_{ij}^2. For arbitrary real matrices the standard inner product is defined as ⟨Y, Z⟩ = Tr(Y^T Z). We use Ψ_max(Z) to denote the largest eigenvalue of Z^T Z. For a scalar r ∈ R, let sign(r) = 1 if r > 0, -1 if r < 0, and 0 if r = 0. Define soft(r, T) = sign(r) · max{|r| - T, 0}. The operators sign and soft are extended to a matrix by applying them to every entry of the matrix. 0_{m×n} denotes the all-zeros matrix of size m × n; the subscript is omitted when the dimension of the matrix is clear from the context.
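The sign and soft operators above appear in every ADMM update used later in the paper. As a minimal sketch (ours, not from the paper), the matrix version of soft can be written in NumPy as:

```python
import numpy as np

def soft(R, T):
    """Elementwise soft-thresholding: sign(r) * max(|r| - T, 0)."""
    return np.sign(R) * np.maximum(np.abs(R) - T, 0.0)

# Example:
# soft(np.array([[2.0, -0.3], [0.1, -1.5]]), 0.5)
# -> [[ 1.5,  0. ], [ 0. , -1. ]]
```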
Dictionary Learning Background. Dictionary learning is the problem of estimating a collection of basis vectors over which a given data collection can be accurately reconstructed, often with sparse encodings. It falls into a general category of techniques known as matrix factorization. Classic dictionary learning techniques for sparse representation (see [1, 19, 16] and references therein) consider a finite training set of signals P = [p_1, ..., p_n] ∈ R^{m×n} and optimize an empirical cost function f(A) = Σ_{i=1}^n l(p_i, A), where l(·,·) is a loss function such that l(p_i, A) is small if A is good at representing the signal p_i in a sparse fashion. Here, A ∈ R^{m×k} is referred to as the dictionary. In this paper, we use an l1-loss function with an l1-regularization term, so that

l(p_i, A) = min_x ‖p_i - Ax‖_1 + λ‖x‖_1,

where λ is the regularization parameter. We define the problem of dictionary learning as that of minimizing the empirical cost f(A). In other words, dictionary learning is the following optimization problem:

min_A f(A) = min_{A,X} ‖P - AX‖_1 + λ‖X‖_1,

where X = [x_1, ..., x_n] is the matrix of sparse codes. For maintaining interpretability of the results, we additionally require the A and X matrices to be non-negative.

To prevent A from becoming arbitrarily large (which would lead to arbitrarily small values of X), we add a scaling constraint on A as follows. Let 𝒜 be the convex set of matrices defined as

𝒜 = {A ∈ R^{m×k} : A ≥ 0_{m×k}, ‖A_j‖_1 ≤ 1 for all j = 1, ..., k},

where A_j is the jth column of A. We use Π_𝒜 to denote the Euclidean projection onto the nearest point in the convex set 𝒜. The resulting optimization problem can be written as

min_{A ∈ 𝒜, X ≥ 0} ‖P - AX‖_1 + λ‖X‖_1.    (1)

The optimization problem (1) is in general non-convex. But if one of the variables, either A or X, is known, the objective function with respect to the other variable becomes convex (in fact, it can be transformed into a linear program).

3 Novel Document Detection Using Dictionary Learning

In this section, we describe the problem of novel document detection and explain how dictionary learning can be used to tackle this problem. Our problem setup is similar to [14].

Novel Document Detection Task. We assume documents arrive in streams. Let {P_t : P_t ∈ R^{m_t×n_t}, t = 1, 2, 3, ...} denote a sequence of streaming matrices, where each column of P_t represents a document arriving at time t. Here, P_t is the term-document matrix observed at time t. Each document is represented in some conventional vector space model such as TF-IDF [17]. The time t could be at any granularity, e.g., it could be the day on which the document arrives. We use n_t to denote the number of documents arriving at time t. We normalize P_t so that each column (document) of P_t has unit l1-norm. For simplicity of exposition, we assume m_t = m for all t. (Footnote 1: As new documents come in and new terms are identified, we expand the vocabulary and zero-pad the previous matrices so that at the current time t, all previous and current documents have a representation over the same vocabulary space.) We use the notation P_[t] to denote the term-document matrix obtained by concatenating the matrices P_1, ..., P_t column-wise, i.e., P_[t] = [P_1 P_2 ... P_t]. Let N_t be the number of documents arriving at time ≤ t; then P_[t] ∈ R^{m×N_t}. Under this setup, the goal of novel document detection is to identify documents in P_{t+1} that are dissimilar to the documents in P_[t].

Sparse Coding to Detect Novel Documents. Let A_t ∈ R^{m×k} denote the dictionary matrix after time t, where the dictionary A_t is a good basis for representing all the documents in P_[t]. The exact construction of the dictionary is described later. Now, consider a document y ∈ R^m appearing at time t+1. We say that it admits a sparse representation over A_t if y can be well approximated as a linear combination of a few columns of A_t. Modeling a vector with such a sparse decomposition is known as sparse coding. In most practical situations it may not be possible to represent y exactly as A_t x, e.g., if y contains new words which are absent from A_t. In such cases, one can write y = A_t x + e, where e is an unknown noise vector. We consider the following sparse coding formulation:

l(y, A_t) = min_{x ≥ 0} ‖y - A_t x‖_1 + λ‖x‖_1.    (2)

The formulation (2) naturally takes into account both the reconstruction error (through the ‖y - A_t x‖_1 term) and the complexity of the sparse decomposition (through the ‖x‖_1 term). The reconstruction error measures the quality of the approximation, while the complexity is measured by the l1-norm of the optimal x. It is quite easy to transform (2) into a linear program; hence, it can be solved using a variety of methods. In our experiments, we use the alternating directions method of multipliers (ADMM) [3] to solve (2). ADMM has recently gathered significant attention in the machine learning community due to its wide applicability to a range of learning problems with complex objective functions [3] (see Appendix A for a brief background on ADMM).

We can use sparse coding to detect novel documents as follows. For each document y arriving at time t+1, we first solve (2) to check whether y can be well approximated as a sparse linear combination of the atoms of A_t. If the objective value l(y, A_t) is large, we mark the document as novel; otherwise we mark it as non-novel. Since we have normalized all documents in P_{t+1} to unit l1-length, the objective values are on the same scale.
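As noted above, (2) can be rewritten as a linear program by introducing slack variables u ≥ |y - A_t x| elementwise, giving min_{x ≥ 0, u ≥ 0} 1^T u + λ 1^T x subject to u ≥ y - A_t x and u ≥ A_t x - y. The following sketch (our own illustration, not the paper's implementation; it uses scipy.optimize.linprog and hypothetical function names) shows this reformulation together with the novelty test:

```python
import numpy as np
from scipy.optimize import linprog

def sparse_code_l1(y, A, lam):
    """Solve min_{x>=0} ||y - A x||_1 + lam*||x||_1 as a linear program."""
    m, k = A.shape
    c = np.concatenate([lam * np.ones(k), np.ones(m)])   # objective over z = [x; u]
    A_ub = np.block([[-A, -np.eye(m)],                    # u >= y - A x
                     [ A, -np.eye(m)]])                   # u >= A x - y
    b_ub = np.concatenate([-y, y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (k + m))
    x = res.x[:k]
    objective = np.abs(y - A @ x).sum() + lam * np.abs(x).sum()
    return x, objective

# A document y is marked novel when `objective` exceeds the threshold zeta.
```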
Choice of the Error Function. A very common choice of reconstruction error is the l2-penalty. In fact, in the presence of isotropic Gaussian noise, the l2-penalty on e = y - A_t x gives the maximum likelihood estimate of x [28, 30].

However, for text documents the noise vector e rarely satisfies the Gaussian assumption, as some of its coefficients contain large, impulsive values. For example, in fields such as politics and sports, a certain term may suddenly become dominant in a discussion [14]. In such cases, imposing an l1-penalty on the error is a better choice than imposing an l2-penalty (e.g., recent research [28, 31, 27] has successfully shown the superiority of the l1- over the l2-penalty in the different but related application domain of face recognition). We empirically validate the superiority of the l1-penalty for novel document detection in Section 5.

Size of the Dictionary. Ideally, in our application setting, changing the size of the dictionary (k) dynamically with t would lead to more efficient and effective sparse coding. However, in our theoretical analysis, we make the simplifying assumption that k is a constant independent of t. In our experiments, we allow for small increases in the size of the dictionary over time when required. The problem of designing an adaptive dictionary whose size automatically increases or decreases over time is an interesting open problem.

Batch Algorithm for Novel Document Detection. We now describe a simple batch algorithm (slightly modified from [14]) for detecting novel documents. Algorithm BATCH alternates between a novel document detection step and a batch dictionary learning step.

Algorithm 1: BATCH
Input: P_[t-1] ∈ R^{m×N_{t-1}}, P_t = [p_1, ..., p_{n_t}] ∈ R^{m×n_t}, A_t ∈ R^{m×k}, λ ≥ 0, ζ ≥ 0
Novel Document Detection Step:
  for j = 1 to n_t do
    Solve: x_j = argmin_{x ≥ 0} ‖p_j - A_t x‖_1 + λ‖x‖_1
    if ‖p_j - A_t x_j‖_1 + λ‖x_j‖_1 > ζ, mark p_j as novel
Batch Dictionary Learning Step:
  Set P_[t] ← [P_[t-1] p_1, ..., p_{n_t}]
  Solve: [A_{t+1}, X_[t]] = argmin_{A ∈ 𝒜, X ≥ 0} ‖P_[t] - AX‖_1 + λ‖X‖_1

Batch Dictionary Learning. We now describe the batch dictionary learning step. At time t, the dictionary learning step is

[A_{t+1}, X_[t]] = argmin_{A ∈ 𝒜, X ≥ 0} ‖P_[t] - AX‖_1 + λ‖X‖_1.    (3)

Even though conceptually simple, Algorithm BATCH is computationally inefficient. The bottleneck is the dictionary learning step: as t increases, so does the size of P_[t], and solving (3) becomes prohibitive even with efficient optimization techniques. To achieve computational efficiency, the authors of [14] solved an approximation of (3) in which the dictionary learning step only updates the A's and not the X's (see Footnote 3 below). This leads to faster running times, but because of the approximation the quality of the dictionary degrades over time and the performance of the algorithm decreases. In this paper, we propose an online learning algorithm for (3) and show that this online algorithm is both computationally efficient and generates good-quality dictionaries under reasonable assumptions.

4 Online l1-Dictionary Learning

In this section, we introduce the online l1-dictionary learning problem and propose an efficient algorithm for it. The standard goal of online learning is to design algorithms whose regret is sublinear in time T, since this implies that on average the algorithm performs as well as the best fixed strategy in hindsight [24]. Now consider the l1-dictionary learning problem defined in (3). Since this problem is non-convex, it may not be possible to design efficient (i.e., polynomial running time) algorithms that solve it without making any assumptions on either the dictionary A or the sparse code X.

(Footnote 2: In our algorithms, it is quite straightforward to replace the condition A ∈ 𝒜 by some other condition A ∈ C, where C is a closed non-empty convex set. Footnote 3: In particular, define recursively X̃_[t] = [X̃_[t-1] x_1, ..., x_{n_t}], where the x_j's come from the novel document detection step at time t.
In [14], the dictionary learning step is then A_{t+1} = argmin_{A ∈ 𝒜} ‖P_[t] - A X̃_[t]‖_1.)

This also means that it may not be possible to design an efficient online algorithm with sublinear regret without making any assumptions on either A or X, because an efficient online algorithm with sublinear regret would imply an efficient algorithm for solving (3) in the offline case. Therefore, we focus on obtaining regret bounds for the dictionary update, assuming that at each timestep the sparse codes given to the batch and online algorithms are close. This motivates the following problem.

Definition 4.1 (Online l1-Dictionary Learning Problem). At time t, the online algorithm picks Â_{t+1} ∈ 𝒜. Then nature (the adversary) reveals (P_{t+1}, X̂_{t+1}) with P_{t+1} ∈ R^{m×n} and X̂_{t+1} ∈ R^{k×n}. The problem is to pick the sequence of Â_{t+1}'s such that the following regret function is minimized (Footnote 4: For ease of presentation and analysis, we assume that m and n do not vary with time t. One could allow for changing m and n by carefully adjusting the size of the matrices by zero-padding.):

R(T) = Σ_{t=1}^T ‖P_t - Â_t X̂_t‖_1 - min_{A ∈ 𝒜} Σ_{t=1}^T ‖P_t - A X_t‖_1,

where X̂_t = X_t + E_t and E_t is an error matrix that may depend on t.

The regret defined above admits a discrepancy between the sparse coding matrices supplied to the batch and online algorithms, through the error matrices. The reason for this generality is that in our application setting, the sparse coding matrices used for updating the dictionaries of the batch and online algorithms could be different. We will later establish conditions on the E_t's under which we can achieve sublinear regret.

4.1 Online l1-Dictionary Algorithm

In this section, we design an algorithm for the online l1-dictionary learning problem, which we call Online Inexact ADMM (OIADMM), and bound its regret. (Footnote 5: The name OIADMM reflects the fact that the algorithm is based on the alternating directions method of multipliers (ADMM) procedure; see Appendix A for a brief background on ADMM.) First, note that because of the non-smooth l1-norms involved, it is computationally expensive to apply standard online learning algorithms like online gradient descent [32, 11], COMID [8], FOBOS [7], and RDA [29], as they require computing a costly subgradient at every iteration (the subgradient of ‖P_t - A X_t‖_1 at A = Ā is -sign(P_t - Ā X_t) X_t^T). Our algorithm for online l1-dictionary learning is based on the online alternating direction method recently proposed by Wang and Banerjee [26]. It first performs a simple variable substitution by introducing an equality constraint. The update for each of the resulting variables has a closed-form solution, without the need to estimate subgradients explicitly.

Algorithm 2: OIADMM
Input: P_t ∈ R^{m×n}, Â_t ∈ R^{m×k}, Δ_t ∈ R^{m×n}, X̂_t ∈ R^{k×n}, β_t ≥ 0, τ_t ≥ 0
  Γ̃_t ← P_t - Â_t X̂_t
  Γ_{t+1} ← argmin_Γ ‖Γ‖_1 + ⟨Δ_t, Γ̃_t - Γ⟩ + (β_t/2) ‖Γ - Γ̃_t‖_F^2, i.e., Γ_{t+1} ← soft(Γ̃_t + Δ_t/β_t, 1/β_t)
  G_{t+1} ← -(Δ_t/β_t + Γ̃_t - Γ_{t+1}) X̂_t^T
  Â_{t+1} ← argmin_{A ∈ 𝒜} β_t ⟨G_{t+1}, A - Â_t⟩ + (β_t/(2τ_t)) ‖A - Â_t‖_F^2, i.e., Â_{t+1} ← Π_𝒜(max{0, Â_t - τ_t G_{t+1}})
  Δ_{t+1} ← Δ_t + β_t (P_t - Â_{t+1} X̂_t - Γ_{t+1})
  Return Â_{t+1} and Δ_{t+1}

Algorithm OIADMM is simple. Consider the following minimization problem at time t:

min_{A ∈ 𝒜} ‖P_t - A X̂_t‖_1.

We can rewrite this minimization problem as

min_{A ∈ 𝒜, Γ} ‖Γ‖_1  such that  P_t - A X̂_t = Γ.    (4)
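Written out in NumPy, one OIADMM step is only a few lines; the sketch below is our own rendering of Algorithm 2 under the notation above (soft is the soft-thresholding operator from Section 2, and project_onto_A is a placeholder for the projection Π_𝒜 onto the set of non-negative matrices with column l1-norm at most 1):

```python
import numpy as np

def oiadmm_step(P_t, A_hat, Delta, X_hat, beta, tau, project_onto_A):
    """One inexact ADMM step of Algorithm 2; returns (A_hat_next, Delta_next)."""
    Gamma_tilde = P_t - A_hat @ X_hat
    # Gamma-update: closed-form soft-thresholding
    Gamma_next = soft(Gamma_tilde + Delta / beta, 1.0 / beta)
    # Gradient of 0.5*||P_t - A X_hat - Gamma_next + Delta/beta||_F^2 at A = A_hat
    G_next = -(Delta / beta + Gamma_tilde - Gamma_next) @ X_hat.T
    # Inexact (proximal gradient) dictionary update with projection onto the constraint set
    A_next = project_onto_A(np.maximum(A_hat - tau * G_next, 0.0))
    # Dual update
    Delta_next = Delta + beta * (P_t - A_next @ X_hat - Gamma_next)
    return A_next, Delta_next
```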

The augmented Lagrangian of (4) is

L(A, Γ, Δ) = ‖Γ‖_1 + ⟨Δ, P_t - A X̂_t - Γ⟩ + (β_t/2) ‖P_t - A X̂_t - Γ‖_F^2,    (5)

where Δ ∈ R^{m×n} is a multiplier and β_t > 0 is a penalty parameter.

OIADMM is summarized in Algorithm 2. The algorithm generates a sequence of iterates {Γ_t, Â_t, Δ_t}. At each time t, instead of solving (4) completely, it runs only one step of the ADMM updates of the variables Γ, A, Δ. Let Γ̃_t = P_t - Â_t X̂_t. The update steps are as follows.

(i) First, for fixed A = Â_t and Δ = Δ_t, the Γ that minimizes (5) is obtained by solving

argmin_Γ ‖Γ‖_1 + ⟨Δ_t, Γ̃_t - Γ⟩ + (β_t/2) ‖Γ - Γ̃_t‖_F^2.

The minimizer of this problem is set as Γ_{t+1}.

(ii) Using Γ = Γ_{t+1} and Δ = Δ_t, a simple manipulation shows that the A minimizing (5) can be obtained by solving

min_{A ∈ 𝒜} (β_t/2) ‖P_t - A X̂_t - Γ_{t+1} + Δ_t/β_t‖_F^2.    (6)

Instead of solving (6) exactly, we approximate it by

min_{A ∈ 𝒜} β_t ⟨G_{t+1}, A - Â_t⟩ + (β_t/(2τ_t)) ‖A - Â_t‖_F^2,

where τ_t > 0 is a proximal parameter and G_{t+1} is the gradient of (1/2) ‖P_t - A X̂_t - Γ_{t+1} + Δ_t/β_t‖_F^2 at A = Â_t. This approach belongs to the class of proximal gradient methods in optimization [25, 31]. The minimizer of this problem is set as Â_{t+1}.

(iii) Update Δ as Δ_{t+1} = Δ_t + β_t (P_t - Â_{t+1} X̂_t - Γ_{t+1}).

Equality Constraint Violation. OIADMM may temporarily violate the equality constraint in (4), but it satisfies the constraint on average in the long run. More formally, at each time t it could happen that the Â_{t+1} and Γ_{t+1} produced by OIADMM satisfy P_t - Â_{t+1} X̂_t ≠ Γ_{t+1}. However, we show (see Theorem 4.7) that the algorithm has the property that

Σ_{t=1}^T ‖Γ_{t+1} - (P_t - Â_{t+1} X̂_t)‖_F^2 = O(√T),

which implies that, over time, the equality constraint in (4) is satisfied on average.

4.2 Analysis of OIADMM

First, let us recap the OIADMM update rules:

Γ_{t+1} = argmin_Γ ‖Γ‖_1 + ⟨Δ_t, Γ̃_t - Γ⟩ + (β_t/2) ‖Γ - Γ̃_t‖_F^2,    (7)
Â_{t+1} = argmin_{A ∈ 𝒜} β_t ⟨G_{t+1}, A - Â_t⟩ + (β_t/(2τ_t)) ‖A - Â_t‖_F^2,    (8)
Δ_{t+1} = Δ_t + β_t (P_t - Â_{t+1} X̂_t - Γ_{t+1}).    (9)

Let A_op be the optimum solution to the batch problem

min_{A ∈ 𝒜} Σ_{t=1}^T ‖P_t - A X_t‖_1.

Let Γ̃_t = P_t - Â_t X̂_t and Γ̄_t = P_t - Â_{t+1} X̂_t. For any A ∈ 𝒜, let Γ_t = P_t - A X̂_t. The lemmas below hold for any A ∈ 𝒜, so in particular they hold for A set to A_op.
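The closed form used in step (i) (and quoted in Algorithm 2) follows by completing the square in (7): since ⟨Δ_t, Γ̃_t - Γ⟩ + (β_t/2)‖Γ - Γ̃_t‖_F^2 = (β_t/2)‖Γ - (Γ̃_t + Δ_t/β_t)‖_F^2 + const, the Γ-update is the proximal operator of (1/β_t)‖·‖_1, which acts entrywise:

Γ_{t+1} = argmin_Γ ‖Γ‖_1 + (β_t/2) ‖Γ - (Γ̃_t + Δ_t/β_t)‖_F^2 = soft(Γ̃_t + Δ_t/β_t, 1/β_t).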

Proof Flow. Although the algorithm is relatively simple, the analysis is somewhat involved. Define Γ_t^op = P_t - A_op X_t. The regret of OIADMM is then

R(T) = Σ_{t=1}^T (‖Γ̃_t‖_1 - ‖Γ_t^op‖_1).

We split the proof into three technical lemmas. We first upper bound ⟨Δ_t, Γ̄_t - Γ_t⟩ (Lemma 4.3), and use it to bound ‖Γ_{t+1}‖_1 - ‖Γ_t‖_1 (Lemma 4.4). In the proof of Lemma 4.5, we bound ‖Γ̃_t‖_1 - ‖Γ_{t+1}‖_1, and this, when added to the bound on ‖Γ_{t+1}‖_1 - ‖Γ_t‖_1 from Lemma 4.4, gives a bound on ‖Γ̃_t‖_1 - ‖Γ_t‖_1. The proof of the regret bound then uses a canceling telescoping sum on the bound on ‖Γ̃_t‖_1 - ‖Γ_t‖_1.

We use the following simple identity in our proofs.

Lemma 4.2. For matrices M_1, M_2, M_3, M_4 ∈ R^{m×n},

2 ⟨M_1 - M_2, M_3 - M_4⟩ = ‖M_1 - M_4‖_F^2 + ‖M_2 - M_3‖_F^2 - ‖M_1 - M_3‖_F^2 - ‖M_2 - M_4‖_F^2.

Lemma 4.3. Let {Γ_t, Â_t, Δ_t} be the sequences generated by the OIADMM procedure. For any A ∈ 𝒜, we have

⟨Δ_t, Γ̄_t - Γ_t⟩ ≤ (β_t/(2τ_t)) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2) - (β_t/2) (‖Γ_{t+1} - Γ̄_t‖_F^2 - ‖Γ_{t+1} - Γ_t‖_F^2 + ‖Γ̃_t - Γ_t‖_F^2) + (β_t/2) (Ψ_max(X̂_t) - 1/τ_t) ‖Â_{t+1} - Â_t‖_F^2.

Proof. For any A ∈ 𝒜, (8) is equivalent to the following variational inequality [21]:

⟨β_t G_{t+1} + (β_t/τ_t) (Â_{t+1} - Â_t), A - Â_{t+1}⟩ ≥ 0.    (10)

Using Γ̄_t = P_t - Â_{t+1} X̂_t and substituting for G_{t+1}, we have

β_t ⟨G_{t+1}, A - Â_{t+1}⟩ = -β_t ⟨Δ_t/β_t + Γ̃_t - Γ_{t+1}, (A - Â_{t+1}) X̂_t⟩ = β_t ⟨Δ_t/β_t + Γ̃_t - Γ_{t+1}, Γ_t - Γ̄_t⟩ = ⟨Δ_t, Γ_t - Γ̄_t⟩ + β_t ⟨Γ̃_t - Γ_{t+1}, Γ_t - Γ̄_t⟩,    (11)

where we used (A - Â_{t+1}) X̂_t = Γ̄_t - Γ_t. Substituting (11) into (10) and rearranging the terms yields

⟨Δ_t, Γ̄_t - Γ_t⟩ ≤ β_t ⟨Γ̃_t - Γ_{t+1}, Γ_t - Γ̄_t⟩ + (β_t/τ_t) ⟨Â_{t+1} - Â_t, A - Â_{t+1}⟩.    (12)

By Lemma 4.2, the first term on the right-hand side can be rewritten as

⟨Γ̃_t - Γ_{t+1}, Γ_t - Γ̄_t⟩ = (1/2) (‖Γ̃_t - Γ̄_t‖_F^2 + ‖Γ_{t+1} - Γ_t‖_F^2 - ‖Γ̃_t - Γ_t‖_F^2 - ‖Γ_{t+1} - Γ̄_t‖_F^2).    (13)

Substituting the definitions of Γ̃_t and Γ̄_t, we have

‖Γ̃_t - Γ̄_t‖_F^2 = ‖(Â_{t+1} - Â_t) X̂_t‖_F^2 ≤ Ψ_max(X̂_t) ‖Â_{t+1} - Â_t‖_F^2.    (14)

(Recall that Ψ_max(X̂_t) is the maximum eigenvalue of X̂_t^T X̂_t.) Using Lemma 4.2 again, the second term on the right-hand side of (12) is

⟨Â_{t+1} - Â_t, A - Â_{t+1}⟩ = (1/2) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2 - ‖Â_{t+1} - Â_t‖_F^2).    (15)

Combining (12), (13), (14), and (15) gives the desired bound.

Lemma 4.4. Let {Γ_t, Â_t, Δ_t} be the sequences generated by the OIADMM procedure. For any A ∈ 𝒜, we have

‖Γ_{t+1}‖_1 - ‖Γ_t‖_1 ≤ (1/(2β_t)) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (β_t/(2τ_t)) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2) - (β_t/2) (1/τ_t - Ψ_max(X̂_t)) ‖Â_{t+1} - Â_t‖_F^2 - (β_t/2) ‖Γ̃_t - Γ_{t+1}‖_F^2.

Proof. Since Γ_{t+1} is a minimizer of (7), we have

0_{m×n} ∈ ∂‖Γ_{t+1}‖_1 - Δ_t - β_t (Γ̃_t - Γ_{t+1}).

Rearranging the terms gives Δ_t + β_t (Γ̃_t - Γ_{t+1}) ∈ ∂‖Γ_{t+1}‖_1. Since ‖·‖_1 is a convex function, ‖Γ_t‖_1 ≥ ‖Γ_{t+1}‖_1 + ⟨Δ_t + β_t (Γ̃_t - Γ_{t+1}), Γ_t - Γ_{t+1}⟩, i.e.,

‖Γ_{t+1}‖_1 - ‖Γ_t‖_1 ≤ ⟨Δ_t, Γ_{t+1} - Γ_t⟩ + β_t ⟨Γ̃_t - Γ_{t+1}, Γ_{t+1} - Γ_t⟩.    (16)

Using Lemma 4.2, the last term can be rewritten as

β_t ⟨Γ̃_t - Γ_{t+1}, Γ_{t+1} - Γ_t⟩ = (β_t/2) (‖Γ̃_t - Γ_t‖_F^2 - ‖Γ̃_t - Γ_{t+1}‖_F^2 - ‖Γ_{t+1} - Γ_t‖_F^2).    (17)

Writing ⟨Δ_t, Γ_{t+1} - Γ_t⟩ = ⟨Δ_t, Γ_{t+1} - Γ̄_t⟩ + ⟨Δ_t, Γ̄_t - Γ_t⟩ and combining the inequality of Lemma 4.3 with (17) gives

⟨Δ_t, Γ_{t+1} - Γ_t⟩ + β_t ⟨Γ̃_t - Γ_{t+1}, Γ_{t+1} - Γ_t⟩ ≤ ⟨Δ_t, Γ_{t+1} - Γ̄_t⟩ - (β_t/2) ‖Γ_{t+1} - Γ̄_t‖_F^2 + (β_t/(2τ_t)) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2) - (β_t/2) (1/τ_t - Ψ_max(X̂_t)) ‖Â_{t+1} - Â_t‖_F^2 - (β_t/2) ‖Γ̃_t - Γ_{t+1}‖_F^2.    (18)

Since Γ_{t+1} - Γ̄_t = (Δ_t - Δ_{t+1})/β_t, we have

⟨Δ_t, Γ_{t+1} - Γ̄_t⟩ - (β_t/2) ‖Γ_{t+1} - Γ̄_t‖_F^2 = (1/β_t) ⟨Δ_t, Δ_t - Δ_{t+1}⟩ - (1/(2β_t)) ‖Δ_t - Δ_{t+1}‖_F^2 = (1/(2β_t)) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2).    (19)

Plugging (18) and (19) into (16) yields the result.

Lemma 4.5. Let {Γ_t, Â_t, Δ_t} be the sequences generated by the OIADMM procedure, and let Λ_t ∈ ∂‖Γ̃_t‖_1. If τ_t satisfies 1/τ_t ≥ 2Ψ_max(X̂_t), then

‖Γ̃_t‖_1 - ‖Γ_t‖_1 ≤ (1/(2β_t)) ‖Λ_t‖_F^2 + (1/(2β_t)) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (β_t/(2τ_t)) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2).

Proof. Let Λ_t ∈ ∂‖Γ̃_t‖_1. By convexity, ‖Γ̃_t‖_1 - ‖Γ_{t+1}‖_1 ≤ ⟨Λ_t, Γ̃_t - Γ_{t+1}⟩. Now,

⟨Λ_t, Γ̃_t - Γ_{t+1}⟩ = ⟨Λ_t/√β_t, √β_t (Γ̃_t - Γ_{t+1})⟩ ≤ (1/(2β_t)) ‖Λ_t‖_F^2 + (β_t/2) ‖Γ̃_t - Γ_{t+1}‖_F^2.

Therefore,

‖Γ̃_t‖_1 - ‖Γ_{t+1}‖_1 ≤ (1/(2β_t)) ‖Λ_t‖_F^2 + (β_t/2) ‖Γ̃_t - Γ_{t+1}‖_F^2.    (20)

Adding (20) and the inequality of Lemma 4.4 together, we get

‖Γ̃_t‖_1 - ‖Γ_t‖_1 ≤ (1/(2β_t)) ‖Λ_t‖_F^2 + (1/(2β_t)) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (β_t/(2τ_t)) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2) - (β_t/2) (1/τ_t - Ψ_max(X̂_t)) ‖Â_{t+1} - Â_t‖_F^2.

Setting 1/τ_t ≥ 2Ψ_max(X̂_t) ensures that

(β_t/2) (1/τ_t - Ψ_max(X̂_t)) ‖Â_{t+1} - Â_t‖_F^2 ≥ 0.

Therefore,

‖Γ̃_t‖_1 - ‖Γ_t‖_1 ≤ (1/(2β_t)) ‖Λ_t‖_F^2 + (1/(2β_t)) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (β_t/(2τ_t)) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2),

which completes the proof.

Theorem 4.6. Let {Γ_t, Â_t, Δ_t} be the sequences generated by the OIADMM procedure and let R(T) be the regret defined above. Assume the following conditions hold: (i) for all t, every subgradient Λ_t ∈ ∂‖Γ̃_t‖_1 satisfies ‖Λ_t‖_F ≤ Φ; (ii) Â_1 = 0_{m×k} and ‖A_op‖_F ≤ D; (iii) Δ_1 = 0_{m×n}; and (iv) for all t, 1/τ_t ≥ 2Ψ_max(X̂_t). Setting β_t = β = (Φ/D) √(T/τ_m), where τ_m = max_t {1/τ_t}, we have

R(T) ≤ Φ D √(τ_m T) + Σ_{t=1}^T ‖A_op E_t‖_1.

Proof. Substitute Γ̂_t^op = P_t - A_op X̂_t for Γ_t and A_op for A in Lemma 4.5 and set β_t = β = (Φ/D) √(T/τ_m). Summing over t and using conditions (i)-(iv),

Σ_{t=1}^T (‖Γ̃_t‖_1 - ‖Γ̂_t^op‖_1)
 ≤ Σ_{t=1}^T [ (1/(2β)) ‖Λ_t‖_F^2 + (1/(2β)) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (β/(2τ_t)) (‖A_op - Â_t‖_F^2 - ‖A_op - Â_{t+1}‖_F^2) ]
 ≤ T Φ^2/(2β) + (1/(2β)) ‖Δ_1‖_F^2 + (β τ_m/2) ‖A_op - Â_1‖_F^2
 ≤ (Φ D/2) √(τ_m T) + 0 + (Φ D/2) √(τ_m T) = Φ D √(τ_m T).

Since Γ_t^op = P_t - A_op X_t = P_t - A_op X̂_t + A_op E_t = Γ̂_t^op + A_op E_t, we have ‖Γ̂_t^op‖_1 ≥ ‖Γ_t^op‖_1 - ‖A_op E_t‖_1. The regret is therefore bounded as

R(T) = Σ_{t=1}^T (‖Γ̃_t‖_1 - ‖Γ_t^op‖_1) ≤ Φ D √(τ_m T) + Σ_{t=1}^T ‖A_op E_t‖_1.

Remark. In the above theorem one could replace τ_m by any upper bound on it, i.e., we do not need to know τ_m exactly.

Condition on the E_t's for Sublinear Regret. In a standard online learning setting, the (P_t, X̂_t) made available to the online learning algorithm is the same as the (P_t, X_t) made available to the batch dictionary learning algorithm in hindsight, so that X̂_t = X_t (E_t = 0), yielding an O(√T) regret. More generally, as long as Σ_{t=1}^T ‖E_t‖_p = o(T) for some suitable p-norm, we get a sublinear regret bound. (Footnote 6: This follows from Hölder's inequality, which gives Σ_{t=1}^T ‖A_op E_t‖_1 ≤ ‖A_op‖_q Σ_{t=1}^T ‖E_t‖_p for p, q with 1/p + 1/q = 1, assuming ‖A_op‖_q is bounded; here ‖·‖_p denotes the Schatten p-norm.) For example, if {Z_t} is a sequence of matrices such that ‖Z_t‖_p = O(1) for all t, then setting E_t = ε_t Z_t with a decaying sequence ε_t > 0 (e.g., ε_t = t^{-ε} for some ε > 0) yields a sublinear regret.
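To see the sublinearity concretely in the example above (a short check, ours): if ‖Z_t‖_p ≤ c for all t and E_t = t^{-ε} Z_t with ε ∈ (0, 1), then

Σ_{t=1}^T ‖E_t‖_p ≤ c Σ_{t=1}^T t^{-ε} ≤ c (1 + ∫_1^T s^{-ε} ds) = O(T^{1-ε}) = o(T),

so by Footnote 6 the additional regret term Σ_t ‖A_op E_t‖_1 is o(T) and the overall regret remains sublinear.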

This gives a sufficient condition for sublinear regret (Footnote 7: In a different context, a similar assumption on the rate of error decay appeared in a recent paper by Schmidt et al. [23] analyzing the convergence rates of inexact proximal-gradient methods.), and it is an interesting open problem to extend the analysis to other cases.

As mentioned in Section 4.1, OIADMM can violate the equality constraint at each t, i.e., P_t - Â_{t+1} X̂_t ≠ Γ_{t+1}. However, we show in Theorem 4.7 that the accumulated loss caused by the violation of the equality constraint is sublinear in T, i.e., the equality constraint is satisfied on average in the long run.

Theorem 4.7. Let {Γ_t, Â_t, Δ_t} be the sequences generated by the OIADMM procedure. Assume the following conditions hold: (i) for all t, every subgradient Λ_t ∈ ∂‖Γ̃_t‖_1 satisfies ‖Λ_t‖_F ≤ Φ; (ii) Â_1 = 0_{m×k} and ‖A_op‖_F ≤ D; (iii) Δ_1 = 0_{m×n}; and (iv) for all t, 1/τ_t ≥ 2Ψ_max(X̂_t). Setting β_t = β = (Φ/D) √(T/τ_m), where τ_m = max_t {1/τ_t}, we have

Σ_{t=1}^T ‖Γ_{t+1} - (P_t - Â_{t+1} X̂_t)‖_F^2 ≤ 2 D^2 τ_m + (4 Υ D/Φ) √(τ_m T),

where Υ is an upper bound on ‖P_t - A_op X̂_t‖_1 over all t.

Proof. Recall Γ̄_t = P_t - Â_{t+1} X̂_t and consider ‖Γ_{t+1} - Γ̄_t‖_F^2. We have

‖Γ_{t+1} - Γ̄_t‖_F^2 = ‖(Γ_{t+1} - Γ̃_t) + (Γ̃_t - Γ̄_t)‖_F^2 ≤ 2 ‖Γ_{t+1} - Γ̃_t‖_F^2 + 2 ‖Γ̃_t - Γ̄_t‖_F^2 ≤ 2 ‖Γ_{t+1} - Γ̃_t‖_F^2 + 2 Ψ_max(X̂_t) ‖Â_{t+1} - Â_t‖_F^2.    (21)

For the first inequality we used the simple fact that for any two matrices M_1 and M_2, ‖M_1 + M_2‖_F^2 ≤ 2(‖M_1‖_F^2 + ‖M_2‖_F^2); the second inequality follows from (14).

Next, since ‖Γ_{t+1}‖_1 ≥ 0, we have ‖Γ_{t+1}‖_1 - ‖Γ̂_t^op‖_1 ≥ -‖Γ̂_t^op‖_1 ≥ -Υ, where Γ̂_t^op = P_t - A_op X̂_t. Using this and rearranging the terms in the inequality of Lemma 4.4 (with A_op in place of A and Γ̂_t^op in place of Γ_t) gives

‖Γ̃_t - Γ_{t+1}‖_F^2 ≤ (2Υ)/β_t + (1/β_t^2) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (1/τ_t) (‖A_op - Â_t‖_F^2 - ‖A_op - Â_{t+1}‖_F^2) - (1/τ_t - Ψ_max(X̂_t)) ‖Â_{t+1} - Â_t‖_F^2.

Plugging this into (21) yields

‖Γ_{t+1} - Γ̄_t‖_F^2 ≤ (4Υ)/β_t + (2/β_t^2) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (2/τ_t) (‖A_op - Â_t‖_F^2 - ‖A_op - Â_{t+1}‖_F^2) - 2 (1/τ_t - 2Ψ_max(X̂_t)) ‖Â_{t+1} - Â_t‖_F^2.

Letting 1/τ_t ≥ 2Ψ_max(X̂_t), summing over t from 1 to T, and simplifying the resulting expression (using β_t = β = (Φ/D)√(T/τ_m), Δ_1 = 0, Â_1 = 0, and ‖A_op‖_F ≤ D), we get

Σ_{t=1}^T ‖Γ_{t+1} - Γ̄_t‖_F^2 ≤ 2 D^2 τ_m + (4 Υ D/Φ) √(τ_m T).

Again, as was the case with Theorem 4.6, we could replace τ_m in the above theorem by any upper bound on it.

Running Time. For the ith column of the dictionary matrix, the projection onto 𝒜 can be done in O(s_i log m) time, where s_i is the number of non-zero elements in the ith column, using the projection-onto-the-l1-ball algorithm of Duchi et al. [6]. The simplest implementation of OIADMM takes O(mnk) time at each timestep because of the matrix multiplications involved. However, in practice we can exploit the sparsity of the matrices to make the algorithm run much faster. OIADMM is also memory efficient: at each time t, other than the current iterates, it only needs Â_t from the previous timestep.
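The projection Π_𝒜 in the running-time discussion decomposes column-by-column: each dictionary column is projected onto {a : a ≥ 0, ‖a‖_1 ≤ 1}. The sketch below is our own dense O(m log m) version (it clips to the non-negative orthant and, if needed, projects onto the simplex by the standard sorting construction, in the spirit of Duchi et al. [6]; the sparsity-exploiting variant mentioned above is not shown):

```python
import numpy as np

def project_column(a):
    """Euclidean projection of a vector onto {x : x >= 0, ||x||_1 <= 1}."""
    v = np.maximum(a, 0.0)
    if v.sum() <= 1.0:
        return v
    # Project onto the simplex {x >= 0, sum(x) = 1} via the sorting-based rule
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def project_onto_A(A):
    """Column-wise projection of a dictionary matrix onto the constraint set."""
    return np.apply_along_axis(project_column, 0, A)
```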

5 Experimental Results

In this section, we present experiments to compare and contrast the performance of the l1-batch and l1-online dictionary learning algorithms for the task of novel document detection. We also present results highlighting the superiority of using an l1- over an l2-penalty on the reconstruction error for this task, validating the discussion in Section 3.

Implementation of BATCH. In our implementation, we grow the dictionary size by η in each timestep. Growing the dictionary size is essential for the batch algorithm because, as t increases, the number of columns of P_[t] also increases, and therefore a larger dictionary is required to compactly represent all the documents in P_[t]. For solving (3), we use alternating minimization over the variables. The complete pseudo-code is given as Algorithm BATCH-IMPL (see Appendix B). The optimization problems arising in the sparse coding and dictionary learning steps are solved using ADMM.

Online Algorithm for Novel Document Detection. Our online algorithm (Algorithm 3) uses the same novel document detection step as Algorithm BATCH, but dictionary learning is done using OIADMM. (Footnote 8: In our experiments, the number of documents introduced in each timestep is almost of the same order, and hence there is no need to change the size of the dictionary across timesteps for the online algorithm.)

Algorithm 3
Input: P_t = [p_1, ..., p_{n_t}] ∈ R^{m×n_t}, Â_t ∈ R^{m×k}, Δ_t ∈ R^{m×n_t}, λ ≥ 0, ζ ≥ 0, β_t ≥ 0, τ_t ≥ 0
Novel Document Detection Step:
  for j = 1 to n_t do
    Solve: x_j = argmin_{x ≥ 0} ‖p_j - Â_t x‖_1 + λ‖x‖_1
    if ‖p_j - Â_t x_j‖_1 + λ‖x_j‖_1 > ζ, mark p_j as novel
Online Dictionary Learning Step:
  Set X̂_t ← [x_1, ..., x_{n_t}]
  (Â_{t+1}, Δ_{t+1}) ← OIADMM(P_t, Â_t, Δ_t, X̂_t, β_t, τ_t)

(Footnote 9: Before invoking Algorithm OIADMM we may have to zero-pad the matrices in the arguments appropriately.) Notice that the sparse coding matrices of Algorithm BATCH, X_1, ..., X_t, could be different from X̂_1, ..., X̂_t. If these sequences of matrices are close to each other, then we have a sublinear regret on the objective function. (Footnote 10: As noted earlier, we cannot do such a comparison without making any assumptions.)

Evaluation of Novel Document Detection. For performance evaluation, we assume that the documents in the corpus have been manually identified with a set of topics. For simplicity, we assume that each document is tagged with the single most dominant topic it is associated with, which we call the true topic of that document. We call a document y arriving at time t novel if the true topic of y has not appeared before time t. So at time t, given a set of documents, the task of novel document detection is to classify each document as either novel (positive) or non-novel (negative). For evaluating this classification task, we use the standard Area Under the ROC Curve (AUC) [17].

Performance Evaluation for l1-Dictionary Learning. We use a simple reconstruction error measure for comparing the dictionaries produced by our l1-batch and l1-online algorithms. We want the dictionary at time t to be a good basis for representing all the documents in P_[t] ∈ R^{m×N_t}. This leads us to define the sparse reconstruction error (SRE) of a dictionary A at time t as

SRE(A) = (1/N_t) min_{X ≥ 0} (‖P_[t] - AX‖_1 + λ‖X‖_1).

A dictionary with a smaller SRE is better, on average, at sparsely representing the documents in P_[t].
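Because the SRE objective decomposes over the columns of P_[t], it can be computed by sparse coding one document at a time against the fixed dictionary. A sketch (ours, reusing the hypothetical sparse_code_l1 solver from the Section 3 listing):

```python
import numpy as np

def sre(A, P_hist, lam):
    """Sparse reconstruction error of dictionary A over all past documents (columns of P_hist)."""
    N = P_hist.shape[1]
    total = 0.0
    for j in range(N):
        _, obj = sparse_code_l1(P_hist[:, j], A, lam)  # ||p_j - A x_j||_1 + lam*||x_j||_1
        total += obj
    return total / N
```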

Novel Document Detection Using l2-Dictionary Learning. To justify the choice of an l1-penalty on the reconstruction error for novel document detection, we performed experiments comparing the l1- and l2-penalties for this task. In the l2-setting, for the sparse coding step we used a fast implementation of the LARS algorithm with positivity constraints [9], and the dictionary learning was done by solving a non-negative matrix factorization problem with additional sparsity constraints (also known as the non-negative sparse coding problem [13]). (Footnote 11: We used the SPAMS package, http://spams-devel.gforge.inria.fr/, in our implementation.) A pseudo-code description is given in Appendix B.

Experimental Setup. All reported results are based on a Matlab implementation running on a quad-core 2.33 GHz Intel processor with 32GB RAM. The parameters of our l1-online dictionary learning algorithm are: (i) the initial size of the dictionary, (ii) the regularization parameter λ, (iii) the OIADMM parameters β and τ, and (iv) the ADMM parameters for sparse coding. The l1-batch and l2-batch dictionary learning algorithms take an additional parameter η, which describes the increase in the batch dictionary size in each timestep. The regularization parameter λ is set to 0.1, which yields reasonable sparsities in our experiments. The OIADMM parameter τ_t is set to 1/(2Ψ_max(X̂_t)) (chosen according to Theorem 4.6) and β is fixed to 5 (obtained through tuning). The ADMM parameters for sparse coding and batch dictionary learning are set as suggested in [14] (see Appendix A). In the batch algorithms, we grow the dictionary sizes by η = 10 in each timestep. The threshold value ζ is treated as a tunable parameter.

5.1 Experiments on News Streams

Our first dataset is drawn from the NIST Topic Detection and Tracking (TDT2) corpus, which consists of news stories from the first half of 1998. In our evaluation, we used a set of 9000 documents represented over 9528 terms and distributed into the top 30 TDT2 human-labeled topics over a period of 27 weeks. We introduce the documents in groups. At timestep 0, we introduce the first 1000 documents, and these documents are used for initializing the dictionary. We use an alternating minimization procedure over the variables of (1) to initialize the dictionary. In these experiments the size of the initial dictionary is k = 200. In each subsequent timestep t ∈ {1, ..., 8}, we provide the batch and online algorithms with the same set of 1000 documents. In Figure 1, we present novel document detection results for those timesteps where at least one novel document was introduced. Table 1 shows the corresponding AUC numbers. The results show that using an l1-penalty on the reconstruction error is better for novel document detection than using an l2-penalty.

[Figure 1: ROC curves (true positive rate vs. false positive rate) for TDT2 at the timesteps where novel documents were introduced, comparing the l1-online, l1-batch, and l2-batch algorithms.]

[Table 1: AUC numbers for the ROC plots in Figure 1, reported per timestep (together with the number of novel and non-novel documents) for the l1-online, l1-batch, and l2-batch algorithms, along with their averages.]

Comparison of the l1-Online and l1-Batch Algorithms. The l1-online and l1-batch algorithms have almost identical performance in terms of detecting novel documents (see Table 1). However, the online algorithm is much more computationally efficient. In Figure 2(a), we compare the running times of these algorithms. As noted earlier, the running time of the batch algorithm goes up as t increases, since it has to optimize over the entire past. In contrast, the running time of the online algorithm is independent of the past and depends only on the number of documents introduced in each timestep (which in this case is always 1000); therefore, it is almost the same across different timesteps.

As expected, the run-time gap between the l1-batch and l1-online algorithms widens as t increases: in the first timestep the online algorithm is 5.4 times faster, and this rapidly increases to a factor of 11.5 in just 7 timesteps. In Figure 2(b), we compare the dictionaries produced by the l1-batch and l1-online algorithms under the SRE metric. In the first few timesteps, the SRE of the dictionaries produced by the online algorithm is slightly lower than that of the batch algorithm. However, this gets corrected after a few timesteps, and as expected, the batch algorithm later produces better dictionaries.

[Figure 2: (a) Running time and (b) sparse reconstruction error (SRE) plots for the TDT2 dataset; (c) running time and (d) SRE plots for the Twitter dataset.]

5.2 Experiments on Twitter

Our second dataset is from an application of monitoring Twitter for marketing and PR for smartphone and wireless providers. We used the Twitter Decahose to collect a 10% sample of all tweets (posts) from Sep 15 to Oct 05, 2011. From this, we filtered the tweets relevant to smartphones using a scheme presented in [4], which utilizes the Wikipedia ontology to do the filtering. Our dataset comprises the tweets over these 21 days, and the vocabulary size is 6237 words. We used the tweets from Sep 15 to Sep 21 to initialize the dictionaries. Subsequently, at each timestep, we give as input to both algorithms all the tweets from a given day, for a period of 14 days between Sep 22 and Oct 05. Since this dataset is unlabeled, we do a quantitative evaluation of the l1-batch vs. l1-online algorithms in terms of SRE, and a qualitative evaluation of the l1-online algorithm for the novel document detection task. Here, the size of the initial dictionary is k = 100.

Figure 2(c) shows the running times on the Twitter dataset. At the first timestep the online algorithm is already 10.8 times faster, and this speedup escalates to 18.2 by the 14th timestep. Figure 2(d) shows the SRE of the dictionaries produced by these algorithms. In this case, the SRE of the dictionaries produced by the batch algorithm is consistently better than that of the online algorithm, but as the running time plots suggest, this improvement comes at a very steep price.

Table 2 below shows a representative set of novel tweets identified by our online algorithm. In each timestep, instead of thresholding by ζ, we take the top 10% of tweets measured in terms of the sparse coding objective value and run a dictionary-based clustering, described in [14], on them. Further post-processing is done to discard clusters without much support and to pick a representative tweet for each cluster. Using this completely automated process, we are able to detect breaking news and trends relevant to the smartphone market, such as AT&T throttling data bandwidth, the launch of the iPhone 4S, and the death of Steve Jobs.

Table 2: Sample novel documents detected by our online algorithm.
- "Android powered 56 percent of smartphones sold in the last three months. Sad thing is it can lower the rating of ios!"
- "How Windows 8 is faster, lighter and more efficient: WP7 Droid Bionic Android HP TouchPad white ipods"
- "U.S. News: AT&T begins sending throttling warnings to top data hogs: AT&T did away with its unlimited da... #iphone"
- "Can't wait for the iphone 4s #Letustalkiphone"
- "Everybody put an iphone up in the air one time #ripstevejobs"

6 Conclusion

The main contribution of this paper is a new online l1-dictionary learning algorithm, based on which we develop a scalable approach to detecting novel documents in streams of text.
We establish a sublinear regret bound, and empirically demonstrate orders of magnitude speedup over the batch algorithm, without much loss in performance.

A further speedup can be achieved by distributing the algorithm using known techniques [3]. In the batch setting, with the l1/l1-formulation, the dual augmented Lagrangian marginally outperforms the primal augmented Lagrangian in practice [31]. A priori it is unclear whether the marginal improvements observed in [31] carry over to the online setting, but this is an interesting open problem. Apart from the target application of novel document detection, our online l1-dictionary learning algorithm could have broader applicability to other tasks using text and beyond, e.g., signal processing [10]. On a different note, there are several techniques that are related to dictionary learning, such as Latent Dirichlet Allocation [2], Probabilistic Latent Semantic Analysis [12], and Non-negative Matrix Factorization [15], and adapting these techniques for online detection of novel documents is a rich area for future work.

References

[1] M. Aharon, M. Elad, and A. Bruckstein. The K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing, 54(11), 2006.
[2] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation. JMLR, 3:993-1022, 2003.
[3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 2011.
[4] V. Chenthamarakshan, P. Melville, V. Sindhwani, and R. D. Lawrence. Concept Labeling: Building Text Classifiers with Minimal Supervision. In IJCAI, 2011.
[5] P. Combettes and J. Pesquet. Proximal Splitting Methods in Signal Processing. arXiv preprint, 2009.
[6] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient Projections onto the l1-Ball for Learning in High Dimensions. In ICML, 2008.
[7] J. Duchi and Y. Singer. Efficient Online and Batch Learning using Forward Backward Splitting. JMLR, 10, 2009.
[8] J. C. Duchi, S. Shalev-Shwartz, Y. Singer, and A. Tewari. Composite Objective Mirror Descent. In COLT, pages 14-26, 2010.
[9] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise Coordinate Optimization. The Annals of Applied Statistics, 1(2), 2007.
[10] Q. Geng and J. Wright. On the Local Correctness of l1-Minimization for Dictionary Learning. Preprint, 2011.
[11] E. Hazan, A. Agarwal, and S. Kale. Logarithmic Regret Algorithms for Online Convex Optimization. Machine Learning, 69(2-3):169-192, 2007.
[12] T. Hofmann. Probabilistic Latent Semantic Analysis. In UAI, 1999.
[13] P. O. Hoyer. Non-Negative Sparse Coding. In IEEE Workshop on Neural Networks for Signal Processing, 2002.
[14] S. P. Kasiviswanathan, P. Melville, A. Banerjee, and V. Sindhwani. Emerging Topic Detection using Dictionary Learning. In CIKM, 2011.
[15] D. Lee and H. Seung. Learning the Parts of Objects by Non-negative Matrix Factorization. Nature, 1999.
[16] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online Learning for Matrix Factorization and Sparse Coding. JMLR, 11:19-60, 2010.
[17] C. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
[18] P. Melville, J. Leskovec, and F. Provost, editors. Proceedings of the First Workshop on Social Media Analytics. ACM, 2010.
[19] B. Olshausen and D. Field. Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1? Vision Research, 37(23), 1997.
[20] S. Petrović, M. Osborne, and V. Lavrenko. Streaming First Story Detection with Application to Twitter. In HLT, ACL, 2010.
[21] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, 1998.
[22] A. Saha and V. Sindhwani. Learning Evolving and Emerging Topics in Social Media: A Dynamic NMF Approach with Temporal Regularization. In WSDM, 2012.
[23] M. Schmidt, N. Le Roux, and F. Bach. Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization. In NIPS, 2011.

[24] S. Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, 4(2), 2012.
[25] P. Tseng. Approximation Accuracy, Gradient Methods, and Error Bound for Structured Convex Optimization. Mathematical Programming, Series B, 125, 2010.
[26] H. Wang and A. Banerjee. Online Alternating Direction Method. In ICML, 2012.
[27] J. Wright and Y. Ma. Dense Error Correction via l1-Minimization. IEEE Transactions on Information Theory, 56(7), 2010.
[28] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust Face Recognition via Sparse Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210-227, Feb. 2009.
[29] L. Xiao. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization. JMLR, 11, 2010.
[30] A. Y. Yang, S. S. Sastry, A. Ganesh, and Y. Ma. Fast l1-Minimization Algorithms and an Application in Robust Face Recognition: A Review. In International Conference on Image Processing, 2010.
[31] J. Yang and Y. Zhang. Alternating Direction Algorithms for l1-Problems in Compressive Sensing. SIAM Journal on Scientific Computing, 33(1), 2011.
[32] M. Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. In ICML, 2003.

A Background on ADMM

In this section, we give a brief review of the general framework of ADMM. Let p(x): R^a → R and q(y): R^b → R be convex functions, F ∈ R^{c×a}, G ∈ R^{c×b}, and z ∈ R^c. Consider the following optimization problem:

min_{x,y} p(x) + q(y)  such that  F x + G y = z,    (22)

where the variable vectors x and y are separate in the objective and coupled only in the constraint. The augmented Lagrangian for the above problem is given by

L(x, y, ρ) = p(x) + q(y) + ⟨ρ, z - F x - G y⟩ + (φ/2) ‖z - F x - G y‖_2^2,

where ρ ∈ R^c is the Lagrangian multiplier and φ > 0 is a penalty parameter. ADMM exploits the separable form of (22) and replaces the joint minimization over x and y with two simpler problems: it first minimizes L over x, then over y, and then applies a gradient step to the Lagrange multiplier ρ. The entire ADMM procedure is summarized in Algorithm 4, where γ > 0 is a constant and the subscript i denotes the ith iteration of the ADMM procedure. The ADMM procedure has been proved to converge to the global optimal solution under quite broad conditions [5].

Algorithm 4: ADMM Update Equations for Solving (22)
Iterate until convergence:
  x_{i+1} ← argmin_x L(x, y_i, ρ_i)
  y_{i+1} ← argmin_y L(x_{i+1}, y, ρ_i)
  ρ_{i+1} ← ρ_i + γφ (z - F x_{i+1} - G y_{i+1})
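The generic loop of Algorithm 4 can be packaged as a small skeleton that takes the two partial minimizers as callables; the concrete updates in Algorithms 5 and 6 below are instances of this pattern. This is our own sketch, not code from the paper:

```python
import numpy as np

def admm(argmin_x, argmin_y, F, G, z, phi, gamma, n_iters=100):
    """Generic ADMM loop for min p(x) + q(y) s.t. F x + G y = z (Algorithm 4).

    argmin_x(y, rho) and argmin_y(x, rho) minimize the augmented Lagrangian
    over x and y, respectively, with the other variable and the multiplier fixed.
    """
    x = np.zeros(F.shape[1])
    y = np.zeros(G.shape[1])
    rho = np.zeros(z.shape)
    for _ in range(n_iters):
        x = argmin_x(y, rho)
        y = argmin_y(x, rho)
        rho = rho + gamma * phi * (z - F @ x - G @ y)
    return x, y, rho
```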

A.1 ADMM Equations for Updating the X's and A's

Consider the l1-dictionary learning problem

min_{A ∈ 𝒜, X ≥ 0} ‖P - AX‖_1 + λ‖X‖_1,

where 𝒜 is defined as in Section 2. We use the following algorithms from [14] to solve this problem. It is quite easy to adapt the ADMM updates outlined in Algorithm 4 to update the X's and A's when the other variable is fixed (see, e.g., [14]).

ADMM for updating X, given fixed A. Here we are given matrices P ∈ R^{m×n} and A ∈ R^{m×k}, and we want to solve the optimization problem

min_{X ≥ 0} ‖P - AX‖_1 + λ‖X‖_1.

Algorithm 5 shows the ADMM update steps for solving this problem. The complete derivation is presented in [14]; we reproduce the updates here for completeness. In our experiments, we set φ = 5, κ = 1/Ψ_max(A), and γ = 1.89. These parameters are chosen based on the ADMM convergence results presented in [14, 31].

Algorithm 5: ADMM for Updating X
(ADMM procedure for solving min_{X ≥ 0} ‖P - AX‖_1 + λ‖X‖_1)
Input: A ∈ R^{m×k}, P ∈ R^{m×n}, λ ≥ 0, γ ≥ 0, φ ≥ 0, κ ≥ 0
  X ← 0_{k×n}, E ← P, ρ ← 0_{m×n}
  for i = 1, 2, ... to convergence do
    E_{i+1} ← soft(P - A X_i + ρ_i/φ, 1/φ)
    G ← A^T (A X_i + E_{i+1} - P - ρ_i/φ)
    X_{i+1} ← max{X_i - κ G - λκ/φ, 0}
    ρ_{i+1} ← ρ_i + γφ (P - A X_{i+1} - E_{i+1})
  Return X at convergence

ADMM for updating A, given fixed X. Given inputs P ∈ R^{m×n} and X ∈ R^{k×n}, consider the optimization problem

min_{A ∈ 𝒜} ‖P - AX‖_1.

Algorithm 6: ADMM for Updating A
(ADMM procedure for solving min_{A ∈ 𝒜} ‖P - AX‖_1)
Input: X ∈ R^{k×n}, P ∈ R^{m×n}, γ ≥ 0, φ ≥ 0, κ ≥ 0
  A ← 0_{m×k}, E ← P, ρ ← 0_{m×n}
  for i = 1, 2, ... to convergence do
    E_{i+1} ← soft(P - A_i X + ρ_i/φ, 1/φ)
    G ← (A_i X + E_{i+1} - P - ρ_i/φ) X^T
    A_{i+1} ← Π_𝒜(max{A_i - κ G, 0})
    ρ_{i+1} ← ρ_i + γφ (P - A_{i+1} X - E_{i+1})
  Return A at convergence

When repeating this optimization over multiple timesteps, we use warm starts for faster convergence, i.e., instead of initializing A to 0_{m×k}, we initialize A to the dictionary obtained at the end of the previous timestep.
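Algorithm 5 translates almost line for line into NumPy. The following is our own sketch (soft is the operator from Section 2; a fixed iteration count stands in for the convergence test, and the parameter values follow the settings quoted above):

```python
import numpy as np

def admm_update_X(A, P, lam, phi=5.0, gamma=1.89, n_iters=200):
    """ADMM for min_{X>=0} ||P - A X||_1 + lam*||X||_1 (Algorithm 5)."""
    m, k = A.shape
    n = P.shape[1]
    kappa = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Psi_max(A)
    X = np.zeros((k, n))
    E = P.copy()
    rho = np.zeros((m, n))
    for _ in range(n_iters):
        E = soft(P - A @ X + rho / phi, 1.0 / phi)
        G = A.T @ (A @ X + E - P - rho / phi)
        X = np.maximum(X - kappa * G - lam * kappa / phi, 0.0)
        rho = rho + gamma * phi * (P - A @ X - E)
    return X
```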

B Pseudo-Codes from Section 5

Let us start by extending the definition of 𝒜: define

𝒜_k = {A ∈ R^{m×k} : A ≥ 0_{m×k}, ‖A_j‖_1 ≤ 1 for all j = 1, ..., k},

where A_j is the jth column of A, and let Π_{𝒜_k} denote the projection onto the nearest point in the convex set 𝒜_k. Similarly, define

𝒜'_k = {A ∈ R^{m×k} : A ≥ 0_{m×k}, ‖A_j‖_2 ≤ 1 for all j = 1, ..., k},

and let Π_{𝒜'_k} denote the projection onto the nearest point in the convex set 𝒜'_k.

Algorithm 7: BATCH-IMPL
Input: P_[t-1] ∈ R^{m×N_{t-1}}, X_[t-1] ∈ R^{k_t×N_{t-1}}, P_t = [p_1, ..., p_{n_t}] ∈ R^{m×n_t}, A_t ∈ R^{m×k_t}, λ, ζ, η ≥ 0
Novel Document Detection Step:
  for j = 1 to n_t do
    Solve: x_j = argmin_{x ≥ 0} ‖p_j - A_t x‖_1 + λ‖x‖_1 (solved using Algorithm 5)
    if ‖p_j - A_t x_j‖_1 + λ‖x_j‖_1 > ζ, mark p_j as novel
Batch Dictionary Learning Step:
  Set k_{t+1} ← k_t + η
  Set Z_[t] ← [X_[t-1] x_1, ..., x_{n_t}]
  Set X_[t] ← [Z_[t]; 0_{η×N_t}]
  Set P_[t] ← [P_[t-1] p_1, ..., p_{n_t}]
  for i = 1 to convergence do
    Solve: A_{t+1} = argmin_{A ∈ 𝒜_{k_{t+1}}} ‖P_[t] - A X_[t]‖_1 (solved using Algorithm 6 with warm starts)
    Solve: X_[t] = argmin_{X ≥ 0} ‖P_[t] - A_{t+1} X‖_1 + λ‖X‖_1 (solved using Algorithm 5)

Algorithm 8: L2-BATCH
Input: P_[t-1] ∈ R^{m×N_{t-1}}, P_t = [p_1, ..., p_{n_t}] ∈ R^{m×n_t}, A_t ∈ R^{m×k_t}, λ ≥ 0, ζ ≥ 0, η ≥ 0
Novel Document Detection Step:
  for j = 1 to n_t do
    Solve: x_j = argmin_{x ≥ 0} ‖p_j - A_t x‖_2^2 + λ‖x‖_1 (solved using the LARS method [9])
    if ‖p_j - A_t x_j‖_2^2 + λ‖x_j‖_1 > ζ, mark p_j as novel
l2-Batch Dictionary Learning Step:
  Set k_{t+1} ← k_t + η
  Set P_[t] ← [P_[t-1] p_1, ..., p_{n_t}]
  Solve: [A_{t+1}, X_[t]] = argmin_{A ∈ 𝒜'_{k_{t+1}}, X ≥ 0} ‖P_[t] - AX‖_2^2 + λ‖X‖_1 (a non-negative sparse coding problem)


ODEs II, Lecture 1: Homogeneous Linear Systems - I. Mike Raugh 1. March 8, 2004 ODEs II, Lecure : Homogeneous Linear Sysems - I Mike Raugh March 8, 4 Inroducion. In he firs lecure we discussed a sysem of linear ODEs for modeling he excreion of lead from he human body, saw how o ransform

More information

A Decentralized Second-Order Method with Exact Linear Convergence Rate for Consensus Optimization

A Decentralized Second-Order Method with Exact Linear Convergence Rate for Consensus Optimization 1 A Decenralized Second-Order Mehod wih Exac Linear Convergence Rae for Consensus Opimizaion Aryan Mokhari, Wei Shi, Qing Ling, and Alejandro Ribeiro Absrac This paper considers decenralized consensus

More information

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still. Lecure - Kinemaics in One Dimension Displacemen, Velociy and Acceleraion Everyhing in he world is moving. Nohing says sill. Moion occurs a all scales of he universe, saring from he moion of elecrons in

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals

More information

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 175 CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 10.1 INTRODUCTION Amongs he research work performed, he bes resuls of experimenal work are validaed wih Arificial Neural Nework. From he

More information

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H. ACE 56 Fall 005 Lecure 5: he Simple Linear Regression Model: Sampling Properies of he Leas Squares Esimaors by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Inference in he Simple

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

Primal-Dual Splitting: Recent Improvements and Variants

Primal-Dual Splitting: Recent Improvements and Variants Primal-Dual Spliing: Recen Improvemens and Varians 1 Thomas Pock and 2 Anonin Chambolle 1 Insiue for Compuer Graphics and Vision, TU Graz, Ausria 2 CMAP & CNRS École Polyechnique, France The proximal poin

More information

E β t log (C t ) + M t M t 1. = Y t + B t 1 P t. B t 0 (3) v t = P tc t M t Question 1. Find the FOC s for an optimum in the agent s problem.

E β t log (C t ) + M t M t 1. = Y t + B t 1 P t. B t 0 (3) v t = P tc t M t Question 1. Find the FOC s for an optimum in the agent s problem. Noes, M. Krause.. Problem Se 9: Exercise on FTPL Same model as in paper and lecure, only ha one-period govenmen bonds are replaced by consols, which are bonds ha pay one dollar forever. I has curren marke

More information

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8) I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression

More information

Lecture Notes 2. The Hilbert Space Approach to Time Series

Lecture Notes 2. The Hilbert Space Approach to Time Series Time Series Seven N. Durlauf Universiy of Wisconsin. Basic ideas Lecure Noes. The Hilber Space Approach o Time Series The Hilber space framework provides a very powerful language for discussing he relaionship

More information

Network Newton Distributed Optimization Methods

Network Newton Distributed Optimization Methods Nework Newon Disribued Opimizaion Mehods Aryan Mokhari, Qing Ling, and Alejandro Ribeiro Absrac We sudy he problem of minimizing a sum of convex objecive funcions where he componens of he objecive are

More information

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018 MATH 5720: Gradien Mehods Hung Phan, UMass Lowell Ocober 4, 208 Descen Direcion Mehods Consider he problem min { f(x) x R n}. The general descen direcions mehod is x k+ = x k + k d k where x k is he curren

More information

Testing for a Single Factor Model in the Multivariate State Space Framework

Testing for a Single Factor Model in the Multivariate State Space Framework esing for a Single Facor Model in he Mulivariae Sae Space Framework Chen C.-Y. M. Chiba and M. Kobayashi Inernaional Graduae School of Social Sciences Yokohama Naional Universiy Japan Faculy of Economics

More information

Georey E. Hinton. University oftoronto. Technical Report CRG-TR February 22, Abstract

Georey E. Hinton. University oftoronto.   Technical Report CRG-TR February 22, Abstract Parameer Esimaion for Linear Dynamical Sysems Zoubin Ghahramani Georey E. Hinon Deparmen of Compuer Science Universiy oftorono 6 King's College Road Torono, Canada M5S A4 Email: zoubin@cs.orono.edu Technical

More information

Final Spring 2007

Final Spring 2007 .615 Final Spring 7 Overview The purpose of he final exam is o calculae he MHD β limi in a high-bea oroidal okamak agains he dangerous n = 1 exernal ballooning-kink mode. Effecively, his corresponds o

More information

Aryan Mokhtari, Wei Shi, Qing Ling, and Alejandro Ribeiro. cost function n

Aryan Mokhtari, Wei Shi, Qing Ling, and Alejandro Ribeiro. cost function n IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 2, NO. 4, DECEMBER 2016 507 A Decenralized Second-Order Mehod wih Exac Linear Convergence Rae for Consensus Opimizaion Aryan Mokhari,

More information

Chapter 3 Boundary Value Problem

Chapter 3 Boundary Value Problem Chaper 3 Boundary Value Problem A boundary value problem (BVP) is a problem, ypically an ODE or a PDE, which has values assigned on he physical boundary of he domain in which he problem is specified. Le

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

Single-Pass-Based Heuristic Algorithms for Group Flexible Flow-shop Scheduling Problems

Single-Pass-Based Heuristic Algorithms for Group Flexible Flow-shop Scheduling Problems Single-Pass-Based Heurisic Algorihms for Group Flexible Flow-shop Scheduling Problems PEI-YING HUANG, TZUNG-PEI HONG 2 and CHENG-YAN KAO, 3 Deparmen of Compuer Science and Informaion Engineering Naional

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

Refinement of Document Clustering by Using NMF *

Refinement of Document Clustering by Using NMF * Refinemen of Documen Clusering by Using NMF * Hiroyuki Shinnou and Minoru Sasaki Deparmen of Compuer and Informaion Sciences, Ibaraki Universiy, 4-12-1 Nakanarusawa, Hiachi, Ibaraki JAPAN 316-8511 {shinnou,

More information

Ensamble methods: Bagging and Boosting

Ensamble methods: Bagging and Boosting Lecure 21 Ensamble mehods: Bagging and Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Ensemble mehods Mixure of expers Muliple base models (classifiers, regressors), each covers a differen par

More information

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis Speaker Adapaion Techniques For Coninuous Speech Using Medium and Small Adapaion Daa Ses Consaninos Boulis Ouline of he Presenaion Inroducion o he speaker adapaion problem Maximum Likelihood Sochasic Transformaions

More information

Isolated-word speech recognition using hidden Markov models

Isolated-word speech recognition using hidden Markov models Isolaed-word speech recogniion using hidden Markov models Håkon Sandsmark December 18, 21 1 Inroducion Speech recogniion is a challenging problem on which much work has been done he las decades. Some of

More information

Some Ramsey results for the n-cube

Some Ramsey results for the n-cube Some Ramsey resuls for he n-cube Ron Graham Universiy of California, San Diego Jozsef Solymosi Universiy of Briish Columbia, Vancouver, Canada Absrac In his noe we esablish a Ramsey-ype resul for cerain

More information

Ordinary Differential Equations

Ordinary Differential Equations Ordinary Differenial Equaions 5. Examples of linear differenial equaions and heir applicaions We consider some examples of sysems of linear differenial equaions wih consan coefficiens y = a y +... + a

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

THE DISCRETE WAVELET TRANSFORM

THE DISCRETE WAVELET TRANSFORM . 4 THE DISCRETE WAVELET TRANSFORM 4 1 Chaper 4: THE DISCRETE WAVELET TRANSFORM 4 2 4.1 INTRODUCTION TO DISCRETE WAVELET THEORY The bes way o inroduce waveles is hrough heir comparison o Fourier ransforms,

More information

Approximation Algorithms for Unique Games via Orthogonal Separators

Approximation Algorithms for Unique Games via Orthogonal Separators Approximaion Algorihms for Unique Games via Orhogonal Separaors Lecure noes by Konsanin Makarychev. Lecure noes are based on he papers [CMM06a, CMM06b, LM4]. Unique Games In hese lecure noes, we define

More information

Adaptive Noise Estimation Based on Non-negative Matrix Factorization

Adaptive Noise Estimation Based on Non-negative Matrix Factorization dvanced cience and Technology Leers Vol.3 (ICC 213), pp.159-163 hp://dx.doi.org/1.14257/asl.213 dapive Noise Esimaion ased on Non-negaive Marix Facorizaion Kwang Myung Jeon and Hong Kook Kim chool of Informaion

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

Chapter 6. Systems of First Order Linear Differential Equations

Chapter 6. Systems of First Order Linear Differential Equations Chaper 6 Sysems of Firs Order Linear Differenial Equaions We will only discuss firs order sysems However higher order sysems may be made ino firs order sysems by a rick shown below We will have a sligh

More information

Tom Heskes and Onno Zoeter. Presented by Mark Buller

Tom Heskes and Onno Zoeter. Presented by Mark Buller Tom Heskes and Onno Zoeer Presened by Mark Buller Dynamic Bayesian Neworks Direced graphical models of sochasic processes Represen hidden and observed variables wih differen dependencies Generalize Hidden

More information

Lecture 4: November 13

Lecture 4: November 13 Compuaional Learning Theory Fall Semeser, 2017/18 Lecure 4: November 13 Lecurer: Yishay Mansour Scribe: Guy Dolinsky, Yogev Bar-On, Yuval Lewi 4.1 Fenchel-Conjugae 4.1.1 Moivaion Unil his lecure we saw

More information

Online Sparsifying Transform Learning for Signal Processing

Online Sparsifying Transform Learning for Signal Processing Online Sparsifying ransform Learning for Signal Processing Saiprasad Ravishankar, Bihan Wen, and Yoram Bresler Deparmen of Elecrical and Compuer Engineering and he Coordinaed Science Laboraory Universiy

More information

Latent Spaces and Matrix Factorization

Latent Spaces and Matrix Factorization Compuaional Linguisics Laen Spaces and Marix Facorizaion Dierich Klakow FR 4.7 Allgemeine Linguisik (Compuerlinguisik) Universiä des Saarlandes Summer 0 Goal Goal: rea documen clusering and word clusering

More information

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1.

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1. Roboics I April 11, 017 Exercise 1 he kinemaics of a 3R spaial robo is specified by he Denavi-Harenberg parameers in ab 1 i α i d i a i θ i 1 π/ L 1 0 1 0 0 L 3 0 0 L 3 3 able 1: able of DH parameers of

More information

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin ACE 56 Fall 005 Lecure 4: Simple Linear Regression Model: Specificaion and Esimaion by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Simple Regression: Economic and Saisical Model

More information

Convergence of the Neumann series in higher norms

Convergence of the Neumann series in higher norms Convergence of he Neumann series in higher norms Charles L. Epsein Deparmen of Mahemaics, Universiy of Pennsylvania Version 1.0 Augus 1, 003 Absrac Naural condiions on an operaor A are given so ha he Neumann

More information

ELE 538B: Large-Scale Optimization for Data Science. Introduction. Yuxin Chen Princeton University, Spring 2018

ELE 538B: Large-Scale Optimization for Data Science. Introduction. Yuxin Chen Princeton University, Spring 2018 ELE 538B: Large-Scale Opimizaion for Daa Science Inroducion Yuxin Chen Princeon Universiy, Spring 2018 Surge of daa-inensive applicaions Widespread applicaions in large-scale daa science and learning 2.5

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

Numerical Dispersion

Numerical Dispersion eview of Linear Numerical Sabiliy Numerical Dispersion n he previous lecure, we considered he linear numerical sabiliy of boh advecion and diffusion erms when approimaed wih several spaial and emporal

More information

SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F. Trench. SIAM J. Matrix Anal. Appl. 11 (1990),

SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F. Trench. SIAM J. Matrix Anal. Appl. 11 (1990), SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F Trench SIAM J Marix Anal Appl 11 (1990), 601-611 Absrac Le T n = ( i j ) n i,j=1 (n 3) be a real symmeric

More information

Ensamble methods: Boosting

Ensamble methods: Boosting Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room

More information

A Hop Constrained Min-Sum Arborescence with Outage Costs

A Hop Constrained Min-Sum Arborescence with Outage Costs A Hop Consrained Min-Sum Arborescence wih Ouage Coss Rakesh Kawara Minnesoa Sae Universiy, Mankao, MN 56001 Email: Kawara@mnsu.edu Absrac The hop consrained min-sum arborescence wih ouage coss problem

More information

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems.

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems. di ernardo, M. (995). A purely adapive conroller o synchronize and conrol chaoic sysems. hps://doi.org/.6/375-96(96)8-x Early version, also known as pre-prin Link o published version (if available):.6/375-96(96)8-x

More information

Particle Swarm Optimization Combining Diversification and Intensification for Nonlinear Integer Programming Problems

Particle Swarm Optimization Combining Diversification and Intensification for Nonlinear Integer Programming Problems Paricle Swarm Opimizaion Combining Diversificaion and Inensificaion for Nonlinear Ineger Programming Problems Takeshi Masui, Masaoshi Sakawa, Kosuke Kao and Koichi Masumoo Hiroshima Universiy 1-4-1, Kagamiyama,

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates Biol. 356 Lab 8. Moraliy, Recruimen, and Migraion Raes (modified from Cox, 00, General Ecology Lab Manual, McGraw Hill) Las week we esimaed populaion size hrough several mehods. One assumpion of all hese

More information

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves Rapid Terminaion Evaluaion for Recursive Subdivision of Bezier Curves Thomas F. Hain School of Compuer and Informaion Sciences, Universiy of Souh Alabama, Mobile, AL, U.S.A. Absrac Bézier curve flaening

More information

Single and Double Pendulum Models

Single and Double Pendulum Models Single and Double Pendulum Models Mah 596 Projec Summary Spring 2016 Jarod Har 1 Overview Differen ypes of pendulums are used o model many phenomena in various disciplines. In paricular, single and double

More information

A DELAY-DEPENDENT STABILITY CRITERIA FOR T-S FUZZY SYSTEM WITH TIME-DELAYS

A DELAY-DEPENDENT STABILITY CRITERIA FOR T-S FUZZY SYSTEM WITH TIME-DELAYS A DELAY-DEPENDENT STABILITY CRITERIA FOR T-S FUZZY SYSTEM WITH TIME-DELAYS Xinping Guan ;1 Fenglei Li Cailian Chen Insiue of Elecrical Engineering, Yanshan Universiy, Qinhuangdao, 066004, China. Deparmen

More information

Class Meeting # 10: Introduction to the Wave Equation

Class Meeting # 10: Introduction to the Wave Equation MATH 8.5 COURSE NOTES - CLASS MEETING # 0 8.5 Inroducion o PDEs, Fall 0 Professor: Jared Speck Class Meeing # 0: Inroducion o he Wave Equaion. Wha is he wave equaion? The sandard wave equaion for a funcion

More information

The expectation value of the field operator.

The expectation value of the field operator. The expecaion value of he field operaor. Dan Solomon Universiy of Illinois Chicago, IL dsolom@uic.edu June, 04 Absrac. Much of he mahemaical developmen of quanum field heory has been in suppor of deermining

More information

CS376 Computer Vision Lecture 6: Optical Flow

CS376 Computer Vision Lecture 6: Optical Flow CS376 Compuer Vision Lecure 6: Opical Flow Qiing Huang Feb. 11 h 2019 Slides Credi: Krisen Grauman and Sebasian Thrun, Michael Black, Marc Pollefeys Opical Flow mage racking 3D compuaion mage sequence

More information

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate.

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate. Inroducion Gordon Model (1962): D P = r g r = consan discoun rae, g = consan dividend growh rae. If raional expecaions of fuure discoun raes and dividend growh vary over ime, so should he D/P raio. Since

More information