Online l1-Dictionary Learning with Application to Novel Document Detection


Online l1-Dictionary Learning with Application to Novel Document Detection

Shiva Prasad Kasiviswanathan (General Electric Global Research, kasivisw@gmail.com), Arindam Banerjee (University of Minnesota, banerjee@cs.umn.edu), Huahua Wang (University of Minnesota, huwang@cs.umn.edu), Prem Melville (IBM T.J. Watson Research Center, pmelvil@us.ibm.com)

Abstract

Given their pervasive use, social media such as Twitter have become a leading source of breaking news. A key task in the automated identification of such news is the detection of novel documents from a voluminous stream of text documents in a robust and scalable manner. Motivated by this challenge, we introduce the problem of online l1-dictionary learning where, unlike traditional dictionary learning which uses squared loss, the l1-penalty is used for measuring the reconstruction error. We present an efficient online algorithm for this problem based on the alternating directions method of multipliers, and establish a sublinear regret bound for this algorithm. Empirical results on news-stream and Twitter data show that this online l1-dictionary learning algorithm for novel document detection gives more than an order of magnitude speedup over the previously known batch algorithm, without any significant loss in quality of results. Our algorithm for online l1-dictionary learning could be of independent interest.

1 Introduction

The high volume and velocity of social media, such as blogs and Twitter, have propelled them to the forefront as sources of breaking news. On Twitter, it is possible to find the latest updates on diverse topics, from natural disasters to celebrity deaths, and identifying such emerging topics has many practical applications, such as in marketing, disease control, and national security [18]. The key challenge in automatic detection of breaking news is being able to detect novel documents in a stream of text, where a document is considered novel if it is unlike documents seen in the past. Recently, this has been made possible by dictionary learning, which has emerged as a powerful data representation framework. In dictionary learning, each data point y is represented as a sparse linear combination Ax of dictionary atoms, where A is the dictionary and x is a sparse vector [1, 16]. A dictionary learning approach can easily be converted into a novel document detection method: let A_t be a dictionary representing all documents seen up to time t; for a new document y arriving at time t+1, if one cannot find a sparse combination x of the dictionary atoms and the best reconstruction A_t x yields a large loss, then y is clearly not well represented by the dictionary A_t and is hence novel compared to the documents seen in the past. At the end of timestep t+1, the dictionary is updated to represent all the documents up to time t+1.

(Part of this work was done while the author was a postdoc at the IBM T.J. Watson Research Center. H. Wang and A. Banerjee were supported in part by an NSF CAREER grant, NSF grants, and a NASA grant.)

Kasiviswanathan et al. [14] presented such a batch dictionary learning approach for detecting novel documents/topics. They used an l1-penalty on the reconstruction error instead of the squared loss commonly used in the dictionary learning literature, as the l1-penalty has been found to be more effective for text analysis (see Section 3). They also showed that this approach outperforms other techniques, such as a nearest-neighbor approach popular in the related area of First Story Detection [20]. We build upon this work by proposing an efficient algorithm for online dictionary learning with an l1-penalty. Our online dictionary learning algorithm is based on the online alternating directions method recently proposed by Wang and Banerjee [26] to solve online composite optimization problems with additional linear equality constraints. Traditional online convex optimization methods such as [32, 11, 7, 8, 29] require explicit computation of the subgradient, making them computationally expensive to apply in our high-volume text setting, whereas in our algorithm the subgradients are computed implicitly. The algorithm has simple closed-form updates for all steps, yielding a fast and scalable algorithm for updating the dictionary. Under suitable assumptions to cope with the non-convexity of the dictionary learning problem, we establish an O(√T) regret bound for the objective, matching the regret bounds of existing methods [32, 7, 8, 29]. Using this online algorithm for l1-dictionary learning, we obtain an online algorithm for novel document detection, which we empirically validate on traditional news-streams as well as streaming data from Twitter. Experimental results show a substantial speedup over the batch l1-dictionary learning based approach of Kasiviswanathan et al. [14], without a loss of performance in detecting novel documents.

Related Work. Online convex optimization is an area of active research; for a detailed survey of the literature we refer the reader to [24]. Online dictionary learning was recently introduced by Mairal et al. [16], who showed that it provides a scalable approach for handling large dynamic datasets. They considered an l2-penalty and showed that their online algorithm converges to the minimum objective value in the stochastic case (i.e., with distributional assumptions on the data). However, the ideas proposed in [16] do not translate to the l1-penalty. The problem of novel document/topic detection was also addressed in recent work by Saha et al. [22], who proposed a non-negative matrix factorization based approach for capturing evolving and novel topics. However, their algorithm operates over a sliding time window (so it does not have online regret guarantees) and works only for the l2-penalty.

2 Preliminaries

Notation. Vectors are always column vectors and are denoted by boldface letters. For a matrix Z, its l1-norm is ‖Z‖_1 = Σ_{i,j} |z_{ij}| and its squared Frobenius norm is ‖Z‖_F^2 = Σ_{i,j} z_{ij}^2. For arbitrary real matrices the standard inner product is defined as ⟨Y, Z⟩ = Tr(Y^T Z). We use Ψ_max(Z) to denote the largest eigenvalue of Z^T Z. For a scalar r ∈ R, let sign(r) = 1 if r > 0, -1 if r < 0, and 0 if r = 0. Define soft(r, T) = sign(r) · max{|r| - T, 0}. The operators sign and soft are extended to a matrix by applying them to every entry of the matrix. 0_{m×n} denotes the all-zeros matrix of size m × n; the subscript is omitted when the dimension of the matrix is clear from the context.
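The sign and soft operators above appear in every ADMM update used later in the paper. As a minimal sketch (ours, not from the paper), the matrix version of soft can be written in NumPy as:

```python
import numpy as np

def soft(R, T):
    """Elementwise soft-thresholding: sign(r) * max(|r| - T, 0)."""
    return np.sign(R) * np.maximum(np.abs(R) - T, 0.0)

# Example:
# soft(np.array([[2.0, -0.3], [0.1, -1.5]]), 0.5)
# -> [[ 1.5,  0. ], [ 0. , -1. ]]
```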
Dictionary Learning Background. Dictionary learning is the problem of estimating a collection of basis vectors over which a given data collection can be accurately reconstructed, often with sparse encodings. It falls into a general category of techniques known as matrix factorization. Classic dictionary learning techniques for sparse representation (see [1, 19, 16] and references therein) consider a finite training set of signals P = [p_1, ..., p_n] ∈ R^{m×n} and optimize an empirical cost function f(A) = Σ_{i=1}^n l(p_i, A), where l(·,·) is a loss function such that l(p_i, A) is small if A is good at representing the signal p_i in a sparse fashion. Here, A ∈ R^{m×k} is referred to as the dictionary. In this paper, we use an l1-loss function with an l1-regularization term, so that

l(p_i, A) = min_x ‖p_i - Ax‖_1 + λ‖x‖_1,

where λ is the regularization parameter. We define the problem of dictionary learning as that of minimizing the empirical cost f(A). In other words, dictionary learning is the following optimization problem:

min_A f(A) = min_{A,X} ‖P - AX‖_1 + λ‖X‖_1,

where X = [x_1, ..., x_n] is the matrix of sparse codes. For maintaining interpretability of the results, we additionally require the A and X matrices to be non-negative.

To prevent A from becoming arbitrarily large (which would lead to arbitrarily small values of X), we add a scaling constraint on A as follows. Let 𝒜 be the convex set of matrices defined as

𝒜 = {A ∈ R^{m×k} : A ≥ 0_{m×k}, ‖A_j‖_1 ≤ 1 for all j = 1, ..., k},

where A_j is the jth column of A. We use Π_𝒜 to denote the Euclidean projection onto the nearest point in the convex set 𝒜. The resulting optimization problem can be written as

min_{A ∈ 𝒜, X ≥ 0} ‖P - AX‖_1 + λ‖X‖_1.    (1)

The optimization problem (1) is in general non-convex. But if one of the variables, either A or X, is known, the objective function with respect to the other variable becomes convex (in fact, it can be transformed into a linear program).

3 Novel Document Detection Using Dictionary Learning

In this section, we describe the problem of novel document detection and explain how dictionary learning can be used to tackle this problem. Our problem setup is similar to [14].

Novel Document Detection Task. We assume documents arrive in streams. Let {P_t : P_t ∈ R^{m_t×n_t}, t = 1, 2, 3, ...} denote a sequence of streaming matrices, where each column of P_t represents a document arriving at time t. Here, P_t is the term-document matrix observed at time t. Each document is represented in some conventional vector space model such as TF-IDF [17]. The time t could be at any granularity, e.g., it could be the day on which the document arrives. We use n_t to denote the number of documents arriving at time t. We normalize P_t so that each column (document) of P_t has unit l1-norm. For simplicity of exposition, we assume m_t = m for all t. (Footnote 1: As new documents come in and new terms are identified, we expand the vocabulary and zero-pad the previous matrices so that at the current time t, all previous and current documents have a representation over the same vocabulary space.) We use the notation P_[t] to denote the term-document matrix obtained by concatenating the matrices P_1, ..., P_t column-wise, i.e., P_[t] = [P_1 P_2 ... P_t]. Let N_t be the number of documents arriving at time ≤ t; then P_[t] ∈ R^{m×N_t}. Under this setup, the goal of novel document detection is to identify documents in P_{t+1} that are dissimilar to the documents in P_[t].

Sparse Coding to Detect Novel Documents. Let A_t ∈ R^{m×k} denote the dictionary matrix after time t, where the dictionary A_t is a good basis for representing all the documents in P_[t]. The exact construction of the dictionary is described later. Now, consider a document y ∈ R^m appearing at time t+1. We say that it admits a sparse representation over A_t if y can be well approximated as a linear combination of a few columns of A_t. Modeling a vector with such a sparse decomposition is known as sparse coding. In most practical situations it may not be possible to represent y exactly as A_t x, e.g., if y contains new words which are absent from A_t. In such cases, one can write y = A_t x + e, where e is an unknown noise vector. We consider the following sparse coding formulation:

l(y, A_t) = min_{x ≥ 0} ‖y - A_t x‖_1 + λ‖x‖_1.    (2)

The formulation (2) naturally takes into account both the reconstruction error (through the ‖y - A_t x‖_1 term) and the complexity of the sparse decomposition (through the ‖x‖_1 term). The reconstruction error measures the quality of the approximation, while the complexity is measured by the l1-norm of the optimal x. It is quite easy to transform (2) into a linear program; hence, it can be solved using a variety of methods. In our experiments, we use the alternating directions method of multipliers (ADMM) [3] to solve (2). ADMM has recently gathered significant attention in the machine learning community due to its wide applicability to a range of learning problems with complex objective functions [3] (see Appendix A for a brief background on ADMM).

We can use sparse coding to detect novel documents as follows. For each document y arriving at time t+1, we first solve (2) to check whether y can be well approximated as a sparse linear combination of the atoms of A_t. If the objective value l(y, A_t) is large, we mark the document as novel; otherwise we mark it as non-novel. Since we have normalized all documents in P_{t+1} to unit l1-length, the objective values are on the same scale.
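As noted above, (2) can be rewritten as a linear program by introducing slack variables u ≥ |y - A_t x| elementwise, giving min_{x ≥ 0, u ≥ 0} 1^T u + λ 1^T x subject to u ≥ y - A_t x and u ≥ A_t x - y. The following sketch (our own illustration, not the paper's implementation; it uses scipy.optimize.linprog and hypothetical function names) shows this reformulation together with the novelty test:

```python
import numpy as np
from scipy.optimize import linprog

def sparse_code_l1(y, A, lam):
    """Solve min_{x>=0} ||y - A x||_1 + lam*||x||_1 as a linear program."""
    m, k = A.shape
    c = np.concatenate([lam * np.ones(k), np.ones(m)])   # objective over z = [x; u]
    A_ub = np.block([[-A, -np.eye(m)],                    # u >= y - A x
                     [ A, -np.eye(m)]])                   # u >= A x - y
    b_ub = np.concatenate([-y, y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (k + m))
    x = res.x[:k]
    objective = np.abs(y - A @ x).sum() + lam * np.abs(x).sum()
    return x, objective

# A document y is marked novel when `objective` exceeds the threshold zeta.
```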
Choice of the Error Function. A very common choice of reconstruction error is the l2-penalty. In fact, in the presence of isotropic Gaussian noise, the l2-penalty on e = y - A_t x gives the maximum likelihood estimate of x [28, 30].

However, for text documents the noise vector e rarely satisfies the Gaussian assumption, as some of its coefficients contain large, impulsive values. For example, in fields such as politics and sports, a certain term may suddenly become dominant in a discussion [14]. In such cases, imposing an l1-penalty on the error is a better choice than imposing an l2-penalty (e.g., recent research [28, 31, 27] has successfully shown the superiority of the l1- over the l2-penalty in the different but related application domain of face recognition). We empirically validate the superiority of the l1-penalty for novel document detection in Section 5.

Size of the Dictionary. Ideally, in our application setting, changing the size of the dictionary (k) dynamically with t would lead to more efficient and effective sparse coding. However, in our theoretical analysis, we make the simplifying assumption that k is a constant independent of t. In our experiments, we allow for small increases in the size of the dictionary over time when required. The problem of designing an adaptive dictionary whose size automatically increases or decreases over time is an interesting open problem.

Batch Algorithm for Novel Document Detection. We now describe a simple batch algorithm (slightly modified from [14]) for detecting novel documents. Algorithm BATCH alternates between a novel document detection step and a batch dictionary learning step.

Algorithm 1: BATCH
Input: P_[t-1] ∈ R^{m×N_{t-1}}, P_t = [p_1, ..., p_{n_t}] ∈ R^{m×n_t}, A_t ∈ R^{m×k}, λ ≥ 0, ζ ≥ 0
Novel Document Detection Step:
  for j = 1 to n_t do
    Solve: x_j = argmin_{x ≥ 0} ‖p_j - A_t x‖_1 + λ‖x‖_1
    if ‖p_j - A_t x_j‖_1 + λ‖x_j‖_1 > ζ, mark p_j as novel
Batch Dictionary Learning Step:
  Set P_[t] ← [P_[t-1] p_1, ..., p_{n_t}]
  Solve: [A_{t+1}, X_[t]] = argmin_{A ∈ 𝒜, X ≥ 0} ‖P_[t] - AX‖_1 + λ‖X‖_1

Batch Dictionary Learning. We now describe the batch dictionary learning step. At time t, the dictionary learning step is

[A_{t+1}, X_[t]] = argmin_{A ∈ 𝒜, X ≥ 0} ‖P_[t] - AX‖_1 + λ‖X‖_1.    (3)

Even though conceptually simple, Algorithm BATCH is computationally inefficient. The bottleneck is the dictionary learning step: as t increases, so does the size of P_[t], and solving (3) becomes prohibitive even with efficient optimization techniques. To achieve computational efficiency, the authors of [14] solved an approximation of (3) in which the dictionary learning step only updates the A's and not the X's (see Footnote 3 below). This leads to faster running times, but because of the approximation the quality of the dictionary degrades over time and the performance of the algorithm decreases. In this paper, we propose an online learning algorithm for (3) and show that this online algorithm is both computationally efficient and generates good-quality dictionaries under reasonable assumptions.

4 Online l1-Dictionary Learning

In this section, we introduce the online l1-dictionary learning problem and propose an efficient algorithm for it. The standard goal of online learning is to design algorithms whose regret is sublinear in time T, since this implies that on average the algorithm performs as well as the best fixed strategy in hindsight [24]. Now consider the l1-dictionary learning problem defined in (3). Since this problem is non-convex, it may not be possible to design efficient (i.e., polynomial running time) algorithms that solve it without making any assumptions on either the dictionary A or the sparse code X.

(Footnote 2: In our algorithms, it is quite straightforward to replace the condition A ∈ 𝒜 by some other condition A ∈ C, where C is a closed non-empty convex set. Footnote 3: In particular, define recursively X̃_[t] = [X̃_[t-1] x_1, ..., x_{n_t}], where the x_j's come from the novel document detection step at time t.
In [14], the dictionary learning step is then A_{t+1} = argmin_{A ∈ 𝒜} ‖P_[t] - A X̃_[t]‖_1.)

This also means that it may not be possible to design an efficient online algorithm with sublinear regret without making any assumptions on either A or X, because an efficient online algorithm with sublinear regret would imply an efficient algorithm for solving (3) in the offline case. Therefore, we focus on obtaining regret bounds for the dictionary update, assuming that at each timestep the sparse codes given to the batch and online algorithms are close. This motivates the following problem.

Definition 4.1 (Online l1-Dictionary Learning Problem). At time t, the online algorithm picks Â_{t+1} ∈ 𝒜. Then nature (the adversary) reveals (P_{t+1}, X̂_{t+1}) with P_{t+1} ∈ R^{m×n} and X̂_{t+1} ∈ R^{k×n}. The problem is to pick the sequence of Â_{t+1}'s such that the following regret function is minimized (Footnote 4: For ease of presentation and analysis, we assume that m and n do not vary with time t. One could allow for changing m and n by carefully adjusting the size of the matrices by zero-padding.):

R(T) = Σ_{t=1}^T ‖P_t - Â_t X̂_t‖_1 - min_{A ∈ 𝒜} Σ_{t=1}^T ‖P_t - A X_t‖_1,

where X̂_t = X_t + E_t and E_t is an error matrix that may depend on t.

The regret defined above admits a discrepancy between the sparse coding matrices supplied to the batch and online algorithms, through the error matrices. The reason for this generality is that in our application setting, the sparse coding matrices used for updating the dictionaries of the batch and online algorithms could be different. We will later establish conditions on the E_t's under which we can achieve sublinear regret.

4.1 Online l1-Dictionary Algorithm

In this section, we design an algorithm for the online l1-dictionary learning problem, which we call Online Inexact ADMM (OIADMM), and bound its regret. (Footnote 5: The name OIADMM reflects the fact that the algorithm is based on the alternating directions method of multipliers (ADMM) procedure; see Appendix A for a brief background on ADMM.) First, note that because of the non-smooth l1-norms involved, it is computationally expensive to apply standard online learning algorithms like online gradient descent [32, 11], COMID [8], FOBOS [7], and RDA [29], as they require computing a costly subgradient at every iteration (the subgradient of ‖P_t - A X_t‖_1 at A = Ā is -sign(P_t - Ā X_t) X_t^T). Our algorithm for online l1-dictionary learning is based on the online alternating direction method recently proposed by Wang and Banerjee [26]. It first performs a simple variable substitution by introducing an equality constraint. The update for each of the resulting variables has a closed-form solution, without the need to estimate subgradients explicitly.

Algorithm 2: OIADMM
Input: P_t ∈ R^{m×n}, Â_t ∈ R^{m×k}, Δ_t ∈ R^{m×n}, X̂_t ∈ R^{k×n}, β_t ≥ 0, τ_t ≥ 0
  Γ̃_t ← P_t - Â_t X̂_t
  Γ_{t+1} ← argmin_Γ ‖Γ‖_1 + ⟨Δ_t, Γ̃_t - Γ⟩ + (β_t/2) ‖Γ - Γ̃_t‖_F^2, i.e., Γ_{t+1} ← soft(Γ̃_t + Δ_t/β_t, 1/β_t)
  G_{t+1} ← -(Δ_t/β_t + Γ̃_t - Γ_{t+1}) X̂_t^T
  Â_{t+1} ← argmin_{A ∈ 𝒜} β_t ⟨G_{t+1}, A - Â_t⟩ + (β_t/(2τ_t)) ‖A - Â_t‖_F^2, i.e., Â_{t+1} ← Π_𝒜(max{0, Â_t - τ_t G_{t+1}})
  Δ_{t+1} ← Δ_t + β_t (P_t - Â_{t+1} X̂_t - Γ_{t+1})
  Return Â_{t+1} and Δ_{t+1}

Algorithm OIADMM is simple. Consider the following minimization problem at time t:

min_{A ∈ 𝒜} ‖P_t - A X̂_t‖_1.

We can rewrite this minimization problem as

min_{A ∈ 𝒜, Γ} ‖Γ‖_1  such that  P_t - A X̂_t = Γ.    (4)
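Written out in NumPy, one OIADMM step is only a few lines; the sketch below is our own rendering of Algorithm 2 under the notation above (soft is the soft-thresholding operator from Section 2, and project_onto_A is a placeholder for the projection Π_𝒜 onto the set of non-negative matrices with column l1-norm at most 1):

```python
import numpy as np

def oiadmm_step(P_t, A_hat, Delta, X_hat, beta, tau, project_onto_A):
    """One inexact ADMM step of Algorithm 2; returns (A_hat_next, Delta_next)."""
    Gamma_tilde = P_t - A_hat @ X_hat
    # Gamma-update: closed-form soft-thresholding
    Gamma_next = soft(Gamma_tilde + Delta / beta, 1.0 / beta)
    # Gradient of 0.5*||P_t - A X_hat - Gamma_next + Delta/beta||_F^2 at A = A_hat
    G_next = -(Delta / beta + Gamma_tilde - Gamma_next) @ X_hat.T
    # Inexact (proximal gradient) dictionary update with projection onto the constraint set
    A_next = project_onto_A(np.maximum(A_hat - tau * G_next, 0.0))
    # Dual update
    Delta_next = Delta + beta * (P_t - A_next @ X_hat - Gamma_next)
    return A_next, Delta_next
```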

The augmented Lagrangian of (4) is

L(A, Γ, Δ) = ‖Γ‖_1 + ⟨Δ, P_t - A X̂_t - Γ⟩ + (β_t/2) ‖P_t - A X̂_t - Γ‖_F^2,    (5)

where Δ ∈ R^{m×n} is a multiplier and β_t > 0 is a penalty parameter.

OIADMM is summarized in Algorithm 2. The algorithm generates a sequence of iterates {Γ_t, Â_t, Δ_t}. At each time t, instead of solving (4) completely, it runs only one step of the ADMM updates of the variables Γ, A, Δ. Let Γ̃_t = P_t - Â_t X̂_t. The update steps are as follows.

(i) First, for fixed A = Â_t and Δ = Δ_t, the Γ that minimizes (5) is obtained by solving

argmin_Γ ‖Γ‖_1 + ⟨Δ_t, Γ̃_t - Γ⟩ + (β_t/2) ‖Γ - Γ̃_t‖_F^2.

The minimizer of this problem is set as Γ_{t+1}.

(ii) Using Γ = Γ_{t+1} and Δ = Δ_t, a simple manipulation shows that the A minimizing (5) can be obtained by solving

min_{A ∈ 𝒜} (β_t/2) ‖P_t - A X̂_t - Γ_{t+1} + Δ_t/β_t‖_F^2.    (6)

Instead of solving (6) exactly, we approximate it by

min_{A ∈ 𝒜} β_t ⟨G_{t+1}, A - Â_t⟩ + (β_t/(2τ_t)) ‖A - Â_t‖_F^2,

where τ_t > 0 is a proximal parameter and G_{t+1} is the gradient of (1/2) ‖P_t - A X̂_t - Γ_{t+1} + Δ_t/β_t‖_F^2 at A = Â_t. This approach belongs to the class of proximal gradient methods in optimization [25, 31]. The minimizer of this problem is set as Â_{t+1}.

(iii) Update Δ as Δ_{t+1} = Δ_t + β_t (P_t - Â_{t+1} X̂_t - Γ_{t+1}).

Equality Constraint Violation. OIADMM may temporarily violate the equality constraint in (4), but it satisfies the constraint on average in the long run. More formally, at each time t it could happen that the Â_{t+1} and Γ_{t+1} produced by OIADMM satisfy P_t - Â_{t+1} X̂_t ≠ Γ_{t+1}. However, we show (see Theorem 4.7) that the algorithm has the property that

Σ_{t=1}^T ‖Γ_{t+1} - (P_t - Â_{t+1} X̂_t)‖_F^2 = O(√T),

which implies that, over time, the equality constraint in (4) is satisfied on average.

4.2 Analysis of OIADMM

First, let us recap the OIADMM update rules:

Γ_{t+1} = argmin_Γ ‖Γ‖_1 + ⟨Δ_t, Γ̃_t - Γ⟩ + (β_t/2) ‖Γ - Γ̃_t‖_F^2,    (7)
Â_{t+1} = argmin_{A ∈ 𝒜} β_t ⟨G_{t+1}, A - Â_t⟩ + (β_t/(2τ_t)) ‖A - Â_t‖_F^2,    (8)
Δ_{t+1} = Δ_t + β_t (P_t - Â_{t+1} X̂_t - Γ_{t+1}).    (9)

Let A_op be the optimum solution to the batch problem

min_{A ∈ 𝒜} Σ_{t=1}^T ‖P_t - A X_t‖_1.

Let Γ̃_t = P_t - Â_t X̂_t and Γ̄_t = P_t - Â_{t+1} X̂_t. For any A ∈ 𝒜, let Γ_t = P_t - A X̂_t. The lemmas below hold for any A ∈ 𝒜, so in particular they hold for A set to A_op.
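The closed form used in step (i) (and quoted in Algorithm 2) follows by completing the square in (7): since ⟨Δ_t, Γ̃_t - Γ⟩ + (β_t/2)‖Γ - Γ̃_t‖_F^2 = (β_t/2)‖Γ - (Γ̃_t + Δ_t/β_t)‖_F^2 + const, the Γ-update is the proximal operator of (1/β_t)‖·‖_1, which acts entrywise:

Γ_{t+1} = argmin_Γ ‖Γ‖_1 + (β_t/2) ‖Γ - (Γ̃_t + Δ_t/β_t)‖_F^2 = soft(Γ̃_t + Δ_t/β_t, 1/β_t).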

Proof Flow. Although the algorithm is relatively simple, the analysis is somewhat involved. Define Γ_t^op = P_t - A_op X_t. The regret of OIADMM is then

R(T) = Σ_{t=1}^T (‖Γ̃_t‖_1 - ‖Γ_t^op‖_1).

We split the proof into three technical lemmas. We first upper bound ⟨Δ_t, Γ̄_t - Γ_t⟩ (Lemma 4.3), and use it to bound ‖Γ_{t+1}‖_1 - ‖Γ_t‖_1 (Lemma 4.4). In the proof of Lemma 4.5, we bound ‖Γ̃_t‖_1 - ‖Γ_{t+1}‖_1, and this, when added to the bound on ‖Γ_{t+1}‖_1 - ‖Γ_t‖_1 from Lemma 4.4, gives a bound on ‖Γ̃_t‖_1 - ‖Γ_t‖_1. The proof of the regret bound then uses a canceling telescoping sum on the bound on ‖Γ̃_t‖_1 - ‖Γ_t‖_1.

We use the following simple identity in our proofs.

Lemma 4.2. For matrices M_1, M_2, M_3, M_4 ∈ R^{m×n},

2 ⟨M_1 - M_2, M_3 - M_4⟩ = ‖M_1 - M_4‖_F^2 + ‖M_2 - M_3‖_F^2 - ‖M_1 - M_3‖_F^2 - ‖M_2 - M_4‖_F^2.

Lemma 4.3. Let {Γ_t, Â_t, Δ_t} be the sequences generated by the OIADMM procedure. For any A ∈ 𝒜, we have

⟨Δ_t, Γ̄_t - Γ_t⟩ ≤ (β_t/(2τ_t)) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2) - (β_t/2) (‖Γ_{t+1} - Γ̄_t‖_F^2 - ‖Γ_{t+1} - Γ_t‖_F^2 + ‖Γ̃_t - Γ_t‖_F^2) + (β_t/2) (Ψ_max(X̂_t) - 1/τ_t) ‖Â_{t+1} - Â_t‖_F^2.

Proof. For any A ∈ 𝒜, (8) is equivalent to the following variational inequality [21]:

⟨β_t G_{t+1} + (β_t/τ_t) (Â_{t+1} - Â_t), A - Â_{t+1}⟩ ≥ 0.    (10)

Using Γ̄_t = P_t - Â_{t+1} X̂_t and substituting for G_{t+1}, we have

β_t ⟨G_{t+1}, A - Â_{t+1}⟩ = -β_t ⟨Δ_t/β_t + Γ̃_t - Γ_{t+1}, (A - Â_{t+1}) X̂_t⟩ = β_t ⟨Δ_t/β_t + Γ̃_t - Γ_{t+1}, Γ_t - Γ̄_t⟩ = ⟨Δ_t, Γ_t - Γ̄_t⟩ + β_t ⟨Γ̃_t - Γ_{t+1}, Γ_t - Γ̄_t⟩,    (11)

where we used (A - Â_{t+1}) X̂_t = Γ̄_t - Γ_t. Substituting (11) into (10) and rearranging the terms yields

⟨Δ_t, Γ̄_t - Γ_t⟩ ≤ β_t ⟨Γ̃_t - Γ_{t+1}, Γ_t - Γ̄_t⟩ + (β_t/τ_t) ⟨Â_{t+1} - Â_t, A - Â_{t+1}⟩.    (12)

By Lemma 4.2, the first term on the right-hand side can be rewritten as

⟨Γ̃_t - Γ_{t+1}, Γ_t - Γ̄_t⟩ = (1/2) (‖Γ̃_t - Γ̄_t‖_F^2 + ‖Γ_{t+1} - Γ_t‖_F^2 - ‖Γ̃_t - Γ_t‖_F^2 - ‖Γ_{t+1} - Γ̄_t‖_F^2).    (13)

Substituting the definitions of Γ̃_t and Γ̄_t, we have

‖Γ̃_t - Γ̄_t‖_F^2 = ‖(Â_{t+1} - Â_t) X̂_t‖_F^2 ≤ Ψ_max(X̂_t) ‖Â_{t+1} - Â_t‖_F^2.    (14)

(Recall that Ψ_max(X̂_t) is the maximum eigenvalue of X̂_t^T X̂_t.) Using Lemma 4.2 again, the second term on the right-hand side of (12) is

⟨Â_{t+1} - Â_t, A - Â_{t+1}⟩ = (1/2) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2 - ‖Â_{t+1} - Â_t‖_F^2).    (15)

Combining (12), (13), (14), and (15) gives the desired bound.

Lemma 4.4. Let {Γ_t, Â_t, Δ_t} be the sequences generated by the OIADMM procedure. For any A ∈ 𝒜, we have

‖Γ_{t+1}‖_1 - ‖Γ_t‖_1 ≤ (1/(2β_t)) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (β_t/(2τ_t)) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2) - (β_t/2) (1/τ_t - Ψ_max(X̂_t)) ‖Â_{t+1} - Â_t‖_F^2 - (β_t/2) ‖Γ̃_t - Γ_{t+1}‖_F^2.

Proof. Since Γ_{t+1} is a minimizer of (7), we have

0_{m×n} ∈ ∂‖Γ_{t+1}‖_1 - Δ_t - β_t (Γ̃_t - Γ_{t+1}).

Rearranging the terms gives Δ_t + β_t (Γ̃_t - Γ_{t+1}) ∈ ∂‖Γ_{t+1}‖_1. Since ‖·‖_1 is a convex function, ‖Γ_t‖_1 ≥ ‖Γ_{t+1}‖_1 + ⟨Δ_t + β_t (Γ̃_t - Γ_{t+1}), Γ_t - Γ_{t+1}⟩, i.e.,

‖Γ_{t+1}‖_1 - ‖Γ_t‖_1 ≤ ⟨Δ_t, Γ_{t+1} - Γ_t⟩ + β_t ⟨Γ̃_t - Γ_{t+1}, Γ_{t+1} - Γ_t⟩.    (16)

Using Lemma 4.2, the last term can be rewritten as

β_t ⟨Γ̃_t - Γ_{t+1}, Γ_{t+1} - Γ_t⟩ = (β_t/2) (‖Γ̃_t - Γ_t‖_F^2 - ‖Γ̃_t - Γ_{t+1}‖_F^2 - ‖Γ_{t+1} - Γ_t‖_F^2).    (17)

Writing ⟨Δ_t, Γ_{t+1} - Γ_t⟩ = ⟨Δ_t, Γ_{t+1} - Γ̄_t⟩ + ⟨Δ_t, Γ̄_t - Γ_t⟩ and combining the inequality of Lemma 4.3 with (17) gives

⟨Δ_t, Γ_{t+1} - Γ_t⟩ + β_t ⟨Γ̃_t - Γ_{t+1}, Γ_{t+1} - Γ_t⟩ ≤ ⟨Δ_t, Γ_{t+1} - Γ̄_t⟩ - (β_t/2) ‖Γ_{t+1} - Γ̄_t‖_F^2 + (β_t/(2τ_t)) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2) - (β_t/2) (1/τ_t - Ψ_max(X̂_t)) ‖Â_{t+1} - Â_t‖_F^2 - (β_t/2) ‖Γ̃_t - Γ_{t+1}‖_F^2.    (18)

Since Γ_{t+1} - Γ̄_t = (Δ_t - Δ_{t+1})/β_t, we have

⟨Δ_t, Γ_{t+1} - Γ̄_t⟩ - (β_t/2) ‖Γ_{t+1} - Γ̄_t‖_F^2 = (1/β_t) ⟨Δ_t, Δ_t - Δ_{t+1}⟩ - (1/(2β_t)) ‖Δ_t - Δ_{t+1}‖_F^2 = (1/(2β_t)) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2).    (19)

Plugging (18) and (19) into (16) yields the result.

Lemma 4.5. Let {Γ_t, Â_t, Δ_t} be the sequences generated by the OIADMM procedure, and let Λ_t ∈ ∂‖Γ̃_t‖_1. If τ_t satisfies 1/τ_t ≥ 2Ψ_max(X̂_t), then

‖Γ̃_t‖_1 - ‖Γ_t‖_1 ≤ (1/(2β_t)) ‖Λ_t‖_F^2 + (1/(2β_t)) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (β_t/(2τ_t)) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2).

Proof. Let Λ_t ∈ ∂‖Γ̃_t‖_1. By convexity, ‖Γ̃_t‖_1 - ‖Γ_{t+1}‖_1 ≤ ⟨Λ_t, Γ̃_t - Γ_{t+1}⟩. Now,

⟨Λ_t, Γ̃_t - Γ_{t+1}⟩ = ⟨Λ_t/√β_t, √β_t (Γ̃_t - Γ_{t+1})⟩ ≤ (1/(2β_t)) ‖Λ_t‖_F^2 + (β_t/2) ‖Γ̃_t - Γ_{t+1}‖_F^2.

Therefore,

‖Γ̃_t‖_1 - ‖Γ_{t+1}‖_1 ≤ (1/(2β_t)) ‖Λ_t‖_F^2 + (β_t/2) ‖Γ̃_t - Γ_{t+1}‖_F^2.    (20)

Adding (20) and the inequality of Lemma 4.4 together, we get

‖Γ̃_t‖_1 - ‖Γ_t‖_1 ≤ (1/(2β_t)) ‖Λ_t‖_F^2 + (1/(2β_t)) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (β_t/(2τ_t)) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2) - (β_t/2) (1/τ_t - Ψ_max(X̂_t)) ‖Â_{t+1} - Â_t‖_F^2.

Setting 1/τ_t ≥ 2Ψ_max(X̂_t) ensures that

(β_t/2) (1/τ_t - Ψ_max(X̂_t)) ‖Â_{t+1} - Â_t‖_F^2 ≥ 0.

Therefore,

‖Γ̃_t‖_1 - ‖Γ_t‖_1 ≤ (1/(2β_t)) ‖Λ_t‖_F^2 + (1/(2β_t)) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (β_t/(2τ_t)) (‖A - Â_t‖_F^2 - ‖A - Â_{t+1}‖_F^2),

which completes the proof.

Theorem 4.6. Let {Γ_t, Â_t, Δ_t} be the sequences generated by the OIADMM procedure and let R(T) be the regret defined above. Assume the following conditions hold: (i) for all t, every subgradient Λ_t ∈ ∂‖Γ̃_t‖_1 satisfies ‖Λ_t‖_F ≤ Φ; (ii) Â_1 = 0_{m×k} and ‖A_op‖_F ≤ D; (iii) Δ_1 = 0_{m×n}; and (iv) for all t, 1/τ_t ≥ 2Ψ_max(X̂_t). Setting β_t = β = (Φ/D) √(T/τ_m), where τ_m = max_t {1/τ_t}, we have

R(T) ≤ Φ D √(τ_m T) + Σ_{t=1}^T ‖A_op E_t‖_1.

Proof. Substitute Γ̂_t^op = P_t - A_op X̂_t for Γ_t and A_op for A in Lemma 4.5 and set β_t = β = (Φ/D) √(T/τ_m). Summing over t and using conditions (i)-(iv),

Σ_{t=1}^T (‖Γ̃_t‖_1 - ‖Γ̂_t^op‖_1)
 ≤ Σ_{t=1}^T [ (1/(2β)) ‖Λ_t‖_F^2 + (1/(2β)) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (β/(2τ_t)) (‖A_op - Â_t‖_F^2 - ‖A_op - Â_{t+1}‖_F^2) ]
 ≤ T Φ^2/(2β) + (1/(2β)) ‖Δ_1‖_F^2 + (β τ_m/2) ‖A_op - Â_1‖_F^2
 ≤ (Φ D/2) √(τ_m T) + 0 + (Φ D/2) √(τ_m T) = Φ D √(τ_m T).

Since Γ_t^op = P_t - A_op X_t = P_t - A_op X̂_t + A_op E_t = Γ̂_t^op + A_op E_t, we have ‖Γ̂_t^op‖_1 ≥ ‖Γ_t^op‖_1 - ‖A_op E_t‖_1. The regret is therefore bounded as

R(T) = Σ_{t=1}^T (‖Γ̃_t‖_1 - ‖Γ_t^op‖_1) ≤ Φ D √(τ_m T) + Σ_{t=1}^T ‖A_op E_t‖_1.

Remark. In the above theorem one could replace τ_m by any upper bound on it, i.e., we do not need to know τ_m exactly.

Condition on the E_t's for Sublinear Regret. In a standard online learning setting, the (P_t, X̂_t) made available to the online learning algorithm is the same as the (P_t, X_t) made available to the batch dictionary learning algorithm in hindsight, so that X̂_t = X_t (E_t = 0), yielding an O(√T) regret. More generally, as long as Σ_{t=1}^T ‖E_t‖_p = o(T) for some suitable p-norm, we get a sublinear regret bound. (Footnote 6: This follows from Hölder's inequality, which gives Σ_{t=1}^T ‖A_op E_t‖_1 ≤ ‖A_op‖_q Σ_{t=1}^T ‖E_t‖_p for p, q with 1/p + 1/q = 1, assuming ‖A_op‖_q is bounded; here ‖·‖_p denotes the Schatten p-norm.) For example, if {Z_t} is a sequence of matrices such that ‖Z_t‖_p = O(1) for all t, then setting E_t = ε_t Z_t with a decaying sequence ε_t > 0 (e.g., ε_t = t^{-ε} for some ε > 0) yields a sublinear regret.
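To see the sublinearity concretely in the example above (a short check, ours): if ‖Z_t‖_p ≤ c for all t and E_t = t^{-ε} Z_t with ε ∈ (0, 1), then

Σ_{t=1}^T ‖E_t‖_p ≤ c Σ_{t=1}^T t^{-ε} ≤ c (1 + ∫_1^T s^{-ε} ds) = O(T^{1-ε}) = o(T),

so by Footnote 6 the additional regret term Σ_t ‖A_op E_t‖_1 is o(T) and the overall regret remains sublinear.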

This gives a sufficient condition for sublinear regret (Footnote 7: In a different context, a similar assumption on the rate of error decay appeared in a recent paper by Schmidt et al. [23] analyzing the convergence rates of inexact proximal-gradient methods.), and it is an interesting open problem to extend the analysis to other cases.

As mentioned in Section 4.1, OIADMM can violate the equality constraint at each t, i.e., P_t - Â_{t+1} X̂_t ≠ Γ_{t+1}. However, we show in Theorem 4.7 that the accumulated loss caused by the violation of the equality constraint is sublinear in T, i.e., the equality constraint is satisfied on average in the long run.

Theorem 4.7. Let {Γ_t, Â_t, Δ_t} be the sequences generated by the OIADMM procedure. Assume the following conditions hold: (i) for all t, every subgradient Λ_t ∈ ∂‖Γ̃_t‖_1 satisfies ‖Λ_t‖_F ≤ Φ; (ii) Â_1 = 0_{m×k} and ‖A_op‖_F ≤ D; (iii) Δ_1 = 0_{m×n}; and (iv) for all t, 1/τ_t ≥ 2Ψ_max(X̂_t). Setting β_t = β = (Φ/D) √(T/τ_m), where τ_m = max_t {1/τ_t}, we have

Σ_{t=1}^T ‖Γ_{t+1} - (P_t - Â_{t+1} X̂_t)‖_F^2 ≤ 2 D^2 τ_m + (4 Υ D/Φ) √(τ_m T),

where Υ is an upper bound on ‖P_t - A_op X̂_t‖_1 over all t.

Proof. Recall Γ̄_t = P_t - Â_{t+1} X̂_t and consider ‖Γ_{t+1} - Γ̄_t‖_F^2. We have

‖Γ_{t+1} - Γ̄_t‖_F^2 = ‖(Γ_{t+1} - Γ̃_t) + (Γ̃_t - Γ̄_t)‖_F^2 ≤ 2 ‖Γ_{t+1} - Γ̃_t‖_F^2 + 2 ‖Γ̃_t - Γ̄_t‖_F^2 ≤ 2 ‖Γ_{t+1} - Γ̃_t‖_F^2 + 2 Ψ_max(X̂_t) ‖Â_{t+1} - Â_t‖_F^2.    (21)

For the first inequality we used the simple fact that for any two matrices M_1 and M_2, ‖M_1 + M_2‖_F^2 ≤ 2(‖M_1‖_F^2 + ‖M_2‖_F^2); the second inequality follows from (14).

Next, since ‖Γ_{t+1}‖_1 ≥ 0, we have ‖Γ_{t+1}‖_1 - ‖Γ̂_t^op‖_1 ≥ -‖Γ̂_t^op‖_1 ≥ -Υ, where Γ̂_t^op = P_t - A_op X̂_t. Using this and rearranging the terms in the inequality of Lemma 4.4 (with A_op in place of A and Γ̂_t^op in place of Γ_t) gives

‖Γ̃_t - Γ_{t+1}‖_F^2 ≤ (2Υ)/β_t + (1/β_t^2) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (1/τ_t) (‖A_op - Â_t‖_F^2 - ‖A_op - Â_{t+1}‖_F^2) - (1/τ_t - Ψ_max(X̂_t)) ‖Â_{t+1} - Â_t‖_F^2.

Plugging this into (21) yields

‖Γ_{t+1} - Γ̄_t‖_F^2 ≤ (4Υ)/β_t + (2/β_t^2) (‖Δ_t‖_F^2 - ‖Δ_{t+1}‖_F^2) + (2/τ_t) (‖A_op - Â_t‖_F^2 - ‖A_op - Â_{t+1}‖_F^2) - 2 (1/τ_t - 2Ψ_max(X̂_t)) ‖Â_{t+1} - Â_t‖_F^2.

Letting 1/τ_t ≥ 2Ψ_max(X̂_t), summing over t from 1 to T, and simplifying the resulting expression (using β_t = β = (Φ/D)√(T/τ_m), Δ_1 = 0, Â_1 = 0, and ‖A_op‖_F ≤ D), we get

Σ_{t=1}^T ‖Γ_{t+1} - Γ̄_t‖_F^2 ≤ 2 D^2 τ_m + (4 Υ D/Φ) √(τ_m T).

Again, as was the case with Theorem 4.6, we could replace τ_m in the above theorem by any upper bound on it.

Running Time. For the ith column of the dictionary matrix, the projection onto 𝒜 can be done in O(s_i log m) time, where s_i is the number of non-zero elements in the ith column, using the projection-onto-the-l1-ball algorithm of Duchi et al. [6]. The simplest implementation of OIADMM takes O(mnk) time at each timestep because of the matrix multiplications involved. However, in practice we can exploit the sparsity of the matrices to make the algorithm run much faster. OIADMM is also memory efficient: at each time t, other than the current iterates, it only needs Â_t from the previous timestep.
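The projection Π_𝒜 in the running-time discussion decomposes column-by-column: each dictionary column is projected onto {a : a ≥ 0, ‖a‖_1 ≤ 1}. The sketch below is our own dense O(m log m) version (it clips to the non-negative orthant and, if needed, projects onto the simplex by the standard sorting construction, in the spirit of Duchi et al. [6]; the sparsity-exploiting variant mentioned above is not shown):

```python
import numpy as np

def project_column(a):
    """Euclidean projection of a vector onto {x : x >= 0, ||x||_1 <= 1}."""
    v = np.maximum(a, 0.0)
    if v.sum() <= 1.0:
        return v
    # Project onto the simplex {x >= 0, sum(x) = 1} via the sorting-based rule
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def project_onto_A(A):
    """Column-wise projection of a dictionary matrix onto the constraint set."""
    return np.apply_along_axis(project_column, 0, A)
```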

5 Experimental Results

In this section, we present experiments to compare and contrast the performance of the l1-batch and l1-online dictionary learning algorithms for the task of novel document detection. We also present results highlighting the superiority of using an l1- over an l2-penalty on the reconstruction error for this task, validating the discussion in Section 3.

Implementation of BATCH. In our implementation, we grow the dictionary size by η in each timestep. Growing the dictionary size is essential for the batch algorithm because, as t increases, the number of columns of P_[t] also increases, and therefore a larger dictionary is required to compactly represent all the documents in P_[t]. For solving (3), we use alternating minimization over the variables. The complete pseudo-code is given as Algorithm BATCH-IMPL (see Appendix B). The optimization problems arising in the sparse coding and dictionary learning steps are solved using ADMM.

Online Algorithm for Novel Document Detection. Our online algorithm (Algorithm 3) uses the same novel document detection step as Algorithm BATCH, but dictionary learning is done using OIADMM. (Footnote 8: In our experiments, the number of documents introduced in each timestep is almost of the same order, and hence there is no need to change the size of the dictionary across timesteps for the online algorithm.)

Algorithm 3
Input: P_t = [p_1, ..., p_{n_t}] ∈ R^{m×n_t}, Â_t ∈ R^{m×k}, Δ_t ∈ R^{m×n_t}, λ ≥ 0, ζ ≥ 0, β_t ≥ 0, τ_t ≥ 0
Novel Document Detection Step:
  for j = 1 to n_t do
    Solve: x_j = argmin_{x ≥ 0} ‖p_j - Â_t x‖_1 + λ‖x‖_1
    if ‖p_j - Â_t x_j‖_1 + λ‖x_j‖_1 > ζ, mark p_j as novel
Online Dictionary Learning Step:
  Set X̂_t ← [x_1, ..., x_{n_t}]
  (Â_{t+1}, Δ_{t+1}) ← OIADMM(P_t, Â_t, Δ_t, X̂_t, β_t, τ_t)

(Footnote 9: Before invoking Algorithm OIADMM we may have to zero-pad the matrices in the arguments appropriately.) Notice that the sparse coding matrices of Algorithm BATCH, X_1, ..., X_t, could be different from X̂_1, ..., X̂_t. If these sequences of matrices are close to each other, then we have a sublinear regret on the objective function. (Footnote 10: As noted earlier, we cannot do such a comparison without making any assumptions.)

Evaluation of Novel Document Detection. For performance evaluation, we assume that the documents in the corpus have been manually identified with a set of topics. For simplicity, we assume that each document is tagged with the single most dominant topic it is associated with, which we call the true topic of that document. We call a document y arriving at time t novel if the true topic of y has not appeared before time t. So at time t, given a set of documents, the task of novel document detection is to classify each document as either novel (positive) or non-novel (negative). For evaluating this classification task, we use the standard Area Under the ROC Curve (AUC) [17].

Performance Evaluation for l1-Dictionary Learning. We use a simple reconstruction error measure for comparing the dictionaries produced by our l1-batch and l1-online algorithms. We want the dictionary at time t to be a good basis for representing all the documents in P_[t] ∈ R^{m×N_t}. This leads us to define the sparse reconstruction error (SRE) of a dictionary A at time t as

SRE(A) = (1/N_t) min_{X ≥ 0} (‖P_[t] - AX‖_1 + λ‖X‖_1).

A dictionary with a smaller SRE is better, on average, at sparsely representing the documents in P_[t].
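Because the SRE objective decomposes over the columns of P_[t], it can be computed by sparse coding one document at a time against the fixed dictionary. A sketch (ours, reusing the hypothetical sparse_code_l1 solver from the Section 3 listing):

```python
import numpy as np

def sre(A, P_hist, lam):
    """Sparse reconstruction error of dictionary A over all past documents (columns of P_hist)."""
    N = P_hist.shape[1]
    total = 0.0
    for j in range(N):
        _, obj = sparse_code_l1(P_hist[:, j], A, lam)  # ||p_j - A x_j||_1 + lam*||x_j||_1
        total += obj
    return total / N
```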

Novel Document Detection Using l2-Dictionary Learning. To justify the choice of an l1-penalty on the reconstruction error for novel document detection, we performed experiments comparing the l1- and l2-penalties for this task. In the l2-setting, for the sparse coding step we used a fast implementation of the LARS algorithm with positivity constraints [9], and the dictionary learning was done by solving a non-negative matrix factorization problem with additional sparsity constraints (also known as the non-negative sparse coding problem [13]). (Footnote 11: We used the SPAMS package, http://spams-devel.gforge.inria.fr/, in our implementation.) A pseudo-code description is given in Appendix B.

Experimental Setup. All reported results are based on a Matlab implementation running on a quad-core 2.33 GHz Intel processor with 32GB RAM. The parameters of our l1-online dictionary learning algorithm are: (i) the initial size of the dictionary, (ii) the regularization parameter λ, (iii) the OIADMM parameters β and τ, and (iv) the ADMM parameters for sparse coding. The l1-batch and l2-batch dictionary learning algorithms take an additional parameter η, which describes the increase in the batch dictionary size in each timestep. The regularization parameter λ is set to 0.1, which yields reasonable sparsities in our experiments. The OIADMM parameter τ_t is set to 1/(2Ψ_max(X̂_t)) (chosen according to Theorem 4.6) and β is fixed to 5 (obtained through tuning). The ADMM parameters for sparse coding and batch dictionary learning are set as suggested in [14] (see Appendix A). In the batch algorithms, we grow the dictionary sizes by η = 10 in each timestep. The threshold value ζ is treated as a tunable parameter.

5.1 Experiments on News Streams

Our first dataset is drawn from the NIST Topic Detection and Tracking (TDT2) corpus, which consists of news stories from the first half of 1998. In our evaluation, we used a set of 9000 documents represented over 9528 terms and distributed into the top 30 TDT2 human-labeled topics over a period of 27 weeks. We introduce the documents in groups. At timestep 0, we introduce the first 1000 documents, and these documents are used for initializing the dictionary. We use an alternating minimization procedure over the variables of (1) to initialize the dictionary. In these experiments the size of the initial dictionary is k = 200. In each subsequent timestep t ∈ {1, ..., 8}, we provide the batch and online algorithms with the same set of 1000 documents. In Figure 1, we present novel document detection results for those timesteps where at least one novel document was introduced. Table 1 shows the corresponding AUC numbers. The results show that using an l1-penalty on the reconstruction error is better for novel document detection than using an l2-penalty.

[Figure 1: ROC curves (true positive rate vs. false positive rate) for TDT2 at the timesteps where novel documents were introduced, comparing the l1-online, l1-batch, and l2-batch algorithms.]

[Table 1: AUC numbers for the ROC plots in Figure 1, reported per timestep (together with the number of novel and non-novel documents) for the l1-online, l1-batch, and l2-batch algorithms, along with their averages.]

Comparison of the l1-Online and l1-Batch Algorithms. The l1-online and l1-batch algorithms have almost identical performance in terms of detecting novel documents (see Table 1). However, the online algorithm is much more computationally efficient. In Figure 2(a), we compare the running times of these algorithms. As noted earlier, the running time of the batch algorithm goes up as t increases, since it has to optimize over the entire past. In contrast, the running time of the online algorithm is independent of the past and depends only on the number of documents introduced in each timestep (which in this case is always 1000); therefore, it is almost the same across different timesteps.

As expected, the run-time gap between the l1-batch and l1-online algorithms widens as t increases: in the first timestep the online algorithm is 5.4 times faster, and this rapidly increases to a factor of 11.5 in just 7 timesteps. In Figure 2(b), we compare the dictionaries produced by the l1-batch and l1-online algorithms under the SRE metric. In the first few timesteps, the SRE of the dictionaries produced by the online algorithm is slightly lower than that of the batch algorithm. However, this gets corrected after a few timesteps, and as expected, the batch algorithm later produces better dictionaries.

[Figure 2: (a) Running time and (b) sparse reconstruction error (SRE) plots for the TDT2 dataset; (c) running time and (d) SRE plots for the Twitter dataset.]

5.2 Experiments on Twitter

Our second dataset is from an application of monitoring Twitter for marketing and PR for smartphone and wireless providers. We used the Twitter Decahose to collect a 10% sample of all tweets (posts) from Sep 15 to Oct 05, 2011. From this, we filtered the tweets relevant to smartphones using a scheme presented in [4], which utilizes the Wikipedia ontology to do the filtering. Our dataset comprises the tweets over these 21 days, and the vocabulary size is 6237 words. We used the tweets from Sep 15 to Sep 21 to initialize the dictionaries. Subsequently, at each timestep, we give as input to both algorithms all the tweets from a given day, for a period of 14 days between Sep 22 and Oct 05. Since this dataset is unlabeled, we do a quantitative evaluation of the l1-batch vs. l1-online algorithms in terms of SRE, and a qualitative evaluation of the l1-online algorithm for the novel document detection task. Here, the size of the initial dictionary is k = 100.

Figure 2(c) shows the running times on the Twitter dataset. At the first timestep the online algorithm is already 10.8 times faster, and this speedup escalates to 18.2 by the 14th timestep. Figure 2(d) shows the SRE of the dictionaries produced by these algorithms. In this case, the SRE of the dictionaries produced by the batch algorithm is consistently better than that of the online algorithm, but as the running time plots suggest, this improvement comes at a very steep price.

Table 2 below shows a representative set of novel tweets identified by our online algorithm. In each timestep, instead of thresholding by ζ, we take the top 10% of tweets measured in terms of the sparse coding objective value and run a dictionary-based clustering, described in [14], on them. Further post-processing is done to discard clusters without much support and to pick a representative tweet for each cluster. Using this completely automated process, we are able to detect breaking news and trends relevant to the smartphone market, such as AT&T throttling data bandwidth, the launch of the iPhone 4S, and the death of Steve Jobs.

Table 2: Sample novel documents detected by our online algorithm.
- "Android powered 56 percent of smartphones sold in the last three months. Sad thing is it can lower the rating of ios!"
- "How Windows 8 is faster, lighter and more efficient: WP7 Droid Bionic Android HP TouchPad white ipods"
- "U.S. News: AT&T begins sending throttling warnings to top data hogs: AT&T did away with its unlimited da... #iphone"
- "Can't wait for the iphone 4s #Letustalkiphone"
- "Everybody put an iphone up in the air one time #ripstevejobs"

6 Conclusion

The main contribution of this paper is a new online l1-dictionary learning algorithm, based on which we develop a scalable approach to detecting novel documents in streams of text.
We establish a sublinear regret bound, and empirically demonstrate orders of magnitude speedup over the batch algorithm, without much loss in performance.

A further speedup can be achieved by distributing the algorithm using known techniques [3]. In the batch setting, with the l1/l1-formulation, the dual augmented Lagrangian marginally outperforms the primal augmented Lagrangian in practice [31]. A priori it is unclear whether the marginal improvements observed in [31] carry over to the online setting, but this is an interesting open problem. Apart from the target application of novel document detection, our online l1-dictionary learning algorithm could have broader applicability to other tasks using text and beyond, e.g., signal processing [10]. On a different note, there are several techniques that are related to dictionary learning, such as Latent Dirichlet Allocation [2], Probabilistic Latent Semantic Analysis [12], and Non-negative Matrix Factorization [15], and adapting these techniques for online detection of novel documents is a rich area for future work.

References

[1] M. Aharon, M. Elad, and A. Bruckstein. The K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing, 54(11), 2006.
[2] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation. JMLR, 3:993-1022, 2003.
[3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 2011.
[4] V. Chenthamarakshan, P. Melville, V. Sindhwani, and R. D. Lawrence. Concept Labeling: Building Text Classifiers with Minimal Supervision. In IJCAI, 2011.
[5] P. Combettes and J. Pesquet. Proximal Splitting Methods in Signal Processing. arXiv preprint, 2009.
[6] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient Projections onto the l1-Ball for Learning in High Dimensions. In ICML, 2008.
[7] J. Duchi and Y. Singer. Efficient Online and Batch Learning using Forward Backward Splitting. JMLR, 10, 2009.
[8] J. C. Duchi, S. Shalev-Shwartz, Y. Singer, and A. Tewari. Composite Objective Mirror Descent. In COLT, pages 14-26, 2010.
[9] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise Coordinate Optimization. The Annals of Applied Statistics, 1(2), 2007.
[10] Q. Geng and J. Wright. On the Local Correctness of l1-Minimization for Dictionary Learning. Preprint, 2011.
[11] E. Hazan, A. Agarwal, and S. Kale. Logarithmic Regret Algorithms for Online Convex Optimization. Machine Learning, 69(2-3):169-192, 2007.
[12] T. Hofmann. Probabilistic Latent Semantic Analysis. In UAI, 1999.
[13] P. O. Hoyer. Non-Negative Sparse Coding. In IEEE Workshop on Neural Networks for Signal Processing, 2002.
[14] S. P. Kasiviswanathan, P. Melville, A. Banerjee, and V. Sindhwani. Emerging Topic Detection using Dictionary Learning. In CIKM, 2011.
[15] D. Lee and H. Seung. Learning the Parts of Objects by Non-negative Matrix Factorization. Nature, 1999.
[16] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online Learning for Matrix Factorization and Sparse Coding. JMLR, 11:19-60, 2010.
[17] C. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
[18] P. Melville, J. Leskovec, and F. Provost, editors. Proceedings of the First Workshop on Social Media Analytics. ACM, 2010.
[19] B. Olshausen and D. Field. Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1? Vision Research, 37(23), 1997.
[20] S. Petrović, M. Osborne, and V. Lavrenko. Streaming First Story Detection with Application to Twitter. In HLT, ACL, 2010.
[21] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, 1998.
[22] A. Saha and V. Sindhwani. Learning Evolving and Emerging Topics in Social Media: A Dynamic NMF Approach with Temporal Regularization. In WSDM, 2012.
[23] M. Schmidt, N. Le Roux, and F. Bach. Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization. In NIPS, 2011.

[24] S. Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, 4(2), 2012.
[25] P. Tseng. Approximation Accuracy, Gradient Methods, and Error Bound for Structured Convex Optimization. Mathematical Programming, Series B, 125, 2010.
[26] H. Wang and A. Banerjee. Online Alternating Direction Method. In ICML, 2012.
[27] J. Wright and Y. Ma. Dense Error Correction via l1-Minimization. IEEE Transactions on Information Theory, 56(7), 2010.
[28] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust Face Recognition via Sparse Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210-227, Feb. 2009.
[29] L. Xiao. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization. JMLR, 11, 2010.
[30] A. Y. Yang, S. S. Sastry, A. Ganesh, and Y. Ma. Fast l1-Minimization Algorithms and an Application in Robust Face Recognition: A Review. In International Conference on Image Processing, 2010.
[31] J. Yang and Y. Zhang. Alternating Direction Algorithms for l1-Problems in Compressive Sensing. SIAM Journal on Scientific Computing, 33(1), 2011.
[32] M. Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. In ICML, 2003.

A Background on ADMM

In this section, we give a brief review of the general framework of ADMM. Let p(x): R^a → R and q(y): R^b → R be convex functions, F ∈ R^{c×a}, G ∈ R^{c×b}, and z ∈ R^c. Consider the following optimization problem:

min_{x,y} p(x) + q(y)  such that  F x + G y = z,    (22)

where the variable vectors x and y are separate in the objective and coupled only in the constraint. The augmented Lagrangian for the above problem is given by

L(x, y, ρ) = p(x) + q(y) + ⟨ρ, z - F x - G y⟩ + (φ/2) ‖z - F x - G y‖_2^2,

where ρ ∈ R^c is the Lagrangian multiplier and φ > 0 is a penalty parameter. ADMM exploits the separable form of (22) and replaces the joint minimization over x and y with two simpler problems: it first minimizes L over x, then over y, and then applies a gradient step to the Lagrange multiplier ρ. The entire ADMM procedure is summarized in Algorithm 4, where γ > 0 is a constant and the subscript i denotes the ith iteration of the ADMM procedure. The ADMM procedure has been proved to converge to the global optimal solution under quite broad conditions [5].

Algorithm 4: ADMM Update Equations for Solving (22)
Iterate until convergence:
  x_{i+1} ← argmin_x L(x, y_i, ρ_i)
  y_{i+1} ← argmin_y L(x_{i+1}, y, ρ_i)
  ρ_{i+1} ← ρ_i + γφ (z - F x_{i+1} - G y_{i+1})
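The generic loop of Algorithm 4 can be packaged as a small skeleton that takes the two partial minimizers as callables; the concrete updates in Algorithms 5 and 6 below are instances of this pattern. This is our own sketch, not code from the paper:

```python
import numpy as np

def admm(argmin_x, argmin_y, F, G, z, phi, gamma, n_iters=100):
    """Generic ADMM loop for min p(x) + q(y) s.t. F x + G y = z (Algorithm 4).

    argmin_x(y, rho) and argmin_y(x, rho) minimize the augmented Lagrangian
    over x and y, respectively, with the other variable and the multiplier fixed.
    """
    x = np.zeros(F.shape[1])
    y = np.zeros(G.shape[1])
    rho = np.zeros(z.shape)
    for _ in range(n_iters):
        x = argmin_x(y, rho)
        y = argmin_y(x, rho)
        rho = rho + gamma * phi * (z - F @ x - G @ y)
    return x, y, rho
```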

A.1 ADMM Equations for Updating the X's and A's

Consider the l1-dictionary learning problem

min_{A ∈ 𝒜, X ≥ 0} ‖P - AX‖_1 + λ‖X‖_1,

where 𝒜 is defined as in Section 2. We use the following algorithms from [14] to solve this problem. It is quite easy to adapt the ADMM updates outlined in Algorithm 4 to update the X's and A's when the other variable is fixed (see, e.g., [14]).

ADMM for updating X, given fixed A. Here we are given matrices P ∈ R^{m×n} and A ∈ R^{m×k}, and we want to solve the optimization problem

min_{X ≥ 0} ‖P - AX‖_1 + λ‖X‖_1.

Algorithm 5 shows the ADMM update steps for solving this problem. The complete derivation is presented in [14]; we reproduce the updates here for completeness. In our experiments, we set φ = 5, κ = 1/Ψ_max(A), and γ = 1.89. These parameters are chosen based on the ADMM convergence results presented in [14, 31].

Algorithm 5: ADMM for Updating X
(ADMM procedure for solving min_{X ≥ 0} ‖P - AX‖_1 + λ‖X‖_1)
Input: A ∈ R^{m×k}, P ∈ R^{m×n}, λ ≥ 0, γ ≥ 0, φ ≥ 0, κ ≥ 0
  X ← 0_{k×n}, E ← P, ρ ← 0_{m×n}
  for i = 1, 2, ... to convergence do
    E_{i+1} ← soft(P - A X_i + ρ_i/φ, 1/φ)
    G ← A^T (A X_i + E_{i+1} - P - ρ_i/φ)
    X_{i+1} ← max{X_i - κ G - λκ/φ, 0}
    ρ_{i+1} ← ρ_i + γφ (P - A X_{i+1} - E_{i+1})
  Return X at convergence

ADMM for updating A, given fixed X. Given inputs P ∈ R^{m×n} and X ∈ R^{k×n}, consider the optimization problem

min_{A ∈ 𝒜} ‖P - AX‖_1.

Algorithm 6: ADMM for Updating A
(ADMM procedure for solving min_{A ∈ 𝒜} ‖P - AX‖_1)
Input: X ∈ R^{k×n}, P ∈ R^{m×n}, γ ≥ 0, φ ≥ 0, κ ≥ 0
  A ← 0_{m×k}, E ← P, ρ ← 0_{m×n}
  for i = 1, 2, ... to convergence do
    E_{i+1} ← soft(P - A_i X + ρ_i/φ, 1/φ)
    G ← (A_i X + E_{i+1} - P - ρ_i/φ) X^T
    A_{i+1} ← Π_𝒜(max{A_i - κ G, 0})
    ρ_{i+1} ← ρ_i + γφ (P - A_{i+1} X - E_{i+1})
  Return A at convergence

When repeating this optimization over multiple timesteps, we use warm starts for faster convergence, i.e., instead of initializing A to 0_{m×k}, we initialize A to the dictionary obtained at the end of the previous timestep.
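Algorithm 5 translates almost line for line into NumPy. The following is our own sketch (soft is the operator from Section 2; a fixed iteration count stands in for the convergence test, and the parameter values follow the settings quoted above):

```python
import numpy as np

def admm_update_X(A, P, lam, phi=5.0, gamma=1.89, n_iters=200):
    """ADMM for min_{X>=0} ||P - A X||_1 + lam*||X||_1 (Algorithm 5)."""
    m, k = A.shape
    n = P.shape[1]
    kappa = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Psi_max(A)
    X = np.zeros((k, n))
    E = P.copy()
    rho = np.zeros((m, n))
    for _ in range(n_iters):
        E = soft(P - A @ X + rho / phi, 1.0 / phi)
        G = A.T @ (A @ X + E - P - rho / phi)
        X = np.maximum(X - kappa * G - lam * kappa / phi, 0.0)
        rho = rho + gamma * phi * (P - A @ X - E)
    return X
```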

B Pseudo-Codes from Section 5

Let us start by extending the definition of 𝒜: define

𝒜_k = {A ∈ R^{m×k} : A ≥ 0_{m×k}, ‖A_j‖_1 ≤ 1 for all j = 1, ..., k},

where A_j is the jth column of A, and let Π_{𝒜_k} denote the projection onto the nearest point in the convex set 𝒜_k. Similarly, define

𝒜'_k = {A ∈ R^{m×k} : A ≥ 0_{m×k}, ‖A_j‖_2 ≤ 1 for all j = 1, ..., k},

and let Π_{𝒜'_k} denote the projection onto the nearest point in the convex set 𝒜'_k.

Algorithm 7: BATCH-IMPL
Input: P_[t-1] ∈ R^{m×N_{t-1}}, X_[t-1] ∈ R^{k_t×N_{t-1}}, P_t = [p_1, ..., p_{n_t}] ∈ R^{m×n_t}, A_t ∈ R^{m×k_t}, λ, ζ, η ≥ 0
Novel Document Detection Step:
  for j = 1 to n_t do
    Solve: x_j = argmin_{x ≥ 0} ‖p_j - A_t x‖_1 + λ‖x‖_1 (solved using Algorithm 5)
    if ‖p_j - A_t x_j‖_1 + λ‖x_j‖_1 > ζ, mark p_j as novel
Batch Dictionary Learning Step:
  Set k_{t+1} ← k_t + η
  Set Z_[t] ← [X_[t-1] x_1, ..., x_{n_t}]
  Set X_[t] ← [Z_[t]; 0_{η×N_t}]
  Set P_[t] ← [P_[t-1] p_1, ..., p_{n_t}]
  for i = 1 to convergence do
    Solve: A_{t+1} = argmin_{A ∈ 𝒜_{k_{t+1}}} ‖P_[t] - A X_[t]‖_1 (solved using Algorithm 6 with warm starts)
    Solve: X_[t] = argmin_{X ≥ 0} ‖P_[t] - A_{t+1} X‖_1 + λ‖X‖_1 (solved using Algorithm 5)

Algorithm 8: L2-BATCH
Input: P_[t-1] ∈ R^{m×N_{t-1}}, P_t = [p_1, ..., p_{n_t}] ∈ R^{m×n_t}, A_t ∈ R^{m×k_t}, λ ≥ 0, ζ ≥ 0, η ≥ 0
Novel Document Detection Step:
  for j = 1 to n_t do
    Solve: x_j = argmin_{x ≥ 0} ‖p_j - A_t x‖_2^2 + λ‖x‖_1 (solved using the LARS method [9])
    if ‖p_j - A_t x_j‖_2^2 + λ‖x_j‖_1 > ζ, mark p_j as novel
l2-Batch Dictionary Learning Step:
  Set k_{t+1} ← k_t + η
  Set P_[t] ← [P_[t-1] p_1, ..., p_{n_t}]
  Solve: [A_{t+1}, X_[t]] = argmin_{A ∈ 𝒜'_{k_{t+1}}, X ≥ 0} ‖P_[t] - AX‖_2^2 + λ‖X‖_1 (a non-negative sparse coding problem)


ODEs II, Lecture 1: Homogeneous Linear Systems - I. Mike Raugh 1. March 8, 2004 ODEs II, Lecure : Homogeneous Linear Sysems - I Mike Raugh March 8, 4 Inroducion. In he firs lecure we discussed a sysem of linear ODEs for modeling he excreion of lead from he human body, saw how o ransform

More information

A Decentralized Second-Order Method with Exact Linear Convergence Rate for Consensus Optimization

A Decentralized Second-Order Method with Exact Linear Convergence Rate for Consensus Optimization 1 A Decenralized Second-Order Mehod wih Exac Linear Convergence Rae for Consensus Opimizaion Aryan Mokhari, Wei Shi, Qing Ling, and Alejandro Ribeiro Absrac This paper considers decenralized consensus

More information

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still. Lecure - Kinemaics in One Dimension Displacemen, Velociy and Acceleraion Everyhing in he world is moving. Nohing says sill. Moion occurs a all scales of he universe, saring from he moion of elecrons in

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals

More information

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 175 CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 10.1 INTRODUCTION Amongs he research work performed, he bes resuls of experimenal work are validaed wih Arificial Neural Nework. From he

More information

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H. ACE 56 Fall 005 Lecure 5: he Simple Linear Regression Model: Sampling Properies of he Leas Squares Esimaors by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Inference in he Simple

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

Primal-Dual Splitting: Recent Improvements and Variants

Primal-Dual Splitting: Recent Improvements and Variants Primal-Dual Spliing: Recen Improvemens and Varians 1 Thomas Pock and 2 Anonin Chambolle 1 Insiue for Compuer Graphics and Vision, TU Graz, Ausria 2 CMAP & CNRS École Polyechnique, France The proximal poin

More information

E β t log (C t ) + M t M t 1. = Y t + B t 1 P t. B t 0 (3) v t = P tc t M t Question 1. Find the FOC s for an optimum in the agent s problem.

E β t log (C t ) + M t M t 1. = Y t + B t 1 P t. B t 0 (3) v t = P tc t M t Question 1. Find the FOC s for an optimum in the agent s problem. Noes, M. Krause.. Problem Se 9: Exercise on FTPL Same model as in paper and lecure, only ha one-period govenmen bonds are replaced by consols, which are bonds ha pay one dollar forever. I has curren marke

More information

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8) I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression

More information

Lecture Notes 2. The Hilbert Space Approach to Time Series

Lecture Notes 2. The Hilbert Space Approach to Time Series Time Series Seven N. Durlauf Universiy of Wisconsin. Basic ideas Lecure Noes. The Hilber Space Approach o Time Series The Hilber space framework provides a very powerful language for discussing he relaionship

More information

Network Newton Distributed Optimization Methods

Network Newton Distributed Optimization Methods Nework Newon Disribued Opimizaion Mehods Aryan Mokhari, Qing Ling, and Alejandro Ribeiro Absrac We sudy he problem of minimizing a sum of convex objecive funcions where he componens of he objecive are

More information

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018 MATH 5720: Gradien Mehods Hung Phan, UMass Lowell Ocober 4, 208 Descen Direcion Mehods Consider he problem min { f(x) x R n}. The general descen direcions mehod is x k+ = x k + k d k where x k is he curren

More information

Testing for a Single Factor Model in the Multivariate State Space Framework

Testing for a Single Factor Model in the Multivariate State Space Framework esing for a Single Facor Model in he Mulivariae Sae Space Framework Chen C.-Y. M. Chiba and M. Kobayashi Inernaional Graduae School of Social Sciences Yokohama Naional Universiy Japan Faculy of Economics

More information

Georey E. Hinton. University oftoronto. Technical Report CRG-TR February 22, Abstract

Georey E. Hinton. University oftoronto.   Technical Report CRG-TR February 22, Abstract Parameer Esimaion for Linear Dynamical Sysems Zoubin Ghahramani Georey E. Hinon Deparmen of Compuer Science Universiy oftorono 6 King's College Road Torono, Canada M5S A4 Email: zoubin@cs.orono.edu Technical

More information

Final Spring 2007

Final Spring 2007 .615 Final Spring 7 Overview The purpose of he final exam is o calculae he MHD β limi in a high-bea oroidal okamak agains he dangerous n = 1 exernal ballooning-kink mode. Effecively, his corresponds o

More information

Aryan Mokhtari, Wei Shi, Qing Ling, and Alejandro Ribeiro. cost function n

Aryan Mokhtari, Wei Shi, Qing Ling, and Alejandro Ribeiro. cost function n IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 2, NO. 4, DECEMBER 2016 507 A Decenralized Second-Order Mehod wih Exac Linear Convergence Rae for Consensus Opimizaion Aryan Mokhari,

More information

Chapter 3 Boundary Value Problem

Chapter 3 Boundary Value Problem Chaper 3 Boundary Value Problem A boundary value problem (BVP) is a problem, ypically an ODE or a PDE, which has values assigned on he physical boundary of he domain in which he problem is specified. Le

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

Single-Pass-Based Heuristic Algorithms for Group Flexible Flow-shop Scheduling Problems

Single-Pass-Based Heuristic Algorithms for Group Flexible Flow-shop Scheduling Problems Single-Pass-Based Heurisic Algorihms for Group Flexible Flow-shop Scheduling Problems PEI-YING HUANG, TZUNG-PEI HONG 2 and CHENG-YAN KAO, 3 Deparmen of Compuer Science and Informaion Engineering Naional

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

Refinement of Document Clustering by Using NMF *

Refinement of Document Clustering by Using NMF * Refinemen of Documen Clusering by Using NMF * Hiroyuki Shinnou and Minoru Sasaki Deparmen of Compuer and Informaion Sciences, Ibaraki Universiy, 4-12-1 Nakanarusawa, Hiachi, Ibaraki JAPAN 316-8511 {shinnou,

More information

Ensamble methods: Bagging and Boosting

Ensamble methods: Bagging and Boosting Lecure 21 Ensamble mehods: Bagging and Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Ensemble mehods Mixure of expers Muliple base models (classifiers, regressors), each covers a differen par

More information

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis Speaker Adapaion Techniques For Coninuous Speech Using Medium and Small Adapaion Daa Ses Consaninos Boulis Ouline of he Presenaion Inroducion o he speaker adapaion problem Maximum Likelihood Sochasic Transformaions

More information

Isolated-word speech recognition using hidden Markov models

Isolated-word speech recognition using hidden Markov models Isolaed-word speech recogniion using hidden Markov models Håkon Sandsmark December 18, 21 1 Inroducion Speech recogniion is a challenging problem on which much work has been done he las decades. Some of

More information

Some Ramsey results for the n-cube

Some Ramsey results for the n-cube Some Ramsey resuls for he n-cube Ron Graham Universiy of California, San Diego Jozsef Solymosi Universiy of Briish Columbia, Vancouver, Canada Absrac In his noe we esablish a Ramsey-ype resul for cerain

More information

Ordinary Differential Equations

Ordinary Differential Equations Ordinary Differenial Equaions 5. Examples of linear differenial equaions and heir applicaions We consider some examples of sysems of linear differenial equaions wih consan coefficiens y = a y +... + a

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

THE DISCRETE WAVELET TRANSFORM

THE DISCRETE WAVELET TRANSFORM . 4 THE DISCRETE WAVELET TRANSFORM 4 1 Chaper 4: THE DISCRETE WAVELET TRANSFORM 4 2 4.1 INTRODUCTION TO DISCRETE WAVELET THEORY The bes way o inroduce waveles is hrough heir comparison o Fourier ransforms,

More information

Approximation Algorithms for Unique Games via Orthogonal Separators

Approximation Algorithms for Unique Games via Orthogonal Separators Approximaion Algorihms for Unique Games via Orhogonal Separaors Lecure noes by Konsanin Makarychev. Lecure noes are based on he papers [CMM06a, CMM06b, LM4]. Unique Games In hese lecure noes, we define

More information

Adaptive Noise Estimation Based on Non-negative Matrix Factorization

Adaptive Noise Estimation Based on Non-negative Matrix Factorization dvanced cience and Technology Leers Vol.3 (ICC 213), pp.159-163 hp://dx.doi.org/1.14257/asl.213 dapive Noise Esimaion ased on Non-negaive Marix Facorizaion Kwang Myung Jeon and Hong Kook Kim chool of Informaion

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

Chapter 6. Systems of First Order Linear Differential Equations

Chapter 6. Systems of First Order Linear Differential Equations Chaper 6 Sysems of Firs Order Linear Differenial Equaions We will only discuss firs order sysems However higher order sysems may be made ino firs order sysems by a rick shown below We will have a sligh

More information

Tom Heskes and Onno Zoeter. Presented by Mark Buller

Tom Heskes and Onno Zoeter. Presented by Mark Buller Tom Heskes and Onno Zoeer Presened by Mark Buller Dynamic Bayesian Neworks Direced graphical models of sochasic processes Represen hidden and observed variables wih differen dependencies Generalize Hidden

More information

Lecture 4: November 13

Lecture 4: November 13 Compuaional Learning Theory Fall Semeser, 2017/18 Lecure 4: November 13 Lecurer: Yishay Mansour Scribe: Guy Dolinsky, Yogev Bar-On, Yuval Lewi 4.1 Fenchel-Conjugae 4.1.1 Moivaion Unil his lecure we saw

More information

Online Sparsifying Transform Learning for Signal Processing

Online Sparsifying Transform Learning for Signal Processing Online Sparsifying ransform Learning for Signal Processing Saiprasad Ravishankar, Bihan Wen, and Yoram Bresler Deparmen of Elecrical and Compuer Engineering and he Coordinaed Science Laboraory Universiy

More information

Latent Spaces and Matrix Factorization

Latent Spaces and Matrix Factorization Compuaional Linguisics Laen Spaces and Marix Facorizaion Dierich Klakow FR 4.7 Allgemeine Linguisik (Compuerlinguisik) Universiä des Saarlandes Summer 0 Goal Goal: rea documen clusering and word clusering

More information

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1.

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1. Roboics I April 11, 017 Exercise 1 he kinemaics of a 3R spaial robo is specified by he Denavi-Harenberg parameers in ab 1 i α i d i a i θ i 1 π/ L 1 0 1 0 0 L 3 0 0 L 3 3 able 1: able of DH parameers of

More information

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin ACE 56 Fall 005 Lecure 4: Simple Linear Regression Model: Specificaion and Esimaion by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Simple Regression: Economic and Saisical Model

More information

Convergence of the Neumann series in higher norms

Convergence of the Neumann series in higher norms Convergence of he Neumann series in higher norms Charles L. Epsein Deparmen of Mahemaics, Universiy of Pennsylvania Version 1.0 Augus 1, 003 Absrac Naural condiions on an operaor A are given so ha he Neumann

More information

ELE 538B: Large-Scale Optimization for Data Science. Introduction. Yuxin Chen Princeton University, Spring 2018

ELE 538B: Large-Scale Optimization for Data Science. Introduction. Yuxin Chen Princeton University, Spring 2018 ELE 538B: Large-Scale Opimizaion for Daa Science Inroducion Yuxin Chen Princeon Universiy, Spring 2018 Surge of daa-inensive applicaions Widespread applicaions in large-scale daa science and learning 2.5

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

Numerical Dispersion

Numerical Dispersion eview of Linear Numerical Sabiliy Numerical Dispersion n he previous lecure, we considered he linear numerical sabiliy of boh advecion and diffusion erms when approimaed wih several spaial and emporal

More information

SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F. Trench. SIAM J. Matrix Anal. Appl. 11 (1990),

SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F. Trench. SIAM J. Matrix Anal. Appl. 11 (1990), SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F Trench SIAM J Marix Anal Appl 11 (1990), 601-611 Absrac Le T n = ( i j ) n i,j=1 (n 3) be a real symmeric

More information

Ensamble methods: Boosting

Ensamble methods: Boosting Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room

More information

A Hop Constrained Min-Sum Arborescence with Outage Costs

A Hop Constrained Min-Sum Arborescence with Outage Costs A Hop Consrained Min-Sum Arborescence wih Ouage Coss Rakesh Kawara Minnesoa Sae Universiy, Mankao, MN 56001 Email: Kawara@mnsu.edu Absrac The hop consrained min-sum arborescence wih ouage coss problem

More information

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems.

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems. di ernardo, M. (995). A purely adapive conroller o synchronize and conrol chaoic sysems. hps://doi.org/.6/375-96(96)8-x Early version, also known as pre-prin Link o published version (if available):.6/375-96(96)8-x

More information

Particle Swarm Optimization Combining Diversification and Intensification for Nonlinear Integer Programming Problems

Particle Swarm Optimization Combining Diversification and Intensification for Nonlinear Integer Programming Problems Paricle Swarm Opimizaion Combining Diversificaion and Inensificaion for Nonlinear Ineger Programming Problems Takeshi Masui, Masaoshi Sakawa, Kosuke Kao and Koichi Masumoo Hiroshima Universiy 1-4-1, Kagamiyama,

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates Biol. 356 Lab 8. Moraliy, Recruimen, and Migraion Raes (modified from Cox, 00, General Ecology Lab Manual, McGraw Hill) Las week we esimaed populaion size hrough several mehods. One assumpion of all hese

More information

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves Rapid Terminaion Evaluaion for Recursive Subdivision of Bezier Curves Thomas F. Hain School of Compuer and Informaion Sciences, Universiy of Souh Alabama, Mobile, AL, U.S.A. Absrac Bézier curve flaening

More information

Single and Double Pendulum Models

Single and Double Pendulum Models Single and Double Pendulum Models Mah 596 Projec Summary Spring 2016 Jarod Har 1 Overview Differen ypes of pendulums are used o model many phenomena in various disciplines. In paricular, single and double

More information

A DELAY-DEPENDENT STABILITY CRITERIA FOR T-S FUZZY SYSTEM WITH TIME-DELAYS

A DELAY-DEPENDENT STABILITY CRITERIA FOR T-S FUZZY SYSTEM WITH TIME-DELAYS A DELAY-DEPENDENT STABILITY CRITERIA FOR T-S FUZZY SYSTEM WITH TIME-DELAYS Xinping Guan ;1 Fenglei Li Cailian Chen Insiue of Elecrical Engineering, Yanshan Universiy, Qinhuangdao, 066004, China. Deparmen

More information

Class Meeting # 10: Introduction to the Wave Equation

Class Meeting # 10: Introduction to the Wave Equation MATH 8.5 COURSE NOTES - CLASS MEETING # 0 8.5 Inroducion o PDEs, Fall 0 Professor: Jared Speck Class Meeing # 0: Inroducion o he Wave Equaion. Wha is he wave equaion? The sandard wave equaion for a funcion

More information

The expectation value of the field operator.

The expectation value of the field operator. The expecaion value of he field operaor. Dan Solomon Universiy of Illinois Chicago, IL dsolom@uic.edu June, 04 Absrac. Much of he mahemaical developmen of quanum field heory has been in suppor of deermining

More information

CS376 Computer Vision Lecture 6: Optical Flow

CS376 Computer Vision Lecture 6: Optical Flow CS376 Compuer Vision Lecure 6: Opical Flow Qiing Huang Feb. 11 h 2019 Slides Credi: Krisen Grauman and Sebasian Thrun, Michael Black, Marc Pollefeys Opical Flow mage racking 3D compuaion mage sequence

More information

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate.

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate. Inroducion Gordon Model (1962): D P = r g r = consan discoun rae, g = consan dividend growh rae. If raional expecaions of fuure discoun raes and dividend growh vary over ime, so should he D/P raio. Since

More information