
Appendix to Online ℓ1-Dictionary Learning with Application to Novel Document Detection

Shiva Prasad Kasiviswanathan, Huahua Wang, Arindam Banerjee, Prem Melville

A  Background about ADMM

In this section, we give a brief review of the general framework of ADMM. ADMM has recently gathered significant attention in the machine learning community due to its wide applicability to a range of learning problems with complex objective functions [1, 2]. Let $p(x) : \mathbb{R}^a \rightarrow \mathbb{R}$ and $q(y) : \mathbb{R}^b \rightarrow \mathbb{R}$ be convex functions, $F \in \mathbb{R}^{c \times a}$, $G \in \mathbb{R}^{c \times b}$, and $z \in \mathbb{R}^c$. Consider the following optimization problem

$$\min_{x,\,y} \; p(x) + q(y) \quad \text{s.t.} \quad Fx + Gy = z, \qquad (1)$$

where the variable vectors $x$ and $y$ are separate in the objective, and coupled only in the constraint. The augmented Lagrangian for the above problem is given by

$$L(x, y, \rho) = p(x) + q(y) + \langle \rho,\, z - Fx - Gy \rangle + \frac{\phi}{2}\|z - Fx - Gy\|^2,$$

where $\rho \in \mathbb{R}^c$ is the Lagrangian multiplier and $\phi > 0$ is a penalty parameter. ADMM utilizes the separability form of (1) and replaces the joint minimization over $x$ and $y$ with two simpler problems. The ADMM first minimizes $L$ over $x$, then over $y$, and then applies a proximal minimization step with respect to the Lagrange multiplier $\rho$. The entire ADMM procedure is summarized in Algorithm 1, where $\gamma > 0$ is a constant and the subscript $i$ denotes the $i$th iteration of the ADMM procedure. The ADMM procedure has been proved to converge to the global optimal solution under quite broad conditions [1].

Algorithm 1 : ADMM Update Equations for Solving (1)
  Iterate until convergence
    $x_{i+1} \leftarrow \arg\min_x L(x, y_i, \rho_i)$,
    $y_{i+1} \leftarrow \arg\min_y L(x_{i+1}, y, \rho_i)$,
    $\rho_{i+1} \leftarrow \rho_i + \gamma\phi\,(z - Fx_{i+1} - Gy_{i+1})$.
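To make the three-step template of Algorithm 1 concrete, the following is a minimal NumPy sketch (ours, not from the paper) applied to a toy instance of (1) with $p(x) = \|x\|_1$, $q(y) = \frac{1}{2}\|y - b\|^2$, and the constraint $x - y = 0$ (so $F = I$, $G = -I$, $z = 0$); the values of $\phi$ and $\gamma$ and the iteration count are illustrative.

```python
import numpy as np

def soft(v, thr):
    # Entrywise soft-thresholding: argmin_u  thr*||u||_1 + 0.5*||u - v||^2
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

rng = np.random.default_rng(0)
b = rng.normal(size=20)

phi, gamma = 1.0, 1.0                       # penalty and dual step-size constants (illustrative)
x = np.zeros_like(b); y = np.zeros_like(b); rho = np.zeros_like(b)

for i in range(200):
    x = soft(y + rho / phi, 1.0 / phi)      # x-update: argmin_x L(x, y_i, rho_i)
    y = (b - rho + phi * x) / (1.0 + phi)   # y-update: argmin_y L(x_{i+1}, y, rho_i)
    rho = rho + gamma * phi * (y - x)       # multiplier update: rho_i + gamma*phi*(z - Fx - Gy)

# At convergence x = y and both equal the prox of b, i.e. soft(b, 1).
print(np.max(np.abs(x - soft(b, 1.0))))
```

Because $p$ is the ℓ1 norm, the $x$-update reduces to a soft-thresholding step; this is exactly the structure reused in Algorithms 2 and 3 below.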

A.1  ADMM Equations for updating $X_t$'s and $A_t$'s

Consider the ℓ1-dictionary learning problem

$$\min_{A \in \mathcal{A},\, X \geq 0} \; \|P - AX\|_1 + \lambda\|X\|_1,$$

where $\mathcal{A}$ is defined in Section 3.1. We use the following algorithm from [4] to solve this problem. It is quite easy to adapt the ADMM updates outlined in Algorithm 1 to update the $X_t$'s and $A_t$'s when the other variable is fixed (see, e.g., [4]).

ADMM for updating X, given fixed A. Here we are given matrices $P \in \mathbb{R}^{m \times n}$ and $A \in \mathbb{R}^{m \times k}$, and we want to solve the following optimization problem

$$\min_{X \geq 0} \; \|P - AX\|_1 + \lambda\|X\|_1.$$

Algorithm 2 shows the ADMM update steps for solving this problem. The entire derivation is presented in [4] and we reproduce it here for completeness. In our experiments, we set $\phi = 5$, $\kappa = 1/\Psi_{\max}(A)$, and $\gamma = 1.89$. These parameters are chosen based on the ADMM convergence results presented in [4, 6].

Algorithm 2 : ADMM for Updating X
  ADMM procedure for solving $\min_{X \geq 0} \|P - AX\|_1 + \lambda\|X\|_1$
  Input: $A \in \mathbb{R}^{m \times k}$, $P \in \mathbb{R}^{m \times n}$, $\lambda \geq 0$, $\gamma \geq 0$, $\phi \geq 0$, $\kappa \geq 0$
  $X_1 \leftarrow 0_{k \times n}$, $E_1 \leftarrow P$, $\rho_1 \leftarrow 0_{m \times n}$
  for $i = 1, 2, \ldots$, to convergence do
    $E_{i+1} \leftarrow \mathrm{soft}(P - AX_i + \rho_i/\phi,\; 1/\phi)$
    $G \leftarrow A^{\top}(AX_i + E_{i+1} - P - \rho_i/\phi)$
    $X_{i+1} \leftarrow \max\{X_i - \kappa G - \lambda\kappa/\phi,\; 0\}$
    $\rho_{i+1} \leftarrow \rho_i + \gamma\phi\,(P - AX_{i+1} - E_{i+1})$
  Return $X$ at convergence

ADMM for Updating A, given fixed X. Given inputs $P \in \mathbb{R}^{m \times n}$ and $X \in \mathbb{R}^{k \times n}$, consider the following optimization problem $\min_{A \in \mathcal{A}} \|P - AX\|_1$. When repeating this optimization over multiple timesteps, we use warm starts for faster convergence, i.e., instead of initializing $A$ to $0_{m \times k}$, we initialize $A$ to the dictionary obtained at the end of the previous timestep.

Algorithm 3 : ADMM for Updating A
  ADMM procedure for solving $\min_{A \in \mathcal{A}} \|P - AX\|_1$
  Input: $X \in \mathbb{R}^{k \times n}$, $P \in \mathbb{R}^{m \times n}$, $\gamma \geq 0$, $\phi \geq 0$, $\kappa \geq 0$
  $A_1 \leftarrow 0_{m \times k}$, $E_1 \leftarrow P$, $\rho_1 \leftarrow 0_{m \times n}$
  for $i = 1, 2, \ldots$, to convergence do
    $E_{i+1} \leftarrow \mathrm{soft}(P - A_iX + \rho_i/\phi,\; 1/\phi)$
    $G \leftarrow (A_iX + E_{i+1} - P - \rho_i/\phi)X^{\top}$
    $A_{i+1} \leftarrow \Pi_{\mathcal{A}}\left(\max\{A_i - \kappa G,\; 0\}\right)$
    $\rho_{i+1} \leftarrow \rho_i + \gamma\phi\,(P - A_{i+1}X - E_{i+1})$
  Return $A$ at convergence
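The two inner solvers can be sketched in a few lines of NumPy. This is our own illustrative rendering of Algorithms 2 and 3, not the authors' code: `soft` is the entrywise soft-thresholding operator, the step sizes follow the $\kappa = 1/\Psi_{\max}(\cdot)$ rule above, the default parameter values are placeholders, and the projection $\Pi_{\mathcal{A}}$ is approximated by clipping and column rescaling rather than an exact Euclidean projection.

```python
import numpy as np

def soft(V, thr):
    # Entrywise soft-thresholding operator used in the E-updates of Algorithms 2 and 3.
    return np.sign(V) * np.maximum(np.abs(V) - thr, 0.0)

def admm_update_X(P, A, lam, phi=5.0, gamma=1.5, iters=200):
    # Sketch of Algorithm 2: min_{X >= 0} ||P - AX||_1 + lam*||X||_1 for a fixed A.
    kappa = 1.0 / max(np.linalg.eigvalsh(A.T @ A).max(), 1e-12)   # kappa = 1/Psi_max(A)
    X = np.zeros((A.shape[1], P.shape[1]))
    rho = np.zeros_like(P)
    for _ in range(iters):
        E = soft(P - A @ X + rho / phi, 1.0 / phi)                # E-update
        G = A.T @ (A @ X + E - P - rho / phi)                     # gradient w.r.t. X
        X = np.maximum(X - kappa * G - lam * kappa / phi, 0.0)    # projected prox step
        rho = rho + gamma * phi * (P - A @ X - E)                 # dual update
    return X

def admm_update_A(P, X, A0, phi=5.0, gamma=1.5, iters=200):
    # Sketch of Algorithm 3: min_{A in A} ||P - AX||_1 for a fixed X, warm-started at A0.
    kappa = 1.0 / max(np.linalg.eigvalsh(X @ X.T).max(), 1e-12)   # kappa = 1/Psi_max(X)
    A, rho = A0.copy(), np.zeros_like(P)
    for _ in range(iters):
        E = soft(P - A @ X + rho / phi, 1.0 / phi)
        G = (A @ X + E - P - rho / phi) @ X.T
        A = np.maximum(A - kappa * G, 0.0)                        # non-negativity
        A /= np.maximum(A.sum(axis=0, keepdims=True), 1.0)        # rescale columns so ||A_j||_1 <= 1
        rho = rho + gamma * phi * (P - A @ X - E)
    return A
```

Alternating `admm_update_X` and `admm_update_A` (with warm starts for `A`) gives the batch dictionary-learning loop used in Algorithm 4 of Section C.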

B  Analysis of OIADMM: Proofs from Section 4

First, let us recap the OIADMM update rules:

$$\Gamma_{t+1} = \operatorname*{arg\,min}_{\Gamma} \; \|\Gamma\|_1 + \langle \Delta_t,\, \hat{\Gamma}_t - \Gamma \rangle + \frac{\beta_t}{2}\|\hat{\Gamma}_t - \Gamma\|_F^2, \qquad (2)$$

$$\hat{A}_{t+1} = \operatorname*{arg\,min}_{A \in \mathcal{A}} \; \beta_t\langle G_{t+1},\, A - \hat{A}_t \rangle + \frac{\beta_t}{2\tau_t}\|A - \hat{A}_t\|_F^2, \qquad (3)$$

$$\Delta_{t+1} = \Delta_t + \beta_t\left(P_t - \hat{A}_{t+1}\hat{X}_t - \Gamma_{t+1}\right). \qquad (4)$$

Let $A_{\mathrm{op}}$ be the optimum solution to the batch problem $\min_{A \in \mathcal{A}} \sum_{t=1}^{T} \|P_t - A\hat{X}_t\|_1$. Let $\hat{\Gamma}_t = P_t - \hat{A}_t\hat{X}_t$ and $\tilde{\Gamma}_t = P_t - \hat{A}_{t+1}\hat{X}_t$. For any $t$ and $A \in \mathcal{A}$, let $\Gamma_t = P_t - A\hat{X}_t$. The lemmas below hold for any $A \in \mathcal{A}$, so in particular they hold for $A$ set to $A_{\mathrm{op}}$.

Proof Flow. Although the algorithm is relatively simple, the analysis is somewhat involved. Define $\Gamma_t^{\mathrm{op}} = P_t - A_{\mathrm{op}}X_t$. Then the regret of the OIADMM is

$$R(T) = \sum_{t=1}^{T}\left(\|\hat{\Gamma}_t\|_1 - \|\Gamma_t^{\mathrm{op}}\|_1\right).$$

We split the proof into three technical lemmas. We first upper bound $\langle \Delta_t, \tilde{\Gamma}_t - \Gamma_t \rangle$ (Lemma B.2), and use it to bound $\|\Gamma_{t+1}\|_1 - \|\Gamma_t\|_1$ (Lemma B.3). In the proof of Lemma B.4, we bound $\|\hat{\Gamma}_t\|_1 - \|\Gamma_{t+1}\|_1$, and this, when added to the bound on $\|\Gamma_{t+1}\|_1 - \|\Gamma_t\|_1$ from Lemma B.3, gives a bound on $\|\hat{\Gamma}_t\|_1 - \|\Gamma_t\|_1$. The proof of the regret bound uses a canceling telescoping sum on the bound on $\|\hat{\Gamma}_t\|_1 - \|\Gamma_t\|_1$.

We use the following simple inequality in our proofs.

Lemma B.1. For matrices $M_1, M_2, M_3, M_4 \in \mathbb{R}^{m \times n}$, we have the following

$$2\langle M_1 - M_2,\, M_3 - M_4 \rangle = \|M_1 - M_4\|_F^2 + \|M_2 - M_3\|_F^2 - \|M_1 - M_3\|_F^2 - \|M_2 - M_4\|_F^2.$$

Lemma B.2. Let $\{\Gamma_t, \hat{A}_t, \Delta_t\}$ be the sequences generated by the OIADMM procedure. For any $A \in \mathcal{A}$, we have

$$\langle \Delta_t,\, \tilde{\Gamma}_t - \Gamma_t \rangle \;\leq\; \frac{\beta_t}{2\tau_t}\left(\|A - \hat{A}_t\|_F^2 - \|A - \hat{A}_{t+1}\|_F^2\right) + \frac{\beta_t}{2}\left(\|\Gamma_t - \Gamma_{t+1}\|_F^2 - \|\Gamma_{t+1} - \tilde{\Gamma}_t\|_F^2 - \|\hat{\Gamma}_t - \Gamma_t\|_F^2\right) - \frac{\beta_t}{2}\left(\frac{1}{\tau_t} - \Psi_{\max}(\hat{X}_t)\right)\|\hat{A}_{t+1} - \hat{A}_t\|_F^2.$$

Proof. For any $A \in \mathcal{A}$, (3) is equivalent to the following variational inequality [5]:

$$\left\langle \beta_t G_{t+1} + \frac{\beta_t}{\tau_t}\left(\hat{A}_{t+1} - \hat{A}_t\right),\; A - \hat{A}_{t+1} \right\rangle \;\geq\; 0. \qquad (5)$$

Using $\tilde{\Gamma}_t = P_t - \hat{A}_{t+1}\hat{X}_t$ and substituting for $G_{t+1} = -\left(\tfrac{1}{\beta_t}\Delta_t + \hat{\Gamma}_t - \Gamma_{t+1}\right)\hat{X}_t^{\top}$, we have

$$\beta_t\langle G_{t+1},\, A - \hat{A}_{t+1} \rangle = \beta_t\left\langle \tfrac{1}{\beta_t}\Delta_t + \hat{\Gamma}_t - \Gamma_{t+1},\; \hat{A}_{t+1}\hat{X}_t - A\hat{X}_t \right\rangle = \beta_t\left\langle \tfrac{1}{\beta_t}\Delta_t + \hat{\Gamma}_t - \Gamma_{t+1},\; (P_t - A\hat{X}_t) - (P_t - \hat{A}_{t+1}\hat{X}_t) \right\rangle = \langle \Delta_t,\, \Gamma_t - \tilde{\Gamma}_t \rangle + \beta_t\langle \hat{\Gamma}_t - \Gamma_{t+1},\, \Gamma_t - \tilde{\Gamma}_t \rangle. \qquad (6)$$

Substituting (6) into (5) and rearranging the terms yield

$$\langle \Delta_t,\, \tilde{\Gamma}_t - \Gamma_t \rangle \;\leq\; \beta_t\langle \hat{\Gamma}_t - \Gamma_{t+1},\, \Gamma_t - \tilde{\Gamma}_t \rangle + \frac{\beta_t}{\tau_t}\langle \hat{A}_{t+1} - \hat{A}_t,\, A - \hat{A}_{t+1} \rangle. \qquad (7)$$

By using Lemma B.1, the first term on the right side can be rewritten as

$$2\langle \hat{\Gamma}_t - \Gamma_{t+1},\, \Gamma_t - \tilde{\Gamma}_t \rangle = \|\hat{\Gamma}_t - \tilde{\Gamma}_t\|_F^2 + \|\Gamma_t - \Gamma_{t+1}\|_F^2 - \|\Gamma_{t+1} - \tilde{\Gamma}_t\|_F^2 - \|\hat{\Gamma}_t - \Gamma_t\|_F^2. \qquad (8)$$

Substituting the definitions of $\hat{\Gamma}_t$ and $\tilde{\Gamma}_t$, we have

$$\|\hat{\Gamma}_t - \tilde{\Gamma}_t\|_F^2 = \|(P_t - \hat{A}_t\hat{X}_t) - (P_t - \hat{A}_{t+1}\hat{X}_t)\|_F^2 = \|(\hat{A}_{t+1} - \hat{A}_t)\hat{X}_t\|_F^2 \leq \Psi_{\max}(\hat{X}_t)\,\|\hat{A}_{t+1} - \hat{A}_t\|_F^2. \qquad (9)$$

Remember that $\Psi_{\max}(\hat{X}_t)$ is the maximum eigenvalue of $\hat{X}_t\hat{X}_t^{\top}$. Using Lemma B.1, we get that the second term on the right hand side of (7) is equivalent to

$$2\langle \hat{A}_{t+1} - \hat{A}_t,\, A - \hat{A}_{t+1} \rangle = \|A - \hat{A}_t\|_F^2 - \|A - \hat{A}_{t+1}\|_F^2 - \|\hat{A}_{t+1} - \hat{A}_t\|_F^2. \qquad (10)$$

Combining results in (7), (8), (9), and (10), we get the desired bound.
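Lemma B.1 is a standard polarization-type identity; as a quick numerical sanity check (not part of the original proof), it can be verified directly:

```python
import numpy as np

rng = np.random.default_rng(1)
M1, M2, M3, M4 = rng.normal(size=(4, 5, 7))   # four random 5x7 matrices

lhs = 2 * np.sum((M1 - M2) * (M3 - M4))       # 2 <M1 - M2, M3 - M4>
rhs = (np.linalg.norm(M1 - M4, 'fro') ** 2 + np.linalg.norm(M2 - M3, 'fro') ** 2
       - np.linalg.norm(M1 - M3, 'fro') ** 2 - np.linalg.norm(M2 - M4, 'fro') ** 2)
print(abs(lhs - rhs))                         # ~1e-13, i.e. equal up to round-off
```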

Lemma B.3. Let $\{\Gamma_t, \hat{A}_t, \Delta_t\}$ be the sequences generated by the OIADMM procedure. For any $A \in \mathcal{A}$, we have

$$\|\Gamma_{t+1}\|_1 - \|\Gamma_t\|_1 \;\leq\; \frac{1}{2\beta_t}\left(\|\Delta_t\|_F^2 - \|\Delta_{t+1}\|_F^2\right) + \frac{\beta_t}{2\tau_t}\left(\|A - \hat{A}_t\|_F^2 - \|A - \hat{A}_{t+1}\|_F^2\right) - \frac{\beta_t}{2}\left(\frac{1}{\tau_t} - \Psi_{\max}(\hat{X}_t)\right)\|\hat{A}_{t+1} - \hat{A}_t\|_F^2 - \frac{\beta_t}{2}\|\Gamma_{t+1} - \hat{\Gamma}_t\|_F^2.$$

Proof. Let $\partial\|\Gamma_{t+1}\|_1$ denote the subgradient of $\|\Gamma_{t+1}\|_1$. Now $\Gamma_{t+1}$ is a minimizer of (2). Therefore,

$$0_{m \times n} \in \partial\|\Gamma_{t+1}\|_1 - \Delta_t - \beta_t\left(\hat{\Gamma}_t - \Gamma_{t+1}\right).$$

Rearranging the terms gives $\Delta_t + \beta_t(\hat{\Gamma}_t - \Gamma_{t+1}) \in \partial\|\Gamma_{t+1}\|_1$. Since $\|\Gamma\|_1$ is a convex function, we have

$$\|\Gamma_{t+1}\|_1 - \|\Gamma_t\|_1 \;\leq\; \langle \Delta_t + \beta_t(\hat{\Gamma}_t - \Gamma_{t+1}),\; \Gamma_{t+1} - \Gamma_t \rangle = \langle \Delta_t,\, \Gamma_{t+1} - \Gamma_t \rangle + \beta_t\langle \hat{\Gamma}_t - \Gamma_{t+1},\, \Gamma_{t+1} - \Gamma_t \rangle. \qquad (11)$$

Using Lemma B.1, the last term can be rewritten as

$$\beta_t\langle \hat{\Gamma}_t - \Gamma_{t+1},\, \Gamma_{t+1} - \Gamma_t \rangle = \frac{\beta_t}{2}\left(\|\hat{\Gamma}_t - \Gamma_t\|_F^2 - \|\hat{\Gamma}_t - \Gamma_{t+1}\|_F^2 - \|\Gamma_{t+1} - \Gamma_t\|_F^2\right). \qquad (12)$$

Combining the inequality of Lemma B.2 with (12) gives

$$\langle \Delta_t,\, \tilde{\Gamma}_t - \Gamma_t \rangle + \beta_t\langle \hat{\Gamma}_t - \Gamma_{t+1},\, \Gamma_{t+1} - \Gamma_t \rangle \;\leq\; \frac{\beta_t}{2\tau_t}\left(\|A - \hat{A}_t\|_F^2 - \|A - \hat{A}_{t+1}\|_F^2\right) - \frac{\beta_t}{2}\left(\frac{1}{\tau_t} - \Psi_{\max}(\hat{X}_t)\right)\|\hat{A}_{t+1} - \hat{A}_t\|_F^2 - \frac{\beta_t}{2}\|\Gamma_{t+1} - \tilde{\Gamma}_t\|_F^2 - \frac{\beta_t}{2}\|\Gamma_{t+1} - \hat{\Gamma}_t\|_F^2. \qquad (13)$$

Since $\Gamma_{t+1} - \tilde{\Gamma}_t = -(\Delta_{t+1} - \Delta_t)/\beta_t$ by (4), we have

$$\langle \Delta_t,\, \Gamma_{t+1} - \tilde{\Gamma}_t \rangle - \frac{\beta_t}{2}\|\Gamma_{t+1} - \tilde{\Gamma}_t\|_F^2 = \frac{1}{\beta_t}\left(\langle \Delta_t,\, \Delta_t - \Delta_{t+1} \rangle - \frac{1}{2}\|\Delta_{t+1} - \Delta_t\|_F^2\right) = \frac{1}{2\beta_t}\left(\|\Delta_t\|_F^2 - \|\Delta_{t+1}\|_F^2\right). \qquad (14)$$

Writing $\langle \Delta_t, \Gamma_{t+1} - \Gamma_t \rangle = \langle \Delta_t, \Gamma_{t+1} - \tilde{\Gamma}_t \rangle + \langle \Delta_t, \tilde{\Gamma}_t - \Gamma_t \rangle$ and plugging (13) and (14) into (11) yields the result.

Lemma B.4. Let $\{\Gamma_t, \hat{A}_t, \Delta_t\}$ be the sequences generated by the OIADMM procedure. If $\tau_t$ satisfies $1/\tau_t \geq \Psi_{\max}(\hat{X}_t)$, then

$$\|\hat{\Gamma}_t\|_1 - \|\Gamma_t\|_1 \;\leq\; \frac{1}{2\beta_t}\|\Lambda_t\|_F^2 + \frac{1}{2\beta_t}\left(\|\Delta_t\|_F^2 - \|\Delta_{t+1}\|_F^2\right) + \frac{\beta_t}{2\tau_t}\left(\|A - \hat{A}_t\|_F^2 - \|A - \hat{A}_{t+1}\|_F^2\right),$$

where $\Lambda_t \in \partial\|\hat{\Gamma}_t\|_1$.

Proof. Let $\Lambda_t \in \partial\|\hat{\Gamma}_t\|_1$. Therefore, $\|\hat{\Gamma}_t\|_1 - \|\Gamma_{t+1}\|_1 \leq \langle \Lambda_t,\, \hat{\Gamma}_t - \Gamma_{t+1} \rangle$. Now,

$$\langle \Lambda_t,\, \hat{\Gamma}_t - \Gamma_{t+1} \rangle = \left\langle \frac{\Lambda_t}{\sqrt{\beta_t}},\; \sqrt{\beta_t}\left(\hat{\Gamma}_t - \Gamma_{t+1}\right) \right\rangle \leq \frac{1}{2\beta_t}\|\Lambda_t\|_F^2 + \frac{\beta_t}{2}\|\hat{\Gamma}_t - \Gamma_{t+1}\|_F^2.$$

Therefore,

$$\|\hat{\Gamma}_t\|_1 - \|\Gamma_{t+1}\|_1 \;\leq\; \frac{1}{2\beta_t}\|\Lambda_t\|_F^2 + \frac{\beta_t}{2}\|\hat{\Gamma}_t - \Gamma_{t+1}\|_F^2. \qquad (15)$$

Adding (15) and the inequality of Lemma B.3 together we get

$$\|\hat{\Gamma}_t\|_1 - \|\Gamma_t\|_1 \;\leq\; \frac{1}{2\beta_t}\|\Lambda_t\|_F^2 + \frac{1}{2\beta_t}\left(\|\Delta_t\|_F^2 - \|\Delta_{t+1}\|_F^2\right) + \frac{\beta_t}{2\tau_t}\left(\|A - \hat{A}_t\|_F^2 - \|A - \hat{A}_{t+1}\|_F^2\right) - \frac{\beta_t}{2}\left(\frac{1}{\tau_t} - \Psi_{\max}(\hat{X}_t)\right)\|\hat{A}_{t+1} - \hat{A}_t\|_F^2.$$

Setting $1/\tau_t \geq \Psi_{\max}(\hat{X}_t)$ means that $\frac{\beta_t}{2}\left(\frac{1}{\tau_t} - \Psi_{\max}(\hat{X}_t)\right)\|\hat{A}_{t+1} - \hat{A}_t\|_F^2 \geq 0$. Therefore,

$$\|\hat{\Gamma}_t\|_1 - \|\Gamma_t\|_1 \;\leq\; \frac{1}{2\beta_t}\|\Lambda_t\|_F^2 + \frac{1}{2\beta_t}\left(\|\Delta_t\|_F^2 - \|\Delta_{t+1}\|_F^2\right) + \frac{\beta_t}{2\tau_t}\left(\|A - \hat{A}_t\|_F^2 - \|A - \hat{A}_{t+1}\|_F^2\right).$$

Theorem B.5 (Theorem 4.1 Restated). Let $\{\Gamma_t, \hat{A}_t, \Delta_t\}$ be the sequences generated by the OIADMM procedure and $R(T)$ be defined as above. Assume the following conditions hold: (i) the Frobenius norm of $\partial\|\hat{\Gamma}_t\|_1$ is upper bounded by $\Phi$, (ii) $\hat{A}_1 = 0_{m \times k}$ and $\|A_{\mathrm{op}}\|_F \leq D$, (iii) $\Delta_1 = 0_{m \times n}$, and (iv) $\beta_t = \beta$ and $\tau_t = \tau$ with $1/\tau \geq \Psi_{\max}(\hat{X}_t)$ for all $t$. Setting $\beta = (\Phi/D)\sqrt{\tau T}$, we have

$$R(T) \;\leq\; \Phi D\sqrt{\frac{T}{\tau}} + \sum_{t=1}^{T}\|A_{\mathrm{op}}E_t\|_1.$$

Proof. Substituting $\tilde{\Gamma}_t^{\mathrm{op}} = P_t - A_{\mathrm{op}}\hat{X}_t$ for $\Gamma_t$ and $A_{\mathrm{op}}$ for $A$ in Lemma B.4 and summing the inequality over $t$ from $1$ to $T$, we get the following canceling telescoping sum

$$\sum_{t=1}^{T}\left(\|\hat{\Gamma}_t\|_1 - \|\tilde{\Gamma}_t^{\mathrm{op}}\|_1\right) \;\leq\; \sum_{t=1}^{T}\frac{\|\Lambda_t\|_F^2}{2\beta} + \frac{1}{2\beta}\left(\|\Delta_1\|_F^2 - \|\Delta_{T+1}\|_F^2\right) + \frac{\beta}{2\tau}\left(\|A_{\mathrm{op}} - \hat{A}_1\|_F^2 - \|A_{\mathrm{op}} - \hat{A}_{T+1}\|_F^2\right) \;\leq\; \sum_{t=1}^{T}\frac{\|\Lambda_t\|_F^2}{2\beta} + \frac{\beta}{2\tau}\|A_{\mathrm{op}}\|_F^2 - \frac{\|\Delta_{T+1}\|_F^2}{2\beta} - \frac{\beta}{2\tau}\|A_{\mathrm{op}} - \hat{A}_{T+1}\|_F^2 \;\leq\; \frac{\Phi^2 T}{2\beta} + \frac{D^2\beta}{2\tau},$$

where the second inequality uses $\Delta_1 = 0_{m \times n}$ and $\hat{A}_1 = 0_{m \times k}$. Since

$$\Gamma_t^{\mathrm{op}} = P_t - A_{\mathrm{op}}X_t = P_t - A_{\mathrm{op}}(\hat{X}_t + E_t) = \tilde{\Gamma}_t^{\mathrm{op}} - A_{\mathrm{op}}E_t,$$

we then have $\|\tilde{\Gamma}_t^{\mathrm{op}}\|_1 \leq \|\Gamma_t^{\mathrm{op}}\|_1 + \|A_{\mathrm{op}}E_t\|_1$. The regret is bounded as follows:

$$R(T) = \sum_{t=1}^{T}\left(\|\hat{\Gamma}_t\|_1 - \|\Gamma_t^{\mathrm{op}}\|_1\right) \;\leq\; \frac{\Phi^2 T}{2\beta} + \frac{D^2\beta}{2\tau} + \sum_{t=1}^{T}\|A_{\mathrm{op}}E_t\|_1.$$

Setting $\beta = \frac{\Phi}{D}\sqrt{\tau T}$, which minimizes $\frac{\Phi^2 T}{2\beta} + \frac{D^2\beta}{2\tau}$ over $\beta$ and makes both terms equal to $\frac{\Phi D}{2}\sqrt{T/\tau}$, yields the desired bound.

As mentioned in Section 4, OIADMM can violate the equality constraint at each $t$, i.e., $P_t - \hat{A}_{t+1}\hat{X}_t \neq \Gamma_{t+1}$. However, we show in Theorem B.6 that the accumulated loss caused by the violation of the equality constraint is sublinear in $T$, i.e., the equality constraint is satisfied on average in the long run.

Theorem B.6. Let $\{\Gamma_t, \hat{A}_t, \Delta_t\}$ be the sequences generated by the OIADMM procedure and $R(T)$ be defined as above. Assume the following conditions hold: (i) the Frobenius norm of $\partial\|\hat{\Gamma}_t\|_1$ is upper bounded by $\Phi$, (ii) $\hat{A}_1 = 0_{m \times k}$ and $\|A_{\mathrm{op}}\|_F \leq D$, (iii) $\Delta_1 = 0_{m \times n}$, (iv) $\beta_t = \beta$ and $\tau_t = \tau$ with $1/\tau \geq 2\Psi_{\max}(\hat{X}_t)$ for all $t$, and (v) $\|\tilde{\Gamma}_t^{\mathrm{op}}\|_1 \leq \Upsilon$. Setting $\beta = (\Phi/D)\sqrt{\tau T}$, we have

$$\sum_{t=1}^{T}\|\Gamma_{t+1} - \tilde{\Gamma}_t\|_F^2 \;\leq\; \frac{2D^2}{\tau} + \frac{4\Upsilon D}{\Phi}\sqrt{\frac{T}{\tau}}.$$

Proof. Let us look at $\|\Gamma_{t+1} - \tilde{\Gamma}_t\|_F^2$:

$$\|\Gamma_{t+1} - \tilde{\Gamma}_t\|_F^2 = \|(\Gamma_{t+1} - \hat{\Gamma}_t) + (\hat{\Gamma}_t - \tilde{\Gamma}_t)\|_F^2 \leq 2\|\Gamma_{t+1} - \hat{\Gamma}_t\|_F^2 + 2\|\hat{\Gamma}_t - \tilde{\Gamma}_t\|_F^2 \leq 2\|\Gamma_{t+1} - \hat{\Gamma}_t\|_F^2 + 2\Psi_{\max}(\hat{X}_t)\|\hat{A}_{t+1} - \hat{A}_t\|_F^2. \qquad (16)$$

For the first inequality, we used the simple fact that for any two matrices $M_1$ and $M_2$, $\|M_1 + M_2\|_F^2 \leq 2\|M_1\|_F^2 + 2\|M_2\|_F^2$. The second inequality is because of (9).

Firstly, since $\|\Gamma_{t+1}\|_1 \geq 0$,

$$\|\Gamma_{t+1}\|_1 - \|\tilde{\Gamma}_t^{\mathrm{op}}\|_1 \;\geq\; -\|\tilde{\Gamma}_t^{\mathrm{op}}\|_1 \;\geq\; -\Upsilon.$$

Using this and rearranging terms in the inequality of Lemma B.3 with $A_{\mathrm{op}}$ instead of $A$ (and $\tilde{\Gamma}_t^{\mathrm{op}}$ in place of $\Gamma_t$) gives

$$\|\Gamma_{t+1} - \hat{\Gamma}_t\|_F^2 \;\leq\; \frac{1}{\beta^2}\left(\|\Delta_t\|_F^2 - \|\Delta_{t+1}\|_F^2\right) + \frac{1}{\tau}\left(\|A_{\mathrm{op}} - \hat{A}_t\|_F^2 - \|A_{\mathrm{op}} - \hat{A}_{t+1}\|_F^2\right) - \left(\frac{1}{\tau} - \Psi_{\max}(\hat{X}_t)\right)\|\hat{A}_{t+1} - \hat{A}_t\|_F^2 + \frac{2\Upsilon}{\beta}.$$

Plugging this into (16) yields

$$\|\Gamma_{t+1} - \tilde{\Gamma}_t\|_F^2 \;\leq\; \frac{2}{\beta^2}\left(\|\Delta_t\|_F^2 - \|\Delta_{t+1}\|_F^2\right) + \frac{2}{\tau}\left(\|A_{\mathrm{op}} - \hat{A}_t\|_F^2 - \|A_{\mathrm{op}} - \hat{A}_{t+1}\|_F^2\right) - 2\left(\frac{1}{\tau} - 2\Psi_{\max}(\hat{X}_t)\right)\|\hat{A}_{t+1} - \hat{A}_t\|_F^2 + \frac{4\Upsilon}{\beta}.$$

Letting $1/\tau \geq 2\Psi_{\max}(\hat{X}_t)$ and summing over $t$ from $1$ to $T$, we have

$$\sum_{t=1}^{T}\|\Gamma_{t+1} - \tilde{\Gamma}_t\|_F^2 \;\leq\; \frac{2}{\beta^2}\left(\|\Delta_1\|_F^2 - \|\Delta_{T+1}\|_F^2\right) + \frac{2}{\tau}\left(\|A_{\mathrm{op}} - \hat{A}_1\|_F^2 - \|A_{\mathrm{op}} - \hat{A}_{T+1}\|_F^2\right) + \frac{4\Upsilon T}{\beta} \;\leq\; \frac{2D^2}{\tau} + \frac{4\Upsilon T}{\beta}.$$

Setting $\beta = \frac{\Phi}{D}\sqrt{\tau T}$ yields the desired bound, since then $\frac{4\Upsilon T}{\beta} = \frac{4\Upsilon D}{\Phi}\sqrt{\frac{T}{\tau}}$.
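To make the procedure analyzed in this section concrete, here is a minimal NumPy sketch (our own reading of updates (2)-(4), not the authors' code) of a single OIADMM round: the $\Gamma$-update (2) has a closed-form soft-thresholding solution, the dictionary update (3) is a linearized proximal step with step size $\tau_t$, and (4) is the dual update. The projection onto $\mathcal{A}$ is approximated by clipping and column rescaling, and the explicit form of $G_{t+1}$ follows our reconstruction in the proof of Lemma B.2.

```python
import numpy as np

def soft(V, thr):
    return np.sign(V) * np.maximum(np.abs(V) - thr, 0.0)

def project_A(A):
    # Approximate projection onto {A >= 0, ||A_j||_1 <= 1}: clip, then rescale columns.
    A = np.maximum(A, 0.0)
    return A / np.maximum(A.sum(axis=0, keepdims=True), 1.0)

def oiadmm_round(P_t, Xhat_t, A_t, Delta_t, beta_t):
    # tau_t chosen so that 1/tau_t >= Psi_max(Xhat_t), as required by Lemma B.4.
    tau_t = 1.0 / max(np.linalg.eigvalsh(Xhat_t @ Xhat_t.T).max(), 1e-12)
    Gamma_hat = P_t - A_t @ Xhat_t
    # (2): closed-form solution of the Gamma-subproblem (soft-thresholding).
    Gamma_next = soft(Gamma_hat + Delta_t / beta_t, 1.0 / beta_t)
    # (3): linearized proximal step on the dictionary, followed by projection onto A.
    G = -((Delta_t / beta_t) + Gamma_hat - Gamma_next) @ Xhat_t.T
    A_next = project_A(A_t - tau_t * G)
    # (4): dual variable update.
    Delta_next = Delta_t + beta_t * (P_t - A_next @ Xhat_t - Gamma_next)
    return Gamma_next, A_next, Delta_next
```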

C  Pseudo-Codes from Section 5

Let us start by extending the definition of $\mathcal{A}$: define

$$\mathcal{A}_k = \left\{A \in \mathbb{R}^{m \times k} : A \geq 0_{m \times k} \text{ and } \forall j = 1, \ldots, k,\; \|A_j\|_1 \leq 1\right\},$$

where $A_j$ is the $j$th column in $A$. We use $\Pi_{\mathcal{A}_k}$ to denote the projection onto the nearest point in the convex set $\mathcal{A}_k$.

Algorithm 4 : BATCH-IMPL
  Input: $P_{[t-1]} \in \mathbb{R}^{m \times N_{t-1}}$, $X_{[t-1]} \in \mathbb{R}^{k_t \times N_{t-1}}$, $P_t = [p_1, \ldots, p_{n_t}] \in \mathbb{R}^{m \times n_t}$, $A_t \in \mathbb{R}^{m \times k_t}$, $\lambda, \zeta, \eta \geq 0$
  Novel Document Detection Step:
  for $j = 1$ to $n_t$ do
    Solve: $x_j = \arg\min_{x \geq 0} \|p_j - A_t x\|_1 + \lambda\|x\|_1$ (solved using Algorithm 2)
    if $\|p_j - A_t x_j\|_1 + \lambda\|x_j\|_1 > \zeta$ mark $p_j$ as novel
  Batch Dictionary Learning Step:
  Set $k_{t+1} \leftarrow k_t + \eta$
  Set $Z_{[t]} \leftarrow [X_{[t-1]}\; x_1 \cdots x_{n_t}]$
  Set $X_{[t]} \leftarrow [Z_{[t]};\, 0_{\eta \times N_t}]$ (append $\eta$ zero rows for the new atoms)
  Set $P_{[t]} \leftarrow [P_{[t-1]}\; p_1 \cdots p_{n_t}]$
  for $i = 1$ to convergence do
    Solve: $A_{t+1} = \arg\min_{A \in \mathcal{A}_{k_{t+1}}} \|P_{[t]} - AX_{[t]}\|_1$ (solved using Algorithm 3 with warm starts)
    Solve: $X_{[t]} = \arg\min_{X \geq 0} \|P_{[t]} - A_{t+1}X\|_1 + \lambda\|X\|_1$ (solved using Algorithm 2)

Algorithm 5 : L2-BATCH
  Input: $P_{[t-1]} \in \mathbb{R}^{m \times N_{t-1}}$, $P_t = [p_1, \ldots, p_{n_t}] \in \mathbb{R}^{m \times n_t}$, $A_t \in \mathbb{R}^{m \times k_t}$, $\lambda \geq 0$, $\zeta \geq 0$, $\eta \geq 0$
  Novel Document Detection Step:
  for $j = 1$ to $n_t$ do
    Solve: $x_j = \arg\min_{x \geq 0} \|p_j - A_t x\|_2^2 + \lambda\|x\|_1$ (solved using the LARS method [3])
    if $\|p_j - A_t x_j\|_2^2 + \lambda\|x_j\|_1 > \zeta$ mark $p_j$ as novel
  ℓ2-batch Dictionary Learning Step:
  Set $k_{t+1} \leftarrow k_t + \eta$
  Set $P_{[t]} \leftarrow [P_{[t-1]}\; p_1 \cdots p_{n_t}]$
  $[A_{t+1}, X_{[t]}] = \arg\min_{A \in \mathcal{A}_{k_{t+1}},\, X \geq 0} \|P_{[t]} - AX\|_F^2 + \lambda\|X\|_1$ (non-negative sparse coding problem)
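For illustration, here is a small Python sketch (ours, with assumed inputs `A_t`, `lam`, `zeta`) of the shared Novel Document Detection Step, using the ℓ2 coding objective of Algorithm 5; a simple projected ISTA loop stands in for LARS, and the ℓ1 variant in Algorithm 4 would instead call an ℓ1 solver such as Algorithm 2.

```python
import numpy as np

def nonneg_lasso(p, A, lam, iters=500):
    # Projected ISTA for min_{x >= 0} ||p - Ax||_2^2 + lam*||x||_1 (stand-in for LARS).
    L = max(np.linalg.eigvalsh(A.T @ A).max(), 1e-12)
    step = 1.0 / (2.0 * L)                      # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ x - p)
        x = np.maximum(x - step * grad - step * lam, 0.0)
    return x

def detect_novel(P_t, A_t, lam, zeta):
    # Novel Document Detection Step: flag documents whose sparse-coding objective exceeds zeta.
    novel = []
    for j in range(P_t.shape[1]):
        p = P_t[:, j]
        x = nonneg_lasso(p, A_t, lam)
        obj = np.sum((p - A_t @ x) ** 2) + lam * np.abs(x).sum()
        if obj > zeta:
            novel.append(j)
    return novel
```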

D  Additional Experimental Evaluation

In Figure 1, we show the effect of the size of the dictionary on the performance of Algorithm ONLINE. The average AUC is computed as in Table 1. Not surprisingly, as the size of the dictionary $k$ increases the average AUC also increases, but correspondingly the running time of the algorithm also increases. The plot suggests that there is a diminishing return on AUC with increase in the size of the dictionary, and this increase in AUC comes at the cost of higher running times.

Post-processing done to Generate Table 2. In each timestep, instead of thresholding by $\zeta$, we take the top 10% of tweets measured in terms of the sparse coding objective value and run a dictionary-based clustering, described in [4], on it. Further post-processing is done to discard clusters without much support and to pick a representative tweet for each cluster.

[Figure 1 plot: "AUC vs. Time for Different Online Dictionary Sizes"; x-axis: CPU running time (s), y-axis: average AUC.]

Figure 1: TDT dataset: Average AUC vs. running time for different values of dictionary sizes $k$ in Algorithm ONLINE. The points plotted from left to right are for $k = 50, 100, 150,$ and $200$.

References

[1] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 2011.
[2] P. Combettes and J. Pesquet. Proximal Splitting Methods in Signal Processing. arXiv:0912.3522, 2009.
[3] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise Coordinate Optimization. The Annals of Applied Statistics, 1, 2007.
[4] S. P. Kasiviswanathan, P. Melville, A. Banerjee, and V. Sindhwani. Emerging Topic Detection using Dictionary Learning. In CIKM, pages 745-754, 2011.
[5] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, 2004.
[6] J. Yang and Y. Zhang. Alternating Direction Algorithms for ℓ1-Problems in Compressive Sensing. SIAM Journal on Scientific Computing, 33(1):250-278, 2011.