A FINITE HYPERPLANE TRAVERSAL ALGORITHM FOR 1-DIMENSIONAL L1pTV MINIMIZATION, FOR 0 < p ≤ 1


HEATHER A. MOON AND THOMAS J. ASAKI

Abstract. In this paper, we consider a discrete formulation of the one-dimensional L1pTV functional and introduce a finite algorithm that finds exact minimizers of this functional for 0 < p ≤ 1. Our algorithm for the special case of L1TV returns globally optimal solutions for all λ ≥ 0 at the same computational cost as determining a single optimal solution associated with a particular value of λ. This finite set of minimizers contains the scale signature of the known initial data. A variation on this algorithm returns locally optimal solutions for all λ ≥ 0 for the case when 0 < p < 1. The algorithm utilizes the geometric structure of the set of hyperplanes defined by the nonsmooth points of the L1pTV functional. We discuss efficient implementations of the algorithm for both general and binary data.

1. Introduction

In this paper, we introduce an efficient finite hyperplane traversal (HT) algorithm for solving the 1D discrete L1pTV problem, for all parameters λ > 0:

  min_{u ∈ R^{m+1}} G_p(u) = Σ_{i=0}^{m−1} |u_{i+1} − u_i|^p + λ Σ_{i=0}^{m} |f_i − u_i|,  (1)

where 0 < p ≤ 1, f ∈ R^{m+1} is some given data, with either fixed (u_0 = f_0 and u_m = f_m) or free boundary conditions. The HT algorithm requires only finitely many iterations to obtain a complete set of exact minimizers of (1) for all λ > 0. For p = 1, these minimizers are global minimizers, while for p < 1, the minimizers are local. Computationally efficient implementations of the HT algorithm are presented for both general and binary data.

The HT algorithm uses the geometric structure of the discrete L1pTV function, G_p, in (1). Notice that G_p is nonsmooth on hyperplanes of the form {u | u_i = u_{i+1}}, for i = 0, ..., m−1, and {u | u_i = f_i}, for 0 ≤ i ≤ m, and smooth everywhere else in R^{m+1}. We show that minimizers of G_p are located at intersections of these hyperplanes. While the HT algorithm is an iterative improved-point algorithm, this geometric structure allows us to avoid computing descent directions and step sizes in the typical optimization sense.
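For concreteness, the discrete objective in (1) can be evaluated directly. The following is a minimal sketch; the function name `G_p` and its calling convention (u and f as equal-length sequences that include the boundary entries) are our own illustrative choices, not the paper's.

```python
# Hedged sketch: direct evaluation of the discrete L1pTV objective G_p in (1).
def G_p(u, f, lam, p=1.0):
    """Discrete L1pTV objective: sum |u_{i+1}-u_i|^p + lam * sum |f_i-u_i|."""
    variation = sum(abs(u[i + 1] - u[i]) ** p for i in range(len(u) - 1))
    fidelity = sum(abs(fi - ui) for fi, ui in zip(f, u))
    return variation + lam * fidelity
```

At u = f the fidelity term vanishes and G_p(f, f, λ) is just the p-variation of the data.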
The algorithm iteratively reduces the parameter λ (in the spirit of parametric programming), determining at each step an optimal descent direction and performing a line search on a finite subset of points. Furthermore, it is never necessary to leave hyperplanes of the form {u | u_i = u_{i+1}} and {u | u_i = f_i}, so that the dimension of the problem is reduced after each iteration. A complete set of minimizers for (1) can also be obtained by reformulation to a linear program and solved using a parametric simplex method (for example, see [8]). However, the HT algorithm is distinctly advantageous for several reasons. Significantly, a reformulation is only possible for the p = 1 case, whereas the p < 1 solutions are of importance as described later. A reformulation changes an unconstrained problem in m variables to a linear program with 5m variables, 2(m − 1) constraints, and 5(m − 1) additional sign restrictions. Thus, it cannot take advantage of any dimension-reduction strategies, and it requires a large amount of data storage and computation due to the large number of constraints. For the p = 1 case, the HT algorithm produces a sequence of minimizers that correspond to basic feasible solutions (or minimizing linear combinations) of the reformulated problem. However, the HT algorithm does not rely on the high-dimensional geometry of the feasible set of the reformulated problem, working instead in a continually reduced dimensional space. Graph-cut methods have also been successful in efficiently solving (1) for both one- and two-dimensional data [10, 9]. They can only address the p = 1 case; however, the λ-parametric problem can be solved by this method. The advantage of the hyperplane traversal algorithm is in simplicity of implementation and minimal data footprint.

Date: September 5,

Problem (1) is a discretization of

  min_{u ∈ BV(Ω)} ∫_Ω |∇u|^p dx + λ ∫_Ω |f − u| dx, for 0 < p ≤ 1,  (2)

where f : R^n → R is a given function, Ω is a bounded domain, and the minimization is taken over all functions of bounded variation, u : R^n → R. Problems such as (2) are motivated by applications in signal and image analysis tasks such as denoising and finding scales in data. Variational and PDE-based methods of denoising have been used for more than two decades ([14], [12], [2], and [3]). The most notable variational techniques solve the problem

  u* ∈ arg min_{u ∈ L^2(Ω)} F_{p,q}(u, ∇u) = ∫_Ω |∇u|^p + λ|f − u|^q dx.  (3)

Now, if λ is large, the term containing λ|f − u|^q must be small, therefore making our minimizer u* close to f in the L^q sense. But, if λ is small, |∇u|^p will be small; that is, u* will have smaller variation in the L^p sense. In the applications of denoising and finding scales, we note that u* is less noisy than f (or has less of the smaller scales of f) and more flat when λ is small, and u* is more like f (more of the noise remains) when λ is large.

We now briefly discuss a few of the results for particular values of p and q. F_{2,2}(u, ∇u) was first introduced by Tikhonov [14]. This functional is strictly convex. Using results from the calculus of variations (see [5, 6, 7]), we can say that there exists a unique minimizer. In image denoising, we find that the minimizer of F_{2,2}(u, ∇u) has smoothed edges around objects (F_{2,2}(u, ∇u) is larger for functions with jump discontinuities than for those that increase steadily, thus losing edge location). Pixel intensity is also lost. This problem, then, is not sufficient for images with regions of high contrast or well-defined object edges. Rudin, Osher, and Fatemi proposed [12] minimizing F_{1,2}(u, ∇u), also called the ROF functional, to allow jump discontinuities in u, which makes sense in many real images. F_{1,2}(u, ∇u) is also strictly convex. In image denoising, we see that minimizers preserve the location of object edges, but still lose contrast (even when f is a noiseless image), and features with high curvature are lost [13]. That is, corners get rounded. In [2], Chan and Esedoḡlu show that minimizing the L1TV functional, F_{1,1}, for imaging tasks will preserve pixel intensity. However, features of high curvature are still lost. This functional is again convex, but this time it is not strictly convex. Therefore, it should be noted that we cannot guarantee a unique minimizer for L1TV. For a discussion of the discretized L1TV see also [1] and [11]. In 2007, Chartrand [3] proposed to minimize F_{p,2} for 0 < p < 1. It is worth noting that the functionals F_{p,2} are not convex, and therefore standard methods do not guarantee that we find a global minimizer. Despite this lack of guarantee, Chartrand found success in obtaining what appear to be local minimizers. For a cartoon image or an image with piecewise constant intensities, these solutions preserve object edges, pixel intensity, and areas of high curvature where sharp corners occur.

In Section 2 we present a variety of properties of (1) when p = 1 and provide helpful definitions and notation used throughout the paper. In Section 3 we provide a formal description of a preliminary HT algorithm, including motivating concepts. In Section 4 we show that this algorithm provides a global minimizer (at fixed λ) for the case p = 1. In Section 5 we show that the iterative solution of the λ = 0 (p = 1) problem provides a complete set of global minimizers for all λ ≥ 0, and we formally describe an efficient HT algorithm. In Section 6 we show results of time trials. In Section 7 we consider an example of using HT for extracting scale information from daily sunspot number data. In Section 8 we discuss the generalization of HT to the p < 1 case. Finally, in Section 9 we provide concluding remarks.

2. Properties of the discrete L1TV function

In this section we present a variety of properties of (1) for p = 1:

  min_{u ∈ R^{m+1}} G_1(u) = Σ_{i=0}^{m−1} |u_{i+1} − u_i| + λ Σ_{i=0}^{m} |f_i − u_i|.  (4)

We also provide helpful definitions and notation used throughout the paper. In particular, we consider the geometric properties of G_1 and consequences for finding solutions to (4). First, we define the sets on which G_1 is nonsmooth:

  S_u := {u = (u_1, ..., u_m) | u_i = u_{i+1} for some i = 0, ..., m−1}  (5)

and

  S_f := {u = (u_1, u_2, ..., u_m) | u_i = f_i for some i = 0, ..., m}.  (6)

Now we collect some properties that describe the minimizers of G_1.

Lemma 2.1. G_1 has a minimizer.

Proof. G_1 is bounded below, convex, and coercive in that G_1(u) → ∞ as ‖u‖ → ∞. Thus, by the Weierstrass optimality condition, G_1 has a minimizer.

Definition 2.1. We define the set of global minimizers, M_λ, of G_1, noting that this set depends on the value λ > 0: M_λ ≡ arg min_u G_1.

Lemma 2.2. M_λ is bounded and convex.

Proof. The boundedness and convexity of M_λ follow directly from the coercivity and convexity of G_1.

Lemma 2.3. If u* is a local minimizer of G_1, then u* ∈ M_λ. That is, if u* is a local minimizer, then it is also a global minimizer.

Proof. This lemma follows directly from the convexity of G_1.

Definition 2.2. Let G_1 be defined as in (4). Let Y be the set of points of intersection of at least m + 1 hyperplanes of the form {u_i = f_i} and/or {u_i = u_{i+1}}.

Lemma 2.4. |Y| ≤ (2m+1)!/((m+1)! m!) < ∞.

Proof. First note that the number of hyperplanes of the form {u_i = u_{i+1}} is m. The number of hyperplanes of the form {u_i = f_i} is m + 1. Using the definition of Y, we see that the number of all possible intersections of m + 1 such hyperplanes is C(2m+1, m+1). That is,

  |Y| ≤ C(2m+1, m+1) = (2m+1)!/((m+1)! m!).

(Here, the first relation is an inequality because we may have hyperplanes that are everywhere the same.)

Lemma 2.5. There is a minimizer, u*, of G_1 in Y.

Proof. Let û be a minimizer of G_1 with û ∉ Y. Suppose first that ∇G_1(û) exists. Then ∇G_1(û) = 0. But then, because G_1 is affine at points where it is differentiable, G_1 is constant on the whole region containing û, up to and including the bounding hyperplanes where G_1 is nonsmooth. By the coercivity condition, this region must be bounded. Therefore, there is a point, u*, on the boundary of this region that is in Y such that G_1(u*) = G_1(û), where G_1 is given in (4). Second, suppose that û ∈ H, where H is the intersection of l < m hyperplanes where G_1 is nonsmooth. Then we consider the function G̃_1, which is G_1 restricted to H. Then ∇G̃_1(û) exists and is zero. We can then use the same argument as above to conclude that the set of minimizers includes a point in Y.
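As a numerical illustration of Lemma 2.5, for a tiny data vector one can check that a brute-force search over intersection points of the nonsmooth hyperplanes already attains the minimum found by a dense grid search. The data vector below is the pedagogical example used in Section 3; restricting candidate coordinates to the data values is an illustrative shortcut that happens to cover the relevant intersections for this example, not the paper's general construction of Y.

```python
from itertools import product

F = (0.0, 0.9, 0.4, 1.0)   # data with fixed boundaries u_0 = 0, u_3 = 1
LAM = 1.0

def G1(u1, u2):
    """G_1 of (4) for this example, with u_0 and u_3 fixed at the data values."""
    u = (F[0], u1, u2, F[3])
    tv = sum(abs(u[i + 1] - u[i]) for i in range(3))
    fid = sum(abs(a - b) for a, b in zip(F, u))
    return tv + LAM * fid

# Candidate "corner" points: both coordinates drawn from the data values.
best_Y = min(G1(u1, u2) for u1, u2 in product(F, F))

# A dense grid search over [-0.2, 1.2]^2 finds nothing better,
# consistent with a minimizer of G_1 lying in the finite set Y.
grid = [i / 100.0 for i in range(-20, 121)]
best_grid = min(G1(u1, u2) for u1, u2 in product(grid, grid))
```

Here both searches return the same optimal value, attained (among other points) where two of the nonsmooth lines cross.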

3. Preliminary HT Algorithm for discrete L1TV

We now propose a preliminary HT algorithm for solving (4) for a single fixed value of λ. The convergence properties of this algorithm are examined in Section 4. In Section 5 we show that a finite λ-parametric iterative implementation can be used to discover globally optimal solutions for all λ ≥ 0.

Consider a pedagogical example. Let λ = 1, f = (0, 0.9, 0.4, 1), with fixed boundary conditions u_0 = f_0, u_3 = f_3. The geometry of G_1 is illustrated in Figure 1. The level lines of G_1 appear as simple polygons, with blue indicating lower value. There are seven hyperplanes (lines) where G_1 is nonsmooth, and |Y| = 11. The preliminary algorithm is an improved-iterate algorithm in which all iterates lie in S_u ∪ S_f, and an optimal point lies in Y.

Figure 1. Level lines for the function G_1(u_1, u_2) = |u_1| + |u_2 − u_1| + |1 − u_2| + |0.9 − u_1| + |0.4 − u_2|, showing the affine nature of the discrete formulation for L1TV. Here blue represents low values and red represents high values.

The preliminary algorithm works as follows. Start at the point u = f. Check positive and negative coordinate directions for descent in G_1. Sum all coordinate descent directions to get an α-descent direction (formal definition below). Perform a finite line search on the set of points where the search line intersects the hyperplane set S_u ∪ S_f. If the algorithm steps to a point in S_f, compute a new α-descent direction and repeat. Otherwise (if the algorithm steps to a point in S_u), project the current point onto R^{l−k}, the space that is isomorphic to the intersection of the hyperplanes of the form {u_i = u_{i+1}}. Repeat until no coordinate descent exists. The formal algorithm is given in Tables 1 and 2 and makes use of the following definitions. Let the coordinate directions e_i be defined as usual; that is, e_i = (0, ..., 0, 1, 0, ..., 0), where the 1 is located in the ith position.

Definition 3.1. We define an α-direction to be a vector v such that v = Σ_i α_i e_i, where α_i ∈ {−1, 0, 1}. We say that v is an α-descent direction for G_1 at the point u if v is an α-direction and G_1(u + γv) < G_1(u) for all 0 < γ < γ̄, for some γ̄ > 0.

4. Global Minimizers for a fixed λ

Using Lemmas 2.1, 2.4, and 2.5, we know that G_1 has a minimizer in the finite set Y. Below we show that Algorithm 3.1 finds a minimizer after finitely many iterations. We begin with the statement of this theorem.

Theorem 4.1. Algorithm 3.1 converges to a minimum and is finite.
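The preliminary algorithm just described (formalized in Tables 1 and 2) can also be sketched in executable form. The sketch below is our own hedged reading, not the paper's exact bookkeeping: it uses free boundaries and replaces the α-descent computation plus line search with a best-breakpoint move of each whole cluster, repeated until no move decreases G_1.

```python
def g1(u, f, lam):
    """Discrete L1TV objective G_1 of (4)."""
    return (sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))
            + lam * sum(abs(a - b) for a, b in zip(f, u)))

def ht_preliminary(f, lam, max_iter=1000):
    """Move clusters to breakpoints in S_u and S_f until no move improves G_1."""
    u = list(f)
    m = len(u)
    for _ in range(max_iter):
        improved = False
        i = 0
        while i < m:
            j = i
            while j + 1 < m and u[j + 1] == u[i]:
                j += 1                              # u[i..j] is one cluster
            # Breakpoints: data values inside the cluster, neighboring cluster values.
            targets = set(f[i:j + 1])
            if i > 0:
                targets.add(u[i - 1])
            if j + 1 < m:
                targets.add(u[j + 1])
            best_t, best_val = None, g1(u, f, lam)
            for t in targets:
                trial = u[:i] + [t] * (j - i + 1) + u[j + 1:]
                val = g1(trial, f, lam)
                if val < best_val - 1e-12:
                    best_t, best_val = t, val
            if best_t is not None:
                u[i:j + 1] = [best_t] * (j - i + 1)
                improved = True
            i = j + 1
        if not improved:
            break
    return u
```

On the pedagogical example f = (0, 0.9, 0.4, 1) with λ = 1, this sketch reaches the optimal value 1.5, and for large λ it leaves the data unchanged, as the fidelity term dominates.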

Algorithm 3.1 (L1TV).
Given f = (f_1, ..., f_m);
Set u^(0) = (u^(0)_1, ..., u^(0)_m) = (f_1, ..., f_m);
Set k ← 0;
do
  Compute d^(k) using Algorithm 3.2;
  α_k ← arg min_α { G_1(u^(k) + α d^(k)) | u^(k) + α d^(k) ∈ S_u ∪ S_f };
  u^(k+1) ← u^(k) + α_k d^(k);
  k ← k + 1;
until d^(k) = 0

Table 1. Preliminary L1TV algorithm

Algorithm 3.2 (Descent at iteration k).
Given u_1, ..., u_m;
i = 1;
Evaluate G0 = G_1(u_1, ..., u_m);
Set d ← 0;
while i ≤ m
  l_max = arg max { l | u_h = u_i, i ≤ h ≤ i + l };
  v = Σ_{l=i}^{i+l_max} e_l;
  G1 = G_1(u + (u_{i−1} − u_i) v);
  G2 = G_1(u + (u_{i+l_max+1} − u_{i+l_max}) v);
  if G1 < G0
    d ← d + sgn(u_{i−1} − u_i) v;
  elseif G2 < G0
    d ← d + sgn(u_{i+l_max+1} − u_{i+l_max}) v;
  end
  i ← i + l_max + 1;
end

Table 2. α-descent algorithm

The proof of Theorem 4.1 is a consequence of the next four lemmas, which we will prove in subsequent subsections.

Lemma 4.1. Wherever descent exists, α-descent also exists.

Lemma 4.2. Whenever α-descent exists, the α-direction of Algorithm 3.1 also exists.

Lemma 4.3. α-directions found in Algorithm 3.1 give strict descent.

Lemma 4.4. Only finitely many steps of the algorithm are needed to get from one point in Y to another.

4.1. Proof of Lemma 4.1. In this section we will show that if G_1 has a descent direction at u, then G_1 also has an α-descent direction at u. We begin by borrowing from [4] the definitions of the generalized gradient and generalized derivative, which we use to discuss descent directions for G_1.

Definition 4.1. We define the generalized gradient of a locally Lipschitz function g at a point x to be

  ∂g(x) = co { lim_{x_i → x} ∇g(x_i) | ∇g(x_i) exists }.  (7)

We define the generalized derivative, g°(u; v), of a function g at a point u in the direction v to be

  g°(u; v) ≡ lim sup_{y → u, t ↓ 0} [g(y + tv) − g(y)] / t,  (8)

where y ∈ R^m and t > 0. We also take from [4] the following proposition:

Proposition 4.1. Let g : X → R be Lipschitz near x. Then for every v ∈ X, we have

  g°(x; v) = max { ⟨ζ, v⟩ | ζ ∈ ∂g(x) }.  (9)

Because in a neighborhood of each point u ∈ R^m there are only finitely many ∇g(y), Equation (9) reduces to

  ∂g(u) = co {∇g(u_1), ..., ∇g(u_l)} = { α_1 ∇g(u_1) + ... + α_l ∇g(u_l) | α_i ≥ 0, i = 1, ..., l, Σ_i α_i = 1 }  (10)

and

  g°(u; v) = max_{(α_1,...,α_l)} { α_1 ∇g(u_1)·v + ... + α_l ∇g(u_l)·v | α_i ≥ 0, i = 1, ..., l, Σ_{i=1}^l α_i = 1 }.

Let K(u) be the cone of descent directions for G_1 at u. We now prove a more general statement about functions that are continuous and piecewise affine.

Lemma 4.5. Let g : R^n → R be a continuous, piecewise affine (with finitely many pieces) function that is smooth on convex domains. If g°(u; v) < 0, then v is a descent direction.

Proof. Let u be a point such that ∇g(u) does not exist. This means that u is on a section of the boundary of l domains where g is smooth. Because the domains where g is smooth are convex, we can choose points u_1, ..., u_l, one in each of these domains, so that g is linear along the line segments connecting u and u_i. Then, using Definition 4.1, we have that if g°(u; v) < 0 then ∇g(u_i)·v < 0 for i = 1, ..., l. Otherwise, if for some i, ∇g(u_i)·v ≥ 0, then we could choose α_j = 0 for j ≠ i and α_i = 1, giving g°(u; v) ≥ 0. Let t_0 > 0 be sufficiently small so that g is linear along the line u + tv for 0 < t ≤ t_0. Then we know that g(u_i) − g(u) = g(u_i + tv) − g(u + tv). Thus,

  g(u + tv) − g(u) = g(u_i + tv) − g(u_i) = t ∇g(u_i)·v < 0.

Thus, v is a descent direction.

We recall that G_1 divides R^m into domains such that G_1 is linear in the interior of each domain and nonsmooth on the boundary. Notice that if R is one such domain, then ∂R is contained in the union of hyperplanes of the form {u_i = u_j} and/or {u_i = f_j} for some i, j.

Using Lemma 4.5, we prove in the next few lemmas that if, at a point u on the boundary of one of these regions, there is a descent direction for G_1, then there is also an α-descent direction in the lower dimensional space to which we have stepped.

Lemma 4.6. As above, let K(u) be the cone of descent directions for G_1 at u.

a. v ∈ K(u) if and only if G_1°(u; v) < 0.
b. If v ∈ ∂K(u), then G_1(u + tv) = G_1(u) for all t > 0 sufficiently small.

Proof.

a. First note that, using Lemma 4.5 and since G_1 is piecewise affine with finitely many pieces, we get

  G_1°(u; v) < 0 ⟹ v ∈ K(u).

We need only show that

  v ∈ K(u) ⟹ G_1°(u; v) < 0.

Let v ∈ K(u). Then G_1(u + tv) − G_1(u) < 0 for all t > 0 sufficiently small. We also know that, since G_1 is convex, G_1(u) − G_1(u − tv) < 0 for any t > 0. Suppose that at u, u_i ≠ u_j for some i, j, and u_k ≠ f_k for some k; then we choose ε > 0 sufficiently small so that for all ũ ∈ B_ε(u), ũ_i ≠ ũ_j and ũ_k ≠ f_k. Let y ∈ B_ε(u). Let R be a region as described above. We now break this argument into cases:

Case 1: Suppose that u, u + tv, y, y + tv ∈ R̄. We know then that G_1 is continuous and affine on B_ε(u) ∩ R̄. Thus, G_1(y + tv) − G_1(y) < 0.

Figure 2. 1D examples for the cases of Lemma 4.6. Case 1: u, u + tv, y, y + tv ∈ R̄. Case 2: u, u + tv ∈ R̄ but y ∉ R̄. Case 3: u, y ∈ R̄ but u + tv, y + tv ∉ R̄. Case 4: u ∈ R̄ but y, u + tv, y + tv ∉ R̄. Case 5: u ∈ R̄ \ ∂R.

Case 2: Suppose that u, u + tv ∈ R̄, but y ∉ R̄. Since G_1 is affine on B_ε(u) ∩ R̄, G_1(y + tv) − G_1(y) < 0.

Case 3: Suppose that u, y ∈ R̄, but u + tv, y + tv ∉ R̄. Then, using that G_1 is affine on R̄, we see that G_1(y + tv) − G_1(y) < 0.

In each of the above cases, we see then that G_1°(u; v) < 0.

Case 4: Suppose that u ∈ R̄, but y, u + tv, y + tv ∉ R̄. Then, by convexity of G_1 and the fact that G_1 is affine on B_ε(u) ∩ R̄, using Case 3, we know that

  G_1(y + tv) − G_1(y) = G_1(u) − G_1(u − tv) < 0.  (11)

Case 5: Finally, suppose that u ∈ R̄ \ ∂R. Then G_1 is smooth at u. Thus G_1°(u; v) = ∇G_1(u)·v < 0.

b. Let v ∈ ∂K(u). Then we can construct a sequence {v_k} ⊂ K(u) such that v_k → v. There is a k_0 sufficiently large so that for all k ≥ k_0, ‖v − v_k‖ < ε for some ε > 0. But G_1(u + t v_k) − G_1(u) < 0 for all k, since v_k ∈ K(u). The continuity of G_1 gives us that G_1(u + tv) − G_1(u) < ε for all ε > 0. Thus G_1(u + tv) − G_1(u) ≤ 0. But if G_1(u + tv) − G_1(u) < 0 then, by continuity, v ∈ K(u). Thus, G_1(u + tv) − G_1(u) = 0.
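The characterization in Lemma 4.6(a) can be checked numerically. Since G_1 is piecewise affine, along a fixed direction it is exactly affine for sufficiently small steps, so a single small finite-difference step recovers the one-sided directional derivative; the step size and the example data below are illustrative choices of ours.

```python
def g1(u, f, lam):
    """Discrete L1TV objective G_1 of (4)."""
    return (sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))
            + lam * sum(abs(a - b) for a, b in zip(f, u)))

def one_sided(u, v, f, lam, t=1e-6):
    """One-sided directional derivative of G_1 at u along v (exact for small t)."""
    return (g1([a + t * b for a, b in zip(u, v)], f, lam) - g1(u, f, lam)) / t

f = [0.0, 0.9, 0.4, 1.0]
u = list(f)                    # u = f lies on the nonsmooth set S_f
down = [0.0, -1.0, 0.0, 0.0]   # alpha-direction lowering u_1 toward u_2
up = [0.0, 1.0, 0.0, 0.0]      # the opposite alpha-direction
d_down = one_sided(u, down, f, 1.0)
d_up = one_sided(u, up, f, 1.0)
```

Here `d_down` is negative (a descent direction, consistent with Lemma 4.6(a)), while `d_up` is positive.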
Definition 4.2. Let P ⊂ R^m. We define π : P → R^{m̃} by π(u) = ũ, where m̃ ≤ m, to be the projection map that removes redundancy in u. That is, if {u_i = u_j} is active at u, then the ith or jth (whichever is larger) component is removed from u to get ũ, and if {u_i = f_i} is active at u, then the ith component of u is removed to get ũ. Note, this projection is invertible.

Example 4.1. For example, let P ⊂ R^4 be the 2-dimensional subset given by

  P = {(u_1, u_2, u_3, u_4) | u_2 = u_1 and u_4 = f_4}.  (12)

We define π : P → R^2 by

  π(u_1, u_1, u_3, f_4) = (u_1, u_3).  (13)

If we pick any point in R^2, we can find its inverse projection in P by

  π^{−1}(u_1, u_2) = (u_1, u_1, u_2, f_4).  (14)

Lemma 4.7. Let u ∈ ∂R, where R is one of the regions described above. Let K(u) ≠ ∅. Then N(u) ∩ K(u) ∩ R̄ contains an α-descent direction, where N(u) is some neighborhood of u.

Proof. We know that ∂R is contained in the union and intersection of some hyperplanes of the form {u_i = u_j} and/or {u_i = f_i} for some i, j. By looking in N(u) ∩ K(u) ∩ R̄, we can restrict G_1 to points in the lower dimensional space, P, defined by the active hyperplanes at u. Let G̃_1 : R^{m̃} → R be defined by G̃_1(ũ) = G_1(u), where π^{−1}(ũ) = u. Then we see that ∇G̃_1(ũ) exists. Since G_1 is affine in R̄, we have that K(ũ) = {v | ⟨v, ∇G̃_1(ũ)⟩ < 0}, which is a half space and therefore contains an α-direction, ṽ. We then have that v = π^{−1}(ṽ) is an α-descent direction in P.

Using the proof of Lemma 4.7, the following result holds immediately.

Remark 4.1. v is an α-descent direction at u ⟺ π(v) is an α-descent direction at π(u).

Remark 4.2. Lemma 4.7 gives us that if there is a descent direction for G_1 at u^(k), then there is also an α-descent direction for G̃_1 at ũ^(k).

4.2. Proof of Lemma 4.2.

Lemma 4.8. If at ũ ∈ R^l an α-descent direction, ṽ, exists, then there exists an α-descent direction of the form

  ˆṽ = Σ_i ˆα_i ẽ_i,  (15)

where ˆα_i ∈ {−1, 0, 1}, ẽ_i are coordinate directions, and ˆα_i ẽ_i are descent directions whenever ˆα_i ≠ 0.

Proof. We can let ṽ = Σ_{i=1}^l α_i ẽ_i, where α_i ∈ {−1, 0, 1} and ẽ_i are coordinate directions. Since G̃_1 is linear on R^l,

  G̃_1(ũ + tṽ) = G̃_1(ũ + t Σ_{i=1}^l α_i ẽ_i) = G̃_1(Σ_{i=1}^l η_i (ũ + (t/η_i) α_i ẽ_i)) = Σ_{i=1}^l η_i G̃_1(ũ + (t/η_i) α_i ẽ_i),  (16)

where Σ_{i=1}^l η_i = 1. Now, since ṽ is a descent direction, some of the terms satisfy G̃_1(ũ + (t/η_i) α_i ẽ_i) < G̃_1(ũ). Otherwise, if G̃_1(ũ + (t/η_i) α_i ẽ_i) ≥ G̃_1(ũ) for all i, we would have

  G̃_1(ũ + tṽ) = Σ_{i=1}^l η_i G̃_1(ũ + (t/η_i) α_i ẽ_i) ≥ Σ_{i=1}^l η_i G̃_1(ũ) = G̃_1(ũ).  (17)

Let I = { i | G̃_1(ũ + (t/η_i) α_i ẽ_i) < G̃_1(ũ) }. Now, we know that G̃_1(ũ + tṽ) < G̃_1(ũ) for all t > 0 sufficiently small. Thus, G̃_1(ũ + t α_i ẽ_i) < G̃_1(ũ) for t > 0 sufficiently small and i ∈ I. Thus, α_i ẽ_i is a descent direction whenever i ∈ I, and we can create ˆṽ by choosing ˆα_i in the following way:

  ˆα_i = α_i whenever i ∈ I, and ˆα_i = 0 otherwise.  (18)

Then

  ˆṽ = Σ_{i=1}^l ˆα_i ẽ_i.  (19)

Then

  G̃_1(ũ) > G̃_1(ũ + tˆṽ) = G̃_1(ũ) + t Σ_{i∈I} ∇G̃_1(ũ) · (α_i ẽ_i).  (20)

4.3. Proof of Lemma 4.3. Recall that in Algorithm 3.1 we step to hyperplanes where G_1 is nonsmooth. Then we work within the lower dimensional space defined by the hyperplanes to which we have stepped. We define clusters below and use these clusters to algorithmically define the lower dimensional space.

Definition 4.3. Let C^(k) = { c_1 = 1 < c_2 < ... < c_{q_k} ≤ m } be the set of indices such that u^(k)_{c_i − 1} ≠ u^(k)_{c_i} and u^(k)_j = u^(k)_{c_i} for all c_i ≤ j ≤ c_{i+1} − 1. Then we define a cluster, C^(k)_i, to be the set of indices j whose entries have the same value as u^(k)_{c_i}; that is, C^(k)_i = { j | c_i ≤ j ≤ c_{i+1} − 1 }. Notice that a cluster will have size |C^(k)_i| = c_{i+1} − c_i (the last cluster will have size m − c_{q_k} + 1) and Σ_{i=1}^{q_k} |C^(k)_i| = m. We will say that i is in a cluster C_j and mean i ∈ C_j.

Definition 4.4. If u^(k) = (u^(k)_1, ..., u^(k)_m) ∈ R^m is obtained by k iterations of Algorithm 3.1, we let

  α^(k)_i = 1 if Σ_{j=c_i}^{c_{i+1}−1} e_j is a descent direction for G_1 at u^(k); α^(k)_i = −1 if −Σ_{j=c_i}^{c_{i+1}−1} e_j is a descent direction for G_1 at u^(k); and α^(k)_i = 0 otherwise,

for 1 ≤ i ≤ m. For fixed boundary conditions, we set α^(k)_1 = α^(k)_m = 0.

The next two lemmas show that our clusters only get larger in Algorithm 3.1 and that we find descent only when no point in a cluster C_i will move independently of the cluster. This means that we never go back to the higher dimensional space; that is, the algorithm continues to step to lower and lower dimensional spaces. Actually, these two lemmas are for a more general algorithm, that is, for an algorithm that finds minimizers of L1pTV for 0 < p ≤ 1. The first of these two lemmas gives us this result for iteration 1 of Algorithm 3.1, and the second lemma gives the result for all other iterations.
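Definition 4.3 is straightforward to realize in code. The following is a hedged sketch, using 0-based indices rather than the paper's 1-based convention:

```python
def clusters(u):
    """Maximal runs of equal consecutive entries of u, as lists of 0-based indices."""
    out, start = [], 0
    for i in range(1, len(u) + 1):
        if i == len(u) or u[i] != u[start]:
            out.append(list(range(start, i)))
            start = i
    return out
```

The cluster sizes always sum to the length of u, matching Σ_i |C^(k)_i| = m.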
Both lemmas are proved by looking at the various neighborhood cases that are possible at each point u_i, to show that if u_i is in a cluster, C, it won't break away from the cluster in subsequent steps. We then determine which λ values give us descent when moving a cluster C_i.

4.3.1. Clusters need not break up for descent (iteration 1).

Lemma 4.9. Let 0 < p ≤ 1. Let η_l ≡ |u_{c_i−1} − u_{c_i}| and η_r ≡ |u_{c_{i+1}} − u_{c_{i+1}−1}|. Then the following statements hold.

(1) If there exists a cluster C_i with u_{c_i−1} > u_{c_i} = ⋯ = u_{c_{i+1}−1} > u_{c_{i+1}} and

  0 < λ < [η_l^p − (η_l − αη)^p + η_r^p − (η_r + αη)^p] / (η |C_i|),  (21)

then a descent direction for G_p, at the point u = f, is

  α Σ_{j=c_i}^{c_{i+1}−1} e_j, where α = 1 when η_r > η_l and α = −1 when η_r < η_l.  (22)

(2) If there exists a cluster C_i with u_{c_i−1} < u_{c_i} = ⋯ = u_{c_{i+1}−1} < u_{c_{i+1}} and

  0 < λ < [η_l^p − (η_l + αη)^p + η_r^p − (η_r − αη)^p] / (η |C_i|),  (23)

then a descent direction for G_p, at the point u = f, is

  α Σ_{j=c_i}^{c_{i+1}−1} e_j, where α = 1 when η_r < η_l and α = −1 when η_r > η_l.  (24)

(3) If there exists a cluster C_i with u_{c_i} = ⋯ = u_{c_{i+1}−1} < u_{c_i−1}, u_{c_{i+1}} and

  0 < λ < [η_l^p − (η_l − η)^p + η_r^p − (η_r − η)^p] / (η |C_i|),  (25)

then a descent direction for G_p, at the point u = f, is

  Σ_{j=c_i}^{c_{i+1}−1} e_j.  (26)

(4) If there exists a cluster C_i with u_{c_i} = ⋯ = u_{c_{i+1}−1} > u_{c_i−1}, u_{c_{i+1}} and

  0 < λ < [η_l^p − (η_l − η)^p + η_r^p − (η_r − η)^p] / (η |C_i|),  (27)

then a descent direction for G_p, at the point u = f, is

  −Σ_{j=c_i}^{c_{i+1}−1} e_j.  (28)

Notice, for p = 1, in cases 1 and 2, the condition for λ is 0 < λ < 0. Since there is no such λ, we see that our L1TV algorithm will not find descent in these cases. The condition for λ for p = 1 in cases 3 and 4 is

  0 < λ < 2/|C_i|.  (29)

Proof. We begin this proof by showing that if we start with u = f, then to get descent we need not break up clusters. We break this into two cases. We assume for these cases that u_{c_i} is not a point on the boundary of Ω (1 < c_i < m). We prove this by considering whether or not

  G_p(u + ηα e_i) − G_p(u) = |u_i + ηα − u_{i−1}|^p + |u_i + ηα − u_{i+1}|^p + λ|u_i + ηα − f_i| − |u_i − u_{i−1}|^p − |u_i − u_{i+1}|^p − λ|u_i − f_i| < 0.  (30)

Case 1: Suppose u_{c_i−1} = u_{c_i} = u_{c_i+1}. We assume that η > 0 is small and compute

  G_p(u + ηα e_i) − G_p(u) = 2η^p + λη > 0, for all η > 0.  (31)

Thus, in this case, no descent exists. That is, a data point in the middle of a cluster will not move in the first iteration.

Figure 3. Case 1: u_{c_i}, with u_{c_i−1} = f_{c_i−1}, u_{c_i} = f_{c_i}, u_{c_i+1} = f_{c_i+1}, is a point in the middle of a cluster.

We can also see that each point we move from the inside of a cluster causes an increase in the fidelity term, and the variation term will not decrease. Thus, we see that no descent is found by moving any points from inside the cluster in a direction different from the rest of the cluster.

Case 2: Suppose u_{c_i−1} = u_{c_i} < u_{c_i+1} (see Figure 4). We assume that 0 < η is at most η_r ≡ u_{c_i+1} − u_{c_i} and compute

  G_p(u + ηα e_i) − G_p(u) = η^p + (η_r − αη)^p − η_r^p + λη = η^p (1 − a^p + (a − α)^p) + λη, where a = η_r/η ≥ 1.  (32)

Notice that if α = −1, we have G_p(u + ηα e_i) − G_p(u) = η^p (1 − a^p + (a + 1)^p) + λη > 0. Now if α = 1, we have G_p(u + ηα e_i) − G_p(u) = η^p (1 − a^p + (a − 1)^p) + λη. This is also positive, since a^p ≤ 1 + (a − 1)^p for a ≥ 1 (by the subadditivity of t ↦ t^p for 0 < p ≤ 1), with equality when a = 1. Thus, in this case, no descent exists. That is, a data point on the end of a cluster will not move in the first iteration.

Figure 4. Case 2: u_{c_i} is a point on the end of a cluster (four cases).

Notice the other cases, for u_{c_i−1} = u_{c_i} > u_{c_i+1} or u_{c_i+1} = u_{c_i} ≠ u_{c_i−1}, have the same result and are proved similarly. Notice that if we moved several points at the end together, away from the rest of the cluster, we would see the same result in the variation term, but we would multiply the fidelity term by the number of points we move. Therefore, clusters will not break apart in the first iteration of our algorithm.

Next we show that, when clusters move together in the first iteration, we find descent when λ satisfies the conditions stated in the lemma. We show this by considering whether or not

  G_p(u + ηα Σ_{j=c_i}^{c_{i+1}−1} e_j) − G_p(u) = |u_{c_i} + ηα − u_{c_i−1}|^p + |u_{c_{i+1}−1} + ηα − u_{c_{i+1}}|^p + λ Σ_{j=c_i}^{c_{i+1}−1} |u_j + ηα − f_j| − |u_{c_i} − u_{c_i−1}|^p − |u_{c_{i+1}−1} − u_{c_{i+1}}|^p − λ Σ_{j=c_i}^{c_{i+1}−1} |u_j − f_j| < 0.  (33)

We break this step into four cases. Let η_r ≡ |u_{c_{i+1}} − u_{c_{i+1}−1}| and η_l ≡ |u_{c_i} − u_{c_i−1}|. We also assume η ≤ min{η_r, η_l}.

Case 1: Suppose u_{c_i−1} > u_{c_i} = ⋯ = u_{c_{i+1}−1} > u_{c_{i+1}}. We compute

  G_p(u + αη Σ_{j=c_i}^{c_{i+1}−1} e_j) − G_p(u) = (η_l − αη)^p − η_l^p + (η_r + αη)^p − η_r^p + λ |C_i| η.  (34)

Figure 5. Case 1: C_i has left neighbor above and right neighbor below.

We see that for this case we find descent when

  0 < λ < [η_l^p − (η_l − αη)^p + η_r^p − (η_r + αη)^p] / (η |C_i|).  (35)

That is, descent is found with this λ by moving the cluster up when η_r > η_l and down when η_r < η_l. (For p = 1, this condition is 0 < λ < 0; thus descent does not exist.)

Case 2: Suppose u_{c_i−1} < u_{c_i} = ⋯ = u_{c_{i+1}−1} < u_{c_{i+1}}. If we use a similar argument to that of Case 1, we see that for this case we find descent when

  λ < [η_l^p − (η_l + αη)^p + η_r^p − (η_r − αη)^p] / (η |C_i|).  (36)

That is, descent is found, with this λ, by moving the cluster up when η_r < η_l and down when η_r > η_l. (For p = 1, this condition is 0 < λ < 0; thus descent does not exist.)

Case 3: Suppose u_{c_i} = ⋯ = u_{c_{i+1}−1} < u_{c_i−1}, u_{c_{i+1}}. We compute

  G_p(u + αη Σ_{j=c_i}^{c_{i+1}−1} e_j) − G_p(u) = (η_l − αη)^p − η_l^p + (η_r − αη)^p − η_r^p + λ |C_i| η.  (37)

Figure 6. Case 3: C_i has both neighbors above.

We see that for this case we find descent by moving the cluster up when

  λ < [η_l^p − (η_l − η)^p + η_r^p − (η_r − η)^p] / (η |C_i|).  (38)

(For p = 1, this condition is 0 < λ < 2/|C_i|.)

Case 4: Suppose u_{c_i} = ⋯ = u_{c_{i+1}−1} > u_{c_i−1}, u_{c_{i+1}}. Again, this case is similar to Case 3. A similar argument gives us that moving the cluster down gives descent when

  λ < [η_l^p − (η_l − η)^p + η_r^p − (η_r − η)^p] / (η |C_i|).  (39)

(For p = 1, this condition is 0 < λ < 2/|C_i|.)

Finally, in the case that we choose the free boundary option (letting boundaries move), we show that if u_{c_i} is on the boundary, then we find descent when λ satisfies the conditions stated in the lemma. We prove this for the left endpoint of the data, since the argument for the right endpoint is similar. We break this into two cases.

Case 1: u_1 = u_2. In this case, we assume η > 0 is small and we compute

  G_p(u + ηα e_1) − G_p(u) = η^p + λη > 0.  (40)

Thus we will not find descent by moving this endpoint without its neighbors.

Case 2: u_1 = ⋯ = u_{c_2−1} < u_{c_2}. In this case, we assume 0 < η ≤ η_r = u_{c_2} − u_{c_2−1} and we compute

  G_p(u + ηα Σ_{j=1}^{c_2−1} e_j) − G_p(u) = |C_1| λη + (η_r − αη)^p − η_r^p.  (41)

Thus, we find descent by moving the endpoint up whenever λ < [η_r^p − (η_r − η)^p] / (η |C_1|). (For p = 1, this condition is 0 < λ < 1/|C_1|.)

4.3.2. Clusters need not break up for descent (iteration k). For this lemma, we need to define some notation.

Definition 4.5. We define q_g, q_l, q_e to be the number of elements, j, in the cluster C_i whose entries are greater than, less than, and equal to (respectively) the corresponding f_j:

  q_g = |{ j ∈ C_i | u^(k)_j > f_j }|, q_l = |{ j ∈ C_i | u^(k)_j < f_j }|, and q_e = |{ j ∈ C_i | u^(k)_j = f_j }|.

Lemma 4.10. Let 0 < p ≤ 1. Let q_g, q_e, and q_l be defined as above. Let η_l ≡ |u_{c_i−1} − u_{c_i}| and η_r ≡ |u_{c_{i+1}} − u_{c_{i+1}−1}|, and let u^(k) be a point obtained using an HT algorithm. Then the following statements hold.

(1) If there exists a cluster C_i with u_{c_i−1} > u_{c_i} = ⋯ = u_{c_{i+1}−1} > u_{c_{i+1}} and

  0 < λ < [η_l^p − (η_l − αη)^p + η_r^p − (η_r + αη)^p] / (η((q_g − q_l)α + q_e)),  (42)

then a descent direction for G_p at the point u^(k) is

  α Σ_{j=c_i}^{c_{i+1}−1} e_j, where α = 1 when η_r > η_l and α = −1 when η_r < η_l.  (43)

(2) If there exists a cluster C_i with u_{c_i−1} < u_{c_i} = ⋯ = u_{c_{i+1}−1} < u_{c_{i+1}} and

  0 < λ < [η_l^p − (η_l + αη)^p + η_r^p − (η_r − αη)^p] / (η((q_g − q_l)α + q_e)),  (44)

then a descent direction for G_p at the point u^(k) is

  α Σ_{j=c_i}^{c_{i+1}−1} e_j, where α = 1 when η_r < η_l and α = −1 when η_r > η_l.  (45)

(3) If there exists a cluster C_i with u_{c_i−1} > u_{c_i} and u_{c_i} = ⋯ = u_{c_{i+1}−1} < u_{c_{i+1}} and

  0 < λ < [η_l^p − (η_l − η)^p + η_r^p − (η_r − η)^p] / (η(q_g − q_l + q_e)),  (46)

then a descent direction for G_p at the point u^(k) is

  Σ_{j=c_i}^{c_{i+1}−1} e_j.  (47)

(4) If there exists a cluster C_i with u_{c_i−1} < u_{c_i} and u_{c_i} = ⋯ = u_{c_{i+1}−1} > u_{c_{i+1}} and

  0 < λ < [η_l^p − (η_l − η)^p + η_r^p − (η_r − η)^p] / (η(−q_g + q_l + q_e)),  (48)

then a descent direction for G_p at the point u^(k) is

  −Σ_{j=c_i}^{c_{i+1}−1} e_j.  (49)

Again, for p = 1, in Cases 1 and 2, the condition for λ is 0 < λ < 0. Thus our L1TV algorithm will not find descent in these cases. The condition for λ for p = 1 in Case 3 is

  0 < λ < 2/(q_g − q_l + q_e).  (50)

And for Case 4, with p = 1, the condition for λ is

  0 < λ < 2/(−q_g + q_l + q_e).  (51)

Proof. We begin again by showing that to get descent we need not break up clusters. For ease of notation, we write u instead of u^(k). We break this into two cases.

Case 1: u_{c_i−1} = u_{c_i} = u_{c_i+1}. We assume η > 0 is small. We compute

  G_p(u + ηα e_i) − G_p(u) = 2η^p + λη((q_g − q_l)α + q_e).  (52)

Figure 7. Case 1: u_{c_i} is a point in the middle of a cluster (3 possible cases).

Here we treat u_{c_i} as a cluster of size 1 by moving it alone. This means that only one of q_g, q_e, q_l is 1, while the others are 0. Notice that if q_e = 1, there is no descent. Notice also that λ > 2η^{p−1} gives descent by moving u_{c_i} toward f_{c_i}, but this would be undoing what we did in a previous step. That is, in the previous step, we could have moved u_{c_i} to this cluster by itself, in which case moving it back by itself is undoing a step that gave us descent, and thus it would give us ascent. The other possible case would have been if we moved u_{c_i} with a cluster to this position. In this case, we know from Lemma 4.9 that moving it away by itself would be a step that gives ascent. Consequently, we will not find descent by breaking up this cluster.

Case 2: u_{c_i−1} = u_{c_i} ≠ u_{c_i+1} or u_{c_i+1} = u_{c_i} ≠ u_{c_i−1}. We will prove one of these cases, namely u_{c_i−1} = u_{c_i} < u_{c_i+1}, because the four cases are similar in argument. We assume that η ≤ min{η_r, |f_{c_i} − u_{c_i}|} and we compute

  G_p(u + ηα e_i) − G_p(u) = (η_r − ηα)^p − η_r^p + η^p + λη((q_g − q_l)α + q_e).  (53)

Figure 8. Case 2: u_{c_i} is a point on the end of a cluster (8 possible cases).

Notice, writing a = η_r/η ≥ 1, we get descent if

  λ < η^{p−1} [a^p − 1 − (a − α)^p] / ((q_g − q_l)α + q_e) < 0  (54)

(when (q_g − q_l)α + q_e > 0), or if

  λ > η^{p−1} [a^p − 1 − (a − α)^p] / ((q_g − q_l)α + q_e)  (55)

(when (q_g − q_l)α + q_e < 0). However, notice that this second inequality is taking us back toward f, which is again undoing a previous step. The first inequality says λ < 0. Thus, we do not find descent in this case either. Thus the algorithm will not break up clusters.
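The λ-thresholds in these lemmas can be verified numerically. Below is a hedged check of the p = 1 threshold from Lemma 4.9, cases 3 and 4: at u = f, moving a cluster that sits below both of its neighbors strictly decreases G_1 exactly when λ < 2/|C_i|. The data vector, cluster, and step size are illustrative choices of ours.

```python
def g1(u, f, lam):
    """Discrete L1TV objective G_1 of (4)."""
    return (sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))
            + lam * sum(abs(a - b) for a, b in zip(f, u)))

f = [1.0, 0.2, 0.2, 0.2, 0.9]   # the cluster C = {1, 2, 3} sits below both neighbors
C = [1, 2, 3]
eta = 0.1                        # step with eta <= min(eta_l, eta_r) = min(0.8, 0.7)

def delta(lam):
    """Change in G_1 from moving the whole cluster up by eta, starting at u = f."""
    u = list(f)
    for j in C:
        u[j] += eta
    return g1(u, f, lam) - g1(f, f, lam)

threshold = 2.0 / len(C)         # the p = 1 bound 2/|C_i| of Lemma 4.9
```

Here delta(λ) = −2η + λ|C|η, which is negative precisely for λ below the threshold.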

Now we consider moving the full cluster together. We will show that we find descent when λ satisfies the conditions stated in the lemma. We break this step into four cases. Let η_r ≡ |u_{c_{j+1}} − u_{c_{j+1} − 1}| and η_l ≡ |u_{c_j} − u_{c_j − 1}|. We also assume η ≤ min{η_r, η_l}.

Case 1: Suppose u_{c_j − 1} > u_{c_j} = ... = u_{c_{j+1} − 1} > u_{c_{j+1}}. We compute

[Figure 9. Case 1: C_j(u^(k)) has left neighbor above and right neighbor below.]

G_p(u + αη Σ_{i = c_j}^{c_{j+1} − 1} e_i) − G_p(u) = (η_l − αη)^p − η_l^p + (η_r + αη)^p − η_r^p + λ((q_g − q_l)α + q_e)η. (56)

We see that for this case, we find descent when

λ < [η_l^p − (η_l − αη)^p + η_r^p − (η_r + αη)^p] / [η((q_g − q_l)α + q_e)]. (57)

As in the last lemma, we find descent, with this λ, by moving the cluster up when η_r > η_l and down when η_r < η_l. (As we saw in the last lemma, for p = 1 this condition is 0 < λ < 0. Thus descent does not exist.)

Case 2: Suppose u_{c_j − 1} < u_{c_j} = ... = u_{c_{j+1} − 1} < u_{c_{j+1}}. If we use a similar argument to that of Case 1, we see that for this case, we find descent when

λ < [η_l^p − (η_l + αη)^p + η_r^p − (η_r − αη)^p] / [η((q_g − q_l)α + q_e)]. (58)

That is, descent is found, with this λ, by moving the cluster up when η_r < η_l and down when η_r > η_l. (For p = 1, descent does not exist.)

Case 3: Suppose u_{c_j} = ... = u_{c_{j+1} − 1} < u_{c_j − 1}, u_{c_{j+1}}. We compute

[Figure 10. Case 3: C_j(u^(k)) has both neighbors above.]

G_p(u + αη Σ_{i = c_j}^{c_{j+1} − 1} e_i) − G_p(u) = (η_l − αη)^p − η_l^p + (η_r − αη)^p − η_r^p + λ((q_g − q_l)α + q_e)η. (59)

We see that for this case, we find descent by moving the cluster up when

λ < [η_l^p − (η_l − η)^p + η_r^p − (η_r − η)^p] / [η(q_g − q_l + q_e)]. (60)

(For p = 1, this condition is 0 < λ < 2/(q_g − q_l + q_e).)

Case 4: Suppose u_{c_j} = ... = u_{c_{j+1} − 1} > u_{c_j − 1}, u_{c_{j+1}}. Again, this case is similar to Case 3. A similar argument gives us that moving the cluster down gives descent when

λ < [η_l^p − (η_l − η)^p + η_r^p − (η_r − η)^p] / [η(−q_g + q_l + q_e)]. (61)

(For p = 1, this condition is 0 < λ < 2/(−q_g + q_l + q_e).)

Using the proofs of Lemmas 4.9 and 4.10, we see that for every cluster C_j^(k), we get C_j^(k) ⊆ C_j^(k+1). That is, no point will leave a cluster, and at each iteration the algorithm reduces the problem to minimizing a lower-dimensional problem.

Proof of Lemma 4.4. Up to this point, we have shown that Algorithm 3.1 converges to a minimizer. Now we need to show that the algorithm is indeed finite.

Proof. Suppose u^(k) ∉ Y; then k > 0, since u^(0) = f ∈ Y. Therefore ∇G_1(u^(k)) does not exist, because the algorithm always steps to a point where G_1 is nonsmooth. Therefore u^(k) ∈ H, where H is an l < m dimensional hyperplane formed from intersections of some of the hyperplanes of the form {u_i = u_j, (i, j) ∈ E} and/or {u_i = f_j}. Note that l > 1, since the algorithm stops when l = 1. If u^(k) is not a minimizer, there exists a descent direction for G_1 at u^(k). Then there exists an α-descent direction in H. The algorithm takes the step in this direction to get u^(k+1), which lies on a hyperplane whose dimension is smaller than l. We can continue this process at most l times to land at a point u^(k′) ∈ Y.

To summarize this section, we have just shown that Algorithm 3.1 converges to a minimum of G_1 by showing that if there is a descent direction at a point u, then the particular α-direction of Algorithm 3.1 exists and gives strict descent. We also showed that Algorithm 3.1 is finite by showing that the algorithm takes only finitely many steps to get from one point in Y to the next. Thus, Theorem 4.1 holds.

5. Global Minimizers for all λ

In this section we introduce a more efficient algorithm for the case p = 1.
In the proofs of Lemmas 4.9 and 4.10, we found conditions on λ for which descent occurs. Here we use those conditions to formulate a new algorithm that does not need to compute G_1 values. Recall, we found that descent occurs, in the p = 1 case, only when clusters are lower than both neighbors or higher than both neighbors. We restate the conditions here. In the case when C_j is lower than its neighbors, we find descent in moving the cluster up when

0 < λ < 2 / (q_g − q_l + q_e). (62)

For such clusters, we call Q = q_g − q_l + q_e the effective cluster size. In the case when C_j is higher than its neighbors, we find descent in moving the cluster down when

0 < λ < 2 / (−q_g + q_l + q_e). (63)

For these clusters, we call Q = −q_g + q_l + q_e the effective cluster size.

Remark 5.1. In the case of free boundary conditions, if a cluster C_j is on the boundary of the data, we use the results from Lemmas 4.9 and 4.10 to say that its effective cluster size is twice the effective cluster size of an interior cluster. This makes sense, since moving the cluster affects the variation through only one neighbor instead of two, and therefore it takes a smaller λ to move it.
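The effective cluster sizes above are simple counts of the signs of u_i − f_i inside a cluster. The following sketch (our own illustration, not the authors' code; the function and variable names are ours) evaluates Q for an interior cluster and the resulting p = 1 threshold 2/Q:

```python
def effective_cluster_size(u, f, idx, direction):
    """Effective cluster size Q for the cluster occupying indices `idx`.

    direction = +1 for a cluster moving up, -1 for a cluster moving down.
    q_g counts points with u_i > f_i, q_l those with u_i < f_i,
    and q_e those with u_i == f_i.
    """
    q_g = sum(1 for i in idx if u[i] > f[i])
    q_l = sum(1 for i in idx if u[i] < f[i])
    q_e = sum(1 for i in idx if u[i] == f[i])
    if direction == +1:
        return q_g - q_l + q_e   # Q_up, condition (62)
    return -q_g + q_l + q_e      # Q_dwn, condition (63)


def lambda_threshold(Q):
    """Largest lambda for which moving a cluster of effective size Q > 0
    gives descent in the p = 1 case."""
    return 2.0 / Q
```

Note that at the start of the algorithm u = f, so q_e = |C_j| and Q equals the actual cluster size; after moves, the q_g and q_l contributions can make Q differ from |C_j|.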

We also recognize that the value of G_1 is constant as a cluster moves to a height anywhere between its highest neighbor and its lowest neighbor. Instead of moving clusters up and down in parallel, we first move up (down) all clusters with the appropriate effective cluster size for the given λ. These clusters will move up (down) to meet another cluster, stopping at S_u, or to meet an f value, stopping at S_f. Since some clusters will join with others, we recompute effective cluster sizes for all clusters that changed and then move down (up) all clusters with the appropriate effective cluster size for the given λ. For this version of the algorithm, we are still stepping in an α-descent direction to points in S_u and/or S_f. We are not breaking up clusters, as before, but we are now stepping through effective cluster sizes to make the algorithm more efficient. As long as the effective cluster size does not decrease, we know that the convergence given in Section 4 still holds. In fact, we can easily show that the effective cluster size does not decrease.

Lemma 5.1. The effective cluster size (ECS) for any cluster at any iteration, given by

Q_up = q_g − q_l + q_e or Q_dwn = −q_g + q_l + q_e, (64)

will never decrease. Here Q_up is the ECS for a cluster intended to move up and Q_dwn is the ECS for a cluster intended to move down.

Proof. We will prove this lemma for an up cluster C_j. The argument for a down cluster is similar. We recall Definition 4.5. For this proof, we will say that u_i in C_j contributes to q_g if u_i > f_i, to q_l if u_i < f_i, and to q_e if u_i = f_i. Notice the following:

If u_i in C_j contributes to q_g and C_j moves up to form the new cluster C̃_j, then u_i in C̃_j contributes to q_g, since ũ_i > u_i > f_i.

If u_i in C_j contributes to q_e and C_j moves up to form the new cluster C̃_j, then u_i in C̃_j contributes to q_g, since ũ_i > u_i = f_i.

If u_i in C_j contributes to q_l and C_j moves up to form the new cluster C̃_j, then u_i in C̃_j contributes to either q_e or q_l, since a cluster will stop at the closest of its neighbors or corresponding f values. Thus f_i ≥ ũ_i > u_i.
This all tells us that when C_j moves up, q_l can only change by decreasing, q_g can only change by increasing, and q_e can change by either increasing or decreasing. Now, we consider the effective cluster size of the newly formed cluster C̃_j in three cases:

(A) C̃_j = C_j, the actual cluster size does not change. This happens when C_j moves up to meet an f value.
(B) C̃_j is lower than both of its neighbors.
(C) C̃_j is a cluster that is higher than both of its neighbors.

We do not consider the case when one of the neighbors of C̃_j is below and the other above the cluster, since moving this cluster will not give descent in G_1.

In case A, since both neighboring clusters C_{j−1} and C_{j+1} are still above C̃_j, the effective cluster size is given by Q_up in (64). Using the argument above, we see that the new effective cluster size increases, since at least one u_i in C_j that contributed to q_l moves up to contribute to q_e; thus Q_up increases.

In case B, C_j moves up to join with at least one of its neighboring clusters C_{j−1} and C_{j+1}. Now, let Q_{j−1}, Q_{j+1} denote the effective cluster sizes of C_{j−1}, C_{j+1}, respectively, and let Q̃_j be the contribution from C_j after its move. From the above argument, we know that Q̃_j = q̃_g − q̃_l + q̃_e ≥ q_g − q_l + q_e = Q_j. If C_j moves up to join with C_{j−1}, the new effective cluster size is just the sum of the contributions from both clusters, that is, Q = Q_{j−1} + Q̃_j. If C_j moves up to join with C_{j+1}, the new effective cluster size is Q = Q̃_j + Q_{j+1}. And if C_j moves up to join with both C_{j−1} and C_{j+1}, the new effective cluster size is Q = Q_{j−1} + Q̃_j + Q_{j+1}. In these three cases, if Q_{j−1}, Q_{j+1} > 0, then the effective cluster size does not decrease.

Notice that case C can only happen if C_j moves up to meet both of its neighbors (for otherwise, at least one would still be above C̃_j). Thus, the new effective cluster size is Q = Q_{j−1} + Q̃_j + Q_{j+1}. Also, for this case,

the new cluster that is formed is a down cluster; that is, Q is computed using Q_dwn in (64). Since C_j was below clusters C_{j−1} and C_{j+1} before the move, we know that C_{j−1} and C_{j+1} were down clusters before C_j moved up. Since we are incrementing on the ECS, we know that Q_{j−1}, Q_{j+1} ≥ Q_j. Since the newly formed cluster C̃_j is a down cluster, the amount that C_j contributes to Q is Q̃_j = −q̃_g + q̃_l + q̃_e. Notice that Q̃_j is not an effective cluster size; rather, it only contributes to the new effective cluster size, and therefore it may be negative. If Q̃_j is negative, then we get the smallest value for Q. But the smallest this can be happens when u_i = f_i for all u_i in C_j, so that after the move they all contribute to q̃_g; but then Q̃_j = −|C_j| = −Q_j and we get Q = Q_{j−1} + Q̃_j + Q_{j+1} = Q_{j−1} − Q_j + Q_{j+1} ≥ Q_j. And we know that, in this case also, the effective cluster size never decreases.

Notice that since the algorithm starts with u = f, Q_j = |C_j| > 0 for every cluster C_j; thus, using the above arguments, the minimum effective cluster size never decreases.

Now we give the formal algorithm. Let C_1^(k), ..., C_{q_k}^(k) be the unique clusters at iteration k. Let g_j^(k), e_j^(k), l_j^(k) be q_g, q_e, and q_l for cluster j at iteration k.

Algorithm 5.1 (L_1TV).
Given f = (f_1, ..., f_m);
Set u^(0) = (u_1^(0), ..., u_m^(0)) = (f_1, ..., f_m);
Find C_1^(0), C_2^(0), ..., C_{q_0}^(0);
Set k ← 1;
do
  Compute g_j^(k), e_j^(k), l_j^(k) for each j = 1, ..., q_k;
  U ← {j : all neighbors of C_j are above C_j};
  D ← {j : all neighbors of C_j are below C_j};
  mincs_k ← min{ min_{j∈U} {g_j^(k) − l_j^(k) + e_j^(k)}, min_{j∈D} {−g_j^(k) + l_j^(k) + e_j^(k)} };
  mvclj ← {j ∈ U : g_j^(k) − l_j^(k) + e_j^(k) = mincs_k};
  if mvclj ≠ ∅
    for idx = 1 : |mvclj|
      Move up C_{mvclj(idx)} to the closest of f, C_{I(idx)−1}, and C_{I(idx)+1};
    end
  else
    mvclj ← {j ∈ D : −g_j^(k) + l_j^(k) + e_j^(k) = mincs_k};
    for idx = 1 : |mvclj|
      Move down C_{mvclj(idx)} to the closest of f, C_{I(idx)−1}, and C_{I(idx)+1};
    end
  end
  k ← k + 1;
  Update list of clusters, [C_1^(k), ..., C_{q_k}^(k)];
  if mincs_k ≠ mincs_{k−1}
    Append list of solutions with [C_1, ..., C_{q_k}];
    Append list of λ with 2/(mincs + 1);
  end
until no descent exists.

Table 3.
Efficient L_1TV algorithm (written with an up preference).

In this algorithm we start at u^(0) = f and find the clusters. We then determine which clusters might move up (call them up clusters) and which might move down (call them down clusters), ignoring those that have both a neighbor below and a neighbor above the cluster. In the case of Algorithm 5.1, we see it is written

with a preference to move clusters up first and then down. We find the minimum effective cluster size (ECS) and move up any up cluster having this ECS to its nearest neighboring cluster or f value. If no up cluster has this ECS, we move down any down cluster that has the same ECS to its nearest neighboring cluster or f value. We repeat this until the stopping condition is reached. If at any iteration the minimum effective cluster size changed from the previous iteration, we update the list of solutions and the λ value.

The stopping condition for this algorithm depends on the type of data and the boundary conditions. In the general case for data (that is, not necessarily binary data), we stop the algorithm when monotonicity is reached for fixed boundary data or, for free boundary data, when there is only one cluster left (the solution is flat). For binary data, this algorithm is greatly simplified. There is never a need to check the neighbors of a cluster: if the data is binary, then the neighbors have the opposite value of that of the cluster. That is, if the cluster is at a height of 1, then its neighbors are at a height of 0, and it is then a down cluster. For down clusters C_j, the effective cluster size depends on l_j and e_j, whereas for an up cluster the effective cluster size depends on g_j and e_j. Another simplification for binary data is that we never need to check the distance to the f values corresponding to a cluster, since these will also be either 0 or 1. Thus, the algorithm will not make moves to heights other than the heights of cluster neighbors. Finally, the algorithm will stop when the minimum number of clusters, minq, is reached. The value of minq depends on whether the boundaries are fixed (minq = 2) or free (minq = 1).

The greatest benefit of Algorithm 5.1, for both the general and the binary problems, is that we are able to solve the λ = 0 problem and in the process get the solutions for all λ > 0. That is, the computational task of getting a solution for all λ > 0 is the same as solving only one problem.
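For binary data with free boundaries, the bookkeeping above collapses to run-length merging, which makes the whole procedure easy to sketch. The code below is our own illustration of this special case, not the authors' implementation: the function name, the one-run-at-a-time merge order, and the choice to record λ = 2/ECS at each change of the minimum ECS are our simplifications. Each maximal constant run is a cluster; its signed weight w (points agreeing with the run's current height minus those disagreeing) is its effective cluster size, doubled for boundary runs per Remark 5.1.

```python
def ht_l1tv_binary(f):
    """Hedged sketch of Algorithm 5.1 for a binary signal with free boundaries.

    Moving the minimum-ECS run flips it to its neighbors' height and merges
    the three runs; the merged weight is w_left - w_mid + w_right, since the
    flipped points now disagree with the data where they previously agreed.
    Returns a list of (lambda, runs) pairs, one per distinct ECS value,
    where runs is a snapshot [[height, weight], ...].
    """
    # Run-length encode the signal: list of [height, weight].
    runs = []
    for x in f:
        if runs and runs[-1][0] == x:
            runs[-1][1] += 1
        else:
            runs.append([x, 1])

    def ecs(i):
        # Boundary runs count double (Remark 5.1): only one neighbor's
        # variation is affected, so a smaller lambda suffices to move them.
        w = runs[i][1]
        return 2 * w if i in (0, len(runs) - 1) else w

    solutions = []
    last_ecs = None
    while len(runs) > 1:
        i = min(range(len(runs)), key=ecs)
        if last_ecs is not None and ecs(i) != last_ecs:
            # Minimum ECS increased: record the current signal and the
            # representative threshold lambda = 2 / ECS.
            solutions.append((2.0 / ecs(i), [r[:] for r in runs]))
        last_ecs = ecs(i)
        if 0 < i < len(runs) - 1:
            # Interior run: flip it and merge with both neighbors.
            h = runs[i - 1][0]
            w = runs[i - 1][1] - runs[i][1] + runs[i + 1][1]
            runs[i - 1 : i + 2] = [[h, w]]
        else:
            # Boundary run (free boundary): flip and merge with its neighbor.
            j = 1 if i == 0 else len(runs) - 2
            h = runs[j][0]
            w = runs[j][1] - runs[i][1]
            runs[min(i, j) : max(i, j) + 1] = [[h, w]]
    return solutions
```

For example, on f = (0,0,0,1,0,0,1,1,1,1) the lone interior 1 is absorbed first (ECS 1), leaving two runs of weight 4, consistent with the M/2 move count discussed in Section 6.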
We state this result in the next theorem.

Theorem 5.1. Algorithm 5.1 finds a solution to L_1TV for every λ > 0.

Proof. Algorithm 5.1 iterates by increasing the effective cluster size, thus decreasing λ, beginning with the largest possible λ for which at least one cluster will move. Because the problem is discrete, the effective cluster sizes are positive integers between 0 and m (the length of the signal). The effective cluster size (and thus λ) does not change until no cluster of this effective cluster size will move. By the results of Section 4, we know that for each λ, Algorithm 5.1 finds descent whenever descent exists. Therefore, at each iteration the algorithm minimizes L_1TV for the current λ. And by Lemma 5.1, we know that iterating on the effective cluster size does not skip a particular effective cluster size that might need to be revisited later, since the effective cluster size never decreases. Thus, for each λ > 0, we find a minimizer of the corresponding L_1TV problem.

5.1. Extensions of the ht Algorithm. It is worth noting that the algorithm will not naturally extend to higher-dimensional data. The issue lies in the neighborhood structure that occurs in higher dimensions. In higher-dimensional problems, such as imaging problems, it becomes beneficial to break up clusters when there is a data point that has more neighbors outside of the cluster than inside. We see this occurring in images with L_1TV when parts of object edges having high curvature are rounded. We conjecture that an adjustment to the algorithm to allow cluster break-up only when the number of neighbors outside the cluster is not less than the number inside will give similar results for higher-dimensional data.

The ht algorithm does not extend easily to an L_1TV objective function where the data fidelity term involves a linear operator, such as in the discretization of ||Ku − f||_1. This transformation causes a rotation of some of the bounding hyperplanes, changing the geometry of the problem.
Because the ht algorithm relies strongly on this geometry, we believe it is not obvious, yet worth future investigation, how to extend the ht algorithm to such cases.

6. Time Trials for the L_1TV ht Algorithm

Finally, we show timing results for both the general and binary cases, as well as for fixed and free boundary conditions.

First, we start with fixed boundary conditions. We ran Algorithm 5.1 on 100 random signals of size N. In Table 4 we have recorded the average number of initial clusters, the average number of λ solutions, and the average time in seconds that it takes to perform the main loop of the ht algorithm. The algorithm has a setup that is of order N, but the main loop depends on the number of initial clusters.

[Table 4. Time trials for Algorithm 5.1 for general random signals, of size N, with fixed boundary conditions.]

We then ran 100 random binary signals of length N. We recorded the average time to complete the main loop of the ht algorithm, the average number of initial clusters, and the average number of λ solutions in Table 5.

[Table 5. Time trials for Algorithm 5.1 for random binary signals, of size N, with fixed boundary conditions.]

Next, we looked at some time trials, this time with free boundary conditions. We ran 100 random signals of length N. In Tables 6 (general random signals) and 7 (binary random signals), we recorded the average time to complete the main loop, the average number of initial clusters, and the average number of λ solutions.

We know that for a random binary signal, this algorithm is O(M). From our time trials, it appears that the computational complexity of the main loop of our ht algorithm is O(M), where M is the number of initial clusters in our signal. An initial computation of O(N) is performed on each signal to catalogue the clusters, where N is the length of the signal. Putting these together, we believe that the computational complexity of this algorithm is O(aN + M) for signals of length N, where a is small compared to 1. We know that for a binary

[Table 6. Time trials for Algorithm 5.1 for general random signals, of size N, with free boundary conditions.]

[Table 7. Time trials for Algorithm 5.1 for random binary signals, of size N, with free boundary conditions.]

signal, the cluster that moves will meet up with both of its neighbors. Thus, after each iteration, the number of clusters decreases by 2 for every cluster that moves. This means that there are exactly M/2 cluster moves for any binary signal. The worst-case scenario is the binary signal given by u^(0) = (0, 1, 0, 1, 0, 1, ...). Here the number of initial clusters is N, the signal length; thus the number of cluster moves is equal to M/2 = N/2. In a random binary signal, we expect fewer than N initial clusters.

7. Example for the L_1TV ht Algorithm

As we mention above, L_1TV minimization can be used to find scales in data. In this section we show that our algorithm for L_1TV does indeed give the expected results. In Figure 11(a), we show a plot of daily sunspot numbers obtained from NASA. The sunspot number¹ for any

¹The sunspot number is commonly referred to as the Wolf Number in honor of Rudolf Wolf, who is credited with the concept in 1848.

given day is a standardized measure of the number of sunspots and sunspot groups present on the earth-facing surface of the sun. Sunspots are dynamic and have typical lifetimes of a few days to a few months. However, the dominant feature in the signature is an approximate 11-year period in overall sunspot activity.

[Figure 11. Examples of minimizing signals for daily sunspot numbers at significant scales. The raw daily data is represented at scale 2/λ = 1.]

We applied the ht algorithm for L_1TV to this sunspot data. The signature shown in Figure 12 is the value of the variation term in (1), Σ_{i=0}^{m−1} |u_{i+1} − u_i|, for each distinct value of 2/λ found by the ht algorithm. The presence of a scale 2/λ in a signature is revealed by significant changes in variation with respect to 2/λ. Three scale signatures are present, at 2/λ ≈ 13, 1200, and 11000, in units of days. The largest value indicates the time scale over which sunspot activity rises and falls over decades. The medium scale represents the typical duration of the decadal peak in sunspot activity, about 3-4 years. The 13-day scale correlates well with the typical duration of a sunspot on the face of the sun, limited by half of the sun's generally accepted synodic rotational period of 26 days. The short scale does appear to be composed of a broad range of scales ranging from one to six weeks. Scales shorter than two weeks may be due to observations of the beginning and end of sunspot life cycles. Scales of several weeks may be due to the duration of sun-wide bursts of activity. Minimizing signals for these two scales are shown in Figures 11(b) and 11(c). The inset subfigures show the signals over a short range of dates beginning in 1955. At these scales, features in the data of the corresponding width are significant and changing rapidly with respect to λ. Features of effective cluster size (see Section 5) smaller than the given scale are not present.

8. Hyperplane Traversal Algorithm for Discrete L_1pTV for p < 1

In this section, we consider the discrete formulation for L_1pTV for 0 < p < 1:

min_{u ∈ R^{m+1}} G_p(u) = Σ_{i=0}^{m−1} |u_{i+1} − u_i|^p + λ Σ_{i=0}^{m} |f_i − u_i|. (65)

[Figure 12. Scale signature for sunspot data.]

Notice that if u is binary data, the problem is p-independent. Thus, for binary data, this problem reduces to the L_1TV problem. For this problem, we seek a minimizer in the set Y (Definition 2.2). Because G_p is concave in the regions separated by the hyperplanes of the form {u_i = u_{i+1}}, {u_i = u_{i−1}}, or {u_i = f_i} (see Figure 13), we know that minimizers will be in Y. We use a variant of the ht algorithm which stays in Y to find local minimizers. The difference in the algorithm is that the choices made depend on whether neighboring clusters are both above, both below, or one above and one below the cluster being moved.

[Figure 13. Level lines of a simple example of G_p for p = 0.5.]

8.1. ht Algorithm for L_1pTV. Again, we use the proofs of Lemmas 4.9 and 4.10 to write an algorithm to find minimizers of (1). We let C_1^(k), ..., C_{q_k}^(k) be the unique clusters at iteration k. Using the lemmas, we write the conditions on λ for a cluster to move at iteration k. We let q_g, q_e, and q_l be as we defined them in Section 5. To move a cluster that has one neighbor above and one neighbor below, we let η_a be the distance from the cluster to the neighbor above the cluster and η_b be the distance from the cluster to the neighbor below the cluster. And we consider moving the cluster up a distance η = min{η_a, min{f_i − u_i : f_i > u_i, u_i ∈ C_j}}.
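The p-dependent λ thresholds from Lemma 4.10, such as condition (60) for a cluster below both neighbors, are straightforward to evaluate numerically. The sketch below (our own illustration; the function name and argument names are ours) computes the bound and shows how, at p = 1, it collapses to the step-independent threshold 2/Q of Section 5:

```python
def lambda_bound(eta_l, eta_r, eta, p, Q):
    """Upper bound on lambda for descent when moving a cluster a distance eta
    toward both neighbors, as in conditions (60)/(61); Q is the effective
    cluster size. Assumes 0 < eta <= min(eta_l, eta_r) and Q > 0.
    """
    # Numerator: change in the variation term for a step of length eta.
    num = eta_l**p - (eta_l - eta)**p + eta_r**p - (eta_r - eta)**p
    return num / (eta * Q)
```

For p = 1 the numerator is exactly 2η, so the bound equals 2/Q regardless of η, which is why the L_1TV algorithm of Section 5 never needs the step length explicitly; for p < 1 the bound genuinely depends on η, reflecting the local nature of the minimizers.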


Some modelling aspects for the Matlab implementation of MMA Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2 Salmon: Lectures on partal dfferental equatons 5. Classfcaton of second-order equatons There are general methods for classfyng hgher-order partal dfferental equatons. One s very general (applyng even to

More information

Report on Image warping

Report on Image warping Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.

More information

Solutions HW #2. minimize. Ax = b. Give the dual problem, and make the implicit equality constraints explicit. Solution.

Solutions HW #2. minimize. Ax = b. Give the dual problem, and make the implicit equality constraints explicit. Solution. Solutons HW #2 Dual of general LP. Fnd the dual functon of the LP mnmze subject to c T x Gx h Ax = b. Gve the dual problem, and make the mplct equalty constrants explct. Soluton. 1. The Lagrangan s L(x,

More information

2.3 Nilpotent endomorphisms

2.3 Nilpotent endomorphisms s a block dagonal matrx, wth A Mat dm U (C) In fact, we can assume that B = B 1 B k, wth B an ordered bass of U, and that A = [f U ] B, where f U : U U s the restrcton of f to U 40 23 Nlpotent endomorphsms

More information

Lecture 21: Numerical methods for pricing American type derivatives

Lecture 21: Numerical methods for pricing American type derivatives Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)

More information

Bézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0

Bézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0 Bézer curves Mchael S. Floater September 1, 215 These notes provde an ntroducton to Bézer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals Smultaneous Optmzaton of Berth Allocaton, Quay Crane Assgnment and Quay Crane Schedulng Problems n Contaner Termnals Necat Aras, Yavuz Türkoğulları, Z. Caner Taşkın, Kuban Altınel Abstract In ths work,

More information

The Second Anti-Mathima on Game Theory

The Second Anti-Mathima on Game Theory The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

Dynamic Programming. Preview. Dynamic Programming. Dynamic Programming. Dynamic Programming (Example: Fibonacci Sequence)

Dynamic Programming. Preview. Dynamic Programming. Dynamic Programming. Dynamic Programming (Example: Fibonacci Sequence) /24/27 Prevew Fbonacc Sequence Longest Common Subsequence Dynamc programmng s a method for solvng complex problems by breakng them down nto smpler sub-problems. It s applcable to problems exhbtng the propertes

More information

Foundations of Arithmetic

Foundations of Arithmetic Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

Large-scale packing of ellipsoids

Large-scale packing of ellipsoids Large-scale packng of ellpsods E. G. Brgn R. D. Lobato September 7, 017 Abstract The problem of packng ellpsods n the n-dmensonal space s consdered n the present work. The proposed approach combnes heurstc

More information

An Interactive Optimisation Tool for Allocation Problems

An Interactive Optimisation Tool for Allocation Problems An Interactve Optmsaton ool for Allocaton Problems Fredr Bonäs, Joam Westerlund and apo Westerlund Process Desgn Laboratory, Faculty of echnology, Åbo Aadem Unversty, uru 20500, Fnland hs paper presents

More information

Spectral Graph Theory and its Applications September 16, Lecture 5

Spectral Graph Theory and its Applications September 16, Lecture 5 Spectral Graph Theory and ts Applcatons September 16, 2004 Lecturer: Danel A. Spelman Lecture 5 5.1 Introducton In ths lecture, we wll prove the followng theorem: Theorem 5.1.1. Let G be a planar graph

More information

Maximal Margin Classifier

Maximal Margin Classifier CS81B/Stat41B: Advanced Topcs n Learnng & Decson Makng Mamal Margn Classfer Lecturer: Mchael Jordan Scrbes: Jana van Greunen Corrected verson - /1/004 1 References/Recommended Readng 1.1 Webstes www.kernel-machnes.org

More information

1 Convex Optimization

1 Convex Optimization Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,

More information

FINITELY-GENERATED MODULES OVER A PRINCIPAL IDEAL DOMAIN

FINITELY-GENERATED MODULES OVER A PRINCIPAL IDEAL DOMAIN FINITELY-GENERTED MODULES OVER PRINCIPL IDEL DOMIN EMMNUEL KOWLSKI Throughout ths note, s a prncpal deal doman. We recall the classfcaton theorem: Theorem 1. Let M be a fntely-generated -module. (1) There

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016 U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and

More information

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1 P. Guterrez Physcs 5153 Classcal Mechancs D Alembert s Prncple and The Lagrangan 1 Introducton The prncple of vrtual work provdes a method of solvng problems of statc equlbrum wthout havng to consder the

More information

p 1 c 2 + p 2 c 2 + p 3 c p m c 2

p 1 c 2 + p 2 c 2 + p 3 c p m c 2 Where to put a faclty? Gven locatons p 1,..., p m n R n of m houses, want to choose a locaton c n R n for the fre staton. Want c to be as close as possble to all the house. We know how to measure dstance

More information

Which Separator? Spring 1

Which Separator? Spring 1 Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal

More information

Lecture Note 3. Eshelby s Inclusion II

Lecture Note 3. Eshelby s Inclusion II ME340B Elastcty of Mcroscopc Structures Stanford Unversty Wnter 004 Lecture Note 3. Eshelby s Incluson II Chrs Wenberger and We Ca c All rghts reserved January 6, 004 Contents 1 Incluson energy n an nfnte

More information

Chapter Twelve. Integration. We now turn our attention to the idea of an integral in dimensions higher than one. Consider a real-valued function f : D

Chapter Twelve. Integration. We now turn our attention to the idea of an integral in dimensions higher than one. Consider a real-valued function f : D Chapter Twelve Integraton 12.1 Introducton We now turn our attenton to the dea of an ntegral n dmensons hgher than one. Consder a real-valued functon f : R, where the doman s a nce closed subset of Eucldean

More information

Integrals and Invariants of Euler-Lagrange Equations

Integrals and Invariants of Euler-Lagrange Equations Lecture 16 Integrals and Invarants of Euler-Lagrange Equatons ME 256 at the Indan Insttute of Scence, Bengaluru Varatonal Methods and Structural Optmzaton G. K. Ananthasuresh Professor, Mechancal Engneerng,

More information

Math 261 Exercise sheet 2

Math 261 Exercise sheet 2 Math 261 Exercse sheet 2 http://staff.aub.edu.lb/~nm116/teachng/2017/math261/ndex.html Verson: September 25, 2017 Answers are due for Monday 25 September, 11AM. The use of calculators s allowed. Exercse

More information

Appendix B. The Finite Difference Scheme

Appendix B. The Finite Difference Scheme 140 APPENDIXES Appendx B. The Fnte Dfference Scheme In ths appendx we present numercal technques whch are used to approxmate solutons of system 3.1 3.3. A comprehensve treatment of theoretcal and mplementaton

More information

Unit 5: Quadratic Equations & Functions

Unit 5: Quadratic Equations & Functions Date Perod Unt 5: Quadratc Equatons & Functons DAY TOPIC 1 Modelng Data wth Quadratc Functons Factorng Quadratc Epressons 3 Solvng Quadratc Equatons 4 Comple Numbers Smplfcaton, Addton/Subtracton & Multplcaton

More information

Solutions to exam in SF1811 Optimization, Jan 14, 2015

Solutions to exam in SF1811 Optimization, Jan 14, 2015 Solutons to exam n SF8 Optmzaton, Jan 4, 25 3 3 O------O -4 \ / \ / The network: \/ where all lnks go from left to rght. /\ / \ / \ 6 O------O -5 2 4.(a) Let x = ( x 3, x 4, x 23, x 24 ) T, where the varable

More information

1 Matrix representations of canonical matrices

1 Matrix representations of canonical matrices 1 Matrx representatons of canoncal matrces 2-d rotaton around the orgn: ( ) cos θ sn θ R 0 = sn θ cos θ 3-d rotaton around the x-axs: R x = 1 0 0 0 cos θ sn θ 0 sn θ cos θ 3-d rotaton around the y-axs:

More information

MA 323 Geometric Modelling Course Notes: Day 13 Bezier Curves & Bernstein Polynomials

MA 323 Geometric Modelling Course Notes: Day 13 Bezier Curves & Bernstein Polynomials MA 323 Geometrc Modellng Course Notes: Day 13 Bezer Curves & Bernsten Polynomals Davd L. Fnn Over the past few days, we have looked at de Casteljau s algorthm for generatng a polynomal curve, and we have

More information

Learning Theory: Lecture Notes

Learning Theory: Lecture Notes Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be

More information

Fundamental loop-current method using virtual voltage sources technique for special cases

Fundamental loop-current method using virtual voltage sources technique for special cases Fundamental loop-current method usng vrtual voltage sources technque for specal cases George E. Chatzaraks, 1 Marna D. Tortorel 1 and Anastasos D. Tzolas 1 Electrcal and Electroncs Engneerng Departments,

More information

Min Cut, Fast Cut, Polynomial Identities

Min Cut, Fast Cut, Polynomial Identities Randomzed Algorthms, Summer 016 Mn Cut, Fast Cut, Polynomal Identtes Instructor: Thomas Kesselhem and Kurt Mehlhorn 1 Mn Cuts n Graphs Lecture (5 pages) Throughout ths secton, G = (V, E) s a mult-graph.

More information

MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS

MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS These are nformal notes whch cover some of the materal whch s not n the course book. The man purpose s to gve a number of nontrval examples

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

DIFFERENTIAL FORMS BRIAN OSSERMAN

DIFFERENTIAL FORMS BRIAN OSSERMAN DIFFERENTIAL FORMS BRIAN OSSERMAN Dfferentals are an mportant topc n algebrac geometry, allowng the use of some classcal geometrc arguments n the context of varetes over any feld. We wll use them to defne

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

Edge Isoperimetric Inequalities

Edge Isoperimetric Inequalities November 7, 2005 Ross M. Rchardson Edge Isopermetrc Inequaltes 1 Four Questons Recall that n the last lecture we looked at the problem of sopermetrc nequaltes n the hypercube, Q n. Our noton of boundary

More information

Bezier curves. Michael S. Floater. August 25, These notes provide an introduction to Bezier curves. i=0

Bezier curves. Michael S. Floater. August 25, These notes provide an introduction to Bezier curves. i=0 Bezer curves Mchael S. Floater August 25, 211 These notes provde an ntroducton to Bezer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of the

More information

Lecture 5 Decoding Binary BCH Codes

Lecture 5 Decoding Binary BCH Codes Lecture 5 Decodng Bnary BCH Codes In ths class, we wll ntroduce dfferent methods for decodng BCH codes 51 Decodng the [15, 7, 5] 2 -BCH Code Consder the [15, 7, 5] 2 -code C we ntroduced n the last lecture

More information

Calculation of time complexity (3%)

Calculation of time complexity (3%) Problem 1. (30%) Calculaton of tme complexty (3%) Gven n ctes, usng exhaust search to see every result takes O(n!). Calculaton of tme needed to solve the problem (2%) 40 ctes:40! dfferent tours 40 add

More information

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS) Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998

More information

MAT 578 Functional Analysis

MAT 578 Functional Analysis MAT 578 Functonal Analyss John Qugg Fall 2008 Locally convex spaces revsed September 6, 2008 Ths secton establshes the fundamental propertes of locally convex spaces. Acknowledgment: although I wrote these

More information

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16 STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus

More information

Finding Dense Subgraphs in G(n, 1/2)

Finding Dense Subgraphs in G(n, 1/2) Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng

More information

Formulas for the Determinant

Formulas for the Determinant page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use

More information

= z 20 z n. (k 20) + 4 z k = 4

= z 20 z n. (k 20) + 4 z k = 4 Problem Set #7 solutons 7.2.. (a Fnd the coeffcent of z k n (z + z 5 + z 6 + z 7 + 5, k 20. We use the known seres expanson ( n+l ( z l l z n below: (z + z 5 + z 6 + z 7 + 5 (z 5 ( + z + z 2 + z + 5 5

More information

The Number of Ways to Write n as a Sum of ` Regular Figurate Numbers

The Number of Ways to Write n as a Sum of ` Regular Figurate Numbers Syracuse Unversty SURFACE Syracuse Unversty Honors Program Capstone Projects Syracuse Unversty Honors Program Capstone Projects Sprng 5-1-01 The Number of Ways to Wrte n as a Sum of ` Regular Fgurate Numbers

More information

Exercises. 18 Algorithms

Exercises. 18 Algorithms 18 Algorthms Exercses 0.1. In each of the followng stuatons, ndcate whether f = O(g), or f = Ω(g), or both (n whch case f = Θ(g)). f(n) g(n) (a) n 100 n 200 (b) n 1/2 n 2/3 (c) 100n + log n n + (log n)

More information

Line Drawing and Clipping Week 1, Lecture 2

Line Drawing and Clipping Week 1, Lecture 2 CS 43 Computer Graphcs I Lne Drawng and Clppng Week, Lecture 2 Davd Breen, Wllam Regl and Maxm Peysakhov Geometrc and Intellgent Computng Laboratory Department of Computer Scence Drexel Unversty http://gcl.mcs.drexel.edu

More information

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity Week3, Chapter 4 Moton n Two Dmensons Lecture Quz A partcle confned to moton along the x axs moves wth constant acceleraton from x =.0 m to x = 8.0 m durng a 1-s tme nterval. The velocty of the partcle

More information

Vector Norms. Chapter 7 Iterative Techniques in Matrix Algebra. Cauchy-Bunyakovsky-Schwarz Inequality for Sums. Distances. Convergence.

Vector Norms. Chapter 7 Iterative Techniques in Matrix Algebra. Cauchy-Bunyakovsky-Schwarz Inequality for Sums. Distances. Convergence. Vector Norms Chapter 7 Iteratve Technques n Matrx Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematcs Unversty of Calforna, Berkeley Math 128B Numercal Analyss Defnton A vector norm

More information

4DVAR, according to the name, is a four-dimensional variational method.

4DVAR, according to the name, is a four-dimensional variational method. 4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The

More information

Common loop optimizations. Example to improve locality. Why Dependence Analysis. Data Dependence in Loops. Goal is to find best schedule:

Common loop optimizations. Example to improve locality. Why Dependence Analysis. Data Dependence in Loops. Goal is to find best schedule: 15-745 Lecture 6 Data Dependence n Loops Copyrght Seth Goldsten, 2008 Based on sldes from Allen&Kennedy Lecture 6 15-745 2005-8 1 Common loop optmzatons Hostng of loop-nvarant computatons pre-compute before

More information

5 The Rational Canonical Form

5 The Rational Canonical Form 5 The Ratonal Canoncal Form Here p s a monc rreducble factor of the mnmum polynomal m T and s not necessarly of degree one Let F p denote the feld constructed earler n the course, consstng of all matrces

More information

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0 MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector

More information