Efficient Bregman Projections onto the Simplex
2015 IEEE 54th Annual Conference on Decision and Control (CDC), December 15-18, 2015. Osaka, Japan

Efficient Bregman Projections onto the Simplex

Walid Krichene, Syrine Krichene, Alexandre Bayen

Abstract— We consider the problem of projecting a vector onto the simplex Δ_d = {x ∈ R^d_+ : Σ_{i=1}^d x_i = 1}, using a Bregman projection. This is a common problem in first-order methods for convex optimization and online-learning algorithms, such as mirror descent. We derive the KKT conditions of the projection problem, and show that for Bregman divergences induced by ω-potentials, one can efficiently compute the solution using a bisection method. More precisely, an ε-approximate projection can be obtained in O(d log(1/ε)). We also consider a class of exponential potentials for which the exact solution can be computed efficiently, and give a O(d log d) deterministic algorithm and O(d) randomized algorithm to compute the projection. In particular, we show that one can generalize the KL divergence to a Bregman divergence which is bounded on the simplex (unlike the KL divergence), strongly convex with respect to the ℓ1 norm, and for which one can still solve the projection in expected linear time.

I. INTRODUCTION

Many first-order methods for convex optimization and online learning can be formulated as iterative projections of a vector on a feasible set. Consider for example the constrained convex problem, minimize_{x ∈ X} f(x), where X is a convex set and f : X → R is convex. This problem can be solved using the mirror descent algorithm, a first-order method proposed by Nemirovski and Yudin in [21] (see also [4]), which generalizes the projected gradient descent method by replacing the Euclidean projection step with a generalized Bregman projection. This method can be summarized in Algorithm 1.

Algorithm 1 Mirror descent method with learning rates (η_τ) and Bregman divergence D_ψ.
1: for τ ∈ N do
2:   Query a sub-gradient vector g^(τ) ∈ ∂f(x^(τ))
3:   Update
       x^(τ+1) = arg min_{x ∈ X} D_ψ(x, (∇ψ)^{-1}(∇ψ(x^(τ)) − η_τ g^(τ)))   (1)
4: end for

Here, D_ψ is the Bregman divergence induced by a distance generating function ψ.
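For concreteness, here is a minimal sketch (ours, in Python with NumPy; the function name is an assumption, not from the paper) of one step of Algorithm 1 in the special case where D_ψ is the KL divergence, for which the update has the closed form of the exponentiated gradient / multiplicative weights update discussed below.

```python
import numpy as np

def mirror_descent_step_kl(x, g, eta):
    """One mirror descent step with the KL divergence.

    With psi the negative entropy, the update of Algorithm 1 reduces to
    the exponentiated gradient update: x_i <- x_i * exp(-eta * g_i),
    renormalized so the iterate stays on the simplex.
    """
    w = x * np.exp(-eta * g)
    return w / w.sum()

# A few steps on the linear cost f(x) = <c, x>, whose subgradient is c:
c = np.array([1.0, 0.5, 0.0, 2.0])
x = np.ones(4) / 4            # start from the uniform distribution
for _ in range(100):
    x = mirror_descent_step_kl(x, c, eta=0.5)
# the iterate concentrates on the coordinate with the smallest cost
```

On this linear cost, the mass of the iterate concentrates on the minimizing coordinate (here, the third), which matches the intuition of the multiplicative weights method.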
Walid Krichene is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA. walid@eecs.berkeley.edu
Syrine Krichene is with the ENSIMAG school of Computer Sciences and Applied Mathematics of Grenoble, France. syrine.krichene@ensimag.grenoble-inp.fr
Alexandre Bayen is with the Department of Electrical Engineering and Computer Sciences, and the Department of Civil and Environmental Engineering, University of California, Berkeley, USA. bayen@berkeley.edu

The definition and properties of Bregman divergences will be reviewed in Section II. Some important instances of the mirror descent method include projected gradient descent, obtained by taking the Bregman divergence to be the squared Euclidean distance, and exponentiated gradient descent [18] (also called the Hedge algorithm or multiplicative weights algorithm [1]), obtained by taking the Bregman divergence to be the KL divergence.

In this article, we focus specifically on simplex-constrained convex problems. That is, we suppose that X is the simplex Δ_d = {x ∈ R^d_+ : Σ_{i=1}^d x_i = 1}, or more generally, a product of scaled simplexes, X = α_1 Δ_{d_1} × ⋯ × α_K Δ_{d_K}. Simplex-constrained problems include nonparametric statistical estimation, see for example Section 7.2 in [8], multi-commodity flow problems, see [10], tomography image reconstruction [5], and learning dynamics in repeated games [20]. Other variants of the mirror descent method have been studied as well, such as stochastic mirror descent [17], [19].

Besides its applications to convex optimization, simplex-constrained mirror descent plays an important role in online learning problems [9], in which a decision maker chooses, at each iteration τ, a distribution x^(τ) over a finite action set A with |A| = d. Then, a bounded loss vector ℓ^(τ) ∈ [0, 1]^d is revealed, and the decision maker incurs the expected loss ⟨ℓ^(τ), x^(τ)⟩ = Σ_{i=1}^d x_i^(τ) ℓ_i^(τ). This sequential decision problem is also called prediction with expert advice [11], and has a long history which dates back to Hannan [15] and Blackwell [6], who studied this problem in the context of repeated games.
In (adversarial) online learning problems, one seeks to design an algorithm which has a guarantee on the worst-case regret, defined as follows: if the algorithm is presented with a sequence of losses (ℓ^(τ))_{τ≤T}, and it generates a sequence of decisions (x^(τ))_{τ≤T}, then the cumulative regret of the algorithm up to iteration T is

  R((ℓ^(τ))_{0≤τ≤T}) = Σ_{τ=1}^T ⟨ℓ^(τ), x^(τ)⟩ − min_{x∈Δ} Σ_{τ=1}^T ⟨ℓ^(τ), x⟩,

and the worst-case regret is the maximum such regret over admissible sequences of losses, max_{(ℓ^(τ))_{0≤τ≤T}} R((ℓ^(τ))_{0≤τ≤T}). An algorithm is said to have sublinear regret if its worst-case regret grows sub-linearly in T, that is,

  lim sup_{T→∞} max_{(ℓ^(τ))_{0≤τ≤T}} R((ℓ^(τ))_{0≤τ≤T}) / T ≤ 0.

The online mirror descent method, obtained simply by replacing the subgradient vector g^(τ) in Algorithm 1 with the loss vector ℓ^(τ), defines a large class of online learning algorithms with sub-linear regret, see for example the survey of Bubeck and Cesa-Bianchi in [9]. The online
mirror descent method is summarized in Algorithm 2.

Algorithm 2 Online mirror descent method with learning rates (η_τ) and Bregman divergence D_ψ.
1: for τ ∈ N do
2:   Play action a^(τ) ∼ x^(τ)
3:   Discover loss vector ℓ^(τ) ∈ [0, 1]^d
4:   Incur expected loss ⟨ℓ^(τ), x^(τ)⟩
5:   Update
       x^(τ+1) = arg min_{x ∈ X} D_ψ(x, (∇ψ)^{-1}(∇ψ(x^(τ)) − η_τ ℓ^(τ)))   (2)
6: end for

Online mirror descent, and its stochastic variant, have been applied to several problems including multi-armed bandits [9], [2], machine learning [12] and repeated games [19], to cite a few. In all the variants of simplex-constrained mirror descent, one needs to solve, at each iteration τ, the Bregman projection step given in equation (1) or (2). Some instances of Bregman projections are known to have an exact solution which can be computed efficiently. For example, the solution of the KL divergence projection on the simplex is given by the exponential weights update [1], [3], and the Euclidean projection on the simplex can be computed efficiently either by sorting and thresholding in O(d log d), or by using a randomized pivot method in O(d), see [13].

In this article, we start by deriving the KKT conditions of the Bregman projection problem in Section II, then consider, in Section III, a general class of Bregman divergences, induced by ω-potentials, as defined by Audibert et al. [2]. We show that for this class, the solution can be approximated efficiently: an ε-approximate solution can be computed in O(d log(1/ε)) operations. In Section IV, we consider a class of exponential potentials, and study the resulting Bregman projection, a generalization of the KL-divergence projection. We show that for this class, the exact solution can be computed using a deterministic algorithm with O(d log d) complexity, or a randomized algorithm with expected linear complexity. We also study the properties of the resulting Bregman divergence. In particular, we emphasize a tradeoff between strong convexity and boundedness, two properties which affect the convergence rates of the mirror descent method. II.
BREGMAN PROJECTION AND OPTIMALITY CONDITIONS

Let ψ : X̄ → R be a convex function defined on a convex set X̄, and let X be the subset of X̄ on which ψ is differentiable. Let ∇ψ : X → R^d be the gradient of ψ, and R its range. The Bregman divergence induced by ψ is defined as follows:

  D_ψ : X̄ × X → R_+
  (x, y) ↦ D_ψ(x, y) = ψ(x) − ψ(y) − ⟨∇ψ(y), x − y⟩.   (3)

By convexity of ψ, the Bregman divergence is non-negative, and x ↦ D_ψ(x, y) is convex. We will refer to ψ as the distance-generating function. We say that ψ is ℓ_ψ-strongly convex with respect to a reference norm ‖·‖ if

  D_ψ(x, y) ≥ (ℓ_ψ/2) ‖x − y‖²  for all (x, y) ∈ X̄ × X.

In order for the Bregman projection (1) to be well-defined, the gradient vector (or loss vector) at iteration τ must satisfy the following consistency condition:

  ∇ψ(x^(τ)) − η_τ g^(τ) ∈ R.   (4)

A. Interpretations of the Bregman projection

The Bregman projection, given in equation (1), can be interpreted as projecting on X the vector (∇ψ)^{-1}(∇ψ(x^(τ)) − η_τ g^(τ)), obtained by mapping the current iterate x^(τ) to the set R through ∇ψ, taking a step in the opposite direction of the gradient, then mapping the new vector back through (∇ψ)^{-1}, see Nemirovski and Yudin [21].

A second interpretation can be obtained, as observed by Beck and Teboulle [3], by rewriting the objective function as follows: denoting the vector (∇ψ)^{-1}(∇ψ(x^(τ)) − η_τ g^(τ)) by x̃^(τ), we have by definition of D_ψ

  x^(τ+1) = arg min_{x ∈ X} D_ψ(x, x̃^(τ))
          = arg min_{x ∈ X} ψ(x) − ψ(x̃^(τ)) − ⟨∇ψ(x̃^(τ)), x − x̃^(τ)⟩
          = arg min_{x ∈ X} ψ(x) − ⟨∇ψ(x^(τ)) − η_τ g^(τ), x⟩,

which is equivalent to minimizing

  x^(τ+1) = arg min_{x ∈ X} η_τ (f(x^(τ)) + ⟨g^(τ), x − x^(τ)⟩) + D_ψ(x, x^(τ)),

which can be interpreted as follows: the first term f(x^(τ)) + ⟨g^(τ), x − x^(τ)⟩ is the linear approximation of f around the current iterate x^(τ), and the second term D_ψ(x, x^(τ)) is a non-negative function which penalizes deviations from x^(τ). The step size (or learning rate) η_τ controls the relative weight of both terms.

B. Simplex-constrained Bregman projection

In the remainder of the paper, we will assume, to simplify the discussion, that the feasible set is the simplex Δ_d = {x ∈ R^d_+ : Σ_{i=1}^d x_i = 1}.
We observe that all the results can be readily extended to the case in which X is a product of scaled simplexes, as follows: suppose X = α_1 Δ_{d_1} × ⋯ × α_K Δ_{d_K}, with α_k > 0, and let ψ_k be a distance generating function on Δ_{d_k}. Then consider the function

  ψ : α_1 Δ_{d_1} × ⋯ × α_K Δ_{d_K} → R
  (α_1 x_1, ..., α_K x_K) ↦ Σ_{k=1}^K α_k ψ_k(x_k).

The gradient of ψ is simply

  ∇ψ : α_1 Δ_{d_1} × ⋯ × α_K Δ_{d_K} → R_1 × ⋯ × R_K
  (α_1 x_1, ..., α_K x_K) ↦ (∇ψ_1(x_1), ..., ∇ψ_K(x_K)),

and its inverse is given by
  (∇ψ)^{-1} : R_1 × ⋯ × R_K → α_1 Δ_{d_1} × ⋯ × α_K Δ_{d_K}
  (y_1, ..., y_K) ↦ (α_1 (∇ψ_1)^{-1}(y_1), ..., α_K (∇ψ_K)^{-1}(y_K)).

Finally, the Bregman divergence decomposes as follows:

  D_ψ((α_k x_k)_k, (α_k y_k)_k) = Σ_k α_k ψ_k(x_k) − Σ_k α_k ψ_k(y_k) − Σ_k α_k ⟨∇ψ_k(y_k), x_k − y_k⟩
                               = Σ_k α_k D_{ψ_k}(x_k, y_k).

Therefore, the projection on X with Bregman divergence D_ψ can be decomposed into K projections on Δ_{d_k} with Bregman divergence D_{ψ_k}, as follows:

  arg min_{x_k ∈ Δ_{d_k}} D_ψ(x, (∇ψ)^{-1}(∇ψ(x^(τ)) − η_τ g^(τ)))
    = arg min_{x_k ∈ Δ_{d_k}} α_k D_{ψ_k}(x_k, (∇ψ_k)^{-1}(∇ψ_k(x_k^(τ)) − η_τ g_k^(τ))),

assuming the consistency condition holds for each k.

Example 1 (Euclidean projection): Consider the function ψ(x) = ½‖x‖₂². Then ∇ψ(x) = x, and the Bregman divergence is simply D_ψ(x, y) = ½‖x − y‖₂². As a consequence, the Bregman projection step reduces to

  arg min_{x ∈ Δ} D_ψ(x, (∇ψ)^{-1}(∇ψ(x^(τ)) − η_τ g^(τ))) = arg min_{x ∈ Δ} ½ ‖x − (x^(τ) − η_τ g^(τ))‖₂²,

which corresponds to a projected gradient descent update, with step size η_τ.

C. Optimality conditions

We now derive the KKT conditions for the Bregman projection problem given by

  minimize_{x ∈ R^d}  D_ψ(x, (∇ψ)^{-1}(∇ψ(x̄) − ḡ))
  subject to  x ∈ Δ_d   (5)

where x̄ ∈ Δ_d and ḡ ∈ R^d are given. Note that we combine η_τ g^(τ) into a single vector ḡ, to simplify notation. By strong convexity, the solution is unique.

Proposition 1: Consider the Bregman projection problem (5). Then x* ∈ R^d is optimal if and only if there exist λ ∈ R^d_+ and ν ∈ R such that

  x* = (∇ψ)^{-1}(∇ψ(x̄) − ḡ + λ + ν·1),
  Σ_{i=1}^d x*_i = 1,
  ∀i, x*_i ≥ 0, λ_i x*_i = 0,

where ν·1 is the vector whose entries are all equal to ν.

Proof: Define the Lagrangian, for x ∈ R^d, λ ∈ R^d_+, and ν ∈ R,

  L(x, λ, ν) = D_ψ(x, (∇ψ)^{-1}(∇ψ(x̄) − ḡ)) − ⟨λ, x⟩ + ν(Σ_{i=1}^d x_i − 1).   (6)

For all x, y ∈ X, the gradient of the Bregman divergence is given by ∇_x D_ψ(x, y) = ∇ψ(x) − ∇ψ(y). Thus the gradient of L is given by

  ∇_x L(x, λ, ν) = ∇ψ(x) − ∇ψ(x̄) + ḡ − λ − ν·1.

Writing the KKT conditions of problem (5), we have that (x*, λ*, ν*) is optimal if and only if

  ∇ψ(x*) − ∇ψ(x̄) + ḡ − λ* − ν*·1 = 0,
  Σ_i x*_i = 1,
  ∀i, x*_i ≥ 0, λ*_i ≥ 0, λ*_i x*_i = 0,

and the first equation can be rearranged as x* = (∇ψ)^{-1}(∇ψ(x̄) − ḡ + λ* + ν*·1), which proves the claim.
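As an illustration of Example 1, here is a sketch (ours, in Python; the function name is an assumption) of the standard sort-and-threshold method for the Euclidean projection onto the simplex. The threshold θ it computes plays the role of the dual variable ν in Proposition 1.

```python
import numpy as np

def euclidean_projection_simplex(v):
    """Project v onto the simplex by sorting and thresholding, O(d log d).

    Finds theta such that x = max(v - theta, 0) sums to 1; theta plays
    the role of the dual variable nu in the KKT conditions.
    """
    d = v.size
    u = np.sort(v)[::-1]                    # sort in decreasing order
    cumsum = np.cumsum(u)
    # largest 0-based index rho with u[rho] > (cumsum[rho] - 1)/(rho + 1)
    rho = np.nonzero(u * np.arange(1, d + 1) > cumsum - 1.0)[0][-1]
    theta = (cumsum[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

x = euclidean_projection_simplex(np.array([2.0, 1.0, 0.0]))   # -> (1, 0, 0)
```

A vector already on the simplex is its own projection (θ = 0), which gives a quick sanity check.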
In the next section, we will derive an efficient algorithm to compute an approximate solution for the class of Bregman divergences induced by ω-potentials, by solving the KKT system given in Proposition 1.

III. EFFICIENT APPROXIMATE PROJECTION WITH ω-POTENTIALS

Definition 1: Let a ∈ (−∞, +∞] and ω ≤ 0. An increasing C¹-diffeomorphism φ : (−∞, a) → (ω, +∞) is called an ω-potential if

  lim_{u→−∞} φ(u) = ω,  lim_{u→a} φ(u) = +∞,  ∫_0^1 φ^{-1}(u) du < ∞.

Fig. 1. Illustration of an ω-potential.

We associate, to an ω-potential φ, the distance-generating function ψ defined as follows:

  ψ : (ω, +∞)^d → R
  x ↦ Σ_{i=1}^d ∫_0^{x_i} φ^{-1}(u) du.

By definition, ψ is finite (in particular, the third condition on the potential ensures that ψ is finite on the boundary of the simplex since ∫_0^1 φ^{-1}(u) du < ∞), differentiable on (ω, +∞)^d, and its gradient is given by

  ∇ψ : (ω, +∞)^d → R = (−∞, a)^d
  x ↦ ∇ψ(x) = (φ^{-1}(x_i))_{i=1,...,d},

and since φ is increasing, ψ is convex. Similarly, the inverse of its gradient is

  (∇ψ)^{-1} : (−∞, a)^d → (ω, +∞)^d
  y ↦ (φ(y_i))_{i=1,...,d}.
Proposition 2: Consider the Bregman projection onto the simplex given in Problem (5), and assume that ψ is induced by an ω-potential φ. Then x* is a solution if and only if there exists ν ∈ R such that

  ∀i, x*_i = (φ(φ^{-1}(x̄_i) − ḡ_i + ν))_+,
  Σ_{i=1}^d x*_i = 1,

where (x)_+ denotes the positive part of x, (x)_+ = max(x, 0).

Proof: Combining the expressions of ∇ψ and (∇ψ)^{-1} with Proposition 1, we have that x* is optimal if and only if there exist ν ∈ R and λ ∈ R^d_+ such that

  ∀i, x*_i = φ(φ^{-1}(x̄_i) − ḡ_i + ν + λ_i),
  Σ_{i=1}^d x*_i = 1,
  ∀i, x*_i ≥ 0, x*_i λ_i = 0.

Let I = {i : x*_i > 0} be the support of x*. Then by the complementary slackness condition, we have for all i ∈ I, λ_i = 0, thus x*_i = φ(φ^{-1}(x̄_i) − ḡ_i + ν), and for all i ∉ I,

  φ(φ^{-1}(x̄_i) − ḡ_i + ν) ≤ φ(φ^{-1}(x̄_i) − ḡ_i + ν + λ_i) = x*_i = 0,

since φ is increasing. Therefore x* can be simply written x*_i = (φ(φ^{-1}(x̄_i) − ḡ_i + ν))_+, which proves the claim.

Next, we make the following observation regarding the support of the solution:

Proposition 3: Let x* be the solution to the projection problem (5), and let I be its support. Then for all i, j, if i ∈ I and φ^{-1}(x̄_j) − ḡ_j ≥ φ^{-1}(x̄_i) − ḡ_i, then j ∈ I.

Proof: Follows from Proposition 2 and the fact that φ is increasing.

As a consequence of the previous propositions, computing the projection reduces to computing the optimal dual variable ν, and since the potential is increasing, one can iteratively approximate ν using a bisection method, given in Algorithm 3: we start by defining a bound on the optimal ν, ν̲ ≤ ν ≤ ν̄, then we iteratively halve the size of the interval by inspecting the value of a carefully defined criterion function.

Theorem 1: Consider the Bregman projection onto the simplex given in Problem (5), and assume that ψ is induced by an ω-potential φ. Let ε > 0, and consider the bisection method given in Algorithm 3. Then the algorithm terminates after T = O(log(1/ε)) steps, and its output x(ν̲^(T)) is such that ‖x(ν̲^(T)) − x*‖₁ ≤ ε. Each step of the algorithm has complexity O(d), thus the total complexity is O(d log(1/ε)).

Proof: Define, as in Algorithm 3, the function x(ν) = ((φ(φ^{-1}(x̄_i) − ḡ_i + ν))_+)_{i=1,...,d}.
Since φ is, by assumption, increasing, so is ν ↦ x_i(ν), which is the key fact that allows us to use a bisection. We will denote by a superscript (t) the value of each variable at iteration t of the loop. To prove the claim, we show the following invariant for all t:

Algorithm 3 Bisection method to compute the projection x* with precision ε.
1: Input: x̄, ḡ, ε.
2: Initialize
     ν̄ = φ^{-1}(1) − max_i (φ^{-1}(x̄_i) − ḡ_i),
     ν̲ = φ^{-1}(1/d) − max_i (φ^{-1}(x̄_i) − ḡ_i)
3: Define x(ν) = ((φ(φ^{-1}(x̄_i) − ḡ_i + ν))_+)_{i=1,...,d}
4: while ‖x(ν̄) − x(ν̲)‖₁ > ε do
5:   Let ν⁺ ← (ν̲ + ν̄)/2
6:   if Σ_i x_i(ν⁺) > 1 then
7:     ν̄ ← ν⁺
8:   else
9:     ν̲ ← ν⁺
10:  end if
11: end while
12: Return x(ν̲)

  (i1) 0 ≤ ν̄^(t) − ν̲^(t) ≤ (ν̄^(0) − ν̲^(0)) / 2^t,
  (i2) ∀i, 0 ≤ x_i(ν̲^(t)) ≤ x_i(ν̄^(t)) ≤ 1,
  (i3) Σ_{i=1}^d x_i(ν̲^(t)) ≤ 1 ≤ Σ_{i=1}^d x_i(ν̄^(t)).

We first prove the invariant for t = 0. Let i_0 = arg max_i φ^{-1}(x̄_i) − ḡ_i. By definition of ν̲^(0) and ν̄^(0), we have

  φ^{-1}(1/d) − ν̲^(0) = φ^{-1}(x̄_{i_0}) − ḡ_{i_0} = φ^{-1}(1) − ν̄^(0),   (7)

and it follows that x_{i_0}(ν̲^(0)) = 1/d and x_{i_0}(ν̄^(0)) = 1. By (7), ν̄^(0) − ν̲^(0) = φ^{-1}(1) − φ^{-1}(1/d) ≥ 0 (since φ^{-1} is increasing), which proves (i1). Next, since ν ↦ x_i(ν) is increasing, we have 0 ≤ x_i(ν̲^(0)) ≤ x_i(ν̄^(0)) ≤ x_{i_0}(ν̄^(0)) = 1, which proves (i2). Finally, we have

  Σ_{i=1}^d x_i(ν̲^(0)) ≤ d · x_{i_0}(ν̲^(0)) = 1,
  Σ_{i=1}^d x_i(ν̄^(0)) ≥ x_{i_0}(ν̄^(0)) = 1,

which proves (i3). This proves the invariant for t = 0. Now suppose it holds at iteration t, and let us prove it still holds at t + 1. By definition of the bisection (lines 5-10), we immediately have ν̄^(t+1) − ν̲^(t+1) = (ν̄^(t) − ν̲^(t))/2 ≤ (ν̄^(0) − ν̲^(0))/2^(t+1), which proves (i1). We also have ν̲^(t) ≤ ν̲^(t+1) ≤ ν̄^(t+1) ≤ ν̄^(t), which proves (i2) since ν ↦ x_i(ν) is increasing. Finally, (i3) follows from the condition of the bisection (line 6).

To conclude the proof, we simply observe that since the distance ν̄ − ν̲ decreases exponentially, the algorithm will terminate after a number of steps logarithmic in 1/ε. Indeed, since φ is C¹ on (−∞, a), it is Lipschitz-continuous on
[φ^{-1}(0), φ^{-1}(1)]. Let L be its Lipschitz constant; then

  ‖x(ν̲^(t)) − x(ν̄^(t))‖₁ = Σ_{i=1}^d |x_i(ν̲^(t)) − x_i(ν̄^(t))| ≤ dL (ν̄^(t) − ν̲^(t)) ≤ dL (ν̄^(0) − ν̲^(0)) / 2^t

by (i1), thus the algorithm terminates after T = log₂(dL(ν̄^(0) − ν̲^(0))/ε) iterations, and the last iterate satisfies, by (i2) and since the x_i are increasing,

  ‖x(ν*) − x(ν̲^(T))‖₁ ≤ ‖x(ν̄^(T)) − x(ν̲^(T))‖₁ ≤ ε,

which concludes the proof.

IV. EFFICIENT EXACT PROJECTION WITH EXPONENTIAL POTENTIALS

We now consider a subclass of ω-potentials, for which we derive the exact solution.

Definition 2 (Exponential potential): Let ε ≥ 0. The function

  φ_ε : (−∞, +∞) → (−ε, +∞)
  u ↦ e^{u−1} − ε

is called the exponential potential with parameter ε. It is a (−ε)-potential. The distance generating function induced by this class of potentials is given by

  ψ_ε(x) = Σ_{i=1}^d ∫_0^{x_i} φ_ε^{-1}(u) du = Σ_{i=1}^d ∫_0^{x_i} (1 + ln(u + ε)) du
         = Σ_{i=1}^d (x_i + ε) ln(x_i + ε) − Σ_{i=1}^d ε ln ε
         = H(x + ε1) − H(ε1),

where ε1 is the vector whose entries are all equal to ε, and H is the generalized negative entropy function, defined on R^d_+ by H(x) = Σ_{i=1}^d x_i ln x_i. The corresponding Bregman divergence is

  D_{ψ_ε}(x, y) = H(x + ε1) − H(y + ε1) − ⟨∇H(y + ε1), x − y⟩
              = D_KL(x + ε1, y + ε1)
              = Σ_{i=1}^d (x_i + ε) ln((x_i + ε)/(y_i + ε)),

and will be denoted D_{KL,ε}(x, y). In particular, when ε = 0, D_{KL,ε}(x, y) is the KL divergence between the distribution vectors x and y. When ε > 0, the Bregman divergence is the KL divergence between x + ε1 and y + ε1. In particular, as we will see in Proposition 6, D_{KL,ε}(x, y) is bounded whenever ε > 0, while the KL divergence (ε = 0) can be unbounded.

Fig. 2. Illustration of the distance generating function induced by exponential potentials with parameter ε, for d = 2: H(x) = x₁ ln x₁ + (1 − x₁) ln(1 − x₁).

As mentioned in the introduction, projecting on the simplex with the KL divergence plays a central role in many applications such as online learning. In particular, the projection problem can be solved exactly in O(d) operations, which makes this projection efficient.
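To make Algorithm 3 concrete, here is a sketch (ours, in Python; names and the stopping rule are assumptions) of the bisection for a generic ω-potential given as a pair (φ, φ⁻¹). For simplicity it bisects until the dual interval is small rather than testing ‖x(ν̄) − x(ν̲)‖₁ directly. It is instantiated with the exponential potential with ε = 0, i.e. φ(u) = e^{u−1}, for which the exact answer is the exponential weights update, so the two can be compared.

```python
import numpy as np

def bregman_projection_bisection(x_bar, g_bar, phi, phi_inv, tol=1e-12, max_iter=200):
    """Approximate Bregman projection onto the simplex (Algorithm 3 sketch).

    Bisects on the dual variable nu so that
    sum_i (phi(phi_inv(x_bar_i) - g_bar_i + nu))_+ = 1;
    phi must be an increasing omega-potential.
    """
    z = phi_inv(x_bar) - g_bar
    x_of = lambda nu: np.maximum(phi(z + nu), 0.0)
    nu_lo = phi_inv(1.0 / x_bar.size) - z.max()   # sum of x(nu_lo) <= 1
    nu_hi = phi_inv(1.0) - z.max()                # sum of x(nu_hi) >= 1
    for _ in range(max_iter):
        nu = 0.5 * (nu_lo + nu_hi)
        if x_of(nu).sum() > 1.0:
            nu_hi = nu
        else:
            nu_lo = nu
        if nu_hi - nu_lo < tol:
            break
    return x_of(nu_lo)

# Exponential potential with epsilon = 0: phi(u) = exp(u - 1)
phi = lambda u: np.exp(u - 1.0)
phi_inv = lambda v: 1.0 + np.log(v)
x_bar = np.array([0.1, 0.2, 0.3, 0.4])
g_bar = np.array([0.5, 0.0, 1.0, 0.2])
x = bregman_projection_bisection(x_bar, g_bar, phi, phi_inv)
```

For this potential the positive part is never active, and the output should agree with x̄_i e^{−ḡ_i} / Σ_j x̄_j e^{−ḡ_j} up to the bisection tolerance.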
However, some variants of mirror descent, such as stochastic mirror descent, require the Bregman divergence to be bounded on the simplex in order to have guarantees on the convergence rate, see for example [14]. In the remainder of this section, we will show that projecting with the generalized KL divergence D_{KL,ε} enjoys many desirable properties (strong convexity with respect to the ℓ1 norm, boundedness), and the projection can still be computed efficiently.

A. A sorting algorithm to compute the exact projection

We first apply the optimality conditions of Proposition 2 to this special class, and show that the solution is entirely determined by its support.

Proposition 4: Consider the Bregman projection onto the simplex given in Problem (5), with Bregman divergence D_{KL,ε}. Let x* be the solution and I = {i : x*_i > 0} its support. Then

  ∀i ∈ I, x*_i = −ε + (x̄_i + ε) e^{−ḡ_i} / Z,  Z = Σ_{i∈I} (x̄_i + ε) e^{−ḡ_i} / (1 + |I| ε).   (8)

Proof: Applying Proposition 2 with the expressions φ(u) = e^{u−1} − ε and φ^{-1}(u) = 1 + ln(u + ε), x* is a solution if and only if there exists ν ∈ R such that

  ∀i, x*_i = (−ε + (x̄_i + ε) e^{−ḡ_i} e^{ν})_+,  and  Σ_i x*_i = 1.

Thus, if I is the support of x*, then these optimality conditions are equivalent to

  ∀i ∈ I, x*_i = −ε + (x̄_i + ε) e^{−ḡ_i} e^{ν},
  Σ_{i∈I} (−ε + (x̄_i + ε) e^{−ḡ_i} e^{ν}) = 1,

and the second equation can be rewritten as

  1 + ε|I| = e^{ν} Σ_{i∈I} (x̄_i + ε) e^{−ḡ_i},

which proves the claim, with Z = e^{−ν}.

Proposition 4 shows that solving the Bregman projection with generalized KL divergence reduces to finding the support of the solution. Next, we show that the support has a simple characterization. To this end, we associate to (x̄, ḡ) the vector ȳ defined as ȳ_i = (x̄_i + ε) e^{−ḡ_i}, and we denote by ȳ_{σ(j)} the j-th smallest element of ȳ.
Algorithm 4 Sorting method to compute the Bregman projection with D_{ψ_ε}.
1: Input: x̄, ḡ
2: Output: x*
3: Form the vector ȳ, ȳ_i = (x̄_i + ε) e^{−ḡ_i}
4: Sort ȳ; let ȳ_{σ(j)} be the j-th smallest element of ȳ.
5: Let j* be the smallest index j for which
     c(j) := (1 + ε(d − j + 1)) ȳ_{σ(j)} − ε Σ_{i=j}^d ȳ_{σ(i)} > 0
6: Set Z = Σ_{i=j*}^d ȳ_{σ(i)} / (1 + ε(d − j* + 1))
7: Set x*_i = (−ε + ȳ_i / Z)_+

Proposition 5: The function j ↦ c(j) is increasing, and the support of x* is {σ(j*), ..., σ(d)}, where j* = min{j : c(j) > 0}.

Proof: First, straightforward algebra shows that

  c(j + 1) − c(j) = (1 + ε(d − j)) (ȳ_{σ(j+1)} − ȳ_{σ(j)}) ≥ 0.

Thus c is increasing. To prove the second part of the claim, we know by Proposition 3 that the support is {σ(i*), ..., σ(d)} for some i*, and to show that i* = j* = min{j : c(j) > 0}, it suffices to show that c(i*) > 0 and c(j) ≤ 0 for all j < i*. First, by the expression (8) of x*, we have

  x*_{σ(i*)} = −ε + ȳ_{σ(i*)} (1 + ε(d − i* + 1)) / Σ_{i=i*}^d ȳ_{σ(i)} > 0,

which is equivalent to c(i*) > 0. And if j < i* (i.e. σ(j) is outside the support), then by the expression (8) again,

  0 = x*_{σ(j)} ≥ −ε + ȳ_{σ(j)} (1 + ε(d − i* + 1)) / Σ_{i=i*}^d ȳ_{σ(i)},

which is equivalent to (1 + ε(d − i* + 1)) ȳ_{σ(j)} − ε Σ_{i=i*}^d ȳ_{σ(i)} ≤ 0, but c(j) is smaller than this left-hand side, since

  c(j) − [(1 + ε(d − i* + 1)) ȳ_{σ(j)} − ε Σ_{i=i*}^d ȳ_{σ(i)}] = ε Σ_{i=j}^{i*−1} (ȳ_{σ(j)} − ȳ_{σ(i)}) ≤ 0,

which concludes the proof.

Theorem 2: Algorithm 4 solves the Bregman projection problem with exponential potential φ_ε in O(d log d) operations.

Proof: Correctness of the algorithm follows from the characterization of the support of x* in Proposition 5 and

Algorithm 5 QuickProject algorithm to compute the Bregman projection with D_{ψ_ε}.
1: Input: x̄, ḡ
2: Output: x*
3: Form the vector ȳ, ȳ_i = (x̄_i + ε) e^{−ḡ_i}
4: Initialize J = {1, ..., d}, S = 0, C = 0, s* = d + 1
5: while J ≠ ∅ do
6:   Select a random pivot index j ∈ J
7:   Partition J:
       J⁺ = {i ∈ J : ȳ_i ≥ ȳ_j},  J⁻ = {i ∈ J : ȳ_i < ȳ_j}
     and compute S⁺ = Σ_{i∈J⁺} ȳ_i, C⁺ = |J⁺|
8:   Let γ = (1 + ε(C + C⁺)) ȳ_j − ε(S + S⁺)
9:   if γ > 0 then
10:    J ← J⁻, s* ← j
11:    S ← S + S⁺, C ← C + C⁺
12:  else
13:    J ← J⁺ \ {j}
14:  end if
15: end while
16: Set Z = S / (1 + εC)
17: Set x*_i = (−ε + ȳ_i / Z)_+

the expression of x* in Proposition 4.
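A sketch (ours, in Python; names are assumptions) of Algorithm 4. With ε = 0 the support is the whole index set and the output reduces to the exponential weights update, which gives a simple sanity check.

```python
import numpy as np

def sort_projection(x_bar, g_bar, eps):
    """Exact Bregman projection with D_{KL,eps} by sorting (Algorithm 4 sketch)."""
    y = (x_bar + eps) * np.exp(-g_bar)
    d = y.size
    ys = np.sort(y)                      # ys[j-1] is the j-th smallest element
    tails = np.cumsum(ys[::-1])[::-1]    # tails[j-1] = sum of ys[j-1:], i.e. sum_{i>=j}
    # smallest j with c(j) = (1 + eps*(d-j+1)) * ys[j-1] - eps * tails[j-1] > 0
    for j in range(1, d + 1):
        if (1 + eps * (d - j + 1)) * ys[j - 1] - eps * tails[j - 1] > 0:
            break
    Z = tails[j - 1] / (1 + eps * (d - j + 1))
    return np.maximum(-eps + y / Z, 0.0)

x_bar = np.array([0.1, 0.2, 0.3, 0.4])
g_bar = np.array([2.0, 0.0, 1.0, 0.5])
x = sort_projection(x_bar, g_bar, eps=0.1)   # sums to 1, entries >= 0
```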
The complexity of the sort operation (step 4) is O(d log d), and finding j* (step 5) can be done in linear time since the criterion function c(·) satisfies c(j + 1) − c(j) = (1 + ε(d − j))(ȳ_{σ(j+1)} − ȳ_{σ(j)}), so each criterion evaluation costs O(1). Therefore, the overall complexity of Algorithm 4 is O(d log d).

B. A randomized pivot algorithm to compute the exact solution

We now propose a randomized version of Algorithm 4, which selects a random pivot at each iteration, instead of sorting the full vector. The resulting algorithm, which we call QuickProject, is an extension of the QuickSelect algorithm due to Hoare [16]. A similar idea is used in the randomized version of the ℓ2 projection on the simplex in [13].

Theorem 3: In expectation, the QuickProject algorithm terminates after O(d) operations, and outputs the solution x* of the Bregman projection problem (5) with the Bregman divergence D_{KL,ε}.

Proof: First, we prove that the algorithm has expected linear complexity. Let T(n) be the expected complexity of the while loop when |J| = n. The partition and compute step (7) takes 3n operations, then we recursively apply the loop to J⁻ or J⁺, which have sizes (m, n − m) for any m ∈ {1, ..., n}, with uniform
probability. Thus we can bound T(n) as follows:

  T(n) ≤ 3n + (1/n) Σ_{m=1}^n T(max(m, n − m)) ≤ 3n + (2/n) Σ_{m=⌈n/2⌉}^n T(m),

and we can show by induction that T(n) ≤ 12n, since T(0) = 0 and

  3n + (2/n) Σ_{m=⌈n/2⌉}^n 12m ≤ 3n + 9n = 12n.

To prove the correctness of the algorithm, we will prove that once the while loop terminates, s* = σ(j*), and S, C are respectively the sum and the cardinality of {ȳ_{σ(i)} : i ≥ j*}; then by Proposition 4, we have the correct expression of x*. We start by showing the following invariants:

  (i1) If ȳ_{σ(m_t)} is the largest element in J^(t), then σ(m_t + 1) = (s*)^(t).
  (i2) J^(t) contains σ(j*) or σ(j* − 1).
  (i3) S and C are the sum and the cardinality of the set of accepted elements, {ȳ_{σ(i)} : i ≥ m_t + 1}.
  (i4) γ^(t) = c(j), where c is the criterion function defined in Proposition 5, evaluated at the rank j of the pivot.

The invariant holds for the first iteration since J^(1) = {1, ..., d}, m_1 = d, and S^(1) = C^(1) = 0. Suppose the invariant is true at iteration t of the loop. Then two cases are possible:
1) If γ^(t) ≤ 0, then J^(t+1) ⊆ (J^(t))⁺ and m_{t+1} = m_t, and the invariant still holds.
2) If γ^(t) > 0, then J^(t+1) = (J^(t))⁻ and (s*)^(t+1) = j^(t), thus

  {i : ȳ_i ≥ ȳ_{(s*)^(t+1)}} = {i : ȳ_i ≥ ȳ_{(s*)^(t)}} ∪ (J^(t))⁺,

and by the update step (lines 10-11), the invariant still holds.

To finish the proof, suppose the while loop terminates after T iterations, i.e. J^(T+1) = ∅. We claim that (s*)^(T+1) = σ(j*). During the last update, two cases are possible:
1) If γ^(T) > 0, then ȳ_{j^(T)} is the smallest element of J^(T). In this case, since c(j) ≤ 0 for j < j*, and J^(T) contains σ(j*) or σ(j* − 1), it must be that j^(T) = σ(j*), thus (s*)^(T+1) = j^(T) = σ(j*).
2) If γ^(T) ≤ 0, then ȳ_{j^(T)} is the largest element of J^(T); in this case, since c(j*) > 0, it must be that j^(T) = σ(j* − 1), so m_T = j* − 1 and (s*)^(T+1) = (s*)^(T) = σ(m_T + 1) = σ(j*).

This concludes the proof.

C. Properties of the generalized KL divergence

Algorithms 4 and 5 give efficient methods for computing the projection with generalized KL divergence D_{KL,ε}. In this section, we show that this family of Bregman divergences enjoys additional properties, given below.
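Before turning to these properties, here is a sketch (ours, in Python; names are assumptions) of the QuickProject idea. It deviates slightly from the listing of Algorithm 5: it works on the multiset of values of ȳ and, when γ ≤ 0, discards all elements less than or equal to the pivot (which share the pivot's fate), so every iteration strictly shrinks the candidate set.

```python
import numpy as np

def quickproject(x_bar, g_bar, eps, seed=0):
    """Randomized-pivot Bregman projection with D_{KL,eps} (Algorithm 5 sketch).

    Finds the support without fully sorting y: elements accepted into
    the support accumulate into the running sum S and count C.
    """
    rng = np.random.default_rng(seed)
    y = (x_bar + eps) * np.exp(-g_bar)
    J = y                      # candidate values still to classify
    S, C = 0.0, 0              # sum / cardinality of accepted elements
    while J.size > 0:
        pivot = J[rng.integers(J.size)]
        upper = J[J >= pivot]
        S_plus, C_plus = upper.sum(), upper.size
        gamma = (1 + eps * (C + C_plus)) * pivot - eps * (S + S_plus)
        if gamma > 0:          # pivot in the support, hence so is all of `upper`
            S, C = S + S_plus, C + C_plus
            J = J[J < pivot]
        else:                  # pivot (and anything <= pivot) outside the support
            J = J[J > pivot]
    Z = S / (1 + eps * C)
    return np.maximum(-eps + y / Z, 0.0)

x_bar = np.array([0.1, 0.2, 0.3, 0.4])
g_bar = np.array([1.0, 0.0, 0.5, 0.25])
x = quickproject(x_bar, g_bar, eps=0.1)
```

With ε = 0 every pivot satisfies γ = ȳ_j > 0, so the whole vector is accepted and the output reduces to the exponential weights update, matching Algorithm 4.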
Proposition 6: For all ε > 0, D_{KL,ε} is ℓ_ε-strongly convex and L_ε-smooth w.r.t. ‖·‖₁, and bounded by D_ε on Δ, with

  ℓ_ε ≥ 1/(1 + dε),  L_ε ≤ 1/ε,  D_ε ≤ ln((1 + ε)/ε).

Proof: First, we show strong convexity. Let x, y ∈ Δ. By Taylor's theorem, there exists z on the segment (x + ε1, y + ε1) such that

  D_{KL,ε}(x, y) = H(x + ε1) − H(y + ε1) − ⟨∇H(y + ε1), x − y⟩
               = ½ ⟨x − y, ∇²H(z)(x − y)⟩ = ½ Σ_i (x_i − y_i)² / z_i,

where we used the fact that the Hessian of the negative entropy function is ∇²H(z) = diag(1/z_i). And since for all i, z_i ≥ ε (z belongs to the segment (x + ε1, y + ε1)), it follows that

  D_{KL,ε}(x, y) ≤ (1/(2ε)) Σ_i (x_i − y_i)² ≤ (1/(2ε)) ‖x − y‖₁².

Furthermore, by the Cauchy-Schwarz inequality,

  (Σ_i |x_i − y_i|)² ≤ (Σ_i (x_i − y_i)²/z_i)(Σ_i z_i),

thus, since Σ_i z_i ≤ 1 + dε,

  D_{KL,ε}(x, y) ≥ ‖x − y‖₁² / (2 Σ_i z_i) ≥ ‖x − y‖₁² / (2(1 + dε)).

To compute the upper bound on D_{KL,ε}, we observe that D_{KL,ε}(x, y) is jointly-convex in (x, y) (by joint-convexity of the KL divergence); therefore its maximum on Δ_d × Δ_d is attained on a vertex of the feasible set, that is, for (x, y) = (δ_{i_0}, δ_{j_0}) for some (i_0, j_0), where δ_{i_0} is the Dirac distribution on i_0. Finally, a simple calculation shows that

  D_{KL,ε}(δ_{i_0}, δ_{j_0}) = 0 if i_0 = j_0, and ln((1 + ε)/ε) otherwise.

Fig. 3. Illustration of Proposition 6, when d = 2. The distributions x and y are parameterized as x = (p, 1 − p) and y = (q, 1 − q). The surface plot (left) shows the generalized KL divergence for ε = .1, with, in dashed lines, the quadratic upper and lower bounds (ℓ_ε/2)‖x − y‖₁² and (L_ε/2)‖x − y‖₁². The second plot (right) compares D_{KL,.1}(x, y_0) and D_KL(x, y_0) for a fixed y_0 = (.35, .65).
D. Numerical experiments

We provide a simple Python implementation of the projection algorithms at github.com/walidk/BregmanProjection. The implementation of Algorithm 3 is generic and can be instantiated for any ω-potential by providing the function φ and its inverse. The implementations of Algorithm 4 and QuickProject are specific to the generalized exponential potential. Finally, we report in Figure 4 the run times of both algorithms as the dimension d grows, averaged over 50 runs, for randomly generated, normally distributed vectors x̄ and ḡ. The numerical simulations are also available on the same repository.

Fig. 4. Execution time as a function of the dimension d, with ε = .1, in log-log scale (left). The highlighted region is zoomed-in in linear scale on the right. The simulation confirms that the QuickProject algorithm is, on average, faster than the sorting algorithm, especially for large d.

V. CONCLUSION

We studied the Bregman projection problem on the simplex with ω-potentials, and derived optimality conditions for the solution, which motivated a simple bisection algorithm to compute ε-approximate solutions in O(d log(1/ε)) time. Then we focused on the projection problem with exponential potentials, resulting in a Bregman divergence which generalizes the KL divergence. We showed that in this case, the solution can be computed exactly in O(d log d) time using a sorting algorithm, or in expected O(d) time using a randomized pivot algorithm. This class of divergences is of particular interest because it has a quadratic upper and lower bound (i.e. its distance generating function is both strongly convex and smooth), a property which is essential to obtain convergence guarantees in some settings, such as stochastic mirror descent. A question which remains open is whether one can project in O(d) time using a deterministic algorithm akin to the median of medians algorithm due to Blum et al. [7], which solves the selection problem in deterministic linear time.
The fact that one can efficiently compute the exact solution hinges on the existence of a closed-form solution of the dual variable ν given the support of the solution (Proposition 4). This is also the case for the Euclidean projection, i.e. when D_ψ is the squared Euclidean norm, see [13]. This suggests that one may derive efficient projection algorithms for other classes of Bregman divergences, which would, in turn, lead to new efficient instances of the mirror descent method.

REFERENCES

[1] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1):121-164, 2012.
[2] Jean-Yves Audibert, Sébastien Bubeck, and Gábor Lugosi. Regret in online combinatorial optimization. Mathematics of Operations Research, 39(1):31-45, 2014.
[3] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett., 31(3):167-175, May 2003.
[4] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization. Society for Industrial and Applied Mathematics, 2001.
[5] Aharon Ben-Tal, Tamar Margalit, and Arkadi Nemirovski. The ordered subsets mirror descent optimization method with applications to tomography. SIAM J. on Optimization, 12(1):79-108, January 2001.
[6] David Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6(1):1-8, 1956.
[7] Manuel Blum, Robert W. Floyd, Vaughan Pratt, Ronald L. Rivest, and Robert E. Tarjan. Time bounds for selection. J. Comput. Syst. Sci., 7(4):448-461, August 1973.
[8] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[9] Sébastien Bubeck and Nicolò Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1-122, 2012.
[10] Yair Censor and Stavros Zenios. Parallel Optimization: Theory, Algorithms and Applications. Oxford University Press, 1997.
[11] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[12] Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, and Lin Xiao. Optimal distributed online prediction. In Proceedings of the 28th International Conference on Machine Learning (ICML), June 2011.
[13] John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, pages 272-279, New York, NY, USA, 2008. ACM.
[14] John C. Duchi, Alekh Agarwal, Mikael Johansson, and Michael Jordan. Ergodic mirror descent. SIAM Journal on Optimization (SIOPT), 22(4):1549-1578, 2012.
[15] James Hannan. Approximation to Bayes risk in repeated plays. Contributions to the Theory of Games, 3:97-139, 1957.
[16] C. A. R. Hoare. Algorithm 65: Find. Commun. ACM, 4(7):321-322, July 1961.
[17] Anatoli Juditsky, Arkadi Nemirovski, and Claire Tauvel. Solving variational inequalities with stochastic mirror-prox algorithm. Stoch. Syst., 1(1):17-58, 2011.
[18] Jyrki Kivinen and Manfred K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1-63, 1997.
[19] Syrine Krichene, Walid Krichene, Roy Dong, and Alexandre Bayen. Convergence of heterogeneous distributed learning in the stochastic routing game. In Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing, 2015.
[20] Walid Krichene, Syrine Krichene, and Alexandre Bayen. Convergence of mirror descent dynamics in the routing game. In European Control Conference (ECC), 2015.
[21] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley, 1983.