Inexact Proximal Gradient Methods for Non-Convex and Non-Smooth Optimization


The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

Bin Gu, De Wang, Zhouyuan Huo, Heng Huang*
Department of Electrical & Computer Engineering, University of Pittsburgh, USA
Department of Computer Science and Engineering, University of Texas at Arlington, USA
big0@pitt.edu, wangdelp@gmail.com, zhouyuan.huo@pitt.edu, heng.huang@pitt.edu

* To whom all correspondence should be addressed. Copyright 2018, Association for the Advancement of Artificial Intelligence. All rights reserved.

Abstract

In machine learning research, proximal gradient methods are popular for solving various optimization problems with non-smooth regularization. Inexact proximal gradient methods are extremely important when exactly solving the proximal operator is time-consuming, or when the proximal operator does not have an analytic solution. However, existing inexact proximal gradient methods only consider convex problems, and knowledge of inexact proximal gradient methods in the non-convex setting is very limited. To address this challenge, we first propose three inexact proximal gradient algorithms, including the basic and Nesterov's accelerated versions. We then provide a theoretical analysis of the basic and accelerated versions. The theoretical results show that our inexact proximal gradient algorithms can achieve the same convergence rates as the exact proximal gradient algorithms in the non-convex setting. Finally, we show the applications of our inexact proximal gradient algorithms on three representative non-convex learning problems. Empirical results confirm the superiority of our new inexact proximal gradient algorithms.

Introduction

Many machine learning problems involve non-smooth regularization, such as machine learning models with a variety of sparsity-inducing penalties (Bach et al. 2012). Thus, efficiently solving optimization problems with non-smooth regularization is important for many machine learning applications. In this paper, we focus on the optimization problem of a machine learning model with non-smooth regularization:

$$\min_{x \in \mathbb{R}^N} f(x) = g(x) + h(x) \qquad (1)$$

where $g: \mathbb{R}^N \rightarrow \mathbb{R}$, corresponding to the empirical risk, is smooth and possibly non-convex, and $h: \mathbb{R}^N \rightarrow \mathbb{R}$, corresponding to the regularization term, is non-smooth and possibly non-convex.

Proximal gradient methods are popular for solving various optimization problems with non-smooth regularization. The pivotal step of a proximal gradient method is to solve the proximal operator

$$\mathrm{Prox}_{\gamma h}(y) = \arg\min_{x \in \mathbb{R}^N} \frac{1}{2\gamma}\|x - y\|^2 + h(x) \qquad (2)$$

where $\gamma$ is the stepsize. If the function $h(x)$ is simple enough, the solution of the proximal operator can be obtained analytically. For example, if $h(x) = \|x\|_1$, the solution of the proximal operator is given by a shrinkage-thresholding operator (Beck and Teboulle 2009). If the function $h(x)$ is complex, so that the corresponding proximal operator does not have an analytic solution, a specific algorithm has to be designed for solving the proximal operator. For example, for $h(x) = \|x\|_1 + c\sum_{i<j}\max\{|x_i|, |x_j|\}$ as used in OSCAR (Zhong and Kwok 2012) for sparse regression with automatic feature grouping, Zhong and Kwok (2012) proposed an iterative group merging algorithm that solves the proximal operator exactly. However, solving the proximal operator becomes expensive when the function $h(x)$ is complex. Taking OSCAR as an example again, when the data are high-dimensional (empirically, more than 1,000 features), the iterative group merging algorithm becomes very inefficient.
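To make the contrast concrete, the snippet below is a minimal sketch (ours, not from the paper), written with NumPy: the $\ell_1$ case has a one-line soft-thresholding solution, while a complex regularizer only exposes the subproblem objective of (2) that an inner solver must minimize, exactly or up to some tolerance.

```python
import numpy as np

def prox_l1(y, gamma, lam=1.0):
    """Closed-form proximal operator of h(x) = lam * ||x||_1:
    argmin_x 1/(2*gamma) * ||x - y||^2 + lam * ||x||_1  (soft thresholding)."""
    return np.sign(y) * np.maximum(np.abs(y) - gamma * lam, 0.0)

def prox_objective(x, y, gamma, h):
    """Objective of the proximal subproblem (2) for a general regularizer h;
    when h has no analytic prox, this is what an inner solver must minimize,
    either exactly or only up to some error."""
    return 0.5 / gamma * np.sum((x - y) ** 2) + h(x)

y = np.array([0.3, -1.2, 0.05])
x_hat = prox_l1(y, gamma=0.1)                 # exact prox for the simple l1 case
print(x_hat, prox_objective(x_hat, y, 0.1, h=lambda x: np.abs(x).sum()))
```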
Even worse, there may be no solver at all for exactly computing the proximal operator when the function $h(x)$ is overly complex. For example, Grave, Obozinski, and Bach (2011) proposed the trace Lasso norm, which takes the correlation of the design matrix into account to stabilize the estimation in regression. However, due to the complexity of the trace Lasso norm, to the best of our knowledge there is still no solver for the corresponding proximal operator.

To address the above issues, Schmidt, Le Roux, and Bach (2011) first proposed inexact proximal gradient methods, including a basic version (IPG) and Nesterov's accelerated version (IAPG), which solve the proximal operator approximately (i.e., tolerating an error in the calculation of the proximal operator). They proved that the inexact proximal gradient methods can achieve the same convergence rates as the exact proximal gradient methods, provided that the errors in computing the proximal operator decrease at appropriate rates. Independently, Villa et al. (2013) proposed an IAPG algorithm and proved the corresponding convergence rate. (Throughout, the stepsize $\gamma$ is either set manually or determined automatically by a backtracking line-search procedure (Beck and Teboulle 2009).)

Table 1: Representative exact (and inexact) proximal gradient algorithms. (C and NC are the abbreviations of convex and non-convex, respectively.)

Algorithm      Proximal   Accelerated   g(x)    h(x)    Reference
PG + APG       Exact      Yes           C       C       Beck and Teboulle (2009)
APG            Exact      Yes           C+NC    C       Ghadimi and Lan (2016)
PG             Exact      No            NC      NC      Boţ, Csetnek, and László (2016)
APG            Exact      Yes           C+NC    C+NC    Li and Lin (2015)
IPG + IAPG     Inexact    Yes           C       C       Schmidt, Le Roux, and Bach (2011)
AIFB (IAPG)    Inexact    Yes           C       C       Villa et al. (2013)
IPG + AIPG     Inexact    Yes           C+NC    C+NC    Ours

In the paper of Villa et al. (2013), they called IAPG the inexact forward-backward splitting method (AIFB), which is well known in the field of signal processing. We summarize these works in Table 1.

From Table 1, we find that the existing inexact proximal gradient methods only consider convex problems. However, many optimization problems in machine learning are non-convex. The non-convexity originates either from the empirical risk function $g(x)$ or from the regularization function $h(x)$. First, consider the empirical risk function $g(x)$ (i.e., the loss function). The correntropy-induced loss (Feng et al. 2015), widely used for robust regression and classification, is non-convex; the symmetric sigmoid loss on the unlabeled samples used in semi-supervised SVMs (Chapelle, Chi, and Zien 2006) is also non-convex. Second, consider the regularization function $h(x)$. The capped-$\ell_1$ penalty (Zhang 2010) used for unbiased variable selection and the low-rank constraint (Jain, Meka, and Dhillon 2010) widely used for matrix completion are both non-convex. However, our knowledge of inexact proximal gradient methods is very limited in the non-convex setting.

To address this challenge, in this paper we first propose three inexact proximal gradient algorithms, including the basic and Nesterov's accelerated versions, which can handle non-convex problems. Then we give the theoretical analysis of the basic and Nesterov's accelerated versions. The theoretical results show that our inexact proximal gradient algorithms can have the same convergence rates as the exact proximal gradient algorithms. Finally, we provide the applications of our inexact proximal gradient algorithms on three representative non-convex learning problems. The applications on robust OSCAR and link prediction show that our inexact proximal gradient algorithms can be significantly faster than the exact proximal gradient algorithms. The application on robust trace Lasso fills the vacancy that there is no proximal gradient algorithm for trace Lasso.

Contributions. The main contributions of this paper are summarized as follows:
1. We propose the basic and accelerated inexact proximal gradient algorithms with rigorous convergence guarantees. Specifically, our inexact proximal gradient algorithms can have the same convergence rates as the exact proximal gradient algorithms in the non-convex setting.
2. We provide the applications of our inexact proximal gradient algorithms on three representative non-convex learning problems, i.e., robust OSCAR, link prediction and robust trace Lasso. The results confirm the superiority of our inexact proximal gradient algorithms.

Related Works

Proximal gradient methods are among the most important methods for solving various optimization problems with non-smooth regularization, and a variety of exact proximal gradient methods exist. Specifically, for convex problems, Beck and Teboulle (2009) proposed the basic proximal gradient (PG) method and Nesterov's accelerated proximal gradient (APG) method.
They proved that PG has the convergence rate $O(\frac{1}{T})$ and APG has the convergence rate $O(\frac{1}{T^2})$, where $T$ is the number of iterations. For non-convex problems, Ghadimi and Lan (2016) considered that only $g(x)$ could be non-convex, and proved that the convergence rate of the APG method is $O(\frac{1}{T})$. Boţ, Csetnek, and László (2016) considered that both $g(x)$ and $h(x)$ could be non-convex, and proved the convergence of the PG method. Li and Lin (2015) considered that both $g(x)$ and $h(x)$ could be non-convex, and proved that the APG algorithm converges in a finite number of iterations, at a linear rate, or at a sublinear rate, under different conditions. We summarize these exact proximal gradient methods in Table 1. In addition to the above batch exact proximal gradient methods, there are also stochastic and online proximal gradient methods (Duchi and Singer 2009; Xiao and Zhang 2014). Because they are beyond the scope of this paper, we do not review them here.

Preliminaries

In this section, we introduce Lipschitz smoothness, the $\varepsilon$-approximate subdifferential and the $\varepsilon$-approximate Kurdyka-Łojasiewicz (KL) property, which are critical to the convergence analysis of our inexact proximal gradient methods in the non-convex setting.

Lipschitz smoothness: For the smooth function $g(x)$, we have the Lipschitz constant $L$ of $\nabla g(x)$ (Wood and Zhang 1996) as follows.

Assumption 1. $L$ is the Lipschitz constant of $\nabla g(x)$. Thus, for all $x$ and $y$, $L$-Lipschitz smoothness can be presented as
$$\|\nabla g(x) - \nabla g(y)\| \le L\|x - y\| \qquad (3)$$

Equivalently, $L$-Lipschitz smoothness can also be written as formulation (4):
$$g(x) \le g(y) + \langle \nabla g(y), x - y\rangle + \frac{L}{2}\|x - y\|^2 \qquad (4)$$

$\varepsilon$-approximate subdifferential: Because inexact proximal gradient steps are used in this paper, an $\varepsilon$-approximate proximal operator may produce an $\varepsilon$-approximate subdifferential. In the following, we give the definition of the $\varepsilon$-approximate subdifferential (Bertsekas et al. 2003), which will be used in the $\varepsilon$-approximate KL property (i.e., Definition 2).

Definition 1 ($\varepsilon$-approximate subdifferential). Given a convex function $h(x): \mathbb{R}^N \rightarrow \mathbb{R}$ and a positive scalar $\varepsilon$, the $\varepsilon$-approximate subdifferential of $h(x)$ at a point $x \in \mathbb{R}^N$, denoted $\partial_\varepsilon h(x)$, is
$$\partial_\varepsilon h(x) = \left\{ d \in \mathbb{R}^N : h(y) \ge h(x) + \langle d, y - x\rangle - \varepsilon,\ \forall y \right\} \qquad (5)$$

Based on Definition 1, if $d \in \partial_\varepsilon h(x)$, we say that $d$ is an $\varepsilon$-approximate subgradient of $h(x)$ at the point $x$.

$\varepsilon$-approximate KL property: Originally, the KL property was introduced to analyze the convergence rate of exact proximal gradient methods in the non-convex setting (Li and Lin 2015; Boţ, Csetnek, and László 2016). Because this paper focuses on inexact proximal gradient methods, we correspondingly propose the $\varepsilon$-approximate KL property in Definition 2, where the function $\mathrm{dist}(x, S)$ is defined by $\mathrm{dist}(x, S) = \min_{y \in S}\|x - y\|$, and $S$ is a subset of $\mathbb{R}^N$.

Definition 2 ($\varepsilon$-KL property). A function $f(x) = g(x) + h(x): \mathbb{R}^N \rightarrow (-\infty, +\infty]$ is said to have the $\varepsilon$-KL property at $\bar{u} \in \{u \in \mathbb{R}^N : \nabla g(u) + \partial_\varepsilon h(u) \neq \emptyset\}$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $\bar{u}$ and a function $\varphi \in \Phi_\eta$, such that for all $u \in U \cap \{u \in \mathbb{R}^N : f(\bar{u}) < f(u) < f(\bar{u}) + \eta\}$, the following inequality holds:
$$\varphi'\big(f(u) - f(\bar{u})\big)\,\mathrm{dist}\big(0, \nabla g(u) + \partial_\varepsilon h(u)\big) \ge 1 \qquad (6)$$
where $\Phi_\eta$ stands for the class of functions $\varphi: [0, \eta) \rightarrow \mathbb{R}_+$ satisfying: (1) $\varphi$ is concave and continuously differentiable on $(0, \eta)$; (2) $\varphi$ is continuous at 0 with $\varphi(0) = 0$; (3) $\varphi'(x) > 0$ for all $x \in (0, \eta)$.

Inexact Proximal Gradient Algorithms

In this section, we first propose the basic inexact proximal gradient algorithm for non-convex optimization, and then propose two accelerated inexact proximal gradient algorithms.

Basic Version

As shown in Table 1, for convex problems (i.e., both the functions $g(x)$ and $h(x)$ are convex), Schmidt, Le Roux, and Bach (2011) proposed a basic inexact proximal gradient (IPG) method. We follow the framework of IPG in Schmidt, Le Roux, and Bach (2011), and give our IPG algorithm for non-convex optimization problems (i.e., either the function $g(x)$ or the function $h(x)$ is non-convex). Specifically, our IPG algorithm is presented in Algorithm 1. Similar to the exact proximal gradient algorithm, the pivotal step of our IPG (i.e., Algorithm 1) is to compute an inexact proximal operator $x \in \mathrm{Prox}^{\varepsilon}_{\gamma h}(y)$ defined as
$$\mathrm{Prox}^{\varepsilon}_{\gamma h}(y) = \left\{ z \in \mathbb{R}^N : \frac{1}{2\gamma}\|z - y\|^2 + h(z) \le \varepsilon + \min_{x \in \mathbb{R}^N}\frac{1}{2\gamma}\|x - y\|^2 + h(x) \right\}$$
where $\varepsilon$ denotes an error in the calculation of the proximal operator. As discussed in Tappenden, Richtárik, and Gondzio (2016), there are several methods to compute the inexact proximal operator. The most popular one is to use a primal-dual algorithm to control the duality gap (Bach et al. 2012); based on the duality gap, we can strictly control the error in the calculation of the proximal operator.

Algorithm 1: Basic inexact proximal gradient method (IPG)
Input: number of iterations $T$, errors $\varepsilon_k$ ($k = 1, 2, \ldots$), stepsize $\gamma < \frac{1}{L}$.
Output: $x_T$.
1: Initialize $x_0 \in \mathbb{R}^N$.
2: for $k = 1, 2, \ldots, T$ do
3:   Compute $x_k \in \mathrm{Prox}^{\varepsilon_k}_{\gamma h}\big(x_{k-1} - \gamma\nabla g(x_{k-1})\big)$.
4: end for
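As a minimal illustration of Algorithm 1 (our own sketch, not the authors' Matlab code), the loop below only assumes a gradient oracle for $g$ and a routine `inexact_prox(v, gamma, eps)` that returns an $\varepsilon$-approximate solution of the proximal subproblem; the error schedule $\varepsilon_k = 1/k^2$ is just one illustrative decreasing, summable choice.

```python
import numpy as np

def ipg(x0, grad_g, inexact_prox, gamma, num_iters=100):
    """Basic inexact proximal gradient loop (sketch of Algorithm 1).
    inexact_prox(v, gamma, eps) should return an eps-approximate minimizer of
    1/(2*gamma) * ||x - v||^2 + h(x)."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, num_iters + 1):
        eps_k = 1.0 / k ** 2                       # decreasing, summable error schedule
        x = inexact_prox(x - gamma * grad_g(x), gamma, eps_k)
    return x

# toy usage: g(x) = 0.5 * ||x - b||^2 and h(x) = ||x||_1, where the exact
# soft-thresholding prox plays the role of the "inexact" oracle (eps ignored)
b = np.array([1.0, -2.0, 0.3])
sol = ipg(np.zeros(3), grad_g=lambda x: x - b,
          inexact_prox=lambda v, g, e: np.sign(v) * np.maximum(np.abs(v) - g, 0.0),
          gamma=0.5)
print(sol)
```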
Accelerated Versions

We first propose a Nesterov's accelerated inexact proximal gradient algorithm for non-convex optimization, and then give a nonmonotone accelerated inexact proximal gradient algorithm. As shown in Table 1, for convex optimization problems, Beck and Teboulle (2009) proposed the Nesterov's accelerated (exact) proximal gradient (APG) method, and Schmidt, Le Roux, and Bach (2011) proposed a Nesterov's accelerated inexact proximal gradient (IAPG) method. Both APG and IAPG are accelerated by a momentum term. However, as mentioned in Li and Lin (2015), the traditional Nesterov's acceleration may encounter a bad momentum term in non-convex optimization. To address the bad momentum term, Li and Lin (2015) added another proximal operator as a monitor to make the objective function sufficiently descend. To make the objective values generated by our AIPG strictly descend, we follow the framework of APG in Li and Lin (2015). Thus, we compute two inexact proximal operators, $z_{k+1} \in \mathrm{Prox}^{\varepsilon_k}_{\gamma h}(y_k - \gamma\nabla g(y_k))$ and $v_{k+1} \in \mathrm{Prox}^{\varepsilon_k}_{\gamma h}(x_k - \gamma\nabla g(x_k))$, where $v_{k+1}$ acts as a monitor that makes the objective function strictly descend. Specifically, our AIPG is presented in Algorithm 2.

Algorithm 2: Accelerated inexact proximal gradient method (AIPG)
Input: number of iterations $T$, errors $\varepsilon_k$ ($k = 1, 2, \ldots$), $t_0 = 0$, $t_1 = 1$, stepsize $\gamma < \frac{1}{L}$.
Output: $x_{T+1}$.
1: Initialize $x_0 \in \mathbb{R}^N$, and $x_1 = z_1 = x_0$.
2: for $k = 1, 2, \ldots, T$ do
3:   $y_k = x_k + \frac{t_{k-1}}{t_k}(z_k - x_k) + \frac{t_{k-1} - 1}{t_k}(x_k - x_{k-1})$.
4:   Compute $z_{k+1}$ such that $z_{k+1} \in \mathrm{Prox}^{\varepsilon_k}_{\gamma h}\big(y_k - \gamma\nabla g(y_k)\big)$.
5:   Compute $v_{k+1}$ such that $v_{k+1} \in \mathrm{Prox}^{\varepsilon_k}_{\gamma h}\big(x_k - \gamma\nabla g(x_k)\big)$.
6:   $t_{k+1} = \frac{\sqrt{4t_k^2 + 1} + 1}{2}$.
7:   $x_{k+1} = z_{k+1}$ if $f(z_{k+1}) \le f(v_{k+1})$, and $x_{k+1} = v_{k+1}$ otherwise.
8: end for

To address the bad momentum term in the non-convex setting, our AIPG (i.e., Algorithm 2) uses a pair of inexact proximal operators to make the objective values strictly descend. Thus, AIPG is a monotone algorithm.
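The momentum and monitor steps of Algorithm 2 can be sketched as follows (again our own illustration rather than the authors' implementation; `f`, `grad_g` and `inexact_prox` are assumed to be supplied by the caller, and the error schedule is a placeholder).

```python
import numpy as np

def aipg(x0, f, grad_g, inexact_prox, gamma, num_iters=100):
    """Accelerated inexact proximal gradient (sketch of Algorithm 2).
    Maintains a momentum extrapolation y_k and a monitor step v_{k+1} so that
    the objective values f(x_k) are non-increasing."""
    x_prev = x = z = np.asarray(x0, dtype=float)   # x_1 = z_1 = x_0
    t_prev, t = 0.0, 1.0                           # t_0 = 0, t_1 = 1
    for k in range(1, num_iters + 1):
        eps_k = 1.0 / k ** 2                       # illustrative error schedule
        y = x + (t_prev / t) * (z - x) + ((t_prev - 1.0) / t) * (x - x_prev)
        z = inexact_prox(y - gamma * grad_g(y), gamma, eps_k)   # accelerated step
        v = inexact_prox(x - gamma * grad_g(x), gamma, eps_k)   # monitor step
        x_prev, x = x, (z if f(z) <= f(v) else v)               # keep the better point
        t_prev, t = t, (np.sqrt(4.0 * t ** 2 + 1.0) + 1.0) / 2.0
    return x
```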

Actually, using two proximal operators is a conservative strategy. As mentioned in Li and Lin (2015), we can accept $z_{k+1}$ as $x_{k+1}$ directly if it satisfies the criterion $f(z_{k+1}) \le f(x_k) - \delta\|z_{k+1} - y_k\|^2$, which shows that $y_k$ is a good extrapolation. $v_{k+1}$ is computed only when this criterion is not met. Thus, the average number of proximal operator evaluations can be reduced. Following this idea, we propose our nonmonotone accelerated inexact proximal gradient algorithm (nmAIPG) in Algorithm 3. Empirically, we find that nmAIPG works well with $\delta \in [0.5, 1]$; in our experiments, we set $\delta = 0.6$.

Algorithm 3: Nonmonotone accelerated inexact proximal gradient method (nmAIPG)
Input: number of iterations $T$, errors $\varepsilon_k$ ($k = 1, 2, \ldots$), $t_0 = 0$, $t_1 = 1$, stepsize $\gamma < \frac{1}{L}$, $\delta > 0$.
Output: $x_{T+1}$.
1: Initialize $x_0 \in \mathbb{R}^N$, and $x_1 = z_1 = x_0$.
2: for $k = 1, 2, \ldots, T$ do
3:   $y_k = x_k + \frac{t_{k-1}}{t_k}(z_k - x_k) + \frac{t_{k-1} - 1}{t_k}(x_k - x_{k-1})$.
4:   Compute $z_{k+1}$ such that $z_{k+1} \in \mathrm{Prox}^{\varepsilon_k}_{\gamma h}\big(y_k - \gamma\nabla g(y_k)\big)$.
5:   if $f(z_{k+1}) \le f(x_k) - \delta\|z_{k+1} - y_k\|^2$ then
6:     $x_{k+1} = z_{k+1}$
7:   else
8:     Compute $v_{k+1}$ such that $v_{k+1} \in \mathrm{Prox}^{\varepsilon_k}_{\gamma h}\big(x_k - \gamma\nabla g(x_k)\big)$.
9:     $x_{k+1} = z_{k+1}$ if $f(z_{k+1}) \le f(v_{k+1})$, and $x_{k+1} = v_{k+1}$ otherwise.
10:  end if
11:  $t_{k+1} = \frac{\sqrt{4t_k^2 + 1} + 1}{2}$.
12: end for

Convergence Analysis

As mentioned before, the convergence analysis of inexact proximal gradient methods for non-convex problems is still an open problem; this section addresses the challenge. Specifically, we first prove that IPG and AIPG converge to a critical point in the convex or non-convex setting (Theorem 1) if $\{\varepsilon_k\}$ is a decreasing sequence and $\sum_{k=1}^{\infty}\varepsilon_k < \infty$. Next, we prove that IPG has the convergence rate $O(\frac{1}{T})$ for non-convex problems (Theorem 2) when the errors decrease at an appropriate rate. Then, we prove the convergence rates of AIPG in the non-convex setting (Theorem 3). The detailed proofs of Theorems 1, 2 and 3 can be found in the Appendix.

Convergence of IPG and AIPG

We first prove that IPG and AIPG converge to a critical point (Theorem 1) if $\{\varepsilon_k\}$ is a decreasing sequence and $\sum_{k=1}^{\infty}\varepsilon_k < \infty$.

Theorem 1. Under Assumption 1, if $\{\varepsilon_k\}$ is a decreasing sequence and $\sum_{k=1}^{\infty}\varepsilon_k < \infty$, then $0 \in \lim_{k\rightarrow\infty}\big(\nabla g(x_k) + \partial_{\varepsilon_k} h(x_k)\big)$ for IPG and AIPG, in both convex and non-convex optimization.

Remark 1. Theorem 1 guarantees that IPG and AIPG converge to a critical point (also called a stationary point) after an infinite number of iterations in the convex or non-convex setting.

Convergence Rates of IPG

Because both the functions $g(x)$ and $h(x)$ are possibly non-convex, we cannot directly use $f(x_k) - f(x^*)$ or $\|x_k - x^*\|^2$ to analyze the convergence rate of IPG, where $x^*$ is an optimal solution of (1). In this paper, we use $\sum_{k=1}^{T}\|x_k - x_{k-1}\|^2$ to analyze the convergence rate of IPG in the non-convex setting (a bounded sum implies $\min_{1\le k\le T}\|x_k - x_{k-1}\|^2 = O(\frac{1}{T})$); the detailed reason is provided in the Appendix. Theorem 2 shows that IPG has the convergence rate $O(\frac{1}{T})$ for non-convex optimization when the errors decrease at an appropriate rate, which is exactly the same as the error-free case (see the discussion in Remark 2).

Theorem 2. Suppose $g(x)$ is non-convex, and $h(x)$ is convex or non-convex. Then IPG satisfies the following results:

1. If $h(x)$ is convex, we have
$$\sum_{k=1}^{T}\|x_k - x_{k-1}\|^2 \le \left(\frac{A + \sqrt{A^2 + 2\left(\frac{1}{\gamma} - L\right)\big(f(x_0) - f(x^*) + B\big)}}{\frac{1}{\gamma} - L}\right)^{2} \qquad (7)$$
where $A = \sum_{k=1}^{T}\sqrt{\frac{2\varepsilon_k}{\gamma}}$ and $B = \sum_{k=1}^{T}\varepsilon_k$.

2. If $h(x)$ is non-convex, we have
$$\sum_{k=1}^{T}\|x_k - x_{k-1}\|^2 \le \frac{2}{\frac{1}{\gamma} - L}\left(f(x_0) - f(x^*) + \sum_{k=1}^{T}\varepsilon_k\right) \qquad (8)$$

Remark 2. Theorem 2 implies that IPG has the convergence rate $O(\frac{1}{T})$ for non-convex optimization without errors. If $\{\sqrt{\varepsilon_k}\}$ is summable and $h(x)$ is convex, IPG also has the convergence rate $O(\frac{1}{T})$ for non-convex optimization. If $\{\varepsilon_k\}$ is summable and $h(x)$ is non-convex, IPG also has the convergence rate $O(\frac{1}{T})$ for non-convex optimization.

Convergence Rates of AIPG

In this section, based on the $\varepsilon$-KL property, we prove that AIPG converges in a finite number of iterations, at a linear rate, or at a sublinear rate under different conditions in the non-convex setting (Theorem 3), which is exactly the same as the error-free case (Li and Lin 2015).

Theorem 3. Assume that $g$ is a non-convex function with Lipschitz continuous gradient, and $h$ is a proper and lower semicontinuous function. If the function $f$ satisfies the $\varepsilon$-KL property, $\varepsilon_k = \alpha\|v_{k+1} - x_k\|^2$ with $\alpha \ge 0$ and $\frac{1}{\gamma} - L - 2\alpha > 0$, and the desingularising function has the form $\varphi(t) = \frac{C}{\theta}t^{\theta}$ for some $C > 0$ and $\theta \in (0, 1]$, then, with $f^* = \lim_{k\rightarrow\infty} f(x_k)$:

1. If $\theta = 1$, there exists $k_1$ such that $f(x_k) = f^*$ for all $k > k_1$, and AIPG terminates in a finite number of steps.

2. If $\theta \in [\frac{1}{2}, 1)$, there exists $k_2$ such that for all $k > k_2$,
$$f(x_k) - f^* \le \left(\frac{d_1 C^2}{1 + d_1 C^2}\right)^{k - k_2}\big(f(v_{k_2}) - f^*\big)$$
where $d_1 = \left(\frac{1}{\gamma} + L + \frac{2\alpha}{\gamma}\right)^2 \Big/ \left(\frac{1}{\gamma} - L - 2\alpha\right)$.

3. If $\theta \in (0, \frac{1}{2})$, there exists $k_3$ such that for all $k > k_3$,
$$f(x_k) - f^* \le \left(\frac{C}{(k - k_3)\,d_2\,(1 - 2\theta)}\right)^{\frac{1}{1 - 2\theta}}$$
where $d_2 = \min\left\{\frac{1}{2d_1 C},\ \frac{C}{1 - 2\theta}\Big(2^{\frac{2\theta - 1}{2\theta - 2}} - 1\Big)\big(f(v_0) - f^*\big)^{2\theta - 1}\right\}$.

Remark 3. Theorem 3 implies that AIPG converges in a finite number of iterations when $\theta = 1$, at a linear rate when $\theta \in [\frac{1}{2}, 1)$, and at least at a sublinear rate when $\theta \in (0, \frac{1}{2})$.

Experiments

We first give the experimental setup, and then present the implementations of the three non-convex applications with the corresponding experimental results.

Experimental Setup

Design of Experiments. To validate the effectiveness of our inexact proximal gradient methods (i.e., IPG, AIPG and nmAIPG), we apply them to three representative non-convex learning problems as follows.

1. Robust OSCAR: Robust OSCAR is a robust version of the OSCAR method (Zhong and Kwok 2012), which is a feature selection model with the capability to automatically detect feature group structure. The function $g(x)$ is non-convex, and the function $h(x)$ is convex. Zhong and Kwok (2012) proposed a fast iterative group merging algorithm for exactly solving the proximal operator.

2. Social Link Prediction: Given an incomplete matrix $M$ (user-by-user) with each entry $M_{ij} \in \{+1, -1\}$, social link prediction is to recover the matrix $M$ (i.e., predict the potential social links or friendships between users) under a low-rank constraint. The function $g(x)$ is convex, and the function $h(x)$ is non-convex. The low-rank proximal operator can be solved exactly by the Lanczos method (Larsen 1998).

3. Robust Trace Lasso: Robust trace Lasso is a robust version of trace Lasso (Grave, Obozinski, and Bach 2011). The function $g(x)$ is non-convex, and the function $h(x)$ is convex. To the best of our knowledge, there is still no proximal gradient algorithm for solving trace Lasso.

We also summarize these three non-convex learning problems in Table 2. To show the advantages of our inexact proximal gradient methods, we compare the convergence and the running time of our inexact proximal gradient methods against the exact proximal gradient methods (i.e., PG, APG and nmAPG).

Table 2: Three typical learning applications. (C, NC and PO are the abbreviations of convex, non-convex and proximal operator, respectively.)

Application          g(x)   h(x)   Exact PO
Robust OSCAR         NC     C      Yes
Link Prediction      C      NC     Yes
Robust Trace Lasso   NC     C      No

Datasets. Table 3 summarizes the six datasets used in our experiments. Specifically, the Cardiac and Coil20 datasets are used for the robust OSCAR application.
The Cardiac dataset was collected from a hospital; the task is to predict the area of the left ventricle, and the model is encouraged to find homogeneous groups of features. The Soc-sign-epinions and Soc-Epinions1 datasets are social network data for the social link prediction application, where each row corresponds to a node and each column corresponds to an edge. The GasSensorArrayDrift and YearPredictionMSD datasets from the UCI repository are used for the robust trace Lasso application.

Implementations and Results

Robust OSCAR. For robust regression, we replace the square loss originally used in OSCAR with the correntropy-induced loss (He, Zheng, and Hu 2011). Thus, we consider robust OSCAR with the functions $g(x)$ and $h(x)$ as follows:
$$g(x) = \sigma^2\sum_{i=1}^{l}\left(1 - e^{-\frac{(y_i - X_i^{T}x)^2}{\sigma^2}}\right) \qquad (9)$$
$$h(x) = \lambda_1\|x\|_1 + \lambda_2\sum_{i<j}\max\{|x_i|, |x_j|\} \qquad (10)$$
where $\lambda_1 \ge 0$ and $\lambda_2 \ge 0$ are two regularization parameters.
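The correntropy-induced loss and its gradient, which every algorithm above needs for its gradient step, are cheap to evaluate. The following sketch (ours, in NumPy, with `sigma` a user-chosen bandwidth) is only meant to illustrate eq. (9); it is not the authors' implementation.

```python
import numpy as np

def correntropy_loss_grad(x, X, y, sigma=1.0):
    """Correntropy-induced loss g(x) = sigma^2 * sum_i (1 - exp(-r_i^2 / sigma^2)),
    with residuals r_i = y_i - X_i^T x, and its gradient. Smooth but non-convex in x."""
    r = y - X @ x                           # residuals
    w = np.exp(-r ** 2 / sigma ** 2)        # per-sample weights
    loss = sigma ** 2 * np.sum(1.0 - w)
    grad = -2.0 * X.T @ (w * r)             # chain rule through r = y - X x
    return loss, grad

# toy usage on random data (purely illustrative)
rng = np.random.default_rng(0)
X, y = rng.normal(size=(5, 3)), rng.normal(size=5)
print(correntropy_loss_grad(np.zeros(3), X, y, sigma=0.5))
```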

Figure 1: Comparison of convergence speed for different methods. (a)-(d): Robust OSCAR (Cardiac and Coil20, objective vs. iteration and vs. time). (e)-(h): Link prediction (SSE and SE, objective vs. iteration and vs. time). (i)-(l): Robust trace Lasso (GS and YP, objective vs. iteration and vs. time).

Table 3: The datasets used in the experiments.

Application          Dataset                     Sample size   Attributes
Robust OSCAR         Cardiac                     3,
                     Coil20                      1,440         1,024
Social Link          Soc-sign-epinions (SSE)     131,828       841,372
Prediction           Soc-Epinions1 (SE)          75,879        508,837
Robust Trace Lasso   GasSensorArrayDrift (GS)    13,910        128
                     YearPredictionMSD (YP)      515,345       90

For exact proximal gradient algorithms, Zhong and Kwok (2012) proposed a fast iterative group merging algorithm for exactly solving the proximal subproblem. For our inexact algorithms, we propose a subgradient algorithm to approximately solve the proximal subproblem. Let $o(j) \in \{1, 2, \ldots, N\}$ denote the order of $|x_j|$ among $\{|x_1|, |x_2|, \ldots, |x_N|\}$ (here, $x_j$ denotes the $j$-th coordinate of the vector $x$), such that $o(j_1) < o(j_2)$ implies $|x_{j_1}| \le |x_{j_2}|$. Thus, a subgradient of $h(x)$ can be computed as $\partial h(x) = \sum_{j=1}^{N}\big(\lambda_1 + \lambda_2(o(j) - 1)\big)\,\partial|x_j|$. The full subgradient algorithm is omitted here due to the space limit.

We implement our IPG, AIPG and nmAIPG methods for robust OSCAR in Matlab, and we also implement the exact PG, APG and nmAPG in Matlab. Figures 1a and 1c show the objective value vs. iteration for the exact and inexact proximal gradient methods. The results confirm that the exact and inexact proximal gradient methods have the same convergence rates, and that the exact and inexact accelerated methods converge faster than the basic methods (PG and IPG). Figures 1b and 1d show the objective value vs. running time for the exact and inexact proximal gradient methods. The results show that our inexact methods are significantly faster than the exact methods; when the dimensionality increases, we can even achieve a more than 100-fold speedup. This is because our subgradient-based algorithm for approximately solving the proximal subproblem is much more efficient than the projection algorithm for exactly solving the proximal subproblem (Zhong and Kwok 2012).
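To illustrate the subgradient computation above, the following sketch (ours, in NumPy) evaluates one subgradient of the OSCAR penalty through the rank-based weights $\lambda_1 + \lambda_2(o(j) - 1)$; it is a schematic of the weight computation, not the authors' Matlab code.

```python
import numpy as np

def oscar_subgradient(x, lam1, lam2):
    """One subgradient of h(x) = lam1*||x||_1 + lam2*sum_{i<j} max(|x_i|, |x_j|).
    Writing h(x) = sum_j (lam1 + lam2*(o(j)-1)) * |x_j|, where o(j) is the rank of
    |x_j| in ascending order, a valid subgradient assigns each coordinate its
    rank-dependent weight times sign(x_j) (taking 0 as the subgradient of |.| at 0)."""
    ranks = np.argsort(np.argsort(np.abs(x)))   # o(j) - 1: 0 for the smallest |x_j|
    weights = lam1 + lam2 * ranks
    return weights * np.sign(x)

print(oscar_subgradient(np.array([0.5, -2.0, 0.0, 1.0]), lam1=0.1, lam2=0.05))
```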

Social Link Prediction. In the social link prediction problem, we hope to predict new potential social links or friendships between online users. Given an incomplete matrix $M$ (user-by-user) with each entry $M_{ij} \in \{+1, -1\}$, social link prediction is to recover the matrix $M$ under a low-rank constraint. Specifically, social link prediction considers the following problem:
$$\min_{X}\ \underbrace{\sum_{(i,j)\in\Omega}\log\big(1 + \exp(-X_{ij}M_{ij})\big)}_{g(X)} \quad \text{s.t.}\ \mathrm{rank}(X) \le r \qquad (11)$$
where $\Omega$ is the set of index pairs $(i, j)$ corresponding to the observed entries of $M$. The proximal operator $\min_{\mathrm{rank}(X)\le r}\|X - (X_t - \gamma\nabla g(X_t))\|_F^2$ can be solved by the rank-$r$ singular value decomposition (SVD) (Jain, Meka, and Dhillon 2010). The rank-$r$ SVD can be computed exactly by the Lanczos method (Larsen 1998), and can also be computed approximately by the power method (Halko, Martinsson, and Tropp 2011; Journée et al. 2010).

We implement our IPG, AIPG and nmAIPG methods for social link prediction in Matlab, and we also implement the exact PG, APG and nmAPG in Matlab. Figures 1e and 1g show the objective value vs. iteration for the exact and inexact proximal gradient methods. The results confirm that the exact and inexact proximal gradient methods have the same convergence rates. Figures 1f and 1h illustrate the objective value vs. running time for the exact and inexact proximal gradient methods. The results verify that our inexact methods are faster than the exact methods.

Robust Trace Lasso. Robust trace Lasso is a robust version of trace Lasso (Grave, Obozinski, and Bach 2011; Bach 2008). As with robust OSCAR, we replace the square loss originally used in trace Lasso with the correntropy-induced loss (He, Zheng, and Hu 2011) for robust regression. Thus, we consider robust trace Lasso with the functions $g(x)$ and $h(x)$ as follows:
$$g(x) = \sigma^2\sum_{i=1}^{l}\left(1 - e^{-\frac{(y_i - X_i^{T}x)^2}{\sigma^2}}\right) \qquad (12)$$
$$h(x) = \lambda\,\|X\mathrm{Diag}(x)\|_* \qquad (13)$$
where $\|\cdot\|_*$ is the trace norm and $\lambda$ is a regularization parameter. To the best of our knowledge, there is still no algorithm that exactly solves this proximal subproblem. To implement our IPG, AIPG and nmAIPG, we propose a subgradient algorithm to approximately solve the proximal subproblem. Specifically, the subgradient of the trace Lasso regularization $\|X\mathrm{Diag}(x)\|_*$ can be computed by Theorem 4, which was originally provided in Bach (2008). We implement our IPG, AIPG and nmAIPG methods for robust trace Lasso in Matlab.

Theorem 4. Let $U\mathrm{Diag}(s)V^T$ be the singular value decomposition of $X\mathrm{Diag}(x)$. Then the subdifferential of the trace Lasso regularization $\|X\mathrm{Diag}(x)\|_*$ is given by
$$\partial\|X\mathrm{Diag}(x)\|_* = \big\{\mathrm{Diag}\big(X^T(UV^T + M)\big) : \|M\|_2 \le 1,\ U^T M = 0 \text{ and } MV = 0\big\} \qquad (14)$$
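Taking $M = 0$ in Theorem 4 yields one concrete element of the subdifferential, which is all a subgradient step needs. The sketch below (ours, in NumPy) computes it from a thin SVD of $X\mathrm{Diag}(x)$, truncated to the nonzero singular values.

```python
import numpy as np

def trace_lasso_subgradient(X, x, tol=1e-10):
    """One subgradient of h(x) = ||X Diag(x)||_* (trace norm), obtained from
    Theorem 4 with the admissible choice M = 0: Diag(X^T U V^T)."""
    U, s, Vt = np.linalg.svd(X @ np.diag(x), full_matrices=False)
    r = int(np.sum(s > tol))                       # keep only the nonzero singular values
    return np.diag(X.T @ U[:, :r] @ Vt[:r, :])     # diagonal entries of X^T U V^T

# toy usage (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
print(trace_lasso_subgradient(X, np.array([1.0, -0.5, 0.0, 2.0])))
```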
Figures 1i-1l show the objective value vs. iteration and vs. running time, respectively, for our inexact proximal gradient methods. The results demonstrate that we provide efficient proximal gradient algorithms for robust trace Lasso. More importantly, our IPG, AIPG and nmAIPG algorithms fill the vacancy that there is no proximal gradient algorithm for trace Lasso. Directly solving the proximal subproblem for robust trace Lasso is quite difficult (Grave, Obozinski, and Bach 2011; Bach 2008); our subgradient-based algorithm provides an alternative approach to solving the proximal subproblem.

Summary of the Experimental Results. Based on the results of the three non-convex machine learning applications, our conclusion is that our inexact proximal gradient algorithms provide flexible solvers for optimization problems with complex non-smooth regularization. More importantly, our inexact algorithms can be significantly faster than the exact proximal gradient algorithms.

Conclusion

Existing inexact proximal gradient methods only consider convex problems, and knowledge of inexact proximal gradient methods in the non-convex setting is very limited. To address this problem, we proposed three inexact proximal gradient algorithms together with their theoretical analysis. The theoretical results show that our inexact proximal gradient algorithms can have the same convergence rates as the exact proximal gradient algorithms in the non-convex setting. We also provided the applications of our inexact proximal gradient algorithms on three representative non-convex learning problems. The results confirm the superiority of our inexact proximal gradient algorithms.

Acknowledgement

This work was partially supported by the following grants: NSF-IIS 30675, NSF-IIS 3445, NSF-DBI 35668, NSF-IIS 69308, NSF-IIS, NIH R0 AG.

References

Bach, F.; Jenatton, R.; Mairal, J.; and Obozinski, G. 2012. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning 4(1):1-106.
Bach, F. R. 2008. Consistency of trace norm minimization. Journal of Machine Learning Research 9(Jun).
Beck, A., and Teboulle, M. 2009. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2(1):183-202.
Bertsekas, D. P.; Nedić, A.; and Ozdaglar, A. E. 2003. Convex Analysis and Optimization. Athena Scientific.
Boţ, R. I.; Csetnek, E. R.; and László, S. C. 2016. An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO Journal on Computational Optimization 4(1).

Chapelle, O.; Chi, M.; and Zien, A. 2006. A continuation method for semi-supervised SVMs. In Proceedings of the 23rd International Conference on Machine Learning. ACM.
Duchi, J., and Singer, Y. 2009. Efficient online and batch learning using forward backward splitting. Journal of Machine Learning Research 10(Dec).
Feng, Y.; Huang, X.; Shi, L.; Yang, Y.; and Suykens, J. A. 2015. Learning with the maximum correntropy criterion induced losses for regression. Journal of Machine Learning Research 16.
Ghadimi, S., and Lan, G. 2016. Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming 156(1-2).
Grave, E.; Obozinski, G. R.; and Bach, F. R. 2011. Trace Lasso: a trace norm regularization for correlated designs. In Advances in Neural Information Processing Systems.
Halko, N.; Martinsson, P.-G.; and Tropp, J. A. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review 53(2):217-288.
He, R.; Zheng, W.-S.; and Hu, B.-G. 2011. Maximum correntropy criterion for robust face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(8).
Hsieh, C.-J., and Olsen, P. A. 2014. Nuclear norm minimization via active subspace selection. In ICML.
Jain, P.; Meka, R.; and Dhillon, I. S. 2010. Guaranteed rank minimization via singular value projection. In Advances in Neural Information Processing Systems.
Journée, M.; Nesterov, Y.; Richtárik, P.; and Sepulchre, R. 2010. Generalized power method for sparse principal component analysis. Journal of Machine Learning Research 11(Feb).
Larsen, R. M. 1998. Lanczos bidiagonalization with partial reorthogonalization. DAIMI Report Series 27(537).
Li, H., and Lin, Z. 2015. Accelerated proximal gradient methods for nonconvex programming. In Advances in Neural Information Processing Systems.
Lin, Q.; Lu, Z.; and Xiao, L. 2014. An accelerated proximal coordinate gradient method. In Advances in Neural Information Processing Systems.
Schmidt, M.; Le Roux, N.; and Bach, F. 2011. Convergence rates of inexact proximal-gradient methods for convex optimization. arXiv preprint arXiv:1109.2415.
Tappenden, R.; Richtárik, P.; and Gondzio, J. 2016. Inexact coordinate descent: complexity and preconditioning. Journal of Optimization Theory and Applications.
Villa, S.; Salzo, S.; Baldassarre, L.; and Verri, A. 2013. Accelerated and inexact forward-backward algorithms. SIAM Journal on Optimization 23(3).
Wood, G., and Zhang, B. 1996. Estimation of the Lipschitz constant of a function. Journal of Global Optimization 8(1):91-103.
Xiao, L., and Zhang, T. 2014. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization 24(4).
Zhang, T. 2010. Analysis of multi-stage convex relaxation for sparse regularization. Journal of Machine Learning Research 11(Mar).
Zhong, L. W., and Kwok, J. T. 2012. Efficient sparse modeling with automatic feature grouping. IEEE Transactions on Neural Networks and Learning Systems 23(9).


Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China 6th International Conference on Machinery, Materials, Environent, Biotechnology and Coputer (MMEBC 06) Solving Multi-Sensor Multi-Target Assignent Proble Based on Copositive Cobat Efficiency and QPSO Algorith

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

Warning System of Dangerous Chemical Gas in Factory Based on Wireless Sensor Network

Warning System of Dangerous Chemical Gas in Factory Based on Wireless Sensor Network 565 A publication of CHEMICAL ENGINEERING TRANSACTIONS VOL. 59, 07 Guest Editors: Zhuo Yang, Junie Ba, Jing Pan Copyright 07, AIDIC Servizi S.r.l. ISBN 978-88-95608-49-5; ISSN 83-96 The Italian Association

More information

Introduction to Machine Learning. Recitation 11

Introduction to Machine Learning. Recitation 11 Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique

More information

Pattern Classification using Simplified Neural Networks with Pruning Algorithm

Pattern Classification using Simplified Neural Networks with Pruning Algorithm Pattern Classification using Siplified Neural Networks with Pruning Algorith S. M. Karuzzaan 1 Ahed Ryadh Hasan 2 Abstract: In recent years, any neural network odels have been proposed for pattern classification,

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016/2017 Lessons 9 11 Jan 2017 Outline Artificial Neural networks Notation...2 Convolutional Neural Networks...3

More information

Composite optimization for robust blind deconvolution

Composite optimization for robust blind deconvolution Coposite optiization for robust blind deconvolution Vasileios Charisopoulos Daek Davis Mateo Díaz Ditriy Drusvyatskiy Abstract The blind deconvolution proble seeks to recover a pair of vectors fro a set

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

Combining Classifiers

Combining Classifiers Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/

More information

Homework 3 Solutions CSE 101 Summer 2017

Homework 3 Solutions CSE 101 Summer 2017 Hoework 3 Solutions CSE 0 Suer 207. Scheduling algoriths The following n = 2 jobs with given processing ties have to be scheduled on = 3 parallel and identical processors with the objective of iniizing

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

OPTIMIZATION in multi-agent networks has attracted

OPTIMIZATION in multi-agent networks has attracted Distributed constrained optiization and consensus in uncertain networks via proxial iniization Kostas Margellos, Alessandro Falsone, Sione Garatti and Maria Prandini arxiv:603.039v3 [ath.oc] 3 May 07 Abstract

More information

Proximal Minimization by Incremental Surrogate Optimization (MISO)

Proximal Minimization by Incremental Surrogate Optimization (MISO) Proximal Minimization by Incremental Surrogate Optimization (MISO) (and a few variants) Julien Mairal Inria, Grenoble ICCOPT, Tokyo, 2016 Julien Mairal, Inria MISO 1/26 Motivation: large-scale machine

More information

Domain-Adversarial Neural Networks

Domain-Adversarial Neural Networks Doain-Adversarial Neural Networks Hana Ajakan, Pascal Gerain 2, Hugo Larochelle 3, François Laviolette 2, Mario Marchand 2,2 Départeent d inforatique et de génie logiciel, Université Laval, Québec, Canada

More information

Fixed-to-Variable Length Distribution Matching

Fixed-to-Variable Length Distribution Matching Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy Storage Capacity and Dynaics of Nononotonic Networks Bruno Crespi a and Ignazio Lazzizzera b a. IRST, I-38050 Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I-38050 Povo (Trento) Italy INFN Gruppo

More information

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning Boosting Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Weak Learning Definition: concept class C is weakly PAC-learnable if there exists a (weak)

More information