An Efficient Algorithm For Weak Hierarchical Lasso. Yashu Liu, Jie Wang, Jieping Ye Arizona State University

1 An Efficient Algorithm For Weak Hierarchical Lasso
Yashu Liu, Jie Wang, Jieping Ye
Arizona State University

2 Outline
- Regression with Interactions
- Problems and Challenges
- Weak Hierarchical Lasso
- The Proposed Method
- Experimental Results
- Conclusion
- Future Work

3 Linear Regression
- Linear regression is a widely used tool in data mining and machine learning
- A linear regression model is
$$ y = x^T w + \varepsilon $$
  where $y$ is the outcome, $x$ the feature vector, $w$ the coefficients, and $\varepsilon$ the noise

4 Regression with Interactions
- Regression with only linear effects may not be sufficient
- One effective way to capture the nonlinearity is to include interactions:
$$ y = x^T w + \tfrac{1}{2}\, x^T Q x + \varepsilon $$
  where $x^T w$ holds the main effects (coefficients $w$) and $x^T Q x$ holds the interactions (coefficients $Q$)
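As a rough illustration of the interaction model on this slide, the sketch below evaluates predictions for a batch of samples. It assumes the model has the form $y = x^T w + \tfrac{1}{2} x^T Q x$ (noise omitted); the function name `predict_with_interactions` and the toy numbers are hypothetical and not taken from the slides.

```python
import numpy as np

def predict_with_interactions(X, w, Q):
    """Return x_i^T w + 0.5 * x_i^T Q x_i for every row x_i of X.

    Assumed model form; X is (n, d), w is (d,), Q is (d, d).
    """
    X = np.asarray(X, dtype=float)
    main_effects = X @ w
    # Quadratic form x_i^T Q x_i computed row by row.
    interactions = 0.5 * np.einsum("ni,ij,nj->n", X, Q, X)
    return main_effects + interactions

# Toy data: two samples, two features, one symmetric interaction.
X = np.array([[1.0, 2.0],
              [0.0, 3.0]])
w = np.array([1.0, -1.0])
Q = np.array([[0.0, 2.0],
              [2.0, 0.0]])
y_hat = predict_with_interactions(X, w, Q)
```

In the second sample the first feature is zero, so the interaction term vanishes and only the main effects contribute.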

5 Issues of Including Interactions
- ISSUE 1: Including interactions leads to ultra-high dimensionality, e.g., 100 features give a 100 × 100 matrix $Q$, i.e. 10,000 interaction coefficients
- ISSUE 2: An estimator with hierarchical structure is typically difficult to obtain

6 Related Work
- Ling Yan et al. used a low-rank interaction model for click-through rate prediction
- Peter Radchenko et al. proposed a lasso-type penalty, VANISH, to achieve strong hierarchical structure
- Ming Yuan et al. proposed a type of nonnegative garrote method to achieve hierarchical structures
- Peng Zhao et al. proposed Composite Absolute Penalties for hierarchical interaction models
- Nam Hee Choi et al. formulated the interaction coefficients as the product of main-effect coefficients to achieve strong hierarchy

7 Weak Hierarchical Lasso (Jacob Bien et al., Annals of Statistics)
- Weak hierarchy: an interaction is selected only if at least one of its main effects is included in the model, i.e.
$$ Q_{ij} \neq 0 \ \text{ only if } \ w_i \neq 0 \ \text{ OR } \ w_j \neq 0 $$
- Example: main effects Age, Gender, MMSE; interactions Age×Gender, Age×MMSE, Gender×MMSE

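8 Weak Hierarchical Lasso
- Weak hierarchical constraints: $\|Q_{\cdot,j}\|_1 \le |w_j|$ for $j = 1, \dots, d$
- Example with main effects M1, ..., M5: when $j = 1$, the coefficient $w_1$ (M1) bounds column 1 of $Q$, i.e. the interactions M1×M1, M1×M2, M1×M3, M1×M4, M1×M5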

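9 Weak Hierarchical Lasso
- Weak hierarchical constraints: $\|Q_{\cdot,j}\|_1 \le |w_j|$ for $j = 1, \dots, d$
- Example with main effects M1, ..., M5: when $j = 2$, the coefficient $w_2$ (M2) bounds column 2 of $Q$, i.e. the interactions M1×M2, M2×M2, M2×M3, M2×M4, M2×M5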
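The column-wise constraint can be checked numerically. The following is a minimal sketch, assuming the constraint reads $\|Q_{\cdot,j}\|_1 \le |w_j|$ for every column $j$; the helper name `satisfies_weak_hierarchy` and the tolerance `tol` are illustrative choices, not part of the slides.

```python
import numpy as np

def satisfies_weak_hierarchy(w, Q, tol=1e-12):
    """Check ||Q[:, j]||_1 <= |w[j]| for every column j (hypothetical helper)."""
    w = np.asarray(w, dtype=float)
    Q = np.asarray(Q, dtype=float)
    column_l1 = np.abs(Q).sum(axis=0)   # l1 norm of each column of Q
    return bool(np.all(column_l1 <= np.abs(w) + tol))

w = np.array([1.0, 0.0, 2.0])

# Column 2 (index 1) is all zero because w[1] == 0: the constraint holds.
Q_ok = np.array([[0.5, 0.0, 1.0],
                 [0.5, 0.0, 0.0],
                 [0.0, 0.0, 1.0]])

# A nonzero entry in the column of a dropped main effect violates it.
Q_bad = Q_ok.copy()
Q_bad[0, 1] = 0.1

ok = satisfies_weak_hierarchy(w, Q_ok)
bad = satisfies_weak_hierarchy(w, Q_bad)
```

Because the bound is applied per column, a main effect with $w_j = 0$ forces the whole column $Q_{\cdot,j}$ to zero.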

10 Weak Hierarchical Lasso
- Let $L(w, Q)$ be the loss function for interaction regression
- The Weak Hierarchical Lasso formulation:
$$ \min_{w,\,Q} \; L(w, Q) + \lambda \|w\|_1 + \frac{\lambda}{2} \|Q\|_1 \quad \text{s.t.} \quad \|Q_{\cdot,j}\|_1 \le |w_j|, \; j = 1, \dots, d $$
- The constraint $\|Q_{\cdot,j}\|_1 \le |w_j|$ makes the optimization problem nonconvex

11 Weak Hierarchical Lasso
- Jacob et al. propose to solve a relaxed version:
$$ \min_{w^+,\,w^-,\,Q} \; L(w^+ - w^-, Q) + \lambda \mathbf{1}^T (w^+ + w^-) + \frac{\lambda}{2} \|Q\|_1 $$
$$ \text{s.t.} \quad \|Q_{\cdot,j}\|_1 \le w_j^+ + w_j^-, \quad w_j^+ \ge 0, \quad w_j^- \ge 0, \quad j = 1, \dots, d $$
- $|w_j|$ (and hence $\|w\|_1$) is relaxed to $w_j^+ + w_j^-$ so that the optimization problem is convex
- The relaxed formulation guarantees a weak hierarchical solution only if a ridge penalty is included

12 Main Contributions
- We propose to directly solve the nonconvex Weak Hierarchical Lasso by a proximal algorithm
- We show that the associated proximal operator admits a closed-form solution
- We accelerate the computation of each subproblem of the proximal operator from quadratic to linearithmic time

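13 Weak Hierarchical Lasso: Proximal Operator
- Problem:
$$ \min_{w,\,Q} \; L(w, Q) + \lambda \|w\|_1 + \frac{\lambda}{2} \|Q\|_1 \quad \text{s.t.} \quad \|Q_{\cdot,j}\|_1 \le |w_j|, \; j = 1, \dots, d $$
- Proximal operator:
$$ \arg\min_{w,\,Q} \; \frac{1}{2} \|w - v\|_2^2 + \frac{1}{2} \|Q - U\|_F^2 + \frac{\lambda}{t} \|w\|_1 + \frac{\lambda}{2t} \|Q\|_1 \quad \text{s.t.} \quad \|Q_{\cdot,j}\|_1 \le |w_j|, \; j = 1, \dots, d $$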
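To show where a proximal operator sits inside a proximal algorithm, here is a minimal sketch of the outer loop: take a gradient step on the loss, scaled by $1/t$, then apply the proximal operator. For illustration only, it swaps in plain soft-thresholding (the proximal operator of an unconstrained $\ell_1$ penalty on a least-squares loss) rather than the constrained weak hierarchical operator. The $\lambda/t$ scaling convention, the function names, and the synthetic data are assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (stand-in, not the constrained prox)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def proximal_gradient(grad, prox, x0, t, n_iter=300):
    """Iterate x <- prox(x - grad(x) / t, t) for a fixed parameter t."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        x = prox(x - grad(x) / t, t)
    return x

# Synthetic least-squares problem (hypothetical data).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(50)

lam = 5.0
t = np.linalg.norm(X, 2) ** 2   # Lipschitz constant of the gradient below

def objective(w):
    return 0.5 * np.sum((X @ w - y) ** 2) + lam * np.abs(w).sum()

w_hat = proximal_gradient(
    grad=lambda w: X.T @ (X @ w - y),
    prox=lambda z, t_: soft_threshold(z, lam / t_),
    x0=np.zeros(10),
    t=t,
)
```

Passing the proximal operator as a function keeps the loop unchanged when a different operator, such as one enforcing the hierarchical constraint, is substituted.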

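14 Factorizing Unknown Variables
- The KEY IDEA is to factorize the unknown coefficients $w$ and $Q$ into their signs $S_w$, $S_Q$ and magnitudes $\tilde{w}$, $\tilde{Q}$
- The unknown coefficients $w$ and $Q$ are factorized elementwise as
$$ w = S_w \circ \tilde{w}, \qquad Q = S_Q \circ \tilde{Q} $$
(Figure: $w$ and $Q$ each shown as a sign pattern multiplied by an array of magnitudes.)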

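15 Proximal Operator Subproblem
- The proximal operator problem can be decoupled into one subproblem per index $j$
- By the factorization
$$ w_j = S_{w_j} \tilde{w}_j, \qquad Q_{\cdot,j} = S_{Q_j} \circ \tilde{Q}_{\cdot,j}, \qquad \tilde{w}_j \ge 0, \quad \tilde{Q}_{\cdot,j} \ge 0, $$
  the subproblem is:
$$ \min_{S_{w_j},\, S_{Q_j},\, \tilde{w}_j,\, \tilde{Q}_{\cdot,j}} \; \frac{1}{2} \big(S_{w_j} \tilde{w}_j - v_j\big)^2 + \frac{1}{2} \big\|S_{Q_j} \circ \tilde{Q}_{\cdot,j} - U_{\cdot,j}\big\|_2^2 + \frac{\lambda}{t} \tilde{w}_j + \frac{\lambda}{2t} \mathbf{1}^T \tilde{Q}_{\cdot,j} $$
$$ \text{s.t.} \quad \mathbf{1}^T \tilde{Q}_{\cdot,j} \le \tilde{w}_j, \quad \tilde{Q}_{\cdot,j} \ge 0, \quad j = 1, \dots, d $$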

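16 Solving Sign Variables
- We solve the sign variables $S_w$, $S_Q$ and the magnitude variables $\tilde{w}$, $\tilde{Q}$ together
- Since $S_{w_j}^2 = 1$,
$$ \frac{1}{2} (w_j - v_j)^2 = \frac{1}{2} \big(S_{w_j} \tilde{w}_j - v_j\big)^2 = \frac{1}{2} \Big(S_{w_j} \big(\tilde{w}_j - S_{w_j} v_j\big)\Big)^2 = \frac{1}{2} \big(\tilde{w}_j - S_{w_j} v_j\big)^2 $$
  Because $\tilde{w}_j \ge 0$, this is smallest when $S_{w_j} v_j = |v_j|$, so $S_w = \mathrm{sign}(v)$
- Likewise, $S_{Q_j}$ must be the sign of $U_{\cdot,j}$, i.e., $S_{Q_j} = \mathrm{sign}(U_{\cdot,j})$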

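17 Solving Magnitude Variables
- With the signs fixed, the magnitude variables can be obtained by solving
$$ \min_{\tilde{Q}_{\cdot,j},\, \tilde{w}_j} \; \frac{1}{2} \big(\tilde{w}_j - S_{w_j} v_j\big)^2 + \frac{1}{2} \big\|\tilde{Q}_{\cdot,j} - S_{Q_j} \circ U_{\cdot,j}\big\|_2^2 + \frac{\lambda}{t} \tilde{w}_j + \frac{\lambda}{2t} \mathbf{1}^T \tilde{Q}_{\cdot,j} \quad \text{s.t.} \quad \mathbf{1}^T \tilde{Q}_{\cdot,j} \le \tilde{w}_j, \quad \tilde{Q}_{\cdot,j} \ge 0 $$
- Completing the square and dropping constants, this is equivalent to
$$ \min_{\tilde{Q}_{\cdot,j},\, \tilde{w}_j} \; \frac{1}{2} \big(\tilde{w}_j - \hat{v}_j\big)^2 + \frac{1}{2} \big\|\tilde{Q}_{\cdot,j} - \hat{U}_{\cdot,j}\big\|_2^2 \quad \text{s.t.} \quad \mathbf{1}^T \tilde{Q}_{\cdot,j} \le \tilde{w}_j, \quad \tilde{Q}_{\cdot,j} \ge 0, $$
  where $\hat{v}_j = S_{w_j} v_j - \frac{\lambda}{t}$ and $\hat{U}_{\cdot,j} = S_{Q_j} \circ U_{\cdot,j} - \frac{\lambda}{2t} \mathbf{1}$
- The above subproblem has closed-form solutions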
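One way to solve the magnitude subproblem is sketched below. It is not taken from the slides: it follows from the KKT conditions of the problem "minimize $\tfrac12(\tilde w_j - \hat v_j)^2 + \tfrac12\|\tilde Q_{\cdot,j} - \hat U_{\cdot,j}\|_2^2$ subject to $\mathbf{1}^T \tilde Q_{\cdot,j} \le \tilde w_j$, $\tilde Q_{\cdot,j} \ge 0$", which give $\tilde Q_{\cdot,j} = \max(\hat U_{\cdot,j} - \mu, 0)$ and $\tilde w_j = \hat v_j + \mu$ for a multiplier $\mu \ge 0$. The names `solve_magnitudes` and `prox_column`, the $\lambda/t$ shifts, and the tie-breaking of a zero sign to $+1$ are assumptions.

```python
import numpy as np

def solve_magnitudes(v_hat, u_hat):
    """Minimize 0.5*(w - v_hat)**2 + 0.5*||q - u_hat||**2
    subject to sum(q) <= w and q >= 0 (KKT-based sketch)."""
    u_hat = np.asarray(u_hat, dtype=float)
    q = np.maximum(u_hat, 0.0)
    if q.sum() <= v_hat:
        # Sum constraint inactive: clipping at zero is already optimal.
        return float(v_hat), q
    # Otherwise sum(q) == w holds with q = max(u_hat - mu, 0), w = v_hat + mu.
    s = np.sort(u_hat)[::-1]                  # descending order
    k = np.arange(1, s.size + 1)
    mu_k = (np.cumsum(s) - v_hat) / (k + 1)   # candidate mu with k active entries
    active = np.nonzero(s > mu_k)[0]
    mu = mu_k[active[-1]] if active.size else -v_hat
    return float(v_hat + mu), np.maximum(u_hat - mu, 0.0)

def prox_column(v_j, u_col, lam, t):
    """Signs first, then magnitudes, for one index j (hypothetical wrapper)."""
    u_col = np.asarray(u_col, dtype=float)
    s_w = 1.0 if v_j >= 0 else -1.0           # a zero sign is broken to +1
    s_q = np.where(u_col >= 0, 1.0, -1.0)
    w_mag, q_mag = solve_magnitudes(s_w * v_j - lam / t,
                                    s_q * u_col - lam / (2 * t))
    return s_w * w_mag, s_q * q_mag

w_j, q_j = prox_column(-2.0, [3.0, -0.5], lam=1.0, t=1.0)
```

The sort dominates the cost, so this sketch runs in linearithmic time in the column length. Whether it matches the authors' accelerated procedure step for step is not something the slides confirm.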

18 Accelerating Computation
- A straightforward implementation of solving the magnitude variables takes quadratic time
  - Not scalable for large-scale data
- We further reduce the time complexity of computing each proximal operator subproblem to linearithmic
  - Exploit the special structure of the subproblem

19 Comparisons of Efficiency
- We compare the efficiency of our algorithm with the convex Weak Hierarchical Lasso in terms of running time and number of iterations

20 Comparisons of Recovery
- We compare the recovery performance of the two algorithms

21 Performance on Real Data
- We compare the original Weak Hierarchical Lasso with the convex version and other baseline methods on ADNI data

22 Conclusion
- We study the Weak Hierarchical Lasso, which is
  - popular for building nonlinear regression models with interactions
  - nonconvex and challenging to solve
- We propose a proximal algorithm to directly solve the Weak Hierarchical Lasso
- We show that the associated proximal operator admits a closed-form solution
- We further accelerate the computation of each proximal operator subproblem from quadratic to linearithmic time
- Empirical studies demonstrate the superior efficiency and effectiveness of the proposed algorithm

23 Future Work
- Apply the nonconvex Weak Hierarchical Lasso to other challenging applications, such as depression studies
- Extend our proposed algorithm to solve Strong Hierarchical Lasso problems

24 Thank You!
