Convex relaxation for Combinatorial Penalties


Convex relaxation for Combinatorial Penalties
Guillaume Obozinski, Équipe Imagine, Laboratoire d'Informatique Gaspard Monge, École des Ponts ParisTech. Joint work with Francis Bach.
Fête Parisienne in Computation, Inference and Optimization, IHES, March 20th, 2013.

From sparsity...
Empirical risk: for $w \in \mathbb{R}^d$, $L(w) = \frac{1}{2n} \sum_{i=1}^n (y_i - x_i^\top w)^2$.
Support of the model: $\mathrm{Supp}(w) = \{i \mid w_i \neq 0\}$, with $|\mathrm{Supp}(w)| = \sum_{i=1}^d 1_{\{w_i \neq 0\}} = \|w\|_0$.
Penalization for variable selection: $\min_{w \in \mathbb{R}^d} L(w) + \lambda\, |\mathrm{Supp}(w)|$.
Lasso: $\min_{w \in \mathbb{R}^d} L(w) + \lambda \|w\|_1$.
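In practice, the Lasso problem above is typically solved with proximal gradient methods. The following minimal Python sketch (the function names, step-size choice and toy data are illustrative, not taken from the talk) minimizes $\frac{1}{2n}\|y - Xw\|_2^2 + \lambda \|w\|_1$ with ISTA, where coordinate-wise soft-thresholding is the proximal operator of the $\ell_1$ norm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (coordinate-wise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    n, d = X.shape
    w = np.zeros(d)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Toy usage: a sparse ground truth recovered from noisy linear measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20); w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.standard_normal(100)
w_hat = lasso_ista(X, y, lam=0.1)
```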

... to Structured Sparsity
The support is not only sparse; in addition, we have prior information about its structure.
Examples:
- The variables should be selected in groups.
- The variables lie in a hierarchy.
- The variables lie on a graph or network, and the support should be localized or densely connected on the graph.

Difficult inverse problem in Brain Imaging
[Figure: multi-scale brain activation maps; Jenatton et al. (2012).]

Hierarchical Dictionary Learning
Figure: hierarchical dictionary of image patches. Figure: hierarchical topic model. (Mairal, Jenatton, Obozinski and Bach, 2010)

Ideas in structured sparsity

Group Lasso and $\ell_1/\ell_p$ norm (Yuan and Lin, 2006)
Group Lasso: given a partition $\mathcal{G} = \{A_1, \ldots, A_m\}$ of $V := \{1, \ldots, d\}$, consider
$\|w\|_{\ell_1/\ell_p} = \sum_{A \in \mathcal{G}} \delta_A\, \|w_A\|_p$.
Overlapping groups: direct extension of Jenatton et al. (2011). Interesting induced structures: induces patterns that are rooted subtrees; induces convex patterns on a grid. (A numerical sketch of the partition case follows below.)
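For a partition, both the norm and its proximal operator are simple to compute. Here is a minimal Python sketch (the groups, weights and test vector are illustrative), where the prox performs block soft-thresholding and therefore zeroes out whole groups at once.

```python
import numpy as np

def group_norm(w, groups, weights):
    """Omega(w) = sum_A delta_A * ||w_A||_2 for a list of index groups."""
    return sum(d * np.linalg.norm(w[g]) for g, d in zip(groups, weights))

def prox_group_norm(w, groups, weights, t):
    """Proximal operator of t * Omega for a *partition*: block soft-thresholding."""
    out = w.copy()
    for g, d in zip(groups, weights):
        norm_g = np.linalg.norm(w[g])
        out[g] = 0.0 if norm_g <= t * d else (1 - t * d / norm_g) * w[g]
    return out

w = np.array([1.0, -2.0, 0.3, 0.1, 4.0])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4])]
weights = [1.0, 1.0, 1.0]
print(group_norm(w, groups, weights))
print(prox_group_norm(w, groups, weights, t=0.5))
```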

Hierarchical Norms (Zhao et al., 2009; Bach, 2008; Jenatton, Mairal, Obozinski and Bach, 2010a)
A covariate can only be selected after its ancestors; this imposes structure on the parameters $w$.
Hierarchical penalization: $\Omega(w) = \sum_{g \in \mathcal{G}} \|w_g\|_2$, where each group $g \in \mathcal{G}$ is the set of descendants of some node in a tree (see the sketch below).
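A small Python sketch of this construction, assuming the hierarchy is given as a children dictionary (the toy tree and unit weights are illustrative assumptions, not from the talk):

```python
import numpy as np

def descendant_groups(children, root):
    """For each node, the group = the node plus all of its descendants in the tree."""
    groups = {}
    def collect(v):
        desc = [v]
        for c in children.get(v, []):
            desc += collect(c)
        groups[v] = desc
        return desc
    collect(root)
    return [np.array(sorted(g)) for g in groups.values()]

def hierarchical_norm(w, groups):
    """Omega(w) = sum over groups g of ||w_g||_2."""
    return sum(np.linalg.norm(w[g]) for g in groups)

# Toy tree on nodes 0..4: node 0 is the root with children 1 and 2; node 1 has children 3 and 4.
children = {0: [1, 2], 1: [3, 4]}
groups = descendant_groups(children, root=0)
w = np.array([0.0, 1.0, 0.0, 2.0, 0.0])
print(hierarchical_norm(w, groups))
```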

A new approach based on combinatorial functions

General framework
Let $V = \{1, \ldots, d\}$. Given a set function $F : 2^V \to \mathbb{R}_+$, consider
$\min_{w \in \mathbb{R}^d} L(w) + F(\mathrm{Supp}(w))$.
Examples of combinatorial functions:
- functions defined recursively or by counts of structures (e.g. on a tree), computable by dynamic programming;
- block-coding (Huang et al., 2011): $G(A) = \min_{B_1, \ldots, B_k} F(B_1) + \ldots + F(B_k)$ s.t. $B_1 \cup \ldots \cup B_k \supseteq A$ (see the sketch below);
- submodular functions (work on convex relaxations by Bach, 2010).
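On a small ground set, the block-coding function $G$ can be evaluated by brute force. The sketch below (a naive memoized recursion over bitmask-encoded sets, written only for illustration, with an arbitrary nonnegative cost $F$ as an assumption) computes the cheapest cover of $A$ by blocks.

```python
from functools import lru_cache

def block_coding(F, d):
    """Return G where G(A) = min over covers B_1 ∪ ... ∪ B_k ⊇ A of sum_i F(B_i).
    Sets are encoded as bitmasks over a ground set of size d; F maps bitmasks to costs >= 0."""
    full = (1 << d) - 1

    @lru_cache(maxsize=None)
    def G(A):
        if A == 0:
            return 0.0
        best = float("inf")
        for B in range(1, full + 1):
            if B & A:  # B must cover at least one remaining element
                best = min(best, F(B) + G(A & ~B))
        return best

    return G

# Toy example: F(B) = 1 + |B| / 2 for nonempty B.
F = lambda B: 1.0 + bin(B).count("1") / 2.0
G = block_coding(F, d=4)
print(G(0b1011))  # cheapest cover of the set {0, 1, 3}
```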

A relaxation for F...?
How to solve $\min_{w \in \mathbb{R}^d} L(w) + F(\mathrm{Supp}(w))$? Possible approaches: greedy algorithms, non-convex methods, relaxation.
By analogy with the cardinality case, where $|A|$ is generalized to $F(A)$ and $L(w) + \lambda\, |\mathrm{Supp}(w)|$ is relaxed to $L(w) + \lambda \|w\|_1$: what convex objective should $L(w) + \lambda\, F(\mathrm{Supp}(w))$ be relaxed to?

Previous relaxation result
Bach (2010) showed that if $F$ is a submodular function, it is possible to construct the tightest convex relaxation of the penalty $F(\mathrm{Supp}(w))$ for vectors $w \in \mathbb{R}^d$ such that $\|w\|_\infty \leq 1$.
Limitations and open issues:
- The relaxation is defined on the unit $\ell_\infty$ ball, which seems to implicitly assume that the $w$ to be estimated lies in a fixed $\ell_\infty$ ball.
- The choice of $\ell_\infty$ seems arbitrary.
- The $\ell_\infty$ relaxation induces undesirable clustering artifacts of the coefficients' absolute values.
- What happens in the non-submodular case?

Penalizing and regularizing...
Given a function $F : 2^V \to \mathbb{R}_+$, consider for $\nu, \mu > 0$ the combined penalty
$\mathrm{pen}(w) = \mu\, F(\mathrm{Supp}(w)) + \nu \|w\|_p^p$.
Motivations:
- a compromise between variable selection and smooth regularization;
- required for functions $F$ allowing large supports, such as $A \mapsto 1_{\{A \neq \emptyset\}}$;
- leads to a coercive penalty, so that its convex relaxation on $\mathbb{R}^d$ is not trivial.

A convex and homogeneous relaxation
Looking for a convex relaxation of $\mathrm{pen}(w)$; we require as well that it be positively homogeneous, for scale invariance.
Definition (homogeneous extension of a function $g$): $g_h : x \mapsto \inf_{\lambda > 0} \frac{1}{\lambda}\, g(\lambda x)$.
Proposition: the tightest convex positively homogeneous lower bound of a function $g$ is the convex envelope of $g_h$.
This leads us to consider
$\mathrm{pen}_h(w) = \inf_{\lambda > 0} \frac{1}{\lambda}\big(\mu\, F(\mathrm{Supp}(\lambda w)) + \nu \|\lambda w\|_p^p\big) \;\propto\; \Theta(w) := \|w\|_p\, F(\mathrm{Supp}(w))^{1/q}, \quad \text{with } \tfrac{1}{p} + \tfrac{1}{q} = 1.$
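The infimum over $\lambda$ can be computed in closed form (a short derivation added here for completeness, assuming $1 < p < \infty$). Since $\mathrm{Supp}(\lambda w) = \mathrm{Supp}(w)$ for $\lambda > 0$,
$\mathrm{pen}_h(w) = \inf_{\lambda > 0}\ \frac{\mu\, F(\mathrm{Supp}(w))}{\lambda} + \nu\, \lambda^{p-1} \|w\|_p^p.$
Setting the derivative in $\lambda$ to zero gives $\lambda^p = \frac{\mu\, F(\mathrm{Supp}(w))}{\nu (p-1) \|w\|_p^p}$, and substituting back yields
$\mathrm{pen}_h(w) = c(\mu, \nu, p)\, \|w\|_p\, F(\mathrm{Supp}(w))^{1/q}, \qquad c(\mu, \nu, p) = \mu^{1/q} \nu^{1/p} \big( (p-1)^{1/p} + (p-1)^{-1/q} \big),$
which is, up to the constant $c(\mu, \nu, p)$, exactly the function $\Theta$ above.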

Envelope of the homogeneous penalty Θ
Consider the norm $\Omega_p$ with dual norm
$\Omega_p^*(s) = \max_{A \subseteq V,\, A \neq \emptyset} \|s_A\|_q\, F(A)^{-1/q}$,
whose unit ball is $B_{\Omega_p^*} := \{ s \in \mathbb{R}^d \mid \forall A \subseteq V,\ \|s_A\|_q^q \leq F(A) \}$.
Proposition: the norm $\Omega_p$ is the convex envelope (tightest convex lower bound) of the function $w \mapsto \|w\|_p\, F(\mathrm{Supp}(w))^{1/q}$.
Proof. Denote $\Theta(w) = \|w\|_p\, F(\mathrm{Supp}(w))^{1/q}$. Its Fenchel conjugate is
$\Theta^*(s) = \max_{w \in \mathbb{R}^d}\ w^\top s - \|w\|_p\, F(\mathrm{Supp}(w))^{1/q}
= \max_{A \subseteq V}\ \max_{w_A \in \mathbb{R}^A}\ w_A^\top s_A - \|w_A\|_p\, F(A)^{1/q}
= \max_{A \subseteq V}\ \iota_{\{\|s_A\|_q \leq F(A)^{1/q}\}}
= \iota_{\{\Omega_p^*(s) \leq 1\}},$
so the biconjugate $\Theta^{**}$, i.e. the convex envelope of $\Theta$, is the norm $\Omega_p$.
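The dual norm is straightforward to evaluate by enumeration on small examples, which is handy for sanity checks. The Python sketch below (written for illustration; the choice $F(A) = |A|$ and the test vector are assumptions) confirms that for the cardinality function the dual ball is the $\ell_\infty$ ball, consistent with $\Omega_p = \|\cdot\|_1$.

```python
import itertools
import numpy as np

def dual_norm(s, F, q):
    """Omega_p^*(s) = max over nonempty A of ||s_A||_q / F(A)^(1/q), by enumeration."""
    d = len(s)
    best = 0.0
    for k in range(1, d + 1):
        for A in itertools.combinations(range(d), k):
            best = max(best, np.linalg.norm(s[list(A)], q) / F(set(A)) ** (1.0 / q))
    return best

# Sanity check: for F(A) = |A| the relaxation Omega_p is the l1 norm,
# so its dual norm should be the l_infinity norm (here p = q = 2).
s = np.array([0.5, -2.0, 1.0])
print(dual_norm(s, F=lambda A: float(len(A)), q=2.0))   # ~2.0
print(np.max(np.abs(s)))                                 # 2.0
```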

Graphs of the different penalties for $w \in \mathbb{R}^2$
[Figures: surface plots of $F(\mathrm{Supp}(w))$; of $\mathrm{pen}(w) = \mu\, F(\mathrm{Supp}(w)) + \nu \|w\|_2^2$; of $\Theta(w) = \sqrt{F(\mathrm{Supp}(w))}\, \|w\|_2$; and of the relaxation $\Omega_F(w)$.]

A large latent group Lasso (Jacob et al., 2009)
Let $\mathcal{V} = \{ v = (v^A)_{A \subseteq V} \in (\mathbb{R}^V)^{2^V} \text{ s.t. } \mathrm{Supp}(v^A) \subseteq A \}$. Then
$\Omega_p(w) = \min_{v \in \mathcal{V}} \sum_{A \subseteq V} F(A)^{1/q}\, \|v^A\|_p \quad \text{s.t.} \quad w = \sum_{A \subseteq V} v^A.$
[Diagram: $w$ decomposed as the sum of latent vectors $v^{\{1\}} + v^{\{2\}} + \ldots + v^{\{1,2\}} + \ldots + v^{\{1,2,3,4\}}$.]
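This variational formulation can be evaluated numerically with a generic convex solver. A minimal sketch using cvxpy (the group family, the unit weights standing in for $F(A)^{1/q}$, and the test vector are illustrative assumptions):

```python
import cvxpy as cp
import numpy as np

def latent_group_norm(w, groups, weights, p=2):
    """Omega_p(w) = min over decompositions w = sum_A v^A with supp(v^A) in A
    of sum_A weights[A] * ||v^A||_p  (weights[A] plays the role of F(A)^(1/q))."""
    d = len(w)
    vs = [cp.Variable(d) for _ in groups]
    constraints = [sum(vs) == w]
    for v, A in zip(vs, groups):
        for i in range(d):
            if i not in A:                      # enforce supp(v^A) ⊆ A
                constraints.append(v[i] == 0)
    objective = cp.Minimize(sum(c * cp.norm(v, p) for v, c in zip(vs, weights)))
    prob = cp.Problem(objective, constraints)
    prob.solve()
    return prob.value

groups = [{0, 1}, {1, 2}, {2, 3}]     # overlapping groups
weights = [1.0, 1.0, 1.0]
w = np.array([1.0, 0.5, 0.0, 2.0])
print(latent_group_norm(w, groups, weights))
```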

Some simple examples
Correspondence $F \to \Omega_p$:
- $F(A) = |A|$ gives $\Omega_p(w) = \|w\|_1$;
- $F(A) = 1_{\{A \neq \emptyset\}}$ gives $\Omega_p(w) = \|w\|_p$;
- if $\mathcal{G}$ is a partition of $\{1, \ldots, d\}$, then $F(A) = \sum_{B \in \mathcal{G}} 1_{\{A \cap B \neq \emptyset\}}$ gives $\Omega_p(w) = \sum_{B \in \mathcal{G}} \|w_B\|_p$.
When $p = \infty$ and $F$ is submodular, our relaxation coincides with that of Bach (2010). However, when $\mathcal{G}$ is not a partition and $p < \infty$, $\Omega_p$ is in general not an $\ell_1/\ell_p$ norm: we obtain new norms, e.g. the k-support norm of Argyriou et al. (2012).

Example
Consider $V = \{1, 2, 3\}$ and $\mathcal{G} = \{\{1,2\}, \{1,3\}, \{2,3\}\}$, with $F(\{1,2\}) = F(\{1,3\}) = F(\{2,3\}) = 1$ and $F(A) = +\infty$ otherwise, or with $F$ defined on the remaining sets by block-coding.

How tight is the relaxation? Example: the range function
Consider $V = \{1, \ldots, p\}$ and the function $F(A) = \mathrm{range}(A) = \max(A) - \min(A) + 1$, which leads to the selection of interval patterns. What is its convex relaxation? It turns out that $\Omega_p^F(w) = \|w\|_1$: the relaxation fails to capture the structure. The new concept of lower combinatorial envelope provides a tool to analyze the tightness of the relaxation.
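Why the relaxation collapses can be read off the dual unit ball (a short argument added here for clarity; it is not spelled out on the slide). Since $\mathrm{range}(A) \geq |A|$ for every nonempty $A$, each constraint $\|s_A\|_q^q \leq F(A)$ defining $B_{\Omega_p^*}$ is implied by the singleton constraints $|s_i|^q \leq \mathrm{range}(\{i\}) = 1$. The dual ball is therefore the unit $\ell_\infty$ ball, exactly as for the cardinality function, and so $\Omega_p^F = \|\cdot\|_1$: the relaxation only retains the values of $F$ on singletons and loses the interval structure.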

Submodular penalties
A function $F : 2^V \to \mathbb{R}$ is submodular if $\forall A, B \subseteq V$, $F(A) + F(B) \geq F(A \cup B) + F(A \cap B)$. For such functions, $\Omega_\infty(w) = f(|w|)$, where $f$ is the Lovász extension of $F$.
Properties of submodular functions:
- $f$ is computed efficiently (via the so-called greedy algorithm, sketched below);
- decomposition (weak separability) properties;
- $F$ and $f$ can be minimized in polynomial time.
These lead to properties of the corresponding submodular norms: regularized empirical risk minimization problems can be solved efficiently, and statistical guarantees can be obtained in terms of consistency and support recovery.
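For reference, the greedy algorithm that evaluates the Lovász extension fits in a few lines. The Python sketch below is a standard construction (the particular set function and test vector are illustrative assumptions): sort the coordinates in decreasing order and pay the marginal gains of $F$.

```python
import numpy as np

def lovasz_extension(F, x):
    """Evaluate the Lovász extension f(x) of a set function F with F(empty) = 0:
    sort coordinates in decreasing order and pay the marginal gains of F."""
    order = np.argsort(-x)                 # indices sorted by decreasing x
    value, prefix, F_prev = 0.0, set(), 0.0
    for i in order:
        prefix = prefix | {int(i)}
        F_cur = F(prefix)
        value += x[i] * (F_cur - F_prev)   # x_{(k)} * [F(top-k set) - F(top-(k-1) set)]
        F_prev = F_cur
    return value

# Example with the submodular function F(A) = min(|A|, 2).
F = lambda A: float(min(len(A), 2))
w = np.array([0.5, -1.0, 2.0])
print(lovasz_extension(F, np.abs(w)))      # equals Omega_infinity(w) for this F
```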

Consistency for the Lasso (Bickel et al., 2009)
Assume that $y = Xw^* + \sigma\varepsilon$, with $\varepsilon \sim \mathcal{N}(0, \mathrm{Id}_n)$. Let $Q = \frac{1}{n} X^\top X \in \mathbb{R}^{d \times d}$ and denote $J = \mathrm{Supp}(w^*)$. Assume the $\ell_1$-restricted eigenvalue condition: for all $\Delta$ such that $\|\Delta_{J^c}\|_1 \leq 3 \|\Delta_J\|_1$, $\Delta^\top Q \Delta \geq \kappa \|\Delta_J\|_2^2$. Then
$\frac{1}{n} \|X\hat{w} - Xw^*\|_2^2 \lesssim \frac{|J|\,\sigma^2}{\kappa} \cdot \frac{2\log p + t^2}{n},$
with probability larger than $1 - \exp(-t^2)$.

Support Recovery for the Lasso (Wainwright, 2009)
Assume $y = Xw^* + \sigma\varepsilon$, with $\varepsilon \sim \mathcal{N}(0, \mathrm{Id}_n)$. Let $Q = \frac{1}{n} X^\top X \in \mathbb{R}^{d \times d}$ and denote $J = \mathrm{Supp}(w^*)$. Define $\nu = \min_{j : w^*_j \neq 0} |w^*_j| > 0$, assume $\kappa = \lambda_{\min}(Q_{JJ}) > 0$, and assume the irrepresentability condition, i.e. that for some $\eta > 0$, $\|Q_{J^c J}\, Q_{JJ}^{-1}\|_\infty \leq 1 - \eta$. Then, if (up to constants) $\frac{2\sigma}{\eta}\sqrt{\frac{2\log p}{n}} \leq \lambda \leq \frac{\kappa \nu}{\sqrt{|J|}}$, the minimizer $\hat{w}$ is unique and has support equal to $J$, with probability larger than $1 - 4\exp(-c_1 n \lambda^2)$.

An example: penalizing the range
Structured prior on the support (Jenatton et al., 2011): the support is an interval of $\{1, \ldots, p\}$. Natural associated penalization: $F(A) = \mathrm{range}(A) = i_{\max}(A) - i_{\min}(A) + 1$. This $F$ is not submodular, and its relaxation behaves like that of the cardinality $G(A) = |A|$. But $\tilde{F}(A) := d - 1 + \mathrm{range}(A)$ is submodular! In fact $\tilde{F}(A) = \sum_{B \in \mathcal{B}} 1_{\{A \cap B \neq \emptyset\}}$ for groups $B$ of the form shown on the slide (this identity is checked numerically below). Jenatton et al. (2011) considered the weighted group norm $\Omega(w) = \sum_{B \in \mathcal{B}} \|d_B \circ w_B\|_2$.
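The identity between $d - 1 + \mathrm{range}(A)$ and a count of hit groups can be checked numerically. In the Python sketch below, the group family is taken to be the left and right half-intervals, which is one plausible reading of the figure on the slide; the script also verifies submodularity by exhaustive enumeration on a small ground set.

```python
import itertools

d = 6
# Half-intervals: prefixes {1..k} for k < d and suffixes {k..d} for k > 1 (1-based indices),
# one plausible reading of the group family pictured on the slide.
groups = [set(range(1, k + 1)) for k in range(1, d)] + \
         [set(range(k, d + 1)) for k in range(2, d + 1)]

def F_tilde(A):
    """d - 1 + range(A), with the convention F(empty) = 0."""
    return 0 if not A else d - 1 + (max(A) - min(A) + 1)

def count_hits(A):
    """Number of groups B with A ∩ B nonempty."""
    return sum(1 for B in groups if A & B)

subsets = [set(c) for k in range(d + 1) for c in itertools.combinations(range(1, d + 1), k)]
assert all(F_tilde(A) == count_hits(A) for A in subsets)

# Check submodularity: F(A) + F(B) >= F(A | B) + F(A & B) for all pairs of subsets.
assert all(F_tilde(A) + F_tilde(B) >= F_tilde(A | B) + F_tilde(A & B)
           for A in subsets for B in subsets)
print("checks passed")
```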

Experiments
Figure: the five test signals — $S_1$ a constant, $S_2$ a triangular shape, $S_3$: $x \mapsto \sin(x)\sin(5x)$, $S_4$ a slope pattern, $S_5$ an i.i.d. Gaussian pattern.
Methods compared: Lasso; Elastic Net; naive $\ell_2$ group Lasso; $\Omega_2$ for $F(A) = d - 1 + \mathrm{range}(A)$; $\Omega_\infty$ for $F(A) = d - 1 + \mathrm{range}(A)$; the weighted $\ell_2$ group Lasso of Jenatton et al. (2011).

Constant signal
[Figure: best Hamming distance vs. sample size $n$ for EN, GL+w, GL, L1, Sub $p=\infty$, Sub $p=2$; $d = 256$, $k = 160$, $\sigma = 0.5$.]

Triangular signal
[Figure: best Hamming distance vs. $n$ for EN, GL+w, GL, L1, Sub $p=\infty$, Sub $p=2$; signal $S_2$, $d = 256$, $k = 160$, $\sigma = 0.5$, cov = Id.]

$(x_1, x_2) \mapsto \sin(x_1)\sin(5x_1)\sin(x_2)\sin(5x_2)$ signal in 2D
[Figure: best Hamming distance vs. $n$ for EN, GL+w, GL, L1, Sub $p=\infty$, Sub $p=2$; $d = 256$, $k = 160$, $\sigma = 1.0$.]

i.i.d. random signal in 2D
[Figure: best Hamming distance vs. $n$ for EN, GL+w, GL, L1, Sub $p=\infty$, Sub $p=2$; $d = 256$, $k = 160$, $\sigma = 1.0$.]

Summary
A convex relaxation for penalties combining (a) a general set function of the support and (b) the $\ell_p$ norm of the parameter vector $w$. It gives a principled construction of known norms, like the group Lasso and $\ell_1/\ell_p$ norms, as well as many new sparsity-inducing norms. Caveat: the relaxation can fail to capture the structure (e.g. the range function). For submodular functions we obtain efficient algorithms and theoretical results such as consistency and support-recovery guarantees.

References I
- Argyriou, A., Foygel, R., and Srebro, N. (2012). Sparse prediction with the k-support norm. In Advances in Neural Information Processing Systems 25.
- Bach, F. (2008). Exploring large feature spaces with hierarchical multiple kernel learning. In Advances in Neural Information Processing Systems.
- Bach, F. (2010). Structured sparsity-inducing norms through submodular functions. In Advances in Neural Information Processing Systems.
- Bickel, P., Ritov, Y., and Tsybakov, A. (2009). Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics, 37(4).
- Huang, J., Zhang, T., and Metaxas, D. (2011). Learning with structured sparsity. Journal of Machine Learning Research, 12.
- Jacob, L., Obozinski, G., and Vert, J.-P. (2009). Group lasso with overlap and graph lasso. In ICML.
- Jenatton, R., Audibert, J., and Bach, F. (2011). Structured variable selection with sparsity-inducing norms. Journal of Machine Learning Research, 12.
- Jenatton, R., Gramfort, A., Michel, V., Obozinski, G., Eger, E., Bach, F., and Thirion, B. (2012). Multiscale mining of fMRI data with hierarchical structured sparsity. SIAM Journal on Imaging Sciences, 5(3).
- Mairal, J., Jenatton, R., Obozinski, G., and Bach, F. (2011). Convex and network flow optimization for structured sparsity. Journal of Machine Learning Research, 12.

References II
- Wainwright, M. J. (2009). Sharp thresholds for noisy and high-dimensional recovery of sparsity using $\ell_1$-constrained quadratic programming. IEEE Transactions on Information Theory, 55.
- Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68.
- Zhao, P., Rocha, G., and Yu, B. (2009). Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A).
