Penalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms


1 Penalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms
Andrew Barron, Cong Huang, Xi Luo
Department of Statistics, Yale University
2008 Workshop on Sparsity in High Dimensional Statistics and Learning Theory

2 Outline
1 Settings and Penalized Estimator: Acceptability of Penalty; General View
2 Settings and l1 Penalization: Risk bound for l1 Penalized Least Squares; Risk Properties for the Finite-Dimension Libraries; Trade-off in the Resolvability
3 l1 Penalized Least Squares; l1 Penalized Loglikelihood
4

3 Outline
1 Settings and Penalized Estimator: Acceptability of Penalty; General View
2 Settings and l1 Penalization: Risk bound for l1 Penalized Least Squares; Risk Properties for the Finite-Dimension Libraries; Trade-off in the Resolvability
3 l1 Penalized Least Squares; l1 Penalized Loglikelihood
4

4 Settings
Regression Y = f*(X) + ε
Training data (X, Y) = (X_i, Y_i)_{i=1}^n
Evaluation sample X' = (X'_i)_{i=1}^n
Target function f*(x) = E[Y | X = x]; assume f* is bounded by B
Noise ε = Y − f*(X) satisfies Bernstein's moment conditions
Candidate functions f from a class F
Average squared error ||Y − f||²_X = (1/n) Σ_{i=1}^n (Y_i − f(X_i))²

5 Penalized Least Squares
f̂ chosen to satisfy
  ||Y − f̂||²_X + pen_n(f̂) ≤ inf_{f∈F} { ||Y − f||²_X + pen_n(f) + A_f }
pen_n(f) and A_f may depend on the data X, Y
A_f is an index of computational accuracy
Truncated estimator T f̂ at a level B' ≥ B
We want risk bounded by
  E||T f̂ − f*||² ≤ (1 + δ) inf_{f∈F} { ||f − f*||² + E pen_n(f) + E A_f },
where the right side is the index of resolvability

6 Acceptable Penalties
What kinds of penalties produce the required risk bound?
Acceptable or proper penalties

7 Countable Case
Consider countable F
Penalty γL(f)/n proportional to complexities satisfying the Kraft inequality Σ_{f∈F} e^{−L(f)} ≤ 1
P_n, P'_n: empirical distributions of X and X'
From the Hoeffding and Bernstein inequalities,
  E sup_{f∈F} { (1/c) P'_n[(f − f*)²] − P_n[(Y − f)² − ε²] − γL(f)/n } ≤ 0,  c > 1,
where P'_n[(f − f*)²] = ||f − f*||²_X' and P_n[(Y − f)² − ε²] = ||Y − f||²_X − ||Y − f*||²_X
γ depends on B, B', c, σ² and h_Bern

8 Risk Bound in Countable Case
Risk is bounded by
  E||T f̂ − f*||²_X' ≤ min_{f∈F} { c ||f − f*||² + γL(f)/n }

9 Uncountable Case
Valid pen_n(f) for uncountable F: there exist a countable F̃ and complexities L(f̃) with
  sup_{f∈F} { (1/c) P'_n(g_f) − P_n(ρ_f) − pen_n(f) } ≤ sup_{f̃∈F̃} { (1/c̃) P'_n(g_f̃) − P_n(ρ_f̃) − γL(f̃)/n },
where c ≥ c̃ > 1; the inequality may hold pointwise or in expectation
g_f(X) = (f(X) − f*(X))²
ρ_f(X, Y) = (Y − f(X))² − (Y − f*(X))²

10 Acceptable Penalty
Variable-complexity, variable-distortion cover
For f in F, the penalty pen_n(f) is valid if there is a representor f̃ such that pen_n(f) is at least
  γL(f̃)/n + Δ_n(f, f̃),  with  Δ_n(f, f̃) = ||Y − f̃||²_X − ||Y − f||²_X + (1/c̃) ||f̃ − Tf||²_X' − (1/c) ||f̃ − f||²_X',
where c ≥ c̃ > 1
F̃ consists of f̃ bounded by B'
Risk is bounded by
  E||T f̂ − f*||²_X' ≤ c inf_{f∈F} { ||f − f*||² + E[pen_n(f) + A_f] }

11 Penalty via Complexity Distortion Trade-off
Allowing unbounded f̃
Acceptable penalty: at least
  inf_{f̃∈F̃} { γL(f̃)/n + D_n(f, f̃) },
where D_n(f, f̃) = ||Y − f̃||²_X − ||Y − f||²_X + ||f̃ − f||²_X'
γ = 1.6(B + B')² + 2 h_Bern (B + B') + 2.7σ², with 2.7σ² the main term arising from the noise

12 Risk Bound
Risk of T f̂:
  E||T f̂ − f*||²_X' ≤ 3 inf_{f∈F} { ||f − f*||² + E[pen_n(f) + A_f] } + tail/n
Noise bounded: tail = 0, B' = B + C
Noise sub-Gaussian: tail = const, B' = B + C √(log n)
Noise Bernstein: tail = const, B' = B + C log n

13 Our Work
General penalty condition
Subset selection: pen_n(f_m) = (γ/n) log(M choose m) + m (log n)/n
l1 penalization: pen_n(f_β) = λ_n ||β||_1. What size λ_n?
Combinations thereof (see paper)
Greedy algorithm for each

14 Sampling Idea
f = Σ_h β_h h, a linear combination of h in H
Randomly draw h_1, h_2, ..., h_m independently with probability proportional to |β_h| for h_i = h
This idea is useful in:
  the approximation bound
  the proof of the acceptability of a penalty via countable covers
  the computational inaccuracy of the greedy algorithm
Squared error of order 1/m or better for each
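
As a quick numerical illustration of why sampling with probability proportional to |β_h| gives squared error of order 1/m, here is a minimal sketch with a made-up random library; the sizes n, M, m and the names H, beta are our illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative random library: M bounded functions evaluated at n points (made-up data).
n, M, m = 200, 50, 25
H = rng.uniform(-1.0, 1.0, size=(M, n))    # H[j] holds the values of h_j at the n points
beta = np.zeros(M)
beta[:10] = rng.normal(size=10)            # f uses only a few library elements
f = beta @ H                               # f = sum_h beta_h h

v = np.abs(beta).sum()                     # v = ||beta||_1
probs = np.abs(beta) / v                   # P[h_i = h] proportional to |beta_h|
idx = rng.choice(M, size=m, p=probs)
f_m = (v / m) * (np.sign(beta[idx])[:, None] * H[idx]).sum(axis=0)

print(np.mean((f - f_m) ** 2), v ** 2 / m)  # empirical squared error vs. the v^2/m benchmark
```

The sampled combination has mean f, so the average squared error is a variance term of order v²/m.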

15 Outline
1 Settings and Penalized Estimator: Acceptability of Penalty; General View
2 Settings and l1 Penalization: Risk bound for l1 Penalized Least Squares; Risk Properties for the Finite-Dimension Libraries; Trade-off in the Resolvability
3 l1 Penalized Least Squares; l1 Penalized Loglikelihood
4

16 Regression Problem Settings
Training data (X, Y) = (X_i, Y_i)_{i=1}^n
Evaluation at X' = (X'_i)_{i=1}^n, an independent copy of X
Target function f*(x) = E[Y | X = x], bounded by B
Noise ε = Y − f*(X) satisfies Bernstein's conditions
Function class F = F_H, the linear span of a library H
f in F_H has the form f(x) = f_β(x) = Σ_h β_h h(x) with β = (β_h : h ∈ H)

17 l1 Penalized Least Squares Estimator
Find β̂, f̂ = f_β̂ to satisfy
  ||Y − f_β̂||²_X + λ ||β̂||_{1,a} = min_β { ||Y − f_β||²_X + λ ||β||_{1,a} },
where f_β(x) = Σ_h β_h h(x) and ||β||_{1,a} = Σ_h |β_h| a_h
Lasso (Tibshirani 1996)
Basis Pursuit (Chen and Donoho 1996)
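
For concreteness, one common way to compute this estimator for a finite library is coordinate descent with soft-thresholding. The sketch below is a generic illustration (it is not the greedy LPGP algorithm discussed later in the talk), and the function and variable names are ours.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso_cd(H, Y, lam, a, n_iter=200):
    """Coordinate descent for (1/n)||Y - H beta||^2 + lam * sum_j a_j |beta_j|.

    H: (n, M) matrix of library evaluations h_j(X_i); a: vector of weights a_h.
    """
    n, M = H.shape
    beta = np.zeros(M)
    col_sq = (H ** 2).mean(axis=0)          # ||h_j||^2_n for every column
    resid = Y.copy()                        # residual Y - H beta
    for _ in range(n_iter):
        for j in range(M):
            if col_sq[j] == 0.0:
                continue
            resid += H[:, j] * beta[j]      # remove the j-th contribution
            rho = (H[:, j] * resid).mean()  # <h_j, resid>_n
            beta[j] = soft_threshold(rho, lam * a[j] / 2.0) / col_sq[j]
            resid -= H[:, j] * beta[j]      # restore with the updated beta_j
    return beta
```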

18 Areas to be explored
We show that ||β||_1 = ||β||_{1,a} is a proper penalty, and hence the corresponding resolvability risk bound follows:
  What kinds of weights a_h?
  What is the condition on λ?
  What is the convergence rate of the risk?
Results in Huang, Cheang and Barron (2008), Section 4


21 Risk bound in the case that H is finite
Consider the case H finite with size M (also called p)
Weights a_h = ||h|| in the traditional setting
Weights a_h = ||h||_{X,X'}/√2 in the transductive setting, where ||h||²_{X,X'} = (1/n) Σ_{i=1}^n (h²(X_i) + h²(X'_i))
λ chosen at least 2 √(2γ (log 2M)/n), with γ = 1.6(B + B')² + 2(B + B') h_Bern + 2.7σ²
T f̂ truncates f̂ to the level B' ≥ B
Risk satisfies
  E||T f̂ − f*||² ≤ 3 inf_β { ||f_β − f*||² + λ ||β||_{1,a} } + adjust/n
Adjustment terms are negligible compared to the main terms
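
A small numeric illustration of the size of this λ; the constants B, B', h_Bern, σ below are example values we made up, not values from the talk.

```python
import numpy as np

def lambda_min(n, M, B, B_prime, h_bern, sigma):
    """Smallest valid lambda, 2*sqrt(2*gamma*log(2M)/n), for the finite-library bound."""
    gamma = 1.6 * (B + B_prime) ** 2 + 2 * (B + B_prime) * h_bern + 2.7 * sigma ** 2
    return 2.0 * np.sqrt(2.0 * gamma * np.log(2 * M) / n)

# illustrative values only
print(lambda_min(n=1000, M=10_000, B=1.0, B_prime=2.0, h_bern=0.5, sigma=1.0))
```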

22 Glance of the Proof
F̃ consists of f̃ of the form f̃(x) = (v/m) Σ_{k=1}^m h_k(x)/a_{h_k}, with h_k ∈ H and m = 1, 2, ...
Complexities L(f̃) = m log M + m log 2
Using the sampling idea, there is a representor f̃_m for each f, with the pieces of γL(f̃_m)/n + D_n(f, f̃_m) bounded as
  ||Y − f̃_m||²_X − ||Y − f||²_X ≤ v ||β||_{1,a}/m,  ||f − f̃_m||²_X' ≤ v ||β||_{1,a}/m,  γL(f̃_m)/n ≤ γ m log(2M)/n
Take m = ||β||_{1,a}/η and v = mη, and optimize over η
pen_n(f_β) at least 2 √(2γ (log 2M)/n) ||β||_{1,a}
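
A worked version of the complexity-distortion trade-off (our sketch: it ignores the rounding of m to an integer and takes v ≈ ||β||_{1,a}), showing where the threshold 2√(2γ(log 2M)/n) comes from:

```latex
\[
\inf_{m\ge 1}\Big\{ \underbrace{\tfrac{2\,v\,\|\beta\|_{1,a}}{m}}_{\text{distortion}}
 + \underbrace{\tfrac{\gamma\, m \log(2M)}{n}}_{\text{complexity}} \Big\}
 \;\approx\; \inf_{\eta>0}\Big\{ 2\eta\,\|\beta\|_{1,a} + \frac{\gamma \log(2M)}{\eta\, n}\,\|\beta\|_{1,a}\Big\}
 \;=\; 2\sqrt{\frac{2\gamma\log(2M)}{n}}\;\|\beta\|_{1,a},
\]
attained at $\eta^{*} = \sqrt{\gamma\log(2M)/(2n)}$, where $m = \|\beta\|_{1,a}/\eta$ and $v = m\eta$.
```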

23 Improvement
Improvement based on an empirical L2 cover of the library H
H_2: finite cover with precision ε_2 and cardinality m_2
Use a_h = 1
λ chosen at least λ_n = 2 ε_2 √(2γ log(2 m_2)/n)
The risk satisfies
  E||T f̂ − f*||² ≤ 3 min_β { ||f_β − f*||² + λ ||β||_1 } + adjust/n

24 Stratified Sampling
H_2 is an L2 cover of H with precision ε_2 and cardinality m_2
For f = Σ_h β_h h in F_H, there is an f_m = (v/m) Σ_{k=1}^m h_k such that
  ||f − f_m||² ≤ ε_2² ||β||_1 v/(m − m_2)
v is between Σ_h |β_h| and Σ_h |β_h| (1 + m_2/(m − m_2))
Based on Makovoz (1996)

25 Proof of Improvement
Same F̃ and L(f̃)
Using the stratified sampling idea, there is a representor f̃_m for each f, with
  ||Y − f̃_m||²_X − ||Y − f||²_X ≤ ε_2² ||β||_1 v/(m − m_2),  ||f − f̃_m||²_X' ≤ ε_2² ||β||_1 v/(m − m_2),  γL(f̃_m)/n ≤ γ m log(2M)/n
Set v = mη/ε_2 and ε_2 Σ_h |β_h|/η ≤ m ≤ ε_2 Σ_h |β_h|/η + m_2
Optimizing over η, the penalty is at least λ_n ||β||_1 + γ m_2 log(2M)/n

26 Risk bound in the case that H is infinite
Improvement with two levels of cover
A fine precision ε_1, typically of order ε_2/√n; empirical covers H_1 provide an effective library size M_1, which serves as a surrogate for M
H_2 is the same as before
Use a_h = 1
λ at least 2 ε_2 √(2γ log(2 M_1)/n) + 16 B' ε_1
The risk satisfies
  E||T f̂ − f*||² ≤ 3 min_β { ||f_β − f*||² + λ ||β||_1 } + adjust/n
There is a quantity 2γ m_2 log(2 M_1) in the adjust term

27 Further exploration of the infinite case
Take advantage of the covering properties of H to relate M_1 and ε_1, and likewise m_2 and ε_2
The library H has metric dimension d_1 w.r.t. the empirical L1 norm if the cardinality M_1 is of order (1/ε_1)^{d_1}
Likewise, d_2 is the metric dimension w.r.t. the empirical L2 norm
d_1 ≤ d_2 ≤ 2 d_1

28 l1 Penalty for Libraries of Finite Metric Dimension
The library H has dimensions d_1 and d_2 w.r.t. the empirical L1 and L2 norms
λ at least
  λ_{n,d} = C_1(d_1, d_2) ((d_1/n) log(n/d_1))^{(d_2+2)/(2 d_2 + 2)}
The risk tends to zero at rate
  inf_β { ||f_β − f*||² + ((d_1/n) log(n/d_1))^{(d_2+2)/(2(d_2+1))} ||β||_1 }

29 A Refined Penalty
Library H has dimensions d_1 and d_2 w.r.t. the empirical norms
Using the penalty pen_n(f_β) = λ ||β||_1^{d_2/(d_2+1)} with λ at least λ_{n,d}
The penalized least squares estimator f̂ satisfies the resolvability risk bound
  E||T f̂ − f*||² ≤ 3 min_β { ||f_β − f*||² + λ_{n,d} ||β||_1^{d_2/(d_2+1)} } + adjust/n
Smaller index of resolvability

30 Variation and L_{1,H}
The variation V(f) of f, w.r.t. H and weights a = (a_h), is
  V(f) = lim_{ε→0} inf_{f_ε ∈ F_H} { ||β||_{1,a} : f_ε = Σ_h β_h h and ||f_ε − f|| ≤ ε }
A natural extension of ||β||_{1,a}
L_{1,H} consists of functions with finite variation

31 Approximation and Penalty Trade-off
We discuss the trade-off between approximation error and penalty as expressed in the resolvability, and its relationship to interpolation spaces between two classes of functions
Squared approximation error: App(f*, v) = inf_{f_β : ||β||_1 = v} ||f_β − f*||²
Resolvability: R_1(f*, λ_n) = inf_v { App(f*, v) + λ_n v }
If f* ∈ L_{1,H}, then R_1(f*, λ_n) ≤ λ_n V(f*) goes to 0 linearly in λ_n
If f* is only in L_2(P), the convergence rate can be arbitrarily slow

32 Interpolation Space B^res_{1,p}
Consider B^res_{1,p} = { f : R_1(f, λ) ≤ c_f λ^{2−p} for all λ > 0 }, indexed by 1 ≤ p ≤ 2
These coincide with the traditional interpolation spaces B_p = [L_2(P), L_{1,H}]_θ
When p = 1, B^res_{1,1} includes L_{1,H}
If f* ∈ B^res_{1,p}, the resolvability, of order λ_n^{2−p}, provides rate (γ (log M)/n)^{1 − p/2}

33 Trade-off for finite-dimensional libraries
Suppose f* ∈ B^res_{1,p} and H has dimensions d_1 and d_2
The resolvability R_1(f*, λ_n), with R_1(f*, λ) = inf_v { App(f*, v) + λ v }, is of order
  ((d_1/n) log(n/d_1))^{(1 − p/2)(d_2+2)/(d_2+1)}
The resolvability R_{1,r}(f*, λ_n), with r = d_2/(d_2+1) and R_{1,r}(f*, λ) = inf_v { App(f*, v) + λ v^r }, is of order
  ((d_1/n) log(n/d_1))^{(1 − p/2)(d_2+2)/(d_2+2−p)}

34 Variable Complexity
Finite library H
Variable complexities L(h), satisfying Σ_h e^{−L(h)} ≤ 1
a_{L,h} = ||h|| √(L(h) + log 2) in the traditional setting
a_{L,h} = ||h||_{X,X'} √((L(h) + log 2)/(2n)) in the transductive setting
A similar risk bound holds, using L(h) + log 2 inside the sum defining ||β||_{1,a_L} in place of the constant log M + log 2 outside the sum

35 Computational Accuracy
β̂ and f_β̂ satisfy
  ||Y − f_β̂||²_X + λ ||β̂||_{1,a} ≤ inf_β { ||Y − f_β||²_X + λ ||β||_{1,a} + A_{β,m} }
The same risk bound still holds, with E A_{β,m} inside the index of resolvability

36 Outline
1 Settings and Penalized Estimator: Acceptability of Penalty; General View
2 Settings and l1 Penalization: Risk bound for l1 Penalized Least Squares; Risk Properties for the Finite-Dimension Libraries; Trade-off in the Resolvability
3 l1 Penalized Least Squares; l1 Penalized Loglikelihood
4

37 l1 Penalized Least Squares
Dictionary H = {h(x)} of size M (also called p); data (X_i, Y_i)_{i=1}^n
Fit a function in the linear span F_H to minimize
  (1/n) Σ_{i=1}^n ( Y_i − Σ_h β_h h(X_i) )² + λ Σ_h |β_h|

38 Algorithms for l1 Penalized Least Squares
Examples of existing algorithms: interior point methods (Boyd et al., 2004); LARS (Efron, Hastie, Johnstone and Tibshirani, 2004); coordinate descent (Friedman et al., 2007)
Built on Jones (1992) and others: l1 penalized greedy pursuit (LPGP), HCB (2008), Section 3

39 LPGP for Least Squares
Initialize f_0(x) = 0
Iteratively seek f_m(x) = (1 − α_m) f_{m−1}(x) + β_m h_m(x) to minimize, over h ∈ H, α ∈ (0, 1) and β ∈ R,
  ||Y − (1 − α) f_{m−1} − β h||²_n + λ [ |β| + (1 − α) v_{m−1} ],
where v_m = Σ_{j=1}^m |β_{j,m}| for f_m = Σ_{j=1}^m β_{j,m} h_j
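
A minimal runnable sketch of this iteration, as our own illustration: the α grid, the step count, and all function and variable names are implementation choices not taken from the talk, and for fixed α and h the optimal β is obtained by soft-thresholding.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lpgp_least_squares(H, Y, lam, n_steps=50, alpha_grid=None):
    """Sketch of l1-penalized greedy pursuit (LPGP) for least squares.

    Each step forms f_m = (1 - alpha) * f_{m-1} + beta * h, choosing (h, alpha, beta)
    to minimize ||Y - (1-alpha) f_{m-1} - beta h||_n^2 + lam * (|beta| + (1-alpha) * v_{m-1}).
    H: (n, M) matrix with columns h(X_i). Returns fitted values and the l1 norm v_m.
    """
    n, M = H.shape
    if alpha_grid is None:
        alpha_grid = np.linspace(0.05, 1.0, 20)
    f = np.zeros(n)                                       # current fit f_{m-1}(X_i)
    v = 0.0                                               # current l1 norm v_{m-1}
    col_sq = (H ** 2).mean(axis=0)                        # ||h_j||^2_n
    for _ in range(n_steps):
        best = None
        for alpha in alpha_grid:
            r = Y - (1.0 - alpha) * f                     # residual after shrinking the old fit
            rho = (H * r[:, None]).mean(axis=0)           # <h_j, r>_n for every column
            beta = soft_threshold(rho, lam / 2.0) / np.where(col_sq > 0, col_sq, 1.0)
            obj = ((r[:, None] - H * beta) ** 2).mean(axis=0) \
                  + lam * (np.abs(beta) + (1.0 - alpha) * v)
            j = int(np.argmin(obj))
            if best is None or obj[j] < best[0]:
                best = (obj[j], alpha, j, beta[j])
        _, alpha, j, b = best
        f = (1.0 - alpha) * f + b * H[:, j]
        v = (1.0 - alpha) * v + abs(b)
    return f, v
```

Each step costs on the order of (size of the α grid) times M times n operations, in line with the Mnm total cost quoted two slides below.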

40 Theorem for Accuracy
Empirical l2 norm ||h||_n; V_f = inf { Σ_h |β_h| ||h||_n : f(x) = Σ_h β_h h(x) }
The m-step estimator of LPGP is within order V_f²/m of the minimal objective:
  ||Y − f_m||²_n + λ v_m ≤ inf_{f∈F_H} { ||Y − f||²_n + λ V_f + 4 V_f²/(m + 1) }

41 Advantages and Disadvantages
Advantages: explicit guarantee of accuracy; cost Mnm vs. Mn² for LARS; inexpensive optimization at each iteration
Disadvantages: approximate solution; fixed λ_n

42 Idea of Proof
WLOG β_h ≥ 0 (assume H is closed under sign change)
For an arbitrary f = Σ_h β_h h, let
  e²_m = ||Y − f_m||²_n − ||Y − f||²_n + λ v_m
Since f_m = (1 − α_m) f_{m−1} + β_m h_m optimizes the criterion, e²_m is at least as good as choosing α = 2/(m+1), β = α V_f, and an h chosen at random

43 Idea of Proof
Rearrange to get
  e²_m ≤ (1 − α) e²_{m−1} + α² b(V_f h) + α λ V_f − 2α(1 − α) (1/n) Σ_{i=1}^n (Y_i − f_{m−1}(X_i)) (V_f h(X_i) − f(X_i)),
where b(V_f h) = ||Y − V_f h||²_n − ||Y − f||²_n
Draw h with probability β_h/V_f: the cross term vanishes on average, and the average of b(V_f h) over h is bounded by V_f²

44 Idea of Proof
We show e²_m ≤ (1 − α) e²_{m−1} + α² V_f² + α λ V_f
Induction then gives accuracy of order 1/m
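
A worked version of the induction (our sketch): write ẽ_m = e²_m − λV_f, so the recursion above becomes ẽ_m ≤ (1 − α) ẽ_{m−1} + α² V_f² with α = 2/(m+1).

```latex
\[
\text{Claim: } \tilde e_m \le \frac{4V_f^2}{m+1}.
\qquad
m=1\ (\alpha=1):\ \tilde e_1 \le V_f^2 \le \frac{4V_f^2}{2}.
\]
\[
\text{If } \tilde e_{m-1} \le \frac{4V_f^2}{m}, \text{ then }
\tilde e_m \le \Big(1-\frac{2}{m+1}\Big)\frac{4V_f^2}{m} + \frac{4V_f^2}{(m+1)^2}
= 4V_f^2\,\frac{m^2+m-1}{m(m+1)^2} \le \frac{4V_f^2}{m+1}.
\]
```

This is exactly the 4V_f²/(m + 1) term in the accuracy theorem above.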

45 l1 Penalized Loglikelihood
Let X_1, ..., X_n be i.i.d. in R^p distributed as
  p_f(x) = e^{f(x)} p_0(x) / C_f
Let L_n(f) = (1/n) log(1/p_f(X^n))
The l1 penalized loglikelihood estimator f̂ = f_β̂ minimizes
  L_n(f) + λ V_f,  where V_f = inf { Σ_h |β_h| : f(x) = Σ_h β_h h(x), h ∈ H }

46 Motivation
Minimization is computationally demanding when p is large
Term-by-term selection is favored in sparse settings
Approximate optimization is good enough for the risk analysis
BHLL (2008) extends LPGP to penalized loglikelihood

47 LPGP for Penalized Loglikelihood
Initialize with f_0(x) = 0
f_m(x) = (1 − α_m) f_{m−1}(x) + β_m h_m(x), with α_m, β_m and h_m chosen by
  argmin_{α, β, h} { L_n(f_m) + λ [ (1 − α) v_{m−1} + |β| ] },
where v_{m−1} = Σ_{j=1}^{m−1} |β_{j,m−1}| for f_{m−1} = Σ_{j=1}^{m−1} β_{j,m−1} h_j
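
A runnable sketch of one way to carry out this iteration when the base measure p_0 has finite support, so the normalizing constant C_f can be summed exactly. The grids, the names H_data, H_supp, p0_supp, and the step counts are our illustrative choices, not part of the talk; L_n is evaluated up to an additive constant that does not depend on f, which does not affect the argmin.

```python
import numpy as np

def lpgp_loglik(H_data, H_supp, p0_supp, lam, n_steps=30,
                alpha_grid=np.linspace(0.1, 1.0, 10),
                beta_grid=np.linspace(-2.0, 2.0, 41)):
    """Sketch of LPGP for l1-penalized log-likelihood on a finite support.

    Model: p_f(x) = exp(f(x)) p0(x) / C_f.  H_data[i, j] = h_j(X_i) on the sample,
    H_supp[s, j] = h_j(x_s) on the support points, p0_supp[s] = p0(x_s).
    """
    n, M = H_data.shape
    f_data = np.zeros(n)                 # f_{m-1} on the sample
    f_supp = np.zeros(H_supp.shape[0])   # f_{m-1} on the support (for C_f)
    v = 0.0
    for _ in range(n_steps):
        best = None
        for alpha in alpha_grid:
            for beta in beta_grid:
                # candidate f_m = (1-alpha) f_{m-1} + beta h_j, evaluated for every j at once
                cand_data = (1 - alpha) * f_data[:, None] + beta * H_data          # (n, M)
                cand_supp = (1 - alpha) * f_supp[:, None] + beta * H_supp          # (S, M)
                logC = np.log((p0_supp[:, None] * np.exp(cand_supp)).sum(axis=0))  # log C_f per j
                Ln = -cand_data.mean(axis=0) + logC            # L_n(f_m) up to a constant in f
                obj = Ln + lam * ((1 - alpha) * v + abs(beta))
                j = int(np.argmin(obj))
                if best is None or obj[j] < best[0]:
                    best = (obj[j], alpha, beta, j)
        _, alpha, beta, j = best
        f_data = (1 - alpha) * f_data + beta * H_data[:, j]
        f_supp = (1 - alpha) * f_supp + beta * H_supp[:, j]
        v = (1 - alpha) * v + abs(beta)
    return f_data, f_supp, v
```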

48 Theorem
Theorem. Suppose |h(x)| ≤ C for all h ∈ H. The m-step LPGP estimator f_m satisfies
  L_n(f_m) + λ v_m ≤ inf_{f∈F} { L_n(f) + λ V_f + 2 V_f²/(m + 1) }

49 Idea of Proof
The m-step error has linear and nonlinear components
The linear parts are handled similarly to the least squares case
The nonlinear part (the normalizing constants) is O(α²), by a moment generating function bound
Induction completes the proof

50 Accuracy
f_m(x) = (1 − α_m) f_{m−1}(x) + β_m h_m(x)
e_m = L_n(f_m) − L_n(f) + λ [ (1 − α_m) v_{m−1} + |β_m| ]
From the definition, L_n(f_m) − L_n(f) equals
  −(1/n) Σ_{i=1}^n [ (1 − α_m) f_{m−1}(X_i) + β_m h_m(X_i) − f(X_i) ]   (linear part)
  + log ( ∫ e^{f_m(t)} p_0(t) dt / ∫ e^{f(t)} p_0(t) dt )   (nonlinear part)

51 Sampling h
Consider α = 2/(m + 1), β = α V_f, and a random h(x)
Rearrange and write p_α(x) = e^{(1−α)[f_{m−1}(x) − f(x)]} p_f(x)/c, with c a normalizing constant:
  e_m ≤ (1 − α) e_{m−1} + α λ V_f + α (1/n) Σ_{i=1}^n [ f(X_i) − V_f h(X_i) ] + log ∫ p_α(t) exp{ α (V_f h(t) − f(t)) } dt
Sample h with probability β_h/V_f: the third term vanishes on average over h
Bring the average over h inside the log; the expectation over the random h of exp{ α (V_f h(t) − f(t)) } is not more than e^{α² V_f²/2}

52 Induction
We show e_m ≤ (1 − α) e_{m−1} + α² V_f²/2 + α λ V_f
Induction completes the proof

53 Current Work
Generalize to permit an l2 norm in the penalized loglikelihood
High-dimensional graphical models: logistic, Gaussian
An R package will be made publicly available

54 Outline
1 Settings and Penalized Estimator: Acceptability of Penalty; General View
2 Settings and l1 Penalization: Risk bound for l1 Penalized Least Squares; Risk Properties for the Finite-Dimension Libraries; Trade-off in the Resolvability
3 l1 Penalized Least Squares; l1 Penalized Loglikelihood
4

55 Our Work
General penalty condition
Subset selection: pen_n(f_m) = (γ/n) log(M choose m) + m (log n)/n
l1 penalization: pen_n(f_β) = λ_n ||β||_1 (valid). What size λ_n?
Combinations thereof (see paper)
Greedy algorithm for each (valid)

56 Sampling Idea
f = Σ_h β_h h, a linear combination of h in H
Randomly draw h_1, h_2, ..., h_m independently with probability proportional to |β_h| for h_i = h
This idea is useful in:
  the approximation bound
  the proof of the acceptability of a penalty via countable covers
  the computational inaccuracy of the greedy algorithm
Squared error of order 1/m or better for each
