Penalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms
1 Penalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms
Andrew Barron, Cong Huang, Xi Luo
Department of Statistics, Yale University
2008 Workshop on Sparsity in High Dimensional Statistics and Learning Theory
2 Outline
1. Settings and Penalized Estimator: Acceptability of Penalty; General View
2. Settings and l1 Penalization: Risk bound for l1 Penalized Least Squares; Risk Properties for the Finite-Dimension Libraries; Trade-off in the Resolvability
3. l1 Penalized Least Squares; l1 Penalized Loglikelihood
4. Summary
3 Outline (repeated at the start of Section 1: Settings and Penalized Estimator)
4 Settings
- Regression: $Y = f^*(X) + \epsilon$
- Training data $(X, Y) = (X_i, Y_i)_{i=1}^n$; evaluation sample $X' = (X_i')_{i=1}^n$
- Target function $f^*(x) = E[Y \mid X = x]$, with $\|f^*\|_\infty \le B^*$
- Noise $\epsilon = Y - f^*(X)$ satisfies Bernstein's moment conditions
- Candidate functions $f$ from a class $F$
- Average squared error $\|Y - f\|_X^2 = \frac{1}{n}\sum_{i=1}^n (Y_i - f(X_i))^2$
5 Penalized Least Squares
- $\hat f$ is chosen to satisfy $\|Y - \hat f\|_X^2 + \mathrm{pen}_n(\hat f) \le \inf_{f \in F} \left\{ \|Y - f\|_X^2 + \mathrm{pen}_n(f) + A_f \right\}$
- $\mathrm{pen}_n(f)$ and $A_f$ may depend on the data $X, Y$
- $A_f$ is an index of computational accuracy
- Truncated estimator $T\hat f$ at a level $B \ge B^*$
- We want the risk bounded by the index of resolvability:
  $E\|T\hat f - f^*\|^2 \le (1+\delta) \inf_{f \in F} \left\{ \|f - f^*\|^2 + E\,\mathrm{pen}_n(f) + E A_f \right\}$
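The estimator just described can be written out directly for a finite candidate list. Below is a minimal sketch, assuming numpy is available; the candidate functions, the placeholder penalty values and the truncation level are illustrative choices of mine, not values from the talk.

```python
import numpy as np

def penalized_ls(Y, preds, pens, B):
    """Pick f_hat minimizing ||Y - f||_X^2 + pen_n(f) over a finite candidate
    list, then return the truncated fit T f_hat = clip(f_hat, -B, B).

    Y     : (n,) responses on the training inputs
    preds : list of (n,) arrays, the candidate fits f(X_i)
    pens  : list of penalty values pen_n(f), one per candidate
    B     : truncation level (B >= B*)
    """
    scores = [np.mean((Y - f) ** 2) + pen for f, pen in zip(preds, pens)]
    j = int(np.argmin(scores))
    return np.clip(preds[j], -B, B), j

# toy illustration
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, n)
f_star = np.sin(3 * x)
Y = f_star + 0.3 * rng.normal(size=n)
candidates = [np.zeros(n), x, np.sin(3 * x), np.sign(x)]
pens = [0.01 * k for k in range(len(candidates))]   # placeholder complexities
Tf_hat, j = penalized_ls(Y, candidates, pens, B=1.5)
print(j, np.mean((Tf_hat - f_star) ** 2))
```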
6 Acceptable Penalties
What kinds of penalties produce the required risk bound? We call these acceptable or proper penalties.
7 Countable Case
- Consider a countable class $F$
- Penalty $\gamma L(f)/n$ proportional to complexities $L(f)$ satisfying the Kraft inequality $\sum_{f \in F} e^{-L(f)} \le 1$
- $P_n$, $P_n'$: empirical distributions of $X$ and $X'$
- From the Hoeffding and Bernstein inequalities, for $c > 1$,
  $E \sup_{f \in F} \left\{ \frac{1}{c} P_n'[(f - f^*)^2] - P_n[(Y - f)^2 - \epsilon^2] - \frac{\gamma L(f)}{n} \right\} \le 0$,
  where $P_n'[(f - f^*)^2] = \|f - f^*\|_{X'}^2$ and $P_n[(Y - f)^2 - \epsilon^2] = \|Y - f\|_X^2 - \|Y - f^*\|_X^2$
- $\gamma$ depends on $B$, $B^*$, $c$, $\sigma^2$ and $h_{\mathrm{Bern}}$
8 Risk Bound in Countable Case
The risk is bounded by
$E\|T\hat f - f^*\|_{X'}^2 \le \min_{f \in F} \left\{ c\,\|f - f^*\|^2 + \frac{\gamma L(f)}{n} \right\}$
9 Uncountable Case
- Valid $\mathrm{pen}_n(f)$ for an uncountable class $F$: there exist a countable class $\tilde F$ and complexities $L(\tilde f)$ with
  $\sup_{f \in F} \left\{ \frac{1}{c} P_n'(g_f) - P_n(\rho_f) - \mathrm{pen}_n(f) \right\} \le \sup_{\tilde f \in \tilde F} \left\{ \frac{1}{c'} P_n'(g_{\tilde f}) - P_n(\rho_{\tilde f}) - \frac{\gamma L(\tilde f)}{n} \right\}$,
  where $c \ge c' > 1$; the inequality may hold point-wise or in expectation
- $g_f(X) = (f(X) - f^*(X))^2$
- $\rho_f(X, Y) = (Y - f(X))^2 - (Y - f^*(X))^2$
10 Acceptable Penalty
- Variable-complexity, variable-distortion cover: for $f$ in $F$, the penalty $\mathrm{pen}_n(f)$ is valid if there is a representor $\tilde f$ such that $\mathrm{pen}_n(f)$ is at least
  $\frac{\gamma L(\tilde f)}{n} + \Delta_n(f, \tilde f)$, with
  $\Delta_n(f, \tilde f) = \|Y - \tilde f\|_X^2 - \|Y - f\|_X^2 + \frac{1}{c'} \|\tilde f - Tf\|_{X'}^2 - \frac{1}{c} \|\tilde f - f\|_{X'}^2$, where $c \ge c' > 1$
- $\tilde F$ consists of $\tilde f$ bounded by $B$
- The risk is bounded by
  $E\|T\hat f - f^*\|_{X'}^2 \le c \inf_{f \in F} \left\{ \|f - f^*\|^2 + E[\mathrm{pen}_n(f) + A_f] \right\}$
11 Penalty via Complexity-Distortion Trade-off
- Allowing unbounded $\tilde f$, an acceptable penalty is at least
  $\inf_{\tilde f \in \tilde F} \left\{ \frac{\gamma L(\tilde f)}{n} + D_n(f, \tilde f) \right\}$, where
  $D_n(f, \tilde f) = \|Y - \tilde f\|_X^2 - \|Y - f\|_X^2 + \|\tilde f - f\|_{X'}^2$
- $\gamma = 1.6(B + B^*)^2 + 2 h_{\mathrm{Bern}} (B + B^*) + 2.7\sigma^2$, the first being the main term and the last arising from the noise
12 Risk Bound
Risk of $T\hat f$:
$E\|T\hat f - f^*\|_{X'}^2 \le 3 \inf_{f \in F} \left\{ \|f - f^*\|^2 + E[\mathrm{pen}_n(f) + A_f] + \frac{\mathrm{tail}}{n} \right\}$
- Noise bounded: tail $= 0$, $B = B^* + C$
- Noise sub-Gaussian: tail $=$ const, $B = B^* + C\sqrt{\log n}$
- Noise Bernstein: tail $=$ const, $B = B^* + C\log n$
13 Our Work
- General penalty condition
- Subset selection: $\mathrm{pen}_n(f_m) = \frac{\gamma}{n} \log \binom{M}{m} + \frac{m \log n}{n}$ (a short numeric sketch follows below)
- l1 penalization: $\mathrm{pen}_n(f_\beta) = \lambda_n \|\beta\|_1$; what size $\lambda_n$?
- Combinations thereof (see paper)
- Greedy algorithm for each
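For concreteness, here is a tiny numeric sketch of the subset-selection penalty above; the values of gamma, M, m and n are made up for illustration.

```python
import math

def pen_subset(M, m, n, gamma):
    # pen_n(f_m) = (gamma / n) * log(M choose m) + m * log(n) / n
    return (gamma / n) * math.log(math.comb(M, m)) + m * math.log(n) / n

print(pen_subset(M=1000, m=10, n=500, gamma=10.0))
```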
14 Sampling Idea
- $f = \sum_h \beta_h h$, a linear combination of $h$ in $H$
- Randomly draw $h_1, h_2, \ldots, h_m$ independently, with probability proportional to $|\beta_h|$ of $h_i = h$
- This idea is useful in: the approximation bound; the proof of acceptability of a penalty via countable covers; the computational inaccuracy of the greedy algorithm
- Squared error of order $1/m$ or better in each case
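The sampling idea can be checked numerically. The sketch below, with a toy sine library and coefficients of my own choosing, draws m terms with probability proportional to |beta_h|, reweights each draw so its mean is f, and confirms that the average squared error decays roughly like 1/m.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 400)

# toy library H and a target f = sum_h beta_h h
H = [np.sin(np.pi * k * x) for k in range(1, 21)]
beta = np.array([1.0 / k for k in range(1, 21)])
f = sum(b * h for b, h in zip(beta, H))

V = np.abs(beta).sum()                      # l1 norm of the coefficients
p = np.abs(beta) / V                        # sampling probabilities

def sample_fm(m):
    # draw h_1,...,h_m i.i.d. with P(h_i = h) proportional to |beta_h|;
    # each term enters with weight V * sign(beta_h) so that E[f_m] = f
    idx = rng.choice(len(H), size=m, p=p)
    return np.mean([V * np.sign(beta[i]) * H[i] for i in idx], axis=0)

for m in (5, 20, 80, 320):
    errs = [np.mean((sample_fm(m) - f) ** 2) for _ in range(200)]
    print(m, np.mean(errs))                 # roughly proportional to 1/m
```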
15 Outline (repeated at the start of Section 2: Settings and l1 Penalization)
16 Regression Problem Settings
- Training data $(X, Y) = (X_i, Y_i)_{i=1}^n$
- Evaluation at $X' = (X_i')_{i=1}^n$, an independent copy of $X$
- Target function $f^*(x) = E[Y \mid X = x]$, with $\|f^*\|_\infty \le B^*$
- Noise $\epsilon = Y - f^*(X)$ satisfies Bernstein's conditions
- Function class $F = F_H$ is the linear span of a library $H$
- $f$ in $F_H$ has the form $f(x) = f_\beta(x) = \sum_h \beta_h h(x)$ with $\beta = (\beta_h : h \in H)$
17 l1 Penalized Least Squares Estimator
Find $\hat\beta$, $\hat f = f_{\hat\beta}$ to satisfy
$\|Y - f_{\hat\beta}\|_X^2 + \lambda \|\hat\beta\|_{1,a} = \min_\beta \left\{ \|Y - f_\beta\|_X^2 + \lambda \|\beta\|_{1,a} \right\}$,
where $f_\beta(x) = \sum_h \beta_h h(x)$ and $\|\beta\|_{1,a} = \sum_h |\beta_h|\, a_h$.
- Lasso (Tibshirani 1996)
- Basis Pursuit (Chen and Donoho 1996)
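In practice this weighted-l1 criterion can be handed to any standard lasso solver by folding the weights into the columns. The sketch below uses scikit-learn's Lasso (an assumption about the available tooling, not the authors' software) and maps the talk's objective, average squared error plus lambda times the weighted l1 norm, onto sklearn's (1/(2n))||Y - Xb||^2 + alpha||b||_1 parameterization.

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_l1_ls(Phi, Y, lam, a):
    """Minimize (1/n)||Y - Phi beta||^2 + lam * sum_h a_h |beta_h|.

    Phi : (n, M) matrix with columns h(X_i); a : (M,) positive weights.
    Substituting b_h = a_h * beta_h rescales column h by 1/a_h and turns the
    weighted l1 norm into a plain one; sklearn's Lasso objective is
    (1/(2n))||Y - X b||^2 + alpha ||b||_1, so alpha = lam / 2 gives the
    same minimizer.
    """
    Phi_scaled = Phi / a                      # column h divided by a_h
    fit = Lasso(alpha=lam / 2.0, fit_intercept=False, max_iter=50000)
    fit.fit(Phi_scaled, Y)
    return fit.coef_ / a                      # back to the original beta

# toy usage with a_h = ||h||_n as in the finite-library setting
rng = np.random.default_rng(2)
n, M = 200, 50
Phi = rng.normal(size=(n, M))
beta_true = np.zeros(M); beta_true[:3] = [2.0, -1.0, 0.5]
Y = Phi @ beta_true + 0.5 * rng.normal(size=n)
a = np.sqrt((Phi ** 2).mean(axis=0))
beta_hat = weighted_l1_ls(Phi, Y, lam=0.2, a=a)
print(np.nonzero(np.abs(beta_hat) > 1e-6)[0])
```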
18 Areas to be explored
We show that $\|\beta\|_1 = \|\beta\|_{1,a}$ is a proper penalty, and hence the corresponding resolvability risk bound follows.
- What kinds of weights $a_h$?
- What is the condition on $\lambda$?
- What is the convergence rate of the risk?
Results in Huang, Cheang and Barron (2008), Section 4.
21 Risk bound in the case that H is finite
- Consider the case that $H$ is finite with size $M$ (also called $p$)
- Weights $a_h = \|h\|$ in the traditional setting
- Weights $a_h = \sqrt{2}\,\|h\|_{X,X'}$ in the transductive setting, where $\|h\|_{X,X'}^2 = \frac{1}{n} \sum_{i=1}^n \left( h^2(X_i) + h^2(X_i') \right)$
- $\lambda$ is chosen at least $2\sqrt{2\gamma \log(2M)/n}$, with $\gamma = 1.6(B + B^*)^2 + 2(B + B^*) h_{\mathrm{Bern}} + 2.7\sigma^2$
- $T\hat f$ truncates to the level $B \ge B^*$
- The risk satisfies
  $E\|T\hat f - f^*\|^2 \le 3 \left[ \inf_\beta \left\{ \|f_\beta - f^*\|^2 + \lambda \|\beta\|_{1,a} \right\} + \frac{\mathrm{adjust}}{n} \right]$
- The adjustment terms are negligible compared with the main terms
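A quick numeric sketch of the size of lambda called for above; the values of n, M, B, B*, the Bernstein constant and sigma^2 are made up for illustration.

```python
import math

def lambda_min(n, M, B, B_star, h_bern, sigma2):
    # gamma = 1.6 (B + B*)^2 + 2 (B + B*) h_Bern + 2.7 sigma^2
    gamma = 1.6 * (B + B_star) ** 2 + 2 * (B + B_star) * h_bern + 2.7 * sigma2
    # lambda at least 2 * sqrt(2 gamma log(2M) / n)
    return 2.0 * math.sqrt(2.0 * gamma * math.log(2 * M) / n)

print(lambda_min(n=1000, M=10000, B=2.0, B_star=1.0, h_bern=0.5, sigma2=1.0))
```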
22 Glance at the Proof
- $\tilde F$ consists of $\tilde f$ of the form $\tilde f(x) = \frac{v}{m} \sum_{k=1}^m h_k(x)/a_{h_k}$, $h_k \in H$, $m = 1, 2, \ldots$
- Complexities $L(\tilde f) = m \log M + m \log 2$
- Using the sampling idea, there is a representor $\tilde f_m$ for each $f$ for which the terms $\|Y - \tilde f_m\|_X^2 - \|Y - f\|_X^2$ and $\|f - \tilde f_m\|_{X'}^2$ are each of size $v\|\beta\|_{1,a}/m$, while the complexity term is $\gamma L(\tilde f_m)/n = \gamma m \log(2M)/n$
- Take $m \approx \|\beta\|_{1,a}/\eta$ and $v = m\eta$; optimizing the trade-off over $\eta$ gives
- $\mathrm{pen}_n(f_\beta)$ at least $2\sqrt{2\gamma \log(2M)/n}\;\|\beta\|_{1,a}$
23 Improvement
- Improvement based on an empirical $L_2$ covering of the library $H$
- $H_2$: a finite cover with precision $\epsilon_2$ and cardinality $m_2$
- Use $a_h = 1$
- $\lambda$ is chosen at least $\lambda_n = 2\epsilon_2 \sqrt{2\gamma \log(2M)/n}$
- The risk satisfies
  $E\|T\hat f - f^*\|^2 \le 3 \left[ \min_\beta \left\{ \|f_\beta - f^*\|^2 + \lambda \|\beta\|_1 \right\} + \frac{\mathrm{adjust}}{n} \right]$
24 Stratified Sampling
- $H_2$ is an $L_2$ cover of $H$ with precision $\epsilon_2$ and cardinality $m_2$
- For $f = \sum_h \beta_h h$ in $F_H$, there is an $\tilde f_m = \frac{v}{m} \sum_{k=1}^m h_k$ such that
  $\|f - \tilde f_m\|^2 \le \frac{\epsilon_2^2 \|\beta\|_1 v}{m - m_2}$
- $v$ is between $\sum_h |\beta_h|$ and $\sum_h |\beta_h| \left(1 + \frac{m_2}{m - m_2}\right)$
- Based on Makovoz (1996)
25 Proof of the Improvement
- Same $\tilde F$ and $L(\tilde f)$
- Using the stratified sampling idea, there is a representor $\tilde f_m$ for each $f$ for which the terms $\|Y - \tilde f_m\|_X^2 - \|Y - f\|_X^2$ and $\|f - \tilde f_m\|_{X'}^2$ are each of size $\epsilon_2^2 \|\beta\|_1 v/(m - m_2)$, while the complexity term is $\gamma m \log(2M)/n$
- Set $v = m\eta/\epsilon_2$ and $\epsilon_2 \sum_h |\beta_h|/\eta \le m \le \epsilon_2 \sum_h |\beta_h|/\eta + m_2$
- Optimizing over $\eta$: penalty at least $\lambda_n \|\beta\|_1 + \gamma\, m_2 \log(2M)/n$
26 Risk bound in the case that H is infinite
- Improvement with two levels of cover: a fine precision $\epsilon_1$, typically of order $\epsilon_2/\sqrt{n}$, for which the empirical cover $H_1$ provides an effective library size $M_1$; this $M_1$ serves as a surrogate for $M$
- $H_2$ is the same as before
- Use $a_h = 1$
- $\lambda$ at least $2\epsilon_2 \sqrt{2\gamma \log(2 M_1)/n} + 16 B \epsilon_1$
- The risk satisfies
  $E\|T\hat f - f^*\|^2 \le 3 \left[ \min_\beta \left\{ \|f_\beta - f^*\|^2 + \lambda \|\beta\|_1 \right\} + \frac{\mathrm{adjust}}{n} \right]$
- There is a quantity $2\gamma\, m_2 \log(2 M_1)$ in the adjustment
27 Further exploration of the infinite case
- Take advantage of the covering properties of $H$ to relate $M_1$ and $\epsilon_1$, and likewise $m_2$ and $\epsilon_2$
- The library $H$ has metric dimension $d_1$ w.r.t. the empirical $L_1$ norm if the cardinality $M_1$ is of order $(1/\epsilon)^{d_1}$
- Likewise, $d_2$ is the metric dimension w.r.t. the empirical $L_2$ norm
- $d_1 \le d_2 \le 2 d_1$
28 l1 Penalty for Libraries of Finite Metric Dimension
- The library $H$ has dimensions $d_1$ and $d_2$ w.r.t. the empirical $L_1$ and $L_2$ norms
- $\lambda$ at least $\lambda_{n,d} = C_1(d_1, d_2) \left( \frac{d_1}{n} \log \frac{n}{d_1} \right)^{(d_2+2)/(2 d_2 + 2)}$
- The risk tends to zero at the rate
  $\inf_\beta \left\{ \|f_\beta - f^*\|^2 + \left( \frac{d_1}{n} \log \frac{n}{d_1} \right)^{(d_2+2)/(2(d_2+1))} \|\beta\|_1 \right\}$
29 A Refined Penalty
- The library $H$ has dimensions $d_1$ and $d_2$ w.r.t. the empirical norms
- Use the penalty $\mathrm{pen}_n(f_\beta) = \lambda \|\beta\|_1^{d_2/(d_2+1)}$ with $\lambda$ at least $\lambda_{n,d}$
- The penalized least squares estimator $\hat f$ satisfies the resolvability risk bound
  $E\|T\hat f - f^*\|^2 \le 3 \left[ \min_\beta \left\{ \|f_\beta - f^*\|^2 + \lambda_{n,d} \|\beta\|_1^{d_2/(d_2+1)} \right\} + \frac{\mathrm{adjust}}{n} \right]$
- Smaller index of resolvability
30 Variation and $L_{1,H}$
- The variation $V(f)$ of $f$, w.r.t. $H$ and weights $a = (a_h)$, is
  $V(f) = \lim_{\epsilon \to 0} \inf_{f_\epsilon \in F_H} \left\{ \|\beta\|_{1,a} : f_\epsilon = \sum_h \beta_h h \text{ and } \|f_\epsilon - f\| \le \epsilon \right\}$
- A natural extension of $\|\beta\|_{1,a}$
- $L_{1,H}$ consists of the functions with finite variation
31 Approximation and Penalty Trade-off
- We discuss the trade-off between approximation error and penalty, as expressed in the resolvability, and its relationship to interpolation spaces between two classes of functions
- Squared approximation error: $\mathrm{App}(f^*, v) = \inf_{f_\beta : \|\beta\|_1 = v} \|f_\beta - f^*\|^2$
- Resolvability: $R_1(f^*, \lambda_n) = \inf_v \left\{ \mathrm{App}(f^*, v) + \lambda_n v \right\}$
- If $f^* \in L_{1,H}$, then $R_1(f^*, \lambda_n) \le \lambda_n V(f^*)$ goes to 0 linearly in $\lambda_n$
- If $f^*$ is merely in $L_2(P)$, the convergence rate can be arbitrarily slow
32 Interpolation Space $B^{\mathrm{res}}_{1,p}$
- Consider $B^{\mathrm{res}}_{1,p} = \{ f : R_1(f, \lambda) \le c_f\, \lambda^{2-p} \text{ for all } \lambda > 0 \}$, indexed by $1 \le p \le 2$
- These coincide with the traditional interpolation spaces $B_p = [L_2(P), L_{1,H}]_\theta$
- When $p = 1$, $B^{\mathrm{res}}_{1,1}$ includes $L_{1,H}$
- If $f^* \in B^{\mathrm{res}}_{1,p}$, the resolvability, of order $\lambda_n^{2-p}$, provides the rate $\epsilon_2^{2-p} \left( \frac{\gamma \log M}{n} \right)^{1 - p/2}$
33 Trade-off for finite-dimensional libraries
- $f^* \in B^{\mathrm{res}}_{1,p}$ and $H$ has dimensions $d_1$ and $d_2$
- The resolvability $R_1(f^*, \lambda_{n,d})$, with $R_1(f^*, \lambda) = \inf_v \{ \mathrm{App}(f^*, v) + \lambda v \}$, is of order
  $\left( \frac{d_1}{n} \log \frac{n}{d_1} \right)^{(1 - p/2)(d_2+2)/(d_2+1)}$
- The resolvability $R_{1,r}(f^*, \lambda_{n,d})$, with $r = d_2/(d_2+1)$ and $R_{1,r}(f^*, \lambda) = \inf_v \{ \mathrm{App}(f^*, v) + \lambda v^r \}$, is of order
  $\left( \frac{d_1}{n} \log \frac{n}{d_1} \right)^{(1 - p/2)(d_2+2)/(d_2+2-p)}$
34 Variable Complexity
- Finite library $H$; variable complexities $L(h)$ satisfying $\sum_h e^{-L(h)} \le 1$
- $a_{L,h} = \|h\| \sqrt{L(h) + \log 2}$ in the traditional setting
- $a_{L,h} = \|h\|_{X,X'} \sqrt{\frac{2}{n} \left( L(h) + \log 2 \right)}$ in the transductive setting
- A similar risk bound holds, using $L(h) + \log 2$ inside the sum defining $\|\beta\|_{1,a_L}$, in place of the constant $\log M + \log 2$ outside the sum
35 Computational Accuracy
- $\hat\beta$ and $f_{\hat\beta}$ satisfy
  $\|Y - f_{\hat\beta}\|_X^2 + \lambda \|\hat\beta\|_{1,a} \le \inf_\beta \left\{ \|Y - f_\beta\|_X^2 + \lambda \|\beta\|_{1,a} + A_{\beta,m} \right\}$
- The same risk bound still holds, with $E A_{\beta,m}$ inside the index of resolvability
36 Outline (repeated at the start of Section 3: l1 Penalized Least Squares; l1 Penalized Loglikelihood)
37 l1 Penalized Least Squares
- Dictionary $H = \{h(x)\}$ of size $M$ (also called $p$); data $(X_i, Y_i)_{i=1}^n$
- Fit a function in the linear span $F_H$ to minimize
  $\frac{1}{n} \sum_{i=1}^n \left( Y_i - \sum_h \beta_h h(X_i) \right)^2 + \lambda \sum_h |\beta_h|$
38 Algorithms for l1 Penalized Least Squares
- Examples of existing algorithms: the interior point method (Boyd et al., 2004), LARS (Efron, Hastie, Johnstone and Tibshirani, 2004), coordinate descent (Friedman et al., 2007)
- Built on Jones (1992) and others: l1 penalized greedy pursuit (LPGP), Huang, Cheang and Barron (2008), Section 3
39 LPGP for Least Squares
Initialize $f_0(x) = 0$. Iteratively seek $f_m(x) = (1 - \alpha_m) f_{m-1}(x) + \beta_m h_m(x)$ to minimize, over $h \in H$, $\alpha \in (0, 1)$ and $\beta \in \mathbb{R}$,
$\|Y - (1 - \alpha) f_{m-1} - \beta h\|_n^2 + \lambda \left[ |\beta| + (1 - \alpha) v_{m-1} \right]$,
where $v_m = \sum_{j=1}^m |\beta_{j,m}|$ for $f_m = \sum_{j=1}^m \beta_{j,m} h_j$.
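One possible implementation of this greedy step is sketched below. For a fixed alpha and h, the inner minimization over beta is a one-dimensional soft-thresholding step; here alpha is searched over a small grid that includes 2/(m+1), the value used in the accuracy proof. The grid, the step count and the variable names are mine, not the paper's.

```python
import numpy as np

def lpgp_ls(Phi, Y, lam, steps=50):
    """l1 penalized greedy pursuit for (1/n)||Y - f||^2 + lam * v.

    Phi : (n, M) matrix of library values h(X_i), assumed to have no
          all-zero column; returns the coefficient vector beta.
    """
    n, M = Phi.shape
    beta = np.zeros(M)
    h_norm2 = (Phi ** 2).mean(axis=0)              # ||h||_n^2 for each column
    for m in range(1, steps + 1):
        best = None
        for alpha in (2.0 / (m + 1), 0.05, 0.2, 0.5):
            r = Y - (1 - alpha) * (Phi @ beta)     # residual of the shrunk fit
            corr = (Phi * r[:, None]).mean(axis=0)
            # closed-form minimizer of (1/n)||r - b h||^2 + lam|b| per column
            b = np.sign(corr) * np.maximum(np.abs(corr) - lam / 2.0, 0.0) / h_norm2
            obj = ((r[:, None] - Phi * b) ** 2).mean(axis=0) \
                  + lam * (np.abs(b) + (1 - alpha) * np.abs(beta).sum())
            j = int(np.argmin(obj))
            if best is None or obj[j] < best[0]:
                best = (obj[j], alpha, j, b[j])
        _, alpha, j, bj = best
        beta = (1 - alpha) * beta                  # f_m = (1 - alpha) f_{m-1} + b h_j
        beta[j] += bj
    return beta
```

Run on a design matrix Phi of library values, the returned coefficients should minimize the penalized criterion to within the O(V_f^2/m) slack described by the accuracy theorem on the next slide.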
40 Theorem for Accuracy
- Empirical $l_2$ norm $\|\cdot\|_n$; $V_f = \inf \left\{ \sum_h |\beta_h| \|h\|_n : f(x) = \sum_h \beta_h h(x) \right\}$
- The $m$-step estimator of LPGP is within order $V_f^2/m$ of the minimal objective:
  $\|Y - f_m\|_n^2 + \lambda v_m \le \inf_{f \in F_H} \left\{ \|Y - f\|_n^2 + \lambda V_f + \frac{4 V_f^2}{m+1} \right\}$
41 Advantages and Disadvantages
- Advantages: an explicit guarantee of accuracy; cost $Mnm$ vs. $Mn^2$ for LARS; inexpensive optimization at each iteration
- Disadvantages: an approximate solution; a fixed $\lambda_n$
42 Idea of Proof
- WLOG $\beta_h \ge 0$ (assume $H$ is closed under sign change). For an arbitrary $f = \sum_h \beta_h h$, let
  $e_m^2 = \|Y - f_m\|_n^2 - \|Y - f\|_n^2 + \lambda v_m$
- With $f_m = (1 - \alpha_m) f_{m-1} + \beta_m h_m$, $e_m^2$ is at least as good as the choice $\alpha = \frac{2}{m+1}$, $\beta = \alpha V_f$, and an $h$ chosen at random
43 Idea of Proof (continued)
- Rearrange to get
  $e_m^2 \le (1 - \alpha) e_{m-1}^2 + \alpha^2 b(V_f h) + \alpha \lambda V_f - 2\alpha(1-\alpha) \frac{1}{n} \sum_{i=1}^n (Y_i - f_{m-1}(X_i)) (V_f h(X_i) - f(X_i))$,
  where $b(V_f h) = \|Y - V_f h\|_n^2 - \|Y - f\|_n^2$ and the last (cross) term is 0 when averaging over $h$
- Draw $h$ with probability $\beta_h / V_f$: the cross term vanishes, and the average of $b(V_f h)$ over $h$ is bounded by $V_f^2$
44 Idea of Proof (conclusion)
- We show $e_m^2 \le (1 - \alpha) e_{m-1}^2 + \alpha^2 V_f^2 + \alpha \lambda V_f$
- Induction then yields the accuracy of $O(1/m)$
45 l1 Penalized Loglikelihood
- Let $X_1, \ldots, X_n$ be i.i.d. in $\mathbb{R}^p$ distributed as $p_f(x) = \frac{e^{f(x)} p_0(x)}{C_f}$
- Let $L_n(f) = \frac{1}{n} \log \left( 1/p_f(X^n) \right)$
- The l1 penalized loglikelihood estimator $\hat f = f_{\hat\beta}$ minimizes $L_n(f) + \lambda V_f$, where $V_f = \inf \left\{ \sum_h |\beta_h| : f(x) = \sum_h \beta_h h(x),\ h \in H \right\}$
46 Motivation
- The minimization is computationally demanding when $p$ is large
- Term-by-term selection is favored in sparse settings
- Approximate optimization is good enough for the risk analysis
- BHLL (2008) extends LPGP to penalized loglikelihood
47 LPGP for Penalized Loglikelihood
- Initialize with $f_0(x) = 0$
- $f_m(x) = (1 - \alpha_m) f_{m-1}(x) + \beta_m h_m(x)$, with $\alpha_m$, $\beta_m$ and $h_m$ chosen by
  $\arg\min_{\alpha, \beta, h} \left\{ L_n(f_m) + \lambda \left[ (1 - \alpha) v_{m-1} + |\beta| \right] \right\}$,
  where $v_{m-1} = \sum_{j=1}^{m-1} |\beta_{j,m-1}|$ for $f_{m-1} = \sum_{j=1}^{m-1} \beta_{j,m-1} h_j$
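A toy sketch of this loglikelihood version on a finite sample space, where the normalizing constant C_f is an explicit sum. Only the update form f_m = (1 - alpha_m) f_{m-1} + beta_m h_m and the penalized criterion come from the slide; the grid search over (alpha, beta), the sine library, the base density and the placeholder data are all mine.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 50                                    # finite sample space {0, ..., K-1}
p0 = np.full(K, 1.0 / K)                  # base density p_0
H = [np.sin(2 * np.pi * k * np.arange(K) / K) for k in range(1, 6)]   # toy library

def neg_loglik(f_vals, data):
    # L_n(f) = (1/n) sum_i log(1 / p_f(X_i)), with p_f = e^f p_0 / C_f
    logC = np.log(np.sum(np.exp(f_vals) * p0))
    return -np.mean(f_vals[data] + np.log(p0[data]) - logC)

def lpgp_loglik(data, lam, steps=30):
    f, v = np.zeros(K), 0.0
    for m in range(1, steps + 1):
        best = None
        for alpha in (2.0 / (m + 1), 0.1, 0.3):
            for h in H:
                for b in np.linspace(-2.0, 2.0, 41):
                    crit = neg_loglik((1 - alpha) * f + b * h, data) \
                           + lam * ((1 - alpha) * v + abs(b))
                    if best is None or crit < best[0]:
                        best = (crit, alpha, h, b)
        _, alpha, h, b = best
        f, v = (1 - alpha) * f + b * h, (1 - alpha) * v + abs(b)
    return f

data = rng.integers(0, K, size=400)       # placeholder sample
f_hat = lpgp_loglik(data, lam=0.05)
print(neg_loglik(f_hat, data))
```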
48 Theorem
Theorem. Suppose $|h(x)| \le C$ for all $h \in H$. The $m$-step LPGP estimator $f_m$ satisfies
$L_n(f_m) + \lambda v_m \le \inf_{f \in F} \left\{ L_n(f) + \lambda V_f + \frac{2 V_f^2}{m+1} \right\}$
49 Idea of Proof
- The $m$-step error has linear and nonlinear components
- The linear parts are handled similarly to the least squares case
- The nonlinear part (the normalizing constants) is $O(\alpha^2)$ by a moment generating function bound
- Induction completes the proof
50 Accuracy
- $f_m(x) = (1 - \alpha_m) f_{m-1}(x) + \beta_m h_m(x)$
- $e_m = L_n(f_m) - L_n(f) + \lambda \left[ (1 - \alpha_m) v_{m-1} + |\beta_m| \right]$
- From the definition, $L_n(f_m) - L_n(f)$ equals the linear part
  $-\frac{1}{n} \sum_{i=1}^n \left[ (1 - \alpha_m) f_{m-1}(X_i) + \beta_m h_m(X_i) - f(X_i) \right]$
  plus the nonlinear part
  $\log \dfrac{\int e^{f_m(t)} p_0(t)\, dt}{\int e^{f(t)} p_0(t)\, dt}$
51 Sampling h
- Consider $\alpha = 2/(m+1)$, $\beta = \alpha V_f$, and a random $h(x)$
- Rearrange, writing $p_\alpha(x) = e^{(1-\alpha)[f_{m-1}(x) - f(x)]}\, p_f(x)/c$ for a normalizing constant $c$:
  $e_m \le (1 - \alpha) e_{m-1} + \alpha \lambda V_f + \alpha \frac{1}{n} \sum_{i=1}^n \left[ f(X_i) - V_f h(X_i) \right] + \log \int p_\alpha(t) \exp\{ \alpha (V_f h(t) - f(t)) \}\, dt$,
  where the third term is 0 when averaging over $h$
- Sample $h$ with probability $\beta_h / V_f$: the third term vanishes
- Bring the average over $h$ inside the log; the expectation over the random $h$ of $\exp\{ \alpha (V_f h(t) - f(t)) \}$ is not more than $e^{\alpha^2 V_f^2 / 2}$
52 Induction
- We show $e_m \le (1 - \alpha) e_{m-1} + \frac{\alpha^2 V_f^2}{2} + \alpha \lambda V_f$
- Induction completes the proof
53 Current Work
- Generalize to permit an l2 norm in the penalized loglikelihood
- High-dimensional graphical models: logistic, Gaussian
- An R package will be made publicly available
54 Outline (repeated at the start of the concluding section)
55 Our Work (summary)
- General penalty condition
- Subset selection: $\mathrm{pen}_n(f_m) = \frac{\gamma}{n} \log \binom{M}{m} + \frac{m \log n}{n}$
- l1 penalization: $\mathrm{pen}_n(f_\beta) = \lambda_n \|\beta\|_1$ is valid; what size $\lambda_n$?
- Combinations thereof (see paper)
- Greedy algorithm for each is valid
56 Sampling Idea (summary)
- $f = \sum_h \beta_h h$, a linear combination of $h$ in $H$
- Randomly draw $h_1, h_2, \ldots, h_m$ independently, with probability proportional to $|\beta_h|$ of $h_i = h$
- This idea is useful in: the approximation bound; the proof of acceptability of a penalty via countable covers; the computational inaccuracy of the greedy algorithm
- Squared error of order $1/m$ or better in each case