Robust estimation, efficiency, and Lasso debiasing

1 Robust estimation, efficiency, and Lasso debiasing
Po-Ling Loh, University of Wisconsin-Madison, Departments of ECE & Statistics
WHOA-PSI workshop, Washington University in St. Louis, Aug 12, 2017

2-3 High-dimensional linear regression
Linear model: $y_i = x_i^T \beta^* + \epsilon_i$, $i = 1, \dots, n$, with $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, and $\beta^* \in \mathbb{R}^p$.
When $p \gg n$, assume sparsity: $\|\beta^*\|_0 \le k$.

4-5 Robust M-estimators
Generalization of OLS suitable for heavy-tailed/contaminated errors:
$$\widehat{\beta} \in \arg\min_{\beta} \Big\{ \frac{1}{n} \sum_{i=1}^n \ell(x_i^T \beta - y_i) \Big\}$$
[Figures: loss functions (least squares, absolute value, Huber, Tukey) plotted against the residual; a dataset of millions of calls per year fit by least squares, Huber, and Tukey.]
Extensive theory (consistency, asymptotic normality) for $p$ fixed, $n \to \infty$.
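
To make the losses above concrete, here is a minimal NumPy sketch of the Huber and Tukey (bisquare) losses and their derivatives (the $\psi$-functions used later in the talk); the tuning constants `delta` and `c` are illustrative defaults, not values taken from the slides.

```python
import numpy as np

def huber_loss(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond (bounded derivative)."""
    r = np.asarray(r, dtype=float)
    quad = 0.5 * r**2
    lin = delta * (np.abs(r) - 0.5 * delta)
    return np.where(np.abs(r) <= delta, quad, lin)

def huber_psi(r, delta=1.345):
    """Derivative of the Huber loss; bounded by delta."""
    return np.clip(np.asarray(r, dtype=float), -delta, delta)

def tukey_loss(r, c=4.685):
    """Tukey bisquare loss: redescending, constant for |r| > c (nonconvex)."""
    r = np.asarray(r, dtype=float)
    u = np.clip(r / c, -1.0, 1.0)
    return (c**2 / 6.0) * (1.0 - (1.0 - u**2)**3)

def tukey_psi(r, c=4.685):
    """Derivative of the Tukey loss; redescends to 0 for |r| > c."""
    r = np.asarray(r, dtype=float)
    out = r * (1.0 - (r / c)**2)**2
    return np.where(np.abs(r) <= c, out, 0.0)
```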

6-7 High-dimensional M-estimators
Natural idea: For $p > n$, use a regularized version:
$$\widehat{\beta} \in \arg\min_{\beta} \Big\{ \frac{1}{n} \sum_{i=1}^n \ell(x_i^T \beta - y_i) + \lambda \|\beta\|_1 \Big\}$$
Complications: Optimization for nonconvex $\ell$? Statistical theory? Are certain losses provably better than others?

8-10 Some statistical theory
When $\|\ell'\|_\infty \le C$, global optima of the high-dimensional M-estimator satisfy
$$\|\widehat{\beta} - \beta^*\|_2 \le C \sqrt{\frac{k \log p}{n}},$$
regardless of the distribution of $\epsilon_i$.
Compare to Lasso theory: requires sub-Gaussian $\epsilon_i$'s.
If $\ell(u)$ is locally convex/smooth for $|u| \le r$, any local optima within radius $cr$ of $\beta^*$ satisfy $\|\widetilde{\beta} - \beta^*\|_2 \le C \sqrt{\frac{k \log p}{n}}$.

11-12 Some optimization theory
[Figure: local region of radius $r$ around $\beta^*$; stationary points lie within statistical error $\asymp \sqrt{k \log p / n}$ of $\beta^*$.]
Local optima may be obtained via a two-step algorithm:
1. Run composite gradient descent on a convex, robust loss + $\ell_1$-penalty until convergence; output $\widehat{\beta}_H$.
2. Run composite gradient descent on the nonconvex, robust loss + $\mu$-amenable penalty, with input $\beta^0 = \widehat{\beta}_H$.
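
Step 1 of the two-step algorithm can be sketched as composite (proximal) gradient descent on the $\ell_1$-penalized robust loss. The routine below is a minimal illustration: `psi` is the derivative of the loss (e.g., `huber_psi` from the earlier sketch), the step size is a conservative default, and the $\ell_1$-ball side constraint $\|\beta\|_1 \le R$ is omitted for brevity.

```python
import numpy as np

def l1_prox(lam):
    """Proximal map of step * lam * ||.||_1, i.e. coordinatewise soft-thresholding."""
    def prox(v, step):
        return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
    return prox

def composite_gradient_descent(X, y, psi, prox, step=None, beta0=None, n_iter=500):
    """Composite (proximal) gradient descent on
        (1/n) * sum_i loss(x_i^T beta - y_i) + penalty(beta),
    where psi is the derivative of the loss and prox(v, step) is the proximal
    map of step * penalty.  Returns the final iterate."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else np.asarray(beta0, dtype=float).copy()
    if step is None:
        # conservative step size: inverse of the largest eigenvalue of X^T X / n
        step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ psi(X @ beta - y) / n   # gradient of the smooth robust loss
        beta = prox(beta - step * grad, step)
    return beta

# Example (step 1 of the two-step algorithm), with huber_psi from the earlier sketch:
# beta_H = composite_gradient_descent(X, y, huber_psi, l1_prox(lam))
```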

13-15 Motivating calculation
Lasso analysis (e.g., van de Geer 07, Bickel et al. 08):
$$\widehat{\beta} \in \arg\min_{\beta} \Big\{ \underbrace{\frac{1}{n} \|y - X\beta\|_2^2}_{\mathcal{L}_n(\beta)} + \lambda \|\beta\|_1 \Big\}$$
Rearranging the basic inequality $\mathcal{L}_n(\widehat{\beta}) + \lambda\|\widehat{\beta}\|_1 \le \mathcal{L}_n(\beta^*) + \lambda\|\beta^*\|_1$ and assuming $\lambda \ge 2 \big\|\tfrac{X^T \epsilon}{n}\big\|_\infty$, obtain
$$\|\widehat{\beta} - \beta^*\|_2 \le c \lambda \sqrt{k}.$$
Sub-Gaussian assumptions on the $x_i$'s and $\epsilon_i$'s provide $O\big(\sqrt{\tfrac{k \log p}{n}}\big)$ bounds, minimax optimal.

16-18 Motivating calculation
Key observation: For a general loss function, if $\lambda \ge 2 \big\|\tfrac{X^T \ell'(\epsilon)}{n}\big\|_\infty$, obtain
$$\|\widehat{\beta} - \beta^*\|_2 \le c \lambda \sqrt{k}.$$
$\ell'(\epsilon)$ is sub-Gaussian whenever $\ell'$ is bounded
$\Longrightarrow$ can achieve estimation error $\|\widehat{\beta} - \beta^*\|_2 \le c \sqrt{\tfrac{k \log p}{n}}$ without assuming $\epsilon_i$ is sub-Gaussian.
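
A quick simulation (illustrative sizes and seed, not from the talk) shows the point: with heavy-tailed Cauchy errors, $\|X^T\epsilon/n\|_\infty$ can be enormous, while $\|X^T\ell'(\epsilon)/n\|_\infty$ stays on the order of $\sqrt{\log p / n}$ because $\ell'(\epsilon)$ is bounded.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
eps = rng.standard_cauchy(n)                 # heavy-tailed errors

delta = 1.345
psi_eps = np.clip(eps, -delta, delta)        # bounded Huber derivative applied to errors

raw = np.max(np.abs(X.T @ eps)) / n          # drives the choice of lambda for the Lasso
robust = np.max(np.abs(X.T @ psi_eps)) / n   # drives lambda for a bounded-derivative loss

print(f"||X^T eps||_inf / n      = {raw:.3f}")
print(f"||X^T psi(eps)||_inf / n = {robust:.3f}")
print(f"sqrt(log p / n)          = {np.sqrt(np.log(p) / n):.3f}")
```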

19-22 Technical challenges
Lasso analysis also requires verifying a restricted eigenvalue (RE) condition on the design matrix, which is more complicated for general $\ell$.
Addressed by local curvature of robust losses around the origin. [Figure: loss functions (least squares, absolute value, Huber, Tukey) plotted against the residual.]
When $\ell$ is nonconvex, local optima $\widetilde{\beta}$ may exist that are not global optima.
Addressed by theoretical analysis of $\|\widetilde{\beta} - \beta^*\|_2$ and derivation of suitable optimization algorithms.

23-24 Related work: Nonconvex regularized M-estimators
Composite objective function:
$$\widehat{\beta} \in \arg\min_{\|\beta\|_1 \le R} \Big\{ \mathcal{L}_n(\beta) + \sum_{j=1}^p \rho_\lambda(\beta_j) \Big\}$$
Assumptions:
$\mathcal{L}_n$ satisfies restricted strong convexity with curvature $\alpha$ (Negahban et al. 12);
$\rho_\lambda$ has bounded subgradient at 0, and $\rho_\lambda(t) + \mu t^2$ is convex;
$\alpha > \mu$.
A concrete example of such a penalty appears below.
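
As a concrete instance of these assumptions, the sketch below defines the MCP penalty and numerically checks $\mu$-amenability in the sense stated above, i.e., that $\rho_\lambda(t) + \mu t^2$ is convex (for MCP this holds with $\mu = 1/(2\gamma)$); the parameter values are illustrative.

```python
import numpy as np

def mcp(t, lam, gamma=3.0):
    """MCP penalty: lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, else gamma*lam^2/2."""
    a = np.abs(np.asarray(t, dtype=float))
    return np.where(a <= gamma * lam, lam * a - a**2 / (2 * gamma), 0.5 * gamma * lam**2)

# Numerical check: rho_lam(t) + mu * t^2 should have nonnegative second differences
# when mu = 1/(2*gamma), matching the convexity condition stated on the slide.
lam, gamma = 0.5, 3.0
mu = 1.0 / (2.0 * gamma)
t = np.linspace(-5, 5, 2001)
g = mcp(t, lam, gamma) + mu * t**2
second_diff = np.diff(g, 2)
print("convexified MCP has nonnegative curvature:", np.all(second_diff >= -1e-10))
```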

25-26 Stationary points (L. & Wainwright 15)
[Figure: stationary points $\widetilde{\beta}$ lie within radius $\asymp \sqrt{k \log p / n}$ of $\beta^*$.]
Stationary points are statistically indistinguishable from global optima:
$$\langle \nabla \mathcal{L}_n(\widetilde{\beta}) + \nabla \rho_\lambda(\widetilde{\beta}), \, \beta - \widetilde{\beta} \rangle \ge 0, \quad \text{for all feasible } \beta.$$
Under suitable distributional assumptions, for $\lambda \asymp \sqrt{\tfrac{\log p}{n}}$ and $R \asymp \tfrac{1}{\lambda}$,
$$\|\widetilde{\beta} - \beta^*\|_2 \le c \sqrt{\frac{k \log p}{n}} \asymp \text{statistical error}.$$

27-28 Mathematical statement
Theorem (L. & Wainwright 15). Suppose $R$ is chosen s.t. $\beta^*$ is feasible, and $\lambda$ satisfies
$$\max\Big\{ \|\nabla \mathcal{L}_n(\beta^*)\|_\infty, \, \alpha \sqrt{\tfrac{\log p}{n}} \Big\} \lesssim \lambda \lesssim \frac{\alpha}{R}.$$
For $n \ge \frac{C \tau^2 R^2 \log p}{\alpha^2}$, any stationary point $\widetilde{\beta}$ satisfies
$$\|\widetilde{\beta} - \beta^*\|_2 \lesssim \frac{\lambda \sqrt{k}}{\alpha - \mu}, \quad \text{where } k = \|\beta^*\|_0.$$
New ingredient for the robust setting: $\ell$ is convex only in a local region $\Longrightarrow$ need for local consistency results.

29-30 Local RSC condition
Local RSC condition: For $\Delta := \beta_1 - \beta_2$,
$$\langle \nabla \mathcal{L}_n(\beta_1) - \nabla \mathcal{L}_n(\beta_2), \, \Delta \rangle \ge \alpha \|\Delta\|_2^2 - \tau \frac{\log p}{n} \|\Delta\|_1^2, \quad \text{whenever } \|\beta_j - \beta^*\|_2 \le r.$$
The loss function has directions of both positive and negative curvature; negative directions are forbidden by the regularizer.
Only requires restricted curvature within a constant-radius region around $\beta^*$.

31 Consistency of local stationary points
[Figure: stationary points within radius $r$ of $\beta^*$ lie within statistical error $\asymp \sqrt{k \log p / n}$.]
Theorem (L. 17). Suppose $\mathcal{L}_n$ satisfies $\alpha$-local RSC and $\rho_\lambda$ is $\mu$-amenable, with $\alpha > \mu$. Suppose $\|\ell'\|_\infty \le C$ and $\lambda \asymp \sqrt{\tfrac{\log p}{n}}$. For $n \gtrsim k \log p$, any stationary point $\widetilde{\beta}$ s.t. $\|\widetilde{\beta} - \beta^*\|_2 \le r$ satisfies
$$\|\widetilde{\beta} - \beta^*\|_2 \lesssim \frac{\lambda \sqrt{k}}{\alpha - \mu}.$$

32-33 Optimization theory
Question: How to obtain sufficiently close local solutions?
Goal: For the regularized M-estimator
$$\widehat{\beta} \in \arg\min_{\|\beta\|_1 \le R} \Big\{ \frac{1}{n} \sum_{i=1}^n \ell(x_i^T \beta - y_i) + \rho_\lambda(\beta) \Big\},$$
where $\ell$ satisfies $\alpha$-local RSC, find a stationary point such that $\|\widetilde{\beta} - \beta^*\|_2 \le r$.

34 Wisdom from Huber
"Descending $\psi$-functions are tricky, especially when the starting values for the iterations are non-robust. ... It is therefore preferable to start with a monotone $\psi$, iterate to death, and then append a few (1 or 2) iterations with the nonmonotone $\psi$." (Huber, 1981)

35-36 Two-step algorithm (L. 17)
Two-step M-estimator: Finds local stationary points of a nonconvex, robust loss + $\mu$-amenable penalty:
$$\widehat{\beta} \in \arg\min_{\|\beta\|_1 \le R} \Big\{ \frac{1}{n} \sum_{i=1}^n \ell(x_i^T \beta - y_i) + \rho_\lambda(\beta) \Big\}$$
Algorithm:
1. Run composite gradient descent on a convex, robust loss + $\ell_1$-penalty until convergence; output $\widehat{\beta}_H$.
2. Run composite gradient descent on the nonconvex, robust loss + $\mu$-amenable penalty, with input $\beta^0 = \widehat{\beta}_H$.
Note: We want to optimize the original nonconvex objective, since it leads to more efficient (lower-variance) estimators. A code sketch of the two-step procedure follows.
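
A minimal sketch of the two-step procedure, assuming the `composite_gradient_descent`, `l1_prox`, `huber_psi`, and `tukey_psi` helpers sketched earlier are in scope; `mcp_prox` is a proximal map for the MCP penalty (valid when the step size is below $\gamma$), and the tuning constants are illustrative.

```python
import numpy as np

def mcp_prox(lam, gamma=3.0):
    """Proximal map of step * MCP_{lam, gamma} (a mu-amenable penalty); needs step < gamma."""
    def prox(v, step):
        t = step * lam
        shrunk = np.sign(v) * np.maximum(np.abs(v) - t, 0.0) / (1.0 - step / gamma)
        return np.where(np.abs(v) <= gamma * lam, shrunk, v)
    return prox

def two_step_estimator(X, y, lam, gamma=3.0, n_iter=500):
    """Step 1: composite gradient descent on the convex Huber loss + l1 penalty.
    Step 2: composite gradient descent on the nonconvex Tukey loss + MCP penalty,
    warm-started at the step-1 output beta_H (cf. Huber's advice above)."""
    beta_H = composite_gradient_descent(X, y, huber_psi, l1_prox(lam), n_iter=n_iter)
    return composite_gradient_descent(X, y, tukey_psi, mcp_prox(lam, gamma),
                                      beta0=beta_H, n_iter=n_iter)
```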

37-42 Scale calibration
Closer look: The loss function $\ell$ is in some sense calibrated to the scale of $\epsilon_i$:
If the Huber parameter is too large, the estimation error bound based on $\|\ell'\|_\infty$ becomes suboptimal.
If the Huber parameter is too small, RSC is no longer satisfied w.h.p.
For the Lasso, the optimal $\lambda$ is known to depend on $\sigma_\epsilon$, but the loss function does not require calibration.
Better objective (low-dimensional version proposed by Huber):
$$(\widehat{\beta}, \widehat{\sigma}) \in \arg\min_{\beta, \sigma} \Big\{ \underbrace{\frac{1}{n} \sum_{i=1}^n \ell\Big(\frac{y_i - x_i^T \beta}{\sigma}\Big) \sigma + a\sigma}_{\mathcal{L}_n(\beta, \sigma)} + \lambda \|\beta\|_1 \Big\}$$
However, joint location/scale estimation is notoriously difficult even in low dimensions.
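
For concreteness, here is a sketch of the joint location/scale objective displayed above, reusing `huber_loss` from the earlier sketch; `best_sigma` grid-minimizes over $\sigma$ for a fixed $\beta$, the basic building block of an alternating scheme. The constant `a` and the grid are illustrative choices.

```python
import numpy as np

def joint_objective(beta, sigma, X, y, lam, a=0.5, delta=1.345):
    """Huber-style joint location/scale objective from the display above:
       (1/n) sum_i sigma * l((y_i - x_i^T beta)/sigma) + a*sigma + lam*||beta||_1,
    with l the Huber loss (huber_loss from the earlier sketch)."""
    r = (y - X @ beta) / sigma
    return sigma * np.mean(huber_loss(r, delta)) + a * sigma + lam * np.sum(np.abs(beta))

def best_sigma(beta, X, y, lam, sigma_grid, a=0.5, delta=1.345):
    """One block of an alternating scheme: grid-minimize over sigma for fixed beta."""
    vals = [joint_objective(beta, s, X, y, lam, a, delta) for s in sigma_grid]
    return sigma_grid[int(np.argmin(vals))]
```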

43-44 Scale calibration
Another idea: MM-estimator
$$\widehat{\beta} \in \arg\min_{\beta} \Big\{ \frac{1}{n} \sum_{i=1}^n \ell\Big(\frac{y_i - x_i^T \beta}{\widehat{\sigma}_0}\Big) + \lambda \|\beta\|_1 \Big\},$$
using a robust estimate of scale $\widehat{\sigma}_0$ based on a preliminary estimate $\widehat{\beta}_0$. How to obtain $(\widehat{\beta}_0, \widehat{\sigma}_0)$?
S-estimators/LMS: $\widehat{\beta}_0 \in \arg\min_{\beta} \{\widehat{\sigma}(r(\beta))\}$, where $\widehat{\sigma}(r) = |r|_{(n - \lfloor n\delta \rfloor)}$.
LTS: $\widehat{\beta}_0 \in \arg\min_{\beta} \Big\{ \frac{1}{n - \lfloor n\alpha \rfloor} \sum_{i=1}^{n - \lfloor n\alpha \rfloor} (y_i - x_i^T \beta)^2_{(i)} + \lambda \|\beta\|_1 \Big\}$ (see the sketch below).
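
The penalized LTS objective above can be evaluated as follows (a small sketch; the trimming fraction `alpha` is an illustrative input).

```python
import numpy as np

def penalized_lts_objective(beta, X, y, lam, alpha=0.25):
    """Penalized least-trimmed-squares objective: average of the n - floor(n*alpha)
    smallest squared residuals, plus lam * ||beta||_1."""
    n = len(y)
    h = n - int(np.floor(n * alpha))      # number of residuals kept
    sq = np.sort((y - X @ beta) ** 2)     # order statistics (y_i - x_i^T beta)^2_(i)
    return sq[:h].mean() + lam * np.sum(np.abs(beta))
```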

45-46 Our approach
Lepski's method was originally proposed for adaptive bandwidth selection in nonparametric regression.
It can be used to select $\sigma$ in the location/scale problem:
$$\widehat{\beta}_\sigma \in \arg\min_{\beta} \Big\{ \frac{1}{n} \sum_{i=1}^n \ell_\sigma(y_i - x_i^T \beta) + \lambda_\sigma \|\beta\|_1 \Big\},$$
where $\ell_\sigma$ is the Huber loss parametrized by $\sigma$.

47-52 Lepski's method
Preceding theory implies
$$\|\widehat{\beta}_\sigma - \beta^*\|_2 \le C \sigma \sqrt{\frac{k \log p}{n}}$$
w.h.p., assuming $\sigma \ge \sqrt{\operatorname{Var}(\epsilon_i)} =: \sigma^*$.
Basic idea of Lepski's method:
Compute $\widehat{\beta}_\sigma$ on a gridding $\{\sigma_1, \dots, \sigma_M\}$ of the interval $[\sigma_{\min}, \sigma_{\max}]$.
For each $\sigma_j$, check whether $\|\widehat{\beta}_{\sigma_j} - \widehat{\beta}_{\sigma_l}\|_2 \le 2C\sigma_l \sqrt{\tfrac{k \log p}{n}}$ for all $l > j$, and let $\widehat{\sigma}$ be the argmin (smallest such $\sigma_j$) in this set.
[Figure: grid of $\sigma$ values in $[\sigma_{\min}, \sigma_{\max}]$, with selected value $\widehat{\sigma}$.]
A code sketch of the selection rule follows.
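
A minimal sketch of the Lepski selection rule just described: `fit(sigma)` is assumed to return $\widehat{\beta}_\sigma$ (for instance, the penalized Huber fit sketched earlier, with loss and penalty calibrated to $\sigma$), and `C` is the constant from the bound, taken here as an input.

```python
import numpy as np

def lepski_select(fit, sigma_grid, C, k, n, p):
    """Lepski's method: fit(sigma) returns beta_hat_sigma.  sigma_grid must be
    sorted increasingly.  Returns the selected sigma and its estimate."""
    rad = C * np.sqrt(k * np.log(p) / n)
    betas = [fit(s) for s in sigma_grid]
    for j, sigma_j in enumerate(sigma_grid):
        # accept sigma_j if it agrees with every larger sigma_l within 2*C*sigma_l*rad
        ok = all(np.linalg.norm(betas[j] - betas[l]) <= 2.0 * sigma_grid[l] * rad
                 for l in range(j + 1, len(sigma_grid)))
        if ok:
            return sigma_j, betas[j]
    return sigma_grid[-1], betas[-1]      # fall back to the largest sigma
```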

53 Statistical guarantee
Theorem (L. 17). With high probability, the output of Lepski's method satisfies
$$\|\widehat{\beta}_{\widehat{\sigma}} - \beta^*\|_2 \le C \sigma^* \sqrt{\frac{k \log p}{n}}.$$
The method does not require prior knowledge of the scale $\sigma^*$.

54-56 Efficiency
Although $\widehat{\beta}_{\widehat{\sigma}}$ is guaranteed to be $\ell_2$-consistent, the estimator may have relatively high variance.
One-step estimation was proposed for obtaining better efficiency (Bickel 75):
$$b_\psi = \widehat{\beta} + \frac{(X^T X / n)^{-1}}{\widehat{A}(\psi)} \cdot \frac{1}{n} \sum_{i=1}^n \psi(y_i - x_i^T \widehat{\beta})\, x_i,$$
where $\widehat{A}(\psi)$ is an estimate of $E[\psi'(\epsilon_i)]$.
Low-dimensional result:
$$\sqrt{n}\,(b_\psi - \beta^*) \xrightarrow{d} N\Big(0, \frac{E[\psi^2(\epsilon_i)]}{E[\psi'(\epsilon_i)]^2}\, \Theta\Big),$$
so the asymptotic variance for $\psi = -\tfrac{f'}{f}$ matches the variance of the MLE.
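
A sketch of the classical low-dimensional one-step construction above; `psi` and `dpsi` are the chosen influence function and its (a.e.) derivative, supplied by the user.

```python
import numpy as np

def one_step_low_dim(beta_init, X, y, psi, dpsi):
    """Classical one-step estimator: b_psi = beta_init + (X^T X / n)^{-1} / A_hat *
    (1/n) sum_i psi(y_i - x_i^T beta_init) x_i, with A_hat = (1/n) sum_i dpsi(resid_i)."""
    n = X.shape[0]
    resid = y - X @ beta_init
    A_hat = np.mean(dpsi(resid))          # estimate of E[psi'(eps_i)]
    score = X.T @ psi(resid) / n          # (1/n) sum_i psi(resid_i) x_i
    Sigma_hat = X.T @ X / n
    return beta_init + np.linalg.solve(Sigma_hat, score) / A_hat
```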

57-58 Efficiency
In high dimensions, define
$$b_\psi = \widehat{\beta}_{\widehat{\sigma}} + \frac{\widehat{\Theta}}{\widehat{A}(\psi)} \cdot \frac{1}{n} \sum_{i=1}^n \psi(y_i - x_i^T \widehat{\beta}_{\widehat{\sigma}})\, x_i,$$
where $\widehat{A}(\psi) = \frac{1}{n} \sum_{i=1}^n \psi'(y_i - x_i^T \widehat{\beta}_{\widehat{\sigma}})$ and $\widehat{\Theta}$ is a high-dimensional estimate of $\Theta$ (e.g., a graphical Lasso estimator).
This resembles the Lasso debiasing procedure (Zhang & Zhang 14, van de Geer et al. 14, Javanmard & Montanari 14).
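
The high-dimensional analogue simply swaps in an estimate $\widehat{\Theta}$ of the precision matrix (the slide mentions, e.g., the graphical Lasso); the sketch below takes `Theta_hat` as an input rather than committing to a particular estimator.

```python
import numpy as np

def one_step_high_dim(beta_init, X, y, psi, dpsi, Theta_hat):
    """Debiased one-step estimator: b_psi = beta_init + Theta_hat / A_hat *
    (1/n) sum_i psi(y_i - x_i^T beta_init) x_i, where Theta_hat estimates the
    precision matrix Theta."""
    n = X.shape[0]
    resid = y - X @ beta_init
    A_hat = np.mean(dpsi(resid))          # A_hat = (1/n) sum_i psi'(resid_i)
    score = X.T @ psi(resid) / n
    return beta_init + (Theta_hat @ score) / A_hat
```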

59-62 Efficiency
Theorem (L. 17). Let $J \subseteq \{1, \dots, p\}$ denote a subset of coordinates of constant dimension. Then
$$\sqrt{n}\,(b_\psi - \beta^*)_J \xrightarrow{d} N\Big(0, \frac{E[\psi^2(\epsilon_i)]}{E[\psi'(\epsilon_i)]^2}\, \Theta_{JJ}\Big).$$
Implies semiparametric efficiency of the one-step estimator when $\psi = -\tfrac{f'}{f}$.
Can derive asymptotic confidence intervals/regions for subsets of coefficients (see the sketch below).
Important: Allows statistical inference for high-dimensional regression in cases where the $x_i$'s and $\epsilon_i$'s are heavy-tailed.
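
Given the asymptotic normality above, approximate coordinatewise confidence intervals follow by plugging in sample analogues of $E[\psi^2(\epsilon_i)]$, $E[\psi'(\epsilon_i)]$, and $\Theta_{JJ}$; the sketch below assumes `b_psi`, the residuals from $\widehat{\beta}_{\widehat{\sigma}}$, and `Theta_hat` have already been computed as in the earlier sketch.

```python
import numpy as np
from scipy import stats

def one_step_confidence_intervals(b_psi, resid, psi, dpsi, Theta_hat, J, n, level=0.95):
    """Coordinatewise CIs for beta*_j, j in J, based on
    sqrt(n) (b_psi - beta*)_J -> N(0, E[psi^2]/E[psi']^2 * Theta_JJ)."""
    z = stats.norm.ppf(0.5 + level / 2.0)
    var_ratio = np.mean(psi(resid) ** 2) / np.mean(dpsi(resid)) ** 2   # plug-in variance ratio
    ses = np.sqrt(var_ratio * np.diag(Theta_hat)[list(J)] / n)
    centers = b_psi[list(J)]
    return np.column_stack([centers - z * ses, centers + z * ses])
```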

63-64 Summary
New theory for robust high-dimensional M-estimators implies $O\big(\sqrt{\tfrac{k \log p}{n}}\big)$ error rates when $\|\ell'\|_\infty \le C$, based on local RSC.
Lepski's method proposed to avoid joint scale parameter estimation.
Derived properties of a one-step estimator for semiparametric efficiency and high-dimensional inference.
Loh (2017). Statistical consistency and asymptotic normality for high-dimensional robust M-estimators. Annals of Statistics.
Thank you!
