Uncertainty quantification in high-dimensional statistics
1 Uncertainty quantification in high-dimensional statistics. Peter Bühlmann, ETH Zürich. Based on joint work with Sara van de Geer, Nicolai Meinshausen and Lukas Meier.
2 High-dimensional data. Behavioral economics and genetics (with Ernst Fehr, U. Zurich): n persons with genetic information (p SNPs) and response variables measuring behavior; p ≫ n. Goal: find significant associations between behavioral responses and genetic markers. [Figure: number of significant target SNPs per phenotype, plotted against the phenotype index.]
3 Ars Conjectandi in the light of modern applications. From Stigler (1986) and Bolthausen (2010) I learned: Jakob Bernoulli developed the Law of Large Numbers for Bernoulli-distributed random variables; in the proof, a result about large deviations (a concentration inequality; no mention of the variance). Point estimation; rate of convergence.
4 Regarding high-dimensional statistics: a lot of progress has been achieved over the last 8-10 years for point estimation and rates of convergence, and a substantial mathematical part relies on concentration inequalities and results on large deviations. Link to Ars Conjectandi: the general approach is still the same, and it was established 300 years ago.
5 For high-dimensional statistics: very little work on assigning measures of uncertainty, p-values, confidence intervals. And in Ars Conjectandi...? I'll come to this in a moment.
7 we need uncertainty quantification! (the core of statistics)
8 we need uncertainty quantification! (the core of statistics) did Jakob Bernoulli address this point?
9 Jakob Bernoulli wrote: "... the most important part is missing, where I show how the fundamental principles of Ars Conjectandi can be applied to civil, moral and economic matters". Bernoulli describes how one can approximate unknown probabilities by finite-sample quantities (i.e., a frequentist view). Leibniz was not very convinced: how can one describe complex phenomena like diseases with probabilities (potential environmental changes, ...)? We certainly do nowadays ("personalized medicine")! Bernoulli gives a brief response addressing "the criticisms raised by some academics"; probably Bernoulli did not have enough time to present a more detailed treatment. There are indications that Bernoulli had something like a confidence interval in mind.
12 Goal (regarding the title of the talk): p-values/confidence intervals for a high-dimensional linear model (and we can then generalize to other models).
13 Motif regression and variable selection for finding HIF1α transcription factor binding sites in DNA sequences (Müller, Meier, PB & Ricci). For coarse DNA segments i = 1, ..., n: predictor X_i = (X_i^(1), ..., X_i^(p)) ∈ R^p, the abundance scores of candidate motifs j = 1, ..., p in DNA segment i (using sequence data and computational biology algorithms, e.g. MDSCAN); univariate response Y_i ∈ R, the binding intensity of HIF1α to the coarse DNA segment (from ChIP-chip experiments).
14 Question: what is the relation between the binding intensity Y and the abundance of short candidate motifs? A linear model is often reasonable: motif regression (Conlon, X.S. Liu, Lieb & J.S. Liu, 2003), Y_i = Σ_{j=1}^p β_j⁰ X_i^(j) + ε_i, i = 1, ..., n, with n = 143, p = 195. Goal: variable selection and significance of variables, i.e. find the relevant motifs among the p = 195 candidates.
15 Lasso (Tibshirani, 1996). Lasso for linear models: β̂(λ) = argmin_β ( n⁻¹ ‖Y − Xβ‖² + λ ‖β‖₁ ), λ ≥ 0, where ‖β‖₁ = Σ_{j=1}^p |β_j|. Well-known facts: convex optimization; the Lasso does variable selection, i.e. some of the β̂_j(λ) = 0 (because of the ℓ₁-geometry); β̂(λ) is a shrunken OLS estimate.
16 Lasso for variable selection: Ŝ(λ) = {j : β̂_j(λ) ≠ 0}, an estimate of the active set S₀ = {j : β_j⁰ ≠ 0}. No significance testing involved; it's convex optimization only! And it's very popular (Meinshausen & PB, 2006; Zhao & Yu, 2006; Wainwright, 2009; ...).
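A minimal numerical sketch of these two slides, with an invented toy data set and a plain coordinate-descent Lasso (`lasso_cd`, written only for this illustration, not the code behind the talk): fit β̂(λ) and read off Ŝ(λ) from the nonzero coefficients.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=60):
    """Coordinate descent for the Lasso
    beta_hat(lam) = argmin_beta n^{-1} ||y - X beta||^2 + lam ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    cn = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            # correlation of column j with the partial residual (j left out)
            rho = X[:, j] @ (y - X @ beta + X[:, j] * beta[j]) / n
            # soft-thresholding update from the subgradient condition
            beta[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / cn[j]
    return beta

# toy data: n = 100, p = 200, true active set S0 = {0, 1, 2}
rng = np.random.default_rng(0)
n, p = 100, 200
beta0 = np.zeros(p)
beta0[:3] = 2.0
X = rng.standard_normal((n, p))
y = X @ beta0 + 0.5 * rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=0.2)   # lam of order sqrt(log(p)/n)
S_hat = {j for j in range(p) if beta_hat[j] != 0}
```

In this easy setting Ŝ contains the true active set plus a few false positives, which is exactly the screening behavior discussed later; no significance statement is attached to any selected variable.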
17 For motif regression (finding HIF1α transcription factor binding sites), n = 143, p = 195: the Lasso selects 26 covariates when choosing λ = λ̂_CV via cross-validation, with resulting R² ≈ 50%, i.e. 26 interesting candidate motifs. How significant are the findings?
19 [Figure: estimated coefficients β̂(λ̂_CV) for the original data, plotted against the variable index.] P-values for H_{0,j}: β_j⁰ = 0?
20 High-dimensional linear models and what statistical theory tells us. Y = Xβ⁰ + ε, p ≫ n, with fixed (deterministic) design X. Problem of identifiability: for p > n, Xβ⁰ = Xθ for any θ = β⁰ + ξ with ξ in the null space of X; we cannot say anything about β̂ − β⁰ without further assumptions!
21 Assumption 1: design conditions in terms of restricted eigenvalues. The minimal eigenvalue of Σ̂ = XᵀX/n equals zero (if p > n); consider instead the smallest restricted eigenvalue (or compatibility constant) and require it to be bounded away from zero (van de Geer, 2007; Candès & Tao, 2007; ...; Bickel, Ritov & Tsybakov, 2009; Wainwright, 2009; ...). Example: X has i.i.d. rows with sub-Gaussian distribution and Cov(X_i) is e.g. a Toeplitz matrix, or equi-correlation with 0 < ρ < 1; then, with high probability, the smallest restricted eigenvalue of Σ̂ is bounded away from 0.
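The rank deficiency behind this assumption is easy to verify numerically; a small check (dimensions invented for this sketch) counts the zero eigenvalues of Σ̂ = XᵀX/n when p > n:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 120
X = rng.standard_normal((n, p))
Sigma_hat = X.T @ X / n          # p x p Gram matrix, rank at most n < p

eigvals = np.linalg.eigvalsh(Sigma_hat)
# at least p - n eigenvalues are (numerically) zero
print(np.sum(eigvals < 1e-8))    # prints 70 (= p - n)
```

So the unrestricted minimal eigenvalue is useless for p > n; restricted eigenvalue or compatibility conditions look only at directions compatible with sparse vectors, which is why they can still be bounded away from zero.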
22 [Diagram: various conditions and their relations (van de Geer & PB, 2009): RIP, weak (S, 2s)-irrepresentable, weak (S, 2s)-RIP, coherence, adaptive (S, 2s)- and (S, s)-restricted regression, (S, 2s)-irrepresentable, (S, 2s)- and (S, s)-restricted eigenvalue, (S, s)-uniform irrepresentable, S-compatibility; arrows indicate which conditions imply oracle inequalities for prediction and estimation.]
23 Consider the Lasso β̂(λ) = argmin_β ( n⁻¹ ‖Y − Xβ‖² + λ ‖β‖₁ ). Assuming the restricted ℓ₁-eigenvalue (compatibility) condition: n⁻¹ ‖X(β̂ − β⁰)‖² = O_P(s₀ log(p)/n) for λ ≍ √(log(p)/n), and ‖β̂ − β⁰‖₁ = O_P(s₀ √(log(p)/n)) for λ ≍ √(log(p)/n), where s₀ = |S₀| is the cardinality of the active set. That is: β⁰ is identifiable (if s₀ ≪ n/log(p)).
24 Assumption 2: beta-min condition. Beta-min condition: min_{j ∈ S₀} |β_j⁰| ≫ s₀ √(log(p)/n) (or √(s₀ log(p)/n), or √(log(p)/n)). From ‖β̂ − β⁰‖₁ = O_P(s₀ √(log(p)/n)) we immediately obtain variable screening: Ŝ ⊇ S₀ with high probability, i.e., we will not miss a true variable! But we may (typically) have too many false positive selections.
25 [Figure: estimated coefficients β̂(λ̂_CV) for the original data, plotted against the variable index.] Which variables in Ŝ are false positives? P-values would be very useful!
26 P-values for high-dimensional linear models. Y = Xβ⁰ + ε; goal: statistical hypothesis testing of H_{0,j}: β_j⁰ = 0, or H_{0,G}: β_j⁰ = 0 for all j ∈ G ⊆ {1, ..., p}. Background: if we could handle the asymptotic distribution of the Lasso β̂(λ) under the null hypothesis, we could construct p-values. This is very difficult! The asymptotic distribution of β̂ has some point mass at zero, ... (Knight and Fu (2000), for fixed p and n → ∞).
27 Standard bootstrapping and subsampling cannot be used either, but there are recent proposals using adaptations of standard resampling methods (Chatterjee & Lahiri, 2013; Liu & Yu, 2013); non-uniformity/super-efficiency issues remain...
28 Low-dimensional projections and bias correction, or: de-sparsifying the Lasso estimator (related work by Zhang and Zhang (2011)). Motivation: the OLS estimator satisfies β̂_{OLS,j} = projection of Y onto the residuals X_j − X_{−j} γ̂_{OLS}^{(j)}; this projection is not well defined if p > n. Instead, use regularized residuals from a Lasso of X_j on the other X-variables: Z_j = X_j − X_{−j} γ̂_{Lasso}^{(j)}.
29 Using Y = Xβ⁰ + ε: Z_jᵀY = Z_jᵀX_j β_j⁰ + Σ_{k≠j} Z_jᵀX_k β_k⁰ + Z_jᵀε, and hence Z_jᵀY / (Z_jᵀX_j) = β_j⁰ + Σ_{k≠j} (Z_jᵀX_k / Z_jᵀX_j) β_k⁰ [bias] + Z_jᵀε / (Z_jᵀX_j) [noise component]. De-sparsified estimator, with the bias term estimated via the Lasso (Lasso-estimated bias correction): b̂_j = Z_jᵀY / (Z_jᵀX_j) − Σ_{k≠j} (Z_jᵀX_k / Z_jᵀX_j) β̂_{Lasso;k}.
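The construction above can be written out numerically. A minimal sketch, assuming simulated Gaussian data and using a plain coordinate-descent Lasso (`lasso_cd`, written for this illustration) both for the initial estimator β̂_Lasso and for the nodewise regressions producing Z_j:

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=60):
    # plain coordinate-descent Lasso (helper written for this sketch)
    n, p = X.shape
    beta = np.zeros(p)
    cn = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            rho = X[:, j] @ (y - X @ beta + X[:, j] * beta[j]) / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / cn[j]
    return beta

rng = np.random.default_rng(1)
n, p, lam = 100, 120, 0.2
beta0 = np.zeros(p)
beta0[:3] = 1.0                       # true active set {0, 1, 2}
X = rng.standard_normal((n, p))
y = X @ beta0 + 0.5 * rng.standard_normal(n)

beta_lasso = lasso_cd(X, y, lam)      # initial (sparse) Lasso estimate

def desparsify(j):
    # Z_j: residuals of a nodewise Lasso of X_j on the other columns
    X_rest = np.delete(X, j, axis=1)
    gamma = lasso_cd(X_rest, X[:, j], lam)
    Z = X[:, j] - X_rest @ gamma
    denom = Z @ X[:, j]
    # b_j = Z_j'Y / Z_j'X_j  minus the Lasso-estimated bias term
    bias = (Z @ X_rest) @ np.delete(beta_lasso, j) / denom
    return Z @ y / denom - bias

b_hat = np.array([desparsify(j) for j in range(4)])  # first 4 coordinates
```

Note that b̂_j is generally nonzero even when the Lasso sets β̂_j to zero: the de-sparsified estimator is not sparse, which is exactly the point of the next slide.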
30 b̂_j is not sparse! ... and this is crucial to obtain the Gaussian limit; nevertheless, it is optimal (see later). Target: the low-dimensional component β_j⁰; η := {β_k⁰ ; k ≠ j} is a high-dimensional nuisance parameter, exactly as in semiparametric modeling, and it is sparsely estimated (e.g. with the Lasso).
32 Asymptotic pivot and optimality. Theorem (van de Geer, PB & Ritov, 2013): √n (b̂_j − β_j⁰) ⇒ N(0, σ_ε² Ω_jj) (j = 1, ..., p), with Ω_jj an explicit expression, Ω_jj → (Σ⁻¹)_jj. Optimal! Reaching the semiparametric information bound; asymptotically optimal p-values and confidence intervals. If we assume: the population covariance Cov(X) = Σ has minimal eigenvalue ≥ M > 0; sparsity of the regression Y vs. X: s₀ = o(√n / log(p)) (quite sparse); sparsity of the design: Σ⁻¹ sparse, i.e. sparse regressions of X_j vs. X_{−j}: s_j = o(√n / log(p)) (may not be OK).
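Given the pivot, p-values and confidence intervals follow directly. A sketch assuming σ_ε and Ω_jj are given (in practice both must be estimated consistently); the helper name and the numbers are hypothetical:

```python
import math

def desparsified_pvalue(b_j, sigma_eps, omega_jj, n, beta_null=0.0):
    """Two-sided p-value for H_{0,j}: beta_j = beta_null, based on the
    pivot sqrt(n) (b_j - beta_j) ~ N(0, sigma_eps^2 * Omega_jj)."""
    se = sigma_eps * math.sqrt(omega_jj / n)
    z = (b_j - beta_null) / se
    # two-sided standard normal tail via the complementary error function
    return math.erfc(abs(z) / math.sqrt(2.0))

# hypothetical numbers: b_j = 0.25, sigma_eps = 1, Omega_jj = 1, n = 100,
# so z = 2.5 and the two-sided p-value is about 0.0124
pval = desparsified_pvalue(0.25, 1.0, 1.0, 100)

# a 95% confidence interval from the same pivot
se = 1.0 * math.sqrt(1.0 / 100)
ci = (0.25 - 1.96 * se, 0.25 + 1.96 * se)
```

The same pivot also delivers confidence intervals b̂_j ± z_{1−α/2} σ_ε √(Ω_jj/n), which is what "asymptotically optimal p-values and confidence intervals" refers to.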
33 It is optimal! (Cramér-Rao)
34 For designs with Σ⁻¹ non-sparse: Ridge projection (PB, 2013): good type I error control, but not optimal in terms of power. A convex program instead of the Lasso for Z_j (Javanmard & Montanari, 2013; MSc thesis of Dezeure, 2013): Javanmard & Montanari prove optimality. Careful choice of regularization parameters, e.g. with the square-root Lasso (van de Geer & PB, in progress). So far there is no convincing empirical evidence that we can deal well with such scenarios (Σ⁻¹ non-sparse).
35 Uniform convergence: √n (b̂_j − β_j⁰) ⇒ N(0, σ_ε² Ω_jj) (j = 1, ..., p); the convergence is uniform over B(s₀) = {β : ‖β‖₀ ≤ s₀}, hence honest tests and confidence regions! And we can avoid post-model-selection inference (cf. Pötscher and Leeb).
36 Simultaneous inference over all components: √n (b̂ − β⁰) ≈ (W_1, ..., W_p) ∼ N_p(0, σ_ε² Ω). Can construct p-values for H_{0,G} with any G, using the test statistic max_{j∈G} |b̂_j|, since the covariance structure Ω is known; and one can easily do an efficient multiple-testing adjustment, again since Ω is known!
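Because Ω is known, the null distribution of the max statistic can be simulated directly. A Monte Carlo sketch; the equi-correlation Ω and all numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
p, sigma = 20, 1.0

# hypothetical known covariance Omega: equi-correlation with rho = 0.5
rho = 0.5
Omega = (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

# Monte Carlo null distribution of max_j |W_j|, W ~ N_p(0, sigma^2 Omega)
L = np.linalg.cholesky(Omega)
W = sigma * (rng.standard_normal((100_000, p)) @ L.T)  # rows ~ N(0, Omega)
null_max = np.abs(W).max(axis=1)

# p-value for an (invented) observed statistic max_{j in G} sqrt(n) |b_hat_j|
t_obs = 3.0
p_value = float((null_max >= t_obs).mean())
```

The same simulated null distribution gives critical values for FWER-controlling simultaneous confidence bands, which is the "efficient multiple testing adjustment" mentioned above.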
37 Alternatives? Versions of bootstrapping (Chatterjee & Lahiri, 2013): super-efficiency phenomenon (Joe Hodges), i.e. non-uniform convergence; good for estimating the zeroes (j ∈ S₀ᶜ with β_j⁰ = 0), bad for estimating the non-zeroes (j ∈ S₀ with β_j⁰ ≠ 0). Multiple sample splitting (Meinshausen, Meier & PB, 2009): split the sample repeatedly into two halves; select variables on the first half; compute p-values using the second half, based on the selected variables. This avoids (because of sample splitting) over-optimistic p-values, but potentially suffers in terms of power.
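The multiple sample-splitting procedure can be sketched in a few lines. To keep the sketch short, selection on the first half uses marginal-correlation screening (`screen_top_k`) as a stand-in for the Lasso selector, and the aggregation over splits is the simple twice-the-median rule (the γ = 1/2 case of the quantile aggregation in Meinshausen, Meier & PB, 2009); all helper names and numbers are invented for this illustration:

```python
import numpy as np
from math import erfc, sqrt

def screen_top_k(X, y, k=10):
    # stand-in selector: marginal-correlation screening instead of the Lasso
    return np.sort(np.argsort(np.abs(X.T @ y))[-k:])

def ols_pvalues(X, y):
    # classical OLS p-values (normal approximation) for an n x k design, k < n
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return np.array([erfc(abs(z) / sqrt(2.0)) for z in beta / se])

def multi_split_pvalues(X, y, B=25, k=10, seed=0):
    n, p = X.shape
    rng = np.random.default_rng(seed)
    P = np.ones((B, p))
    for b in range(B):
        idx = rng.permutation(n)
        i1, i2 = idx[: n // 2], idx[n // 2:]
        S = screen_top_k(X[i1], y[i1], k)          # select on first half
        praw = ols_pvalues(X[i2][:, S], y[i2])     # p-values on second half
        P[b, S] = np.minimum(praw * len(S), 1.0)   # Bonferroni over |S|
    # aggregate over splits: twice the median, capped at 1
    return np.minimum(2.0 * np.median(P, axis=0), 1.0)

rng = np.random.default_rng(3)
n, p = 200, 50
beta0 = np.zeros(p)
beta0[:2] = 2.0
X = rng.standard_normal((n, p))
y = X @ beta0 + rng.standard_normal(n)

pv = multi_split_pvalues(X, y)
```

Unselected variables get the p-value 1 in each split, so null coordinates aggregate to p-values near 1; this built-in conservatism is the power cost mentioned on the slide.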
38 Some empirical results (Dezeure, Meier & PB, in progress): compare power and control of the familywise error rate (FWER); always p = 500, n = 100 and s₀ = 15. Equi-correlation design, Σ_jk ≡ 0.8 (j ≠ k), so Σ⁻¹ not sparse: [Figure: power and FWER for multi sample splitting, Ridge projection and Lasso projection.] The projection estimators are unreliable for controlling the FWER!
39 Toeplitz design with banded (very sparse) Σ⁻¹: [Figure: power and FWER for the three methods.] The Lasso-projection method is slightly best (as it should be!).
40 Design with exponentially decaying (approximately sparse) Σ⁻¹: [Figure: power and FWER for the three methods.] The methods are roughly on par; the Lasso-projection method has one scenario with bad FWER.
41 Real data X: the 500 variables with highest empirical variance: [Figure: power and FWER for the three methods.] The Lasso-projection method is unreliable for controlling the FWER!
42 Real data X: the 500 variables with highest pairwise correlations: [Figure: power and FWER for the three methods.] The Lasso-projection method is slightly best; multi-sample splitting once has rather bad FWER.
43 Overall: multi sample splitting seems most reliable (for type I error control), at the price of being more conservative (less power), and there is no optimality theory for it. [Photos: Leo Breiman, Brad Efron.] Is our empirical finding true more generally? So far, theory doesn't give a clear answer.
44 Motif regression example: one significant variable, with both the de-sparsified Lasso and multi sample splitting. [Figure: motif regression coefficients vs. variable index; the legend marks the variable/motif with significant FWER-adjusted p-value, and one whose p-value is clearly larger than 0.05.] The significant variable corresponds to a known true motif.
45 For data sets with large p and n ≈ 100: often no significant variable, because the ratio log(p)/n is too extreme.
46 Behavioral economics and genome-wide association, with Ernst Fehr, University of Zurich. n = 1525 probands (all students!); m = 79 response variables measuring various behavioral characteristics (e.g. risk aversion) from well-designed experiments; 460 target SNPs (as a proxy for 10⁶ SNPs): 1380 parameters per response (but only 1341 meaningful parameters). Model: multivariate linear model Y = Xβ + ε, with Y the n × m responses, X the n × p SNP data, β the p × m coefficient matrix and ε the n × m error. Although p < n, the design matrix X (with categorical values {1, 2, 3}) does not have full rank.
47 Y (n × m) = X (n × p) β (p × m) + ε (n × m). Interested in p-values for H_{0,jk}: β_jk = 0 versus H_{A,jk}: β_jk ≠ 0, and H_{0,G}: β_jk = 0 for all (j, k) ∈ G versus H_{A,G} = H_{0,G}ᶜ, adjusted to control the familywise error rate (a conservative criterion). In total we consider a very large number of hypotheses; we test for non-marginal regression coefficients ("predictive" GWAS).
48 There is structure! Global → 79 response experiments → 23 chromosomes per response experiment → 20 target SNPs per chromosome (= 460 target SNPs per response).
49 Do a hierarchical FWER adjustment (Meinshausen, 2008): 1. test the global hypothesis; 2. if significant: test all single-response hypotheses; 3. for the significant responses: test all single-chromosome hypotheses; 4. for the significant chromosomes: test all target SNPs. Powerful multiple testing with a data-dependent adaptation of the resolution level (our analysis with 20 target SNPs per chromosome is ad hoc); cf. the general sequential testing principle (Goeman & Solari, 2010).
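The four-step descent above can be sketched generically. This is a toy rendering in the spirit of Meinshausen (2008), with the leaves_total/|cluster| inflation of cluster p-values and the rule that children are tested only when the parent is rejected; the tree, the `p_value_of` oracle and all names are hypothetical:

```python
def collect_leaves(node):
    # leaves are plain indices; internal nodes are tuples of children
    if not isinstance(node, tuple):
        return [node]
    return [leaf for child in node for leaf in collect_leaves(child)]

def hierarchical_test(node, leaves_total, p_value_of, alpha=0.05):
    """Top-down hierarchical FWER testing: the raw p-value of a cluster C
    is inflated by the factor leaves_total / |C|, and children are tested
    only if the parent was rejected.  `p_value_of` maps a list of leaves
    to a raw p-value (a stand-in oracle here)."""
    leaves = collect_leaves(node)
    p_adj = min(p_value_of(leaves) * leaves_total / len(leaves), 1.0)
    if p_adj > alpha:
        return []                                  # stop descending here
    rejected = [(tuple(leaves), p_adj)]
    if isinstance(node, tuple):
        for child in node:
            rejected += hierarchical_test(child, leaves_total,
                                          p_value_of, alpha)
    return rejected

# toy tree: global -> two "chromosomes" -> individual "SNPs"
tree = ((0, 1), (2, 3))
# hypothetical p-value oracle: only groups containing variable 0 show signal
p_value_of = lambda leaves: 1e-4 if 0 in leaves else 0.5

rejections = hierarchical_test(tree, leaves_total=4, p_value_of=p_value_of)
```

On this toy input the descent rejects the global cluster, then the cluster (0, 1), then the single leaf 0, and stops everywhere else; coarse clusters face a milder adjustment than single variables, which is the data-dependent resolution adaptation.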
50 Number of significant SNP parameters per response: [Figure: number of significant target SNPs per phenotype, plotted against the phenotype index.] Response 40 has the most significant (levels of) target SNPs.
51 Conclusions: 300 years ago vs. now: computing! But the basic principles appearing in today's books are still related to Ars Conjectandi. Our statistical inference methods are/will be available in the R package hdi ("high-dimensional inference"; Meier, 2013).
53 We can construct asymptotically optimal p-values and confidence intervals in high-dimensional models, assuming suitable conditions: sparsity of the regression Y vs. X; sparsity of the regressions X_j vs. X_{−j} (j = 1, ..., p); the design matrix X is not too ill-posed (e.g. a restricted eigenvalue assumption). These conditions are typically uncheckable... confirmatory high-dimensional inference remains challenging. Thank you!
56 References:
Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methodology, Theory and Applications. Springer.
Meinshausen, N., Meier, L. and Bühlmann, P. (2009). P-values for high-dimensional regression. Journal of the American Statistical Association 104.
Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19.
van de Geer, S., Bühlmann, P. and Ritov, Y. (2013). On asymptotically optimal confidence regions and tests for high-dimensional models. arXiv preprint (v1).
Meier, L. (2013). hdi: High-dimensional inference. R package available from R-Forge.
More information11 : Gaussian Graphic Models and Ising Models
10-708: Probabilistic Graphical Models 10-708, Spring 2017 11 : Gaussian Graphic Models and Ising Models Lecturer: Bryon Aragam Scribes: Chao-Ming Yen 1 Introduction Different from previous maximum likelihood
More informationHigh-dimensional Ordinary Least-squares Projection for Screening Variables
1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor
More informationBayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson
Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n
More informationPost-Selection Inference
Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis
More informationDelta Theorem in the Age of High Dimensions
Delta Theorem in the Age of High Dimensions Mehmet Caner Department of Economics Ohio State University December 15, 2016 Abstract We provide a new version of delta theorem, that takes into account of high
More informationPeter Hoff Minimax estimation October 31, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11
Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationVariable Selection for Highly Correlated Predictors
Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu Department of Statistics, University of Illinois at Urbana-Champaign WHOA-PSI, Aug, 2017 St. Louis, Missouri 1 / 30 Background Variable
More informationEstimating LASSO Risk and Noise Level
Estimating LASSO Risk and Noise Level Mohsen Bayati Stanford University bayati@stanford.edu Murat A. Erdogdu Stanford University erdogdu@stanford.edu Andrea Montanari Stanford University montanar@stanford.edu
More informationOn Model Selection Consistency of Lasso
On Model Selection Consistency of Lasso Peng Zhao Department of Statistics University of Berkeley 367 Evans Hall Berkeley, CA 94720-3860, USA Bin Yu Department of Statistics University of Berkeley 367
More informationMarginal Screening and Post-Selection Inference
Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 7/8 - High-dimensional modeling part 1 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Classification
More informationHigh-dimensional Covariance Estimation Based On Gaussian Graphical Models
High-dimensional Covariance Estimation Based On Gaussian Graphical Models Shuheng Zhou, Philipp Rutimann, Min Xu and Peter Buhlmann February 3, 2012 Problem definition Want to estimate the covariance matrix
More informationA Unified Theory of Confidence Regions and Testing for High Dimensional Estimating Equations
A Unified Theory of Confidence Regions and Testing for High Dimensional Estimating Equations arxiv:1510.08986v2 [math.st] 23 Jun 2016 Matey Neykov Yang Ning Jun S. Liu Han Liu Abstract We propose a new
More informationProximity-Based Anomaly Detection using Sparse Structure Learning
Proximity-Based Anomaly Detection using Sparse Structure Learning Tsuyoshi Idé (IBM Tokyo Research Lab) Aurelie C. Lozano, Naoki Abe, and Yan Liu (IBM T. J. Watson Research Center) 2009/04/ SDM 2009 /
More informationLecture 1: Probability Fundamentals
Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability
More informationSparse survival regression
Sparse survival regression Anders Gorst-Rasmussen gorst@math.aau.dk Department of Mathematics Aalborg University November 2010 1 / 27 Outline Penalized survival regression The semiparametric additive risk
More informationRegularization Path Algorithms for Detecting Gene Interactions
Regularization Path Algorithms for Detecting Gene Interactions Mee Young Park Trevor Hastie July 16, 2006 Abstract In this study, we consider several regularization path algorithms with grouped variable
More informationLASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA
The Annals of Statistics 2009, Vol. 37, No. 1, 246 270 DOI: 10.1214/07-AOS582 Institute of Mathematical Statistics, 2009 LASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA BY NICOLAI
More informationStatistica Sinica Preprint No: SS R2
Statistica Sinica Preprint No: SS-2017-0041.R2 Title Empirical Likelihood Ratio Tests for Coefficients in High Dimensional Heteroscedastic Linear Models Manuscript ID SS-2017-0041.R2 URL http://www.stat.sinica.edu.tw/statistica/
More informationLearning discrete graphical models via generalized inverse covariance matrices
Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,
More informationBayesian Sparse Linear Regression with Unknown Symmetric Error
Bayesian Sparse Linear Regression with Unknown Symmetric Error Minwoo Chae 1 Joint work with Lizhen Lin 2 David B. Dunson 3 1 Department of Mathematics, The University of Texas at Austin 2 Department of
More informationBiostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences
Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per
More informationA Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees
Journal of Machine Learning Research 17 (2016) 1-20 Submitted 11/15; Revised 9/16; Published 12/16 A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees Michaël Chichignoud
More informationSummary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club
Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club 36-825 1 Introduction Jisu Kim and Veeranjaneyulu Sadhanala In this report
More information25 : Graphical induced structured input/output models
10-708: Probabilistic Graphical Models 10-708, Spring 2013 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Meghana Kshirsagar (mkshirsa), Yiwen Chen (yiwenche) 1 Graph
More informationKnockoffs as Post-Selection Inference
Knockoffs as Post-Selection Inference Lucas Janson Harvard University Department of Statistics blank line blank line WHOA-PSI, August 12, 2017 Controlled Variable Selection Conditional modeling setup:
More informationA significance test for the lasso
1 Gold medal address, SSC 2013 Joint work with Richard Lockhart (SFU), Jonathan Taylor (Stanford), and Ryan Tibshirani (Carnegie-Mellon Univ.) Reaping the benefits of LARS: A special thanks to Brad Efron,
More informationPeter Hoff Minimax estimation November 12, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11
Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of
More informationRegularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008
Regularized Estimation of High Dimensional Covariance Matrices Peter Bickel Cambridge January, 2008 With Thanks to E. Levina (Joint collaboration, slides) I. M. Johnstone (Slides) Choongsoon Bae (Slides)
More informationAsymptotic Equivalence of Regularization Methods in Thresholded Parameter Space
Asymptotic Equivalence of Regularization Methods in Thresholded Parameter Space Jinchi Lv Data Sciences and Operations Department Marshall School of Business University of Southern California http://bcf.usc.edu/
More informationRegression Shrinkage and Selection via the Lasso
Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,
More informationThe Iterated Lasso for High-Dimensional Logistic Regression
The Iterated Lasso for High-Dimensional Logistic Regression By JIAN HUANG Department of Statistics and Actuarial Science, 241 SH University of Iowa, Iowa City, Iowa 52242, U.S.A. SHUANGE MA Division of
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationBootstrap & Confidence/Prediction intervals
Bootstrap & Confidence/Prediction intervals Olivier Roustant Mines Saint-Étienne 2017/11 Olivier Roustant (EMSE) Bootstrap & Confidence/Prediction intervals 2017/11 1 / 9 Framework Consider a model with
More informationarxiv: v1 [stat.ml] 3 Nov 2010
Preprint The Lasso under Heteroscedasticity Jinzhu Jia, Karl Rohe and Bin Yu, arxiv:0.06v stat.ml 3 Nov 00 Department of Statistics and Department of EECS University of California, Berkeley Abstract: The
More informationMinwise hashing for large-scale regression and classification with sparse data
Minwise hashing for large-scale regression and classification with sparse data Nicolai Meinshausen (Seminar für Statistik, ETH Zürich) joint work with Rajen Shah (Statslab, University of Cambridge) Simons
More informationGWAS IV: Bayesian linear (variance component) models
GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian
More informationLasso-type recovery of sparse representations for high-dimensional data
Lasso-type recovery of sparse representations for high-dimensional data Nicolai Meinshausen and Bin Yu Department of Statistics, UC Berkeley December 5, 2006 Abstract The Lasso (Tibshirani, 1996) is an
More informationarxiv: v1 [math.st] 13 Feb 2012
Sparse Matrix Inversion with Scaled Lasso Tingni Sun and Cun-Hui Zhang Rutgers University arxiv:1202.2723v1 [math.st] 13 Feb 2012 Address: Department of Statistics and Biostatistics, Hill Center, Busch
More information