Statistical Inference
1 Statistical Inference
Liu Yang
Florida State University
October 27, 2016
Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27
2 Outline
- The Bayesian Lasso: Trevor Park and George Casella (2008), "The Bayesian Lasso," Journal of the American Statistical Association, Vol. 103, No. 482, Theory and Methods.
- Bootstrap Lasso: Trevor Hastie, Robert Tibshirani and Martin Wainwright (2015), Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC.
- Post-Selection Inference for the Lasso (the covariance test): Hastie, Tibshirani and Wainwright (2015), Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC.
3 Why Statistical Inference?
Statistical inference provides confidence intervals and measures of the statistical strength of variables, such as p-values, in models. Statistical inference is well developed for low-dimensional problems; for high-dimensional problems, traditional methods may not be applicable. We therefore need inference methods suited to high-dimensional studies.
4 The Bayesian Lasso
We adopt the approach of Park and Casella (2008), involving a hierarchical model of the form

y | µ, X, β, σ² ~ N_n(µ1_n + Xβ, σ² I_n)    (1)

β | λ, σ² ~ ∏_{j=1}^p (λ / (2√σ²)) e^{−λ|β_j|/√σ²}    (2)

For a complete Bayesian model, we use the improper prior density π(σ²) = 1/σ² and a hyperprior for λ² from the class of gamma priors:

π(λ²) = (δ^r / Γ(r)) (λ²)^{r−1} exp(−δλ²)    (3)
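As a concrete reading of the hierarchy, the log-densities of the conditional Laplace prior (2) and the gamma hyperprior (3) can be sketched in Python. This is a minimal illustration with function names of our own choosing, not code from Park and Casella:

```python
import math

def log_beta_prior(beta, lam, sigma2):
    # log of the conditional Laplace prior in (2):
    # pi(beta | sigma^2) = prod_j (lam / (2 sqrt(sigma2))) exp(-lam |beta_j| / sqrt(sigma2))
    s = math.sqrt(sigma2)
    return sum(math.log(lam / (2.0 * s)) - lam * abs(b) / s for b in beta)

def log_lambda2_hyperprior(lam2, r, delta):
    # log of the gamma hyperprior in (3):
    # pi(lam^2) = (delta^r / Gamma(r)) * (lam^2)^(r-1) * exp(-delta * lam^2)
    return (r * math.log(delta) - math.lgamma(r)
            + (r - 1.0) * math.log(lam2) - delta * lam2)
```

With r = 1 the hyperprior reduces to an Exp(δ) density on λ², and larger λ concentrates the Laplace prior more tightly around zero, which is the Bayesian analogue of a heavier lasso penalty.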
5 Bayesian Lasso
Prior and posterior distribution for the seventh variable in the diabetes example.
6 Bayesian Lasso
7 Bayesian Lasso
8 Bootstrap Lasso
The bootstrap is a popular tool for assessing the statistical properties of complex estimators. How do we obtain the sampling distribution of β̂ by bootstrapping? The nonparametric bootstrap is one method for approximating this sampling distribution; the parametric bootstrap is another.
9 Bootstrap Lasso
Suppose that we have obtained an estimate β̂(λ̂_CV) for a lasso problem via the following cross-validation procedure:
- Fit a lasso path to (X, y) over a dense grid of values Λ = {λ_ℓ}_{ℓ=1}^L.
- Divide the training samples into 10 groups at random.
- With the kth group left out, fit a lasso path to the remaining 9/10ths, using the same grid Λ.
10 Bootstrap Lasso
- For each λ ∈ Λ, compute the mean-squared prediction error for the left-out group.
- Average these errors to obtain a prediction error curve over the grid Λ.
- Find the value λ̂_CV that minimizes this curve, and then return the coefficient vector from the original fit in the first step at that value of λ.
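The cross-validation steps above can be sketched in a few lines, assuming for simplicity a single standardized predictor, for which the lasso fit reduces to soft-thresholding the least-squares coefficient. The grid, data, and function names here are illustrative, not from the source:

```python
import random

def soft_threshold(z, lam):
    # S(z, lam) = sign(z) * max(|z| - lam, 0): the lasso solution for a
    # single standardized predictor with least-squares coefficient z.
    return (1.0 if z >= 0 else -1.0) * max(abs(z) - lam, 0.0)

def lasso_fit(x, y, lam):
    # One-predictor lasso: soft-threshold the least-squares coefficient.
    z = sum(xi * yi for xi, yi in zip(x, y)) / len(x)  # assumes x roughly standardized
    return soft_threshold(z, lam)

def cv_lambda(x, y, grid, k=10, seed=0):
    # Split into k folds, fit on the other k-1 folds over the grid, score
    # held-out MSE, average over folds, return the minimizing lambda.
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    def cv_err(lam):
        total = 0.0
        for f in folds:
            hold = set(f)
            xtr = [x[i] for i in idx if i not in hold]
            ytr = [y[i] for i in idx if i not in hold]
            beta = lasso_fit(xtr, ytr, lam)
            total += sum((y[i] - beta * x[i]) ** 2 for i in f) / len(f)
        return total / k
    return min(grid, key=cv_err)

# Toy data: y = 2x + noise
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]
lam_cv = cv_lambda(x, y, grid=[0.01 * i for i in range(1, 51)])
beta_hat = lasso_fit(x, y, lam_cv)  # final fit at the CV-chosen lambda
```

In a real application one would fit the full multivariate lasso path (e.g. by coordinate descent) at each fold; the fold-splitting and error-curve logic is the same.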
11 Nonparametric Bootstrap
The nonparametric bootstrap approximates the cumulative distribution function F of the random pair (X, Y) by the empirical CDF F̂_N defined by the N samples: draw N samples with replacement from the given dataset.
12 Bootstrap Lasso
13 Bootstrap Lasso
14 Bootstrap Lasso
15 Parametric Bootstrap
Here we have a parametric estimate of F, or of its corresponding density function f. For example, we can resample from the fitted residuals or from a fitted Gaussian regression model.
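The residual-resampling variant can be sketched as follows: keep X fixed, resample residuals from the fitted model, rebuild y*, and refit. As above, the slope-through-the-origin fit is an illustrative stand-in for the lasso:

```python
import random
import statistics

def slope(pairs):
    # Least-squares slope through the origin (illustrative estimator)
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den

rng = random.Random(3)
data = [(x, 1.5 * x + rng.gauss(0.0, 1.0))
        for x in (rng.gauss(0.0, 1.0) for _ in range(150))]

beta0 = slope(data)                       # original fit
resid = [y - beta0 * x for x, y in data]  # fitted residuals

boot = []
for _ in range(300):
    # Keep X fixed, resample residuals, rebuild y* from the fitted model, refit.
    ystar = [beta0 * x + resid[rng.randrange(len(resid))] for x, _ in data]
    boot.append(slope([(x, ys) for (x, _), ys in zip(data, ystar)]))

se_hat = statistics.stdev(boot)  # parametric-bootstrap standard error
```

Replacing the residual draws with draws from a fitted N(0, σ̂²) gives the fully Gaussian variant.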
16 Bootstrap Lasso
17 Bayesian Lasso vs. Bootstrap Lasso
Broadly speaking, results for the Bayesian lasso and the lasso/bootstrap are similar. The nonparametric bootstrap can be viewed as a posterior-Bayes estimate under a non-informative prior in the multinomial model (Rubin 1981; Efron 1982). The Bayesian lasso leans more on parametric assumptions; the bootstrap scales better.
18 Bayesian Lasso vs. Bootstrap Lasso

p      Bayesian Lasso    Lasso/Bootstrap
       secs              secs
       secs              secs
       mins              14.7 mins
       hours             18.1 mins

Table: Timing for the Bayesian lasso and the bootstrapped lasso
19 The Covariance Test
Bayesian methods and the bootstrap are two "traditional" approaches; we would now like to present some newer ones. We describe two methods proposed for assigning p-values or confidence intervals to predictors as they are successively entered by the lasso and by forward stepwise regression. The two methods give different results.
20 The Covariance Test
21 The Covariance Test
We start from the usual linear regression setup:

y = Xβ + ε,  ε ~ N(0, σ² I_{N×N})    (4)

Consider forward stepwise regression. Defining RSS_k to be the residual sum of squares for the model containing k predictors, we can use the change in residual sum of squares to form a test statistic

R_k = (1/σ²)(RSS_{k−1} − RSS_k)    (5)

and compare it to a χ²₁ distribution.
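A minimal sketch of the statistic in (5) for a single step, using a predictor fixed in advance so that the χ²₁ reference distribution is actually valid. As the next slide explains, this comparison breaks down once the predictor is chosen adaptively; the names and data here are illustrative:

```python
import math
import random

def norm_cdf(t):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def chi2_1_pvalue(r):
    # P(chi^2_1 > r) = 2 * P(Z > sqrt(r)) for a standard normal Z
    return 2.0 * (1.0 - norm_cdf(math.sqrt(max(r, 0.0))))

rng = random.Random(4)
n, sigma = 100, 1.0
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [rng.gauss(0.0, sigma) for _ in range(n)]  # global null: no true signal

rss0 = sum(yi * yi for yi in y)  # RSS of the empty (null) model
beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
rss1 = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
r1 = (rss0 - rss1) / sigma ** 2  # equation (5) with k = 1
p = chi2_1_pvalue(r1)            # valid only because x was fixed in advance
```

If `x` were instead the best of many candidate predictors, `r1` would be stochastically larger than χ²₁ and `p` would be anti-conservative.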
22 The Covariance Test
23 The Covariance Test
The forward stepwise procedure has chosen the strongest predictor among all available choices, so it yields a larger drop in training error than expected under the χ²₁ reference. It is difficult to derive an appropriate p-value for forward stepwise regression if we want to properly account for the adaptive nature of the fitting. For the lasso, a simple test can be derived that does account for this adaptivity.
24 The Covariance Test
Suppose that we wish to test the significance of the predictor entered by LAR at λ_k. Let A_{k−1} be the set of predictors with nonzero coefficients before this predictor was added, and let the estimate at the end of this step be β̂(λ_{k+1}). We refit the lasso, keeping λ = λ_{k+1} but using just the variables in A_{k−1}; this yields the estimate β̂_{A_{k−1}}(λ_{k+1}). The covariance test statistic is

T_k = (1/σ²)( ⟨y, Xβ̂(λ_{k+1})⟩ − ⟨y, X_{A_{k−1}} β̂_{A_{k−1}}(λ_{k+1})⟩ )    (6)

and, under the null, asymptotically

T_k → Exp(1)    (7)

This statistic measures how much of the covariance between the outcome and the fitted model can be attributed to the predictor that has just entered the model. We can present a quantile-quantile plot of T_1 versus Exp(1):
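A quick simulation of (6)-(7) is possible under an orthonormal design, a simplifying assumption we add here: with z_j = x_jᵀy, the lasso/LAR knots are λ_k = k-th largest |z_j|, and the first covariance statistic reduces to T_1 = λ_1(λ_1 − λ_2) (taking σ = 1, and A_0 empty so the second inner product in (6) vanishes):

```python
import math
import random

def covtest_T1(z):
    # Orthonormal-design shortcut for the first covariance statistic:
    # T_1 = lam1 * (lam1 - lam2), lam_k = k-th largest |z_j|, sigma = 1.
    a = sorted((abs(v) for v in z), reverse=True)
    return a[0] * (a[0] - a[1])

rng = random.Random(5)
p, reps = 200, 300
# Global null: z_j = x_j^T y are i.i.d. N(0, 1)
stats = [covtest_T1([rng.gauss(0.0, 1.0) for _ in range(p)]) for _ in range(reps)]

mean_t1 = sum(stats) / reps            # Exp(1) has mean 1
pvals = [math.exp(-t) for t in stats]  # Exp(1) tail: P(T > t) = e^{-t}
```

Sorting the `pvals` against Unif(0, 1) quantiles reproduces, in miniature, the quantile-quantile comparison mentioned above.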
25 The Covariance Test
26 The exponential limiting distribution for the covariance test requires certain conditions on the data: the signal variables must not be too correlated with the noise variables, and the underlying model is assumed linear. In the next section, Libo will show a more general scheme that gives the spacing test, which works for any data matrix X and whose null distribution holds exactly for finite N and p.
27 References
- Trevor Hastie, Robert Tibshirani, Martin Wainwright (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC.
- Trevor Park and George Casella (2008). The Bayesian Lasso. Journal of the American Statistical Association, Vol. 103, No. 482, Theory and Methods.
28 Post-selection Inference and Bayesian Inference
Libo Wang
Department of Statistics, Florida State University
Oct 19th, 2016
29 Motivation
Assigning significance in high-dimensional regression is challenging: most computationally efficient selection algorithms cannot guard against the inclusion of noise variables, and asymptotically valid p-values are not available. Statistics versus machine learning.
Libo Wang Post-selection Inference and Bayesian Inference
30 What is post-selection inference?
Inference the old way (pre-1980?): 1. Devise a model. 2. Collect data. 3. Test hypotheses. This is classical inference.
Inference the new way: 1. Collect data. 2. Select a model. 3. Test hypotheses. This is post-selection inference.
Classical tools cannot be used post-selection because they do not yield valid inferences (they are generally too optimistic and underestimate p-values). Why? For a parametric model with full column rank, p < n, the true parameter β is well-defined as the target of statistical inference. When we allow p ≥ n, we must add a selection step before working on inference, so that inference is restricted to submodels of full column rank.
31 Example: Lasso with fixed λ
HIV data: mutations that predict response to a drug. Selection intervals for the lasso with fixed tuning parameter λ.
32 Formal goal of post-selective inference [Lee et al.; Fithian, Sun, Taylor]
Having selected a model M̂ based on our data y, we'd like to test a hypothesis Ĥ₀. Note that Ĥ₀ will be random: a function of the selected model, and hence of y. If our rejection region is {T(y) ∈ R}, we want to control the selected type I error:

Prob(T(y) ∈ R | M̂, Ĥ₀) ≤ α
33 Existing Approaches
- Data splitting: fit on one half of the data, do inference on the other half. Problem: the fitted model varies with the random choice of halves; loss of power. Reference: "P-Values for High-Dimensional Regression," Meinshausen N, Meier L, Bühlmann P, 2009.
- Permutations and related methods: not clear how to use these beyond the global null.
34 A key mathematical result
Polyhedral lemma: provides a good solution for forward stepwise, and an optimal solution for the fixed-λ lasso.
Polyhedral selection events: let the response vector be y ~ N(µ, Σ). Suppose we make a selection that can be written as {y : Ay ≤ b} with A, b not depending on y. This is true for forward stepwise regression, the lasso with fixed λ, least angle regression, and other procedures.
35 Some intuition for forward stepwise regression
Suppose that we run forward stepwise regression for k steps. Then {y : Ay ≤ b} is the set of y vectors that would yield the same predictors, with the same signs, entered at each step. Each step is a competition involving the inner products between each x_j and y; the polyhedron Ay ≤ b summarizes the results of the competition after k steps. A similar result holds for the lasso (fixed λ or LAR).
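The "competition" for the first step can be made concrete: variable j winning with sign s means s·x_jᵀy ≥ |x_kᵀy| for every other k, and each such inequality is a half-space in y. The sketch below (names ours) builds those constraints and checks that the observed y satisfies its own selection polyhedron:

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fs_first_step(cols, y):
    # First forward-stepwise variable: argmax_j |x_j^T y|, plus its sign
    scores = [dot(c, y) for c in cols]
    j = max(range(len(cols)), key=lambda k: abs(scores[k]))
    return j, (1 if scores[j] >= 0 else -1)

def selection_constraints(cols, j, s):
    # The event "j wins with sign s" is {s*x_j^T y >= |x_k^T y| for all k != j},
    # written as half-spaces {a^T y <= 0} with rows a = x_k - s*x_j and -x_k - s*x_j.
    A = []
    for k in range(len(cols)):
        if k == j:
            continue
        A.append([xk - s * xj for xj, xk in zip(cols[j], cols[k])])
        A.append([-xk - s * xj for xj, xk in zip(cols[j], cols[k])])
    return A

rng = random.Random(7)
n, p = 20, 5
cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
y = [rng.gauss(0.0, 1.0) for _ in range(n)]

j, s = fs_first_step(cols, y)
A = selection_constraints(cols, j, s)
ok = all(dot(a, y) <= 1e-9 for a in A)  # y lies in its own selection polyhedron
```

Running k steps stacks k such families of rows into one matrix A, which is exactly the Ay ≤ b representation used by the polyhedral lemma.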
36 Example: The lasso and its selection event
Lasso estimation: β̂ ∈ argmin_β ‖y − Xβ‖²₂/2 + λ‖β‖₁. Rewrite the KKT conditions, partitioning them according to the active set M̂ and the inactive set −M̂ (to be continued):

X_{M̂}ᵀ(X_{M̂} β̂_{M̂} − y) + λ ŝ_{M̂} = 0
X_{−M̂}ᵀ(X_{M̂} β̂_{M̂} − y) + λ ŝ_{−M̂} = 0
sign(β̂_{M̂}) = ŝ_{M̂},  ‖ŝ_{−M̂}‖_∞ < 1
37 Example: The lasso and its selection event
Equivalently, the selection event holds if and only if there exist w, s, u with

X_{M̂}ᵀ(X_{M̂} w − y) + λ s = 0
X_{−M̂}ᵀ(X_{M̂} w − y) + λ u = 0
sign(w) = s,  ‖u‖_∞ < 1

Solve for w and u from the first two equations, then substitute these expressions into the remaining constraints.
38 The polyhedral lemma [Lee et al.; Ryan Tibshirani et al.]
For a vector η,

F^{[ν⁻, ν⁺]}_{ηᵀµ, σ²‖η‖²₂}(ηᵀy) | {Ay ≤ b} ~ Unif(0, 1)

(a truncated Gaussian distribution), where ν⁻, ν⁺ are (computable) values that are functions of η, A, b. Here F^{[a,b]}_{µ,σ²} denotes the CDF of a N(µ, σ²) random variable truncated to the interval [a, b], that is,

F^{[a,b]}_{µ,σ²}(x) = (Φ((x − µ)/σ) − Φ((a − µ)/σ)) / (Φ((b − µ)/σ) − Φ((a − µ)/σ))

Typically we choose η so that ηᵀy is the partial least-squares estimate for a selected variable.
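The truncated-Gaussian pivot is easy to compute and check numerically. The sketch below (function names ours) implements the truncated CDF and verifies, by rejection sampling, that the pivot of a Gaussian conditioned to an interval behaves like a Unif(0, 1) variable:

```python
import math
import random

def norm_cdf(t):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def trunc_gauss_cdf(x, mu, sigma, a, b):
    # F^{[a,b]}_{mu, sigma^2}(x): CDF of N(mu, sigma^2) truncated to [a, b]
    num = norm_cdf((x - mu) / sigma) - norm_cdf((a - mu) / sigma)
    den = norm_cdf((b - mu) / sigma) - norm_cdf((a - mu) / sigma)
    return num / den

# Pivot check: for Z ~ N(0, 1) conditioned on Z in [0.5, 2], the transformed
# value F(Z) should be Unif(0, 1), so its sample mean should be near 0.5.
rng = random.Random(6)
pivots = []
while len(pivots) < 500:
    z = rng.gauss(0.0, 1.0)
    if 0.5 <= z <= 2.0:  # rejection sampling from the truncated Gaussian
        pivots.append(trunc_gauss_cdf(z, 0.0, 1.0, 0.5, 2.0))
mean_pivot = sum(pivots) / len(pivots)
```

In the selective-inference setting, [a, b] becomes [ν⁻, ν⁺] computed from (η, A, b), and inverting the pivot in µ yields selection-adjusted confidence intervals.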
39 Schematic illustrating the polyhedral lemma for the case N = 2.
40 Example: Fixed-λ inference for the lasso
The lasso intervals are computed by conditioning on the selection procedure, which is adaptive to the data (choose a probabilistic model for the data first, then formulate the test). The OLS model is pre-specified with only the 7 variables selected by the lasso. Selection-adjusted intervals are similar for strong signals, but wider for weak signals: those variables lie close to the boundary of the selection region, so a wide range of µ values is consistent with the observations.
41 Current work: the lasso with λ estimated by cross-validation, and unknown σ
- "Selective inference with a randomized response" by Tian and Taylor: one can condition on the selection of λ by CV, in addition to the selection of the model. It is not yet clear how inference differs between the fixed-λ lasso and the lasso with λ estimated by cross-validation.
- "Selective inference with unknown variance via the square-root LASSO" by Tian, Loftus and Taylor: focuses on adapting post-selection inference to the case of unknown σ and the choice of tuning parameter. The square-root lasso is attractive because its λ does not depend on the noise level σ.
42 Improving the power
The preceding approach conditions on the part of y orthogonal to the direction of interest η. This is done for computational convenience, yielding an analytic solution. Conditioning on less gives more power. Are we conditioning on too much?
43 Data splitting, carving, and adding noise
Further improvements in power [Fithian, Sun, Taylor, Tian]. Selective inference yields the correct post-selection type I error, but the confidence intervals are sometimes quite large. How can we do better? (Say, by making the randomness in selection independent of the data used for inference.)
- Data carving: withhold a small proportion (say 10%) of the data in the selection stage, then use all of the data for inference (conditioning using the theory outlined above).
- Randomized response: add noise to y in the selection stage, like withholding data but smoother, then use the un-noised data in the inference stage. Related to differential-privacy techniques.
44 Data splitting, carving, and adding noise
45 Alternative method: Bayesian quantile regression inference
Setting: y_i = β₀ + β₁X_{1i} + β₂X_{2i} + ε_i with β₀ = 1/3, β₁ = β₂ = 1, ε_i ~ Exp(1) − log(2).

Method   β̂₀   β̂₁   β̂₂   Err(β̂₀)×100   Err(β̂)×100   Err(ŷ)×100
BQR
QR

Table: Simulation results for Bayesian quantile regression
46 Difference between Bayesian and Classical Frequentist Inference
Frequentist: 1. Point estimates and standard errors or 95% confidence intervals. 2. Deduction from P(data | H₀), with α set in advance. 3. Accept H₁ if P(data | H₀) < α. Here P(data | H₀) is the sampling distribution of the data given the parameter.
Bayesian: 1. Induction from P(θ | data), starting with the prior P(θ). 2. Broad descriptions of the posterior distribution, such as means and quantiles. Here P(θ) is the prior distribution and P(θ | data) is the posterior distribution of the parameter.
47 Bayesian feature selection methods
- Laplacian shrinkage: the Bayesian lasso.
- Adaptive shrinkage: spike and slab.
Obtain the selection inference P(θ | data) by running 5000 or more iterations.
48 Conclusions
Post-selection inference is an exciting new area, with many potential research problems and generalizations. Bayesian and frequentist methods both have drawbacks in finite-sample settings. R package on CRAN: selectiveinference (forward stepwise regression, lasso, LAR).
49 References
- Hastie, Tibshirani, Wainwright. Statistical Learning with Sparsity, Chapter 6.
- Lee, Sun, Sun, Taylor (2013). Exact post-selection inference with the lasso. arXiv.
- Tian, X. and Taylor, J. (2015). Selective inference with a randomized response. arXiv.
50 Thank you!
More informationLeast Absolute Shrinkage is Equivalent to Quadratic Penalization
Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationRobust methods and model selection. Garth Tarr September 2015
Robust methods and model selection Garth Tarr September 2015 Outline 1. The past: robust statistics 2. The present: model selection 3. The future: protein data, meat science, joint modelling, data visualisation
More informationSpatial Lasso with Application to GIS Model Selection. F. Jay Breidt Colorado State University
Spatial Lasso with Application to GIS Model Selection F. Jay Breidt Colorado State University with Hsin-Cheng Huang, Nan-Jung Hsu, and Dave Theobald September 25 The work reported here was developed under
More informationBayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson
Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n
More informationConjugate direction boosting
Conjugate direction boosting June 21, 2005 Revised Version Abstract Boosting in the context of linear regression become more attractive with the invention of least angle regression (LARS), where the connection
More informationSummary and discussion of: Controlling the False Discovery Rate via Knockoffs
Summary and discussion of: Controlling the False Discovery Rate via Knockoffs Statistics Journal Club, 36-825 Sangwon Justin Hyun and William Willie Neiswanger 1 Paper Summary 1.1 Quick intuitive summary
More informationSelective Inference for Effect Modification: An Empirical Investigation
Observational Studies () Submitted ; Published Selective Inference for Effect Modification: An Empirical Investigation Qingyuan Zhao Department of Statistics The Wharton School, University of Pennsylvania
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationA General Framework for High-Dimensional Inference and Multiple Testing
A General Framework for High-Dimensional Inference and Multiple Testing Yang Ning Department of Statistical Science Joint work with Han Liu 1 Overview Goal: Control false scientific discoveries in high-dimensional
More informationLecture 14: Shrinkage
Lecture 14: Shrinkage Reading: Section 6.2 STATS 202: Data mining and analysis October 27, 2017 1 / 19 Shrinkage methods The idea is to perform a linear regression, while regularizing or shrinking the
More informationChris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010
Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,
More informationDEGREES OF FREEDOM AND MODEL SEARCH
Statistica Sinica 25 (2015), 1265-1296 doi:http://dx.doi.org/10.5705/ss.2014.147 DEGREES OF FREEDOM AND MODEL SEARCH Ryan J. Tibshirani Carnegie Mellon University Abstract: Degrees of freedom is a fundamental
More informationMachine Learning. A. Supervised Learning A.1. Linear Regression. Lars Schmidt-Thieme
Machine Learning A. Supervised Learning A.1. Linear Regression Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany
More informationA Confidence Region Approach to Tuning for Variable Selection
A Confidence Region Approach to Tuning for Variable Selection Funda Gunes and Howard D. Bondell Department of Statistics North Carolina State University Abstract We develop an approach to tuning of penalized
More informationHigh-dimensional regression with unknown variance
High-dimensional regression with unknown variance Christophe Giraud Ecole Polytechnique march 2012 Setting Gaussian regression with unknown variance: Y i = f i + ε i with ε i i.i.d. N (0, σ 2 ) f = (f
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationRegularization Path Algorithms for Detecting Gene Interactions
Regularization Path Algorithms for Detecting Gene Interactions Mee Young Park Trevor Hastie July 16, 2006 Abstract In this study, we consider several regularization path algorithms with grouped variable
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationAdaptive Piecewise Polynomial Estimation via Trend Filtering
Adaptive Piecewise Polynomial Estimation via Trend Filtering Liubo Li, ShanShan Tu The Ohio State University li.2201@osu.edu, tu.162@osu.edu October 1, 2015 Liubo Li, ShanShan Tu (OSU) Trend Filtering
More informationFrequentist Accuracy of Bayesian Estimates
Frequentist Accuracy of Bayesian Estimates Bradley Efron Stanford University Bayesian Inference Parameter: µ Ω Observed data: x Prior: π(µ) Probability distributions: Parameter of interest: { fµ (x), µ
More informationSupplement to Bayesian inference for high-dimensional linear regression under the mnet priors
The Canadian Journal of Statistics Vol. xx No. yy 0?? Pages?? La revue canadienne de statistique Supplement to Bayesian inference for high-dimensional linear regression under the mnet priors Aixin Tan
More informationMarginal Screening and Post-Selection Inference
Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationMachine Learning for Economists: Part 4 Shrinkage and Sparsity
Machine Learning for Economists: Part 4 Shrinkage and Sparsity Michal Andrle International Monetary Fund Washington, D.C., October, 2018 Disclaimer #1: The views expressed herein are those of the authors
More informationBiostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences
Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per
More informationA Bias Correction for the Minimum Error Rate in Cross-validation
A Bias Correction for the Minimum Error Rate in Cross-validation Ryan J. Tibshirani Robert Tibshirani Abstract Tuning parameters in supervised learning problems are often estimated by cross-validation.
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More informationBayesian Linear Regression
Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective
More informationMachine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, Emily Fox 2014
Case Study 3: fmri Prediction Fused LASSO LARS Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, 2014 Emily Fox 2014 1 LASSO Regression
More informationSome Curiosities Arising in Objective Bayesian Analysis
. Some Curiosities Arising in Objective Bayesian Analysis Jim Berger Duke University Statistical and Applied Mathematical Institute Yale University May 15, 2009 1 Three vignettes related to John s work
More informationA Magiv CV Theory for Large-Margin Classifiers
A Magiv CV Theory for Large-Margin Classifiers Hui Zou School of Statistics, University of Minnesota June 30, 2018 Joint work with Boxiang Wang Outline 1 Background 2 Magic CV formula 3 Magic support vector
More informationarxiv: v2 [math.st] 9 Feb 2017
Submitted to Biometrika Selective inference with unknown variance via the square-root LASSO arxiv:1504.08031v2 [math.st] 9 Feb 2017 1. Introduction Xiaoying Tian, and Joshua R. Loftus, and Jonathan E.
More informationBehavioral Data Mining. Lecture 7 Linear and Logistic Regression
Behavioral Data Mining Lecture 7 Linear and Logistic Regression Outline Linear Regression Regularization Logistic Regression Stochastic Gradient Fast Stochastic Methods Performance tips Linear Regression
More informationIterative Selection Using Orthogonal Regression Techniques
Iterative Selection Using Orthogonal Regression Techniques Bradley Turnbull 1, Subhashis Ghosal 1 and Hao Helen Zhang 2 1 Department of Statistics, North Carolina State University, Raleigh, NC, USA 2 Department
More informationThis model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that
Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear
More informationA Survey of L 1. Regression. Céline Cunen, 20/10/2014. Vidaurre, Bielza and Larranaga (2013)
A Survey of L 1 Regression Vidaurre, Bielza and Larranaga (2013) Céline Cunen, 20/10/2014 Outline of article 1.Introduction 2.The Lasso for Linear Regression a) Notation and Main Concepts b) Statistical
More informationLeast Angle Regression, Forward Stagewise and the Lasso
January 2005 Rob Tibshirani, Stanford 1 Least Angle Regression, Forward Stagewise and the Lasso Brad Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani Stanford University Annals of Statistics,
More information