Recent Developments in Post-Selection Inference


Yotam Hechtlinger
Department of Statistics

Shashank Singh
Department of Statistics
Machine Learning Department

Abstract

It is common in modern applications to use data-dependent model selection tools to select a promising model before drawing inference on the parameters of the selected model. However, this simple series of steps conceals a significant fault that is often left unattended: the act of selection biases the distributions of test statistics and makes standard inference procedures unsound. This is referred to as the problem of post-selection inference. We review recent work showing how to correct for this in certain common settings.

1 Introduction

In the linear regression setting, if $M$ denotes the set of indices for the predictors spanning $X_M$, and $\hat\beta^M$ the least squares solution for the regression of $Y$ on the predictors $X_M$, then it is well known that, under the null hypothesis that the true coefficient is 0, for each $j \in M$,
\[
  Z_j^M := \frac{\hat\beta_j^M}{\mathrm{se}(\hat\beta_j^M)} \sim N(0, 1).
\]
This pivotal distribution is the driving force behind inference on least squares solutions. It is implicitly assumed that $\beta_j^M$ was chosen to be the predictor of interest prior to sampling the data, and hence that the model is fixed. Furthermore, it is also assumed that the only predictor of interest is $j$; if there is a need to provide inference over a larger set of predictors, standard corrections for multiple comparisons, such as the Bonferroni correction, can be used. In practice, the widespread use of model selection tools introduces a stochastic component to the selection of the model. When the data is used twice (i.e., for both model selection and inference), the assumption of a fixed model is violated and the inference is no longer sound.

1.1 An illustrative simulation

The following simulation demonstrates the problem in a simple case, which, unfortunately, can describe an enthusiastic but naive researcher hunting for a discovery. Consider $p$ independent predictors, $X_1, \ldots, X_p \sim N(0, 1)$, and an independent observation vector $Y \sim N(0, 1)$ (i.e., all vectors are independent random noise). In order to find the most promising predictor explaining $Y$, $p$ different univariate regression models are fitted to $Y$, and the most significant is chosen at each iteration. Figure 1 shows the distribution of $\max_{j,M} Z_j^M$ as a function of the number of predictors $p$. In the simulation, univariate regression was used in order to simplify the example. In multivariate model selection, a model $X_M$ has $|M|$ different predictors. Although there is dependence between the statistics of the same predictor across different models, naively correcting for the maximum would require $\sum_{k=1}^{p} k \binom{p}{k}$ tests. This exponential growth already results in 192 different tests for $p = 6$, emphasizing that the unsound distribution of $\max_{j,M} Z_j^M$ presented in Figure 1 (for $n = 100$), which offers less than 8% coverage, is easily accessible with improper use of model selection, even in low-dimensional settings.
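The following is a minimal sketch (our own, not the authors' code) of the Section 1.1 simulation: independent noise predictors and response, $p$ univariate regressions, and the naive strategy of keeping the largest $|Z|$-statistic. The sample size $n = 100$ matches Figure 1; the replication count and the grid of $p$ values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_rep = 100, 2000
z_crit = 1.96  # nominal two-sided cutoff z_{1 - alpha/2} for alpha = 0.05

for p in (1, 2, 5, 10, 20):
    max_abs_z = np.empty(n_rep)
    for r in range(n_rep):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)          # pure noise: the global null holds
        z = np.empty(p)
        for j in range(p):
            xj = X[:, j]
            beta = xj @ y / (xj @ xj)       # univariate least squares slope
            resid = y - beta * xj
            se = np.sqrt(resid @ resid / (n - 1) / (xj @ xj))
            z[j] = beta / se                # Z-statistic of the j-th univariate model
        max_abs_z[r] = np.abs(z).max()
    # fraction of pure-noise datasets where the "best" predictor looks significant
    print(f"p = {p:2d}: rejection rate at |Z| > 1.96: {np.mean(max_abs_z > z_crit):.3f}")
```

For $p = 1$ the rejection rate stays near the nominal 5%, but it grows rapidly with $p$, which is exactly the distortion Figure 1 displays.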

[Figure 1: The distribution of $\max_{j,M} Z_j^M$ for several values of the dimension $p$. When $p = 1$, $Z_j^M \sim N(0, 1)$. However, as $p$ increases, rejecting when $|Z_j^M| > z_{1-\alpha/2}$ is almost certain to produce false positive results.]

1.2 Historical approaches to the post-selection inference problem

One approach to ensuring the validity of post-selection inference embeds the problem in the field of multiple comparisons. By treating the inference for each coefficient as a single inference, controlling the family-wise error rate over the family of inferences guarantees valid inference for the selected model. The simplest such solution is to apply a Bonferroni correction, dividing the type I error threshold by the (large) number of tests being performed. Recently, Berk et al. (2013) provided a less conservative approach which simultaneously bounds all linear functions that arise as coefficient estimates in all submodels. However, due to the exponential growth of the number of inferences as a function of the dimension, these simultaneous methods quickly become too conservative.

A more precise solution to the problem requires accounting for the specific nature of the selection procedure. For the example above, this is quite straightforward: since each $Z_j^M \sim N(0, 1)$, $\max_{j,M} Z_j^M$ has the cumulative distribution function (CDF) $\Phi^p$, where $\Phi$ denotes the CDF of $N(0, 1)$, and, therefore,
\[
  P_{H_0}\!\left( \max_{j,M} Z_j^M \in [-c, c] \right) = \Phi^p(c) - \Phi^p(-c).
\]
For $p \geq 3$, $\Phi^p(-c) \approx 0$, and a one-sided test typically suffices. Thus a valid acceptance region controls $\Phi^p(c) = 1 - \alpha$, which is equivalent to rejecting when the maximum exceeds $c = z_{(1-\alpha)^{1/p}}$.

In more general cases, understanding the distribution of the statistics conditioned on the selection is not as trivial. Conditional inference has been actively researched throughout the years by Buehler and Feddersen (1963), Brown (1967), Olshen (1973), and others. Weinstein et al. (2013) provide a conditional approach to confidence intervals for the case of Gaussian distributions, which can be utilized when selection is determined by some predefined threshold (say $z_{1-\alpha/2}$).
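To make this correction concrete, the short sketch below (our own) computes the exact cutoff $c = z_{(1-\alpha)^{1/p}}$ and, for comparison, the selective type I error of the naive cutoff $z_{1-\alpha/2} = 1.96$ in the idealized setting of $p$ independent $N(0,1)$ statistics; $\alpha = 0.05$ is an arbitrary choice.

```python
from scipy.stats import norm

alpha = 0.05
for p in (1, 2, 5, 10, 20):
    c = norm.ppf((1 - alpha) ** (1 / p))                   # solves Phi^p(c) = 1 - alpha
    naive = 1 - (norm.cdf(1.96) - norm.cdf(-1.96)) ** p    # P(max_j |Z_j| > 1.96) under H0, independent Z_j
    print(p, round(c, 3), round(naive, 3))
```

Already at $p = 20$ the naive cutoff rejects a true null almost two thirds of the time, while the corrected cutoff grows only slowly with $p$.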

1.3 Recent progress

In this project, we review two recent breakthroughs in the conditional inference school, which have contributed substantially to solving the problem of post-selection inference in several common settings. The first, due to Fithian et al. (2014), approaches conditional inference in a general way by defining conditional tests and confidence intervals, and then explores in detail how to perform conditional inference in an exponential family setting. This results in an important theoretical framework, but involves quantities that are not always easy to compute in practice. The second paper, due to Taylor et al. (2014), considers the special case of a Gaussian noise model and a specific class of polyhedral selection procedures, which select variables based on a set of affine constraints on the response variable. In this case, they are able to derive p-values and selection intervals for exact inference conditioned on the model selected. While this approach has less generality, in contrast to Fithian et al. (2014), Taylor et al. (2014) are able to give easily computable tools for inference, as well as tight, computationally efficient approximations for several common selection procedures. In the following sections we present the methods and results of both Fithian et al. (2014) and Taylor et al. (2014) in greater detail, and then conclude with a general discussion of post-selection inference.

2 Conditioning on selection in exponential families

Fithian et al. (2014) provide definitions and notation for conditional inference, and demonstrate how it can be performed in the exponential family setting. Their leading guideline is that the inference must be valid given that the question was asked, where a question is precisely defined as a tuple $(M, H_{0,j}^M)$, testing the hypothesis $H_{0,j}^M$ for the $j$-th predictor in the selected model $M$. In that case, a level-$\alpha$ selective test is expected to control the selective type I error:
\[
  P_{M, H_0}\!\left( \text{reject } H_{0,j}^M \,\middle|\, (M, H_{0,j}^M) \text{ selected} \right) \leq \alpha.
\]
However, Fithian et al. (2014) develop an even more general framework for the problem. For any event $A$, a hypothesis test is a function $\phi(y)$ of the random variable $Y$, taking values in $[0, 1]$. $\phi(y)$ is said to control the selective type I error if
\[
  E_F[\phi(Y) \mid A] \leq \alpha, \qquad \text{for all } F \in H_0.
\]
Adopting the conditional point of view, classical statistical theory can be restated in a conditional manner, defining power functions, confidence intervals, tests, and admissibility as expected in a conditional context. For example, selective confidence intervals have $1 - \alpha$ coverage for the parameter $\theta(F)$ if
\[
  P_F\!\left( \theta(F) \in C(Y) \mid A \right) \geq 1 - \alpha, \qquad \text{for all } F,
\]
and they can be obtained by inverting selective tests, as in the classical case.

The major question remaining is how to control the conditional distributions of test statistics. The general case is often quite hard, but a main result of the paper is that, for test statistics with exponential family distributions, it follows naturally. If $Y$ follows a multi-parameter exponential family, then by the factorization theorem
\[
  Y \sim \exp\{\theta^\top T(y) - \psi(\theta)\}\, f_0(y),
\]
and so, conditioning $Y$ on the event $A$,
\[
  (Y \mid Y \in A) \sim \exp\{\theta^\top T(y) - \psi(\theta)\}\, f_0(y)\, \mathbb{I}\{y \in A\},
\]
which is also an exponential family distribution. This reduces the problem to the well known problem of exponential family inference, in which many classical tools are available, such as the uniformly most powerful unbiased (UMPU) test developed by Lehmann and Scheffe (1955).
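As a concrete one-parameter instance of the display above (our illustration, not an example taken from Fithian et al. (2014)), let $Y \sim N(\theta, 1)$ and condition on the selection event $A = \{Y > c\}$. Then
\[
  f_\theta(y \mid Y \in A)
  = \frac{e^{\theta y - \theta^2/2}\, \phi(y)\, \mathbb{I}\{y > c\}}{\Phi(\theta - c)}
  = \exp\{\theta y - \psi_A(\theta)\}\, \phi(y)\, \mathbb{I}\{y > c\},
  \qquad \psi_A(\theta) = \tfrac{\theta^2}{2} + \log \Phi(\theta - c),
\]
where $\phi$ is the standard normal density. The conditional law is again a one-parameter exponential family with the same sufficient statistic $T(y) = y$, only with the carrier truncated and the log-partition function $\psi$ replaced by $\psi_A$, so classical exponential family tests can be applied directly.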
However, even with the UMPU test, computing the test values requires knowing the likelihood function $L_\theta(Y \mid A)$ for all $\theta \in \Theta$. In simple cases this can be computed explicitly, such

as the selection of the maximum described in the simulation. In most cases, however, computationally expensive Monte Carlo methods are required.

The final point to be discussed is the choice of the event $A$ on which to condition. $A$ must be generated from a selection variable partitioning the space according to the different models, but there are many ways to partition the space in such a way, and the inference can vary significantly with the choice of the conditioning event. An important result demonstrates that the finer the partitioning, the more information is absorbed by the conditioning, leaving less for inference. A particular case of interest is data splitting, which can be viewed as a scenario in which the conditioning discards all of the information in the training set. A coarser partition would condition only on the training set having produced the selected model, yielding greater power. The paper demonstrates that it is almost always possible to choose such a coarser partition than the one implied by data splitting, and hence data splitting is inadmissible.

3 The special case of Gaussian inference after polyhedral selection

Taylor et al. (2014) pioneer a new exact approach to post-selection inference for a class of variable selection procedures that are polyhedral (as defined in Section 3.1). For such selectors, they derive exact p-values for testing the significance of predictor variables, conditioned on the result of variable selection. To do this, they make two assumptions on the distribution of the data. First, they assume the covariate matrix $X$ lies in general position (a very weak rank-like assumption, which is satisfied almost surely by continuously distributed variables). The second and more restrictive assumption is that the noise is Gaussian. That is, the true model has the form
\[
  y = \theta(X) + \varepsilon, \qquad (1)
\]
where $\varepsilon \sim N(0, \Sigma)$ and $\theta : \mathbb{R}^p \to \mathbb{R}$ is some (not necessarily linear) function of the predictors, applied to each row of $X$. In this section, we first describe the structure of polyhedral selection and outline how this allows us to perform valid post-selection inference under the model (1). We then show that several common variable selection procedures are polyhedral, and derive the corresponding post-selection inference procedures.

[Figure 2: Illustration of marginal screening as a polyhedral selector. Here, we observe two samples and have three covariates. If the response $y \in \mathbb{R}^2$ falls in either shaded region, then variables $x_1$ and $x_3$ will be selected (as they are sufficiently correlated with $y$, while $x_2$ is not). The bottom-left and top-right regions correspond to negative and positive correlations, and hence negative and positive estimated coefficients, respectively.]

3.1 Polyhedral selection

Suppose we observe $n$ samples of $p$ predictor variables $(x_1, \ldots, x_p) \in \mathbb{R}^p$ and a corresponding response variable $y \in \mathbb{R}$, giving a covariate matrix $X \in \mathbb{R}^{n \times p}$ and a response vector $y \in \mathbb{R}^n$. Recall that a polyhedron in $\mathbb{R}^n$ is any set $P \subseteq \mathbb{R}^n$ defined by a finite set of affine constraints (i.e., $P = \{z \in \mathbb{R}^n : \Gamma z \geq u\}$ for some $\Gamma \in \mathbb{R}^{m \times n}$, $u \in \mathbb{R}^m$, where the inequality holds componentwise). A variable selection procedure is called polyhedral if, for any covariate matrix $X \in \mathbb{R}^{n \times p}$, the sets of outcomes $y \in \mathbb{R}^n$ resulting in any particular selection of variables with particular correlation signs $\mathrm{sign}(X_j^\top y)$ are all polyhedra.

As an illustrative example, a very simple polyhedral selection procedure is marginal screening (MS), illustrated in Figure 2. MS simply selects those variables which are sufficiently correlated with the response. That is, for some $c \in [0, 1]$, MS will select a variable $x_j$ if and only if $|X_j^\top y| \geq c \|X_j\|_2$, where $c$ can be chosen based on the desired number of selected variables. Suppose, for instance, that MS selects the $l$ variables $\{x_{k_i}\}_{i=1}^{l}$ with a particular sequence of signs $\{s_i\}_{i=1}^{l} = \{\mathrm{sign}(X_{k_i}^\top y)\}_{i=1}^{l}$. This occurs if and only if, for each selected $x_{k_i}$, $s_i X_{k_i}^\top y \geq c \|X_{k_i}\|_2$ and, for each unselected $x_j$, both $X_j^\top y \geq -c \|X_j\|_2$ and $-X_j^\top y \geq -c \|X_j\|_2$. Thus, if, for each $i \in \{1, \ldots, l\}$, $\Gamma \in \mathbb{R}^{(l + 2(p-l)) \times n}$ has a row $s_i X_{k_i}^\top$ and $u \in \mathbb{R}^{l + 2(p-l)}$ has a corresponding entry $c \|X_{k_i}\|_2$, and, for each $j \notin \{k_1, \ldots, k_l\}$, $\Gamma$ has two rows, $X_j^\top$ and $-X_j^\top$, and $u$ has two corresponding entries, both $-c \|X_j\|_2$, then this selection event is exactly the polyhedron $\{y \in \mathbb{R}^n : \Gamma y \geq u\}$. See Lee and Taylor (2014) for a detailed study of inference after model selection through marginal screening.

The next section presents two lemmas, which together suggest how to perform exact post-selection inference after polyhedral selection.

3.2 Conditional Gaussian inference after polyhedral selection

The hypothesis that $y$ is uncorrelated with $x_j$ can be phrased as $x_j^\top \theta = 0$ (recalling that $y$ is $\theta$ plus noise), suggesting that we are interested in the distribution of $v^\top y$ for certain $v \in \mathbb{R}^n$. Lemma 1 gives a representation of a polyhedron in terms of an arbitrary such $v \in \mathbb{R}^n$.

[Figure 3: Illustration of the polyhedron $P = \{y \in \mathbb{R}^n : \Gamma y \geq u\}$ encoded in terms of $V^{lo}(y)$, $V^{hi}(y)$, and $V^0(y)$. Since $v$ points precisely to the right, $V^{lo}(y)$ encodes the rightmost constraint to the left of $y$, $V^{hi}(y)$ encodes the leftmost constraint to the right of $y$, and $V^0(y)$ encodes constraints parallel to $v$ (of which none are shown here).]

Lemma 1 (Polyhedral selection as truncation): Suppose $y \sim N(\theta, \Sigma)$, where $\theta \in \mathbb{R}^n$ is unknown but $\Sigma \in \mathbb{R}^{n \times n}$ is known. Consider a polyhedron $P = \{y \in \mathbb{R}^n : \Gamma y \geq u\}$, where $\Gamma \in \mathbb{R}^{m \times n}$ and $u \in \mathbb{R}^m$. If $v^\top \Sigma v \neq 0$, then, for $\rho := \frac{\Gamma \Sigma v}{v^\top \Sigma v}$ and
\[
  V^0(y) := \max_{j : \rho_j = 0} \; u_j - (\Gamma y)_j, \qquad
  V^{hi}(y) := \min_{j : \rho_j < 0} \; \frac{u_j - (\Gamma y)_j + \rho_j v^\top y}{\rho_j}, \qquad
  V^{lo}(y) := \max_{j : \rho_j > 0} \; \frac{u_j - (\Gamma y)_j + \rho_j v^\top y}{\rho_j},
\]
the triple $(V^0(y), V^{hi}(y), V^{lo}(y))$ is independent of $v^\top y$, and
\[
  y \in P \iff V^0(y) \leq 0 \text{ and } v^\top y \in [V^{lo}(y), V^{hi}(y)].
\]
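To make Lemma 1 and the marginal screening construction concrete, here is a minimal numerical sketch (our own code, not from Taylor et al. (2014) or Lee and Taylor (2014)). The sample size, dimension, threshold $c$, and the simple data-generating model are illustrative assumptions, and the contrast $v$ is taken to be the one testing the first selected coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, c, sigma = 25, 4, 0.5, 1.0
X = rng.standard_normal((n, p))
y = X[:, 0] + sigma * rng.standard_normal(n)    # hypothetical data-generating model

# Marginal screening: select j iff |X_j' y| >= c * ||X_j||_2, and record signs.
corr = X.T @ y
norms = np.linalg.norm(X, axis=0)
selected = np.flatnonzero(np.abs(corr) >= c * norms)
signs = np.sign(corr[selected])
assert selected.size > 0

# Encode the selection event {Gamma y >= u}: one row per selected variable,
# two rows per unselected variable, as in Section 3.1.
rows, lims = [], []
for s, k in zip(signs, selected):
    rows.append(s * X[:, k]); lims.append(c * norms[k])
for j in range(p):
    if j not in selected:
        rows.append(X[:, j]);  lims.append(-c * norms[j])
        rows.append(-X[:, j]); lims.append(-c * norms[j])
Gamma, u = np.array(rows), np.array(lims)
assert np.all(Gamma @ y >= u)                   # the observed y lies in the polyhedron

# Lemma 1: truncation limits for a contrast v (here, the first selected coefficient).
Sigma = sigma ** 2 * np.eye(n)
j0 = selected[0]
v = X[:, j0] / norms[j0] ** 2                   # v'y is the univariate LS coefficient
rho = Gamma @ Sigma @ v / (v @ Sigma @ v)
num = u - Gamma @ y + rho * (v @ y)
V_lo = np.max(num[rho > 0] / rho[rho > 0]) if np.any(rho > 0) else -np.inf
V_hi = np.min(num[rho < 0] / rho[rho < 0]) if np.any(rho < 0) else np.inf
V_0 = np.max((u - Gamma @ y)[rho == 0]) if np.any(rho == 0) else -np.inf
print(V_lo, v @ y, V_hi, V_0)
```

Under the $\Gamma y \geq u$ convention adopted above, the printed values satisfy $V^{lo} \leq v^\top y \leq V^{hi}$ (and $V^0 \leq 0$ whenever constraints with $\rho_j = 0$ exist), as Lemma 1 requires.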

The intuition behind Lemma 1 is illustrated in Figure 3. The independence results from the fact that $(V^0(y), V^{hi}(y), V^{lo}(y))$ is a function of $y - \frac{\Sigma v v^\top y}{v^\top \Sigma v}$, which is independent of $v^\top y$ because $y \sim N(\theta, \Sigma)$. Since $v^\top y$ has a Gaussian distribution, Lemma 1 tells us that $v^\top y \mid y \in P$ has a 1-dimensional truncated Gaussian distribution.

We now establish some notation which will allow us to construct our desired test statistic. For $a, b, \mu, \sigma \in \mathbb{R}$ with $a < b$ and $\sigma > 0$, let $F_{\mu,\sigma^2}^{[a,b]}$ denote the CDF of $N(\mu, \sigma^2)$ truncated to $[a, b]$, i.e.,
\[
  F_{\mu,\sigma^2}^{[a,b]}(x) = \frac{\Phi\!\left(\frac{x - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)}{\Phi\!\left(\frac{b - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)}, \qquad x \in [a, b],
\]
where $\Phi$ denotes the CDF of $N(0, 1)$. We then have the following pivotal statistic:

Lemma 2 (Conditional p-values): For $v \in \mathbb{R}^n$ and $\Sigma \in \mathbb{R}^{n \times n}$, if $v^\top \Sigma v \neq 0$, then
\[
  F_{v^\top \theta,\; v^\top \Sigma v}^{[V^{lo}(y),\, V^{hi}(y)]}(v^\top y) \;\Big|\; y \in P \;\sim\; \mathrm{Unif}(0, 1),
\]
i.e., $F_{v^\top \theta,\; v^\top \Sigma v}^{[V^{lo}(y),\, V^{hi}(y)]}(v^\top y)$ is a valid p-value conditioned on $y \in P$.

The proof of Lemma 2 is also fairly simple, and depends essentially on the independence of $(V^0, V^{lo}, V^{hi})$ and $v^\top y$. Lemma 2 in turn gives rise to the following two-sided conditional hypothesis test of $H_0 : v^\top \theta = 0$ against $H_1 : v^\top \theta \neq 0$ (a more powerful one-sided conditional test for the alternative $H_1 : v^\top \theta > 0$ is similarly easy to derive):

Lemma 3 (Two-sided conditional inference after polyhedral selection): Suppose $v^\top \Sigma v \neq 0$. Define the statistic
\[
  T := 2 \min\left\{ F_{0,\, v^\top \Sigma v}^{[V^{lo}(y),\, V^{hi}(y)]}(v^\top y),\; 1 - F_{0,\, v^\top \Sigma v}^{[V^{lo}(y),\, V^{hi}(y)]}(v^\top y) \right\}.
\]
Then $T$ is a valid conditional p-value for $H_0$; i.e., under $H_0$, $T \mid y \in P \sim \mathrm{Unif}(0, 1)$.
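To make Lemmas 2 and 3 concrete, the following minimal sketch (our own) implements the truncated Gaussian CDF and the two-sided conditional p-value; the truncation limits and the observed value of $v^\top y$ are placeholders standing in for the quantities computed via Lemma 1.

```python
import numpy as np
from scipy.stats import norm

def trunc_gauss_cdf(x, mu, sigma2, a, b):
    """CDF of N(mu, sigma2) truncated to [a, b], evaluated at x (naive formula)."""
    s = np.sqrt(sigma2)
    num = norm.cdf((x - mu) / s) - norm.cdf((a - mu) / s)
    den = norm.cdf((b - mu) / s) - norm.cdf((a - mu) / s)
    return num / den

def selective_p_value(vty, var_vty, V_lo, V_hi):
    """Two-sided conditional p-value of Lemma 3 for H0: v'theta = 0."""
    F = trunc_gauss_cdf(vty, 0.0, var_vty, V_lo, V_hi)
    return 2 * min(F, 1 - F)

# Placeholder numbers purely for illustration: v'y = 2.1 would look "significant"
# unconditionally, but conditioned on the selection event it is not.
print(selective_p_value(vty=2.1, var_vty=1.0, V_lo=1.5, V_hi=6.0))
```

In practice the naive CDF formula can be numerically unstable when the truncation region lies far in the tail, which is one reason dedicated implementations are preferred.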

3.3 Application to forward stepwise regression

In Section 3.1, we showed that a simple selection procedure, marginal screening, is polyhedral. A somewhat more elaborate selection procedure, forward stepwise regression (FS), is discussed by Taylor et al. (2014). FS is an iterative procedure which, in each iteration, adds the variable causing the largest drop in residual error (the assumption that $X$ has columns in general position guarantees uniqueness of the selected variables and their signs). In particular, let $A_k = [j_1, \ldots, j_k]$ and $s_{A_k} = [s_1, \ldots, s_k]$ denote the ordered list of active variables and their signs (upon entering) after $k$ iterations. Since FS greedily chooses variables which minimize the residual sum of squares (RSS), for any $j \notin \{j_1, \ldots, j_k\}$,
\[
  \mathrm{RSS}(y, X_{A_k}) \;\leq\; \mathrm{RSS}(y, X_{A_{k-1} \cup \{j\}}) \qquad (2)
\]
and
\[
  s_k = \mathrm{sign}\!\left(e_k^\top (X_{A_k})^{+}\, y\right). \qquad (3)
\]
This allows us to express FS as a polyhedral selection method, as follows.

Theorem 1 (FS selection is polyhedral): The set $P := \{y \in \mathbb{R}^n : \hat{A}_k(y) = A_k,\ \hat{s}_{A_k}(y) = s_{A_k}\}$ of response vectors $y$ resulting in the FS active set $A_k$ and signs $s_{A_k}$ is a polyhedron $P = \{y \in \mathbb{R}^n : \Gamma y \geq 0\}$.

Proof: We construct $\Gamma$ by adding $2(p - k)$ rows in each iteration $k$ of FS. If $j_1$ and $s_1$ are the first variable and sign chosen by FS, then the conditions (2) and (3) are equivalent to
\[
  \frac{s_1 X_{j_1}^\top y}{\|X_{j_1}\|_2} \;\geq\; \pm \frac{X_j^\top y}{\|X_j\|_2}, \qquad j \notin A_k.
\]
Thus, we add $2(p - k)$ rows to $\Gamma$ of the form
\[
  \frac{s_1 X_{j_1}^\top}{\|X_{j_1}\|_2} \pm \frac{X_j^\top}{\|X_j\|_2}
\]
(with one row for each pair of a $\pm$ sign and $j \notin A_k$); later iterations are handled analogously.

To test the null hypothesis $H_0 : v^\top \theta = 0$ against $H_1 : v^\top \theta \neq 0$, we simply compute $V^{lo}(y)$, $V^{hi}(y)$, $V^0(y)$ as in Lemma 1 and then use the test statistic
\[
  T_k := 2 \min\left\{ F_{0,\, \sigma^2 \|v\|_2^2}^{[V^{lo}(y),\, V^{hi}(y)]}(v^\top y),\; 1 - F_{0,\, \sigma^2 \|v\|_2^2}^{[V^{lo}(y),\, V^{hi}(y)]}(v^\top y) \right\},
\]
and, similarly, a $(1 - \alpha)$-confidence interval for $v^\top \theta$ is $[\delta_{\alpha/2}, \delta_{1-\alpha/2}]$ satisfying
\[
  1 - F_{\delta_{\alpha/2},\, \sigma^2 \|v\|_2^2}^{[V^{lo}(y),\, V^{hi}(y)]}(v^\top y) = \alpha/2
  \qquad \text{and} \qquad
  1 - F_{\delta_{1-\alpha/2},\, \sigma^2 \|v\|_2^2}^{[V^{lo}(y),\, V^{hi}(y)]}(v^\top y) = 1 - \alpha/2.
\]

3.4 Other consequences

The ideas presented in Section 3.2 can be applied and extended in a straightforward manner to derive other useful tools for model selection and post-selection inference, some of which we discuss below.

Other Selection Procedures: Lee et al. (2013) apply the polyhedral selection framework to the lasso at any value of the regularization parameter $\lambda$ and derive appropriate p-values and selection intervals. Taylor et al. (2014) do the same for least angle regression (LAR). Because, in practice, the constraint matrix $\Gamma$ used in Lemma 1 can grow very large (with roughly $4pk$ rows after $k$ steps of LAR), Taylor et al. (2014) also derive a much smaller approximate set of constraints (with at most $k + 1$ rows), which they show yields a highly accurate and computationally efficient approximation to the true selection polyhedron.

Selection Intervals: The hypothesis test presented in Lemma 3 can be inverted to compute exact selection intervals (as described in Section 4.2).

Stopping Rules: Using general methods for sequential hypothesis testing from G'Sell et al. (2013b), we can establish a stopping rule for iterative selection methods (such as FS and LAR) with guarantees on the false discovery rate.
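As a companion to the "Selection Intervals" point above, here is a sketch (our own) of inverting the Lemma 3 pivot numerically to obtain a selection interval for $v^\top \theta$; the observed $v^\top y$, its variance, the truncation limits, and the root-finding bracket are placeholder assumptions.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def trunc_gauss_sf(x, mu, sigma2, a, b):
    """1 - F for N(mu, sigma2) truncated to [a, b], evaluated at x."""
    s = np.sqrt(sigma2)
    den = norm.cdf((b - mu) / s) - norm.cdf((a - mu) / s)
    return (norm.cdf((b - mu) / s) - norm.cdf((x - mu) / s)) / den

def selection_interval(vty, var_vty, V_lo, V_hi, alpha=0.05, width=7.0):
    """Solve 1 - F_{delta}(v'y) = alpha/2 and 1 - alpha/2 for the two endpoints.
    `width` is a crude search bracket (in standard deviations) around v'y."""
    s = np.sqrt(var_vty)
    lo = brentq(lambda d: trunc_gauss_sf(vty, d, var_vty, V_lo, V_hi) - alpha / 2,
                vty - width * s, vty + width * s)
    hi = brentq(lambda d: trunc_gauss_sf(vty, d, var_vty, V_lo, V_hi) - (1 - alpha / 2),
                vty - width * s, vty + width * s)
    return min(lo, hi), max(lo, hi)

# Placeholder numbers matching the p-value sketch above.
print(selection_interval(vty=2.1, var_vty=1.0, V_lo=1.5, V_hi=6.0))
```

For these placeholder numbers the interval is far wider than the naive $v^\top y \pm 1.96$ interval and contains 0, consistent with the non-significant conditional p-value in the earlier sketch.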

4 Discussion

4.1 Coverage in Model Selection

It is important to emphasize that the entire field of inference after model selection is highly sensitive to its assumptions. Without the (usually unreasonable) assumption that the true model is the linear model just selected, the meaning of the coefficients is vague. Even assuming that there exists a real vector $\beta$ generating $Y$ in a linear way, any selected model will typically contain only a subset of the predictors, in which case each of the selected components estimates a linear combination of the entries of the full vector $\beta$. In particular, any estimator upon which we perform inference is highly influenced by correlations with predictors that are not selected. This point has been demonstrated in G'Sell et al. (2013a). One way of defining coverage in this context is as in Berk et al. (2013), in which an estimator should reliably cover its own expected value. This addresses the problem of defining confidence intervals and valid tests, although the interpretation of the interval is still unclear, as the expected value is also affected by the unselected predictors.

4.2 Selection Intervals

As mentioned previously, conditional hypothesis tests such as that presented in Lemma 3 can be inverted to compute selection intervals. In contrast with traditional confidence intervals, selection intervals guarantee coverage of the expectation of an estimator (which is itself a random variable resulting from model selection), consistent with the notion of conditional inference given in Section 4.1. That is, we can find an interval $I$ around an estimated coefficient $\hat\beta_j$ with the property
\[
  P\!\left( E[\hat\beta_j] \in I \,\middle|\, y \in P \right) = 1 - \alpha.
\]

4.3 Unconditional Inference

Taylor et al. (2014) point out a simple but important observation that characterizes the utility of conditional hypothesis tests. In particular, if $A$ is the event of a type I error, then, for a valid $\alpha$-level hypothesis test conditioned on some random variable $X = x$,
\[
  \alpha \geq P[A \mid X = x] = \frac{P[A,\, X = x]}{P[X = x]}, \quad \text{and hence} \quad \alpha\, P[X = x] \geq P[A,\, X = x].
\]
Integrating over $x$ then gives $\alpha \geq P[A]$. Thus, a valid conditional $\alpha$-level hypothesis test is also a valid unconditional $\alpha$-level hypothesis test. In the case of variable selection, $X$ might be the subset of selected variables. In fact, a similar argument shows that the conditional test is valid under any coarsening of the condition; for example, a valid test conditioned on $X = x$ would also be a valid test conditioned on $X \in \mathcal{X}$, for any set $\mathcal{X}$ containing $x$.

References

Richard Berk, Lawrence Brown, Andreas Buja, Kai Zhang, and Linda Zhao. Valid post-selection inference. The Annals of Statistics, 41(2), 2013.

Larry Brown. The conditional level of Student's t test. The Annals of Mathematical Statistics, 38(4), 1967.

Robert J. Buehler and Alan P. Feddersen. Note on a conditional property of Student's t. The Annals of Mathematical Statistics, 34(3), 1963.

William Fithian, Dennis Sun, and Jonathan Taylor. Optimal inference after model selection. arXiv preprint, 2014.

Max Grazier G'Sell, Trevor Hastie, and Robert Tibshirani. False variable selection rates in regression. arXiv preprint, 2013a.

Max Grazier G'Sell, Stefan Wager, Alexandra Chouldechova, and Robert Tibshirani. Sequential selection procedures and false discovery rate control. arXiv preprint, 2013b.

Jason D. Lee and Jonathan E. Taylor. Exact post model selection inference for marginal screening. In Advances in Neural Information Processing Systems, 2014.

Jason D. Lee, Dennis L. Sun, Yuekai Sun, and Jonathan E. Taylor. Exact post-selection inference with the lasso. arXiv preprint, 2013.

E. L. Lehmann and Henry Scheffe. Completeness, similar regions and unbiased tests, Part II. Sankhya, 15, 1955.

R. A. Olshen. The conditional level of the F-test. Journal of the American Statistical Association, 68(343), 1973.

Jonathan Taylor, Richard Lockhart, Ryan J. Tibshirani, and Robert Tibshirani. Post-selection adaptive inference for least angle regression and the lasso. arXiv preprint, 2014.

Asaf Weinstein, William Fithian, and Yoav Benjamini. Selection adjusted confidence intervals with more power to determine the sign. Journal of the American Statistical Association, 108(501), 2013.
