Post-selection Inference for Forward Stepwise and Least Angle Regression
Ryan Tibshirani & Rob Tibshirani
Carnegie Mellon University & Stanford University
Joint work with Jonathan Taylor, Richard Lockhart
September
Matching results from picadilo.com
[Figure: opening joke. Photos of Ryan Tibshirani (CMU, PhD student of Taylor, 2011) and Rob Tibshirani (Stanford), with top face-match confidences from picadilo.com of 81%, 71%, and 69%.]
Conclusion: confidence, the strength of evidence, matters!
Outline
- Setup and basic question
- Quick review of least angle regression and the covariance test
- A new framework for inference after selection
- Application to forward stepwise and least angle regression
- Application of these and related ideas to other problems
Setup and basic question
Given an outcome vector y ∈ R^n and a predictor matrix X ∈ R^{n×p}, we consider the usual linear regression setup:
y = Xβ* + σε,
where β* ∈ R^p are unknown coefficients to be estimated, and the components of the noise vector ε ∈ R^n are i.i.d. N(0, 1).
Main question: if we apply least angle or forward stepwise regression, how can we compute valid p-values and confidence intervals?
Forward stepwise regression
This procedure enters predictors one at a time, choosing the predictor that most decreases the residual sum of squares at each stage. Defining RSS to be the residual sum of squares for the model containing k predictors, and RSS_null the residual sum of squares before the kth predictor was added, we can form the usual statistic
R_k = (RSS_null - RSS)/σ²
(with σ assumed known), and compare it to a χ²_1 distribution.
Simulated example: naive forward stepwise
Setup: n = 100, p = 10, true model null.
[Figure: quantile plot of the naive test statistic against χ²_1 quantiles.]
The test is too liberal: for nominal size 5%, the actual type I error is 39%. (Yes, Larry, we can get proper p-values by sample splitting: but that is messy, with a loss of power.)
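A minimal simulation (my own sketch, not the authors' code) reproduces this inflation for the first step only: under the global null, the best of p candidate drops in RSS is the maximum of p roughly χ²_1 statistics, so comparing it to a single χ²_1 is anti-conservative.

```python
import numpy as np
from scipy import stats

def naive_first_step_type1_error(n=100, p=10, nsim=2000, alpha=0.05, seed=0):
    """Simulate the naive chi-squared test for the FIRST forward-stepwise
    step under the global null (beta* = 0, sigma = 1 known)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(nsim):
        X = rng.standard_normal((n, p))
        X /= np.linalg.norm(X, axis=0)   # unit-norm columns
        y = rng.standard_normal(n)       # global null: y is pure noise
        # drop in RSS from adding the single best predictor
        R1 = np.max((X.T @ y) ** 2)
        pval = stats.chi2.sf(R1, df=1)   # naive chi-squared(1) comparison
        rejections += pval < alpha
    return rejections / nsim
```

With near-independent predictors the rejection rate is roughly 1 - 0.95^10 ≈ 0.40, in line with the 39% quoted on the slide.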
Quick review of LAR and the covariance test
Least angle regression (LAR) is a method for constructing the path of solutions for the lasso:
min_{β_0, β} Σ_i (y_i - β_0 - Σ_j x_ij β_j)² + λ Σ_j |β_j|
LAR is a more democratic version of forward stepwise regression:
- Find the predictor most correlated with the outcome
- Move the parameter vector in the least squares direction until some other predictor has as much correlation with the current residual
- This new predictor is added to the active set, and the procedure is repeated
- Optional ("lasso mode"): if a non-zero coefficient hits zero, that predictor is dropped from the active set, and the process is restarted
Least angle regression in a picture
[Figure: lasso coefficient paths plotted against log(λ), with knots λ_1 > λ_2 > λ_3 > λ_4 marking the values at which variables enter.]
The covariance test for LAR
(Lockhart, Taylor, Ryan Tibshirani, Rob Tibshirani, discussion paper in Annals of Statistics, 2014)
The covariance test provides a p-value for each variable as it is added to the lasso model via the LAR algorithm. In particular it tests the hypothesis
H_0: supp(β*) ⊆ A,
where A is the running active set at the current step of LAR.
The covariance test for LAR
Suppose we want a p-value for predictor 2, entering at step 3.
[Figure: coefficient paths against log(λ), highlighting the step at which predictor 2 enters.]
Compute the covariance at λ_4: ⟨y, Xβ̂(λ_4)⟩.
[Figure: coefficient paths, with the fit evaluated at the knot λ_4.]
Drop x_2, yielding active set A; refit at λ_4, and compute the resulting covariance at λ_4, giving
T_3 = (⟨y, Xβ̂(λ_4)⟩ - ⟨y, X_A β̂_A(λ_4)⟩)/σ².
[Figure: coefficient paths for the refitted path with predictor 2 removed.]
Null distribution of the covariance statistic
Under the null hypothesis that all signal variables are in the model, H_0: supp(β*) ⊆ A, the covariance statistic satisfies
T_j = (1/σ²)(⟨y, Xβ̂(λ_{j+1})⟩ - ⟨y, X_A β̂_A(λ_{j+1})⟩) → Exp(1) as n, p → ∞.
Equivalent "knot form":
T_j = (c_j/σ²) · λ_j(λ_j - λ_{j+1}),
with c_j = 1 for the global null case (j = 1).
Simulated example: covariance test
Setup: n = 100, p = 10, true model null.
[Figure: quantile plot of the covariance test statistic against Exp(1) quantiles.]
Example: prostate cancer data
Data from a study of the level of prostate-specific antigen and p = 8 clinical measures, in n = 67 men about to receive a radical prostatectomy.
[Table: p-values from naive forward stepwise and the LAR covariance test for the predictors lcavol, lweight, svi, lbph, pgg45, age, lcp, gleason; numeric values lost in transcription.]
Shortcomings of the covariance test
The covariance test is highly intuitive and actually pretty broad: it can be extended to other sparse estimation problems [e.g., graphical models and clustering (G'Sell et al. 2014)].
But it has some definite weaknesses:
1. Places correlation restrictions on the predictors X (somewhat similar to standard conditions for exact model recovery)
2. Assumes linearity of the underlying model (i.e., y = Xβ* + ε)
3. Significance statements are asymptotic
4. Hard to get confidence statements out of the covariance test statistic (not easily "pivotable")
5. Doesn't have a cool enough name
We'll discuss a new framework that overcomes 1-4, and especially 5.
What's coming next: a roadmap
1. General framework for inference after polyhedral selection
2. Application to forward stepwise regression and LAR
3. For LAR, we obtain the spacing test, which is exact in finite samples
Consider the global null, σ² = 1, and let λ_1, λ_2 be the first two LAR knots.
Covariance test: λ_1(λ_1 - λ_2) → Exp(1) as n, p → ∞
Spacing test: (1 - Φ(λ_1))/(1 - Φ(λ_2)) ~ Unif(0, 1) for any n, p
(Intriguing connection: these are asymptotically equivalent)
Preview: prostate cancer data
[Table: naive versus selection-adjusted p-values for lcavol, lweight, svi, lbph, pgg45, lcp, age, gleason; numeric values lost in transcription.]
A new framework for inference after selection
The polyhedral testing framework
Suppose we observe y ~ N(θ, Σ), with mean parameter θ unknown (but covariance Σ known).
We wish to make inferences on ν^T θ, a linear contrast of the mean θ, conditional on y ∈ S, where S is a polyhedron
S = {y : Γy ≥ u}.
The vector ν = ν(S) is allowed to depend on S.
E.g., we'd like a p-value P(y, S, ν) that satisfies
P_{ν^T θ = 0}(P(y, S, ν) ≤ α | Γy ≥ u) = α for any 0 ≤ α ≤ 1.
Fundamental result
Lemma (Polyhedral selection as truncation). For any ν ≠ 0, we have
{Γy ≥ u} = {V^-(y) ≤ ν^T y ≤ V^+(y), V^0(y) ≥ 0},
where, with γ = ΓΣν/(ν^T Σν),
V^-(y) = max_{j: γ_j > 0} (u_j - (Γy)_j + γ_j ν^T y)/γ_j,
V^+(y) = min_{j: γ_j < 0} (u_j - (Γy)_j + γ_j ν^T y)/γ_j,
V^0(y) = min_{j: γ_j = 0} ((Γy)_j - u_j).
Moreover, the triplet (V^-, V^+, V^0)(y) is independent of ν^T y.
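The limits in the lemma can be computed directly from (Γ, u, ν, Σ, y). The sketch below is my own translation of the formulas above (function and variable names are illustrative), assuming exact zeros in γ can be detected by equality:

```python
import numpy as np

def truncation_limits(Gamma, u, nu, Sigma, y):
    """Compute (V-, V+, V0) from the polyhedral lemma for {Gamma y >= u}."""
    gamma = Gamma @ Sigma @ nu / (nu @ Sigma @ nu)
    vty = nu @ y
    # (u_j - (Gamma y)_j + gamma_j * nu^T y) / gamma_j, elementwise
    resid = (u - Gamma @ y) / np.where(gamma != 0, gamma, 1.0) + vty
    pos, neg, zero = gamma > 0, gamma < 0, gamma == 0
    v_minus = resid[pos].max() if pos.any() else -np.inf
    v_plus = resid[neg].min() if neg.any() else np.inf
    v_zero = (Gamma @ y - u)[zero].min() if zero.any() else np.inf
    return v_minus, v_plus, v_zero
```

As a sanity check, for the polyhedron {y_1 ≥ -1, y_1 ≤ 2} with ν = e_1, the limits recover exactly the interval [-1, 2] for ν^T y.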
Proof
[Figure: geometric picture of the polyhedron {Γy ≥ u}, with y decomposed along the direction ν; the component ν^T y moves between the limits V^- and V^+ while P y, the part orthogonal to ν, stays fixed.]
Using this result for conditional inference
In other words, the distribution of ν^T y | {Γy ≥ u} is the same as the distribution of ν^T y | {V^- ≤ ν^T y ≤ V^+, V^0 ≥ 0}.
This is a truncated Gaussian distribution (with random limits).
How do we use this result for conditional inference on ν^T θ? Follow two main ideas:
1. For Z ~ N_{[a,b]}(μ, σ²), with F_{μ,σ²}^{[a,b]} its CDF, we have P(F_{μ,σ²}^{[a,b]}(Z) ≤ α) = α.
2. Hence also P(F_{ν^Tθ, ν^TΣν}^{[V^-, V^+]}(ν^T y) ≤ α | Γy ≥ u) = α.
Therefore to test H_0: ν^T θ = 0 against H_1: ν^T θ > 0, we can take as a conditional p-value
P = 1 - F_{0, ν^TΣν}^{[V^-, V^+]}(ν^T y),
since
P_{ν^Tθ = 0}(1 - F_{0, ν^TΣν}^{[V^-, V^+]}(ν^T y) ≤ α | Γy ≥ u) = α.
Furthermore, because the same statement holds for any fixed μ,
P_{ν^Tθ = μ}(1 - F_{μ, ν^TΣν}^{[V^-, V^+]}(ν^T y) ≤ α | Γy ≥ u) = α,
so we can compute a conditional confidence interval by inverting the pivot, yielding
P(ν^T θ ∈ [δ_{α/2}, δ_{1-α/2}] | Γy ≥ u) = 1 - α.
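The pivot is just the CDF of a truncated Gaussian, so once V^- and V^+ are in hand the one-sided p-value can be sketched with scipy's truncnorm (my own illustration, not the authors' software; far-tail truncations may need more careful numerics):

```python
from scipy.stats import truncnorm

def truncated_gaussian_pvalue(vty, v_minus, v_plus, mu, sigma):
    """One-sided p-value: mass of N_{[V-, V+]}(mu, sigma^2) to the
    right of the observed nu^T y."""
    a, b = (v_minus - mu) / sigma, (v_plus - mu) / sigma  # standardized limits
    return 1 - truncnorm.cdf(vty, a, b, loc=mu, scale=sigma)
```

For example, with limits [-1, 1], μ = 0, σ = 1, observing ν^T y = 0 gives p = 0.5 by symmetry, while an observation near the upper limit gives a small p-value.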
Illustration
[Figure: a truncated normal density.] We observe random limits V^-, V^+ (in black) and a random point ν^T y (in red); we form the truncated normal density f_{0, ν^TΣν}^{[V^-, V^+]}, and the mass to the right of ν^T y is our p-value.
Application to forward stepwise and least angle regression
Sequential model selection as a polyhedral set
Suppose we run k steps of forward stepwise or LAR, and encounter the sequence of active sets A_l, l = 1, ..., k. We can express
{y : (Â_l(y), ŝ_{A_l}(y)) = (A_l, s_{A_l}), l = 1, ..., k} = {y : Γy ≥ 0}
for some matrix Γ. Here s_{A_l} gives the signs of the active coefficients at step l. This describes all vectors y such that the algorithm would make the same selections (variables and signs) over k steps.
Important points:
- For forward stepwise, Γ has 2pk rows
- For LAR, Γ has only k + 1 rows!
- Computation of V^-, V^+ is O(number of rows of Γ)
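To make the polyhedron concrete, here is my own illustrative sketch of the Γ rows for the first forward-stepwise step with unit-norm columns: the event that variable j_1 enters with sign s_1 is exactly s_1 x_{j_1}^T y ≥ ± x_j^T y for all j ≠ j_1, giving on the order of 2p rows per step (I omit the trivial rows for j = j_1):

```python
import numpy as np

def first_step_polyhedron(X, y):
    """Rows of Gamma (with u = 0) encoding the first forward-stepwise
    selection: the chosen variable j1 with sign s1 beats every signed
    competitor, i.e. s1 * x_{j1}^T y >= +/- x_j^T y for all j != j1."""
    corr = X.T @ y
    j1 = int(np.argmax(np.abs(corr)))
    s1 = np.sign(corr[j1])
    rows = []
    for j in range(X.shape[1]):
        if j == j1:
            continue
        rows.append(s1 * X[:, j1] - X[:, j])  # s1 x_{j1}^T y - x_j^T y >= 0
        rows.append(s1 * X[:, j1] + X[:, j])  # s1 x_{j1}^T y + x_j^T y >= 0
    return np.array(rows), j1, s1
```

By construction, the observed y satisfies Γy ≥ 0, and any y' with Γy' ≥ 0 would lead the first step to the same choice of variable and sign.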
Inference for forward stepwise and LAR
Using our polyhedral framework, we can derive conditional p-values or confidence intervals for any linear contrast ν^T θ.
Notes:
- Here ν can depend on the polyhedron, i.e., can depend on the selection A_l, s_{A_l}, l = 1, ..., k
- We place no restrictions on the predictors X (general position)
- Interesting case to keep in mind: ν = X_{A_k}(X_{A_k}^T X_{A_k})^{-1} e_k, so that ν^T θ = e_k^T (X_{A_k}^T X_{A_k})^{-1} X_{A_k}^T θ, which is the kth coefficient in the multiple regression of θ on the active variables A_k
Our conditional p-value for H_0: ν^T θ = 0 is
P = [Φ(V^+/(σ‖ν‖_2)) - Φ(ν^T y/(σ‖ν‖_2))] / [Φ(V^+/(σ‖ν‖_2)) - Φ(V^-/(σ‖ν‖_2))]
This has exact (!) conditional size:
P_{ν^Tθ = 0}(P ≤ α | (Â_l(y), ŝ_{A_l}(y)) = (A_l, s_{A_l}), l = 1, ..., k) = α.
When ν = X_{A_k}(X_{A_k}^T X_{A_k})^{-1} e_k: this tests the significance of the projected regression coefficient of the last variable entered into the model, conditional on all selections that have been made so far.
Our conditional confidence interval is I = [δ_{α/2}, δ_{1-α/2}], where the endpoints are computed by inverting the truncated Gaussian pivot. This has exact (!) conditional coverage:
P(ν^T θ ∈ [δ_{α/2}, δ_{1-α/2}] | (Â_l(y), ŝ_{A_l}(y)) = (A_l, s_{A_l}), l = 1, ..., k) = 1 - α.
When ν = X_{A_k}(X_{A_k}^T X_{A_k})^{-1} e_k: this interval traps the projected regression coefficient of the last variable to enter with probability 1 - α, conditional on all selections made so far.
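Since the pivot 1 - F_{μ, τ²}^{[V^-, V^+]}(ν^T y) is increasing in μ, each interval endpoint is the root of a monotone function of μ and can be found by standard root-finding. A sketch of the inversion (my own, with an assumed bracketing interval for the search):

```python
from scipy.optimize import brentq
from scipy.stats import truncnorm

def pivot(mu, vty, v_minus, v_plus, tau):
    """Survival function of N_{[V-, V+]}(mu, tau^2) at the observed nu^T y."""
    a, b = (v_minus - mu) / tau, (v_plus - mu) / tau
    return 1 - truncnorm.cdf(vty, a, b, loc=mu, scale=tau)

def selection_interval(vty, v_minus, v_plus, tau, alpha=0.1, lo=-20.0, hi=20.0):
    """Invert the truncated Gaussian pivot: the interval endpoints solve
    pivot(mu) = alpha/2 and pivot(mu) = 1 - alpha/2."""
    d_lo = brentq(lambda m: pivot(m, vty, v_minus, v_plus, tau) - alpha / 2, lo, hi)
    d_hi = brentq(lambda m: pivot(m, vty, v_minus, v_plus, tau) - (1 - alpha / 2), lo, hi)
    return d_lo, d_hi
```

As a sanity check, with very wide truncation limits the truncated Gaussian is essentially untruncated, and the interval recovers the usual normal-theory interval ν^T y ± z_{α/2} τ.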
The spacing test
The spacing test is the name we give to our framework applied to LAR. (... tentative paper title 2014: "A Spacing Odyssey")
Because Γ is especially simple for LAR, this test is easy and efficient to implement. The p-value for H_0: e_k^T (X_{A_k}^T X_{A_k})^{-1} X_{A_k}^T θ = 0 is
P = [Φ(V^-/(σ ω_k)) - Φ(λ_k/(σ ω_k))] / [Φ(V^-/(σ ω_k)) - Φ(V^+/(σ ω_k))]
where λ_k is the kth LAR knot, and
ω_k = ‖X_{A_k}(X_{A_k}^T X_{A_k})^{-1} s_{A_k} - X_{A_{k-1}}(X_{A_{k-1}}^T X_{A_{k-1}})^{-1} s_{A_{k-1}}‖_2
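For the first step under the global null (σ = 1), both tests are functions of the first two knots alone, as in the roadmap earlier: the covariance p-value is the Exp(1) survival probability of λ_1(λ_1 - λ_2), and the spacing p-value is (1 - Φ(λ_1))/(1 - Φ(λ_2)). A small sketch of my own computing both:

```python
import numpy as np
from scipy.stats import norm

def first_step_pvalues(lam1, lam2):
    """Global null, sigma = 1: covariance-test and spacing-test p-values
    computed from the first two LAR knots lam1 >= lam2 >= 0."""
    T1 = lam1 * (lam1 - lam2)
    cov_p = np.exp(-T1)                       # survival of Exp(1) at T1
    spacing_p = norm.sf(lam1) / norm.sf(lam2) # (1 - Phi(lam1)) / (1 - Phi(lam2))
    return cov_p, spacing_p
```

The spacing p-value is exactly Unif(0, 1) under the null for any n, p, while the covariance p-value is only asymptotically valid; the two agree closely when the knots are large.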
Example: selection adjustment under the global null
[Figure: quantile plots of p-values against Unif(0, 1) quantiles at steps 1-4, for three procedures: the max-t test, selection-adjusted (exact) forward stepwise, and the spacing test.]
Example: back to prostate data
[Table: p-values for the eight predictors (lcavol, lweight, svi, lbph, pgg45, lcp, age, gleason) under four methods: naive stepwise, adjusted stepwise, LAR covariance test, and LAR spacing test; numeric values lost in transcription.]
Selection intervals
Interestingly, our conditional intervals also hold unconditionally. Consider ν = X_{A_k}(X_{A_k}^T X_{A_k})^{-1} e_k for concreteness; by averaging over all possible selections (A_l, s_{A_l}), l = 1, ..., k, we obtain
P(e_k^T (X_{Â_k}^T X_{Â_k})^{-1} X_{Â_k}^T θ ∈ [δ_{α/2}, δ_{1-α/2}]) = 1 - α.
We call this a selection interval; in contrast to a typical confidence interval, it tracks a moving target, here the (random) projected regression coefficient of the kth variable to enter. Roughly speaking, we can think of this interval as trapping the projected coefficient of the kth most important variable as deemed by the algorithm, with probability 1 - α.
Example: selection intervals for LAR
[Figure: realized selection intervals versus parameter values for predictors 1 and 2, and selection intervals for the first predictor entered at each LAR step.]
Application of these and related ideas to other problems
- Beyond polyhedra (Taylor, Loftus, Ryan Tibshirani 2014)
- Graphical models and clustering (G'Sell, Taylor, Tibshirani 2014)
- Many normal means (Reid, Taylor, Tibshirani 2014)
- Lasso with fixed λ (Lee, Sun, Sun, Taylor 2014)
- Marginal screening (Sun, Taylor 2014)
- PCA (Choi, Taylor, Tibshirani, in preparation)
PCA example
[Figure: scree plots of singular values for a matrix of true rank 2, at SNR = 1.5 and SNR = 0.23.]
(p-values for the right panel: 0.030, 0.064, 0.222, 0.286, 0.197, 0.831, 0.510, 0.185, ...)
Generalization to principal component analysis
Model Y = B + ε, and we want to test rank(B) ≤ k. The test is based on the singular values of the Wishart matrix Y^T Y. The largest singular value has a Tracy-Widom distribution asymptotically (Johnstone 2001), and this can be used to construct a test of the global null rank(B) = 0; it can be applied sequentially for other ranks (Kritchman and Nadler, 2008). We derive the conditional distribution of each singular value λ_k, conditional on λ_{k+1}, ..., λ_p, and use this to obtain an exact test.
[Figure: quantile plots of p-values at steps 1-4, for true ranks 0, 1, and 2, comparing the exact conditional method with Kritchman and Nadler's sequential method.]
Final comments
- Data analysts need tools for inference after selection, and these are now becoming available
- But much work is left to be done (power, robustness, computation, etc.)
- R language software is on its way!
Questions
- Larry: How do you estimate σ²? Estimating σ² is harder than estimating the signal!
- Ale: What's the Betti number of the Rips complex of the selection set under Hausdorff paracompactness?
- Jay: Is there a Bayesian analog?
- Jim Ramsey: Assuming normality is bogus. Nature just gives you a bunch of numbers
- Ryan: Why can't we just apply trend filtering?
Questions
- Larry: You are doing inference for ν^T θ, but your ν is random. How do you do inference for E(ν)^T θ?
- Ale: How does this work for log-linear models?
- Jing: How does this work for sparse PCA?
- Max: How dependent are the p-values? Could this work with ForwardStop?
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More informationRegularization Paths
December 2005 Trevor Hastie, Stanford Statistics 1 Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Saharon Rosset, Ji Zhu, Hui Zhou, Rob Tibshirani and
More informationSampling Distributions
Merlise Clyde Duke University September 8, 2016 Outline Topics Normal Theory Chi-squared Distributions Student t Distributions Readings: Christensen Apendix C, Chapter 1-2 Prostate Example > library(lasso2);
More informationCovariance test Selective inference. Selective inference. Patrick Breheny. April 18. Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/20
Patrick Breheny April 18 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/20 Introduction In our final lecture on inferential approaches for penalized regression, we will discuss two rather
More informationSTAT 462-Computational Data Analysis
STAT 462-Computational Data Analysis Chapter 5- Part 2 Nasser Sadeghkhani a.sadeghkhani@queensu.ca October 2017 1 / 27 Outline Shrinkage Methods 1. Ridge Regression 2. Lasso Dimension Reduction Methods
More informationA significance test for the lasso
A significance test for the lasso Richard Lockhart 1 Jonathan Taylor 2 Ryan J. Tibshirani 3 Robert Tibshirani 2 arxiv:1301.7161v1 [math.st] 30 Jan 2013 1 Simon Fraser University, 2 Stanford University,
More informationDISCUSSION OF A SIGNIFICANCE TEST FOR THE LASSO. By Peter Bühlmann, Lukas Meier and Sara van de Geer ETH Zürich
Submitted to the Annals of Statistics DISCUSSION OF A SIGNIFICANCE TEST FOR THE LASSO By Peter Bühlmann, Lukas Meier and Sara van de Geer ETH Zürich We congratulate Richard Lockhart, Jonathan Taylor, Ryan
More informationUniform Asymptotic Inference and the Bootstrap After Model Selection
Uniform Asymptotic Inference and the Bootstrap After Model Selection Ryan Tibshirani, Alessandro Rinaldo, Rob Tibshirani, and Larry Wasserman Carnegie Mellon University and Stanford University arxiv:1506.06266v2
More informationA Significance Test for the Lasso
A Significance Test for the Lasso Richard Lockhart 1 Jonathan Taylor 2 Ryan J. Tibshirani 3 Robert Tibshirani 2 1 Simon Fraser University, 2 Stanford University, 3 Carnegie Mellon University Abstract In
More informationarxiv: v3 [stat.me] 31 May 2015
Submitted to the Annals of Statistics arxiv: arxiv:1410.8260 SELECTING THE NUMBER OF PRINCIPAL COMPONENTS: ESTIMATION OF THE TRUE RANK OF A NOIS MATRIX arxiv:1410.8260v3 [stat.me] 31 May 2015 By unjin
More informationIntroduction to Statistics and R
Introduction to Statistics and R Mayo-Illinois Computational Genomics Workshop (2018) Ruoqing Zhu, Ph.D. Department of Statistics, UIUC rqzhu@illinois.edu June 18, 2018 Abstract This document is a supplimentary
More informationRegularization Paths. Theme
June 00 Trevor Hastie, Stanford Statistics June 00 Trevor Hastie, Stanford Statistics Theme Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Mee-Young Park,
More informationLecture 8: Fitting Data Statistical Computing, Wednesday October 7, 2015
Lecture 8: Fitting Data Statistical Computing, 36-350 Wednesday October 7, 2015 In previous episodes Loading and saving data sets in R format Loading and saving data sets in other structured formats Intro
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. October 7, Efficiency: If size(w) = 100B, each prediction is expensive:
Simple Variable Selection LASSO: Sparse Regression Machine Learning CSE546 Carlos Guestrin University of Washington October 7, 2013 1 Sparsity Vector w is sparse, if many entries are zero: Very useful
More informationSTAT 518 Intro Student Presentation
STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible
More informationThe lasso: some novel algorithms and applications
1 The lasso: some novel algorithms and applications Robert Tibshirani Stanford University ASA Bay Area chapter meeting Collaborations with Trevor Hastie, Jerome Friedman, Ryan Tibshirani, Daniela Witten,
More informationThe lasso: some novel algorithms and applications
1 The lasso: some novel algorithms and applications Newton Institute, June 25, 2008 Robert Tibshirani Stanford University Collaborations with Trevor Hastie, Jerome Friedman, Holger Hoefling, Gen Nowak,
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationIdentify Relative importance of covariates in Bayesian lasso quantile regression via new algorithm in statistical program R
Identify Relative importance of covariates in Bayesian lasso quantile regression via new algorithm in statistical program R Fadel Hamid Hadi Alhusseini Department of Statistics and Informatics, University
More informationLecture 5: Soft-Thresholding and Lasso
High Dimensional Data and Statistical Learning Lecture 5: Soft-Thresholding and Lasso Weixing Song Department of Statistics Kansas State University Weixing Song STAT 905 October 23, 2014 1/54 Outline Penalized
More informationCOMP 551 Applied Machine Learning Lecture 2: Linear regression
COMP 551 Applied Machine Learning Lecture 2: Linear regression Instructor: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise noted, all material posted for this
More informationCS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning
CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning Professor Erik Sudderth Brown University Computer Science October 4, 2016 Some figures and materials courtesy
More informationPackage covtest. R topics documented:
Package covtest February 19, 2015 Title Computes covariance test for adaptive linear modelling Version 1.02 Depends lars,glmnet,glmpath (>= 0.97),MASS Author Richard Lockhart, Jon Taylor, Ryan Tibshirani,
More informationCOMP 551 Applied Machine Learning Lecture 2: Linear Regression
COMP 551 Applied Machine Learning Lecture 2: Linear Regression Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise
More informationCS 340 Lec. 15: Linear Regression
CS 340 Lec. 15: Linear Regression AD February 2011 AD () February 2011 1 / 31 Regression Assume you are given some training data { x i, y i } N where x i R d and y i R c. Given an input test data x, you
More informationBias-free Sparse Regression with Guaranteed Consistency
Bias-free Sparse Regression with Guaranteed Consistency Wotao Yin (UCLA Math) joint with: Stanley Osher, Ming Yan (UCLA) Feng Ruan, Jiechao Xiong, Yuan Yao (Peking U) UC Riverside, STATS Department March
More informationSparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28
Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:
More informationPeter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8
Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall
More informationAdvanced Introduction to Machine Learning CMU-10715
Advanced Introduction to Machine Learning CMU-10715 Gaussian Processes Barnabás Póczos http://www.gaussianprocess.org/ 2 Some of these slides in the intro are taken from D. Lizotte, R. Parr, C. Guesterin
More informationLecture 4: Newton s method and gradient descent
Lecture 4: Newton s method and gradient descent Newton s method Functional iteration Fitting linear regression Fitting logistic regression Prof. Yao Xie, ISyE 6416, Computational Statistics, Georgia Tech
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationRegression Shrinkage and Selection via the Elastic Net, with Applications to Microarrays
Regression Shrinkage and Selection via the Elastic Net, with Applications to Microarrays Hui Zou and Trevor Hastie Department of Statistics, Stanford University December 5, 2003 Abstract We propose the
More informationCross-Validation with Confidence
Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation
More informationExact Post Model Selection Inference for Marginal Screening
Exact Post Model Selection Inference for Marginal Screening Jason D. Lee Computational and Mathematical Engineering Stanford University Stanford, CA 94305 jdl17@stanford.edu Jonathan E. Taylor Department
More informationDISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania
Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the
More informationThe lasso, persistence, and cross-validation
The lasso, persistence, and cross-validation Daniel J. McDonald Department of Statistics Indiana University http://www.stat.cmu.edu/ danielmc Joint work with: Darren Homrighausen Colorado State University
More informationSome Curiosities Arising in Objective Bayesian Analysis
. Some Curiosities Arising in Objective Bayesian Analysis Jim Berger Duke University Statistical and Applied Mathematical Institute Yale University May 15, 2009 1 Three vignettes related to John s work
More informationConsistent high-dimensional Bayesian variable selection via penalized credible regions
Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable
More informationLecture 17 May 11, 2018
Stats 300C: Theory of Statistics Spring 2018 Lecture 17 May 11, 2018 Prof. Emmanuel Candes Scribe: Emmanuel Candes and Zhimei Ren) 1 Outline Agenda: Topics in selective inference 1. Inference After Model
More informationCovariate-Assisted Variable Ranking
Covariate-Assisted Variable Ranking Tracy Ke Department of Statistics Harvard University WHOA-PSI@St. Louis, Sep. 8, 2018 1/18 Sparse linear regression Y = X β + z, X R n,p, z N(0, σ 2 I n ) Signals (nonzero
More informationLinear regression methods
Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response
More informationBayesian Inference. Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School
More informationControlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method
Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Christopher R. Genovese Department of Statistics Carnegie Mellon University joint work with Larry Wasserman
More informationAdaptive Piecewise Polynomial Estimation via Trend Filtering
Adaptive Piecewise Polynomial Estimation via Trend Filtering Liubo Li, ShanShan Tu The Ohio State University li.2201@osu.edu, tu.162@osu.edu October 1, 2015 Liubo Li, ShanShan Tu (OSU) Trend Filtering
More informationConfidence Intervals for Low-dimensional Parameters with High-dimensional Data
Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology
More informationarxiv: v2 [math.st] 9 Feb 2017
Submitted to Biometrika Selective inference with unknown variance via the square-root LASSO arxiv:1504.08031v2 [math.st] 9 Feb 2017 1. Introduction Xiaoying Tian, and Joshua R. Loftus, and Jonathan E.
More informationCOMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)
COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless
More informationIntroduction to the genlasso package
Introduction to the genlasso package Taylor B. Arnold, Ryan Tibshirani Abstract We present a short tutorial and introduction to using the R package genlasso, which is used for computing the solution path
More informationModel Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model
Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population
More informationarxiv: v2 [stat.me] 13 Mar 2015
arxiv:1503.00334v2 [stat.me] 13 Mar 2015 Sparse regression and marginal testing using cluster prototypes Stephen Reid and Robert Tibshirani Abstract We propose a new approach for sparse regression and
More informationHigh-dimensional Ordinary Least-squares Projection for Screening Variables
1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor
More informationClassification. The goal: map from input X to a label Y. Y has a discrete set of possible values. We focused on binary Y (values 0 or 1).
Regression and PCA Classification The goal: map from input X to a label Y. Y has a discrete set of possible values We focused on binary Y (values 0 or 1). But we also discussed larger number of classes
More informationBayesian Inference. Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative
More informationSpatial Lasso with Application to GIS Model Selection. F. Jay Breidt Colorado State University
Spatial Lasso with Application to GIS Model Selection F. Jay Breidt Colorado State University with Hsin-Cheng Huang, Nan-Jung Hsu, and Dave Theobald September 25 The work reported here was developed under
More informationCross-Validation with Confidence
Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really
More informationStatistics Ph.D. Qualifying Exam
Department of Statistics Carnegie Mellon University May 7 2008 Statistics Ph.D. Qualifying Exam You are not expected to solve all five problems. Complete solutions to few problems will be preferred to
More informationHigh-dimensional statistics and data analysis Course Part I
and data analysis Course Part I 3 - Computation of p-values in high-dimensional regression Jérémie Bigot Institut de Mathématiques de Bordeaux - Université de Bordeaux Master MAS-MSS, Université de Bordeaux,
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationBayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling
Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation
More informationThe linear model is the most fundamental of all serious statistical models encompassing:
Linear Regression Models: A Bayesian perspective Ingredients of a linear model include an n 1 response vector y = (y 1,..., y n ) T and an n p design matrix (e.g. including regressors) X = [x 1,..., x
More informationDistribution-Free Predictive Inference for Regression
Distribution-Free Predictive Inference for Regression Jing Lei, Max G Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman Department of Statistics, Carnegie Mellon University Abstract We
More informationHigh-dimensional regression
High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and
More informationOutline of GLMs. Definitions
Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density
More informationAsymptotic distribution of the largest eigenvalue with application to genetic data
Asymptotic distribution of the largest eigenvalue with application to genetic data Chong Wu University of Minnesota September 30, 2016 T32 Journal Club Chong Wu 1 / 25 Table of Contents 1 Background Gene-gene
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationChapter 6. Ensemble Methods
Chapter 6. Ensemble Methods Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Introduction
More informationCompressed Sensing in Cancer Biology? (A Work in Progress)
Compressed Sensing in Cancer Biology? (A Work in Progress) M. Vidyasagar FRS Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar University
More informationSelective Inference for Effect Modification: An Empirical Investigation
Observational Studies () Submitted ; Published Selective Inference for Effect Modification: An Empirical Investigation Qingyuan Zhao Department of Statistics The Wharton School, University of Pennsylvania
More informationBiostatistics Advanced Methods in Biostatistics IV
Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results
More informationarxiv: v1 [math.st] 27 May 2014
The Annals of Statistics 2014, Vol. 42, No. 2, 518 531 DOI: 10.1214/14-AOS1175REJ Main article DOI: 10.1214/13-AOS1175 c Institute of Mathematical Statistics, 2014 REJOINDER: A SIGNIFICANCE TEST FOR THE
More informationNearest Neighbor Gaussian Processes for Large Spatial Data
Nearest Neighbor Gaussian Processes for Large Spatial Data Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns
More informationLinear Regression. Machine Learning Seyoung Kim. Many of these slides are derived from Tom Mitchell. Thanks!
Linear Regression Machine Learning 10-601 Seyoung Kim Many of these slides are derived from Tom Mitchell. Thanks! Regression So far, we ve been interested in learning P(Y X) where Y has discrete values
More informationInference on distributions and quantiles using a finite-sample Dirichlet process
Dirichlet IDEAL Theory/methods Simulations Inference on distributions and quantiles using a finite-sample Dirichlet process David M. Kaplan University of Missouri Matt Goldman UC San Diego Midwest Econometrics
More informationRobust Testing and Variable Selection for High-Dimensional Time Series
Robust Testing and Variable Selection for High-Dimensional Time Series Ruey S. Tsay Booth School of Business, University of Chicago May, 2017 Ruey S. Tsay HTS 1 / 36 Outline 1 Focus on high-dimensional
More informationMallows Cp for Out-of-sample Prediction
Mallows Cp for Out-of-sample Prediction Lawrence D. Brown Statistics Department, Wharton School, University of Pennsylvania lbrown@wharton.upenn.edu WHOA-PSI conference, St. Louis, Oct 1, 2016 Joint work
More informationProblem Set 2 Solution Sketches Time Series Analysis Spring 2010
Problem Set 2 Solution Sketches Time Series Analysis Spring 2010 Forecasting 1. Let X and Y be two random variables such that E(X 2 ) < and E(Y 2 )
More informationShrinkage Methods: Ridge and Lasso
Shrinkage Methods: Ridge and Lasso Jonathan Hersh 1 Chapman University, Argyros School of Business hersh@chapman.edu February 27, 2019 J.Hersh (Chapman) Ridge & Lasso February 27, 2019 1 / 43 1 Intro and
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationConsistent Selection of Tuning Parameters via Variable Selection Stability
Journal of Machine Learning Research 14 2013 3419-3440 Submitted 8/12; Revised 7/13; Published 11/13 Consistent Selection of Tuning Parameters via Variable Selection Stability Wei Sun Department of Statistics
More informationSelective Inference for Effect Modification
Inference for Modification (Joint work with Dylan Small and Ashkan Ertefaie) Department of Statistics, University of Pennsylvania May 24, ACIC 2017 Manuscript and slides are available at http://www-stat.wharton.upenn.edu/~qyzhao/.
More information