Post-selection Inference for Forward Stepwise and Least Angle Regression


1 Post-selection Inference for Forward Stepwise and Least Angle Regression. Ryan & Rob Tibshirani, Carnegie Mellon University & Stanford University. Joint work with Jonathan Taylor, Richard Lockhart. September / 45

2 Matching results from picadilo.com: Ryan Tibshirani, CMU (PhD student of Taylor, 2011); Rob Tibshirani, Stanford. 2 / 45

3 Top matches from picadilo.com (81%, 71%): Ryan Tibshirani, CMU; Rob Tibshirani, Stanford. 3 / 45

4 [Photos with match scores 81%, 71%, 69%] Ryan Tibshirani, CMU; Rob Tibshirani, Stanford. 4 / 45

5 Conclusion: Confidence, the strength of evidence, matters! 5 / 45

6 Outline: Setup and basic question; Quick review of least angle regression and the covariance test; A new framework for inference after selection; Application to forward stepwise and least angle regression; Application of these and related ideas to other problems. 6 / 45

7 Setup and basic question. Given an outcome vector y ∈ R^n and a predictor matrix X ∈ R^{n×p}, we consider the usual linear regression setup: y = Xβ + σε, where β ∈ R^p are the unknown coefficients to be estimated, and the components of the noise vector ε ∈ R^n are i.i.d. N(0, 1). Main question: if we apply least angle or forward stepwise regression, how can we compute valid p-values and confidence intervals? 7 / 45

8 Forward stepwise regression. This procedure enters predictors one at a time, choosing the predictor that most decreases the residual sum of squares at each stage. Defining RSS to be the residual sum of squares for the model containing k predictors, and RSS_null the residual sum of squares before the kth predictor was added, we can form the usual statistic R_k = (RSS_null − RSS)/σ² (with σ assumed known), and compare it to a χ²₁ distribution. 8 / 45
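To make the naive chi-squared comparison concrete, here is a minimal simulation sketch (Python, with made-up data standing in for a real design) of the first forward-stepwise step: fit each single predictor, keep the one with the largest drop in RSS, and compare R_1 = (RSS_null − RSS)/σ² to χ²₁. The function name and the settings n = 100, p = 10 are illustrative, chosen to mirror the simulated example on the next slide; this is not the authors' code.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, p, sigma = 100, 10, 1.0          # global null: all true coefficients are zero

def naive_first_step_pvalue(X, y, sigma):
    """Naive chi-squared p-value for the first forward-stepwise step."""
    rss_null = np.sum((y - y.mean()) ** 2)
    rss_best = np.inf
    for j in range(X.shape[1]):     # best single-predictor fit by least squares
        Xj = np.column_stack([np.ones(len(y)), X[:, j]])
        resid = y - Xj @ np.linalg.lstsq(Xj, y, rcond=None)[0]
        rss_best = min(rss_best, np.sum(resid ** 2))
    R1 = (rss_null - rss_best) / sigma ** 2
    return chi2.sf(R1, df=1)        # compare the drop in RSS to chi^2_1

# Under the global null the naive test is far too liberal: the empirical
# type I error at the 5% level is well above 5% (about 39% on the next slide).
pvals = [naive_first_step_pvalue(rng.standard_normal((n, p)),
                                 sigma * rng.standard_normal(n), sigma)
         for _ in range(2000)]
print("empirical type I error at 5%:", np.mean(np.array(pvals) < 0.05))
```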

9 Simulated example: naive forward stepwise. Setup: n = 100, p = 10, true model null. [Figure: observed test statistics against the χ²₁ distribution.] The test is too liberal: for nominal size 5%, the actual type I error is 39%. (Yes, Larry, one can get proper p-values by sample splitting: but it is messy, with a loss of power.) 9 / 45

10 Quick review of LAR and the covariance test. Least angle regression, or LAR, is a method for constructing the path of solutions for the lasso: min over β₀, β of Σ_i (y_i − β₀ − Σ_j x_ij β_j)² + λ Σ_j |β_j|. LAR is a more democratic version of forward stepwise regression: find the predictor most correlated with the outcome; move the parameter vector in the least squares direction until some other predictor has as much correlation with the current residual; this new predictor is added to the active set, and the procedure is repeated. Optional ("lasso mode"): if a non-zero coefficient hits zero, that predictor is dropped from the active set, and the process is restarted. 10 / 45
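As a quick, non-authoritative illustration of tracing this path in practice, scikit-learn's lars_path runs LAR (or its lasso mode) and returns the order in which variables enter along with the breakpoints of the path. Everything below (design size, true coefficients) is made up for the example; note also that scikit-learn reports its alphas on a per-observation scale, so they match the LAR knots only up to a constant factor, which is worth checking against your installed version.

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)                 # unit-norm columns, as in LAR
beta = np.zeros(p)
beta[:2] = [4.0, -3.0]                         # two signal variables
y = X @ beta + rng.standard_normal(n)

# method="lasso" drops a predictor from the active set if its coefficient hits zero
alphas, active, coefs = lars_path(X, y, method="lasso")
print("order of entry:", active)               # indices of variables as they become active
print("path breakpoints (sklearn scale):", np.round(alphas[:4], 4))
print("coefficients at the end of the path:", np.round(coefs[:, -1], 3))
```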

11 Least angle regression in a picture. [Figure: LAR coefficient paths plotted against log(λ), with the knots λ₁, λ₂, λ₃, λ₄, ... marked.] 11 / 45

12 The covariance test for LAR (Lockhart, Taylor, Ryan Tibshirani, Rob Tibshirani, discussion paper in Annals of Statistics, 2014). The covariance test provides a p-value for each variable as it is added to the lasso model via the LAR algorithm. In particular it tests the hypothesis H₀: A ⊇ supp(β*), where A is the running active set at the current step of LAR. 12 / 45

13 The covariance test for LAR. Suppose we want a p-value for predictor 2, entering at step 3. [Figure: LAR coefficient paths against log(λ), with knots λ₁, λ₂, λ₃, λ₄, ... marked.] 13 / 45

14 Compute the covariance at λ₄: ⟨y, Xβ̂(λ₄)⟩. [Figure: LAR coefficient paths against log(λ), with knots λ₁, λ₂, λ₃, λ₄, ... marked.] 14 / 45

15 Drop x₂, yielding active set A; refit at λ₄, and compute the resulting covariance at λ₄, giving T₃ = (⟨y, Xβ̂(λ₄)⟩ − ⟨y, X_A β̂_A(λ₄)⟩)/σ². [Figure: LAR coefficient paths against log(λ), with knots λ₁, λ₂, λ₃, λ₄, ... marked.] 15 / 45

16-17 Null distribution of the covariance statistic. Under the null hypothesis that all signal variables are in the model, H₀: A ⊇ supp(β*), the covariance statistic satisfies T_j = (⟨y, Xβ̂(λ_{j+1})⟩ − ⟨y, X_A β̂_A(λ_{j+1})⟩)/σ² → Exp(1) as n, p → ∞. Equivalent "knot form": T_j = c_j λ_j(λ_j − λ_{j+1})/σ², with c_j = 1 for the global null case (j = 1). 16 / 45
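A small simulation sketch of the knot form under the global null: for an orthogonal design the LAR knots are simply the sorted values of |X^T y|, so T_1 = λ_1(λ_1 − λ_2)/σ² is easy to compute and can be compared with Exp(1). The settings are illustrative, and the Exp(1) limit is asymptotic in p, so with p = 10 the agreement is only approximate.

```python
import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(2)
n, p, sigma, reps = 100, 10, 1.0, 5000
T1 = np.empty(reps)
for r in range(reps):
    # orthonormal design via QR; under the global null, y is pure noise
    X, _ = np.linalg.qr(rng.standard_normal((n, p)))
    y = sigma * rng.standard_normal(n)
    lam = np.sort(np.abs(X.T @ y))[::-1]        # LAR knots for an orthogonal design
    T1[r] = lam[0] * (lam[0] - lam[1]) / sigma ** 2
# compare a few simulated quantiles against the Exp(1) reference
qs = [0.50, 0.90, 0.95]
print("simulated quantiles:", np.round(np.quantile(T1, qs), 3))
print("Exp(1) quantiles:   ", np.round(expon.ppf(qs), 3))
```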

18 Simulated example: covariance test. Setup: n = 100, p = 10, true model null. [Figure: covariance test statistic against the Exp(1) distribution.] 17 / 45

19 Example: prostate cancer data. Data from a study of the level of prostate-specific antigen and p = 8 clinical measures, in n = 67 men about to receive a radical prostatectomy. [Table: p-values from naive forward stepwise and the LAR covariance test for lcavol, lweight, svi, lbph, pgg45, age, lcp, gleason.] 18 / 45

20-27 Shortcomings of the covariance test

The covariance test is highly intuitive and actually pretty broad: it can be extended to other sparse estimation problems [e.g., graphical models and clustering (G'Sell et al. 2014)]. But it has some definite weaknesses:

1. Places correlation restrictions on the predictors X (somewhat similar to standard conditions for exact model recovery)
2. Assumes linearity of the underlying model (i.e., y = Xβ + ε)
3. Significance statements are asymptotic
4. Hard to get confidence statements out of the covariance test statistic (not easily "pivotable")
5. Doesn't have a cool enough name

We'll discuss a new framework that overcomes 1-4, and especially 5. 19 / 45

28-34 What's coming next: a roadmap

1. General framework for inference after polyhedral selection
2. Application to forward stepwise regression and LAR
3. For LAR, we obtain the spacing test, which is exact in finite samples

Consider the global null, σ² = 1, and λ₁, λ₂ the first two LAR knots.
Covariance test: λ₁(λ₁ − λ₂) → Exp(1) as n, p → ∞
Spacing test: (1 − Φ(λ₁)) / (1 − Φ(λ₂)) ~ Unif(0, 1) for any n, p
(Intriguing connection: these are asymptotically equivalent) 20 / 45

35 Preview: prostate cancer data. [Table: naive versus selection-adjusted p-values for lcavol, lweight, svi, lbph, pgg45, lcp, age, gleason.] 21 / 45

36 A new framework for inference after selection 22 / 45

37-39 The polyhedral testing framework

Suppose we observe y ~ N(θ, Σ), with mean parameter θ unknown (but covariance Σ known). We wish to make inferences on ν^T θ, a linear contrast of the mean θ, conditional on y ∈ S, where S is a polyhedron S = {y : Γy ≥ u}. The vector ν = ν(S) is allowed to depend on S. E.g., we would like a p-value P(y, S, ν) that satisfies P_{ν^T θ = 0}(P(y, S, ν) ≤ α | Γy ≥ u) = α for any 0 ≤ α ≤ 1. 23 / 45

40 Fundamental result

Lemma (Polyhedral selection as truncation). For any ν ≠ 0, we have
{Γy ≥ u} = {V^- ≤ ν^T y ≤ V^+, V^0 ≥ 0},
where, with γ = ΓΣν / (ν^T Σν),
V^-(y) = max_{j: γ_j > 0} [u_j − (Γy)_j + γ_j ν^T y] / γ_j
V^+(y) = min_{j: γ_j < 0} [u_j − (Γy)_j + γ_j ν^T y] / γ_j
V^0(y) = min_{j: γ_j = 0} [(Γy)_j − u_j].
Moreover, the triplet (V^-, V^+, V^0)(y) is independent of ν^T y. 24 / 45
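A direct transcription of the lemma into code might look like the sketch below, assuming the polyhedron convention {Γy ≥ u} used above; the helper name is hypothetical. Given Γ, u, ν, Σ and a realization y, it returns the truncation limits for ν^T y, followed by a tiny two-dimensional check.

```python
import numpy as np

def truncation_limits(Gamma, u, nu, Sigma, y):
    """Return (V-, V+, V0) of the lemma for the polyhedron {y : Gamma @ y >= u}."""
    gamma = Gamma @ (Sigma @ nu) / (nu @ Sigma @ nu)     # gamma = Gamma Sigma nu / (nu^T Sigma nu)
    nty = nu @ y
    Gy = Gamma @ y
    with np.errstate(divide="ignore", invalid="ignore"):
        vals = (u - Gy + gamma * nty) / gamma            # only used where gamma != 0
    v_minus = np.max(vals[gamma > 0], initial=-np.inf)   # lower truncation limit
    v_plus = np.min(vals[gamma < 0], initial=np.inf)     # upper truncation limit
    v_zero = np.min((Gy - u)[gamma == 0], initial=np.inf)  # must be >= 0 on the event
    return v_minus, v_plus, v_zero

# check: condition on the event {y1 >= y2, y1 >= 0} and look at nu^T y = y1
Gamma = np.array([[1.0, -1.0], [1.0, 0.0]])
u = np.zeros(2)
nu = np.array([1.0, 0.0])
y = np.array([1.3, 0.4])
print(truncation_limits(Gamma, u, nu, np.eye(2), y))      # (0.4, inf, inf): y1 is truncated to [0.4, inf)
```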

41 Proof. [Figure: geometric picture of the polyhedron {Γy ≥ u}, the direction ν, the projection P_ν y, and the truncation limits V^-, V^+ for ν^T y.] 25 / 45

42-46 Using this result for conditional inference

In other words, the distribution of ν^T y | Γy ≥ u is the same as that of ν^T y | V^- ≤ ν^T y ≤ V^+, V^0 ≥ 0. This is a truncated Gaussian distribution (with random limits). How do we use this result for conditional inference on ν^T θ? Follow two main ideas:

1. For Z ~ N_{[a,b]}(μ, σ²), with F^{[a,b]}_{μ,σ²} its CDF, we have P(F^{[a,b]}_{μ,σ²}(Z) ≤ α) = α
2. Hence also P(F^{[V^-,V^+]}_{ν^T θ, ν^T Σν}(ν^T y) ≤ α | Γy ≥ u) = α

26 / 45

47-50 Therefore, to test H₀: ν^T θ = 0 against H₁: ν^T θ > 0, we can take as a conditional p-value
P = 1 − F^{[V^-,V^+]}_{0, ν^T Σν}(ν^T y),
since P_{ν^T θ = 0}(1 − F^{[V^-,V^+]}_{0, ν^T Σν}(ν^T y) ≤ α | Γy ≥ u) = α.
Furthermore, because the same statement holds for any fixed μ,
P_{ν^T θ = μ}(1 − F^{[V^-,V^+]}_{μ, ν^T Σν}(ν^T y) ≤ α | Γy ≥ u) = α,
we can compute a conditional confidence interval by inverting the pivot, yielding
P(ν^T θ ∈ [δ_{α/2}, δ_{1−α/2}] | Γy ≥ u) = 1 − α. 27 / 45
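A minimal sketch of the resulting pivot, assuming Σ = σ²I as in the regression setting, so that ν^T Σν = σ²‖ν‖²: the truncated-normal CDF F^{[a,b]}_{μ,σ²} is a ratio of standard normal CDFs, and the one-sided p-value is one minus the pivot evaluated at ν^T y with μ = 0. Far out in the tails the naive ratio of Φ's can lose precision; a production implementation would evaluate it more carefully, which this sketch does not attempt. Function names and the numbers in the example are illustrative.

```python
import numpy as np
from scipy.stats import norm

def truncated_normal_cdf(x, mu, sd, a, b):
    """F^{[a,b]}_{mu, sd^2}(x): CDF of N(mu, sd^2) truncated to the interval [a, b]."""
    num = norm.cdf((x - mu) / sd) - norm.cdf((a - mu) / sd)
    den = norm.cdf((b - mu) / sd) - norm.cdf((a - mu) / sd)
    return num / den

def selective_pvalue(nty, v_minus, v_plus, sd, mu=0.0):
    """One-sided p-value for H0: nu^T theta = mu against nu^T theta > mu."""
    return 1.0 - truncated_normal_cdf(nty, mu, sd, v_minus, v_plus)

# example: nu^T y = 2.1 observed, truncation region [1.5, inf), sd = sigma * ||nu||_2 = 1
print(selective_pvalue(2.1, 1.5, np.inf, 1.0))
```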

51-53 Illustration. [Figure: truncated normal density.] We observe random limits V^-, V^+ (in black), and a random point ν^T y (in red); we form the truncated normal density f^{[V^-,V^+]}_{0, ν^T Σν}, and the mass to the right of ν^T y is our p-value. 28 / 45

54 Application to forward stepwise and least angle regression 29 / 45

55-62 Sequential model selection as a polyhedral set

Suppose we run k steps of forward stepwise or LAR, and encounter the sequence of active sets A_l, l = 1, ..., k. We can express
{y : (Â_l(y), ŝ_{A_l}(y)) = (A_l, s_{A_l}), l = 1, ..., k} = {y : Γy ≥ 0}
for some matrix Γ. Here s_{A_l} gives the signs of the active coefficients at step l. This describes all vectors y such that the algorithm would make the same selections (variables and signs) over k steps.

Important points:
- For forward stepwise, Γ has 2pk rows
- For LAR, Γ has only k + 1 rows!
- Computation of V^-, V^+ is O(number of rows of Γ)

30 / 45
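As a concrete illustration of how a selection event becomes a polyhedron (a hand-rolled sketch, not the authors' implementation), here is Γ for a single step (k = 1) of forward stepwise with unit-norm columns: variable j1 entering with sign s1 means s1·x_{j1}^T y ≥ ±x_j^T y for every j, which is 2p linear inequalities of the form Γy ≥ 0, matching the 2pk row count quoted above.

```python
import numpy as np

def first_step_polyhedron(X, y):
    """Gamma such that {y : Gamma @ y >= 0} encodes the first forward-stepwise selection."""
    scores = X.T @ y
    j1 = int(np.argmax(np.abs(scores)))          # variable entering at step 1
    s1 = np.sign(scores[j1])                     # its sign
    rows = []
    for j in range(X.shape[1]):
        # winning condition: s1 * x_{j1}^T y >= + x_j^T y  and  s1 * x_{j1}^T y >= - x_j^T y
        rows.append(s1 * X[:, j1] - X[:, j])
        rows.append(s1 * X[:, j1] + X[:, j])
    return np.array(rows), j1, s1

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 5))
X /= np.linalg.norm(X, axis=0)
y = rng.standard_normal(50)
Gamma, j1, s1 = first_step_polyhedron(X, y)
print(Gamma.shape, "rows; selected variable", j1, "with sign", int(s1))
print("all constraints hold at the observed y:", bool(np.all(Gamma @ y >= -1e-12)))
```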

63-67 Inference for forward stepwise and LAR

Using our polyhedral framework, we can derive conditional p-values or confidence intervals for any linear contrast ν^T θ. Notes:
- Here ν can depend on the polyhedron, i.e., can depend on the selection A_l, s_{A_l}, l = 1, ..., k
- We place no restrictions on the predictors X (general position)
- Interesting case to keep in mind: ν = X_{A_k}(X_{A_k}^T X_{A_k})^{-1} e_k, so that ν^T θ = e_k^T (X_{A_k}^T X_{A_k})^{-1} X_{A_k}^T θ, which is the kth coefficient in the multiple regression of θ on the active variables A_k

31 / 45

68-71 Our conditional p-value for H₀: ν^T θ = 0 is
P = [Φ(V^+/(σ‖ν‖₂)) − Φ(ν^T y/(σ‖ν‖₂))] / [Φ(V^+/(σ‖ν‖₂)) − Φ(V^-/(σ‖ν‖₂))].
This has exact (!) conditional size:
P_{ν^T θ = 0}(P ≤ α | (Â_l(y), ŝ_{A_l}(y)) = (A_l, s_{A_l}), l = 1, ..., k) = α.
When ν = X_{A_k}(X_{A_k}^T X_{A_k})^{-1} e_k: this tests the significance of the projected regression coefficient of the last variable entered into the model, conditional on all selections that have been made so far. 32 / 45

72-75 Our conditional confidence interval is I = [δ_{α/2}, δ_{1−α/2}], where the endpoints are computed by inverting the truncated Gaussian pivot. This has exact (!) conditional coverage:
P(ν^T θ ∈ [δ_{α/2}, δ_{1−α/2}] | (Â_l(y), ŝ_{A_l}(y)) = (A_l, s_{A_l}), l = 1, ..., k) = 1 − α.
When ν = X_{A_k}(X_{A_k}^T X_{A_k})^{-1} e_k: this interval traps the projected regression coefficient of the last variable to enter with probability 1 − α, conditional on all selections made so far. 33 / 45
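The endpoints δ_{α/2}, δ_{1−α/2} come from inverting the truncated-normal pivot in μ; since the pivot is monotone decreasing in μ, a simple bisection is enough for a sketch. Everything here is illustrative: the coarse bracket of ±8 standard deviations keeps the naive ratio of normal CDFs numerically stable, and a production implementation would use more careful tail evaluations instead of clipping.

```python
import numpy as np
from scipy.stats import norm

def tn_cdf(x, mu, sd, a, b):
    """CDF of N(mu, sd^2) truncated to [a, b]."""
    z = lambda t: norm.cdf((t - mu) / sd)
    return (z(x) - z(a)) / (z(b) - z(a))

def selection_interval(nty, v_minus, v_plus, sd, alpha=0.10, width=8.0):
    """Invert the truncated-normal pivot in mu to get [delta_{alpha/2}, delta_{1-alpha/2}]."""
    def endpoint(target):
        lo, hi = nty - width * sd, nty + width * sd   # coarse but numerically safe bracket
        for _ in range(100):                          # bisection: tn_cdf is decreasing in mu
            mid = 0.5 * (lo + hi)
            if tn_cdf(nty, mid, sd, v_minus, v_plus) > target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)
    # lower endpoint solves pivot = 1 - alpha/2, upper endpoint solves pivot = alpha/2
    return endpoint(1 - alpha / 2), endpoint(alpha / 2)

# example: nu^T y = 2.1 observed, truncation region [1.5, inf), sd = sigma * ||nu||_2 = 1
print(selection_interval(2.1, 1.5, np.inf, 1.0, alpha=0.10))
```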

76-79 The spacing test

The spacing test is the name we give to our framework applied to LAR (... tentative paper title 2014: "A Spacing Odyssey"). Because Γ is especially simple for LAR, this test is easy and efficient to implement. The p-value for H₀: e_k^T (X_{A_k}^T X_{A_k})^{-1} X_{A_k}^T θ = 0 is
P = [Φ(V^+/(σ‖ω_k‖₂)) − Φ(λ_k/(σ‖ω_k‖₂))] / [Φ(V^+/(σ‖ω_k‖₂)) − Φ(V^-/(σ‖ω_k‖₂))],
where λ_k is the kth LAR knot, and ω_k = X_{A_k}(X_{A_k}^T X_{A_k})^{-1} s_{A_k} − X_{A_{k−1}}(X_{A_{k−1}}^T X_{A_{k−1}})^{-1} s_{A_{k−1}}. 34 / 45
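To see the finite-sample claim in action under the global null, here is a small simulation sketch with σ = 1 and an orthogonal design, so that the LAR knots are simply the sorted values of |X^T y|: the step-one spacing p-value (1 − Φ(λ_1))/(1 − Φ(λ_2)) should be exactly Unif(0, 1) for any n and p. The settings are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n, p, reps = 50, 8, 5000
pvals = np.empty(reps)
for r in range(reps):
    # orthonormal design; under the global null, y is pure N(0, I) noise with sigma = 1
    X, _ = np.linalg.qr(rng.standard_normal((n, p)))
    y = rng.standard_normal(n)
    lam = np.sort(np.abs(X.T @ y))[::-1]          # LAR knots for an orthogonal design
    pvals[r] = norm.sf(lam[0]) / norm.sf(lam[1])  # (1 - Phi(lambda_1)) / (1 - Phi(lambda_2))
# exact uniformity: the empirical CDF should sit on the diagonal
for q in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(q, round(float(np.mean(pvals <= q)), 3))
```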

80 Example: selection adjustment under the global null. [Figure: QQ plots of p-values against expected uniform quantiles at steps 1-4, for the max-t statistic, the selection-adjusted (exact) forward stepwise test, and the spacing test.] 35 / 45

81 Example: back to prostate data. [Table: p-values for lcavol, lweight, svi, lbph, pgg45, lcp, age, gleason, comparing naive and adjusted forward stepwise, and the LAR covariance test versus the LAR spacing test.] 36 / 45

82-88 Selection intervals

Interestingly, our conditional intervals also hold unconditionally. Consider ν = X_{A_k}(X_{A_k}^T X_{A_k})^{-1} e_k for concreteness; by averaging over all possible selections (A_l, s_{A_l}), l = 1, ..., k, we obtain
P(e_k^T (X_{Â_k}^T X_{Â_k})^{-1} X_{Â_k}^T θ ∈ [δ_{α/2}, δ_{1−α/2}]) = 1 − α.
We call this a selection interval; in contrast to a typical confidence interval, it tracks a moving target, here the (random) projected regression coefficient of the kth variable to enter. Roughly speaking, we can think of this interval as trapping the projected coefficient of the kth most important variable as deemed by the algorithm, with probability 1 − α. 37 / 45

89 Example: selection intervals for LAR. [Figures: selection intervals for predictor 1 and for predictor 2 (parameter value, by realization), and selection intervals for the first predictor entered, at each LAR step (parameter value, by LAR step).] 38 / 45

90 Application of these and related ideas to other problems: Beyond polyhedra (Taylor, Loftus, Ryan Tibs 2014); Graphical models and clustering (G'Sell, Taylor, Tibs 2014); Many normal means (Reid, Taylor, Tibs 2014); Lasso with fixed λ (Lee, Sun, Sun, Taylor 2014); Marginal screening (Sun, Taylor 2014); PCA (Choi, Taylor, Tibs, in preparation). 39 / 45

91 PCA example. [Figure: scree plots of singular values against index for a matrix of (true) rank = 2, at SNR = 1.5 and SNR = 0.23.] (p-values for the right panel: 0.030, 0.064, 0.222, 0.286, 0.197, 0.831, 0.510, 0.185, ...) 40 / 45

92 Generalization to principal component analysis. Model Y = B + ε, and we want to test rank(B) ≤ k. The test is based on the singular values of the Wishart matrix Y^T Y. The largest singular value has a Tracy-Widom distribution asymptotically (Johnstone 2001), and this can be used to construct a test of the global null rank(B) = 0. This can be applied sequentially for other ranks (Kritchman and Nadler, 2008). We derive the conditional distribution of each singular value λ_k, conditional on λ_{k+1}, ..., λ_p, and use this to obtain an exact test. 41 / 45

93 [Figure: QQ plots of p-values against uniform quantiles (qs), for true rank 0, 1, and 2 and steps 1-4, comparing Jon's method (integrated out) with Nadler's.] 42 / 45

94 Final comments. Data analysts need tools for inference after selection, and these are now becoming available. But much work is left to be done (power, robustness, computation, etc.). R language software is on its way! 43 / 45

95 Questions. Larry: How do you estimate σ²? Estimating σ² is harder than estimating the signal! Ale: What's the Betti number of the Rips complex of the selection set under Hausdorff paracompactness? Jay: Is there a Bayesian analog? Jim Ramsey: Assuming normality is bogus. Nature just gives you a bunch of numbers. Ryan: Why can't we just apply trend filtering? 44 / 45

96 Questions. Larry: You are doing inference for ν^T θ but your ν is random. How do you do inference for E(ν)^T θ? Ale: How does this work for log-linear models? Jing: How does this work for sparse PCA? Max: How dependent are the p-values? Could this work with ForwardStop? 45 / 45
