Post-selection Inference for Changepoint Detection


1 Post-selection Inference for Changepoint Detection. Sangwon Hyun (Justin), Dept. of Statistics. Advisors: Max G'Sell, Ryan Tibshirani. Committee: Will Fithian (UC Berkeley), Alessandro Rinaldo, Kathryn Roeder, Larry Wasserman.

2 Outline Introduction to post-selection inference ideas. Motivating applications: copy number variation (CNV) analysis; simultaneous changepoint detection. Inference for changepoint algorithms: binary segmentation (work in progress); generalized lasso (2016, submitted). Examples of analysis. Post-selection goodness-of-fit tests. Future work.

3 Introduction: What is post-selection inference?

4-6 Motivation: an example Say you had noisy data generated around an underlying mean, and some changepoint model fit to the same (grey) data. [Figure: data, underlying mean, estimate, and changepoints A and B] Question: how do we infer about the changepoint at A, or at B?

7 Motivation: an example Not-quite-right answer: classical hypothesis tests? These are invalid because (1) A and B are not a priori fixed quantities, and, more specifically, (2) A and B were chosen to (roughly) maximize this gap.

8 Motivation: an example [Figure: naive p-values vs. TG p-values at changepoints A and B]

9-13 What is post-selection inference? First (classical) analyst: 1. Devise a model. 2. Collect data. 3. Test hypotheses. Second (common) analyst: 1. Collect data. 2. Select a model from the data. 3. Form hypotheses and tests. For the second analyst, classical tests are not valid (generally too optimistic). But why? Plainly put: fit-maximizing model quantities are often already optimistic w.r.t. alternative hypotheses. More specifically: classical inference considers an a priori fixed hypothesis to be tested, not a random (adaptively specified) one.

14 Post-selection inference literature In order of relatedness: Sequential regression procedures (covering FS, LAR, lasso): Tibshirani, Taylor, Lockhart, Tibshirani (2014). Lasso at a fixed λ value: Lee, Sun, Sun, Taylor (2014). Sequential goodness-of-fit tests: Fithian, Taylor, Tibshirani, Tibshirani (2014). Other applications: marginal screening, many normal means, grouped stepwise regression, principal component analysis, ℓ1-penalized likelihood models. Asymptotics for non-Gaussian errors: discussed in Tian, Taylor (2015a) and Tibshirani, Rinaldo, Tibshirani, Wasserman (2016).

15 Motivating applications Copy number variation, and biological molecular motors.

16 Example: DNA copy number data DNA copy number is the number of copies of DNA in a genomic region within the genome of a sample, relative to one or more control samples. Goals: detect genomic aberration (gain/loss) regions between two or more individuals; gain a better understanding of the role of DNA copy number changes in human disease; enable early diagnosis of diseases (cancer, prenatal diseases).

17 Example: DNA copy number data [Figure: aCGH data and estimates] Data obtained from the array Comparative Genomic Hybridization (aCGH) technique: tumor vs. reference genomic DNAs are processed, then fluorescence ratios (log2 ratios of count-like measurements) are collected along the length of the chromosomes. Developed to survey DNA copy-number variations across a whole genome, at high resolution.

18 Example: Probing the motion of biological molecular motors.

19 Example: Probing the motion of biological molecular motors. Two potential questions of interest for collaborators: After an algorithm is applied, how do you infer about a changepoint î in the fitted model? H₀: a mean shift does not occur at î in the signal. After a model size is selected, how do you assess the quality of the recovered locations of slope change (knots)? H₀: Ê_k includes all true knots. Example of analysis: use isotonic trend filtering (for piecewise-linear signal recovery), then condition on this selection event to conduct appropriate tests using the post-selection inference machinery.

20 Possible other examples Financial time series: post-selection tests in autoregressive data models. Images/spatial scans: scan statistics on images. Neuroscience data: possible application of post-selection scan statistics.

21 Methods: Binary Segmentation, Generalized Lasso, and Goodness-of-Fit Tests

22 Data model Let's assume a Gaussian data model: Y = (Y₁, …, Y_n)ᵀ ∼ N(θ, Σ), where Σ ∈ Rⁿˣⁿ is the covariance matrix. Make it even simpler: treat the covariance matrix as Σ = σ²I_n. In the end, we discuss departures from the i.i.d. Gaussian model.
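
A two-line simulation of this data model (a sketch; numpy assumed, with an illustrative two-segment mean θ), used by the later code sketches:

import numpy as np

rng = np.random.default_rng(0)
n, sigma = 100, 1.0
theta = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])  # illustrative piecewise-constant mean
y = theta + sigma * rng.standard_normal(n)                   # Y ~ N(theta, sigma^2 I_n)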

23 Standard Binary segmentation Standard binary segmentation (SBS)¹ recursively splits to find the best break location, according to the so-called CUSUM statistic: Ỹ_b^{s,e} = (Ȳ_R − Ȳ_L) / √(1/|L| + 1/|R|), with L = (s+1):b and R = (b+1):e. (1) Assuming constant variance σ², this is basically the variance-stabilized mean difference: for T = Ȳ_R − Ȳ_L, Ỹ_b = σ · T / √(Var[T]), which is N(0, σ²) under i.i.d. Gaussian noise. I.e., the CUSUM's scale is globally meaningful at any step of the recursive algorithm. ¹Well studied in the literature; see, e.g., Fryzlewicz.

24 Standard Binary segmentation
Algorithm 1 Standard binary segmentation
1: B ← ∅
2: function StandardBinSeg(s, e, ζ, Y)
3:   if e − s < 1 then
4:     Stop
5:   else
6:     b₀ := argmax_{b ∈ {s,…,e−1}} |Ỹ_b^{s,e}|
7:     if |Ỹ_{b₀}^{s,e}| > ζ then
8:       Add b₀ to the collection of changepoints B
9:       StandardBinSeg(s, b₀, ζ, Y)
10:      StandardBinSeg(b₀ + 1, e, ζ, Y)
11:    else
12:      Stop
13:    end if
14:  end if
15: end function
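
A small, self-contained Python sketch of Algorithm 1 (my translation, using 0-indexed half-open intervals y[s:e); not the authors' code):

import numpy as np

def cusum(y, s, e, b):
    # CUSUM statistic (1) on y[s:e): left = y[s:b+1], right = y[b+1:e)
    L, R = y[s:b + 1], y[b + 1:e]
    return (R.mean() - L.mean()) / np.sqrt(1.0 / len(L) + 1.0 / len(R))

def standard_binseg(y, s, e, zeta, B):
    # Recursively split y[s:e), keeping breaks whose |CUSUM| exceeds zeta
    if e - s < 2:   # need at least two points to place a break
        return
    stats = np.array([cusum(y, s, e, b) for b in range(s, e - 1)])
    b0 = s + int(np.argmax(np.abs(stats)))
    if np.abs(stats[b0 - s]) > zeta:
        B.append(b0)
        standard_binseg(y, s, b0 + 1, zeta, B)
        standard_binseg(y, b0 + 1, e, zeta, B)

B = []
y = np.concatenate([np.zeros(50), np.ones(50)]) + 0.5 * np.random.default_rng(1).standard_normal(100)
standard_binseg(y, 0, len(y), zeta=3.0, B=B)
print(sorted(B))  # detected break indices (each is the last index of its left segment)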

25-33 Standard Binary segmentation [Figure sequence: successive recursive splits, showing the breaks, the CUSUM statistics, and the running estimate at each step]

34-37 Standard Binary segmentation Q: Why is a selection event polyhedral? A: CUSUM comparisons are all halfspaces in y. At some step of the algorithm, selecting breakpoint b₀ ∈ {s,…,e−1} is equivalent to the CUSUM being maximized at b₀: |Ỹ_{b₀}| ≥ |Ỹ_b| for all s ≤ b ≤ e−1, and each such comparison can be written as gᵀY ≤ 0. A: The sign of the CUSUM, s_b = sign(Ỹ_b), is also a halfspace: s_b Ỹ_b ≥ 0, for all b. A: Passing the threshold? Similarly: s_b Ỹ_b ≥ ζ, for all b.

38 Standard Binary segmentation A short inductive proof shows that the intersection of such halfspaces forms a polyhedron P = {y : Gy ≤ u}, where G stacks the rows g₁ᵀ, g₂ᵀ, …, g_mᵀ and u = (u₁, u₂, …, u_m)ᵀ. This polyhedron characterizes the exact sequence of selection events: y ∈ P if and only if the algorithm applied to y results in the same sequence of outputs as for the data on hand.
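
A sketch of this construction for the first SBS split (my own translation of the slide's argument, not the authors' code; assumes the first CUSUM exceeded the threshold):

import numpy as np

def cusum_contrast(n, s, e, b):
    # Vector g with g @ y equal to the CUSUM (1) on y[s:e) at split b
    g = np.zeros(n)
    nL, nR = b + 1 - s, e - (b + 1)
    w = 1.0 / np.sqrt(1.0 / nL + 1.0 / nR)
    g[s:b + 1] = -w / nL
    g[b + 1:e] = w / nR
    return g

def first_split_polyhedron(y, zeta):
    # Encode: signed CUSUM is maximized at b0 over all b, and it exceeds zeta
    n = len(y)
    gs = np.array([cusum_contrast(n, 0, n, b) for b in range(n - 1)])
    vals = gs @ y
    b0 = int(np.argmax(np.abs(vals)))
    s0 = np.sign(vals[b0])
    rows, offsets = [], []
    for b in range(n - 1):
        if b == b0:
            continue
        # |g_b @ y| <= s0 * g_{b0} @ y, i.e. two halfspaces per competitor b
        rows.append(gs[b] - s0 * gs[b0]);  offsets.append(0.0)
        rows.append(-gs[b] - s0 * gs[b0]); offsets.append(0.0)
    rows.append(-s0 * gs[b0]); offsets.append(-zeta)  # s0 * CUSUM_{b0} >= zeta
    G, u = np.array(rows), np.array(offsets)
    assert np.all(G @ y <= u + 1e-8)  # y satisfies its own selection event
    return G, u, b0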

39 Inference after selection Post-selection inference¹ regards the conditional distribution vᵀY | P_{v⊥}Y, Y ∈ P, where Y ∼ N(θ, Σ). (2) Some intuition: 1. Parametrize the Gaussian with (μ, η) = (vᵀθ, P_{v⊥}θ), whose sufficient statistics are (T(Y), U(Y)) = (vᵀY, P_{v⊥}Y). 2. The distribution in (2) is a conditional pivot, whose distribution depends only on μ and not on η. This allows testing of H₀: vᵀθ = 0 vs. H_A: vᵀθ ≠ 0. 3. v ∈ Rⁿ is user-designed, and can depend on the model selection event (the conditioning in (2)). ¹Saturated model test from Lee et al. (2016), Tibshirani et al. (2016), Fithian et al. (2014, 2016).

40 Truncated Gaussian statistic [Figure: the polyhedron {Γy ≤ u}, the line through y in the direction v, and the truncation limits V^lo and V^up] vᵀY | P_{v⊥}Y, Y ∈ P is equal in distribution to vᵀY | V^lo(Y) ≤ vᵀY ≤ V^up(Y), V⁰(Y) ≥ 0, i.e. a truncated Gaussian with fixed parameters.

41 Truncated Gaussian statistic [Figure: same geometry as the previous slide] In principle, you could just sample: you can simulate directly from vᵀY | Y ∈ P, but this is very inefficient even for moderately sized problems. After conditioning on the nuisance, sampling is not needed: you can simulate directly from vᵀY | P_{v⊥}Y, Y ∈ P, which is more efficient; but under the Gaussian model, TG quantiles are available in closed form (no sampling)!
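
A sketch of the closed-form TG p-value, following the construction of Lee et al. (2016) for Σ = σ²I (numpy/scipy assumed; G and u could come from the first_split_polyhedron sketch above). It tests H₀: vᵀθ = 0 against vᵀθ > 0:

import numpy as np
from scipy.stats import norm

def tg_pvalue(y, G, u, v, sigma):
    # Under H0, v @ y ~ N(0, sigma^2 ||v||^2) truncated to [Vlo, Vup]
    vv = v @ v
    c = v / vv                 # direction of movement along v (Sigma = sigma^2 I)
    z = y - c * (v @ y)        # part of y unchanged as v @ y varies
    Gc, Gz = G @ c, G @ z
    t = v @ y
    with np.errstate(divide="ignore", invalid="ignore"):
        bounds = (u - Gz) / Gc
    Vlo = np.max(bounds[Gc < 0], initial=-np.inf)
    Vup = np.min(bounds[Gc > 0], initial=np.inf)
    sd = sigma * np.sqrt(vv)
    num = norm.cdf(Vup / sd) - norm.cdf(t / sd)
    den = norm.cdf(Vup / sd) - norm.cdf(Vlo / sd)
    return num / den

For a two-sided alternative, one would use 2·min(p, 1 − p) of the one-sided value above.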

42 Testing after binary segmentation (work in progress) Take these steps: 1. Run standard binary segmentation on the data.² 2. Form the polyhedron P = {y : Gy ≤ u}. 3. Form a contrast v ∈ Rⁿ regarding a particular changepoint î = 3 (the closest nearby changepoint is at 7): v = (−1/3, −1/3, −1/3, 1/4, 1/4, 1/4, 1/4, 0, …, 0)ᵀ. 4. Calculate the p-value via the TG statistic, for the test H₀: vᵀθ = θ̄_{4:7} − θ̄_{1:3} = 0 vs. H_A: vᵀθ > 0. 5. Repeat for all î. ²A fixed-step SBS algorithm also exists, and also constitutes a polyhedral event.
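
The contrast in step 3 can be built mechanically; a sketch (numpy assumed), reproducing the v on this slide (0-indexed):

import numpy as np

def segment_contrast(n, left_start, cp, right_end):
    # Mean of theta over (cp+1):right_end minus mean over left_start:cp (inclusive ends)
    v = np.zeros(n)
    nL = cp - left_start + 1
    nR = right_end - cp
    v[left_start:cp + 1] = -1.0 / nL
    v[cp + 1:right_end + 1] = 1.0 / nR
    return v

v = segment_contrast(n=10, left_start=0, cp=2, right_end=6)
print(v)  # [-1/3, -1/3, -1/3, 1/4, 1/4, 1/4, 1/4, 0, 0, 0]

This v would then be passed, together with (G, u) from the selection event, to the tg_pvalue sketch above.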

43 Related algorithms (future work) 1. Wild binary segmentation: do SBS on randomly drawn intervals; use randomized inference from Tian et al. (2016). 2. Circular binary segmentation: SBS, but replace the CUSUM with a statistic that looks for a raised or lowered middle segment instead of a single step; better for CNV data analysis, since gains/losses are of this shape. 3. Multiple-source segmentation: several sources of data Y = (Y¹, …, Y^K), where each Yⁱ ∈ Rⁿ comes from a different mechanism; simultaneous changepoint detection algorithm + inference.

44-47 Generalized lasso Recover coefficients β ∈ R^p with the criterion: β̂(λ) = argmin_{β ∈ R^p} (1/2)‖y − Xβ‖₂² + λ‖Dβ‖₁. Different choices of the penalty matrix D allow for handling of different applications (in this work, first take X = I_n): 1d fused lasso (piecewise-constant signals); trend filtering (piecewise-polynomial signals); 2d fused lasso (piecewise constant in 2d); graph fused lasso (piecewise constant over the nodes of a graph, e.g. three groups with means 0, 1, 3); and regression with X ≠ I_n (piecewise-constant coefficients, e.g. stock prices over trading days in 2015). For the 1d fused lasso, D is the first-difference matrix D = D⁽¹⁾ ∈ R^{(n−1)×n}, with rows (…, −1, 1, …), so that (Dβ)_i = β_{i+1} − β_i.
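A sketch of the 1d fused lasso (X = I_n) at a fixed λ, assuming cvxpy is available; this solves the displayed criterion directly rather than via the path algorithm below (the 1/2 factor only rescales λ):

import numpy as np
import cvxpy as cp

n, lam = 100, 5.0
D = np.diff(np.eye(n), axis=0)      # (n-1) x n first-difference matrix D^(1)
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(n // 2), 2 * np.ones(n // 2)]) + rng.standard_normal(n)

beta = cp.Variable(n)
obj = cp.Minimize(0.5 * cp.sum_squares(y - beta) + lam * cp.norm1(D @ beta))
cp.Problem(obj).solve()
changepoints = np.flatnonzero(np.abs(np.diff(beta.value)) > 1e-6)  # 1e-6: solver tolerance
print(changepoints)  # indices where the fitted piecewise-constant mean jumps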

48 Path algorithm (From Tibshirani & Taylor (2010).) Traces the dual solution û(λ) of the generalized lasso problem, and produces a piecewise-linear, continuous solution path over λ ∈ [0, ∞). The primal solution β̂(λ) is then obtained for free from the optimality conditions.

49-54 Path algorithm Solution to the generalized lasso, over λ ∈ (0, ∞). At knots, changepoints, directions, and other quantities are selected. [Figure sequence: the fitted solution at a decreasing sequence of λ values along the path]

55-59 Summary of contributions See Hyun, G'Sell, Tibshirani (2016, submitted for review) for details. Summary of contributions: Characterized the selections made by the generalized lasso path algorithm. Tools for inference in 1d fused lasso, trend filtering, graph fused lasso, and regression problems. Data-driven model size selection: a stopping rule for the path, based on a generic information criterion. Post-processing tools to improve power, and a visualization aid, to improve practical usability.

60 1D fused lasso simulations [Figure: data example with spike and segment contrasts; QQ-plots of spike-test and segment-test p-values for δ = 0, 1, 2]

61-65 Copy number variation revisited [Figure sequence: aCGH data with successive fused lasso estimates]

66 Copy number variation revisited Glioblastoma multiforme (GBM) tumor aCGH data. [Figure: data, fused lasso estimate, sparse fused lasso estimate, and step-sign plot, with changepoints labeled A through M] A, D, E, G are significant after fused lasso selection, and E after the sparse fused lasso (using segment comparisons).³ ³Post-processing tools were used.

67-69 Goodness of fit tests Sequential changepoint detection. Goal: after k steps of a changepoint algorithm, test whether we have reached the simplest model in the path that is not refuted by the available data.⁴ Procedure: for a changepoint set M_k = (i₁, …, i_k) selected by your procedure, a goodness-of-fit test regards the null: H₀: θ ∈ span{1_{1:i₁}, 1_{(i₁+1):i₂}, …, 1_{(i_k+1):n}}. Note: for this test, we assume the latest selected, lower-dimensional statistical model, parametrized by the changepoint set M_k. The test is based on the distribution of: Ȳ_{(i_k+1):n} − Ȳ_{(i_k+1):i_{k+1}} | Ȳ_{1:i₁}, Ȳ_{(i₁+1):i₂}, …, Ȳ_{(i_k+1):n}, i₁, i₂, …, i_{k+1}, ‖Y‖². (3) Form a p-value based on large absolute values of (3). ⁴The regression perspective was developed in Fithian et al. (2014, 2016).

70 Goodness of fit tests Sequential changepoint detection. Reject for large values of Ȳ_{(i_k+1):n} − Ȳ_{(i_k+1):i_{k+1}} | Ȳ_{1:i₁}, Ȳ_{(i₁+1):i₂}, …, Ȳ_{(i_k+1):n}, i₁, i₂, …, i_{k+1}, ‖Y‖². Q: Why is this a good quantity? A: If |Ȳ_{(i_k+1):n} − Ȳ_{(i_k+1):i_{k+1}}| is large after k steps of the algorithm, the latest model M_k may not be adequate (yet)! This is called selected model testing, developed by Fithian et al. (2015, 2016) for sequential procedures in regression. No need for a known/estimated σ. p-values/CIs are calculated using Monte Carlo.
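
One possible Monte Carlo scheme for this conditional test, written as a sketch under my own assumptions (not the authors' implementation): resampling the residual direction on a sphere fixes the segment means and ‖Y‖², and a rejection step on a hypothetical user-supplied select(y) callback (which reruns k+1 steps of the changepoint algorithm and returns the first k changepoints and the (k+1)st) conditions on the selection event. The rejection step can be very inefficient.

import numpy as np

def segment_means_fit(y, cps):
    # Projection of y onto the span of segment indicators given changepoints cps (0-indexed, sorted)
    fit = np.empty_like(y)
    bounds = [0] + [c + 1 for c in cps] + [len(y)]
    for a, b in zip(bounds[:-1], bounds[1:]):
        fit[a:b] = y[a:b].mean()
    return fit

def gof_statistic(y, cps_k, next_cp):
    ik = cps_k[-1]
    return abs(y[ik + 1:].mean() - y[ik + 1:next_cp + 1].mean())

def gof_pvalue_mc(y, cps_k, next_cp, select, n_mc=2000, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    fit = segment_means_fit(y, cps_k)
    r = y - fit
    obs = gof_statistic(y, cps_k, next_cp)
    hits, kept = 0, 0
    for _ in range(n_mc):
        g = rng.standard_normal(len(y))
        g -= segment_means_fit(g, cps_k)              # project out segment means
        r_star = g / np.linalg.norm(g) * np.linalg.norm(r)
        y_star = fit + r_star                         # same segment means and same ||Y||^2
        if select(y_star) != (list(cps_k), next_cp):  # condition on the selection event
            continue
        kept += 1
        hits += gof_statistic(y_star, cps_k, next_cp) >= obs
    return (hits + 1) / (kept + 1)                    # add-one Monte Carlo p-value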

71-73 Goodness of fit tests Model selection with FDR control. Each p-value corresponds to the addition of a changepoint. Stopping rule: control the number of false discoveries in the selected changepoint set. Specifically: define k₀ as the index of the first correct model, k₀ = min{k : M_k is correct}. Then we would like to find a stopping time k̂ for the algorithm such that FDR = E[(k̂ − k₀)₊ / k̂ ; k̂ > 0] ≤ α is controlled at a pre-specified level α. Goodness-of-fit p-values are independent, which allows us to use FDR-controlling multiple testing rules for ordered hypotheses, such as ForwardStop from G'Sell et al. (2014).
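
A sketch of the ForwardStop rule of G'Sell et al. (numpy assumed): stop at the largest k whose running average of −log(1 − pᵢ) stays below α.

import numpy as np

def forward_stop(pvals, alpha):
    # ForwardStop: k_hat = max{ k : (1/k) * sum_{i<=k} -log(1 - p_i) <= alpha }
    pvals = np.asarray(pvals, dtype=float)
    stats = np.cumsum(-np.log1p(-pvals)) / np.arange(1, len(pvals) + 1)
    passing = np.flatnonzero(stats <= alpha)
    return int(passing[-1] + 1) if passing.size else 0  # number of steps to keep

print(forward_stop([0.001, 0.01, 0.04, 0.5, 0.9], alpha=0.1))  # -> 3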

74-76 Goodness of fit tests Modification for the changepoint regime (future work). The 1d fused lasso is known to make some mistakes in location detection, even in high signal-to-noise scenarios. In this case, the selected-model null H₀: θ ∈ span{1_{1:i₁}, 1_{(i₁+1):i₂}, …, 1_{(i_k+1):n}} may rarely be true. Modify the procedure to select all changepoints within a log n vicinity of elements of E_{k−1}: M_{k−1} = ∪_{u ∈ E_{k−1}} {u ± 1, u ± 2, …, u ± log n}. Verify the other properties required for this machinery (subpath sufficiency, properties of the p-values, proper reasoning for FDR tools with random ordered hypotheses).
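
A sketch of the vicinity construction (numpy assumed; including u itself and rounding log n up are my choices, not specified on the slide):

import numpy as np

def vicinity(changepoints, n):
    # All locations within ceil(log n) of a detected changepoint, clipped to [0, n-1]
    w = int(np.ceil(np.log(n)))
    M = set()
    for u in changepoints:
        M.update(range(max(u - w, 0), min(u + w, n - 1) + 1))
    return sorted(M)

print(vicinity([30, 70], n=100))  # windows of half-width ceil(log 100) = 5 around each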

77 Other research items for future work Scan statistics: targeted tests after scanning, as an improvement to conventional scan statistics. Careful treatment of the CNV application and the enzyme data. More general Σ: non-i.i.d. Σ (e.g. banded). Handle more realistic time series data: non-Gaussian, exponential-family distributed noise. More: inference for simultaneous changepoint detection; improved model size detection; improved calculation/storage for the polyhedron; randomized inference; software for all of the above.

78 Thank you! Questions?

79 Additional slides:

80 Linear contrasts for 1d changepoint detection [Figure: segment test and spike test applied at changepoints A and B, with the resulting p-values] Segment test: v = (0, …, 0, −1/n_ℓ, …, −1/n_ℓ, 1/n_r, …, 1/n_r, 0, …, 0)ᵀ. Spike test: v = (0, …, 0, −1, 1, 0, …, 0)ᵀ.

81-83 Designing contrasts v. Segment test: H₀: vᵀθ = 0, i.e. −(1/n_ℓ)Σ_{i∈L} θᵢ + (1/n_r)Σ_{i∈R} θᵢ = 0, against the one-sided alternative H_A: s·vᵀθ > 0, where s is the sign of the estimated change and v = (0, …, 0, −1/n_ℓ, …, −1/n_ℓ, 1/n_r, …, 1/n_r, 0, …, 0)ᵀ. [Figure: contrast placements on the data and fused lasso estimate, illustrating low-power vs. high-power choices]

84-88 Data driven model selection. Model selection with information criteria: IC(M_{k−1}) < IC(M_k) ⟺ RSS(M_{k−1}) + P(M_{k−1}) < RSS(M_k) + P(M_k) ⟺ yᵀP(M_k)y − yᵀP(M_{k−1})y ≤ C (with P(M) now the projection for model M, and C the penalty difference) ⟺ yᵀaaᵀy ≤ C ⟺ −√C ≤ aᵀy ≤ √C. In words: the model size k = k̂ can also be conditioned on! [Figure: information criterion across steps, with candidate steps and the IC-selected step marked, in two examples]
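
A small numeric check of the chain of equivalences above (a sketch; the penalty gap C = 2 is an arbitrary stand-in): for nested segment-mean models differing by one changepoint, the projection difference is aaᵀ for a fixed unit vector a, so the IC comparison is exactly an interval constraint on aᵀy.

import numpy as np

rng = np.random.default_rng(0)
n = 8
y = rng.standard_normal(n)

def rss(y, cps):
    # Residual sum of squares after fitting segment means with changepoints cps
    out, bounds = 0.0, [0] + [c + 1 for c in cps] + [len(y)]
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        out += np.sum((y[lo:hi] - y[lo:hi].mean()) ** 2)
    return out

# Adding a changepoint at index 3 grows the model by one dimension; the added
# direction is the fixed unit vector a below, and RSS(M_{k-1}) - RSS(M_k) = (a^T y)^2
nL, nR = 4, 4
v = np.concatenate([np.full(nL, 1.0 / nL), np.full(nR, -1.0 / nR)])
a = v / np.linalg.norm(v)
print(rss(y, []) - rss(y, [3]), (a @ y) ** 2)       # equal

C = 2.0  # assumed penalty gap P(M_k) - P(M_{k-1})
prefer_smaller = rss(y, []) < rss(y, [3]) + C
print(prefer_smaller, -np.sqrt(C) <= a @ y <= np.sqrt(C))  # the same event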

89 1d changepoint detection [Figure: data examples with spike and segment contrasts; QQ-plots of spike-test and segment-test p-values, for δ = 0, 1, 2 and across steps 1-2]

90 Saturated model inference Saturated model inference⁴ regards the conditional distribution vᵀY | P_{v⊥}Y, Y ∈ P, where Y ∼ N(θ, Σ). (*) Some intuition: 1. Parametrize the Gaussian with (μ, η) = (vᵀθ, P_{v⊥}θ), whose sufficient statistics are (T(Y), U(Y)) = (vᵀY, P_{v⊥}Y). 2. P_{v⊥}θ is a nuisance parameter; in exponential-family distributions, conditioning on U allows the formation of a pivot, i.e. one that depends only on μ, not on η. 3. Conditioning on P_{v⊥}y restricts the inferential quantity to a line, and a test statistic based on quantiles of (*) simply measures the deviation of y along that line (the amount of vᵀy, within the points where the line intersects the polyhedron, away from the hypothesized vᵀθ₀, often zero). ⁴Borrowing terminology from Fithian et al. (2014, 2016).

91 1d changepoint detection [Figure: segment and spike contrast tests at changepoints A and B, as on slide 80]

93 Applications to various settings.

94-96 Applications to various settings. Knot detection with linear (first-order polynomial) trend filtering. [Figure: data examples for δ = 0, 1, 2, 5, and QQ-plots of segment-test p-values]

97-100 Applications to various settings. 2d image. [Figure: mean and data examples (δ = 3), and QQ-plots of segment-test p-values for δ = 0, 3, 5]

101-102 Applications to various settings. General graph. [Figure: initial graph with group means 0, 1, and 3; graph fused lasso steps 46-47, with the resulting p-value]

103-105 Applications to various settings. Changepoints in linear regression coefficients. [Figure: prices of stocks 1-3 over trading days in 2015; estimated piecewise-constant regression coefficients (1st-3rd) with truth, original and decluttered changepoints, and segment-test p-values such as 1.46e-05, 3.46e-08, and 1.81e-05]
