Post-selection Inference for Changepoint Detection


1 Post-selection Inference for Changepoint Detection. Sangwon Hyun (Justin), Dept. of Statistics. Advisors: Max G'Sell, Ryan Tibshirani. Committee: Will Fithian (UC Berkeley), Alessandro Rinaldo, Kathryn Roeder, Larry Wasserman.

2 Outline Introduction to post-selection inference ideas. Motivating applications: copy number variation (CNV) analysis; simultaneous changepoint detection. Inference for changepoint algorithms: binary segmentation (work in progress); generalized lasso (2016, submitted). Examples of analysis. Post-selection goodness-of-fit tests. Future work.

3 Introduction: What is post-selection inference?

4-6 Motivation: an example Say you had noisy data generated around an underlying mean, and some changepoint model fit to the same (grey) data. [Figure: data, underlying mean, estimate, and changepoints A and B] Question: how do we infer about the changepoint at A, or at B?

7 Motivation: an example Not-quite-right answer: classical hypothesis tests? These are invalid because (1) A and B are not a priori fixed quantities, and, more specifically, (2) A and B were chosen to (roughly) maximize this gap.

8 Motivation: an example [Figure: naive p-values vs. TG p-values at changepoints A and B]

9-13 What is post-selection inference? First (classical) analyst: 1. Devise a model. 2. Collect data. 3. Test hypotheses. Second (common) analyst: 1. Collect data. 2. Select a model from the data. 3. Form hypotheses and tests. For the second analyst, classical tests are not valid (generally too optimistic). But why? Plainly put: fit-maximizing model quantities are often already optimistic w.r.t. alternative hypotheses. More specifically: classical inference considers an a priori fixed hypothesis to be tested, not a random (adaptively specified) one.

14 Post-selection inference literature In order of relatedness: Sequential regression procedures (covering FS, LAR, lasso): Tibshirani, Taylor, Lockhart, Tibshirani (2014). Lasso at a fixed λ value: Lee, Sun, Sun, Taylor (2014). Sequential goodness-of-fit tests: Fithian, Taylor, Tibshirani, Tibshirani (2014). Other applications: marginal screening, many normal means, grouped stepwise regression, principal component analysis, ℓ1-penalized likelihood models. Asymptotics for non-Gaussian errors: discussed in Tian, Taylor (2015a) and Tibshirani, Rinaldo, Tibshirani, Wasserman (2016).

15 Motivating applications Copy number variation, and biological molecular motors.

16 Example: DNA copy number data DNA copy number is the number of copies of DNA in a genomic region within the genome of a sample, relative to one or more control samples. Goals: detect genomic aberration (gain/loss) regions between two or more individuals; gain a better understanding of the role of DNA copy number changes in human disease; enable early diagnosis of diseases (cancer, prenatal diseases).

17 Example: DNA copy number data [Figure: aCGH data and estimates] Data obtained from the array Comparative Genomic Hybridization (aCGH) technique: tumor vs. reference genomic DNAs are processed, then fluorescence ratios (log2 ratios of count-like measurements) are collected along the length of the chromosomes. Developed to survey DNA copy-number variations across a whole genome, at high resolution.

18 Example: Probing the motion of biological molecular motors.

19 Example: Probing the motion of biological molecular motors. Two potential questions of interest for collaborators: After an algorithm is applied, how do you infer about a changepoint î in the fitted model? H₀: a mean shift does not occur at î in the signal. After a model size is selected, how do you assess the quality of the recovered locations of slope change (knots)? H₀: Ê_k includes all true knots. Example of analysis: use isotonic trend filtering (for piecewise-linear signal recovery), then condition on this selection event to conduct appropriate tests using the post-selection inference machinery.

20 Possible other examples Financial time series: post-selection tests in autoregressive data models. Images/spatial scans: scan statistics on images. Neuroscience data: possible application of post-selection scan statistics.

21 Methods: Binary Segmentation, Generalized Lasso, and Goodness-of-Fit Tests

22 Data model Let's assume a Gaussian data model: Y = (Y₁, …, Y_n)ᵀ ∼ N(θ, Σ), where Σ ∈ Rⁿˣⁿ is the covariance matrix. Make it even simpler: treat the covariance matrix as Σ = σ²I_n. In the end, we discuss departures from the i.i.d. Gaussian model.
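
A two-line simulation of this data model (a sketch; numpy assumed, with an illustrative two-segment mean θ), used by the later code sketches:

import numpy as np

rng = np.random.default_rng(0)
n, sigma = 100, 1.0
theta = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])  # illustrative piecewise-constant mean
y = theta + sigma * rng.standard_normal(n)                   # Y ~ N(theta, sigma^2 I_n)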

23 Standard Binary segmentation Standard binary segmentation (SBS)¹ recursively splits to find the best break location, according to the so-called CUSUM statistic: Ỹ_b^{s,e} = (Ȳ_R − Ȳ_L) / √(1/|L| + 1/|R|), with L = (s+1):b and R = (b+1):e. (1) Assuming constant variance σ², this is basically the variance-stabilized mean difference: for T = Ȳ_R − Ȳ_L, Ỹ_b = σ · T / √(Var[T]), which is N(0, σ²) under i.i.d. Gaussian noise. I.e., the CUSUM's scale is globally meaningful at any step of the recursive algorithm. ¹Well studied in the literature; see, e.g., Fryzlewicz.

24 Standard Binary segmentation
Algorithm 1 Standard binary segmentation
1: B ← ∅
2: function StandardBinSeg(s, e, ζ, Y)
3:   if e − s < 1 then
4:     Stop
5:   else
6:     b₀ := argmax_{b ∈ {s,…,e−1}} |Ỹ_b^{s,e}|
7:     if |Ỹ_{b₀}^{s,e}| > ζ then
8:       Add b₀ to the collection of changepoints B
9:       StandardBinSeg(s, b₀, ζ, Y)
10:      StandardBinSeg(b₀ + 1, e, ζ, Y)
11:    else
12:      Stop
13:    end if
14:  end if
15: end function
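
A small, self-contained Python sketch of Algorithm 1 (my translation, using 0-indexed half-open intervals y[s:e); not the authors' code):

import numpy as np

def cusum(y, s, e, b):
    # CUSUM statistic (1) on y[s:e): left = y[s:b+1], right = y[b+1:e)
    L, R = y[s:b + 1], y[b + 1:e]
    return (R.mean() - L.mean()) / np.sqrt(1.0 / len(L) + 1.0 / len(R))

def standard_binseg(y, s, e, zeta, B):
    # Recursively split y[s:e), keeping breaks whose |CUSUM| exceeds zeta
    if e - s < 2:   # need at least two points to place a break
        return
    stats = np.array([cusum(y, s, e, b) for b in range(s, e - 1)])
    b0 = s + int(np.argmax(np.abs(stats)))
    if np.abs(stats[b0 - s]) > zeta:
        B.append(b0)
        standard_binseg(y, s, b0 + 1, zeta, B)
        standard_binseg(y, b0 + 1, e, zeta, B)

B = []
y = np.concatenate([np.zeros(50), np.ones(50)]) + 0.5 * np.random.default_rng(1).standard_normal(100)
standard_binseg(y, 0, len(y), zeta=3.0, B=B)
print(sorted(B))  # detected break indices (each is the last index of its left segment)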

25-33 Standard Binary segmentation [Figure sequence: successive recursive splits, showing the breaks, the CUSUM statistics, and the running estimate at each step]

34-37 Standard Binary segmentation Q: Why is a selection event polyhedral? A: CUSUM comparisons are all halfspaces in y. At some step of the algorithm, selecting breakpoint b₀ ∈ {s,…,e−1} is equivalent to the CUSUM being maximized at b₀: |Ỹ_{b₀}| ≥ |Ỹ_b| for all s ≤ b ≤ e−1, and each such comparison can be written as gᵀY ≤ 0. A: The sign of the CUSUM, s_b = sign(Ỹ_b), is also a halfspace: s_b Ỹ_b ≥ 0, for all b. A: Passing the threshold? Similarly: s_b Ỹ_b ≥ ζ, for all b.

38 Standard Binary segmentation A short inductive proof shows that the intersection of such halfspaces forms a polyhedron P = {y : Gy ≤ u}, where G stacks the rows g₁ᵀ, g₂ᵀ, …, g_mᵀ and u = (u₁, u₂, …, u_m)ᵀ. This polyhedron characterizes the exact sequence of selection events: y ∈ P if and only if the algorithm applied to y results in the same sequence of outputs as for the data on hand.
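
A sketch of this construction for the first SBS split (my own translation of the slide's argument, not the authors' code; assumes the first CUSUM exceeded the threshold):

import numpy as np

def cusum_contrast(n, s, e, b):
    # Vector g with g @ y equal to the CUSUM (1) on y[s:e) at split b
    g = np.zeros(n)
    nL, nR = b + 1 - s, e - (b + 1)
    w = 1.0 / np.sqrt(1.0 / nL + 1.0 / nR)
    g[s:b + 1] = -w / nL
    g[b + 1:e] = w / nR
    return g

def first_split_polyhedron(y, zeta):
    # Encode: signed CUSUM is maximized at b0 over all b, and it exceeds zeta
    n = len(y)
    gs = np.array([cusum_contrast(n, 0, n, b) for b in range(n - 1)])
    vals = gs @ y
    b0 = int(np.argmax(np.abs(vals)))
    s0 = np.sign(vals[b0])
    rows, offsets = [], []
    for b in range(n - 1):
        if b == b0:
            continue
        # |g_b @ y| <= s0 * g_{b0} @ y, i.e. two halfspaces per competitor b
        rows.append(gs[b] - s0 * gs[b0]);  offsets.append(0.0)
        rows.append(-gs[b] - s0 * gs[b0]); offsets.append(0.0)
    rows.append(-s0 * gs[b0]); offsets.append(-zeta)  # s0 * CUSUM_{b0} >= zeta
    G, u = np.array(rows), np.array(offsets)
    assert np.all(G @ y <= u + 1e-8)  # y satisfies its own selection event
    return G, u, b0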

39 Inference after selection Post-selection inference¹ regards the conditional distribution vᵀY | P_{v⊥}Y, Y ∈ P, where Y ∼ N(θ, Σ). (2) Some intuition: 1. Parametrize the Gaussian with (μ, η) = (vᵀθ, P_{v⊥}θ), whose sufficient statistics are (T(Y), U(Y)) = (vᵀY, P_{v⊥}Y). 2. The distribution in (2) is a conditional pivot, whose distribution depends only on μ and not on η. This allows testing of H₀: vᵀθ = 0 vs. H_A: vᵀθ ≠ 0. 3. v ∈ Rⁿ is user-designed, and can depend on the model selection event (the conditioning in (2)). ¹Saturated model test from Lee et al. (2016), Tibshirani et al. (2016), Fithian et al. (2014, 2016).

40 Truncated Gaussian statistic [Figure: the polyhedron {Γy ≤ u}, the line through y in the direction v, and the truncation limits V^lo and V^up] vᵀY | P_{v⊥}Y, Y ∈ P is equal in distribution to vᵀY | V^lo(Y) ≤ vᵀY ≤ V^up(Y), V⁰(Y) ≥ 0, i.e. a truncated Gaussian with fixed parameters.

41 Truncated Gaussian statistic [Figure: same geometry as the previous slide] In principle, you could just sample: you can simulate directly from vᵀY | Y ∈ P, but this is very inefficient even for moderately sized problems. After conditioning on the nuisance, sampling is not needed: you can simulate directly from vᵀY | P_{v⊥}Y, Y ∈ P, which is more efficient; but under the Gaussian model, TG quantiles are available in closed form (no sampling)!
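
A sketch of the closed-form TG p-value, following the construction of Lee et al. (2016) for Σ = σ²I (numpy/scipy assumed; G and u could come from the first_split_polyhedron sketch above). It tests H₀: vᵀθ = 0 against vᵀθ > 0:

import numpy as np
from scipy.stats import norm

def tg_pvalue(y, G, u, v, sigma):
    # Under H0, v @ y ~ N(0, sigma^2 ||v||^2) truncated to [Vlo, Vup]
    vv = v @ v
    c = v / vv                 # direction of movement along v (Sigma = sigma^2 I)
    z = y - c * (v @ y)        # part of y unchanged as v @ y varies
    Gc, Gz = G @ c, G @ z
    t = v @ y
    with np.errstate(divide="ignore", invalid="ignore"):
        bounds = (u - Gz) / Gc
    Vlo = np.max(bounds[Gc < 0], initial=-np.inf)
    Vup = np.min(bounds[Gc > 0], initial=np.inf)
    sd = sigma * np.sqrt(vv)
    num = norm.cdf(Vup / sd) - norm.cdf(t / sd)
    den = norm.cdf(Vup / sd) - norm.cdf(Vlo / sd)
    return num / den

For a two-sided alternative, one would use 2·min(p, 1 − p) of the one-sided value above.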

42 Testing after binary segmentation (work in progress) Take these steps: 1. Run standard binary segmentation on the data.² 2. Form the polyhedron P = {y : Gy ≤ u}. 3. Form a contrast v ∈ Rⁿ regarding a particular changepoint î = 3 (the closest nearby changepoint is at 7): v = (−1/3, −1/3, −1/3, 1/4, 1/4, 1/4, 1/4, 0, …, 0)ᵀ. 4. Calculate the p-value via the TG statistic, for the test H₀: vᵀθ = θ̄_{4:7} − θ̄_{1:3} = 0 vs. H_A: vᵀθ > 0. 5. Repeat for all î. ²A fixed-step SBS algorithm also exists, and also constitutes a polyhedral event.
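
The contrast in step 3 can be built mechanically; a sketch (numpy assumed), reproducing the v on this slide (0-indexed):

import numpy as np

def segment_contrast(n, left_start, cp, right_end):
    # Mean of theta over (cp+1):right_end minus mean over left_start:cp (inclusive ends)
    v = np.zeros(n)
    nL = cp - left_start + 1
    nR = right_end - cp
    v[left_start:cp + 1] = -1.0 / nL
    v[cp + 1:right_end + 1] = 1.0 / nR
    return v

v = segment_contrast(n=10, left_start=0, cp=2, right_end=6)
print(v)  # [-1/3, -1/3, -1/3, 1/4, 1/4, 1/4, 1/4, 0, 0, 0]

This v would then be passed, together with (G, u) from the selection event, to the tg_pvalue sketch above.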

43 Related algorithms (future work) 1. Wild binary segmentation: do SBS on randomly drawn intervals; use randomized inference from Tian et al. (2016). 2. Circular binary segmentation: SBS, but replace the CUSUM with a statistic that looks for a raised or lowered middle segment instead of a single step; better for CNV data analysis, since gains/losses are of this shape. 3. Multiple-source segmentation: several sources of data Y = (Y¹, …, Y^K), where each Yⁱ ∈ Rⁿ comes from a different mechanism; simultaneous changepoint detection algorithm + inference.

44-47 Generalized lasso Recover coefficients β ∈ R^p with the criterion: β̂(λ) = argmin_{β ∈ R^p} (1/2)‖y − Xβ‖₂² + λ‖Dβ‖₁. Different choices of the penalty matrix D allow for handling of different applications (in this work, first take X = I_n): 1d fused lasso (piecewise-constant signals); trend filtering (piecewise-polynomial signals); 2d fused lasso (piecewise constant in 2d); graph fused lasso (piecewise constant over the nodes of a graph, e.g. three groups with means 0, 1, 3); and regression with X ≠ I_n (piecewise-constant coefficients, e.g. stock prices over trading days in 2015). For the 1d fused lasso, D is the first-difference matrix D = D⁽¹⁾ ∈ R^{(n−1)×n}, with rows (…, −1, 1, …), so that (Dβ)_i = β_{i+1} − β_i.
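A sketch of the 1d fused lasso (X = I_n) at a fixed λ, assuming cvxpy is available; this solves the displayed criterion directly rather than via the path algorithm below (the 1/2 factor only rescales λ):

import numpy as np
import cvxpy as cp

n, lam = 100, 5.0
D = np.diff(np.eye(n), axis=0)      # (n-1) x n first-difference matrix D^(1)
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(n // 2), 2 * np.ones(n // 2)]) + rng.standard_normal(n)

beta = cp.Variable(n)
obj = cp.Minimize(0.5 * cp.sum_squares(y - beta) + lam * cp.norm1(D @ beta))
cp.Problem(obj).solve()
changepoints = np.flatnonzero(np.abs(np.diff(beta.value)) > 1e-6)  # 1e-6: solver tolerance
print(changepoints)  # indices where the fitted piecewise-constant mean jumps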

48 Path algorithm (From Tibshirani & Taylor (2010).) Traces the dual solution û(λ) of the generalized lasso problem, and produces a piecewise-linear, continuous solution path over λ ∈ [0, ∞). The primal solution β̂(λ) is then obtained for free from the optimality conditions.

49-54 Path algorithm Solution to the generalized lasso, over λ ∈ (0, ∞). At knots, changepoints, directions, and other quantities are selected. [Figure sequence: the fitted solution at a decreasing sequence of λ values along the path]

55-59 Summary of contributions See Hyun, G'Sell, Tibshirani (2016, submitted for review) for details. Summary of contributions: Characterized the selections made by the generalized lasso path algorithm. Tools for inference in 1d fused lasso, trend filtering, graph fused lasso, and regression problems. Data-driven model size selection: a stopping rule for the path, based on a generic information criterion. Post-processing tools to improve power, and a visualization aid, to improve practical usability.

60 1D fused lasso simulations [Figure: data example with spike and segment contrasts; QQ-plots of spike-test and segment-test p-values for δ = 0, 1, 2]

61-65 Copy number variation revisited [Figure sequence: aCGH data with successive fused lasso estimates]

66 Copy number variation revisited Glioblastoma multiforme (GBM) tumor aCGH data. [Figure: data, fused lasso estimate, sparse fused lasso estimate, and step-sign plot, with changepoints labeled A through M] A, D, E, G are significant after fused lasso selection, and E after the sparse fused lasso (using segment comparisons).³ ³Post-processing tools were used.

67-69 Goodness of fit tests Sequential changepoint detection. Goal: after k steps of a changepoint algorithm, test whether we have reached the simplest model in the path that is not refuted by the available data.⁴ Procedure: for a changepoint set M_k = (i₁, …, i_k) selected by your procedure, a goodness-of-fit test regards the null: H₀: θ ∈ span{1_{1:i₁}, 1_{(i₁+1):i₂}, …, 1_{(i_k+1):n}}. Note: for this test, we assume the latest selected, lower-dimensional statistical model, parametrized by the changepoint set M_k. The test is based on the distribution of: Ȳ_{(i_k+1):n} − Ȳ_{(i_k+1):i_{k+1}} | Ȳ_{1:i₁}, Ȳ_{(i₁+1):i₂}, …, Ȳ_{(i_k+1):n}, i₁, i₂, …, i_{k+1}, ‖Y‖². (3) Form a p-value based on large absolute values of (3). ⁴The regression perspective was developed in Fithian et al. (2014, 2016).

70 Goodness of fit tests Sequential changepoint detection. Reject for large values of Ȳ_{(i_k+1):n} − Ȳ_{(i_k+1):i_{k+1}} | Ȳ_{1:i₁}, Ȳ_{(i₁+1):i₂}, …, Ȳ_{(i_k+1):n}, i₁, i₂, …, i_{k+1}, ‖Y‖². Q: Why is this a good quantity? A: If |Ȳ_{(i_k+1):n} − Ȳ_{(i_k+1):i_{k+1}}| is large after k steps of the algorithm, the latest model M_k may not be adequate (yet)! This is called selected model testing, developed by Fithian et al. (2015, 2016) for sequential procedures in regression. No need for a known/estimated σ. p-values/CIs are calculated using Monte Carlo.
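
One possible Monte Carlo scheme for this conditional test, written as a sketch under my own assumptions (not the authors' implementation): resampling the residual direction on a sphere fixes the segment means and ‖Y‖², and a rejection step on a hypothetical user-supplied select(y) callback (which reruns k+1 steps of the changepoint algorithm and returns the first k changepoints and the (k+1)st) conditions on the selection event. The rejection step can be very inefficient.

import numpy as np

def segment_means_fit(y, cps):
    # Projection of y onto the span of segment indicators given changepoints cps (0-indexed, sorted)
    fit = np.empty_like(y)
    bounds = [0] + [c + 1 for c in cps] + [len(y)]
    for a, b in zip(bounds[:-1], bounds[1:]):
        fit[a:b] = y[a:b].mean()
    return fit

def gof_statistic(y, cps_k, next_cp):
    ik = cps_k[-1]
    return abs(y[ik + 1:].mean() - y[ik + 1:next_cp + 1].mean())

def gof_pvalue_mc(y, cps_k, next_cp, select, n_mc=2000, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    fit = segment_means_fit(y, cps_k)
    r = y - fit
    obs = gof_statistic(y, cps_k, next_cp)
    hits, kept = 0, 0
    for _ in range(n_mc):
        g = rng.standard_normal(len(y))
        g -= segment_means_fit(g, cps_k)              # project out segment means
        r_star = g / np.linalg.norm(g) * np.linalg.norm(r)
        y_star = fit + r_star                         # same segment means and same ||Y||^2
        if select(y_star) != (list(cps_k), next_cp):  # condition on the selection event
            continue
        kept += 1
        hits += gof_statistic(y_star, cps_k, next_cp) >= obs
    return (hits + 1) / (kept + 1)                    # add-one Monte Carlo p-value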

71-73 Goodness of fit tests Model selection with FDR control. Each p-value corresponds to the addition of a changepoint. Stopping rule: control the number of false discoveries in the selected changepoint set. Specifically: define k₀ as the index of the first correct model, k₀ = min{k : M_k is correct}. Then we would like to find a stopping time k̂ for the algorithm such that FDR = E[(k̂ − k₀)₊ / k̂ ; k̂ > 0] ≤ α is controlled at a pre-specified level α. Goodness-of-fit p-values are independent, which allows us to use FDR-controlling multiple testing rules for ordered hypotheses, such as ForwardStop from G'Sell et al. (2014).
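
A sketch of the ForwardStop rule of G'Sell et al. (numpy assumed): stop at the largest k whose running average of −log(1 − pᵢ) stays below α.

import numpy as np

def forward_stop(pvals, alpha):
    # ForwardStop: k_hat = max{ k : (1/k) * sum_{i<=k} -log(1 - p_i) <= alpha }
    pvals = np.asarray(pvals, dtype=float)
    stats = np.cumsum(-np.log1p(-pvals)) / np.arange(1, len(pvals) + 1)
    passing = np.flatnonzero(stats <= alpha)
    return int(passing[-1] + 1) if passing.size else 0  # number of steps to keep

print(forward_stop([0.001, 0.01, 0.04, 0.5, 0.9], alpha=0.1))  # -> 3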

74-76 Goodness of fit tests Modification for the changepoint regime (future work). The 1d fused lasso is known to make some mistakes in location detection, even in high signal-to-noise scenarios. In this case, the selected-model null H₀: θ ∈ span{1_{1:i₁}, 1_{(i₁+1):i₂}, …, 1_{(i_k+1):n}} may rarely be true. Modify the procedure to select all changepoints within a log n vicinity of elements of E_{k−1}: M_{k−1} = ∪_{u ∈ E_{k−1}} {u ± 1, u ± 2, …, u ± log n}. Verify the other properties required for this machinery (subpath sufficiency, properties of the p-values, proper reasoning for FDR tools with random ordered hypotheses).
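
A sketch of the vicinity construction (numpy assumed; including u itself and rounding log n up are my choices, not specified on the slide):

import numpy as np

def vicinity(changepoints, n):
    # All locations within ceil(log n) of a detected changepoint, clipped to [0, n-1]
    w = int(np.ceil(np.log(n)))
    M = set()
    for u in changepoints:
        M.update(range(max(u - w, 0), min(u + w, n - 1) + 1))
    return sorted(M)

print(vicinity([30, 70], n=100))  # windows of half-width ceil(log 100) = 5 around each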

77 Other research items for future work Scan statistics: targeted tests after scanning, as an improvement to conventional scan statistics. Careful treatment of the CNV application and the enzyme data. More general Σ: non-i.i.d. Σ (e.g. banded). Handle more realistic time series data: non-Gaussian, exponential-family distributed noise. More: inference for simultaneous changepoint detection; improved model size detection; improved calculation/storage for the polyhedron; randomized inference; software for all of the above.

78 Thank you! Questions?

79 Additional slides:

80 Linear contrasts for 1d changepoint detection [Figure: segment test and spike test applied at changepoints A and B, with the resulting p-values] Segment test: v = (0, …, 0, −1/n_ℓ, …, −1/n_ℓ, 1/n_r, …, 1/n_r, 0, …, 0)ᵀ. Spike test: v = (0, …, 0, −1, 1, 0, …, 0)ᵀ.

81-83 Designing contrasts v. Segment test: H₀: vᵀθ = 0, i.e. −(1/n_ℓ)Σ_{i∈L} θᵢ + (1/n_r)Σ_{i∈R} θᵢ = 0, against the one-sided alternative H_A: s·vᵀθ > 0, where s is the sign of the estimated change and v = (0, …, 0, −1/n_ℓ, …, −1/n_ℓ, 1/n_r, …, 1/n_r, 0, …, 0)ᵀ. [Figure: contrast placements on the data and fused lasso estimate, illustrating low-power vs. high-power choices]

84-88 Data driven model selection. Model selection with information criteria: IC(M_{k−1}) < IC(M_k) ⟺ RSS(M_{k−1}) + P(M_{k−1}) < RSS(M_k) + P(M_k) ⟺ yᵀP(M_k)y − yᵀP(M_{k−1})y ≤ C (with P(M) now the projection for model M, and C the penalty difference) ⟺ yᵀaaᵀy ≤ C ⟺ −√C ≤ aᵀy ≤ √C. In words: the model size k = k̂ can also be conditioned on! [Figure: information criterion across steps, with candidate steps and the IC-selected step marked, in two examples]
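
A small numeric check of the chain of equivalences above (a sketch; the penalty gap C = 2 is an arbitrary stand-in): for nested segment-mean models differing by one changepoint, the projection difference is aaᵀ for a fixed unit vector a, so the IC comparison is exactly an interval constraint on aᵀy.

import numpy as np

rng = np.random.default_rng(0)
n = 8
y = rng.standard_normal(n)

def rss(y, cps):
    # Residual sum of squares after fitting segment means with changepoints cps
    out, bounds = 0.0, [0] + [c + 1 for c in cps] + [len(y)]
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        out += np.sum((y[lo:hi] - y[lo:hi].mean()) ** 2)
    return out

# Adding a changepoint at index 3 grows the model by one dimension; the added
# direction is the fixed unit vector a below, and RSS(M_{k-1}) - RSS(M_k) = (a^T y)^2
nL, nR = 4, 4
v = np.concatenate([np.full(nL, 1.0 / nL), np.full(nR, -1.0 / nR)])
a = v / np.linalg.norm(v)
print(rss(y, []) - rss(y, [3]), (a @ y) ** 2)       # equal

C = 2.0  # assumed penalty gap P(M_k) - P(M_{k-1})
prefer_smaller = rss(y, []) < rss(y, [3]) + C
print(prefer_smaller, -np.sqrt(C) <= a @ y <= np.sqrt(C))  # the same event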

89 1d changepoint detection [Figure: data examples with spike and segment contrasts; QQ-plots of spike-test and segment-test p-values, for δ = 0, 1, 2 and across steps 1-2]

90 Saturated model inference Saturated model inference⁴ regards the conditional distribution vᵀY | P_{v⊥}Y, Y ∈ P, where Y ∼ N(θ, Σ). (*) Some intuition: 1. Parametrize the Gaussian with (μ, η) = (vᵀθ, P_{v⊥}θ), whose sufficient statistics are (T(Y), U(Y)) = (vᵀY, P_{v⊥}Y). 2. P_{v⊥}θ is a nuisance parameter; in exponential-family distributions, conditioning on U allows the formation of a pivot, i.e. one that depends only on μ, not on η. 3. Conditioning on P_{v⊥}y restricts the inferential quantity to a line, and a test statistic based on quantiles of (*) simply measures the deviation of y along that line (the amount of vᵀy, within the points where the line intersects the polyhedron, away from the hypothesized vᵀθ₀, often zero). ⁴Borrowing terminology from Fithian et al. (2014, 2016).

91 1d changepoint detection [Figure: segment and spike contrast tests at changepoints A and B, as on slide 80]

93 Applications to various settings.

94-96 Applications to various settings. Knot detection with linear (first-order polynomial) trend filtering. [Figure: data examples for δ = 0, 1, 2, 5, and QQ-plots of segment-test p-values]

97-100 Applications to various settings. 2d image. [Figure: mean and data examples (δ = 3), and QQ-plots of segment-test p-values for δ = 0, 3, 5]

101-102 Applications to various settings. General graph. [Figure: initial graph with group means 0, 1, and 3; graph fused lasso steps 46-47, with the resulting p-value]

103-105 Applications to various settings. Changepoints in linear regression coefficients. [Figure: prices of stocks 1-3 over trading days in 2015; estimated piecewise-constant regression coefficients (1st-3rd) with truth, original and decluttered changepoints, and segment-test p-values such as 1.46e-05, 3.46e-08, and 1.81e-05]
