Distribution-free inference for estimation and classification

1 Distribution-free inference for estimation and classification
Rina Foygel Barber (joint work with Fan Yang)

2 Inference without assumptions?
Training data: $(X_1, Y_1), \dots, (X_n, Y_n)$.
Fit a parametric model, e.g. $Y \approx \beta_1 X^{(1)} + \dots + \beta_d X^{(d)}$, and then build confidence intervals for $E[Y \mid X = x]$ or prediction intervals for $Y \mid X = x$.
Assuming an incorrect model leads to underestimated uncertainty.
Distribution-free inference: inference without assuming the model is correct.

3 Prediction via conformal inference
Inspiration: "Distribution-Free Predictive Inference For Regression", Lei, G'Sell, Rinaldo, Tibshirani, Wasserman (2016).
Exchangeable data: $(X_1, Y_1), \dots, (X_n, Y_n), (X_{n+1}, ???)$.
Prediction interval for $Y_{n+1}$ under no assumptions, using any regression method.
Coverage holds under exchangeability of the training data and the test point.
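The idea on this slide can be illustrated with split conformal prediction, one common variant of conformal inference. The sketch below is not taken from the talk: the function name `split_conformal_interval`, the absolute-residual score, and the stand-in model `predict` are all assumptions made for illustration.

```python
import math
import random


def split_conformal_interval(x_cal, y_cal, predict, x_new, alpha=0.1):
    """Return a (lower, upper) prediction interval for the response at x_new.

    `predict` is any fitted regression function (hypothetical name). It must
    have been fit on data separate from the calibration set (x_cal, y_cal).
    """
    # Absolute residuals on the held-out calibration set, sorted ascending.
    scores = sorted(abs(y - predict(x)) for x, y in zip(x_cal, y_cal))
    n = len(scores)
    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial interval.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    center = predict(x_new)
    return (center - q, center + q)


# Illustrative usage on simulated data (the model here is an assumption).
rng = random.Random(0)
predict = lambda x: 2.0 * x  # stand-in for a model fit on separate data
x_cal = [rng.uniform(0, 1) for _ in range(200)]
y_cal = [2.0 * x + rng.gauss(0, 1) for x in x_cal]
lower, upper = split_conformal_interval(x_cal, y_cal, predict, x_new=0.5)
print(lower, upper)
```

The interval width is driven entirely by the calibration residuals, which is why the guarantee needs only exchangeability and not a correct model.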

4 Inference for estimation & classification
If $E[Y \mid X] = P(X)$: fitted regression function $\hat{P}(X)$.
If $Y$ is binary, then a prediction interval is meaningless: it is always given by $\{0, 1\}$.
More generally, if noise is high, the prediction interval is very wide.
Can we build a confidence interval for $P(X)$, with no assumptions?

5 Inference for estimation & classification
Our plan:
1. Given an estimate $\hat{P}(x)$ and fresh data $(X_1, Y_1), \dots, (X_n, Y_n)$, how can we build a confidence band for $P(x)$?
2. How can we use a single data set to both construct $\hat{P}(x)$ and build a confidence band?

6 Confidence band for $P(x)$
Given an estimate $\hat{P}(x)$ and fresh data $(X_1, Y_1), \dots, (X_n, Y_n)$, how can we build a confidence band for $P(x)$?

7 Confidence band for $P(x)$
WLOG reorder indices so that $\hat{P}(X_1) \le \dots \le \hat{P}(X_n)$.
[Figure: fitted probability, true probability, and smoothed true probability; y-axis: Prob{Y = 1 | X}; x-axis: index (sorted).]

8 Isotonic regression
1. Calibration via isotonic regression: $(\hat{p}_1, \dots, \hat{p}_n) = \arg\min \left\{ \sum_i (Y_i - p_i)^2 : p_1 \le \dots \le p_n \right\}$
2. Best possible outcome: $(p^*_1, \dots, p^*_n) = \arg\min \left\{ \sum_i (P(X_i) - p_i)^2 : p_1 \le \dots \le p_n \right\}$
3. Build a confidence band for $(p^*_1, \dots, p^*_n)$?
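The least-squares problem in item 1 can be solved exactly with the pool-adjacent-violators algorithm. The following is a minimal sketch rather than the speaker's implementation; the function name `isotonic_regression` is an assumption, and the input is assumed to be already ordered by the fitted probabilities.

```python
def isotonic_regression(y):
    """Least-squares fit of a nondecreasing sequence to y (pool adjacent violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block mean exceeds the last block mean.
        # Means are compared by cross-multiplication to avoid division.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        # Every index in a block receives the block average.
        fit.extend([total / count] * count)
    return fit


# Binary labels, already sorted by the fitted probabilities.
labels = [1, 0, 1, 0, 1]
print(isotonic_regression(labels))
```

Each merged block is a stretch of indices on which the calibrated estimate is constant, so the output is a nondecreasing step function.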

9 Isotonic regression
[Figure: fitted probability, calibrated estimate, true probability, and true probability (isotonic); y-axis: Prob{Y = 1 | X}; x-axis: index (sorted).]

10 Isotonic regression
Known results: if $Y_i = P(X_i) + $ subgaussian noise, then $|\hat{p}_i - p^*_i| \lesssim n^{-1/3}$ or $n^{-1/2}$, with the exponent depending on local properties of the true $P(X_i)$'s.
If $P(X_i)$ is locally strictly increasing: $n^{-1/3}$ rate.
If $P(X_i)$ is locally constant: $n^{-1/2}$ rate.
(Chatterjee et al 2015; Cator 2011; many others)
Our goal: construct a data-adaptive bound on $|\hat{p}_i - p^*_i|$ that does not depend on knowing properties of the true means.

11 A geometric approach
Isotonic regression = projection: $\hat{p} = P_{\mathrm{iso}}(Y)$, $p^* = P_{\mathrm{iso}}(P(X))$, where $P_{\mathrm{iso}}$ is projection onto the isotonic cone $\{t : t_1 \le \dots \le t_n\}$.
Convex projection, so $\|\hat{p} - p^*\|_2^2 \le \|Y - P(X)\|_2^2$.
But $\|Y - P(X)\|_2^2 \sim n$, so at best, confidence interval width $\sim 1$.

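12 A geometric approach
Can we use a different norm?
Theorem (contraction via isotonic projection): for any norm $\|\cdot\|$,
$\|P_{\mathrm{iso}}(u) - P_{\mathrm{iso}}(v)\| \le \|u - v\|$ for all $u, v$
if and only if $\|\cdot\|$ is nonincreasing under neighbor averaging:
$\left\| \left( u_1, \dots, u_{i-1}, \tfrac{u_i + u_{i+1}}{2}, \tfrac{u_i + u_{i+1}}{2}, u_{i+2}, \dots, u_n \right) \right\| \le \|u\|$ for every $u$ and every index $i$.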

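13 A geometric approach
Sliding window norm: $\|u\|_{\mathrm{SW}} = \max_{1 \le i \le j \le n} \sqrt{j - i + 1}\, |\bar{u}_{i:j}|$, where $\bar{u}_{i:j}$ is the average of $u_i, \dots, u_j$.
If $Y_i = P(X_i) + $ subgaussian noise, then $\sqrt{j - i + 1}\, \overline{(Y - P(X))}_{i:j}$ is subgaussian, so $\|Y - P(X)\|_{\mathrm{SW}} \lesssim \sqrt{\log(n)}$.
Since $\|\cdot\|_{\mathrm{SW}}$ is contractive by our theorem, $\|\hat{p} - p^*\|_{\mathrm{SW}} \lesssim \sqrt{\log(n)}$.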
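To make the sliding window norm concrete, the sketch below computes it by brute force over all windows and then checks, on random vectors, the neighbor-averaging condition under which the theorem applies. The function names `sliding_window_norm` and `average_neighbors` are assumptions for illustration, and the random check is only a numerical sanity test, not a proof.

```python
import math
import random


def sliding_window_norm(u):
    """max over windows i..j of sqrt(j - i + 1) * |mean(u[i..j])|."""
    prefix = [0.0]
    for value in u:
        prefix.append(prefix[-1] + value)
    best = 0.0
    n = len(u)
    for i in range(n):
        for j in range(i, n):
            length = j - i + 1
            window_sum = prefix[j + 1] - prefix[i]
            # sqrt(length) * |window_sum / length| == |window_sum| / sqrt(length)
            best = max(best, abs(window_sum) / math.sqrt(length))
    return best


def average_neighbors(u, i):
    """Replace u[i] and u[i + 1] (0-based) by their common average."""
    v = list(u)
    v[i] = v[i + 1] = (u[i] + u[i + 1]) / 2
    return v


# Numerical check that neighbor averaging does not increase the norm.
rng = random.Random(0)
for _ in range(100):
    u = [rng.gauss(0, 1) for _ in range(10)]
    i = rng.randrange(len(u) - 1)
    assert sliding_window_norm(average_neighbors(u, i)) <= sliding_window_norm(u) + 1e-12
```

The double loop costs O(n^2), which is fine for a demonstration but would need a smarter search for large n.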

14 Data-adaptive bands
Know: $\|\hat{p} - p^*\|_{\mathrm{SW}} \lesssim \sqrt{\log(n)}$, and $\hat{p}$, $p^*$ are both monotonic.
Therefore $p^*_i \le \bar{p}^*_{i:j} \le \bar{\hat{p}}_{i:j} + \sqrt{\frac{\log(n)}{j - i + 1}}$ for any $j \ge i$.
By taking a minimum over all $j$, we find a data-adaptive bound.
If $p^*_i$ is locally strictly increasing: minimum achieved at $j - i \sim n^{2/3}$, with $n^{-1/3}$ rate.
If $p^*_i$ is locally constant: minimum achieved at $j - i \sim n$, with $n^{-1/2}$ rate.
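The minimum over window lengths can be computed directly. In the sketch below, `radius` stands for the bound on the sliding-window norm of the error (of order $\sqrt{\log(n)}$ on the slide, with constants not given there), so it is left as an input. The name `adaptive_band` is an assumption, and the lower bound is the mirror-image argument using windows that end at $i$, which the slide does not spell out.

```python
import math


def adaptive_band(p_hat, radius):
    """Pointwise band for a monotone target from a sliding-window-norm bound.

    p_hat  : nondecreasing list, e.g. an isotonic regression fit
    radius : assumed bound on the sliding-window norm of (p_hat - target)
    """
    n = len(p_hat)
    prefix = [0.0]
    for value in p_hat:
        prefix.append(prefix[-1] + value)

    lower, upper = [], []
    for i in range(n):
        # Upper bound: average forward over i..j plus the window-size penalty.
        upper.append(min(
            (prefix[j + 1] - prefix[i]) / (j - i + 1) + radius / math.sqrt(j - i + 1)
            for j in range(i, n)
        ))
        # Lower bound: average backward over j..i minus the penalty.
        lower.append(max(
            (prefix[i + 1] - prefix[j]) / (i - j + 1) - radius / math.sqrt(i - j + 1)
            for j in range(0, i + 1)
        ))
    return lower, upper


# A flat fit: long windows are available, so the band tightens away from the edges.
lower, upper = adaptive_band([0.5] * 4, radius=1.0)
print(lower)
print(upper)
```

Wider windows shrink the penalty term but pull in values farther from index $i$, and the min/max picks the best trade-off at each index without knowing whether the target is locally flat or increasing.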

15 Data-adaptive bands
[Figure: data $y_i$, estimate $\mathrm{iso}(y)_i$, and confidence band, plotted against index $i$.]

16 Data-adaptive bands
[Figure, two panels: "Convergence in flat regions" and "Convergence in increasing region"; axes: log(n/log(n)) and log(mean confidence band width); each panel shows a least squares regression line.]

17 Data-adaptive bands
Summary: new features of our method
Data-adaptive band: don't need to know properties of $P(X_i)$.
Do need a bound $\sigma$ on the noise level: for binary data, use $\sigma = 1$; otherwise can estimate with $\hat{\sigma}^2 = \frac{\|Y - \hat{p}\|_2^2}{n - (\text{effective d.f.})}$ (Meyer & Woodroofe 2000).
Confidence band contains the entire function, so it is too conservative at any single point.

18 Reusing data: classification
Goal: using a single data set,
estimate a regression function, $\hat{P}(x) \approx P(x) = E[Y \mid X = x]$,
and build a confidence band containing the true $P(x)$ (at most points $x$).
This is a selective inference problem: choosing a model, and assessing its accuracy, using a single data set.

19 Reusing data: classification
How can we use a single data set to both construct $\hat{P}(x)$ and build a confidence band?
1. Randomize the data by flipping some of the $Y_i$'s:
$\tilde{Y}_i = \begin{cases} Y_i, & \text{with probability } 1 - \varphi, \\ 1 - Y_i, & \text{with probability } \varphi, \end{cases}$
where $\varphi$ is the flip probability.

20 Reusing data: classification
2. Fit regression function $\hat{P}(x)$ using the blurred data $(X_1, \tilde{Y}_1), \dots, (X_n, \tilde{Y}_n)$.
Distribution of $\tilde{Y} \mid X$:
$P\{\tilde{Y}_i = 1 \mid X_i\} = P(X_i)(1 - \varphi) + (1 - P(X_i))\,\varphi$

21 Reusing data: classification
3. Then perform inference for $P(X)$ by revealing the original data $Y$.
Distribution of $Y \mid X, \tilde{Y}$:
$P\{Y_i = 1 \mid X_i, \tilde{Y}_i = 1\} = \frac{P(X_i)(1 - \varphi)}{P(X_i)(1 - \varphi) + (1 - P(X_i))\,\varphi}$
$P\{Y_i = 1 \mid X_i, \tilde{Y}_i = 0\} = \frac{P(X_i)\,\varphi}{P(X_i)\,\varphi + (1 - P(X_i))(1 - \varphi)}$
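The label-flipping scheme and the two resulting distributions can be written out in a few lines. This is a sketch under the assumption of binary labels and a flip probability strictly between 0 and 1; the function names are hypothetical and the fitting and inference steps themselves are not shown.

```python
import random


def flip_labels(y, phi, rng):
    """Flip each binary label independently with probability phi."""
    return [1 - yi if rng.random() < phi else yi for yi in y]


def prob_blurred_is_one(p, phi):
    """P{blurred label = 1 | X} when P{Y = 1 | X} = p."""
    return p * (1 - phi) + (1 - p) * phi


def prob_original_is_one(p, phi, blurred):
    """P{Y = 1 | X, blurred label} by Bayes' rule (assumes 0 < phi < 1)."""
    if blurred == 1:
        return p * (1 - phi) / (p * (1 - phi) + (1 - p) * phi)
    return p * phi / (p * phi + (1 - p) * (1 - phi))


# Compare the Bayes formula with a simulated frequency for one value of p.
rng = random.Random(0)
p, phi = 0.7, 0.2
y = [1 if rng.random() < p else 0 for _ in range(100_000)]
y_blurred = [*flip_labels(y, phi, rng)]
kept = [yi for yi, bi in zip(y, y_blurred) if bi == 1]
print(sum(kept) / len(kept), prob_original_is_one(p, phi, blurred=1))
```

Because the original label is still random given the blurred one, the revealed $Y$ carries fresh information about $P(X)$ even after the model has been fit to the blurred data.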

22 Summary: inference for estimation
Isotonic regression + fresh data gives distribution-free confidence bands.
Possible to fit a model + perform inference on a single data set? (Distribution-free selective inference?)

23 Thank you!
Thanks to funding from NSF & Sloan Fellowship.
