
Bootstrap, Jackknife and other resampling methods
Part VI: Cross-validation

Rozenn Dahyot
Room 128, Department of Statistics, Trinity College Dublin, Ireland
dahyot@mee.tcd.ie
453 Modern Statistical Methods, 2005

Types of resampling

- Randomization exact test (or permutation test): developed by R. A. Fisher (1935/1960), the founder of classical statistical testing.
- Jackknife: invented by Maurice Quenouille (1949) and later developed by John W. Tukey (1958).
- Bootstrap: invented by Bradley Efron (1979, 1981, 1982) and further developed by Efron and Tibshirani (1993).
- Cross-validation: the subject of this part.

Resampling

Definition: Resampling means that inference is based upon repeated sampling within the same sample. Resampling is closely tied to Monte Carlo simulation. Some authors distinguish the two: resampling considers all possible replications (permutation test, jackknife), while Monte Carlo sampling restricts the resampling to a certain number of replications (bootstrap, randomization test). Another difference between resampling methods is the nature of the resamples: computed with or without replacement.

Introduction to cross-validation

Cross-validation was proposed by Kurtz (1948, simple cross-validation), extended by Mosier (1951, double cross-validation) and by Krus and Fuller (1982, multicross-validation). The original objective of cross-validation is to verify the replicability of results. As with hypothesis testing, the goal is to find out whether a result is replicable or just a matter of chance.

Prediction error

Definition: A prediction error measures how well a model predicts the response value of a future observation. It is often used for model selection. It is defined as:

- the expected squared difference between a future response and the model's prediction, in regression: E[(y − ŷ)^2];
- the probability of incorrect classification, in classification: P(y ≠ ŷ).

The linear regression model

We have a set of (2-dimensional) points z = {(x_i, y_i)}_{i=1,…,n}. An unknown relation links y_i to x_i:

y_i = β(x_i) + ε_i = Σ_{q=0}^{p} β_q x_i^q + ε_i

The error terms ε_i are assumed to be a random sample from a distribution with expectation 0 and variance σ^2.

The Model Selection Problem

Figure: Example data set (in this example (β_0, β_1, β_2) = (1, 3, 0), σ = 3 and n = 10).
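To make the example concrete, here is a minimal Python sketch that generates a synthetic sample matching the figure's setup. This is not the lecture's code: numpy, the seed, and the uniform design on [1, 10] are my assumptions, chosen to mimic the plot's x-range. The later sketches reuse this toy data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketches are reproducible

# True model from the figure: (beta_0, beta_1, beta_2) = (1, 3, 0), sigma = 3, n = 10
beta0, beta1, beta2 = 1.0, 3.0, 0.0
sigma, n = 3.0, 10

x = rng.uniform(1, 10, size=n)  # covariates spread over the figure's x-range
y = beta0 + beta1 * x + beta2 * x**2 + rng.normal(0.0, sigma, size=n)
```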

The Model Selection Problem

Model A: ŷ = β̂_1 x + β̂_0
Model B: ŷ = β̂_2 x^2 + β̂_1 x + β̂_0

Figure: The data points (blue crosses) fitted assuming a linear model (left) and a quadratic model (right). The regression estimates are (β̂_0, β̂_1) = (0.9802, 3.1866) for model A, and (β̂_0, β̂_1, β̂_2) = (4.2380, 0.8875, 0.1997) for model B.

Which model is the best?

The Model Selection Problem

Which model is the best? = How well are you going to predict future data drawn from the same distribution?

As a prediction error, we can compute the average residual squared error (RSE):

PE = RSE/n = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)^2

where ŷ_i = β̂(x_i) is the prediction by the model. To start, y_i is taken in the sample z (instead of a future response); in this case, the prediction error is called the apparent prediction error (APE).
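As a hedged sketch (reusing the toy data above; numpy.polyfit and the helper name are my choices, not the lecture's), fitting both models by least squares and computing their APEs looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=10)
y = 1.0 + 3.0 * x + rng.normal(0.0, 3.0, size=10)  # same toy data as above

def ape(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return the
    apparent prediction error: the mean squared residual on the training data."""
    coeffs = np.polyfit(x, y, deg=degree)  # least-squares fit
    y_hat = np.polyval(coeffs, x)          # predictions on the SAME sample
    return np.mean((y - y_hat) ** 2)

print("APE, model A (linear):   ", ape(x, y, degree=1))
print("APE, model B (quadratic):", ape(x, y, degree=2))
# Model B's APE can never exceed model A's, since A is nested in B.
```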

The Model Selection Problem

Figure: The residual y_i − ŷ_i = ε̂_i is the vertical distance between an observed point (x_i, y_i) and its fitted point (x_i, ŷ_i).

The Model Selection Problem

For our two models, the apparent prediction error is 7.1240 for model A and 5.8636 for model B, so the APE computed with model B is better (smaller) than with model A. Note that the family of functions in B contains the ones defined in A, so B can capture more variability in the data set z and can fit the observations z better (in the sense of the APE). But this is only an apparent prediction error. What is the prediction error for a new observation? And how do we get a new sample to compute it?

Cross-validation

For a more realistic estimate of the prediction error, we need a test sample different from the training sample. One way is to split the observations z into two sets, one for training and the other for testing. Using one part of the available observations to fit the model, and a different part to test it when computing the prediction error, is known as cross-validation.

Simple and double cross-validation

- Simple cross-validation: divide the set z into two groups, one for training and the other for testing. The parameters β are estimated on the training set; the cross-validation estimate is the prediction error computed on the test sample. A sketch follows below.
- Double cross-validation: models are fitted on both sub-samples, and then each fitted equation is used to compute the prediction error on the other sub-sample.
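A minimal sketch of simple cross-validation on the toy data (the half-and-half split and the function name are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=10)
y = 1.0 + 3.0 * x + rng.normal(0.0, 3.0, size=10)

def simple_cv(x, y, degree, train_frac=0.5):
    """Simple cross-validation: fit on one part, report the mean squared
    prediction error on the held-out part."""
    idx = rng.permutation(len(x))
    n_train = int(train_frac * len(x))
    train, test = idx[:n_train], idx[n_train:]
    coeffs = np.polyfit(x[train], y[train], deg=degree)  # fit on the training group
    y_hat = np.polyval(coeffs, x[test])                  # predict the test group
    return np.mean((y[test] - y_hat) ** 2)

print("Simple CV error, model A:", simple_cv(x, y, degree=1))
print("Simple CV error, model B:", simple_cv(x, y, degree=2))
```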

K-fold cross-validation

1. Split z into K equal subsets z_k.
2. Do K times: estimate the model β̂^(k) on z^(k) = {z_1, …, z_{k−1}, z_{k+1}, …, z_K}, then compute the prediction error PE(k) between the test sample z_k and the predictions given by β̂^(k).
3. Average those K prediction errors as the overall estimate of the prediction error:

CV = (1/K) Σ_{k=1}^{K} PE(k)
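A hedged implementation of the three steps (plain numpy; the random fold assignment and K = 5 are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=10)
y = 1.0 + 3.0 * x + rng.normal(0.0, 3.0, size=10)

def kfold_cv(x, y, degree, K=5):
    """K-fold cross-validation estimate of the prediction error."""
    folds = np.array_split(rng.permutation(len(x)), K)  # step 1: K (near-)equal subsets
    pe = []
    for k in range(K):                                  # step 2: hold out fold k
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        coeffs = np.polyfit(x[train], y[train], deg=degree)
        y_hat = np.polyval(coeffs, x[test])
        pe.append(np.mean((y[test] - y_hat) ** 2))
    return np.mean(pe)                                  # step 3: average over the folds

print("5-fold CV, model A:", kfold_cv(x, y, degree=1))
print("5-fold CV, model B:", kfold_cv(x, y, degree=2))
# Leave-one-out cross-validation is the special case K = len(x).
```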

K-fold cross-validation

The number K of subsets depends on the size n of the observation set z. For large data sets, even 3-fold cross-validation is quite accurate. For small data sets, we may have to use leave-one-out cross-validation, where K = n. Multicross-validation is an extension of double cross-validation: the double cross-validation procedure is repeated many times by randomly selecting sub-samples from the data set.

Cross-validation and parameter selection

We have just shown how to perform model selection (i.e. choosing the number of parameters p in the regression model). Cross-validation can also be used for parameter estimation, by choosing the parameter value that minimises the prediction error. The cross-validation estimate is computed using subsamples of z. An alternative consists in using bootstrap samples.

Bootstrap estimates of the prediction error

1. Draw B bootstrap resamples z*(b) from the observation set z.
2. Compute the model parameters β̂*(b) from z*(b), and the corresponding prediction error PE(b), with the original sample z used as the test set.
3. Average those B prediction errors as the overall estimate of the prediction error:

CV_boot = (1/B) Σ_{b=1}^{B} PE(b)
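A hedged sketch of this procedure on the toy data (B = 200 and the resampling scheme are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=10)
y = 1.0 + 3.0 * x + rng.normal(0.0, 3.0, size=10)

def bootstrap_pe(x, y, degree, B=200):
    """Bootstrap estimate of the prediction error: fit on each resample z*(b),
    then test on the ORIGINAL sample z."""
    pe = []
    for _ in range(B):
        idx = rng.integers(0, len(x), size=len(x))       # resample with replacement
        coeffs = np.polyfit(x[idx], y[idx], deg=degree)  # fit on the bootstrap sample
        y_hat = np.polyval(coeffs, x)                    # test on the original sample
        pe.append(np.mean((y - y_hat) ** 2))
    return np.mean(pe)

print("Bootstrap PE, model A:", bootstrap_pe(x, y, degree=1))
print("Bootstrap PE, model B:", bootstrap_pe(x, y, degree=2))
```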

Bootstrap estimates of the prediction error

This bootstrap extension of cross-validation turns out not to work very well, but it can be improved:

- Keep a record of the results from the previous procedure.
- Run a second experiment, this time choosing the bootstrap sample z*(b) itself as the test sample.
- Compute the difference between the two previous results (this difference is called the optimism).
- Add this optimism to the APE, as a bias correction.

Which of CV and its bootstrap alternatives is best is not really clear.
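A sketch of this bias correction, under my reading of the slide (the per-resample differences are averaged; this matches the usual optimism bootstrap, but the exact recipe is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=10)
y = 1.0 + 3.0 * x + rng.normal(0.0, 3.0, size=10)

def optimism_corrected_pe(x, y, degree, B=200):
    """APE plus the bootstrap estimate of the optimism, as a bias correction."""
    coeffs = np.polyfit(x, y, deg=degree)
    ape = np.mean((y - np.polyval(coeffs, x)) ** 2)  # apparent prediction error
    optimism = []
    for _ in range(B):
        idx = rng.integers(0, len(x), size=len(x))
        c = np.polyfit(x[idx], y[idx], deg=degree)                 # fit on z*(b)
        err_orig = np.mean((y - np.polyval(c, x)) ** 2)            # tested on z
        err_boot = np.mean((y[idx] - np.polyval(c, x[idx])) ** 2)  # tested on z*(b)
        optimism.append(err_orig - err_boot)
    return ape + np.mean(optimism)  # bias-corrected estimate

print("Corrected PE, model A:", optimism_corrected_pe(x, y, degree=1))
print("Corrected PE, model B:", optimism_corrected_pe(x, y, degree=2))
```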

Other estimates of the prediction error

So far, we have defined the PE using the average RSE; this is the prediction error measure used in cross-validation. One can also use other measures, such as the Bayesian Information Criterion (BIC):

BIC = RSE/n + (log n) p σ̂^2 / n

BIC penalises the model as the number of parameters p increases. It is a consistent criterion, i.e. it chooses the correct model as n → ∞. However, BIC has two drawbacks compared to CV: you need an estimate σ̂, and you need to know p.
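A sketch of the criterion as written on the slide; estimating σ̂^2 from the residuals of the richer model B is my assumption, since the slide does not say where σ̂ comes from:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=10)
y = 1.0 + 3.0 * x + rng.normal(0.0, 3.0, size=10)

def bic(x, y, degree, sigma2_hat):
    """Slide's criterion: RSE/n + (log n) * p * sigma2_hat / n,
    with p = degree + 1 parameters in the polynomial model."""
    n, p = len(x), degree + 1
    coeffs = np.polyfit(x, y, deg=degree)
    rse = np.sum((y - np.polyval(coeffs, x)) ** 2)  # residual squared error
    return rse / n + np.log(n) * p * sigma2_hat / n

# sigma^2 estimated from model B's residuals (n - 3 degrees of freedom).
c_b = np.polyfit(x, y, deg=2)
sigma2 = np.sum((y - np.polyval(c_b, x)) ** 2) / (len(x) - 3)

print("BIC, model A:", bic(x, y, degree=1, sigma2_hat=sigma2))
print("BIC, model B:", bic(x, y, degree=2, sigma2_hat=sigma2))
```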

Summary

- Cross-validation methods applied to model selection (regression).
- Other applications, such as classification, also use cross-validation; in that case y_i is a label indicating the class, and the prediction error is defined as a misclassification rate.
- These methods are very widely used in machine learning.