STAT 830 Non-parametric Inference Basics

Richard Lockhart, Simon Fraser University, STAT 801/830, Fall 2012

The Empirical Distribution Function (pp. 97-99)

Suppose we have a sample $X_1, \ldots, X_n$ of iid real-valued random variables. The empirical distribution function is
$$\hat F_n(x) = \frac{1}{n} \sum_{i=1}^n 1(X_i \le x).$$
This is a cdf and is an estimate of $F$, the cdf of the $X$s. People also speak of the empirical distribution:
$$\hat P(A) = \frac{1}{n} \sum_{i=1}^n 1(X_i \in A).$$
This is the probability distribution corresponding to $\hat F_n$. We now consider the quality of $\hat F_n$ as an estimate: its standard error, the estimated standard error, confidence intervals, simultaneous confidence intervals, and so on.
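As a quick illustration (not from the slides), base R's ecdf() computes $\hat F_n$ directly; the by-hand version is just a sample fraction. The data here are assumed for the sketch:

set.seed(1)
x = rnorm(20)              # a small iid sample
Fhat = ecdf(x)             # the step function Fhat_n
Fhat(0)                    # proportion of observations <= 0
mean(x <= 0)               # the same fraction computed by hand
plot(Fhat, main = "EDF")   # step-function plot of Fhat_n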

Bias, variance and mean squared error

We judge estimates in many ways; the focus is the distribution of the error $\hat\theta - \theta$, computed when $\theta$ is the true value of the parameter. For our non-parametric iid sampling model we are interested in $\hat F_n(x) - F(x)$ when $F$ is the true distribution function of the $X$s. The simplest summary of the size of a random variable is the root mean squared error:
$$\text{RMSE} = \sqrt{E_\theta\left[(\hat\theta - \theta)^2\right]}.$$
The subscript $\theta$ on $E$ is important: it specifies the true value of $\theta$ and matches the $\theta$ in the error!

MSE decomposition and the variance-bias trade-off

The MSE of any estimate is
$$\text{MSE} = E_\theta\left[(\hat\theta - \theta)^2\right] = E_\theta\left[\left(\hat\theta - E_\theta(\hat\theta) + E_\theta(\hat\theta) - \theta\right)^2\right] = E_\theta\left[\left(\hat\theta - E_\theta(\hat\theta)\right)^2\right] + \left\{E_\theta(\hat\theta) - \theta\right\}^2.$$
In this expansion there is a cross-product term, $2\{E_\theta(\hat\theta) - \theta\}\, E_\theta[\hat\theta - E_\theta(\hat\theta)]$, which is 0 because $E_\theta[\hat\theta - E_\theta(\hat\theta)] = 0$. The two remaining terms have names: the first is the variance of $\hat\theta$, while the second is the square of the bias.

Definition: The bias of an estimator $\hat\theta$ is
$$\text{bias}_{\hat\theta}(\theta) = E_\theta(\hat\theta) - \theta.$$
So our decomposition is
$$\text{MSE} = \text{Variance} + (\text{bias})^2.$$
In practice one often finds a trade-off: trying to make the variance small increases the bias.
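A small simulation sketch of the decomposition, under an assumed setup: $\hat\theta = 0.9\bar X$ is a deliberately biased estimator of $\mu = 1$ from $N(1,1)$ samples, and the two sides of MSE = variance + bias$^2$ should agree up to simulation error:

set.seed(2)
mu = 1; n = 10; M = 100000
thetahat = replicate(M, 0.9 * mean(rnorm(n, mean = mu)))
mse = mean((thetahat - mu)^2)
bias = mean(thetahat) - mu
c(mse, var(thetahat) + bias^2)   # the two numbers should nearly match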

Applied to the EDF

The EDF is an unbiased estimate of $F$. That is,
$$E[\hat F_n(x)] = \frac{1}{n} \sum_{i=1}^n E[1(X_i \le x)] = \frac{1}{n} \sum_{i=1}^n F(x) = F(x),$$
so the bias is 0. The mean squared error is then
$$\text{MSE} = \text{Var}(\hat F_n(x)) = \frac{1}{n^2} \sum_{i=1}^n \text{Var}[1(X_i \le x)] = \frac{F(x)[1 - F(x)]}{n}.$$
This is very much the most common situation: the MSE is proportional to $1/n$ in large samples, so the RMSE is proportional to $1/\sqrt{n}$. The RMSE is measured in the same units as $\hat\theta$, so it is the scientifically meaningful scale.
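A quick check of the variance formula by simulation, with assumed standard normal data and $x = 0$, where $F(0)[1 - F(0)]/n = 0.25/n$:

set.seed(3)
n = 50; M = 100000
Fhat0 = replicate(M, mean(rnorm(n) <= 0))   # Fhat_n(0) for M samples
var(Fhat0)    # simulated variance
0.25 / n      # theoretical value F(0)[1 - F(0)]/n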

Biased estimates

Many estimates are exactly or approximately averages, or functions of averages. So, for example, $\bar X = \frac{1}{n}\sum X_i$ and $\overline{X^2} = \frac{1}{n}\sum X_i^2$ are unbiased estimates of $E(X)$ and $E(X^2)$. We might combine these to get a natural estimate of $\sigma^2$:
$$\hat\sigma^2 = \overline{X^2} - (\bar X)^2.$$
This estimate is biased, because
$$E\left[(\bar X)^2\right] = \text{Var}(\bar X) + \left[E(\bar X)\right]^2 = \sigma^2/n + \mu^2.$$
So, writing $\mu_2 = E(X^2) = \sigma^2 + \mu^2$, the bias of $\hat\sigma^2$ is
$$E\left[\overline{X^2}\right] - E\left[(\bar X)^2\right] - \sigma^2 = \mu_2 - \mu^2 - \sigma^2/n - \sigma^2 = -\sigma^2/n.$$
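A simulation sketch of this bias, under an assumed setup (standard normal data, so $\sigma^2 = 1$ and the bias should be about $-1/n$):

set.seed(4)
n = 10; M = 100000
sig2hat = replicate(M, { x = rnorm(n); mean(x^2) - mean(x)^2 })
mean(sig2hat) - 1   # close to -sigma^2/n = -0.1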

Relative sizes of bias and variance

In this case, and many others, the bias is proportional to $1/n$, so the squared bias is proportional to $1/n^2$, while the variance is proportional to $1/n$. So in large samples the variance is more important! The biased estimate $\hat\sigma^2$ is traditionally changed to the usual sample variance $s^2 = n\hat\sigma^2/(n-1)$ to remove the bias. WARNING: the MSE of $s^2$ is larger than that of $\hat\sigma^2$.

Standard Errors and Interval Estimation

In any case, point estimation alone is a silly exercise: an assessment of the likely size of the error of the estimate is essential. A confidence interval is one way to provide that assessment. The most common kind is approximate:
$$\text{estimate} \pm 2 \times \text{estimated standard error}.$$
This is an interval of values $L(X) < \text{parameter} < U(X)$, where $U$ and $L$ are random. What is the justification for the two-SE interval above? Notation: $\hat\phi$ is the estimate of $\phi$, and $\hat\sigma_{\hat\phi}$ is the estimated standard error. Use the central limit theorem, the delta method and Slutsky's theorem to prove
$$\lim_{n\to\infty} P_F\left(\frac{\hat\phi - \phi}{\hat\sigma_{\hat\phi}} \le x\right) = \Phi(x).$$

Pointwise limits for F(x)

Define, as usual, $z_\alpha$ by $\Phi(z_\alpha) = 1 - \alpha$ and approximate
$$P_F\left(-z_{\alpha/2} \le \frac{\hat\phi - \phi}{\hat\sigma_{\hat\phi}} \le z_{\alpha/2}\right) \approx 1 - \alpha.$$
Solve the inequalities to get the usual interval. Now we apply this to $\phi = F(x)$ for one fixed $x$. Our estimate is $\hat\phi = \hat F_n(x)$. The random variable $n\hat\phi$ has a Binomial$(n, F(x))$ distribution, so $\text{Var}(\hat F_n(x)) = F(x)(1 - F(x))/n$ and the standard error is
$$\sigma_{\hat F_n(x)} = \text{SE} = \sqrt{\frac{F(x)[1 - F(x)]}{n}}.$$
According to the central limit theorem,
$$\frac{\hat F_n(x) - F(x)}{\sigma_{\hat F_n(x)}} \overset{d}{\to} N(0, 1).$$
See the homework to turn this into a confidence interval.

Plug-in estimates

Now to estimate the standard error. It is easier to solve the inequality
$$\left|\frac{\hat F_n(x) - F(x)}{\text{SE}}\right| \le z_{\alpha/2}$$
if the term SE does not contain the unknown quantity $F(x)$. This is why we use an estimated standard error. In our example we estimate $\sqrt{F(x)[1 - F(x)]/n}$ by replacing $F(x)$ with $\hat F_n(x)$:
$$\hat\sigma_{\hat F_n(x)} = \sqrt{\frac{\hat F_n(x)[1 - \hat F_n(x)]}{n}}.$$
This is an example of a general strategy: plug-in. Start with an estimator, confidence interval or test whose formula depends on some other parameter, and plug in an estimate of that other parameter. Sometimes this changes the behaviour of our procedure, and sometimes, at least in large samples, it doesn't.

Pointwise versus Simultaneous Confidence Limits

In our example Slutsky's theorem shows
$$\frac{\hat F_n(x) - F(x)}{\hat\sigma_{\hat F_n(x)}} \overset{d}{\to} N(0, 1),$$
so there was no change in the limit law (alternative jargon for distribution). We now have two pointwise 95% confidence intervals:
$$\hat F_n(x) \pm z_{0.025} \sqrt{\hat F_n(x)[1 - \hat F_n(x)]/n}$$
or
$$\left\{F(x) : \left|\frac{\sqrt{n}\,(\hat F_n(x) - F(x))}{\sqrt{F(x)[1 - F(x)]}}\right| \le z_{0.025}\right\}.$$
When we use these intervals they depend on $x$, and we usually look at a plot of the results against $x$. If we pick out an $x$ for which the confidence interval is surprising to us, we may well be picking one of the $x$ values for which the confidence interval misses its target.
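A sketch of the first (Wald-type) pointwise band in R, with assumed data; the band is evaluated at the sorted sample points:

set.seed(5)
x = rnorm(100); n = length(x)
xs = sort(x)
Fhat = ecdf(x)(xs)
se = sqrt(Fhat * (1 - Fhat) / n)
lower = Fhat - qnorm(0.975) * se
upper = Fhat + qnorm(0.975) * se
plot(xs, Fhat, type = "s")
lines(xs, lower, lty = 2); lines(xs, upper, lty = 2)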

Simultaneous intervals

So we really want
$$P_F\left(L(X, x) \le F(x) \le U(X, x) \text{ for all } x\right) \ge 1 - \alpha.$$
In that case the confidence intervals are called simultaneous. Two possible methods: one exact but conservative, one approximate and less conservative.

Dvoretzky-Kiefer-Wolfowitz inequality:
$$P_F\left(\exists x : |\hat F_n(x) - F(x)| > \sqrt{\frac{\log(2/\alpha)}{2n}}\right) \le \alpha.$$
Limit theory:
$$P_F\left(\exists x : \sqrt{n}\,|\hat F_n(x) - F(x)| > y\right) \to P\left(\sup_{0 \le t \le 1} |B^0(t)| > y\right),$$
where $B^0$ is a Brownian Bridge (a special Gaussian process).
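A minimal sketch of the DKW band with assumed data; solving $2e^{-2n\epsilon^2} = \alpha$ gives the constant half-width $\epsilon = \sqrt{\log(2/\alpha)/(2n)}$:

set.seed(6)
x = rnorm(100); n = length(x); alpha = 0.05
eps = sqrt(log(2 / alpha) / (2 * n))   # half-width, the same at every x
xs = sort(x)
Fhat = ecdf(x)(xs)
plot(xs, Fhat, type = "s")
lines(xs, pmax(Fhat - eps, 0), lty = 2)   # clip the band to [0, 1]
lines(xs, pmin(Fhat + eps, 1), lty = 2)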

Statistical Functionals

Not all parameters are created equal. In the Weibull model with density
$$f(x; \alpha, \beta) = \frac{\alpha}{\beta}\left(\frac{x}{\beta}\right)^{\alpha - 1} \exp\{-(x/\beta)^\alpha\}\, 1(x > 0)$$
there are two parameters: the shape $\alpha$ and the scale $\beta$. These parameters have no meaning in other densities. But every distribution has a median and other quantiles:
$$p\text{th quantile} = \inf\{x : F(x) \ge p\}.$$
If $r$ is a bounded function then every distribution has a value for the parameter
$$\phi \equiv E_F(r(X)) \equiv \int r(x)\, dF(x).$$
Most distributions have a mean, variance and so on. A function from the set of all cdfs to the real line is called a statistical functional. Example: $E_F(X^2) - [E_F(X)]^2$.
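As an aside (not on the slides), the empirical version of the quantile functional, $\inf\{x : \hat F_n(x) \ge p\}$, is a sample quantile; in R, quantile() with type = 1 computes exactly this inverse of the EDF. Assumed data:

set.seed(7)
x = rexp(50)
quantile(x, probs = 0.5, type = 1)   # plug-in median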

Statistical functionals

The statistical functional $T(F) = \int r(x)\, dF(x)$ is linear. The sample variance is not a linear functional. Statistical functionals are often estimated using plug-in estimates, so
$$T(\hat F_n) = \int r(x)\, d\hat F_n(x) = \frac{1}{n} \sum_{i=1}^n r(X_i).$$
This estimate is unbiased and has variance
$$\sigma^2_{T(\hat F_n)} = n^{-1}\left[\int r^2(x)\, dF(x) - \left\{\int r(x)\, dF(x)\right\}^2\right].$$
This can in turn be estimated using a plug-in estimate:
$$\hat\sigma^2_{T(\hat F_n)} = n^{-1}\left[\int r^2(x)\, d\hat F_n(x) - \left\{\int r(x)\, d\hat F_n(x)\right\}^2\right].$$
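A sketch of the plug-in estimate and its estimated SE for one linear functional, $T(F) = E_F(X^2)$, i.e. $r(x) = x^2$, with assumed data:

set.seed(8)
x = rnorm(100); n = length(x)
r = x^2
That = mean(r)                              # integral of r dFhat_n
sehat = sqrt((mean(r^2) - mean(r)^2) / n)   # plug-in SE from the formula above
c(That, sehat)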

Plug-in estimates of functionals; bootstrap standard errors

When $r(x) = x$ we have $T(F) = \mu_F$ (the mean). The standard error is $\sigma/\sqrt{n}$. The plug-in estimate of the SE replaces $\sigma$ with the sample SD (with $n$, not $n - 1$, as the divisor). Now consider a general functional $T(F)$. The plug-in estimate of this is $T(\hat F_n)$. The plug-in estimate of the standard error of this estimate is
$$\sqrt{\text{Var}_{\hat F_n}(T(\hat F_n))}$$
(the variance, under sampling from $\hat F_n$, of the statistic $T$ computed from the EDF of the new sample), which is hard to read and seems hard to calculate in general. The solution is to simulate, particularly to estimate the standard error.
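For instance, the median is a functional whose SE has no simple closed form; a bootstrap sketch with assumed data (B resamples of size $n$ from $\hat F_n$) is:

set.seed(9)
x = rexp(50); B = 5000
medstar = replicate(B, median(sample(x, replace = TRUE)))
sd(medstar)   # bootstrap estimate of the SE of the sample median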

Basic Monte Carlo

To compute a probability or expected value we can simulate. Example: to compute $P(|X| > 2)$ for $X \sim N(0, 1)$, use software to generate some number, say $M$, of replicates $X_1, \ldots, X_M$, all having the same distribution as $X$, and estimate the desired probability using the sample fraction. R code:

x = rnorm(1000000); y = rep(0, 1000000); y[abs(x) > 2] = 1; sum(y)

This produced 45348 when I tried it, giving $\hat p = 0.045348$. The correct answer is 0.04550026, so using a million samples gave 2 correct digits, with an error of 2 in the third digit. Using $M = 10000$ is more common; I got $\hat p = 0.0484$. The SE of $\hat p$ is $\sqrt{p(1 - p)/M} \approx 0.0021$, so an error of up to 4 in the second significant digit is likely.

The bootstrap

In bootstrapping, the single variable $X$ is replaced by the whole data set. Generate new data sets ($X^*$) from the distribution $F$ of $X$. We don't know $F$, so use $\hat F_n$. Example: we are interested in the distribution of the $t$ pivot
$$t = \frac{\sqrt{n}\,(\bar X - \mu)}{s}.$$
We have data $X_1, \ldots, X_n$ but don't know $\mu$ or the cdf of the $X$s, so replace these by quantities computed from $\hat F_n$. Call
$$\mu^* = \int x\, d\hat F_n(x) = \bar X.$$
Draw $X^*_{1,1}, \ldots, X^*_{1,n}$, an iid sample from the cdf $\hat F_n$. Repeat $M$ times, computing $t^*$ from the starred values each time.

Bootstrapping the t pivot

Here is R code (the slide did not set M; M = 10000 matches the text):

M = 10000
x = runif(5)          # the observed data
mustar = mean(x)      # mu* = Xbar, the mean of Fhat_n
tv = rep(0, M)
tstarv = rep(0, M)
for (i in 1:M) {
  xn = runif(5)       # parametric simulation: fresh data from the true model
  tv[i] = sqrt(5) * mean(xn - 0.5) / sqrt(var(xn))
  xstar = sample(x, 5, replace = TRUE)   # bootstrap sample from Fhat_n
  tstarv[i] = sqrt(5) * mean(xstar - mustar) / sqrt(var(xstar))
}

Bootstrapping a pivot continued

The loop does two simulations. In xn and tv we do parametric bootstrapping: simulate the t-pivot from the parametric model. xstar is a bootstrap sample from the population x, and tstarv is the t-pivot computed from xstar. The original data set is

(0.7432447, 0.8355277, 0.8502119, 0.3499080, 0.8229354)

so mustar = 0.7203655. Side-by-side histograms of tv and tstarv are on the next slide.

Bootstrap distribution histograms

[Figure: side-by-side density histograms of tv (left) and tstarv (right); the horizontal axes run from -20 to 20 and the vertical density axes from 0.0 to 0.4.]
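The figure itself is not reproduced here, but a plot along these lines (using tv and tstarv from the code above) would regenerate it:

par(mfrow = c(1, 2))
hist(tv, freq = FALSE, main = "tv")
hist(tstarv, freq = FALSE, main = "tstarv")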

Using the bootstrap distribution

Confidence intervals based on the t-statistic $T = \sqrt{n}\,(\bar X - \mu)/s$: use the bootstrap distribution to estimate $P(|T| > t)$, and adjust $t$ to make this 0.05; call the result $c$. Solving $|T| \le c$ gives the interval $\bar X \pm c\, s/\sqrt{n}$. I get $c = 22.04$, $\bar x = 0.720$, $s = 0.211$; the interval is $-1.36$ to $2.80$. A pretty lousy interval. Is this because the bootstrap is a bad idea? Repeat, but simulate $\bar X - \mu$ instead. We learn
$$P(\bar X^* - \mu^* < -0.192) = 0.025 = P(\bar X^* - \mu^* > 0.119).$$
Solving the inequalities gives the (much better) interval
$$0.720 - 0.119 < \mu < 0.720 + 0.192.$$
Of course, this interval missed the true value!

Monte Carlo Study

So how well do these methods work?

Theoretical analysis: let $C_n$ be the resulting interval. Assume the number of bootstrap replicates is so large that we can ignore simulation error, and compute
$$\lim_{n\to\infty} P_F(\mu(F) \in C_n).$$
The method is asymptotically valid (or calibrated, or accurate) if this limit is $1 - \alpha$.

Simulation analysis: generate many data sets of size 5 from the Uniform distribution. Bootstrap each data set and compute $C_n$. Count the number of simulated uniform data sets with $0.5 \in C_n$ to get the coverage probability. Repeat with (many) other distributions.

R code

tstarint = function(x, M = 10000) {
  n = length(x)
  must = mean(x)
  se = sqrt(var(x) / n)
  # each row of xn is one bootstrap sample of size n
  xn = matrix(sample(x, n * M, replace = TRUE), nrow = M)
  dev = rowMeans(xn) - must                # Xbar* - Xbar for each sample
  tst = dev / sqrt(apply(xn, 1, var) / n)  # bootstrap t-pivots
  c1 = quantile(dev, c(0.025, 0.975))
  c2 = quantile(abs(tst), 0.95)
  # interval from Xbar* - Xbar, then interval from the t-pivot
  c(must - c1[2], must - c1[1], must - c2 * se, must + c2 * se)
}

R code

lims = matrix(0, 1000, 4)
count = lims
for (i in 1:1000) {
  x = runif(5)
  lims[i, ] = tstarint(x)
}
# an interval covers when its lower end is below 0.5 and its upper end above
count[, 1][lims[, 1] < 0.5] = 1
count[, 2][lims[, 2] > 0.5] = 1
count[, 3][lims[, 3] < 0.5] = 1
count[, 4][lims[, 4] > 0.5] = 1
sum(count[, 1] * count[, 2])   # coverage count, Xbar* - Xbar intervals
sum(count[, 3] * count[, 4])   # coverage count, t-pivot intervals

Results

804 out of 1000 intervals based on $\bar X - \mu$ cover the true value of 0.5; 972 out of 1000 intervals based on the t statistic cover the true value. That was the uniform distribution. Try another distribution: for the exponential I get 909 and 948. Try another sample size: for the uniform with $n = 25$ I got 921 and 941.