STAT440/840: Statistical Computing


Paul Marriott, MC 6096
February 2, 2005

Chapter 3: Data resampling: the bootstrap

Suppose we are interested in a quantity θ = θ(F) that depends on the distribution function F, and that we have available an independent random sample X = (X_1, ..., X_n), where each X_i ~ F. For example, θ might be

- the mean, in which case θ = ∫ x dF(x),
- the median, θ = F⁻¹(1/2),
- the variance, θ = ∫ (x − ∫ x dF(x))² dF(x).

Estimate of θ

From the observed data we have the single value θ̂_n = θ̂_n(X) as an estimate of θ. For example, θ̂_n might be

- the sample mean, in which case θ̂_n = Σ x_i / n,
- the sample median,
- the sample variance.

The question then arises: how accurate is our single value θ̂_n as an estimate of θ?

Idealised Monte Carlo

1. Generate Y_j = (Y_{j1}, ..., Y_{jn}), where Y_{ji} ~ F(·) for i = 1, ..., n and j = 1, ..., N.
2. Calculate θ̂_n(Y_j) for each j = 1, ..., N.
3. Estimate the variability of θ̂_n using the N samples.

The thing that prevents us getting started with this scheme, of course, is that we cannot generate the Y_j, because the distribution function F is not known.
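
As a concrete R sketch of this idealised scheme, assume (purely for illustration; none of these choices come from the slides) that F is a known Exponential(1) distribution, that θ̂_n is the sample median, and that n = 50 and N = 1000:

n <- 50
N <- 1000
theta.hat <- function(y) median(y)            # the statistic of interest
reps <- replicate(N, theta.hat(rexp(n, 1)))   # steps 1 and 2: simulate from F, evaluate theta-hat
sd(reps)                                      # step 3: variability of theta-hat over the N samples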

Idealised Monte Carlo

A way of getting round this is to approximate the unknown distribution function F by another distribution function F̂, where F̂ is obtained from the observed data. We can then proceed as above, but generating from F̂ instead of F. This is the approach that underpins the bootstrap method as a computational technique.

Empirical distribution function

The empirical distribution function is defined by

F̂(x) = #{X_i ≤ x} / n = (1/n) Σ_{i=1}^{n} I(X_i ≤ x).

This puts a mass of probability 1/n at each point in the sample. Sampling from F̂ means drawing X_i from the sample with probability 1/n.
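
In R, the empirical distribution function and sampling from it can be sketched as follows; x here is a small hypothetical data vector, not one of the course data sets:

x <- c(4.2, 5.1, 3.8, 6.0, 4.9)      # hypothetical observed sample
Fhat <- ecdf(x)                       # step function putting mass 1/n on each X_i
Fhat(5)                               # proportion of observations <= 5
xstar <- sample(x, size = length(x), replace = TRUE)   # one sample of size n from Fhat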

The nonparametric bootstrap

We are interested in the distribution of θ̂, but we cannot work this out because we do not know F. However, F̂(x) converges to F(x) as the sample size n → ∞. Thus, provided n is large enough, F̂(·) may be a good approximation to F(·). The key step here is that the unknown distribution function F(·) is replaced by the (known) empirical distribution function F̂(·).

The approximations

The unknown distribution function F(·) is approximated by the empirical distribution function F̂(·). The exact distribution of θ̂ given F̂ is called the ideal bootstrap distribution of θ̂_n, and it is often difficult to calculate. The approximate behaviour of the random variable θ(F̂_n) may instead be examined through simulation: we generate N independent samples of size n from F̂(·) and evaluate the sample value of θ̂ for each of these N samples. Clearly, the larger the value of N, the closer this empirical distribution will be to the ideal bootstrap distribution of θ̂_n.

Sampling from the empirical distribution function

For the nonparametric bootstrap:

1. Generate independent integers i_1, ..., i_n from the discrete uniform distribution on {1, 2, ..., n}.
2. Set X* = (X_{i_1}, X_{i_2}, ..., X_{i_n}).

Then X* is a sample of size n from F̂(·). This scheme is equivalent to sampling n values randomly with replacement from the observed data X_1, ..., X_n.
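
A direct R translation of this two-step scheme, with a hypothetical stand-in for the observed data:

x <- rnorm(20)                                 # stand-in for the observed data
n <- length(x)
idx <- sample(1:n, size = n, replace = TRUE)   # step 1: uniform indices on {1, ..., n}
xstar <- x[idx]                                # step 2: the bootstrap sample X*
# equivalently: xstar <- sample(x, size = n, replace = TRUE)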

The nonparametric bootstrap algorithm

The nonparametric bootstrap algorithm proceeds as follows:

- Generate N independent bootstrap samples X*_1, X*_2, ..., X*_N, each consisting of n data values drawn randomly with replacement from the original data X.
- Evaluate the sample value of θ corresponding to each bootstrap sample: θ̂*_i = θ(X*_i) for i = 1, ..., N.

These are called the bootstrap replications of θ̂_n.

Output

Bootstrap estimate of the standard error of θ̂_n. This is defined as the sample standard error of the N bootstrap replications, i.e.

ŝe_N = { (1/(N − 1)) Σ_{i=1}^{N} (θ̂*_i − θ̄*)² }^{1/2},  where  θ̄* = (1/N) Σ_{i=1}^{N} θ̂*_i.

The standard error is sometimes called the standard deviation. The value of ŝe_N converges to the standard error of θ(F̂_n) as N → ∞. This limiting value is (of course) the standard error of the ideal bootstrap distribution of θ̂_n.
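
Given a vector bootvals of N bootstrap replications, ŝe_N is just their sample standard deviation; a sketch (bootvals below is a random stand-in for real replications):

bootvals <- rnorm(1000)        # stand-in for N bootstrap replications
theta.bar <- mean(bootvals)
se.N <- sqrt(sum((bootvals - theta.bar)^2) / (length(bootvals) - 1))
se.N                           # identical to sd(bootvals)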

Output

Bootstrap estimate of the bias of θ̂_n. The bias of θ̂_n as an estimator of θ is defined to be E{θ̂} − θ(F), where θ(F) means the theoretical value of θ under the true model F. The bootstrap estimate of bias based on N bootstrap replications is then

bias_N = θ̄* − θ̂_n,  where  θ̄* = (1/N) Σ_{i=1}^{N} θ̂*_i.
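
Continuing the same sketch (theta.hat stands for the estimate computed from the original data; both quantities below are stand-ins):

bootvals <- rnorm(1000)                  # stand-in for the N bootstrap replications
theta.hat <- 0                           # stand-in for the original-sample estimate
bias.N <- mean(bootvals) - theta.hat     # bootstrap estimate of bias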

Output

Bootstrap-based confidence intervals. Instead of using the bootstrap standard error as a measure of precision, we may use the bootstrap to construct approximate confidence intervals. (However, N typically needs to be much larger in this case.) Let θ̂*_(1) < ... < θ̂*_(N) denote the ordered bootstrap replications of θ. Then an approximate 100(1 − 2α)% bootstrap-based confidence interval for θ is

[ θ̂*_(αN), θ̂*_((1−α)N) ].

This is easiest to implement when N is chosen so that αN is an integer, but this need not be the case.
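
A sketch of this percentile interval in R (bootvals again a stand-in for the replications; α = 0.025 gives an approximate 95% interval):

bootvals <- rnorm(1000)       # stand-in for the N bootstrap replications
alpha <- 0.025
N <- length(bootvals)
sorted <- sort(bootvals)
ci <- c(sorted[ceiling(alpha * N)], sorted[floor((1 - alpha) * N)])
ci
# letting R interpolate when alpha * N is not an integer:
quantile(bootvals, probs = c(alpha, 1 - alpha))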

Simple bootstrapping in R

bootstrap <- function(x, nboot, theta, ...) {
  # draw nboot resamples of size length(x), one per row
  data <- matrix(sample(x, size = length(x) * nboot, replace = T),
                 nrow = nboot)
  # apply the statistic theta to each row (i.e. to each bootstrap sample)
  answer <- apply(data, 1, theta, ...)
  answer
}

Example: The Old Faithful geyser

[Figure: histogram of the observed geyser data (relative frequency against observed values).]
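
The bootstrap distribution shown on the next slide can be reproduced along the following lines using the bootstrap function above; this sketch uses R's built-in faithful data as a stand-in for the course's geyser data:

observed <- faithful$eruptions               # stand-in for the observed geyser data
bootvals <- bootstrap(observed, 1000, mean)  # 1000 bootstrap replications of the sample mean
hist(bootvals, freq = FALSE)                 # relative-frequency histogram, as on the next slide
sd(bootvals)                                 # bootstrap estimate of the standard error of the mean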

Bootstrap distribution of the sample mean

[Figure: histogram of bootvals (relative frequency against the bootstrap replications of the mean).]

How many bootstrap replications?

Note that the amount of computer time required increases linearly with N. It can be shown that

var(ŝe_N) ≈ c_1/n² + c_2/(nN),

where c_1 and c_2 are constants that depend on the underlying population distribution F, but not on n or N. The first term represents sampling variation, and it tends to zero as the sample size increases. The second term represents the resampling variation, and it approaches zero as N → ∞ for fixed n.

How many bootstrap replications?

Thus ŝe_N always has a greater standard deviation than ŝe_∞, but the practical question is: how much greater? An approximate but quite satisfactory answer can be obtained by looking at the coefficient of variation of ŝe_N, i.e. the ratio of the standard error of ŝe_N to its expected value.

How many replications?

It can be shown that

cv(ŝe_N) ≈ { cv(ŝe_∞)² + (E(Δ̂) + 2)/(4N) }^{1/2}.

Here, Δ̂ is a parameter that depends on how long the tail of the distribution of θ(F̂_n) is, and ŝe_∞ is the ideal bootstrap estimate of standard error.

How many replications?

In practice, Δ̂ is very likely to be less than 10, and the smallest possible value of Δ̂ is −2. An important consequence of this is that, for the values of cv(ŝe_∞) and Δ̂ that are likely to arise in practice, cv(ŝe_N) is unlikely to be much greater than cv(ŝe_∞) for N > 200.
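
The practical message can be checked by evaluating the cv formula over a range of N; the values of cv(ŝe_∞) and E(Δ̂) below are illustrative assumptions, not taken from the slides:

cv.inf <- 0.25                 # assumed ideal coefficient of variation
EDelta <- 0                    # assumed value of E(Delta-hat)
N <- c(25, 50, 100, 200, 500, 1000)
cv.N <- sqrt(cv.inf^2 + (EDelta + 2) / (4 * N))
round(cbind(N, cv.N), 3)       # cv.N settles down close to cv.inf once N is a few hundred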

How many replications?

The following rules of thumb are given by Efron and Tibshirani:

- Even a small number of bootstrap samples, N = 25 say, is usually informative and is often sufficient to give a good estimate of the standard error of θ̂_n.
- It is seldom that more than N = 200 bootstrap replications are needed for estimating a standard error.
- However, much bigger values of N are required for constructing bootstrap-based confidence intervals.

Some worked examples: more complicated data structures

American Law Schools. Two measurements were made on the entering class of each school in 1973: LSAT, the average score for the class on a national law test, and GPA, an average of the student grades for the whole class.

[Table: the 15 observed (LSAT, GPA) pairs.]

We are interested in the standard error of the estimated correlation between these two measurements.

Law School example

law <- matrix(ncol = 2, nrow = 15)
law[,1] <- c(576, 635, 558, 578, ...)      # LSAT column
law[,2] <- c(3.39, 3.30, 2.81, 3.03, ...)  # GPA column

# statistic: correlation of the two columns for the resampled row indices
theta.fn1 <- function(selected, xdata) {
  answer <- cor(xdata[selected,1], xdata[selected,2])
  answer
}

# bootstrap the row indices 1:15, with 1000 replications
bootvals <- bootstrap(1:15, 1000, theta.fn1, xdata = law)
hist(bootvals)
abline(v = cor(law[,1], law[,2]))        # observed correlation
mean(bootvals) - cor(law[,1], law[,2])   # bootstrap estimate of bias

Law school example

[Figure: histogram of bootvals (frequency against the bootstrap replications of the correlation).]

Comparing two samples: the mouse data

In this example we are interested in testing the difference in means of two random samples.

[Table: survival times (in days) for the Treatment and No-treatment groups.]

The mean survival time for the treatment group differs from that for the no-treatment group; we bootstrap to see whether this difference is significant.

Mouse data

Let F and G denote the population distributions of the treatment and no-treatment data respectively. The statistical question (hypothesis) of interest is whether the means of F and G are equal, i.e. whether θ(F, G) = µ(F) − µ(G) = 0 in the obvious notation. From the observed data we have a single value of the random variable θ̂_{7,9}(F, G), the difference between the sample means of independent samples of sizes 7 and 9 from F and G respectively: take independent values x_1, ..., x_7 from F and independent values y_1, ..., y_9 from G, and construct θ̂_{7,9} = x̄ − ȳ, where x̄ is the sample mean of the x_i values, etc.
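
The slides do not show code for this two-sample bootstrap; a minimal sketch, resampling each group independently, with treat and control as hypothetical stand-ins for the two sets of survival times:

treat   <- rexp(7, rate = 1/80)    # stand-in for the 7 treatment survival times
control <- rexp(9, rate = 1/55)    # stand-in for the 9 no-treatment survival times
obs.diff <- mean(treat) - mean(control)     # observed value of theta-hat_{7,9}
boot.diff <- replicate(1000, {
  mean(sample(treat, replace = TRUE)) -     # resample within the treatment group
    mean(sample(control, replace = TRUE))   # resample within the no-treatment group
})
sd(boot.diff)                               # bootstrap standard error of the difference
obs.diff / sd(boot.diff)                    # crude check of how large the observed difference is

Resampling within each group keeps the two sample sizes at 7 and 9, matching the construction above.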

Mouse data

[Figure: histogram of bootvals (frequency against the bootstrap replications of the difference in means).]

Regression

[Figure: scatterplot of strength against curing.time.] Observed values of strength and curing time, with the best-fitting (in a least-squares sense) linear regression line.

Regression

There are two possible nonparametric bootstrap approaches:

Bootstrapping pairs. Construct a bootstrap sample of size 50 by sampling uniformly with replacement from the (y_i, t_i) pairs. Estimate α and β by least squares for each of these bootstrap samples. Repeat many times, and thus obtain the required bootstrap distributions of α̂ and β̂.

Bootstrapping residuals. From the fitted model ŷ = α̂ + β̂t, construct the residuals r_i = y_i − α̂ − β̂t_i. Construct a bootstrap sample of residuals r*_1, ..., r*_50 by sampling uniformly with replacement from r_1, ..., r_50. From r*_1, ..., r*_50, construct a bootstrap replication of the y-data via y*_i = α̂ + β̂t_i + r*_i. Re-fit the model to these (y*_i, t_i) values to obtain bootstrap estimates of α and β. Repeat, and proceed as usual to obtain the required bootstrap distributions.

Code

theta.lr <- function(selected, xdata, drawplot = FALSE) {
  # regress strength (column 1) on curing time (column 2) for the resampled rows
  dummy <- lm(xdata[selected,1] ~ xdata[selected,2])
  if (drawplot == T) abline(dummy)
  answer <- dummy$coef
  answer
}

# bootstrapping pairs: resample the 50 row indices, 25 replications
bootvals <- bootstrap(1:50, 25, theta.lr,
                      xdata = cbind(strength, curing.time), drawplot = T)

Regression

[Figure: scatterplot of strength against curing.time, with the bootstrap regression lines from bootstrapping pairs overlaid.]

Regression

[Figure: histograms of bootvals[1, ] (intercept parameter, alpha) and bootvals[2, ] (slope parameter, beta).]

Regression

theta.res <- function(res, xdata, alpha, beta, drawplot = FALSE) {
  # xdata contains the curing times
  # build a bootstrap replication of the y-data: fitted line plus resampled residuals
  y.boot <- alpha + beta * xdata + res
  answer <- lm(y.boot ~ xdata)
  if (drawplot == T) abline(answer)
  answer <- answer$coef
  answer
}

# bootstrapping residuals: resample res.vals (the fitted residuals), 25 replications;
# al and be hold the least-squares estimates of alpha and beta from the original fit
bootvals <- bootstrap(res.vals, 25, theta.res, xdata = curing.time,
                      alpha = al, beta = be, drawplot = T)

Regression

[Figure: scatterplot of strength against curing.time, with the bootstrap regression lines from bootstrapping residuals overlaid.]

Regression

[Figure: histograms of bootvals[1, ] (intercept parameter, alpha) and bootvals[2, ] (slope parameter, beta) from bootstrapping residuals.]

Regression

We have two approaches that give very similar answers for this set of data. Which method is best in general? The answer depends on how far we believe the assumed structure of the regression model. Under the second approach, we assume that the order of the residuals is not important, so that the residual corresponding to any t_i value is equally likely to have arisen with any other t_j value. This corresponds to assuming that the distribution of the error ε_i does not depend on t_i. Bootstrapping pairs does not make this assumption, and so is more robust than bootstrapping residuals.

An example of the parametric bootstrap

boot.bvn <- function(nboot, ndata, m1, m2, v1, v2, rho, ...) {
  # nboot is the number of bootstrap repetitions
  # ndata is the number of data points
  how.many <- nboot * ndata
  # generate standard bivariate normal pairs with correlation rho,
  # then rescale to the fitted means and variances
  X <- rnorm(how.many, 0, 1)
  X1 <- m1 + sqrt(v1) * X
  X.mat <- matrix(X1, nrow = nboot)
  Y <- rnorm(how.many, rho * X, sqrt(1 - rho^2))
  Y1 <- m2 + sqrt(v2) * Y
  Y.mat <- matrix(Y1, nrow = nboot)
  # each row is one bootstrap sample: ndata x-values followed by ndata y-values
  data <- cbind(X.mat, Y.mat)
  answer <- apply(data, 1, theta.bvn, n = ndata)
  answer
}
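
The statistic function theta.bvn is called above but not shown on the slides; a plausible sketch (an assumption, not the course's code) that splits one simulated row back into its x- and y-parts and returns the sample correlation:

theta.bvn <- function(row, n) {
  # row holds n x-values followed by n y-values (see the layout in boot.bvn)
  x <- row[1:n]
  y <- row[(n + 1):(2 * n)]
  cor(x, y)
}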

Parametric or nonparametric bootstrap?

If the family has distribution F_α, where α is the (vector of) unknown parameters, then fitting F_α to the data involves choosing some particular value α̂ according to a goodness-of-fit criterion, maximum likelihood for example. Once α̂ has been found, bootstrap samples of size n are generated from F_α̂ and used exactly as before to obtain bootstrap replications of θ̂_n. This is called the parametric bootstrap: you resample from an estimated member of the family.

Parametric Bootstrap

To illustrate the parametric bootstrap, we will refer back to the Law School data and assume that the observed values are from a bivariate normal distribution. Let (x_i, y_i), i = 1, ..., 15, denote the observed (LSAT, GPA) values. Thus we assume that the (x_i, y_i) values are an independent sample from N_2(µ_1, µ_2, σ_1², σ_2²; ρ). The first thing we need to do is pick a particular member of the bivariate normal family.

Parametric Bootstrap

Here (µ_1, µ_2) can be estimated reasonably by (x̄, ȳ). The covariance matrix

[ var(X)     cov(X, Y) ]   [ σ_1²        ρ σ_1 σ_2 ]
[ cov(X, Y)  var(Y)    ] = [ ρ σ_1 σ_2   σ_2²      ]

may be estimated reasonably by

(1/14) [ Σ(x_i − x̄)²           Σ(x_i − x̄)(y_i − ȳ) ]
       [ Σ(x_i − x̄)(y_i − ȳ)   Σ(y_i − ȳ)²         ].

This yields µ̂_1 = 600.3 and µ̂_2 = 3.1, together with the corresponding sample variances and sample correlation ρ̂. From this estimated distribution we can resample.
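
Putting the pieces together, a sketch of the parametric bootstrap for the correlation, assuming the law matrix defined in the Law School example holds all 15 (LSAT, GPA) pairs and using boot.bvn and theta.bvn as above:

# plug-in (fitted) parameters of the bivariate normal
m1 <- mean(law[,1]);  m2 <- mean(law[,2])
v1 <- var(law[,1]);   v2 <- var(law[,2])
rho <- cor(law[,1], law[,2])

# 1000 parametric bootstrap replications of the correlation, samples of size 15
bootvals <- boot.bvn(1000, 15, m1, m2, v1, v2, rho)
hist(bootvals)
sd(bootvals)    # parametric bootstrap estimate of the standard error of the correlation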

Parametric Bootstrap

[Figure: histogram of bootvals (frequency against the parametric bootstrap replications of the correlation).]
