STAT440/840: Statistical Computing
Paul Marriott, MC 6096
February 2, 2005
Chapter 3: Data resampling: the bootstrap

Suppose we are interested in a quantity $\theta = \theta(F)$ that depends on the distribution function $F$, and that we have available an independent random sample $X = (X_1, \ldots, X_n)$ where each $X_i \sim F$. For example, $\theta$ might be:

- the mean, in which case $\theta = \int x \, dF(x)$,
- the median, $\theta = F^{-1}(1/2)$,
- the variance, $\theta = \int \left( x - \int x \, dF(x) \right)^2 dF(x)$.
Estimate of θ

From the observed data, we have the single value $\hat\theta_n = \hat\theta_n(X)$ as an estimate of $\theta$. For example, $\hat\theta_n$ might be:

- the sample mean, in which case $\hat\theta_n = \sum x_i / n$,
- the sample median,
- the sample variance.

The question then arises: how accurate is our single value $\hat\theta_n$ as an estimate of $\theta$?
Idealised Monte Carlo

1. Generate $Y_j = (Y_{j1}, \ldots, Y_{jn})$ where $Y_{ji} \sim F(\cdot)$ for $i = 1, \ldots, n$ and $j = 1, \ldots, N$.
2. Calculate $\hat\theta_n(Y_j)$ for each $j = 1, \ldots, N$.
3. Estimate the variability using the $N$ samples.

The thing that prevents us getting started with this scheme, of course, is that we cannot generate $Y$ because the distribution function $F$ is not known.
Idealised Monte Carlo

A way of getting round this is to approximate the unknown distribution function $F$ by another distribution function $\hat F$, where $\hat F$ is obtained from the observed data. We can then proceed as above, but generating from $\hat F$ instead of $F$. This is the approach that underpins the bootstrap method as a computational technique.
Empirical distribution function

The empirical distribution function is defined by
$$\hat F(x) = \frac{\#\{X_1 \le x, \ldots, X_n \le x\}}{n} = n^{-1} \sum_{i=1}^{n} I(X_i \le x).$$
This puts a mass of probability $1/n$ at each point in the sample. Sampling from $\hat F$ means drawing $X_i$ from the sample with probability $1/n$.
The nonparametric bootstrap

We are interested in the distribution of $\hat\theta$, but we cannot work this out because we do not know $F$. However, $\hat F(x)$ converges to $F(x)$ in the limit as the sample size $n \to \infty$. Thus, provided $n$ is large enough, $\hat F(\cdot)$ may be a good approximant of $F(\cdot)$. The key step here is that the unknown distribution function $F(\cdot)$ is replaced by the (known) empirical distribution function $\hat F(\cdot)$.
The approximations

The unknown distribution function $F(\cdot)$ is approximated by the empirical distribution function $\hat F(\cdot)$. The exact distribution of $\hat\theta$ given $\hat F$ is called the ideal bootstrap distribution of $\hat\theta_n$, and is often difficult to calculate. The approximate behaviour of the random variable $\theta(\hat F_n)$ may be examined through simulation: we generate $N$ independent samples of size $n$ from $\hat F(\cdot)$, and evaluate the sample value of $\hat\theta$ for each of these $N$ samples. Clearly, the larger the value of $N$, the closer this empirical distribution will be to the ideal bootstrap distribution of $\hat\theta_n$.
Sampling from the empirical distribution function

For the nonparametric bootstrap:

1. Generate independent integers $i_1, \ldots, i_n$ from the discrete uniform distribution on $\{1, 2, \ldots, n\}$.
2. Set $X^* = (X_{i_1}, X_{i_2}, \ldots, X_{i_n})$.

Then $X^*$ is a sample of size $n$ from $\hat F(\cdot)$. This scheme is equivalent to sampling $n$ values randomly with replacement from the observed data $X_1, \ldots, X_n$.
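The two-step scheme and the direct resampling it is equivalent to can be sketched in R (the data vector here is hypothetical, for illustration only):

```r
# A small illustrative sample (hypothetical values)
x <- c(4.1, 2.7, 5.3, 3.9, 4.8)
n <- length(x)

# Step 1: generate n independent indices, uniform on {1, ..., n}
idx <- sample(1:n, size = n, replace = TRUE)

# Step 2: form the bootstrap sample X* = (X_{i1}, ..., X_{in})
xstar <- x[idx]

# Equivalently, sample the data values with replacement directly
xstar2 <- sample(x, size = n, replace = TRUE)
```

Both `xstar` and `xstar2` are samples of size $n$ from $\hat F$.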
The nonparametric bootstrap algorithm

The nonparametric bootstrap algorithm proceeds as follows. Generate $N$ independent bootstrap samples $X^{*1}, X^{*2}, \ldots, X^{*N}$, each consisting of $n$ data values drawn randomly with replacement from the original data $X$. Evaluate the sample value of $\theta$ corresponding to each bootstrap sample: $\hat\theta_n^{*i} = \theta(X^{*i})$ for $i = 1, \ldots, N$. These are called bootstrap replications of $\hat\theta_n$.
Output

Bootstrap estimate of the standard error of $\hat\theta_n$. This is defined as the sample standard error of the $N$ bootstrap replications, i.e.
$$\widehat{se}_N = \left\{ (N-1)^{-1} \sum_{i=1}^{N} \left( \hat\theta_n^{*i} - \bar\theta_n^{*} \right)^2 \right\}^{1/2},
\quad \text{where } \bar\theta_n^{*} = N^{-1} \sum_{i=1}^{N} \hat\theta_n^{*i}.$$
The standard error is sometimes called the standard deviation. The value of $\widehat{se}_N$ converges to the standard error of $\theta(\hat F_n)$ as $N \to \infty$. This limiting value is (of course) the standard error of the ideal bootstrap distribution of $\hat\theta_n$.
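Given a vector of bootstrap replications, $\widehat{se}_N$ is just their sample standard deviation. A sketch with hypothetical replication values:

```r
# Hypothetical bootstrap replications of theta
bootvals <- c(4.2, 3.9, 4.5, 4.1, 4.4, 3.8)
N <- length(bootvals)

# Sample standard deviation of the replications: the bootstrap SE
se.boot <- sqrt(sum((bootvals - mean(bootvals))^2) / (N - 1))

# The built-in sd() computes the same quantity
```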
Output

Bootstrap estimate of the bias of $\hat\theta_n$. The bias of $\hat\theta_n$ as an estimator of $\theta$ is defined to be $E\{\hat\theta\} - \theta(F)$, where $\theta(F)$ means the theoretical value of $\theta$ under the true model $F$. The bootstrap estimate of bias based on $N$ bootstrap replications is then
$$\widehat{\mathrm{bias}}_N = \bar\theta_n^{*} - \hat\theta_n,
\quad \text{where } \bar\theta_n^{*} = N^{-1} \sum_{i=1}^{N} \hat\theta_n^{*i}.$$
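The bias estimate is equally direct to compute; in this sketch both the original estimate and the replications are hypothetical values:

```r
theta.hat <- 4.0                              # estimate from the original data (hypothetical)
bootvals  <- c(4.2, 3.9, 4.5, 4.1, 4.4, 3.8) # hypothetical bootstrap replications

# Bootstrap bias estimate: mean of replications minus the original estimate
bias.boot <- mean(bootvals) - theta.hat
```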
Output

Bootstrap-based confidence intervals. Instead of using the bootstrap standard error as a measure of precision, we may use the bootstrap to construct approximate confidence intervals. (However, $N$ typically needs to be much larger in this case.) Let $\theta^{*(1)} < \cdots < \theta^{*(N)}$ denote the ordered bootstrap replications of $\theta$. Then an approximate $100(1 - 2\alpha)\%$ bootstrap-based confidence interval for $\theta$ is
$$\left[ \theta^{*(\alpha N)}, \; \theta^{*((1-\alpha)N)} \right].$$
This is easiest to implement when $N$ is chosen so that $\alpha N$ is an integer, but this need not be the case.
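With $N$ chosen so that $\alpha N$ is an integer, the percentile interval just reads off two order statistics. A sketch with simulated stand-in replications:

```r
set.seed(1)                                   # for a reproducible sketch
bootvals <- rnorm(1000, mean = 4, sd = 0.5)   # stand-in bootstrap replications
N <- length(bootvals)
alpha <- 0.025                                # gives an approximate 95% interval

sorted <- sort(bootvals)
ci <- c(sorted[alpha * N], sorted[(1 - alpha) * N])

# quantile(bootvals, c(alpha, 1 - alpha)) gives essentially the same interval
```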
Simple bootstrapping in R

bootstrap <- function(x, nboot, theta, ...) {
  data <- matrix(sample(x, size = length(x) * nboot, replace = TRUE),
                 nrow = nboot)
  answer <- apply(data, 1, theta, ...)
  answer
}
Example: The Old Faithful geyser

[Figure: histogram of the observed data; relative frequency against observed values.]
Bootstrap distribution of sample mean

[Figure: histogram of bootvals; relative frequency.]
How many bootstrap replications?

Note that the amount of computer time required increases linearly with $N$. It can be shown that
$$\mathrm{var}(\widehat{se}_N) \approx \frac{c_1}{n^2} + \frac{c_2}{nN},$$
where $c_1$ and $c_2$ are constants that depend on the underlying population distribution $F$, but not on $n$ or $N$. The first term represents sampling variation, and tends to zero as the sample size increases. The second term represents the resampling variation, and it approaches zero as $N \to \infty$ for fixed $n$.
How many bootstrap replications?

Thus $\widehat{se}_N$ always has a greater standard deviation than $\widehat{se}_\infty$, but the practical question is: how much greater? An approximate, but quite satisfactory, answer can be obtained by looking at the coefficient of variation of $\widehat{se}_N$, i.e. the ratio of the standard error of $\widehat{se}_N$ to its expected value.
How many replications?

It can be shown that
$$cv(\widehat{se}_N) \approx \left\{ cv(\widehat{se}_\infty)^2 + \frac{E(\hat\Delta) + 2}{4N} \right\}^{1/2}.$$
Here $\hat\Delta$ is a parameter that depends on how long the tail of the distribution of $\theta(\hat F_n)$ is, and $\widehat{se}_\infty$ is the ideal bootstrap estimate of standard error.
How many replications?

In practice, $\hat\Delta$ is very likely to be less than 10, and the smallest possible value of $\hat\Delta$ is $-2$. An important consequence of this is that, for the values of $cv(\widehat{se}_\infty)$ and $\hat\Delta$ that are likely to arise in practice, $cv(\widehat{se}_N)$ is unlikely to be much greater than $cv(\widehat{se}_\infty)$ for $N > 200$.
How many replications?

The following rules of thumb are given by Efron and Tibshirani:

- Even a small number of bootstrap samples, $N = 25$ say, is usually informative, and often is sufficient to give a good estimate of the standard error of $\hat\theta_n$.
- It is seldom that more than $N = 200$ bootstrap replications are needed for estimating a standard error.
- However, much bigger values of $N$ are required for constructing bootstrap-based confidence intervals.
Some worked examples: more complicated data structures

American Law Schools. Two measurements were made on the entering class of each school in 1973: LSAT, the average score for the class on a national law test, and GPA, an average of the student grades for the whole class.

LSAT: … GPA: …

We are interested in the standard error of the estimated correlation between these two statistics.
Law School example

law <- matrix(ncol = 2, nrow = 15)
law[,1] <- c(576, 635, 558, 578, ...)
law[,2] <- c(3.39, 3.30, 2.81, 3.03, ...)

theta.fn1 <- function(selected, xdata) {
  answer <- cor(xdata[selected, 1], xdata[selected, 2])
  answer
}

bootvals <- bootstrap(1:15, 1000, theta.fn1, xdata = law)
hist(bootvals)
abline(v = cor(law[,1], law[,2]))
mean(bootvals) - cor(law[,1], law[,2])
Law school example

[Figure: histogram of bootvals; frequency.]
Comparing two samples: the mouse data

In this example we are interested in testing the difference in means of two random samples. The data are:

Treatment: … No-treatment: …

The mean survival for the treatment group is … days, whereas the mean survival for the no-treatment group is … days. We bootstrap to see if this difference is significant.
Mouse data

Let $F$ and $G$ denote the population distributions of the treatment and no-treatment data respectively. The statistical question (hypothesis) of interest is whether the means of $F$ and $G$ are equal, i.e. whether $\theta(F, G) = \mu(F) - \mu(G) = 0$ in the obvious notation. From the observed data, we have a single value of the random variable $\hat\theta_{7,9}(F, G)$: the difference between the sample means of independent samples of sizes 7 and 9 from $F$ and $G$ respectively. Take independent values $x_1, \ldots, x_7$ from $F$, and independent values $y_1, \ldots, y_9$ from $G$, and construct $\hat\theta_{7,9} = \bar x - \bar y$, where $\bar x$ is the sample mean of the $x_i$ values, etc.
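For two independent samples, each group is resampled with replacement separately. A sketch with hypothetical survival times (the original data values did not survive transcription; only the group sizes, 7 and 9, match the slides):

```r
set.seed(2)
treat <- c(94, 197, 16, 38, 99, 141, 23)          # 7 hypothetical treatment survival times
ctrl  <- c(52, 104, 146, 10, 50, 31, 40, 27, 46)  # 9 hypothetical no-treatment times

N <- 1000
diffs <- replicate(N, {
  # Resample each group with replacement, independently
  mean(sample(treat, replace = TRUE)) - mean(sample(ctrl, replace = TRUE))
})

se.diff <- sd(diffs)   # bootstrap standard error of the difference in means
```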
Mouse data

[Figure: histogram of bootvals; frequency.]
Regression

[Figure: observed values of strength and curing time, with best fitting (in a least-squares sense) linear regression line.]
Regression

There are two possible nonparametric bootstrap approaches.

Bootstrapping pairs. Construct a bootstrap sample of size 50 by sampling uniformly with replacement from the $(y_i, t_i)$ pairs. Estimate $\alpha$ and $\beta$ by least squares for each of these bootstrap samples. Repeat many times, and thus obtain the required bootstrap distributions of $\hat\alpha$ and $\hat\beta$.

Bootstrapping residuals. From the fitted model $\hat y = \hat\alpha + \hat\beta t$, construct the residuals $r_i = y_i - \hat\alpha - \hat\beta t_i$. Construct a bootstrap sample of residuals $r_1^*, \ldots, r_{50}^*$ by sampling uniformly with replacement from $r_1, \ldots, r_{50}$. From $r_1^*, \ldots, r_{50}^*$, construct a bootstrap replication of the $y$-data via $y_i^* = \hat\alpha + \hat\beta t_i + r_i^*$. Re-fit the model to these $(y_i^*, t_i)$ values to obtain bootstrap estimates of $\alpha$ and $\beta$. Repeat, and proceed as usual to obtain the required bootstrap distributions.
Code

theta.lr <- function(selected, xdata, drawplot = FALSE) {
  dummy <- lm(xdata[selected, 1] ~ xdata[selected, 2])
  if (drawplot) abline(dummy)
  answer <- dummy$coef
  answer
}

bootstrap(1:50, 25, theta.lr,
          xdata = cbind(strength, curing.time), drawplot = TRUE)
Regression

[Figure: strength against curing.time.]
Regression

[Figure: histograms of bootvals[1, ] (intercept parameter alpha) and bootvals[2, ] (slope parameter beta); frequency.]
Regression

theta.res <- function(res, xdata, alpha, beta, drawplot = FALSE) {
  # xdata contains the curing times
  y.boot <- alpha + beta * xdata + res
  answer <- lm(y.boot ~ xdata)
  if (drawplot) abline(answer)
  answer <- answer$coef
  answer
}

bootvals <- bootstrap(res.vals, 25, theta.res, xdata = curing.time,
                      alpha = al, beta = be, drawplot = TRUE)
Regression

[Figure: strength against curing.time.]
Regression

[Figure: histograms of bootvals[1, ] (intercept parameter alpha) and bootvals[2, ] (slope parameter beta); frequency.]
Regression

We have two approaches that give very similar answers for this set of data. Which method is best in general? The answer depends on how far we believe the assumed structure of the regression model. Under the second approach, we assume that the order of the residuals is not important, so that the residual corresponding to any $t_i$ value is equally likely to have arisen with any other $t_j$ value. This corresponds to assuming that the distribution of the error $\varepsilon_i$ does not depend on $t_i$. Bootstrapping pairs does not make this assumption, and so is more robust than bootstrapping residuals.
An example of the parametric bootstrap

boot.bvn <- function(nboot, ndata, m1, m2, v1, v2, rho, ...) {
  # nboot is the number of bootstrap repetitions
  # ndata is the number of data points
  how.many <- nboot * ndata
  X <- rnorm(how.many, 0, 1)
  X1 <- m1 + sqrt(v1) * X
  X.mat <- matrix(X1, nrow = nboot)
  Y <- rnorm(how.many, rho * X, sqrt(1 - rho^2))
  Y1 <- m2 + sqrt(v2) * Y
  Y.mat <- matrix(Y1, nrow = nboot)
  data <- cbind(X.mat, Y.mat)
  answer <- apply(data, 1, theta.bvn, n = ndata)
  answer
}
Parametric or nonparametric bootstrap?

If the family has distribution $F_\alpha$, where $\alpha$ is the (vector of) unknown parameters, then fitting $F_\alpha$ to the data involves choosing some particular value $\hat\alpha$ according to a goodness-of-fit criterion: maximum likelihood, for example. Once $\hat\alpha$ has been found, bootstrap samples of size $n$ are generated from $F_{\hat\alpha}$ and used exactly as before to obtain bootstrap replications of $\hat\theta_n$. This is called the parametric bootstrap. You resample from an estimated member of the family.
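A minimal parametric-bootstrap sketch for a univariate normal family (hypothetical data; the estimated mean and standard deviation play the role of $\hat\alpha$, with `sd()` standing in for the exact maximum-likelihood fit):

```r
set.seed(3)
x <- c(5.1, 4.7, 5.6, 4.9, 5.3, 5.0)   # hypothetical observed data
n <- length(x)

# Fit a member of the family: a normal with estimated mean and sd
mu.hat <- mean(x)
sd.hat <- sd(x)   # note: n-1 denominator, not the exact MLE

# Resample from the fitted distribution and compute replications of the median
N <- 500
boot.medians <- replicate(N, median(rnorm(n, mu.hat, sd.hat)))

se.median <- sd(boot.medians)   # parametric bootstrap SE of the median
```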
Parametric Bootstrap

To illustrate the parametric bootstrap, we will refer back to the Law School data, and assume that the observed values are from the bivariate normal distribution. Let $(x_i, y_i)$, $i = 1, \ldots, 15$, denote the observed (LSAT, GPA) values. Thus we will assume that the $(x_i, y_i)$ values are an independent sample from $N_2(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2; \rho)$. The first thing we need to do is pick a particular member of the bivariate normal family.
Parametric Bootstrap

Here $(\mu_1, \mu_2)$ can be estimated reasonably by $(\bar x, \bar y)$, and the covariance matrix may be estimated reasonably by
$$\begin{bmatrix} \widehat{\mathrm{var}}(X) & \widehat{\mathrm{cov}}(X, Y) \\ \widehat{\mathrm{cov}}(X, Y) & \widehat{\mathrm{var}}(Y) \end{bmatrix}
= \begin{bmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{bmatrix}
= \frac{1}{14} \begin{bmatrix} \sum (x_i - \bar x)^2 & \sum (x_i - \bar x)(y_i - \bar y) \\ \sum (x_i - \bar x)(y_i - \bar y) & \sum (y_i - \bar y)^2 \end{bmatrix}.$$
This yields $\mu_1 = 600.3$, $\mu_2 = 3.1$, $\sigma_1^2 = \ldots$, $\sigma_2^2 = \ldots$ and $\rho = \ldots$. From this estimated distribution we can resample.
Parametric Bootstrap

[Figure: histogram of bootvals; frequency.]
More informationChapter 6. Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer
Chapter 6 Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer The aim of this chapter is to calculate confidence intervals for the maximum power consumption per customer
More informationChapter 4 Describing the Relation between Two Variables
Chapter 4 Describing the Relation between Two Variables 4.1 Scatter Diagrams and Correlation The is the variable whose value can be explained by the value of the or. A is a graph that shows the relationship
More informationA Non-parametric bootstrap for multilevel models
A Non-parametric bootstrap for multilevel models By James Carpenter London School of Hygiene and ropical Medicine Harvey Goldstein and Jon asbash Institute of Education 1. Introduction Bootstrapping is
More informationHypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods
Hypothesis Testing with the Bootstrap Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Bootstrap Hypothesis Testing A bootstrap hypothesis test starts with a test statistic
More informationFinal Exam - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your
More informationBootstrap Resampling
Bootstrap Resampling Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota) Bootstrap Resampling
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationChapter 2: Resampling Maarten Jansen
Chapter 2: Resampling Maarten Jansen Randomization tests Randomized experiment random assignment of sample subjects to groups Example: medical experiment with control group n 1 subjects for true medicine,
More informationAdvanced Statistics II: Non Parametric Tests
Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney
More informationMath 494: Mathematical Statistics
Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/
More informationAP Statistics Cumulative AP Exam Study Guide
AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics
More informationInteractions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept
Interactions Lectures 1 & Regression Sometimes two variables appear related: > smoking and lung cancers > height and weight > years of education and income > engine size and gas mileage > GMAT scores and
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationCorrelation and Regression
Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class
More informationMultiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company
Multiple Regression Inference for Multiple Regression and A Case Study IPS Chapters 11.1 and 11.2 2009 W.H. Freeman and Company Objectives (IPS Chapters 11.1 and 11.2) Multiple regression Data for multiple
More informationdf=degrees of freedom = n - 1
One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:
More informationPerformance Evaluation and Comparison
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Cross Validation and Resampling 3 Interval Estimation
More informationMonte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics
Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Amang S. Sukasih, Mathematica Policy Research, Inc. Donsig Jang, Mathematica Policy Research, Inc. Amang S. Sukasih,
More informationLecture 13. Simple Linear Regression
1 / 27 Lecture 13 Simple Linear Regression October 28, 2010 2 / 27 Lesson Plan 1. Ordinary Least Squares 2. Interpretation 3 / 27 Motivation Suppose we want to approximate the value of Y with a linear
More informationThe Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility
The Slow Convergence of OLS Estimators of α, β and Portfolio Weights under Long Memory Stochastic Volatility New York University Stern School of Business June 21, 2018 Introduction Bivariate long memory
More informationContents 1. Contents
Contents 1 Contents 1 One-Sample Methods 3 1.1 Parametric Methods.................... 4 1.1.1 One-sample Z-test (see Chapter 0.3.1)...... 4 1.1.2 One-sample t-test................. 6 1.1.3 Large sample
More informationAN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY
Econometrics Working Paper EWP0401 ISSN 1485-6441 Department of Economics AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Lauren Bin Dong & David E. A. Giles Department of Economics, University of Victoria
More informationThe Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University
The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor
More informationSimple Linear Regression
Simple Linear Regression EdPsych 580 C.J. Anderson Fall 2005 Simple Linear Regression p. 1/80 Outline 1. What it is and why it s useful 2. How 3. Statistical Inference 4. Examining assumptions (diagnostics)
More informationExam details. Final Review Session. Things to Review
Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit
More informationBootstrapping Spring 2014
Bootstrapping 18.05 Spring 2014 Agenda Bootstrap terminology Bootstrap principle Empirical bootstrap Parametric bootstrap January 1, 2017 2 / 16 Empirical distribution of data Data: x 1, x 2,..., x n (independent)
More informationChapter 10. Simple Linear Regression and Correlation
Chapter 10. Simple Linear Regression and Correlation In the two sample problems discussed in Ch. 9, we were interested in comparing values of parameters for two distributions. Regression analysis is the
More informationPreliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference
1 / 172 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2014 2 / 172 Unpaid advertisement
More information18.05 Final Exam. Good luck! Name. No calculators. Number of problems 16 concept questions, 16 problems, 21 pages
Name No calculators. 18.05 Final Exam Number of problems 16 concept questions, 16 problems, 21 pages Extra paper If you need more space we will provide some blank paper. Indicate clearly that your solution
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth
More information1 The Classic Bivariate Least Squares Model
Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating
More information