MA 575 Linear Models: Bootstrap for Regression
Week 9, Lecture 1
Cedric E. Ginestet, Department of Mathematics and Statistics, Boston University
1 The General Bootstrap

The bootstrap is a computer-intensive resampling algorithm for estimating the empirical distribution function (EDF) of a random variable X from a set of observations x = {x_1, ..., x_n}. This technique therefore permits us to obtain empirical estimates, without making any assumptions about the distribution of X, when analytical ones are not available. The bootstrap was first introduced in 1979 as an algorithm for obtaining reliable estimates of standard errors (Efron and Tibshirani, 1993). According to legend, Baron Munchausen saved himself from drowning in quicksand by pulling himself up using only his bootstraps. The statistical bootstrap, which uses re-sampling from a given set of data to mimic the variability that produced the data in the first place, has a rather more dependable theoretical basis, and can be a highly effective procedure for the estimation of error quantities in statistical problems.

1.1 Motivations for Using the Bootstrap

When performing regression analysis, the distributional assumptions on the behavior of the error terms need not be satisfied. In such cases, it may be difficult to identify the distribution of the regression coefficients, and to compute a test statistic on that basis. Thus far, we have invoked the central limit theorem (CLT) to justify our use of normal assumptions. However, for small sample sizes, such theoretical justifications will not apply. Hence, we need a non-parametric toolkit, which does not make any assumption about the distribution of the data.

1.2 The Plug-in Principle

Consider a general scenario in which we have drawn realizations from an unknown population distribution F, such that y_1, ..., y_n \sim F. The sample mean of these realizations is then computed as

    \bar{y}_n = \frac{1}{n} \sum_{i=1}^{n} y_i.

What is the standard error of the statistic \bar{y}_n?
Invoking the central limit theorem, we know that for moderately large n, we would obtain

    \bar{y}_n \sim N(\mu_F, \sigma_F^2 / n),
where \mu_F and \sigma_F^2 are the mean and variance of the unknown F-distributed random variable. Using this result, we can define the standard error of \bar{y}_n as follows,

    se_F(\bar{Y}_n) = \big( Var[\bar{Y}_n] \big)^{1/2} = \Big( \frac{1}{n^2} \sum_{i=1}^{n} Var[Y_i] \Big)^{1/2} = \frac{\sigma_F}{\sqrt{n}},

where we have emphasized the dependence of this quantity on F through the use of a subscript. Here, the population standard deviation can be estimated using the sample estimate,

    \hat{\sigma}_F^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y}_n)^2.

Alternatively, we can use the plug-in principle, which proposes to replace the unknown distribution F by a sample estimate \hat{F}, such that we may use se_{\hat{F}}(\bar{Y}_n), where \hat{F} is obtained by bootstrapping the sampled data.

1.3 Sampling with Replacement

We first introduce the notion of a bootstrap sample, denoted y*_b. Each such bootstrap sample is drawn from the empirical distribution function (EDF), constructed using the original sample y_1, ..., y_n, such that

    \hat{F}_n(t; y) := \frac{1}{n} \sum_{i=1}^{n} I\{y_i \le t\},

where I{.} is the indicator function, defined as follows,

    I\{y_i \le t\} := 1 if y_i \le t, and 0 otherwise.

The EDF, \hat{F}_n(y), is therefore obtained by assigning an equal probability 1/n and a label i_1, i_2, ..., i_n to each element in y. We can then sample with replacement from the EDF by drawing n values from the distribution of the indexes. That is, drawing samples from the EDF,

    y*_j \sim \hat{F}_n,    j = 1, ..., n,

is equivalent to drawing indexes from a uniform distribution on the integers between 1 and n,

    i_j \sim Unif(1, ..., n),    j = 1, ..., n.

The resulting bootstrap sample consists of the following sequence of elements,

    {y*_1 = y_{i_1}, y*_2 = y_{i_2}, ..., y*_n = y_{i_n}},

forming an n-dimensional bootstrap sample. This procedure is repeated B times in order to produce b = 1, ..., B samples of the form

    y*_b := [y*_{1b}, ..., y*_{nb}]^T.

Such bootstrap samples are best conceived as a resampling or a randomization of the original data.
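The equivalence between sampling from the EDF and drawing uniform indexes with replacement can be sketched in Python. This is a minimal illustration, not from the notes: the data, seed, and variable names are all chosen for the example.

```python
import numpy as np

# Illustrative data playing the role of the original sample y_1, ..., y_n
rng = np.random.default_rng(575)
y = rng.normal(loc=2.0, scale=1.5, size=50)
n = len(y)
B = 1000  # number of bootstrap samples

# Drawing n values from the EDF is equivalent to drawing n indexes
# uniformly, with replacement, from {0, ..., n-1}.
idx = rng.integers(low=0, high=n, size=(B, n))
boot_samples = y[idx]  # row b is the bootstrap sample y*_b
```

Each row of `boot_samples` contains only values from the original sample, some repeated and some omitted, which is exactly the resampling-with-replacement scheme described above.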
Sampling with replacement ensures that the bootstrap samples are indeed probabilistically independent,

    E[y*_j (y*_k)^T] = E[y*_j] E[y*_k]^T,    j \ne k,    j, k = 1, ..., B,

where we are here treating each y*_j as a random vector. It is common practice to draw about B = 1000 bootstrap samples. However, Efron and Tibshirani (1993) originally advocated that anything between 25 and 200 samples was sufficient for most inferential purposes.
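Putting the plug-in principle and index resampling together, the standard error of the sample mean posed in Section 1.2 can be approximated from the spread of the bootstrap means and compared with the analytical formula \hat{\sigma}_F / \sqrt{n}. This is a sketch with simulated data; all names and the choice of a skewed distribution are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.exponential(scale=3.0, size=40)  # skewed data, far from normal
n, B = len(y), 2000

# theta*_b: the sample mean computed on each bootstrap sample y*_b
theta_star = np.array([y[rng.integers(0, n, size=n)].mean() for _ in range(B)])

# Spread of the bootstrap means (ddof=1 gives the 1/(B-1) divisor)
se_boot = theta_star.std(ddof=1)

# Analytical plug-in estimate of se(y_bar)
se_plugin = y.std(ddof=1) / np.sqrt(n)
```

With a moderate B, `se_boot` and `se_plugin` agree closely, even though no normality assumption entered the bootstrap computation.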
1.4 Bootstrapped Standard Error

Continuing the previous example, we may be interested in estimating the standard error of the statistic \bar{y}_n using the bootstrap. Such an estimate can be obtained by computing the statistic of interest (here, the sample mean of the data y) for each bootstrap sample,

    \theta*_b := \frac{1}{n} \sum_{i=1}^{n} y*_{ib}.    (1)

Once this is obtained, it suffices to compute the standard error of this distribution of bootstrapped sample means,

    \hat{se}_{\hat{F}}(\bar{y}) := \Big( \frac{1}{B-1} \sum_{b=1}^{B} (\theta*_b - \bar{\theta}*)^2 \Big)^{1/2},

where the bootstrap mean of the bootstrapped sample means is given by

    \bar{\theta}* := \frac{1}{B} \sum_{b=1}^{B} \theta*_b.

The quantity \hat{se}_{\hat{F}}(\bar{y}) is then referred to as the bootstrapped standard error. Of course, this procedure could be repeated for any statistic \theta := s(y), since we are only using the fact that the quantity of interest is a function of the data. In such cases, the bootstrap estimates in equation (1) would be computed using the bootstrap samples, such that \theta*_b := s(y*_b).

The central advantage of using the bootstrap is that we can control the accuracy of the bootstrap estimate through our choice of B. A larger value of B will yield a better estimate of the ideal bootstrap estimate, which would be based on all possible resamples of the data vector y. Because the number of such resamples grows factorially with n, we have adopted a Monte Carlo method for estimating this quantity. Since the bootstrap does not make any assumption about the distribution of the data, it should be regarded as a non-parametric procedure.

2 Bootstrap for Regression

A key assumption made when conducting simple or multiple regression is that the error terms are normally distributed. In many practical situations, such an assumption may be untenable, or difficult to verify. When this occurs, one can resort to a bootstrap estimation of the standard errors in the model of interest. There exist two different methods for applying the bootstrap to regression.
One can either sample the pairs of predictors and observed values, or directly re-sample the residuals, once we have fitted the model.

2.1 Bootstrapping Cases

Firstly, a naive approach to bootstrap estimation in regression analysis is to re-sample cases. With this approach, we proceed as follows,

    b*_b := {(y*_{i_1 b}, x*_{i_1 b}), ..., (y*_{i_n b}, x*_{i_n b})},

for every b = 1, ..., B. For each vector of bootstrap replicates, we compute \hat{\beta}*_b, which is obtained by minimizing the RSS based on each bootstrap sample, b*_b, such that

    \hat{\beta}*_b := argmin_{\beta \in R^p} \sum_{i=1}^{n} (y*_{ib} - (x*_{ib})^T \beta)^2.
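Bootstrapping cases can be sketched as follows. This is an illustration on simulated data, not the notes' own code: the true model y = 1 + 2x with heavy-tailed errors, the seed, and all variable names are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 60, 1000

# Simulated regression data with non-normal (heavy-tailed) errors
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=n)
X = np.column_stack([np.ones(n), x])  # design matrix with intercept

beta_star = np.empty((B, 2))
for b in range(B):
    i = rng.integers(0, n, size=n)    # resample (y_i, x_i) pairs with replacement
    # OLS fit on the bootstrap sample b*_b
    beta_star[b] = np.linalg.lstsq(X[i], y[i], rcond=None)[0]

# Bootstrapped standard error of the slope, as in Section 1.4
se_slope = beta_star[:, 1].std(ddof=1)
```

Note that both the response and the predictors are resampled jointly, so no homoscedasticity assumption is needed.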
The bootstrap estimate of the standard error of an estimator in our model, say \hat{\beta}_l for instance, with l = 1, ..., p, can then be computed as

    \hat{se}(\hat{\beta}_l) = \Big( \frac{1}{B-1} \sum_{b=1}^{B} (\hat{\beta}*_{lb} - \bar{\beta}*_l)^2 \Big)^{1/2},

where the bootstrap mean is \bar{\beta}*_l := \frac{1}{B} \sum_{b=1}^{B} \hat{\beta}*_{lb}.

2.2 Bootstrapping Residuals

Alternatively, one can sample with replacement from the residuals of a fitted model based on the OLS estimator \hat{\beta}. This produces the following bootstrap sample, based on the fitted values \hat{y}_i,

    b*_b := {(x_1^T \hat{\beta} + \hat{e}*_{i_1 b}, x_1), ..., (x_n^T \hat{\beta} + \hat{e}*_{i_n b}, x_n)},

where, for every j = 1, ..., n, we could also have defined y*_j := x_j^T \hat{\beta} + \hat{e}*_{i_j}. Note that the vector of predictors x_j^T does not have the same index as the residual \hat{e}*_{i_j}. The latter quantity was sampled with replacement from the EDF of the residuals under the OLS estimator \hat{\beta},

    {\hat{e}_1 = y_1 - \hat{y}_1, ..., \hat{e}_n = y_n - \hat{y}_n}.

That is, in this procedure, we are first fitting our standard model to derive the OLS estimate, \hat{\beta}. This, in turn, allows us to resample the residuals, given that particular estimate. This second strategy is less statistically robust than bootstrapping cases, as it assumes that homoscedasticity holds. That is, since we are breaking the dependence of the residuals on the vectors of predictors, x_i, we are implicitly assuming that the variance of the residuals does not depend on the values of x_i. When this assumption is unlikely to hold, it is preferable to bootstrap cases, which is more robust than bootstrapping the residuals.

3 Theory of the Bootstrap

3.1 Consistency of the EDF

For any set of random variables {Y_1, ..., Y_n}, from some unknown cumulative distribution function (CDF) denoted F, the empirical distribution function (EDF), \hat{F}_n, is defined for any t \in R by

    \hat{F}_n(t; Y) := \frac{1}{n} \sum_{i=1}^{n} I\{Y_i \le t\},

where we have emphasized the fact that \hat{F}_n is a random quantity, which depends on the full n-dimensional random vector, Y. The EDF has two desirable properties: it is both (i) unbiased and (ii) consistent, with respect to F.
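Before turning to the theory, the residual bootstrap of Section 2.2 can be sketched in the same style as the case bootstrap. Again, the simulated data-generating model, the seed, and all names are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, B = 60, 1000

x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])

# Step 1: fit the model once to obtain beta_hat, fitted values, and residuals
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta_hat
resid = y - fitted

# Step 2: resample residuals with replacement; the predictors x_j stay fixed
beta_star = np.empty((B, 2))
for b in range(B):
    e_star = resid[rng.integers(0, n, size=n)]  # e*_{i_j}
    y_star = fitted + e_star                    # y*_j = x_j^T beta_hat + e*_{i_j}
    beta_star[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]

se_resid_boot = beta_star.std(axis=0, ddof=1)   # SEs for intercept and slope
```

The contrast with bootstrapping cases is visible in the loop: the design matrix X never changes across bootstrap samples, which is precisely why this scheme implicitly assumes homoscedastic errors.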
To show that \hat{F}_n is unbiased with respect to the target CDF, F, it suffices to take the expectation for some t \in R and any n \in N,

    E[\hat{F}_n(t; Y)] = \int_{R^n} \hat{F}_n(t; Y) \, dF(Y_1) \cdots dF(Y_n)
                       = \frac{1}{n} \sum_{i=1}^{n} \int_{R} I\{Y_i \le t\} \, dF(Y_i)
                       = \frac{1}{n} \sum_{i=1}^{n} P[Y_i \le t] = P[Y \le t] = F(t),

where the penultimate step follows from the fact that Y_i \sim F, for every i = 1, ..., n.

Secondly, \hat{F}_n can also be shown to be consistent, in the sense that as n \to \infty, the estimate \hat{F}_n(t; Y_n) converges to F(t), for every t \in R. That is, for every t \in R, we have the following pointwise convergence,

    P\big[ \lim_{n \to \infty} \hat{F}_n(t; Y_n) = F(t) \big] = 1.

This is simply the strong law of large numbers, which states that, for an iid sequence of random variables with finite mean \mu, the sample mean satisfies \bar{X}_n \to \mu, a.s. In this case, the sequence of sample means is composed of the \hat{F}_n(t; Y), which average the indicators I\{Y_i \le t\} with mean F(t).

3.2 Unbiasedness vs. Consistency

Observe that the unbiasedness and consistency of an estimator are two different criteria.

i. Unbiasedness refers to the average behavior of an estimator: what is its expectation?

ii. Consistency captures the long-range behavior of an estimator, and is generally based on one of the laws of large numbers.

Observe that these two criteria are independent. An estimator can be unbiased and inconsistent: take any sequence of sample means with expectation \theta, and some non-degenerate random variable Y centered at 0; then

    E[\bar{X}_n + Y] = \theta,    and    \lim_{n \to \infty} (\bar{X}_n + Y) \ne \theta, a.s.

Conversely, an estimator can be consistent, yet biased, such as, for instance, \bar{X}_n + 1/n, which is biased for every n, but nonetheless consistent. That is,

    E\big[\bar{X}_n + \frac{1}{n}\big] = \theta + \frac{1}{n},    and    \lim_{n \to \infty} \big(\bar{X}_n + \frac{1}{n}\big) = \theta, a.s.

3.3 Rates of Convergence

Taken together, these results show that the good performance of the bootstrap relies on the rate of convergence of the EDF, \hat{F}_n, to the population distribution, F. Therefore, we have replaced a distributional assumption on the random variables of interest by an appeal to the strong law of large numbers.
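The pointwise convergence of the EDF to F can be checked numerically. The sketch below uses simulated standard normal data; the helper name `edf_at` and the evaluation point t = 1 are illustrative choices, not from the notes.

```python
import math
import numpy as np

rng = np.random.default_rng(7)
t = 1.0
# N(0,1) CDF at t, via the error function: F(t) = (1 + erf(t/sqrt(2)))/2
F_t = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def edf_at(t, sample):
    """EDF F_n(t) = (1/n) * sum of the indicators I{y_i <= t}."""
    return float(np.mean(sample <= t))

# Absolute error |F_n(t) - F(t)| for increasing sample sizes
errors = {n: abs(edf_at(t, rng.standard_normal(n)) - F_t)
          for n in (100, 10_000, 1_000_000)}
```

As n grows, the error |F_n(t) - F(t)| shrinks toward zero, in line with the strong law of large numbers applied to the indicator variables.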
Since the strong law of large numbers converges at a rate of order O(1/n), it follows that we are gaining in accuracy over a reliance on the central limit theorem, whose convergence rate is only of order O(1/\sqrt{n}). Roughly, for any sequence of random variables X_1, ..., X_n, with mean E[X_i] = \mu, the sum S_n converges as follows,

    \frac{S_n}{n} \to \mu, a.s.
The strong law captures the first-order approximation of the sample mean. If, in addition, we know that Var[X_i] = \sigma^2, for every i = 1, ..., n, we then have

    \frac{S_n - n\mu}{\sqrt{n}} \to_d N(0, \sigma^2),

which represents a second-order approximation of the mean \mu. When using the bootstrap, we are exploiting the fact that the strong law of large numbers has a better rate of convergence than the central limit theorem.

References

Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall, London.
More informationA note on multiple imputation for general purpose estimation
A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume
More informationInference For High Dimensional M-estimates: Fixed Design Results
Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49
More informationInference via Kernel Smoothing of Bootstrap P Values
Queen s Economics Department Working Paper No. 1054 Inference via Kernel Smoothing of Bootstrap P Values Jeff Racine McMaster University James G. MacKinnon Queen s University Department of Economics Queen
More informationChapter 7: Simple linear regression
The absolute movement of the ground and buildings during an earthquake is small even in major earthquakes. The damage that a building suffers depends not upon its displacement, but upon the acceleration.
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized
More informationPh.D. Qualifying Exam Friday Saturday, January 3 4, 2014
Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently
More informationFirst Year Examination Department of Statistics, University of Florida
First Year Examination Department of Statistics, University of Florida August 19, 010, 8:00 am - 1:00 noon Instructions: 1. You have four hours to answer questions in this examination.. You must show your
More informationA Bias Correction for the Minimum Error Rate in Cross-validation
A Bias Correction for the Minimum Error Rate in Cross-validation Ryan J. Tibshirani Robert Tibshirani Abstract Tuning parameters in supervised learning problems are often estimated by cross-validation.
More informationBootstrapping, Randomization, 2B-PLS
Bootstrapping, Randomization, 2B-PLS Statistics, Tests, and Bootstrapping Statistic a measure that summarizes some feature of a set of data (e.g., mean, standard deviation, skew, coefficient of variation,
More information4. Distributions of Functions of Random Variables
4. Distributions of Functions of Random Variables Setup: Consider as given the joint distribution of X 1,..., X n (i.e. consider as given f X1,...,X n and F X1,...,X n ) Consider k functions g 1 : R n
More informationConditional Least Squares and Copulae in Claims Reserving for a Single Line of Business
Conditional Least Squares and Copulae in Claims Reserving for a Single Line of Business Michal Pešta Charles University in Prague Faculty of Mathematics and Physics Ostap Okhrin Dresden University of Technology
More informationWhy Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory
Why Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory Andreas Buja joint with the PoSI Group: Richard Berk, Lawrence Brown, Linda Zhao, Kai Zhang Ed George, Mikhail Traskin, Emil Pitkin,
More informationConstructing Prediction Intervals for Random Forests
Senior Thesis in Mathematics Constructing Prediction Intervals for Random Forests Author: Benjamin Lu Advisor: Dr. Jo Hardin Submitted to Pomona College in Partial Fulfillment of the Degree of Bachelor
More informationApplying the proportional hazard premium calculation principle
Applying the proportional hazard premium calculation principle Maria de Lourdes Centeno and João Andrade e Silva CEMAPRE, ISEG, Technical University of Lisbon, Rua do Quelhas, 2, 12 781 Lisbon, Portugal
More informationImputation for Missing Data under PPSWR Sampling
July 5, 2010 Beijing Imputation for Missing Data under PPSWR Sampling Guohua Zou Academy of Mathematics and Systems Science Chinese Academy of Sciences 1 23 () Outline () Imputation method under PPSWR
More informationInference in Normal Regression Model. Dr. Frank Wood
Inference in Normal Regression Model Dr. Frank Wood Remember We know that the point estimator of b 1 is b 1 = (Xi X )(Y i Ȳ ) (Xi X ) 2 Last class we derived the sampling distribution of b 1, it being
More information1 Motivation for Instrumental Variable (IV) Regression
ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data
More informationMaximum Non-extensive Entropy Block Bootstrap
Overview&Motivation MEB Simulation of the MnEBB Conclusions References Maximum Non-extensive Entropy Block Bootstrap Jan Novotny CEA, Cass Business School & CERGE-EI (with Michele Bergamelli & Giovanni
More informationST 371 (IX): Theories of Sampling Distributions
ST 371 (IX): Theories of Sampling Distributions 1 Sample, Population, Parameter and Statistic The major use of inferential statistics is to use information from a sample to infer characteristics about
More informationA Note on Bayesian Inference After Multiple Imputation
A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in
More informationMonte Carlo Simulations and PcNaive
Econometrics 2 Fall 2005 Monte Carlo Simulations and Pcaive Heino Bohn ielsen 1of21 Monte Carlo Simulations MC simulations were introduced in Econometrics 1. Formalizing the thought experiment underlying
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationRegression and Statistical Inference
Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF
More information