Appendix D INTRODUCTION TO BOOTSTRAP ESTIMATION

D.1 INTRODUCTION
Bootstrapping is a general, distribution-free method that is used to estimate parameters of interest from data collected from studies or experiments. It is often referred to as a resampling method because it is carried out by repeatedly drawing samples from the original data that were gathered. This section introduces the basics of bootstrapping and extends it to bootstrapping in regression analysis. For a discussion on calculating bias or calculating confidence intervals using bootstrapping, see Efron and Tibshirani (1993).

Bootstrapping is a useful estimation technique when:

1. The formulas that are to be used for calculating estimates are based on assumptions that may not hold, may not be well understood, cannot be verified, or are simply dubious.
2. The computational formulas hold only for large samples and are unreliable, or simply not valid, for small samples.
3. The computational formulas do not exist.

To begin the discussion of bootstrapping techniques, assume that a study or experiment was conducted, resulting in a data set x_1, ..., x_n of size n. This is a trivial case where the data are univariate in nature. Most studies involve collection of data on several variables, as in the case of regression analysis studies. However, we use the simple example to lay the groundwork for the elements of bootstrapping methods.

Assume that the data set was generated by some underlying distribution f(u). Here, f(u) is the probability density function and may be either continuous or discrete. It may be the case that the true density function is unknown, and the functional form of f(u) is, therefore, unknown also. We are interested in estimating the parameter θ, which describes some feature of the population from which the data were collected. For instance, θ could be the true mean, the median, the proportion, the variance, or the standard deviation of the population.
Assume for the moment that we have a well-defined formula to calculate an estimate, θ̂, of θ, but that no formulas exist for calculating a confidence interval for θ. Under the ideal setting where we have unlimited resources, we could draw a large number of samples from the population and calculate θ̂ for each sample. The calculated values of θ̂ could then be used to construct an empirical distribution of θ̂, which in turn could be used to construct a confidence interval for θ. In reality, however, we have just a single sample, and that is the justification for the use of the bootstrapping method. The general idea behind bootstrapping is as follows (assuming that a study/experiment resulted in a data set of size n):

1. A sample of size n is drawn with replacement from the data set in hand.
2. An estimate, θ̂, of θ is calculated.

Applied Econometrics Using the SAS® System, by Vivek B. Ajmani. Copyright © 2009 John Wiley & Sons, Inc.
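The two steps above can be sketched in a few lines of Python (an illustrative sketch, not part of the book's SAS code; the data values are taken from the example in Section D.3):

```python
import random
import statistics

random.seed(42)

# The 10 observations used in the Section D.3 example.
data = [196, 12, 280, 212, 52, 100, 206, 188, 100, 202]

# Step 1: draw a sample of size n with replacement from the data in hand.
resample = random.choices(data, k=len(data))

# Step 2: calculate the estimate of interest (here, the mean) from the resample.
theta_hat = statistics.mean(resample)
print(theta_hat)
```

Because the draw is with replacement, some observations appear more than once in the resample and others not at all.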
Steps 1 and 2 are repeated several times (sometimes thousands of repetitions are used) to generate a (simulated) distribution of θ̂. This simulated distribution is then used for making inferences about θ.

As an example, suppose that we want to construct a 95% confidence interval for θ but do not have formulas that can be used for calculating the interval. We can therefore use bootstrapping to construct the confidence interval. The steps are as follows (Efron and Tibshirani, 1993):

1. Draw 1000 (as an example) bootstrap samples from the original data and calculate θ̂_1, ..., θ̂_1000, the estimates from each of the 1000 samples.
2. Next, sort these estimates in increasing order.
3. Calculate the 2.5th and 97.5th percentiles from the 1000 simulated values of θ̂. The 2.5th percentile will be the average of the 25th and 26th ordered values, while the 97.5th percentile will be the average of the 975th and 976th ordered values. That is,

   Lower confidence limit = (θ̂_(25) + θ̂_(26)) / 2,
   Upper confidence limit = (θ̂_(975) + θ̂_(976)) / 2.

Notice that we cut the lower 2.5% and the upper 2.5% off the simulated distribution of θ̂ to achieve the desired 95% confidence. Also note that we did not make any assumptions about the underlying distribution that generated the original data set.

We will now formalize the general bootstrapping method presented so far. Consider a random variable x with cumulative distribution F(x; θ). Here, θ is a vector of unknown parameters. For example, if the distribution of x is normal, then θ = (μ, σ²). Assume that we are interested in estimating θ, or some element of θ, that describes some aspect of f(x; θ), the distribution of x. That is, we may be interested in estimating the mean, the standard deviation, or the standard error of the mean. As we did before, we will assume that a study/experiment resulted in a random sample x_1, ..., x_n of size n.
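The three percentile-interval steps can be sketched in Python (again a hedged illustration using the bootstrap mean; the data values are borrowed from the Section D.3 example):

```python
import random
import statistics

random.seed(1)
data = [196, 12, 280, 212, 52, 100, 206, 188, 100, 202]

# Step 1: draw 1000 bootstrap samples and compute the estimate for each.
B = 1000
estimates = [statistics.mean(random.choices(data, k=len(data)))
             for _ in range(B)]

# Step 2: sort the estimates in increasing order.
estimates.sort()

# Step 3: the 2.5th percentile is the average of the 25th and 26th ordered
# values (list indices 24 and 25); the 97.5th uses the 975th and 976th.
lower = (estimates[24] + estimates[25]) / 2
upper = (estimates[974] + estimates[975]) / 2
print(lower, upper)
```

No distributional assumption enters anywhere: the interval comes entirely from the ordered bootstrap estimates.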
We can use this sample to approximate the cumulative distribution, F(x; θ), with the empirical distribution function, F̂(x; θ). The estimate F̂(x; θ) can be written as

   F̂(x; θ) = (1/n) Σ_{i=1}^{n} I_{(−∞, x)}(x_i),

where I is an indicator function that counts the number of x's in the original sample that fall in the interval (−∞, x). This is better illustrated in Figure D.1, in which the true distribution, F(x; θ), is given by the smooth line while the estimated function, F̂(x; θ), is given by the stepwise representation. The parameter vector θ, or elements of it, could be calculated exactly if the form of F(x; θ) were known.

FIGURE D.1. Plot comparing actual cumulative versus simulated cumulative distributions. (Graph reproduced with permission from Paul Glewwe, University of Minnesota.)
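The empirical distribution function above can be written directly in Python (an illustrative sketch; the helper name ecdf is ours, and the indicator is implemented with <=, i.e., counting over the interval up to and including x):

```python
def ecdf(sample, x):
    """F-hat(x): the fraction of sample values that do not exceed x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

# The 10 observations from the Section D.3 example.
data = [196, 12, 280, 212, 52, 100, 206, 188, 100, 202]
print(ecdf(data, 100))  # 4 of the 10 values are <= 100, so 0.4
```

Evaluating ecdf over a grid of x values traces out exactly the step function shown in Figure D.1.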
That is, if we knew the exact form of F(x; θ), then we could derive the probability density function, f(x; θ), or a function t(F) to calculate θ. However, assume that the functional form of F(x; θ) is unknown and that it was approximated with F̂(x; θ). One option we have, therefore, is to replace F(x; θ) with F̂(x; θ) to get the estimated function t(F̂). We can then use t(F̂) to calculate an estimate, θ̂, of θ. The estimator θ̂ in this instance is called the plug-in estimator of θ (Efron and Tibshirani, 1993, p. 35). As an example, the plug-in estimator of the population mean

   μ_x = ∫_{−∞}^{∞} x f(x) dx

is the sample mean

   x̄ = (1/n) Σ_{i=1}^{n} x_i.

Notice that calculating the mean of x was trivial and did not require bootstrapping methods. In general, bootstrapping techniques are used to calculate standard errors and to construct confidence intervals without making any assumption about the underlying distribution from which the samples are drawn.

D.2 CALCULATING STANDARD ERRORS

We will now discuss how bootstrapping methods can be used to calculate an estimate of the standard error of the parameter of interest. Assume that we have an estimate of θ; that is, θ̂ was calculated from the original data set without the use of bootstrapping. Bootstrapping, however, will be used to calculate an estimate of the standard error of θ̂. The general method for doing this is as follows (again assume that we have a data set of size n) (Efron and Tibshirani, 2004, p. 45):

1. Draw B samples of size n with replacement from the original data set.
2. Calculate θ̂ for each of the samples from step 1. That is, we now have θ̂_1, ..., θ̂_B.
3. Calculate the standard error from the B estimates of θ by using the standard formula for a sample standard deviation.
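The three steps can be collected into a small Python helper (an illustrative sketch; the name boot_se is ours, and statistics.stdev supplies the divisor B − 1 sample standard deviation):

```python
import random
import statistics

def boot_se(data, stat, B=1000, seed=123):
    """Bootstrap standard error of stat: resample B times with replacement,
    recompute the statistic each time, and take the sample standard
    deviation (divisor B - 1) of the B estimates."""
    rng = random.Random(seed)
    estimates = [stat(rng.choices(data, k=len(data))) for _ in range(B)]
    return statistics.stdev(estimates)

# The 10 observations from the Section D.3 example.
data = [196, 12, 280, 212, 52, 100, 206, 188, 100, 202]
print(boot_se(data, statistics.mean))
```

Passing statistics.median (or any other callable) instead of statistics.mean bootstraps the standard error of that statistic with no change to the routine, which is the point of the method: no statistic-specific formula is required.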
That is,

   se_B(θ̂) = sqrt[ (1/(B − 1)) Σ_{i=1}^{B} (θ̂_i − θ̄)² ],

where θ̄ = (1/B) Σ_{i=1}^{B} θ̂_i is simply the mean of θ̂_1, ..., θ̂_B. In practice, B is set to a very large number.

D.3 BOOTSTRAPPING IN SAS

Bootstrapping can easily be programmed in SAS by using simple routines. SAS macros to calculate bootstrapped estimates are available for download from the SAS Institute. The macros can be used to calculate bootstrapped and jackknife estimates of the standard deviation and standard error, and they can also be used to calculate bootstrapped confidence intervals and bootstrapped estimates of coefficients in regression analysis. These macros need to be invoked from within SAS. We will illustrate the use of these macros a bit later. For now, we show how a simple program can be written to compute bootstrap estimates.

Consider a data set that consists of 10 values: 196, 12, 280, 212, 52, 100, 206, 188, 100, 202. We will calculate a bootstrap estimate of the standard error of the mean. The following SAS statements can be used:

data age_data;
   input age;
   cards;
45
;
run;

data bootstrap;
   do index=1 to 500;
      do i=1 to nobs;
         x=ceil(ranuni(0)*nobs);  /* random row number between 1 and nobs */
         set age_data nobs=nobs point=x;
         output;
      end;
   end;
   stop;
run;

(Note that ceil, rather than round, is used so that the random row number x is always between 1 and nobs; round(ranuni(0)*nobs) can occasionally return 0, which is not a valid point= value.)

The following Proc Univariate statements will calculate the mean of each of the bootstrapped samples:

proc univariate data=bootstrap noprint;
   var age;
   by index;
   output out=out1 mean=mean n=n;
run;

Finally, the following Proc Univariate statements will calculate the standard deviation of the 500 bootstrapped means:

proc univariate data=out1 noprint;
   var mean;
   output out=out2 n=n mean=mean std=se;
run;

proc print data=out2;
run;

The analysis results in a mean and standard error of 27.6 and 6.8, respectively.

D.4 BOOTSTRAPPING IN REGRESSION ANALYSIS

Consider the standard linear regression model y_i = x_i^T β + ε_i, where x_i and β are k × 1 column vectors and ε_i is random error. Assume that we have a data set comprising n pairs of observations (y_1, x_1), ..., (y_n, x_n). Assume that the conditional expectation E(ε_i | x_i) = 0. Furthermore, assume that we do not know F(ε | x), the cumulative distribution of ε. In general, F is assumed to be normal.
We will make use of the standard least squares estimator for β, namely β̂ = (X^T X)^{-1} X^T y, to calculate bootstrapped estimates. That is, just as the mean was calculated earlier without the use of bootstrapping, we will assume that the least squares estimate can be calculated without any need of bootstrapping. However, we are interested in calculating the standard errors of β̂; that is, we assume that the formulas for calculating the standard errors are either unknown, unreliable, or simply do not work for small samples. As shown in Chapter 1, the estimate of the variance of β̂ is Var(β̂ | X) = σ̂² (X^T X)^{-1}, where σ̂² is estimated as

   σ̂² = (1/n) Σ_{i=1}^{n} (y_i − x_i^T β̂)²

or

   σ̂² = (1/(n − k − 1)) Σ_{i=1}^{n} (y_i − x_i^T β̂)².

Notice that the first version is not an unbiased estimator of σ² whereas the second version is. These versions are often referred to as the not-bias-corrected and the bias-corrected versions, respectively.

There are two bootstrap methods (the pairs method and the residuals method) that are employed to estimate the standard error of β̂ (Glewwe, 2006; Efron and Tibshirani, 1993, p. 113). The bootstrapped pairs method randomly selects pairs of y_i and x_i, while the bootstrapped residuals method takes each x_i just once but then links it with a random draw of an estimate of ε. The next sections outline both methods.

D.4.1 Bootstrapped Residuals Method

As before, we assume that a study or experiment resulted in n observations (y_1, x_1), ..., (y_n, x_n). The general method for the bootstrapped residuals method is

1. For each i, calculate an estimate, e_i, of ε_i. That is, e_i = y_i − x_i^T β̂, where β̂ is the usual OLS estimator calculated from the original data.
2. Randomly draw n values of e_i (from step 1) with replacement. Denote the residuals in this sample as e*_1, e*_2, ..., e*_n.
Notice that the subscripts of the residuals in the selected sample are not, in general, the same as the subscripts of the residuals e_i calculated from the original sample; that is, in general e*_i ≠ e_i for i = 1, ..., n.

3. With the values of e*_i (from step 2), compute y*_i = x_i^T β̂ + e*_i. Notice that the subscripts of x_i here match the subscripts of x_i in the original data set; that is, we are using each x_i only once. Notice also that, by construction of e*_i, y*_i ≠ y_i.
4. Using the calculated values of y*_i (from step 3), construct the vector y*. Finally, use X = [x_1 ... x_n]^T and y* = [y*_1 ... y*_n]^T to calculate b*_1, the first bootstrapped estimate of β. That is, b*_1 = (X^T X)^{-1} X^T y*.
5. Steps 2 through 4 are repeated B times to get B estimates of β.
6. Use the B estimates (from step 5) to calculate the sample standard deviation of β̂ using the formula

   s.e.(β̂) = sqrt[ Σ_{i=1}^{B} (b*_i − b̄*)² / (B − 1) ],

where b̄* = (1/B) Σ_{i=1}^{B} b*_i is the mean of the B residuals-method bootstrapped estimates of β.
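Steps 1 through 6 can be sketched for the simple one-predictor case in Python (an illustrative sketch, not the book's SAS macro; the closed-form OLS fit, the function names, and the small data set are all ours, for demonstration only):

```python
import random

def ols(x, y):
    """Closed-form least squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def residuals_bootstrap_se(x, y, B=1000, seed=7):
    """Bootstrapped residuals method for the slope's standard error."""
    rng = random.Random(seed)
    b0, b1 = ols(x, y)                              # step 1: OLS on original data
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    slopes = []
    for _ in range(B):
        e_star = rng.choices(resid, k=len(resid))   # step 2: resample residuals
        y_star = [b0 + b1 * xi + ei                 # step 3: rebuild y*_i,
                  for xi, ei in zip(x, e_star)]     #         each x_i used once
        slopes.append(ols(x, y_star)[1])            # step 4: refit by OLS
    mean_b = sum(slopes) / B                        # step 6: sample std deviation
    return (sum((b - mean_b) ** 2 for b in slopes) / (B - 1)) ** 0.5

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 8.1, 8.8, 10.2, 11.1]
print(residuals_bootstrap_se(x, y))
```

The key design feature of the method is visible in step 3: the x values are never resampled, only the residuals are, which is what ties the method to the homoscedasticity assumption discussed below.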
D.4.2 Bootstrapped Pairs Method

As before, we assume that a study or experiment resulted in n observations (y_1, x_1), ..., (y_n, x_n). The general method for the bootstrapped pairs method is

1. Randomly draw n pairs of values, y_i and x_i, with replacement. Denote these as y*_1, y*_2, ..., y*_n and x*_1, x*_2, ..., x*_n. As discussed earlier, the subscripts here do not necessarily match the subscripts in the original data set.
2. Using these values of y*_i and x*_i, calculate the first bootstrapped estimate of β by using standard OLS techniques. That is, b*_1 = (X*^T X*)^{-1} X*^T y*.
3. Steps 1 and 2 are repeated B times to get B estimates of β.
4. Use the B estimates b*_i, i = 1, ..., B, to calculate the sample standard deviation of β̂ using the formula

   s.e.(β̂) = sqrt[ Σ_{i=1}^{B} (b*_i − b̄*)² / (B − 1) ],

where b̄* = (1/B) Σ_{i=1}^{B} b*_i is the mean of the B pairs-method bootstrapped estimates of β.

Computationally, the bootstrapped pairs method is more straightforward. As discussed in Efron and Tibshirani (1993, p. 113), the bootstrapped residuals method imposes homoscedasticity because it delinks x_i from e_i. Therefore, if the homoscedasticity assumption is violated, then we should use the bootstrapped pairs method, which does not impose it. On the other hand, if we are very confident of homoscedasticity, then we can use the bootstrapped residuals method to get more precise estimates of the standard error of β̂. In fact, it can be shown that as B → ∞ the standard errors of the least squares estimates calculated using the bootstrapped residuals method converge to the square roots of the diagonal elements of the variance-covariance matrix σ̂² (X^T X)^{-1}.
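The pairs method differs only in what gets resampled: whole (y_i, x_i) pairs rather than residuals. A matching Python sketch for the one-predictor slope (again an illustration with our own helper names and made-up data):

```python
import random

def ols_slope(pairs):
    """Least squares slope from a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((px - mx) * (py - my) for px, py in pairs)
    sxx = sum((px - mx) ** 2 for px, _ in pairs)
    return sxy / sxx

def pairs_bootstrap_se(x, y, B=1000, seed=11):
    """Bootstrapped pairs method for the slope's standard error."""
    rng = random.Random(seed)
    data = list(zip(x, y))
    slopes = []
    for _ in range(B):
        resample = rng.choices(data, k=len(data))  # step 1: draw n pairs
        slopes.append(ols_slope(resample))         # step 2: refit by OLS
    mean_b = sum(slopes) / B                       # step 4: sample std deviation
    return (sum((b - mean_b) ** 2 for b in slopes) / (B - 1)) ** 0.5

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 8.1, 8.8, 10.2, 11.1]
print(pairs_bootstrap_se(x, y))
```

Because each pair carries its own error with it, no link between x_i and the error distribution is imposed; on homoscedastic, well-specified data the two sketches should give similar standard errors, consistent with the discussion above.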
D.4.3 Bootstrapped Regression Analysis in SAS

We will now illustrate the residuals and the pairs methods by using the %BOOT macro, which can be downloaded from the SAS Institute website. We will make use of the gasoline consumption data given in Table F2.1 of Greene (2003). The bootstrap macros (stored here in the file JACKBOOT.SAS) need to be called from within the program; the %include statement can be used for this purpose:

%include "C:\Temp\jackboot.sas";

The data set is then read into SAS and stored in a temporary SAS data set called gasoline. Notice that the raw data are stored in Excel format.

proc import out=gasoline
   datafile="c:\temp\gasoline"
   dbms=excel replace;
   getnames=yes;
run;

The following SAS data step statements simply transform the variables in the raw data by using log transformations:

data gasoline;
   set gasoline;
   Ln_G_Pop=log(G/Pop);
   Ln_pg=log(Pg);
   Ln_Income=log(Y/Pop);
   Ln_Pnc=log(Pnc);
   Ln_Puc=log(Puc);
run;

The following Proc Reg statements are used to run OLS regression on the original data set. The residuals from this fit are stored back in the temporary SAS data set gasoline and are labeled resid.

proc reg data=gasoline;
   model Ln_G_Pop=Ln_pg Ln_Income Ln_Pnc Ln_Puc;
   output out=gasoline r=resid p=pred;
run;

The following macro is required before invoking the bootstrap macros in the program jackboot.sas. The only inputs that require changes are the variable names in the model statement; the remaining statements can be used as is. See Sample JackKnife and Bootstrap Analyses from the SAS Institute for more details. The following code has been adapted from this publication and has been used with permission from the SAS Institute.

%macro analyze(data=,out=);
   options nonotes;
   proc reg data=&data noprint
      outest=&out(drop=y _IN_ _P_ _EDF_);
      model Ln_G_Pop=Ln_pg Ln_Income Ln_Pnc Ln_Puc;
      %bystmt;
   run;
   options notes;
%mend;

This portion of the code invokes the %boot macro within jackboot.sas and conducts a bootstrapped analysis by using the pairs method. Note that the root mean square error (_RMSE_) is not a plug-in estimator for σ, and therefore the bias correction for it is wrong. In other words, even though the mean square error is unbiased for σ², the root mean square error is not unbiased for σ. However, we choose to ignore this because the bias is minimal.

title2 'Resampling Observations - Pairs Method';
title3 '(Bias correction for _RMSE_ is wrong)';
%boot(data=gasoline, random=123);

This portion of the code invokes the %boot macro and conducts the bootstrapped analysis by using the residuals method.

title2 'Resampling Residuals - Residuals Method';
title3 '(Bias correction for _RMSE_ is wrong)';
%boot(data=gasoline, residual=resid, equation=y=pred+resid, random=123);

The analysis results are given in Outputs D.1 and D.2.
The first part of the output is from the analysis of the original data. We will skip any discussion of this portion of the output since OLS regression output from SAS was already discussed in detail in Chapter 2. The OLS output is followed by the output where bootstrapping is done by resampling pairs (Output D.1) and then by the output where the analysis was done using the residuals method (Output D.2).
[Output D.1 (listing not reproduced): "Resampling Observations (bias correction for _RMSE_ is wrong)." Proc Reg results for dependent variable Ln_G_Pop (36 observations read and used): analysis of variance, root MSE, R-square, and parameter estimates for Intercept, Ln_pg, Ln_Income, Ln_Pnc, and Ln_Puc; followed by the %boot pairs-method summary for each statistic, giving the observed statistic, bootstrap mean, bias, standard error, bias-corrected statistic, 95% bootstrap normal confidence limits, and minimum and maximum resampled estimates.]

OUTPUT D.1. Bootstrapped regression analysis (pairs method) of the gasoline consumption data.
[Output D.2 (listing not reproduced): "Resampling Residuals (bias correction for _RMSE_ is wrong)." The same Proc Reg results for dependent variable Ln_G_Pop (36 observations read and used), followed by the %boot residuals-method summary for each statistic, giving the observed statistic, bootstrap mean, bias, standard error, bias-corrected statistic, 95% bootstrap normal confidence limits, and minimum and maximum resampled estimates.]

OUTPUT D.2. Bootstrapped regression analysis (residuals method) of the gasoline consumption data.
The output consists of the OLS estimates in the first column, followed by the mean of the coefficients estimated from the 200 bootstrap samples. The third column gives the bias, which is simply the bootstrap mean minus the observed statistic. The standard errors calculated from the bootstrapped samples are given next. These are followed by the 95% confidence intervals, the bias-corrected statistics, and the minimum and maximum of the estimated coefficient values from the bootstrap samples. Notice that the bootstrap estimates of the coefficients and the standard errors are very similar to the OLS estimates. The similarity between the estimates obtained from the residuals method and the OLS estimates is especially remarkable. This is not surprising since, under the homoscedasticity assumption, it can be shown that as the number of bootstrapped samples increases, the estimated standard errors converge to the square roots of the diagonal elements of σ̂² (X^T X)^{-1}, where σ̂² is the estimate that is not corrected for bias.
In Class Review Exercises Vartanian: SW 540 1. Given the following output from an OLS model looking at income, what is the slope and intercept for those who are black and those who are not black? b SE
More informationCorrelation and the Analysis of Variance Approach to Simple Linear Regression
Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation
More informationInferences for Regression
Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In
More informationModel Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University
Model Selection, Estimation, and Bootstrap Smoothing Bradley Efron Stanford University Estimation After Model Selection Usually: (a) look at data (b) choose model (linear, quad, cubic...?) (c) fit estimates
More informationST505/S697R: Fall Homework 2 Solution.
ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)
More informationTopic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model
Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is
More informationAnalysis of Variance. Source DF Squares Square F Value Pr > F. Model <.0001 Error Corrected Total
Math 221: Linear Regression and Prediction Intervals S. K. Hyde Chapter 23 (Moore, 5th Ed.) (Neter, Kutner, Nachsheim, and Wasserman) The Toluca Company manufactures refrigeration equipment as well as
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and
More information11. Bootstrap Methods
11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods
More informationLECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity
LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists
More informationHOW TO TEST ENDOGENEITY OR EXOGENEITY: AN E-LEARNING HANDS ON SAS
How to Test Endogeneity or Exogeneity: An E-Learning Hands on SAS 1 HOW TO TEST ENDOGENEITY OR EXOGENEITY: AN E-LEARNING HANDS ON SAS *N. Uttam Singh, **Kishore K Das and *Aniruddha Roy *ICAR Research
More informationLab 07 Introduction to Econometrics
Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand
More informationTopic 14: Inference in Multiple Regression
Topic 14: Inference in Multiple Regression Outline Review multiple linear regression Inference of regression coefficients Application to book example Inference of mean Application to book example Inference
More informationCorrelation & Simple Regression
Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.
More informationEffect of Centering and Standardization in Moderation Analysis
Effect of Centering and Standardization in Moderation Analysis Raw Data The CORR Procedure 3 Variables: govact negemot Simple Statistics Variable N Mean Std Dev Sum Minimum Maximum Label govact 4.58699
More informationIES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc
IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared
More informationChapter 7: Simple linear regression
The absolute movement of the ground and buildings during an earthquake is small even in major earthquakes. The damage that a building suffers depends not upon its displacement, but upon the acceleration.
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationEconometrics Summary Algebraic and Statistical Preliminaries
Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L
More informationLab # 11: Correlation and Model Fitting
Lab # 11: Correlation and Model Fitting Objectives: 1. Correlations between variables 2. Data Manipulation, creation of squares 3. Model fitting with regression 4. Comparison of models Correlations between
More informationSTAT 3A03 Applied Regression Analysis With SAS Fall 2017
STAT 3A03 Applied Regression Analysis With SAS Fall 2017 Assignment 5 Solution Set Q. 1 a The code that I used and the output is as follows PROC GLM DataS3A3.Wool plotsnone; Class Amp Len Load; Model CyclesAmp
More informationChapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression
BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between
More informationApplied Time Series Notes ( 1) Dates. ñ Internal: # days since Jan 1, ñ Need format for reading, one for writing
Applied Time Series Notes ( 1) Dates ñ Internal: # days since Jan 1, 1960 ñ Need format for reading, one for writing ñ Often DATE is ID variable (extrapolates) ñ Program has lots of examples: options ls=76
More informationA Little Stats Won t Hurt You
A Little Stats Won t Hurt You Nate Derby Statis Pro Data Analytics Seattle, WA, USA Edmonton SAS Users Group, 11/13/09 Nate Derby A Little Stats Won t Hurt You 1 / 71 Outline Introduction 1 Introduction
More informationThe Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University
The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor
More informationChapter 2: Resampling Maarten Jansen
Chapter 2: Resampling Maarten Jansen Randomization tests Randomized experiment random assignment of sample subjects to groups Example: medical experiment with control group n 1 subjects for true medicine,
More informationSIMPLE LINEAR REGRESSION
SIMPLE LINEAR REGRESSION In linear regreion, we conider the frequency ditribution of one variable (Y) at each of everal level of a econd variable (). Y i known a the dependent variable. The variable for
More informationT-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum
T-test: means of Spock's judge versus all other judges 1 The TTEST Procedure Variable: pcwomen judge1 N Mean Std Dev Std Err Minimum Maximum OTHER 37 29.4919 7.4308 1.2216 16.5000 48.9000 SPOCKS 9 14.6222
More informationThe exact bootstrap method shown on the example of the mean and variance estimation
Comput Stat (2013) 28:1061 1077 DOI 10.1007/s00180-012-0350-0 ORIGINAL PAPER The exact bootstrap method shown on the example of the mean and variance estimation Joanna Kisielinska Received: 21 May 2011
More informationStatistical Inference with Regression Analysis
Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Steven Buck Lecture #13 Statistical Inference with Regression Analysis Next we turn to calculating confidence intervals and hypothesis testing
More informationInference via Kernel Smoothing of Bootstrap P Values
Queen s Economics Department Working Paper No. 1054 Inference via Kernel Smoothing of Bootstrap P Values Jeff Racine McMaster University James G. MacKinnon Queen s University Department of Economics Queen
More information1 Introduction to Minitab
1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you
More informationLecture notes on Regression & SAS example demonstration
Regression & Correlation (p. 215) When two variables are measured on a single experimental unit, the resulting data are called bivariate data. You can describe each variable individually, and you can also
More informationAutocorrelation or Serial Correlation
Chapter 6 Autocorrelation or Serial Correlation Section 6.1 Introduction 2 Evaluating Econometric Work How does an analyst know when the econometric work is completed? 3 4 Evaluating Econometric Work Econometric
More informationModel-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego
Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships
More informationSTAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis
STAT 3900/4950 MIDTERM TWO Name: Spring, 205 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis Instructions: You may use your books, notes, and SPSS/SAS. NO
More informationSection I. Define or explain the following terms (3 points each) 1. centered vs. uncentered 2 R - 2. Frisch theorem -
First Exam: Economics 388, Econometrics Spring 006 in R. Butler s class YOUR NAME: Section I (30 points) Questions 1-10 (3 points each) Section II (40 points) Questions 11-15 (10 points each) Section III
More informationFast and robust bootstrap for LTS
Fast and robust bootstrap for LTS Gert Willems a,, Stefan Van Aelst b a Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, B-2020 Antwerp, Belgium b Department of
More informationA Practitioner s Guide to Cluster-Robust Inference
A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode
More informationBootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator
Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos
More informationChapter 11. Analysis of Variance (One-Way)
Chapter 11 Analysis of Variance (One-Way) We now develop a statistical procedure for comparing the means of two or more groups, known as analysis of variance or ANOVA. These groups might be the result
More information2 Prediction and Analysis of Variance
2 Prediction and Analysis of Variance Reading: Chapters and 2 of Kennedy A Guide to Econometrics Achen, Christopher H. Interpreting and Using Regression (London: Sage, 982). Chapter 4 of Andy Field, Discovering
More informationFrom Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...
From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. Contents About This Book... xiii About The Author... xxiii Chapter 1 Getting Started: Data Analysis with JMP...
More informationSTAT Section 2.1: Basic Inference. Basic Definitions
STAT 518 --- Section 2.1: Basic Inference Basic Definitions Population: The collection of all the individuals of interest. This collection may be or even. Sample: A collection of elements of the population.
More informationSTAT 704 Sections IRLS and Bootstrap
STAT 704 Sections 11.4-11.5. IRLS and John Grego Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1 / 14 LOWESS IRLS LOWESS LOWESS (LOcally WEighted Scatterplot Smoothing)
More informationSTAT 350: Summer Semester Midterm 1: Solutions
Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.
More informationBig Data Analysis with Apache Spark UC#BERKELEY
Big Data Analysis with Apache Spark UC#BERKELEY This Lecture: Relation between Variables An association A trend» Positive association or Negative association A pattern» Could be any discernible shape»
More informationPLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis
PLS205!! Lab 9!! March 6, 2014 Topic 13: Covariance Analysis Covariable as a tool for increasing precision Carrying out a full ANCOVA Testing ANOVA assumptions Happiness! Covariable as a Tool for Increasing
More informationEconometrics. 4) Statistical inference
30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution
More informationNonparametric Methods II
Nonparametric Methods II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm 1 PART 3: Statistical Inference by
More informationSAS/STAT 15.1 User s Guide The GLMMOD Procedure
SAS/STAT 15.1 User s Guide The GLMMOD Procedure This document is an individual chapter from SAS/STAT 15.1 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute Inc.
More informationANALYSES OF NCGS DATA FOR ALCOHOL STATUS CATEGORIES 1 22:46 Sunday, March 2, 2003
ANALYSES OF NCGS DATA FOR ALCOHOL STATUS CATEGORIES 1 22:46 Sunday, March 2, 2003 The MEANS Procedure DRINKING STATUS=1 Analysis Variable : TRIGL N Mean Std Dev Minimum Maximum 164 151.6219512 95.3801744
More informationMath 3330: Solution to midterm Exam
Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the
More informationAnswer Keys to Homework#10
Answer Keys to Homework#10 Problem 1 Use either restricted or unrestricted mixed models. Problem 2 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean
More informationCase of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1
Mediation Analysis: OLS vs. SUR vs. ISUR vs. 3SLS vs. SEM Note by Hubert Gatignon July 7, 2013, updated November 15, 2013, April 11, 2014, May 21, 2016 and August 10, 2016 In Chap. 11 of Statistical Analysis
More informationBusiness Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM
Subject Business Economics Paper No and Title Module No and Title Module Tag 8, Fundamentals of Econometrics 3, The gauss Markov theorem BSE_P8_M3 1 TABLE OF CONTENTS 1. INTRODUCTION 2. ASSUMPTIONS OF
More information