Appendix D  INTRODUCTION TO BOOTSTRAP ESTIMATION

D.1 INTRODUCTION


Bootstrapping is a general, distribution-free method used to estimate parameters of interest from data collected in studies or experiments. It is often referred to as a resampling method because it is carried out by repeatedly drawing samples from the original data. This section introduces the basics of bootstrapping and extends them to bootstrapping in regression analysis. For a discussion of calculating bias or confidence intervals using bootstrapping, see Efron and Tibshirani (1993). Bootstrapping is a useful estimation technique when:

1. The formulas to be used for calculating estimates rest on assumptions that may not hold, are not well understood, cannot be verified, or are simply dubious.
2. The computational formulas hold only for large samples and are unreliable, or simply not valid, for small samples.
3. The computational formulas do not exist.

To begin the discussion of bootstrapping techniques, assume that a study or experiment produced a data set x_1, ..., x_n of size n. This is the simplest case, where the data are univariate. Most studies involve collecting data on several variables, as in regression analysis; however, we use this simple example to lay the groundwork for the elements of bootstrapping methods. Assume that the data set was generated by some underlying distribution f(u). Here, f(u) is the probability density function and may be either continuous or discrete. The true density function may be unknown, in which case the functional form of f(u) is also unknown. We are interested in estimating the parameter θ, which describes some feature of the population from which the data were collected. For instance, θ could be the true mean, median, proportion, variance, or standard deviation of the population.
Assume for the moment that we have a well-defined formula to calculate an estimate, θ̂, of θ, but that no formula exists for calculating a confidence interval for θ. In an ideal setting with unlimited resources, we could draw a large number of samples from the population, calculate θ̂ for each sample, and use the calculated values to construct an empirical distribution of θ̂, from which a confidence interval for θ could then be obtained. In reality, however, we have just a single sample, and that is the justification for using the bootstrapping method. The general idea behind bootstrapping is as follows (assuming that a study/experiment resulted in a data set of size n):

1. A sample of size n is drawn with replacement from the data set in hand.
2. An estimate, θ̂, of θ is calculated.

Applied Econometrics Using the SAS® System, by Vivek B. Ajmani. Copyright © 2009 John Wiley & Sons, Inc.

Steps 1 and 2 are repeated several times (sometimes thousands of repetitions are used) to generate a (simulated) distribution of θ̂. This simulated distribution is then used for making inferences about θ. As an example, suppose that we want to construct a 95% confidence interval for θ but do not have formulas for calculating the interval. We can use bootstrapping to construct it. The steps are as follows (Efron and Tibshirani, 1993):

1. Draw 1000 (as an example) bootstrap samples from the original data and calculate θ̂_1, ..., θ̂_1000, the estimates from each of the 1000 samples.
2. Sort these estimates in increasing order.
3. Calculate the 2.5th and 97.5th percentiles of the 1000 simulated values of θ̂. The 2.5th percentile is the average of the 25th and 26th ordered values, while the 97.5th percentile is the average of the 975th and 976th ordered values. That is,

   Lower confidence limit = (θ̂_(25) + θ̂_(26)) / 2,
   Upper confidence limit = (θ̂_(975) + θ̂_(976)) / 2.

Notice that we trimmed the lower 2.5% and the upper 2.5% of the simulated distribution of θ̂ to achieve the desired 95% confidence. Also note that we made no assumptions about the underlying distribution that generated the original data set.

We will now formalize the general bootstrapping method presented so far. Consider a random variable x with cumulative distribution F(x; θ). Here, θ is a vector of unknown parameters. For example, if the distribution of x is normal, then θ = (μ, σ²). Assume that we are interested in estimating θ, or some element of θ, that describes some aspect of f(x; θ), the distribution of x. That is, we may be interested in estimating the mean, the standard deviation, or the standard error of the mean. As before, we assume that a study/experiment resulted in a random sample x_1, ..., x_n of size n.
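The percentile-interval steps above can be sketched in a short program. The following Python fragment is an illustrative sketch added here (this appendix works in SAS); the data values and function name are invented for the example.

```python
import random
import statistics

def percentile_ci(data, estimator, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval, following steps 1-3 above."""
    rng = random.Random(seed)
    n = len(data)
    # Step 1: draw n_boot bootstrap samples (with replacement), estimate each.
    estimates = sorted(estimator(rng.choices(data, k=n)) for _ in range(n_boot))
    # Steps 2-3: with the estimates sorted, average the two order statistics
    # that straddle each percentile (25th/26th and 975th/976th for B=1000).
    lo = round(n_boot * alpha / 2)          # 25
    hi = round(n_boot * (1 - alpha / 2))    # 975
    lower = (estimates[lo - 1] + estimates[lo]) / 2
    upper = (estimates[hi - 1] + estimates[hi]) / 2
    return lower, upper

data = [196, 12, 280, 212, 52, 100, 206, 188, 100, 202]
lower, upper = percentile_ci(data, statistics.mean)
```

As in the text, no distributional assumption is made: the interval comes entirely from the simulated distribution of the estimator.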
We can use this sample to approximate the cumulative distribution, F(x; θ), with the empirical distribution function, F̂(x; θ). The estimate F̂(x; θ) can be written as

   F̂(x; θ) = (1/n) Σ_{i=1}^n I_(−∞, x](x_i),

where I is an indicator function that counts the number of x's in the original sample that fall in the interval (−∞, x]. This is better illustrated in Figure D.1, where the true distribution, F(x; θ), is given by the smooth line and the estimated function, F̂(x; θ), by the stepwise representation. The parameter vector θ, or elements of it, could be calculated exactly if the form of F(x; θ) were known.

FIGURE D.1. Plot comparing actual cumulative versus simulated cumulative distributions. (Graph reproduced with permission from Paul Glewwe, University of Minnesota.)
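The empirical distribution function above is simple enough to state in code. A minimal Python sketch (an illustration added here, with an invented sample):

```python
def ecdf(sample):
    """F-hat(x) = (1/n) * #{ x_i <= x }: the fraction of the sample at or below x."""
    n = len(sample)
    return lambda x: sum(1 for xi in sample if xi <= x) / n

F_hat = ecdf([3, 1, 4, 1, 5])
print(F_hat(0.5), F_hat(1), F_hat(4.5), F_hat(5))  # 0.0 0.4 0.8 1.0
```

Evaluating F̂ at increasing x traces out exactly the stepwise curve of Figure D.1: flat between observations, jumping by 1/n at each observed value.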

That is, if we knew the exact form of F(x; θ), then we could derive the probability density function, f(x; θ), or a function t(F) to calculate θ. However, assume that the functional form of F(x; θ) is unknown and that it was approximated with F̂(x; θ). One option we have is to replace F(x; θ) with F̂(x; θ) to get the estimated function t(F̂). We can then use t(F̂) to calculate an estimate, θ̂, of θ. The estimator θ̂ in this instance is called the plug-in estimator of θ (Efron and Tibshirani, 1993, p. 35). As an example, the plug-in estimator of the population mean

   μ_x = ∫_{−∞}^{∞} x f(x) dx

is the sample mean

   x̄ = (1/n) Σ_{i=1}^n x_i.

Notice that calculating the mean of x was trivial and did not require bootstrapping methods. In general, bootstrapping techniques are used to calculate standard errors and to construct confidence intervals without making any assumption about the underlying distribution from which the samples are drawn.

D.2 CALCULATING STANDARD ERRORS

We will now discuss how bootstrapping methods can be used to calculate an estimate of the standard error of the parameter of interest. Assume that we have an estimate of θ; that is, θ̂ was calculated from the original data set without the use of bootstrapping. Bootstrapping, however, will be used to calculate an estimate of the standard error of θ̂. The general method is as follows (again assuming a data set of size n) (Efron and Tibshirani, 1993, p. 45):

1. Draw B samples of size n with replacement from the original data set.
2. Calculate θ̂ for each of the samples from step 1. That is, we now have θ̂_1, ..., θ̂_B.
3. Calculate the standard error from the B estimates of θ by using the standard formula for the standard deviation.
That is,

   se_B(θ̂) = sqrt[ (1/(B − 1)) Σ_{i=1}^B (θ̂_i − θ̄)² ],

where θ̄ = (1/B) Σ_{i=1}^B θ̂_i is simply the mean of θ̂_1, ..., θ̂_B. In practice, B is set to a very large number.

D.3 BOOTSTRAPPING IN SAS

Bootstrapping can easily be programmed in SAS by using simple routines. SAS macros to calculate bootstrapped estimates are available for download from the SAS Institute. The macros can be used to calculate bootstrapped and jackknife estimates of the standard deviation and standard error, as well as bootstrapped confidence intervals. The macros can also be used to calculate bootstrapped estimates of coefficients in regression analysis. These macros need to be invoked from within SAS; we will illustrate their use a bit later. For now, we show how a simple program can be written to compute bootstrap estimates.

Consider a data set that consists of 10 values: 196, 12, 280, 212, 52, 100, 206, 188, 100, 202. We will calculate bootstrap estimates of the standard error for the mean. The following SAS statements can be used:

   data age_data;
      input age;
      cards;
   45

   ;
   run;

   data bootstrap;
      do index=1 to 500;
         do i=1 to nobs;
            x=ceil(ranuni(0)*nobs);   /* random row number in 1..nobs */
            set age_data nobs=nobs point=x;
            output;
         end;
      end;
      stop;
   run;

The following Proc Univariate statements will calculate the mean of each of the bootstrapped samples:

   proc univariate data=bootstrap noprint;
      var age;
      by index;
      output out=out1 mean=mean n=n;
   run;

Finally, the following Proc Univariate statements will calculate the standard deviation of the 500 bootstrapped means:

   proc univariate data=out1 noprint;
      var mean;
      output out=out2 n=n mean=mean std=se;
   run;

   proc print data=out2;
   run;

The analysis results in a mean and standard error of 27.6 and 6.8, respectively.

D.4 BOOTSTRAPPING IN REGRESSION ANALYSIS

Consider the standard linear regression model y_i = x_i'β + ε_i, where x_i and β are k × 1 column vectors and ε_i is random error. Assume that we have a data set comprising n pairs of observations (y_1, x_1), ..., (y_n, x_n). Assume that the conditional expectation E(ε_i | x_i) = 0. Furthermore, assume that we do not know F(ε | x), the cumulative distribution of ε. In general, F is assumed to be normal.
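Before turning to regression, the univariate bootstrap of Section D.3 can also be sketched outside SAS. The Python fragment below (an illustration, not the book's code) mirrors the same steps using the 10 values listed above; since the printed 27.6 and 6.8 come from the book's own age data, the numbers here will differ.

```python
import random
import statistics

def bootstrap_se_of_mean(data, n_boot=500, seed=123):
    """Mirror of the SAS program: 500 resamples of size n with replacement,
    the mean of each resample, then the standard deviation of those means."""
    rng = random.Random(seed)
    n = len(data)
    means = [statistics.mean(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.mean(means), statistics.stdev(means)

values = [196, 12, 280, 212, 52, 100, 206, 188, 100, 202]
boot_mean, boot_se = bootstrap_se_of_mean(values)
```

The bootstrap mean will sit close to the sample mean, and the standard deviation of the 500 resample means estimates the standard error of the mean.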

We will make use of the standard least squares estimator for β, namely β̂ = (X'X)⁻¹X'y, to calculate bootstrapped estimates. That is, just as the mean was calculated without the use of bootstrapping, we assume that the least squares estimate can be calculated without any need for bootstrapping. We are, however, interested in calculating the standard errors of β̂; that is, we assume that the formulas for calculating the standard errors are unknown, unreliable, or simply do not work for small samples. As shown in Chapter 1, the estimate of the variance of β̂ is Var(β̂ | X) = σ̂²(X'X)⁻¹, where σ̂² is estimated as

   σ̂² = (1/n) Σ_{i=1}^n (y_i − x_i'β̂)²

or

   σ̂² = (1/(n − k − 1)) Σ_{i=1}^n (y_i − x_i'β̂)².

Notice that the first version is not an unbiased estimator of σ², whereas the second version is. These versions are often referred to as the not-bias-corrected and bias-corrected versions, respectively. There are two bootstrapped methods (the pairs method and the residuals method) that are used to estimate the standard error of β̂ (Glewwe, 2006; Efron and Tibshirani, 1993, p. 113). The bootstrapped pairs method randomly selects pairs of y_i and x_i to calculate an estimate of ε_i, while the bootstrapped residuals method takes each x_i just once but then links it with a random draw of an estimate of ε. The next sections outline both methods.

D.4.1 Bootstrapped Residuals Method

As before, we assume that a study or experiment resulted in n observations (y_1, x_1), ..., (y_n, x_n). The general method for the bootstrapped residuals method is:

1. For each i, calculate an estimate, e_i, of ε_i. That is, e_i = y_i − x_i'β̂, where β̂ is the usual OLS estimator calculated from the original data.
2. Randomly draw n values of e_i (from step 1) with replacement. Denote the residuals in this sample as e*_1, e*_2, ..., e*_n.
Notice that the subscripts of the residuals in the selected sample are not the same as the subscripts of the residuals, e_i, calculated from the original sample. That is, in general e*_i ≠ e_i for i = 1, ..., n.

3. With the values of e*_i (from step 2), compute y*_i = x_i'β̂ + e*_i. Notice that the subscripts of x_i here match the subscripts of x_i in the original data set; that is, we are using each x_i only once. Notice also that, by construction of e*_i, y*_i ≠ y_i.
4. Using the calculated values of y*_i (from step 3), construct the vector y*. Then use X = [x_1 ... x_n]' and y* = [y*_1 ... y*_n]' to calculate β̂*_1, the first bootstrapped estimate of β. That is, β̂*_1 = (X'X)⁻¹X'y*.
5. Steps 2 through 4 are repeated B times (B is typically set to a large number) to get B estimates of β.
6. Use the B estimates (from step 5) to calculate the sample standard deviation of β̂ using the formula

   s.e.(β̂) = sqrt[ (1/(B − 1)) Σ_{i=1}^B (β̂*_i − β̄*)² ],

where

   β̄* = (1/B) Σ_{i=1}^B β̂*_i

is the mean of the B residuals-method bootstrapped estimates of β.
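The six steps above can be sketched as follows. This Python fragment is an illustration under simplifying assumptions (a single predictor plus intercept, invented data), not the book's SAS implementation:

```python
import random

def ols_fit(x, y):
    """Least-squares intercept and slope for a one-predictor model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

def residuals_bootstrap_se(x, y, n_boot=1000, seed=1):
    """Bootstrapped residuals method for the slope: resample the OLS
    residuals, rebuild y* = x'beta-hat + e*, and refit (steps 1-6)."""
    rng = random.Random(seed)
    b0, b1 = ols_fit(x, y)                                  # step 1
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    slopes = []
    for _ in range(n_boot):
        e_star = rng.choices(resid, k=len(x))               # step 2
        y_star = [b0 + b1 * xi + ei for xi, ei in zip(x, e_star)]  # step 3
        slopes.append(ols_fit(x, y_star)[1])                # steps 4-5
    mean_b = sum(slopes) / n_boot                           # step 6
    return (sum((b - mean_b) ** 2 for b in slopes) / (n_boot - 1)) ** 0.5

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
se_slope = residuals_bootstrap_se(x, y)
```

Note that each x_i is used exactly once per resample; only the residuals are redrawn.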

D.4.2 Bootstrapped Pairs Method

As before, we assume that a study or experiment resulted in n observations (y_1, x_1), ..., (y_n, x_n). The general method for the bootstrapped pairs method is:

1. Randomly draw n pairs of values, y_i and x_i, with replacement. Denote these as y*_1, y*_2, ..., y*_n and x*_1, x*_2, ..., x*_n. As discussed earlier, the subscripts here do not necessarily match the subscripts in the original data set.
2. Using these values of y*_i and x*_i, calculate the first bootstrapped estimate of β by using standard OLS techniques. That is, β̂*_1 = (X*'X*)⁻¹X*'y*.
3. Steps 1 and 2 are repeated B times (B is typically set to a large number) to get B estimates of β.
4. Use the B estimates β̂*_i, i = 1, ..., B, to calculate the sample standard deviation of β̂ using the formula

   s.e.(β̂) = sqrt[ (1/(B − 1)) Σ_{i=1}^B (β̂*_i − β̄*)² ],

where

   β̄* = (1/B) Σ_{i=1}^B β̂*_i

is the mean of the B pairs-method bootstrapped estimates of β.

Computationally, the bootstrapped pairs method is more straightforward. As discussed in Efron and Tibshirani (1993, p. 113), the bootstrapped residuals method imposes homoscedasticity because it delinks x_i from e_i. Therefore, if the homoscedasticity assumption is violated, we should use the bootstrapped pairs method, which does not impose it. On the other hand, if we are very confident of homoscedasticity, then we can use the bootstrapped residuals method to get more precise estimates of the standard error of β̂. In fact, it can be shown that, as B → ∞, the standard errors of the least squares estimates calculated using the bootstrapped residuals method converge to the square roots of the diagonal elements of the variance-covariance matrix σ̂²(X'X)⁻¹.
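The pairs method can be sketched the same way (again an illustrative Python fragment with invented data, not the book's SAS code). One practical detail: a resample in which all drawn x values coincide leaves the slope undefined, so such degenerate draws are skipped here.

```python
import random

def ols_slope(x, y):
    """Least-squares slope for a one-predictor model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
           sum((xi - mx) ** 2 for xi in x)

def pairs_bootstrap_se(x, y, n_boot=1000, seed=2):
    """Bootstrapped pairs method (steps 1-4): resample (y_i, x_i) pairs
    with replacement and refit OLS on each resample."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]          # step 1: draw n pairs
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:                                # degenerate draw: skip
            continue
        slopes.append(ols_slope(xs, [y[i] for i in idx]))   # steps 2-3: refit
    mean_b = sum(slopes) / len(slopes)                      # step 4
    return (sum((b - mean_b) ** 2 for b in slopes) / (len(slopes) - 1)) ** 0.5

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
se_pairs = pairs_bootstrap_se(x, y)
```

Because the (y, x) pairs are redrawn together, any heteroscedastic link between x_i and ε_i is preserved in each resample, which is why this method does not impose homoscedasticity.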
D.4.3 Bootstrapped Regression Analysis in SAS

We will now illustrate the residuals and pairs methods by using the %BOOT macro, which can be downloaded from the SAS Institute website. We will make use of the gasoline consumption data given in Table F2.1 of Greene (2003). The bootstrap macros (stored here in the file JACKBOOT.SAS) need to be called from within the program; the %include statement can be used for this purpose:

   %include "C:\Temp\jackboot.sas";

The data set is then read into SAS and stored in a temporary SAS data set called gasoline. Notice that the raw data are stored in Excel format.

   proc import out=gasoline datafile="c:\temp\gasoline"
      dbms=excel replace;
      getnames=yes;
   run;

The following SAS data step statements transform the variables in the raw data by using log transformations:

   data gasoline;
      set gasoline;
      Ln_G_Pop=log(G/Pop);

      Ln_pg=log(Pg);
      Ln_Income=log(Y/Pop);
      Ln_Pnc=log(Pnc);
      Ln_Puc=log(Puc);
   run;

The following Proc Reg statements run OLS regression on the original data set. The residuals are stored back in the temporary SAS data set gasoline and labeled resid.

   proc reg data=gasoline;
      model Ln_G_Pop=Ln_pg Ln_Income Ln_Pnc Ln_Puc;
      output out=gasoline r=resid p=pred;
   run;

The following macro is required before invoking the bootstrap macros in jackboot.sas. The only inputs that require changes are the variable names in the model statement; the remaining statements can be used as is. See "Sample JackKnife and Bootstrap Analyses" from the SAS Institute for more details. The following code has been adapted from that publication and is used with permission from the SAS Institute.

   %macro analyze(data=,out=);
      options nonotes;
      proc reg data=&data noprint
         outest=&out(drop=Y _IN_ _P_ _EDF_);
         model Ln_G_Pop=Ln_pg Ln_Income Ln_Pnc Ln_Puc;
         %bystmt;
      run;
      options notes;
   %mend;

This portion of the code invokes the %boot macro within jackboot.sas and conducts a bootstrapped analysis by using the pairs method. Note that the root mean square error (_RMSE_) is not a plug-in estimator for σ, and therefore the bias correction for it is wrong. In other words, even though the mean square error is unbiased for σ², the root mean square error is not unbiased for σ. We choose to ignore this because the bias is minimal.

   title2 'Resampling Observations - Pairs Method';
   title3 '(Bias correction for _RMSE_ is wrong)';
   %boot(data=gasoline, random=123);

This portion of the code invokes the %boot macro and conducts the bootstrapped analysis by using the residuals method.

   title2 'Resampling Residuals - Residuals Method';
   title3 '(Bias correction for _RMSE_ is wrong)';
   %boot(data=gasoline, residual=resid, equation=y=pred+resid, random=123);

The analysis results are given in Outputs D.1 and D.2.
The first part of the output is from the analysis of the original data. We will skip any discussion of this portion, as OLS regression output from SAS was discussed in detail in Chapter 2. The OLS output is followed by the output where bootstrapping is done by resampling pairs (Output D.1) and by the output where the analysis was done using the residuals method (Output D.2).

OUTPUT D.1. Bootstrapped regression analysis (pairs method) of the gasoline consumption data. [Table not reproduced: the OLS analysis of variance and parameter estimates for Ln_G_Pop on Ln_pg, Ln_Income, Ln_Pnc, and Ln_Puc (36 observations), followed, for each coefficient and for _RMSE_, by the observed statistic, bootstrap mean, bias, standard error, bias-corrected statistic, 95% bootstrap normal confidence limits, and minimum/maximum resampled estimates.]

OUTPUT D.2. Bootstrapped regression analysis (residuals method) of the gasoline consumption data. [Table not reproduced: the OLS analysis of variance and parameter estimates for Ln_G_Pop on Ln_pg, Ln_Income, Ln_Pnc, and Ln_Puc (36 observations), followed, for each coefficient and for _RMSE_, by the observed statistic, bootstrap mean, bias, standard error, bias-corrected statistic, 95% bootstrap normal confidence limits, and minimum/maximum resampled estimates.]

The output consists of the OLS estimates in the first column, followed by the mean of the coefficients estimated from the 200 bootstrap samples. The third column gives the bias, which is simply the bootstrap mean minus the observed statistic. The standard errors calculated from the bootstrapped samples are given next. These are followed by the 95% confidence intervals, the bias-corrected statistics, and the minimum and maximum of the estimated coefficient values from the bootstrap samples. Notice that the bootstrap estimates of the coefficients and the standard errors are very similar to the OLS estimates. The similarity between the residuals-method estimates and the OLS estimates is especially remarkable. This is not surprising, since under the homoscedasticity assumption it can be shown that, as the number of bootstrapped samples increases, the estimated standard errors converge to the square roots of the diagonal elements of σ̂²(X'X)⁻¹, where σ̂² is the estimate that is not corrected for bias.
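The convergence claim can be checked numerically. The sketch below (Python, invented single-predictor data, not the book's code) compares the residuals-method bootstrap standard error of the slope with the classical value from the not-bias-corrected variance estimate, sqrt[(RSS/n)/Sxx]; for large B the two agree closely.

```python
import random

def residual_boot_vs_classical(x, y, n_boot=5000, seed=7):
    """Compare the residuals-bootstrap SE of the slope with the classical
    formula sqrt((RSS/n)/Sxx), i.e. using the not-bias-corrected sigma-hat^2."""
    rng = random.Random(seed)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    classical = ((rss / n) / sxx) ** 0.5
    slopes = []
    for _ in range(n_boot):
        e_star = rng.choices(resid, k=n)                   # resample residuals
        y_star = [b0 + b1 * xi + ei for xi, ei in zip(x, e_star)]
        mys = sum(y_star) / n
        slopes.append(
            sum((xi - mx) * (yi - mys) for xi, yi in zip(x, y_star)) / sxx
        )
    mb = sum(slopes) / n_boot
    boot = (sum((b - mb) ** 2 for b in slopes) / (n_boot - 1)) ** 0.5
    return boot, classical

x = list(range(1, 11))
y = [1.2, 2.9, 5.4, 6.8, 9.1, 11.3, 12.7, 15.2, 17.1, 19.4]
boot_se, classical_se = residual_boot_vs_classical(x, y)
```

With B = 5000 the two standard errors typically differ only by Monte Carlo noise of a few percent.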


More information

Characterizing Forecast Uncertainty Prediction Intervals. The estimated AR (and VAR) models generate point forecasts of y t+s, y ˆ

Characterizing Forecast Uncertainty Prediction Intervals. The estimated AR (and VAR) models generate point forecasts of y t+s, y ˆ Characterizing Forecast Uncertainty Prediction Intervals The estimated AR (and VAR) models generate point forecasts of y t+s, y ˆ t + s, t. Under our assumptions the point forecasts are asymtotically unbiased

More information

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). Statistics 512: Solution to Homework#11 Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). 1. Perform the two-way ANOVA without interaction for this model. Use the results

More information

A better way to bootstrap pairs

A better way to bootstrap pairs A better way to bootstrap pairs Emmanuel Flachaire GREQAM - Université de la Méditerranée CORE - Université Catholique de Louvain April 999 Abstract In this paper we are interested in heteroskedastic regression

More information

SPECIAL TOPICS IN REGRESSION ANALYSIS

SPECIAL TOPICS IN REGRESSION ANALYSIS 1 SPECIAL TOPICS IN REGRESSION ANALYSIS Representing Nominal Scales in Regression Analysis There are several ways in which a set of G qualitative distinctions on some variable of interest can be represented

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

CHAPTER 2: Assumptions and Properties of Ordinary Least Squares, and Inference in the Linear Regression Model

CHAPTER 2: Assumptions and Properties of Ordinary Least Squares, and Inference in the Linear Regression Model CHAPTER 2: Assumptions and Properties of Ordinary Least Squares, and Inference in the Linear Regression Model Prof. Alan Wan 1 / 57 Table of contents 1. Assumptions in the Linear Regression Model 2 / 57

More information

5.3 Three-Stage Nested Design Example

5.3 Three-Stage Nested Design Example 5.3 Three-Stage Nested Design Example A researcher designs an experiment to study the of a metal alloy. A three-stage nested design was conducted that included Two alloy chemistry compositions. Three ovens

More information

In Class Review Exercises Vartanian: SW 540

In Class Review Exercises Vartanian: SW 540 In Class Review Exercises Vartanian: SW 540 1. Given the following output from an OLS model looking at income, what is the slope and intercept for those who are black and those who are not black? b SE

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Model Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University

Model Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University Model Selection, Estimation, and Bootstrap Smoothing Bradley Efron Stanford University Estimation After Model Selection Usually: (a) look at data (b) choose model (linear, quad, cubic...?) (c) fit estimates

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is

More information

Analysis of Variance. Source DF Squares Square F Value Pr > F. Model <.0001 Error Corrected Total

Analysis of Variance. Source DF Squares Square F Value Pr > F. Model <.0001 Error Corrected Total Math 221: Linear Regression and Prediction Intervals S. K. Hyde Chapter 23 (Moore, 5th Ed.) (Neter, Kutner, Nachsheim, and Wasserman) The Toluca Company manufactures refrigeration equipment as well as

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

11. Bootstrap Methods

11. Bootstrap Methods 11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods

More information

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists

More information

HOW TO TEST ENDOGENEITY OR EXOGENEITY: AN E-LEARNING HANDS ON SAS

HOW TO TEST ENDOGENEITY OR EXOGENEITY: AN E-LEARNING HANDS ON SAS How to Test Endogeneity or Exogeneity: An E-Learning Hands on SAS 1 HOW TO TEST ENDOGENEITY OR EXOGENEITY: AN E-LEARNING HANDS ON SAS *N. Uttam Singh, **Kishore K Das and *Aniruddha Roy *ICAR Research

More information

Lab 07 Introduction to Econometrics

Lab 07 Introduction to Econometrics Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand

More information

Topic 14: Inference in Multiple Regression

Topic 14: Inference in Multiple Regression Topic 14: Inference in Multiple Regression Outline Review multiple linear regression Inference of regression coefficients Application to book example Inference of mean Application to book example Inference

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

Effect of Centering and Standardization in Moderation Analysis

Effect of Centering and Standardization in Moderation Analysis Effect of Centering and Standardization in Moderation Analysis Raw Data The CORR Procedure 3 Variables: govact negemot Simple Statistics Variable N Mean Std Dev Sum Minimum Maximum Label govact 4.58699

More information

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared

More information

Chapter 7: Simple linear regression

Chapter 7: Simple linear regression The absolute movement of the ground and buildings during an earthquake is small even in major earthquakes. The damage that a building suffers depends not upon its displacement, but upon the acceleration.

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Lab # 11: Correlation and Model Fitting

Lab # 11: Correlation and Model Fitting Lab # 11: Correlation and Model Fitting Objectives: 1. Correlations between variables 2. Data Manipulation, creation of squares 3. Model fitting with regression 4. Comparison of models Correlations between

More information

STAT 3A03 Applied Regression Analysis With SAS Fall 2017

STAT 3A03 Applied Regression Analysis With SAS Fall 2017 STAT 3A03 Applied Regression Analysis With SAS Fall 2017 Assignment 5 Solution Set Q. 1 a The code that I used and the output is as follows PROC GLM DataS3A3.Wool plotsnone; Class Amp Len Load; Model CyclesAmp

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Applied Time Series Notes ( 1) Dates. ñ Internal: # days since Jan 1, ñ Need format for reading, one for writing

Applied Time Series Notes ( 1) Dates. ñ Internal: # days since Jan 1, ñ Need format for reading, one for writing Applied Time Series Notes ( 1) Dates ñ Internal: # days since Jan 1, 1960 ñ Need format for reading, one for writing ñ Often DATE is ID variable (extrapolates) ñ Program has lots of examples: options ls=76

More information

A Little Stats Won t Hurt You

A Little Stats Won t Hurt You A Little Stats Won t Hurt You Nate Derby Statis Pro Data Analytics Seattle, WA, USA Edmonton SAS Users Group, 11/13/09 Nate Derby A Little Stats Won t Hurt You 1 / 71 Outline Introduction 1 Introduction

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Chapter 2: Resampling Maarten Jansen

Chapter 2: Resampling Maarten Jansen Chapter 2: Resampling Maarten Jansen Randomization tests Randomized experiment random assignment of sample subjects to groups Example: medical experiment with control group n 1 subjects for true medicine,

More information

SIMPLE LINEAR REGRESSION

SIMPLE LINEAR REGRESSION SIMPLE LINEAR REGRESSION In linear regreion, we conider the frequency ditribution of one variable (Y) at each of everal level of a econd variable (). Y i known a the dependent variable. The variable for

More information

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum T-test: means of Spock's judge versus all other judges 1 The TTEST Procedure Variable: pcwomen judge1 N Mean Std Dev Std Err Minimum Maximum OTHER 37 29.4919 7.4308 1.2216 16.5000 48.9000 SPOCKS 9 14.6222

More information

The exact bootstrap method shown on the example of the mean and variance estimation

The exact bootstrap method shown on the example of the mean and variance estimation Comput Stat (2013) 28:1061 1077 DOI 10.1007/s00180-012-0350-0 ORIGINAL PAPER The exact bootstrap method shown on the example of the mean and variance estimation Joanna Kisielinska Received: 21 May 2011

More information

Statistical Inference with Regression Analysis

Statistical Inference with Regression Analysis Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Steven Buck Lecture #13 Statistical Inference with Regression Analysis Next we turn to calculating confidence intervals and hypothesis testing

More information

Inference via Kernel Smoothing of Bootstrap P Values

Inference via Kernel Smoothing of Bootstrap P Values Queen s Economics Department Working Paper No. 1054 Inference via Kernel Smoothing of Bootstrap P Values Jeff Racine McMaster University James G. MacKinnon Queen s University Department of Economics Queen

More information

1 Introduction to Minitab

1 Introduction to Minitab 1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you

More information

Lecture notes on Regression & SAS example demonstration

Lecture notes on Regression & SAS example demonstration Regression & Correlation (p. 215) When two variables are measured on a single experimental unit, the resulting data are called bivariate data. You can describe each variable individually, and you can also

More information

Autocorrelation or Serial Correlation

Autocorrelation or Serial Correlation Chapter 6 Autocorrelation or Serial Correlation Section 6.1 Introduction 2 Evaluating Econometric Work How does an analyst know when the econometric work is completed? 3 4 Evaluating Econometric Work Econometric

More information

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships

More information

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis STAT 3900/4950 MIDTERM TWO Name: Spring, 205 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis Instructions: You may use your books, notes, and SPSS/SAS. NO

More information

Section I. Define or explain the following terms (3 points each) 1. centered vs. uncentered 2 R - 2. Frisch theorem -

Section I. Define or explain the following terms (3 points each) 1. centered vs. uncentered 2 R - 2. Frisch theorem - First Exam: Economics 388, Econometrics Spring 006 in R. Butler s class YOUR NAME: Section I (30 points) Questions 1-10 (3 points each) Section II (40 points) Questions 11-15 (10 points each) Section III

More information

Fast and robust bootstrap for LTS

Fast and robust bootstrap for LTS Fast and robust bootstrap for LTS Gert Willems a,, Stefan Van Aelst b a Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, B-2020 Antwerp, Belgium b Department of

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Chapter 11. Analysis of Variance (One-Way)

Chapter 11. Analysis of Variance (One-Way) Chapter 11 Analysis of Variance (One-Way) We now develop a statistical procedure for comparing the means of two or more groups, known as analysis of variance or ANOVA. These groups might be the result

More information

2 Prediction and Analysis of Variance

2 Prediction and Analysis of Variance 2 Prediction and Analysis of Variance Reading: Chapters and 2 of Kennedy A Guide to Econometrics Achen, Christopher H. Interpreting and Using Regression (London: Sage, 982). Chapter 4 of Andy Field, Discovering

More information

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author... From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. Contents About This Book... xiii About The Author... xxiii Chapter 1 Getting Started: Data Analysis with JMP...

More information

STAT Section 2.1: Basic Inference. Basic Definitions

STAT Section 2.1: Basic Inference. Basic Definitions STAT 518 --- Section 2.1: Basic Inference Basic Definitions Population: The collection of all the individuals of interest. This collection may be or even. Sample: A collection of elements of the population.

More information

STAT 704 Sections IRLS and Bootstrap

STAT 704 Sections IRLS and Bootstrap STAT 704 Sections 11.4-11.5. IRLS and John Grego Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1 / 14 LOWESS IRLS LOWESS LOWESS (LOcally WEighted Scatterplot Smoothing)

More information

STAT 350: Summer Semester Midterm 1: Solutions

STAT 350: Summer Semester Midterm 1: Solutions Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.

More information

Big Data Analysis with Apache Spark UC#BERKELEY

Big Data Analysis with Apache Spark UC#BERKELEY Big Data Analysis with Apache Spark UC#BERKELEY This Lecture: Relation between Variables An association A trend» Positive association or Negative association A pattern» Could be any discernible shape»

More information

PLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis

PLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis PLS205!! Lab 9!! March 6, 2014 Topic 13: Covariance Analysis Covariable as a tool for increasing precision Carrying out a full ANCOVA Testing ANOVA assumptions Happiness! Covariable as a Tool for Increasing

More information

Econometrics. 4) Statistical inference

Econometrics. 4) Statistical inference 30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution

More information

Nonparametric Methods II

Nonparametric Methods II Nonparametric Methods II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm 1 PART 3: Statistical Inference by

More information

SAS/STAT 15.1 User s Guide The GLMMOD Procedure

SAS/STAT 15.1 User s Guide The GLMMOD Procedure SAS/STAT 15.1 User s Guide The GLMMOD Procedure This document is an individual chapter from SAS/STAT 15.1 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute Inc.

More information

ANALYSES OF NCGS DATA FOR ALCOHOL STATUS CATEGORIES 1 22:46 Sunday, March 2, 2003

ANALYSES OF NCGS DATA FOR ALCOHOL STATUS CATEGORIES 1 22:46 Sunday, March 2, 2003 ANALYSES OF NCGS DATA FOR ALCOHOL STATUS CATEGORIES 1 22:46 Sunday, March 2, 2003 The MEANS Procedure DRINKING STATUS=1 Analysis Variable : TRIGL N Mean Std Dev Minimum Maximum 164 151.6219512 95.3801744

More information

Math 3330: Solution to midterm Exam

Math 3330: Solution to midterm Exam Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the

More information

Answer Keys to Homework#10

Answer Keys to Homework#10 Answer Keys to Homework#10 Problem 1 Use either restricted or unrestricted mixed models. Problem 2 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean

More information

Case of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1

Case of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1 Mediation Analysis: OLS vs. SUR vs. ISUR vs. 3SLS vs. SEM Note by Hubert Gatignon July 7, 2013, updated November 15, 2013, April 11, 2014, May 21, 2016 and August 10, 2016 In Chap. 11 of Statistical Analysis

More information

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM Subject Business Economics Paper No and Title Module No and Title Module Tag 8, Fundamentals of Econometrics 3, The gauss Markov theorem BSE_P8_M3 1 TABLE OF CONTENTS 1. INTRODUCTION 2. ASSUMPTIONS OF

More information