Robust Outcome Analysis for Observational Studies Designed Using Propensity Score Matching


1 Bradley E. Huitema, Western Michigan University
Scott F. Kosten, PharmaNet/i3
Joseph W. McKean, Western Michigan University
The work of Kosten and McKean was partially supported by NIAAA Grant 1R21AA A1.

2 In estimating the treatment effect in an observational study, the treatment and control groups are likely to differ on many baseline covariates. If any of these covariates are correlated with the response variable, the difference in sample outcome means is likely to be a biased estimate of the true treatment effect. Propensity score matching can be used to redesign the study so that it provides meaningful comparison groups. Once these comparison groups are formed, an outcome analysis must be chosen.

3 Outcome Analyses
The Monte Carlo study by Hill and Reiter (2006) evaluated many outcome analyses, from simple matched-pairs tests to more complicated bootstrap and Hodges-Lehmann type methods.
Results:
Popular methods were inefficient, with relatively wide confidence intervals for the treatment effect.
Many of the methods were empirically invalid.
Each method performed poorly under some conditions.
There was no clear winner.

4 In this work, we introduce new outcome analyses called block-adjusted methods. The first is a block-adjusted method based on the least-squares (LS) fit. In our Monte Carlo study it performed well under normal error distributions for the response, but it performed poorly under distributions with heavier tails. Our second block-adjusted method is based on a robust rank-based (Wilcoxon) fit. It performed nearly as well as the LS fit when the errors have a normal distribution, and it was much more powerful than the LS procedure for the thicker-tailed distributions. In our Monte Carlo study these block-adjusted methods outperformed the methods in the Hill and Reiter study. We also show the results of these methods on a real data set.

5 Notation and Models
Each treated subject is matched to its closest control subject. The matching is done with replacement, so a control subject may be matched with more than one treated subject.
$n_t$ = number of treated subjects.
$n_{uc}$ = number of unique control subjects in the matching; so $n_{uc}$ = number of blocks.
$s_i$ = length of block $i$.
$N = \sum_{i=1}^{n_{uc}} s_i$ = total sample size.
$Y$ is the $N \times 1$ vector of all responses.
$X_c$ is the $N \times p$ matrix of covariates.
$I_i$ is the $N \times 1$ indicator vector for the $i$th block.
$T$ is the $N \times 1$ indicator vector for treatment.
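The notation above can be illustrated with a minimal sketch of nearest-neighbor matching with replacement on a one-dimensional propensity score. The function names (`match_with_replacement`, `block_summary`) and the toy scores are hypothetical, and the sketch assumes that a block consists of one unique matched control plus every treated subject matched to it, so that $s_i$ is one more than the number of treated subjects in block $i$.

```python
from collections import defaultdict

def match_with_replacement(ps_treated, ps_control):
    """Match each treated subject to the control with the closest propensity score.

    Returns a dict: control index -> list of treated indices matched to it.
    Each key is one unique matched control, i.e. one block (an assumption of
    this sketch about how blocks are formed).
    """
    blocks = defaultdict(list)
    for t_idx, ps_t in enumerate(ps_treated):
        # Nearest control on the score; ties go to the lowest control index.
        c_idx = min(range(len(ps_control)), key=lambda c: abs(ps_control[c] - ps_t))
        blocks[c_idx].append(t_idx)
    return dict(blocks)

def block_summary(blocks):
    """Return (n_t, n_uc, s, N) in the notation of the slide."""
    n_t = sum(len(treated) for treated in blocks.values())
    n_uc = len(blocks)
    # Block length = the control itself plus the treated subjects matched to it.
    s = [1 + len(treated) for treated in blocks.values()]
    N = sum(s)
    return n_t, n_uc, s, N

# Toy propensity scores (hypothetical values, for illustration only).
ps_treated = [0.62, 0.58, 0.30, 0.61]
ps_control = [0.10, 0.33, 0.60, 0.90]
blocks = match_with_replacement(ps_treated, ps_control)
n_t, n_uc, s, N = block_summary(blocks)
```

In this toy example three treated subjects share one control, which is exactly the situation that matching with replacement allows; unmatched controls do not enter the analysis sample.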

6 Let $\theta$ denote the treatment effect (the regression coefficient corresponding to the treatment indicator $T$).
Hypotheses:
$$H_0: \theta = 0 \text{ versus } H_A: \theta \neq 0. \quad (1)$$
Models
There is controversy over whether or not the model should include the covariates. This is a point of investigation in our study. The models are:
Design matrix with covariates: $X = [1_N \; T \; I_2 \cdots I_{n_{uc}} \; X_c]$. The model is
$$Y = X\beta + e. \quad (2)$$
Design matrix without covariates: $X^{*} = [1_N \; T \; I_2 \cdots I_{n_{uc}}]$. The model is
$$Y = X^{*}\beta^{*} + e. \quad (3)$$
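As a rough illustration of the two design matrices, the sketch below assembles the columns in the order intercept, treatment indicator, block indicators for blocks 2 through $n_{uc}$, and (optionally) the covariates. The function name `design_matrix` and the toy inputs are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def design_matrix(treat, block, covariates=None):
    """Columns: 1_N, T, block indicators I_2..I_nuc, and X_c if supplied."""
    treat = np.asarray(treat, dtype=float)
    block = np.asarray(block)
    labels = np.unique(block)  # sorted unique block labels
    # The first block's indicator is dropped; it is absorbed by the intercept.
    indicators = (block[:, None] == labels[None, 1:]).astype(float)
    columns = [np.ones((len(treat), 1)), treat[:, None], indicators]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float))
    return np.hstack(columns)

# Toy data: 2 blocks, 6 subjects, 2 covariates (hypothetical values).
treat = [0, 1, 1, 1, 0, 1]
block = [0, 0, 0, 0, 1, 1]
x_c = [[1.2, 0.7], [0.9, 1.1], [1.0, 1.4], [1.3, 0.8], [0.4, 1.9], [0.5, 2.0]]

X_with = design_matrix(treat, block, x_c)   # model with covariates
X_without = design_matrix(treat, block)     # model without covariates
```

With this column ordering the treatment effect is always the second coefficient, which is why the interval formulas refer to the second diagonal element of the inverse cross-product matrix.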

7 LS Block Adjusted Methods l and L
LS Block Adjusted Method l: Obtain the LS estimate of $\beta$ using the model with covariates,
$$\hat{\beta}_{LS} = \operatorname{Argmin} \sum_{j=1}^{N} [Y_j - x_j'\beta]^2. \quad (4)$$
The estimate of $\theta$ is $\hat{\theta}_{LS} = \hat{\beta}_{LS,2}$. The usual CI for $\theta$ is
$$\hat{\theta}_{LS} \pm z_{\alpha/2}\, \hat{\sigma} \sqrt{(X'X)^{-1}_{22}}, \quad (5)$$
where $(X'X)^{-1}_{22}$ is the second diagonal element of $(X'X)^{-1}$ and $\hat{\sigma}^2$ is the usual MSE.
The test of $H_0$ for the l method is: Reject $H_0$ if 0 is not in the confidence interval (5).
LS Block Adjusted Method L: Same as l, but use the design matrix without covariates, as in model (3).
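A minimal sketch of the LS interval (5) follows, assuming NumPy and a design matrix whose second column is the treatment indicator. The function name `ls_block_adjusted_ci` is hypothetical, and the simulated data are a simplified stand-in (random block labels rather than blocks produced by matching).

```python
import numpy as np
from statistics import NormalDist

def ls_block_adjusted_ci(X, y, alpha=0.05):
    """LS estimate of theta (second coefficient) and its z-based interval."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # usual MSE
    xtx_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(sigma2 * xtx_inv[1, 1])      # second diagonal element
    z = NormalDist().inv_cdf(1 - alpha / 2)
    theta = beta[1]
    return theta, (theta - z * se, theta + z * se)

# Simplified illustration: 5 blocks, true treatment effect 4, normal errors.
rng = np.random.default_rng(0)
n = 200
treat = rng.integers(0, 2, size=n).astype(float)
block = rng.integers(0, 5, size=n)
X = np.column_stack([np.ones(n), treat] + [(block == b).astype(float) for b in range(1, 5)])
y = 4.0 * treat + 0.3 * block + rng.normal(size=n)

theta_hat, ci = ls_block_adjusted_ci(X, y)
reject_h0 = not (ci[0] <= 0.0 <= ci[1])       # test of H_0 via the interval
```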

8 Wilcoxon Block Adjusted Methods w and W
Instead of the Euclidean ($L_2$) norm, the Wilcoxon procedures use the norm based on the dispersion function $D(\beta)$:
$$D(\beta) = \sum_{j=1}^{N} \left[ R(Y_j - x_j'\beta) - (N+1)/2 \right] (Y_j - x_j'\beta), \quad (6)$$
where $R(Y_j - x_j'\beta)$ denotes the rank of $Y_j - x_j'\beta$. The R-estimate of $\beta$ minimizes this dispersion function; see Kloke and McKean (2012) for R software.
$D(\beta)$ is invariant to the intercept parameter, so the intercept is estimated separately, usually by the median of the residuals.
This estimate was proposed by Jaeckel (1972) and is discussed in detail in Chapters 3-5 of Hettmansperger and McKean (2011).
The estimate is highly efficient relative to LS for normal errors and is more efficient than LS for error distributions with thicker tails than the normal.
A simple weighting scheme (HBR estimator) can attain up to a 50% breakdown point.
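The dispersion function (6) is easy to evaluate directly. The sketch below is an illustration only, not the R software cited above: it computes the dispersion from a list of residuals (assuming no ties), checks the intercept invariance numerically, and finds a single slope by a crude grid search. The names `wilcoxon_dispersion` and `dispersion_at`, the grid, and the toy data with one gross outlier are all hypothetical.

```python
def wilcoxon_dispersion(residuals):
    """Jaeckel's dispersion: sum_j [R(e_j) - (N+1)/2] * e_j, ranks 1..N (no ties assumed)."""
    n = len(residuals)
    order = sorted(range(n), key=lambda j: residuals[j])
    ranks = [0] * n
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return sum((ranks[j] - (n + 1) / 2) * residuals[j] for j in range(n))

# Toy simple-regression data with one gross outlier at x = 3.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 2.1, 3.9, 30.0, 8.0]

def dispersion_at(slope):
    return wilcoxon_dispersion([yj - slope * xj for xj, yj in zip(x, y)])

# Crude grid search over slopes 0.00, 0.01, ..., 10.00 (a real fit would use
# a proper optimizer; the dispersion is convex and piecewise linear in slope).
grid = [i / 100 for i in range(0, 1001)]
slope_hat = min(grid, key=dispersion_at)

# Shifting every residual by a constant leaves the dispersion unchanged,
# which is why the intercept has to be estimated separately.
resid = [yj - slope_hat * xj for xj, yj in zip(x, y)]
shifted = [e + 10.0 for e in resid]
```

Because the centered ranks sum to zero, adding a constant to all residuals cannot change the value of the dispersion; this is the invariance to the intercept noted above.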

9 Wilcoxon Block Adjusted Method w
The Wilcoxon estimate is
$$\hat{\beta}_W = \operatorname{Argmin} D(\beta). \quad (7)$$
The Wilcoxon estimate of $\theta$ is $\hat{\theta}_W = \hat{\beta}_{W,2}$. The usual CI for $\theta$ is
$$\hat{\theta}_W \pm z_{\alpha/2}\, \hat{\tau} \sqrt{(X'X)^{-1}_{22}}, \quad (8)$$
where $(X'X)^{-1}_{22}$ is the second diagonal element of $(X'X)^{-1}$ and the scale parameter $\tau$ is estimated as discussed in Kosten et al. (2012).
The test of $H_0$ for the w method is: Reject $H_0$ if 0 is not in the confidence interval (8).
Wilcoxon Block Adjusted Method W: Same as w, but use the design matrix without covariates, as in model (3).

10 Monte Carlo Investigation
Our study is similar to that of Hill and Reiter (2006). Their study, however, included only normal errors, whereas we added several heavier-tailed error distributions to investigate the robustness of the methods.
For all situations:
A single treatment and a control plus two covariates were employed.
Matching based on the propensity scores is done with replacement.
For most situations about 150 treated subjects and 350 control subjects were generated.
The basic response surface is
$$Y = \theta c + x_1 + 2x_2 + e, \quad (9)$$
where $c$ is either 0 or 1 depending on whether $Y$ is a control or a treated response, $x_1$ and $x_2$ are continuous covariates, and $e$ is the error term. Hence, the parameter $\theta$ is the treatment effect. For the study, we set $\theta = 4$, the same value used by Hill and Reiter (2006).
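The response surface (9) translates directly into code. The sketch below is a minimal illustration using only the standard library; the function name `simulate_responses`, the seed, and the use of N(1, 1) covariates with standard normal errors for all subjects are assumptions made for the example.

```python
import random

def simulate_responses(treated, x1, x2, errors, theta=4.0):
    """Y = theta * c + x1 + 2 * x2 + e, with c = 1 for treated and 0 for control."""
    return [theta * c + a + 2 * b + e for c, a, b, e in zip(treated, x1, x2, errors)]

# About 150 treated and 350 control subjects, as in most situations of the study.
rng = random.Random(2012)                     # hypothetical seed
treated = [1] * 150 + [0] * 350
x1 = [rng.gauss(1.0, 1.0) for _ in treated]
x2 = [rng.gauss(1.0, 1.0) for _ in treated]
errors = [rng.gauss(0.0, 1.0) for _ in treated]
y = simulate_responses(treated, x1, x2, errors)

# With identical covariates and no error, a treated and a control response
# differ by exactly theta.
pair = simulate_responses([1, 0], [1.0, 1.0], [2.0, 2.0], [0.0, 0.0])
```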

11 Misspecified Response Surfaces
Besides this response surface, Hill and Reiter (2006) considered two other response surfaces where the fitted model is misspecified. Our fully adjusted procedures l and w, however, are based on full model fits. It is thus easy to use their associated residual analyses to diagnose misspecified models and, hence, ultimately to fit more appropriate models. We show this later in an example.

12 Two main factors of the study:
Error Distributions. Normal distribution; a contaminated normal distribution with contamination at 20% and ratio (contaminated to good) set at 4; and a Cauchy distribution.
Degree of overlap between treated and control subjects.
a Strong Overlap (SO). All covariates (for both Treated and Control) are drawn iid from N(1, 1). This allows for very close matches.
b Moderate Overlap (MO). Covariates for Treated subjects are drawn as in SO, while only 150 covariates for Control are drawn this way. The remaining Control covariates are iid N(3, 1) (these are called distracters).

13 c Weak Overlap (WO). Similar to MO, except that now only 50 Control covariates are drawn from N(1, 1) while 300 are drawn from N(3, 1).
d Uneven Overlap (UO). The probability of the covariates being assigned to Treatment or Control depends on the region the covariates fall into, so there is good matching in some regions but poor matching in others; see Kosten et al. (2012).
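The two simulation factors can be sketched with the standard library alone. Everything named here (`draw_error`, `control_covariate_means`, the string labels) is hypothetical. The sketch assumes that the contamination "ratio" of 4 is a ratio of standard deviations, which the slides do not state explicitly, and it omits the uneven overlap setting because its assignment rule is given only in Kosten et al. (2012).

```python
import math
import random

def draw_error(dist, rng, eps=0.20, ratio=4.0):
    """Draw one error from the normal, contaminated normal, or Cauchy distribution."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "cn":
        # With probability eps the draw comes from the contaminating component.
        # ASSUMPTION: `ratio` is the standard-deviation ratio (contaminated to good).
        sd = ratio if rng.random() < eps else 1.0
        return rng.gauss(0.0, sd)
    if dist == "cauchy":
        # Inverse-CDF draw from the standard Cauchy distribution.
        return math.tan(math.pi * (rng.random() - 0.5))
    raise ValueError(f"unknown error distribution: {dist}")

def control_covariate_means(setting, n_control=350):
    """Mean of each control's covariate distribution (variance 1 throughout)."""
    n_close = {"SO": n_control, "MO": 150, "WO": 50}[setting]
    # Controls beyond the first n_close are the distracters centered at 3.
    return [1.0] * n_close + [3.0] * (n_control - n_close)

rng = random.Random(0)
cn_errors = [draw_error("cn", rng) for _ in range(1000)]
wo_means = control_covariate_means("WO")
wo_covariates = [rng.gauss(m, 1.0) for m in wo_means]
```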

14 Methods Investigated
More detailed descriptions of these methods can be found in the manuscript by Kosten et al. (2012).
a Two LS block adjusted methods, l and L.
b Two Wilcoxon block adjusted methods, w and W.
c Matched Pairs Method (M). This is the usual paired t.
Weighted Two Sample Method (T). Same estimator as M, i.e., the paired mean difference, but the variance is weighted to account for the number of treated subjects a control is matched to.
d Weighted LS Methods (s, S). A weighted LS fit with weights similar to those of the last method. The s method uses the design matrix $[1_N \; T \; X_c]$. The S method uses the design matrix $[1_N \; T]$.

15 e Robust Sandwich Variance Methods (r, R). This estimate is described by Huber (1967). A diagonal matrix based on the residuals and weights is used as the sandwich. The method r uses the design matrix with the covariates, while R uses the one without covariates.
f Bootstrap Methods (d, D, b, B). For each bootstrap, the Treated and Control subjects were each resampled (with replacement), then the matches were obtained with replacement, and the selected outcome method was used to estimate the effect. The number of bootstraps was set at B = 1000. The d-methods use the variance of the resampled bootstrap estimates. The method d is based on the weighted LS fit with the design including the covariates; the method D is based on the weighted LS fit with the design excluding the covariates. Methods b and B are similar, but the bootstrap percentile confidence interval is used.
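The bootstrap scheme described above (resample each group, re-match, re-estimate) can be sketched as follows. To keep the example short, the weighted LS fit used by methods d, D, b, and B is swapped for a plain matched-pairs mean difference, so this is not the study's estimator. The function names, the single matching score, the simple empirical quantiles, and the toy data are all hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def matched_mean_difference(treated, control):
    """treated/control: lists of (score, y). Mean of y_t minus y of the nearest control."""
    diffs = []
    for score_t, y_t in treated:
        _, y_c = min(control, key=lambda c: abs(c[0] - score_t))  # match with replacement
        diffs.append(y_t - y_c)
    return mean(diffs)

def bootstrap_cis(treated, control, n_boot=1000, alpha=0.05, seed=0):
    """Return the point estimate, a variance-based CI, and a percentile CI."""
    rng = random.Random(seed)
    point = matched_mean_difference(treated, control)
    estimates = []
    for _ in range(n_boot):
        t_star = rng.choices(treated, k=len(treated))   # resample treated
        c_star = rng.choices(control, k=len(control))   # resample controls
        estimates.append(matched_mean_difference(t_star, c_star))  # re-match, re-estimate
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = stdev(estimates)
    variance_ci = (point - z * se, point + z * se)
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return point, variance_ci, (lo, hi)

# Small toy data set with a true effect of 4 (kept small so the sketch runs quickly).
demo_rng = random.Random(1)
control = [(demo_rng.random(), demo_rng.gauss(0.0, 1.0)) for _ in range(60)]
treated = [(demo_rng.random(), 4.0 + demo_rng.gauss(0.0, 1.0)) for _ in range(30)]
point, variance_ci, percentile_ci = bootstrap_cis(treated, control, n_boot=200)
```

The variance-based interval corresponds in spirit to the d-type methods and the percentile interval to the b-type methods; only the estimator inside the loop would change for the weighted LS versions.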

16 Table: Methods in the Simulation.

| Method | Label |
|---|---|
| Matched Pairs | M |
| Weighted Two-Sample | T |
| WLS (Non-Covariate Adjusted) | S |
| WLS (Covariate Adjusted) | s |
| WLS Robust Sandwich Variance (Non-Covariate Adjusted) | R |
| WLS Robust Sandwich Variance (Covariate Adjusted) | r |
| Bootstrap - Variance (Non-Covariate Adjusted) | D |
| Bootstrap - Variance (Covariate Adjusted) | d |
| Bootstrap - Percentile (Non-Covariate Adjusted) | B |
| Bootstrap - Percentile (Covariate Adjusted) | b |
| Hodges-Lehmann Aligned Rank (Non-Covariate Adjusted) | H |
| Hodges-Lehmann Aligned Rank (Covariate Adjusted) | h |
| Block-Adjusted Least Squares (Non-Covariate Adjusted) | L |
| Block-Adjusted Least Squares (Covariate Adjusted) | l |
| Block-Adjusted Wilcoxon (Non-Covariate Adjusted) | W |
| Block-Adjusted Wilcoxon (Covariate Adjusted) | w |

17 Results of the Monte Carlo Study
There are 12 situations: 3 distributions × 4 degrees of overlap.
For each situation, 10,000 simulations were run.
For each method, its outcome analysis is based on its confidence interval for the effect.
Nominal confidence was set at 95%.
Validity
For each method, its validity is based on the empirical coverage of its confidence interval. We deemed a method to be valid for a situation if its empirical confidence is between 93% and 97%. On this basis:
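The validity criterion above amounts to a coverage count followed by a range check. A minimal sketch, with hypothetical function names and made-up intervals:

```python
def empirical_coverage(intervals, theta):
    """Fraction of simulated confidence intervals (lo, hi) that contain theta."""
    hits = sum(1 for lo, hi in intervals if lo <= theta <= hi)
    return hits / len(intervals)

def classify_validity(coverage, lower=0.93, upper=0.97):
    """Valid if coverage lies in [lower, upper]; otherwise liberal or conservative."""
    if coverage < lower:
        return "liberal"        # intervals cover too rarely
    if coverage > upper:
        return "conservative"   # intervals cover too often
    return "valid"

# Made-up example: 95 intervals contain theta = 4 and 5 do not.
intervals = [(3.0, 5.0)] * 95 + [(5.0, 6.0)] * 5
coverage = empirical_coverage(intervals, theta=4.0)
status = classify_validity(coverage)
```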

18 Results of the Validity Study
None of the bootstrap procedures was valid. They were far too conservative.
The Weighted Two-Sample (T) method is conservative for 7 situations.
The Sandwich Two-Sample (R) method is conservative for 8 situations.
The Weighted LS (S) method is conservative for 4 situations and liberal for 4 situations.
The Weighted LS (s) method is liberal for 8 situations.
As attested by their empirical coverages in the following tables, the remaining procedures (r, H, h, l, L, W, w) are valid for almost all the situations.
Efficiency
The efficiency for two procedures is the ratio of the mean lengths of their confidence intervals. This measure was obtained for each situation and is tabled by degree of overlap.

19 For each procedure at each situation, the tabled ratio is the mean length of the confidence interval of the l method (LS block-adjusted with covariates in the design matrix) divided by the mean length of the procedure's confidence interval. Thus ratios greater than 1 mean that the procedure is more efficient than the l procedure.

Table: Simulation results for the strong overlap (SO) setting over all error distributions.

| Meth | Normal Eff | Normal Conf | CN Eff | CN Conf | Cauchy Eff | Cauchy Conf |
|---|---|---|---|---|---|---|
| r | | | | | | |
| H | | | | | | |
| h | | | | | | |
| L | | | | | | |
| l | | | | | | |
| W | | | | | | |
| w | | | | | | |
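The tabled efficiency described above is a ratio of mean interval lengths with the l method in the numerator. A minimal sketch, with hypothetical function names and made-up intervals:

```python
def mean_length(intervals):
    """Mean length of a collection of confidence intervals (lo, hi)."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)

def efficiency_vs_reference(reference_intervals, method_intervals):
    """Mean length of the reference (l) intervals over that of the method's intervals."""
    return mean_length(reference_intervals) / mean_length(method_intervals)

# Made-up intervals: the method's intervals are half as long as the reference ones,
# so its ratio is above 1 and it counts as more efficient than the reference.
reference = [(0.0, 2.0), (1.0, 3.0)]
method = [(0.0, 1.0), (0.5, 1.5)]
ratio = efficiency_vs_reference(reference, method)
```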

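20 Table: Simulation results for the moderate overlap (MO) setting over all error distributions.

| Meth | Normal Eff | Normal Conf | CN Eff | CN Conf | Cauchy Eff | Cauchy Conf |
|---|---|---|---|---|---|---|
| r | | | | | | |
| H | | | | | | |
| h | | | | | | |
| L | | | | | | |
| l | | | | | | |
| W | | | | | | |
| w | | | | | | |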

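21 Table: Simulation results for the weak overlap (WO) setting over all error distributions.

| Meth | Normal Eff | Normal Conf | CN Eff | CN Conf | Cauchy Eff | Cauchy Conf |
|---|---|---|---|---|---|---|
| r | | | | | | |
| H | | | | | | |
| h | | | | | | |
| L | | | | | | |
| l | | | | | | |
| W | | | | | | |
| w | | | | | | |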

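22 Table: Simulation results for the uneven overlap (UO) setting over all error distributions.

| Meth | Normal Eff | Normal Conf | CN Eff | CN Conf | Cauchy Eff | Cauchy Conf |
|---|---|---|---|---|---|---|
| r | | | | | | |
| H | | | | | | |
| h | | | | | | |
| L | | | | | | |
| l | | | | | | |
| W | | | | | | |
| w | | | | | | |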

23 Conclusions on Efficiency
The r (robust sandwich) procedure is inefficient. Even the l procedure dominates it for the contaminated normal situations.
Between h and H, the H procedure is more efficient than the h procedure for the nonnormal situations. Evidently, in the presence of heavy-tailed error distributions, the less LS estimation the better.
The W procedure clearly dominates the H procedure.
The w procedure, however, is definitely superior to all of the procedures over the nonnormal situations. It is clearly the most robust procedure in the study. Furthermore, for the normal situations, its efficiency is only slightly less than that of h and l.

24 Learning to Learn (LTL) Data Set
The purpose of the study was to investigate the effect of the LTL program on grade point average (GPA) and graduation rate.
The study was conducted from the fall semester of 1987 to the winter semester of students that participated in the LTL program.
The study also collected data on control subjects.
Response: the student's last known college GPA.
Covariates: Gender; Race (Caucasian or otherwise); Age; Alpha program participation (1 = yes and 0 = no); overall ACT score; English ACT score; Math ACT score; Reading ACT score; Science ACT score; high school GPA; and entry year into the study.
One-to-one matching with replacement using propensity scores resulted in 294 matched controls. The next table shows how much the treatment group differs from the full set of controls and how close the matched groups are:

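25 Table: Baseline Covariates for the Treated (LTL), Full Control Groups, and Matched Controls

| Variable | Treatment | All Controls | Matched Controls |
|---|---|---|---|
| No. Obs | | | |
| Gender | | | |
| Race | | | |
| Entry Age | | | |
| Alpha | | | |
| ACT | | | |
| Eng. ACT | | | |
| Math ACT | | | |
| Read ACT | | | |
| Sci. ACT | | | |
| HS GPA | | | |
| Entry | | | |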

26 Results of the Valid Methods for the Treatment Effect
Table: Point Estimates and Confidence Intervals for the Seven Valid Methods

| Method | Estimate | 95% CI |
|---|---|---|
| r | | ( , ) |
| H | | ( , ) |
| h | | ( , ) |
| L | | (0.0014, ) |
| l | | (0.0034, ) |
| W | | ( , ) |
| w | | ( , ) |

All methods except the LS-based L and l conclude that the LTL treatment was ineffective at changing GPA.

27 The discrepancy between the LS-based and robust Wilcoxon-based methods was investigated by a robust diagnostic analysis based on the w-fit; see McKean and Sheather (2009). Numerous outliers in the data led to the difference between the l and w procedures. Further, outliers were detected in factor (covariate) space. Hence, a robust high breakdown (HBR) fit was used. The high breakdown estimate of the effect and its confidence interval confirm the w analysis. In summary, based on our diagnostic analysis, it appears that the outliers in both the response and factor spaces impaired the l analysis. Also, in light of the high breakdown estimate and confidence interval for the effect, it appears that the analysis obtained from the w procedure is valid.

28 Conclusion
Overall, method w has been shown to be the most versatile method for constructing a confidence interval for a treatment effect in an observational study.
Method w was the most efficient estimator for heavy-tailed error distributions. It was very close to the most efficient when the errors were normal, while still providing near-optimal coverage of the true treatment effect.
Further, method w lends itself to a robust diagnostic residual analysis, which checks quality of fit and identifies outliers in both response and covariate space.
The nonrobust l method is a desirable approach if used in the context of well-behaved distributions.

Robust Interval Estimation of a Treatment Effect in Observational Studies Using Propensity Score Matching

Robust Interval Estimation of a Treatment Effect in Observational Studies Using Propensity Score Matching Western Michigan University ScholarWorks at WMU Dissertations Graduate College 12-2010 Robust Interval Estimation of a Treatment Effect in Observational Studies Using Propensity Score Matching Scott F.

More information

Computational rank-based statistics

Computational rank-based statistics Article type: Advanced Review Computational rank-based statistics Joseph W. McKean, joseph.mckean@wmich.edu Western Michigan University Jeff T. Terpstra, jeff.terpstra@ndsu.edu North Dakota State University

More information

Rank-Based Estimation and Associated Inferences. for Linear Models with Cluster Correlated Errors

Rank-Based Estimation and Associated Inferences. for Linear Models with Cluster Correlated Errors Rank-Based Estimation and Associated Inferences for Linear Models with Cluster Correlated Errors John D. Kloke Bucknell University Joseph W. McKean Western Michigan University M. Mushfiqur Rashid FDA Abstract

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression

robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression Robust Statistics robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

Diagnostic Procedures

Diagnostic Procedures Diagnostic Procedures Joseph W. McKean Western Michigan University Simon J. Sheather Texas A&M University Abstract Diagnostic procedures are used to check the quality of a fit of a model, to verify the

More information

On Robustification of Some Procedures Used in Analysis of Covariance

On Robustification of Some Procedures Used in Analysis of Covariance Western Michigan University ScholarWorks at WMU Dissertations Graduate College 5-2010 On Robustification of Some Procedures Used in Analysis of Covariance Kuanwong Watcharotone Western Michigan University

More information

Joseph W. McKean 1. INTRODUCTION

Joseph W. McKean 1. INTRODUCTION Statistical Science 2004, Vol. 19, No. 4, 562 570 DOI 10.1214/088342304000000549 Institute of Mathematical Statistics, 2004 Robust Analysis of Linear Models Joseph W. McKean Abstract. This paper presents

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

The Nonparametric Bootstrap

The Nonparametric Bootstrap The Nonparametric Bootstrap The nonparametric bootstrap may involve inferences about a parameter, but we use a nonparametric procedure in approximating the parametric distribution using the ECDF. We use

More information

Chapter 15 Confidence Intervals for Mean Difference Between Two Delta-Distributions

Chapter 15 Confidence Intervals for Mean Difference Between Two Delta-Distributions Chapter 15 Confidence Intervals for Mean Difference Between Two Delta-Distributions Karen V. Rosales and Joshua D. Naranjo Abstract Traditional two-sample estimation procedures like pooled-t, Welch s t,

More information

Contents 1. Contents

Contents 1. Contents Contents 1 Contents 1 One-Sample Methods 3 1.1 Parametric Methods.................... 4 1.1.1 One-sample Z-test (see Chapter 0.3.1)...... 4 1.1.2 One-sample t-test................. 6 1.1.3 Large sample

More information

Exploring data sets using partial residual plots based on robust fits

Exploring data sets using partial residual plots based on robust fits L χ -Statistical Procedures and Related Topics IMS Lecture Notes - Monograph Series (1997) Volume 31 Exploring data sets using partial residual plots based on robust fits Joseph W. McKean Western Michigan

More information

Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?

Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful? Journal of Modern Applied Statistical Methods Volume 10 Issue Article 13 11-1-011 Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?

More information

Multivariate Autoregressive Time Series Using Schweppe Weighted Wilcoxon Estimates

Multivariate Autoregressive Time Series Using Schweppe Weighted Wilcoxon Estimates Western Michigan University ScholarWorks at WMU Dissertations Graduate College 4-2014 Multivariate Autoregressive Time Series Using Schweppe Weighted Wilcoxon Estimates Jaime Burgos Western Michigan University,

More information

Impact of serial correlation structures on random effect misspecification with the linear mixed model.

Impact of serial correlation structures on random effect misspecification with the linear mixed model. Impact of serial correlation structures on random effect misspecification with the linear mixed model. Brandon LeBeau University of Iowa file:///c:/users/bleb/onedrive%20 %20University%20of%20Iowa%201/JournalArticlesInProgress/Diss/Study2/Pres/pres.html#(2)

More information

Physics 509: Bootstrap and Robust Parameter Estimation

Physics 509: Bootstrap and Robust Parameter Estimation Physics 509: Bootstrap and Robust Parameter Estimation Scott Oser Lecture #20 Physics 509 1 Nonparametric parameter estimation Question: what error estimate should you assign to the slope and intercept

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

One-Sample Numerical Data

One-Sample Numerical Data One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

6 Single Sample Methods for a Location Parameter

6 Single Sample Methods for a Location Parameter 6 Single Sample Methods for a Location Parameter If there are serious departures from parametric test assumptions (e.g., normality or symmetry), nonparametric tests on a measure of central tendency (usually

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics DETAILED CONTENTS About the Author Preface to the Instructor To the Student How to Use SPSS With This Book PART I INTRODUCTION AND DESCRIPTIVE STATISTICS 1. Introduction to Statistics 1.1 Descriptive and

More information

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval

More information

COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION

COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION (REFEREED RESEARCH) COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION Hakan S. Sazak 1, *, Hülya Yılmaz 2 1 Ege University, Department

More information

Answer Key: Problem Set 6

Answer Key: Problem Set 6 : Problem Set 6 1. Consider a linear model to explain monthly beer consumption: beer = + inc + price + educ + female + u 0 1 3 4 E ( u inc, price, educ, female ) = 0 ( u inc price educ female) σ inc var,,,

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

AN IMPROVEMENT TO THE ALIGNED RANK STATISTIC

AN IMPROVEMENT TO THE ALIGNED RANK STATISTIC Journal of Applied Statistical Science ISSN 1067-5817 Volume 14, Number 3/4, pp. 225-235 2005 Nova Science Publishers, Inc. AN IMPROVEMENT TO THE ALIGNED RANK STATISTIC FOR TWO-FACTOR ANALYSIS OF VARIANCE

More information

Can you tell the relationship between students SAT scores and their college grades?

Can you tell the relationship between students SAT scores and their college grades? Correlation One Challenge Can you tell the relationship between students SAT scores and their college grades? A: The higher SAT scores are, the better GPA may be. B: The higher SAT scores are, the lower

More information

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Permutation Tests Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods The Two-Sample Problem We observe two independent random samples: F z = z 1, z 2,, z n independently of

More information

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Application of Variance Homogeneity Tests Under Violation of Normality Assumption Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com

More information

Introduction to Econometrics. Review of Probability & Statistics

Introduction to Econometrics. Review of Probability & Statistics 1 Introduction to Econometrics Review of Probability & Statistics Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com Introduction 2 What is Econometrics? Econometrics consists of the application of mathematical

More information

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models Supporting Information for Estimating restricted mean treatment effects with stacked survival models Andrew Wey, David Vock, John Connett, and Kyle Rudser Section 1 presents several extensions to the simulation

More information

Lecture 14 October 13

Lecture 14 October 13 STAT 383C: Statistical Modeling I Fall 2015 Lecture 14 October 13 Lecturer: Purnamrita Sarkar Scribe: Some one Disclaimer: These scribe notes have been slightly proofread and may have typos etc. Note:

More information

9. Robust regression

9. Robust regression 9. Robust regression Least squares regression........................................................ 2 Problems with LS regression..................................................... 3 Robust regression............................................................

More information

Essential of Simple regression

Essential of Simple regression Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship

More information

Heteroskedasticity-Robust Inference in Finite Samples

Heteroskedasticity-Robust Inference in Finite Samples Heteroskedasticity-Robust Inference in Finite Samples Jerry Hausman and Christopher Palmer Massachusetts Institute of Technology December 011 Abstract Since the advent of heteroskedasticity-robust standard

More information

University of California, Berkeley, Statistics 131A: Statistical Inference for the Social and Life Sciences. Michael Lugo, Spring 2012

University of California, Berkeley, Statistics 131A: Statistical Inference for the Social and Life Sciences. Michael Lugo, Spring 2012 University of California, Berkeley, Statistics 3A: Statistical Inference for the Social and Life Sciences Michael Lugo, Spring 202 Solutions to Exam Friday, March 2, 202. [5: 2+2+] Consider the stemplot

More information

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb May 15, 2015 > Abstract In this paper, we utilize a new graphical

More information

Last two weeks: Sample, population and sampling distributions finished with estimation & confidence intervals

Last two weeks: Sample, population and sampling distributions finished with estimation & confidence intervals Past weeks: Measures of central tendency (mean, mode, median) Measures of dispersion (standard deviation, variance, range, etc). Working with the normal curve Last two weeks: Sample, population and sampling

More information

A nonparametric two-sample wald test of equality of variances
