Robust Outcome Analysis for Observational Studies Designed Using Propensity Score Matching


Bradley E. Huitema, Western Michigan University
Scott F. Kosten, PharmaNet/i3
Joseph W. McKean, Western Michigan University

The work of Kosten and McKean was partially supported by NIAAA Grant 1R21AA A1.
In estimating the treatment effect in an observational study, there are likely to be differences between the treatment and control groups on many baseline covariates. If any of these covariates are correlated with the response variable, the difference in sample outcome means is likely to be a biased estimate of the true treatment effect. Propensity score matching can be used to redesign the study so that it provides meaningful comparison groups. After these comparison groups are formed, a choice must be made for the outcome analysis.
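As a minimal sketch of the first step, propensity scores can be estimated with a logistic regression of the treatment indicator on the baseline covariates. The slides do not say how the scores were modeled, so the logistic model, the Newton-Raphson fit, the function name `fit_propensity`, and the simulated data are all our assumptions.

```python
import numpy as np

def fit_propensity(Xcov, treat, n_iter=25):
    """Hypothetical helper: logistic-regression propensity scores e(x) = P(T = 1 | x).

    Fitted by Newton-Raphson; returns (coefficients, fitted scores).
    """
    Xcov = np.asarray(Xcov, dtype=float)
    t = np.asarray(treat, dtype=float)
    Z = np.column_stack([np.ones(len(t)), Xcov])   # intercept plus covariates
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        W = p * (1.0 - p)                           # Fisher-information weights
        grad = Z.T @ (t - p)
        hess = Z.T @ (Z * W[:, None])
        beta = beta + np.linalg.solve(hess, grad)   # Newton step
    return beta, 1.0 / (1.0 + np.exp(-Z @ beta))

# Illustrative data: two covariates, treatment assigned by a known logistic model.
rng = np.random.default_rng(1)
x = rng.normal(size=(2000, 2))
true_beta = np.array([-0.5, 1.0, -0.7])
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + x @ true_beta[1:])))
treat = rng.random(2000) < p_true
beta_hat, ps = fit_propensity(x, treat)
print(np.round(beta_hat, 2))
```

The fitted scores `ps` are what the matching step below operates on.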
Outcome Analyses

The Monte Carlo study by Hill and Reiter (2006) evaluated many outcome analyses, from simple matched-pairs tests to more complicated bootstrap and Hodges-Lehmann type methods. Results:
- Popular methods were inefficient, with (relatively) wide confidence intervals for the treatment effect.
- Many of the methods were empirically invalid.
- Each method performed poorly under some conditions.
- There was no clear winner.
In this work, we introduce new outcome analyses called block-adjusted methods. One is a block-adjusted method based on the least-squares (LS) fit. In our Monte Carlo study it performed well under normal error distributions for the response, but it performed poorly under distributions with heavier tails. Our second block-adjusted method is based on a robust rank-based (Wilcoxon) fit. It performed nearly as well as the LS fit when the errors have a normal distribution, and it was much more powerful than the LS procedure for the thicker-tailed distributions. In our Monte Carlo study, these block-adjusted methods outperformed the methods in the Hill and Reiter study. We also show the results of these methods on a real data set.
Notation and Models

Each treated subject is matched to its closest control subject. The matching is done with replacement, so a control subject may be matched with more than one treated subject.
- $n_t$ = number of treated subjects.
- $n_{uc}$ = number of unique control subjects in the matching; so $n_{uc}$ = number of blocks.
- $s_i$ = length of block $i$.
- $N = \sum_{i=1}^{n_{uc}} s_i$ = total sample size.
- $\mathbf{Y}$ is the $N \times 1$ vector of all responses.
- $\mathbf{X}_c$ is the $N \times p$ matrix of covariates.
- $\mathbf{I}_i$ is the $N \times 1$ indicator vector for the $i$th block.
- $\mathbf{T}$ is the $N \times 1$ indicator vector for treatment.
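The matching and block bookkeeping can be illustrated with a small sketch. It assumes one-dimensional propensity scores are already available and that each block consists of one matched control plus the treated subjects matched to it, so that $N = n_t + n_{uc}$; the function names and toy scores are hypothetical.

```python
import numpy as np

def match_with_replacement(ps_treated, ps_control):
    """Index of the nearest control (by propensity score) for each treated subject.

    Controls may be reused (matching with replacement); ties go to the lower index.
    """
    ps_t = np.asarray(ps_treated, dtype=float)
    ps_c = np.asarray(ps_control, dtype=float)
    dist = np.abs(ps_t[:, None] - ps_c[None, :])   # |e_t - e_c| for every treated/control pair
    return dist.argmin(axis=1)

def form_blocks(matches):
    """Block = one matched control plus every treated subject matched to it."""
    blocks = {}
    for t_idx, c_idx in enumerate(matches):
        blocks.setdefault(int(c_idx), []).append(t_idx)
    return blocks

ps_t = [0.62, 0.40, 0.65, 0.30]
ps_c = [0.10, 0.33, 0.60, 0.90]
matches = match_with_replacement(ps_t, ps_c)
blocks = form_blocks(matches)
n_t, n_uc = len(ps_t), len(blocks)
s = [1 + len(treated) for treated in blocks.values()]   # block lengths s_i
N = sum(s)                                               # equals n_t + n_uc here
print(matches.tolist(), blocks, N)   # [2, 1, 2, 1] {2: [0, 2], 1: [1, 3]} 6
```

Control 2 is reused for two treated subjects, which is exactly why blocks can have length greater than 2.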
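Let $\theta$ denote the treatment effect (the regression coefficient corresponding to the treatment indicator $\mathbf{T}$). The hypotheses are
$$H_0: \theta = 0 \quad \text{versus} \quad H_A: \theta \neq 0. \tag{1}$$

Models

There is controversy over whether or not the model should include the covariates; this is a point of investigation in our study. The models are:

Design matrix with covariates: $\mathbf{X} = [\mathbf{1}_N \; \mathbf{T} \; \mathbf{I}_2 \; \cdots \; \mathbf{I}_{n_{uc}} \; \mathbf{X}_c]$. The model is
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}. \tag{2}$$

Design matrix without covariates: $\mathbf{X}^* = [\mathbf{1}_N \; \mathbf{T} \; \mathbf{I}_2 \; \cdots \; \mathbf{I}_{n_{uc}}]$. The model is
$$\mathbf{Y} = \mathbf{X}^*\boldsymbol{\beta}^* + \mathbf{e}. \tag{3}$$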
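A sketch of how the two design matrices might be assembled with NumPy, assuming each observation carries a block label, a treatment flag, and its covariates. The helper `design_matrices` is hypothetical. Dropping the first block indicator keeps the matrix full rank alongside the intercept, and $\mathbf{T}$ is not collinear with the block indicators because every block contains a control and at least one treated subject.

```python
import numpy as np

def design_matrices(block_id, treat, Xc):
    """Return (X, X_star): block-adjusted designs with and without covariates, Eqs. (2)-(3)."""
    block_id = np.asarray(block_id)
    N = block_id.size
    labels = np.unique(block_id)                       # one label per block, sorted
    ones = np.ones((N, 1))
    T = np.asarray(treat, dtype=float).reshape(N, 1)
    # Indicators for blocks 2..n_uc; block 1 is absorbed by the intercept.
    I = (block_id[:, None] == labels[None, 1:]).astype(float)
    X_star = np.hstack([ones, T, I])
    X = np.hstack([X_star, np.asarray(Xc, dtype=float).reshape(N, -1)])
    return X, X_star

# Two blocks, each a control (T = 0) followed by its two matched treated subjects.
block_id = [0, 0, 0, 1, 1, 1]
treat = [0, 1, 1, 0, 1, 1]
x1 = [0.1, 0.5, -0.2, 1.0, 0.3, 0.7]
X, X_star = design_matrices(block_id, treat, x1)
print(X.shape, X_star.shape)   # (6, 4) (6, 3)
```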
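LS Block Adjusted Methods l and L

LS Block Adjusted Method l: Obtain the LS estimate of $\boldsymbol{\beta}$ using the model with covariates,
$$\hat{\boldsymbol{\beta}}_{LS} = \operatorname{Argmin}_{\boldsymbol{\beta}} \sum_{j=1}^{N} \left[Y_j - \mathbf{x}_j'\boldsymbol{\beta}\right]^2, \tag{4}$$
where $\mathbf{x}_j'$ is the $j$th row of $\mathbf{X}$. The estimate of $\theta$ is $\hat{\theta}_{LS} = \hat{\beta}_{LS,2}$. The usual CI for $\theta$ is
$$\hat{\theta}_{LS} \pm z_{\alpha/2}\, \hat{\sigma} \sqrt{(\mathbf{X}'\mathbf{X})^{-1}_{22}}, \tag{5}$$
where $(\mathbf{X}'\mathbf{X})^{-1}_{22}$ is the second diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$ and $\hat{\sigma}^2$ is the usual MSE. The test of $H_0$ for the l method is: reject $H_0$ if 0 is not in the confidence interval (5).

LS Block Adjusted Method L: Same as l, but use the design matrix without covariates, $\mathbf{X}^*$.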
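The l method reduces to an ordinary LS fit plus the Wald-type interval (5). The sketch below rests on assumptions of our own: simulated blocks with one control and one to three treated subjects, and responses generated from the response surface of Eq. (9) with N(0, 1) errors. The name `block_adjusted_ls` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def block_adjusted_ls(X, y, alpha=0.05):
    """LS estimate of theta (coefficient of T, column 2 of X) with the interval of Eq. (5)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # Eq. (4)
    resid = y - X @ beta
    sigma_hat = np.sqrt(resid @ resid / (N - k))       # square root of the usual MSE
    c22 = np.linalg.inv(X.T @ X)[1, 1]                 # (X'X)^{-1}_{22}
    z = NormalDist().inv_cdf(1 - alpha / 2)
    theta = beta[1]
    half = z * sigma_hat * np.sqrt(c22)
    return theta, (theta - half, theta + half)

# Hypothetical matched data: Y = 4c + x1 + 2 x2 + e.
rng = np.random.default_rng(0)
rows = []
for b in range(40):                                    # 40 blocks (unique controls)
    for c in [0] + [1] * int(rng.integers(1, 4)):      # one control, 1-3 treated
        x1, x2 = rng.normal(1, 1, size=2)
        rows.append((b, c, x1, x2, 4 * c + x1 + 2 * x2 + rng.normal()))
block, T, x1, x2, y = np.array(rows).T
I = (block[:, None] == np.unique(block)[None, 1:]).astype(float)
X = np.column_stack([np.ones_like(y), T, I, x1, x2])   # design (2)
theta, ci = block_adjusted_ls(X, y)
print(round(float(theta), 2), [round(float(v), 2) for v in ci])
```

Dropping the `x1` and `x2` columns from `X` gives the L method.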
Wilcoxon Block Adjusted Methods w and W

Instead of the Euclidean ($L_2$) norm used by LS, the Wilcoxon procedures use the norm based on the dispersion function $D(\boldsymbol{\beta})$:
$$D(\boldsymbol{\beta}) = \sum_{j=1}^{N} \left[R(Y_j - \mathbf{x}_j'\boldsymbol{\beta}) - \frac{N+1}{2}\right](Y_j - \mathbf{x}_j'\boldsymbol{\beta}), \tag{6}$$
where $R(Y_j - \mathbf{x}_j'\boldsymbol{\beta})$ denotes the rank of $Y_j - \mathbf{x}_j'\boldsymbol{\beta}$ among the $N$ residuals. The R estimate of $\boldsymbol{\beta}$ minimizes this dispersion function; see Kloke and McKean (2012) for R software. $D(\boldsymbol{\beta})$ is invariant to the intercept parameter, so the intercept is estimated separately, usually by the median of the residuals. This estimate was proposed by Jaeckel (1972) and is discussed in detail in Chapters 3-5 of Hettmansperger and McKean (2011). The estimate is highly efficient: it attains an efficiency of 0.955 relative to LS for normal errors and is more efficient than LS for error distributions with thicker tails than the normal. A simple weighting scheme (the HBR estimator) can attain up to a 50% breakdown point.
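Two properties of (6) can be checked numerically: it equals one half of $\sum_{i<j} |e_i - e_j|$ over the residuals, and adding a constant (the intercept) to every residual leaves it unchanged. For a single slope, minimizing (6) reduces to a weighted median of the pairwise slopes $(y_j - y_i)/(x_j - x_i)$ with weights $|x_j - x_i|$, which gives a short exact illustration of the Jaeckel estimate. This is a toy sketch, not the R software of Kloke and McKean (2012); the data, containing one gross outlier, are made up.

```python
import numpy as np

def wilcoxon_dispersion(e):
    """Eq. (6): sum_j [R(e_j) - (N+1)/2] e_j; tie-breaking by argsort does not change the value."""
    e = np.asarray(e, dtype=float)
    ranks = np.argsort(np.argsort(e)) + 1
    return np.sum((ranks - (e.size + 1) / 2) * e)

def pairwise_dispersion(e):
    """Equivalent form: half the sum of |e_i - e_j| over all pairs i < j."""
    e = np.asarray(e, dtype=float)
    i, j = np.triu_indices(e.size, k=1)
    return 0.5 * np.sum(np.abs(e[i] - e[j]))

def wilcoxon_simple_fit(x, y):
    """Wilcoxon fit of y = a + b x: b is the weighted median of pairwise slopes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(x.size, k=1)
    dx, dy = x[j] - x[i], y[j] - y[i]
    keep = dx != 0
    slopes, w = dy[keep] / dx[keep], np.abs(dx[keep])   # |dy - b dx| = |dx| |slope - b|
    order = np.argsort(slopes)
    cum = np.cumsum(w[order])
    b = slopes[order][np.searchsorted(cum, 0.5 * cum[-1])]
    a = np.median(y - b * x)                            # intercept: median of residuals
    return a, b

x = np.arange(1.0, 7.0)
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9, 30.0])          # last response is an outlier
a_w, b_w = wilcoxon_simple_fit(x, y)
b_ls = np.polyfit(x, y, 1)[0]                          # LS slope for comparison
print(round(float(b_w), 3), round(float(b_ls), 3))     # 2.067 4.563
```

The single outlier more than doubles the LS slope but barely moves the Wilcoxon slope.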
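Wilcoxon Block Adjusted Method w: The Wilcoxon estimate is
$$\hat{\boldsymbol{\beta}}_W = \operatorname{Argmin}_{\boldsymbol{\beta}} D(\boldsymbol{\beta}). \tag{7}$$
The Wilcoxon estimate of $\theta$ is $\hat{\theta}_W = \hat{\beta}_{W,2}$. The usual CI for $\theta$ is
$$\hat{\theta}_W \pm z_{\alpha/2}\, \hat{\tau} \sqrt{(\mathbf{X}'\mathbf{X})^{-1}_{22}}, \tag{8}$$
where $(\mathbf{X}'\mathbf{X})^{-1}_{22}$ is the second diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$ and the scale parameter $\tau$ is estimated as discussed in Kosten et al. (2012). The test of $H_0$ for the w method is: reject $H_0$ if 0 is not in the confidence interval (8).

Wilcoxon Block Adjusted Method W: Same as w, but use the design matrix without covariates, $\mathbf{X}^*$.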
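For the block-adjusted model, (7) has many coefficients, so the sketch below approximates the minimizer by iteratively reweighted least squares (IRLS) for the equivalent L1 problem on pairwise residual differences. This algorithm is our own choice, not necessarily what the authors' software does. Because the $\hat{\tau}$ estimator of Kosten et al. (2012) is not given here, `wilcoxon_ci` takes $\hat{\tau}$ as an input; the example plugs in $\sqrt{\pi/3}$, the value of $\tau$ for standard normal errors, purely as a stand-in. All names and the simulated data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def wilcoxon_fit(X, y, n_iter=50, delta=1e-6):
    """Approximate minimizer of D(beta), Eq. (7); column 0 of X is the intercept.

    D(beta) is half the sum of |pairwise residual differences|, so the non-intercept
    coefficients are fitted by IRLS for L1 regression on pairwise differences, and
    the intercept is then the median of the residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = X[:, 1:]
    i, j = np.triu_indices(y.size, k=1)
    dZ, dy = Z[i] - Z[j], y[i] - y[j]
    beta = np.linalg.lstsq(dZ, dy, rcond=None)[0]          # least-squares start
    for _ in range(n_iter):
        r = dy - dZ @ beta
        sw = 1.0 / np.sqrt(np.maximum(np.abs(r), delta))   # square roots of IRLS weights
        beta = np.linalg.lstsq(dZ * sw[:, None], dy * sw, rcond=None)[0]
    intercept = np.median(y - Z @ beta)
    return np.concatenate([[intercept], beta])

def wilcoxon_ci(X, theta_hat, tau_hat, alpha=0.05):
    """Eq. (8): theta_hat +/- z_{alpha/2} * tau_hat * sqrt((X'X)^{-1}_{22})."""
    X = np.asarray(X, dtype=float)
    c22 = np.linalg.inv(X.T @ X)[1, 1]
    half = NormalDist().inv_cdf(1 - alpha / 2) * tau_hat * np.sqrt(c22)
    return theta_hat - half, theta_hat + half

def pairwise_l1(X, y, coef):
    """Twice D(beta): sum over pairs i < j of |e_i - e_j|, with e = y - X coef."""
    e = np.asarray(y, dtype=float) - np.asarray(X, dtype=float) @ coef
    i, j = np.triu_indices(e.size, k=1)
    return float(np.sum(np.abs(e[i] - e[j])))

# Hypothetical matched data following Eq. (9), with N(0, 1) errors.
rng = np.random.default_rng(1)
rows = []
for b in range(30):
    for c in [0] + [1] * int(rng.integers(1, 4)):
        x1, x2 = rng.normal(1, 1, size=2)
        rows.append((b, c, x1, x2, 4 * c + x1 + 2 * x2 + rng.normal()))
block, T, x1, x2, y = np.array(rows).T
I = (block[:, None] == np.unique(block)[None, 1:]).astype(float)
X = np.column_stack([np.ones_like(y), T, I, x1, x2])

coef_w = wilcoxon_fit(X, y)
theta_w = coef_w[1]
# tau_hat: stand-in sqrt(pi/3), the Wilcoxon tau for N(0, 1) errors (not an estimate).
ci_w = wilcoxon_ci(X, theta_w, tau_hat=np.sqrt(np.pi / 3))
coef_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(float(theta_w), 2), [round(float(v), 2) for v in ci_w])
```

Dropping the covariate columns from `X` gives the W method. The pairwise formulation needs $O(N^2)$ rows, so it is only practical for modest sample sizes.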
Monte Carlo Investigation

Our study is similar to that of Hill and Reiter (2006). Their study, however, included only normal errors, whereas besides the normal we have added several heavier-tailed error distributions to investigate the robustness of the methods. For all situations:
- A single treatment and a control plus two covariates were employed.
- Matching based on the propensity scores is done with replacement.
- For most situations, about 150 treated subjects and 350 control subjects were generated.

The basic response surface is
$$Y = \theta c + x_1 + 2x_2 + e, \tag{9}$$
where $c$ is either 0 or 1 depending on whether $Y$ is a control or a treated response, $x_1$ and $x_2$ are continuous covariates, and $e$ is the error term. Hence, the parameter $\theta$ is the treatment effect. For the study, we set $\theta = 4$, the same value used by Hill and Reiter (2006).
Misspecified Response Surfaces

Besides this response surface, Hill and Reiter (2006) considered two other response surfaces for which the fitted model is misspecified. Our fully adjusted procedures l and w, however, are based on full model fits. It is thus easy to use their associated residual analyses to diagnose misspecified models and, hence, to ultimately fit more appropriate models. We show this later in an example.
Two main factors of the study:

Error distributions: a normal distribution; a contaminated normal distribution with 20% contamination and the ratio (contaminated to good) set at 4; and a Cauchy distribution.

Degree of overlap between treated and control subjects:
a. Strong Overlap (SO). All covariates (for both Treated and Control) are drawn iid from N(1, 1). This allows for very close matches.
b. Moderate Overlap (MO). Covariates for Treated subjects are drawn as in SO, while only 150 of the Control subjects have covariates drawn this way. The remaining Control covariates are iid N(3, 1) (these are called distracters).
c. Weak Overlap (WO). Similar to MO, except that now only 50 Control subjects have covariates drawn from N(1, 1), while 300 are drawn from N(3, 1).
d. Uneven Overlap (UO). The probability of a subject being assigned to Treatment or Control depends on the region its covariates fall into, so matching is good in some regions but poor in others; see Kosten et al. (2012).
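The covariate and error designs can be sketched as follows for the SO, MO, and WO settings, with responses from Eq. (9). Two readings are our assumptions: the contamination ratio of 4 is taken to be a ratio of standard deviations, and both covariates of a distracter control are drawn from N(3, 1). UO is omitted because its assignment rule is given only in Kosten et al. (2012). Function names are hypothetical.

```python
import numpy as np

GOOD_CONTROLS = {"SO": 350, "MO": 150, "WO": 50}   # controls drawn like the treated

def draw_covariates(rng, overlap, n_treated=150, n_control=350):
    """Two covariates per subject; distracter controls come from N(3, 1)."""
    xt = rng.normal(1.0, 1.0, size=(n_treated, 2))
    n_good = GOOD_CONTROLS[overlap]
    xc = np.vstack([rng.normal(1.0, 1.0, size=(n_good, 2)),
                    rng.normal(3.0, 1.0, size=(n_control - n_good, 2))])
    return xt, xc

def draw_errors(rng, n, dist):
    if dist == "normal":
        return rng.normal(size=n)
    if dist == "cn":   # 20% contamination; sd ratio 4 (assumed meaning of "ratio")
        return np.where(rng.random(n) < 0.20, 4.0, 1.0) * rng.normal(size=n)
    if dist == "cauchy":
        return rng.standard_cauchy(n)
    raise ValueError(f"unknown error distribution: {dist}")

def response(x, c, e, theta=4.0):
    """Eq. (9): Y = theta*c + x1 + 2*x2 + e."""
    return theta * c + x[:, 0] + 2.0 * x[:, 1] + e

rng = np.random.default_rng(2012)
xt, xc = draw_covariates(rng, "MO")
yt = response(xt, 1, draw_errors(rng, len(xt), "cn"))
yc = response(xc, 0, draw_errors(rng, len(xc), "cn"))
print(xt.shape, xc.shape, round(float(xc[150:].mean()), 1))
```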
Methods Investigated

More detailed descriptions of these methods can be found in the manuscript by Kosten et al. (2012).
a. Two LS block adjusted methods, l and L.
b. Two Wilcoxon block adjusted methods, w and W.
c. Matched Pairs Method (M). This is the usual paired t. Weighted Two-Sample Method (T): the same estimator as M, i.e., the paired mean difference, but the variance is weighted to account for the number of treated subjects to which a control is matched.
d. Weighted LS Methods (s, S). A weighted LS fit with weights similar to those of the last method. The s method uses the design matrix $[\mathbf{1}_N \; \mathbf{T} \; \mathbf{X}_c]$; the S method uses the design matrix $[\mathbf{1}_N \; \mathbf{T}]$.
e. Robust Sandwich Variance Methods (r, R). This estimate is described by Huber (1967). A diagonal matrix based on the residuals and weights is used as the middle of the sandwich. The method r uses the design matrix with the covariates, while R uses the one without covariates.
f. Bootstrap Methods (d, D, b, B). For each bootstrap sample, the Treated and Control subjects were each resampled (with replacement); then the matches were obtained with replacement, and the selected outcome method was used to estimate the effect. The number of bootstrap samples was set at 1000. The d and D methods use the variance of the bootstrap estimates: d is based on the weighted LS fit with the design including the covariates, and D on the weighted LS fit with the design excluding the covariates. Methods b and B are similar, but the bootstrap percentile confidence interval is used.
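The resample, re-match, re-estimate loop of the bootstrap methods can be sketched as below. For brevity the outcome method is the matched-pair mean difference rather than the weighted LS fit used by d, D, b, and B, and the propensity scores are held fixed instead of being re-estimated in each replicate; both simplifications, the function names, and the simulated data are assumptions.

```python
import numpy as np

def matched_pair_effect(ps_t, y_t, ps_c, y_c):
    """Mean of (treated - nearest control) differences; matching is with replacement."""
    idx = np.abs(ps_t[:, None] - ps_c[None, :]).argmin(axis=1)
    return float(np.mean(y_t - y_c[idx]))

def bootstrap_effect(ps_t, y_t, ps_c, y_c, n_boot=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    est = np.empty(n_boot)
    for b in range(n_boot):
        it = rng.integers(0, len(y_t), size=len(y_t))   # resample treated
        ic = rng.integers(0, len(y_c), size=len(y_c))   # resample controls separately
        # Re-match within the bootstrap sample, then re-estimate the effect.
        est[b] = matched_pair_effect(ps_t[it], y_t[it], ps_c[ic], y_c[ic])
    boot_var = est.var(ddof=1)                             # variance used by d/D-type intervals
    lo, hi = np.quantile(est, [alpha / 2, 1 - alpha / 2])  # percentile interval (b/B-type)
    return boot_var, (float(lo), float(hi))

rng = np.random.default_rng(3)
ps_t, ps_c = rng.uniform(0.3, 0.9, 60), rng.uniform(0.1, 0.7, 140)
y_t = 4 + 5 * ps_t + rng.normal(size=60)
y_c = 5 * ps_c + rng.normal(size=140)
est = matched_pair_effect(ps_t, y_t, ps_c, y_c)
boot_var, (lo, hi) = bootstrap_effect(ps_t, y_t, ps_c, y_c)
print(round(est, 2), round(float(boot_var), 3), (round(lo, 2), round(hi, 2)))
```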
Table: Methods in the Simulation
Label  Method
M  Matched Pairs
T  Weighted Two-Sample
S  WLS (Non-Covariate Adjusted)
s  WLS (Covariate Adjusted)
R  WLS Robust Sandwich Variance (Non-Covariate Adjusted)
r  WLS Robust Sandwich Variance (Covariate Adjusted)
D  Bootstrap-Variance (Non-Covariate Adjusted)
d  Bootstrap-Variance (Covariate Adjusted)
B  Bootstrap-Percentile (Non-Covariate Adjusted)
b  Bootstrap-Percentile (Covariate Adjusted)
H  Hodges-Lehmann Aligned Rank (Non-Covariate Adjusted)
h  Hodges-Lehmann Aligned Rank (Covariate Adjusted)
L  Block-Adjusted Least Squares (Non-Covariate Adjusted)
l  Block-Adjusted Least Squares (Covariate Adjusted)
W  Block-Adjusted Wilcoxon (Non-Covariate Adjusted)
w  Block-Adjusted Wilcoxon (Covariate Adjusted)
Results of the Monte Carlo Study

There are 12 situations: 3 distributions × 4 degrees of overlap. For each situation, 10,000 simulations were run. For each method, the outcome analysis is based on its confidence interval for the effect. Nominal confidence was set at 95%.

Validity: For each method, validity is based on the method's empirical coverage of its confidence interval. We deemed a method valid for a situation if its empirical confidence was between 93% and 97%. On this basis:
Results of the Validity Study
- None of the bootstrap procedures were valid; they were far too conservative.
- The Weighted Two-Sample (T) method is conservative for 7 situations.
- The Sandwich Two-Sample (R) method is conservative for 8 situations.
- The Weighted LS (S) method is conservative for 4 situations and liberal for 4 situations.
- The Weighted LS (s) method is liberal for 8 situations.
- As attested by their empirical coverages in the following tables, the remaining procedures (r, H, h, l, L, W, w) are valid for almost all the situations.

Efficiency: The efficiency for two procedures is the ratio of the mean lengths of their confidence intervals. This measure was obtained for each situation and is tabled by degree of overlap.
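The validity and efficiency criteria are simple summaries of simulated intervals; a minimal sketch follows. The 93% to 97% band and the ratio (mean length of the l interval divided by the method's mean length) follow the text; the function names and the three toy intervals are hypothetical.

```python
import numpy as np

def coverage_and_mean_length(intervals, theta_true):
    """Empirical coverage and mean length of a method's simulated confidence intervals."""
    ci = np.asarray(intervals, dtype=float)
    covered = (ci[:, 0] <= theta_true) & (theta_true <= ci[:, 1])
    return float(covered.mean()), float((ci[:, 1] - ci[:, 0]).mean())

def validity(coverage, lower=0.93, upper=0.97):
    """Classify empirical coverage against the 93%-97% band for a nominal 95% CI."""
    if coverage < lower:
        return "liberal"        # intervals too short: coverage below nominal
    if coverage > upper:
        return "conservative"   # intervals too long: coverage above nominal
    return "valid"

def efficiency_vs_l(mean_len_l, mean_len_method):
    """Tabled ratio: values greater than 1 mean the method beats the l procedure."""
    return mean_len_l / mean_len_method

cov, mean_len = coverage_and_mean_length([(3.0, 5.0), (3.5, 4.5), (4.1, 5.0)], theta_true=4.0)
print(round(cov, 3), round(mean_len, 3), validity(cov))   # 0.667 1.3 liberal
```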
For each procedure at each situation, the tabled ratio is the mean length of the l (LS block adjusted, with covariates in the design matrix) confidence interval divided by the mean length of the procedure's confidence interval. Thus ratios greater than 1 mean that the procedure is more efficient than the l procedure.

Table: Simulation results for the strong overlap (SO) setting over all error distributions. Columns: efficiency (Eff) and empirical confidence (Conf) under Normal, CN (contaminated normal), and Cauchy errors. Rows: methods r, H, h, L, l, W, w.
Table: Simulation results for the moderate overlap (MO) setting over all error distributions. Columns: Eff and Conf under Normal, CN, and Cauchy errors. Rows: methods r, H, h, L, l, W, w.
Table: Simulation results for the weak overlap (WO) setting over all error distributions. Columns: Eff and Conf under Normal, CN, and Cauchy errors. Rows: methods r, H, h, L, l, W, w.
Table: Simulation results for the uneven overlap (UO) setting over all error distributions. Columns: Eff and Conf under Normal, CN, and Cauchy errors. Rows: methods r, H, h, L, l, W, w.
Conclusions on Efficiency
- The r (robust sandwich) procedure is inefficient; even the l procedure dominates it for the contaminated normal situations.
- Of h and H, the H procedure is more efficient than h for the non-normal situations. Evidently, in the presence of heavy-tailed error distributions, the less LS estimation the better.
- The W procedure clearly dominates the H procedure.
- The w procedure, however, is definitely superior to all of the procedures over the non-normal situations. It is clearly the most robust procedure in the study. Furthermore, for the normal situations, its efficiency is only slightly less than that of h and l.
Learning to Learn (LTL) Data Set

The purpose of the study was to investigate the effect of the LTL program on grade point average (GPA) and graduation rate. The study was conducted from the fall semester of 1987 to the winter semester of … on … students who participated in the LTL program. The study also collected data on control subjects.
Response: the student's last known college GPA.
Covariates: gender; race (Caucasian or otherwise); age; Alpha program participation (1 = yes, 0 = no); overall ACT score; English ACT score; Math ACT score; Reading ACT score; Science ACT score; high school GPA; and entry year into the study.
One-to-one matching with replacement using propensity scores resulted in 294 matched controls. The next table shows how different the treatment group is from the full control group, and how close the matched groups are:
Table: Baseline Covariates for the Treated (LTL), Full Control, and Matched Control Groups. Columns: Treatment, All Controls, Matched Controls. Rows: No. Obs, Gender, Race, Entry Age, Alpha, ACT, Eng. ACT, Math ACT, Read ACT, Sci. ACT, HS GPA, Entry.
Results of the Valid Methods for the Treatment Effect

Table: Point Estimates and Confidence Intervals for the Seven Valid Methods. Columns: Method, Estimate, 95% CI. Rows: r, H, h, L, l, W, w. The lower 95% limits for L and l are 0.0014 and 0.0034, respectively, so both intervals lie above 0.

All methods except the LS-based L and l conclude that the LTL treatment was ineffective at changing GPA.
The discrepancy between the LS-based and robust Wilcoxon-based methods was investigated by a robust diagnostic analysis based on the w fit; see McKean and Sheather (2009). Numerous outliers in the data led to the difference between the l and w procedures. Further, outliers were detected in factor (covariate) space. Hence, a robust high breakdown (HBR) fit was used. The high breakdown estimate of the effect and its confidence interval confirm the w analysis. In summary, based on our diagnostic analysis, it appears that the outliers in both the response and factor spaces impaired the l analysis. Also, in light of the high breakdown estimate and confidence interval for the effect, it appears that the analysis obtained from the w procedure is valid.
Conclusion

Overall, method w has been shown to be the most versatile method for constructing a confidence interval for a treatment effect in an observational study. Method w was the most efficient estimator for heavy-tailed error distributions. It was very close to the most efficient when the errors were normal, while still providing near-optimal coverage of the true treatment effect. Further, method w lends itself to a robust diagnostic residual analysis, which checks the quality of fit and identifies outliers in both response and covariate space. The non-robust l method is a desirable approach if used in the context of well-behaved distributions.
Estimating the accuracy of a hypothesis Setting Assume a binary classification setting Assume input/output pairs (x, y) are sampled from an unknown probability distribution D = p(x, y) Train a binary classifier
More information36309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)
36309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)
More informationA first look at the performances of a Bayesian chart to monitor. the ratio of two Weibull percentiles
A first loo at the performances of a Bayesian chart to monitor the ratio of two Weibull percentiles Pasquale Erto University of Naples Federico II, Naples, Italy email: ertopa@unina.it Abstract. The aim
More informationRegression I: Mean Squared Error and Measuring Quality of Fit
Regression I: Mean Squared Error and Measuring Quality of Fit Applied Multivariate Analysis Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving
More informationMATH4427 Notebook 4 Fall Semester 2017/2018
MATH4427 Notebook 4 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 20092018 by Jenny A. Baglivo. All Rights Reserved. 4 MATH4427 Notebook 4 3 4.1 K th Order Statistics and Their
More informationData Set 8: Laysan Finch Beak Widths
Data Set 8: Finch Beak Widths Statistical Setting This handout describes an analysis of covariance (ANCOVA) involving one categorical independent variable (with only two levels) and one quantitative covariate.
More information5 Introduction to the Theory of Order Statistics and Rank Statistics
5 Introduction to the Theory of Order Statistics and Rank Statistics This section will contain a summary of important definitions and theorems that will be useful for understanding the theory of order
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UWMadison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationEXAM # 2. Total 100. Please show all work! Problem Points Grade. STAT 301, Spring 2013 Name
STAT 301, Spring 2013 Name Lec 1, MWF 9:55  Ismor Fischer Discussion Section: Please circle one! TA: Shixue Li...... 311 (M 4:35) / 312 (M 12:05) / 315 (T 4:00) Xinyu Song... 313 (M 2:25) / 316 (T 12:05)
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 251 Nonparametric population average models: deriving the form of approximate population
More informationRegression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.
TCELL 9/4/205 36309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted
More informationStatistics 135 Fall 2008 Final Exam
Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations
More informationA Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008
A Course in Applied Econometrics Lecture 7: Cluster Sampling Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of roups and
More informationLecture 6: Linear Regression (continued)
Lecture 6: Linear Regression (continued) Reading: Sections 3.13.3 STATS 202: Data mining and analysis October 6, 2017 1 / 23 Multiple linear regression Y = β 0 + β 1 X 1 + + β p X p + ε Y ε N (0, σ) i.i.d.
More informationGlobal comparisons of medians and other quantiles in a oneway design when there are tied values
Communications in Statistics  Simulation and Computation ISSN: 03610918 (Print) 15324141 (Online) Journal homepage: http://www.tandfonline.com/loi/lssp20 Global comparisons of medians and other quantiles
More informationDo not copy, post, or distribute. IndependentSamples t Test and Mann C h a p t e r 13
C h a p t e r 13 IndependentSamples t Test and Mann Whitney U Test 13.1 Introduction and Objectives This chapter continues the theme of hypothesis testing as an inferential statistical procedure. In
More informationUniversity of California San Diego and Stanford University and
First International Workshop on Functional and Operatorial Statistics. Toulouse, June 1921, 2008 Ksample Subsampling Dimitris N. olitis andjoseph.romano University of California San Diego and Stanford
More informationNonparametric methods
Eastern Mediterranean University Faculty of Medicine Biostatistics course Nonparametric methods March 4&7, 2016 Instructor: Dr. Nimet İlke Akçay (ilke.cetin@emu.edu.tr) Learning Objectives 1. Distinguish
More informationContinuous Probability Distributions
Continuous Probability Distributions Called a Probability density function. The probability is interpreted as "area under the curve." 1) The random variable takes on an infinite # of values within a given
More informationWhat s New in Econometrics? Lecture 14 Quantile Methods
What s New in Econometrics? Lecture 14 Quantile Methods Jeff Wooldridge NBER Summer Institute, 2007 1. Reminders About Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile Regression
More informationMath 1040 Final Exam Form A Introduction to Statistics Fall Semester 2010
Math 1040 Final Exam Form A Introduction to Statistics Fall Semester 2010 Instructor Name Time Limit: 120 minutes Any calculator is okay. Necessary tables and formulas are attached to the back of the exam.
More informationDescriptive Data Summarization
Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning
More information18.05 Final Exam. Good luck! Name. No calculators. Number of problems 16 concept questions, 16 problems, 21 pages
Name No calculators. 18.05 Final Exam Number of problems 16 concept questions, 16 problems, 21 pages Extra paper If you need more space we will provide some blank paper. Indicate clearly that your solution
More informationMulticollinearity and A Ridge Parameter Estimation Approach
Journal of Modern Applied Statistical Methods Volume 15 Issue Article 5 111016 Multicollinearity and A Ridge Parameter Estimation Approach Ghadban Khalaf King Khalid University, albadran50@yahoo.com
More informationTutorial 2: Power and Sample Size for the Paired Sample ttest
Tutorial 2: Power and Sample Size for the Paired Sample ttest Preface Power is the probability that a study will reject the null hypothesis. The estimated probability is a function of sample size, variability,
More informationOn Selecting Tests for Equality of Two Normal Mean Vectors
MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department
More informationInference Based on the Wild Bootstrap
Inference Based on the Wild Bootstrap James G. MacKinnon Department of Economics Queen s University Kingston, Ontario, Canada K7L 3N6 email: jgm@econ.queensu.ca Ottawa September 14, 2012 The Wild Bootstrap
More informationThe Fundamentals of Heavy Tails Properties, Emergence, & Identification. Jayakrishnan Nair, Adam Wierman, Bert Zwart
The Fundamentals of Heavy Tails Properties, Emergence, & Identification Jayakrishnan Nair, Adam Wierman, Bert Zwart Why am I doing a tutorial on heavy tails? Because we re writing a book on the topic Why
More informationResampling and the Bootstrap
Resampling and the Bootstrap Axel Benner Biostatistics, German Cancer Research Center INF 280, D69120 Heidelberg benner@dkfz.de Resampling and the Bootstrap 2 Topics Estimation and Statistical Testing
More informationSPSS Guide For MMI 409
SPSS Guide For MMI 409 by John Wong March 2012 Preface Hopefully, this document can provide some guidance to MMI 409 students on how to use SPSS to solve many of the problems covered in the D Agostino
More informationRon Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)
Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More information