Rerandomization to Balance Covariates
|
|
- Tiffany Hood
- 5 years ago
- Views:
Transcription
1 Rerandomization to Balance Covariates Kari Lock Morgan Department of Statistics Penn State University Joint work with Don Rubin University of Minnesota Biostatistics 4/27/16
2 The Gold Standard Randomized experiments are the gold standard for estimating causal effects WHY? 1. They yield unbiased estimates 2. They eliminate confounding variables
3 R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R Randomize R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R
4 Covariate Balance - Gender Suppose you get a bad randomization What would you do??? Can you rerandomize??? 512 Females, 158 Males 158 Females, 512 Males
5 The Origin of the Idea Rubin: What if, in a randomized experiment, the chosen randomized allocation exhibited substantial imbalance on a prognostically important baseline covariate? Cochran: Why didn't you block on that variable? Rubin: Well, there were many baseline covariates, and the correct blocking wasn't obvious; and I was lazy at that time. Cochran: This is a question that I once asked Fisher, and his reply was unequivocal: Fisher (recreated via Cochran): Of course, if the experiment had not been started, I would rerandomize.
6 Mind-Set Matters In 2007, Dr. Ellen Langer tested her hypothesis that mind-set matters with a randomized experiment She recruited 84 maids working at 7 different hotels, and randomly assigned half to a treatment group and half to control The treatment was simply informing the maids that their work satisfies the Surgeon General s recommendations for an active lifestyle Crum, A.J. and Langer, E.J. (2007). Mind-Set Matters: Exercise and the Placebo Effect, Psychological Science, 18:
7 Mind-Set Matters p-value =.01 p-value =.001
8 Covariate Imbalance The more covariates, the more likely at least one covariate will be imbalanced across treatment groups With just 10 independent covariates, the probability of a significant difference (α =.05) for at least one covariate is 1 (1-.05) 10 = 40%! Covariate imbalance is not limited to rare unlucky randomizations
9 The Gold Standard Randomized experiments are the gold standard for estimating causal effects WHY? 1. They yield unbiased estimates 2. They eliminate confounding variables on average! For any particular experiment, covariate imbalance is possible (and likely!), and conditional bias exists
10 1) Collect covariate data Specify criteria determining when a randomization is unacceptable; based on covariate balance 2) Randomize (Re)randomize subjects subjects to treated to treated and and control control RERANDOMIZATION Check covariate balance unacceptable acceptable 3) Conduct experiment 4) Analyze results with a randomization test
11 Unbiased? To maintain an unbiased estimate of the treatment effect, the decision to rerandomize or not should be Ø automatic and specified in advance Ø blind to which group is treated Theorem: If the treated and control groups are the same size, and if for every unacceptable randomization the exact opposite randomization is also unacceptable, then rerandomization yields an unbiased estimate of the treatment effect.
12 Unbiased? Equal sized treatment groups are not actually required, but that s the easiest case to consider. Theorem: If the rerandomization criterion is affinely invariant and if the covariates, x, are ellipsoidally symmetric, then rerandomization leads to unbiased estimates of any linear function of x.
13 Criteria for Acceptable Balance We recommend using Mahalanobis Distance, M, to represent multivariate distance between group means: M ( ) [ ] 1 X X 'cov( X) ( X X ) = T C T C Under adequate sample sizes and pure randomization: M 2 ~ χ k Choose a and rerandomize when M > a
14 Distribution of M p a = Probability of accepting a randomization RERANDOMIZE Acceptable Randomizations a M
15 Choosing p a Choosing the acceptance probability is a tradeoff between a desire for better balance and computational time The number of randomizations needed to get one successful one is Geometric(p a ), so the expected number needed is 1/p a Computational time must be considered in advance, since many simulated acceptable randomizations are needed to conduct the randomization test
16 Another Measure of Balance The obvious choice may be to set limits on the acceptable balance for each covariate individually This destroys the joint distribution of the covariates D2 D
17 Rerandomization Based on M Since M follows a known distribution, easy to specify the proportion of accepted randomizations M is affinely invariant (unaffected by affine transformations of the covariates) Correlations between covariates are maintained The balance improvement for each covariate is the same (and known) and is the same for any linear combination of the covariates
18 Covariates After Rerandomization Theorem: If n T = n C, the covariate means are normally distributed, and rerandomization occurs when M > a, then ( ) T C E X X M a = 0 and ( ) ( ) XT XC M a = v X a T XC cov cov. v a k a γ 1, , k k a γ, 2 2 where γ is the incomplete gamma function: c b 1 y γ (,) bc y e dy 0
19 Mind-Set Matters Difference in Means = -8.3
20 1000 Randomizations Pure Randomization Rerandomization p a = Difference in Mean Age
21
22 Percent Reduction in Variance Define the percent reduction in variance for each covariate to be the percentage by which rerandomization reduces the randomization variance for the difference in means: ( ) var( X j,t X ) j,c var( X j,t X ) j,c var X j,t X j,c rerandomization For rerandomization when M a, the percent reduction in variance for each covariate is 1 v a
23 Percent Reduction in Variance k = 10, p a =.001 a = 1.48 k v a = 2 γ k 2 +1, a 10 2 k γ 2, a = 2 γ , γ 2, =.12 Pure Randomization Rerandomization Percent reduction in variance = 88% Increase p a, how does percent reduction in variance change? k = 10, p a =.1 a = 4.87 k v a = 2 γ k 2 +1, a k γ 2, a = 2 γ +1, γ 2, =.38 Difference in Mean Age Percent reduction in variance = 62% Difference in Mean Age
24 Percent Reduction in Variance Percent Reduction in Variance R 2 =1 R 2 =.8 Acceptance Probability: log10 scale k: Number of Covariates
25 Estimated Treatment Effect Theorem: If n T = n C and rerandomization occurs when M > a, then E Y# $ Y# & M a = τ and if the covariate means are normally distributed and the treatment effect, τ, is additive, the percent reduction in variance for the outcome difference in means is (1 v a )R 2, where R 2 is the coefficient of determination between Y and x.
26 Outcome Variance Reduction Percent Reduction in Variance R 2 =1 R 2 =.8 R 2 =.5 R 2 =.2 Acceptance Probability: log10 scale k: Number of Covariates
27 Outcome Variance Reduction k = 10, p =.001 v =.12 a Outcome: Weight Change R 2 =.1 Percent Reduction in Variance = (1 v a )R 2 = (1.12 )(.1) = 8.8% a Outcome: Change in Diastolic Blood Pressure R 2 =.64 Percent Reduction in Variance = (1 v a )R 2 = (1.12 )(.64) = 56% Equivalent to increasing the sample size by 1/(1 -.56) = 2.27
28 Analysis Rerandomization can drastically improve power! BUT to take advantage of this, the analysis has to account for the rerandomization Typical distributional based methods (such as a t-test) will overestimate the standard error, and will be too conservative (p-values will be too insignificant) To get an accurate p-value and Type I error rate, use a randomization test, where each simulated randomization follows the same rerandomization procedure used in the actual experiment
29 Extensions Tiers of covariates importance Improving computational efficiency Sequential assignment Block experiments Unequal treatment group sizes Multiple treatment groups Choose the best from a fixed number of randomizations
30 Conclusion Rerandomization improves covariate balance between the treatment groups, giving the researcher more faith that an observed effect is really due to the treatment If the covariates are correlated with the outcome, rerandomization also increases precision in estimating the treatment effect, giving the researcher more power to detect a significant result
31 Lock Morgan, K. and Rubin, D.B. (2012). Rerandomization to Improve Covariate Balance in Experiments, Annals of Statistics, 40(2): Lock Morgan, K. and Rubin, D.B. (2015). Rerandomization to Balance Tiers of Covariates, JASA, 110(512):
32 Tiers of Covariates Place covariates into tiers of varying importance 1) Assess balance for the top tier as usual (calculate M 1, rerandomize if M 1 > a 1 ) 2) For each lower tier, regress each covariate on all covariates in more important tiers. Balance the residuals (calculate M t for the residuals of tier t, rerandomize if M t > a t ) Balance criteria can be more stringent for more important tiers
33
34 Computational Efficiency First, orthogonalize covariates: regress each covariate on all previous covariates, and use residuals as new covariate Mahalanobis distance simplifies to a sum of squared differences in covariate means If squared difference in means for any covariate exceeds threshold, rerandomize Result is the same as if we used original covariates, because Mahalanobis distance is affinely invariant
35 Sequential Assignment Rerandomization used in the design of a clinical trial proposed to the FDA for a drug to improve female sexual satisfaction Patients enroll sequentially: rerandomize in blocks of B women at a time (B = 6 for this example) 1. Randomize the first B women to enroll 2. Treat these women with placebo or active treatment 3. Wait for B more women to enroll 4. Use rerandomization for these next B women, measuring overall covariate balance for all 2B women enrolled 5. Treat these new B women with placebo or active 6. Wait for B more women to enroll For each rerandomization, generate a fixed number of randomizations and choose the best
36 Blocking This is not meant to replace blocking, and blocking and rerandomization can be used together Blocking is great for balancing a few very important covariates, but is not easily used for many covariates Block what you can, rerandomize what you can t
37 Unequal Group Sizes If necessary for incentive to participate, one option is to rerandomizeinto multiple equal sized groups, then combine groups after the fact Example: for a treatment group twice as big as the control, rerandomize into 3 balanced groups, then combine two to form the treatment group If only enough resources to support a certain number in one group, it can be worth reducing the sample size to get equal group sizes (if you care about unbiasedness)
38 Multiple Treatment Groups Rerandomization is easily extended to multiple treatment groups; just need a balance measure To measure balance, one of the MANOVA test statistics (such as Wilks Λ) can be used, measuring the ratio of within group variability to between group variability for multiple covariates This is equivalent to rerandomizing based on Mahalanobis distance if used on only two groups
39 Small Samples or Non-Normality The theoretical results given depend on M ~ χ k 2 This is not true if n is not large enough for the CLT to make the covariate means normal That is fine! Rerandomization does NOT depend on asymptotics. The threshold for a desired p a and the percent reduction in variance for covariates can be determined empirically through simulation Variance of the estimate is used nowhere in analysis. The randomization test will work, no matter how nonnormal the covariate means are
40 Conclusion Rerandomization improves covariate balance between the treatment groups, giving the researcher more faith that an observed effect is really due to the treatment If the covariates are correlated with the outcome, rerandomization also increases precision in estimating the treatment effect, giving the researcher more power to detect a significant result
41 Lock Morgan, K. and Rubin, D.B. (2012). Rerandomization to Improve Covariate Balance in Experiments, Annals of Statistics, 40(2): Lock Morgan, K. and Rubin, D.B. (2015). Rerandomization to Balance Tiers of Covariates, JASA, 110(512):
Combining Experimental and Non-Experimental Design in Causal Inference
Combining Experimental and Non-Experimental Design in Causal Inference Kari Lock Morgan Department of Statistics Penn State University Rao Prize Conference May 12 th, 2017 A Tribute to Don Design trumps
More informationarxiv: v1 [stat.me] 6 Nov 2015
Improving Covariate Balance in K Factorial Designs via Rerandomization Zach Branson, Tirthankar Dasgupta, and Donald B. Rubin Harvard University, Cambridge, USA Abstract arxiv:5.0973v [stat.me] 6 Nov 05
More informationMatching. Quiz 2. Matching. Quiz 2. Exact Matching. Estimand 2/25/14
STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University Frequency 0 2 4 6 8 Quiz 2 Histogram of Quiz2 10 12 14 16 18 20 Quiz2
More informationPropensity Score Methods for Causal Inference
John Pura BIOS790 October 2, 2015 Causal inference Philosophical problem, statistical solution Important in various disciplines (e.g. Koch s postulates, Bradford Hill criteria, Granger causality) Good
More informationAnswers to Problem Set #4
Answers to Problem Set #4 Problems. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance covariance matrix are given by: bβ bβ 2 bβ 3 = 2
More informationObservational Studies and Propensity Scores
Observational Studies and s STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University Makeup Class Rather than making you come
More informationWeighting. Homework 2. Regression. Regression. Decisions Matching: Weighting (0) W i. (1) -å l i. )Y i. (1-W i 3/5/2014. (1) = Y i.
Weighting Unconfounded Homework 2 Describe imbalance direction matters STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University
More informationGroup Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology
Group Sequential Tests for Delayed Responses Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Lisa Hampson Department of Mathematics and Statistics,
More informationIndividualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models
Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University July 31, 2018
More informationDEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS
DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS Donald B. Rubin Harvard University 1 Oxford Street, 7th Floor Cambridge, MA 02138 USA Tel: 617-495-5496; Fax: 617-496-8057 email: rubin@stat.harvard.edu
More informationBalancing Covariates via Propensity Score Weighting: The Overlap Weights
Balancing Covariates via Propensity Score Weighting: The Overlap Weights Kari Lock Morgan Department of Statistics Penn State University klm47@psu.edu PSU Methodology Center Brown Bag April 6th, 2017 Joint
More informationUse of Matching Methods for Causal Inference in Experimental and Observational Studies. This Talk Draws on the Following Papers:
Use of Matching Methods for Causal Inference in Experimental and Observational Studies Kosuke Imai Department of Politics Princeton University April 27, 2007 Kosuke Imai (Princeton University) Matching
More informationCHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials
CHL 55 H Crossover Trials The Two-sequence, Two-Treatment, Two-period Crossover Trial Definition A trial in which patients are randomly allocated to one of two sequences of treatments (either 1 then, or
More informationApproximate analysis of covariance in trials in rare diseases, in particular rare cancers
Approximate analysis of covariance in trials in rare diseases, in particular rare cancers Stephen Senn (c) Stephen Senn 1 Acknowledgements This work is partly supported by the European Union s 7th Framework
More informationMultiple linear regression S6
Basic medical statistics for clinical and experimental research Multiple linear regression S6 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/42 Introduction Two main motivations for doing multiple
More informationSTAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression
STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test
More informationReview of the General Linear Model
Review of the General Linear Model EPSY 905: Multivariate Analysis Online Lecture #2 Learning Objectives Types of distributions: Ø Conditional distributions The General Linear Model Ø Regression Ø Analysis
More informationIntroduction to Crossover Trials
Introduction to Crossover Trials Stat 6500 Tutorial Project Isaac Blackhurst A crossover trial is a type of randomized control trial. It has advantages over other designed experiments because, under certain
More informationSection 9c. Propensity scores. Controlling for bias & confounding in observational studies
Section 9c Propensity scores Controlling for bias & confounding in observational studies 1 Logistic regression and propensity scores Consider comparing an outcome in two treatment groups: A vs B. In a
More informationBalancing Covariates via Propensity Score Weighting
Balancing Covariates via Propensity Score Weighting Kari Lock Morgan Department of Statistics Penn State University klm47@psu.edu Stochastic Modeling and Computational Statistics Seminar October 17, 2014
More informationConstruction and statistical analysis of adaptive group sequential designs for randomized clinical trials
Construction and statistical analysis of adaptive group sequential designs for randomized clinical trials Antoine Chambaz (MAP5, Université Paris Descartes) joint work with Mark van der Laan Atelier INSERM
More informationOptimal Blocking by Minimizing the Maximum Within-Block Distance
Optimal Blocking by Minimizing the Maximum Within-Block Distance Michael J. Higgins Jasjeet Sekhon Princeton University University of California at Berkeley November 14, 2013 For the Kansas State University
More informationQuestion. Hypothesis testing. Example. Answer: hypothesis. Test: true or not? Question. Average is not the mean! μ average. Random deviation or not?
Hypothesis testing Question Very frequently: what is the possible value of μ? Sample: we know only the average! μ average. Random deviation or not? Standard error: the measure of the random deviation.
More informationGroup Sequential Designs: Theory, Computation and Optimisation
Group Sequential Designs: Theory, Computation and Optimisation Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj 8th International Conference
More informationGenerative classifiers: The Gaussian classifier. Ata Kaban School of Computer Science University of Birmingham
Generative classifiers: The Gaussian classifier Ata Kaban School of Computer Science University of Birmingham Outline We have already seen how Bayes rule can be turned into a classifier In all our examples
More informationLeast Squares Estimation
Least Squares Estimation Using the least squares estimator for β we can obtain predicted values and compute residuals: Ŷ = Z ˆβ = Z(Z Z) 1 Z Y ˆɛ = Y Ŷ = Y Z(Z Z) 1 Z Y = [I Z(Z Z) 1 Z ]Y. The usual decomposition
More informationBasics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations
Basics of Experimental Design Review of Statistics And Experimental Design Scientists study relation between variables In the context of experiments these variables are called independent and dependent
More informationUse of Matching Methods for Causal Inference in Experimental and Observational Studies. This Talk Draws on the Following Papers:
Use of Matching Methods for Causal Inference in Experimental and Observational Studies Kosuke Imai Department of Politics Princeton University April 13, 2009 Kosuke Imai (Princeton University) Matching
More informationwith the usual assumptions about the error term. The two values of X 1 X 2 0 1
Sample questions 1. A researcher is investigating the effects of two factors, X 1 and X 2, each at 2 levels, on a response variable Y. A balanced two-factor factorial design is used with 1 replicate. The
More informationEconometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017
Econometrics with Observational Data Introduction and Identification Todd Wagner February 1, 2017 Goals for Course To enable researchers to conduct careful quantitative analyses with existing VA (and non-va)
More informationBios 6649: Clinical Trials - Statistical Design and Monitoring
Bios 6649: Clinical Trials - Statistical Design and Monitoring Spring Semester 2015 John M. Kittelson Department of Biostatistics & Informatics Colorado School of Public Health University of Colorado Denver
More informationSequential rerandomization
Sequential rerandomization BY QUAN ZHOU, PHILIP. A. ERNST Department of Statistics, Rice University, Houston, Texas 77005, U.S.A. quan.zhou@rice.edu philip.ernst@rice.edu KARI LOCK MORGAN Department of
More informationPubh 8482: Sequential Analysis
Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 7 Course Summary To this point, we have discussed group sequential testing focusing on Maintaining
More informationMonitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs. Christopher Jennison
Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs Christopher Jennison Department of Mathematical Sciences, University of Bath http://people.bath.ac.uk/mascj
More informationPractical Considerations Surrounding Normality
Practical Considerations Surrounding Normality Prof. Kevin E. Thorpe Dalla Lana School of Public Health University of Toronto KE Thorpe (U of T) Normality 1 / 16 Objectives Objectives 1. Understand the
More informationGov 2002: 4. Observational Studies and Confounding
Gov 2002: 4. Observational Studies and Confounding Matthew Blackwell September 10, 2015 Where are we? Where are we going? Last two weeks: randomized experiments. From here on: observational studies. What
More informationLecture 1 January 18
STAT 263/363: Experimental Design Winter 2016/17 Lecture 1 January 18 Lecturer: Art B. Owen Scribe: Julie Zhu Overview Experiments are powerful because you can conclude causality from the results. In most
More informationGrowth Mixture Model
Growth Mixture Model Latent Variable Modeling and Measurement Biostatistics Program Harvard Catalyst The Harvard Clinical & Translational Science Center Short course, October 28, 2016 Slides contributed
More informationPubh 8482: Sequential Analysis
Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 12 Review So far... We have discussed the role of phase III clinical trials in drug development
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2015 Paper 334 Targeted Estimation and Inference for the Sample Average Treatment Effect Laura B. Balzer
More informationSampling and Sample Size. Shawn Cole Harvard Business School
Sampling and Sample Size Shawn Cole Harvard Business School Calculating Sample Size Effect Size Power Significance Level Variance ICC EffectSize 2 ( ) 1 σ = t( 1 κ ) + tα * * 1+ ρ( m 1) P N ( 1 P) Proportion
More informationComparison of Two Samples
2 Comparison of Two Samples 2.1 Introduction Problems of comparing two samples arise frequently in medicine, sociology, agriculture, engineering, and marketing. The data may have been generated by observation
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 17 for Applied Multivariate Analysis Outline Multivariate Analysis of Variance 1 Multivariate Analysis of Variance The hypotheses:
More informationOverlap Propensity Score Weighting to Balance Covariates
Overlap Propensity Score Weighting to Balance Covariates Kari Lock Morgan Department of Statistics Penn State University klm47@psu.edu JSM 2016 Chicago, IL Joint work with Fan Li (Duke) and Alan Zaslavsky
More informationQ learning. A data analysis method for constructing adaptive interventions
Q learning A data analysis method for constructing adaptive interventions SMART First stage intervention options coded as 1(M) and 1(B) Second stage intervention options coded as 1(M) and 1(B) O1 A1 O2
More informationLecture 7 Time-dependent Covariates in Cox Regression
Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the
More informationCausal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD
Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification Todd MacKenzie, PhD Collaborators A. James O Malley Tor Tosteson Therese Stukel 2 Overview 1. Instrumental variable
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 267 Optimizing Randomized Trial Designs to Distinguish which Subpopulations Benefit from
More informationEstimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis.
Estimating the Mean Response of Treatment Duration Regimes in an Observational Study Anastasios A. Tsiatis http://www.stat.ncsu.edu/ tsiatis/ Introduction to Dynamic Treatment Regimes 1 Outline Description
More informationIntroduction to Econometrics. Review of Probability & Statistics
1 Introduction to Econometrics Review of Probability & Statistics Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com Introduction 2 What is Econometrics? Econometrics consists of the application of mathematical
More informationOptimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping
: Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj InSPiRe Conference on Methodology
More informationHYPOTHESIS TESTING. Hypothesis Testing
MBA 605 Business Analytics Don Conant, PhD. HYPOTHESIS TESTING Hypothesis testing involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population.
More informationCourse Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model
Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -
More informationECON 3150/4150, Spring term Lecture 7
ECON 3150/4150, Spring term 2014. Lecture 7 The multivariate regression model (I) Ragnar Nymoen University of Oslo 4 February 2014 1 / 23 References to Lecture 7 and 8 SW Ch. 6 BN Kap 7.1-7.8 2 / 23 Omitted
More informationSTAT 501 EXAM I NAME Spring 1999
STAT 501 EXAM I NAME Spring 1999 Instructions: You may use only your calculator and the attached tables and formula sheet. You can detach the tables and formula sheet from the rest of this exam. Show your
More informationControl Function Instrumental Variable Estimation of Nonlinear Causal Effect Models
Journal of Machine Learning Research 17 (2016) 1-35 Submitted 9/14; Revised 2/16; Published 2/16 Control Function Instrumental Variable Estimation of Nonlinear Causal Effect Models Zijian Guo Department
More informationPubh 8482: Sequential Analysis
Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 10 Class Summary Last time... We began our discussion of adaptive clinical trials Specifically,
More informationIgnoring the matching variables in cohort studies - when is it valid, and why?
Ignoring the matching variables in cohort studies - when is it valid, and why? Arvid Sjölander Abstract In observational studies of the effect of an exposure on an outcome, the exposure-outcome association
More information11 CHI-SQUARED Introduction. Objectives. How random are your numbers? After studying this chapter you should
11 CHI-SQUARED Chapter 11 Chi-squared Objectives After studying this chapter you should be able to use the χ 2 distribution to test if a set of observations fits an appropriate model; know how to calculate
More informationAdaptive Trial Designs
Adaptive Trial Designs Wenjing Zheng, Ph.D. Methods Core Seminar Center for AIDS Prevention Studies University of California, San Francisco Nov. 17 th, 2015 Trial Design! Ethical:!eg.! Safety!! Efficacy!
More informationMultivariate Linear Regression Models
Multivariate Linear Regression Models Regression analysis is used to predict the value of one or more responses from a set of predictors. It can also be used to estimate the linear association between
More informationEconometric Causality
Econometric (2008) International Statistical Review, 76(1):1-27 James J. Heckman Spencer/INET Conference University of Chicago Econometric The econometric approach to causality develops explicit models
More informationLecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations)
Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology
More informationAdvanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1
Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1 Gary King GaryKing.org April 13, 2014 1 c Copyright 2014 Gary King, All Rights Reserved. Gary King ()
More informationInterim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping
Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationOn Selecting Tests for Equality of Two Normal Mean Vectors
MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department
More informationHarvard University. Rigorous Research in Engineering Education
Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09 Statistical Inference You have a sample and want to use the data collected
More informationStatistical Inference of Covariate-Adjusted Randomized Experiments
1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu
More informationDetermining Sufficient Number of Imputations Using Variance of Imputation Variances: Data from 2012 NAMCS Physician Workflow Mail Survey *
Applied Mathematics, 2014,, 3421-3430 Published Online December 2014 in SciRes. http://www.scirp.org/journal/am http://dx.doi.org/10.4236/am.2014.21319 Determining Sufficient Number of Imputations Using
More informationBlocks are formed by grouping EUs in what way? How are experimental units randomized to treatments?
VI. Incomplete Block Designs A. Introduction What is the purpose of block designs? Blocks are formed by grouping EUs in what way? How are experimental units randomized to treatments? 550 What if we have
More informationIntermediate Econometrics
Intermediate Econometrics Markus Haas LMU München Summer term 2011 15. Mai 2011 The Simple Linear Regression Model Considering variables x and y in a specific population (e.g., years of education and wage
More informationAn Introduction to Mplus and Path Analysis
An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression
More informationHANDBOOK OF APPLICABLE MATHEMATICS
HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester
More informationThe t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary
Patrick Breheny October 13 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/25 Introduction Introduction What s wrong with z-tests? So far we ve (thoroughly!) discussed how to carry out hypothesis
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationBayesian Nonparametric Accelerated Failure Time Models for Analyzing Heterogeneous Treatment Effects
Bayesian Nonparametric Accelerated Failure Time Models for Analyzing Heterogeneous Treatment Effects Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University September 28,
More informationEstimating Causal Effects of Organ Transplantation Treatment Regimes
Estimating Causal Effects of Organ Transplantation Treatment Regimes David M. Vock, Jeffrey A. Verdoliva Boatman Division of Biostatistics University of Minnesota July 31, 2018 1 / 27 Hot off the Press
More informationAP Final Review II Exploring Data (20% 30%)
AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure
More informationAPPENDIX B Sample-Size Calculation Methods: Classical Design
APPENDIX B Sample-Size Calculation Methods: Classical Design One/Paired - Sample Hypothesis Test for the Mean Sign test for median difference for a paired sample Wilcoxon signed - rank test for one or
More informationStatistics: revision
NST 1B Experimental Psychology Statistics practical 5 Statistics: revision Rudolf Cardinal & Mike Aitken 29 / 30 April 2004 Department of Experimental Psychology University of Cambridge Handouts: Answers
More informationLogistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015
Logistic regression: Why we often can do what we think we can do Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015 1 Introduction Introduction - In 2010 Carina Mood published an overview article
More information12 Discriminant Analysis
12 Discriminant Analysis Discriminant analysis is used in situations where the clusters are known a priori. The aim of discriminant analysis is to classify an observation, or several observations, into
More informationCompSci Understanding Data: Theory and Applications
CompSci 590.6 Understanding Data: Theory and Applications Lecture 17 Causality in Statistics Instructor: Sudeepa Roy Email: sudeepa@cs.duke.edu Fall 2015 1 Today s Reading Rubin Journal of the American
More information5. Erroneous Selection of Exogenous Variables (Violation of Assumption #A1)
5. Erroneous Selection of Exogenous Variables (Violation of Assumption #A1) Assumption #A1: Our regression model does not lack of any further relevant exogenous variables beyond x 1i, x 2i,..., x Ki and
More informationOn the efficiency of two-stage adaptive designs
On the efficiency of two-stage adaptive designs Björn Bornkamp (Novartis Pharma AG) Based on: Dette, H., Bornkamp, B. and Bretz F. (2010): On the efficiency of adaptive designs www.statistik.tu-dortmund.de/sfb823-dp2010.html
More informationNISS. Technical Report Number 167 June 2007
NISS Estimation of Propensity Scores Using Generalized Additive Models Mi-Ja Woo, Jerome Reiter and Alan F. Karr Technical Report Number 167 June 2007 National Institute of Statistical Sciences 19 T. W.
More informationBIOS 2083 Linear Models c Abdus S. Wahed
Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter
More informationHaphazard Intentional Allocation and Rerandomization to Improve Covariate Balance in Experiments
Haphazard Intentional Allocation and Rerandomization to Improve Covariate Balance in Experiments Marcelo S. Lauretto, Rafael B. Stern, Kari L. Morgan, Margaret H. Clark +, Julio M. Stern Universidade de
More informationTutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis. Acknowledgements:
Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis Anna E. Barón, Keith E. Muller, Sarah M. Kreidler, and Deborah H. Glueck Acknowledgements: The project was supported
More informationMultiple Linear Regression
Multiple Linear Regression Asymptotics Asymptotics Multiple Linear Regression: Assumptions Assumption MLR. (Linearity in parameters) Assumption MLR. (Random Sampling from the population) We have a random
More informationAn Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies
Paper 177-2015 An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies Yan Wang, Seang-Hwane Joo, Patricia Rodríguez de Gil, Jeffrey D. Kromrey, Rheta
More information6 Multivariate Regression
6 Multivariate Regression 6.1 The Model a In multiple linear regression, we study the relationship between several input variables or regressors and a continuous target variable. Here, several target variables
More informationA Practitioner s Guide to Cluster-Robust Inference
A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode
More informationAn Introduction to Path Analysis
An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving
More informationGov 2000: 9. Regression with Two Independent Variables
Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Fall 2016 1 / 62 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics
More informationSTK4900/ Lecture 3. Program
STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies
More informationPreface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of
Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Probability Sampling Procedures Collection of Data Measures
More informationSequential rerandomization
Sequential rerandomization BY QUAN ZHOU, PHILIP. A. ERNST Department of Statistics, Rice University, Houston, Texas 77005, U.S.A. quan.zhou@rice.edu philip.ernst@rice.edu Rerandomization is a method for
More informationECON Introductory Econometrics. Lecture 17: Experiments
ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.
More information