Statistical Inference of Covariate-Adjusted Randomized Experiments
Feifang Hu
Department of Statistics, George Washington University
Joint research with Wei Ma, Yichen Qin and Yang Li
Nov 8, 2018 at IMA,
Outline
- Introduction
- Framework
- General Properties
- Implementation and Correction
- Numerical Studies
- Conclusion
1 Introduction
Covariate-adjusted randomization is frequently used because it utilizes the covariate information to form more balanced treatment groups.
- Balance categorical covariates: Pocock and Simon's minimization method and its extensions (Taves 1974; Pocock and Simon 1975; Hu and Hu 2012).
- Balance continuous covariates based on distribution characteristics, e.g., mean and variance (Frane 1998), quartiles (Su 2011), density function (Ma and Hu 2013).
- Balance continuous covariates based on models (Atkinson 1982; Smith 1984a,b).
- Balance covariates available prior to the experiment onset (Morgan and Rubin 2012, 2015; Qin et al. 2017).
Since covariate-adjusted randomization inevitably uses the covariate information to form more balanced treatment groups, the subsequent statistical inference is usually affected and demonstrates undesirable properties, such as reduced type I error and power.
This conservativeness is particularly common for a working model that includes only a subset of the covariates used in randomization, such as the two-sample t-test.
Ideally, the covariates used in randomization should be included in the subsequent analysis to achieve a valid test.
However, unadjusted tests still dominate in practice (Sverdlov, 2015).
- Investigation sites
- Simplicity of the test procedure
- Robustness to model misspecification
As covariates are commonly used in comparative studies (biomarker analysis, precision medicine, and crowdsourced internet experimentation), understanding the impact of covariate-adjusted randomization on statistical inference is an increasingly pressing problem.
Existing work
- Birkett (1985), Forsythe (1987), etc.: mainly based on simulations.
- Shao et al. (2010) show that the t-test is conservative for the stratified biased coin design.
- Ma et al. (2015) studied tests under a linear model for discrete covariate-adjusted randomization by assuming that overall and marginal imbalances are bounded in probability.
Limitations
- Not applicable to randomizations that directly balance continuous covariates, e.g., Atkinson's $D_A$-Biased Coin Design.
- The assumed balancing properties are too strong, i.e., $O_p(1)$ marginal imbalances.
- They do not consider the scenario in which covariate information is available before the experiment starts, e.g., Rerandomization, Pairwise Sequential Randomization.
Motivations
- Derive the statistical properties of inference under general covariate-adjusted randomization methods.
- Explicitly display the relationship between covariate balance and inference, and explain why inference behaves differently for various randomization methods.
- Obtain results that have broad applications, including Rerandomization (RR), Pairwise Sequential Randomization (PSR), and Atkinson's $D_A$-Biased Coin Design ($D_A$-BCD), and compare these methods analytically.
- Propose a method to attain valid and powerful tests.
2 Framework
Suppose that $n$ units are to be assigned to two treatment groups. $T_i$ denotes the assignment of the $i$-th unit, i.e., $T_i = 1$ for treatment 1 and $T_i = 0$ for treatment 2.
Let $x_i = (x_{i,1}, \ldots, x_{i,p+q})^t$ represent $p + q$ iid covariates observed for the $i$-th unit, where $x_{i,j} \sim X_j$ for $i = 1, \ldots, n$.
The underlying model:
$$Y_i = \mu_1 T_i + \mu_2 (1 - T_i) + \sum_{j=1}^{p+q} \beta_j x_{i,j} + \epsilon_i,$$
where $\mu_1 - \mu_2$ is the treatment effect, $\beta = (\beta_1, \ldots, \beta_{p+q})^t$ is the vector of covariate effects, and $\epsilon_i$ is an iid random error with mean zero and variance $\sigma_\epsilon^2$ that is independent of the covariates.
Covariates are assumed independent of each other with $E X_j = 0$ for $j = 1, \ldots, p + q$.
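To make the underlying model concrete, here is a minimal simulation sketch. The function name `simulate_outcomes`, the sample size, and the use of complete randomization for $T$ are illustrative assumptions, not part of the talk; the parameter values mirror those used later in the numerical studies.

```python
import numpy as np

def simulate_outcomes(T, X, mu1, mu2, beta, sigma_eps, rng):
    """Draw Y_i = mu1*T_i + mu2*(1 - T_i) + sum_j beta_j * x_ij + eps_i."""
    eps = rng.normal(0.0, sigma_eps, size=len(T))  # iid errors, mean 0, sd sigma_eps
    return mu1 * T + mu2 * (1 - T) + X @ beta + eps

# Illustrative setup (assumed values): p + q = 4 independent mean-zero covariates.
rng = np.random.default_rng(0)
n, p, q = 200, 2, 2
X = rng.normal(size=(n, p + q))
T = rng.integers(0, 2, size=n)  # placeholder: complete randomization
Y = simulate_outcomes(T, X, mu1=0.0, mu2=0.0, beta=np.ones(p + q), sigma_eps=2.0, rng=rng)
```

Any covariate-adjusted procedure can be swapped in for the placeholder assignment of `T` without changing the outcome model.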
After allocating the units to treatment groups via covariate-adjusted randomization, a working model is used to estimate and test the treatment effect.
In such a working model, it is common in practice to include a subset of the covariates used in randomization, or sometimes even no covariates at all (Shao et al. 2010, Ma et al. 2015, Sverdlov 2015).
The working model:
$$E[Y_i] = \mu_1 T_i + \mu_2 (1 - T_i) + \sum_{j=1}^{p} \beta_j x_{i,j}.$$
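Let $Y = (Y_1, \ldots, Y_n)^t$, $T = (T_1, \ldots, T_n)^t$, $X = [X_{in}; X_{ex}]$, where
$$X_{in} = \begin{pmatrix} x_{1,1} & \cdots & x_{1,p} \\ \vdots & & \vdots \\ x_{n,1} & \cdots & x_{n,p} \end{pmatrix}, \quad X_{ex} = \begin{pmatrix} x_{1,p+1} & \cdots & x_{1,p+q} \\ \vdots & & \vdots \\ x_{n,p+1} & \cdots & x_{n,p+q} \end{pmatrix}.$$
Further let $\beta_{in} = (\beta_1, \ldots, \beta_p)^t$, $\beta_{ex} = (\beta_{p+1}, \ldots, \beta_{p+q})^t$, so that $\beta = (\beta_{in}^t, \beta_{ex}^t)^t$. Then the working model can also be written as
$$E[Y] = G\theta,$$
where $G = [T; 1_n - T; X_{in}]$ is the design matrix, $\theta = (\mu_1, \mu_2, \beta_{in}^t)^t$ is the vector of parameters of interest, and $1_n$ is the $n$-dimensional vector of ones.
The ordinary least squares (OLS) estimate of $\theta$, $\hat\theta = (\hat\mu_1, \hat\mu_2, \hat\beta_{in}^t)^t$, is
$$\hat\theta = (G^t G)^{-1} G^t Y.$$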
Testing the treatment effect:
$$H_0: \mu_1 - \mu_2 = 0 \quad \text{versus} \quad H_1: \mu_1 - \mu_2 \neq 0,$$
and the test statistic is
$$S = \frac{L^t \hat\theta}{\sqrt{\hat\sigma_w^2 \, L^t (G^t G)^{-1} L}},$$
where $L = (1, -1, 0, \ldots, 0)^t$ is a vector of length $p + 2$, and $\hat\sigma_w^2 = \|Y - G\hat\theta\|^2 / (n - p - 2)$ is the model-based estimate of the error variance $\sigma_w^2 = \sigma_\epsilon^2 + \sum_{j=1}^{q} \beta_{p+j}^2 \mathrm{Var}(X_{p+j})$.
The traditional testing procedure is to reject the null hypothesis at the significance level $\alpha$ if $|S| > z_{1-\alpha/2}$, where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$-th quantile of a standard normal distribution.
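The OLS estimate and the statistic $S$ can be computed directly from their definitions. The sketch below is one way to do so; the function name `treatment_test_statistic` and the simulated data (complete randomization, four covariates of which two enter the working model) are assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

def treatment_test_statistic(Y, T, X_in):
    """Return (theta_hat, S) for the working model E[Y] = G theta.

    X_in must be a 2-D array of shape (n, p); p = 0 gives the two-sample test.
    """
    n, p = len(Y), X_in.shape[1]
    G = np.column_stack([T, 1 - T, X_in])        # design matrix [T; 1_n - T; X_in]
    GtG_inv = np.linalg.inv(G.T @ G)
    theta_hat = GtG_inv @ G.T @ Y                # OLS estimate
    resid = Y - G @ theta_hat
    sigma2_w_hat = resid @ resid / (n - p - 2)   # model-based error variance
    L = np.zeros(p + 2)
    L[0], L[1] = 1.0, -1.0                       # contrast picking mu1 - mu2
    S = (L @ theta_hat) / np.sqrt(sigma2_w_hat * (L @ GtG_inv @ L))
    return theta_hat, S

# Illustrative use under complete randomization (assumed setup).
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
T = rng.integers(0, 2, size=n).astype(float)
Y = X @ np.ones(4) + rng.normal(0.0, 2.0, size=n)   # mu1 = mu2 = 0
theta_hat, S = treatment_test_statistic(Y, T, X[:, :2])
reject = abs(S) > NormalDist().inv_cdf(0.975)       # traditional rule, alpha = 0.05
```

With `X_in` of shape `(n, 0)` the statistic reduces to the pooled two-sample t statistic.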
Testing the covariate effects:
Let $C$ be an $m \times (p + 2)$ matrix of rank $m$ ($m \leq p$) with entries in the first two columns all equal to zero (no treatment effect to test).
$$H_0: C\theta = c_0 \quad \text{versus} \quad H_1: C\theta = c_1, \qquad (1)$$
and the test statistic is
$$S = \frac{(C\hat\theta - c_0)^t [C (G^t G)^{-1} C^t]^{-1} (C\hat\theta - c_0)}{m \hat\sigma_w^2}.$$
The traditional testing procedure is to reject the null hypothesis at the significance level $\alpha$ if $S > z_{1-\alpha/2}$, where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$-th quantile of a standard normal distribution.
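The statistic for the covariate effects follows the same pattern. This is a minimal sketch that only evaluates $S$ from its definition; the function name `covariate_test_statistic` and the example data are hypothetical, and no rejection rule is implemented.

```python
import numpy as np

def covariate_test_statistic(Y, T, X_in, C, c0):
    """Quadratic-form statistic S for H0: C theta = c0 under the working model."""
    n, p = X_in.shape
    G = np.column_stack([T, 1 - T, X_in])
    GtG_inv = np.linalg.inv(G.T @ G)
    theta_hat = GtG_inv @ G.T @ Y
    resid = Y - G @ theta_hat
    sigma2_w_hat = resid @ resid / (n - p - 2)
    d = C @ theta_hat - c0                       # deviation from the null value
    middle = np.linalg.inv(C @ GtG_inv @ C.T)
    m = C.shape[0]
    return (d @ middle @ d) / (m * sigma2_w_hat)

# Illustrative use (assumed data): test beta_1 = 0 with p = 2 covariates in the model.
rng = np.random.default_rng(1)
n = 50
X_in = rng.normal(size=(n, 2))
T = rng.integers(0, 2, size=n).astype(float)
Y = X_in[:, 0] + rng.normal(size=n)
C = np.array([[0.0, 0.0, 1.0, 0.0]])   # first two columns are zero
S_cov = covariate_test_statistic(Y, T, X_in, C, np.array([0.0]))
```

For $m = 1$ the value equals the square of the usual t statistic for the selected coefficient.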
3 General Properties
Assumption 1 (Global balance): $n^{-1} \sum_{i=1}^{n} (2T_i - 1) \xrightarrow{p} 0$.
Assumption 2 (Covariate balance): $n^{-1/2} \sum_{i=1}^{n} (2T_i - 1) x_i \xrightarrow{d} \xi$, where $\xi$ is a $(p+q)$-dimensional random vector with $E[\xi] = 0$.
Consistency:
Theorem 3.1 Given Assumptions 1 and 2, we have $\hat\theta \xrightarrow{p} \theta$.
Testing the treatment effect:
We partition $\xi = (\xi_{in}^t, \xi_{ex}^t)^t$ so that $\xi_{in}$ represents the first $p$ dimensions of $\xi$, and $\xi_{ex}$ the last $q$ dimensions. Further let $\lambda_1 = \sigma_\epsilon / \sigma_w$, $\lambda_2 = 1 / \sigma_w$, and $Z$ be a standard normal random variable that is independent of $\xi_{ex}$.
Theorem 3.2 Given Assumptions 1 and 2, we have
1. Under $H_0: \mu_1 - \mu_2 = 0$,
$$S \xrightarrow{d} \lambda_1 Z + \lambda_2 \beta_{ex}^t \xi_{ex}.$$
2. Under $H_1: \mu_1 - \mu_2 \neq 0$, consider a sequence of local alternatives with $\mu_1 - \mu_2 = \delta / \sqrt{n}$ for a fixed $\delta \neq 0$; then
$$S \xrightarrow{d} \lambda_1 Z + \lambda_2 \beta_{ex}^t \xi_{ex} + \frac{1}{2} \lambda_2 \delta.$$
- The asymptotic distribution of the test statistic $S$ under $H_0$ consists of two independent components, $\lambda_1 Z$ and $\lambda_2 \beta_{ex}^t \xi_{ex}$.
- The first component is due to the random error $\epsilon_i$ in the underlying model, and remains invariant under different covariate-adjusted randomizations.
- The second component of $S$ represents the impact of a covariate-adjusted randomization on the test statistic through the level of covariate balance.
- Under covariate-adjusted randomization, $\xi$ is more concentrated around 0 than under complete randomization, leading to conservative tests.
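A short worked check of this decomposition, assuming that under complete randomization $\xi \sim N(0, \Sigma)$ with $\Sigma = \mathrm{diag}(\mathrm{Var}(X_1), \ldots, \mathrm{Var}(X_{p+q}))$, so that $\xi_{ex}$ has independent components with variances $\mathrm{Var}(X_{p+j})$:

$$\mathrm{Var}\left(\lambda_1 Z + \lambda_2 \beta_{ex}^t \xi_{ex}\right) = \lambda_1^2 + \lambda_2^2 \sum_{j=1}^{q} \beta_{p+j}^2 \mathrm{Var}(X_{p+j}) = \frac{\sigma_\epsilon^2 + \sum_{j=1}^{q} \beta_{p+j}^2 \mathrm{Var}(X_{p+j})}{\sigma_w^2} = 1.$$

So under complete randomization the limit is $N(0, 1)$ and the traditional test has the nominal level. Any procedure that makes $\mathrm{Var}(\beta_{ex}^t \xi_{ex})$ smaller gives a limiting variance below 1, which is the source of the conservativeness.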
Testing the covariate effects:
Theorem 3.3 Given Assumptions 1 and 2, we have
1. Under $H_0: C\theta = c_0$,
$$S \xrightarrow{d} \chi_m^2 / m.$$
2. Under $H_1: C\theta = c_1$, consider a sequence of local alternatives with $c_1 - c_0 = \Delta / \sqrt{n}$ for a fixed $\Delta \neq 0$; then
$$S \xrightarrow{d} \chi_m^2(\phi) / m, \quad \phi = \Delta^t [C V^{-1} C^t]^{-1} \Delta / \sigma_w^2,$$
where $\phi$ is the noncentrality parameter, and $V = \mathrm{diag}(1/2, 1/2, \mathrm{Var}(X_1), \ldots, \mathrm{Var}(X_p))$.
- The type I error is maintained when testing the covariate effects under covariate-adjusted randomization.
- The power, however, is reduced if not all covariate information is incorporated in the working model.
4 Implementation and Correction
4.1 Examples
- Complete Randomization
- Rerandomization (Morgan and Rubin 2012, 2015): Repeat the traditional randomization process until a satisfactory configuration is achieved.
- Pairwise Sequential Randomization (Qin et al. 2017): An alternative that achieves the optimal covariate balance and is computationally more efficient.
- Atkinson's $D_A$-Biased Coin Design (Atkinson 1982; Smith 1984a,b): Represents a large class of methods that take covariates into account in allocation rules based on certain optimality criteria.
Rerandomization
(1) Collect covariate data.
(2) Specify a balance criterion to determine when a randomization is acceptable. For example, the criterion could be defined as a threshold $a > 0$ on some user-defined imbalance measure, denoted as $M$.
(3) Randomize the units into treatment groups using traditional randomization methods, such as CR.
(4) Check the balance criterion $M < a$. If the criterion is satisfied, go to Step (5); otherwise, return to Step (3).
(5) Perform the experiment using the final randomization obtained in Step (4).
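Steps (2) through (4) can be sketched as a simple accept/reject loop. The imbalance measure below is the Mahalanobis distance between group means, which is one common choice and an assumption here, since the slide leaves $M$ user-defined; the names `mahalanobis_imbalance`, `rerandomize`, and `max_tries` are hypothetical.

```python
import numpy as np

def mahalanobis_imbalance(T, X):
    """M = (n1*n0/n) * (xbar1 - xbar0)^t Cov(X)^{-1} (xbar1 - xbar0)."""
    n1 = int(T.sum())
    n0 = len(T) - n1
    diff = X[T == 1].mean(axis=0) - X[T == 0].mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    return (n1 * n0 / len(T)) * (diff @ np.linalg.solve(cov, diff))

def rerandomize(X, a, rng, max_tries=100_000):
    """Redraw a complete randomization until the imbalance M falls below a."""
    n = X.shape[0]
    for _ in range(max_tries):
        T = rng.integers(0, 2, size=n)          # Step (3): complete randomization
        if T.sum() in (0, n):                   # both groups must be non-empty
            continue
        if mahalanobis_imbalance(T, X) < a:     # Step (4): balance criterion
            return T
    raise RuntimeError("no acceptable randomization found; consider a larger a")

# Illustrative use with assumed data and threshold.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
T = rerandomize(X, a=1.0, rng=rng)
```

A smaller threshold `a` gives better balance at the cost of more redraws, which is the computational trade-off of this procedure.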
Pairwise Sequential Randomization
(1) Collect covariate data.
(2) Choose the covariate imbalance measure for $n$ units, denoted as $M(n)$.
(3) Randomly arrange all $n$ units in a sequence $x_1, \ldots, x_n$.
(4) Separately assign the first two units to treatment 1 and treatment 2.
(5) Suppose that $2i$ units have been assigned to treatment groups ($i \geq 1$). For the $(2i + 1)$-th and $(2i + 2)$-th units:
(5a) If the $(2i + 1)$-th unit is assigned to treatment 1 and the $(2i + 2)$-th unit is assigned to treatment 2 (i.e., $T_{2i+1} = 1$ and $T_{2i+2} = 0$), then we can calculate the potential imbalance measure, $M_i^{(1)}$, between the updated treatment groups with $2i + 2$ units.
(5b) Similarly, if the $(2i + 1)$-th unit is assigned to treatment 2 and the $(2i + 2)$-th unit is assigned to treatment 1 (i.e., $T_{2i+1} = 0$ and $T_{2i+2} = 1$), then we can calculate the potential imbalance measure, $M_i^{(2)}$, between the updated treatment groups with $2i + 2$ units.
(6) Assign the $(2i + 1)$-th and $(2i + 2)$-th units to treatment groups according to the following probabilities:
$$P(T_{2i+1} = 1 \mid x_{2i}, \ldots, x_1, T_{2i}, \ldots, T_1) = \begin{cases} \rho & \text{if } M_i^{(1)} < M_i^{(2)}, \\ 1 - \rho & \text{if } M_i^{(1)} > M_i^{(2)}, \\ 0.5 & \text{if } M_i^{(1)} = M_i^{(2)}, \end{cases}$$
where $0.5 < \rho < 1$, and assign $T_{2i+2} = 1 - T_{2i+1}$ to maintain equal proportions.
(7) Repeat Steps (5) and (6) until all units are assigned.
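The steps above can be sketched as follows. Two things are assumptions rather than part of the slides: the imbalance measure is taken to be the quadratic form $d^t \mathrm{Cov}(X)^{-1} d$ in the running difference $d = \sum_i (2T_i - 1) x_i$ (proportional to the Mahalanobis distance when the groups have equal size), and in Step (4) the first pair is split by a fair coin. The function name `psr` is hypothetical.

```python
import numpy as np

def psr(X, rho, rng):
    """Pairwise sequential randomization for an even number of units."""
    n, k = X.shape
    if n % 2:
        raise ValueError("n must be even")
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    order = rng.permutation(n)                 # Step (3): random sequence
    T = np.empty(n, dtype=int)
    d = np.zeros(k)                            # running sum of (2T_i - 1) x_i
    for i in range(0, n, 2):
        a, b = order[i], order[i + 1]
        if i == 0:
            prob = 0.5                         # Step (4): first pair, fair coin (assumed)
        else:
            d1 = d + X[a] - X[b]               # Step (5a): a -> treatment 1, b -> treatment 2
            d2 = d - X[a] + X[b]               # Step (5b): a -> treatment 2, b -> treatment 1
            m1 = d1 @ cov_inv @ d1
            m2 = d2 @ cov_inv @ d2
            prob = rho if m1 < m2 else (1 - rho if m1 > m2 else 0.5)
        T[a] = int(rng.random() < prob)        # Step (6): biased coin
        T[b] = 1 - T[a]
        d += (2 * T[a] - 1) * X[a] + (2 * T[b] - 1) * X[b]
    return T

# Illustrative use with assumed data and rho.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
T = psr(X, rho=0.85, rng=rng)
```

Because units are assigned in pairs with opposite treatments, the two groups always have exactly equal size.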
Atkinson's $D_A$-Biased Coin Design
Suppose $n$ units have been assigned to treatment groups. $D_A$-BCD assigns the $(n + 1)$-th unit to treatment 1 with probability
$$P(T_{n+1} = 1 \mid x_{n+1}, \ldots, x_1, T_n, \ldots, T_1) = \frac{[1 - (1; x_{n+1}^t)(F_n^t F_n)^{-1} b_n]^2}{[1 - (1; x_{n+1}^t)(F_n^t F_n)^{-1} b_n]^2 + [1 + (1; x_{n+1}^t)(F_n^t F_n)^{-1} b_n]^2},$$
where $F_n = [1_n; X]$ and $b_n^t = (2T - 1_n)^t F_n$.
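A minimal sketch of this allocation rule is given below. The function names `da_bcd_prob` and `da_bcd_assign` are hypothetical, and the start-up rule (complete randomization for the first few units, until $F_n^t F_n$ is invertible) is an assumption that the slide does not specify.

```python
import numpy as np

def da_bcd_prob(X_prev, T_prev, x_next):
    """P(T_{n+1} = 1 | history) from the D_A-BCD formula."""
    n = len(T_prev)
    F = np.column_stack([np.ones(n), X_prev])          # F_n = [1_n; X]
    b = F.T @ (2 * T_prev - 1)                          # b_n = F_n^t (2T - 1_n)
    u = np.concatenate([[1.0], x_next])                 # (1; x_{n+1}^t)
    s = u @ np.linalg.solve(F.T @ F, b)
    num = (1 - s) ** 2
    return num / (num + (1 + s) ** 2)

def da_bcd_assign(X, rng, n_init=None):
    """Sequentially assign all units; the first n_init use complete randomization."""
    n, k = X.shape
    n_init = k + 1 if n_init is None else n_init        # assumed start-up rule
    T = np.empty(n, dtype=int)
    T[:n_init] = rng.integers(0, 2, size=n_init)
    for i in range(n_init, n):
        T[i] = int(rng.random() < da_bcd_prob(X[:i], T[:i], X[i]))
    return T

# Illustrative use with assumed data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
T = da_bcd_assign(X, rng)
```

When the history is perfectly balanced, $b_n = 0$ and the probability is 0.5; the further the history leans toward treatment 1 in the direction of the new unit's covariates, the smaller the probability becomes.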
Complete Randomization: $\xi_{CR} \sim N(0, \Sigma)$
Rerandomization: $\xi_{RR} \sim \Sigma^{1/2} D \mid D^t D < a$
Pairwise Sequential Randomization: $\xi_{PSR} = O_p(1/\sqrt{n})$
Atkinson's $D_A$-Biased Coin Design: $\xi_{D_A\text{-BCD}} \sim N(0, \frac{1}{5} \Sigma)$
where $\Sigma = \mathrm{diag}(\mathrm{Var}(X_1), \ldots, \mathrm{Var}(X_{p+q}))$, $D \sim N(0, I_{p+q})$ and $I_{p+q}$ is the $(p + q)$-dimensional identity matrix.
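Testing the Treatment Effect under Atkinson's $D_A$-Biased Coin Design
Theorem 4.1 Under $D_A$-BCD, we have
1. Under $H_0: \mu_1 - \mu_2 = 0$,
$$S \xrightarrow{d} N\left(0, \frac{\sigma_\epsilon^2 + \frac{1}{5} \sum_{j=1}^{q} \beta_{p+j}^2 \mathrm{Var}(X_{p+j})}{\sigma_\epsilon^2 + \sum_{j=1}^{q} \beta_{p+j}^2 \mathrm{Var}(X_{p+j})}\right).$$
2. Under $H_1: \mu_1 - \mu_2 \neq 0$, where $\mu_1 - \mu_2 = \delta / \sqrt{n}$ for a fixed $\delta \neq 0$,
$$S \xrightarrow{d} N\left(\frac{1}{2} \lambda_2 \delta, \frac{\sigma_\epsilon^2 + \frac{1}{5} \sum_{j=1}^{q} \beta_{p+j}^2 \mathrm{Var}(X_{p+j})}{\sigma_\epsilon^2 + \sum_{j=1}^{q} \beta_{p+j}^2 \mathrm{Var}(X_{p+j})}\right).$$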
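Testing the Treatment Effect under Pairwise Sequential Randomization
Theorem 4.2 Under PSR, we have
1. Under $H_0: \mu_1 - \mu_2 = 0$,
$$S \xrightarrow{d} N\left(0, \frac{\sigma_\epsilon^2}{\sigma_\epsilon^2 + \sum_{j=1}^{q} \beta_{p+j}^2 \mathrm{Var}(X_{p+j})}\right).$$
2. Under $H_1: \mu_1 - \mu_2 \neq 0$, where $\mu_1 - \mu_2 = \delta / \sqrt{n}$ for a fixed $\delta \neq 0$,
$$S \xrightarrow{d} N\left(\frac{1}{2} \lambda_2 \delta, \frac{\sigma_\epsilon^2}{\sigma_\epsilon^2 + \sum_{j=1}^{q} \beta_{p+j}^2 \mathrm{Var}(X_{p+j})}\right).$$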
- The variance from the covariates is completely eliminated from the numerator of the asymptotic variance of $S$, resulting in a distribution more concentrated around 0 than the standard normal distribution.
- This can be considered an extension of the results in Ma et al. (2015), which studied conservative tests for covariate-adaptive designs balancing discrete covariates.
Correction for Conservativeness
To correct the conservativeness, we need to obtain the correct asymptotic critical values for valid tests.
- Based on the asymptotic distribution of $S$ in Theorem 3.2: requires estimating the unknown parameters.
- Or use the bootstrap method to do the correction: computationally intensive.
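When the limiting null distribution of $S$ is normal with variance below 1, as in Theorems 4.1 and 4.2, both the actual type I error of the traditional test and a corrected critical value follow from that variance. The sketch below plugs in known parameter values for illustration; in practice they would have to be estimated. The function names and the `shrink` argument (1 for CR, 1/5 for $D_A$-BCD, 0 for PSR) are hypothetical conveniences.

```python
from statistics import NormalDist

def limiting_variance(sigma2_eps, beta_ex, var_ex, shrink):
    """Null variance of S: (sigma_eps^2 + shrink*c) / (sigma_eps^2 + c),
    where c = sum_j beta_{p+j}^2 Var(X_{p+j})."""
    c = sum(b * b * v for b, v in zip(beta_ex, var_ex))
    return (sigma2_eps + shrink * c) / (sigma2_eps + c)

def actual_type1_error(variance, alpha=0.05):
    """Rejection probability of |S| > z_{1-alpha/2} when S ~ N(0, variance)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / variance ** 0.5))

def corrected_critical_value(variance, alpha=0.05):
    """Critical value giving level alpha when S ~ N(0, variance)."""
    return NormalDist().inv_cdf(1 - alpha / 2) * variance ** 0.5

# Assumed example: sigma_eps^2 = 4 and two omitted covariates with beta = 1, Var = 1.
v_psr = limiting_variance(4.0, [1.0, 1.0], [1.0, 1.0], shrink=0.0)   # PSR
err_psr = actual_type1_error(v_psr)
crit_psr = corrected_critical_value(v_psr)
```

The rerandomization limit is not normal, so this shortcut does not apply there; its critical values would have to come from the distribution in Theorem 3.2 or from the bootstrap.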
Table 1: Comparison of different covariate-adjusted randomization procedures in terms of covariate balance, conservativeness of traditional tests, and power of corrected tests.
5 Numerical Studies
Verification of Theoretical Results
Underlying model:
$$Y_i = \mu_1 T_i + \mu_2 (1 - T_i) + \sum_{j=1}^{4} \beta_j x_{i,j} + \epsilon_i,$$
where $\mu_1 = \mu_2 = 0$ and $\beta_j = 1$ for $j = 1, \ldots, 4$. The covariates $x_{i,j} \sim N(0, 1)$ for $j = 1, \ldots, 4$ are independent of each other. The random error $\epsilon_i \sim N(0, 2^2)$ is independent of all $x_{i,j}$.
Working model:
$$E[Y_i] = \mu_1 T_i + \mu_2 (1 - T_i) + \beta_1 x_{i,1} + \beta_2 x_{i,2}.$$
Verification of Theoretical Results
[Figure: four panels (CR, Rerandomization, Atkinson, PSR); x-axis: t; y-axis: pdf; curves: Simulated, Theoretical, N(0,1).]
Figure 1: Comparison of theoretical distributions and simulated distributions of $S$. In each panel, the red solid curve represents the simulated distribution, the blue dashed curve represents the theoretical distribution, and the gray bold curve is the standard normal density.
Conservative Hypothesis Testing for Treatment Effect
Underlying model:
$$Y_i = \mu_1 T_i + \mu_2 (1 - T_i) + \sum_{j=1}^{6} \beta_j x_{i,j} + \epsilon_i, \qquad (2)$$
where $\beta_j = 1$ for $j = 1, \ldots, 6$. The covariates $x_{i,j} \sim N(0, 1)$ are independent of each other. The random error $\epsilon_i \sim N(0, 2^2)$ is independent of all $x_{i,j}$.
Working models:
W1: $E[Y_i] = \mu_1 T_i + \mu_2 (1 - T_i)$.
W2: $E[Y_i] = \mu_1 T_i + \mu_2 (1 - T_i) + \sum_{j=1}^{2} \beta_j x_{i,j}$.
W3: $E[Y_i] = \mu_1 T_i + \mu_2 (1 - T_i) + \sum_{j=3}^{6} \beta_j x_{i,j}$.
W4: $E[Y_i] = \mu_1 T_i + \mu_2 (1 - T_i) + \sum_{j=1}^{6} \beta_j x_{i,j}$.
Conservative Hypothesis Testing for Treatment Effect: Type I error
[Table: rows CR, RR, $D_A$-BCD, PSR; columns W1, W2, W3, W4; entries not preserved in the transcription.]
Table 2: Type I error of traditional tests for treatment effect using different working models and different randomization procedures.
Corrected Hypothesis Testing for Treatment Effect: Type I error
[Table: rows CR, RR, $D_A$-BCD, PSR; columns W1, W2, W3, W4; entries not preserved in the transcription.]
Table 3: Type I error of hypothesis testing for treatment effect using critical values from the estimated asymptotic distribution, under different working models and different randomization procedures.
Corrected Hypothesis Testing for Treatment Effect: Power
[Figure: four panels (CR, Rerandomization, Atkinson, PSR); y-axis: Power; curves: W1, W2, W3, W4.]
Figure 2: Power against $\mu_1 - \mu_2$ using critical values and p-values from the estimated asymptotic distribution. Sample size $n = 500$. Note that we plot the power of W4 under CR in bold gray curves in all the panels for a better comparison among different randomizations.
6 Conclusion
- Derive inference properties under general covariate-adjusted randomization.
- Explicitly unveil the relationship between covariate-adjusted randomization and inference properties.
- Apply the general theory to several important randomization methods.
- Propose a correction approach to attain valid and powerful tests.
Thank you!
Linear models and their mathematical foundations: Simple linear regression
Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationEXAM. Exam #1. Math 3342 Summer II, July 21, 2000 ANSWERS
EXAM Exam # Math 3342 Summer II, 2 July 2, 2 ANSWERS i pts. Problem. Consider the following data: 7, 8, 9, 2,, 7, 2, 3. Find the first quartile, the median, and the third quartile. Make a box and whisker
More informationRANDOMIZATIONN METHODS THAT
RANDOMIZATIONN METHODS THAT DEPEND ON THE COVARIATES WORK BY ALESSANDRO BALDI ANTOGNINI MAROUSSA ZAGORAIOU ALESSANDRA G GIOVAGNOLI (*) 1 DEPARTMENT OF STATISTICAL SCIENCES UNIVERSITY OF BOLOGNA, ITALY
More informationStatistical Inference
Statistical Inference Classical and Bayesian Methods Revision Class for Midterm Exam AMS-UCSC Th Feb 9, 2012 Winter 2012. Session 1 (Revision Class) AMS-132/206 Th Feb 9, 2012 1 / 23 Topics Topics We will
More informationPh.D. Qualifying Exam Friday Saturday, January 6 7, 2017
Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a
More informationCross-Validation with Confidence
Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really
More informationSTAT 461/561- Assignments, Year 2015
STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and
More informationCross-Validation with Confidence
Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth
More informationQuasi-likelihood Scan Statistics for Detection of
for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for
More informationStat 710: Mathematical Statistics Lecture 31
Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:
More informationLecture 3. Inference about multivariate normal distribution
Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates
More informationMaster s Written Examination - Solution
Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2
More informationPerformance Evaluation and Comparison
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Cross Validation and Resampling 3 Interval Estimation
More informationLearning Objectives for Stat 225
Learning Objectives for Stat 225 08/20/12 Introduction to Probability: Get some general ideas about probability, and learn how to use sample space to compute the probability of a specific event. Set Theory:
More informationLawrence D. Brown* and Daniel McCarthy*
Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationNotes on the Multivariate Normal and Related Topics
Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions
More informationTest Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics
Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests
More informationIEOR E4703: Monte-Carlo Simulation
IEOR E4703: Monte-Carlo Simulation Output Analysis for Monte-Carlo Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Output Analysis
More informationMISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30
MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)
More informationApplied Statistics Preliminary Examination Theory of Linear Models August 2017
Applied Statistics Preliminary Examination Theory of Linear Models August 2017 Instructions: Do all 3 Problems. Neither calculators nor electronic devices of any kind are allowed. Show all your work, clearly
More informationInference After Variable Selection
Department of Mathematics, SIU Carbondale Inference After Variable Selection Lasanthi Pelawa Watagoda lasanthi@siu.edu June 12, 2017 Outline 1 Introduction 2 Inference For Ridge and Lasso 3 Variable Selection
More informationCorner. Corners are the intersections of two edges of sufficiently different orientations.
2D Image Features Two dimensional image features are interesting local structures. They include junctions of different types like Y, T, X, and L. Much of the work on 2D features focuses on junction L,
More informationLet us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided
Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or
More informationLecture 3 September 1
STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have
More informationCentral Limit Theorem ( 5.3)
Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately
More informationThe Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility
The Slow Convergence of OLS Estimators of α, β and Portfolio Weights under Long Memory Stochastic Volatility New York University Stern School of Business June 21, 2018 Introduction Bivariate long memory
More informationAn Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data
An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)
More informationFractional Imputation in Survey Sampling: A Comparative Review
Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical
More informationEC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)
1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For
More informationReview of Econometrics
Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,
More informationApplied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013
Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis
More informationStatement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.
MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss
More informationF & B Approaches to a simple model
A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 215 http://www.astro.cornell.edu/~cordes/a6523 Lecture 11 Applications: Model comparison Challenges in large-scale surveys
More information401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis
More informationOn testing the equality of mean vectors in high dimension
ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 17, Number 1, June 2013 Available online at www.math.ut.ee/acta/ On testing the equality of mean vectors in high dimension Muni S.
More informationIf we want to analyze experimental or simulated data we might encounter the following tasks:
Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction
More informationAnswer Key for STAT 200B HW No. 8
Answer Key for STAT 200B HW No. 8 May 8, 2007 Problem 3.42 p. 708 The values of Ȳ for x 00, 0, 20, 30 are 5/40, 0, 20/50, and, respectively. From Corollary 3.5 it follows that MLE exists i G is identiable
More informationStatistical Inference
Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park
More informationBIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation
BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)
More informationAdvanced Statistics II: Non Parametric Tests
Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney
More informationStatistics. Statistics
The main aims of statistics 1 1 Choosing a model 2 Estimating its parameter(s) 1 point estimates 2 interval estimates 3 Testing hypotheses Distributions used in statistics: χ 2 n-distribution 2 Let X 1,
More informationLinear Models and Estimation by Least Squares
Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:
More informationMultivariate Regression
Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the
More informationProblem 1 (20) Log-normal. f(x) Cauchy
ORF 245. Rigollet Date: 11/21/2008 Problem 1 (20) f(x) f(x) 0.0 0.1 0.2 0.3 0.4 0.0 0.2 0.4 0.6 0.8 4 2 0 2 4 Normal (with mean -1) 4 2 0 2 4 Negative-exponential x x f(x) f(x) 0.0 0.1 0.2 0.3 0.4 0.5
More informationStat 579: Generalized Linear Models and Extensions
Stat 579: Generalized Linear Models and Extensions Mixed models Yan Lu March, 2018, week 8 1 / 32 Restricted Maximum Likelihood (REML) REML: uses a likelihood function calculated from the transformed set
More information2014/2015 Smester II ST5224 Final Exam Solution
014/015 Smester II ST54 Final Exam Solution 1 Suppose that (X 1,, X n ) is a random sample from a distribution with probability density function f(x; θ) = e (x θ) I [θ, ) (x) (i) Show that the family of
More informationLikelihood-based inference with missing data under missing-at-random
Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationEconometrics I. Ricardo Mora
Econometrics I Department of Economics Universidad Carlos III de Madrid Master in Industrial Economics and Markets Outline Motivation 1 Motivation 2 3 4 Motivation The Analogy Principle The () is a framework
More informationSummary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016
8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying
More informationLecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf
Lecture 13: 2011 Bootstrap ) R n x n, θ P)) = τ n ˆθn θ P) Example: ˆθn = X n, τ n = n, θ = EX = µ P) ˆθ = min X n, τ n = n, θ P) = sup{x : F x) 0} ) Define: J n P), the distribution of τ n ˆθ n θ P) under
More informationDivide-and-combine Strategies in Statistical Modeling for Massive Data
Divide-and-combine Strategies in Statistical Modeling for Massive Data Liqun Yu Washington University in St. Louis March 30, 2017 Liqun Yu (WUSTL) D&C Statistical Modeling for Massive Data March 30, 2017
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationFinal Exam. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given.
1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given. (a) If X and Y are independent, Corr(X, Y ) = 0. (b) (c) (d) (e) A consistent estimator must be asymptotically
SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota
Submitted to the Annals of Statistics arxiv: math.pr/0000000 SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION By Wei Liu and Yuhong Yang University of Minnesota In
Some General Types of Tests
Some General Types of Tests We may not be able to find a UMP or UMPU test in a given situation. In that case, we may use test of some general class of tests that often have good asymptotic properties.
Simple Linear Regression
Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.
Economics 583: Econometric Theory I A Primer on Asymptotics: Hypothesis Testing
Economics 583: Econometric Theory I A Primer on Asymptotics: Hypothesis Testing Eric Zivot October 12, 2011 Hypothesis Testing 1. Specify hypothesis to be tested $H_0$: null hypothesis versus $H_1$: alternative
Fixed Effects Models for Panel Data. December 1, 2014
Fixed Effects Models for Panel Data December 1, 2014 Notation Use the same setup as before, with the linear model $Y_{it} = X_{it}\beta + c_i + \epsilon_{it}$ (1) where $X_{it}$ is a $1 \times (K + 1)$ vector of independent variables.
STT 843 Key to Homework 1 Spring 2018
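STT 843 Key to Homework 1 Spring 2018 Due date: Feb 4, 2018 4.2 (a) Because $\sigma_{11} = 2$, $\sigma_{22} = 1$ and $\rho_{12} = 0.5$, we have $\sigma_{12} = \rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} = \sqrt{2}/2$. Then, the mean and covariance of the bivariate normal are $\mu = (0, 2)^\top$ and $\Sigma$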
Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA
Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (1979, p. ) reprint data from Frets (1921) giving the length and breadth (in
Ch 2: Simple Linear Regression
Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is $y = \beta_0 + \beta_1 x + \epsilon$, where we assume that the error $\epsilon$ is independent random component
Comprehensive Examination Quantitative Methods Spring, 2018
Comprehensive Examination Quantitative Methods Spring, 2018 Instruction: This exam consists of three parts. You are required to answer all the questions in all the parts. 1 Grading policy: 1. Each part
Covariance and Correlation
Covariance and Correlation ST 370 The probability distribution of a random variable gives complete information about its behavior, but its mean and variance are useful summaries. Similarly, the joint probability
Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao
Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley
Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model
Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.
Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing
Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Setting Rubin's causal
Expectation propagation for symbol detection in large-scale MIMO communications
Expectation propagation for symbol detection in large-scale MIMO communications Pablo M. Olmos olmos@tsc.uc3m.es Joint work with Javier Céspedes (UC3M) Matilde Sánchez-Fernández (UC3M) and Fernando Pérez-Cruz
Extended Bayesian Information Criteria for Model Selection with Large Model Spaces
Extended Bayesian Information Criteria for Model Selection with Large Model Spaces Jiahua Chen, University of British Columbia Zehua Chen, National University of Singapore (Biometrika, 2008) 1 / 18 Variable
Recall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n
Chapter 9 Hypothesis Testing 9.1 Wald, Rao, and Likelihood Ratio Tests Suppose we wish to test $H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$. The likelihood-based results of Chapter 8 give rise to several possible
Statistics 135: Fall 2004 Final Exam
Name: SID#: Statistics 135: Fall 2004 Final Exam There are 10 problems and the number of points for each is shown in parentheses. There is a normal table at the end. Show your work. 1. The designer of
STAT 4385 Topic 01: Introduction & Review
STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics
Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data.
Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January
Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)
Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation
1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as
ST 51, Summer, Dr. Jason A. Osborne Homework assignment # - Solutions 1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available
Math 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University
Math 5305 Notes Diagnostics and Remedial Measures Jesse Crawford Department of Mathematics Tarleton State University (Tarleton State University) Diagnostics and Remedial Measures 1 / 44 Model Assumptions
Hypothesis Testing For Multilayer Network Data
Hypothesis Testing For Multilayer Network Data Jun Li Dept of Mathematics and Statistics, Boston University Joint work with Eric Kolaczyk Outline Background and Motivation Geometric structure of multilayer
Propensity Score Methods for Causal Inference
John Pura BIOS790 October 2, 2015 Causal inference Philosophical problem, statistical solution Important in various disciplines (e.g. Koch's postulates, Bradford Hill criteria, Granger causality) Good
TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study
TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,
Nonparametric Location Tests: k-sample
Nonparametric Location Tests: k-sample Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota)
Robustness to Parametric Assumptions in Missing Data Models
Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice
MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari
MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind
Asymptotic Statistics-VI. Changliang Zou
Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given $\alpha \in (0, 1)$, there is a well-defined $d = d_{\alpha,n}$ such that, for any continuous
Questions and Answers on Unit Roots, Cointegration, VARs and VECMs
Questions and Answers on Unit Roots, Cointegration, VARs and VECMs L. Magee Winter, 2012 1. Let $\epsilon_t$, $t = 1, \ldots, T$ be a series of independent draws from a N[0,1] distribution. Let $w_t$, $t = 1, \ldots, T$, be
Topic 12 Overview of Estimation
Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the
Lecture 11 Weak IV. Econ 715
Lecture 11 Weak IV Instrument exogeneity and instrument relevance are two crucial requirements in empirical analysis using GMM. It now appears that in many applications of GMM and IV regressions, instruments
Statistics and Probability Letters. Using randomization tests to preserve type I error with response adaptive and covariate adaptive randomization
Statistics and Probability Letters. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/stapro. Using randomization tests to preserve
BIOS 312: Precision of Statistical Inference
and Power/Sample Size and Standard Errors BIOS 312: of Statistical Inference Chris Slaughter Department of Biostatistics, Vanderbilt University School of Medicine January 3, 2013 Outline Overview and Power/Sample
1 Hypothesis Testing and Model Selection
A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection
Implementing Response-Adaptive Randomization in Multi-Armed Survival Trials
Implementing Response-Adaptive Randomization in Multi-Armed Survival Trials BASS Conference 2009 Alex Sverdlov, Bristol-Myers Squibb A.Sverdlov (B-MS) Response-Adaptive Randomization 1 / 35 Joint work
Master's Written Examination
Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth
Association studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
Non-parametric Inference and Resampling
Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing
Graduate Econometrics I: Maximum Likelihood II
Graduate Econometrics I: Maximum Likelihood II Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood
Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis
Introduction to Time Series Analysis 1 Contents: I. Basics of Time Series Analysis... 4 I.1 Stationarity... 5 I.2 Autocorrelation Function... 9 I.3 Partial Autocorrelation Function (PACF)... 14 I.4 Transformation