A Distributional Approach Using Propensity Scores

Size: px
Start display at page:

Download "A Distributional Approach Using Propensity Scores"

Transcription

1 A Distributioal Approach Usig Propesity Scores Zhiqiag Ta Departmet of Biostatistics Johs Hopkis School of Public Health zta Jue 20, 2005

2 Outlie Itroductio Couterfactual framework Illustratio Applicatio No-cofoudig case Kow propesity score Parametric propesity score Cofoudig case

3 Itroductio Right heart catheterizatio (RHC) is performed daily i hospitals sice 1970s. The beefit of RHC had NOT bee demostrated i a successful radomized cliical trial. Coors et al. s (1996) observatioal study raised the cocer that RHC might ot beefit critically ill patiets ad might i fact cause harm. Data were collected o 5735 critically ill patiets admitted to the ICUs of five medical ceters: Treatmet: No-RHC or RHC Outcome: 30-day survival Covariates: 75 covariates HOW to evaluate the effect of RHC o survival?

4 Couterfactual framework X: covariates measured T : treatmet variable takig value 0 or 1 if a patiet actually receives No-RHC or RHC (Y 0, Y 1 ): potetial outcome that would be observed if a patiet received No-RHC or RHC Y = (1 T ) Y 0 + T Y 1 : observed outcome We are iterested i average causal effect E( Y 1 Y 0 ) = E(Y 1 ) E(Y 0 ) or P ({Y 1 }) versus P ({Y 0 }) Assigmet mechaism No-cofoudig: Cofoudig: T (Y 0, Y 1 ) X T (Y 0, Y 1 ) X Propesity score: π(x) = P (T = 1 X)

5 Thirty day survival curves RHC, Raw No RHC, Raw Day Proportio of Survivig Raw histogram of aps RHC No RHC Raw histogram of meabp Raw histogram of pafi

6 Illustratio RHC= 1 RHC= 0 BP= 1 (52, 28) 80 (11, 9) 20 BP= 0 (30, 10) 40 (37, 23) 60 82, , Patiets get RHC at radom P ( survival RHC = 1 ) 82/120 = 68.3% P ( survival RHC = 0 ) 48/80 = 60.0% Patiets get RHC at radom give blood pressure Weight each patiet such that 80w 1 (1) = 1/2, 40w 1 (0) = 1/2, 20w 0 (1) = 1/2, 60w 0 (0) = 1/2. Compare the weighted probabilities 52w 1 (1) + 30w 1 (0) = 70.0%, 11w 0 (1) + 37w 0 (0) = 58.3%.

7 WHAT IF patiets are NOT equally likely to get RHC at each level of blood pressure? Previous estimates: P ( obs survival BP =, RHC = 1 ) = 70.0%, P ( obs survival BP =, RHC = 0 ) = 58.3%. Weight each patiet such that λ 1i w 1 (1) = , i=81 λ 0i w 0 (1) = , 41 where Λ 1 λ 1i, λ 0i Λ (Λ = 1.5). λ 1i w 1 (0) = 1 2, λ 0i w 0 (0) = 1 2, Boud the weighted probabilities 120 λ 1i w 1 (X i ) Y 1i, subject to the foregoig costraits. λ 0i w 0 (X i ) Y 0i, P (!obs survival BP =, RHC = 1 ) 72.2%, P (!obs survival BP =, RHC = 0 ) 55.0%.

8 Thirty day survival curves RHC, Raw No RHC, Raw Day Proportio of Survivig Raw histogram of aps RHC No RHC Raw histogram of meabp Raw histogram of pafi

9 Thirty day survival curves RHC, Raw No RHC, Raw RHC, Weighted No RHC, Weighted Day Proportio of Survivig Raw histogram of aps RHC No RHC Raw histogram of meabp Raw histogram of pafi Weighted histogram of aps RHC No RHC Weighted histogram of meabp Weighted histogram of pafi

10 Thirty day survival curves RHC, Observed RHC, Couterfactual No RHC, Couterfactual No RHC, Observed Day Proportio of Survivig Weighted histogram of aps No RHC, Couterfactual No RHC, Observed Weighted histogram of meabp Weighted histogram of pafi Weighted histogram of aps RHC, Observed RHC, Couterfactual Weighted histogram of meabp Weighted histogram of pafi

11 No-cofoudig case Data: (X i, Y T i, T i ), i = 1, 2,..., Likelihood: = L 1 L 2 [ ] (1 π(x i )) 1 T i π(x i ) T i [ ] G 0 ({X i, Y 0i }) 1 T i G 1 ({X i, Y 1i }) T i where G 0 is the joit distributio of (X, Y 0 ) ad G 1 is the joit distributio of (X, Y 1 ). G 0 ad G 1 iduce the same margial distributios o the covariate space X. Equivaletly, h(x) dg 0 (x, y 0 ) = h(x) dg 1 (x, y 1 ) for each bouded fuctio h o X. Take fiitely may costraits ad fid MLE (Ĝ 0, Ĝ 1 ): ˆµ 1 = y 1 dĝ 1 (x, y 1 ), ˆµ 0 = y 0 dĝ 0 (x, y 0 ).

12 Kow propesity score [Model S0: kow π ] Maximize the likelihood subject to the costraits π (x) dg 0 = π (x)dg 1, h j (x) dg 0 = h j (x)dg 1, j = 1,..., m. Let h = (π, 1 π, h 1,..., h m). Maximize 1 1 log(λ h (X i )) + 1 The Ĝ 1 {(X i, Y 1i )} = Ĝ 0 {(X i, Y 0i )} = i= 1 +1 log(1 λ h (X i )). 1 λ h (X i ), i = 1,..., 1, 1 1 λ h (X i ), i = 1 + 1,...,. First-order approximatio: µ 1 = 1 µ 0 = 1 Y 1i T i π (X i ) β 1 [ 1 Y 0i (1 T i ) 1 π (X i ) β 0 h (X i ) ( Ti )] 1 π (X i ) π (X i ) 1, [ 1 h (X i ) ( 1 Ti )] π (X i ) 1 π (X i ) 1, where β 1 = B 1 C 1 ad β 0 = B 1 C 0.

13 The method of cotrol variates: 1 Y 1i T i π (X i ) b 1 [ 1 i=0 h (X i ) 1 π (X i ) ( Ti π (X i ) 1 )]. The optimal choice of b 1 is β 1 = B 1 C 1. A more geeral class of estimators: 1 Y 1i T i π (X i ) 1 ( Ti ) φ 1 (X i ) π (X i ) 1. The optimal choice of φ 1 (x) is E(Y 1 X = x). achieves semiparametric efficiecy uder S0. Choose h such that E(Y 1 X = x) is cotaied the liear spa of E(Y 0 X = x) is cotaied the liear spa of h (x) 1 π (x), h (x) π (x). Outcome regressio [Model R] E(Y 1 X) = Ψ ( α 1 g 1(X) ), E(Y 0 X) = Ψ ( α 0 g 0(X) ). Choose h = ( π, 1 π, π Ψ(ˆα 0 g 0), (1 π )Ψ(ˆα 1 g 1) ).

14 Parametric propesity score [Model S: π( ; γ)] Maximize the likelihood subject to the costraits ˆπ(x) dg 1 = ˆπ(x)dG 0, ĥ j (x) dg 1 = ĥ j (x)dg 0, j = 1,..., m. Let ĥ = (ˆπ, 1 ˆπ, ĥ 1,..., ĥ m ). Maximize 1 1 log(λ ĥ(x i )) + 1 i= log(1 λ ĥ(x i )). The Ĝ 1 {(X i, Y 1i )} = λ ĥ(x i ), i = 1,..., 1, 1 Ĝ 0 {(X i, Y 0i )} = 1 λ ĥ(x i ), i = 1 + 1,...,. First-order approximatio: µ 1 = 1 µ 0 = 1 Y 1i T i ˆπ(X i ) β 1 [ 1 Y 0i (1 T i ) 1 ˆπ(X i ) β 0 [ 1 ĥ(x i ) ( Ti )] 1 ˆπ(X i ) ˆπ(X i ) 1, ĥ(x i ) ( 1 Ti )] ˆπ(X i ) 1 ˆπ(X i ) 1, where β 1 = B 1 C 1 ad β 0 = B 1 C 0.

15 Our strategy is To build ad check propesity score models to esure cosistecy To use outcome regressio models for variace ad bias reductio Propesity score models ca be checked with the followig idea: Pick up a collectio of test fuctios ĥ j s o X, for example, (ˆπ, 1 ˆπ, ˆπX, (1 ˆπ)X). Compute the sample average ( T Ẽ[ĥj (X) ˆπ(X) 1 T )] 1 ˆπ(X) i.e. average differece i ĥ j (X) betwee the treated ad cotrol after propesity score weightig. If model S is correct, the the sample averages relative to stadard errors, or z-ratios, should be statistically osigificat from zero. Examiatio of z-ratios agaist the stadard ormal ca reveal possible misspecificatio of model S.

16 z ratio z ratio Model Model

17 Cofoudig case Data: (X i, Y T i, T i ), i = 1, 2,..., Likelihood: L 1 L 2 [ ] = (1 π(x i )) 1 T i π(x i ) T i [ ] H 0 ({X i, Y 0i }) 1 T i H 1 ({X i, Y 1i }) T i where H 0 is the distributio P ({Y 0 } T = 0, X)P ({X}) ad H 1 is the distributio P ({Y 1 } T = 1, X)P ({X}). H 0 ad H 1 iduce the same margial distributios o the covariate space X. Equivaletly, h(x) dh 0 (x, y 0 ) = h(x) dh 1 (x, y 1 ) for each bouded fuctio h o X. Covergece of previous estimates: (Ĝ 0, Ĝ 1 ) (H 0, H 1 ) ˆµ 1, µ 1 E[E(Y 1 T = 1, X)] ˆµ 0, µ 0 E[E(Y 0 T = 0, X)]

18 Umeasured cofoudig: gaps betwee P ({Y 0 } T = 0, X) ad P ({Y 0 } T = 1, X) P ({Y 1 } T = 0, X) ad P ({Y 1 } T = 1, X) i.e. systematic differeces betwee the treated ad utreated eve if they received the same treatmet. Defie the Rado-Nikodym derivatives: λ 0 (Y 0 ; X) = P (dy 0 T = 1, X) P (dy 0 T = 0, X), λ 1 (Y 1 ; X) = P (dy 1 T = 0, X) P (dy 1 T = 1, X). The case λ 0 = λ 1 1 correspods to o cofoudig, while deviatios of λ 0 ad λ 1 from 1 idicate umeasured cofoudig. By Bayes rule, λ 0 ad λ 1 ca be see as odds ratios: λ 0 (Y 0 ; X) = 1 π(x) P (T = 1 Y 0, X) π(x) P (T = 0 Y 0, X), λ 1 (Y 1 ; X) = π(x) P (T = 0 Y 1, X) 1 π(x) P (T = 1 Y 1, X). A sesitivity aalysis model: Λ 1 λ 0 (Y 0 ; X), λ 1 (Y 1 ; X) Λ, where Λ 1 idicates the degree of departure from o cofoudig.

19 Let ĥ c = (ˆπ, 1 ˆπ, ĥ 1,..., ĥ m c). For a value of Λ, fid bouds for y t λ t dh t by liear programmig: mi or max y t λ t dĝ t subject to λ t dĝ t = 1, ˆπ(x)λ t dĝ t = ˆπ(x) dĝ t, ĥ j (x)λ t dĝ t = ĥ j (x) dĝ t, j = 1,..., m c, ad 1 Λ λ t Λ. Ĝ 1 is supported o {(X i, Y 1i )},...,1 ad Ĝ 0 o {(X i, Y 0,i )} i=1 +1,...,. Itegral is fiite sum. The ukows are the values of λ t o observed data: λ 1i = λ 1 (Y 1i ; X i ), i = 1,..., 1, λ 0i = λ 0 (Y 0i ; X i ), i = 1 + 1,...,. Comparisos of the distributios Ĝ 0 [Y 0 T = 0, X][X], Ĝ 1 [Y 1 T = 1, X][X] λ 0 dĝ 0 [Y 0 T = 1, X][X], λ 1 dĝ 1 [Y 1 T = 0, X][X] idicate (i) balace o covariates, (ii) hidde bias, ad (iii) causal effects.

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

Regression with quadratic loss

Regression with quadratic loss Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

1 Models for Matched Pairs

1 Models for Matched Pairs 1 Models for Matched Pairs Matched pairs occur whe we aalyse samples such that for each measuremet i oe of the samples there is a measuremet i the other sample that directly relates to the measuremet i

More information

BIOSTATISTICS. Lecture 5 Interval Estimations for Mean and Proportion. dr. Petr Nazarov

BIOSTATISTICS. Lecture 5 Interval Estimations for Mean and Proportion. dr. Petr Nazarov Microarray Ceter BIOSTATISTICS Lecture 5 Iterval Estimatios for Mea ad Proportio dr. Petr Nazarov 15-03-013 petr.azarov@crp-sate.lu Lecture 5. Iterval estimatio for mea ad proportio OUTLINE Iterval estimatios

More information

Lecture 7: Properties of Random Samples

Lecture 7: Properties of Random Samples Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Sampling Error. Chapter 6 Student Lecture Notes 6-1. Business Statistics: A Decision-Making Approach, 6e. Chapter Goals

Sampling Error. Chapter 6 Student Lecture Notes 6-1. Business Statistics: A Decision-Making Approach, 6e. Chapter Goals Chapter 6 Studet Lecture Notes 6-1 Busiess Statistics: A Decisio-Makig Approach 6 th Editio Chapter 6 Itroductio to Samplig Distributios Chap 6-1 Chapter Goals After completig this chapter, you should

More information

6 Sample Size Calculations

6 Sample Size Calculations 6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig

More information

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula Joural of Multivariate Aalysis 102 (2011) 1315 1319 Cotets lists available at ScieceDirect Joural of Multivariate Aalysis joural homepage: www.elsevier.com/locate/jmva Superefficiet estimatio of the margials

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2. SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

Regression with an Evaporating Logarithmic Trend

Regression with an Evaporating Logarithmic Trend Regressio with a Evaporatig Logarithmic Tred Peter C. B. Phillips Cowles Foudatio, Yale Uiversity, Uiversity of Aucklad & Uiversity of York ad Yixiao Su Departmet of Ecoomics Yale Uiversity October 5,

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Yig Zhag STA6938-Logistic Regressio Model Topic -Simple (Uivariate) Logistic Regressio Model Outlies:. Itroductio. A Example-Does the liear regressio model always work? 3. Maximum Likelihood Curve

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,

More information

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors ECONOMETRIC THEORY MODULE XIII Lecture - 34 Asymptotic Theory ad Stochastic Regressors Dr. Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Asymptotic theory The asymptotic

More information

Statistics Lecture 27. Final review. Administrative Notes. Outline. Experiments. Sampling and Surveys. Administrative Notes

Statistics Lecture 27. Final review. Administrative Notes. Outline. Experiments. Sampling and Surveys. Administrative Notes Admiistrative Notes s - Lecture 7 Fial review Fial Exam is Tuesday, May 0th (3-5pm Covers Chapters -8 ad 0 i textbook Brig ID cards to fial! Allowed: Calculators, double-sided 8.5 x cheat sheet Exam Rooms:

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Sample Size Determination (Two or More Samples)

Sample Size Determination (Two or More Samples) Sample Sie Determiatio (Two or More Samples) STATGRAPHICS Rev. 963 Summary... Data Iput... Aalysis Summary... 5 Power Curve... 5 Calculatios... 6 Summary This procedure determies a suitable sample sie

More information

Linear Regression Models

Linear Regression Models Liear Regressio Models Dr. Joh Mellor-Crummey Departmet of Computer Sciece Rice Uiversity johmc@cs.rice.edu COMP 528 Lecture 9 15 February 2005 Goals for Today Uderstad how to Use scatter diagrams to ispect

More information

High-Dimensional M-Estimation with Missing Outcomes: A Semi-Parametric Framework

High-Dimensional M-Estimation with Missing Outcomes: A Semi-Parametric Framework High-Dimesioal M-Estimatio with Missig Outcomes: A Semi-Parametric Framework (A Overview of the Methods ad the Mai Results) Abhishek Chakrabortty Uiversity of Pesylvaia Harvard Visit. August 20-23, 2018.

More information

Singular Continuous Measures by Michael Pejic 5/14/10

Singular Continuous Measures by Michael Pejic 5/14/10 Sigular Cotiuous Measures by Michael Peic 5/4/0 Prelimiaries Give a set X, a σ-algebra o X is a collectio of subsets of X that cotais X ad ad is closed uder complemetatio ad coutable uios hece, coutable

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Quick Review of Probability

Quick Review of Probability Quick Review of Probability Berli Che Departmet of Computer Sciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Refereces: 1. W. Navidi. Statistics for Egieerig ad Scietists. Chapter 2 & Teachig

More information

Agenda: Recap. Lecture. Chapter 12. Homework. Chapt 12 #1, 2, 3 SAS Problems 3 & 4 by hand. Marquette University MATH 4740/MSCS 5740

Agenda: Recap. Lecture. Chapter 12. Homework. Chapt 12 #1, 2, 3 SAS Problems 3 & 4 by hand. Marquette University MATH 4740/MSCS 5740 Ageda: Recap. Lecture. Chapter Homework. Chapt #,, 3 SAS Problems 3 & 4 by had. Copyright 06 by D.B. Rowe Recap. 6: Statistical Iferece: Procedures for μ -μ 6. Statistical Iferece Cocerig μ -μ Recall yes

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

(6) Fundamental Sampling Distribution and Data Discription

(6) Fundamental Sampling Distribution and Data Discription 34 Stat Lecture Notes (6) Fudametal Samplig Distributio ad Data Discriptio ( Book*: Chapter 8,pg5) Probability& Statistics for Egieers & Scietists By Walpole, Myers, Myers, Ye 8.1 Radom Samplig: Populatio:

More information

Quick Review of Probability

Quick Review of Probability Quick Review of Probability Berli Che Departmet of Computer Sciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Refereces: 1. W. Navidi. Statistics for Egieerig ad Scietists. Chapter & Teachig Material.

More information

Mathematics 170B Selected HW Solutions.

Mathematics 170B Selected HW Solutions. Mathematics 17B Selected HW Solutios. F 4. Suppose X is B(,p). (a)fidthemometgeeratigfuctiom (s)of(x p)/ p(1 p). Write q = 1 p. The MGF of X is (pe s + q), sice X ca be writte as the sum of idepedet Beroulli

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Aalysis ad Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasii/teachig.html Suhasii Subba Rao Review of testig: Example The admistrator of a ursig home wats to do a time ad motio

More information

Binomial Distribution

Binomial Distribution 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 6 7 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Overview Example: coi tossed three times Defiitio Formula Recall that a r.v. is discrete if there are either a fiite umber of possible

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

Section 14. Simple linear regression.

Section 14. Simple linear regression. Sectio 14 Simple liear regressio. Let us look at the cigarette dataset from [1] (available to dowload from joural s website) ad []. The cigarette dataset cotais measuremets of tar, icotie, weight ad carbo

More information

SDS 321: Introduction to Probability and Statistics

SDS 321: Introduction to Probability and Statistics SDS 321: Itroductio to Probability ad Statistics Lecture 23: Cotiuous radom variables- Iequalities, CLT Puramrita Sarkar Departmet of Statistics ad Data Sciece The Uiversity of Texas at Austi www.cs.cmu.edu/

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading Topic 15 - Two Sample Iferece I STAT 511 Professor Bruce Craig Comparig Two Populatios Research ofte ivolves the compariso of two or more samples from differet populatios Graphical summaries provide visual

More information

STAC51: Categorical data Analysis

STAC51: Categorical data Analysis STAC51: Categorical data Aalysis Mahida Samarakoo Jauary 28, 2016 Mahida Samarakoo STAC51: Categorical data Aalysis 1 / 35 Table of cotets Iferece for Proportios 1 Iferece for Proportios Mahida Samarakoo

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Unbiased Estimation. February 7-12, 2008

Unbiased Estimation. February 7-12, 2008 Ubiased Estimatio February 7-2, 2008 We begi with a sample X = (X,..., X ) of radom variables chose accordig to oe of a family of probabilities P θ where θ is elemet from the parameter space Θ. For radom

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key

More information

TMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.

TMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences. Norwegia Uiversity of Sciece ad Techology Departmet of Mathematical Scieces Corrected 3 May ad 4 Jue Solutios TMA445 Statistics Saturday 6 May 9: 3: Problem Sow desity a The probability is.9.5 6x x dx

More information

Homework 5 Solutions

Homework 5 Solutions Homework 5 Solutios p329 # 12 No. To estimate the chace you eed the expected value ad stadard error. To do get the expected value you eed the average of the box ad to get the stadard error you eed the

More information

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D.

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D. ample ie Estimatio i the Proportioal Haards Model for K-sample or Regressio ettigs cott. Emerso, M.D., Ph.D. ample ie Formula for a Normally Distributed tatistic uppose a statistic is kow to be ormally

More information

A goodness-of-fit test based on the empirical characteristic function and a comparison of tests for normality

A goodness-of-fit test based on the empirical characteristic function and a comparison of tests for normality A goodess-of-fit test based o the empirical characteristic fuctio ad a compariso of tests for ormality J. Marti va Zyl Departmet of Mathematical Statistics ad Actuarial Sciece, Uiversity of the Free State,

More information

Lecture 15: Learning Theory: Concentration Inequalities

Lecture 15: Learning Theory: Concentration Inequalities STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that

More information

G. R. Pasha Department of Statistics Bahauddin Zakariya University Multan, Pakistan

G. R. Pasha Department of Statistics Bahauddin Zakariya University Multan, Pakistan Deviatio of the Variaces of Classical Estimators ad Negative Iteger Momet Estimator from Miimum Variace Boud with Referece to Maxwell Distributio G. R. Pasha Departmet of Statistics Bahauddi Zakariya Uiversity

More information

Sample questions. 8. Let X denote a continuous random variable with probability density function f(x) = 4x 3 /15 for

Sample questions. 8. Let X denote a continuous random variable with probability density function f(x) = 4x 3 /15 for Sample questios Suppose that humas ca have oe of three bloodtypes: A, B, O Assume that 40% of the populatio has Type A, 50% has type B, ad 0% has Type O If a perso has type A, the probability that they

More information

Chapter two: Hypothesis testing

Chapter two: Hypothesis testing : Hypothesis testig - Some basic cocepts: - Data: The raw material of statistics is data. For our purposes we may defie data as umbers. The two kids of umbers that we use i statistics are umbers that result

More information

Since X n /n P p, we know that X n (n. Xn (n X n ) Using the asymptotic result above to obtain an approximation for fixed n, we obtain

Since X n /n P p, we know that X n (n. Xn (n X n ) Using the asymptotic result above to obtain an approximation for fixed n, we obtain Assigmet 9 Exercise 5.5 Let X biomial, p, where p 0, 1 is ukow. Obtai cofidece itervals for p i two differet ways: a Sice X / p d N0, p1 p], the variace of the limitig distributio depeds oly o p. Use the

More information

Additional Notes and Computational Formulas CHAPTER 3

Additional Notes and Computational Formulas CHAPTER 3 Additioal Notes ad Computatioal Formulas APPENDIX CHAPTER 3 1 The Greek capital sigma is the mathematical sig for summatio If we have a sample of observatios say y 1 y 2 y 3 y their sum is y 1 + y 2 +

More information

A new distribution-free quantile estimator

A new distribution-free quantile estimator Biometrika (1982), 69, 3, pp. 635-40 Prited i Great Britai 635 A ew distributio-free quatile estimator BY FRANK E. HARRELL Cliical Biostatistics, Duke Uiversity Medical Ceter, Durham, North Carolia, U.S.A.

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

Clustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar.

Clustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar. Clusterig CM226: Machie Learig for Bioiformatics. Fall 216 Sriram Sakararama Ackowledgmets: Fei Sha, Ameet Talwalkar Clusterig 1 / 42 Admiistratio HW 1 due o Moday. Email/post o CCLE if you have questios.

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

Example: Find the SD of the set {x j } = {2, 4, 5, 8, 5, 11, 7}.

Example: Find the SD of the set {x j } = {2, 4, 5, 8, 5, 11, 7}. 1 (*) If a lot of the data is far from the mea, the may of the (x j x) 2 terms will be quite large, so the mea of these terms will be large ad the SD of the data will be large. (*) I particular, outliers

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should be doe

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9 BIOS 4110: Itroductio to Biostatistics Brehey Lab #9 The Cetral Limit Theorem is very importat i the realm of statistics, ad today's lab will explore the applicatio of it i both categorical ad cotiuous

More information

Asymptotic Coupling and Its Applications in Information Theory

Asymptotic Coupling and Its Applications in Information Theory Asymptotic Couplig ad Its Applicatios i Iformatio Theory Vicet Y. F. Ta Joit Work with Lei Yu Departmet of Electrical ad Computer Egieerig, Departmet of Mathematics, Natioal Uiversity of Sigapore IMS-APRM

More information

Data Description. Measure of Central Tendency. Data Description. Chapter x i

Data Description. Measure of Central Tendency. Data Description. Chapter x i Data Descriptio Describe Distributio with Numbers Example: Birth weights (i lb) of 5 babies bor from two groups of wome uder differet care programs. Group : 7, 6, 8, 7, 7 Group : 3, 4, 8, 9, Chapter 3

More information

IMPROVING EFFICIENT MARGINAL ESTIMATORS IN BIVARIATE MODELS WITH PARAMETRIC MARGINALS

IMPROVING EFFICIENT MARGINAL ESTIMATORS IN BIVARIATE MODELS WITH PARAMETRIC MARGINALS IMPROVING EFFICIENT MARGINAL ESTIMATORS IN BIVARIATE MODELS WITH PARAMETRIC MARGINALS HANXIANG PENG AND ANTON SCHICK Abstract. Suppose we have data from a bivariate model with parametric margials. Efficiet

More information

CEU Department of Economics Econometrics 1, Problem Set 1 - Solutions

CEU Department of Economics Econometrics 1, Problem Set 1 - Solutions CEU Departmet of Ecoomics Ecoometrics, Problem Set - Solutios Part A. Exogeeity - edogeeity The liear coditioal expectatio (CE) model has the followig form: We would like to estimate the effect of some

More information

Mathematical Statistics - MS

Mathematical Statistics - MS Paper Specific Istructios. The examiatio is of hours duratio. There are a total of 60 questios carryig 00 marks. The etire paper is divided ito three sectios, A, B ad C. All sectios are compulsory. Questios

More information

Bull. Korean Math. Soc. 36 (1999), No. 3, pp. 451{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Seung Hoe Choi and Hae Kyung

Bull. Korean Math. Soc. 36 (1999), No. 3, pp. 451{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Seung Hoe Choi and Hae Kyung Bull. Korea Math. Soc. 36 (999), No. 3, pp. 45{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Abstract. This paper provides suciet coditios which esure the strog cosistecy of regressio

More information

11 Correlation and Regression

11 Correlation and Regression 11 Correlatio Regressio 11.1 Multivariate Data Ofte we look at data where several variables are recorded for the same idividuals or samplig uits. For example, at a coastal weather statio, we might record

More information

Chapter 13: Tests of Hypothesis Section 13.1 Introduction

Chapter 13: Tests of Hypothesis Section 13.1 Introduction Chapter 13: Tests of Hypothesis Sectio 13.1 Itroductio RECAP: Chapter 1 discussed the Likelihood Ratio Method as a geeral approach to fid good test procedures. Testig for the Normal Mea Example, discussed

More information

Statistical Inference Based on Extremum Estimators

Statistical Inference Based on Extremum Estimators T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0

More information

Error & Uncertainty. Error. More on errors. Uncertainty. Page # The error is the difference between a TRUE value, x, and a MEASURED value, x i :

Error & Uncertainty. Error. More on errors. Uncertainty. Page # The error is the difference between a TRUE value, x, and a MEASURED value, x i : Error Error & Ucertaity The error is the differece betwee a TRUE value,, ad a MEASURED value, i : E = i There is o error-free measuremet. The sigificace of a measuremet caot be judged uless the associate

More information

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

NYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018)

NYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018) NYU Ceter for Data Sciece: DS-GA 003 Machie Learig ad Computatioal Statistics (Sprig 208) Brett Berstei, David Roseberg, Be Jakubowski Jauary 20, 208 Istructios: Followig most lab ad lecture sectios, we

More information

Appendix to: Hypothesis Testing for Multiple Mean and Correlation Curves with Functional Data

Appendix to: Hypothesis Testing for Multiple Mean and Correlation Curves with Functional Data Appedix to: Hypothesis Testig for Multiple Mea ad Correlatio Curves with Fuctioal Data Ao Yua 1, Hog-Bi Fag 1, Haiou Li 1, Coli O. Wu, Mig T. Ta 1, 1 Departmet of Biostatistics, Bioiformatics ad Biomathematics,

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete

More information

Vector Quantization: a Limiting Case of EM

Vector Quantization: a Limiting Case of EM . Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z

More information

Statistics 203 Introduction to Regression and Analysis of Variance Assignment #1 Solutions January 20, 2005

Statistics 203 Introduction to Regression and Analysis of Variance Assignment #1 Solutions January 20, 2005 Statistics 203 Itroductio to Regressio ad Aalysis of Variace Assigmet #1 Solutios Jauary 20, 2005 Q. 1) (MP 2.7) (a) Let x deote the hydrocarbo percetage, ad let y deote the oxyge purity. The simple liear

More information

CSE 527, Additional notes on MLE & EM

CSE 527, Additional notes on MLE & EM CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be

More information

Chapter 6 Part 5. Confidence Intervals t distribution chi square distribution. October 23, 2008

Chapter 6 Part 5. Confidence Intervals t distribution chi square distribution. October 23, 2008 Chapter 6 Part 5 Cofidece Itervals t distributio chi square distributio October 23, 2008 The will be o help sessio o Moday, October 27. Goal: To clearly uderstad the lik betwee probability ad cofidece

More information

Bayesian Methods: Introduction to Multi-parameter Models

Bayesian Methods: Introduction to Multi-parameter Models Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested

More information

THE SYSTEMATIC AND THE RANDOM. ERRORS - DUE TO ELEMENT TOLERANCES OF ELECTRICAL NETWORKS

THE SYSTEMATIC AND THE RANDOM. ERRORS - DUE TO ELEMENT TOLERANCES OF ELECTRICAL NETWORKS R775 Philips Res. Repts 26,414-423, 1971' THE SYSTEMATIC AND THE RANDOM. ERRORS - DUE TO ELEMENT TOLERANCES OF ELECTRICAL NETWORKS by H. W. HANNEMAN Abstract Usig the law of propagatio of errors, approximated

More information

ACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory

ACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory 1. Graph Theory Prove that there exist o simple plaar triagulatio T ad two distict adjacet vertices x, y V (T ) such that x ad y are the oly vertices of T of odd degree. Do ot use the Four-Color Theorem.

More information

Final Review for MATH 3510

Final Review for MATH 3510 Fial Review for MATH 50 Calculatio 5 Give a fairly simple probability mass fuctio or probability desity fuctio of a radom variable, you should be able to compute the expected value ad variace of the variable

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

Probability and Statistics

Probability and Statistics ICME Refresher Course: robability ad Statistics Staford Uiversity robability ad Statistics Luyag Che September 20, 2016 1 Basic robability Theory 11 robability Spaces A probability space is a triple (Ω,

More information

Study the bias (due to the nite dimensional approximation) and variance of the estimators

Study the bias (due to the nite dimensional approximation) and variance of the estimators 2 Series Methods 2. Geeral Approach A model has parameters (; ) where is ite-dimesioal ad is oparametric. (Sometimes, there is o :) We will focus o regressio. The fuctio is approximated by a series a ite

More information

Logit regression Logit regression

Logit regression Logit regression Logit regressio Logit regressio models the probability of Y= as the cumulative stadard logistic distributio fuctio, evaluated at z = β 0 + β X: Pr(Y = X) = F(β 0 + β X) F is the cumulative logistic distributio

More information

Stat410 Probability and Statistics II (F16)

Stat410 Probability and Statistics II (F16) Some Basic Cocepts of Statistical Iferece (Sec 5.) Suppose we have a rv X that has a pdf/pmf deoted by f(x; θ) or p(x; θ), where θ is called the parameter. I previous lectures, we focus o probability problems

More information

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE TERRY SOO Abstract These otes are adapted from whe I taught Math 526 ad meat to give a quick itroductio to cofidece

More information

Lecture 11 and 12: Basic estimation theory

Lecture 11 and 12: Basic estimation theory Lecture ad 2: Basic estimatio theory Sprig 202 - EE 94 Networked estimatio ad cotrol Prof. Kha March 2 202 I. MAXIMUM-LIKELIHOOD ESTIMATORS The maximum likelihood priciple is deceptively simple. Louis

More information

An Introduction to Asymptotic Theory

An Introduction to Asymptotic Theory A Itroductio to Asymptotic Theory Pig Yu School of Ecoomics ad Fiace The Uiversity of Hog Kog Pig Yu (HKU) Asymptotic Theory 1 / 20 Five Weapos i Asymptotic Theory Five Weapos i Asymptotic Theory Pig Yu

More information

of the matrix is =-85, so it is not positive definite. Thus, the first

of the matrix is =-85, so it is not positive definite. Thus, the first BOSTON COLLEGE Departmet of Ecoomics EC771: Ecoometrics Sprig 4 Prof. Baum, Ms. Uysal Solutio Key for Problem Set 1 1. Are the followig quadratic forms positive for all values of x? (a) y = x 1 8x 1 x

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

Confidence intervals summary Conservative and approximate confidence intervals for a binomial p Examples. MATH1005 Statistics. Lecture 24. M.

Confidence intervals summary Conservative and approximate confidence intervals for a binomial p Examples. MATH1005 Statistics. Lecture 24. M. MATH1005 Statistics Lecture 24 M. Stewart School of Mathematics ad Statistics Uiversity of Sydey Outlie Cofidece itervals summary Coservative ad approximate cofidece itervals for a biomial p The aïve iterval

More information

Estimation of Gumbel Parameters under Ranked Set Sampling

Estimation of Gumbel Parameters under Ranked Set Sampling Joural of Moder Applied Statistical Methods Volume 13 Issue 2 Article 11-2014 Estimatio of Gumbel Parameters uder Raked Set Samplig Omar M. Yousef Al Balqa' Applied Uiversity, Zarqa, Jorda, abuyaza_o@yahoo.com

More information