5. Fractional Hot deck Imputation


1 Introduction

Suppose that we are interested in estimating θ_1 = E(Y), or even θ_2 = Pr(Y < c), where y ~ f(y | x), x is always observed, and y is subject to missingness. Assume MAR in the sense that Pr(δ = 1 | x, y) does not depend on y. We do not want to make strong parametric assumptions on f(y | x).

If x is a categorical variable with support {1, ..., G}, then f(y | x = g) can be described as

    y_i \mid (x_i = g) \ \overset{\text{i.i.d.}}{\sim}\ (\mu_g, \sigma_g^2),          (1)

which is sometimes called the cell mean model.

Even when x is not categorical, one can consider an approximation

    f(y \mid x) \approx \sum_{g=1}^{G} \pi_g(x) f_g(y),          (2)

where π_g(x) = Pr(z = g | x) and f_g(y) = f(y | z = g). Such an approximation is accurate when f(y | x, group) = f(y | group). Group g is often called an imputation cell.

Under MAR, one imputation method is to take a random sample from the set of respondents in the same imputation cell (hot deck imputation).

Partition the sample into G groups: A = A_1 ∪ A_2 ∪ ... ∪ A_G. In group g, we have n_g elements and r_g respondents. We need to impute y values for the nonrespondents. For each group A_g, select m_g = n_g − r_g imputed values from the r_g respondents, with replacement or without replacement.

Example (Hot deck imputation): Write A_g = A_{Rg} ∪ A_{Mg} with A_{Rg} = {i ∈ A_g ; δ_i = 1} and A_{Mg} = {i ∈ A_g ; δ_i = 0}. Imputation mechanism: y*_j ~ Uniform{y_i ; i ∈ A_{Rg}}. That is, y*_j = y_i with probability 1/r_g for i ∈ A_{Rg} and j ∈ A_{Mg}. Note that

    E_I(y_j^*) = \frac{1}{r_g} \sum_{i \in A_{Rg}} y_i = \bar{y}_{Rg}

    V_I(y_j^*) = \frac{1}{r_g} \sum_{i \in A_{Rg}} (y_i - \bar{y}_{Rg})^2 \doteq S_{Rg}^2,

where E_I is the expectation taken with respect to the imputation mechanism.

Imputed estimator of θ_1 = E(Y):

    \hat{\theta}_{I1} = \frac{1}{n} \sum_{i=1}^{n} \{ \delta_i y_i + (1 - \delta_i) y_i^* \}

Variance:

    V(\hat{\theta}_{I1}) = V\{ E_I(\hat{\theta}_{I1}) \} + E\{ V_I(\hat{\theta}_{I1}) \}
                         = V\Big( \frac{1}{n} \sum_g n_g \bar{y}_{Rg} \Big) + E\Big( \frac{1}{n^2} \sum_g m_g S_{Rg}^2 \Big).

Under the model y_i, i ∈ A_g ~ i.i.d. (μ_g, σ²_g), the variance can be written

    V(\hat{\theta}_{I1}) = V\Big( \frac{1}{n} \sum_g n_g \mu_g \Big)
                         + E\Big[ \frac{1}{n^2} \sum_g \Big\{ n_g + 2 m_g + \frac{m_g (m_g - 1)}{r_g} \Big\} \sigma_g^2 \Big].

Note that, under complete response,

    V(\hat{\theta}_n) = V\Big( \frac{1}{n} \sum_g n_g \mu_g \Big) + E\Big( \frac{1}{n^2} \sum_g n_g \sigma_g^2 \Big).

Thus, the variance is increased.
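
The mechanism above is easy to simulate. The following is a minimal Python sketch (not part of the original notes; the array names y, delta, and cell and the toy data are illustrative) of within-cell hot deck imputation with replacement and the resulting imputed estimator of θ_1:

import numpy as np

rng = np.random.default_rng(0)

def hot_deck_impute(y, delta, cell):
    """For each nonrespondent, draw one donor (with replacement) from the
    respondents in the same imputation cell and copy its y value."""
    y_imp = y.copy()
    for g in np.unique(cell):
        donors = np.where((cell == g) & (delta == 1))[0]
        recips = np.where((cell == g) & (delta == 0))[0]
        y_imp[recips] = y[rng.choice(donors, size=recips.size, replace=True)]
    return y_imp

# toy data: two imputation cells, response probability depending only on the cell (MAR)
n = 200
cell = rng.integers(0, 2, size=n)
y = rng.normal(loc=np.where(cell == 0, 1.0, 3.0), scale=1.0)
delta = rng.binomial(1, np.where(cell == 0, 0.7, 0.5))

y_imp = hot_deck_impute(y, delta, cell)
theta_I1 = y_imp.mean()    # imputed estimator of E(Y)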

Variance is increased after imputation. Two sources:
1. Reduced sample size.
2. Randomness due to the imputation mechanism in stochastic imputation.

Variance estimation after imputation: a naive approach is treating the imputed values as if observed and applying the standard variance estimation formula to the imputed data. Such a naive approach underestimates the true variance!

Lemma 5.1. If E(y*_i) = E(y_i) and V(y*_i) = V(y_i), then the naive variance estimator V̂_I = n^{-1} S²_I has expectation

    E( \hat{V}_I ) \doteq V( \hat{\theta}_n ) \le V( \hat{\theta}_I ).          (3)

A proof of Lemma 5.1 is given in Appendix A.

Example (Continued): Note that

    V(y_i^*) = V\{ E_I(y_i^*) \} + E\{ V_I(y_i^*) \}
             = V( \bar{y}_{Rg} ) + E\Big\{ \frac{1}{r_g} \sum_{i \in A_{Rg}} (y_i - \bar{y}_{Rg})^2 \Big\}
             \doteq r_g^{-1} \sigma_g^2 + (1 - r_g^{-1}) \sigma_g^2 = \sigma_g^2 = V(y_i).

Thus, the assumptions for (3) are satisfied and we obtain

    E( \hat{V}_I ) \doteq V\Big( \frac{1}{n} \sum_g n_g \mu_g \Big) + E\Big( \frac{1}{n^2} \sum_g n_g \sigma_g^2 \Big) = V( \hat{\theta}_n ).

Thus, we can write V(θ̂_I) = V(θ̂_n) + E(n^{-2} Σ_g c_g σ²_g) for some c_g; in this example, c_g = 2 m_g + m_g (m_g − 1)/r_g. The approximate bias-corrected variance estimator is

    \hat{V} = \hat{V}_I + \frac{1}{n^2} \sum_g c_g S_{Rg}^2.
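
As a rough numerical illustration, here is a hedged Python sketch (continuing the toy data from the previous sketch; the correction factor c_g follows the example above, and the function name is ours) that contrasts the naive variance estimator with the bias-corrected one:

def naive_and_corrected_variance(y_imp, y, delta, cell):
    """Naive variance of the imputed mean (treating imputed values as observed)
    and the bias-corrected version with c_g = 2*m_g + m_g*(m_g - 1)/r_g."""
    n = y_imp.size
    V_naive = y_imp.var(ddof=1) / n
    correction = 0.0
    for g in np.unique(cell):
        in_g = cell == g
        r_g = int((in_g & (delta == 1)).sum())
        m_g = int((in_g & (delta == 0)).sum())
        if r_g > 1:
            S2_Rg = y[in_g & (delta == 1)].var(ddof=1)
            correction += (2 * m_g + m_g * (m_g - 1) / r_g) * S2_Rg
    return V_naive, V_naive + correction / n**2

V_naive, V_corrected = naive_and_corrected_variance(y_imp, y, delta, cell)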

2 Fractional hot deck imputation

For each j ∈ A_{Mg}, select M imputed values from A_{Rg} using SRS (simple random sampling without replacement). Let

    \bar{y}_{Ij} = \frac{1}{M} \sum_{i \in A_R} d_{ij} y_i

be the mean of the M imputed values for y_j, where d_{ij} = 1 if y_i is selected as one of the imputed values for missing y_j (and d_{ij} = 0 otherwise). The resulting fractional hot deck imputation (FHDI) estimator of θ_1 = E(Y) is

    \hat{\theta}_{FHDI,1} = \frac{1}{n} \sum_j \{ \delta_j y_j + (1 - \delta_j) \bar{y}_{Ij} \}
                          = \frac{1}{n} \sum_j \Big\{ \delta_j y_j + (1 - \delta_j) \sum_{i \in A_R} w_{ij}^* y_i \Big\},          (4)

where w*_{ij} = d_{ij}/M. Also, the FHDI estimator of θ_2 = Pr(Y < c) is

    \hat{\theta}_{FHDI,2} = \frac{1}{n} \sum_i \Big\{ \delta_i I(y_i < c) + (1 - \delta_i) \frac{1}{M} \sum_{j=1}^{M} I(y_{ij}^* < c) \Big\},

where y*_{i1}, ..., y*_{iM} denote the M imputed values for a missing unit i. Thus, one set of FHDI data can be used to estimate several parameters.
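
A minimal Python sketch of the FHDI point estimators (illustrative only; it reuses the toy y, delta, cell arrays and the rng from the earlier sketches, and selects donors by SRS without replacement within each cell):

def fhdi_point_estimates(y, delta, cell, M=3, c=2.0):
    """FHDI with M donors per missing unit; returns estimates of
    theta1 = E(Y) and theta2 = Pr(Y < c) from the same imputed data."""
    n = y.size
    total1, total2 = 0.0, 0.0
    for i in range(n):
        if delta[i] == 1:
            total1 += y[i]
            total2 += float(y[i] < c)
        else:
            donors = np.where((cell == cell[i]) & (delta == 1))[0]
            chosen = rng.choice(donors, size=min(M, donors.size), replace=False)
            w = 1.0 / chosen.size                      # fractional weight d_ij / M
            total1 += w * y[chosen].sum()
            total2 += w * (y[chosen] < c).sum()
    return total1 / n, total2 / n

theta1_fhdi, theta2_fhdi = fhdi_point_estimates(y, delta, cell, M=3, c=2.0)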

How to estimate the variance of FHDI estimators? Kim and Fuller (2004) considered a modified jackknife method where the fractional weights are modified so that the variance is estimated correctly.

Jackknife method for complete data: For θ̂_n = ȳ_n, the jackknife variance estimator is defined by

    \hat{V}_{JK} = \frac{n-1}{n} \sum_{k=1}^{n} ( \bar{y}^{(k)} - \bar{y}_n )^2,

where

    \bar{y}^{(k)} = \sum_{i=1}^{n} w_i^{(k)} y_i

and

    w_i^{(k)} = \begin{cases} 0 & \text{if } i = k \\ 1/(n-1) & \text{otherwise.} \end{cases}

Note that

    \bar{y}^{(k)} - \bar{y}_n = -\frac{1}{n-1} ( y_k - \bar{y}_n ),

and so

    \hat{V}_{JK} = \frac{n-1}{n} \sum_{k=1}^{n} ( \bar{y}^{(k)} - \bar{y}_n )^2
                 = \frac{1}{n(n-1)} \sum_{k=1}^{n} ( y_k - \bar{y}_n )^2 = \frac{1}{n} S^2,

and it is unbiased for V(ȳ_n). Similarly, the jackknife variance estimator of θ̂_{n,2} = n^{-1} Σ_i I(y_i < c) is computed by

    \hat{V}_{JK}( \hat{\theta}_{n,2} ) = \frac{n-1}{n} \sum_{k=1}^{n} ( \hat{\theta}_{n,2}^{(k)} - \hat{\theta}_{n,2} )^2,

where

    \hat{\theta}_{n,2}^{(k)} = \sum_{i=1}^{n} w_i^{(k)} I(y_i < c).

Thus, the jackknife is easy to implement: we have only to repeat the same estimation procedure using w_i^{(k)} instead of w_i.
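
The delete-one jackknife is a one-liner to code. The sketch below (illustrative Python, independent of the earlier toy data) checks numerically that it reproduces S²/n for the sample mean and applies the same recipe to θ̂_{n,2}:

import numpy as np

def jackknife_variance(stat, y):
    """Delete-one jackknife variance estimator for a statistic stat(y)."""
    n = y.size
    theta = stat(y)
    reps = np.array([stat(np.delete(y, k)) for k in range(n)])
    return (n - 1) / n * np.sum((reps - theta) ** 2)

rng = np.random.default_rng(1)
z = rng.normal(size=50)
print(jackknife_variance(np.mean, z))                      # jackknife variance of the mean
print(z.var(ddof=1) / z.size)                              # S^2 / n: identical for the mean
print(jackknife_variance(lambda v: np.mean(v < 0.5), z))   # same recipe for theta_{n,2}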

Now, to estimate the variance of the FHDI estimator in (4), first note that the FHDI estimator of θ_1 = E(Y) can be written as

    \hat{\theta}_{FHDI,1} = \frac{1}{n} \sum_{i \in A} \delta_i \alpha_i y_i,
    \qquad \alpha_i = 1 + \frac{1}{M} \sum_{j \in A_M} d_{ij},

where d_{ij} = 1 if y_i is selected as one of the imputed values for missing y_j. The variance of θ̂_{FHDI,1} is then

    V( \hat{\theta}_{FHDI,1} ) = V\Big( \frac{1}{n} \sum_g n_g \mu_g \Big)
        + E\Big( \frac{1}{n^2} \sum_g \sum_{i \in A_{Rg}} \alpha_i^2 \sigma_g^2 \Big).          (5)

The modified jackknife method of Kim and Fuller (2004) is computed by

    \hat{V}_{JK}( \hat{\theta}_{FHDI,1} ) = \frac{n-1}{n} \sum_{k=1}^{n} ( \hat{\theta}_{FHDI,1}^{(k)} - \hat{\theta}_{FHDI,1} )^2,          (6)

where

    \hat{\theta}_{FHDI,1}^{(k)} = \sum_j w_j^{(k)} \Big\{ \delta_j y_j + (1 - \delta_j) \sum_{i \in A_R} w_{ij}^{*(k)} y_i \Big\} =: \sum_{i \in A_R} \alpha_i^{(k)} y_i

and

    w_{ij}^{*(k)} = \begin{cases}
        w_{ij}^* - \phi_k & \text{if } i = k \text{ and } d_{kj} = 1 \\
        w_{ij}^* + \phi_k/(M-1) & \text{if } i \ne k \text{ and } d_{kj} = d_{ij} = 1 \\
        w_{ij}^* & \text{otherwise.}
    \end{cases}

Here, φ_k ∈ (0, 1/M) is a constant to be determined. Note that Σ_{i ∈ A_R} w^{*(k)}_{ij} = 1 for every missing j. The expected value of the modified jackknife variance estimator in (6) is given by

    E\{ \hat{V}_{JK}( \hat{\theta}_{FHDI,1} ) \} = V\Big( \frac{1}{n} \sum_g n_g \mu_g \Big)
        + E\Big[ \frac{1}{n^2} \sum_g \sum_{i \in A_{Rg}} \Big\{ \sum_{k=1}^{n} ( \alpha_i^{(k)} - \alpha_i )^2 \Big\} \sigma_g^2 \Big].          (7)

To obtain the variance of the FHDI estimator correctly, we want to achieve

    \sum_{k=1}^{n} ( \alpha_i^{(k)} - \alpha_i )^2 = \alpha_i^2.          (8)

There are two approaches for computing a suitable set of replicated fractional weights that satisfy (8). One is a bisection method and the other is a closed-form solution described in Appendix B. To describe the bisection method, we use the following steps:

1. Set φ^{(0)}_k = 1/(2M).
2. Using the current value of φ^{(t)}_k, compute

       Q_g( \phi^{(t)} ) = \sum_{i \in A_{Rg}} \Big[ \sum_{k=1}^{n} ( \alpha_i^{(k)} - \alpha_i )^2 - \alpha_i^2 \Big].

3. If Q_g(φ^{(t)}) < 0, set φ^{(t+1)}_k = φ^{(t)}_k + (1/2^{t+1})/M for all k ∈ A_{Rg}. If Q_g(φ^{(t)}) > 0, set φ^{(t+1)}_k = φ^{(t)}_k − (1/2^{t+1})/M for all k ∈ A_{Rg}. Continue this update for g = 1, ..., G.
4. Continue the process until |Q_g(φ^{(t)})| < ε for a sufficiently small ε > 0. A small coded sketch of this update follows the list below.
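
Below is a hedged Python sketch of that dyadic search for a single cell. The function Q_g is passed in as a callable because its evaluation depends on how the replicate coefficients α_i^{(k)} are computed; the function and the toy Q_g are our own illustration, not the Kim-Fuller implementation.

def bisection_phi(Q_g, M, eps=1e-8, max_iter=60):
    """Dyadic search on (0, 1/M) for phi such that Q_g(phi) is approximately zero:
    start at 1/(2M) and halve the step each iteration, moving in the direction
    that drives Q_g toward zero."""
    phi = 1.0 / (2.0 * M)
    for t in range(max_iter):
        q = Q_g(phi)
        if abs(q) < eps:
            break
        step = (0.5 ** (t + 1)) / M        # 1/(2M), 1/(4M), ...
        phi = phi + step if q < 0 else phi - step
    return phi

# toy monotone Q_g with a root inside (0, 1/M); a real Q_g would be built from the alpha's
M = 5
phi_hat = bisection_phi(lambda p: p - 0.07, M)     # converges to 0.07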

3 Fully efficient fractional imputation

Under the cell mean model, the best imputation is to use the cell mean imputation:

    \hat{\theta}_{Id} = \frac{1}{n} \sum_g \sum_{i \in A_g} \{ \delta_i y_i + (1 - \delta_i) \bar{y}_{Rg} \},

which is algebraically equivalent to the fully efficient fractional imputation (FEFI) estimator

    \hat{\theta}_{FEFI} = \frac{1}{n} \sum_g \sum_{j \in A_g} \Big\{ \delta_j y_j + (1 - \delta_j) \sum_{i \in A_{Rg}} w_{ij}^* y_i \Big\},          (9)

with w*_{ij} = 1/r_g. The FEFI estimator uses all the respondents in the same cell as the imputed values for each missing value. Note that (9) is equivalent to

    \hat{\theta}_{FEFI} = \frac{1}{n} \sum_g \frac{n_g}{r_g} \sum_{i \in A_{Rg}} y_i,          (10)

which is equivalent to using π̂_g^{-1} = n_g/r_g as the nonresponse adjustment factor that is multiplied to the original weight of the respondents in cell g.

For the fractional hot deck imputation estimator θ̂_{FHDI} in Section 2, we can obtain

    V( \hat{\theta}_{FHDI} ) = V( \hat{\theta}_{FEFI} ) + V( \hat{\theta}_{FHDI} - \hat{\theta}_{FEFI} )          (11)

because we have E_I(θ̂_{FHDI}) = θ̂_{FEFI}. The second term is the variance due to the random selection in the fractional hot deck imputation and will be zero if M = r_g.
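
A hedged Python sketch of the FEFI estimator in its weighted form (10), with the nonresponse adjustment n_g/r_g applied to the respondents in each cell (names and toy data are illustrative):

import numpy as np

def fefi_estimate(y, delta, cell):
    """FEFI estimator: weight each respondent in cell g by n_g / r_g and average."""
    n = y.size
    total = 0.0
    for g in np.unique(cell):
        in_g = cell == g
        resp = in_g & (delta == 1)
        n_g, r_g = int(in_g.sum()), int(resp.sum())
        total += (n_g / r_g) * y[resp].sum()
    return total / n

rng = np.random.default_rng(2)
cell = rng.integers(0, 2, size=200)
y = rng.normal(loc=np.where(cell == 0, 1.0, 3.0))
delta = rng.binomial(1, np.where(cell == 0, 0.7, 0.5))
theta_fefi = fefi_estimate(y, delta, cell)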

How to reduce the second term in (11)?
1. Increase M.
2. Use calibration weighting.
3. Use a balanced sampling mechanism to select donors.

Calibration weighting approach (Fuller and Kim, 2005):
1. For each j ∈ A_{Mg}, select M elements from {y_i ; i ∈ A_{Rg}} at random with equal probability.
2. The initial fractional weight for each donor is w^{*(0)}_{ij} = 1/M for d_{ij} = 1.
3. The fractional weights are modified to satisfy

       \sum_{i \in A_{Rg}} y_i + \sum_{j \in A_{Mg}} \sum_{i \in A_{Rg}} w_{ij}^* d_{ij} y_i = n_g \bar{y}_{Rg}
       \quad \text{and} \quad
       \sum_{i \in A_{Rg}} w_{ij}^* d_{ij} = 1.

One solution is

    w_{ij}^* = w_{ij}^{*(0)} \Big\{ 1 + ( \bar{y}_{Rg} - \bar{y}_{Ig} ) \, T_g^{-1} ( y_i - \bar{y}_{Ij} ) \Big\},          (12)

where

    \bar{y}_{Ij} = \sum_{i \in A_{Rg}} w_{ij}^{*(0)} d_{ij} y_i, \qquad
    \bar{y}_{Ig} = \frac{1}{m_g} \sum_{j \in A_{Mg}} \bar{y}_{Ij}, \qquad
    T_g = \frac{1}{m_g} \sum_{j \in A_{Mg}} \sum_{i \in A_{Rg}} w_{ij}^{*(0)} d_{ij} ( y_i - \bar{y}_{Ij} )^2.

Modifying the weights to satisfy certain constraints is a popular problem in statistics. Such weighting is often called calibration weighting. Regression weighting is of the form in (12) and is extensively discussed in survey sampling courses (Stat 521, Stat 621).
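
The adjustment in (12) is a one-pass regression-type calibration. Here is a hedged Python sketch for a single cell, following the form of (12) as reconstructed above; the data structure donors_by_unit (mapping each missing unit to its M donor indices) and all names are our own illustration.

import numpy as np

def calibrate_fractional_weights(y, donors_by_unit, ybar_Rg):
    """Regression-type adjustment of the initial fractional weights 1/M so that,
    within one cell, each missing unit's weights still sum to one and the
    weighted donor total matches m_g * ybar_Rg."""
    w0 = {j: np.full(len(d), 1.0 / len(d)) for j, d in donors_by_unit.items()}
    ybar_Ij = {j: float(np.dot(w0[j], y[d])) for j, d in donors_by_unit.items()}
    m_g = len(donors_by_unit)
    ybar_Ig = sum(ybar_Ij.values()) / m_g
    T_g = sum(np.dot(w0[j], (y[d] - ybar_Ij[j]) ** 2)
              for j, d in donors_by_unit.items()) / m_g
    adj = (ybar_Rg - ybar_Ig) / T_g
    return {j: w0[j] * (1.0 + adj * (y[d] - ybar_Ij[j]))
            for j, d in donors_by_unit.items()}

# toy usage: respondents are indices 0..9, missing units 10 and 11 have 3 donors each
rng = np.random.default_rng(3)
y = rng.normal(size=12)
donors_by_unit = {10: np.array([0, 1, 2]), 11: np.array([4, 5, 6])}
w = calibrate_fractional_weights(y, donors_by_unit, ybar_Rg=y[:10].mean())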

Balanced imputation approach (Chauvet et al., 2011): apply a balanced sampling technique to achieve

    \sum_{i \in A_{Rg}} y_i + \sum_{j \in A_{Mg}} \frac{1}{M} \sum_{i \in A_{Rg}} d_{ij} y_i = n_g \bar{y}_{Rg}.

Note: Balanced sampling is a class of sampling methods that satisfies constraints of the form

    \sum_{i \in A} w_i z_i = \sum_{i \in U} z_i,

where w_i is the design weight (the inverse of the first-order inclusion probability) and z_i is the design variable. Stratified sampling is one example of balanced sampling when z_i is categorical. For a more general case, we may use the cube method (Deville and Tillé, 2004) or the rejective method (Fuller, 2009).

For variance estimation, the first term of (11) is easy to estimate because we can easily take into account the sampling variability of π̂_g in the estimation. That is, writing π̂ = (π̂_1, ..., π̂_G), we can express θ̂_{FEFI} = θ̂_{FEFI}(π̂). To estimate the variance of θ̂_{FEFI}, we need to incorporate the sampling variability of π̂ in θ̂_{FEFI}(π̂). Either Taylor linearization or a replication method can be used.

Linearization method: Let

    U_g( \pi_g ) = \sum_{i \in A_g} ( \delta_i - \pi_g )

be an estimating function for π_g; i.e., π̂_g is obtained by solving U_g(π_g) = 0 for π_g. Now, using

    \hat{\theta}_{FEFI}( \hat{\pi} ) \cong \hat{\theta}_{FEFI}( \pi )
        + \sum_g E\Big( \frac{\partial \hat{\theta}_{FEFI}}{\partial \pi_g} \Big) ( \hat{\pi}_g - \pi_g )

and

    0 = U_g( \hat{\pi}_g ) \cong U_g( \pi_g ) + E\Big( \frac{\partial U_g}{\partial \pi_g} \Big) ( \hat{\pi}_g - \pi_g ),

we have

    \hat{\theta}_{FEFI}( \hat{\pi} ) \cong \hat{\theta}_{FEFI}( \pi )
        - \sum_g E\Big( \frac{\partial \hat{\theta}_{FEFI}}{\partial \pi_g} \Big)
          \Big\{ E\Big( \frac{\partial U_g}{\partial \pi_g} \Big) \Big\}^{-1} U_g( \pi_g )
        = \frac{1}{n} \sum_g \sum_{i \in A_g} \Big\{ \mu_g + \frac{\delta_i}{\pi_g} ( y_i - \mu_g ) \Big\}
        = \frac{1}{n} \sum_{i \in A} \eta_i,

where η_i = μ_g + (δ_i/π_g)(y_i − μ_g) if i ∈ A_g. Once η_i is calculated, we can apply a standard variance formula to η̂_i = μ̂_g + (δ_i/π̂_g)(y_i − μ̂_g) to obtain the linearized variance estimator. A replication method such as the jackknife is straightforward.
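
The linearized variance is straightforward to code: form η̂_i for every unit and apply the usual variance formula for a sample mean. A hedged Python sketch (simple random sampling with a negligible sampling fraction assumed; names and toy data are illustrative):

import numpy as np

def fefi_linearized_variance(y, delta, cell):
    """Linearized variance of the FEFI estimator: build eta_hat_i and apply
    the standard variance formula for a mean of n values."""
    n = y.size
    eta = np.empty(n)
    for g in np.unique(cell):
        in_g = cell == g
        resp = in_g & (delta == 1)
        pi_hat = resp.sum() / in_g.sum()      # estimated response rate in the cell
        mu_hat = y[resp].mean()               # estimated cell mean
        eta[in_g] = mu_hat + np.where(delta[in_g] == 1,
                                      (y[in_g] - mu_hat) / pi_hat, 0.0)
    return eta.var(ddof=1) / n

rng = np.random.default_rng(4)
cell = rng.integers(0, 2, size=500)
y = rng.normal(loc=np.where(cell == 0, 1.0, 3.0))
delta = rng.binomial(1, np.where(cell == 0, 0.7, 0.5))
var_fefi = fefi_linearized_variance(y, delta, cell)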

4 FHDI method using a parametric model

Assume y ~ f(y | x), where x is always observed and y is subject to missingness. Now suppose that (2) does not hold and the cell mean model is not satisfied. That is, we do not create imputation cells, but we still want to take real observations as the imputed values.

Kim and Yang (2014) approach: three steps.
1. Fully efficient fractional imputation (FEFI) by choosing all the respondents as donors. That is, we use M = r imputed values for each missing unit, where r is the number of respondents in the sample. Compute the fractional weights.
2. Use a systematic PPS sampling to select m << r donors from the FEFI using the fractional weights as the size measure.
3. Use a calibration weighting technique to compute the final fractional weights, which lead to the same estimates as FEFI for some items.

Step 1: FEFI step

We want to find the fractional weights w*_{ij} when the j-th imputed value y*_i(j) is taken from the j-th value in the set of the respondents. Without loss of generality, we assume that the first r elements respond and write y*_i(j) = y_j. Recall that w*_{ij} ∝ f(y*_i(j) | x_i; θ̂)/h(y*_i(j) | x_i) when the y*_i(j) are generated from h(y | x_i). We have only to find h(y*_i(j) | x_i) when we use y*_i(j) = y_j. We can treat {y_i ; δ_i = 1} as a realization from f(y | δ = 1), the marginal distribution of y among respondents.

Now, we can write

    f( y_j \mid \delta = 1 ) = \int f( y_j \mid x, \delta = 1 ) f( x \mid \delta = 1 ) \, dx
                             = \int f( y_j \mid x ) f( x \mid \delta = 1 ) \, dx
                             \cong \frac{1}{r} \sum_{k=1}^{n} \delta_k f( y_j \mid x_k ).

Thus, the fractional weight for y*_i(j) = y_j becomes

    w_{ij}^* \propto \frac{ f( y_j \mid x_i; \hat{\theta} ) }{ \sum_{k=1}^{n} \delta_k f( y_j \mid x_k; \hat{\theta} ) },          (13)

with Σ_{j ∈ A_R} w*_{ij} = 1, where θ̂ is computed from Σ_i δ_i S(θ; x_i, y_i) = 0 with S(θ; x, y) = ∂ log f(y | x; θ)/∂θ.
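
A hedged Python sketch of (13) under a normal linear working model y | x ~ N(β_0 + β_1 x, σ²) fitted to the respondents (the working model, the least-squares fit, and all names are assumptions made for this illustration, not part of the notes):

import numpy as np
from scipy.stats import norm

def fefi_parametric_weights(x, y, delta):
    """FEFI fractional weights w*_{ij} proportional to
    f(y_j | x_i; theta_hat) / sum_k delta_k f(y_j | x_k; theta_hat),
    normalized to sum to one over donors j for each unit i."""
    resp = delta == 1
    X = np.column_stack([np.ones(resp.sum()), x[resp]])
    beta, *_ = np.linalg.lstsq(X, y[resp], rcond=None)   # respondent-only fit of (b0, b1)
    sigma = np.sqrt(np.mean((y[resp] - X @ beta) ** 2))
    mean_all = beta[0] + beta[1] * x                     # E(y | x_i) for every unit i
    y_donor = y[resp]                                    # donor values y_j
    f_ij = norm.pdf(y_donor[None, :], loc=mean_all[:, None], scale=sigma)
    denom = norm.pdf(y_donor[None, :], loc=mean_all[resp][:, None], scale=sigma).sum(axis=0)
    w = f_ij / denom
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(5)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)
delta = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))        # response depends on x only (MAR)
w_star = fefi_parametric_weights(x, y, delta)            # only rows with delta_i = 0 are used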

12 The fractioal weights are further adjusted to satisfy δ i wij,cqx i, y j j D i δ i j A R w ijqx i, y j, 4 for some qx i, y j, ad j D i wij,c for all i with δ i 0, where wij is the fractioal weights for FEFI method, as defied i 3. Regardig the choice of the cotrol fuctio qx, y i 4, we ca use qx, y y, y 2, which will lead to fully efficiet estimates for the mea ad the variace of y. For variace estimatio, replicatio method ca be used. The imputed values are ot chaged, oly the fractioal weights are chaged for each replicatio. Details skipped Referece Chauvet,., Deville J.C. ad Haziza, D. 20. O balaced radom imputatio i surveys. Biometrika, 98, Deville, J.-C. ad Tillé, Y Efficiet balaced samplig: The cube method. Biometrika, 9, Fuller, W.A. ad Kim, J.K Hot deck imputatio for the respose model, Survey Methodology, 3, Kim, J.K. ad Fuller, W.A Fractioal hot deck imputatio, Biometrika, 9, Kim, J.K. ad Yag, S Fractioal hot deck imputatio for robust iferece uder item orespose i survey samplig, Survey Methodology, Accepted for publicatio. 2

Appendix A: Proof of Lemma 5.1

Write η_i = y_i if δ_i = 1 and η_i = y*_i if δ_i = 0, so that θ̂_I = η̄_n and the naive variance estimator is

    \hat{V}_I = \frac{1}{n} S_I^2 = \frac{1}{n(n-1)} \sum_{i=1}^{n} ( \eta_i - \bar{\eta}_n )^2
              = \frac{1}{n(n-1)} \Big( \sum_{i=1}^{n} \eta_i^2 - n \bar{\eta}_n^2 \Big).

Since E(η_i) = E(y_i) and V(η_i) = V(y_i), we have E(η_i²) = E(y_i²) and

    E( \bar{\eta}_n^2 ) = \{ E( \bar{\eta}_n ) \}^2 + V( \bar{\eta}_n ) = \{ E( \bar{y}_n ) \}^2 + V( \hat{\theta}_I ).

Therefore

    E( \hat{V}_I ) = \frac{1}{n(n-1)} \Big[ \sum_{i=1}^{n} E( y_i^2 ) - n \{ E( \bar{y}_n ) \}^2 - n V( \hat{\theta}_I ) \Big]
                   = E\Big( \frac{1}{n} S_n^2 \Big) - \frac{1}{n-1} \big\{ V( \hat{\theta}_I ) - V( \hat{\theta}_n ) \big\}
                   \doteq V( \hat{\theta}_n ),

since E(n^{-1} S²_n) = V(θ̂_n) and V(θ̂_I) − V(θ̂_n) is of order n^{-1}, so the last term is negligible. Because V(θ̂_n) ≤ V(θ̂_I), the naive variance estimator underestimates the variance of the imputed estimator.

14 Thus, equatio 8 reduces to k A R If we defie α k i α i Q k α 2 i α2 k j A M a sufficiet coditio to 5 is to fid φ k such that α k k α k + i Rk where Rk i; j A M d ij d kj > 0, i k. Note that α k k α k k A M α k d kj, M α i d ik. 5 M α k i α i + α k i α i Qk 6 i/ Rk 0 + M φ k d k d k M + φ k d k 2 d k M where d k j A M d kj ad, for i Rk, α k i α i + α i + di M + φ kb ik M φ k M b ik where b ik j A M d ij d kj. For i / Rk, we have 2 2 M d i 2 α k i α i αi 2. Thus, 6 reduces to α k + φ k d k + i Rk which is approximately equal to for sufficietly large. + φ k d k + φ 2 k α i + φ k b ik /M + i/ Rk α 2 i Q k, i Rk b2 ik + d k d k M M M, 7 2 4
