Chapter 2. Review of Basic Statistical Methods

1 Distribution, conditional distribution and moments

We consider two kinds of random variables: discrete and continuous. For a discrete random variable $X$, the possible values of $X$ are finite or countably infinite, $x_1, x_2, \ldots$, and $X$ has a probability function $p(x_i) = P(X = x_i)$ with the properties

$$p(x_i) > 0, \qquad \sum_{i=1}^{\infty} p(x_i) = 1.$$

For a continuous random variable $X$, there is a non-negative function $f(x)$, called the density function, with the property that for any real numbers $a$ and $b$,

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

The moments of a random variable $X$ include the expectation $E(X)$, the variance $\mathrm{Var}(X) = E[X - E(X)]^2$, etc., where

$$E(X) = \sum_{i=1}^{\infty} x_i\, p(x_i) \quad \text{for a discrete r.v. } X,$$
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \quad \text{for a continuous r.v. } X,$$
$$\mathrm{Var}(X) = \sum_{i=1}^{\infty} [x_i - E(X)]^2\, p(x_i) \quad \text{for discrete } X,$$
$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} [x - E(X)]^2 f(x)\,dx \quad \text{for continuous } X.$$

For more than one random variable, probability functions (discrete case) and density functions (continuous case) are defined in a similar way. For two discrete random variables $(X, Y)$, suppose the possible values of $X$ are $x_1, x_2, \ldots$ and the possible values of $Y$ are $y_1, y_2, \ldots$. The conditional probability function is

$$p(x_i \mid Y = y) = \frac{P(X = x_i,\, Y = y)}{P(Y = y)}.$$

For two continuous random variables $(X, Y)$, let $f(x, y)$ be the joint density of $(X, Y)$. The conditional density of $X$ given $Y = y$ is

$$f(x \mid y) = \frac{f(x, y)}{f_Y(y)},$$

where $f_Y(y) = \int f(x, y)\,dx$ is the marginal density of $Y$.

The conditional moments, such as the expectation and variance of $X$ given $Y = y$, follow the formulas above with the probability or density replaced by the conditional probability or conditional density given $Y = y$.
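As a quick numerical check of these definitions, here is a short Python sketch that computes $E(X)$, $\mathrm{Var}(X)$, and a conditional distribution directly from a probability table; the distributions below are made up for illustration, not taken from the text:

```python
# Moments of a discrete random variable and a conditional distribution,
# computed directly from the definitions above (illustrative numbers only).
xs = [0, 1, 2]
px = [0.2, 0.5, 0.3]                      # p(x_i) > 0, sums to 1

EX = sum(x * p for x, p in zip(xs, px))   # E(X) = sum of x_i p(x_i)
VarX = sum((x - EX) ** 2 * p for x, p in zip(xs, px))
print(EX, VarX)                           # 1.1 0.49

# Joint probability table p(x, y) for X in {0,1}, Y in {0,1}
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}
pY1 = sum(p for (x, y), p in joint.items() if y == 1)   # marginal P(Y = 1)
cond = {x: joint[(x, 1)] / pY1 for x in (0, 1)}         # p(x | Y = 1)
print(cond)                               # {0: 0.4285..., 1: 0.5714...}
```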

1.1 Some special distributions

1. Binomial distribution $B(n, p)$. If a random variable $X$ has possible values $0, 1, 2, \ldots, n$ and probability function

$$P(X = k) = C_n^k\, p^k (1 - p)^{n-k},$$

we say that $X$ has a binomial distribution, denoted $X \sim B(n, p)$, with

$$E(X) = np, \qquad \mathrm{Var}(X) = np(1 - p).$$

Background: if an experiment has only two outcomes, success and failure, let $p = P(\text{success})$ and repeat the experiment $n$ times independently. If $X$ denotes the number of successes among the $n$ experiments, then $X \sim B(n, p)$.

Example: consider a biallelic marker with codominant alleles $A$ and $a$, and $n$ individuals. Let $p$ denote the frequency of allele $A$ and let $X$ denote the number of copies of allele $A$ among the $n$ individuals (who carry $2n$ alleles in total); then $X \sim B(2n, p)$.

2. Multinomial distribution (generalization of the binomial distribution). The experiment has $m$ possible outcomes. Let $p = (p_1, \ldots, p_m)$, with $p_i$ the probability of the $i$th outcome ($\sum_{i=1}^m p_i = 1$). We repeat the experiment $n$ times independently. Let $X_i$ denote the number of occurrences of the $i$th outcome among the $n$ experiments and $X = (X_1, \ldots, X_m)$; then we say $X$ has a multinomial distribution, denoted $X \sim M_m(n, p)$. Let $N = (n_1, \ldots, n_m)$ with $\sum_{i=1}^m n_i = n$; then

$$P(X = N) = P(X_1 = n_1, \ldots, X_m = n_m) = \frac{n!}{n_1! \cdots n_m!}\, p_1^{n_1} \cdots p_m^{n_m},$$

and

$$E(X_i) = np_i, \qquad \mathrm{Var}(X_i) = np_i(1 - p_i).$$

Note: $M_2(n, p)$ is equivalent to $B(n, p_1)$, since $X_1 \sim B(n, p_1)$ and $X_2 = n - X_1$.

Example: consider a marker with $m$ codominant alleles $A_1, \ldots, A_m$ and $n$ individuals. Let $p_i$ denote the frequency of allele $A_i$ and $X_i$ the number of copies of allele $A_i$ among the $n$ individuals; then $X = (X_1, \ldots, X_m) \sim M_m(2n, p)$, where $p = (p_1, \ldots, p_m)$. So $E(X_i) = 2np_i$. (Both distributions are checked numerically in the sketch after this list.)

3. Some continuous distributions: normal, $\chi^2$, gamma, etc.
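The binomial and multinomial formulas above are easy to verify numerically; a small sketch using scipy.stats, where the values $n = 10$, $p = 0.3$, etc. are arbitrary illustrations:

```python
# Numerical check of the binomial and multinomial formulas above.
from scipy.stats import binom, multinomial

n, p = 10, 0.3
print(binom.pmf(4, n, p))          # P(X = 4) = C(10,4) 0.3^4 0.7^6
print(binom.mean(n, p),            # E(X) = np = 3.0
      binom.var(n, p))             # Var(X) = np(1-p) = 2.1

# Multinomial with m = 3 outcomes: P(X1 = 2, X2 = 3, X3 = 5)
probs = [0.2, 0.3, 0.5]
print(multinomial.pmf([2, 3, 5], n=10, p=probs))
# Marginally X_i ~ B(n, p_i), so E(X_i) = n p_i:
print([10 * q for q in probs])     # [2.0, 3.0, 5.0]
```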

2 Likelihood, Maximum Likelihood Estimator and EM Algorithm

1. Likelihood

When we have collected data, the probability $P(\text{data} \mid \text{parameters})$ is called the likelihood. Statistical inference is usually based on the likelihood. Mathematically, the data set we collect is also called a sample, denoted $X_1, X_2, \ldots, X_n$. In most cases the sampled individuals are iid (independent, identically distributed), having the same distribution as a random variable $X$ called the population. If the sampled individuals are independent, the likelihood is

$$L(\theta) = P(X_1, \ldots, X_n \mid \theta) = \prod_{i=1}^n P(X_i \mid \theta) = \begin{cases} \prod_{i=1}^n p(X_i \mid \theta) & \text{for a discrete r.v.,} \\ \prod_{i=1}^n f(X_i \mid \theta) & \text{for a continuous r.v.} \end{cases}$$

2. The maximum likelihood estimator (MLE) $\hat\theta(X_1, X_2, \ldots, X_n)$ of the parameter $\theta$ is the maximizer of the likelihood function:

$$L(\hat\theta) = \max_{\theta} L(\theta).$$

Example: consider a marker with three codominant alleles $A, B, C$. We sample $n$ individuals and obtain the data $n_{AA}, n_{BB}, n_{CC}, n_{AB}, n_{AC}, n_{BC}$, the numbers of individuals with genotypes $AA, BB, CC, AB, AC, BC$, respectively. Let $p_A, p_B, p_C$ denote the population frequencies of alleles $A, B, C$. Find the MLEs of $p_A, p_B, p_C$.

Solution: noting the multinomial distribution of $n_{AA}, n_{BB}, n_{CC}, n_{AB}, n_{AC}, n_{BC}$ (with Hardy-Weinberg genotype probabilities), the likelihood of the data up to a constant, writing $\theta = (p_A, p_B, p_C)$, is

$$L(\theta) = (p_A^2)^{n_{AA}} (p_B^2)^{n_{BB}} (p_C^2)^{n_{CC}} (2p_A p_B)^{n_{AB}} (2p_A p_C)^{n_{AC}} (2p_B p_C)^{n_{BC}}.$$

The log-likelihood is

$$\log L(\theta) = 2n_{AA} \log p_A + 2n_{BB} \log p_B + 2n_{CC} \log p_C + n_{AB}(\log p_A + \log p_B) + n_{AC}(\log p_A + \log p_C) + n_{BC}(\log p_B + \log p_C)$$
$$= (2n_{AA} + n_{AB} + n_{AC}) \log p_A + (2n_{BB} + n_{AB} + n_{BC}) \log p_B + (2n_{CC} + n_{AC} + n_{BC}) \log p_C.$$

So the MLEs of $p_A, p_B, p_C$ are given by

$$\hat p_A = \frac{2n_{AA} + n_{AB} + n_{AC}}{2n}, \qquad \hat p_B = \frac{2n_{BB} + n_{AB} + n_{BC}}{2n}, \qquad \hat p_C = \frac{2n_{CC} + n_{AC} + n_{BC}}{2n}.$$
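These closed-form MLEs amount to counting alleles and dividing by the total number of alleles $2n$. A minimal sketch, with hypothetical genotype counts:

```python
# Complete-data MLEs for the three-allele example: count alleles, divide
# by 2n. The genotype counts below are hypothetical.
counts = {"AA": 30, "BB": 20, "CC": 10, "AB": 20, "AC": 12, "BC": 8}
n = sum(counts.values())                       # 100 individuals, 2n alleles

p_A = (2 * counts["AA"] + counts["AB"] + counts["AC"]) / (2 * n)
p_B = (2 * counts["BB"] + counts["AB"] + counts["BC"]) / (2 * n)
p_C = (2 * counts["CC"] + counts["AC"] + counts["BC"]) / (2 * n)
print(p_A, p_B, p_C, p_A + p_B + p_C)          # 0.46 0.34 0.2 1.0
```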

If the marker in the last example is the ABO locus, there are some difficulties in estimating the parameters. For $n$ sampled individuals, the data are $n_A, n_B, n_{AB}, n_O$, the counts of phenotypes (blood types) $A, B, AB, O$. How do we get the MLEs of the frequencies of alleles $A, B, O$ in this case? The data are not complete as in the last example; the complete data would be $n_{AA}, n_{BB}, n_{OO}, n_{AB}, n_{AO}, n_{BO}$. In this case we can use the EM algorithm.

3. EM algorithm

Let $X$ denote the unobserved complete data ($n_{AA}, n_{BB}, n_{OO}, n_{AB}, n_{AO}, n_{BO}$ in the example). Let $Y$ denote the observed incomplete data ($n_A, n_B, n_{AB}, n_O$). Some function $t(X) = Y$ collapses $X$ onto $Y$. The general idea is to choose $X$ so that maximum likelihood becomes trivial for the complete data. The complete data are assumed to have a probability density (or likelihood) $f(X \mid \theta)$ that is a function of $\theta$ as well as $X$. The EM algorithm has two steps:

(a) E-step: calculate the conditional expectation

$$G(\theta \mid \theta_m) = E(\log f(X \mid \theta) \mid Y, \theta_m),$$

where $\theta_m$ is the current estimate of $\theta$.

(b) M-step: maximize $G(\theta \mid \theta_m)$ with respect to $\theta$. This yields a new estimate of $\theta$, denoted $\theta_{m+1}$.

We repeat the two steps until convergence occurs. Convergence can be monitored through $\theta_m$ or through the likelihood of the observed data, $f(Y \mid \theta)$.

Consider the frequencies of alleles $A, B, O$ at the ABO locus, with $\theta = (\theta_1, \theta_2, \theta_3) = (p_A, p_B, p_O)$. The log-likelihood of the complete data is

$$\log f(X \mid \theta) = (2n_{AA} + n_{AB} + n_{AO}) \log p_A + (2n_{BB} + n_{AB} + n_{BO}) \log p_B + (2n_{OO} + n_{AO} + n_{BO}) \log p_O.$$

E-step:

$$G(\theta \mid \theta_m) = \big(2E(n_{AA} \mid Y, \theta_m) + E(n_{AB} \mid Y, \theta_m) + E(n_{AO} \mid Y, \theta_m)\big) \log p_A$$
$$+ \big(2E(n_{BB} \mid Y, \theta_m) + E(n_{AB} \mid Y, \theta_m) + E(n_{BO} \mid Y, \theta_m)\big) \log p_B$$
$$+ \big(2E(n_{OO} \mid Y, \theta_m) + E(n_{AO} \mid Y, \theta_m) + E(n_{BO} \mid Y, \theta_m)\big) \log p_O.$$

We know that $E(n_{OO} \mid Y, \theta_m) = n_{OO}$ and $E(n_{AB} \mid Y, \theta_m) = n_{AB}$, since both are observed. The conditional expectations we need to calculate are $E(n_{AA} \mid Y, \theta_m)$, $E(n_{AO} \mid Y, \theta_m)$, $E(n_{BB} \mid Y, \theta_m)$ and $E(n_{BO} \mid Y, \theta_m)$. Intuitively, within blood type $A$ (the $n_A$ individuals whose genotype may be $AA$ or $AO$), the proportion of individuals with genotype $AA$ is

$$\frac{p_{AA}}{p_{AA} + p_{AO}} = \frac{p_{mA}^2}{p_{mA}^2 + 2p_{mA}p_{mO}},$$

so the expected number of individuals with genotype $AA$ is

$$n_A\, \frac{p_{mA}^2}{p_{mA}^2 + 2p_{mA}p_{mO}}.$$

Mathematically, to calculate $E(n_{AA} \mid Y, \theta_m)$ we need the conditional distribution of $n_{AA}$ given $Y$, that is, given $n_A = n_{AA} + n_{AO}$. Given $n_A$, $n_{AA}$ can be considered a random variable with binomial distribution $B(n_A, p^*)$, where $p^*$ is the probability that an individual has genotype $AA$ given that the individual has blood type $A$. So

$$p^* = P(AA \mid \text{blood type } A) = \frac{P(AA)}{P(AA) + P(AO)} = \frac{p_{mA}^2}{p_{mA}^2 + 2p_{mA}p_{mO}}.$$

It follows that

$$E(n_{AA} \mid Y, \theta_m) = E(n_{AA} \mid n_A) = n_A p^* = n_A\, \frac{p_{mA}^2}{p_{mA}^2 + 2p_{mA}p_{mO}}.$$

By similar arguments we can calculate $E(n_{AO} \mid Y, \theta_m)$, $E(n_{BB} \mid Y, \theta_m)$ and $E(n_{BO} \mid Y, \theta_m)$.

M-step: with complete data $X$ we know the MLEs of the parameters, so

$$\hat p_A = \frac{2E(n_{AA} \mid Y, \theta_m) + E(n_{AB} \mid Y, \theta_m) + E(n_{AO} \mid Y, \theta_m)}{2n},$$
$$\hat p_B = \frac{2E(n_{BB} \mid Y, \theta_m) + E(n_{AB} \mid Y, \theta_m) + E(n_{BO} \mid Y, \theta_m)}{2n},$$
$$\hat p_O = \frac{2E(n_{OO} \mid Y, \theta_m) + E(n_{BO} \mid Y, \theta_m) + E(n_{AO} \mid Y, \theta_m)}{2n}.$$
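A minimal sketch of this EM iteration for the ABO locus; the observed phenotype counts are hypothetical, while the update formulas are exactly the E-step expectations and M-step MLEs derived above:

```python
# EM (gene counting) for ABO allele frequencies; observed counts are
# hypothetical illustration values.
nA, nB, nAB, nO = 40, 30, 10, 20
n = nA + nB + nAB + nO

pA, pB, pO = 1 / 3, 1 / 3, 1 / 3          # initial guess theta_0
for _ in range(100):
    # E-step: expected genotype counts given current frequencies.
    # Within blood type A, P(AA | type A) = pA^2 / (pA^2 + 2 pA pO).
    eAA = nA * pA**2 / (pA**2 + 2 * pA * pO)
    eAO = nA - eAA
    eBB = nB * pB**2 / (pB**2 + 2 * pB * pO)
    eBO = nB - eBB
    # M-step: complete-data MLEs with expected counts plugged in.
    pA_new = (2 * eAA + nAB + eAO) / (2 * n)
    pB_new = (2 * eBB + nAB + eBO) / (2 * n)
    pO_new = (2 * nO + eAO + eBO) / (2 * n)
    if abs(pA_new - pA) + abs(pB_new - pB) + abs(pO_new - pO) < 1e-10:
        break
    pA, pB, pO = pA_new, pB_new, pO_new
print(pA, pB, pO)
```

Each M-step has a closed form, which is what makes EM attractive here: no numerical optimization is needed within an iteration.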

3 Parametric Test Statistics

1. General setup of hypothesis testing

Suppose the sample $Y = (Y_1, \ldots, Y_n)'$ has joint density (or probability) function $f(y \mid \theta)$, where $\theta \in \Theta \subset R^d$ (here $\theta = (\theta_1, \ldots, \theta_d)'$ is a $d$-dimensional vector). Given $\Theta_0 \subset \Theta$ (strictly), it is desired to test $H_0: \theta \in \Theta_0$ versus $H_A: \theta \in \Theta \setminus \Theta_0$. The general procedure for a testing problem:

(a) Choose a test statistic $T(Y)$ that does not depend on unknown parameters.

(b) Decide the rejection region, for example $T(Y) > C$.

(c) For a given significance level $\alpha$, calculate the value of $C$, or calculate the p-value of the test, $\text{p-value} = P(T(Y) > T(y))$, where $y$ is the observed value of the sample $Y$. This step requires knowing the distribution of $T(Y)$.

In summary, a testing problem requires knowing (1) the test statistic, (2) the form of the rejection region, and (3) the distribution of the test statistic.
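For the three tests below, the null distribution is a $\chi^2$, so step (c) reduces to a chi-square tail computation. A brief sketch; $\alpha = 0.05$, $df = 2$, and the observed value 5.0 are placeholder choices:

```python
# Step (c) in practice: critical value C and p-value from the chi-square
# distribution (placeholder values for alpha, df, and the statistic).
from scipy.stats import chi2

alpha, df = 0.05, 2
C = chi2.ppf(1 - alpha, df)       # critical value: reject when T(Y) > C
t_obs = 5.0                       # observed value T(y) of the test statistic
p_value = chi2.sf(t_obs, df)      # P(T(Y) > T(y)) under the null
print(C, p_value, t_obs > C)      # 5.991... 0.0820... False -> do not reject
```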

2. Three methods to construct the test

(a) Likelihood ratio test (Neyman and Pearson, 1928). Let $\hat\theta$ and $\hat\theta_0$ be the MLEs of the parameter under $\Theta$ and $\Theta_0$, respectively, and let $L(\theta)$ denote the likelihood function of the sample $Y$. The likelihood ratio test statistic is

$$G^2(Y) = -2 \log\frac{L(\hat\theta_0)}{L(\hat\theta)}.$$

As $n \to \infty$, $G^2$ approximately has a $\chi^2$ distribution with $d - s$ degrees of freedom, where $d$ and $s$ are the dimensions of $\Theta$ and $\Theta_0$, respectively. For a given significance level $\alpha$, the likelihood ratio test rejects $H_0$ if and only if $G^2 > \chi^2_{\alpha, d-s}$. The p-value of the test is $P(\chi^2_{d-s} > G^2(y))$.

(b) Score test (Rao, 1947). The $i$th score of $Y$ is $S_i(\theta) = \partial \log L(\theta) / \partial \theta_i$ for $i = 1, \ldots, d$. The vector $S(\theta) = (S_1(\theta), \ldots, S_d(\theta))'$ satisfies $S(\hat\theta) = 0$. This observation suggests rejecting $H_0$ when $S(\hat\theta_0)$ is far from the origin. The score test statistic is

$$X^2(Y) = S(\hat\theta_0)'\, I^{-1}(\hat\theta_0)\, S(\hat\theta_0).$$

Here $I(\theta)$ is the $d \times d$ information matrix with $(i, j)$th element

$$(I(\theta))_{ij} = E\left[-\frac{\partial^2 \log L(\theta)}{\partial \theta_i \partial \theta_j}\right].$$

As $n \to \infty$, $X^2$ approximately has a $\chi^2$ distribution with $d - s$ degrees of freedom. For a given significance level $\alpha$, the score test rejects $H_0$ if and only if $X^2 > \chi^2_{\alpha, d-s}$. The p-value of the test is $P(\chi^2_{d-s} > X^2(y))$.

(c) Wald test (Wald, 1943). The Wald test is designed for testing a null hypothesis of the form

$$\Theta_0 = \{\theta \in \Theta : h_1(\theta) = 0, \ldots, h_r(\theta) = 0\},$$

for example $\theta_1 + \theta_3 = 0$, $\theta_2 + \theta_4 = 0$. Let $h(\theta) = (h_1(\theta), \ldots, h_r(\theta))'$. The idea of the test is to reject $H_0: h(\theta) = 0$ when the vector $h(\hat\theta)$ is far from the null value of zero. The Wald test statistic is

$$W = h(\hat\theta)' \left[(\nabla h(\hat\theta))'\, I^{-1}(\hat\theta)\, \nabla h(\hat\theta)\right]^{-1} h(\hat\theta),$$

where $\nabla h(\hat\theta)$ is the $d \times r$ matrix with $(i, j)$th element $(\nabla h(\hat\theta))_{ij} = \partial h_j(\theta)/\partial \theta_i$. This test statistic also approximately has a $\chi^2$ distribution with $r$ degrees of freedom (note $s = d - r$, so $r = d - s$).

Example 1. Let $Y \sim M_m(n, p)$ with unknown parameter $p$ belonging to the $(m-1)$-dimensional simplex

$$\Theta = \{(p_1, \ldots, p_m) : p_i \ge 0,\ \sum_{i=1}^m p_i = 1\}.$$

The null hypothesis is $H_0: p = p^0$, i.e., $\Theta_0 = \{p^0\}$. Construct the likelihood ratio test, the score test, and the Wald test.

Example 2. Use the tests you constructed for the following problem: we sampled 100 individuals, each genotyped at a marker with three codominant alleles $A_1, A_2, A_3$. Among the 100 individuals (200 alleles) we observed 60 copies of allele $A_1$, 80 copies of $A_2$ and 60 copies of $A_3$. Let $p_1, p_2, p_3$ be the allele frequencies of $A_1, A_2, A_3$, respectively. Test the null hypothesis $H_0: p_1 = p_2 = p_3 = 1/3$.

Likelihood ratio test: the log-likelihood function up to a constant is

$$\log L(p) = \sum_{i=1}^m Y_i \log p_i.$$

The MLE under $\Theta$ is $\hat p_i = Y_i / n$, $i = 1, \ldots, m$. The MLE under $\Theta_0$ is $p = p^0$. So the likelihood ratio test statistic is

$$G^2(Y) = -2\big(\log L(p^0) - \log L(\hat p)\big) = -2 \sum_{i=1}^m \big(Y_i \log p_i^0 - Y_i(\log Y_i - \log n)\big) = 2 \sum_{i=1}^m Y_i \log\frac{Y_i}{np_i^0},$$

which has an asymptotic $\chi^2_{m-1}$ distribution.

For the specific test problem, $m = 3$, $n = 200$, $Y_1 = 60$, $Y_2 = 80$ and $Y_3 = 60$. So

$$G^2(Y) = 2\left(60 \log\frac{9}{10} + 80 \log\frac{6}{5} + 60 \log\frac{9}{10}\right) \approx 3.88, \qquad \text{p-value} = P(\chi^2_2 > 3.88) \approx 0.1433.$$
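The worked numbers above are easy to reproduce; a short sketch computing $G^2$ and its p-value for Example 2:

```python
# Likelihood ratio statistic G^2 for Example 2 (n = 200 alleles, counts
# 60, 80, 60, null p_i = 1/3), reproducing G^2 ~ 3.88.
from math import log
from scipy.stats import chi2

Y = [60, 80, 60]
n = sum(Y)                                   # 200
p0 = [1 / 3] * 3
G2 = 2 * sum(y * log(y / (n * q)) for y, q in zip(Y, p0))
print(G2, chi2.sf(G2, df=2))                 # 3.886... 0.1434...
```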

Score test: the log-likelihood is

$$\log L(p) = \sum_{i=1}^{m-1} Y_i \log p_i + Y_m \log(1 - p_1 - \cdots - p_{m-1}).$$

So

$$S_j(p) = \frac{\partial \log L(p)}{\partial p_j} = \frac{Y_j}{p_j} - \frac{Y_m}{p_m}, \qquad 1 \le j \le m-1,$$

and

$$-\frac{\partial^2 \log L(p)}{\partial p_k \partial p_j} = \begin{cases} \dfrac{Y_j}{p_j^2} + \dfrac{Y_m}{p_m^2}, & \text{if } k = j, \\[1ex] \dfrac{Y_m}{p_m^2}, & \text{if } k \ne j. \end{cases}$$

Taking expectations ($E\,Y_j = np_j$) gives

$$E_p\left(-\frac{\partial^2 \log L}{\partial p_k \partial p_j}\right) = \begin{cases} \dfrac{n}{p_j} + \dfrac{n}{p_m}, & \text{if } k = j, \\[1ex] \dfrac{n}{p_m}, & \text{if } k \ne j, \end{cases}$$

and thus the information matrix is

$$I(p) = n\left[\mathrm{diag}\left(\frac{1}{p_1}, \ldots, \frac{1}{p_{m-1}}\right) + \frac{1}{p_m}\, 1_{m-1} 1_{m-1}'\right],$$

where $1_{m-1} = (1, 1, \ldots, 1)'$ is the $(m-1)$-dimensional vector of ones. The MLE under $H_0$ is $p^0 = (p_1^0, \ldots, p_{m-1}^0)'$, and the inverse is

$$I^{-1}(p^0) = \frac{1}{n}\left[\mathrm{diag}(p_1^0, \ldots, p_{m-1}^0) - p^0 (p^0)'\right].$$

The score test statistic is

$$X^2 = S(p^0)'\, I^{-1}(p^0)\, S(p^0) = \frac{1}{n}\left(\frac{Y_1}{p_1^0} - \frac{Y_m}{p_m^0}, \ldots, \frac{Y_{m-1}}{p_{m-1}^0} - \frac{Y_m}{p_m^0}\right) \left[\mathrm{diag}(p_1^0, \ldots, p_{m-1}^0) - p^0(p^0)'\right] \left(\frac{Y_1}{p_1^0} - \frac{Y_m}{p_m^0}, \ldots, \frac{Y_{m-1}}{p_{m-1}^0} - \frac{Y_m}{p_m^0}\right)'$$
$$= \sum_{i=1}^m \frac{(Y_i - np_i^0)^2}{np_i^0} \sim \chi^2_{m-1},$$

which is the standard Pearson chi-square test, $X^2 = \sum_i (Y_i - E Y_i)^2 / E Y_i$.

For the specific test problem:

$$X^2 = \frac{(60 - 200/3)^2}{200/3} + \frac{(80 - 200/3)^2}{200/3} + \frac{(60 - 200/3)^2}{200/3} = 4, \qquad \text{p-value} = P(\chi^2_2 > 4) \approx 0.135.$$
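The matrix form of the score statistic can be checked against the Pearson formula numerically. A sketch for Example 2, parameterizing by the free coordinates $(p_1, p_2)$ as in the derivation above:

```python
# Score test for Example 2 via the matrix formulas above:
# X^2 = S(p0)' I^{-1}(p0) S(p0), which should match Pearson's chi-square.
import numpy as np
from scipy.stats import chi2

Y = np.array([60.0, 80.0, 60.0])
n = Y.sum()                                   # 200
p0 = np.array([1 / 3, 1 / 3, 1 / 3])

# Free parameters are (p1, p2); p3 = 1 - p1 - p2.
S = Y[:2] / p0[:2] - Y[2] / p0[2]             # score vector at p0
I = n * (np.diag(1 / p0[:2]) + np.ones((2, 2)) / p0[2])
X2 = S @ np.linalg.inv(I) @ S
pearson = ((Y - n * p0) ** 2 / (n * p0)).sum()
print(X2, pearson, chi2.sf(X2, df=2))         # 4.0 4.0 0.1353...
```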

Wald test: let $h_i(p) = p_i - p_i^0$, $i = 1, \ldots, m-1$. By an argument similar to the one above, the Wald test statistic works out to

$$W = \sum_{i=1}^m \frac{(Y_i - np_i^0)^2}{Y_i} \sim \chi^2_{m-1}.$$

For the specific problem,

$$W = \frac{(60 - 200/3)^2}{60} + \frac{(80 - 200/3)^2}{80} + \frac{(60 - 200/3)^2}{60} \approx 3.7, \qquad \text{p-value} = P(\chi^2_2 > 3.7) \approx 0.157.$$
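A corresponding sketch for the Wald statistic of Example 2; note that it differs from the Pearson statistic only in using the observed counts $Y_i$ rather than the expected counts $np_i^0$ in the denominator:

```python
# Wald statistic for Example 2: W = sum (Y_i - n p_i^0)^2 / Y_i ~ 3.7.
from scipy.stats import chi2

Y = [60, 80, 60]
n = sum(Y)                                    # 200
p0 = [1 / 3] * 3
W = sum((y - n * q) ** 2 / y for y, q in zip(Y, p0))
print(W, chi2.sf(W, df=2))                    # 3.703... 0.1569...
```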