Introductory statistics

CM9S: Machine Learning for Bioinformatics — Lecture - 03/3/06
Lecturer: Sriram Sankararaman    Scribe: Sriram Sankararaman

We will provide an overview of statistical inference, focusing on the key concepts. For more details, as well as more precise statements of the mathematical results (e.g., regularity conditions), consult a standard statistical text (Casella and Berger).

1 Statistical inference

(X_1, ..., X_n) is the data, drawn from a distribution P. θ is an unknown population parameter, i.e., a function of P. The goal of statistical inference is to learn θ from (X_1, ..., X_n). There are three broad and related tasks in statistical inference:

1. Estimation
2. Hypothesis testing
3. Prediction

For each of these tasks, we have procedures for finding an estimator or test or predictor, and procedures for evaluating them. We will focus on the first two tasks.

2 Point estimation

A statistic is a function of the data. The goal of point estimation is to find a statistic t such that θ̂ = t(X_1, ..., X_n) is close to θ; θ̂ is an estimator of θ. Some procedures to find point estimators:

1. Maximum likelihood
2. Method of moments
3. Bayesian methods

2.1 Maximum likelihood estimation

Definition 2.1. Given X_1, ..., X_n ~ P(x | θ), the likelihood function is

    L(θ | x) = ∏_{i=1}^n P(x_i | θ).

Definition 2.2. A maximum likelihood estimator (MLE) is

    θ̂(x) = argmax_θ L(θ | x) = argmax_θ LL(θ | x),
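To make these definitions concrete, here is a minimal Python sketch (not from the notes; the Bernoulli model, data, and grid are made up for illustration) that finds an MLE by maximizing the log-likelihood over a grid of candidate parameters:

```python
import math

def log_likelihood(theta, xs):
    # LL(theta | x) = sum_i log P(x_i | theta) for a Bernoulli(theta) model
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)

xs = [1, 0, 1, 1, 0, 1, 1, 1]              # hypothetical data: 6 successes in 8 trials
grid = [i / 1000 for i in range(1, 1000)]  # candidate theta values in (0, 1)
mle = max(grid, key=lambda t: log_likelihood(t, xs))
print(mle)  # prints 0.75, the sample mean 6/8 -- the analytic Bernoulli MLE
```

The grid search is only a stand-in for the calculus (or numerical optimization) one would normally use; here it recovers the closed-form answer.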

where

    LL(θ | x) = log L(θ | x) = ∑_{i=1}^n log P(x_i | θ).

For a differentiable log-likelihood function, the possible candidates for the MLE are the θ that solve

    d/dθ LL(θ) = 0.

Why only possible candidates? This is not a sufficient condition for a maximum (unless the log-likelihood is concave), and the maximum could also be on the boundary of the parameter space.

Example 1 (Normal likelihood). X_1, ..., X_n ~ N(μ, σ²):

    μ̂ = x̄ = (1/n) ∑_{i=1}^n x_i
    σ̂² = (1/n) ∑_{i=1}^n (x_i − x̄)² = ((n − 1)/n) S²

where S² = (1/(n − 1)) ∑_{i=1}^n (x_i − x̄)² is the sample variance.

We would like to evaluate point estimators. Generally, we can evaluate estimators in a finite sample (asking how an estimator behaves for a fixed sample size) or in an asymptotic setting (asking how it behaves as the sample size becomes large).

2.2 Finite sample

Definition 2.3. The bias of an estimator is

    bias(θ̂_n) = E[θ̂_n] − θ.

An estimator is unbiased if its bias is 0.

Definition 2.4. The variance of an estimator is

    var(θ̂_n) = E[(θ̂_n − E[θ̂_n])²].

Definition 2.5. The mean squared error of an estimator is

    mse(θ̂_n) = E[(θ̂_n − θ)²].

The MSE decomposes as

    mse(θ̂_n) = bias(θ̂_n)² + var(θ̂_n).

These definitions assess the performance of the estimator over repeated experiments, i.e., over new data that we do not see in practice. Comparing estimators is not straightforward, though. Besides depending on the evaluation criterion, the performance of an estimator can depend on the true parameter.

Example 1 (Normal likelihood, continued).

1. X̄_n is an unbiased estimator of μ.
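The bias and MSE definitions can be checked by simulation. This is an illustrative sketch (Python; the constants are arbitrary) comparing the two variance estimators σ̂² = (1/n)∑(x_i − x̄)² and S² = (1/(n−1))∑(x_i − x̄)² over repeated experiments:

```python
import random

random.seed(0)
mu, sigma2, n, reps = 0.0, 1.0, 10, 50000

sig_hat, s2 = [], []
for _ in range(reps):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    sig_hat.append(ss / n)        # MLE sigma^2-hat: biased, bias -sigma^2/n
    s2.append(ss / (n - 1))       # sample variance S^2: unbiased

def mse(est):
    return sum((e - sigma2) ** 2 for e in est) / reps

bias_mle = sum(sig_hat) / reps - sigma2   # should be near -sigma^2/n = -0.1
bias_s2 = sum(s2) / reps - sigma2         # should be near 0
print(bias_mle, bias_s2, mse(sig_hat) < mse(s2))
```

With these constants the estimated biases come out near −0.1 and 0, and the MLE's MSE is smaller despite its bias, matching the bias and MSE claims in the example.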

2. μ̂ = X_1 (the first observation) is also an unbiased estimator of μ.
3. Var[X̄_n] = σ²/n, while Var[μ̂] = σ².
4. σ̂² is not an unbiased estimator of σ²; its bias is −σ²/n.
5. S² is an unbiased estimator of σ².
6. On the other hand, mse(σ̂²) < mse(S²).
7. S is not an unbiased estimator of σ.
8. X̄_n and S² are unbiased estimators of the population mean and variance for any i.i.d. sample from a population with finite mean and variance.

2.3 Asymptotics

It is often challenging to compute finite-sample properties. Alternately, we can ask how the estimator behaves when the sample size goes to infinity. To analyze behavior when the sample size goes to infinity, we will need a notion of the limit of a sequence of random variables, much like the limit of a sequence of numbers.

Definition 2.6 (Convergence in probability). A sequence of random variables X_1, X_2, ... converges in probability to a random variable X, denoted X_n →p X, if for every ε > 0,

    lim_{n→∞} Pr(|X_n − X| > ε) = 0.

Definition 2.7 (Convergence in distribution). A sequence of random variables X_1, X_2, ... converges in distribution to a random variable X, denoted X_n →d X, if

    lim_{n→∞} F_{X_n}(x) = F(x)

for all x where F(x) is continuous.

Theorem 1 (Weak Law of Large Numbers). Let X_1, X_2, ... be i.i.d. random variables with E[X_i] = μ, Var[X_i] = σ² < ∞. Then X̄_n →p μ.

Theorem 2 (Central Limit Theorem). Let X_1, X_2, ... be i.i.d. random variables with E[X_i] = μ, Var[X_i] = σ² < ∞. Then

    √n (X̄_n − μ)/σ →d N(0, 1).

With these concepts, we can now talk about asymptotic properties of point estimators.

Definition 2.8 (Consistency). A sequence of estimators θ̂_n is a consistent sequence of estimators of a parameter θ if θ̂_n →p θ.

Example 1 (Normal likelihood, continued). μ̂ = x̄ is a consistent estimator of μ; σ̂² is a consistent estimator of σ²; S² is a consistent estimator of σ²; S is a consistent estimator of σ.

Theorem 3 (Consistency of the MLE). Let X_1, ..., X_n ~ f(x | θ) and let θ̂_n denote the MLE of θ. Then θ̂_n →p θ. If the bias of θ̂_n → 0 and its variance → 0, i.e., its MSE goes to 0, then it is consistent.
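A small simulation can illustrate the weak law of large numbers, i.e., convergence in probability of X̄_n to μ: the probability of a deviation larger than ε shrinks as n grows. A sketch with made-up constants (Bernoulli draws with mean μ):

```python
import random

random.seed(1)
mu, eps, reps = 0.5, 0.1, 2000

def deviation_rate(n):
    # Monte Carlo estimate of Pr(|X_bar_n - mu| > eps) for Bernoulli(mu) draws
    count = 0
    for _ in range(reps):
        xbar = sum(random.random() < mu for _ in range(n)) / n
        count += abs(xbar - mu) > eps
    return count / reps

rates = [deviation_rate(n) for n in (10, 100, 1000)]
print(rates)  # decreasing toward 0 as n grows, as the WLLN predicts
```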

Theorem 4 (Asymptotic Normality of the MLE).

    √n (θ̂_n − θ) →d N(0, I(θ)⁻¹)

where

    I(θ) = −E[∂²/∂θ² log P(X | θ)]

is the Fisher information associated with the model.

Example 2 (Normal likelihood). Let us focus on the mean. For a single observation X ~ N(μ, σ²):

    ∂/∂μ log P(X | θ) = (x − μ)/σ²
    ∂²/∂μ² log P(X | θ) = −1/σ²
    I(μ) = 1/σ²

This makes intuitive sense: the inverse of the Fisher information is the asymptotic variance. The smaller the variance, the larger the Fisher information, and the more precise the estimates of the mean should be.

3 Hypothesis testing

A hypothesis is a constraint on the parameter θ. Examples:

1. Offspring have an equal chance of inheriting either of the two alleles from a parent (Mendel's first law holds)
2. Two genes segregate independently (Mendel's second law holds)
3. A genetic mutation has no effect on disease
4. A drug has no effect on blood pressure

Definition 3.1. Null hypothesis: H_0 : θ ∈ Θ_0. Alternate hypothesis: H_1 : θ ∈ Θ_1.

Definition 3.2 (Hypothesis test). A rule that specifies the values of the sample for which H_0 should be accepted or rejected. Typically, a hypothesis test is specified in terms of a test statistic t(X_1, ..., X_n) and a rejection/acceptance region.

3.1 Finding tests

One commonly used procedure for finding tests is the likelihood ratio test. Other procedures include the Wald test and the score test.

Definition 3.3 (LRT statistic).

    λ(x) = sup_{θ ∈ Θ_0} L(θ | x) / sup_{θ ∈ Θ} L(θ | x) = L(θ̂_0 | x) / L(θ̂ | x),    Θ = Θ_0 ∪ Θ_1.

Definition 3.4 (Likelihood Ratio Test (LRT)). A likelihood ratio test (LRT) is a test that rejects H_0 if λ(x) < c.
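As a concrete (hypothetical) instance of Definition 3.3, the sketch below computes the LRT statistic for H_0 : θ = 0.5 against an unrestricted alternative in a Bernoulli model; the supremum over Θ_0 is just the likelihood at 0.5, while the unrestricted supremum is attained at the MLE k/n:

```python
import math

def bernoulli_loglik(theta, k, n):
    # log L(theta | data) for k successes in n trials
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

k, n = 14, 20                      # made-up data: 14 successes in 20 trials
theta0 = 0.5                       # H0 restricts theta to {0.5}
theta_hat = k / n                  # unrestricted MLE
log_lam = bernoulli_loglik(theta0, k, n) - bernoulli_loglik(theta_hat, k, n)
lam = math.exp(log_lam)            # lambda(x) = L(theta0 | x) / L(theta_hat | x), in (0, 1]
print(lam)                         # small lambda is evidence against H0
```

By construction λ ≤ 1, since the unrestricted maximum is at least as large as the restricted one; here λ is well below 1, reflecting that 14/20 is far from 0.5.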

Example (Test the mean of a normal model). Given X_1, ..., X_n ~ N(μ, 1), test H_0 : μ = 0 versus H_1 : μ ≠ 0.

    λ(x) = ∏_{i=1}^n exp(−x_i²/2) / ∏_{i=1}^n exp(−(x_i − μ̂)²/2) = exp(−n x̄²/2)

Rejecting H_0 if λ(x) < c is equivalent to rejecting H_0 if |x̄| > √((2/n) log(1/c)). How do we choose c?

3.2 Evaluating tests

                Decision
    Truth     H_0     H_1
    H_0       TN      FP
    H_1       FN      TP

FP: false positive (type-I error); FN: false negative (type-II error); TP: true positive; TN: true negative.

For θ ∈ Θ_0, the probability of a false positive for a test R is P_θ(Test rejects). For θ ∈ Θ_1, the probability of a false negative is P_θ(Test accepts). To evaluate a test, we need to trade off false positives and false negatives: among tests with false positive probability below some threshold, what is the false negative probability?

Definition 3.5 (Level of a test). A test has level α if sup_{θ ∈ Θ_0} P_θ(Test rejects) ≤ α. That is, α is an upper bound on the false positive probability.

Remark 1. Often we set α = 0.05.

For the LRT, choose c to control α: sup_{θ ∈ Θ_0} P_θ(λ(X) ≤ c) = α. A trivial test is to never reject. Instead, we would like to maximize the probability of rejecting the null, i.e., the power, when it is false, while controlling the false positive probability.

Example (Normal mean, continued). For the test of the normal mean, Θ_0 = {0}. For θ = 0, the test statistic √n X̄_n ~ N(0, 1); we choose c to achieve the desired significance level:

    P_0(λ(X) ≤ c) = P_0(|X̄_n| ≥ √((2/n) log(1/c))) = P(|Z| ≥ √(2 log(1/c)))

Here Z ~ N(0, 1). P(|Z| ≥ z_{α/2}) = α, where z_{α/2} is the (1 − α/2)-quantile of the standard normal distribution. So we set √(2 log(1/c)) = z_{α/2}.
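For the normal-mean test, the rejection rule λ(x) < c with √(2 log(1/c)) = z_{α/2} is, by the algebra in the example, the same as rejecting when |x̄| > z_{α/2}/√n. A Python sketch (constants made up) checking the equivalence numerically:

```python
import math
from statistics import NormalDist

n, alpha = 25, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96
c = math.exp(-z * z / 2)                  # from sqrt(2 log(1/c)) = z_{alpha/2}

def lrt_rejects(xbar):
    lam = math.exp(-n * xbar ** 2 / 2)    # lambda(x) = exp(-n xbar^2 / 2)
    return lam < c

def threshold_rejects(xbar):
    return abs(xbar) > z / math.sqrt(n)

# The two rejection rules agree across a grid of sample means
for xbar in [i / 100 for i in range(-100, 101)]:
    assert lrt_rejects(xbar) == threshold_rejects(xbar)
print("reject when |xbar| >", z / math.sqrt(n))
```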

The significance level is a crude measure. If we reject the null hypothesis at level α = 0.05, it does not tell us how strong the evidence against H_0 is. The p-value of a test is the smallest α at which the test rejects H_0; thus, a small p-value implies stronger evidence against H_0.

Definition 3.6 (p-value). For a statistic t(X) such that large values of t give evidence against H_0, the p-value is

    p(x) = sup_{θ ∈ Θ_0} P_θ(t(X) ≥ t(x)).

A p-value is valid if for every 0 ≤ α ≤ 1,

    sup_{θ ∈ Θ_0} P_θ(p(X) ≤ α) ≤ α.

If p(X) is a valid p-value, a test that rejects H_0 when p(x) ≤ α is a level-α test.

Example (p-value for the test of the normal mean). The LRT rejects H_0 : μ = 0 if t(x) = √n |x̄| is large. If μ = 0, then √n X̄_n ~ N(0, 1). So if we observe data x, the p-value for the LRT is

    p(x) = P(|√n X̄_n| ≥ t(x)) = P(|Z| ≥ t(x)) = Φ(−t(x)) + 1 − Φ(t(x)) = 2(1 − Φ(t(x))).

We can compute the p-value for different values of the test statistic t (Table 1).

    t    p-value
    1    0.32
    2    0.046
    3    2.70e-03
    4    6.33e-05
    5    5.73e-07

Table 1: p-value for the test of the normal mean as a function of the test statistic t = √n |x̄|.

To use the LRT, we need to choose c so that the false positive probability is bounded. To compute the false positive probability, we need to know how the statistic is distributed under the null (the sampling distribution of the statistic). This is easy for the test of the normal mean but difficult more generally. An alternative approach to evaluating the LRT (or, equivalently, to choosing c) relies on asymptotics.

3.2.1 Asymptotics of the LRT

Theorem 5 (Wilks' theorem). Let X_1, ..., X_n ~ p(x | θ). For testing θ = θ_0 versus θ ≠ θ_0,

    −2 log λ(X_n) →d χ²_1.

For testing θ ∈ Θ_0 versus θ ∉ Θ_0, if ν is the difference in the number of free parameters between θ ∈ Θ and θ ∈ Θ_0, then

    −2 log λ(X_n) →d χ²_ν.

Rejecting H_0 iff −2 log λ(x) ≥ χ²_{ν,α} (the upper α-quantile of χ²_ν) gives us an asymptotic (hence approximate) level-α test.

Remark 2. One of the conditions for Wilks' theorem to hold is that the true parameter lies in the interior of the parameter space. This condition can be violated in practice, e.g., when the null hypothesis restricts the parameter to the boundary of the parameter space.
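For the normal-mean example the asymptotics are exact: −2 log λ(x) = n x̄² = t(x)², and the square of a standard normal is χ²_1, so the χ²_1 tail probability of t² equals the two-sided p-value 2(1 − Φ(t)). The sketch below recomputes the p-values from Φ (via Python's `statistics.NormalDist`):

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF, Phi
for t in [1, 2, 3, 4, 5]:
    p = 2 * (1 - phi(t))  # p-value = P(|Z| >= t) = 2(1 - Phi(t))
    print(t, f"{p:.2e}")
```

This prints 3.17e-01, 4.55e-02, 2.70e-03, 6.33e-05, 5.73e-07 for t = 1, ..., 5, matching the table of p-values for the normal-mean test.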

3.2.2 Permutation tests

In many instances, even asymptotic results are difficult to obtain. It is also not clear if the sample size is large enough for the asymptotics to hold.

Example 3 (Permutation test). Test whether the distributions of two samples are the same:

    X_1, ..., X_n ~ F
    Y_1, ..., Y_n ~ G
    H_0 : F = G = P_0

We choose the statistic t((X_1, ..., X_n), (Y_1, ..., Y_n)) = X̄ − Ȳ. The key idea is that under H_0, all permutations of the data are equally likely. We use this idea to compute the sampling distribution:

1. Randomly choose n out of the 2n elements to belong to group 1. The remaining n belong to group 2.
2. Compute the difference in means between the two groups.
3. How often does the difference in means on the permuted data exceed the observed difference?

In many cases, the number of permutations is too large to exhaustively enumerate. Then we use a random sample of permutations (a Monte Carlo approximation).

4 Interval estimation

We would like a measure of confidence on our estimates.

Definition 4.1. For a scalar parameter θ, given data X, an interval estimator is a pair of functions L(x_1, ..., x_n) and U(x_1, ..., x_n), with L ≤ U, such that [L(X), U(X)] covers θ.

Remark 3. The interval is random, while θ is fixed.

4.1 Evaluating interval estimators

Definition 4.2 (Coverage probability). The coverage probability of an interval estimator [L(X), U(X)] is the probability that this interval covers the true parameter θ, denoted P_θ(θ ∈ [L(X), U(X)]).

Definition 4.3 ((1 − α) confidence interval). An interval estimator is termed a (1 − α) confidence interval if inf_θ P_θ(θ ∈ [L(X), U(X)]) ≥ 1 − α.

4.2 Finding interval estimators

A common method for constructing confidence intervals is to invert a test statistic.

Example 4. Given X_1, ..., X_n ~ N(μ, 1), find a (1 − α) confidence interval for μ. Set H_0 : μ = μ_0 versus H_1 : μ ≠ μ_0. One level-α test rejects H_0 if

    x ∈ {x : |x̄ − μ_0| > z_{α/2}/√n}.

So H_0 is accepted for

    {x : |x̄ − μ_0| ≤ z_{α/2}/√n},

or equivalently:

    x̄ − z_{α/2}/√n ≤ μ_0 ≤ x̄ + z_{α/2}/√n.
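Returning briefly to the permutation test of Example 3 before completing the interval derivation: its Monte Carlo version can be sketched as follows (Python; the sample sizes, effect size, and number of permutations are made-up choices, and the +1 correction is a common convention, not from the notes):

```python
import random

random.seed(3)

def perm_test(xs, ys, n_perm=10000):
    # Two-sample permutation test with statistic t = |mean(X) - mean(Y)|.
    # Under H0: F = G, every relabeling of the pooled data is equally likely.
    observed = abs(sum(xs) / len(xs) - sum(ys) / len(ys))
    pooled = xs + ys
    n = len(xs)
    exceed = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / len(ys))
        exceed += diff >= observed
    return (exceed + 1) / (n_perm + 1)  # add-one correction avoids p = 0

xs = [random.gauss(0.0, 1.0) for _ in range(30)]  # group 1: mean 0
ys = [random.gauss(1.5, 1.0) for _ in range(30)]  # group 2: mean shifted by 1.5
p = perm_test(xs, ys)
print(p)  # small p-value: the permuted differences rarely exceed the observed one
```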

Since the test has size α, P(H_0 is accepted | μ = μ_0) = 1 − α:

    P(X̄ − z_{α/2}/√n ≤ μ_0 ≤ X̄ + z_{α/2}/√n | μ = μ_0) = 1 − α.

Since this is true for all μ_0, we have

    P_μ(X̄ − z_{α/2}/√n ≤ μ ≤ X̄ + z_{α/2}/√n) = 1 − α.

So [x̄ − z_{α/2}/√n, x̄ + z_{α/2}/√n] is a (1 − α) confidence interval.
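A simulation can confirm the coverage claim: over repeated samples, the interval [x̄ − z_{α/2}/√n, x̄ + z_{α/2}/√n] contains μ about (1 − α) of the time. A sketch with arbitrary constants:

```python
import math
import random
from statistics import NormalDist

random.seed(4)
n, alpha, mu, reps = 20, 0.05, 2.0, 20000
z = NormalDist().inv_cdf(1 - alpha / 2)
half = z / math.sqrt(n)   # half-width z_{alpha/2}/sqrt(n), since sigma = 1

covered = 0
for _ in range(reps):
    xbar = sum(random.gauss(mu, 1.0) for _ in range(n)) / n
    covered += (xbar - half <= mu <= xbar + half)
print(covered / reps)  # close to 1 - alpha = 0.95
```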