
ICME Refresher Course: Probability and Statistics
Stanford University
Luyang Chen
September 20, 2016

1 Basic Probability Theory

1.1 Probability Spaces

A probability space is a triple $(\Omega, \mathcal{F}, P)$, where $\Omega$ is a set of outcomes, $\mathcal{F}$ is a set of events, and $P : \mathcal{F} \to [0, 1]$ is a function that assigns probabilities to events.

A $\sigma$-algebra (or $\sigma$-field) $\mathcal{F}$ is a collection of subsets of $\Omega$ that satisfies:

1. $\emptyset, \Omega \in \mathcal{F}$;
2. if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$;
3. if $A_i \in \mathcal{F}$ is a countable sequence of sets, then $\bigcup_i A_i \in \mathcal{F}$.

A measurable space $(\Omega, \mathcal{F})$ is a space on which we can put a measure. A measure $\mu : \mathcal{F} \to \mathbb{R}$ is a nonnegative countably additive set function, that is:

1. $\mu(A) \ge \mu(\emptyset) = 0$ for all $A \in \mathcal{F}$;
2. if $A_i \in \mathcal{F}$ is a countable sequence of disjoint sets, then $\mu(\bigcup_i A_i) = \sum_i \mu(A_i)$.

If $\mu(\Omega) = 1$, we call $\mu$ a probability measure.

Let $\mu$ be a measure on $(\Omega, \mathcal{F})$. Then:

1. Monotonicity. If $A \subset B$, then $\mu(A) \le \mu(B)$.
2. Subadditivity. If $A \subset \bigcup_{m=1}^{\infty} A_m$, then $\mu(A) \le \sum_{m=1}^{\infty} \mu(A_m)$.
3. Continuity from below. If $A_i \uparrow A$ (i.e., $A_1 \subset A_2 \subset \cdots$ and $\bigcup_i A_i = A$), then $\mu(A_i) \uparrow \mu(A)$.
4. Continuity from above. If $A_i \downarrow A$ (i.e., $A_1 \supset A_2 \supset \cdots$ and $\bigcap_i A_i = A$), with $\mu(A_1) < \infty$, then $\mu(A_i) \downarrow \mu(A)$.

1.2 Distributions

A random variable $X$ is a real-valued function defined on $\Omega$ such that for every Borel set $B \subset \mathbb{R}$ we have $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}$. A random variable $X$ is discrete if its possible values are finite or countably infinite. A random variable $X$ is continuous if its possible values form an uncountable set and the probability that $X$ equals any such value exactly is zero.

A trivial, but useful, example of a random variable is the indicator function of a set $A \in \mathcal{F}$:
$$ \mathbf{1}_A(\omega) = \begin{cases} 1 & \omega \in A \\ 0 & \omega \notin A \end{cases} $$
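To make these definitions concrete, here is a small sketch (our addition, not part of the original notes) of a finite probability space in Python: a fair die under the uniform measure, where the power set of $\Omega$ serves as the $\sigma$-algebra, so every subset is an event. The names `omega`, `prob`, `A`, and `B` are illustrative choices.

```python
# A minimal finite probability space: one roll of a fair six-sided die.
from fractions import Fraction

omega = set(range(1, 7))                     # outcomes of one die roll

def prob(event):
    """P(A) = |A| / |Omega| for the uniform probability measure."""
    assert event <= omega                    # events must be subsets of Omega
    return Fraction(len(event), len(omega))

A = {2, 4, 6}                                # "even roll"
B = {4, 5, 6}                                # "roll at least 4"

# Countable (here: finite) additivity over disjoint pieces of A:
assert prob(A) == prob({2}) + prob({4}) + prob({6})
# Monotonicity: {4, 6} is a subset of A, so its measure is no larger.
assert prob({4, 6}) <= prob(A)
# The indicator 1_A as a random variable Omega -> {0, 1}:
indicator_A = {w: 1 if w in A else 0 for w in omega}
print(prob(A), prob(A | B), indicator_A[3])  # 1/2 2/3 0
```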

If $X$ is a random variable, then $X$ induces a probability measure on $\mathbb{R}$, called its distribution, by setting $\mu(A) = P(X \in A)$ for Borel sets $A$. The distribution of a random variable $X$ is described by giving its distribution function $F(x) = P(X \le x)$. Any distribution function $F$ has the following properties:

1. $F$ is nondecreasing.
2. $\lim_{x \to \infty} F(x) = 1$ and $\lim_{x \to -\infty} F(x) = 0$.
3. $F$ is right continuous, that is, $\lim_{y \downarrow x} F(y) = F(x)$.
4. $\lim_{y \uparrow x} F(y) = F(x-) = P(X < x)$.

Any function $F$ satisfying 1-3 above is the distribution function of some random variable. When the distribution function $F(x)$ has the form
$$ F(x) = \int_{-\infty}^{x} f(y)\, dy, $$
we say that $X$ has density function $f$.

1.3 Integration & Expected Value

Suppose $f$ and $g$ are integrable functions on $(\Omega, \mathcal{F}, \mu)$. Then:

1. If $f \ge 0$ a.e., then $\int f \, d\mu \ge 0$.
2. For all $a \in \mathbb{R}$, $\int a f \, d\mu = a \int f \, d\mu$.
3. $\int f + g \, d\mu = \int f \, d\mu + \int g \, d\mu$.
4. If $g \le f$ a.e., then $\int g \, d\mu \le \int f \, d\mu$.
5. If $g = f$ a.e., then $\int g \, d\mu = \int f \, d\mu$.
6. $\left| \int f \, d\mu \right| \le \int |f| \, d\mu$.

If $X$ is a random variable on $(\Omega, \mathcal{F}, P)$, then we define its expected value to be $E[X] = \int X \, dP$. $E[X]$ does not always exist.

Jensen's inequality. Suppose $\varphi$ is convex, and $X$ and $\varphi(X)$ are both integrable; then $\varphi(E[X]) \le E[\varphi(X)]$.

Hölder's inequality. If $p, q \in (1, \infty)$ with $1/p + 1/q = 1$, then $E[|XY|] \le (E[|X|^p])^{1/p} (E[|Y|^q])^{1/q}$. The special case $p = q = 2$ is called the Cauchy-Schwarz inequality.

Markov's inequality. $P(|X| \ge a) \le a^{-1} E[|X|]$.

Chebyshev's inequality. $P(|X| \ge a) \le a^{-2} E[|X|^2]$.

If $k$ is a positive integer, then $E[X^k]$ is called the $k$th moment of $X$. The first moment $E[X]$ is usually called the mean and denoted by $\mu$. If $E[X^2] < \infty$, then the variance of $X$ is defined to be $\mathrm{var}(X) = E[(X - \mu)^2] = E[X^2] - \mu^2$. The covariance of two random variables $X$ and $Y$ is defined as $\mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$.
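The inequalities above are easy to sanity-check by simulation. Here is a quick Monte Carlo sketch (our addition, with $X \sim \mathrm{Exponential}(1)$ as an arbitrary choice, so $E[X] = 1$ and $E[X^2] = 2$):

```python
# Numerical checks of Markov's, Chebyshev's, and Jensen's inequalities.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)   # X ~ Exp(1)

a = 3.0
# Markov: P(|X| >= a) <= E|X| / a
print((x >= a).mean(), x.mean() / a)             # ~0.0498 <= ~0.333
# Chebyshev (in the form stated above): P(|X| >= a) <= E[X^2] / a^2
print((x >= a).mean(), (x**2).mean() / a**2)     # ~0.0498 <= ~0.222
# Jensen with the convex function phi(x) = x^2: (E[X])^2 <= E[X^2]
print(x.mean() ** 2, (x**2).mean())              # ~1.0 <= ~2.0
```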

1.4 Integration to the Limit

Dominated Convergence Theorem. If $X_n \to X$ a.s., $|X_n| \le Y$ for all $n$, and $E[Y] < \infty$, then $E[X_n] \to E[X]$.

Monotone Convergence Theorem. If $0 \le X_n \uparrow X$, then $E[X_n] \uparrow E[X]$.

Fatou's Lemma. If $X_n \ge 0$, then $\liminf_n E[X_n] \ge E[\liminf_n X_n]$.

1.5 Fubini's Theorem

Fubini's theorem. If $f \ge 0$ or $\int |f| \, d\mu < \infty$, then
$$ \int_X \int_Y f(x, y) \, \mu_2(dy) \, \mu_1(dx) = \int_{X \times Y} f \, d\mu = \int_Y \int_X f(x, y) \, \mu_1(dx) \, \mu_2(dy). $$

Exercise. Let $X$ be a nonnegative random variable. Show that
$$ E[X] = \int_0^{\infty} P(X \ge t) \, dt. $$

2 Convergence

2.1 Convergence Concepts

Convergence in probability. We say that $X_n \to X$ in probability if for any $\varepsilon > 0$, $\lim_n P(|X_n - X| > \varepsilon) = 0$.

Convergence in $L^p$. We say that $X_n \to X$ in $L^p$ if $\lim_n E[|X_n - X|^p] = 0$.

Convergence almost surely. We say that $X_n \to X$ a.s. if $P(\lim_n X_n = X) = 1$.

Convergence in distribution. We say that $X_n \to X$ in distribution if their distribution functions converge, i.e., $F_n(x) \to F(x)$ at every continuity point $x$ of $F$.

Note. The following three statements are equivalent:

1. $\lim_n E[g(X_n)] = E[g(X)]$ for all bounded and continuous $g$.
2. $\lim_n E[e^{i\alpha X_n}] = E[e^{i\alpha X}]$ pointwise for all $\alpha \in \mathbb{R}$.
3. $\lim_n F_n(x) = F(x)$ at every continuity point $x$ of $F$.

2.2 Relationships between the Different Modes of Convergence

If $X_n \to X$ a.s., then $X_n \to X$ in probability.

Proof.
$$ P\Big( \bigcap_{\varepsilon > 0} \bigcup_{N > 0} \bigcap_{n \ge N} \{|X_n - X| < \varepsilon\} \Big) = 1 \implies P\Big( \bigcup_{\varepsilon > 0} \bigcap_{N > 0} \bigcup_{n \ge N} \{|X_n - X| \ge \varepsilon\} \Big) = 0 $$
$$ \implies P\Big( \bigcap_{N > 0} \bigcup_{n \ge N} \{|X_n - X| \ge \varepsilon\} \Big) = 0 \quad \forall \varepsilon > 0 \implies \lim_n P(|X_n - X| \ge \varepsilon) = 0. $$
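The exercise's tail-probability identity is easy to see numerically. A sketch (ours, not from the notes), again using $\mathrm{Exponential}(1)$ as an arbitrary nonnegative example and truncating the integral at $t = 20$, where the tail is negligible:

```python
# Check E[X] = integral_0^inf P(X >= t) dt by Monte Carlo + Riemann sum.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=500_000)
xs = np.sort(x)

t = np.linspace(0.0, 20.0, 4001)                 # grid; P(X >= 20) ~ 2e-9
# Empirical tail probability P(X >= t): fraction of the sorted sample >= t.
tail = 1.0 - np.searchsorted(xs, t, side="left") / xs.size
dt = t[1] - t[0]
print((tail * dt).sum(), x.mean())               # both close to E[X] = 1
```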

$X_n \to X$ in probability doesn't imply $X_n \to X$ a.s.

Counterexample (the "typewriter" sequence). Let
$$ f_{2^k + i}(t) = \begin{cases} 1 & \frac{i}{2^k} \le t < \frac{i+1}{2^k} \\ 0 & \text{otherwise} \end{cases} \qquad i = 0, 1, \ldots, 2^k - 1, \quad k = 0, 1, \ldots $$
and $X_n = f_n(U)$, where $U$ is uniformly distributed on $[0, 1]$. Then $X_n$ converges to $0$ in probability, but not a.s.

If $X_n \to X$ in $L^p$, then $X_n \to X$ in probability.

Proof. $P(|X_n - X| \ge \varepsilon) \le E[|X_n - X|^p] / \varepsilon^p \to 0$.

$X_n \to X$ in probability doesn't imply $X_n \to X$ in $L^p$.

Counterexample. Let
$$ f_n(t) = \begin{cases} n^{1/p} & 0 \le t < \frac{1}{n} \\ 0 & \text{otherwise} \end{cases} $$
and $X_n = f_n(U)$, where $U$ is uniformly distributed on $[0, 1]$. Then $X_n$ converges to $0$ in probability, but not in $L^p$.

If $X_n \to X$ in probability, then $X_n \to X$ in distribution. If $X_n \to a$ (a constant) in distribution, then $X_n \to a$ in probability.

2.3 Continuous Mapping Theorem and Slutsky's Theorem

Continuous Mapping Theorem. Suppose $g : \mathbb{R} \to \mathbb{R}$ is a continuous function.

1. If $X_n \to X$ in distribution, then $g(X_n) \to g(X)$ in distribution.
2. If $X_n \to X$ in probability, then $g(X_n) \to g(X)$ in probability.
3. If $X_n \to X$ a.s., then $g(X_n) \to g(X)$ a.s.

Slutsky's Theorem. If $X_n \to X$ in distribution and $Y_n \to a$ (a constant) in probability, then $X_n + Y_n \to X + a$ and $X_n Y_n \to aX$ in distribution.

2.4 Delta Method

Theorem. Let $X_1, X_2, \ldots$ be a sequence of random variables such that $\sqrt{n}\,(X_n - a) \to Z$ in distribution for some random variable $Z$ and constant $a$. Let $g : \mathbb{R} \to \mathbb{R}$ be continuously differentiable at $a$. Then
$$ \sqrt{n}\,(g(X_n) - g(a)) \to g'(a) Z \quad \text{in distribution}. $$

Proof. By the mean value theorem,
$$ \sqrt{n}\,(g(X_n) - g(a)) = g'(\tilde{X}_n) \cdot \sqrt{n}\,(X_n - a), $$
where $\tilde{X}_n$ lies between $X_n$ and $a$. Since $\sqrt{n}\,(X_n - a) \to Z$ in distribution, $X_n - a \to 0$ in probability, hence $\tilde{X}_n \to a$ and $g'(\tilde{X}_n) \to g'(a)$ in probability. Then use Slutsky's Theorem.
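A tiny simulation (our addition) makes the typewriter counterexample from Section 2.2 visible: along any fixed sample path the sequence returns to $1$ once in every dyadic block, so it cannot converge a.s., yet within block $k$ we have $P(X_n \ne 0) = 2^{-k} \to 0$. The point $u = 0.7$ is an arbitrary choice.

```python
# The "typewriter" sequence: in probability to 0, but X_n(u) = 1 infinitely often.
import numpy as np

def X(n, u):
    """X_n = f_n(u), writing n = 2^k + i with i = 0, ..., 2^k - 1."""
    k = int(np.log2(n))           # exact for n >= 1 (powers of 2 are exact floats)
    i = n - 2**k
    return 1.0 if i / 2**k <= u < (i + 1) / 2**k else 0.0

u = 0.7                                               # one fixed outcome of U
hits = [n for n in range(1, 2**10) if X(n, u) == 1.0]
print(hits)   # exactly one hit per block 2^k <= n < 2^{k+1}: no a.s. convergence
```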

2.5 Weak Laws of Large Numbers (WLLN)

Theorem. Let $X_1, X_2, \ldots$ be uncorrelated random variables with $E[X_i] = \mu$ and $\mathrm{var}(X_i) \le C < \infty$. If $S_n = X_1 + \cdots + X_n$, then as $n \to \infty$, $S_n/n \to \mu$ in $L^2$ and also in probability.

Proof. $E[S_n/n] = \mu$ and
$$ E[|S_n/n - \mu|^2] = \mathrm{var}(S_n/n) = \frac{1}{n^2} \mathrm{var}(S_n) = \frac{1}{n^2} \sum_{i=1}^n \mathrm{var}(X_i) \le \frac{C}{n} \to 0. $$

Theorem. Let $X_1, X_2, \ldots$ be i.i.d. random variables with $E[X_i] = \mu$ and $E[|X_i|] < \infty$. If $S_n = X_1 + \cdots + X_n$, then as $n \to \infty$, $S_n/n \to \mu$ in probability.

Proof. Truncate at level $n$ and decompose:
$$ S_n/n - \mu = \frac{1}{n} \sum_{i=1}^n \big( X_i \mathbf{1}_{\{|X_i| \le n\}} - E[X_1 \mathbf{1}_{\{|X_1| \le n\}}] \big) + \frac{1}{n} \sum_{i=1}^n X_i \mathbf{1}_{\{|X_i| > n\}} + \big( E[X_1 \mathbf{1}_{\{|X_1| \le n\}}] - E[X_1] \big) = I + II + III. $$
For the first term, splitting at level $\varepsilon\sqrt{n}$,
$$ E[|I|^2] = \frac{1}{n} E\big[ \big| X_1 \mathbf{1}_{\{|X_1| \le n\}} - E[X_1 \mathbf{1}_{\{|X_1| \le n\}}] \big|^2 \big] \le \frac{1}{n} E[|X_1|^2 \mathbf{1}_{\{|X_1| \le n\}}] $$
$$ = \frac{1}{n} E[|X_1|^2 \mathbf{1}_{\{|X_1| \le \varepsilon\sqrt{n}\}}] + \frac{1}{n} E[|X_1|^2 \mathbf{1}_{\{\varepsilon\sqrt{n} < |X_1| \le n\}}] \le \varepsilon^2 + E[|X_1| \mathbf{1}_{\{|X_1| > \varepsilon\sqrt{n}\}}]. $$
For the other two,
$$ E[|II|] = E\Big[ \Big| \frac{1}{n} \sum_{i=1}^n X_i \mathbf{1}_{\{|X_i| > n\}} \Big| \Big] \le \frac{1}{n} \sum_{i=1}^n E[|X_i| \mathbf{1}_{\{|X_i| > n\}}] = E[|X_1| \mathbf{1}_{\{|X_1| > n\}}] \to 0, $$
$$ |III| = \big| E[X_1 \mathbf{1}_{\{|X_1| \le n\}}] - E[X_1] \big| \le E[|X_1| \mathbf{1}_{\{|X_1| > n\}}] \to 0. $$

Note. Neither independence of the $X_i$ nor finiteness of their variances is needed for the validity of the WLLN.

2.6 Strong Laws of Large Numbers (SLLN)

Theorem. Let $X_1, X_2, \ldots$ be i.i.d. random variables with $E[X_i] = \mu$ and $E[|X_i|] < \infty$. If $S_n = X_1 + \cdots + X_n$, then as $n \to \infty$, $S_n/n \to \mu$ a.s.

If the i.i.d. random variables $\{X_i\}$ have finite fourth-order moments, $E[|X_i|^4] < \infty$ or $E[|X_i - \mu|^4] < \infty$, then an application of the Chebyshev inequality with $p = 4$ gives the needed estimate and we have the SLLN in this case. Of course, this is only a sufficient condition for its validity. As with the WLLN, it is enough that $E[|X_i|] < \infty$.

2.7 Central Limit Theorem

Theorem. Let $X_1, X_2, \ldots$ be i.i.d. random variables with $E[X_i] = \mu$ and $\mathrm{var}(X_i) = \sigma^2 < \infty$. If $S_n = X_1 + \cdots + X_n$, then $\sqrt{n}\,(S_n/n - \mu) \to \mathcal{N}(0, \sigma^2)$ in distribution.

Proof.
$$ E\big[ e^{i\alpha \sqrt{n}(S_n/n - \mu)} \big] = E\big[ e^{i \frac{\alpha}{\sqrt{n}} \sum_{j=1}^n (X_j - \mu)} \big] = \varphi\Big( \frac{\alpha}{\sqrt{n}} \Big)^n, \quad \text{where } \varphi(\alpha) = E[e^{i\alpha(X_1 - \mu)}]. $$
Then $\varphi(0) = 1$, $\varphi'(0) = 0$, $\varphi''(0) = -\sigma^2$. By Taylor's theorem,
$$ \varphi\Big( \frac{\alpha}{\sqrt{n}} \Big) = 1 + \varphi''(\alpha_n) \frac{\alpha^2}{2n}, \quad \text{where } 0 < |\alpha_n| < \frac{|\alpha|}{\sqrt{n}}, $$
and therefore
$$ \varphi\Big( \frac{\alpha}{\sqrt{n}} \Big)^n \to e^{-\frac{\alpha^2 \sigma^2}{2}}. $$
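Both limit theorems are easy to see empirically. A Monte Carlo sketch (our addition), using i.i.d. $\mathrm{Exponential}(1)$ samples so that $\mu = 1$ and $\sigma^2 = 1$:

```python
# LLN: sample means concentrate at mu. CLT: sqrt(n)(S_n/n - mu) is ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(2)
n, reps = 1_000, 10_000
x = rng.exponential(scale=1.0, size=(reps, n))   # mu = 1, sigma^2 = 1
means = x.mean(axis=1)                           # reps independent draws of S_n / n

print(means.mean(), means.std())                 # ~1.0, spread ~1/sqrt(n) ~ 0.032
z = np.sqrt(n) * (means - 1.0)                   # CLT: approximately N(0, 1)
print(z.std(), (z <= 1.96).mean())               # ~1.0 and ~0.975 = Phi(1.96)
```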

3 Statistics

3.1 Probability and Statistics

The basic problem of probability is: given the distribution of the data, what are the properties (e.g. expectation, variance, etc.) of the outcomes? The basic problem of statistics is: given the outcomes, what can we say about the distribution of the data? (Given $X_1, \ldots, X_n \sim F$, what can we say about $F$?)

3.2 Fundamental Concepts

Point estimation involves the use of sample data to calculate a single value (known as a statistic) which is to serve as a best guess or best estimate of an unknown (fixed or random) population parameter. Let $X_1, \ldots, X_n$ be i.i.d. data points from some distribution $F(x; \theta^*)$. A point estimator $\hat{\theta}_n$ of the parameter $\theta^*$ is some function of $X_1, \ldots, X_n$: $\hat{\theta}_n = g(X_1, \ldots, X_n)$. We introduce the following two methods: Method of Moments and Maximum Likelihood.

In statistics, the bias of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator with zero bias is called unbiased. Otherwise the estimator is said to be biased.

Let $\hat{\theta}_n$ be an estimate of a parameter $\theta^*$ based on a sample of size $n$. Then $\hat{\theta}_n$ is said to be consistent in probability if $\hat{\theta}_n$ converges in probability to $\theta^*$ as $n$ approaches infinity.

A $1 - \alpha$ confidence interval for a parameter $\theta^*$ is an interval $C_n = (a, b)$, where $a = a(X_1, \ldots, X_n)$ and $b = b(X_1, \ldots, X_n)$ are functions of the data, such that $P(\theta^* \in C_n) \ge 1 - \alpha$.

3.3 The Method of Moments

The $k$th moment of a probability law is defined as $\mu_k = E[X^k]$, where $X$ is a random variable following that probability law. If $X_1, \ldots, X_n$ are i.i.d. random variables from that distribution, the $k$th sample moment is defined as
$$ \hat{\mu}_k = \frac{1}{n} \sum_{i=1}^n X_i^k. $$
We can view $\hat{\mu}_k$ as an estimate of $\mu_k$. The method of moments estimates parameters by finding expressions for them in terms of the lowest possible order moments and then substituting sample moments into the expressions.

Example. The first and second moments for the normal distribution $\mathcal{N}(\mu, \sigma^2)$ are
$$ \mu_1 = E[X] = \mu, \qquad \mu_2 = E[X^2] = \mu^2 + \sigma^2. $$
Therefore, $\mu = \mu_1$ and $\sigma^2 = \mu_2 - \mu_1^2$. The corresponding estimates of $\mu$ and $\sigma^2$ from the sample moments are
$$ \hat{\mu} = \frac{1}{n} \sum_{i=1}^n X_i = \bar{X}, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n X_i^2 - \Big( \frac{1}{n} \sum_{i=1}^n X_i \Big)^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2. $$
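The normal example translates directly into code. A minimal sketch (ours; the function name `mom_normal` and the true parameters are illustrative):

```python
# Method-of-moments estimation for N(mu, sigma^2).
import numpy as np

def mom_normal(x):
    """Return (mu_hat, sigma2_hat) by plugging sample moments into
    mu = mu_1 and sigma^2 = mu_2 - mu_1^2."""
    m1 = x.mean()                 # first sample moment
    m2 = (x**2).mean()            # second sample moment
    return m1, m2 - m1**2

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)  # true mu = 2, sigma^2 = 9
print(mom_normal(x))                              # ~(2.0, 9.0)
```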

Question. Are the two estimators above unbiased? Are they consistent? What are the confidence intervals?

Since $E[\hat{\mu}] = \mu$, $\hat{\mu}$ is unbiased. Writing
$$ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 - (\bar{X} - \mu)^2, \qquad E[\hat{\sigma}^2] = \sigma^2 - \frac{1}{n}\sigma^2 = \frac{n-1}{n}\sigma^2, $$
we see that $\hat{\sigma}^2$ is biased. Both $\hat{\mu}$ and $\hat{\sigma}^2$ are consistent estimators. A $1 - \alpha$ confidence interval based on $\hat{\mu}$ is
$$ \Big[ \hat{\mu} - \frac{\sigma}{\sqrt{n}} \Phi^{-1}(1 - \alpha/2),\; \hat{\mu} + \frac{\sigma}{\sqrt{n}} \Phi^{-1}(1 - \alpha/2) \Big], $$
and a confidence interval for $\sigma^2$ follows from $n\hat{\sigma}^2 / \sigma^2 \sim \chi^2(n-1)$.

3.4 The Method of Maximum Likelihood

Suppose that the random variables $X_1, \ldots, X_n$ have a joint density $f(x_1, \ldots, x_n \mid \theta)$. Given observed values $X_i = x_i$, $i = 1, \ldots, n$, the likelihood of $\theta$ as a function of $x_1, \ldots, x_n$ is defined as $L(\theta) = f(x_1, \ldots, x_n \mid \theta)$. If the $X_i$ are assumed to be i.i.d., the likelihood is
$$ L(\theta) = \prod_{i=1}^n f(x_i \mid \theta). $$
The log likelihood is
$$ l(\theta) = \log L(\theta) = \sum_{i=1}^n \log f(x_i \mid \theta). $$
The maximum likelihood estimate (MLE) of $\theta$ is the value of $\theta$ that maximizes the likelihood, that is, makes the observed data most probable or most likely. The estimates obtained by the method of maximum likelihood are not always the same as those obtained by the method of moments.

Example. If $X_1, \ldots, X_n$ are i.i.d. $\mathcal{N}(\mu, \sigma^2)$, their joint density is the product of their marginal densities:
$$ f(x_1, \ldots, x_n \mid \mu, \sigma) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big( -\frac{1}{2} \Big[ \frac{x_i - \mu}{\sigma} \Big]^2 \Big). $$
The log likelihood is thus
$$ l(\mu, \sigma) = -n \log \sigma - \frac{n}{2} \log 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^n (X_i - \mu)^2. $$
The partials with respect to $\mu$ and $\sigma$ are
$$ \frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (X_i - \mu), \qquad \frac{\partial l}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^n (X_i - \mu)^2. $$
Setting these to zero gives
$$ \hat{\mu}_{\mathrm{MLE}} = \bar{X}, \qquad \hat{\sigma}_{\mathrm{MLE}} = \sqrt{ \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 }. $$

The following are the good properties of the MLE:

1. Under appropriate smoothness conditions on $f$, the MLE from an i.i.d. sample is consistent.
2. Under appropriate smoothness conditions on $f$, $\sqrt{n}\,(\hat{\theta} - \theta^*) \to \mathcal{N}(0, 1/I(\theta^*))$ in distribution.
3. The MLE achieves the Cramér-Rao lower bound.

Fisher Information.
$$ I(\theta) = E_\theta\Big[ \Big( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \Big)^2 \Big] = -E_\theta\Big[ \frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta) \Big]. $$
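The bias computation above, $E[\hat{\sigma}^2] = (n-1)\sigma^2/n$, is most striking for small $n$. A quick simulation (our addition) with $n = 5$ and $\sigma^2 = 1$, where the expected value of the biased estimator is $0.8$:

```python
# Simulating the bias of the MoM/MLE variance estimator (divide by n).
import numpy as np

rng = np.random.default_rng(4)
n, reps = 5, 200_000
x = rng.normal(size=(reps, n))                # N(0, 1), so sigma^2 = 1

sigma2_hat = x.var(axis=1)                    # ddof=0: the biased 1/n estimator
print(sigma2_hat.mean())                      # ~0.8 = (n - 1) * sigma^2 / n
print(x.var(axis=1, ddof=1).mean())           # ~1.0: the unbiased 1/(n-1) version
```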

3.5 Hypothesis Testing

$H_0$: the null hypothesis. $H_1$ (or $H_A$): the alternative hypothesis.

Rejecting $H_0$ when it is true is called a type I error. The probability of a type I error is called the significance level of the test and is usually denoted by $\alpha$. Accepting the null hypothesis when it is false is called a type II error. Its probability is usually denoted by $\beta$.

The set of values of the test statistic that leads to rejection of the null hypothesis is called the rejection region, and the set of values that leads to acceptance is called the acceptance region. The probability distribution of the test statistic when the null hypothesis is true is called the null distribution. The p-value is the probability of a result as or more extreme than that actually observed if the null hypothesis were true.

Some familiar hypothesis tests: the z-test, Student's t-test, and the Generalized Likelihood Ratio Test.

Generalized Likelihood Ratio Test. Suppose that the observations $X = (X_1, \ldots, X_n)$ have a joint density function $f(x_1, \ldots, x_n \mid \theta)$. $H_0$ specifies that $\theta \in \omega_0$ and $H_1$ specifies that $\theta \in \omega_1$, where $\omega_0 \cap \omega_1 = \emptyset$ and $\Omega = \omega_0 \cup \omega_1$. The test statistic is
$$ \Lambda = \frac{\max_{\theta \in \omega_0} L(\theta)}{\max_{\theta \in \Omega} L(\theta)}. $$
Under smoothness conditions on the probability density, the null distribution of $-2 \log \Lambda$ tends to a chi-square distribution with degrees of freedom equal to $\dim \Omega - \dim \omega_0$ as the sample size tends to infinity.

3.6 Linear Regression

Consider the following regression model:
$$ Y = X\beta + \varepsilon, $$
where
$$ Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}, \quad X = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix}. $$
The least squares estimator is
$$ \hat{\beta}_{LS} = \arg\min_{\beta} \|Y - X\beta\|_2^2. $$

Consider the model above with the following assumptions:

1. $X$ is a non-random matrix with full column rank.
2. $E[\varepsilon] = 0$.
3. $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = \sigma^2 \delta_{ij}$.
4. $\varepsilon_i$ i.i.d. $\mathcal{N}(0, \sigma^2)$.

Then $\hat{\beta}_{LS} = (X^T X)^{-1} X^T Y$. Under assumptions 1-2, $\hat{\beta}_{LS}$ is an unbiased estimator.
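Before turning to the sampling distribution of $\hat{\beta}_{LS}$, here is a minimal numerical sketch (our addition) of the estimator on synthetic data generated under assumptions 1 and 4; the true coefficients and noise level are arbitrary choices:

```python
# Least squares: beta_hat = (X^T X)^{-1} X^T y, computed stably via lstsq.
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma = 200, 3, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # full column rank
beta = np.array([1.0, 2.0, -3.0])
y = X @ beta + sigma * rng.normal(size=n)        # assumptions 1 and 4

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                                  # close to [1, 2, -3]
```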

Under assumptions 1-3,
$$ \mathrm{Cov}(\hat{\beta}_{LS}) = \sigma^2 (X^T X)^{-1}, $$
and an unbiased estimator of $\sigma^2$ is
$$ s^2 = \frac{1}{n - p} \mathrm{RSS} = \frac{1}{n - p} (Y - X\hat{\beta}_{LS})^T (Y - X\hat{\beta}_{LS}). $$

Under assumptions 1 and 4,
$$ \hat{\beta}_{LS} \sim \mathcal{N}\big( \beta, \sigma^2 (X^T X)^{-1} \big), \qquad \frac{\mathrm{RSS}}{\sigma^2} \sim \chi^2_{n-p}, \qquad \frac{\hat{\beta}_{LS,j} - \beta_j}{s \sqrt{c_{jj}}} \sim t_{n-p}, $$
where $c_{jj}$ is the $j$th element on the diagonal of $(X^T X)^{-1}$.
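Continuing the sketch above (again our addition, not part of the notes), the same formulas give $s^2$, the estimated covariance of $\hat{\beta}_{LS}$, and per-coefficient t-statistics for testing $\beta_j = 0$:

```python
# OLS inference: s^2 = RSS / (n - p), Cov-hat = s^2 (X^T X)^{-1}, t-statistics.
import numpy as np

def ols_inference(X, y):
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)                 # unbiased estimate of sigma^2
    cov = s2 * np.linalg.inv(X.T @ X)            # estimated Cov(beta_hat)
    t_stats = beta_hat / np.sqrt(np.diag(cov))   # (beta_hat_j - 0) / (s sqrt(c_jj))
    return beta_hat, s2, t_stats

rng = np.random.default_rng(6)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + 0.5 * rng.normal(size=n)
print(ols_inference(X, y))   # s^2 ~ 0.25; all three |t| values are large
```

Each t-statistic would be compared against the $t_{n-p}$ distribution, consistent with the last display above.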