Statistical inference
1 Statistical inference

Contents:
1. Main definitions
2. Estimation
3. Testing

L. Trapani, MSc Induction - Statistical inference
2 1 Introduction: definitions and preliminary theory

In this chapter, we shall spell out some important definitions and two important results in statistics, namely the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT). Both results are frequently used to prove properties of estimators, and even though a profound knowledge of them is not necessary, they are worth considering.
3 The Law of Large Numbers

To state this result formally, consider:
- a sample Y_1, ..., Y_n of independent and identically distributed (iid) random variables, with E[Y_j] = μ;
- the arithmetic average Ȳ = (1/n) Σ_{j=1}^{n} Y_j.

Then it holds that

lim_{n→∞} P[ |Ȳ − μ| > ε ] = 0   for any ε > 0
4 Some comments:
- the LLN is an asymptotic result, i.e. it describes a situation whereby the sample size is very large;
- roughly speaking, the LLN states that when one estimates the expected value μ by means of the average, there is a zero chance that the average differs from μ by more than an arbitrarily small amount: thus, there is a zero chance that the average misses the true value of the expected value μ;
- strictly speaking, this result holds if one has an infinite number of observations.
5 It is indeed true that the LLN holds if one has an infinite number of observations. However, when the number of observations one has is sufficiently large, the LLN is a good approximation of the degree of accuracy of the average as an estimator of the mean.

Some terminology: the average (as the sample size n grows large) converges in probability to the expected value; this is denoted as

plim Ȳ = μ   or equivalently   Ȳ →p μ
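The LLN can be illustrated numerically. The following is a minimal simulation sketch (the value μ = 2.0 and the normal population are assumptions for illustration, not from the slides): the absolute error of the sample average shrinks as the sample size grows.

```python
import random
import statistics

mu = 2.0  # assumed true expected value for this illustration

def sample_average(n):
    """Arithmetic average of n iid N(mu, 1) draws."""
    return statistics.mean(random.gauss(mu, 1.0) for _ in range(n))

random.seed(0)
for n in (10, 1_000, 100_000):
    # the error |Ybar - mu| gets smaller as n grows, as the LLN predicts
    print(n, round(abs(sample_average(n) - mu), 4))
```

Running this for increasing n shows the estimation error collapsing towards zero, which is exactly the content of the limit statement above.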
6 The Central Limit Theorem

We know that, according to the LLN, the expected value of an iid population can be estimated arbitrarily well as the sample size n grows large. It would be interesting to know more about the behaviour of the estimation error: in other words, as the sample size n increases, what happens to Ȳ − μ?
7 From the LLN, we know that this quantity tends to zero. More informatively, the Central Limit Theorem (in its basic version) states that: given a sample Y_1, ..., Y_n of independent and identically distributed (iid) random variables, with E[Y_j] = μ and finite variance σ², it holds that

√n (Ȳ − μ) →d N(0, σ²)
8 Some comments:
- the CLT is again an asymptotic result;
- roughly speaking, the CLT states that when estimating the expected value μ by means of the average, the estimation error, magnified by n^{1/2}, has a normal distribution.
9 This is a very powerful and versatile result:
- it refers to an estimation error: not only do we know that this error will collapse to zero (LLN), but we also know along which pattern it will collapse to zero;
- irrespective of the true distribution of the random variables (which we do not need to know), the estimation error will always have a normal distribution, i.e. a distribution which is well known.

More precisely, note that (with a slight abuse of notation)

Ȳ ~ N(μ, σ²/n),   with σ²/n → 0 as n → ∞
10 Some terminology: the estimation error is said to converge in distribution to a normal random variable; this is denoted as

√n (Ȳ − μ) →d N(0, σ²)   or   √n (Ȳ − μ) / σ →d N(0, 1)
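The CLT can also be checked by simulation. This sketch (the uniform population and the values of n and the number of replications are assumptions for illustration) draws many samples from a deliberately non-normal distribution and shows that the standardized estimation error √n (Ȳ − μ)/σ nevertheless behaves like N(0, 1).

```python
import random
import statistics

random.seed(0)
n, reps = 200, 5_000
mu = 0.5                      # mean of Uniform(0, 1)
sigma = (1.0 / 12.0) ** 0.5   # standard deviation of Uniform(0, 1)

z = []
for _ in range(reps):
    ybar = statistics.mean(random.random() for _ in range(n))
    z.append(n ** 0.5 * (ybar - mu) / sigma)

# By the CLT the z-values should have mean near 0 and std dev near 1,
# even though the underlying data are uniform, not normal:
print(round(statistics.mean(z), 2), round(statistics.stdev(z), 2))
```

The point of using uniform data is exactly the one made on the slide: the limiting normality does not depend on the true distribution of the observations.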
11 2 The properties of estimators

In this chapter, we shall provide only definitions of the most commonly studied properties of estimators. These definitions are of paramount importance, and will occur very frequently in econometrics.
12 Estimation

In the previous section, the notions of random variable and PDF were discussed; consider e.g. the normal case, X ~ N(μ, σ²). Suppose that the PDF of a random variable Y has the form f(y; ϑ):
- this could be e.g. the normal distribution;
- the PDF depends on one or more parameters ϑ = (ϑ_1, ..., ϑ_k), which lie within a set of possible values, say Ω, called the parameter space;
- the goal of estimation is finding a good guess (an estimate) for ϑ.
13 Consider the random variable Y, with PDF f(y; ϑ), where ϑ is a parameter that we would like to estimate. Let Y_1, ..., Y_n be a sample from this population:
- in other words, Y_1, ..., Y_n is a collection of n observations drawn from the random variable Y (e.g. returns on an asset observed over 100 days);
- the number of observations n is called the sample size;
- then an estimator of ϑ is a function, or rule, of the form ϑ̂ = ϑ̂(Y_1, ..., Y_n).
14 Thus it can be noted that:
- the estimator is a function (or a transformation) of the observations Y_1, ..., Y_n;
- the estimator itself is a random variable, because it is a function/transformation of random variables.

There are various techniques to compute an estimator; to name the most frequently employed ones:
- (Ordinary) Least Squares;
- (Generalised) Method of Moments;
- Maximum Likelihood.
15 The definition of estimator given above is very general. Thus, various estimators could potentially be proposed, and it becomes necessary to choose the appropriate one. In order to do so, one needs to consider the properties of each estimator and select the one that best suits the problem at hand.
16 To tackle this question, some properties of estimators are to be evaluated. It is common to distinguish these properties into:
- small sample properties: properties/definitions that hold when the sample size n is finite, i.e. not close to infinity (strictly speaking, n is always finite);
- large sample properties: these hold (strictly speaking) when the sample size n is very large, actually growing to infinity.
17 It is worth noting that:
- small sample properties hold for any n, whether small or large, even though one can only have a finite number of observations;
- when n is big enough, large sample properties can be a good approximation of the behaviour of the estimates;
- note, as a general rule, that there is no theorem to tell when n is large enough; in particular, it is NOT true that, as sometimes said, n = 30 is large enough to pretend that one has an infinite number of observations.
18 Small sample properties

Three main properties, known as:
- unbiasedness;
- efficiency;
- precision.
19 Unbiasedness

Intuition: an estimator is said to be unbiased if, on average, it yields the true parameter value. In other words, an estimator is unbiased if, were the underlying experiment repeated infinitely many times by drawing samples of size n, the average value of the estimates from all those samples would be ϑ.

Formally, the definition of unbiasedness is:

ϑ̂ is unbiased if and only if E[ϑ̂] = ϑ, or equivalently E[ϑ̂] − ϑ = δ = 0

The quantity δ is also known as the bias of the estimator.
20 Exhibit 1: Unbiased and biased estimators. The density functions are normal, with σ = 1. The true value of the population mean is μ = 0. The dotted line denotes a biased estimator.
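The repeated-sampling intuition behind unbiasedness can be checked numerically. This sketch (the example values and the normal population are assumptions for illustration) contrasts the sample average, which is unbiased for μ, with the variance estimator that divides by n instead of n − 1, which is biased downward with expectation (n − 1)/n · σ².

```python
import random
import statistics

random.seed(0)
mu, sigma2, n, reps = 0.0, 1.0, 5, 20_000  # assumed illustrative values

means, vars_n = [], []
for _ in range(reps):
    y = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    means.append(statistics.mean(y))
    vars_n.append(statistics.pvariance(y))  # divides by n, not n - 1

# Averaging the estimates over many repeated samples approximates E[estimator]:
print(round(statistics.mean(means), 2))   # near mu = 0: unbiased
print(round(statistics.mean(vars_n), 2))  # near (n - 1)/n * sigma2 = 0.8: biased
```

The second printed value sits below the true variance of 1, so its bias δ is about −σ²/n here.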
21 Efficiency

Intuition: if we compare estimators that are unbiased, the estimator with the smaller variance is preferred, and is said to be more efficient:
- it is important to note that efficient estimators do not exist in absolute terms: we only have more efficient estimators;
- efficiency is a criterion to choose among unbiased estimators.

Formally: ϑ̂_1 is more efficient than ϑ̂_2 if Var(ϑ̂_1) < Var(ϑ̂_2)
22 Exhibit 2: The density function of the more efficient estimator is exemplified by a normal density with σ = 0.5. The dotted line indicates a less efficient estimator (σ = 1).
23 Precision

Problem: efficiency is a criterion to select among unbiased estimators; what if one needs to compare biased estimators?

Intuition: the two estimators can be compared on the grounds of both their bias and their variance: both should be small.

Formally, the indicator that is commonly employed is the mean squared error (MSE) of the estimator, defined as

MSE(ϑ̂) = E[(ϑ̂ − ϑ)²] = Var(ϑ̂) + δ²

The criterion is (obviously): pick the estimator with the smaller MSE.
24 Some comments:
- the criterion is (obviously): pick the estimator with the smaller MSE;
- note that the MSE is a criterion that gives equal weight to efficiency and bias (i.e. they are considered equally important);
- the MSE is employed to compare estimators, rather than to assess the goodness of one single estimator.
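The decomposition MSE = Var + δ² can be verified numerically. This is a sketch (the biased variance estimator and the sample sizes are assumptions for illustration): the simulated mean squared error of the estimator matches the sum of its simulated variance and squared bias.

```python
import random
import statistics

random.seed(0)
theta, n, reps = 1.0, 5, 20_000   # true variance theta = 1 (assumed)

est = []
for _ in range(reps):
    y = [random.gauss(0.0, 1.0) for _ in range(n)]
    est.append(statistics.pvariance(y))   # biased estimator: divides by n

mse = statistics.mean((e - theta) ** 2 for e in est)
var = statistics.pvariance(est)
bias = statistics.mean(est) - theta
# For the simulated moments the identity MSE = Var + bias^2 holds exactly:
print(round(mse, 4), round(var + bias ** 2, 4))
```

Note that the identity is exact for the sample moments computed here, not just approximately true: it follows from expanding the square, just as in the population version.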
25 Large sample properties

Three main properties, known as:
- consistency;
- limiting distribution;
- asymptotic efficiency.
26 Consistency

Intuition: an estimator is consistent if, as the sample size n increases, the estimated value collapses onto the true value of the parameter ϑ.

Formally (note the link with the LLN):

lim_{n→∞} P[ |ϑ̂_n − ϑ| > ε ] = 0,   i.e.   plim ϑ̂_n = ϑ   or   ϑ̂_n →p ϑ
27 Some comments:
- consistency is a very important property, and it is common to discard estimators that are not consistent;
- consistency can be seen as the large sample counterpart of unbiasedness, but: an estimator does not need to be unbiased to be consistent, and an unbiased estimator is not necessarily consistent;
- roughly speaking, consistency means that, as n grows, the estimator collapses onto the true value of the parameter: thus we have asymptotic unbiasedness; moreover (and this is not captured by unbiasedness), the dispersion of the estimator around the true value must go to zero as well.
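The "dispersion goes to zero" part of consistency can be seen directly by simulation. This sketch (example values assumed for illustration) measures the spread of the sample average over many repeated samples, for growing n: it falls roughly like 1/√n.

```python
import random
import statistics

random.seed(0)
mu, reps = 2.0, 2_000   # assumed true mean and number of replications

spread = {}
for n in (10, 100, 1_000):
    est = [statistics.mean(random.gauss(mu, 1.0) for _ in range(n))
           for _ in range(reps)]
    spread[n] = statistics.stdev(est)   # dispersion of the estimator
    print(n, round(spread[n], 3))       # shrinks roughly like 1 / sqrt(n)
```

Unbiasedness alone would only say the estimates are centred on μ for every n; the shrinking spread across rows is the extra ingredient that consistency requires.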
28 Limiting distribution

Intuition: the limiting distribution of an estimator is the PDF (or, more precisely, the distribution) of the estimator as n tends to infinity:
- example: many estimators have a distribution which is asymptotically normal with mean ϑ (the true value of the parameter) for large values of n; this is referred to as asymptotic normality;
- when n is finite, the limiting distribution is a (sometimes good) approximation of the true PDF of the estimator.
29 Asymptotic efficiency

Intuition: this notion is essentially the same as in the small sample case, but here the variance of the estimator is computed as n tends to infinity:
- the asymptotic variance of an estimator is the variance of its limiting distribution;
- once again, asymptotic efficiency is employed to compare estimators, rather than to assess the goodness of one estimator;
- the notion of asymptotic efficiency, similarly to the small sample case, can be applied only to consistent estimators.
30 Confidence intervals

As we know, save for the special and rather unrealistic case whereby one has an infinite number of observations, an estimator will guess the true value of a parameter only up to an estimation error. Thus, it makes sense to acknowledge that the estimator is not fully accurate. Instead of estimating the parameter ϑ by simply using the estimator, it is common to find an interval wherein the true value of the parameter lies with a certain probability:
- using the estimator as a raw guess of ϑ is referred to as point estimation;
- using an interval is known as interval estimation.
31 Definition: an interval estimate is a range of values where the true, unobserved ϑ lies, together with the probability that ϑ lies within this range. In other words: instead of estimating ϑ by means of a single number (the estimator, or point estimate), ϑ is said to range in an interval [a, b] with probability p; p is also referred to as the confidence level.
32 Most often, an estimator has (at least asymptotically) a normal distribution. Therefore, confidence intervals are constructed as follows. Suppose the estimator has distribution

ϑ̂ ~ N(ϑ, σ²/n)

Then a possible, and commonly employed, confidence interval is

[ ϑ̂ − 1.96 σ/√n ,  ϑ̂ + 1.96 σ/√n ]
33 Some comments:
- here, the confidence level is p = 0.95 (see Lecture 2, slide 73);
- the interval estimate can be read as: "there is a 95% chance that the true value of ϑ belongs to the confidence interval";
- note that the width of the interval itself contains some important information, as it tells us about the accuracy of the estimator: the larger the interval, the higher the uncertainty about ϑ, and therefore the worse the precision of the estimator.
34 Note that the width of the confidence interval depends on three factors:
- the variance σ²: this is usually a characteristic of the data one has; the larger σ², the less precise the estimator;
- the number of observations n: the larger n, the more precise the estimator; in other words, the more information one has, the more accurate the estimates;
- the confidence level p that one chooses: the larger p, the wider the confidence interval.
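The interval ϑ̂ ± 1.96 σ/√n from the slides can be computed directly. This is a minimal sketch (the data are simulated and σ is assumed known, as in the formula above):

```python
import math
import random
import statistics

random.seed(0)
mu, sigma, n = 2.0, 1.0, 100   # assumed illustrative values; sigma known
y = [random.gauss(mu, sigma) for _ in range(n)]
ybar = statistics.mean(y)

half = 1.96 * sigma / math.sqrt(n)   # half-width of the 95% interval
lo, hi = ybar - half, ybar + half
print(round(lo, 3), round(hi, 3))    # mu lies in [lo, hi] with 95% confidence
```

Consistently with the three factors listed above, the width 2 · 1.96 σ/√n grows with σ and with the confidence level (through the critical value 1.96), and shrinks as n increases.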
35 3 Hypothesis testing

Hypothesis testing is an issue of paramount importance, and it will unfold throughout the various modules in econometrics. Several tests will be applied, and the mechanism of hypothesis testing will become clearer and clearer. The purpose of this note is twofold: first, some useful definitions are provided; secondly, a quick (and very easy, yet rigorous) rule of thumb to run hypothesis tests is presented.
36 Hypothesis testing

It is an issue of paramount importance! No deep theoretical background is needed (which is good news). A test is a procedure/decision rule which uses a sample of available data (say Y_1, ..., Y_n) and an estimator of a parameter ϑ to verify whether a certain hypothesis on ϑ holds true or not.
37 Some important definitions: a test is represented as a choice between:
- the hypothesis we would like to verify (the null hypothesis), i.e. a statement about the value of the parameter ϑ;
- an alternative which is the negation of the null (the alternative hypothesis).
38 The representation of a test is (almost universally)

H_0: ϑ ∈ Ω_0   versus   H_1: ϑ ∈ Ω_1,   with Ω_0 ∪ Ω_1 = Ω and Ω_0 ∩ Ω_1 = ∅

The null hypothesis can be rejected (i.e. it is deemed false) or not rejected (i.e. we can say it is true, with a slight abuse of terminology).
39 Sometimes a test can be wrong: there is a chance of rejecting the null when it is true, or of not rejecting it when it is false:
- the probability of rejecting the null when it is true is called the size of the test; we would like it to be small, and usually the test is designed so that the size is 0.05;
- the probability of rejecting the null when it is false is called the power; we would like this to be large, as close to 1 as possible;
- there exists a trade-off between size and power: namely, the smaller the size, the smaller the power.
40 P-value

This is an extremely important definition. Whenever running a test, the output of any software package includes (together with a great deal of other results) a quantity called the p-value:
- the p-value is a number between 0 and 1;
- it represents a probability;
- with an abuse of terminology, we can employ the following rule of thumb to run a hypothesis test: the p-value is the probability that the null hypothesis is true (based on the data).
41 Thus, with this rule of thumb, running a test means making the following decision: is the null hypothesis likely enough to be true?
- If we think/decide it is, then we conclude that we cannot reject the null hypothesis (i.e. we "accept" it: we think it is true, and we base our subsequent analysis on it as if it were true).
- We need a criterion to decide whether the null hypothesis is plausible enough.
42 A commonly employed criterion (which is almost universally accepted, even though completely arbitrary) is:
- if p-value > 0.05, accept the null;
- if p-value < 0.05, reject the null.

Thus, to run a test we need to know TWO elements:
- the null hypothesis (and, obviously, the alternative);
- the p-value.
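The whole decision rule can be put together in a few lines. This sketch is an assumed example (not from the slides): a two-sided z-test of H_0: μ = μ₀ against H_1: μ ≠ μ₀ with σ known, where the p-value is compared to the conventional 0.05 threshold.

```python
import math
import random
import statistics

def z_test_pvalue(y, mu0=0.0, sigma=1.0):
    """Two-sided p-value of the z-test of H0: mu = mu0, with sigma known."""
    z = math.sqrt(len(y)) * (statistics.mean(y) - mu0) / sigma
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))  # standard normal CDF
    return 2.0 * (1.0 - phi)

random.seed(0)
y_alt = [random.gauss(0.5, 1.0) for _ in range(100)]  # data for which H0 is false

p = z_test_pvalue(y_alt)   # null hypothesis: mu = 0
print(round(p, 4), "reject H0" if p < 0.05 else "do not reject H0")
```

With a true mean of 0.5 and n = 100, the p-value comes out far below 0.05, so the rule of thumb leads to rejecting the null, as it should.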
43 THE LAST SLIDE

Nearly, but not quite: there is still an optional Q&A set of sessions. But, if I don't see you, good luck with everything: enjoy your MSc and have a happy life!
More informationPreliminary Statistics. Lecture 5: Hypothesis Testing
Preliminary Statistics Lecture 5: Hypothesis Testing Rory Macqueen (rm43@soas.ac.uk), September 2015 Outline Elements/Terminology of Hypothesis Testing Types of Errors Procedure of Testing Significance
More informationIntroductory Econometrics. Review of statistics (Part II: Inference)
Introductory Econometrics Review of statistics (Part II: Inference) Jun Ma School of Economics Renmin University of China October 1, 2018 1/16 Null and alternative hypotheses Usually, we have two competing
More informationLeast Squares Estimation-Finite-Sample Properties
Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions
More information2 Prediction and Analysis of Variance
2 Prediction and Analysis of Variance Reading: Chapters and 2 of Kennedy A Guide to Econometrics Achen, Christopher H. Interpreting and Using Regression (London: Sage, 982). Chapter 4 of Andy Field, Discovering
More informationStatistical Distribution Assumptions of General Linear Models
Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions
More informationEconometrics Summary Algebraic and Statistical Preliminaries
Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L
More informationMATH2206 Prob Stat/20.Jan Weekly Review 1-2
MATH2206 Prob Stat/20.Jan.2017 Weekly Review 1-2 This week I explained the idea behind the formula of the well-known statistic standard deviation so that it is clear now why it is a measure of dispersion
More informationParameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!
Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses
More informationMonte Carlo Studies. The response in a Monte Carlo study is a random variable.
Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating
More informationNotes for Week 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1
Notes for Wee 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1 Exam 3 is on Friday May 1. A part of one of the exam problems is on Predictiontervals : When randomly sampling from a normal population
More informationParameter Estimation
Parameter Estimation Consider a sample of observations on a random variable Y. his generates random variables: (y 1, y 2,, y ). A random sample is a sample (y 1, y 2,, y ) where the random variables y
More informationCentral Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom
Central Limit Theorem and the Law of Large Numbers Class 6, 8.5 Jeremy Orloff and Jonathan Bloom Learning Goals. Understand the statement of the law of large numbers. 2. Understand the statement of the
More informationApplied Econometrics (MSc.) Lecture 3 Instrumental Variables
Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Estimation - Theory Department of Economics University of Gothenburg December 4, 2014 1/28 Why IV estimation? So far, in OLS, we assumed independence.
More informationProbability theory basics
Probability theory basics Michael Franke Basics of probability theory: axiomatic definition, interpretation, joint distributions, marginalization, conditional probability & Bayes rule. Random variables:
More informationRegression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood
Regression Estimation - Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. β 0, β 1 Q = n (Y i (β 0 + β 1 X i )) 2 i=1 Minimize this by maximizing
More information2. Linear regression with multiple regressors
2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 12: Frequentist properties of estimators (v4) Ramesh Johari ramesh.johari@stanford.edu 1 / 39 Frequentist inference 2 / 39 Thinking like a frequentist Suppose that for some
More informationStatistical Inference
Statistical Inference Classical and Bayesian Methods Revision Class for Midterm Exam AMS-UCSC Th Feb 9, 2012 Winter 2012. Session 1 (Revision Class) AMS-132/206 Th Feb 9, 2012 1 / 23 Topics Topics We will
More informationMonte Carlo Simulations and the PcNaive Software
Econometrics 2 Monte Carlo Simulations and the PcNaive Software Heino Bohn Nielsen 1of21 Monte Carlo Simulations MC simulations were introduced in Econometrics 1. Formalizing the thought experiment underlying
More informationIntroduction to Design of Experiments
Introduction to Design of Experiments Jean-Marc Vincent and Arnaud Legrand Laboratory ID-IMAG MESCAL Project Universities of Grenoble {Jean-Marc.Vincent,Arnaud.Legrand}@imag.fr November 20, 2011 J.-M.
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Hypothesis testing. Anna Wegloop Niels Landwehr/Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Hypothesis testing Anna Wegloop iels Landwehr/Tobias Scheffer Why do a statistical test? input computer model output Outlook ull-hypothesis
More informationRegression Estimation Least Squares and Maximum Likelihood
Regression Estimation Least Squares and Maximum Likelihood Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 1 Least Squares Max(min)imization Function to minimize
More informationApplied Quantitative Methods II
Applied Quantitative Methods II Lecture 4: OLS and Statistics revision Klára Kaĺıšková Klára Kaĺıšková AQM II - Lecture 4 VŠE, SS 2016/17 1 / 68 Outline 1 Econometric analysis Properties of an estimator
More informationIf we want to analyze experimental or simulated data we might encounter the following tasks:
Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction
More informationRegression with a Single Regressor: Hypothesis Tests and Confidence Intervals
Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression
More informationINTRODUCTION TO ANALYSIS OF VARIANCE
CHAPTER 22 INTRODUCTION TO ANALYSIS OF VARIANCE Chapter 18 on inferences about population means illustrated two hypothesis testing situations: for one population mean and for the difference between two
More informationBusiness Statistics: Lecture 8: Introduction to Estimation & Hypothesis Testing
Business Statistics: Lecture 8: Introduction to Estimation & Hypothesis Testing Agenda Introduction to Estimation Point estimation Interval estimation Introduction to Hypothesis Testing Concepts en terminology
More information1/24/2008. Review of Statistical Inference. C.1 A Sample of Data. C.2 An Econometric Model. C.4 Estimating the Population Variance and Other Moments
/4/008 Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University C. A Sample of Data C. An Econometric Model C.3 Estimating the Mean of a Population C.4 Estimating the Population
More informationSTAT 135 Lab 5 Bootstrapping and Hypothesis Testing
STAT 135 Lab 5 Bootstrapping and Hypothesis Testing Rebecca Barter March 2, 2015 The Bootstrap Bootstrap Suppose that we are interested in estimating a parameter θ from some population with members x 1,...,
More informationEco517 Fall 2014 C. Sims MIDTERM EXAM
Eco57 Fall 204 C. Sims MIDTERM EXAM You have 90 minutes for this exam and there are a total of 90 points. The points for each question are listed at the beginning of the question. Answer all questions.
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple
More informationIntroduction to Econometrics
Introduction to Econometrics Lecture 2 : Causal Inference and Random Control Trails(RCT) Zhaopeng Qu Business School,Nanjing University Sep 18th, 2017 Zhaopeng Qu (Nanjing University) Introduction to Econometrics
More informationStatistical Inference with Regression Analysis
Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Steven Buck Lecture #13 Statistical Inference with Regression Analysis Next we turn to calculating confidence intervals and hypothesis testing
More informationEconomic modelling and forecasting
Economic modelling and forecasting 2-6 February 2015 Bank of England he generalised method of moments Ole Rummel Adviser, CCBS at the Bank of England ole.rummel@bankofengland.co.uk Outline Classical estimation
More informationexp{ (x i) 2 i=1 n i=1 (x i a) 2 (x i ) 2 = exp{ i=1 n i=1 n 2ax i a 2 i=1
4 Hypothesis testing 4. Simple hypotheses A computer tries to distinguish between two sources of signals. Both sources emit independent signals with normally distributed intensity, the signals of the first
More information1 Random walks and data
Inference, Models and Simulation for Complex Systems CSCI 7-1 Lecture 7 15 September 11 Prof. Aaron Clauset 1 Random walks and data Supposeyou have some time-series data x 1,x,x 3,...,x T and you want
More informationreview session gov 2000 gov 2000 () review session 1 / 38
review session gov 2000 gov 2000 () review session 1 / 38 Overview Random Variables and Probability Univariate Statistics Bivariate Statistics Multivariate Statistics Causal Inference gov 2000 () review
More informationMultiple Regression Analysis
Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators
More informationEconometrics 2, Class 1
Econometrics 2, Class Problem Set #2 September 9, 25 Remember! Send an email to let me know that you are following these classes: paul.sharp@econ.ku.dk That way I can contact you e.g. if I need to cancel
More informationIntelligent Embedded Systems Uncertainty, Information and Learning Mechanisms (Part 1)
Advanced Research Intelligent Embedded Systems Uncertainty, Information and Learning Mechanisms (Part 1) Intelligence for Embedded Systems Ph. D. and Master Course Manuel Roveri Politecnico di Milano,
More informationMathematical Statistics
Mathematical Statistics MAS 713 Chapter 8 Previous lecture: 1 Bayesian Inference 2 Decision theory 3 Bayesian Vs. Frequentist 4 Loss functions 5 Conjugate priors Any questions? Mathematical Statistics
More information