Chapter 1

What Is Econometrics?

1.1 Data

1.1.1 Accounting

Definition 1.1 Accounting data is routinely recorded data (that is, records of transactions) collected as part of market activities.

Most of the data used in econometrics is accounting data. Accounting data has many shortcomings, since it is not collected specifically for the purposes of the econometrician. An econometrician would prefer data collected in experiments or gathered for her purposes.

1.1.2 Nonexperimental

The econometrician does not have any control over nonexperimental data. This causes various problems. Econometrics is used to deal with or solve these problems.

1.1.3 Data Types

Definition 1.2 Time-series data are data that represent repeated observations of some variable in subsequent time periods. A time-series variable is often subscripted with the letter t.

Definition 1.3 Cross-sectional data are data that represent a set of observations of some variable at one specific instant over several agents. A cross-sectional variable is often subscripted with the letter i.

Definition 1.4 Time-series cross-sectional data are data that are both time-series and cross-sectional.
A special case of time-series cross-sectional data is panel data. Panel data are observations of the same set of agents over time.

1.1.4 Empirical Regularities

We need to look at the data to detect regularities. Often, we use stylized facts, but this can lead to over-simplifications.

1.2 Models

Models are simplifications of the real world. The data we use in our model is what motivates theory.

1.2.1 Economic Theory

By choosing assumptions, the …

1.2.2 Postulated Relationships

Models can also be developed by postulating relationships among the variables.

1.2.3 Equations

Economic models are usually stated in terms of one or more equations.

1.2.4 Error Component

Because none of our models are exactly correct, we include an error component in our equations, usually denoted u_i. In econometrics, we usually assume that the error component is stochastic (that is, random). It is important to note that the error component cannot be modeled by economic theory. We impose assumptions on u_i and, as econometricians, focus on u_i.

1.3 Statistics

1.3.1 Stochastic Component

Some stuff here.

1.3.2 Statistical Analysis

Some stuff here.
1.3.3 Tasks

There are three main tasks of econometrics:

1. estimating parameters;
2. hypothesis testing;
3. forecasting.

Forecasting is perhaps the main reason for econometrics.

1.3.4 Econometrics

Since our data is, in general, nonexperimental, econometrics makes use of economic theory to adjust for the lack of proper data.

1.4 Interaction

1.4.1 Scientific Method

Econometrics uses the scientific method, but the data are nonexperimental. In some sense, this is similar to astronomy: astronomers gather data but cannot conduct experiments (for example, astronomers predict the existence of black holes, but have never made one in a lab).

1.4.2 Role Of Econometrics

The three components of econometrics are:

1. theory;
2. statistics;
3. data.

These components are interdependent, and each helps shape the others.

1.4.3 Occam's Razor

Often in econometrics, we are faced with the problem of choosing one model over an alternative. The simplest model that fits the data is often the best choice.
Chapter 2

Some Useful Distributions

2.1 Introduction

2.1.1 Inferences

Statistical Statements. As statisticians, we are often called upon to answer questions or make statements concerning certain random variables. For example: is a coin fair (i.e., is the probability of heads 0.5)? What is the expected value of GNP for the quarter?

Population Distribution. Typically, answering such questions requires knowledge of the distribution of the random variable. Unfortunately, we usually do not know this distribution (although we may have a strong idea, as in the case of the coin).

Experimentation. In order to gain knowledge of the distribution, we draw several realizations of the random variable. The notion is that the observations in this sample contain information concerning the population distribution.

Inference.

Definition 2.1 The process by which we make statements concerning the population distribution based on the sample observations is called inference.

Example 2.1 We decide whether a coin is fair by tossing it several times and observing whether it seems to be heads about half the time.
2.1.2 Random Samples

Suppose we draw n observations of a random variable, denoted {x_1, x_2, ..., x_n}, and each x_i is independent and has the same (marginal) distribution; then {x_1, x_2, ..., x_n} constitute a simple random sample.

Example 2.2 We toss a coin three times. Supposedly, the outcomes are independent. If x_i counts the number of heads for toss i, then we have a simple random sample.

Note that not all samples are simple random samples.

Example 2.3 We are interested in the income level for the population in general. The n observations available in this class are not identical, since the higher-income individuals will tend to be more variable.

Example 2.4 Consider the aggregate consumption level. The n observations available in this set are not independent, since a high consumption level in one period is usually followed by a high level in the next.

2.1.3 Sample Statistics

Definition 2.2 Any function of the observations in the sample which is the basis for inference is called a sample statistic.

Example 2.5 In the coin-tossing experiment, let S count the total number of heads and P = S/3 count the sample proportion of heads. Both S and P are sample statistics.

2.1.4 Sample Distributions

A sample statistic is a random variable: its value will vary from one experiment to another. As a random variable, it is subject to a distribution.

Definition 2.3 The distribution of the sample statistic is the sample distribution of the statistic.

Example 2.6 The statistic S introduced above has a binomial sample distribution. Specifically, Pr(S = 0) = 1/8, Pr(S = 1) = 3/8, Pr(S = 2) = 3/8, and Pr(S = 3) = 1/8.
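The sample distribution in Example 2.6 is easy to check numerically. The notes contain no code, so the following Python sketch (variable names such as `exact` and `simulated` are my own) enumerates the eight equally likely outcomes of three fair tosses and compares the exact probabilities with a Monte Carlo version of the same experiment.

```python
import random
from itertools import product

# Exact sample distribution of S = total heads in 3 fair tosses,
# by enumerating the 8 equally likely outcome sequences.
exact = {s: 0.0 for s in range(4)}
for outcome in product([0, 1], repeat=3):
    exact[sum(outcome)] += 1 / 8

# Monte Carlo version: repeat the 3-toss experiment many times.
random.seed(0)
reps = 100_000
counts = [0, 0, 0, 0]
for _ in range(reps):
    counts[sum(random.randint(0, 1) for _ in range(3))] += 1
simulated = [c / reps for c in counts]
```

Both approaches reproduce the probabilities 1/8, 3/8, 3/8, 1/8 stated in the example.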
2.2 Normality And The Sample Mean

2.2.1 Sample Sum

Consider the simple random sample {x_1, x_2, ..., x_n}, where x_i measures the height of an adult female. We will assume that E x_i = μ and Var(x_i) = σ², for all i = 1, 2, ..., n.

Let S = x_1 + x_2 + ... + x_n denote the sample sum. Now,

  E S = E(x_1 + x_2 + ... + x_n) = E x_1 + E x_2 + ... + E x_n = nμ.   (2.1)

Also,

  Var(S) = E(S − E S)² = E(S − nμ)²
         = E(x_1 + x_2 + ... + x_n − nμ)²
         = E[Σ_{i=1}^n (x_i − μ)]²
         = E[(x_1 − μ)² + (x_1 − μ)(x_2 − μ) + ... + (x_2 − μ)(x_1 − μ) + (x_2 − μ)² + ... + (x_n − μ)²]
         = nσ².   (2.2)

Note that E[(x_i − μ)(x_j − μ)] = 0 for i ≠ j, by independence.

2.2.2 Sample Mean

Let x̄ = S/n denote the sample mean or average.

2.2.3 Moments Of The Sample Mean

The mean of the sample mean is

  E x̄ = E(S/n) = (1/n) E S = (1/n) nμ = μ.   (2.3)

The variance of the sample mean is

  Var(x̄) = E(x̄ − μ)² = E(S/n − μ)² = E[(1/n)(S − nμ)]² = (1/n²) E(S − nμ)²
         = (1/n²) nσ² = σ²/n.   (2.4)

2.2.4 Sampling Distribution

We have been able to establish the mean and variance of the sample mean. However, in order to know its complete distribution precisely, we must know the probability density function (pdf) of the random variable x̄.

2.3 The Normal Distribution

2.3.1 Density Function

Definition 2.4 A continuous random variable x_i with the density function

  f(x_i) = (1/√(2πσ²)) e^{−(x_i − μ)²/(2σ²)}   (2.5)

follows the normal distribution, where μ and σ² are the mean and variance of x_i, respectively.

Since the distribution is characterized by the two parameters μ and σ², we denote a normal random variable by x_i ~ N(μ, σ²).

The normal density function is the familiar bell-shaped curve, as shown in Figure 2.1 for μ = 0 and σ² = 1. It is symmetric about the mean μ. Approximately 2/3 of the probability mass lies within ±σ of μ and about .95 lies within ±2σ. There are numerous examples of random variables that have this shape. Many economic variables are assumed to be normally distributed.

2.3.2 Linear Transformations

Consider the transformed random variable

  Y_i = a + b x_i.

We know that

  μ_Y = E Y_i = a + b μ_x

and

  σ_Y² = E(Y_i − μ_Y)² = b² σ_x².

If x_i is normally distributed, then Y_i is normally distributed as well. That is, Y_i ~ N(μ_Y, σ_Y²).
Figure 2.1: The Standard Normal Distribution

Moreover, if x_i ~ N(μ_x, σ_x²) and z_i ~ N(μ_z, σ_z²) are independent, then

  Y_i = a + b x_i + c z_i ~ N(a + b μ_x + c μ_z, b² σ_x² + c² σ_z²).

These results will be formally demonstrated in a more general setting in the next chapter.

2.3.3 Distribution Of The Sample Mean

If, for each i = 1, 2, ..., n, the x_i are independent, identically distributed (i.i.d.) normal random variables, then

  x̄ ~ N(μ_x, σ_x²/n).   (2.6)

2.3.4 The Standard Normal

The distribution of x̄ will vary with different values of μ_x and σ_x², which is inconvenient. Rather than dealing with a unique distribution for each case, we
perform the following transformation:

  Z = (x̄ − μ_x)/√(σ_x²/n) = x̄/√(σ_x²/n) − μ_x/√(σ_x²/n).

Now,

  E Z = (1/√(σ_x²/n)) E x̄ − μ_x/√(σ_x²/n) = μ_x/√(σ_x²/n) − μ_x/√(σ_x²/n) = 0.

Also,

  Var(Z) = E[(x̄ − μ_x)/√(σ_x²/n)]² = (n/σ_x²) E(x̄ − μ_x)² = (n/σ_x²)(σ_x²/n) = 1.

Thus Z ~ N(0, 1). The N(0, 1) distribution is the standard normal and is well tabulated. The probability density function for the standard normal distribution is

  f(z_i) = (1/√(2π)) e^{−z_i²/2} = φ(z_i).   (2.8)

2.4 The Central Limit Theorem

2.4.1 Normal Theory

The normal density has a prominent position in statistics. This is not only because many random variables appear to be normal, but also because most any sample mean appears normal as the sample size increases. Specifically, suppose x_1, x_2, ..., x_n is a simple random sample with E x_i = μ_x and Var(x_i) = σ_x²; then, as n → ∞, the distribution of x̄ becomes normal. That
is,

  lim_{n→∞} f((x̄ − μ_x)/√(σ_x²/n)) = φ((x̄ − μ_x)/√(σ_x²/n)).   (2.9)

2.5 Distributions Associated With The Normal Distribution

2.5.1 The Chi-Square Distribution

Definition 2.5 Suppose that Z_1, Z_2, ..., Z_n is a simple random sample, and Z_i ~ N(0, 1). Then

  Σ_{i=1}^n Z_i² ~ χ²_n,   (2.10)

where n is the degrees of freedom of the chi-square distribution.

The probability density function of the χ²_n is

  f_{χ²}(x) = [1/(2^{n/2} Γ(n/2))] x^{n/2 − 1} e^{−x/2},  x > 0,   (2.11)

where Γ(·) is the gamma function. See Figure 2.2.

If x_1, x_2, ..., x_n is a simple random sample, and x_i ~ N(μ_x, σ_x²), then

  Σ_{i=1}^n ((x_i − μ)/σ)² ~ χ²_n.   (2.12)

The chi-square distribution will prove useful in testing hypotheses on both the variance of a single variable and the (conditional) means of several. This multivariate usage will be explored in the next chapter.

Example 2.7 Consider the estimator of σ²

  s² = Σ_{i=1}^n (x_i − x̄)²/(n − 1).

Then

  (n − 1) s²/σ² ~ χ²_{n−1}.   (2.13)
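Equation (2.13) can be checked by simulation. This sketch is not part of the original notes; it assumes NumPy is available and uses my own variable names. It draws many samples of size n from a normal distribution and verifies that (n − 1)s²/σ² has the mean (n − 1) and variance 2(n − 1) of a χ²_{n−1} random variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma = 10, 5.0, 2.0
reps = 200_000

x = rng.normal(mu, sigma, size=(reps, n))
s2 = x.var(axis=1, ddof=1)            # unbiased estimator of sigma^2
q = (n - 1) * s2 / sigma**2           # should be chi-square with n-1 df

mean_q = q.mean()                     # chi2_{n-1} has mean n-1 = 9
var_q = q.var()                       # and variance 2(n-1) = 18
```

With 200,000 replications, `mean_q` and `var_q` land very close to 9 and 18.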
Figure 2.2: Some Chi-Square Distributions

2.5.2 The t Distribution

Definition 2.6 Suppose that Z ~ N(0, 1), Y ~ χ²_k, and that Z and Y are independent. Then

  Z/√(Y/k) ~ t_k,   (2.14)

where k is the degrees of freedom of the t distribution.

The probability density function of a t random variable with n degrees of freedom is

  f_t(x) = [Γ((n+1)/2)/(√(nπ) Γ(n/2))] (1 + x²/n)^{−(n+1)/2},   (2.15)

for −∞ < x < ∞. See Figure 2.3.

The t (also known as Student's t) distribution is named after W. S. Gosset, who published under the pseudonym "Student." It is useful in testing hypotheses concerning the (conditional) mean when the variance is estimated.

Example 2.8 Consider the sample mean from a simple random sample of normals. We know that x̄ ~ N(μ, σ²/n) and

  Z = (x̄ − μ)/√(σ²/n) ~ N(0, 1).
Figure 2.3: Some t Distributions

Also, we know that

  Y = (n − 1) s²/σ² ~ χ²_{n−1},

where s² is the unbiased estimator of σ². Thus, if Z and Y are independent (which, in fact, is the case), then

  Z/√(Y/(n−1)) = [(x̄ − μ)/√(σ²/n)] / √{[(n−1)s²/σ²]/(n−1)}
              = √(σ²/s²) (x̄ − μ)/√(σ²/n)
              = (x̄ − μ)/√(s²/n) ~ t_{n−1}.   (2.16)
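The studentized ratio in (2.16) can also be checked numerically. The following sketch (mine, not from the notes; assumes NumPy) builds the ratio (x̄ − μ)/√(s²/n) from normal samples with n = 6 and verifies that its sample variance matches the t_{5} variance 5/(5 − 2) = 5/3, which is larger than the N(0, 1) variance because s² is estimated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu, sigma = 6, 0.0, 3.0
reps = 400_000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)            # unbiased variance estimator
t = (xbar - mu) / np.sqrt(s2 / n)     # should be t with n-1 = 5 df

mean_t = t.mean()                     # t_5 is symmetric about 0
var_t = t.var()                       # t_5 variance is 5/(5-2) = 5/3
```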
2.5.3 The F Distribution

Definition 2.7 Suppose that Y ~ χ²_m, W ~ χ²_n, and that Y and W are independent. Then

  (Y/m)/(W/n) ~ F_{m,n},   (2.17)

where m, n are the degrees of freedom of the F distribution.

The probability density function of an F random variable with m and n degrees of freedom is

  f_F(x) = [Γ((m+n)/2) (m/n)^{m/2} / (Γ(n/2) Γ(m/2))] x^{m/2 − 1} (1 + mx/n)^{−(m+n)/2}.   (2.18)

The F distribution is named after the great statistician Sir Ronald A. Fisher and is used in many applications, most notably in the analysis of variance. This situation will arise when we seek to test multiple (conditional) mean parameters with estimated variance. Note that when x ~ t_n, then x² ~ F_{1,n}. Some examples of the F distribution can be seen in Figure 2.4.

Figure 2.4: Some F Distributions
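The relation x ~ t_n ⟹ x² ~ F_{1,n} can be illustrated by building both variables from their definitions. This sketch is my own illustration (assumes NumPy): it constructs t_k draws as Z/√(Y/k) and F_{1,k} draws as (Y₁/1)/(W/k), then compares the medians of t² and F, which should agree.

```python
import numpy as np

rng = np.random.default_rng(2)
reps, k = 300_000, 8

# t_k from Definition 2.6: Z / sqrt(Y/k), Z ~ N(0,1), Y ~ chi2_k.
z = rng.standard_normal(reps)
y = rng.chisquare(k, reps)
t = z / np.sqrt(y / k)

# F_{1,k} from Definition 2.7: (Y1/1) / (W/k), Y1 ~ chi2_1, W ~ chi2_k.
y1 = rng.chisquare(1, reps)
w = rng.chisquare(k, reps)
f = y1 / (w / k)

# t^2 and F_{1,k} have the same distribution; compare their medians.
med_t2 = np.median(t**2)
med_f = np.median(f)
```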
Chapter 3

Multivariate Distributions

3.1 Matrix Algebra Of Expectations

3.1.1 Moments of Random Vectors

Let x = (x_1, x_2, ..., x_m)′ be an m × 1 vector-valued random variable. Each element of the vector is a scalar random variable of the type discussed in the previous chapter.

The expectation of a random vector is

  E[x] = (E[x_1], E[x_2], ..., E[x_m])′ = (μ_1, μ_2, ..., μ_m)′ = μ.   (3.1)

Note that μ is also an m × 1 column vector. We see that the mean of the vector is the vector of the means.

Next, we evaluate the following:

  E[(x − μ)(x − μ)′] =
    E | (x_1 − μ_1)²           (x_1 − μ_1)(x_2 − μ_2)  ...  (x_1 − μ_1)(x_m − μ_m) |
      | (x_2 − μ_2)(x_1 − μ_1) (x_2 − μ_2)²            ...  (x_2 − μ_2)(x_m − μ_m) |
      | ...                                                                         |
      | (x_m − μ_m)(x_1 − μ_1) (x_m − μ_m)(x_2 − μ_2)  ...  (x_m − μ_m)²           |
  = | σ_11  σ_12  ...  σ_1m |
    | σ_21  σ_22  ...  σ_2m |
    | ...                   |
    | σ_m1  σ_m2  ...  σ_mm |  = Σ.   (3.2)

Σ, the covariance matrix, is an m × m matrix of variance and covariance terms. The variance σ_i² = σ_ii of x_i is along the diagonal, while the cross-product terms represent the covariance between x_i and x_j.

3.1.2 Properties Of The Covariance Matrix

Symmetric. The variance-covariance matrix Σ is a symmetric matrix. This can be shown by noting that

  σ_ij = E(x_i − μ_i)(x_j − μ_j) = E(x_j − μ_j)(x_i − μ_i) = σ_ji.

Due to this symmetry, Σ will only have m(m + 1)/2 unique elements.

Positive Semidefinite. Σ is a positive semidefinite matrix. Recall that an m × m matrix is positive semidefinite if and only if it meets any of the following three equivalent conditions:

1. all the principal minors are nonnegative;
2. λ′Σλ ≥ 0, for all m × 1 vectors λ ≠ 0;
3. Σ = PP′, for some m × m matrix P.

The first condition (actually, we use negative definiteness) is useful in the study of utility maximization, while the latter two are useful in econometric analysis. The second condition is the easiest to demonstrate in the current context. Let λ ≠ 0. Then we have

  λ′Σλ = λ′E[(x − μ)(x − μ)′]λ = E[λ′(x − μ)(x − μ)′λ] = E{[λ′(x − μ)]²} ≥ 0,
since the term inside the expectation is a quadratic. Hence, Σ is a positive semidefinite matrix.

Note that a P satisfying the third relationship is not unique. Let D be any m × m orthonormal matrix, so DD′ = I_m; then P* = PD yields

  P*P*′ = PDD′P′ = P I_m P′ = Σ.

Usually, we will choose P to be an upper or lower triangular matrix with m(m + 1)/2 nonzero elements.

Positive Definite. Since Σ is a positive semidefinite matrix, it will be a positive definite matrix if and only if det(Σ) ≠ 0. Now, we know that Σ = PP′ for some m × m matrix P. This implies that det(P) ≠ 0.

3.1.3 Linear Transformations

Let y = b + Bx, where y and b are m × 1 and B is m × m. Then

  E[y] = b + B E[x] = b + Bμ = μ_y.   (3.3)

Thus, the mean of a linear transformation is the linear transformation of the mean. Next, we have

  E[(y − μ_y)(y − μ_y)′] = E{[B(x − μ)][B(x − μ)]′}
                        = B E[(x − μ)(x − μ)′] B′
                        = BΣB′   (3.4)
                        = Σ_y,   (3.5)

where we use the result (ABC)′ = C′B′A′, if conformability holds.

3.2 Change Of Variables

3.2.1 Univariate

Let x be a random variable and f_x(·) be the probability density function of x. Now, define y = h(x), where

  h′(x) = dh(x)/dx > 0.
That is, h(x) is a strictly monotonically increasing function, and so y is a one-to-one transformation of x. Now, we would like to know the probability density function of y, f_y(y). To find it, we note that

  Pr(y ≤ h(a)) = Pr(x ≤ a),   (3.6)

and

  Pr(x ≤ a) = ∫_{−∞}^{a} f_x(x) dx = F_x(a),   (3.7)

  Pr(y ≤ h(a)) = ∫_{−∞}^{h(a)} f_y(y) dy = F_y(h(a)),   (3.8)

for all a. Assuming that the cumulative density function is differentiable, we use (3.6) to combine (3.7) and (3.8), and take the total differential, which gives us

  F_x(a) = F_y(h(a))
  f_x(a) da = f_y(h(a)) h′(a) da

for all a. Thus, for a small perturbation,

  f_x(a) = f_y(h(a)) h′(a)   (3.9)

for all a. Also, since y is a one-to-one transformation of x, we know that h(·) can be inverted. That is, x = h⁻¹(y). Thus, a = h⁻¹(y), and we can rewrite (3.9) as

  f_x(h⁻¹(y)) = f_y(y) h′(h⁻¹(y)).

Therefore, the probability density function of y is

  f_y(y) = f_x(h⁻¹(y)) / h′(h⁻¹(y)).   (3.10)

Note that f_y(y) has the property of being nonnegative, since h′(·) > 0. If h′(·) < 0, (3.10) can be corrected by taking the absolute value of h′(·), which will ensure that we have only positive values for our probability density function.

3.2.2 Geometric Interpretation

Consider the graph of the relationship shown in Figure 3.1. We know that

  Pr[h(b) > y > h(a)] = Pr(b > x > a).

Also, we know that

  Pr[h(b) > y > h(a)] ≈ f_y[h(b)][h(b) − h(a)],
Figure 3.1: Change of Variables

and

  Pr(b > x > a) ≈ f_x(b)(b − a).

So,

  f_y[h(b)][h(b) − h(a)] ≈ f_x(b)(b − a)

  f_y[h(b)] ≈ f_x(b) / {[h(b) − h(a)]/(b − a)}.   (3.11)

Now, as we let a → b, the denominator of (3.11) approaches h′(b). This is then the same formula as (3.10).

3.2.3 Multivariate

Let x be m × 1 with density f_x(x). Define a one-to-one transformation y = h(x), where y is m × 1. Since h(·) is a one-to-one transformation, it has an inverse:

  x = h⁻¹(y).
We also assume that ∂h(x)/∂x′ exists. This is the m × m Jacobian matrix

  J_x(x) = ∂h(x)/∂x′ = | ∂h_1(x)/∂x_1  ∂h_1(x)/∂x_2  ...  ∂h_1(x)/∂x_m |
                       | ∂h_2(x)/∂x_1  ∂h_2(x)/∂x_2  ...  ∂h_2(x)/∂x_m |
                       | ...                                            |
                       | ∂h_m(x)/∂x_1  ∂h_m(x)/∂x_2  ...  ∂h_m(x)/∂x_m |.   (3.12)

Given this notation, the multivariate analog to (3.10) can be shown to be

  f_y(y) = f_x[h⁻¹(y)] |det(J_x[h⁻¹(y)])|⁻¹.   (3.13)

Since h(·) is differentiable and one-to-one, det(J_x[h⁻¹(y)]) ≠ 0.

Example 3.1 Let y = b_0 + b_1 x, where x, b_0, and b_1 are scalars. Then x = (y − b_0)/b_1 and dy/dx = b_1. Therefore,

  f_y(y) = f_x((y − b_0)/b_1) |b_1|⁻¹.

Example 3.2 Let y = b + Bx, where y is m × 1 and det(B) ≠ 0. Then x = B⁻¹(y − b) and ∂y/∂x′ = B = J_x(x). Thus,

  f_y(y) = f_x[B⁻¹(y − b)] |det(B)|⁻¹.
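The univariate change-of-variables formula (3.10) can be verified numerically with a concrete nonlinear transformation. This sketch is my own (not from the notes) and uses only the standard library: with h(x) = eˣ and x ~ N(0, 1), formula (3.10) gives the lognormal density f_y(y) = φ(ln y)/y; integrating that density over (1, 2) should match the Monte Carlo frequency of y = eˣ landing in (1, 2).

```python
import math
import random

# h(x) = exp(x), x ~ N(0,1): h'(x) = exp(x), h^{-1}(y) = log(y),
# so (3.10) gives the lognormal density f_y(y) = phi(log y) / y.
def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def f_y(y):
    return phi(math.log(y)) / y

# P(1 < y < 2) from the density, via a midpoint rule on a fine grid.
step = 0.001
grid = [1 + (i + 0.5) * step for i in range(1000)]
p_density = sum(f_y(y) * step for y in grid)

# The same probability by simulating y = exp(x) directly.
random.seed(3)
reps = 200_000
hits = sum(1 for _ in range(reps) if 1 < math.exp(random.gauss(0, 1)) < 2)
p_mc = hits / reps
```

Both numbers agree with the exact value Pr(0 < x < ln 2) = Φ(ln 2) − 1/2 ≈ 0.256.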
3.3 Multivariate Normal Distribution

3.3.1 Spherical Normal Distribution

Definition 3.1 An m × 1 random vector z is said to be spherically normally distributed if

  f(z) = (2π)^{−m/2} e^{−z′z/2}.

Such a random vector can be seen to be a vector of independent standard normals. Let z_1, z_2, ..., z_m be i.i.d. random variables such that z_i ~ N(0, 1). That is, z_i has the pdf given in (2.8), for i = 1, ..., m. Then, by independence, the joint distribution of the z_i is given by

  f(z_1, z_2, ..., z_m) = f(z_1) f(z_2) ··· f(z_m)
                       = ∏_{i=1}^m (1/√(2π)) e^{−z_i²/2}
                       = (2π)^{−m/2} e^{−(1/2) Σ_{i=1}^m z_i²} = (2π)^{−m/2} e^{−z′z/2},   (3.14)

where z′ = (z_1 z_2 ... z_m).

3.3.2 Multivariate Normal

Definition 3.2 The m × 1 random vector x with density

  f_x(x) = (2π)^{−m/2} [det(Σ)]^{−1/2} e^{−(x−μ)′Σ⁻¹(x−μ)/2}   (3.15)

is said to be distributed multivariate normal with mean vector μ and positive definite covariance matrix Σ. Such a distribution for x is denoted by x ~ N(μ, Σ).

The spherical normal distribution is seen to be a special case where μ = 0 and Σ = I_m. There is a one-to-one relationship between the multivariate normal random vector and a spherical normal random vector. Let z be an m × 1 spherical normal random vector, and

  x = μ + Az,

where det(A) ≠ 0. Then,

  E x = μ + A E z = μ,   (3.16)
since E[z] = 0. Also, we know that

  E(zz′) = E | z_1²     z_1 z_2  ...  z_1 z_m |
             | z_2 z_1  z_2²     ...  z_2 z_m |
             | ...                            |
             | z_m z_1  z_m z_2  ...  z_m²    |  = I_m,   (3.17)

since E[z_i z_j] = 0, for all i ≠ j, and E[z_i²] = 1, for all i. Therefore,

  E[(x − μ)(x − μ)′] = E(Azz′A′) = A E(zz′) A′ = A I_m A′ = AA′ = Σ,   (3.18)

where Σ is a positive definite matrix (since det(A) ≠ 0).

Next, we need to find the probability density function f_x(x) of x. We know that z = A⁻¹(x − μ), z′ = (x − μ)′(A⁻¹)′, and J_z(z) = A, so we use (3.13) to get

  f_x(x) = f_z[A⁻¹(x − μ)] |det(A)|⁻¹
         = (2π)^{−m/2} e^{−(x−μ)′(A⁻¹)′A⁻¹(x−μ)/2} |det(A)|⁻¹
         = (2π)^{−m/2} e^{−(x−μ)′(AA′)⁻¹(x−μ)/2} |det(A)|⁻¹,   (3.19)

where we use the results (ABC)⁻¹ = C⁻¹B⁻¹A⁻¹ and (A⁻¹)′ = (A′)⁻¹. However, Σ = AA′, so det(Σ) = det(A) det(A′) = [det(A)]², and |det(A)| = [det(Σ)]^{1/2}. Thus we can rewrite (3.19) as

  f_x(x) = (2π)^{−m/2} [det(Σ)]^{−1/2} e^{−(x−μ)′Σ⁻¹(x−μ)/2},   (3.20)

and we see that x ~ N(μ, Σ) with mean vector μ and covariance matrix Σ. Since this process is completely reversible, the relationship is one-to-one.
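The construction x = μ + Az with Σ = AA′ is exactly how multivariate normal draws are generated in practice. Here is a small sketch of the idea (my own, not from the notes; assumes NumPy): factor Σ with a Cholesky decomposition, push spherical normal draws through the map, and check that the sample mean and covariance recover μ and Σ.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Factor Sigma = A A' (Cholesky gives a lower-triangular A), then map a
# spherical normal z through x = mu + A z.
A = np.linalg.cholesky(Sigma)
reps = 300_000
z = rng.standard_normal((reps, 2))
x = mu + z @ A.T

mean_hat = x.mean(axis=0)                 # should approach mu  (3.16)
cov_hat = np.cov(x, rowvar=False)         # should approach Sigma  (3.18)
```

The choice of A is not unique, as noted above: any A with AA′ = Σ produces the same distribution.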
3.3.3 Linear Transformations

Theorem 3.1 Suppose x ~ N(μ, Σ) with det(Σ) ≠ 0, and y = b + Bx with B square and det(B) ≠ 0. Then y ~ N(μ_y, Σ_y).

Proof: From (3.3) and (3.4), we have E y = b + Bμ = μ_y and E[(y − μ_y)(y − μ_y)′] = BΣB′ = Σ_y. To find the probability density function f_y(y) of y, we again use (3.13), which gives us

  f_y(y) = f_x[B⁻¹(y − b)] |det(B)|⁻¹
         = (2π)^{−m/2} [det(Σ)]^{−1/2} e^{−[B⁻¹(y−b)−μ]′Σ⁻¹[B⁻¹(y−b)−μ]/2} |det(B)|⁻¹
         = (2π)^{−m/2} [det(BΣB′)]^{−1/2} e^{−(y−b−Bμ)′(BΣB′)⁻¹(y−b−Bμ)/2}
         = (2π)^{−m/2} [det(Σ_y)]^{−1/2} e^{−(y−μ_y)′Σ_y⁻¹(y−μ_y)/2}.   (3.21)

So, y ~ N(μ_y, Σ_y).

Thus we see, as asserted in the previous chapter, that linear transformations of multivariate normal random variables are also multivariate normal random variables. And any linear combination of independent normals will also be normal.

3.3.4 Quadratic Forms

Theorem 3.2 Let x ~ N(μ, Σ), where det(Σ) ≠ 0. Then (x − μ)′Σ⁻¹(x − μ) ~ χ²_m.

Proof: Let Σ = PP′. Then (x − μ) ~ N(0, Σ) and

  z = P⁻¹(x − μ) ~ N(0, I_m).

Therefore,

  z′z = Σ_{i=1}^m z_i² = [P⁻¹(x − μ)]′[P⁻¹(x − μ)] = (x − μ)′(P⁻¹)′P⁻¹(x − μ) = (x − μ)′Σ⁻¹(x − μ) ~ χ²_m.   (3.22)

Σ⁻¹ is called the weight matrix. With this result, we can use the χ²_m to make inferences about the mean μ of x.
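Theorem 3.2 can be checked by simulation. The sketch below (my own illustration, assuming NumPy) draws x ~ N(μ, Σ) for m = 3 and verifies that the quadratic form (x − μ)′Σ⁻¹(x − μ) has the mean m = 3 and variance 2m = 6 of a χ²₃ variable.

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])
A = np.linalg.cholesky(Sigma)
reps = 200_000

x = mu + rng.standard_normal((reps, 3)) @ A.T
d = x - mu
Sinv = np.linalg.inv(Sigma)
q = np.einsum('ij,jk,ik->i', d, Sinv, d)   # (x-mu)' Sigma^{-1} (x-mu)

mean_q = q.mean()     # chi2_3 mean is 3
var_q = q.var()       # chi2_3 variance is 6
```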
3.4 Normality and the Sample Mean

3.4.1 Moments of the Sample Mean

Consider the m × 1 vectors x_i, jointly i.i.d., with m × 1 mean vector E[x_i] = μ and m × m covariance matrix E[(x_i − μ)(x_i − μ)′] = Σ. Define x̄_n = (1/n) Σ_{i=1}^n x_i as the vector sample mean, which is also the vector of scalar sample means. The mean of the vector sample mean follows directly:

  E[x̄_n] = (1/n) Σ_{i=1}^n E[x_i] = μ.

Alternatively, this result can be obtained by applying the scalar results element by element. The second moment matrix of the vector sample mean is given by

  E[(x̄_n − μ)(x̄_n − μ)′] = (1/n²) E[(Σ_{i=1}^n x_i − nμ)(Σ_{i=1}^n x_i − nμ)′]
    = (1/n²) E[{(x_1 − μ) + (x_2 − μ) + ... + (x_n − μ)}{(x_1 − μ) + (x_2 − μ) + ... + (x_n − μ)}′]
    = (1/n²) nΣ = (1/n) Σ,

since the covariances between different observations are zero.

3.4.2 Distribution of the Sample Mean

Suppose the x_i are jointly i.i.d. N(μ, Σ). Then it follows from joint multivariate normality that x̄_n must also be multivariate normal, since it is a linear transformation. Specifically, we have

  x̄_n ~ N(μ, (1/n)Σ),

or equivalently

  x̄_n − μ ~ N(0, (1/n)Σ),
  √n(x̄_n − μ) ~ N(0, Σ),
  √n Σ^{−1/2}(x̄_n − μ) ~ N(0, I_m),

where Σ = Σ^{1/2}Σ^{1/2′} and Σ^{−1/2} = (Σ^{1/2})⁻¹.

3.4.3 Multivariate Central Limit Theorem

Theorem 3.3 Suppose that (i) the x_i are jointly i.i.d., (ii) E[x_i] = μ, and (iii) E[(x_i − μ)(x_i − μ)′] = Σ; then

  √n(x̄_n − μ) →_d N(0, Σ),
or equivalently

  z = √n Σ^{−1/2}(x̄_n − μ) →_d N(0, I_m).

These results apply even if the original underlying distribution is not normal and follow directly from the scalar results applied to any linear combination of x̄_n.

3.4.4 Limiting Behavior of Quadratic Forms

Consider the following quadratic form:

  n(x̄_n − μ)′Σ⁻¹(x̄_n − μ) = n(x̄_n − μ)′(Σ^{1/2}Σ^{1/2′})⁻¹(x̄_n − μ)
    = [√n Σ^{−1/2}(x̄_n − μ)]′[√n Σ^{−1/2}(x̄_n − μ)]
    = z′z →_d χ²_m.

This form is convenient for asymptotic joint tests concerning more than one mean at a time.

3.5 Noncentral Distributions

3.5.1 Noncentral Scalar Normal

Definition 3.3 Let x ~ N(μ, σ²). Then

  z = x/σ ~ N(μ/σ, 1)   (3.23)

has a noncentral normal distribution.

Example 3.3 When we do a hypothesis test of the mean, with known variance, we have, under the null hypothesis H_0: μ = μ_0,

  (x − μ_0)/σ ~ N(0, 1)   (3.24)

and, under the alternative H_1: μ = μ_1 ≠ μ_0,

  (x − μ_0)/σ = (x − μ_1)/σ + (μ_1 − μ_0)/σ ~ N(0, 1) + (μ_1 − μ_0)/σ ~ N((μ_1 − μ_0)/σ, 1).   (3.25)

Thus, the behavior of (x − μ_0)/σ under the alternative hypothesis follows a noncentral normal distribution. As this example makes clear, the noncentral normal distribution is especially useful when carefully exploring the behavior of the alternative hypothesis.
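The practical payoff of Example 3.3 is power analysis: under the alternative, the test ratio is a unit-variance normal centered at the noncentrality, so rejections become common. The sketch below (my own, assuming NumPy; the sample size and means are made-up numbers) simulates a one-sided 5% z-test with known σ and checks both the noncentrality and the resulting rejection rate.

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma = 25, 2.0
mu0, mu1 = 0.0, 1.0            # hypothesized and true means
reps = 200_000

# Under H1 the ratio (xbar - mu0)/(sigma/sqrt(n)) is
# N((mu1 - mu0)/(sigma/sqrt(n)), 1) = N(2.5, 1): a noncentral normal.
se = sigma / np.sqrt(n)
xbar = rng.normal(mu1, se, size=reps)
z = (xbar - mu0) / se

mean_z = z.mean()              # should be near the noncentrality 2.5
power = (z > 1.645).mean()     # one-sided 5% test rejects often under H1
```

The simulated power matches the exact value Φ(2.5 − 1.645) ≈ 0.80.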
3.5.2 Noncentral t

Definition 3.4 Let z ~ N(μ/σ, 1), w ~ χ²_k, and let z and w be independent. Then

  z/√(w/k) ~ t_k(μ/σ)   (3.26)

has a noncentral t distribution. The noncentral t distribution is used in tests of the mean when the variance is unknown.

3.5.3 Noncentral Chi-Square

Definition 3.5 Let z ~ N(μ, I_m). Then

  z′z ~ χ²_m(δ),   (3.27)

has a noncentral chi-square distribution, where δ = μ′μ is the noncentrality parameter.

In the noncentral chi-square distribution, the probability mass is shifted to the right as compared to a central chi-square distribution.

Example 3.4 When we do a test of μ, with known Σ, we have, under H_0: μ = μ_0,

  (x − μ_0)′Σ⁻¹(x − μ_0) ~ χ²_m.   (3.28)

Let z = P⁻¹(x − μ_0), where Σ = PP′. Then, under H_1: μ = μ_1 ≠ μ_0, we have

  (x − μ_0)′Σ⁻¹(x − μ_0) = (x − μ_0)′(P⁻¹)′P⁻¹(x − μ_0) = z′z ~ χ²_m[(μ_1 − μ_0)′Σ⁻¹(μ_1 − μ_0)].   (3.29)

3.5.4 Noncentral F

Definition 3.6 Let Y ~ χ²_m(δ), W ~ χ²_n, and let Y and W be independent random variables. Then

  (Y/m)/(W/n) ~ F_{m,n}(δ),   (3.30)

has a noncentral F distribution, where δ is the noncentrality parameter. The noncentral F distribution is used in tests of mean vectors, where the variance-covariance matrix is unknown and must be estimated.
Chapter 4

Asymptotic Theory

4.1 Convergence Of Random Variables

4.1.1 Limits And Orders Of Sequences

Definition 4.1 A sequence of real numbers a_1, a_2, ..., a_n, ... is said to have a limit of α if for every δ > 0 there exists a positive real number N such that for all n > N, |a_n − α| < δ. This is denoted as lim_{n→∞} a_n = α.

Definition 4.2 A sequence of real numbers {a_n} is said to be of at most order n^k, and we write {a_n} is O(n^k), if

  lim_{n→∞} n^{−k} a_n = c,

where c is any real constant.

Example 4.1 Let {a_n} = 3 + 1/n and {b_n} = 4 − 2n. Then {a_n} is O(1) = O(n⁰), since lim_{n→∞} n⁰ a_n = 3, and {b_n} is O(n), since lim_{n→∞} n⁻¹ b_n = −2.

Definition 4.3 A sequence of real numbers {a_n} is said to be of order smaller than n^k, and we write {a_n} is o(n^k), if

  lim_{n→∞} n^{−k} a_n = 0.
Example 4.2 Let {a_n} = 1/n. Then {a_n} is o(1), since lim_{n→∞} n⁰ a_n = 0.

4.1.2 Convergence In Probability

Definition 4.4 A sequence of random variables y_1, y_2, ..., y_n, ..., with distribution functions F_1(·), F_2(·), ..., F_n(·), ..., is said to converge weakly in probability to some constant c if

  lim_{n→∞} Pr[|y_n − c| > ε] = 0   (4.1)

for every real number ε > 0. Weak convergence in probability is denoted by

  plim_{n→∞} y_n = c,   (4.2)

or sometimes

  y_n →_p c.   (4.3)

This definition is equivalent to saying that we have a sequence of tail probabilities (of being greater than c + ε or less than c − ε), and that the tail probabilities approach 0 as n → ∞, regardless of how small ε is chosen. Equivalently, the probability mass of the distribution of y_n is collapsing about the point c.

Definition 4.5 A sequence of random variables y_1, y_2, ..., y_n, ... is said to converge strongly in probability to some constant c if

  lim_{N→∞} Pr[sup_{n>N} |y_n − c| > ε] = 0,   (4.4)

for any real ε > 0. Strong convergence is also called almost sure convergence and is denoted

  y_n →_{a.s.} c.   (4.5)

Notice that if a sequence of random variables converges strongly in probability, it converges weakly in probability. The difference between the two is that almost
sure convergence involves an element of uniformity that weak convergence does not. A sequence that is weakly convergent can have Pr[|y_n − c| > ε] wiggle-waggle above and below the constant δ used in the limit and then settle down to subsequently be smaller and meet the condition. For strong convergence, once the probability falls below δ for a particular N in the sequence, it will subsequently be smaller for all larger N.

Definition 4.6 A sequence of random variables y_1, y_2, ..., y_n, ... is said to converge in quadratic mean if

  lim_{n→∞} E[y_n] = c and lim_{n→∞} Var[y_n] = 0.

By Chebyshev's inequality, convergence in quadratic mean implies weak convergence in probability. For a random variable x with mean μ and variance σ², Chebyshev's inequality states

  Pr(|x − μ| ≥ kσ) ≤ 1/k².

Let σ_n² denote the variance of y_n; then we can write the condition for the present case as

  Pr(|y_n − E[y_n]| ≥ kσ_n) ≤ 1/k².

Since σ_n² → 0 and E[y_n] → c, the probability will be less than 1/k² for sufficiently large n for any choice of k. But this is just weak convergence in probability to c.

4.1.3 Orders In Probability

Definition 4.7 Let y_1, y_2, ..., y_n, ... be a sequence of random variables. This sequence is said to be bounded in probability if for any 1 > δ > 0 there exist a Δ < ∞ and some N sufficiently large such that

  Pr(|y_n| > Δ) < δ,

for all n > N.

These conditions require that the tail behavior of the distributions of the sequence not be pathological. Specifically, the tail mass of the distributions cannot be drifting away from zero as we move out in the sequence.

Definition 4.8 The sequence of random variables {y_n} is said to be at most of order in probability n^λ, and is denoted O_p(n^λ), if n^{−λ} y_n is bounded in probability.

Example 4.3 Suppose z ~ N(0, 1) and y_n = 3 + nz; then n⁻¹ y_n = 3/n + z is a bounded random variable, since the first term is asymptotically negligible, and we see that y_n = O_p(n).
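Weak convergence in probability (Definition 4.4) is easy to see numerically. The following sketch (my own illustration, assuming NumPy; ε = 0.1 is an arbitrary choice) estimates the tail probability Pr(|x̄_n − μ| > ε) for the sample mean of N(0, 1) data at several sample sizes and shows it collapsing toward zero as n grows.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, eps, reps = 0.0, 0.1, 10_000

# Tail probability Pr(|xbar_n - mu| > eps) for increasing n:
# by the law of large numbers it must fall toward 0.
tail = {}
for n in [10, 100, 1000]:
    xbar = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)
    tail[n] = (np.abs(xbar - mu) > eps).mean()
```

For these sample sizes the estimated tails drop from roughly 0.75 to essentially zero.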
Definition 4.9 The sequence of random variables {y_n} is said to be of order in probability smaller than n^λ, and is denoted o_p(n^λ), if n^{−λ} y_n →_p 0.

Example 4.4 Convergence in probability can be represented in terms of order in probability. Suppose that y_n →_p c, or equivalently y_n − c →_p 0; then n⁰(y_n − c) →_p 0 and y_n − c = o_p(1).

4.1.4 Convergence In Distribution

Definition 4.10 A sequence of random variables y_1, y_2, ..., y_n, ..., with cumulative distribution functions F_1(·), F_2(·), ..., F_n(·), ..., is said to converge in distribution to a random variable y with cumulative distribution function F(y) if

  lim_{n→∞} F_n(·) = F(·),   (4.6)

for every point of continuity of F(·). The distribution F(·) is said to be the limiting distribution of this sequence of random variables.

For notational convenience, we often write y_n →_d F(·) if a sequence of random variables converges in distribution to F(·). Note that the moments of elements of the sequence do not necessarily converge to the moments of the limiting distribution.

4.1.5 Some Useful Propositions

In the following propositions, let x_n and y_n be sequences of random vectors.

Proposition 4.1 If x_n − y_n converges in probability to zero, and y_n has a limiting distribution, then x_n has a limiting distribution, which is the same.

Proposition 4.2 If y_n has a limiting distribution and plim_{n→∞} x_n = 0, then for z_n = y_n′ x_n, plim_{n→∞} z_n = 0.

Proposition 4.3 Suppose that y_n converges in distribution to a random variable y, and plim_{n→∞} x_n = c. Then x_n′ y_n converges in distribution to c′y.

Proposition 4.4 If g(·) is a continuous function, and if x_n − y_n converges in probability to zero, then g(x_n) − g(y_n) converges in probability to zero.

Proposition 4.5 If g(·) is a continuous function, and if x_n converges in probability to a constant c, then z_n = g(x_n) converges in distribution to the constant g(c).
Proposition 4.6 If g(·) is a continuous function, and if x_n converges in distribution to a random variable x, then z_n = g(x_n) converges in distribution to a random variable g(x).

4.2 Estimation Theory

4.2.1 Properties Of Estimators

Definition 4.11 An estimator θ̂_n of the p × 1 parameter vector θ is a function of the sample observations x_1, x_2, ..., x_n. It follows that θ̂_1, θ̂_2, ..., θ̂_n form a sequence of random variables.

Definition 4.12 The estimator θ̂_n is said to be unbiased if E θ̂_n = θ, for all n.

Definition 4.13 The estimator θ̂_n is said to be asymptotically unbiased if lim_{n→∞} E θ̂_n = θ.

Note that an estimator can be biased in finite samples but asymptotically unbiased.

Definition 4.14 The estimator θ̂_n is said to be consistent if plim_{n→∞} θ̂_n = θ.

Consistency neither implies nor is implied by asymptotic unbiasedness, as demonstrated by the following examples.

Example 4.5 Let

  θ̃_n = θ, with probability 1 − 1/n;
  θ̃_n = θ + nc, with probability 1/n.

We have E θ̃_n = θ + c, so θ̃_n is a biased estimator, and lim_{n→∞} E θ̃_n = θ + c, so θ̃_n is asymptotically biased as well. However, lim_{n→∞} Pr(|θ̃_n − θ| > ε) = 0, so θ̃_n is a consistent estimator.

Example 4.6 Suppose x_i i.i.d. N(μ, σ²) for i = 1, 2, ..., n, ..., and let x̃_n = x_n be an estimator of μ. Now E[x̃_n] = μ, so the estimator is unbiased, but Pr(|x̃_n − μ| > 1.96σ) = .05, so the probability mass is not collapsing about the target point μ, and the estimator is not consistent.
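Example 4.5 is worth simulating, because the result is counterintuitive: the estimator stays biased by exactly c at every n, yet is consistent. The sketch below (my own, assuming NumPy; θ = 1 and c = 5 are made-up values) draws the contaminated estimator at several sample sizes and tracks both the bias and the miss probability Pr(|θ̃_n − θ| > ε).

```python
import numpy as np

rng = np.random.default_rng(8)
theta, c, reps = 1.0, 5.0, 200_000

# Example 4.5: theta_tilde = theta w.p. 1 - 1/n, and theta + n*c w.p. 1/n.
# E[theta_tilde] = theta + c for every n (biased), yet
# Pr(|theta_tilde - theta| > eps) = 1/n -> 0 (consistent).
def draw(n, size):
    contaminated = rng.random(size) < 1.0 / n
    return np.where(contaminated, theta + n * c, theta)

bias, miss = {}, {}
for n in [10, 100, 1000]:
    est = draw(n, reps)
    bias[n] = est.mean() - theta                   # stays near c = 5
    miss[n] = (np.abs(est - theta) > 0.5).mean()   # approx 1/n -> 0
```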
4.2.2 Laws Of Large Numbers And Central Limit Theorems

Most of the large-sample properties of the estimators considered in the sequel derive from the fact that the estimators involve sample averages, and the asymptotic behavior of averages is well studied. In addition to the central limit theorems presented in the previous two chapters, we have the following two laws of large numbers:

Theorem 4.1 If x_1, x_2, ..., x_n is a simple random sample (that is, the x_i are i.i.d.) and E x_i = μ and Var(x_i) = σ², then, by Chebyshev's inequality,

  plim_{n→∞} x̄_n = μ.   (4.7)

Theorem 4.2 (Khintchine) Suppose that x_1, x_2, ..., x_n are i.i.d. random variables such that, for all i = 1, ..., n, E x_i = μ; then

  plim_{n→∞} x̄_n = μ.   (4.8)

Both of these results apply element by element to vectors of estimators. For the sake of completeness, we repeat the following scalar central limit theorem.

Theorem 4.3 (Lindeberg-Levy) Suppose that x_1, x_2, ..., x_n are i.i.d. random variables such that, for all i = 1, ..., n, E x_i = μ and Var(x_i) = σ²; then √n(x̄_n − μ) →_d N(0, σ²), or

  lim_{n→∞} f(√n(x̄_n − μ)) = (1/√(2πσ²)) e^{−[√n(x̄_n − μ)]²/(2σ²)},   (4.9)

the N(0, σ²) density. This result is easily generalized to obtain the multivariate version given in Theorem 3.3.

Theorem 4.4 (Multivariate CLT) Suppose that the m × 1 random vectors x_1, x_2, ..., x_n are (i) jointly i.i.d., (ii) E x_i = μ, and (iii) Cov(x_i) = Σ; then

  √n(x̄_n − μ) →_d N(0, Σ).   (4.10)

4.2.3 CUAN And Efficiency

Definition 4.15 An estimator is said to be consistently uniformly asymptotically normal (CUAN) if it is consistent, if √n(θ̂_n − θ) converges in distribution to N(0, Ψ), and if the convergence is uniform over some compact subset of the parameter space.
Suppose that $\sqrt{n}(\hat{\theta}_n - \theta)$ converges in distribution to $N(0, \Psi)$. Let $\tilde{\theta}_n$ be an alternative estimator such that $\sqrt{n}(\tilde{\theta}_n - \theta)$ converges in distribution to $N(0, \Omega)$.

Definition 4.6 If $\hat{\theta}_n$ is CUAN with asymptotic covariance $\Psi$ and $\tilde{\theta}_n$ is CUAN with asymptotic covariance $\Omega$, then $\hat{\theta}_n$ is asymptotically efficient relative to $\tilde{\theta}_n$ if $\Omega - \Psi$ is a positive semidefinite matrix.

Among other properties, asymptotic relative efficiency implies that the diagonal elements of $\Psi$ are no larger than those of $\Omega$, so the asymptotic variances of $\hat{\theta}_{n,i}$ are no larger than those of $\tilde{\theta}_{n,i}$. A similar result applies for the asymptotic variance of any linear combination.

Definition 4.7 A CUAN estimator $\hat{\theta}_n$ is said to be asymptotically efficient if it is asymptotically efficient relative to any other CUAN estimator.

4.3 Asymptotic Inference

4.3.1 Normal Ratios

Now, under the conditions of the central limit theorem, $\sqrt{n}(\bar{x}_n - \mu) \xrightarrow{d} N(0, \sigma^2)$, so
    $\frac{\bar{x}_n - \mu}{\sqrt{\sigma^2/n}} \xrightarrow{d} N(0, 1).$    (4.11)
Suppose that $\hat{\sigma}^2$ is a consistent estimator of $\sigma^2$. Then we also have
    $\frac{\bar{x}_n - \mu}{\sqrt{\hat{\sigma}^2/n}} = \sqrt{\frac{\sigma^2}{\hat{\sigma}^2}} \cdot \frac{\bar{x}_n - \mu}{\sqrt{\sigma^2/n}} \xrightarrow{d} N(0, 1),$    (4.12)
since the term under the square root converges in probability to one and the remainder converges in distribution to $N(0, 1)$.

Most typically, such ratios will be used for inference in testing a hypothesis. Now, for $H_0: \mu = \mu^0$, we have
    $\frac{\bar{x}_n - \mu^0}{\sqrt{\hat{\sigma}^2/n}} \xrightarrow{d} N(0, 1),$    (4.13)
while under $H_1: \mu = \mu^1 \neq \mu^0$, we find that
    $\frac{\bar{x}_n - \mu^0}{\sqrt{\hat{\sigma}^2/n}} = \frac{\sqrt{n}(\bar{x}_n - \mu^1)}{\hat{\sigma}} + \frac{\sqrt{n}(\mu^1 - \mu^0)}{\hat{\sigma}} = N(0, 1) + O_p(\sqrt{n}).$    (4.14)
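The two behaviors of the studentized ratio, approximately standard normal under the null as in (4.13) and exploding like $O_p(\sqrt{n})$ under the alternative as in (4.14), can be sketched as follows. The normal population and the `ratio` helper are illustrative, not from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

def ratio(y, mu0):
    """(ybar - mu0) / sqrt(sigma_hat^2 / n), using a consistent variance estimate."""
    n = y.size
    return (y.mean() - mu0) / np.sqrt(y.var(ddof=1) / n)

reps, n = 20_000, 200
# Under H0 (true mean equals mu0 = 5) the ratio is approximately N(0, 1):
null_stats = np.array([ratio(rng.normal(5.0, 2.0, n), mu0=5.0) for _ in range(reps)])
size_at_5pct = np.mean(np.abs(null_stats) > 1.96)     # close to 0.05

# Under H1 (true mean 5.5) the ratio grows with the sample size:
alt_stat_small = abs(ratio(rng.normal(5.5, 2.0, 200), mu0=5.0))
alt_stat_large = abs(ratio(rng.normal(5.5, 2.0, 5000), mu0=5.0))
```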
Thus, extreme values of the ratio can be taken as rare events under the null or typical events under the alternative.

Such ratios are of interest in estimation and inference with regard to more general parameters. Suppose that $\theta$ is the parameter vector and $\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{d} N(0, \Psi)$, where $\Psi$ is $p \times p$. Then, if $\theta_i$ is the parameter of particular interest, we consider
    $\frac{\hat{\theta}_i - \theta_i}{\sqrt{\psi_{ii}/n}} \xrightarrow{d} N(0, 1),$    (4.15)
and, duplicating the arguments for the sample mean,
    $\frac{\hat{\theta}_i - \theta_i}{\sqrt{\hat{\psi}_{ii}/n}} \xrightarrow{d} N(0, 1),$    (4.16)
where $\hat{\Psi}$ is a consistent estimator of $\Psi$. This ratio will have a similar behavior under a null and alternative hypothesis with regard to $\theta_i$.

4.3.2 Asymptotic Chi-Square

Suppose that $\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{d} N(0, \Psi)$, where $\hat{\Psi}$ is a consistent estimator of the nonsingular $p \times p$ matrix $\Psi$. Then, from the previous chapter, we have
    $\sqrt{n}(\hat{\theta} - \theta)'\, \Psi^{-1}\, \sqrt{n}(\hat{\theta} - \theta) = n(\hat{\theta} - \theta)' \Psi^{-1} (\hat{\theta} - \theta) \xrightarrow{d} \chi^2_p$    (4.17)
and
    $n(\hat{\theta} - \theta)' \hat{\Psi}^{-1} (\hat{\theta} - \theta) \xrightarrow{d} \chi^2_p$    (4.18)
for $\hat{\Psi}$ a consistent estimator of $\Psi$.

This result can be used to conduct inference by testing the entire parameter vector. If $H_0: \theta = \theta^0$, then
    $n(\hat{\theta} - \theta^0)' \hat{\Psi}^{-1} (\hat{\theta} - \theta^0) \xrightarrow{d} \chi^2_p,$    (4.19)
and large positive values are rare events, while for $H_1: \theta = \theta^1 \neq \theta^0$, we can show (later)
    $n(\hat{\theta} - \theta^0)' \hat{\Psi}^{-1} (\hat{\theta} - \theta^0) = n((\hat{\theta} - \theta^1) + (\theta^1 - \theta^0))' \hat{\Psi}^{-1} ((\hat{\theta} - \theta^1) + (\theta^1 - \theta^0))$
    $= n(\hat{\theta} - \theta^1)' \hat{\Psi}^{-1} (\hat{\theta} - \theta^1) + 2n(\theta^1 - \theta^0)' \hat{\Psi}^{-1} (\hat{\theta} - \theta^1) + n(\theta^1 - \theta^0)' \hat{\Psi}^{-1} (\theta^1 - \theta^0)$
    $= \chi^2_p + O_p(\sqrt{n}) + O_p(n).$    (4.20)
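A quadratic form like (4.19) can be checked numerically. This sketch (a hypothetical bivariate normal population; all numbers are illustrative) verifies that under the null the statistic behaves like $\chi^2$ with $p = 2$ degrees of freedom: mean near 2, and roughly a 5% chance of exceeding the 5% critical value 5.991.

```python
import numpy as np

rng = np.random.default_rng(3)

mu0 = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
n, reps = 500, 10_000

stats = np.empty(reps)
for r in range(reps):
    y = rng.multivariate_normal(mu0, Sigma, size=n)
    ybar = y.mean(axis=0)
    Sigma_hat = np.cov(y, rowvar=False)               # consistent estimator of Sigma
    d = ybar - mu0
    stats[r] = n * d @ np.linalg.inv(Sigma_hat) @ d   # eq. (4.19), ~ chi^2_2 under H0

mean_stat = stats.mean()                # chi^2_2 has mean 2
rej_rate = np.mean(stats > 5.991)       # 5% critical value of chi^2_2
```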
Thus, if we obtain a large value of the statistic, we may take it as evidence that the null hypothesis is incorrect.

This result can also be applied to any subvector of $\theta$. Let
    $\theta = \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix},$
where $\theta_1$ is a $q \times 1$ vector. Then
    $\sqrt{n}(\hat{\theta}_1 - \theta_1) \xrightarrow{d} N(0, \Psi_{11}),$    (4.21)
where $\Psi_{11}$ is the upper left-hand $q \times q$ submatrix of $\Psi$, and
    $n(\hat{\theta}_1 - \theta_1)' \hat{\Psi}_{11}^{-1} (\hat{\theta}_1 - \theta_1) \xrightarrow{d} \chi^2_q.$    (4.22)

4.3.3 Tests of General Restrictions

We can use similar results to test general nonlinear restrictions. Suppose that $r(\cdot)$ is a $q \times 1$ continuously differentiable function, and $\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{d} N(0, \Psi)$. By the intermediate value theorem we can obtain the exact Taylor series representation
    $r(\hat{\theta}) = r(\theta) + \frac{\partial r(\theta^*)}{\partial \theta'} (\hat{\theta} - \theta),$
or equivalently
    $\sqrt{n}(r(\hat{\theta}) - r(\theta)) = \frac{\partial r(\theta^*)}{\partial \theta'} \sqrt{n}(\hat{\theta} - \theta) = R(\theta^*)\, \sqrt{n}(\hat{\theta} - \theta),$
where $R(\theta) = \partial r(\theta)/\partial \theta'$ and $\theta^*$ lies between $\hat{\theta}$ and $\theta$. Now $\hat{\theta} \xrightarrow{p} \theta$, so $\theta^* \xrightarrow{p} \theta$ and $R(\theta^*) \xrightarrow{p} R(\theta)$. Thus, we have
    $\sqrt{n}\,[r(\hat{\theta}) - r(\theta)] \xrightarrow{d} N(0, R(\theta) \Psi R'(\theta)).$    (4.23)
Thus, under $H_0: r(\theta) = 0$, assuming $R(\theta) \Psi R'(\theta)$ is nonsingular, we have
    $n\, r(\hat{\theta})' [R(\theta) \Psi R'(\theta)]^{-1} r(\hat{\theta}) \xrightarrow{d} \chi^2_q,$    (4.24)
where $q$ is the length of $r(\cdot)$. In practice, we substitute the consistent estimates $R(\hat{\theta})$ for $R(\theta)$ and $\hat{\Psi}$ for $\Psi$ to obtain, following the arguments given above,
    $n\, r(\hat{\theta})' [R(\hat{\theta}) \hat{\Psi} R'(\hat{\theta})]^{-1} r(\hat{\theta}) \xrightarrow{d} \chi^2_q.$    (4.25)
The behavior under the alternative hypothesis will be $O_p(n)$, as above.
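The statistic (4.25) can be illustrated for a hypothetical nonlinear restriction $r(\theta) = \theta_1 \theta_2 - 1 = 0$ on a pair of population means, estimated by the sample means. Everything in this sketch (the restriction, the population, the sample sizes) is an illustrative choice, not from the text.

```python
import numpy as np

rng = np.random.default_rng(4)

def wald_restriction(y):
    """Statistic (4.25) for the hypothetical restriction theta_1 * theta_2 = 1."""
    n = y.shape[0]
    th = y.mean(axis=0)                       # theta_hat: sample means
    Psi_hat = np.cov(y, rowvar=False)         # consistent estimate of Psi
    r = np.array([th[0] * th[1] - 1.0])       # r(theta_hat), q = 1
    R = np.array([[th[1], th[0]]])            # R(theta_hat) = dr/dtheta'
    V = R @ Psi_hat @ R.T                     # R Psi R'
    return float(n * r @ np.linalg.inv(V) @ r)

# Under H0: means (2, 0.5) satisfy theta_1 * theta_2 = 1, so ~ chi^2_1.
n, reps = 1000, 5000
mu = np.array([2.0, 0.5])
stats = np.array([wald_restriction(rng.normal(mu, 1.0, size=(n, 2))) for _ in range(reps)])
rej_rate = np.mean(stats > 3.841)             # 5% critical value of chi^2_1
mean_stat = stats.mean()                      # chi^2_1 has mean 1
```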
Chapter 5

Maximum Likelihood Methods

5.1 Maximum Likelihood Estimation (MLE)

5.1.1 Motivation

Suppose we have a model for the random variable $y_i$, for $i = 1, 2, \ldots, n$, with unknown $(p \times 1)$ parameter vector $\theta$. In many cases, the model will imply a distribution $f(y_i, \theta)$ for each realization of the variable $y_i$.

A basic premise of statistical inference is to avoid unlikely or rare models, for example in hypothesis testing. If we have a realization of a statistic that exceeds the critical value, then it is a rare event under the null hypothesis. Under the alternative hypothesis, however, such a realization is much more likely to occur, and we reject the null in favor of the alternative. Thus, in choosing between the null and alternative, we select the model that makes the realization of the statistic more likely to have occurred.

Carrying this idea over to estimation, we select values of $\theta$ such that the corresponding values of $f(y_i, \theta)$ are not unlikely. After all, we do not want a model that disagrees strongly with the data. Maximum likelihood estimation is merely a formalization of this notion that the model chosen should not be unlikely. Specifically, we choose the values of the parameters that make the realized data most likely to have occurred. This approach does, however, require that the model be specified in enough detail to imply a distribution for the variable of interest.
5.1.2 The Likelihood Function

Suppose that the random variables $y_1, y_2, \ldots, y_n$ are i.i.d. Then the joint density function for $n$ realizations is
    $f(y_1, y_2, \ldots, y_n \mid \theta) = f(y_1 \mid \theta) \cdot f(y_2 \mid \theta) \cdots f(y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta).$    (5.1)
Given values of the parameter vector $\theta$, this function allows us to assign local probability measures for various choices of the random variables $y_1, y_2, \ldots, y_n$. This is the function which must be integrated to make probability statements concerning the joint outcomes of $y_1, y_2, \ldots, y_n$.

Given a set of realized values of the random variables, we use this same function to establish the probability measure associated with various choices of the parameter vector $\theta$.

Definition 5.1 The likelihood function of the parameters, for a particular sample $y_1, y_2, \ldots, y_n$, is the joint density function considered as a function of $\theta$ given the $y_i$'s. That is,
    $L(\theta \mid y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} f(y_i \mid \theta).$    (5.2)

5.1.3 Maximum Likelihood Estimation

For a particular choice of the parameter vector $\theta$, the likelihood function gives a probability measure for the realizations that occurred. Consistent with the approach used in hypothesis testing, and using this function as the metric, we choose the $\theta$ that makes the realizations most likely to have occurred.

Definition 5.2 The maximum likelihood estimator of $\theta$ is the estimator obtained by maximizing $L(\theta \mid y_1, y_2, \ldots, y_n)$ with respect to $\theta$. That is, $\hat{\theta}$ solves
    $\max_{\theta} L(\theta \mid y_1, y_2, \ldots, y_n),$    (5.3)
where $\hat{\theta}$ is called the MLE of $\theta$.

Equivalently, since $\log(\cdot)$ is a strictly monotonic transformation, we may find the MLE of $\theta$ by solving
    $\max_{\theta} \mathcal{L}(\theta \mid y_1, y_2, \ldots, y_n),$    (5.4)
where $\mathcal{L}(\theta \mid y_1, y_2, \ldots, y_n) = \log L(\theta \mid y_1, y_2, \ldots, y_n)$
is denoted the log-likelihood function. In practice, we obtain $\hat{\theta}$ by solving the first-order conditions (FOC)
    $\frac{\partial \mathcal{L}(\hat{\theta}; y_1, y_2, \ldots, y_n)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \log f(y_i; \hat{\theta})}{\partial \theta} = 0.$
The motivation for using the log-likelihood function is apparent, since the summation form will result, after division by $n$, in estimators that are approximately averages, about which we know a lot. This advantage is particularly clear in the following example.

Example 5.1 Suppose that $y_i$ i.i.d. $N(\mu, \sigma^2)$, for $i = 1, 2, \ldots, n$. Then
    $f(y_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(y_i - \mu)^2},$ for $i = 1, 2, \ldots, n$.
Using the likelihood function (5.2), we have
    $L(\mu, \sigma^2 \mid y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(y_i - \mu)^2}.$    (5.5)
Next, we take the logarithm of (5.5), which gives us
    $\log L(\mu, \sigma^2 \mid y_1, y_2, \ldots, y_n) = \sum_{i=1}^{n} \left[ -\frac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y_i - \mu)^2 \right] = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2.$    (5.6)
We then maximize (5.6) with respect to both $\mu$ and $\sigma^2$. That is, we solve the following first-order conditions:
    (A) $\frac{\partial \log L(\cdot)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu) = 0;$
    (B) $\frac{\partial \log L(\cdot)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \mu)^2 = 0.$
By solving (A), we find that $\mu = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}_n$. Solving (B) gives us $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y}_n)^2$. Therefore, $\hat{\mu} = \bar{y}_n$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\mu})^2$.

Note that $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\mu})^2 \neq s^2$, where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \hat{\mu})^2$. Here $s^2$ is the familiar unbiased estimator of $\sigma^2$, and $\hat{\sigma}^2$ is a biased estimator.
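The first-order conditions of Example 5.1 can be verified numerically. In this Python sketch (simulated data with illustrative parameter values), both scores vanish at the closed-form MLEs, perturbing either parameter lowers the log-likelihood, and the MLE of the variance sits below the unbiased estimator $s^2$.

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(3.0, 1.5, size=2000)
n = y.size

# Closed-form MLEs from the first-order conditions (A) and (B):
mu_hat = y.mean()
s2_hat = ((y - mu_hat) ** 2).mean()            # divides by n, hence biased downward
s2_unbiased = ((y - mu_hat) ** 2).sum() / (n - 1)

def loglik(mu, s2):
    return -0.5 * n * np.log(2 * np.pi * s2) - ((y - mu) ** 2).sum() / (2 * s2)

# The FOCs hold at the MLE: both scores are (numerically) zero.
score_mu = (y - mu_hat).sum() / s2_hat
score_s2 = -n / (2 * s2_hat) + ((y - mu_hat) ** 2).sum() / (2 * s2_hat ** 2)

# And the MLE beats nearby parameter values in likelihood.
best = loglik(mu_hat, s2_hat)
```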
5.2 Asymptotic Behavior of MLEs

5.2.1 Assumptions

For the results we will derive in the following sections, we need to make five assumptions:

1. The $y_i$'s are i.i.d. random variables with density function $f(y_i, \theta)$ for $i = 1, 2, \ldots, n$;

2. $\log f(y_i, \theta)$, and hence $f(y_i, \theta)$, possess derivatives with respect to $\theta$ up to the third order for $\theta \in \Theta$;

3. The range of $y_i$ is independent of $\theta$, hence differentiation under the integral is possible;

4. The parameter vector $\theta$ is globally identified by the density function;

5. $\partial^3 \log f(y_i, \theta)/\partial\theta_i\,\partial\theta_j\,\partial\theta_k$ is bounded in absolute value by some function $H_{ijk}(y)$ for all $y$ and $\theta \in \Theta$, which, in turn, has a finite expectation for all $\theta \in \Theta$.

The first assumption is fundamental and the basis of the estimator. If it is not satisfied, then we are misspecifying the model and there is little hope of obtaining correct inferences, at least in finite samples. The second assumption is a regularity condition that is usually satisfied and easily verified. The third assumption is also easily verified and guaranteed to be satisfied in models where the dependent variable has smooth and infinite support. The fourth assumption must be verified, which is easier in some cases than others. The last assumption is crucial and bears a cost; it really should be verified before MLE is undertaken, but is usually ignored.

5.2.2 Some Preliminaries

Now, we know that
    $\int L(\theta^0 \mid y)\, dy = \int f(y \mid \theta^0)\, dy = 1$    (5.7)
for any value of the true parameter vector $\theta^0$. Therefore,
    $0 = \frac{\partial \int L(\theta^0 \mid y)\, dy}{\partial \theta} = \int \frac{\partial L(\theta^0 \mid y)}{\partial \theta}\, dy = \int \frac{\partial f(y \mid \theta^0)}{\partial \theta}\, dy = \int \frac{\partial \log f(y \mid \theta^0)}{\partial \theta} f(y \mid \theta^0)\, dy,$    (5.8)
and
    $0 = \mathrm{E}\left[\frac{\partial \log f(y \mid \theta^0)}{\partial \theta}\right] = \mathrm{E}\left[\frac{\partial \log L(\theta^0 \mid y)}{\partial \theta}\right]$    (5.9)
for any value of the true parameter vector $\theta^0$.

Differentiating (5.8) again yields
    $0 = \int \left[\frac{\partial^2 \log f(y \mid \theta^0)}{\partial \theta\, \partial \theta'} f(y \mid \theta^0) + \frac{\partial \log f(y \mid \theta^0)}{\partial \theta} \frac{\partial f(y \mid \theta^0)}{\partial \theta'}\right] dy.$    (5.10)
Since
    $\frac{\partial f(y \mid \theta^0)}{\partial \theta'} = \frac{\partial \log f(y \mid \theta^0)}{\partial \theta'}\, f(y \mid \theta^0),$    (5.11)
we can rewrite (5.10) as
    $0 = \mathrm{E}\left[\frac{\partial^2 \log f(y \mid \theta^0)}{\partial \theta\, \partial \theta'}\right] + \mathrm{E}\left[\frac{\partial \log f(y \mid \theta^0)}{\partial \theta} \frac{\partial \log f(y \mid \theta^0)}{\partial \theta'}\right],$    (5.12)
or, in terms of the likelihood function,
    $\vartheta(\theta^0) = \mathrm{E}\left[\frac{\partial \log L(\theta^0 \mid y)}{\partial \theta} \frac{\partial \log L(\theta^0 \mid y)}{\partial \theta'}\right] = -\mathrm{E}\left[\frac{\partial^2 \log L(\theta^0 \mid y)}{\partial \theta\, \partial \theta'}\right].$    (5.13)
The matrix $\vartheta(\theta^0)$ is called the information matrix, and the relationship given in (5.13) the information matrix equality.

Finally, we note that
    $\mathrm{E}\left[\frac{\partial \log L(\theta^0 \mid y)}{\partial \theta} \frac{\partial \log L(\theta^0 \mid y)}{\partial \theta'}\right] = \mathrm{E}\left[\sum_{i=1}^{n} \frac{\partial \log f_i(y \mid \theta^0)}{\partial \theta} \sum_{i=1}^{n} \frac{\partial \log f_i(y \mid \theta^0)}{\partial \theta'}\right] = \sum_{i=1}^{n} \mathrm{E}\left[\frac{\partial \log f_i(y \mid \theta^0)}{\partial \theta} \frac{\partial \log f_i(y \mid \theta^0)}{\partial \theta'}\right],$    (5.14)
since the covariances between different observations are zero.

5.2.3 Asymptotic Properties

Consistent Root Exists

Consider the case where $p = 1$. Then, expanding in a Taylor series and using the intermediate value theorem on the quadratic term yields
    $\frac{1}{n}\frac{\partial \mathcal{L}(\hat{\theta} \mid y)}{\partial \theta} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \log f(y_i \mid \theta^0)}{\partial \theta} + \frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \log f(y_i \mid \theta^0)}{\partial \theta^2}(\hat{\theta} - \theta^0) + \frac{1}{2n}\sum_{i=1}^{n}\frac{\partial^3 \log f(y_i \mid \theta^*)}{\partial \theta^3}(\hat{\theta} - \theta^0)^2,$    (5.15)
where $\theta^*$ lies between $\hat{\theta}$ and $\theta^0$. Now, by assumption 5, we have
    $\left|\frac{1}{2n}\sum_{i=1}^{n}\frac{\partial^3 \log f(y_i \mid \theta^*)}{\partial \theta^3}\right| \leq \frac{k}{n}\sum_{i=1}^{n} H(y_i)$    (5.16)
for some $k < \infty$. So,
    $\frac{1}{n}\frac{\partial \mathcal{L}(\hat{\theta} \mid y)}{\partial \theta} = a\delta^2 + b\delta + c,$    (5.17)
where
    $\delta = \hat{\theta} - \theta^0,$
    $a = k\,\frac{1}{n}\sum_{i=1}^{n} H(y_i),$
    $b = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \log f(y_i \mid \theta^0)}{\partial \theta^2},$ and
    $c = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \log f(y_i \mid \theta^0)}{\partial \theta}.$
Note that $a = k\,\frac{1}{n}\sum_{i=1}^{n} H(y_i) = k\,\mathrm{E}[H(y_i)] + o_p(1) = O_p(1)$, $\operatorname{plim}_{n\to\infty} c = 0$, and $\operatorname{plim}_{n\to\infty} b = -\vartheta(\theta^0)$.

Now, since $\partial \mathcal{L}(\hat{\theta} \mid y)/\partial \theta = 0$, we have $a\delta^2 + b\delta + c = 0$. There are two possibilities. If $a \neq 0$ with probability 1, which will occur when the FOC are
nonlinear in $\delta$, then
    $\delta = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$    (5.18)
Since $ac = o_p(1)$, then $\delta \xrightarrow{p} 0$ for the plus root, while $\delta \xrightarrow{p} \vartheta(\theta^0)/\alpha$ for the negative root, if $\operatorname{plim}_{n\to\infty} a = \alpha \neq 0$ exists. If $a = 0$, then the FOC are linear in $\delta$, whereupon $\delta = -c/b$ and again $\delta \xrightarrow{p} 0$. If the FOC are nonlinear but asymptotically linear, then $a \xrightarrow{p} 0$, and $ac$ in the numerator of (5.18) will still go to zero faster than $a$ in the denominator, so $\delta \xrightarrow{p} 0$. Thus there exists at least one consistent solution $\hat{\theta}$ which satisfies
    $\operatorname{plim}_{n\to\infty}(\hat{\theta} - \theta^0) = 0,$    (5.19)
and in the asymptotically nonlinear case there is also a possibly inconsistent solution.

For the case of $\theta$ a vector, we can apply a similar style of proof to show that there exists a solution $\hat{\theta}$ to the FOC that satisfies $\operatorname{plim}_{n\to\infty}(\hat{\theta} - \theta^0) = 0$. And in the event of asymptotically nonlinear FOC there is at least one other possibly inconsistent root.

Global Maximum Is Consistent

In the event of multiple roots, we are left with the problem of selecting between them. By assumption 4, the parameter $\theta$ is globally identified by the density function. Formally, this means that
    $f(y, \theta^1) = f(y, \theta^0)$    (5.20)
for all $y$ implies that $\theta^1 = \theta^0$. Now,
    $\mathrm{E}\left[\frac{f(y, \theta^1)}{f(y, \theta^0)}\right] = \int \frac{f(y, \theta^1)}{f(y, \theta^0)} f(y, \theta^0)\, dy = 1.$    (5.21)
Thus, by Jensen's Inequality, we have
    $\mathrm{E}\left[\log \frac{f(y, \theta^1)}{f(y, \theta^0)}\right] < \log \mathrm{E}\left[\frac{f(y, \theta^1)}{f(y, \theta^0)}\right] = 0$    (5.22)
unless $f(y, \theta^1) = f(y, \theta^0)$ for all $y$, that is, unless $\theta^1 = \theta^0$. Therefore, $\mathrm{E}[\log f(y, \theta)]$ achieves a maximum if and only if $\theta = \theta^0$. However, we are solving
    $\max_{\theta} \frac{1}{n}\sum_{i=1}^{n} \log f(y_i, \theta), \quad \text{where } \frac{1}{n}\sum_{i=1}^{n} \log f(y_i, \theta) \xrightarrow{p} \mathrm{E}[\log f(y_i, \theta)],$    (5.23)
and if we choose an inconsistent root, we will not obtain a global maximum. Thus, asymptotically, the global maximum is a consistent root.

This choice of the global root has added appeal since it is in fact the MLE among the possible alternatives, and hence the choice that makes the realized data most likely to have occurred. There are complications in finite samples, since the values of the likelihood function at alternative roots may cross over as the sample size increases. That is, the global maximum in small samples may not be the global maximum in larger samples. An added problem is to identify all the alternative roots so we can choose the global maximum. Sometimes a solution is available in a simple consistent estimator, which may be used to start the nonlinear MLE optimization.

Asymptotic Normality

For $p = 1$, we have $a\delta^2 + b\delta + c = 0$, so
    $\delta = \hat{\theta} - \theta^0 = \frac{-c}{a\delta + b}$
and
    $\sqrt{n}(\hat{\theta} - \theta^0) = \frac{-\sqrt{n}\,c}{a(\hat{\theta} - \theta^0) + b}.$    (5.24)
Now, since $a = O_p(1)$ and $\hat{\theta} - \theta^0 = o_p(1)$, then $a(\hat{\theta} - \theta^0) = o_p(1)$ and
    $a(\hat{\theta} - \theta^0) + b \xrightarrow{p} -\vartheta(\theta^0).$    (5.25)
And by the CLT we have
    $\sqrt{n}\,c = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\partial \log f(y_i \mid \theta^0)}{\partial \theta} \xrightarrow{d} N(0, \vartheta(\theta^0)).$    (5.26)
Substituting these two results in (5.24), we find
    $\sqrt{n}(\hat{\theta} - \theta^0) \xrightarrow{d} N(0, \vartheta(\theta^0)^{-1}).$
In general, for $p > 1$, we can apply the same scalar proof to show $\sqrt{n}(\lambda'\hat{\theta} - \lambda'\theta^0) \xrightarrow{d} N(0, \lambda'\vartheta(\theta^0)^{-1}\lambda)$ for any vector $\lambda$, which means
    $\sqrt{n}(\hat{\theta} - \theta^0) \xrightarrow{d} N(0, \vartheta^{-1}(\theta^0))$    (5.27)
if $\hat{\theta}$ is the global maximum.
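The limit (5.27) can be visualized for a one-parameter example. For an exponential density $f(y \mid \theta) = \theta e^{-\theta y}$ the score is $1/\theta - y$, the information is $\vartheta(\theta) = 1/\theta^2$, and the MLE is $\hat{\theta} = 1/\bar{y}$, so $\sqrt{n}(\hat{\theta} - \theta^0)$ should be approximately $N(0, (\theta^0)^2)$. A minimal simulation sketch (parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)

theta0, n, reps = 2.0, 500, 20_000

# MLE of the exponential rate from each simulated sample of size n
ybar = rng.exponential(1 / theta0, size=(reps, n)).mean(axis=1)
theta_hat = 1.0 / ybar
z = np.sqrt(n) * (theta_hat - theta0)

asym_var = z.var()                                  # should approach theta0^2 = 4
frac_within = np.mean(np.abs(z / theta0) < 1.96)    # roughly 0.95 after standardizing
```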
Cramér-Rao Lower Bound

In addition to being the covariance matrix of the MLE, $\vartheta^{-1}(\theta^0)$ defines a lower bound for covariance matrices with certain desirable properties. Let $\tilde{\theta}(y)$ be any unbiased estimator; then
    $\mathrm{E}[\tilde{\theta}(y)] = \int \tilde{\theta}(y)\, f(y; \theta)\, dy = \theta$    (5.28)
for any underlying $\theta = \theta^0$. Differentiating both sides of this relationship with respect to $\theta$ yields
    $I_p = \frac{\partial \mathrm{E}[\tilde{\theta}(y)]}{\partial \theta'} = \int \tilde{\theta}(y)\frac{\partial f(y; \theta^0)}{\partial \theta'}\, dy = \int \tilde{\theta}(y)\frac{\partial \log f(y; \theta^0)}{\partial \theta'} f(y; \theta^0)\, dy = \mathrm{E}\left[\tilde{\theta}(y)\frac{\partial \log f(y; \theta^0)}{\partial \theta'}\right] = \mathrm{E}\left[(\tilde{\theta}(y) - \theta^0)\frac{\partial \log f(y; \theta^0)}{\partial \theta'}\right].$    (5.29)
Next, we let
    $C(\theta^0) = \mathrm{E}\left[(\tilde{\theta}(y) - \theta^0)(\tilde{\theta}(y) - \theta^0)'\right]$    (5.30)
be the covariance matrix of $\tilde{\theta}(y)$. Then
    $\operatorname{Cov}\begin{pmatrix} \tilde{\theta}(y) \\ \frac{\partial \log L}{\partial \theta} \end{pmatrix} = \begin{pmatrix} C(\theta^0) & I_p \\ I_p & \vartheta(\theta^0) \end{pmatrix},$    (5.31)
where $I_p$ is a $p \times p$ identity matrix, and (5.31), as a covariance matrix, is positive semidefinite. Now, for any $(p \times 1)$ vector $a$, we have
    $\begin{pmatrix} a' & -a'\vartheta(\theta^0)^{-1} \end{pmatrix} \begin{pmatrix} C(\theta^0) & I_p \\ I_p & \vartheta(\theta^0) \end{pmatrix} \begin{pmatrix} a \\ -\vartheta(\theta^0)^{-1}a \end{pmatrix} = a'[C(\theta^0) - \vartheta(\theta^0)^{-1}]a \geq 0.$    (5.32)
Thus, any unbiased estimator $\tilde{\theta}(y)$ has a covariance matrix that exceeds $\vartheta(\theta^0)^{-1}$ by a positive semidefinite matrix. And if the MLE estimator is unbiased, it is efficient within the class of unbiased estimators. Likewise, any CUAN estimator will have an asymptotic covariance exceeding $\vartheta(\theta^0)^{-1}$. Since the asymptotic covariance of the MLE is, in fact, $\vartheta(\theta^0)^{-1}$, it is efficient (asymptotically). $\vartheta(\theta^0)^{-1}$ is called the Cramér-Rao lower bound.
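The bound can be seen concretely in the normal location model: the information for $\mu$ per observation is $1/\sigma^2$, so no unbiased estimator can have variance below $\sigma^2/n$. This simulation sketch compares the sample mean (the MLE of $\mu$) with the sample median, another unbiased estimator of $\mu$ under symmetry; all numbers are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)

mu0, sigma2 = 1.0, 4.0
n, reps = 50, 200_000

y = rng.normal(mu0, np.sqrt(sigma2), size=(reps, n))

var_mean = y.mean(axis=1).var()            # Monte Carlo variance of the sample mean
var_median = np.var(np.median(y, axis=1))  # variance of a competing unbiased estimator

crlb = sigma2 / n                          # inverse information: sigma^2 / n = 0.08
```

The sample mean attains the bound, while the median's variance is strictly larger (about a factor of pi/2 in large samples).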
5.3 Maximum Likelihood Inference

5.3.1 Likelihood Ratio Test

Suppose we wish to test $H_0: \theta = \theta^0$ against $H_1: \theta \neq \theta^0$. Then we define
    $L_u = \max_{\theta} L(\theta \mid y) = L(\hat{\theta} \mid y)$    (5.33)
and
    $L_r = L(\theta^0 \mid y),$    (5.34)
where $L_u$ is the unrestricted likelihood and $L_r$ is the restricted likelihood. We then form the likelihood ratio
    $\lambda = \frac{L_r}{L_u}.$    (5.35)
Note that the restricted likelihood can be no larger than the unrestricted, which maximizes the function.

As with estimation, it is more convenient to work with the logs of the likelihood functions. It will be shown below that, under $H_0$,
    $LR = -2\log\lambda = -2\log\frac{L_r}{L_u} = 2[\mathcal{L}(\hat{\theta} \mid y) - \mathcal{L}(\theta^0 \mid y)] \xrightarrow{d} \chi^2_p,$    (5.36)
where $\hat{\theta}$ is the unrestricted MLE and $\theta^0$ is the restricted MLE. If $H_1$ applies, then $LR = O_p(n)$. Large values of this statistic indicate that the restrictions make the observed values much less likely than the unrestricted estimates, so we prefer the unrestricted model and reject the restrictions.

In general, for $H_0: r(\theta) = 0$ and $H_1: r(\theta) \neq 0$, we have
    $L_u = \max_{\theta} L(\theta \mid y) = L(\hat{\theta} \mid y)$    (5.37)
and
    $L_r = \max_{\theta} L(\theta \mid y) \ \text{s.t.}\ r(\theta) = 0 = L(\tilde{\theta} \mid y).$    (5.38)
Under $H_0$,
    $LR = 2[\mathcal{L}(\hat{\theta} \mid y) - \mathcal{L}(\tilde{\theta} \mid y)] \xrightarrow{d} \chi^2_q,$    (5.39)
where $q$ is the length of $r(\cdot)$. Note that in the general case, the likelihood ratio test requires calculation of both the restricted and the unrestricted MLE.
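The statistic (5.36) can be computed directly in the normal model with known unit variance, where the unrestricted MLE of the mean is $\bar{y}$. This sketch (simulation parameters are illustrative) confirms the null distribution is $\chi^2_1$ and shows the $O_p(n)$ growth under an alternative.

```python
import numpy as np

rng = np.random.default_rng(8)

def loglik(y, mu):
    # log-likelihood of N(mu, 1) with known unit variance
    return -0.5 * y.size * np.log(2 * np.pi) - 0.5 * ((y - mu) ** 2).sum()

mu0, n, reps = 0.0, 100, 20_000
lr_null = np.empty(reps)
for r in range(reps):
    y = rng.normal(mu0, 1.0, size=n)
    lr_null[r] = 2 * (loglik(y, y.mean()) - loglik(y, mu0))   # eq. (5.36)

rej_rate = np.mean(lr_null > 3.841)     # 5% critical value of chi^2_1

# Under an alternative mean of 0.3 the statistic is O_p(n): here LR = n * ybar^2.
y_alt = rng.normal(0.3, 1.0, size=1000)
lr_alt = 2 * (loglik(y_alt, y_alt.mean()) - loglik(y_alt, mu0))
```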
5.3.2 Wald Test

The asymptotic normality of the MLE may be used to obtain a test based only on the unrestricted estimates. Now, under $H_0: \theta = \theta^0$, we have
    $\sqrt{n}(\hat{\theta} - \theta^0) \xrightarrow{d} N(0, \vartheta^{-1}(\theta^0)).$    (5.40)
Thus, using the results on the asymptotic behavior of quadratic forms from the previous chapter, we have
    $W = n(\hat{\theta} - \theta^0)'\,\vartheta(\theta^0)\,(\hat{\theta} - \theta^0) \xrightarrow{d} \chi^2_p,$    (5.41)
which is the Wald test. As we discussed for quadratic tests in general, under $H_1: \theta \neq \theta^0$ we would have $W = O_p(n)$. In practice, since
    $-\frac{1}{n}\frac{\partial^2 \mathcal{L}(\hat{\theta} \mid y)}{\partial \theta\, \partial \theta'} = -\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \log f(y_i \mid \hat{\theta})}{\partial \theta\, \partial \theta'} \xrightarrow{p} \vartheta(\theta^0),$    (5.42)
we use
    $W = -(\hat{\theta} - \theta^0)'\,\frac{\partial^2 \mathcal{L}(\hat{\theta} \mid y)}{\partial \theta\, \partial \theta'}\,(\hat{\theta} - \theta^0) \xrightarrow{d} \chi^2_p.$    (5.43)

Aside from having the same asymptotic distribution, the Likelihood Ratio and Wald tests are asymptotically equivalent in the sense that
    $\operatorname{plim}_{n\to\infty}(LR - W) = 0.$    (5.44)
This is shown by expanding $\mathcal{L}(\theta^0 \mid y)$ in a Taylor series about $\hat{\theta}$. That is,
    $\mathcal{L}(\theta^0) = \mathcal{L}(\hat{\theta}) + \frac{\partial \mathcal{L}(\hat{\theta})}{\partial \theta'}(\theta^0 - \hat{\theta}) + \frac{1}{2}(\theta^0 - \hat{\theta})'\frac{\partial^2 \mathcal{L}(\hat{\theta})}{\partial \theta\, \partial \theta'}(\theta^0 - \hat{\theta}) + \frac{1}{6}\sum_i \sum_j \sum_k \frac{\partial^3 \mathcal{L}(\theta^*)}{\partial \theta_i\, \partial \theta_j\, \partial \theta_k}(\theta_i^0 - \hat{\theta}_i)(\theta_j^0 - \hat{\theta}_j)(\theta_k^0 - \hat{\theta}_k),$    (5.45)
where the third-order term applies the intermediate value theorem for $\theta^*$ between $\hat{\theta}$ and $\theta^0$. Now $\partial \mathcal{L}(\hat{\theta})/\partial \theta = 0$, and the third-order term can be shown to be $O_p(1/\sqrt{n})$ under assumption 5, whereupon we have
    $\mathcal{L}(\hat{\theta}) - \mathcal{L}(\theta^0) = -\frac{1}{2}(\theta^0 - \hat{\theta})'\frac{\partial^2 \mathcal{L}(\hat{\theta})}{\partial \theta\, \partial \theta'}(\theta^0 - \hat{\theta}) + O_p(1/\sqrt{n}).$    (5.46)
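The equivalence (5.44) is most instructive in a model where $LR$ and $W$ are not algebraically identical. This sketch uses an exponential rate model (an illustrative choice): both statistics are built from the MLE $\hat{\theta} = 1/\bar{y}$, with the information estimated by $1/\hat{\theta}^2$ as in (5.42). Both tests reject at close to the nominal 5% rate under the null, and the two statistics differ only by a term that vanishes as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(9)

theta0, n, reps = 2.0, 500, 10_000

def lr_and_wald(y):
    """LR and Wald statistics for H0: theta = theta0 in f(y|theta) = theta exp(-theta y)."""
    th = 1.0 / y.mean()                            # unrestricted MLE
    def loglik(t):
        return y.size * np.log(t) - t * y.sum()
    lr = 2 * (loglik(th) - loglik(theta0))
    w = y.size * (th - theta0) ** 2 / th ** 2      # information estimated by 1/theta_hat^2
    return lr, w

draws = np.array([lr_and_wald(rng.exponential(1 / theta0, size=n)) for _ in range(reps)])
lr_stats, w_stats = draws[:, 0], draws[:, 1]

lr_rej = np.mean(lr_stats > 3.841)                 # both near the nominal 5%
w_rej = np.mean(w_stats > 3.841)
mean_gap = np.mean(np.abs(lr_stats - w_stats))     # small, shrinking with n, per (5.44)
```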
More information3.7 Implicit Differentiation -- A Brief Introduction -- Student Notes
Fin these erivatives of these functions: y.7 Implicit Differentiation -- A Brief Introuction -- Stuent Notes tan y sin tan = sin y e = e = Write the inverses of these functions: y tan y sin How woul we
More information1. Aufgabenblatt zur Vorlesung Probability Theory
24.10.17 1. Aufgabenblatt zur Vorlesung By (Ω, A, P ) we always enote the unerlying probability space, unless state otherwise. 1. Let r > 0, an efine f(x) = 1 [0, [ (x) exp( r x), x R. a) Show that p f
More informationMath 1271 Solutions for Fall 2005 Final Exam
Math 7 Solutions for Fall 5 Final Eam ) Since the equation + y = e y cannot be rearrange algebraically in orer to write y as an eplicit function of, we must instea ifferentiate this relation implicitly
More informationOn conditional moments of high-dimensional random vectors given lower-dimensional projections
Submitte to the Bernoulli arxiv:1405.2183v2 [math.st] 6 Sep 2016 On conitional moments of high-imensional ranom vectors given lower-imensional projections LUKAS STEINBERGER an HANNES LEEB Department of
More informationEconometrics I. September, Part I. Department of Economics Stanford University
Econometrics I Deartment of Economics Stanfor University Setember, 2008 Part I Samling an Data Poulation an Samle. ineenent an ientical samling. (i.i..) Samling with relacement. aroximates samling without
More informationA Review of Multiple Try MCMC algorithms for Signal Processing
A Review of Multiple Try MCMC algorithms for Signal Processing Luca Martino Image Processing Lab., Universitat e València (Spain) Universia Carlos III e Mari, Leganes (Spain) Abstract Many applications
More informationTHE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE
Journal of Soun an Vibration (1996) 191(3), 397 414 THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE E. M. WEINSTEIN Galaxy Scientific Corporation, 2500 English Creek
More informationSome Examples. Uniform motion. Poisson processes on the real line
Some Examples Our immeiate goal is to see some examples of Lévy processes, an/or infinitely-ivisible laws on. Uniform motion Choose an fix a nonranom an efine X := for all (1) Then, {X } is a [nonranom]
More informationLecture 5. Symmetric Shearer s Lemma
Stanfor University Spring 208 Math 233: Non-constructive methos in combinatorics Instructor: Jan Vonrák Lecture ate: January 23, 208 Original scribe: Erik Bates Lecture 5 Symmetric Shearer s Lemma Here
More informationA Weak First Digit Law for a Class of Sequences
International Mathematical Forum, Vol. 11, 2016, no. 15, 67-702 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1288/imf.2016.6562 A Weak First Digit Law for a Class of Sequences M. A. Nyblom School of
More informationMath Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors
Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+
More informationThe Exact Form and General Integrating Factors
7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily
More informationImplicit Differentiation
Implicit Differentiation Thus far, the functions we have been concerne with have been efine explicitly. A function is efine explicitly if the output is given irectly in terms of the input. For instance,
More informationApplications of the Wronskian to ordinary linear differential equations
Physics 116C Fall 2011 Applications of the Wronskian to orinary linear ifferential equations Consier a of n continuous functions y i (x) [i = 1,2,3,...,n], each of which is ifferentiable at least n times.
More informationELEC3114 Control Systems 1
ELEC34 Control Systems Linear Systems - Moelling - Some Issues Session 2, 2007 Introuction Linear systems may be represente in a number of ifferent ways. Figure shows the relationship between various representations.
More informationFLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction
FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number
More informationZachary Scherr Math 503 HW 3 Due Friday, Feb 12
Zachary Scherr Math 503 HW 3 Due Friay, Feb 1 1 Reaing 1. Rea sections 7.5, 7.6, 8.1 of Dummit an Foote Problems 1. DF 7.5. Solution: This problem is trivial knowing how to work with universal properties.
More informationSpurious Significance of Treatment Effects in Overfitted Fixed Effect Models Albrecht Ritschl 1 LSE and CEPR. March 2009
Spurious Significance of reatment Effects in Overfitte Fixe Effect Moels Albrecht Ritschl LSE an CEPR March 2009 Introuction Evaluating subsample means across groups an time perios is common in panel stuies
More informationWitt#5: Around the integrality criterion 9.93 [version 1.1 (21 April 2013), not completed, not proofread]
Witt vectors. Part 1 Michiel Hazewinkel Sienotes by Darij Grinberg Witt#5: Aroun the integrality criterion 9.93 [version 1.1 21 April 2013, not complete, not proofrea In [1, section 9.93, Hazewinkel states
More informationMath 342 Partial Differential Equations «Viktor Grigoryan
Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite
More informationThe derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)
Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)
More informationOptimization of Geometries by Energy Minimization
Optimization of Geometries by Energy Minimization by Tracy P. Hamilton Department of Chemistry University of Alabama at Birmingham Birmingham, AL 3594-140 hamilton@uab.eu Copyright Tracy P. Hamilton, 1997.
More informationSome functions and their derivatives
Chapter Some functions an their erivatives. Derivative of x n for integer n Recall, from eqn (.6), for y = f (x), Also recall that, for integer n, Hence, if y = x n then y x = lim δx 0 (a + b) n = a n
More informationMath 115 Section 018 Course Note
Course Note 1 General Functions Definition 1.1. A function is a rule that takes certain numbers as inputs an assigns to each a efinite output number. The set of all input numbers is calle the omain of
More informationII. First variation of functionals
II. First variation of functionals The erivative of a function being zero is a necessary conition for the etremum of that function in orinary calculus. Let us now tackle the question of the equivalent
More informationProof of SPNs as Mixture of Trees
A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a
More information0.1 Differentiation Rules
0.1 Differentiation Rules From our previous work we ve seen tat it can be quite a task to calculate te erivative of an arbitrary function. Just working wit a secon-orer polynomial tings get pretty complicate
More informationMath 1B, lecture 8: Integration by parts
Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores
More informationTwo formulas for the Euler ϕ-function
Two formulas for the Euler ϕ-function Robert Frieman A multiplication formula for ϕ(n) The first formula we want to prove is the following: Theorem 1. If n 1 an n 2 are relatively prime positive integers,
More informationModeling of Dependence Structures in Risk Management and Solvency
Moeling of Depenence Structures in Risk Management an Solvency University of California, Santa Barbara 0. August 007 Doreen Straßburger Structure. Risk Measurement uner Solvency II. Copulas 3. Depenent
More informationCalculus and optimization
Calculus an optimization These notes essentially correspon to mathematical appenix 2 in the text. 1 Functions of a single variable Now that we have e ne functions we turn our attention to calculus. A function
More informationTAYLOR S POLYNOMIAL APPROXIMATION FOR FUNCTIONS
MISN-0-4 TAYLOR S POLYNOMIAL APPROXIMATION FOR FUNCTIONS f(x ± ) = f(x) ± f ' (x) + f '' (x) 2 ±... 1! 2! = 1.000 ± 0.100 + 0.005 ±... TAYLOR S POLYNOMIAL APPROXIMATION FOR FUNCTIONS by Peter Signell 1.
More information4. Distributions of Functions of Random Variables
4. Distributions of Functions of Random Variables Setup: Consider as given the joint distribution of X 1,..., X n (i.e. consider as given f X1,...,X n and F X1,...,X n ) Consider k functions g 1 : R n
More informationRelative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation
Relative Entropy an Score Function: New Information Estimation Relationships through Arbitrary Aitive Perturbation Dongning Guo Department of Electrical Engineering & Computer Science Northwestern University
More informationLeChatelier Dynamics
LeChatelier Dynamics Robert Gilmore Physics Department, Drexel University, Philaelphia, Pennsylvania 1914, USA (Date: June 12, 28, Levine Birthay Party: To be submitte.) Dynamics of the relaxation of a
More informationConservation Laws. Chapter Conservation of Energy
20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action
More informationBasic Thermoelasticity
Basic hermoelasticity Biswajit Banerjee November 15, 2006 Contents 1 Governing Equations 1 1.1 Balance Laws.............................................. 2 1.2 he Clausius-Duhem Inequality....................................
More informationA Sketch of Menshikov s Theorem
A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p
More informationSYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions
SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu
More informationCalculus of Variations
Calculus of Variations Lagrangian formalism is the main tool of theoretical classical mechanics. Calculus of Variations is a part of Mathematics which Lagrangian formalism is base on. In this section,
More informationMath 300 Winter 2011 Advanced Boundary Value Problems I. Bessel s Equation and Bessel Functions
Math 3 Winter 2 Avance Bounary Value Problems I Bessel s Equation an Bessel Functions Department of Mathematical an Statistical Sciences University of Alberta Bessel s Equation an Bessel Functions We use
More informationInternational Journal of Pure and Applied Mathematics Volume 35 No , ON PYTHAGOREAN QUADRUPLES Edray Goins 1, Alain Togbé 2
International Journal of Pure an Applie Mathematics Volume 35 No. 3 007, 365-374 ON PYTHAGOREAN QUADRUPLES Eray Goins 1, Alain Togbé 1 Department of Mathematics Purue University 150 North University Street,
More informationThe Delta Method and Applications
Chapter 5 The Delta Method and Applications 5.1 Local linear approximations Suppose that a particular random sequence converges in distribution to a particular constant. The idea of using a first-order
More informationCalculus of Variations
16.323 Lecture 5 Calculus of Variations Calculus of Variations Most books cover this material well, but Kirk Chapter 4 oes a particularly nice job. x(t) x* x*+ αδx (1) x*- αδx (1) αδx (1) αδx (1) t f t
More information