Lecture 5: Linear Regressions
In Lecture 2, we introduced stationary linear time series models. In that lecture, we discussed the data generating processes and their characteristics, assuming that we know all parameters (autoregressive or moving average coefficients). However, in empirical studies we have to specify an econometric model, estimate this model, and draw inferences based on the estimates. In this lecture, we provide an introduction to parametric estimation of a linear model with time series observations. Three commonly used estimation methods are least squares estimation (LS), maximum likelihood estimation (MLE), and the generalized method of moments (GMM). In this lecture, we discuss LS and MLE.

1 Least Squares Estimation

Least squares (LS) estimation is one of the first techniques we learn in econometrics. It is both intuitive and easy to implement, and the famous Gauss-Markov theorem tells us that under certain assumptions the ordinary least squares (OLS) estimator is the best linear unbiased estimator (BLUE). We start with a review of classical LS estimation and then consider estimation under relaxed assumptions. Below are our notation for this lecture and the basic algebra of LS estimation.

Consider the regression

$$y_t = x_t'\beta_0 + u_t, \quad t = 1, \ldots, n, \tag{1}$$

where $x_t$ is a $k \times 1$ vector and $\beta_0$, also a $k \times 1$ vector, is the true parameter. Then the OLS estimator of $\beta_0$, denoted $\hat\beta_n$, is

$$\hat\beta_n = \left( \sum_{t=1}^n x_t x_t' \right)^{-1} \sum_{t=1}^n x_t y_t, \tag{2}$$

and the OLS sample residual is $\hat u_t = y_t - x_t'\hat\beta_n$. Sometimes it is more convenient to work in matrix form. Define

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad X = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix}, \qquad U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.$$

Then the regression can be written as

$$Y = X\beta_0 + U, \tag{3}$$

(Copyright by Ling Hu.)
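The basic algebra above can be checked numerically. The following sketch (ours, not part of the original notes; all names and numbers are illustrative, and `numpy` is assumed) verifies that the summation form (2) and the matrix form of the OLS estimator coincide, and that the residuals are orthogonal to the regressors.

```python
import numpy as np

# Build the regression (1)/(3) with a known true beta_0.
rng = np.random.default_rng(0)
n, k = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # rows are x_t'
beta0 = np.array([1.0, -0.5])
U = rng.normal(0.0, 1.0, n)
Y = X @ beta0 + U

# Summation form of equation (2): (sum x_t x_t')^{-1} sum x_t y_t.
S_xx = sum(np.outer(X[t], X[t]) for t in range(n))
S_xy = sum(X[t] * Y[t] for t in range(n))
beta_hat_sum = np.linalg.solve(S_xx, S_xy)

# Matrix form (X'X)^{-1} X'Y.
beta_hat_mat = np.linalg.solve(X.T @ X, X.T @ Y)

# OLS residuals; the normal equations force X' u_hat = 0.
u_hat = Y - X @ beta_hat_mat
```

Both forms are the same estimator, so they agree to machine precision; the orthogonality $X'\hat U = 0$ is exactly the first-order condition of the least squares problem.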
and the OLS estimator can be written as

$$\hat\beta_n = (X'X)^{-1}X'Y. \tag{4}$$

Define $M_X = I_n - X(X'X)^{-1}X'$. It is easy to see that $M_X$ is symmetric, idempotent ($M_X M_X = M_X$), and orthogonal to the columns of $X$. Then we have

$$\hat U = Y - X\hat\beta_n = M_X Y.$$

To derive the distribution of the estimator $\hat\beta_n$, write

$$\hat\beta_n = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta_0 + U) = \beta_0 + (X'X)^{-1}X'U. \tag{5}$$

Therefore, the properties of $\hat\beta_n$ depend on $(X'X)^{-1}X'U$. For example, if $E[(X'X)^{-1}X'U] = 0$, then $\hat\beta_n$ is an unbiased estimator.

1.1 Case 1: OLS with deterministic regressors and i.i.d. Gaussian errors

Assumption 1. (a) $x_t$ is deterministic; (b) $u_t \sim \text{i.i.d.}(0, \sigma^2)$; (c) $u_t \sim \text{i.i.d. } N(0, \sigma^2)$.

Under assumptions (a) and (b), $E(U) = 0$ and $E(UU') = \sigma^2 I_n$. Then from (5) we have

$$E(\hat\beta_n) = \beta_0 + (X'X)^{-1}X'E(U) = \beta_0$$

and

$$E(\hat\beta_n - \beta_0)(\hat\beta_n - \beta_0)' = E[(X'X)^{-1}X'UU'X(X'X)^{-1}] = (X'X)^{-1}X'E(UU')X(X'X)^{-1} = \sigma^2(X'X)^{-1}.$$

Under these assumptions, the Gauss-Markov theorem tells us that the OLS estimator $\hat\beta_n$ is the best linear unbiased estimator of $\beta_0$. The OLS estimator of $\sigma^2$ is

$$s^2 = \hat U'\hat U/(n-k) = U'M_X'M_X U/(n-k) = U'M_X U/(n-k). \tag{6}$$

Since $M_X$ is symmetric, there exists an $n \times n$ matrix $P$ such that $M_X = P\Lambda P'$ and $P'P = I_n$, where $\Lambda$ is an $n \times n$ matrix with the eigenvalues of $M_X$ along the principal diagonal and zeros elsewhere. From the properties of $M_X$ we can compute that $\Lambda$ contains $k$ zeros and $n-k$ ones along its principal diagonal. Then

$$RSS = U'M_X U = U'P\Lambda P'U = (P'U)'\Lambda(P'U) = W'\Lambda W = \sum_{t=1}^n \lambda_t w_t^2,$$
where $W = P'U$. Then $E(WW') = P'E(UU')P = \sigma^2 I_n$; therefore the $w_t$ are uncorrelated with mean 0 and variance $\sigma^2$. Therefore,

$$E(U'M_X U) = \sum_{t=1}^n \lambda_t E(w_t^2) = (n-k)\sigma^2.$$

So the $s^2$ defined in (6) is an unbiased estimator of $\sigma^2$: $E(s^2) = \sigma^2$. With the Gaussian assumption (c), $\hat\beta_n$ is also Gaussian,

$$\hat\beta_n - \beta_0 \sim N(0, \sigma^2(X'X)^{-1}).$$

Note that here $\hat\beta_n$ is exactly normal, while many of the estimators in our later discussions are only asymptotically normal. Actually, under Assumption 1, the OLS estimator is optimal. Also, with the Gaussian assumption, $w_t \sim \text{i.i.d. } N(0, \sigma^2)$, therefore

$$U'M_X U/\sigma^2 \sim \chi^2(n-k).$$

1.2 Case 2: OLS with stochastic regressors and i.i.d. Gaussian errors

The assumption of deterministic regressors is very strong for empirical studies in economics. Some examples of deterministic regressors are constants and deterministic trends (i.e. $x_t = (1, t, t^2, \ldots)'$). However, most data we have for econometric regressions are stochastic. Therefore, from this subsection on we allow the regressors to be stochastic. However, in Case 2 and Case 3 we assume that $x_t$ is independent of the errors at all leads and lags. This is still too strong in time series, as it rules out many processes, including ARMA models.

Assumption 2. (a) $x_t$ is stochastic and independent of $u_s$ for all $t, s$; (b) $u_t \sim \text{i.i.d. } N(0, \sigma^2)$.

This assumption can be equivalently written as $U \mid X \sim N(0, \sigma^2 I_n)$. Under these assumptions, $\hat\beta_n$ is still unbiased:

$$E(\hat\beta_n) = \beta_0 + E[(X'X)^{-1}X']E(U) = \beta_0.$$

Conditional on $X$, $\hat\beta_n$ is normal, $\hat\beta_n \mid X \sim N(\beta_0, \sigma^2(X'X)^{-1})$. To get the unconditional probability distribution of $\hat\beta_n$, we have to integrate this conditional density over $X$; therefore, the unconditional distribution of $\hat\beta_n$ will depend on the distribution of $X$. However, we still have the unconditional distribution for the estimate of the variance,

$$U'M_X U/\sigma^2 \sim \chi^2(n-k).$$

1.3 Case 3: OLS with stochastic regressors and i.i.d. non-Gaussian errors

Compared to Case 2, in this section we let the error terms follow an arbitrary i.i.d. distribution with finite fourth moments.
Since this is an arbitrary unknown distribution, it is very hard to obtain the exact (finite sample) distribution of $\hat\beta_n$; instead, we apply asymptotic theory to this problem.

Assumption 3. (a) $x_t$ is stochastic and independent of $u_s$ for all $t, s$; (b) $u_t \sim \text{i.i.d.}(0, \sigma^2)$, and $E(u_t^4) = \mu_4 < \infty$; (c) $E(x_t x_t') = Q_t$, a positive definite matrix, with $(1/n)\sum_{t=1}^n Q_t \to Q$, a positive definite matrix; (d) $E(x_{it} x_{jt} x_{kt} x_{lt}) < \infty$ for all $i, j, k, l$ and $t$; (e) $(1/n)\sum_{t=1}^n x_t x_t' \to_p Q$.
With assumption (a), we still have that $\hat\beta_n$ is an unbiased estimator of $\beta_0$. Assumptions (c) to (e) are restrictions on $x_t$; basically, we want $(1/n)\sum x_t x_t'$ to converge in probability to the limit of $(1/n)\sum E(x_t x_t')$. We have

$$\hat\beta_n - \beta_0 = \left( \sum_{t=1}^n x_t x_t' \right)^{-1} \sum_{t=1}^n x_t u_t = \left[ (1/n)\sum_{t=1}^n x_t x_t' \right]^{-1} (1/n)\sum_{t=1}^n x_t u_t.$$

From the assumptions and the continuous mapping theorem, we have

$$\left[ (1/n)\sum_{t=1}^n x_t x_t' \right]^{-1} \to_p Q^{-1}.$$

$x_t u_t$ is a martingale difference sequence with finite variance; then by the LLN for mixingales, we have $(1/n)\sum_{t=1}^n x_t u_t \to_p 0$. Therefore $\hat\beta_n \to_p \beta_0$, so $\hat\beta_n$ is a consistent estimator. Next, we derive its distribution. This is the first time we derive the asymptotic distribution of an OLS estimator; the routine for deriving the asymptotic distribution of $\hat\beta_n$ is as follows. First we apply an LLN to the term $\sum x_t x_t'$, after proper norming (so that the limit is a constant), and then apply the continuous mapping theorem to get the limit of $[\sum x_t x_t']^{-1}$; we already obtained this in the proof of consistency above. Then we apply a CLT to the term $\sum x_t u_t$, also after proper norming (so that the limit is nondegenerate). Note that $E(x_t x_t' u_t^2) = \sigma^2 Q_t$ and $(1/n)\sum \sigma^2 Q_t \to \sigma^2 Q$. By the CLT for mds, we have

$$(1/\sqrt{n}) \sum_{t=1}^n x_t u_t \to_d N(0, \sigma^2 Q).$$

Therefore,

$$\sqrt{n}(\hat\beta_n - \beta_0) = \left[ (1/n)\sum_{t=1}^n x_t x_t' \right]^{-1} (1/\sqrt{n})\sum_{t=1}^n x_t u_t \to_d N(0, Q^{-1}(\sigma^2 Q)Q^{-1}) = N(0, \sigma^2 Q^{-1}),$$

so $\hat\beta_n$ approximately follows

$$\hat\beta_n \sim N(\beta_0, \sigma^2 Q^{-1}/n).$$

Note that this distribution is not exact, but approximate, so we should read it as "approximately distributed as normal."
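A small Monte Carlo sketch (ours, with illustrative values and `numpy` assumed) can make the approximation concrete: with a scalar regressor, $Q = E(x_t^2) = 1$ and $\sigma^2 = 1$, so $\sqrt{n}(\hat\beta_n - \beta_0)$ should have mean near 0 and variance near $\sigma^2/Q = 1$ across replications, even with non-Gaussian errors.

```python
import numpy as np

# Monte Carlo check of sqrt(n)(beta_hat - beta_0) ~ N(0, sigma^2 Q^{-1})
# with uniform (non-Gaussian, finite fourth moment) errors.
rng = np.random.default_rng(1)
n, reps = 400, 2000
draws = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)                       # Q = E(x_t^2) = 1
    u = rng.uniform(-np.sqrt(3), np.sqrt(3), n)  # i.i.d.(0, 1)
    y = 0.7 * x + u
    b = (x @ y) / (x @ x)                        # scalar OLS
    draws[r] = np.sqrt(n) * (b - 0.7)

mc_mean = draws.mean()   # should be near 0
mc_var = draws.var()     # should be near sigma^2 / Q = 1
```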
To compute this variance, we need to know $\sigma^2$. When it is unknown, the OLS estimator $s^2$ is still consistent under Assumption 3. We have

$$u_t^2 = (y_t - x_t'\beta_0)^2 = \left[ (y_t - x_t'\hat\beta_n) + x_t'(\hat\beta_n - \beta_0) \right]^2 = (y_t - x_t'\hat\beta_n)^2 + 2(y_t - x_t'\hat\beta_n) x_t'(\hat\beta_n - \beta_0) + [x_t'(\hat\beta_n - \beta_0)]^2.$$

By the LLN, $(1/n)\sum u_t^2 \to_p \sigma^2$. There are three terms in the expansion above. For the second term, we have

$$(1/n)\sum_{t=1}^n (y_t - x_t'\hat\beta_n) x_t'(\hat\beta_n - \beta_0) = 0,$$

as $(y_t - x_t'\hat\beta_n)$ is orthogonal to $x_t$. For the third term,

$$(\hat\beta_n - \beta_0)' \left[ (1/n)\sum_{t=1}^n x_t x_t' \right] (\hat\beta_n - \beta_0) \to_p 0,$$

as $\hat\beta_n - \beta_0$ is $o_p(1)$ and $(1/n)\sum x_t x_t' \to_p Q$. Therefore, if we define

$$\hat\sigma_n^2 = (1/n)\sum_{t=1}^n (y_t - x_t'\hat\beta_n)^2,$$

we have

$$\hat\sigma_n^2 = (1/n)\sum_{t=1}^n u_t^2 - (1/n)\sum_{t=1}^n [x_t'(\hat\beta_n - \beta_0)]^2 \to_p \sigma^2.$$

This estimator is only slightly different from $s^2$ ($\hat\sigma_n^2 = (n-k)s^2/n$). Since $(n-k)/n \to 1$ as $n \to \infty$, if $\hat\sigma_n^2$ is consistent, so is $s^2$. Next, to derive the distribution of $\hat\sigma_n^2$,

$$\sqrt{n}(\hat\sigma_n^2 - \sigma^2) = (1/\sqrt{n})\sum_{t=1}^n (u_t^2 - \sigma^2) - \sqrt{n}(\hat\beta_n - \beta_0)' \left[ (1/n)\sum_{t=1}^n x_t x_t' \right] (\hat\beta_n - \beta_0).$$

The second term goes to zero since $(1/n)\sum x_t x_t' \to_p Q$, $\sqrt{n}(\hat\beta_n - \beta_0) = O_p(1)$, and $\hat\beta_n - \beta_0 \to_p 0$. Define $z_t = u_t^2 - \sigma^2$; then $z_t$ is i.i.d. with mean zero and variance $E(u_t^4) - \sigma^4 = \mu_4 - \sigma^4$. Applying the CLT, we have

$$(1/\sqrt{n})\sum_{t=1}^n z_t \to_d N(0, \mu_4 - \sigma^4), \quad \text{therefore} \quad \sqrt{n}(\hat\sigma_n^2 - \sigma^2) \to_d N(0, \mu_4 - \sigma^4).$$

The same limit distribution applies for $s^2$, since the difference between $\hat\sigma_n^2$ and $s^2$ is $o_p(n^{-1/2})$.
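The relation $\hat\sigma_n^2 = (n-k)s^2/n$ and the consistency of both estimators can be sketched numerically (ours, illustrative values, `numpy` assumed), again with a non-Gaussian error satisfying the finite fourth moment condition:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50_000, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta0 = np.array([0.3, -1.0])
sigma2 = 4.0
# Uniform(-a, a) has variance a^2/3, so a = sqrt(3 * sigma2) gives variance 4.
U = rng.uniform(-np.sqrt(3 * sigma2), np.sqrt(3 * sigma2), n)
Y = X @ beta0 + U

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / n    # the estimator sigma_hat_n^2
s2 = resid @ resid / (n - k)      # the degrees-of-freedom corrected s^2
```

With $n = 50{,}000$ both estimates sit close to the true $\sigma^2 = 4$, and their difference is of order $k/n$, negligible at this sample size.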
1.4 Case 4: OLS estimation in an autoregression with i.i.d. errors

In an autoregression, say $x_t = \phi_0 x_{t-1} + \epsilon_t$ where $\epsilon_t$ is i.i.d., the regressors are no longer independent of the errors at all leads and lags. In this case, the OLS estimator of $\phi_0$ is biased. However, we will show that under Assumption 4 the estimator is consistent.

Assumption 4. The regression model is

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t,$$

with the roots of $1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0$ outside the unit circle (so $y_t$ is stationary), and with $\epsilon_t$ i.i.d. with mean zero, variance $\sigma^2$, and finite fourth moment $\mu_4$.

Hamilton presents the general AR(p) case with a constant. We will use an AR(2) as an example,

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \epsilon_t.$$

Let $x_t = (y_{t-1}, y_{t-2})'$, $u_t = \epsilon_t$, and $y_t = x_t'\beta_0 + u_t$ (so $\beta_0 = (\phi_1, \phi_2)'$). Then

$$\sqrt{n}(\hat\beta_n - \beta_0) = \left[ (1/n)\sum x_t x_t' \right]^{-1} (1/\sqrt{n})\sum x_t u_t. \tag{7}$$

The first term contains

$$(1/n)\sum x_t x_t' = (1/n) \begin{pmatrix} \sum y_{t-1}^2 & \sum y_{t-1} y_{t-2} \\ \sum y_{t-1} y_{t-2} & \sum y_{t-2}^2 \end{pmatrix}.$$

In this matrix, the diagonal terms $(1/n)\sum y_{t-j}^2$ converge to $\gamma_0$, and the remaining (off-diagonal) terms converge to $\gamma_1$. Therefore,

$$(1/n)\sum x_t x_t' \to_p Q = \begin{pmatrix} \gamma_0 & \gamma_1 \\ \gamma_1 & \gamma_0 \end{pmatrix}.$$

Applying the CLT for mds to the second term in (7),

$$(1/\sqrt{n})\sum x_t u_t \to_d N(0, \sigma^2 Q),$$

therefore

$$\sqrt{n}(\hat\beta_n - \beta_0) \to_d N(0, \sigma^2 Q^{-1}).$$

So far we have considered four cases of OLS regressions. The common assumption in all four cases is i.i.d. errors. From the next section on, we consider cases where the errors are not i.i.d.

1.5 OLS with non-i.i.d. errors

When the error $u_t$ is i.i.d., the variance-covariance matrix is $V = E(UU') = \sigma^2 I_n$. If $V$ is still diagonal but the elements are not all equal, for example when the errors on some dates display larger variance than on others, then the errors are said to exhibit heteroskedasticity. If $V$ is non-diagonal, then the errors are said to be autocorrelated; for example, if $u_t = \epsilon_t - \phi\epsilon_{t-1}$ where $\epsilon_t$ is i.i.d., then $u_t$ is a serially correlated error. Case 5 in Hamilton assumes
Assumption 5. (a) $x_t$ is stochastic; (b) conditional on the full matrix $X$, the vector $U \sim N(0, \sigma^2 V)$; (c) $V$ is a known positive definite matrix.

Under these assumptions, the exact distribution of $\hat\beta_n$ can be derived. However, this is a very strong assumption, and it rules out autoregressions. Also, the assumption that $V$ is known rarely holds in applications. Case 6 in Hamilton assumes uncorrelated but heteroskedastic errors with an unknown covariance matrix. Under Assumption 6, the OLS estimator is still consistent and asymptotically normal.

Assumption 6. (a) $x_t$ stochastic, including perhaps lagged values of $y$; (b) $x_t u_t$ is a martingale difference sequence; (c) $E(u_t^2 x_t x_t') = \Omega_t$, a positive definite matrix, with $(1/n)\sum \Omega_t \to \Omega$ and $(1/n)\sum u_t^2 x_t x_t' \to_p \Omega$; (d) $E(u_t^4 x_{it} x_{jt} x_{lt} x_{kt}) < \infty$ for all $i, j, k, l$ and $t$; (e) the plims of $(1/n)\sum u_t x_{it} x_t x_t'$ and $(1/n)\sum x_{it} x_{jt} x_t x_t'$ exist and are finite for all $i, j$, and $(1/n)\sum x_t x_t' \to_p Q$, a nonsingular matrix.

Again, write the OLS estimator as

$$\sqrt{n}(\hat\beta_n - \beta_0) = \left[ (1/n)\sum x_t x_t' \right]^{-1} (1/\sqrt{n})\sum x_t u_t.$$

Assumption 6(e) ensures that $(1/n)\sum x_t x_t' \to_p Q$. Applying the CLT for mds,

$$(1/\sqrt{n})\sum x_t u_t \to_d N(0, \Omega),$$

therefore

$$\sqrt{n}(\hat\beta_n - \beta_0) \to_d N(0, Q^{-1}\Omega Q^{-1}).$$

However, both $Q$ and $\Omega$ are not observable, and we need to find consistent estimates for them. White proposes the following estimators: $\hat Q = (1/n)\sum x_t x_t'$ and $\hat\Omega = (1/n)\sum \hat u_t^2 x_t x_t'$, where $\hat u_t$ is the OLS residual $y_t - x_t'\hat\beta_n$.

Proposition 1. With heteroskedasticity of unknown form satisfying Assumption 6, the asymptotic variance-covariance matrix of the OLS coefficient vector can be consistently estimated by

$$\hat Q^{-1}\hat\Omega\hat Q^{-1} \to_p Q^{-1}\Omega Q^{-1}. \tag{8}$$

Proof: Assumption 6(e) ensures $\hat Q \to_p Q$, and Assumption 6(c) ensures that $\tilde\Omega \equiv (1/n)\sum u_t^2 x_t x_t' \to_p \Omega$. So to prove (8), we only need to show that

$$\hat\Omega - \tilde\Omega = (1/n)\sum_{t=1}^n (\hat u_t^2 - u_t^2) x_t x_t' \to_p 0.$$
The trick here is to make use of the known fact that $\hat\beta_n - \beta_0 \to_p 0$. If we can write $\hat\Omega - \tilde\Omega$ (where $\tilde\Omega \equiv (1/n)\sum u_t^2 x_t x_t'$) as a sum of products of $\hat\beta_n - \beta_0$ and terms that are bounded, then $\hat\Omega - \tilde\Omega \to_p 0$. Now

$$\hat u_t^2 - u_t^2 = (\hat u_t + u_t)(\hat u_t - u_t) = \left[ 2(y_t - \beta_0'x_t) - (\hat\beta_n - \beta_0)'x_t \right] \left[ -(\hat\beta_n - \beta_0)'x_t \right] = -2u_t (\hat\beta_n - \beta_0)'x_t + [(\hat\beta_n - \beta_0)'x_t]^2.$$

Then

$$\hat\Omega - \tilde\Omega = (-2/n)\sum_{t=1}^n u_t (\hat\beta_n - \beta_0)'x_t \, (x_t x_t') + (1/n)\sum_{t=1}^n [(\hat\beta_n - \beta_0)'x_t]^2 (x_t x_t').$$

Write the first term as

$$(-2/n)\sum_{t=1}^n u_t (\hat\beta_n - \beta_0)'x_t \, (x_t x_t') = -2\sum_{i=1}^k (\hat\beta_{in} - \beta_{i0}) \left[ (1/n)\sum_{t=1}^n u_t x_{it} (x_t x_t') \right].$$

The term in brackets has a finite plim by Assumption 6(e), and we have $\hat\beta_{in} - \beta_{i0} \to_p 0$ for each $i$, so this term converges to zero. (If this looks messy, take $k = 1$; then you can simply move $(\hat\beta_n - \beta_0)$ out of the summation: $\hat\beta_n - \beta_0 \to_p 0$ and the sum has a finite plim, so the product goes to zero.) Similarly, for the second term,

$$(1/n)\sum_{t=1}^n [(\hat\beta_n - \beta_0)'x_t]^2 (x_t x_t') = \sum_{i=1}^k \sum_{j=1}^k (\hat\beta_{in} - \beta_{i0})(\hat\beta_{jn} - \beta_{j0}) \left[ (1/n)\sum_{t=1}^n x_{it} x_{jt} (x_t x_t') \right] \to_p 0,$$

as the terms in brackets have finite plims. Therefore $\hat\Omega - \tilde\Omega \to_p 0$. $\square$

Define $\hat V = \hat Q^{-1}\hat\Omega\hat Q^{-1}$; then $\hat\beta_n \approx N(\beta_0, \hat V/n)$, and $\hat V/n$ is a heteroskedasticity-consistent estimate of the variance-covariance matrix. Newey and West propose the following estimator of the variance-covariance matrix, which is heteroskedasticity and autocorrelation consistent (HAC):

$$\hat V/n = (X'X)^{-1} \left[ \sum_{t=1}^n \hat u_t^2 x_t x_t' + \sum_{k=1}^q \left( 1 - \frac{k}{q+1} \right) \sum_{t=k+1}^n (x_t \hat u_t \hat u_{t-k} x_{t-k}' + x_{t-k} \hat u_{t-k} \hat u_t x_t') \right] (X'X)^{-1}.$$

1.6 Generalized least squares

Generalized least squares (GLS) and feasible generalized least squares (FGLS) are preferred in least squares estimation when the errors are heteroskedastic and/or autocorrelated. Let $x_t$ be stochastic and $U \mid X \sim N(0, \sigma^2 V)$, where $V$ is known (Assumption 5). Since $V$ is symmetric and positive definite, there exists a matrix $L$ such that $V^{-1} = L'L$. Premultiplying our regression by $L$ gives

$$LY = LX\beta_0 + LU.$$
Then the new error $\tilde U = LU$ is i.i.d. conditional on $X$:

$$E(\tilde U \tilde U' \mid X) = L\,E(UU' \mid X)\,L' = \sigma^2 LVL' = \sigma^2 I_n.$$

Then the estimator

$$\tilde\beta = (X'L'LX)^{-1}X'L'LY = (X'V^{-1}X)^{-1}X'V^{-1}Y$$

is known as the generalized least squares estimator. However, as we remarked earlier, in applications $V$ is rarely known, and we have to estimate it. The GLS estimator obtained using an estimated $V$ is known as the feasible GLS (FGLS) estimator. Usually, FGLS requires that we specify a parametric model for the error. For example, let the error $u_t$ follow an AR(1) process, $u_t = \rho_0 u_{t-1} + \epsilon_t$, where $\epsilon_t \sim \text{i.i.d.}(0, \sigma^2)$. In this case, we can run OLS first and obtain the OLS residuals $\hat u_t$, and then run an OLS estimation of $\rho$ using the $\hat u_t$. This estimator, denoted $\hat\rho$, is a consistent estimator of $\rho_0$. To show this, write

$$\hat u_t = y_t - x_t'\hat\beta_n = u_t + (\beta_0 - \hat\beta_n)'x_t.$$

Then

$$(1/n)\sum \hat u_t \hat u_{t-1} = (1/n)\sum \left[ u_t + (\beta_0 - \hat\beta_n)'x_t \right] \left[ u_{t-1} + (\beta_0 - \hat\beta_n)'x_{t-1} \right]$$
$$= (1/n)\sum u_t u_{t-1} + (\beta_0 - \hat\beta_n)'(1/n)\sum (u_t x_{t-1} + u_{t-1} x_t) + (\beta_0 - \hat\beta_n)' \left[ (1/n)\sum x_t x_{t-1}' \right] (\beta_0 - \hat\beta_n)$$
$$= (1/n)\sum u_t u_{t-1} + o_p(1) = (1/n)\sum (\epsilon_t + \rho_0 u_{t-1}) u_{t-1} + o_p(1) \to_p \rho_0 \text{Var}(u_t).$$

Similarly, we can show that $(1/n)\sum \hat u_{t-1}^2 \to_p \text{Var}(u_t)$; hence $\hat\rho \to_p \rho_0$. Still using similar methods, we can show that $(1/\sqrt{n})\sum \hat u_t \hat u_{t-1} = (1/\sqrt{n})\sum u_t u_{t-1} + o_p(1)$; hence

$$\sqrt{n}(\hat\rho - \rho_0) \to_d N(0, 1 - \rho_0^2).$$

Finally, the FGLS estimator of $\beta_0$ based on $V(\hat\rho)$ has the same limit distribution as the GLS estimator based on $V(\rho_0)$ (see Hamilton).
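The two-step FGLS procedure for AR(1) errors can be sketched as follows (ours, with illustrative values and `numpy` assumed): run OLS, estimate $\rho$ from the residuals, then quasi-difference the data and rerun OLS (a Cochrane-Orcutt-style implementation that drops the first observation rather than weighting it).

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho0 = 10_000, 0.8
x = rng.normal(size=n)
eps = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                 # AR(1) errors: u_t = rho0 u_{t-1} + eps_t
    u[t] = rho0 * u[t - 1] + eps[t]
X = np.column_stack([np.ones(n), x])
beta0 = np.array([0.5, 1.5])
Y = X @ beta0 + u

# Step 1: OLS, then estimate rho from the OLS residuals.
b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ b_ols
rho_hat = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])

# Step 2: quasi-difference and rerun OLS on the transformed data.
Xs = X[1:] - rho_hat * X[:-1]
Ys = Y[1:] - rho_hat * Y[:-1]
b_fgls = np.linalg.lstsq(Xs, Ys, rcond=None)[0]
```

After the transformation, the errors of the transformed regression are (approximately) i.i.d., so OLS on $(Y^*, X^*)$ is the FGLS estimator.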
1.7 Statistical inference with LS estimation

Some commonly used test statistics for LS estimators are $t$ statistics and $F$ statistics. A $t$ statistic is used to test a hypothesis about a single parameter, say $\beta_i = c$. For simplicity, we assume that $c = 0$, so we use the $t$ statistic to test whether a variable is significant. The $t$ statistic is defined as the ratio $\hat\beta_i/\text{sd}(\hat\beta_i)$. Let the estimate of the variance of $\hat\beta$ be denoted $s^2\hat W$; then the standard deviation of $\hat\beta_i$ is the product of $s$ and the square root of the $i$th element on the diagonal of $\hat W$, i.e.,

$$t = \frac{\hat\beta_i}{\sqrt{s^2 \hat w_{ii}}}. \tag{9}$$

Recall that if $X/\sigma \sim N(0,1)$, $Y^2/\sigma^2 \sim \chi^2(m)$, and $X$ and $Y$ are independent, then $t = X\sqrt{m}/Y$ follows an exact Student $t$ distribution with $m$ degrees of freedom.

An $F$ statistic is used to test a hypothesis of $m$ different linear restrictions on $\beta$, say $H_0: R\beta = r$, where $R$ is an $m \times k$ matrix. The $F$ statistic is then defined as

$$F = (R\hat\beta - r)' \left[ \text{Var}(R\hat\beta - r) \right]^{-1} (R\hat\beta - r). \tag{10}$$

This is a Wald statistic. To derive the distribution of the statistic, we need the following result.

Proposition 2. If a $k \times 1$ vector $X \sim N(\mu, \Sigma)$, then $(X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(k)$.

Also recall that an exact $F(m, n)$ distribution is defined by

$$F(m, n) = \frac{\chi^2(m)/m}{\chi^2(n)/n}.$$

Under Assumption 1, $\hat W = (X'X)^{-1}$, and under the null hypothesis $\hat\beta_i \sim N(0, \sigma^2 w_{ii})$. We can then write

$$t = \frac{\hat\beta_i/\sqrt{\sigma^2 w_{ii}}}{\sqrt{s^2/\sigma^2}}.$$

Since the numerator is $N(0,1)$, the denominator is the square root of a $\chi^2(n-k)$ variable divided by $n-k$ (since $RSS/\sigma^2 \sim \chi^2(n-k)$), and the numerator and denominator are independent, the $t$ statistic (9) under Assumption 1 follows an exact $t(n-k)$ distribution.

With Assumption 1 and under the null hypothesis, we have $R\hat\beta - r \sim N(0, \sigma^2 R(X'X)^{-1}R')$; then by Proposition 2, the statistic

$$W_0 = (R\hat\beta - r)' \left[ \sigma^2 R(X'X)^{-1}R' \right]^{-1} (R\hat\beta - r) \sim \chi^2(m)$$

under $H_0$. If we replace $\sigma^2$ with $s^2$ and divide by the number of restrictions $m$, we get the OLS $F$ test of a linear hypothesis:

$$F = (R\hat\beta - r)' \left[ s^2 R(X'X)^{-1}R' \right]^{-1} (R\hat\beta - r)/m = \frac{W_0/m}{(RSS/\sigma^2)/(n-k)},$$
so $F$ follows an exact $F(m, n-k)$ distribution. An alternative way to express the $F$ statistic is to compute the estimator without the restriction, $\hat\beta$, and its associated sum of squared residuals $RSS_u$, and the estimator with the restriction, $\tilde\beta$, and its associated sum of squared residuals $RSS_r$; then we can write

$$F = \frac{(RSS_r - RSS_u)/m}{RSS_u/(n-k)}.$$

Now, with Assumption 2, $X$ is stochastic, $\hat\beta$ is normal conditional on $X$, and $RSS/\sigma^2 \sim \chi^2(n-k)$ conditional on $X$. This conditional distribution of $RSS$ is the same for all $X$; therefore, the unconditional distribution of $RSS$ is the same as the conditional distribution. The same is true for the $t$ and $F$ statistics. Therefore, we have the same results under Assumption 2 as under Assumption 1.

From Case 3 on, we no longer have an exact distribution for the estimator, and we have to derive its asymptotic distribution, so we also use asymptotic distributions for the test statistics. Write

$$t = \frac{\hat\beta_i}{s\sqrt{\hat w_{ii}}} = \frac{\sqrt{n}\,\hat\beta_i}{s\sqrt{n\hat w_{ii}}},$$

where $\hat w_{ii}$ is the $i$th diagonal element of $(X'X)^{-1}$, so that $n\hat w_{ii} \to_p q^{ii}$, the $i$th diagonal element of $\hat\beta$'s asymptotic variance $Q^{-1}$. Under the null, $\sqrt{n}\,\hat\beta_i \to_d N(0, \sigma^2 q^{ii})$. Recall that under Assumption 3, $s \to_p \sigma$; therefore $t \to_d N(0,1)$. Next, write

$$F = (R\hat\beta - r)' \left[ s^2 R(X'X)^{-1}R' \right]^{-1} (R\hat\beta - r)/m = \sqrt{n}(R\hat\beta - r)' \left[ s^2 R(X'X/n)^{-1}R' \right]^{-1} \sqrt{n}(R\hat\beta - r)/m.$$

Now $s^2 \to_p \sigma^2$, $X'X/n \to_p Q$, and under the null,

$$\sqrt{n}(R\hat\beta - r) = \sqrt{n}\,R(\hat\beta - \beta_0) \to_d N(0, \sigma^2 RQ^{-1}R').$$

Then by Proposition 2, we have $mF \to_d \chi^2(m)$.

We can use similar methods to derive the distributions in the other cases. In general, if $\hat\beta \to_p \beta_0$ and is asymptotically normal, $s^2 \to_p \sigma^2$, and we have found a consistent estimate of the variance of $\hat\beta$, then the $t$ and $F$ statistics follow asymptotically normal and $\chi^2(m)$ distributions, respectively. Actually, under Assumption 1 or 2, when the sample size is large, we can also use the normal and $\chi^2$ distributions to approximate the exact $t$ and $F$ distributions. Further, since we are using the asymptotic distribution, the Wald test can also be used to test nonlinear restrictions.
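The two expressions for the $F$ statistic, the Wald form in (10) (with $s^2$) and the restricted-versus-unrestricted $RSS$ form, are algebraically identical for linear restrictions, which can be checked numerically (a sketch with illustrative values; `numpy` assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta0 = np.array([1.0, 0.0, 0.0])   # the last two coefficients are truly zero
Y = X @ beta0 + rng.normal(size=n)

b = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ b
RSS_u = resid @ resid
s2 = RSS_u / (n - k)

# H0: beta_2 = beta_3 = 0, i.e. R b = r with m = 2 restrictions.
R = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
r = np.zeros(2)
m = 2

# Wald form of the F statistic (equation (10) with s^2, divided by m).
XtX_inv = np.linalg.inv(X.T @ X)
d = R @ b - r
F_wald = d @ np.linalg.inv(s2 * R @ XtX_inv @ R.T) @ d / m

# RSS form: restricted regression here is intercept-only.
RSS_r = ((Y - Y.mean()) ** 2).sum()
F_rss = ((RSS_r - RSS_u) / m) / (RSS_u / (n - k))
```

The two computations agree to machine precision; under $H_0$ either one follows an exact $F(m, n-k)$ distribution when the errors are Gaussian.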
2 Maximum Likelihood Estimation

2.1 Review: the maximum likelihood principle and the Cramér-Rao lower bound

The basic idea of the maximum likelihood principle is to choose the parameter estimates that maximize the probability of obtaining the observed sample. Suppose that we observe a sample $X = (x_1, x_2, \ldots, x_n)$ and assume that the sample is drawn from an i.i.d. distribution whose associated parameters are denoted $\theta$. Let $p(x_t; \theta)$ denote the pdf of the $t$th observation. For example, when $x_t \sim \text{i.i.d. } N(\mu, \sigma^2)$, then $\theta = (\mu, \sigma^2)$ and

$$p(x_t; \theta) = (2\pi\sigma^2)^{-1/2} \exp\left[ -\frac{(x_t - \mu)^2}{2\sigma^2} \right].$$

The likelihood function for the whole sample $X$ is

$$L(X; \theta) = \prod_{t=1}^n p(x_t; \theta),$$

and the log likelihood function is

$$l(X; \theta) = \sum_{t=1}^n \log p(x_t; \theta).$$

The maximum likelihood estimates of $\theta$ are chosen so that $l(X; \theta)$ is maximized.

Define the score function $S(\theta) = \partial l(\theta)/\partial\theta$ and the Hessian matrix $H(\theta) = \partial^2 l(\theta)/\partial\theta\partial\theta'$. The famous Cramér-Rao inequality tells us that the lower bound for the variance of an unbiased estimator of $\theta$ is the inverse of the information matrix $I(\theta_0) = E[S(\theta_0)S(\theta_0)']$, where $\theta_0$ denotes the true value of the parameter. An estimator whose variance equals this bound is known as efficient. Under some regularity conditions, which are satisfied for the Gaussian density, we have the equality

$$I(\theta) = -E[H(\theta)] = -E\left[ \frac{\partial^2 l(\theta)}{\partial\theta\partial\theta'} \right].$$

So, if we find an unbiased estimator whose variance achieves the Cramér-Rao lower bound, then we know that this estimator is efficient: no other unbiased estimator (linear or nonlinear) could have a smaller variance. However, this lower bound is not always achievable. If an estimator does achieve this bound, then it is identical to the MLE. Note that the Cramér-Rao inequality holds for unbiased estimators, while ML estimators are sometimes biased. If an estimator is biased but consistent, and its variance approaches the Cramér-Rao bound asymptotically, then it is known as asymptotically efficient.

Example 1 (MLE for an i.i.d.
Gaussian distribution). Let $x_t \sim \text{i.i.d. } N(\mu, \sigma^2)$, so the parameter is $\theta = (\mu, \sigma^2)$. Then we have

$$p(x_t; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x_t - \mu)^2}{2\sigma^2} \right\},$$

$$l(X; \theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^n (x_t - \mu)^2,$$
$$S(X; \mu) = \frac{\partial l(X; \theta)}{\partial\mu} = \frac{1}{\sigma^2}\sum_{t=1}^n (x_t - \mu), \qquad S(X; \sigma^2) = \frac{\partial l(X; \theta)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^n (x_t - \mu)^2.$$

Setting the score functions to zero, we find the MLEs of $\theta$:

$$\hat\mu = \bar X_n, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{t=1}^n (x_t - \hat\mu)^2.$$

It is easy to verify that $E(\hat\mu) = E(\bar X_n) = \mu$, so $\hat\mu$ is unbiased, and its variance is $\text{Var}(\hat\mu) = \sigma^2/n$, while

$$E(\hat\sigma^2) = \frac{1}{n}\sum_{t=1}^n E(x_t - \hat\mu)^2 = E\left[ (x_t - \mu) + (\mu - \hat\mu) \right]^2 = \sigma^2 - \frac{2}{n}\sigma^2 + \frac{1}{n}\sigma^2 = \frac{n-1}{n}\sigma^2,$$

so $\hat\sigma^2$ is biased, but it is consistent, as $\hat\sigma^2 \to \sigma^2$ when $n \to \infty$. Define $s^2 = \frac{1}{n-1}\sum_{t=1}^n (x_t - \hat\mu)^2$; then $E(s^2) = \sigma^2$ and $\text{Var}(s^2) = 2\sigma^4/(n-1)$.

We can further compute the Hessian matrix,

$$H(X; \theta) = \begin{pmatrix} \dfrac{\partial^2 l}{\partial\mu^2} & \dfrac{\partial^2 l}{\partial\mu\partial\sigma^2} \\ \dfrac{\partial^2 l}{\partial\sigma^2\partial\mu} & \dfrac{\partial^2 l}{\partial(\sigma^2)^2} \end{pmatrix},$$

where

$$\frac{\partial^2 l}{\partial\mu^2} = -\frac{n}{\sigma^2}, \qquad \frac{\partial^2 l}{\partial\mu\partial\sigma^2} = \frac{\partial^2 l}{\partial\sigma^2\partial\mu} = -\frac{1}{\sigma^4}\sum_{t=1}^n (x_t - \mu), \qquad \frac{\partial^2 l}{\partial(\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{t=1}^n (x_t - \mu)^2.$$

We can also compute that $|H(X; \theta)|$ evaluated at $\theta = \hat\theta$ equals $n^2/(2\hat\sigma^6) > 0$, which (together with the negative diagonal entries) shows that we have found a maximum (not a minimum) of the likelihood function. Next, compute the information matrix, using $E_\theta(x_t - \mu) = 0$ and $E_\theta(x_t - \mu)^2 = \sigma^2$;
therefore the information matrix is

$$I(\theta) = -E[H(X; \theta)] = \begin{pmatrix} \dfrac{n}{\sigma^2} & 0 \\ 0 & \dfrac{n}{2\sigma^4} \end{pmatrix}.$$

So the MLE of $\mu$ has achieved the Cramér-Rao lower bound of the variance, $\sigma^2/n$. Although $s^2$ does not achieve the lower bound, it turns out that it is still the unbiased estimator of $\sigma^2$ with minimum variance.

2.2 Asymptotic Normality of MLE

There are a few regularity conditions to ensure that the MLE is consistent. First, we assume that the data are strictly stationary and ergodic (for example, i.i.d.). Second, we assume that the parameter space $\Theta$ is convex and that neither the estimate $\hat\theta$ nor the true parameter $\theta_0$ lies on the boundary of $\Theta$. Third, we require that the likelihood function evaluated at any $\hat\theta \neq \theta_0$ in $\Theta$ differs from its value at $\theta_0$; this is known as the identification condition. Finally, we assume that $E\sup_{\theta\in\Theta} |l(X; \theta)| < \infty$. With all these conditions satisfied, the MLE is consistent: $\hat\theta \to_p \theta_0$.

Next, we discuss asymptotic results for the score function $S(X; \theta)$, the Hessian matrix $H(X; \theta)$, and the asymptotic distribution of the MLE $\hat\theta$. First, we want to show that $E[S(X; \theta_0)] = 0$ and $E[S(X; \theta_0)S(X; \theta_0)'] = -E[H(X; \theta_0)]$. Let the integral operator $\int \cdot \, dX$ denote integration over $(x_1, x_2, \ldots, x_n)$; then we have

$$\int L(X; \theta_0)\, dX = 1.$$

Taking the derivative with respect to $\theta$, we have

$$\int \frac{\partial L(X; \theta_0)}{\partial\theta}\, dX = 0.$$

Meanwhile, we can write

$$\int \frac{\partial L(X; \theta_0)}{\partial\theta}\, dX = \int \frac{1}{L(X; \theta_0)} \frac{\partial L(X; \theta_0)}{\partial\theta} L(X; \theta_0)\, dX = \int \frac{\partial l(X; \theta_0)}{\partial\theta} L(X; \theta_0)\, dX = E[S(X; \theta_0)].$$

So we know that $E[S(X; \theta_0)] = 0$. Next, differentiating the integral above (which equals zero) with respect to $\theta'$, we get

$$\int \frac{\partial l(X; \theta_0)}{\partial\theta} \frac{\partial L(X; \theta_0)}{\partial\theta'}\, dX + \int \frac{\partial^2 l(X; \theta_0)}{\partial\theta\partial\theta'} L(X; \theta_0)\, dX = 0.$$

The second term is just $E[H(X; \theta_0)]$. The first can be written as

$$\int \frac{\partial l(X; \theta_0)}{\partial\theta} \left( \frac{1}{L(X; \theta_0)} \frac{\partial L(X; \theta_0)}{\partial\theta'} \right) L(X; \theta_0)\, dX = \int \frac{\partial l(X; \theta_0)}{\partial\theta} \frac{\partial l(X; \theta_0)}{\partial\theta'} L(X; \theta_0)\, dX = E[S(X; \theta_0)S(X; \theta_0)'].$$
Now, since $E[S(X; \theta_0)S(X; \theta_0)'] + E[H(X; \theta_0)] = 0$, we have that $E[S(X; \theta_0)S(X; \theta_0)'] = -E[H(X; \theta_0)]$.

Next, define $s(x_t; \theta) = \partial\log p(x_t; \theta)/\partial\theta$; then we can write the score function as the sum of the $s(x_t; \theta)$, i.e., $S(X; \theta) = \sum_{t=1}^n s(x_t; \theta)$. The $s(x_t; \theta)$ are i.i.d., and we can show that $E[s(x_t; \theta_0)] = 0$ and $E[s(x_t; \theta_0)s(x_t; \theta_0)'] = -E[H(x_t; \theta_0)]$, where $H(x_t; \theta) = \partial^2\log p(x_t; \theta)/\partial\theta\partial\theta'$. Applying the Lindeberg-Lévy CLT, we obtain the asymptotic normality of the score function:

$$n^{-1/2} S(X; \theta_0) \to_d N(0, -E[H(x_t; \theta_0)]).$$

Next, we consider the properties of the Hessian matrix. First, we assume that $E[H(x_t; \theta_0)]$ is nonsingular. Let $N_\epsilon$ be a neighborhood of $\theta_0$, and assume that

$$E\sup_{\theta\in N_\epsilon} \|H(x_t; \theta)\| < \infty, \qquad (1/n)\sum_{t=1}^n H(x_t; \bar\theta_n) \to_p E[H(x_t; \theta_0)],$$

where $\bar\theta_n$ is any consistent estimator of $\theta_0$. Applying the LLN, we have

$$(1/n)H(X; \theta_0) = (1/n)\sum_{t=1}^n H(x_t; \theta_0) \to_p E[H(x_t; \theta_0)] \equiv -\Sigma.$$

With the notation $\Sigma$, we can write $n^{-1/2}S(X; \theta_0) \to_d N(0, \Sigma)$.

Proposition 3 (Asymptotic normality of the MLE). With all the conditions we have outlined above,

$$\sqrt{n}(\hat\theta - \theta_0) \to_d N(0, \Sigma^{-1}).$$

Proof: Do a Taylor expansion of $S(X; \hat\theta)$ around $\theta_0$,

$$0 = S(X; \hat\theta) \approx S(X; \theta_0) + H(X; \theta_0)(\hat\theta - \theta_0).$$

Therefore, we have

$$\sqrt{n}(\hat\theta - \theta_0) = -\left[ (1/n)H(X; \theta_0) \right]^{-1} n^{-1/2}S(X; \theta_0) \to_d N(0, \Sigma^{-1}\Sigma\Sigma^{-1}) = N(0, \Sigma^{-1}). \quad \square$$

Note that $\Sigma = -E[H(x_t; \theta_0)]$ is the information matrix for one observation, so the asymptotic distribution of $\hat\theta$ can be written as $\hat\theta \approx N(\theta_0, I(\theta_0)^{-1})$. However, $I(\theta_0)$ depends on $\theta_0$, which is unknown, so we need to find a consistent estimator for it, denoted $\hat V$. There are two methods to compute this variance matrix of $\hat\theta$. One way is that
we compute the Hessian matrix and evaluate it at $\theta = \hat\theta$, i.e., $\hat V = -(1/n)H(X; \hat\theta)$. The second way is to use the outer product estimate, which is

$$\hat V = (1/n)\sum_{t=1}^n s(x_t; \hat\theta)s(x_t; \hat\theta)'.$$

2.3 Statistical Inference for MLE

There are three asymptotically equivalent tests for MLE: the likelihood ratio (LR) test, the Wald test, and the Lagrange multiplier (LM) test or score test. You can probably find discussions of these three tests in any graduate textbook in econometrics, so we only describe them briefly here.

The likelihood ratio test is based on the difference between the likelihood you computed (maximized) with and without the restriction. Let $l_u$ denote the likelihood without the restriction and $l_r$ the likelihood with the restriction (note that $l_r \leq l_u$). If the restriction is valid, then we expect $l_r$ not to be too much lower than $l_u$. Therefore, to test whether the restriction is valid, the statistic we compute is $2(l_u - l_r)$, which follows a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions imposed.

To do an LR test, we have to compute the likelihood under both the restricted and unrestricted specifications. In comparison, the other two tests use only either the estimator without the restriction (denoted $\hat\theta$) or the estimator with the restriction (denoted $\tilde\theta$). Let the restriction be $H_0: R(\theta) = r$. The idea of the Wald test is that if this restriction is valid, then the estimator obtained without the restriction, $\hat\theta$, will make $R(\hat\theta) - r$ close to zero. Therefore, the Wald statistic is

$$W = (R(\hat\theta) - r)' \left[ \text{Var}(R(\hat\theta) - r) \right]^{-1} (R(\hat\theta) - r),$$

which also follows a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions imposed.

To find the ML estimator, we set the score function equal to zero and solve for the estimator, i.e., $S(\hat\theta) = 0$. If the restriction is valid, and the estimator obtained with the restriction is $\tilde\theta$, then we expect $S(\tilde\theta)$ to be close to zero. This idea leads to the LM test or score test.
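The three tests can be illustrated on the simplest possible example (ours, with illustrative values and `numpy` assumed): testing $H_0: \mu = 0$ in an i.i.d. $N(\mu, \sigma^2)$ sample, where all three statistics have closed forms built from the restricted and unrestricted variance estimates.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000
x = rng.normal(0.0, 2.0, n)          # data generated under H0: mu = 0

mu_hat = x.mean()                    # unrestricted MLE of mu
s2_u = ((x - mu_hat) ** 2).mean()    # unrestricted MLE of sigma^2
s2_r = (x ** 2).mean()               # restricted MLE of sigma^2 (mu fixed at 0)

# LR: twice the log-likelihood difference reduces to n*log(s2_r/s2_u).
LR = n * np.log(s2_r / s2_u)
# Wald: built from the unrestricted estimates only.
W = n * mu_hat ** 2 / s2_u
# LM/score: built from the restricted estimates only.
LM = n * mu_hat ** 2 / s2_r
```

All three are asymptotically $\chi^2(1)$ under $H_0$; in this Gaussian-mean example they also satisfy the classical ordering $W \geq LR \geq LM$ in every finite sample.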
The LM statistic is

$$LM = S(\tilde\theta)' I(\tilde\theta)^{-1} S(\tilde\theta),$$

which also follows a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions imposed.

2.4 LS and MLE

In a regression $Y = X\beta_0 + U$ where $U \mid X \sim N(0, \sigma^2 I_n)$ (as in Assumption 2), the conditional density of $Y$ given $X$ is

$$f(Y \mid X; \theta) = (2\pi\sigma^2)^{-n/2} \exp\left[ -\frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta) \right].$$

The log likelihood function is

$$l(Y \mid X; \theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta).$$
Note that the $\hat\beta$ that maximizes $l$ is the vector that minimizes the sum of squares; therefore, under Assumption 2, the OLS estimator is equivalent to the ML estimator of $\beta_0$. It can be shown that this estimator is unbiased and achieves the Cramér-Rao lower bound; therefore, under Assumption 2, the OLS/ML estimator is efficient (compared to all unbiased estimators, linear or nonlinear). Recall that under Assumption 1 we have the Gauss-Markov theorem to show that the OLS estimator is the best linear unbiased estimator; now the Cramér-Rao inequality establishes the optimality of the OLS estimator under Assumption 2. The ML estimator of $\sigma^2$ is $(Y - X\hat\beta)'(Y - X\hat\beta)/n$. We introduced this estimator a moment ago, and we showed that the difference between $\hat\sigma^2$ and the OLS estimator $s^2$ becomes arbitrarily small as $n \to \infty$.

Next, consider Assumption 5, where $U \mid X \sim N(0, \sigma^2 V)$ and $V$ is known. Then the log likelihood function, omitting constant terms, is

$$l(Y \mid X, \beta) = -\frac{1}{2}\log|V| - \frac{1}{2}(Y - X\beta)'V^{-1}(Y - X\beta).$$

The ML estimator is $\hat\beta = (X'V^{-1}X)^{-1}X'V^{-1}Y$, which is equivalent to the GLS estimator. The score vector is $S(\beta) = (Y - X\beta)'V^{-1}X$, and the Hessian matrix is $H(\beta) = -X'V^{-1}X$; therefore, the information matrix is $I(\beta) = X'V^{-1}X$. The GLS/ML estimator is efficient, as it achieves the Cramér-Rao lower bound $(X'V^{-1}X)^{-1}$. When $V$ is unknown, we can parameterize it as $V(\psi)$, say, and maximize the likelihood

$$l(Y \mid X, \beta, \psi) = -\frac{1}{2}\log|V(\psi)| - \frac{1}{2}(Y - X\beta)'V(\psi)^{-1}(Y - X\beta).$$

2.5 Example: MLE in autoregressive estimation

In Hamilton's book, you can find many detailed discussions of MLE for ARMA models in Chapter 5. We will take an AR(1) model as an example. Consider an AR(1) model, $x_t = c + \beta x_{t-1} + u_t$, where $u_t \sim \text{i.i.d. } N(0, \sigma^2)$. Let $\theta = (c, \beta, \sigma^2)$ and let the sample size be denoted $n$. There are two ways to construct the likelihood function, and the difference lies in how to treat the initial observation $x_1$. If we let $x_1$ be random, we know that the unconditional distribution of $x_t$ is $N(c/(1-\beta), \sigma^2/(1-\beta^2))$, and this will lead to an exact likelihood function.
Alternatively, we can assume that $x_1$ is observable (known), and this will lead to a conditional likelihood function. We first consider the exact likelihood function. We know that

$$p(x_1; \theta) = \left( \frac{2\pi\sigma^2}{1-\beta^2} \right)^{-1/2} \exp\left[ -\frac{(x_1 - c/(1-\beta))^2}{2\sigma^2/(1-\beta^2)} \right].$$

Conditional on $x_1$, the conditional distribution of $x_2$ is $N(c + \beta x_1, \sigma^2)$, so the conditional probability density of the second observation is

$$p(x_2 \mid x_1; \theta) = (2\pi\sigma^2)^{-1/2} \exp\left[ -\frac{(x_2 - c - \beta x_1)^2}{2\sigma^2} \right].$$

So the joint probability density of $(x_1, x_2)$ is $p(x_1, x_2; \theta) = p(x_2 \mid x_1; \theta)\,p(x_1; \theta)$.
Similarly, the probability density of the $t$th observation conditional on $x_{t-1}$ is

$$p(x_t \mid x_{t-1}; \theta) = (2\pi\sigma^2)^{-1/2} \exp\left[ -\frac{(x_t - c - \beta x_{t-1})^2}{2\sigma^2} \right],$$

and the density of the joint observations $X = (x_1, x_2, \ldots, x_n)$ is

$$L(X; \theta) = p(x_1; \theta) \prod_{t=2}^n p(x_t \mid x_{t-1}; \theta).$$

Taking logs, we get the exact log likelihood function (omitting constant terms for simplicity)

$$l(X; \theta) = -\frac{1}{2}\log\left( \frac{\sigma^2}{1-\beta^2} \right) - \frac{(x_1 - c/(1-\beta))^2}{2\sigma^2/(1-\beta^2)} - \frac{n-1}{2}\log(\sigma^2) - \sum_{t=2}^n \frac{(x_t - c - \beta x_{t-1})^2}{2\sigma^2}. \tag{11}$$

Next, to construct the conditional likelihood, assume that $x_1$ is observable; then the log likelihood function is (again, constant terms are omitted)

$$l(X; \theta) = -\frac{n-1}{2}\log(\sigma^2) - \sum_{t=2}^n \frac{(x_t - c - \beta x_{t-1})^2}{2\sigma^2}. \tag{12}$$

The maximum likelihood estimates $\hat c$ and $\hat\beta$ are obtained by maximizing (12), or by solving the score equations. Note that maximizing (12) with respect to $(c, \beta)$ is equivalent to minimizing $\sum_{t=2}^n (x_t - c - \beta x_{t-1})^2$, which is the objective function in OLS. Compared to the exact likelihood function, we see that the conditional likelihood function is much easier to work with. Actually, when the sample size is large, the first observation becomes negligible in the total likelihood: when $|\beta| < 1$, the estimator computed from the exact likelihood and the estimator from the conditional likelihood are asymptotically equivalent.

Finally, if the residual is not Gaussian, and if we estimate the parameters using the conditional Gaussian likelihood as in (12), then the estimate we obtain is known as a quasi-maximum likelihood estimate (QMLE). QMLE is also very frequently used in empirical estimation. Although we have misspecified the density function, in many cases the QMLE is still consistent. For instance, in an AR(p) process, if the sample second moments converge to the population second moments, then the QMLE using (12) is consistent, whether or not the errors are Gaussian. However, standard errors for the estimated coefficients that are computed with the Gaussian assumption need not be correct if the true data are not Gaussian (White, 1982).

3 Model Selection

In the discussion of estimation above, we assumed that the order of the lags is known.
However, in empirical estimation, we have to choose a proper order. A larger number of lags (parameters) will increase the fit of the model; therefore, we need some criterion to balance the goodness of fit and model parsimony. There are three commonly used criteria: the Akaike information criterion (AIC), Schwarz's Bayesian information criterion (BIC), and the posterior information criterion (PIC) developed by Phillips (1996). In all of these criteria, we specify a maximum order $k_{\max}$ and then choose $\hat k$ to minimize a criterion function:

$$AIC(k) = \log\left( \frac{SSR_k}{n} \right) + \frac{2k}{n}, \tag{13}$$

where $n$ is the sample size, $k = 1, 2, \ldots, k_{\max}$ is the number of parameters in the model, and $SSR_k$ is the sum of squared residuals from the fitted model. When $k$ increases, the fit increases, so $SSR_k$ decreases, but the second term increases; this captures the trade-off between fit and parsimony. Since the model is estimated using different numbers of lags, the usable sample size also varies: we can either use the varying sample size $n - k$ or a fixed sample size $n - k_{\max}$. Ng and Perron (2000) have recommended using the fixed sample size and replacing $n$ by it in the criterion. However, the AIC rule is not consistent and tends to overfit the model by choosing a larger $k$. With all other issues as in the AIC rule, the BIC rule imposes a larger penalty on the number of parameters:

$$BIC(k) = \log\left( \frac{SSR_k}{n} \right) + \frac{k\log(n)}{n}. \tag{14}$$

BIC suggests a smaller $k$ than AIC, and the BIC rule is consistent for stationary data, i.e., $\lim_{n\to\infty} \hat k_{BIC} = k$. Further, Hannan and Deistler (1988) have shown that $\hat k_{BIC}$ is consistent when we set $k_{\max} = [c\log(n)]$ (the integer part of $c\log(n)$) for any $c > 0$; therefore, we can estimate $\hat k_{BIC}$ consistently without knowing an upper bound on $k$.

Finally, to present the PIC criterion, let $K = k_{\max}$, and let $X(K)$ and $X(k)$ denote the regressor matrices with $K$ and $k$ parameters respectively; similarly for $\beta$, the parameter vector. Write

$$Y = X(K)\beta(K) + \text{error} = X(k)\beta(k) + X(\bar k)\beta(\bar k) + \text{error},$$

where $X(\bar k)$ collects the $K - k$ excluded regressors. Define

$$A(\bar k) = X(\bar k)'X(\bar k), \qquad A(k) = X(k)'X(k), \qquad A(\bar k, k) = X(\bar k)'X(k),$$
$$\bar A(\bar k) = A(\bar k) - A(\bar k, k)A(k)^{-1}A(k, \bar k),$$
$$\hat\beta(\bar k) = \left[ X(\bar k)'X(\bar k) - X(\bar k)'X(k)(X(k)'X(k))^{-1}X(k)'X(\bar k) \right]^{-1} \left[ X(\bar k)'Y - X(\bar k)'X(k)(X(k)'X(k))^{-1}X(k)'Y \right],$$
$$\hat\sigma_K^2 = SSR_K/(n - K);$$

then

$$PIC = \left| \bar A(\bar k)/\hat\sigma_K^2 \right|^{1/2} \exp\left\{ -\frac{1}{2\hat\sigma_K^2} \hat\beta(\bar k)'\bar A(\bar k)\hat\beta(\bar k) \right\}.$$
PIC is asymptotically equivalent to the BIC criterion when the data is stationary, and when the data is nonstationary, PIC is still consistent. Reading: Hamilton, Ch. 5, 8.
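To illustrate the AIC and BIC rules above, here is a minimal sketch in Python/NumPy (not from the lecture; the `order_select` function and the simulated AR(2) process are invented for this example). It fits AR(k) models by OLS for k = 1, ..., k_max on the fixed common sample of size n − k_max, following the Ng and Perron (2000) recommendation, and returns the order minimizing each criterion.

```python
import numpy as np

def order_select(y, kmax):
    """Fit AR(k) by OLS for k = 1..kmax on the fixed common sample of
    size n = len(y) - kmax, and return the orders minimizing AIC and BIC."""
    T = len(y)
    n = T - kmax                      # fixed effective sample size
    crit = {}
    for k in range(1, kmax + 1):
        # lag matrix: column j holds y_{t-j} for t = kmax, ..., T-1
        X = np.column_stack([y[kmax - j:T - j] for j in range(1, k + 1)])
        yy = y[kmax:]
        beta = np.linalg.lstsq(X, yy, rcond=None)[0]
        ssr = np.sum((yy - X @ beta) ** 2)
        aic = np.log(ssr / n) + 2 * k / n
        bic = np.log(ssr / n) + k * np.log(n) / n
        crit[k] = (aic, bic)
    k_aic = min(crit, key=lambda k: crit[k][0])
    k_bic = min(crit, key=lambda k: crit[k][1])
    return k_aic, k_bic

# simulate a stationary AR(2): y_t = 0.5 y_{t-1} - 0.3 y_{t-2} + u_t
rng = np.random.default_rng(0)
u = rng.standard_normal(500)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + u[t]

k_aic, k_bic = order_select(y, kmax=8)
print("AIC picks", k_aic, "BIC picks", k_bic)
```

Because the BIC penalty k log(n)/n exceeds the AIC penalty 2k/n whenever log(n) > 2, the BIC choice can never exceed the AIC choice in this setup, consistent with the text's remark that BIC suggests a smaller k.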