Lecture 5: Linear Regressions


Copyright by Ling Hu.

In Lecture 2, we introduced stationary linear time series models. In that lecture, we discussed the data generating processes and their characteristics, assuming that we know all of the parameters (the autoregressive or moving average coefficients). In empirical studies, however, we have to specify an econometric model, estimate this model, and draw inferences based on the estimates. This lecture provides an introduction to parametric estimation of a linear model with time series observations. Three commonly used estimation methods are least squares estimation (LS), maximum likelihood estimation (MLE), and the general method of moments (GMM). In this lecture we discuss LS and MLE.

1 Least Squares Estimation

Least squares (LS) estimation is one of the first techniques we learn in econometrics. It is both intuitive and easy to implement, and the famous Gauss-Markov theorem tells us that under certain assumptions the ordinary least squares (OLS) estimator is the best linear unbiased estimator (BLUE). We start with a review of classical LS estimation and then consider estimation under relaxed assumptions. Below are the notation used in this lecture and the basic algebra of LS estimation. Consider the regression

$y_t = x_t'\beta_0 + u_t, \quad t = 1, \ldots, n,$  (1)

where $x_t$ is a $k \times 1$ vector and $\beta_0$, also a $k \times 1$ vector, is the true parameter. Then the OLS estimator of $\beta_0$, denoted by $\hat\beta_n$, is

$\hat\beta_n = \left(\sum_{t=1}^n x_t x_t'\right)^{-1} \sum_{t=1}^n x_t y_t,$  (2)

and the OLS sample residual is $\hat u_t = y_t - x_t'\hat\beta_n$.
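As a concrete illustration of formula (2), here is a minimal Python sketch (our own; the simulated design, sample size, and parameter values are assumptions made for the example, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
beta0 = np.array([1.0, 0.5])                            # assumed true parameter (k = 2)
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # rows are x_t' = (1, z_t)
u = rng.normal(size=n)                                  # i.i.d. errors
y = X @ beta0 + u

# OLS estimator from equation (2): (sum x_t x_t')^{-1} sum x_t y_t
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat                                # OLS residuals
print(beta_hat)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse $(X'X)^{-1}$, which is the numerically preferred way to compute the same estimator.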

Sometimes it is more convenient to work in matrix form. Define

$Y_n = (y_1, y_2, \ldots, y_n)', \quad X_n = (x_1, x_2, \ldots, x_n)', \quad U_n = (u_1, u_2, \ldots, u_n)'.$

Then the regression can be written as

$Y_n = X_n\beta_0 + U_n,$  (3)

and the OLS estimator can be written as

$\hat\beta_n = (X_n'X_n)^{-1}X_n'Y_n.$  (4)

Define $M_X = I_n - X_n(X_n'X_n)^{-1}X_n'$. It is easy to see that $M_X$ is symmetric, idempotent ($M_XM_X = M_X$), and orthogonal to the columns of $X_n$. Then we have

$\hat U_n = Y_n - X_n\hat\beta_n = M_XY_n.$

To derive the distribution of the estimator $\hat\beta_n$, write

$\hat\beta_n = (X_n'X_n)^{-1}X_n'Y_n = (X_n'X_n)^{-1}X_n'(X_n\beta_0 + U_n) = \beta_0 + (X_n'X_n)^{-1}X_n'U_n.$  (5)

Therefore, the properties of $\hat\beta_n$ depend on $(X_n'X_n)^{-1}X_n'U_n$. For example, if $E[(X_n'X_n)^{-1}X_n'U_n] = 0$, then $\hat\beta_n$ is an unbiased estimator.

1.1 Case 1: OLS with deterministic regressors and i.i.d. Gaussian errors

Assumption 1. (a) $x_t$ is deterministic; (b) $u_t \sim$ i.i.d.$(0, \sigma^2)$; (c) $u_t \sim$ i.i.d. $N(0, \sigma^2)$.

Under assumptions 1(a) and 1(b), $E(U_n) = 0$ and $E(U_nU_n') = \sigma^2 I_n$. Then from (5) we have

$E(\hat\beta_n) = \beta_0 + (X_n'X_n)^{-1}X_n'E(U_n) = \beta_0$

and

$E(\hat\beta_n - \beta_0)(\hat\beta_n - \beta_0)' = E[(X_n'X_n)^{-1}X_n'U_nU_n'X_n(X_n'X_n)^{-1}] = (X_n'X_n)^{-1}X_n'E(U_nU_n')X_n(X_n'X_n)^{-1} = \sigma^2(X_n'X_n)^{-1}.$

Under these assumptions, the Gauss-Markov theorem tells us that the OLS estimator $\hat\beta_n$ is the best linear unbiased estimator of $\beta_0$. The OLS estimator of $\sigma^2$ is

$s^2 = \hat U_n'\hat U_n/(n-k) = U_n'M_X'M_XU_n/(n-k) = U_n'M_XU_n/(n-k).$  (6)

Since $M_X$ is symmetric, there exists an $n \times n$ matrix $P$ such that $M_X = P\Lambda P'$ and $P'P = I_n$, where $\Lambda$ is an $n \times n$ matrix with the eigenvalues of $M_X$ along the principal diagonal and zeros elsewhere. From the properties of $M_X$ we can compute that $\Lambda$ contains $k$ zeros and $n-k$ ones along its principal diagonal. Then

$RSS = U_n'M_XU_n = U_n'P\Lambda P'U_n = (P'U_n)'\Lambda(P'U_n) = W'\Lambda W = \sum_{t=1}^n \lambda_t w_t^2,$

where $W = P'U_n$. Then $E(WW') = P'E(U_nU_n')P = \sigma^2 I_n$; therefore the $w_t$ are uncorrelated with mean 0 and variance $\sigma^2$. Therefore,

$E(U_n'M_XU_n) = \sum_{t=1}^n \lambda_t E(w_t^2) = (n-k)\sigma^2.$

So the $s^2$ defined in (6) is an unbiased estimator of $\sigma^2$: $E(s^2) = \sigma^2$. With the Gaussian assumption 1(c), $\hat\beta_n$ is also Gaussian,

$\hat\beta_n \sim N(\beta_0, \sigma^2(X_n'X_n)^{-1}).$

Note that here $\hat\beta_n$ is exactly normal, while many of the estimators in our later discussions are only asymptotically normal. In fact, under Assumption 1 the OLS estimator is optimal. Also, with the Gaussian assumption, $w_t \sim$ i.i.d. $N(0, \sigma^2)$; therefore we have $U_n'M_XU_n/\sigma^2 \sim \chi^2(n-k)$.

1.2 Case 2: OLS with stochastic regressors and i.i.d. Gaussian errors

The assumption of deterministic regressors is very strong for empirical studies in economics. Some examples of deterministic regressors are constants and deterministic trends (i.e. $x_t = (1, t, t^2, \ldots)'$). However, most data we have for econometric regressions are stochastic. Therefore, from this subsection on we allow the regressors to be stochastic. However, in Cases 2 and 3 we assume that $x_t$ is independent of the errors at all leads and lags. This is still too strong in time series, as it rules out many processes, including ARMA models.

Assumption 2. (a) $x_t$ is stochastic and independent of $u_s$ for all $t, s$; (b) $u_t \sim$ i.i.d. $N(0, \sigma^2)$.

This assumption can be written equivalently as $U_n \mid X_n \sim N(0, \sigma^2 I_n)$. Under these assumptions, $\hat\beta_n$ is still unbiased:

$E(\hat\beta_n) = \beta_0 + E[(X_n'X_n)^{-1}X_n']E(U_n) = \beta_0.$

Conditional on $X_n$, $\hat\beta_n$ is normal: $\hat\beta_n \mid X_n \sim N(\beta_0, \sigma^2(X_n'X_n)^{-1})$. To get the unconditional probability distribution of $\hat\beta_n$, we have to integrate this conditional density over $X_n$; therefore the unconditional distribution of $\hat\beta_n$ will depend on the distribution of $X_n$. However, we still have the unconditional distribution for the estimate of the variance: $U_n'M_XU_n/\sigma^2 \sim \chi^2(n-k)$.

1.3 Case 3: OLS with stochastic regressors and i.i.d. non-Gaussian errors

Compared to Case 2, in this section we let the error terms follow an arbitrary i.i.d. distribution with finite fourth moments. Since this is an arbitrary unknown distribution, it is very hard to obtain the exact (finite sample) distribution of $\hat\beta_n$; instead, we apply asymptotic theory to this problem.

Assumption 3. (a) $x_t$ is stochastic and independent of $u_s$ for all $t, s$; (b) $u_t \sim$ i.i.d.$(0, \sigma^2)$, with $E(u_t^4) = \mu_4 < \infty$; (c) $E(x_tx_t') = Q_t$, a positive definite matrix, with $(1/n)\sum_{t=1}^n Q_t \to Q$, a positive definite matrix; (d) $E(x_{it}x_{jt}x_{kt}x_{lt}) < \infty$ for all $i, j, k, l$ and $t$; (e) $(1/n)\sum_{t=1}^n x_tx_t' \to_p Q$.

With assumption 3(a), we still have that $\hat\beta_n$ is an unbiased estimator of $\beta_0$. Assumptions 3(c) to 3(e) are restrictions on $x_t$; basically we want $(1/n)\sum_t x_tx_t' \to_p \lim_n (1/n)\sum_t E(x_tx_t')$. We have

$\hat\beta_n - \beta_0 = \left(\sum_{t=1}^n x_tx_t'\right)^{-1}\sum_{t=1}^n x_tu_t = \left[(1/n)\sum_{t=1}^n x_tx_t'\right]^{-1}(1/n)\sum_{t=1}^n x_tu_t.$

From the assumptions and the continuous mapping theorem, we have $[(1/n)\sum_t x_tx_t']^{-1} \to_p Q^{-1}$. $\{x_tu_t\}$ is a martingale difference sequence with finite variance, so by the LLN for mixingales we have $(1/n)\sum_t x_tu_t \to_p 0$. Therefore $\hat\beta_n \to_p \beta_0$, so $\hat\beta_n$ is a consistent estimator.

Next we derive its distribution. This is the first time we derive the asymptotic distribution of an OLS estimator. The routine for deriving the asymptotic distribution of $\hat\beta_n$ is as follows: first apply an LLN to the term $\sum_t x_tx_t'$, after proper norming (so that the limit is a constant), and then apply the continuous mapping theorem to get the limit of $(\sum_t x_tx_t')^{-1}$; we already obtained this in the above proof of consistency. Then apply a CLT to the term $\sum_t x_tu_t$, also after proper norming (so that the limit is nondegenerate). Note that $E(x_tx_t'u_t^2) = \sigma^2 Q_t$ and $(1/n)\sum_t \sigma^2 Q_t \to \sigma^2 Q$. By the CLT for mds, we have

$(1/\sqrt n)\sum_{t=1}^n x_tu_t \to_d N(0, \sigma^2 Q).$

Therefore,

$\sqrt n(\hat\beta_n - \beta_0) = \left[(1/n)\sum_t x_tx_t'\right]^{-1}(1/\sqrt n)\sum_t x_tu_t \to_d N(0, Q^{-1}(\sigma^2 Q)Q^{-1}) = N(0, \sigma^2 Q^{-1}),$

so $\hat\beta_n$ approximately follows $\hat\beta_n \sim N(\beta_0, \sigma^2 Q^{-1}/n)$. Note that this distribution is not exact but approximate, so we should read it as "approximately distributed as normal".
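A quick way to see this approximation at work is a small Monte Carlo. The sketch below (our own, with an assumed scalar design where $Q = E(x_t^2) = 1$ and uniform, hence non-Gaussian, errors) compares the simulated standard deviation of $\hat\beta_n$ across replications with the asymptotic value $\sigma/\sqrt n$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, beta0, sigma = 500, 2000, 0.8, 1.0
estimates = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)                       # stochastic regressor, Q = E(x_t^2) = 1
    u = sigma * (rng.uniform(size=n) - 0.5) * np.sqrt(12.0)  # i.i.d. uniform errors, var = sigma^2
    y = beta0 * x + u
    estimates[r] = (x @ y) / (x @ x)             # scalar OLS estimator

# Asymptotic standard deviation: sqrt(sigma^2 Q^{-1} / n) = sigma / sqrt(n)
print(estimates.std(), sigma / np.sqrt(n))
```

The two printed numbers should be close, even though the errors are far from Gaussian.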

To compute this variance, we need to know $\sigma^2$. When it is unknown, the OLS estimator $s^2$ is still consistent under Assumption 3. We have

$u_t^2 = (y_t - x_t'\beta_0)^2 = [(y_t - x_t'\hat\beta_n) + x_t'(\hat\beta_n - \beta_0)]^2 = (y_t - x_t'\hat\beta_n)^2 + 2(y_t - x_t'\hat\beta_n)x_t'(\hat\beta_n - \beta_0) + [x_t'(\hat\beta_n - \beta_0)]^2.$

By the LLN, we have $(1/n)\sum_t u_t^2 \to_p \sigma^2$. There are three terms on the right side of the above equation. For the second term, we have $(1/n)\sum_t (y_t - x_t'\hat\beta_n)x_t'(\hat\beta_n - \beta_0) = 0$, as $(y_t - x_t'\hat\beta_n)$ is orthogonal to $x_t$. For the third term, $(\hat\beta_n - \beta_0)'[(1/n)\sum_t x_tx_t'](\hat\beta_n - \beta_0) \to_p 0$, as $\hat\beta_n - \beta_0$ is $o_p(1)$ and $(1/n)\sum_t x_tx_t' \to_p Q$. Therefore, we can define

$\hat\sigma^2 = (1/n)\sum_{t=1}^n (y_t - x_t'\hat\beta_n)^2,$

and we have

$\hat\sigma^2 = (1/n)\sum_t u_t^2 - (1/n)\sum_t [x_t'(\hat\beta_n - \beta_0)]^2 \to_p \sigma^2.$

This estimator is only slightly different from $s^2$ ($\hat\sigma^2 = (n-k)s^2/n$). Since $(n-k)/n \to 1$ as $n \to \infty$, if $\hat\sigma^2$ is consistent, so is $s^2$. Next, we derive the distribution of $\hat\sigma^2$:

$\sqrt n(\hat\sigma^2 - \sigma^2) = (1/\sqrt n)\sum_{t=1}^n (u_t^2 - \sigma^2) - \sqrt n(\hat\beta_n - \beta_0)'\left[(1/n)\sum_t x_tx_t'\right](\hat\beta_n - \beta_0).$

The second term goes to zero, as $\sqrt n(\hat\beta_n - \beta_0)$ is $O_p(1)$, $(1/n)\sum_t x_tx_t' \to_p Q$, and $\hat\beta_n - \beta_0 \to_p 0$. Define $z_t = u_t^2 - \sigma^2$; then $z_t$ is i.i.d. with mean zero and variance $E(u_t^4) - \sigma^4 = \mu_4 - \sigma^4$. Applying the CLT, we have $(1/\sqrt n)\sum_t z_t \to_d N(0, \mu_4 - \sigma^4)$; therefore

$\sqrt n(\hat\sigma^2 - \sigma^2) \to_d N(0, \mu_4 - \sigma^4).$

The same limit distribution applies to $s^2$, since the difference between $\hat\sigma^2$ and $s^2$ is $o_p(n^{-1/2})$.

1.4 Case 4: OLS estimation in autoregressions with i.i.d. errors

In an autoregression, say $x_t = \phi_0 x_{t-1} + \epsilon_t$ where $\epsilon_t$ is i.i.d., the regressors are no longer independent of the errors at all leads and lags. In this case, the OLS estimator of $\phi_0$ is biased. However, we will show that under Assumption 4 the estimator is consistent.

Assumption 4. The regression model is

$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \ldots + \phi_p y_{t-p} + \epsilon_t,$

with the roots of $1 - \phi_1 z - \phi_2 z^2 - \ldots - \phi_p z^p = 0$ outside the unit circle (so $y_t$ is stationary) and with $\epsilon_t$ i.i.d. with mean zero, variance $\sigma^2$, and finite fourth moment $\mu_4$.

Hamilton presents the general AR(p) case with a constant. We will use an AR(2) as an example, $y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \epsilon_t$. Let $x_t = (y_{t-1}, y_{t-2})'$, $u_t = \epsilon_t$, and $y_t = x_t'\beta_0 + u_t$ (so $\beta_0 = (\phi_1, \phi_2)'$). Then

$\sqrt n(\hat\beta_n - \beta_0) = \left[(1/n)\sum_t x_tx_t'\right]^{-1}(1/\sqrt n)\sum_t x_tu_t.$  (7)

The first term is

$(1/n)\sum_t x_tx_t' = (1/n)\sum_t \begin{pmatrix} y_{t-1}^2 & y_{t-1}y_{t-2} \\ y_{t-1}y_{t-2} & y_{t-2}^2 \end{pmatrix}.$

In this matrix, the diagonal terms $(1/n)\sum_t y_{t-j}^2$ converge to $\gamma_0$, and the remaining terms converge to $\gamma_1$. Therefore,

$(1/n)\sum_t x_tx_t' \to_p Q = \begin{pmatrix} \gamma_0 & \gamma_1 \\ \gamma_1 & \gamma_0 \end{pmatrix}.$

Applying the CLT for mds to the second term in (7), $(1/\sqrt n)\sum_t x_tu_t \to_d N(0, \sigma^2 Q)$; therefore

$\sqrt n(\hat\beta_n - \beta_0) \to_d N(0, \sigma^2 Q^{-1}).$
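The AR(2) example can be replicated in a few lines. The sketch below (ours; the coefficients $\phi_1 = 0.5$ and $\phi_2 = 0.3$ are assumptions chosen to satisfy the stationarity condition) estimates $(\phi_1, \phi_2)$ by regressing $y_t$ on $x_t = (y_{t-1}, y_{t-2})'$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, phi1, phi2 = 1000, 0.5, 0.3          # roots of 1 - 0.5z - 0.3z^2 lie outside the unit circle
y = np.zeros(n + 2)
for t in range(2, n + 2):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + rng.normal()

# Regress y_t on x_t = (y_{t-1}, y_{t-2})'
Y = y[2:]
X = np.column_stack([y[1:-1], y[:-2]])
phi_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(phi_hat)                           # close to (0.5, 0.3) for large n
```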

So far we have considered four cases of OLS regressions. The common assumption in all four cases is i.i.d. errors. From the next section on, we consider cases where the errors are not i.i.d.

1.5 OLS with non-i.i.d. errors

When the error $u_t$ is i.i.d., the variance-covariance matrix is $V = E(U_nU_n') = \sigma^2 I_n$. If $V$ is still diagonal but its elements are not equal, for example the errors on some dates display larger variance and the errors on other dates display smaller variance, then the errors are said to exhibit heteroskedasticity. If $V$ is non-diagonal, then the errors are said to be autocorrelated. For example, if $u_t = \epsilon_t - \phi\epsilon_{t-1}$ where $\epsilon_t$ is i.i.d., then $u_t$ is a serially correlated error. Case 5 in Hamilton assumes:

Assumption 5. (a) $x_t$ is stochastic; (b) conditional on the full matrix $X_n$, the vector $U_n \sim N(0, \sigma^2 V)$; (c) $V$ is a known positive definite matrix.

Under these assumptions, the exact distribution of $\hat\beta_n$ can be derived. However, this is a very strong assumption, and it rules out autoregressions. Also, the assumption that $V$ is known rarely holds in applications. Case 6 in Hamilton assumes uncorrelated but heteroskedastic errors with an unknown covariance matrix. Under Assumption 6, the OLS estimator is still consistent and asymptotically normal.

Assumption 6. (a) $x_t$ is stochastic, including perhaps lagged values of $y$; (b) $\{x_tu_t\}$ is a martingale difference sequence; (c) $E(u_t^2x_tx_t') = \Omega_t$, a positive definite matrix, with $(1/n)\sum_t \Omega_t \to \Omega$ and $(1/n)\sum_t u_t^2x_tx_t' \to_p \Omega$; (d) $E(u_t^4x_{it}x_{jt}x_{lt}x_{kt}) < \infty$ for all $i, j, k, l$ and $t$; (e) the plims of $(1/n)\sum_t u_tx_{it}x_tx_t'$ and $(1/n)\sum_t x_{it}x_{jt}x_tx_t'$ exist and are finite for all $i, j$, and $(1/n)\sum_t x_tx_t' \to_p Q$, a nonsingular matrix.

Again, write the OLS estimator as

$\sqrt n(\hat\beta_n - \beta_0) = \left[(1/n)\sum_t x_tx_t'\right]^{-1}(1/\sqrt n)\sum_t x_tu_t.$

Assumption 6(e) ensures that $(1/n)\sum_t x_tx_t' \to_p Q$. Applying the CLT for mds,

$(1/\sqrt n)\sum_t x_tu_t \to_d N(0, \Omega);$

therefore

$\sqrt n(\hat\beta_n - \beta_0) \to_d N(0, Q^{-1}\Omega Q^{-1}).$

However, $Q$ and $\Omega$ are both unobservable, and we need to find consistent estimates of them. White proposes the estimators $\hat Q_n = (1/n)\sum_t x_tx_t'$ and $\hat\Omega_n = (1/n)\sum_t \hat u_t^2x_tx_t'$, where $\hat u_t$ is the OLS residual $y_t - x_t'\hat\beta_n$.

Proposition 1. With heteroskedasticity of unknown form satisfying Assumption 6, the asymptotic variance-covariance matrix of the OLS coefficient vector can be consistently estimated by

$\hat Q_n^{-1}\hat\Omega_n\hat Q_n^{-1} \to_p Q^{-1}\Omega Q^{-1}.$  (8)

Proof: Assumption 6(e) ensures $\hat Q_n \to_p Q$, and assumption 6(c) ensures that $\Omega_n \equiv (1/n)\sum_t u_t^2x_tx_t' \to_p \Omega$. So to prove (8), we only need to show that

$\hat\Omega_n - \Omega_n = (1/n)\sum_t (\hat u_t^2 - u_t^2)x_tx_t' \to_p 0.$

The trick here is to make use of the known fact that $\hat\beta_n - \beta_0 \to_p 0$. If we can write $\hat\Omega_n - \Omega_n$ as a sum of products of $\hat\beta_n - \beta_0$ and terms that are bounded, then $\hat\Omega_n - \Omega_n \to_p 0$. Now,

$\hat u_t^2 - u_t^2 = (\hat u_t + u_t)(\hat u_t - u_t) = [2(y_t - \beta_0'x_t) - (\hat\beta_n - \beta_0)'x_t][-(\hat\beta_n - \beta_0)'x_t] = -2u_t(\hat\beta_n - \beta_0)'x_t + [(\hat\beta_n - \beta_0)'x_t]^2.$

Then

$\hat\Omega_n - \Omega_n = (-2/n)\sum_t u_t(\hat\beta_n - \beta_0)'x_t(x_tx_t') + (1/n)\sum_t [(\hat\beta_n - \beta_0)'x_t]^2(x_tx_t').$

Write the first term as

$(-2/n)\sum_t u_t(\hat\beta_n - \beta_0)'x_t(x_tx_t') = -2\sum_{i=1}^k (\hat\beta_{in} - \beta_{i0})\left[(1/n)\sum_t u_tx_{it}(x_tx_t')\right].$

The term in brackets has a finite plim by assumption 6(e), and $\hat\beta_{in} - \beta_{i0} \to_p 0$ for each $i$, so this term converges to zero. (If this looks messy, take $k = 1$; then you can simply move $(\hat\beta_n - \beta_0)$ out of the summation: $\hat\beta_n - \beta_0 \to_p 0$ and the sum has a finite plim, so the product goes to zero.) Similarly, for the second term,

$(1/n)\sum_t [(\hat\beta_n - \beta_0)'x_t]^2(x_tx_t') = \sum_{i=1}^k\sum_{j=1}^k (\hat\beta_{in} - \beta_{i0})(\hat\beta_{jn} - \beta_{j0})\left[(1/n)\sum_t x_{it}x_{jt}(x_tx_t')\right] \to_p 0,$

as the term in brackets has a finite plim. Therefore $\hat\Omega_n - \Omega_n \to_p 0$.

Define $\hat V = \hat Q_n^{-1}\hat\Omega_n\hat Q_n^{-1}$; then $\hat\beta_n$ is approximately $N(\beta_0, \hat V/n)$, and $\hat V/n$ is a heteroskedasticity-consistent estimate of the variance-covariance matrix. Newey and West propose the following estimator of the variance-covariance matrix, which is heteroskedasticity and autocorrelation consistent (HAC):

$\hat V/n = (X_n'X_n)^{-1}\left[\sum_{t=1}^n \hat u_t^2x_tx_t' + \sum_{k=1}^q \left(1 - \frac{k}{q+1}\right)\sum_{t=k+1}^n (x_t\hat u_t\hat u_{t-k}x_{t-k}' + x_{t-k}\hat u_{t-k}\hat u_tx_t')\right](X_n'X_n)^{-1}.$
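The White and Newey-West estimators can be coded compactly. The sketch below (our own construction; the function name and the default truncation lag `q = 4` are assumptions) returns the estimated variance-covariance matrix of $\hat\beta_n$ given the regressor matrix and OLS residuals:

```python
import numpy as np

def hac_cov(X, resid, q=4):
    """Newey-West HAC estimate of Var(beta_hat) = (X'X)^{-1} S (X'X)^{-1}."""
    Xu = X * resid[:, None]                 # rows are x_t * u_hat_t
    S = Xu.T @ Xu                           # White term: sum of u_t^2 x_t x_t'
    for lag in range(1, q + 1):
        w = 1.0 - lag / (q + 1.0)           # Bartlett weight 1 - k/(q+1)
        Gamma = Xu[lag:].T @ Xu[:-lag]      # sum of x_t u_t u_{t-lag} x_{t-lag}'
        S += w * (Gamma + Gamma.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ S @ XtX_inv
```

Setting `q = 0` skips the lag loop and reduces the estimator to White's heteroskedasticity-consistent variance matrix.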

1.6 Generalized least squares

Generalized least squares (GLS) and feasible generalized least squares (FGLS) are preferred in least squares estimation when the errors are heteroskedastic and/or autocorrelated. Let $x_t$ be stochastic and $U_n \mid X_n \sim N(0, \sigma^2 V)$, where $V$ is known (Assumption 5). Since $V$ is symmetric and positive definite, there exists a matrix $L$ such that $V^{-1} = L'L$. Premultiplying our regression by $L$, we get

$LY_n = LX_n\beta_0 + LU_n.$

Then the new error $\tilde U = LU_n$ is i.i.d. conditional on $X_n$:

$E(\tilde U\tilde U' \mid X_n) = LE(U_nU_n' \mid X_n)L' = \sigma^2 LVL' = \sigma^2 I_n.$

Then the estimator

$\tilde\beta = (X_n'L'LX_n)^{-1}X_n'L'LY_n = (X_n'V^{-1}X_n)^{-1}X_n'V^{-1}Y_n$

is known as the generalized least squares estimator. However, as we remarked earlier, in applications $V$ is rarely known and we have to estimate it. The GLS estimator obtained using an estimated $V$ is known as the feasible GLS estimator. Usually FGLS requires that we specify a parametric model for the error. For example, let the error $u_t$ follow an AR(1) process, $u_t = \rho_0 u_{t-1} + \epsilon_t$, where $\epsilon_t \sim$ i.i.d.$(0, \sigma^2)$. In this case, we can run OLS first and obtain the OLS residuals $\hat u_t$, and then run an OLS estimation of $\rho$ using the $\hat u_t$. This estimator, denoted by $\hat\rho_n$, is a consistent estimator of $\rho_0$. To show this, write

$\hat u_t = y_t - \beta_0'x_t + \beta_0'x_t - \hat\beta_n'x_t = u_t + (\beta_0 - \hat\beta_n)'x_t.$

Then

$(1/n)\sum_t \hat u_t\hat u_{t-1} = (1/n)\sum_t [u_t + (\beta_0 - \hat\beta_n)'x_t][u_{t-1} + (\beta_0 - \hat\beta_n)'x_{t-1}]$
$= (1/n)\sum_t u_tu_{t-1} + (\beta_0 - \hat\beta_n)'(1/n)\sum_t (u_tx_{t-1} + u_{t-1}x_t) + (\beta_0 - \hat\beta_n)'\left[(1/n)\sum_t x_tx_{t-1}'\right](\beta_0 - \hat\beta_n)$
$= (1/n)\sum_t u_tu_{t-1} + o_p(1)$
$= (1/n)\sum_t (\epsilon_t + \rho_0 u_{t-1})u_{t-1} + o_p(1)$
$\to_p \rho_0\,\mathrm{var}(u_t).$

Similarly, we can show that $(1/n)\sum_t \hat u_{t-1}^2 \to_p \mathrm{var}(u_t)$; hence $\hat\rho_n \to_p \rho_0$. Still using similar methods, we can show that $(1/\sqrt n)\sum_t \hat u_t\hat u_{t-1} = (1/\sqrt n)\sum_t u_tu_{t-1} + o_p(1)$. Hence

$\sqrt n(\hat\rho_n - \rho_0) \to_d N(0, 1 - \rho_0^2).$

Finally, the FGLS estimator of $\beta_0$ based on $V(\hat\rho_n)$ has the same limit distribution as the GLS estimator based on $V(\rho_0)$ (see Hamilton).
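A sketch of the two-step FGLS procedure just described follows (ours; the quasi-differencing step, in the style of Cochrane-Orcutt, is one standard way of applying the transformation $L$ implied by $V(\hat\rho)$, and it drops the first observation):

```python
import numpy as np

def fgls_ar1(X, y):
    """Two-step FGLS for y = X b + u with AR(1) errors u_t = rho u_{t-1} + e_t."""
    # Step 1: OLS, then estimate rho by regressing residuals on their lags
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b_ols
    rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
    # Step 2: quasi-difference so the transformed errors are (approximately) i.i.d.
    ys = y[1:] - rho * y[:-1]
    Xs = X[1:] - rho * X[:-1]
    b_fgls = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
    return b_fgls, rho
```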

1.7 Statistical inference with LS estimation

Some commonly used test statistics for LS estimators are $t$ statistics and $F$ statistics. A $t$ statistic is used to test a hypothesis about a single parameter, say $\beta_i = c$. For simplicity we assume $c = 0$, so we use the $t$ statistic to test whether a variable is significant. The $t$ statistic is defined as the ratio $\hat\beta_i/\mathrm{sd}(\hat\beta_i)$. Let the estimate of the variance of $\hat\beta_n$ be denoted by $s^2\hat W$; then the standard deviation of $\hat\beta_i$ is the product of $s$ and the square root of the $i$th element on the diagonal of $\hat W$, i.e.,

$t = \frac{\hat\beta_i}{\sqrt{s^2 w_{ii}}}.$  (9)

Recall that if $X/\sigma \sim N(0, 1)$ and $Y^2/\sigma^2 \sim \chi^2(m)$, with $X$ and $Y$ independent, then $t = X\sqrt m/Y$ follows an exact Student $t$ distribution with $m$ degrees of freedom.

An $F$ statistic is used to test a hypothesis of $m$ different linear restrictions on $\beta$, say $H_0: R\beta = r$, where $R$ is an $m \times k$ matrix. The $F$ statistic is then defined as

$F = (R\hat\beta_n - r)'[\mathrm{Var}(R\hat\beta_n - r)]^{-1}(R\hat\beta_n - r).$  (10)

This is a Wald statistic. To derive the distribution of the statistic, we will need the following result.

Proposition 2. If a $k \times 1$ vector $X \sim N(\mu, \Sigma)$, then $(X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(k)$.

Also recall that an exact $F(m, n)$ distribution is defined as

$F(m, n) = \frac{\chi^2(m)/m}{\chi^2(n)/n}.$

Under Assumption 1, $\hat W = (X_n'X_n)^{-1}$, and under the null hypothesis $\hat\beta_i \sim N(0, \sigma^2 w_{ii})$. We can then write

$t = \frac{\hat\beta_i/\sqrt{\sigma^2 w_{ii}}}{\sqrt{s^2/\sigma^2}}.$

Since the numerator is $N(0, 1)$, the denominator is the square root of a $\chi^2(n-k)$ variable divided by $n-k$ (since $RSS/\sigma^2 \sim \chi^2(n-k)$), and the numerator and denominator are independent, the $t$ statistic (9) under Assumption 1 follows an exact $t$ distribution with $n-k$ degrees of freedom.

Under Assumption 1 and the null hypothesis, we have $R\hat\beta_n - r \sim N(0, \sigma^2 R(X_n'X_n)^{-1}R')$, so by Proposition 2 the $F$ statistic defined in (10) satisfies, under $H_0$,

$F_1 \equiv (R\hat\beta_n - r)'[\sigma^2 R(X_n'X_n)^{-1}R']^{-1}(R\hat\beta_n - r) \sim \chi^2(m).$

If we replace $\sigma^2$ with $s^2$ and divide by the number of restrictions $m$, we get the OLS $F$ test of a linear hypothesis,

$F = (R\hat\beta_n - r)'[s^2 R(X_n'X_n)^{-1}R']^{-1}(R\hat\beta_n - r)/m = \frac{F_1/m}{(RSS/\sigma^2)/(n-k)},$

so $F$ follows an exact $F(m, n-k)$ distribution. An alternative way to express the $F$ statistic is to compute the unrestricted estimator $\hat\beta_n$ with its associated sum of squared residuals $RSS_u$, and the restricted estimator $\tilde\beta$ with its associated sum of squared residuals $RSS_r$; then we can write

$F = \frac{(RSS_r - RSS_u)/m}{RSS_u/(n-k)}.$

Now, under Assumption 2, $X_n$ is stochastic, $\hat\beta_n$ is normal conditional on $X_n$, and $RSS/\sigma^2 \sim \chi^2(n-k)$ conditional on $X_n$. This conditional distribution of $RSS$ is the same for all $X_n$; therefore the unconditional distribution of $RSS$ is the same as the conditional distribution. The same is true for the $t$ and $F$ statistics. Therefore we have the same results under Assumption 2 as under Assumption 1.

From Case 3 on, we no longer have an exact distribution for the estimator, and we derived the asymptotic distribution of the estimator instead, so we also use asymptotic distributions for the test statistics. Write

$t = \frac{\hat\beta_i}{s\sqrt{w_{ii}}} = \frac{\sqrt n\,\hat\beta_i}{s\sqrt{n\,w_{ii}}},$

where $w_{ii}$ is the $i$th element on the diagonal of $(X_n'X_n)^{-1}$, so that $n\,w_{ii}$ converges to the $i$th diagonal element of $\hat\beta_n$'s asymptotic variance $Q^{-1}$. If we denote the $i$th diagonal element of $Q^{-1}$ by $q^{ii}$, then under the null we have $\sqrt n\,\hat\beta_i \to_d N(0, \sigma^2 q^{ii})$. Recall that under Assumption 3, $s \to_p \sigma$; therefore $t \to_d N(0, 1)$. Next, write

$F = (R\hat\beta_n - r)'[s^2 R(X_n'X_n)^{-1}R']^{-1}(R\hat\beta_n - r)/m = [\sqrt n(R\hat\beta_n - r)]'[s^2 R(X_n'X_n/n)^{-1}R']^{-1}[\sqrt n(R\hat\beta_n - r)]/m.$

Now we have $s^2 \to_p \sigma^2$, $X_n'X_n/n \to_p Q$, and under the null $\sqrt n(R\hat\beta_n - r) = R\sqrt n(\hat\beta_n - \beta_0) \to_d N(0, \sigma^2 RQ^{-1}R')$. Then by Proposition 2, we have $mF \to_d \chi^2(m)$.

We can use similar methods to derive the distributions for the other cases. In general, if $\hat\beta_n \to_p \beta_0$ and is asymptotically normal, $s^2 \to_p \sigma^2$, and we have found a consistent estimate of the variance of $\hat\beta_n$, then the $t$ statistic is asymptotically normal and $mF$ asymptotically follows a $\chi^2(m)$ distribution. In fact, under Assumption 1 or 2, when the sample size is large, we can also use the normal and $\chi^2$ distributions to approximate the exact $t$ and $F$ distributions. Further, since we are using asymptotic distributions, the Wald test can also be used to test nonlinear restrictions.
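The asymptotic $t$ and Wald statistics of this subsection can be computed directly. A sketch (ours; the function name and interface are for illustration):

```python
import numpy as np

def t_and_wald(X, y, R, r):
    """Asymptotic t statistics for each coefficient and Wald statistic for H0: R b = r."""
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b
    s2 = (u @ u) / (n - k)                      # s^2, consistent for sigma^2
    V = s2 * np.linalg.inv(X.T @ X)             # estimated Var(beta_hat)
    t_stats = b / np.sqrt(np.diag(V))           # approx N(0,1) under each H0: beta_i = 0
    d = R @ b - r
    W = d @ np.linalg.solve(R @ V @ R.T, d)     # approx chi^2(m) under H0, m = rows of R
    return t_stats, W
```

Here `W` equals $mF$ in the notation above, so it is compared against $\chi^2(m)$ critical values.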

2 Maximum Likelihood Estimation

2.1 Review: the maximum likelihood principle and the Cramér-Rao lower bound

The basic idea of the maximum likelihood principle is to choose the parameter estimates that maximize the probability of obtaining the observed sample. Suppose that we observe a sample $X_n = (x_1, x_2, \ldots, x_n)$, assume that the sample is drawn from an i.i.d. distribution, and denote the associated parameters by $\theta$. Let $p(x_t; \theta)$ denote the pdf of the $t$th observation. For example, when $x_t \sim$ i.i.d. $N(\mu, \sigma^2)$, then $\theta = (\mu, \sigma^2)$ and

$p(x_t; \theta) = (2\pi\sigma^2)^{-1/2}\exp\left\{-\frac{(x_t - \mu)^2}{2\sigma^2}\right\}.$

The likelihood function for the whole sample $X_n$ is

$L(X_n; \theta) = \prod_{t=1}^n p(x_t; \theta),$

and the log likelihood function is

$l(X_n; \theta) = \sum_{t=1}^n \log p(x_t; \theta).$

The maximum likelihood estimates of $\theta$ are chosen so that $l(X_n; \theta)$ is maximized. Define the score function $S(\theta) = \partial l(\theta)/\partial\theta$ and the Hessian matrix $H(\theta) = \partial^2 l(\theta)/\partial\theta\partial\theta'$. The famous Cramér-Rao inequality tells us that the lower bound for the variance of an unbiased estimator of $\theta$ is the inverse of the information matrix $I(\theta_0) = E[S(\theta_0)S(\theta_0)']$, where $\theta_0$ denotes the true value of the parameter. An estimator whose variance equals this bound is known as efficient. Under some regularity conditions, which are satisfied for the Gaussian density, we have the equality

$I(\theta) = -E[H(\theta)] = -E\left[\frac{\partial^2 l(\theta)}{\partial\theta\partial\theta'}\right].$

So, if we find an unbiased estimator whose variance achieves the Cramér-Rao lower bound, then we know that this estimator is efficient and that no other unbiased estimator (linear or nonlinear) can have a smaller variance. However, this lower bound is not always achievable. If an estimator does achieve this bound, then this estimator is identical to the MLE. Note that the Cramér-Rao inequality holds for unbiased estimators, while ML estimators are sometimes biased. If an estimator is biased but consistent, and its variance approaches the Cramér-Rao bound asymptotically, then it is known as asymptotically efficient.

Example 1 (MLE for an i.i.d. Gaussian sample). Let $x_t \sim$ i.i.d. $N(\mu, \sigma^2)$, so the parameter is $\theta = (\mu, \sigma^2)$. Then we have

$p(x_t; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(x_t - \mu)^2}{2\sigma^2}\right\},$

$l(X_n; \theta) = -\frac n2\log(2\pi) - \frac n2\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^n (x_t - \mu)^2,$

$S(X_n; \mu) = \frac{\partial l(X_n; \theta)}{\partial\mu} = \frac{1}{\sigma^2}\sum_{t=1}^n (x_t - \mu),$

$S(X_n; \sigma^2) = \frac{\partial l(X_n; \theta)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^n (x_t - \mu)^2.$

Setting the score functions to zero, we find the MLE estimators of $\theta$: $\hat\mu = \bar X_n$ and $\hat\sigma^2 = (1/n)\sum_t (x_t - \hat\mu)^2$. It is easy to verify that $E(\hat\mu) = E(\bar X_n) = \mu$, so $\hat\mu$ is unbiased, and its variance is $\mathrm{Var}(\hat\mu) = \sigma^2/n$; while

$E(\hat\sigma^2) = \frac 1n E\sum_t (x_t - \hat\mu)^2 = E(x_t - \hat\mu)^2 = E[(x_t - \mu) + (\mu - \hat\mu)]^2 = \sigma^2 - \frac 2n\sigma^2 + \frac 1n\sigma^2 = \frac{n-1}{n}\sigma^2,$

so $\hat\sigma^2$ is biased, but it is consistent, as $\hat\sigma^2 \to \sigma^2$ as $n \to \infty$. Define $s^2 = (1/(n-1))\sum_t (x_t - \hat\mu)^2$; then $E(s^2) = \sigma^2$ and $\mathrm{Var}(s^2) = 2\sigma^4/(n-1)$.

We can further compute the Hessian matrix,

$H(X_n; \theta) = \begin{pmatrix} \partial^2 l(X_n;\theta)/\partial\mu^2 & \partial^2 l(X_n;\theta)/\partial\mu\partial\sigma^2 \\ \partial^2 l(X_n;\theta)/\partial\sigma^2\partial\mu & \partial^2 l(X_n;\theta)/\partial(\sigma^2)^2 \end{pmatrix},$

where

$\frac{\partial^2 l(X_n; \theta)}{\partial\mu^2} = -\frac{n}{\sigma^2},$

$\frac{\partial^2 l(X_n; \theta)}{\partial\mu\partial\sigma^2} = \frac{\partial^2 l(X_n; \theta)}{\partial\sigma^2\partial\mu} = -\frac{1}{\sigma^4}\sum_t (x_t - \mu),$

$\frac{\partial^2 l(X_n; \theta)}{\partial(\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_t (x_t - \mu)^2.$

We can also compute that $|H(X_n; \theta)|_{\theta=\hat\theta} = n^2/(2\hat\sigma^6) > 0$; since the diagonal entries of $H(X_n; \hat\theta)$ are negative, the Hessian is negative definite there, so we know that we have found the maximum (not a minimum) of the likelihood function.
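The unbiasedness of $\hat\mu$, its variance $\sigma^2/n$, and the bias factor $(n-1)/n$ of $\hat\sigma^2$ can all be checked by simulation; a small sketch (ours; the parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sig2, n, reps = 2.0, 2.25, 50, 20000
mu_hats = np.empty(reps)
sig2_hats = np.empty(reps)
for r in range(reps):
    x = rng.normal(mu, np.sqrt(sig2), size=n)
    mu_hats[r] = x.mean()                        # MLE of mu
    sig2_hats[r] = x.var()                       # MLE of sigma^2 = (1/n) sum (x_t - mu_hat)^2

print(mu_hats.var(), sig2 / n)                   # Var(mu_hat) is close to sigma^2/n
print(sig2_hats.mean(), (n - 1) / n * sig2)      # E(sig2_hat) = ((n-1)/n) sigma^2: biased but consistent
```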

Next, compute the information matrix. Using $E_\theta\sum_t (x_t - \mu) = 0$ and $E_\theta\sum_t (x_t - \mu)^2 = n\sigma^2$, the information matrix is

$I(\theta) = -E[H(X_n; \theta)] = \begin{pmatrix} n/\sigma^2 & 0 \\ 0 & n/(2\sigma^4) \end{pmatrix}.$

So the MLE of $\mu$ achieves the Cramér-Rao lower bound of the variance, $\sigma^2/n$. Although $s^2$ does not achieve the lower bound, it turns out that it is still the unbiased estimator of $\sigma^2$ with minimum variance.

2.2 Asymptotic normality of MLE

There are a few regularity conditions that ensure that the MLE is consistent. First, we assume that the data are strictly stationary and ergodic (for example, i.i.d.). Second, we assume that the parameter space $\Theta$ is convex and that neither the estimate $\hat\theta$ nor the true parameter $\theta_0$ lies on the boundary of $\Theta$. Third, we require that the likelihood function evaluated at $\hat\theta$ be different from the likelihood evaluated at $\theta_0$ for any $\hat\theta \neq \theta_0$ in $\Theta$; this is known as the identification condition. Finally, we assume that $E\sup_{\theta\in\Theta} |l(X_n; \theta)| < \infty$. With all these conditions satisfied, the MLE is consistent: $\hat\theta \to_p \theta_0$.

Next we discuss asymptotic results for the score function $S(X_n; \theta)$, the Hessian matrix $H(X_n; \theta)$, and the asymptotic distribution of the ML estimate $\hat\theta$. First, we want to show that $E[S(X_n, \theta_0)] = 0$ and $E[S(X_n, \theta_0)S(X_n, \theta_0)'] = -E[H(X_n; \theta_0)]$. Let the integral $\int \cdot\,dX_n$ denote integration over $x_1, x_2, \ldots, x_n$; then we have

$\int L(X_n, \theta_0)\,dX_n = 1.$

Taking the derivative with respect to $\theta$, we have

$\int \frac{\partial L(X_n, \theta_0)}{\partial\theta}\,dX_n = 0.$

Meanwhile, we can write

$\int \frac{\partial L(X_n, \theta_0)}{\partial\theta}\,dX_n = \int \frac{\partial L(X_n, \theta_0)/\partial\theta}{L(X_n, \theta_0)}\,L(X_n, \theta_0)\,dX_n = \int \frac{\partial l(X_n; \theta_0)}{\partial\theta}\,L(X_n, \theta_0)\,dX_n = E[S(X_n, \theta_0)],$

so we know that $E[S(X_n, \theta_0)] = 0$. Next, differentiating this integral (which equals zero) with respect to $\theta$ once more gives

$\int \frac{\partial l(X_n; \theta_0)}{\partial\theta}\frac{\partial L(X_n, \theta_0)}{\partial\theta'}\,dX_n + \int \frac{\partial^2 l(X_n; \theta_0)}{\partial\theta\partial\theta'}\,L(X_n, \theta_0)\,dX_n = 0.$

The second term is just $E[H(X_n; \theta_0)]$. The first can be written as

$\int \frac{\partial l(X_n; \theta_0)}{\partial\theta}\left(\frac{\partial L(X_n, \theta_0)/\partial\theta'}{L(X_n, \theta_0)}\right)L(X_n, \theta_0)\,dX_n = \int \frac{\partial l(X_n; \theta_0)}{\partial\theta}\frac{\partial l(X_n; \theta_0)}{\partial\theta'}\,L(X_n, \theta_0)\,dX_n = E[S(X_n, \theta_0)S(X_n, \theta_0)'].$

Now, since $E[S(X_n, \theta_0)S(X_n, \theta_0)'] + E[H(X_n; \theta_0)] = 0$, we have

$E[S(X_n, \theta_0)S(X_n, \theta_0)'] = -E[H(X_n; \theta_0)].$

Next, define $s(x_t; \theta) = \partial\log p(x_t; \theta)/\partial\theta$; then we can write the score function as the sum of the $s(x_t; \theta)$, i.e., $S(X_n, \theta) = \sum_{t=1}^n s(x_t; \theta)$. The $s(x_t; \theta)$ are i.i.d., and we can show that $E[s(x_t; \theta_0)] = 0$ and $E[s(x_t; \theta_0)s(x_t; \theta_0)'] = -E[H(x_t; \theta_0)]$, where $H(x_t; \theta)$ denotes the Hessian of $\log p(x_t; \theta)$. Applying the Lindeberg-Lévy CLT, we obtain the asymptotic normality of the score function:

$n^{-1/2}S(X_n; \theta_0) \to_d N(0, -E[H(x_t; \theta_0)]).$

Next, we consider the properties of the Hessian matrix. First we assume that $E[H(x_t; \theta_0)]$ is nonsingular. Let $N_\epsilon$ be a neighborhood of $\theta_0$ and assume $E\sup_{\theta\in N_\epsilon}\|H(x_t; \theta)\| < \infty$, so that $(1/n)\sum_t H(x_t; \bar\theta_n) \to_p E[H(x_t; \theta_0)]$ for any consistent estimator $\bar\theta_n$ of $\theta_0$. Applying the LLN, we have

$(1/n)H(X_n; \theta_0) = (1/n)\sum_{t=1}^n H(x_t; \theta_0) \to_p E[H(x_t; \theta_0)] \equiv -\Sigma.$

With the notation $\Sigma$, we can write $n^{-1/2}S(X_n; \theta_0) \to_d N(0, \Sigma)$.

Proposition 3 (Asymptotic normality of MLE). Under all the conditions outlined above,

$\sqrt n(\hat\theta - \theta_0) \to_d N(0, \Sigma^{-1}).$

Proof: Take a Taylor expansion of $S(X_n; \hat\theta)$ around $\theta_0$:

$0 = S(X_n; \hat\theta) \approx S(X_n; \theta_0) + H(X_n; \theta_0)(\hat\theta - \theta_0).$

Therefore,

$\sqrt n(\hat\theta - \theta_0) = -\left[(1/n)H(X_n; \theta_0)\right]^{-1}\left[n^{-1/2}S(X_n; \theta_0)\right] \to_d \Sigma^{-1}N(0, \Sigma) = N(0, \Sigma^{-1}\Sigma\Sigma^{-1}) = N(0, \Sigma^{-1}).$

Note that $n\Sigma = -E[H(X_n; \theta_0)] = I(\theta_0)$, so the asymptotic distribution of $\hat\theta$ can be written as $\hat\theta \approx N(\theta_0, I(\theta_0)^{-1})$.

However, $I(\theta_0)$ depends on $\theta_0$, which is unknown, so we need to find a consistent estimator of it, denoted by $\hat V$. There are two ways to compute this variance matrix of $\hat\theta$. One way is to compute the Hessian matrix and evaluate it at $\theta = \hat\theta$, i.e., $\hat V = -H(X_n; \hat\theta)$. The second way is to use the outer product estimate,

$\hat V = \sum_{t=1}^n s(x_t; \hat\theta)s(x_t; \hat\theta)'.$

2.3 Statistical inference for MLE

There are three asymptotically equivalent tests for MLE: the likelihood ratio (LR) test, the Wald test, and the Lagrange multiplier (LM) test, or score test. You can probably find a discussion of these three tests in any graduate textbook in econometrics, so we describe them only briefly here.

The likelihood ratio test is based on the difference between the likelihoods computed (maximized) with and without the restriction. Let $l_u$ denote the likelihood without the restriction and $l_r$ the likelihood with the restriction (note that $l_r \leq l_u$). If the restriction is valid, then we expect that $l_r$ should not be too much lower than $l_u$. Therefore, to test whether the restriction is valid, the statistic we compute is $2(l_u - l_r)$, which follows a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions imposed.

To do an LR test, we have to compute the likelihood under both the restricted and the unrestricted conditions. In comparison, the other two tests use only either the estimator without the restriction (denoted by $\hat\theta$) or the estimator with the restriction (denoted by $\tilde\theta$). Let the restriction be $H_0: R(\theta) = r$. The idea of the Wald test is that if this restriction is valid, then the estimator obtained without the restriction, $\hat\theta$, will make $R(\hat\theta) - r$ close to zero. Therefore the Wald statistic is

$W = (R(\hat\theta) - r)'[\mathrm{Var}(R(\hat\theta) - r)]^{-1}(R(\hat\theta) - r),$

which also follows a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions imposed.

To find the ML estimator, we set the score function equal to zero and solve for the estimator, i.e., $S(\hat\theta) = 0$. If the restriction is valid, and the estimator obtained with the restriction is $\tilde\theta$, then we expect that $S(\tilde\theta)$ is close to zero. This idea leads to the LM test, or score test. The LM statistic is

$LM = S(\tilde\theta)'I(\tilde\theta)^{-1}S(\tilde\theta),$

which also follows a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions imposed.
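As a simple illustration of the LR test, the following sketch (ours; the sample and the restriction $H_0: \mu = 0$ are assumptions) tests a restriction on the mean in the Gaussian example of Section 2.1, where both the restricted and unrestricted MLEs are available in closed form:

```python
import numpy as np

def gaussian_loglik(x, mu, sig2):
    """Log likelihood of an i.i.d. N(mu, sig2) sample."""
    n = x.size
    return -0.5 * n * np.log(2 * np.pi * sig2) - ((x - mu) ** 2).sum() / (2 * sig2)

rng = np.random.default_rng(4)
x = rng.normal(loc=0.2, scale=1.0, size=200)

# Unrestricted MLE: (x_bar, sig2_hat); restricted MLE under H0: mu = 0
mu_u, s2_u = x.mean(), x.var()
s2_r = (x ** 2).mean()
LR = 2 * (gaussian_loglik(x, mu_u, s2_u) - gaussian_loglik(x, 0.0, s2_r))
print(LR)   # compare with the chi^2(1) critical value 3.84 at the 5% level
```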

2.4 LS and MLE

In a regression $Y = X\beta_0 + U$ where $U \mid X \sim N(0, \sigma^2 I_n)$ (as in Assumption 2), the conditional density of $Y$ given $X$ is

$f(Y \mid X; \theta) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta)\right\}.$

The log likelihood function is

$l(Y \mid X; \theta) = -\frac n2\log(2\pi) - \frac n2\log(\sigma^2) - \frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta).$

Note that the $\hat\beta$ that maximizes $l$ is the vector that minimizes the sum of squares; therefore, under Assumption 2, the OLS estimator is equivalent to the ML estimator of $\beta_0$. It can be shown that this estimator is unbiased and achieves the Cramér-Rao lower bound; therefore, under Assumption 2, the OLS/ML estimator is efficient (compared to all unbiased linear or nonlinear estimators). Recall that under Assumption 1 we have the Gauss-Markov theorem to show that the OLS estimator is the best linear unbiased estimator; now the Cramér-Rao inequality tells us the optimality of the OLS estimator under Assumption 2. The ML estimator of $\sigma^2$ is $(Y - X\hat\beta)'(Y - X\hat\beta)/n$. We introduced this estimator a moment ago, and we showed that the difference between $\hat\sigma^2$ and the OLS estimator $s^2$ becomes arbitrarily small as $n \to \infty$.

Next, consider Assumption 5, where $U \mid X \sim N(0, \sigma^2 V)$ and $V$ is known. Then the log likelihood function, omitting constant terms, is

$l(Y \mid X, \beta) = -\frac 12\log|V| - \frac 12(Y - X\beta)'V^{-1}(Y - X\beta).$

The ML estimator is $\hat\beta = (X'V^{-1}X)^{-1}X'V^{-1}Y$, which is equivalent to the GLS estimator. The score vector is $S_n(\beta) = (Y - X\beta)'V^{-1}X$, and the Hessian matrix is $H_n(\beta) = -X'V^{-1}X$. Therefore, the information matrix is $I(\beta) = X'V^{-1}X$, and the GLS/ML estimator is efficient, as it achieves the Cramér-Rao lower bound $(X'V^{-1}X)^{-1}$. When $V$ is unknown, we can parameterize it as $V(\psi)$, say, and maximize the likelihood

$l(Y \mid X, \beta, \psi) = -\frac 12\log|V(\psi)| - \frac 12(Y - X\beta)'V(\psi)^{-1}(Y - X\beta).$

2.5 Example: MLE in autoregressive estimation

In Hamilton's book, you can find many detailed discussions of MLE for ARMA models in Chapter 5. We take an AR(1) model as an example. Consider the AR(1) model $x_t = c + \beta x_{t-1} + u_t$, where $u_t \sim$ i.i.d. $N(0, \sigma^2)$. Let $\theta = (c, \beta, \sigma^2)$ and denote the sample size by $n$. There are two ways to construct the likelihood function, and the difference lies in how we treat the initial observation $x_1$. If we let $x_1$ be random, we know that the unconditional distribution of $x_t$ is $N(c/(1-\beta), \sigma^2/(1-\beta^2))$, and this leads to an exact likelihood function. Alternatively, we can assume that $x_1$ is observable (known), and this leads to a conditional likelihood function.

We first consider the exact likelihood function. We know that

$p(x_1; \theta) = \left(\frac{2\pi\sigma^2}{1-\beta^2}\right)^{-1/2}\exp\left\{-\frac{(x_1 - c/(1-\beta))^2}{2\sigma^2/(1-\beta^2)}\right\}.$

Conditional on $x_1$, the distribution of $x_2$ is $N(c + \beta x_1, \sigma^2)$, so the conditional probability density of the second observation is

$p(x_2 \mid x_1; \theta) = (2\pi\sigma^2)^{-1/2}\exp\left\{-\frac{(x_2 - c - \beta x_1)^2}{2\sigma^2}\right\}.$

So the joint probability density of $(x_1, x_2)$ is $p(x_1, x_2; \theta) = p(x_2 \mid x_1; \theta)p(x_1; \theta)$.

Similarly, the probability density of the $t$th observation conditional on $x_{t-1}$ is

$p(x_t \mid x_{t-1}; \theta) = (2\pi\sigma^2)^{-1/2}\exp\left\{-\frac{(x_t - c - \beta x_{t-1})^2}{2\sigma^2}\right\},$

and the density of the joint observations $X_n = (x_1, x_2, \ldots, x_n)$ is

$L(X_n; \theta) = p(x_1; \theta)\prod_{t=2}^n p(x_t \mid x_{t-1}; \theta).$

Taking logs, we get the exact log likelihood function (omitting constant terms for simplicity)

$l(X_n; \theta) = -\frac 12\log\left(\frac{\sigma^2}{1-\beta^2}\right) - \frac{(x_1 - c/(1-\beta))^2}{2\sigma^2/(1-\beta^2)} - \frac{n-1}{2}\log(\sigma^2) - \sum_{t=2}^n \frac{(x_t - c - \beta x_{t-1})^2}{2\sigma^2}.$  (11)

Next, to construct the conditional likelihood, assume that $x_1$ is observable; then the log likelihood function is (again, constant terms are omitted)

$l(X_n; \theta) = -\frac{n-1}{2}\log(\sigma^2) - \sum_{t=2}^n \frac{(x_t - c - \beta x_{t-1})^2}{2\sigma^2}.$  (12)

The maximum likelihood estimates $\hat c$ and $\hat\beta$ are obtained by maximizing (12), or by solving the score function. Note that maximizing (12) with respect to $\beta$ is equivalent to minimizing $\sum_{t=2}^n (x_t - c - \beta x_{t-1})^2$, which is the objective function in OLS. Compared to the exact likelihood function, we see that the conditional likelihood function is much easier to work with. In fact, when the sample size is large, the first observation becomes negligible in the total likelihood. When $|\beta| < 1$, the estimator computed from the exact likelihood and the estimator computed from the conditional likelihood are asymptotically equivalent.

Finally, if the residual is not Gaussian and we estimate the parameters using the conditional Gaussian likelihood as in (12), then the estimate we obtain is known as a quasi-maximum likelihood estimate (QMLE). The QMLE is also very frequently used in empirical estimation. Although we have misspecified the density function, in many cases the QMLE is still consistent. For instance, in an AR(p) process, if the sample second moments converge to the population second moments, then the QMLE using (12) is consistent, whether or not the error is Gaussian. However, standard errors for the estimated coefficients that are computed with the Gaussian assumption need not be correct if the true data are not Gaussian (White, 1982).
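The conditional log likelihood (12) can also be maximized numerically. The sketch below (ours; it uses `scipy.optimize.minimize`, parameterizes $\sigma^2 = e^\gamma$ to keep it positive, and the simulated parameter values are assumptions) should reproduce the OLS estimates of $(c, \beta)$, as noted above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, c0, beta0, sig0 = 500, 0.5, 0.7, 1.0
x = np.zeros(n)
x[0] = c0 / (1 - beta0)                  # start at the unconditional mean
for t in range(1, n):
    x[t] = c0 + beta0 * x[t - 1] + rng.normal(scale=sig0)

def neg_cond_loglik(theta):
    c, beta, gamma = theta               # sigma^2 = exp(gamma) > 0
    sig2 = np.exp(gamma)
    e = x[1:] - c - beta * x[:-1]        # conditional residuals, t = 2, ..., n
    return 0.5 * (n - 1) * np.log(sig2) + (e @ e) / (2 * sig2)

res = minimize(neg_cond_loglik, x0=np.zeros(3))
c_hat, beta_hat, sig2_hat = res.x[0], res.x[1], np.exp(res.x[2])
print(c_hat, beta_hat, sig2_hat)         # matches OLS of x_t on (1, x_{t-1})
```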

3 Model Selection

In the discussion of estimation above, we assumed that the lag order is known. However, in empirical estimation we have to choose a proper order. A larger number of lags (parameters) will increase the fit of the model, so we need some criterion to balance goodness of fit and model parsimony. Three commonly used criteria are the Akaike information criterion (AIC), Schwarz's Bayesian information criterion (BIC), and the posterior information criterion (PIC) developed by Phillips (1996). For all these criteria, we specify a maximum order $k_{\max}$ and then choose $\hat k$ to minimize a criterion function,

$AIC(k) = \log\left(\frac{SSR_k}{n}\right) + \frac{2k}{n},$  (13)

where $n$ is the sample size, $k = 1, 2, \ldots, k_{\max}$ is the number of parameters in the model, and $SSR_k$ is the sum of squared residuals from the fitted model. When $k$ increases, the fit improves, so $SSR_k$ decreases, but the second term increases; this shows the trade-off between fit and parsimony. Since the models are estimated using different numbers of lags, the sample size also varies: we can either use the varying sample sizes $n - k$, or we can use a fixed sample size $n - k_{\max}$. Ng and Perron (2000) recommend using the fixed sample size and using it in place of $n$ in the criterion. However, the AIC rule is not consistent and tends to overfit the model by choosing a larger $k$.

With all other issues similar to the AIC rule, the BIC rule imposes a larger penalty for increasing the number of parameters:

$BIC(k) = \log\left(\frac{SSR_k}{n}\right) + \frac{k\log(n)}{n}.$  (14)

BIC suggests smaller $k$ than AIC, and the BIC rule is consistent for stationary data, i.e., $\lim_{n\to\infty}\hat k_{BIC} = k$. Further, Hannan and Deistler (1988) have shown that $\hat k_{BIC}$ is consistent when we set $k_{\max} = [c\log(n)]$ (the integer part of $c\log(n)$) for any $c > 0$. Therefore, we can estimate $\hat k_{BIC}$ consistently without knowing an upper bound on $k$.

Finally, to present the PIC criterion, let $K = k_{\max}$, and let $X(K)$ and $X(k)$ denote the regressor matrices with $K$ and $k$ parameters respectively, and similarly for the parameter vector $\beta$. Partition the full model as

$Y = X(K)\beta(K) + \text{error} = X(k)\beta(k) + X(-)\beta(-) + \text{error},$

where $X(-)$ collects the $K - k$ regressors excluded from the smaller model, and define

$A(-) = X(-)'X(-), \quad A(k) = X(k)'X(k), \quad A(-, k) = X(-)'X(k),$

$A(-\cdot) = A(-) - A(-, k)A(k)^{-1}A(k, -),$

$\hat\beta(-) = \left[X(-)'X(-) - X(-)'X(k)(X(k)'X(k))^{-1}X(k)'X(-)\right]^{-1}\left[X(-)'Y - X(-)'X(k)(X(k)'X(k))^{-1}X(k)'Y\right],$

$\hat\sigma^2_K = SSR_K/(n - K).$

Then

$PIC = \left|A(-\cdot)/\hat\sigma^2_K\right|^{1/2}\exp\left\{-\frac{1}{2\hat\sigma^2_K}\hat\beta(-)'A(-\cdot)\hat\beta(-)\right\}.$

PIC is asymptotically equivalent to the BIC criterion when the data are stationary, and when the data are nonstationary, PIC is still consistent.

Reading: Hamilton, Ch. 5, 8.
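As a closing illustration of the AIC and BIC rules above, here is a sketch (ours; the AR(2) data-generating process and $k_{\max} = 8$ are assumptions) that selects the lag order of an autoregression using a fixed effective sample of size $n - k_{\max}$, as recommended:

```python
import numpy as np

rng = np.random.default_rng(7)
n, kmax = 400, 8
y = np.zeros(n + 2)
for t in range(2, n + 2):                     # assumed AR(2) truth: phi = (0.5, 0.3)
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()
y = y[2:]

neff = n - kmax                               # fixed sample size used in the criteria
aic, bic = {}, {}
for k in range(1, kmax + 1):
    # regress y_t on (y_{t-1}, ..., y_{t-k}) over the common sample t = kmax+1, ..., n
    Y = y[kmax:]
    X = np.column_stack([y[kmax - j:n - j] for j in range(1, k + 1)])
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    ssr = np.sum((Y - X @ b) ** 2)
    aic[k] = np.log(ssr / neff) + 2 * k / neff
    bic[k] = np.log(ssr / neff) + k * np.log(neff) / neff

print(min(aic, key=aic.get), min(bic, key=bic.get))   # selected lag orders
```

In repeated simulations, BIC tends to pick the true order 2 more often, while AIC occasionally overfits, in line with the consistency discussion above.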


More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 016 MODULE : Statistical Iferece Time allowed: Three hours Cadidates should aswer FIVE questios. All questios carry equal marks. The umber

More information

Notes 19 : Martingale CLT

Notes 19 : Martingale CLT Notes 9 : Martigale CLT Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: [Bil95, Chapter 35], [Roc, Chapter 3]. Sice we have ot ecoutered weak covergece i some time, we first recall

More information

1 Covariance Estimation

1 Covariance Estimation Eco 75 Lecture 5 Covariace Estimatio ad Optimal Weightig Matrices I this lecture, we cosider estimatio of the asymptotic covariace matrix B B of the extremum estimator b : Covariace Estimatio Lemma 4.

More information

Linear Regression Models, OLS, Assumptions and Properties

Linear Regression Models, OLS, Assumptions and Properties Chapter 2 Liear Regressio Models, OLS, Assumptios ad Properties 2.1 The Liear Regressio Model The liear regressio model is the sigle most useful tool i the ecoometricia s kit. The multiple regressio model

More information

1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by

More information

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam. Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the

More information

Sequences and Series of Functions

Sequences and Series of Functions Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges

More information

Lecture 3. Properties of Summary Statistics: Sampling Distribution

Lecture 3. Properties of Summary Statistics: Sampling Distribution Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 Fuctioal Law of Large Numbers. Costructio of the Wieer Measure Cotet. 1. Additioal techical results o weak covergece

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Algebra of Least Squares

Algebra of Least Squares October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal

More information

Quantile regression with multilayer perceptrons.

Quantile regression with multilayer perceptrons. Quatile regressio with multilayer perceptros. S.-F. Dimby ad J. Rykiewicz Uiversite Paris 1 - SAMM 90 Rue de Tolbiac, 75013 Paris - Frace Abstract. We cosider oliear quatile regressio ivolvig multilayer

More information

Regression with an Evaporating Logarithmic Trend

Regression with an Evaporating Logarithmic Trend Regressio with a Evaporatig Logarithmic Tred Peter C. B. Phillips Cowles Foudatio, Yale Uiversity, Uiversity of Aucklad & Uiversity of York ad Yixiao Su Departmet of Ecoomics Yale Uiversity October 5,

More information

G. R. Pasha Department of Statistics Bahauddin Zakariya University Multan, Pakistan

G. R. Pasha Department of Statistics Bahauddin Zakariya University Multan, Pakistan Deviatio of the Variaces of Classical Estimators ad Negative Iteger Momet Estimator from Miimum Variace Boud with Referece to Maxwell Distributio G. R. Pasha Departmet of Statistics Bahauddi Zakariya Uiversity

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Circle the single best answer for each multiple choice question. Your choice should be made clearly.

Circle the single best answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 6, 2017 Name: Please read the followig directios. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directios This exam is closed book ad closed otes. There are 32 multiple choice questios.

More information

An Introduction to Asymptotic Theory

An Introduction to Asymptotic Theory A Itroductio to Asymptotic Theory Pig Yu School of Ecoomics ad Fiace The Uiversity of Hog Kog Pig Yu (HKU) Asymptotic Theory 1 / 20 Five Weapos i Asymptotic Theory Five Weapos i Asymptotic Theory Pig Yu

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Study the bias (due to the nite dimensional approximation) and variance of the estimators

Study the bias (due to the nite dimensional approximation) and variance of the estimators 2 Series Methods 2. Geeral Approach A model has parameters (; ) where is ite-dimesioal ad is oparametric. (Sometimes, there is o :) We will focus o regressio. The fuctio is approximated by a series a ite

More information