Lecture Stat Maximum Likelihood Estimation

1 Lecture Stat: Maximum Likelihood Estimation. A.D., January 2008. (63 slides)

2 Maximum Likelihood Estimation. Invariance. Consistency. Efficiency. Nuisance Parameters.

3 Parametric Inference. Let $f(x \mid \theta)$ denote the joint pdf or pmf of the sample $X = (X_1, \dots, X_n)$, parametrized by $\theta \in \Theta$. Then, given that $X = x$ is observed, the function $L(\theta \mid x) = f(x \mid \theta)$ is the likelihood function. The most common estimate is the Maximum Likelihood Estimate (MLE), given by
$$\widehat{\theta} = \arg\max_{\theta \in \Theta} L(\theta \mid x).$$

4 Example: Gaussian distribution. Here
$$f(x_i \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$
with $\theta = (\mu, \sigma^2)$. Then we have
$$\log L(\theta \mid x) = \sum_{i=1}^{n} \log f(x_i \mid \theta) = -\frac{n}{2}\log 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2.$$
By taking the derivatives and setting them to zero,
$$\frac{\partial \log L(\theta\mid x)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0, \qquad
\frac{\partial \log L(\theta\mid x)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^{n}(x_i-\mu)^2 = 0.$$

5 By solving these equations, we obtain
$$\widehat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \widehat{\sigma^2} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i-\widehat{\mu}\right)^2.$$
Note that $\widehat{\mu}$ is an unbiased estimate but $\widehat{\sigma^2}$ is biased.
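As a quick numerical check of these formulas, here is a minimal numpy sketch (the simulated sample and parameter values are our own, not from the slides): it computes the closed-form MLEs and verifies that both score equations vanish at them.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # simulated sample (our choice)
n = x.size

# Closed-form MLEs from the two score equations above
mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()  # biased: E = (n-1)/n * sigma^2

# Both scores should vanish (up to floating point) at the MLE
score_mu = (x - mu_hat).sum() / sigma2_hat
score_s2 = -n / (2 * sigma2_hat) + ((x - mu_hat) ** 2).sum() / (2 * sigma2_hat ** 2)
print(mu_hat, sigma2_hat, score_mu, score_s2)
```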

6 Example: Laplace distribution (double exponential). Here
$$f(x_i \mid \theta) = \frac{1}{2}\exp\left(-|x_i - \theta|\right).$$
Then we have
$$\log L(\theta \mid x) = -n\log 2 - \sum_{i=1}^{n}|x_i - \theta|.$$
By taking the derivative, we obtain
$$\frac{d \log L(\theta \mid x)}{d\theta} = \sum_{i=1}^{n}\operatorname{sgn}(x_i-\theta),$$
hence $\widehat{\theta} = \operatorname{med}\{x_1,\dots,x_n\}$ for $n = 2p+1$.
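The claim that the sample median maximizes the Laplace likelihood is easy to verify numerically; a small sketch on simulated data (the location and the odd sample size are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.laplace(loc=0.7, size=201)  # odd n = 2p + 1, as on the slide

def log_lik(theta):
    # log L(theta | x) = -n log 2 - sum_i |x_i - theta|
    return -x.size * np.log(2.0) - np.abs(x - theta).sum()

grid = np.linspace(x.min(), x.max(), 20001)
best = grid[np.argmax([log_lik(t) for t in grid])]
print(np.median(x), best)  # agree up to the grid resolution
```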

7 Example (uniform distribution): Consider $X_i \sim \mathcal{U}(0,\theta)$, i.e.
$$f(x_i\mid\theta) = \begin{cases} 1/\theta & \text{if } 0 \le x_i \le \theta,\\ 0 & \text{otherwise.}\end{cases}$$
We have
$$L(\theta\mid x) = \prod_{i=1}^{n} f(x_i\mid\theta) = \begin{cases} (1/\theta)^n & \text{if } \theta \ge x_{(n)},\\ 0 & \text{if } \theta < x_{(n)}.\end{cases}$$
It follows that $\widehat{\theta} = x_{(n)}$, where $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$ are the order statistics.

8 Example (linear regression): Let $\{x_i, y_i\}$ be a set of data where $x_i = (x_i^1, x_i^2, \dots, x_i^p)^{T}$ is a set of explanatory variables and $y_i \in \mathbb{R}$ is the response. We assume
$$y_i = x_i^{T}\beta + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0,\sigma^2),$$
thus
$$f(y_i \mid x_i, \beta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_i - x_i^{T}\beta)^2}{2\sigma^2}\right).$$
We have, for $\theta = (\beta, \sigma^2)$,
$$\log L(\theta) = \sum_{i=1}^{n}\log f(y_i\mid x_i,\beta) = -\frac{n}{2}\log 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i^{T}\beta\right)^2 = -\frac{n}{2}\log 2\pi\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)^{T}(y-X\beta),$$
where $y = (y_1,\dots,y_n)^{T}$ and $X = (x_1,\dots,x_n)^{T}$.

9 By taking the derivatives and setting them to zero,
$$\frac{\partial \log L(\theta\mid x)}{\partial \beta} = -\frac{1}{2\sigma^2}\left(-2X^{T}y + 2X^{T}X\beta\right) = 0, \qquad
\frac{\partial \log L(\theta\mid x)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}(y-X\beta)^{T}(y-X\beta) = 0.$$
Thus we obtain
$$\widehat{\beta} = \left(X^{T}X\right)^{-1}X^{T}y, \qquad \widehat{\sigma^2} = \frac{1}{n}\left(y - X\widehat{\beta}\right)^{T}\left(y-X\widehat{\beta}\right).$$
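The closed-form solution is the ordinary least-squares estimator; a short sketch on simulated data (design, coefficients and noise level are our own choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 500, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.8 * rng.normal(size=n)

# beta_hat = (X^T X)^{-1} X^T y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / n  # the MLE divides by n, not n - p
print(beta_hat, sigma2_hat)
```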

10 Example (time series): Consider the following autoregression: $X_0 = x_0$,
$$X_n = \alpha X_{n-1} + \sigma V_n \quad \text{where } V_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$
with $\theta = (\alpha, \sigma^2)$. We have
$$L(\theta\mid x) = f(x\mid\theta) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_i - \alpha x_{i-1})^2}{2\sigma^2}\right).$$
Thus we have
$$\log L(\theta\mid x) = \text{cst} - \frac{n}{2}\log\sigma^2 - \sum_{i=1}^{n}\frac{(x_i-\alpha x_{i-1})^2}{2\sigma^2}.$$

11 It follows that
$$\frac{\partial \log L(\theta\mid x)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \sum_{i=1}^{n}\frac{(x_i-\alpha x_{i-1})^2}{2\sigma^4}, \qquad
\frac{\partial \log L(\theta\mid x)}{\partial \alpha} = \frac{1}{\sigma^2}\sum_{i=1}^{n} x_{i-1}\left(x_i - \alpha x_{i-1}\right).$$
Thus we have
$$\widehat{\alpha} = \frac{\sum_{i=1}^{n} x_{i-1}x_i}{\sum_{i=1}^{n} x_{i-1}^2}, \qquad \widehat{\sigma^2} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \widehat{\alpha}x_{i-1}\right)^2.$$
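A minimal sketch checking these conditional-MLE formulas on a simulated AR(1) path (the parameter values are our own):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha_true, sigma_true, n = 0.8, 0.5, 5000
x = np.zeros(n + 1)  # x[0] = x_0 = 0
for t in range(1, n + 1):
    x[t] = alpha_true * x[t - 1] + sigma_true * rng.normal()

# alpha_hat = sum x_{i-1} x_i / sum x_{i-1}^2 ; sigma2_hat = mean squared residual
alpha_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
sigma2_hat = ((x[1:] - alpha_hat * x[:-1]) ** 2).mean()
print(alpha_hat, np.sqrt(sigma2_hat))
```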

12 Invariance. Consider $\eta = g(\theta)$. We introduce the induced likelihood function $L^{*}$:
$$L^{*}(\eta\mid x) = \sup_{\{\theta : g(\theta) = \eta\}} L(\theta\mid x).$$
Invariance property: if $\widehat{\theta}$ is the MLE of $\theta$, then for any function $\eta = g(\theta)$, $g(\widehat{\theta})$ is the MLE of $\eta$. Proof: the MLE of $\eta$ is defined by
$$\widehat{\eta} = \arg\sup_{\eta}\;\sup_{\{\theta: g(\theta)=\eta\}} L(\theta\mid x).$$
Define $g^{-1}(\eta) = \{\theta : g(\theta) = \eta\}$. Then clearly $\widehat{\theta} \in g^{-1}(g(\widehat{\theta}))$ and cannot be in any other preimage, so $\widehat{\eta} = g(\widehat{\theta})$.

13 Consistency. Definition: a sequence of estimators $\widehat{\theta}_n = \widehat{\theta}_n(X_1,\dots,X_n)$ is consistent for the parameter $\theta$ if, for every $\varepsilon > 0$ and every $\theta \in \Theta$,
$$\lim_{n\to\infty} P_\theta\left(\left|\widehat{\theta}_n - \theta\right| < \varepsilon\right) = 1 \quad \left(\text{equivalently } \lim_{n\to\infty} P_\theta\left(\left|\widehat{\theta}_n - \theta\right| \ge \varepsilon\right) = 0\right).$$
Example: consider $X_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\theta, 1)$ and $\widehat{\theta}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$; then $\widehat{\theta}_n \sim \mathcal{N}(\theta, 1/n)$ and
$$P_\theta\left(\left|\widehat{\theta}_n - \theta\right| < \varepsilon\right) = \int_{-\varepsilon\sqrt{n}}^{\varepsilon\sqrt{n}} \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u^2}{2}\right)du \to 1.$$
It is possible to avoid this calculation and use instead Chebyshev's inequality:
$$P_\theta\left(\left|\widehat{\theta}_n - \theta\right| \ge \varepsilon\right) = P_\theta\left(\left(\widehat{\theta}_n - \theta\right)^2 \ge \varepsilon^2\right) \le \frac{E_\theta\left[(\widehat{\theta}_n-\theta)^2\right]}{\varepsilon^2},$$
where $E_\theta[(\widehat{\theta}_n-\theta)^2] = \operatorname{var}_\theta(\widehat{\theta}_n) + \left(E_\theta(\widehat{\theta}_n) - \theta\right)^2$.

14 Example of inconsistent MLE (Fisher):
$$(X_i, Y_i) \sim \mathcal{N}\left(\begin{pmatrix}\mu_i\\ \mu_i\end{pmatrix}, \begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2\end{pmatrix}\right).$$
The likelihood function is given by
$$L(\theta) = \prod_{i=1}^{n}\frac{1}{2\pi\sigma^2}\exp\left(-\frac{1}{2\sigma^2}\left[(x_i-\mu_i)^2 + (y_i-\mu_i)^2\right]\right).$$
We obtain
$$l(\theta) = \text{cste} - n\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[2\left(\frac{x_i+y_i}{2}-\mu_i\right)^2 + \frac{(x_i-y_i)^2}{2}\right].$$
We have
$$\widehat{\mu}_i = \frac{x_i+y_i}{2}, \qquad \widehat{\sigma^2} = \frac{1}{4n}\sum_{i=1}^{n}(x_i-y_i)^2 \to \frac{\sigma^2}{2},$$
so the MLE of $\sigma^2$ is inconsistent.

15 Consistency of the MLE. Kullback-Leibler distance: for any densities $f, g$,
$$D(f,g) = \int f(x)\log\frac{f(x)}{g(x)}\,dx.$$
We have $D(f,g) \ge 0$ and $D(f,f) = 0$. Indeed, since $\log u \le u - 1$,
$$-D(f,g) = \int f(x)\log\frac{g(x)}{f(x)}\,dx \le \int f(x)\left(\frac{g(x)}{f(x)} - 1\right)dx = 0.$$
$D(f,g)$ is a very useful distance and appears in many different contexts.

16 Alternative measures of similarity. Hellinger distance:
$$D(f,g) = \int\left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 dx.$$
Generalized information:
$$D(f,g) = \frac{1}{\lambda}\int\left(\left(\frac{f(x)}{g(x)}\right)^{\lambda} - 1\right)f(x)\,dx.$$
$L_1$-norm / total variation:
$$D(f,g) = \int|f(x) - g(x)|\,dx.$$
$L_2$-norm:
$$D(f,g) = \int\left(f(x) - g(x)\right)^2 dx.$$

17 Example. Suppose we have $f(x) = \mathcal{N}(x;\xi,\tau^2)$ and $g(x) = \mathcal{N}(x;\mu,\sigma^2)$. We have
$$E_f\left[(X-\mu)^2\right] = E_f\left[(X-\xi)^2 + 2(X-\xi)(\xi-\mu) + (\xi-\mu)^2\right] = \tau^2 + (\xi-\mu)^2.$$
So it follows that
$$E_f[\log g(X)] = E_f\left[-\frac{1}{2}\log 2\pi\sigma^2 - \frac{(X-\mu)^2}{2\sigma^2}\right] = -\frac{1}{2}\log 2\pi\sigma^2 - \frac{\tau^2 + (\xi-\mu)^2}{2\sigma^2}$$
and
$$E_f[\log f(X)] = -\frac{1}{2}\log 2\pi\tau^2 - \frac{1}{2}.$$

18 It follows that
$$D(f,g) = \int f(x)\log\frac{f(x)}{g(x)}\,dx = E_f[\log f(X)] - E_f[\log g(X)] = \frac{1}{2}\left\{\log\frac{\sigma^2}{\tau^2} + \frac{\tau^2 + (\xi-\mu)^2}{\sigma^2} - 1\right\}.$$
It can be easily checked that $D(f,f) = 0$ (it is less easy to show $D(f,g) \ge 0$ directly from this expression).
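A small sketch of this closed-form expression, checked against a direct numerical integration of $\int f \log(f/g)$ on a grid (the parameter values are our own):

```python
import numpy as np

def kl_gaussians(xi, tau2, mu, sigma2):
    # D(f,g) = (1/2) { log(sigma2/tau2) + (tau2 + (xi-mu)^2)/sigma2 - 1 }
    return 0.5 * (np.log(sigma2 / tau2) + (tau2 + (xi - mu) ** 2) / sigma2 - 1.0)

xi, tau2, mu, sigma2 = 1.0, 2.0, -1.0, 0.5
xs = np.linspace(-15.0, 15.0, 300001)
f = np.exp(-(xs - xi) ** 2 / (2 * tau2)) / np.sqrt(2 * np.pi * tau2)
g = np.exp(-(xs - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
dx = xs[1] - xs[0]

print(kl_gaussians(xi, tau2, mu, sigma2))  # closed form
print((f * np.log(f / g)).sum() * dx)      # numerical integral, should agree
```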

19 Example. Assume we have $f(x) = \frac{1}{2}\exp(-|x|)$ and $g(x) = \mathcal{N}(x;\mu,\sigma^2)$. We obtain
$$E_f[\log f(X)] = -\log 2 - \int |x|\,\frac{1}{2}\exp(-|x|)\,dx = -\log 2 - \int_0^{\infty} x\exp(-x)\,dx = -\log 2 - 1,$$
$$E_f[\log g(X)] = -\frac{1}{2}\log 2\pi\sigma^2 - \frac{1}{4\sigma^2}\left(4 + 2\mu^2\right).$$
It follows that
$$D(f,g) = \frac{1}{2}\log 2\pi\sigma^2 + \frac{2+\mu^2}{2\sigma^2} - \log 2 - 1.$$

20 Assume the pdfs $f(x\mid\theta)$ have common support for all $\theta$ and $f(x\mid\theta) \neq f(x\mid\theta')$ for $\theta \neq \theta'$; i.e. $S_\theta = \{x : f(x\mid\theta) > 0\}$ is independent of $\theta$. Denote
$$M_n(\theta) = \frac{1}{n}\sum_{i=1}^{n}\log\frac{f(X_i\mid\theta)}{f(X_i\mid\theta^*)}.$$
As the MLE $\widehat{\theta}$ maximizes $L(\theta\mid x)$, it also maximizes $M_n(\theta)$. Assume $X_i \overset{\text{i.i.d.}}{\sim} f(x\mid\theta^*)$. Note that by the law of large numbers, $M_n(\theta)$ converges to
$$E_{\theta^*}\left[\log\frac{f(X\mid\theta)}{f(X\mid\theta^*)}\right] = \int f(x\mid\theta^*)\log\frac{f(x\mid\theta)}{f(x\mid\theta^*)}\,dx = -D\left(f(\cdot\mid\theta^*), f(\cdot\mid\theta)\right) := M(\theta).$$
Hence $M_n(\theta) \approx -D(f(\cdot\mid\theta^*), f(\cdot\mid\theta))$, which is maximized at $\theta^*$, so we expect that its maximizer will converge towards $\theta^*$.

21 Example. Assume $f(x) = g(x) = \mathcal{N}(x;0,1)$. We approximate
$$E_f[\log g(X)] = -\frac{1}{2}\log(2\pi) - \frac{1}{2} \approx -1.419$$
through
$$\widehat{E}_f[\log g(X)] = -\frac{1}{2}\log(2\pi) - \frac{1}{2n}\sum_{i=1}^{n} X_i^2.$$
[Table: mean, variance and standard deviation of $\widehat{E}_f[\log g(X)]$ for $n = 10, 100, 1{,}000$ and $10{,}000$, obtained by running 1,000 Monte Carlo trials.]
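A minimal sketch reproducing this experiment, running 1,000 Monte Carlo trials of the plug-in estimate for several sample sizes (the seed and the particular values of $n$ are our own choices):

```python
import numpy as np

rng = np.random.default_rng(5)
exact = -0.5 * np.log(2 * np.pi) - 0.5  # = -1.4189...

for n in (10, 100, 1000, 10000):
    # 1,000 Monte Carlo trials of the plug-in estimate, one row per trial
    x = rng.normal(size=(1000, n))
    est = -0.5 * np.log(2 * np.pi) - 0.5 * (x ** 2).mean(axis=1)
    print(n, est.mean(), est.var(), est.std())

print(exact)
```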

22 Theorem. Suppose that
$$\sup_{\theta\in\Theta}\left|M_n(\theta) - M(\theta)\right| \overset{P}{\to} 0$$
and that, for every $\varepsilon > 0$,
$$\sup_{\theta : |\theta - \theta^*| \ge \varepsilon} M(\theta) < M(\theta^*);$$
then $\widehat{\theta}_n \overset{P}{\to} \theta^*$.

23 Proof. Since $\widehat{\theta}_n$ maximizes $M_n(\theta)$, we have $M_n(\widehat{\theta}_n) \ge M_n(\theta^*)$. Thus,
$$M(\theta^*) - M(\widehat{\theta}_n) = M_n(\theta^*) - M(\widehat{\theta}_n) + M(\theta^*) - M_n(\theta^*) \le M_n(\widehat{\theta}_n) - M(\widehat{\theta}_n) + M(\theta^*) - M_n(\theta^*) \le 2\sup_{\theta}\left|M_n(\theta) - M(\theta)\right| \overset{P}{\to} 0.$$
Thus it implies that for any $\delta > 0$, we have
$$\Pr\left(M(\widehat{\theta}_n) < M(\theta^*) - \delta\right) \to 0.$$
Now for any $\varepsilon > 0$, there exists $\delta > 0$ such that $|\theta - \theta^*| \ge \varepsilon$ implies $M(\theta) < M(\theta^*) - \delta$. Hence,
$$\Pr\left(\left|\widehat{\theta}_n - \theta^*\right| \ge \varepsilon\right) \le \Pr\left(M(\widehat{\theta}_n) < M(\theta^*) - \delta\right) \to 0.$$

24 Asymptotic Normality. Assuming we have $\widehat{\theta} \overset{P}{\to} \theta^*$, what can we say about $\sqrt{n}(\widehat{\theta} - \theta^*)$? Lemma: let $s(x\mid\theta) := \frac{\partial \log f(x\mid\theta)}{\partial\theta}$ be the score function; then we have, for any $\theta$, $E_\theta[s(X\mid\theta)] = 0$. Proof: we have
$$\int \frac{\partial\log f(x\mid\theta)}{\partial\theta}\, f(x\mid\theta)\,dx = \int \frac{\partial f(x\mid\theta)/\partial\theta}{f(x\mid\theta)}\, f(x\mid\theta)\,dx = \int\frac{\partial f(x\mid\theta)}{\partial\theta}\,dx = \frac{\partial}{\partial\theta}\underbrace{\int f(x\mid\theta)\,dx}_{=1} = 0.$$

25 Lemma. We also have
$$\operatorname{var}_\theta[s(X\mid\theta)] = E_\theta\left[s(X\mid\theta)^2\right] = -E_\theta\left[\frac{\partial^2\log f(X\mid\theta)}{\partial\theta^2}\right] := I(\theta).$$
Proof: this follows from
$$\int\frac{\partial\log f(x\mid\theta)}{\partial\theta}\, f(x\mid\theta)\,dx = 0;$$
thus, by taking the derivative once more with respect to $\theta$,
$$0 = \frac{\partial}{\partial\theta}\int\frac{\partial\log f(x\mid\theta)}{\partial\theta}\, f(x\mid\theta)\,dx = \int\frac{\partial^2\log f(x\mid\theta)}{\partial\theta^2}\, f(x\mid\theta)\,dx + \int\frac{\partial\log f(x\mid\theta)}{\partial\theta}\,\frac{\partial f(x\mid\theta)}{\partial\theta}\,dx,$$
and the last integral equals $E_\theta[s(X\mid\theta)^2]$.

26 Heuristic Derivation. We have, for $l(\theta) := \log L(\theta\mid x)$,
$$0 = l'(\widehat{\theta}) \approx l'(\theta^*) + \left(\widehat{\theta} - \theta^*\right)l''(\theta^*) \;\Rightarrow\; \widehat{\theta} - \theta^* \approx \frac{-l'(\theta^*)}{l''(\theta^*)}.$$
That is,
$$\sqrt{n}\left(\widehat{\theta} - \theta^*\right) \approx \frac{\frac{1}{\sqrt{n}}\,l'(\theta^*)}{-\frac{1}{n}\,l''(\theta^*)}.$$
Now remember that $l'(\theta^*) = \sum_{i=1}^{n} s(X_i\mid\theta^*)$, where $E_{\theta^*}[s(X_i\mid\theta^*)] = 0$ and $\operatorname{var}_{\theta^*}[s(X_i\mid\theta^*)] = I(\theta^*)$, so the CLT tells us that
$$\frac{1}{\sqrt{n}}\,l'(\theta^*) \overset{D}{\to} \mathcal{N}\left(0, I(\theta^*)\right).$$

27 Now the law of large numbers yields
$$-\frac{1}{n}\,l''(\theta^*) \overset{P}{\to} I(\theta^*),$$
so by Slutsky's theorem
$$\sqrt{n}\left(\widehat{\theta} - \theta^*\right) \overset{D}{\to} \mathcal{N}\left(0, \frac{1}{I(\theta^*)}\right), \qquad \sqrt{n\,I(\theta^*)}\left(\widehat{\theta} - \theta^*\right) \overset{D}{\to} \mathcal{N}(0,1).$$
Note that you have already seen this expression when establishing the Cramer-Rao bound. It is important to remember that, depending on $\theta$, the parameter can be more or less easy to estimate.

28 Similarly, we can prove that
$$\sqrt{n\,I(\widehat{\theta})}\left(\widehat{\theta} - \theta^*\right) \overset{D}{\to} \mathcal{N}(0,1).$$
We can also prove that
$$\frac{\sqrt{n\,I(\widehat{\theta})}}{g'(\widehat{\theta})}\left(g(\widehat{\theta}) - g(\theta^*)\right) \overset{D}{\to} \mathcal{N}(0,1).$$
This allows us to derive some confidence intervals.
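For instance, for i.i.d. Bernoulli($p$) observations, $I(p) = 1/(p(1-p))$, and the result above yields the usual Wald interval $\widehat{p} \pm z\sqrt{\widehat{p}(1-\widehat{p})/n}$; a minimal sketch on simulated data (parameter choices are our own):

```python
import numpy as np

rng = np.random.default_rng(6)
p_true, n = 0.3, 400
x = rng.binomial(1, p_true, size=n)

p_hat = x.mean()                       # MLE
info = 1.0 / (p_hat * (1.0 - p_hat))   # I(p_hat) for one observation
se = 1.0 / np.sqrt(n * info)           # sqrt(1 / (n I(p_hat)))
z = 1.96                               # N(0,1) quantile for 95% coverage
print(p_hat, (p_hat - z * se, p_hat + z * se))
```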

29 Making the proof more rigorous. We have
$$l'(\widehat{\theta}) = l'(\theta^*) + \left(\widehat{\theta} - \theta^*\right)l''(\theta^*) + \frac{1}{2}\left(\widehat{\theta} - \theta^*\right)^2 l'''(\theta_n^*),$$
where $\theta_n^*$ lies between $\widehat{\theta}$ and $\theta^*$, so that
$$\sqrt{n}\left(\widehat{\theta} - \theta^*\right) = \frac{\frac{1}{\sqrt{n}}\,l'(\theta^*)}{-\frac{1}{n}\,l''(\theta^*) - \frac{1}{2n}\left(\widehat{\theta} - \theta^*\right)l'''(\theta_n^*)}.$$
To prove the result, we need to check that $\frac{1}{2n}(\widehat{\theta} - \theta^*)\,l'''(\theta_n^*) \overset{P}{\to} 0$. As $\widehat{\theta} \overset{P}{\to} \theta^*$, we just need to prove that $\frac{1}{2n}\,l'''(\theta_n^*)$ is bounded (in probability). So we need an additional condition of the form, say,
$$\left|\frac{\partial^3\log f(x\mid\theta)}{\partial\theta^3}\right| \le C(x) \text{ for any } \theta, \text{ with } E_{\theta^*}[C(X)] < \infty.$$

30 Multiparameter Case. The extension to the multiparameter case $\theta = (\theta_1,\dots,\theta_d)$ is straightforward:
$$\sqrt{n}\left(\widehat{\theta} - \theta^*\right) \overset{D}{\to} \mathcal{N}\left(0, J(\theta^*)\right), \quad \text{where } J(\theta^*) = I(\theta^*)^{-1}$$
and
$$\left[I(\theta^*)\right]_{k,l} = -E_{\theta^*}\left[\frac{\partial^2\log f(x\mid\theta)}{\partial\theta_k\,\partial\theta_l}\right].$$
We define $\nabla g := \left(\frac{\partial g}{\partial\theta_1},\dots,\frac{\partial g}{\partial\theta_d}\right)^{T}$; then, if $\nabla g(\theta^*) \neq 0$,
$$\sqrt{n}\left(g(\widehat{\theta}) - g(\theta^*)\right) \overset{D}{\to} \mathcal{N}\left(0, \nabla g(\theta^*)^{T}\, J(\theta^*)\,\nabla g(\theta^*)\right).$$

31 Example: if $X_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\mu,\sigma^2)$ with $\theta = (\mu,\sigma)$, then
$$\log f(x\mid\theta) = \text{cst} - \log\sigma - \frac{(x-\mu)^2}{2\sigma^2}, \qquad s(x\mid\theta) = \left(\frac{x-\mu}{\sigma^2},\; -\frac{1}{\sigma} + \frac{(x-\mu)^2}{\sigma^3}\right).$$
Using $E_\theta[2(X-\mu)/\sigma^3] = 0$ and $E_\theta[3(X-\mu)^2/\sigma^4] = 3/\sigma^2$, we obtain
$$I(\theta) = \begin{pmatrix}\dfrac{1}{\sigma^2} & 0\\[4pt] 0 & \dfrac{2}{\sigma^2}\end{pmatrix}.$$
The MLE of $\mu$ is given by
$$\widehat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i \;\Rightarrow\; \operatorname{var}[\widehat{\mu}] = \frac{\sigma^2}{n},$$
and the MLE is indeed efficient (it reaches the Cramer-Rao lower bound).

32 Assume we observe a vector $X = (X_1,\dots,X_k)$ where $X_j \in \{0,1\}$, $\sum_{j=1}^{k} X_j = 1$, with
$$f(x\mid p_1,\dots,p_{k-1}) = \left(\prod_{j=1}^{k-1} p_j^{x_j}\right) p_k^{x_k},$$
where $p_j > 0$ and $p_k := 1 - \sum_{j=1}^{k-1} p_j < 1$. We have $\theta = (p_1,\dots,p_{k-1})$ and
$$\frac{\partial\log f(x\mid\theta)}{\partial p_j} = \frac{x_j}{p_j} - \frac{x_k}{p_k}, \qquad
\frac{\partial^2\log f(x\mid\theta)}{\partial p_j^2} = -\frac{x_j}{p_j^2} - \frac{x_k}{p_k^2}, \qquad
\frac{\partial^2\log f(x\mid\theta)}{\partial p_j\,\partial p_l} = -\frac{x_k}{p_k^2}, \quad j\neq l < k.$$

33 Recall that $X_j$ has a Bernoulli distribution with mean $p_j$, so
$$I(\theta) = \begin{pmatrix} p_1^{-1} + p_k^{-1} & p_k^{-1} & \cdots & p_k^{-1}\\ p_k^{-1} & p_2^{-1} + p_k^{-1} & & \vdots\\ \vdots & & \ddots & p_k^{-1}\\ p_k^{-1} & \cdots & p_k^{-1} & p_{k-1}^{-1} + p_k^{-1}\end{pmatrix}.$$
One can check that
$$I(\theta)^{-1} = \begin{pmatrix} p_1(1-p_1) & -p_1p_2 & \cdots & -p_1p_{k-1}\\ -p_1p_2 & p_2(1-p_2) & & -p_2p_{k-1}\\ \vdots & & \ddots & \vdots\\ -p_1p_{k-1} & -p_2p_{k-1} & \cdots & p_{k-1}(1-p_{k-1})\end{pmatrix}.$$

34 Now assume we observe $X^1, X^2, \dots, X^n$; then
$$\log L(\theta\mid x) = \sum_{j=1}^{k} t_j\log p_j = \sum_{j=1}^{k-1} t_j\log p_j + t_k\log\left(1 - \sum_{j=1}^{k-1} p_j\right),$$
where $t_j = \sum_{i=1}^{n} x_j^i$. So we have
$$\frac{\partial\log L(\theta\mid x)}{\partial p_j} = \frac{t_j}{p_j} - \frac{t_k}{p_k} = 0 \;\text{ for } j = 1,\dots,k-1 \;\Rightarrow\; \widehat{p}_j = \frac{t_j}{n}.$$
Clearly $t_j$ is Binomial$(n, p_j)$ with variance $n\,p_j(1-p_j)$, so $\widehat{p}_j$ is efficient.

35 Nuisance Parameters. Assume $\theta = (\theta_1,\dots,\theta_d)$ is the parameter vector but only the scalar $\theta_1$ is of interest, whereas $(\theta_2,\dots,\theta_d)$ are nuisance parameters. We want to assess how the asymptotic precision with which we estimate $\theta_1$ is influenced by the presence of nuisance parameters; i.e. if $\widehat{\theta}$ is an efficient estimate of $\theta$, then how does $\widehat{\theta}_1$ as an estimator of $\theta_1$ compare to an efficient estimate of $\theta_1$, say $\widetilde{\theta}_1$, which would assume that all the nuisance parameters are known? Intuitively, we should have $\operatorname{var}[\widetilde{\theta}_1] \le \operatorname{var}[\widehat{\theta}_1]$; i.e. ignorance cannot bring you any advantage.

36 The asymptotic variance of $\sqrt{n}(\widehat{\theta} - \theta^*)$ is $I^{-1}(\theta^*)$, whose $(i,j)$ entry is denoted $\alpha_{i,j}$. The asymptotic variance of $\sqrt{n}(\widetilde{\theta}_1 - \theta^*_1)$ is $1/\gamma_{1,1}$, where $\gamma_{1,1} = [I(\theta^*)]_{1,1}$. Theorem: we have $\alpha_{1,1} \ge 1/\gamma_{1,1}$, with equality if and only if $\alpha_{1,2} = \cdots = \alpha_{1,d} = 0$.

37 Partition $I(\theta)$ as follows:
$$I(\theta) = \begin{pmatrix}\gamma_{1,1} & \rho^{T}\\ \rho & \Sigma\end{pmatrix}.$$
Now we use the fact that
$$I^{-1}(\theta) = \frac{1}{\tau}\begin{pmatrix} 1 & -\rho^{T}\Sigma^{-1}\\ -\Sigma^{-1}\rho & \Sigma^{-1}\rho\rho^{T}\Sigma^{-1} + \tau\Sigma^{-1}\end{pmatrix}, \quad \text{where } \tau = \gamma_{1,1} - \rho^{T}\Sigma^{-1}\rho.$$
As $I(\theta)$ is positive definite, $\Sigma^{-1}$ is positive definite and
$$\alpha_{1,1} = \frac{1}{\tau} \ge \frac{1}{\gamma_{1,1}},$$
with equality iff $\rho = 0$. To show that $\tau > 0$, we use the fact that $I(\theta)$ is p.d. and that $\tau = v^{T} I(\theta)\, v$ where $v^{T} = (1, -\rho^{T}\Sigma^{-1})$.

38 Beyond Maximum Likelihood: Method of Moments. MLE estimates can be difficult to compute; the method of moments is a simple alternative. The obtained estimators are typically not optimal, but they can be used as starting values for more sophisticated methods. For $1 \le j \le d$, define the $j$th moment of $f(x\mid\theta)$, where $\theta = (\theta_1,\dots,\theta_d)$,
$$\alpha_j(\theta) = E_\theta\left[X^j\right] = \int x^j f(x\mid\theta)\,dx,$$
and, given $X = (X_1,\dots,X_n)$, the $j$th sample moment
$$\widehat{\alpha}_j = \frac{1}{n}\sum_{i=1}^{n} X_i^j.$$
The idea of the method of moments is to match the theoretical moments to the sample moments; that is, we define $\widehat{\theta}$ as the value of $\theta$ such that $\alpha_j(\widehat{\theta}) = \widehat{\alpha}_j$ for $j = 1,\dots,d$.

39 Example: let $X_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\mu,\sigma^2)$ with $\theta = (\mu,\sigma^2)$; then
$$\alpha_1(\theta) = \mu, \qquad \alpha_2(\theta) = \sigma^2 + \mu^2, \qquad \widehat{\alpha}_1 = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \widehat{\alpha}_2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2.$$
Thus we obtain
$$\widehat{\mu} = \widehat{\alpha}_1 \quad\text{and}\quad \widehat{\sigma^2} = \widehat{\alpha}_2 - \left(\widehat{\alpha}_1\right)^2.$$
Note that $\widehat{\sigma^2}$ is not unbiased.

40 Assume $X_i \overset{\text{i.i.d.}}{\sim} \mathcal{U}(\theta_1,\theta_2)$ where $-\infty < \theta_1 < \theta_2 < +\infty$; then
$$\alpha_1(\theta) = \frac{\theta_1+\theta_2}{2}, \qquad \alpha_2(\theta) = \frac{\theta_1^2 + \theta_2^2 + \theta_1\theta_2}{3}.$$
Now we solve
$$\theta_1 = 2\widehat{\alpha}_1 - \theta_2, \qquad 3\widehat{\alpha}_2 = \left(2\widehat{\alpha}_1 - \theta_2\right)^2 + \theta_2^2 + \left(2\widehat{\alpha}_1 - \theta_2\right)\theta_2,$$
and obtain $(\theta_2 - \widehat{\alpha}_1)^2 = 3\left(\widehat{\alpha}_2 - \widehat{\alpha}_1^2\right)$. Since $\theta_2 > E(X)$, then
$$\widehat{\theta}_2 = \widehat{\alpha}_1 + \sqrt{3\left(\widehat{\alpha}_2 - \widehat{\alpha}_1^2\right)}, \qquad \widehat{\theta}_1 = \widehat{\alpha}_1 - \sqrt{3\left(\widehat{\alpha}_2 - \widehat{\alpha}_1^2\right)}.$$
Note that $(\widehat{\theta}_1, \widehat{\theta}_2)$ is NOT a function of the sufficient statistics $(X_{(1)}, X_{(n)})$.
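A minimal sketch of this method-of-moments recipe on simulated uniform data (our own parameter choices), also printing the sufficient statistics $(X_{(1)}, X_{(n)})$ that the estimator ignores:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(2.0, 5.0, size=2000)

a1 = x.mean()
a2 = (x ** 2).mean()
half_width = np.sqrt(3.0 * (a2 - a1 ** 2))
print(a1 - half_width, a1 + half_width)  # theta1_hat, theta2_hat
print(x.min(), x.max())                  # sufficient statistics, ignored by MoM
```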

41 Assume $X_i \overset{\text{i.i.d.}}{\sim} \operatorname{Bin}(k, p)$ with parameters $k \in \mathbb{N}$ and $p \in (0,1)$. We have
$$\alpha_1(\theta) = kp, \qquad \alpha_2(\theta) = kp(1-p) + k^2p^2.$$
Thus we obtain
$$\widehat{p} = \frac{\widehat{\alpha}_1 + \widehat{\alpha}_1^2 - \widehat{\alpha}_2}{\widehat{\alpha}_1}, \qquad \widehat{k} = \frac{\widehat{\alpha}_1^2}{\widehat{\alpha}_1 + \widehat{\alpha}_1^2 - \widehat{\alpha}_2}.$$
The estimator $\widehat{p} \in (0,1)$, but $\widehat{k}$ is generally not an integer.

42 Statistical Properties of the Estimate. Let $\widehat{\alpha} = (\widehat{\alpha}_1,\dots,\widehat{\alpha}_d)$; the moment equations read $\alpha = h(\theta)$, and if the inverse function $g = h^{-1}$ exists, then $\widehat{\theta} = g(\widehat{\alpha})$. If $g$ is continuous at $\alpha = (\alpha_1,\dots,\alpha_d)$, then $\widehat{\theta}$ is a consistent estimate of $\theta$, as $\widehat{\alpha}_j \to \alpha_j$. Moreover, if $g$ is differentiable at $\alpha$ and $E_\theta\left[X^{2d}\right] < \infty$, then
$$\sqrt{n}\left(\widehat{\theta} - \theta\right) \overset{D}{\to} \mathcal{N}\left(0, \nabla g(\alpha)^{T}\, V_\alpha\, \nabla g(\alpha)\right), \quad \text{where } V_\alpha[i,j] = \alpha_{i+j} - \alpha_i\alpha_j.$$

43 The result follows from $\sqrt{n}(\widehat{\alpha}_n - \alpha) \overset{D}{\to} \mathcal{N}(0, V_\alpha)$, as
$$E_\theta[\widehat{\alpha}_{j,n}] = \alpha_j, \qquad n\operatorname{cov}[\widehat{\alpha}_{i,n}, \widehat{\alpha}_{j,n}] = \alpha_{i+j} - \alpha_i\alpha_j.$$
We have $\widehat{\theta} - \theta = g(\widehat{\alpha}_n) - g(\alpha)$, so using the delta method,
$$\sqrt{n}\left(\widehat{\theta} - \theta\right) \overset{D}{\to} \mathcal{N}\left(0, \nabla g(\alpha)^{T}\, V_\alpha\, \nabla g(\alpha)\right).$$

44 We can also establish the first-order asymptotic bias of the estimate, as
$$g(\widehat{\alpha}_n) = g(\alpha) + \nabla g(\alpha)^{T}(\widehat{\alpha}_n - \alpha) + \frac{1}{2}(\widehat{\alpha}_n - \alpha)^{T}\nabla^2 g(\alpha)(\widehat{\alpha}_n - \alpha) + o\left(\frac{1}{n}\right),$$
where $\sqrt{n}(\widehat{\alpha}_n - \alpha) \overset{D}{\to} Z_\Sigma$ with $Z_\Sigma \sim \mathcal{N}(0,\Sigma)$, so
$$n\,(\widehat{\alpha}_n - \alpha)^{T}\nabla^2 g(\alpha)(\widehat{\alpha}_n - \alpha) \overset{D}{\to} Z_\Sigma^{T}\,\nabla^2 g(\alpha)\, Z_\Sigma;$$
recall that $X_n \overset{D}{\to} X$ implies $\varphi(X_n) \overset{D}{\to} \varphi(X)$ for continuous $\varphi$. Thus we have
$$E[g(\widehat{\alpha}_n)] = g(\alpha) + \frac{1}{2n}\,E\left[Z_\Sigma^{T}\,\nabla^2 g(\alpha)\, Z_\Sigma\right] + o\left(\frac{1}{n}\right) = g(\alpha) + \frac{\operatorname{tr}\left(\nabla^2 g(\alpha)\,\Sigma\right)}{2n} + o\left(\frac{1}{n}\right).$$

45 Beyond Maximum Likelihood: Pseudo-Likelihood. Assume $X = (X_1,\dots,X_q) \sim f(x\mid\theta)$. Given $n$ observations $X^i \sim f(x\mid\theta)$, the MLE requires maximizing $L(\theta\mid x)$. However, in some problems it might be difficult to specify $f(x\mid\theta)$, and we may only be able to specify, say, the marginals $f(x_s\mid\theta)$ and the pairs $f(x_k, x_l\mid\theta)$ for $1 \le k < l \le q$. Based on this information and the observations, we can define the pseudo-log-likelihood functions
$$l_1(\theta\mid x) = \sum_{i=1}^{n} l_1(\theta\mid x^i), \qquad l_1(\theta\mid x^i) = \sum_{s=1}^{q}\log f(x_s^i\mid\theta),$$
$$l_2(\theta\mid x) = \sum_{i=1}^{n} l_2(\theta\mid x^i), \qquad l_2(\theta\mid x^i) = \sum_{s=1}^{q}\sum_{t=s+1}^{q}\log f(x_s^i, x_t^i\mid\theta) + \alpha\, l_1(\theta\mid x^i).$$
These pseudo-likelihood functions are simpler than the full likelihood.

46 Under regularity conditions very similar to the ones for the MLE, solving $l_k'(\theta\mid x) = 0$ for $k = 1, 2$ provides consistent estimates (the estimating equations are unbiased). To derive the asymptotic variance, we use
$$0 = l_k'(\widehat{\theta}) \approx l_k'(\theta^*) + \left(\widehat{\theta} - \theta^*\right)l_k''(\theta^*) \;\Rightarrow\; \sqrt{n}\left(\widehat{\theta} - \theta^*\right) = \frac{\frac{1}{\sqrt{n}}\,l_k'(\theta^*)}{-\frac{1}{n}\,l_k''(\theta^*)},$$
where $\frac{1}{n}\,l_k''(\theta^*) \overset{P}{\to} E_{\theta^*}[l_k'']$ and $\frac{1}{\sqrt{n}}\,l_k'(\theta^*) \overset{D}{\to} \mathcal{N}\left(0, E_{\theta^*}\left[l_k'^2\right]\right)$; thus
$$\sqrt{n}\left(\widehat{\theta} - \theta^*\right) \overset{D}{\to} \mathcal{N}\left(0, E_{\theta^*}\left[l_k''\right]^{-2} E_{\theta^*}\left[l_k'^2\right]\right),$$
with $l_k'$, $l_k''$ denoting the per-observation quantities. We have the estimates
$$\widehat{E}_{\theta^*}\left[l_k''\right] = \frac{1}{n}\sum_{i=1}^{n} l_k''\left(\widehat{\theta}\mid x^i\right), \qquad \widehat{E}_{\theta^*}\left[l_k'^2\right] = \frac{1}{n}\sum_{i=1}^{n} l_k'^2\left(\widehat{\theta}\mid x^i\right).$$

47 Example. Assume that $X = (X_1,\dots,X_q) \sim \mathcal{N}(0,\Sigma)$ where $[\Sigma](i,j) = 1$ if $i = j$ and $\rho$ otherwise. We are interested in estimating $\theta = \rho$. There is no information about $\rho$ in $l_1(\theta\mid x)$, so we use $l_2(\theta\mid x)$ with $\alpha = 0$. For observations $X^1,\dots,X^n$, we have
$$l_2(\theta\mid x) = -\frac{nq(q-1)}{4}\log\left(1-\rho^2\right) - \frac{(q-1)(1-\rho)}{2(1-\rho^2)}\,\frac{SS_B}{q} - \frac{q-1+\rho}{2(1-\rho^2)}\,SS_W,$$
where
$$SS_W = \sum_{i=1}^{n}\sum_{s=1}^{q}\left(X_s^i - \frac{1}{q}\sum_{t=1}^{q} X_t^i\right)^2, \qquad SS_B = \sum_{i=1}^{n}\left(\sum_{t=1}^{q} X_t^i\right)^2.$$

48 After simple but tedious calculations, we obtain for the asymptotic variance of $\sqrt{n}\,(\widehat{\rho} - \rho)$
$$\frac{2\left(1-\rho\right)^2 c(q,\rho)}{q(q-1)\left(1+\rho^2\right)^2}, \quad \text{where } c(q,\rho) = (1-\rho)^2\left(1+3\rho^2\right) + q\rho\left(2 - 3\rho + 8\rho^2 - 3\rho^3\right) + q^2\rho^2(1-\rho)^2,$$
whereas for the MLE we have
$$\frac{2\left(1+(q-1)\rho\right)^2(1-\rho)^2}{q(q-1)\left(1+(q-1)\rho^2\right)}.$$
The ratio is 1 for $q = 2$, as expected, and also 1 if $\rho = 0$ or $1$. For any other values, there is a loss of efficiency for $l_2(\theta\mid x)$, which increases as $q \to \infty$.

49 Consider the following time series:
$$X_0 \sim \mathcal{N}\left(0, \frac{\sigma^2}{1-\alpha^2}\right), \qquad X_n = \alpha X_{n-1} + \sigma V_n \quad\text{where } V_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$
where the parameter of interest is $\theta = \sigma^2$. We can show that we have, for any $i = 0, 1, \dots, n$,
$$f(x_i\mid\theta) = \sqrt{\frac{1-\alpha^2}{2\pi\sigma^2}}\exp\left(-\frac{(1-\alpha^2)\,x_i^2}{2\sigma^2}\right),$$
and we consider
$$l_1(\theta\mid x) = \sum_{i}\log f(x_i\mid\theta) = \text{cste} - \frac{n}{2}\log\frac{\sigma^2}{1-\alpha^2} - \frac{1-\alpha^2}{2\sigma^2}\sum_{i} x_i^2.$$
This pseudo-likelihood can easily be maximized:
$$\widehat{\sigma^2} = \frac{1-\alpha^2}{n}\sum_{i} x_i^2.$$
If one is interested in estimating $\alpha$, it would be necessary to introduce $f(x_i, x_{i+1}\mid\theta)$.

50 Pseudo-likelihood has been widely used for Markov random fields since its introduction by Besag (1975). In the Gaussian context, we have $X = (X_1,\dots,X_d)$ where $d$ is extremely large, and the model is specified by
$$E_\theta\left[X_i\mid x_{-i}\right] = \lambda\sum_{j} H_{ij}x_j, \qquad \operatorname{var}_\theta\left[X_i\mid x_{-i}\right] = \kappa.$$
Computing the likelihood for $\theta = (\lambda,\kappa)$ can be too computationally intensive, so the pseudo-likelihood is defined through
$$\widetilde{l}(\theta\mid x) = \sum_{i=1}^{d}\log f\left(x_i\mid\theta, x_{-i}\right),$$
thus
$$\widehat{\lambda} = \frac{x^{T}Hx}{x^{T}H^2x}, \qquad \widehat{\kappa} = \frac{1}{d}\left(x^{T}x - \frac{\left(x^{T}Hx\right)^2}{x^{T}H^2x}\right).$$
In this context, it can be shown that the estimate is consistent and has a reasonable asymptotic variance.
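A minimal sketch of these two estimators; the neighbourhood matrix $H$ below (a symmetric chain with zero diagonal) is our own arbitrary choice, just to exercise the formulas:

```python
import numpy as np

rng = np.random.default_rng(8)
d = 200
# Symmetric neighbourhood matrix H with zero diagonal (our own choice;
# the slides do not fix a particular H)
H = np.zeros((d, d))
idx = np.arange(d - 1)
H[idx, idx + 1] = H[idx + 1, idx] = 1.0

x = rng.normal(size=d)  # stand-in for one observed field

xHx = x @ H @ x
xH2x = x @ H @ H @ x
lam_hat = xHx / xH2x
kappa_hat = (x @ x - xHx ** 2 / xH2x) / d
print(lam_hat, kappa_hat)
```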

51 Summary of Pseudo Likelihood. In many applications, the log-likelihood $l(\theta; y_{1:n})$ is very complex to compute. Instead, we maximize a surrogate function $l_S(\theta; y_{1:n})$. If possible, we pick this function such that, if $\theta^*$ is the true parameter, then $E_{\theta^*}\left(l_S(\theta; Y_{1:n})\right)$ is maximized for $\theta = \theta^*$, and solving
$$\nabla l_S\left(\widehat{\theta}; Y_{1:n}\right) = 0$$
is easy.

52 Under regularity assumptions, we have
$$\sqrt{n}\left(\widehat{\theta} - \theta^*\right) \Rightarrow \mathcal{N}\left(0, G^{-1}(\theta^*)\right), \quad \text{where } G^{-1}(\theta) = H^{-1}(\theta)\,J(\theta)\,H^{-T}(\theta)$$
with
$$J(\theta) = \mathbb{V}\left\{\nabla l_S(\theta; Y_{1:n})\right\}, \qquad H(\theta) = E\left[-\nabla^2 l_S(\theta; Y_{1:n})\right].$$
When $l_S(\theta; Y_{1:n}) = l(\theta; Y_{1:n})$ and the model is correctly specified, then $G(\theta)$ is the Fisher information matrix. When $l_S(\theta; Y_{1:n}) \neq l(\theta; Y_{1:n})$, we typically lose in terms of efficiency.

53 Application to General State-Space Models. Consider the following general state-space model. Let $\{X_k\}_{k\ge 1}$ be a Markov process defined by $X_1 \sim \mu_\theta$ and $X_k\mid(X_{k-1} = x_{k-1}) \sim f_\theta(\cdot\mid x_{k-1})$. Then we have, for any $n > 0$,
$$p_\theta(x_{1:n}) = p_\theta(x_1)\prod_{k=2}^{n} p_\theta(x_k\mid x_{1:k-1}) = \mu_\theta(x_1)\prod_{k=2}^{n} f_\theta(x_k\mid x_{k-1}).$$

54 We are interested in estimating $\theta$ from the data, but we do not have access to $\{X_k\}_{k\ge 1}$. We only have access to a process $\{Y_k\}_{k\ge 1}$ such that, conditional upon $\{X_k\}_{k\ge 1}$, the observations are statistically independent and $Y_k\mid(X_k = x_k) \sim g_\theta(\cdot\mid x_k)$. That is, we have, for any $n > 0$,
$$p_\theta(y_{1:n}\mid x_{1:n}) = \prod_{k=1}^{n} p_\theta(y_k\mid x_k) = \prod_{k=1}^{n} g_\theta(y_k\mid x_k).$$

55 Examples. Linear Gaussian model: consider, say for $|\alpha| < 1$,
$$X_1 \sim \mathcal{N}\left(0, \frac{\sigma^2}{1-\alpha^2}\right), \qquad X_k = \alpha X_{k-1} + \sigma V_k \;\text{ where } V_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$
$$Y_k = \beta + X_k + \tau W_k \;\text{ where } W_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1).$$
In this case we have, say, $\theta = (\beta, \sigma^2, \alpha, \tau^2)$ and
$$f_\theta(x_k\mid x_{k-1}) = \mathcal{N}\left(x_k;\, \alpha x_{k-1},\, \sigma^2\right), \qquad g_\theta(y_k\mid x_k) = \mathcal{N}\left(y_k;\, \beta + x_k,\, \tau^2\right).$$

56 Stochastic Volatility model: consider, say for $|\alpha| < 1$,
$$X_1 \sim \mathcal{N}\left(0, \frac{\sigma^2}{1-\alpha^2}\right), \qquad X_k = \alpha X_{k-1} + \sigma V_k \;\text{ where } V_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$
$$Y_k = \beta\exp\left(X_k/2\right)W_k \;\text{ where } W_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1).$$
In this case we have, say, $\theta = (\beta, \sigma^2, \alpha)$ and
$$f_\theta(x_k\mid x_{k-1}) = \mathcal{N}\left(x_k;\, \alpha x_{k-1},\, \sigma^2\right), \qquad g_\theta(y_k\mid x_k) = \mathcal{N}\left(y_k;\, 0,\, \beta^2\exp(x_k)\right).$$

57 In this case, the likelihood of the observations $y_{1:n}$ is given by
$$p_\theta(y_{1:n}) = \int p_\theta(x_{1:n}, y_{1:n})\,dx_{1:n} = \int p_\theta(y_{1:n}\mid x_{1:n})\,p_\theta(x_{1:n})\,dx_{1:n} = \int\left(\prod_{k=1}^{n} g_\theta(y_k\mid x_k)\right)\left(\mu_\theta(x_1)\prod_{k=2}^{n} f_\theta(x_k\mid x_{k-1})\right)dx_{1:n}.$$
If the model is linear Gaussian or has a finite state-space, we can compute the likelihood in closed form, but the maximization is not trivial. Otherwise, we cannot.

58 Pairwise likelihood for state-space models. We consider the following pseudo-likelihood, for $m \ge 1$:
$$L_S(\theta; y_{1:n}) = \prod_{i=1}^{n-1}\prod_{j=i+1}^{\min(i+m,\,n)} p_\theta(y_i, y_j),$$
where
$$p_\theta(y_i, y_j) = \int g_\theta(y_i\mid x_i)\,g_\theta(y_j\mid x_j)\,p_\theta(x_i, x_j)\,dx_i\,dx_j.$$
As an alternative, if $n = pm$, we could maximize
$$L_S(\theta; y_{1:n}) = \prod_{i=1}^{p} p_\theta\left(y_{(i-1)m+1:im}\right).$$

59 For the two models discussed earlier, it is possible to compute $p_\theta(x_i, x_j)$ exactly, as $p_\theta(x_i, x_j) = p_\theta(x_i)\,p_\theta(x_j\mid x_i)$ where
$$p_\theta(x_i) = \mathcal{N}\left(x_i;\, 0,\, \frac{\sigma^2}{1-\alpha^2}\right), \qquad p_\theta(x_j\mid x_i) = \mathcal{N}\left(x_j;\; \alpha^{j-i}x_i,\; \sigma^2\sum_{k=0}^{j-i-1}\alpha^{2k}\right).$$
In a general case, we could approximate $p_\theta(y_i, y_j)$ through Monte Carlo:
$$\widehat{p}_\theta(y_i, y_j) = \frac{1}{N}\sum_{l=1}^{N} g_\theta\left(y_i\mid X_i^l\right) g_\theta\left(y_j\mid X_j^l\right), \quad \text{where } \left(X_i^l, X_j^l\right) \sim p_\theta(x_i, x_j).$$
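A minimal sketch of this Monte Carlo approximation for the stochastic volatility model of slide 56, sampling $(X_i, X_j)$ from its exact joint law (function names and parameter values are our own):

```python
import numpy as np

def phi(y, mean, var):
    # Univariate normal density
    return np.exp(-(y - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def p_hat_pair(y_i, y_j, lag, alpha, sigma2, beta, N=100000, seed=0):
    """Monte Carlo estimate of p_theta(y_i, y_j) for the stochastic
    volatility model, sampling (X_i, X_j) exactly."""
    rng = np.random.default_rng(seed)
    var_stat = sigma2 / (1 - alpha ** 2)                    # stationary variance
    # sigma^2 * sum_{k=0}^{lag-1} alpha^{2k} in closed form:
    var_cond = sigma2 * (1 - alpha ** (2 * lag)) / (1 - alpha ** 2)
    x_i = rng.normal(0.0, np.sqrt(var_stat), size=N)
    x_j = alpha ** lag * x_i + rng.normal(0.0, np.sqrt(var_cond), size=N)
    # g_theta(y | x) = N(y; 0, beta^2 exp(x))
    return np.mean(phi(y_i, 0.0, beta ** 2 * np.exp(x_i))
                   * phi(y_j, 0.0, beta ** 2 * np.exp(x_j)))

print(p_hat_pair(0.2, -0.1, lag=3, alpha=0.95, sigma2=0.55 ** 2, beta=0.1))
```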

60 Under regularity assumptions, we have that
$$\int l_S(\theta; y_{1:n})\,p_{\theta^*}(y_{1:n})\,dy_{1:n}$$
is maximized at $\theta^*$, so maximizing this pseudo-likelihood makes sense. To prove it, note that
$$l_S(\theta; y_{1:n}) = \sum_{i=1}^{n-1}\sum_{j=i+1}^{\min(i+m,\,n)}\log p_\theta(y_i, y_j)$$
and
$$\int\log p_\theta(y_i, y_j)\,p_{\theta^*}(y_{1:n})\,dy_{1:n} = \int\log p_\theta(y_i, y_j)\,p_{\theta^*}(y_i, y_j)\,dy_i\,dy_j,$$
which is maximized at $\theta = \theta^*$ (by the Kullback-Leibler argument above).

61 Application. Consider
$$X_1 \sim \mathcal{N}\left(0, \frac{\sigma^2}{1-\alpha^2}\right), \qquad X_k = \alpha X_{k-1} + \sigma V_k \;\text{ where } V_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$
$$Y_k = \beta + X_k + \tau W_k \;\text{ where } W_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$
where $\theta = (\beta, \sigma^2, \alpha, \tau^2)$. In this case, we can directly establish not only $p_\theta(x_i, x_j)$ but $p_\theta(y_i, y_j)$:
$$\begin{pmatrix} Y_i\\ Y_j\end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix}\beta\\ \beta\end{pmatrix}, \begin{pmatrix}\tau^2 + \frac{\sigma^2}{1-\alpha^2} & \alpha^{j-i}\frac{\sigma^2}{1-\alpha^2}\\ \alpha^{j-i}\frac{\sigma^2}{1-\alpha^2} & \tau^2 + \frac{\sigma^2}{1-\alpha^2}\end{pmatrix}\right).$$
For $m = 2,\dots,20$, we compare the performance of $\widehat{\theta}_{\text{MLE}}$ and $\widehat{\theta}_{\text{MPL}}$, where the likelihood and pseudo-likelihood are maximized using a simple gradient algorithm (EM could be used). 1,000 time series of length $n = 500$ with $\beta = 0.1$, $\tau = 1.0$, $\alpha = 0.95$ and $\sigma = 0.55$ are simulated.
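A minimal sketch of the pairwise log-likelihood for this model, written directly from the bivariate normal above (the function name and code organization are our own):

```python
import numpy as np

def pairwise_loglik(theta, y, m):
    """Pairwise log-likelihood for the linear Gaussian model, using the
    exact bivariate normal law of (Y_i, Y_j)."""
    beta, sigma2, alpha, tau2 = theta
    v = tau2 + sigma2 / (1 - alpha ** 2)  # common variance of each Y_i
    n, total = len(y), 0.0
    for i in range(n - 1):
        for j in range(i + 1, min(i + m, n - 1) + 1):
            c = alpha ** (j - i) * sigma2 / (1 - alpha ** 2)  # cov(Y_i, Y_j)
            det = v ** 2 - c ** 2
            u, w = y[i] - beta, y[j] - beta
            quad = (v * u ** 2 - 2 * c * u * w + v * w ** 2) / det
            total += -np.log(2 * np.pi) - 0.5 * np.log(det) - 0.5 * quad
    return total
```

Feeding the negative of this function to a generic numerical optimizer then mirrors the gradient-based maximization described above.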

62 Results (the values in parentheses are the standard deviations of the estimates over the 1,000 replications):

| parameter (true value) | $\widehat{\theta}_{\text{MPL}}^{(2)}$ | $\widehat{\theta}_{\text{MPL}}^{(6)}$ | $\widehat{\theta}_{\text{MPL}}^{(12)}$ | $\widehat{\theta}_{\text{MPL}}^{(20)}$ | $\widehat{\theta}_{\text{ML}}$ |
| $\beta$ (0.1)   | (0.488) | (0.489) | (0.4908) | (0.492) | (0.481) |
| $\tau$ (1.0)    | (0.066) | (0.048) | (0.054)  | (0.068) | (0.046) |
| $\alpha$ (0.95) | (0.033) | (0.020) | (0.022)  | (0.024) | (0.020) |
| $\sigma$ (0.55) | (0.160) | (0.064) | (0.072)  | (0.087) | (0.061) |

63 Now you should... be able to compute the MLE for rather complex models; be able to compute the asymptotic variance of the MLE; be able to derive the expression of the asymptotic variance of simple estimates different from the MLE.
