Inference in non-linear time series
1 Intro LS MLE Other. Erik Lindström, Centre for Mathematical Sciences, Lund University, LU/LTH & DTU
2 Overview
Introduction: General properties, Estimators
Least squares
Maximum Likelihood: MLE asymptotics, Two theorems, Fisher information, Proofs
Other estimators
3 General
We are interested in estimating a parameter vector $\theta_0$ from data $X$. Ad hoc or formal estimation?
Properties. Definition: $\hat\theta = T(X)$. Interpretation.
4 Properties
Bias: $b = \theta_0 - E[T(X)]$.
Asymptotic bias: $\lim_{N\to\infty} (\theta_0 - E[T(X)])$.
Consistency: $\hat\theta \xrightarrow{p} \theta_0$, i.e. $P(|\hat\theta - \theta_0| > \varepsilon) \to 0$ as $N \to \infty$.
Strong consistency: $\hat\theta \xrightarrow{a.s.} \theta_0$.
Efficiency: $\mathrm{Var}[T(X)] \ge I^{-1}(\theta_0)$, where $I(\theta_0)$ is the Fisher information matrix.
Asymptotic normality: convergence $N^{\alpha}(\hat\theta - \theta_0) \xrightarrow{d} N(\mu, \Sigma)$.
5 Popular estimators
How are the estimates computed?
Optimization: Least squares (LS), Weighted Least Squares (WLS), Prediction Error Methods (PEM), Generalized Method of Moments (GMM), etc.
Solving equations: Estimation functions (EFs), Method of moments (MM).
Either: Bayesian methods (MCMC), Maximum Likelihood, Quasi Maximum Likelihood.
6 Least squares
Observations $y_1, \ldots, y_N$. Predictors $\sum_{j=1}^J B_j(x)\theta_j$. Vector form: $Y = Z\theta + \varepsilon$, $\varepsilon \sim (0, \Omega)$.
(Weighted) parameter estimation:
$\hat\theta_{LS} = \operatorname{argmin}_{\theta\in\Theta} \sum_{n=1}^N \lambda_n \big(y_n - \sum_j B_j(x_n)\theta_j\big)^2$ (1)
$= \operatorname{argmin}_{\theta\in\Theta} (Y - Z\theta)^T W (Y - Z\theta)$ (2)
Generalization (penalized least squares):
$\hat\theta_{PLS} = \operatorname{argmin}_{\theta\in\Theta} (Y - Z\theta)^T W (Y - Z\theta) + (\theta - \theta_0)^T D (\theta - \theta_0)$ (3)
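The two ways of writing the weighted criterion, (1) and (2), are the same quantity with $W = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$. A small numerical sketch (toy data; all sizes and names here are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression design: N observations, J basis functions.
N, J = 50, 3
Z = rng.normal(size=(N, J))            # Z[n, j] = B_j(x_n)
theta = np.array([1.0, -2.0, 0.5])
Y = Z @ theta + rng.normal(size=N)
lam = rng.uniform(0.5, 2.0, size=N)    # weights lambda_n
W = np.diag(lam)

def weighted_sum_form(th):
    """Form (1): sum_n lambda_n * (y_n - sum_j B_j(x_n) theta_j)^2."""
    resid = Y - Z @ th
    return float(np.sum(lam * resid**2))

def matrix_form(th):
    """Form (2): (Y - Z theta)^T W (Y - Z theta)."""
    resid = Y - Z @ th
    return float(resid @ W @ resid)

th_try = np.array([0.9, -1.8, 0.7])
```

Evaluating both forms at any trial value of $\theta$ gives the same number, which is why the matrix form can be used for all the variance calculations that follow.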
8 Estimate
$\hat\theta = (Z^T W Z)^{-1} Z^T W Y$ (4)
Bias? No, not if the correct model is used:
$E[\hat\theta] = E\big[(Z^T W Z)^{-1} Z^T W (Z\theta + \varepsilon)\big]$ (5)
$= \theta + (Z^T W Z)^{-1} Z^T W\, E[\varepsilon] = \theta$ (6)
Variance:
$\mathrm{Var}[\hat\theta] = (Z^T W Z)^{-1} (Z^T W \Omega W Z)(Z^T W Z)^{-1}$ (7)
This simplifies if $W = \Omega^{-1}$ and $\Omega = \sigma^2 I$. We then get
$\mathrm{Var}[\hat\theta] = \sigma^2 (Z^T Z)^{-1}$. (8)
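The unbiasedness and variance claims in (4)-(8) can be checked by simulation. A minimal Monte Carlo sketch for the case $W = I$, $\Omega = \sigma^2 I$ (toy data, my own choice of sizes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check: theta_hat = (Z^T W Z)^{-1} Z^T W Y is unbiased and,
# with W = I and Omega = sigma^2 I, has Var[theta_hat] = sigma^2 (Z^T Z)^{-1}.
N, J, sigma = 40, 2, 0.5
Z = rng.normal(size=(N, J))
theta0 = np.array([2.0, -1.0])
W = np.eye(N)

def wls_estimate(Y):
    # theta_hat = (Z^T W Z)^{-1} Z^T W Y, via a linear solve
    return np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ Y)

reps = 2000
est = np.array([wls_estimate(Z @ theta0 + sigma * rng.normal(size=N))
                for _ in range(reps)])

emp_mean = est.mean(axis=0)                      # should be close to theta0
emp_cov = np.cov(est.T)                          # should match sigma^2 (Z^T Z)^{-1}
theory_cov = sigma**2 * np.linalg.inv(Z.T @ Z)
```

The empirical mean of the replicated estimates matches $\theta_0$ and the empirical covariance matches (8), up to Monte Carlo error.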
12 F-tests
We estimate $\hat\sigma^2 = (Y - Z\hat\theta)^T (Y - Z\hat\theta)/(N - J)$.
Define $Q(\hat\theta) = (Y - Z\hat\theta)^T (Y - Z\hat\theta) = Y^T (I-P)^T (I-P) Y$, where $P = Z(Z^T Z)^{-1} Z^T$ is the projection matrix.
We approximate $Q$ by $E[Q] = \mathrm{tr}\big((I-P)^T (I-P)\, \mathrm{Cov}[Y, Y]\big)$.
It holds for the standard model that $\mathrm{Cov}[Y, Y] = \sigma^2 I$ and $(I-P)^T (I-P) = (I-P)$. It then follows that
$\mathrm{tr}\big((I-P)^T (I-P)\, \mathrm{Cov}[Y, Y]\big) = \sigma^2\, \mathrm{tr}(I-P)$ (9)
$= \sigma^2\, \mathrm{tr}(I) - \sigma^2\, \mathrm{tr}(P)$ (10)
$= \sigma^2 N - \sigma^2\, \mathrm{tr}\big(Z(Z^T Z)^{-1} Z^T\big)$ (11)
$= \sigma^2 N - \sigma^2\, \mathrm{tr}\big((Z^T Z)^{-1} (Z^T Z)\big) = \sigma^2 (N - J)$ (12)
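The trace identity behind (9)-(12) is easy to verify numerically for any design matrix. A sketch with an illustrative toy $Z$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Check of tr(I - P) = N - J for the projection P = Z (Z^T Z)^{-1} Z^T.
N, J = 30, 4
Z = rng.normal(size=(N, J))
P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
M = np.eye(N) - P

trace_M = np.trace(M)                       # should equal N - J
idempotent_gap = np.linalg.norm(M @ M - M)  # (I - P) is idempotent and symmetric
```

This is exactly why dividing the residual sum of squares by $N - J$ (rather than $N$) makes $\hat\sigma^2$ unbiased.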
13 What about the penalized asymptotics?
What if the estimate is given by
$\hat\theta_{PLS} = \operatorname{argmin}_{\theta} (Y - Z\theta)^T W (Y - Z\theta) + (\theta - \theta_0)^T D (\theta - \theta_0)$? (13)
Derive on blackboard. Another popular form is the adaptive LASSO:
$\hat\theta_{\text{Adaptive LASSO}} = \operatorname{argmin}_{\theta} (Y - Z\theta)^T W (Y - Z\theta) + \|D(\theta - \theta_0)\|_1$ (14)
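For the quadratic penalty in (13), the blackboard derivation amounts to setting the gradient of the criterion to zero. A sketch of the resulting closed form, assuming that derivation ($D$, the shrinkage target, and the data are illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(3)

N, J = 25, 3
Z = rng.normal(size=(N, J))
Y = Z @ np.array([1.0, 0.0, -1.0]) + 0.3 * rng.normal(size=N)
W = np.eye(N)
D = 2.0 * np.eye(J)          # quadratic (ridge-type) penalty matrix
theta_prior = np.zeros(J)    # the theta_0 the penalty shrinks towards

# Setting the gradient of (Y - Z th)^T W (Y - Z th) + (th - th0)^T D (th - th0)
# to zero gives the linear system (Z^T W Z + D) th = Z^T W Y + D th0.
theta_pls = np.linalg.solve(Z.T @ W @ Z + D, Z.T @ W @ Y + D @ theta_prior)

# Sanity check: the gradient of the penalized criterion vanishes at theta_pls.
grad = -2 * Z.T @ W @ (Y - Z @ theta_pls) + 2 * D @ (theta_pls - theta_prior)
```

No such closed form exists for the adaptive LASSO in (14); the $\ell_1$ penalty is not differentiable at zero, which is what produces exact zeros in the estimate.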
15 Maximum likelihood estimators
Defined as $\hat\theta_{MLE} = \operatorname{argmax}_{\theta\in\Theta} p_\theta(X_1, \ldots, X_N)$.
The MLE uses all information in the data, and has nice properties:
Ia. Consistency: $\hat\theta \xrightarrow{p} \theta_0$.
Ib. Asymptotic normality: we have under general conditions that $\sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} N(0, I^{-1}(\theta_0))$.
II. Efficiency: $\mathrm{Var}[T(X)] = I^{-1}(\theta_0)$ is attained.
III. Invariance.
16 Two useful theorems
Assume $X_i$ iid with $E[X_i] = \mu$, $\mathrm{Var}[X_i] = \sigma^2$.
Law of Large Numbers: $\frac{1}{N}\sum_{i=1}^N X_i \xrightarrow{p/a.s.} \mu$.
Central Limit Theorem: $\frac{1}{\sqrt{N}}\sum_{i=1}^N \frac{X_i - \mu}{\sigma} \xrightarrow{d} Z$, where $Z \sim N(0, 1)$.
These theorems can be generalized further.
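Both theorems can be seen directly in simulation. A sketch with iid Gaussian data (toy parameters of my own choosing):

```python
import math
import random
import statistics

random.seed(0)

mu, sigma = 3.0, 2.0

def sample_mean(N):
    return sum(random.gauss(mu, sigma) for _ in range(N)) / N

# LLN: sample means with large N cluster tightly around mu.
means = [sample_mean(2000) for _ in range(200)]

# CLT: sqrt(N) * (sample mean - mu) / sigma is approximately N(0, 1).
N = 500
z = [math.sqrt(N) * (sample_mean(N) - mu) / sigma for _ in range(2000)]
z_mean = statistics.mean(z)
z_std = statistics.stdev(z)
```

The normalized quantities have mean close to 0 and standard deviation close to 1, as the CLT predicts; the unscaled means sit right on $\mu$.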
17 Fisher information
The Fisher information matrix is defined as
$I(\theta) = \mathrm{Var}\big[\tfrac{\partial}{\partial\theta} \log p_\theta(X)\big]$,
which is equivalent to
$E\big[\big(\tfrac{\partial}{\partial\theta} \log p_\theta(X)\big)^2\big]$ and $-E\big[\tfrac{\partial^2}{\partial\theta^2} \log p_\theta(X)\big]$.
Note that $E\big[\tfrac{\partial}{\partial\theta} \log p_\theta(X)\big] = 0$.
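The three equivalent expressions for $I(\theta)$ can be checked by simulation for a case where the information is known in closed form. A sketch for a single Bernoulli$(p)$ observation, where $I(p) = 1/(p(1-p))$:

```python
import random
import statistics

random.seed(1)

p = 0.3

def score(x):
    # d/dp log p_p(x) = x/p - (1 - x)/(1 - p)
    return x / p - (1 - x) / (1 - p)

def neg_second_deriv(x):
    # -d^2/dp^2 log p_p(x) = x/p^2 + (1 - x)/(1 - p)^2
    return x / p**2 + (1 - x) / (1 - p)**2

xs = [1 if random.random() < p else 0 for _ in range(200_000)]
mean_score = statistics.mean(score(x) for x in xs)        # E[score] = 0
var_score = statistics.mean(score(x)**2 for x in xs)      # Var[score]
mean_neg_hess = statistics.mean(neg_second_deriv(x) for x in xs)
fisher = 1.0 / (p * (1 - p))
```

All three sample quantities agree with $1/(p(1-p))$, and the mean score is zero, up to Monte Carlo error.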
18 Proofs
Ia. Kullback-Leibler and the Law of Large Numbers.
Ib. Second order Taylor expansion of the likelihood.
II. Cauchy-Schwarz inequality.
III. Direct calculations.
19 Consistency
The log-likelihood function is defined as
$l(\theta) = \sum_{n=1}^N \log p_\theta(x_n \mid x_{1:n-1})$ (15)
The estimate is given by
$\hat\theta_{MLE} = \operatorname{argmax}_{\theta\in\Theta} l(\theta) = \operatorname{argmax}_{\theta\in\Theta} \tfrac{1}{N} l(\theta)$ (16)
$= \operatorname{argmax}_{\theta\in\Theta} \tfrac{1}{N} \sum_{n=1}^N \log p_\theta(x_n \mid x_{1:n-1})$ (17)
Now we rewrite this as
$\tfrac{1}{N} l(\theta) = \tfrac{1}{N} l(\theta) \pm E[l(\theta)]$ (18)
$= E[l(\theta)] + \big(\tfrac{1}{N} l(\theta) - E[l(\theta)]\big)$ (19)
22 Consistency cont.
This decomposition shows that
$\tfrac{1}{N} l(\theta) = \underbrace{E[l(\theta)]}_{\text{Expected log-likelihood}} + \underbrace{\big(\tfrac{1}{N} l(\theta) - E[l(\theta)]\big)}_{\text{Random error}}$ (20)
It then follows that
$\hat\theta_{MLE} = \operatorname{argmax}_{\theta\in\Theta} E[l(\theta)] + \ldots \quad\Rightarrow\quad \hat\theta_{MLE} \approx \operatorname{argmax}_{\theta\in\Theta} E[l(\theta)]$ (21)
23 First we look at the random error, which is written so that we can apply the Law of Large Numbers. If
$\lim_{N\to\infty} \big|\tfrac{1}{N} l(\theta) - E[l(\theta)]\big| = 0$ (22)
uniformly over $\Theta$, then it follows that
$\lim_{N\to\infty} \hat\theta_{MLE} = \operatorname{argmax}_{\theta\in\Theta} E[l(\theta)]$ (23)
24 Finally, we denote any feasible density by $q_\theta \in Q_\theta$ and assume that the true density $p_{\theta_0} \in Q_\theta$. The difference in expected log-likelihood between the true density and any other density is given by
$E[\log q_\theta(X)] - E[\log p_{\theta_0}(X)] = \int \big(\log q_\theta(x) - \log p_{\theta_0}(x)\big)\, p_{\theta_0}(x)\, dx$ (24)
$= \int \log\Big(\frac{q_\theta(x)}{p_{\theta_0}(x)}\Big)\, p_{\theta_0}(x)\, dx$ (25)
Log is a concave function. We know from Jensen's inequality that $E[\log(\xi)] \le \log(E[\xi])$.
27 That means that
$\int \log\Big(\frac{q_\theta(x)}{p_{\theta_0}(x)}\Big)\, p_{\theta_0}(x)\, dx \le \log \int \frac{q_\theta(x)}{p_{\theta_0}(x)}\, p_{\theta_0}(x)\, dx = \log 1 = 0$ (26)
We only get equality when $q_\theta(x) = p_{\theta_0}(x)$ for all values of $x$. The conclusion is that
$\theta_0 = \operatorname{argmax}_{\theta\in\Theta} E[l(\theta)]$ (27)
and hence that
$\lim_{N\to\infty} \hat\theta_{MLE} = \theta_0$ (28)
as the random error vanishes as $N \to \infty$.
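The Jensen step in (26) is just the statement that the Kullback-Leibler divergence is non-negative. A sketch on a small discrete example (distributions chosen for illustration):

```python
import math

# sum_x log(q(x)/p(x)) p(x) <= 0, with equality only when q = p.
p = [0.5, 0.3, 0.2]   # plays the role of the true density p_theta0
q = [0.4, 0.4, 0.2]   # some other feasible density q_theta

gap = sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))
gap_equal = sum(pi * math.log(pi / pi) for pi in p)   # the q = p case is exactly 0
```

Any mismatch between $q$ and $p$ makes the expected log-likelihood difference strictly negative, which is what pins the maximizer of $E[l(\theta)]$ at $\theta_0$.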
29 Proof Ib
Rewrite $L(X) = \exp\big(\log p_\theta(X_1) + \sum_{n=2}^N \log p_\theta(X_n \mid X_{1:n-1})\big)$. Second order Taylor expansion around $\theta_0$. Maximize over $(\hat\theta - \theta_0)$. Consistency and asymptotic normality follow.
30 Proof I, Ext.
Write $l_N(\theta) = \log L(X \mid \theta)$. This is a sum consisting of $N$ terms (think LLN and CLT!). Approximate $l_N(\theta)$ using a second order Taylor expansion around $\theta_0$. Thus
$l_N(\theta) = l_N(\theta_0) + \partial_\theta l_N(\theta_0)(\theta - \theta_0) + \tfrac{1}{2} \partial^2_\theta l_N(\theta_0)(\theta - \theta_0)^2 + R$.
Ignore the last term, multiply $l_N$ by $1/N$ and maximize w.r.t. $\theta$ to obtain $\hat\theta$. We obtain
$\tfrac{1}{\sqrt{N}} \partial_\theta l_N(\theta_0) + \tfrac{1}{N} \partial^2_\theta l_N(\theta_0)\, \sqrt{N}(\hat\theta - \theta_0) := 0$.
31 Proof I, Ext.
We have from the LLN that
$\tfrac{1}{N} \partial^2_\theta l_N(\theta_0) \to -I(\theta_0)$,
and from the CLT that
$\tfrac{1}{\sqrt{N}} \partial_\theta l_N(\theta_0) \xrightarrow{d} Z$, where $Z \sim N(0, I(\theta_0))$.
Rearranging gives $\sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} N(0, I^{-1}(\theta_0))$.
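The limiting distribution can be checked by simulation for a model where everything is known in closed form. A sketch for $X_i \sim \mathrm{Exp}(\lambda_0)$, where the MLE is $\hat\lambda = 1/\bar{x}$ and $I(\lambda) = 1/\lambda^2$, so the limiting variance of $\sqrt{N}(\hat\lambda - \lambda_0)$ is $\lambda_0^2$:

```python
import math
import random
import statistics

random.seed(2)

lam0, N, reps = 2.0, 1000, 3000

def mle_once():
    # MLE of the exponential rate: lam_hat = N / sum(x_i) = 1 / xbar
    xs = [random.expovariate(lam0) for _ in range(N)]
    return N / sum(xs)

scaled = [math.sqrt(N) * (mle_once() - lam0) for _ in range(reps)]
emp_mean = statistics.mean(scaled)       # close to 0
emp_var = statistics.pvariance(scaled)   # close to I^{-1}(lam0) = lam0^2 = 4
```

The empirical variance of the scaled estimation error matches $I^{-1}(\lambda_0)$, which is the efficiency bound of property II.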
32 Proof II
Denote the score function $U = \partial_\theta \log p_\theta(X)$ and another estimator by $T$. Assume that $E[T] = \theta_0$ (to rule out anomalies). Calculate $\mathrm{Cov}(U, T)$:
$\mathrm{Cov}(U, T) = E[UT] - E[U]\,E[T] = E[UT] - 0 \cdot \theta_0$
$E[UT] = \int T(x)\, \partial_\theta \log p_{\theta_0}(x)\, p_{\theta_0}(x)\, dx = \partial_\theta \int T(x)\, p_{\theta_0}(x)\, dx = \partial_\theta \theta = 1$.
Cauchy-Schwarz states $\mathrm{Cov}(U, T)^2 \le \mathrm{Var}[U]\, \mathrm{Var}[T]$. Thus
$\mathrm{Var}[T] \ge \mathrm{Var}\big[\partial_\theta \log p_\theta(X)\big]^{-1}$,
i.e. $\mathrm{Var}[T] \ge \mathrm{Var}[\hat\theta_{MLE}]$!
33 Proof III
Original problem: $\hat\theta_{MLE} = \operatorname{argmax}_{\theta\in\Theta} p_\theta(X_1, \ldots, X_N)$.
Define $Y = g(X)$. Calculations... Ok.
34 Other estimators
We can derive the asymptotic distribution using the same arguments for any M-estimator, i.e. for any estimator taking the estimate as the value that maximizes/minimizes a function of the data. Similar arguments can also be used for Z-estimators, i.e. estimators taking the estimate as the value that solves a system of equations depending on the data.
35 M-estimators
Take any loss function $J(\theta)$, and define
$\hat\theta = \operatorname{argmin}_{\theta\in\Theta} J(\theta)$. (29)
E.g. PEM: $J(\theta) = \sum_{i=1}^N \varepsilon_i(\theta)^2$. The corresponding first order condition is
$\tfrac{1}{\sqrt{N}} \partial_\theta J(\theta_0) + \tfrac{1}{N} \partial^2_\theta J(\theta_0)\, \sqrt{N}(\hat\theta - \theta_0) := 0$.
Assume that $E[\tfrac{1}{\sqrt{N}} \partial_\theta J(\theta_0)] = 0$ (Why?), $\mathrm{Var}[\tfrac{1}{\sqrt{N}} \partial_\theta J(\theta_0)] = W$ and $E[\tfrac{1}{N} \partial^2_\theta J(\theta_0)] = V$. Applying the limit theorems as in the maximum likelihood setup gives the asymptotics.
Result: $\sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} N(0, V^{-1} W (V^{-1})^T)$.
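The sandwich $V^{-1} W (V^{-1})^T$ can be made concrete with the simplest possible M-estimator, $J(\theta) = \sum_i (x_i - \theta)^2$, whose minimizer is the sample mean. There $V = 2$ and $W = 4\sigma^2$, so the sandwich collapses to $\sigma^2$. A simulation sketch (toy parameters):

```python
import math
import random
import statistics

random.seed(3)

# J(theta) = sum_i (x_i - theta)^2, minimized by the sample mean.
# V = E[(1/N) d2J/dtheta2] = 2; W = Var[(1/sqrt(N)) dJ/dtheta at theta0] = 4 sigma^2.
theta0, sigma, N, reps = 1.0, 1.5, 500, 4000

def m_estimate():
    xs = [random.gauss(theta0, sigma) for _ in range(N)]
    return sum(xs) / N                   # argmin of J(theta)

scaled = [math.sqrt(N) * (m_estimate() - theta0) for _ in range(reps)]
sandwich = 0.5 * (4 * sigma**2) * 0.5    # V^{-1} W V^{-1} = sigma^2
emp_var = statistics.pvariance(scaled)
```

The empirical variance of $\sqrt{N}(\hat\theta - \theta_0)$ matches the sandwich formula; the same recipe applies to any smooth loss once $V$ and $W$ are identified.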
36 Z-estimators
Take any estimating function $G(\theta)$, and let $\hat\theta$ solve
$G(\hat\theta) = 0$. (30)
E.g. estimation functions: $G(\theta) = \sum_i g_i(\theta)(X_i - E[X_i])$.
Compute a first order Taylor expansion around $\theta_0$, and solve w.r.t. $\theta$. Thus
$G(\theta_0) + \partial_\theta G(\theta_0)(\hat\theta - \theta_0) := 0$.
Multiply by $1/\sqrt{N}$ and apply the limit theorems. Let $E[\tfrac{1}{\sqrt{N}} G(\theta_0)] = 0$ (why?), $\mathrm{Var}[\tfrac{1}{\sqrt{N}} G(\theta_0)] = W$ and $E[\tfrac{1}{N} \partial_\theta G(\theta_0)] = V$. Then
$\sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} N(0, V^{-1} W (V^{-1})^T)$.
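A minimal Z-estimator sketch: the estimating equation $G(\theta) = \sum_i (x_i - \theta) = 0$ is solved by the sample mean, with $V = -1$ and $W = \mathrm{Var}[X]$, so the asymptotic variance is $\mathrm{Var}[X]$. Checked by simulation with exponential data (illustrative choice):

```python
import math
import random
import statistics

random.seed(4)

# X_i ~ Exp with mean theta0, so Var[X] = theta0^2.
# V = E[(1/N) dG/dtheta] = -1; W = Var[(1/sqrt(N)) G(theta0)] = Var[X].
theta0, N, reps = 2.0, 400, 4000

def z_estimate():
    xs = [random.expovariate(1.0 / theta0) for _ in range(N)]
    return sum(xs) / N               # the root of G(theta) = sum_i (x_i - theta)

scaled = [math.sqrt(N) * (z_estimate() - theta0) for _ in range(reps)]
asy_var = theta0**2                  # V^{-1} W (V^{-1})^T with V = -1, W = theta0^2
emp_var = statistics.pvariance(scaled)
```

Note this is the method-of-moments estimator of the mean; nothing about the argument required the data to be Gaussian.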
37 The likelihood framework can be used to test the fit of models: Confidence intervals, Likelihood ratio tests.
38 Confidence intervals
Assume $\sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} N(0, \Sigma)$. Then
$I_{\theta_0} \approx \hat\theta \pm 2\sqrt{\mathrm{diag}(\Sigma)/N}$.
Better accuracy: use the profile likelihood!
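The coverage of the $\pm 2$ standard-deviation interval can be checked by simulation. A sketch in the scalar Gaussian-mean case with known $\sigma$ (so $\Sigma = \sigma^2$; parameters are illustrative):

```python
import math
import random

random.seed(5)

mu, sigma, N, reps = 0.0, 1.0, 100, 5000

hits = 0
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(N)) / N
    half = 2 * sigma / math.sqrt(N)       # 2 * sqrt(Sigma / N)
    if xbar - half <= mu <= xbar + half:
        hits += 1
coverage = hits / reps
```

About 95% of the intervals cover the true value, which is what the $\pm 2$ rule is designed to give; the profile likelihood improves on this when the likelihood is clearly asymmetric.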
39 Likelihood based tests
Likelihood ratio tests:
$\Lambda = \frac{\sup\{L(X \mid \theta),\, \theta \in \Theta_F\}}{\sup\{L(X \mid \theta),\, \theta \in \Theta_R\}}$
where $\Theta_R \subset \Theta_F$. Then
$2\log(\Lambda) \sim \chi^2(k)$,
$k$ being the degrees of freedom. LR tests are optimal, cf. Neyman-Pearson, and are closely related to AIC, BIC etc.
40 Likelihood ratios
Compare $L(X \mid \hat\theta)$ to $L(X \mid \theta_0)$. We have that
$2\log(\Lambda) = 2\big(l_N(\hat\theta) - l_N(\theta_0)\big)$.
Taylor expand:
$l_N(\theta) = l_N(\theta_0) + \partial_\theta l_N(\theta_0)(\theta - \theta_0) + \tfrac{1}{2} \partial^2_\theta l_N(\theta_0)(\theta - \theta_0)^2 + R$.
The estimate must solve $\partial_\theta l_N(\theta_0) + \partial^2_\theta l_N(\theta_0)(\hat\theta - \theta_0) = 0$. Thus
$2\log(\Lambda) = 2\Big({-\tfrac{1}{2}} \partial^2_\theta l_N(\theta_0)(\hat\theta - \theta_0)^2\Big)$.
41 Likelihood ratios, cont.
Plugging in $(\hat\theta - \theta_0) = -\big(\partial^2_\theta l_N(\theta_0)\big)^{-1} \partial_\theta l_N(\theta_0)$ gives
$2\log(\Lambda) = \partial_\theta l_N(\theta_0)\big({-\partial^2_\theta l_N(\theta_0)}\big)^{-1} \partial_\theta l_N(\theta_0)$.
Or written more nicely,
$2\log(\Lambda) = \tfrac{1}{\sqrt{N}} \partial_\theta l_N(\theta_0) \Big({-\tfrac{1}{N}} \partial^2_\theta l_N(\theta_0)\Big)^{-1} \tfrac{1}{\sqrt{N}} \partial_\theta l_N(\theta_0)$.
This is a quadratic form, distributed as $\chi^2(k)$, $k$ being the degrees of freedom.
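The $\chi^2$ limit of the quadratic form can be checked in the simplest nested case: $X_i \sim N(\mu, 1)$ with $H_0: \mu = 0$ against a free mean, where $2\log(\Lambda) = N\bar{x}^2$ in closed form. A simulation sketch under $H_0$:

```python
import random
import statistics

random.seed(6)

N, reps = 200, 5000

def lr_stat():
    # 2 log(Lambda) for H0: mu = 0 vs free mu, with known unit variance
    xbar = sum(random.gauss(0.0, 1.0) for _ in range(N)) / N
    return N * xbar**2

stats = [lr_stat() for _ in range(reps)]
mean_stat = statistics.mean(stats)       # chi^2(1) has mean 1
var_stat = statistics.pvariance(stats)   # and variance 2
```

The simulated statistics match the mean and variance of a $\chi^2(1)$ distribution, consistent with $k = 1$ restricted parameter.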
42 Intro LS MLE Other Feedback Send feedback, questions and information about typos to
More informationFall, 2007 Nonlinear Econometrics. Theory: Consistency for Extremum Estimators. Modeling: Probit, Logit, and Other Links.
14.385 Fall, 2007 Nonlinear Econometrics Lecture 2. Theory: Consistency for Extremum Estimators Modeling: Probit, Logit, and Other Links. 1 Example: Binary Choice Models. The latent outcome is defined
More informationDavid Giles Bayesian Econometrics
David Giles Bayesian Econometrics 1. General Background 2. Constructing Prior Distributions 3. Properties of Bayes Estimators and Tests 4. Bayesian Analysis of the Multiple Regression Model 5. Bayesian
More informationMathematics Ph.D. Qualifying Examination Stat Probability, January 2018
Mathematics Ph.D. Qualifying Examination Stat 52800 Probability, January 2018 NOTE: Answers all questions completely. Justify every step. Time allowed: 3 hours. 1. Let X 1,..., X n be a random sample from
More informationStatistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That
Statistics Lecture 2 August 7, 2000 Frank Porter Caltech The plan for these lectures: The Fundamentals; Point Estimation Maximum Likelihood, Least Squares and All That What is a Confidence Interval? Interval
More informationTheory of Maximum Likelihood Estimation. Konstantin Kashin
Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical
More information1. Fisher Information
1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score
More informationLecture 32: Asymptotic confidence sets and likelihoods
Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence
More informationBIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation
BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)
More informationUnbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.
Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it
More informationHypothesis testing: theory and methods
Statistical Methods Warsaw School of Economics November 3, 2017 Statistical hypothesis is the name of any conjecture about unknown parameters of a population distribution. The hypothesis should be verifiable
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationPart IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015
Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.
More informationEE514A Information Theory I Fall 2013
EE514A Information Theory I Fall 2013 K. Mohan, Prof. J. Bilmes University of Washington, Seattle Department of Electrical Engineering Fall Quarter, 2013 http://j.ee.washington.edu/~bilmes/classes/ee514a_fall_2013/
More informationMinimum Message Length Analysis of the Behrens Fisher Problem
Analysis of the Behrens Fisher Problem Enes Makalic and Daniel F Schmidt Centre for MEGA Epidemiology The University of Melbourne Solomonoff 85th Memorial Conference, 2011 Outline Introduction 1 Introduction
More informationLECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)
LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered
More informationEstimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1
Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1
More informationSpring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM
University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 14 GEE-GMM Throughout the course we have emphasized methods of estimation and inference based on the principle
More informationEconomics 620, Lecture 9: Asymptotics III: Maximum Likelihood Estimation
Economics 620, Lecture 9: Asymptotics III: Maximum Likelihood Estimation Nicholas M. Kiefer Cornell University Professor N. M. Kiefer (Cornell University) Lecture 9: Asymptotics III(MLE) 1 / 20 Jensen
More informationML Testing (Likelihood Ratio Testing) for non-gaussian models
ML Testing (Likelihood Ratio Testing) for non-gaussian models Surya Tokdar ML test in a slightly different form Model X f (x θ), θ Θ. Hypothesist H 0 : θ Θ 0 Good set: B c (x) = {θ : l x (θ) max θ Θ l
More informationMultivariate Analysis and Likelihood Inference
Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density
More informationEconomics 583: Econometric Theory I A Primer on Asymptotics
Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:
More informationSTAT 512 sp 2018 Summary Sheet
STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}
More informationCS 540: Machine Learning Lecture 2: Review of Probability & Statistics
CS 540: Machine Learning Lecture 2: Review of Probability & Statistics AD January 2008 AD () January 2008 1 / 35 Outline Probability theory (PRML, Section 1.2) Statistics (PRML, Sections 2.1-2.4) AD ()
More information