Lecture Stat Information Criterion


1 Lecture Stat: Information Criterion
Arnaud Doucet, February 2008

2 Review of the Maximum Likelihood Approach

We have data $X_i \overset{\text{i.i.d.}}{\sim} g(x)$. We model the distribution of these data by a parametric model $\{f(x\mid\theta);\ \theta\in\Theta\subseteq\mathbb{R}^p\}$.

We estimate $\theta$ by ML; that is, $\hat\theta = \arg\max_{\theta\in\Theta} l(\theta)$ where
$$l(\theta) := \sum_{k=1}^n \log f(X_k\mid\theta).$$

We know that as $n\to\infty$, $\hat\theta$ converges towards
$$\theta^* = \arg\min_{\theta\in\Theta} \mathrm{KL}\big(g(x); f(x\mid\theta)\big)$$
where
$$\mathrm{KL}\big(g(x); f(x\mid\theta)\big) := \int g(x)\log\frac{g(x)}{f(x\mid\theta)}\,dx.$$

3 A Central Limit Theorem

Assuming the MLE is asymptotically consistent, a Taylor expansion of the score around $\theta^*$ gives
$$0 = \left.\frac{\partial l(\theta)}{\partial\theta}\right|_{\hat\theta} = \left.\frac{\partial l(\theta)}{\partial\theta}\right|_{\theta^*} + \left.\frac{\partial^2 l(\theta)}{\partial\theta\,\partial\theta^{T}}\right|_{\theta^*}\big(\hat\theta-\theta^*\big) + \cdots$$

The law of large numbers yields
$$-\frac{1}{n}\left.\frac{\partial^2 l(\theta)}{\partial\theta\,\partial\theta^{T}}\right|_{\theta^*} \to J(\theta^*) := -\int g(x)\left.\frac{\partial^2 \log f(x\mid\theta)}{\partial\theta\,\partial\theta^{T}}\right|_{\theta^*} dx.$$

The CLT yields
$$\frac{1}{\sqrt{n}}\left.\frac{\partial l(\theta)}{\partial\theta}\right|_{\theta^*} \overset{D}{\to} \mathcal{N}\big(0,\, I(\theta^*)\big)$$
where
$$I(\theta^*) = \int g(x)\left.\frac{\partial \log f(x\mid\theta)}{\partial\theta}\,\frac{\partial \log f(x\mid\theta)}{\partial\theta^{T}}\right|_{\theta^*} dx.$$

4 Using Slutsky's theorem, we have
$$\sqrt{n}\,\big(\hat\theta-\theta^*\big) \overset{D}{\to} \mathcal{N}\big(0,\ J^{-1}(\theta^*)\, I(\theta^*)\, J^{-1}(\theta^*)\big).$$
This is sometimes known as the sandwich variance.

When $g(x) = f(x\mid\theta^*)$, then $J(\theta^*) = I(\theta^*)$ and the standard CLT follows:
$$\sqrt{n}\,\big(\hat\theta-\theta^*\big) \overset{D}{\to} \mathcal{N}\big(0,\, I^{-1}(\theta^*)\big).$$
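A small numerical illustration of the sandwich variance (not from the lecture; a minimal sketch assuming NumPy, with an illustrative misspecified model): we fit $\mathcal{N}(\theta, 1)$ to data actually drawn from $\mathcal{N}(0, 2^2)$, so the score is $(x-\theta)$, $J(\theta^*) = 1$, and $I(\theta^*) = \mathrm{Var}_g(X) = 4$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
# g: true data distribution N(0, 2^2); fitted model: N(theta, 1) (wrong variance)
data = rng.normal(0.0, 2.0, size=(reps, n))
theta_hat = data.mean(axis=1)        # MLE of theta under the working model

# Plug-in sandwich: J = 1 (minus the Hessian of log f), I = Var_g(X)
J_hat = 1.0
I_hat = data.var(axis=1).mean()      # ~ 4
sandwich = I_hat / J_hat**2          # J^-1 I J^-1, the correct asymptotic variance
naive = 1.0 / J_hat                  # I^-1 = J^-1, valid only if well specified

# Monte Carlo variance of sqrt(n) * theta_hat: matches the sandwich, not naive
empirical = n * theta_hat.var()
print(sandwich, naive, empirical)
```

The Monte Carlo variance agrees with the sandwich formula (about 4), while the naive well-specified formula (1) is off by roughly a factor of four.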

5 Evaluating the Statistical Model

We want to compare our model $f(x\mid\hat\theta)$ to $g(x)$. One way consists of evaluating
$$\mathrm{KL}\big(g(x); f(x\mid\hat\theta)\big) = \int g(x)\log g(x)\,dx - \int g(x)\log f(x\mid\hat\theta)\,dx,$$
and the larger $\int g(x)\log f(x\mid\hat\theta)\,dx$, the closer the model is to the true one.

The crucial issue is to obtain an estimate of
$$\mathbb{E}_g\big[\log f(X\mid\hat\theta)\big] = \int g(x)\log f(x\mid\hat\theta)\,dx.$$
Clearly an estimate of $\mathbb{E}_g[\log f(X\mid\hat\theta)]$ is $n^{-1}\, l(\hat\theta)$, and an estimate of $n\,\mathbb{E}_g[\log f(X\mid\hat\theta)]$ is $l(\hat\theta)$.

6 Selecting a Model

In practice, it is difficult to precisely capture the true structure of given phenomena from a limited number of data. Often several candidate statistical models are considered, and we want to select the one that most closely approximates the true distribution of the data.

Given $m$ candidate models $\{f_i(x\mid\theta_i);\ i=1,\ldots,m\}$ and the associated ML estimates $\hat\theta_i$, a simple solution would consist of selecting the model $j$ where
$$j = \arg\max_{i\in\{1,\ldots,m\}} l_i\big(\hat\theta_i\big);$$
i.e., picking the model for which $\mathrm{KL}\big(g(x); f_i(x\mid\hat\theta_i)\big)$ appears smallest.

7 Be Careful!

This approach does not provide a fair comparison of models. The quantity $l_i(\hat\theta_i)$ contains a bias as an estimator of $n\,\mathbb{E}_g[\log f_i(X\mid\hat\theta_i)]$. The reason is that the data $X_1,\ldots,X_n$ are used twice: to estimate $\hat\theta_i$ and to approximate the expectation with respect to $g$. We will show that the size of the bias depends on the dimension $p$ of the parameter $\theta$.

8 Relationship between Log-Likelihood and Expected Log-Likelihood

Let us denote by $\theta_i^* = \arg\min_{\theta_i} \mathrm{KL}\big(g(x); f_i(x\mid\theta_i)\big)$ the true parameter. We necessarily have
$$\mathbb{E}_g\big[\log f_i(X\mid\hat\theta_i)\big] \le \mathbb{E}_g\big[\log f_i(X\mid\theta_i^*)\big]\quad\text{whereas}\quad l_i\big(\hat\theta_i\big) \ge l_i\big(\theta_i^*\big).$$

To correct for this bias, we want to estimate
$$b_i(g) = \mathbb{E}_g\Big[l_i\big(\hat\theta_i\big) - n\,\mathbb{E}_g\big[\log f_i(X\mid\hat\theta_i)\big]\Big],$$
where, remember,
$$l_i\big(\hat\theta_i\big) = \sum_{k=1}^n \log f_i\big(X_k\mid\hat\theta_i\big)\quad\text{and}\quad \hat\theta_i = \hat\theta_i(X_{1:n}).$$

9 Information Criterion

The information criterion for model $i$ is defined as
$$\mathrm{IC}(i) = -2\sum_{k=1}^n \log f_i\big(X_k\mid\hat\theta_i\big) + 2\,\{\text{estimator of } b_i(g)\}.$$
$\mathrm{IC}(i)$ is a bias-corrected estimate of minus twice the expected log-likelihood. So, given a collection of models, we will select the model
$$j = \arg\min_{i\in\{1,\ldots,m\}} \mathrm{IC}(i).$$

10 Derivation of the Bias

We want to estimate the bias
$$b(g) = \mathbb{E}_g\Big[l\big(\hat\theta(X_{1:n})\big) - n\,\mathbb{E}_g\big[\log f(X\mid\hat\theta(X_{1:n}))\big]\Big],$$
which we decompose as
$$b(g) = \underbrace{\mathbb{E}_g\Big[l\big(\hat\theta(X_{1:n})\big) - l(\theta^*)\Big]}_{D_1}
+ \underbrace{\mathbb{E}_g\Big[l(\theta^*) - n\,\mathbb{E}_g\big[\log f(X\mid\theta^*)\big]\Big]}_{D_2}
+ \underbrace{\mathbb{E}_g\Big[n\,\mathbb{E}_g\big[\log f(X\mid\theta^*)\big] - n\,\mathbb{E}_g\big[\log f(X\mid\hat\theta(X_{1:n}))\big]\Big]}_{D_3}.$$

11 Calculation of the Second Term

We have
$$D_2 = \mathbb{E}_g\Big[l(\theta^*) - n\,\mathbb{E}_g\big[\log f(X\mid\theta^*)\big]\Big]
= \mathbb{E}_g\left[\sum_{k=1}^n \log f(X_k\mid\theta^*)\right] - n\,\mathbb{E}_g\big[\log f(X\mid\theta^*)\big] = 0.$$
No bias for this term.

12 Calculation of the Third Term

Let us denote $\eta(\hat\theta) = \mathbb{E}_g\big[\log f(X\mid\theta)\big]\big|_{\theta=\hat\theta}$. By performing a Taylor expansion around $\theta^*$, we obtain
$$\eta(\hat\theta) = \eta(\theta^*) + \left.\frac{\partial\eta(\theta)}{\partial\theta}\right|_{\theta^*}^{T}\big(\hat\theta-\theta^*\big) + \frac12\big(\hat\theta-\theta^*\big)^{T}\left.\frac{\partial^2\eta(\theta)}{\partial\theta\,\partial\theta^{T}}\right|_{\theta^*}\big(\hat\theta-\theta^*\big) + \cdots$$
As $\left.\frac{\partial\eta(\theta)}{\partial\theta}\right|_{\theta^*} = 0$, then
$$\eta(\hat\theta) \approx \eta(\theta^*) - \frac12\big(\hat\theta-\theta^*\big)^{T} J(\theta^*)\big(\hat\theta-\theta^*\big),$$
where
$$J(\theta^*) = -\int g(x)\left.\frac{\partial^2\log f(x\mid\theta)}{\partial\theta\,\partial\theta^{T}}\right|_{\theta^*} dx.$$

13 It follows that
$$D_3 = \mathbb{E}_g\Big[n\big(\eta(\theta^*) - \eta(\hat\theta)\big)\Big]
\approx \frac{n}{2}\,\mathbb{E}_g\Big[\big(\hat\theta-\theta^*\big)^{T} J(\theta^*)\big(\hat\theta-\theta^*\big)\Big]
= \frac{n}{2}\,\mathrm{tr}\Big\{J(\theta^*)\,\mathbb{E}_g\big[\big(\hat\theta-\theta^*\big)\big(\hat\theta-\theta^*\big)^{T}\big]\Big\}.$$
From the CLT established previously, we have
$$\mathbb{E}_g\Big[\big(\hat\theta-\theta^*\big)\big(\hat\theta-\theta^*\big)^{T}\Big] \approx \frac{1}{n}\, J(\theta^*)^{-1} I(\theta^*)\, J(\theta^*)^{-1},$$
so
$$D_3 \approx \frac12\,\mathrm{tr}\big\{I(\theta^*)\, J(\theta^*)^{-1}\big\}.$$

14 Calculation of the First Term

We have
$$l(\theta^*) = l(\hat\theta) + \left.\frac{\partial l(\theta)}{\partial\theta^{T}}\right|_{\hat\theta}\big(\theta^*-\hat\theta\big) + \frac12\big(\theta^*-\hat\theta\big)^{T}\left.\frac{\partial^2 l(\theta)}{\partial\theta\,\partial\theta^{T}}\right|_{\hat\theta}\big(\theta^*-\hat\theta\big) + \cdots$$
so, as the score vanishes at $\hat\theta$,
$$l(\theta^*) - l(\hat\theta) \approx -\frac{n}{2}\big(\theta^*-\hat\theta\big)^{T} J(\theta^*)\big(\theta^*-\hat\theta\big).$$
Hence
$$D_1 = \mathbb{E}_g\Big[l\big(\hat\theta(X_{1:n})\big) - l(\theta^*)\Big]
\approx \frac{n}{2}\,\mathbb{E}_g\Big[\big(\theta^*-\hat\theta\big)^{T} J(\theta^*)\big(\theta^*-\hat\theta\big)\Big]
= \frac12\,\mathrm{tr}\big\{I(\theta^*)\, J(\theta^*)^{-1}\big\}.$$

15 Estimate of the Bias

We have
$$b(g) = D_1 + D_2 + D_3 \approx \frac12\,\mathrm{tr}\big\{I(\theta^*) J(\theta^*)^{-1}\big\} + 0 + \frac12\,\mathrm{tr}\big\{I(\theta^*) J(\theta^*)^{-1}\big\} = \mathrm{tr}\big\{I(\theta^*) J(\theta^*)^{-1}\big\}.$$
Let $\hat I$ and $\hat J$ be consistent estimates of $I(\theta^*)$ and $J(\theta^*)$, say
$$\hat I = \frac1n\sum_{k=1}^n \left.\frac{\partial\log f(X_k\mid\theta)}{\partial\theta}\,\frac{\partial\log f(X_k\mid\theta)}{\partial\theta^{T}}\right|_{\hat\theta},\qquad
\hat J = -\frac1n\sum_{k=1}^n \left.\frac{\partial^2\log f(X_k\mid\theta)}{\partial\theta\,\partial\theta^{T}}\right|_{\hat\theta};$$
then we can estimate the bias through
$$\hat b(g) = \mathrm{tr}\big(\hat I\,\hat J^{-1}\big).$$
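To make the bias estimate concrete, here is a minimal sketch (assuming NumPy; the Gaussian model and synthetic dataset are illustrative, not from the lecture) that computes $\hat I$, $\hat J$ and $\mathrm{tr}(\hat I\,\hat J^{-1})$ for the model $\mathcal{N}(\mu, \sigma^2)$. Since the data are generated from the model family, the trace should be close to $p = 2$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, size=50_000)   # data drawn from the model family itself
n = x.size

# MLE for N(mu, s2)
mu, s2 = x.mean(), x.var()
r = x - mu

# Per-observation scores d log f / d(mu, s2), evaluated at the MLE
score = np.column_stack([r / s2, -0.5 / s2 + r**2 / (2 * s2**2)])
I_hat = score.T @ score / n             # outer-product estimate of I(theta*)

# Minus the average Hessian of log f, evaluated at the MLE
J_hat = np.array([[1 / s2,              r.mean() / s2**2],
                  [r.mean() / s2**2,    -0.5 / s2**2 + (r**2).mean() / s2**3]])

bias_hat = np.trace(I_hat @ np.linalg.inv(J_hat))
print(bias_hat)   # close to p = 2 since the model is well specified
```

For data with heavier tails than the Gaussian, $\hat b(g)$ would drift away from 2, which is exactly the gap between TIC and AIC discussed on the next slide.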

16 Information Criterion

We have for the information criterion for model $i$
$$\mathrm{IC}(i) = -2\sum_{k=1}^n \log f_i\big(X_k\mid\hat\theta_i\big) + 2\,\mathrm{tr}\big(\hat I_i\,\hat J_i^{-1}\big).$$
Assuming $g(x) = f_i(x\mid\theta_i^*)$, then $I_i(\theta_i^*) = J_i(\theta_i^*)$, so $b_i(g) = p_i$ (where $p_i$ is the dimension of $\theta_i$), and in this case
$$\mathrm{AIC}(i) = -2\sum_{k=1}^n \log f_i\big(X_k\mid\hat\theta_i\big) + 2 p_i.$$

17 Checking the Equality of Two Discrete Distributions

Assume we have two sets of data, each having $k$ categories:

Category    1     2     ...   k
Data set 1  n_1   n_2   ...   n_k
Data set 2  m_1   m_2   ...   m_k

where we have $n_1+\cdots+n_k = n$ observations for the first dataset and $m_1+\cdots+m_k = m$ for the second. We assume these datasets follow the multinomial distributions
$$p(n_1,\ldots,n_k\mid p_1,\ldots,p_k) = \frac{n!}{n_1!\cdots n_k!}\, p_1^{n_1}\cdots p_k^{n_k},\qquad
p(m_1,\ldots,m_k\mid q_1,\ldots,q_k) = \frac{m!}{m_1!\cdots m_k!}\, q_1^{m_1}\cdots q_k^{m_k}.$$
We want to check whether $(p_1,\ldots,p_k) \neq (q_1,\ldots,q_k)$ or $(p_1,\ldots,p_k) = (q_1,\ldots,q_k)$.

18 Assume the two distributions are different. Then
$$l(p_{1:k}, q_{1:k}) = \log n! - \sum_{j=1}^k \log n_j! + \sum_{j=1}^k n_j\log p_j + \log m! - \sum_{j=1}^k \log m_j! + \sum_{j=1}^k m_j\log q_j.$$
We have the MLE
$$\hat p_j = \frac{n_j}{n},\qquad \hat q_j = \frac{m_j}{m}.$$
So
$$l(\hat p_{1:k}, \hat q_{1:k}) = C + \sum_{j=1}^k n_j\log\frac{n_j}{n} + \sum_{j=1}^k m_j\log\frac{m_j}{m},$$
with $C = \log n! + \log m! - \sum_{j=1}^k\big(\log n_j! + \log m_j!\big)$, and
$$\mathrm{AIC} = -2\left(C + \sum_{j=1}^k n_j\log\frac{n_j}{n} + \sum_{j=1}^k m_j\log\frac{m_j}{m}\right) + 4(k-1).$$

19 If we assume the two distributions are equal, then
$$l(r_{1:k}) = C + \sum_{j=1}^k (n_j+m_j)\log r_j$$
and the MLE is
$$\hat r_j = \frac{n_j+m_j}{n+m}.$$
We have
$$l(\hat r_{1:k}) = C + \sum_{j=1}^k (n_j+m_j)\log\frac{n_j+m_j}{n+m}$$
and
$$\mathrm{AIC} = -2\left(C + \sum_{j=1}^k (n_j+m_j)\log\frac{n_j+m_j}{n+m}\right) + 2(k-1).$$
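The two AIC formulas above can be implemented directly. A minimal sketch (standard library only; the function name and count vectors are illustrative, not the lecture's data), dropping the shared constant $C$ since it cancels in the comparison:

```python
import math

def aic_two_multinomials(n_counts, m_counts):
    """AIC (dropping the shared constant C) of the 'different' and
    'equal' models for two multinomial samples over k categories."""
    n, m, k = sum(n_counts), sum(m_counts), len(n_counts)
    # Zero counts contribute 0 (the limit of c log c as c -> 0), so skip them.
    ll_diff = sum(nj * math.log(nj / n) for nj in n_counts if nj) \
            + sum(mj * math.log(mj / m) for mj in m_counts if mj)
    ll_eq = sum((nj + mj) * math.log((nj + mj) / (n + m))
                for nj, mj in zip(n_counts, m_counts) if nj + mj)
    aic_diff = -2 * ll_diff + 2 * 2 * (k - 1)   # 2(k-1) free parameters
    aic_eq = -2 * ll_eq + 2 * (k - 1)           # (k-1) free parameters
    return aic_diff, aic_eq

# Hypothetical counts with clearly different proportions
a, b = aic_two_multinomials([90, 10], [50, 50])
print("different" if a < b else "equal")
```

With very similar counts, e.g. `[50, 50]` vs `[52, 48]`, the penalty tips the comparison the other way and the "equal" model is preferred.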

20 Example

Consider two datasets of counts over $k = 5$ categories. From the data table we can deduce the estimates $\hat p_j$, $\hat q_j$ and $\hat r_j$ for each category. [Tables of counts and estimates; the numerical values were not preserved in this transcription.]

21 Ignoring the constant $C$, the maximum log-likelihoods of the models are
$$\text{Model 1:}\quad \sum_{j=1}^k n_j\log\frac{n_j}{n} + \sum_{j=1}^k m_j\log\frac{m_j}{m},\qquad
\text{Model 2:}\quad \sum_{j=1}^k (n_j+m_j)\log\frac{n_j+m_j}{n+m}.$$
The number of free parameters is $2(k-1) = 8$ in Model 1 and $k-1 = 4$ in Model 2. It follows that the AIC for Model 1 and Model 2 is respectively 9150.73 and 9179.22, so AIC selects Model 1.

22 Determining the Number of Bins of a Histogram

Histograms are used for representing the properties of a set of observations obtained from either a discrete or continuous distribution. Assume we have a histogram $\{n_1, n_2, \ldots, n_k\}$, where $k$ is referred to as the bin size. If $k$ is too large or too small, then it is difficult to capture the characteristics of the true distribution.

We can think of a histogram as a model specified by a multinomial distribution
$$p(n_1,\ldots,n_k\mid p_1,\ldots,p_k) = \frac{n!}{n_1!\cdots n_k!}\, p_1^{n_1}\cdots p_k^{n_k},$$
where $n_1+\cdots+n_k = n$ and $p_1+\cdots+p_k = 1$.

23 In this case, we have seen that the MLE is $\hat p_j = n_j/n$ and
$$l(\hat p_{1:k}) = C + \sum_{j=1}^k n_j\log\frac{n_j}{n},\qquad C = \log n! - \sum_{j=1}^k \log n_j!.$$
We have $k-1$ free parameters, so
$$\mathrm{AIC} = -2\left(C + \sum_{j=1}^k n_j\log\frac{n_j}{n}\right) + 2(k-1).$$
We want to compare this model to models with lower resolution.

24 To compare the histogram model with a simpler one, we may assume $p_{2j-1} = p_{2j}$ for $j = 1,\ldots,m$, where for the sake of simplicity we consider $k = 2m$. In this case, the new MLE is
$$\hat p_{2j-1} = \hat p_{2j} = \frac{n_{2j-1}+n_{2j}}{2n}$$
and the AIC is given by
$$\mathrm{AIC} = -2\left(C + \sum_{j=1}^m (n_{2j-1}+n_{2j})\log\frac{n_{2j-1}+n_{2j}}{2n}\right) + 2(m-1).$$
Similarly, we can consider the histogram with $m$ bins where $k = 4m$. We apply this to the galaxy data, for which AIC selects a bin size of 14. [Table of bin size, log-likelihood and AIC; numerical values not preserved.]
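The bin-halving comparison above can be sketched as follows (standard library only; the function names and toy counts are illustrative). `hist_aic` scores the full $k$-bin multinomial model and `merged_aic` the coarser model with $p_{2j-1}=p_{2j}$, both dropping the shared constant $C$:

```python
import math

def hist_aic(counts):
    """AIC (up to the constant C) of the k-bin multinomial histogram model."""
    n, k = sum(counts), len(counts)
    ll = sum(c * math.log(c / n) for c in counts if c)
    return -2 * ll + 2 * (k - 1)

def merged_aic(counts):
    """AIC of the coarser model with p_{2j-1} = p_{2j} (requires k = 2m):
    each pair of cells shares the probability (n_{2j-1}+n_{2j}) / (2n)."""
    n = sum(counts)
    pairs = [counts[i] + counts[i + 1] for i in range(0, len(counts), 2)]
    ll = sum(c * math.log(c / (2 * n)) for c in pairs if c)
    return -2 * ll + 2 * (len(pairs) - 1)

# Flat counts: the coarser model fits as well with fewer parameters, so it wins.
# Strongly alternating counts: the extra resolution pays for its parameters.
flat, alt = [100, 100, 100, 100], [150, 50, 150, 50]
print(merged_aic(flat) < hist_aic(flat), hist_aic(alt) < merged_aic(alt))
```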

25 Application to Polynomial Regression

We are given $n$ observations $\{x_i, y_i\}$. We want to use the following model:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \varepsilon,\qquad \varepsilon\sim\mathcal{N}(0,\sigma^2),$$
where $k$ is unknown. We have $\theta = (\beta_{0:k}, \sigma^2)$ and
$$l(\theta) = -\frac{n}{2}\log 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum_{j=1}^n\left(y_j - \sum_{m=0}^k \beta_m x_j^m\right)^2.$$
We can easily establish that
$$l(\hat\theta) = -\frac{n}{2}\log 2\pi\hat\sigma^2 - \frac{n}{2}.$$

26 Experimental Results

In this case, for the polynomial model of order $k$ we have $k+2$ unknown parameters, so
$$\mathrm{AIC}(k) = n(\log 2\pi + 1) + n\log\hat\sigma_k^2 + 2(k+2).$$
[Table of $k$, $l(\hat\theta)$ and AIC; numerical values not preserved.] AIC selects the model of order 4.
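A sketch of AIC-based order selection for polynomial regression (assuming NumPy; the synthetic cubic dataset is illustrative, not the lecture's data):

```python
import numpy as np

def poly_aic(x, y, k):
    """AIC(k) = n(log 2*pi + 1) + n log sigma2_hat + 2(k+2) for a
    degree-k polynomial regression with Gaussian noise."""
    n = len(y)
    coeffs = np.polyfit(x, y, k)            # least squares = MLE of beta
    resid = y - np.polyval(coeffs, x)
    s2 = np.mean(resid**2)                  # MLE of sigma^2
    return n * (np.log(2 * np.pi) + 1) + n * np.log(s2) + 2 * (k + 2)

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 200)
y = 1.0 - 2.0 * x + 3.0 * x**3 + rng.normal(0, 0.1, x.size)   # true degree 3
best_k = min(range(1, 8), key=lambda k: poly_aic(x, y, k))
print(best_k)   # typically 3; AIC may occasionally prefer a slightly larger order
```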

27 Application to Daily Temperature Data

Daily minimum temperatures $y_i$ in January, averaged from 1971 through 2000 for city $i$; $x_{i1}$ latitudes, $x_{i2}$ longitudes and $x_{i3}$ altitudes. A standard model is
$$y_i = a_0 + a_1 x_{i1} + a_2 x_{i2} + a_3 x_{i3} + \varepsilon_i,$$
where $\varepsilon_i\sim\mathcal{N}(0,\sigma^2)$. But we would also like to compare models in which some of the explanatory variables are omitted.

28 Results

We have
$$\mathrm{AIC}(\text{model}) = n(\log 2\pi + 1) + n\log\hat\sigma^2 + 2\,(\text{nb. explanatory var.} + 2).$$
Comparing the models built from the subsets $\{x_1,x_3\}$, $\{x_1,x_2,x_3\}$, $\{x_1,x_2\}$, $\{x_1\}$, $\{x_2,x_3\}$, $\{x_2\}$ and $\{x_3\}$ [table of $\hat\sigma^2$ and AIC values; numerical values not preserved], we select the model
$$y_i = \hat a_0 + \hat a_1 x_{i1} + \hat a_3 x_{i3} + \varepsilon_i,\qquad \varepsilon_i\sim\mathcal{N}(0, 1.49).$$

29 Selection of the Order of an Autoregressive Model

We have the following model:
$$y_k = \sum_{i=1}^m a_i y_{k-i} + \varepsilon_k,\qquad \varepsilon_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,\sigma^2).$$
Assuming we have data $y_1, y_2, \ldots, y_n$ and that $y_0, y_{-1}, \ldots, y_{1-m}$ are deterministic, then we can easily obtain the MLE and check that
$$l(\hat\theta) = -\frac{n}{2}\log 2\pi\hat\sigma^2 - \frac{n}{2}.$$
Since the model of order $m$ has $m+1$ unknown parameters,
$$\mathrm{AIC}(m) = n(\log 2\pi + 1) + n\log\hat\sigma_m^2 + 2(m+1).$$
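The conditional MLE for the AR(m) model reduces to least squares on lagged values. A sketch (assuming NumPy; the AR(2) simulation is illustrative, not the lecture's data; note that the effective sample size shrinks slightly with $m$ here, which the lecture's fixed-$n$ formula glosses over):

```python
import numpy as np

def ar_aic(y, m):
    """AIC(m) = n(log 2*pi + 1) + n log sigma2_hat + 2(m+1), using the
    conditional MLE that treats the first m values as fixed."""
    y = np.asarray(y, dtype=float)
    n = len(y) - m
    # Column i holds y_{k-i} for each target y_k
    X = np.column_stack([y[m - i : len(y) - i] for i in range(1, m + 1)])
    t = y[m:]
    a, *_ = np.linalg.lstsq(X, t, rcond=None)   # least squares = conditional MLE
    s2 = np.mean((t - X @ a) ** 2)              # MLE of the innovation variance
    return n * (np.log(2 * np.pi) + 1) + n * np.log(s2) + 2 * (m + 1)

# Simulate an AR(2): y_k = 0.6 y_{k-1} - 0.3 y_{k-2} + eps_k
rng = np.random.default_rng(3)
y = np.zeros(1000)
eps = rng.normal(0, 1, 1000)
for k in range(2, 1000):
    y[k] = 0.6 * y[k - 1] - 0.3 * y[k - 2] + eps[k]

best_m = min(range(1, 7), key=lambda m: ar_aic(y, m))   # typically 2
```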

30 Application to Canadian Lynx Data

This corresponds to the logarithms of the annual numbers of lynx trapped from 1821 to 1934; $n = 114$ observations. We try 20 candidate models, and $m = 11$ is selected using AIC.

31 Detection of Structural Changes

We sometimes encounter the situation in which the stochastic structure of the data changes at a certain time or location. Let us start with a simple model where
$$Y_n \sim \mathcal{N}(\mu_n, \sigma^2),\qquad \mu_n = \begin{cases}\theta_1 & \text{if } n < k,\\ \theta_2 & \text{if } n \ge k,\end{cases}$$
where $k$ is the unknown so-called changepoint. We have $N$ data and, for the model with changepoint $k$, the likelihood is
$$L(\theta_1, \theta_2, \sigma^2) = \prod_{n=1}^{k-1}\mathcal{N}\big(y_n;\theta_1,\sigma^2\big)\prod_{n=k}^{N}\mathcal{N}\big(y_n;\theta_2,\sigma^2\big).$$

32 It can be easily established that
$$\hat\theta_1 = \frac{1}{k-1}\sum_{n=1}^{k-1} y_n,\qquad \hat\theta_2 = \frac{1}{N-k+1}\sum_{n=k}^{N} y_n,\qquad
\hat\sigma^2 = \frac{1}{N}\left\{\sum_{n=1}^{k-1}\big(y_n-\hat\theta_1\big)^2 + \sum_{n=k}^{N}\big(y_n-\hat\theta_2\big)^2\right\}.$$
So we have the maximum log-likelihood
$$l\big(\hat\theta_1, \hat\theta_2, \hat\sigma^2\big) = -\frac{N}{2}\log 2\pi\hat\sigma^2 - \frac{N}{2},$$
where we emphasize that $\hat\sigma^2$ is a function of $k$. The AIC criterion is given by
$$\mathrm{AIC} = N\log 2\pi\hat\sigma^2 + N$$
(up to the parameter-count term, which is the same for every $k$), and the changepoint $k$ can be automatically determined by finding the value of $k$ that gives the smallest AIC.
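The changepoint search can be sketched as follows (assuming NumPy; the synthetic mean-shift series is illustrative). For each candidate $k$ we compute $\hat\sigma^2(k)$ and the AIC up to the $k$-independent parameter-count term:

```python
import numpy as np

def changepoint_aic(y, k):
    """AIC (up to a k-independent constant) of the model with mean theta1
    for n < k and theta2 for n >= k; k is 1-based as in the slides."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    y1, y2 = y[: k - 1], y[k - 1 :]
    rss = np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2)
    s2 = rss / N                       # MLE sigma2_hat, a function of k
    return N * np.log(2 * np.pi * s2) + N

# Synthetic series with a mean shift at position 61 (1-based)
rng = np.random.default_rng(4)
y = np.concatenate([rng.normal(0, 1, 60), rng.normal(4, 1, 40)])
k_hat = min(range(2, len(y)), key=lambda k: changepoint_aic(y, k))
print(k_hat)   # near 61
```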

33 Application to Factor Analysis

Suppose we have $x = (x_1,\ldots,x_p)^{T}$, a vector with mean $\mu$ and variance-covariance matrix $\Sigma$. The factor analysis model is
$$x = \mu + L f + \varepsilon,$$
where $L$ is a $p\times m$ matrix of factor loadings, whereas $f = (f_1,\ldots,f_m)^{T}$ and $\varepsilon = (\varepsilon_1,\ldots,\varepsilon_p)^{T}$ are unobserved random vectors assumed to satisfy
$$\mathbb{E}[f] = 0,\quad \mathrm{Cov}[f] = I_m,\quad \mathbb{E}[\varepsilon] = 0,\quad \mathrm{Cov}[\varepsilon] = \Psi = \mathrm{diag}(\Psi_1,\ldots,\Psi_p),\quad \mathrm{Cov}[f,\varepsilon] = 0.$$
It then follows that $\Sigma = L L^{T} + \Psi$.

34 Let $S$ denote the sample covariance. Under a normal assumption for $f$ and $\varepsilon$, it can be shown that the ML estimates of $L$ and $\Psi$ can be obtained by minimizing
$$\log|\Sigma| - \log|S| + \mathrm{tr}\big(\Sigma^{-1} S\big) - p$$
subject to $L^{T} \Psi^{-1} L$ being a diagonal matrix. In this case, we have
$$\mathrm{AIC}(m) = n\left\{p\log(2\pi) + \log\big|\hat\Sigma\big| + \mathrm{tr}\big(\hat\Sigma^{-1} S\big)\right\} + 2\left\{p(m+1) - \frac{m(m-1)}{2}\right\}.$$


Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

1 The Well Ordering Principle, Induction, and Equivalence Relations

1 The Well Ordering Principle, Induction, and Equivalence Relations 1 The Well Ordering Principle, Induction, and Equivalence Relations The set of natural numbers is the set N = f1; 2; 3; : : :g. (Some authors also include the number 0 in the natural numbers, but number

More information

Gaussian and Linear Discriminant Analysis; Multiclass Classification

Gaussian and Linear Discriminant Analysis; Multiclass Classification Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015

More information

xtunbalmd: Dynamic Binary Random E ects Models Estimation with Unbalanced Panels

xtunbalmd: Dynamic Binary Random E ects Models Estimation with Unbalanced Panels : Dynamic Binary Random E ects Models Estimation with Unbalanced Panels Pedro Albarran* Raquel Carrasco** Jesus M. Carro** *Universidad de Alicante **Universidad Carlos III de Madrid 2017 Spanish Stata

More information

Theory of Maximum Likelihood Estimation. Konstantin Kashin

Theory of Maximum Likelihood Estimation. Konstantin Kashin Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

10. Linear Models and Maximum Likelihood Estimation

10. Linear Models and Maximum Likelihood Estimation 10. Linear Models and Maximum Likelihood Estimation ECE 830, Spring 2017 Rebecca Willett 1 / 34 Primary Goal General problem statement: We observe y i iid pθ, θ Θ and the goal is to determine the θ that

More information

The problem is to infer on the underlying probability distribution that gives rise to the data S.

The problem is to infer on the underlying probability distribution that gives rise to the data S. Basic Problem of Statistical Inference Assume that we have a set of observations S = { x 1, x 2,..., x N }, xj R n. The problem is to infer on the underlying probability distribution that gives rise to

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1

Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1 Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the

More information

Introduction: structural econometrics. Jean-Marc Robin

Introduction: structural econometrics. Jean-Marc Robin Introduction: structural econometrics Jean-Marc Robin Abstract 1. Descriptive vs structural models 2. Correlation is not causality a. Simultaneity b. Heterogeneity c. Selectivity Descriptive models Consider

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

Bias-Variance Tradeoff

Bias-Variance Tradeoff What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff

More information

9.1 Orthogonal factor model.

9.1 Orthogonal factor model. 36 Chapter 9 Factor Analysis Factor analysis may be viewed as a refinement of the principal component analysis The objective is, like the PC analysis, to describe the relevant variables in study in terms

More information

AGEC 661 Note Fourteen

AGEC 661 Note Fourteen AGEC 661 Note Fourteen Ximing Wu 1 Selection bias 1.1 Heckman s two-step model Consider the model in Heckman (1979) Y i = X iβ + ε i, D i = I {Z iγ + η i > 0}. For a random sample from the population,

More information

AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET. Questions AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET

AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET. Questions AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET The Problem Identification of Linear and onlinear Dynamical Systems Theme : Curve Fitting Division of Automatic Control Linköping University Sweden Data from Gripen Questions How do the control surface

More information

Testing composite hypotheses applied to AR-model order estimation; the Akaike-criterion revised

Testing composite hypotheses applied to AR-model order estimation; the Akaike-criterion revised MODDEMEIJER: TESTING COMPOSITE HYPOTHESES; THE AKAIKE-CRITERION REVISED 1 Testing composite hypotheses applied to AR-model order estimation; the Akaike-criterion revised Rudy Moddemeijer Abstract Akaike

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

1. The OLS Estimator. 1.1 Population model and notation

1. The OLS Estimator. 1.1 Population model and notation 1. The OLS Estimator OLS stands for Ordinary Least Squares. There are 6 assumptions ordinarily made, and the method of fitting a line through data is by least-squares. OLS is a common estimation methodology

More information

Estimation of Dynamic Regression Models

Estimation of Dynamic Regression Models University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Lecture 32: Asymptotic confidence sets and likelihoods

Lecture 32: Asymptotic confidence sets and likelihoods Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence

More information

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions.

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions. Large Sample Theory Study approximate behaviour of ˆθ by studying the function U. Notice U is sum of independent random variables. Theorem: If Y 1, Y 2,... are iid with mean µ then Yi n µ Called law of

More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

Generalized Method of Moments (GMM) Estimation

Generalized Method of Moments (GMM) Estimation Econometrics 2 Fall 2004 Generalized Method of Moments (GMM) Estimation Heino Bohn Nielsen of29 Outline of the Lecture () Introduction. (2) Moment conditions and methods of moments (MM) estimation. Ordinary

More information

W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS

W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS 1 W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS An Liu University of Groningen Henk Folmer University of Groningen Wageningen University Han Oud Radboud

More information

Heteroskedasticity; Step Changes; VARMA models; Likelihood ratio test statistic; Cusum statistic.

Heteroskedasticity; Step Changes; VARMA models; Likelihood ratio test statistic; Cusum statistic. 47 3!,57 Statistics and Econometrics Series 5 Febrary 24 Departamento de Estadística y Econometría Universidad Carlos III de Madrid Calle Madrid, 126 2893 Getafe (Spain) Fax (34) 91 624-98-49 VARIANCE

More information

Probability on a Riemannian Manifold

Probability on a Riemannian Manifold Probability on a Riemannian Manifold Jennifer Pajda-De La O December 2, 2015 1 Introduction We discuss how we can construct probability theory on a Riemannian manifold. We make comparisons to this and

More information

Topic 4: Model Specifications

Topic 4: Model Specifications Topic 4: Model Specifications Advanced Econometrics (I) Dong Chen School of Economics, Peking University 1 Functional Forms 1.1 Redefining Variables Change the unit of measurement of the variables will

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information