Lecture Stat Information Criterion
1 Lecture Stat: Information Criterion. Arnaud Doucet, February 2008.
2 Review of Maximum Likelihood Approach

We have data $X_i \overset{\text{i.i.d.}}{\sim} g(x)$. We model the distribution of these data by a parametric model $\{f(x|\theta);\ \theta \in \Theta \subseteq \mathbb{R}^p\}$. We estimate $\theta$ by ML; that is,
$$\hat{\theta} = \arg\max_{\theta \in \Theta} l(\theta), \quad \text{where } l(\theta) := \sum_{k=1}^n \log f(X_k|\theta).$$
We know that as $n \to \infty$, $\hat{\theta}$ converges towards
$$\theta^* = \arg\min_{\theta \in \Theta} \mathrm{KL}(g(x); f(x|\theta)),$$
where
$$\mathrm{KL}(g(x); f(x|\theta)) := \int g(x) \log \frac{g(x)}{f(x|\theta)}\, dx.$$
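As a concrete illustration of the MLE maximizing $l(\theta)$, here is a minimal sketch with invented numbers (a Gaussian model fitted to Gaussian data; the true parameters, sample size, and seed are not from the slides):

```python
import math
import random

random.seed(0)

# Hypothetical example (not from the slides): data simulated from
# g = N(2, 1.5^2), fitted with the Gaussian model f(x | mu, sigma2).
data = [random.gauss(2.0, 1.5) for _ in range(5000)]
n = len(data)

def log_lik(mu, sigma2):
    """l(theta) = sum_k log f(X_k | theta) for the N(mu, sigma2) model."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in data)

# For this model the MLE is available in closed form.
mu_hat = sum(data) / n
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n

# The MLE maximizes l(theta): perturbing it can only lower the log-likelihood.
assert log_lik(mu_hat, sigma2_hat) >= log_lik(mu_hat + 0.1, sigma2_hat)
assert log_lik(mu_hat, sigma2_hat) >= log_lik(mu_hat, 1.1 * sigma2_hat)
```

Since here the model contains the true $g$, the MLE also converges to the minimizer of the KL divergence, namely the true mean and variance.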
3 A Central Limit Theorem

Assuming the MLE is consistent, a Taylor expansion of the score around $\theta^*$ gives
$$0 = \left.\frac{\partial l(\theta)}{\partial \theta}\right|_{\hat{\theta}} = \left.\frac{\partial l(\theta)}{\partial \theta}\right|_{\theta^*} + \left.\frac{\partial^2 l(\theta)}{\partial \theta\, \partial \theta^T}\right|_{\theta^*} \left(\hat{\theta} - \theta^*\right) + \cdots$$
The law of large numbers yields
$$-\frac{1}{n} \left.\frac{\partial^2 l(\theta)}{\partial \theta\, \partial \theta^T}\right|_{\theta^*} \to J(\theta^*) := -\int g(x) \left.\frac{\partial^2 \log f(x|\theta)}{\partial \theta\, \partial \theta^T}\right|_{\theta^*} dx.$$
The CLT yields
$$\frac{1}{\sqrt{n}} \left.\frac{\partial l(\theta)}{\partial \theta}\right|_{\theta^*} \overset{D}{\to} \mathcal{N}\left(0, I(\theta^*)\right),$$
where
$$I(\theta^*) = \int g(x) \left.\frac{\partial \log f(x|\theta)}{\partial \theta} \frac{\partial \log f(x|\theta)}{\partial \theta^T}\right|_{\theta^*} dx.$$
4 Using Slutsky's theorem we have
$$\sqrt{n}\left(\hat{\theta} - \theta^*\right) \overset{D}{\to} \mathcal{N}\left(0,\ J^{-1}(\theta^*)\, I(\theta^*)\, J^{-1}(\theta^*)\right).$$
This is sometimes known as the sandwich variance. When $g(x) = f(x|\theta^*)$, we have $J(\theta^*) = I(\theta^*)$ and the standard CLT follows:
$$\sqrt{n}\left(\hat{\theta} - \theta^*\right) \overset{D}{\to} \mathcal{N}\left(0, I^{-1}(\theta^*)\right).$$
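The sandwich variance can be seen numerically in a deliberately misspecified toy example. The following sketch is not from the slides; the true distribution, the model, and all constants are invented for illustration. We fit the model $\mathcal{N}(\theta, 1)$ to Exponential data with mean 2, for which $J(\theta^*) = 1$, $I(\theta^*) = \mathrm{Var}_g(X) = 4$, so the sandwich variance is 4 while the naive variance $J^{-1}(\theta^*)$ is 1:

```python
import math
import random

random.seed(1)

# Hypothetical setup (not from the slides): true g is Exponential with
# mean 2 (variance 4); the fitted, misspecified model is N(theta, 1).
# Then J(theta*) = 1 and I(theta*) = 4, so J^-1 I J^-1 = 4, not 1.
n, reps = 200, 2000
theta_star = 2.0                       # argmin KL = mean of g
z = []
for _ in range(reps):
    x = [random.expovariate(0.5) for _ in range(n)]
    theta_hat = sum(x) / n             # MLE of theta in the N(theta, 1) model
    z.append(math.sqrt(n) * (theta_hat - theta_star))

mean_z = sum(z) / reps
emp_var = sum((v - mean_z) ** 2 for v in z) / reps
# emp_var should be near the sandwich value 4, not the naive value 1.
```

The empirical variance of $\sqrt{n}(\hat{\theta} - \theta^*)$ matches the sandwich formula; the naive formula would understate it by a factor of four here.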
5 Evaluating the Statistical Model

We want to compare our model $f(x|\hat{\theta})$ to $g(x)$. One way consists of evaluating
$$\mathrm{KL}\left(g(x); f(x|\hat{\theta})\right) = \int g(x) \log g(x)\, dx - \int g(x) \log f(x|\hat{\theta})\, dx,$$
and the larger $\int g(x) \log f(x|\hat{\theta})\, dx$, the closer the model is to the true one. The crucial issue is to obtain an estimate of
$$E_g\left[\log f(X|\hat{\theta})\right] = \int g(x) \log f(x|\hat{\theta})\, dx.$$
Clearly an estimate of $E_g[\log f(X|\hat{\theta})]$ is $n^{-1} l(\hat{\theta})$, and an estimate of $n E_g[\log f(X|\hat{\theta})]$ is $l(\hat{\theta})$.
6 Selecting a Model

In practice, it is difficult to precisely capture the true structure of given phenomena from a limited number of data. Often several candidate statistical models are considered and we want to select the model that most closely approximates the true distribution of the data. Given $m$ candidate models $\{f_i(x|\theta_i);\ i = 1, \ldots, m\}$ and the associated ML estimates $\hat{\theta}_i$, a simple solution would consist of selecting the model $j$ where
$$j = \arg\max_{i \in \{1, \ldots, m\}} l_i(\hat{\theta}_i);$$
i.e. picking the model for which the estimate of $\mathrm{KL}(g(x); f_i(x|\hat{\theta}_i))$ is the smallest.
7 Be Careful!

This approach does not provide a fair comparison of models. The quantity $l_i(\hat{\theta}_i)$ contains a bias as an estimator of $n E_g[\log f_i(X|\hat{\theta}_i)]$. The reason is that the data $X_1, \ldots, X_n$ are used twice: to estimate $\hat{\theta}_i$ and to approximate the expectation with respect to $g$. We will show that the size of the bias depends on the dimension $p$ of the parameter $\theta$.
8 Relationship between log-likelihood and expected log-likelihood

Let us denote by $\theta_i^* = \arg\min_{\theta_i} \mathrm{KL}(g(x); f_i(x|\theta_i))$ the true parameter. We necessarily have
$$E_g\left[\log f_i(X|\hat{\theta}_i)\right] \leq E_g\left[\log f_i(X|\theta_i^*)\right], \quad \text{whereas} \quad l_i(\hat{\theta}_i) \geq l_i(\theta_i^*).$$
To correct for this bias, we want to estimate
$$b_i(g) = E_g\left[l_i(\hat{\theta}_i) - n\, E_g\left[\log f_i(X|\hat{\theta}_i)\right]\right],$$
where, remember,
$$l_i(\hat{\theta}_i) = \sum_{k=1}^n \log f_i(X_k|\hat{\theta}_i) \quad \text{and} \quad \hat{\theta}_i = \hat{\theta}_i(X_{1:n}).$$
9 Information Criterion

The information criterion for model $i$ is defined as
$$\mathrm{IC}(i) = -2 \sum_{k=1}^n \log f_i(X_k|\hat{\theta}_i) + 2\, \{\text{estimator of } b_i(g)\}.$$
$\mathrm{IC}(i)$ is a bias-corrected estimate of minus twice the (rescaled) expected log-likelihood $n E_g[\log f_i(X|\hat{\theta}_i)]$. So given a collection of models, we will select the model
$$j = \arg\min_{i \in \{1, \ldots, m\}} \mathrm{IC}(i).$$
10 Derivation of the Bias

We want to estimate the bias $b(g)$, which can be decomposed as
$$b(g) = \underbrace{E_g\left[l(\hat{\theta}(X_{1:n})) - l(\theta^*)\right]}_{D_1} + \underbrace{E_g\left[l(\theta^*) - n E_g[\log f(X|\theta^*)]\right]}_{D_2} + \underbrace{E_g\left[n E_g[\log f(X|\theta^*)] - n E_g\left[\log f(X|\hat{\theta}(X_{1:n}))\right]\right]}_{D_3}.$$
11 Calculation of the Second Term

We have
$$D_2 = E_g\left[l(\theta^*) - n E_g[\log f(X|\theta^*)]\right] = E_g\left[\sum_{k=1}^n \log f(X_k|\theta^*)\right] - n E_g[\log f(X|\theta^*)] = 0.$$
No bias for this term.
12 Calculation of the Third Term

Let us denote $\eta(\hat{\theta}) = E_g[\log f(X|\hat{\theta})]$ (expectation over $X$ only, with $\hat{\theta}$ held fixed). By performing a Taylor expansion around $\theta^*$, we obtain
$$\eta(\hat{\theta}) = \eta(\theta^*) + \left.\frac{\partial \eta(\theta)}{\partial \theta^T}\right|_{\theta^*} \left(\hat{\theta} - \theta^*\right) + \frac{1}{2} \left(\hat{\theta} - \theta^*\right)^T \left.\frac{\partial^2 \eta(\theta)}{\partial \theta\, \partial \theta^T}\right|_{\theta^*} \left(\hat{\theta} - \theta^*\right) + \cdots$$
As $\left.\partial \eta(\theta)/\partial \theta\right|_{\theta^*} = 0$, then
$$\eta(\hat{\theta}) - \eta(\theta^*) \approx -\frac{1}{2} \left(\hat{\theta} - \theta^*\right)^T J(\theta^*) \left(\hat{\theta} - \theta^*\right),$$
where
$$J(\theta^*) = -\int g(x) \left.\frac{\partial^2 \log f(x|\theta)}{\partial \theta\, \partial \theta^T}\right|_{\theta^*} dx.$$
13 It follows that
$$D_3 = E_g\left[n\left(\eta(\theta^*) - \eta(\hat{\theta})\right)\right] \approx \frac{n}{2}\, E_g\left[\left(\hat{\theta} - \theta^*\right)^T J(\theta^*) \left(\hat{\theta} - \theta^*\right)\right] = \frac{n}{2}\, \mathrm{tr}\left\{J(\theta^*)\, E_g\left[\left(\hat{\theta} - \theta^*\right)\left(\hat{\theta} - \theta^*\right)^T\right]\right\}.$$
From the CLT established previously, we have
$$E_g\left[\left(\hat{\theta} - \theta^*\right)\left(\hat{\theta} - \theta^*\right)^T\right] \approx \frac{1}{n}\, J(\theta^*)^{-1} I(\theta^*) J(\theta^*)^{-1},$$
so
$$D_3 \approx \frac{1}{2}\, \mathrm{tr}\left\{I(\theta^*)\, J(\theta^*)^{-1}\right\}.$$
14 Calculation of the First Term

We have
$$l(\theta^*) = l(\hat{\theta}) + \left.\frac{\partial l(\theta)}{\partial \theta^T}\right|_{\hat{\theta}} \left(\theta^* - \hat{\theta}\right) + \frac{1}{2} \left(\theta^* - \hat{\theta}\right)^T \left.\frac{\partial^2 l(\theta)}{\partial \theta\, \partial \theta^T}\right|_{\hat{\theta}} \left(\theta^* - \hat{\theta}\right) + \cdots,$$
and since the score vanishes at $\hat{\theta}$,
$$l(\theta^*) - l(\hat{\theta}) \approx -\frac{n}{2} \left(\theta^* - \hat{\theta}\right)^T J(\theta^*) \left(\theta^* - \hat{\theta}\right).$$
So
$$D_1 = E_g\left[l(\hat{\theta}(X_{1:n})) - l(\theta^*)\right] \approx \frac{n}{2}\, E_g\left[\left(\hat{\theta} - \theta^*\right)^T J(\theta^*) \left(\hat{\theta} - \theta^*\right)\right] = \frac{1}{2}\, \mathrm{tr}\left\{I(\theta^*)\, J(\theta^*)^{-1}\right\}.$$
15 Estimate of the Bias

We have
$$b(g) = D_1 + D_2 + D_3 \approx \frac{1}{2}\, \mathrm{tr}\left\{I(\theta^*) J(\theta^*)^{-1}\right\} + 0 + \frac{1}{2}\, \mathrm{tr}\left\{I(\theta^*) J(\theta^*)^{-1}\right\} = \mathrm{tr}\left\{I(\theta^*)\, J(\theta^*)^{-1}\right\}.$$
Let $\hat{I}$ and $\hat{J}$ be consistent estimates of $I(\theta^*)$ and $J(\theta^*)$, say
$$\hat{I} = \frac{1}{n} \sum_{k=1}^n \left.\frac{\partial \log f(X_k|\theta)}{\partial \theta} \frac{\partial \log f(X_k|\theta)}{\partial \theta^T}\right|_{\hat{\theta}}, \qquad \hat{J} = -\frac{1}{n} \sum_{k=1}^n \left.\frac{\partial^2 \log f(X_k|\theta)}{\partial \theta\, \partial \theta^T}\right|_{\hat{\theta}};$$
then we can estimate the bias through
$$\hat{b}(g) = \mathrm{tr}\left(\hat{I}\, \hat{J}^{-1}\right).$$
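The bias estimate $\hat{b}(g) = \mathrm{tr}(\hat{I}\hat{J}^{-1})$ can be computed by hand in a one-parameter example. The sketch below is invented for illustration (the model, data, and seed are not from the slides): for $f(x|\theta) = \mathcal{N}(\theta, 1)$, the score is $(x - \theta)$ and the second derivative of $\log f$ is identically $-1$:

```python
import random

random.seed(2)

# One-parameter illustration (invented): model f(x | theta) = N(theta, 1),
# score = (x - theta), second derivative of log f = -1.
x = [random.gauss(0.0, 1.0) for _ in range(10000)]
n = len(x)
theta_hat = sum(x) / n                               # MLE

I_hat = sum((xi - theta_hat) ** 2 for xi in x) / n   # average squared score
J_hat = 1.0                                          # -(1/n) * sum of (-1)
b_hat = I_hat / J_hat                                # tr(I_hat J_hat^{-1}) in 1-D

# Here g is itself N(0, 1), so I(theta*) = J(theta*) = 1 and the estimated
# bias should be close to p = 1, recovering the AIC penalty.
```

When the model is well specified, as here, the estimated bias approaches the parameter dimension $p$, which is exactly the simplification made on the next slide.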
16 Information Criterion

The information criterion for model $i$ is therefore
$$\mathrm{IC}(i) = -2 \sum_{k=1}^n \log f_i(X_k|\hat{\theta}_i) + 2\, \mathrm{tr}\left(\hat{I}_i\, \hat{J}_i^{-1}\right).$$
Assuming $g(x) = f_i(x|\theta_i^*)$, then $I_i(\theta_i^*) = J_i(\theta_i^*)$, so $b_i(g) = p_i$ (where $p_i$ is the dimension of $\theta_i$), and in this case
$$\mathrm{AIC}(i) = -2 \sum_{k=1}^n \log f_i(X_k|\hat{\theta}_i) + 2 p_i.$$
17 Checking the Equality of Two Discrete Distributions

Assume we have two sets of data, each having $k$ categories:

Category:   1,   2,   ..., k
Data set 1: n_1, n_2, ..., n_k
Data set 2: m_1, m_2, ..., m_k

where we have $n_1 + \cdots + n_k = n$ observations for the first dataset and $m_1 + \cdots + m_k = m$ for the second. We assume these datasets follow the multinomial distributions
$$p(n_1, \ldots, n_k | p_1, \ldots, p_k) = \frac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_k^{n_k}, \qquad p(m_1, \ldots, m_k | q_1, \ldots, q_k) = \frac{m!}{m_1! \cdots m_k!}\, q_1^{m_1} \cdots q_k^{m_k}.$$
We want to check whether $(p_1, \ldots, p_k) \neq (q_1, \ldots, q_k)$ or $(p_1, \ldots, p_k) = (q_1, \ldots, q_k)$.
18 Assume the two distributions are different. Then
$$l(p_{1:k}, q_{1:k}) = \log n! - \sum_{j=1}^k \log n_j! + \sum_{j=1}^k n_j \log p_j + \log m! - \sum_{j=1}^k \log m_j! + \sum_{j=1}^k m_j \log q_j.$$
We have the MLE
$$\hat{p}_j = \frac{n_j}{n}, \qquad \hat{q}_j = \frac{m_j}{m}.$$
So
$$l(\hat{p}_{1:k}, \hat{q}_{1:k}) = C + \sum_{j=1}^k \left(n_j \log \frac{n_j}{n} + m_j \log \frac{m_j}{m}\right),$$
with $C = \log n! + \log m! - \sum_{j=1}^k (\log n_j! + \log m_j!)$, and, since this model has $2(k-1)$ free parameters,
$$\mathrm{AIC} = -2\left\{C + \sum_{j=1}^k \left(n_j \log \frac{n_j}{n} + m_j \log \frac{m_j}{m}\right)\right\} + 4(k - 1).$$
19 If we assume the two distributions are equal, then
$$l(r_{1:k}) = C + \sum_{j=1}^k (n_j + m_j) \log r_j,$$
and we have the MLE
$$\hat{r}_j = \frac{n_j + m_j}{n + m}.$$
So
$$l(\hat{r}_{1:k}) = C + \sum_{j=1}^k (n_j + m_j) \log \frac{n_j + m_j}{n + m},$$
and, since this model has $k - 1$ free parameters,
$$\mathrm{AIC} = -2\left\{C + \sum_{j=1}^k (n_j + m_j) \log \frac{n_j + m_j}{n + m}\right\} + 2(k - 1).$$
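The two AIC formulas above (dropping the common constant $C$) are easy to implement. The sketch below uses invented counts, not the data of the slides, to show the comparison mechanics:

```python
import math

def aic_two_samples(ns, ms):
    """AIC of the 'different distributions' and 'equal distributions'
    multinomial models, dropping the constant C common to both."""
    n, m, k = sum(ns), sum(ms), len(ns)
    ll_diff = (sum(nj * math.log(nj / n) for nj in ns if nj > 0)
               + sum(mj * math.log(mj / m) for mj in ms if mj > 0))
    ll_eq = sum((nj + mj) * math.log((nj + mj) / (n + m))
                for nj, mj in zip(ns, ms) if nj + mj > 0)
    aic_diff = -2 * ll_diff + 4 * (k - 1)   # 2(k-1) free parameters
    aic_eq = -2 * ll_eq + 2 * (k - 1)       # (k-1) free parameters
    return aic_diff, aic_eq

# Hypothetical counts for k = 3 categories (invented for illustration).
a_diff, a_eq = aic_two_samples([30, 50, 20], [28, 52, 20])
# With nearly identical empirical frequencies, the 'equal' model wins.
```

With these counts the fit gained by allowing two separate distributions does not pay for the extra $k - 1$ parameters, so the pooled model has the smaller AIC.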
20 Example

Consider the following data. [Table: observed counts for each category in data sets 1 and 2; the numerical entries are not recoverable from this transcription.] From this table we can deduce the estimates $\hat{p}_j$, $\hat{q}_j$ and $\hat{r}_j$ for each category.
21 Ignoring the constant $C$, the maximum log-likelihoods of the models are
$$\text{Model 1:} \quad \sum_{j=1}^k \left(n_j \log \frac{n_j}{n} + m_j \log \frac{m_j}{m}\right), \qquad \text{Model 2:} \quad \sum_{j=1}^k (n_j + m_j) \log \frac{n_j + m_j}{n + m}.$$
The number of free parameters is $2(k - 1) = 8$ in Model 1 and $k - 1 = 4$ in Model 2. The AIC values for Model 1 and Model 2 are respectively 9150.73 and 9179.22, so AIC selects Model 1.
22 Determining the Number of Bins of a Histogram

Histograms are used for representing the properties of a set of observations obtained from either a discrete or continuous distribution. Assume we have a histogram $\{n_1, n_2, \ldots, n_k\}$, where $k$ is referred to as the number of bins. If $k$ is too large or too small, then it is difficult to capture the characteristics of the true distribution. We can think of a histogram as a model specified by a multinomial distribution
$$p(n_1, \ldots, n_k | p_1, \ldots, p_k) = \frac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_k^{n_k},$$
where $n_1 + \cdots + n_k = n$ and $p_1 + \cdots + p_k = 1$.
23 In this case, we have seen that the MLE is $\hat{p}_j = n_j / n$ and
$$l(\hat{p}_{1:k}) = C + \sum_{j=1}^k n_j \log \frac{n_j}{n}, \quad \text{with } C = \log n! - \sum_{j=1}^k \log n_j!.$$
We have $k - 1$ free parameters, so
$$\mathrm{AIC} = -2\left\{C + \sum_{j=1}^k n_j \log \frac{n_j}{n}\right\} + 2(k - 1).$$
We want to compare this model to models with lower resolution.
24 To compare the histogram model with a simpler one, we may assume $p_{2j-1} = p_{2j}$ for $j = 1, \ldots, m$, where for the sake of simplicity we consider $k = 2m$. In this case, the new MLE is
$$\hat{p}_{2j-1} = \hat{p}_{2j} = \frac{n_{2j-1} + n_{2j}}{2n},$$
and the AIC is given by
$$\mathrm{AIC} = -2\left\{C + \sum_{j=1}^m (n_{2j-1} + n_{2j}) \log \frac{n_{2j-1} + n_{2j}}{2n}\right\} + 2(m - 1).$$
Similarly we can consider the coarser histogram with $m$ bins where $k = 4m$, and so on. We apply this to the galaxy data, for which AIC selects 14 bins. [Table: number of bins, maximum log-likelihood and AIC for each candidate; the numerical entries are not recoverable from this transcription.]
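The fine-versus-coarse histogram comparison can be sketched on simulated data. Everything below (the sampled density, $n$, $k$, and the seed) is invented for illustration and is not the galaxy data of the slides:

```python
import math
import random

random.seed(3)

# Simulated sample on [0, 1): sqrt of a uniform has density 2t.
x = [random.random() ** 0.5 for _ in range(2000)]
k = 16
counts = [0] * k
for v in x:
    counts[min(int(v * k), k - 1)] += 1
n = len(x)

def aic_full(c):
    """AIC (dropping the constant C) of the k-bin multinomial model."""
    ll = sum(cj * math.log(cj / n) for cj in c if cj > 0)
    return -2 * ll + 2 * (len(c) - 1)

def aic_halved(c):
    """AIC of the coarser model with p_{2j-1} = p_{2j}, i.e. k = 2m."""
    m = len(c) // 2
    ll = sum((c[2*j] + c[2*j+1]) * math.log((c[2*j] + c[2*j+1]) / (2 * n))
             for j in range(m) if c[2*j] + c[2*j+1] > 0)
    return -2 * ll + 2 * (m - 1)

a16, a8 = aic_full(counts), aic_halved(counts)
# Whichever value is smaller indicates the preferred resolution
# for this particular sample.
```

The constant $C$ cancels because both models describe the same $k$-bin counts; only the fitted cell probabilities and the parameter counts differ.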
25 Application to Polynomial Regression

We are given $n$ observations $\{x_i, y_i\}$. We want to use the following model:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2),$$
where $k$ is unknown. We have $\theta = (\beta_{0:k}, \sigma^2)$ and
$$l(\theta) = -\frac{n}{2} \log 2\pi\sigma^2 - \frac{1}{2\sigma^2} \sum_{j=1}^n \left(y_j - \sum_{m=0}^k \beta_m x_j^m\right)^2.$$
We can easily establish that
$$l(\hat{\theta}) = -\frac{n}{2} \log 2\pi\hat{\sigma}^2 - \frac{n}{2}.$$
26 Experimental Results

For the polynomial model of order $k$ we have $k + 2$ unknown parameters, so
$$\mathrm{AIC}(k) = n(\log 2\pi + 1) + n \log \hat{\sigma}_k^2 + 2(k + 2).$$
[Table: order $k$, maximum log-likelihood $l(\hat{\theta}_k)$ and AIC for each candidate; the numerical entries are not recoverable from this transcription.] AIC selects the model of order 4.
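The order-selection procedure can be sketched end to end on synthetic data. The data-generating polynomial, noise level, sample size, and seed below are invented for illustration, not the dataset of the slides:

```python
import math
import random

random.seed(4)

# Synthetic data from a cubic with Gaussian noise (invented example).
n = 300
xs = [random.uniform(-1, 1) for _ in range(n)]
ys = [1.0 - 2.0 * x + 0.5 * x ** 3 + random.gauss(0, 0.2) for x in xs]

def solve(A, b):
    """Solve A t = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, k):
            f = M[r][i] / M[i][i]
            for c in range(i, k + 1):
                M[r][c] -= f * M[i][c]
    t = [0.0] * k
    for i in reversed(range(k)):
        t[i] = (M[i][k] - sum(M[i][j] * t[j] for j in range(i + 1, k))) / M[i][i]
    return t

def aic_poly(order):
    """Least-squares fit of a degree-`order` polynomial and its
    AIC(k) = n(log 2 pi + 1) + n log sigma2_hat + 2(k + 2)."""
    X = [[x ** j for j in range(order + 1)] for x in xs]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(order + 1)]
           for i in range(order + 1)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(order + 1)]
    beta = solve(XtX, Xty)
    resid = [y - sum(b * x ** j for j, b in enumerate(beta))
             for x, y in zip(xs, ys)]
    sigma2 = sum(e * e for e in resid) / n
    return (n * (math.log(2 * math.pi) + 1) + n * math.log(sigma2)
            + 2 * (order + 2))

aics = {k: aic_poly(k) for k in range(7)}
best = min(aics, key=aics.get)
# For this synthetic cubic, the selected order should be at least 3.
```

The normal equations are solved directly here for self-containment; any least-squares routine would do.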
27 Application to Daily Temperature Data

Daily minimum temperatures $y_i$ in January, averaged from 1971 through 2000, for city $i$; $x_{i1}$ is the latitude, $x_{i2}$ the longitude and $x_{i3}$ the altitude. A standard model is
$$y_i = a_0 + a_1 x_{i1} + a_2 x_{i2} + a_3 x_{i3} + \varepsilon_i,$$
where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$. But we would also like to compare models where some of the explanatory variables are omitted.
28 Results

We have
$$\mathrm{AIC}(\text{model}) = n(\log 2\pi + 1) + n \log \hat{\sigma}^2 + 2\, (\text{nb. of explanatory var.} + 2).$$
[Table: $\hat{\sigma}^2$ and AIC for the models built from the subsets $\{x_1, x_3\}$, $\{x_1, x_2, x_3\}$, $\{x_1, x_2\}$, $\{x_1\}$, $\{x_2, x_3\}$, $\{x_2\}$, $\{x_3\}$; the numerical entries are not recoverable from this transcription.] We select the model
$$y_i = \hat{a}_0 + \hat{a}_1 x_{i1} + \hat{a}_3 x_{i3} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, 1.49),$$
with the fitted coefficient values lost in this transcription.
29 Selection of the Order of an Autoregressive Model

We have the following model:
$$y_k = \sum_{i=1}^m a_i y_{k-i} + \varepsilon_k, \qquad \varepsilon_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2).$$
Assuming we have data $y_1, y_2, \ldots, y_n$ and that $y_0, y_{-1}, \ldots, y_{1-m}$ are deterministic, we can easily obtain the MLE and check that
$$l(\hat{\theta}) = -\frac{n}{2} \log 2\pi\hat{\sigma}^2 - \frac{n}{2}.$$
Since the model of order $m$ has $m + 1$ unknown parameters,
$$\mathrm{AIC}(m) = n(\log 2\pi + 1) + n \log \hat{\sigma}_m^2 + 2(m + 1).$$
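AR order selection by AIC can be sketched with stdlib tools only. Everything below is invented for illustration (the simulated AR(2), $n$, and the seed), and Yule-Walker estimation via the Levinson-Durbin recursion is used as a convenient stand-in for the conditional MLE of the slides; the two estimators are asymptotically equivalent:

```python
import math
import random

random.seed(5)

# Simulate an AR(2): y_k = 0.6 y_{k-1} - 0.3 y_{k-2} + eps_k (invented).
n = 1000
y = [0.0, 0.0]
for _ in range(n):
    y.append(0.6 * y[-1] - 0.3 * y[-2] + random.gauss(0, 1))
y = y[2:]

def autocov(y, lag):
    """Biased sample autocovariance (divisor n keeps the sequence p.s.d.)."""
    mu = sum(y) / len(y)
    return sum((y[t] - mu) * (y[t + lag] - mu)
               for t in range(len(y) - lag)) / len(y)

def levinson(r, order):
    """Levinson-Durbin: AR coefficients and innovation variance
    from autocovariances r[0..order]."""
    a, v = [], r[0]
    for m in range(1, order + 1):
        k = (r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))) / v
        a = [aj - k * a[m - 2 - j] for j, aj in enumerate(a)] + [k]
        v *= (1 - k * k)
    return a, v

r = [autocov(y, lag) for lag in range(8)]
aics = {m: n * (math.log(2 * math.pi) + 1)
           + n * math.log(levinson(r, m)[1]) + 2 * (m + 1)
        for m in range(8)}
best = min(aics, key=aics.get)
# For this simulated AR(2), the selected order should be at least 2.
```

The innovation variance returned by the recursion plays the role of $\hat{\sigma}_m^2$ in the AIC formula above.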
30 Application to Canadian Lynx Data

This corresponds to the logarithms of the annual numbers of lynx trapped from 1821 to 1934; $n = 114$ observations. We try 20 candidate models and $m = 11$ is selected using AIC.
31 Detection of Structural Changes

We sometimes encounter the situation in which the stochastic structure of the data changes at a certain time or location. Let us start with a simple model where
$$Y_n \sim \mathcal{N}(\mu_n, \sigma^2), \qquad \mu_n = \begin{cases} \theta_1 & \text{if } n < k, \\ \theta_2 & \text{if } n \geq k, \end{cases}$$
where $k$ is the unknown so-called changepoint. We have $N$ data and, for the model with changepoint $k$, the likelihood is
$$L(\theta_1, \theta_2, \sigma^2) = \prod_{n=1}^{k-1} \mathcal{N}(y_n; \theta_1, \sigma^2) \prod_{n=k}^{N} \mathcal{N}(y_n; \theta_2, \sigma^2).$$
32 It can be easily established that
$$\hat{\theta}_1 = \frac{1}{k-1} \sum_{n=1}^{k-1} y_n, \qquad \hat{\theta}_2 = \frac{1}{N-k+1} \sum_{n=k}^{N} y_n, \qquad \hat{\sigma}^2 = \frac{1}{N}\left\{\sum_{n=1}^{k-1} \left(y_n - \hat{\theta}_1\right)^2 + \sum_{n=k}^{N} \left(y_n - \hat{\theta}_2\right)^2\right\}.$$
So we have the maximum log-likelihood given by
$$l\left(\hat{\theta}_1, \hat{\theta}_2, \hat{\sigma}^2\right) = -\frac{N}{2} \log 2\pi\hat{\sigma}^2 - \frac{N}{2},$$
where we emphasize that $\hat{\sigma}^2$ is a function of $k$. The AIC criterion is given by
$$\mathrm{AIC} = N \log 2\pi\hat{\sigma}^2 + N,$$
up to a penalty term that is the same for every candidate $k$ (the number of continuous parameters does not change with $k$), and the changepoint $k$ can be automatically determined by finding the value of $k$ that gives the smallest AIC.
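Because the parameter count is the same for every candidate $k$, minimizing AIC over $k$ reduces to minimizing $\hat{\sigma}^2(k)$. A minimal sketch on synthetic data (the shift size, $N$, the true changepoint, and the seed are all invented for illustration):

```python
import random

random.seed(6)

# Synthetic series: mean 0 before the changepoint, mean 1.5 from it on
# (invented example; N = 100, true changepoint at n = 60, sigma = 1).
N, true_k = 100, 60
y = ([random.gauss(0.0, 1.0) for _ in range(true_k - 1)]
     + [random.gauss(1.5, 1.0) for _ in range(N - true_k + 1)])

def sigma2_hat(k):
    """ML variance when the mean switches at position k (1-indexed):
    pooled sums of squares around the two segment means, divided by N."""
    seg1, seg2 = y[:k - 1], y[k - 1:]
    t1, t2 = sum(seg1) / len(seg1), sum(seg2) / len(seg2)
    return (sum((v - t1) ** 2 for v in seg1)
            + sum((v - t2) ** 2 for v in seg2)) / N

# AIC(k) = N log(2 pi sigma2_hat(k)) + N + const, so minimizing AIC over
# the candidate changepoints amounts to minimizing sigma2_hat(k).
k_hat = min(range(2, N + 1), key=sigma2_hat)
```

The scan over $k$ costs $O(N^2)$ as written; running sums would reduce it to $O(N)$, but the quadratic version keeps the correspondence with the formulas above obvious.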
33 Application to Factor Analysis

Suppose we have $x = (x_1, \ldots, x_p)^T$, a vector with mean $\mu$ and variance-covariance matrix $\Sigma$. The factor analysis model is
$$x = \mu + L f + \varepsilon,$$
where $L$ is a $p \times m$ matrix of factor loadings, whereas $f = (f_1, \ldots, f_m)^T$ and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_p)^T$ are unobserved random vectors assumed to satisfy
$$E[f] = 0, \quad \mathrm{Cov}[f] = I_m, \quad E[\varepsilon] = 0, \quad \mathrm{Cov}[\varepsilon] = \Psi = \mathrm{diag}(\Psi_1, \ldots, \Psi_p), \quad \mathrm{Cov}[f, \varepsilon] = 0.$$
It then follows that $\Sigma = L L^T + \Psi$.
34 Let $S$ denote the sample covariance. Under a normality assumption for $f$ and $\varepsilon$, it can be shown that the ML estimates of $L$ and $\Psi$ can be obtained by minimizing
$$\log |\Sigma| - \log |S| + \mathrm{tr}\left(\Sigma^{-1} S\right) - p$$
subject to $L^T \Psi^{-1} L$ being a diagonal matrix. In this case, we have
$$\mathrm{AIC}(m) = n\left\{p \log(2\pi) + \log |\hat{\Sigma}| + \mathrm{tr}\left(\hat{\Sigma}^{-1} S\right)\right\} + 2\left\{p(m+1) - \frac{1}{2} m(m-1)\right\}.$$
More informationIntroduction: structural econometrics. Jean-Marc Robin
Introduction: structural econometrics Jean-Marc Robin Abstract 1. Descriptive vs structural models 2. Correlation is not causality a. Simultaneity b. Heterogeneity c. Selectivity Descriptive models Consider
More informationSTAT5044: Regression and Anova. Inyoung Kim
STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:
More informationBias-Variance Tradeoff
What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff
More information9.1 Orthogonal factor model.
36 Chapter 9 Factor Analysis Factor analysis may be viewed as a refinement of the principal component analysis The objective is, like the PC analysis, to describe the relevant variables in study in terms
More informationAGEC 661 Note Fourteen
AGEC 661 Note Fourteen Ximing Wu 1 Selection bias 1.1 Heckman s two-step model Consider the model in Heckman (1979) Y i = X iβ + ε i, D i = I {Z iγ + η i > 0}. For a random sample from the population,
More informationAUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET. Questions AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET
The Problem Identification of Linear and onlinear Dynamical Systems Theme : Curve Fitting Division of Automatic Control Linköping University Sweden Data from Gripen Questions How do the control surface
More informationTesting composite hypotheses applied to AR-model order estimation; the Akaike-criterion revised
MODDEMEIJER: TESTING COMPOSITE HYPOTHESES; THE AKAIKE-CRITERION REVISED 1 Testing composite hypotheses applied to AR-model order estimation; the Akaike-criterion revised Rudy Moddemeijer Abstract Akaike
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationA General Overview of Parametric Estimation and Inference Techniques.
A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More information1. The OLS Estimator. 1.1 Population model and notation
1. The OLS Estimator OLS stands for Ordinary Least Squares. There are 6 assumptions ordinarily made, and the method of fitting a line through data is by least-squares. OLS is a common estimation methodology
More informationEstimation of Dynamic Regression Models
University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.
More informationChapter 2: Fundamentals of Statistics Lecture 15: Models and statistics
Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationLecture 32: Asymptotic confidence sets and likelihoods
Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence
More informationFor iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions.
Large Sample Theory Study approximate behaviour of ˆθ by studying the function U. Notice U is sum of independent random variables. Theorem: If Y 1, Y 2,... are iid with mean µ then Yi n µ Called law of
More informationECON 4160, Autumn term Lecture 1
ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least
More informationGeneralized Method of Moments (GMM) Estimation
Econometrics 2 Fall 2004 Generalized Method of Moments (GMM) Estimation Heino Bohn Nielsen of29 Outline of the Lecture () Introduction. (2) Moment conditions and methods of moments (MM) estimation. Ordinary
More informationW-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS
1 W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS An Liu University of Groningen Henk Folmer University of Groningen Wageningen University Han Oud Radboud
More informationHeteroskedasticity; Step Changes; VARMA models; Likelihood ratio test statistic; Cusum statistic.
47 3!,57 Statistics and Econometrics Series 5 Febrary 24 Departamento de Estadística y Econometría Universidad Carlos III de Madrid Calle Madrid, 126 2893 Getafe (Spain) Fax (34) 91 624-98-49 VARIANCE
More informationProbability on a Riemannian Manifold
Probability on a Riemannian Manifold Jennifer Pajda-De La O December 2, 2015 1 Introduction We discuss how we can construct probability theory on a Riemannian manifold. We make comparisons to this and
More informationTopic 4: Model Specifications
Topic 4: Model Specifications Advanced Econometrics (I) Dong Chen School of Economics, Peking University 1 Functional Forms 1.1 Redefining Variables Change the unit of measurement of the variables will
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationLecture 2: Univariate Time Series
Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:
More information