Advanced Quantitative Methods: maximum likelihood
Advanced Quantitative Methods: Maximum Likelihood. University College Dublin, March 23, 2011.
Introduction
Preliminaries

Ordinary least squares and its variations (GLS, WLS, 2SLS, etc.) are flexible and applicable in many circumstances, but for some models (e.g. limited dependent variable models) we need more flexible estimation procedures. The most prominent alternative estimators are maximum likelihood (ML) and Bayesian estimators. This lecture is about the former.
Logarithms

Some rules about logarithms, which are important for understanding ML:

\log(ab) = \log a + \log b
\log e^a = a
\log a^b = b \log a
\log a > \log b \iff a > b
\frac{d(\log a)}{da} = \frac{1}{a}
\frac{d(\log f(a))}{da} = \frac{df(a)/da}{f(a)}
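These rules are easy to verify numerically. A quick check in R (the values 3 and 7 are arbitrary):

```r
a <- 3; b <- 7
stopifnot(all.equal(log(a * b), log(a) + log(b)))  # log(ab) = log a + log b
stopifnot(all.equal(log(exp(a)), a))               # log e^a = a
stopifnot(all.equal(log(a^b), b * log(a)))         # log a^b = b log a
stopifnot(log(a) < log(b))                         # a < b implies log a < log b
h <- 1e-8                                          # numerical derivative of log at a
stopifnot(abs((log(a + h) - log(a)) / h - 1/a) < 1e-6)
```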
Likelihood: intuition

Dice example. Imagine a die is thrown with an unknown number of sides k. We know after throwing the die that the outcome is 5. How likely is this outcome if k = 1? And k = 2? Continue until k = 10. A plot of these likelihoods against k is the likelihood function.
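The dice example can be computed directly: given the observed outcome 5, the likelihood of a fair k-sided die is 1/k when k ≥ 5 and 0 otherwise. A minimal sketch in R:

```r
outcome <- 5
k <- 1:10
lik <- ifelse(k >= outcome, 1/k, 0)  # P(outcome = 5 | fair k-sided die)
round(lik, 3)
# the likelihood is maximized at k = 5, the smallest die consistent with the data
k[which.max(lik)]
```

Plotting lik against k, e.g. with plot(k, lik, type = "h"), gives the likelihood function from the slide.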
Likelihood

"Likelihood is the hypothetical probability that an event that has already occurred would yield a specific outcome. The concept differs from that of a probability in that a probability refers to the occurrence of future events, while a likelihood refers to past events with known outcomes." (Wolfram MathWorld)
Preliminaries

Looking first at just univariate models, we can express the model as the probability density function (PDF) f(y, θ), where y is the dependent variable, θ the set of parameters, and f(·) expresses the model specification. ML is purely parametric: we need to make strong assumptions about the shape of f(·) before we can estimate θ.
Likelihood

Given θ, f(·, θ) is the PDF of y. Given y, f(y, ·) cannot be interpreted as a PDF and is therefore called the likelihood function. The ML estimate θ̂_ML is the value of θ that maximizes this likelihood function. (Davidson & MacKinnon 1999)
(Log)likelihood function

Generally, observations are assumed to be independent, in which case the joint density of the entire sample is the product of the densities of the individual observations:

f(y, \theta) = \prod_{i=1}^n f(y_i, \theta)

It is easier to work with the log of this function, because sums are easier to deal with than products and the logarithmic transformation is monotonic:

\ell(y, \theta) \equiv \log f(y, \theta) = \sum_{i=1}^n \log f_i(y_i, \theta) = \sum_{i=1}^n \ell_i(y_i, \theta)

(Davidson & MacKinnon 1999)
Example: exponential distribution

For example, take the PDF of y to be

f(y_i, \theta) = \theta e^{-\theta y_i}, \quad y_i > 0, \; \theta > 0

This distribution is useful when modeling something which is by definition positive. (Davidson & MacKinnon 1999)
Example: exponential distribution

Deriving the loglikelihood function:

f(y, \theta) = \prod_{i=1}^n f(y_i, \theta) = \prod_{i=1}^n \theta e^{-\theta y_i}

\ell(y, \theta) = \sum_{i=1}^n \log f(y_i, \theta) = \sum_{i=1}^n \log(\theta e^{-\theta y_i}) = \sum_{i=1}^n (\log\theta - \theta y_i) = n \log\theta - \theta \sum_{i=1}^n y_i

(Davidson & MacKinnon 1999)
Maximum likelihood estimates

The estimate of θ is the value of θ which maximizes the (log)likelihood function ℓ(y, θ). In some cases, such as the above exponential distribution or the linear model, we can find this value analytically, by taking the derivative and setting it equal to zero. In many other cases, there is no such analytical solution, and we use numerical search algorithms to find the value.
Example: exponential distribution

To find the maximum, we take the derivative and set it equal to zero:

\ell(y, \theta) = n \log\theta - \theta \sum_{i=1}^n y_i

\frac{d\ell(y, \theta)}{d\theta} = \frac{n}{\theta} - \sum_{i=1}^n y_i

\frac{n}{\hat\theta_{ML}} - \sum_{i=1}^n y_i = 0 \quad\Rightarrow\quad \hat\theta_{ML} = \frac{n}{\sum_{i=1}^n y_i}

(Davidson & MacKinnon 1999)
Numerical methods

In many cases, closed-form solutions like the examples above do not exist and numerical methods are necessary: a computer algorithm searches for the value that maximizes the loglikelihood. In R: optim().
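As a sketch of how this works for the exponential example above, we can minimize the negative loglikelihood with optim() and compare the result with the analytical solution θ̂_ML = n / Σ yᵢ (the simulated data and the Brent method are assumptions for illustration):

```r
set.seed(1)
y <- rexp(200, rate = 2)   # simulated data, true theta = 2

# negative loglikelihood: -(n log theta - theta * sum(y))
negll <- function(theta, y) -(length(y) * log(theta) - theta * sum(y))

fit <- optim(1, negll, y = y, method = "Brent", lower = 1e-6, upper = 100)
fit$par              # numerical MLE
length(y) / sum(y)   # analytical MLE; the two agree
```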
Newton's method

\theta^{(m+1)} = \theta^{(m)} - H_{(m)}^{-1} g_{(m)}

where g is the gradient and H the Hessian of the loglikelihood, both evaluated at θ^(m). (Davidson & MacKinnon 1999, 401)
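For the scalar exponential example this iteration is easy to write out: g(θ) = n/θ − Σyᵢ and H(θ) = −n/θ², giving the update θ ← θ − g/H. A sketch in R (simulated data assumed):

```r
set.seed(1)
y <- rexp(200, rate = 2)
n <- length(y)

theta <- 1                    # starting value
for (m in 1:25) {
  g <- n / theta - sum(y)     # gradient (score)
  H <- -n / theta^2           # Hessian (second derivative)
  theta <- theta - g / H      # Newton step
}
theta             # converges to the analytical MLE n / sum(y)
```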
The normal linear model
Model:

y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I)

We assume ε to be independently distributed and therefore, conditional on X, y is assumed to be as well.

f_i(y_i, \beta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(y_i - X_i\beta)^2}{2\sigma^2}}

(Davidson & MacKinnon 1999)
Loglikelihood function

\ell(y, \beta, \sigma^2) = \sum_{i=1}^n \log f_i(y_i, \beta, \sigma^2) = \sum_{i=1}^n \log\left(\frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(y_i - X_i\beta)^2}{2\sigma^2}}\right)
Loglikelihood function

To zoom in on one part of this function:

\log\frac{1}{\sqrt{2\pi\sigma^2}} = -\log(\sigma\sqrt{2\pi}) = -(\log\sigma + \log\sqrt{2\pi}) = -\tfrac{1}{2}\log\sigma^2 - \tfrac{1}{2}\log 2\pi

using \log\sigma = \tfrac{1}{2}\log\sigma^2.
Loglikelihood function

\ell(y, \beta, \sigma^2) = \sum_{i=1}^n \left(-\tfrac{1}{2}\log\sigma^2 - \tfrac{1}{2}\log 2\pi - \frac{(y_i - X_i\beta)^2}{2\sigma^2}\right)
= -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - X_i\beta)^2
= -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)
Estimating σ²

\frac{\partial \ell(y, \beta, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)'(y - X\beta)

Setting this equal to zero:

0 = -\frac{n}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}(y - X\beta)'(y - X\beta)
\frac{n}{2\hat\sigma^2} = \frac{1}{2\hat\sigma^4}(y - X\beta)'(y - X\beta)
n = \frac{1}{\hat\sigma^2}(y - X\beta)'(y - X\beta)
\hat\sigma^2 = \frac{1}{n}(y - X\beta)'(y - X\beta)

Thus σ̂² is a function of β. (Davidson & MacKinnon 1999)
Estimating σ²

\hat\sigma^2 = \frac{1}{n}(y - X\beta)'(y - X\beta) = \frac{e'e}{n}

Note that this is almost the equivalent of the OLS variance estimator, e'e/(n - k). The ML estimator is a biased, but consistent, estimator of σ².
Estimating β

Substituting σ̂² = (1/n)(y − Xβ)'(y − Xβ) into the loglikelihood gives the concentrated loglikelihood:

\ell(y, \beta, \sigma^2) = -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)

\ell(y, \beta) = -\frac{n}{2}\log\left(\frac{1}{n}(y - X\beta)'(y - X\beta)\right) - \frac{n}{2}\log 2\pi - \frac{(y - X\beta)'(y - X\beta)}{2 \cdot \frac{1}{n}(y - X\beta)'(y - X\beta)}
= -\frac{n}{2}\log\left(\frac{1}{n}(y - X\beta)'(y - X\beta)\right) - \frac{n}{2}\log 2\pi - \frac{n}{2}

Because the only term that depends on β is −n/2 times the log of the SSR (divided by n), maximizing ℓ(y, β) with respect to β is equivalent to minimizing the SSR: β̂_ML = β̂_OLS.
Creating some fake data:

  n <- 100
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  X <- cbind(1, x1, x2)
  y <- X %*% c(1, 2, .5) + rnorm(n, 0, .05)
\ell(y, \beta, \sigma^2) = -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)

  ll <- function(par, X, y) {
    s2 <- exp(par[1])     # first parameter is log(sigma^2)
    beta <- par[-1]
    n <- length(y)
    r <- y - X %*% beta
    n/2 * log(s2) + n/2 * log(2*pi) + 1/(2*s2) * sum(r^2)   # minus the loglikelihood
  }
  mlest <- optim(c(1, 0, 0, 0), ll, NULL, X, y, hessian=TRUE)
Some practical tricks: By default, optim() minimizes rather than maximizes, so we implemented −ℓ(·) rather than ℓ(·). To get σ̂², we need to take the exponential of the first parameter; this parameterization ensures that σ² is always positive.
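Putting the pieces together, a self-contained sketch that re-generates the fake data, runs optim() (using the BFGS method here for a tighter optimum; an assumption, not from the slides) and compares the results with lm():

```r
set.seed(42)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
X <- cbind(1, x1, x2)
y <- X %*% c(1, 2, .5) + rnorm(n, 0, .05)

ll <- function(par, X, y) {          # minus the loglikelihood
  s2 <- exp(par[1]); beta <- par[-1]
  r <- y - X %*% beta
  length(y)/2 * log(s2) + length(y)/2 * log(2*pi) + sum(r^2) / (2*s2)
}
mlest <- optim(c(1, 0, 0, 0), ll, NULL, X = X, y = y,
               method = "BFGS", hessian = TRUE)

beta_ml <- mlest$par[-1]             # ML coefficient estimates
s2_ml <- exp(mlest$par[1])           # ML estimate of sigma^2 (note the exponent)
ols <- lm(y ~ x1 + x2)
cbind(ML = beta_ml, OLS = coef(ols)) # nearly identical
# sigma^2: the ML estimate equals e'e/n; the usual OLS estimator divides by n - k
c(s2_ml, sum(residuals(ols)^2) / n, sum(residuals(ols)^2) / (n - 3))
```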
Gradient, Hessian and information matrix
Gradient vector

The gradient vector or score vector is a vector with typical element

g_j(y, \theta) \equiv \frac{\partial \ell(y, \theta)}{\partial \theta_j},

i.e. the vector of partial derivatives of the loglikelihood function with respect to each parameter in θ. (Davidson & MacKinnon 1999, 400)
Hessian matrix

H(θ) is a k × k matrix with typical element

h_{ij} = \frac{\partial^2 \ell(\theta)}{\partial \theta_i \, \partial \theta_j},

i.e. the matrix of second derivatives of the loglikelihood function. Asymptotic equivalent: \mathcal{H}(\theta) \equiv \operatorname{plim}_{n\to\infty} \frac{1}{n} H(y, \theta). (Davidson & MacKinnon 1999, 401, 407)
Information matrix

I(\theta) = \sum_{i=1}^n E_\theta\!\left(G_i(y, \theta)' \, G_i(y, \theta)\right)

I(θ) is the information matrix, or the covariance matrix of the score vector. There is also an asymptotic equivalent, \mathcal{I}(\theta) \equiv \operatorname{plim}_{n\to\infty} \frac{1}{n} I(\theta). Under standard regularity conditions, \mathcal{I}(\theta) = -\mathcal{H}(\theta): both measure the amount of curvature in the loglikelihood function. (Davidson & MacKinnon 1999)
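The information matrix equality can be checked by simulation for the exponential example: at the true θ the per-observation score is 1/θ − yᵢ, the second derivative is −1/θ², and the variance of the score should equal 1/θ² (a sketch, with θ = 2 chosen for illustration):

```r
set.seed(1)
theta <- 2
y <- rexp(1e5, rate = theta)

score <- 1/theta - y    # per-observation score d log f / d theta at the true theta
mean(score)             # approximately 0 (the expected score is zero)
var(score)              # approximately 1/theta^2 = 0.25 = minus the expected Hessian
```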
Properties of ML estimators
Under some weak conditions, ML estimators are consistent, asymptotically efficient and asymptotically normally distributed. New ML estimators thus do not require extensive proof of these properties: once an estimator is shown to be an ML estimator, it inherits them. (Davidson & MacKinnon 1999)
Variance-covariance matrix of θ̂_ML

V\!\left(\operatorname{plim}_{n\to\infty} \sqrt{n}(\hat\theta_{ML} - \theta)\right) = \mathcal{H}^{-1}(\theta)\,\mathcal{I}(\theta)\,\mathcal{H}^{-1}(\theta) = \mathcal{I}^{-1}(\theta),

the latter equality being true iff \mathcal{I}(\theta) = -\mathcal{H}(\theta) holds. One estimator of the variance is \hat V(\hat\theta_{ML}) = -H^{-1}(\hat\theta_{ML}). For alternative estimators see the reference below. (Davidson & MacKinnon 1999)
Estimating V(θ̂_ML)

  mlest <- optim(..., hessian=TRUE)
  V <- solve(mlest$hessian)
  sqrt(diag(V))

Note that because we implemented −ℓ(·) instead of ℓ(·), optim() returns the Hessian of −ℓ(·), i.e. −H, which is what we invert.
Test statistics
Within the ML framework, there are three common tests:

Likelihood ratio test (LR)
Wald test (W)
Lagrange multiplier test (LM)

These are asymptotically equivalent. Given r restrictions, all have asymptotically a χ²(r) distribution. In the following, θ̃_ML will denote the restricted estimate and θ̂_ML the unrestricted one. (Davidson & MacKinnon 1999, 414)
Likelihood ratio test

LR = 2\left(\ell(\hat\theta_{ML}) - \ell(\tilde\theta_{ML})\right) = 2 \log \frac{L(\hat\theta_{ML})}{L(\tilde\theta_{ML})},

thus twice the log of the ratio of the likelihood functions. If we have the two separate estimates, this is very easy to compute or even eye-ball. (Davidson & MacKinnon 1999)
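For instance, for the vase exercise below (N = 100 draws, q = 30 yellow), an LR test of H₀: p = 0.5 against the unrestricted MLE p̂ = q/N can be computed directly from the two loglikelihoods (a sketch):

```r
N <- 100; q <- 30
loglik <- function(p) q * log(p) + (N - q) * log(1 - p)

p_hat <- q / N       # unrestricted MLE
p_0   <- 0.5         # restricted (hypothesized) value
LR <- 2 * (loglik(p_hat) - loglik(p_0))
LR                                       # about 16.5, well above qchisq(.95, 1) = 3.84
pchisq(LR, df = 1, lower.tail = FALSE)   # p-value: reject H0
```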
[Figure: loglikelihood plotted against θ, marking the restricted and unrestricted estimates; the vertical distance between the two loglikelihood values equals LR/2.]
Wald test

The basic intuition is that the Wald test is quite comparable to an F-test on a set of restrictions. For a single regression parameter the formula would be:

W = (\hat\beta - \beta_0)^2 \left(-\frac{d^2 \ell(\hat\beta)}{d\beta^2}\right)

This test does not require an estimate of the restricted model θ̃_ML. The test is somewhat sensitive to the formulation of the restrictions r, so as long as estimating θ̃_ML is not too costly, it is better to use an LR or LM test. (Davidson & MacKinnon 1999)
Wald test

If the second derivative is larger in absolute value, the slope of ℓ(·) changes faster, and thus the difference between the loglikelihoods at the restricted and unrestricted positions will be larger.
[Figure: loglikelihood plotted against θ, marking the restricted and unrestricted estimates.]
Lagrange multiplier test

For a single parameter:

LM = \left(\frac{d\ell(\tilde\beta)}{d\beta}\right)^2 \left(-\frac{d^2 \ell(\tilde\beta)}{d\beta^2}\right)^{-1}

Thus the steeper the slope of the loglikelihood function at the restricted estimate, the more likely it is significantly different from the unrestricted estimate, unless this slope changes very quickly (the second derivative is large in absolute value).
[Figure: loglikelihood plotted against θ, marking the restricted and unrestricted estimates; the slope at the restricted estimate drives the LM statistic.]
Exercise

Imagine we take a random sample of N colored balls from a vase, with replacement. In the vase, a fraction p of the balls are yellow and the others blue. In our sample, there are q yellow balls. Coding y_i = 1 if ball i is yellow and y_i = 0 if blue, the probability of obtaining our sample is

L(p) = p^q (1 - p)^{N - q},

which is the likelihood function with unknown parameter p. Write down the loglikelihood function:

\ell(p) = \log\left(p^q (1 - p)^{N - q}\right) = q \log p + (N - q) \log(1 - p)

(Verbeek 2008)
Exercise

\ell(p) = q \log p + (N - q) \log(1 - p)

Take the derivative with respect to p:

\frac{d\ell(p)}{dp} = \frac{q}{p} - \frac{N - q}{1 - p}

Set equal to zero and find p̂_ML:

\frac{q}{\hat p} - \frac{N - q}{1 - \hat p} = 0 \quad\Rightarrow\quad \hat p = \frac{q}{N}

(Verbeek 2008)
Exercise

\ell(p) = q \log p + (N - q) \log(1 - p)

Using the loglikelihood function and optim(), find p̂ for N = 100 and q = 30. (Note that log(0) is not possible.)

  ll <- function(p, N, q) {
    if (p > 0 && p < 1) {
      -q * log(p) - (N - q) * log(1 - p)
    } else {
      Inf    # outside (0, 1) the loglikelihood is undefined
    }
  }
  mlest <- optim(0.5, ll, NULL, N=100, q=30, hessian=TRUE)
Exercise

Using the previous estimate, including the Hessian, calculate a 95% confidence interval around this p̂:

  mlest <- optim(0.5, ll, NULL, N=100, q=30, hessian=TRUE)
  se <- sqrt(1/mlest$hessian)
  ci <- c(mlest$par - 1.96 * se, mlest$par + 1.96 * se)

Repeat for N = 1000, q = 300.
Exercise

Using the eaef01.dta data, estimate the model

\log(EARNINGS_i) = \beta_0 + \beta_1 S_i + \beta_2 LIBRARY_i + \varepsilon_i

with the lm() function and with the optim() function. Compare the estimates of β, σ² and V(β̂). Try different optimization algorithms in optim().
Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function
More informationSampling distribution of GLM regression coefficients
Sampling distribution of GLM regression coefficients Patrick Breheny February 5 Patrick Breheny BST 760: Advanced Regression 1/20 Introduction So far, we ve discussed the basic properties of the score,
More informationLinear Methods for Prediction
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationChapter 4. Theory of Tests. 4.1 Introduction
Chapter 4 Theory of Tests 4.1 Introduction Parametric model: (X, B X, P θ ), P θ P = {P θ θ Θ} where Θ = H 0 +H 1 X = K +A : K: critical region = rejection region / A: acceptance region A decision rule
More informationLecture 17: Likelihood ratio and asymptotic tests
Lecture 17: Likelihood ratio and asymptotic tests Likelihood ratio When both H 0 and H 1 are simple (i.e., Θ 0 = {θ 0 } and Θ 1 = {θ 1 }), Theorem 6.1 applies and a UMP test rejects H 0 when f θ1 (X) f
More informationECONOMETRICS I. Assignment 5 Estimation
ECONOMETRICS I Professor William Greene Phone: 212.998.0876 Office: KMC 7-90 Home page: people.stern.nyu.edu/wgreene Email: wgreene@stern.nyu.edu Course web page: people.stern.nyu.edu/wgreene/econometrics/econometrics.htm
More informationLecture 3 September 1
STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have
More informationSolutions for Econometrics I Homework No.3
Solutions for Econometrics I Homework No3 due 6-3-15 Feldkircher, Forstner, Ghoddusi, Pichler, Reiss, Yan, Zeugner April 7, 6 Exercise 31 We have the following model: y T N 1 X T N Nk β Nk 1 + u T N 1
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationSTA216: Generalized Linear Models. Lecture 1. Review and Introduction
STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general
More information2.1 Linear regression with matrices
21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationSpatial Regression. 9. Specification Tests (1) Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved
Spatial Regression 9. Specification Tests (1) Luc Anselin http://spatial.uchicago.edu 1 basic concepts types of tests Moran s I classic ML-based tests LM tests 2 Basic Concepts 3 The Logic of Specification
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More informationSTAT5044: Regression and Anova
STAT5044: Regression and Anova Inyoung Kim 1 / 15 Outline 1 Fitting GLMs 2 / 15 Fitting GLMS We study how to find the maxlimum likelihood estimator ˆβ of GLM parameters The likelihood equaions are usually
More informationDefault priors and model parametrization
1 / 16 Default priors and model parametrization Nancy Reid O-Bayes09, June 6, 2009 Don Fraser, Elisabeta Marras, Grace Yun-Yi 2 / 16 Well-calibrated priors model f (y; θ), F(y; θ); log-likelihood l(θ)
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7
MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is
More informationRegression. ECO 312 Fall 2013 Chris Sims. January 12, 2014
ECO 312 Fall 2013 Chris Sims Regression January 12, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License What
More informationLikelihood Inference in the Presence of Nuisance Parameters
PHYSTAT2003, SLAC, September 8-11, 2003 1 Likelihood Inference in the Presence of Nuance Parameters N. Reid, D.A.S. Fraser Department of Stattics, University of Toronto, Toronto Canada M5S 3G3 We describe
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the
More informationGaussian Processes 1. Schedule
1 Schedule 17 Jan: Gaussian processes (Jo Eidsvik) 24 Jan: Hands-on project on Gaussian processes (Team effort, work in groups) 31 Jan: Latent Gaussian models and INLA (Jo Eidsvik) 7 Feb: Hands-on project
More informationA732: Exercise #7 Maximum Likelihood
A732: Exercise #7 Maximum Likelihood Due: 29 Novemer 2007 Analytic computation of some one-dimensional maximum likelihood estimators (a) Including the normalization, the exponential distriution function
More informationMultiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar
Multiple regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression 1 / 36 Previous two lectures Linear and logistic
More information2018 2019 1 9 sei@mistiu-tokyoacjp http://wwwstattu-tokyoacjp/~sei/lec-jhtml 11 552 3 0 1 2 3 4 5 6 7 13 14 33 4 1 4 4 2 1 1 2 2 1 1 12 13 R?boxplot boxplotstats which does the computation?boxplotstats
More informationLikelihood Inference in the Presence of Nuisance Parameters
Likelihood Inference in the Presence of Nuance Parameters N Reid, DAS Fraser Department of Stattics, University of Toronto, Toronto Canada M5S 3G3 We describe some recent approaches to likelihood based
More information13. October p. 1
Lecture 8 STK3100/4100 Linear mixed models 13. October 2014 Plan for lecture: 1. The lme function in the nlme library 2. Induced correlation structure 3. Marginal models 4. Estimation - ML and REML 5.
More informationGov 2001: Section 4. February 20, Gov 2001: Section 4 February 20, / 39
Gov 2001: Section 4 February 20, 2013 Gov 2001: Section 4 February 20, 2013 1 / 39 Outline 1 The Likelihood Model with Covariates 2 Likelihood Ratio Test 3 The Central Limit Theorem and the MLE 4 What
More informationInference about the Indirect Effect: a Likelihood Approach
Discussion Paper: 2014/10 Inference about the Indirect Effect: a Likelihood Approach Noud P.A. van Giersbergen www.ase.uva.nl/uva-econometrics Amsterdam School of Economics Department of Economics & Econometrics
More informationStatement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.
MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss
More informationMaximum Likelihood Estimation
Chapter 8 Maximum Likelihood Estimation 8. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function
More informationWeek 7: Binary Outcomes (Scott Long Chapter 3 Part 2)
Week 7: (Scott Long Chapter 3 Part 2) Tsun-Feng Chiang* *School of Economics, Henan University, Kaifeng, China April 29, 2014 1 / 38 ML Estimation for Probit and Logit ML Estimation for Probit and Logit
More information