Advanced Quantitative Methods: maximum likelihood
University College Dublin, 4 March 2014
Derivatives
Derivatives of straight lines

y = (1/2)x + 2  ⟹  dy/dx = 1/2

Derivatives of curves

y = x^2 − 4x + 5

Two rules: d(ax^b)/dx = b·a·x^(b−1) and d(a + b)/dx = da/dx + db/dx. Applying them:

dy/dx = 2x − 4
Finding the location of peaks and valleys

A maximum or minimum value of a curve can be found by setting the derivative equal to zero.

y = x^2 − 4x + 5
dy/dx = 2x − 4 = 0  ⟹  2x = 4  ⟹  x = 4/2 = 2

So at x = 2 we will find a (local) maximum or minimum value; the plot shows that in this case it is a minimum.
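This example is easy to check numerically; a minimal Python sketch (the course's own code is in R):

```python
# Stationary point of y = x^2 - 4x + 5: solve dy/dx = 2x - 4 = 0 and use
# the sign of the second derivative to classify it.
def f(x):
    return x**2 - 4*x + 5

def d2f(x):
    return 2.0          # second derivative is constant and positive

x_star = 4 / 2          # root of 2x - 4 = 0
kind = "minimum" if d2f(x_star) > 0 else "maximum"
print(x_star, f(x_star), kind)  # -> 2.0 1.0 minimum
```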
Derivatives in regression

This concept of finding the minimum or maximum is crucial for regression analysis. With ordinary least squares (OLS), we estimate the β coefficients by finding the minimum of the sum of squared errors. With maximum likelihood (ML), we estimate the β coefficients by finding the maximum of the (log)likelihood function.
Derivatives of matrices

The derivative of a matrix consists of the matrix containing the derivatives of each element of the original matrix. For example:

M = [ x^2     2x − 3 ]      dM/dx = [ 2x   2    ]
    [ 2x + 4  3x^3   ]              [ 2    9x^2 ]
Derivatives of matrices

Some useful rules:

d(v′x)/dx = v
d(x′Ax)/dx = (A + A′)x
d(Bx)/dx = B
d²(x′Ax)/(dx dx′) = A + A′
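The rule d(x′Ax)/dx = (A + A′)x can be verified by finite differences; a small numpy sketch (illustrative, not from the slides):

```python
import numpy as np

# Verify d(x'Ax)/dx = (A + A')x at an arbitrary point via central differences.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([0.5, -1.0])

def quad(v):
    return v @ A @ v

analytic = (A + A.T) @ x
eps = 1e-6
numeric = np.array([(quad(x + eps*e) - quad(x - eps*e)) / (2*eps)
                    for e in np.eye(2)])
print(np.allclose(numeric, analytic, atol=1e-4))  # -> True
```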
Maximum likelihood
Preliminaries

Ordinary least squares and its variations (GLS, WLS, 2SLS, etc.) are very flexible and applicable in many circumstances, but for some models (e.g. limited dependent variable models) we need more flexible estimation procedures. The most prominent alternative estimators are maximum likelihood (ML) and Bayesian estimators. This lecture is about the former.
Logarithms

Some rules about logarithms, which are important for understanding ML:

log ab = log a + log b
log e^a = a
log a^b = b log a
log a > log b iff a > b
d(log a)/da = 1/a
d(log f(a))/da = (df(a)/da) / f(a)
Likelihood: intuition

Dice example. Imagine a die is thrown with an unknown number of sides k. We know after throwing the die that the outcome is 5. How likely is this outcome if k = 0? And if k = 1? Continue until k = 10. A plot of these likelihoods against k is the likelihood function.
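A sketch of this intuition in Python: an outcome of 5 is impossible for k < 5, and otherwise has probability 1/k on a fair k-sided die.

```python
# Likelihood of the observed outcome (a 5) for each candidate number of
# sides k: zero when a 5 cannot occur, 1/k otherwise.
def likelihood(k, outcome=5):
    return 0.0 if k < outcome else 1.0 / k

lik = {k: likelihood(k) for k in range(0, 11)}
best = max(lik, key=lik.get)
print(best, lik[best])  # -> 5 0.2: a five-sided die makes the outcome most likely
```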
Likelihood

"Likelihood is the hypothetical probability that an event that has already occurred would yield a specific outcome. The concept differs from that of a probability in that a probability refers to the occurrence of future events, while a likelihood refers to past events with known outcomes." (Wolfram MathWorld)
Preliminaries

Looking first at just univariate models, we can express the model as the probability density function (PDF) f(y, θ), where y is the dependent variable, θ the set of parameters, and f(·) expresses the model specification. ML is purely parametric: we need to make strong assumptions about the shape of f(·) before we can estimate θ.
Likelihood

Given θ, f(·, θ) is the PDF of y. Given y, f(y, ·) cannot be interpreted as a PDF and is therefore called the likelihood function. θ̂_ML is the θ that maximizes this likelihood function. (Davidson & MacKinnon 1999)
(Log)likelihood function

Generally, observations are assumed to be independent, in which case the joint density of the entire sample is the product of the densities of the individual observations:

f(y, θ) = Π_{i=1}^n f(y_i, θ)

It is easier to work with the log of this function, because sums are easier to deal with than products and the logarithmic transformation is monotonic:

l(y, θ) ≡ log f(y, θ) = Σ_{i=1}^n log f(y_i, θ) = Σ_{i=1}^n l_i(y_i, θ)

(Davidson & MacKinnon 1999)
Example: exponential distribution

For example, take the PDF of y to be

f(y_i, θ) = θ e^(−θ y_i),   y_i > 0, θ > 0

This distribution is useful when modeling something which is by definition positive. (Davidson & MacKinnon 1999)
Example: exponential distribution

Deriving the loglikelihood function:

f(y, θ) = Π_{i=1}^n f(y_i, θ) = Π_{i=1}^n θ e^(−θ y_i)

l(y, θ) = Σ_{i=1}^n log f(y_i, θ) = Σ_{i=1}^n log(θ e^(−θ y_i))
        = Σ_{i=1}^n (log θ − θ y_i) = n log θ − θ Σ_{i=1}^n y_i

(Davidson & MacKinnon 1999)
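A quick numeric sanity check of this derivation, as a Python sketch with made-up data:

```python
import math

# The derived loglikelihood n*log(theta) - theta*sum(y) should equal the
# sum of the individual log densities log(theta * exp(-theta * y_i)).
y = [0.5, 1.2, 2.0, 0.1]
theta = 0.8
n = len(y)

direct  = sum(math.log(theta * math.exp(-theta * yi)) for yi in y)
derived = n * math.log(theta) - theta * sum(y)
print(abs(direct - derived) < 1e-12)  # -> True
```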
Maximum likelihood estimates

The estimate of θ is the value of θ which maximizes the (log)likelihood function l(y, θ). In some cases, such as the exponential distribution above or the linear model, we can find this value analytically, by taking the derivative and setting it equal to zero. In many other cases there is no such analytical solution, and we use numerical search algorithms to find the value.
Example: exponential distribution

To find the maximum, we take the derivative and set it equal to zero:

l(y, θ) = n log θ − θ Σ_{i=1}^n y_i

d(n log θ − θ Σ y_i)/dθ = n/θ − Σ_{i=1}^n y_i

n/θ̂_ML − Σ_{i=1}^n y_i = 0  ⟹  θ̂_ML = n / Σ_{i=1}^n y_i

(Davidson & MacKinnon 1999)
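The closed-form solution can be checked against a brute-force grid search; a Python sketch with arbitrary data:

```python
import math

# Compare theta_hat = n / sum(y) with the maximizer of the loglikelihood
# over a fine grid.
y = [0.5, 1.2, 2.0, 0.1]
n = len(y)

def loglik(theta):
    return n * math.log(theta) - theta * sum(y)

theta_hat  = n / sum(y)                      # analytical MLE
grid       = [k / 100 for k in range(1, 1000)]
theta_grid = max(grid, key=loglik)
print(theta_hat, theta_grid)  # grid maximizer lands within 0.01 of n/sum(y)
```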
Numerical methods

In many cases, closed-form solutions like the examples above do not exist and numerical methods are necessary. A computer algorithm searches for the value that maximizes the loglikelihood. In R: optim().
Newton's method

θ^(m+1) = θ^(m) − H⁻¹_(m) g_(m),

where g is the gradient and H the Hessian of the loglikelihood. (Davidson & MacKinnon 1999, 401)
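For the exponential example above, Newton's method takes only a few lines; a Python sketch of the scalar case of the update formula:

```python
# Newton's method for the exponential loglikelihood:
# gradient g = n/theta - sum(y), Hessian H = -n/theta^2.
y = [0.5, 1.2, 2.0, 0.1]
n, S = len(y), sum(y)

theta = 0.2                      # starting value
for _ in range(50):
    g = n / theta - S
    H = -n / theta**2
    step = -g / H                # -H^{-1} g in the scalar case
    theta += step
    if abs(step) < 1e-12:
        break

print(abs(theta - n / S) < 1e-8)  # -> True: converges to the analytical MLE
```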
Normal linear model
Normal linear model

Model: y = Xβ + ε,  ε ~ N(0, σ²I)

We assume the elements of ε to be independently distributed and therefore, conditional on X, the observations y_i are assumed to be independently distributed as well, with density

f_i(y_i, β, σ²) = (1 / √(2πσ²)) e^(−(y_i − X_i β)² / (2σ²))

(Davidson & MacKinnon 1999)
Normal linear model: loglikelihood function

f(y, β, σ²) = Π_{i=1}^n f_i(y_i, β, σ²) = Π_{i=1}^n (1/√(2πσ²)) e^(−(y_i − X_i β)²/(2σ²))

l(y, β, σ²) = Σ_{i=1}^n log f_i(y_i, β, σ²)
            = Σ_{i=1}^n ( −(1/2) log σ² − (1/2) log 2π − (1/(2σ²)) (y_i − X_i β)² )
            = −(n/2) log σ² − (n/2) log 2π − (1/(2σ²)) Σ_{i=1}^n (y_i − X_i β)²
            = −(n/2) log σ² − (n/2) log 2π − (1/(2σ²)) (y − Xβ)′(y − Xβ)
Normal linear model: loglikelihood function

To zoom in on one part of this function:

log(1/√(2πσ²)) = log(1/(σ√(2π))) = log 1 − log(σ√(2π)) = 0 − (log σ + log √(2π))
               = −(1/2) log σ² − (1/2) log 2π,   using log σ = (1/2) log σ²
Normal linear model: estimating σ²

∂l(y, β, σ²)/∂σ² = −n/(2σ²) + (1/(2σ⁴)) (y − Xβ)′(y − Xβ)

Setting this to zero:

0 = −n/(2σ̂²) + (1/(2σ̂⁴)) (y − Xβ)′(y − Xβ)
n/(2σ̂²) = (1/(2σ̂⁴)) (y − Xβ)′(y − Xβ)
n = (1/σ̂²) (y − Xβ)′(y − Xβ)
σ̂² = (1/n) (y − Xβ)′(y − Xβ)

Thus σ̂² is a function of β. (Davidson & MacKinnon 1999)
Normal linear model: estimating σ²

σ̂² = (1/n)(y − Xβ)′(y − Xβ) = e′e/n

Note that this is almost the equivalent of the variance estimator of OLS (e′e/(n − k)). This estimator is a biased, but consistent, estimator of σ².
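The relationship between the two estimators is easy to see numerically; a Python/numpy sketch with simulated data (illustrative, not from the slides):

```python
import numpy as np

# ML estimator e'e/n versus OLS estimator e'e/(n-k) on simulated data:
# their ratio is exactly (n-k)/n, so the ML version is always smaller.
rng = np.random.default_rng(42)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=1.5, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat
s2_ml  = e @ e / n         # biased, consistent
s2_ols = e @ e / (n - k)   # unbiased
print(s2_ml < s2_ols)  # -> True
```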
Normal linear model: estimating β

l(y, β, σ²) = −(n/2) log σ² − (n/2) log 2π − (1/(2σ²)) (y − Xβ)′(y − Xβ)

Substituting σ̂² = (1/n)(y − Xβ)′(y − Xβ) gives the concentrated loglikelihood:

l(y, β) = −(n/2) log((1/n)(y − Xβ)′(y − Xβ)) − (n/2) log 2π
          − (1/(2 · (1/n)(y − Xβ)′(y − Xβ))) (y − Xβ)′(y − Xβ)
        = −(n/2) log((1/n)(y − Xβ)′(y − Xβ)) − (n/2) log 2π − n/2

Because the first term is minus n/2 times the log of the SSR (divided by n), maximizing l(y, β) with respect to β is equivalent to minimizing the SSR: β̂_ML = β̂_OLS. (Davidson & MacKinnon 1999)
l(y, β, σ²) = −(n/2) log σ² − (n/2) log 2π − (1/(2σ²)) (y − Xβ)′(y − Xβ)

ll <- function(par, X, y) {
  s2   <- exp(par[1])   # first parameter is log(sigma^2)
  beta <- par[-1]
  n    <- length(y)
  r    <- y - X %*% beta
  -n/2 * log(s2) - n/2 * log(2*pi) - 1/(2*s2) * sum(r^2)
}
mlest <- optim(c(1, 0, 0, 0), ll, NULL, X = X, y = y,
               control = list(fnscale = -1), hessian = TRUE)
Some practical tricks: by default, optim() minimizes rather than maximizes, hence the need for fnscale = -1. To get σ̂², we need to take the exponent of the first parameter; the exp() transform is used to make sure σ² is always positive.
Gradient, Hessian and information matrix
Gradient vector

The gradient vector or score vector is a vector with typical element

g_j(y, θ) ≡ ∂l(y, θ)/∂θ_j,

i.e. the vector of partial derivatives of the loglikelihood function with respect to each parameter in θ. (Davidson & MacKinnon 1999, 400)
Hessian matrix

H(θ) is a k × k matrix with typical element

h_ij = ∂²l(θ)/∂θ_i ∂θ_j,

i.e. the matrix of second derivatives of the loglikelihood function. Asymptotic equivalent: H(θ) ≡ plim_{n→∞} (1/n) H(y, θ). (Davidson & MacKinnon 1999, 401, 407)
Information matrix

I(θ) = Σ_{i=1}^n E_θ(G_i(y, θ)′ G_i(y, θ))

I(θ) is the information matrix, or the covariance matrix of the score vector. There is also an asymptotic equivalent, I(θ) ≡ plim_{n→∞} (1/n) I(θ). By the information matrix equality, I(θ) = −H(θ): both measure the amount of curvature in the loglikelihood function. (Davidson & MacKinnon 1999)
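For the exponential example, this curvature is easy to verify: the Hessian of l(θ) = n log θ − θ Σ y_i is −n/θ², so the (observed) information is n/θ². A Python sketch checking the analytic Hessian by finite differences:

```python
import math

# Compare the analytic Hessian -n/theta^2 of the exponential loglikelihood
# with a central finite-difference second derivative.
y = [0.5, 1.2, 2.0, 0.1]
n, S = len(y), sum(y)

def l(t):
    return n * math.log(t) - t * S

theta = 1.0
h = 1e-4
numeric_H  = (l(theta + h) - 2 * l(theta) + l(theta - h)) / h**2
analytic_H = -n / theta**2
print(abs(numeric_H - analytic_H) < 1e-3)  # -> True
```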
Properties of ML estimators
Under some weak conditions, ML estimators are consistent, asymptotically efficient and asymptotically normally distributed. New ML estimators thus do not require extensive proof of these properties: once an estimator is shown to be an ML estimator, it inherits them. (Davidson & MacKinnon 1999)
Variance-covariance matrix of θ̂_ML

V(plim_{n→∞} √n (θ̂_ML − θ)) = H⁻¹(θ) I(θ) H⁻¹(θ) = I⁻¹(θ),

the latter equality being true iff I(θ) = −H(θ) holds. One estimator for the variance is V̂(θ̂_ML) = −H⁻¹(θ̂_ML). For alternative estimators see the reference below. (Davidson & MacKinnon 1999)
Estimating V(θ̂_ML)

mlest <- optim(..., hessian = TRUE)
V <- solve(-mlest$hessian)
sqrt(diag(V))
Exercise

Imagine we take a random sample of N colored balls from a vase, with replacement. In the vase, a fraction p of the balls are yellow and the other ones blue. In our sample, there are q yellow balls. Assuming that y_i = 1 if ball i is yellow and y_i = 0 if blue, the probability of obtaining our sample is

P(q) = p^q (1 − p)^(N−q),

which is the likelihood function with unknown parameter p. Write down the loglikelihood function.

l(p) = log(p^q (1 − p)^(N−q)) = q log p + (N − q) log(1 − p)

(Verbeek 2008)
Exercise

l(p) = q log p + (N − q) log(1 − p)

Take the derivative with respect to p:

dl(p)/dp = q/p − (N − q)/(1 − p)

Set equal to zero and find p̂_ML:

q/p̂ − (N − q)/(1 − p̂) = 0  ⟹  p̂ = q/N

(Verbeek 2008)
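The analytical result can be verified with a brute-force search; a small Python sketch:

```python
import math

# Check p_hat = q/N against a grid search over
# l(p) = q*log(p) + (N - q)*log(1 - p), for N = 100, q = 30.
N, q = 100, 30

def loglik(p):
    return q * math.log(p) + (N - q) * math.log(1 - p)

grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=loglik)
print(p_grid, q / N)  # -> 0.3 0.3
```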
Exercise

l(p) = q log p + (N − q) log(1 − p)

Using the loglikelihood function and optim(), find p̂ for N = 100 and q = 30. ML estimates are invariant under reparametrization, so we will optimize over the logit transform of p instead of p itself, to avoid probabilities outside [0, 1]: p = 1/(1 + e^(−p*)).

ll <- function(p, N, q) {
  pstar <- 1 / (1 + exp(-p))   # p is on the logit scale; pstar is the probability
  q * log(pstar) + (N - q) * log(1 - pstar)
}
mlest <- optim(0, ll, NULL, N = 100, q = 30, hessian = TRUE,
               control = list(fnscale = -1))
phat <- 1 / (1 + exp(-mlest$par))
Exercise

Using the previous estimate, including the Hessian, calculate a 95% confidence interval around this p̂.

se <- sqrt(-1 / mlest$hessian)
ci <- c(mlest$par - 1.96 * se, mlest$par + 1.96 * se)
ci <- 1 / (1 + exp(-ci))

Repeat for N = 1000, q = 300.
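A Python version of the same computation, with the maximization and Hessian done by a small Newton loop and finite differences (a sketch; the slides use R's optim()):

```python
import math

# Maximize the binomial loglikelihood on the logit scale, get the standard
# error from the numeric Hessian, build a 95% CI, transform back to p.
N, q = 100, 30

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def loglik(z):                      # z is the logit of p
    p = sigmoid(z)
    return q * math.log(p) + (N - q) * math.log(1 - p)

z, h = 0.0, 1e-5
for _ in range(100):
    g = (loglik(z + h) - loglik(z - h)) / (2 * h)
    H = (loglik(z + h) - 2 * loglik(z) + loglik(z - h)) / h**2
    z -= g / H                      # Newton step
    if abs(g) < 1e-8:
        break

se = math.sqrt(-1 / H)
lo, hi = sigmoid(z - 1.96 * se), sigmoid(z + 1.96 * se)
print(round(sigmoid(z), 3), round(lo, 2), round(hi, 2))  # p_hat 0.3, CI about (0.22, 0.40)
```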
Exercise

Using the uswages.dta data, estimate the model

log(wage)_i = β₀ + β₁ educ_i + β₂ exper_i + ε_i

with the lm() function and with the optim() function. Compare the estimates for β, σ² and V(β̂). Try different optimization algorithms in optim().
Test statistics
Within the ML framework, there are three common tests:

- Likelihood ratio test (LR)
- Wald test (W)
- Lagrange multiplier test (LM)

These are asymptotically equivalent. Given r restrictions, all have asymptotically a χ²(r) distribution. In the following, θ̃_ML will denote the restricted estimate and θ̂_ML the unrestricted one. (Davidson & MacKinnon 1999, 414)
Likelihood ratio test

LR = 2(l(θ̂_ML) − l(θ̃_ML)) = 2 log( L(θ̂_ML) / L(θ̃_ML) ),

thus twice the log of the ratio of the likelihood functions. If we have the two separate estimates, this is very easy to compute or even eyeball. (Davidson & MacKinnon 1999)
[Figure: loglikelihood plotted against θ, marking the restricted and unrestricted estimates; the LR statistic is twice the vertical distance between the two loglikelihood values.]
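Continuing the vase exercise (N = 100, q = 30) with the restriction p = 0.5, the LR statistic is simple to compute; a Python sketch:

```python
import math

# LR test of H0: p = 0.5 against the unrestricted MLE p_hat = q/N.
# The chi^2(1) critical value at the 5% level is 3.84.
N, q = 100, 30

def loglik(p):
    return q * math.log(p) + (N - q) * math.log(1 - p)

lr = 2 * (loglik(q / N) - loglik(0.5))
print(round(lr, 2), lr > 3.84)  # -> 16.46 True: reject p = 0.5
```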
Wald test

The basic intuition is that the Wald test is quite comparable to an F-test on a set of restrictions. For a single regression parameter the formula would be:

W = (β̂ − β₀)² (−d²l(β̂)/dβ²)

This test does not require an estimate of θ̃_ML. The test is somewhat sensitive to the formulation of the restrictions r, so as long as estimating θ̃_ML is not too costly, it is better to use an LR or LM test. (Davidson & MacKinnon 1999)
Wald test

If the second derivative is larger in magnitude, the slope of l(·) changes faster, and thus the difference between the likelihoods at the restricted and unrestricted positions will be larger.
[Figure: loglikelihood plotted against θ, marking the restricted and unrestricted positions.]
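For the same vase exercise, the Wald statistic uses only the unrestricted estimate and the curvature there; a Python sketch:

```python
# Wald test of H0: p = 0.5: W = (p_hat - p0)^2 * (-l''(p_hat)),
# with l(p) = q*log(p) + (N - q)*log(1 - p).
N, q = 100, 30
p_hat, p0 = q / N, 0.5

d2l = -q / p_hat**2 - (N - q) / (1 - p_hat)**2   # second derivative at p_hat
w = (p_hat - p0)**2 * (-d2l)
print(round(w, 2), w > 3.84)  # -> 19.05 True: reject p = 0.5
```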
Lagrange multiplier test

For a single parameter:

LM = (dl(β̃)/dβ)² / (−d²l(β̃)/dβ²)

Thus the steeper the slope of the loglikelihood function at the point of the restricted estimate, the more likely it is significantly different from the unrestricted estimate, unless this slope changes very quickly (the second derivative is large in magnitude).
[Figure: loglikelihood plotted against θ, marking the restricted and unrestricted positions; the LM test looks at the slope at the restricted estimate.]
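And the LM statistic needs only the restricted estimate; a Python sketch for the same exercise:

```python
# LM test of H0: p = 0.5: LM = (l'(p0))^2 / (-l''(p0)),
# using only the score and curvature at the restricted estimate.
N, q = 100, 30
p0 = 0.5

score = q / p0 - (N - q) / (1 - p0)            # l'(p0) = 60 - 140 = -80
d2l   = -q / p0**2 - (N - q) / (1 - p0)**2     # l''(p0) = -120 - 280 = -400
lm = score**2 / (-d2l)
print(lm, lm > 3.84)  # -> 16.0 True: reject p = 0.5
```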
More informationGraduate Econometrics I: Maximum Likelihood II
Graduate Econometrics I: Maximum Likelihood II Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood
More informationAdvanced Econometrics
Advanced Econometrics Dr. Andrea Beccarini Center for Quantitative Economics Winter 2013/2014 Andrea Beccarini (CQE) Econometrics Winter 2013/2014 1 / 156 General information Aims and prerequisites Objective:
More informationMixed models in R using the lme4 package Part 4: Theory of linear mixed models
Mixed models in R using the lme4 package Part 4: Theory of linear mixed models Douglas Bates 8 th International Amsterdam Conference on Multilevel Analysis 2011-03-16 Douglas Bates
More informationLoglikelihood and Confidence Intervals
Stat 504, Lecture 2 1 Loglikelihood and Confidence Intervals The loglikelihood function is defined to be the natural logarithm of the likelihood function, l(θ ; x) = log L(θ ; x). For a variety of reasons,
More informationSystem Identification, Lecture 4
System Identification, Lecture 4 Kristiaan Pelckmans (IT/UU, 2338) Course code: 1RT880, Report code: 61800 - Spring 2012 F, FRI Uppsala University, Information Technology 30 Januari 2012 SI-2012 K. Pelckmans
More informationLinear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52
Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two
More informationRestricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model
Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives
More informationModel comparison and selection
BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationSystem Identification, Lecture 4
System Identification, Lecture 4 Kristiaan Pelckmans (IT/UU, 2338) Course code: 1RT880, Report code: 61800 - Spring 2016 F, FRI Uppsala University, Information Technology 13 April 2016 SI-2016 K. Pelckmans
More informationMa 3/103: Lecture 24 Linear Regression I: Estimation
Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the
More informationLecture 6: Hypothesis Testing
Lecture 6: Hypothesis Testing Mauricio Sarrias Universidad Católica del Norte November 6, 2017 1 Moran s I Statistic Mandatory Reading Moran s I based on Cliff and Ord (1972) Kelijan and Prucha (2001)
More informationLecture 3 September 1
STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have
More informationSTA216: Generalized Linear Models. Lecture 1. Review and Introduction
STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general
More informationHT Introduction. P(X i = x i ) = e λ λ x i
MODS STATISTICS Introduction. HT 2012 Simon Myers, Department of Statistics (and The Wellcome Trust Centre for Human Genetics) myers@stats.ox.ac.uk We will be concerned with the mathematical framework
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationLecture 16 Solving GLMs via IRWLS
Lecture 16 Solving GLMs via IRWLS 09 November 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Notes problem set 5 posted; due next class problem set 6, November 18th Goals for today fixed PCA example
More informationLet us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided
Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or
More informationTesting Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata
Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function
More informationStatistics, inference and ordinary least squares. Frank Venmans
Statistics, inference and ordinary least squares Frank Venmans Statistics Conditional probability Consider 2 events: A: die shows 1,3 or 5 => P(A)=3/6 B: die shows 3 or 6 =>P(B)=2/6 A B : A and B occur:
More informationDA Freedman Notes on the MLE Fall 2003
DA Freedman Notes on the MLE Fall 2003 The object here is to provide a sketch of the theory of the MLE. Rigorous presentations can be found in the references cited below. Calculus. Let f be a smooth, scalar
More informationRegression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood
Regression Estimation - Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. β 0, β 1 Q = n (Y i (β 0 + β 1 X i )) 2 i=1 Minimize this by maximizing
More informationFinancial Econometrics
Financial Econometrics Estimation and Inference Gerald P. Dwyer Trinity College, Dublin January 2013 Who am I? Visiting Professor and BB&T Scholar at Clemson University Federal Reserve Bank of Atlanta
More informationIntroduction to Maximum Likelihood Estimation
Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:
More informationHypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33
Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett
More informationPart 1.) We know that the probability of any specific x only given p ij = p i p j is just multinomial(n, p) where p k1 k 2
Problem.) I will break this into two parts: () Proving w (m) = p( x (m) X i = x i, X j = x j, p ij = p i p j ). In other words, the probability of a specific table in T x given the row and column counts
More informationDiscrete Dependent Variable Models
Discrete Dependent Variable Models James J. Heckman University of Chicago This draft, April 10, 2006 Here s the general approach of this lecture: Economic model Decision rule (e.g. utility maximization)
More informationGeneralized Linear Models
Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n
More informationGaussian Processes 1. Schedule
1 Schedule 17 Jan: Gaussian processes (Jo Eidsvik) 24 Jan: Hands-on project on Gaussian processes (Team effort, work in groups) 31 Jan: Latent Gaussian models and INLA (Jo Eidsvik) 7 Feb: Hands-on project
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationCHAPTER 1: BINARY LOGIT MODEL
CHAPTER 1: BINARY LOGIT MODEL Prof. Alan Wan 1 / 44 Table of contents 1. Introduction 1.1 Dichotomous dependent variables 1.2 Problems with OLS 3.3.1 SAS codes and basic outputs 3.3.2 Wald test for individual
More informationHeteroskedasticity. Part VII. Heteroskedasticity
Part VII Heteroskedasticity As of Oct 15, 2015 1 Heteroskedasticity Consequences Heteroskedasticity-robust inference Testing for Heteroskedasticity Weighted Least Squares (WLS) Feasible generalized Least
More informationLogistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20
Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More informationA Very Brief Summary of Bayesian Inference, and Examples
A Very Brief Summary of Bayesian Inference, and Examples Trinity Term 009 Prof Gesine Reinert Our starting point are data x = x 1, x,, x n, which we view as realisations of random variables X 1, X,, X
More informationEstimation MLE-Pandemic data MLE-Financial crisis data Evaluating estimators. Estimation. September 24, STAT 151 Class 6 Slide 1
Estimation September 24, 2018 STAT 151 Class 6 Slide 1 Pandemic data Treatment outcome, X, from n = 100 patients in a pandemic: 1 = recovered and 0 = not recovered 1 1 1 0 0 0 1 1 1 0 0 1 0 1 0 0 1 1 1
More informationBiostat 2065 Analysis of Incomplete Data
Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies
More informationMidterm. Introduction to Machine Learning. CS 189 Spring You have 1 hour 20 minutes for the exam.
CS 189 Spring 2013 Introduction to Machine Learning Midterm You have 1 hour 20 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. Please use non-programmable calculators
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More information