From Model to Log Likelihood

From Model to Log Likelihood. Stephen Pettigrew. February 18, 2015.

Outline: 1. Big picture 2. Defining our model 3. Probability statements from our model 4. Likelihood functions 5. Log likelihoods

Big Picture. The class is all about how to do good research. Last week we talked about three theories of inference: likelihood inference, Bayesian inference, and Neyman-Pearson hypothesis testing. Whether they acknowledge it or not, most of the quantitative research you've read in the past relied on likelihood inference, and specifically maximum likelihood estimation. If it's not using OLS and doesn't have a prior, it's probably MLE.

Big picture Goals for tonight. By the end of tonight, you should feel comfortable with these three things: 1. Understand the IID assumption and why it's so important. 2. Take a model specification (stochastic/systematic components) and turn it into a likelihood. 3. Turn a likelihood function into a log likelihood and understand why logs should make us happy.

Defining our model boston.snow <- c(5,0,2,22,1,0,1,1,0,16,0,0,1,0,1,7,15,1,1,1,0,3,13) Error: Check your data again. There's no way Boston's had that much snow.

Defining our model Running Example. The ten data points for this game: measles <- c(1,0,2,1,1,0,0,0,0,0)

Defining our model Writing out our model. Suppose we want to model these data as our outcome variable. We'd first need to write out our model by writing out the stochastic and systematic components. The outcome is a count of measles cases, so for the stochastic component, let's use a Poisson distribution. For the systematic component, let's keep things simple and leave out any explanatory variables. We'll only model an intercept term.

Defining our model Writing out our model. Using the notation from UPM, this model could be written as:
y_i ~ Pois(λ_i)
λ_i = e^{β_0}
We use e^{β_0} because λ > 0 in the Poisson distribution. If we had other covariates (such as state populations or vaccination rates) they would enter our model through the systematic component:
y_i ~ Pois(λ_i)
λ_i = e^{β_0 + β_1 x_i} = e^{X_i β}
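
Not in the slides, but a short R sketch can make the stochastic/systematic split concrete: pick a value for β_0 (the -0.5 below is arbitrary), push it through the systematic component to get λ, and then draw counts from the stochastic component.

beta0 <- -0.5                            # arbitrary intercept, for illustration only
lambda <- exp(beta0)                     # systematic component: lambda_i = e^{beta0}, the same for every i
set.seed(2001)
y.sim <- rpois(n = 10, lambda = lambda)  # stochastic component: y_i ~ Pois(lambda_i)
y.sim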

Probability statements from our model Probability statement with one data point. Imagine that we only had one datapoint (like last week's section) and that we knew the true value of λ_i. With those two pieces of information, we'd be able to calculate the probability of observing that datapoint given λ_i:
p(y_i | λ_i) = λ_i^{y_i} e^{-λ_i} / y_i!
p(y_i = 1 | λ_i = 0.86) = 0.86^1 e^{-0.86} / 1!
p(y_i = 1 | λ_i = 0.86) = 0.3639
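
This number is easy to verify with R's built-in Poisson pmf; the check below isn't in the slides, it just re-does the arithmetic.

dpois(x = 1, lambda = 0.86)           # 0.3639
0.86^1 * exp(-0.86) / factorial(1)    # the same value, written out from the formula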

Probability statements from our model What if we have more than one observation? Imagine that we only had two datapoints (1 and 0) and we still knew that λ = 0.86. What do we typically assume about those two datapoints when we do likelihood inference? That they are independent and identically distributed (IID). Identically distributed: the observations are drawn from the same distribution, conditional on the covariates.

Probability statements from our model Independence. Independent: two events are independent if the occurrence (or magnitude) of one does not affect the occurrence (or magnitude) of another. More formally, A and B are independent if and only if P(A and B) = P(A)P(B). Why is this important? Because it makes it easy for us to write out our probability (or likelihood) function.

Probability statements from our model Probability statement with two data points. If we know that y_1 = 1, y_2 = 0, and λ_1 = λ_2 = 0.86, then we can calculate the probability of observing both data points:
p(y | λ) = p(y_1 = 1 | λ_1) p(y_2 = 0 | λ_2)
p(y | λ) = (λ_1^{y_1} e^{-λ_1} / y_1!) (λ_2^{y_2} e^{-λ_2} / y_2!)
p(y | λ) = ∏_{i=1}^{2} λ_i^{y_i} e^{-λ_i} / y_i!
p(y | λ = 0.86) = 0.3639 × 0.4231
p(y | λ = 0.86) = 0.1540
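
Again a quick check, not in the slides: under independence the joint probability is just the product of the two Poisson pmf values, assuming the same λ = 0.86 for both observations.

y <- c(1, 0)
dpois(y, lambda = 0.86)         # about 0.364 and 0.423
prod(dpois(y, lambda = 0.86))   # about 0.154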

Likelihood functions Writing a likelihood function. What if we don't know the value of λ? If we did, it probably wouldn't make for very interesting research. Put another way, if we knew the value of our parameters, then we could analytically calculate p(Y | λ). But we don't know λ, so what we're really interested in is p(λ | Y). And recall from lecture that:
P(λ | Y) = P(λ) P(Y | λ) / ∫ P(λ) P(Y | λ) dλ

Likelihood functions Writing a likelihood function. By the likelihood axiom:
P(λ | Y) = P(λ) P(Y | λ) / ∫ P(λ) P(Y | λ) dλ
L(λ | Y) = k(y) P(Y | λ)
L(λ | Y) ∝ P(Y | λ)
In a likelihood function, the data are fixed and without variance, and the parameters are the variables.

Likelihood functions Proportionality. Throughout the semester you're going to become intimately familiar with the proportionality symbol, ∝. Why are we allowed to use it? Why can we get rid of constants (like k(y)) from our likelihood function? Wouldn't our estimates of λ̂_MLE be different if we kept the constants in? No.

Likelihood functions Proportionality. Remember from 8th grade when you learned about transformations of variables? Multiplying a function by a constant vertically shrinks or expands the function, but it doesn't change the horizontal location of its peaks and valleys.

Likelihood functions Proportionality. [Figure: the same density curve plotted three times, scaled by k(y) = 1, k(y) = 0.5, and k(y) = 2. The vertical scale (density, 0 to 0.8) changes with the constant, but the peak sits at the same value of θ in every panel.]

Likelihood functions Proportionality. The likelihood function is not a probability density! But it is proportional to the probability density, so the value that maximizes the likelihood function also maximizes the probability density.

Likelihood functions Writing a likelihood function.
L(λ_i | Y) ∝ P(Y | λ_i)
L(λ_i | Y) ∝ ∏_{i=1}^{n} λ_i^{y_i} e^{-λ_i} / y_i!
Get rid of any constants, i.e. any terms that don't depend on the parameters:
L(λ_i | Y) ∝ ∏_{i=1}^{n} λ_i^{y_i} e^{-λ_i}
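
A small numerical check, not in the slides: with a single λ shared by all observations, maximizing the likelihood with or without the y_i! constants gives the same answer (the sample mean of the measles counts, 0.6).

measles <- c(1, 0, 2, 1, 1, 0, 0, 0, 0, 0)
lik.full <- function(lambda) prod(dpois(measles, lambda))         # keeps the 1/y_i! constants
lik.prop <- function(lambda) prod(lambda^measles * exp(-lambda))  # constants dropped
optimize(lik.full, interval = c(0.01, 5), maximum = TRUE)$maximum # about 0.6
optimize(lik.prop, interval = c(0.01, 5), maximum = TRUE)$maximum # about 0.6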

Log likelihoods Why do we use log-likelihoods? When we begin to optimize likelihood functions, we always take the log of the likelihood function first. Why? 1. Computers don't handle small decimals very well. If you're multiplying the probabilities of 10,000 datapoints together, you're going to get a decimal number that's incredibly tiny. Your computer could have an almost impossible time finding the maximum of such a flat surface. 2. The standard errors of maximum likelihood estimates are a function of the log-likelihood, not the likelihood. If you use R to optimize a likelihood function without taking the log, you are guaranteed to get the wrong standard error estimates.
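
A quick illustration of the first point, using simulated data (not from the slides): the raw product of 10,000 Poisson probabilities underflows to zero, while the sum of their logs stays finite.

set.seed(2138)
big.y <- rpois(10000, lambda = 0.6)
prod(dpois(big.y, lambda = 0.6))             # 0: the product underflows
sum(dpois(big.y, lambda = 0.6, log = TRUE))  # a large negative, but finite, number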

Log likelihoods 8th Grade Algebra Review: Logarithms. The log function can be thought of as an inverse for exponential functions: y = log_a(x) means a^y = x. In this class we'll always use the natural logarithm, log_e(x). Typically we'll just write log(x), but you should always assume that the base is Euler's number, e = 2.718281828... What's the name of the friendly looking gentleman (pictured on the slide) who discovered Euler's number, e? Jacob Bernoulli.

Log likelihoods 8th Grade Algebra Review: Properties of logs. Five rules to remember:
1. log(xy) = log(x) + log(y)
2. log(x^y) = y log(x)
3. log(x/y) = log(x y^{-1}) = log(x) + log(y^{-1}) = log(x) - log(y)
4. Base change formula: log_b(x) = log_a(x) / log_a(b)
5. d log(x) / dx = 1/x
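
These rules are easy to sanity-check numerically in R, where log() is the natural log (matching the convention above); rule 5 is about a derivative, so it is noted in a comment rather than checked.

x <- 3; y <- 7
log(x * y);  log(x) + log(y)         # rule 1
log(x^y);    y * log(x)              # rule 2
log(x / y);  log(x) - log(y)         # rule 3
log(x, base = 2);  log(x) / log(2)   # rule 4: change of base
# rule 5: d log(x)/dx = 1/x is the derivative we use when differentiating log-likelihoods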

Log likelihoods Writing out the log-likelihood. Recall that the likelihood for the Poisson(λ) is:
L(λ_i | Y) ∝ ∏_{i=1}^{n} λ_i^{y_i} e^{-λ_i}
It's easy to get the log likelihood:
ℓ(λ_i | Y) ∝ log( ∏_{i=1}^{n} λ_i^{y_i} e^{-λ_i} )

Log likelihoods Writing out the log-likelihood.
ℓ(λ_i | Y) ∝ log( ∏_{i=1}^{n} λ_i^{y_i} e^{-λ_i} )
There's a proof of this step in the appendix of these slides:
ℓ(λ_i | Y) ∝ Σ_{i=1}^{n} log( λ_i^{y_i} e^{-λ_i} )
ℓ(λ_i | Y) ∝ Σ_{i=1}^{n} [ log(λ_i^{y_i}) + log(e^{-λ_i}) ]
ℓ(λ_i | Y) ∝ Σ_{i=1}^{n} [ y_i log(λ_i) - λ_i ]

Log likelihoods What about the systematic components? We still haven't accounted for the systematic part of our model! Recall that our model was:
y_i ~ Pois(λ_i)
λ_i = e^{β_0}
Luckily, that's the easiest part to do. If our log-likelihood is:
ℓ(λ_i | Y) ∝ Σ_{i=1}^{n} [ y_i log(λ_i) - λ_i ]
we account for the systematic component by substituting in e^{β_0} every time we see a λ_i:
ℓ(β_0 | Y) ∝ Σ_{i=1}^{n} [ y_i log(e^{β_0}) - e^{β_0} ] = Σ_{i=1}^{n} [ y_i β_0 - e^{β_0} ]
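
As a check that isn't in the slides, we can maximize this intercept-only log-likelihood over β_0 with the measles data; the maximizer should be the log of the sample mean, log(0.6) ≈ -0.51.

measles <- c(1, 0, 2, 1, 1, 0, 0, 0, 0, 0)
loglik.b0 <- function(b0) sum(measles * b0 - exp(b0))              # intercept-only Poisson log-likelihood
optimize(loglik.b0, interval = c(-5, 5), maximum = TRUE)$maximum   # about -0.51
log(mean(measles))                                                 # -0.5108, the analytical answer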

Log likelihoods What about the systematic components? If our model were more complicated, with covariates:
y_i ~ Pois(λ_i)
λ_i = e^{β_0 + β_1 x_i} = e^{X_i β}
we account for them in the same way, by substituting in e^{X_i β} every time we see a λ_i:
ℓ(β | Y, X) ∝ Σ_{i=1}^{n} [ y_i log(e^{X_i β}) - e^{X_i β} ]
ℓ(β | Y, X) ∝ Σ_{i=1}^{n} [ y_i X_i β - e^{X_i β} ]

Log likelihoods How would we code this log-likelihood in R? Next week Solé will show you how to use R to optimize log-likelihood functions. In order to do that you'll have to be comfortable coding the log-likelihood as a function.
ℓ(β | Y, X) ∝ Σ_{i=1}^{n} [ y_i X_i β - e^{X_i β} ]
For this likelihood you'll have to write a function that takes a vector of your parameters ({β_0, β_1}), your outcome variable (Y), and the matrix of your covariates (X).

Log likelihoods How would we code this log-likelihood in R?
ℓ(β | Y, X) ∝ Σ_{i=1}^{n} [ y_i X_i β - e^{X_i β} ]

loglike <- function(parameters, outcome, covariates){
  cov <- as.matrix(cbind(1, covariates))  # prepend a column of 1s for the intercept
  xb <- cov %*% parameters                # X_i beta for every observation
  sum(outcome * xb - exp(xb))             # the log-likelihood, up to an additive constant
}
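
A hedged usage sketch (not part of the slides; the covariate x below is invented purely for illustration): hand loglike() to optim(), tell it to maximize with fnscale = -1, and cross-check the estimates against R's built-in Poisson glm().

measles <- c(1, 0, 2, 1, 1, 0, 0, 0, 0, 0)
x <- c(0.2, 0.5, 1.4, 0.8, 1.0, 0.1, 0.3, 0.6, 0.4, 0.7)  # made-up covariate, for illustration only
fit <- optim(par = c(0, 0), fn = loglike,
             outcome = measles, covariates = x,
             control = list(fnscale = -1),  # fnscale = -1 turns optim's minimizer into a maximizer
             hessian = TRUE)
fit$par                                   # MLEs of beta_0 and beta_1
sqrt(diag(solve(-fit$hessian)))           # standard errors from the negative inverse Hessian (see appendix)
coef(glm(measles ~ x, family = poisson))  # should match fit$par closely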

Log likelihoods Any questions?

Appendix. The switch to proportionality (and getting rid of constants) doesn't mess up the location of the maximum, but it does change the curvature of the function at the maximum. And we use the curvature at the maximum to estimate our standard errors. So doesn't getting rid of constants change the estimates we get of our standard errors? Not exactly.

Appendix. As we'll talk about in the next couple weeks, the variance of an MLE is, by definition,
Var(θ̂_MLE) = I^{-1}(θ̂_MLE) = [ -E( ∂²ℓ / ∂θ² ) ]^{-1}
I^{-1}(θ̂_MLE) is the inverse of the Fisher information matrix, evaluated at θ̂_MLE. [ -E( ∂²ℓ / ∂θ² ) ]^{-1} is the inverse of the negative Hessian matrix. Notice that the Hessian is the expected value of the second derivative of the log likelihood, not the likelihood.

Appendix. Because the standard errors are a function of the log likelihood, it doesn't matter if we keep the constants in or not. They'll drop out when we differentiate anyway. Here's an example with the Poisson likelihood:
L(λ | y) = k(y) λ^y e^{-λ} / y!
ℓ(λ | y) = log(k(y)) + y log(λ) - λ - log(y!)
∂ℓ / ∂λ = 0 + y/λ - 1 - 0
All the constants disappear when you take the first derivative of the log-likelihood, so it didn't matter if you kept them in the first place! That means we get the same maximum likelihood estimates, and the same standard errors on those estimates, whether we remove the constants or not. Magic!

Appendix. When we switch from likelihoods to log-likelihoods we use the trick that
log( ∏_{i=1}^{n} x_i ) = Σ_{i=1}^{n} log(x_i)
If x has 3 elements, then:
log( ∏_{i=1}^{n} x_i ) = log(x_1 x_2 x_3)
And by our logarithm rules:
log( ∏_{i=1}^{n} x_i ) = log(x_1) + log(x_2) + log(x_3)
log( ∏_{i=1}^{n} x_i ) = Σ_{i=1}^{n} log(x_i)