MS&E 226: Small Data
Lecture 11: Maximum likelihood (v2)
Ramesh Johari (ramesh.johari@stanford.edu)

The likelihood function

Estimating the parameter
This lecture develops the methodology behind the maximum likelihood approach to parameter estimation.
Basic idea: in guessing what the parameters are, we first ask: what is the chance of seeing the data we observe, given a particular value of the parameter? We then pick the parameter values that maximize this chance.

Example 1: Flipping a biased coin
Suppose we flip a coin n times, and get observations Y = (Y_1, ..., Y_n). If the bias on the coin was q, what is the probability we get data Y? This is the likelihood of Y given q:
f(Y \mid q) = \prod_{i=1}^{n} q^{Y_i} (1 - q)^{1 - Y_i}.
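As an aside (not part of the original slides), here is a minimal numerical sketch of this likelihood; the data and variable names are illustrative only.

```python
import numpy as np

def coin_likelihood(y, q):
    """Likelihood of the 0/1 observations y if the coin has bias q."""
    y = np.asarray(y)
    return np.prod(q ** y * (1 - q) ** (1 - y))

y = np.array([1, 0, 1, 1, 0])            # five hypothetical flips
for q in (0.3, 0.5, 0.6, 0.8):
    print(q, coin_likelihood(y, q))       # the largest value occurs at q = 0.6 = mean(y)
```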

Example 2: Linear normal population model
Suppose that you are given p-dimensional covariate vectors X_i, i = 1, ..., n, together with corresponding observations Y_i, i = 1, ..., n. Suppose you assume the linear normal population model with i.i.d. errors:
Y_i = X_i \beta + \varepsilon_i,
where \varepsilon_i \sim \mathcal{N}(0, \sigma^2), and the \varepsilon_i are i.i.d.
What are the parameters? What is the likelihood?

Example 2: Linear normal population model
The parameters are \beta and \sigma^2. In deriving the likelihood, we will typically treat X as given; i.e., we focus on inference about the relationship between X and Y, rather than the distribution of X.
The likelihood is the probability density of seeing Y, given the parameters and X:
f(Y \mid \beta, \sigma^2, X) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(Y_i - X_i \beta)^2}{2\sigma^2} \right).

Example 2: Linear normal population model
Derivation of the likelihood:
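The slide leaves the derivation blank (presumably worked in class). A sketch of the standard argument, under the model's assumptions:

```latex
% Because the errors \varepsilon_i = Y_i - X_i\beta are i.i.d. N(0, \sigma^2),
% the joint density of Y given X factors into a product of normal densities:
\begin{align*}
f(Y \mid \beta, \sigma^2, X)
  &= \prod_{i=1}^{n} f(Y_i \mid \beta, \sigma^2, X_i)
     && \text{(independence of the } \varepsilon_i\text{)} \\
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left( -\frac{(Y_i - X_i\beta)^2}{2\sigma^2} \right)
     && \text{(since } Y_i \mid X_i \sim \mathcal{N}(X_i\beta, \sigma^2)\text{).}
\end{align*}
```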

In general: Parametric likelihood
In general, suppose there is some process that generates data Y.[1] Suppose each choice of a parameter vector θ gives rise to a conditional pmf (or pdf) f(Y \mid θ). This is the probability (or density) of seeing Y, given parameters θ. We call this the likelihood of Y given θ.
[1] As noted on the previous slide, in the case of regression, we treat X as given and look at the process that generates Y from X.

Log likelihood
It is often easier to work with logs when taking likelihoods (not least because multiplying many small probabilities together can cause numerical instability). We call this the log likelihood function (LLF).
Example 1 (biased coin):
\log f(Y \mid q) = \sum_{i=1}^{n} \left[ Y_i \log q + (1 - Y_i) \log(1 - q) \right].
Example 2 (linear normal population model):
\log f(Y \mid \beta, \sigma^2, X) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - X_i \beta)^2.
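A short Python sketch (illustrative, not from the slides) of the numerical point: with many observations the raw likelihood underflows to zero, while the log likelihood stays well behaved.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.6, size=2000)    # 2000 hypothetical coin flips with q = 0.6
q = 0.6

likelihood = np.prod(q ** y * (1 - q) ** (1 - y))              # product of 2000 factors < 1
log_likelihood = np.sum(y * np.log(q) + (1 - y) * np.log(1 - q))

print(likelihood)        # 0.0: the product underflows in floating point
print(log_likelihood)    # a finite negative number (roughly -1350 for this setup)
```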

Maximum likelihood

Maximizing the log likelihood
The maximum likelihood estimate (MLE) is the parameter value that maximizes the likelihood. It is found by solving the following optimization problem:
maximize f(Y \mid θ) (or \log f(Y \mid θ)) over feasible choices of θ.
We denote the resulting solution by \hat{\theta}_{MLE}.

Example 1: Flipping a biased coin
Suppose we maximize the log likelihood. The resulting solution is:
\hat{q}_{MLE} = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}.
This is fairly reasonable: we estimate q by the observed fraction of successes, i.e., the sample mean.
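A quick numerical check (illustrative, not from the slides) that maximizing the coin-flip log likelihood indeed returns the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.7, size=100)     # 100 hypothetical flips from a coin with bias 0.7

def neg_llf(q):
    """Negative log likelihood of the coin-flip model at bias q."""
    return -np.sum(y * np.log(q) + (1 - y) * np.log(1 - q))

q_mle = minimize_scalar(neg_llf, bounds=(1e-6, 1 - 1e-6), method="bounded").x
print(q_mle, y.mean())                 # the two values agree up to optimizer tolerance
```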

Example 1: Flipping a biased coin
Derivation of the MLE:
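The derivation is again left blank on the slide; a standard sketch, assuming 0 < \bar{Y} < 1 so the interior critical point is valid:

```latex
% Differentiate the log likelihood with respect to q and set it to zero:
\begin{align*}
\frac{d}{dq} \log f(Y \mid q)
  &= \frac{d}{dq} \sum_{i=1}^{n} \Big[ Y_i \log q + (1 - Y_i)\log(1 - q) \Big]
   = \frac{\sum_i Y_i}{q} - \frac{n - \sum_i Y_i}{1 - q} = 0, \\
\text{so}\quad (1 - q) \sum_i Y_i &= q \Big( n - \sum_i Y_i \Big)
  \quad\Longrightarrow\quad
  \hat{q}_{\mathrm{MLE}} = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}.
\end{align*}
```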

Example 2: Linear normal model, known σ²
Suppose that σ² is known, so the goal is to estimate β. Returning to the LLF, our problem is equivalent to choosing \hat{\beta} to minimize
\sum_{i=1}^{n} (Y_i - X_i \hat{\beta})^2.
In other words: the OLS solution is the maximum likelihood estimate of the coefficients!
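A hedged numerical sketch of this equivalence on synthetic data (the data, names, and tolerances below are illustrative, not from the course):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
sigma2 = 1.0                                    # treated as known here
Y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# OLS solution via least squares
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

# MLE: minimize the negative log likelihood over beta (sigma^2 held fixed)
def neg_log_lik(beta):
    resid = Y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum(resid ** 2) / (2 * sigma2)

beta_mle = minimize(neg_log_lik, x0=np.zeros(p)).x

print(np.allclose(beta_ols, beta_mle, atol=1e-4))   # should print True: the estimates coincide
```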

Example 2: Linear normal model, known σ²
Let's pause to reflect on what we've found here.
When we first discussed OLS, we did so from a purely algebraic view: no assumptions.
When we discussed prediction, we showed that as long as the population model is linear and the ε_i are uncorrelated with X_i, then OLS is unbiased and has minimum variance among unbiased linear estimators (the Gauss-Markov theorem).
Now we have shown that if, in addition, we assume the ε_i are i.i.d. normal random variables, then OLS is the maximum likelihood estimate.

Example 2: Linear normal model, unknown σ²
What happens if σ² is unknown? The MLE for β remains unchanged (the OLS solution), and the MLE for σ² is:
\hat{\sigma}^2_{MLE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - X_i \hat{\beta})^2 = \frac{1}{n} \sum_{i=1}^{n} r_i^2.
This is intuitive: the average squared residual is an estimate of the variance of the error.
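A brief sketch (synthetic data, illustrative names) computing \hat{\sigma}^2_{MLE} from the residuals. Note that it divides by n, whereas the unbiased estimate usually reported by regression software divides by n - p:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)     # OLS = MLE for beta
r = Y - X @ beta_hat                                 # residuals

sigma2_mle = np.mean(r ** 2)                         # MLE: divide by n
sigma2_unbiased = np.sum(r ** 2) / (n - p)           # usual unbiased estimate: divide by n - p

print(sigma2_mle, sigma2_unbiased)                   # close for large n; the MLE is slightly smaller
```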

Example 3: Logistic regression
The logistic regression parameters are found through maximum likelihood. The (conditional) likelihood for Y given X and parameters β is:
P(Y \mid \beta, X) = \prod_{i=1}^{n} g^{-1}(X_i \beta)^{Y_i} \left( 1 - g^{-1}(X_i \beta) \right)^{1 - Y_i},
where g^{-1}(z) = \exp(z)/(1 + \exp(z)). Let \hat{\beta}_{MLE} be the resulting solution; these are the logistic regression coefficients.

Example 3: Logistic regression
Unfortunately, in contrast to our previous examples, maximum likelihood estimation does not have a closed-form solution in the case of logistic regression. However, it turns out that there are reasonably efficient iterative methods for computing the MLE. One example is an algorithm inspired by weighted least squares, called iteratively reweighted least squares (IRLS): it iteratively updates the weights in WLS to converge to the logistic regression MLE. See [AoS], Section 13.7 for details.
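A minimal sketch of IRLS as it is commonly presented (Newton's method on the logistic log likelihood, expressed as repeated weighted least squares). This is an illustration under simplifying assumptions (no regularization, no safeguards for separation), not the exact algorithm from [AoS]; data and names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_irls(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by iteratively reweighted least squares (Newton's method)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        mu = sigmoid(eta)                      # fitted probabilities
        w = mu * (1 - mu)                      # IRLS weights
        z = eta + (y - mu) / w                 # working response
        # Weighted least squares step: solve (X^T W X) beta = X^T W z
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Illustrative usage on synthetic data
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])   # intercept + 2 covariates
beta_true = np.array([-0.5, 1.0, 2.0])
y = rng.binomial(1, sigmoid(X @ beta_true))
print(logistic_irls(X, y))    # roughly close to beta_true, up to sampling noise
```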