IEOR 165 Lecture 13 Maximum Likelihood Estimation

Size: px
Start display at page:

Download "IEOR 165 Lecture 13 Maximum Likelihood Estimation"

Transcription

1 IEOR 165 Lecture 13 Maximum Likelihood Estimation 1 Motivating Problem Suppose we are working for a grocery store, and we have decided to model service time of an individual using the express lane (for 10 items or less) with an exponential distribution. An exponential service time is a common assumption in basic queuing theory models. Recall that: A random variable X with exponential distribution is denoted by X E(λ), where λ > 0 is the rate; in our context, the rate is better understood to be the service rate. The exponential distribution has pdf { λexp( λu), if u 0, f X (u) = 0 otherwise. In our motivating problem, suppose the data X i for i = 1,...,n are assumed to be iid measurements of the service time. The first idea we might have to estimate the service rate is to observe that the mean of an exponential distribution is 1/λ. And so we could estimate the service rate using the equation ( 1 ˆλ = n n ) 1. X i i=1 The intuition is that the sample average converges to the expectation as we get more data. And since the service rate is assumed to be λ > 0 in the definition of an exponential random variable, it turns out that the inverse of the sample average will converge to the service rate λ. The potential weakness of this approach is that though it will eventually give us the correct answer, we are not using information we know about the underlying distribution of the data; it may be possible to get an accurate estimate of the service rate using less data with a more sophisticated approach. Recall the likelihood function for a parametric distribution P θ with pdf f θ (u) is defined as L(X,θ) = n f θ (X i ). One alternative idea is to choose the value of θ that maximizes this likelihood. And so in order to be able to solve these problems, we have to first discuss optimization problems. 1 i=1

2 2 Nonlinear Programming Consider the following optimization problem (P): min f(x) s.t. x R n g i (x) 0, i = 1,...,m h i (x) = 0, i = 1,...,k where f(x),g i (x),h i (x) are continuously differentiable functions. We call the function f(x) the objective, and we define the feasible set X as X = {x R n : g i (x) 0 i = 1,...,m h i (x) = 0 i = 1,...,k}. Note that this formulation also incorporates maximization problems such as max{f(x) x X} through rewriting the problem as min{ f(x) x X}. Let B(x,r) = {y R n : x y r} be a ball centered at point x with radius r. A point x is a local minimizer for (P) if there exists ρ > 0 such that f(x ) f(x) for all x X B(x,ρ). A point x is a global minimizer for (P) if f(x ) f(x) for all x X. Note that a global minimizer is also a local minimizer, but the opposite may not be true. 2.1 Unconstrained Optimization First consider the special case where f : R R. When (P) does not have any constraints, we know from calculus (specifically Fermat s theorem) that the global minimum must occur at points where either (i) the slope is zero f (x) = 0, (ii) at x =, or (iii) at x =. More rigorously, the theorem states that if f (x) 0 for x R, then this x is not a local minimum. This result extends naturally to the multivariate case where f : R n R, by insteading looking for points where the gradient is zero f(x) = 0. This result is useful because it gives one possible approach to solve (P) in the case where there are no constraints: We can find all points where the gradient is zero, evaluate the function in the limits as x tends to x =, or x =, and then select the minimum amongst these points. One natural question to ask is how can we extend this approach to the general case (P)? 2.2 Karush-Kuhn-Tucker (KKT) Conditions If x X is a local minimizer for (P), then under some technical conditions there exists λ i for i = 1,...,m and µ i for i = 1,...,k such that the following equations, known as the 2

3 Karush-Kuhn-Tucker (KKT) conditions, hold: f(x )+ m i=1 λ i g i (x )+ k i=1 µ i h i (x ) = 0 λ i 0, i = 0,...,m λ i g i (x ) = 0, i = 1,...,m As in the unconstrained case, this is a necessary (but not sufficient) condition for optimality. 2.3 Example Consider the following optimization problem min x s.t. x R x 0. The KKT conditions are that the local minima x satisfy 1 λ 1 = 0 λ 1 0 λ 1 x = 0 Also,allpossiblecombinationsof(λ 1,x)forwhichtheKKTconditionsholdcanbecomputed. From the condition: 1 λ 1 = 0, we have that λ 1 = 1. Combining this with the condition: λ 1 x = 0 gives that x = 0. This is the only possible solution of the KKT conditions, and so the local minimizer must also be the global minimizer. Summarizing, the global minimizer is x = 0. 3 Maximum Likelihood Estimation The idea of maximum likelihood estimation (MLE) is to generate an estimate of some unknown parameters by solving the following optimization problem max{l(x,θ) θ Θ} = max{ n i=1 f θ(x i ) θ Θ}. Here, the constraint θ Θ is meant to be short-hand notation for the inequality and equality constraints discussed earlier. 3.1 Example: Service Rate of Exponential Distribution To get a better understanding of how this approach works, consider the case of using MLE to estimate the service rate of an exponential. The likelihood is L(X,λ) = n ( i=1 λexp( λxi ) ) = λ n exp( λ i=1 X i). 3

4 And we would like to solve the problem max{λ n exp( λ i=1 X i) λ > 0}. This formulation does not exactly fit the framework described above, and the reason is that strict inequalities are somewhat problematic for optimization problems. The reason is that a minimum may not exist. And so we will consider the following optimization max{λ n exp( λ i=1 X i) λ 0}, and we will hope that λ = 0 is not the solution because this would indicate a problem. In principle, we can solve this problem by finding all points satisfying the KKT conditions. However, sometimes ad hoc approaches can be easier. For instance, suppose we find the points at which the gradient is zero: nλ n 1 exp( λ i=1 X i)+ λ n exp( λ i=1 X i) n i=1 X i = 0. Now assuming λ > 0, we can simplify this to n+ λ n i=1 X i = 0 λ = ( 1 n i=1 X i) 1. Since the random variables satisfy X i > 0 when drawn from an exponential distribution, this estimate is strictly positive. The objective at λ = 0 is zero, and the objective as λ 0 is also zero. Consequently, the above solution must be the global maximum since the function is strictly concave (i.e., concave down) in λ. Observe that in this case the MLE estimate is equivalent to the inverse of the sample average. However, in other cases the MLE estimate can differ from the estimates we previously considered. 3.2 Transformation of the Objective Function In some cases, it can be easier to solve a transformed version of the optimization problem. For instance, two transformations that are useful in practice are: 1. instead of maximizing the likelihood, we could equivalently minimize the negative of the likelihood; 2. instead of maximizing the likelihood, we could equivalently maximize the likelihood composed with a strictly increasing function. In fact, one very useful transformation is the minimize the negative log-likelihood. The utility of such transformations is most clearly elucidated by an example. 4

5 3.3 Example: Mean and Variance of Gaussian Suppose X i N(µ,σ 2 ) are iid measurements from a Gaussian with unknown mean and variance. Then the MLE estimate is given by the solution to max{(2πσ 2 ) n/2 exp ( i=1 (X i µ) 2 /(2σ 2 ) ) σ 2 > 0}. If we instead minimize the negative log-likelihood, then the problem we would like to solve is min{(n/2)log(2πσ 2 )+ i=1 (X i µ) 2 /(2σ 2 ) σ 2 > 0}. If we consider the objective function h(µ,σ 2 ) = (n/2)log(2πσ 2 )+ i=1 (X i µ) 2 /(2σ 2 ) (that is we are treating the σ 2 as a single variable), then setting its gradient equal to zero gives [ ] [ hµ = i=1 2(X ] i µ)/(2σ 2 ) h σ 2 n/(2σ 2 ) i=1 (X i µ) 2 /(2(σ 2 ) 2 = 0, ) where h µ and h σ 2 denote the partial derivatives of h(µ,σ 2 ) with respect to µ and σ 2, respectively. To solve this, we first begin with h µ = i=1 2(X i µ)/(2σ 2 ) = 0 i=1 (X i µ) = 0 ˆµ = 1 n i=1 X i, where we have made use of the fact that σ 2 > 0. Next, note that h σ 2 = n/(2σ 2 ) i=1 (X i ˆµ) 2 /(2(σ 2 ) 2 ) = 0 nσ 2 = i=1 (X i ˆµ) 2 ˆσ 2 = 1 n i=1 (X i ˆµ) 2, where we multiplied by (σ 2 ) 2 on the second step. The estimate of the mean ˆµ = 1 n n i=1 X i is simply the sample average; however, the estimate of the variance ˆσ 2 = 1 n n i=1 (X i ˆµ) 2 is the biased estimator of the variance (cf. the unbiased estimate used for hypothesis testing s 2 = 1 n n 1 i=1 (X i ˆµ) 2 ). So in this case, the MLE estimates differ from the estimates we have previously used for hypothesis testing. 5

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator Estimation Theory Estimation theory deals with finding numerical values of interesting parameters from given set of data. We start with formulating a family of models that could describe how the data were

More information

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to

More information

Lecture 4 September 15

Lecture 4 September 15 IFT 6269: Probabilistic Graphical Models Fall 2017 Lecture 4 September 15 Lecturer: Simon Lacoste-Julien Scribe: Philippe Brouillard & Tristan Deleu 4.1 Maximum Likelihood principle Given a parametric

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Answer Key for STAT 200B HW No. 7

Answer Key for STAT 200B HW No. 7 Answer Key for STAT 200B HW No. 7 May 5, 2007 Problem 2.2 p. 649 Assuming binomial 2-sample model ˆπ =.75, ˆπ 2 =.6. a ˆτ = ˆπ 2 ˆπ =.5. From Ex. 2.5a on page 644: ˆπ ˆπ + ˆπ 2 ˆπ 2.75.25.6.4 = + =.087;

More information

Maximum likelihood estimation

Maximum likelihood estimation Maximum likelihood estimation Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Maximum likelihood estimation 1/26 Outline 1 Statistical concepts 2 A short review of convex analysis and optimization

More information

Lecture 3. Optimization Problems and Iterative Algorithms

Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Link lecture - Lagrange Multipliers

Link lecture - Lagrange Multipliers Link lecture - Lagrange Multipliers Lagrange multipliers provide a method for finding a stationary point of a function, say f(x, y) when the variables are subject to constraints, say of the form g(x, y)

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Nonlinear Programming (Hillier, Lieberman Chapter 13) CHEM-E7155 Production Planning and Control

Nonlinear Programming (Hillier, Lieberman Chapter 13) CHEM-E7155 Production Planning and Control Nonlinear Programming (Hillier, Lieberman Chapter 13) CHEM-E7155 Production Planning and Control 19/4/2012 Lecture content Problem formulation and sample examples (ch 13.1) Theoretical background Graphical

More information

Likelihoods. P (Y = y) = f(y). For example, suppose Y has a geometric distribution on 1, 2,... with parameter p. Then the pmf is

Likelihoods. P (Y = y) = f(y). For example, suppose Y has a geometric distribution on 1, 2,... with parameter p. Then the pmf is Likelihoods The distribution of a random variable Y with a discrete sample space (e.g. a finite sample space or the integers) can be characterized by its probability mass function (pmf): P (Y = y) = f(y).

More information

Introduction to Maximum Likelihood Estimation

Introduction to Maximum Likelihood Estimation Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:

More information

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

Economics 101A (Lecture 3) Stefano DellaVigna

Economics 101A (Lecture 3) Stefano DellaVigna Economics 101A (Lecture 3) Stefano DellaVigna January 24, 2017 Outline 1. Implicit Function Theorem 2. Envelope Theorem 3. Convexity and concavity 4. Constrained Maximization 1 Implicit function theorem

More information

Constrained Optimization

Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

STAT 730 Chapter 4: Estimation

STAT 730 Chapter 4: Estimation STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum

More information

Chap 2. Optimality conditions

Chap 2. Optimality conditions Chap 2. Optimality conditions Version: 29-09-2012 2.1 Optimality conditions in unconstrained optimization Recall the definitions of global, local minimizer. Geometry of minimization Consider for f C 1

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

Stat 5102 Lecture Slides Deck 3. Charles J. Geyer School of Statistics University of Minnesota

Stat 5102 Lecture Slides Deck 3. Charles J. Geyer School of Statistics University of Minnesota Stat 5102 Lecture Slides Deck 3 Charles J. Geyer School of Statistics University of Minnesota 1 Likelihood Inference We have learned one very general method of estimation: method of moments. the Now we

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

More information

Applied Lagrange Duality for Constrained Optimization

Applied Lagrange Duality for Constrained Optimization Applied Lagrange Duality for Constrained Optimization February 12, 2002 Overview The Practical Importance of Duality ffl Review of Convexity ffl A Separating Hyperplane Theorem ffl Definition of the Dual

More information

Hypothesis Test. The opposite of the null hypothesis, called an alternative hypothesis, becomes

Hypothesis Test. The opposite of the null hypothesis, called an alternative hypothesis, becomes Neyman-Pearson paradigm. Suppose that a researcher is interested in whether the new drug works. The process of determining whether the outcome of the experiment points to yes or no is called hypothesis

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Maximum Likelihood Estimation. only training data is available to design a classifier

Maximum Likelihood Estimation. only training data is available to design a classifier Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

ECE531 Lecture 10b: Maximum Likelihood Estimation

ECE531 Lecture 10b: Maximum Likelihood Estimation ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So

More information

Parametric Models: from data to models

Parametric Models: from data to models Parametric Models: from data to models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Jan 22, 2018 Recall: Model-based ML DATA MODEL LEARNING MODEL MODEL INFERENCE KNOWLEDGE Learning:

More information

Bayesian Decision and Bayesian Learning

Bayesian Decision and Bayesian Learning Bayesian Decision and Bayesian Learning Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 30 Bayes Rule p(x ω i

More information

Point Estimation. Maximum likelihood estimation for a binomial distribution. CSE 446: Machine Learning

Point Estimation. Maximum likelihood estimation for a binomial distribution. CSE 446: Machine Learning Point Estimation Emily Fox University of Washington January 6, 2017 Maximum likelihood estimation for a binomial distribution 1 Your first consulting job A bored Seattle billionaire asks you a question:

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

The Kuhn-Tucker Problem

The Kuhn-Tucker Problem Natalia Lazzati Mathematics for Economics (Part I) Note 8: Nonlinear Programming - The Kuhn-Tucker Problem Note 8 is based on de la Fuente (2000, Ch. 7) and Simon and Blume (1994, Ch. 18 and 19). The Kuhn-Tucker

More information

Parameter Estimation and Fitting to Data

Parameter Estimation and Fitting to Data Parameter Estimation and Fitting to Data Parameter estimation Maximum likelihood Least squares Goodness-of-fit Examples Elton S. Smith, Jefferson Lab 1 Parameter estimation Properties of estimators 3 An

More information

STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY

STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY UNIVERSITY OF MARYLAND: ECON 600 1. Some Eamples 1 A general problem that arises countless times in economics takes the form: (Verbally):

More information

Optimization. A first course on mathematics for economists

Optimization. A first course on mathematics for economists Optimization. A first course on mathematics for economists Xavier Martinez-Giralt Universitat Autònoma de Barcelona xavier.martinez.giralt@uab.eu II.3 Static optimization - Non-Linear programming OPT p.1/45

More information

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,

More information

Convex Optimization Boyd & Vandenberghe. 5. Duality

Convex Optimization Boyd & Vandenberghe. 5. Duality 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Unit: Optimality Conditions and Karush Kuhn Tucker Theorem

Unit: Optimality Conditions and Karush Kuhn Tucker Theorem Unit: Optimality Conditions and Karush Kuhn Tucker Theorem Goals 1. What is the Gradient of a function? What are its properties? 2. How can it be used to find a linear approximation of a nonlinear function?

More information

Economics 520. Lecture Note 19: Hypothesis Testing via the Neyman-Pearson Lemma CB 8.1,

Economics 520. Lecture Note 19: Hypothesis Testing via the Neyman-Pearson Lemma CB 8.1, Economics 520 Lecture Note 9: Hypothesis Testing via the Neyman-Pearson Lemma CB 8., 8.3.-8.3.3 Uniformly Most Powerful Tests and the Neyman-Pearson Lemma Let s return to the hypothesis testing problem

More information

The Karush-Kuhn-Tucker (KKT) conditions

The Karush-Kuhn-Tucker (KKT) conditions The Karush-Kuhn-Tucker (KKT) conditions In this section, we will give a set of sufficient (and at most times necessary) conditions for a x to be the solution of a given convex optimization problem. These

More information

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints Instructor: Prof. Kevin Ross Scribe: Nitish John October 18, 2011 1 The Basic Goal The main idea is to transform a given constrained

More information

Exercises and Answers to Chapter 1

Exercises and Answers to Chapter 1 Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean

More information

SECTION C: CONTINUOUS OPTIMISATION LECTURE 9: FIRST ORDER OPTIMALITY CONDITIONS FOR CONSTRAINED NONLINEAR PROGRAMMING

SECTION C: CONTINUOUS OPTIMISATION LECTURE 9: FIRST ORDER OPTIMALITY CONDITIONS FOR CONSTRAINED NONLINEAR PROGRAMMING Nf SECTION C: CONTINUOUS OPTIMISATION LECTURE 9: FIRST ORDER OPTIMALITY CONDITIONS FOR CONSTRAINED NONLINEAR PROGRAMMING f(x R m g HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 5, DR RAPHAEL

More information

5. Duality. Lagrangian

5. Duality. Lagrangian 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Modern Methods of Data Analysis - WS 07/08

Modern Methods of Data Analysis - WS 07/08 Modern Methods of Data Analysis Lecture VIc (19.11.07) Contents: Maximum Likelihood Fit Maximum Likelihood (I) Assume N measurements of a random variable Assume them to be independent and distributed according

More information

Lecture 2: Convex Sets and Functions

Lecture 2: Convex Sets and Functions Lecture 2: Convex Sets and Functions Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Network Optimization, Fall 2015 1 / 22 Optimization Problems Optimization problems are

More information

Convex Optimization & Lagrange Duality

Convex Optimization & Lagrange Duality Convex Optimization & Lagrange Duality Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Convex optimization Optimality condition Lagrange duality KKT

More information

Gaussians Linear Regression Bias-Variance Tradeoff

Gaussians Linear Regression Bias-Variance Tradeoff Readings listed in class website Gaussians Linear Regression Bias-Variance Tradeoff Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 22 nd, 2007 Maximum Likelihood Estimation

More information

CS 195-5: Machine Learning Problem Set 1

CS 195-5: Machine Learning Problem Set 1 CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of

More information

ECE 275A Homework 7 Solutions

ECE 275A Homework 7 Solutions ECE 275A Homework 7 Solutions Solutions 1. For the same specification as in Homework Problem 6.11 we want to determine an estimator for θ using the Method of Moments (MOM). In general, the MOM estimator

More information

Chapter 4: Unconstrained nonlinear optimization

Chapter 4: Unconstrained nonlinear optimization Chapter 4: Unconstrained nonlinear optimization Edoardo Amaldi DEIB Politecnico di Milano edoardo.amaldi@polimi.it Website: http://home.deib.polimi.it/amaldi/opt-15-16.shtml Academic year 2015-16 Edoardo

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

Two hours. To be supplied by the Examinations Office: Mathematical Formula Tables THE UNIVERSITY OF MANCHESTER. 21 June :45 11:45

Two hours. To be supplied by the Examinations Office: Mathematical Formula Tables THE UNIVERSITY OF MANCHESTER. 21 June :45 11:45 Two hours MATH20802 To be supplied by the Examinations Office: Mathematical Formula Tables THE UNIVERSITY OF MANCHESTER STATISTICAL METHODS 21 June 2010 9:45 11:45 Answer any FOUR of the questions. University-approved

More information

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006 Quiz Discussion IE417: Nonlinear Programming: Lecture 12 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University 16th March 2006 Motivation Why do we care? We are interested in

More information

In view of (31), the second of these is equal to the identity I on E m, while this, in view of (30), implies that the first can be written

In view of (31), the second of these is equal to the identity I on E m, while this, in view of (30), implies that the first can be written 11.8 Inequality Constraints 341 Because by assumption x is a regular point and L x is positive definite on M, it follows that this matrix is nonsingular (see Exercise 11). Thus, by the Implicit Function

More information

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians Engineering Part IIB: Module F Statistical Pattern Processing University of Cambridge Engineering Part IIB Module F: Statistical Pattern Processing Handout : Multivariate Gaussians. Generative Model Decision

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Lecture V. Numerical Optimization

Lecture V. Numerical Optimization Lecture V Numerical Optimization Gianluca Violante New York University Quantitative Macroeconomics G. Violante, Numerical Optimization p. 1 /19 Isomorphism I We describe minimization problems: to maximize

More information

MLE and GMM. Li Zhao, SJTU. Spring, Li Zhao MLE and GMM 1 / 22

MLE and GMM. Li Zhao, SJTU. Spring, Li Zhao MLE and GMM 1 / 22 MLE and GMM Li Zhao, SJTU Spring, 2017 Li Zhao MLE and GMM 1 / 22 Outline 1 MLE 2 GMM 3 Binary Choice Models Li Zhao MLE and GMM 2 / 22 Maximum Likelihood Estimation - Introduction For a linear model y

More information

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes Optimization Charles J. Geyer School of Statistics University of Minnesota Stat 8054 Lecture Notes 1 One-Dimensional Optimization Look at a graph. Grid search. 2 One-Dimensional Zero Finding Zero finding

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

Lectures 9 and 10: Constrained optimization problems and their optimality conditions

Lectures 9 and 10: Constrained optimization problems and their optimality conditions Lectures 9 and 10: Constrained optimization problems and their optimality conditions Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lectures 9 and 10: Constrained

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

Generalization to inequality constrained problem. Maximize

Generalization to inequality constrained problem. Maximize Lecture 11. 26 September 2006 Review of Lecture #10: Second order optimality conditions necessary condition, sufficient condition. If the necessary condition is violated the point cannot be a local minimum

More information

Theory of Statistics.

Theory of Statistics. Theory of Statistics. Homework V February 5, 00. MT 8.7.c When σ is known, ˆµ = X is an unbiased estimator for µ. If you can show that its variance attains the Cramer-Rao lower bound, then no other unbiased

More information

An Introduction to Expectation-Maximization

An Introduction to Expectation-Maximization An Introduction to Expectation-Maximization Dahua Lin Abstract This notes reviews the basics about the Expectation-Maximization EM) algorithm, a popular approach to perform model estimation of the generative

More information

3.0.1 Multivariate version and tensor product of experiments

3.0.1 Multivariate version and tensor product of experiments ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 3: Minimax risk of GLM and four extensions Lecturer: Yihong Wu Scribe: Ashok Vardhan, Jan 28, 2016 [Ed. Mar 24]

More information

Constrained optimization

Constrained optimization Constrained optimization In general, the formulation of constrained optimization is as follows minj(w), subject to H i (w) = 0, i = 1,..., k. where J is the cost function and H i are the constraints. Lagrange

More information

Finite Dimensional Optimization Part I: The KKT Theorem 1

Finite Dimensional Optimization Part I: The KKT Theorem 1 John Nachbar Washington University March 26, 2018 1 Introduction Finite Dimensional Optimization Part I: The KKT Theorem 1 These notes characterize maxima and minima in terms of first derivatives. I focus

More information

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM H. E. Krogstad, IMF, Spring 2012 Karush-Kuhn-Tucker (KKT) Theorem is the most central theorem in constrained optimization, and since the proof is scattered

More information

Machine Learning. Lecture 6: Support Vector Machine. Feng Li.

Machine Learning. Lecture 6: Support Vector Machine. Feng Li. Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)

More information

Sufficient Conditions for Finite-variable Constrained Minimization

Sufficient Conditions for Finite-variable Constrained Minimization Lecture 4 It is a small de tour but it is important to understand this before we move to calculus of variations. Sufficient Conditions for Finite-variable Constrained Minimization ME 256, Indian Institute

More information

Primer on statistics:

Primer on statistics: Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood

More information

ON LICQ AND THE UNIQUENESS OF LAGRANGE MULTIPLIERS

ON LICQ AND THE UNIQUENESS OF LAGRANGE MULTIPLIERS ON LICQ AND THE UNIQUENESS OF LAGRANGE MULTIPLIERS GERD WACHSMUTH Abstract. Kyparisis proved in 1985 that a strict version of the Mangasarian- Fromovitz constraint qualification (MFCQ) is equivalent to

More information

IEOR 165: Spring 2019 Problem Set 2

IEOR 165: Spring 2019 Problem Set 2 IEOR 65: Spring 209 Problem Set 2 Instructor: Professor Anil Aswani Issued: 2/8/9 Due: 3//9 Problem : Part a: We may first calculate the sample means: ȳ =.6, x = 7. Then: i= ˆβ = y ix i ȳ x i= x2 i x2

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

More information

Review of Maximum Likelihood Estimators

Review of Maximum Likelihood Estimators Libby MacKinnon CSE 527 notes Lecture 7, October 7, 2007 MLE and EM Review of Maximum Likelihood Estimators MLE is one of many approaches to parameter estimation. The likelihood of independent observations

More information

Statistical Inference

Statistical Inference Statistical Inference Classical and Bayesian Methods Revision Class for Midterm Exam AMS-UCSC Th Feb 9, 2012 Winter 2012. Session 1 (Revision Class) AMS-132/206 Th Feb 9, 2012 1 / 23 Topics Topics We will

More information

MATH 829: Introduction to Data Mining and Analysis Graphical Models III - Gaussian Graphical Models (cont.)

MATH 829: Introduction to Data Mining and Analysis Graphical Models III - Gaussian Graphical Models (cont.) 1/12 MATH 829: Introduction to Data Mining and Analysis Graphical Models III - Gaussian Graphical Models (cont.) Dominique Guillot Departments of Mathematical Sciences University of Delaware May 6, 2016

More information

Math 181B Homework 1 Solution

Math 181B Homework 1 Solution Math 181B Homework 1 Solution 1. Write down the likelihood: L(λ = n λ X i e λ X i! (a One-sided test: H 0 : λ = 1 vs H 1 : λ = 0.1 The likelihood ratio: where LR = L(1 L(0.1 = 1 X i e n 1 = λ n X i e nλ

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

Graduate Econometrics I: Maximum Likelihood I

Graduate Econometrics I: Maximum Likelihood I Graduate Econometrics I: Maximum Likelihood I Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood

More information

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in

More information

Gradient Descent. Dr. Xiaowei Huang

Gradient Descent. Dr. Xiaowei Huang Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

More information

A Few Notes on Fisher Information (WIP)

A Few Notes on Fisher Information (WIP) A Few Notes on Fisher Information (WIP) David Meyer dmm@{-4-5.net,uoregon.edu} Last update: April 30, 208 Definitions There are so many interesting things about Fisher Information and its theoretical properties

More information