Beyond Newton's method

Thomas P. Minka
2000 (revised 7/21/2017)

Abstract

Newton's method for optimization is equivalent to iteratively maximizing a local quadratic approximation to the objective function. But some functions are not well-approximated by a quadratic, leading to slow convergence, and some have turning points where the curvature changes sign, leading to failure. To fix this, we can use a more appropriate choice of local approximation than quadratic, based on the type of function we are optimizing. This paper demonstrates three such generalized Newton rules. Like Newton's method, they only involve the first two derivatives of the function, yet converge faster and fail less often.

1 Newton's method

Newton's method is usually described in terms of root-finding, but it can also be understood as maximizing a local quadratic approximation to the objective function. Let the objective be f(x). A quadratic approximation about x_0 has the form

g(x) = k + (b/2)(x - a)^2    (1)
g'(x) = b(x - a)    (2)
g''(x) = b    (3)

With three free parameters, we can make g and its first two derivatives match those of f at x_0. That is, (k, b, a) are given by

k + (b/2)(x_0 - a)^2 = f(x_0)    (4)
b(x_0 - a) = f'(x_0)    (5)
b = f''(x_0)    (6)

whose solution is

b = f''(x_0)    (7)
a = x_0 - f'(x_0)/f''(x_0)    (8)
k = f(x_0) - f'(x_0)^2 / (2 f''(x_0))    (9)
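As a concrete illustration (a sketch, not part of the original paper), the fit (7)-(9) can be computed directly from the value and first two derivatives at x_0; the toy objective and expansion point below are assumptions chosen only for the example.

import math

def local_quadratic(f, df, d2f, x0):
    # Match g(x) = k + (b/2)(x - a)^2 to f at x0, following equations (7)-(9).
    b = d2f(x0)
    a = x0 - df(x0) / b
    k = f(x0) - df(x0)**2 / (2.0 * b)
    return k, b, a

# Toy concave objective (an assumption for illustration): f(x) = log(x) - x^2.
f = lambda x: math.log(x) - x**2
df = lambda x: 1.0/x - 2.0*x
d2f = lambda x: -1.0/x**2 - 2.0

k, b, a = local_quadratic(f, df, d2f, 1.0)
# Since b < 0, the quadratic peaks at x = a, which is exactly where the Newton
# update (10) derived next would move from x0 = 1.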

If b < 0, then the maximum of g is at a, leading to the update rule

x_new = x - f'(x)/f''(x)    (10)

which is exactly Newton's method for optimization. Unfortunately, if b = f''(x) >= 0, then the method fails and we must use some other optimization technique to get closer to the optimum of f.

2 A non-quadratic variation

In maximum-likelihood estimation of a Dirichlet distribution, we encounter the objective

f(x) = log[ Γ(x) exp(s x) / ∏_{k=1}^K Γ(x m_k) ],    x > 0    (11)

This objective is concave, but Newton's method may still fail by producing a negative value of x. We can get a faster and more stable algorithm by using a local approximation of the form

g(x) = k + a log(x) + b x    (12)
g'(x) = a/x + b    (13)
g''(x) = -a/x^2    (14)

Matching f and its derivatives at x_0 requires

a = -x_0^2 f''(x_0)    (15)
b = f'(x_0) - a/x_0    (16)

Figure 1 demonstrates the high quality of this approximation compared to a quadratic. Since f''(x) <= 0 for all x, we know a >= 0. If in addition b < 0, then the maximum of g is at -a/b, giving the update rule

1/x_new = 1/x + f'(x) / (x^2 f''(x))    (17)

If it happens that b >= 0, then some other method must be used. Note that this update cannot be achieved by simply changing the parameterization of f and applying Newton's method to the new parameterization. For example, if we write f(x) = f(1/y) and use a quadratic approximation in y, we get the Newton update

y_new = y - f'(1/y)(-1/y^2) / [ f''(1/y)(1/y^4) + f'(1/y)(2/y^3) ]    (18)

or

1/x_new = 1/x + f'(x) / (x^2 f''(x) + 2 x f'(x))    (19)

which is as close as you can get to (17) using reparameterization. Since x must be positive, it is also tempting to reparameterize with f(x) = f(e^y). Applying Newton's method to y gives

x_new = x exp( -f'(x) / (x f''(x) + f'(x)) )    (20)

Figure 2 compares the convergence rate of the three updates (17), (19), and (20).

Figure 1: Approximations to the objective (11) at x = 1 (left) and x = 6 (right). Here K = 3, m = [1/2, 1/6, 1/3], and s = -1.5.

Figure 2: Convergence of three iteration schemes (Newton in y = 1/x, i.e. (19); Newton in y = log(x), i.e. (20); and method (17)) starting from x = 6 on the objective in Figure 1. The proposed method reaches machine precision in 4 iterations.
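The comparison in Figure 2 can be mirrored with a short script. The sketch below is not from the paper: it implements f'(x) and f''(x) for the objective (11) with scipy and iterates the three updates from x = 6. The values of m and s follow the Figure 1 caption as reconstructed here, so treat them as assumptions.

import numpy as np
from scipy.special import digamma, polygamma

m = np.array([1/2, 1/6, 1/3])   # assumed, from the Figure 1 setup
s = -1.5                        # assumed, from the Figure 1 setup

fp  = lambda x: digamma(x) + s - np.sum(m * digamma(x * m))             # f'(x) for (11)
fpp = lambda x: polygamma(1, x) - np.sum(m**2 * polygamma(1, x * m))    # f''(x) for (11)

step17 = lambda x: 1.0 / (1.0/x + fp(x) / (x**2 * fpp(x)))                # update (17)
step19 = lambda x: 1.0 / (1.0/x + fp(x) / (x**2 * fpp(x) + 2*x*fp(x)))    # Newton in y = 1/x, update (19)
step20 = lambda x: x * np.exp(-fp(x) / (x * fpp(x) + fp(x)))              # Newton in y = log(x), update (20)

for name, step in [("(17)", step17), ("(19)", step19), ("(20)", step20)]:
    x = 6.0
    for _ in range(20):
        x = step(x)
    print(name, x)   # all three agree at the fixed point; (17) gets there fastest, cf. Figure 2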

3 Another variation

In maximum-likelihood estimation of certain Gaussian models, we encounter the objective

f(x) = Σ_{i=1}^n [ -log(x + v_i) - s_i/(x + v_i) ],    x >= 0    (21)

This objective is not concave and has turning points which confound Newton's method. Instead of a quadratic g, consider a local approximation of the form

g(x) = k - n log(x + a) - b/(x + a)    (22)
g'(x) = -n/(x + a) + b/(x + a)^2    (23)
g''(x) = n/(x + a)^2 - 2b/(x + a)^3    (24)

Matching f and its derivatives at x_0 requires

b = n(x_0 + a) + f'(x_0)(x_0 + a)^2    (25)

a = -[ sqrt(f'(x_0)^2 - n f''(x_0)) + f'(x_0) ] / f''(x_0) - x_0    if f''(x_0) != 0    (26)
a = -n / (2 f'(x_0)) - x_0    if f''(x_0) = 0

Figure 3 demonstrates the high quality of this approximation compared to a quadratic. If a >= 0 and b > 0, the unconstrained maximum of g is at b/n - a, so the update rule is

x_new = max(0, x + f'(x)(x + a)^2 / n)    (27)

If the conditions aren't met, which can happen when we are far from the maximum, then some other scheme must be used, just as with Newton's method. However, the method does not fail when f''(x) crosses zero. Figure 4 compares the convergence rate of the updates (10), (20), and (27).
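A sketch of the resulting iteration (illustrative, not the paper's code) follows; the values of v and s are the Figure 3 parameters and the starting point follows Figure 4.

import numpy as np

v = np.array([1.0, 1/2, 1/3])    # Figure 3 parameters (illustrative)
s = np.array([2.0, 5.0, 10.0])
n = len(v)

fp  = lambda x: np.sum(-1.0/(x + v) + s/(x + v)**2)          # f'(x) for (21)
fpp = lambda x: np.sum(1.0/(x + v)**2 - 2.0*s/(x + v)**3)    # f''(x) for (21)

def update_27(x):
    # Fit g(x) = k - n log(x+a) - b/(x+a) via (25)-(26), then move to its maximum, per (27).
    d1, d2 = fp(x), fpp(x)
    if d2 != 0.0:
        a = -(np.sqrt(d1**2 - n*d2) + d1) / d2 - x
    else:
        a = -n / (2.0*d1) - x
    return max(0.0, x + d1 * (x + a)**2 / n)

x = 2.0
for _ in range(5):
    x = update_27(x)
# Converges in a few iterations even though f''(x) changes sign on this objective;
# as the text notes, if the fitted a, b do not satisfy a >= 0 and b > 0, fall back to another scheme.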

Figure 3: Approximations to the objective (21) at x = 2 (left) and x = 40 (right). Here n = 3 and v = [1, 1/2, 1/3], s = [2, 5, 10]. In both cases, the proposed approximation (22) is nearly identical to the true objective.

Figure 4: Convergence of three iteration schemes (Newton (10), Newton in y = log(x) (20), and method (27)) starting from x = 2 on the objective in Figure 3. Update (19) fails and is not shown. The proposed method reaches machine precision in 3 iterations. Starting from x = 40, the proposed method converges equally fast, while the updates (10) and (20) fail.

4 Cauchy approximation

In certain maximum-likelihood and robust estimation problems, we encounter the objective

f(x) = -Σ_{i=1}^n log(1 + a_i (x - b_i)^2)    (28)

For consistency with previous sections, it is written as a maximization problem. Following the example of the last section, consider a local approximation which has the form of one term:

g(x) = k - log(1 + a(x - b)^2)    (29)
g'(x) = -2a(x - b) / (1 + a(x - b)^2)    (30)
g''(x) = -2a (1 - a(x - b)^2) / (1 + a(x - b)^2)^2    (31)

Matching f and its derivatives at x_0 requires

b = x_0 - f'(x_0) / (f''(x_0) - f'(x_0)^2)    (32)
a = (f''(x_0) - f'(x_0)^2)^2 / (f'(x_0)^2 - 2 f''(x_0))    (33)

Figure 5 compares this approximation with a quadratic. It doesn't fit particularly well, but it is robust. The approximation has a maximum when a > 0, i.e. 2 f''(x_0) < f'(x_0)^2, which is a weaker constraint than the f''(x_0) < 0 required by Newton's method. When the constraint is satisfied, the maximum of g(x) is at b, so the update rule is

x_new = x - f'(x) / (f''(x) - f'(x)^2)    (34)

This can be used as a general replacement for Newton's method when maximizing any function. For the function in figure 5, the convergence rate is comparable to Newton's method when started at a point where Newton's method doesn't fail (points near the maximum).
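Because (34) needs only f'(x) and f''(x), it can stand in for the Newton step in a scalar maximization loop. The following is an illustrative sketch, not the paper's code; it uses the objective (28) with the Figure 5 parameters (taken here as assumptions) and a starting point near the maximum, and it checks the constraint 2 f''(x) < f'(x)^2 before each step.

import numpy as np

a = np.array([10.0, 20.0, 30.0])   # Figure 5 parameters (illustrative)
b = np.array([2.0, 1.5, 1.0])

fp  = lambda x: np.sum(-2.0*a*(x - b) / (1.0 + a*(x - b)**2))                  # f'(x) for (28)
fpp = lambda x: np.sum(-2.0*a*(1.0 - a*(x - b)**2) / (1.0 + a*(x - b)**2)**2)  # f''(x) for (28)

def update_34(x):
    # Cauchy-approximation step (34); requires 2 f''(x) < f'(x)^2, weaker than f''(x) < 0.
    d1, d2 = fp(x), fpp(x)
    if 2.0*d2 >= d1**2:
        raise ValueError("constraint violated: fall back to another method")
    return x - d1 / (d2 - d1**2)

x = 2.0
for _ in range(10):
    x = update_34(x)
# Starting near the maximum (x = 2), the iteration converges at a rate comparable to
# Newton's method, as the text notes, while remaining valid at points where
# f''(x) >= 0 as long as 2 f''(x) < f'(x)^2.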

Figure 5: Approximations to the objective (28) at x = 2 (left) and x = 0.6 (right). Here n = 3 and a = [10, 20, 30], b = [2, 1.5, 1].

5 Multivariate example

The approach readily generalizes to multivariate functions. For example, suppose (28) was a function of a vector x:

f(x) = -Σ_{i=1}^n log(1 + (x - b_i)^T A_i (x - b_i))    (35)

Instead of using a univariate Cauchy-type approximation, we can use a multivariate Cauchy:

g(x) = k - log(1 + (x - b)^T A (x - b))    (36)

The gradient and Hessian of g are:

∇g(x) = -2 A (x - b) / (1 + (x - b)^T A (x - b))    (37)
∇∇g(x) = -2 A / (1 + (x - b)^T A (x - b)) + 4 A (x - b)(x - b)^T A / (1 + (x - b)^T A (x - b))^2    (38)

We set A and b by matching the gradient and Hessian of f at x_0. This gives:

H(x_0) = ∇∇f(x_0)    (39)
b = x_0 - (H(x_0) - ∇f(x_0) ∇f(x_0)^T)^{-1} ∇f(x_0)    (40)
s = ∇f(x_0)^T H(x_0)^{-1} ∇f(x_0)    (41)
A = (H(x_0) - ∇f(x_0) ∇f(x_0)^T) (s - 1)/(2 - s)    (42)

The approximation has a maximum when A is positive definite, which is a weaker constraint than H(x_0) being negative definite as required by Newton's method. When the constraint is satisfied, the maximum of g(x) is at b, so the update rule is

x_new = x_0 - (H(x_0) - ∇f(x_0) ∇f(x_0)^T)^{-1} ∇f(x_0)    (43)
      = x_0 - H(x_0)^{-1} ∇f(x_0) / (1 - ∇f(x_0)^T H(x_0)^{-1} ∇f(x_0))    (44)

This can be used as a general replacement for Newton's method when maximizing any multivariate function. From (44) we see that it moves in the same direction that Newton would, but with a different stepsize. This stepsize adjustment allows it to work even when Newton would fail.
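A minimal sketch of the multivariate step (43)/(44), assuming the gradient and Hessian are available as numpy arrays: it computes the Newton direction and rescales it by 1/(1 - s). The single-term check below (the particular A1, b1, and x0) is an illustrative assumption; when (35) has n = 1 the Cauchy approximation is exact, so one step should land on b1.

import numpy as np

def generalized_newton_step(x, grad, hess):
    # Multivariate Cauchy step (43)/(44): the Newton direction with a rescaled stepsize.
    newton_dir = np.linalg.solve(hess, grad)   # H^{-1} grad
    s = grad @ newton_dir                      # s of equation (41)
    return x - newton_dir / (1.0 - s)          # equation (44)

# Check on a single-term objective (35), where the approximation is exact.
A1 = np.array([[3.0, 1.0], [1.0, 2.0]])
b1 = np.array([1.0, -2.0])
x0 = np.array([4.0, 5.0])

d = x0 - b1
D = 1.0 + d @ A1 @ d
grad = -2.0 * A1 @ d / D                        # gradient of (35) with n = 1
hess = -2.0 * A1 / D + np.outer(grad, grad)     # Hessian of (35) with n = 1
print(generalized_newton_step(x0, grad, hess))  # prints [ 1. -2.] up to rounding
# Here hess is indefinite, so Newton's requirement (negative definite Hessian) fails,
# yet the rescaled step lands exactly on the maximizer b1.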
