Beyond Newton's method
Thomas P. Minka
2000 (revised 7/21/2017)

Abstract

Newton's method for optimization is equivalent to iteratively maximizing a local quadratic approximation to the objective function. But some functions are not well-approximated by a quadratic, leading to slow convergence, and some have turning points where the curvature changes sign, leading to failure. To fix this, we can use a more appropriate choice of local approximation than quadratic, based on the type of function we are optimizing. This paper demonstrates three such generalized Newton rules. Like Newton's method, they only involve the first two derivatives of the function, yet converge faster and fail less often.

1 Newton's method

Newton's method is usually described in terms of root-finding, but it can also be understood as maximizing a local quadratic approximation to the objective function. Let the objective be f(x). A quadratic approximation about x_0 has the form

    g(x) = k + (b/2)(x - a)^2                                            (1)
    g'(x) = b(x - a)                                                     (2)
    g''(x) = b                                                           (3)

With three free parameters, we can make g and its first two derivatives match those of f at x_0. That is, (k, b, a) are given by

    k + (b/2)(x_0 - a)^2 = f(x_0)                                        (4)
    b(x_0 - a) = f'(x_0)                                                 (5)
    b = f''(x_0)                                                         (6)

whose solution is

    b = f''(x_0)                                                         (7)
    a = x_0 - f'(x_0)/f''(x_0)                                          (8)
    k = f(x_0) - f'(x_0)^2 / (2 f''(x_0))                               (9)
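Written as a procedure, the resulting update is a simple loop. The following is a minimal Python sketch (my illustration, not from the paper; the helper names and the example objective are assumptions) that makes the failure mode explicit: the step is only a maximization step while the fitted curvature b = f''(x) is negative.

    def newton_maximize(f1, f2, x0, iters=20, tol=1e-12):
        """Maximize f by the Newton update (10): x <- x - f'(x)/f''(x).

        f1, f2 are the first and second derivatives of the objective.
        The step is only valid while the curvature b = f''(x) is negative.
        """
        x = x0
        for _ in range(iters):
            b = f2(x)
            if b >= 0:
                raise ValueError("f''(x) >= 0: the quadratic has no maximum")
            step = f1(x) / b
            x = x - step
            if abs(step) < tol:
                return x
        return x

    # Example: f(x) = log(x) - x has its maximum at x = 1.
    print(newton_maximize(lambda x: 1/x - 1, lambda x: -1/x**2, x0=0.5))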
If b < 0, then the maximum of g is a, leading to the update rule

    x_new = x - f'(x)/f''(x)                                            (10)

which is exactly Newton's method for optimization. Unfortunately, if b = f''(x) >= 0, then the method fails and we must use some other optimization technique to get closer to the optimum of f.

2 A non-quadratic variation

In maximum-likelihood estimation of a Dirichlet distribution, we encounter the objective

    f(x) = log [ Γ(x) exp(s x) / ∏_{k=1}^K Γ(x m_k) ],   x > 0          (11)

This objective is concave, but Newton's method may still fail by producing a negative value. We can get a faster and more stable algorithm by using a local approximation of the form

    g(x) = k + a log(x) + b x                                           (12)
    g'(x) = a/x + b                                                     (13)
    g''(x) = -a/x^2                                                     (14)

Matching f and its derivatives at x_0 requires

    a = -x_0^2 f''(x_0)                                                 (15)
    b = f'(x_0) - a/x_0                                                 (16)

Figure 1 demonstrates the high quality of this approximation compared to a quadratic. Since f''(x) <= 0 for all x, we know a >= 0. If in addition b < 0, then the maximum of g is -a/b, giving the update rule

    1/x_new = 1/x + f'(x) / (x^2 f''(x))                                (17)

If it happens that b >= 0, then some other method must be used. Note that this update cannot be achieved by simply changing the parameterization of f and applying Newton's method to the new parameterization. For example, if we write f(x) = f(1/y) and use a quadratic approximation in y, we get the Newton update

    y_new = y - f'(1/y)(-1/y^2) / [ f''(1/y)(1/y^4) + f'(1/y)(2/y^3) ]  (18)

or

    1/x_new = 1/x + f'(x) / (x^2 f''(x) + 2x f'(x))                     (19)
which is as close as you can get to (17) using reparameterization. Since x must be positive, it is also tempting to reparameterize with f(x) = f(e^y). Applying Newton's method to y gives

    x_new = x exp( -f'(x) / (x f''(x) + f'(x)) )                        (20)

Figure 2 compares the convergence rate of the three updates (17), (19), and (20).

[Figure 1: Approximations to the objective (11) at x = 1 (left) and x = 6 (right). Here K = 3, m = [1/2, 1/6, 1/3], and s = -1.5.]

[Figure 2 (table): iterates of Newton in y = 1/x (19), Newton in y = log(x) (20), and method (17), all starting from x = 6 on the objective in Figure 1. The proposed method (17) reaches machine precision in 4 iterations.]
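To make update (17) concrete, here is a minimal Python sketch (mine, not from the paper) using scipy's digamma/trigamma functions for the derivatives of (11). The constants follow the Figure 1 caption; note that a maximum only exists when s is sufficiently negative. The validity checks on a and b are omitted for brevity.

    import numpy as np
    from scipy.special import psi, polygamma

    m = np.array([1/2, 1/6, 1/3])   # K = 3 mean parameters, as in Figure 1
    s = -1.5                        # sufficient statistic; must be negative
                                    # enough for a maximum to exist

    def f1(x):   # f'(x) = psi(x) + s - sum_k m_k psi(x m_k)
        return psi(x) + s - np.sum(m * psi(x * m))

    def f2(x):   # f''(x) = psi'(x) - sum_k m_k^2 psi'(x m_k)
        return polygamma(1, x) - np.sum(m**2 * polygamma(1, x * m))

    x = 6.0                         # starting point used in Figure 2
    for it in range(10):
        # Update (17): 1/x_new = 1/x + f'(x) / (x^2 f''(x))
        x_new = 1.0 / (1.0/x + f1(x) / (x**2 * f2(x)))
        print(it, x_new)
        if abs(x_new - x) < 1e-12 * x:
            break
        x = x_new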
3 Another variation

In maximum-likelihood estimation of certain Gaussian models, we encounter the objective

    f(x) = sum_{i=1}^n [ -log(x + v_i) - s_i/(x + v_i) ],   x >= 0      (21)

This objective is not concave and has turning points which confound Newton's method. Instead of a quadratic g, consider a local approximation of the form

    g(x) = k - n log(x + a) - b/(x + a)                                 (22)
    g'(x) = -n/(x + a) + b/(x + a)^2                                    (23)
    g''(x) = n/(x + a)^2 - 2b/(x + a)^3                                 (24)

Matching f and its derivatives at x_0 requires

    b = n(x_0 + a) + f'(x_0)(x_0 + a)^2                                 (25)

    a = [ sqrt(f'(x_0)^2 - n f''(x_0)) + f'(x_0) ] / (-f''(x_0)) - x_0   if f''(x_0) != 0    (26)
    a = -n / (2 f'(x_0)) - x_0                                            if f''(x_0) = 0

Figure 3 demonstrates the high quality of this approximation compared to a quadratic. If a >= 0 and b > 0, the unconstrained maximum of g is b/n - a, so the update rule is

    x_new = max(0, x + f'(x)(x + a)^2 / n)                              (27)

If the conditions aren't met, which can happen when we are far from the maximum, then some other scheme must be used, just as with Newton's method. However, the method does not fail when f''(x) crosses zero. Figure 4 compares the convergence rate of the updates (10), (20), and (27).
[Figure 3: Approximations to the objective (21) at x = 2 (left) and x = 40 (right). Here n = 3 and v = [1, 1/2, 1/3], s = [2, 5, 10]. In both cases, the proposed approximation (22) is nearly identical to the true objective.]

[Figure 4 (table): iterates of Newton (10), Newton in y = log(x) (20), and method (27), starting from x = 2 on the objective in Figure 3. Update (19) fails and is not shown. The proposed method reaches machine precision in 3 iterations. Starting from x = 40, the proposed method converges equally fast, while the updates (10) and (20) fail.]
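The following minimal Python sketch (my illustration; the paper gives only the formulas) implements (26) and (27) for objective (21), using the Figure 3 constants. The validity checks a >= 0 and b > 0 are omitted for brevity.

    import numpy as np

    v = np.array([1.0, 1/2, 1/3])   # known variances, as in Figure 3
    s = np.array([2.0, 5.0, 10.0])  # sufficient statistics
    n = len(v)

    def f1(x):   # f'(x) = sum_i [ -1/(x+v_i) + s_i/(x+v_i)^2 ]
        return np.sum(-1/(x + v) + s/(x + v)**2)

    def f2(x):   # f''(x) = sum_i [ 1/(x+v_i)^2 - 2 s_i/(x+v_i)^3 ]
        return np.sum(1/(x + v)**2 - 2*s/(x + v)**3)

    x = 2.0                         # starting point used in Figure 4
    for it in range(20):
        d1, d2 = f1(x), f2(x)
        # Equation (26): choose a so that g matches f' and f'' at x.
        if d2 != 0:
            a = (np.sqrt(d1**2 - n*d2) + d1) / (-d2) - x
        else:
            a = -n / (2*d1) - x
        # Equation (27): maximize g, clipping at the boundary x >= 0.
        x_new = max(0.0, x + d1*(x + a)**2 / n)
        print(it, x_new)
        if abs(x_new - x) < 1e-14 * (1 + x):
            break
        x = x_new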
4 Cauchy approximation

In certain maximum-likelihood and robust estimation problems, we encounter the objective

    f(x) = -sum_{i=1}^n log(1 + a_i (x - b_i)^2)                        (28)

For consistency with previous sections, it is written as a maximization problem. Following the example of the last section, consider a local approximation which has the form of one term:

    g(x) = k - log(1 + a(x - b)^2)                                      (29)
    g'(x) = -2a(x - b) / (1 + a(x - b)^2)                               (30)
    g''(x) = -2a (1 - a(x - b)^2) / (1 + a(x - b)^2)^2                  (31)

Matching f and its derivatives at x_0 requires

    b = x_0 - f'(x_0) / (f''(x_0) - f'(x_0)^2)                          (32)
    a = (f''(x_0) - f'(x_0)^2)^2 / (f'(x_0)^2 - 2 f''(x_0))             (33)

Figure 5 compares this approximation with a quadratic. It doesn't fit particularly well, but it is robust. The approximation has a maximum when a > 0, i.e. 2 f''(x_0) < f'(x_0)^2, which is a weaker constraint than f''(x_0) < 0 required by Newton's method. When the constraint is satisfied, the maximum of g(x) is b, so the update rule is

    x_new = x - f'(x) / (f''(x) - f'(x)^2)                              (34)

This can be used as a general replacement for Newton's method when maximizing any function. For the function in Figure 5, the convergence rate is comparable to Newton when started at a point where Newton doesn't fail (points near the maximum).
[Figure 5: Approximations to the objective (28) at x = 2 (left) and x = 0.6 (right). Here n = 3 and a = [10, 20, 30], b = [2, 1.5, 1].]
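Here is a minimal Python sketch of update (34) on objective (28), using the Figure 5 constants; the helper names and stopping rule are my own. With these constants, f''(0.6) > 0, so plain Newton would fail at that starting point, while (34) still takes an uphill step.

    import numpy as np

    a = np.array([10.0, 20.0, 30.0])   # scales, as in Figure 5
    b = np.array([2.0, 1.5, 1.0])      # locations

    def f1(x):   # f'(x) = sum_i -2 a_i (x-b_i) / (1 + a_i (x-b_i)^2)
        return np.sum(-2*a*(x - b) / (1 + a*(x - b)**2))

    def f2(x):   # f''(x) = sum_i -2 a_i (1 - a_i (x-b_i)^2) / (1 + a_i (x-b_i)^2)^2
        return np.sum(-2*a*(1 - a*(x - b)**2) / (1 + a*(x - b)**2)**2)

    x = 0.6
    for it in range(30):
        d1, d2 = f1(x), f2(x)
        denom = d2 - d1**2     # update (34); valid when 2 f'' < f'^2, i.e. a > 0
        if denom >= 0:
            raise ValueError("approximation has no maximum here")
        x_new = x - d1/denom
        print(it, x_new)
        if abs(x_new - x) < 1e-14:
            break
        x = x_new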
5 Multivariate example

The approach readily generalizes to multivariate functions. For example, suppose (28) was a function of a vector x:

    f(x) = -sum_{i=1}^n log(1 + (x - b_i)^T A_i (x - b_i))              (35)

Instead of using a univariate Cauchy-type approximation, we can use a multivariate Cauchy:

    g(x) = k - log(1 + (x - b)^T A (x - b))                             (36)

The gradient and Hessian of g are:

    ∇g(x) = -2 A (x - b) / (1 + (x - b)^T A (x - b))                    (37)
    ∇∇g(x) = -2 A / (1 + (x - b)^T A (x - b))
             + 4 A (x - b)(x - b)^T A / (1 + (x - b)^T A (x - b))^2     (38)

We set A and b by matching the gradient and Hessian of f at x_0. This gives:

    H(x_0) = ∇∇f(x_0)                                                   (39)
    b = x_0 - (H(x_0) - ∇f(x_0) ∇f(x_0)^T)^{-1} ∇f(x_0)                 (40)
    s = ∇f(x_0)^T H(x_0)^{-1} ∇f(x_0)                                   (41)
    A = (H(x_0) - ∇f(x_0) ∇f(x_0)^T) (s - 1)/(2 - s)                    (42)

The approximation has a maximum when A is positive definite, which is a weaker constraint than H(x_0) being negative definite as required by Newton's method. When the constraint is satisfied, the maximum of g(x) is b, so the update rule is

    x_new = x_0 - (H(x_0) - ∇f(x_0) ∇f(x_0)^T)^{-1} ∇f(x_0)             (43)
          = x_0 - H(x_0)^{-1} ∇f(x_0) / (1 - ∇f(x_0)^T H(x_0)^{-1} ∇f(x_0))    (44)

This can be used as a general replacement for Newton's method when maximizing any multivariate function. From (44) we see that it moves in the same direction that Newton would, but with a different stepsize. This stepsize adjustment allows it to work even when Newton would fail.
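A minimal multivariate sketch in Python (the problem constants and helper names are illustrative assumptions, not from the paper) of update (44): compute the Newton direction, then rescale it by 1/(1 - s). For brevity it does not verify that A in (42) is positive definite.

    import numpy as np

    # Illustrative 2-D instance of objective (35).
    A_i = [np.diag([10.0, 5.0]), np.diag([20.0, 2.0]), np.diag([30.0, 1.0])]
    b_i = [np.array([2.0, 0.0]), np.array([1.5, 0.5]), np.array([1.0, 1.0])]

    def grad(x):   # gradient of (35), summing terms of the form (37)
        g = np.zeros_like(x)
        for A, b in zip(A_i, b_i):
            t = x - b
            g += -2*A @ t / (1 + t @ A @ t)
        return g

    def hess(x):   # Hessian of (35), summing terms of the form (38)
        H = np.zeros((len(x), len(x)))
        for A, b in zip(A_i, b_i):
            t = x - b
            D = 1 + t @ A @ t
            H += -2*A/D + 4*np.outer(A @ t, A @ t)/D**2
        return H

    x = np.array([0.6, 0.2])
    for it in range(50):
        g, H = grad(x), hess(x)
        p = np.linalg.solve(H, g)    # Newton direction H^{-1} grad f
        s = g @ p                    # eq. (41)
        x_new = x - p/(1 - s)        # eq. (44): rescaled Newton step
        print(it, x_new)
        if np.linalg.norm(x_new - x) < 1e-12:
            break
        x = x_new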