Least squares methods in maximum likelihood problems. M.R. Osborne, School of Mathematical Sciences, ANU
1 Least squares methods in maximum likelihood problems. M.R. Osborne, School of Mathematical Sciences, ANU. September 22,
2 Abstract: It is well known that the Gauss-Newton algorithm for solving nonlinear least squares problems is a special case of the scoring algorithm for maximizing log likelihoods. What has received less attention is that the computation of the current correction in the scoring algorithm, in both its line search and trust region forms, can be cast as a linear least squares problem. This is an important observation both because it provides likelihood methods with a general framework which accords with computational orthodoxy, and because it can be seen as underpinning computational procedures which have been developed for particular classes of likelihood problems (for example, generalised linear models). Aspects of this orthodoxy as it affects considerations such as convergence and effectiveness will be reviewed.
3 1. The point of the paper has changed somewhat, with more emphasis on what we know about Gauss-Newton.
2. Will not consider estimation of DEs per se. However, (i) the results apply to stable simple shooting estimation problems, and (ii) to suitably imbedded multiple shooting formulations - and we know how to do this.
3. Strictly, the results do not apply to (our work on) the simultaneous approach to ODE estimation. The problem is technically a mixed problem as the limiting multipliers satisfy a stochastic DE.
4. Will not say anything about optimum observation points. However, the information matrix will be much in evidence, and its determinant is a quantity to maximize as a function of these.
4 Start with independent event outcomes $y_t \in \mathbb{R}^q$, $t = 1, 2, \dots, n$, associated pdf $g(y_t; \theta_t, t)$ indexed by points $t \in T \subset \mathbb{R}^l$, and structural information provided by a known parametric model $\theta_t = \eta(t, x)$ where $\theta \in \mathbb{R}^s$ and $x \in \mathbb{R}^p$. A priori information is the experimental design $T_n$, $|T_n| = n$. For asymptotics require
$$\frac{1}{n}\sum_{t \in T_n} f(t) \to \int_{S(T)} f(t)\rho(t)\,dt.$$
The problem: given the event outcomes $y_t$ it is required to estimate $x$. It is not assumed that the individual components of $y_t$ are independent.
5 Example. Exponential family:
$$g\left(y; \begin{bmatrix}\theta\\ \phi\end{bmatrix}\right) = c(y, \phi)\exp\left[\left\{y^T\theta - b(\theta)\right\}/a(\phi)\right],$$
$$E\{y\} = \mu(x, t) = \nabla_\theta b(\theta)^T,\qquad V\{y\} = a(\phi)\,\nabla_\theta^2 b(\theta).$$
Signal in noise model: $\theta = \mu$.
(i) Normal density:
$$g = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right),\qquad c(y, \phi) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{y^2}{2\sigma^2}\right),$$
$$a(\phi) = \sigma^2,\qquad \theta = \mu,\qquad b(\theta) = \frac{\mu^2}{2}.$$
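The exponential-family identities are easy to sanity check numerically. The following sketch (mine, not from the slides; the values of θ and σ are arbitrary) differences $b(\theta) = \theta^2/2$ for the normal case and compares with the known mean and variance:

```python
import math

# Hedged numerical check of E{y} = b'(theta) and V{y} = a(phi) b''(theta)
# for the normal case, with b(theta) = theta^2/2 and a(phi) = sigma^2.
def b(theta):
    return 0.5 * theta ** 2

sigma = 1.5
theta = 0.7          # theta = mu in the signal-in-noise model
eps = 1e-5

# central finite differences for b' and b''
b1 = (b(theta + eps) - b(theta - eps)) / (2 * eps)
b2 = (b(theta + eps) - 2 * b(theta) + b(theta - eps)) / eps ** 2

mean = b1              # should equal mu = theta = 0.7
var = sigma ** 2 * b2  # should equal sigma^2 = 2.25
print(mean, var)       # -> approximately 0.7 and 2.25
```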
6 (ii) Multinomial (discrete) distribution:
$$g(n; \omega) = \frac{n!}{\prod_{j=1}^p n_j!}\prod_{j=1}^p \omega_j^{n_j} = \frac{n!}{\prod_{j=1}^p n_j!}\,e^{\sum_{j=1}^p n_j\log\omega_j},$$
where $\sum_{j=1}^p n_j = n$, and the frequencies must satisfy the condition $\sum_{j=1}^p \omega_j = 1$. Eliminating $\omega_p$ gives
$$\sum_{j=1}^p n_j\log\omega_j = \sum_{j=1}^{p-1} n_j\log\frac{\omega_j}{1 - \sum_{i=1}^{p-1}\omega_i} + n\log\left(1 - \sum_{i=1}^{p-1}\omega_i\right).$$
It follows that
$$\theta_j = \log\frac{\omega_j}{1 - \sum_{i=1}^{p-1}\omega_i},\qquad b(\theta) = n\log\frac{1}{1 - \sum_{j=1}^{p-1}\omega_j} = n\log\left(1 + \sum_{j=1}^{p-1}e^{\theta_j}\right).$$
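These identities can be verified on a small case (a sketch of mine, with illustrative counts and frequencies): rewriting the log term as $\sum_{j<p} n_j\theta_j - b(\theta)$ must reproduce $\sum_j n_j\log\omega_j$.

```python
import math

# Hedged check of the multinomial natural-parameter identities for p = 3
# cells; the counts and frequencies are invented for the illustration.
omega = [0.2, 0.5, 0.3]          # frequencies, summing to 1
counts = [2, 5, 3]               # n_j, with n = 10
n = sum(counts)
p = len(omega)

# theta_j = log(omega_j / (1 - sum_{i<p} omega_i)) = log(omega_j / omega_p)
theta = [math.log(omega[j] / omega[p - 1]) for j in range(p - 1)]
# b(theta) = n log 1/(1 - sum_{j<p} omega_j) = -n log omega_p
b = -n * math.log(omega[p - 1])

lhs = sum(counts[j] * math.log(omega[j]) for j in range(p))
rhs = sum(counts[j] * theta[j] for j in range(p - 1)) - b
print(lhs, rhs)   # the two expressions agree
```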
7 Likelihood:
$$G(y; x, T) = \prod_{t\in T} g(y_t; \theta_t, t).$$
Estimation principle:
$$\hat{x} = \arg\max_x G(y; x, T).$$
Log likelihood:
$$L(y; x, T) = \sum_{t\in T}\log g(y_t; \theta_t, t) = \sum_{t\in T} L_t(y_t; \theta_t, t).$$
Assume: true model η, parameter vector $x^*$; $x^*$ properly in the interior of the region in which L is well behaved; boundedness of integrals (computing expectations etc.).
8 Properties:
$$E\{\nabla_xL(y; x, T)\} = 0;\qquad E\{\nabla_x^2L(y; x, T)\} = -E\{\nabla_xL^T\nabla_xL\}.$$
Fisher information:
$$I_n = \frac{1}{n}E\{\nabla_xL(y; x, T)^T\nabla_xL(y; x, T)\} = \frac{1}{n}V\{\nabla_xL(y; x, T)\}.$$
Maximum likelihood estimates are consistent and asymptotically minimum variance.
9 Algorithms: Describe the step h, $x \to x + h$.
Newton:
$$J_n = -\frac{1}{n}\nabla_x^2L(y; x, T_n),\qquad h = J_n^{-1}\frac{1}{n}\nabla_xL(y; x, T)^T.$$
Scoring:
$$h = I_n^{-1}\frac{1}{n}\nabla_xL(y; x, T)^T.$$
Sample:
$$S_n = \frac{1}{n}\sum_{t\in T_n}\nabla_xL_t(y_t; x, t)^T\nabla_xL_t(y_t; x, t),\qquad h = S_n^{-1}\frac{1}{n}\nabla_xL(y; x, T)^T.$$
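To make the three corrections concrete, here is a hedged one-parameter toy (model, data, and noise invented for the illustration, not from the talk): a normal signal-in-noise model $\mu(x, t) = \exp(-xt)$, for which all the derivatives are available in closed form.

```python
import math

# Hedged toy: one-parameter normal signal-in-noise model mu(x, t) = exp(-x t),
# so L_t = -(y_t - mu)^2 / (2 sigma^2) up to a constant. The data are
# deterministic pseudo-observations: true curve plus a fixed perturbation.
sigma = 0.1
ts = [0.1 * i for i in range(1, 11)]
x_true, x = 2.0, 1.5
ys = [math.exp(-x_true * t) + sigma * math.sin(7.0 * t) for t in ts]

n = len(ts)
g = 0.0    # (1/n) dL/dx, the scaled score
J_n = 0.0  # Newton:  -(1/n) d2L/dx2
I_n = 0.0  # scoring: expected (Fisher) information
S_n = 0.0  # sample:  (1/n) sum_t (dL_t/dx)^2
for t, y in zip(ts, ys):
    mu = math.exp(-x * t)
    dmu, d2mu = -t * mu, t * t * mu
    r = y - mu
    g += r * dmu / sigma**2 / n
    J_n += (dmu * dmu - r * d2mu) / sigma**2 / n
    I_n += dmu * dmu / sigma**2 / n
    S_n += (r * dmu / sigma**2) ** 2 / n

h_newton, h_scoring, h_sample = g / J_n, g / I_n, g / S_n
print(h_newton, h_scoring, h_sample)
```

All three corrections here are ascent steps moving the estimate toward the value that generated the data.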
10 Equivalence:
$$\lim_{n\to\infty}I_n(x^*) = \lim_{n\to\infty}S_n(x^*) = \lim_{n\to\infty}J_n(x^*) = I,$$
where
$$I = -\int_{S(T)}E\{\nabla_x^2L(y; \theta_t, t)\}\rho(t)\,dt.$$
Transformation invariance - scoring, sample, but not Newton. Let $u = u(x)$, $W = \frac{\partial u}{\partial x}$; then $h_u = Wh_x$.
$$\nabla_xL = \nabla_uL\,W,\qquad I_x = W^TI_uW,$$
$$h_x = \left(W^TI_uW\right)^{-1}\frac{1}{n}W^T\nabla_uL^T = W^{-1}(I_u)^{-1}\frac{1}{n}\nabla_uL^T = W^{-1}h_u.$$
11 Implementation: This requires: (1) a method for generating h, a direction in which the objective is increasing; (2) a method for estimating progress. A full step need not be satisfactory. This is especially true of initial steps. To measure progress introduce a monitor Φ(x) - it needs the same local stationary points, and to be increasing when the objective F is increasing:
$$\nabla F\,h \ge 0 \;\Rightarrow\; \nabla\Phi\,h \ge 0.$$
Two approaches: (1) Line search: an effective search direction is computed, and the monitor is used to gauge a profitable step in this direction. (2) Trust region: the step is required to lie in an adaptively defined control region - typically one in which the linearization of F does not depart too far from true behaviour.
12 The direction of search is required to be one which increases the objective. This is the case here for both the scoring and sample algorithms:
$$\nabla_xL\,h = \frac{1}{n}\nabla_xL\,I_n^{-1}\nabla_xL^T > 0,\qquad x \ne \hat{x}.$$
Note this shows that L is a suitable monitor. $\nabla_xL\,h$ is invariant:
$$n\nabla_xL\,h_x = \nabla_xL\,I_n^{-1}\nabla_xL^T = \nabla_uL\,W\left(W^TI_n^uW\right)^{-1}W^T\nabla_uL^T = \nabla_uL\left(I_n^u\right)^{-1}\nabla_uL^T = n\nabla_uL\,h_u.$$
Will see there are good ways to compute this.
13 Measuring effectiveness: step $x \to x + \lambda h$.
Goldstein: accept if
$$\rho \le \Psi(\lambda, x, h) \le 1 - \rho,\qquad 0 < \rho < .5,\qquad \Psi = \frac{\Phi(x + \lambda h) - \Phi(x)}{\lambda\,\nabla_x\Phi(x)h}.$$
Can always choose λ to satisfy this test.
Armijo: let $0 < \rho < 1$, and $\lambda = \rho^k$ where k is the smallest integer such that
$$\Phi(x + \rho^{k-1}h) < \Phi(x + \rho^kh).$$
Goldstein is fine for proving results, but there is no direct link in the sense of a small step asymptotic equivalence relating λ and $\rho^k$, and it does not tell how to find λ, while Armijo does. The choice λ = 1 is favoured for scale invariant, Newton type methods. Also $\Psi \to .5$ for the scoring and sample methods provided n is large enough; $.5 \ge \rho \ge .1$.
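A minimal backtracking search in this spirit can be sketched as follows. This is the common textbook sufficient-increase form of the Armijo idea, not necessarily the exact variant used here, and the toy objective is invented:

```python
# Hedged sketch: backtracking line search with a sufficient-increase test.
# Phi is the monitor; here Phi = L for a toy concave objective.
def backtrack(phi, grad_phi_h, x, h, rho=0.25, shrink=0.5, max_iter=50):
    """Return a step multiplier lam giving sufficient increase of phi."""
    phi0 = phi(x)
    lam = 1.0
    for _ in range(max_iter):
        # accept when the achieved increase is at least rho * lam * slope
        if phi(x + lam * h) - phi0 >= rho * lam * grad_phi_h:
            return lam
        lam *= shrink
    return lam

# toy concave objective L(x) = -(x - 3)^2, ascent direction h = L'(x)
L = lambda x: -(x - 3.0) ** 2
x0 = 0.0
h = -2.0 * (x0 - 3.0)            # gradient = 6 > 0
lam = backtrack(L, h * h, x0, h)  # directional derivative is h^2 = 36
print(lam, L(x0 + lam * h))       # -> 0.5 0.0 (half step lands on the maximizer)
```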
14 Convergence: a nice inequality (Kantorovich):
$$\frac{\nabla_xL\,h}{\|\nabla_xL\|\,\|h\|} \ge \frac{1}{\operatorname{cond}(I_n)^{1/2}}.$$
If the region $R_L$ is bounded and $I_n$ positive definite, then an ascent direction exists uniformly and a lower bound $\lambda_R$ exists for the line search step multiplier. Effectiveness means L can be used as a monitor (not true for pure Newton). Goldstein gives
$$\nabla_xL\,h \le \frac{L(x + \lambda h) - L(x)}{\rho\lambda_R} \to 0,$$
as L(x) is increasing and bounded, giving convergence. Also
$$h = I_n^{-1}\frac{1}{n}\nabla_xL^T \;\Rightarrow\; \|\nabla_xL\| \le n\|I_n\|\,\|h\|.$$
Putting it all together gives
$$\|\nabla_xL\|^2 \le K\left(L(x + \lambda h) - L(x)\right) \to 0.$$
15 What happens if $\inf\{\lambda_i\} = 0$? There must be a sequence $\{\lambda_i\}$, $\inf\lambda_i = 0$, with
$$\rho > \frac{L(x_i + \lambda_ih_i) - L(x_i)}{\lambda_i\,\nabla_xL(x_i)h_i}.$$
Here Taylor expansion plus the mean value theorem gives
$$\frac{1}{2(1-\rho)n}\left\|\nabla_x^2L(\tilde{x})\right\| > \frac{\sigma_{\min}(I_n)}{\lambda_i}.$$
Almost a global result! The set of allowed approximations need not be closed:
$$\eta = x(1) + x(2)\exp(-x(3)t);\qquad t = \lim_{n\to\infty}\left(n - n\exp\left(-\tfrac{1}{n}t\right)\right).$$
16 Least squares. Consider the sample estimate:
$$S_n = \frac{1}{n}\sum_{t\in T_n}\nabla_xL_t^T\nabla_xL_t = \frac{1}{n}\mathbb{S}_n^T\mathbb{S}_n,\qquad \mathbb{S}_n = \begin{bmatrix}\vdots\\ \nabla_xL_t\\ \vdots\end{bmatrix}.$$
The correction satisfies the least squares problem
$$\min_h r^Tr;\qquad r = \mathbb{S}_nh - e,$$
where $e$ is the vector of ones.
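For a single parameter the equivalence between the least squares form and the sample correction can be checked directly (the per-observation scores $s_t$ below are invented for the illustration):

```python
# Hedged scalar illustration: with rows s_t = dL_t/dx, the sample correction
# h = S_n^{-1} (1/n) dL/dx coincides with the least squares solution of
# min_h sum_t (s_t h - 1)^2, i.e. r = S h - e with e the vector of ones.
st = [0.8, -0.3, 1.1, 0.5, -0.9]   # illustrative per-observation scores s_t
n = len(st)

# least squares solution of min_h ||S h - e||^2 (one parameter)
h_ls = sum(st) / sum(s * s for s in st)

# sample-scoring form: S_n = (1/n) sum s_t^2, scaled score (1/n) sum s_t
S_n = sum(s * s for s in st) / n
grad = sum(st) / n
h_scoring = grad / S_n

print(h_ls, h_scoring)   # identical, by the normal equations
```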
17 Scoring works pretty much the same:
$$I_n = \frac{1}{n}\sum_{t\in T_n}\nabla_x\eta^T\,E\{\nabla_\eta L_t^T\nabla_\eta L_t\}\,\nabla_x\eta.$$
Set $V_t^{-1} = E\{\nabla_\eta L_t^T\nabla_\eta L_t\}$; then obtain the least squares problem
$$\min_h r^Tr;\qquad r = I_n^Lh - b,$$
where
$$I_n^L = \begin{bmatrix}\vdots\\ V_t^{-1/2}\nabla_x\eta\\ \vdots\end{bmatrix},\qquad b = \begin{bmatrix}\vdots\\ V_t^{1/2}\nabla_\eta L_t^T\\ \vdots\end{bmatrix}.$$
18 Example: normal distribution.
$$L_t = -\frac{1}{2\sigma^2}(y_t - \mu(x,t))^2,\qquad \nabla_xL_t = \frac{1}{\sigma^2}(y_t - \mu(x,t))\nabla_x\mu_t,$$
$$-\nabla_x^2L_t = \frac{1}{\sigma^2}\left(\nabla_x\mu_t^T\nabla_x\mu_t - (y_t - \mu(x,t))\nabla_x^2\mu_t\right),$$
$$I_n^L = \begin{bmatrix}\vdots\\ \frac{1}{\sigma}\nabla_x\mu_t\\ \vdots\end{bmatrix},\qquad I_n = \frac{1}{n\sigma^2}\sum_{t\in T_n}\nabla_x\mu_t^T\nabla_x\mu_t.$$
As σ cancels, the result is the Gauss-Newton method. In contrast,
$$S_n = \frac{1}{n\sigma^2}\sum_{t\in T_n}\frac{(y_t - \mu(x,t))^2}{\sigma^2}\nabla_x\mu_t^T\nabla_x\mu_t.$$
Here the scale does not cancel so agreeably.
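A small scalar check of the remark about scale (model and residual values invented): the scoring (Gauss-Newton) step is unchanged when σ changes, while the sample step scales with σ².

```python
import math

# Hedged scalar illustration: compute scoring and sample corrections at two
# values of sigma for a one-parameter model mu(x, t) = exp(-x t). The grid
# and residual perturbations are invented for the demonstration.
ts = [0.2, 0.4, 0.6, 0.8, 1.0]
x = 1.5
mus = [math.exp(-x * t) for t in ts]
ys = [m + d for m, d in zip(mus, [0.05, -0.02, 0.03, -0.04, 0.01])]

def steps(sigma):
    g = sum((y - m) * (-t * m) for t, y, m in zip(ts, ys, mus)) / sigma**2
    info = sum((t * m) ** 2 for t, m in zip(ts, mus)) / sigma**2
    samp = sum(((y - m) * (-t * m)) ** 2 for t, y, m in zip(ts, ys, mus)) / sigma**4
    return g / info, g / samp   # scoring step, sample step

h1_scoring, h1_sample = steps(sigma=0.1)
h2_scoring, h2_sample = steps(sigma=0.5)
print(h1_scoring, h2_scoring)   # equal: sigma has cancelled
print(h1_sample, h2_sample)     # differ by the factor (0.5/0.1)^2 = 25
```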
19 Computation:
$$I_n^L = \begin{bmatrix}Q_1 & Q_2\end{bmatrix}\begin{bmatrix}U\\ 0\end{bmatrix},\qquad h = U^{-1}Q_1^Tb.$$
Can get more value from the factorization:
$$\nabla_xL\,h = \left(b^TI_n^L\right)h = b^TQ\begin{bmatrix}U\\ 0\end{bmatrix}U^{-1}Q_1^Tb = \left\|Q_1^Tb\right\|^2.$$
This is needed for the Goldstein test. Also it $\to 0$ as $\nabla_xL\,h \to 0$, so it provides a scale invariant quantity for convergence testing. Typically one would scale the columns of $I_n^L$ - see Higham's book.
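This computation can be sketched in miniature (the matrix and right hand side are invented; a tiny Gram-Schmidt QR stands in for a library factorization):

```python
# Hedged sketch: factor I_n^L = Q1 U, solve h = U^{-1} Q1^T b by back
# substitution, and check the identity grad(L) h = ||Q1^T b||^2.
A = [[1.0, 0.5], [0.8, -0.2], [0.3, 0.9], [0.6, 0.4]]   # stand-in for I_n^L
b = [0.2, -0.1, 0.4, 0.3]

m, p = len(A), len(A[0])
Q = [[0.0] * p for _ in range(m)]
U = [[0.0] * p for _ in range(p)]
for j in range(p):                      # classical Gram-Schmidt, thin QR
    v = [A[i][j] for i in range(m)]
    for k in range(j):
        U[k][j] = sum(Q[i][k] * A[i][j] for i in range(m))
        v = [v[i] - U[k][j] * Q[i][k] for i in range(m)]
    U[j][j] = sum(c * c for c in v) ** 0.5
    for i in range(m):
        Q[i][j] = v[i] / U[j][j]

c1 = [sum(Q[i][j] * b[i] for i in range(m)) for j in range(p)]  # Q1^T b
h = [0.0] * p                           # back substitution: U h = c1
for j in reversed(range(p)):
    h[j] = (c1[j] - sum(U[j][k] * h[k] for k in range(j + 1, p))) / U[j][j]

grad_L_h = sum(b[i] * sum(A[i][j] * h[j] for j in range(p)) for i in range(m))
print(grad_L_h, sum(c * c for c in c1))  # equal: grad L h = ||Q1^T b||^2
```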
20 Rate of convergence: Consider the unit step scoring iteration in fixed point form:
$$x_{i+1} = F_n(x_i),\qquad F_n(x) = x + I_n(x)^{-1}\frac{1}{n}\nabla_xL(x)^T.$$
The condition for convergence is $\varpi(\nabla F_n(\hat{x}_n)) < 1$, where $\varpi(\nabla F_n(\hat{x}_n))$ is the spectral radius of the variation $\nabla F_n = \partial F_n/\partial x$. Then
$$\varpi(\nabla F_n(\hat{x}_n)) \to 0,\quad \text{a.s.},\quad n\to\infty.$$
$\varpi(\nabla F_n(\hat{x}_n))$ is a (Newton-like) invariant of the likelihood surface, is a measure of the quality of the modelling, and can be estimated by a modification of the power method.
21 To calculate $\varpi(\nabla F_n(\hat{x}_n))$ note that $\nabla_xL(\hat{x}_n) = 0$, thus
$$\nabla F_n(\hat{x}_n) = I + I_n(\hat{x}_n)^{-1}\frac{1}{n}\nabla_x^2L(\hat{x}_n) = I_n(\hat{x}_n)^{-1}\left(I_n(\hat{x}_n) + \frac{1}{n}\nabla_x^2L(\hat{x}_n)\right).$$
If the right hand side were evaluated at $x^*$ then the result would follow from the strong law of large numbers, which shows that the matrix gets small (hence ϖ gets small) almost surely as $n\to\infty$. But, by consistency of the estimates, we have
$$\varpi(\nabla F_n(\hat{x}_n)) = \varpi(\nabla F_n(x^*)) + O(\|\hat{x}_n - x^*\|),\quad \text{a.s.},$$
and the desired result follows.
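The power-method estimate mentioned here can be sketched as follows. The small fixed matrix simply stands in for the variation $\nabla F_n(\hat{x}_n)$; in practice its action on a vector would be obtained from the scoring computation itself.

```python
# Hedged sketch of a power-method estimate of the spectral radius. M is a
# stand-in for the variation matrix; its dominant eigenvalue is 0.5.
M = [[0.3, 0.1], [0.2, 0.4]]
v = [1.0, 1.0]
radius = 0.0
for _ in range(200):
    w = [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]
    radius = max(abs(c) for c in w)   # dominant eigenvalue estimate
    v = [c / radius for c in w]       # renormalize the iterate
print(radius)   # -> 0.5
```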
22 Trust region: The idea is to limit the scope of the linear subproblem to a closed control region containing the current estimate:
$$\min_h r^Tr;\qquad \|h\|_D \le \gamma,\qquad r = I_n^Lh - b,\qquad \|h\|_D^2 = h^TD^2h,\qquad D > 0\ \text{diagonal}.$$
Necessary conditions give
$$\begin{bmatrix}r^T & 0\end{bmatrix} = \lambda^T\begin{bmatrix}I & I_n^L\end{bmatrix} - \pi\begin{bmatrix}0 & h^TD^2\end{bmatrix},$$
so h satisfies the perturbed scoring equations
$$\left(I_n + \frac{\pi}{n}D^2\right)h = \frac{1}{n}\nabla_xL^T.$$
π can be used to control the trust region by controlling the size of h. Differentiating with respect to π gives
$$\left(I_n + \frac{\pi}{n}D^2\right)\frac{dh}{d\pi} = -\frac{1}{n}D^2h,$$
$$h^TD^2\frac{dh}{d\pi} = -n\,\frac{dh}{d\pi}^T\left(I_n + \frac{\pi}{n}D^2\right)\frac{dh}{d\pi} < 0 \;\Rightarrow\; \frac{d}{d\pi}\|h\|_D^2 < 0.$$
23 The classical form of the algorithm goes back to Levenberg. Here two parameters α, β are kept, and the basic sequence of operations is:

    count = 1
    do while F(x + h(pi)) < F(x)
        count = count + 1
        pi = alpha * pi
    loop
    x <- x + h(pi)
    if count = 1 then pi = beta * pi

Successful steps will be taken eventually as
$$h \to \frac{1}{\pi}D^{-2}\nabla_xL^T,\qquad \pi\to\infty.$$
Experience suggests the choices of α, β are not critical. Typically αβ < 1 to approach Newton like methods. The scoring and sample algorithms are not exact Newton methods. Both can be regarded as regularised methods because of their generic positive definiteness. It is not obvious that setting π = 0 will increase the rate of convergence.
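The α, β loop can be rendered directly in code. This is a hedged toy version of mine: the objective is an invented one-parameter concave function, and a positive curvature surrogate plays the role of $I_n$ in the perturbed equation $(I_n + \pi)h = \nabla L^T$.

```python
# Hedged toy rendering of the classical alpha, beta Levenberg loop.
def F(x):
    return -(x - 2.0) ** 4         # invented objective to maximize

def step(x, pi):
    g = -4.0 * (x - 2.0) ** 3      # gradient of F
    info = 12.0 * (x - 2.0) ** 2   # positive curvature surrogate for I_n
    return g / (info + pi)         # h(pi) from (I_n + pi) h = g

alpha, beta = 2.5, 0.1
x, pi = 0.0, 1.0
for _ in range(100):
    count = 1
    while F(x + step(x, pi)) < F(x):   # failed step: shrink trust region
        count += 1
        pi = alpha * pi
    x = x + step(x, pi)
    if count == 1:
        pi = beta * pi                 # success at first try: expand region
print(x)   # approaches the maximizer x = 2
```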
24 Basic theorems mirror the line search results.
Convergence: Let $\{x_i\}$ produced by the α, β procedure be contained in a compact region R in which $\{\pi_i\}$ is bounded; then $\{L(y; x_i, T_n)\}$ converges, and limit points of $\{x_i\}$ are stationary points of L.
Boundedness: If the sequence $\{\pi_i\}$ determined by the α, β procedure is unbounded in R, then the norm of $\nabla_x^2L$ is also unbounded.
Remember iterations have the potential to become unbounded for smooth sets of nonlinear approximating functions, as closure of these sets of functions cannot be guaranteed without more work.
25 Computation: The trust region problem is:
$$\min_h r^Tr;\qquad r = \begin{bmatrix}I_n^L\\ \sqrt{\pi}I\end{bmatrix}h - \begin{bmatrix}b\\ 0\end{bmatrix}$$
(taking D = I, e.g. after column scaling). Let $I_n^L = Q\begin{bmatrix}U\\ 0\end{bmatrix}$; then $h(\pi)$ can be found by solving the typically much smaller problem
$$\min_h s^Ts;\qquad s = \begin{bmatrix}U\\ \sqrt{\pi}I\end{bmatrix}h - \begin{bmatrix}c_1\\ 0\end{bmatrix},$$
where $c_1 = Q_1^Tb$. Make the further factorization
$$\begin{bmatrix}U\\ \sqrt{\pi}I\end{bmatrix} = Q'\begin{bmatrix}U'\\ 0\end{bmatrix},\qquad Q'^T\begin{bmatrix}c_1\\ 0\end{bmatrix} = \begin{bmatrix}c_1'\\ c_2'\end{bmatrix},$$
then
$$h(\pi) = (U')^{-1}c_1',$$
$$\nabla_xL\,h(\pi) = c_1^TUh(\pi) = \begin{bmatrix}c_1^T & 0\end{bmatrix}\begin{bmatrix}U\\ \sqrt{\pi}I\end{bmatrix}\left(U^TU + \pi I\right)^{-1}\begin{bmatrix}U^T & \sqrt{\pi}I\end{bmatrix}\begin{bmatrix}c_1\\ 0\end{bmatrix} = \|c_1'\|^2.$$
26 The linear subproblem has a useful invariance property with respect to diagonal scaling. Introduce the new variables $u = Tx$ where T is diagonal. Then
$$T^{-1}\left(\mathbb{S}_x^T\mathbb{S}_x + \pi D^2\right)T^{-1}\,Th_x = T^{-1}\nabla_xL^T.$$
This is equivalent to
$$\left(\mathbb{S}_u^T\mathbb{S}_u + \pi T^{-1}D^2T^{-1}\right)h_u = \nabla_uL^T.$$
Thus if $D_i$ transforms with $x_i$ then $T_i^{-1}D_i$ transforms in the same way with respect to $u_i$. This requirement is satisfied by $D_i = \|(\mathbb{S}_n)_i\|$, the norm of the $i$'th column. This transformation effects a rescaling of the least squares problem. We have
$$h = \left(\mathbb{S}_n^T\mathbb{S}_n + \pi D^2\right)^{-1}\mathbb{S}_n^Te,\qquad Dh = \left(D^{-1}\mathbb{S}_n^T\mathbb{S}_nD^{-1} + \pi I\right)^{-1}D^{-1}\mathbb{S}_n^Te.$$
The effect of this choice is to rescale the columns of $\mathbb{S}_n$ to have unit length. It is often sufficient to set π = 1 and $D = \operatorname{diag}\{\|(\mathbb{S}_n)_i\|,\ i = 1, 2, \dots, p\}$ initially.
27 One catch is the initial choice $\pi = \pi_0$. One example is provided by models containing exponential terms such as $e^{-x_kt}$ which should have negative exponents ($x_k > 0$) but which can become positive ($x_k + h_k < 0$) by too large a step. To fix this, introduce a damping factor τ such that the critical components of $x + \tau h(\pi_0)$ remain $\ge 0$. $\|\tau h(\pi_0)\|$ provides a possible choice for a revised trust region bound γ. This involves solving the equation $\|h(\pi)\| = \gamma$ for π by, e.g., Newton's method. $\frac{dh}{d\pi}$ can be found by solving
$$\left(U^TU + \pi I\right)\frac{dh}{d\pi} = -h.$$
This can be written in least squares form:
$$\min s^Ts;\qquad s = \begin{bmatrix}U\\ \pi^{1/2}I\end{bmatrix}\frac{dh}{d\pi} + \begin{bmatrix}U^{-T}h\\ 0\end{bmatrix}.$$
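A scalar sketch of the Newton iteration for $\|h(\pi)\| = \gamma$ (numbers are illustrative; with one parameter, $U$ and $h$ are scalars and the linear solves are divisions):

```python
# Hedged scalar sketch: solve ||h(pi)|| = gamma by Newton's method, with
# dh/dpi obtained from (U^T U + pi I) dh/dpi = -h. Here h(pi) = U c1/(U^2+pi).
u, c, gamma = 2.0, 3.0, 0.25     # U factor, c1 = Q1^T b, trust radius gamma
pi = 0.0
for _ in range(50):
    h = u * c / (u * u + pi)     # h(pi) solves (U^T U + pi) h = U^T c1
    dh = -h / (u * u + pi)       # from (U^T U + pi) dh/dpi = -h
    f = abs(h) - gamma           # want ||h(pi)|| = gamma
    df = dh if h > 0 else -dh    # d||h||/dpi
    pi_new = pi - f / df
    if abs(pi_new - pi) < 1e-12:
        pi = pi_new
        break
    pi = pi_new
print(pi, u * c / (u * u + pi))  # pi such that ||h(pi)|| = gamma
```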
28 Newton's method extrapolates the function as linear. An alternative strategy could be preferable here. This is based on the observation that, if $A = W\Sigma V^T$ is the singular value decomposition of the matrix in the least squares formulation, then
$$h = \sum_{i=1}^p \frac{\sigma_i\left(w_i^Tb\right)}{\sigma_i^2 + \pi}v_i$$
is a rational function of π. To mirror this, set
$$\|h\| \approx \frac{a}{b + (\pi - \pi_0)}.$$
To identify the parameters a and b, use the values $\|h(\pi_0)\|$ and $\frac{d}{d\pi}\|h(\pi_0)\|$ - this corresponds to the same information as is used in Newton's method.
29 To solve for the parameters we have the equations:
$$\|h(\pi_0)\| = \frac{a}{b},\qquad \frac{d}{d\pi}\|h(\pi_0)\| = \frac{h^T\frac{dh}{d\pi}}{\|h(\pi_0)\|} = -\frac{a}{b^2}.$$
The result is
$$b = -\frac{\|h(\pi_0)\|}{\frac{d}{d\pi}\|h(\pi_0)\|},\qquad a = \|h(\pi_0)\|\,b,$$
giving the correction
$$\pi = \pi_0 - \frac{\|h(\pi_0)\| - \gamma}{\frac{d}{d\pi}\|h(\pi_0)\|}\cdot\frac{\|h(\pi_0)\|}{\gamma}.$$
The effect of the rational extrapolation is just the Newton step modulated by the term $\frac{\|h(\pi_0)\|}{\gamma}$. As this term $\to 1$ this is also a second order convergent process.
30 Simple exponential model: Here the model used is
$$\mu(t, x) = x(1) + x(2)\exp(-x(3)t).\qquad(1)$$
The values chosen for the parameters are x(1) = 1, x(2) = 5, and x(3) = 10. Initial values are generated using
$$x(i)_0 = x(i) + (1 + x(i))(.5 - \text{Rnd}),$$
where Rnd indicates a call to a uniform random number generator giving values in [0, 1]. This model is not difficult in the sense that the graph has features that depend on each of the parameters. Thus the interest is in the effects of simulated data errors.
31 Two types of random numbers are used to simulate the experimental data.
Normal data: The data is generated by evaluating μ(t, x) on a uniform grid with spacing 1/(n+1) and then perturbing these values using normally distributed random numbers to give values
$$z_i = \mu(i, x) + \varepsilon_i,\qquad \varepsilon_i \sim N(0, \sigma^2),\qquad i = 1, 2, \dots, n.$$
The choice of standard deviation was made to make small sample problems (n = 32) relatively difficult. The log likelihood is taken as
$$L(x) = -\frac{1}{2}\sum_{i=1}^n\left(z_i - \mu(i, x)\right)^2.$$
While the scale is not evident here, it resurfaces in its effects on the generated data.
32 Poisson data: A Poisson random number generator is used to generate random counts $z_i$ corresponding to μ(i, x) as the mean model. The log likelihood used is
$$L(x) = \sum_{i=1}^n z_i\log\left(\frac{\mu(i, x)}{z_i}\right) + \left(z_i - \mu(i, x)\right).$$
Note that if $z_i = 0$ then the contribution from the logarithm term to the log likelihood is zero. The rows of the least squares problem design matrix are given by
$$e_i^TA = \frac{1}{s_i}\begin{bmatrix}1, & \exp(-x(3)t_i), & -x(2)t_i\exp(-x(3)t_i)\end{bmatrix},\qquad i = 1, 2, \dots, n,$$
where $s_i = \sqrt{\mu(i, x)}$. The corresponding components of the right hand side are
$$b_i = \frac{z_i - \mu(i, x)}{s_i}.$$
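As a check on the Poisson rows (my own sketch, with invented counts and grid; taking $s_i = \sqrt{\mu(i, x)}$, which makes $A^TA$ the information and $A^Tb$ the score), the rows and right hand side should reproduce the gradient of the Poisson log likelihood:

```python
import math

# Hedged check that the Poisson least squares rows reproduce the score:
# row_i = grad mu / s_i and b_i = (z_i - mu) / s_i with s_i = sqrt(mu)
# give A^T b = grad_x L for L = sum z log(mu/z) + (z - mu).
x = [1.0, 5.0, 10.0]
ts = [0.1, 0.2, 0.3, 0.4]
zs = [5, 4, 3, 2]               # illustrative Poisson counts

def mu(t):
    return x[0] + x[1] * math.exp(-x[2] * t)

grad = [0.0, 0.0, 0.0]          # score dL/dx_j, computed directly
Atb = [0.0, 0.0, 0.0]           # A^T b assembled from the rows
for t, z in zip(ts, zs):
    m = mu(t)
    dmu = [1.0, math.exp(-x[2] * t), -x[1] * t * math.exp(-x[2] * t)]
    s = math.sqrt(m)
    for j in range(3):
        grad[j] += (z / m - 1.0) * dmu[j]       # (z/mu - 1) dmu/dx_j
        Atb[j] += (dmu[j] / s) * ((z - m) / s)  # row_j * b_i
print(grad, Atb)   # the two gradients agree
```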
33 Numerical experiments comparing the performance of the line search (LS) and trust region (TR) methods are summarised below. For each n the computations were initiated with 10 different seeds for the basic random number generator, and the average number of iterations is reported as a guide to algorithm performance. The parameter settings used are α = 2.5, β = .1 for the trust region method and ρ = .25 for the Armijo parameter used in the line search. Experimenting with these values (for example, the choice α = 1.5, β = .5) made very little difference in the trust region results. Convergence is assumed if $\nabla_xL\,h < 1.0\mathrm{e}{-8}$. This corresponds to final values of ‖h‖ in the range 1.e−4 to 1.e−6.
[Table: average iteration counts, with columns n | LS | TR under Normal data and LS | TR under Poisson data; the numerical entries were lost in transcription, starred entries marking nonconvergence.]
34 The starred entries in the table correspond to two cases of nonconvergence. The figure shows (in red) the current estimate after 50 iterations together with the data and the starting estimate. This gives an illustration that the set of approximations is not closed.
[Figure caption: Result shows a straight line fit]
35 Second example - Gaussians with an exponential background.
$$\mu = x(1)\exp(-x(2)t) + x(3)\exp\left(-(t - x(4))^2/x(5)\right) + x(6)\exp\left(-(t - x(7))^2/x(8)\right)$$
Line search case:
$$x(1) = 5,\ x(2) = 10,\ x(3) = 18,\ x(4) = .3333,\ x(5) = .05,\ x(6) = 15,\ x(7) = .6667,\ x(8) =$$
36 The results here are much more of a mixed bag. The problems are closely related to the choice of starting values. The figure is produced using peak widths of .01, which makes it easier to see what is going on. In this case the starting values do not see the second peak. Both the background and the first peak are picked up well.
[Figure caption: Initial conditions miss the second peak]
37 Third example - trinomial example. Data for a trinomial example (m = 3) came from a consulting exercise with CSIRO. It is derived from a study of the effects of a cattle virus on chicken embryos.
[Table: cattle virus data - counts of dead, deformed, and normal embryos against log10(titre); the numerical entries were lost in transcription.]
The model suggested fits the frequencies explicitly:
$$\omega_1 = \frac{1}{1 + \exp(-\beta_1 - \beta_3\log(t))},\qquad 1 - \omega_2 = \frac{1}{1 + \exp(-\beta_2 - \beta_3\log(t))},\qquad \omega_3 = 1 - \omega_1 - \omega_2.$$
38 It makes sense to develop the algorithm in terms of the frequencies. Numerical results show an impressive rate of convergence for a relatively small data set. This suggests the model analysis is good.
[Table: results of computations for the trinomial data - columns its (iterations) | L | ∇Lh | β₁ | β₂ | β₃; the numerical entries were lost in transcription.]
39 In conclusion: Work on the Levenberg algorithm in the 1970s was responsible for at least some of the encouragement for the shift from line search to trust region methods in optimization problems. It can now be argued that this component of the move had somewhat dubious validity.
1. Use of the expected Hessian is already a regularising step. Improved conditioning derived from the trust region parameter could be illusory if the aim is small values for rapid convergence. If significant values of π are required then in the data analytic context the modelling could well be suspect.
2. The old papers relied on a small residual argument to explain good convergence rates. Again, in our context this is not satisfactory. It is not completely obvious what the effect of nonzero trust region parameters is in any particular case.
3. The trust region algorithm does not scale as well as the line search scoring algorithm.
4. Global convergence results of similar power appear available for both approaches.
More informationClosest Moment Estimation under General Conditions
Closest Moment Estimation under General Conditions Chirok Han and Robert de Jong January 28, 2002 Abstract This paper considers Closest Moment (CM) estimation with a general distance function, and avoids
More information30.3. LU Decomposition. Introduction. Prerequisites. Learning Outcomes
LU Decomposition 30.3 Introduction In this Section we consider another direct method for obtaining the solution of systems of equations in the form AX B. Prerequisites Before starting this Section you
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More informationVector Derivatives and the Gradient
ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 1/1 Lecture 10 ECE 275A Vector Derivatives and the Gradient ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego
More informationThe Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation:
Oct. 1 The Dirichlet s P rinciple In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: 1. Dirichlet s Principle. u = in, u = g on. ( 1 ) If we multiply
More informationLecture 2: Computing functions of dense matrices
Lecture 2: Computing functions of dense matrices Paola Boito and Federico Poloni Università di Pisa Pisa - Hokkaido - Roma2 Summer School Pisa, August 27 - September 8, 2018 Introduction In this lecture
More informationPhysics 202 Laboratory 5. Linear Algebra 1. Laboratory 5. Physics 202 Laboratory
Physics 202 Laboratory 5 Linear Algebra Laboratory 5 Physics 202 Laboratory We close our whirlwind tour of numerical methods by advertising some elements of (numerical) linear algebra. There are three
More informationOptimal Newton-type methods for nonconvex smooth optimization problems
Optimal Newton-type methods for nonconvex smooth optimization problems Coralia Cartis, Nicholas I. M. Gould and Philippe L. Toint June 9, 20 Abstract We consider a general class of second-order iterations
More informationLecture 8: Differential Equations. Philip Moriarty,
Lecture 8: Differential Equations Philip Moriarty, philip.moriarty@nottingham.ac.uk NB Notes based heavily on lecture slides prepared by DE Rourke for the F32SMS module, 2006 8.1 Overview In this final
More informationAN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING
AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING XIAO WANG AND HONGCHAO ZHANG Abstract. In this paper, we propose an Augmented Lagrangian Affine Scaling (ALAS) algorithm for general
More informationNotes on Markov Networks
Notes on Markov Networks Lili Mou moull12@sei.pku.edu.cn December, 2014 This note covers basic topics in Markov networks. We mainly talk about the formal definition, Gibbs sampling for inference, and maximum
More informationLecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming
Lecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lecture 11 and
More informationA globally convergent Levenberg Marquardt method for equality-constrained optimization
Computational Optimization and Applications manuscript No. (will be inserted by the editor) A globally convergent Levenberg Marquardt method for equality-constrained optimization A. F. Izmailov M. V. Solodov
More informationStability theory is a fundamental topic in mathematics and engineering, that include every
Stability Theory Stability theory is a fundamental topic in mathematics and engineering, that include every branches of control theory. For a control system, the least requirement is that the system is
More information7 Asymptotics for Meromorphic Functions
Lecture G jacques@ucsd.edu 7 Asymptotics for Meromorphic Functions Hadamard s Theorem gives a broad description of the exponential growth of coefficients in power series, but the notion of exponential
More informationChapter 2. Solving Systems of Equations. 2.1 Gaussian elimination
Chapter 2 Solving Systems of Equations A large number of real life applications which are resolved through mathematical modeling will end up taking the form of the following very simple looking matrix
More informationGeneralized Linear Models. Last time: Background & motivation for moving beyond linear
Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered
More informationClassical Estimation Topics
Classical Estimation Topics Namrata Vaswani, Iowa State University February 25, 2014 This note fills in the gaps in the notes already provided (l0.pdf, l1.pdf, l2.pdf, l3.pdf, LeastSquares.pdf). 1 Min
More informationInexact Newton Methods and Nonlinear Constrained Optimization
Inexact Newton Methods and Nonlinear Constrained Optimization Frank E. Curtis EPSRC Symposium Capstone Conference Warwick Mathematics Institute July 2, 2009 Outline PDE-Constrained Optimization Newton
More informationMath 471 (Numerical methods) Chapter 3 (second half). System of equations
Math 47 (Numerical methods) Chapter 3 (second half). System of equations Overlap 3.5 3.8 of Bradie 3.5 LU factorization w/o pivoting. Motivation: ( ) A I Gaussian Elimination (U L ) where U is upper triangular
More information1. Introduction Boundary estimates for the second derivatives of the solution to the Dirichlet problem for the Monge-Ampere equation
POINTWISE C 2,α ESTIMATES AT THE BOUNDARY FOR THE MONGE-AMPERE EQUATION O. SAVIN Abstract. We prove a localization property of boundary sections for solutions to the Monge-Ampere equation. As a consequence
More informationHigher-Order Methods
Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth
More informationChapter 4: Asymptotic Properties of the MLE
Chapter 4: Asymptotic Properties of the MLE Daniel O. Scharfstein 09/19/13 1 / 1 Maximum Likelihood Maximum likelihood is the most powerful tool for estimation. In this part of the course, we will consider
More informationFree energy estimates for the two-dimensional Keller-Segel model
Free energy estimates for the two-dimensional Keller-Segel model dolbeaul@ceremade.dauphine.fr CEREMADE CNRS & Université Paris-Dauphine in collaboration with A. Blanchet (CERMICS, ENPC & Ceremade) & B.
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationConstrained Nonlinear Optimization Algorithms
Department of Industrial Engineering and Management Sciences Northwestern University waechter@iems.northwestern.edu Institute for Mathematics and its Applications University of Minnesota August 4, 2016
More informationSwitching Regime Estimation
Switching Regime Estimation Series de Tiempo BIrkbeck March 2013 Martin Sola (FE) Markov Switching models 01/13 1 / 52 The economy (the time series) often behaves very different in periods such as booms
More informationEstimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators
Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let
More informationTHE RESIDUE THEOREM. f(z) dz = 2πi res z=z0 f(z). C
THE RESIDUE THEOREM ontents 1. The Residue Formula 1 2. Applications and corollaries of the residue formula 2 3. ontour integration over more general curves 5 4. Defining the logarithm 7 Now that we have
More information6.3 Fundamental solutions in d
6.3. Fundamental solutions in d 5 6.3 Fundamental solutions in d Since Lapalce equation is invariant under translations, and rotations (see Exercise 6.4), we look for solutions to Laplace equation having
More informationDiscreteness of Transmission Eigenvalues via Upper Triangular Compact Operators
Discreteness of Transmission Eigenvalues via Upper Triangular Compact Operators John Sylvester Department of Mathematics University of Washington Seattle, Washington 98195 U.S.A. June 3, 2011 This research
More informationEconometrics I, Estimation
Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the
More informationCOMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017
COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY
More information3. THE SIMPLEX ALGORITHM
Optimization. THE SIMPLEX ALGORITHM DPK Easter Term. Introduction We know that, if a linear programming problem has a finite optimal solution, it has an optimal solution at a basic feasible solution (b.f.s.).
More informationCHAPTER 11. A Revision. 1. The Computers and Numbers therein
CHAPTER A Revision. The Computers and Numbers therein Traditional computer science begins with a finite alphabet. By stringing elements of the alphabet one after another, one obtains strings. A set of
More informationLECTURE-15 : LOGARITHMS AND COMPLEX POWERS
LECTURE-5 : LOGARITHMS AND COMPLEX POWERS VED V. DATAR The purpose of this lecture is twofold - first, to characterize domains on which a holomorphic logarithm can be defined, and second, to show that
More informationBindel, Fall 2016 Matrix Computations (CS 6210) Notes for
1 Iteration basics Notes for 2016-11-07 An iterative solver for Ax = b is produces a sequence of approximations x (k) x. We always stop after finitely many steps, based on some convergence criterion, e.g.
More informationCOMPOSITIONS WITH A FIXED NUMBER OF INVERSIONS
COMPOSITIONS WITH A FIXED NUMBER OF INVERSIONS A. KNOPFMACHER, M. E. MAYS, AND S. WAGNER Abstract. A composition of the positive integer n is a representation of n as an ordered sum of positive integers
More informationLecture 4 Eigenvalue problems
Lecture 4 Eigenvalue problems Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University, tieli@pku.edu.cn
More informationLecture 17: Numerical Optimization October 2014
Lecture 17: Numerical Optimization 36-350 22 October 2014 Agenda Basics of optimization Gradient descent Newton s method Curve-fitting R: optim, nls Reading: Recipes 13.1 and 13.2 in The R Cookbook Optional
More informationNonlinear Programming Models
Nonlinear Programming Models Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Nonlinear Programming Models p. Introduction Nonlinear Programming Models p. NLP problems minf(x) x S R n Standard form:
More informationECE 275B Homework # 1 Solutions Version Winter 2015
ECE 275B Homework # 1 Solutions Version Winter 2015 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2
More information