Convergence of the Majorization Method for Multidimensional Scaling


Journal of Classification 5 (1988)

Convergence of the Majorization Method for Multidimensional Scaling

Jan de Leeuw
University of California Los Angeles

Abstract: In this paper we study the convergence properties of an important class of multidimensional scaling algorithms. We unify and extend earlier qualitative results on convergence, which tell us when the algorithms are convergent. In order to prove global convergence results we use the majorization method. We also derive, for the first time, some quantitative convergence theorems, which give information about the speed of convergence. It turns out that in almost all cases convergence is linear, with a convergence rate close to unity. This has the practical consequence that convergence will usually be very slow, and this makes techniques to speed up convergence very important. It is pointed out that step-size techniques will generally not succeed in producing marked improvements in this respect.

Keywords: Multidimensional scaling; Convergence; Step size; Local minima.

1. Introduction

Recent research in multidimensional scaling has moved in the direction of proposing more and more complicated models, often with a very large number of parameters, and sometimes even with severe discontinuities in the model. The emphasis has been on producing computer programs that work, and comparatively little attention has been paid to theoretical problems associated with the loss functions and the algorithms used to minimize them.

Author's Address: Jan de Leeuw, Departments of Psychology and Mathematics, University of California Los Angeles, 405 Hilgard Avenue, Los Angeles, CA 90024, USA.

We think that such a more theoretical study is long overdue. In fact we think that at the moment an in-depth study of some of the simpler models and techniques is more urgent than the development of even more complicated ones. This paper is a contribution to the study of a simple algorithm for fitting the simplest of multidimensional scaling models. Thus, our paper does not present a new algorithm or a new model. It also makes no claim about superiority of the algorithm studied here over other possible or actual algorithms. The model is chosen from a large spectrum of possible models, and the loss function is one among many. The only reason for studying this particular model, loss function, and algorithm is that they are simple and direct. This choice makes the mathematical study of their properties relatively easy. It is also the reason why they have been around for quite some time now, and why they have been used in a majority of the applications of multidimensional scaling. The paper is mainly about metric multidimensional scaling; towards the end we shall briefly discuss the nonmetric case. Results for nonmetric scaling are often simple extensions of the metric results, and in this sense metric scaling is more basic.

2. Notation and Terminology

We introduce the notation and terminology that is more or less standard for metric multidimensional scaling (Kruskal and Wish 1978). The data in a classical multidimensional scaling problem are collected in a symmetric non-negative matrix Δ = {δ_ij}. The elements of Δ are called dissimilarities; δ_ij is the dissimilarity between objects i and j. There are n objects, and thus Δ is of order n. We suppose that self-dissimilarities are zero; as a consequence Δ has a zero diagonal. A common purpose of multidimensional scaling is to represent the objects as points in a low-dimensional Euclidean space, in such a way that the distance between points i and j is approximately equal to the given dissimilarity of objects i and j. The x_i denote n points in p-space, with coordinates in the n × p matrix X, called the configuration. The matrix D(X), with elements d_ij(X), contains the Euclidean distances between the points x_i. It follows that

d_ij²(X) = (x_i − x_j)'(x_i − x_j)   (1a)
         = (e_i − e_j)'XX'(e_i − e_j)   (1b)
         = tr X'A_ij X.   (1c)

In (1b) the e_i are unit vectors (columns of the identity matrix of order n), and in (1c) we have A_ij = (e_i − e_j)(e_i − e_j)'. In order to find out how successful a representation is, we compute the value of a loss function.
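
As a quick check of identity (1c) — this sketch is mine, not part of the original paper — the following few lines of numpy compare a squared Euclidean distance with tr X'A_ij X; all function and variable names are illustrative.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix D(X) for an n x p configuration X."""
    diff = X[:, None, :] - X[None, :, :]          # (n, n, p) array of x_i - x_j
    return np.sqrt((diff ** 2).sum(axis=2))

rng = np.random.default_rng(0)
n, p = 4, 2
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                               # column-center the configuration

# Identity (1c): d_ij^2(X) = tr X' A_ij X with A_ij = (e_i - e_j)(e_i - e_j)'.
D = pairwise_distances(X)
i, j = 0, 2
e = np.eye(n)
A_ij = np.outer(e[i] - e[j], e[i] - e[j])
print(np.isclose(D[i, j] ** 2, np.trace(X.T @ A_ij @ X)))    # True
```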

This loss function is defined to be

σ(X) = (1/2) Σ_i Σ_j w_ij (δ_ij − d_ij(X))²,   (2)

where W = {w_ij} is a symmetric, non-negative matrix of weights, with a zero diagonal. Matrix W is known and fixed throughout the computations. Its nondiagonal elements can be used to code various forms of supplementary information. If there are missing data, for instance, we can set the w_ij corresponding to missing dissimilarities equal to zero. If there are replications of the dissimilarities in a cell we can estimate their variability, and use this information to choose weights. It sometimes also makes sense to set w_ij equal to a fixed function of the dissimilarity values, such as the square or the inverse square, in order to differentially weight errors. This strategy can be used to simulate the behavior of other multidimensional scaling loss functions. The purpose of the particular form of multidimensional scaling we have presented can now be stated more precisely: given weights, dissimilarities, and a dimensionality p, we want to find X that minimizes σ(X).

3. Basic Algorithm

The algorithm we discuss in this paper was first given by Guttman (1968). He derived it by setting the gradient of σ(X) equal to zero, and he observed that the algorithm could be interpreted as a gradient algorithm with constant step-size. Compare also Lingoes and Roskam (1973, p. 8-10), Hartmann (1979), and Borg (1981). In de Leeuw (1977) the very same algorithm was derived in a somewhat more general context from convex analysis, as a subgradient method. It was observed that in the simple Euclidean case, which is the one we are interested in here, the algorithm can be derived from the Cauchy-Schwarz inequality, without using either differentiation or subdifferentiation. This is the derivation we present here. It is a much simplified version of the one given in de Leeuw and Heiser (1980). In order to describe the algorithm efficiently we need some additional notation. First let

η²(X) = (1/2) Σ_i Σ_j w_ij d_ij²(X),   (3)

and

ρ(X) = (1/2) Σ_i Σ_j w_ij δ_ij d_ij(X).   (4)
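
As an aside, the weighted loss (2) and the missing-data convention of zero weights are easy to state in code. The sketch below is my own minimal illustration, assuming numpy; the names are not from the paper.

```python
import numpy as np

def stress(X, Delta, W):
    """Raw stress (2): 0.5 * sum_i sum_j w_ij * (delta_ij - d_ij(X))^2."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    return 0.5 * np.sum(W * (Delta - D) ** 2)

rng = np.random.default_rng(1)
n, p = 4, 2
X = rng.standard_normal((n, p))
Delta = np.full((n, n), 1.0)
np.fill_diagonal(Delta, 0.0)
W = np.ones((n, n))
np.fill_diagonal(W, 0.0)
W[0, 1] = W[1, 0] = 0.0        # a missing dissimilarity gets weight zero
print(stress(X, Delta, W))
```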

If we assume, without loss of generality, that

(1/2) Σ_i Σ_j w_ij δ_ij² = 1,   (5)

then

σ(X) = 1 − 2ρ(X) + η²(X).   (6)

It is clear that η²(X) is a convex quadratic function of X. From (1c) and (3) we have

η²(X) = tr X'VX,   (7)

with

V = (1/2) Σ_i Σ_j w_ij A_ij.   (8)

Throughout the paper we assume that V has rank n − 1, an assumption which can be made without any loss of generality (de Leeuw 1977). It also follows from (8) that V is positive semidefinite, and that its (one-dimensional) null space consists of all vectors with constant elements. The function ρ(X) is somewhat more complicated than η²(X). We know that d_ij(X) is a convex and positively homogeneous function of X. Thus, by (4), the same is true of ρ(X). It is convenient to write ρ(X) as

ρ(X) = tr X'B(X)X,   (9)

with

B(X) = (1/2) Σ_i Σ_j w_ij δ_ij s_ij(X) A_ij,   (10)

where

s_ij(X) = 1/d_ij(X)   if d_ij(X) ≠ 0,   (11a)
s_ij(X) = 0   otherwise.   (11b)

It is obvious that the difference between (7) and (9) is that in (7) V is a constant matrix, while in (9) B(X) varies with X. Thus η²(X) is quadratic in X, while ρ(X) is not. With the notation developed so far it is easy to explain the algorithm. Clearly the partial derivatives of η²(X) are given by ∇η²(X) = 2VX. Using the definitions it is also not difficult to see that ∇ρ(X) = B(X)X, provided that ρ(X) is differentiable at X. Thus ∇σ(X) = 2(VX − B(X)X).
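
The matrices V of (8) and B(X) of (10) are simple to construct. The following numpy sketch — my own illustration, not code from the paper — builds them and verifies η²(X) = tr X'VX, ρ(X) = tr X'B(X)X, and the decomposition (6) on a random problem that satisfies normalization (5).

```python
import numpy as np

def V_matrix(W):
    """V of (8): V = 0.5 * sum_i sum_j w_ij A_ij = diag(W 1) - W."""
    V = -np.asarray(W, dtype=float)
    np.fill_diagonal(V, W.sum(axis=1))
    return V

def B_matrix(X, Delta, W):
    """B(X) of (10), using s_ij(X) = 1/d_ij(X) if d_ij(X) > 0 and 0 otherwise (11)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    S = np.zeros_like(D)
    np.divide(1.0, D, out=S, where=D > 0)
    B = -W * Delta * S
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))
    return B

# Check eta^2(X) = tr X'VX, rho(X) = tr X'B(X)X, and (6) under normalization (5).
rng = np.random.default_rng(2)
n, p = 5, 2
W = np.ones((n, n)); np.fill_diagonal(W, 0.0)
Delta = np.abs(rng.standard_normal((n, n))); Delta = (Delta + Delta.T) / 2
np.fill_diagonal(Delta, 0.0)
Delta /= np.sqrt(0.5 * np.sum(W * Delta ** 2))     # enforce normalization (5)
X = rng.standard_normal((n, p)); X -= X.mean(axis=0)

diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))
eta2 = np.trace(X.T @ V_matrix(W) @ X)
rho = np.trace(X.T @ B_matrix(X, Delta, W) @ X)
sigma = 0.5 * np.sum(W * (Delta - D) ** 2)
print(np.isclose(eta2, 0.5 * np.sum(W * D ** 2)),
      np.isclose(sigma, 1.0 - 2.0 * rho + eta2))
```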

In de Leeuw and Heiser (1980), the Guttman transform Γ(Y) of a configuration Y is defined as

Γ(Y) = V⁺B(Y)Y,   (12)

with V⁺ the Moore-Penrose inverse of V. Observe that the Guttman transform depends on weights and dissimilarities, and is consequently defined relative to a particular metric multidimensional scaling problem. Using the Guttman transform makes it possible to rewrite the gradient as ∇σ(X) = 2V(X − Γ(X)), and thus ∇σ(X) = 0 if and only if X = Γ(X). As a consequence, a configuration X is called stationary for a particular metric multidimensional scaling problem if it is equal to its Guttman transform. The result also immediately suggests the algorithm X_{k+1} = Γ(X_k), or more explicitly,

X_{k+1} = V⁺B(X_k)X_k.   (13)

Equivalently we can also write

X_{k+1} = X_k − (1/2) V⁺ ∇σ(X_k),   (14)

which shows the gradient interpretation of the algorithm. In this derivation of the algorithm we have made the provision that σ had to be differentiable at X. This is true if and only if d_ij(X) > 0 for all i, j for which w_ij δ_ij > 0. Let us agree to call a configuration usable if this condition holds. Thus if X is not usable, then σ is not differentiable at X, and interpretation (14) cannot be used. Using definition (10), however, the iteration (13) can still be carried out. In de Leeuw (1977) and de Leeuw and Heiser (1980) it is shown that in this case the algorithm still converges to a stationary point. There are three reasons why we simply assume differentiability in the sequel. In the first place, it was proved by de Leeuw (1984) that if σ has a local minimum at X, then X is usable. Since we are interested in the behavior of our algorithm in the neighborhood of a local minimum in most of this paper, we might as well assume differentiability. In the second place, although we do not need differentiability for the qualitative study of convergence (i.e., for proving that the algorithm converges), we do need it for the quantitative study (i.e., for establishing the rate of convergence). And, finally, we shall base our convergence proof below directly on the Cauchy-Schwarz inequality, for which differentiability is not needed in the first place. The convergence proof starts with a lemma, which is the foundation of our approach to the minimization of this particular multidimensional scaling loss function.
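
A minimal numpy sketch of the iteration (13), i.e., repeated application of the Guttman transform (this is essentially what is now known as the SMACOF update), is given below; the code and names are mine, not the paper's.

```python
import numpy as np

def guttman_transform(X, Delta, W):
    """One application of (12)-(13): Gamma(X) = V^+ B(X) X."""
    V = -np.asarray(W, dtype=float)
    np.fill_diagonal(V, W.sum(axis=1))
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    S = np.zeros_like(D)
    np.divide(1.0, D, out=S, where=D > 0)
    B = -W * Delta * S
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))
    return np.linalg.pinv(V) @ B @ X

def stress(X, Delta, W):
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    return 0.5 * np.sum(W * (Delta - D) ** 2)

# The loss never increases along X_{k+1} = Gamma(X_k), cf. (22) below.
rng = np.random.default_rng(3)
n, p = 6, 2
W = np.ones((n, n)); np.fill_diagonal(W, 0.0)
Delta = np.abs(rng.standard_normal((n, n))); Delta = (Delta + Delta.T) / 2
np.fill_diagonal(Delta, 0.0)
Delta /= np.sqrt(0.5 * np.sum(W * Delta ** 2))
X = rng.standard_normal((n, p)); X -= X.mean(axis=0)
for k in range(51):
    if k % 10 == 0:
        print(k, stress(X, Delta, W))
    X = guttman_transform(X, Delta, W)
```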

Lemma 1: For all X and Y

σ(X) ≤ 1 − η²(Γ(Y)) + η²(X − Γ(Y)).   (15a)

Moreover, for all X,

σ(X) = 1 − η²(Γ(X)) + η²(X − Γ(X)).   (15b)

Proof: By Cauchy-Schwarz,

tr X'A_ij Y ≤ {tr X'A_ij X}^{1/2} {tr Y'A_ij Y}^{1/2} = d_ij(X) d_ij(Y).   (16)

Multiply both sides by w_ij δ_ij s_ij(Y), and add over all i, j. This gives, using (4) and (10),

tr X'B(Y)Y ≤ ρ(X),   (17)

with equality if Y = X. We can also write (17) as

ρ(X) ≥ tr X'VΓ(Y),   (18)

and substitution in (6) gives

σ(X) ≤ 1 − 2 tr X'VΓ(Y) + tr X'VX.   (19)

We can write (19) as

σ(X) ≤ 1 − tr Γ(Y)'VΓ(Y) + tr (X − Γ(Y))'V(X − Γ(Y)),   (20)

which is (15a). We have equality if X = Y, which gives (15b). Q.E.D.

Figure 1 provides a useful interpretation of Lemma 1, and also shows the algorithmic implications. If we use the abbreviation

ω_Y(X) = 1 − η²(Γ(Y)) + η²(X − Γ(Y)),   (21)

then for each Y the function ω_Y is quadratic in X. Moreover, by Lemma 1, σ(X) ≤ ω_Y(X) and σ(Y) = ω_Y(Y). Thus, for each Y, the function ω_Y majorizes the function σ, and the two functions touch only for X = Y. Moreover, ω_Y is minimized over X by setting X = Γ(Y), and thus we see that the Guttman transform decreases the loss function. We write this result as

σ(Γ(Y)) ≤ ω_Y(Γ(Y)) < ω_Y(Y) = σ(Y),   (22)

provided that Y ≠ Γ(Y).
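
Lemma 1 is easy to check numerically. The sketch below (my own illustration, with assumed helper names) evaluates σ, Γ, and the majorizing function ω_Y of (21) on random configurations and confirms (15a), (15b), and the sandwich (22).

```python
import numpy as np

def dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def gamma(Y, Delta, W):                       # Guttman transform (12)
    V = -np.asarray(W, dtype=float); np.fill_diagonal(V, W.sum(axis=1))
    D = dists(Y); S = np.zeros_like(D); np.divide(1.0, D, out=S, where=D > 0)
    B = -W * Delta * S; np.fill_diagonal(B, 0.0); np.fill_diagonal(B, -B.sum(axis=1))
    return np.linalg.pinv(V) @ B @ Y

def sigma(X, Delta, W):                       # loss (2)
    return 0.5 * np.sum(W * (Delta - dists(X)) ** 2)

def eta2(X, W):                               # eta^2 of (3), also used as a squared norm
    return 0.5 * np.sum(W * dists(X) ** 2)

def omega(X, Y, Delta, W):                    # majorizing function (21)
    G = gamma(Y, Delta, W)
    return 1.0 - eta2(G, W) + eta2(X - G, W)

rng = np.random.default_rng(4)
n, p = 5, 2
W = np.ones((n, n)); np.fill_diagonal(W, 0.0)
Delta = np.abs(rng.standard_normal((n, n))); Delta = (Delta + Delta.T) / 2
np.fill_diagonal(Delta, 0.0)
Delta /= np.sqrt(0.5 * np.sum(W * Delta ** 2))      # normalization (5)
X = rng.standard_normal((n, p)); X -= X.mean(axis=0)
Y = rng.standard_normal((n, p)); Y -= Y.mean(axis=0)

print(sigma(X, Delta, W) <= omega(X, Y, Delta, W) + 1e-12)        # (15a)
print(np.isclose(sigma(Y, Delta, W), omega(Y, Y, Delta, W)))      # (15b)
G = gamma(Y, Delta, W)
print(sigma(G, Delta, W) <= omega(G, Y, Delta, W) <= omega(Y, Y, Delta, W))   # (22)
```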

Figure 1. Three iterations of the majorization algorithm. We have drawn a section of the stress loss function, and two quadratic majorization functions. These touch the function at the current configuration, they are always above it, and their minimum provides the next configuration.

These results are illustrated in Figure 1. Thus either Y = Γ(Y), in which case ∇σ(Y) = 0 and we have found a solution of the stationary equations, or Y ≠ Γ(Y) and we can decrease the loss by replacing Y by its Guttman transform. This procedure constitutes the basic algorithm. It also explains why we call it a majorization method: instead of local linear approximation we use global quadratic majorization in each iteration step.

4. Sequences Generated by the Algorithm

The algorithm (13) generates a sequence X_k, and also sequences of real numbers σ_k = σ(X_k), ρ_k = ρ(X_k), η_k² = η²(X_k). Also define the sequences λ_k = ρ(X_k)/η(X_k) and ε_k² = η²(X_k − Γ(X_k)). The symbols ↑ and ↓ are used for convergence of monotone sequences which are, respectively, increasing and decreasing. The theorems in this section unify earlier results in de Leeuw (1977) and de Leeuw and Heiser (1980).

Theorem 1:
(a) ρ_k ↑ ρ_∞;
(b) η_k ↑ η_∞;
(c) λ_k ↑ λ_∞ = η_∞, and ρ_∞ = η_∞²;
(d) σ_k ↓ σ_∞ = 1 − ρ_∞;
(e) ε_k → 0.

Proof: From (3) and (4) we find, by using Cauchy-Schwarz, that ρ(X) ≤ η(X) for all X. Thus λ_k ≤ 1 for all k. We can write (9) as ρ(X) = tr X'VΓ(X), and again by Cauchy-Schwarz, ρ(X) ≤ η(X)η(Γ(X)) for all X. Thus ρ_k ≤ η_k η_{k+1}, and also λ_k ≤ η_{k+1}. Now write (17) as ρ(Γ(X)) ≥ tr Γ(X)'B(X)X = η²(Γ(X)), which implies that ρ_k ≥ η_k² and λ_k ≥ η_k. Summarizing the results so far gives

η_k ≤ λ_k ≤ η_{k+1} ≤ λ_{k+1} ≤ 1,   (23a)
η_k² ≤ ρ_k ≤ η_k η_{k+1} ≤ η_{k+1}² ≤ ρ_{k+1} ≤ 1.   (23b)

These chains are sufficient to prove parts (a), (b), and (c). We have already proved the decrease of σ_k in (22); the limit value follows trivially from σ_k = 1 − 2ρ_k + η_k², together with ρ_∞ = η_∞². This proves (d). For (e) we write ε_k² = η_{k+1}² + η_k² − 2ρ_k, and again use ρ_∞ = η_∞². Q.E.D.

Observe that the theorem does not state that ε_k decreases monotonically to zero. In fact convergence of ε_k to zero is usually nonmonotonic. The theorem also does not say anything about the convergence of X_k.
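
The monotone behavior of Theorem 1, and the typically nonmonotonic behavior of ε_k, can be observed by printing the sequences along the iteration. The numpy sketch below is mine; names and the random test problem are assumptions.

```python
import numpy as np

def dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def gamma(X, Delta, W):                       # Guttman transform (12)
    V = -np.asarray(W, dtype=float); np.fill_diagonal(V, W.sum(axis=1))
    D = dists(X); S = np.zeros_like(D); np.divide(1.0, D, out=S, where=D > 0)
    B = -W * Delta * S; np.fill_diagonal(B, 0.0); np.fill_diagonal(B, -B.sum(axis=1))
    return np.linalg.pinv(V) @ B @ X

rng = np.random.default_rng(5)
n, p = 6, 2
W = np.ones((n, n)); np.fill_diagonal(W, 0.0)
Delta = np.abs(rng.standard_normal((n, n))); Delta = (Delta + Delta.T) / 2
np.fill_diagonal(Delta, 0.0)
Delta /= np.sqrt(0.5 * np.sum(W * Delta ** 2))
X = rng.standard_normal((n, p)); X -= X.mean(axis=0)

for k in range(15):
    X_next = gamma(X, Delta, W)
    D = dists(X)
    eta = np.sqrt(0.5 * np.sum(W * D ** 2))                    # eta_k
    rho = 0.5 * np.sum(W * Delta * D)                          # rho_k
    sig = 0.5 * np.sum(W * (Delta - D) ** 2)                   # sigma_k
    eps = np.sqrt(0.5 * np.sum(W * dists(X - X_next) ** 2))    # eps_k = eta(X_k - Gamma(X_k))
    print(f"k={k:2d} rho={rho:.6f} eta={eta:.6f} lambda={rho / eta:.6f} "
          f"sigma={sig:.6f} eps={eps:.3e}")
    X = X_next
```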

The sequence X_k is studied in a separate theorem. Remember that X_∞ is an accumulation point of a sequence X_k if each neighborhood of X_∞ contains infinitely many points of the sequence.

Theorem 2: Suppose S_∞ is the set of all accumulation points of the sequence X_k. Then:
(a) S_∞ is nonempty;
(b) if X_∞ ∈ S_∞ then σ(X_∞) = σ_∞;
(c) if X_∞ ∈ S_∞ and X_∞ is usable, then X_∞ is stationary (i.e., equal to its Guttman transform);
(d) if S_∞ is not a singleton, then it is a continuum.

Proof: All X_k are column centered. In the p(n − 1)-dimensional space of all column-centered n × p matrices, the function η defines a norm. Because η_k ≤ 1 for all k, it follows that all X_k are in the unit ball of this normed space, and consequently they have at least one accumulation point. This proves (a). Suppose X_{k_ν} is a subsequence converging to X_∞. Then, by continuity of σ, the sequence σ(X_{k_ν}) converges to σ(X_∞). But the only accumulation point of σ(X_k) is σ_∞. This proves (b). In the neighborhood of a usable configuration the Guttman transform is continuous. Thus, by the same argument that proved (b), we find that ε(X_{k_ν}) = η(X_{k_ν} − Γ(X_{k_ν})) = η(X_{k_ν} − X_{k_ν+1}) converges to η(X_∞ − Γ(X_∞)), which must be zero by Theorem 1, part (e). This proves (c). For result (d), we remember that a continuum is a closed set which cannot be written as the union of two or more disjoint closed sets. A proof of (d) is given by Ostrowski (1966, Theorem 28.1). Q.E.D.

Again, Theorem 2 does not say that X_k converges. This is quite irrelevant from a practical point of view, however. If we define ε-optimal configurations as those configurations for which η(X − Γ(X)) < ε, then for all ε > 0 the algorithm finds an ε-optimal configuration in a finite number of steps. This is all the convergence we ever need in practice. In Theorem 2 there is a restriction in part (c), because we require that X_∞ is usable. This restriction can be removed by using subdifferentials (de Leeuw and Heiser 1980). But again, from a practical point of view, the restriction is not very important, because all local minima are usable (de Leeuw 1984). The conclusion from Theorems 1 and 2 is that the sequences of loss and fit values generated by the algorithm converge monotonically. The difference between successive solutions converges to zero, which implies that the sequence of solutions converges in this asymptotic sense. In a precise mathematical sense we have either convergence to a single point, or convergence to a continuum of stationary points, with all these stationary points having the same value of the loss function. We have not been able to exclude this last possibility, and indeed a natural candidate for such a continuum is available in any multidimensional scaling problem. If X_∞ is a stationary point, then for any p × p rotation matrix K, the matrix X_∞K is also a stationary point, with the same loss function value.

Thus there is the possibility that X_k converges to a continuum of the form S_∞ = {X | X = X_∞K}, in the sense that min {η(X_k − X) | X ∈ S_∞} converges to zero, but X_k does not converge to a point in S_∞. Again it is clear that this is irrelevant from a practical point of view.

5. Derivatives of the Guttman Transform

A more detailed study of the convergence behavior of the algorithm is possible if we determine the derivative of the Guttman transform. This makes it possible to investigate, for the first time, the rate of convergence of our basic algorithm. For a general discussion of the role of the derivative in one-step iterative processes, such as our (13), we refer to Ostrowski (1966, Chapter 22) or Ortega and Rheinboldt (1970, Chapter 10). For ease of reference we briefly summarize their key result here. If X_{k+1} = Φ(X_k) is any convergent iterative algorithm generating a sequence converging to X_∞, and if the largest eigenvalue τ_∞ of the derivative of Φ at X_∞ satisfies 0 < τ_∞ < 1, then ||X_{k+1} − X_∞|| / ||X_k − X_∞|| → τ_∞, i.e., we have linear convergence with rate τ_∞. The derivative is considered as a linear operator mapping the (n − 1)p-dimensional space of column-centered configurations into itself. We can represent it computationally by an np × np matrix, but here we prefer to give it as a linear map which associates with each centered configuration Y another centered configuration Γ_X(Y). The map Γ_X is the derivative of the Guttman transform at the usable configuration X. Thus we have, for example,

Γ(X + Y) = Γ(X) + Γ_X(Y) + o(η(Y)),   (24)

showing the local linear approximation provided by the derivative. Observe that in (24) we have chosen η as the norm we are using in defining the derivative. This choice is not essential, of course, but it certainly is convenient. We now give the formula for the derivative. It is most usefully written as

Γ_X(Y) = V⁺{B(X)Y − U(X,Y)X},   (25)

with

U(X,Y) = (1/2) Σ_i Σ_j {w_ij δ_ij c_ij(X,Y) / d_ij³(X)} A_ij,   (26)

and c_ij(X,Y) = tr X'A_ij Y. Thus U(X,X) = B(X).
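
As a sanity check on (25)-(26) — again a sketch of my own, not code from the paper — the derivative Γ_X(Y) can be compared with a finite-difference approximation of the Guttman transform at a usable configuration.

```python
import numpy as np

def dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def V_mat(W):
    V = -np.asarray(W, dtype=float); np.fill_diagonal(V, W.sum(axis=1)); return V

def B_mat(X, Delta, W):
    D = dists(X); S = np.zeros_like(D); np.divide(1.0, D, out=S, where=D > 0)
    B = -W * Delta * S; np.fill_diagonal(B, 0.0); np.fill_diagonal(B, -B.sum(axis=1))
    return B

def gamma(X, Delta, W):
    return np.linalg.pinv(V_mat(W)) @ B_mat(X, Delta, W) @ X

def dgamma(X, Y, Delta, W):
    """Derivative (25): Gamma_X(Y) = V^+ {B(X) Y - U(X, Y) X}, with U from (26)."""
    D = dists(X)
    # c_ij(X, Y) = tr X'A_ij Y = (x_i - x_j) . (y_i - y_j)
    cij = np.einsum('ijk,ijk->ij',
                    X[:, None, :] - X[None, :, :],
                    Y[:, None, :] - Y[None, :, :])
    coef = np.zeros_like(D)
    np.divide(W * Delta * cij, D ** 3, out=coef, where=D > 0)
    U = -coef
    np.fill_diagonal(U, 0.0)
    np.fill_diagonal(U, -U.sum(axis=1))
    return np.linalg.pinv(V_mat(W)) @ (B_mat(X, Delta, W) @ Y - U @ X)

rng = np.random.default_rng(6)
n, p = 5, 2
W = np.ones((n, n)); np.fill_diagonal(W, 0.0)
Delta = np.abs(rng.standard_normal((n, n))); Delta = (Delta + Delta.T) / 2
np.fill_diagonal(Delta, 0.0)
Delta /= np.sqrt(0.5 * np.sum(W * Delta ** 2))
X = rng.standard_normal((n, p)); X -= X.mean(axis=0)
Y = rng.standard_normal((n, p)); Y -= Y.mean(axis=0)

t = 1e-6
fd = (gamma(X + t * Y, Delta, W) - gamma(X, Delta, W)) / t
print(np.allclose(fd, dgamma(X, Y, Delta, W), atol=1e-4))     # True at a usable X
```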

Next we are interested in the eigenvalues and eigenvectors of Γ_X. Observe that we have interpreted it as an operator on the (n − 1)p-dimensional space of centered configurations. It consequently has (n − 1)p eigenvalues, not necessarily all distinct. If we consider Γ_X as an operator on the space of all n × p matrices, then it has p additional eigenvalues equal to zero. The corresponding eigensubspace contains the solutions of η(Y) = 0. Before proceeding, we have to single out one special case. If p = 1, then U(X,Y)X = B(X)Y, and thus Γ_X(Y) = 0. It is shown in de Leeuw and Heiser (1977) that the Guttman transform iterations converge in a finite number of steps if p = 1. This special case was already singled out by Guttman (1968). Defays (1978), Heiser (1981), and Hubert and Arabie (1986) show that one-dimensional scaling is essentially a combinatorial problem. Because either the result Γ_X(Y) = 0 or the fact of finite convergence stops all considerations having to do with rate of convergence, we assume from now on that p > 1.

Result 1. X is an eigenvector of Γ_X with eigenvalue zero. This follows directly from U(X,X) = B(X) and (25).

Result 2. Γ_X has simple structure, i.e., (n − 1)p linearly independent eigenvectors. This follows because Γ_X(Y) = λY can be written in the form (∇²ρ(X))Y = λVY, where ∇²ρ is the operator corresponding with the second partial derivatives of ρ, which is consequently symmetric. Thus the eigenvalues are real, and the eigenvectors Y_s can be chosen such that they are orthogonal, i.e., such that tr Y_s'VY_t = δ_st, with δ_st the Kronecker delta.

Result 3. The eigenvalues of Γ_X are non-negative. This follows from the representation in the previous result. Because ρ is convex, the operator ∇²ρ is positive semidefinite.

Result 4. If X is a local minimum of σ, then all eigenvalues of Γ_X are less than or equal to one. This follows because if X is a local minimum, then ∇²σ(X) is positive semidefinite. And ∇²σ(X) is positive semidefinite if and only if I − Γ_X is positive semidefinite, i.e., if and only if all eigenvalues of Γ_X are less than or equal to one.

Result 5. If X is stationary, then Γ_X has (1/2)p(p − 1) eigenvalues equal to one. To prove this result, choose Y = XS, with S anti-symmetric, i.e., S = −S'. Then c_ij(X,Y) = tr X'A_ij XS = 0, and thus from (25) and (26) we have Γ_X(Y) = V⁺B(X)XS = XS = Y. The anti-symmetric matrix S can be chosen in (1/2)p(p − 1) linearly independent ways.
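
Results 1 and 5 can be illustrated numerically by iterating to an (approximately) stationary configuration and differencing the Guttman transform in the directions X and XS. The following is a hedged numpy sketch of mine, with names, tolerances, and iteration counts chosen arbitrarily.

```python
import numpy as np

def dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def gamma(X, Delta, W):
    V = -np.asarray(W, dtype=float); np.fill_diagonal(V, W.sum(axis=1))
    D = dists(X); S = np.zeros_like(D); np.divide(1.0, D, out=S, where=D > 0)
    B = -W * Delta * S; np.fill_diagonal(B, 0.0); np.fill_diagonal(B, -B.sum(axis=1))
    return np.linalg.pinv(V) @ B @ X

rng = np.random.default_rng(7)
n, p = 6, 2
W = np.ones((n, n)); np.fill_diagonal(W, 0.0)
Delta = np.abs(rng.standard_normal((n, n))); Delta = (Delta + Delta.T) / 2
np.fill_diagonal(Delta, 0.0)
Delta /= np.sqrt(0.5 * np.sum(W * Delta ** 2))
X = rng.standard_normal((n, p)); X -= X.mean(axis=0)
for _ in range(2000):                          # iterate until (nearly) stationary
    X = gamma(X, Delta, W)

t = 1e-6
S = np.array([[0.0, 1.0], [-1.0, 0.0]])       # anti-symmetric, so XS is a rotation direction
d_rot = (gamma(X + t * (X @ S), Delta, W) - gamma(X, Delta, W)) / t
d_self = (gamma(X + t * X, Delta, W) - gamma(X, Delta, W)) / t
print(np.allclose(d_rot, X @ S, atol=1e-4))    # Result 5: eigenvalue one along XS
print(np.allclose(d_self, 0.0, atol=1e-4))     # Result 1: eigenvalue zero along X
```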

6. A Small Example

The example we use has n = 4 objects, and has all dissimilarities δ_ij, with i ≠ j, equal to 1/√6. The weights w_ij are all equal to one, and consequently normalization (5) holds. Of course this example does not constitute a realistic multidimensional scaling problem, because it is much too small. We use the example to illustrate what kind of stationary points we can expect in a metric scaling problem, and how our simple algorithm will behave in a neighborhood of these stationary points. We do not intend to prove, in any sense, that our algorithm is better than other existing algorithms. In fact we think that the other algorithms will behave quite similarly. It seems to us that the example can be used to illustrate the great difficulties multidimensional scaling methods can encounter if the iterations are started at unfortunate places, and to illustrate the slow rate of convergence that seems to be typical for multidimensional scaling problems. Both the local minimum problem and the slow convergence problem will tend to become more serious if we increase the size of the problem. We first make a small list of stationary points. This list may not be exhaustive, but it contains some of the more interesting types of stationary points.

Stationary point 1. Take four points equally spaced on a line. This actually defines a whole family of stationary points, because any permutation of four equally spaced points will do. Moreover we can think of this solution as embedded in one-dimensional space, in two-dimensional space, and so on. If p = 1 we know that Γ_X = 0; we also know that the equally spaced points define the global minimum (de Leeuw and Stoop 1984). But, as we said before, we are really only interested in p > 1. If p = 2, i.e., if we have four points equally spaced on a line in two-space, then the eigenvalues of Γ_X are four zeros together with the eigenvalues of (1/4)B(X), which are 0, 1, 1.5, and … . Thus for p = 2 the four points on a line are not a local minimum; in fact they define a saddle point. The function value at this saddle point is … .

Stationary point 2. Take three points at the corners of an equilateral triangle, and the fourth one at the centroid of the triangle. This is, again, a whole family of stationary points, all with loss function value … . For p = 2 the operator Γ_X has 8 eigenvalues. Three are equal to zero, three are equal to one, and two are equal to … . Thus the triangle plus centroid defines a local minimum, but not an isolated one. There are three eigenvalues equal to one, and only one of them corresponds with the trivial eigenvalue of Result 5 in the previous section. If p = 3 we have the same 8 eigenvalues, plus the 4 eigenvalues of (1/4)B(X), which are 0, 1, 1, and 1.4641 in this case. Thus for p = 3 this stationary point is not a local minimum any more.

Stationary point 3. Take four points at the corners of a square. Loss is … . For p = 2 the eigenvalues of Γ_X are three zeros, .4142, three times .5858, and one trivial unit eigenvalue. Thus we have a local minimum here that is isolated, in the sense that the manifold of all rotations of the square is isolated from other stationary values. Again the square is not a local minimum for p = 3, although it remains stationary.
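
The three planar candidates just described are easy to evaluate numerically. The sketch below (mine, not the paper's program) builds the n = 4 example and applies one Guttman transform to each shape; because the Guttman transform is invariant under positive rescaling of the configuration, this single step lands exactly on the corresponding stationary point, whose loss is then printed.

```python
import numpy as np

def dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def gamma(X, Delta, W):
    V = -np.asarray(W, dtype=float); np.fill_diagonal(V, W.sum(axis=1))
    D = dists(X); S = np.zeros_like(D); np.divide(1.0, D, out=S, where=D > 0)
    B = -W * Delta * S; np.fill_diagonal(B, 0.0); np.fill_diagonal(B, -B.sum(axis=1))
    return np.linalg.pinv(V) @ B @ X

def sigma(X, Delta, W):
    return 0.5 * np.sum(W * (Delta - dists(X)) ** 2)

n = 4
W = np.ones((n, n)); np.fill_diagonal(W, 0.0)
Delta = np.full((n, n), 1.0 / np.sqrt(6.0)); np.fill_diagonal(Delta, 0.0)   # satisfies (5)

shapes = {
    "equally spaced line": np.array([[-3.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [3.0, 0.0]]),
    "triangle + centroid": np.array([[1.0, 0.0], [-0.5, np.sqrt(3) / 2],
                                     [-0.5, -np.sqrt(3) / 2], [0.0, 0.0]]),
    "square":              np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, -1.0], [-1.0, 1.0]]),
}
for name, Z in shapes.items():
    Z = Z - Z.mean(axis=0)
    X = gamma(Z, Delta, W)     # Gamma is scale free, so one step lands on the stationary point
    stationary = np.allclose(X, gamma(X, Delta, W))
    print(f"{name:20s} stationary: {stationary}  loss: {sigma(X, Delta, W):.4f}")
```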

Stationary point 4. Take four points at the corners of a regular tetrahedron. The loss function is now zero, which shows directly that this three-dimensional solution is certainly the global minimum. The eigenvalues of Γ_X are zero four times, 1/2 three times, 3/4 two times, and one three times. The unit eigenvalues correspond with the (1/2)p(p − 1) trivial eigenvalues defining the manifold of rotations.

7. Rate of Convergence

The discussion in the previous sections shows that at least part of the difficulty with proving actual convergence of our iterations comes from the rotational indeterminacy of multidimensional scaling. Because of this rotational indeterminacy, Γ_X has at least (1/2)p(p − 1) unit eigenvalues at a stationary point X. If we eliminate rotational indeterminacy, then we eliminate these difficulties. First, we call a stationary point isolated if Γ_X has exactly (1/2)p(p − 1) unit eigenvalues. We call it an isolated local minimum if all other eigenvalues are strictly less than one. In the small example in the previous section, the four points at the corners of a square are an isolated local minimum for p = 2. For p = 3 the solution is neither isolated nor a local minimum. The equilateral triangle with centroid is a local minimum for p = 2, but not an isolated local minimum. The regular tetrahedron is an isolated local minimum for p = 3. At an isolated local minimum we use the symbol κ for the largest eigenvalue less than one. We call it the level of the isolated local minimum.

Theorem 3: If X_k has an accumulation point which is an isolated local minimum with level κ, then ε_{k+1} / ε_k → κ.

Proof: We want to apply the general theorems of Ostrowski and of Ortega and Rheinboldt referred to above. First, we eliminate rotational indeterminacy by defining a new sequence X̃_k. For each k the configuration X̃_k is a rotation of X_k; moreover, it is a specific rotation which identifies the configuration uniquely in the manifold of rotations. We can rotate to principal components, for example, with some special provision for equal eigenvalues. Because X̃_k is a rotation of X_k, the sequences σ_k, ρ_k, η_k, λ_k generated by this modified algorithm are exactly the same. So is ε_k² = η_{k+1}² + η_k² − 2ρ_k, although now ε_k ≠ η(X̃_{k+1} − X̃_k). The transformation which maps X̃_k to X̃_{k+1} has a derivative Γ̃ at a stationary point with exactly the same eigenvalues as Γ_X, except for the (1/2)p(p − 1) unit eigenvalues, which are replaced by zeros. Thus κ < 1 is actually the largest eigenvalue of Γ̃, so that X̃_k converges linearly, with rate κ. Q.E.D.
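
The level κ at the square solution can be estimated by building the matrix of Γ_X by finite differences and inspecting its spectrum, as in the following sketch (my own construction; the step size and the use of forward differences are arbitrary choices). After discarding the (1/2)p(p − 1) trivial unit eigenvalues associated with rotations, the largest remaining eigenvalue is the level κ of Theorem 3.

```python
import numpy as np

def dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def gamma(X, Delta, W):
    V = -np.asarray(W, dtype=float); np.fill_diagonal(V, W.sum(axis=1))
    D = dists(X); S = np.zeros_like(D); np.divide(1.0, D, out=S, where=D > 0)
    B = -W * Delta * S; np.fill_diagonal(B, 0.0); np.fill_diagonal(B, -B.sum(axis=1))
    return np.linalg.pinv(V) @ B @ X

n, p = 4, 2
W = np.ones((n, n)); np.fill_diagonal(W, 0.0)
Delta = np.full((n, n), 1.0 / np.sqrt(6.0)); np.fill_diagonal(Delta, 0.0)

Z = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, -1.0], [-1.0, 1.0]])
X = gamma(Z - Z.mean(axis=0), Delta, W)                  # stationary square

# Build the np x np matrix of Gamma_X by forward differences on centered directions.
t = 1e-6
G0 = gamma(X, Delta, W)
M = np.zeros((n * p, n * p))
for idx in range(n * p):
    E = np.zeros((n, p)); E.flat[idx] = 1.0
    E -= E.mean(axis=0)                                  # stay in the centered space
    M[:, idx] = ((gamma(X + t * E, Delta, W) - G0) / t).ravel()

eigs = np.sort(np.real(np.linalg.eigvals(M)))
print(np.round(eigs, 4))
# The largest eigenvalue below one is the level kappa, i.e., the linear rate of Theorem 3.
```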

If the stationary point is a non-isolated local minimum, or not even a local minimum, then Theorem 3 does not say anything about the rate of convergence. This does not seem to be a very important restriction in practice, because it seems difficult to get our algorithm to converge to a non-isolated local optimum. We illustrate this with the small example from the previous section. The equilateral triangle with centroid is a non-isolated local minimum for p = 2. Start the iterations from a small perturbation of this stationary point. With a very close start (σ = …), we have convergence to the stationary value with 10-decimal precision within 5 iterations. The ratio ε_{k+1}² / ε_k² continues to increase, however, although extremely slowly. It is .99 at iteration 8, and .999 at iteration 10. We have stopped the process at iteration 30, at which point we are still equally close to the stationary value, and the ratio is still increasing. We have restarted the iterations somewhat further away from the stationary point (σ = …). After 10 iterations σ is down to … and ε² is … . The ratio ε_{k+1}² / ε_k² is … . At iteration 50 we have σ = …, ε² = …, and ε_{k+1}² / ε_k² = … . Around iteration 60 the value of σ drops below that of the equilateral triangle with centroid, and ε² begins to rise, causing a ratio ε_{k+1} / ε_k larger than one. This situation continues for a very long time. At iteration 200, for instance, we have σ = … and ε² = … . The ratio of successive epsilons is still larger than one. This continues until iteration 250. In the meantime ε² has increased to …, and σ, which started dropping rapidly at iteration 225, is down to … . Convergence now becomes rapid, and within 20 iterations the configuration converges to the four corners of the square, which is an isolated local minimum (in fact the global minimum) for p = 2. At iteration 270 we have σ = …, ε² = …, and ε_{k+1}² / ε_k² = …, which is for all practical purposes equal to κ². Thus we have started close to a non-isolated local minimum. The algorithm has great difficulty in getting away from it, but ultimately succeeds. With a restart even further away (σ = …) the algorithm has difficulty escaping only until iteration 30. The ε² then decreases rapidly again, and we have convergence to the square in 55 iterations. It is now not difficult to conjecture that in the first start, in which we seemed to converge on the equilateral triangle, we merely did not continue long enough. After hundreds or perhaps thousands of iterations we would converge on the square again, if we continued.
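
The qualitative behavior described here can be reproduced with a few lines of numpy. In the sketch below (mine, not the original experiment), the iteration counts at which the escape happens depend on the size and direction of the random perturbation, so they will not match the figures quoted above; increase the perturbation or the number of iterations to see the transition from the triangle-plus-centroid region to the square.

```python
import numpy as np

def dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def gamma(X, Delta, W):
    V = -np.asarray(W, dtype=float); np.fill_diagonal(V, W.sum(axis=1))
    D = dists(X); S = np.zeros_like(D); np.divide(1.0, D, out=S, where=D > 0)
    B = -W * Delta * S; np.fill_diagonal(B, 0.0); np.fill_diagonal(B, -B.sum(axis=1))
    return np.linalg.pinv(V) @ B @ X

def sigma(X, Delta, W):
    return 0.5 * np.sum(W * (Delta - dists(X)) ** 2)

n = 4
W = np.ones((n, n)); np.fill_diagonal(W, 0.0)
Delta = np.full((n, n), 1.0 / np.sqrt(6.0)); np.fill_diagonal(Delta, 0.0)

tri = np.array([[1.0, 0.0], [-0.5, np.sqrt(3) / 2], [-0.5, -np.sqrt(3) / 2], [0.0, 0.0]])
tri = gamma(tri - tri.mean(axis=0), Delta, W)            # stationary triangle + centroid

rng = np.random.default_rng(8)
X = tri + 0.01 * rng.standard_normal(tri.shape)          # start close to the local minimum
X -= X.mean(axis=0)
eps_prev = None
for k in range(401):
    X_next = gamma(X, Delta, W)
    eps = np.sqrt(0.5 * np.sum(W * dists(X - X_next) ** 2))
    if k % 25 == 0:
        ratio = eps / eps_prev if eps_prev is not None else float("nan")
        print(f"k={k:3d}  sigma={sigma(X, Delta, W):.6f}  eps={eps:.3e}  ratio={ratio:.4f}")
    eps_prev, X = eps, X_next
```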

8. Nonmetric Scaling

We now introduce some additional terminology in order to define nonmetric scaling. The loss function for a nonmetric scaling problem can be written as

σ(X, Δ̂) = (1/2) Σ_i Σ_j w_ij (δ̂_ij − d_ij(X))²,   (27)

where all symbols have the same meaning as before, except for Δ̂ = {δ̂_ij}, which now contains the disparities and no longer the dissimilarities. In nonmetric scaling the loss function (27) must be minimized over both the configuration X and the disparities Δ̂, where the disparities are restricted by the ordinal information in the data. Thus the disparities are additional parameters in the nonmetric scaling problem over which we minimize, in contrast to the metric scaling problem in which the dissimilarities are fixed numbers. We can choose Δ̂ freely, provided it is feasible, which means in this case monotone with the given dissimilarities. In this sense nonmetric scaling generalizes metric scaling, in which Δ̂ must not only be monotone with but in fact identical to the dissimilarities. In order to prevent trivial solutions, we must also impose a normalization condition on the disparities, for instance that their sum of squares is unity, or that their variance is unity. As indicated in de Leeuw (1977), we can think of nonmetric scaling algorithms in two different ways. First, we can interpret them as alternating least squares methods, which alternate one gradient step (or Guttman transform) with a monotone regression step. In the gradient step, the loss function (27) is minimized (or rather decreased) by choosing a new X; in the monotone regression step (27) is decreased by choosing a new Δ̂, consistent with the ordinal constraints. Of course it is possible, and perhaps sometimes advisable, to introduce some obvious modifications of this algorithm. Instead of alternating one Guttman transform with one monotone regression we can perform more Guttman transforms between monotone regressions. These Guttman steps have a rate of convergence which is described by our results above. In the alternating least squares interpretation, the loss function (27) is clearly interpreted as a function of two sets of parameters, the coordinates and the disparities. Second, it is also possible to view the loss function in nonmetric scaling as a function of X alone. This is the original definition of stress as proposed by Kruskal (1964a, 1964b). If σ(X, Δ̂) is the "stress" used in the first approach, then Kruskal's stress is the minimum of σ(X, Δ̂) over all feasible disparities Δ̂. Thus Δ̂ is "projected out," and the remaining function depends only on X. This elementary fact has caused a great deal of confusion in the early days of multidimensional scaling.
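
One alternating least squares cycle is easy to sketch: a pool-adjacent-violators routine for the monotone regression step, followed by one Guttman transform with the disparities in place of the dissimilarities. The code below is my own minimal illustration (all weights equal to one, disparities renormalized as in (5)), not a polished nonmetric scaling program.

```python
import numpy as np

def pava(y):
    """Unweighted pool-adjacent-violators: least squares nondecreasing fit to y."""
    blocks = []                                  # list of [block mean, block size]
    for v in y:
        blocks.append([float(v), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, s2 = blocks.pop(); m1, s1 = blocks.pop()
            blocks.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    return np.concatenate([[m] * s for m, s in blocks])

def dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def gamma(X, Dhat, W):
    V = -np.asarray(W, dtype=float); np.fill_diagonal(V, W.sum(axis=1))
    D = dists(X); S = np.zeros_like(D); np.divide(1.0, D, out=S, where=D > 0)
    B = -W * Dhat * S; np.fill_diagonal(B, 0.0); np.fill_diagonal(B, -B.sum(axis=1))
    return np.linalg.pinv(V) @ B @ X

def nonmetric_cycle(X, Delta, W):
    """Monotone regression for the disparities, then one Guttman transform."""
    n = X.shape[0]
    iu = np.triu_indices(n, k=1)
    order = np.argsort(Delta[iu], kind="stable")          # rank order of the dissimilarities
    d = dists(X)[iu]
    dhat = np.empty_like(d)
    dhat[order] = pava(d[order])                          # disparities, monotone with the data
    Dhat = np.zeros((n, n)); Dhat[iu] = dhat; Dhat += Dhat.T
    Dhat /= np.sqrt(0.5 * np.sum(W * Dhat ** 2))          # renormalize, cf. (5)
    return gamma(X, Dhat, W), Dhat

rng = np.random.default_rng(9)
n, p = 8, 2
W = np.ones((n, n)); np.fill_diagonal(W, 0.0)
Delta = np.abs(rng.standard_normal((n, n))); Delta = (Delta + Delta.T) / 2
np.fill_diagonal(Delta, 0.0)
X = rng.standard_normal((n, p)); X -= X.mean(axis=0)
for _ in range(30):
    X, Dhat = nonmetric_cycle(X, Delta, W)
print("normalized stress:", 0.5 * np.sum(W * (Dhat - dists(X)) ** 2))
```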

The confusion was made even bigger by the fact that the derivative of σ(X) is the same as the partial derivative of σ(X, Δ̂) with respect to X, evaluated at the optimal Δ̂(X). Thus σ(X) = σ(X, Δ̂(X)), but also ∇σ(X) = ∇_X σ(X, Δ̂(X)). This last result is due to Kruskal (1971), who used it to show that σ(X) is differentiable (whenever σ(X, Δ̂) is differentiable). It is also used by de Leeuw (1977) to show that the iteration X_{k+1} = V⁺B(X_k)X_k is still a convergent algorithm if we define B(X) as in (10), but with Δ̂(X) substituted for Δ. Thus our qualitative convergence results remain true without modification, both in the alternating least squares and in the (sub)differential interpretation. Unfortunately, the transformation X → Δ̂(X) is generally not differentiable. In fact, to find Δ̂(X) we have to project D(X) orthogonally on a polyhedral convex cone, which implies that the transformation is piecewise linear in the distances. The linear pieces are joined in a continuous but nonsmooth way (compare Kruskal 1971). If we have convergence of the nonmetric scaling algorithm to a point where the cone projection is locally a constant linear map, then our convergence results apply. In monotone regression terms this means that for all points in an open neighborhood of the solution we are converging to, monotone regression finds the same partitioning into blocks in its computation of the disparities, i.e., for all these points it projects on the same face of the cone. In general, however, we cannot exclude the possibility that the convergence is to a point on the boundary of two regions with different projection maps. These linear maps again correspond to the partitionings into blocks found by the monotone regression algorithm. In this case our results do not apply, and they must be adapted.

9. Summary and Conclusions

We have shown in this paper that our basic majorization algorithm for multidimensional scaling converges to a stationary point, if convergence is defined using the asymptotic regularity of the generated sequence, i.e., in terms of the fact that the distance between two consecutive members of the sequence converges to zero. We have also shown that if one of the accumulation points of the sequence is an isolated local minimum, then convergence is linear. This seems to be all that is needed for practical applications. It follows from our small numerical example that it is possible for actual computer programs to stop at saddle points which are not local minima, and that in the neighborhood of such saddle points convergence may look sublinear. Our experience with many practical examples indicates that the level of isolated local minima (i.e., the convergence rate of the algorithm we have described) in multidimensional scaling is very often close to unity. Thus, although convergence is theoretically linear, it can be extremely slow.

Consequently it becomes very important, at least in some cases, to look for ways to speed up linear convergence, or even for ways to attain superlinear convergence. These acceleration devices will be investigated in subsequent publications. Simple ways to speed up linear convergence were already investigated by de Leeuw and Heiser (1980); more complicated ones were studied by Stoop and de Leeuw (1983). In both papers the basic convergence of the Guttman-transform iterations is preserved, but a specially developed step-size procedure is added to the algorithm. Both on theoretical and on empirical grounds one can argue that step-size procedures in MDS, including the ones devised by Kruskal (1964), will generally double the speed of convergence, i.e., halve the number of iterations required for a given precision. In the case of very slow linear convergence, this does not really help very much. Of course the formulas derived in Section 5 can be used quite easily to derive the exact form of Newton's method applicable to multidimensional scaling, but our numerical experience so far suggests that Newton's method must also be used with much care in this context. Our main conclusion is that the majorization method is reliable and very simple, but that it is generally slow, and sometimes intolerably slow. It seems to us that an additional conclusion is that one should always study the second derivatives of the loss function at the stopping point of the algorithm. This information indicates whether we have stopped at a local minimum, and how much improvement we can expect in various directions. Such information can be used either to try to make one or more Newton steps, or to derive information on the stability of the solution.

References

BORG, I. (1981), Anwendungsorientierte Multidimensionale Skalierung, Berlin, West Germany: Springer.

DEFAYS, D. (1978), "A Short Note on a Method of Seriation," British Journal of Mathematical and Statistical Psychology, 31.

DE LEEUW, J. (1977), "Applications of Convex Analysis to Multidimensional Scaling," in Recent Developments in Statistics, eds. J. R. Barra, F. Brodeau, G. Romier, and B. van Cutsem, Amsterdam: North Holland.

DE LEEUW, J. (1984), "Differentiability of Kruskal's Stress at a Local Minimum," Psychometrika, 49.

DE LEEUW, J., and HEISER, W. J. (1977), "Convergence of Correction Matrix Algorithm for Multidimensional Scaling," in Geometric Representations of Relational Data, ed. J. C. Lingoes, Ann Arbor: Mathesis Press.

DE LEEUW, J., and HEISER, W. J. (1980), "Multidimensional Scaling with Restrictions on the Configuration," in Multivariate Analysis, Vol. V, ed. P. R. Krishnaiah, Amsterdam: North Holland.

DE LEEUW, J., and STOOP, I. (1984), "Upper Bounds for Kruskal's Stress," Psychometrika, 49.

GUTTMAN, L. (1968), "A General Nonmetric Technique for Finding the Smallest Coordinate Space for a Configuration of Points," Psychometrika, 33.

HARTMANN, W. (1979), Geometrische Modelle zur Analyse empirischer Daten, Berlin: Akademie-Verlag.

HEISER, W. J. (1981), Unfolding Analysis of Proximity Data, Unpublished Doctoral Dissertation, University of Leiden.

HUBERT, L., and ARABIE, P. (1986), "Unidimensional Scaling and Combinatorial Optimization," in Multidimensional Data Analysis, eds. J. de Leeuw et al., Leiden: DSWO Press.

KRUSKAL, J. B. (1964a), "Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis," Psychometrika, 29.

KRUSKAL, J. B. (1964b), "Nonmetric Multidimensional Scaling: A Numerical Method," Psychometrika, 29.

KRUSKAL, J. B. (1971), "Monotone Regression: Continuity and Differentiability Properties," Psychometrika, 36.

KRUSKAL, J. B., and WISH, M. (1978), Multidimensional Scaling, Newbury Park, CA: Sage.

LINGOES, J. C., and ROSKAM, E. E. (1973), "A Mathematical and Empirical Comparison of Two Multidimensional Scaling Algorithms," Psychometrika, 38, Monograph Supplement.

ORTEGA, J. M., and RHEINBOLDT, W. C. (1970), Iterative Solution of Nonlinear Equations in Several Variables, New York: Academic Press.

OSTROWSKI, A. M. (1966), Solution of Equations and Systems of Equations, New York: Academic Press.

STOOP, I., and DE LEEUW, J. (1983), The Stepsize in Multidimensional Scaling Algorithms, Paper presented at the Third European Meeting of the Psychometric Society, Jouy-en-Josas, France, July 5-8, 1983.


More information

8. Diagonalization.

8. Diagonalization. 8. Diagonalization 8.1. Matrix Representations of Linear Transformations Matrix of A Linear Operator with Respect to A Basis We know that every linear transformation T: R n R m has an associated standard

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Linear Algebra I. Ronald van Luijk, 2015

Linear Algebra I. Ronald van Luijk, 2015 Linear Algebra I Ronald van Luijk, 2015 With many parts from Linear Algebra I by Michael Stoll, 2007 Contents Dependencies among sections 3 Chapter 1. Euclidean space: lines and hyperplanes 5 1.1. Definition

More information

Introduction and Math Preliminaries

Introduction and Math Preliminaries Introduction and Math Preliminaries Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Appendices A, B, and C, Chapter

More information

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 4

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 4 Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 4 Instructor: Farid Alizadeh Scribe: Haengju Lee 10/1/2001 1 Overview We examine the dual of the Fermat-Weber Problem. Next we will

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

3 The language of proof

3 The language of proof 3 The language of proof After working through this section, you should be able to: (a) understand what is asserted by various types of mathematical statements, in particular implications and equivalences;

More information

Linear Algebra March 16, 2019

Linear Algebra March 16, 2019 Linear Algebra March 16, 2019 2 Contents 0.1 Notation................................ 4 1 Systems of linear equations, and matrices 5 1.1 Systems of linear equations..................... 5 1.2 Augmented

More information

The Skorokhod reflection problem for functions with discontinuities (contractive case)

The Skorokhod reflection problem for functions with discontinuities (contractive case) The Skorokhod reflection problem for functions with discontinuities (contractive case) TAKIS KONSTANTOPOULOS Univ. of Texas at Austin Revised March 1999 Abstract Basic properties of the Skorokhod reflection

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n

More information

1. Addition: To every pair of vectors x, y X corresponds an element x + y X such that the commutative and associative properties hold

1. Addition: To every pair of vectors x, y X corresponds an element x + y X such that the commutative and associative properties hold Appendix B Y Mathematical Refresher This appendix presents mathematical concepts we use in developing our main arguments in the text of this book. This appendix can be read in the order in which it appears,

More information

Optimality, Duality, Complementarity for Constrained Optimization

Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

More information

On the iterate convergence of descent methods for convex optimization

On the iterate convergence of descent methods for convex optimization On the iterate convergence of descent methods for convex optimization Clovis C. Gonzaga March 1, 2014 Abstract We study the iterate convergence of strong descent algorithms applied to convex functions.

More information

October 25, 2013 INNER PRODUCT SPACES

October 25, 2013 INNER PRODUCT SPACES October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

MTH Linear Algebra. Study Guide. Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education

MTH Linear Algebra. Study Guide. Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education MTH 3 Linear Algebra Study Guide Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education June 3, ii Contents Table of Contents iii Matrix Algebra. Real Life

More information

Infinite series, improper integrals, and Taylor series

Infinite series, improper integrals, and Taylor series Chapter 2 Infinite series, improper integrals, and Taylor series 2. Introduction to series In studying calculus, we have explored a variety of functions. Among the most basic are polynomials, i.e. functions

More information

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

More information

An Introduction to Correlation Stress Testing

An Introduction to Correlation Stress Testing An Introduction to Correlation Stress Testing Defeng Sun Department of Mathematics and Risk Management Institute National University of Singapore This is based on a joint work with GAO Yan at NUS March

More information

Math 302 Outcome Statements Winter 2013

Math 302 Outcome Statements Winter 2013 Math 302 Outcome Statements Winter 2013 1 Rectangular Space Coordinates; Vectors in the Three-Dimensional Space (a) Cartesian coordinates of a point (b) sphere (c) symmetry about a point, a line, and a

More information

arxiv: v1 [math.na] 5 May 2011

arxiv: v1 [math.na] 5 May 2011 ITERATIVE METHODS FOR COMPUTING EIGENVALUES AND EIGENVECTORS MAYSUM PANJU arxiv:1105.1185v1 [math.na] 5 May 2011 Abstract. We examine some numerical iterative methods for computing the eigenvalues and

More information

Applications of Differentiation

Applications of Differentiation MathsTrack (NOTE Feb 2013: This is the old version of MathsTrack. New books will be created during 2013 and 2014) Module9 7 Introduction Applications of to Matrices Differentiation y = x(x 1)(x 2) d 2

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Evaluating Goodness of Fit in

Evaluating Goodness of Fit in Evaluating Goodness of Fit in Nonmetric Multidimensional Scaling by ALSCAL Robert MacCallum The Ohio State University Two types of information are provided to aid users of ALSCAL in evaluating goodness

More information

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3 Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,

More information

CS 6820 Fall 2014 Lectures, October 3-20, 2014

CS 6820 Fall 2014 Lectures, October 3-20, 2014 Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given

More information

Vectors. January 13, 2013

Vectors. January 13, 2013 Vectors January 13, 2013 The simplest tensors are scalars, which are the measurable quantities of a theory, left invariant by symmetry transformations. By far the most common non-scalars are the vectors,

More information

Notes on Linear Algebra and Matrix Theory

Notes on Linear Algebra and Matrix Theory Massimo Franceschet featuring Enrico Bozzo Scalar product The scalar product (a.k.a. dot product or inner product) of two real vectors x = (x 1,..., x n ) and y = (y 1,..., y n ) is not a vector but a

More information

CHAPTER 11. A Revision. 1. The Computers and Numbers therein

CHAPTER 11. A Revision. 1. The Computers and Numbers therein CHAPTER A Revision. The Computers and Numbers therein Traditional computer science begins with a finite alphabet. By stringing elements of the alphabet one after another, one obtains strings. A set of

More information

Course Summary Math 211

Course Summary Math 211 Course Summary Math 211 table of contents I. Functions of several variables. II. R n. III. Derivatives. IV. Taylor s Theorem. V. Differential Geometry. VI. Applications. 1. Best affine approximations.

More information

An Algorithm for Solving the Convex Feasibility Problem With Linear Matrix Inequality Constraints and an Implementation for Second-Order Cones

An Algorithm for Solving the Convex Feasibility Problem With Linear Matrix Inequality Constraints and an Implementation for Second-Order Cones An Algorithm for Solving the Convex Feasibility Problem With Linear Matrix Inequality Constraints and an Implementation for Second-Order Cones Bryan Karlovitz July 19, 2012 West Chester University of Pennsylvania

More information

MAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS 1. INTRODUCTION

MAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS 1. INTRODUCTION MAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS JAN DE LEEUW ABSTRACT. We study the weighted least squares fixed rank approximation problem in which the weight matrices depend on unknown

More information

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS A Dissertation Submitted For The Award of the Degree of Master of Philosophy in Mathematics Neelam Patel School of Mathematics

More information

Mathematics 530. Practice Problems. n + 1 }

Mathematics 530. Practice Problems. n + 1 } Department of Mathematical Sciences University of Delaware Prof. T. Angell October 19, 2015 Mathematics 530 Practice Problems 1. Recall that an indifference relation on a partially ordered set is defined

More information

,, rectilinear,, spherical,, cylindrical. (6.1)

,, rectilinear,, spherical,, cylindrical. (6.1) Lecture 6 Review of Vectors Physics in more than one dimension (See Chapter 3 in Boas, but we try to take a more general approach and in a slightly different order) Recall that in the previous two lectures

More information

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,

More information

Chapter 2: Unconstrained Extrema

Chapter 2: Unconstrained Extrema Chapter 2: Unconstrained Extrema Math 368 c Copyright 2012, 2013 R Clark Robinson May 22, 2013 Chapter 2: Unconstrained Extrema 1 Types of Sets Definition For p R n and r > 0, the open ball about p of

More information

Lecture 2 - Unconstrained Optimization Definition[Global Minimum and Maximum]Let f : S R be defined on a set S R n. Then

Lecture 2 - Unconstrained Optimization Definition[Global Minimum and Maximum]Let f : S R be defined on a set S R n. Then Lecture 2 - Unconstrained Optimization Definition[Global Minimum and Maximum]Let f : S R be defined on a set S R n. Then 1. x S is a global minimum point of f over S if f (x) f (x ) for any x S. 2. x S

More information

A misère-play -operator

A misère-play -operator A misère-play -operator Matthieu Dufour Silvia Heubach Urban Larsson arxiv:1608.06996v1 [math.co] 25 Aug 2016 July 31, 2018 Abstract We study the -operator (Larsson et al, 2011) of impartial vector subtraction

More information

MAXIMA AND MINIMA CHAPTER 7.1 INTRODUCTION 7.2 CONCEPT OF LOCAL MAXIMA AND LOCAL MINIMA

MAXIMA AND MINIMA CHAPTER 7.1 INTRODUCTION 7.2 CONCEPT OF LOCAL MAXIMA AND LOCAL MINIMA CHAPTER 7 MAXIMA AND MINIMA 7.1 INTRODUCTION The notion of optimizing functions is one of the most important application of calculus used in almost every sphere of life including geometry, business, trade,

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 2

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 2 Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 2 Instructor: Farid Alizadeh Scribe: Xuan Li 9/17/2001 1 Overview We survey the basic notions of cones and cone-lp and give several

More information