The iterative convex minorant algorithm for nonparametric estimation


The iterative convex minorant algorithm for nonparametric estimation

Report 95-05

Geurt Jongbloed

Technische Universiteit Delft / Delft University of Technology
Faculteit der Technische Wiskunde en Informatica / Faculty of Technical Mathematics and Informatics

ISSN 0922-5641

Copyright © 1995 by the Faculty of Technical Mathematics and Informatics, Delft, The Netherlands. No part of this Journal may be reproduced in any form, by print, photoprint, microfilm, or any other means without permission from the Faculty of Technical Mathematics and Informatics, Delft University of Technology, The Netherlands. Copies of these reports may be obtained from the bureau of the Faculty of Technical Mathematics and Informatics, Julianalaan 132, 2628 BL Delft, phone +31 15 2784568. A selection of these reports is available in PostScript form at the Faculty's anonymous ftp-site. They are located in the directory /pub/publications/tech-reports at ftp.twi.tudelft.nl

The iterative convex minorant algorithm for nonparametric estimation

Geurt Jongbloed
Department of Mathematics, Delft University of Technology
Mekelweg 4, 2628 CD Delft, The Netherlands

Abstract

The problem of minimizing a smooth convex function over a basic cone in $\mathbb{R}^n$ is frequently encountered in nonparametric statistics. For that type of problem we suggest an algorithm and show that this algorithm converges to the solution of the minimization problem.

AMS 1991 subject classifications: primary 65U05, secondary 62G05.
Key words and phrases: global convergence, inverse problems, isotonic regression.

1 Introduction

Groeneboom & Wellner (1992) introduce the iterative convex minorant (ICM) algorithm to compute nonparametric maximum likelihood estimators (NPMLEs) for distribution functions in some statistical inverse problems. Using the specific structure of the interval censoring case II problem, Aragon & Eberly (1992) show the ICM algorithm to be locally convergent under the assumption that the points of jump of the NPMLE are known in advance. Determining these points of jump is, however, the main part of the problem. In this paper we describe the ICM algorithm in its general form, show that in general it need not converge, and propose a modified version that does converge under mild conditions.

The ICM algorithm is tailored to minimizing a smooth convex function over one of the cones $C$ or $C_+$ in $\mathbb{R}^n$, which are defined by

$$C = \{x \in \mathbb{R}^n : x_1 \le x_2 \le \cdots \le x_n\} \quad \text{and} \quad C_+ = \{x \in C : x_1 \ge 0\}. \tag{1}$$

Although this problem might seem rather specific, it is in fact quite general: convex optimization problems over more general finitely generated closed convex cones $K$ in $\mathbb{R}^n$ arising in statistics can often be rewritten in terms of one of the cones $C$ or $C_+$. Examples of estimation problems where the algorithm can be applied to compute the NPMLE are interval censoring case II, deconvolution, and Wicksell's problem (see also Jongbloed (1995)). Another example where it can be applied, maximum likelihood estimation of a convex decreasing density, is given in section 5. Least squares estimators of a convex regression function can also be computed by means of the ICM algorithm.

For some of these examples other algorithms have also been proposed. For censoring problems, the expectation maximization (EM) algorithm is frequently used (see e.g. Dempster et al. (1977), and Wu (1983) for a convergence proof of this algorithm). The experience with this algorithm is that it converges rather slowly to the solution of the optimization problem. Recently, a combination of the EM and ICM algorithms for censoring problems was proposed in Zhan & Wellner (1995); simulation results indicate that this hybrid algorithm behaves very well for the double censoring model. General optimization techniques such as interior point methods have also been applied to some statistical estimation problems; see e.g. Terlaky & Vial (1995).

Some known results from optimization theory and the theory of isotonic regression are reviewed in section 2. In section 3 we show that in general the ICM algorithm does not converge to the solution of the minimization problem, and we give a modified ICM algorithm in pseudo code. For this modified algorithm we prove a global convergence result in section 4. Finally, in section 5, we compute the maximum likelihood estimator of a convex and decreasing density on $[0,\infty)$ using the modified ICM algorithm. Additionally, a useful lemma is proved which states that the ICM algorithm can also be used to maximize loglikelihood-type functions over the intersection of a closed convex cone and a hyperplane in $\mathbb{R}^n$.

2 Review of some known results

Let $K$ be a cone in $\mathbb{R}^n$ and let $\phi$ satisfy

Condition 1. $\phi : \mathbb{R}^n \to (-\infty, \infty]$ is (1) convex and attains its minimum over $K$ at a unique point $\hat{x}$, (2) continuous, and (3) continuously differentiable on the set $\{x \in \mathbb{R}^n : \phi(x) < \infty\}$.

Writing $\nabla\phi$ for the vector of partial derivatives of $\phi$,

$$\nabla\phi(x) = \left( \frac{\partial\phi}{\partial x_1}(x), \ldots, \frac{\partial\phi}{\partial x_n}(x) \right)^T,$$

and $(\cdot\,,\cdot)$ for the usual inner product in $\mathbb{R}^n$, it is known from e.g. Robertson et al. (1988), section 6.2, that $\hat{x} = \mathrm{argmin}_{x \in K}\, \phi(x)$ if and only if $\hat{x} \in K$ satisfies

$$\forall x \in K : (x, \nabla\phi(\hat{x})) \ge 0 \quad \text{and} \quad (\hat{x}, \nabla\phi(\hat{x})) = 0. \tag{2}$$
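
For $K = C$ the conditions in (2) are easy to test numerically: $C$ is generated by the constant vectors $\pm(1,\ldots,1)^T$ and the vectors $(0,\ldots,0,1,\ldots,1)^T$, so the inequality part of (2) reduces to sign conditions on tail sums of the gradient (these reappear in the stopping criterion of section 3). A minimal Python/numpy sketch of such a test (the function name and tolerance are ours, added for illustration):

    import numpy as np

    def satisfies_opt_conditions_on_C(x, grad, tol=1e-8):
        """Check (2) for K = C: (v, grad) >= 0 for every generator v of C,
        i.e. sum_{i >= j} grad_i >= 0 for all j, with equality for j = 1,
        together with the complementarity condition (x, grad) = 0."""
        tails = np.cumsum(grad[::-1])[::-1]   # tails[j-1] = sum_{i >= j} grad_i
        return (tails.min() >= -tol and       # inequality part of (2)
                abs(tails[0]) <= tol and      # equality for j = 1 (the +-(1,...,1) directions)
                abs(x @ grad) <= tol)         # (x, grad phi(x)) = 0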

Taking for $K$ one of the cones $C$ or $C_+$ as defined in (1), and for $\phi$ the quadratic function

$$q(x) = \tfrac{1}{2}(x - y)^T W (x - y)$$

for some fixed $y \in \mathbb{R}^n$ and positive definite diagonal matrix $W = \mathrm{diag}(w_i)$, the optimality conditions in (2) have a nice geometric interpretation. Indeed, for $K = C$, $\hat{x}_i$ is the left derivative of the convex minorant of the cumulative sum diagram consisting of the points $P_0 = (0,0)$ and

$$P_j = \left( \sum_{l=1}^j w_l \,,\; \sum_{l=1}^j w_l y_l \right) \quad \text{for } 1 \le j \le n,$$

evaluated at the point $P_i$. If $K = C_+$, the only difference is that the negative components of $\hat{x}$ should be changed to zero. This geometric interpretation of the optimality conditions, when the object function is a quadratic form with a diagonal matrix of second order derivatives and the cone is $C$ or $C_+$, is the backbone of the theory of isotonic regression as it can be found in Barlow et al. (1972) and Robertson et al. (1988).

Let $x^{(0)} \in C$ be fixed and let $k = 0$. The idea behind the ICM algorithm is to approximate the convex function $\phi$ locally near $x^{(k)}$ by a quadratic form of the type

$$q(x; x^{(k)}) = \tfrac{1}{2} \left( x - x^{(k)} + W(x^{(k)})^{-1} \nabla\phi(x^{(k)}) \right)^T W(x^{(k)}) \left( x - x^{(k)} + W(x^{(k)})^{-1} \nabla\phi(x^{(k)}) \right),$$

where $W(x^{(k)})$ is a positive definite diagonal matrix depending on $x^{(k)}$. The next iterate $x^{(k+1)}$ is then defined as the minimizer of $q(\,\cdot\,; x^{(k)})$ over $C$. Incrementing $k$ by one and repeating the procedure gives the iterative algorithm. Since $x^{(k+1)}$ can be determined by taking the left derivative of the convex minorant of the cumulative sum diagram consisting of the points $P_0 = (0,0)$ and

$$P_j = \left( \sum_{l=1}^j w_l^{(k)} \,,\; \sum_{l=1}^j \left( w_l^{(k)} x_l^{(k)} - \frac{\partial\phi}{\partial x_l}(x^{(k)}) \right) \right) \quad \text{for } 1 \le j \le n,$$

where $w_l^{(k)}$ denotes the $l$-th diagonal entry of $W(x^{(k)})$, the name iterative convex minorant algorithm is justified.
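
The left derivative of the convex minorant of a cumulative sum diagram can be computed in $O(n)$ time by the pool adjacent violators algorithm from the isotonic regression literature (Barlow et al. (1972)). A minimal Python/numpy sketch of this computation (the naming is ours, added for illustration):

    import numpy as np

    def min_quadratic_over_C(y, w):
        """Minimize 0.5 * sum_i w_i (x_i - y_i)^2 over C = {x_1 <= ... <= x_n}.
        The block means produced by pooling adjacent violators are exactly the
        left derivatives of the convex minorant of the cusum diagram with
        points P_j = (sum_{l<=j} w_l, sum_{l<=j} w_l y_l)."""
        blocks = []                                    # each block: [weight, mean, length]
        for wi, yi in zip(w, y):
            blocks.append([wi, yi, 1])
            while len(blocks) > 1 and blocks[-2][1] >= blocks[-1][1]:
                w2, m2, c2 = blocks.pop()              # pool the two rightmost blocks
                w1, m1, c1 = blocks.pop()
                blocks.append([w1 + w2, (w1 * m1 + w2 * m2) / (w1 + w2), c1 + c2])
        return np.concatenate([np.full(c, m) for _, m, c in blocks])

    def min_quadratic_over_C_plus(y, w):
        # for K = C_+: change the negative components to zero, as described above
        return np.maximum(min_quadratic_over_C(y, w), 0.0)

With $y$ replaced by $x^{(k)} - W(x^{(k)})^{-1}\nabla\phi(x^{(k)})$ and $w$ by the diagonal of $W(x^{(k)})$, one call to this routine performs a single (unmodified) ICM step as described above.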

3 Description of the algorithm

In this section $K$ denotes one of the cones $C$ and $C_+$ rather than a general cone in $\mathbb{R}^n$, and $\phi$ satisfies Condition 1. An iterative optimization algorithm to approximate

$$\hat{x} = \mathop{\mathrm{argmin}}_{y \in K} \phi(y)$$

is properly specified by an initial point $x^{(0)} \in K$, an algorithmic map $A$ and a termination criterion. An algorithmic map is a mapping $x \mapsto A(x)$ defined on $K$ and taking values in the class of nonempty subsets of $K$. The algorithm can then be formulated as: $k := 0$; while the stopping criterion is not satisfied, $x^{(k+1)} \in A(x^{(k)})$ and $k := k + 1$. Once a continuous mapping $x \mapsto W(x)$ from $K$ to the class of positive definite diagonal matrices (equipped with its usual matrix norm) is specified, the algorithmic map associated with the ICM algorithm is given by

$$B(x) = \mathop{\mathrm{argmin}}_{y \in K} \tfrac{1}{2} \left( y - x + W(x)^{-1}\nabla\phi(x) \right)^T W(x) \left( y - x + W(x)^{-1}\nabla\phi(x) \right),$$

where we adopt the convention of leaving out the curly brackets when the set returned by an algorithmic map is a singleton. Note that, by the continuity of $x \mapsto W(x)$ and Condition 1, the mapping $B$ is continuous at each point $x$ where $\phi(x) < \infty$.

Taking

$$\phi(x) = x_2 - x_1 + \tfrac{1}{2} x_1 x_2 + \tfrac{3}{4}(x_1^2 + x_2^2),$$

$K = C$, $x^{(0)} = (1,1)^T$ and $W \equiv I$, the identity matrix, it follows that an algorithm based on $B$ does not converge in general. (Indeed, $x^{(k)} = (1,1)^T$ for $k$ even and $x^{(k)} = (-1,-1)^T$ for $k$ odd in this example.) Moreover, it may happen that the value of $\phi$ at some iterate is infinite, so that the algorithm is not even well defined. However, and this we will use when we define the modified ICM algorithm, the algorithmic map $B$ generates a direction of descent for $\phi$ at each $x \in K \setminus \{\hat{x}\}$ such that $\phi(x) < \infty$. This result is stated in Lemma 1.

Lemma 1. Let $\phi$ satisfy Condition 1 and let $x \in K \setminus \{\hat{x}\}$ satisfy $\phi(x) < \infty$. Then

$$\phi(x + \lambda(B(x) - x)) < \phi(x)$$

for all $\lambda > 0$ sufficiently small.

Proof: Fix $x \in K \setminus \{\hat{x}\}$ with $\phi(x) < \infty$ and define the function $\psi$ on $[0,1]$ as follows:

$$\psi(\lambda) = \phi(x + \lambda(B(x) - x)).$$

It suffices to show that the right derivative of $\psi$ at zero,

$$\psi'(0) = (B(x) - x)^T \nabla\phi(x),$$

is strictly negative. From the definition of $B(x)$ and the fact that $x \in K$, it follows by (2) that

$$(B(x), W(x)(B(x) - x) + \nabla\phi(x)) = 0 \tag{3}$$

and

$$(x, W(x)(B(x) - x) + \nabla\phi(x)) \ge 0. \tag{4}$$

Subtracting (4) from (3) we see that

$$(B(x) - x, W(x)(B(x) - x)) + \psi'(0) \le 0. \tag{5}$$

Note that the assumption $x \ne \hat{x}$ implies that $x \ne B(x)$. Therefore, since $W(x)$ is positive definite, the first term on the left hand side of (5) is strictly positive, so that $\psi'(0) < 0$, as was to be shown. $\Box$
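
The cycling in the example above is easy to reproduce. For $n = 2$ and $W \equiv I$, $B(x)$ is the projection of $x - \nabla\phi(x)$ onto $C$, which simply averages the two coordinates when they are out of order. The following Python/numpy sketch (our illustration, with our function names) prints iterates alternating between $(1,1)^T$ and $(-1,-1)^T$, even though $\phi$ attains its minimum over $C$ at the origin; by Lemma 1, a small enough step along the segment towards $B(x)$ would nevertheless have decreased $\phi$.

    import numpy as np

    def grad_phi(x):
        # gradient of phi(x) = x2 - x1 + x1*x2/2 + 3*(x1^2 + x2^2)/4
        return np.array([-1.0 + 0.5 * x[1] + 1.5 * x[0],
                          1.0 + 0.5 * x[0] + 1.5 * x[1]])

    def B(x):
        # W = I: project x - grad phi(x) onto C = {x : x_1 <= x_2}
        p = x - grad_phi(x)
        return p if p[0] <= p[1] else np.full(2, p.mean())

    x = np.array([1.0, 1.0])
    for k in range(6):
        print(k, x)            # (1,1), (-1,-1), (1,1), (-1,-1), ...
        x = B(x)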

Using Lemma 1 we can construct an algorithm that converges to $\hat{x}$. The idea behind this modified iterative convex minorant algorithm is to select a point $x^{(k+1)}$ from the segment

$$\mathrm{seg}(x^{(k)}, B(x^{(k)})) = \left\{ x^{(k)} + \lambda (B(x^{(k)}) - x^{(k)}) : \lambda \in [0,1] \right\}$$

such that the value of $\phi$ decreases sufficiently when moving from $x^{(k)}$ to $x^{(k+1)}$. One way to formalize this idea is to define the algorithmic map $C$ by

$$C(x) = \begin{cases} \{B(x)\} & \text{if } \phi(B(x)) < \phi(x) + (1-\varepsilon)\nabla\phi(x)^T(B(x) - x), \\ \left\{ y \in \mathrm{seg}(x, B(x)) : (1-\varepsilon)\nabla\phi(x)^T(y - x) \le \phi(y) - \phi(x) \le \varepsilon\nabla\phi(x)^T(y - x) \right\} & \text{otherwise}, \end{cases} \tag{6}$$

where $\varepsilon \in (0, 1/2)$ is fixed. See Figure 1 for the idea behind the definition of $C$.

[Figure 1: The three possible forms of the set returned by the algorithmic map $C$, in the parametrization $\psi(\lambda) = \phi(x + \lambda(B(x) - x))$.]

To completely specify the algorithm we should fix an initial point, a rule to determine $x^{(k+1)}$ from $C(x^{(k)})$, and a termination criterion. As an initial point we take any $x^{(0)} \in K$ with $\phi(x^{(0)}) < \infty$. As a rule to choose $x^{(k+1)}$ from $C(x^{(k)})$ we propose to take $x^{(k+1)} = B(x^{(k)})$ whenever it belongs to $C(x^{(k)})$, and otherwise to perform a binary search for an element of $C(x^{(k)})$ in the segment $\mathrm{seg}(x^{(k)}, B(x^{(k)}))$. See the pseudo code below for an exact description of this binary search, which can easily be seen to terminate after a finite number of steps. Finally, we base our stopping criterion on (2), where we use that for $C$ the inequality part of (2) is equivalent to the conditions

$$\sum_{i=j}^n \frac{\partial\phi}{\partial x_i}(\hat{x}) \;\begin{cases} \ge 0 & \text{for } 1 \le j \le n, \\ = 0 & \text{for } j = 1. \end{cases}$$

Below we give a formal description of the algorithm obtained in this way (for $K = C$).

Modified iterative convex minorant algorithm

Input: $\eta > 0$: accuracy parameter; $\varepsilon \in (0, 1/2)$: line search parameter; $x^{(0)} \in K$: initial point satisfying $\phi(x^{(0)}) < \infty$;

    begin
      x := x^(0);
      while |sum_{i=1..n} x_i dphi/dx_i(x)| > eta
         or |sum_{i=1..n} dphi/dx_i(x)| > eta
         or min_{1<=j<=n} sum_{i=j..n} dphi/dx_i(x) < -eta do
      begin
        y~ := argmin_{y in K} (y - x + W(x)^{-1} grad phi(x))^T W(x) (y - x + W(x)^{-1} grad phi(x));
        if phi(y~) < phi(x) + (1 - eps) grad phi(x)^T (y~ - x)
        then x := y~
        else begin
          lambda := 1; s := 1/2; z := y~;
          while phi(z) < phi(x) + (1 - eps) grad phi(x)^T (z - x)   (I)
             or phi(z) > phi(x) + eps grad phi(x)^T (z - x)         (II) do
          begin
            if (I) then lambda := lambda + s;
            if (II) then lambda := lambda - s;
            z := x + lambda (y~ - x);
            s := s/2;
          end;
          x := z;
        end;
      end;
    end.

If the algorithm is used to minimize $\phi$ over $C_+$, then $C$ should be replaced by $C_+$ throughout the algorithm and the second condition in the first while statement should be removed.
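
In code, the inner while loop is a bisection search of Goldstein type on the segment $\mathrm{seg}(x, B(x))$. The following Python/numpy sketch of one iteration of the modified algorithm is our illustration (the names and the cap on the number of halvings are ours; B_map stands for any implementation of the map $B$, e.g. the pool adjacent violators sketch of section 2):

    import numpy as np

    def modified_icm_step(phi, grad, B_map, x, eps=0.1, max_halvings=60):
        """One step of the modified ICM algorithm: accept B(x) under the
        sufficient-decrease test, otherwise bisect for a point z on
        seg(x, B(x)) with (1-eps)*lam*d <= phi(z) - phi(x) <= eps*lam*d,
        where d = grad(x)'(B(x) - x) < 0 by Lemma 1."""
        Bx = B_map(x)
        d = grad(x) @ (Bx - x)
        if phi(Bx) < phi(x) + (1.0 - eps) * d:
            return Bx
        lam, s, z = 1.0, 0.5, Bx
        for _ in range(max_halvings):
            gain = phi(z) - phi(x)
            if gain < (1.0 - eps) * lam * d:    # (I): decrease still very large, move out
                lam += s
            elif gain > eps * lam * d:          # (II): decrease too small, move back
                lam -= s
            else:                               # z lies in the set C(x) of (6)
                break
            z = x + lam * (Bx - x)
            s /= 2.0
        return z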

In the next section we prove that under mild conditions the modified ICM algorithm generates a sequence $x^{(k)}$ such that $x^{(k)} \to \hat{x}$ as $k \to \infty$.

4 Convergence of the modified ICM algorithm

To prove that the modified ICM algorithm converges to the point $\hat{x}$ we will use a general convergence theorem (cf. Bazaraa et al. (1993), theorem 7.2.3, or Zangwill (1969), page 91; curiously enough, this theorem is also used in Wu (1983) to prove global convergence of the EM algorithm). This theorem assures convergence of the algorithm based on an algorithmic map $A$ under three conditions. The first is that the sequence of iterates generated by the algorithm is contained in a compact subset $\bar{K}$ of $K$. The second is that there exists a descent function: a continuous function $\zeta$ on $\bar{K}$ such that $\zeta(y) < \zeta(x)$ for all $y \in A(x)$ whenever $x \ne \hat{x}$. The third condition is that the algorithmic map $A$ is closed. This means that if $(x_k)$ and $(y_k)$ are sequences in $\bar{K}$ satisfying $x_k \to x$, $y_k \in A(x_k)$ and $y_k \to y$, then necessarily $y \in A(x)$.

Theorem 1. Let the function $\phi : \mathbb{R}^n \to (-\infty, \infty]$ satisfy Condition 1 and let $x^{(0)} \in K$ satisfy $\phi(x^{(0)}) < \infty$. Let the mapping $x \mapsto W(x)$ take values in the set of positive definite $(n \times n)$ diagonal matrices, such that $x \mapsto W(x)$ is continuous on the set

$$\bar{K} = \{x \in K : \phi(x) \le \phi(x^{(0)})\}. \tag{7}$$

Then an algorithm generated by the mapping $C$, as defined in (6), converges to $\hat{x}$.

Proof: From Lemma 1 it follows that the mapping $C$ is well defined and has $\phi$ as a descent function: for all $x \ne \hat{x}$ and for all $y \in C(x)$, $\phi(y) < \phi(x)$. From this observation it follows that

$$\{x^{(k)} : k \ge 0\} \subset \bar{K},$$

where $\bar{K}$ is as defined in (7). From parts (1) and (2) of Condition 1 and the fact that $\phi(x^{(0)}) < \infty$, it follows that $\bar{K}$ is compact. Therefore, in view of the remarks made above, closedness of $C$ at each $x \in \bar{K} \setminus \{\hat{x}\}$ would imply global convergence of the algorithm.

Fix $x \in \bar{K} \setminus \{\hat{x}\}$ and a sequence $(x_k)$ in $\bar{K}$ such that $x_k \to x$. Let $y_k \in C(x_k)$ with $y_k \to y$ for some $y \in \bar{K}$. To prove closedness of $C$ we have to prove that $y \in C(x)$. First note that continuity of the mapping $x \mapsto W(x)$ on $\bar{K}$ and part (3) of Condition 1 yield that

$$B(x_k) \to B(x) \quad \text{and} \quad \nabla\phi(x_k) \to \nabla\phi(x) \tag{8}$$

as $k \to \infty$. From this it follows that necessarily $y \in \mathrm{seg}(x, B(x))$. Now consider the two different situations that can occur. The first situation is that

$$\phi(B(x_k)) < \phi(x_k) + (1-\varepsilon)\nabla\phi(x_k)^T(B(x_k) - x_k)$$

for infinitely many values of $k$. Letting $k$ tend to infinity along a subsequence $(k_j)$ where this inequality holds, we get from (8) that

$$\phi(B(x)) \le \phi(x) + (1-\varepsilon)\nabla\phi(x)^T(B(x) - x),$$

so that $B(x) \in C(x)$. Moreover, along the same subsequence it follows from the definition of $C$ that $y_{k_j} = B(x_{k_j})$. Therefore, for $j \to \infty$, $y_{k_j} \to B(x)$ by the continuity of $B$. This shows that $y = B(x) \in C(x)$, as was to be proved. The other possibility is that for all $k$ sufficiently large

$$\phi(B(x_k)) \ge \phi(x_k) + (1-\varepsilon)\nabla\phi(x_k)^T(B(x_k) - x_k).$$

Letting $k \to \infty$ and using (8), it then follows that

$$\phi(B(x)) \ge \phi(x) + (1-\varepsilon)\nabla\phi(x)^T(B(x) - x).$$

Therefore, according to the definition of $C$ and the fact that $y \in \mathrm{seg}(x, B(x))$, we have $y \in C(x)$ whenever

$$\phi(y) - \phi(x) \in \left[ (1-\varepsilon)\nabla\phi(x)^T(y - x),\; \varepsilon\nabla\phi(x)^T(y - x) \right].$$

This, however, immediately follows from the fact that for all $k$ sufficiently large

$$\phi(y_k) - \phi(x_k) \in \left[ (1-\varepsilon)\nabla\phi(x_k)^T(y_k - x_k),\; \varepsilon\nabla\phi(x_k)^T(y_k - x_k) \right],$$

together with $x_k \to x$, $y_k \to y$ and $\nabla\phi(x_k) \to \nabla\phi(x)$. $\Box$
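
To illustrate Theorem 1 on the counterexample of section 3: with the line search in place, the iterates started at $(1,1)^T$ no longer cycle but converge to $\hat{x} = (0,0)^T$ (in this particular example they even reach it after two steps). The self-contained Python/numpy sketch below is our illustration; $\varepsilon = 0.1$ is the line search parameter also used in section 5.

    import numpy as np

    def phi(x):
        return x[1] - x[0] + 0.5 * x[0] * x[1] + 0.75 * (x[0]**2 + x[1]**2)

    def grad_phi(x):
        return np.array([-1.0 + 0.5 * x[1] + 1.5 * x[0],
                          1.0 + 0.5 * x[0] + 1.5 * x[1]])

    def B(x):
        p = x - grad_phi(x)                    # W = I
        return p if p[0] <= p[1] else np.full(2, p.mean())

    def step(x, eps=0.1):
        # one step of the modified ICM algorithm, mirroring the pseudo code
        Bx = B(x)
        d = grad_phi(x) @ (Bx - x)
        if phi(Bx) < phi(x) + (1 - eps) * d:
            return Bx
        lam, s, z = 1.0, 0.5, Bx
        while (phi(z) - phi(x) < (1 - eps) * lam * d or
               phi(z) - phi(x) > eps * lam * d):
            lam = lam + s if phi(z) - phi(x) < (1 - eps) * lam * d else lam - s
            z = x + lam * (Bx - x)
            s /= 2.0
        return z

    x = np.array([1.0, 1.0])
    for k in range(10):
        x = step(x)
    print(x, phi(x))                           # [0. 0.] 0.0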

5 Example

Let $z_1 < z_2 < \cdots < z_n$ denote an ordered realization of a sample from a density $g$ on $[0, \infty)$ which is known to be convex and decreasing, and define $z_{-1} = z_0 = 0$. Consider the problem of estimating $g$ from the data. This estimation problem can be found in Hampel (1987). In Groeneboom & Jongbloed (1995) a sieved nonparametric maximum likelihood estimator for $g$ is defined. This estimator is the maximizer of the function

$$g \mapsto n^{-1} \sum_{i=0}^{n-1} \log g(z_i)$$

over the class of convex decreasing densities $g$ on $[0, \infty)$ which are piecewise linear such that all the jumps in the derivative of $g$ are concentrated at the observation points. Therefore, defining $x_i = g(z_{n-i})$ for $0 \le i \le n$ (note that $x_0 = g(z_n) = 0$, since an integrable piecewise linear convex decreasing density necessarily vanishes at and beyond $z_n$), we see that this class of densities can be identified with the intersection of the closed convex cone $K$,

$$K = \left\{ x \in \mathbb{R}^n : x_1 \ge 0 \text{ and } \frac{x_i - x_{i-1}}{z_{n-i+1} - z_{n-i}} \le \frac{x_{i+1} - x_i}{z_{n-i} - z_{n-i-1}} \text{ for } 1 \le i \le n-1 \right\}, \tag{9}$$

and the affine subspace $A$,

$$A = \left\{ x \in \mathbb{R}^n : \tfrac{1}{2} \sum_{i=1}^n x_i (z_{n-i+1} - z_{n-i-1}) = 1 \right\},$$

in $\mathbb{R}^n$, where $A$ takes into account the fact that densities integrate to one. The problem of determining the maximum likelihood estimator is therefore equivalent to the problem of determining

$$\hat{x} = \mathop{\mathrm{argmin}}_{x \in K \cap A} \; -\sum_{i=1}^n \log x_i.$$

Lemma 2 shows that this problem is equivalent to minimizing a smooth strictly convex function over the whole cone $K$ rather than over the intersection of $K$ with the affine subspace $A$.

Lemma 2. Let $K$ be a cone in $\mathbb{R}^n$ and let the function $\phi$ be defined by

$$\phi(x) = -\sum_{i=1}^n \log x_i.$$

Let $c \ne 0$ be a vector in $\mathbb{R}^n$ and let $A$ be the affine subset of $\mathbb{R}^n$ given by $A = \{x \in \mathbb{R}^n : c^T x = \alpha\}$ for some given $\alpha \ne 0$. Then

$$\mathop{\mathrm{argmin}}_{K \cap A} \phi(x) = \mathop{\mathrm{argmin}}_K \left[ \phi(x) + \frac{n}{\alpha} c^T x \right].$$

Proof: Include the linear restriction $c^T x = \alpha$ in the object function via the Lagrangian multiplier $\lambda$ to obtain the function

$$\phi_\lambda(x) = \phi(x) + \lambda(c^T x - \alpha).$$

On $K \cap A$ this function coincides with $\phi$. When $\hat{x}_\lambda$ minimizes $\phi_\lambda$ over $K$ and $c^T \hat{x}_\lambda = \alpha$, then $\hat{x}_\lambda$ evidently minimizes $\phi$ over $K \cap A$. From the structure of $\phi$ together with the equality part of (2), it follows that

$$\sum_{i=1}^n \hat{x}_{\lambda,i} \left( -\frac{1}{\hat{x}_{\lambda,i}} + \lambda c_i \right) = 0,$$

so that we have

$$c^T \hat{x}_\lambda = \frac{n}{\lambda}.$$

Taking $\lambda = n/\alpha$, it is therefore clear that

$$\mathop{\mathrm{argmin}}_{K \cap A} \phi(x) = \mathop{\mathrm{argmin}}_K \phi_{n/\alpha}(x) = \mathop{\mathrm{argmin}}_K \left[ \phi(x) + \frac{n}{\alpha} c^T x \right],$$

which was to be proved. $\Box$

According to this lemma,

$$\hat{x} = \mathop{\mathrm{argmin}}_{x \in K} \sum_{i=1}^n \left\{ -\log x_i + \frac{n}{2}\, x_i (z_{n-i+1} - z_{n-i-1}) \right\}.$$

Noting that $x \in K$ if and only if

$$x_i = \sum_{j=1}^i (z_{n-j+1} - z_{n-j})\, y_j \quad \text{for some } y \in C_+, \tag{10}$$

we see that determining $\hat{x}$ is equivalent to determining $\hat{y} = \mathrm{argmin}_{y \in C_+} \psi(y)$, where

$$\psi(y) = -\sum_{i=1}^n \left\{ \log\left( \sum_{j=1}^i (z_{n-j+1} - z_{n-j})\, y_j \right) - \frac{n}{2} (z_{n-i+1} - z_{n-i-1}) \sum_{j=1}^i (z_{n-j+1} - z_{n-j})\, y_j \right\}$$
$$= -\sum_{i=1}^n \left\{ \log\left( \sum_{j=1}^i (z_{n-j+1} - z_{n-j})\, y_j \right) - \frac{n}{2}\, y_i (z_{n-i+1}^2 - z_{n-i}^2) \right\}. \tag{11}$$
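
The reparametrized objective (11) is straightforward to evaluate numerically. The Python/numpy sketch below is our illustration (our indexing conventions: 0-based arrays, with $z_0 = 0$ prepended to the sorted sample), followed by an evaluation at a constant slope vector for a simulated sample from the density $g(z) = 3(1-z)^2$ on $[0,1]$ used in the example below:

    import numpy as np

    def psi(y, z):
        """Evaluate (11): psi(y) = -sum_i log x_i + (n/2) sum_i y_i (z_{n-i+1}^2 - z_{n-i}^2),
        where x_i = sum_{j <= i} (z_{n-j+1} - z_{n-j}) y_j as in (10)."""
        n = len(z)
        zz = np.concatenate(([0.0], np.sort(z)))     # zz[k] = z_k, with z_0 = 0
        dz = zz[n:0:-1] - zz[n-1::-1]                # dz[j-1] = z_{n-j+1} - z_{n-j}
        d2 = zz[n:0:-1]**2 - zz[n-1::-1]**2          # d2[i-1] = z_{n-i+1}^2 - z_{n-i}^2
        x = np.cumsum(dz * y)                        # the density values (10)
        return -np.log(x).sum() + 0.5 * n * (y * d2).sum()

    # e.g., a sample of size 1000 from g(z) = 3(1-z)^2 on [0,1], by inversion
    rng = np.random.default_rng(0)
    z = 1.0 - (1.0 - rng.uniform(size=1000)) ** (1.0 / 3.0)
    print(psi(np.full(1000, 2.0), z))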

Figure 2 shows the maximum likelihood estimate of a convex decreasing density on $[0, \infty)$ based on a sample of size $n = 1000$ from the density

$$g(z) = 3(1-z)^2 \, 1_{[0,1]}(z),$$

computed by the modified ICM algorithm. We used the settings $\eta = 10^{-5}$, $\varepsilon = 0.1$, $y_i^{(0)} = 2$ ($1 \le i \le n$) and the weights

$$w(y)_i = \frac{z_{n-i+1} - z_{n-i}}{y_i} \sum_{j=i}^n \frac{1}{x_j} \quad (1 \le i \le n),$$

where $x$ depends on $y$ as in (10). On a NeXTSTEP machine the algorithm stopped after 105 iterations.

[Figure 2: Maximum likelihood estimate of the density based on a sample of size 1000; the dashed curve is the underlying density.]

References

[1] Aragon, J. and Eberly, D. (1992). On convergence of convex minorant algorithms for distribution estimation with interval-censored data. J. Comput. Graph. Statist. 1, 129-140.

[2] Barlow, R.E., Bartholomew, D.J., Bremner, J.M. and Brunk, H.D. (1972). Statistical Inference under Order Restrictions. Wiley, New York.

[3] Bazaraa, M.S., Sherali, H.D. and Shetty, C.M. (1993). Nonlinear Programming: Theory and Algorithms. Wiley, New York.

[4] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B 39, 1-38.

[5] Groeneboom, P. and Jongbloed, G. (1995). Maximum likelihood estimation of a convex decreasing density. In preparation.

[6] Groeneboom, P. and Wellner, J.A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation. Birkhäuser, Basel.

[7] Hampel, F.R. (1987). Design, modelling, and analysis of some biological data sets. In: C.L. Mallows (ed.), Design, Data and Analysis, by Some Friends of Cuthbert Daniel. Wiley, New York.

[8] Jongbloed, G. (1995). Three Statistical Inverse Problems. Ph.D. thesis, Delft University of Technology, The Netherlands.

[9] Robertson, T., Wright, F.T. and Dykstra, R.L. (1988). Order Restricted Statistical Inference. Wiley, New York.

[10] Terlaky, T. and Vial, J.-Ph. (1995). Maximum likelihood estimation of convex density functions. Technical Report 95-49, Department of Mathematics, Delft University of Technology.

[11] Wu, C.F.J. (1983). On the convergence properties of the EM algorithm. Ann. Statist. 11, 95-103.

[12] Zangwill, W.I. (1969). Nonlinear Programming: A Unified Approach. Prentice-Hall, Englewood Cliffs, New Jersey.

[13] Zhan, Y. and Wellner, J.A. (1995). Double censoring: characterization and computation of the nonparametric maximum likelihood estimator. To appear as Technical Report, Department of Statistics, University of Washington.