SQUAREM. An R package for Accelerating Slowly Convergent Fixed-Point Iterations Including the EM and MM algorithms.


1 An R package for Accelerating Slowly Convergent Fixed-Point Iterations Including the EM and MM algorithms
Ravi
Division of Geriatric Medicine & Gerontology, Johns Hopkins University, Baltimore, MD, USA
UseR! 2010, NIST, Gaithersburg, MD, July 22, 2010

2 Speed Is Not All That It's Cranked Up To Be
"Evil deeds do not prosper; the slow man catches up with the swift" - Homer (Odyssey)

3 What is a Fixed-Point Iteration?
$x_{k+1} = F(x_k)$, $k = 0, 1, \ldots$
$F : \Omega \subset \mathbb{R}^p \to \Omega$, and differentiable
Most (if not all) iterations are FPI
We are interested in contractive FPI
Guaranteed convergence: $\{x_k\} \to x^*$
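The iteration on this slide can be sketched in a few lines of base R. This is only an illustration: the helper name `fixed_point` and its arguments `tol` and `max_iter` are assumptions made for the example, and the scalar map `cos` is chosen simply because it is contractive near its fixed point.

```r
# Generic fixed-point iteration x_{k+1} = F(x_k).
# `fixed_point`, `tol`, and `max_iter` are illustrative names, not package API.
fixed_point <- function(fixptfn, x0, tol = 1e-10, max_iter = 1000) {
  x <- x0
  for (k in seq_len(max_iter)) {
    x_new <- fixptfn(x)
    # Stop when successive iterates are closer than tol (Euclidean norm).
    if (sqrt(sum((x_new - x)^2)) <= tol) {
      return(list(par = x_new, iter = k, converged = TRUE))
    }
    x <- x_new
  }
  list(par = x, iter = max_iter, converged = FALSE)
}

# cos() is a contraction near its fixed point, so the iterates converge.
res <- fixed_point(cos, x0 = 1)
print(res$par)
```

The returned `par` satisfies `cos(par) = par` up to the tolerance; the number of iterations needed is governed by the size of the derivative of F at the fixed point.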

4 EM Algorithm
Let y, z, x be the observed, missing, and complete data, respectively.
The k-th step of the iteration:
$\theta_{k+1} = \operatorname{argmax}_{\theta} Q(\theta \mid \theta_k)$, $k = 0, 1, \ldots$,
where
$Q(\theta \mid \theta_k) = E[L_c(\theta) \mid y, \theta_k] = \int L_c(\theta)\, f(z \mid y, \theta_k)\, dz$
Ascent property: $L_{obs}(\theta_{k+1}) \ge L_{obs}(\theta_k)$
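As a concrete instance of the EM map, here is a minimal sketch for a two-component normal mixture with unit variances. The simulated data, the starting values, and the function names `em_step` and `loglik` are all assumptions made for this example; the point is only to show the E-step and M-step as one fixed-point map and to record the observed-data log-likelihood so the ascent property can be checked.

```r
# Hypothetical toy data: two-component normal mixture, unit variances.
set.seed(1)
y <- c(rnorm(60, mean = 0), rnorm(40, mean = 3))

# Observed-data log-likelihood; theta = (mixing proportion, mean 1, mean 2).
loglik <- function(theta, y) {
  p <- theta[1]; mu1 <- theta[2]; mu2 <- theta[3]
  sum(log(p * dnorm(y, mu1) + (1 - p) * dnorm(y, mu2)))
}

# One EM update, written as a fixed-point map theta -> F(theta).
em_step <- function(theta, y) {
  p <- theta[1]; mu1 <- theta[2]; mu2 <- theta[3]
  # E-step: posterior probability that each observation is from component 1.
  w <- p * dnorm(y, mu1) / (p * dnorm(y, mu1) + (1 - p) * dnorm(y, mu2))
  # M-step: weighted proportion and weighted means.
  c(mean(w), sum(w * y) / sum(w), sum((1 - w) * y) / sum(1 - w))
}

theta <- c(0.5, -1, 1)
ll <- loglik(theta, y)
for (k in 1:50) {
  theta <- em_step(theta, y)
  ll <- c(ll, loglik(theta, y))
}
print(theta)
```

The vector `ll` should be non-decreasing from one iteration to the next, which is the ascent property stated on the slide.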

5 MM Algorithm
A majorizing function, $g(\theta \mid \theta_k)$:
$f(\theta_k) = g(\theta_k \mid \theta_k)$, $f(\theta) \le g(\theta \mid \theta_k)$, $\forall \theta$.
To minimize $f(\theta)$, construct a majorizing function and minimize it (MM):
$\theta_{k+1} = \operatorname{argmin}_{\theta} g(\theta \mid \theta_k)$, $k = 0, 1, \ldots$
Descent property: $f(\theta_{k+1}) \le f(\theta_k)$
Is EM a subclass of MM or are they equivalent?
MM avoids the E-step.
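A small worked example may help here. The sketch below minimizes $f(\theta) = \sum_i |y_i - \theta|$ (whose minimizer is the sample median) by majorizing each absolute value with a quadratic, which turns every MM step into a weighted mean. The data vector, the name `mm_step`, and the guard `eps` against division by zero are assumptions of this illustration, not something taken from the talk.

```r
# Hypothetical data; the minimizer of f is the sample median (4 here).
y <- c(1, 2, 4, 7, 20)

# Objective to minimize.
f <- function(theta) sum(abs(y - theta))

# One MM step: |r| is majorized at r_k by r^2 / (2 |r_k|) + |r_k| / 2,
# so minimizing the surrogate gives a weighted mean with weights 1 / |r_k|.
# eps guards against division by zero when theta hits a data point.
mm_step <- function(theta, eps = 1e-8) {
  w <- 1 / pmax(abs(y - theta), eps)
  sum(w * y) / sum(w)
}

theta <- mean(y)
obj <- f(theta)
for (k in 1:100) {
  theta <- mm_step(theta)
  obj <- c(obj, f(theta))
}
print(theta)
```

The recorded objective values in `obj` should not increase, illustrating the descent property, and no expectation is computed at any point.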

6 Least Squares Multidimensional Scaling
Minimize: $\sigma(X) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \left(\delta_{ij} - d_{ij}(X)\right)^2$
over all $n \times p$ matrices X, where: $d_{ij}(X) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$
Jan de Leeuw's SMACOF algorithm: $\xi_{k+1} = F(\xi_k)$
Has descent property: $\sigma(\xi_{k+1}) < \sigma(\xi_k)$
An instance of MM algorithm
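To make the SMACOF map concrete, the following is a rough base-R sketch for the special case of unit weights ($w_{ij} = 1$), where the update reduces to the Guttman transform $X \leftarrow n^{-1} B(X) X$. The target dissimilarities, the random start, and the names `stress` and `guttman_step` are assumptions for illustration; this is not the code used for the results in the talk.

```r
# Stress with unit weights: 0.5 * sum over all (i, j) of (delta_ij - d_ij(X))^2.
stress <- function(X, delta) {
  d <- as.matrix(dist(X))
  0.5 * sum((delta - d)^2)
}

# One SMACOF step (Guttman transform) for unit weights.
guttman_step <- function(X, delta) {
  n <- nrow(X)
  d <- as.matrix(dist(X))
  B <- -delta / d            # off-diagonal entries: -delta_ij / d_ij
  B[!is.finite(B)] <- 0      # diagonal (0/0) and any zero distances
  diag(B) <- -rowSums(B)     # make each row sum to zero
  (B %*% X) / n
}

# Hypothetical target: distances among five points in the plane.
delta <- as.matrix(dist(cbind(c(0, 1, 2, 3, 5), c(0, 2, 1, 4, 3))))

set.seed(42)
X <- matrix(rnorm(10), nrow = 5, ncol = 2)
s <- stress(X, delta)
for (k in 1:200) {
  X <- guttman_step(X, delta)
  s <- c(s, stress(X, delta))
}
print(range(s))
```

The sequence `s` should never increase, which is the descent property on the slide.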

7 BLP Contraction Mapping
Previous talk!

8 Power Method
To find the eigenvector corresponding to the largest (in magnitude) eigenvalue of an $n \times n$ matrix, A.
Not all that academic - Google's PageRank algorithm!
$x_{k+1} = A x_k / \|A x_k\|$
Stop if $\|x_{k+1} - x_k\| \le \varepsilon$
Dominant eigenvalue (Rayleigh quotient) $= \langle A x^*, x^* \rangle / \langle x^*, x^* \rangle$
Geometric convergence with rate $|\lambda_1| / |\lambda_2|$
Power method does not converge if $|\lambda_1| = |\lambda_2|$, but SQUAREM does!
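The recipe above translates almost line by line into base R. In this sketch the function name `power_method`, the tolerance `eps`, and the small test matrix are assumptions chosen for illustration (the matrix has eigenvalues 3 and 1, so the iteration converges quickly).

```r
# Power method: x_{k+1} = A x_k / ||A x_k||, stop when ||x_{k+1} - x_k|| <= eps.
power_method <- function(A, x0, eps = 1e-10, max_iter = 10000) {
  x <- x0 / sqrt(sum(x0^2))
  for (k in seq_len(max_iter)) {
    y <- as.vector(A %*% x)
    x_new <- y / sqrt(sum(y^2))
    step <- sqrt(sum((x_new - x)^2))
    x <- x_new
    if (step <= eps) break
  }
  # Rayleigh quotient estimate of the dominant eigenvalue.
  lambda <- sum(x * as.vector(A %*% x)) / sum(x * x)
  list(vector = x, value = lambda, iter = k)
}

# Hypothetical 2 x 2 example with eigenvalues 3 and 1.
A <- matrix(c(2, 1, 1, 2), nrow = 2)
res <- power_method(A, x0 = c(1, 0))
print(res$value)
```

With this stopping rule the iteration is only appropriate when the dominant eigenvalue is positive; a negative dominant eigenvalue makes successive iterates flip sign.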

9 Why Accelerate Convergence?
These FPI are globally convergent
Convergence is linear: Rate = $[\rho(J(x^*))]^{-1}$
Slow convergence when the spectral radius, $\rho(J(x^*))$, is large (close to 1)
Need to be accelerated for practical application
Without compromising on global convergence
Without additional information (e.g. gradient, Hessian, Jacobian)
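Linear convergence can be seen numerically by tracking the ratio of successive errors. The toy map below, $F(x) = 0.9x + 1$ with fixed point $x^* = 10$, is an assumption made purely for illustration; its Jacobian is the constant 0.9, so every error ratio should equal 0.9.

```r
# Hypothetical scalar contraction with Jacobian 0.9 and fixed point 10.
fp_map <- function(x) 0.9 * x + 1
x_star <- 10

x <- 0
err <- abs(x - x_star)
for (k in 1:30) {
  x <- fp_map(x)
  err <- c(err, abs(x - x_star))
}

# Ratio of successive errors: constant and equal to the spectral radius.
ratios <- err[-1] / err[-length(err)]
print(ratios[1:5])
```

After 30 iterations the error has shrunk only by a factor of about $0.9^{30} \approx 0.04$, which is why maps with a spectral radius near 1 are candidates for acceleration.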

10 R Package
An R package implementing a family of algorithms for speeding up any slowly convergent multivariate sequence
Easy to use
Ideal for high-dimensional problems
Input: fixptfn = fixed-point mapping F
Optional input: objfn = objective function (if any)
Two main control parameter choices: order of extrapolation and monotonicity
Available on R-forge under the optimizer project.
install.packages(, repos = )
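A rough sketch of how a call might look is given below. It assumes the released package exposes a `squarem()` function that takes a starting value `par`, the mapping `fixptfn`, extra arguments passed on to the mapping, and a `control` list, and that the result carries `par` and `fpevals`; the map `power_step`, the matrix `A`, and the start `x0` are hypothetical. The accelerated call is only attempted if the package is installed, and a plain fixed-point loop is run alongside for reference.

```r
# Hypothetical fixed-point map: one normalized power-method step for matrix A.
power_step <- function(x, A) {
  y <- as.vector(A %*% x)
  y / sqrt(sum(y^2))
}

A  <- matrix(c(4, 1, 0,
               1, 3, 1,
               0, 1, 2), nrow = 3, byrow = TRUE)
x0 <- rep(1, 3) / sqrt(3)

# Plain fixed-point iteration, for reference.
x <- x0
n_plain <- 0
repeat {
  x_new <- power_step(x, A)
  n_plain <- n_plain + 1
  done <- sqrt(sum((x_new - x)^2)) <= 1e-10
  x <- x_new
  if (done || n_plain >= 10000) break
}

# Accelerated run; only attempted when the SQUAREM package is installed.
if (requireNamespace("SQUAREM", quietly = TRUE)) {
  res <- SQUAREM::squarem(par = x0, fixptfn = power_step, A = A,
                          control = list(tol = 1e-10))
  print(c(plain = n_plain, squarem = res$fpevals))
}
```

No objective function is supplied here because the power method has none; when one exists it can be passed as `objfn`, which is what allows the monotonicity control mentioned on the slide to take effect.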

11 Upshot
SQUAREM works great!
Significant acceleration (depends on the linear rate of F)
Globally convergent (especially, first-order locally non-monotonic schemes)
Finds the same or (sometimes) better fixed-points than FPI (e.g. EM, SMACOF, Power method)

12 SMACOF
Morse code data (de Leeuw 2008).
36 Morse signals compared; dissimilarities & 69 parameters
Table: A comparison of the different schemes.
Columns: Scheme | # Fevals | # ObjEvals | CPU (sec) | ObjfnValue
Rows: SMACOF, SQ, SQ, SQ, SQ3*

13 Power Method - Part I
Generated an (arbitrary) matrix with eigenvalues as follows:
eigvals <- c(2, 1.99, runif(997, 0, 1.9), -1.8)
A cool algorithm using the Soules matrix!
Table: A comparison of the different schemes: Average of 100 simulations
Columns: Scheme | # Fevals | CPU (sec) | Converged
Rows: Power, SQ, SQ, SQ
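One simple way to build a symmetric test matrix with a prescribed spectrum is sketched below. Note the swap: the talk uses a construction based on the Soules matrix, whereas this sketch uses a random orthogonal matrix from a QR decomposition, and it uses a smaller dimension (`n = 50`) than the slide so that it runs instantly. Both choices, and the seed, are assumptions of the illustration.

```r
set.seed(123)
n <- 50

# Same pattern of eigenvalues as on the slide, but for a smaller matrix.
eigvals <- c(2, 1.99, runif(n - 3, 0, 1.9), -1.8)

# Random orthogonal matrix Q from the QR decomposition of a Gaussian matrix
# (this replaces the Soules-matrix construction mentioned in the talk).
Q <- qr.Q(qr(matrix(rnorm(n * n), n, n)))

# A = Q diag(eigvals) Q^T has exactly the prescribed eigenvalues.
A <- Q %*% diag(eigvals) %*% t(Q)
A <- (A + t(A)) / 2   # remove rounding asymmetry

print(range(eigen(A, symmetric = TRUE)$values))
```

Because the two largest eigenvalues are 2 and 1.99, the plain power method on such a matrix contracts by only a factor of 1.99/2 per step, which makes it a demanding test for an accelerator.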

14 Power Method - Part II
Generated an (arbitrary) matrix with eigenvalues as follows:
eigvals <- c(2, 1.99, runif(97, 0, 1.9), -2)
Table: A comparison of the different schemes: Average of 100 simulations
Columns: Scheme | # Fevals | CPU (sec) | Converged
Rows: Power, SQ, SQ, SQ
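The difficulty in this setting is that the eigenvalues 2 and -2 tie in magnitude. A tiny base-R illustration, using a hypothetical diagonal matrix rather than the one from the talk, shows the plain power iterates oscillating forever instead of settling down.

```r
# Hypothetical matrix with eigenvalues 2, -2, 1: two eigenvalues tie in magnitude.
A <- diag(c(2, -2, 1))
x <- c(1, 1, 1) / sqrt(3)

steps <- numeric(200)
for (k in 1:200) {
  y <- as.vector(A %*% x)
  x_new <- y / sqrt(sum(y^2))
  steps[k] <- sqrt(sum((x_new - x)^2))  # distance between successive iterates
  x <- x_new
}

# The step size does not shrink: the second coordinate flips sign every iteration.
print(tail(steps, 3))
```

The distance between successive iterates stays near $\sqrt{2}$, so the stopping rule $\|x_{k+1} - x_k\| \le \varepsilon$ is never met. This sketch only demonstrates the failure of the plain iteration; it does not itself show the accelerated schemes.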

