
On Bivariate Hensel Lifting and its Parallelization

Laurent Bernardin
Institut für Wissenschaftliches Rechnen, ETH Zürich
bernardin@inf.ethz.ch

Abstract. We present a new parallel algorithm for performing linear Hensel lifting of bivariate polynomials over a finite field. The sequential version of our algorithm has a running time of O(mn^4) for lifting m univariate polynomials of degree n with respect to a bivariate polynomial of degree n in both variables, assuming that we use classical polynomial multiplication. Our parallel algorithm further reduces this complexity to O(mn^4/s) on s processing nodes, assuming that s < n. We also present an asymptotically faster algorithm, which has a complexity of O((ln m) n^2 ln n) operations in the coefficient field, using fast polynomial multiplication and O(n ln m) processors. Experimental results on a massively parallel, distributed memory machine confirm that our algorithm scales well on high numbers of processing nodes.

1 Introduction

Given polynomials f_1, ..., f_m ∈ F_p[x], pairwise relatively prime, and a primitive, square-free polynomial f ∈ F_p[x, y] such that

    f ≡ ∏_{i=1}^m f_i  (mod y),

bivariate Hensel lifting aims to construct, for a given bound k, polynomials f_1^{(k)}, ..., f_m^{(k)} ∈ F_p[x, y] such that

    (1)  ∀i: f_i^{(k)} ≡ f_i  (mod y)
    (2)  f ≡ ∏_{i=1}^m f_i^{(k)}  (mod y^k)

If k is sufficiently large, the f_i^{(k)} obtained can be used to compute a factorization of f over F_p.

We restrict ourselves to the case of bivariate polynomials over a finite field. In practice, bivariate polynomials are common. Moreover, state-of-the-art algorithms for factoring polynomials in more than two variables rely on multiple factorizations of bivariate polynomials [4, 5]. For these reasons, it is important to have a fast way of lifting bivariate polynomials. Parallel factorization algorithms for sparse multivariate polynomials have been presented in [9]. We use a dense lifting approach, which is most effective for bivariate polynomials. Only as the number of variables increases does it become more and more important to use sparse techniques to prevent exponential behavior in the number of variables. As mentioned above, we restrict ourselves to polynomials over finite fields, although the same ideas can be applied over any ring.
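To make the setting concrete, here is a minimal, self-contained Python sketch (illustrative only; the paper's implementation is in Maple) that stores a bivariate polynomial over F_p as a list of univariate coefficient lists indexed by the power of y, and checks the starting congruence f ≡ f_1 f_2 (mod y) on a toy example. The helper names and the toy polynomial are assumptions made for illustration.

    # Toy illustration of the Hensel lifting setup over F_p (hypothetical helper names).
    p = 7  # a small prime; the paper works over a finite field F_p

    def umul(a, b):
        """Multiply two univariate polynomials (coefficient lists, lowest degree first) mod p."""
        res = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                res[i + j] = (res[i + j] + ai * bj) % p
        return res

    # f(x, y) = (x + 1 + y)(x + 2 + y) over F_7, stored as coefficients of y^0, y^1, y^2,
    # each of which is a univariate polynomial in x (again as a coefficient list).
    f = [
        [2, 3, 1],  # y^0: (x+1)(x+2) = x^2 + 3x + 2
        [3, 2],     # y^1: (x+1) + (x+2) = 2x + 3
        [1],        # y^2: 1
    ]

    # Univariate images f_1, f_2: pairwise relatively prime in F_p[x].
    f1 = [1, 1]  # x + 1
    f2 = [2, 1]  # x + 2

    # Condition f = f_1 * f_2 (mod y): the y^0 part of f equals the product of the images.
    assert f[0] == umul(f1, f2)
    print("f(x,0) =", f[0], "= f1*f2 mod", p)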

There are two approaches to Hensel lifting. Linear lifting starts with polynomials f_i^{(1)} = f_i^{(0)} and iteratively constructs polynomials f_i^{(j)} such that

    (1)  f_i^{(j)} ≡ f_i^{(j-1)}  (mod y^{j-1})
    (2)  f ≡ ∏_{i=1}^m f_i^{(j)}  (mod y^j)

The bound k is reached after k lifting steps. Quadratic lifting also starts with f_i^{(1)} = f_i^{(0)} but constructs polynomials f_i^{(2^j)} such that

    (1)  f_i^{(2^j)} ≡ f_i^{(2^{j-1})}  (mod y^{2^{j-1}})
    (2)  f ≡ ∏_{i=1}^m f_i^{(2^j)}  (mod y^{2^j})

The bound k is reached after log_2 k lifting steps. If classical multiplication is used, the asymptotic complexity of both approaches is equivalent [7]. Parallelizing the quadratic algorithm is tempting, as it involves large polynomial multiplications that can easily be parallelized using Karatsuba's algorithm. However, in practice, the sequential quadratic lifting algorithm is not able to compete with the linear algorithm, at least for bivariate polynomials of degree up to 1000 in both variables. For this reason we will concentrate on a parallel version of linear Hensel lifting.

Above we assume that we can evaluate f(x, y) at y = 0 such that deg_x(f(x, y)) = deg_x(f(x, 0)) and such that f(x, 0) is square-free. If this does not hold, we compute the translated polynomial f̃ = f(x, y + α) such that f̃(x, 0) satisfies the above conditions. It is shown in [6] that this translation can be done using O(n^3) operations in the coefficient field (with n a bound on the degree of f in both variables). We assume in the following that the coefficient field F_p contains such an α. For more details on this selection process and on the case where F_p does not contain a suitable evaluation value, see [2].

2 The Sequential Lifting Algorithm

Linear lifting algorithms for dense bivariate polynomials over finite fields given in [3, 10, 8] need O(mn^5) operations in F_p, with m the number of factors and n a bound on the degree in each variable of the polynomial to factor. We present a sequential algorithm that is an order of magnitude faster than these, needing only O(mn^4) coefficient operations. We then describe our parallel version of this algorithm.

We are given polynomials f ∈ F_p[x, y] and f_i^{(1)} ∈ F_p[x], i = 1..m, such that

    f ≡ ∏_{i=1}^m f_i^{(1)}  (mod y)    (1)

with deg_x(f) ≤ n and deg_y(f) ≤ n. Assume we want to lift the f_i^{(1)}, i = 1..m, up to degree n in y, i.e. compute f_i^{(n)}, i = 1..m, such that

    f ≡ ∏_{i=1}^m f_i^{(n)}  (mod y^n)    (2)

and

    ∀i = 1..m:  f_i^{(1)} ≡ f_i^{(n)}  (mod y)    (3)
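The evaluation-point condition discussed above (the degree in x must be preserved at y = 0, and f(x, 0) must be square-free) is cheap to test. The following self-contained Python sketch checks it for a toy polynomial by comparing degrees and computing gcd(f(x,0), d/dx f(x,0)) over F_p; the helper names and the example are illustrative assumptions, not the paper's code.

    # Check the evaluation point y = 0: deg_x f(x,0) = deg_x f(x,y) and f(x,0) square-free.
    p = 7

    def trim(a):
        a = [c % p for c in a]
        while len(a) > 1 and a[-1] == 0:
            a.pop()
        return a

    def prem(a, b):
        """Remainder of a modulo b in F_p[x]; coefficient lists, lowest degree first."""
        a, b = trim(a), trim(b)
        inv = pow(b[-1], p - 2, p)          # inverse of b's leading coefficient (Fermat)
        while len(a) >= len(b) and a != [0]:
            q = a[-1] * inv % p
            s = len(a) - len(b)
            for i in range(len(b)):
                a[s + i] = (a[s + i] - q * b[i]) % p
            a = trim(a)
        return a

    def pgcd(a, b):
        while trim(b) != [0]:
            a, b = b, prem(a, b)
        return trim(a)

    def deriv(a):
        return trim([(i * c) % p for i, c in enumerate(a)][1:] or [0])

    # f(x, y) = (x+1+y)(x+2+y): coefficients of y^0, y^1, y^2, each a polynomial in x.
    f = [[2, 3, 1], [3, 2], [1]]
    f0 = trim(f[0])
    deg_x_f = max(len(trim(c)) - 1 for c in f)      # degree of f in x
    assert len(f0) - 1 == deg_x_f                   # deg_x is preserved at y = 0
    assert len(pgcd(f0, deriv(f0))) == 1            # square-free: gcd with derivative is constant
    print("y = 0 is a valid evaluation point")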

Using linear Hensel lifting, we want to compute, at step k, the f_i^{(k)} ∈ F_p[x, y], i = 1..m, such that

    f ≡ ∏_{i=1}^m f_i^{(k)}  (mod y^k)    (4)

with

    f_i^{(k)} ≡ f_i^{(k-1)}  (mod y^{k-1})    (5)

For (5) to hold, we set

    f_i^{(k)} := f_i^{(k-1)} + σ_i y^{k-1}    (6)

with σ_i ∈ F_p[x]. Plugging (6) into (4), we see that lifting from y^{k-1} to y^k amounts to solving for the σ_i's in

    (f - ∏_{j=1}^m f_j^{(k-1)}) / y^{k-1} ≡ Σ_{i=1}^m σ_i ∏_{j=1, j≠i}^m f_j^{(0)}  (mod y)    (7)

(7) is a univariate Diophantine equation in F_p[x] that can be solved by first precomputing the solutions α_i of

    Σ_{i=1}^m α_i ∏_{j=1, j≠i}^m f_j^{(0)} ≡ 1  (mod y)    (8)

Now we can easily compute the σ_i at each step by multiplying the α_i with the left-hand side of (7) and reducing modulo f_i^{(0)}. This means that solving the Diophantine equation (7) has a cost of O(m) multiplications in the coefficient ring F_p[x] and thus a total cost of O(mM(n)) operations in F_p, where M(n) is the complexity of multiplying two univariate polynomials of degree n.

Before we can solve the Diophantine equation, we have to compute the left-hand side of (7):

    (f - ∏_{j=1}^m f_j^{(k-1)}) / y^{k-1}  (mod y)    (9)

We notice that only the coefficient of y^{k-1} in the numerator, denoted by

    C_k = (f - ∏_{j=1}^m f_j^{(k-1)})^{[y^{k-1}]}    (10)

is needed. We will now discuss how to efficiently compute c_k such that

    C_k = f^{[y^{k-1}]} - c_k    (11)

Our idea is to compute the product of the f_i^{(k-1)} modulo y^k at each step, reusing sub-products already computed in the previous step. At step k we thus have to compute

    ∏_{i=1}^m f_i^{(k-1)}  (mod y^k)    (12)

In the following we will denote the coefficient of y^j in f_i^{(k-1)} as u_i^{[j]} (noting that u_i^{[j]} does not depend on k, since later lifting steps only add coefficients of higher powers of y).
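For m = 2 factors, the precomputation (8) amounts to the extended Euclidean algorithm in F_p[x]. The following self-contained Python sketch (illustrative names; a sketch under these assumptions, not the paper's implementation) computes α_1, α_2 with α_1 f_2^{(0)} + α_2 f_1^{(0)} = 1 and then solves one instance of (7), for a given right-hand side C_k, by the multiply-and-reduce step described above.

    # Precomputation (8) and one Diophantine solve (7) for m = 2 over F_p (illustrative sketch).
    p = 7

    def trim(a):
        a = [c % p for c in a]
        while len(a) > 1 and a[-1] == 0:
            a.pop()
        return a

    def padd(a, b):
        n = max(len(a), len(b))
        return trim([((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p for i in range(n)])

    def pmul(a, b):
        res = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                res[i + j] = (res[i + j] + ai * bj) % p
        return trim(res)

    def pdivmod(a, b):
        """Quotient and remainder in F_p[x]; coefficient lists, lowest degree first."""
        a, b = trim(a), trim(b)
        q = [0] * max(1, len(a) - len(b) + 1)
        inv = pow(b[-1], p - 2, p)
        while len(a) >= len(b) and a != [0]:
            c = a[-1] * inv % p
            s = len(a) - len(b)
            q[s] = c
            for i in range(len(b)):
                a[s + i] = (a[s + i] - c * b[i]) % p
            a = trim(a)
        return trim(q), a

    def ext_gcd(a, b):
        """Return (g, s, t) with s*a + t*b = g (monic) in F_p[x]."""
        r0, r1, s0, s1, t0, t1 = trim(a), trim(b), [1], [0], [0], [1]
        while r1 != [0]:
            q, r = pdivmod(r0, r1)
            r0, r1 = r1, r
            s0, s1 = s1, padd(s0, [(-c) % p for c in pmul(q, s1)])
            t0, t1 = t1, padd(t0, [(-c) % p for c in pmul(q, t1)])
        inv = [pow(r0[-1], p - 2, p)]
        return pmul(r0, inv), pmul(s0, inv), pmul(t0, inv)

    f1, f2 = [1, 1], [2, 1]                    # f_1^{(0)} = x + 1,  f_2^{(0)} = x + 2
    g, s, t = ext_gcd(f1, f2)                  # s*f1 + t*f2 = 1, so (8) holds with
    alpha1, alpha2 = t, s                      # alpha_1*f2 + alpha_2*f1 = 1
    assert g == [1]
    Ck = [4, 5]                                # a sample left-hand side of (7), deg < deg(f1*f2)
    sigma1 = pdivmod(pmul(alpha1, Ck), f1)[1]  # sigma_i = alpha_i * C_k  mod  f_i^{(0)}
    sigma2 = pdivmod(pmul(alpha2, Ck), f2)[1]
    assert padd(pmul(sigma1, f2), pmul(sigma2, f1)) == trim(Ck)
    print("sigma_1 =", sigma1, " sigma_2 =", sigma2)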

We will compute the product (12) iteratively, factor by factor. We define

    P_q := ∏_{i=1}^{q} f_i^{(k-1)}  (mod y^k)    (13)

with c_k = P_m^{[y^{k-1}]}. The product of the first two factors gives

    P_2 = Σ_{l=0}^{k-1} ( Σ_{q=0}^{l} u_1^{[q]} u_2^{[l-q]} ) y^l

For successive i we can compute P_i as

    P_i = P_{i-1} · f_i^{(k-1)} = Σ_{l=0}^{k-1} ( Σ_{q=0}^{l} p^{[q]} u_i^{[l-q]} ) y^l

with p^{[l]} := P_{i-1}^{[y^l]}. Note that p is used exclusively for simplifying the notation and that the p's are different for varying i and k.

Moving to the next step, the same expansions hold with k replaced by k + 1, and all coefficient products computed at step k reappear unchanged.
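The following Python sketch illustrates the reuse: a cached partial product is extended by one power of y per step, and only the new anti-diagonal of coefficient products is multiplied. It is a simplified, sequential illustration with hypothetical helper names; the paper's scheme additionally accounts for the factors themselves gaining a new coefficient at every lifting step.

    # Incremental computation of a partial product P = a*b (mod y^k), reusing the
    # coefficients computed in earlier lifting steps (illustrative sketch).
    p = 7

    def umul(a, b):
        res = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                res[i + j] = (res[i + j] + ai * bj) % p
        return res

    def uadd(a, b):
        n = max(len(a), len(b))
        return [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p for i in range(n)]

    def extend_product(P, a, b, k):
        """Append the coefficient of y^k of a*b to the cached product P = a*b mod y^k.

        a, b are lists of univariate x-polynomials (coefficients of y^0, y^1, ...);
        only the products a[q]*b[k-q] of the new anti-diagonal are multiplied here,
        everything of lower order is already stored in P."""
        coeff = [0]
        for q in range(k + 1):
            if q < len(a) and k - q < len(b):
                coeff = uadd(coeff, umul(a[q], b[k - q]))
        P.append(coeff)
        return P

    # Two factors, known so far up to y^1 (as they would be after one lifting step).
    f1 = [[1, 1], [3]]   # (x+1) + 3y
    f2 = [[2, 1], [5]]   # (x+2) + 5y
    P2 = [umul(f1[0], f2[0])]            # product mod y^1
    P2 = extend_product(P2, f1, f2, 1)   # now product mod y^2
    P2 = extend_product(P2, f1, f2, 2)   # coefficient of y^2: here 3*5 = 15 = 1 mod 7
    print(P2)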

We see that, moving from one step to the next, the only products that we have to compute and that have not already been computed in the previous step are those involving the newly added top coefficients of f_1^{(k-1)} and f_2^{(k-1)} for the first two factors, and those involving the newly added top coefficients of f_i^{(k-1)} and of P_{i-1} for each subsequent factor i. Thus the total number of multiplications needed at step k of the lifting is equal to

    M_k = (k + 2) + (m - 2)(k + 2) = (k + 2)(m - 1)    (14)

Supposing that we want to lift our univariate image polynomials up to degree n, we need a total number of

    Σ_{k=1}^{n} (k + 2)(m - 1) = ((m - 1)/2)(n^2 + 5n)    (15)

multiplications in the coefficient ring F_p[x] for computing the left-hand sides of the arising Diophantine equations. Thus we get a total running time of O(mn^2) multiplications in F_p[x] for the linear lifting algorithm. Supposing that the degrees in x of the factors are bounded by n, the total complexity of linear Hensel lifting in terms of field operations is O(mn^2 M(n)). Assuming classical multiplication, we get a complexity of O(mn^4).

In order to store the output, i.e. the m factors lifted to degree n, we need memory for mn^2 elements from F_p. In addition to these, we need to store the products P_i. These require extra memory to hold (m - 2)n^2 elements from F_p. This means that the amount of required working memory is less than the amount required to store the result.

3 The Parallel Lifting Algorithm

We will now discuss how to implement this algorithm in parallel. Each step needs the computation of a sum of products of univariate polynomials. We will distribute this computation evenly across the available processing nodes. One node will be reserved for collecting the results from the slave nodes and for solving the Diophantine equation at each step. The parallel linear Hensel lifting algorithm is outlined in Table 1.

At steps (D) and (G), the slave nodes need to compute a convolution of the form

    Σ_{i=0}^{k} a_i b_{k-i}

This is done by distributing the products from this sum evenly across the available slave nodes, as sketched below. The partial sums are then added together on the master node. The cost of this addition is O((n/s) n) and although it could be reduced to O(n ln(n/s)) using a binary tree shaped algorithm, the efficiency gain would be marginal, as the cost of this step is comparatively small.

Note that while the master node solves the Diophantine equation necessary to lift the factors to y^k, the slave nodes are already working on the convolution product that the master node will need in order to lift the factors to y^{k+1}. This gives us a nice computation overlap and prevents the master node from ever having to spin idly, waiting for results from the slave nodes.
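A minimal Python sketch of this distribution (sequentially simulated; the node communication, which the paper implements via message passing in Maple, is omitted, and all names are illustrative):

    # Distributing the convolution sum_{i=0}^{k} a_i * b_{k-i} over s slave nodes
    # (simulated sequentially here).
    p = 7

    def umul(a, b):
        res = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                res[i + j] = (res[i + j] + ai * bj) % p
        return res

    def uadd(a, b):
        n = max(len(a), len(b))
        return [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p for i in range(n)]

    def convolution_coefficient(a, b, k, s):
        """Coefficient of y^k of the product, with the k+1 products dealt out round-robin
        to s slaves; each slave adds its products locally, the master adds the partial sums."""
        partial = [[0] for _ in range(s)]
        for i in range(k + 1):                      # product i goes to slave i mod s
            if i < len(a) and k - i < len(b):
                partial[i % s] = uadd(partial[i % s], umul(a[i], b[k - i]))
        total = [0]                                 # master adds the s partial sums
        for ps in partial:
            total = uadd(total, ps)
        return total

    a = [[1, 1], [3], [2, 4]]                       # y-coefficients, each a polynomial in x
    b = [[2, 1], [5], [6]]
    print(convolution_coefficient(a, b, 2, s=3))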

Step (A)  Master: Input: f ∈ F_p[x, y]; f_i^{(0)} ∈ F_p[x], i = 1..m.
          Initialize u_i = f_i^{(0)} for i = 1..m.
          Initialize P_i = ∏_{j=1}^{i} f_j^{(0)} for i = 2..m.
          Precompute the α_i from (8).
Step (B)  → send u_1, ..., u_m and P_2, ..., P_m from master to slaves →
Step (C)  Iterate steps (C)-(J) for k from 1 to n.
          Master: compute the u_i^{[k]}, i = 1..m, via equation (7);
          compute u_1^{[k]} u_2^{[0]}, u_1^{[k]} u_2^{[1]}, u_1^{[0]} u_2^{[k]}, u_1^{[1]} u_2^{[k]}.
          Slave(s): compute u_1^{[1]} u_2^{[k-1]} + ... + u_1^{[k-1]} u_2^{[1]}.
Step (D)  ← send u_1^{[1]} u_2^{[k-1]} + ... + u_1^{[k-1]} u_2^{[1]} from slaves to master ←
Step (E)  → send u_1^{[k]}, ..., u_m^{[k]} from master to slaves →
Step (F)  Master: update P_2.
Step (G)  Iterate steps (G)-(J) for i from 3 to m.
          Master: compute p^{[0]} u_i^{[k]}, p^{[k]} u_i^{[0]}, p^{[k]} u_i^{[1]}.
          Slave(s): compute p^{[1]} u_i^{[k-1]} + ... + p^{[k-1]} u_i^{[1]}.
Step (H)  ← send p^{[1]} u_i^{[k-1]} + ... + p^{[k-1]} u_i^{[1]} from slaves to master ←
Step (I)  Master: update P_i.
Step (J)  → send the new coefficients of P_i from master to slaves →

Table 1: Parallel Algorithm

At step k of each iteration, the master node has to perform m multiplications and m divisions of univariate polynomials of degree n in order to solve the Diophantine equation, plus 4 + 3(m - 2) multiplications of univariate polynomials of degree n to compute its share of the convolutions. This amounts to a total cost of O(mnM(n)) operations in F_p in order to lift the univariate image polynomials up to degree n. Assuming a number of s slave nodes, each one has to compute O(mk/s) multiplications of univariate polynomials of degree n at step k of the lifting. The total work of a single slave node sums up to O((mn^2/s) M(n)) operations in F_p, or O(mn^4/s) operations in F_p assuming classical polynomial multiplication.

4 Experimental Results

We have implemented our algorithm on a massively parallel, distributed memory machine, an Intel Paragon, using a version of Maple that has been extended with message passing primitives [1]. Table 2 summarizes the timings from lifting two degree-n image polynomials up to degree 2n in the second variable; such a lifting is needed for the factorization of a bivariate polynomial of degree 2n in both variables. Our examples are over the coefficient field Z_3. Times are given in wall-clock (real time) seconds. The speedup factor is computed as

    (time on one node) / (time on s nodes)

and we define efficiency as

    speedup / (number of nodes)

Table 2: Paragon timings (wall-clock time, speedup and efficiency for n = 100, 200, 300, 400 and 500 on increasing numbers of processing nodes).
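As a small illustration of these two definitions (with made-up numbers, not the measurements from Table 2):

    # Speedup and efficiency as defined above; the timings here are invented for illustration.
    def speedup(t1, ts):
        return t1 / ts

    def efficiency(t1, ts, s):
        return speedup(t1, ts) / s

    t1, ts, s = 1000.0, 40.0, 32     # hypothetical wall-clock times in seconds on 1 and s nodes
    print(speedup(t1, ts), efficiency(t1, ts, s))   # 25.0 and 0.78125, i.e. about 78%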

The n = 500 example corresponds to the factorization of a dense bivariate polynomial with degree 1000 in both variables. Its expanded form would have over a million terms. On a state-of-the-art workstation, a Digital Alpha 500/333, the same lifting takes 099s, compared to 704s on 4 nodes of the Paragon. For n = 1000, we could not get sequential timings on the Paragon, as our environment imposes a job limit of 0 hours. However, we ran it on our Digital Alpha workstation, where it took 44 hours. Using 8 nodes on the Paragon, we could reduce this time to hours, yielding a speedup of .

The sequential algorithm performs slightly worse than the expected O(n^4). This is due to the overhead of Maple's garbage collecting memory manager, which increases with the memory usage. Distributing the computations across more nodes, we also distribute the memory usage. This explains the super-linear speedups that we encountered.

5 An Asymptotic Improvement

A further improvement of our lifting algorithm is to compute the product ∏_{i=1}^m f_i^{(k-1)} in parallel. We achieve this using a binary tree structured algorithm to combine the f_i^{(k-1)}, i = 1..m, two by two. We assume that m is a power of two; if that is not the case, we pad using dummy factors. First we define T_{1,i} := f_i^{(k-1)}. Next we can compute

    T_{2,i} = T_{2,i} + Δ_{2,i} y^k

with

    Δ_{2,i} = (T_{1,2i-1})^{[y^0]} (T_{1,2i})^{[y^k]} + ... + (T_{1,2i-1})^{[y^k]} (T_{1,2i})^{[y^0]} = Σ_{q=0}^{k} (T_{1,2i-1})^{[y^q]} (T_{1,2i})^{[y^{k-q}]}

We can see that T_{2,i} ≡ f_{2i-1}^{(k-1)} f_{2i}^{(k-1)}, with T_{log_2 m + 1, 1} = ∏_{i=1}^m f_i^{(k-1)} (mod y^k). Now we can compute successive T_{j,i} with

    T_{j,i} = T_{j,i} + Δ_{j,i} y^k

and

    Δ_{j,i} = (T_{j-1,2i-1})^{[y^0]} (T_{j-1,2i})^{[y^k]} + ... + (T_{j-1,2i-1})^{[y^k]} (T_{j-1,2i})^{[y^0]} = Σ_{q=0}^{k} (T_{j-1,2i-1})^{[y^q]} (T_{j-1,2i})^{[y^{k-q}]}

At each step k, similarly to our initial algorithm, the master node computes the u_i^{[k]} and those products of the Δ_{j,i} that involve coefficients of y^k, while the slave nodes compute the remaining products of the Δ_{j,i}. As in the initial algorithm, we overlap the computation of the Δ_{j,i} on the slave nodes with the computation of the u_i^{[k-1]} and of the master's share of the Δ_{j,i} for step k - 1 on the master.

This algorithm reduces the overall complexity to O((ln m + mn/s) n M(n)) on s processing nodes. Assuming O(n ln m) processors, the running time is O((ln m) n M(n)). Further assuming that we use fast univariate polynomial multiplication (M(n) = n ln n), we can claim an asymptotic running time of O((ln m) n^2 ln n) operations in F_p using O(n ln m) processors.
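The tree combination itself is easy to picture. The sketch below builds the T_{j,i} for a toy example in plain Python, sequentially and by recomputing full truncated products rather than the incremental Δ updates, and without the master/slave split; all helper names are illustrative assumptions.

    # Binary-tree combination of the factors (pad to a power of two with the constant
    # factor 1), as in Section 5; a sequential sketch of the tree structure only.
    p = 7

    def umul(a, b):
        res = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                res[i + j] = (res[i + j] + ai * bj) % p
        return res

    def uadd(a, b):
        n = max(len(a), len(b))
        return [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p for i in range(n)]

    def bmul_trunc(a, b, k):
        """Product of two y-truncated bivariate polynomials, keeping only y^0 .. y^{k-1}."""
        res = [[0] for _ in range(k)]
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                if i + j < k:
                    res[i + j] = uadd(res[i + j], umul(ai, bj))
        return res

    def product_tree(factors, k):
        """Combine the factors two by two; level j of the returned list holds the T_{j,i}."""
        level = list(factors)
        while len(level) & (len(level) - 1):      # pad with dummy factor 1 up to a power of two
            level.append([[1]])
        levels = [level]
        while len(level) > 1:
            level = [bmul_trunc(level[i], level[i + 1], k) for i in range(0, len(level), 2)]
            levels.append(level)
        return levels                              # levels[-1][0] is the full product mod y^k

    f1 = [[1, 1], [3]]     # (x+1) + 3y
    f2 = [[2, 1], [5]]     # (x+2) + 5y
    f3 = [[3, 1], [1]]     # (x+3) + y
    tree = product_tree([f1, f2, f3], k=2)
    print(tree[-1][0])     # f1*f2*f3 mod y^2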

6 Conclusions and Further Work

We have presented a new algorithm for parallel bivariate Hensel lifting. The new algorithm has an asymptotic complexity of O((ln m) n^2 ln n) operations in F_p on O(n ln m) processors. Additionally, it behaves well in practice, as experiments on a massively parallel machine have shown.

The more variables a polynomial involves, the sparser it will be in practice. For this reason, even if our algorithm can be generalized to polynomials in many variables, it will be less efficient, as it is inherently dense. A subject of further research will be how to parallelize sparse multivariate Hensel lifting algorithms [4, 5].

References

[1] Bernardin, L. Maple on a massively parallel, distributed memory machine. In Proceedings of PASCO '97 (1997). To appear.
[2] Bernardin, L., and Monagan, M. B. Efficient multivariate factorization over finite fields. In Proceedings of AAECC '97 (1997), Lecture Notes in Computer Science, Springer-Verlag. To appear.
[3] Geddes, K. O., Czapor, S. R., and Labahn, G. Algorithms for Computer Algebra. Kluwer Academic Publishers, Boston, 1992.
[4] Kaltofen, E. Sparse Hensel lifting. In Proceedings of Eurocal '85, Vol. II (1985), B. F. Caviness, Ed., vol. 204 of Lecture Notes in Computer Science, Springer-Verlag, pp. 4-17.
[5] Kaltofen, E., and Trager, B. M. Computing with polynomials given by black boxes for their evaluations: Greatest common divisors, factorization, separation of numerators and denominators. Journal of Symbolic Computation 9, 3 (March 1990), 300-320.
[6] Knuth, D. E. Seminumerical Algorithms, vol. 2 of The Art of Computer Programming. Addison-Wesley, 1981.
[7] Mulders, T., and Bernardin, L. An analysis of linear versus quadratic Hensel lifting. In preparation, 1997.
[8] Viry, G. Factorization of multivariate polynomials with coefficients in F_p. Journal of Symbolic Computation 15, 4 (April 1993), 371-392.
[9] Wang, P. S. Parallel polynomial operations on SMPs: an overview. Journal of Symbolic Computation 21, 4 (1996), 397-410.
[10] Zippel, R. E. Effective Polynomial Computation. Kluwer Academic Publishers, Boston, 1993.