Director of Research School of Mathematics

Size: px

Start display at page:

Download "Director of Research School of Mathematics"

Melanie Gibson
5 years ago
Views:

Multiprecision Research Matters Algorithms February Nick25, Higham 2009 School of Mathematics The University Nick Higham of Manchester Director of Research http://www.ma.man.ac.

1 Multiprecision Research Matters Algorithms February Nick25, Higham 2009 School of Mathematics The University Nick Higham of Manchester Director of Research School of nickhigham.wordpress.com Waseda Research Institute for Science and Engineering Institute for Mathematical Science Kickoff Meeting, Waseda University, Tokyo, March 6, / 6

2 Outline Multiprecision arithmetic: floating point arithmetic supporting multiple, possibly arbitrary, precisions. Applications of & support for low precision. Applications of & support for high precision. How to exploit different precisions to achieve faster algs with higher accuracy. Download this talk from Nick Higham Multiprecision Algorithms 2 / 27

3 Floating Point Number System Floating point number system F R: y = ±m β e t, 0 m β t 1. Base β (β = 2 in practice), precision t, exponent range e min e e max. Assume normalized: m β t 1. Nick Higham Multiprecision Algorithms 3 / 27

4 Floating Point Number System Floating point number system F R: y = ±m β e t, 0 m β t 1. Base β (β = 2 in practice), precision t, exponent range e min e e max. Assume normalized: m β t 1. Floating point numbers are not equally spaced. If β = 2, t = 3, e min = 1, and e max = 3: Nick Higham Multiprecision Algorithms 3 / 27

5 IEEE Standard and 2008 Revision Type Size Range u = 2 t half 16 bits 10 ± single 32 bits 10 ± double 64 bits 10 ± quadruple 128 bits 10 ± Arithmetic ops (+,,, /, ) performed as if first calculated to infinite precision, then rounded. Default: round to nearest, round to even in case of tie. Half precision is a storage format only. Nick Higham Multiprecision Algorithms 4 / 27

6 Rounding For x R, fl(x) is an element of F nearest to x, and the transformation x fl(x) is called rounding (to nearest). Theorem If x R lies in the range of F then fl(x) = x(1 + δ), δ u. u := 1 2 β1 t is the unit roundoff, or machine precision. The machine epsilon, ɛ M = β 1 t is the spacing between 1 and the next larger floating point number (eps in MATLAB) Nick Higham Multiprecision Algorithms 5 / 27

7 Precision versus Accuracy fl(abc) = ab(1 + δ 1 ) c(1 + δ 2 ) δ i u, = abc(1 + δ 1 )(1 + δ 2 ) abc(1 + δ 1 + δ 2 ). Precision = u. Accuracy 2u. Nick Higham Multiprecision Algorithms 6 / 27

8 Precision versus Accuracy fl(abc) = ab(1 + δ 1 ) c(1 + δ 2 ) δ i u, = abc(1 + δ 1 )(1 + δ 2 ) abc(1 + δ 1 + δ 2 ). Precision = u. Accuracy 2u. Accuracy is not limited by precision Nick Higham Multiprecision Algorithms 6 / 27

10 NVIDIA Tesla P100 (2016) The Tesla P100 is the world s first accelerator built for deep learning, and has native hardware ISA support for FP16 arithmetic, delivering over 21 TeraFLOPS of FP16 processing power. Nick Higham Multiprecision Algorithms 8 / 27

11 AMD Radeon Instinct MI25 GPU (2017) 24.6 TFLOPS FP16 or 12.3 TFLOPS FP32 peak GPU compute performance on a single board... Up to 82 GFLOPS/watt FP16 or 41 GFLOPS/watt FP32 peak GPU compute performance Nick Higham Multiprecision Algorithms 9 / 27

12 Machine Learning For machine learning as well as for certain image processing and signal processing applications, more data at lower precision actually yields better results with certain algorithms than a smaller amount of more precise data. We find that very low precision is sufficient not just for running trained networks but also for training them. Courbariaux, Benji & David (2015) We re solving the wrong problem (Scheinberg, 2016), so don t need an accurate solution. Low precision provides regularization. See Jorge Nocedal s plenary talk Stochastic Gradient Methods for Machine Learning at SIAM CSE Nick Higham Multiprecision Algorithms 10 / 27

13 Climate Modelling T. Palmer, More reliable forecasts with less precise computations: a fast-track route to cloud-resolved weather and climate simulators?, Phil. Trans. R. Soc. A, 2014: Is there merit in representing variables at sufficiently high wavenumbers using half or even quarter precision floating-point numbers? T. Palmer, Build imprecise supercomputers, Nature, Nick Higham Multiprecision Algorithms 11 / 27

14 Error Analysis in Low Precision For an inner product x T y of n-vectors the standard error bound is fl(x T y) x T y nu x T y + O(u 2 ). In half precision, u , so nu = 1 for n = Nick Higham Multiprecision Algorithms 12 / 27

15 Error Analysis in Low Precision For an inner product x T y of n-vectors the standard error bound is fl(x T y) x T y nu x T y + O(u 2 ). In half precision, u , so nu = 1 for n = Most existing rounding error analysis guarantees no accuracy, and maybe not even a correct exponent, for half precision! Nick Higham Multiprecision Algorithms 12 / 27

16 Error Analysis in Low Precision For an inner product x T y of n-vectors the standard error bound is fl(x T y) x T y nu x T y + O(u 2 ). In half precision, u , so nu = 1 for n = Most existing rounding error analysis guarantees no accuracy, and maybe not even a correct exponent, for half precision! Is standard error analysis especially pessimistic in these applications? Try a statistical approach? Nick Higham Multiprecision Algorithms 12 / 27

17 Need for Higher Precision He and Ding, Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications, Bailey, Barrio & Borwein, High-Precision Computation: Mathematical Physics & Dynamics, Khanna, High-Precision Numerical Simulations on a CUDA GPU: Kerr Black Hole Tails, Beliakov and Matiyasevich, A Parallel Algorithm for Calculation of Determinants and Minors Using Arbitrary Precision Arithmetic, Ma and Saunders, Solving Multiscale Linear Programs Using the Simplex Method in Quadruple Precision, Nick Higham Multiprecision Algorithms 13 / 27

18 Going to Higher Precision If we have quadruple or higher precision, how can we modify existing algorithms to exploit it? Nick Higham Multiprecision Algorithms 14 / 27

19 Going to Higher Precision If we have quadruple or higher precision, how can we modify existing algorithms to exploit it? To what extent are existing algs precision-independent? Newton-type algs: just decrease tol? How little higher precision can we get away with? Gradually increase precision through the iterations? For Krylov methods # iterations can depend on precision, so lower precision might not give fastest computation! Nick Higham Multiprecision Algorithms 14 / 27

20 Availability of Multiprecision in Software Maple, Mathematica, PARI/GP, Sage. MATLAB: Symbolic Math Toolbox, Multiprecision Computing Toolbox (Advanpix). Julia: BigFloat. Mpmath and SymPy for Python. GNU MP Library. GNU MPFR Library. (Quad only): some C, Fortran compilers. Gone, but not forgotten: Numerical Turing: Hull et al., Nick Higham Multiprecision Algorithms 15 / 27

21 Cost of Quadruple Precision How Fast is Quadruple Precision Arithmetic? Compare MATLAB double precision, Symbolic Math Toolbox, VPA arithmetic, digits(34), Multiprecision Computing Toolbox (Advanpix), mp.digits(34). Optimized for quad. Ratios of times Intel Broadwell-E Core 6 cores mp/double vpa/double vpa/mp LU, n = , eig, nonsymm, n = , eig, symm, n = , Nick Higham Multiprecision Algorithms 16 / 27

22 Matrix Functions (Inverse) scaling and squaring-type algorithms for e A, log A, cos A, A t use Padé approximants. Padé degree and algorithm parameters chosen to achieve double precision accuracy, u = Change u and the algorithm logic needs changing! Open questions, even for scalar elementary functions? Nick Higham Multiprecision Algorithms 17 / 27

23 Log Tables Henry Briggs ( ): Arithmetica Logarithmica (1624) Logarithms to base 10 of 1 20,000 and 90, ,000 to 14 decimal places. Briggs must be viewed as one of the great figures in numerical analysis. Herman H. Goldstine, A History of Numerical Analysis (1977) Name Year Range Decimal places R. de Prony , Edward Sang , Nick Higham Multiprecision Algorithms 18 / 27

24 The Scalar Logarithm: Comm. ACM J. R. Herndon (1961). Algorithm 48: Logarithm of a complex number. Nick Higham Multiprecision Algorithms 19 / 27

25 The Scalar Logarithm: Comm. ACM J. R. Herndon (1961). Algorithm 48: Logarithm of a complex number. A. P. Relph (1962). Certification of Algorithm 48: Logarithm of a complex number. Nick Higham Multiprecision Algorithms 19 / 27

26 The Scalar Logarithm: Comm. ACM J. R. Herndon (1961). Algorithm 48: Logarithm of a complex number. A. P. Relph (1962). Certification of Algorithm 48: Logarithm of a complex number. M. L. Johnson and W. Sangren, W. (1962). Remark on Algorithm 48: Logarithm of a complex number. Nick Higham Multiprecision Algorithms 19 / 27

27 The Scalar Logarithm: Comm. ACM J. R. Herndon (1961). Algorithm 48: Logarithm of a complex number. A. P. Relph (1962). Certification of Algorithm 48: Logarithm of a complex number. M. L. Johnson and W. Sangren, W. (1962). Remark on Algorithm 48: Logarithm of a complex number. D. S. Collens (1964). Remark on remarks on Algorithm 48: Logarithm of a complex number. Nick Higham Multiprecision Algorithms 19 / 27

28 The Scalar Logarithm: Comm. ACM J. R. Herndon (1961). Algorithm 48: Logarithm of a complex number. A. P. Relph (1962). Certification of Algorithm 48: Logarithm of a complex number. M. L. Johnson and W. Sangren, W. (1962). Remark on Algorithm 48: Logarithm of a complex number. D. S. Collens (1964). Remark on remarks on Algorithm 48: Logarithm of a complex number. D. S. Collens (1964). Algorithm 243: Logarithm of a complex number: Rewrite of Algorithm 48. Nick Higham Multiprecision Algorithms 19 / 27

29 Principal Logarithm and pth Root Let A C n n have no eigenvalues on R. Principal log X = log A denotes unique X such that e X = A. π < Im ( λ(x) ) < π. Nick Higham Multiprecision Algorithms 20 / 27

30 The Average Eye First order character of optical system characterized by transference matrix [ ] S δ T = R 5 5, 0 1 where S R 4 4 is symplectic: S T JS = J = [ 0 I2 I 2 0 ]. Average m 1 m i=1 T i is not a transference matrix. Harris (2005) proposes the average exp(m 1 m i=1 log T i). Nick Higham Multiprecision Algorithms 21 / 27

31 Inverse Scaling and Squaring log A = log ( A 1/2s ) 2 s = 2 s log ( A 1/2s ) Nick Higham Multiprecision Algorithms 22 / 27

32 Inverse Scaling and Squaring log A = log ( A 1/2s ) 2 s = 2 s log ( A 1/2s ) Algorithm Basic inverse scaling and squaring 1: s 0 3: while A I 2 1 do 4: A A 1/2 5: s s + 1 6: end while 7: Find Padé approximant r km to log(1 + x) 8: return 2 s r km (A I) Nick Higham Multiprecision Algorithms 22 / 27

33 Inverse Scaling and Squaring log A = log ( A 1/2s ) 2 s = 2 s log ( A 1/2s ) Algorithm Schur Padé inverse scaling and squaring 1: s 0 2: Compute the Schur decomposition A := QTQ 3: while T I 2 1 do 4: T T 1/2 5: s s + 1 6: end while 7: Find Padé approximant r km to log(1 + x) 8: return 2 s Q r km (T I)Q What degree for the Padé approximants? Should we take more square roots once T I 2 < 1? Nick Higham Multiprecision Algorithms 22 / 27

34 Bounding the Error Kenney & Laub (1989) derived absolute f err bound log(i + X) r km (X) log(1 X ) r km ( X ). Al-Mohy, H & Relton (2012, 2013) derive b err bound. Strategy requires symbolic and high prec. comp. (done off-line) and obtains parameters dependent on u. Fasi & H (2018) use a relative f err bound based on log(i + X) X, since X 1 is possible. Strategy: Use f err bound to minimize s subject to f err bounded by u (arbitrary) for suitable m. Nick Higham Multiprecision Algorithms 23 / 27

35 Sharper Error Bound for Nonnormal Matrices Al-Mohy & H (2009) note that c i A i i=l c i A i = c i ( A i 1/i) i i=l i=l i=l c i β i, where β = max i l A i 1/i. Fasi & H (2018) refine Kenney & Laub s bound using α p (X) = max ( X p 1/p, X p+1 1/(p+1)), in place of A. Note that ρ(a) α p (A) A. Even sharper bounds derived by Nadukandi & H (2018). Nick Higham Multiprecision Algorithms 24 / 27

36 Relative Errors: 64 Digits κ log (A)u logm mct logm agm logm tayl logm tfree tayl logm pade logm tfree pade Nick Higham Multiprecision Algorithms 25 / 27

37 Relative Errors: 256 Digits Nick Higham Multiprecision Algorithms 26 / 27

38 Conclusions Both low and high precision floating-point arithmetic becoming more prevalent, in hardware and software. Need better understanding of behaviour of algs in low precision arithmetic. Matrix functions at high precision need algs with (at least) different logic. In SIAM PP18, MS54 (Thurs, 4:50) I ll show that using three precisions in iter ref brings major benefits and permits faster and more accurate solution of Ax = b. Nick Higham Multiprecision Algorithms 27 / 27

39 References I D. H. Bailey, R. Barrio, and J. M. Borwein. High-precision computation: Mathematical physics and dynamics. Appl. Math. Comput., 218(20): , G. Beliakov and Y. Matiyasevich. A parallel algorithm for calculation of determinants and minors using arbitrary precision arithmetic. BIT, 56(1):33 50, Nick Higham Multiprecision Algorithms 1 / 7

40 References II E. Carson and N. J. Higham. Accelerating the solution of linear systems by iterative refinement in three precisions. MIMS EPrint , Manchester Institute for Mathematical Sciences, The University of Manchester, UK, July pp. Revised January To appear in SIAM J. Sci. Comput. E. Carson and N. J. Higham. A new analysis of iterative refinement and its application to accurate solution of ill-conditioned sparse linear systems. SIAM J. Sci. Comput., 39(6):A2834 A2856, Nick Higham Multiprecision Algorithms 2 / 7

41 References III M. Courbariaux, Y. Bengio, and J.-P. David. Training deep neural networks with low precision multiplications, ArXiv preprint v5. M. Fasi and N. J. Higham. Multiprecision algorithms for computing the matrix logarithm. MIMS EPrint , Manchester Institute for Mathematical Sciences, The University of Manchester, UK, May pp. Revised December To appear in SIAM J. Matrix Anal. Appl. Nick Higham Multiprecision Algorithms 3 / 7

42 References IV W. F. Harris. The average eye. Opthal. Physiol. Opt., 24: , Y. He and C. H. Q. Ding. Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications. J. Supercomputing, 18(3): , N. J. Higham. Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, second edition, ISBN xxx+680 pp. Nick Higham Multiprecision Algorithms 4 / 7

43 References V G. Khanna. High-precision numerical simulations on a CUDA GPU: Kerr black hole tails. J. Sci. Comput., 56(2): , D. Ma and M. Saunders. Solving multiscale linear programs using the simplex method in quadruple precision. In M. Al-Baali, L. Grandinetti, and A. Purnama, editors, Numerical Analysis and Optimization, number 134 in Springer Proceedings in Mathematics and Statistics, pages Springer-Verlag, Berlin, Nick Higham Multiprecision Algorithms 5 / 7

44 References VI P. Nadukandi and N. J. Higham. Computing the wave-kernel matrix functions. MIMS EPrint , Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Feb pp. A. Roldao-Lopes, A. Shahzad, G. A. Constantinides, and E. C. Kerrigan. More flops or more precision? Accuracy parameterizable linear equation solvers for model predictive control. In 17th IEEE Symposium on Field Programmable Custom Computing Machines, pages , Apr Nick Higham Multiprecision Algorithms 6 / 7

45 References VII K. Scheinberg. Evolution of randomness in optimization methods for supervised machine learning. SIAG/OPT Views and News, 24(1):1 8, Nick Higham Multiprecision Algorithms 7 / 7

Research CharlieMatters

Research CharlieMatters Van Loan and the Matrix Exponential February 25, 2009 Nick Nick Higham Director Schoolof ofresearch Mathematics The School University of Mathematics of Manchester http://www.maths.manchester.ac.uk/~higham