Background. Another interests. Sieve method. Parallel Sieve Processing on Vector Processor and GPU. RSA Cryptography

Size: px
Start display at page:

Download "Background. Another interests. Sieve method. Parallel Sieve Processing on Vector Processor and GPU. RSA Cryptography"

Transcription

1 Background Parallel Sieve Processing on Vector Processor and GPU Yasunori Ushiro (Earth Simulator Center) Yoshinari Fukui (Earth Simulator Center) Hidehiko Hasegawa (Univ. of Tsukuba) () RSA Cryptography is the key technology for safe Internet use. () The safeness is based on the result that the factorization algorithm of a long-digit number n to P and Q has high computational complexity and consumes enormous computation time. () To guarantee, a decryption time of more than 0 years is necessary, even using the fastest computer. SIAM PP, Feb. 6, 0 Another interests Different Architecture for RSA (GPU and Vector Processor instead of PC) Non-Floating Point Number Operations (Almost 0 GFLOPS) Other Usage of GPU and Vector Processor RSA Cryptography Creation of keys () Choose prime numbers p, q, and e () Set n = p * q and f = (p-) * (q-) () Compute d = /e (mod f) Encryption Compute C = M e (mod n) Public keys (e, n) are used Decryption Compute M = C d (mod n); Euler s theorem M f (mod n) Secure keys (d, n) are used 4 Computation time of RSA-768 ( digits) CPU Year ratio(%) Sieve Processing Matrices Proc. 9 Exploration of polynomial 0 Algebraic Square root 0 Others 0 Cf. AMD64 (.GHz, Core) Sieve method Factorize N based on the relationship: A -B =(A-B)(A+B) 0 (mod N) () Gather numerous A i and B i such that A l A l A k lk B m B m B j mj (mod N) () Look for even l i and m i with factorization of 0- Matrices Dimension: 9,796,0 * 9,79,0 6

2 Steps of Sieve Processing Comp. Iteration Size Set Number Long 0^{00} Compute Base int 0^8 Choose prime number Q l and f l (x) int 0^ 4 Process for each l 0^ 4. Repeat Step 4.. (0^{}) /LP 4.. Compute PS[i] V[i]=0 Sieve Processing Double int int64 LP LP N*LP/p p (Harmonic mean of primes) LP: 0^6 for PC, 0^8 for GPU, 0^9 for ES LP LP LP LP LP p p p p N primes p LP/p Steps

3 4.. Kernel of Sieve for (k=0; k<n; k++) : # of primes in base { for (i=start[k]; i<lp; i+=prime[k]) { V[i] += LogP[k]; } LP : Size of Sieve } for (i=0; i<lp; i++) : Pick up sieved data { if(v[i] >= PS[i]) { Sive[No] = Pointer + i; No++; } } No : # of sieved data Update Start[0]~Start[N-] for next Sieve Tuning for Sieve Processing Base must be stored in fast memory PC (Cache) Shorter LP (LP is size of Sieve) 0^6 Vector Processor ES LP=0^9; 0^ times larger than that of PC Picking up of sieved data is slow ( if exists in a loop) GPU(GTX480) LP=0^8; 0^ times larger than that of PC Discontinuous memory access (stride varies) 4 Tuning for Vector Processor ES Picking up ratio is 0^{-6} ~0^{-0} Compute Maximum value in a block instead of picking up Modification: If the maximum number > value_b then perform Picking_up process Block size is 64K (6,6) This results in times faster Tuning for GPU Each thread has smaller LP (LP of PC * 000 / 0480 threads) Larger Prime[k] makes small hit ratio if(ii < LP) V[ii] += LogP[k]; N is used for Loop length instead LP Incorrect result for V[ii]+=LogP[k]; Sieve Processing can permit small error (Loss of pick up is OK up to 0^{-}) types of parallelization are used based on the size of prime numbers in the base 6 GPU program (before) no = gn*bn; bn=, gn=40 for (k=0; k<lp; k+= no) LP=0*04 { i = (bn * blockidx.x + threadidx.x) + k; V[i] = 0; } syncthreads(); for (k=0; k<n; k++) 0,480 threads in LP { for (i=start[k]; i<lp; i+=prime[k]*no) { ii = (bn*blockidx.x+threadidx.x)*prime[k]+i; if(ii < LP) V[ii] += LogP[k]; } syncthreads(); } 7 GPU program (after) for (k=0; k<n; k++) Sieve for ~0(K)-th primes { for (i=start[k]; i<lp; i+=prime[k]*gn*bn) { ii = (bn*blockidx.x + threadidx.x)*prime[k] + i; if(ii < LP) V[ii] += LogP[k]; syncthreads(); } } for (k=n; k<n; k+=gn) K+~40K-th primes { kk = blockidx.x + k; for (i=start[kk]; i<lp; i+=prime[kk]*bn) { ii = threadidx.x*prime[k] + i; if(ii < LP) V[ii] += LogP[kk]; syncthreads(); } } for (k=n; k<n; k+=gn*bn) Over (40K+)-th primes { kk =(bn*blockidx.x + threadidx.x) + k; for (i=start[kk]; i<lp; i+=prime[kk]) { if(i < LP) V[i] += LogP[kk]; syncthreads(); } } 8

4 Change of Result on Modified Program 60 digits, LP=0*04, Prime numbers N=0,000*04, 000 iterations M: Number of Sieved data () Original: M=4484 () After parallelization (times) M=[4488, 44867], mean=4480 largest difference = 87 (0.0%) 9 Conditions PC: Core of Dell Vostro 00 Intel Core,.GHz, GB Windows Vista, gcc, -O Vector Processor ES node (8 CPU).GHz: 89Gflops, 8GB SUPER-UX, Auto-parallel FORTRAN+MPI GPU NVIDIA GTX80.44GHz,.GB Unix, CUDA., -O 0 Time of Sieve Processing 60 digits Ratio of Sieve Processing 60 digits Computation time (hours) PC GPU ES ( node) ratio 00 0 PC/ES PC/GPU GPU/ES Prime numbers in Base (*0^6) Prime numbers in Base (0^6) ratio Dependency of Base size digits 0 digits PC/ES PC/GPU.E+04.E+0.E+06.E+07.E+08.E+09.E+0 Prime numbers in Base ES (Vector Processor) Parallelization is simple and easy, however a special treatment is needed for storing sieved data. GPU Fine grain Parallelization is needed, and that makes data dependency for sieved data. By omitting some data inconsistencies, we chose forced parallelization. 4

5 Summary of Sieve Processing Almost all ops. are addition of bits integer with different-stride 99.9 % are vectorizable, and easy MPI-parallelizable Speed depends on the size of fast memory One node of ES is times faster than PC (Check before pick up becomes times faster) GPU (GXT80) is 60 times faster than PC (Force-parallelization has 0.0% loss of sieved data; types of parallelization are used based on the magnitude of primes) High speed range of Base is ES >> GPU >> PC Forecast of RSA-768 ( digits) ratio Time(Node Year) (%) CPU GPU ES Sieve Processing Matrices polynomials Alg. SQRT Others Prediction Guess of RSA-04 (09 digits) ratio Time(Node Year) (%) CPU GPU ES Sieve Processing 8 6*0 0^ Matrices 9 4* Polynomials Alg. SQRT Others Guess is based on Sieve 0, 0- Mat. 0, Others 0

ENHANCING THE PERFORMANCE OF FACTORING ALGORITHMS

ENHANCING THE PERFORMANCE OF FACTORING ALGORITHMS ENHANCING THE PERFORMANCE OF FACTORING ALGORITHMS GIVEN n FIND p 1,p 2,..,p k SUCH THAT n = p 1 d 1 p 2 d 2.. p k d k WHERE p i ARE PRIMES FACTORING IS CONSIDERED TO BE A VERY HARD. THE BEST KNOWN ALGORITHM

More information

CRYPTOGRAPHIC COMPUTING

CRYPTOGRAPHIC COMPUTING CRYPTOGRAPHIC COMPUTING ON GPU Chen Mou Cheng Dept. Electrical Engineering g National Taiwan University January 16, 2009 COLLABORATORS Daniel Bernstein, UIC, USA Tien Ren Chen, Army Tanja Lange, TU Eindhoven,

More information

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters Jonathan Lifflander, G. Carl Evans, Anshu Arya, Laxmikant Kale University of Illinois Urbana-Champaign May 25, 2012 Work is overdecomposed

More information

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark Block AIR Methods For Multicore and GPU Per Christian Hansen Hans Henrik B. Sørensen Technical University of Denmark Model Problem and Notation Parallel-beam 3D tomography exact solution exact data noise

More information

Solving PDEs with CUDA Jonathan Cohen

Solving PDEs with CUDA Jonathan Cohen Solving PDEs with CUDA Jonathan Cohen jocohen@nvidia.com NVIDIA Research PDEs (Partial Differential Equations) Big topic Some common strategies Focus on one type of PDE in this talk Poisson Equation Linear

More information

Gurgen Khachatrian Martun Karapetyan

Gurgen Khachatrian Martun Karapetyan 34 International Journal Information Theories and Applications, Vol. 23, Number 1, (c) 2016 On a public key encryption algorithm based on Permutation Polynomials and performance analyses Gurgen Khachatrian

More information

Introduction to numerical computations on the GPU

Introduction to numerical computations on the GPU Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming

More information

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique Claude Tadonki MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Monthly CRI Seminar MINES ParisTech - CRI June 06, 2016, Fontainebleau (France)

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

Bachet s equation and groups formed from solutions in Z p

Bachet s equation and groups formed from solutions in Z p Bachet s equation and groups formed from solutions in Z p Boise State University April 30, 2015 Elliptic Curves and Bachet s Equation Elliptic curves are of the form y 2 = x 3 + ax + b Bachet equations

More information

Direct Self-Consistent Field Computations on GPU Clusters

Direct Self-Consistent Field Computations on GPU Clusters Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd

More information

Dense Arithmetic over Finite Fields with CUMODP

Dense Arithmetic over Finite Fields with CUMODP Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,

More information

Public Key Encryption

Public Key Encryption Public Key Encryption 3/13/2012 Cryptography 1 Facts About Numbers Prime number p: p is an integer p 2 The only divisors of p are 1 and p s 2, 7, 19 are primes -3, 0, 1, 6 are not primes Prime decomposition

More information

S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA

S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA Date: 16th May 2012 Wed, 3pm to 3.25pm(Adv. Session) Sathyanarayana K., Manish Banga, and Ravi Kumar G. V. V. Engineering Services,

More information

Tips Geared Towards R. Adam J. Suarez. Arpil 10, 2015

Tips Geared Towards R. Adam J. Suarez. Arpil 10, 2015 Tips Geared Towards R Departments of Statistics North Carolina State University Arpil 10, 2015 1 / 30 Advantages of R As an interpretive and interactive language, developing an algorithm in R can be done

More information

Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer and GPU-Clusters --

Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer and GPU-Clusters -- Parallel Processing for Energy Efficiency October 3, 2013 NTNU, Trondheim, Norway Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer

More information

Experience in Factoring Large Integers Using Quadratic Sieve

Experience in Factoring Large Integers Using Quadratic Sieve Experience in Factoring Large Integers Using Quadratic Sieve D. J. Guan Department of Computer Science, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424 guan@cse.nsysu.edu.tw April 19, 2005 Abstract

More information

Multicore Parallelization of Determinant Quantum Monte Carlo Simulations

Multicore Parallelization of Determinant Quantum Monte Carlo Simulations Multicore Parallelization of Determinant Quantum Monte Carlo Simulations Andrés Tomás, Che-Rung Lee, Zhaojun Bai, Richard Scalettar UC Davis SIAM Conference on Computation Science & Engineering Reno, March

More information

Shortest Lattice Vector Enumeration on Graphics Cards

Shortest Lattice Vector Enumeration on Graphics Cards Shortest Lattice Vector Enumeration on Graphics Cards Jens Hermans 1 Michael Schneider 2 Fréderik Vercauteren 1 Johannes Buchmann 2 Bart Preneel 1 1 K.U.Leuven 2 TU Darmstadt SHARCS - 10 September 2009

More information

Ti Secured communications

Ti Secured communications Ti5318800 Secured communications Pekka Jäppinen September 20, 2007 Pekka Jäppinen, Lappeenranta University of Technology: September 20, 2007 Relies on use of two keys: Public and private Sometimes called

More information

S XMP LIBRARY INTERNALS. Niall Emmart University of Massachusetts. Follow on to S6151 XMP: An NVIDIA CUDA Accelerated Big Integer Library

S XMP LIBRARY INTERNALS. Niall Emmart University of Massachusetts. Follow on to S6151 XMP: An NVIDIA CUDA Accelerated Big Integer Library S6349 - XMP LIBRARY INTERNALS Niall Emmart University of Massachusetts Follow on to S6151 XMP: An NVIDIA CUDA Accelerated Big Integer Library High Performance Modular Exponentiation A^K mod P Where A,

More information

Accelerated Search for Gaussian Generator Based on Triple Prime Integers

Accelerated Search for Gaussian Generator Based on Triple Prime Integers Journal of Computer Science 5 (9): 614-618, 2009 ISSN 1549-3636 2009 Science Publications Accelerated Search for Gaussian Generator Based on Triple Prime Integers 1 Boris S. Verkhovsky and 2 Md Shiblee

More information

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric

More information

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization)

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) Schodinger equation: Hψ = Eψ Choose a basis set of wave functions Two cases: Orthonormal

More information

P214 Efficient Computation of Passive Seismic Interferometry

P214 Efficient Computation of Passive Seismic Interferometry P214 Efficient Computation of Passive Seismic Interferometry J.W. Thorbecke* (Delft University of Technology) & G.G. Drijkoningen (Delft University of Technology) SUMMARY Seismic interferometry is from

More information

Efficient implementation of the overlap operator on multi-gpus

Efficient implementation of the overlap operator on multi-gpus Efficient implementation of the overlap operator on multi-gpus Andrei Alexandru Mike Lujan, Craig Pelissier, Ben Gamari, Frank Lee SAAHPC 2011 - University of Tennessee Outline Motivation Overlap operator

More information

Toward High Performance Matrix Multiplication for Exact Computation

Toward High Performance Matrix Multiplication for Exact Computation Toward High Performance Matrix Multiplication for Exact Computation Pascal Giorgi Joint work with Romain Lebreton (U. Waterloo) Funded by the French ANR project HPAC Séminaire CASYS - LJK, April 2014 Motivations

More information

Congruence Classes. Number Theory Essentials. Modular Arithmetic Systems

Congruence Classes. Number Theory Essentials. Modular Arithmetic Systems Cryptography Introduction to Number Theory 1 Preview Integers Prime Numbers Modular Arithmetic Totient Function Euler's Theorem Fermat's Little Theorem Euclid's Algorithm 2 Introduction to Number Theory

More information

Overview. Background / Context. CSC 580 Cryptography and Computer Security. March 21, 2017

Overview. Background / Context. CSC 580 Cryptography and Computer Security. March 21, 2017 CSC 580 Cryptography and Computer Security Math for Public Key Crypto, RSA, and Diffie-Hellman (Sections 2.4-2.6, 2.8, 9.2, 10.1-10.2) March 21, 2017 Overview Today: Math needed for basic public-key crypto

More information

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013

More information

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA S7255: CUTT: A HIGH- PERFORMANCE TENSOR TRANSPOSE LIBRARY FOR GPUS Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA MOTIVATION Tensor contractions are the most computationally intensive part of quantum

More information

Introduction to Modern Cryptography. Lecture RSA Public Key CryptoSystem 2. One way Trapdoor Functions

Introduction to Modern Cryptography. Lecture RSA Public Key CryptoSystem 2. One way Trapdoor Functions Introduction to Modern Cryptography Lecture 7 1. RSA Public Key CryptoSystem 2. One way Trapdoor Functions Diffie and Hellman (76) New Directions in Cryptography Split the Bob s secret key K to two parts:

More information

Discrete Logarithm Problem

Discrete Logarithm Problem Discrete Logarithm Problem Çetin Kaya Koç koc@cs.ucsb.edu (http://cs.ucsb.edu/~koc/ecc) Elliptic Curve Cryptography lect08 discrete log 1 / 46 Exponentiation and Logarithms in a General Group In a multiplicative

More information

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay SP-CNN: A Scalable and Programmable CNN-based Accelerator Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay Motivation Power is a first-order design constraint, especially for embedded devices. Certain

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

Logic gates. Quantum logic gates. α β 0 1 X = 1 0. Quantum NOT gate (X gate) Classical NOT gate NOT A. Matrix form representation

Logic gates. Quantum logic gates. α β 0 1 X = 1 0. Quantum NOT gate (X gate) Classical NOT gate NOT A. Matrix form representation Quantum logic gates Logic gates Classical NOT gate Quantum NOT gate (X gate) A NOT A α 0 + β 1 X α 1 + β 0 A N O T A 0 1 1 0 Matrix form representation 0 1 X = 1 0 The only non-trivial single bit gate

More information

Security II: Cryptography exercises

Security II: Cryptography exercises Security II: Cryptography exercises Markus Kuhn Lent 2015 Part II Some of the exercises require the implementation of short programs. The model answers use Perl (see Part IB Unix Tools course), but you

More information

Cryptographic Hash Functions

Cryptographic Hash Functions Cryptographic Hash Functions Çetin Kaya Koç koc@ece.orst.edu Electrical & Computer Engineering Oregon State University Corvallis, Oregon 97331 Technical Report December 9, 2002 Version 1.5 1 1 Introduction

More information

Class invariants by the CRT method

Class invariants by the CRT method Class invariants by the CRT method Andreas Enge Andrew V. Sutherland INRIA Bordeaux-Sud-Ouest Massachusetts Institute of Technology ANTS IX Andreas Enge and Andrew Sutherland Class invariants by the CRT

More information

SOLUTION of linear systems of equations of the form:

SOLUTION of linear systems of equations of the form: Proceedings of the Federated Conference on Computer Science and Information Systems pp. Mixed precision iterative refinement techniques for the WZ factorization Beata Bylina Jarosław Bylina Institute of

More information

Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU

Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU Jacques M. Bahi, Raphaël Couturier, Christophe Guyeux, and Pierre-Cyrille Héam October 30, 2018 arxiv:1112.5239v1

More information

What is The Continued Fraction Factoring Method

What is The Continued Fraction Factoring Method What is The Continued Fraction Factoring Method? Slide /7 What is The Continued Fraction Factoring Method Sohail Farhangi June 208 What is The Continued Fraction Factoring Method? Slide 2/7 Why is Factoring

More information

Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures

Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures José I. Aliaga Performance and Energy Analysis of the Iterative Solution of Sparse

More information

Security Issues in Cloud Computing Modern Cryptography II Asymmetric Cryptography

Security Issues in Cloud Computing Modern Cryptography II Asymmetric Cryptography Security Issues in Cloud Computing Modern Cryptography II Asymmetric Cryptography Peter Schwabe October 21 and 28, 2011 So far we assumed that Alice and Bob both have some key, which nobody else has. How

More information

An Implementation of the MRRR Algorithm on a Data-Parallel Coprocessor

An Implementation of the MRRR Algorithm on a Data-Parallel Coprocessor An Implementation of the MRRR Algorithm on a Data-Parallel Coprocessor Christian Lessig Abstract The Algorithm of Multiple Relatively Robust Representations (MRRR) is one of the most efficient and accurate

More information

Diophantine equations via weighted LLL algorithm

Diophantine equations via weighted LLL algorithm Cryptanalysis of a public key cryptosystem based on Diophantine equations via weighted LLL algorithm Momonari Kudo Graduate School of Mathematics, Kyushu University, JAPAN Kyushu University Number Theory

More information

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics)

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Eftychios Sifakis CS758 Guest Lecture - 19 Sept 2012 Introduction Linear systems

More information

ECM at Work. Joppe W. Bos 1 and Thorsten Kleinjung 2. 1 Microsoft Research, Redmond, USA

ECM at Work. Joppe W. Bos 1 and Thorsten Kleinjung 2. 1 Microsoft Research, Redmond, USA ECM at Work Joppe W. Bos 1 and Thorsten Kleinjung 2 1 Microsoft Research, Redmond, USA 2 Laboratory for Cryptologic Algorithms, EPFL, Lausanne, Switzerland 1 / 18 Security assessment of public-key cryptography

More information

Elliptic Curve Computations

Elliptic Curve Computations Elliptic Curve Computations New Mexico Supercomputing Challenge Final Report April 1st, 2009 Team 66 Manzano High School Team Members Kristin Cordwell Chen Zhao Teacher Stephen Schum Project Mentor William

More information

Beam dynamics calculation

Beam dynamics calculation September 6 Beam dynamics calculation S.B. Vorozhtsov, Е.Е. Perepelkin and V.L. Smirnov Dubna, JINR http://parallel-compute.com Outline Problem formulation Numerical methods OpenMP and CUDA realization

More information

A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures

A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,

More information

On the complexity of computing discrete logarithms in the field F

On the complexity of computing discrete logarithms in the field F On the complexity of computing discrete logarithms in the field F 3 6 509 Francisco Rodríguez-Henríquez CINVESTAV-IPN Joint work with: Gora Adj Alfred Menezes Thomaz Oliveira CINVESTAV-IPN University of

More information

Gauss Sieve on GPUs. Shang-Yi Yang 1, Po-Chun Kuo 1, Bo-Yin Yang 2, and Chen-Mou Cheng 1

Gauss Sieve on GPUs. Shang-Yi Yang 1, Po-Chun Kuo 1, Bo-Yin Yang 2, and Chen-Mou Cheng 1 Gauss Sieve on GPUs Shang-Yi Yang 1, Po-Chun Kuo 1, Bo-Yin Yang 2, and Chen-Mou Cheng 1 1 Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan {ilway25,kbj,doug}@crypto.tw 2

More information

Cryptography and Security Midterm Exam

Cryptography and Security Midterm Exam Cryptography and Security Midterm Exam Serge Vaudenay 23.11.2017 duration: 1h45 no documents allowed, except one 2-sided sheet of handwritten notes a pocket calculator is allowed communication devices

More information

COMP424 Computer Security

COMP424 Computer Security COMP424 Computer Security Prof. Wiegley jeffw@csun.edu Rivest, Shamir & Adelman (RSA) Implementation 1 Relatively prime Prime: n, is prime if its only two factors are 1 and n. (and n 1). Relatively prime:

More information

9 Knapsack Cryptography

9 Knapsack Cryptography 9 Knapsack Cryptography In the past four weeks, we ve discussed public-key encryption systems that depend on various problems that we believe to be hard: prime factorization, the discrete logarithm, and

More information

Attacks on RSA & Using Asymmetric Crypto

Attacks on RSA & Using Asymmetric Crypto Attacks on RSA & Using Asymmetric Crypto Luke Anderson luke@lukeanderson.com.au 7 th April 2017 University Of Sydney Overview 1. Crypto-Bulletin 2. Breaking RSA 2.1 Chinese Remainder Theorem 2.2 Common

More information

CS-206 Concurrency. Lecture 13. Wrap Up. Spring 2015 Prof. Babak Falsafi parsa.epfl.ch/courses/cs206/

CS-206 Concurrency. Lecture 13. Wrap Up. Spring 2015 Prof. Babak Falsafi parsa.epfl.ch/courses/cs206/ CS-206 Concurrency Lecture 13 Wrap Up Spring 2015 Prof. Babak Falsafi parsa.epfl.ch/courses/cs206/ Created by Nooshin Mirzadeh, Georgios Psaropoulos and Babak Falsafi EPFL Copyright 2015 EPFL CS-206 Spring

More information

Quantum Computer Simulation Using CUDA (Quantum Fourier Transform Algorithm)

Quantum Computer Simulation Using CUDA (Quantum Fourier Transform Algorithm) Quantum Computer Simulation Using CUDA (Quantum Fourier Transform Algorithm) Alexander Smith & Khashayar Khavari Department of Electrical and Computer Engineering University of Toronto April 15, 2009 Alexander

More information

Scalable and Power-Efficient Data Mining Kernels

Scalable and Power-Efficient Data Mining Kernels Scalable and Power-Efficient Data Mining Kernels Alok Choudhary, John G. Searle Professor Dept. of Electrical Engineering and Computer Science and Professor, Kellogg School of Management Director of the

More information

8 Elliptic Curve Cryptography

8 Elliptic Curve Cryptography 8 Elliptic Curve Cryptography 8.1 Elliptic Curves over a Finite Field For the purposes of cryptography, we want to consider an elliptic curve defined over a finite field F p = Z/pZ for p a prime. Given

More information

Elliptic Curve Cryptography

Elliptic Curve Cryptography Elliptic Curve Cryptography Elliptic Curves An elliptic curve is a cubic equation of the form: y + axy + by = x 3 + cx + dx + e where a, b, c, d and e are real numbers. A special addition operation is

More information

Efficient Modular Exponentiation Based on Multiple Multiplications by a Common Operand

Efficient Modular Exponentiation Based on Multiple Multiplications by a Common Operand Efficient Modular Exponentiation Based on Multiple Multiplications by a Common Operand Christophe Negre, Thomas Plantard, Jean-Marc Robert Team DALI (UPVD) and LIRMM (UM2, CNRS), France CCISR, SCIT, (University

More information

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Jorge González-Domínguez*, Bertil Schmidt*, Jan C. Kässens**, Lars Wienbrandt** *Parallel and Distributed Architectures

More information

Public Key Algorithms

Public Key Algorithms Public Key Algorithms Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu Audio/Video recordings of this lecture are available at: http://www.cse.wustl.edu/~jain/cse571-09/

More information

Polynomial Selection. Thorsten Kleinjung École Polytechnique Fédérale de Lausanne

Polynomial Selection. Thorsten Kleinjung École Polytechnique Fédérale de Lausanne Polynomial Selection Thorsten Kleinjung École Polytechnique Fédérale de Lausanne Contents Brief summary of polynomial selection (no root sieve) Motivation (lattice sieving, monic algebraic polynomial)

More information

Randomized Selection on the GPU. Laura Monroe, Joanne Wendelberger, Sarah Michalak Los Alamos National Laboratory

Randomized Selection on the GPU. Laura Monroe, Joanne Wendelberger, Sarah Michalak Los Alamos National Laboratory Randomized Selection on the GPU Laura Monroe, Joanne Wendelberger, Sarah Michalak Los Alamos National Laboratory High Performance Graphics 2011 August 6, 2011 Top k Selection on GPU Output the top k keys

More information

Cryptography IV: Asymmetric Ciphers

Cryptography IV: Asymmetric Ciphers Cryptography IV: Asymmetric Ciphers Computer Security Lecture 7 David Aspinall School of Informatics University of Edinburgh 31st January 2011 Outline Background RSA Diffie-Hellman ElGamal Summary Outline

More information

On the design of parallel linear solvers for large scale problems

On the design of parallel linear solvers for large scale problems On the design of parallel linear solvers for large scale problems ICIAM - August 2015 - Mini-Symposium on Recent advances in matrix computations for extreme-scale computers M. Faverge, X. Lacoste, G. Pichon,

More information

Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster

Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster Yuta Hirokawa Graduate School of Systems and Information Engineering, University of Tsukuba hirokawa@hpcs.cs.tsukuba.ac.jp

More information

Parallel Sparse Tensor Decompositions using HiCOO Format

Parallel Sparse Tensor Decompositions using HiCOO Format Figure sources: A brief survey of tensors by Berton Earnshaw and NVIDIA Tensor Cores Parallel Sparse Tensor Decompositions using HiCOO Format Jiajia Li, Jee Choi, Richard Vuduc May 8, 8 @ SIAM ALA 8 Outline

More information

Mathematics of Public Key Cryptography

Mathematics of Public Key Cryptography Mathematics of Public Key Cryptography Eric Baxter April 12, 2014 Overview Brief review of public-key cryptography Mathematics behind public-key cryptography algorithms What is Public-Key Cryptography?

More information

Elliptic and Hyperelliptic Curves: a Practical Security Comparison"

Elliptic and Hyperelliptic Curves: a Practical Security Comparison Elliptic and Hyperelliptic Curves: a Practical Security Comparison Joppe W. Bos (Microsoft Research), Craig Costello (Microsoft Research),! Andrea Miele (EPFL) 1/13 Motivation and Goal(s)! Elliptic curves

More information

An Efficient Heuristic Algorithm for Linear Decomposition of Index Generation Functions

An Efficient Heuristic Algorithm for Linear Decomposition of Index Generation Functions An Efficient Heuristic Algorithm for Linear Decomposition of Index Generation Functions Shinobu Nagayama Tsutomu Sasao Jon T. Butler Dept. of Computer and Network Eng., Hiroshima City University, Hiroshima,

More information

FPGA Implementation of a Predictive Controller

FPGA Implementation of a Predictive Controller FPGA Implementation of a Predictive Controller SIAM Conference on Optimization 2011, Darmstadt, Germany Minisymposium on embedded optimization Juan L. Jerez, George A. Constantinides and Eric C. Kerrigan

More information

Homomorphic Encryption. Liam Morris

Homomorphic Encryption. Liam Morris Homomorphic Encryption Liam Morris Topics What Is Homomorphic Encryption? Partially Homomorphic Cryptosystems Fully Homomorphic Cryptosystems Benefits of Homomorphism Drawbacks of Homomorphism What Is

More information

The RSA Cipher and its Algorithmic Foundations

The RSA Cipher and its Algorithmic Foundations Chapter 1 The RSA Cipher and its Algorithmic Foundations The most important that is, most applied and most analyzed asymmetric cipher is RSA, named after its inventors Ron Rivest, Adi Shamir, and Len Adleman.

More information

Fast Scalar Multiplication on Elliptic Curves fo Sensor Nodes

Fast Scalar Multiplication on Elliptic Curves fo Sensor Nodes Réseaux Grand Est Fast Scalar Multiplication on Elliptic Curves fo Sensor Nodes Youssou FAYE Hervé GUYENNET Yanbo SHOU Université de Franche-Comté Besançon le 24 octobre 2013 TABLE OF CONTENTS ❶ Introduction

More information

Further improving security of Vector Stream Cipher

Further improving security of Vector Stream Cipher NOLTA, IEICE Paper Further improving security of Vector Stream Cipher Atsushi Iwasaki 1a) and Ken Umeno 2 1 Fukuoka Institute of Technology Wajiro-higashi, Higashiku, Fukuoka 811-0295, Japan 2 Graduate

More information

1 Recommended Reading 1. 2 Public Key/Private Key Cryptography Overview RSA Algorithm... 2

1 Recommended Reading 1. 2 Public Key/Private Key Cryptography Overview RSA Algorithm... 2 Contents 1 Recommended Reading 1 2 Public Key/Private Key Cryptography 1 2.1 Overview............................................. 1 2.2 RSA Algorithm.......................................... 2 3 A Number

More information

Information Security

Information Security SE 4472 / ECE 9064 Information Security Week 12: Random Number Generators and Picking Appropriate Key Lengths Fall 2015 Prof. Aleksander Essex Random Number Generation Where do keys come from? So far we

More information

Two case studies of Monte Carlo simulation on GPU

Two case studies of Monte Carlo simulation on GPU Two case studies of Monte Carlo simulation on GPU National Institute for Computational Sciences University of Tennessee Seminar series on HPC, Feb. 27, 2014 Outline 1 Introduction 2 Discrete energy lattice

More information

Elliptic Curves Cryptography and factorization. Part VIII. Elliptic curves cryptography and factorization. Historical Remarks.

Elliptic Curves Cryptography and factorization. Part VIII. Elliptic curves cryptography and factorization. Historical Remarks. Elliptic Curves Cryptography and factorization Part VIII Elliptic curves cryptography and factorization Cryptography based on manipulation of points of so called elliptic curves is getting momentum and

More information

The factorization of RSA D. J. Bernstein University of Illinois at Chicago

The factorization of RSA D. J. Bernstein University of Illinois at Chicago The factorization of RSA-1024 D. J. Bernstein University of Illinois at Chicago Abstract: This talk discusses the most important tools for attackers breaking 1024-bit RSA keys today and tomorrow. The same

More information

PuReMD-GPU: A Reactive Molecular Dynamic Simulation Package for GPUs

PuReMD-GPU: A Reactive Molecular Dynamic Simulation Package for GPUs Purdue University Purdue e-pubs Department of Computer Science Technical Reports Department of Computer Science 2012 PuReMD-GPU: A Reactive Molecular Dynamic Simulation Package for GPUs Sudhir B. Kylasa

More information

On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code

On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code E Calore, S F Schifano, R Tripiccione Enrico Calore INFN Ferrara, Italy 7 th Workshop on UnConventional High Performance

More information

Analysis of the RSA Encryption Algorithm

Analysis of the RSA Encryption Algorithm Analysis of the RSA Encryption Algorithm Betty Huang June 16, 2010 Abstract The RSA encryption algorithm is commonly used in public security due to the asymmetric nature of the cipher. The procedure is

More information

Efficient Leak Resistant Modular Exponentiation in RNS

Efficient Leak Resistant Modular Exponentiation in RNS Efficient Leak Resistant Modular Exponentiation in RNS Andrea Lesavourey (1), Christophe Negre (1) and Thomas Plantard (2) (1) DALI (UPVD) and LIRMM (Univ. of Montpellier, CNRS), Perpignan, France (2)

More information

An Implementation of the MRRR Algorithm on a Data-Parallel Coprocessor

An Implementation of the MRRR Algorithm on a Data-Parallel Coprocessor An Implementation of the MRRR Algorithm on a Data-Parallel Coprocessor Christian Lessig Abstract The Algorithm of Multiple Relatively Robust Representations (MRRRR) is one of the most efficient and most

More information

verdex pro Series Performance and Power Technical Reference

verdex pro Series Performance and Power Technical Reference verdex pro Series Performance and Power Technical Reference Operating Temperature PXA270-based verdex pro and verdex COMs PXA255-based basix and connex motherboards Summary Performance Benchmarks: Verdex

More information

Correctness, Security and Efficiency of RSA

Correctness, Security and Efficiency of RSA Correttezza di RSA Correctness, Security and Efficiency of RSA Ozalp Babaoglu! Bisogna dimostrare D(C(m)) = m ALMA MATER STUDIORUM UNIVERSITA DI BOLOGNA 2 Correttezza di RSA Correttezza di RSA! Risultati

More information

An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors

An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Contemporary Mathematics Volume 218, 1998 B 0-8218-0988-1-03024-7 An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Michel Lesoinne

More information

BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power

BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power James C. Hoe Department of ECE Carnegie Mellon niversity Eric S. Chung, et al., Single chip Heterogeneous Computing:

More information

ERLANGEN REGIONAL COMPUTING CENTER

ERLANGEN REGIONAL COMPUTING CENTER ERLANGEN REGIONAL COMPUTING CENTER Making Sense of Performance Numbers Georg Hager Erlangen Regional Computing Center (RRZE) Friedrich-Alexander-Universität Erlangen-Nürnberg OpenMPCon 2018 Barcelona,

More information

The Lattice Boltzmann Method for Laminar and Turbulent Channel Flows

The Lattice Boltzmann Method for Laminar and Turbulent Channel Flows The Lattice Boltzmann Method for Laminar and Turbulent Channel Flows Vanja Zecevic, Michael Kirkpatrick and Steven Armfield Department of Aerospace Mechanical & Mechatronic Engineering The University of

More information

Cpt S 223. School of EECS, WSU

Cpt S 223. School of EECS, WSU Algorithm Analysis 1 Purpose Why bother analyzing code; isn t getting it to work enough? Estimate time and memory in the average case and worst case Identify bottlenecks, i.e., where to reduce time Compare

More information

AUTHOR QUERY FORM. Fax: For correction or revision of any artwork, please consult

AUTHOR QUERY FORM. Fax: For correction or revision of any artwork, please consult Our reference: YJCPH 52 P-authorquery-v7 AUTHOR QUERY FORM Journal: YJCPH Please e-mail or fax your responses and any corrections to: Article Number: 52 E-mail: corrections.esch@elsevier.vtex.lt Fax: +

More information

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Welcome to MCS 572. content and organization expectations of the course. definition and classification Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson

More information

CPSC 467b: Cryptography and Computer Security

CPSC 467b: Cryptography and Computer Security CPSC 467b: Cryptography and Computer Security Michael J. Fischer Lecture 10 February 19, 2013 CPSC 467b, Lecture 10 1/45 Primality Tests Strong primality tests Weak tests of compositeness Reformulation

More information

Level-3 BLAS on a GPU

Level-3 BLAS on a GPU Level-3 BLAS on a GPU Picking the Low Hanging Fruit Francisco Igual 1 Gregorio Quintana-Ortí 1 Robert A. van de Geijn 2 1 Departamento de Ingeniería y Ciencia de los Computadores. University Jaume I. Castellón

More information