Quantile Precision Issues in CUDA

Thomas Luu and William Shaw UCL, Dec Corrections to

Set up and Introduction

This Mathematica notebook uses the high-precision arithmetic in Mathematica and its CUDALink tools to investigate the precision of kernels for the normal quantile. First we load our high-precision benchmark. Note that this has been verified to >24 significant figures by comparison with the Steinbrecher-Shaw analysis (EJAM 200).

In[1]:=
... := ... u - 1]

Next we load the Mathematica CUDALink:

In[5]:=
Needs["CUDALink`"]

In[6]:=
CUDAQ[]

Out[6]= True

In[7]:=
CUDAInformation[]

Out[7]= {1 -> {Name -> Quadro 4000, Clock Rate -> , Compute Capabilities -> 2., GPU Overlap -> 1, Maximum Block Dimensions -> {1024, 1024, 64}, Maximum Grid Dimensions -> { , , }, Maximum Threads Per Block -> 1024, Maximum Shared Memory Per Block -> , Total Constant Memory -> , Warp Size -> 32, Maximum Pitch -> , Maximum Registers Per Block -> 32 76, Texture Alignment -> 512, Multiprocessor Count -> , Core Count -> 256, Execution Timeout -> 1, Integrated -> False, Can Map Host Memory -> True, Compute Mode -> Default, Texture1D Width -> , Texture2D Width -> , Texture2D Height -> , Texture3D Width -> 204, Texture3D Height -> 204, Texture3D Depth -> 204, Texture2D Array Width -> 16 34, Texture2D Array Height -> 16 34, Texture2D Array Slices -> 204, Surface Alignment -> 512, Concurrent Kernels -> True, ECC Enabled -> False, TCC Enabled -> False, Total Memory -> }}

CUDAFunctions for the quantile - float mode

Here is the one built into CUDA 4:

In[20]:=
kernelcuda = CUDAFunctionLoad["
__global__ void cuda_norminvf_kernel(float *in, float *out)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    float u = in[i];
    out[i] = (float)M_SQRT2 * erfinvf(2.0f*u - 1.0f);
}
", "cuda_norminvf_kernel", {{"Float"}, {"Float"}}, 512];

Here is one for the kernel of Appendix A of Shaw-Luu-Brickman 2011:

In[21]:=
kernelws = CUDAFunctionLoad["
inline __device__ float ws_norminvf(float u)
{
    float half_minus_u = 0.5f - u;
    float v, p, q;
    float one_minus_x = copysignf(2.0f*u, half_minus_u);
    if (half_minus_u < 0.0f) one_minus_x += 2.0f;
    v = - __logf(one_minus_x);
    p = e-4f; p = p*v f; p = p*v f; p = p*v f; p = p*v f; p = p*v f;
    q = e-6f; q = q*v f; q = q*v f; q = q*v f; q = q*v f; q = q*v f; q = q*v + 1.0f;
    return - __fdividef(p, q) * copysignf(v, half_minus_u);
}
__global__ void ws_norminvf_kernel(float *in, float *out)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    float u = in[i];
    out[i] = ws_norminvf(u);
}
", "ws_norminvf_kernel", {{"Float"}, {"Float"}}, 512];

Here are the kernels based on the paper by Giles and the web site by Acklam, for float operation:

In[]:=
kernelmg = CUDAFunctionLoad["
inline __device__ float MBG_erfinv(float x)
{
    float w, p;
    w = - __logf((1.0f-x)*(1.0f+x));
    if ( w < f ) {
        w = w f;
        p = e-0f;
        p = e-07f + p*w; p = e-06f + p*w; p = e-06f + p*w;
        p = f + p*w; p = f + p*w; p = f + p*w; p = f + p*w; p = f + p*w;
    }
    else {
        w = sqrtf(w) f;
        p = f;
        p = f + p*w; p = f + p*w; p = f + p*w; p = f + p*w;
        p = f + p*w; p = f + p*w; p = f + p*w; p = f + p*w;
    }
    return p*x;
}
__global__ void mg_norminvf(float *in, float *out)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    float u = in[i];
    out[i] = (float)M_SQRT2 * MBG_erfinv(2.0f*u - 1.0f);
}
", "mg_norminvf", {{"Float"}, {"Float"}}, 512];
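Both float kernels above start by mapping the uniform input to a logarithmic variable before evaluating a polynomial or rational function. The following is a minimal Python sketch of just those two changes of variable, written in double precision on the host. The function names `slb_v` and `giles_w` are hypothetical and do not appear in the notebook, and the sketch deliberately omits the approximation coefficients, which are not recoverable from this transcription.

```python
import math

def slb_v(u):
    """v = -log(2*min(u, 1-u)), using the same copysign trick as ws_norminvf."""
    half_minus_u = 0.5 - u
    # copysign gives +2u for u <= 0.5 and -2u for u > 0.5
    one_minus_x = math.copysign(2.0 * u, half_minus_u)
    if half_minus_u < 0.0:
        one_minus_x += 2.0          # -2u + 2 = 2(1 - u) in the right half
    return -math.log(one_minus_x)

def giles_w(x):
    """w = -log((1 - x)(1 + x)) for x = 2u - 1 in (-1, 1), as in MBG_erfinv."""
    return -math.log((1.0 - x) * (1.0 + x))

for u in (0.25, 0.75, 1e-6):
    x = 2.0 * u - 1.0
    print(u, slb_v(u), giles_w(x))
```

With x = 2u - 1 the two variables differ only by log(1 + |x|), so both grow like -log(u) in the left tail; the copysign form avoids a data-dependent branch on which half of the interval u lies in.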

In[9]:=
kernelacklam = CUDAFunctionLoad["
__global__ void Acklamsingle(float * aa, float * bb)
{
    const float a[6] = { e+01f, e+02f, e+02f, e+02f, e+01f, e+00f };
    const float b[5] = { e+01f, e+02f, e+02f, e+01f, e+01f };
    const float c[6] = { e-03f, e-01f, e+00f, e+00f, e+00f, e+00f };
    const float d[4] = { e-03f, e-01f, e+00f, e+00f };
    float p, q, t, u;
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    p = aa[idx];
    if (p < 1.0f-p) q = p; else q = 1.0f-p;
    if (q > f) {
        /* Rational approximation for central region. */
        u = q-0.5f;
        t = u*u;
        u = u*(((((a[0]*t+a[1])*t+a[2])*t+a[3])*t+a[4])*t+a[5])
             /(((((b[0]*t+b[1])*t+b[2])*t+b[3])*t+b[4])*t+1);
    }
    else {
        /* Rational approximation for tail region. */
        t = __fsqrt_rn(-2* __logf(q));
        u = (((((c[0]*t+c[1])*t+c[2])*t+c[3])*t+c[4])*t+c[5])
            /((((d[0]*t+d[1])*t+d[2])*t+d[3])*t+1);
    }
    /* The relative error of the approximation has absolute value less than 1.15e-9.
       One iteration of Halley's rational method (third order) gives full machine precision... */
    if (p>0.5f) bb[idx] = -u; else bb[idx] = u;
}
", "Acklamsingle", {{"Float"}, {"Float"}}, 512];

Relative error plots in left region

setup

In[43]:=
uniforms = Table[10^-i, {i, 31/100, 14, 1/100}] // Reverse // N;
n = uniforms // Length

Out[44]= 1370

In[45]:=
luniforms = Log[10, uniforms];

In[46]:=
exact = normalQuantile[uniforms];

In[47]:=
gpuUniforms = CUDAMemoryLoad[uniforms, "TargetPrecision" -> "Single"];
gpuNormals = CUDAMemoryAllocate["Float", n];
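The test points are powers of ten on an evenly spaced exponent grid, which is why the plots are uniform in log10(u). As a rough cross-check of the grid sizes reported by the notebook, here is a small Python sketch that rebuilds the same grids with exact rational exponents. The helper names `exponent_grid` and `log_spaced_uniforms` are made up for this illustration.

```python
from fractions import Fraction

def exponent_grid(start, stop, step):
    """Exponents start, start+step, ... not exceeding stop, as exact rationals."""
    count = (stop - start) // step + 1      # floor division of Fractions gives an int
    return [start + k * step for k in range(count)]

def log_spaced_uniforms(start, stop, step):
    """10^-i over the exponent grid, smallest value first (the effect of Reverse)."""
    exps = exponent_grid(start, stop, step)
    return [10.0 ** float(-e) for e in reversed(exps)]

# Grid used for the single-precision plots
single = log_spaced_uniforms(Fraction(31, 100), Fraction(14), Fraction(1, 100))
# Grids used in the double-precision sections
double_wide = log_spaced_uniforms(Fraction(30, 100), Fraction(30), Fraction(11, 1000))
double_plot = log_spaced_uniforms(Fraction(30, 100), Fraction(20), Fraction(11, 1000))

print(len(single), len(double_wide), len(double_plot))   # 1370 2701 1791
```

Using `Fraction` for the exponents mirrors Mathematica's exact rational iterator and avoids an off-by-one point count from accumulated floating-point steps.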

CUDA 4 built in

In[49]:=
kernelcuda[gpuUniforms, gpuNormals];
ListPlot[Transpose[{luniforms, Log[10, Abs[normals/exact - 1]]}], PlotRange -> { , 0}, Joined -> True, InterpolationOrder -> 1, PlotLabel -> Style["CUDA Quantile Realized Log_10 Error - Left Tail", 16, Bold], LabelStyle -> Directive[Bold, 14]]

Out[51]= [Plot: CUDA Quantile Realized Log_10 Error - Left Tail]

SLB 2011

In[52]:=
kernelws[gpuUniforms, gpuNormals];
back = CUDAMemoryGet[gpuUniforms];
ListPlot[Transpose[{luniforms, Log[10, Abs[normals/exact - 1]]}], PlotRange -> { , 0}, Joined -> True, InterpolationOrder -> 1, PlotLabel -> Style["(6,6) Quantile Realized Log_10 Error - Left Tail", 16, Bold], LabelStyle -> Directive[Bold, 14]]

Out[55]= [Plot: (6,6) Quantile Realized Log_10 Error - Left Tail]
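Each plot shows log10 of the realized relative error, |normals/exact - 1|, against log10(u). The Python sketch below reproduces that metric on the host and estimates the best level a single-precision kernel can reach, by rounding a double-precision quantile to IEEE single. It is only an illustration: `statistics.NormalDist().inv_cdf` stands in for the notebook's high-precision benchmark, and the helper names `to_float32` and `log10_rel_error` are assumptions of this sketch.

```python
import math
import struct
from statistics import NormalDist

def to_float32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

def log10_rel_error(approx, exact):
    """log10|approx/exact - 1|, or -inf when the two values agree exactly."""
    err = abs(approx / exact - 1.0)
    return math.log10(err) if err > 0.0 else float("-inf")

# Unit roundoff of IEEE single: no float result can be expected to beat this level
floor_single = math.log10(2.0 ** -24)

std_normal = NormalDist()
for u in (1e-2, 1e-6, 1e-12):
    exact = std_normal.inv_cdf(u)       # double-precision stand-in for the benchmark
    best_float = to_float32(exact)      # correctly rounded single-precision value
    print(u, log10_rel_error(best_float, exact), floor_single)
```

A float kernel whose curve sits near log10(2^-24), roughly -7.2, is therefore close to the limit of the format, and curves well above that level indicate approximation error rather than rounding.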

Giles

In[56]:=
kernelmg[gpuUniforms, gpuNormals];
ListPlot[Transpose[{luniforms, Log[10, Abs[normals/exact - 1]]}], PlotRange -> { , 0}, Joined -> True, InterpolationOrder -> 1, PlotLabel -> Style["Giles Quantile Realized Log_10 Error - Left Tail", 16, Bold], LabelStyle -> Directive[Bold, 14]]

Out[5]= [Plot: Giles Quantile Realized Log_10 Error - Left Tail]

Acklam

In[59]:=
kernelacklam[gpuUniforms, gpuNormals];
ListPlot[Transpose[{luniforms, Log[10, Abs[normals/exact - 1]]}], PlotRange -> { , 0}, Joined -> True, InterpolationOrder -> 1, PlotLabel -> Style["Acklam Quantile Realized Log_10 Error - Left Tail", 16, Bold], LabelStyle -> Directive[Bold, 14]]

Out[61]= [Plot: Acklam Quantile Realized Log_10 Error - Left Tail]

CUDAMemoryUnload[gpuUniforms]
CUDAMemoryUnload[gpuNormals]
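The comment inside the Acklam kernel mentions that one step of Halley's third-order method can refine the rational approximation; the float kernel in this notebook does not apply that step. For readers who want to see what such a step looks like, here is a hedged double-precision sketch in Python. It assumes the standard form of Halley's iteration for f(x) = Phi(x) - p, with Phi evaluated through `math.erfc`; the function name `halley_refine` and the starting value are illustrative only.

```python
import math
from statistics import NormalDist

def halley_refine(x, p):
    """One Halley step for Phi(x) = p, where Phi is the standard normal CDF."""
    # Residual Phi(x) - p, using erfc so the left tail keeps its relative accuracy
    e = 0.5 * math.erfc(-x / math.sqrt(2.0)) - p
    # Residual divided by the density phi(x) = exp(-x^2/2) / sqrt(2*pi)
    u = e * math.sqrt(2.0 * math.pi) * math.exp(0.5 * x * x)
    # Halley update; since phi'(x) = -x*phi(x), the correction factor is 1 + x*u/2
    return x - u / (1.0 + 0.5 * x * u)

p = 1e-3
x0 = -3.09                              # deliberately rough starting value
x1 = halley_refine(x0, p)
reference = NormalDist().inv_cdf(p)     # library quantile, used only for comparison
print(abs(x0 - reference), abs(x1 - reference))
```

Because the step needs an accurate CDF evaluation and an exponential, it roughly doubles the cost of the kernel, which is one reason a GPU implementation might prefer a higher-order direct approximation instead.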

Double work

In[63]:=
uniforms = Table[SetPrecision[10^-i, 20], {i, 30/100, 30, 11/1000}] // Reverse;
n = uniforms // Length

Out[64]= 2701

In[65]:=
luniforms = Log[10, uniforms];

DP kernels

In[73]:=
kernelcudadp = CUDAFunctionLoad["
__global__ void cuda_norminv_kernel(double *in, double *out)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    double u = in[i];
    out[i] = M_SQRT2 * erfinv(2.0*u - 1.0);
}
", "cuda_norminv_kernel", {{"Double"}, {"Double"}}, 512];

In[74]:=
kernelas241 = CUDAFunctionLoad["
__device__ double rpoly_value ( int n, double a[], double x )
/*
  Purpose:
    RPOLY_VALUE evaluates a double precision polynomial.
  Discussion:
    For sanity's sake, the value of N indicates the NUMBER of coefficients,
    or more precisely, the ORDER of the polynomial, rather than the DEGREE
    of the polynomial. The two quantities differ by 1, but cause a great
    deal of confusion.
    Given N and A, the form of the polynomial is:
      p(x) = a[0] + a[1] * x + ... + a[n-2] * x^(n-2) + a[n-1] * x^(n-1)
  Licensing:
    This code is distributed under the GNU LGPL license.
  Modified:
    13 August 2004
  Author:
    John Burkardt
  Parameters:
    Input, int N, the order of the polynomial.
    Input, double A[N], the coefficients of the polynomial. A[0] is the constant term.
    Input, double X, the point at which the polynomial is to be evaluated.
    Output, double RPOLY_VALUE, the value of the polynomial at X.
*/
{
    int i;
    double value;
    value = 0.0;
    for ( i = n-1; 0 <= i; i-- )
    {
        value = value * x + a[i];
    }
    return value;
}
__global__ void AS241gpu(double * aa, double * bb)
/*
  This GPU code adapted from JB's function (his comments reproduced here):
    double r_normal_01_cdf_inverse ( double p )
  Purpose:
    R_NORMAL_01_CDF_INVERSE inverts the standard normal CDF.
  Discussion:
    The result is accurate to about 1 part in 10**16.
  Modified:
    27 December 2004
  Author:
    Original FORTRAN77 version by Michael Wichura.
    C++ version by John Burkardt.
  Reference:
    Michael Wichura, The Percentage Points of the Normal Distribution,
    Algorithm AS 241, Applied Statistics, Volume 37, Number 3, pages , 19.
  Parameters:
    Input, double P, the value of the cumulative probability density function.
    0 < P < 1. If P is outside this range, an \"infinite\" value is returned.
    Output, double R_NORMAL_01_CDF_INVERSE, the normal deviate value with the
    property that the probability of a standard normal deviate being less than
    or equal to this value is P.
*/
{
    double a[8] = { , e+2, e+3, e+4, e+4, e+4, e+4, e+3 };
    double b[8] = { 1.0, e+1, e+2, e+3, e+4, e+4, e+4, e+3 };
    double c[8] = { , , , , , e-1, e-2, e-4 };
    double const1 = ;
    double const2 = 1.6;
    double d[8] = { 1.0, , , e-1, e-1, e-2, e-4, e-9 };
    double e[8] = { , , , e-1, e-2, e-3, e-5, e-7 };
    double f[8] = { 1.0, e-1, e-1, e-2, e-4, e-5, e-7, e-15 };
    double p, q, absq;
    double r;
    double split1 = 0.425;
    double split2 = 5.0;
    double value;
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    p = aa[idx];
    q = p - 0.5;
    if ( q <= 0 ) absq = -q; else absq = q;
    if ( absq <= split1 )
    {
        r = const1 - q * q;
        value = q * rpoly_value ( 8, a, r ) / rpoly_value ( 8, b, r );
    }
    else
    {
        /* r is the smaller tail probability, min(p, 1 - p) */
        if ( q < 0.0 ) r = p; else r = 1.0 - p;
        r = sqrt ( -log ( r ) );
        if ( r <= split2 )
        {
            r = r - const2;
            value = rpoly_value ( 8, c, r ) / rpoly_value ( 8, d, r );
        }
        else
        {
            r = r - split2;
            value = rpoly_value ( 8, e, r ) / rpoly_value ( 8, f, r );
        }
        if ( q < 0.0 ) value = -value;
    }
    bb[idx] = value;
}
", "AS241gpu", {{"Double"}, {"Double"}}, 512];

In[69]:=
kernelwsexpdp = CUDAFunctionLoad["
__global__ void ws_norminv_exp_42(double *in, double *out)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    double u = in[i];
    double half_minus_u = 0.5 - u;
    double v, p, q;
    double x = copysign(2.0*u, half_minus_u);
    if (half_minus_u < 0.0) x += 2.0;
    v = -log(x);
    p = e-14; p = p*v e-11; p = p*v e-; p = p*v e-6; p = p*v e-4; p = p*v e-3; p = p*v e-2; p = p*v e-1;
    p = p*v ; p = p*v ; p = p*v ; p = p*v ; p = p*v ; p = p*v ;
    q = e-13; q = q*v e; q = q*v e-7; q = q*v e-5; q = q*v e-4; q = q*v e-2; q = q*v e-1;
    q = q*v ; q = q*v ; q = q*v ; q = q*v ; q = q*v ; q = q*v ; q = q*v + 1.0;
    out[i] = p / q * copysign(v, -half_minus_u);
}
", "ws_norminv_exp_42", {{"Double"}, {"Double"}}, 512];

In[70]:=
kernelwsdp = CUDAFunctionLoad["
inline __device__ double ws_norminv(double u)
{
    double u_minus_half = u - 0.5;
    double v, p, q;
    v = u_minus_half * rsqrt( __fma_rn(-u, u, u));   // (u-0.5)/sqrt(u-u^2)
    v = copysign(v, 0.0);
    if ( __all(v < 15.5))
    {
        // just use primary transformation
        p = e-; p = p*v e-6;
        p = p*v ; p = p*v ; p = p*v ; p = p*v ; p = p*v ; p = p*v ; p = p*v ;
        p = p*v ; p = p*v ; p = p*v ; p = p*v ; p = p*v ; p = p*v ;
        q = e-9; q = q*v e-6;
        q = q*v ; q = q*v ; q = q*v ; q = q*v ; q = q*v ; q = q*v ; q = q*v ;
        q = q*v ; q = q*v ; q = q*v ; q = q*v ; q = q*v ; q = q*v ; q = q*v + 1.0;
    }
    else
    {
        // fallback to exponential transformation
        /* double one_minus_x = copysign(2.0*u, -u_minus_half);
           if (u_minus_half > 0.0) one_minus_x += 2.0;
           v = -log(one_minus_x); */
        /* */
        double x = copysign(2.0*u, u_minus_half);
        x -= copysign(1.0, u_minus_half);
        v = -log(fma(-1.0, x, 1.0));
        p = e-14; p = p*v e-11; p = p*v e-; p = p*v e-6; p = p*v e-4; p = p*v e-3; p = p*v e-2; p = p*v e-1;
        p = p*v ; p = p*v ; p = p*v ; p = p*v ; p = p*v ; p = p*v ;
        q = e-13; q = q*v e; q = q*v e-7; q = q*v e-5; q = q*v e-4; q = q*v e-2; q = q*v e-1;
        q = q*v ; q = q*v ; q = q*v ; q = q*v ; q = q*v ; q = q*v ; q = q*v + 1.0;
    }
    return p / q * copysign(v, u_minus_half);
    return p * __drcp_rn(q) * copysign(v, u_minus_half);
}
__global__ void ws_norminv_kernel(double *in, double *out)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    double u = in[i];
    out[i] = ws_norminv(u);
}
", "ws_norminv_kernel", {{"Double"}, {"Double"}}, 512];

Precision plots

In[7]:=
uniforms = Table[SetPrecision[10^-i, 20], {i, 30/100, 20, 11/1000}] // Reverse;
n = uniforms // Length

Out[]= 1791

In[9]:=
luniforms = Log[10, uniforms];

In[90]:=
exact = SetPrecision[normalQuantile[uniforms], 20];

In[91]:=
exact[[2]]

Out[91]=

In[92]:=
gpuUniforms = CUDAMemoryLoad[uniforms];
gpuNormals = CUDAMemoryAllocate["Double", n];

In[94]:=
exact[[2]]

Out[94]=

In[95]:=
Log[10, 2^(-54)] // N

Out[95]=

AS241

In[116]:=
gpuUniforms = CUDAMemoryLoad[uniforms];
gpuNormals = CUDAMemoryAllocate["Double", n];

In[11]:=
kernelas241[gpuUniforms, gpuNormals];
normals = SetPrecision[normals, 20];
ListPlot[Transpose[{luniforms, Log[10, Abs[normals/exact - 1]]}], PlotRange -> {-20, 0}, Joined -> True, InterpolationOrder -> 1, PlotLabel -> Style["AS241 Quantile Realized Log_10 Error - Left Tail", 16, Bold], LabelStyle -> Directive[Bold, 14], Epilog -> Line[{{ , -20}, { , 0}}]]

Out[121]= [Plot: AS241 Quantile Realized Log_10 Error - Left Tail]

CUDA 4

In[96]:=
kernelcudadp[gpuUniforms, gpuNormals];
normals = SetPrecision[normals, 20];
ListPlot[Transpose[{luniforms, Log[10, Abs[normals/exact - 1]]}], PlotRange -> {-20, 0}, Joined -> True, InterpolationOrder -> 1, PlotLabel -> Style["CUDA Quantile Realized Log_10 Error - Left Tail", 16, Bold], LabelStyle -> Directive[Bold, 14], Epilog -> Line[{{ , -20}, { , 0}}]]

Out[99]= [Plot: CUDA Quantile Realized Log_10 Error - Left Tail]
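For a quick CPU-side sanity check of double-precision behaviour in the left tail, without a GPU or Mathematica, a round trip through the CDF can be sketched in Python. To my understanding, CPython's `statistics.NormalDist.inv_cdf` is itself based on Wichura's AS241, so it is a natural stand-in for the AS241 kernel, but treat that as an assumption of this sketch rather than a statement about the notebook. The helper names `normal_cdf` and `round_trip_rel_error` are hypothetical.

```python
import math
from statistics import NormalDist

def normal_cdf(x):
    """Standard normal CDF via erfc, which keeps relative accuracy in the left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def round_trip_rel_error(u):
    """|Phi(Phi^{-1}(u))/u - 1| using the standard-library quantile."""
    x = NormalDist().inv_cdf(u)
    return abs(normal_cdf(x) / u - 1.0)

# The reference level evaluated in the notebook as Log[10, 2^(-54)] // N
reference_level = math.log10(2.0 ** -54)
print("log10(2^-54) =", reference_level)

for exponent in (2, 6, 10, 14):
    u = 10.0 ** -exponent
    print(exponent, round_trip_rel_error(u))
```

The round-trip error in u is larger than the error in the quantile itself by a factor of roughly x^2, because the CDF is steep relative to its own size in the tail, so this check is coarser than the notebook's direct comparison against a 20-digit benchmark.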

SLB Appendix B

In[100]:=
kernelwsexpdp[gpuUniforms, gpuNormals];
normals = SetPrecision[normals, 20];
ListPlot[Transpose[{luniforms, Log[10, Abs[normals/exact - 1]]}], PlotRange -> {-20, 0}, Joined -> True, InterpolationOrder -> 1, PlotLabel -> Style["Branchless Quantile Realized Log_10 Error - Left Tail", 16, Bold], LabelStyle -> Directive[Bold, 14], Epilog -> Line[{{ , -20}, { , 0}}]]

Out[103]= [Plot: Branchless Quantile Realized Log_10 Error - Left Tail]

SLB Appendix C (Student t hybrid)

In[104]:=
kernelwsdp[gpuUniforms, gpuNormals];
normals = SetPrecision[normals, 20];
ListPlot[Transpose[{luniforms, Log[10, Abs[normals/exact - 1]]}], PlotRange -> {-20, 0}, Joined -> True, InterpolationOrder -> 1, PlotLabel -> Style["T2 Hybrid Quantile Realized Log_10 Error - Left Tail", 16, Bold], LabelStyle -> Directive[Bold, 14], Epilog -> Line[{{ , -20}, { , 0}}]]

Out[107]= [Plot: T2 Hybrid Quantile Realized Log_10 Error - Left Tail]

Timing reminder

In double precision the timings on a Quadro 4000 for a standard batch were:

AS241: ms
CUDA: ms
SLB branchless: 117ms
SLB hybrid: 933ms

Timings on a C2050 are usually less than half of those on the Quadro 4000.


More information

CRYPTOGRAPHIC COMPUTING

CRYPTOGRAPHIC COMPUTING CRYPTOGRAPHIC COMPUTING ON GPU Chen Mou Cheng Dept. Electrical Engineering g National Taiwan University January 16, 2009 COLLABORATORS Daniel Bernstein, UIC, USA Tien Ren Chen, Army Tanja Lange, TU Eindhoven,

More information

AUTHOR QUERY FORM. Fax: For correction or revision of any artwork, please consult

AUTHOR QUERY FORM. Fax: For correction or revision of any artwork, please consult Our reference: YJCPH 52 P-authorquery-v7 AUTHOR QUERY FORM Journal: YJCPH Please e-mail or fax your responses and any corrections to: Article Number: 52 E-mail: corrections.esch@elsevier.vtex.lt Fax: +

More information

ENGG 1203 Tutorial_9 - Review. Boolean Algebra. Simplifying Logic Circuits. Combinational Logic. 1. Combinational & Sequential Logic

ENGG 1203 Tutorial_9 - Review. Boolean Algebra. Simplifying Logic Circuits. Combinational Logic. 1. Combinational & Sequential Logic ENGG 1203 Tutorial_9 - Review Boolean Algebra 1. Combinational & Sequential Logic 2. Computer Systems 3. Electronic Circuits 4. Signals, Systems, and Control Remark : Multiple Choice Questions : ** Check

More information

Background. Another interests. Sieve method. Parallel Sieve Processing on Vector Processor and GPU. RSA Cryptography

Background. Another interests. Sieve method. Parallel Sieve Processing on Vector Processor and GPU. RSA Cryptography Background Parallel Sieve Processing on Vector Processor and GPU Yasunori Ushiro (Earth Simulator Center) Yoshinari Fukui (Earth Simulator Center) Hidehiko Hasegawa (Univ. of Tsukuba) () RSA Cryptography

More information

Code Generation for GPU Accelerators in the Domain of Image Preprocessing

Code Generation for GPU Accelerators in the Domain of Image Preprocessing Code Generation for GPU Accelerators in the Domain of Image Preprocessing Oliver Reiche, Richard Membarth, Frank Hannig, and Jürgen Teich Hardware/Software Co-Design, University of Erlangen-Nuremberg Dagstuhl,

More information

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU Khramtsov D.P., Nekrasov D.A., Pokusaev B.G. Department of Thermodynamics, Thermal Engineering and Energy Saving Technologies,

More information

Lecture Notes 1: Platonic Convergence and the Central Limit Theorem

Lecture Notes 1: Platonic Convergence and the Central Limit Theorem Lecture Notes : Platonic Convergence and the Central Limit Theorem ) An erroneous notion of limit: Take the standard formulation of the Central Limit Theorem (Feller 97, Vol. II;Grimmet & Stirzaker, 98):

More information

A Polynomial-Time Algorithm for Memory Space Reduction

A Polynomial-Time Algorithm for Memory Space Reduction A Polynomial-Time Algorithm for Memory Space Reduction Yonghong Song Cheng Wang Zhiyuan Li Sun Microsystems, Inc. Department of Computer Sciences 4150 Network Circle Purdue University Santa Clara, CA 95054

More information

Mathematica expressesvectors, matrices, and tensorsin the form of lists. For example, a one dimensional list is :

Mathematica expressesvectors, matrices, and tensorsin the form of lists. For example, a one dimensional list is : demo7.nb Demo #7 Lists, Vectors, and Matrices in Mathematica Ê Lists Mathematica expressesvectors, matrices, and tensorsin the form of lists. For example, a one dimensional list is : a = 85.0, 2.5, 4.6,

More information

16. Deblurring Gaussian blur

16. Deblurring Gaussian blur 6. Deblurring Gaussian blur 277 6. Deblurring Gaussian blur 6. Deblurring To discuss an application where really high order Gaussian derivatives are applied, we study the deblurring of Gaussian blur by

More information

Introduction to Python

Introduction to Python Introduction to Python Luis Pedro Coelho Institute for Molecular Medicine (Lisbon) Lisbon Machine Learning School II Luis Pedro Coelho (IMM) Introduction to Python Lisbon Machine Learning School II (1

More information

ME 406 Bifurcations VII Subcritical Hopf Bifurcation

ME 406 Bifurcations VII Subcritical Hopf Bifurcation ME 406 Bifurcations VII Subcritical Hopf Bifurcation sysid Mathematica 4.1.2, DynPac 10.66, 3ê5ê2002 intreset; plotreset; 1. Introduction In this notebook, the seventh in a series of notebooks on bifurcations,

More information

Continued fractions and number systems: applications to correctly-rounded implementations of elementary functions and modular arithmetic.

Continued fractions and number systems: applications to correctly-rounded implementations of elementary functions and modular arithmetic. Continued fractions and number systems: applications to correctly-rounded implementations of elementary functions and modular arithmetic. Mourad Gouicem PEQUAN Team, LIP6/UPMC Nancy, France May 28 th 2013

More information

1 What is the area model for multiplication?

1 What is the area model for multiplication? for multiplication represents a lovely way to view the distribution property the real number exhibit. This property is the link between addition and multiplication. 1 1 What is the area model for multiplication?

More information

Short Division of Long Integers. (joint work with David Harvey)

Short Division of Long Integers. (joint work with David Harvey) Short Division of Long Integers (joint work with David Harvey) Paul Zimmermann October 6, 2011 The problem to be solved Divide efficiently a p-bit floating-point number by another p-bit f-p number in the

More information

Exploiting In-Memory Processing Capabilities for Density Functional Theory Applications

Exploiting In-Memory Processing Capabilities for Density Functional Theory Applications Exploiting In-Memory Processing Capabilities for Density Functional Theory Applications 2016 Aug 23 P. F. Baumeister, T. Hater, D. Pleiter H. Boettiger, T. Maurer, J. R. Brunheroto Contributors IBM R&D

More information

Properties of Continuous Probability Distributions The graph of a continuous probability distribution is a curve. Probability is represented by area

Properties of Continuous Probability Distributions The graph of a continuous probability distribution is a curve. Probability is represented by area Properties of Continuous Probability Distributions The graph of a continuous probability distribution is a curve. Probability is represented by area under the curve. The curve is called the probability

More information

Theory of Computation 1 Sets and Regular Expressions

Theory of Computation 1 Sets and Regular Expressions Theory of Computation 1 Sets and Regular Expressions Frank Stephan Department of Computer Science Department of Mathematics National University of Singapore fstephan@comp.nus.edu.sg Theory of Computation

More information

Accelerating Model Reduction of Large Linear Systems with Graphics Processors

Accelerating Model Reduction of Large Linear Systems with Graphics Processors Accelerating Model Reduction of Large Linear Systems with Graphics Processors P. Benner 1, P. Ezzatti 2, D. Kressner 3, E.S. Quintana-Ortí 4, Alfredo Remón 4 1 Max-Plank-Institute for Dynamics of Complex

More information

Department of Electrical and Computer Engineering University of Wisconsin Madison. Fall Final Examination

Department of Electrical and Computer Engineering University of Wisconsin Madison. Fall Final Examination Department of Electrical and Computer Engineering University of Wisconsin Madison ECE 553: Testing and Testable Design of Digital Systems Fall 2013-2014 Final Examination CLOSED BOOK Kewal K. Saluja Date:

More information

Accelerating Quantum Chromodynamics Calculations with GPUs

Accelerating Quantum Chromodynamics Calculations with GPUs Accelerating Quantum Chromodynamics Calculations with GPUs Guochun Shi, Steven Gottlieb, Aaron Torok, Volodymyr Kindratenko NCSA & Indiana University National Center for Supercomputing Applications University

More information

Visualizing the distributions of the escape paths of quaternion fractals

Visualizing the distributions of the escape paths of quaternion fractals Visualizing the distributions of the escape paths of quaternion fractals S. Halayka October 25, 2018 Abstract The length, displacement, and magnitude distributions of the escape paths of the points in

More information

Introduction to numerical computations on the GPU

Introduction to numerical computations on the GPU Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming

More information

Quantum Random Walk: Mathematica Syntax and Dirac Notation

Quantum Random Walk: Mathematica Syntax and Dirac Notation Quantum Random Walk: Mathematica Syntax and Dirac Notation by José Luis Gómez-Muñoz http://homepage.cem.itesm.mx/lgomez/quantum/ jose.luis.gomez@itesm.mx Based on calculations by Salvador Venegas-Andraca

More information

Sampling Random Variables

Sampling Random Variables Sampling Random Variables Introduction Sampling a random variable X means generating a domain value x X in such a way that the probability of generating x is in accordance with p(x) (respectively, f(x)),

More information

CHAPTER 6 : LITERATURE REVIEW

CHAPTER 6 : LITERATURE REVIEW CHAPTER 6 : LITERATURE REVIEW Chapter : LITERATURE REVIEW 77 M E A S U R I N G T H E E F F I C I E N C Y O F D E C I S I O N M A K I N G U N I T S A B S T R A C T A n o n l i n e a r ( n o n c o n v e

More information

P E R E N C O - C H R I S T M A S P A R T Y

P E R E N C O - C H R I S T M A S P A R T Y L E T T I C E L E T T I C E I S A F A M I L Y R U N C O M P A N Y S P A N N I N G T W O G E N E R A T I O N S A N D T H R E E D E C A D E S. B A S E D I N L O N D O N, W E H A V E T H E P E R F E C T R

More information

Probability Density Functions

Probability Density Functions Statistical Methods in Particle Physics / WS 13 Lecture II Probability Density Functions Niklaus Berger Physics Institute, University of Heidelberg Recap of Lecture I: Kolmogorov Axioms Ingredients: Set

More information

3 A Linear Perturbation Formula for Inverse Functions Set Up... 5

3 A Linear Perturbation Formula for Inverse Functions Set Up... 5 Bryce Terwilliger Advisor: Ian Abramson June 2, 2011 Contents 1 Abstract 1 2 Introduction 1 2.1 Asymptotic Relative Efficiency................... 2 2.2 A poissonization approach...................... 3

More information

arxiv: v1 [cs.ne] 29 Jul 2014

arxiv: v1 [cs.ne] 29 Jul 2014 A CUDA-Based Real Parameter Optimization Benchmark Ke Ding and Ying Tan School of Electronics Engineering and Computer Science, Peking University arxiv:1407.7737v1 [cs.ne] 29 Jul 2014 Abstract. Benchmarking

More information

MODULE 9 NORMAL DISTRIBUTION

MODULE 9 NORMAL DISTRIBUTION MODULE 9 NORMAL DISTRIBUTION Contents 9.1 Characteristics of a Normal Distribution........................... 62 9.2 Simple Areas Under the Curve................................. 63 9.3 Forward Calculations......................................

More information

JPEG BMP. jpeg1.nb 1 JPEG. [Reference] /10/ /10/21 Takuichi Hirano (Tokyo Institute of Technology)

JPEG BMP. jpeg1.nb 1 JPEG. [Reference] /10/ /10/21 Takuichi Hirano (Tokyo Institute of Technology) peg1.nb 1 JPEG JPEG [Reference] http://en.wikipedia.org/wiki/jpeg 2006/10/21 2006/10/21 Takuichi Hirano (Tokyo Institute of Technology) BMP In[1]:= Out[1]= SetDirectory@"d:êhira2êpublic_htmlêhobbyêeduêpeg"D

More information

Summarizing Measured Data

Summarizing Measured Data Summarizing Measured Data 12-1 Overview Basic Probability and Statistics Concepts: CDF, PDF, PMF, Mean, Variance, CoV, Normal Distribution Summarizing Data by a Single Number: Mean, Median, and Mode, Arithmetic,

More information

SIMULATION OF ISING SPIN MODEL USING CUDA

SIMULATION OF ISING SPIN MODEL USING CUDA SIMULATION OF ISING SPIN MODEL USING CUDA MIRO JURIŠIĆ Supervisor: dr.sc. Dejan Vinković Split, November 2011 Master Thesis in Physics Department of Physics Faculty of Natural Sciences and Mathematics

More information

sri 2D Implicit Charge- and Energy- Conserving Particle-in-cell Application Using CUDA Christopher Leibs Karthik Murthy

sri 2D Implicit Charge- and Energy- Conserving Particle-in-cell Application Using CUDA Christopher Leibs Karthik Murthy 2D Implicit Charge- and Energy- Conserving sri Particle-in-cell Application Using CUDA Christopher Leibs Karthik Murthy Mentors Dana Knoll and Allen McPherson IS&T CoDesign Summer School 2012, Los Alamos

More information

Real-time signal detection for pulsars and radio transients using GPUs

Real-time signal detection for pulsars and radio transients using GPUs Real-time signal detection for pulsars and radio transients using GPUs W. Armour, M. Giles, A. Karastergiou and C. Williams. University of Oxford. 15 th July 2013 1 Background of GPUs Why use GPUs? Influence

More information

First, a look at using OpenACC on WRF subroutine advance_w dynamics routine

First, a look at using OpenACC on WRF subroutine advance_w dynamics routine First, a look at using OpenACC on WRF subroutine advance_w dynamics routine Second, an estimate of WRF multi-node performance on Cray XK6 with GPU accelerators Based on performance of WRF kernels, what

More information

Behavioral Simulations in MapReduce

Behavioral Simulations in MapReduce Behavioral Simulations in MapReduce Guozhang Wang, Marcos Vaz Salles, Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers, Johannes Gehrke, Walker White Cornell University 1 What are Behavioral Simulations?

More information

Imaging using GPU. V-K Veligatla, Kapteyn Institute P. Labropoulos, ASTRON and Kapteyn Institute L. Koopmans, Kapteyn Institute

Imaging using GPU. V-K Veligatla, Kapteyn Institute P. Labropoulos, ASTRON and Kapteyn Institute L. Koopmans, Kapteyn Institute Imaging using GPU V-K Veligatla, Kapteyn Institute P. Labropoulos, ASTRON and Kapteyn Institute L. Koopmans, Kapteyn Institute Introduction What is a GPU? Why another Imager? Large amount of data to be

More information

Fast evaluation of the inverse Poisson CDF

Fast evaluation of the inverse Poisson CDF Fast evaluation of the inverse Poisson CDF Mike Giles University of Oxford Mathematical Institute Ninth IMACS Seminar on Monte Carlo Methods July 16, 2013 Mike Giles (Oxford) Poisson inverse CDF July 16,

More information

Multicore Semantics and Programming

Multicore Semantics and Programming Multicore Semantics and Programming Peter Sewell Tim Harris University of Cambridge Oracle October November, 2015 p. 1 These Lectures Part 1: Multicore Semantics: the concurrency of multiprocessors and

More information

Computer Science Introductory Course MSc - Introduction to Java

Computer Science Introductory Course MSc - Introduction to Java Computer Science Introductory Course MSc - Introduction to Java Lecture 1: Diving into java Pablo Oliveira ENST Outline 1 Introduction 2 Primitive types 3 Operators 4 5 Control Flow

More information