Some experiments with massively parallel computation for Monte Carlo simulation of stochastic dynamical systems


E. A. Johnson, S. F. Wojtkiewicz and L. A. Bergman
University of Illinois, Urbana, Illinois

ABSTRACT: The advantages and disadvantages of several numerical solution methods for the transition probability density functions of stochastic dynamical systems are discussed. Monte Carlo simulation is superior for some problems of this class. The drawbacks and benefits of its use on several computer architectures, including massively parallel and distributed-network computers, are examined. The effort required and the gains realized are discussed. Furthermore, a brief comparison of the results from MCS and finite element solutions is given.

1 INTRODUCTION

The evolution of stochastic dynamical systems is governed by Fokker-Planck equations if the response process is Markovian. Analytical solutions for the transient response exist only for the simplest of systems. The evolution of the transition probability density function over the phase space has been solved numerically for various two-dimensional systems subjected to additive and multiplicative random excitation using the finite element method (Spencer and Bergman 1993). Systems of higher order, however, pose significant difficulty when using standard finite element formulations due to memory requirements and computational expense.

Direct Monte Carlo simulation (MCS), while often regarded as less elegant than other methods, can be used to solve problems of significantly higher complexity. Low order systems are often more efficiently solved by other methods (e.g., the finite element method, cell mapping, path integral methods, etc.).
For example, a standard finite element solution with a grid of $n$ points in each spatial dimension and a uniform time step requires, for a $d$-dimensional problem, a single reduction to upper triangular form of $n^d$ equations followed by forward and backward substitution at each time step. Thus, the required number of computations and the memory allocation grow exponentially with the dimensionality of the problem. Granted, these matrix equations are not fully populated, and in fact have relatively narrow bandwidth if node numbering is done optimally; but the number of calculations required to solve these equations grows at least as $n^d$ and usually much faster. In contrast, a Monte Carlo simulation requires a number of computations proportional to $d$ and to the number of realizations. Furthermore, the accuracy of the Monte Carlo simulation does not depend on the dimensionality of the system but, rather, on the number of realizations used to characterize the system (Pradlwarter, Schuëller, and Melnik-Melnikov 1993).

The number of realizations required to accurately produce the transition probability density function over the entire phase space, especially in the tails, is large, but since each realization is entirely independent of the others, the Monte Carlo simulation is easily and efficiently adapted to parallel computation. The advent of high-speed, massively parallel computers permits a large number of realizations of a complex dynamical system to be determined. Consequently, Monte Carlo simulation may be more efficient for higher dimensional systems than other solution methods currently in use. Thus it is the purpose of this investigation to confirm the above observations and compare the performance of MCS on various platforms, including a massively parallel supercomputer and distributed-network workstations, with

special focus on the advantages and disadvantages of each platform for this class of problems.

2 SYSTEM DESCRIPTIONS

2.1 Duffing Oscillator

Monte Carlo simulation is readily used for any number of stochastic systems. For the sake of comparison with previous solutions of the Fokker-Planck equation by the finite element method, one system to be examined herein is a Duffing oscillator subjected to external white noise. The equation of motion is given by

$$\ddot{X} + 2\zeta\dot{X} + (\varepsilon X^2 - 1)X = W(t) \tag{1}$$

or, in state equation form,

$$\dot{X}_1 = X_2, \qquad \dot{X}_2 = -2\zeta X_2 - (\varepsilon X_1^2 - 1)X_1 + W(t) \tag{2}$$

where $W(t)$ is zero mean white noise

$$E[W(t)] = 0, \qquad E[W(t_1)W(t_2)] = 4\zeta\,\delta(t_1 - t_2) \tag{3}$$

the initial conditions are

$$X(0) = X_1(0) = X_0, \qquad \dot{X}(0) = X_2(0) = \dot{X}_0 \tag{4}$$

and $\delta(\cdot)$ is the Dirac delta function. The stationary probability density function for this system is given by

$$f_{X_1X_2}(x_1, x_2) = C\exp\left[-\tfrac{1}{2}x_2^2 + \tfrac{1}{2}x_1^2 - \tfrac{\varepsilon}{4}x_1^4\right] \tag{5}$$

where $C$ is chosen such that the integral of Eq. (5) over the domain is unity. The two parameters are chosen to be ζ =. and ε =., and the initial joint probability distribution is bivariate Gaussian in $X_0$ and $\dot{X}_0$ with covariance $\Gamma$.

2.2 Earthquake-Excited Linear Oscillator

In order to see the advantages of the MCS for higher-dimensional systems, a two degree of freedom oscillator will also be examined herein. Without loss of generality, this four-dimensional system is taken to be a linear oscillator driven by the Kanai-Tajimi stationary model of earthquake induced ground acceleration (Soong and Grigoriu 1993), as shown in Figure 1. The equations of motion of this system are given by the configuration space equations

$$\ddot{X} + 2\zeta\omega\dot{X} + \omega^2 X = -[\ddot{X}_g + \ddot{W}(t)], \qquad \ddot{X}_g + 2\zeta_g\omega_g\dot{X}_g + \omega_g^2 X_g = -\ddot{W}(t) \tag{7}$$

where $\ddot{W}(t)$ is zero mean white noise

$$E[\ddot{W}(t)] = 0,$$
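$$E[\ddot{W}(t_1)\ddot{W}(t_2)] = 2\pi\Phi_0\,\delta(t_1 - t_2). \tag{8}$$

[Figure 1: Earthquake model. The structure (mass m, stiffness k, damping c, response x(t)) rests on the surface ground, which is modeled as a filter (m_g, k_g, c_g, response x_g(t)) above the bedrock. Note that m_g » m, and thus there is no coupling of the structure dynamics into that of the ground.]

The equivalent state space system is

$$\dot{\mathbf{X}}(t) = A\mathbf{X}(t) + G\ddot{W}(t) \tag{9}$$

where

$$\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix} = \begin{bmatrix} X \\ \dot{X} \\ X_g \\ \dot{X}_g \end{bmatrix}, \qquad G = \begin{bmatrix} 0 \\ 0 \\ 0 \\ -1 \end{bmatrix} \tag{10}$$

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ -\omega^2 & -2\zeta\omega & \omega_g^2 & 2\zeta_g\omega_g \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -\omega_g^2 & -2\zeta_g\omega_g \end{bmatrix} \tag{11}$$

Since this is a linear, time-invariant system subjected to a white noise input, the stationary response is Gaussian with zero mean and covariance $\Gamma_{XX} = \lim_{t\to\infty} E[\mathbf{X}(t)\mathbf{X}^T(t)]$ given by the algebraic Lyapunov equation,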

$$A\Gamma_{XX} + \Gamma_{XX}A^T + 2\pi\Phi_0\,GG^T = 0, \tag{12}$$

the solution of which is the symmetric matrix $\Gamma_{XX} = \left[E[X_iX_j]\right]$, $i, j = 1, \ldots, 4$. The filter entries are

$$E[X_3^2] = \frac{\pi\Phi_0}{2\zeta_g\omega_g^3}, \qquad E[X_3X_4] = 0, \qquad E[X_4^2] = \frac{\pi\Phi_0}{2\zeta_g\omega_g},$$

and the remaining entries (Eqs. 13-19) are rational functions of $\omega$, $\zeta$, $\omega_g$ and $\zeta_g$, scaled by $\pi\Phi_0$ and sharing a common denominator $\Omega$.

For the parameters used in this study, ω = π, ζ =., ω_g =.3, ζ_g =.3, and Φ =, the covariance matrix becomes Γ_XX = (20).

The initial density is chosen to be zero mean multivariate Gaussian with a diagonal covariance (21) whose last two diagonal entries are $\pi\Phi_0/(2\zeta_g\omega_g^3)$ and $\pi\Phi_0/(2\zeta_g\omega_g)$. (The initial oscillator variances are ; the filter variances are chosen to be the stationary filter variances.)

The number of realizations required for an accurate representation of the PDF is a topic that needs further study, but a quick measure is the expected number of realizations that fall into a given bin. At stationarity for the 4-D system given above, the 2-D marginal PDF of the structure states $x_1$ and $x_2$ is given by

$$f_{X_1X_2}(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2}\exp\left\{-\frac{1}{2}\left[\left(\frac{x_1}{\sigma_1}\right)^2 + \left(\frac{x_2}{\sigma_2}\right)^2\right]\right\} \tag{22}$$

At an $\alpha\sigma$ radius, that is, for the locus of points for which $\alpha^2 = (x_1/\sigma_1)^2 + (x_2/\sigma_2)^2$, the distribution is given by

$$f_{X_1X_2}(\alpha) = \frac{1}{2\pi\sigma_1\sigma_2}\exp\left(-\frac{1}{2}\alpha^2\right) \tag{23}$$

The expected number of realizations that would fall into a bin of size $\Delta x_1 \times \Delta x_2$ at a radius of $\alpha\sigma$ is

$$E[\text{number of realizations in bin}] = n\,f_{X_1X_2}(\alpha)\,\Delta x_1\,\Delta x_2 \tag{24}$$

For bin size..8, total number of realizations n =, and the stationary variance values given in Eq. (20), these values are charted in Table 1.

Table 1: Expected number of realizations in a bin. (Columns: α, $f_{X_1X_2}(\alpha)$, E[# in bin].)

3 PLATFORM DESCRIPTIONS

Four computing platforms were used in this study: a Cray Y-MP, a Convex C, a Thinking Machines CM-5, and a network of Sun SPARCstation computers.
Table 2 is a speed comparison of these systems, showing maximum theoretical speed, the speed found using the FLOPS benchmark (Aburto 99), the speed reported by NCSA research scientist Fouad Ahmad (Cohen 1993), and the speed found in the current study (discussed further in Performance on Various Platforms, below).

Table 2: MFLOPS ratings of the various platforms. (* single-processor rating)
Columns: System, Max. Theoretical, FLOPS benchmark, Ahmad study results, Current study averages
SPARC n/a .7 n/a .
Convex C* n/a 8
Cray Y-MP*
32-node CM-5 4096 n/a n/a
64-node CM-5 8192 n/a ~
128-node CM-5 16384 n/a n/a
256-node CM-5 32768 n/a ~ 7
512-node CM-5 65536 n/a 97

The Cray Y-MP/, operated by the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, is a four-processor vectorized system running on a 7MHz clock cycle with MB of central memory and GB of secondary memory used primarily for I/O caching (UNICOS User's Guide 99). The maximum theoretical speed of this system is 333 million floating point operations per second (MFLOPS) per processor, but speeds of MFLOPS/processor are more typical.

The Convex C, run by the Computing and Communications Services Office at the University of Illinois at Urbana-Champaign, is a MB, four-processor vectorized system with a maximum theoretical performance of MFLOPS per processor.

NCSA also operates a Thinking Machines Connection Machine CM-5. This is a massively parallel supercomputer with 512 nodes; each node has one processor, 4 vector units, and 32 MB of memory. The peak theoretical speed of this system is 128 MFLOPS/node for a total theoretical speed of 65,536 MFLOPS (NCSA Connection Machine User Guide 1993). In practice, however, 3- MFLOPS/node is more realistic (CM-5 CM Fortran Performance Guide 99). The CM-5 was run in SIMD mode (single instruction stream, multiple data: each processor executes the same set of instructions concurrently on different data) for this study, but it can also be run in MIMD mode (multiple instruction stream, multiple data: each processor works independently and passes messages to the other processors when required).

A cluster of workstations administered by the College of Engineering at the University of Illinois was used as a distributed network platform. These workstations are Sun SPARCstation (/7, MHz) computers.
4 COMPARISON OF FEM AND MCS FOR A 2-D DUFFING SYSTEM

In order to assess the accuracy of the Monte Carlo simulations for the 2-D Duffing oscillator, the evolutionary second moments of the system will be examined. Figures 2-4 show the evolution of the second moments as found from FEM and by Monte Carlo simulations at three different numbers of realizations. The , realization MCS does rather well over the entire analysis. In fact, the difference between the different MCS runs is hardly distinguishable except in the zoomed inset graphs that show further detail near the end of the analysis.

[Figure 2: Evolution of a second moment $E[X_i(t)X_j(t)]$ for the 2-D Duffing system computed by MCS and FEM. Curves: exact stationary value, FEM, and three MCS runs; horizontal axis: time [secs].]

[Figure 3: Evolution of a second moment $E[X_i(t)X_j(t)]$ for the 2-D Duffing system computed by MCS and FEM. Curves: exact stationary value, FEM, and three MCS runs; horizontal axis: time [secs].]

Note that for the variances of $X_1(t)$ and $X_2(t)$, the FEM converges to a value slightly below the exact stationary variances (shown in the inset);

the MCS, however, while still fluctuating at t = π, is doing so about the exact value.

[Figure 4: Evolution of a second moment $E[X_i(t)X_j(t)]$ for the 2-D Duffing system computed by MCS and FEM. Curves: exact stationary value, FEM, and three MCS runs; horizontal axis: time [secs].]

The MCS does not do quite as well in determining the response probability density function with few realizations as it does for the moments. The PDFs at three instants in time ( t =, π, π secs ) are shown in Figs. 5-7 as computed by FEM and by Monte Carlo simulations at three different numbers of realizations. The , realization MCS is relatively close to the FEM solution.

For determining the evolution of the second moments of this system, the MCS is significantly more attractive, since even a , realization simulation characterizes the moments well and required less than 3% of the 7 minutes of CPU time and less than % of the MB required by the FEM. Furthermore, the MCS appears to converge to the correct variance values. For the evolutionary PDF, however, the MCS is somewhat less outstanding but still a viable option. For this system, the , realization MCS gives the same order of performance as the FEM. The CPU time and memory requirements are summarized in Table 3.

Table 3: MCS and FEM computational expense on a Cray Y-MP for the 2-D Duffing system. (Columns: Method, CPU time [min], Memory [MB]; rows: FEM and five MCS runs with increasing numbers of realizations.)

[Figure 5: PDFs of 2-D Duffing at t = secs. Panels: MCS runs and the FEM solution; axes: displacement $x_1$, velocity $x_2$.]

[Figure 6: PDFs of 2-D Duffing at t = π secs. Panels: MCS runs and the FEM solution; axes: displacement $x_1$, velocity $x_2$.]

[Figure 7: PDFs of 2-D Duffing at t = π secs. Panels: MCS runs and the FEM solution; axes: displacement $x_1$, velocity $x_2$.]
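The moment estimates compared with FEM above come from integrating many independent realizations of the state equations and averaging. The following is a minimal sketch of that idea, not the authors' Fortran code. It assumes an Euler-Maruyama integrator (the paper does not name its integration scheme), an oscillator of the form x'' + 2ζx' + g(x) = W(t) with E[W(t1)W(t2)] = q δ(t1 - t2), and hypothetical names (`simulate_second_moments`, `restoring`, `noise_intensity`) and parameter values chosen only for illustration.

```python
import numpy as np


def simulate_second_moments(restoring, zeta, noise_intensity,
                            n_real, dt, n_steps, init_std=0.5, seed=0):
    """Euler-Maruyama MCS of  x'' + 2*zeta*x' + restoring(x) = W(t).

    W(t) is white noise with E[W(t1) W(t2)] = noise_intensity * delta(t1 - t2).
    All realizations are advanced together as NumPy arrays.
    Returns the history of (E[X1^2], E[X1*X2], E[X2^2]) and the final states.
    """
    rng = np.random.default_rng(seed)
    # Zero-mean Gaussian initial condition (an assumption for this sketch).
    x1 = init_std * rng.standard_normal(n_real)
    x2 = init_std * rng.standard_normal(n_real)

    moments = np.empty((n_steps + 1, 3))
    moments[0] = (np.mean(x1 * x1), np.mean(x1 * x2), np.mean(x2 * x2))

    for k in range(n_steps):
        # Increment of the Wiener process driving each realization.
        dw = np.sqrt(noise_intensity * dt) * rng.standard_normal(n_real)
        x1_next = x1 + x2 * dt
        x2 = x2 + (-2.0 * zeta * x2 - restoring(x1)) * dt + dw
        x1 = x1_next
        moments[k + 1] = (np.mean(x1 * x1), np.mean(x1 * x2), np.mean(x2 * x2))

    return moments, x1, x2


if __name__ == "__main__":
    # Hypothetical parameter values and restoring force, for illustration only.
    zeta, eps = 0.2, 0.1
    history, _, _ = simulate_second_moments(
        restoring=lambda x: eps * x**3 - x,
        zeta=zeta,
        noise_intensity=4.0 * zeta,
        n_real=10_000,
        dt=0.01,
        n_steps=2_000,
    )
    print("second moments at final time:", history[-1])
```

Because every realization executes the same arithmetic on different data, the inner loop maps naturally onto vector or SIMD hardware; the only cross-realization operation is the averaging.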

For these analyses, a mesh (. apart in each dimension) was used. It must be noted that for a finer mesh (i.e., more nodes or bins), the finite element solution will increase in computational expense, requiring the solution of a number of equations equal to the number of nodes. In order to retain the same accuracy with smaller bins, the MCS would require that the number of realizations grow with the number of bins. Thus the MCS should be more efficient, compared to the FEM, when the mesh is finer.

5 RESULTS OF EARTHQUAKE-EXCITED OSCILLATOR SYSTEM

The earthquake-excited oscillator system is a 4-D linear system in which two of the states, the earthquake filter states, are not of primary interest. The evolution of the second moments of the structure (oscillator) states is shown in Figs. 8-10. Monte Carlo simulations at four different numbers of realizations were performed. The variances of $X_1(t)$ and $X_2(t)$ are fairly accurate even for a small number of realizations. The same is true of the marginal density functions of $X_1(t)$ and $X_2(t)$ for small t, i.e., while the PDF is relatively concentrated near the origin. Figure 11 shows this marginal PDF at.7 secs into the simulation; even the , realization MCS is quite good.

Due to the parameters chosen for this system, the marginal PDF rapidly disperses across the phase plane. The marginal PDF is shown in Fig. 12 for t = secs. Here, the , realization simulation is hardly recognizable, and the , realization MCS is only marginally better. The reason for this is that, at stationarity, the magnitude of the marginal PDF is sufficiently small that the coefficient of variation of the number of realizations that fall in a given bin at a given time is high. If, however, a coarser mesh is used to determine the marginal PDF, where each bin is larger in area, the number of realizations falling in a given bin at a given time will be larger and its coefficient of variation smaller. This is apparent in the marginal PDF contour plots shown in Fig.
13, where the thin contour lines are the PDFs over a mesh (the centers of the..8 bins are at the small dots), and the bold contour lines are over a mesh (the centers of the.7. bins are at the large dots).

[Figure 8: Evolution of a second moment $E[X_i(t)X_j(t)]$ for the 4-D linear system. Curves: exact stationary value and four MCS runs; horizontal axis: time [secs].]

[Figure 9: Evolution of a second moment $E[X_i(t)X_j(t)]$ for the 4-D linear system. Curves: exact stationary value and four MCS runs; horizontal axis: time [secs].]

[Figure 10: Evolution of a second moment $E[X_i(t)X_j(t)]$ for the 4-D linear system. Curves: exact stationary value and four MCS runs; horizontal axis: time [secs].]

Note that with the coarser mesh, the

[Figure 11: PDFs of 4-D Linear at t =.7 secs. Panels: four MCS runs; axes: displacement $x_1$, velocity $x_2$.]

[Figure 12: PDFs of 4-D Linear at t = secs. Panels: four MCS runs; axes: displacement $x_1$, velocity $x_2$.]

, realization is significantly more usable than its fine-mesh counterpart. Further study is needed here to quantify the trade-off between PDF accuracy and mesh coarseness in some general way.

[Figure 13: PDFs of 4-D Linear at t = secs. Bold contours are on the coarse mesh represented by the bold dots; light contours are on the fine mesh represented by the small dots. Axes: displacement $x_1$, velocity $x_2$.]

6 PERFORMANCE ON VARIOUS PLATFORMS

A total of simulations of the earthquake-excited oscillator system were performed on the four computing platforms, varying the number of realizations, the duration of the simulation, the frequency of the storage of the PDFs, and the size of the mesh. One set of parameters ( time steps of. secs, storing the mesh of size every time steps) was chosen as the basis for comparison. (A complete simulation of this system actually requires time steps, but the performance is comparable for shorter simulations.)

[Figure 14: Memory requirements on various platforms for the 4-D linear system. Required memory [MB] versus number of realizations for the Convex C, Cray Y-MP, two CM-5 partitions, a single Sparc, and the Sparc network.]

The memory required on various platforms is shown as a function of the number of realizations in Fig. 14. The Sparc (single and network) memory requirements are constant and large because the Fortran compiler on that platform does not allow dynamic memory allocation dependent on the system parameters, so the array sizes must be hard-coded large enough to accommodate any problem given. Another observation is that the massively parallel CM-5 has significantly higher memory requirements because, in order to parallelize the integration of

the state equations, a number of temporary variables that are scalars on other platforms must be arrays of the same length as the vector of state variables of all of the realizations. The result is that the CM-5 implementation uses at least three times as much memory as the Cray Y-MP or Convex C. Note that the CM-5 allocates at least MB per processor regardless of the problem size. This per-processor overhead is one drawback of massively parallel systems.

The performance of the MCS on these platforms, as measured by the average number of millions of floating-point operations per second (MFLOPS), is shown in Fig. 15. The 32-node CM-5 performance values vary quite a bit. This is because the timing of CM-5 codes that do extensive I/O is quite inaccurate due to some quirks in the operating system software. Simulations running anywhere from half to double the average performance were observed. (The -node performance would be expected to have the same variations, but this was not verified by running multiple simulations of the same size.) The parallel implementations (CM-5 and Sparc network) do not reach peak efficiencies until the number of realizations is quite large; thus for problems requiring large numbers of realizations, the parallel implementations appear superior.

This is even more true on the CM-5 if the PDFs are not needed; Fig. 16 shows the effect on CM-5 performance when PDF calculation and storage are removed. The speed increase, which is negligible on the other machines, is as large as a factor of six, as is shown in Fig. 17. On the other machines, finding the PDFs every time steps results in less than a % performance loss over the case where no PDFs are computed; finding them every time steps and every step results in less than % and less than % performance losses, respectively. On the CM-5, however, even computing and storing every th time step results in a tremendous performance loss.
The explanation for this is that computing the PDF, essentially the calculation of a histogram, requires that the states of each realization be passed to the front end machine to be put in a given bin. This interprocessor communication is quite slow compared to the in-processor integration of the states. (Note: the histogram algorithm is currently under investigation by NCSA to determine its performance bottlenecks, so it is possible that the CM-5 performance loss may be partially ameliorated in future versions of the CMSSL libraries.)

[Figure 15: Floating-point performance on various platforms for the 4-D linear system. Performance [MFLOPS] versus number of realizations for the Cray Y-MP*, Sparc, Convex C*, two CM-5 partitions, and the Sparc network. (* single-processor performance)]

[Figure 16: The effect of computing the PDFs of the 4-D linear system on performance of the CM-5. Performance [MFLOPS] versus number of realizations for two CM-5 partitions, with PDFs computed every time steps and with no PDFs.]

[Figure 17: The speed increase when not computing the PDFs of the 4-D linear system on the CM-5. Speed increase factor versus number of realizations for two CM-5 partitions.]
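The PDF computation described above is a two-dimensional histogram of the realization states, normalized by the number of realizations and the bin area. A minimal sketch follows, using NumPy rather than the CM Fortran and CMSSL routines of the study; the function names `histogram_pdf` and `expected_bin_count`, the mesh, and the sample sizes are assumptions made for illustration. The second function evaluates the expected bin count n f(α) Δx1 Δx2 for an uncorrelated bivariate Gaussian, which indicates how noisy a bin estimate will be for a given mesh.

```python
import numpy as np


def histogram_pdf(x1, x2, edges1, edges2):
    """Estimate the joint PDF of (x1, x2) on a rectangular mesh of bins.

    Returns (pdf, counts): counts[i, j] is the number of realizations in
    bin (i, j), and pdf[i, j] = counts[i, j] / (n * bin_area).
    """
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    counts, _, _ = np.histogram2d(x1, x2, bins=[edges1, edges2])
    areas = np.outer(np.diff(edges1), np.diff(edges2))
    return counts / (x1.size * areas), counts


def expected_bin_count(n_real, sigma1, sigma2, alpha, dx1, dx2):
    """Expected realizations in a dx1-by-dx2 bin at an alpha-sigma radius,
    for a zero-mean, uncorrelated bivariate Gaussian density."""
    density = np.exp(-0.5 * alpha**2) / (2.0 * np.pi * sigma1 * sigma2)
    return n_real * density * dx1 * dx2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100_000  # hypothetical number of realizations
    states1 = rng.standard_normal(n)
    states2 = rng.standard_normal(n)
    fine = np.linspace(-4.0, 4.0, 41)    # 0.2-wide bins
    coarse = np.linspace(-4.0, 4.0, 11)  # 0.8-wide bins
    for name, edges in (("fine", fine), ("coarse", coarse)):
        width = edges[1] - edges[0]
        mean_count = expected_bin_count(n, 1.0, 1.0, 3.0, width, width)
        # For roughly Poisson bin counts the coefficient of variation
        # is about 1 / sqrt(expected count).
        print(name, "expected count at 3 sigma:", mean_count,
              "approx. c.o.v.:", 1.0 / np.sqrt(mean_count))
```

On a serial or vector machine the binning is a single pass over the states; on a distributed-memory machine the same step requires gathering or reducing data across processors, which is the communication cost discussed above.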

Another performance issue is the significant percentage of the time-step iteration CPU time spent in generating the uniform random values used to compute the white noise input to the system (as low as 3% on the Sparcs, % on the Cray Y-MP, -% on the CM-5, and 8% on the Convex C). Performance could be increased significantly with faster random number generation routines.

One model of speed increase in parallel computation is to define the speed increase factor by

$$S_n = \frac{1}{\alpha + \dfrac{(1 - \alpha)}{n} + f(n)} \tag{25}$$

where $n$ is the number of processors, $\alpha$ the fraction of the code that cannot be processed in parallel, and $f(n)$, a function of the number of processors, is the parallel overhead (Sues et al. 99). The parallel efficiency is defined to be $S_n^{\text{observed}} / ST_n$, where $ST_n$ is the theoretical speed increase defined by $ST_n = S_n(\alpha = 0, f(n) = 0)$. The parallel speed increase and efficiency for the CM-5 are shown in Fig. 18. The efficiency of the -node CM-5 is only 83% of the theoretical compared to the 32-node CM-5.

7 DISCUSSION AND CONCLUSIONS

The CM-5 has obvious advantages in its parallel architecture for problems requiring little interprocessor communication. Thus, if the PDF is not required, or only the stationary probability density is of interest, then the massively parallel architecture is perfectly suited to the MCS. The fast vector machines, on the other hand, are well suited to performing the PDF computation.

The effort required in porting code to the CM-5 is minimal if one is already familiar with Fortran 90 array operations (similar to the method used by MATLAB). The authors had a working port of the MCS for this study in about a week. Performance was increased two-fold with an additional week or so of investigation, reading, and re-coding. The CM-5 does offer CMAX (Using the CMAX Converter 1993), a program that attempts to convert existing Fortran 77 code to CM Fortran. In the opinion of the authors, the converter performs only marginally well for general codes.
For the code used in this study, the converter did rather poorly; the generated CM Fortran code ran at only a fraction (less than %) of the speed of even the first CM Fortran code written by the authors, and less than % of the latest and most optimized version.

One pitfall found here was strange behavior of the MCS on the Cray Y-MP. When the number of realizations was to a power greater than 7, the PDFs were far too peaked. For any other numbers of realizations, the system behaves as expected, with the error declining with increased number of realizations. Figure 19 shows the rms error of the PDF at stationarity for the 2-D Duffing system discussed above. This problem was determined by Cray Research to be an error in the vectorization of the Cray random number generator that caused correlation in the white noise.

[Figure 18: The parallel speed increase and efficiency on the CM-5. Curves: theoretical speedup, observed speedup, and observed efficiency versus number of processors.]

[Figure 19: The error in the stationary PDF of the 2-D Duffing system on the Cray Y-MP. RMS error versus number of realizations, with separate markers for realization counts of the form ^n-, ^n, ^n+, and other values.]
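The speed increase and efficiency measures plotted in Fig. 18 can be illustrated with a short sketch. It assumes an Amdahl-type form with an additive overhead term, S_n = 1 / (α + (1 - α)/n + f(n)), so that the theoretical value with α = 0 and f(n) = 0 is simply n; the exact expression used by Sues et al. should be checked against that report. The function names and the numbers in the usage lines are hypothetical and are not measurements from this study.

```python
def speedup(n, alpha, overhead=0.0):
    """Modelled speed increase on n processors (assumed Amdahl-type form).

    alpha    -- fraction of the work that cannot run in parallel
    overhead -- parallel overhead f(n), as a fraction of serial run time
    """
    return 1.0 / (alpha + (1.0 - alpha) / n + overhead)


def theoretical_speedup(n):
    """Speed increase with no serial fraction and no overhead (equals n)."""
    return speedup(n, alpha=0.0, overhead=0.0)


def parallel_efficiency(observed_speedup, n):
    """Observed speed increase divided by the theoretical one."""
    return observed_speedup / theoretical_speedup(n)


def relative_efficiency(rate_small, n_small, rate_large, n_large):
    """Efficiency of a larger partition measured against a smaller one.

    rate_* are sustained rates (e.g. MFLOPS) on n_* processors.
    """
    observed = rate_large / rate_small
    return observed / (n_large / n_small)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for n in (2, 8, 32):
        print(n, "processors: modelled speed increase",
              round(speedup(n, alpha=0.02, overhead=0.001), 2))
    print("relative efficiency:", relative_efficiency(100.0, 2, 332.0, 8))
```

Comparing one partition against another, as the text does, sidesteps the need for a single-processor timing, which is often impractical on a machine of this kind.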

8 ACKNOWLEDGMENT

This project has been supported in part by National Science Foundation contracts ECS- 988, CEE-9-N, and MSS-9-N, the latter two through the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.

9 REFERENCES

Aburto, A. (aburto@marlin.nosc.mil) 99. FLOPS v.. benchmark code flops.c.
Bergman, L.A., B.F. Spencer, S.F. Wojtkiewicz, and E.A. Johnson 99. Robust Numerical Solution of the Fokker-Planck Equation for Second Order Dynamical Systems Under Parametric and External White Noise Excitations. Proceedings of the International Symposium on Nonlinear Dynamics and Stochastic Mechanics, Waterloo, Ontario, 1993 (in press).
CM-5 CM Fortran Performance Guide 99. Version. (January 99). Cambridge, Mass.: Thinking Machines Corporation.
CM-5 Technical Summary 1993. Nov. Cambridge, Mass.: Thinking Machines Corporation.
Cohen, Jarrett 1993. NCSA and Structural Mechanics. access 7:-.
NCSA Connection Machine User Guide 1993. Version.. Board of Trustees of the University of Illinois.
Pradlwarter, H.J., G.I. Schuëller, and P.G. Melnik-Melnikov 99. Reliability of MDOF-Systems. Journal of Probabilistic Engineering Mechanics (in review).
Soong, T. and M. Grigoriu 1993. Random Vibration of Mechanical and Structural Systems. Englewood Cliffs, New Jersey: Prentice Hall.
Spencer, B.F., Jr. and L.A. Bergman 1993. On the Numerical Solution of the Fokker-Planck Equation for Nonlinear Stochastic Systems. Nonlinear Dynamics :
Sues, R.H., H.-C. Chen, and L.A. Twisdale 99. Probabilistic Structural Mechanics Research for Parallel Processing Computers. NASA CR-87.
Sues, R.H., Y.J. Lua, and M.D. Smith 99. Parallel Computing for Probabilistic Response Analysis of High Temperature Composites. NASA CR-97.
UNICOS User's Guide 99. Version 3., Board of Trustees of the University of Illinois.
Using the CMAX Converter 1993. Version. (July 1993). Cambridge, Mass.: Thinking Machines Corporation.


Measurement & Performance

Measurement & Performance Measurement & Performance Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law Topics 2 Page The Nature of Time real (i.e. wall clock) time = User Time: time spent

More information

Measurement & Performance

Measurement & Performance Measurement & Performance Topics Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law 2 The Nature of Time real (i.e. wall clock) time = User Time: time spent executing

More information

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel? CRYSTAL in parallel: replicated and distributed (MPP) data Roberto Orlando Dipartimento di Chimica Università di Torino Via Pietro Giuria 5, 10125 Torino (Italy) roberto.orlando@unito.it 1 Why parallel?

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Pattern History Table. Global History Register. Pattern History Table. Branch History Pattern Pattern History Bits

Pattern History Table. Global History Register. Pattern History Table. Branch History Pattern Pattern History Bits An Enhanced Two-Level Adaptive Multiple Branch Prediction for Superscalar Processors Jong-bok Lee, Soo-Mook Moon and Wonyong Sung fjblee@mpeg,smoon@altair,wysung@dspg.snu.ac.kr School of Electrical Engineering,

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

Parallel Particle Filter in Julia

Parallel Particle Filter in Julia Parallel Particle Filter in Julia Gustavo Goretkin December 12, 2011 1 / 27 First a disclaimer The project in a sentence. workings 2 / 27 First a disclaimer First a disclaimer The project in a sentence.

More information

Research Article A Novel Differential Evolution Invasive Weed Optimization Algorithm for Solving Nonlinear Equations Systems

Research Article A Novel Differential Evolution Invasive Weed Optimization Algorithm for Solving Nonlinear Equations Systems Journal of Applied Mathematics Volume 2013, Article ID 757391, 18 pages http://dx.doi.org/10.1155/2013/757391 Research Article A Novel Differential Evolution Invasive Weed Optimization for Solving Nonlinear

More information

Stochastic structural dynamic analysis with random damping parameters

Stochastic structural dynamic analysis with random damping parameters Stochastic structural dynamic analysis with random damping parameters K. Sepahvand 1, F. Saati Khosroshahi, C. A. Geweth and S. Marburg Chair of Vibroacoustics of Vehicles and Machines Department of Mechanical

More information

Performance of WRF using UPC

Performance of WRF using UPC Performance of WRF using UPC Hee-Sik Kim and Jong-Gwan Do * Cray Korea ABSTRACT: The Weather Research and Forecasting (WRF) model is a next-generation mesoscale numerical weather prediction system. We

More information

Direct Self-Consistent Field Computations on GPU Clusters

Direct Self-Consistent Field Computations on GPU Clusters Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd

More information

An Algorithm for a Two-Disk Fault-Tolerant Array with (Prime 1) Disks

An Algorithm for a Two-Disk Fault-Tolerant Array with (Prime 1) Disks An Algorithm for a Two-Disk Fault-Tolerant Array with (Prime 1) Disks Sanjeeb Nanda and Narsingh Deo School of Computer Science University of Central Florida Orlando, Florida 32816-2362 sanjeeb@earthlink.net,

More information

Reduction of Random Variables in Structural Reliability Analysis

Reduction of Random Variables in Structural Reliability Analysis Reduction of Random Variables in Structural Reliability Analysis S. Adhikari and R. S. Langley Department of Engineering University of Cambridge Trumpington Street Cambridge CB2 1PZ (U.K.) February 21,

More information

System Reliability-Based Design Optimization of Structures Constrained by First Passage Probability

System Reliability-Based Design Optimization of Structures Constrained by First Passage Probability System Reliability-Based Design Optimization of Structures Constrained by First Passage Probability Junho Chun* University of Illinois at Urbana-Champaign, USA Junho Song Seoul National University, Korea

More information

An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors

An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Contemporary Mathematics Volume 218, 1998 B 0-8218-0988-1-03024-7 An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Michel Lesoinne

More information

ON THE CONSERVATION OF MASS AND ENERGY IN HYGROTHERMAL NUMERICAL SIMULATION WITH COMSOL MULTIPHYSICS

ON THE CONSERVATION OF MASS AND ENERGY IN HYGROTHERMAL NUMERICAL SIMULATION WITH COMSOL MULTIPHYSICS ON THE CONSERVATION OF MASS AND ENERGY IN HYGROTHERMAL NUMERICAL SIMULATION WITH COMSOL MULTIPHYSICS Michele Bianchi Janetti, Fabian Ochs, and Wolfgang Feist,2 Unit for Energy Efficient Buildings, University

More information

INTENSIVE COMPUTATION. Annalisa Massini

INTENSIVE COMPUTATION. Annalisa Massini INTENSIVE COMPUTATION Annalisa Massini 2015-2016 Course topics The course will cover topics that are in some sense related to intensive computation: Matlab (an introduction) GPU (an introduction) Sparse

More information

MONTE CARLO METHODS IN SEQUENTIAL AND PARALLEL COMPUTING OF 2D AND 3D ISING MODEL

MONTE CARLO METHODS IN SEQUENTIAL AND PARALLEL COMPUTING OF 2D AND 3D ISING MODEL Journal of Optoelectronics and Advanced Materials Vol. 5, No. 4, December 003, p. 971-976 MONTE CARLO METHODS IN SEQUENTIAL AND PARALLEL COMPUTING OF D AND 3D ISING MODEL M. Diaconu *, R. Puscasu, A. Stancu

More information

Comparison study of the computational methods for eigenvalues IFE analysis

Comparison study of the computational methods for eigenvalues IFE analysis Applied and Computational Mechanics 2 (2008) 157 166 Comparison study of the computational methods for eigenvalues IFE analysis M. Vaško a,,m.sága a,m.handrik a a Department of Applied Mechanics, Faculty

More information

D (1) i + x i. i=1. j=1

D (1) i + x i. i=1. j=1 A SEMIANALYTIC MESHLESS APPROACH TO THE TRANSIENT FOKKER-PLANCK EQUATION Mrinal Kumar, Suman Chakravorty, and John L. Junkins Texas A&M University, College Station, TX 77843 mrinal@neo.tamu.edu, schakrav@aeromail.tamu.edu,

More information

1. Fast Iterative Solvers of SLE

1. Fast Iterative Solvers of SLE 1. Fast Iterative Solvers of crucial drawback of solvers discussed so far: they become slower if we discretize more accurate! now: look for possible remedies relaxation: explicit application of the multigrid

More information

Word-length Optimization and Error Analysis of a Multivariate Gaussian Random Number Generator

Word-length Optimization and Error Analysis of a Multivariate Gaussian Random Number Generator Word-length Optimization and Error Analysis of a Multivariate Gaussian Random Number Generator Chalermpol Saiprasert, Christos-Savvas Bouganis and George A. Constantinides Department of Electrical & Electronic

More information

Reliability Theory of Dynamic Loaded Structures (cont.) Calculation of Out-Crossing Frequencies Approximations to the Failure Probability.

Reliability Theory of Dynamic Loaded Structures (cont.) Calculation of Out-Crossing Frequencies Approximations to the Failure Probability. Outline of Reliability Theory of Dynamic Loaded Structures (cont.) Calculation of Out-Crossing Frequencies Approximations to the Failure Probability. Poisson Approximation. Upper Bound Solution. Approximation

More information

DETECTION theory deals primarily with techniques for

DETECTION theory deals primarily with techniques for ADVANCED SIGNAL PROCESSING SE Optimum Detection of Deterministic and Random Signals Stefan Tertinek Graz University of Technology turtle@sbox.tugraz.at Abstract This paper introduces various methods for

More information

Robust solution of Poisson-like problems with aggregation-based AMG

Robust solution of Poisson-like problems with aggregation-based AMG Robust solution of Poisson-like problems with aggregation-based AMG Yvan Notay Université Libre de Bruxelles Service de Métrologie Nucléaire Paris, January 26, 215 Supported by the Belgian FNRS http://homepages.ulb.ac.be/

More information

The Performance Evolution of the Parallel Ocean Program on the Cray X1

The Performance Evolution of the Parallel Ocean Program on the Cray X1 The Performance Evolution of the Parallel Ocean Program on the Cray X1 Patrick H. Worley Oak Ridge National Laboratory John Levesque Cray Inc. 46th Cray User Group Conference May 18, 2003 Knoxville Marriott

More information

Outline. Random Variables. Examples. Random Variable

Outline. Random Variables. Examples. Random Variable Outline Random Variables M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno Random variables. CDF and pdf. Joint random variables. Correlated, independent, orthogonal. Correlation,

More information

Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano

Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano Introduction Introduction We wanted to parallelize a serial algorithm for the pivoted Cholesky factorization

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

ECE521 lecture 4: 19 January Optimization, MLE, regularization

ECE521 lecture 4: 19 January Optimization, MLE, regularization ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity

More information

01 Probability Theory and Statistics Review

01 Probability Theory and Statistics Review NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement

More information

EEG- Signal Processing

EEG- Signal Processing Fatemeh Hadaeghi EEG- Signal Processing Lecture Notes for BSP, Chapter 5 Master Program Data Engineering 1 5 Introduction The complex patterns of neural activity, both in presence and absence of external

More information

A Fast Newton-Raphson Method in Stochastic Linearization

A Fast Newton-Raphson Method in Stochastic Linearization A Fast Newton-Raphson Method in Stochastic Linearization Thomas Canor 1, Nicolas Blaise 1, Vincent Denoël 1 1 Structural Engineering Division, University of Liège, Chemin des Chevreuils, B52/3, 4000 Liège,

More information

Applications of Mathematical Economics

Applications of Mathematical Economics Applications of Mathematical Economics Michael Curran Trinity College Dublin Overview Introduction. Data Preparation Filters. Dynamic Stochastic General Equilibrium Models: Sunspots and Blanchard-Kahn

More information

Probability and Stochastic Processes

Probability and Stochastic Processes Probability and Stochastic Processes A Friendly Introduction Electrical and Computer Engineers Third Edition Roy D. Yates Rutgers, The State University of New Jersey David J. Goodman New York University

More information

Data analysis of massive data sets a Planck example

Data analysis of massive data sets a Planck example Data analysis of massive data sets a Planck example Radek Stompor (APC) LOFAR workshop, Meudon, 29/03/06 Outline 1. Planck mission; 2. Planck data set; 3. Planck data analysis plan and challenges; 4. Planck

More information

ORDINARY DIFFERENTIAL EQUATIONS

ORDINARY DIFFERENTIAL EQUATIONS PREFACE i Preface If an application of mathematics has a component that varies continuously as a function of time, then it probably involves a differential equation. For this reason, ordinary differential

More information

Statistical Comparison and Improvement of Methods for Combining Random and Harmonic Loads

Statistical Comparison and Improvement of Methods for Combining Random and Harmonic Loads 45th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics & Materials Conference 19 - April 004, Palm Springs, California AIAA 004-1535 Statistical Comparison and Improvement of Methods for Combining

More information

THE subject of the analysis is system composed by

THE subject of the analysis is system composed by MECHANICAL VIBRATION ASSIGNEMENT 1 On 3 DOF system identification Diego Zenari, 182160, M.Sc Mechatronics engineering Abstract The present investigation carries out several analyses on a 3-DOF system.

More information

Improvements for Implicit Linear Equation Solvers

Improvements for Implicit Linear Equation Solvers Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often

More information

APPROXIMATE DYNAMIC MODEL SENSITIVITY ANALYSIS FOR LARGE, COMPLEX SPACE STRUCTURES. Timothy S. West, Senior Engineer

APPROXIMATE DYNAMIC MODEL SENSITIVITY ANALYSIS FOR LARGE, COMPLEX SPACE STRUCTURES. Timothy S. West, Senior Engineer APPROXIMATE DYNAMIC MODEL SENSITIVITY ANALYSIS FOR LARGE, COMPLEX SPACE STRUCTURES Timothy S. West, Senior Engineer McDonnell Douglas Aerospace Space Station Division, Houston, Texas ABSTRACT During the

More information

ACCELERATED LEARNING OF GAUSSIAN PROCESS MODELS

ACCELERATED LEARNING OF GAUSSIAN PROCESS MODELS ACCELERATED LEARNING OF GAUSSIAN PROCESS MODELS Bojan Musizza, Dejan Petelin, Juš Kocijan, Jožef Stefan Institute Jamova 39, Ljubljana, Slovenia University of Nova Gorica Vipavska 3, Nova Gorica, Slovenia

More information

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using

More information

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1 Lecture 2 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Multivariate Gaussian Random Number Generator Targeting Specific Resource Utilization in an FPGA

Multivariate Gaussian Random Number Generator Targeting Specific Resource Utilization in an FPGA Multivariate Gaussian Random Number Generator Targeting Specific Resource Utilization in an FPGA Chalermpol Saiprasert, Christos-Savvas Bouganis and George A. Constantinides Department of Electrical &

More information

Analysis of Algorithms [Reading: CLRS 2.2, 3] Laura Toma, csci2200, Bowdoin College

Analysis of Algorithms [Reading: CLRS 2.2, 3] Laura Toma, csci2200, Bowdoin College Analysis of Algorithms [Reading: CLRS 2.2, 3] Laura Toma, csci2200, Bowdoin College Why analysis? We want to predict how the algorithm will behave (e.g. running time) on arbitrary inputs, and how it will

More information

Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore

Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore Lecture No. # 32 Probabilistic Methods in Earthquake Engineering-1 (Refer Slide

More information

Efficient implementation of the overlap operator on multi-gpus

Efficient implementation of the overlap operator on multi-gpus Efficient implementation of the overlap operator on multi-gpus Andrei Alexandru Mike Lujan, Craig Pelissier, Ben Gamari, Frank Lee SAAHPC 2011 - University of Tennessee Outline Motivation Overlap operator

More information

ENGR352 Problem Set 02

ENGR352 Problem Set 02 engr352/engr352p02 September 13, 2018) ENGR352 Problem Set 02 Transfer function of an estimator 1. Using Eq. (1.1.4-27) from the text, find the correct value of r ss (the result given in the text is incorrect).

More information

MPI parallel implementation of CBF preconditioning for 3D elasticity problems 1

MPI parallel implementation of CBF preconditioning for 3D elasticity problems 1 Mathematics and Computers in Simulation 50 (1999) 247±254 MPI parallel implementation of CBF preconditioning for 3D elasticity problems 1 Ivan Lirkov *, Svetozar Margenov Central Laboratory for Parallel

More information

An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks

An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks Charbel Sakr, Naresh Shanbhag Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign

More information

Hybrid static/dynamic scheduling for already optimized dense matrix factorization. Joint Laboratory for Petascale Computing, INRIA-UIUC

Hybrid static/dynamic scheduling for already optimized dense matrix factorization. Joint Laboratory for Petascale Computing, INRIA-UIUC Hybrid static/dynamic scheduling for already optimized dense matrix factorization Simplice Donfack, Laura Grigori, INRIA, France Bill Gropp, Vivek Kale UIUC, USA Joint Laboratory for Petascale Computing,

More information

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications Christopher Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, Wen-Mei W. Hwu University of Illinois at Urbana-Champaign

More information

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations Sparse Linear Systems Iterative Methods for Sparse Linear Systems Matrix Computations and Applications, Lecture C11 Fredrik Bengzon, Robert Söderlund We consider the problem of solving the linear system

More information

BME STUDIES OF STOCHASTIC DIFFERENTIAL EQUATIONS REPRESENTING PHYSICAL LAW

BME STUDIES OF STOCHASTIC DIFFERENTIAL EQUATIONS REPRESENTING PHYSICAL LAW 7 VIII. BME STUDIES OF STOCHASTIC DIFFERENTIAL EQUATIONS REPRESENTING PHYSICAL LAW A wide variety of natural processes are described using physical laws. A physical law may be expressed by means of an

More information

NON-LINEAR PARAMETER ESTIMATION USING VOLTERRA AND WIENER THEORIES

NON-LINEAR PARAMETER ESTIMATION USING VOLTERRA AND WIENER THEORIES Journal of Sound and Vibration (1999) 221(5), 85 821 Article No. jsvi.1998.1984, available online at http://www.idealibrary.com on NON-LINEAR PARAMETER ESTIMATION USING VOLTERRA AND WIENER THEORIES Department

More information

A STRATEGY FOR IDENTIFICATION OF BUILDING STRUCTURES UNDER BASE EXCITATIONS

A STRATEGY FOR IDENTIFICATION OF BUILDING STRUCTURES UNDER BASE EXCITATIONS A STRATEGY FOR IDENTIFICATION OF BUILDING STRUCTURES UNDER BASE EXCITATIONS G. Amato and L. Cavaleri PhD Student, Dipartimento di Ingegneria Strutturale e Geotecnica,University of Palermo, Italy. Professor,

More information

ANNEX A: ANALYSIS METHODOLOGIES

ANNEX A: ANALYSIS METHODOLOGIES ANNEX A: ANALYSIS METHODOLOGIES A.1 Introduction Before discussing supplemental damping devices, this annex provides a brief review of the seismic analysis methods used in the optimization algorithms considered

More information

Stochastic chemical kinetics on an FPGA: Bruce R Land. Introduction

Stochastic chemical kinetics on an FPGA: Bruce R Land. Introduction Stochastic chemical kinetics on an FPGA: Bruce R Land Introduction As you read this, there are thousands of chemical reactions going on in your body. Some are very fast, for instance, the binding of neurotransmitters

More information

Building Blocks for Direct Sequential Simulation on Unstructured Grids

Building Blocks for Direct Sequential Simulation on Unstructured Grids Building Blocks for Direct Sequential Simulation on Unstructured Grids Abstract M. J. Pyrcz (mpyrcz@ualberta.ca) and C. V. Deutsch (cdeutsch@ualberta.ca) University of Alberta, Edmonton, Alberta, CANADA

More information

THE problem of phase noise and its influence on oscillators

THE problem of phase noise and its influence on oscillators IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 54, NO. 5, MAY 2007 435 Phase Diffusion Coefficient for Oscillators Perturbed by Colored Noise Fergal O Doherty and James P. Gleeson Abstract

More information

INTRODUCTION TO MARKOV CHAIN MONTE CARLO

INTRODUCTION TO MARKOV CHAIN MONTE CARLO INTRODUCTION TO MARKOV CHAIN MONTE CARLO 1. Introduction: MCMC In its simplest incarnation, the Monte Carlo method is nothing more than a computerbased exploitation of the Law of Large Numbers to estimate

More information

Solving linear systems (6 lectures)

Solving linear systems (6 lectures) Chapter 2 Solving linear systems (6 lectures) 2.1 Solving linear systems: LU factorization (1 lectures) Reference: [Trefethen, Bau III] Lecture 20, 21 How do you solve Ax = b? (2.1.1) In numerical linear

More information

PHI: Open quantum dynamics of multi-molecule systems

PHI: Open quantum dynamics of multi-molecule systems University of Illinois at Urbana-Champaign Beckman Institute for Advanced Science and Technology Theoretical and Computational Biophysics Group PHI: Open quantum dynamics of multi-molecule systems Johan

More information

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 19

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 19 EEC 686/785 Modeling & Performance Evaluation of Computer Systems Lecture 19 Department of Electrical and Computer Engineering Cleveland State University wenbing@ieee.org (based on Dr. Raj Jain s lecture

More information

Model Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University

Model Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University Model Order Reduction via Matlab Parallel Computing Toolbox E. Fatih Yetkin & Hasan Dağ Istanbul Technical University Computational Science & Engineering Department September 21, 2009 E. Fatih Yetkin (Istanbul

More information

In-Flight Engine Diagnostics and Prognostics Using A Stochastic-Neuro-Fuzzy Inference System

In-Flight Engine Diagnostics and Prognostics Using A Stochastic-Neuro-Fuzzy Inference System In-Flight Engine Diagnostics and Prognostics Using A Stochastic-Neuro-Fuzzy Inference System Dan M. Ghiocel & Joshua Altmann STI Technologies, Rochester, New York, USA Keywords: reliability, stochastic

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

FAS and Solver Performance

FAS and Solver Performance FAS and Solver Performance Matthew Knepley Mathematics and Computer Science Division Argonne National Laboratory Fall AMS Central Section Meeting Chicago, IL Oct 05 06, 2007 M. Knepley (ANL) FAS AMS 07

More information

Integer Factorisation on the AP1000

Integer Factorisation on the AP1000 Integer Factorisation on the AP000 Craig Eldershaw Mathematics Department University of Queensland St Lucia Queensland 07 cs9@student.uq.edu.au Richard P. Brent Computer Sciences Laboratory Australian

More information

ON THE CORRELATION OF GROUND MOTION INDICES TO DAMAGE OF STRUCTURE MODELS

ON THE CORRELATION OF GROUND MOTION INDICES TO DAMAGE OF STRUCTURE MODELS 3 th World Conference on Earthquake Engineering Vancouver, B.C., Canada August -6, 24 Paper No. 74 ON THE CORRELATION OF GROUND MOTION INDICES TO DAMAGE OF STRUCTURE MODELS Gilbert MOLAS, Mohsen RAHNAMA

More information

Case Study: Quantum Chromodynamics

Case Study: Quantum Chromodynamics Case Study: Quantum Chromodynamics Michael Clark Harvard University with R. Babich, K. Barros, R. Brower, J. Chen and C. Rebbi Outline Primer to QCD QCD on a GPU Mixed Precision Solvers Multigrid solver

More information

Lecture Note 1: Probability Theory and Statistics

Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information

Lecture 4: Linear Algebra 1

Lecture 4: Linear Algebra 1 Lecture 4: Linear Algebra 1 Sourendu Gupta TIFR Graduate School Computational Physics 1 February 12, 2010 c : Sourendu Gupta (TIFR) Lecture 4: Linear Algebra 1 CP 1 1 / 26 Outline 1 Linear problems Motivation

More information

Dynamic Analysis in FEMAP. May 24 th, presented by Philippe Tremblay Marc Lafontaine

Dynamic Analysis in FEMAP. May 24 th, presented by Philippe Tremblay Marc Lafontaine Dynamic Analysis in FEMAP presented by Philippe Tremblay Marc Lafontaine marc.lafontaine@mayasim.com 514-951-3429 date May 24 th, 2016 Agenda NX Nastran Transient, frequency response, random, response

More information

On the Optimal Scaling of the Modified Metropolis-Hastings algorithm

On the Optimal Scaling of the Modified Metropolis-Hastings algorithm On the Optimal Scaling of the Modified Metropolis-Hastings algorithm K. M. Zuev & J. L. Beck Division of Engineering and Applied Science California Institute of Technology, MC 4-44, Pasadena, CA 925, USA

More information

Roundoff Noise in Digital Feedback Control Systems

Roundoff Noise in Digital Feedback Control Systems Chapter 7 Roundoff Noise in Digital Feedback Control Systems Digital control systems are generally feedback systems. Within their feedback loops are parts that are analog and parts that are digital. At

More information

P 1.5 X 4.5 / X 2 and (iii) The smallest value of n for

P 1.5 X 4.5 / X 2 and (iii) The smallest value of n for DHANALAKSHMI COLLEGE OF ENEINEERING, CHENNAI DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING MA645 PROBABILITY AND RANDOM PROCESS UNIT I : RANDOM VARIABLES PART B (6 MARKS). A random variable X

More information

Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs

Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs Christopher P. Stone, Ph.D. Computational Science and Engineering, LLC Kyle Niemeyer, Ph.D. Oregon State University 2 Outline

More information

Multivariate Distribution Models

Multivariate Distribution Models Multivariate Distribution Models Model Description While the probability distribution for an individual random variable is called marginal, the probability distribution for multiple random variables is

More information