Image Reconstruction and Poisson's Equation


Chapter 1, p. 1/58 Image Reconstruction and Poisson's Equation School of Engineering Sciences Parallel Computations for Large-Scale Problems I

Chapter 1, p. 2/58 Outline

Chapter 1, p. 3/58 Question What do image processing and the solution of partial differential equations have in common?

Chapter 1, p. 4/58 Digital Images Definition: A (digital) image is an $M \times N$ matrix of pixel values, the pixmap. We will assume that each pixel is represented by its gray level; thus, we assume the image to be black/white. A colored image consists of a collection of pixmaps. We assume that the type of a pixel is double; in practice, 8-bit values (unsigned char) are most often used.

Chapter 1, p. 5/58 Smoothing, Sharpening, Noise Reduction Smoothing suppresses large fluctuations in intensity over the image Sharpening accentuates transitions and enhances the details Noise reduction suppresses a noise signal present in an image

Chapter 1, p. 6/58 Smoothing By Local Filtering Idea: Replace each pixel value $u_{mn}$ by the mean of the surrounding pixels: $\tilde u_{mn} = \frac{1}{9}(u_{m-1,n-1} + u_{m-1,n} + u_{m-1,n+1} + u_{m,n-1} + u_{mn} + u_{m,n+1} + u_{m+1,n-1} + u_{m+1,n} + u_{m+1,n+1})$

Chapter 1, p. 7/58 Smoothing: Example [figure: original image and smoothed result]

Chapter 1, p. 8/58 Weighted Masks The mean-value operation can be conveniently described by a $3 \times 3$ weight matrix $W$, $W = \frac{1}{9}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$. Application: $\tilde u_{mn} = w_{-1,-1} u_{m-1,n-1} + w_{-1,0} u_{m-1,n} + w_{-1,1} u_{m-1,n+1} + w_{0,-1} u_{m,n-1} + w_{0,0} u_{mn} + w_{0,1} u_{m,n+1} + w_{1,-1} u_{m+1,n-1} + w_{1,0} u_{m+1,n} + w_{1,1} u_{m+1,n+1}$. Mathematically, this is a convolution.
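
The mask application can be made concrete with a minimal C sketch (ours, not from the slides; the function name apply_mask3x3 and the row-major double array layout are our assumptions):

    /* Minimal sketch (ours): apply a 3x3 weight mask W to an M x N grayscale
     * pixmap u, writing the result into ut.  Row-major storage is assumed,
     * and the one-pixel boundary is left untouched (i.e. discarded, cf. the
     * boundary-conditions slide below). */
    void apply_mask3x3(const double *u, double *ut, int M, int N,
                       const double W[3][3])
    {
        for (int m = 1; m < M - 1; m++)
            for (int n = 1; n < N - 1; n++) {
                double s = 0.0;
                for (int dm = -1; dm <= 1; dm++)      /* loop over the mask */
                    for (int dn = -1; dn <= 1; dn++)
                        s += W[dm + 1][dn + 1] * u[(m + dm) * N + (n + dn)];
                ut[m * N + n] = s;
            }
    }

Calling it with the mean-value mask above reproduces the smoothing filter; the denoising and Laplacian masks on the following slides plug into the same routine.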

Chapter 1, p. 9/58 Noise Reduction [figure: noisy image and denoised result] The weight matrix used is $W = \frac{1}{16}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 8 & 1 \\ 1 & 1 & 1 \end{pmatrix}$.

Chapter 1, p. 10/58 Edge Detection Edge detection is the highlighting of the edges of an object, where an edge is a significant change in the gray-level intensity. Basic idea: the rate of change of a quantity can be measured by the magnitude of its derivative(s).

Chapter 1, p. 11/58 Example

Chapter 1, p. 12/58 Laplace Operator Definition: For any function $u$ defined on some two-dimensional domain, the Laplacian $\Delta u$ of $u$ is defined as $\Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}$.

Chapter 1, p. 13/58 Approximating the Laplacian Approximate the derivatives, $\frac{\partial^2 u}{\partial x^2}(x, y) \approx \frac{1}{h^2}\bigl(u(x-h, y) - 2u(x, y) + u(x+h, y)\bigr)$, $h > 0$, and similarly for $\partial^2 u / \partial y^2$. We obtain the weight matrix $W = \frac{1}{h^2}\begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix}$. This fits exactly into our framework!

Chapter 1, p. 14/58 Edge Detection By Discrete Laplacian [figure: original image and detected edges]

Chapter 1, p. 15/58 Using First Order Derivatives: Sobel Operator [figure: original image and detected edges]

Chapter 1, p. 16/58 Poisson's Equation A ubiquitous equation: fluid flow, electromagnetics, gravitational interaction, ... In two dimensions, Poisson's equation reads: solve $\Delta u = f(x, y)$ for $(x, y) \in \Omega$, subject to the boundary condition $u(x, y) = g(x, y)$ for $(x, y) \in \partial\Omega$. For simplicity, consider only $\Omega = (0, 1) \times (0, 1)$. Generalizations to other dimensions are obvious.

Chapter 1, p. 17/58 Discrete Approximation Define a mesh (or grid): for a given $N$, let $h = 1/(N-1)$, $x_m = mh$, $y_n = nh$. Let $u_{mn} \approx u(x_m, y_n)$ and $f_{mn} = f(x_m, y_n)$. Using the Laplace approximation from above, we obtain a system of equations $\frac{1}{h^2}(u_{m-1,n} + u_{m+1,n} + u_{m,n-1} + u_{m,n+1} - 4u_{mn}) = f_{mn}$, $0 < m, n < N-1$. In the context of PDEs, the matrix $W$ is usually called a stencil.

Chapter 1, p. 18/58 Jacobi Iteration Basic idea: Rewrite the equations as $u_{mn} = \frac{1}{4}(u_{m-1,n} + u_{m+1,n} + u_{m,n-1} + u_{m,n+1} - h^2 f_{mn})$. For some starting guess (e.g., $u_{mn} = 0$), iterate this equation:

while (not_done)
  for (m,n) in 1:N-2 x 1:N-2
    ut(m,n) = (u(m-1,n) + u(m+1,n) + u(m,n-1) + u(m,n+1) - h^2*f(m,n))/4;
  end
  u = ut;
end
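
In C, one Jacobi sweep might look as follows (a sketch under our assumptions, not the course's reference code: row-major storage, boundary rows/columns held fixed by the boundary condition):

    /* Sketch of one Jacobi sweep for the 5-point Poisson stencil on an
     * N x N grid; corresponds to the loop body in the pseudocode above. */
    void jacobi_sweep(const double *u, double *ut, const double *f,
                      int N, double h)
    {
        for (int m = 1; m < N - 1; m++)
            for (int n = 1; n < N - 1; n++)
                ut[m * N + n] = (u[(m - 1) * N + n] + u[(m + 1) * N + n] +
                                 u[m * N + (n - 1)] + u[m * N + (n + 1)] -
                                 h * h * f[m * N + n]) / 4.0;
    }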

Chapter 1, p. 19/58 Accuracy How do we know that the answer is good enough? When the computed solution has reached a reasonable approximation to the exact solution. When we can validate the computed solution in the field. But often we do not know the exact solution and must estimate the error, e.g.: stop when the residual $r = Au - f$ is small enough, or stop when the change $\tilde u - u$ between successive iterates is small. Both approaches must be designed carefully!
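
A residual-based stopping test can reuse the stencil. A hedged C sketch (ours), computing the max-norm of $r = Au - f$ for the 5-point discretization:

    #include <math.h>

    /* Sketch: max-norm of the residual r = Au - f for the 5-point stencil. */
    double residual_max_norm(const double *u, const double *f, int N, double h)
    {
        double rmax = 0.0;
        for (int m = 1; m < N - 1; m++)
            for (int n = 1; n < N - 1; n++) {
                double Au = (u[(m - 1) * N + n] + u[(m + 1) * N + n] +
                             u[m * N + (n - 1)] + u[m * N + (n + 1)] -
                             4.0 * u[m * N + n]) / (h * h);
                double r = fabs(Au - f[m * N + n]);
                if (r > rmax) rmax = r;
            }
        return rmax;
    }

An iteration would stop once this value falls below a user-chosen tolerance.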

Chapter 1, p. 20/58 Boundary Conditions Evaluating the stencil is not possible near the boundary. For Poisson's equation, invoke the boundary condition. In image processing, there are two possibilities: (1) discard the boundary (the new image is 2 pixels smaller in both dimensions), or (2) modify the weight matrices such that only existing neighbors are used.

Chapter 1, p. 21/58 Common Denominator Conclusion: The methods considered use a uniform mesh for their data. Such methods are very common in applications. They can be easily adapted to problems of any (spatial) dimension.

Chapter 1, p. 22/58 Observations The computations for each point $u_{ij}$ are completely decoupled; the number of operations per data point is constant; the new value at each data point depends only on its nearest neighbors. Conclusion: A good parallelization strategy is data partitioning. To keep things as simple as possible, consider only a one-dimensional array (a vector), $u = (u_0, \dots, u_{M-1})^T$.

Chapter 1, p. 23/58 Definition Assume that we have $P$ processes (enumerated $0, \dots, P-1$). A $P$-fold data distribution of the index set $M = \{0, \dots, M-1\}$ is a bijective mapping $\mu$ which maps each global index $m \in M$ to a pair of indices $(p, i)$, where $p$ is the process identifier and $i$ the local index. Notes: This definition allows for the fact that the number of elements on each process varies with $p$; of course, this is necessary if $P$ does not divide $M$ evenly. Technically, we assume that the local index set on each process is a set of consecutive integers, often (but not always!) $0 \le i < I_p$.

Chapter 1, p. 24/58 Example: Linear Idea: Split the vector into equal chunks and allocate the $p$-th chunk to process $p$. $p = 0$: $u_0, u_1, \dots, u_{I_0-1}$; $p = 1$: $u_{I_0}, \dots, u_{I_0+I_1-1}$; in general, process $p$ holds the $I_p$ elements following those of process $p-1$. If $P$ does not divide $M$ evenly, distribute the remaining $R$ elements to the first few processes. The load-balanced linear data distribution is: $L = \lfloor M/P \rfloor$, $R = M \bmod P$, $\mu(m) = (p, i)$ where $p = \max\left(\left\lfloor \frac{m}{L+1} \right\rfloor, \left\lfloor \frac{m-R}{L} \right\rfloor\right)$ and $i = m - pL - \min(p, R)$; $I_p = \lfloor (M + P - p - 1)/P \rfloor$; $\mu^{-1}(p, i) = pL + \min(p, R) + i$.
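
A minimal C sketch of these formulas (ours; the names map_t and mu_linear, and the assumption $P \le M$ so that $L \ge 1$, are ours):

    typedef struct { int p; int i; } map_t;

    /* mu(m) = (p, i) for the load-balanced linear distribution. */
    map_t mu_linear(int m, int M, int P)
    {
        int L = M / P, R = M % P;      /* chunk size and remainder */
        int p1 = m / (L + 1);
        int p2 = (m - R) / L;
        int p  = p1 > p2 ? p1 : p2;    /* p = max(...) as on the slide */
        map_t res = { p, m - p * L - (p < R ? p : R) };
        return res;
    }

    /* Global index of local element i on process p. */
    int mu_linear_inverse(int p, int i, int M, int P)
    {
        int L = M / P, R = M % P;
        return p * L + (p < R ? p : R) + i;
    }

For example, $M = 10$, $P = 3$ gives chunks of sizes 4, 3, 3; mu_linear(4, 10, 3) returns (1, 0), and mu_linear_inverse(1, 0, 10, 3) returns 4.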

Chapter 1, p. 25/58 Example: Scatter Idea: Allocate consecutive vector components to consecutive processes. $p = 0$: $u_0, u_P, u_{2P}, \dots$; $p = 1$: $u_1, u_{P+1}, \dots$; $p$: $u_p, u_{P+p}, \dots$ The load-balanced scatter distribution is: $\mu(m) = (p, i)$ where $p = m \bmod P$ and $i = \lfloor m/P \rfloor$; $I_p = \lfloor (M + P - p - 1)/P \rfloor$; $\mu^{-1}(p, i) = iP + p$.
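
The scatter (cyclic) mapping is even shorter; this sketch (ours) reuses map_t from the previous one:

    /* Scatter distribution: mu(m) = (m mod P, floor(m/P)). */
    map_t mu_scatter(int m, int P)
    {
        map_t res = { m % P, m / P };
        return res;
    }

    int mu_scatter_inverse(int p, int i, int P)
    {
        return i * P + p;
    }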

Chapter 1, p. 26/58 A Distributed Vector The one-dimensional version of the convolution formula reads $\tilde u_m = w_{-1} u_{m-1} + w_0 u_m + w_1 u_{m+1}$. Each evaluation needs its neighbors; consequently, the linear data distribution is most appropriate. Each process needs one element stored on the process to the left and one on the process to the right. [figure: local elements flanked by one external element on each side] Conclusion: In addition to the local data, introduce ghost cells.

Chapter 1, p. 27/58 Ghost Cells Two adjacent processes $p$ and $p+1$ need to share two data points; this is called the overlap between the two processes. The overlap is $a = 2$ in this case; in general, the overlap depends on the width of the stencil. The local array $u_i$, $0 \le i < I_p$, will be surrounded by two ghost cells (with the exception of the first and the last processes). This is conveniently done by enlarging the local vector.

Chapter 1, p. 28/58 Linear With Overlap Let $q = a/2$ and $\tilde M = M + (P-1)a$. Then $L = \lfloor \tilde M/P \rfloor$, $R = \tilde M \bmod P$, $I_p = L+1$ for $p < R$ and $I_p = L$ for $p \ge R$, and $\mu^{-1}(p, i) = (L-a)p + \min(R, p) + i$. Note that the latter formula is only meaningful for $0 < i < I_p - 1$, so u[0] and u[Ip-1] do not contain valid entries!
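
A small C helper (ours, not from the slides) computing the local array length, ghost cells included, following the slide's formulas:

    /* Sketch: local array length for the linear distribution with overlap a. */
    int local_size(int p, int M, int P, int a)
    {
        int Mt = M + (P - 1) * a;      /* enlarged "virtual" length */
        int L = Mt / P, R = Mt % P;
        return (p < R) ? L + 1 : L;
    }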

Chapter 1, p. 29/58 Fill Ghost Cells: Communication Before we can start applying the stencil, the ghost cells must be filled. Attempted (erroneous) solution, assuming $a = 2$ for simplicity:

receive(u[0],p-1);    send(u[1],p-1);
receive(u[Ip-1],p+1); send(u[Ip-2],p+1);

Deadlock! Mismatch in communication: all processes are waiting to receive. Possible solutions: rewrite the program so that calls to send and receive are matched, or use non-blocking communication.

Chapter 1, p. 30/58 Communication: A New Attempt Exchange send and receive:

if p > 0
  send(u[1],p-1);       receive(u[0],p-1);
end
if p < P-1
  receive(u[Ip-1],p+1); send(u[Ip-2],p+1);
end

The code works, but it is very inefficient: most processes are idle during communication. Possible solution: use a different communication pattern.

Chapter 1, p. 31/58 An Efficient But Unreliable Solution

send(u[Ip-2],p+1); receive(u[0],p-1);
send(u[1],p-1);    receive(u[Ip-1],p+1);

Properties: + the communication time is optimal, $2(t_{startup} + 8t_{data})$; - it relies on the network to buffer the messages, which is not guaranteed by MPI!

Chapter 1, p. 32/58 Safe Solution The idea is a red-black (chequerboard) coloring: even $p$ is assigned red, odd $p$ is assigned black. Communication proceeds in two steps, red->black and black->red:

if mycolor == red
  send(u[Ip-2],p+1); receive(u[Ip-1],p+1);
  send(u[1],p-1);    receive(u[0],p-1);
else
  receive(u[0],p-1);    send(u[1],p-1);
  receive(u[Ip-1],p+1); send(u[Ip-2],p+1);
end

The communication time is only doubled compared to the previous version.
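
In MPI terms, this red-black exchange could be sketched as below (ours, not the course's reference code; blocking point-to-point calls, overlap $a = 2$, and rank $p$ taken as "red" when even are our assumptions):

    #include <mpi.h>

    /* Sketch: red-black ghost-cell exchange for a 1D decomposition.
     * u has Ip entries; u[0] and u[Ip-1] are the ghost cells. */
    void exchange(double *u, int Ip, int p, int P)
    {
        MPI_Status st;
        if (p % 2 == 0) {   /* red: send first, then receive */
            if (p < P - 1) {
                MPI_Send(&u[Ip - 2], 1, MPI_DOUBLE, p + 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&u[Ip - 1], 1, MPI_DOUBLE, p + 1, 0, MPI_COMM_WORLD, &st);
            }
            if (p > 0) {
                MPI_Send(&u[1], 1, MPI_DOUBLE, p - 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&u[0], 1, MPI_DOUBLE, p - 1, 0, MPI_COMM_WORLD, &st);
            }
        } else {            /* black: receive first, then send */
            if (p > 0) {
                MPI_Recv(&u[0], 1, MPI_DOUBLE, p - 1, 0, MPI_COMM_WORLD, &st);
                MPI_Send(&u[1], 1, MPI_DOUBLE, p - 1, 0, MPI_COMM_WORLD);
            }
            if (p < P - 1) {
                MPI_Recv(&u[Ip - 1], 1, MPI_DOUBLE, p + 1, 0, MPI_COMM_WORLD, &st);
                MPI_Send(&u[Ip - 2], 1, MPI_DOUBLE, p + 1, 0, MPI_COMM_WORLD);
            }
        }
    }

Every send by a red process is matched by a receive already posted on its black neighbor (and vice versa), so no buffering is required.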

Chapter 1, p. 33/58 Generalizations To Two Dimensions Sample stencil (Poisson): [figure: 5-point stencil] Use an array of $R = P \times Q$ processes. Distribute equal chunks of the pixmap/solution onto these processes. Different partitions are called the process geometry or process topology.

Chapter 1, p. 34/58 Process Topology [figures: process geometries with P = Q = 2 and with P = 1, Q = 5]

Chapter 1, p. 35/58 Ghost Cells Each process needs values found on neighboring processes. Use ghost cells. [figure: circles mark local grid points, crosses mark ghost points] The data distribution is constructed individually for the x and y directions along the lines of the 1D example.

Chapter 1, p. 36/58 Communication of Ghost Points Question: How should the exchange of the ghost points corresponding to the inter-process boundaries be implemented? The handling of the outer boundaries depends on the problem at hand (either ignore them or apply physical boundary conditions).

Chapter 1, p. 37/58 Some Notation For a process with coordinates (p, q), the neighbors are defined as follows (if they exist): east (p+1, q), west (p-1, q), north (p, q+1), south (p, q-1).

Chapter 1, p. 38/58 Non-Blocking Implementation 1. Initiate sends (MPI_Isend) to the east, west, north, and south neighbors (if present). 2. Initiate receives (MPI_Irecv) from the west, east, south, and north neighbors. 3. Evaluate the stencil away from the boundaries. 4. Wait for the communication to complete. 5. Evaluate the stencil near the boundaries.
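
A hedged C/MPI sketch of this scheme, shown for the north-south direction only (ours; the ghost-frame layout of (Ip+2) x (Jq+2) and the function name are assumptions, and the strided east-west columns would additionally need an MPI_Type_vector):

    #include <mpi.h>

    /* Sketch: non-blocking north-south ghost exchange.  north/south are
     * neighbor ranks, or MPI_PROC_NULL at the outer boundary (which makes
     * the corresponding calls no-ops). */
    void exchange_ns(double *u, int Ip, int Jq, int north, int south)
    {
        MPI_Request req[4];
        /* last interior row -> north neighbor, first interior row -> south */
        MPI_Isend(&u[Ip * (Jq + 2) + 1], Jq, MPI_DOUBLE, north, 0,
                  MPI_COMM_WORLD, &req[0]);
        MPI_Isend(&u[1 * (Jq + 2) + 1], Jq, MPI_DOUBLE, south, 1,
                  MPI_COMM_WORLD, &req[1]);
        MPI_Irecv(&u[(Ip + 1) * (Jq + 2) + 1], Jq, MPI_DOUBLE, north, 1,
                  MPI_COMM_WORLD, &req[2]);
        MPI_Irecv(&u[0 * (Jq + 2) + 1], Jq, MPI_DOUBLE, south, 0,
                  MPI_COMM_WORLD, &req[3]);
        /* step 3: evaluate the stencil away from the process boundary here */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        /* step 5: evaluate the stencil near the process boundary */
    }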

Chapter 1, p. 39/58 Red-Black Communication This generalizes the red-black communication in 1D. Associate each process with a color (red or black) in the p and q directions such that no neighbor has the same color. East-west sweep:

if color(1) == black
  send(p+1,q); receive(p+1,q);
  send(p-1,q); receive(p-1,q);
else
  receive(p-1,q); send(p-1,q);
  receive(p+1,q); send(p+1,q);
end

Chapter 1, p. 40/58 Red-Black Communication (cont) South-north sweep:

if color(2) == black
  send(p,q+1); receive(p,q+1);
  send(p,q-1); receive(p,q-1);
else
  receive(p,q-1); send(p,q-1);
  receive(p,q+1); send(p,q+1);
end

Chapter 1, p. 41/58 Red-Black Communication (cont) [figure: process grid with per-direction colors (R/B); the numbers and colors show the communication pattern, with the process color indicated by (q, p) (note the order)]

Chapter 1, p. 42/58 Red-Black Communication Time We assume a perfectly load-balanced (linear) distribution, $I_p \approx M/P$, $J_q \approx N/Q$. East-west sweep: $t_{comm,1} = C(P)(t_{startup} + I_p t_{data})$, where $C(P) = 0$ if $P = 1$, $C(P) = 2$ if $P = 2$, and $C(P) = 4$ if $P \ge 3$. Similarly, for the south-north sweep: $t_{comm,2} = C(Q)(t_{startup} + J_q t_{data})$. Total communication time: $t_{comm} \approx (C(P) + C(Q)) t_{startup} + t_{data} \frac{C(P) Q M + C(Q) P N}{PQ}$.

Chapter 1, p. 43/58 Computation Time Assume a (compact) stencil $W = \begin{pmatrix} w_{-1,-1} & w_{-1,0} & w_{-1,1} \\ w_{0,-1} & w_{0,0} & w_{0,1} \\ w_{1,-1} & w_{1,0} & w_{1,1} \end{pmatrix}$. Let $w$ be the number of nonzero entries in $W$. Then ($0 < \alpha$ is a small constant) $t_{comp,PQ} = \alpha w I_p J_q t_a \approx \alpha w \frac{MN}{PQ} t_a$. Best sequential time: $T_s = \alpha w M N t_a$.

Chapter 1, p. 44/58 Speedup $S_R = S_{PQ} = \frac{T_s}{T_R} \approx \frac{R\,\alpha w M N t_a}{\alpha w M N t_a + 8R t_{startup} + 4(QM + PN) t_{data}} = \frac{R}{1 + \frac{8R t_{startup}}{\alpha w M N t_a} + \frac{4}{\alpha w}\left(\frac{P}{M} + \frac{Q}{N}\right)\frac{t_{data}}{t_a}}$. Conclusions: For constant $R$, the speedup approaches its optimal value as $MN$ becomes large. If $MN$ is fixed, the speedup will eventually degrade as $R$ grows. The speedup improves if $(P/M + Q/N)$ attains a minimum for a given problem size and a given number of processes.

Chapter 1, p. 45/58 Optimal Process Topology For a given problem size $MN$ and a given number of processes $R$, find $P$ and $Q = R/P$ such that $\Phi(P) = \frac{P}{M} + \frac{Q}{N}$ becomes minimal. A simple calculation gives $P = \sqrt{\frac{MR}{N}}$ (provided that these are integers). In the case $M = N$ and $R$ being a square, $P = \sqrt{R}$.
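
A worked example (numbers ours, not from the slides): with $M = 2048$, $N = 512$ and $R = 16$, $P = \sqrt{2048 \cdot 16 / 512} = \sqrt{64} = 8$ and $Q = R/P = 2$, so the process grid is elongated in the direction in which the data set is larger. Indeed, $\Phi(8) = 8/2048 + 2/512 = 1/128$, whereas the square choice $P = Q = 4$ gives the larger value $\Phi(4) = 4/2048 + 4/512 = 5/512$.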

Chapter 1, p. 46/58 Efficiency For typical data on lucidor, this is the efficiency $E_R = S_R/R$: [plot: efficiency (0 to 1) vs. number of processors (1 to ~1000, log scale) for N = 1024, 2048, 4096, 8192, 16384]

Chapter 1, p. 47/58 Communication Fraction [plot: communication fraction (0 to 1) vs. number of processors (log scale) for N = 1024, 2048, 4096, 8192, 16384]

Chapter 1, p. 48/58 Surface to Volume Ratio Observation: the computation time $t_{comp}$ is proportional to the area $I_p J_q$ of the local data; the communication time $t_{comm}$ is proportional to its perimeter $2(I_p + J_q)$. Area-perimeter law: the communication time is negligible if the number of data points $MN$ is large compared to the number of processes.

Chapter 1, p. 49/58 Curse of Dimensionality As we move to higher-dimensional spaces, communication becomes relatively more costly; the surface-to-volume ratio is $2/N$ in 1D, $4N/N^2 = 4/N$ in 2D, and $6N^2/N^3 = 6/N$ in 3D, so the constant in front of $1/N$ grows with the dimension.

Chapter 1, p. 50/58 Virtual Topologies MPI includes a number of standard routines for defining and handling different process topologies. They are called virtual topologies. These routines greatly simplify the programming effort.
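
A minimal sketch of these routines (the calls are standard MPI; the 2D layout choices are ours):

    #include <mpi.h>

    /* Sketch: create a P x Q Cartesian virtual topology and look up the
     * east/west/north/south neighbor ranks (MPI_PROC_NULL at the edges). */
    int main(int argc, char **argv)
    {
        int dims[2] = {0, 0}, periods[2] = {0, 0};
        int size, west, east, south, north;
        MPI_Comm grid;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Dims_create(size, 2, dims);              /* choose P and Q */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);
        MPI_Cart_shift(grid, 0, 1, &west, &east);    /* neighbors in p */
        MPI_Cart_shift(grid, 1, 1, &south, &north);  /* neighbors in q */
        /* ... ghost-cell exchange using these neighbor ranks ... */
        MPI_Finalize();
        return 0;
    }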

Chapter 1, p. 51/58 Jacobi Iteration Basic idea: Rewrite the equations as $u_{mn} = \frac{1}{4}(u_{m-1,n} + u_{m+1,n} + u_{m,n-1} + u_{m,n+1} - h^2 f_{mn})$. For some starting guess (e.g., $u_{mn} = 0$), iterate this equation:

while (not_done)
  for (m,n) in 1:N-2 x 1:N-2
    ut(m,n) = (u(m-1,n) + u(m+1,n) + u(m,n-1) + u(m,n+1) - h^2*f(m,n))/4;
  end
  u = ut;
end

Chapter 1, p. 52/58 Gauss-Seidel Iteration Observation: Jacobi iteration converges very slowly; each step computes $u^{k+1}_{mn} = \frac{1}{4}(u^k_{m-1,n} + u^k_{m+1,n} + u^k_{m,n-1} + u^k_{m,n+1} - h^2 f_{mn})$. Idea: Use the new (better?) values as soon as they become available.

Chapter 1, p. 53/58 Gauss-Seidel Iteration (cont)

while (not_done)
  for (m,n) in 1:N-2 x 1:N-2
    u(m,n) = (u(m-1,n) + u(m+1,n) + u(m,n-1) + u(m,n+1) - h^2*f(m,n))/4;
  end
  % u = ut;  (no copy step needed)
end

Observation: This iteration depends on the order of the unknowns!

Chapter 1, p. 54/58 Lexicographic Order Definition: The lexicographic order of the array $u_{mn}$ is given by $u_{11}, u_{21}, u_{31}, \dots, u_{M,1}, u_{12}, u_{22}, \dots, u_{MN}$. Lexicographic order corresponds to

for n = 1:N
  for m = 1:M
    % u(m,n) = ...
  end
end

Chapter 1, p. 55/58 Pipelined Computations Gauss-Seidel iterations are purely sequential. Assume a P x Q process grid as before. Process (p, q) cannot start computing before the values on processes (p-1, q) and (p, q-1) are available. This leads to pipelined computations: at every moment in time, only the processes along one diagonal are active.

Chapter 1, p. 56/58 How to Parallelize? Idea: Use red-black ordering! Black points: $m + n$ is even. Red points: $m + n$ is odd. Gauss-Seidel iteration: $u_{mn} = \frac{1}{4}(u_{m-1,n} + u_{m+1,n} + u_{m,n-1} + u_{m,n+1} - h^2 f_{mn})$. If $u_{mn}$ is black, the values on the right-hand side are all red, and vice versa. The black sweep and the red sweep can therefore each be parallelized independently. Note: This is a different kind of iteration!
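
A C sketch of one red-black Gauss-Seidel iteration (ours; the serial loop is shown, but every update within one sweep is independent and may be distributed exactly like the Jacobi sweep):

    /* Sketch: update all black points (m + n even), then all red points
     * (m + n odd), using the 5-point Poisson stencil. */
    void rb_gauss_seidel_sweep(double *u, const double *f, int N, double h)
    {
        for (int color = 0; color <= 1; color++)   /* 0: black, 1: red */
            for (int m = 1; m < N - 1; m++)
                for (int n = 1; n < N - 1; n++)
                    if ((m + n) % 2 == color)
                        u[m * N + n] = (u[(m - 1) * N + n] + u[(m + 1) * N + n] +
                                        u[m * N + (n - 1)] + u[m * N + (n + 1)] -
                                        h * h * f[m * N + n]) / 4.0;
    }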

Chapter 1, p. 57/58 Final Remarks More efficient methods for solving Poisson's equation include multigrid methods. For a full 9-point stencil, four colors are needed. Today, the most complex parallel circuit in a PC is the GPU (graphics processing unit); not surprisingly, the GPU is used as a parallel solver even for PDEs. MPI includes the possibility to define virtual topologies, which simplifies the design of the communication a lot.

Chapter 1, p. 58/58 What Did We Learn? Evaluation of stencils for different purposes (image processing, solution of partial differential equations). Data distributions, ghost points, and practical aspects. Efficient communication strategies. Performance evaluation of the corresponding algorithms. Pipelined computations (Gauss-Seidel iterations). Reformulation of recursive algorithms.