Mini-project in scientific computing
Eran Treister, Computer Science Department, Ben-Gurion University of the Negev, Israel.
March 7, 2018
Scientific computing
Scientific computing involves the solution of large computational problems that arise in applications:
Data science: developing methods to extract knowledge from large data.
Numerical algorithms for solving discrete and continuous problems.
High-performance computing: gaining efficiency and performance.
The fundamental mathematical tools: applied linear algebra, mathematical optimization, partial differential equations, (statistics).
Applied Linear Algebra
The most basic tool: simulations of natural phenomena; matrices represent graphs (directed or undirected); data measurements are often given in the form of a table, i.e., a matrix.
We usually want to solve a linear system, to approximate a solution of a linear system (solve a least squares problem), or to find eigenvalues and eigenvectors.
Applied Linear Algebra
In many cases the matrices are sparse (graphs, meshes, ...). Sparse: $O(n)$ nonzeros out of $n^2$ entries.
$n$ reaches tens of millions, while standard (dense) methods require $O(n^3)$ work for typical tasks.
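As a rough illustration of what sparsity means in practice (not part of the original slide), a short Python sketch using scipy.sparse; the tridiagonal matrix and its size are assumed toy choices:

    # Minimal sketch (assumed example): a sparse tridiagonal matrix with O(n)
    # nonzeros, stored in CSR format, applied to a vector.
    import numpy as np
    import scipy.sparse as sp

    n = 1_000_000
    main = 2.0 * np.ones(n)
    off = -1.0 * np.ones(n - 1)
    A = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")

    x = np.random.rand(n)
    y = A @ x          # matrix-vector product costs O(n), not O(n^2)
    print(A.nnz, "nonzeros out of", n * n, "entries")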
Ranking of web-pages
The problem: finding the leading eigenvector of a huge sparse (stochastic) matrix.
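A minimal sketch of this computation, assuming a tiny toy column-stochastic matrix B and plain power iteration (real web matrices are huge and sparse, and the project uses multigrid instead):

    # Power-iteration sketch for the leading eigenvector (eigenvalue 1) of a
    # small column-stochastic matrix; the 3x3 matrix is an assumed toy example.
    import numpy as np

    B = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])      # columns sum to 1

    x = np.ones(3) / 3.0
    for _ in range(100):
        x = B @ x
        x = x / np.abs(x).sum()          # keep the 1-norm of x equal to 1
    print(x)                             # approximates the leading eigenvector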
Mathematical Optimization
In many cases we wish to optimize a situation: improve bad images; earn more money in stocks; reduce the cost and time of some process; reduce bottlenecks in airports, roads, or internet networks; reduce the storage of data (compression).
We may also want to characterize data that we collected (classification), learn certain properties from data in order to predict future data, or reconstruct shapes from measurements (ultrasound, 3D scanning, CT, MRI).
Optimization problems
All of these ideas are often formalized as an unconstrained minimization problem:
$x = \arg\min_{x \in \mathbb{R}^n} f(x)$.
$f(x)$ is a scalar cost function that we wish to minimize; $x$ holds the properties that we wish to find/learn. The minimization may be subject to constraints.
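For illustration only, a gradient-descent sketch on an assumed two-dimensional quadratic $f$; the step size and iteration count are arbitrary choices:

    # Gradient descent for min f(x), on an assumed SPD quadratic toy problem.
    import numpy as np

    A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
    b = np.array([1.0, -1.0])

    def f(x):      return 0.5 * x @ A @ x - b @ x
    def grad_f(x): return A @ x - b

    x = np.zeros(2)
    for k in range(200):
        x = x - 0.1 * grad_f(x)              # x_{k+1} = x_k - alpha * grad f(x_k)
    print(x, np.linalg.solve(A, b))          # both approximate the minimizer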
Hand-writing characterization
Given handwritten images and labels, learn how to classify letters.
Imaging applications
Also: compression, compressed sensing, computed tomography.
Statistical covariance learning
We learn the covariance $\Sigma$ by maximizing the likelihood (probability) of the samples $\{x_i\}$.
Statistical covariance learning applications
Can also be used in other imaging and data science applications.
Optimization problems
Back to our problem, the unconstrained minimization problem:
$x = \arg\min_{x \in \mathbb{R}^n} f(x)$.
$f(x)$ is a scalar cost function that we wish to minimize. $x$ may have hundreds of millions of unknowns. We wish to have efficient, memory-friendly, and parallel methods (software) for solving such problems.
Partial differential equations (PDEs)
A differential equation is an equation where a function is the unknown:
$f''(x) + f(x) = 0 \;\Rightarrow\; f(x) = A\sin(x) + B\cos(x)$,
$f'(x) - f(x) = 0 \;\Rightarrow\; f(x) = A\exp(x)$.
Equations are usually multidimensional, e.g., $\Delta f = q$. Equations are discretized on a mesh and solved numerically. This often leads to a sparse linear system, or to a discrete time integration.
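A small worked sketch of this discretization step, assuming the 1D model problem $-u''(x) = q(x)$ on $[0,1]$ with zero boundary values and a second-order finite-difference stencil (the right-hand side is chosen so the exact solution is known):

    # Discretize -u''(x) = q(x) on a mesh: a sparse tridiagonal linear system.
    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 99                                  # interior mesh points
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1 - h, n)

    main = 2.0 * np.ones(n)
    off = -1.0 * np.ones(n - 1)
    A = sp.diags([off, main, off], [-1, 0, 1], format="csr") / h**2

    q = np.pi**2 * np.sin(np.pi * x)        # exact solution is then sin(pi*x)
    u = spla.spsolve(A, q)
    print(np.max(np.abs(u - np.sin(np.pi * x))))   # small discretization error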
Partial differential equations (PDEs)
The heterogeneous (anisotropic) diffusion equation ($u(x)$ is the unknown): $\nabla \cdot (\sigma(x) \nabla u(x)) = q(x)$.
The wave equation in the frequency domain: $\Delta u + \omega^2 M u = q$.
We will focus on multigrid methods.
Parameter estimation of physical phenomena
Combines optimization and PDEs.
Full waveform inversion: given measurements of earth vibrations (from explosives), we estimate the structure of the earth's subsurface.
Similar problems: ultrasound imaging, electromagnetic imaging.
About the course
Projects in groups of 2-3 students. Each group selects a project out of a list; groups may suggest their own project.
Final product: code + report (method description and demonstration).
Programming languages: MATLAB, Julia, Python; C/C++ where necessary.
The project is presented before the end of the first exam period (moed A).
Grade: quality of the work, understanding the material, independence.
Project 1: Linear algebra
Parallel LU solver for multiple right-hand sides. The problem: $Ax_i = b_i$, $i = 1, \dots, k$.
We use an LU factorization of $A$: $A = LU$, where $L$ is lower triangular and $U$ is upper triangular.
The goal: use a multicore machine to solve all the equations in parallel, given $L$ and $U$.
Requires programming in C using multithreading, and requires supporting 16-bit operations. Suitable only for a group of two.
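The project itself asks for C with multithreading; the following Python sketch only illustrates the idea of factoring once and then solving many right-hand sides concurrently (matrix, sizes, and thread pool are assumed placeholders):

    # Factor A = LU once, then solve Ax_i = b_i for many right-hand sides.
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor
    from scipy.linalg import lu_factor, lu_solve

    n, k = 500, 16
    A = np.random.rand(n, n) + n * np.eye(n)    # assumed well-conditioned test matrix
    B = [np.random.rand(n) for _ in range(k)]   # k right-hand sides

    lu, piv = lu_factor(A)                      # the O(n^3) factorization, done once
    with ThreadPoolExecutor() as pool:
        X = list(pool.map(lambda b: lu_solve((lu, piv), b), B))

    print(max(np.linalg.norm(A @ x - b) for x, b in zip(X, B)))   # residuals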
Project 2: Linear algebra
Chebyshev linear iteration. The problem: $Ax = b$.
We assume that we have some iterative method: $x^{(k+1)} = x^{(k)} + M^{-1}(b - Ax^{(k)})$.
The Chebyshev iteration accelerates the method without inner products, which is advantageous in parallel, low-precision computations.
Requires understanding the Chebyshev iteration and implementing it. Suitable only for a group of two.
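As a reference point, a sketch of the underlying stationary iteration with $M$ taken as the diagonal of $A$ (Jacobi); the Chebyshev acceleration on top of it is the project itself, and the test matrix is an assumed toy example:

    # Stationary iteration x_{k+1} = x_k + M^{-1}(b - A x_k) with M = diag(A).
    import numpy as np

    n = 100
    A = np.diag(4.0 * np.ones(n)) \
        + np.diag(-1.0 * np.ones(n - 1), 1) \
        + np.diag(-1.0 * np.ones(n - 1), -1)     # diagonally dominant toy matrix
    b = np.random.rand(n)

    Minv = 1.0 / np.diag(A)                      # inverse of the diagonal of A
    x = np.zeros(n)
    for k in range(200):
        x = x + Minv * (b - A @ x)
    print(np.linalg.norm(b - A @ x))             # residual shrinks as k grows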
Projects 3/4: Multigrid for general linear systems or PDEs
Algebraic multigrid methods for Google's PageRank: $Bx = x$, $\|x\|_1 = 1$. Suitable for two or three.
Geometric multigrid for PDEs: $\nabla \cdot (\sigma(x) \nabla u(x)) = q(x)$. Suitable for two.
Project 5: Sparse optimization
Optimization problems in many fields are sometimes formulated to have a sparse solution; this is a way to reduce the number of parameters, and it involves a non-smooth optimization problem.
Problems: LASSO, logistic regression, soft-max regression, covariance estimation. Applications: machine learning, imaging. Suitable for two or three.
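One possible illustration, assuming the LASSO instance and using ISTA (proximal gradient with a soft-thresholding step); the data and the regularization parameter lam are made up for the sketch:

    # ISTA sketch for min_x 0.5*||Ax - b||_2^2 + lam*||x||_1 (assumed toy data).
    import numpy as np

    np.random.seed(0)
    m, n = 50, 200
    A = np.random.randn(m, n)
    x_true = np.zeros(n); x_true[:5] = 1.0       # sparse ground truth
    b = A @ x_true
    lam = 0.1

    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the gradient
    x = np.zeros(n)
    for k in range(500):
        g = A.T @ (A @ x - b)                    # gradient of the smooth part
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-thresholding
    print(np.count_nonzero(np.abs(x) > 1e-6), "nonzeros out of", n)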
Project 6: Stochastic optimization
A leading optimization method in machine learning; works best when there is a lot of data.
Problems: least squares, LASSO, logistic regression, soft-max regression, covariance estimation, matrix completion. Applications: machine learning, deep learning, imaging. Suitable for two or three.
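A minimal sketch of stochastic gradient descent on an assumed least-squares toy problem, where each step uses a single randomly chosen data row:

    # SGD for min_x 0.5*||Ax - b||^2, one random row per step (assumed toy data).
    import numpy as np

    np.random.seed(0)
    m, n = 1000, 20
    A = np.random.randn(m, n)
    x_true = np.random.randn(n)
    b = A @ x_true

    x = np.zeros(n)
    step = 0.01
    for k in range(20000):
        i = np.random.randint(m)                    # pick one data row at random
        x = x - step * (A[i] @ x - b[i]) * A[i]     # stochastic gradient step
    print(np.linalg.norm(x - x_true))               # close to zero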
Project 7: Fast convolutions
Convolutions are a crucial and expensive part of deep learning; in most cases, training networks requires a lot of time.
The goal: an efficient implementation of a deep learning convolution.
Requires programming in C using multithreading, and understanding how to connect to Intel's CNN code. Suitable for two or three.
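For orientation only, a naive single-channel 2D convolution of the kind used in CNN layers (no padding, stride 1, assumed toy input); an efficient implementation would need loop blocking, vectorization, and threads:

    # Naive 2D convolution sketch (single channel, no padding, stride 1).
    import numpy as np

    def conv2d(image, kernel):
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
        return out

    image = np.random.rand(32, 32)        # assumed toy input
    kernel = np.random.rand(3, 3)         # assumed 3x3 filter
    print(conv2d(image, kernel).shape)    # (30, 30)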
Project 8: Learning Gaussian Mixture Models
Gaussian distributions are a fundamental statistical tool; Gaussian mixtures are even more powerful: $x \sim \sum_i \alpha_i \, \mathcal{N}(\mu_i, \Sigma_i)$.
The goal: understanding how to learn the parameters of Gaussian Mixture Models, and trying algorithms. Suitable for two or three.
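A small sketch of what the model means: drawing samples from an assumed two-component 1D mixture with made-up weights, means, and variances (learning the parameters, e.g. via EM, is the project itself):

    # Sample x ~ sum_i alpha_i N(mu_i, sigma_i^2) from a 1D two-component mixture.
    import numpy as np

    np.random.seed(0)
    alpha = np.array([0.3, 0.7])          # mixture weights, sum to 1
    mu = np.array([-2.0, 3.0])
    sigma = np.array([0.5, 1.0])

    N = 10000
    comp = np.random.choice(2, size=N, p=alpha)     # pick a component per sample
    x = np.random.randn(N) * sigma[comp] + mu[comp]
    print(x.mean(), alpha @ mu)                     # empirical vs. exact mean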
Project 9: Deep learning for decision making
A competition issued by psychologists at the Technion: psychological models vs. deep learning.
The goal: build a deep learning model for predicting answers to questions. Suitable for three.
Project 10: Stochastic optimization for Full Waveform Inversion
For FWI, both standard and trace-estimation-based methods are applicable.
The goal: compare the two methods. Suitable for three.
Length metrics - vector norms
What is the length of a vector $v \in \mathbb{R}^n$? We usually use one of the following norms:
$\|v\|_2 = \left(\sum_{i=1}^n v_i^2\right)^{1/2}, \quad \|v\|_1 = \sum_{i=1}^n |v_i|, \quad \|v\|_\infty = \max_i |v_i|.$
A norm is always non-negative, and satisfies $\|\lambda v\| = |\lambda| \, \|v\|$ for $\lambda \in \mathbb{R}$.
We will use the scalar (dot) product $\langle u, v \rangle = \sum_i u_i v_i = u^\top v$. Also $\|u\|_2 = \sqrt{\langle u, u \rangle}$.
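These definitions can be checked directly with numpy; the vectors below are arbitrary examples:

    # The three norms and the dot product above, verified numerically.
    import numpy as np

    v = np.array([3.0, -4.0, 1.0])
    u = np.array([1.0, 2.0, -2.0])

    print(np.linalg.norm(v, 2), np.sqrt(np.sum(v**2)))    # ||v||_2
    print(np.linalg.norm(v, 1), np.sum(np.abs(v)))        # ||v||_1
    print(np.linalg.norm(v, np.inf), np.max(np.abs(v)))   # ||v||_inf
    print(u @ v)                                          # <u, v>
    print(np.sqrt(u @ u), np.linalg.norm(u, 2))           # ||u||_2 = sqrt(<u, u>)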
Quadratic optimization
Quadratic optimization:
$x = \arg\min_{x \in \mathbb{R}^n} \tfrac{1}{2}\langle x, Ax \rangle - \langle x, b \rangle,$
where $A \in \mathbb{R}^{n \times n}$ is symmetric and $b \in \mathbb{R}^n$. The answer for this problem is obtained by solving a linear system $Ax = b$.
Definition (Positive definite matrices): a matrix $A \in \mathbb{R}^{n \times n}$ is called positive definite (PD) if $\langle x, Ax \rangle > 0$ for all $0 \neq x \in \mathbb{R}^n$ ($A \succ 0$).
If the matrix $A$ is symmetric positive definite, then the minimization above is well-defined and has a unique solution.
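A quick numerical sanity check of this statement on an assumed 2x2 symmetric positive definite example:

    # For SPD A, the minimizer of 0.5*<x,Ax> - <x,b> solves Ax = b.
    import numpy as np

    A = np.array([[4.0, 1.0], [1.0, 3.0]])     # symmetric positive definite
    b = np.array([1.0, 2.0])
    f = lambda x: 0.5 * x @ A @ x - x @ b

    x_star = np.linalg.solve(A, b)
    for _ in range(5):                          # f is never smaller at nearby points
        x = x_star + 0.5 * np.random.randn(2)
        print(f(x) >= f(x_star))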
Least Squares Minimization
Example (best linear approximation from measurements). Suppose that you are given $\{(x_i, y_i)\}_{i=1}^n$ and wish to find scalars $a, b$ such that $a x_i + b \approx y_i$ for all $i$, in an optimal way. In matrix form:
$\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} \approx \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$
Least Squares Minimization
Example (best linear approximation from measurements). A common choice for the optimality measure is the $\ell_2$ norm. We get the problem
$\min_{a,b} \sum_i (a x_i + b - y_i)^2, \quad \text{or} \quad \min_{a,b} \left\| \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \right\|_2^2.$
This problem is also called linear regression, and it boils down to least squares minimization. This is just a quadratic minimization, which can be solved by solving a linear system.
Least Squares Minimization
Example (best linear approximation from measurements). In the language of some of our applications, we have found parameters $a, b$ that characterize the data $\{(x_i, y_i)\}_{i=1}^n$.
If we did not have enough data (say, only one point), then we would have needed more information/assumptions.
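To tie the example together, a short sketch that fits $a, b$ to assumed noisy data with numpy's least-squares solver (the underlying values $a = 2$, $b = 1$ are made up for the demonstration):

    # Linear regression sketch: fit a, b such that a*x_i + b approximates y_i.
    import numpy as np

    np.random.seed(0)
    x = np.linspace(0.0, 1.0, 50)
    y = 2.0 * x + 1.0 + 0.05 * np.random.randn(50)   # assumed data: a=2, b=1 plus noise

    X = np.column_stack([x, np.ones_like(x)])        # the n-by-2 matrix [x_i, 1]
    (a, b_coef), *_ = np.linalg.lstsq(X, y, rcond=None)
    print(a, b_coef)                                 # close to 2 and 1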