Mini-project in scientific computing


Eran Treister
Computer Science Department, Ben-Gurion University of the Negev, Israel.
March 7, 2018

Scientific computing
Involves the solution of large computational problems that arise in applications:
- Data Science: developing methods to extract knowledge from large data.
- Numerical algorithms for solving discrete and continuous problems.
- High performance computing: gaining efficiency and performance.
The fundamental mathematical tools:
- Applied Linear Algebra.
- Mathematical Optimization.
- Partial Differential Equations.
- (Statistics).

Applied Linear Algebra
The most basic tool.
- Simulations of natural phenomena.
- Matrices represent graphs (directed or undirected).
- Data measurements are often given in the form of a table, i.e., a matrix.
We usually want:
- To solve a linear system.
- To approximate a solution of a linear system (solve a least squares problem).
- To find eigenvalues and eigenvectors.

Applied Linear Algebra
In many cases the matrices are sparse (graphs, meshes, ...):
- Sparse: $O(n)$ nonzeros out of the $n^2$ entries.
- $n$ reaches tens of millions; standard methods require $O(n^3)$ for typical tasks.
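As a side note (not from the slides), this is roughly how a sparse matrix is stored and applied in Python with SciPy; the tridiagonal matrix here is just an assumed example.

```python
import numpy as np
import scipy.sparse as sp

# Assumed example: a sparse tridiagonal matrix with O(n) nonzeros.
n = 1_000_000
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")

x = np.random.rand(n)
y = A @ x  # matrix-vector product costs O(n), not O(n^2)

print(A.nnz, "nonzeros out of", n * n, "entries")
```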

Ranking of web-pages
The problem: finding the leading eigenvector of a huge sparse (stochastic) matrix.
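For illustration only (not part of the slides), a minimal power-iteration sketch for approximating the leading eigenvector; the small column-stochastic matrix P is an assumed toy example.

```python
import numpy as np

# Assumed toy example: a small column-stochastic matrix (each column sums to 1).
P = np.array([[0.0, 0.3, 0.5],
              [0.5, 0.2, 0.5],
              [0.5, 0.5, 0.0]])

x = np.ones(P.shape[0]) / P.shape[0]  # start from the uniform distribution
for _ in range(100):
    x = P @ x
    x /= x.sum()  # keep the iterate normalized

print("approximate leading eigenvector:", x)
```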

Mathematical Optimization
In many cases we wish to optimize situations:
- Improve bad images.
- Earn more money in stocks.
- Reduce costs and time of some process.
- Reduce bottlenecks in airports/roads or in internet networks.
- Reduce storage of data (compression).
We may also want to:
- Characterize data that we collected (classification).
- Learn certain properties from data in order to predict future data.
- Reconstruct shapes from measurements (ultrasound, 3D scanning, CT, MRI).

Optimization problems
All of these ideas are often formalized as an unconstrained minimization problem
$$x^* = \arg\min_{x \in \mathbb{R}^n} f(x).$$
- $f(x)$ is a scalar cost function that we wish to minimize.
- $x$ are the properties that we wish to find/learn.
- Minimization may be subject to constraints.

Hand-writing characterization
Given handwritten images and labels, learn how to classify letters.

Imaging applications
Also: compression, compressed sensing, computed tomography.

Statistical covariance learning
We learn $\Sigma$ by maximizing the probability (likelihood) of the samples $\{x_i\}$.

Statistical covariance learning applications
Can also be used in other imaging and data science applications.

Optimization problems
Back to our problem: the unconstrained minimization problem
$$x^* = \arg\min_{x \in \mathbb{R}^n} f(x).$$
- $f(x)$ is a scalar cost function that we wish to minimize.
- $x$ may have hundreds of millions of unknowns.
- We wish to have efficient, memory-friendly, and parallel methods (software) for solving such problems.
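As a minimal illustration (assumed cost function and step size, not one of the course's methods), plain gradient descent on a simple quadratic cost:

```python
import numpy as np

# Assumed example: minimize f(x) = ||x - c||_2^2 with plain gradient descent.
c = np.array([1.0, -2.0, 3.0])

def grad_f(x):
    return 2.0 * (x - c)  # gradient of ||x - c||^2

x = np.zeros(3)
step = 0.1  # assumed fixed step size
for _ in range(200):
    x = x - step * grad_f(x)

print("minimizer estimate:", x)  # should be close to c
```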

Partial differential equations (PDEs)
A differential equation is an equation where a function is the unknown:
$$f''(x) + f(x) = 0 \;\Rightarrow\; f(x) = A\sin(x) + B\cos(x),$$
$$f'(x) - f(x) = 0 \;\Rightarrow\; f(x) = A\exp(x).$$
- Equations are usually multidimensional, e.g. $\Delta f = q$.
- Equations are discretized on a mesh and solved numerically.
- This often leads to a sparse linear system, or a discrete time integration.
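A small sketch (assumed example, not from the slides) of how discretizing an equation on a mesh leads to a sparse linear system: a finite-difference Poisson problem $-u'' = q$ on the unit interval.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Assumed example: -u''(x) = q(x) on (0, 1), u(0) = u(1) = 0,
# discretized with second-order finite differences.
n = 200                      # interior mesh points
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)

main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], [-1, 0, 1], format="csr") / h**2  # sparse 1D Laplacian

q = np.pi**2 * np.sin(np.pi * x)     # right-hand side chosen so u(x) = sin(pi x)
u = spla.spsolve(A, q)               # solve the sparse linear system

print("max error vs. exact solution:", np.max(np.abs(u - np.sin(np.pi * x))))
```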

Partial differential equations (PDEs)
The heterogeneous (anisotropic) diffusion equation ($u(x)$ unknown):
$$\nabla \cdot (\sigma(x) \nabla u(x)) = q(x).$$
The wave equation in the frequency domain:
$$\Delta u + \omega^2 M u = q.$$
We will focus on multigrid methods.

Parameter estimation of physical phenomena
Combines optimization and PDEs.
- Full waveform inversion: given measurements of earth vibrations (from explosives), we estimate the structure of the earth's subsurface.
- Similar problems: ultrasound imaging, electromagnetic imaging.

About the course
- Projects in groups of 2-3 students.
- Each group selects a project out of a list. Groups may suggest their own project.
- Final product: code + report (method description and demonstration).
- Programming languages: Matlab, Julia, Python. C/C++ where necessary.
- The project is presented before the end of moed A exams.
- Grade: quality of the work, understanding the material, independence.

Project 1: Linear algebra
Parallel LU solver for multiple right-hand-sides.
The problem:
$$A x_i = b_i, \quad i = 1, \dots, k.$$
We use an LU factorization of $A$: $A = LU$, where $L$ is lower triangular and $U$ is upper triangular.
- The goal: use a multicore machine to solve all the equations in parallel given $L$, $U$.
- Requires programming in C using multithreading.
- Requires support for 16-bit operations.
- Suitable only for a group of two.
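The project targets C with multithreading; the following Python sketch (assumed toy sizes) only illustrates the idea of factoring once and then solving many right-hand sides, here via a thread pool.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from concurrent.futures import ThreadPoolExecutor

# Assumed toy sizes; the project itself calls for C with multithreading.
n, k = 500, 8
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
B = rng.standard_normal((n, k))                   # k right-hand sides

lu, piv = lu_factor(A)          # factor A = LU once

def solve_one(i):
    return lu_solve((lu, piv), B[:, i])   # two triangular solves per right-hand side

with ThreadPoolExecutor() as pool:
    xs = list(pool.map(solve_one, range(k)))

X = np.column_stack(xs)
print("residual norm:", np.linalg.norm(A @ X - B))
```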

Project 2: Linear algebra
Chebyshev linear iteration.
The problem: $Ax = b$.
We assume that we have some iterative method:
$$x^{(k+1)} = x^{(k)} + M^{-1}(b - Ax^{(k)}).$$
- Chebyshev iteration accelerates the method without inner products.
- Advantageous in parallel, low precision computations.
- Requires understanding the Chebyshev iteration and implementing it.
- Suitable only for a group of two.
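A minimal sketch (assumed test problem) of the basic iteration above with a Jacobi preconditioner $M = \mathrm{diag}(A)$; the Chebyshev acceleration itself is what the project would add on top.

```python
import numpy as np

# Assumed test problem: a symmetric, diagonally dominant tridiagonal matrix.
n = 100
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

Minv = 1.0 / np.diag(A)      # Jacobi preconditioner: M = diag(A)
x = np.zeros(n)
for k in range(200):
    r = b - A @ x            # residual
    x = x + Minv * r         # x^(k+1) = x^(k) + M^{-1} (b - A x^(k))

print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```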

Projects 3/4: Multigrid for general linear systems or PDEs
Algebraic multigrid methods for Google's PageRank:
$$Bx = x, \quad \|x\| = 1.$$
Suitable for two or three.
Geometric multigrid for PDEs:
$$\nabla \cdot (\sigma(x) \nabla u(x)) = q(x).$$
Suitable for two.

Project 5: Sparse optimization
Optimization problems in many fields are sometimes formulated to have a sparse solution. This is a way to reduce the number of parameters.
- Involves a non-smooth optimization problem.
- Problems: LASSO, logistic regression, soft-max regression, covariance estimation.
- Applications: machine learning, imaging.
- Suitable for two or three.
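As a hedged sketch (assumed toy data, not necessarily the project's required method), one standard approach to the LASSO problem $\min_x \tfrac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_1$ is iterative soft-thresholding (ISTA):

```python
import numpy as np

# Assumed toy LASSO problem: min_x 0.5*||Ax - b||^2 + lam*||x||_1.
rng = np.random.default_rng(0)
m, n = 80, 200
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:5] = rng.standard_normal(5)      # sparse ground truth
b = A @ x_true
lam = 0.1

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(500):
    grad = A.T @ (A @ x - b)
    x = soft_threshold(x - step * grad, step * lam)  # ISTA update

print("nonzeros in the solution:", np.count_nonzero(np.abs(x) > 1e-6))
```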

Project 6: Stochastic optimization
A leading optimization method in machine learning. Works best when there is a lot of data.
- Problems: least squares, LASSO, logistic regression, soft-max regression, covariance estimation, matrix completion.
- Applications: machine learning, deep learning, imaging.
- Suitable for two or three.
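A minimal sketch (assumed toy problem and step size) of stochastic gradient descent on a least squares objective, sampling one data row per step:

```python
import numpy as np

# Assumed toy problem: least squares min_x ||Ax - b||^2 solved with SGD,
# using one randomly sampled row (a_i, b_i) per step.
rng = np.random.default_rng(0)
m, n = 1000, 20
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true

x = np.zeros(n)
step = 0.01                      # assumed fixed step size
for t in range(20000):
    i = rng.integers(m)          # pick one sample
    a_i = A[i]
    grad_i = (a_i @ x - b[i]) * a_i   # stochastic gradient of 0.5*(a_i^T x - b_i)^2
    x = x - step * grad_i

print("error vs. x_true:", np.linalg.norm(x - x_true))
```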

Project 7: Fast convolutions
Convolutions are a crucial and expensive part of deep learning. In most cases, training networks requires a lot of time.
- The goal: efficient implementation of a deep learning convolution.
- Requires programming in C using multithreading.
- Requires understanding how to connect to Intel's CNN code.
- Suitable for two or three.
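For reference only, a naive (deliberately slow) 2D convolution sketch in Python; the project is about implementing this operation efficiently in C, not like this.

```python
import numpy as np

def conv2d_naive(image, kernel):
    """Naive 'valid' 2D convolution (really cross-correlation, as used in deep learning)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

# Assumed toy data: a random image and a 3x3 edge-like kernel.
rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
ker = np.array([[-1.0, 0.0, 1.0],
                [-1.0, 0.0, 1.0],
                [-1.0, 0.0, 1.0]])
print(conv2d_naive(img, ker).shape)   # (30, 30)
```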

Project 8: Learning Gaussian Mixture Models
Gaussian distributions are a fundamental statistical tool. Gaussian mixtures are even more powerful:
$$x \sim \sum_i \alpha_i \, \mathcal{N}(\mu_i, \Sigma_i).$$
- The goal: understanding how to learn the parameters of Gaussian mixture models, and trying algorithms.
- Suitable for two or three.
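A small sketch (assumed mixture parameters) of drawing data from such a mixture; recovering the parameters from the samples is the learning task.

```python
import numpy as np

# Assumed 2D toy mixture: weights alpha_i, means mu_i, covariances Sigma_i.
rng = np.random.default_rng(0)
alphas = np.array([0.5, 0.3, 0.2])
mus = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 3.0]])
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])

N = 1000
comps = rng.choice(len(alphas), size=N, p=alphas)      # pick a component per sample
samples = np.array([rng.multivariate_normal(mus[c], Sigmas[c]) for c in comps])

print(samples.shape)            # (1000, 2)
print(samples.mean(axis=0))     # close to sum_i alpha_i * mu_i
```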

Project 9: Deep learning for decision making
A competition issued by psychologists at the Technion.
- Psychological models vs. deep learning.
- The goal: build a deep learning model for predicting answers to questions.
- Suitable for three.

Project 10: Stochastic optimization for Full Waveform Inversion
For FWI, both standard and trace-estimation based methods are applicable.
- The goal: compare the two methods.
- Suitable for three.

Length metrics - vector norms
What is the length of a vector $v \in \mathbb{R}^n$? We usually use one of the following norms:
$$\|v\|_2 = \left(\sum_{i=1}^n v_i^2\right)^{1/2}, \quad \|v\|_1 = \sum_{i=1}^n |v_i|, \quad \|v\|_\infty = \max_i |v_i|.$$
A norm is always non-negative, and satisfies $\|\lambda v\| = |\lambda|\,\|v\|$ for $\lambda \in \mathbb{R}$.
We will use the scalar (dot) product $\langle u, v \rangle = \sum_i u_i v_i = u^\top v$. Also, $\|u\|_2 = \sqrt{\langle u, u \rangle}$.
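A quick check of these definitions in Python (assumed example vectors):

```python
import numpy as np

v = np.array([3.0, -4.0, 0.0])     # assumed example vector
u = np.array([1.0, 2.0, -1.0])

print(np.linalg.norm(v, 2))        # 2-norm: sqrt(9 + 16) = 5.0
print(np.linalg.norm(v, 1))        # 1-norm: |3| + |-4| + |0| = 7.0
print(np.linalg.norm(v, np.inf))   # infinity-norm: max |v_i| = 4.0
print(np.dot(u, v))                # scalar product <u, v> = 3 - 8 + 0 = -5.0
print(np.sqrt(np.dot(v, v)))       # equals the 2-norm of v
```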

Quadratic optimization
$$x^* = \arg\min_{x \in \mathbb{R}^n} \tfrac{1}{2}\langle x, Ax \rangle - \langle x, b \rangle,$$
where $A \in \mathbb{R}^{n \times n}$ is symmetric and $b \in \mathbb{R}^n$. The answer for this problem is obtained by solving the linear system $Ax = b$.
Definition (Positive definite matrices): A matrix $A \in \mathbb{R}^{n \times n}$ is called positive definite (PD) if $\langle x, Ax \rangle > 0$ for all $0 \neq x \in \mathbb{R}^n$ (written $A \succ 0$).
If the matrix $A$ is symmetric positive definite, then the minimization above is well-defined and has a unique solution.
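A short sketch (assumed random SPD matrix) verifying that the minimizer of the quadratic is obtained by solving $Ax = b$:

```python
import numpy as np

# Assumed example: build a random symmetric positive definite matrix.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # SPD by construction
b = rng.standard_normal(n)

x_star = np.linalg.solve(A, b)   # minimizer of 0.5*<x, Ax> - <x, b>

def f(x):
    return 0.5 * x @ A @ x - x @ b

# The quadratic value at x_star is lower than at a nearby perturbed point.
print(f(x_star), f(x_star + 0.1 * rng.standard_normal(n)))
```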

Least Squares Minimization
Example (best linear approximation from measurements): Suppose that you are given $\{(x_i, y_i)\}_{i=1}^n$ and wish to find scalars $a, b$ such that $a x_i + b \approx y_i$ for all $i$ in an optimal way. In matrix form,
$$\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} \approx \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$

Least Squares Minimization
Example (best linear approximation from measurements): A common choice for the optimality measure is the $\ell_2$ norm. We get the problem
$$\min_{a,b} \sum_i (a x_i + b - y_i)^2, \quad \text{or} \quad \min_{a,b} \left\| \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \right\|_2^2.$$
This problem is also called linear regression, and it boils down to least squares minimization. This is just a quadratic minimization, which can be solved by solving a linear system.
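A short sketch (assumed synthetic data) of solving this linear regression as a least squares problem:

```python
import numpy as np

# Assumed synthetic measurements from the line y = 2x + 1, plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(x.size)

# Build the matrix with rows [x_i, 1] and solve the least squares problem for (a, b).
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print("estimated a, b:", a, b)   # should be close to 2 and 1
```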

Least Squares Minimization
Example (best linear approximation from measurements): In the language of some of our applications, we found parameters $a, b$ that characterize the data $\{(x_i, y_i)\}_{i=1}^n$. If we did not have enough data (say, one point), then we would have needed more information/assumptions.