Stat. 758: Computation and Programming

Stat. 758: Computation and Programming Eric B. Laber Department of Statistics, North Carolina State University Lecture 4a Sept. 10, 2015

"Ambition is like a frog sitting on a Venus flytrap. The flytrap can bite and bite, but it won't bother the frog because it only has little tiny plant teeth. But some other stuff could happen and it could be like ambition." Chiu Chang Suan Shu
"The word matrix means womb in Latin. The title of that stupid Keanu Reeves movie finally makes sense." Terry L. Laber

Housekeeping
Beach trip happened! (A thing you should do!)
I will give a week's notice for each quiz
HW 1 is due September 22, but HW 2 will be up before then
Python is on the lab machines in SAS Hall
Work together! But turn in your own HW and write your own code!

Warm-up
Explain to your stat buddy:
How might linear systems arise in statistics?
What is big-O notation?
What is Gaussian elimination? (As a child, I always thought this must be some form of assassination)
True or false:
The mathematician who coined the term matrix for arrangements of numbers as rectangular arrays fled the U.S. after killing a student with a newspaper stick
Research on solving linear least squares problems ceased with the invention of modern computing software
An Irish penny has a harp on one side and a chicken on the other (this is why they ask "harps or chickens" before kickoff in Ireland)

Big-O
From your mathematics days, for f, g : D → R,
f(x) = O{g(x)} as x → x_0 means limsup_{x → x_0} |f(x)| / |g(x)| ≤ L for some fixed constant L.
True or false, f(x) = O{g(x)}:
f(x) = 4x^2 − 10x, g(x) = 10x^2 + x + 2, as x → ∞
f(x) = x log(x^2), g(x) = x log(x), as x → 0
f(n) = log(n!), g(n) = n log(n), n ∈ Z^+, as n → ∞
f(x) = x^3 + x^2 log(x), g(x) = 2x^2 log(x) + x, as x → ∞

Little-o
Assume f, g : D → R.
f(x) = o{g(x)} as x → x_0 means lim_{x → x_0} |f(x)| / |g(x)| = 0; alternatively, for any constant κ > 0 there exists ε_κ > 0 such that |f(x)| ≤ κ |g(x)| whenever |x − x_0| ≤ ε_κ.
True or false, f(x) = o{g(x)}:
f(x) = x^2 + x, g(x) = x + 1, as x → 0
f(x) = x log(x), g(x) = x^2, as x → ∞
f(x) = x log(x), g(x) = x + x log(x), as x → ∞

Oh-pee!
We often require probabilistic notions of big and little O.
O_P through examples:
We say a r.v. X = O_P(1) if lim_{L → ∞} P(|X| ≤ L) = 1.
Given a sequence of r.v.'s {(X_n, Y_n)}_{n ≥ 1}, we say X_n = O_P(Y_n) if for any ε > 0 there exists L_ε s.t. for all sufficiently large n, P(|X_n| ≤ L_ε |Y_n|) ≥ 1 − ε.
Ex. Suppose X_1, ..., X_n are i.i.d. with finite mean µ and variance-covariance Σ; show X̄_n − µ = O_P(n^{−1/2}).
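(A sketch of that example, not from the slides, via Chebyshev's inequality: E‖√n(X̄_n − µ)‖^2 = tr(Σ), so P(√n ‖X̄_n − µ‖ > L) ≤ tr(Σ)/L^2, which is below any ε once L ≥ √(tr(Σ)/ε); hence ‖X̄_n − µ‖ = O_P(n^{−1/2}).)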

Oh-pee! cont'd
More general defn: let X(t), Y(t) be stochastic processes indexed by t ∈ T. Say X(t) = O_P{Y(t)} as t → t_0 if for any ε > 0 there exist L_ε > 0 and δ_ε > 0 so that P{|X(t)| ≤ L_ε |Y(t)|} ≥ 1 − ε whenever |t − t_0| ≤ δ_ε.

Oh-pee! cont'd
o_P through examples:
We say X_n = o_P(a_n) to mean P(|X_n|/a_n > ε) → 0 as n → ∞ for any ε > 0 (think delta-method).
We say X_n = o_P(Y_n) if for any κ > 0 and ε > 0, P(|X_n| ≤ κ |Y_n|) ≥ 1 − ε for all sufficiently large n.
Ex. Prove that if X_n = O_P(1) and Y_n = o_P(1) then X_n Y_n = o_P(1).

Why do we care?
Big-O notation is used to characterize deterministic algorithm complexity.
O_P notation is used heavily in asymptotic analyses:
Stochastic approximation algorithms
Bounding Monte Carlo error
Dealing with remainder terms in asymptotic expansions

Flops
We describe algorithm cost by the number of floating point operations (flops): additions, subtractions, multiplications, divisions.
Built-in functions, e.g., exp(), are harder to evaluate.
How many flops to compute Av, where A ∈ R^{n×p} and v ∈ R^p?
How many flops to compute AᵀA?
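(An answer sketch, not from the slides: each entry of Av is an inner product of length p, i.e., p multiplications and p − 1 additions, so Av costs n(2p − 1) ≈ 2np flops; forming AᵀA entry by entry costs about 2np^2 flops, or roughly np^2 if symmetry is exploited.)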

Linear systems: warm-up
Let A ∈ R^{p×p} and b ∈ R^p; we want the solution to Ax = b.
If A were upper triangular, how would you solve for x?
Go over linsys.ipynb
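One way to answer this is back-substitution: solve the last equation first and work upward. A minimal Python sketch (the function name back_substitute and the random test case are illustrative; linsys.ipynb itself is not reproduced here):

```python
import numpy as np

def back_substitute(U, b):
    """Solve Ux = b for upper-triangular U by back-substitution."""
    p = U.shape[0]
    x = np.zeros(p)
    for i in range(p - 1, -1, -1):
        # subtract the already-solved components, then divide by the pivot
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# quick check against numpy's general solver
U = np.triu(np.random.rand(5, 5)) + 5 * np.eye(5)
b = np.random.rand(5)
print(np.allclose(back_substitute(U, b), np.linalg.solve(U, b)))
```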

Linear systems: Gaussian elimination
Triangular systems are rare in practice.
Idea! Transform a general linear system into a triangular one: Ax = b is equivalent to BAx = Bb whenever B is invertible, so it suffices to find an invertible B such that BA is triangular.

Linear systems: Gaussian elimination cont'd
Primary school example, reduce to a triangular system:
[ 1.0  2.0  1.0  0.0 ] [x_1]   [0.5]
[ 0.5  1.0  0.0  1.0 ] [x_2] = [1.0]
[ 0.0  2.0  0.5  1.5 ] [x_3]   [1.5]
[ 1.0  1.0  1.5  0.0 ] [x_4]   [2.0]

Linear systems: Gaussian elimination cont'd
Algorithm for Gaussian elimination to a triangular system:
Set A^(0) = A. Define B^(1) by (B^(1))_{i,1} = −A^(0)_{i,1} / A^(0)_{1,1} for i > 1 and (B^(1))_{i,j} = 1_{i=j} for j ≠ 1.
Recursively, for k = 1, ..., p − 1: A^(k) = B^(k) A^(k−1), and define B^(k+1) by (B^(k+1))_{i,k+1} = −A^(k)_{i,k+1} / A^(k)_{k+1,k+1} for i > k + 1 and (B^(k+1))_{i,j} = 1_{i=j} for j ≠ k + 1.
We assume that A^(k)_{k+1,k+1} ≠ 0 for all k, which need not hold in general; you will fix this in HW2!
Back to linsys.ipynb
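A direct Python translation of the elimination step might look like the following sketch (the helper name forward_eliminate and the 2x2 test system are illustrative assumptions, and it omits the pivoting fix from HW2):

```python
import numpy as np

def forward_eliminate(A, b):
    """Reduce Ax = b to an equivalent upper-triangular system Ux = c.

    Assumes every pivot U[k, k] is nonzero (no pivoting, as on the slide).
    """
    U = A.astype(float).copy()
    c = b.astype(float).copy()
    p = U.shape[0]
    for k in range(p - 1):
        for i in range(k + 1, p):
            m = U[i, k] / U[k, k]      # multiplier; the -m's populate B^(k+1)
            U[i, k:] -= m * U[k, k:]
            c[i] -= m * c[k]
    return U, c

A = np.array([[4., 3.], [6., 3.]])
b = np.array([10., 12.])
U, c = forward_eliminate(A, b)
print(np.allclose(np.linalg.solve(A, b), np.linalg.solve(U, c)))
```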

Iterative methods for large systems
Gaussian elimination requires O(p^3) operations: manageable for small/moderate-sized problems.
When p is large, iterative methods may be preferable, especially if the matrix is sparse.
Canonical example: Gauss-Seidel iteration for Ax = b.
Suppose we knew {x_j : j ≠ i}; then we could solve for x_i via
x_i = (b_i − Σ_{k ≠ i} A_{i,k} x_k) / A_{i,i}
Idea! Start with an initial guess x^(0), then repeatedly update each component of the guess using the above formula.

Gauss-Seidel pseudo code
Input: x^(0), A, b, tolerance ε
Set m = 0
Repeat forever:
  x^(m+1) = x^(m)
  For i = 1, ..., p:
    x_i^(m+1) = (b_i − Σ_{k ≠ i} A_{i,k} x_k^(m+1)) / A_{i,i}
  If ||x^(m+1) − x^(m)|| ≤ ε, break
  m = m + 1
With your stat buddy: convert this to python code! (One possible solution is sketched below.)
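A hedged sketch of one possible solution (the function name gauss_seidel, the tolerance default, and the diagonally dominant test matrix are illustrative choices):

```python
import numpy as np

def gauss_seidel(A, b, x0, tol=1e-8, max_iter=10_000):
    """Gauss-Seidel iteration for Ax = b, updating components in place."""
    x = x0.astype(float).copy()
    p = len(b)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(p):
            # uses the newest values of x[k] for k < i, old values for k > i
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) <= tol:
            break
    return x

# diagonally dominant example, so the iteration converges
A = np.array([[4., 1., 0.], [1., 5., 2.], [0., 2., 6.]])
b = np.array([1., 2., 3.])
print(np.allclose(gauss_seidel(A, b, np.zeros(3)), np.linalg.solve(A, b)))
```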

Sparse matrices
Many applications in statistics involve large sparse matrices:
Functional data analysis
Markov decision processes
Matrix completion problems
Graphical models
...
Computational savings are obtained by exploiting sparsity:
Save memory: store only the non-zero elements
Save flops: do matrix operations only with the non-zero elements

Dictionary of keys
Suppose our linear system Ax = b has
A = [ 0 1 0 2 ]
    [ 3 0 0 0 ]
    [ 1 0 0 4 ]
    [ 0 0 8 9 ]
We can store this as the set of triples
{ (1, 2, 1), (1, 4, 2), (2, 1, 3), (3, 1, 1), (3, 4, 4), (4, 3, 8), (4, 4, 9) }
It is more convenient to store it as an associative array in which each pair of indices is associated with its respective matrix value, i.e., (1, 2) → 1, (1, 4) → 2, ..., (4, 4) → 9.

Dictionary of keys cont'd
The dictionary of keys (DOK) storage format is a set of key-value pairs:
Key: the indices of a non-zero matrix element
Value: the non-zero matrix element
Store matrix A as {(i, j) → A_{i,j} : A_{i,j} ≠ 0}
First part of sparsemats.ipynb
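A minimal DOK sketch in Python (a plain dict with 0-based indices; the helper name dok_matvec and the entries mirror the 4x4 matrix above; scipy.sparse.dok_matrix provides a production version of the same idea):

```python
# dictionary-of-keys storage: (row, col) -> value, zeros omitted
A_dok = {(0, 1): 1.0, (0, 3): 2.0, (1, 0): 3.0,
         (2, 0): 1.0, (2, 3): 4.0, (3, 2): 8.0, (3, 3): 9.0}

def dok_matvec(A, x, nrow):
    """Multiply a DOK-stored matrix by a dense vector."""
    y = [0.0] * nrow
    for (i, j), a_ij in A.items():
        y[i] += a_ij * x[j]
    return y

print(dok_matvec(A_dok, [1.0, 1.0, 1.0, 1.0], 4))  # row sums of A
```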

Compressed row storage
DOK is intuitive and useful for constructing sparse matrices, but it is slower than alternatives for numerical operations; e.g., matrix-vector multiplication can be slower than in the dense case.
Compressed row storage (CRS) is faster for numerical operations.
Pattern: construct with DOK, then convert to CRS.
Suppose that
A = [ 0 1 0 2 ]
    [ 3 0 0 0 ]
    [ 1 0 0 4 ]
    [ 0 0 8 9 ]
CRS stores this as three arrays:
Value: 1 2 3 1 4 8 9
Col.:  2 4 1 1 4 3 4
Row:   1 3 4 6 8

Compressed row storage cont'd
With your stat buddy:
Convert the following CRS representation to dense format:
Value: 1 2 2 3 5 7 6 8 9
Col.:  1 1 2 1 3 4 2 3 4
Row:   1 2 4 7 10
Convert the following matrix to CRS format:
A = [ 1 1 0 2 ]
    [ 0 0 4 0 ]
    [ 1 0 0 0 ]
    [ 0 0 8 9 ]
Back to sparsemats.ipynb
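A hand-rolled CRS sketch with a matrix-vector product (0-based indices here, versus the 1-based arrays on the slides; the name crs_matvec is illustrative; scipy.sparse.csr_matrix is what you would use in practice):

```python
import numpy as np

# CRS arrays for the 4x4 example matrix from the previous slide (0-based)
val = np.array([1., 2., 3., 1., 4., 8., 9.])   # nonzero values, row by row
col = np.array([1, 3, 0, 0, 3, 2, 3])          # column index of each value
row_ptr = np.array([0, 2, 3, 5, 7])            # row i occupies val[row_ptr[i]:row_ptr[i+1]]

def crs_matvec(val, col, row_ptr, x):
    """Multiply a CRS-stored matrix by a dense vector."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        lo, hi = row_ptr[i], row_ptr[i + 1]
        y[i] = val[lo:hi] @ x[col[lo:hi]]
    return y

print(crs_matvec(val, col, row_ptr, np.ones(4)))  # row sums: [3, 3, 5, 17]
```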

Cholesky decomposition
If A is symmetric positive definite then A = LLᵀ, where L is lower triangular.
Solve Ax = b by solving two triangular systems:
Ly = b, then Lᵀx = y
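A sketch of the two-triangular-solve recipe with NumPy/SciPy (the wrapper name chol_solve is an illustration; scipy.linalg.cho_factor and cho_solve bundle the same steps):

```python
import numpy as np
from scipy.linalg import solve_triangular

def chol_solve(A, b):
    """Solve Ax = b for symmetric positive definite A via A = L L'."""
    L = np.linalg.cholesky(A)                      # lower-triangular factor
    y = solve_triangular(L, b, lower=True)         # L y = b
    return solve_triangular(L.T, y, lower=False)   # L' x = y

A = np.array([[4., 1.], [1., 3.]])
b = np.array([1., 2.])
print(np.allclose(chol_solve(A, b), np.linalg.solve(A, b)))
```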

Cholesky decomposition cont'd
The algorithm to compute A = LLᵀ is similar to Gaussian elimination.
GE and Cholesky are both O(p^3), but Cholesky has a better constant.
Generally Cholesky is more stable.
Generate Z ~ Normal_p(µ, Σ) via:
1. Compute Σ = LLᵀ
2. Generate W ~ Normal_p(0, I_p)
3. Set Z = LW + µ
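A sketch of that sampling recipe (the function name rmvnorm and the example µ and Σ are illustrative):

```python
import numpy as np

def rmvnorm(n, mu, Sigma, rng=None):
    """Draw n samples from Normal_p(mu, Sigma) via the Cholesky factor of Sigma."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(Sigma)           # Sigma = L L'
    W = rng.standard_normal((n, len(mu)))   # each row is Normal_p(0, I_p)
    return W @ L.T + mu                     # each row is L w + mu

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
Z = rmvnorm(100_000, mu, Sigma)
print(np.cov(Z, rowvar=False))              # should be close to Sigma
```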

Break: Warm-up quiz II
Explain to your stat buddy:
What is a random walk?
What is a Brownian bridge?
What is importance sampling?
True or false:
Brownian motion was invented by Cavell Brownie
The term Monte Carlo was a code-name for stochastic computer experiments related to nuclear research during WWII
Hotter than Satan's Toenails is the name of a nail salon in Chattanooga, TN

Ex. Brownian motion
Brownian motion shows up frequently in asymptotic statistics.
Recall {X(t) : t ≥ 0} is a Brownian motion process if:
(P1) X(0) = 0 (w.p. 1)
(P2) {X(t) : t ≥ 0} has independent increments (what does this mean?)
(P3) X(t) ~ Normal(0, c^2 t) for all t ≥ 0
(We will assume c = 1 hereafter.)

Ex. Brownian motion cont'd
Goal: simulate Brownian motion.
Problem: a computer cannot simulate a continuum of values.
Idea: discretize the interval [0, T], 0 = t_0 < t_1 < ... < t_n = T, and simulate {X(t_1), ..., X(t_n)}.
Fact: {X(t_1), ..., X(t_n)} is normally distributed with mean 0 and variance-covariance Σ_{i,j} = min(t_i, t_j) (see HW2).
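A simulation sketch, assuming an equally spaced grid and using independent Normal(0, t_i − t_{i−1}) increments, which gives the same joint distribution as the min(t_i, t_j) covariance (the function name simulate_bm is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def simulate_bm(T=1.0, n=1000, rng=None):
    """Simulate standard Brownian motion on [0, T] at n equally spaced steps."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(0.0, T, n + 1)
    dt = T / n
    increments = rng.normal(0.0, np.sqrt(dt), size=n)    # X(t_i) - X(t_{i-1})
    X = np.concatenate(([0.0], np.cumsum(increments)))   # X(0) = 0
    return t, X

t, X = simulate_bm()
plt.plot(t, X)
plt.xlabel("t"); plt.ylabel("X(t)")
plt.show()
```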

Generating random functions
In some applications it is necessary to generate random smooth functions over some domain (e.g., time, space, etc.):
Growth curves
Depression scores
Humidity
...
Basic idea:
Choose a space for the domain of the random function
Choose a basis for this space
Generate random linear combinations of the basis functions

Review: basis functions
Recall: a basis for a space of functions F is a collection {b_j}_{j ≥ 1} in F so that for any f ∈ F there exist {λ_j}_{j ≥ 1} satisfying f = Σ_{j ≥ 1} λ_j b_j.
Stone-Weierstrass theorem: every continuous function on [0, 1] can be uniformly approximated by a polynomial function. Thus, a basis for the space C[0, 1] is {x^{j−1}}_{j ≥ 1}.

Generating a random function in F
Goal: generate a random element of F.
Random linear combination of basis functions:
Let {b_j}_{j ≥ 1} be a basis for F
Choose a finite truncation J
Generate random loadings λ_1, ..., λ_J, e.g., i.i.d. normal
Define f = Σ_{j=1}^J λ_j b_j, i.e., f(x) = Σ_{j=1}^J λ_j b_j(x)

Ex. Fourier basis
A Fourier basis has the form
b_j(x) = cos(jπx/2) if j is even, sin((j+1)πx/2) if j is odd
It is dense in L^2[0, 1] (the square integrable functions on [0, 1]).
Go over fourierrando.py
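fourierrando.py is not reproduced here; a minimal sketch of the idea using the basis above (the names fourier_basis and random_function and the truncation J = 10 are illustrative choices):

```python
import numpy as np
import matplotlib.pyplot as plt

def fourier_basis(j, x):
    """b_j as on the slide: cos(j*pi*x/2) for even j, sin((j+1)*pi*x/2) for odd j."""
    return np.cos(j * np.pi * x / 2) if j % 2 == 0 else np.sin((j + 1) * np.pi * x / 2)

def random_function(J=10, rng=None):
    """Random f = sum_j lambda_j b_j with i.i.d. normal loadings, truncated at J."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.normal(size=J)
    return lambda x: sum(lam[j - 1] * fourier_basis(j, x) for j in range(1, J + 1))

x = np.linspace(0, 1, 200)
for _ in range(3):
    plt.plot(x, random_function()(x))   # three independent random curves
plt.xlabel("x")
plt.show()
```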

In class example Dependent spatial binary data (on board)