Lecture 5 Least-squares

EE263 Autumn 2008-09    Stephen Boyd

Lecture 5: Least-squares

- least-squares (approximate) solution of overdetermined equations
- projection and orthogonality principle
- least-squares estimation
- BLUE property

Overdetermined linear equations

consider y = Ax, where A ∈ R^{m×n} is (strictly) skinny, i.e., m > n

- called an overdetermined set of linear equations (more equations than unknowns)
- for most y, we cannot solve for x

one approach to approximately solve y = Ax:

- define the residual or error r = Ax − y
- find x = x_ls that minimizes ‖r‖

x_ls is called the least-squares (approximate) solution of y = Ax
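As a quick numerical illustration (not part of the original slides), the short NumPy sketch below builds a random skinny A and a generic y; the stacked matrix [A y] then has rank n + 1, so y does not lie in R(A) and y = Ax has no exact solution. The particular A and y are placeholders for the demo.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 10, 3                       # strictly skinny: more equations than unknowns
    A = rng.standard_normal((m, n))
    y = rng.standard_normal(m)

    print(np.linalg.matrix_rank(A))                        # n: A is full rank
    print(np.linalg.matrix_rank(np.column_stack([A, y])))  # n + 1: y is not in R(A)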

Geometric interpretation

Ax_ls is the point in R(A) closest to y (Ax_ls is the projection of y onto R(A))

[figure: y, the residual r, and Ax_ls in the plane R(A)]

Least-squares (approximate) solution

- assume A is full rank and skinny
- to find x_ls, we'll minimize the norm of the residual squared,

      ‖r‖^2 = x^T A^T A x − 2 y^T A x + y^T y

- set the gradient with respect to x to zero:

      ∇_x ‖r‖^2 = 2 A^T A x − 2 A^T y = 0

- this yields the normal equations: A^T A x = A^T y
- the assumptions imply A^T A is invertible, so we have

      x_ls = (A^T A)^{-1} A^T y

  ... a very famous formula
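A minimal NumPy sketch (added here, not from the slides) that solves the normal equations directly and checks the result against numpy.linalg.lstsq; the random A and y are placeholders.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((10, 3))   # full rank, skinny
    y = rng.standard_normal(10)

    x_ls = np.linalg.solve(A.T @ A, A.T @ y)         # solve the normal equations
    x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)    # library least-squares solver

    print(np.allclose(x_ls, x_ref))                  # True

In practice one avoids forming (A^T A)^{-1} explicitly; the QR-based route described a few slides later is the standard numerically sound way to compute x_ls.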

- x_ls is a linear function of y
- x_ls = A^{-1} y if A is square
- x_ls solves y = A x_ls if y ∈ R(A)
- A† = (A^T A)^{-1} A^T is called the pseudo-inverse of A
- A† is a left inverse of (full rank, skinny) A:

      A† A = (A^T A)^{-1} A^T A = I
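A small check (not from the slides): for full-rank skinny A, NumPy's pinv coincides with the left inverse (A^T A)^{-1} A^T; the random A is a placeholder.

    import numpy as np

    rng = np.random.default_rng(9)
    A = rng.standard_normal((10, 3))

    A_dag = np.linalg.inv(A.T @ A) @ A.T
    print(np.allclose(A_dag, np.linalg.pinv(A)))   # True
    print(np.allclose(A_dag @ A, np.eye(3)))       # True: A_dag is a left inverse of A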

Projection on R(A)

Ax_ls is (by definition) the point in R(A) that is closest to y, i.e., it is the projection of y onto R(A):

      Ax_ls = P_{R(A)}(y)

- the projection function P_{R(A)} is linear, and given by

      P_{R(A)}(y) = Ax_ls = A (A^T A)^{-1} A^T y

- A (A^T A)^{-1} A^T is called the projection matrix (associated with R(A))
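A minimal NumPy sketch (added, not from the slides): the projection matrix P = A (A^T A)^{-1} A^T is idempotent, and the residual Py − y is orthogonal to every column of A, which previews the orthogonality principle on the next slide. A and y are random placeholders.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((10, 3))
    y = rng.standard_normal(10)

    P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection onto R(A) (fine for a small demo)
    r = P @ y - y                          # optimal residual A x_ls - y

    print(np.allclose(P @ P, P))           # True: projecting twice changes nothing
    print(np.allclose(A.T @ r, 0))         # True: residual is orthogonal to R(A)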

Orthogonality principle

the optimal residual

      r = Ax_ls − y = (A (A^T A)^{-1} A^T − I) y

is orthogonal to R(A):

      ⟨r, Az⟩ = y^T (A (A^T A)^{-1} A^T − I)^T A z = 0   for all z ∈ R^n

[figure: y, the optimal residual r, and Ax_ls in the plane R(A)]

Completion of squares

since r = Ax_ls − y ⊥ A(x − x_ls) for any x, we have

      ‖Ax − y‖^2 = ‖(Ax_ls − y) + A(x − x_ls)‖^2
                 = ‖Ax_ls − y‖^2 + ‖A(x − x_ls)‖^2

this shows that for x ≠ x_ls, ‖Ax − y‖ > ‖Ax_ls − y‖
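A quick numerical check of this identity (NumPy, not from the slides), with a random A, y, and candidate x standing in for concrete data:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((10, 3))
    y = rng.standard_normal(10)

    x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
    x = rng.standard_normal(3)             # any other candidate x

    lhs = np.linalg.norm(A @ x - y) ** 2
    rhs = np.linalg.norm(A @ x_ls - y) ** 2 + np.linalg.norm(A @ (x - x_ls)) ** 2
    print(np.isclose(lhs, rhs))            # True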

Least-squares via QR factorization

- A ∈ R^{m×n} skinny, full rank
- factor as A = QR with Q^T Q = I_n, R ∈ R^{n×n} upper triangular and invertible
- the pseudo-inverse is

      (A^T A)^{-1} A^T = (R^T Q^T Q R)^{-1} R^T Q^T = R^{-1} Q^T

  so x_ls = R^{-1} Q^T y
- the projection onto R(A) is given by the matrix

      A (A^T A)^{-1} A^T = A R^{-1} Q^T = Q Q^T
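A minimal NumPy sketch (added, not from the slides) of this route: compute the reduced QR factorization and solve R x_ls = Q^T y; the random A and y are placeholders.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((10, 3))
    y = rng.standard_normal(10)

    Q, R = np.linalg.qr(A)                 # reduced QR: Q^T Q = I_n, R upper triangular
    x_ls = np.linalg.solve(R, Q.T @ y)     # R is triangular, so back substitution suffices

    x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(np.allclose(x_ls, x_ref))        # True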

Least-squares via full QR factorization

- full QR factorization:

      A = [Q1 Q2] [ R1 ]
                  [ 0  ]

  with [Q1 Q2] ∈ R^{m×m} orthogonal, R1 ∈ R^{n×n} upper triangular and invertible
- multiplication by an orthogonal matrix doesn't change the norm, so

      ‖Ax − y‖^2 = ‖ [Q1 Q2] [R1; 0] x − y ‖^2
                 = ‖ [Q1 Q2]^T [Q1 Q2] [R1; 0] x − [Q1 Q2]^T y ‖^2
                 = ‖ [R1 x; 0] − [Q1^T y; Q2^T y] ‖^2
                 = ‖ R1 x − Q1^T y ‖^2 + ‖ Q2^T y ‖^2

- this is evidently minimized by the choice x_ls = R1^{-1} Q1^T y (which makes the first term zero)
- the residual with the optimal x is Ax_ls − y = −Q2 Q2^T y
- Q1 Q1^T gives the projection onto R(A)
- Q2 Q2^T gives the projection onto R(A)^⊥
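A NumPy sketch of the full QR route (added, not from the slides): the optimal residual norm equals ‖Q2^T y‖, and the residual itself is −Q2 Q2^T y. The random A and y are placeholders.

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((10, 3))
    y = rng.standard_normal(10)

    Q, R = np.linalg.qr(A, mode='complete')   # Q is 10x10 orthogonal, R is 10x3
    Q1, Q2 = Q[:, :3], Q[:, 3:]
    R1 = R[:3, :]

    x_ls = np.linalg.solve(R1, Q1.T @ y)
    print(np.isclose(np.linalg.norm(A @ x_ls - y), np.linalg.norm(Q2.T @ y)))  # True
    print(np.allclose(A @ x_ls - y, -Q2 @ Q2.T @ y))                           # True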

Least-squares estimation

many applications in inversion, estimation, and reconstruction problems have the form

      y = Ax + v

- x is what we want to estimate or reconstruct
- y is our sensor measurement(s)
- v is an unknown noise or measurement error (assumed small)
- the ith row of A characterizes the ith sensor

least-squares estimation: choose as estimate the x̂ that minimizes

      ‖A x̂ − y‖

i.e., the deviation between

- what we actually observed (y), and
- what we would observe if x = x̂ and there were no noise (v = 0)

the least-squares estimate is just x̂ = (A^T A)^{-1} A^T y

BLUE property

linear measurement with noise:

      y = Ax + v

with A full rank and skinny

consider a linear estimator of the form x̂ = By

- it is called unbiased if x̂ = x whenever v = 0 (i.e., no estimation error when there is no noise)
- this is the same as BA = I, i.e., B is a left inverse of A

- the estimation error of an unbiased linear estimator is

      x − x̂ = x − B(Ax + v) = −Bv

- obviously, then, we'd like B "small" (and BA = I)
- fact: A† = (A^T A)^{-1} A^T is the smallest left inverse of A, in the following sense: for any B with BA = I, we have

      Σ_{i,j} B_ij^2 ≥ Σ_{i,j} (A†)_ij^2

i.e., least-squares provides the best linear unbiased estimator (BLUE)
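A NumPy sketch illustrating this fact (added, not from the slides): any left inverse of A can be written as A† + Z with ZA = 0, and its squared-entry sum is at least that of A†. The particular A and Z below are random placeholders.

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((10, 3))
    A_dag = np.linalg.pinv(A)                       # (A^T A)^{-1} A^T for full-rank skinny A

    # construct another left inverse: add a term whose rows lie in R(A)-perp
    Q, _ = np.linalg.qr(A, mode='complete')
    Z = rng.standard_normal((3, 7)) @ Q[:, 3:].T    # Z A = 0, so (A_dag + Z) A = I
    B = A_dag + Z

    print(np.allclose(B @ A, np.eye(3)))            # True: B is a left inverse of A
    print(np.sum(B**2) >= np.sum(A_dag**2))         # True: A_dag is the smallest left inverse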

Navigation from range measurements

navigation using range measurements from distant beacons

[figure: four beacons k1, k2, k3, k4 far away, and the unknown position x]

the beacons are far from the unknown position x ∈ R^2, so linearization around x = 0 (say) is nearly exact

ranges y ∈ R^4 are measured, with measurement noise v:

      y = [ k1^T ]
          [ k2^T ]  x + v
          [ k3^T ]
          [ k4^T ]

where k_i is the unit vector from 0 to beacon i

- measurement errors are independent, Gaussian, with standard deviation 2 (details not important)
- problem: estimate x ∈ R^2, given y ∈ R^4 (roughly speaking, a 2:1 measurement redundancy ratio)
- actual position is x = (5.59, 10.58); measurement is y = (−11.95, −2.84, −9.81, −2.81)

Just enough measurements method

y1 and y2 suffice to find x (when v = 0)

compute the estimate x̂ by inverting the top (2×2) half of A:

      x̂_je = B_je y = [  0     −1.0   0   0 ] y = [  2.84 ]
                      [ −1.12   0.5   0   0 ]     [ 11.9  ]

(norm of error: 3.07)

Least-squares method

compute the estimate x̂ by least-squares:

      x̂_ls = A† y = [ −0.23  −0.48   0.04  −0.44 ] y = [  4.95 ]
                    [ −0.47  −0.02  −0.51   0.18 ]     [ 10.26 ]

(norm of error: 0.72)

- B_je and A† are both left inverses of A
- larger entries in B lead to larger estimation error
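A hedged NumPy sketch of this comparison (added, not from the slides): it builds a 4×2 measurement matrix from beacon unit vectors, adds noise, and compares the "just enough measurements" estimate with the least-squares estimate. The beacon directions and noise level below are made up for illustration; the slides do not list them.

    import numpy as np

    rng = np.random.default_rng(7)
    angles = np.deg2rad([200, 250, 210, 240])              # hypothetical beacon directions
    A = np.column_stack([np.cos(angles), np.sin(angles)])  # rows are unit vectors k_i^T

    x_true = np.array([5.59, 10.58])
    y = A @ x_true + 0.5 * rng.standard_normal(4)          # noisy range measurements

    x_je = np.linalg.solve(A[:2, :], y[:2])                # invert the top 2x2 block only
    x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)           # use all four measurements

    # least-squares typically gives the smaller estimation error
    print(np.linalg.norm(x_je - x_true), np.linalg.norm(x_ls - x_true))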

Example from overview lecture

[block diagram: u → H(s) → w → A/D → y]

- signal u is piecewise constant with period 1 sec, for 0 ≤ t ≤ 10:

      u(t) = x_j,   j − 1 ≤ t < j,   j = 1, ..., 10

- u is filtered by a system with impulse response h(t):

      w(t) = ∫_0^t h(t − τ) u(τ) dτ

- w is sampled at 10 Hz: ỹ_i = w(0.1 i), i = 1, ..., 100

- 3-bit quantization: y_i = Q(ỹ_i), i = 1, ..., 100, where Q is the 3-bit quantizer characteristic

      Q(a) = (1/4) (round(4a + 1/2) − 1/2)

- problem: estimate x ∈ R^10 given y ∈ R^100

[figure: example traces of s(t), u(t), w(t), and y(t) for 0 ≤ t ≤ 10]

we have y = Ax + v, where

- A ∈ R^{100×10} is given by

      A_ij = ∫_{j−1}^{j} h(0.1 i − τ) dτ

- v ∈ R^100 is the quantization error: v_i = Q(ỹ_i) − ỹ_i (so |v_i| ≤ 0.125)

least-squares estimate: x_ls = (A^T A)^{-1} A^T y

[figure: u(t) (solid) and û(t) (dotted) versus t, 0 ≤ t ≤ 10]
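A hedged sketch of this setup (NumPy/SciPy, added, not from the slides): it builds A_ij = ∫_{j−1}^{j} h(0.1 i − τ) dτ for a made-up impulse response h (the slides do not specify h), quantizes the samples with the 3-bit characteristic above, and estimates x by least-squares. The impulse response, input levels, and use of scipy.integrate.quad are all assumptions for the demo; saturation of the quantizer is ignored.

    import numpy as np
    from scipy.integrate import quad        # assumption: SciPy is available

    h = lambda t: np.exp(-t) * (t >= 0)     # hypothetical impulse response

    A = np.array([[quad(lambda tau: h(0.1 * i - tau), j - 1, j)[0]
                   for j in range(1, 11)] for i in range(1, 101)])

    rng = np.random.default_rng(8)
    x_true = rng.uniform(-1, 1, 10)                       # piecewise-constant input levels
    Q = lambda a: 0.25 * (np.round(4 * a + 0.5) - 0.5)    # 3-bit quantizer from the slide
    y = Q(A @ x_true)                                     # quantized samples

    x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(np.linalg.norm(x_true - x_ls) / np.sqrt(10))    # RMS estimation error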

- the RMS error is

      ‖x − x_ls‖ / √10 = 0.03

- better than if we had no filtering! (RMS error 0.07)
- more on this later ...

some rows of B_ls = (A^T A)^{-1} A^T:

[figure: rows 2, 5, and 8 of B_ls plotted against t, 0 ≤ t ≤ 10]

- the rows show how the sampled measurements of y are used to form the estimate of x_i, for i = 2, 5, 8
- to estimate x_5, which is the original input signal for 4 ≤ t < 5, we mostly use y(t) for 3 ≤ t ≤ 7