Linear Approximation with Regularization and Moving Least Squares


Igor Grešovnik

May 2007

Revision 4.6 (Revision 1: March 2004)

Contents:

1 Linear Fitting
  1.1 Weighted Least Squares in Function Approximation
    1.1.1 Solution of over-determined system of equations by QR decomposition
    1.1.2 Statistical background
  1.2 Weighted Least Squares Approximation of Function Values and Gradients
  1.3 Regularization of the Problem
    1.3.1 Addition of fictitious points
    1.3.2 Addition of minimizing conditions with respect to coefficient size
    1.3.3 Regularization by Adding the Minimizing Conditions with Respect to the Difference in Coefficients Obtained by Related Approximations
  1.4 Comments on Choice of Weights
  1.5 General Weighted Least Squares
2 Low Order Polynomial Approximations
  2.1 Constant and Linear Basis
  2.2 Quadratic Basis
    2.2.1 One dimension
    2.2.2 Two dimensions
    2.2.3 Three dimensions
    2.2.4 Old (alternative) mapping of coefficients
  2.3 Cubic Basis
  2.4 Linear Fitting with Gradient data
  2.5 Quadratic Fitting with Gradient data
3 Moving least squares (MLS) approximation
4 Spatial derivatives of the MLS approximation and approximation with gradient information
  4.1 Normal system of equations
    4.1.1 Implementation remarks
    4.1.2 Second order derivatives (normal system)
    4.1.3 Approximation with values and gradients (normal system)
  4.2 Over-determined system of equations
5 Appendix
  5.1 Quick reminder
    5.1.1 WLS approximation
    5.1.2 MLS approximation
    5.1.3 Implementation remarks
  5.2 Formulas for function gradients
6 Sandbox

Change of notation:

- m → N_v - number of points where the approximated function is evaluated
- m_g → N_g - number of points where the gradient of the approximated function is evaluated
- n → N_b - number of basis functions

Use of indices:

- i - index of sampling points
- j, k - indices of components of approximation coefficients, of components of the right-hand side vector, and of components of the system matrix in systems of equations
- l, m - components of co-ordinate derivatives
- t - components of gradients of the sampled (approximated) function

1 LINEAR FITTING

1.1 Weighted Least Squares in Function Approximation

We have values of some function $f(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^N$, in $N_v$ points:

$$f(\mathbf{x}_i) = y_i, \quad i = 1, \ldots, N_v. \qquad (1)$$

We would like to evaluate the coefficients of a linear combination of $N_b$ functions $f_1(\mathbf{x}), \ldots, f_{N_b}(\mathbf{x})$ such that

$$\tilde{f}(\mathbf{x}; \mathbf{a}) = a_1 f_1(\mathbf{x}) + a_2 f_2(\mathbf{x}) + \ldots + a_n f_n(\mathbf{x}) = \sum_k a_k f_k(\mathbf{x}), \qquad (2)$$

$$\tilde{f}(\mathbf{x}_i) \approx f(\mathbf{x}_i) = y_i, \quad i = 1, \ldots, N_v, \qquad (3)$$

i.e. we want the linear approximation to agree as closely as possible with the values of $f(\mathbf{x})$ in all points $\mathbf{x}_i$. We look for the best agreement in the weighted least squares sense, i.e. we minimize the function

$$\chi^2(\mathbf{a}) = \phi(\mathbf{a}) = \sum_{i=1}^{N_v} w_i \left( \tilde{y}(\mathbf{x}_i) - y_i \right)^2 = \sum_{i=1}^{N_v} w_i \left( \sum_k a_k f_k(\mathbf{x}_i) - y_i \right)^2 \qquad (4)$$

with respect to the parameters of approximation $\mathbf{a}$. Here $\mathbf{w}$ is the $m$-dimensional vector of weights, which weight the significance of the points $\mathbf{x}_i$. The minimum is the stationary point of $\phi$, where

$$\frac{\mathrm{d}\phi}{\mathrm{d}a_j} = 0, \quad j = 1, \ldots, n. \qquad (5)$$

The derivatives of $\phi$ are

$$\frac{\mathrm{d}\phi}{\mathrm{d}a_j} = 2 \sum_{i=1}^{N_v} w_i \left( \sum_k a_k f_k(\mathbf{x}_i) - y_i \right) f_j(\mathbf{x}_i). \qquad (6)$$

Equation (5) therefore gives the following system of equations for the unknown coefficients $\mathbf{a}$:

$$\sum_k a_k \sum_{i=1}^{N_v} w_i f_k(\mathbf{x}_i) f_j(\mathbf{x}_i) = \sum_{i=1}^{N_v} w_i y_i f_j(\mathbf{x}_i), \quad j = 1, \ldots, n. \qquad (7)$$

The coefficients $\mathbf{a}$ can therefore be obtained by solving the linear system of equations

$$\mathbf{C}\,\mathbf{a} = \mathbf{d}, \qquad (8)$$

where

$$C_{jk} = \sum_{i=1}^{N_v} w_i f_j(\mathbf{x}_i) f_k(\mathbf{x}_i) \qquad (9)$$

and

$$d_j = \sum_{i=1}^{N_v} w_i f_j(\mathbf{x}_i)\, y_i. \qquad (10)$$

The system of equations (8) for the calculation of the approximation coefficients is called the normal system of equations. It can be shown that $\mathbf{C}$ is positive-semidefinite. If $\mathbf{C}$ has full rank $n$ then it is positive-definite, and the system can be solved by the Cholesky factorization

$$\mathbf{C} = \mathbf{V}^T \mathbf{V} \qquad (11)$$

(where $\mathbf{V}$ is upper triangular), followed by the solution of a lower triangular system,

$$\mathbf{V}^T \mathbf{y} = \mathbf{d}, \qquad (12)$$

and an upper triangular system,

$$\mathbf{V}\,\mathbf{a} = \mathbf{y}. \qquad (13)$$
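As a concrete illustration of eqs. (8)-(13), the following sketch assembles the normal matrix and right-hand side for a small one-dimensional quadratic basis and solves the system by Cholesky factorization. This is a minimal sketch under assumed basis functions, data and weights (all invented for the example), not the implementation accompanying this text.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Hypothetical quadratic basis f_1(x)=1, f_2(x)=x, f_3(x)=x^2 (1D example).
def basis(x):
    return np.array([1.0, x, x * x])

# Invented sample: points x_i, noisy values y_i, weights w_i.
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 2.0, 7)
ys = 1.0 + 0.5 * xs - 0.3 * xs**2 + 0.01 * rng.standard_normal(xs.size)
ws = np.ones_like(xs)

n = 3
C = np.zeros((n, n))  # normal matrix, eq. (9)
d = np.zeros(n)       # right-hand side, eq. (10)
for x, y, w in zip(xs, ys, ws):
    f = basis(x)
    C += w * np.outer(f, f)
    d += w * y * f

V = cholesky(C)                               # C = V^T V, V upper triangular, eq. (11)
y_aux = solve_triangular(V.T, d, lower=True)  # forward substitution, eq. (12)
a = solve_triangular(V, y_aux)                # back substitution, eq. (13)
print("approximation coefficients a =", a)
```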

1.1.1 Solution of over-determined system of equations by QR decomposition

Here we point out the relation between the least squares formulation (4), (8) and the direct solution of the over-determined system of equations (3). We introduce a matrix $\mathbf{A}$ and a vector $\mathbf{b}$ such that

$$A_{ij} = \sqrt{w_i}\, f_j(\mathbf{x}_i) \qquad (14)$$

and

$$b_i = \sqrt{w_i}\, y_i. \qquad (15)$$

Then the equation

$$\mathbf{A}_{(m \times n)}\, \tilde{\mathbf{a}}_{(n \times 1)} = \mathbf{b}_{(m \times 1)} \qquad (16)$$

reads component-wise as

$$\sqrt{w_i} \sum_j f_j(\mathbf{x}_i)\, \tilde{a}_j = \sqrt{w_i}\, y_i,$$

or

$$\sqrt{w_i} \sum_j \tilde{a}_j\, f_j(\mathbf{x}_i) = \sqrt{w_i}\, y_i, \qquad (17)$$

which is exactly (3) if we take into account (2) and denote the coefficients by $\tilde{\mathbf{a}}$ instead of $\mathbf{a}$. Equation (16) (or (17) in component-wise notation) is an over-determined system; we therefore cannot divide both sides of each equation by $\sqrt{w_i}$, because this would affect the significance of the individual equations and therefore the solution (the system in this case does not have an exact solution, and therefore the relative significance of the equations is important).

Now we can show that the system (16) is in some sense equivalent to the system (8). This is seen by observing that

$$\mathbf{C} = \mathbf{A}^T \mathbf{A}, \quad \mathbf{d} = \mathbf{A}^T \mathbf{b}, \qquad (18)$$

i.e. the least squares system of equations (8) (also referred to as the normal system of equations) is obtained by left-multiplying the over-determined system (16) by $\mathbf{A}^T$. It can be shown that we can obtain the solution of the normal system (8) by performing the QR decomposition of the matrix $\mathbf{A}$ from the system (16) [1]:

$$\mathbf{A}_{(N_v \times n)} = \mathbf{Q}_{(N_v \times N_v)}\, \mathbf{U}_{(N_v \times n)}. \qquad (19)$$

We denote by $\mathbf{z}$ the solution of the orthogonal system $\mathbf{Q}\,\mathbf{z} = \mathbf{b}$, i.e.

$$\mathbf{z} = \mathbf{Q}^T \mathbf{b}. \qquad (20)$$

The matrix $\mathbf{U}$ is upper trapezoidal (by the QR decomposition). We write $\mathbf{U}$ and $\mathbf{z}$ in block form,

$$\mathbf{U}_{(N_v \times n)} = \begin{bmatrix} \mathbf{V}_{(n \times n)} \\ \mathbf{0}_{((N_v - n) \times n)} \end{bmatrix}, \qquad (21)$$

$$\mathbf{z}_{(N_v \times 1)} = \begin{bmatrix} \mathbf{y}_{(n \times 1)} \\ \mathbf{w}_{((N_v - n) \times 1)} \end{bmatrix}. \qquad (22)$$

Now it follows from $\mathbf{C} = \mathbf{A}^T \mathbf{A}$ (taking into account the decomposition and the block form) that $\mathbf{C} = \mathbf{A}^T \mathbf{A} = \mathbf{V}^T \mathbf{V}$, i.e. $\mathbf{V}$ is a Cholesky factor of the normal matrix $\mathbf{C} = \mathbf{A}^T \mathbf{A}$. With the QR factorization we have thus avoided the explicit calculation of $\mathbf{C}$ and its Cholesky factorization. We can further verify that

$$\mathbf{d} = \mathbf{A}^T \mathbf{b} = \mathbf{V}^T \mathbf{y}. \qquad (23)$$

This means that $\mathbf{y}$, which is the upper part of the transformed $\mathbf{z}$, is the solution of the lower triangular system (12). The least squares solution is therefore obtained (according to (13)) by solving the upper triangular system

$$\mathbf{V}\,\tilde{\mathbf{a}} = \mathbf{y}. \qquad (24)$$

The advantage of using the QR factorization is that the matrix $\mathbf{A}$ is better conditioned than $\mathbf{A}^T\mathbf{A}$. If the spectral sensitivity (condition number) of $\mathbf{A}^T\mathbf{A} = \mathbf{B}$ is $\lambda_1/\lambda_n$, where $\lambda_1$ is the largest and $\lambda_n$ the smallest eigenvalue of $\mathbf{B}$, then the spectral sensitivity of $\mathbf{A}$ is only $\sqrt{\lambda_1/\lambda_n}$.
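Under the same assumed basis and invented data as in the Cholesky sketch above, the following sketch solves the weighted over-determined system (16) via QR decomposition, following eqs. (19)-(24); again a minimal illustration, not the author's code. In reduced mode, np.linalg.qr returns directly the upper block $\mathbf{V}$ of $\mathbf{U}$ from eq. (21).

```python
import numpy as np

# Same hypothetical quadratic basis as in the previous sketch.
def basis(x):
    return np.array([1.0, x, x * x])

xs = np.linspace(0.0, 2.0, 7)
ys = 1.0 + 0.5 * xs - 0.3 * xs**2
ws = np.ones_like(xs)

sw = np.sqrt(ws)
A = sw[:, None] * np.array([basis(x) for x in xs])  # A_ij = sqrt(w_i) f_j(x_i), eq. (14)
b = sw * ys                                         # b_i = sqrt(w_i) y_i, eq. (15)

# Reduced QR: Q is N_v x n and the returned triangular factor is V, eq. (21).
Q, V = np.linalg.qr(A)
y = Q.T @ b                # upper part of z = Q^T b, eqs. (20), (22)
a = np.linalg.solve(V, y)  # back substitution V a = y, eq. (24)
print("approximation coefficients a =", a)
```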

===========================

From (2) we can see that

$$\tilde{y}(\mathbf{x}_i, \mathbf{a}) = \sum_k a_k f_k(\mathbf{x}_i), \qquad (25)$$

and therefore

$$\frac{\mathrm{d}\,\tilde{y}(\mathbf{x}_i, \mathbf{a})}{\mathrm{d}a_j} = f_j(\mathbf{x}_i) = \frac{A_{ij}}{\sqrt{w_i}}. \qquad (26)$$

We see that

$$A_{ij} = \sqrt{w_i}\, \frac{\mathrm{d}\,\tilde{y}(\mathbf{x}_i, \mathbf{a})}{\mathrm{d}a_j}. \qquad (27)$$

Sometimes we define a matrix $\mathbf{X}$ so that

$$X_{ij} = \frac{\mathrm{d}\,\tilde{y}(\mathbf{x}_i, \mathbf{a})}{\mathrm{d}a_j} = f_j(\mathbf{x}_i) = \frac{A_{ij}}{\sqrt{w_i}}. \qquad (28)$$

1.1.2 Statistical background

We have a model that predicts a set of measurements (observations) $y_i$, which depends on a set of unknown parameters $\mathbf{a}$:

$$y_i(\mathbf{a}). \qquad (29)$$

In function approximation we have a model for a function of one or a set of independent variables,

$$y_i(\mathbf{a}) = \tilde{y}(\mathbf{x}_i; \mathbf{a}). \qquad (30)$$

From the point of view of parameter estimation this is the same as (29), because the independent variables $\mathbf{x}_i$ are used just to distinguish between distinct measurements (to index the measurements, the same as the index $i$ in (29)); the actual functional relations are not used.

In the least squares formulation, the parameters $\mathbf{a}$ are estimated by minimizing the sum of squares,

$$\min_{\mathbf{a}} \chi^2(\mathbf{a}) = \sum_{i=1}^{N_v} \left( y_i(\mathbf{a}) - y_i \right)^2. \qquad (31)$$

Note that for linear models, or in function approximation,

$$y_i(\mathbf{a}) = \sum_k a_k f_{ki} \qquad (32)$$

or

$$y_i(\mathbf{a}) = \tilde{y}(\mathbf{x}_i; \mathbf{a}) = \sum_k a_k f_k(\mathbf{x}_i). \qquad (33)$$

Both forms are equivalent, which can easily be seen if we write the second form (33) for $\mathbf{x} = \mathbf{x}_i$.

1.1.2.1 Statistical background

The statistical background described here applies to general least squares fitting (also nonlinear). To found the least squares procedures, we must assume that the measurement errors are independently random and normally distributed:

$$y_i \sim \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left( -\frac{\left( y_i - \mu_i \right)^2}{2\sigma_i^2} \right). \qquad (34)$$

When fitting the parameters, we would like to find the parameters that are most likely to be correct. It is not meaningful to ask e.g. "What is the probability that the given parameters $\mathbf{a}$ are correct?" However, intuition tells us that parameters for which the model data do not look like the measured data are unlikely. We can instead ask the question "Given a particular set of parameters, what is the probability that this specific data set could have occurred?" If the $y_i$ take continuous values, then we must ask for the probability that values within $y_i \pm \Delta y$ occur. If this probability is very small, then we conclude that the parameters under consideration are unlikely to be right. Conversely, intuition tells us that the data should not be too improbable for the correct model parameters. In other words, we intuitively identify the probability of the data given the parameters with the likelihood of the parameters given the data. This is based on intuition and has no mathematical background! (The point is that there is just one model, the correct one, and there is a statistical universe of data sets that are drawn from that model.) We look for the parameters that maximize the likelihood defined in the above way, and this form of estimation is the maximum likelihood estimation.

According to assumption (34), the probability of the data set is the product of the probabilities of the individual data points:

$$P \propto \prod_{i=1}^{m} \exp\left( -\frac{\left( y_i - \tilde{y}(\mathbf{x}_i) \right)^2}{2\sigma_i^2} \right) \Delta y. \qquad (35)$$

Maximizing this probability is equivalent to minimizing the negative of its logarithm,

$$\sum_{i=1}^{m} \frac{\left( y_i - \tilde{y}(\mathbf{x}_i) \right)^2}{2\sigma_i^2} - m \ln \Delta y.$$

Since the last term is constant, minimizing this expression is equivalent to minimizing (31).

Remark: The discussion is limited to statistical errors, which we can average away (to a desired extent) if we take enough data. Measurements are also susceptible to systematic errors, which cannot be annihilated by any amount of averaging. (E.g. the calibration of a measurement device can depend on temperature; if we perform all measurements at a wrong temperature, then averaging will not reduce the systematic error.)

In equations (8)-(10) and (14)-(16), regarding the statistical argumentation, we must set the weights to

$$w_i = \frac{1}{\sigma_i^2}. \qquad (36)$$

Let us now estimate the uncertainties of the estimated parameters. The variance associated with $a_j$ can be found from

$$\sigma^2(a_j) = \sum_{i=1}^{m} \sigma_i^2 \left( \frac{\partial a_j}{\partial y_i} \right)^2. \qquad (37)$$

From (8) we have

$$a_j = \sum_{k=1}^{n} \left( C^{-1} \right)_{jk} d_k = \sum_{k=1}^{n} \left( C^{-1} \right)_{jk} \sum_{i=1}^{m} \frac{y_i\, f_k(\mathbf{x}_i)}{\sigma_i^2}. \qquad (38)$$

Since $\mathbf{C}^{-1}$ is independent of $y_i$,

$$\frac{\partial a_j}{\partial y_i} = \sum_{k=1}^{n} \left( C^{-1} \right)_{jk} \frac{f_k(\mathbf{x}_i)}{\sigma_i^2}. \qquad (39)$$

We write $\mathbf{C}^{-1} = \mathbf{W}$. Consequently,

$$\sigma^2(a_j) = \sum_{k=1}^{n} \sum_{l=1}^{n} W_{jk} W_{jl} \left( \sum_{i=1}^{m} \frac{f_k(\mathbf{x}_i)\, f_l(\mathbf{x}_i)}{\sigma_i^2} \right). \qquad (40)$$

The final term in brackets in the above equation is just $C_{kl}$. Since $\mathbf{C}$ is the inverse of $\mathbf{W}$, the equation reduces to $W_{jj}$, i.e.

$$\sigma^2(a_j) = \left( C^{-1} \right)_{jj}. \qquad (41)$$

The off-diagonal elements of $\mathbf{C}^{-1}$ are the covariances between $a_j$ and $a_k$:

$$\mathrm{Cov}\left( a_j, a_k \right) = \left( C^{-1} \right)_{jk}. \qquad (42)$$
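The relations (36)-(42) translate directly into code. The sketch below, under the same invented basis and data conventions as the earlier examples (the noise level sigma is an assumption), sets the statistical weights to 1/sigma_i^2 and reads the parameter variances and covariances from the inverse of the normal matrix.

```python
import numpy as np

# Same hypothetical quadratic basis as above; sigma_i is assumed for the example.
def basis(x):
    return np.array([1.0, x, x * x])

rng = np.random.default_rng(1)
xs = np.linspace(0.0, 2.0, 7)
sigma = 0.05 * np.ones_like(xs)                 # measurement standard deviations
ys = 1.0 + 0.5 * xs - 0.3 * xs**2 + sigma * rng.standard_normal(xs.size)
ws = 1.0 / sigma**2                             # statistical weights, eq. (36)

F = np.array([basis(x) for x in xs])            # F[i, j] = f_j(x_i)
C = F.T @ (ws[:, None] * F)                     # normal matrix, eq. (9)
d = F.T @ (ws * ys)                             # right-hand side, eq. (10)

a = np.linalg.solve(C, d)                       # fitted coefficients
cov = np.linalg.inv(C)                          # Cov(a_j, a_k) = (C^-1)_jk, eq. (42)
std_err = np.sqrt(np.diag(cov))                 # sigma(a_j) = sqrt((C^-1)_jj), eq. (41)
print("a =", a)
print("standard errors =", std_err)
```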

1.1.2.2 Non-normal distribution of errors

In the case of non-normal errors, we often do the following things, which are derived from the assumption that the error distribution is normal:

- Fit the parameters by minimizing $\chi^2$.
- Use contours of constant $\chi^2$ as the boundary of the confidence region.
- Use Monte Carlo simulations or analytical calculations to determine which contour of $\chi^2$ is the correct one for the desired confidence level.
- Give the covariance matrix $\mathbf{C}^{-1}$ as the formal covariance matrix of the fit, on the assumption of normally distributed errors.
- Interpret the diagonal elements of $\mathbf{C}^{-1}$ as the actual squared standard errors of the parameter estimation.

1.2 Weighted Least Squares Approximation of Function Values and Gradients

Sometimes we have gradient information besides the values of a function in a given set of points, and we want to construct an approximation that best fits the specified values and gradients. In this section, equations are derived for approximations that take both value and gradient data into account.

We have values of some function $f(\mathbf{x})$ and its gradients in $m$ points:

$$f(\mathbf{x}_i) = y_i, \quad i = 1, \ldots, N_v; \qquad \nabla f(\mathbf{x}_{g\,i}) = \mathbf{g}_i, \quad i = 1, \ldots, N_g. \qquad (43)$$

We would like to evaluate the coefficients of a linear combination of $N_b$ functions $f_1(\mathbf{x}), \ldots, f_{N_b}(\mathbf{x})$ such that

$$\tilde{f}(\mathbf{x}) = a_1 f_1(\mathbf{x}) + a_2 f_2(\mathbf{x}) + \ldots + a_n f_n(\mathbf{x}) = \sum_k a_k f_k(\mathbf{x}), \qquad (44)$$

$$\tilde{f}(\mathbf{x}_i) \approx f(\mathbf{x}_i) = y_i, \quad i = 1, \ldots, N_v; \qquad \nabla\tilde{f}(\mathbf{x}_{g\,i}) \approx \nabla f(\mathbf{x}_{g\,i}) = \mathbf{g}_i, \quad i = 1, \ldots, N_g. \qquad (45)$$

In order to keep the derivation general, we will allow throughout the text that the values and the gradients of the approximated function $f(\mathbf{x})$ are evaluated in different sets of points (which may, however, partially or fully coincide). We will denote the $t$-th component of $\mathbf{g}_i$ by $g_{it}$.

The gradient of the approximation is simply

$$\nabla \tilde{f}(\mathbf{x}) = \sum_{k=1}^{N_b} a_k \nabla f_k(\mathbf{x}). \qquad (46)$$
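Before deriving the normal equations for this combined fit (eqs. (47)-(49) below), here is a minimal Python sketch of the idea: gradient observations contribute additional terms to the normal system, with the basis derivatives taking the place of the basis functions. The quadratic basis and the data are invented for illustration; this is not the implementation referred to in this text.

```python
import numpy as np

# Hypothetical quadratic basis and its derivative in 1D (assumptions).
def basis(x):
    return np.array([1.0, x, x * x])

def basis_deriv(x):  # d f_k / dx
    return np.array([0.0, 1.0, 2.0 * x])

# Invented data from f(x) = 1 + 0.5 x - 0.3 x^2: values at N_v points,
# exact derivatives at N_g (different) points, unit weights throughout.
xs = np.linspace(0.0, 2.0, 5)
ys = 1.0 + 0.5 * xs - 0.3 * xs**2
xg = np.array([0.5, 1.5])
gs = 0.5 - 0.6 * xg

n = 3
C = np.zeros((n, n))
d = np.zeros(n)
for x, y in zip(xs, ys):   # value terms, as in eqs. (9)-(10)
    f = basis(x)
    C += np.outer(f, f)
    d += y * f
for x, g in zip(xg, gs):   # gradient terms: basis derivatives replace the basis
    df = basis_deriv(x)
    C += np.outer(df, df)
    d += g * df

a = np.linalg.solve(C, d)
print("a =", a)            # recovers (1.0, 0.5, -0.3) exactly in this noise-free case
```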

We want the linear approximation to agree as closely as possible with the values of $f(\mathbf{x})$, and its gradient to agree as closely as possible with the gradients of $f(\mathbf{x})$, in all points $\mathbf{x}_i$. We look for the best agreement in the weighted least squares sense, i.e. we minimize the function

$$\phi(\mathbf{a}) = \sum_{i=1}^{N_v} w_i \left( \tilde{y}(\mathbf{x}_i) - y_i \right)^2 + \sum_{i=1}^{N_g} \sum_{t=1}^{N} w_{g\,it} \left( \frac{\partial \tilde{y}}{\partial x_t}(\mathbf{x}_{g\,i}) - g_{it} \right)^2 = \sum_{i=1}^{N_v} w_i \left( \sum_k a_k f_k(\mathbf{x}_i) - y_i \right)^2 + \sum_{i=1}^{N_g} \sum_{t=1}^{N} w_{g\,it} \left( \sum_k a_k \frac{\partial f_k}{\partial x_t}(\mathbf{x}_{g\,i}) - g_{it} \right)^2 \qquad (47)$$

with respect to the parameters of approximation $\mathbf{a}$. The $w_i$ are $N_v$ weights that weigh the significance of the values in the points $\mathbf{x}_i$, and the $w_{g\,it}$ are $N_g \cdot N$ weights that weigh the significance of the individual gradient components in $\mathbf{x}_{g\,i}$. The minimum is the stationary point of $\phi$, where

$$\frac{\mathrm{d}\phi}{\mathrm{d}a_j} = 0, \quad j = 1, \ldots, n. \qquad (48)$$

The derivatives of $\phi$ are

$$\frac{\mathrm{d}\phi}{\mathrm{d}a_j} = 2 \sum_{i=1}^{N_v} w_i \left( \sum_k a_k f_k(\mathbf{x}_i) - y_i \right) f_j(\mathbf{x}_i) + 2 \sum_{i=1}^{N_g} \sum_{t=1}^{N} w_{g\,it} \left( \sum_k a_k \frac{\partial f_k}{\partial x_t}(\mathbf{x}_{g\,i}) - g_{it} \right) \frac{\partial f_j}{\partial x_t}(\mathbf{x}_{g\,i}). \qquad (49)$$

Equation (48) therefore gives the following system of equations for the unknown coefficients $\mathbf{a}$: