Review: Fit a line to N data points

Review: Fit a line to N data points

Correlated parameters:  y = a x + b
Orthogonal parameters:  y = a (x − x̂) + b

For the intercept b, set a = 0 and find b by the optimal average:
  b̂ = Σ (y_i/σ_i²) / Σ (1/σ_i²),    Var[b̂] = 1 / Σ (1/σ_i²)

For the slope a, set b = 0 and find a by optimal scaling:
  â = Σ y_i (x_i − x̂)/σ_i² / Σ (x_i − x̂)²/σ_i²,    Var[â] = 1 / Σ (x_i − x̂)²/σ_i²

Pivot point:  x̂ = Σ (x_i/σ_i²) / Σ (1/σ_i²)

No need to iterate. (Why?)
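
A minimal numpy sketch of this weighted straight-line fit; the function name fit_line_orthogonal and the array names x, y, sigma are illustrative, not from the lecture:

```python
import numpy as np

def fit_line_orthogonal(x, y, sigma):
    """Fit y = a*(x - xhat) + b with independent errors sigma,
    using the optimal average (for b) and optimal scaling (for a)."""
    w = 1.0 / sigma**2                           # inverse-variance weights
    xhat = np.sum(w * x) / np.sum(w)             # pivot point
    b = np.sum(w * y) / np.sum(w)                # optimal (weighted) average
    var_b = 1.0 / np.sum(w)
    dx = x - xhat
    a = np.sum(w * dx * y) / np.sum(w * dx**2)   # optimal scaling
    var_a = 1.0 / np.sum(w * dx**2)
    return a, b, var_a, var_b, xhat
```

As for the "Why?": about the pivot point x̂ the two patterns 1 and (x − x̂) are orthogonal under the 1/σ² weights, so each estimate is unaffected by the other and no iteration is needed.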

Fit a line => fit 2 patterns => fit M patterns

Model:  y = b + a (x − x̂) = α_0 P_0(x) + α_1 P_1(x)
2 patterns:  P_0(x) = 1,   P_1(x) = x − x̂

Optimal scaling:
  α̂_0 = Σ (y_i − α̂_1 P_1) P_0/σ_i² / Σ P_0²/σ_i²,    Var[α̂_0] = 1 / Σ P_0²/σ_i²
  α̂_1 = Σ (y_i − α̂_0 P_0) P_1/σ_i² / Σ P_1²/σ_i²,    Var[α̂_1] = 1 / Σ P_1²/σ_i²

Pivot point:  x̂ = Σ (x_i/σ_i²) / Σ (1/σ_i²)
Iterate (if patterns not orthogonal).

LINEAR REGRESSION: generalise the model to M patterns:
  y = Σ_{k=1}^{M} α_k P_k(x)
Iterated optimal scaling: a simple algorithm, easy to code, often adequate.
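
A sketch of that iterated optimal scaling loop for a general set of M patterns; the function name, the (N, M) pattern matrix P, and the fixed iteration count are assumptions for illustration, not the lecture's own code:

```python
import numpy as np

def iterated_optimal_scaling(P, y, sigma, n_iter=50):
    """Iterated optimal scaling for y ~ sum_k alpha_k P_k(x).
    P is an (N, M) array whose columns are the patterns P_k(x_i)."""
    w = 1.0 / sigma**2
    N, M = P.shape
    alpha = np.zeros(M)
    # Variance formulas assume orthogonal patterns (see the caveat on the next slide)
    var = np.array([1.0 / np.sum(w * P[:, k]**2) for k in range(M)])
    for _ in range(n_iter):
        for k in range(M):
            # residual with pattern k removed, optimally scaled onto pattern k
            r = y - P @ alpha + alpha[k] * P[:, k]
            alpha[k] = np.sum(w * r * P[:, k]) / np.sum(w * P[:, k]**2)
    return alpha, var
```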

Example: Sine Curve + Background

Data:  X_i ± σ_i at t = t_i
Model:  X(t) = A + S sin(ω t) + C cos(ω t)
3 patterns:  1,   s_i = sin(ω t_i),   c_i = cos(ω t_i)

Iterated optimal scaling:
  Â = Σ (X_i − Ŝ s_i − Ĉ c_i)/σ_i² / Σ 1/σ_i²,    Var[Â] = 1 / Σ 1/σ_i²
  Ŝ = Σ (X_i − Â − Ĉ c_i) s_i/σ_i² / Σ s_i²/σ_i²,    Var[Ŝ] = 1 / Σ s_i²/σ_i²
  Ĉ = Σ (X_i − Â − Ŝ s_i) c_i/σ_i² / Σ c_i²/σ_i²,    Var[Ĉ] = 1 / Σ c_i²/σ_i²
Iterate (if patterns not orthogonal).

The variance formulas assume orthogonal parameters; otherwise they give error bars that are too small. Use the inverse of the Hessian matrix (see later).
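
A self-contained sketch of these three update equations on synthetic data; every numerical value here (frequency, noise level, true A, S, C) is invented purely for illustration:

```python
import numpy as np

# Synthetic sine-plus-background data (all values assumed for illustration)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
omega = 2.0
sigma = 0.3 * np.ones_like(t)
X = 1.5 + 0.8 * np.sin(omega * t) + 0.4 * np.cos(omega * t) + rng.normal(0.0, sigma)

w = 1.0 / sigma**2
s, c = np.sin(omega * t), np.cos(omega * t)    # the two oscillating patterns
A = S = C = 0.0
for _ in range(50):                            # iterated optimal scaling
    A = np.sum(w * (X - S*s - C*c)) / np.sum(w)
    S = np.sum(w * (X - A - C*c) * s) / np.sum(w * s**2)
    C = np.sum(w * (X - A - S*s) * c) / np.sum(w * c**2)
var_A, var_S, var_C = 1/np.sum(w), 1/np.sum(w * s**2), 1/np.sum(w * c**2)
```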

χ² analysis of the straight-line fit (1):  y = a x + b

χ² = Σ (y_i − a x_i − b)²/σ_i²
0 = ∂χ²/∂a = −2 Σ x_i (y_i − a x_i − b)/σ_i²
0 = ∂χ²/∂b = −2 Σ (y_i − a x_i − b)/σ_i²

The Normal Equations:
  a Σ x_i²/σ_i² + b Σ x_i/σ_i² = Σ x_i y_i/σ_i²
  a Σ x_i/σ_i² + b Σ 1/σ_i² = Σ y_i/σ_i²

Matrix form:
  [ Σ x_i²/σ_i²   Σ x_i/σ_i² ] [ a ]   [ Σ x_i y_i/σ_i² ]
  [ Σ x_i/σ_i²    Σ 1/σ_i²   ] [ b ] = [ Σ y_i/σ_i²     ]
i.e.  H α = c(y)   (H = Hessian matrix)

Solution:  α = H⁻¹ c(y)
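
A sketch of these normal equations in numpy: build the 2 × 2 Hessian H and the right-hand side c, then solve H α = c. The function and variable names are assumptions:

```python
import numpy as np

def straight_line_fit(x, y, sigma):
    """Solve the normal equations H @ [a, b] = c for y = a*x + b."""
    w = 1.0 / sigma**2
    H = np.array([[np.sum(w * x**2), np.sum(w * x)],
                  [np.sum(w * x),    np.sum(w)]])
    c = np.array([np.sum(w * x * y), np.sum(w * y)])
    a, b = np.linalg.solve(H, c)      # alpha = H^{-1} c(y)
    return a, b, H
```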

χ² analysis of the straight-line fit (2):  y = a x + b   (diagonal Hessian matrix)

Normal equations  H α = c(y):
  [ Σ x_i²/σ_i²   Σ x_i/σ_i² ] [ a ]   [ Σ x_i y_i/σ_i² ]
  [ Σ x_i/σ_i²    Σ 1/σ_i²   ] [ b ] = [ Σ y_i/σ_i²     ]

Solution  α = H⁻¹ c(y):
  [ a ]           [  Σ 1/σ_i²     −Σ x_i/σ_i²  ] [ Σ x_i y_i/σ_i² ]
  [ b ] = (1/Δ)   [ −Σ x_i/σ_i²    Σ x_i²/σ_i² ] [ Σ y_i/σ_i²     ]

Hessian determinant:  Δ = (Σ 1/σ_i²)(Σ x_i²/σ_i²) − (Σ x_i/σ_i²)²

Orthogonal basis: replace x by (x − x̂), with x̂ chosen so that Σ (x_i − x̂)/σ_i² = 0. Then Δ = (Σ 1/σ_i²)(Σ (x_i − x̂)²/σ_i²), the Hessian is diagonal, and
  â = Σ (x_i − x̂) y_i/σ_i² / Σ (x_i − x̂)²/σ_i²,    b̂ = Σ y_i/σ_i² / Σ 1/σ_i²
(same as optimal scaling).

The Hessian Matrix

Definition:  H_jk ≡ (1/2) ∂²χ²/∂a_j ∂a_k

Example:  y = a x + b.
  χ² = Σ (y_i − a x_i − b)²/σ_i²
  ∂χ²/∂a = −2 Σ x_i (y_i − a x_i − b)/σ_i²
  ∂χ²/∂b = −2 Σ (y_i − a x_i − b)/σ_i²
  (1/2) ∂²χ²/∂a² = Σ x_i²/σ_i²,   (1/2) ∂²χ²/∂a∂b = Σ x_i/σ_i²,   (1/2) ∂²χ²/∂b² = Σ 1/σ_i²,
so
  H = [ Σ x_i²/σ_i²   Σ x_i/σ_i² ]
      [ Σ x_i/σ_i²    Σ 1/σ_i²   ]

For linear models, the Hessian matrix is independent of the parameters, and the χ² surface is parabolic.

Parameter Uncertainties

The Hessian matrix describes the curvature of the χ² surface:
  χ²(α) = χ²(α̂) + Σ_{j,k} (α_j − α̂_j) H_jk (α_k − α̂_k) + ...,    H_jk ≡ (1/2) ∂²χ²/∂a_j ∂a_k

For linear models, the Hessian matrix is independent of the parameters, and the χ² surface is parabolic.

For a one-parameter fit: if α̂ minimises χ², then Var(α̂) = [ (1/2) ∂²χ²/∂α² ]⁻¹.

For a multi-parameter fit, the covariance of any pair of parameters is an element of the inverse Hessian matrix:
  Cov(α_k, α_j) = [H⁻¹]_kj
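
A short sketch of how the inverse Hessian yields the covariance matrix and 1-sigma error bars; it assumes H was built as in the straight_line_fit sketch above:

```python
import numpy as np

def parameter_covariance(H):
    """Covariance of the fitted parameters from the Hessian
    H_jk = (1/2) d2chi2/da_j da_k:  Cov(a_j, a_k) = [H^{-1}]_jk."""
    cov = np.linalg.inv(H)
    errors = np.sqrt(np.diag(cov))    # 1-sigma error bars on each parameter
    return cov, errors
```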

Principal Axes of the χ² Ellipsoid

Eigenvectors of H define the principal axes of the χ² ellipsoid. This is equivalent to rotating the coordinate system in parameter space:
  y = a x + b = a′ (x cosθ − sinθ) + b′ (x sinθ + cosθ)

H can also be diagonalised by shifting to the pivot point:  a x + b → a (x − x̂) + b.  This shears the parameter space, giving
  H = [ Σ (x_i − x̂)²/σ_i²   0        ]
      [ 0                    Σ 1/σ_i² ]

Diagonalising the Hessian matrix orthogonalises the parameters.
[Figure: χ² contour ellipses in the (a, b) parameter plane, showing the rotated principal axes.]
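
A sketch of the eigen-decomposition step; np.linalg.eigh is appropriate because H is symmetric, and the interpretation in the comments follows the slide:

```python
import numpy as np

def principal_axes(H):
    """Eigenvectors of the symmetric Hessian H give the principal axes
    of the chi-square ellipsoid; along each axis the Delta(chi2) = 1
    contour lies at a distance 1/sqrt(eigenvalue)."""
    eigval, eigvec = np.linalg.eigh(H)
    return eigval, eigvec
```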

Scale M Patterns: General Linear Regression

Linear model:  y(x) = a_1 P_1(x) + a_2 P_2(x) + ... = Σ_{k=1}^{M} a_k P_k(x)
Example, polynomial:  y(x) = a_0 + a_1 x + a_2 x² + ... + a_M x^M

χ² = Σ_i [ y_i − Σ_j a_j P_j(x_i) ]² / σ_i²

Normal equations:
  0 = ∂χ²/∂a_k = −2 Σ_i [ y_i − Σ_j a_j P_j(x_i) ] P_k(x_i)/σ_i²,   k = 1 ... M
  Σ_j [ Σ_i P_j P_k/σ_i² ] a_j = Σ_i y_i P_k/σ_i²
i.e.  Σ_j H_jk a_j = c_k(y),   with  H_jk = Σ_i P_j P_k/σ_i²  and  c_k(y) = Σ_i y_i P_k/σ_i².
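
A sketch of the general M-pattern solution; the pattern matrix P (columns P_k(x_i)) and the function name are assumptions:

```python
import numpy as np

def linear_regression_patterns(P, y, sigma):
    """General linear regression y ~ sum_k a_k P_k(x), solved in one step.
    P is (N, M) with columns P_k(x_i)."""
    w = 1.0 / sigma**2
    H = (P * w[:, None]).T @ P        # H_jk = sum_i P_j P_k / sigma_i^2
    c = (P * w[:, None]).T @ y        # c_k  = sum_i y_i P_k / sigma_i^2
    a = np.linalg.solve(H, c)         # unique solution for a linear model
    cov = np.linalg.inv(H)            # parameter covariance matrix
    return a, cov

# Polynomial example: the patterns are the columns 1, x, x^2, ..., x^M
# P = np.column_stack([x**k for k in range(M + 1)])
```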

Principal Axes for General Linear Models

In the general linear case we fit M functions P_k(x) with scale factors a_k:
  y(x) = Σ_{k=1}^{M} a_k P_k(x)

The (M × M) Hessian matrix has elements:
  H_jk ≡ (1/2) ∂²χ²/∂a_j ∂a_k = Σ_i P_j(x_i) P_k(x_i)/σ_i²

Normal equations (M equations for M unknowns):
  Σ_k H_jk a_k = c_j,   where  c_j = Σ_i y_i P_j(x_i)/σ_i²

This gives M-dimensional ellipsoidal surfaces of constant χ², whose principal axes are the M eigenvectors of the Hessian matrix H. Use standard matrix methods to find the linear combinations of the P_k that diagonalise H. (More details later.)

Linear vs Non-linear Models

Linear model:  y(x) = Σ_{k=1}^{M} α_k P_k(x),  with M scale parameters α_k.
  H_jk ≡ (1/2) ∂²χ²/∂α_j ∂α_k is independent of the parameters, and
  χ²(α) = χ²(α̂) + Σ_{j,k} (α_j − α̂_j) H_jk (α_k − α̂_k) + ...
  Elliptical χ² contours; unique solution by linear regression (matrix inversion).

Non-linear models, e.g. power law  y = A x^B,  blackbody  f_ν = Ω B_ν(λ, T):
  H depends on the parameters. Banana-shaped contours and multiple local minima require iterative methods.

Method 1: Linearise the Non-linear Model

Linearisation: use a local linear approximation to the model, giving a quadratic approximation to the χ² surface. Solve by linear regression, then iterate.

Example, Gaussian peak + background:
  μ_i = A g_i + B,   g_i = exp(−η_i²/2),   η_i = (x_i − x_0)/σ
  Δμ_i ≈ ΔA ∂μ_i/∂A + ΔB ∂μ_i/∂B + Δx_0 ∂μ_i/∂x_0 + Δσ ∂μ_i/∂σ
  ∂μ_i/∂A = g_i,   ∂μ_i/∂B = 1,   ∂μ_i/∂x_0 = A g_i η_i/σ,   ∂μ_i/∂σ = A g_i η_i²/σ

A and B are scale parameters; x_0 and σ are non-linear parameters.

Guess x_0 and σ, fit the linear parameters A and B, evaluate the derivatives, adjust x_0 and σ using the linear approximation, and iterate.

(Levenberg-Marquardt method: add a constant to the Hessian diagonal to prevent over-stepping. See e.g. Numerical Recipes.)
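
A sketch of one linearised update for this Gaussian-peak-plus-background example. The derivative patterns follow the slide, while the function names and the simple lam*I damping term (a Levenberg-style stand-in for the full Levenberg-Marquardt scheme) are assumptions:

```python
import numpy as np

def peak_model(x, A, B, x0, s):
    """Gaussian peak + background, mu = A*g + B, with its derivative patterns."""
    eta = (x - x0) / s
    g = np.exp(-0.5 * eta**2)
    mu = A * g + B
    # Columns: dmu/dA, dmu/dB, dmu/dx0, dmu/dsigma (as on the slide)
    J = np.column_stack([g, np.ones_like(x), A * g * eta / s, A * g * eta**2 / s])
    return mu, J

def linearised_step(x, y, sigma, params, lam=0.0):
    """One linearised (Gauss-Newton) update of (A, B, x0, sigma);
    lam > 0 adds a damping constant to the Hessian diagonal."""
    params = np.asarray(params, dtype=float)
    mu, J = peak_model(x, *params)
    w = 1.0 / sigma**2
    H = (J * w[:, None]).T @ J + lam * np.eye(len(params))
    c = (J * w[:, None]).T @ (y - mu)
    return params + np.linalg.solve(H, c)   # iterate until the step is small
```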

Method 2: Amoeba (Downhill Simplex)

Simplex = cluster of M+1 points in the M-dimensional parameter space.
1. Evaluate χ² at each node.
2. Pick the node with the highest χ² and move it along the line through the centroid of the other M nodes, using simple rules to find a new position with lower χ².
3. Repeat until converged.

Amoeba requires no derivatives: it crawls downhill, adjusting its shape to match the χ² landscape, then shrinks down onto a local minimum. See Numerical Recipes for a full description.
[Figure: successive simplex positions crawling downhill across the χ² landscape.]
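
scipy's Nelder-Mead minimiser is a standard downhill-simplex implementation, so a hedged sketch of the method can lean on it; the straight-line χ² and all data values below are invented for illustration, and this is not the Numerical Recipes routine itself:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)                 # assumed synthetic data
sigma = 0.5 * np.ones_like(x)
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)

def chi2(params):
    a, b = params                              # straight-line model as an example
    return np.sum(((y - (a * x + b)) / sigma)**2)

# Downhill simplex ("amoeba"): needs only chi2 evaluations, no derivatives.
result = minimize(chi2, x0=[1.0, 0.0], method="Nelder-Mead")
a_fit, b_fit = result.x
```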

Method 3: Markov Chain Monte Carlo (MCMC)

1. Start with M+1 points in the M-dimensional parameter space.
2. Evaluate σ for each parameter (and the covariance matrix) from the last n points.
3. Take a random step, e.g. using Gaussian random numbers with the same σ (and covariances) as the recent points:  Δα ~ G(0, σ).
4. Evaluate Δχ² = χ²_new − χ²_old and keep the step with probability  P = min{ 1, exp(−Δχ²/2) }.
5. Iterate steps 2-4 until convergence.

MCMC requires no derivatives: it generates a chain of points that tends to move downhill, then settles into a pattern matching the posterior distribution of the parameters. It can escape from local minima, and prior distributions on the parameters can also be included.
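
A minimal Metropolis sampler sketching steps 3-4; the slide's adaptive step (re-estimating σ and covariances from recent points) is simplified here to fixed step sizes, and all names are assumptions:

```python
import numpy as np

def mcmc_chain(chi2, start, step_sizes, n_steps=10000, seed=0):
    """Metropolis MCMC: Gaussian proposal steps, accepted with
    probability min(1, exp(-Delta(chi2)/2)). chi2 maps a parameter
    vector to the chi-square value."""
    rng = np.random.default_rng(seed)
    chain = [np.asarray(start, dtype=float)]
    chi2_old = chi2(chain[-1])
    for _ in range(n_steps):
        proposal = chain[-1] + rng.normal(0.0, step_sizes)   # Delta(alpha) ~ G(0, sigma)
        chi2_new = chi2(proposal)
        dchi2 = chi2_new - chi2_old
        if dchi2 <= 0 or rng.random() < np.exp(-0.5 * dchi2):
            chain.append(proposal)            # accept the step
            chi2_old = chi2_new
        else:
            chain.append(chain[-1].copy())    # reject: repeat the old point
    return np.array(chain)                    # samples the posterior of the parameters
```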

Example: MCMC fit of an exoplanet model to transit lightcurves and radial-velocity curve data.
[Figure: relative radial velocity (km/s) versus transit phase, and differential magnitude versus transit phase.]