Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon


1 Introduction

Given a set of data points, we often want to fit a selected model or curve type to the data (e.g., we suspect exponential behavior), and then choose the most appropriate model by determining which model fits best. Before discussing criteria on which to base curve-fitting decisions, we first will look at sources and types of errors.

Formulation errors. These errors result from the assumption that certain variables are negligible, or from simplifications in describing the interrelationships among the variables in the various submodels constructed as part of the modeling process.

Truncation errors. These errors are attributable to the numerical methods used to solve a mathematical problem. An example is the use of a Taylor polynomial to approximate a function (see the short sketch at the end of this list).

Round-off errors. These errors are caused by using finite-digit machines for computations. Each number can be represented in a computer only with finite precision. Thus, each arithmetic operation that is performed, each with its own round-off, contributes to an accumulated round-off error. Such an error can significantly alter the results of a model's solution.

Measurement errors. These errors are caused by imprecision in the data collection. This may result from human error in recording data or from limitations in the accuracy of the measuring equipment.
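To make the distinction between truncation and round-off error concrete, here is a minimal Python sketch (the handout's own computations are done in Maple; Python is used here purely for illustration):

```python
import math

# Truncation error: approximate e^x by its degree-3 Taylor polynomial
# about 0, p3(x) = 1 + x + x^2/2 + x^3/6; the neglected terms are the error.
x = 0.5
p3 = 1 + x + x**2 / 2 + x**3 / 6
print(math.exp(x) - p3)      # truncation error, about 2.9e-3

# Round-off error: in finite-precision floating-point arithmetic,
# (0.1 + 0.2) - 0.3 is not exactly 0.
print((0.1 + 0.2) - 0.3)     # about 5.6e-17, not 0
```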

2 Analytic Methods of Model Fitting

There are actually several methods for fitting curves to a set of data points, based on different criteria. We will briefly examine some of these before focusing on the most common one. The majority of the material in this handout is taken directly from A First Course in Mathematical Modeling by Frank Giordano, et al. Note that this material may also be found in Section 4.1 of the textbook.

Least-Squares Criterion

Currently, the most frequently used curve-fitting criterion is the least-squares criterion. In this case, given some function type y = f(x) and a set of data points (x_i, y_i), i = 1, 2, ..., m, we seek to minimize the sum

\sum_{i=1}^{m} [y_i - f(x_i)]^2.    (1)

In other words, minimizing (1) is equivalent to minimizing the root-mean-square deviation of the function f(x) from the data set. One of the advantages of the least-squares method (also known as regression) is that the resulting optimization problem can be solved using only the calculus of several variables.

Minimizing the Sum of the Absolute Deviations

In this case, given some function type y = f(x) and a set of data points (x_i, y_i), i = 1, 2, ..., m, we seek to minimize

\sum_{i=1}^{m} |y_i - f(x_i)|.

In general, this criterion presents some problems: in order to solve the optimization problem, we must differentiate the above sum with respect to the parameters of f to find the critical points, but because of the absolute values, the derivatives are not continuous.

Chebyshev Approximation Criterion

Given some function type y = f(x) and a set of data points (x_i, y_i), i = 1, 2, ..., m, minimize the largest absolute deviation |y_i - f(x_i)| over the entire set. The difficulty with the Chebyshev criterion is that it is often hard to apply in practice, because applying it results in an optimization problem that may require advanced mathematical procedures or numerical algorithms to solve.
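To make the three criteria concrete, the following Python sketch (with hypothetical data and a hypothetical candidate line, neither from the handout) evaluates all three objectives for a candidate model f(x) = ax + b:

```python
# Evaluate the three curve-fitting objectives for a candidate model
# f(x) = a*x + b on a small hypothetical data set.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]

def f(x, a=1.0, b=0.0):
    return a * x + b

residuals = [y - f(x) for x, y in zip(xs, ys)]

least_squares = sum(r**2 for r in residuals)    # sum of squared deviations
sum_abs_dev   = sum(abs(r) for r in residuals)  # sum of absolute deviations
chebyshev     = max(abs(r) for r in residuals)  # largest absolute deviation

print(least_squares, sum_abs_dev, chebyshev)
```

Each criterion picks the parameters a, b minimizing its own objective, so the three can prefer different lines on the same data.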

Relating the Criteria

The geometric interpretations of the three curve-fitting criteria help provide a qualitative comparison. Minimizing the sum of the absolute deviations tends to treat each data point with equal weight and to average the deviations. The Chebyshev criterion gives more weight to a single point potentially having a large deviation. The least-squares criterion, in terms of how it weights individual points, is somewhere in between these two.

The following may also be found in Section 4.3 of the textbook. Now, we will compare the Chebyshev and least-squares approaches. Suppose the Chebyshev criterion is applied and the resulting optimization problem solved to give the function f_1(x). The absolute deviations resulting from the fit are defined as follows:

|y_i - f_1(x_i)| = c_i,  i = 1, 2, ..., m.

Define c_max as the largest of the absolute deviations c_i. Since the parameters of f_1(x) are determined so as to minimize c_max, it is the minimal largest absolute deviation obtainable.

Suppose, instead, that the least-squares criterion is applied and the resulting optimization problem solved to yield the function f_2(x). The absolute deviations resulting from this fit are given by

|y_i - f_2(x_i)| = d_i,  i = 1, 2, ..., m.

Define d_max as the largest of the absolute deviations d_i. Clearly, d_max >= c_max. Since the sum of the squares of the d_i is the smallest such sum obtainable, we know that

d_1^2 + d_2^2 + \cdots + d_m^2 \le c_1^2 + c_2^2 + \cdots + c_m^2.

However, c_i \le c_max for i = 1, 2, ..., m. Therefore,

d_1^2 + d_2^2 + \cdots + d_m^2 \le m c_max^2,

or

\sqrt{(d_1^2 + d_2^2 + \cdots + d_m^2)/m} \le c_max.

Thus, defining

D = \sqrt{(d_1^2 + d_2^2 + \cdots + d_m^2)/m},

we have

D \le c_max \le d_max.

This gives an effective bound on the maximum absolute deviation c_max. So, if there is a considerable difference between D and d_max, it might be better to apply the Chebyshev criterion.
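Once the least-squares fit f_2 is in hand, this bound is cheap to compute. A minimal Python sketch, assuming a hypothetical set of least-squares residuals d_i:

```python
import math

# Given the residuals d_i of a least-squares fit, compute
# D = sqrt(sum(d_i^2)/m) and d_max; the optimal Chebyshev deviation
# c_max for the same model family lies between them.
d = [0.12, -0.35, 0.08, 0.27, -0.19]   # hypothetical residuals y_i - f2(x_i)
m = len(d)

D = math.sqrt(sum(di**2 for di in d) / m)
d_max = max(abs(di) for di in d)

print(f"{D:.4f} <= c_max <= {d_max:.4f}")
# A wide interval suggests a Chebyshev fit could noticeably reduce the
# maximum deviation, and may be worth the extra effort.
```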

3 Obtaining a Least-Squares Fit (or Regression)

Linear Regression

Suppose a model of the form y = ax + b is expected, and it has been decided to use the data points (x_i, y_i), i = 1, 2, ..., m, to estimate a and b. Applying the least-squares criterion in equation (1), we find that we must minimize

E = \sum_{i=1}^{m} [y_i - f(x_i)]^2 = \sum_{i=1}^{m} (y_i - a x_i - b)^2.

In order to minimize E, we need to compute and set to 0 the partial derivatives of E with respect to both a and b:

\partial E/\partial a = -2 \sum_{i=1}^{m} (y_i - a x_i - b) x_i = 0,

\partial E/\partial b = -2 \sum_{i=1}^{m} (y_i - a x_i - b) = 0.

Rewriting these equations, we obtain the following system of two equations in two unknowns:

a \sum x_i^2 + b \sum x_i = \sum x_i y_i,

a \sum x_i + b m = \sum y_i,

which may easily be solved to yield

a = \frac{m \sum x_i y_i - \sum x_i \sum y_i}{m \sum x_i^2 - (\sum x_i)^2}    (2)

and

b = \frac{\sum x_i^2 \sum y_i - \sum x_i y_i \sum x_i}{m \sum x_i^2 - (\sum x_i)^2}.    (3)

Maple can be used to determine the linear least-squares fit for any set of data.

Example: The following table shows the average age versus height for children. The data is copied from An Introduction to the Mathematics of Biology, with Computer Algebra Models by Yeargers, Shonkwiler, and Herod.

age (years)    1    3    5    7    9    11   13
height (cm)    75   92   108  121  130  142  155

A plot of the data (height vs. age) reveals that the data appears to be linear in nature, so we will perform a linear regression to determine the least-squares best linear fit, letting x_i represent the age and y_i represent the height. Working out the necessary sums, we find that

\sum x_i = 49,  \sum x_i^2 = 455,  \sum y_i = 823,  \sum x_i y_i = 6485.

Applying the formulas in (2) and (3), we obtain a = 6.4643 and b = 72.3214. Therefore, the data is approximated by the curve y = 6.4643x + 72.3214. See also the Maple example regression.mw under Linear Regression.
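The closed-form solution (2)-(3) is easy to check numerically. This Python sketch (standing in for the Maple worksheet) applies the formulas to the age/height data above:

```python
# Linear least-squares fit y = a*x + b via the closed-form solution,
# formulas (2) and (3), applied to the age/height data above.
ages    = [1, 3, 5, 7, 9, 11, 13]            # x_i
heights = [75, 92, 108, 121, 130, 142, 155]  # y_i (cm)
m = len(ages)

Sx  = sum(ages)
Sy  = sum(heights)
Sxx = sum(x * x for x in ages)
Sxy = sum(x * y for x, y in zip(ages, heights))

a = (m * Sxy - Sx * Sy) / (m * Sxx - Sx**2)    # formula (2)
b = (Sxx * Sy - Sxy * Sx) / (m * Sxx - Sx**2)  # formula (3)

print(a, b)  # approximately 6.4643 and 72.3214
```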

4 Plotting the Residuals for a Least-Squares Fit

We plot the residuals on the y-axis and the independent variable on the x-axis. The deviations should be randomly distributed and contained in a reasonably small band that is consistent with the accuracy required by the model. If a residual is excessively large, the data point in question should be examined to discover the cause of the large deviation (i.e., did we miss an outlier, or is something else going on?). A pattern or trend in the residuals indicates that a predictable effect remains to be modeled, and the nature of the pattern gives a hint as to how to refine the model. If the residuals increase in absolute value, that is an indication that our model is unacceptable. We will use a Maple example to illustrate this.

5 Fitting a Power Curve

Suppose we think that the curve will be of the form y = Ax^N, where we know what N is (usually unlikely). Applying the least-squares method to fit the curve to a set of data points requires minimization of

E = \sum_{i=1}^{m} [y_i - f(x_i)]^2 = \sum_{i=1}^{m} (y_i - A x_i^N)^2.

A necessary condition for optimality is that the derivative of E with respect to A equals 0, giving

dE/dA = -2 \sum_{i=1}^{m} x_i^N (y_i - A x_i^N) = 0,

so

A = \frac{\sum x_i^N y_i}{\sum x_i^{2N}}.

However, it is not likely that we know the value of N in advance. What if we do not know N? In other words, now assume that we seek a least-squares estimate of the form y = ax^n, where both a and n are parameters to be determined. In this case, we will first need to transform the function by taking the logarithm of both sides to obtain

ln y = ln a + n ln x.    (4)

Then, we simply transform the data by taking the natural log of both x_i and y_i, then apply linear regression to obtain ln a and n. The parameter a is then found by a = e^{ln a}.
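The transformation in (4) reduces the power-curve fit to a linear regression on the logged data. A minimal Python sketch, with hypothetical data generated near y = 3x:

```python
import math

# Fit y = a*x^n by linear regression on the transformed data
# (ln x_i, ln y_i), per equation (4): ln y = ln a + n ln x.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.1, 5.9, 12.2, 23.8]   # hypothetical data, roughly y = 3*x

lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]
m = len(xs)

Sx, Sy = sum(lx), sum(ly)
Sxx = sum(u * u for u in lx)
Sxy = sum(u * v for u, v in zip(lx, ly))

n    = (m * Sxy - Sx * Sy) / (m * Sxx - Sx**2)  # slope gives exponent n
ln_a = (Sy - n * Sx) / m                        # intercept gives ln a
a    = math.exp(ln_a)

print(a, n)  # roughly a = 3, n = 1 for this data
```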

Example: The following table shows the ideal weights for medium-built males. The data is copied from An Introduction to the Mathematics of Biology, with Computer Algebra Models by Yeargers, Shonkwiler, and Herod. This example is worked out on the Maple worksheet, regression.mw.

Height (in)   Weight (lb)
62            128
63            131
64            135
65            139
66            142
67            146
68            150
69            154
70            158
71            162
72            167
73            172
74            177

First, we try a linear least-squares fit. As we can see from the plot of the residuals, the residual increases as the height increases. Therefore, this model does not seem to be acceptable. However, since we know that in many geometric solids the volume changes with the cube of the height, we will try to find a cubic fit for the data. The results from the cubic fit are much better: the residual is not large, and it decreases as the height increases.

We would like to know if we can do better. So, we will try to find a fit of the form

weight = A (height - 60)^n.

Ideally, n would be an integer. However, we discover that n ≈ 0.168. Since the mathematics behind our approach does not require n to be an integer, this is okay. If we wish to make n a rational number, then the nearest simple rational number to the value of n we obtained is 1/6, so we also look at the results with this value of n.
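The shifted power fit can be reproduced with the same log-log regression applied to (height - 60, weight). This Python sketch (standing in for the Maple worksheet) uses the table above and recovers n near 0.168:

```python
import math

# Reproduce the fit weight = A*(height - 60)^n from the table above by
# linear regression on (ln(height - 60), ln(weight)).
heights = list(range(62, 75))                 # 62..74 inches
weights = [128, 131, 135, 139, 142, 146, 150,
           154, 158, 162, 167, 172, 177]      # pounds

u = [math.log(h - 60) for h in heights]
v = [math.log(w) for w in weights]
m = len(u)

Su, Sv = sum(u), sum(v)
Suu = sum(ui * ui for ui in u)
Suv = sum(ui * vi for ui, vi in zip(u, v))

n    = (m * Suv - Su * Sv) / (m * Suu - Su**2)
ln_A = (Sv - n * Su) / m
A    = math.exp(ln_A)

print(A, n)  # n comes out near 0.168, as in the text
```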

6 Multiple Regression

In many real problems, there are more than two variables to be considered. For example:

(1) The pressure of a gas depends on the volume V and the temperature T: PV = kT.

(2) The surface area of a human as a function of height and weight: SA = c (wt)^α (ht)^β.

(3) The determination of percent body fat from body mass index and skin fold: % body fat = a(BMI) + b(skinfold) + c.

Look at (2), for example. Taking the natural log of both sides gives

ln(SA) = α ln(wt) + β ln(ht) + ln c,

so we obtain an expression similar to that in (3), linear in the unknown parameters. In general, we have a relation of the form

Y = a_1 X_1 + a_2 X_2 + \cdots + a_r X_r.

Suppose that we have n data points, (X_{1,i}, X_{2,i}, ..., X_{r,i}, Y_i), i = 1, 2, ..., n. Our goal is to minimize the least-squares error:

E = \sum_{i=1}^{n} (Y_i - (a_1 X_{1,i} + a_2 X_{2,i} + \cdots + a_r X_{r,i}))^2.

Differentiating with respect to each parameter a_j and setting the derivatives to zero gives the following system of equations:

a_1 \sum_{i=1}^{n} X_{1,i} X_{1,i} + \cdots + a_r \sum_{i=1}^{n} X_{r,i} X_{1,i} = \sum_{i=1}^{n} X_{1,i} Y_i,
    \vdots
a_1 \sum_{i=1}^{n} X_{1,i} X_{r,i} + \cdots + a_r \sum_{i=1}^{n} X_{r,i} X_{r,i} = \sum_{i=1}^{n} X_{r,i} Y_i.

Solve by writing the equations in matrix form. This is easily done in various computer algebra systems, such as Maple. Maple code for the examples is in the file regression.mw.
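The matrix form can also be solved numerically. A minimal Python/NumPy sketch (with hypothetical data; the handout itself does this in Maple) assembling and solving the normal equations:

```python
import numpy as np

# Multiple regression Y = a1*X1 + ... + ar*Xr: put each X_j in a column of
# the design matrix X and solve the normal equations (X^T X) a = X^T Y.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])                  # hypothetical X_{1,i}, X_{2,i}
Y = np.array([5.1, 3.9, 11.2, 9.8, 15.1])   # hypothetical Y_i

a = np.linalg.solve(X.T @ X, X.T @ Y)       # coefficients a_1, ..., a_r
print(a)                                    # roughly [0.85, 2.15] here
```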