Laboratory 1c: Method of Least Squares


Introduction

Consider the graph of experimental data in Figure 1. In this experiment x is the independent variable and y the dependent variable. Clearly they are correlated with each other, and the correlation appears to be linear. We would like to find the line of the form

    y(x) = a + bx    (1)

that best fits these data, which in truth consist of pairs of measurements (x_i, y_i). While we could eyeball the line, it would be nice to have an objective, well-established method to determine the values of a and b. The most common such method is called the method of least squares.

Figure 1: Plot of Y versus X

Our task is to determine the coefficients a and b in such a way that the discrepancy between the values of our measurements y_i and the corresponding fitted values y(x_i) is minimized. The best we can do is to determine the most probable estimates of the coefficients using the data available. Note that this technique assumes that the uncertainties are all in the y variable and that the uncertainties follow Gaussian statistics. Uncertainties arising from fluctuations in repeated readings of an instrument's scale (caused by settings that are not exactly reproducible) are often Gaussian. These uncertainties are called instrumental regardless of whether they are due to equipment imperfections or to human imprecision.

Method of Maximum Likelihood

Our data consist of a sample of observations extracted from a parent population which determines the probability of making a particular observation. Let us define parent coefficients a_0 and b_0 such that the actual linear relationship between y and x is given by

    y_0(x) = a_0 + b_0 x    (2)

For any given value x_i we can calculate the probability density P_i(x_i) for making the observed measurement y_i, by assuming a Gaussian distribution about the actual value y_0(x_i) with a standard deviation σ_i = σ, i.e.

    P_i(x_i) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{y_i - y_0(x_i)}{\sigma} \right)^2 \right]    (3)

The probability of making simultaneous measurements of all N values y_i is the product of these probabilities:

    \Omega(a_0, b_0) = \prod_{i=1}^{N} P_i(x_i) = \left( \frac{1}{\sigma \sqrt{2\pi}} \right)^{N} \exp\left[ -\frac{1}{2} \sum_{i=1}^{N} \left( \frac{y_i - y_0(x_i)}{\sigma} \right)^2 \right]    (4)

Of course, we do not know the parent distribution values for a_0 and b_0, but for any estimated values of the coefficients a and b we can calculate the probability density for making the observed set of measurements as

    \Omega(a, b) = \left( \frac{1}{\sigma \sqrt{2\pi}} \right)^{N} \exp\left[ -\frac{1}{2} \sum_{i=1}^{N} \left( \frac{\Delta y_i}{\sigma} \right)^2 \right]    (5)

where Δy_i = y_i - (a + b x_i).

The method of maximum likelihood consists of making the assumption that by maximizing equation (5) for the observed set of measurements we are most likely to obtain the best estimates for a_0 and b_0. Maximizing the probability Ω(a, b) is equivalent to minimizing the sum in the exponential. We define the quantity chi squared, χ², to be the sum in the exponential:

    \chi^2 = \sum_{i=1}^{N} \left( \frac{\Delta y_i}{\sigma} \right)^2    (6)

We have used the same symbol χ², defined in Lab 1b, because this is essentially the same definition in a different context. Our method for finding the optimum fit to the data will be to minimize this sum of squared deviations and, hence, to find the fit which produces the smallest χ².
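
To make the criterion concrete, here is a minimal R sketch (not part of the original handout; the data vectors and the common uncertainty sigma are invented for illustration) that evaluates the χ² sum of equation (6) for trial values of a and b. The least-squares fit is whichever pair of coefficients makes this sum smallest.

# Minimal sketch: evaluate the chi-squared sum of equation (6) for trial
# coefficients a and b, using made-up data and an assumed common
# uncertainty sigma.  A smaller chi-squared means a better fit.
x     <- c(1, 2, 3, 4, 5)              # hypothetical independent variable
y     <- c(2.1, 3.9, 6.2, 7.8, 10.1)   # hypothetical measurements
sigma <- 0.2                           # assumed common uncertainty in y

chi_sq <- function(a, b) sum(((y - (a + b * x)) / sigma)^2)

chi_sq(0, 2)      # a reasonable guess gives a small chi-squared
chi_sq(1, 1.5)    # a worse guess gives a much larger chi-squared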

Note: Our development assumes the error associated with any single measurement is the same for all measurements. Modifications to the development must be made when this is not so.

Minimizing χ²

In order to find the values of the coefficients a and b which yield the minimum value of χ², we use the methods of the calculus, i.e.

    \frac{\partial \chi^2}{\partial a} = 0    (7)

and

    \frac{\partial \chi^2}{\partial b} = 0    (8)

Rearranging these equations yields a pair of simultaneous equations to be solved for the coefficients a and b. This will give us the values of the coefficients for which χ² is minimized. This is done with the determinants below. In these equations be sure to distinguish between the square of the sum of the x_i, (Σx_i)², and the sum of the squares of the x_i, Σx_i²:

    a = \frac{1}{\Delta} \left( \sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i \right)

    b = \frac{1}{\Delta} \left( N \sum x_i y_i - \sum x_i \sum y_i \right)

    \Delta = N \sum x_i^2 - \left( \sum x_i \right)^2    (9)
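
As a check on equation (9), the short R sketch below (not from the handout; the x and y vectors are invented for illustration) evaluates a, b, and Δ directly from the sums and compares them with the coefficients returned by lm(), which should agree to within rounding.

# Minimal sketch: least-squares coefficients from equation (9), using
# hypothetical data, compared against R's built-in lm() fit.
x <- c(1, 2, 3, 4, 5)                  # hypothetical independent variable
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)       # hypothetical measurements
N <- length(x)

Delta <- N * sum(x^2) - sum(x)^2
a <- (sum(x^2) * sum(y) - sum(x) * sum(x * y)) / Delta   # intercept
b <- (N * sum(x * y) - sum(x) * sum(y)) / Delta          # slope

c(a = a, b = b)
coef(lm(y ~ x))    # (Intercept) should match a, and the x coefficient should match b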

Estimation of Errors

In order to find the uncertainty in the estimation of the coefficients a and b in our fitting procedure, we refer to our discussion of the propagation of errors in Laboratory 1. Each of our data points y_i has been used in the determination of the parameters, and each has contributed some fraction of its own uncertainty to the uncertainty in our final determination. Ignoring systematic errors, which would introduce correlations between the uncertainties, the standard deviation σ_z of the determination of any parameter z is given by

    \sigma_z^2 = \sum_{i=1}^{N} \sigma_i^2 \left( \frac{\partial z}{\partial y_i} \right)^2    (10)

If we assume that the uncertainties are instrumental and all the same, they can be estimated from the data. Our definition in Laboratory 1 of the sample variance s², which approximates σ², is the sum of the squares of the deviations of the data points from the calculated mean divided by the number of degrees of freedom. In this case the number of degrees of freedom is the number of data points minus the number of parameters (two) which we determined before calculating s². Thus, our estimated parent standard deviation σ_i = σ ≈ s is given by

    s^2 = \frac{1}{N - 2} \sum_{i=1}^{N} \left( y_i - a - b x_i \right)^2    (11)

Note that it is this common uncertainty, σ, which we have minimized by our least-squares fitting procedure. The derivatives in equation (10) can be evaluated by taking the derivatives of equations (9), and we can find expressions for the uncertainties in the parameters a and b, i.e.

    \sigma_a^2 = \frac{\sigma^2}{\Delta} \sum x_i^2 , \qquad \sigma_b^2 = \frac{N \sigma^2}{\Delta}    (12)

It may not be obvious from these forms, but the larger the number of data points, the smaller the errors in the quantities a and b.
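
The following sketch (again not from the handout, using the same invented data as the previous sketch) evaluates s, σ_a, and σ_b from equations (11) and (12) and compares them with the residual standard error and coefficient standard errors reported by summary(lm()).

# Minimal sketch: uncertainties in a and b from equations (11) and (12),
# compared with lm()'s standard errors.  Data are hypothetical.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
N <- length(x)

Delta <- N * sum(x^2) - sum(x)^2
a <- (sum(x^2) * sum(y) - sum(x) * sum(x * y)) / Delta
b <- (N * sum(x * y) - sum(x) * sum(y)) / Delta

s       <- sqrt(sum((y - a - b * x)^2) / (N - 2))   # equation (11)
sigma_a <- s * sqrt(sum(x^2) / Delta)               # equation (12)
sigma_b <- s * sqrt(N / Delta)                      # equation (12)
c(s = s, sigma_a = sigma_a, sigma_b = sigma_b)

fit <- lm(y ~ x)
summary(fit)$sigma                  # residual standard error, matches s
coef(summary(fit))[, "Std. Error"]  # matches sigma_a and sigma_b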

Using R to graph and do least squares regression

A follow-along example. The two most important commands are xyplot for graphing and lm for fitting a linear model. At their most basic they have the syntax xyplot(y~x, data=data_frame) and lm(y~x, data=data_frame). An example is useful; we will use the rubberband dataset that comes with the fastR package. First we must bring fastR and rubberband into our current session:

require(fastR)
data(rubberband)

Next we will look at the structure of the dataset:

head(rubberband)

Graphing

From this you will see that there are two variables: Stretch and Distance. Stretch is the one we have control over, so it is the independent variable and Distance is the dependent variable. Use xyplot to graph the data: give the variable names separated by a tilde and specify the data as coming from the data frame rubberband.

xyplot(distance~stretch, data=rubberband, xlab="Stretch (cm)", ylab="Distance (cm)")

Here Distance is the dependent variable and Stretch is the independent variable. By default xyplot() has used type "p" (for points). We can make this explicit:

xyplot(distance~stretch, data=rubberband, type=c("p"))

Other types exist; the type that is relevant here is "r", for regression:

xyplot(distance~stretch, data=rubberband, type=c("r"))

Even better is to use both on the same graph:

xyplot(distance~stretch, data=rubberband, type=c("p","r"))

And to finish it off, add error bars using an uncertainty of s = 5 cm:

s <- 5
xyplot(distance~stretch, rubberband,
       lb=rubberband$distance-s, ub=rubberband$distance+s,
       panel=function(x, y, lb, ub, ...){
         panel.xyplot(x, y, type=c("p","r"), ...)
         panel.segments(x0=x, x1=x, y0=lb, y1=ub, ...)
       })

Note that an error bar about a data point reflects the fact that another measurement would reproduce the first value within one standard deviation about 68% of the time.
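
If you want to see explicitly that the type="r" line is the same line that lm() computes, the hedged sketch below (not part of the handout; it assumes the rubberband data frame and variable names used above) fits the model first and then draws it with panel.abline().

# Sketch: draw the lm() fit explicitly instead of relying on type = "r".
# Assumes the rubberband data frame (with variables distance and stretch)
# has been loaded as above.
require(lattice)
fit <- lm(distance ~ stretch, data = rubberband)
xyplot(distance ~ stretch, data = rubberband,
       panel = function(x, y, ...) {
         panel.xyplot(x, y, type = "p", ...)
         panel.abline(reg = fit)   # same line that type = "r" would draw
       })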

Fitting

It is useful to see the least-squares regression line graphed with the data in order to judge how meaningful the fit is, but it is also useful to know the coefficients in the equation of the line,

    Distance = a + b Stretch.    (13)

Try using lm:

lm(distance~stretch, data=rubberband)

This gives the most basic information: the intercept (a) and the slope for the Stretch variable (b). We can extract more information if we look at the linear model's summary:

summary(lm(distance~stretch, data=rubberband))

This gives a variety of information, including estimates for the coefficients with their uncertainties (a = 100 +/- 20, b = 5 +/- 4), the probabilities that the coefficient values could be explained by chance instead of a real correlation (1.3e-3% for a and 7.4e-7% for b), and an estimate of the parent standard deviation, s, which is reported as the residual standard error (s = 19.15). Even more valuable than s is the R² value, which gives the fraction of the variance in the data that is explained by the model (R² = 0.91). A model that does a perfect job will have R² = 1.

Experimental Procedure for the General Case

Given the data shown in Table 1 below:

Table 1: Experimental data for temperature versus position along a rod

Trial   X (cm)   T (°C)
1       1.0      15.6
2       2.0      17.5
3       3.0      36.6
4       4.0      43.8
5       5.0      58.2
6       6.0      61.6
7       7.0      64.2
8       8.0      70.4
9       9.0      98.8

1. Determine the best parameters a and b for the equation T = a + bX. Note that we are assuming that all of the error is associated with the measurement of temperature, not length. (A minimal R sketch for entering and fitting these data follows this list.)
2. Determine the standard deviation of the temperature data.
3. Determine the errors associated with a and b.
4. Express the thermal gradient, i.e. the slope of the line, in a manner suitable for reporting.
5. Plot the data and the fit.
6. Use the residual standard error, s, to draw error bars of temperature ± s on each data point.
7. Assume the thermometer has 1 °C markings (ΔT = 0.5 °C). How does this compare to s?
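
As referenced in step 1, here is a minimal sketch (not part of the handout) for entering the Table 1 data and obtaining the fit in R; the data-frame and column names rod, X, and T are illustrative choices, not prescribed by the handout.

# Minimal sketch: enter the Table 1 data and fit T = a + b X.
rod <- data.frame(
  X = 1:9,                                                      # position (cm)
  T = c(15.6, 17.5, 36.6, 43.8, 58.2, 61.6, 64.2, 70.4, 98.8)   # temperature (deg C)
)
rod.fit <- lm(T ~ X, data = rod)
summary(rod.fit)                            # a, b, their uncertainties, residual std. error

require(lattice)
xyplot(T ~ X, data = rod, type = c("p", "r"),
       xlab = "X (cm)", ylab = "T (deg C)") # data with the fitted line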

Specific Case for Constraint a = 0

An important subset of the general problem discussed above is the linear function constrained to pass through the origin, i.e.

    y(x) = c x    (14)

This is a linear problem, but it represents a function which has a y intercept of zero. We follow the same approach as above, starting with the definition of χ², and we proceed as in the previous section and find the minimum of χ² with respect to c by letting

    \frac{\partial \chi^2}{\partial c} = 0 , \quad \text{which gives} \quad c = \frac{\sum x_i y_i}{\sum x_i^2}    (15)

The standard deviation associated with a data point y_i in this case is

    s^2 = \frac{1}{N - 1} \sum_{i=1}^{N} \left( y_i - c x_i \right)^2    (16)

where the N − 1 factor appears instead of N − 2 since we have only used the data once, to determine c. To find the error in the slope we find the change in c with respect to y_i, i.e. ∂c/∂y_i; then we obtain

    \sigma_c^2 = \frac{\sigma^2}{\sum_{i=1}^{N} x_i^2}    (17)

Thus, equations (15), (16), and (17) provide us with the linear least-squares fit of data constrained to go through the origin.
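
To connect equations (15)-(17) with the R commands used in the procedures below, here is a minimal sketch (not part of the handout; the x and y vectors are invented) that computes c and σ_c directly from the sums and compares them with a zero-intercept lm() fit.

# Minimal sketch: zero-intercept fit from equations (15)-(17), compared
# with lm(y ~ x + 0).  Data are hypothetical.
x <- c(1, 2, 3, 4, 5)
y <- c(3.2, 6.1, 9.4, 12.3, 15.2)
N <- length(x)

c_hat   <- sum(x * y) / sum(x^2)                    # equation (15)
s       <- sqrt(sum((y - c_hat * x)^2) / (N - 1))   # equation (16)
sigma_c <- s / sqrt(sum(x^2))                       # equation (17)
c(c_hat = c_hat, sigma_c = sigma_c)

fit0 <- lm(y ~ x + 0)
coef(fit0)                           # slope, matches c_hat
coef(summary(fit0))[, "Std. Error"]  # matches sigma_c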

Experimental Procedure for the Specific Case of Data Passing Through the Origin

Measuring π: You are given a set of aluminum disks with different diameters.

1. Devise a technique for measuring the circumference of each disk and label these data points as circ.
2. Determine the best-fit parameter p for the equation circ = p diam. In R this is done by giving the intercept an explicit value of 0:

   disk.model <- lm(circ~diam+0, data=disk)

   Note that we are choosing to treat the diameter as the independent variable; this is arbitrary, and it is equally valid the other way around.
3. Determine the standard deviation associated with each circ.
4. Determine the error associated with p.
5. Use your model to predict a circumference for a diameter of 10 cm:

   pdata <- predict(disk.model, newdata=data.frame(diam=10))

6. Use your model to predict a circumference for all diameters in your dataset.
7. Use the results of step 6 to plot the data and the intercept = 0 fit:

   xyplot(circ+pdata~diam, data=disk, type="p")

8. Record the best-fit parameter for the slope, p, as well as the estimated error in the slope. Compare to the known value of π and discuss the significance of your results for the goodness of your data.

Ohm's Law: You are given a voltage supply, a decade resistor box, an analog current meter, and a digital volt meter. Set the analog current meter to the 25 milliamp scale and the decade resistor box to 10 ohms. Use the digital meter to record the resistance of the decade resistor box at this setting when it is not yet connected to anything else. Now connect the voltage supply, the resistor box, and the current meter into a simple series circuit. Place the digital volt meter in parallel across the terminals of the voltage supply and set it to measure DC voltages up to 200 mV. Using this setup:

1. Measure and record the current through the resistor as well as the voltage of the power supply for at least 10 different voltages between 0 and 200 mV. (NOTE: be sure not to exceed 200 mV so as not to damage the current meter.)
2. Plot the voltage versus current data you collected using R and find what it determines to be the best-fit parameter for the slope as well as the estimated uncertainty in the slope. Make sure to constrain the intercept to zero when fitting the line to your data. (A hedged R sketch of this analysis follows at the end of this handout.)
3. Using Ohm's Law (V = IR), compare what you found from the graph for the resistance of the decade resistor box to what the digital meter found it to be. If they do not agree within the level of your experimental precision, discuss possible reasons for the differences. Where possible, cite specific measurements or estimated values taken from the laboratory that support your suppositions for the sources of the error (i.e., do not just say "human error in reading the meters" and leave it at that).

WHEN YOU FINISH: LEAVE THINGS AS YOU FOUND THEM!
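
As referenced in step 2 of the Ohm's-law procedure, here is a minimal sketch (not part of the handout; the current and voltage vectors are invented placeholders for your own measurements) of the zero-intercept fit and the resulting resistance estimate.

# Sketch: zero-intercept fit of voltage versus current (Ohm's law, V = I R).
# Replace these invented numbers with your own measurements.
I_mA <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)            # current in mA
V_mV <- c(21, 39, 62, 78, 101, 119, 142, 158, 181, 199)  # voltage in mV

ohm <- data.frame(I = I_mA / 1000, V = V_mV / 1000)      # convert to A and V
ohm.fit <- lm(V ~ I + 0, data = ohm)                     # slope = resistance in ohms
summary(ohm.fit)                                         # slope and its standard error
coef(ohm.fit)                                            # compare with the meter reading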