BIOE 198MI Biomedical Data Analysis. Spring Semester Lab 6: Least Squares Curve Fitting Methods

Size: px
Start display at page:

Download "BIOE 198MI Biomedical Data Analysis. Spring Semester Lab 6: Least Squares Curve Fitting Methods"

Transcription

1 BIOE 98MI Biomedical Data Analysis. Spring Semester 09. Lab 6: Least Squares Curve Fitting Methods A. Modeling Data You have a pair of night vision goggles for seeing IR fluorescent photons emerging from the skin of patients whose lymphatic vessels were injected with a fluorescent dye (see Fig ). These goggles are necessary because the detector can see light levels below the sensitivity of the human eye. The manufacturer says the output voltage signal is directly proportional to the input light levels with slope one. Great! However, there is additive noise. Before noise suppression, the output signal from the goggles y relative to deterministic input x is modeled simply as: y x + n, where for each x the noise sample n is a time-independent sample drawn from the normal pdf, (0,0). (In reality, x is a Poisson random variable, which we ignore in this example.) Assume the units of all three terms is micro candela per square meter (cd/m ). Fig.. Images from the lab of Dr. Eva Sevick-Muraca, University of Texas Health Science Center at Houston, TX. Publication: Journal of Vascular Surgery: Venous and Lymphatic Disorders, Volume 4, Issue, January 06, Pages 9-7. Example. You obtain one of these devices and decide to experiment in the lab before seeing patients. The experiment is to input a known quantity of light x into the device and measure its response y. The experiment begins with input flux x() 0 μcd/m and is indexed in steps of 0 cd/m, i.e., 0, 0,, 00 cd/m while output y (t) is recorded. The results are: Experimental Data: x y

2 Plotting the x,y pairs of recorded measurements, you find the red circles in Fig. Compared to the noise free model provided by the manufacturer, y x (solid black line), the results are not very impressive because of the noise. The device assumes a linear input-output relationship, so these data can be fit to a linear regression line of the form z P x P using the method of least squares: MSSD argmin d i (x) argmin (y i P x i P ). P,P P,P i This equation tells us to find parameters P and P that yield the minimum sum squared differences (MSSD) between each of the 0 measurements and a straight line. The differences (see Fig below) are defined as i Fig.. (Above) Graphical representation of data in Example. Least-squares regression line is z while modeled output are labeled y and measured data y. (Below) The distances d between y and z at each x are displayed along with z equation, GOF and R values. d i y i z i y i (P x i P ), for i. For statistical optimization reasons, we use d i instead of d i. otice that in this example, the least-squares regression line z is not the same as the model function y. To apply the method of least squares, we must assume y is linearly related to x or a transformation of x deviations from the regression line (residuals d i ) are normally distributed random variables (d i, σ i ) all variances i are equal. My code for generating these results is (note the new plotting function at the end!) close all x0:0:00;rng('default');yx+0*randn(size(x)); yx+n where n~(0,0) Ppolyfit(x,y,); Apply the method of least squares: P(),P() zpolyval(p,x); Compute the regression line zp*x + P myplot(x,x,':b');hold on;myplot(x,y,'or'),myplot(x,z,'-k') xlabel('x (input)');ylabel('output'); legend('x','y','z','location','southeast') qgoodnessoffit(y,z,'mse'); str['gof between z and y is ' numstr(q)]; disp(str); qgoodnessoffit(y,x,'mse'); str['gof between x and y is ' numstr(q)]; disp(str); The following correlation coefficient relates x to y. Acorrcoef(x,y);str['R^ for x vs y is ' numstr(a(,))]; disp(str); str4['regression line: y ' numstr(p()) 'x + ' numstr(p())]; disp(str4); function myplot(x,y,z) I adapted myplot to modify line/point types plot(x,y,numstr(z),'linewidth',) ax gca; current axes ax.fontsize 0; end

3 ote how I adapted myplot from Lab b, I called it myplot, to add the ability to control line and point properties. The Pearson correlation coefficient R is found from either of the two off-diagonal terms obtained from the correlation matrix generated between x and y by the Matlab function corrcoef(x,y); R square of sample covariance product of two sample variances s xy s x s ( i (x i x ) (y i y )) y i(x i x ) (y i y ) [ (x i y i x i j y j y i x j + j x i j j j y j )] [ (x i x i j x j + ( j x i j) )] [ (y i y i j y j + ( y j [ i x i y i i x i [ i x i ( i x i)( j x j ) + ( x j i i j ) )] j y j i y i j x j + ( j x j y j j ) ] [ i y i i j )] ( y i j ) i )( j y j ) + ( y j i ] [ i x i y i x i i i y i i y i i x i + x i i y i i ) [ i x i ( i x i)( i x i ) + ( x i [ i x i y i ( i x i )( i y i )] [ x i ( j x j ) i ] [ y i ( j y j ) i ] Whew! To follow the math, note that y similarly for x. ] [ i y i ( y i i ) i )( i y i ) + ( y i [ i x iy i x y ] [ x i x i ] i ] [ y i y i y i j y j. Also, ( i j ) y y j i y and ] ] Pearson s correlation coefficient R describes the strength of the linear relationship between two functions or arrays of data (Fig 3). It estimates the fraction of sample variance s y explained by changes in x. While 0 R, we also have R. For example, when R ~, (see S4 in Fig 3 and in Example above where R ~ 0.96), we see that variations in x mostly explain variation in y ; increasing one variable by unit increases the other by approximately one unit and we say the two variables are strongly positively correlated. However, when R 0 (see S and S3 in Fig 3), there is no relationship between the two variables. The downside of correlation is that it doesn t tell us which variable is dependent. We will come back to R and R in a minute. First, another test statistic of interest is the Goodness of Fit measure, GOF. Using the Matlab function goodnessoffit, I separately tested how well x and z in Fig each represent y by selecting mean-square error as the GOF metric: MSE i (y i z i ). Since Fig. 3. -D scatter plots for two variables with different means, variances, and correlation coefficients. S: R -0.40, sy < sx. S: R 0, sy > sx. S3: R 0, sy sx. S4: R 0.98, sy sx. The code is given at the end of this lab write up. 3

4 regression line z is the MSSD result, it make sense that z will fit data y even better than the true underlying model y. Summary: Correlation R is useful for quantifying the existence of relationships between variables, but it cannot establish causal relationships. That is, there is no way to tell if one variable is causing the response observed by the other; i.e., is y (x) true or is x(y ) true? We prefer R over R when correlation is used to explain variance in the data. However, we use R to describe scatter plots (Fig 3) because it tells us if the correlation between variables is negative or positive (compare S and S4 in Fig 3). While R describes correlation between x and y, GOF describe how well y is described by linear regression line z. Exercise. Look at the line of code below. It generates random data in three columns. The first two columns are uncorrelated random data, R ~ 0, while the last two are perfectly correlated random data, R. If the correlation coefficient between columns i and j is R ij, the resulting correlation matrix from R R R 3 Matlab has elements given by ( R R 3 R R 3 R 3 ). What fraction of each random variable can be R 33 explained by the others? Why is the matrix symmetric about the diagonal? clear all;yzeros(000,3);yrandn(000,);y(:,3)3*y(:,);ccorrcoef(y) C Assignment : Obtain the file lab6.mat that contains three variables. t is a time axis while y and y are data arrays such that y (t) and y (t). I suggest you approach this assignment in stages. (a) Plot y and y versus t. Adapt the code from Example above to propose linear models for both y and y. ote that y is one data recording without signal averaging, while y a data recording resulting after averaging 0 waveforms. The linear models will not fit well, so now try higher-order polynomials. (b) Change the code to enter a range of m in Ppolyfit(t,y,m) from the keyboard, where 0 m 8 is the order of polynomial generated to fit to the data. P is a vector of values where P()*t.^m + P()*t.^(m-) P(m)*t + P(m+). Use goodnessoffit to compute the MSE values between z and y and between z and y as a function of m. (c) Create code that allows the operator to view the MSE values versus m from (b) and then input a value of m from the keyboard to see plots of both y and y. Explore plots of model fits for various values of m and tell me what you consider achieves the best fit. At m0, for example, the model is a constant, and hence MSE should be large. At m8, the model is clearly fitting noise. Hence, you need to tell me how you would best select m based on your observations of the analysis of y and y. ote that you do not know the true model function, so your answer needs to be related from your analysis. This discussion needs to appear in the DISCUSSIO section. Grading: pts for introduction of the problem. pts for methods adopted to decide on optimal model. 3 pts for visualization of results that will lead to a convincing decision given in Discussion & Conclusions ( pts). pt for overall appearance of the report. 4

5 Code that generated Figure 3 scatter plots. ot an assignment! The following script simulates four flow cytometry data sets that are each bivariate normal. Parameter vectors vary between data sets. clear all; close all; rho-0.4;s[00 40];m[500 80];50; First data set zrandn(); ^ is the number of data points simulated Xs()*z+m(); X and Y convert from standard normal pdf Ys()*(rho*z+sqrt(-rho^)*randn())+m(); plot(x,y,'k.');axis([ ]);axis square; plot on fixed axis text(730,70,'s');hold on label the first group S rho0;s[40 00];m[ ];0; Second data set zrandn(); Xs()*z+m(); Ys()*(rho*z+sqrt(-rho^)*randn())+m(); plot(x,y,'ro');text(60,880,'s') rho0;s3[30 30];m3[00 700];330; Third data set zrandn(3); X3s3()*z+m3(); Y3s3()*(rho*z+sqrt(-rho^)*randn(3))+m3(); plot(x3,y3,'bx') text(80,850,'s3') rho0.98;s4[40 40];m4[ ];450; Fourth data set zrandn(4); X4s4()*z+m4(); Y4s4()*(rho*z+sqrt(-rho^)*randn(4))+m4(); plot(x4,y4,'bx');hold off title('bivariant ormal Flow Cytometry Data'); xlabel('side Scatter Intensity');ylabel('Forward Scatter Intensity') text(850,580,'s4');hold off save('f','x','y','x','y','x3','y3') use these for the assignment load('f.mat'); then use whos command to see what is loaded [rho]corr(x,y);rtrace(rho)/; Here we estimate Pearson s str ['rho_ ' numstr(r)];disp(str) correlation and average for [rho]corr(x,y);rtrace(rho)/; all points along diagonal str ['rho_ ' numstr(r)];disp(str) [rho3]corr(x3,y3);r3trace(rho3)/3; str ['rho_3 ' numstr(r3)];disp(str) [rho4]corr(x4,y4);r4trace(rho4)/4; str ['rho_4 ' numstr(r4)];disp(str) 5

BIOE 198MI Biomedical Data Analysis. Spring Semester Lab 4: Introduction to Probability and Statistics

BIOE 198MI Biomedical Data Analysis. Spring Semester Lab 4: Introduction to Probability and Statistics BIOE 98MI Biomedical Data Analysis. Spring Semester 08. Lab 4: Introduction to Probability and Statistics A. Discrete Distributions: The probability of specific event E occurring, Pr(E), is given by ne

More information

BIOE 198MI Biomedical Data Analysis. Spring Semester Lab 4: Introduction to Probability and Random Data

BIOE 198MI Biomedical Data Analysis. Spring Semester Lab 4: Introduction to Probability and Random Data BIOE 98MI Biomedical Data Analysis. Spring Semester 209. Lab 4: Introduction to Probability and Random Data A. Random Variables Randomness is a component of all measurement data, either from acquisition

More information

Chapter 12 - Part I: Correlation Analysis

Chapter 12 - Part I: Correlation Analysis ST coursework due Friday, April - Chapter - Part I: Correlation Analysis Textbook Assignment Page - # Page - #, Page - # Lab Assignment # (available on ST webpage) GOALS When you have completed this lecture,

More information

EM375 STATISTICS AND MEASUREMENT UNCERTAINTY CORRELATION OF EXPERIMENTAL DATA

EM375 STATISTICS AND MEASUREMENT UNCERTAINTY CORRELATION OF EXPERIMENTAL DATA EM375 STATISTICS AND MEASUREMENT UNCERTAINTY CORRELATION OF EXPERIMENTAL DATA In this unit of the course we use statistical methods to look for trends in data. Often experiments are conducted by having

More information

BIOSTATISTICS NURS 3324

BIOSTATISTICS NURS 3324 Simple Linear Regression and Correlation Introduction Previously, our attention has been focused on one variable which we designated by x. Frequently, it is desirable to learn something about the relationship

More information

BIOE 198MI Biomedical Data Analysis. Spring Semester Lab 5: Introduction to Statistics

BIOE 198MI Biomedical Data Analysis. Spring Semester Lab 5: Introduction to Statistics BIOE 98MI Biomedical Data Analysis. Spring Semester 209. Lab 5: Introduction to Statistics A. Review: Ensemble and Sample Statistics The normal probability density function (pdf) from which random samples

More information

Temperature measurement

Temperature measurement Luleå University of Technology Johan Carlson Last revision: July 22, 2009 Measurement Technology and Uncertainty Analysis - E7021E Lab 3 Temperature measurement Introduction In this lab you are given a

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

A First Course on Kinetics and Reaction Engineering Example S3.1

A First Course on Kinetics and Reaction Engineering Example S3.1 Example S3.1 Problem Purpose This example shows how to use the MATLAB script file, FitLinSR.m, to fit a linear model to experimental data. Problem Statement Assume that in the course of solving a kinetics

More information

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression. 10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for

More information

Chapter 5: Data Transformation

Chapter 5: Data Transformation Chapter 5: Data Transformation The circle of transformations The x-squared transformation The log transformation The reciprocal transformation Regression analysis choosing the best transformation TEXT:

More information

Results and Analysis 10/4/2012. EE145L Lab 1, Linear Regression

Results and Analysis 10/4/2012. EE145L Lab 1, Linear Regression EE145L Lab 1, Linear Regression 10/4/2012 Abstract We examined multiple sets of data to assess the relationship between the variables, linear or non-linear, in addition to studying ways of transforming

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Simple Linear Regression

Simple Linear Regression 9-1 l Chapter 9 l Simple Linear Regression 9.1 Simple Linear Regression 9.2 Scatter Diagram 9.3 Graphical Method for Determining Regression 9.4 Least Square Method 9.5 Correlation Coefficient and Coefficient

More information

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

More information

Big Data Analysis with Apache Spark UC#BERKELEY

Big Data Analysis with Apache Spark UC#BERKELEY Big Data Analysis with Apache Spark UC#BERKELEY This Lecture: Relation between Variables An association A trend» Positive association or Negative association A pattern» Could be any discernible shape»

More information

Correlation Analysis

Correlation Analysis Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the

More information

Math 343 Lab 7: Line and Curve Fitting

Math 343 Lab 7: Line and Curve Fitting Objective Math 343 Lab 7: Line and Curve Fitting In this lab, we explore another use of linear algebra in statistics. Specifically, we discuss the notion of least squares as a way to fit lines and curves

More information

STA121: Applied Regression Analysis

STA121: Applied Regression Analysis STA121: Applied Regression Analysis Linear Regression Analysis - Chapters 3 and 4 in Dielman Artin Department of Statistical Science September 15, 2009 Outline 1 Simple Linear Regression Analysis 2 Using

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Newton s Cooling Model in Matlab and the Cooling Project!

Newton s Cooling Model in Matlab and the Cooling Project! Newton s Cooling Model in Matlab and the Cooling Project! James K. Peterson Department of Biological Sciences and Department of Mathematical Sciences Clemson University March 10, 2014 Outline Your Newton

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

5-Sep-15 PHYS101-2 GRAPHING

5-Sep-15 PHYS101-2 GRAPHING GRAPHING Objectives 1- To plot and analyze a graph manually and using Microsoft Excel. 2- To find constants from a nonlinear relation. Exercise 1 - Using Excel to plot a graph Suppose you have measured

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Lecture 2 January 27, 2005 Lecture #2-1/27/2005 Slide 1 of 46 Today s Lecture Simple linear regression. Partitioning the sum of squares. Tests of significance.. Regression diagnostics

More information

Can you tell the relationship between students SAT scores and their college grades?

Can you tell the relationship between students SAT scores and their college grades? Correlation One Challenge Can you tell the relationship between students SAT scores and their college grades? A: The higher SAT scores are, the better GPA may be. B: The higher SAT scores are, the lower

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Unit 1 Linear Models. Answer. Question. Model. Question. Answer. 3 Types of Models. Model: A simplified version of reality

Unit 1 Linear Models. Answer. Question. Model. Question. Answer. 3 Types of Models. Model: A simplified version of reality Model: A simplified version of reality Model Example: Free Fall is a model of falling objects that ignores air resistance. Free Fall assumes that only gravity affects falling motion. Real fall in an atmosphere

More information

University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries

University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout :. The Multivariate Gaussian & Decision Boundaries..15.1.5 1 8 6 6 8 1 Mark Gales mjfg@eng.cam.ac.uk Lent

More information

Companion. Jeffrey E. Jones

Companion. Jeffrey E. Jones MATLAB7 Companion 1O11OO1O1O1OOOO1O1OO1111O1O1OO 1O1O1OO1OO1O11OOO1O111O1O1O1O1 O11O1O1O11O1O1O1O1OO1O11O1O1O1 O1O1O1111O11O1O1OO1O1O1O1OOOOO O1111O1O1O1O1O1O1OO1OO1OO1OOO1 O1O11111O1O1O1O1O Jeffrey E.

More information

IMMERSE 2008: Assignment 4

IMMERSE 2008: Assignment 4 IMMERSE 2008: Assignment 4 4.) Let A be a ring and set R = A[x,...,x n ]. For each let R a = A x a...xa. Prove that is an -graded ring. a = (a,...,a ) R = a R a Proof: It is necessary to show that (a)

More information

6.867 Machine learning

6.867 Machine learning 6.867 Machine learning Mid-term eam October 8, 6 ( points) Your name and MIT ID: .5.5 y.5 y.5 a).5.5 b).5.5.5.5 y.5 y.5 c).5.5 d).5.5 Figure : Plots of linear regression results with different types of

More information

Notes on Time Series Modeling

Notes on Time Series Modeling Notes on Time Series Modeling Garey Ramey University of California, San Diego January 17 1 Stationary processes De nition A stochastic process is any set of random variables y t indexed by t T : fy t g

More information

Lab 4: Quantization, Oversampling, and Noise Shaping

Lab 4: Quantization, Oversampling, and Noise Shaping Lab 4: Quantization, Oversampling, and Noise Shaping Due Friday 04/21/17 Overview: This assignment should be completed with your assigned lab partner(s). Each group must turn in a report composed using

More information

LABORATORY NUMBER 9 STATISTICAL ANALYSIS OF DATA

LABORATORY NUMBER 9 STATISTICAL ANALYSIS OF DATA LABORATORY NUMBER 9 STATISTICAL ANALYSIS OF DATA 1.0 INTRODUCTION The purpose of this laboratory is to introduce the student to the use of statistics to analyze data. Using the data acquisition system

More information

INTERSECTIONS OF PLANES

INTERSECTIONS OF PLANES GG303 Lab 4 9/17/14 1 INTERSECTIONS OF PLANES Lab 4 (125 points total) Read each exercise completely before you start it so that you understand the problem-solving approach you are asked to execute This

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Introduction to Statistical Inference

Introduction to Statistical Inference Structural Health Monitoring Using Statistical Pattern Recognition Introduction to Statistical Inference Presented by Charles R. Farrar, Ph.D., P.E. Outline Introduce statistical decision making for Structural

More information

APPENDIX 1 BASIC STATISTICS. Summarizing Data

APPENDIX 1 BASIC STATISTICS. Summarizing Data 1 APPENDIX 1 Figure A1.1: Normal Distribution BASIC STATISTICS The problem that we face in financial analysis today is not having too little information but too much. Making sense of large and often contradictory

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

INTERSECTIONS OF PLANES

INTERSECTIONS OF PLANES GG303 Lab 4 10/9/13 1 INTERSECTIONS OF PLANES Lab 4 Read each exercise completely before you start it so that you understand the problem-solving approach you are asked to execute This will help keep the

More information

Data Analysis, Standard Error, and Confidence Limits E80 Spring 2015 Notes

Data Analysis, Standard Error, and Confidence Limits E80 Spring 2015 Notes Data Analysis Standard Error and Confidence Limits E80 Spring 05 otes We Believe in the Truth We frequently assume (believe) when making measurements of something (like the mass of a rocket motor) that

More information

Regression. Estimation of the linear function (straight line) describing the linear component of the joint relationship between two variables X and Y.

Regression. Estimation of the linear function (straight line) describing the linear component of the joint relationship between two variables X and Y. Regression Bivariate i linear regression: Estimation of the linear function (straight line) describing the linear component of the joint relationship between two variables and. Generally describe as a

More information

Practical Statistics

Practical Statistics Practical Statistics Lecture 1 (Nov. 9): - Correlation - Hypothesis Testing Lecture 2 (Nov. 16): - Error Estimation - Bayesian Analysis - Rejecting Outliers Lecture 3 (Nov. 18) - Monte Carlo Modeling -

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

Quantitative Understanding in Biology Principal Components Analysis

Quantitative Understanding in Biology Principal Components Analysis Quantitative Understanding in Biology Principal Components Analysis Introduction Throughout this course we have seen examples of complex mathematical phenomena being represented as linear combinations

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

CORELATION - Pearson-r - Spearman-rho

CORELATION - Pearson-r - Spearman-rho CORELATION - Pearson-r - Spearman-rho Scatter Diagram A scatter diagram is a graph that shows that the relationship between two variables measured on the same individual. Each individual in the set is

More information

Measuring Associations : Pearson s correlation

Measuring Associations : Pearson s correlation Measuring Associations : Pearson s correlation Scatter Diagram A scatter diagram is a graph that shows that the relationship between two variables measured on the same individual. Each individual in the

More information

4 Bias-Variance for Ridge Regression (24 points)

4 Bias-Variance for Ridge Regression (24 points) Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,

More information

SECTION 7: CURVE FITTING. MAE 4020/5020 Numerical Methods with MATLAB

SECTION 7: CURVE FITTING. MAE 4020/5020 Numerical Methods with MATLAB SECTION 7: CURVE FITTING MAE 4020/5020 Numerical Methods with MATLAB 2 Introduction Curve Fitting 3 Often have data,, that is a function of some independent variable,, but the underlying relationship is

More information

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,

More information

Bivariate Paired Numerical Data

Bivariate Paired Numerical Data Bivariate Paired Numerical Data Pearson s correlation, Spearman s ρ and Kendall s τ, tests of independence University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

Detection and Estimation Theory

Detection and Estimation Theory ESE 524 Detection and Estimation Theory Joseph A. O Sullivan Samuel C. Sachs Professor Electronic Systems and Signals Research Laboratory Electrical and Systems Engineering Washington University 2 Urbauer

More information

New Mexico Tech Hyd 510

New Mexico Tech Hyd 510 Vectors vector - has magnitude and direction (e.g. velocity, specific discharge, hydraulic gradient) scalar - has magnitude only (e.g. porosity, specific yield, storage coefficient) unit vector - a unit

More information

Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting)

Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Professor: Aude Billard Assistants: Nadia Figueroa, Ilaria Lauzana and Brice Platerrier E-mails: aude.billard@epfl.ch,

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the

More information

b) (1) Using the results of part (a), let Q be the matrix with column vectors b j and A be the matrix with column vectors v j :

b) (1) Using the results of part (a), let Q be the matrix with column vectors b j and A be the matrix with column vectors v j : Exercise assignment 2 Each exercise assignment has two parts. The first part consists of 3 5 elementary problems for a maximum of 10 points from each assignment. For the second part consisting of problems

More information

Data Analysis, Standard Error, and Confidence Limits E80 Spring 2012 Notes

Data Analysis, Standard Error, and Confidence Limits E80 Spring 2012 Notes Data Analysis Standard Error and Confidence Limits E80 Spring 0 otes We Believe in the Truth We frequently assume (believe) when making measurements of something (like the mass of a rocket motor) that

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

Lecture 11 Multiple Linear Regression

Lecture 11 Multiple Linear Regression Lecture 11 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 11-1 Topic Overview Review: Multiple Linear Regression (MLR) Computer Science Case Study 11-2 Multiple Regression

More information

Computational Physics

Computational Physics Interpolation, Extrapolation & Polynomial Approximation Lectures based on course notes by Pablo Laguna and Kostas Kokkotas revamped by Deirdre Shoemaker Spring 2014 Introduction In many cases, a function

More information

Biostatistics 4: Trends and Differences

Biostatistics 4: Trends and Differences Biostatistics 4: Trends and Differences Dr. Jessica Ketchum, PhD. email: McKinneyJL@vcu.edu Objectives 1) Know how to see the strength, direction, and linearity of relationships in a scatter plot 2) Interpret

More information

Graphs. 1. Graph paper 2. Ruler

Graphs. 1. Graph paper 2. Ruler Graphs Objective The purpose of this activity is to learn and develop some of the necessary techniques to graphically analyze data and extract relevant relationships between independent and dependent phenomena,

More information

MATLAB Workbook CME106. Introduction to Probability and Statistics for Engineers. First Edition. Vadim Khayms

MATLAB Workbook CME106. Introduction to Probability and Statistics for Engineers. First Edition. Vadim Khayms MATLAB Workbook CME106 Introduction to Probability and Statistics for Engineers First Edition Vadim Khayms Table of Contents 1. Random Number Generation 2. Probability Distributions 3. Parameter Estimation

More information

1. Introduc9on 2. Bivariate Data 3. Linear Analysis of Data

1. Introduc9on 2. Bivariate Data 3. Linear Analysis of Data Lecture 3: Bivariate Data & Linear Regression 1. Introduc9on 2. Bivariate Data 3. Linear Analysis of Data a) Freehand Linear Fit b) Least Squares Fit c) Interpola9on/Extrapola9on 4. Correla9on 1. Introduc9on

More information

Overview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation

Overview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation Bivariate Regression & Correlation Overview The Scatter Diagram Two Examples: Education & Prestige Correlation Coefficient Bivariate Linear Regression Line SPSS Output Interpretation Covariance ou already

More information

appstats8.notebook October 11, 2016

appstats8.notebook October 11, 2016 Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus

More information

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014 BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014 Homework 4 (version 3) - posted October 3 Assigned October 2; Due 11:59PM October 9 Problem 1 (Easy) a. For the genetic regression model: Y

More information

Adaptive Filtering. Squares. Alexander D. Poularikas. Fundamentals of. Least Mean. with MATLABR. University of Alabama, Huntsville, AL.

Adaptive Filtering. Squares. Alexander D. Poularikas. Fundamentals of. Least Mean. with MATLABR. University of Alabama, Huntsville, AL. Adaptive Filtering Fundamentals of Least Mean Squares with MATLABR Alexander D. Poularikas University of Alabama, Huntsville, AL CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is

More information

CS 195-5: Machine Learning Problem Set 1

CS 195-5: Machine Learning Problem Set 1 CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of

More information

1. Simple Linear Regression

1. Simple Linear Regression 1. Simple Linear Regression Suppose that we are interested in the average height of male undergrads at UF. We put each male student s name (population) in a hat and randomly select 100 (sample). Then their

More information

Correlation. January 11, 2018

Correlation. January 11, 2018 Correlation January 11, 2018 Contents Correlations The Scattterplot The Pearson correlation The computational raw-score formula Survey data Fun facts about r Sensitivity to outliers Spearman rank-order

More information

Chapter 1 Linear Regression with One Predictor

Chapter 1 Linear Regression with One Predictor STAT 525 FALL 2018 Chapter 1 Linear Regression with One Predictor Professor Min Zhang Goals of Regression Analysis Serve three purposes Describes an association between X and Y In some applications, the

More information

Linear models. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. October 5, 2016

Linear models. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. October 5, 2016 Linear models Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark October 5, 2016 1 / 16 Outline for today linear models least squares estimation orthogonal projections estimation

More information

ES-2 Lecture: More Least-squares Fitting. Spring 2017

ES-2 Lecture: More Least-squares Fitting. Spring 2017 ES-2 Lecture: More Least-squares Fitting Spring 2017 Outline Quick review of least-squares line fitting (also called `linear regression ) How can we find the best-fit line? (Brute-force method is not efficient)

More information

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression Introduction to Correlation and Regression The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested

More information

Estadística II Chapter 4: Simple linear regression

Estadística II Chapter 4: Simple linear regression Estadística II Chapter 4: Simple linear regression Chapter 4. Simple linear regression Contents Objectives of the analysis. Model specification. Least Square Estimators (LSE): construction and properties

More information

Business Mathematics and Statistics (MATH0203) Chapter 1: Correlation & Regression

Business Mathematics and Statistics (MATH0203) Chapter 1: Correlation & Regression Business Mathematics and Statistics (MATH0203) Chapter 1: Correlation & Regression Dependent and independent variables The independent variable (x) is the one that is chosen freely or occur naturally.

More information

Ratio of Polynomials Fit One Variable

Ratio of Polynomials Fit One Variable Chapter 375 Ratio of Polynomials Fit One Variable Introduction This program fits a model that is the ratio of two polynomials of up to fifth order. Examples of this type of model are: and Y = A0 + A1 X

More information

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION In this lab you will learn how to use Excel to display the relationship between two quantitative variables, measure the strength and direction of the

More information

Introduction to Computer Tools and Uncertainties

Introduction to Computer Tools and Uncertainties Experiment 1 Introduction to Computer Tools and Uncertainties 1.1 Objectives To become familiar with the computer programs and utilities that will be used throughout the semester. To become familiar with

More information

Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Module 2 Lecture 05 Linear Regression Good morning, welcome

More information

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author... From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. Contents About This Book... xiii About The Author... xxiii Chapter 1 Getting Started: Data Analysis with JMP...

More information

STAT 350 Final (new Material) Review Problems Key Spring 2016

STAT 350 Final (new Material) Review Problems Key Spring 2016 1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,

More information

Gregory Carey, 1998 Regression & Path Analysis - 1 MULTIPLE REGRESSION AND PATH ANALYSIS

Gregory Carey, 1998 Regression & Path Analysis - 1 MULTIPLE REGRESSION AND PATH ANALYSIS Gregory Carey, 1998 Regression & Path Analysis - 1 MULTIPLE REGRESSION AND PATH ANALYSIS Introduction Path analysis and multiple regression go hand in hand (almost). Also, it is easier to learn about multivariate

More information

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS NOTES FROM PRE- LECTURE RECORDING ON PCA PCA and EFA have similar goals. They are substantially different in important ways. The goal

More information

1. Background: The SVD and the best basis (questions selected from Ch. 6- Can you fill in the exercises?)

1. Background: The SVD and the best basis (questions selected from Ch. 6- Can you fill in the exercises?) Math 35 Exam Review SOLUTIONS Overview In this third of the course we focused on linear learning algorithms to model data. summarize: To. Background: The SVD and the best basis (questions selected from

More information

ik () uk () Today s menu Last lecture Some definitions Repeatability of sensing elements

ik () uk () Today s menu Last lecture Some definitions Repeatability of sensing elements Last lecture Overview of the elements of measurement systems. Sensing elements. Signal conditioning elements. Signal processing elements. Data presentation elements. Static characteristics of measurement

More information

CS Homework 3. October 15, 2009

CS Homework 3. October 15, 2009 CS 294 - Homework 3 October 15, 2009 If you have questions, contact Alexandre Bouchard (bouchard@cs.berkeley.edu) for part 1 and Alex Simma (asimma@eecs.berkeley.edu) for part 2. Also check the class website

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

Arizona Western College MAT 105 Mathematics for Applied Sciences Final Exam Review Solutions Spring = = 12 (page 43)

Arizona Western College MAT 105 Mathematics for Applied Sciences Final Exam Review Solutions Spring = = 12 (page 43) Arizona Western College MAT 0 Mathematics for Applied Sciences Final Exam Review Solutions Spring 06 Student NAME You are expected to show your process. That means show how you get from the initial problem

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Describing Bivariate Relationships

Describing Bivariate Relationships Describing Bivariate Relationships Bivariate Relationships What is Bivariate data? When exploring/describing a bivariate (x,y) relationship: Determine the Explanatory and Response variables Plot the data

More information

Introduction to Vectors

Introduction to Vectors Introduction to Vectors Why Vectors? Say you wanted to tell your friend that you re running late and will be there in five minutes. That s precisely enough information for your friend to know when you

More information

Introduction to bivariate analysis

Introduction to bivariate analysis Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.

More information

Chapter 14 Simple Linear Regression (A)

Chapter 14 Simple Linear Regression (A) Chapter 14 Simple Linear Regression (A) 1. Characteristics Managerial decisions often are based on the relationship between two or more variables. can be used to develop an equation showing how the variables

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information