Microarra Center STATISTICAL DATA ANALYSIS IN EXCEL Lecture 5 Linear Regression dr. Petr Nazarov 14-1-213 petr.nazarov@crp-sante.lu Statistical data analsis in Ecel. 5. Linear regression
OUTLINE Lecture 1 Introduction covariation and correlation measures dependent and independent random variables scatter plot and linear trendline linear model Testing for significance estimation of the noise variance interval estimations testing hpothesis about significance Regression Analsis confidence and prediction multiple linear regression nonlinear regression Statistical data analsis in Ecel. 5. Linear regression 2
INTRODUCTION Dependent and Independent Variables mice.ls Ending weight vs. Starting weight Blood ph vs. Starting weight Ending weight 6 5 4 3 2 1 Blood ph 7.45 7.4 7.35 7.3 7.25 7.2 7.15 7.1 7.5 1 2 3 4 5 Starting weight 7 1 2 3 4 5 Starting weight Statistical data analsis in Ecel. 5. Linear regression 3
INTRODUCTION Dependent and Independent Variables Statistical data analsis in Ecel. 5. Linear regression 4
INTRODUCTION Measure of Association between 2 Variables Covariance A measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship. σ = population ( i µ ( i µ N s sample ( m ( m i = n 1 i mice.ls 6 5 In Ecel use function: =COVAR(data Ending weight vs. Starting weight Ending weight 4 3 2 1 1 2 3 4 5 Starting weight s = 39.8 hard to interpret Statistical data analsis in Ecel. 5. Linear regression 5
INTRODUCTION Measure of Association between 2 Variables Correlation (Pearson product moment correlation coefficient A measure of linear association between two variables that takes on values between -1 and 1. Values near 1 indicate a strong positive linear relationship, values near -1 indicate a strong negative linear relationship; and values near zero indicate the lack of a linear relationship. population ( m ( m σ i i ρ = = σ σ σ σ N r sample s i m = = s s s s ( ( m i ( n 1 6 5 Ending weight 4 3 2 1 In Ecel use function: =CORREL(data r =.94 1 2 3 4 5 Starting weight Statistical data analsis in Ecel. 5. Linear regression 6
LINEAR REGRESSION Eperiments Temperature Cell Number 2 83 21 139 22 99 23 143 24 164 25 233 26 198 27 261 28 235 29 264 3 243 31 339 32 331 33 346 34 35 35 368 36 36 37 397 38 361 39 358 4 381 Number of cells 45 4 35 3 25 2 15 1 5 15 25 35 45 Temperature Cells are grown under different temperature conditions from 2 to 4. A researched would like to find a dependenc between T and cell number. Dependent variable The variable that is being predicted or eplained. It is denoted b. Independent variable The variable that is doing the predicting or eplaining. It is denoted b. Statistical data analsis in Ecel. 5. Linear regression 7
LINEAR REGRESSION Eperiments Simple linear regression Regression analsis involving one independent variable and one dependent variable in which the relationship between the variables is approimated b a straight line. Building a regression means finding and tuning the model to eplain the behaviour of the data 45 4 35 Number of cells 3 25 2 15 1 5 15 25 35 45 Temperature Statistical data analsis in Ecel. 5. Linear regression 8
LINEAR REGRESSION Eperiments Regression model The equation describing how is related to and an error term; in simple linear regression, the regression model is = β β 1 ε Regression equation The equation that describes how the mean or epected value of the dependent variable is related to the independent variable; in simple linear regression, E( =β β 1 Number of cells 45 4 35 3 25 2 15 1 5 15 25 35 45 Temperature Model for a simple linear regression: ( β β ε = 1 Statistical data analsis in Ecel. 5. Linear regression 9
LINEAR REGRESSION Regression Model and Regression Line ( β β ε = 1 Statistical data analsis in Ecel. 5. Linear regression 1
LINEAR REGRESSION Eperiments Estimated regression equation The estimate of the regression equation developed from sample data b using the least squares method. For simple linear regression, the estimated regression equation is = b b 1 cells.ls 1. Make a scatter plot for the data. ( β β ε = 1 ˆ ( = b 1 b [ ( ] = b 1 b E Number of cells 45 4 35 3 25 2 15 1 5 15 25 35 45 Temperature 2. Right click to Add Trendline. Show equation. Number of cells 45 4 = 15.339-191.1 35 3 25 2 15 1 5 15 25 35 45 Temperature Statistical data analsis in Ecel. 5. Linear regression 11
SIMPLE LINEAR REGRESSION Overview Statistical data analsis in Ecel. 5. Linear regression 12
SIMPLE LINEAR REGRESSION Eperiments Least squares method A procedure used to develop the estimated regression equation. 2 ˆi The objective is to minimize ( i Slope: b 1 = ( m ( m i ( m 2 1 i Intersect: b = m b1m Statistical data analsis in Ecel. 5. Linear regression 13
LINEAR REGRESSION Coefficient of Determination Sum squares due to error SSE ( ˆ 2 i = i 45 4 35 = 15.339-191.1 Sum squares total SST ( 2 = i Number of cells 3 25 2 15 Sum squares due to regression ( ˆ 2 SSR = i 1 5 15 2 25 3 35 4 45 Temperature The Main Equation SST = SSR SSE Statistical data analsis in Ecel. 5. Linear regression 14
LINEAR REGRESSION ANOVA and Regression: Testing for Significance 45 4 35 Number of cells 3 25 2 15 1 5 15 2 25 3 35 4 45 Temperature SST = SSR SSE H : β 1 = insignificant H a : β 1 Statistical data analsis in Ecel. 5. Linear regression 15
LINEAR REGRESSION Coefficient of Determination SSE SST ( ˆ 2 i = i ( 2 = i ( ˆ 2 SSR = i SST = SSR SSE Coefficient of determination A measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variabilit in the dependent variable that is eplained b the estimated regression equation. Number of cells 45 4 35 = 15.339-191.1 R 2 =.941 3 25 2 15 1 5 15 2 25 3 35 4 45 Temperature 2 R = SSR SST Correlation coefficient A measure of the strength of the linear relationship between two 2 r = sign ( b variables (previousl discussed in Lecture 1. 1 R Statistical data analsis in Ecel. 5. Linear regression 16
LINEAR REGRESSION Assumptions Assumptions for Simple Linear Regression 1. The error term ε is a random variable with mean, i.e. E[ε]= 2. The variance of ε, denoted b σ 2, is the same for all values of 3. The values of ε are independent 3. The term ε is a normall distributed variable ( β β ε = 1 Statistical data analsis in Ecel. 5. Linear regression 17
REGRESSION ANALYSIS Confidence and Prediction Confidence interval The interval estimate of the mean value of for a given value of. Prediction interval The interval estimate of an individual value of for a given value of. Statistical data analsis in Ecel. 5. Linear regression 18
REGRESSION ANALYSIS Residuals Statistical data analsis in Ecel. 5. Linear regression 19
TESTING FOR SIGNIFICANCE Sampling Distribution for b 1 If assumptions for ε are fulfilled, then the sampling distribution for b 1 is as follows: ( β β ε = 1 ˆ ( = b 1 b Epected value E[ b = β 1 ] 1 Variance σ b 1 = σ i ( m 2 = Standard Error Distribution: normal Interval Estimation for β 1 β 1 = b 1 ( n 2 ± t α / 2 σ i ( m 2 t ( n 2 β b 1 = 1 ± α / 2 SE Statistical data analsis in Ecel. 5. Linear regression 2
TESTING FOR SIGNIFICANCE Test for Significance H : β 1 = H a : β 1 insignificant 1. Build a F-test statistics. MSR F = MSE 1. Build a t-test statistics. t b ( m 2 = 1 = 1 i σ b s 1 b 2. Calculate a p-value 2. Calculate p-value for t Statistical data analsis in Ecel. 5. Linear regression 21
REGRESSION ANALYSIS Eample cells.ls 1. Calculate manuall b 1 and b Intercept b= -191.8119 Slope b1= 15.3385723 In Ecel use the function: = INTERCEPT(, = SLOPE(, 2. Let s do it automaticall Tools Data Analsis Regression SUMMARY OUTPUT Regression Statistics Multiple R.9584238 R Square.941195 Adjusted R Square.89953784 Standard Error 31.81893 Observations 21 ANOVA df SS MS F Significance F Regression 1 181159.2853 181159.3 179.1253 4.169E-11 Residual 19 19215.7461 111.355 Total 2 2375.314 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.% Upper 95.% Intercept -191.81194 35.751626-5.445689 2.97E-5-264.421163-117.595784-264.421163-117.595784 X Variable 1 15.33857226 1.14657646 13.38377 4.2E-11 12.9398465 17.73729848 12.9398465 17.73729848 Statistical data analsis in Ecel. 5. Linear regression 22
REGRESSION ANALYSIS Eample2 rana.ls A biolog student wishes to determine the relationship between temperature and heart rate in leopard frog, Rana pipiens. He manipulates the temperature in 2 increment ranging from 2 to 18 C and records the heart rate at each interval. His data are presented in table rana.tt 1 Build the model and provide the p-value for linear dependenc 2 Provide interval estimation for the slope of the dependenc 3 Estmate 95% prediction interval for heart rate at 15 Statistical data analsis in Ecel. 5. Linear regression 23
QUESTIONS? Thank ou for our attention to be continued Statistical data analsis in Ecel. 5. Linear regression 24