STATISTICAL DATA ANALYSIS IN EXCEL


Microarray Center
STATISTICAL DATA ANALYSIS IN EXCEL
Lecture 5: Linear Regression
dr. Petr Nazarov, 14-1-2013, petr.nazarov@crp-sante.lu
Statistical data analysis in Excel. 5. Linear regression

OUTLINE Lecture 1
Introduction
  covariation and correlation measures
  dependent and independent random variables
  scatter plot and linear trendline
  linear model
Testing for significance
  estimation of the noise variance
  interval estimations
  testing hypothesis about significance
Regression Analysis
  confidence and prediction
  multiple linear regression
  nonlinear regression

INTRODUCTION Dependent and Independent Variables
mice.xls
[Figure: two scatter plots, "Ending weight vs. Starting weight" and "Blood pH vs. Starting weight"; x-axis: Starting weight]


INTRODUCTION Measure of Association between 2 Variables
Covariance: a measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship.
Population:
$$\sigma_{xy} = \frac{\sum_i (x_i - \mu_x)(y_i - \mu_y)}{N}$$
Sample:
$$s_{xy} = \frac{\sum_i (x_i - m_x)(y_i - m_y)}{n - 1}$$
In Excel use function: =COVAR(data)
mice.xls, Ending weight vs. Starting weight: s_xy = 39.8, which is hard to interpret because its magnitude depends on the units of both variables.
[Figure: scatter plot, Ending weight vs. Starting weight]
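The two covariance formulas differ only in the divisor. A minimal Python sketch, assuming a hypothetical toy data set (the values below are not taken from mice.xls) and a helper name `covariance` chosen for illustration:

```python
from statistics import mean

def covariance(x, y, sample=True):
    """Covariance of paired data: divide by n-1 (sample) or N (population)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    cross = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return cross / (len(x) - 1 if sample else len(x))

# Hypothetical toy data, for illustration only
start = [1.0, 2.0, 3.0, 4.0, 5.0]
end = [2.0, 4.0, 5.0, 4.0, 5.0]

s_xy = covariance(start, end)                    # sample form, n - 1
sigma_xy = covariance(start, end, sample=False)  # population form, N
print(s_xy, sigma_xy)
```

Excel's COVAR corresponds to the population form (division by N); newer Excel versions also offer COVARIANCE.S for the sample form.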

INTRODUCTION Measure of Association between 2 Variables
Correlation (Pearson product moment correlation coefficient): a measure of linear association between two variables that takes on values between -1 and 1. Values near 1 indicate a strong positive linear relationship; values near -1 indicate a strong negative linear relationship; values near zero indicate the lack of a linear relationship.
Population:
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{\sum_i (x_i - \mu_x)(y_i - \mu_y)}{\sigma_x \sigma_y N}$$
Sample:
$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_i (x_i - m_x)(y_i - m_y)}{s_x s_y (n - 1)}$$
In Excel use function: =CORREL(data)
Ending weight vs. Starting weight: r = 0.94
[Figure: scatter plot, Ending weight vs. Starting weight]
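Unlike the covariance, the correlation coefficient does not change when a variable is rescaled. A small Python sketch of the sample formula, again on hypothetical toy data (not from mice.xls), with `pearson_r` as an illustrative helper name:

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation: s_xy / (s_x * s_y); the (n-1) factors cancel."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical toy data, for illustration only
start = [1.0, 2.0, 3.0, 4.0, 5.0]
end = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(start, end)
# Changing the units of x (e.g. a factor of 10) leaves r unchanged
r_rescaled = pearson_r([10.0 * v for v in start], end)
print(r, r_rescaled)
```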

LINEAR REGRESSION Experiments
Cells are grown under different temperature conditions from 20 to 40. A researcher would like to find a dependency between T and cell number.

Temperature   Cell Number
20            83
21            139
22            99
23            143
24            164
25            233
26            198
27            261
28            235
29            264
30            243
31            339
32            331
33            346
34            350
35            368
36            360
37            397
38            361
39            358
40            381

[Figure: scatter plot, Number of cells vs. Temperature]
Dependent variable: the variable that is being predicted or explained. It is denoted by y.
Independent variable: the variable that is doing the predicting or explaining. It is denoted by x.

LINEAR REGRESSION Experiments
Simple linear regression: regression analysis involving one independent variable and one dependent variable in which the relationship between the variables is approximated by a straight line.
Building a regression means finding and tuning the model to explain the behaviour of the data.
[Figure: scatter plot, Number of cells vs. Temperature]

LINEAR REGRESSION Experiments
Regression model: the equation describing how y is related to x and an error term; in simple linear regression, the regression model is
$$y = \beta_0 + \beta_1 x + \varepsilon$$
Regression equation: the equation that describes how the mean or expected value of the dependent variable is related to the independent variable; in simple linear regression,
$$E(y) = \beta_0 + \beta_1 x$$
Model for a simple linear regression:
$$y(x) = \beta_1 x + \beta_0 + \varepsilon$$
[Figure: scatter plot, Number of cells vs. Temperature]

LINEAR REGRESSION Regression Model and Regression Line
$$y(x) = \beta_1 x + \beta_0 + \varepsilon$$

LINEAR REGRESSION Experiments
Estimated regression equation: the estimate of the regression equation developed from sample data by using the least squares method. For simple linear regression, the estimated regression equation is
$$\hat{y} = b_0 + b_1 x$$
The model is $y(x) = \beta_1 x + \beta_0 + \varepsilon$, and $\hat{y}(x) = b_1 x + b_0$ is the estimate of $E[y(x)]$.
cells.xls
1. Make a scatter plot for the data.
2. Right click to Add Trendline. Show equation.
[Figure: scatter plot, Number of cells vs. Temperature, with trendline y = 15.339x - 191.01]

SIMPLE LINEAR REGRESSION Overview

SIMPLE LINEAR REGRESSION Experiments
Least squares method: a procedure used to develop the estimated regression equation. The objective is to minimize
$$\sum_i (y_i - \hat{y}_i)^2$$
Slope:
$$b_1 = \frac{\sum_i (x_i - m_x)(y_i - m_y)}{\sum_i (x_i - m_x)^2}$$
Intercept:
$$b_0 = m_y - b_1 m_x$$
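The slope and intercept formulas can be applied directly to the temperature and cell-number data. The sketch below is one possible Python version; the helper name `least_squares` is illustrative, and the data are the integer values as read from the slide table (the entries at 34 and 36 are assumed to be 350 and 360). If the spreadsheet holds unrounded values, the Excel results may differ slightly from what this prints.

```python
from statistics import mean

# Data as read from the slide table (assumption: 350 and 360 at T = 34 and 36)
temperature = list(range(20, 41))
cells = [83, 139, 99, 143, 164, 233, 198, 261, 235, 264, 243,
         339, 331, 346, 350, 368, 360, 397, 361, 358, 381]

def least_squares(x, y):
    """Return (b0, b1) that minimise sum((y_i - b0 - b1 * x_i) ** 2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx        # slope
    b0 = my - b1 * mx     # intercept
    return b0, b1

b0, b1 = least_squares(temperature, cells)
print(f"slope b1 = {b1:.4f}, intercept b0 = {b0:.4f}")
```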

LINEAR REGRESSION Coefficient of Determination
Sum of squares due to error:
$$SSE = \sum_i (y_i - \hat{y}_i)^2$$
Total sum of squares:
$$SST = \sum_i (y_i - m_y)^2$$
Sum of squares due to regression:
$$SSR = \sum_i (\hat{y}_i - m_y)^2$$
The Main Equation:
$$SST = SSR + SSE$$
[Figure: Number of cells vs. Temperature, with trendline y = 15.339x - 191.01]

LINEAR REGRESSION ANOVA and Regression: Testing for Significance
$$SST = SSR + SSE$$
$H_0: \beta_1 = 0$ (the dependency is insignificant)
$H_a: \beta_1 \neq 0$
[Figure: scatter plot, Number of cells vs. Temperature]

LINEAR REGRESSION Coefficient of Determination
$$SSE = \sum_i (y_i - \hat{y}_i)^2, \qquad SST = \sum_i (y_i - m_y)^2, \qquad SSR = \sum_i (\hat{y}_i - m_y)^2, \qquad SST = SSR + SSE$$
Coefficient of determination: a measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable that is explained by the estimated regression equation.
$$R^2 = \frac{SSR}{SST}$$
Correlation coefficient: a measure of the strength of the linear relationship between two variables (previously discussed in Lecture 1).
$$r = \operatorname{sign}(b_1)\sqrt{R^2}$$
[Figure: Number of cells vs. Temperature, with trendline y = 15.339x - 191.01, R² = 0.9041]
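The decomposition SST = SSR + SSE and the resulting R² can be checked numerically. A minimal Python sketch, assuming the integer data as read from the temperature and cell-number table (350 and 360 at 34 and 36 are assumed readings); variable names are illustrative:

```python
from math import copysign, sqrt
from statistics import mean

# Data as read from the slide table (assumption: 350 and 360 at T = 34 and 36)
temperature = list(range(20, 41))
cells = [83, 139, 99, 143, 164, 233, 198, 261, 235, 264, 243,
         339, 331, 346, 350, 368, 360, 397, 361, 358, 381]

mx, my = mean(temperature), mean(cells)
sxx = sum((x - mx) ** 2 for x in temperature)
sxy = sum((x - mx) * (y - my) for x, y in zip(temperature, cells))
b1 = sxy / sxx
b0 = my - b1 * mx
fitted = [b0 + b1 * x for x in temperature]

sse = sum((y - f) ** 2 for y, f in zip(cells, fitted))  # unexplained part
sst = sum((y - my) ** 2 for y in cells)                 # total variability
ssr = sum((f - my) ** 2 for f in fitted)                # explained part

r2 = ssr / sst
r = copysign(sqrt(r2), b1)  # correlation takes the sign of the slope
print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}, R^2 = {r2:.4f}")
```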

LINEAR REGRESSION Assumptions
Assumptions for Simple Linear Regression, for the model $y(x) = \beta_1 x + \beta_0 + \varepsilon$:
1. The error term ε is a random variable with mean 0, i.e. E[ε] = 0.
2. The variance of ε, denoted by σ², is the same for all values of x.
3. The values of ε are independent.
4. The term ε is a normally distributed variable.

REGRESSION ANALYSIS Confidence and Prediction
Confidence interval: the interval estimate of the mean value of y for a given value of x.
Prediction interval: the interval estimate of an individual value of y for a given value of x.
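The slide gives only the definitions. For reference, the usual textbook forms of the two intervals at a point $x_0$ are sketched below. The symbols $x_0$, $\hat{y}_0$ and $s$ are introduced here for illustration and do not appear on the slide; $s$ is the estimated noise standard deviation.

```latex
\hat{y}_0 = b_0 + b_1 x_0, \qquad s = \sqrt{\frac{SSE}{n - 2}}

% Confidence interval for the mean value of y at x_0
\hat{y}_0 \pm t^{(n-2)}_{\alpha/2}\, s
  \sqrt{\frac{1}{n} + \frac{(x_0 - m_x)^2}{\sum_i (x_i - m_x)^2}}

% Prediction interval for an individual value of y at x_0
\hat{y}_0 \pm t^{(n-2)}_{\alpha/2}\, s
  \sqrt{1 + \frac{1}{n} + \frac{(x_0 - m_x)^2}{\sum_i (x_i - m_x)^2}}
```

The extra 1 under the square root accounts for the noise of a single observation, so the prediction interval is always wider than the confidence interval.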

REGRESSION ANALYSIS Residuals

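TESTING FOR SIGNIFICANCE Sampling Distribution for b1
If the assumptions for ε are fulfilled, then for the model $y(x) = \beta_1 x + \beta_0 + \varepsilon$ with estimate $\hat{y}(x) = b_1 x + b_0$ the sampling distribution of $b_1$ is as follows.
Expected value:
$$E[b_1] = \beta_1$$
Standard deviation (standard error):
$$\sigma_{b_1} = \frac{\sigma}{\sqrt{\sum_i (x_i - m_x)^2}} = SE$$
Distribution: normal
Interval Estimation for β1:
$$\beta_1 = b_1 \pm t^{(n-2)}_{\alpha/2} \frac{\sigma}{\sqrt{\sum_i (x_i - m_x)^2}} = b_1 \pm t^{(n-2)}_{\alpha/2}\, SE$$
In practice σ is replaced by its estimate $s = \sqrt{SSE/(n-2)}$.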
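As an illustration of the interval estimate, the following Python sketch computes the standard error of the slope and a 95% interval for the temperature and cell-number data. It assumes the integer data as read from the slide table (350 and 360 at 34 and 36 are assumed readings) and takes the critical value 2.093 for 19 degrees of freedom from a t-table rather than computing it.

```python
from math import sqrt
from statistics import mean

# Data as read from the slide table (assumption: 350 and 360 at T = 34 and 36)
temperature = list(range(20, 41))
cells = [83, 139, 99, 143, 164, 233, 198, 261, 235, 264, 243,
         339, 331, 346, 350, 368, 360, 397, 361, 358, 381]

n = len(temperature)
mx, my = mean(temperature), mean(cells)
sxx = sum((x - mx) ** 2 for x in temperature)
sxy = sum((x - mx) * (y - my) for x, y in zip(temperature, cells))
b1 = sxy / sxx
b0 = my - b1 * mx

# Noise estimate s = sqrt(SSE / (n - 2)) replaces the unknown sigma
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(temperature, cells))
s = sqrt(sse / (n - 2))
se_b1 = s / sqrt(sxx)

T_CRIT_19 = 2.093  # t-table value for alpha/2 = 0.025 and n - 2 = 19
lower = b1 - T_CRIT_19 * se_b1
upper = b1 + T_CRIT_19 * se_b1
print(f"b1 = {b1:.3f}, SE = {se_b1:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```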

TESTING FOR SIGNIFICANCE Test for Significance
$H_0: \beta_1 = 0$ (the dependency is insignificant)
$H_a: \beta_1 \neq 0$
Using the F-test:
1. Build an F-test statistic.
$$F = \frac{MSR}{MSE}$$
2. Calculate a p-value.
Using the t-test:
1. Build a t-test statistic.
$$t = \frac{b_1}{\sigma_{b_1}} = \frac{b_1 \sqrt{\sum_i (x_i - m_x)^2}}{s}$$
2. Calculate a p-value for t.
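In simple linear regression the two statistics are linked by F = t². The Python sketch below checks this for the temperature and cell-number data, assuming the integer values as read from the slide table (350 and 360 at 34 and 36 are assumed readings). The p-value step is left out because it needs the cumulative F or t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean

# Data as read from the slide table (assumption: 350 and 360 at T = 34 and 36)
temperature = list(range(20, 41))
cells = [83, 139, 99, 143, 164, 233, 198, 261, 235, 264, 243,
         339, 331, 346, 350, 368, 360, 397, 361, 358, 381]

n = len(temperature)
mx, my = mean(temperature), mean(cells)
sxx = sum((x - mx) ** 2 for x in temperature)
sxy = sum((x - mx) * (y - my) for x, y in zip(temperature, cells))
b1 = sxy / sxx
b0 = my - b1 * mx

fitted = [b0 + b1 * x for x in temperature]
sse = sum((y - f) ** 2 for y, f in zip(cells, fitted))
ssr = sum((f - my) ** 2 for f in fitted)

msr = ssr / 1        # regression has 1 degree of freedom
mse = sse / (n - 2)  # residual has n - 2 degrees of freedom
f_stat = msr / mse

s = sqrt(mse)
t_stat = b1 * sqrt(sxx) / s
print(f"F = {f_stat:.2f}, t = {t_stat:.3f}, t^2 = {t_stat ** 2:.2f}")
```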

REGRESSION ANALYSIS Example
cells.xls
1. Calculate b1 and b0 manually.
   Intercept b0 = -191.008119
   Slope b1 = 15.3385723
   In Excel use the functions: =INTERCEPT(y, x) and =SLOPE(y, x)
2. Let's do it automatically: Tools > Data Analysis > Regression

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.950842308
R Square             0.904101095
Adjusted R Square    0.899053784
Standard Error       31.80180903
Observations         21

ANOVA
              df    SS             MS          F          Significance F
Regression    1     181159.2853    181159.3    179.1253   4.0169E-11
Residual      19    19215.7461     1011.355
Total         20    200375.0314

              Coefficients    Standard Error   t Stat      P-value    Lower 95%       Upper 95%
Intercept     -191.0081194    35.07510626      -5.445689   2.97E-05   -264.4211603    -117.5950784
X Variable 1  15.33857226     1.146057646      13.38377    4.02E-11   12.93984605     17.73729848
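The entries of the SUMMARY OUTPUT are tied together by simple arithmetic, which gives a quick consistency check. The Python sketch below starts only from the two sums of squares and the number of observations reported by Excel and recomputes the remaining statistics; variable names are illustrative.

```python
from math import sqrt

n = 21               # Observations
ssr = 181159.2853    # SS, Regression row
sse = 19215.7461     # SS, Residual row

sst = ssr + sse                               # SS, Total row
r2 = ssr / sst                                # R Square
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)     # Adjusted R Square (one predictor)
mse = sse / (n - 2)                           # MS, Residual row
f_stat = (ssr / 1) / mse                      # F
std_error = sqrt(mse)                         # Standard Error of the regression

print(f"SST = {sst:.4f}")
print(f"R^2 = {r2:.6f}, adjusted R^2 = {adj_r2:.6f}")
print(f"MSE = {mse:.3f}, F = {f_stat:.4f}, Standard Error = {std_error:.5f}")
```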

REGRESSION ANALYSIS Example 2
rana.xls
A biology student wishes to determine the relationship between temperature and heart rate in the leopard frog, Rana pipiens. He manipulates the temperature in 2° increments ranging from 2 to 18 °C and records the heart rate at each interval. His data are presented in the table rana.txt.
1. Build the model and provide the p-value for the linear dependency.
2. Provide an interval estimate for the slope of the dependency.
3. Estimate the 95% prediction interval for heart rate at 15 °C.

QUESTIONS?
Thank you for your attention.
To be continued.