Statistics for exp. medical researchers: Regression and Correlation

Lene Theil Skovgaard, Faculty of Health Sciences
Sept. 28, 2015


1  Correlation

[Figures: bivariate normal densities with ρ ≠ 0, and a two-dimensional / bivariate normal density with correlation 0]

Example: obesity index and blood pressure of n people randomly chosen from a population.

Correlation?
- In everyday language: some sort of a relationship
- In mathematical language: a well-defined parameter

Model: a sample from the distribution of (X, Y):
(X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n), assumed to be from a two-dimensional normal distribution:

\[ \begin{pmatrix} X_i \\ Y_i \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix} \right) \]

ρ = ρ_xy is called the correlation; ρσ_xσ_y is called the covariance.
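To make the model concrete, here is a minimal R sketch (not part of the slides) that simulates a sample from this bivariate normal distribution; all parameter values are invented for illustration, and MASS::mvrnorm is assumed available.

```r
## Simulate (X_i, Y_i) pairs from a bivariate normal distribution.
## All parameter values below are hypothetical.
library(MASS)

set.seed(1)
mu    <- c(1.2, 120)               # (mu_x, mu_y)
sd_x  <- 0.15; sd_y <- 15          # sigma_x, sigma_y
rho   <- 0.4                       # correlation rho
Sigma <- matrix(c(sd_x^2,            rho * sd_x * sd_y,
                  rho * sd_x * sd_y, sd_y^2), nrow = 2)

xy <- mvrnorm(n = 100, mu = mu, Sigma = Sigma)
cor(xy[, 1], xy[, 2])              # empirical correlation, close to rho
```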

2  Bivariate densities with contour plots

[Figure: bivariate normal densities shown with contour plots]

Example (contd.)

Variables:
- OBESE: obesity index, i.e. weight / ideal weight
- BP: systolic blood pressure

[Table: the data set, columns OBS, SEX, OBESE, BP; rows for male and female subjects, numeric values not recoverable]

[Figure: scatter plot, with different symbols for each sex]

3  [Figure: scatter plot after logarithmic transformation]

The correlation measures: to what extent does the plot look like a straight line?
Not: how near are the points to the straight line?

The coefficient of correlation is estimated by

\[ r = r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}} \]

where

\[ S_{xy} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}), \quad S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2, \quad S_{yy} = \sum_{i=1}^n (y_i - \bar{y})^2 \]

r assumes values between −1 and 1:
- 0 corresponds to independence
- +1 and −1 correspond to a perfect linear relationship
- r > 0 (r < 0): positive (negative) slope

Test of independence (no correlation)

H_0: ρ_xy = 0

Given a sample: is r_xy = S_xy / √(S_xx S_yy) near 0? This is measured by

\[ T = \frac{r_{xy} \sqrt{n-2}}{\sqrt{1 - r_{xy}^2}} \]

Under H_0, i.e. if ρ_xy is equal to 0, T has a t distribution with n − 2 degrees of freedom.

Whether the correlation is significantly different from 0 depends on
- the magnitude of the true correlation, ρ_xy
- the number of observations, n
- chance
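As a hedged illustration (not from the slides), r and the t test can be computed directly from these definitions and checked against R's cor.test; the vectors x and y below are simulated stand-ins for, e.g., log(OBESE) and log(BP).

```r
## Pearson correlation and the t test of H0: rho = 0, by hand and via
## cor.test. x and y are simulated stand-in data.
set.seed(2)
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)

n   <- length(x)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
r   <- Sxy / sqrt(Sxx * Syy)

t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)   # ~ t with n - 2 df under H0
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)

c(r = r, t = t_stat, p = p_val)
cor.test(x, y)                              # same estimate, t and p
```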

4  Pearson and nonparametric correlation coefficients

The above correlation coefficient is based on the bivariate normal distribution, the so-called Pearson correlation.

SAS Analyst: Statistics/Descriptive/Correlations

[SAS output: Correlation Analysis, 2 VAR variables LOBESE LBP; simple statistics (N, mean, std dev, sum, minimum, maximum) and Pearson correlation coefficients with Prob > |r| under H0: Rho=0, N = 102; numeric values not recoverable]

Nonparametric correlation coefficients: Spearman's ρ and Kendall's τ

SAS Analyst: Statistics/Descriptive/Correlations, Options, tick ...

[SAS output: simple statistics (N, mean, std dev, median, minimum, maximum), Spearman correlation coefficients and Kendall tau-b correlation coefficients for LOBESE and LBP, with Prob > |r| under H0: Rho=0, N = 102; numeric values not recoverable]

Spearman's rank correlation

Each variable is rank ordered on its own (obese → robese, bp → rbp). This gives, for each person, the rank difference d_i (dif).

Estimated correlation coefficient (no ties):

\[ r_s = 1 - \frac{6 \sum_i d_i^2}{n^3 - n} \]

r_s = 1 (−1): the scatter plot is strictly increasing (decreasing); not necessarily linear.

For the example we obtain n = 102 and r_s = 0.30.

Correction for ties: complicated, but ...
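A small sketch of the no-ties formula, again with simulated stand-in data; cor.test(..., method = "spearman") should reproduce the same estimate.

```r
## Spearman's rank correlation from the d_i formula (no ties).
set.seed(3)
x <- rnorm(102)
y <- 0.3 * x + rnorm(102)                 # continuous, so no ties

d   <- rank(x) - rank(y)                  # rank differences d_i
n   <- length(x)
r_s <- 1 - 6 * sum(d^2) / (n^3 - n)

r_s
cor.test(x, y, method = "spearman")$estimate  # identical when no ties
```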

5  Correction for ties (Spearman)

Use the formula

\[ r_s = \frac{S_{rx,ry}}{\sqrt{S_{rx,rx} S_{ry,ry}}} \]

where rx and ry denote the rank values robese and rbp, respectively (i.e. the Pearson formula applied to the ranks).

Test of independence (no correlation)

H_0: ρ_s = 0

\[ T = \frac{r_s \sqrt{n-2}}{\sqrt{1 - r_s^2}} \]

is approximately t_{n−2} distributed under H_0. The approximation holds for n ≥ 30; otherwise use tables.

Here: t = 3.14, n = 102, i.e. use t_100: p = 0.002.
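Sketch of the tie-corrected version: R's cor(..., method = "spearman") is exactly the Pearson formula applied to the ranks, so rounding the simulated data to create ties illustrates the correction.

```r
## Tie-corrected Spearman correlation: Pearson formula on the ranks,
## plus the approximate t test (ok for n >= 30).
set.seed(4)
x <- round(rnorm(102), 1)                 # rounding creates ties
y <- round(0.3 * x + rnorm(102), 1)

r_s <- cor(rank(x), rank(y))              # Pearson applied to the ranks
n   <- length(x)
t_s <- r_s * sqrt(n - 2) / sqrt(1 - r_s^2)
p_s <- 2 * pt(-abs(t_s), df = n - 2)

c(r_s = r_s, t = t_s, p = p_s)
cor(x, y, method = "spearman")            # same value: R ranks first
```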

6  Regression

Relationship between 2 continuous variables. Not necessarily causality!

Purpose of regression analysis:
- prediction
- test of relationship
- estimation
- correction when comparing inhomogeneous groups

Y: response variable, dependent variable
X: explaining variable, covariate

DATA: paired observations of X and Y per row (individuals / units):
(x_i, y_i), i = 1, ..., n
Note: the x_i's can be chosen beforehand!

Examples

1. Relationship between cholinesterase activity (CE) and time till awakening (TIME)
   Response: TIME; explaining variable: CE
   Questions: How long is the expected time till awakening for a given value of CE? How large is the uncertainty of this prediction?

2. Comparison of lung capacity (FEV_1) for smokers and non-smokers
   Problem: FEV_1 also depends on, e.g., height
   Response: FEV_1; explaining variables: height, smoking habits
   Question: How much worse is the lung function in smokers?

7  Example (DGA p. 300)

Relationship between fasting blood glucose level (blodsuk) and mean velocity of circumferential shortening of the left heart ventricle (vcf) in diabetics? (n = 23)

Response: Y = vcf, %/sec. Covariate: X = blodsuk, mmol/l.

Scatter plot: Graphs/Scatter Plot/Two-Dimensional; blodsuk → X Axis, vcf → Y Axis (here one can also choose titles for the axes).

[Table: the observations, columns OBS, BLODSUK, VCF; numeric values not recoverable]

Model for a straight line: Y(X) = α + βX

Interpretation:
- α: intercept (intersection with the Y axis), e.g. vcf of a diabetic with blood glucose value 0. Often an inadmissible extrapolation!
- β: slope, regression coefficient, e.g. the difference in vcf between two diabetics whose blood glucose values differ by 1 mmol/l. Often the parameter of greatest interest.
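This example appears to match the data set distributed as thuesen (blood.glucose, short.velocity) in the ISwR package; later slides show R residue (lm.velo, blood.glucose) consistent with that. Treating the correspondence as an assumption, the data and scatter plot can be reproduced as follows.

```r
## The assumed data set: thuesen from the ISwR package; one pair has a
## missing vcf value, leaving n = 23 complete observations.
library(ISwR)

data(thuesen)
summary(thuesen)
plot(short.velocity ~ blood.glucose, data = thuesen,
     xlab = "blodsuk (mmol/l)", ylab = "vcf (%/sec)")
```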

8  Statistical model and least squares estimation

Statistical model:

\[ Y_i = Y(X_i) = \alpha + \beta X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \text{ indep.} \]

Estimation of α and β is done via the least squares method: determine α and β such that the sum of the squared vertical deviations,

\[ \sum_{i=1}^n \big(y_i - (\alpha + \beta x_i)\big)^2 = \sum_{i=1}^n \varepsilon_i^2, \]

gets as small as possible.

Solution / least squares estimators

Slope:

\[ \hat\beta = \frac{S_{xy}}{S_{xx}} = \frac{s_{xy}}{s_x^2}, \]

where the empirical covariance

\[ s_{xy} = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \frac{S_{xy}}{n-1} \]

is a measure of the co-variation between the observed X and Y values, and

\[ s_x^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 = \frac{S_{xx}}{n-1} \]

is the usual variance estimator for the X values.

Intercept: α̂ = ȳ − β̂x̄

Example, vcf vs. blood glucose: α̂ = 1.10, β̂ = 0.022 (cf. below), i.e. the estimated regression line is

vcf = 1.10 + 0.022 · blodsuk

SAS Analyst: in the regression setting (cf. below), click Statistics, and there tick "Plot observed vs. independent".
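Assuming, as above, the thuesen data, a sketch of the closed-form estimators next to lm():

```r
## Least squares estimates: closed form and via lm().
library(ISwR)

d <- na.omit(thuesen)                  # n = 23 complete pairs
x <- d$blood.glucose
y <- d$short.velocity

beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)
c(alpha = alpha_hat, beta = beta_hat)  # approx. 1.10 and 0.022

lm.velo <- lm(short.velocity ~ blood.glucose, data = thuesen)
coef(lm.velo)                          # the same estimates
```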

9  Regression analysis in SAS Analyst

Statistics/Regression/Simple
vcf → Dependent, blodsuk → Explanatory
Statistics: tick "Confidence limits for estimates" and "Correlation matrix of estimates"

[SAS output: Dependent Variable: vcf; analysis-of-variance table (Model, Error, Corrected Total: DF, sum of squares, mean square, F value, Pr > F), Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var; parameter estimates with standard errors, t values, Pr > |t| and 95% confidence limits for Intercept and blodsuk; correlation of estimates; numeric values not recoverable]

Re-parametrization

Often there is no good interpretation of α̂. A good idea is to re-parametrize / use a new explaining variable, e.g. Z = X − 10, and consider the regression of Y on Z:

Y(Z) = α* + βZ = α* + β(X − 10) = (α* − 10β) + βX

thus α = α* − 10β, i.e. α* = α + 10β is the y value on the original line at x = 10.

Interpretation of α* in the example: vcf for a diabetic with blood glucose 10 mmol/l.

10  Realization in SAS Analyst

New variable sukker10: Data/Transform/Compute, sukker10 = blodsuk − 10; then run the regression with blodsuk replaced by sukker10.

[SAS output: parameter estimates with standard errors, t values, Pr > |t|, 95% confidence limits and correlation of estimates for Intercept and sukker10; numeric values not recoverable]

Variance estimation

Estimate of σ², i.e. the variance around the regression line:

\[ \hat\sigma^2 = s^2 = \frac{1}{n-2} \sum_{i=1}^n (y_i - \hat\alpha - \hat\beta x_i)^2 \quad \left( = \frac{S_{yy} - \hat\beta S_{xy}}{n-2} \right) \]

Estimate of the standard deviation around the regression line / residual standard deviation (here called "Root Mean Square Error"):

\[ \hat\sigma = s = \sqrt{s^2} \]

How good / precise are the estimates of the unknown parameters α and β?

Slope: it can be shown that

\[ \hat\beta \sim N\!\left(\beta, \frac{\sigma^2}{S_{xx}}\right) \]

i.e. we get a precise estimate of the slope if
- the observations are close to the line
- the variation in the x values is large

The estimate s is used instead of σ, so

\[ \widehat{SE}(\hat\beta) = \frac{s}{\sqrt{S_{xx}}} \]

This is the estimated standard error of β̂.
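A sketch of the same re-parametrization and the residual standard deviation in R (again assuming the thuesen data; sukker10 mirrors the SAS variable name):

```r
## Re-parametrization Z = X - 10 and the residual standard deviation.
library(ISwR)

thuesen$sukker10 <- thuesen$blood.glucose - 10
fit10 <- lm(short.velocity ~ sukker10, data = thuesen)

coef(fit10)            # slope unchanged; intercept = fitted vcf at 10 mmol/l
summary(fit10)$sigma   # residual SD, SAS's "Root MSE"
```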

11  Intercept

Similarly,

\[ \hat\alpha \sim N\!\left(\alpha, \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right) \]

thus

\[ \widehat{SE}(\hat\alpha) = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} \]

Note: the two estimates α̂ and β̂ are correlated!

\[ \mathrm{Cov}(\hat\alpha, \hat\beta) = -\bar{x}\, \mathrm{Var}(\hat\beta) = -\frac{\sigma^2 \bar{x}}{S_{xx}} \]

If we center the covariate, i.e. use z_i = x_i − x̄ instead of x_i (call the new intercept α*), we get the estimates

β̂ = S_xy / S_xx,   α̂* = ȳ

These estimators are independent!

Tests and c.i.s for the slope

Typical null hypothesis: H_0: β = 0

Test statistic: T = β̂ / SE(β̂) ~ t_{n−2}

95% confidence interval: β̂ ± t_{97.5%, n−2} · SE(β̂)

In the example we get: β̂ = 0.0220, s² = 0.0470, SE(β̂) = 0.0105, n = 23, t_{97.5%, 21} = 2.080:

t = 0.0220/0.0105 = 2.10 ~ t_21, p = 0.048

95% confidence interval: 0.0220 ± 2.080 · 0.0105 = (0.0002, 0.0438)

Tests and c.i.s for the intercept

Null hypothesis: H_0: α = α_0

Test statistic: T = (α̂ − α_0) / SE(α̂) ~ t_{n−2}

95% c.i. for α: α̂ ± t_{97.5%, n−2} · SE(α̂), e.g. (0.854, 1.342); however, this is not so interesting ...

Instead, we could replace blodsuk by blodsuk − 10; then the new intercept estimate would be 1.317 (0.045), with 95% c.i. 1.317 ± 2.080 · 0.045 = (1.223, 1.411). This can be interpreted ...
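Assuming the thuesen data, the t tests and confidence intervals come directly from summary() and confint():

```r
## t tests and 95% confidence intervals for intercept and slope.
library(ISwR)

lm.velo <- lm(short.velocity ~ blood.glucose, data = thuesen)
summary(lm.velo)$coefficients   # estimates, SEs, t values, p values
confint(lm.velo)                # slope c.i. approx. (0.0002, 0.044)
```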

12

We can also test hypotheses on both α and β. But:
- don't rely on two tests at the same time
- don't accept two parallel hypotheses

Fitted / predicted values: ŷ(x) = α̂ + β̂x

Moreover, we can construct
- a confidence interval for the line itself, in order to compare with other groups of people; constructed with the help of SE(α̂) and SE(β̂), and their mutual covariance
- a prediction interval (normal region) for single observations, to use as a diagnostic tool; constructed with the help of SE(α̂) and SE(β̂), their mutual covariance, and s²

Confidence intervals for the line ŷ(x) = α̂ + β̂x:

\[ \mathrm{Var}(\hat{y}(x_0)) = \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right) \]

- large uncertainty if x_0 is far from x̄
- narrowest interval at x_0 = x̄

95% confidence intervals (pointwise, i.e. for each x_0):

\[ \hat\alpha + \hat\beta x_0 \pm t_{97.5\%,\, n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \]

These limits get arbitrarily narrow as the sample size increases. This is often irrelevant!

13  Regression line with c.i.s in SAS Analyst

In the regression setting click Statistics, and tick "Plot observed vs. independent" and "Confidence limits".

Prediction intervals

In which region will typical observations of y = α + βx_i + ε lie, given x = x_0?

\[ \mathrm{Var}(y(x_0) - \hat{y}(x_0)) = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right) \]

95% prediction intervals (pointwise):

\[ \hat\alpha + \hat\beta x_0 \pm t_{97.5\%,\, n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \]

Prediction limits in SAS Analyst: in the regression setting click Statistics, and tick "Plot observed vs. independent" and "Prediction limits".

Interpretation:
- The prediction intervals include about 95% of the future observations, also for large n.
- These limits don't get much narrower as the number of observations increases.
- They are used to assess whether a new person is atypical as compared to the norm.
- Again, they are narrowest at x_0 = x̄.
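A sketch of both interval types via predict() (assuming the thuesen data; the grid limits are arbitrary):

```r
## Pointwise 95% confidence and prediction bands along a glucose grid.
library(ISwR)

lm.velo <- lm(short.velocity ~ blood.glucose, data = thuesen)
grid <- data.frame(blood.glucose = seq(4, 20, by = 0.5))

ci <- predict(lm.velo, newdata = grid, interval = "confidence")
pi <- predict(lm.velo, newdata = grid, interval = "prediction")

matplot(grid$blood.glucose, cbind(ci, pi[, -1]), type = "l",
        lty = c(1, 2, 2, 3, 3), col = 1,
        xlab = "blodsuk (mmol/l)", ylab = "vcf (%/sec)")
points(short.velocity ~ blood.glucose, data = thuesen)
```

Note how the prediction band (dotted) stays wide even where the confidence band (dashed) is narrow, matching the interpretation above.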

14  Analysis of variance scheme in regression analysis

Underlying question: does x have an important impact as an explaining variable?

Estimate models with and without the explaining variable x, and compare their residual sums of squares (SS):

Without x:  Σ_{i=1}^n (y_i − ȳ)² = S_yy = SS_total
With x:     Σ_{i=1}^n (y_i − (α̂ + β̂x_i))² = SS_resid

x is a good explaining variable if SS_resid is small compared to SS_total.

Partition of the variation:

SS_total = Σ (y_i − ȳ)² = SS_resid + SS_model

Total variation = variation which cannot be explained + variation which can be explained.

Degrees of freedom: (n − 1) = (n − 2) + 1

Null hypothesis: H_0: β = 0. Test statistic:

\[ F = \frac{SS_{model}/1}{SS_{resid}/(n-2)} \sim F_{1,\, n-2} \text{ under } H_0 \]

Note:

\[ T = \frac{\hat\beta}{\widehat{SE}(\hat\beta)} = \sqrt{F} \]

Here: f = 4.414 ≈ 2.10² = t².

Coefficient of determination, R²

Proportion of the variation explained by the model as compared to the total variation (in y):

\[ R^2 = 1 - \frac{SS_{resid}}{SS_{total}} = \frac{SS_{model}}{SS_{total}} = \left( \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \right)^2 = r^2 \]

- For simple linear regression: the square of Corr(x, y), i.e. the grade of linear relationship.
- For multiple regression models: the square of Corr(ŷ, y).

Here: R² = 0.17 (r = 0.42).
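Assuming the thuesen data, the ANOVA table, R², and the identity R² = r² can be checked like this:

```r
## ANOVA table for the regression, and R^2 = r^2.
library(ISwR)

lm.velo <- lm(short.velocity ~ blood.glucose, data = thuesen)
anova(lm.velo)                 # F approx. 4.41 on 1 and 21 df, F = t^2
summary(lm.velo)$r.squared     # approx. 0.17
with(na.omit(thuesen), cor(blood.glucose, short.velocity)^2)  # the same
```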

15  Regression vs. correlation

If (X, Y) is bivariate normally distributed, then we can calculate the conditional distribution of Y given X = x. It can be seen that
- this is again a normal distribution
- E(Y | X = x) is linear in x
- Var(Y | X = x) is independent of x

This means that one can perform a linear regression analysis of Y on X, as well as calculate a correlation coefficient:

\[ \hat\beta = \frac{S_{xy}}{S_{xx}}, \qquad r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \qquad \hat\beta = r_{xy} \sqrt{\frac{S_{yy}}{S_{xx}}} \]

The test for β = 0 is identical to the test for ρ_xy = 0.

But take care:

\[ 1 - r_{xy}^2 = \frac{s^2}{s^2 + \hat\beta^2 S_{xx}/(n-2)} \]

If β̂ and s² are fixed and S_xx is large, then 1 − r²_xy is near 0, and r²_xy is near 1. Thus r²_xy can be arbitrarily close to 1 in case of strongly varying x values, e.g. if the central ones are left out. The correlation is irrelevant if the x values are influenced!
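A quick check (assuming the thuesen data) that the two tests coincide:

```r
## The t test for beta = 0 and the test for rho = 0 are the same test.
library(ISwR)

d <- na.omit(thuesen)
summary(lm(short.velocity ~ blood.glucose, data = d))$coefficients[2, ]
cor.test(d$blood.glucose, d$short.velocity)[c("statistic", "p.value")]
```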

16  Spurious correlation

- The correlation coefficient expresses a relationship, not agreement (e.g., there is a relationship between age and blood pressure, but of course no agreement).
- The number itself is only meaningful if we have sampled randomly from a well-defined population. For Pearson's correlation this population should be well described by a bivariate normal distribution. In case of selective sampling the number can be manipulated and (in theory) get arbitrarily close to 1 (or −1).
- Pearson's correlation measures the grade of a linear relationship. For nonlinear relationships one should use rank correlations (Spearman).
- The test of H_0: ρ = 0 (independence) is OK if the conditions for a linear regression are fulfilled.
- A statistically significant correlation can be theoretically interesting, but clinically uninteresting.
- The existence of a significant correlation between two variables does not necessarily mean that there is a causal relationship between them.

[Figure: X and Y are positively correlated for men and positively correlated for women, but negatively correlated for human beings as a whole]

17  Misuse of correlations; model checking in simple linear regression

[Figure: X and Y are apparently positively correlated, but uncorrelated within each age group; X and Y both increase with age]

Misuse of correlations

The correlation coefficient is very often used to measure relationships between two variables, but:
- the correlation coefficient expresses relationships, not agreement
- the correlation depends on the selection of the patients
- when comparing two measurement methods it is a completely senseless conclusion just to state that there is a significant relationship; of course there is one, since the same thing was measured twice!

Model checking in simple linear regression

The statistical model was

\[ Y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \text{ indep.} \]

What should we check here?
- linearity
- independence between the ε_i
- variance homogeneity (constant σ²)
- normally distributed errors ε_i

To this end we use the residuals (model deviations; observed − fitted values):

ε̂_i = y_i − ŷ_i

used mainly for graphical model checking.

Note: no assumption of normality for the x_i!

18  Residuals and residual plots

We have assumed that ε_i ~ N(0, σ²) independently, so we would expect the same to hold for the residuals ε̂_i = y_i − ŷ_i. This is not true!

- They are not independent (they sum to 0); this doesn't mean much if there are sufficiently many observations.
- They do not all have the same variance:

\[ \mathrm{Var}(\hat\varepsilon_i) = \sigma^2 (1 - h_{ii}), \qquad h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}} \]

where h_ii is the leverage of the ith observation.

Normalized / studentized residuals:

\[ r_i = \frac{\hat\varepsilon_i}{s \sqrt{1 - h_{ii}}}, \qquad \mathrm{Var}(r_i) \approx 1 \]

Residual plots

Residuals ε̂_i or r_i are plotted vs.
- the explaining variable x_i, to check linearity
- the fitted values ŷ_i, to check variance homogeneity and normality of the errors
- time or consecutively, to check independence
- normal scores (i.e. a probability plot) or a histogram, to check normality

The first three should give an impression of disorder (evenly scattered values, nothing systematic). The probability plot should fit a straight line. See the sketch below.

SAS Analyst: certain plots, e.g. for variance homogeneity, can be produced directly in the regression setting by clicking Plots/Residual and then choosing "Residual vs. Predicted".
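A sketch of these residual plots with studentized residuals (assuming the thuesen data):

```r
## The four residual plots, using studentized residuals.
library(ISwR)

lm.velo <- lm(short.velocity ~ blood.glucose, data = thuesen)

r  <- rstandard(lm.velo)              # eps_hat_i / (s * sqrt(1 - h_ii))
xv <- lm.velo$model$blood.glucose     # x values used in the fit

par(mfrow = c(2, 2))
plot(xv, r, xlab = "blodsuk", ylab = "stud. residual")              # linearity
plot(fitted(lm.velo), r, xlab = "fitted", ylab = "stud. residual")  # var. homogeneity
plot(r, type = "b", xlab = "observation no.", ylab = "stud. residual")  # independence
qqnorm(r); qqline(r)                                                # normality
```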

19  What happens if the assumptions don't hold?

- Linearity: the model gets uninterpretable → transformation, more explaining variables, non-linear regression
- Variance homogeneity: estimation is inefficient (the estimates have unnecessarily large variance) → transformation, weighted regression
- Independence: the variance estimate gets wrong → difficult (repeated measurements)
- Normally distributed errors: estimation is (a little) inefficient → transformation, robust regression

If linearity is dubious:
- add more covariates, e.g. a quadratic term blodsuk²:
  vcf = α + β₁·blodsuk + β₂·blodsuk²
  with test of linearity H_0: β₂ = 0 (see the sketch below)
- transform variables by logarithms, square root, or inverse
- non-linear regression
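A sketch of the quadratic extension and the linearity test (assuming the thuesen data; I(sukker10^2) plays the role of blodsuk²):

```r
## Quadratic extension and the test of linearity, H0: beta_2 = 0.
library(ISwR)

thuesen$sukker10 <- thuesen$blood.glucose - 10
fit2 <- lm(short.velocity ~ sukker10 + I(sukker10^2), data = thuesen)

summary(fit2)$coefficients   # the "I(sukker10^2)" row tests linearity
```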

20  Model with quadratic term

[SAS output: Dependent Variable: vcf; analysis-of-variance table; Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var; parameter estimates with standard errors, t values, Pr > |t| and 95% confidence limits for Intercept, sukker10 and blodsuk_i_anden (the squared term); correlation of estimates; numeric values not recoverable]

Variance homogeneity (homoscedasticity)

Var(ε_i) = σ², i = 1, ..., n: constant variance (or standard deviation).

Which alternatives could there be?
- Constant relative standard deviation = constant coefficient of variation (CV):
  CV = standard deviation / mean value
  often constant if one measures small positive quantities, e.g. concentrations; this will cause a trumpet shape in the residual plot → transform by logarithm
- Stratified experiment, e.g. in case of several instruments or laboratories; a difference in variances can be checked with Bartlett's test (cf. next week)

Normally distributed errors
- not critical for the fit itself; the least squares method yields the best estimate at any rate
- the t distribution is based on the normality assumption, but actually on the normality assumption for the estimate β̂, and this is often okay in case of sufficiently many observations, due to the central limit theorem, which states that sums (and certain other functions) of many observations get more and more normally distributed

21  Transformation

Logarithm, square root, inverse. Why take logarithms?

Of the explaining variable, to achieve linearity: if there are successive doublings which have a constant effect, use logarithms to the basis 2!

Of the response variable:
- to achieve linearity
- to achieve variance homogeneity:

\[ \mathrm{Var}(\log(y)) \approx \frac{\mathrm{Var}(y)}{y^2} \]

i.e. a constant coefficient of variation of Y means a constant variance of log(Y) (the natural logarithm, to the basis e).

Regression diagnostics

Are the conclusions supported by the whole data set? Or are there observations with a rather large influence on the results?

Leverage = potential influence (hat matrix):

\[ h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}} \]

Observations with extreme x values can have a large influence on the results, but they do not necessarily, if they lie nicely with respect to the regression line, i.e. have a small residual.

[Figure: y vs. x scatter plot illustrating a high-leverage point lying close to the regression line]

22  Influential observations

Those which have a combination of
- high leverage
- a large residual

Regression diagnostics

Leave out the ith person and find new estimates, α̂_(i) and β̂_(i). Calculate Cook's distance, an aggregate measure of the changes in the parameter estimates. Split Cook's distance into its coordinates and specify: by how many SEs is β̂ changed, e.g., if the ith person is left out?

What to do with influential observations?
- Leave them out?
  Regression with the whole data set: ŷ(x) = 1.10 + 0.022·x, β̂ = 0.022 (0.010), t = 2.1, p = 0.048
  Regression without obs. no. 13: β̂ = 0.011 (0.010), t = 1.05, p = 0.31
- State a measure of their influence?
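Assuming the thuesen data, Cook's distance, its dfbetas coordinates, and the fit without observation no. 13 can be obtained as follows:

```r
## Cook's distance, its dfbetas coordinates, and the fit without
## observation no. 13.
library(ISwR)

lm.velo <- lm(short.velocity ~ blood.glucose, data = thuesen)

cooks.distance(lm.velo)   # aggregate influence of each observation
dfbetas(lm.velo)          # change in alpha_hat / beta_hat, in SE units

lm.velo13 <- lm(short.velocity ~ blood.glucose, data = thuesen[-13, ])
coef(summary(lm.velo13))  # slope about halved, approx. 0.011
```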

23  Outliers and predicted residuals

[Figure: changes in the parameter estimates and in the predicted values when leaving out the ith observation: dfbetas(lm.velo)[, 1], dfbetas(lm.velo)[, 2] and dffits(lm.velo), each plotted against blood.glucose]

Outliers

Observations which don't fit into the relationship:
- they are not necessarily influential
- they don't necessarily have a large residual

Predicted residuals

Residuals obtained at each x_i when the corresponding observation (x_i, y_i) is excluded from the estimation; used for detecting outliers.

PRESS: Predicted REsidual Sum of Squares.

What to do with outliers?
- Look more closely at them; they are often quite interesting.

When can we exclude them?
- if they lie quite far away, i.e. have high leverage; remember to qualify the conclusions accordingly!
- if one can find the reason, and then all such observations should be excluded!
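A sketch of the predicted (leave-one-out) residuals and PRESS, using the standard identity ε̂_(i) = ε̂_i / (1 − h_ii) (assuming the thuesen data):

```r
## Predicted (leave-one-out) residuals and PRESS, via the identity
## e_(i) = e_i / (1 - h_ii).
library(ISwR)

lm.velo <- lm(short.velocity ~ blood.glucose, data = thuesen)

press_res <- residuals(lm.velo) / (1 - hatvalues(lm.velo))
sum(press_res^2)          # PRESS: predicted residual sum of squares
```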

24  Model checking and diagnostics in SAS Analyst

In the regression setting, use Save Data:
- tick "Create and save diagnostics data"
- insert (click Add) the quantities to be saved (typically: Predicted, Residual, Student, Rstudent, Cookd, Press)
- double-click "Diagnostics Table" in the project tree
- save it by clicking File/Save As By SAS Name
