Correlation. Bivariate normal densities with ρ = 0 (two-dimensional / bivariate normal density with correlation 0)
Correlation

Bivariate normal densities with ρ = 0.

Example: obesity index and blood pressure of n people randomly chosen from a population.

Correlation?
In everyday language: some sort of a relationship.
In mathematical language: a well-defined parameter.

Model: a sample from the distribution of (X, Y): (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n), assumed to come from a two-dimensional (bivariate) normal distribution:

$$ \begin{pmatrix} X_i \\ Y_i \end{pmatrix} \sim N_2\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix} \right) $$

ρ = ρ_xy is called the correlation; ρσ_xσ_y is called the covariance.
Bivariate densities with contour plots.

Example (contd.). Variables:
OBESE: obesity index, i.e. weight / ideal weight
BP: systolic blood pressure

Data set: one row per person, with columns OBS, SEX, OBESE, BP (the numeric values were not preserved in this transcription).

Scatter plot (different symbols for each sex).
Scatter plot after logarithmic transformation.

The correlation measures: to what extent does the plot look like a straight line? Not: how near the points are to the straight line.

The coefficient of correlation is estimated by

$$ r = r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum_{i=1}^n (x_i-\bar x)^2 \sum_{i=1}^n (y_i-\bar y)^2}} $$

where

$$ S_{xy} = \sum_{i=1}^n (x_i-\bar x)(y_i-\bar y), \qquad S_{xx} = \sum_{i=1}^n (x_i-\bar x)^2, \qquad S_{yy} = \sum_{i=1}^n (y_i-\bar y)^2. $$

r assumes values between -1 and 1:
0 corresponds to independence;
+1 and -1 correspond to a perfect linear relationship;
r > 0 (r < 0): positive (negative) slope.

Test of independence (no correlation), H_0: ρ_xy = 0. Given a sample: is r_xy = S_xy / √(S_xx S_yy) near 0? This is measured by

$$ T = \frac{r_{xy}\,\sqrt{n-2}}{\sqrt{1-r_{xy}^2}}. $$

Under H_0, i.e. if ρ_xy is equal to 0, T has a t distribution with n - 2 degrees of freedom.

Whether the correlation is significantly different from 0 depends on:
the magnitude of the true correlation ρ_xy;
the number of observations n;
chance.
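As a concrete illustration of the formulas above, here is a small Python sketch (my own code, not part of the course material, which uses SAS Analyst) computing r and the test statistic T:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def cor_t_statistic(x, y):
    """T = r * sqrt(n - 2) / sqrt(1 - r^2); ~ t_{n-2} under H0: rho = 0."""
    r = pearson_r(x, y)
    return r * sqrt(len(x) - 2) / sqrt(1 - r ** 2)
```

For instance, for the five points (1,2), (2,4), (3,5), (4,4), (5,5) one gets r ≈ 0.775 and T ≈ 2.12 on 3 degrees of freedom.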
The above correlation coefficient is based on the bivariate normal distribution: the so-called Pearson correlation.

SAS Analyst: Statistics/Descriptive/Correlations. Output: Correlation Analysis, 2 VAR Variables: LOBESE LBP; Simple Statistics (N, Mean, Std Dev, Sum, Minimum, Maximum for LOBESE and LBP); Pearson Correlation Coefficients / Prob > R under Ho: Rho=0 / N = 102. (The numeric values were not preserved in this transcription.)

Nonparametric correlation coefficients: Spearman's ρ and Kendall's τ.

SAS Analyst: Statistics/Descriptive/Correlations, Options, tick ... Output: Simple Statistics; Spearman Correlation Coefficients and Kendall Tau b Correlation Coefficients, each with Prob > R under Ho: Rho=0, N = 102. (Values again not preserved.)

Spearman's rank correlation: each variable is rank ordered on its own (obese → robese, bp → rbp). This gives the rank differences d_i (dif) for each person.

Estimated correlation coefficient (no ties):

$$ r_s = 1 - \frac{6\sum_i d_i^2}{n^3 - n} $$

r_s = 1 (-1): the scatter plot is strictly increasing (decreasing); not necessarily linear.

For the example we obtain: n = 102 and r_s = 0.30 (the value of Σ d_i² was not preserved).

Correction for ties: complicated, but ...
Correction for ties (Spearman): use the formula

$$ r_s = \frac{S_{r_x r_y}}{\sqrt{S_{r_x r_x}\,S_{r_y r_y}}} $$

where r_x and r_y denote the rank values robese and rbp, respectively.

Test of independence (no correlation), H_0: ρ_s = 0:

$$ T = \frac{r_s\,\sqrt{n-2}}{\sqrt{1-r_s^2}} $$

is approximately t_{n-2} distributed under H_0. The approximation holds for n ≥ 30; otherwise use tables.

Here: t = 3.14, n = 102, i.e. use t_100: p = 0.002.
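A Python sketch of Spearman's coefficient (my own illustration; all names are mine). It implements both the no-ties formula and the tie-corrected version, i.e. the Pearson correlation of the rank values; for brevity the ranking helper does not compute midranks, so both functions assume there are no ties:

```python
from math import sqrt

def ranks(v):
    """Ranks 1..n; assumes no ties (midranks are not handled)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_no_ties(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n^3 - n)."""
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    n = len(x)
    return 1 - 6 * d2 / (n ** 3 - n)

def spearman_as_pearson(x, y):
    """Tie-corrected form: the Pearson correlation of the rank values."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)
```

Without ties the two functions agree exactly; only the rank order of the data matters, not the actual values.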
Regression

Relationship between 2 continuous variables. Not necessarily causality!

Purpose of regression analysis:
prediction;
test of relationship;
estimation;
correction, when comparing inhomogeneous groups.

Y: response variable, dependent variable.
X: explaining variable, covariate.

DATA: paired observations of X and Y per row (individuals / units): (x_i, y_i), i = 1, ..., n. Note: the x_i's can be chosen beforehand!

Examples:

1. Relationship between cholinesterase activity (CE) and time till awakening (TIME).
Response: TIME. Explaining variable: CE.
Questions: How long is the expected time till awakening for a given value of CE? How large is the uncertainty of this prediction?

2. Comparison of lung capacity (FEV_1) for smokers and non-smokers.
Problem: FEV_1 also depends on, e.g., height.
Response: FEV_1. Explaining variables: height, smoking habits.
Question: How much worse is the lung function in smokers?
Example (DGA p. 300): Is there a relationship between fasting blood glucose level (blodsuk) and mean velocity of circumferential shortening of the left heart ventricle (vcf) in diabetics? (n = 23)

Response: Y = vcf, %/sec. Covariate: X = blodsuk, mmol/l.

Scatter plot: Graphs/Scatter Plot/Two-Dimensional; blodsuk → X Axis, vcf → Y Axis (here one can also choose titles for the axes).

(Data listing with columns OBS, BLODSUK, VCF; the values were not preserved in this transcription.)

Model for a straight line: Y(X) = α + βX.

Interpretation:
α: intercept (intersection with the Y axis), e.g. the vcf of a diabetic with blood glucose value 0. Often an inadmissible extrapolation!
β: slope, regression coefficient, e.g. the difference in vcf between two diabetics who differ in their blood glucose values by 1 mmol/l. Often the parameter of greatest interest.
Statistical model:

Y_i = Y(X_i) = α + βX_i + ε_i,  ε_i ~ N(0, σ²), independent.

Solution / least squares estimators: estimation of α and β is done via the least squares method. Determine α and β such that the sum of the squared vertical deviations,

$$ \sum_{i=1}^n \big(y_i - (\alpha + \beta x_i)\big)^2 = \sum_{i=1}^n \varepsilon_i^2, $$

gets as small as possible.

Slope:

$$ \hat\beta = \frac{S_{xy}}{S_{xx}} = \frac{s_{xy}}{s_x^2}, $$

where the empirical covariance

$$ s_{xy} = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y) = \frac{S_{xy}}{n-1} $$

is a measure of the co-variation between the observed X and Y values, and

$$ s_x^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x)^2 = \frac{S_{xx}}{n-1} $$

is the usual variance estimator for the X values.

Intercept: α̂ = ȳ - β̂ x̄.

Example, vcf vs. blood glucose: α̂ = 1.10, β̂ = 0.022, so the estimated regression line is vcf = 1.10 + 0.022 · blodsuk.

SAS Analyst: in the regression setting (cf. below), click Statistics and there tick Plot observed vs. independent.
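The least squares formulas translate directly into code; a minimal Python sketch (my own, for illustration):

```python
def least_squares(x, y):
    """Returns (alpha_hat, beta_hat): beta = Sxy/Sxx, alpha = ybar - beta*xbar."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta
```

On data lying exactly on y = 1 + 2x it recovers α̂ = 1 and β̂ = 2; on noisy data it returns the line minimizing the squared vertical deviations.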
Regression analysis in SAS Analyst: Statistics/Regression/Simple; vcf → Dependent, blodsuk → Explanatory; under Statistics tick Confidence limits for estimates and Correlation matrix of estimates.

Output: Dependent Variable: vcf. Analysis of Variance (Source = Model, Error, Corrected Total, with DF, Sum of Squares, Mean Square, F Value, Pr > F); Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var; Parameter Estimates for Intercept and blodsuk (Estimate, Standard Error, t Value, Pr > |t|, 95% Confidence Limits); Correlation of Estimates. (The numeric values were not preserved in this transcription.)

Re-parametrization: often there is no good interpretation of α̂. A good idea is to re-parametrize / use a new explaining variable, e.g. regression of Y on Z, where Z = X - 10:

Y(Z) = α' + βZ = α' + β(X - 10) = (α' - 10β) + βX,   (1)

thus α = α' - 10β, i.e. α' = α + 10β is the y value of the original line at x = 10.

Interpretation of α' in the example: the vcf of a diabetic with blood glucose 10 mmol/l.
Realization in SAS Analyst: create a new variable sukker10 via Data/Transform/Compute: sukker10 = blodsuk - 10; then run the regression with blodsuk replaced by sukker10. (Parameter Estimates for Intercept and sukker10, 95% Confidence Limits, and Correlation of Estimates; the values were not preserved in this transcription.)

Variance estimation: the estimate of σ², i.e. the variance around the regression line, is

$$ \hat\sigma^2 = s^2 = \frac{1}{n-2}\sum_{i=1}^n (y_i - \hat\alpha - \hat\beta x_i)^2 \;\Big(= \frac{S_{yy} - \hat\beta S_{xy}}{n-2}\Big). $$

The estimate of the standard deviation around the regression line, the residual standard deviation (here called Root Mean Square Error), is σ̂ = s = √(s²).

How good / precise are the estimates of the unknown parameters α and β?

Slope: it can be shown that

$$ \hat\beta \sim N\Big(\beta, \frac{\sigma^2}{S_{xx}}\Big), $$

i.e. we get a precise estimate of the slope if
the observations are close to the line;
the variation in the x values is large.

The estimate s is used instead of σ, so

$$ \widehat{SE}(\hat\beta) = \frac{s}{\sqrt{S_{xx}}}. $$

This is the estimated standard error of β̂.
Intercept: similarly,

$$ \hat\alpha \sim N\Big(\alpha,\; \sigma^2\Big(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\Big)\Big), \qquad \text{thus} \qquad \widehat{SE}(\hat\alpha) = s\sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}}. $$

Note: the two estimates α̂ and β̂ are correlated:

$$ \mathrm{Cov}(\hat\alpha, \hat\beta) = -\bar x\,\mathrm{Var}(\hat\beta) = -\frac{\sigma^2 \bar x}{S_{xx}}. $$

If we center the covariates, i.e. use z_i = x_i - x̄ instead of x_i (call the new intercept α'), we get the estimates β̂ = S_xy / S_xx and α̂' = ȳ. These estimators are independent!

Tests and confidence intervals for the slope. Typical null hypothesis: H_0: β = 0. Test statistic:

$$ T = \frac{\hat\beta}{\widehat{SE}(\hat\beta)} \sim t_{n-2}. $$

95% confidence interval: β̂ ± t_{97.5%, n-2} · SE(β̂).

In the example we get: β̂ = 0.0220, s² = 0.0470, SE(β̂) = 0.0105, n = 23, t_{97.5%, 21} = 2.080, t = 0.0220/0.0105 = 2.10 ~ t_21, p = 0.048. 95% confidence interval: 0.0220 ± 2.080 · 0.0105 = (0.0002, 0.0438).

Tests and confidence intervals for the intercept. Null hypothesis: H_0: α = α_0. Test statistic:

$$ T = \frac{\hat\alpha - \alpha_0}{\widehat{SE}(\hat\alpha)} \sim t_{n-2}. $$

95% c.i. for α: α̂ ± t_{97.5%, n-2} · SE(α̂), e.g. (0.854, 1.342). However, this is not so interesting... Instead, we could replace blodsuk by blodsuk - 10; then the new intercept estimate would be 1.317 (0.045), with 95% c.i. 1.317 ± 2.080 · 0.045 = (1.223, 1.411). This can be interpreted...
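A Python sketch (my own, not from the course) of the slope inference above; the t quantile has to be supplied from a table, since the Python standard library has no t distribution:

```python
from math import sqrt

def slope_inference(x, y, t_crit):
    """Returns (beta, se, t, (lo, hi)):
    s^2 = SS_resid/(n-2), SE(beta) = s/sqrt(Sxx), CI = beta +/- t_crit*SE,
    where t_crit is the 97.5% quantile of t_{n-2}, looked up externally."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    s = sqrt(sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    se = s / sqrt(sxx)
    t = beta / se
    return beta, se, t, (beta - t_crit * se, beta + t_crit * se)
```

The t statistic returned here coincides with the correlation test statistic T from the correlation pages: both test the same null hypothesis of no linear relationship.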
We can also test hypotheses on both α and β. But: don't rely on two tests at the same time; don't accept two parallel hypotheses.

Fitted / predicted values: ŷ(x) = α̂ + β̂x.

Moreover, we can construct:
a confidence interval for the line itself, in order to compare with other groups of people; constructed with the help of SE(α̂) and SE(β̂) and their mutual covariance;
a prediction interval (normal region) for single observations, to use as a diagnostic tool; constructed with the help of SE(α̂) and SE(β̂), their mutual covariance, and s².

Confidence intervals for the line ŷ(x_0) = α̂ + β̂x_0:

$$ \mathrm{Var}(\hat y(x_0)) = \sigma^2\Big(\frac{1}{n} + \frac{(x_0-\bar x)^2}{S_{xx}}\Big) $$

Large uncertainty if x_0 is far from x̄; the interval is narrowest at x_0 = x̄.

95% confidence intervals (pointwise, i.e. for each x_0):

$$ \hat\alpha + \hat\beta x_0 \pm t_{97.5\%,\,n-2}\; s\,\sqrt{\frac{1}{n} + \frac{(x_0-\bar x)^2}{S_{xx}}} $$

These limits get arbitrarily narrow as the sample size increases. This is often irrelevant!
Regression line with confidence intervals in SAS Analyst: in the regression setting, click Statistics and tick Plot observed vs. independent and Confidence limits.

Prediction intervals: in which region will typical observations of y = α + βx + ε lie, given x = x_0?

$$ \mathrm{Var}\big(y(x_0) - \hat y(x_0)\big) = \sigma^2\Big(1 + \frac{1}{n} + \frac{(x_0-\bar x)^2}{S_{xx}}\Big) $$

95% prediction intervals (pointwise):

$$ \hat\alpha + \hat\beta x_0 \pm t_{97.5\%,\,n-2}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar x)^2}{S_{xx}}} $$

Prediction limits in SAS Analyst: in the regression setting, click Statistics and tick Plot observed vs. independent and Prediction limits.

Interpretation: the prediction intervals include about 95% of the future observations, also for large n. These limits don't get much narrower as the number of observations increases. They are used to assess whether a new person is atypical compared to the norm. Again, they are narrowest at x_0 = x̄.
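The two kinds of interval can be compared numerically; a Python sketch (mine) returning the pointwise 95% half-widths of the confidence and prediction bands at a given x_0:

```python
from math import sqrt

def band_halfwidths(x, y, x0, t_crit):
    """Half-widths at x0: confidence band t*s*sqrt(1/n + (x0-xbar)^2/Sxx)
    and prediction band t*s*sqrt(1 + 1/n + (x0-xbar)^2/Sxx)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    s2 = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    g = 1 / n + (x0 - mx) ** 2 / sxx
    return t_crit * sqrt(s2 * g), t_crit * sqrt(s2 * (1 + g))
```

The prediction half-width is always the larger of the two (the extra "1 +" term), and both are smallest at x_0 = x̄.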
Analysis of variance scheme in regression analysis. Underlying question: does x have an important impact as an explaining variable? Estimate models with and without the explaining variable x.

Residual sum of squares (SS):
Without x: Σ_{i=1}^n (y_i - ȳ)² = S_yy = SS_total.
With x: Σ_{i=1}^n (y_i - (α̂ + β̂x_i))² = SS_resid.

x is a good explaining variable if SS_resid is small compared to SS_total.

Partition of the variation:

SS_total = Σ (y_i - ȳ)² = SS_resid + SS_model

Total variation = variation which cannot be explained + variation which can be explained.
Degrees of freedom: (n-1) = (n-2) + 1.

Null hypothesis: H_0: β = 0. Test statistic:

$$ F = \frac{SS_{model}/1}{SS_{resid}/(n-2)} \sim F_{1,\,n-2} \text{ under } H_0. $$

Note: T = β̂ / SE(β̂) = √F. Here: f = 4.414 = t² (with t = 2.10).

Coefficient of determination, R²: the proportion of the variation explained by the model compared to the total variation (in y):

$$ R^2 = 1 - \frac{SS_{resid}}{SS_{total}} = \frac{SS_{model}}{SS_{total}} = \Big(\frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\Big)^2 = r^2 $$

For simple linear regression: the square of Corr(x, y), i.e. the degree of linear relationship. For multiple regression models: the square of Corr(ŷ, y). Here: R² = 0.17 (r = 0.42).
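A Python sketch (my own) of the partition and of the identities F = t² and R² = r²:

```python
def regression_anova(x, y):
    """Returns (F, R2): SS_total = SS_model + SS_resid,
    F = (SS_model/1) / (SS_resid/(n-2)), R2 = SS_model/SS_total."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    ss_model = ss_total - ss_resid
    return ss_model / (ss_resid / (n - 2)), ss_model / ss_total
```

For example, for the five points (1,2), (2,4), (3,5), (4,4), (5,5) one gets F = 4.5 and R² = 0.6, matching t ≈ 2.121 (t² = 4.5) and r ≈ 0.775 (r² = 0.6).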
Regression vs. correlation. If (X, Y) is bivariate normally distributed, then we can calculate the conditional distribution of Y given X = x. It can be seen that this is again a normal distribution: E(Y | X = x) is linear in x, and Var(Y | X = x) is independent of x. This means that one can perform a linear regression analysis of Y on X, as well as calculate a correlation coefficient:

$$ \hat\beta = \frac{S_{xy}}{S_{xx}}, \qquad r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}, \qquad \hat\beta = r_{xy}\sqrt{\frac{S_{yy}}{S_{xx}}}. $$

The test for β = 0 is identical to the test for ρ_xy = 0.

But take care: if β̂ and s² are fixed, then

$$ 1 - r_{xy}^2 = \frac{s^2}{s^2 + \hat\beta^2\,\frac{S_{xx}}{n-2}}. $$

If S_xx is large, then 1 - r²_xy is near 0, and r²_xy is near 1. Thus r²_xy can be arbitrarily close to 1 in case of strongly varying x values, e.g. if the central ones are left out. The correlation is irrelevant if the x values are influenced (chosen by design)!
Spurious correlation.

The correlation coefficient expresses a relationship, not agreement (e.g. there is a relationship between age and blood pressure, but of course there is no agreement).

The number itself is only meaningful if we have sampled randomly from a well-defined population. For Pearson's correlation this population should be well described by a bivariate normal distribution. In case of selective sampling the number can be manipulated and (in theory) get arbitrarily close to 1 (or -1).

Pearson's correlation measures the degree of a linear relationship. For nonlinear relationships one should use rank correlations (Spearman).

The test of H_0: ρ = 0 (independence) is OK if the conditions for a linear regression are fulfilled.

A statistically significant correlation can be theoretically interesting, but clinically uninteresting. The existence of a significant correlation between two variables does not necessarily mean that there is a causal relationship between them. For example, X and Y can be positively correlated for men and positively correlated for women, yet negatively correlated for human beings as a whole.
(Continuing the example: X and Y are apparently positively correlated, but uncorrelated within each age group; X and Y both increase with age.)

Misuse of correlations. The correlation coefficient is very often used to measure relationships between two variables, but:
the correlation coefficient expresses relationships, not agreement;
the correlation depends on the selection of the patients;
when comparing two measurement methods it is a completely senseless conclusion just to state that there is a significant relationship. Of course there is one, since the same thing was measured twice!

Model checking in simple linear regression. The statistical model was

Y_i = α + βx_i + ε_i,  ε_i ~ N(0, σ²), independent.

What should we check here?
linearity;
independence between the ε_i;
variance homogeneity (constant σ²);
normally distributed errors ε_i.

To this end we use the residuals (model deviations; observed minus fitted values): ε̂_i = y_i - ŷ_i, used mainly for graphical model checking. Note: there is no assumption of normality for the x_i!
We have assumed that the ε_i ~ N(0, σ²) are independent, so we would expect the same to hold for the residuals ε̂_i = y_i - ŷ_i. This is not true!
They are not independent (they sum to 0); this doesn't mean much if there are sufficiently many observations.
They do not all have the same variance: Var(ε̂_i) = σ²(1 - h_ii), where

$$ h_{ii} = \frac{1}{n} + \frac{(x_i-\bar x)^2}{S_{xx}} $$

is the leverage of the i-th observation.

Normalized / studentized residuals:

$$ r_i = \frac{\hat\varepsilon_i}{s\sqrt{1-h_{ii}}}, \qquad \mathrm{Var}(r_i) \approx 1. $$

Residual plots: the residuals ε̂_i or r_i are plotted vs.
the explaining variable x_i, to check linearity;
the fitted values ŷ_i, to check variance homogeneity and normality of the errors;
time, or consecutively, to check independence;
normal scores (i.e. a probability plot) or a histogram, to check normality.

The first three should give an impression of disorder (evenly scattered values, nothing systematic). The probability plot should fit a straight line.

SAS Analyst (variance homogeneity?): certain plots can be produced directly in the regression setting, by clicking Plots/Residual and then choosing Residual vs. Predicted.
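Leverage and studentized residuals follow directly from the formulas above; a Python sketch (mine, for illustration):

```python
from math import sqrt

def leverages(x):
    """h_ii = 1/n + (x_i - xbar)^2 / Sxx."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]

def studentized(x, y):
    """r_i = e_i / (s * sqrt(1 - h_ii))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    s = sqrt(sum(e ** 2 for e in resid) / (n - 2))
    return [e / (s * sqrt(1 - h)) for e, h in zip(resid, leverages(x))]
```

In simple linear regression the leverages always sum to 2 (the number of parameters in the model), so their average is 2/n; this gives a quick sanity check.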
What happens if the assumptions don't hold?

Linearity: the model gets uninterpretable. Remedies: transformation, more explaining variables, non-linear regression.
Variance homogeneity: estimation is inefficient (estimates have unnecessarily large variance). Remedies: transformation, weighted regression.
Independence: the variance estimate gets wrong. Difficult (repeated measurements).
Normally distributed errors: estimation is inefficient (a little). Remedies: transformation, robust regression.

If linearity is dubious:
add more covariates, e.g. a quadratic term blodsuk²: vcf = α + β₁·blodsuk + β₂·blodsuk², with the test of linearity H_0: β₂ = 0 (or another covariate, e.g. alder = age);
transform variables by logarithms, square root, or inverse;
non-linear regression.
Model with quadratic term. Output: Dependent Variable: vcf; Analysis of Variance (Model, Error, Corrected Total); Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var; Parameter Estimates and 95% Confidence Limits for Intercept, sukker10 and blodsuk_i_anden (= blodsuk²); Correlation of Estimates. (The numeric values were not preserved in this transcription.)

Variance homogeneity (homoscedasticity): Var(ε_i) = σ², i = 1, ..., n, i.e. constant variance (or standard deviation). Which alternatives could there be?

Constant relative standard deviation, i.e. constant coefficient of variation (CV):

CV = standard deviation / mean value

This is often constant if one measures small positive quantities, e.g. concentrations; it will cause a trumpet shape in the residual plot; transform by logarithm.

Stratified experiment, e.g. in case of several instruments or laboratories: differences in variances can be checked with Bartlett's test (cf. next week).

Normally distributed errors: not critical for the fit itself; the least squares method yields the best estimate at any rate. The t distribution is based on the normality assumption, but actually on the normality assumption for the estimate β̂, and this is often okay in case of sufficiently many observations, due to the central limit theorem, which states that sums (and certain other functions) of many observations get more and more normally distributed.
Transformation: logarithm, square root, inverse. Why take logarithms?

Of the explaining variable, to achieve linearity: if there are successive doublings which have a constant effect, use logarithms to the base 2!

Of the response variable, to achieve linearity, or to achieve variance homogeneity:

$$ \mathrm{Var}(\log(y)) \approx \frac{\mathrm{Var}(y)}{y^2}, $$

i.e. a constant coefficient of variation of Y means a constant variance of log(Y) (the natural logarithm, to the base e).

Regression diagnostics: are the conclusions supported by the whole data set? Or are there observations with a rather large influence on the results?

Leverage = potential influence (from the hat matrix):

$$ h_{ii} = \frac{1}{n} + \frac{(x_i-\bar x)^2}{S_{xx}} $$

Observations with extreme x values can have a large influence on the results, but they do not necessarily: if they lie nicely with respect to the regression line, i.e. have a small residual, they need not affect the fit much.
Influential observations: those which have a combination of high leverage and a large residual.

Regression diagnostics: leave out the i-th person and find new estimates α̂_(i) and β̂_(i). Calculate Cook's distance, an aggregate measure of the changes in the parameter estimates. Split Cook's distance into its coordinates and specify: by how many SEs does β̂ change, e.g., if the i-th person is left out?

What to do with influential observations? Leave them out? State a measure of their influence?

Regression with the whole data set: ŷ(x) = 1.10 + 0.022·x, β̂ = 0.022 (0.010), t = 2.1, p = 0.048.
Regression without observation no. 13: β̂ = 0.011 (0.010), t = 1.05, i.e. far from significant.
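The leave-one-out idea behind Cook's distance and dfbetas is easy to sketch in Python (my own illustration of the principle, not the exact SAS computation):

```python
def loo_slopes(x, y):
    """beta_hat_(i): the slope re-estimated with observation i left out."""
    def slope(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        return sxy / sxx
    return [slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]
```

Comparing each β̂_(i) with β̂ (scaled by SE(β̂)) gives a dfbetas-type measure. For points lying exactly on a line, leaving any one out changes nothing; an influential point shifts the slope noticeably when removed.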
(Figure: changes in the parameter estimates and in the predicted values when the i-th observation is left out; dfbetas(lm.velo)[, 1], dfbetas(lm.velo)[, 2] and dffits(lm.velo) plotted against blood.glucose.)

Outliers: observations which don't fit into the relationship. They are not necessarily influential, and they don't necessarily have a large residual.

Predicted residuals: the residuals obtained at each x_i if the corresponding observation (x_i, y_i) is excluded from the estimation; used for detecting outliers. PRESS: Predicted Residuals SS.

What to do with outliers? Look more closely at them; they are often quite interesting. When can we exclude them? If they lie quite far away, i.e. have high leverage (remember to qualify the conclusions accordingly!), or if one can find the reason; and then all observations with that reason should be excluded!
Model checking and diagnostics in SAS Analyst: in the regression setting, use Save Data; tick Create and save diagnostics data; insert (click Add) the quantities to be saved (typically: Predicted, Residual, Student, Rstudent, Cookd, Press). Double-click the Diagnostics Table in the project tree, and save it by clicking File/Save as By SAS Name.
Faculty of Health Sciences
Statistics for exp. medical researchers: Regression and Correlation
Lene Theil Skovgaard, Sept. 28, 2015
Linear regression, estimation and testing, confidence ...
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple
More informationInference for Regression Inference about the Regression Model and Using the Regression Line
Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about
More information6. Multiple regression - PROC GLM
Use of SAS - November 2016 6. Multiple regression - PROC GLM Karl Bang Christensen Department of Biostatistics, University of Copenhagen. http://biostat.ku.dk/~kach/sas2016/ kach@biostat.ku.dk, tel: 35327491
More information9 Correlation and Regression
9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the
More informationUnit 6 - Introduction to linear regression
Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,
More informationLinear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).
Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation
More informationMAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik
MAT2377 Rafa l Kulik Version 2015/November/26 Rafa l Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
More information14 Multiple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in
More informationWORKSHOP 3 Measuring Association
WORKSHOP 3 Measuring Association Concepts Analysing Categorical Data o Testing of Proportions o Contingency Tables & Tests o Odds Ratios Linear Association Measures o Correlation o Simple Linear Regression
More informationSimple Linear Regression Analysis
LINEAR REGRESSION ANALYSIS MODULE II Lecture - 6 Simple Linear Regression Analysis Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Prediction of values of study
More informationStatistical Modelling in Stata 5: Linear Models
Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does
More informationINTRODUCING LINEAR REGRESSION MODELS Response or Dependent variable y
INTRODUCING LINEAR REGRESSION MODELS Response or Dependent variable y Predictor or Independent variable x Model with error: for i = 1,..., n, y i = α + βx i + ε i ε i : independent errors (sampling, measurement,
More informationCorrelation and Regression
Correlation and Regression 1 Overview Introduction Scatter Plots Correlation Regression Coefficient of Determination 2 Objectives of the topic 1. Draw a scatter plot for a set of ordered pairs. 2. Compute
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationSTATISTICS 479 Exam II (100 points)
Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationMidterm 2 - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put
More informationInferences for Regression
Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In
More informationSTAT2012 Statistical Tests 23 Regression analysis: method of least squares
23 Regression analysis: method of least squares L23 Regression analysis The main purpose of regression is to explore the dependence of one variable (Y ) on another variable (X). 23.1 Introduction (P.532-555)
More informationAcknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression
INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical
More informationOutline. Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping
Topic 19: Remedies Outline Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping Regression Diagnostics Summary Check normality of the residuals
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the
More informationLecture 4 Multiple linear regression
Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters
More information6. CORRELATION SCATTER PLOTS. PEARSON S CORRELATION COEFFICIENT: Definition
6. CORRELATION Scatter plots Pearson s correlation coefficient (r ). Definition Hypothesis test & CI Spearman s rank correlation coefficient rho (ρ) Correlation & causation Misuse of correlation Two techniques
More informationChapter 8: Correlation & Regression
Chapter 8: Correlation & Regression We can think of ANOVA and the two-sample t-test as applicable to situations where there is a response variable which is quantitative, and another variable that indicates
More informationLINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises
LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on
More informationChapter 16. Simple Linear Regression and dcorrelation
Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationSTAT 4385 Topic 03: Simple Linear Regression
STAT 4385 Topic 03: Simple Linear Regression Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2017 Outline The Set-Up Exploratory Data Analysis
More informationLecture 6 Multiple Linear Regression, cont.
Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression
More informationNonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown
Nonparametric Statistics Leah Wright, Tyler Ross, Taylor Brown Before we get to nonparametric statistics, what are parametric statistics? These statistics estimate and test population means, while holding
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationAny of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.
STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed
More informationCh 3: Multiple Linear Regression
Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery
More informationAn overview of applied econometrics
An overview of applied econometrics Jo Thori Lind September 4, 2011 1 Introduction This note is intended as a brief overview of what is necessary to read and understand journal articles with empirical
More informationECON3150/4150 Spring 2016
ECON3150/4150 Spring 2016 Lecture 6 Multiple regression model Siv-Elisabeth Skjelbred University of Oslo February 5th Last updated: February 3, 2016 1 / 49 Outline Multiple linear regression model and
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More informationSTAT 7030: Categorical Data Analysis
STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012
More informationAMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression
AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number
More informationFinal Exam - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your
More informationData Analysis and Statistical Methods Statistics 651
y 1 2 3 4 5 6 7 x Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 32 Suhasini Subba Rao Previous lecture We are interested in whether a dependent
More informationSTA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007
STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.
More informationRegression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.
TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted
More informationECON3150/4150 Spring 2016
ECON3150/4150 Spring 2016 Lecture 4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo Last updated: January 26, 2016 1 / 49 Overview These lecture slides covers: The linear regression
More informationRegression Models - Introduction
Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent
More information401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis
More informationy response variable x 1, x 2,, x k -- a set of explanatory variables
11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate
More informationMy data doesn t look like that..
Testing assumptions My data doesn t look like that.. We have made a big deal about testing model assumptions each week. Bill Pine Testing assumptions Testing assumptions We have made a big deal about testing
More informationMATH c UNIVERSITY OF LEEDS Examination for the Module MATH1725 (May-June 2009) INTRODUCTION TO STATISTICS. Time allowed: 2 hours
01 This question paper consists of 11 printed pages, each of which is identified by the reference. Only approved basic scientific calculators may be used. Statistical tables are provided at the end of
More informationMulticollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.
Multicollinearity Read Section 7.5 in textbook. Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Example of multicollinear
More informationStatistics 5100 Spring 2018 Exam 1
Statistics 5100 Spring 2018 Exam 1 Directions: You have 60 minutes to complete the exam. Be sure to answer every question, and do not spend too much time on any part of any question. Be concise with all
More information