Lecture 1: Linear Models and Applications
Lecture 1: Linear Models and Applications
Claudia Czado, TU München
© (Claudia Czado, TU Munich) ZFS/IMS Göttingen
Overview
- Introduction to linear models
- Exploratory data analysis (EDA)
- Estimation and testing in linear models
- Regression diagnostics
Introduction to linear models

The goal of regression models is to determine how a response variable depends on covariates. A special class of regression models are linear models. The general setup is given by

Data:       (Y_i, x_{i1}, ..., x_{ik}), i = 1, ..., n
Responses:  Y = (Y_1, ..., Y_n)^T
Covariates: x_i = (x_{i1}, ..., x_{ik})^T (known)
Example: Life expectancies from 40 countries

Source: The Economist's Book of World Statistics, 1990, Time Books; The World Almanac of Facts, 1992, World Almanac Books.

LIFE.EXP   Life expectancy at birth
TEMP       Temperature in degrees Fahrenheit
URBAN      Percent of population living in urban areas
HOSP.POP   Number of hospitals per population
COUNTRY    The name of the country

Which is the response and which are the covariates?
Linear models (LM) under normality assumption

Y_i = β_0 + β_1 x_{i1} + ... + β_k x_{ik} + ε_i,  ε_i ~ N(0, σ²) iid, i = 1, ..., n,

where the unknown regression parameters β_0, ..., β_k and the unknown error variance σ² need to be estimated. Note that E(Y_i) is a linear function in β_0, ..., β_k:

E(Y_i) = β_0 + β_1 x_{i1} + ... + β_k x_{ik}
Var(Y_i) = Var(ε_i) = σ²   (variance homogeneity)
LMs in matrix notation

Y = Xβ + ε,  X ∈ R^{n×p}, p = k + 1, β = (β_0, ..., β_k)^T
E(Y) = Xβ,  Var(Y) = σ² I_n

Under the normality assumption we have Y ~ N_n(Xβ, σ² I_n), where N_n(μ, Σ) is the n-dimensional normal distribution with mean vector μ and covariance matrix Σ. The matrix X is also called the design matrix.
Exploratory data analysis (EDA)

- Consider the ranges of the responses and covariates. When covariates are discrete, group covariate levels if they are sparse.
- Plot the covariates against the responses. These scatter plots should look linear; otherwise consider transformations of the covariates.
- To check whether the constant variance assumption is reasonable, the scatter plots of covariates against the responses should be contained in a band.
Example: Life expectancies: Data summaries

> summary(health.data)
    LIFE.EXP        TEMP         URBAN
 Min.   :37    Min.   :36    Min.   :14
 1st Qu.:56    1st Qu.:52    1st Qu.:42
 Median :67    Median :61    Median :62
 Mean   :65    Mean   :62    Mean   :59
 3rd Qu.:73    3rd Qu.:73    3rd Qu.:76
 Max.   :79    Max.   :85    Max.   :96
               NA's   : 1
    HOSP.POP
 Min.   :
 1st Qu.:
 Median :
 Mean   :
 3rd Qu.:
 Max.   :
 NA's   :

NA = not available (missing data)
Example: Life expectancies: EDA

[Scatterplot matrix of LIFE.EXP, TEMP, URBAN, HOSP.POP]

- LIFE.EXP decreases if TEMP increases
- LIFE.EXP increases if URBAN increases
- LIFE.EXP increases if HOSP.POP increases

Check for nonlinearities and nonconstant variances.
Example: Life expectancies: Transformations

[Scatterplots of LIFE.EXP against HOSP.POP and against log(HOSP.POP), with countries labeled, e.g. France, United States, Japan, Australia, New Zealand, Norway, Egypt, China, Tonga, Papua New Guinea, India]

The logarithm makes the relationship more linear.
Parameter estimation of β

Under the normality assumption it is enough to minimize Q(β) := ||Y − Xβ||² over β ∈ R^p to calculate the maximum likelihood estimate (MLE) of β. An estimator which minimizes Q(β) is also called a least squares estimator (LSE). If the matrix X ∈ R^{n×p} has full rank p, the minimum of Q(β) is attained at

β̂ = (X^T X)^{-1} X^T Y

SS_Res := Σ_{i=1}^n (Y_i − x_i^T β̂)² = Q(β̂)   residual sum of squares
Ŷ_i := x_i^T β̂   fitted value for μ_i := E(Y_i)
e_i := Y_i − Ŷ_i   raw residual

It follows that E(β̂) = β and Var(β̂) = σ² (X^T X)^{-1}; therefore one has under normality

β̂ ~ N_p(β, σ² (X^T X)^{-1}).
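The least squares estimate and the quantities above can be sketched numerically; the following is a minimal illustration in Python/numpy (the lecture itself uses Splus) on a made-up toy dataset, not the lecture's health data.

```python
import numpy as np

# Made-up toy dataset: n = 8 observations, intercept plus k = 2 covariates.
X = np.column_stack([np.ones(8),                 # intercept column
                     [1., 2, 3, 4, 5, 6, 7, 8],  # covariate x1
                     [2., 1, 4, 3, 6, 5, 8, 7]]) # covariate x2
y = np.array([3., 4, 6, 5, 8, 9, 11, 10])

# LSE: beta_hat = (X^T X)^{-1} X^T Y, assuming X has full rank p
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_fit = X @ beta_hat   # fitted values Y_hat_i = x_i^T beta_hat
e = y - y_fit          # raw residuals e_i
ss_res = e @ e         # residual sum of squares Q(beta_hat)

# Cross-check against numpy's built-in least squares solver.
beta_check, *_ = np.linalg.lstsq(X, y, rcond=None)
```

By the normal equations the residual vector is orthogonal to every column of X, which is a quick sanity check on any implementation.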
Parameter estimation of σ²

The MLE of σ² is given by

σ̂² := (1/n) Σ_{i=1}^n (Y_i − x_i^T β̂)² = (1/n) Σ_{i=1}^n (Y_i − Ŷ_i)²

One can show that this estimator is biased; in particular E(σ̂²) = ((n − p)/n) σ². An unbiased estimator for σ² is therefore given by

s² := (1/(n − p)) Σ_{i=1}^n (Y_i − Ŷ_i)² = (n/(n − p)) σ̂²

Under normality it follows that

(n − p) s² / σ² = Σ_{i=1}^n (Y_i − Ŷ_i)² / σ² ~ χ²_{n−p}

and is independent of β̂.
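The relation s² = (n/(n − p)) σ̂² between the biased MLE and the unbiased estimator can be checked directly; a small sketch on the same kind of made-up toy data as above:

```python
import numpy as np

# Made-up toy dataset (assumed numbers, not from the lecture).
X = np.column_stack([np.ones(8),
                     [1., 2, 3, 4, 5, 6, 7, 8],
                     [2., 1, 4, 3, 6, 5, 8, 7]])
y = np.array([3., 4, 6, 5, 8, 9, 11, 10])
n, p = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ss_res = np.sum((y - X @ beta_hat) ** 2)

sigma2_mle = ss_res / n   # biased MLE of sigma^2, divides by n
s2 = ss_res / (n - p)     # unbiased estimator s^2, divides by n - p
```

Since n/(n − p) > 1, the MLE always underestimates relative to s² on the same data.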
Goodness of fit in linear models

Consider

SS_Total = Σ_{i=1}^n (Y_i − Ȳ)²,  SS_Reg = Σ_{i=1}^n (Ŷ_i − Ȳ)²,  SS_Res = Σ_{i=1}^n (Y_i − Ŷ_i)²,

where Ȳ = (1/n) Σ_{i=1}^n Y_i. It follows that SS_Total = SS_Reg + SS_Res; therefore

R² := SS_Reg / SS_Total = 1 − SS_Res / SS_Total

gives the proportion of variability explained by the regression. R² is called the multiple coefficient of determination.
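The decomposition SS_Total = SS_Reg + SS_Res (which relies on the model containing an intercept) and the two equivalent forms of R² can be verified numerically; a sketch on a made-up toy dataset:

```python
import numpy as np

# Made-up toy dataset; the model includes an intercept column.
X = np.column_stack([np.ones(8),
                     [1., 2, 3, 4, 5, 6, 7, 8],
                     [2., 1, 4, 3, 6, 5, 8, 7]])
y = np.array([3., 4, 6, 5, 8, 9, 11, 10])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_fit = X @ beta_hat
y_bar = y.mean()

ss_total = np.sum((y - y_bar) ** 2)   # total variability
ss_reg = np.sum((y_fit - y_bar) ** 2) # variability explained by the regression
ss_res = np.sum((y - y_fit) ** 2)     # residual variability

r2 = ss_reg / ss_total                # multiple coefficient of determination
```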
Statistical hypothesis tests in LMs: F-test

The restriction H_0: Cβ = d for C ∈ R^{r×p}, rank(C) = r, d ∈ R^r is called a general linear hypothesis, with alternative H_1: not H_0. Consider the restricted least squares problem

SS_{H_0} = min_{β: Cβ = d} ||Y − Xβ||²   (minimization under H_0)
         = SS_Res + (Cβ̂ − d)^T [C (X^T X)^{-1} C^T]^{-1} (Cβ̂ − d)

Under normality, and if H_0: Cβ = d is valid, we have

F = ((SS_{H_0} − SS_Res)/r) / (SS_Res/(n − p)) ~ F_{r, n−p},

therefore the F-test is given by: reject H_0 at level α if F > F_{1−α, r, n−p}. Here F_{n,m} denotes the F distribution with n numerator and m denominator degrees of freedom, and F_{1−α, n, m} is the corresponding 1 − α quantile.
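The closed form for SS_{H_0} can be checked against a direct restricted fit; the sketch below (made-up toy data, hypothesis H_0: β_1 = β_2 chosen for illustration) computes SS_{H_0} both ways and the resulting F statistic:

```python
import numpy as np

# Made-up toy dataset.
X = np.column_stack([np.ones(8),
                     [1., 2, 3, 4, 5, 6, 7, 8],
                     [2., 1, 4, 3, 6, 5, 8, 7]])
y = np.array([3., 4, 6, 5, 8, 9, 11, 10])
n, p = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ss_res = np.sum((y - X @ beta_hat) ** 2)

# General linear hypothesis H0: beta_1 = beta_2, i.e. C beta = d
C = np.array([[0., 1., -1.]])   # r = 1 restriction
d = np.array([0.])
r = C.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
dev = C @ beta_hat - d
# Closed form: SS_H0 = SS_Res + (C b - d)^T [C (X^T X)^{-1} C^T]^{-1} (C b - d)
ss_h0 = ss_res + dev @ np.linalg.solve(C @ XtX_inv @ C.T, dev)

# Same SS_H0 via a direct restricted fit: under beta_1 = beta_2 the model
# collapses to y = beta_0 + beta_1 * (x1 + x2) + eps.
X_restr = np.column_stack([np.ones(8), X[:, 1] + X[:, 2]])
b_restr = np.linalg.solve(X_restr.T @ X_restr, X_restr.T @ y)
ss_h0_direct = np.sum((y - X_restr @ b_restr) ** 2)

F = ((ss_h0 - ss_res) / r) / (ss_res / (n - p))
```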
Statistical hypothesis tests in LMs: t-test

Consider for each regression parameter β_j the hypothesis H_{0j}: β_j = 0 against H_{1j}: not H_{0j}, and use the corresponding F statistic

F_j := ((SS_{H_{0j}} − SS_Res)/1) / (SS_Res/(n − p)) ~ F_{1, n−p} under H_{0j}.

It follows that F_j = T_j², where

T_j := β̂_j / ŝe(β̂_j) ~ t_{n−p} under H_{0j},

and ŝe(β̂_j) is the estimated standard error of β̂_j, i.e. ŝe(β̂_j) = s √((X^T X)^{-1}_{jj}).
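The identity F_j = T_j² can be checked numerically: restricting β_j = 0 is the same as refitting without column j, which gives SS_{H_{0j}} directly. A sketch on made-up toy data:

```python
import numpy as np

# Made-up toy dataset.
X = np.column_stack([np.ones(8),
                     [1., 2, 3, 4, 5, 6, 7, 8],
                     [2., 1, 4, 3, 6, 5, 8, 7]])
y = np.array([3., 4, 6, 5, 8, 9, 11, 10])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
ss_res = np.sum((y - X @ beta_hat) ** 2)
s = np.sqrt(ss_res / (n - p))

j = 2                               # test H0j: beta_2 = 0
se_j = s * np.sqrt(XtX_inv[j, j])   # estimated standard error of beta_hat_j
T_j = beta_hat[j] / se_j

# SS_H0j via refitting without column j (equivalent to restricting beta_j = 0)
X0 = np.delete(X, j, axis=1)
b0 = np.linalg.solve(X0.T @ X0, X0.T @ y)
ss_h0j = np.sum((y - X0 @ b0) ** 2)
F_j = (ss_h0j - ss_res) / (ss_res / (n - p))
```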
Weighted Least Squares

Y = Xβ + ε,  ε ~ N_n(0, σ² W),  X ∈ R^{n×p}, p = k + 1, β = (β_0, ..., β_k)^T

The MLE of β (weighted LSE) is given by

β̂ = (X^T W^{-1} X)^{-1} X^T W^{-1} Y.

It follows that E(β̂) = β and Var(β̂) = σ² (X^T W^{-1} X)^{-1}. The MLE of σ² is given by

σ̂² := (1/n) (Y − Xβ̂)^T W^{-1} (Y − Xβ̂) = (1/n) e^T W^{-1} e.
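For a diagonal W, the weighted LSE is the same as ordinary least squares on data premultiplied by W^{-1/2}, which is how it is often computed in practice. A sketch with made-up toy data and an assumed weight vector:

```python
import numpy as np

# Made-up toy dataset with an assumed diagonal weight matrix W,
# i.e. Var(eps_i) = sigma^2 * w_i.
X = np.column_stack([np.ones(8),
                     [1., 2, 3, 4, 5, 6, 7, 8],
                     [2., 1, 4, 3, 6, 5, 8, 7]])
y = np.array([3., 4, 6, 5, 8, 9, 11, 10])
w = np.array([1., 2, 1, 4, 1, 2, 1, 4])
W_inv = np.diag(1.0 / w)

# Weighted LSE: beta_hat = (X^T W^{-1} X)^{-1} X^T W^{-1} Y
beta_w = np.linalg.solve(X.T @ W_inv @ X, X.T @ W_inv @ y)

# Equivalent view: ordinary LS after rescaling rows by 1/sqrt(w_i),
# i.e. premultiplying by W^{-1/2}.
X_t = X / np.sqrt(w)[:, None]
y_t = y / np.sqrt(w)
beta_t = np.linalg.solve(X_t.T @ X_t, X_t.T @ y_t)
```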
Example: Life expectancies: First models
Using the original scale for HOSP.POP

> attach(health.data)
> f1 <- LIFE.EXP ~ TEMP + URBAN + HOSP.POP
> r1 <- lm(f1, na.action = na.omit)
> summary(r1)

Call: lm(formula = f1, na.action = na.omit)
Residuals:
    Min     1Q Median     3Q    Max
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)
TEMP
URBAN
HOSP.POP

Residual standard error: 7.3 on 29 degrees of freedom
Multiple R-Squared: 0.46
F-statistic: 8.3 on 3 and 29 degrees of freedom, the p-value is
Correlation of Coefficients:
          (Intercept) TEMP URBAN
TEMP
URBAN
HOSP.POP

TEMP and HOSP.POP are nonsignificant at the 10% level.
Example: Life expectancies: First models
Using the logarithmic scale for HOSP.POP

> f2 <- LIFE.EXP ~ TEMP + URBAN + log(HOSP.POP)
> r2 <- lm(f2, na.action = na.omit)
> summary(r2)

Call: lm(formula = f2, na.action = na.omit)
Residuals:
    Min     1Q Median     3Q    Max
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)
TEMP
URBAN
log(HOSP.POP)

Residual standard error: 7 on 29 degrees of freedom
Multiple R-Squared: 0.5
F-statistic: 9.7 on 3 and 29 degrees of freedom, the p-value is
Correlation of Coefficients:
              (Intercept) TEMP URBAN
TEMP
URBAN
log(HOSP.POP)

TEMP is still nonsignificant, while log(HOSP.POP) is now significant at the 10% level. 50% of the total variability is explained by the regression.
ANOVA tables in LMs

ANOVA = ANalysis Of VAriance

Model     Formula               SS
null      Y_i = β_0 + ε_i       SSE_0 = Σ_{i=1}^n (Y_i − β̂_0)² = Σ_{i=1}^n (Y_i − Ȳ)²
full      Y = Xβ + ε            SSE(X) = ||Y − Xβ̂||²
reduced   Y = X_1 β_1 + ε       SSE(X_1) = ||Y − X_1 β̂_1||²

ANOVA table:
Source       df      Sum of Sq                 MS                       F
regression   p − 1   SS_Reg = SSE_0 − SSE(X)   MS_Reg = SS_Reg/(p − 1)  MS_Reg/s²
residual     n − p   SSE(X)                    s² = SSE(X)/(n − p)
total        n − 1   SSE_0
Hierarchical ANOVA tables in LMs

Y = Xβ + ε = X_1 β_1 + X_2 β_2 + ε,  X ∈ R^{n×p}, X_1 ∈ R^{n×p_1}, X_2 ∈ R^{n×p_2}, p_1 + p_2 = p

ANOVA table:
Source          df        Sum of Sq            MS                                         F
X_1             p_1 − 1   SSE_0 − SSE(X_1)     MS(X_1) = (SSE_0 − SSE(X_1))/(p_1 − 1)     F(X_1) = MS(X_1)/(SSE(X_1)/(n − p_1))
X_2 given X_1   p_2       SSE(X_1) − SSE(X)    MS(X_2|X_1) = (SSE(X_1) − SSE(X))/p_2      F(X_2|X_1) = MS(X_2|X_1)/s²
residual        n − p     SSE(X)               s² = SSE(X)/(n − p)
total           n − 1     SSE_0

F(X_1) tests H_0: β_1 = 0 in Y = X_1 β_1 + ε (overall F-test in the reduced model).
F(X_2|X_1) tests H_0: β_2 = 0 in Y = X_1 β_1 + X_2 β_2 + ε (partial F-test).
ANOVA tables in Splus

> r <- lm(y ~ x1 + x2 + x3)
> anova(r)

Source     df      Sum of Sq                            MS             F
x1         1       SS_1 = SSE_0 − SSE(x1)               MS_1 = SS_1/1  MS_1/s²
x2         1       SS_2 = SSE(x1) − SSE(x1, x2)         MS_2 = SS_2/1  MS_2/s²
x3         1       SS_3 = SSE(x1, x2) − SSE(x1, x2, x3) MS_3 = SS_3/1  MS_3/s²
residual   n − p   SSE(x1, x2, x3)                      s² = SSE(X)/(n − 4)

These F-values cannot be interpreted as partial F-values, since the denominator always assumes the full model. You need to replace it by s_1² = SSE(x1)/(n − 2), s_2² = SSE(x1, x2)/(n − 3) and s_3² = SSE(x1, x2, x3)/(n − 4) = s², respectively.
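The sequential sums of squares above always add up: SS_1 + SS_2 + SS_3 + SSE(x1, x2, x3) = SSE_0. This can be sketched numerically (Python/numpy rather than Splus, on made-up toy data with three covariates):

```python
import numpy as np

# Made-up toy dataset with three covariates.
x1 = np.array([1., 2, 3, 4, 5, 6, 7, 8])
x2 = np.array([2., 1, 4, 3, 6, 5, 8, 7])
x3 = np.array([1., 3, 2, 5, 4, 7, 6, 8])
y = np.array([3., 4, 6, 5, 8, 9, 11, 10])
n = len(y)

def sse(*cols):
    """Residual sum of squares of a fit with intercept plus the given columns."""
    X = np.column_stack([np.ones(n), *cols])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ b) ** 2)

sse0 = np.sum((y - y.mean()) ** 2)    # null model: intercept only
ss1 = sse0 - sse(x1)                  # SS for x1
ss2 = sse(x1) - sse(x1, x2)           # SS for x2 given x1
ss3 = sse(x1, x2) - sse(x1, x2, x3)   # SS for x3 given x1, x2
sse_full = sse(x1, x2, x3)
```

Note that the sequential SS for x2 depends on x1 already being in the model, so reordering the terms changes the individual SS (but not their total).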
Example: Life expectancies: ANOVA tables

ANOVA table (original scale for HOSP.POP):

> anova(r1)
Analysis of Variance Table
Response: LIFE.EXP
Terms added sequentially (first to last)
               Df  Sum of Sq  Mean Sq  F Value  Pr(F)
TEMP
URBAN
HOSP.POP
Residuals

ANOVA table (log scale for HOSP.POP):

> anova(r2)
Analysis of Variance Table
Response: LIFE.EXP
Terms added sequentially (first to last)
               Df  Sum of Sq  Mean Sq  F Value  Pr(F)
TEMP
URBAN
log(HOSP.POP)
Residuals
Regression diagnostics for LMs

The goal is to determine
- outliers with regard to the response (y-outliers)
- outliers with regard to the design space covered by the covariates (x-outliers or high leverage points)
- observations which change the results greatly when removed (influential observations).

A general tool for this is to consider the hat matrix and the residuals.
Residuals in LMs: hat matrix

H := X (X^T X)^{-1} X^T   hat matrix
Ŷ := X β̂ = X (X^T X)^{-1} X^T Y = H Y   fitted values
r := Y − Ŷ = (I − H) Y   residual vector

Properties:
- H is a projection matrix, i.e. H² = H, and symmetric.
- E(r) = (I − H) E(Y) = (I − H) Xβ = Xβ − HXβ = Xβ − Xβ = 0.
- Var(r) = σ² (I − H)(I − H)^T = σ² (I − H)
- Var(r_i) = σ² (1 − h_ii), where h_ii is the i-th diagonal element of H.
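These properties are easy to confirm numerically; the sketch below (made-up toy data) also checks trace(H) = p, which is used later for the leverage cutoff:

```python
import numpy as np

# Made-up toy dataset.
X = np.column_stack([np.ones(8),
                     [1., 2, 3, 4, 5, 6, 7, 8],
                     [2., 1, 4, 3, 6, 5, 8, 7]])
y = np.array([3., 4, 6, 5, 8, 9, 11, 10])
n, p = X.shape

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X^T X)^{-1} X^T
r = (np.eye(n) - H) @ y                 # residual vector r = (I - H) Y
```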
Standardized residuals in LMs

s_i := r_i / (s √(1 − h_ii)),  where s² := ||Y − Xβ̂||² / (n − p),

are called standardized residuals or internally studentized residuals; h_ii is called the leverage of observation i.

Problem: The estimate s² is biased when outliers are present in the data; therefore one wants to estimate σ² without the i-th observation.
Jackknifed residuals in LMs

Y_(i) = X_(i) β + ε_(i)   model without the i-th observation
X_(i) = design matrix without the i-th observation
β̂_(i) := corresponding LSE of β
Ŷ_{i,(i)} := x_i^T β̂_(i)   i-th fitted value without using the i-th observation
r_{i,(i)} := Y_i − Ŷ_{i,(i)}   predictive residual
s_(i)² := Σ_{j=1, j≠i}^n (Y_j − x_j^T β̂_(i))² / (n − p − 1)   estimate of σ² in the model without the i-th observation
t_i := r_i / (s_(i) √(1 − h_ii))   externally studentized or jackknifed residual
Jackknifed residuals in LMs (cont.)

For a fast computation one can use linear algebra to show that

β̂_(i) = β̂ − (X^T X)^{-1} x_i r_i / (1 − h_ii)
r_{i,(i)} = r_i / (1 − h_ii)
s_(i)² = ((n − p) s² − r_i² / (1 − h_ii)) / (n − p − 1)

Residual plots are plots where the observation number i or the j-th covariate x_ij is plotted against r_i, s_i or t_i. There is no problem with the model if these plots look like a band. Deviations from this structure might indicate nonlinear regression effects or a violation of the variance homogeneity.
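The shortcut formulas can be checked against an explicit leave-one-out refit; a sketch on made-up toy data, verifying all three identities for one observation:

```python
import numpy as np

# Made-up toy dataset.
X = np.column_stack([np.ones(8),
                     [1., 2, 3, 4, 5, 6, 7, 8],
                     [2., 1, 4, 3, 6, 5, 8, 7]])
y = np.array([3., 4, 6, 5, 8, 9, 11, 10])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
r = y - X @ beta_hat
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages h_ii
s2 = (r @ r) / (n - p)

i = 4   # check the formulas for one observation

# Shortcut formulas (no refit needed)
beta_i_short = beta_hat - XtX_inv @ X[i] * r[i] / (1 - h[i])
pred_res_short = r[i] / (1 - h[i])
s2_i_short = ((n - p) * s2 - r[i] ** 2 / (1 - h[i])) / (n - p - 1)

# Explicit refit without observation i
Xi = np.delete(X, i, axis=0)
yi = np.delete(y, i)
beta_i = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)
pred_res = y[i] - X[i] @ beta_i
s2_i = np.sum((yi - Xi @ beta_i) ** 2) / (n - p - 1)

t_i = r[i] / (np.sqrt(s2_i) * np.sqrt(1 - h[i]))   # jackknifed residual
```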
Residuals in Splus

r <- lm(y ~ x1 + x2 + ... + xk, x = T)

Raw residuals: r_i = Y_i − Ŷ_i
  e <- resid(summary(r))
Residual standard error: s = √(||Y − Xβ̂||² / (n − p))
  sigma <- summary(r)$sigma
External residual standard error: s_(i)
  sigmai <- lm.influence(r)$sigma
Hat diagonals: h_ii
  hi <- hat(r$x)   or   hi <- lm.influence(r)$hat
Internally studentized residuals: s_i = r_i / (s √(1 − h_ii))
  si <- e/(sigma*((1-hi)^.5))
Externally studentized residuals: t_i = r_i / (s_(i) √(1 − h_ii))
  ti <- e/(sigmai*((1-hi)^.5))
High leverage in LMs

Since h_ii = x_i^T (X^T X)^{-1} x_i, one can interpret h_ii as a standardized distance measure between x_i and x̄. Further we have

Σ_{i=1}^n h_ii = p,

therefore we call points with

h_ii > 2p/n

high leverage points or x-outliers.
Outlier detection in LMs

To detect y-outliers we can use t_i, since it can be written as

t_i = (Y_i − Ŷ_{i,(i)}) / √(V̂ar(Y_i − Ŷ_{i,(i)})) = δ̂_i / √(V̂ar(δ_i))   for δ_i := Y_i − Ŷ_{i,(i)}.

Consider the mean shift outlier model for observation l, given by

Y_i = x_i^T β + γ D_li + ε_i   with   D_li = 1 if l = i, 0 if l ≠ i.

Therefore reject H: γ = 0 versus K: γ ≠ 0 at level α iff |t_l| > t_{n−p−1, 1−α/2}, where t_{m,α} is the α quantile of a t_m distribution.

Problem: If one uses this outlier test for every observation l, one has the problem of multiple testing, so one needs to substitute α by α/n.
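The link between the mean shift model and the jackknifed residual can be verified numerically: the t statistic for γ in the augmented model equals t_l. A sketch on made-up toy data (observation l = 4 chosen arbitrarily):

```python
import numpy as np

# Made-up toy dataset.
X = np.column_stack([np.ones(8),
                     [1., 2, 3, 4, 5, 6, 7, 8],
                     [2., 1, 4, 3, 6, 5, 8, 7]])
y = np.array([3., 4, 6, 5, 8, 9, 11, 10])
n, p = X.shape
l = 4

# Jackknifed residual t_l from the original fit (shortcut formulas)
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
r = y - X @ beta_hat
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)
s2 = (r @ r) / (n - p)
s2_l = ((n - p) * s2 - r[l] ** 2 / (1 - h[l])) / (n - p - 1)
t_l = r[l] / (np.sqrt(s2_l) * np.sqrt(1 - h[l]))

# Mean shift outlier model: append the dummy column D_l and refit
D = np.zeros(n)
D[l] = 1.0
Xa = np.column_stack([X, D])
A_inv = np.linalg.inv(Xa.T @ Xa)
ba = A_inv @ Xa.T @ y
gamma_hat = ba[-1]
s_a = np.sqrt(np.sum((y - Xa @ ba) ** 2) / (n - p - 1))
t_gamma = gamma_hat / (s_a * np.sqrt(A_inv[-1, -1]))   # t statistic for gamma
```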
Leverage and influence

[Figure A: y versus x, fitted lines with and without obs 20]
[Figure B: y versus x, fitted lines with and without obs 10]
[Figure C: y versus x, fitted lines with and without obs 20]
Leverage and influence (cont.)

Figure A shows that obs 20 is a high leverage point which is also influential. Figure B shows that obs 10 is not a high leverage point and not influential. Figure C shows that obs 20 is a high leverage point which is not influential.

A real-valued measure for influential observations is Cook's distance, defined as

D_i := (β̂ − β̂_(i))^T (X^T X) (β̂ − β̂_(i)) / (p s²),

which can be interpreted as the shift in the confidence region when the i-th observation is deleted. Therefore observations with D_i > 1 are considered influential. The cutpoint 1 corresponds to the i-th observation moving β̂ to the edge of a 50% confidence region.
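Cook's distance can be computed either from its definition (explicit leave-one-out refits) or from the usual shortcut in terms of internally studentized residuals and leverages; the two agree. A sketch on made-up toy data:

```python
import numpy as np

# Made-up toy dataset.
X = np.column_stack([np.ones(8),
                     [1., 2, 3, 4, 5, 6, 7, 8],
                     [2., 1, 4, 3, 6, 5, 8, 7]])
y = np.array([3., 4, 6, 5, 8, 9, 11, 10])
n, p = X.shape

XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)
beta_hat = XtX_inv @ X.T @ y
r = y - X @ beta_hat
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages h_ii
s2 = (r @ r) / (n - p)

# Definition: D_i = (b - b_(i))^T (X^T X) (b - b_(i)) / (p s^2)
D = np.empty(n)
for i in range(n):
    Xi = np.delete(X, i, axis=0)
    yi = np.delete(y, i)
    beta_i = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)
    diff = beta_hat - beta_i
    D[i] = diff @ XtX @ diff / (p * s2)

# Shortcut: D_i = s_i^2 * h_ii / (p (1 - h_ii)),
# with s_i = r_i / (s sqrt(1 - h_ii)) the internally studentized residual.
si = r / np.sqrt(s2 * (1 - h))
D_short = si ** 2 * h / (p * (1 - h))
```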
Example: Life expectancies: Residuals and hat values

[Index plot of externally studentized residuals; United States and Chile stand out]
[Index plot of hat diagonals; Australia, Papua New Guinea and Norway stand out]
Example: Life expectancies: Residual plots

[Raw residuals plotted against TEMP, URBAN and log(HOSP.POP); United States and Chile stand out]

The bottom right plot plots r_i versus r_{i−1} to detect correlated errors.
Splus function plot() for LMs

> r <- lm(y ~ x)
> plot(r)

Plot 1: Ŷ_i versus r_i to check for linearity and variance homogeneity.
Plot 2: Ŷ_i versus √|r_i| to check for large residuals.
Plot 3: Ŷ_i versus Y_i; should cluster around the x = y line if the fit is good.
Plot 4: qq-plot of r_i to check normality of the errors.
Plot 5: Residual-fit (r-f) spread plot: f-values (:= ((1:n) − .5)/n) against ordered fitted values (left panel) and ordered residuals (right panel). When a good fit is present, the spread of the residuals should be much smaller than that of the fitted values.
Plot 6: i versus D_i to check for influential observations.
Example: Life expectancies: Output from plot()

> plot(r2)

[Six diagnostic plots for the fit LIFE.EXP ~ TEMP + URBAN + log(HOSP.POP): residuals versus fitted values, √|residuals| versus fitted values, LIFE.EXP versus fitted values, normal qq-plot of the residuals, r-f spread plot, and Cook's distance versus index]
More informationLecture 10 Multiple Linear Regression
Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable
More informationSTAT 3A03 Applied Regression Analysis With SAS Fall 2017
STAT 3A03 Applied Regression Analysis With SAS Fall 2017 Assignment 5 Solution Set Q. 1 a The code that I used and the output is as follows PROC GLM DataS3A3.Wool plotsnone; Class Amp Len Load; Model CyclesAmp
More information36-707: Regression Analysis Homework Solutions. Homework 3
36-707: Regression Analysis Homework Solutions Homework 3 Fall 2012 Problem 1 Y i = βx i + ɛ i, i {1, 2,..., n}. (a) Find the LS estimator of β: RSS = Σ n i=1(y i βx i ) 2 RSS β = Σ n i=1( 2X i )(Y i βx
More informationSimple Linear Regression
Simple Linear Regression MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html MATH 282A University
More informationDealing with Heteroskedasticity
Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing
More informationSTAT 215 Confidence and Prediction Intervals in Regression
STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:
More informationSTAT 540: Data Analysis and Regression
STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State
More informationLinear models and their mathematical foundations: Simple linear regression
Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction
More informationLecture 2. The Simple Linear Regression Model: Matrix Approach
Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution
More informationAMS-207: Bayesian Statistics
Linear Regression How does a quantity y, vary as a function of another quantity, or vector of quantities x? We are interested in p(y θ, x) under a model in which n observations (x i, y i ) are exchangeable.
More informationLecture 2. Simple linear regression
Lecture 2. Simple linear regression Jesper Rydén Department of Mathematics, Uppsala University jesper@math.uu.se Regression and Analysis of Variance autumn 2014 Overview of lecture Introduction, short
More informationStatistics for Engineers Lecture 9 Linear Regression
Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April
More informationHomework 2: Simple Linear Regression
STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA
More informationSCHOOL OF MATHEMATICS AND STATISTICS
RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationMultivariate Linear Regression Models
Multivariate Linear Regression Models Regression analysis is used to predict the value of one or more responses from a set of predictors. It can also be used to estimate the linear association between
More informationContents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects
Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:
More informationRemedial Measures, Brown-Forsythe test, F test
Remedial Measures, Brown-Forsythe test, F test Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 7, Slide 1 Remedial Measures How do we know that the regression function
More informationMultiple Predictor Variables: ANOVA
Multiple Predictor Variables: ANOVA 1/32 Linear Models with Many Predictors Multiple regression has many predictors BUT - so did 1-way ANOVA if treatments had 2 levels What if there are multiple treatment
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More information1 The classical linear regression model
THE UNIVERSITY OF CHICAGO Booth School of Business Business 41912, Spring Quarter 2012, Mr Ruey S Tsay Lecture 4: Multivariate Linear Regression Linear regression analysis is one of the most widely used
More informationSTAT 4385 Topic 06: Model Diagnostics
STAT 4385 Topic 06: Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 1/ 40 Outline Several Types of Residuals Raw, Standardized, Studentized
More informationSCHOOL OF MATHEMATICS AND STATISTICS
SHOOL OF MATHEMATIS AND STATISTIS Linear Models Autumn Semester 2015 16 2 hours Marks will be awarded for your best three answers. RESTRITED OPEN BOOK EXAMINATION andidates may bring to the examination
More informationBiostatistics for physicists fall Correlation Linear regression Analysis of variance
Biostatistics for physicists fall 2015 Correlation Linear regression Analysis of variance Correlation Example: Antibody level on 38 newborns and their mothers There is a positive correlation in antibody
More information> modlyq <- lm(ly poly(x,2,raw=true)) > summary(modlyq) Call: lm(formula = ly poly(x, 2, raw = TRUE))
School of Mathematical Sciences MTH5120 Statistical Modelling I Tutorial 4 Solutions The first two models were looked at last week and both had flaws. The output for the third model with log y and a quadratic
More informationSTAT 4385 Topic 03: Simple Linear Regression
STAT 4385 Topic 03: Simple Linear Regression Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2017 Outline The Set-Up Exploratory Data Analysis
More information1 Use of indicator random variables. (Chapter 8)
1 Use of indicator random variables. (Chapter 8) let I(A) = 1 if the event A occurs, and I(A) = 0 otherwise. I(A) is referred to as the indicator of the event A. The notation I A is often used. 1 2 Fitting
More informationLinear Models and Estimation by Least Squares
Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:
More information20. REML Estimation of Variance Components. Copyright c 2018 (Iowa State University) 20. Statistics / 36
20. REML Estimation of Variance Components Copyright c 2018 (Iowa State University) 20. Statistics 510 1 / 36 Consider the General Linear Model y = Xβ + ɛ, where ɛ N(0, Σ) and Σ is an n n positive definite
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationMultiple Linear Regression
Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there
More informationR 2 and F -Tests and ANOVA
R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.
More informationLecture 3: Inference in SLR
Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals
More informationSimple linear regression
Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single
More informationCOMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION
COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION Answer all parts. Closed book, calculators allowed. It is important to show all working,
More informationRegression. Marc H. Mehlman University of New Haven
Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and
More information