Lecture 1: Linear Models and Applications


Claudia Czado, TU München (ZFS/IMS Göttingen)

Overview

- Introduction to linear models
- Exploratory data analysis (EDA)
- Estimation and testing in linear models
- Regression diagnostics

Introduction to linear models

The goal of regression models is to determine how a response variable depends on covariates. Linear models are a special class of regression models. The general setup is given by

Data:       $(Y_i, x_{i1}, \ldots, x_{ik})$, $i = 1, \ldots, n$
Responses:  $Y = (Y_1, \ldots, Y_n)^T$
Covariates: $x_i = (x_{i1}, \ldots, x_{ik})^T$ (known)

Example: Life expectancies from 40 countries

Sources: The Economist's Book of World Statistics (1990, Time Books); The World Almanac of Facts (1992, World Almanac Books).

LIFE.EXP   life expectancy at birth
TEMP       temperature in degrees Fahrenheit
URBAN      percent of population living in urban areas
HOSP.POP   number of hospitals per population
COUNTRY    the name of the country

Which is the response and which are the covariates?

Linear models (LM) under the normality assumption

$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$, with $\epsilon_i \sim N(0, \sigma^2)$ iid, $i = 1, \ldots, n$,

where the unknown regression parameters $\beta_0, \ldots, \beta_k$ and the unknown error variance $\sigma^2$ need to be estimated. Note that $E(Y_i)$ is a linear function of $\beta_0, \ldots, \beta_k$:

$E(Y_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$
$Var(Y_i) = Var(\epsilon_i) = \sigma^2$ (variance homogeneity)

LMs in matrix notation

$Y = X\beta + \epsilon$, with $X \in \mathbb{R}^{n \times p}$, $p = k + 1$, $\beta = (\beta_0, \ldots, \beta_k)^T$,
$E(Y) = X\beta$, $Var(Y) = \sigma^2 I_n$.

Under the normality assumption we have $Y \sim N_n(X\beta, \sigma^2 I_n)$, where $N_n(\mu, \Sigma)$ is the n-dimensional normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$. The matrix $X$ is also called the design matrix.
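As a concrete illustration (a sketch added here, not from the original slides), the following R code simulates a small data set from such a model and fits it with lm(); all names and numbers are invented for the example, which is continued on later slides.

# Simulate n = 50 observations from Y = 2 + 1.5*x1 - 0.5*x2 + eps, eps ~ N(0, 1)
set.seed(1)
n   <- 50
x1  <- runif(n)
x2  <- runif(n)
eps <- rnorm(n)
y   <- 2 + 1.5 * x1 - 0.5 * x2 + eps

fit <- lm(y ~ x1 + x2)   # design matrix X = (1, x1, x2), so p = k + 1 = 3
coef(fit)                # estimates of beta0, beta1, beta2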

Exploratory data analysis (EDA)

- Consider the ranges of the responses and covariates. When covariates are discrete, group covariate levels if they are sparse.
- Plot the covariates against the responses. These scatter plots should look linear; otherwise consider transformations of the covariates.
- To check whether the constant variance assumption is reasonable, the scatter plot of covariates against the responses should be contained in a band.
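A minimal EDA sketch in R, assuming a data frame health.data with the columns described on the previous slide (the data themselves are not reproduced here):

summary(health.data)                     # ranges, quartiles, missing values
pairs(health.data[, c("LIFE.EXP", "TEMP", "URBAN", "HOSP.POP")])   # scatter plots
plot(log(health.data$HOSP.POP), health.data$LIFE.EXP)              # try a log transform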

Example: Life expectancies: Data summaries

> summary(health.data)

   LIFE.EXP      TEMP        URBAN
 Min.   :37   Min.   :36   Min.   :14
 1st Qu.:56   1st Qu.:52   1st Qu.:42
 Median :67   Median :61   Median :62
 Mean   :65   Mean   :62   Mean   :59
 3rd Qu.:73   3rd Qu.:73   3rd Qu.:76
 Max.   :79   Max.   :85   Max.   :96
 NA's   : 1

   HOSP.POP
 Min.   :
 1st Qu.:
 Median :
 Mean   :
 3rd Qu.:
 Max.   :
 NA's   :

NA = not available (missing data)

Example: Life expectancies: EDA

[Scatterplot matrix of LIFE.EXP, TEMP, URBAN and HOSP.POP]

- LIFE.EXP decreases if TEMP increases
- LIFE.EXP increases if URBAN increases
- LIFE.EXP increases if HOSP.POP increases

Check for nonlinearities and nonconstant variances.

Example: Life expectancies: Transformations

[Scatter plots of LIFE.EXP against HOSP.POP and against log(HOSP.POP), with countries such as France, the United States, Japan, Australia, New Zealand, Argentina, Egypt, China, Tonga, Papua New Guinea, Norway, Morocco, Costa Rica and India labeled]

The logarithm makes the relationship more linear.

Parameter estimation of β

Under the normality assumption it is enough to minimize $Q(\beta) := \|Y - X\beta\|^2$ over $\beta \in \mathbb{R}^p$ to calculate the maximum likelihood estimate (MLE) of $\beta$. An estimator which minimizes $Q(\beta)$ is also called a least squares estimator (LSE). If the matrix $X \in \mathbb{R}^{n \times p}$ has full rank $p$, the minimum of $Q(\beta)$ is attained at

$\hat\beta = (X^T X)^{-1} X^T Y$.

$SS_{Res} := \sum_{i=1}^n (Y_i - x_i^T \hat\beta)^2 = Q(\hat\beta)$   (residual sum of squares)
$\hat Y_i := x_i^T \hat\beta$   (fitted value for $\mu_i := E(Y_i)$)
$e_i := Y_i - \hat Y_i$   (raw residual)

It follows that $E(\hat\beta) = \beta$ and $Var(\hat\beta) = \sigma^2 (X^T X)^{-1}$; therefore, under normality,

$\hat\beta \sim N_p(\beta, \sigma^2 (X^T X)^{-1})$.
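Continuing the simulated running example, the closed form can be checked directly against lm(); a minimal sketch:

X        <- model.matrix(fit)              # n x p design matrix (intercept column included)
beta.hat <- solve(t(X) %*% X, t(X) %*% y)  # (X'X)^{-1} X'Y via the normal equations
cbind(manual = beta.hat, lm = coef(fit))   # the two columns agree
SS.res   <- sum((y - X %*% beta.hat)^2)    # Q(beta.hat), the residual sum of squares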

Parameter estimation of σ²

The MLE of $\sigma^2$ is given by

$\hat\sigma^2 := \frac{1}{n} \sum_{i=1}^n (Y_i - x_i^T \hat\beta)^2 = \frac{1}{n} \sum_{i=1}^n (Y_i - \hat Y_i)^2$.

One can show that this estimator is biased; in particular, $E(\hat\sigma^2) = \frac{n-p}{n} \sigma^2$. An unbiased estimator of $\sigma^2$ is therefore given by

$s^2 := \frac{1}{n-p} \sum_{i=1}^n (Y_i - \hat Y_i)^2 = \frac{n}{n-p} \hat\sigma^2$.

Under normality it follows that

$\frac{(n-p)s^2}{\sigma^2} = \frac{\sum_{i=1}^n (Y_i - \hat Y_i)^2}{\sigma^2} \sim \chi^2_{n-p}$

and that $s^2$ is independent of $\hat\beta$.
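In the running example the difference between the biased MLE and the unbiased estimator is easy to see (summary(fit)$sigma reports s):

p          <- ncol(X)
sigma2.mle <- SS.res / n          # biased MLE of sigma^2
s2         <- SS.res / (n - p)    # unbiased estimator s^2
c(sigma2.mle, s2, summary(fit)$sigma^2)   # the last two agree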

Goodness of fit in linear models

Consider

$SS_{Total} = \sum_{i=1}^n (Y_i - \bar Y)^2$,  $SS_{Reg} = \sum_{i=1}^n (\hat Y_i - \bar Y)^2$,  $SS_{Res} = \sum_{i=1}^n (Y_i - \hat Y_i)^2$,

where $\bar Y = \frac{1}{n} \sum_{i=1}^n Y_i$. It follows that $SS_{Total} = SS_{Reg} + SS_{Res}$; therefore

$R^2 := \frac{SS_{Reg}}{SS_{Total}} = 1 - \frac{SS_{Res}}{SS_{Total}}$

gives the proportion of variability explained by the regression. $R^2$ is called the multiple coefficient of determination.
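The decomposition and $R^2$ can be verified numerically in the running example:

SS.total <- sum((y - mean(y))^2)
SS.reg   <- sum((fitted(fit) - mean(y))^2)
c(SS.reg + SS.res, SS.total)                   # SS_Total = SS_Reg + SS_Res
c(SS.reg / SS.total, summary(fit)$r.squared)   # both equal R^2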

Statistical hypothesis tests in LMs: F-test

The restriction $H_0: C\beta = d$ for $C \in \mathbb{R}^{r \times p}$ with $rank(C) = r$ and $d \in \mathbb{R}^r$ is called a general linear hypothesis, with alternative $H_1$: not $H_0$. Consider the restricted least squares problem

$SS_{H_0} = \min_{\beta: \, C\beta = d} \|Y - X\beta\|^2 = SS_{Res} + (C\hat\beta - d)^T [C(X^TX)^{-1}C^T]^{-1} (C\hat\beta - d)$.

Under normality, and when $H_0: C\beta = d$ is valid, we have

$F = \frac{(SS_{H_0} - SS_{Res})/r}{SS_{Res}/(n-p)} \sim F_{r, n-p}$;

therefore the F-test is: reject $H_0$ at level $\alpha$ if $F > F_{1-\alpha, r, n-p}$. Here $F_{n,m}$ denotes the F distribution with $n$ numerator and $m$ denominator degrees of freedom, and $F_{1-\alpha, n, m}$ is the corresponding $1-\alpha$ quantile.
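As a sketch of the general linear hypothesis in the running example, test $H_0: \beta_1 = \beta_2$ (one restriction, so $C$ is $1 \times p$ and $r = 1$); the restriction is chosen purely for illustration:

Cmat   <- matrix(c(0, 1, -1), nrow = 1)     # C beta = beta1 - beta2
d      <- 0
r.C    <- nrow(Cmat)                        # rank of C
bh     <- coef(fit)
XtXinv <- solve(t(X) %*% X)
SS.H0  <- SS.res + drop(t(Cmat %*% bh - d) %*%
            solve(Cmat %*% XtXinv %*% t(Cmat)) %*% (Cmat %*% bh - d))
Fstat  <- ((SS.H0 - SS.res) / r.C) / (SS.res / (n - p))
pf(Fstat, r.C, n - p, lower.tail = FALSE)   # p-value of the F-test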

Statistical hypothesis tests in LMs: t-test

Consider for each regression parameter $\beta_j$ the hypothesis $H_{0j}: \beta_j = 0$ against $H_{1j}$: not $H_{0j}$, and use the corresponding F statistic

$F_j := \frac{(SS_{H_{0j}} - SS_{Res})/1}{SS_{Res}/(n-p)} \sim F_{1, n-p}$ under $H_{0j}$.

It follows that $F_j = T_j^2$, where

$T_j := \frac{\hat\beta_j}{\hat{se}(\hat\beta_j)} \sim t_{n-p}$ under $H_{0j}$,

and $\hat{se}(\hat\beta_j)$ is the estimated standard error of $\hat\beta_j$, i.e. $\hat{se}(\hat\beta_j) = s \sqrt{(X^TX)^{-1}_{jj}}$.
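Continuing the sketch above, the t statistics printed by summary(fit) can be reproduced from these formulas:

se <- sqrt(s2 * diag(XtXinv))    # se(beta_j) = s * sqrt((X'X)^{-1}_jj)
Tj <- bh / se                    # matches the t values in summary(fit)
2 * pt(abs(Tj), df = n - p, lower.tail = FALSE)   # two-sided p-values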

Weighted least squares

$Y = X\beta + \epsilon$, $\epsilon \sim N_n(0, \sigma^2 W)$, $X \in \mathbb{R}^{n \times p}$, $p = k + 1$, $\beta = (\beta_0, \ldots, \beta_k)^T$.

The MLE of $\beta$ (the weighted LSE) is given by

$\hat\beta = (X^T W^{-1} X)^{-1} X^T W^{-1} Y$.

It follows that $E(\hat\beta) = \beta$ and $Var(\hat\beta) = \sigma^2 (X^T W^{-1} X)^{-1}$. The MLE of $\sigma^2$ is given by

$\hat\sigma^2 := \frac{1}{n} (Y - X\hat\beta)^T W^{-1} (Y - X\hat\beta) = \frac{1}{n} e^T W^{-1} e$.
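A sketch with a diagonal $W$, built on an assumed heteroscedastic variant of the running example; note that lm()'s weights argument expects $1/w_{ii}$ when $Var(\epsilon_i) = \sigma^2 w_{ii}$:

w      <- runif(n, 0.5, 2)                     # diagonal of W, assumed known
y.w    <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n, sd = sqrt(w))
fit.w  <- lm(y.w ~ x1 + x2, weights = 1 / w)   # weighted LSE via lm()
Winv   <- diag(1 / w)
beta.w <- solve(t(X) %*% Winv %*% X, t(X) %*% Winv %*% y.w)  # (X'W^{-1}X)^{-1} X'W^{-1}Y
cbind(manual = beta.w, lm = coef(fit.w))       # agree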

Example: Life expectancies: First models

Using the original scale for HOSP.POP:

> attach(health.data)
> f1 <- LIFE.EXP ~ TEMP + URBAN + HOSP.POP
> r1 <- lm(f1, na.action = na.omit)
> summary(r1)

Call: lm(formula = f1, na.action = na.omit)

Residuals:
    Min   1Q   Median   3Q   Max

Coefficients:
              Value   Std. Error   t value   Pr(>|t|)
(Intercept)
TEMP
URBAN
HOSP.POP

Residual standard error: 7.3 on 29 degrees of freedom
Multiple R-Squared: 0.46
F-statistic: 8.3 on 3 and 29 degrees of freedom, the p-value is

Correlation of Coefficients:
           (Intercept)   TEMP   URBAN
TEMP
URBAN
HOSP.POP

TEMP and HOSP.POP are nonsignificant at the 10% level.

Example: Life expectancies: First models

Using the logarithmic scale for HOSP.POP:

> f2 <- LIFE.EXP ~ TEMP + URBAN + log(HOSP.POP)
> r2 <- lm(f2, na.action = na.omit)
> summary(r2)

Call: lm(formula = f2, na.action = na.omit)

Residuals:
    Min   1Q   Median   3Q   Max

Coefficients:
                Value   Std. Error   t value   Pr(>|t|)
(Intercept)
TEMP
URBAN
log(HOSP.POP)

Residual standard error: 7 on 29 degrees of freedom
Multiple R-Squared: 0.5
F-statistic: 9.7 on 3 and 29 degrees of freedom, the p-value is

Correlation of Coefficients:
                (Intercept)   TEMP   URBAN
TEMP
URBAN
log(HOSP.POP)

TEMP is still nonsignificant, while log(HOSP.POP) is now significant at the 10% level. 50% of the total variability is explained by the regression.

ANOVA tables in LMs (ANOVA = ANalysis Of VAriance)

Model     Formula                        SS
null      $Y_i = \beta_0 + \epsilon_i$   $SSE_0 = \sum_{i=1}^n (Y_i - \hat\beta_0)^2 = \sum_{i=1}^n (Y_i - \bar Y)^2$
full      $Y = X\beta + \epsilon$        $SSE(X) = \|Y - X\hat\beta\|^2$
reduced   $Y = X_1\beta_1 + \epsilon$    $SSE(X_1) = \|Y - X_1\hat\beta_1\|^2$

ANOVA table:

Source       df      Sum of Sq                     MS                            F
regression   $p-1$   $SS_{Reg} = SSE_0 - SSE(X)$   $MS_{Reg} = SS_{Reg}/(p-1)$   $MS_{Reg}/s^2$
residual     $n-p$   $SSE(X)$                      $s^2 = SSE(X)/(n-p)$
total        $n-1$   $SSE_0$

Hierarchical ANOVA tables in LMs

$Y = X\beta + \epsilon = X_1\beta_1 + X_2\beta_2 + \epsilon$, where $X \in \mathbb{R}^{n \times p}$, $X_1 \in \mathbb{R}^{n \times p_1}$, $X_2 \in \mathbb{R}^{n \times p_2}$, $p_1 + p_2 = p$.

ANOVA table:

Source              df        Sum of Sq             MS                                        F
$X_1$               $p_1-1$   $SSE_0 - SSE(X_1)$    $MS(X_1) = (SSE_0 - SSE(X_1))/(p_1-1)$    $F(X_1) = MS(X_1)/(SSE(X_1)/(n-p_1))$
$X_2$ given $X_1$   $p_2$     $SSE(X_1) - SSE(X)$   $MS(X_2|X_1) = (SSE(X_1) - SSE(X))/p_2$   $F(X_2|X_1) = MS(X_2|X_1)/s^2$
residual            $n-p$     $SSE(X)$              $s^2 = SSE(X)/(n-p)$
total               $n-1$     $SSE_0$

$F(X_1)$ tests $H_0: \beta_1 = 0$ in $Y = X_1\beta_1 + \epsilon$ (overall F-test in the reduced model).
$F(X_2|X_1)$ tests $H_0: \beta_2 = 0$ in $Y = X_1\beta_1 + X_2\beta_2 + \epsilon$ (partial F-test).

ANOVA tables in Splus

> r <- lm(y ~ x1 + x2 + x3)
> anova(r)

Source     df      Sum of Sq                                     MS                F
x1         1       $SS_1 = SSE_0 - SSE(x_1)$                     $MS_1 = SS_1/1$   $MS_1/s^2$
x2         1       $SS_2 = SSE(x_1) - SSE(x_1, x_2)$             $MS_2 = SS_2/1$   $MS_2/s^2$
x3         1       $SS_3 = SSE(x_1, x_2) - SSE(x_1, x_2, x_3)$   $MS_3 = SS_3/1$   $MS_3/s^2$
residual   $n-p$   $SSE(x_1, x_2, x_3)$                          $s^2 = SSE(X)/(n-4)$

These F-values cannot be interpreted as partial F-values, since the denominator always assumes the full model. You need to replace it by $s_1^2 = SSE(x_1)/(n-2)$, $s_2^2 = SSE(x_1, x_2)/(n-3)$ and $s_3^2 = SSE(x_1, x_2, x_3)/(n-4) = s^2$, respectively, as illustrated in the sketch below.
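In the running example (which has only x1 and x2) the same point can be seen by comparing the sequential table of the full fit with the ANOVA table of the submodel y ~ x1:

anova(fit)            # sequential table; every F uses the full-model s^2
fit1 <- lm(y ~ x1)    # reduced model with x1 only
anova(fit1)           # here the F for x1 uses SSE(x1)/(n - 2) in the denominator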

Example: Life expectancies: ANOVA tables

ANOVA table (original scale for HOSP.POP):

> anova(r1)
Analysis of Variance Table
Response: LIFE.EXP
Terms added sequentially (first to last)

            Df   Sum of Sq   Mean Sq   F Value   Pr(F)
TEMP
URBAN
HOSP.POP
Residuals

ANOVA table (log scale for HOSP.POP):

> anova(r2)
Analysis of Variance Table
Response: LIFE.EXP
Terms added sequentially (first to last)

                Df   Sum of Sq   Mean Sq   F Value   Pr(F)
TEMP
URBAN
log(HOSP.POP)
Residuals

Regression diagnostics for LMs

The goal is to determine

- outliers with regard to the response (y-outliers),
- outliers with regard to the design space covered by the covariates (x-outliers or high leverage points),
- observations which change the results greatly when removed (influential observations).

A general tool for this is to consider the hat matrix and residuals.

Residuals in LMs: hat matrix

$H := X(X^TX)^{-1}X^T$   (hat matrix)
$\hat Y := X\hat\beta = X(X^TX)^{-1}X^T Y = HY$   (fitted values)
$r := Y - \hat Y = (I - H)Y$   (residual vector)

Properties:

- $H$ is a projection matrix, i.e. $H^2 = H$, and symmetric.
- $E(r) = (I - H)E(Y) = (I - H)X\beta = X\beta - HX\beta = X\beta - X\beta = 0$.
- $Var(r) = \sigma^2 (I - H)(I - H)^T = \sigma^2 (I - H)$.
- $Var(r_i) = \sigma^2 (1 - h_{ii})$, where $h_{ii}$ is the i-th diagonal element of $H$.
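These properties are easy to verify numerically in the running example:

H <- X %*% solve(t(X) %*% X) %*% t(X)            # hat matrix
max(abs(H %*% H - H))                            # ~ 0: H is idempotent (a projection)
all.equal(drop(H %*% y), unname(fitted(fit)))    # HY equals the fitted values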

Standardized residuals in LMs

$s_i := \frac{r_i}{s\sqrt{1 - h_{ii}}}$, where $s^2 := \frac{\|Y - X\hat\beta\|^2}{n - p}$,

are called standardized residuals or internally studentized residuals; $h_{ii}$ is called the leverage of observation $i$.

Problem: the estimate $s^2$ is biased when outliers are present in the data; therefore one wants to estimate $\sigma^2$ without the i-th observation.

Jackknifed residuals in LMs

$Y_{-i} = X_{-i}\beta + \epsilon_{-i}$   (model without the i-th observation)
$X_{-i}$ = design matrix without the i-th observation
$\hat\beta_{-i}$ := corresponding LSE of $\beta$
$\hat Y_{i,-i} := x_i^T \hat\beta_{-i}$   (i-th fitted value without using the i-th observation)
$r_{i,-i} := Y_i - \hat Y_{i,-i}$   (predictive residual)
$s_{-i}^2 := \frac{1}{n-p-1} \sum_{j=1, j \neq i}^n (Y_j - x_j^T \hat\beta_{-i})^2$   (estimate of $\sigma^2$ in the model without the i-th observation)
$t_i := \frac{r_i}{s_{-i}\sqrt{1 - h_{ii}}}$   (externally studentized or jackknifed residuals)

Jackknifed residuals in LMs (cont.)

For fast computation one can use linear algebra to show that

$\hat\beta_{-i} = \hat\beta - \frac{(X^TX)^{-1} x_i r_i}{1 - h_{ii}}$,
$r_{i,-i} = \frac{r_i}{1 - h_{ii}}$,
$s_{-i}^2 = \frac{(n-p)s^2 - \frac{r_i^2}{1 - h_{ii}}}{n - p - 1}$.

Residual plots are plots where the observation number $i$ or the j-th covariate $x_{ij}$ is plotted against $r_i$, $s_i$ or $t_i$. There is no problem with the model if these plots look like a band. Deviations from this structure might indicate nonlinear regression effects or a violation of variance homogeneity.
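These update formulas mean the jackknifed residuals never require n separate fits; a sketch in the running example, checking against R's built-in rstudent():

h    <- hatvalues(fit)                                   # leverages h_ii
res  <- resid(fit)                                       # raw residuals r_i
s2.i <- ((n - p) * s2 - res^2 / (1 - h)) / (n - p - 1)   # s_{-i}^2 via the update formula
t.i  <- res / sqrt(s2.i * (1 - h))                       # jackknifed residuals
all.equal(t.i, rstudent(fit))                            # matches the built-in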

Residuals in Splus

> r <- lm(y ~ x1 + x2 + ... + xk, x = T)

Raw residuals $r_i = Y_i - \hat Y_i$:
> e <- resid(summary(r))

Residual standard error $s = \sqrt{\|Y - X\hat\beta\|^2/(n-p)}$:
> sigma <- summary(r)$sigma

External residual standard error $s_{-i}$:
> sigmai <- lm.influence(r)$sigma

Hat diagonals $h_{ii}$:
> hi <- hat(r$x)   or   > hi <- lm.influence(r)$hat

Internally studentized residuals $s_i = r_i/(s\sqrt{1 - h_{ii}})$:
> si <- e/(sigma*(1-hi)^0.5)

Externally studentized residuals $t_i = r_i/(s_{-i}\sqrt{1 - h_{ii}})$:
> ti <- e/(sigmai*(1-hi)^0.5)

High leverage in LMs

Since $h_{ii} = x_i^T (X^TX)^{-1} x_i$, one can interpret $h_{ii}$ as a standardized measure of the distance between $x_i$ and $\bar x$. Further we have

$\sum_{i=1}^n h_{ii} = p$;

therefore we call points with

$h_{ii} > \frac{2p}{n}$

high leverage points or x-outliers.
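In the running example, the leverages and the 2p/n rule of thumb look as follows:

sum(hatvalues(fit))                 # equals p
which(hatvalues(fit) > 2 * p / n)   # flag high leverage points (x-outliers)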

Outlier detection in LMs

To detect y-outliers we can use $t_i$, since it can be written as

$t_i = \frac{Y_i - \hat Y_{i,-i}}{\sqrt{\widehat{Var}(Y_i - \hat Y_{i,-i})}} = \frac{\hat\delta_i}{\sqrt{\widehat{Var}(\hat\delta_i)}}$ for $\delta_i := Y_i - \hat Y_{i,-i}$.

In the mean shift outlier model for observation $l$, given by $Y_i = x_i^T\beta + \gamma D_{li} + \epsilon_i$ with

$D_{li} = \begin{cases} 1 & l = i \\ 0 & l \neq i, \end{cases}$

we therefore reject $H: \gamma = 0$ versus $K: \gamma \neq 0$ at level $\alpha$ iff $|t_l| > t_{n-p-1, 1-\alpha/2}$, where $t_{m,\alpha}$ is the $\alpha$ quantile of a $t_m$ distribution.

Problem: if one uses this outlier test for every observation $l$, one has the problem of multiple testing, so one needs to substitute $\alpha$ by $\alpha/n$.
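A sketch of this Bonferroni-corrected outlier test in the running example; outlierTest() in the car package automates the same idea:

alpha  <- 0.05
t.crit <- qt(1 - (alpha / n) / 2, df = n - p - 1)   # critical value with alpha replaced by alpha/n
which(abs(rstudent(fit)) > t.crit)                  # observations flagged as y-outliers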

Leverage and influence

[Three scatter plots with fitted lines: Figure A shows the fit with and without observation 20; Figure B with and without observation 10; Figure C with and without observation 20 in a different configuration.]

Leverage and influence (cont.)

- Figure A shows that observation 20 is a high leverage point which is also influential.
- Figure B shows that observation 10 is not a high leverage point and not influential.
- Figure C shows that observation 20 is a high leverage point which is not influential.

A real-valued measure for influential observations is Cook's distance, defined as

$D_i := \frac{(\hat\beta - \hat\beta_{-i})^T (X^TX) (\hat\beta - \hat\beta_{-i})}{p s^2}$,

which can be interpreted as the shift in the confidence region when the i-th observation is deleted. Observations with $D_i > 1$ are therefore considered influential; the cutpoint 1 corresponds to the i-th observation moving $\hat\beta$ to the edge of a 50% confidence region.
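Cook's distance is available directly in R; the identity $D_i = s_i^2 h_{ii} / (p(1 - h_{ii}))$, with $s_i$ the internally studentized residual, links it to the quantities above (a sketch, continuing the running example):

D  <- cooks.distance(fit)
which(D > 1)                             # conventional cutoff for influential observations
si <- rstandard(fit)                     # internally studentized residuals
all.equal(D, si^2 * h / (p * (1 - h)))   # the identity holds numerically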

Example: Life expectancies: Residuals and hat values

[Index plots of the externally studentized residuals (the United States and Chile stand out) and of the hat diagonals (Australia, Papua New Guinea and Norway stand out)]

Example: Life expectancies: Residual plots

[Raw residuals plotted against TEMP, URBAN and log(HOSP.POP); the United States and Chile stand out in each panel]

The bottom right panel plots $r_i$ versus $r_{i-1}$ to detect correlated errors.

The Splus function plot() for LMs

> r <- lm(y ~ x)
> plot(r)

- Plot 1: $\hat Y_i$ versus $r_i$, to check for linearity and variance homogeneity.
- Plot 2: $\hat Y_i$ versus $\sqrt{|r_i|}$, to check for large residuals.
- Plot 3: $\hat Y_i$ versus $Y_i$; should cluster around the line $y = x$ if the fit is good.
- Plot 4: qq-plot of $r_i$, to check normality of the errors.
- Plot 5: residual-fit (r-f) spread plot: f-values $((1{:}n) - 0.5)/n$ against the ordered fitted values (left panel) and the ordered residuals (right panel). When a good fit is present, the spread of the residuals should be much smaller than that of the fitted values.
- Plot 6: $i$ versus $D_i$, to check for influential observations.

Example: Life expectancies: Output from plot()

> plot(r2)

[The six diagnostic plots for r2: residuals and sqrt(abs(resid(r2))) against the fitted values of TEMP + URBAN + log(HOSP.POP), LIFE.EXP against the fitted values, a normal qq-plot of the residuals, the r-f spread plot, and Cook's distance against the observation index]
