Lecture One: A Quick Review/Overview on Regular Linear Regression Models


Outline

The topics to be covered include:

- Model Specification
- Estimation (LS estimators and MLEs)
- Hypothesis Testing and Model Diagnostics (t-test, F-test and ANOVA table, residuals, outliers, other diagnostic tools)

1 Model Specification

Basics

Statistical regression models are tools used to study the (non-deterministic) statistical relationship between one variable and other variables. A statistical relationship describes general patterns between the variables of interest. Unlike models in some other fields, the pattern that describes a statistical relationship is not deterministic. For example, a son's height is related to his father's and mother's heights, but one can never deterministically tell the son's height based on either his father's or mother's height, or both.

Linear model formulation:

    Response (y) = linear function in covariates (x) + random error

[Insert Figure 1.1 here]

Two kinds of variables appear in a regression model:

- Response variable or dependent variable (often denoted as y). This variable is treated as a random variable in regression. It is usually required to take numerical, often normal-like, values in a linear regression model.

- Covariate variables, independent variables, or explanatory variables (often denoted as x). These variables are treated as fixed and non-random in a linear regression model. (Sometimes statisticians also view the covariate variables as random variables, but in that case one has to specify the regression model through conditional probability, conditioning on the covariates. Also, "independent" in "independent variable" has nothing to do with the concept of independent random variables.) One justification comes from experimental design, where the covariates or independent variables take pre-determined values. The type of a covariate variable can be numeric (not necessarily normal-like), categorical (e.g., answers yes/no, blood types A/B/O/AB, ratings excellent/good/fair/bad, etc.), or mixed.

All covariate variables are categorical ⇒ ANOVA model.

Mixed type of covariate variables (both numerical and categorical) ⇒ ANCOVA model.

Linear Regression Models

The simple linear regression model is the model where we have only a single covariate variable:

    y_i = β_0 + β_1 x_i + ε_i,  for i = 1, 2, ..., n,

where β = (β_0, β_1)^T are the unknown regression parameters and the ε's are random errors. Common assumptions on the ε's are one of the following:

- The random errors ε_1, ..., ε_n are uncorrelated with mean zero and common variance σ^2 (an unknown parameter).

- (Stronger) The random errors ε_1, ..., ε_n are independent and identically distributed (i.i.d.) random variables from N(0, σ^2).

β_0 is known as the intercept and β_1 is the slope parameter. When the covariate x is numerical, one can interpret β_1 as the expected/average increase in the response y per unit increase in the covariate x. When the covariate is categorical (e.g., in ANOVA models), β_1 can be interpreted as the expected/average change in the response due to the choice of the category.

[Bring back Figure 1.1 here]

The multiple linear regression model (some books also call it the general linear regression model) is the case where we have multiple covariate variables. In matrix form, it is

    y = Xβ + ε,

where

    y = (y_1, y_2, ..., y_n)^T,  ε = (ε_1, ε_2, ..., ε_n)^T,  β = (β_0, β_1, ..., β_p)^T,

and X is the n × (p+1) design matrix whose ith row is (1, x_i1, x_i2, ..., x_ip). The assumptions on the ε's are the same as before, and the unknown parameters are the regression parameters β and the error variance σ^2.
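As a quick illustration of the matrix form y = Xβ + ε, the pieces can be set up directly in numpy. This is a toy sketch with made-up numbers, not the lecture's SAS example:

```python
import numpy as np

# Toy illustration of the matrix form y = X beta + epsilon.
# All numbers here are invented for demonstration.
rng = np.random.default_rng(0)
n, p = 5, 2                            # n observations, p covariates
x = rng.normal(size=(n, p))            # covariates, treated as fixed
X = np.column_stack([np.ones(n), x])   # design matrix: leading column of 1's
beta = np.array([1.0, 2.0, -0.5])      # (beta_0, beta_1, beta_2)^T
eps = rng.normal(scale=0.3, size=n)    # random errors, iid N(0, 0.09)
y = X @ beta + eps                     # response vector
```

Note that X has p + 1 columns because the intercept gets its own column of ones.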

2 Estimation

We need to estimate the unknown quantities/parameters in the model from data.

Least Squares (LS) Estimators

The least squares (LS) estimators of β are the values (in the parameter space) that minimize the sum of squared distances between the responses and the regression line:

    β̂_LS = argmin_β Q(β),  or equivalently β̂_LS solves the equations ∂Q(β)/∂β = 0,

where

    Q(β) = Σ_{i=1}^n {y_i − (β_0 + β_1 x_i1 + ... + β_p x_ip)}^2 = (y − Xβ)^T (y − Xβ).

In particular,

    β̂_LS = (X^T X)^{-1} X^T y.

[Insert Figure 1.1b here]

The LS estimator of σ^2 is

    σ̂^2_LS = {1/(n − (p+1))} Σ_{i=1}^n (y_i − ŷ_i)^2 = {1/(n − (p+1))} (y − Xβ̂_LS)^T (y − Xβ̂_LS).

From now on, we are going to use s^2 to denote this σ̂^2_LS.

Maximum Likelihood Estimators (MLEs)

When the random errors ε's are normally distributed, the likelihood function of the observed responses y = (y_1, y_2, ..., y_n)^T is

    L(β, σ^2 | y) = Π_{i=1}^n f(y_i | β, σ^2) = Π_{i=1}^n (2πσ^2)^{-1/2} exp{−(y_i − μ_i)^2/(2σ^2)} = (2πσ^2)^{-n/2} exp{−Σ_{i=1}^n (y_i − μ_i)^2/(2σ^2)},

where μ_i = β_0 + β_1 x_i1 + β_2 x_i2 + ... + β_p x_ip.

The MLEs of β and σ^2 maximize the likelihood function, i.e.,

    (β̂_MLE, σ̂^2_MLE) = argmax_{β,σ} L(β, σ^2 | y).
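The closed-form LS estimator above is a one-liner in numpy. This is a minimal sketch on simulated data (parameter values chosen arbitrarily), not the lecture's worked example:

```python
import numpy as np

# Closed-form LS estimator and the variance estimators, on simulated data.
rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# beta_hat_LS = (X^T X)^{-1} X^T y; solve() avoids forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - (p + 1))   # LS estimator of sigma^2, divides by n-(p+1)
sigma2_mle = resid @ resid / n       # MLE of sigma^2, divides by n instead
```

The two variance estimators differ only in the divisor, matching the relation σ̂²_MLE = {(n − (p+1))/n} σ̂²_LS given below.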

In particular,

    β̂_MLE = (X^T X)^{-1} X^T y = β̂_LS,  and  σ̂^2_MLE = (1/n) (y − Xβ̂_LS)^T (y − Xβ̂_LS) = {(n − (p+1))/n} σ̂^2_LS.

From now on, we are going to use β̂ to denote either β̂_MLE or β̂_LS.

[Insert Figure 1.2's here (SAS code and output)]

Some (theoretical/nice) properties of the estimators

- When the random errors ε's are normally distributed, one can prove that β̂ ~ N(β, σ^2 (X^T X)^{-1}) and (n − (p+1)) s^2/σ^2 ~ χ^2_{n−(p+1)}. In addition, β̂ and s^2 are independent.

- From this, we know that both β̂ and σ̂^2_LS = s^2 are unbiased estimators, i.e., E β̂ = β and E s^2 = σ^2.

- (Gauss-Markov Theorem) The estimators β̂ = (X^T X)^{-1} X^T y are the best linear unbiased estimators (BLUEs) of β.
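The unbiasedness claim E β̂ = β can be checked informally by Monte Carlo. This is an illustration rather than a proof; the design matrix and parameter values below are arbitrary:

```python
import numpy as np

# Average the LS estimates over many simulated data sets; the average
# should be close to the true beta, illustrating E[beta_hat] = beta.
rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
beta = np.array([2.0, -1.0])
reps = 2000
estimates = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + rng.normal(scale=0.5, size=n)       # fresh errors each rep
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)   # refit
mean_est = estimates.mean(axis=0)                      # close to (2, -1)
```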

3 Hypothesis Testing and Diagnostics

Hypothesis Testing

Student t-test for the jth regression parameter β_j. One often wants to test whether the jth covariate has a significant contribution to the response or not. In mathematical terms, the hypotheses are H_0: β_j = 0 versus H_1: β_j ≠ 0. To formally conduct the test, the t-statistic

    T_j = β̂_j / se(β̂_j)

is compared to the Student t-distribution with n − (p+1) degrees of freedom. When the absolute value |T_j| is large (corresponding to a small p-value), we reject the null hypothesis H_0: β_j = 0.

[Bring back Figure 1.2 here (SAS output)]

F-test for the regression line. To test whether the regression is significant or not (i.e., whether any of the slope parameters are significantly different from zero), one turns to the F-test and the analysis of variance table (ANOVA table). In mathematical terms, the hypotheses are H_0: β_1 = β_2 = ... = β_p = 0 versus H_1: at least one of the β's is not equal to zero. To formally conduct the test, the F-statistic

    F = {Σ_{i=1}^n (ŷ_i − ȳ)^2 / p} / {Σ_{i=1}^n (y_i − ŷ_i)^2 / (n − p − 1)}

is compared to the F-distribution with degrees of freedom p and n − (p+1). If the F-statistic F is large (corresponding to a small p-value), we reject the null hypothesis H_0: β_1 = β_2 = ... = β_p = 0. The results are usually summarized in an ANOVA table.

[Bring back Figure 1.2 here (SAS output)]

The quantity Σ_{i=1}^n (y_i − ȳ)^2 is known as the total sum of squares (SSTO), Σ_{i=1}^n (y_i − ŷ_i)^2 is known as the sum of squares due to error, or the residual sum of squares (SSE), and Σ_{i=1}^n (ŷ_i − ȳ)^2 is known as the sum of squares due to regression (SSR). Here, ȳ is the sample mean of the y_i's.

One can prove the equation

    SSTO = SSR + SSE.   (1)

Also, SSE/σ^2 ~ χ^2_{n−p−1}; under H_0, SSR/σ^2 ~ χ^2_p and SSTO/σ^2 ~ χ^2_{n−1}; and SSR and SSE are independent.
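The sums of squares and the overall F-statistic can be computed directly, and the identity SSTO = SSR + SSE in equation (1) holds numerically. A sketch on hypothetical simulated data:

```python
import numpy as np

# ANOVA decomposition and the overall F-statistic on simulated data.
rng = np.random.default_rng(3)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 1.5, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
ssto = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y - y_hat) ** 2)          # residual sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares

# This is compared to the F(p, n - (p+1)) distribution to get a p-value.
F = (ssr / p) / (sse / (n - p - 1))
```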

The ratio

    R^2 = SSR / SSTO

is called the coefficient of multiple determination. The closer it is to 1, the larger the proportion of variability explained by the regression.

[Bring back Figure 1.2 here (SAS output)]

F-test for nested models. To answer the question whether a subset of the regression parameters is significant or not in a linear regression model, we use the F-test for nested models. Without loss of generality, assume that we want to test whether the last p − q (q < p) parameters are zero or not, that is, H_0: β_{q+1} = ... = β_p = 0 versus H_1: at least one of these p − q parameters is not equal to 0. Under the null hypothesis H_0, the regression model has q covariates and we call this model the reduced model (R); the model under H_1 has p covariates and we call it the full model (F). Under each model, we have the sum of squares (SS) equation (1). To formally conduct the test, the F-statistic

    F = [{SSR(F) − SSR(R)}/(p − q)] / [SSE(F)/(n − p − 1)]

is compared to the F-distribution with degrees of freedom p − q and n − (p+1). If the F-statistic F is large (corresponding to a small p-value), we reject the null hypothesis H_0: β_{q+1} = ... = β_p = 0. The results are sometimes summarized in an ANOVA table.

Diagnostics for Influential Points

The matrix H = X (X^T X)^{-1} X^T is known as the hat matrix. The name comes from the equation ŷ = Xβ̂ = X (X^T X)^{-1} X^T y = H y.

Denote by h_ii the ith diagonal element of H. All 0 ≤ h_ii < 1 and Σ_{i=1}^n h_ii = p + 1.

We call observations high leverage points if their corresponding h_ii values are very large (close to 1). High leverage points are also called x-outliers (outliers in the covariate space).

If h_ii is large (close to 1) ⇒ var(e_i) = σ^2 (1 − h_ii) is small (here, the residual e_i = y_i − ŷ_i) ⇒ the fitted regression line must be close to y_i ⇒ the location of the ith observation is important and very influential to the fitted regression line!
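The hat-matrix diagonals and the heuristic leverage rule h_ii > 2(p+1)/n (stated below) can be sketched as follows, with one deliberately extreme x-value planted in hypothetical data:

```python
import numpy as np

# Hat matrix H = X (X^T X)^{-1} X^T and its diagonal (leverage) values.
rng = np.random.default_rng(4)
n, p = 20, 1
x = rng.normal(size=n)
x[0] = 8.0                              # plant an x-outlier
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)                          # leverage of each observation

# trace(H) = p + 1, and the planted point gets leverage close to 1.
flagged = np.where(h > 2 * (p + 1) / n)[0]   # heuristic rule from the notes
```

Observation 0 is flagged as a high leverage point; the other h_ii values stay near their average (p+1)/n.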

Heuristic rule: any observation with h_ii > 2(p+1)/n is considered a high leverage point.

Residuals and Deletion Diagnostics

The residual e_i = y_i − ŷ_i has variance (1 − h_ii) σ^2. A standardized form of the residual (called the standardized residual, or internally studentized residual) is

    r_i = e_i / sqrt{(1 − h_ii) s^2}.

- One objection to this usual definition of residuals in diagnostics is that it does not take account of the influence an outlier has on the model fitting.

One way to take account of the influence of an outlier is to re-fit the model with the offending observation deleted. This leads to the deletion residual

    e_{i(i)} = y_i − ŷ_{i(i)}.

Here, the subscript (i) is taken (here and subsequently) to mean that the model has been re-fitted without the ith observation. For example, ŷ_{i(i)} means the predicted value of y_i based on the model fit in which observation i is omitted.

Remark: one can prove that e_{i(i)} = e_i / (1 − h_ii). So, in fact, we do not need to repeat the entire analysis to compute these deletion residuals.

The variance of the deletion residual e_{i(i)} is var(e_{i(i)}) = σ^2 / (1 − h_ii). A standardized form of the deletion residual (called the studentized residual, or externally studentized residual) is

    t_i = e_{i(i)} / sqrt{s^2_(i) / (1 − h_ii)} = ... = e_i {(n − p − 2) / [(1 − h_ii)(n − p − 1) s^2 − e_i^2]}^{1/2}.

The t_i value is often compared to the quantiles of the Student t_{n−p−2} distribution. If the absolute value |t_i| is larger than the 1 − α/2 percentile of the t-distribution, t_{n−p−2}(1 − α/2), the ith observation is a potential outlier (at level α). Sometimes, people prefer a more conservative rule with a Bonferroni adjustment, in which α is replaced by α/n.
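The shortcut e_{i(i)} = e_i/(1 − h_ii) can be verified numerically by actually refitting without observation i. A toy sketch (the index i = 3 is arbitrary):

```python
import numpy as np

# Verify the deletion-residual identity e_{i(i)} = e_i / (1 - h_ii)
# by brute-force refitting without observation i.
rng = np.random.default_rng(5)
n = 15
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                          # ordinary residuals
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))

i = 3
keep = np.arange(n) != i
b_del = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
e_del = y[i] - X[i] @ b_del          # deletion residual, by refitting
shortcut = e[i] / (1 - h[i])         # same quantity, no refit needed
```

The two values agree to machine precision, which is exactly why the remark says the entire analysis need not be repeated.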

[Bring back Figure 1.2 here (SAS output)]

Measures of Influence

DFBETAS measures the effect of deleting a single observation (say, the ith observation) on the estimator of a particular regression parameter (say, the kth parameter β_k):

    (DFBETAS)_{k(i)} = (β̂_k − β̂_{k(i)}) / (s_(i) sqrt{c_kk}),

where c_kk is the kth diagonal entry of (X^T X)^{-1}.

- Remark: the key part is the difference of the betas in the numerator; the denominator is a sort of standardization (note, se(β̂_k) = σ sqrt{c_kk}).

The interpretation is that if this is large, then the ith observation has an undue influence on the kth parameter estimate.

Heuristic rule: flag any value of |DFBETAS| that is bigger than 1 for a small data set, or bigger than 2/sqrt{n} for a large data set.

DFFITS measures how the fitted value of y_i is affected by deleting observation i from the fit:

    (DFFITS)_i = (ŷ_i − ŷ_{i(i)}) / (s_(i) sqrt{h_ii}) = t_i sqrt{h_ii / (1 − h_ii)}.

It can be viewed to some degree as a combined measure of influence that takes into account high leverage as well as the size of the residual.

Heuristic rule: one recommendation is to consider an observation influential if |DFFITS| is greater than 1 in the case of small data sets, or greater than 2 sqrt{p/n} for large data sets.

Cook's Distance is intended as an overall measure of the influence of the ith observation on all parameter estimates:

    D_i = (β̂ − β̂_(i))^T X^T X (β̂ − β̂_(i)) / {(p+1) s^2} = {e_i^2 / ((p+1) s^2)} {h_ii / (1 − h_ii)^2}.

Heuristic rule: a suggested interpretation is to identify the ith observation as influential if D_i is greater than the 10% quantile of the F_{p+1, n−(p+1)}

distribution, and highly influential if it is greater than the 50% quantile of the F_{p+1, n−(p+1)} distribution.

[Bring back Figure 1.2 here (SAS output)]

Graphical Methods

The regular residual plots. Useful residual plots include:

- For fixed j (a fixed covariate variable), plot the residuals e_i against the jth covariate values x_ij. This serves as a check on the linearity of the relationship.

- Plot the residuals e_i against any x-variables omitted from the model. This serves as a check on whether the omitted variable should indeed be in the model or not.

- Plot the residuals e_i against the fitted values ŷ_i. This helps to detect ways in which the model changes with the overall level of the process.

The general idea is that any systematic variability detectable in the plot potentially represents some feature of the data not captured by the current model. If the model is a good fit, then the residuals should look totally random.

Atkinson (1985) proposed the use of a half-normal plot, in which the absolute values of the deletion residuals are ordered and the kth largest absolute residual is plotted against z((k + n − 1/8)/(2n + 1/2)). Here, z(α) is the α-percentile of the standard normal distribution.

The half-normal plot can be used to detect influential points or potential outliers. To identify outlying residuals, Atkinson suggested a simulation technique which can produce a simulated envelope. This envelope constitutes a band such that the plotted residuals are likely to fall within the band if the fitted model is correct. Some details of the simulation are:

Step 1. For each of the n cases, generate a new y_i from N(ŷ_i, s^2). Fit a linear regression to the new data set of size n, with the covariates keeping their original values. Order the absolute deletion residuals in ascending order.

Step 2. Repeat Step 1 eighteen times (19 times in total).

Step 3. For each k, k = 1, 2, ..., n, assemble the kth smallest absolute residuals from the 19 groups and determine the minimum value, the mean, and the maximum value of these 19 kth smallest absolute residuals.

Step 4. Plot these minimum, mean, and maximum values against z((k + n − 1/8)/(2n + 1/2)) on the half-normal probability plot for the original data and connect the points by straight lines.

The partial residual plot is useful for identifying the nature of the relationship for an independent variable under consideration, say the kth covariate x_k = (x_1k, x_2k, ..., x_nk)^T, for addition to the regression model. Define the partial residuals for the kth covariate as

    e_i^[k] = e_i + β̂_k x_ik,  for i = 1, 2, ..., n.

The partial residual plot for the kth covariate plots e_i^[k] against x_ik. If the response y_i is linearly related to the kth covariate x_ik, the points should lie more or less around a straight line.

Other Topics (not going into details)

- Transformations of variables: on the responses (e.g., the Box-Cox transformation and variance-stabilizing transformations), and on the covariates (e.g., the Box-Tidwell transformation)
- Model selection criteria (e.g., C_p, AIC, BIC) and stepwise regressions
- Robust regressions (software available only in R or S-Plus)
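Returning to Atkinson's simulated envelope, Steps 1-4 above can be sketched in numpy. Plotting is omitted; only the envelope band and the half-normal plotting positions are computed, on hypothetical data:

```python
import numpy as np
from statistics import NormalDist

def sorted_abs_deletion_resids(X, y):
    """Absolute deletion residuals |e_i / (1 - h_ii)|, in ascending order."""
    A = np.linalg.solve(X.T @ X, X.T)   # (X^T X)^{-1} X^T
    h = np.diag(X @ A)                  # leverages
    e = y - X @ (A @ y)                 # ordinary residuals
    return np.sort(np.abs(e / (1 - h)))

rng = np.random.default_rng(7)
n = 30
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(size=n)

# Fit once to get y_hat and s^2 for the simulation in Step 1.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
s2 = np.sum((y - y_hat) ** 2) / (n - 2)

# Steps 1-3: 19 simulated data sets, each refit with the original covariates,
# then take min / mean / max of the ordered absolute deletion residuals.
sims = np.array([
    sorted_abs_deletion_resids(X, y_hat + rng.normal(scale=np.sqrt(s2), size=n))
    for _ in range(19)
])
lo, mid, hi = sims.min(axis=0), sims.mean(axis=0), sims.max(axis=0)

# Step 4: half-normal plotting positions z((k + n - 1/8) / (2n + 1/2)).
z = np.array([NormalDist().inv_cdf((k + n - 0.125) / (2 * n + 0.5))
              for k in range(1, n + 1)])
```

The arrays (z, lo), (z, mid), (z, hi) are what Step 4 would plot as the envelope, alongside the ordered absolute deletion residuals of the original data.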

4 References

Atkinson, A. C. (1985). Plots, Transformations and Regression. Oxford University Press, Oxford.

Neter, J., Kutner, M., Nachtsheim, C., and Wasserman, W. (1996). Applied Linear Regression Models, 3rd edition. Irwin, Chicago.


More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

11 Hypothesis Testing

11 Hypothesis Testing 28 11 Hypothesis Testing 111 Introduction Suppose we want to test the hypothesis: H : A q p β p 1 q 1 In terms of the rows of A this can be written as a 1 a q β, ie a i β for each row of A (here a i denotes

More information

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Ma 3/103: Lecture 24 Linear Regression I: Estimation Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

Diagnostics for Linear Models With Functional Responses

Diagnostics for Linear Models With Functional Responses Diagnostics for Linear Models With Functional Responses Qing Shen Edmunds.com Inc. 2401 Colorado Ave., Suite 250 Santa Monica, CA 90404 (shenqing26@hotmail.com) Hongquan Xu Department of Statistics University

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Correlation in Linear Regression

Correlation in Linear Regression Vrije Universiteit Amsterdam Research Paper Correlation in Linear Regression Author: Yura Perugachi-Diaz Student nr.: 2566305 Supervisor: Dr. Bartek Knapik May 29, 2017 Faculty of Sciences Research Paper

More information

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.

More information

Multivariate Linear Regression Models

Multivariate Linear Regression Models Multivariate Linear Regression Models Regression analysis is used to predict the value of one or more responses from a set of predictors. It can also be used to estimate the linear association between

More information

6. Multiple Linear Regression

6. Multiple Linear Regression 6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X

More information

Chapter 8: Hypothesis Testing Lecture 9: Likelihood ratio tests

Chapter 8: Hypothesis Testing Lecture 9: Likelihood ratio tests Chapter 8: Hypothesis Testing Lecture 9: Likelihood ratio tests Throughout this chapter we consider a sample X taken from a population indexed by θ Θ R k. Instead of estimating the unknown parameter, we

More information

The Masking and Swamping Effects Using the Planted Mean-Shift Outliers Models

The Masking and Swamping Effects Using the Planted Mean-Shift Outliers Models Int. J. Contemp. Math. Sciences, Vol. 2, 2007, no. 7, 297-307 The Masking and Swamping Effects Using the Planted Mean-Shift Outliers Models Jung-Tsung Chiang Department of Business Administration Ling

More information

Chapter 6 Multiple Regression

Chapter 6 Multiple Regression STAT 525 FALL 2018 Chapter 6 Multiple Regression Professor Min Zhang The Data and Model Still have single response variable Y Now have multiple explanatory variables Examples: Blood Pressure vs Age, Weight,

More information

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is.

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is. Linear regression We have that the estimated mean in linear regression is The standard error of ˆµ Y X=x is where x = 1 n s.e.(ˆµ Y X=x ) = σ ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. 1 n + (x x)2 i (x i x) 2 i x i. The

More information

Chapter 12: Multiple Linear Regression

Chapter 12: Multiple Linear Regression Chapter 12: Multiple Linear Regression Seungchul Baek Department of Statistics, University of South Carolina STAT 509: Statistics for Engineers 1 / 55 Introduction A regression model can be expressed as

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing STAT763: Applied Regression Analysis Multiple linear regression 4.4 Hypothesis testing Chunsheng Ma E-mail: cma@math.wichita.edu 4.4.1 Significance of regression Null hypothesis (Test whether all β j =

More information

Finding Relationships Among Variables

Finding Relationships Among Variables Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis

More information

Introduction to Estimation Methods for Time Series models. Lecture 1

Introduction to Estimation Methods for Time Series models. Lecture 1 Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 51 Outline 1 Matrix Expression 2 Linear and quadratic forms 3 Properties of quadratic form 4 Properties of estimates 5 Distributional properties 3 / 51 Matrix

More information

4 Multiple Linear Regression

4 Multiple Linear Regression 4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ

More information

Chapter 4: Regression Models

Chapter 4: Regression Models Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,

More information

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University. Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

1 Least Squares Estimation - multiple regression.

1 Least Squares Estimation - multiple regression. Introduction to multiple regression. Fall 2010 1 Least Squares Estimation - multiple regression. Let y = {y 1,, y n } be a n 1 vector of dependent variable observations. Let β = {β 0, β 1 } be the 2 1

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur

Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 29 Multivariate Linear Regression- Model

More information

MLR Model Checking. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

MLR Model Checking. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project MLR Model Checking Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Chapter 11 - Lecture 1 Single Factor ANOVA

Chapter 11 - Lecture 1 Single Factor ANOVA Chapter 11 - Lecture 1 Single Factor ANOVA April 7th, 2010 Means Variance Sum of Squares Review In Chapter 9 we have seen how to make hypothesis testing for one population mean. In Chapter 10 we have seen

More information

MATH11400 Statistics Homepage

MATH11400 Statistics Homepage MATH11400 Statistics 1 2010 11 Homepage http://www.stats.bris.ac.uk/%7emapjg/teach/stats1/ 4. Linear Regression 4.1 Introduction So far our data have consisted of observations on a single variable of interest.

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information